Price Reduction for Qwen3-235B on Doubleword
Today we’re reducing the price of our highest-intelligence model, Qwen3-235B-A22B-Instruct.
Scaling Curation with LLM Comparisons
Building a content discovery system using parallel primitives and BST-based ranking with LLM comparisons
LLM-Powered Data Structures: A Lock-Free Binary Search Tree
A lock-free binary search tree optimized for expensive async comparisons, with a threaded linked list for O(1) sorted iteration
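The full design is in the post; purely as a sketch of the threading idea (the `ComparisonTree` class and `mock_llm_less_than` comparator below are hypothetical stand-ins, and the lock-free CAS machinery is omitted), each node can carry a successor pointer that is spliced in at insert time, so sorted iteration never re-descends the tree:

```python
import asyncio

class Node:
    def __init__(self, value):
        self.value = value
        self.left = self.right = None
        self.next = None          # in-order successor (the "thread")

class ComparisonTree:
    """BST ordered by an async comparator, plus a threaded sorted list."""

    def __init__(self, less_than):
        self.less_than = less_than   # async callable: (a, b) -> bool
        self.root = None
        self.head = None             # smallest element, start of the sorted list

    async def insert(self, value):
        node = Node(value)
        if self.root is None:
            self.root = self.head = node
            return
        cur, pred, succ = self.root, None, None
        while True:
            if await self.less_than(value, cur.value):
                succ = cur                       # cur will follow the new node
                if cur.left is None:
                    cur.left = node
                    break
                cur = cur.left
            else:
                pred = cur                       # cur will precede the new node
                if cur.right is None:
                    cur.right = node
                    break
                cur = cur.right
        # Splice into the sorted linked list in O(1).
        node.next = succ
        if pred is not None:
            pred.next = node
        else:
            self.head = node

    def sorted_values(self):
        # Walk the thread: O(1) per element, no tree traversal needed.
        cur, out = self.head, []
        while cur is not None:
            out.append(cur.value)
            cur = cur.next
        return out

async def mock_llm_less_than(a, b):
    # Stand-in for an expensive async LLM comparison call.
    await asyncio.sleep(0)
    return a < b

async def demo():
    tree = ComparisonTree(mock_llm_less_than)
    for v in [5, 2, 8, 1, 9, 3]:
        await tree.insert(v)
    print(tree.sorted_values())      # [1, 2, 3, 5, 8, 9]

asyncio.run(demo())
```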
ZeroDP: Just-In-Time Weight Offloading over NVLink for Data Parallelism
High-throughput LLM inference using just-in-time weight offloading to free GPU memory for the KV cache.
Large-Scale Semantic Search Without Embeddings
Applying parallel primitives to search and rank 2.4 million arXiv papers using LLM judgments
Parallel Primitives for Multi-Agent Workflows
Exploring coordination patterns from parallel computing for multi-agent LLM systems
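For a flavour of what such a primitive looks like in practice, here is a minimal fan-out/fan-in sketch. The `agent` coroutine is a hypothetical stand-in for an LLM call, not code from the post:

```python
import asyncio

async def agent(task: str) -> str:
    """Stand-in for an LLM agent call; in practice this hits an inference API."""
    await asyncio.sleep(0.01)            # simulate request latency
    return f"summary of {task!r}"

async def fan_out_fan_in(tasks: list[str]) -> str:
    # Map step: run one agent per task concurrently (the parallel "map" primitive).
    partials = await asyncio.gather(*(agent(t) for t in tasks))
    # Reduce step: a single agent combines the partial results.
    return await agent(" + ".join(partials))

result = asyncio.run(fan_out_fan_in(["paper A", "paper B", "paper C"]))
print(result)
```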
$1 for a Year of Research Digests. That's Less Than a Coffee.
Researchers face a near-impossible task in keeping up with their field. In AI and machine learning alone, arXiv publishes 50-100 new papers daily. Multiply that across computer science, physics, biology, and other domains, and hundreds of potentially relevant papers flood in every single day.
Why Batch Inference Matters: Moving from AI Assistants to Autonomous Agents
The initial wave of generative AI adoption focused on augmenting human work: chatbots that help developers write cleaner code, assistants that polish our emails, and tools that speed up content creation. These productivity enhancements have proven their value many times over, with ChatGPT-style assistants now open in the background of countless working days. But they represent just the beginning of what's possible with AI.
Behind the Stack, Ep 13: Faster Inference: Speculative Decoding for Batched Workloads
This episode explores how speculative decoding becomes increasingly valuable in high-throughput, batched inference scenarios, particularly with sparse MoE architectures.
Behind the Stack, Ep 12: Understanding Model Parallelism
This technical guide explores model parallelism, a critical technique for deploying large language models that exceed single GPU memory capacity.
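As a toy illustration of one form of model parallelism covered in the guide (tensor parallelism), the numpy sketch below splits a weight matrix's columns across four hypothetical devices and checks that the gathered result matches the single-device matmul; it is an assumption-laden sketch, not code from the episode:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 512))          # activations: batch x hidden
W = rng.normal(size=(512, 2048))       # a weight matrix too large for one "device"

# Column-parallel split: each of 4 hypothetical GPUs holds a slice of W's columns.
shards = np.split(W, 4, axis=1)
partial_outputs = [x @ w for w in shards]               # each device computes its slice
y_parallel = np.concatenate(partial_outputs, axis=1)    # all-gather along columns

assert np.allclose(y_parallel, x @ W)   # matches the single-device result
```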
Behind the Stack, Ep 11: How Speculative Decoding Speeds Up Language Models
This article explores speculative decoding, a technique designed to accelerate language model inference by introducing parallelism into the token generation process.
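As a rough sketch of the mechanic, the toy greedy-acceptance variant below uses two hypothetical stand-in functions (`draft_model`, `target_model`) to show how several draft tokens can be confirmed per pass of the expensive model; real implementations verify the whole draft probabilistically in a single batched forward pass:

```python
def draft_model(prefix):
    """Cheap model: guesses the next token (toy rule over a toy vocabulary)."""
    return (prefix[-1] + 1) % 50

def target_model(prefix):
    """Expensive model: the token we actually want (toy rule, mostly agrees)."""
    nxt = (prefix[-1] + 1) % 50
    return nxt if prefix[-1] % 7 else nxt + 1   # disagree occasionally

def speculative_step(prefix, k=4):
    # 1. Draft model proposes k tokens autoregressively (cheap).
    draft = list(prefix)
    for _ in range(k):
        draft.append(draft_model(draft))
    proposed = draft[len(prefix):]
    # 2. Target model checks each position (stands in for one batched verification pass).
    accepted = []
    for i in range(k):
        expected = target_model(prefix + accepted)
        if proposed[i] == expected:
            accepted.append(proposed[i])     # keep matching draft tokens
        else:
            accepted.append(expected)        # first mismatch: take the target's token, stop
            break
    return prefix + accepted

seq = [1]
for _ in range(5):
    seq = speculative_step(seq)
print(seq)   # several tokens accepted per expensive-model pass
```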