
Chain of Draft: Enhancing AI Reasoning Efficiency with Concise Prompts

Illustration: an AI researcher compares Chain-of-Thought and Chain of Draft prompting on two monitors. One screen shows a math problem solved with a long-form explanation (CoT) and with a concise draft (CoD); the other shows a performance table comparing accuracy, token usage, and latency across GPT-4o and Claude 3.5.

Image Source: ChatGPT-4o


Researchers at Zoom Communications have introduced a new prompting strategy for large language models (LLMs) called Chain of Draft (CoD)—a method that dramatically improves reasoning efficiency by asking models to write only the minimum text needed to solve a problem.

Unlike traditional chain-of-thought (CoT) prompting, which walks through each reasoning step in detail, Chain of Draft mirrors the way humans take quick notes—recording only what’s essential. That shift in style leads to significant reductions in token usage and latency, without sacrificing accuracy.

How Chain of Draft Works

CoD prompts the model to generate minimal, information-rich drafts—often just a few words or equations per step. Rather than elaborate explanations, the model might write:

“20 - 12 = 8”

...instead of a full sentence like:

“First, I subtract 12 from 20 to get 8.”

This drafting style forces the model to focus on logical steps, not verbose narration. The CoD method can also be used in few-shot prompting setups, although that’s not unique to CoD itself.

Importantly, CoD is not an iterative drafting method. It doesn’t involve multiple rounds of revision or self-correction. Instead, it emphasizes concise reasoning at each stage.
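The difference between the two styles comes down to the instruction placed in the prompt. The sketch below shows one way to construct CoT-style and CoD-style prompts; the exact instruction wording is an illustrative assumption, not the verbatim text from the Zoom paper, and `build_prompt` is a hypothetical helper.

```python
# Illustrative contrast between a Chain-of-Thought (CoT) instruction and a
# Chain of Draft (CoD) instruction. Wording is an assumption for demonstration.

COT_INSTRUCTION = (
    "Think step by step to answer the following question. "
    "Explain each reasoning step in detail, then give the final answer "
    "after '####'."
)

COD_INSTRUCTION = (
    "Think step by step, but keep only a minimum draft for each thinking "
    "step, with at most five words per step. Give the final answer "
    "after '####'."
)

def build_prompt(instruction: str, question: str) -> str:
    """Combine the instruction with the user question into a single prompt."""
    return f"{instruction}\n\nQ: {question}\nA:"

question = (
    "Jason had 20 lollipops. He gave some to Denny. "
    "Now he has 12. How many did he give away?"
)
print(build_prompt(COD_INSTRUCTION, question))
# A CoD-style completion for this question would look like: "20 - 12 = 8 #### 8"
```

Swapping one instruction for the other is the entire intervention: no fine-tuning, no change to decoding, just a shorter target style for the model to imitate.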

Experimental Results: Less Text, Same Accuracy

The researchers evaluated CoD across three major categories:

  • Arithmetic Reasoning

  • Commonsense Reasoning

  • Symbolic Reasoning

Key outcomes:

  • CoD achieved comparable or better accuracy than CoT in nearly all tasks.

  • It used as little as 7.6% of the tokens required by CoT—up to a 92% reduction in output length.

  • On average, token use ranged from 7.6% to 32% depending on the task.

These savings make CoD ideal for real-time systems, cost-sensitive applications, and environments with limited compute resources.
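The cost implication of those percentages is easy to work out. The token counts below are illustrative assumptions; only the 7.6%–32% usage range comes from the reported results.

```python
# Back-of-the-envelope savings implied by the reported token reductions.
# Absolute token counts here are assumed for illustration.

def reduction_pct(cod_tokens: int, cot_tokens: int) -> float:
    """Percentage reduction in output tokens when switching from CoT to CoD."""
    return 100.0 * (1 - cod_tokens / cot_tokens)

# If a CoT answer emits 200 output tokens and CoD emits 7.6% as many:
cot_tokens = 200
cod_tokens = round(cot_tokens * 0.076)  # 15 tokens
print(f"Reduction: {reduction_pct(cod_tokens, cot_tokens):.1f}%")  # → Reduction: 92.5%
```

Because most LLM APIs bill per output token and latency scales with output length, a ~92% cut in generated text translates almost directly into proportional cost and response-time savings.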

CoD vs. CoT: Measurable Gains in Accuracy, Tokens, and Latency

To validate their method, the researchers tested Chain of Draft on both GPT-4o and Claude 3.5, comparing it against standard prompting and Chain-of-Thought (CoT) across three dimensions: accuracy, token usage, and latency.

Performance Comparison of Prompting Methods: Standard vs. CoT vs. CoD. The table compares GPT-4o and Claude 3.5 on accuracy, average token usage, and response latency; CoD shows accuracy similar to CoT with significantly fewer tokens and lower latency. Image Source: Perplexity AI

Key Takeaways:

  • CoD achieves over 90% accuracy on both models—slightly behind CoT, but dramatically faster and cheaper.

  • Compared to CoT, token usage drops by ~80% and latency is cut by more than half.

  • Standard prompting performs significantly worse in accuracy, showing that structured reasoning remains essential—but it doesn’t have to be long-winded.

These results reinforce the central idea behind CoD: short, focused steps can deliver nearly the same quality as full reasoning chains, with a fraction of the overhead.

When Chain of Draft Works—and When It Doesn’t

Strengths:

  • Works especially well on structured reasoning tasks where logical steps matter more than explanation.

  • Reduces both compute costs and response time—without requiring fine-tuning or changes to the model itself.

  • Can be layered with few-shot prompts for further performance gains.

Limitations:

  • CoD is less effective in zero-shot settings, where models lack prior context or examples.

  • May not work well for tasks requiring detailed, human-readable output (e.g., education, explanations, or legal summaries).

  • Underperforms with smaller models (under 3B parameters) that struggle to infer logic from minimal cues.
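The zero-shot weakness above suggests a common mitigation: pair the concise instruction with one or more exemplars written in the same draft style, so the model has a terse pattern to imitate. The helper and example text below are illustrative assumptions, not the paper's exact prompts.

```python
# Sketch of layering a few-shot exemplar onto a Chain of Draft prompt.
# The exemplar and instruction wording are assumptions for illustration.

FEW_SHOT_EXAMPLE = (
    "Q: A store had 20 apples and sold 12. How many remain?\n"
    "A: 20 - 12 = 8 #### 8"
)

def few_shot_cod_prompt(question: str) -> str:
    """Prepend a drafted exemplar so the model can imitate the terse style."""
    instruction = (
        "Answer in minimal drafts, a few words per step, "
        "then give the final answer after '####'."
    )
    return f"{instruction}\n\n{FEW_SHOT_EXAMPLE}\n\nQ: {question}\nA:"

print(few_shot_cod_prompt("Tom has 15 marbles and loses 6. How many are left?"))
```

For small models or unfamiliar task formats, adding two or three such exemplars is usually enough to recover the draft style without giving up most of the token savings.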

What This Means

Chain of Draft isn’t just a cost-saving trick—it’s a shift in how we think about AI reasoning. By pushing models to do more with less, it unlocks a path toward faster, more efficient systems that maintain high performance across complex reasoning tasks.

Looking Ahead

As LLMs scale up, prompting strategies like CoD show that bigger isn't always better—smarter is. Rather than relying on longer outputs or expensive model tuning, CoD proves that focused structure can achieve better outcomes with fewer words.

In a world obsessed with generative length, Chain of Draft makes a compelling case for generative precision.

Editor’s Note: This article was created by Alicia Shapiro, CMO of AiNews.com, with writing, image, and idea-generation support from ChatGPT, an AI assistant. However, the final perspective and editorial choices are solely Alicia Shapiro’s. Special thanks to ChatGPT for assistance with research and editorial support in crafting this article.