
CDLM: 14x Faster AI Language Models

Hacker News

Researchers from Seoul National University, UC Berkeley, and Together AI have developed Consistency Diffusion Language Models (CDLM), achieving up to 14.5x faster inference on math and coding tasks without sacrificing quality. Unlike autoregressive models, which generate one token at a time, CDLM iteratively refines masked sequences and produces multiple tokens in parallel at each refinement step. The work targets the major bottlenecks that have held back standard diffusion language models.
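To make the parallel-refinement idea concrete, here is a toy sketch of how a masked diffusion language model can commit several tokens per forward pass. It is not the CDLM implementation: the dummy scorer `toy_logits`, the confidence-based unmasking rule, and all sizes are illustrative assumptions.

```python
# Toy sketch of iterative masked refinement (not the CDLM algorithm).
import numpy as np

VOCAB, MASK_ID, SEQ_LEN = 100, 0, 16
rng = np.random.default_rng(0)

def toy_logits(tokens: np.ndarray) -> np.ndarray:
    """Stand-in for the real network: per-position logits over the vocabulary."""
    return rng.normal(size=(len(tokens), VOCAB))

def refine(tokens: np.ndarray, steps: int, unmask_per_step: int) -> np.ndarray:
    """Each step, predict every masked position in parallel and commit the
    `unmask_per_step` most confident predictions."""
    tokens = tokens.copy()
    for _ in range(steps):
        masked = np.flatnonzero(tokens == MASK_ID)
        if masked.size == 0:
            break
        logits = toy_logits(tokens)                 # one forward pass per step
        logits[:, MASK_ID] = -np.inf                # never predict the mask token
        probs = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)
        best = probs[masked].argmax(-1)             # parallel predictions
        conf = probs[masked, best]                  # confidence of each prediction
        order = np.argsort(conf)[::-1][:unmask_per_step]
        tokens[masked[order]] = best[order]         # commit several tokens at once
    return tokens

seq = np.full(SEQ_LEN, MASK_ID)
print(refine(seq, steps=4, unmask_per_step=4))      # 16 tokens in 4 passes
```

The key contrast with autoregressive decoding is that the number of forward passes is set by the step budget, not by the sequence length.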

CDLM tackles two key inefficiencies: incompatibility with KV caching under full bidirectional attention, and the large number of refinement steps typically required. The model uses a post-training recipe that enables reliable few-step inference while supporting exact block-wise KV caching. By employing a block-wise causal mask and a three-objective loss function, CDLM maintains quality at reduced step counts.
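The following is a minimal sketch of one way a block-wise causal attention mask can be constructed: tokens attend bidirectionally within their own block but only causally to earlier blocks. The block size, sequence length, and function name are assumptions for illustration; CDLM's exact masking and caching details may differ.

```python
# Illustrative block-wise causal mask (assumed construction, not CDLM's code).
import numpy as np

def blockwise_causal_mask(seq_len: int, block_size: int) -> np.ndarray:
    """mask[i, j] == True  <=>  query position i may attend to key position j."""
    blocks = np.arange(seq_len) // block_size
    # A query sees every key in its own block (bidirectional within the block)
    # and every key in earlier blocks (causal across blocks).
    return blocks[:, None] >= blocks[None, :]

mask = blockwise_causal_mask(seq_len=8, block_size=4)
print(mask.astype(int))
# Because completed earlier blocks are visible, and only visible, in this fixed
# pattern, their key/value activations never change and can be cached exactly.
```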

In testing, CDLM achieved the largest step reductions across benchmarks, cutting refinement steps by 4.1x–7.7x with minimal accuracy changes. These step reductions translate into significant latency improvements: 11.2x faster on GSM8K-CoT and 14.5x faster on MBPP-Instruct. The approach often delivers the highest tokens-per-second throughput while maintaining pass@1 accuracy across a range of tasks.
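Note that the latency gains exceed the step reductions, which suggests each remaining step also becomes cheaper (for example via block-wise KV caching). A back-of-the-envelope sketch with hypothetical numbers, not measured values, shows how the two factors multiply:

```python
# Hypothetical arithmetic only: how fewer steps plus cheaper steps compound.
baseline_steps, baseline_ms_per_step = 256, 20.0
cdlm_steps = baseline_steps / 7.7               # step reduction reported above
cdlm_ms_per_step = baseline_ms_per_step / 1.9   # assumed per-step savings from caching

speedup = (baseline_steps * baseline_ms_per_step) / (cdlm_steps * cdlm_ms_per_step)
print(f"overall latency speedup ~ {speedup:.1f}x")  # ~14.6x under these assumptions
```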