HeadlinesBriefing.com

Orthrus Framework: Dual-View LLM Invention

Hacker News

Researchers have unveiled Orthrus, a dual-view framework that combines an autoregressive Large Language Model with a diffusion model to achieve both exact generation fidelity and parallel token generation. The approach addresses the sequential bottleneck in standard autoregressive decoding, enabling faster inference while preserving the original model's predictive distribution.
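The sequential bottleneck can be seen in a toy step count: standard autoregressive decoding needs one forward pass per token, whereas a parallel drafter emits a block of tokens per pass. This is an illustrative sketch only; the block size and step model are assumptions, not details from the article.

```python
# Toy contrast between sequential autoregressive decoding (one token per
# forward pass) and block-parallel drafting (several tokens per pass).
# Step counts only -- no real model is involved.

def autoregressive_steps(num_tokens: int) -> int:
    """One forward pass per generated token."""
    return num_tokens

def parallel_steps(num_tokens: int, block: int = 4) -> int:
    """One forward pass drafts `block` tokens; ceil division for the remainder."""
    return -(-num_tokens // block)

print(autoregressive_steps(32), parallel_steps(32))  # → 32 8
```

With a hypothetical block size of 4, generating 32 tokens drops from 32 sequential passes to 8, which is the kind of wall-clock win parallel generation targets.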

Orthrus delivers impressive performance metrics, offering up to a 5.36× speedup on generation tasks while fine-tuning only 16% of total model parameters. By sharing a single Key-Value cache across both views, it adds no redundant memory (O(1) cache overhead), making it well suited to resource-constrained environments.
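A back-of-the-envelope calculation shows what those two figures imply. The 5.36× speedup and 16% trainable-parameter ratio come from the article; the baseline latency and the 7B model size below are made-up values for illustration.

```python
# Implications of the reported numbers (5.36x speedup, 16% trainable params).
# Baseline latency and model size are hypothetical, chosen only to make the
# arithmetic concrete.

baseline_latency_s = 10.0             # hypothetical wall-clock time for a task
speedup = 5.36
orthrus_latency_s = baseline_latency_s / speedup   # ~1.87 s

total_params = 7_000_000_000          # hypothetical 7B-parameter base model
trainable = int(total_params * 0.16)  # only 16% of parameters are fine-tuned

print(f"latency: {orthrus_latency_s:.2f}s, trainable params: {trainable:,}")
```

In other words, a 5.36× speedup cuts wall-clock time to roughly 19% of the baseline, and fine-tuning touches about 1.12B of a hypothetical 7B parameters.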

Compared with existing methods, Orthrus outperforms speculative decoding approaches such as EAGLE-3 and DFlash, achieving higher token acceptance rates and faster inference. Unlike diffusion language models, which suffer from conditional drift, Orthrus maintains strict accuracy on complex reasoning tasks, establishing a new state of the art for parallel generation fidelity without compromising output quality.
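The token acceptance rate mentioned above refers to the verification step common to speculative decoding. In the greedy variant, a draft proposes several tokens, the target model checks them in one parallel pass, and tokens are accepted up to the first mismatch. This is a minimal sketch of that standard rule with made-up token IDs, not the Orthrus implementation.

```python
# Greedy speculative-decoding acceptance rule: accept the longest prefix of
# draft tokens that matches what the exact (target) model would emit.

def accepted_prefix(draft_tokens: list[int], target_tokens: list[int]) -> int:
    """Count how many leading draft tokens the target model agrees with."""
    n = 0
    for d, t in zip(draft_tokens, target_tokens):
        if d != t:
            break
        n += 1
    return n

draft  = [12, 7, 99, 4]   # tokens proposed by the fast draft pass
target = [12, 7, 42, 4]   # tokens the exact model would emit at each position
print(accepted_prefix(draft, target))  # → 2 (acceptance rate 2/4 = 50%)
```

A higher acceptance rate means more drafted tokens survive each verification pass, which is why it translates directly into faster inference.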