HeadlinesBriefing.com

Subquadratic Claims Breakthrough in LLM Efficiency With Sparse Attention Model

MIT Technology Review

Miami-based AI startup Subquadratic has emerged from stealth, claiming to have solved a decade-old bottleneck in large language models. The company's Sub Q model replaces dense attention with sparse attention, a mathematical shift that could dramatically reduce computational costs. Skepticism initially ran high because the company offered little proof, but third-party validation from Appen now supports its performance claims.

The core innovation addresses how transformers process text. Dense attention compares every token with every other token, so the work grows quadratically with input length. Treating each word as one token, a 10,000-word document involves roughly 50 million distinct token pairs. Sparse attention computes only the relationships it selects as important and skips the rest. Sub Q's dynamic selection differs from the fixed-pattern approaches used previously, which struggled to match dense attention's performance.
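As a rough check on the quadratic-growth arithmetic, the sketch below counts token pairs for dense attention and for a generic sparse scheme. It assumes one token per word. The `budget` parameter is a hypothetical per-token cap chosen only for illustration; it does not represent Subquadratic's selection method, which the article does not describe.

```python
def dense_ordered_pairs(n_tokens: int) -> int:
    """Every token scored against every token, itself included: n^2."""
    return n_tokens * n_tokens


def dense_unique_pairs(n_tokens: int) -> int:
    """Each unordered pair of distinct tokens counted once: n(n-1)/2."""
    return n_tokens * (n_tokens - 1) // 2


def sparse_pairs(n_tokens: int, budget: int) -> int:
    """Hypothetical sparse scheme: each token attends to at most `budget` tokens."""
    return n_tokens * min(budget, n_tokens)


if __name__ == "__main__":
    budget = 64  # illustrative assumption, not a figure from the article
    for n in (1_000, 10_000, 100_000):
        print(
            f"n={n:>7,}  "
            f"dense unique pairs={dense_unique_pairs(n):>15,}  "
            f"sparse pairs (budget={budget})={sparse_pairs(n, budget):>10,}"
        )
```

For 10,000 tokens the unique-pair count is 49,995,000, in line with the "roughly 50 million" figure; counting each ordered pair separately doubles it to 100 million. With a fixed per-token budget, the sparse count grows linearly with input length rather than quadratically.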

Appen's independent tests show compelling results. Sub Q ran 56 times faster than FlashAttention on speed benchmarks and scored 89.7% on LiveCodeBench, matching top coding models. Most striking: Sub Q cost $8 to run RULER 128 tests versus $2,600 for Anthropic's Opus model, a 325-fold difference. The model handles up to 12 million tokens, twelve times the capacity of most competitors.

If verified, these claims could reshape how engineers build production LLMs. The efficiency gains matter because current models consume enormous compute resources, which limits accessibility. Subquadratic aims to kick off a new age of efficiency, though it has not fully open-sourced its approach. The startup's fate now hinges on whether independent researchers can replicate these results at scale.