HeadlinesBriefing.com

Sub Q 1.1 Small Cuts Long‑Context Attention by 64x

Hacker News

Sub Q 1.1 Small is designed to address the long‑context ceiling in enterprise AI. The model replaces quadratic dense attention with SSA, a learned sparse attention mechanism whose cost scales linearly with context length. Because attention is restricted to 0.13% of token relationships, the system can process an entire codebase or legal document in a single pass.
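The article does not describe how SSA chooses which relationships to keep. For reference only, the sketch below implements a different, well‑known idea: top‑k sparse attention, where each query keeps only its `top_k` highest‑scoring keys and applies softmax over those. The function `topk_sparse_attention` and its `top_k` parameter are hypothetical names. Unlike a linear‑time method, this toy still scores every key, so it remains O(n²).

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def topk_sparse_attention(q: List[Vector], k: List[Vector], v: List[Vector],
                          top_k: int) -> List[Vector]:
    """Toy top-k sparse attention (hypothetical; NOT the SSA algorithm).

    Each query keeps only its `top_k` highest-scoring keys and softmaxes over them.
    This toy still computes every query-key score, so it is O(n^2); a linear-time
    mechanism needs a cheaper way to pick candidate keys.
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [dot(qi, kj) * scale for kj in k]
        # Indices of this query's top_k largest scores.
        keep = sorted(range(len(k)), key=lambda j: scores[j], reverse=True)[:top_k]
        weights = softmax([scores[j] for j in keep])
        # Weighted sum of the value vectors for the kept keys only.
        out.append([sum(w * v[j][c] for w, j in zip(weights, keep))
                    for c in range(len(v[0]))])
    return out
```

Setting `top_k` to the number of keys recovers ordinary dense softmax attention, which makes a convenient sanity check; smaller values drop the low‑scoring keys entirely.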

Benchmarks show near‑perfect retrieval on the Needle‑in‑a‑Haystack test at contexts of up to 12M tokens. SSA uses 64.5x less compute than dense attention at 1M tokens and runs 56x faster than FlashAttention‑2. The model also retains strong performance on knowledge, coding, and finance benchmarks, scoring close to frontier models.
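To put the 0.13% figure in perspective, the arithmetic below assumes two things the article does not state: that the figure applies at a 1M‑token context, and that each query attends to a fixed budget of keys (a simple model of linear scaling, not a published detail of SSA). Under those assumptions, 0.13% of the 10¹² query‑key pairs at 1M tokens is 1,300 keys per query, about 769x fewer scored pairs than dense attention. Holding that budget fixed at 12M tokens shrinks the kept fraction to about 0.0108% and widens the gap to about 9,231x. The helper names are hypothetical.

```python
from fractions import Fraction


def dense_pairs(n: int) -> int:
    """Query-key pairs scored by full (non-causal) dense attention: n * n."""
    return n * n


def sparse_pairs(n: int, keys_per_query: int) -> int:
    """Pairs scored when each query attends to a fixed key budget: linear in n."""
    return n * min(keys_per_query, n)


# Assumption: the article's 0.13% figure applies at a 1M-token context.
KEPT_FRACTION = Fraction(13, 10_000)
N_1M = 1_000_000
budget = int(KEPT_FRACTION * N_1M)  # keys per query implied by 0.13% at 1M tokens

for n in (N_1M, 12_000_000):
    ratio = Fraction(dense_pairs(n), sparse_pairs(n, budget))
    kept = Fraction(sparse_pairs(n, budget), dense_pairs(n))
    print(f"n={n:>10,}  keys/query={budget}  kept={float(kept):.4%}  "
          f"dense/sparse={float(ratio):.0f}x")
```

These are pair counts only. The 64.5x figure reported above measures compute, which this pair‑count model does not capture, so the two numbers should not be expected to match.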

Training began with an open‑weight model: dense attention was swapped for SSA, the context window was extended in stages (262K, 512K, 1M, then 2M tokens), and the model then received one trillion tokens of long‑artifact pretraining. This training loop supported hundreds of experiments and helped the system generalize reliably to contexts of up to 12M tokens.
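Staged extension is far cheaper when attention cost grows linearly: doubling the context roughly doubles the attention cost instead of quadrupling it. The sketch below compares the growth factor between consecutive stages for a cost proportional to n versus n². It reads the rounded stage sizes as decimal token counts, which is an assumption because the exact counts are not given; `STAGES` and `stage_cost_growth` are hypothetical names, and this is not a description of the actual training recipe.

```python
from fractions import Fraction

# Assumption: the article's rounded stage sizes, read as decimal token counts.
STAGES = [262_000, 512_000, 1_000_000, 2_000_000]


def stage_cost_growth(stages):
    """For each step between stages, return (prev, next, linear growth, dense growth).

    Linear growth models attention whose cost scales with n; dense growth models
    full attention, whose cost scales with n^2.
    """
    rows = []
    for prev, nxt in zip(stages, stages[1:]):
        r = Fraction(nxt, prev)
        rows.append((prev, nxt, r, r * r))
    return rows


for prev, nxt, lin, quad in stage_cost_growth(STAGES):
    print(f"{prev:>9,} -> {nxt:>9,}: linear x{float(lin):.2f}, dense x{float(quad):.2f}")
```

Across the whole schedule, going from 262K to 2M tokens multiplies sequence length by about 7.6x; under these assumptions that is about 7.6x more attention cost with linear scaling but about 58x with dense attention.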

Practical use cases span finance, legal, and software engineering, where reasoning over full documents or repositories matters more than summarizing them. Because Sub Q can load an entire codebase or contract into a single context window, it can perform architecture‑level reasoning and cross‑file refactoring in one pass, delivering measurable gains in efficiency and accuracy.
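The article does not describe Sub Q's API or prompt format, so the following is only a minimal sketch of one common way to place a whole repository into a single context: concatenate the source files, prefix each with its path, and check the result against a token budget. The `pack_repository` function, the `### FILE:` header convention, and the 4‑characters‑per‑token estimate are all hypothetical; real token counts depend on the model's tokenizer.

```python
from pathlib import Path

# Hypothetical heuristic: roughly 4 characters per token for English text and code.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    # Ceiling division so non-empty text never counts as zero tokens.
    return -(-len(text) // CHARS_PER_TOKEN)


def pack_repository(root: str, budget_tokens: int, suffixes=(".py", ".md")) -> str:
    """Concatenate matching files under `root` into one prompt, each prefixed by its path.

    Raises ValueError when the estimate exceeds `budget_tokens` instead of silently
    truncating, since the goal is to reason over the whole repository at once.
    """
    parts = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            body = path.read_text(encoding="utf-8", errors="replace")
            # The path header lets the model tell files apart for cross-file reasoning.
            parts.append(f"### FILE: {path.relative_to(root).as_posix()}\n{body}\n")
    prompt = "".join(parts)
    used = estimate_tokens(prompt)
    if used > budget_tokens:
        raise ValueError(f"repository needs ~{used:,} tokens, budget is {budget_tokens:,}")
    return prompt
```

Failing loudly on an oversized repository, rather than dropping files, keeps the single‑pass guarantee explicit: either every file is in context or the caller knows it is not.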