HeadlinesBriefing.com

Xiaomi MiMo-V2.5-Pro-UltraSpeed Hits 1000 Tokens/Second

Hacker News

Xiaomi has released MiMo-V2.5-Pro-UltraSpeed, a 1-trillion-parameter model that generates 1000 tokens per second, developed in collaboration with TileRT. This marks the first time a trillion-parameter model has crossed the 1000 tokens-per-second threshold, turning AI from a tool you wait on into a real-time extension of thinking.

The model launches through a limited-time API priced at 3x the standard MiMo-V2.5 rate while delivering roughly 10x the speed. The API is available June 9-23, 2026, and access requires an approved application, with priority given to professional developers. Beyond raw speed, this throughput enables parallel reasoning paths, real-time coding agents, and millisecond decision loops for trading and medical applications where seconds matter.
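To make the price-versus-speed trade concrete, here is a minimal Python sketch of the arithmetic. Only the 3x price and roughly 10x speed multipliers come from the article. The baseline price, the baseline speed of 100 tokens per second, and the job size are hypothetical placeholders, not published figures.

```python
def compare_job(tokens, base_price_per_mtok, base_tps,
                price_mult=3.0, speed_mult=10.0):
    """Return {tier: (cost, seconds)} for one generation job.

    price_mult and speed_mult are the multipliers quoted in the article;
    every other input is a hypothetical placeholder.
    """
    base_cost = tokens / 1e6 * base_price_per_mtok  # cost scales with tokens
    base_secs = tokens / base_tps                   # time scales inversely with speed
    return {
        "standard": (base_cost, base_secs),
        "ultraspeed": (base_cost * price_mult, base_secs / speed_mult),
    }


# Hypothetical job: 50,000 output tokens, 1.0 currency unit per million
# tokens, and an assumed 100 tokens/second baseline.
result = compare_job(tokens=50_000, base_price_per_mtok=1.0, base_tps=100.0)
for tier, (cost, secs) in result.items():
    print(f"{tier}: cost={cost:.2f}, wall time={secs:.0f}s")
```

Under these assumptions the same job costs three times as much but finishes in a tenth of the wall-clock time, which is the trade a latency-sensitive workload would be paying for.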

Achieving this required extreme model-system codesign. FP4 quantization targets the memory-bandwidth bottleneck on commodity GPUs, while DFlash speculative decoding uses block-level masked parallel prediction to remove the serial, token-by-token constraint. The draft model employs Sliding Window Attention, which reduces per-prediction attention compute from linear in context length to constant. The collaboration between Xiaomi's model team and TileRT's system team shows that trillion-scale models can reach this speed without specialized hardware.
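The "linear to constant" claim for Sliding Window Attention can be illustrated by counting how many tokens a single causal query has to read. This is a minimal sketch of the general technique, not the MiMo draft model's implementation. The function name and the window size of 128 are hypothetical.

```python
def keys_attended(position, window=None):
    """Count the tokens a causal attention query at `position` (0-indexed) reads.

    window=None models full causal attention: every earlier token plus the
    current one. An integer window models sliding-window attention, which
    reads at most the most recent `window` tokens.
    """
    visible = position + 1
    return visible if window is None else min(visible, window)


# Hypothetical window size of 128; the article does not state the real one.
for pos in (9, 999, 99_999):
    print(pos, keys_attended(pos), keys_attended(pos, window=128))
```

With full attention the count grows with the position, so each new prediction costs more than the last. With a window the count stops growing once the context exceeds the window size, so the per-prediction attention cost stays flat however long the context gets.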

On standard 8-GPU nodes, the system delivers 1000+ tokens per second from a 1T-parameter model. More than faster output, this speed makes real-time AI practical in trading, fraud detection, and surgical assistance, where inference speed directly affects outcomes.
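A rough way to see why lower-precision weights matter on an 8-GPU node is a bandwidth-only bound: if decoding is limited by how fast weights can be read from GPU memory, tokens per second cannot exceed aggregate bandwidth divided by bytes read per token. The sketch below is a back-of-envelope estimate, not a model of the actual system. The parameters read per token (40 billion) and the per-GPU bandwidth (3.0 TB/s) are hypothetical, and the estimate ignores KV-cache reads, inter-GPU communication, batching, and speculative decoding.

```python
def decode_tps_upper_bound(params_read_per_token, bits_per_weight,
                           gpus, bandwidth_per_gpu_tbs):
    """Bandwidth-only upper bound on decode speed in tokens per second.

    Assumes each generated token requires reading `params_read_per_token`
    weights from GPU memory once, and nothing else limits throughput.
    """
    bytes_per_token = params_read_per_token * bits_per_weight / 8
    total_bytes_per_sec = gpus * bandwidth_per_gpu_tbs * 1e12
    return total_bytes_per_sec / bytes_per_token


# All inputs below are hypothetical placeholders, not figures from the article.
PARAMS_READ = 40e9  # weights touched per token (assumption)
for name, bits in (("FP16", 16), ("FP8", 8), ("FP4", 4)):
    bound = decode_tps_upper_bound(PARAMS_READ, bits, gpus=8,
                                   bandwidth_per_gpu_tbs=3.0)
    print(f"{name}: at most {bound:.0f} tokens/s")
```

Whatever the real numbers are, the bound scales inversely with bits per weight, so moving from 16-bit to 4-bit weights raises the ceiling by a factor of four on the same hardware.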