HeadlinesBriefing.com

Xiaomi slashes MiMo‑V2.5 API fees up to 99%

Hacker News

Xiaomi announced a permanent price cut for its MiMo‑V2.5 series API, reducing fees by as much as 99% and removing input‑length tiers. The change takes effect at midnight Beijing time on May 27, 2026, and applies worldwide at the same moment. Developers can now use the same models in production environments for a fraction of the previous cost.
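To put the headline figure in concrete terms, here is a minimal sketch of the arithmetic behind a percentage price cut. The bill amount is a hypothetical placeholder, not a published Xiaomi price, and the 99% figure is the announced maximum, so actual savings may be smaller for a given model.

```python
from decimal import Decimal


def discounted_cost(old_cost: Decimal, cut_percent: Decimal) -> Decimal:
    """Return the cost remaining after a price cut of cut_percent percent."""
    if not Decimal(0) <= cut_percent <= Decimal(100):
        raise ValueError("cut_percent must be between 0 and 100")
    return old_cost * (Decimal(100) - cut_percent) / Decimal(100)


# Hypothetical monthly API bill, for illustration only.
old_monthly_bill = Decimal("500.00")

# Assumes the maximum announced cut of 99% applies to the whole bill.
new_monthly_bill = discounted_cost(old_monthly_bill, Decimal("99"))
print(new_monthly_bill)
```

Decimal arithmetic is used here instead of floats so that billing amounts do not pick up binary rounding error.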

Alongside the price cut, Xiaomi overhauled its TokenPlan billing. Usage limits rise five to eight times at no extra charge, and all active quotas reset at the same midnight deadline. The new rules promise clearer, “what you see is what you get” billing, eliminating hidden fees and aligning costs with actual token consumption.

The Quadrillion Token Creator Incentive, launched April 28, distributed its full 100 trillion tokens ahead of schedule and ended on May 26. Participants, including Apache Software Foundation members, keep their long‑term benefits. Users whose TokenPlan has expired will receive surprise gifts, to be announced next week, as part of Xiaomi’s effort to retain developers across its ecosystem.

Behind the pricing shift, Xiaomi’s engineering team refined the inference stack with Sliding Window Attention on SGLang HiCache, cutting KV‑cache traffic to roughly one‑seventh of its previous level and increasing the number of cacheable tokens fivefold. Parallelism tweaks and input‑length bucketing further raise throughput while trimming per‑token costs, an efficiency gain that could spur broader AI infrastructure adoption.
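For readers unfamiliar with Sliding Window Attention, the sketch below illustrates the general idea of why it shrinks the KV cache: each token attends only to a fixed number of recent tokens, so older keys and values need not be kept. This is a generic illustration, not Xiaomi's or SGLang's implementation. The function names and the window size are hypothetical, and the sketch does not attempt to reproduce the one‑seventh or fivefold figures reported above.

```python
from typing import List, Optional


def kv_cache_entries(seq_len: int, window: Optional[int] = None) -> int:
    """Token positions whose keys/values one attention layer must retain.

    With full attention (window=None) every past token is kept.
    With a sliding window, only the most recent `window` tokens are kept.
    """
    if seq_len < 0 or (window is not None and window <= 0):
        raise ValueError("seq_len must be >= 0 and window must be > 0")
    if window is None:
        return seq_len
    return min(seq_len, window)


def sliding_window_mask(seq_len: int, window: int) -> List[List[bool]]:
    """Causal mask: query i may attend to key j only if i - window < j <= i."""
    return [
        [(i - window < j <= i) for j in range(seq_len)]
        for i in range(seq_len)
    ]


# Hypothetical sizes, chosen only for illustration.
full = kv_cache_entries(32768)
windowed = kv_cache_entries(32768, window=4096)
print(full, windowed)

# With a window of 2, the last of 5 tokens sees only itself and its predecessor.
print(sliding_window_mask(5, 2)[4])
```

The saving grows with input length: a short prompt that fits inside the window caches the same number of entries either way, while a long prompt caches only the window.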