HeadlinesBriefing.com

Day‑0 Open‑Source Stack Enables DeepSeek‑V4 Inference and RL

Hacker News

The open‑source community got Day‑0 support for DeepSeek‑V4, a 1.6‑trillion‑parameter model with a 284‑billion‑parameter Flash variant. Developers can now run inference and reinforcement‑learning (RL) training using the newly released SGLang and Miles stack, which includes custom kernels, prefix caching and speculative decoding tuned for the model's hybrid sparse attention and manifold‑constrained hyper‑connections (mHC), and supports both Nvidia and AMD accelerators.

Key optimizations sustain throughput even near the model's 1M‑token context limit. ShadowRadix introduces a radix‑tree index that unifies three KV pools, allowing independent lifetimes for sliding‑window and compressed slots, so a 10k‑token request stores only 128 live tokens plus shared compressed KV. HiSparse offloads inactive C4 cache to CPU memory, tripling long‑context throughput on Hopper and Blackwell GPUs on real‑world workloads.
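The KV accounting described above can be sketched in a few lines. Note this is an illustrative model, not the actual ShadowRadix implementation: the function name, the 128‑token window and the compression ratio are assumptions used to show why a 10k‑token request keeps only a small live footprint.

```python
# Illustrative sketch of sliding-window vs. compressed KV accounting.
# The names, 128-token window, and 16:1 compression ratio are assumptions,
# not the real SGLang/Miles API.

WINDOW = 128          # live sliding-window slots per request
COMPRESS_RATIO = 16   # hypothetical tokens folded into one compressed slot

def kv_slots(prompt_tokens: int) -> tuple[int, int]:
    """Return (live_slots, compressed_slots) for one request."""
    live = min(prompt_tokens, WINDOW)
    older = max(prompt_tokens - WINDOW, 0)
    # Older tokens live only in the shared compressed pool, so their
    # lifetime is independent of the sliding window's -- the point of
    # giving each pool its own lifetime in the radix-tree index.
    compressed = -(-older // COMPRESS_RATIO)  # ceiling division
    return live, compressed

print(kv_slots(10_000))  # (128, 617): 128 live tokens plus compressed KV
```

Under these assumptions, the 10k‑token request from the article pins only 128 live KV slots; everything older is shared, compressed state that can be evicted or reused on its own schedule.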

Training pipelines exploit full parallelism across data, tensor, pipeline and expert dimensions, while FP4 expert weights enable efficient MoE inference on the latest Blackwell silicon. Benchmarks show decode speed dropping less than 10 % from 4K to 900K tokens, with B200 GPUs moving from 199 to 180 tokens per second. The stack delivers production‑ready serving and RL for FP4‑enabled models today.
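The quoted figures are internally consistent: the B200 drop from 199 to 180 tokens per second works out to roughly 9.5 %, which is indeed under the 10 % headline. A quick check:

```python
# Verify the decode-speed drop quoted for B200 GPUs (199 -> 180 tok/s
# when context grows from 4K to 900K tokens).
short_ctx_tps = 199.0  # tokens/s at a 4K context
long_ctx_tps = 180.0   # tokens/s at a 900K context

drop = (short_ctx_tps - long_ctx_tps) / short_ctx_tps
print(f"{drop:.1%}")  # 9.5%
```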