HeadlinesBriefing.com

DeepSeek-V4: 1M Token Context Models Challenge GPT-5

Hacker News

DeepSeek has released its V4 series of Mixture-of-Experts language models, featuring DeepSeek-V4-Pro with 1.6 trillion parameters (49 billion activated) and DeepSeek-V4-Flash with 284 billion parameters (13 billion activated). Both models support a context length of one million tokens, making them among the most capable open-source options for long-context tasks and positioning DeepSeek as a leader in efficient large language model development.
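The gap between total and activated parameters is the defining property of a Mixture-of-Experts design: a router selects a small subset of experts per token, so only a fraction of the weights participates in any single forward pass. A minimal sketch of what the reported figures imply (the helper function below is illustrative, not DeepSeek's code):

```python
# Illustrative sketch: in an MoE model, "activated" parameters are the
# subset actually used per token, so the active fraction is small.

def active_fraction(total_params_b: float, activated_params_b: float) -> float:
    """Fraction of parameters used for a single token's forward pass."""
    return activated_params_b / total_params_b

# Figures reported for the V4 series (billions of parameters):
pro = active_fraction(1600, 49)    # DeepSeek-V4-Pro: 1.6T total, 49B active
flash = active_fraction(284, 13)   # DeepSeek-V4-Flash: 284B total, 13B active

print(f"V4-Pro activates ~{pro:.1%} of its parameters per token")
print(f"V4-Flash activates ~{flash:.1%} of its parameters per token")
```

By this arithmetic, V4-Pro touches roughly 3% of its weights per token, which is how a 1.6-trillion-parameter model can keep per-token compute closer to that of a ~49B dense model.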

The V4 series introduces a hybrid attention mechanism combining Compressed Sparse Attention and Heavily Compressed Attention. In the 1M-token context setting, DeepSeek-V4-Pro requires only 27% of the single-token inference FLOPs and 10% of the KV cache of DeepSeek-V3.2, a substantial efficiency gain. The models also incorporate Manifold-Constrained Hyper-Connections to stabilize signal propagation and use the Muon optimizer for faster convergence.
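A back-of-envelope calculation shows why a 10x KV-cache reduction matters at a 1M-token context. The per-token cache size below is an assumed baseline for illustration, not a published figure for either model; only the 27% / 10% ratios come from the announcement:

```python
# Back-of-envelope sketch of the reported savings at a 1M-token context.
# ASSUMED_KV_BYTES_PER_TOKEN is a hypothetical V3.2-like baseline.

CONTEXT_TOKENS = 1_000_000
ASSUMED_KV_BYTES_PER_TOKEN = 70 * 1024   # illustrative assumption

baseline_kv_gb = CONTEXT_TOKENS * ASSUMED_KV_BYTES_PER_TOKEN / 1024**3
v4_kv_gb = baseline_kv_gb * 0.10   # reported: 10% of V3.2's KV cache
flops_ratio = 0.27                 # reported: 27% of V3.2's per-token FLOPs

print(f"Assumed baseline KV cache at 1M tokens: ~{baseline_kv_gb:.1f} GiB")
print(f"Implied V4-Pro KV cache at 1M tokens:   ~{v4_kv_gb:.1f} GiB")
print(f"Per-token inference FLOPs:               {flops_ratio:.0%} of V3.2")
```

Under that assumed baseline, a cache that would need roughly 67 GiB of accelerator memory shrinks to under 7 GiB, which is what makes million-token inference practical on commodity serving hardware.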

DeepSeek-V4-Pro-Max achieves top-tier performance in coding benchmarks and narrows the gap with leading closed-source models on reasoning and agentic tasks. The models were pre-trained on over 32 trillion diverse, high-quality tokens and underwent a comprehensive post-training pipeline. They support three reasoning effort modes: Non-think for fast responses, Think High for complex problem-solving, and Think Max for maximum reasoning capability.
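In practice, a caller would pick one of the three reasoning modes per request. The sketch below is hypothetical: the payload shape, the `reasoning_effort` field, and the model id are illustrative assumptions, not DeepSeek's published API; only the three mode names come from the announcement:

```python
# Hypothetical request builder; field names and model id are illustrative,
# not DeepSeek's documented API.

REASONING_MODES = {
    "non-think": "fast responses with no extended reasoning",
    "think-high": "complex problem-solving",
    "think-max": "maximum reasoning capability",
}

def build_request(prompt: str, mode: str = "non-think") -> dict:
    """Assemble a chat-style request payload with a reasoning-effort field."""
    if mode not in REASONING_MODES:
        raise ValueError(f"unknown mode {mode!r}; choose from {sorted(REASONING_MODES)}")
    return {
        "model": "deepseek-v4-pro",      # illustrative model id
        "reasoning_effort": mode,
        "messages": [{"role": "user", "content": prompt}],
    }

req = build_request("Prove that sqrt(2) is irrational.", mode="think-high")
print(req["reasoning_effort"])
```

The design mirrors the trade-off the modes describe: a latency-sensitive caller defaults to the fast mode, while harder tasks opt in to more reasoning per request.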