HeadlinesBriefing.com

DeepSeek V4 Pro and Flash: Record-Size Open‑Weight Models at Record‑Low Prices

Hacker News

Chinese AI lab DeepSeek has released the first two preview models in its V4 series, DeepSeek‑V4‑Pro and DeepSeek‑V4‑Flash. Both are Mixture‑of‑Experts models with a one‑million‑token context window. Pro has 1.6 trillion total parameters with 49 billion active per token, while Flash has 284 billion total and 13 billion active. Both ship under the MIT license.
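Because these are Mixture‑of‑Experts models, only a small slice of the total parameters is used for any given token. A quick sketch of that ratio, using only the figures quoted above (the function name is illustrative):

```python
# Active-parameter share per token, from the totals quoted above.
def active_share(total_billions: float, active_billions: float) -> float:
    """Fraction of parameters active for a single token."""
    return active_billions / total_billions

pro_share = active_share(1600, 49)    # Pro: 1.6T total, 49B active
flash_share = active_share(284, 13)   # Flash: 284B total, 13B active

print(f"Pro:   {pro_share:.1%} of parameters active")    # ~3.1%
print(f"Flash: {flash_share:.1%} of parameters active")  # ~4.6%
```

So despite Pro being roughly 5.6x larger in total, both models activate only a few percent of their weights per token, which is what keeps per‑token compute manageable.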

DeepSeek‑V4‑Pro is now the largest open‑weight model on the market, surpassing Kimi K2.6's 1.1 trillion parameters and GLM‑5.1's 754 billion. At 865 GB on Hugging Face, Pro also dwarfs its predecessor V3.2's 685 GB. Flash, at 160 GB, offers a lightweight alternative that may run on a 128 GB MacBook Pro after light quantization, while retaining the same sparse Mixture‑of‑Experts design for efficient inference.
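Whether Flash's 284 billion parameters fit under 128 GB depends on how aggressively the weights are quantized. A back‑of‑envelope estimate, counting weights only (it ignores KV cache and runtime overhead, and the bit widths are illustrative, not DeepSeek's release formats):

```python
# Rough weight footprint for a 284B-parameter model at various bit widths.
# Weights only; KV cache and runtime overhead are not counted.
def weights_gb(params_billions: float, bits_per_param: float) -> float:
    """Approximate weight storage in GB."""
    return params_billions * bits_per_param / 8  # 1e9 params * bits / 8 / 1e9 bytes

for bits in (8, 4, 3):
    print(f"{bits}-bit: ~{weights_gb(284, bits):.0f} GB")
# 8-bit: ~284 GB, 4-bit: ~142 GB, 3-bit: ~106 GB
```

By this estimate the weights alone only drop below 128 GB at around 3‑bit precision, so "runs on a 128 GB MacBook Pro" should be read as an optimistic, heavily quantized scenario.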

Pricing reflects DeepSeek's emphasis on efficiency. Flash charges $0.14 per million input tokens and $0.28 per million output tokens, while Pro runs at $1.74 and $3.48 respectively. Against comparable offerings from Google, OpenAI, and Anthropic, Flash is the cheapest small model and Pro the most affordable of the larger frontier models, an appealing combination for developers who need high throughput at low cost.
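To make those per‑million‑token rates concrete, a small cost calculator using only the prices quoted above (the request sizes in the example are hypothetical):

```python
# Per-request cost at the listed rates (USD per million tokens).
PRICES = {
    "flash": {"input": 0.14, "output": 0.28},
    "pro":   {"input": 1.74, "output": 3.48},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request for the given token counts."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1e6

# Example: a 100k-token prompt with a 2k-token reply.
print(f"Flash: ${request_cost('flash', 100_000, 2_000):.4f}")  # $0.0146
print(f"Pro:   ${request_cost('pro', 100_000, 2_000):.4f}")    # $0.1810
```

Even a long‑context request like this costs under two cents on Flash and under twenty cents on Pro, which is the "high throughput at low cost" argument in numbers.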

DeepSeek cites, relative to V3.2, a 27% reduction in per‑token FLOPs for Pro and a 10% cut in KV‑cache size at the 1M‑token setting; Flash's reductions are smaller, at 10% and 7% respectively. On benchmarks, V4‑Pro‑Max outperforms GPT‑5.2 and Gemini‑3.0‑Pro but trails GPT‑5.4 by about three months of frontier progress, making it a competitive choice for research and production teams.