HeadlinesBriefing.com

LLM Architecture Gallery: 2024-2025 Model Comparison

Hacker News

This gallery catalogs 33 major language models released in 2024 and 2025 and traces the evolution of transformer architectures. From Llama 3's dense 8B design to DeepSeek V3's 671B mixture-of-experts (MoE) model, the collection shows how attention mechanisms, normalization choices, and parameter scaling have shaped modern AI development.

The timeline shows distinct architectural waves: 2024 models like OLMo 2 experimented with post-norm layouts, and late 2024 saw DeepSeek V3 popularize the dense-prefix MoE pattern, in which the first few layers stay dense before MoE layers take over. In 2025, specialization followed, with Gemma 3's aggressive local attention, GPT-OSS's alternating sliding-window approach, and Kimi Linear's hybrid linear-attention design.
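To make the local versus global attention distinction concrete, here is a minimal sketch of a sliding-window causal mask and an alternating per-layer schedule. The function names, the window size, and the one-in-two ratio are illustrative assumptions, not the actual configurations of Gemma 3 or GPT-OSS.

```python
def causal_mask(seq_len, window=None):
    """Build a seq_len x seq_len boolean mask.

    True at [i][j] means query position i may attend to key position j.
    window=None gives full (global) causal attention; an integer limits
    each query to the most recent `window` positions, itself included.
    """
    return [
        [j <= i and (window is None or i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


def layer_windows(num_layers, window, global_every):
    """Hypothetical schedule: every `global_every`-th layer is global
    (None), and all other layers use a local sliding window."""
    return [
        None if (layer + 1) % global_every == 0 else window
        for layer in range(num_layers)
    ]


if __name__ == "__main__":
    # Visualize a local mask: 'x' = visible key, '.' = masked key.
    for row in causal_mask(5, window=2):
        print("".join("x" if ok else "." for ok in row))
    # Alternate local and global layers (assumed ratio, for illustration).
    print(layer_windows(6, window=2, global_every=2))
```

A larger `global_every` value would correspond to a more aggressively local design, since fewer layers see the full context.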

Notable innovations include MiniMax M2's per-layer QK-Norm, Qwen3 Next's DeltaNet hybrid, and DeepSeek V3.2's sparse attention for long-context efficiency. The gallery particularly highlights how MoE sparsity keeps inference costs practical as models grow: DeepSeek V3 activates 37B of its 671B parameters per token, while Qwen3 Next activates only 3B of 80B.
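The active-parameter comparison can be checked with a few lines of arithmetic. This is a rough sketch: the 80B total for Qwen3 Next comes from the model's published name (Qwen3-Next-80B-A3B) and should be treated as an assumption here, and the helper name `active_fraction` is made up for illustration.

```python
def active_fraction(active_b, total_b):
    """Share of parameters used per token, given counts in billions."""
    return active_b / total_b


# (active, total) in billions of parameters.
# Qwen3 Next's 80B total is an assumption taken from the model's name.
MODELS = {
    "DeepSeek V3": (37, 671),
    "Qwen3 Next": (3, 80),
}

if __name__ == "__main__":
    for name, (active, total) in MODELS.items():
        share = active_fraction(active, total)
        print(f"{name}: {active}B of {total}B active ({share:.1%})")
```

Under these figures, both models touch only a small share of their weights per token, roughly one part in eighteen for DeepSeek V3 and one part in twenty-seven for Qwen3 Next.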