HeadlinesBriefing.com

Unsloth Dynamic 2.0 GGUFs: Major Upgrade for LLM Quantization

Source: Hacker News
Unsloth has released Dynamic 2.0 GGUFs, a significant upgrade to its quantization method that outperforms leading techniques on 5-shot MMLU and KL-divergence benchmarks. The new approach lets developers run and fine-tune quantized LLMs while preserving accuracy, and it works across all model types, MoE and non-MoE architectures alike. Rather than applying one fixed scheme, the system now adjusts the quantization for every layer based on the specific model.
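The per-layer idea can be sketched in a few lines. The toy below is illustrative only, not Unsloth's actual algorithm: for each layer it picks the smallest bit-width whose round-trip quantization error stays under an error budget. The layer names, weights, and budget are all hypothetical.

```python
# Illustrative sketch of per-layer quantization-type selection (NOT Unsloth's
# real method): choose, per layer, the lowest bit-width meeting an error budget.

def quantize_dequantize(weights, bits):
    """Symmetric round-to-nearest quantization to `bits`, then dequantize."""
    levels = 2 ** (bits - 1) - 1                 # e.g. 7 levels for 4-bit signed
    scale = max(abs(w) for w in weights) / levels or 1.0
    return [round(w / scale) * scale for w in weights]

def mean_sq_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def choose_bits(weights, budget, candidates=(2, 3, 4, 5, 6, 8)):
    """Return the smallest candidate bit-width whose MSE fits the budget."""
    for bits in candidates:
        if mean_sq_error(weights, quantize_dequantize(weights, bits)) <= budget:
            return bits
    return candidates[-1]

layers = {  # hypothetical weights for two layers
    "attn.q_proj": [0.12, -0.5, 0.33, 0.9, -0.77],
    "mlp.down_proj": [0.01, 0.02, -0.015, 0.005, 0.0],
}
plan = {name: choose_bits(w, budget=1e-4) for name, w in layers.items()}
print(plan)
```

Note how the near-zero `mlp.down_proj` layer gets away with far fewer bits than `attn.q_proj`, which is the intuition behind varying the quantization type per layer instead of using one scheme everywhere.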

A key innovation is the revamped layer-selection process, which uses a custom calibration dataset of over 1.5M tokens to enhance conversational performance. The team also developed a new efficiency metric that balances MMLU accuracy against disk space. This addresses a common pitfall: random guessing already achieves 25% on MMLU's multiple-choice questions, so raw accuracy overstates what a model actually knows. Accounting for that floor yields a more meaningful comparison between quantized and full-precision models.
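The idea behind such a metric can be sketched as below. This is a hedged reconstruction from the description above, not Unsloth's published formula: subtract the 25% random-guess floor from the MMLU score, then divide by disk size, so a quant that merely guesses scores zero efficiency. The disk sizes in the usage lines are illustrative placeholders, not measured values.

```python
# Sketch of an accuracy-per-disk-space efficiency metric (assumed form,
# based on the description; the exact formula Unsloth uses may differ).

def efficiency(mmlu_5shot_pct, disk_gb, random_floor=25.0):
    """MMLU points above the random-guess baseline, per GB of disk."""
    return (mmlu_5shot_pct - random_floor) / disk_gb

# Hypothetical sizes: a much smaller quant can win on efficiency even
# with slightly lower raw accuracy.
full_precision = efficiency(67.15, 54.0)   # illustrative bf16 footprint
quantized = efficiency(67.07, 17.0)        # illustrative quantized footprint
```

Under this framing, a model that only matches the 25% random baseline gets an efficiency of zero regardless of size, which is exactly the degenerate case the metric is meant to screen out.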

For Gemma 3 specifically, Dynamic 2.0 achieves 67.07% on 5-shot MMLU versus 67.15% for full bfloat16, while being 2GB smaller than the QAT versions. The team also resolved critical bugs in Llama 4's RoPE scaling and QK Norm implementation, improving MMLU Pro accuracy from 68.58% to 71.53%. All future GGUF uploads will use Dynamic 2.0, and Unsloth's Dynamic 4-bit safetensor quants are set to benefit from these improvements as well.