
Microsoft's bitnet.cpp Boosts 1-bit LLM Inference by up to 6x

Hacker News

Microsoft has released bitnet.cpp, the official inference framework for 1-bit large language models like BitNet b1.58. The framework delivers 1.37x to 6.17x speedups on CPUs compared to traditional inference methods, with ARM CPUs seeing gains from 1.37x to 5.07x and x86 CPUs achieving 2.37x to 6.17x improvements. Energy consumption drops by 55.4% to 82.2% across different architectures.
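For context on what "1-bit" means here: BitNet b1.58 stores each weight as one of three values, -1, 0, or +1 (roughly 1.58 bits per weight), together with a single per-tensor scale. The sketch below illustrates the absmean quantization scheme described in the BitNet b1.58 paper; it is a simplified illustration only, not bitnet.cpp's actual packed weight format.

```python
# Hedged sketch of absmean ternary quantization as described for BitNet b1.58:
# every weight maps to {-1, 0, +1} plus one per-tensor scale.
# Illustrative only; bitnet.cpp's on-disk formats pack weights differently.
import numpy as np

def absmean_ternarize(W: np.ndarray, eps: float = 1e-8):
    """Quantize a float weight matrix to ternary values with an absmean scale."""
    gamma = np.abs(W).mean()                           # per-tensor scale
    W_q = np.clip(np.round(W / (gamma + eps)), -1, 1)  # values in {-1, 0, +1}
    return W_q.astype(np.int8), gamma

W = np.random.randn(4, 8).astype(np.float32)
W_q, gamma = absmean_ternarize(W)
W_approx = W_q * gamma          # dequantized approximation of the original weights
print(W_q)                      # entries are only -1, 0, or +1
print(gamma)                    # single float scale for the whole tensor
```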

Built on llama.cpp and optimized with parallel kernel implementations, bitnet.cpp supports fast and lossless inference of ternary models. The framework can run a 100B-parameter BitNet model on a single CPU at 5-7 tokens per second, a rate comparable to human reading speed. Recent optimizations add configurable tiling and embedding quantization support, providing an additional 1.15x to 2.1x speedup over the original implementation.
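Much of the CPU speedup follows from the arithmetic that ternary weights allow. Below is a minimal, non-authoritative sketch of the idea: because each weight is -1, 0, or +1, a matrix-vector product reduces to additions and subtractions of activations. bitnet.cpp's real kernels work on packed 2-bit weights with SIMD and lookup tables, which this toy loop does not attempt to reproduce.

```python
# Why ternary weights help on CPUs: with weights in {-1, 0, +1}, a
# matrix-vector product needs no per-weight multiplications at all.
import numpy as np

def ternary_matvec(W_q: np.ndarray, gamma: float, x: np.ndarray) -> np.ndarray:
    """Compute y = (W_q * gamma) @ x without multiplying by any weight."""
    out = np.zeros(W_q.shape[0], dtype=np.float32)
    for i in range(W_q.shape[0]):
        row = W_q[i]
        # Add activations where the weight is +1, subtract where it is -1,
        # and skip zeros entirely.
        out[i] = x[row == 1].sum() - x[row == -1].sum()
    return out * gamma  # one scalar multiply per output restores the scale

x = np.random.randn(8).astype(np.float32)
W_q = np.random.choice([-1, 0, 1], size=(4, 8)).astype(np.int8)
gamma = 0.42
# Matches the ordinary floating-point matmul up to rounding error.
print(np.allclose(ternary_matvec(W_q, gamma, x), (W_q * gamma) @ x, atol=1e-5))
```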

Available on GitHub, bitnet.cpp supports multiple model formats and provides installation instructions for Windows, Linux, and macOS. The framework includes a demo running on Apple M2 hardware and ships with benchmarking tools. Microsoft's release aims to democratize access to large language models by enabling efficient local inference without specialized hardware.