HeadlinesBriefing.com

llmfit Tool Finds the Best LLMs for Your Hardware

Hacker News

A new terminal tool called llmfit helps developers find out which large language models will actually run on their hardware. The tool detects system specs, including RAM, CPU cores, and GPU memory, then scores hundreds of models from sources like Hugging Face on quality, speed, and fit. It supports multi-GPU setups, MoE architectures, and dynamic quantization selection.
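To make the scoring idea concrete, the fit half of it boils down to comparing a model's estimated memory footprint against detected capacity. Below is a minimal, hypothetical Rust sketch of that check; the struct shapes, the 0.56 bytes-per-weight figure, and the 10% overhead margin are illustrative assumptions, not llmfit's actual internals.

    // Hypothetical hardware-fit check; not llmfit's real code.
    struct Hardware {
        ram_gb: f64,
        vram_gb: f64, // total across all GPUs in a multi-GPU setup
    }

    struct Model {
        params_b: f64,         // parameter count, in billions
        bytes_per_weight: f64, // set by the quantization level
    }

    /// Rough weight-only memory estimate in GB (ignores KV cache).
    fn weight_mem_gb(m: &Model) -> f64 {
        m.params_b * m.bytes_per_weight
    }

    /// True if the weights fit in VRAM or RAM with a 10% overhead margin.
    fn fits(hw: &Hardware, m: &Model) -> bool {
        weight_mem_gb(m) * 1.1 <= hw.vram_gb.max(hw.ram_gb)
    }

    fn main() {
        let hw = Hardware { ram_gb: 32.0, vram_gb: 24.0 };
        let m = Model { params_b: 70.0, bytes_per_weight: 0.56 }; // ~Q4-class
        println!("fits: {}", fits(&hw, &m)); // fits: false (~43 GB needed)
    }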

llmfit ships with both an interactive TUI and a classic CLI mode, and it automatically detects acceleration backends like CUDA, Metal, and ROCm for more accurate speed estimates. The tool also includes a plan mode that works in reverse: instead of finding what fits, it estimates the hardware required to run a specific model. Users can manually override GPU memory when autodetection fails, and cap context lengths to bound memory estimates.
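Capping the context length matters because the KV cache grows linearly with it and can dominate the memory estimate at long contexts. The sketch below uses the standard back-of-the-envelope formula; the layer, head, and dimension values are Llama-3-8B-like assumptions, not numbers taken from llmfit.

    // KV cache bytes = 2 (K and V) * layers * kv_heads * head_dim
    //                  * context_len * bytes_per_element.
    fn kv_cache_gb(layers: u64, kv_heads: u64, head_dim: u64,
                   context_len: u64, bytes_per_elem: u64) -> f64 {
        let bytes = 2 * layers * kv_heads * head_dim * context_len * bytes_per_elem;
        bytes as f64 / 1e9
    }

    fn main() {
        // Llama-3-8B-like shape with an FP16 cache: 32 layers, 8 KV heads,
        // head dim 128. Capping 128k context to 8k cuts ~16.8 GB to ~1.1 GB.
        let full = kv_cache_gb(32, 8, 128, 128_000, 2);
        let capped = kv_cache_gb(32, 8, 128, 8_192, 2);
        println!("128k: {full:.2} GB, 8k capped: {capped:.2} GB");
    }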

Installation is straightforward via a curl script or Homebrew on macOS/Linux, with Windows support through Cargo. The tool's database covers hundreds of models, with memory requirements computed across the quantization hierarchy from Q8_0 down to Q2_K (a rough sense of that range is sketched below). MoE models like Mixtral get special handling, since only a subset of experts activates per token. The project also includes sympozium, for managing agents in Kubernetes, making llmfit part of a broader ecosystem for LLM deployment optimization.
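To make the quantization hierarchy concrete, the sketch below tabulates weight memory for a Mixtral-sized model across common GGUF quant levels. The bytes-per-weight values are ballpark figures for those formats, and the note on active experts reflects the general MoE trade-off (all ~46.7B parameters stay resident, while only ~12.9B are active per token); none of this is llmfit's actual code.

    // Illustrative bytes-per-weight for common GGUF quantization levels.
    const QUANTS: &[(&str, f64)] = &[
        ("Q8_0", 1.06), ("Q6_K", 0.82), ("Q5_K_M", 0.71),
        ("Q4_K_M", 0.61), ("Q3_K_M", 0.49), ("Q2_K", 0.34),
    ];

    /// Weight memory in GB at a given bytes-per-weight.
    fn weight_gb(params_b: f64, bpw: f64) -> f64 {
        params_b * bpw
    }

    fn main() {
        // Mixtral-8x7B-like MoE: all ~46.7B params must be resident, but
        // only ~12.9B (2 of 8 experts) are active per token, which mostly
        // affects speed estimates rather than memory.
        let total_b = 46.7;
        for (name, bpw) in QUANTS {
            println!("{name}: ~{:.1} GB", weight_gb(total_b, *bpw));
        }
    }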