HeadlinesBriefing.com

whichllm Ranks Local LLMs by Real Performance, Not Just Size

Hacker News

GitHub user Andyyyy64 released whichllm, a command-line tool that identifies the best local language models for your specific hardware configuration. Unlike traditional model selectors that simply match parameter counts to VRAM, whichllm evaluates actual performance across multiple benchmarks, including LiveBench, Artificial Analysis, and Chatbot Arena Elo scores.

The tool automatically detects NVIDIA, AMD, and Apple Silicon GPUs along with CPU specifications, then ranks HuggingFace models by a composite score factoring benchmark quality, quantization penalties, and evidence confidence. This solves a critical problem: many users waste time downloading models that technically fit their hardware but perform poorly in practice.
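The composite-score idea can be sketched as a weighted combination. This is purely illustrative: the weights, the quantization penalty table, and the field names below are assumptions, not whichllm's actual formula.

```python
# Illustrative composite scoring -- NOT whichllm's real formula.
# Penalty values, weights, and field names are assumptions.

QUANT_PENALTY = {"fp16": 0.0, "q8": 0.02, "q5": 0.06, "q4": 0.10}

def composite_score(benchmark: float, quant: str, confidence: float) -> float:
    """Combine a 0-100 benchmark score with a quantization penalty
    and an evidence-confidence weight in [0, 1]."""
    penalized = benchmark * (1.0 - QUANT_PENALTY.get(quant, 0.15))
    return penalized * confidence

models = [
    {"name": "model-a", "benchmark": 72.0, "quant": "q4", "confidence": 0.90},
    {"name": "model-b", "benchmark": 70.0, "quant": "q8", "confidence": 0.95},
]

# A lower raw benchmark can still win once quantization loss and
# evidence confidence are factored in.
ranked = sorted(
    models,
    key=lambda m: composite_score(m["benchmark"], m["quant"], m["confidence"]),
    reverse=True,
)
```

Note how the heavily quantized model-a loses its lead despite the higher raw benchmark score, which is exactly the kind of mismatch a VRAM-only selector would miss.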

Each recommendation includes real-world tokens-per-second estimates and accounts for architectural differences such as MoE active parameters versus total parameters. The scoring system rejects fabricated claims and cross-family score inheritance (assuming one model family's benchmark results apply to another), so users get honest performance predictions rather than marketing-driven rankings.
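The active-versus-total distinction matters because memory and speed scale differently. A rough back-of-the-envelope sketch (the constants and function names are illustrative assumptions, not whichllm's estimator):

```python
# Illustrative MoE sizing sketch -- constants and names are assumptions.

def vram_needed_gb(total_params_b: float, bytes_per_weight: float,
                   overhead: float = 1.2) -> float:
    """VRAM is driven by TOTAL parameters: every expert must be resident,
    scaled by bytes per weight for the quantization, plus ~20% overhead
    for KV cache and activations."""
    return total_params_b * bytes_per_weight * overhead

def rough_tokens_per_sec(active_params_b: float, mem_bandwidth_gbps: float,
                         bytes_per_weight: float) -> float:
    """Decode speed is roughly memory-bandwidth-bound on the ACTIVE
    parameters read per token: bandwidth / bytes moved per token."""
    return mem_bandwidth_gbps / (active_params_b * bytes_per_weight)

# Hypothetical MoE: 47B total / 13B active params at 4-bit (~0.5 bytes/weight)
# on a GPU with ~400 GB/s memory bandwidth.
vram = vram_needed_gb(47.0, 0.5)            # sized by total params
tps = rough_tokens_per_sec(13.0, 400.0, 0.5)  # sped by active params
```

Under these assumptions, a dense 47B model and a 47B-total MoE need similar VRAM, but the MoE decodes several times faster because only the active experts are read per token.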

whichllm offers a single-command interface and can launch instant chat sessions with downloaded models via isolated environments. The project emphasizes scriptability with JSON output and GPU simulation capabilities for hardware planning, making it valuable for developers evaluating local AI deployment options.
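JSON output means recommendations can be filtered in downstream scripts. A minimal consumption sketch, with the caveat that the JSON schema below is invented for illustration and whichllm's real output fields may differ:

```python
import json

# Hypothetical output shape -- whichllm's actual schema may differ.
raw = """[
  {"model": "model-a", "tokens_per_sec": 42.5, "score": 61.0},
  {"model": "model-b", "tokens_per_sec": 12.0, "score": 70.0}
]"""

recs = json.loads(raw)

# Keep only models expected to decode at 20+ tokens/sec, best score first.
fast = sorted(
    (r for r in recs if r["tokens_per_sec"] >= 20.0),
    key=lambda r: r["score"],
    reverse=True,
)
```

This kind of post-filtering is what machine-readable output enables: a deploy script can enforce a minimum interactive speed regardless of how the tool itself ranks the candidates.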