HeadlinesBriefing favicon HeadlinesBriefing.com

Local AI Inference with Gemma 4 and LM Studio CLI

Hacker News •
×

Google's Gemma 4 26B model now runs locally on modest hardware through LM Studio's new headless CLI, offering developers zero API costs and privacy without cloud dependencies. The mixture-of-experts architecture activates only 4B parameters per forward pass, enabling performance on laptops with 48GB memory at 51 tokens per second.

LM Studio v0.4.0 introduces llmster, a standalone inference engine that operates entirely from the command line. This allows running models on headless servers, in CI/CD pipelines, or via SSH sessions—perfect for developers who prefer terminal workflows over GUI interfaces.

Practical implementation shows the 26B-A4B variant achieves performance competitive with 400B+ parameter models while consuming just 17.99GB of memory. With 256K context window, vision support, and native function calling, this MoE architecture delivers exceptional capability for local inference without requiring high-end GPU clusters.