HeadlinesBriefing.com

Google's Gemma 4 12B runs on laptops with 16GB RAM

Ars Technica

Google unveiled Gemma 4 12B, a 12-billion-parameter LLM that can run on a laptop with 16GB of RAM. Despite being roughly half the size of its 26-billion-parameter sibling, it handles multistep reasoning and agentic workflows previously reserved for larger variants. The release targets developers who want high-end AI without relying on the cloud.

A new Multi-Token Prediction (MTP) drafter uses otherwise idle compute cycles to predict several future tokens at once, which speeds up generation. Google also stripped away heavyweight encoders: image inputs are embedded through a single matrix plus positional information, while raw audio is mapped directly into the same vector space as text tokens. The result is lower latency and memory use across all modalities.
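How the MTP drafter plugs into generation is not spelled out here. A common pattern for this kind of drafter is a draft-and-verify loop (often called speculative decoding), and the sketch below assumes that pattern. The two "models" are toy functions invented for illustration, and all names (`target_next`, `draft_next`, `draft_and_verify`) are hypothetical rather than part of any Gemma API.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]


def target_next(ctx: List[Token]) -> Token:
    """Toy stand-in for the large model: counts upward modulo 10."""
    return (ctx[-1] + 1) % 10


def draft_next(ctx: List[Token]) -> Token:
    """Toy stand-in for the drafter: agrees with the target except after a 4."""
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


def greedy_decode(prefix: List[Token], next_token: NextToken, max_new: int) -> List[Token]:
    """Baseline: one model call per generated token."""
    out = list(prefix)
    for _ in range(max_new):
        out.append(next_token(out))
    return out


def draft_and_verify(
    prefix: List[Token],
    target: NextToken,
    drafter: NextToken,
    k: int,
    max_new: int,
) -> Tuple[List[Token], int]:
    """Greedy draft-and-verify: returns the tokens and the number of verify rounds."""
    out = list(prefix)
    verify_rounds = 0
    while len(out) - len(prefix) < max_new:
        # 1. The cheap drafter proposes k tokens ahead.
        draft: List[Token] = []
        ctx = list(out)
        for _ in range(k):
            tok = drafter(ctx)
            draft.append(tok)
            ctx.append(tok)

        # 2. The target checks the proposals. A real system would score all k
        #    positions in one batched forward pass; this toy loop only mimics
        #    that, so we count it as a single round.
        verify_rounds += 1
        accepted: List[Token] = []
        ctx = list(out)
        for tok in draft:
            expected = target(ctx)
            if expected != tok:
                # First mismatch: keep the target's token, discard the rest.
                accepted.append(expected)
                break
            accepted.append(tok)
            ctx.append(tok)
        else:
            # Every draft token matched, so the target's next token comes free.
            accepted.append(target(ctx))
        out.extend(accepted)
    return out[: len(prefix) + max_new], verify_rounds


if __name__ == "__main__":
    tokens, rounds = draft_and_verify([0], target_next, draft_next, k=3, max_new=8)
    print(tokens, rounds)
```

In this toy setup the output is identical to plain greedy decoding with the target, but it takes fewer verification rounds than generated tokens. That is the source of the speed-up: a wrong guess from the drafter costs some wasted work, never a wrong token.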

The model can be tried immediately in LM Studio, Google AI Edge Gallery and similar front-ends, and the 18GB weight file can be downloaded from Kaggle and Hugging Face. Because it fits on consumer hardware, Gemma 4 12B broadens access to sophisticated generative AI and puts pressure on cloud-centric providers to reconsider pricing and performance trade-offs.
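The figures quoted here (12 billion parameters, an 18GB weight file, 16GB of RAM) can be related with some back-of-the-envelope arithmetic. The sketch below counts weights only and assumes decimal gigabytes. The bit widths in the loop are illustrative assumptions, since the precision of the released weights is not stated.

```python
def weights_gb(num_params: int, bits_per_param: float) -> float:
    """Size of the weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


def implied_bits_per_param(file_gb: float, num_params: int) -> float:
    """Average bits per parameter implied by a weight file of the given size."""
    return file_gb * 1e9 * 8 / num_params


NUM_PARAMS = 12_000_000_000  # "12-billion-parameter" from the article
FILE_GB = 18                 # weight file size from the article
RAM_GB = 16                  # laptop RAM from the article

if __name__ == "__main__":
    for bits in (16, 8, 4):  # assumed precisions, for comparison only
        size = weights_gb(NUM_PARAMS, bits)
        verdict = "fits in" if size < RAM_GB else "exceeds"
        print(f"{bits}-bit weights: {size:.0f} GB ({verdict} {RAM_GB} GB RAM)")
    avg_bits = implied_bits_per_param(FILE_GB, NUM_PARAMS)
    print(f"{FILE_GB} GB file implies about {avg_bits:.0f} bits per parameter")
```

At 16 bits per parameter the weights alone would take 24GB, at 8 bits 12GB and at 4 bits 6GB. The 18GB file works out to about 12 bits per parameter on average. How that file size squares with the 16GB RAM figure is not explained here. One plausible reading is that local front-ends load a lower-precision variant, but that is an assumption, and the estimate leaves out runtime overhead such as the KV cache and activations.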

Local deployment lowers data-privacy risks and cuts operating costs for enterprises experimenting with AI assistants, code generators, or image-analysis pipelines. Competitors such as Meta and Microsoft have released similar edge-optimized models, but Google's combination of MTP and a streamlined multimodal stack gives Gemma 4 12B a performance edge on modest hardware. The move signals a shift toward on-device intelligence.