HeadlinesBriefing favicon HeadlinesBriefing.com

Ollama Gemma 4 26B Setup Guide for Apple Silicon Mac Mini

Hacker News •
×

Ollama users can now run Gemma 4 26B on Apple Silicon Mac minis with 24GB RAM via a streamlined setup process. Apple's MLX backend enables GPU acceleration without manual configuration, leveraging M1/M2/M3/M4/M5 chips for optimized performance. The model loads in ~17GB storage and uses ~20GB memory when active, requiring users to close memory-heavy apps to avoid system strain.

Step-by-step automation ensures the model stays preloaded and responsive. A custom launch agent keeps Gemma 4 warm with periodic prompts, while the OLLAMA_KEEP_ALIVE=-1 environment variable prevents unloading after inactivity. Users can verify GPU usage via `ollama ps`, which typically shows 86% GPU utilization during inference.

Local API access at `http://localhost:11434` supports OpenAI-compatible coding agents, enabling seamless integration with tools like Claude Code. Caching improvements reduce memory usage across conversations, while intelligent checkpoints accelerate responses by preserving shared prefixes.

Why this matters: This setup democratizes access to advanced AI models for developers using compact Apple hardware. Google DeepMind's Gemma 4 26B now offers near-production performance on consumer-grade machines, accelerating prototyping for AI-driven applications. The MLX framework's efficiency on Apple Silicon sets a benchmark for on-device large language model execution.