HeadlinesBriefing.com

Running a Local AI on a 24‑GB MacBook Pro

Hacker News

Running a local language model on a 24‑GB MacBook Pro has moved from experiment to practical tool. The author settles on Qwen 3.5‑9B Q4, served through LM Studio, reaching about 40 tokens per second with a 128K context window. No internet connection is required, cutting reliance on big-tech APIs, and the setup leaves enough memory headroom for Electron apps and other everyday workflow tools.

Choosing a runner (Ollama, llama.cpp, or LM Studio) brings distinct quirks, while model selection hinges on memory fit and context size. Earlier trials with Qwen 3.6 Q3, GPT‑OSS 20B, and Devstral 24B all failed on the 24 GB machine, leaving Gemma 4B as the only viable option until Qwen 3.5‑9B proved stable. The author tunes temperature to 0.6, top_p to 0.95, and top_k to 20 while keeping the 128K context window.
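As a concrete illustration (not drawn from the article), LM Studio serves models through an OpenAI-compatible endpoint, by default http://localhost:1234/v1/chat/completions, so those sampling settings map directly onto a request body. In the sketch below, the model identifier is a placeholder for whatever name LM Studio reports, and top_k is an LM Studio extension rather than a standard OpenAI field:

```json
{
  "model": "qwen3.5-9b",
  "messages": [
    { "role": "user", "content": "Rewrite this Elixir test to avoid a flaky timeout." }
  ],
  "temperature": 0.6,
  "top_p": 0.95,
  "top_k": 20,
  "max_tokens": 2048,
  "stream": false
}
```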

The author integrates the model into the Pi and OpenCode harnesses. In Pi, a JSON snippet enables the Qwen model and hides its distracting thinking blocks; OpenCode’s config exposes tool use and a 131,072-token context length. Interaction remains step-by-step: the model can suggest edits, resolve git conflicts, or rewrite Elixir tests, but it needs close guidance to produce consistently accurate results.
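For a sense of what such a harness config can look like, here is a minimal OpenCode-style sketch that registers the local server as a custom OpenAI-compatible provider. The provider key, model identifier, field names, and limits are illustrative assumptions, not the author’s actual snippet:

```json
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "lmstudio": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "LM Studio (local)",
      "options": { "baseURL": "http://localhost:1234/v1" },
      "models": {
        "qwen3.5-9b": {
          "name": "Qwen 3.5-9B Q4",
          "tool_call": true,
          "limit": { "context": 131072, "output": 8192 }
        }
      }
    }
  }
}
```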

While the local setup cannot match a SOTA model’s autonomous problem solving, it offers a tangible trade-off: full control, no network latency, and privacy. For developers seeking a lightweight, offline assistant that still handles coding and debugging tasks, Qwen 3.5‑9B on a MacBook Pro proves a viable alternative to cloud services, delivering consistent performance across typical development workflows and reducing operational costs for small teams.