HeadlinesBriefing.com

Gemma 4 E2B Beats GPT‑3.5 Turbo on MT‑Bench and Runs on a Laptop CPU

Hacker News

Gemma 4 E2B‑it, a 2‑billion‑parameter open‑weights model, scored ~8.0 on MT‑Bench, on par with GPT‑3.5 Turbo’s 7.94, while running on a 4‑core laptop CPU. MT‑Bench poses 80 open‑ended, multi‑turn questions across eight categories, and a judge model rates each answer from 1 to 10. The result suggests that a 4 GB download can rival a paid API whose model is commonly cited at 175 billion parameters.
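To make the ~8.0 figure concrete, here is a minimal sketch of how an MT‑Bench style score is usually aggregated: the judge's 1 to 10 ratings are averaged over both turns of every question. The function name `mt_bench_score` and the sample ratings are hypothetical, and whether the team used exactly this aggregation is an assumption.

```python
def mt_bench_score(ratings: list[tuple[float, float]]) -> float:
    """Average judge ratings over every turn of every question.

    `ratings` holds one (turn_1, turn_2) pair per question.
    Hypothetical helper; it only illustrates the usual MT-Bench averaging.
    """
    turns = [score for pair in ratings for score in pair]
    if not turns:
        raise ValueError("no ratings supplied")
    if any(not 1 <= score <= 10 for score in turns):
        raise ValueError("judge ratings must lie between 1 and 10")
    return sum(turns) / len(turns)


# Made-up ratings for three questions (a full run would have 80 pairs).
sample = [(9.0, 7.0), (8.0, 8.0), (10.0, 6.0)]
print(mt_bench_score(sample))
```

Because the score is a plain mean over 160 rated turns, a gap such as 8.0 versus 7.94 corresponds to only a handful of turns being rated one point higher.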

The team identified seven reproducible failure classes, including arithmetic slips, logic missteps, constraint drift, persona breaks, and ignored qualifiers, each fixable with about 60 lines of Python. Adding these guardrails lifted the score to ~8.2, surpassing GPT‑3.5 Turbo on key question types.
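As an illustration of what such a guardrail might look like, the sketch below targets the arithmetic‑slip class: it scans a model response for simple claims of the form `a op b = c`, recomputes them exactly, and rewrites wrong results. This is not the team's code; the function name `fix_arithmetic`, the regular expression, and the skip rules are all assumptions chosen for the example.

```python
import re
from fractions import Fraction

# Hypothetical pattern: matches plain claims such as "17 * 23 = 381".
_NUM = r"-?\d+(?:\.\d+)?"
_CLAIM = re.compile(
    rf"(?<![\w.])({_NUM})\s*([+\-*/])\s*({_NUM})\s*=\s*({_NUM})(?!\w|\.\d)"
)
_OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
}


def _fmt(value: Fraction) -> str:
    """Render whole numbers without a decimal point, others as short decimals."""
    if value.denominator == 1:
        return str(value.numerator)
    return f"{float(value):.6g}"


def fix_arithmetic(text: str) -> tuple[str, int]:
    """Return the text with wrong 'a op b = c' claims corrected, plus a fix count."""
    fixes = 0

    def repl(m: re.Match) -> str:
        nonlocal fixes
        a, op, b, claimed = m.groups()
        # Be conservative: skip tails of longer expressions such as "2 + 3 * 4 = 14".
        if m.string[: m.start()].rstrip().endswith(tuple("+-*/(")):
            return m.group(0)
        if op == "/" and Fraction(b) == 0:
            return m.group(0)
        actual = _OPS[op](Fraction(a), Fraction(b))
        # Accept rounded decimals: "1 / 3 = 0.333" counts as correct.
        decimals = len(claimed.split(".")[1]) if "." in claimed else 0
        expected = round(actual, decimals) if decimals else actual
        if expected == Fraction(claimed):
            return m.group(0)
        fixes += 1
        return f"{a} {op} {b} = {_fmt(actual)}"

    return _CLAIM.sub(repl, text), fixes


print(fix_arithmetic("So 17 * 23 = 381."))
```

The sketch deliberately leaves anything ambiguous untouched, on the assumption that a guardrail that silently rewrites correct output would do more harm than one that misses an occasional slip.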

Running Gemma locally eliminates GPU costs and vendor lock‑in: there is no API key and no monthly fee, and the model stays pinned to the version you downloaded. Latency runs 30–60 seconds per response, but the trade‑off is full data ownership and zero recurring expense.

Open weights, a public transcript ("tape") of every turn, and a free Telegram bot let developers audit and extend the system. The project's argument is that focused engineering, rather than scale alone, can overcome compute limits.