HeadlinesBriefing.com

TurboQuant lets Gemma 4 E2B draw diagrams in the browser

Hacker News

TurboQuant lets Gemma 4 E2B produce Excalidraw drawings entirely in the browser. Instead of the usual 5,000-token JSON, the LLM emits a code string of about 50 tokens, which Chrome 134 or later renders as a diagram on the fly.
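To illustrate why a compact code string saves so many tokens, here is a minimal TypeScript sketch that expands a short instruction string into verbose shape objects. The notation (`box:id:label`, `arrow:from:to`), the `Shape` type, and `expandDiagramCode` are all hypothetical: the summary does not describe the demo's actual format, and the objects below are a simplified stand-in, not the real Excalidraw schema.

```typescript
// Hypothetical compact notation, NOT the demo's real format:
//   "box:<id>:<label>" and "arrow:<fromId>:<toId>", separated by ";".
type Shape =
  | { type: "rectangle"; id: string; x: number; y: number; width: number; height: number; label: string }
  | { type: "arrow"; id: string; startId: string; endId: string };

function expandDiagramCode(code: string): Shape[] {
  const shapes: Shape[] = [];
  let boxCount = 0;
  for (const raw of code.split(";")) {
    const part = raw.trim();
    if (part === "") continue; // tolerate trailing separators
    const [kind, a, b] = part.split(":");
    if (kind === "box" && a && b) {
      // Lay boxes out left to right; the model never has to emit coordinates.
      shapes.push({ type: "rectangle", id: a, x: boxCount * 200, y: 0, width: 140, height: 60, label: b });
      boxCount += 1;
    } else if (kind === "arrow" && a && b) {
      shapes.push({ type: "arrow", id: `${a}->${b}`, startId: a, endId: b });
    } else {
      throw new Error(`Unrecognized diagram instruction: ${part}`);
    }
  }
  return shapes;
}

const compact = "box:a:Client;box:b:Server;arrow:a:b";
const expanded = expandDiagramCode(compact);
// The expanded JSON is several times longer than the compact input.
console.log(compact.length, JSON.stringify(expanded).length);
```

The design point is that layout defaults (positions, sizes) live in the page's code rather than in the model's output, so the model only has to emit what varies between diagrams.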

The core TurboQuant algorithm combines polar coordinates with QJL compression, shrinking the KV cache by roughly 2.4×. The smaller cache lets longer conversations fit into GPU memory. The feature has two requirements: WebGPU subgroups, which leaves Safari and iOS unsupported, and about 3 GB of RAM, more than most mobile browsers can provide.
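The two constraints in this paragraph can be sketched in TypeScript: a check for the WebGPU `subgroups` feature, and the arithmetic behind "longer conversations fit". The helper names, the 512 MiB budget, and the 64 KiB per-token cache cost are illustrative assumptions, not figures from the demo or from Gemma 4 E2B; only the 2.4× ratio comes from the text.

```typescript
// Minimal structural type so the sketch compiles without @webgpu/types.
interface AdapterLike {
  features: { has(name: string): boolean };
}

// True when the adapter advertises the WebGPU "subgroups" feature.
function adapterSupportsSubgroups(adapter: AdapterLike | null): boolean {
  return adapter !== null && adapter.features.has("subgroups");
}

// Browser-side wrapper; resolves to false where WebGPU is unavailable.
async function browserSupportsSubgroups(): Promise<boolean> {
  const gpu = (globalThis as any).navigator?.gpu;
  if (!gpu) return false;
  return adapterSupportsSubgroups(await gpu.requestAdapter());
}

// How many tokens of KV cache fit in a fixed memory budget.
// compressionRatio = 1 means an uncompressed cache.
function maxCachedTokens(budgetBytes: number, bytesPerToken: number, compressionRatio = 1): number {
  return Math.floor((budgetBytes * compressionRatio) / bytesPerToken);
}

// Assumed numbers for illustration only.
const budget = 512 * 1024 * 1024; // 512 MiB reserved for the KV cache
const perToken = 64 * 1024;       // 64 KiB of cache per token, uncompressed
console.log(maxCachedTokens(budget, perToken));      // 8192
console.log(maxCachedTokens(budget, perToken, 2.4)); // 19660
```

Under these assumed numbers, the same memory budget holds about 2.4 times as many tokens, which is the sense in which compression extends the usable conversation length.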

A WGSL compute-shader reimplementation runs on the GPU at 30+ tokens per second, while a sibling npm package, turboquant-wasm, offers the same logic in WASM with SIMD for CPU-side vector search. The result is a lightweight, browser-based tool that lets developers generate complex diagrams without leaving the page.