HeadlinesBriefing.com

General Instinct shrinks frontier model for edge AI

Hacker News

Guanming and Bill of General Instinct have announced an effort to shrink frontier‑scale models for robotics. After years of running into the same wall, namely that datacenter‑centric models demand massive GPUs, high bandwidth and constant connectivity, they asked how much of a state‑of‑the‑art model could survive on edge hardware. Their answer is a toolchain aimed at real‑world machines such as autonomous drones and industrial arms.

The team released InstinctRazor on GitHub, demonstrating that the 245 GB BF16 mixture‑of‑experts model Qwen3.5-122B-A10B can be compressed to a 48 GiB GGUF file. The shrunken model is smaller than Gemma‑4‑26B‑A4B while beating it on the MMLU‑Pro and GPQA‑D benchmarks. The pipeline preserves always‑active components such as the router and vision pathway, aggressively quantizes the routed experts, then applies on‑policy distillation to restore lost capability. The result is also said to reduce inference latency substantially.
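To put the reported sizes in perspective, the short calculation below works out the compression ratio and the average number of bits stored per weight. It is only a back-of-the-envelope sketch: the parameter count of 122 billion is an assumption read off the model name, the helper `compression_summary` is hypothetical, and the figure ignores GGUF metadata and the fact that different tensors are quantized to different widths.

```python
GB = 10**9    # decimal gigabyte, as used for the BF16 checkpoint
GIB = 2**30   # binary gibibyte, as used for the GGUF file


def compression_summary(orig_bytes, compressed_bytes, n_params):
    """Return (size ratio, average bits per weight) for a compressed model."""
    ratio = orig_bytes / compressed_bytes
    bits_per_weight = compressed_bytes * 8 / n_params
    return ratio, bits_per_weight


orig = 245 * GB     # reported BF16 size
comp = 48 * GIB     # reported GGUF size
n_params = 122e9    # assumption: taken from "122B" in the model name

ratio, bpw = compression_summary(orig, comp, n_params)
print(f"{ratio:.2f}x smaller, ~{bpw:.2f} bits per weight on average")
# prints "4.75x smaller, ~3.38 bits per weight on average"
```

An average of roughly 3.4 bits per weight is consistent with the description above: most parameters sit in routed experts that are quantized hard, while a small always-active share stays at higher precision.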

A “small GPU” mode streams experts from system RAM, keeping peak VRAM under 8 GB even with an 8k context window. The open‑source stack lets developers evaluate frontier‑level performance without datacenter resources and supports batch inference on modest CPUs. The authors invite roboticists and edge‑AI engineers to share which models they run locally and the biggest production bottlenecks they face.
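The article does not describe how InstinctRazor implements expert streaming, so the following is only a generic illustration of the idea: keep every expert in host memory and hold a small, least-recently-used set on the GPU. The class `ExpertCache`, its fields, and the string stand-ins for weight tensors are all hypothetical; a real system would copy tensors between host and device and would likely prefetch.

```python
from collections import OrderedDict


class ExpertCache:
    """LRU cache standing in for a small VRAM budget (illustrative only)."""

    def __init__(self, host_experts, capacity):
        self.host_experts = host_experts  # expert_id -> weights kept in system RAM
        self.capacity = capacity          # max experts resident on the "GPU" at once
        self.resident = OrderedDict()     # expert_id -> weights currently "on GPU"
        self.loads = 0                    # count of simulated host-to-device copies

    def get(self, expert_id):
        if expert_id in self.resident:
            self.resident.move_to_end(expert_id)  # mark as most recently used
            return self.resident[expert_id]
        if len(self.resident) >= self.capacity:
            self.resident.popitem(last=False)     # evict the least recently used
        # Stand-in for copying the expert's weights from RAM to VRAM.
        self.resident[expert_id] = self.host_experts[expert_id]
        self.loads += 1
        return self.resident[expert_id]


# Eight experts live in "system RAM"; only two fit on the "GPU" at a time.
host = {i: f"weights-{i}" for i in range(8)}
cache = ExpertCache(host, capacity=2)
for routed in [0, 1, 0, 2, 1]:  # expert ids chosen by a hypothetical router
    cache.get(routed)
print(cache.loads, list(cache.resident))
```

Peak "VRAM" use here is bounded by `capacity` regardless of how many experts the model has, at the cost of a transfer on every cache miss. That trade-off is the general reason a streaming mode can fit a large mixture-of-experts model on a small GPU.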