HeadlinesBriefing.com

Moonshine STT Models Beat Whisper Large v3 on OpenASR Leaderboard

Hacker News

A six-person startup with a monthly GPU budget under $100k has released Moonshine, a family of open-weights speech-to-text models that outperform OpenAI's Whisper Large v3 on the Hugging Face OpenASR leaderboard. The team trained the models from scratch to address specific limitations they ran into while building real-time voice applications. Moonshine supports streaming, posts lower word-error rates than Whisper's largest model, and is competitive with Nvidia's Parakeet family.

Whisper works on a fixed 30-second input window. Moonshine models instead process audio incrementally as it arrives, which reduces latency for live applications. The models also cache earlier computations, so ongoing speech is not reprocessed from scratch each time new audio comes in. Moonshine also shows stronger multilingual performance, addressing Whisper's poor support for many languages. The framework runs on-device on iOS, Android, Linux, and Raspberry Pi, among other platforms, with models ranging from the 26 MB Tiny model to the 245-million-parameter Medium model.
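The difference between re-processing a growing buffer and caching earlier work can be sketched in Python. This is not Moonshine's API or architecture: `encode_chunk`, `NaiveTranscriber`, and `CachingTranscriber` are hypothetical names, and the toy encoder assumes each chunk's result depends only on audio already received, which is the property that makes caching safe in a streaming setting.

```python
from typing import Callable, List

Chunk = List[float]


def encode_chunk(chunk: Chunk) -> float:
    """Hypothetical stand-in for an encoder: reduces one audio chunk to one feature."""
    return sum(chunk) / len(chunk)


class NaiveTranscriber:
    """Re-encodes every chunk received so far each time new audio arrives."""

    def __init__(self, encoder: Callable[[Chunk], float]) -> None:
        self.encoder = encoder
        self.buffer: List[Chunk] = []
        self.encoder_calls = 0

    def push(self, chunk: Chunk) -> List[float]:
        self.buffer.append(chunk)
        features = []
        for past_chunk in self.buffer:  # older chunks are encoded again every time
            features.append(self.encoder(past_chunk))
            self.encoder_calls += 1
        return features


class CachingTranscriber:
    """Encodes only the newest chunk and reuses cached features for earlier ones."""

    def __init__(self, encoder: Callable[[Chunk], float]) -> None:
        self.encoder = encoder
        self.cache: List[float] = []
        self.encoder_calls = 0

    def push(self, chunk: Chunk) -> List[float]:
        self.cache.append(self.encoder(chunk))  # one encoder call per new chunk
        self.encoder_calls += 1
        return list(self.cache)


if __name__ == "__main__":
    stream = [[0.1 * i, 0.2 * i] for i in range(10)]  # ten fake audio chunks
    naive = NaiveTranscriber(encode_chunk)
    cached = CachingTranscriber(encode_chunk)
    for chunk in stream:
        naive_features = naive.push(chunk)
        cached_features = cached.push(chunk)
    assert naive_features == cached_features
    print(naive.encoder_calls, cached.encoder_calls)  # prints: 55 10
```

Both versions produce the same features, but the naive one makes 1 + 2 + ... + 10 = 55 encoder calls for ten chunks while the caching one makes 10, so the naive approach's redundant work grows quadratically with the length of the stream.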

The company provides high-level APIs for common tasks such as transcription, speaker identification, and command recognition. On the reported benchmarks, Moonshine Medium Streaming reaches 6.65% WER versus 7.44% for Whisper Large v3, and the startup pitches its technology to developers building responsive voice interfaces that need low latency and strong accuracy across multiple languages.
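Word error rate (WER) counts the substitutions, deletions, and insertions needed to turn a model's transcript into the reference, divided by the number of reference words. A minimal Python sketch of that definition, plus the arithmetic behind the 6.65% versus 7.44% comparison, is below. `word_error_rate` and `relative_reduction` are illustrative helpers, not part of Moonshine or the leaderboard's tooling, and unlike real benchmark harnesses the sketch does no text normalization (casing, punctuation) before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = minimum word edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,  # deletion
                dist[i][j - 1] + 1,  # insertion
                dist[i - 1][j - 1] + substitution,  # match or substitution
            )
    return dist[len(ref)][len(hyp)] / len(ref)


def relative_reduction(baseline: float, new: float) -> float:
    """Fraction by which `new` lowers the error rate relative to `baseline`."""
    return (baseline - new) / baseline


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))  # prints 0.3333333333333333
    # Reported WERs: Whisper Large v3 7.44%, Moonshine Medium Streaming 6.65%.
    print(round(relative_reduction(7.44, 6.65), 3))  # prints 0.106
```

By this arithmetic, the reported gap is 0.79 percentage points in absolute terms, or roughly a 10.6% relative reduction in word errors.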