HeadlinesBriefing.com

Rust Voxtral Mini 4B speech recognition now runs in browsers

Hacker News: Front Page

A Rust implementation of Mistral's Voxtral Mini 4B realtime speech recognition model now runs entirely inside the web browser. The client-side build compiles to WASM and uses WebGPU for compute. To stay within browser constraints such as the 2GB per-allocation cap and the 4GB address space, the project uses sharded loading and two-phase model loading. Q4 quantization brings the model to about 2.5GB, though a padding workaround was needed to prevent decoder errors on microphone recordings. The Burn ML framework powers the native backend, and patches to cubecl-wgpu keep kernels within WebGPU's workgroup limits.
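To make the sharded-loading idea concrete, here is a minimal sketch of a reader that presents several smaller byte buffers as one continuous stream, so no single allocation has to hold the whole weights file. The name `ShardedReader` and its layout are illustrative assumptions, not the project's actual code.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Hypothetical reader (not the project's API) that exposes several
/// byte buffers as a single seekable stream. Each shard can stay below
/// a per-allocation cap while callers still see one logical file.
pub struct ShardedReader {
    shards: Vec<Vec<u8>>,
    /// Absolute start offset of each shard (prefix sums of shard lengths).
    starts: Vec<u64>,
    total: u64,
    pos: u64,
}

impl ShardedReader {
    pub fn new(shards: Vec<Vec<u8>>) -> Self {
        let mut starts = Vec::with_capacity(shards.len());
        let mut total = 0u64;
        for shard in &shards {
            starts.push(total);
            total += shard.len() as u64;
        }
        Self { shards, starts, total, pos: 0 }
    }
}

impl Read for ShardedReader {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let mut written = 0;
        while written < buf.len() && self.pos < self.total {
            // Last shard whose start offset is <= the current position.
            let idx = self.starts.partition_point(|&s| s <= self.pos) - 1;
            let shard = &self.shards[idx];
            let off = (self.pos - self.starts[idx]) as usize;
            // Copy up to the end of this shard, then continue in the next one.
            let n = (shard.len() - off).min(buf.len() - written);
            buf[written..written + n].copy_from_slice(&shard[off..off + n]);
            written += n;
            self.pos += n as u64;
        }
        Ok(written)
    }
}

impl Seek for ShardedReader {
    fn seek(&mut self, from: SeekFrom) -> io::Result<u64> {
        // Compute the target in i128 so the signed offsets cannot overflow.
        let target: i128 = match from {
            SeekFrom::Start(o) => o as i128,
            SeekFrom::End(d) => self.total as i128 + d as i128,
            SeekFrom::Current(d) => self.pos as i128 + d as i128,
        };
        self.pos = u64::try_from(target)
            .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "seek out of range"))?;
        Ok(self.pos)
    }
}
```

Because the type implements `Read` and `Seek`, a weights parser written against those traits would not need to know the data is split. A read that starts in one shard and ends in another is stitched together inside `read`.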

The pipeline converts audio into a mel spectrogram, passes it through a causal encoder, and applies an adapter layer before autoregressive decoding. A native CLI and a browser demo are available, with benchmarks forthcoming. The project is a notable step toward running multi-billion-parameter models directly in the browser without server dependencies.
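The final stage, autoregressive decoding, can be illustrated with a generic greedy loop: at each step the decoder scores every vocabulary token given what has been produced so far, and the highest-scoring token is appended. This is a sketch under stated assumptions. `greedy_decode` and the `next_logits` closure are hypothetical stand-ins, and the real model also conditions on the adapted audio features and may use a different sampling strategy.

```rust
/// Greedy autoregressive decoding over a placeholder model.
/// `next_logits` is a hypothetical stand-in for the decoder: given the
/// tokens so far, it returns one score per vocabulary entry.
pub fn greedy_decode<F>(mut next_logits: F, prompt: &[u32], eos: u32, max_new: usize) -> Vec<u32>
where
    F: FnMut(&[u32]) -> Vec<f32>,
{
    let mut tokens = prompt.to_vec();
    for _ in 0..max_new {
        let logits = next_logits(&tokens);
        // Argmax over the vocabulary; stop if the model returned no scores.
        let Some((best, _)) = logits
            .iter()
            .enumerate()
            .max_by(|a, b| a.1.total_cmp(b.1))
        else {
            break;
        };
        let token = best as u32;
        if token == eos {
            break; // End-of-sequence: the transcript is complete.
        }
        tokens.push(token);
    }
    // Return only the newly generated tokens, without the prompt.
    tokens[prompt.len()..].to_vec()
}
```

In a streaming recognizer the loop would typically run as new audio frames arrive rather than over a fixed prompt, but the per-step structure (score, pick, append, check for end-of-sequence) is the same.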