HeadlinesBriefing.com

erm CLI Tool Removes Disfluencies From Speech Audio Automatically

Hacker News

Whisper-powered CLI tool erm automatically removes ums, uhs, and erms from voice recordings. The utility processes audio files locally, outputting cleaned .wav files alongside JSON cut lists for precise editing. Unlike manual cleanup, it handles the tedious work of detecting and excising disfluencies without sending recordings to cloud services.

The naive approach fails for three reasons: Whisper omits many fillers from its transcripts, cuts made at arbitrary points create audible clicks, and mismatched background hiss produces a perceptible shift at each splice. The creator built erm to address all three. To catch fillers the transcript misses, detection runs in multiple passes: word-level analysis, gap fillers, embedded fillers, and abnormally long words. It also uses faster-whisper for better performance than vanilla Whisper.
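As a rough illustration of how several detection passes can run over word-level timestamps, here is a minimal sketch. It is not erm's actual code: the `Word` type, the filler vocabulary, and both thresholds are assumptions made for this example, and the embedded-filler pass is left out. Gaps and overlong words are only candidates for a closer look, not confirmed fillers.

```python
from dataclasses import dataclass

# Assumed filler vocabulary for this sketch; erm's real list is not given in the source.
FILLERS = {"um", "uh", "erm", "er"}


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


def find_candidates(words, min_gap=0.35, max_word_len=0.9):
    """Return (start, end, reason) spans that may contain a filler.

    min_gap and max_word_len are illustrative thresholds, not erm's values.
    """
    candidates = []
    for i, w in enumerate(words):
        token = w.text.strip().lower().strip(".,!?")
        if token in FILLERS:
            # Pass 1: the transcript itself names a filler.
            candidates.append((w.start, w.end, "word"))
        elif w.end - w.start > max_word_len:
            # Pass 2: an unusually long word may hide a filler inside it.
            candidates.append((w.start, w.end, "long_word"))
        if i + 1 < len(words):
            gap = words[i + 1].start - w.end
            if gap > min_gap:
                # Pass 3: a silent-looking gap may hold a filler Whisper dropped.
                candidates.append((w.end, words[i + 1].start, "gap"))
    return candidates
```

In a real pipeline each candidate span would still need to be confirmed, for example by inspecting the audio inside it, before anything is cut.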

Cut points slide up to 60 ms to find quiet moments, then snap to zero-crossings to eliminate waveform steps. Crossfades scale with the length of the cut rather than using a fixed duration. Looped room tone masks background mismatches across splices. Four denoising modes balance detection accuracy against clean output, with hybrid mode as the sensible default.
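The cut-placement steps can be sketched in plain Python as follows. This is an illustration under stated assumptions, not erm's implementation: only the 60 ms slide limit comes from the description above, while the 10 ms energy window, the crossfade fraction, and the clamp values are hypothetical. Samples are assumed to be a list of floats for one mono channel.

```python
def quietest_offset(samples, idx, max_shift, window):
    """Index within +/- max_shift of idx whose following window has the least energy."""
    best_idx, best_energy = idx, float("inf")
    lo = max(0, idx - max_shift)
    hi = min(len(samples) - window, idx + max_shift)
    for i in range(lo, hi + 1):
        energy = sum(s * s for s in samples[i:i + window])
        if energy < best_energy:
            best_idx, best_energy = i, energy
    return best_idx


def snap_to_zero_crossing(samples, idx, max_search):
    """Nearest index to idx where the signal changes sign or touches zero."""
    for d in range(max_search + 1):
        for i in (idx - d, idx + d):
            if 1 <= i < len(samples) and samples[i - 1] * samples[i] <= 0:
                return i
    return idx  # no crossing nearby; keep the original position


def crossfade_len(cut_len, fraction=0.1, min_len=16, max_len=480):
    """Crossfade length in samples, proportional to the cut and clamped (assumed values)."""
    return max(min_len, min(max_len, int(cut_len * fraction)))


def refine_cut(samples, idx, sample_rate):
    """Slide a cut point to a quiet spot (up to 60 ms), then snap to a zero-crossing."""
    max_shift = round(0.060 * sample_rate)  # 60 ms limit from the article
    window = round(0.010 * sample_rate)     # assumed 10 ms energy window
    quiet = quietest_offset(samples, idx, max_shift, window)
    return snap_to_zero_crossing(samples, quiet, window)
```

Cutting at a zero-crossing means both sides of the splice meet near zero amplitude, which is what avoids the step in the waveform that would otherwise be heard as a click.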

The validate subcommand confirms output integrity through three checks: the file opens, the reduction in duration matches the cuts, and no fillers reappear when the output is re-transcribed. The tool intentionally preserves discourse markers like 'like' and 'you know', since these serve a linguistic function beyond simple disfluency.
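The duration check is simple arithmetic: the cleaned file should be shorter than the original by the total length of the cuts. The sketch below assumes a cut list shaped as a JSON array of objects with `start` and `end` in seconds. That layout and the tolerance value are hypothetical, since the source does not describe erm's JSON schema.

```python
import json


def expected_duration(original_s, cuts):
    """Original duration minus the total length of all cut spans (seconds)."""
    removed = sum(c["end"] - c["start"] for c in cuts)
    return original_s - removed


def duration_matches(original_s, cleaned_s, cuts, tolerance_s=0.05):
    """True if the cleaned duration is within tolerance of the expected value.

    tolerance_s is an assumed allowance for crossfade overlap and rounding.
    """
    return abs(cleaned_s - expected_duration(original_s, cuts)) <= tolerance_s


# Hypothetical cut list; not erm's documented format.
cuts = json.loads('[{"start": 1.0, "end": 1.4}, {"start": 5.2, "end": 5.5}]')
ok = duration_matches(10.0, 9.3, cuts)
```

A small tolerance is useful here because crossfades overlap the audio on either side of a splice, so the output rarely matches the arithmetic to the exact sample.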