HeadlinesBriefing.com

Descript's AI Translation Pipeline Solves Dubbing's Timing Problem

OpenAI Blog

Descript has rebuilt its translation pipeline to solve a fundamental problem in AI dubbing: translated speech often sounds unnatural because different languages take different amounts of time to express the same idea. The video editing platform redesigned its system using OpenAI's reasoning models to optimize for both semantic meaning and natural pacing during generation, not after.

Traditional dubbing workflows forced creators to either manually retime audio or rewrite translations to fit video segments. German, for example, typically requires more syllables than English for the same content, which led to audio that sounded sped up or slowed down. Early systems placed only 40-60% of segments within acceptable pacing windows. Descript's new approach breaks transcripts into chunks, calculates a target syllable count for each chunk from language-specific speaking rates, and prompts the models to optimize for timing and meaning at the same time.
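To make the syllable-budget idea concrete, here is a minimal Python sketch of how a target syllable count could be derived from a chunk's duration and a per-language speaking rate, then written into a translation prompt. It is an illustration under stated assumptions, not Descript's implementation: the `Chunk` type, the function names, the speaking rates, the 15% tolerance, and the prompt wording are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical speaking rates in syllables per second.
# Illustrative values only, not figures published by Descript or OpenAI.
SPEAKING_RATE_SPS = {"en": 4.0, "de": 4.5, "es": 5.0}


@dataclass
class Chunk:
    text: str       # source-language transcript text for this chunk
    start_s: float  # chunk start time in the video, in seconds
    end_s: float    # chunk end time in the video, in seconds


def target_syllables(chunk: Chunk, lang: str, tolerance: float = 0.15):
    """Return (low, target, high) syllable counts for the translated chunk.

    The target is the chunk duration multiplied by the assumed speaking
    rate of the target language; the tolerance widens it into a range.
    """
    duration = chunk.end_s - chunk.start_s
    target = round(duration * SPEAKING_RATE_SPS[lang])
    low = max(1, round(target * (1 - tolerance)))
    high = max(low, round(target * (1 + tolerance)))
    return low, target, high


def build_prompt(chunk: Chunk, lang: str) -> str:
    """Build a prompt that asks for meaning and timing together."""
    low, target, high = target_syllables(chunk, lang)
    return (
        f"Translate the text below into '{lang}'. Preserve the meaning, "
        f"and aim for about {target} syllables (between {low} and {high}) "
        f"so the speech fits the original timing.\n\n"
        f"Text: {chunk.text}"
    )


chunk = Chunk("Welcome back to the channel, everyone.", 0.0, 4.0)
print(target_syllables(chunk, "de"))
print(build_prompt(chunk, "de"))
```

Putting the syllable range in the prompt itself reflects the article's point that timing is handled during generation rather than by retiming the audio afterward.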

The results are measurable: duration adherence improved by 13-43 percentage points across languages, with 73-83% of segments now falling within natural pacing ranges. Semantic fidelity remained strong, with 85.5% of segments rated four or five out of five. The company is now building batch processing capabilities for enterprise localization projects and exploring multimodal approaches that incorporate audio and video signals to preserve tone and emphasis in translations.
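For readers who want to see how a duration-adherence figure and a percentage-point gain could be computed, the following Python sketch scores segments against a pacing window. The window definition (spoken duration within 20% of the target duration), the function names, and the sample data are assumptions made for illustration; the article does not state how Descript defines an acceptable pacing window.

```python
def within_pacing(actual_s: float, target_s: float, tolerance: float = 0.2) -> bool:
    """True if the dubbed duration is within +/- tolerance of the target.

    The 20% default is a hypothetical window, not a published threshold.
    """
    return abs(actual_s - target_s) <= tolerance * target_s


def adherence_rate(segments, tolerance: float = 0.2) -> float:
    """Percentage of (actual_s, target_s) pairs inside the pacing window."""
    if not segments:
        return 0.0
    hits = sum(within_pacing(a, t, tolerance) for a, t in segments)
    return 100.0 * hits / len(segments)


# Made-up (actual, target) durations in seconds for two pipeline versions.
baseline = [(5.0, 4.0), (4.1, 4.0), (2.0, 3.0), (3.1, 3.0)]
improved = [(4.5, 4.0), (4.1, 4.0), (2.0, 3.0), (3.1, 3.0)]

before = adherence_rate(baseline)
after = adherence_rate(improved)

# A percentage-point gain is the plain difference between two percentages.
print(f"before={before:.1f}%  after={after:.1f}%  gain={after - before:.1f} points")
```

The gain is reported in percentage points, a difference between two rates rather than a relative change, which matches how the article reports the 13-43 point improvement.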