HeadlinesBriefing.com

Apple's AI speech breakthrough speeds text-to-speech

9to5Mac

Researchers at Apple and Tel Aviv University have developed a method that speeds up AI-based text-to-speech generation without compromising clarity. The work targets autoregressive speech models, which generate audio tokens one at a time. The researchers found that these models often reject predicted tokens that are acoustically or semantically interchangeable with the expected one, creating a processing bottleneck.

The solution, called Principled Coarse-Graining (PCG), groups similar-sounding speech tokens so that verification can be more flexible. It uses two models: a smaller proposer that drafts tokens and a larger judge that checks whether each token falls within the correct acoustic group. The reported result is a 40% increase in speech generation speed while maintaining a naturalness score of 4.09 and low word error rates.
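The proposer-and-judge check described above can be illustrated with a small toy sketch. This is not Apple's implementation: the token IDs, the `TOKEN_TO_GROUP` mapping, the `toy_judge` function, and the `verify_draft` helper are all hypothetical. The sketch also uses a simple deterministic comparison, whereas a real system would work with model probabilities. It only shows why accepting any token from the right acoustic group lets more of the proposer's draft survive than an exact-match rule.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical acoustic groups (assumption for illustration only):
# tokens 0-2 sound alike, tokens 3-4 sound alike, token 5 stands alone.
TOKEN_TO_GROUP: Dict[int, int] = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 2}


def verify_draft(
    draft: Sequence[int],
    judge_next: Callable[[List[int]], int],
    token_to_group: Dict[int, int],
    coarse: bool,
) -> List[int]:
    """Return the tokens kept after the judge checks the proposer's draft."""
    kept: List[int] = []
    for token in draft:
        # The judge's own choice, given the tokens kept so far.
        expected = judge_next(kept)
        if coarse:
            # Group-level check: any token in the same acoustic group passes.
            ok = token_to_group[token] == token_to_group[expected]
        else:
            # Exact-match check: only the identical token passes.
            ok = token == expected
        if not ok:
            # Fall back to the judge's token and discard the rest of the draft.
            kept.append(expected)
            break
        kept.append(token)
    return kept


# Toy stand-in for the large judge model: it prefers a fixed token per position.
JUDGE_SEQUENCE = [0, 3, 5, 1]


def toy_judge(context: List[int]) -> int:
    return JUDGE_SEQUENCE[len(context)]


draft_tokens = [1, 4, 5, 3]  # toy output of the small proposer
exact = verify_draft(draft_tokens, toy_judge, TOKEN_TO_GROUP, coarse=False)
grouped = verify_draft(draft_tokens, toy_judge, TOKEN_TO_GROUP, coarse=True)
print(exact)    # [0]
print(grouped)  # [1, 4, 5, 1]
```

In this toy run the exact-match rule rejects the very first draft token, so only one token is produced in that round. The group-level rule keeps three draft tokens before the judge has to step in, which is the kind of saving the article attributes to PCG.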

PCG requires minimal additional resources, which makes it practical to deploy on devices with limited memory. The method could improve future voice features in Apple products by balancing speed, quality, and efficiency. As voice assistants and AI-driven speech applications grow in popularity, it strengthens Apple's position in speech synthesis technology.