HeadlinesBriefing.com

Google launches Gemini 3.1 Flash TTS with audio tags

Google DeepMind Blog

Google's DeepMind team unveiled Gemini 3.1 Flash TTS, a text‑to‑speech model that promises more natural, expressive output. The system adds granular audio tags, letting developers steer vocal style, pacing, and delivery with simple inline commands. Available today in preview through the Gemini API, Google AI Studio, Vertex AI, and Google Vids, the model supports over 70 languages, with low‑latency inference for real‑time apps.
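As a rough illustration of what steering delivery with inline commands could look like, the sketch below assembles a prompt string with style cues embedded in the text. The square-bracket syntax and the tag names (`whispering`, `excited`) are assumptions made for this example, not confirmed Gemini syntax, and `Segment` and `render_prompt` are hypothetical helpers; the code only builds the text and makes no API call.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    """One stretch of text plus an optional style cue (hypothetical helper)."""
    text: str
    tag: Optional[str] = None  # assumed tag name, e.g. "whispering"


def render_prompt(segments: List[Segment]) -> str:
    """Join segments into one prompt, prefixing tagged segments with [tag].

    The [tag] bracket form is an assumption for illustration only.
    """
    parts = []
    for seg in segments:
        prefix = f"[{seg.tag}] " if seg.tag else ""
        parts.append(prefix + seg.text.strip())
    return " ".join(parts)


prompt = render_prompt([
    Segment("Welcome back to the show."),
    Segment("You will not believe what happened.", tag="whispering"),
    Segment("Let's get started!", tag="excited"),
])
print(prompt)
```

Keeping the cues as structured data rather than hand-typed markup makes it easy to swap in whatever tag syntax the preview documentation actually specifies.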

On the Artificial Analysis TTS leaderboard, the model recorded an Elo score of 1,211, placing it in the “most attractive quadrant” for high quality and low cost. Multi‑speaker dialogue runs natively, and audio tags enable scene direction and speaker‑level specificity. These voice settings can be exported as Gemini API code for consistent voice branding across projects.
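To put an Elo figure such as 1,211 in context, the sketch below applies the standard Elo expected-score formula, which converts a rating gap into the probability of being preferred in a head-to-head comparison. It assumes the leaderboard's scores follow the conventional 400-point scale, which the article does not state, and the rival rating of 1,111 is a made-up value used only for illustration.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expectation: chance that A is preferred over B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


reported_rating = 1211       # figure quoted in the article
hypothetical_rival = 1111    # invented rating, 100 points lower

win_rate = elo_expected_score(reported_rating, hypothetical_rival)
print(f"Expected preference rate: {win_rate:.2f}")  # prints 0.64
```

Under that assumption, a 100-point lead corresponds to being preferred in roughly 64% of pairwise comparisons, and equal ratings give exactly 50%.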

All generated audio carries an imperceptible SynthID watermark, allowing reliable detection of AI‑created speech and helping curb misinformation. Early testers report that the new controls turn plain text into high‑fidelity performances, opening possibilities for localized, character‑driven experiences at scale. Developers can now prototype immersive audio without sacrificing cost efficiency. The model also integrates with existing Google Cloud pipelines, simplifying deployment for large enterprises.