HeadlinesBriefing.com

Google upgrades Gemini audio for live agents and translation

Google DeepMind Blog

Google DeepMind unveiled an upgraded Gemini 2.5 Flash Native Audio model aimed at live voice agents. The revision sharpens function calling, improves instruction adherence, and retrieves context from earlier turns, delivering smoother multi‑turn dialogs. Developers can access the model through Vertex AI and the Gemini API, while Google AI Studio offers a preview interface. The update is also rolling out to Search Live, letting users brainstorm ideas aloud.

Across Google products, the model now powers Gemini Live, Search Live, and the new live speech‑to‑speech translation beta in the Google Translate app. The translation engine handles more than 70 languages, preserves speaker intonation, and filters ambient noise, enabling two‑way conversations through headphones. Users connect headphones to any Android device and tap ‘Live translate’ to start streaming multilingual dialogue. Early testers report natural‑sounding output and reliable language detection.

Enterprises such as Shopify and United Wholesale Mortgage have already integrated Gemini 2.5 Flash Native Audio for customer‑service bots and loan processing, citing higher satisfaction and faster throughput. With function‑calling accuracy reaching 71.5% on ComplexFuncBench Audio and instruction adherence climbing to 90%, the model positions Google as a leader in hardware‑agnostic conversational AI. Developers can refine prompts using the Gemini API Cookbook, accelerating deployment cycles.