HeadlinesBriefing.com

Google DeepMind Unveils Gemini 2.5 Flash-Lite: A Cost-Efficient AI Model for Scaled Production

Google DeepMind Blog

Gemini 2.5 Flash-Lite, previously in preview, is now stable and available for production deployment. This Google DeepMind model is priced at $0.10 per 1M input tokens and $0.40 per 1M output tokens, making it the lowest-cost option in the Gemini 2.5 family. With a 1 million-token context window and multimodal capabilities, it balances speed, affordability, and quality for latency-sensitive tasks such as translation and classification.
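To make that pricing concrete, here is a small back-of-the-envelope estimator. The rates come from the figures above; the function name and workload numbers are illustrative, not part of any official SDK:

```python
# Illustrative cost estimator using the published Gemini 2.5 Flash-Lite rates.
INPUT_RATE_PER_M = 0.10   # USD per 1M input tokens (from the announcement)
OUTPUT_RATE_PER_M = 0.40  # USD per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for one workload at these rates."""
    return (input_tokens / 1_000_000) * INPUT_RATE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_RATE_PER_M

# e.g. a bulk classification job: 2M tokens in, 0.5M tokens out
print(f"${estimate_cost(2_000_000, 500_000):.2f}")  # → $0.40
```

At these rates, input-heavy workloads (classification over long documents, for instance) stay cheap even when the full 1M-token context window is used.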

Succeeding 2.0 Flash-Lite and 2.0 Flash, the 2.5 iteration posts higher benchmark scores in coding, math, and multimodal tasks. Its 45% latency reduction relative to those baseline models enables real-time applications, such as satellite data processing for Satlyt and multilingual video translation for HeyGen. Audio input pricing has also dropped 40% since the preview, further lowering operational costs.

The model supports advanced features: controllable thinking budgets, native tools such as Code Execution, and Google Search grounding. Deployments such as DocsHound use it to convert long videos into documentation faster, while Evertune leverages its speed for rapid analysis of AI model outputs. These use cases highlight its versatility in AI-driven automation and data synthesis.

Developers can access Gemini 2.5 Flash-Lite via Google AI Studio and Vertex AI. The preview alias will retire on August 25th, consolidating the model under its stable name. For teams prioritizing cost-effective, high-performance AI, this release marks a pivotal step in scalable, multimodal AI adoption.
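The features above can be sketched as a request to the Gemini API's `generateContent` REST endpoint. This is a hypothetical helper, not official client code: the field names (`thinkingConfig.thinkingBudget` for thinking budgets, a `google_search` tool entry for Search grounding) are assumptions based on the public REST API, and the endpoint URL and prompt are illustrative:

```python
# Hypothetical sketch: assembling a generateContent request body for
# Gemini 2.5 Flash-Lite. Field names are assumptions from the public
# Gemini REST API; verify against the official reference before use.
import json

MODEL = "gemini-2.5-flash-lite"  # stable model name per the announcement
ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/"
    f"models/{MODEL}:generateContent"
)

def build_request(prompt: str,
                  thinking_budget: int = 0,
                  ground_with_search: bool = False) -> dict:
    """Build a request body with an explicit thinking budget and,
    optionally, Google Search grounding enabled."""
    body = {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {"thinkingBudget": thinking_budget},
        },
    }
    if ground_with_search:
        body["tools"] = [{"google_search": {}}]
    return body

# Latency-sensitive classification: thinking disabled (budget 0)
payload = json.dumps(build_request("Classify the sentiment: 'shipment delayed'"))
```

Setting the thinking budget to 0 keeps latency low for simple tasks like classification, while a larger budget trades speed for quality on harder problems.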