HeadlinesBriefing.com

Google Launches Gemini 3.1 Flash-Lite for Scalable AI Workloads

Google DeepMind Blog

Gemini 3.1 Flash-Lite is now available in preview, offering developers and enterprises a cost-efficient, high-speed AI model. Priced at $0.25 per 1M input tokens and $1.50 per 1M output tokens, it outperforms its predecessor, 2.5 Flash, with a 2.5× faster time to first answer token and 45% higher output speed, according to Artificial Analysis benchmarks. Designed for high-volume tasks such as translation, content moderation, and UI generation, it balances affordability with performance.
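The pricing above translates directly into per-job cost estimates. A minimal sketch, using only the per-token prices quoted in this article (actual preview pricing and billing granularity may differ):

```python
# Sketch: estimating request cost from the article's quoted preview prices.
# Prices are the article's figures, not confirmed final pricing.
INPUT_PRICE_PER_M = 0.25   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 1.50  # USD per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one workload."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Example: a high-volume translation batch of 10M input / 2M output tokens
print(estimate_cost(10_000_000, 2_000_000))  # 5.5
```

At these rates, output tokens dominate the bill for generation-heavy workloads, which is why the low output price matters for tasks like UI generation.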

Built for scalability, the model integrates thinking levels in AI Studio and Vertex AI, allowing developers to adjust the model’s reasoning depth for specific workflows. This adaptability makes it suitable for both simple, high-frequency operations and complex tasks requiring detailed analysis, such as creating simulations or dashboards. Early adopters like Latitude and Cartwheel praise its precision in handling intricate inputs while maintaining cost efficiency.
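The thinking-level control described above can be sketched as a request-body fragment. This is an assumption-laden illustration: the model identifier and the `thinkingConfig`/`thinkingLevel` field names follow the shape of the existing Gemini API's thinking configuration, not confirmed preview documentation, so the exact schema should be checked against the AI Studio and Vertex AI references.

```json
{
  "contents": [
    { "role": "user", "parts": [{ "text": "Classify this post for moderation." }] }
  ],
  "generationConfig": {
    "thinkingConfig": {
      "thinkingLevel": "low"
    }
  }
}
```

A lower thinking level suits simple, high-frequency operations where latency and cost dominate; a higher level trades speed for deeper reasoning on tasks like building simulations or dashboards.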

Benchmark results highlight its strength: an Elo score of 1432 on Arena.ai and top-tier scores on GPQA Diamond (86.9%) and MMMU Pro (76.8%). These metrics position it as a competitive option against larger models despite its smaller, latency-optimized footprint. Google emphasizes its role in enabling real-time, responsive applications without compromising quality.

With early access rolling out via Google AI Studio and Vertex AI, Gemini 3.1 Flash-Lite addresses a critical gap in scalable AI deployment. Its combination of speed, affordability, and versatility could redefine how developers approach high-volume, real-time workloads. For now, it remains in preview, inviting further testing and feedback from the developer community.