HeadlinesBriefing.com

DeepMind’s Gemini Robotics 1.5 Brings Thinking AI to Physical Robots

Google DeepMind Blog

Google’s DeepMind team unveiled Gemini Robotics 1.5, a vision‑language‑action model that lets robots translate visual cues and spoken instructions into precise motor commands. The system can think through a task before acting, explaining its reasoning in natural language. This capability opens the door to reliable, multi‑step automation in industrial and home settings.

Complementing the action model, Gemini Robotics‑ER 1.5 serves as a high‑level planner that reasons about space, calls digital tools like Google Search, and drafts multi‑step plans. Evaluated across 15 academic benchmarks, it tops state‑of‑the‑art scores on Point‑Bench, RefSpatial, and RoboSpatial‑VQA, demonstrating strong embodied reasoning across diverse robotics scenarios.

Developers can now access Gemini Robotics‑ER 1.5 through the Gemini API in Google AI Studio, while select partners experiment with the full 1.5 stack. By learning motions across different robot embodiments, the system accelerates skill acquisition and reduces the need for retraining when deploying on new hardware.
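As a rough illustration of that developer access, the sketch below builds a standard Gemini API `generateContent` request for a spatial‑reasoning prompt and sends it only if an API key is present in the environment. The preview model ID and the example prompt are assumptions, not confirmed by the article; the endpoint shape follows the public Gemini API conventions.

```python
import json
import os
import urllib.request

# Assumed preview model ID for Gemini Robotics-ER 1.5; the exact name may differ.
MODEL = "gemini-robotics-er-1.5-preview"
URL = f"https://generativelanguage.googleapis.com/v1beta/models/{MODEL}:generateContent"


def build_request(prompt: str) -> dict:
    """Build a generateContent request body for a text-only query."""
    return {"contents": [{"parts": [{"text": prompt}]}]}


body = build_request("List the steps to clear the mugs from the table, in order.")

api_key = os.environ.get("GEMINI_API_KEY")
if api_key:
    # Send the request and print the model's first candidate response.
    req = urllib.request.Request(
        URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
        print(reply["candidates"][0]["content"]["parts"][0]["text"])
else:
    # No key configured: show the request body that would be sent.
    print(json.dumps(body, indent=2))
```

In practice a robotics prompt would also attach camera frames as image parts in the same `contents` list, but the text‑only form above keeps the request shape visible.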

Safety remains a priority: the team released an upgraded ASIMOV benchmark to test semantic safety, adding new video modalities and tail‑coverage questions. Early trials show the models defer to collision‑avoidance subsystems and maintain respectful human dialogue, laying groundwork for responsible deployment in public spaces.