HeadlinesBriefing favicon HeadlinesBriefing.com

Step 3.7 Flash Raises Agent Coding to Claude Opus‑Level Efficiency

Hacker News •
×

Step 3.7 Flash, the latest release from Step, pushes agent efficiency forward. The model combines native multimodal understanding with robust tool orchestration, enabling it to parse product UIs, documents, charts, and natural scenes before writing code or calling APIs. Benchmarks show the system matches larger models while staying within Flash‑tier latency for developers daily tasks.

On coding tests, Step 3.7 Flash lifts performance by 5–6% over its predecessor, reaching 67.08% average on Step‑SWE‑Bench. With Advisor Mode, the model attains 97% of Claude Opus 4.6’s coding score at just $0.19 per task versus $1.76. This cost gap makes high‑quality autonomous code production affordable for enterprise customers and developers at scale today.

Search capabilities receive a boost, too. Step 3.7 Flash scores 47.20% on HLE with tools and 92.82% F1 on DeepSearchQA, matching models with ten times the parameter count. Visual search identifies long‑tail entities that other systems miss, broadening the agent’s knowledge base without enlarging the model for real‑world applications across domains today and future tasks.

Integration costs stay low because the model works natively with mainstream harnesses like Claude Code, Hermes Agent, and OpenClaw. Its balanced performance across diverse tool stacks means developers can drop Step 3.7 Flash into existing workflows without rewiring prompts, ensuring reliable, coherent runs in production environments for scalable automation in today's software ecosystem and maintenance.