HeadlinesBriefing.com

Anthropic Models Struggle With Tool Calling Schemas

Hacker News

Newer Anthropic models, including Opus 4.8 and Sonnet 5, are failing to follow tool-calling schemas that their older siblings handled cleanly. When invoking Pi's edit tool, these frontier models invent extra fields in the nested edits array (keys like requireUnique, oldText2, or even event.0.additionalProperties) even though the edit payloads themselves are byte-correct. The harness rejects the malformed calls, forcing retries that older models never needed.

Tool calls are not magic; they're in-band text signals serialized via ANTML markers that resemble XML but aren't. A valid structure comes either from the model following learned conventions or from grammar-aware constrained decoding applied at sampling time. Pi's multi-edit schema differs from Claude Code's flat file_path, old_string, new_string format, and that mismatch exposes the problem. A fresh single-turn prompt rarely triggers the bug, but agentic histories with file reads and multi-line edits reproduce it around 20 percent of the time.
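To make the failure mode concrete, here is a minimal sketch of how a strict harness might reject an invented key inside a nested edits array. The field names path, oldText, and newText are assumptions for illustration; the article only states that Pi's schema nests an edits array, and this is not Pi's actual validator.

```python
# Hypothetical field names: the article only says the schema has a nested
# `edits` array; `path`, `oldText`, and `newText` are assumed for illustration.
ALLOWED_TOP = {"path", "edits"}
ALLOWED_EDIT = {"oldText", "newText"}


def validate_multi_edit(args: dict) -> list[str]:
    """Return a list of schema violations; an empty list means the call is valid."""
    errors = []
    for key in sorted(set(args) - ALLOWED_TOP):
        errors.append(f"unexpected key: {key}")
    for key in sorted(ALLOWED_TOP - set(args)):
        errors.append(f"missing key: {key}")

    edits = args.get("edits", [])
    if not isinstance(edits, list):
        return errors + ["edits: expected a list"]

    for i, edit in enumerate(edits):
        if not isinstance(edit, dict):
            errors.append(f"edits.{i}: expected an object")
            continue
        # Strict mode: any key outside the schema fails the whole call,
        # even when the edit text itself is correct.
        for key in sorted(set(edit) - ALLOWED_EDIT):
            errors.append(f"edits.{i}: unexpected key: {key}")
        for key in sorted(ALLOWED_EDIT - set(edit)):
            errors.append(f"edits.{i}: missing key: {key}")
    return errors


good_call = {"path": "a.py", "edits": [{"oldText": "x", "newText": "y"}]}
bad_call = {
    "path": "a.py",
    "edits": [{"oldText": "x", "newText": "y", "requireUnique": True}],
}
```

Under these assumptions, good_call passes while bad_call is rejected for the single invented key, although its oldText and newText are identical to the valid call. That is the situation described above: a correct payload wrapped in a malformed call.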

The leading hypothesis: post-training on Claude Code's forgiving harness taught models that sloppy tool calls still succeed. That client silently repairs aliases, coerces types, filters unknown keys, and retries malformed invocations. Reinforcement learning in such an environment provides little gradient against invented fields. Worse, models may now overfit to Claude Code's specific schema, making alternative tool shapes increasingly off-distribution — a regression from Opus 4.5's adaptability.
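The hypothesis can be illustrated with a rough sketch of a forgiving harness that repairs a call instead of rejecting it. The flat field names come from the article; the alias table and the function names are hypothetical, and this is not Claude Code's actual repair logic.

```python
# Flat field names are taken from the article; the alias table below is
# a made-up example of the kind of repair a lenient client might apply.
FLAT_FIELDS = ("file_path", "old_string", "new_string")
ALIASES = {"path": "file_path", "old_text": "old_string", "new_text": "new_string"}


def lenient_normalize(args: dict) -> dict:
    """Repair a sloppy call: rename aliases, coerce types, drop unknown keys."""
    repaired = {}
    for key, value in args.items():
        key = ALIASES.get(key, key)  # repair aliases
        if key not in FLAT_FIELDS:
            continue  # silently filter unknown keys
        # Coerce non-string values to strings instead of failing.
        repaired[key] = value if isinstance(value, str) else str(value)
    return repaired


def strict_check(args: dict) -> bool:
    """Accept only calls that match the flat schema exactly."""
    return set(args) == set(FLAT_FIELDS) and all(
        isinstance(v, str) for v in args.values()
    )


sloppy_call = {
    "path": "a.py",          # alias instead of file_path
    "old_string": "x",
    "new_string": 1,         # wrong type
    "requireUnique": True,   # invented key
}
repaired_call = lenient_normalize(sloppy_call)
```

In this sketch the sloppy call fails strict_check, but its repaired form passes. If the tool then runs successfully, a training signal based on task outcome cannot tell the sloppy call from a clean one, which is the missing gradient the hypothesis points to.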