HeadlinesBriefing.com

AI Carb Counters Show Dangerous Inconsistency

Hacker News

A preprint study sent 13 real-world food photos to four leading language models (OpenAI's GPT-5.4, Anthropic's Claude Sonnet 4.6, Google's Gemini 2.5 Pro, and Gemini 3.1 Pro Preview) over 500 times each, totaling 26,904 queries. Even at the lowest randomness setting, every model produced different carbohydrate estimates for the same image, with variations large enough to trigger hypoglycaemia. That makes the results alarming for anyone relying on AI-driven carb trackers.

Variation differed by model. Claude's estimates stayed within a 5% coefficient of variation for most dishes, while Gemini 2.5 Pro swung between 10% and 20% and on a single paella photo spanned 429 g of carbs, a 42.9-unit insulin discrepancy. GPT-5.4 showed the widest spread, averaging a 74 g carb count for a sandwich whose reference value was 40 g. Such inconsistency undermines trust in automated nutrition advice.
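To see how a carb-count spread becomes a dosing error, here is a minimal illustrative sketch, not code from the study. The 1:10 grams-per-unit insulin-to-carb ratio is an assumption, chosen because it matches the article's 429 g carbs to 42.9 insulin units figure; the repeated estimates are hypothetical numbers for a single photo.

```python
from statistics import mean, stdev

def coefficient_of_variation(estimates):
    """CV as a percentage: standard deviation relative to the mean."""
    return 100 * stdev(estimates) / mean(estimates)

def insulin_units(carbs_g, carb_ratio_g_per_unit=10):
    """Bolus insulin implied by a carb count (assumed 1:10 ratio)."""
    return carbs_g / carb_ratio_g_per_unit

# Hypothetical repeated estimates (grams) for the same food photo
estimates = [60, 72, 68, 95, 55]
spread_g = max(estimates) - min(estimates)

print(f"CV: {coefficient_of_variation(estimates):.1f}%")
print(f"Dose spread: {insulin_units(spread_g):.1f} units")
```

With these sample numbers the same photo yields a dose spread of several units, which shows why a per-dish coefficient of variation above 10% is clinically meaningful, not just statistical noise.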

Confidence scores from all four models proved useless: correlation with actual accuracy hovered near zero, and high-confidence outputs were often among the least accurate. Because a diabetes app returns only a single number, users cannot tell whether they received a typical estimate or an extreme outlier. Regulators should treat these tools as experimental, reinforcing warnings against using generic LLMs for insulin dosing.