HeadlinesBriefing.com

Cloudflare rolls out AI Gateway for unified model access

Hacker News

Cloudflare unveiled AI Gateway, a unified inference layer that lets developers call any LLM or multimodal model through a single API. Because it reuses the existing AI.run() binding in Workers, switching from a Cloudflare‑hosted model to Anthropic’s Claude or Google’s Gemini is a one‑line change. The service debuted with zero‑setup default gateways and automatic retries.
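The one‑line switch might look like the sketch below. The model IDs and the AiBinding interface are illustrative assumptions, and the binding is mocked so the snippet runs without a Cloudflare account; in a real Worker the binding would come from the environment (e.g. env.AI).

```typescript
// Minimal sketch of the one-line model switch described above.
// Model ID strings are hypothetical examples, not confirmed catalog entries.

interface AiBinding {
  run(model: string, input: Record<string, unknown>): Promise<unknown>;
}

// Swapping providers means changing only this string:
// const MODEL = "@cf/meta/llama-3.1-8b-instruct"; // Cloudflare-hosted (example)
const MODEL = "anthropic/claude-sonnet";           // provider-prefixed (example)

async function answer(ai: AiBinding, prompt: string): Promise<unknown> {
  // The call site stays identical regardless of which provider serves the model.
  return ai.run(MODEL, { prompt });
}

// Mock binding standing in for the Worker's real AI binding.
const mockAi: AiBinding = {
  async run(model, input) {
    return { model, echoed: input.prompt };
  },
};

answer(mockAi, "hello").then((r) => console.log(JSON.stringify(r)));
```

Keeping the provider choice in a single constant is what makes multi-provider fallback and A/B comparisons cheap to wire up.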

At launch the catalog lists more than 70 models from more than a dozen providers, including open‑source options on Workers AI and proprietary services such as Anthropic, Alibaba Cloud, and Runway. Developers can query the same endpoint for image, video, or speech models, enabling multimodal agents without juggling separate credentials or pricing dashboards.

AI Gateway aggregates billing so a single credit pool covers every call, and custom metadata attached to requests lets teams break spend down by user, feature, or workflow. Cloudflare reports that the average app now calls 3.5 models across providers, a usage pattern that previously obscured true cost and reliability metrics.
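Attaching that metadata could look like the sketch below. The cf-aig-metadata header name follows Cloudflare's AI Gateway documentation, but treat the exact key and value shape as an assumption; the tag values are hypothetical.

```typescript
// Sketch of per-request metadata for spend attribution.
// Header name and JSON shape are assumptions based on AI Gateway's
// custom-metadata feature; verify against the current docs.

type SpendTags = { user?: string; feature?: string; workflow?: string };

function withSpendTags(
  headers: Record<string, string>,
  tags: SpendTags
): Record<string, string> {
  // Serialize the tags into a single header the gateway can index on.
  return { ...headers, "cf-aig-metadata": JSON.stringify(tags) };
}

const headers = withSpendTags(
  { "content-type": "application/json" },
  { user: "u_123", feature: "summarize" } // hypothetical tag values
);
console.log(headers["cf-aig-metadata"]);
```

Because the tags ride along on every request, spend reports can be grouped by any of these dimensions without changing application logic.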

Cloudflare also previewed a “bring‑your‑own‑model” flow that uses Replicate’s Cog to containerize custom PyTorch models, which will later be deployable via Workers AI commands. Agents built today can thus get low‑latency, resilient inference from a single global edge network.