HeadlinesBriefing.com

OpenAI and Broadcom Launch Jalapeño: A New Era for LLM Inference

OpenAI Blog

OpenAI and Broadcom unveiled Jalapeño, a purpose‑built inference accelerator designed around OpenAI’s LLM insights. Built from scratch, the chip targets GPT‑5.3‑Codex‑Spark and other models, aiming to cut latency while boosting throughput. Early lab runs show the first‑generation chip delivers performance per watt far ahead of today’s leading ASICs.

Co‑development took nine months from design to tape‑out, a record for high‑performance ASICs. Broadcom’s Tomahawk networking silicon is paired with OpenAI’s kernel‑level optimizations to reduce data movement and balance compute, memory, and network resources. The result is an inference platform that promises utilization closer to theoretical peak and lower operating costs for data‑center partners.

OpenAI plans to deploy Jalapeño at gigawatt scale by 2026, partnering with Microsoft and others. The chip’s architecture is designed to serve current and future LLMs, closing the full‑stack loop from model design to deployment. By owning the hardware, OpenAI can reduce latency, cut energy use, and lower service prices, making advanced AI more accessible.

Industry analysts see Jalapeño as a catalyst for scaling generative AI workloads. Its reduced data movement and balanced resource allocation translate into measurable efficiency gains, potentially lowering cloud inference costs by tens of percent. As OpenAI pushes this architecture into production, developers stand to benefit from faster response times in chat, code generation, and API services, sharpening the competitive edge of AI platforms.