HeadlinesBriefing.com

Taalas Unveils AI Inference Platform 10x Faster Than Current State of the Art

Hacker News

Taalas, a startup founded 2.5 years ago, has unveiled its first product: a hard-wired Llama 3.1 8B inference platform that delivers 17K tokens/sec per user. The company says this is nearly 10X faster than the current state of the art, while costing 20X less to build and consuming 10X less power. Its approach targets what it calls the two major barriers to ubiquitous AI: high latency and astronomical cost.
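To put the throughput figure in latency terms, the arithmetic can be sketched as below. This is a back-of-the-envelope illustration, not a benchmark: the 17K tokens/sec figure and the roughly 10X speedup come from the article, while the 500-token response length and the function names are hypothetical, and the baseline is simply the stated rate divided by the stated speedup.

```python
def per_token_latency_us(tokens_per_sec: float) -> float:
    """Average time per generated token, in microseconds."""
    return 1_000_000 / tokens_per_sec


def response_time_ms(num_tokens: int, tokens_per_sec: float) -> float:
    """Time to generate num_tokens at a steady rate, in milliseconds."""
    return 1000 * num_tokens / tokens_per_sec


HC1_TOKENS_PER_SEC = 17_000  # per-user rate stated in the article
CLAIMED_SPEEDUP = 10         # article says "nearly 10X", so this is an upper bound
RESPONSE_TOKENS = 500        # hypothetical response length, for illustration only

# Implied baseline rate if the speedup were exactly 10X (an assumption).
baseline_tokens_per_sec = HC1_TOKENS_PER_SEC / CLAIMED_SPEEDUP

print(f"HC1 per-token latency: {per_token_latency_us(HC1_TOKENS_PER_SEC):.1f} us")
print(f"HC1, {RESPONSE_TOKENS} tokens: "
      f"{response_time_ms(RESPONSE_TOKENS, HC1_TOKENS_PER_SEC):.1f} ms")
print(f"Implied baseline, {RESPONSE_TOKENS} tokens: "
      f"{response_time_ms(RESPONSE_TOKENS, baseline_tokens_per_sec):.1f} ms")
```

At the stated rate, each token takes about 59 microseconds on average, so a response of a few hundred tokens would finish in tens of milliseconds rather than hundreds. The calculation ignores prompt processing and network time.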

Taalas's Hardcore Models rely on total hardware specialization, merging storage and compute on a single chip at DRAM-level density. This removes the costly memory-compute boundary that accounts for much of the complexity of modern inference hardware. Taalas achieves this through radical simplification, avoiding exotic technologies such as HBM, advanced packaging, and liquid cooling. Its first-generation silicon platform, HC1, delivers its leading performance on the Llama 3.1 8B model, but it uses a custom 3-bit base data type, which introduces some quality trade-offs compared with GPU benchmarks.
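A rough sense of why a 3-bit data type matters for on-chip storage comes from the weight footprint alone. The sketch below assumes a nominal 8 billion parameters, assumes every parameter is stored at 3 bits (the article only says 3 bits is the base data type, so the real layout may differ), and uses 16-bit weights as a typical GPU comparison point. The function name is hypothetical.

```python
def weight_footprint_gb(num_params: float, bits_per_param: float) -> float:
    """Storage needed for the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NOMINAL_PARAMS = 8e9  # assumption: "8B" taken as exactly 8 billion parameters

# Assumption: all parameters use the 3-bit base type mentioned in the article.
three_bit_gb = weight_footprint_gb(NOMINAL_PARAMS, 3)

# Comparison point: 16-bit weights, common for GPU inference.
sixteen_bit_gb = weight_footprint_gb(NOMINAL_PARAMS, 16)

print(f"3-bit weights:  {three_bit_gb:.1f} GB")
print(f"16-bit weights: {sixteen_bit_gb:.1f} GB")
print(f"Reduction:      {sixteen_bit_gb / three_bit_gb:.2f}x")
```

Under these assumptions the weights shrink from about 16 GB to about 3 GB, which is the kind of reduction that makes keeping a whole model on one chip more plausible. It is also the source of the quality trade-off noted above, since each weight carries far less precision.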

Future models, including a mid-sized reasoning LLM planned for spring and a frontier LLM on next-generation HC2 silicon planned for winter, promise even greater efficiency. Taalas aims to bring AI inference to sub-millisecond latency and near-zero cost, enabling applications that were previously impractical. The company, which has 24 team members and has spent $30M, emphasizes substance over spectacle, focusing on precision strikes against entrenched problems with disciplined execution.