HeadlinesBriefing.com

Semble cuts token use 98% for AI code search

Hacker News

Stephan and Thomas have open‑sourced Semble, a code‑search library aimed at AI agents that stumble when plain‑text grep returns too much data. Agents like Claude Code often fall back to reading whole files, burning tokens and still missing relevant snippets. Semble replaces that workflow with semantic search, returning only the exact code chunk an agent needs.

The engine fuses Model2Vec embeddings from the potion‑code‑16M model with BM25 via Reciprocal Rank Fusion, then reranks the results with custom signals. On a benchmark of 1,250 query/document pairs across 63 repos and 19 languages, Semble scored 0.854 NDCG@10, about 99% of the score of a 137M‑parameter transformer, while using 98% fewer tokens than the grep‑and‑read workflow. Indexing a typical repo takes about 250 ms and queries run in ~1.5 ms on CPU, roughly 200× faster than transformer‑based search.
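To make the fusion step concrete, here is a minimal sketch of Reciprocal Rank Fusion in Python. It is not Semble's code: the function name, the snippet identifiers, and the two input rankings are hypothetical, and the constant k=60 is simply the value commonly used for RRF, since the article does not say which constant or which reranking signals Semble applies.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into one ranking.

    Each id receives 1 / (k + rank) from every list it appears in,
    where rank starts at 1. k=60 is a conventional default and is an
    assumption here, not a value taken from Semble.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists for one query, best match first.
dense_ranking = ["auth.py:login", "session.py:create", "utils.py:hash"]
bm25_ranking = ["auth.py:login", "db.py:connect", "session.py:create"]

fused = reciprocal_rank_fusion([dense_ranking, bm25_ranking])
print(fused)
```

Because RRF works only on rank positions, the embedding similarities and the BM25 scores never need to be put on a common scale, which is one reason it is a popular way to combine dense and lexical retrieval.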

Semble runs entirely on CPU, needs no API keys or GPU, and can be deployed as an MCP server or invoked from the shell. Integration with Claude Code, Cursor, Codex and OpenCode takes a one‑line command, after which agents can ask natural‑language questions such as “How is authentication handled?” and receive concise snippets. Early adopters report token savings in the hundreds of thousands, which is consistent with the benchmark results.