HeadlinesBriefing.com

Persistent Agent Memory on Elasticsearch Achieves 0.89 Recall

Hacker News

Agents forget everything between sessions, forcing developers to stuff entire conversation histories into context windows. That approach becomes costly and slow as histories grow, and it suffers from the 'lost in the middle' effect, where models overlook facts placed far from the start or end of the prompt. The author built a persistent memory layer on Elasticsearch to solve this.

The system uses three indices modeled on memory categories from cognitive science: episodic (time-stamped events), semantic (stable user facts), and procedural (multi-step playbooks). Each has its own write rate and aging rules. A fourth surface handles world data. This separation avoids the needle-in-a-haystack problem of undifferentiated storage and lets each memory type follow an appropriate lifecycle.
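The three-way split can be sketched as a small routing table. This is a minimal illustration, not the author's code: the index names, the `user_id`, `kind`, `text`, and `created_at` fields, and the half-life values are all hypothetical, since the source does not state the real settings.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class MemoryKind(Enum):
    EPISODIC = "episodic"      # time-stamped events
    SEMANTIC = "semantic"      # stable user facts
    PROCEDURAL = "procedural"  # multi-step playbooks


@dataclass(frozen=True)
class Lifecycle:
    index: str                       # hypothetical index name
    half_life_days: Optional[float]  # None means "does not decay"


# Illustrative values only; the source does not give the real aging rules.
LIFECYCLES = {
    MemoryKind.EPISODIC: Lifecycle("memory-episodic", 30.0),
    MemoryKind.SEMANTIC: Lifecycle("memory-semantic", None),
    MemoryKind.PROCEDURAL: Lifecycle("memory-procedural", None),
}


def build_write(kind, user_id, text, now=None):
    """Return (index_name, document) for a single memory write."""
    now = now or datetime.now(timezone.utc)
    doc = {
        "user_id": user_id,
        "kind": kind.value,
        "text": text,
        "created_at": now.isoformat(),
    }
    return LIFECYCLES[kind].index, doc
```

Keeping the lifecycle settings next to the index name means a writer only has to classify a memory once; where the document lands and how it ages both follow from that one decision.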

Recall uses hybrid search: Reciprocal Rank Fusion (RRF) over BM25 and Jina v5 dense-vector results, followed by a cross-encoder reranker. Each document is indexed both ways from a single write. On a set of 168 QA questions, the system achieves 0.89 recall@10 with zero cross-tenant leaks. Supersession resolves contradictions, and decay keeps older facts from outranking fresh ones.
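The fusion step can be sketched in plain Python. RRF scores each document as the sum of 1 / (k + rank) over the ranked lists it appears in. The constant k = 60 is a commonly used default, not a value taken from the source. The exponential decay and the `superseded_by` mapping are assumptions chosen for illustration, because the source does not say how decay or supersession is implemented. The cross-encoder reranking stage is omitted.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by doc id so the order is deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def decay_weight(age_days, half_life_days):
    """Assumed exponential decay: the weight halves every half_life_days."""
    return 0.5 ** (age_days / half_life_days)


def drop_superseded(doc_ids, superseded_by):
    """Drop documents that a newer document has replaced.

    superseded_by is a hypothetical mapping of old_id -> new_id.
    """
    return [d for d in doc_ids if d not in superseded_by]


bm25_hits = ["a", "b", "c"]    # lexical ranking, best first
dense_hits = ["b", "d", "a"]   # dense-vector ranking, best first
fused = [doc_id for doc_id, _ in rrf_fuse([bm25_hits, dense_hits])]
print(fused)  # ['b', 'a', 'd', 'c']
```

Because RRF works on ranks rather than raw scores, BM25 and vector similarity need no score normalization before they are combined.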

This implementation suggests that a search engine is a natural fit for agent memory. Rather than splitting the work across a vector store, a keyword engine, and an auth service, a single Elasticsearch-based solution handles retrieval, isolation, and lifecycle management. According to the author, the approach scales to years of interaction while maintaining strict user privacy boundaries.
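One way to picture retrieval and isolation living in the same engine is a query body that always carries a tenant filter. This sketch uses the standard Elasticsearch `bool` query with a `match` clause and a `term` filter. The `user_id` and `text` field names are hypothetical, and the source does not say whether the author enforces isolation this way or through another mechanism.

```python
def tenant_scoped_query(user_id, text, size=10):
    """Build a search body that can only match one user's documents."""
    return {
        "size": size,
        "query": {
            "bool": {
                # Relevance comes from the match clause.
                "must": [{"match": {"text": text}}],
                # The filter restricts hits to this user without affecting scores.
                "filter": [{"term": {"user_id": user_id}}],
            }
        },
    }


body = tenant_scoped_query("u1", "favorite airline")
```

Building the filter inside one helper, rather than at each call site, makes it harder to issue a query that forgets the tenant restriction.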