HeadlinesBriefing.com

mnemo: Local AI Memory Layer for LLMs Without Cloud

Hacker News

The GitHub repository zaydmulani09/mnemo introduces a local-first AI memory layer that lets large language models persist knowledge across sessions. The sidecar service watches every prompt, uses an LLM to extract named entities and relationships, and stores them as a graph in SQLite. Relevant context is re-injected into later prompts in under 50 ms, with no cloud service involved. Because everything runs locally, developers avoid vendor lock-in and keep their data on their own machines.
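To make the idea of a graph stored in SQLite concrete, here is a minimal sketch using Python's standard sqlite3 module. The table names, column names, and helper functions are assumptions made for illustration; the brief does not describe mnemo's actual schema, and mnemo itself is written in Rust.

```python
import sqlite3

# Hypothetical schema: one table for entities (nodes), one for relations (edges).
# These names are illustrative and are not taken from mnemo.
SCHEMA = """
CREATE TABLE IF NOT EXISTS entities (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS relations (
    src_id   INTEGER NOT NULL REFERENCES entities(id),
    relation TEXT    NOT NULL,
    dst_id   INTEGER NOT NULL REFERENCES entities(id),
    UNIQUE (src_id, relation, dst_id)
);
"""


def open_store(path=":memory:"):
    """Open (or create) the store and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def upsert_entity(conn, name):
    """Insert the entity if it is new and return its row id."""
    conn.execute("INSERT OR IGNORE INTO entities (name) VALUES (?)", (name,))
    row = conn.execute(
        "SELECT id FROM entities WHERE name = ?", (name,)
    ).fetchone()
    return row[0]


def add_relation(conn, src, relation, dst):
    """Store a (src, relation, dst) triple; duplicate triples are ignored."""
    src_id = upsert_entity(conn, src)
    dst_id = upsert_entity(conn, dst)
    conn.execute(
        "INSERT OR IGNORE INTO relations (src_id, relation, dst_id) "
        "VALUES (?, ?, ?)",
        (src_id, relation, dst_id),
    )
    conn.commit()


def neighbors(conn, name):
    """Return (relation, target name) pairs for edges leaving `name`."""
    return conn.execute(
        """
        SELECT r.relation, e2.name
        FROM relations r
        JOIN entities e1 ON e1.id = r.src_id
        JOIN entities e2 ON e2.id = r.dst_id
        WHERE e1.name = ?
        ORDER BY r.relation, e2.name
        """,
        (name,),
    ).fetchall()
```

Passing a file path instead of ":memory:" to open_store would keep the triples on disk between runs, which is the property the brief attributes to a persistent local memory.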

mnemo exposes a REST API. The /ingest endpoint accepts raw text, extracts entities, and builds an in-memory graph with the petgraph crate. The /retrieve endpoint runs a six-stage pipeline: full-text search, entity lookup, graph expansion, relation filtering, scoring, and prompt assembly. The tool ships as a single static binary and can run under Docker with Ollama for a free local model, or against any OpenAI-compatible backend for development.
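The six retrieval stages named above can be sketched as a chain of small functions. This is an illustrative Python outline only: the function names, the Edge type, the word-overlap search, and the scoring rule are all assumptions, since the brief names the stages but does not say how mnemo implements them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """A hypothetical (source, relation, target) triple."""
    src: str
    relation: str
    dst: str


def full_text_search(query, notes):
    """Stage 1: keep stored notes that share at least one word with the query."""
    terms = set(query.lower().split())
    return [n for n in notes if terms & set(n.lower().split())]


def entity_lookup(query, hits, known_entities):
    """Stage 2: find known entities mentioned in the query or the matched notes."""
    text = " ".join([query, *hits]).lower()
    return {e for e in known_entities if e.lower() in text}


def expand_graph(seeds, edges, hops=1):
    """Stage 3: collect edges reachable within `hops` steps of the seed entities."""
    frontier, seen, found = set(seeds), set(seeds), []
    for _ in range(hops):
        nxt = set()
        for e in edges:
            if e.src in frontier or e.dst in frontier:
                if e not in found:
                    found.append(e)
                nxt.update({e.src, e.dst} - seen)
        seen |= nxt
        frontier = nxt
    return found


def filter_relations(edges, allowed):
    """Stage 4: drop edges whose relation type is not in the allowed set."""
    return [e for e in edges if e.relation in allowed]


def score(edges, seeds):
    """Stage 5: rank edges by how many of their endpoints are seed entities."""
    return sorted(
        edges,
        key=lambda e: (e.src in seeds) + (e.dst in seeds),
        reverse=True,
    )


def assemble_prompt(query, ranked, limit=5):
    """Stage 6: prepend the top-ranked facts to the user's query."""
    facts = [f"- {e.src} {e.relation} {e.dst}" for e in ranked[:limit]]
    return "Known context:\n" + "\n".join(facts) + f"\n\nUser: {query}"


def retrieve(query, notes, known_entities, edges, allowed):
    """Run the six stages in order and return the assembled prompt."""
    hits = full_text_search(query, notes)
    seeds = entity_lookup(query, hits, known_entities)
    expanded = expand_graph(seeds, edges)
    kept = filter_relations(expanded, allowed)
    ranked = score(kept, seeds)
    return assemble_prompt(query, ranked)
```

Keeping each stage as a separate function makes it easy to swap one out, for example replacing the word-overlap search with a real full-text index, without touching the others.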

Benchmarks on an Apple M2 show entity insertion at 0.12 ms and full retrieval in 4.2 ms, or about 238 ops/s. The project is written in Rust across four crates (core, API, CLI, and benchmark) and is released under the MIT license. Developers can clone the repo, spin up Ollama locally or point the binary at an external LLM, and add persistent memory to their applications.
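The throughput figure is consistent with the retrieval latency if operations run one at a time, which is an assumption here since the brief does not describe the benchmark setup. A quick check in Python, using a hypothetical helper name:

```python
def ops_per_second(latency_ms):
    """Convert a per-operation latency in milliseconds to sequential ops/s."""
    return 1000.0 / latency_ms


# 4.2 ms per retrieval, as reported in the brief.
retrieval_rate = ops_per_second(4.2)
print(round(retrieval_rate))  # 238
```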