HeadlinesBriefing.com

Embedding‑Based “More Like This” Redefines Search

Hacker News

More Like This (MLT) lets users start a search from an existing document rather than typing a query. Classic MLT relied on lexical matching—TF‑IDF, BM25, term frequencies—to pull records sharing the same words. It powered related‑article links, duplicate detection, and support‑ticket matching, but struggled with synonyms, paraphrases, and cross‑language similarity.

Embedding‑based MLT replaces term vectors with dense numeric representations stored in the index. By retrieving a document’s precomputed vector and running a K‑nearest‑neighbor (KNN) or approximate nearest‑neighbor (ANN) search, the engine returns semantically close items even when wording differs. This expands use cases to products, code snippets, images, and RAG contexts, while exact matches like error codes remain lexical territory.
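The lookup-then-search flow described above can be sketched in a few lines of Python. This is a minimal illustration, not any engine's implementation: the toy index, its three-dimensional vectors, and the helper names `cosine_distance` and `more_like_this` are all assumptions made for the example, and the search is brute-force exact KNN rather than an ANN index.

```python
import math

# Hypothetical toy index: document ID -> precomputed embedding.
# Real embeddings come from a model and have hundreds of dimensions.
INDEX = {
    1: [0.90, 0.10, 0.00],  # "reset my password"
    2: [0.85, 0.15, 0.05],  # "cannot log in, forgot credentials"
    3: [0.00, 0.20, 0.95],  # "invoice shows wrong amount"
    4: [0.10, 0.10, 0.90],  # "billing total is incorrect"
}


def cosine_distance(a, b):
    """Return 1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def more_like_this(index, doc_id, k):
    """Look up the source document's stored vector, then rank all others."""
    source = index[doc_id]
    scored = [
        (other_id, cosine_distance(source, vec))
        for other_id, vec in index.items()
        if other_id != doc_id  # never return the source document itself
    ]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


neighbors = more_like_this(INDEX, doc_id=1, k=2)
for other_id, dist in neighbors:
    print(other_id, round(dist, 4))
```

Document 2 ranks first even though it shares almost no words with document 1, which is the behavior lexical MLT cannot provide. A production system would swap the linear scan for an ANN structure once the collection grows.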

Manticore Search implements this flow directly in the engine: a query supplies the source document ID, and the system fetches that document's embedding field, runs the KNN search, and returns matching IDs along with distance scores exposed through knn_dist(). Because the application no longer has to fetch the vector in a separate round trip, both latency and client code complexity drop. The result is a streamlined “more like this” feature that adds semantic breadth to the engine's existing lexical precision.
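As a rough illustration of what such a request might look like from client code, the sketch below builds the SQL string for a by-document-ID KNN query. The table name `articles`, the field name `embedding`, and the helper `build_mlt_query` are hypothetical, and the `knn(field, k, doc_id)` argument order reflects one reading of Manticore's documented syntax, so it should be checked against the manual for the version in use.

```python
import re

# Accept only plain identifiers so table/field names cannot inject SQL.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_mlt_query(table, vector_field, doc_id, k):
    """Build a 'more like this' query that passes a document ID, not a vector.

    Assumed syntax: knn(<vector field>, <k>, <source document id>).
    """
    for name in (table, vector_field):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    return (
        f"SELECT id, knn_dist() FROM {table} "
        f"WHERE knn({vector_field}, {int(k)}, {int(doc_id)})"
    )


# Hypothetical table and field names, for illustration only.
query = build_mlt_query("articles", "embedding", doc_id=42, k=5)
print(query)
```

The point of the shape is that the client sends only an ID and a neighbor count; the vector itself never leaves the server. The resulting string could be sent over any MySQL-protocol client connected to the engine.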

Enterprises adopting vector MLT report faster ticket resolution and richer product recommendations, because related items surface despite divergent phrasing. Since the vector index lives alongside the inverted index, existing deployments can enable hybrid search with minimal schema changes, combining exact matches on identifiers such as error codes with semantic discovery in a single query.
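One simple way to picture a hybrid query is filter-then-rank: require an exact lexical match on an identifier, then order the survivors by vector distance. The sketch below assumes that strategy purely for illustration; the documents, two-dimensional vectors, error code `E1042`, and function `hybrid_mlt` are invented, and a real engine may combine the two signals differently.

```python
import math
import re

# Hypothetical support tickets with text and a toy 2-D embedding.
DOCS = {
    1: {"text": "Login fails with error E1042 after password reset", "vec": [0.90, 0.10]},
    2: {"text": "Cannot sign in, E1042 shown on the auth screen", "vec": [0.80, 0.30]},
    3: {"text": "Users locked out after changing credentials", "vec": [0.85, 0.20]},
    4: {"text": "E1042 appears when exporting invoices", "vec": [0.10, 0.90]},
}


def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hybrid_mlt(docs, doc_id, required_term, k):
    """Keep documents containing required_term exactly, rank them by distance."""
    source_vec = docs[doc_id]["vec"]
    term = required_term.lower()
    hits = []
    for other_id, doc in docs.items():
        if other_id == doc_id:
            continue  # skip the source document
        tokens = set(re.findall(r"\w+", doc["text"].lower()))
        if term not in tokens:
            continue  # lexical filter: the exact token must be present
        hits.append((other_id, euclidean(source_vec, doc["vec"])))
    hits.sort(key=lambda pair: pair[1])
    return hits[:k]


results = hybrid_mlt(DOCS, doc_id=1, required_term="E1042", k=2)
for other_id, dist in results:
    print(other_id, round(dist, 4))
```

Ticket 3 is semantically close to the source but is excluded because it never mentions the error code, while ticket 4 passes the filter yet ranks last because its content is about something else.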