HeadlinesBriefing.com

Epicure embeddings map multilingual food chemistry

Hacker News

Epicure introduces a trio of skip‑gram ingredient embeddings rebuilt from scratch on a recipe corpus. The team aggregated 4.14M recipes from eleven sources covering English, Chinese, Russian, Vietnamese, Spanish, Turkish, Indonesian, German and Indian‑English. An LLM‑augmented pipeline collapsed the raw ingredient strings into 1,790 canonical entries, creating a unified vocabulary for cross‑language analysis. The effort shows how large‑scale LLM preprocessing can standardize noisy ingredient lists across languages.

From the normalized list, the authors built two graphs: a 203,508‑edge ingredient‑ingredient NPMI network and an 80,019‑edge typed FlavorDB graph linking ingredients to 2,247 compounds. These structures seed three Metapath2Vec variants that share an architecture but differ in random‑walk schema. Cooc walks only the co‑occurrence graph, Chem follows compound metapaths, and Core blends both with injected ingredient walks, placing the three models along a chemistry‑versus‑recipe continuum.
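To make the two building blocks concrete, here is a minimal Python sketch of how NPMI edge weights and a schema-constrained random walk could be computed. It is not the authors' code: the function names, the `min_npmi` threshold, and the ingredient-compound-ingredient schema are assumptions for illustration, since the summary does not spell out those details. NPMI is taken in its standard form, PMI divided by the negative log of the joint probability.

```python
import math
import random
from collections import Counter
from itertools import combinations


def npmi_edges(recipes, min_npmi=0.0):
    """Return {(a, b): npmi} for ingredient pairs, with a < b alphabetically.

    `min_npmi` is a hypothetical cutoff; pairs scoring at or below it are dropped.
    """
    n = len(recipes)
    single, pair = Counter(), Counter()
    for recipe in recipes:
        items = sorted(set(recipe))          # count each ingredient once per recipe
        single.update(items)
        pair.update(combinations(items, 2))
    edges = {}
    for (a, b), c_ab in pair.items():
        p_ab = c_ab / n
        if p_ab == 1.0:                      # -log(1) = 0, so NPMI is undefined
            continue
        pmi = math.log(p_ab / ((single[a] / n) * (single[b] / n)))
        score = pmi / -math.log(p_ab)        # normalise PMI into [-1, 1]
        if score > min_npmi:
            edges[(a, b)] = score
    return edges


def metapath_walk(neighbors, node_type, start, schema, length, rng):
    """Random walk whose node types cycle through `schema`.

    `schema` starts and ends on the same type, e.g.
    ["ingredient", "compound", "ingredient"] (an assumed example).
    """
    cycle = schema[1:]                       # types to visit after the start node
    walk = [start]
    while len(walk) < length:
        wanted = cycle[(len(walk) - 1) % len(cycle)]
        options = [v for v in neighbors.get(walk[-1], ()) if node_type[v] == wanted]
        if not options:                      # dead end: stop the walk early
            break
        walk.append(rng.choice(options))
    return walk
```

Walks produced this way would then be fed to a skip-gram trainer as if they were sentences, which is the general Metapath2Vec recipe; changing the schema is what would distinguish one variant from another.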

By marrying co‑occurrence signals with chemical similarity, Epicure yields embeddings that capture both cultural pairing habits and molecular flavor relationships. The vectors enable cross‑cuisine ingredient substitution, flavor recommendation engines, and research into food chemistry. The open‑source release lets developers integrate the models into cooking assistants or nutrition analysis tools, demonstrating that multilingual recipe data can power richer culinary AI.
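As a rough illustration of how such vectors might drive substitution, the sketch below ranks ingredients by cosine similarity to a query. The `nearest` helper and the three-dimensional toy vectors are invented for this example and do not come from the released models, whose file format and dimensionality are not described here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def nearest(embeddings, query, k=3):
    """Rank every other ingredient by similarity to `query`, best first."""
    q = embeddings[query]
    scored = [(name, cosine(q, vec)) for name, vec in embeddings.items() if name != query]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


# Toy vectors, made up purely to exercise the function.
toy = {
    "butter": [0.9, 0.1, 0.0],
    "ghee": [0.8, 0.2, 0.1],
    "soy sauce": [0.0, 0.9, 0.4],
}
print(nearest(toy, "butter", k=1))
```

With real embeddings, the same lookup run against the Cooc, Chem, and Core vectors would presumably surface different candidates, reflecting where each model sits between recipe habits and chemistry.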