HeadlinesBriefing favicon HeadlinesBriefing.com

Enterprise RAG Question Parsing: How Smart Dispatch Cuts LLM Costs by Two-Thirds

Towards Data Science •
×

Enterprise Document Intelligence tackles a fundamental RAG problem: when users ask about a CV's 'name', naive keyword matching fails because resumes don't contain that literal token. The solution lies in question-parsing that considers document context before retrieval. OpenAI's gpt-4.1 powers this parsing layer, which examines the document profile to understand what type of document it's handling and what fields typically matter.

The dispatcher makes three critical decisions after initial parsing: how much context to retrieve, whether to combine or sequence chunks, and which model tier to invoke. These cascade through concept-level overrides, answer-shape defaults, and project fallbacks. For single-value answers like amounts or dates, sequential processing stops after the first relevant chunk, saving roughly two-thirds of input tokens at k=3. Multi-chunk synthesis tasks instead combine all passages into one call.

Model selection separates conceptual tiers from specific implementations. Four vendor-agnostic tiers map to concrete model names in a registry table, allowing teams to swap gpt-4.5 or other models by updating rows rather than code. This architecture matters because enterprise deployments processing millions of documents multiply small token savings into substantial cost reductions. The approach treats defaults as conventions, not constraints, letting the LLM override conventions when questions explicitly contradict them.

This question-parsing brick completes half of a four-component enterprise RAG system, joining parsing, retrieval, and generation stages. By resolving dispatch decisions upfront, downstream components receive unified records with pre-determined strategies, eliminating redundant computation across the pipeline.