Repository: 👉 premsgdev/rag-engine
Retrieval-Augmented Generation (RAG) is widely used to reduce hallucinations in Large Language Models (LLMs). The idea is simple: retrieve relevant documents and force the model to answer only from them.
But while RAG helps with hallucination, it still fails in one critical area: Ambiguous user queries.
In this article, I’ll explain:
- Why traditional RAG breaks on ambiguity
- Why vector similarity alone is not enough
- How I built a Query Validation Agent that asks clarifying questions
- How this design fixes a fundamental RAG limitation
All examples and code references come from this real project: 👉 premsgdev/rag-engine
The Hidden Assumption in Most RAG Systems
Most RAG pipelines implicitly assume:
“If we retrieve documents, the user must be asking about them.”
That assumption is wrong.
Example
A user asks:
“What is life?”
The document corpus contains:
- LIFE Mission – Livelihood Inclusion and Financial Empowerment Mission
A traditional RAG system does this: Query → Vector Search → LIFE Mission docs → LLM answer.
The answer is grounded in documents, but the intent is wrong. This is not hallucination; it is semantic misalignment.
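To make the failure mode concrete, here is a minimal sketch of that naive flow. The retriever and LLM interfaces here are assumptions for illustration, not the actual code in rag-engine:

```typescript
// Illustrative naive RAG pipeline: it answers whenever retrieval returns something.
// Retriever and Llm are assumed interfaces for this sketch, not rag-engine's real API.
type Retriever = (query: string, topK: number) => Promise<string[]>;
type Llm = (prompt: string) => Promise<string>;

async function naiveRagAnswer(query: string, retrieve: Retriever, llm: Llm): Promise<string> {
  const snippets = await retrieve(query, 5); // "What is life?" → LIFE Mission docs
  const prompt = `Answer using only these documents:\n${snippets.join("\n---\n")}\n\nQuestion: ${query}`;
  return llm(prompt); // grounded in documents, but answering the wrong intent
}
```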
Why Vector Search Alone Cannot Fix This
Vector databases optimize for semantic similarity, not user intent. From an embedding perspective, “life” and “LIFE Mission” are close enough to match.
The vector DB does exactly what it is supposed to do. The failure happens after retrieval, not during retrieval.
The Missing Layer: Query Validation Before Answering
The key insight in this project is that RAG needs a decision layer before answering. Instead of immediately answering after retrieval, the system must first ask:
“Can this question be answered correctly using these documents?”
This decision should be evidence-based, deterministic, and explicit. This is where an agent fits naturally.
The Query Validation Agent (Real Code)
In this project, the validation logic lives in src/chat-agents/query-validator.agent.ts.
This agent:
- Does not retrieve
- Does not answer
- Only decides
Real responsibility of the agent
It receives the user query, the user's language, and the retrieved document snippets, and it returns a structured decision: either the question is answerable from those snippets, or the query is ambiguous and a clarifying question should be asked instead.
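The exact type lives in the repo; as an illustration, the decision could be shaped roughly like this (the field names are assumptions, not the repo's actual type):

```typescript
// Illustrative shape of the validator's decision (not the repo's exact type).
interface QueryValidationDecision {
  answerable: boolean;         // can the question be answered from the snippets alone?
  ambiguous: boolean;          // does the query plausibly mean something else?
  clarifyingQuestion?: string; // e.g. "Did you mean LIFE Mission?"
}
```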
Evidence-Based Validation (Not Guessing)
The agent is invoked only after a cheap vector probe in src/chat-retrieval/vector-signal.service.ts. This service answers one question only:
“Does the corpus contain anything related to this query?”
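A minimal sketch of such a relevance probe, assuming an injected embedding function and cosine similarity over stored chunk vectors; the threshold and interfaces are illustrative, not the actual vector-signal.service.ts implementation:

```typescript
// Illustrative "does anything relate to this query?" probe.
// embed() and the in-memory chunk list stand in for the real embedding model and vector DB.
type Embed = (text: string) => Promise<number[]>;

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i]; }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

async function hasRelevantDocs(
  query: string,
  chunks: { text: string; vector: number[] }[],
  embed: Embed,
  threshold = 0.75, // illustrative cutoff, not the project's tuned value
): Promise<{ relevant: boolean; topSnippets: string[] }> {
  const q = await embed(query);
  const scored = chunks
    .map(c => ({ text: c.text, score: cosine(q, c.vector) }))
    .sort((a, b) => b.score - a.score);
  const top = scored.slice(0, 3).filter(s => s.score >= threshold);
  return { relevant: top.length > 0, topSnippets: top.map(s => s.text) };
}
```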
If yes, a small set of snippets is passed to the agent. The agent prompt is in src/chat-llm/prompts/query-validator.prompt.ts, with a key instruction:
Decide if the question can be answered using ONLY the snippets below. If the question is ambiguous, ask for clarification.
This forces the LLM to reason only over retrieved evidence, not world knowledge.
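As a hedged sketch, the prompt could be assembled along these lines; the wording beyond the quoted instruction is illustrative, not the exact contents of query-validator.prompt.ts:

```typescript
// Illustrative prompt builder that pins the model to the retrieved snippets.
function buildValidatorPrompt(query: string, language: string, snippets: string[]): string {
  return [
    `User language: ${language}`,
    `User question: ${query}`,
    `Snippets:`,
    ...snippets.map((s, i) => `[${i + 1}] ${s}`),
    ``,
    `Decide if the question can be answered using ONLY the snippets above.`,
    `If the question is ambiguous, ask for clarification instead of answering.`,
    `Respond as JSON: { "answerable": boolean, "ambiguous": boolean, "clarifyingQuestion": string | null }`,
  ].join("\n");
}
```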
What Happens on an Ambiguous Query
User input:
“What is life?”
Retrieved snippets:
“LIFE Mission is a government initiative focused on livelihood inclusion…”
Agent response (real behavior): the agent does not answer. It flags the query as ambiguous and returns a clarifying question such as "Did you mean LIFE Mission?"
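As an illustration, a structured response for this query might look like the following object (field names mirror the sketch earlier in this article and are assumptions, not the repo's exact output):

```typescript
// Illustrative validator output for "What is life?" against LIFE Mission snippets.
const decision = {
  answerable: false,
  ambiguous: true,
  clarifyingQuestion: "Did you mean LIFE Mission (Livelihood Inclusion and Financial Empowerment Mission)?",
};
```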
This is the correct answer. Answering immediately would be misleading.
Why This Fixes a Core RAG Limitation
Traditional RAG answers whenever retrieval succeeds. This system answers only when intent and documents align. The agent explicitly separates:
- Retrieval success
- Answerability
That separation eliminates confident but wrong answers, silent intent rewrites, and subtle hallucinations.
Making It a Real Chat: Clarification State
Asking a clarification question is not enough; the system must remember it. This project introduces explicit conversation state in src/chat/chat-state.service.ts, stored in Redis: the pending clarification and the proposed intent are persisted per conversation, so the next turn can be resolved deterministically.
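A minimal sketch of what such state handling could look like with ioredis; the key naming, TTL, and field names are assumptions, and the real logic lives in src/chat/chat-state.service.ts:

```typescript
import Redis from "ioredis";

// Illustrative pending-clarification state, keyed by conversation (not the repo's exact schema).
interface PendingClarification {
  originalQuery: string;  // "What is life?"
  proposedIntent: string; // "What is LIFE Mission?"
  askedAt: number;
}

const redis = new Redis(); // assumes a locally reachable Redis instance

async function savePendingClarification(conversationId: string, state: PendingClarification): Promise<void> {
  // Expire after 10 minutes so stale clarifications don't linger (illustrative TTL).
  await redis.set(`clarify:${conversationId}`, JSON.stringify(state), "EX", 600);
}

async function loadPendingClarification(conversationId: string): Promise<PendingClarification | null> {
  const raw = await redis.get(`clarify:${conversationId}`);
  return raw ? (JSON.parse(raw) as PendingClarification) : null;
}

async function clearPendingClarification(conversationId: string): Promise<void> {
  await redis.del(`clarify:${conversationId}`);
}
```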
Multi-Turn Flow (End-to-End)
Turn 1
- User: What is life?
- System: Did you mean LIFE Mission?
Turn 2
- User: yes
The backend detects a pending clarification, resolves “yes” deterministically, rewrites the intent internally to “What is LIFE Mission?”, and continues retrieval and answering. This logic lives in src/chat/chat.service.ts.
Importantly: The LLM does NOT “remember.” The backend does.
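As an illustrative sketch of that backend resolution step, reusing the Redis helpers from the sketch above; the affirmation check and rewrite are simplified assumptions, and the real flow lives in src/chat/chat.service.ts:

```typescript
// Illustrative turn handler: resolve a pending clarification before calling the LLM.
async function handleTurn(
  conversationId: string,
  userMessage: string,
  answerWithRag: (query: string) => Promise<string>, // downstream retrieval + answering
): Promise<string> {
  const pending = await loadPendingClarification(conversationId);

  if (pending) {
    const saidYes = /^(yes|yeah|yep|sure)\b/i.test(userMessage.trim());
    await clearPendingClarification(conversationId);
    if (saidYes) {
      // Deterministic rewrite: "yes" → the intent proposed in the previous turn.
      return answerWithRag(pending.proposedIntent); // e.g. "What is LIFE Mission?"
    }
    // Not an affirmation: treat the message as a fresh query.
    return answerWithRag(userMessage);
  }

  return answerWithRag(userMessage);
}
```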
Why This Is Better Than Chat History Prompts
Many systems rely on prompting chat history and hoping the model understands “yes.” That approach is non-deterministic, hard to debug, and model-dependent. This design stores intent explicitly, works with any LLM, and is testable and observable.
This is backend engineering, not prompt tricks.
The agent does decision-making, not answering.
Why This Matters in Real Systems
This pattern is critical for policy documents, government data, compliance systems, and enterprise knowledge bases. In these domains, a grounded but wrong answer is worse than no answer.
Asking for clarification is not a UX flaw — it is correctness.
Repository Reference
All concepts in this article are implemented in this repo: 👉 premsgdev/rag-engine
You can:
- Inspect the agent code
- Follow the ingestion pipeline
- See how clarification state is stored
- Trace the full streaming RAG flow
Workflow Diagram
User query → vector signal probe → Query Validation Agent → either a clarifying question (with state stored in Redis) or retrieval and a grounded answer.
Final Thoughts
RAG reduces hallucination by grounding answers in documents. But agentic validation solves a deeper problem: answering the wrong question confidently.
By adding a validation agent that:
- Reasons over retrieved evidence
- Detects ambiguity
- Asks clarifying questions
- Resumes deterministically
…the system stops answering the wrong question confidently. This is the difference between:
“The model answered something”
and
“The system understood the user”