Repository: 👉 premsgdev/rag-engine

Retrieval-Augmented Generation (RAG) is widely used to reduce hallucinations in Large Language Models (LLMs). The idea is simple: retrieve relevant documents and force the model to answer only from them.

But while RAG helps with hallucination, it still fails in one critical area: ambiguous user queries.

In this article, I’ll explain:

  • Why traditional RAG breaks on ambiguity
  • Why vector similarity alone is not enough
  • How I built a Query Validation Agent that asks clarifying questions
  • How this design fixes a fundamental RAG limitation

All examples and code references come from this real project: 👉 premsgdev/rag-engine


The Hidden Assumption in Most RAG Systems

Most RAG pipelines implicitly assume:

“If we retrieve documents, the user must be asking about them.”

That assumption is wrong.

Example

A user asks:

“What is life?”

The document corpus contains:

  • LIFE Mission – Livelihood Inclusion and Financial Empowerment Mission

A traditional RAG system does this: Query → Vector Search → LIFE Mission docs → LLM answer.

The answer is grounded in documents, but the intent is wrong. This is not hallucination; it is semantic misalignment.
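
To make the hidden assumption concrete, here is a minimal sketch of a naive pipeline (the function names are illustrative, not from the repo): retrieval success is treated as permission to answer.

// Minimal sketch of a traditional RAG flow (illustrative names, not from the repo).
// Note: nothing checks whether the documents actually match the user's intent.

interface Doc {
  id: string;
  text: string;
}

async function naiveRagAnswer(
  query: string,
  search: (q: string, topK: number) => Promise<Doc[]>,
  llm: (prompt: string) => Promise<string>,
): Promise<string> {
  // 1. Retrieve whatever is semantically closest to the query.
  const docs = await search(query, 5);

  // 2. Immediately answer from those documents -- the hidden assumption.
  const context = docs.map((d) => d.text).join("\n---\n");
  return llm(`Answer using ONLY this context:\n${context}\n\nQuestion: ${query}`);
}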


Why Vector Search Alone Cannot Fix This

Vector databases optimize for semantic similarity, not user intent. From an embedding perspective, “life” and “LIFE Mission” are close enough to match.

The vector DB does exactly what it is supposed to do. The failure happens after retrieval, not during retrieval.
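
For intuition, retrieval ranks chunks by a geometric score such as cosine similarity, as in the sketch below (a generic illustration, not code from the repo):

// Cosine similarity between two embedding vectors -- the kind of score a
// vector database ranks results by. Values near 1 mean "semantically close".
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

A typical embedding model places "life" and "LIFE Mission" close together in this space, so the LIFE Mission chunks win the ranking, exactly as designed.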


The Missing Layer: Query Validation Before Answering

The key insight in this project is that RAG needs a decision layer before answering. Instead of immediately answering after retrieval, the system must first ask:

“Can this question be answered correctly using these documents?”

This decision should be evidence-based, deterministic, and explicit. This is where an agent fits naturally.

The Query Validation Agent (Real Code)

In this project, the validation logic lives in src/chat-agents/query-validator.agent.ts.

This agent:

  • Does not retrieve
  • Does not answer
  • Only decides

Real responsibility of the agent

It receives the user query, language, and retrieved document snippets and returns:

export interface ValidateQueryResult {
  valid: boolean;
  confidence: number;
  reason: string | null;
  needsClarification: boolean;
}
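
Here is a hedged sketch of how a caller could branch on that result (the function and helper names are illustrative, not the repo's actual chat service):

// Illustrative gate around the validation result (not the repo's chat service).
// Assumes the ValidateQueryResult interface above, plus two hypothetical
// helpers injected by the caller.
async function handleQuery(
  query: string,
  snippets: string[],
  validate: (q: string, s: string[]) => Promise<ValidateQueryResult>,
  answer: (q: string, s: string[]) => Promise<string>,
): Promise<string> {
  const result = await validate(query, snippets);

  if (result.needsClarification) {
    // Do not answer yet -- surface the clarifying question instead.
    return result.reason ?? "Could you clarify what you mean?";
  }

  if (!result.valid) {
    // Retrieval returned something, but it cannot answer this question.
    return "I don't have documents that answer this question.";
  }

  // Only now is it safe to generate a grounded answer.
  return answer(query, snippets);
}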

Evidence-Based Validation (Not Guessing)

The agent is invoked only after a cheap vector probe in src/chat-retrieval/vector-signal.service.ts. This service answers one question only:

“Does the corpus contain anything related to this query?”

If yes, a small set of snippets is passed to the agent. The agent prompt is in src/chat-llm/prompts/query-validator.prompt.ts, with a key instruction:

Decide if the question can be answered using ONLY the snippets below. If the question is ambiguous, ask for clarification.

This forces the LLM to reason only over retrieved evidence, not world knowledge.
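
Below is a minimal sketch of such a probe. It assumes a generic vector-store client and a score threshold; the real logic lives in vector-signal.service.ts and may differ.

// Sketch of a cheap "is there anything relevant at all?" probe.
interface ScoredSnippet {
  text: string;
  score: number; // similarity score returned by the vector store
}

interface VectorStore {
  search(query: string, topK: number): Promise<ScoredSnippet[]>;
}

const RELEVANCE_THRESHOLD = 0.75; // assumed tuning value, not from the repo

async function probeCorpus(
  store: VectorStore,
  query: string,
): Promise<{ hasSignal: boolean; snippets: string[] }> {
  const hits = await store.search(query, 3);
  const relevant = hits.filter((h) => h.score >= RELEVANCE_THRESHOLD);

  // Only invoke the validation agent when there is some signal to reason over.
  return {
    hasSignal: relevant.length > 0,
    snippets: relevant.map((h) => h.text),
  };
}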

What Happens on an Ambiguous Query

User input:

“What is life?”

Retrieved snippets:

“LIFE Mission is a government initiative focused on livelihood inclusion…”

Agent response (real behavior):

{
  "valid": false,
  "confidence": 1.0,
  "reason": "Did you mean LIFE Mission (Livelihood Inclusion and Financial Empowerment Mission)?",
  "needsClarification": true
}

This is the correct behavior: answering immediately would be misleading.

Why This Fixes a Core RAG Limitation

Traditional RAG answers whenever retrieval succeeds. This system answers only when intent and documents align. The agent explicitly separates:

  • Retrieval success
  • Answerability

That separation eliminates confident but wrong answers, silent intent rewrites, and subtle hallucinations.

Making It a Real Chat: Clarification State

Asking a clarification question is not enough; the system must remember it. This project introduces explicit conversation state in src/chat/chat-state.service.ts, stored in Redis:

{
  "pendingClarification": {
    "type": "ENTITY_DISAMBIGUATION",
    "entity": "LIFE Mission"
  }
}
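
A sketch of writing and reading that state is below, using ioredis as an assumed client; the key format and TTL are illustrative, and the real service is chat-state.service.ts.

import Redis from "ioredis";

// Illustrative state shape and key naming -- not the repo's exact code.
interface PendingClarification {
  type: "ENTITY_DISAMBIGUATION";
  entity: string;
}

const redis = new Redis(); // assumes a locally running Redis instance

async function setPendingClarification(
  sessionId: string,
  pending: PendingClarification,
): Promise<void> {
  // Expire the pending question so stale state cannot leak into later turns.
  await redis.set(
    `chat:${sessionId}:pendingClarification`,
    JSON.stringify(pending),
    "EX",
    600,
  );
}

async function getPendingClarification(
  sessionId: string,
): Promise<PendingClarification | null> {
  const raw = await redis.get(`chat:${sessionId}:pendingClarification`);
  return raw ? (JSON.parse(raw) as PendingClarification) : null;
}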

Multi-Turn Flow (End-to-End)

Turn 1

  • User: What is life?
  • System: Did you mean LIFE Mission?

Turn 2

  • User: yes

The backend detects a pending clarification, resolves “yes” deterministically, rewrites the intent internally to “What is LIFE Mission?”, and continues retrieval and answering. This logic lives in src/chat/chat.service.ts.

Importantly: The LLM does NOT “remember.” The backend does.
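
A hedged sketch of that deterministic resolution step (the actual logic is in chat.service.ts; these names and the rewrite template are illustrative):

// Reuses the PendingClarification shape from the earlier sketch.
type PendingClarification = { type: "ENTITY_DISAMBIGUATION"; entity: string };

// Resolve a short confirmation against stored state -- no LLM involved.
function resolveClarification(
  userMessage: string,
  pending: PendingClarification | null,
): { rewrittenQuery: string | null; clearState: boolean } {
  if (!pending) {
    return { rewrittenQuery: null, clearState: false };
  }

  const normalized = userMessage.trim().toLowerCase();

  if (["yes", "y", "yeah", "yep"].includes(normalized)) {
    // Rewrite the intent deterministically and continue retrieval.
    return { rewrittenQuery: `What is ${pending.entity}?`, clearState: true };
  }

  // Anything else clears the pending question and is treated as a fresh query.
  return { rewrittenQuery: null, clearState: true };
}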

Why This Is Better Than Chat History Prompts

Many systems rely on prompting chat history and hoping the model understands “yes.” That approach is non-deterministic, hard to debug, and model-dependent. This design stores intent explicitly, works with any LLM, and is testable and observable.

This is backend engineering, not prompt tricks.

The agent does decision-making, not answering.

Why This Matters in Real Systems

This pattern is critical for policy documents, government data, compliance systems, and enterprise knowledge bases. In these domains, a grounded but wrong answer is worse than no answer.

Asking for clarification is not a UX flaw — it is correctness.

Repository Reference

All concepts in this article are implemented in this repo: 👉 premsgdev/rag-engine

You can:

  • Inspect the agent code
  • Follow the ingestion pipeline
  • See how clarification state is stored
  • Trace the full streaming RAG flow

Workflow Diagram

(Workflow diagram: user query → vector probe → query validation agent → clarification or grounded answer)

Final Thoughts

RAG reduces hallucination by grounding answers in documents. But agentic validation solves a deeper problem: answering the wrong question confidently.

By adding a validation agent that:

  • Reasons over retrieved evidence
  • Detects ambiguity
  • Asks clarifying questions
  • Resumes deterministically

… the system stops confidently answering the wrong question. This is the difference between:

“The model answered something”

and

“The system understood the user”

Previous: Building a Hybrid RAG Engine with Local + Cloud Embeddings