RAG (Retrieval-Augmented Generation)

Definition

RAG combines a retrieval system with a generative language model. Instead of relying on the LLM’s parametric memory alone, RAG retrieves relevant passages from an external knowledge base and passes them to the LLM as context for generation.

User query → Retrieval → Top-k documents → LLM (query + docs) → Response

Core Components

  1. Chunking — Split documents into indexable units (Text Chunking)
  2. Indexing — Embed chunks and store in vector database
  3. Retrieval — Find chunks most relevant to query (Dense Vector Retrieval, Hybrid Search)
  4. Generation — LLM synthesizes answer from retrieved context

Benefits

  • Grounds LLM responses in actual documents (reduces hallucination)
  • Enables citing sources
  • Knowledge can be updated without retraining the LLM
  • Domain adaptation without fine-tuning
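
The four components listed above (chunking, indexing, retrieval, generation) can be sketched end to end in a few lines. This is a minimal illustration under loud assumptions, not a production design: the bag-of-words `embed` function is a toy stand-in for a real embedding model, the in-memory list stands in for a vector database, and the function names (`chunk_text`, `build_index`, `retrieve`, `build_prompt`) are hypothetical. The generation step is shown only as prompt construction, since the LLM call depends on whichever model is used.

```python
import math
import re
from collections import Counter


def chunk_text(text, max_words=40):
    """Chunking: split a document into fixed-size word windows."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents, max_words=40):
    """Indexing: embed every chunk and keep (chunk, vector) pairs in memory."""
    return [(c, embed(c)) for doc in documents for c in chunk_text(doc, max_words)]


def retrieve(index, query, k=2):
    """Retrieval: return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(query, chunks):
    """Generation input: the query plus numbered retrieved chunks."""
    context = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


if __name__ == "__main__":
    docs = [
        "The Eiffel Tower is located in Paris, France.",
        "Mount Everest is the tallest mountain on Earth.",
    ]
    index = build_index(docs)
    question = "Where is the Eiffel Tower located?"
    print(build_prompt(question, retrieve(index, question, k=1)))
```

Numbering the chunks in the prompt is one simple way to let the model cite sources, since each claim in the answer can point back to a bracketed chunk number.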

RAG Quality Factors

Retrieval quality (most critical):

Generation quality:

  • Context window management
  • Prompt engineering
  • Cross-encoder reranking before passing to LLM
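
Reranking and context window management interact: once chunks have relevance scores, only as many as fit the budget should reach the LLM. The sketch below assumes the scores already come from some reranker (for example a cross-encoder, which is not implemented here) and approximates token counts by whitespace-separated words. The function name `pack_context` and the greedy skip-if-too-large policy are illustrative choices, not a prescribed method.

```python
def pack_context(scored_chunks, max_tokens):
    """Greedily keep the highest-scoring chunks that fit in the token budget.

    scored_chunks: iterable of (score, chunk_text) pairs, higher score = more relevant.
    max_tokens: budget for retrieved context (a crude word count is used here).
    """
    selected, used = [], 0
    for _score, chunk in sorted(scored_chunks, key=lambda pair: pair[0], reverse=True):
        cost = len(chunk.split())  # crude token estimate, an assumption of this sketch
        if used + cost <= max_tokens:
            selected.append(chunk)
            used += cost
    return selected


if __name__ == "__main__":
    candidates = [(0.2, "a b c"), (0.9, "d e f g"), (0.5, "h i j k l")]
    print(pack_context(candidates, max_tokens=7))
```

A real system would count tokens with the target model's tokenizer, since word counts can underestimate the true length.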

Agentic RAG

Agentic Search extends RAG by making retrieval iterative:

  • Agent decides what to retrieve next based on current knowledge state
  • Multi-step reasoning with multiple retrieval rounds
  • Tools beyond text search (calculators, APIs, databases)
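
The iterative pattern in the bullets above can be sketched as a loop in which a planner looks at the evidence gathered so far and either issues another query or stops. Everything named here is hypothetical: `plan_next_query` stands in for the agent's decision step (in practice an LLM call), `search` for any retrieval tool, and the toy knowledge base contains made-up facts purely for illustration.

```python
def agentic_retrieve(question, plan_next_query, search, max_rounds=4):
    """Run retrieval rounds until the planner has nothing left to ask."""
    evidence = []
    for _ in range(max_rounds):
        # The agent decides what to retrieve next from its current knowledge state.
        query = plan_next_query(question, evidence)
        if query is None:  # planner judges the evidence sufficient
            break
        evidence.extend(search(query))
    return evidence


# Toy tool and planner for a two-hop question (made-up facts).
TOY_KB = {
    "lead of project atlas": ["Project Atlas is led by Dana."],
    "office of dana": ["Dana works in Building 7."],
}


def toy_search(query):
    return TOY_KB.get(query, [])


def toy_planner(question, evidence):
    if not evidence:
        return "lead of project atlas"  # first hop: who leads the project?
    if len(evidence) == 1:
        return "office of dana"         # second hop depends on the first result
    return None                         # enough evidence collected


if __name__ == "__main__":
    print(agentic_retrieve("Where does the lead of Project Atlas work?",
                           toy_planner, toy_search))
```

The `max_rounds` cap is a design choice worth keeping in any real loop, since a planner that never returns `None` would otherwise retrieve indefinitely.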

Search-R1: RL-Trained Multi-Turn Retrieval

Search-R1 takes agentic RAG further by training the model via Reinforcement Learning for Search, with no human-labeled trajectories needed. The model learns to interleave <think> reasoning with <search> queries, and the search engine’s results are fed back inside <information> blocks, so the model queries a live search engine iteratively while it reasons. This contrasts with standard RAG’s static index and single-turn retrieval pattern.
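
The inference-time control flow of such a tagged, multi-turn rollout can be sketched as follows. This is only an illustration of the loop, not Search-R1's implementation, and it does not cover the RL training itself. The `generate` and `search` callables are hypothetical stand-ins for the policy model and the search engine, the `<answer>` tag is an assumption added here so the loop has a stopping condition, and the scripted model below returns made-up text.

```python
import re

SEARCH_RE = re.compile(r"<search>(.*?)</search>", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def rollout(question, generate, search, max_turns=4):
    """Alternate model generation and search calls until an answer appears."""
    transcript = question
    for _ in range(max_turns):
        segment = generate(transcript)  # model emits text up to </search> or </answer>
        transcript += segment
        answer = ANSWER_RE.search(segment)
        if answer:
            return answer.group(1).strip(), transcript
        query = SEARCH_RE.search(segment)
        if query is None:
            break  # neither tag was produced, so stop the rollout
        results = search(query.group(1).strip())
        # Retrieved text is appended by the environment, not generated by the model.
        transcript += f"<information>{results}</information>"
    return None, transcript


# Scripted stand-ins so the loop can be run without a model (made-up content).
def scripted_generate(transcript):
    if "<information>" not in transcript:
        return "<think>I need to look this up.</think><search>capital of Freedonia</search>"
    return "<think>The result names the capital.</think><answer>Fredville</answer>"


def scripted_search(query):
    return "Fredville is the capital of Freedonia."


if __name__ == "__main__":
    final_answer, full_transcript = rollout(
        "What is the capital of Freedonia?", scripted_generate, scripted_search
    )
    print(final_answer)
    print(full_transcript)
```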

Articles