RAG (Retrieval-Augmented Generation)

Definition

RAG combines a retrieval system with a generative language model. Instead of relying on the LLM’s parametric memory alone, RAG retrieves relevant documents from an external knowledge base and passes them to the LLM as context for generation.

User query → Retrieval → Top-k documents → LLM (query + docs) → Response

Core Components

Chunking — Split documents into indexable units (Text Chunking)
Indexing — Embed chunks and store in vector database
Retrieval — Find chunks most relevant to query (Dense Vector Retrieval, Hybrid Search)
Generation — LLM synthesizes answer from retrieved context
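The four components above can be sketched end to end in a few lines. This is a toy illustration under loud assumptions, not a production design: the bag-of-words `embed` function stands in for a real embedding model, the in-memory list stands in for a vector database, and all function names (`chunk`, `build_index`, `retrieve`, `build_prompt`) are hypothetical. The generation step is left as a prompt string that would be sent to an LLM.

```python
import math
import re
from collections import Counter


def chunk(text: str, max_words: int = 20) -> list[str]:
    """Chunking: split a document into fixed-size word windows (no overlap)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(docs: list[str]) -> list[tuple[str, Counter]]:
    """Indexing: embed every chunk and keep (chunk, vector) pairs in memory."""
    return [(c, embed(c)) for d in docs for c in chunk(d)]


def retrieve(index: list[tuple[str, Counter]], query: str, k: int = 2) -> list[str]:
    """Retrieval: return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(query: str, contexts: list[str]) -> str:
    """Generation input: number the sources so the LLM can cite them."""
    sources = "\n".join(f"[{i}] {c}" for i, c in enumerate(contexts, start=1))
    return (
        "Answer using only the sources below and cite them by number.\n\n"
        f"{sources}\n\nQuestion: {query}\nAnswer:"
    )


docs = [
    "BM25 is a sparse ranking function based on term frequency and document length.",
    "Dense retrieval encodes queries and documents into vectors and compares them by similarity.",
]
index = build_index(docs)
question = "How does dense retrieval compare vectors?"
top = retrieve(index, question, k=1)
prompt = build_prompt(question, top)
```

Swapping `embed` for a learned embedding model and the list for a vector store changes the quality of retrieval, not the shape of the pipeline.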

Why RAG Matters for Search

Grounds LLM responses in actual documents (reduces hallucination)
Enables citing sources
Knowledge can be updated without retraining the LLM
Domain adaptation without fine-tuning

RAG Quality Factors

Retrieval quality (most critical):

The correct chunks must be retrieved; the LLM cannot compensate for missing or irrelevant context
Text Chunking strategy affects what’s retrievable
Embedding Fine-tuning improves domain-specific retrieval
Hypothetical Document Embeddings boosts zero-shot recall
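One way to make "correct chunks must be retrieved" measurable is recall@k over a small labeled set. The sketch below assumes you have, for each query, the ranked chunk ids a retriever returned and a hand-labeled set of relevant chunk ids; the function names and the sample ids are illustrative only.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant chunk ids that appear in the top-k retrieved ids."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def mean_recall_at_k(runs: dict[str, list[str]], labels: dict[str, set[str]], k: int) -> float:
    """Average recall@k over every labeled query."""
    scores = [recall_at_k(runs[q], labels[q], k) for q in labels]
    return sum(scores) / len(scores)


# Hypothetical retriever output (ranked chunk ids) and relevance labels.
runs = {"q1": ["c3", "c1", "c9"], "q2": ["c4", "c7", "c2"]}
labels = {"q1": {"c1"}, "q2": {"c2", "c5"}}

score_at_2 = mean_recall_at_k(runs, labels, k=2)
score_at_3 = mean_recall_at_k(runs, labels, k=3)
```

Comparing this number before and after a change to chunking, embeddings, or query expansion shows whether the change actually helped retrieval, independent of the generator.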

Generation quality:

Context window management
Prompt engineering
Cross-encoder reranking before passing to LLM
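Context window management often comes down to packing the best-scoring chunks into a fixed token budget. A minimal sketch, assuming each chunk already carries a relevance score (for example from a cross-encoder reranker, which is not implemented here) and using a whitespace word count as a crude stand-in for a real tokenizer:

```python
def approx_tokens(text: str) -> int:
    """Crude token estimate (whitespace words); a real tokenizer would differ."""
    return len(text.split())


def pack_context(scored_chunks: list[tuple[float, str]], budget: int) -> list[str]:
    """Greedily keep the highest-scoring chunks that fit within the token budget."""
    packed: list[str] = []
    used = 0
    for _score, text in sorted(scored_chunks, key=lambda pair: pair[0], reverse=True):
        cost = approx_tokens(text)
        if used + cost > budget:
            continue  # this chunk does not fit; a smaller one further down may
        packed.append(text)
        used += cost
    return packed


# Hypothetical reranker scores paired with chunk text.
chunks = [
    (0.91, "RAG retrieves documents and passes them to the LLM."),
    (0.40, "Unrelated note about office hours."),
    (0.85, "Reranking reorders candidates with a cross-encoder before generation."),
]
context = pack_context(chunks, budget=18)
```

The greedy policy is only one choice; it favors score over coverage, so a different packing rule may suit long documents with many near-duplicate chunks.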

Agentic RAG

Agentic Search extends RAG by making retrieval iterative:

Agent decides what to retrieve next based on current knowledge state
Multi-step reasoning with multiple retrieval rounds
Tools beyond text search (calculators, APIs, databases)
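The iterative pattern above can be sketched as a loop in which a policy decides the next query from what has been gathered so far. Everything here is a stand-in: `next_query` would be an LLM call in practice, `search` would be a real retriever or tool, and the tiny corpus is invented for illustration.

```python
from typing import Callable, Optional


def agentic_retrieve(
    question: str,
    search: Callable[[str], list[str]],
    next_query: Callable[[str, list[str]], Optional[str]],
    max_rounds: int = 4,
) -> list[str]:
    """Run retrieval rounds until the policy returns None or the round limit is hit."""
    notes: list[str] = []
    for _ in range(max_rounds):
        query = next_query(question, notes)
        if query is None:  # the policy judges the current knowledge state sufficient
            break
        for hit in search(query):
            if hit not in notes:
                notes.append(hit)
    return notes


# Invented two-hop corpus: the second query depends on the first result.
CORPUS = {
    "ranking service owner": ["The search team owns the ranking service."],
    "search team manager": ["Alice manages the search team."],
}


def toy_search(query: str) -> list[str]:
    return CORPUS.get(query, [])


def toy_policy(question: str, notes: list[str]) -> Optional[str]:
    """Rule-based stand-in for an LLM deciding what to retrieve next."""
    if not notes:
        return "ranking service owner"
    if len(notes) == 1:
        return "search team manager"
    return None


notes = agentic_retrieve(
    "Who manages the team that owns the ranking service?", toy_search, toy_policy
)
```

A single-pass retriever would have to answer this question from one query; the loop lets the second query be formed only after the first result is known.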

Search-R1: RL-Trained Multi-Turn Retrieval

Search-R1 takes agentic RAG further by training the model via Reinforcement Learning for Search, with no human-labeled trajectories needed. The model learns to interleave <think> reasoning with <search> queries against a live search engine; the results come back inside <information> blocks, and the model continues reasoning over them. This contrasts with standard RAG’s static index and single-turn retrieval pattern.
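The inference-time control flow of such a tagged rollout can be sketched as follows. This is not the Search-R1 implementation and shows no RL training. It assumes the environment, rather than the model, inserts the <information> block after each <search> query, and that the model ends with an <answer> tag, which the note above does not mention. `scripted_generate` and `toy_search` are hypothetical stand-ins for the policy model and the search engine.

```python
import re
from typing import Callable, Optional

SEARCH_RE = re.compile(r"<search>(.*?)</search>", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def rollout(
    prompt: str,
    generate: Callable[[str], str],
    search: Callable[[str], str],
    max_turns: int = 3,
) -> tuple[str, Optional[str]]:
    """Alternate model segments and search results until an answer appears."""
    transcript = prompt
    for _ in range(max_turns):
        segment = generate(transcript)
        transcript += segment
        answer = ANSWER_RE.search(segment)
        if answer:
            return transcript, answer.group(1).strip()
        query = SEARCH_RE.search(segment)
        if query is None:
            break  # neither a search nor an answer: stop the rollout
        results = search(query.group(1).strip())
        # Assumption: retrieved text is appended by the environment, not generated.
        transcript += f"<information>{results}</information>"
    return transcript, None


def scripted_generate(transcript: str) -> str:
    """Scripted stand-in for the policy model."""
    if "<information>" not in transcript:
        return (
            "<think>I need the owner of the ranking service.</think>"
            "<search>ranking service owner</search>"
        )
    return "<think>The results name the search team.</think><answer>the search team</answer>"


def toy_search(query: str) -> str:
    return "The search team owns the ranking service."


transcript, answer = rollout(
    "Question: Who owns the ranking service?\n", scripted_generate, toy_search
)
```

In an RL setup, a loop of this shape would produce the trajectories that are scored and used to update the policy; that part is out of scope for this sketch.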

Related Concepts

Embeddings — the retrieval component of RAG uses embeddings to find relevant context
Dense Embeddings — typically the retrieval representation in RAG pipelines
Text Chunking — preprocessing for RAG
Dense Vector Retrieval — typical retrieval method in RAG
Hybrid Search — combining sparse + dense for better RAG retrieval
Hypothetical Document Embeddings — query-side improvement
Agentic Search — agentic extension of RAG
Task-Aware Embeddings — improves RAG by task-conditioning queries
Search-R1 — RL-trained evolution of RAG; multi-turn live-web retrieval interleaved with reasoning
Reinforcement Learning for Search — training paradigm that replaces supervised trajectory labeling
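For the Hybrid Search entry above, one common way to combine a sparse and a dense ranking is reciprocal rank fusion (RRF). The note does not say which fusion method is intended, so treat this as one illustrative option; the document ids are made up, and k = 60 is a conventional default rather than a tuned value.

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists: each list contributes 1 / (k + rank) per document."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score, breaking ties by id so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


sparse = ["d1", "d2", "d3"]  # e.g. a keyword ranking
dense = ["d3", "d1", "d4"]   # e.g. a vector-similarity ranking
fused = rrf([sparse, dense])
```

Because RRF uses only ranks, it avoids having to normalize sparse and dense scores onto a common scale.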

Articles

Chunking Strategies for LLM Applications
Evaluating the Ideal Chunk Size for a RAG System using LlamaIndex
Improve your RAG applications by moving to Task-aware Embeddings
Hypothetical Document Embeddings (HyDE)
Agentic Search as an Agile Engineering Process
Agentic Search for Context Engineering — Leonie Monigatti; traces evolution RAG → agentic RAG → context engineering; articulates where single-pass RAG breaks
From RAG to Search-R1 - Evolving Language Models from Knowledge Retrieval to Autonomous Reasoning — Lakshmi Devi Prakash; traces evolution from RAG to RL-based multi-turn search
SEARCH-R1 - Reinforcement Learning-Enhanced Multi-Turn Search and Reasoning for LLMs — technical breakdown of Search-R1 framework
