Hypothetical Document Embeddings (HyDE)

HyDE addresses the weak generalization of embedding-based retrievers. Instead of embedding the query directly, it uses an instruction-following LLM to generate “hypothetical documents” that capture the patterns a relevant document would contain, and retrieves with those.

When to use it

Retrieval performance is poor (low recall)
Your pipeline queries against large document collections
Your domain data differs substantially from the retriever’s training data
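
To check the first condition, recall can be measured on a small labelled set of queries before and after adding HyDE. The following is a minimal sketch, assuming each query has a known set of relevant document IDs; the function name and the ID format are illustrative and not part of any library.

```python
from typing import Sequence, Set


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        raise ValueError("relevant set must not be empty")
    # Count relevant IDs among the first k retrieved IDs.
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)
```

For example, if a query has two relevant documents and only one of them appears in the top three results, recall at 3 is 0.5.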

How it works

An LLM generates about five hypothetical documents for the query
Each hypothetical document is encoded into an embedding vector
The vectors are averaged into a single consolidated embedding
That embedding is used for vector similarity search against real documents

The averaged embedding captures the semantic space of relevant content rather than the query string itself.
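
The four steps above can be sketched in plain Python. This is an illustration under stated assumptions, not a production implementation: `toy_embed` is a hashed bag-of-words stand-in for a real embedding model, `fake_generate` is a stub where an LLM call would go, and all function names are hypothetical.

```python
import hashlib
import math
from typing import Callable, List

DIM = 256  # toy embedding size (assumption; real models define their own)


def toy_embed(text: str) -> List[float]:
    """Hashed bag-of-words vector; a stand-in for a real embedding model."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        idx = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % DIM
        vec[idx] += 1.0
    return vec


def mean_vector(vectors: List[List[float]]) -> List[float]:
    """Element-wise mean of equally sized vectors (step 3)."""
    if not vectors:
        raise ValueError("need at least one vector")
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def hyde_search(
    query: str,
    generate: Callable[[str, int], List[str]],
    corpus: List[str],
    n_docs: int = 5,
    top_k: int = 3,
) -> List[str]:
    """Rank corpus documents against the averaged hypothetical embedding."""
    hypothetical = generate(query, n_docs)                        # step 1
    hyde_vec = mean_vector([toy_embed(d) for d in hypothetical])  # steps 2-3
    scored = [(cosine(hyde_vec, toy_embed(doc)), doc) for doc in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)           # step 4
    return [doc for _, doc in scored[:top_k]]


def fake_generate(query: str, n: int) -> List[str]:
    """Stub for the LLM call: returns n canned hypothetical answers."""
    return [
        f"Aspirin relieves headache pain and reduces fever (draft {i})"
        for i in range(n)
    ]
```

Note that the query "what helps a headache?" shares almost no vocabulary with a document such as "Aspirin reduces fever and relieves pain", but the hypothetical answers do, which is why the search is run with their averaged embedding rather than with the query's.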

Haystack implementation

Components needed:

OpenAI generator (configured for multiple outputs)
Prompt builder
Output adapter
Document embedder
Custom component to compute mean of hypothetical document embeddings
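
The last item in the list, the custom averaging component, might look roughly like the sketch below. It is deliberately dependency-free: `EmbeddedDoc` is a hypothetical stand-in for a document object that carries an `embedding` attribute, and the class and output key names are assumptions. In Haystack 2.x a custom component is typically a class with a `run` method that returns a dictionary and is registered with the `@component` decorator; the decorator is omitted here so the example runs without Haystack installed.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class EmbeddedDoc:
    """Hypothetical stand-in for a document object that carries an embedding."""
    content: str
    embedding: Optional[List[float]] = None


class HypotheticalDocumentEmbedder:
    """Averages the embeddings of hypothetical documents into one vector."""

    def run(self, documents: List[EmbeddedDoc]) -> Dict[str, List[float]]:
        # Skip documents that were never embedded.
        vectors = [d.embedding for d in documents if d.embedding is not None]
        if not vectors:
            raise ValueError("no embedded documents to average")
        dim = len(vectors[0])
        if any(len(v) != dim for v in vectors):
            raise ValueError("embeddings must share one dimension")
        n = len(vectors)
        mean = [sum(col) / n for col in zip(*vectors)]
        # Dict output mirrors the named-output style used by pipeline components.
        return {"hypothetical_embedding": mean}
```

The returned vector would then be passed to whatever embedding retriever searches the real document store.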

HyDE
Dense Embeddings
Embeddings
RAG
Semantic Search
