Reranking

Definition

Reranking is a second-stage scoring step: it takes a candidate set that a cheap first stage has retrieved and rescores it with a more expensive, higher-quality model. The reranker sees the query and the document together, which allows richer interaction than first-stage retrievers can afford.

Why Reranking Exists

First-stage retrievers (BM25, Bi-Encoder) must score millions of documents quickly, so they rely on independent query/document representations or simple term statistics. This limits their expressiveness. A reranker only needs to score ~100–1000 candidates, so it can afford deep query-document interaction.

Standard Pipeline

Query
  │
  ▼
First-stage retrieval (BM25 / bi-encoder ANN)   → top-1000 candidates
  │
  ▼
Reranker (cross-encoder or LLM)                 → rescored top-1000
  │
  ▼
Final ranked list (top-10/20 served to user)
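
The pipeline above can be sketched in a few lines. This is a minimal illustration, not a production design: `first_stage`, `rerank`, and `toy_scorer` are hypothetical names, term overlap stands in for BM25 or ANN search, and the "expensive" scorer is a toy function standing in for a cross-encoder or LLM.

```python
from typing import Callable, List


def first_stage(query: str, corpus: List[str], k: int) -> List[int]:
    """Cheap stand-in for BM25/ANN: rank by number of distinct shared terms."""
    q_terms = set(query.lower().split())
    scored = []
    for i, doc in enumerate(corpus):
        overlap = len(q_terms & set(doc.lower().split()))
        if overlap > 0:  # documents with no match are never eligible
            scored.append((overlap, i))
    scored.sort(key=lambda s: (-s[0], s[1]))  # best overlap first, ties by index
    return [i for _, i in scored[:k]]


def rerank(
    query: str,
    corpus: List[str],
    candidates: List[int],
    score_fn: Callable[[str, str], float],
    top_n: int,
) -> List[int]:
    """Rescore only the candidates with the expensive scorer and keep top_n."""
    scored = [(score_fn(query, corpus[i]), i) for i in candidates]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scored[:top_n]]


def toy_scorer(query: str, doc: str) -> float:
    """Hypothetical 'expensive' scorer: share of document tokens that are query terms."""
    q_terms = set(query.lower().split())
    tokens = doc.lower().split()
    return sum(t in q_terms for t in tokens) / len(tokens) if tokens else 0.0


corpus = [
    "neural networks and models for image classification tasks in computer vision research",
    "reranking models",
    "cooking pasta",
]
query = "neural reranking models"
candidates = first_stage(query, corpus, k=1000)                   # eligibility
final = rerank(query, corpus, candidates, toy_scorer, top_n=10)   # order
```

In this toy run the first stage ties the two matching documents and returns them in index order, while the reranker promotes the short, focused document. The third document never reaches the reranker at all.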

Reranker Types

Cross-Encoder (Most Common)

A Cross-Encoder processes the query and document concatenated as a single sequence and produces a relevance score. Because both texts pass through the model together, it sees the full interaction between query and document tokens.

Models: cross-encoder/ms-marco-MiniLM-L-6-v2, Cohere Rerank, bge-reranker-*
Latency: ~50–200ms for 100 candidates on GPU
Quality: significantly better than bi-encoder for nuanced relevance
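
A rough sketch of how cross-encoder reranking is usually wired up: build (query, document) pairs, score them in batches, and sort. The `predict` parameter and `fake_predict` are assumptions made so the example runs without a model download. With the sentence-transformers library, `CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2").predict` is expected to fit the `predict` slot, but that wiring is not exercised here.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]


def cross_encoder_rerank(
    query: str,
    docs: Sequence[str],
    predict: Callable[[List[Pair]], Sequence[float]],
    batch_size: int = 32,
) -> List[Tuple[str, float]]:
    """Score every (query, doc) pair jointly and return docs sorted by score."""
    pairs = [(query, d) for d in docs]
    scores: List[float] = []
    for start in range(0, len(pairs), batch_size):
        batch = pairs[start:start + batch_size]
        scores.extend(float(s) for s in predict(batch))
    return sorted(zip(docs, scores), key=lambda x: x[1], reverse=True)


def fake_predict(pairs: List[Pair]) -> List[float]:
    """Stand-in model (assumption): Jaccard overlap of query and doc terms."""
    out = []
    for q, d in pairs:
        qs, ds = set(q.lower().split()), set(d.lower().split())
        out.append(len(qs & ds) / len(qs | ds) if qs | ds else 0.0)
    return out


docs = ["banana bread recipe", "how to rerank search results"]
ranked = cross_encoder_rerank("rerank search results", docs, fake_predict)
```

Batching matters because the model runs once per pair, so cost grows linearly with the number of candidates.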

LLM-as-Reranker

An LLM-as-reranker uses a large language model (see LLM as Judge) to score candidates one at a time or to rank them as a list. Quality can be higher, at much higher cost.

Pointwise: “Is this document relevant to the query? Yes/No”
Listwise: “Rank these 10 documents by relevance”
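
The two prompting styles can be sketched as plain string builders plus a parser for the listwise reply. Everything here is an assumption for illustration: the function names, the prompt wording beyond the two questions above, and the `[2] > [1] > [3]` answer format. No LLM client is called, since any API would do.

```python
import re
from typing import List


def pointwise_prompt(query: str, doc: str) -> str:
    """One prompt per document; the model answers Yes or No."""
    return (
        f"Query: {query}\nDocument: {doc}\n"
        "Is this document relevant to the query? Answer Yes or No."
    )


def listwise_prompt(query: str, docs: List[str]) -> str:
    """One prompt for the whole list; the model returns an ordering."""
    numbered = "\n".join(f"[{i + 1}] {d}" for i, d in enumerate(docs))
    return (
        f"Query: {query}\n{numbered}\n"
        "Rank these documents by relevance, most relevant first. "
        "Answer with identifiers only, e.g. [2] > [1] > [3]."
    )


def parse_listwise(reply: str, n_docs: int) -> List[int]:
    """Turn a reply like '[2] > [1]' into zero-based indices.

    Invalid or repeated identifiers are dropped, and any documents the model
    omitted are appended in their original order.
    """
    order: List[int] = []
    for m in re.findall(r"\[(\d+)\]", reply):
        idx = int(m) - 1
        if 0 <= idx < n_docs and idx not in order:
            order.append(idx)
    order.extend(i for i in range(n_docs) if i not in order)
    return order
```

The defensive parsing is a design choice: LLM replies can skip, repeat, or invent identifiers, and a reranker should still return every candidate exactly once.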

ColBERT Late Interaction

ColBERT (late interaction) sits between bi-encoder speed and cross-encoder quality: it is efficient enough to serve as the first stage in some setups, and it is also used as a reranker.
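
Late interaction is usually described as a MaxSim operation: each query token embedding is matched to its most similar document token embedding, and those maxima are summed. The sketch below assumes token embeddings are already computed and uses tiny hand-written vectors; `maxsim_score` is a hypothetical helper name, not a library function.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: List[Vector], doc_vecs: List[Vector]) -> float:
    """For each query token take its best-matching doc token, then sum the maxima."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Two query tokens and two candidate documents (toy 2-d embeddings).
query_vecs = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.5, 0.5]]
doc_b = [[0.0, 1.0], [0.0, 1.0]]
score_a = maxsim_score(query_vecs, doc_a)
score_b = maxsim_score(query_vecs, doc_b)
```

Document embeddings can be computed offline, as with a bi-encoder, while the token-level matching happens at query time. That split is what places this approach between the other two.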

Reranking in RAG

In RAG pipelines, reranking is critical: the LLM context window is limited, so only the top 3–5 chunks are included. A reranker narrows 50–100 retrieved chunks down to the best ones before LLM generation.
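
The narrowing step can be sketched as a selection over reranker scores under a context budget. This is an illustrative assumption rather than a standard recipe: `select_context` is a hypothetical helper, the default limits are arbitrary, and whitespace splitting is a crude stand-in for a real tokenizer.

```python
from typing import List, Tuple


def select_context(
    scored_chunks: List[Tuple[str, float]],
    top_k: int = 5,
    max_tokens: int = 1000,
) -> List[str]:
    """Keep the highest-scoring chunks, at most top_k, that fit the token budget."""
    chosen: List[str] = []
    used = 0
    for text, _score in sorted(scored_chunks, key=lambda c: c[1], reverse=True):
        if len(chosen) == top_k:
            break
        cost = len(text.split())  # crude whitespace token count (assumption)
        if used + cost > max_tokens:
            continue  # skip chunks that would overflow; smaller ones may still fit
        chosen.append(text)
        used += cost
    return chosen


chunks = [("a b c", 0.2), ("d e", 0.9), ("f g h i j k", 0.5), ("l", 0.1)]
context = select_context(chunks, top_k=2, max_tokens=100)
```

The selected chunks come back in score order. Whether to present them to the LLM in that order is a separate prompt-design decision.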

Tradeoffs

	First-Stage	Reranker
Throughput	Millions of docs/sec	Hundreds of docs/sec
Latency	<10ms	50–500ms
Quality	Good	Excellent
Cost	Low	Higher

When Reranking Becomes a System Boundary

From "When Reranking Becomes a System Boundary" (Ravindra Harige):

Retrieval defines eligibility; reranking defines order. If a document is not retrieved, no downstream stage can recover it.

Ranking as a Projection

Reranking does not redo retrieval-time computation (term matches, field contributions, BM25 components). It operates on a compressed, lossy representation of what survived into the candidate set. This is structural — not a failure of implementation.

Compensatory Reranking

The system crosses a boundary when performance gains come from widening the rerank window rather than improving retrieval. At that point the window size is load-bearing (not a latency knob), and reranking has become compensatory.
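
One possible way to check whether the window is load-bearing is to sweep the window size over the first-stage ranking and see where the relevant documents sit. The helper names and the toy data below are assumptions. The value at each window is the recall available to the reranker, which is an upper bound on what any reranker can surface.

```python
from typing import Dict, Sequence, Set


def recall_at(ranked_ids: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of relevant documents found in the top k of the ranking."""
    if not relevant:
        return 0.0
    return len(set(ranked_ids[:k]) & relevant) / len(relevant)


def window_sweep(
    first_stage_ranking: Sequence[str],
    relevant: Set[str],
    windows: Sequence[int],
) -> Dict[int, float]:
    """Recall available to the reranker at each rerank window size."""
    return {w: recall_at(first_stage_ranking, relevant, w) for w in windows}


ranking = [f"d{i}" for i in range(1, 11)]  # first-stage order: d1 .. d10
relevant = {"d2", "d9"}                    # toy relevance judgments
sweep = window_sweep(ranking, relevant, windows=[5, 10])
```

If recall keeps climbing as the window widens, as it does in this toy case, the relevant documents are sitting deep in the first-stage list and the reranker is doing retrieval's job.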

Evaluation Split

Stage	Metric	Blindspot
Retrieval	Recall@K	NDCG can improve while recall is weak
Reranking	NDCG, MRR	Metrics improve while user-visible relevance plateaus
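
For reference, the metrics in the table can be written out as below. These are standard textbook forms with binary relevance for Recall@K and MRR and graded gains for NDCG; the function names are assumptions. The usage lines sketch one way the retrieval blindspot can arise: if NDCG is computed only against documents that were retrieved, it can be perfect while recall is poor.

```python
import math
from typing import Dict, Sequence, Set


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Share of all relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def mrr(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Reciprocal rank of the first relevant document (0.0 if none)."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked: Sequence[str], gains: Dict[str, float], k: int) -> float:
    """DCG of the ranking divided by the DCG of the ideal ordering of `gains`."""
    dcg = sum(gains.get(d, 0.0) / math.log2(i + 2) for i, d in enumerate(ranked[:k]))
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


ranked = ["a", "b", "c"]            # reranked candidates
relevant = {"a", "z"}               # "z" was never retrieved
retrieved_gains = {"a": 1.0}        # judgments restricted to retrieved docs
recall = recall_at_k(ranked, relevant, 3)
ndcg_seen = ndcg_at_k(ranked, retrieved_gains, 3)
```

Here `ndcg_seen` looks perfect because the missing document is invisible to it, while `recall` shows that half of the relevant set never reached the reranker.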

Retrieval (engineering) and reranking (ML/data science) are owned by different teams with different metrics. Neither dashboard shows the full picture. The gap closes only when someone is accountable for the space between them.

Related Concepts

Retrieval Pipeline — the multi-stage architecture reranking fits into
Cross-Encoder — primary reranking architecture
Bi-Encoder — first-stage retriever that feeds the reranker
ColBERT — late interaction alternative
LLM as Judge — LLM-based reranking
RAG — key use case for reranking
Learning to Rank — related family of ranking approaches
