Precision and Recall

Definition

Precision: Of the documents retrieved, what fraction are relevant?
Recall: Of all relevant documents, what fraction were retrieved?

Precision = |Retrieved ∩ Relevant| / |Retrieved|
Recall    = |Retrieved ∩ Relevant| / |Relevant|

The Precision-Recall Tradeoff

Precision and recall are in tension:

Higher recall: retrieve more documents → more relevant ones found, but also more irrelevant ones → lower precision
Higher precision: be selective → top results are relevant, but some relevant docs missed → lower recall

This is the fundamental tradeoff in information retrieval. There is no “best” point — the optimal depends on the use case.

Precision@k (P@k)

Instead of measuring over the full ranked list, measure precision at a cutoff:

P@k = (relevant documents in top k) / k

Examples:

P@1: is the first result relevant? (Click-satisfaction proxy)
P@5: are most of the first page relevant?
P@10: standard IR evaluation cutoff

Unlike NDCG, P@k ignores document ranking within the cutoff — all positions within k are equally weighted.

Recall@k

Fraction of relevant documents found in top k:

Recall@k = (relevant documents in top k) / |all relevant documents|

High Recall@1000 is the target for first-stage retrieval in Retrieval Pipeline:

First stage needs to recall nearly all relevant docs
Second stage (Cross-Encoder reranker) handles ranking quality
Missing a relevant doc at stage 1 = permanent recall loss

F1 Score

Harmonic mean of precision and recall:

F1 = 2 × (Precision × Recall) / (Precision + Recall)

F1 treats precision and recall equally. Use when both matter and you want a single number.

When to Use Each Metric

Metric	Best for
P@1	First result quality (navigational queries)
P@5	Above-the-fold quality
P@10	Standard page quality
Recall@1000	First-stage retrieval coverage
NDCG@10	Overall ranking quality with graded relevance
MRR	Known-item / QA systems
F1	Classification tasks, information extraction

Precision vs. NDCG

P@k treats all positions equally within k — a relevant document at rank 1 = rank k.
NDCG discounts lower ranks — a relevant document at rank 1 is worth more than at rank k.

NDCG is generally preferred for ranked retrieval evaluation.

NDCG — more nuanced ranking metric
MRR — position-aware precision for first relevant document
Search Evaluation — how these feed into evaluation
Judgment Lists — relevance labels needed
Retrieval Pipeline — Recall@1000 critical for first stage

People

Daniel Tunkelang — precision/recall in search quality discussions
Doug Turnbull — practical P@k and Recall@k usage

Awesome Search KG

Explorer

Precision and Recall

Precision and Recall

Definition

The Precision-Recall Tradeoff

Precision@k (P@k)

Recall@k

F1 Score

When to Use Each Metric

Precision vs. NDCG

People

Graph View

Table of Contents

Backlinks

Awesome Search KG

Explorer

Precision and Recall

Precision and Recall

Definition

The Precision-Recall Tradeoff

Precision@k (P@k)

Recall@k

F1 Score

When to Use Each Metric

Precision vs. NDCG

Related Concepts

People

Graph View

Table of Contents

Backlinks