NDCG (Normalized Discounted Cumulative Gain)

Definition

NDCG is the most widely used ranking quality metric in information retrieval. It measures how well a ranked list puts the most relevant results at the top, accounting for the diminishing value of lower-ranked positions.

Formula

CG (Cumulative Gain) — the simplest form, sums relevance scores with no position weighting:

CG@k = Σᵢ₌₁ᵏ rel_i

CG is position-insensitive — a relevant document at rank 10 counts the same as one at rank 1. Useful when position doesn’t matter (e.g., measuring total coverage), but not for ranking quality.

DCG (Discounted Cumulative Gain) — adds a logarithmic discount so top positions matter more:

DCG@k = Σᵢ₌₁ᵏ  rel_i / log₂(i + 1)

where rel_i is the relevance score of the document at rank i.

IDCG (Ideal DCG): DCG of the perfect ranking.

NDCG:

NDCG@k = DCG@k / IDCG@k

Range: [0, 1]. NDCG@k = 1.0 means perfect ranking.

Relevance Grades

NDCG supports graded relevance (unlike binary metrics):

Grade	Meaning
0	Not relevant
1	Marginally relevant
2	Relevant
3	Highly relevant
4	Perfect match

The logarithmic discount means position 1 is worth much more than position 10 — reflecting user behavior (users rarely scroll far).

Variants: “Flavors of NDCG”

Confusion arises because there are multiple valid formulations:

Variant 1: Standard (Jarvelin & Kekalainen)

DCG = rel₁ + Σᵢ₌₂ᵏ rel_i / log₂(i)

Variant 2: Burges (Microsoft/LightGBM default)

DCG = Σᵢ₌₁ᵏ (2^rel_i - 1) / log₂(i + 1)

The 2^rel - 1 transform emphasizes high-relevance documents more strongly.

Both normalize to NDCG the same way; the IDCG changes accordingly. Be explicit about which variant you use when comparing systems.

NDCG@k Choices

NDCG@3: measures top-3, relevant for autocomplete/instant answers
NDCG@5: common for web search (above the fold)
NDCG@10: most common standard benchmark (MS MARCO, BEIR)
NDCG@100: measures recall quality in first-stage retrieval

Computing NDCG

import numpy as np
 
def dcg_at_k(relevances, k):
    relevances = np.asarray(relevances[:k], dtype=float)
    n = len(relevances)
    if n == 0:
        return 0.0
    discounts = np.log2(np.arange(2, n + 2))
    return np.sum((2**relevances - 1) / discounts)
 
def ndcg_at_k(relevances, k):
    ideal = sorted(relevances, reverse=True)
    return dcg_at_k(relevances, k) / dcg_at_k(ideal, k) if dcg_at_k(ideal, k) > 0 else 0.0

NDCG in Practice

NDCG requires relevance judgments (Judgment Lists) — human-rated (query, document) pairs. Creating judgments is expensive; sampling strategy matters enormously (Search Evaluation).

Benchmark usage: MS MARCO, TREC, BEIR benchmarks use NDCG@10 as primary metric for comparing retrieval systems. ELSER reports 17% NDCG@10 improvement over BM25; Matryoshka Embeddings compares at 99% NDCG relative to full-dimension embeddings.

NDCG Limitations in Agentic Systems

Standard NDCG assumes irrelevant documents are neutral (grade 0). In LLM/RAG pipelines, some documents actively harm accuracy by introducing distractors. UDCG extends NDCG with negative utility scores for distractor documents.

MRR — simpler ranking metric (binary relevance, first relevant result)
MAP — binary-relevance alternative; considers all relevant document positions
Precision and Recall — non-ranking variants
UDCG — NDCG extension with negative utility for distractors (agentic RAG)
Judgment Lists — required input for NDCG computation
Search Evaluation — broader evaluation framework
LLM as Judge — automated relevance judgments

People

Daniel Tunkelang — frequent NDCG commentary; Evaluating Search series
Doug Turnbull — “Flavors of NDCG” post; practical NDCG guidance

Awesome Search KG

Explorer

NDCG (Normalized Discounted Cumulative Gain)

NDCG (Normalized Discounted Cumulative Gain)

Definition

Formula

Relevance Grades

Variants: “Flavors of NDCG”

Variant 1: Standard (Jarvelin & Kekalainen)

Variant 2: Burges (Microsoft/LightGBM default)

NDCG@k Choices

Computing NDCG

NDCG in Practice

NDCG Limitations in Agentic Systems

People

Graph View

Table of Contents

Backlinks

Awesome Search KG

Explorer

NDCG (Normalized Discounted Cumulative Gain)

NDCG (Normalized Discounted Cumulative Gain)

Definition

Formula

Relevance Grades

Variants: “Flavors of NDCG”

Variant 1: Standard (Jarvelin & Kekalainen)

Variant 2: Burges (Microsoft/LightGBM default)

NDCG@k Choices

Computing NDCG

NDCG in Practice

NDCG Limitations in Agentic Systems

Related Concepts

People

Graph View

Table of Contents

Backlinks