Demystifying nDCG and ERR

Source: https://opensourceconnections.com/blog/2019/12/09/demystifying-ndcg-and-err/
Publisher: OpenSource Connections

Summary

A clear explainer contrasting NDCG (Normalized Discounted Cumulative Gain) with ERR (Expected Reciprocal Rank), two ranking metrics that model user behavior differently. Aimed at search practitioners who have heard of both but aren’t sure when to use each.

NDCG Review

NDCG assumes:

  • All relevant documents are worth finding
  • Position matters (logarithmic discount)
  • Documents don’t “compete” — each has independent value

This is the independent utility model: each document contributes its relevance independently.
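The independent utility model can be sketched in a few lines of Python. This is a minimal illustration rather than the article's own code: it assumes the common exponential gain 2^g - 1 and a log2(rank + 1) discount, and it builds the ideal ranking from the same list of grades. The function names `dcg` and `ndcg` are chosen here for illustration.

```python
import math

def dcg(grades):
    """Discounted cumulative gain for relevance grades listed in ranked order."""
    # Assumption: exponential gain (2^g - 1) and a log2(rank + 1) discount,
    # with ranks starting at 1. Each document adds its term independently.
    return sum((2 ** g - 1) / math.log2(rank + 1)
               for rank, g in enumerate(grades, start=1))

def ndcg(grades):
    """DCG divided by the DCG of the same grades sorted best-first."""
    ideal = dcg(sorted(grades, reverse=True))
    return dcg(grades) / ideal if ideal > 0 else 0.0

print(ndcg([3, 0, 2, 1]))  # below 1.0: the grade-2 document sits under an irrelevant one
print(ndcg([3, 2, 1, 0]))  # 1.0: already in ideal order
```

Because every term in the sum depends only on that document's grade and position, adding a relevant document lower in the list always raises DCG, whatever sits above it.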

ERR (Expected Reciprocal Rank)

ERR models users as a cascade: they read results from top to bottom, and their probability of being satisfied at rank i depends on not having been satisfied at ranks 1 through i-1.

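$$\mathrm{ERR} = \sum_{i=1}^{n} \frac{1}{i} \, R_i \prod_{j=1}^{i-1} (1 - R_j)$$

Where n is the number of ranked results and Rᵢ is the probability that the result at rank i satisfies the user, a function of its relevance grade. A common mapping is Rᵢ = (2^g - 1) / 2^g_max for a grade g on a 0 to g_max scale.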

Intuition: If the first result is perfect (R₁ = 1.0), the user is satisfied and doesn’t look further. A highly relevant document at rank 3 contributes almost nothing if ranks 1 and 2 are also highly relevant, because the user has most likely stopped before reaching it.
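The cascade can be sketched as a single pass down the ranking that tracks the probability the user is still reading. This is a minimal sketch, assuming grades on a 0 to 4 scale and the grade-to-probability mapping (2^g - 1) / 2^max_grade; the names `satisfaction_prob`, `err`, and `max_grade` are illustrative. Under this mapping the top grade gives R = 15/16 rather than exactly 1.0, so a small amount of credit still reaches lower ranks.

```python
def satisfaction_prob(grade, max_grade=4):
    """Map a relevance grade to a probability of satisfying the user.

    Assumption: grades run from 0 to max_grade.
    """
    return (2 ** grade - 1) / 2 ** max_grade

def err(grades, max_grade=4):
    """Expected reciprocal rank for relevance grades listed in ranked order."""
    score = 0.0
    p_reach = 1.0  # probability the user arrives at this rank still unsatisfied
    for rank, grade in enumerate(grades, start=1):
        r = satisfaction_prob(grade, max_grade)
        score += p_reach * r / rank  # satisfied exactly here, credited 1/rank
        p_reach *= 1.0 - r           # otherwise the user moves on to the next rank
    return score

print(err([4]))     # one top-grade result at rank 1
print(err([4, 4]))  # a second top-grade result adds very little
print(err([0, 4]))  # the same result pushed down to rank 2
```

Comparing the first two calls shows the effect described above: once rank 1 is very likely to satisfy the user, the term for rank 2 is multiplied by a small `p_reach` and barely moves the score.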

NDCG vs. ERR: Key Difference

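| Scenario | NDCG | ERR |
|---|---|---|
| Perfect result at rank 1, more relevant at rank 5 | Rewards rank 5 (adds to DCG) | Minimal reward (user already satisfied) |
| Two good results at rank 1 and 2 | Sums both | Rank 2 partially discounted (user may stop at 1) |
| Ambiguous query needing multiple angles | Rewards every relevant result, so several angles all count | Gives little credit after the first satisfying result |
| Known-item search | Similar to MRR, but with a logarithmic discount: 1/log2(1 + rank) | Proportional to reciprocal rank: R/rank for the single relevant result |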

When to Use Each

NDCG is better for:

  • Research evaluation (multiple relevant docs expected)
  • Exploratory/informational queries
  • E-commerce (user wants to see multiple options)

ERR is better for:

  • Navigational queries (user wants one specific result)
  • Known-item search
  • When “one great result > two good results” is the right model

Normalized vs. Unnormalized

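Both DCG and ERR can be reported normalized (divided by the score of the ideal ranking for the query) or unnormalized:

  • Normalized (NDCG, normalized ERR): [0,1] range, with 1 meaning the ideal ordering; enables comparison across queries and query sets of differing difficulty
  • Unnormalized (DCG, raw ERR): measures absolute quality; harder to compare across queries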
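Normalization can be sketched as a generic wrapper that divides a metric by its value on the best possible ordering of the same grades. This is an illustrative sketch: `normalized` is a hypothetical helper, `dcg` assumes exponential gain with a log2 discount, and the two example queries are made up to show a relevance-rich query next to a sparse one.

```python
import math

def dcg(grades):
    """Unnormalized DCG (assumed form: exponential gain, log2 discount)."""
    return sum((2 ** g - 1) / math.log2(rank + 1)
               for rank, g in enumerate(grades, start=1))

def normalized(metric, grades):
    """Divide a metric by its value on the same grades sorted best-first."""
    ideal = metric(sorted(grades, reverse=True))
    return metric(grades) / ideal if ideal > 0 else 0.0

rich_query = [3, 3, 2]    # many relevant documents exist for this query
sparse_query = [1, 0, 0]  # only one marginally relevant document exists

# Raw DCG differs widely even though both rankings are the best achievable.
print(dcg(rich_query), dcg(sparse_query))
# After normalization both rankings score 1.0.
print(normalized(dcg, rich_query), normalized(dcg, sparse_query))
```

Any function that scores a list of grades, such as an ERR implementation, could be passed as `metric` in the same way.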