MRR (Mean Reciprocal Rank)

Definition

MRR measures the quality of a ranked list by the position of the first relevant result; everything ranked after that result is ignored. It is simpler than NDCG and is most appropriate when each query has exactly one correct answer.

Formula

For a set of queries Q:

MRR = (1/|Q|) × Σ_{q ∈ Q} (1 / rank_q)

where rank_q is the 1-based position of the first relevant result for query q. If query q has no relevant result, its term is taken to be 0.

Example:

  • Query 1: first relevant at rank 1 → 1/1 = 1.0
  • Query 2: first relevant at rank 3 → 1/3 ≈ 0.33
  • Query 3: no relevant result → reciprocal rank taken as 0
  • MRR = (1.0 + 0.33 + 0) / 3 ≈ 0.44
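
The arithmetic in this example can be checked with a few lines of Python. This is a minimal sketch: the `first_relevant_ranks` list and the use of `None` to mark "no relevant result" are assumptions made for illustration, not part of any library.

```python
# Rank of the first relevant result for each query. None marks a query
# with no relevant result (an assumed convention for this sketch).
first_relevant_ranks = [1, 3, None]

# A query with no relevant result contributes a reciprocal rank of 0.
reciprocal_ranks = [0.0 if r is None else 1.0 / r for r in first_relevant_ranks]

mrr = sum(reciprocal_ranks) / len(reciprocal_ranks)
print(round(mrr, 2))  # 0.44
```

Note that the third query still counts in the denominator; dropping it would give (1.0 + 0.33) / 2 instead.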

When to Use MRR

Best for:

  • Question answering: one correct answer exists
  • Known-item search: user knows exactly what they want
  • Informational queries with a single definitive source

Not ideal for:

  • Multi-document retrieval: multiple relevant docs (use NDCG)
  • E-commerce search: many acceptable products (use NDCG or Precision@k)
  • Exploratory search: no single “correct” answer

MRR vs. NDCG

Aspect                | MRR                            | NDCG
----------------------|--------------------------------|----------------------------
Relevance             | Binary (relevant/not)          | Graded (0–4)
Documents counted     | First relevant only            | All top-k
Position sensitivity  | High (first position premium)  | Logarithmic discount
Complexity            | Simple                         | Complex
Best for              | QA, known-item                 | Ranked retrieval generally
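
To make the "first relevant only" versus "all top-k" distinction concrete, the sketch below scores two hypothetical rankings with both metrics. The helper names (`reciprocal_rank`, `dcg`, `ndcg`), the binary labels, and the assumption that the query has exactly three relevant documents are all illustrative. The NDCG here uses the common label / log2(rank + 1) discount, which is one of several variants.

```python
import math

def reciprocal_rank(labels):
    """1/rank of the first relevant (label > 0) item, or 0.0 if none."""
    for rank, label in enumerate(labels, start=1):
        if label > 0:
            return 1.0 / rank
    return 0.0

def dcg(labels):
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(label / math.log2(rank + 1)
               for rank, label in enumerate(labels, start=1))

def ndcg(labels, ideal_labels):
    """DCG of the ranking divided by the DCG of the ideal ordering."""
    ideal = dcg(ideal_labels)
    return dcg(labels) / ideal if ideal > 0 else 0.0

# Assumed scenario: the query has three relevant documents in total.
ideal = [1, 1, 1, 0, 0]
ranking_a = [1, 0, 0, 0, 0]  # finds one relevant doc, misses the other two
ranking_b = [1, 1, 1, 0, 0]  # finds all three at the top

print(reciprocal_rank(ranking_a), reciprocal_rank(ranking_b))  # 1.0 1.0
print(round(ndcg(ranking_a, ideal), 2), round(ndcg(ranking_b, ideal), 2))  # 0.47 1.0
```

Both rankings get the same reciprocal rank because each places a relevant document first, while NDCG separates them because it also rewards the second and third relevant documents.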

Computing MRR

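def mrr_at_k(ranked_results, relevant_docs, k=10):
    """Reciprocal rank of the first relevant doc in the top k for one query."""
    # Pass relevant_docs as a set so each membership check is O(1).
    for rank, doc in enumerate(ranked_results[:k], start=1):
        if doc in relevant_docs:
            return 1.0 / rank
    return 0.0  # no relevant doc within the top k

def mean_mrr(queries_results, queries_relevant, k=10):
    """Average the per-query reciprocal ranks over all queries."""
    if not queries_results:
        return 0.0  # avoid division by zero on an empty query set
    return sum(mrr_at_k(r, rel, k)
               for r, rel in zip(queries_results, queries_relevant)) / len(queries_results)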

Pandas implementation (vectorized, which can be faster for large query sets):

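import pandas as pd

def compute_mrr_pandas(df):
    # df has columns: query_id, doc_id, rank (1-based), relevant (0/1)
    relevant = df[df['relevant'] == 1]
    first_relevant = relevant.groupby('query_id')['rank'].min()
    # Queries with no relevant row are absent from the groupby result;
    # reindex over every query so they contribute a reciprocal rank of 0.
    reciprocal = (1.0 / first_relevant).reindex(df['query_id'].unique(), fill_value=0.0)
    return reciprocal.mean()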

MRR@k

MRR@k considers only the top k positions. If the first relevant doc for a query falls outside them, that query contributes 0:

  • MRR@3: first relevant must be in top 3
  • MRR@10: a common cutoff in benchmarks (for example, MS MARCO passage ranking)
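
The effect of the cutoff can be seen by scoring the same rankings at several values of k. A minimal sketch follows; the function name `reciprocal_rank_at_k`, the doc ids, and the toy rankings are hypothetical.

```python
def reciprocal_rank_at_k(ranked_docs, relevant_docs, k):
    """1/rank of the first relevant doc within the top k, else 0.0."""
    for rank, doc in enumerate(ranked_docs[:k], start=1):
        if doc in relevant_docs:
            return 1.0 / rank
    return 0.0

# Toy data: one ranked list and one set of relevant doc ids per query.
rankings = [["d1", "d2", "d3", "d4", "d5"],
            ["d9", "d8", "d7", "d6", "d5"]]
relevant = [{"d2"}, {"d5"}]

mrr_by_k = {}
for k in (1, 3, 5):
    scores = [reciprocal_rank_at_k(r, rel, k)
              for r, rel in zip(rankings, relevant)]
    mrr_by_k[k] = sum(scores) / len(scores)
    print(f"MRR@{k} = {mrr_by_k[k]:.2f}")
# MRR@1 = 0.00, MRR@3 = 0.25, MRR@5 = 0.35
```

Raising k can only keep MRR@k the same or increase it, since a larger cutoff never removes a relevant result that was already counted.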
