UDCG (Utility and Distraction-aware Cumulative Gain)

Definition

UDCG is a retrieval evaluation metric designed for agentic and RAG systems where retrieved documents are consumed by an LLM rather than scanned by a human. It extends NDCG by assigning negative utility scores to distractors — passages that cause the model to answer incorrectly.

Introduced by Trappolini et al. to address the fundamental inadequacy of traditional IR metrics in the agentic era.

Why Standard Metrics Fail

NDCG, MAP, and MRR were designed for human-facing search, where:

Relevance grades are monotonically non-negative (irrelevant = 0, perfect = 4)
Irrelevant documents are neutral — they waste a position but don’t actively harm the user
Users can skip past bad results with minimal cost

Neither assumption holds for LLMs:

Retrieved documents are injected into context and processed in parallel — there is no skipping
Some passages are actively harmful: semantically plausible but factually wrong content that leads the model to incorrect conclusions
A retriever can score high on nDCG@10 while consistently poisoning the context with high-confidence distractors

The UDCG Formula

UDCG assigns utility scores based on model behavior:

u(d) > 0 — document helps the model answer correctly
u(d) = 0 — document is neutral / irrelevant
u(d) < 0 — document is a distractor (causes an incorrect answer)

UDCG@k = Σᵢ₌₁ᵏ  u(dᵢ) / log₂(i + 1)

Utility is derived empirically: include/exclude each document and observe whether model accuracy changes.

Key Insight

Precision is more valuable than recall in the agentic era, because a miss costs less than a distractor.

UDCG exposes that distractor injection — not just irrelevance — is the principal failure mode of retrieval in LLM pipelines. Systems optimizing for nDCG may produce worse agentic outcomes than systems with lower nDCG but fewer distractors.

Reported improvement: up to 36% better correlation with end-to-end answer accuracy compared to traditional metrics.

Implications for Retrieval Design

Move from relevance to utility as the optimization target
Treat context like an interface with explicit failure modes (see Clean Context)
Prefer abstention over injecting uncertain passages
Use dynamic-k retrieval (stop when confidence drops) rather than fixed top-k

NDCG — standard metric that UDCG extends with negative utility
Clean Context — the goal UDCG incentivizes: context free of distractors
Diversity Metrics — another extension of NDCG for coverage; UDCG extends it for harm
Agentic Search — the context where UDCG matters most
Search Evaluation — broader evaluation framework

People

Trappolini et al. — UDCG authors
Lester Solbakken — UDCG metric advocacy and dynamic-k/abstention design

Mutually Assured Distraction

Awesome Search KG

Explorer

UDCG (Utility and Distraction-aware Cumulative Gain)

UDCG (Utility and Distraction-aware Cumulative Gain)

Definition

Why Standard Metrics Fail

The UDCG Formula

Key Insight

Implications for Retrieval Design

People

Graph View

Table of Contents

Backlinks

Awesome Search KG

Explorer

UDCG (Utility and Distraction-aware Cumulative Gain)

UDCG (Utility and Distraction-aware Cumulative Gain)

Definition

Why Standard Metrics Fail

The UDCG Formula

Key Insight

Implications for Retrieval Design

Related Concepts

People

Related Articles

Graph View

Table of Contents

Backlinks