Explicit Relevance Evaluation with Probability-Proportional-to-Size Sampling

Explicit relevance evaluation means having domain experts manually rate the quality of search results. It requires less initial investment than implicit methods such as click analysis, and it gives a stronger, more direct signal.

Why PPTSS over simple random sampling

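Probability-proportional-to-size sampling (PPTSS) gives each query a chance of selection proportional to its traffic volume. Frequent queries are weighted appropriately, less common ones can still be included, and the resulting sample mirrors actual user traffic patterns. Simple random sampling over the list of distinct queries gives every query the same chance regardless of volume, so it over-represents tail queries.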
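One standard way to implement probability-proportional-to-size selection is systematic PPS sampling: lay the queries end to end on a line where each occupies a length equal to its traffic count, then pick points at a fixed interval from a random start. The sketch below assumes query traffic is available as a dict mapping query string to count (for example, from search logs over a fixed period); the function name `systematic_pps_sample` and that input shape are hypothetical, not part of any particular tool.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def systematic_pps_sample(query_counts, n, seed=None):
    """Systematic PPS sample of up to n distinct queries (hypothetical helper).

    query_counts: dict mapping query string -> traffic count (assumed input).
    A query's selection probability is proportional to its count, capped at 1
    for queries whose count reaches the sampling interval (total / n).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    # Zero-traffic queries can never be selected, so drop them up front.
    queries = [q for q, c in query_counts.items() if c > 0]
    if not queries:
        raise ValueError("no queries with positive traffic")
    # Each query owns a slice of the cumulative traffic line sized by its count.
    cumulative = list(accumulate(query_counts[q] for q in queries))
    total = cumulative[-1]
    interval = total / n
    rng = random.Random(seed)
    start = rng.random() * interval  # random start in [0, interval)
    picks = []
    for i in range(n):
        point = start + i * interval
        # The first query whose cumulative count exceeds the point owns it;
        # min() guards against a float rounding edge at the very end.
        idx = min(bisect_right(cumulative, point), len(queries) - 1)
        picks.append(queries[idx])
    # Head queries longer than the interval are hit more than once; keep one copy.
    return list(dict.fromkeys(picks))


counts = {"running shoes": 500, "red dress": 200, "umbrella": 50}
print(systematic_pps_sample(counts, n=5, seed=7))
```

Because high-traffic queries can absorb several selection points, the result may contain fewer than `n` distinct queries; drawing a somewhat larger `n` compensates.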

Practical guidance

Query sample size

Start with 50 queries. This matches TREC practice, where ad hoc test collections typically used 50 topics, and provides enough data for an initial evaluation. Additional batches can follow.

Query quality validation

Manually review to eliminate:

Generic or ambiguous terms
Queries affected by ranking rules
Traffic unrelated to your inventory

Aim for roughly 50 viable queries after filtering, which means drawing a somewhat larger initial sample.
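Since manual review discards part of the sample, it helps to oversample. A minimal sketch, assuming you can estimate the fraction of sampled queries reviewers reject (for example, from an earlier batch); the names `candidates_needed` and `keep_viable` and the rejection rate are hypothetical, not features of any tool.

```python
import math


def candidates_needed(target, rejection_rate):
    """How many queries to draw so that about `target` survive manual review.

    rejection_rate is an assumed fraction (0 <= r < 1) of sampled queries
    that reviewers discard.
    """
    if not 0 <= rejection_rate < 1:
        raise ValueError("rejection_rate must be in [0, 1)")
    return math.ceil(target / (1 - rejection_rate))


def keep_viable(candidates, rejected, target=50):
    """Drop reviewer-rejected queries, keeping the first `target` survivors in sample order."""
    viable = [q for q in candidates if q not in rejected]
    if len(viable) < target:
        # Too many rejections: draw another batch instead of judging fewer queries.
        raise ValueError(f"only {len(viable)} viable queries; sample more")
    return viable[:target]
```

With an assumed 20% rejection rate, `candidates_needed(50, 0.2)` asks for 63 candidates.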

Timeline expectations

Defining information needs: ~1 hour with subject matter experts
For 50 queries × 2 rankers × depth 5 (500 total judgments): ~1h40m at ~5 ratings/minute
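The judging estimate is simple arithmetic; a small helper makes it easy to re-run for other batch sizes, depths, or rating speeds. A sketch with hypothetical function names, using the figures from the text:

```python
def judging_time(queries, rankers, depth, ratings_per_minute):
    """Return (total judgments, minutes) for rating the top `depth` results
    of each ranker on each query, counting every result separately."""
    judgments = queries * rankers * depth
    return judgments, judgments / ratings_per_minute


def as_hours_minutes(minutes):
    """Format a duration given in minutes as e.g. '1h40m'."""
    hours, mins = divmod(round(minutes), 60)
    return f"{hours}h{mins:02d}m"


judgments, minutes = judging_time(50, 2, 5, ratings_per_minute=5)
print(judgments, as_hours_minutes(minutes))  # 500 1h40m
```

If both rankers return some of the same documents for a query, each shared document needs only one judgment once duplicates are removed, so 500 is an upper bound for this configuration.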

Tooling

Tools like Quepid facilitate the explicit evaluation workflow.

People

Nate Day
