Measuring Search, A Human Approach

Effective search improvement requires online and offline evaluation to work together rather than independently.

The limitation of online metrics (A/B testing)

A/B testing captures user interaction patterns effectively but has a critical limitation: it reveals what users do, not why they do it.

Click-through rate (CTR), for example, may promote results that are engaging but less relevant. Users seeking quick answers may also click a merely satisfactory result and move on, producing engagement signals that look like success but do not reflect relevance.
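To make the CTR point concrete, here is a minimal sketch, assuming hypothetical per-impression log records with a `clicked` flag and a separate human relevance label (0 to 3) for each result. The numbers are invented for illustration: an engaging "teaser" result collects more clicks than the result raters judge most relevant, so ranking by CTR and ranking by human judgment disagree.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Impression:
    # Hypothetical log record: one result shown once for a query.
    query: str
    doc_id: str
    clicked: bool


def ctr_by_doc(impressions):
    """Click-through rate per document: clicks / impressions."""
    shown = defaultdict(int)
    clicks = defaultdict(int)
    for imp in impressions:
        shown[imp.doc_id] += 1
        clicks[imp.doc_id] += int(imp.clicked)
    return {doc: clicks[doc] / shown[doc] for doc in shown}


def rank_by(scores):
    """Document ids sorted from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


# Invented toy data: "teaser" attracts more clicks than "answer".
logs = (
    [Impression("q", "teaser", c) for c in (True, True, True, False)]
    + [Impression("q", "answer", c) for c in (True, False, False, False)]
)
# Hypothetical human relevance labels on a 0-3 scale.
human_labels = {"teaser": 1, "answer": 3}

print(rank_by(ctr_by_doc(logs)))  # ['teaser', 'answer']
print(rank_by(human_labels))      # ['answer', 'teaser']
```

The click logs alone would favor "teaser"; only the rater labels reveal that "answer" is the more relevant result.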

Human judgment as counterbalance

Human raters evaluate query-document pairs for relevance, making focused assessments that are not swayed by surface-level appeal. They identify:

Why content matters to specific queries
Corner cases that log metrics miss

Limitations: raters are imperfect proxies for real users, because they lack knowledge of each user's individual task and motivation.
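Rater judgments are typically aggregated into an offline metric. The note does not name one; as an assumption, this sketch uses normalized discounted cumulative gain (NDCG), a common choice for graded labels, computed over one query's ranked results. The 0 to 3 label scale and the `(2**rel - 1) / log2(rank + 1)` gain variant are also assumptions.

```python
import math


def dcg(labels, k=None):
    """Discounted cumulative gain for graded labels in ranked order.

    Uses the (2**rel - 1) / log2(rank + 1) gain, one common variant.
    """
    labels = labels[:k] if k is not None else labels
    return sum((2 ** rel - 1) / math.log2(rank + 1)
               for rank, rel in enumerate(labels, start=1))


def ndcg(labels, k=None):
    """DCG normalized by the DCG of the ideal (best-first) ordering."""
    ideal = dcg(sorted(labels, reverse=True), k)
    return dcg(labels, k) / ideal if ideal > 0 else 0.0


# Hypothetical rater labels for the results of one query, in ranked order.
good_ranking = [3, 2, 0, 1]
poor_ranking = [1, 0, 2, 3]

print(ndcg([3, 2, 1, 0]))  # ideal ordering -> 1.0
print(ndcg(good_ranking), ndcg(poor_ranking))
```

Because the score depends only on rated relevance and rank position, it is unaffected by how appealing a result looks in the interface, which is the counterbalance this section describes.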

Launch reviews

Rubinstein advocates launch reviews: team discussions that examine human ratings alongside A/B test results to determine whether an algorithm change achieves its intended improvement or needs recalibration.
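One way to prepare for such a discussion is to line up the two kinds of evidence for each change and flag where they disagree. The sketch below is a hypothetical illustration, not Rubinstein's procedure: the `ChangeEvidence` fields, the change names, and the `tol` threshold are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class ChangeEvidence:
    # Hypothetical summary of one algorithm change.
    name: str
    online_delta: float   # e.g. relative CTR change from the A/B test
    offline_delta: float  # e.g. NDCG change from human ratings


def review_flag(ev, tol=0.005):
    """Classify a change by whether online and offline evidence agree.

    `tol` is an assumed threshold below which a delta counts as flat.
    """
    def direction(delta):
        if delta > tol:
            return 1
        if delta < -tol:
            return -1
        return 0

    on, off = direction(ev.online_delta), direction(ev.offline_delta)
    if on == off:
        return "agree"
    if on > 0 and off < 0:
        return "engagement up, relevance down: investigate"
    if on < 0 and off > 0:
        return "relevance up, engagement down: investigate"
    return "mixed or flat signal: discuss"


# Invented example changes.
changes = [
    ChangeEvidence("boost-fresh", online_delta=0.02, offline_delta=0.015),
    ChangeEvidence("bigger-thumbnails", online_delta=0.03, offline_delta=-0.02),
]
for ev in changes:
    print(ev.name, "->", review_flag(ev))
```

Changes where engagement rises while rated relevance falls are exactly the cases where relying on online metrics alone could mislead a launch decision.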

Human evaluation is indispensable for preventing misguided optimization based on incomplete metrics.

People

James Rubinstein
