Session vs Query based search evals

Traditional query-based evaluation

Create judgment lists from clickstream data, label results relevant/irrelevant for specific queries, evaluate with NDCG over aggregated signals.

Session-based evaluation

“Replay” individual user interactions directly. Instead of “how many queries improved?”, asks “would we have satisfied past users?” — evaluating whether the system would better satisfy single historical search-and-click events.

Advantages

Improved sampling accuracy: mirrors probability-based sampling (like political polling) — every user interaction has equal selection probability. Query-based methods introduce sampling bias by pre-selecting specific queries.
Time-sensitive features: aggregating clicks across months masks momentary conditions (e.g., temporary sales). Session-based data captures exact feature values present when users clicked.

Disadvantages

Clickstream biases (position bias) remain present
Harder to debug per-query since individual queries may contain insufficient data

Balanced view

Both derive from historical sessions and offer different insights:

Query-based: excellent for debugging specific search problems
Session-based: simulates A/B testing

Neither represents absolute truth — both are imperfect but useful models of search performance.

Session-Based Evaluation — the approach advocated
Search Evaluation — parent concept
Judgment Lists — query-based evaluation relies on pre-built judgment lists
Click Signals — clickstream data underlies both approaches
Position Bias — present in both session and query-based data
A-B Testing for Search — session-based eval simulates A/B testing

People

Doug Turnbull — author

Awesome Search KG

Explorer

Session vs Query based search evals

Session vs Query based search evals

Traditional query-based evaluation

Session-based evaluation

Advantages

Disadvantages

Balanced view

People

Graph View

Table of Contents

Backlinks

Awesome Search KG

Explorer

Session vs Query based search evals

Session vs Query based search evals

Traditional query-based evaluation

Session-based evaluation

Advantages

Disadvantages

Balanced view

Related Concepts

People

Graph View

Table of Contents

Backlinks