Measuring Search Effectiveness
Search functions as an instrument supporting information-seeking behavior — not an end in itself. Evaluation must account for how well search facilitates the user’s broader journey.
Traditional metrics as foundation
Precision (proportion of returned results that are relevant) and recall (proportion of relevant results retrieved), plus position-aware variants like DCG. These have inherent limitations — they focus on isolated queries.
Sessions matter more than queries
Traditional evaluation concentrates on isolated queries, but users engage through multi-query sessions where intent may shift. Session ROI: compare user effort (queries, keystrokes, time on SERP) against returns (clicks, conversions).
Modern search incorporates contextual navigation (facets, category refinements) — evaluation should recognize where searchers are in their journey.
Multi-session user journeys
Complex decisions span multiple sessions over days/weeks. Attribution modeling assigns engagement to previous sessions. Watch for survivorship bias — dissatisfied users disappear, so examining new user behavior provides unbiased signals.
Component vs. system evaluation
Individual component evaluation (query classification, ranker) is essential for incremental improvement, but overall system impact matters most. Sensitivity analysis reveals how modifications affect the full experience — a query classification improvement may be redundant with existing ranking signals.
Key principle
As George Box observed: imperfect models remain useful. Comprehensive search evaluation integrates traditional metrics with session, user, and system-level perspectives. Measurement only gains meaning when organizations commit to acting on insights.