A/B Testing for Search is Different
Source: https://dtunkelang.medium.com/a-b-testing-for-search-is-different-f6b0f6f4d0f5
Author: Daniel Tunkelang
Summary
Why search A/B testing requires a different methodology than standard product experimentation. The core problem: search changes don’t affect all queries equally, and user behavior spans multiple queries within a session.
Core Challenges
Query Sparsity
A given change usually affects only a subset of queries. Metrics aggregated over all traffic dilute the signal from the queries the change actually touches.
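To make the dilution concrete, here is a minimal sketch. It is not from the article: the function name `diluted_lift` and the example numbers are illustrative, and it assumes affected and unaffected queries share the same baseline rate.

```python
def diluted_lift(affected_share: float, lift_on_affected: float) -> float:
    """Relative lift visible in the aggregate metric.

    Assumption (for illustration only): affected and unaffected queries
    have the same baseline rate p, so the aggregate rate becomes
    (1 - s) * p + s * p * (1 + l) = p * (1 + s * l).
    """
    if not 0.0 <= affected_share <= 1.0:
        raise ValueError("affected_share must be between 0 and 1")
    return affected_share * lift_on_affected


# Hypothetical numbers: a 10% lift on queries that make up 5% of traffic
# shows up as only a 0.5% lift in the aggregate metric.
aggregate = diluted_lift(affected_share=0.05, lift_on_affected=0.10)
print(f"{aggregate:.3%}")
```

Under that assumption, measuring only the affected queries leaves a 10% effect to detect, while measuring all traffic leaves 0.5%.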
Session Effects
Changes may produce unintended consequences within the same user session. A narrowly-scoped improvement on target queries “might come at the expense of performance on other queries.”
Effect Size vs. Duration Tension
- Large effect sizes (doubling conversion): less testing time needed
- Small effect sizes (1% lift): require longer test duration
- Most real improvements are small → slow iteration cycle
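The tension above can be sketched with the textbook normal-approximation sample size for comparing two conversion rates. This formula is not from the article, and the 5% baseline rate, significance level, and power are assumed values chosen only for illustration.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(baseline_rate: float, relative_lift: float,
                        alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate units (e.g. sessions) needed per arm for a two-sided
    two-proportion z-test. Assumes independent units."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    if not (0 < p1 < 1 and 0 < p2 < 1) or p1 == p2:
        raise ValueError("rates must lie strictly between 0 and 1 and differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


# Assumed 5% baseline conversion rate.
doubling = sample_size_per_arm(0.05, 1.00)   # doubling conversion
one_percent = sample_size_per_arm(0.05, 0.01)  # 1% relative lift
print(doubling, one_percent)
```

With these assumed inputs, doubling conversion needs on the order of hundreds of units per arm, while a 1% lift needs millions, because the required sample grows roughly with the inverse square of the effect size.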
Key Recommendations
- Scope by sessions, not individual queries: analyze at the session level to catch cross-query effects
- Target narrow query sets: focus each change on specific query types so tests reach statistical power sooner
- Balance speed with validity: keep test scopes narrow for rapid iteration, but measure the holistic session impact
- “A/B testing search isn’t just a switch that you flip on — it’s a science”
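The session-level recommendation can be sketched as a small aggregation over a query log. The log schema (session ID, variant, query, converted flag) and the sample rows are hypothetical, and "a session converts if any of its queries converts" is an assumed success definition, not one the article specifies.

```python
from collections import defaultdict

# Hypothetical log rows: (session_id, variant, query, converted)
events = [
    ("s1", "control", "red shoes", False),
    ("s1", "control", "red sneakers", True),
    ("s2", "treatment", "red shoes", True),
    ("s3", "treatment", "laptop bag", False),
    ("s3", "treatment", "laptop sleeve", False),
    ("s4", "control", "usb cable", False),
]


def session_conversion_rates(events):
    """Return {variant: share of sessions with at least one conversion}."""
    session_variant = {}
    session_converted = defaultdict(bool)
    for session_id, variant, _query, converted in events:
        session_variant[session_id] = variant
        # A session counts as converted if any query in it converted.
        session_converted[session_id] |= converted

    totals = defaultdict(int)
    wins = defaultdict(int)
    for session_id, variant in session_variant.items():
        totals[variant] += 1
        wins[variant] += session_converted[session_id]
    return {variant: wins[variant] / totals[variant] for variant in totals}


rates = session_conversion_rates(events)
print(rates)
```

Counting per session means a reformulated query that eventually converts (as in session `s1`) counts once as a success, and a change that helps target queries but hurts later queries in the same session still shows up in the metric.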