Evaluating the Best AB Testing Metrics for Search

Algolia introduces a two-dimensional framework for evaluating A/B testing metrics: directionality and sensitivity (from Yandex Research).

The Framework

Property	Definition
Directionality	A clear, unambiguous interpretation — higher is always better (or lower is always better)
Sensitivity	Ability to detect statistically significant differences with reasonable sample sizes and time

The fundamental tension: directional metrics (revenue) are slow to reach significance; sensitive metrics (CTR) are fast but gameable.

Metric	Directional?	Sensitive?	Pitfall
Revenue	High	Low	Needs months; confounded by product launches
Conversion rate	Medium	Medium	Treats all sales as equal; doesn’t capture margin
CTR	Low	High	Gameable (clickbait); misleading in knowledge-query domains
Margin-weighted CR	High	Medium	Best of both worlds for margin-focused businesses

Conversion rate boosting high-margin items: System B boosts expensive items → lower CR but higher revenue. CR misleads.
CTR in knowledge queries: Adding snippet answers drops CTR — but that’s actually better relevance. Use time-between-sessions instead.
Metric gaming: Authors publish clickbait titles → CTR rises without relevance improvement.

Use a directional primary metric (revenue or margin-weighted conversion) to define “better”
Use sensitive secondary metrics (CTR, session conversion) for fast signal
Never rely on a single metric — combine and watch for directional alignment
Check whether your metric can be gamed, intentionally or accidentally