Autosuggest Ranking
Source: https://medium.com/@dtunkelang/autosuggest-ranking Author: Giovanni Fernandez-Kincade
Summary
Once autocomplete candidates are retrieved (see Autosuggest Retrieval Data Structures and Algorithms), they need to be ranked. This article covers probabilistic and learning-to-rank approaches for autosuggest ranking.
The Ranking Problem
Given a prefix and a set of candidate completions, order them by predicted utility — the completion the user most likely wants to type next.
Baseline: Frequency-Based Ranking
Sort candidates by historical query frequency (global):
- “shoes” (10M queries) before “shoe repair near me” (50K queries)
- Simple, fast, interpretable
- Problem: ignores context (session, user, location, time)
Probabilistic Model
Model the probability a user will complete their prefix P with candidate C:
P(C | P) ∝ P(P | C) × P(C)
Where:
- P(C) = prior probability of the candidate (frequency-based)
- P(P | C) = likelihood of typing prefix P if intending C (edit model)
This is a noisy channel framing similar to spelling correction.
Feature-Based Ranking (LTR)
Key feature categories:
Temporal features:
- Recency of last usage
- Trending signal (spike in last 24h)
- Seasonality (Christmas gifts in December)
Demographic/contextual features:
- User’s location → local completions ranked higher
- User’s language/locale
- Session context (what has the user already searched this session?)
Behavioral features:
- Click-through rate of the completion
- Conversion rate if clicked
- Session abandonment rate
Query features:
- Length of prefix typed
- How specific vs. generic the candidate is
Two-Phase Ranking
For latency, use two stages:
- Fast retrieval (< 10ms): use edge n-grams or FST to get top-100 candidates
- Ranking (< 30ms): score and reorder with a lightweight LTR model
Personalization
Personalized autosuggest re-ranks based on user history:
- Boost completions the user has typed before
- Boost completions in categories the user frequently shops
- Decay over time (stale preferences less relevant)
Cold-Start Mitigation
New users have no history → fall back to:
- Location-based defaults
- Demographically similar users
- Pure frequency-based ranking
Key Concepts
- Noisy channel model — probabilistic framing of prefix-to-completion likelihood
- Temporal features — recency, trending, seasonality for ranking
- Two-phase ranking — fast retrieval + heavier scoring
- Personalized autosuggest — user-history-aware reranking
- Cold-start fallback — location/demographic defaults for new users