Autocomplete and Autosuggest

How to build search suggestions that help users express intent faster. Autocomplete (prefix completion) and autosuggest (semantic fallback) are two distinct problems with different architectures.


Autocomplete vs. Autosuggest

AutocompleteAutosuggest
TriggerUser typing a prefixQuery typed but intent unclear
MechanismPrefix index / FST lookupSemantic similarity / embedding KNN
GoalComplete the query the user is typingSuggest alternative framings
Response time<10ms<50ms

Both are often displayed in the same suggestion dropdown, but use different backends.


Retrieval: How to Get Candidates Fast

Edge N-gram Index

Pre-index all suggestion strings with edge n-grams at index time. Fast prefix lookup at query time, but large index size.

Finite State Transducer (FST)

Compact automaton for prefix matching. Lucene uses FSTs internally for its suggest component. Orders of magnitude more memory-efficient than edge n-grams.

Trie

Simple prefix tree. Good for small vocabularies; doesn’t scale to millions of suggestions.

Practical Choice

Use your search engine’s native suggest feature (Elasticsearch’s search_as_you_type, completion suggester) for most cases. Only build custom FST if you need custom ranking or multi-field matching.


Ranking: Once You Have Candidates

Baseline: Frequency-Based

Sort by historical query frequency. Fast, interpretable, but ignores context.

Probabilistic Model

P(C | prefix) ∝ P(prefix | C) × P(C) — noisy channel framing. P(C) = frequency prior; P(prefix|C) = edit model. Used in spell-correction-aware autocomplete.

LTR Feature Categories

Temporal: recency of last usage, trending signal (24h spike), seasonality

Behavioral: CTR of the completion if clicked, conversion rate, session abandonment rate

Contextual: user location, language/locale, session history (what has the user already searched?)

Query: length of prefix typed, specificity of the candidate

Two-Phase for Latency

  1. Fast retrieval (<10ms): FST/edge n-grams → top-100 candidates
  2. Ranking (<30ms): LTR model to re-score and reorder

Personalization

User-specific ranking:

  • Boost completions the user has typed before (with recency decay)
  • Boost completions in categories the user frequently shops
  • Cold start: fall back to location-based defaults, demographic cohort, or pure frequency

Cold Start: No Query History

New search products have no query logs to build suggestions from. Options:

  1. Seed from catalog: extract noun phrases from product descriptions as initial suggestions
  2. LLM generation: prompt an LLM to generate realistic search queries for each product/category. David Albrecht’s approach: GPT-4.1-nano generated 10 queries per product for 43k items at <$5 cost. Produces novel phrases traditional NLP can’t extract.
  3. Competitor/open data: use open query datasets as bootstrapping signal (MS MARCO, Amazon ESCI)

LLM-generated queries solve cold-start while producing user-like phrases. Validate over time with click/conversion data.


Scaling Autocomplete

The naive approach (significant_terms aggregation on shingles) collapses at millions of documents — CPU saturation and multi-second queries.

What works at scale:

  • Separate autocomplete index (different scaling profile from main search)
  • search_as_you_type field type in Elasticsearch + match_phrase_prefix
  • terms aggregation (frequency-ordered) instead of significant_phrases
  • Debounce 25-50ms before firing (saves 40-50% of requests)
  • Views/popularity as primary ranking signal via field_value_factor + log1p

At 6M+ docs: achieves 416 req/s at p99 ≤ 45ms.


Common Failure Modes

Showing irrelevant completions: frequency ranking without context → “apple” suggests “apple pie” to a developer

Stale suggestions: popular-last-year completions that nobody searches now — need recency decay and trending signal

Overly long completions: users want to type less, not see their full query echoed back

Zero-results for accepted suggestion: suggestion exists but accepted query returns no results — catastrophic UX failure