Etsy — Search Quality and Query Understanding

Etsy is a global handmade/vintage marketplace with a highly heterogeneous catalog — items have non-standard names, niche categories, and long-tail vocabulary that standard search techniques handle poorly. Their engineering blog documents several years of search quality work focused on query understanding, diversity, and spelling correction.

Key Insight

Etsy’s catalog is fundamentally different from a structured product catalog. Queries like “cottagecore”, “steampunk necklace”, or “geeky gift” are valid search intents that map to thousands of unique artisan items — not a single best answer. This drives their approach: broad queries need diversity, not precision.

Challenge 1: Broad and Head Queries

High-volume queries like “necklace”, “bag”, or “geeky” span enormous semantic space. Standard relevance scoring (BM25) produces “winner takes all” results — the most popular category dominates, ignoring other valid interpretations.

Solution: Query classification → entropy-based diversification.

  • Classify queries as broad (head) vs. navigational/specific
  • For broad queries, shift the objective from precision to diversity
  • Measure result diversity using Shannon entropy over category and attribute distributions
  • Set per-query entropy targets; re-rank or inject results to meet the target

See: Targeting Broad Queries in Search, How Etsy Uses Thermodynamics for Search

Challenge 2: Entropy-Based Diversity (Thermodynamics Analogy)

Etsy borrowed the concept of thermodynamic entropy to quantify result diversity:

  • Low entropy = results all look similar (converged, narrow)
  • High entropy = results span many categories and styles

For a query like “geeky”, optimal results should include geeky necklaces, t-shirts, mugs, art prints, and posters — not five variations of the same item. Entropy measurement over the result set’s category distribution drives re-ranking toward that target.

Challenge 3: Domain-Specific Spelling Correction

Generic dictionaries fail for Etsy’s vocabulary — “cottagecore” is not in Merriam-Webster, but it is a major search category.

Solution: Noisy-channel model trained on Etsy’s own query logs.

  • Error model: confusion matrices from common user typo patterns
  • Language model: trained on Etsy query logs (domain-specific corpus, not Wikipedia/news)
  • Precision guards: click data validates that a correction actually improves results; suppresses correction of valid uncommon terms
  • Key principle: the correction system must know what’s on the platform

See: Modeling Spelling Correction for Search at Etsy

What to Steal

PatternApplication
Classify queries as broad vs. navigationalApply diversity logic only where needed
Entropy-based diversity metricMeasurable objective for re-ranking diversity
Domain query logs as spelling LM corpusFar more accurate than generic corpus for vertical search
Click signals to validate correctionsPrevents correcting valid niche terms
Precision guard for correctionNever correct what users successfully click on