Etsy

Marketplace for handmade, vintage, and unique goods. Search is unusually challenging because Etsy’s catalog uses non-standard vocabulary — sellers create their own item names and descriptions with platform-specific terminology (“cottagecore”, “steampunk”).

Search Context

Etsy search must handle:

  • Long-tail vocabulary: seller-invented terms and styles not in standard dictionaries
  • Broad/ambiguous queries: “necklace” covers an enormous semantic space
  • Diversity imperative: for broad queries, showing five variations of the same item is a failure

Key Engineering Work

Entropy-Based Diversity

  • Applied Shannon entropy to measure result diversity over category/attribute distributions
  • Set entropy targets per query type: broad queries → high entropy target; specific queries → low
  • Thermodynamics analogy: optimal search = maximum useful entropy, not maximum relevance

Domain-Specific Spelling Correction

  • Standard spell correction would “fix” valid Etsy terms like “cottagecore”
  • Used noisy-channel model with domain-specific language model trained on Etsy query logs
  • Click data validates corrections: if a “correction” leads to zero clicks, it was wrong

Broad Query Handling

  • Classifies queries as broad vs. navigational
  • For broad queries: diversity injection to represent multiple interpretations
  • Prevented “winner takes all” domination by popular categories

Articles