Modeling Spelling Correction for Search at Etsy
Source: https://codeascraft.com/2017/05/01/modeling-spelling-correction-for-search-at-etsy/ Author: Etsy Engineering
Summary
How Etsy built a production spelling correction system for search, moving beyond simple dictionary-based approaches to a context-aware ML model. Demonstrates the difference between academic spelling correction and real-world search query correction.
The Search Context Problem
E-commerce spelling correction has unique challenges vs. general-purpose correction:
- Etsy items have non-standard names (“steampunk”, “cottagecore”)
- User query typos often involve category-specific terms not in standard dictionaries
- Some “misspellings” are valid Etsy-specific terms that should NOT be corrected
Approach
Used noisy-channel model similar to Norvig’s algorithm as the base, extended with:
- Domain-specific language model: trained on Etsy query logs (not generic corpus)
- Search-specific signals: click data to validate corrections
- Precision guards: avoid correcting valid uncommon terms
Key Principle
The correction model must know what’s on the platform — a query for “cottagecore” should not be corrected to “cottage core” or “core” even if it’s rare in a general corpus.