Spelling Correction

Definition

Spelling correction in search identifies and corrects typographic errors and misspellings in user queries before retrieval — replacing “iphne” with “iphone”, or “maneger” with “manager”. It is a foundational component of Query Understanding.

Why Spelling Matters in Search

Up to 15% of all search queries contain spelling errors. Without correction:

“iphne” → zero results (exact BM25 match fails)
User abandons → session failure
Revenue impact in e-commerce

Error Types

Type	Example	Correction Strategy
Typographic	”iphne” (dropped letter)	Edit distance
Phonetic	”fone” for “phone”	Soundex/Metaphone
Cognitive	”definately” for “definitely”	Statistical LM
Context error	”I ate desert” (should be “dessert”)	Context-aware LM

Classic Algorithm: Norvig’s Spell Corrector

Peter Norvig’s classic algorithm (2007):

Generate all edits within distance 1 and 2 of the input word
Filter edits that appear in the language model vocabulary
Rank by n-gram language model probability

def candidates(word):
    return (known([word]) or known(edits1(word)) or 
            known(edits2(word)) or [word])
 
def edits1(word):
    letters = 'abcdefghijklmnopqrstuvwxyz'
    splits = [(word[:i], word[i:]) for i in range(len(word)+1)]
    deletes = [L + R[1:] for L,R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L,R in splits if len(R)>1]
    replaces = [L + c + R[1:] for L,R in splits if R for c in letters]
    inserts = [L + c + R for L,R in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

SymSpell

Wolf Garbe’s SymSpell: 1000x faster spelling correction:

Pre-computes a dictionary of all words within edit distance 1 and 2
At query time: lookup (not generate) → O(1) per edit distance level
Works for both single-word and phrase corrections

Context-Aware Correction

Standard spell correction is word-by-word. Context-aware correction considers surrounding words:

“I went to the beech” → “beech” could be “beach” or tree
Context (“went to the”) disambiguates toward “beach”

Implemented with n-gram language models or transformer-based models (BERT).

Deep Learning Approaches

Modern approaches use neural models:

Chars2vec: character-level embeddings robust to spelling errors
BERT-based: pre-trained LM scores candidate corrections in context
JamSpell: context-aware correction using 3-gram LM + SymSpell

Search-Specific Considerations

Search spelling correction differs from general spell correction:

Query log augmentation: “iphne” → “iphone” can be learned from query logs without a dictionary
Brand names: proper nouns, product names, model numbers require product catalog
Don’t always correct: “python” vs “phyton” — user may mean the snake on a reptile site
“Did you mean” vs. automatic: show suggestion vs. silently correct (UX tradeoff)

Etsy Case Study

Etsy’s spelling correction uses:

Multiple candidate generators (edit distance, phonetic, query log)
Language model ranker (domain-specific, trained on Etsy query logs)
Real-time A/B testing to validate correction quality

Query Understanding — spelling correction is Part 1 (formulation analysis)
Query Segmentation — co-occurs with spelling correction in the preprocessing pipeline
Autocomplete — suggests correct queries before user finishes typing
Zero Results — spelling errors are the #1 cause of zero results
BM25 — fails completely on misspelled queries

People

Daniel Tunkelang — “Spelling Correction” at queryunderstanding.com

Awesome Search KG

Explorer

Spelling Correction

Spelling Correction

Definition

Why Spelling Matters in Search

Error Types

Classic Algorithm: Norvig’s Spell Corrector

SymSpell

Context-Aware Correction

Deep Learning Approaches

Search-Specific Considerations

Etsy Case Study

People

Graph View

Table of Contents

Backlinks

Awesome Search KG

Explorer

Spelling Correction

Spelling Correction

Definition

Why Spelling Matters in Search

Error Types

Classic Algorithm: Norvig’s Spell Corrector

SymSpell

Context-Aware Correction

Deep Learning Approaches

Search-Specific Considerations

Etsy Case Study

Related Concepts

People

Graph View

Table of Contents

Backlinks