1000x Faster Spelling Correction Algorithm — SymSpell
Source: https://wolfgarbe.medium.com/1000x-faster-spelling-correction-algorithm-2012-8701fcd87a5f Author: Wolf Garbe (creator of SymSpell and SymSpell library)
Summary
SymSpell (Symmetric Delete Spelling Correction) achieves O(1) spelling correction through pre-computed deletion dictionaries, running “three orders of magnitude less expensive” than Norvig’s algorithm and up to 1,000,000x faster for edit distance 3.
Core Innovation: Delete-Only Transformations
Instead of generating all edit-distance variants at query time (deletes + inserts + replacements + transpositions), SymSpell generates only deletions from dictionary words during a one-time preprocessing step.
Why this works:
- An insertion in the query is equivalent to a deletion in the dictionary entry
- This symmetry means deletes cover all needed transformations
- For a 9-character word at edit distance 2: 114,324 candidate transforms → reduced to just 36 deletions
Algorithm
Preprocessing (once):
- For each dictionary word, generate all delete variants up to max edit distance
- Store in hash table:
deleted_variant → [original_words]
At query time:
- Generate delete variants of the query (just deletions, no inserts/replacements)
- Hash lookup against pre-computed dictionary
- Return candidates, sort by frequency
Performance
- O(1) lookup via hash table (vs. O(log n) for BK-Trees)
- Language-independent — works for Chinese (70,000+ Unicode chars) unlike trie-based approaches
- Pre-computation amortized across all queries
Benchmarks
Outperforms: BK-Trees, Tries, Norvig’s algorithm — especially at high concurrency (search engine loads).
Related Concepts
People
- Wolf Garbe — independent researcher/developer