1000x Faster Spelling Correction Algorithm — SymSpell

Source: https://wolfgarbe.medium.com/1000x-faster-spelling-correction-algorithm-2012-8701fcd87a5f Author: Wolf Garbe (creator of SymSpell and SymSpell library)

Summary

SymSpell (Symmetric Delete Spelling Correction) achieves O(1) spelling correction through pre-computed deletion dictionaries, running “three orders of magnitude less expensive” than Norvig’s algorithm and up to 1,000,000x faster for edit distance 3.

Core Innovation: Delete-Only Transformations

Instead of generating all edit-distance variants at query time (deletes + inserts + replacements + transpositions), SymSpell generates only deletions from dictionary words during a one-time preprocessing step.

Why this works:

  • An insertion in the query is equivalent to a deletion in the dictionary entry
  • This symmetry means deletes cover all needed transformations
  • For a 9-character word at edit distance 2: 114,324 candidate transforms → reduced to just 36 deletions

Algorithm

Preprocessing (once):

  1. For each dictionary word, generate all delete variants up to max edit distance
  2. Store in hash table: deleted_variant → [original_words]

At query time:

  1. Generate delete variants of the query (just deletions, no inserts/replacements)
  2. Hash lookup against pre-computed dictionary
  3. Return candidates, sort by frequency

Performance

  • O(1) lookup via hash table (vs. O(log n) for BK-Trees)
  • Language-independent — works for Chinese (70,000+ Unicode chars) unlike trie-based approaches
  • Pre-computation amortized across all queries

Benchmarks

Outperforms: BK-Trees, Tries, Norvig’s algorithm — especially at high concurrency (search engine loads).

People