Synonyms

Vocabulary bridging technique that maps alternative phrasings, abbreviations, and related terms to shared canonical tokens, improving recall for queries that don’t lexically match indexed content.

Types

Lexical Synonyms

Direct word-level equivalences: “sofa” → “couch”, “tv” → “television”

Semantic Synonyms

Contextually equivalent terms in a domain: “heart attack” → “myocardial infarction”

Product/Domain Synonyms

E-commerce specific: “shirt” → “top”, “blouse”, “tee”

Abbreviations

“ml” → “machine learning”, “ir” → “information retrieval”

Application Points

Synonyms can be applied at:

  1. Index time — expand document terms during ingestion
  2. Query time — expand the query before retrieval
  3. Both — risk of explosion in index size

Trade-offs

Expansion PointProsCons
Index timeFast query executionIndex bloat, static
Query timeFlexible, updatableSlower, affects all queries

Automated Synonym Discovery

  • Word2Vec / fastText neighborhood mining
  • Co-click analysis (users click same doc for different queries)
  • LLM-generated synonym sets
  • Query log clustering

Dense Vector Retrieval handles some synonym problems implicitly — semantically similar terms map to nearby vectors. However, rare domain terms and abbreviations still benefit from explicit synonym lists.