Mapping Search Queries To Search Intents

Core Insight

“Search queries are not the same as search intents.” Multiple distinct queries can map to a single intent (e.g., “mens shoes” and “shoes for men”).

Recognizing Query Equivalence

Surface Query Similarity: Queries that differ only in stemming, lemmatization, word order, or stop words often express the same intent. This includes singular/plural variations and compound-word differences.
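A minimal sketch of surface canonicalization, assuming a crude plural-stripping stemmer and a tiny stop-word list. Both are hypothetical stand-ins for a real stemmer or lemmatizer and a fuller list, and compound-word handling is left out. The names `STOP_WORDS`, `naive_stem`, and `canonicalize` are illustrative and do not come from the source.

```python
import re

# Hypothetical, deliberately small stop-word list (an assumption for illustration).
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}


def naive_stem(token):
    """Crude plural stripping; a stand-in for a real stemmer or lemmatizer."""
    if token.endswith("ies") and len(token) > 4:
        return token[:-3] + "y"
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]
    return token


def canonicalize(query):
    """Lowercase, drop apostrophes and stop words, stem, then sort the tokens."""
    text = query.lower().replace("'", "").replace("’", "")
    tokens = re.findall(r"[a-z0-9]+", text)
    stems = [naive_stem(t) for t in tokens if t not in STOP_WORDS]
    return " ".join(sorted(stems))


print(canonicalize("mens shoes"))     # men shoe
print(canonicalize("shoes for men"))  # men shoe
```

Because the tokens are sorted, “dress shirt” and “shirt dress” also receive the same key, which is why surface similarity alone may not be enough to separate them.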

Similar Post-Search Behavior: Queries with equivalent intent generate matching user engagement patterns. This behavioral similarity can be represented using vector embeddings of result titles.
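One way this might look in practice, as a sketch: each query is represented by the mean of the embedding vectors of its result titles, and two queries are compared with cosine similarity. The title vectors are assumed to come from some embedding model that is not specified here, and the function names are hypothetical.

```python
import math


def mean_vector(vectors):
    """Average a non-empty list of equal-length vectors, component by component."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def query_behavior_similarity(title_vectors_a, title_vectors_b):
    """Compare two queries by the mean embedding of their result titles."""
    return cosine_similarity(mean_vector(title_vectors_a),
                             mean_vector(title_vectors_b))
```

Averaging is only one possible pooling choice; weighting titles by clicks or other engagement signals would be a reasonable variation.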

Combined Approach

  1. Group queries by surface similarity via canonicalization (stem tokens, alphabetize, remove stop words)
  2. Split groups into behavioral clusters using vector cosine similarity

This ensures that paired queries are equivalent both linguistically and behaviorally, which prevents false positives such as “dress shirt” vs. “shirt dress”: the two share a canonical form but lead to different engagement.
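The two steps above can be sketched as follows, assuming each query already carries a canonical key and a behavior vector. The greedy split against the first member of each cluster, the 0.8 threshold, and the toy two-dimensional vectors are all assumptions made for illustration, not details from the source.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def cluster_intents(records, threshold=0.8):
    """records: iterable of (query, canonical_key, behavior_vector)."""
    # Step 1: group queries that share a canonical (surface) key.
    by_key = defaultdict(list)
    for query, key, vector in records:
        by_key[key].append((query, vector))

    # Step 2: split each surface group into behavioral clusters.
    clusters = []
    for members in by_key.values():
        group = []  # list of (representative_vector, [queries])
        for query, vector in members:
            for representative, queries in group:
                if cosine(representative, vector) >= threshold:
                    queries.append(query)
                    break
            else:
                group.append((vector, [query]))
        clusters.extend(queries for _, queries in group)
    return clusters


# Toy data: the vectors are made up for the example.
example_records = [
    ("mens shoes", "men shoe", [0.9, 0.1]),
    ("shoes for men", "men shoe", [0.88, 0.12]),
    ("dress shirt", "dress shirt", [1.0, 0.0]),
    ("shirt dress", "dress shirt", [0.0, 1.0]),
]
print(cluster_intents(example_records))
```

With this toy data, “mens shoes” and “shoes for men” stay together, while “dress shirt” and “shirt dress” are split apart despite sharing a canonical key.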

Tradeoffs

Approach                | Pros                          | Cons
Surface similarity only | Minimizes false positives     | Misses synonyms and reformulations
Behavioral only         | Captures semantic equivalence | Risks conflating different intents (e.g., “pants” vs. “dress pants”)
Combined                | Best precision + recall       | More complex to implement

Applications

  • Query Rewriting: Convert equivalent queries to a canonical representation to improve retrieval and ranking
  • Analytics: Aggregate fragmented behavioral signals across equivalent queries
  • Machine Learning: Use consolidated signals for more robust model training
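As an illustration of the analytics case, the sketch below rolls click counts from equivalent queries up to one representative query per intent. The log format, the `query_to_intent` mapping, and the document ids are hypothetical.

```python
from collections import Counter


def aggregate_clicks(click_log, query_to_intent):
    """Sum clicks per (intent, document), merging equivalent queries.

    click_log: iterable of (query, doc_id, clicks)
    query_to_intent: dict mapping a query to its representative query;
                     unmapped queries are treated as their own intent.
    """
    totals = Counter()
    for query, doc_id, clicks in click_log:
        intent = query_to_intent.get(query, query)
        totals[(intent, doc_id)] += clicks
    return totals


# Made-up log entries for illustration.
example_log = [
    ("mens shoes", "sku1", 3),
    ("shoes for men", "sku1", 2),
    ("shirt dress", "sku9", 1),
]
example_mapping = {"mens shoes": "mens shoes", "shoes for men": "mens shoes"}
print(aggregate_clicks(example_log, example_mapping))
```

The same mapping could be used for query rewriting, by replacing each incoming query with its representative before retrieval.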
