Query Segmentation
Part of the Query Understanding series by Daniel Tunkelang.
Overview
When a query contains multiple words, the way those words are grouped into phrases fundamentally changes how the query should be interpreted. The same sequence of words can yield very different meanings depending on where the phrase boundaries fall. Query segmentation identifies these boundaries by finding word combinations that are statistically cohesive — pairs or groups that frequently appear together and are likely to represent a single concept. This matters downstream because many query understanding steps — entity recognition, synonym expansion, ranked matching — operate at the phrase level rather than the word level. Getting segmentation wrong leads to misidentified entities, incorrect scope, and relevance failures that are difficult to diagnose.
Note: Article content behind Medium paywall — accessible at source URL with Medium account.