Query Understanding, Divided into Three Parts

Source: https://medium.com/@dtunkelang/query-understanding-divided-into-three-parts-d9cbc81a5d09
Author: Daniel Tunkelang

Summary

Daniel Tunkelang proposes dividing query understanding into three distinct components (formulation analysis, intent determination, and contextualization), each with its own techniques and system requirements.

The Three Parts Framework

Part 1: Query Formulation Analysis

What the user typed, interpreted literally.

Goal: extract structured information from the raw query string.

Techniques:

  • Spell correction: “macbbook pro” → “macbook pro”
  • Tokenization: decide word boundaries
  • Query segmentation: “new york times” → single entity
  • Named entity recognition: identify products, brands, places
  • Part-of-speech tagging: distinguish modifiers from head terms

This is largely a linguistic/NLP problem.
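
A few of the techniques above can be sketched in a handful of lines. This is a minimal illustration only, not the article's implementation: the vocabulary, the phrase list, and the function names (tokenize, correct_token, segment, analyze_formulation) are assumptions made for the example, and the spell corrector simply uses difflib from the Python standard library in place of a real correction model.

```python
import difflib
import re

# Toy vocabulary and phrase list (assumptions for illustration); a real
# system would derive these from catalog data or query logs.
VOCABULARY = ["macbook", "pro", "new", "york", "times", "apple", "watch", "bands"]
KNOWN_PHRASES = {("new", "york", "times"), ("apple", "watch"), ("macbook", "pro")}


def tokenize(query: str) -> list[str]:
    """Lowercase the query and keep runs of letters and digits as tokens."""
    return re.findall(r"[a-z0-9]+", query.lower())


def correct_token(token: str) -> str:
    """Return the closest vocabulary word, or the token unchanged."""
    if token in VOCABULARY:
        return token
    matches = difflib.get_close_matches(token, VOCABULARY, n=1, cutoff=0.8)
    return matches[0] if matches else token


def segment(tokens: list[str]) -> list[str]:
    """Greedily merge the longest known phrase starting at each position."""
    segments, i = [], 0
    while i < len(tokens):
        for length in range(len(tokens) - i, 0, -1):
            candidate = tuple(tokens[i:i + length])
            # A single token is always accepted, so the loop always advances.
            if length == 1 or candidate in KNOWN_PHRASES:
                segments.append(" ".join(candidate))
                i += length
                break
    return segments


def analyze_formulation(query: str) -> list[str]:
    """Tokenize, spell-correct each token, then segment into phrases."""
    return segment([correct_token(t) for t in tokenize(query)])
```

With these toy lists, analyze_formulation("macbbook pro") yields a single corrected segment, and analyze_formulation("apple watch bands") separates the product phrase from the accessory term.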

Part 2: Intent Determination

What the user is trying to accomplish.

Goal: classify the query into intent categories that drive retrieval strategy.

Techniques:

  • Intent classifier (navigational / informational / transactional)
  • Query type classification (Query Types)
  • Specificity assessment (head/torso/tail)
  • Ambiguity detection

This is a semantic/ML problem.
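
As a rough sketch of intent classification with ambiguity detection, the example below scores a query against cue-word lists. In practice this would usually be a trained classifier; the cue words, the INTENT_CUES table, and the classify_intent function are hypothetical names chosen for illustration, and the fallback to "informational" is an assumption, not something the article specifies.

```python
import re

# Hypothetical cue words per intent category (assumption for illustration).
INTENT_CUES = {
    "navigational": {"login", "homepage", "website", "www"},
    "informational": {"how", "what", "why", "guide", "tutorial"},
    "transactional": {"buy", "price", "cheap", "order", "coupon"},
}


def classify_intent(query: str) -> tuple[str, bool]:
    """Return (intent, is_ambiguous) based on cue-word overlap."""
    tokens = set(re.findall(r"[a-z0-9]+", query.lower()))
    scores = {intent: len(tokens & cues) for intent, cues in INTENT_CUES.items()}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    (best, best_score), (_, runner_up_score) = ranked[0], ranked[1]
    if best_score == 0:
        # No cue matched: fall back to a default and flag the query as ambiguous.
        return "informational", True
    # A tie between the top two categories is also treated as ambiguous.
    return best, best_score == runner_up_score
```

Returning the ambiguity flag alongside the label lets downstream stages decide whether to ask for clarification, diversify results, or lean on context.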

Part 3: Contextualization

What this specific user in this specific context actually wants.

Goal: personalize the interpretation based on:

  • User history and preferences
  • Location and device
  • Session context (previous queries)
  • Temporal context (trending, seasonal)

This is a personalization/data problem.
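
One simple way to picture contextualization is choosing among candidate interpretations by their overlap with session context. The sketch below assumes a hypothetical Context record and a contextualize function, and uses plain term overlap with previous queries as the signal; a production system would more likely use learned user models.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Context:
    """Hypothetical container for the context signals listed above."""
    previous_queries: list[str] = field(default_factory=list)
    location: Optional[str] = None
    device: Optional[str] = None


def contextualize(candidates: dict[str, set[str]], context: Context) -> str:
    """Pick the interpretation whose related terms best match the session.

    `candidates` maps an interpretation name to terms associated with it.
    With no overlap at all, the first candidate is returned.
    """
    session_terms = {
        term for query in context.previous_queries for term in query.lower().split()
    }
    return max(candidates, key=lambda name: len(candidates[name] & session_terms))


candidates = {
    "apple (brand)": {"iphone", "macbook", "smartwatch"},
    "apple (fruit)": {"recipe", "pie", "orchard"},
}
choice = contextualize(candidates, Context(previous_queries=["apple pie recipe"]))
```

Only the session signal is used here; location, device, and temporal signals would enter the same scoring step as additional features.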

Why Separate the Three Parts?

Each part has different:

  • Failure modes: mis-tokenization ≠ wrong intent ≠ wrong personalization
  • Solutions: NLP tools ≠ ML classifiers ≠ user models
  • Update frequency: NLP components change rarely; intent classifiers are retrained periodically (e.g. monthly); personalization updates in real time
  • Required data: text data ≠ labeled intent data ≠ user behavior data

Mixing them together creates unmaintainable, untestable systems.

Part Interactions

The three parts aren’t independent:

  • Formulation analysis feeds into intent determination
  • Intent modulates contextualization (navigational queries need less personalization)
  • Context can feed back to formulation (previous session queries influence parsing)
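
The three interactions above can be sketched as a small pipeline in which each stage stays separately replaceable and testable. The personalization weights, the Understanding record, and the understand function are assumptions for illustration; the analyze and classify stages are passed in as callables so that each part can be swapped or tested on its own.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical weights: how strongly personalization should influence results
# for each intent. Navigational queries get the least personalization.
PERSONALIZATION_WEIGHT = {
    "navigational": 0.1,
    "informational": 0.5,
    "transactional": 0.8,
}


@dataclass
class Understanding:
    segments: list[str]
    intent: str
    personalization_weight: float


def understand(
    query: str,
    previous_queries: list[str],
    analyze: Callable[[str, list[str]], list[str]],
    classify: Callable[[list[str]], str],
) -> Understanding:
    """Compose the three parts while keeping each stage independent."""
    # Context feeds back into formulation: the parser sees earlier queries.
    segments = analyze(query, previous_queries)
    # Formulation feeds intent: the classifier works on parsed segments.
    intent = classify(segments)
    # Intent modulates contextualization via the personalization weight.
    weight = PERSONALIZATION_WEIGHT[intent]
    return Understanding(segments, intent, weight)


# Usage with trivial stand-in stages.
result = understand(
    "apple watch bands",
    previous_queries=["smartwatch"],
    analyze=lambda query, previous: query.lower().split(),
    classify=lambda segments: "transactional",
)
```

Passing the stages in as parameters is one design choice among several; it keeps the orchestration free of NLP, ML, or user-model details, which matches the argument for separating the parts.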

Example: “apple watch bands”

Part | Analysis
Formulation | Entities: “Apple Watch” (product) + “bands” (accessory type)
Intent | Transactional (shopping intent, specific product + accessory)
Context | User previously searched “smartwatch” → probably new to Apple ecosystem → show beginner options

Final action: structured search over Apple Watch bands, filtered by user’s apparent experience level.

People