Search Query Expansion with Query Embeddings
Source: https://bytes.grubhub.com/search-query-embeddings-using-query2vec-f5931df27d79 Author: Grubhub Engineering
Note: Full content unavailable (SSL certificate error on bytes.grubhub.com). Summary based on known content from Grubhub’s engineering blog.
Summary
Grubhub’s Query2Vec system learns embeddings for food-domain queries, then uses those embeddings to expand queries with semantically similar terms — improving recall for food search without sacrificing precision.
Problem: Food Search Vocabulary Gap
Food queries have extreme vocabulary diversity:
- “tacos” vs “Mexican food” vs “burritos” — all semantically related but lexically disjoint
- “healthy” vs “salad” vs “low-calorie” — multi-dimensional semantic clusters
- BM25 alone misses cross-vocabulary matches
Query2Vec Approach
Train word2vec-style embeddings on query logs:
- Corpus: sequences of queries from the same user session (queries in a session are “context” for each other)
- Training: skip-gram with negative sampling; predict surrounding queries from current query
- Output: dense query embedding in semantic space
Unlike document embeddings, Query2Vec embeds entire queries (not individual words), capturing food-domain intent directly.
Query Expansion with Query2Vec
At query time:
- Embed incoming query with Query2Vec
- Find k-nearest-neighbor queries in embedding space
- Extract terms from neighbor queries as expansion candidates
- Expand original query with top-N candidate terms (weighted by similarity)
Results
- Improved recall on food queries by surfacing restaurants with semantically related menus
- Reduced zero-result rate for rare food queries
- Controlled precision degradation with similarity threshold tuning
Key Design Choices
- Session-level context (vs. document context) captures query intent better for this domain
- Domain-specific embeddings outperformed general-purpose embeddings (e.g., GloVe) for food search
- Query expansion applied at retrieval stage (not reranking), maximizing recall benefit
Key Concepts
- Query2Vec — session-context-trained query embeddings for food search
- Session-based query context — queries in a session as training signal
- Query expansion via KNN — expand with terms from similar queries
- Domain-specific embeddings — food-domain training outperforms general vectors