Optimizing Search at Uber Eats
Authors: Janani Narayanan (ML Engineer, Uber) and Karthik Ramasamy (Senior Staff SWE, Uber)
QCon San Francisco 2025 talk.
Challenge: nX Merchant Growth
Uber Eats expanded from restaurants to grocery (100k SKUs/store) and retail, with delivery radius growing from 10-15 min to 50+ miles. Goal: scale selection by nX while maintaining latency.
Naive approach failed: increasing early termination count by 200 restaurants → P50 latency increased 4x.
Architecture Layers
- Base Search (Elasticsearch/Lucene custom platform): lexical + vector retrieval
- NER Query Builder: entity recognition → implicit filters, zero-result detection
- Catalog API: fan-out queries, A/B test integration, redirect decisions; caching
- Search API + Algorithm Gateway: personalization/ML re-ranking + promotions bidding
Discovery surfaces: Home Feed (most orders), Search, Suggestions, Ads — all served by same infra.
Geosharding: Two Approaches
Latitude Sharding
- World sliced into latitude bands → equal-sized shards
- Benefit: cities from different time zones co-locate (traffic follows sun) → balanced load
- Problem: bands too narrow at center (dense cities), shards become uneven over time
Hex Sharding (H3)
- World tiled with hexagons at resolution 2-3
- Offline Spark job bins hexagons into N equal shards (bin-packing)
- Buffer zones: documents near shard boundaries indexed in both shards
- More uniform than latitude; handles dense city centers better
Document Layout Optimization
Eats Index Layout
Sorted: city → restaurants sorted by conversion rate → items per restaurant
- Skip directly to target city documents
- Denormalized fields (store metadata in item docs) → compression via delta encoding
- Static rank ordering → early termination once budget hit
Result: 60% latency improvement (145ms → 60ms for a 4K-doc query), index size -20%
Grocery Index Layout
Sorted: city → store (by conversion rate) → items per store
- Per-store budget: skip to next store once limit reached
- Diversity: results from multiple stores, not dominated by one
ETD (Estimated Time of Delivery) Indexing
Root cause of 4x latency: ingestion lost ETD distance metadata (close-by vs. far-away), so query layer couldn’t differentiate.
Solution: Pre-compute hexagon-to-ETD range assignments offline (time ranges: 0-10 min, 10-20 min, etc.). Each store indexed multiple times across different ETD ranges.
Trade-off: more storage/ingestion complexity → simpler, faster query execution.
Parallelization of ETD Ranges
Instead of one expanding-circle query:
- Multiple parallel queries, one per ETD range bucket
- Non-overlapping subqueries → safe to parallelize
- Also parallelized: strong match + fuzzy match + OR queries
Final result: 50% latency reduction at constant recall, plus improved recall vs. prior algorithm.
Key Lessons
- Index layout is “the most overlooked piece of software” — worth years of investment
- Move complexity from query time to ingestion time
- Benchmark before assuming root cause (3 months to diagnose, 4-6 months to XP)
- Test stores in production index added 3x latency overhead — clean separation matters
- Custom search platform built on Apache Lucene with Lambda architecture (Spark batch + Kafka streaming)