Optimizing Search at Uber Eats

Authors: Janani Narayanan (ML Engineer, Uber) and Karthik Ramasamy (Senior Staff SWE, Uber)
QCon San Francisco 2025 talk.

Challenge: nX Merchant Growth

Uber Eats expanded from restaurants to grocery (100k SKUs/store) and retail, with delivery radius growing from 10-15 min to 50+ miles. Goal: scale selection by nX while maintaining latency.

Naive approach failed: increasing early termination count by 200 restaurants → P50 latency increased 4x.

Architecture Layers

Base Search (Elasticsearch/Lucene custom platform): lexical + vector retrieval
NER Query Builder: entity recognition → implicit filters, zero-result detection
Catalog API: fan-out queries, A/B test integration, redirect decisions; caching
Search API + Algorithm Gateway: personalization/ML re-ranking + promotions bidding

Discovery surfaces: Home Feed (most orders), Search, Suggestions, Ads — all served by same infra.

Geosharding: Two Approaches

Latitude Sharding

World sliced into latitude bands → equal-sized shards
Benefit: cities from different time zones co-locate (traffic follows sun) → balanced load
Problem: bands too narrow at center (dense cities), shards become uneven over time

Hex Sharding (H3)

World tiled with hexagons at resolution 2-3
Offline Spark job bins hexagons into N equal shards (bin-packing)
Buffer zones: documents near shard boundaries indexed in both shards
More uniform than latitude; handles dense city centers better

Document Layout Optimization

Eats Index Layout

Sorted: city → restaurants sorted by conversion rate → items per restaurant

Skip directly to target city documents
Denormalized fields (store metadata in item docs) → compression via delta encoding
Static rank ordering → early termination once budget hit

Result: 60% latency improvement (145ms → 60ms for a 4K-doc query), index size -20%

Grocery Index Layout

Sorted: city → store (by conversion rate) → items per store

Per-store budget: skip to next store once limit reached
Diversity: results from multiple stores, not dominated by one

ETD (Estimated Time of Delivery) Indexing

Root cause of 4x latency: ingestion lost ETD distance metadata (close-by vs. far-away), so query layer couldn’t differentiate.

Solution: Pre-compute hexagon-to-ETD range assignments offline (time ranges: 0-10 min, 10-20 min, etc.). Each store indexed multiple times across different ETD ranges.

Trade-off: more storage/ingestion complexity → simpler, faster query execution.

Parallelization of ETD Ranges

Instead of one expanding-circle query:

Multiple parallel queries, one per ETD range bucket
Non-overlapping subqueries → safe to parallelize
Also parallelized: strong match + fuzzy match + OR queries

Final result: 50% latency reduction at constant recall, plus improved recall vs. prior algorithm.

Key Lessons

Index layout is “the most overlooked piece of software” — worth years of investment
Move complexity from query time to ingestion time
Benchmark before assuming root cause (3 months to diagnose, 4-6 months to XP)
Test stores in production index added 3x latency overhead — clean separation matters
Custom search platform built on Apache Lucene with Lambda architecture (Spark batch + Kafka streaming)

Awesome Search KG

Explorer

Optimizing Search at Uber Eats

Optimizing Search at Uber Eats

Challenge: nX Merchant Growth

Architecture Layers

Geosharding: Two Approaches

Latitude Sharding

Hex Sharding (H3)

Document Layout Optimization

Eats Index Layout

Grocery Index Layout

ETD (Estimated Time of Delivery) Indexing

Parallelization of ETD Ranges

Key Lessons

Graph View

Table of Contents

Backlinks

Awesome Search KG

Explorer

Optimizing Search at Uber Eats

Optimizing Search at Uber Eats

Challenge: nX Merchant Growth

Architecture Layers

Geosharding: Two Approaches

Latitude Sharding

Hex Sharding (H3)

Document Layout Optimization

Eats Index Layout

Grocery Index Layout

ETD (Estimated Time of Delivery) Indexing

Parallelization of ETD Ranges

Key Lessons

Related Concepts

Graph View

Table of Contents

Backlinks