Uber Eats — Scaling Search for Food Delivery

Problem

Uber Eats expanded from restaurants to grocery (100k SKUs/store) and retail, with delivery radius growing from 10–15 min to 50+ miles. Naively increasing the retrieval budget blew up latency 4x. The team needed to scale selection by nX without sacrificing query speed.

Architecture

Four-layer stack, each layer independently cacheable:

Base Search (Elasticsearch/Lucene custom)
    ↓ lexical + vector retrieval
NER Query Builder
    ↓ entity recognition → implicit filters, zero-result detection
Catalog API
    ↓ fan-out, A/B test integration, redirect decisions, caching
Search API + Algorithm Gateway
    ↓ ML re-ranking, personalization, promotions bidding

Same infrastructure powers: Home Feed, Search, Suggestions, Ads.

Solutions

H3 Geosharding

Tiled the world with H3 hexagons (resolution 2–3), then used an offline Spark job to bin hexagons into N equal-sized shards. Buffer zones index boundary documents in both neighboring shards.

Previous latitude-band sharding created uneven shards over time as cities grew. H3 handles dense city centers far better.

Document Layout Optimization

Sorted the index: city → stores (by conversion rate) → items per store

  • Allows early termination once per-store budget is hit
  • Denormalized store metadata into item documents → better delta encoding compression
  • Result: 60% latency reduction (145ms → 60ms), index size -20%

ETD Range Indexing

Root cause of the original 4x latency spike: the query layer couldn’t distinguish close-by from far-away stores because ETD metadata was lost at ingestion. Fix: pre-compute hexagon-to-ETD range assignments offline (0–10 min, 10–20 min, etc.) and index each store in every applicable ETD bucket.

More storage and ingestion complexity → dramatically simpler, faster query execution.

Parallelized Queries

Instead of one expanding-circle query: multiple parallel subqueries per ETD range bucket. Also parallelized: strong match + fuzzy match + OR queries.

Result: 50% further latency reduction at constant recall, plus improved recall vs. prior algorithm.

Key Lessons

  • Index layout is the most overlooked piece of search engineering — worth years of investment
  • Move complexity from query time to ingestion time
  • Diagnose before assuming root cause — took 3 months to identify the actual bottleneck
  • Test stores mixed into production index added 3x latency overhead — environment isolation matters
  • Cache layering at every tier is essential; each layer independently cacheable

What to Steal

  • H3-based geosharding for any location-sensitive search problem
  • Static sort order by conversion rate + per-entity budget for structured catalogs (grocery, retail)
  • ETD/SLA pre-bucketing pattern: pre-compute constraint ranges at index time instead of filtering at query time