E-commerce Search and Recommendation with Vespa

Jo Kristian Bergum (Vespa) surveys how Vespa addresses core e-commerce search challenges. Vespa is used at Yahoo e-commerce sites in Asia.

Key Capabilities Covered

Text Ranking Beyond BM25/TF-IDF

Vespa’s native ranking features consider query term proximity, not just term frequency. This matters in e-commerce: for the query “iPad 2,” an accessory titled “black mini iPad cover compatible with iPad 2” shouldn’t outrank the product titled “iPad 2.”
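
A toy Python sketch of the proximity idea follows. It is not Vespa's native ranking implementation: the function names, the whitespace tokenizer, and the proximity formula (number of distinct query terms divided by the smallest title window that contains them all) are assumptions made for illustration, and the accessory title is invented.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def term_frequency(query, title):
    """Count title tokens that match any query term."""
    terms = set(tokenize(query))
    return sum(1 for token in tokenize(title) if token in terms)


def smallest_window(query, title):
    """Token length of the shortest title span containing every distinct
    query term, or None if some term is missing from the title."""
    terms = set(tokenize(query))
    tokens = tokenize(title)
    best = None
    for start in range(len(tokens)):
        seen = set()
        for end in range(start, len(tokens)):
            if tokens[end] in terms:
                seen.add(tokens[end])
            if seen == terms:
                width = end - start + 1
                best = width if best is None else min(best, width)
                break
    return best


def proximity(query, title):
    """1.0 when the query terms are adjacent, lower as they spread out,
    0.0 when a query term is missing (assumed formula, not Vespa's)."""
    window = smallest_window(query, title)
    if window is None:
        return 0.0
    return len(set(tokenize(query))) / window


query = "iPad 2"
product = "iPad 2"
accessory = "2 pack screen protector for iPad"  # hypothetical title

print(term_frequency(query, product), term_frequency(query, accessory))
print(proximity(query, product), proximity(query, accessory))
```

Both titles contain each query term exactly once, so term frequency alone ties them; the proximity score separates them because the terms sit far apart in the accessory title.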

Custom Business Ranking Logic

Ranking profiles combine hand-crafted and machine-learned ranking expressions. A simple rank profile can tier sponsored or promoted items above organic results, and it can be combined with learning-to-rank (LTR) models (TensorFlow, XGBoost via ONNX).
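
The tiering idea can be sketched in Python as plain arithmetic on top of a model score. This is an illustration only, not a Vespa rank profile: the `Hit` type, the `TIER_BOOST` constant, and the assumption that model scores lie in [0, 1) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: str
    model_score: float  # assumed to lie in [0, 1), e.g. an LTR model output
    sponsored: bool


# Hypothetical constant: any value larger than the maximum model score
# guarantees that every sponsored hit sorts above every organic hit.
TIER_BOOST = 10.0


def tiered_score(hit):
    """Sponsored hits land in a higher tier; the model orders hits within a tier."""
    return (TIER_BOOST if hit.sponsored else 0.0) + hit.model_score


def rank(hits):
    """Return hits sorted by descending tiered score."""
    return sorted(hits, key=tiered_score, reverse=True)


hits = [
    Hit("organic-a", 0.9, False),
    Hit("promo-b", 0.2, True),
    Hit("organic-c", 0.5, False),
]
print([hit.doc_id for hit in rank(hits)])
```

Because the boost exceeds any possible model score, the promoted item ranks first even with the lowest model score, while organic results keep their model-driven order.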

Facets and Grouping

The Vespa grouping language supports deeply nested facet aggregation and pagination within groups.
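
To make "nested facets with pagination within groups" concrete, here is a small Python sketch that computes the same kind of result in application code. In Vespa this would be expressed in the grouping language and evaluated by the engine; the function name, the field names, and the product data below are hypothetical.

```python
from collections import Counter, defaultdict


def nested_facets(products, outer, inner, page_size=2, page=0):
    """Count products per outer group, then return one page of the
    inner facet values inside each group, most frequent first."""
    per_group = defaultdict(Counter)
    for product in products:
        per_group[product[outer]][product[inner]] += 1

    facets = {}
    for group, counter in per_group.items():
        # Sort by descending count, then by name for a stable order.
        ordered = sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
        start = page * page_size
        facets[group] = {
            "count": sum(counter.values()),
            "values": ordered[start:start + page_size],
        }
    return facets


# Hypothetical catalog entries.
PRODUCTS = [
    {"category": "tablets", "brand": "Apple"},
    {"category": "tablets", "brand": "Apple"},
    {"category": "tablets", "brand": "Samsung"},
    {"category": "tablets", "brand": "Lenovo"},
    {"category": "covers", "brand": "Belkin"},
]

first_page = nested_facets(PRODUCTS, "category", "brand")
second_page = nested_facets(PRODUCTS, "category", "brand", page=1)
print(first_page)
print(second_page)
```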

Vocabulary Mismatch

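“Ladies pregnancy dress” and “women maternity gown” describe the same product but share no terms, so term matching alone cannot connect them. Vespa addresses this with semantic retrieval over dense tensor embeddings (from Universal Sentence Encoder). The same approach enables multilingual retrieval from a single index.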
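
The following Python sketch shows why embeddings help here. The three-dimensional vectors are made up for illustration and stand in for real sentence embeddings; no encoder is actually run, and the function names are hypothetical.

```python
import math


def term_overlap(a, b):
    """Number of lowercase whitespace tokens two strings share."""
    return len(set(a.lower().split()) & set(b.lower().split()))


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    return dot / (norm_u * norm_v)


# Invented toy vectors; a real system would obtain these from a sentence encoder.
EMBEDDINGS = {
    "ladies pregnancy dress": [0.9, 0.8, 0.1],
    "women maternity gown": [0.8, 0.9, 0.2],
    "mens running shoes": [0.1, 0.2, 0.9],
}


def nearest(query, candidates):
    """Return the candidate whose embedding is most similar to the query's."""
    return max(candidates, key=lambda c: cosine(EMBEDDINGS[query], EMBEDDINGS[c]))


query = "ladies pregnancy dress"
titles = ["women maternity gown", "mens running shoes"]
print(term_overlap(query, titles[0]))
print(nearest(query, titles))
```

Term overlap between the query and the matching title is zero, yet the nearest-neighbor lookup in embedding space still retrieves it.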

Query Classification

ML models (BERT, deployed via ONNX) classify query intent (navigational vs. informational vs. category browse), selecting different ranking profiles or triggering query rewrites.
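
A minimal sketch of intent-based routing in Python is shown below. The classifier is passed in as a plain callable standing in for the deployed model, and the intent labels and rank profile names are hypothetical. The request parameter `ranking.profile` is assumed to follow Vespa's query API and should be checked against its documentation.

```python
# Hypothetical mapping from predicted intent to a rank profile name.
RANK_PROFILE_BY_INTENT = {
    "navigational": "exact-match",
    "informational": "semantic",
    "category_browse": "popularity",
}
DEFAULT_PROFILE = "default"


def build_query_params(query, classify):
    """Pick a rank profile from the predicted intent and build request parameters.

    `classify` is any callable that maps a query string to an intent label;
    in production it would wrap the ML model.
    """
    intent = classify(query)
    profile = RANK_PROFILE_BY_INTENT.get(intent, DEFAULT_PROFILE)
    return {"query": query, "ranking.profile": profile}


def stub_classifier(query):
    """Placeholder for the real model: always predicts 'navigational'."""
    return "navigational"


params = build_query_params("ipad 2", stub_classifier)
print(params)
```

Keeping the classifier behind a callable means the routing logic can be tested without loading a model, and unknown intents fall back to a default profile rather than failing.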

Scaling for Holiday Traffic

Vespa’s C++ core avoids JVM garbage-collection pauses. There is no shard concept, so clusters can be resized elastically without re-indexing. The engine degrades gracefully under overload and handles 40-50K partial attribute updates per second for real-time inventory and price signals.
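
A partial update touches only the fields that changed, such as price or stock, instead of re-feeding the whole document. The Python sketch below builds such an operation in the shape of Vespa's document JSON format (an `update` document ID plus per-field `assign` operations), as best understood here; the namespace, document type, key, and field names are hypothetical.

```python
import json


def partial_update(namespace, doc_type, doc_key, fields):
    """Build an update operation that assigns new values to selected fields only."""
    return {
        "update": f"id:{namespace}:{doc_type}::{doc_key}",
        "fields": {name: {"assign": value} for name, value in fields.items()},
    }


# Hypothetical product: change price and stock without touching other fields.
operation = partial_update("shop", "product", "sku-123", {"price": 19.99, "in_stock": 4})
print(json.dumps(operation, indent=2))
```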

Key Takeaway

Vespa’s design philosophy, “ranking is just math,” gives relevance engineers full control over scoring functions without needing to rewrite queries. First-class LTR, real-time partial updates, and prefiltered approximate nearest neighbor (ANN) search make it a strong all-in-one e-commerce search platform.