Vespa

What They Build

Vespa is an open-source big data serving engine originally developed by Yahoo and now maintained as an independent open-source project (vespa.ai). Unlike Elasticsearch (document-centric) Vespa was built with ML-native ranking in mind from the start — it supports tensor computations, native embedding models, and multi-phase ranking as first-class features.

Vespa is particularly strong for use cases that mix retrieval, ranking, and ML inference in a single platform without an external reranking tier.

Search Contributions

Models and Embeddings

  • ColBERT embedder — native integration of ColBERT late-interaction model into Vespa with asymmetric binarization (32x compression with minimal quality loss). Led by Jo Kristian Bergum
  • Native HNSW approximate nearest neighbor for dense vector retrieval
  • BM25 + dense vector Hybrid Search as a built-in feature

Methodology

  • LLM-as-Judge for retrieval evaluation — demonstrated how to use LLMs to generate relevance labels and compute NDCG; showed strong correlation with human judgments (Spearman ρ ≈ 0.85–0.90). Early and influential demonstration of the LLM as Judge pattern for retrieval

Architecture

  • Multi-phase ranking: cheap first-stage rankers → expensive ML models only on candidates
  • Tensor computations at serving time without external ML serving infrastructure
  • Native support for structured filtering alongside dense vector search

Position in the Ecosystem

Vespa is a direct alternative to Elasticsearch + separate ML serving tier. The value proposition: run BM25, dense vector ANN, and ML re-ranking in a single engine. The tradeoff: steeper learning curve, smaller community than Elasticsearch.

People

Articles

Concepts

ColBERT · Late Interaction · LLM as Judge · Hybrid Search · Dense Vector Retrieval · Reranking