Weaviate Vector DB

Open-source vector database providing semantic and hybrid search infrastructure. Supports native multi-stage retrieval pipelines including ANN retrieval and cross-encoder reranking. Maintained by Weaviate (the company).


What It Does

Weaviate stores objects with vector representations and exposes search via GraphQL and REST APIs. It integrates vectorization and reranking as first-class pipeline stages rather than requiring external tooling.
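
As a rough illustration of the GraphQL interface, the sketch below builds the JSON body for a semantic query sent to the /v1/graphql endpoint. The collection name Article, its title property, and the helper build_near_text_query are hypothetical, and nearText assumes a text vectorizer module is enabled for the collection.

```python
import json

def build_near_text_query(collection, concept, properties, limit=5):
    """Build a JSON-serializable body for POST /v1/graphql (hypothetical helper)."""
    concepts = json.dumps([concept])   # rendered as a list literal, e.g. ["vector search"]
    fields = " ".join(properties)      # properties to return for each object
    query = (
        "{ Get { %s(nearText: {concepts: %s}, limit: %d) "
        "{ %s _additional { distance } } } }"
        % (collection, concepts, limit, fields)
    )
    return {"query": query}

# "Article" and "title" are placeholder names for this example.
body = build_near_text_query("Article", "vector search", ["title"])
print(json.dumps(body, indent=2))
```

The same body could be sent with any HTTP client; the query text itself is what declares the retrieval, the result limit, and the fields to return.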

Key capabilities:

  • ANN search — HNSW-based index
  • Hybrid search — combines dense and sparse (BM25) retrieval with fusion
  • Native reranking — cross-encoder reranking built into the search pipeline (no separate service)
  • Vectorizer modules — pluggable embedding providers (OpenAI, Cohere, HuggingFace, etc.) called at import and query time
  • Multi-modal — image + text vectors in the same collection
  • GraphQL API — declarative query syntax for retrieval, filtering, aggregation
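
To make the fusion step of hybrid search concrete, here is a minimal sketch of reciprocal-rank fusion over a dense and a sparse result list. It is a generic illustration, not Weaviate's internal implementation; the function name fuse_rankings, the alpha weighting, and the constant k = 60 are assumptions chosen for the example.

```python
def fuse_rankings(dense_ids, sparse_ids, alpha=0.5, k=60):
    """Combine two ranked ID lists; alpha weights the dense list, 1 - alpha the sparse one."""
    scores = {}
    for weight, ranking in ((alpha, dense_ids), (1.0 - alpha, sparse_ids)):
        for rank, doc_id in enumerate(ranking):
            # Earlier ranks contribute more; k damps the gap between top ranks.
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank + 1)
    return sorted(scores, key=lambda d: scores[d], reverse=True)

dense = ["a", "b", "c"]    # hypothetical vector-search results, best first
sparse = ["c", "a", "d"]   # hypothetical BM25 results, best first
fused = fuse_rankings(dense, sparse)
print(fused)
```

Documents that appear near the top of both lists accumulate the highest fused score, which is the behavior a hybrid query relies on.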

Two-Stage Pipeline

Weaviate’s canonical retrieval pattern:

  1. Stage 1 — Bi-encoder ANN: fast candidate retrieval using pre-computed embeddings (Bi-Encoder)
  2. Stage 2 — Cross-encoder reranking: accurate reranking of top-K candidates using query-aware scoring (Cross-Encoder)

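This separates scalability (stage 1) from accuracy (stage 2): document embeddings are computed once and searched through an index, while the cross-encoder scores each query-document pair jointly and is therefore applied only to the small candidate set.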
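
The retrieve-then-rerank pattern can be sketched in plain Python as follows. This is a toy model under stated assumptions: exhaustive cosine search stands in for the ANN index, a word-overlap function stands in for a cross-encoder model, and all names, documents, and embedding values are made up for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve(query_vec, doc_vecs, top_k):
    """Stage 1: rank documents by similarity of pre-computed embeddings.
    Exhaustive search stands in for an ANN index here."""
    ranked = sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)
    return ranked[:top_k]

def rerank(query, candidates, docs, pair_score):
    """Stage 2: re-score each (query, document) pair jointly."""
    return sorted(candidates, key=lambda d: pair_score(query, docs[d]), reverse=True)

def overlap_score(query, text):
    """Toy stand-in for a cross-encoder: fraction of query words found in the text."""
    q = set(query.lower().split())
    return len(q & set(text.lower().split())) / len(q)

docs = {
    "d1": "hnsw index tuning",
    "d2": "reranking with cross encoders",
    "d3": "cooking pasta at home",
}
doc_vecs = {"d1": [0.9, 0.1], "d2": [0.8, 0.3], "d3": [0.0, 1.0]}  # made-up embeddings
query, query_vec = "cross encoders reranking", [1.0, 0.2]

candidates = retrieve(query_vec, doc_vecs, top_k=2)       # cheap, runs over all documents
final = rerank(query, candidates, docs, overlap_score)    # costly, runs over top_k only
print(candidates, final)
```

The point of the split is visible in the call pattern: the pairwise scorer is invoked only top_k times per query, regardless of how many documents are stored.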

Related

  • Qdrant Vector DB — competing vector database; stronger quantization options (TurboQuant)
  • Pinecone — managed-only competitor
  • FAISS — library only; no pipeline features

Articles

People

  • Laura Ham — cross-encoder pipeline explainer