BlockMax WAND: How Weaviate Achieved 10x Faster Keyword Search

Authors: André Mourão, Joon-Pil (JP) Hwang (Weaviate)
Published: 2025-02-26

Weaviate implemented Block-Max WAND as a technical preview in v1.29, achieving p50 query speedups of 5–10x and index size reductions of 50–90% over its existing WAND-based keyword search.


Problem

Keyword search is a core leg of Hybrid Search for RAG and agentic AI, but as text corpora grow, its latency increases relative to vector search. Even with WAND already reducing the share of documents inspected from 100% to 15–30%, there was significant headroom for further optimization.

Key Concepts

Inverted Index and Tokenization

Documents are tokenized (split on whitespace, lowercased, etc.) and stored in posting lists that map each term to (doc_id, tf) pairs, where tf is the term's frequency in that document. Exhaustive scoring must visit every document containing at least one query term, which is wasteful for top-k queries that need only 10–100 results.
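To make the posting-list structure concrete, here is a minimal Python sketch. The function names and the lowercase-and-split tokenizer are illustrative assumptions, not Weaviate's implementation.

```python
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase and split on whitespace (one of several possible tokenizations)."""
    return text.lower().split()

def build_inverted_index(docs):
    """Map each term to a posting list of (doc_id, tf) pairs, sorted by doc_id."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        # Counter yields the term frequency (tf) of each term in this document.
        for term, tf in Counter(tokenize(text)).items():
            index[term].append((doc_id, tf))
    return dict(index)

docs = [
    "Fast keyword search",
    "keyword search and vector search",
    "Vector databases",
]
index = build_inverted_index(docs)

# Exhaustive scoring for the query "keyword search" would have to visit
# every document that appears in either posting list.
candidates = sorted({doc_id for term in ("keyword", "search")
                     for doc_id, _ in index.get(term, [])})
```

Because documents are added in doc_id order, each posting list comes out sorted by doc_id, which is what the skipping and compression techniques below rely on.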

WAND (existing approach)

WAND uses global per-term IDF upper bounds to skip documents that cannot beat the current k-th best score. On standard benchmarks, this reduces the share of documents inspected from 100% to 15–30%.
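The skipping idea can be sketched as follows. This is a simplified, assumption-laden illustration rather than Weaviate's code: each posting is taken to carry a precomputed score contribution, the per-term upper bound is simply the largest contribution in the list, and `wand_top_k` is a hypothetical name. It assumes k >= 1 and non-empty posting lists.

```python
import heapq

def wand_top_k(postings, k):
    """Return (top-k as [(score, doc_id)], number of documents fully scored).

    postings: dict term -> list of (doc_id, score), sorted by doc_id.
    """
    lists = list(postings.values())
    # Global upper bound per term: the largest score it can ever contribute.
    upper = [max(score for _, score in pl) for pl in lists]
    pos = [0] * len(lists)   # one cursor per posting list
    heap = []                # min-heap of (score, doc_id); heap[0] is the k-th best
    scored = 0

    while True:
        # Cursors that still have postings, ordered by their current doc_id.
        active = sorted((lists[i][pos[i]][0], i)
                        for i in range(len(lists)) if pos[i] < len(lists[i]))
        threshold = heap[0][0] if len(heap) == k else float("-inf")

        # Pivot: first doc_id at which the summed upper bounds beat the threshold.
        acc, pivot = 0.0, None
        for doc_id, i in active:
            acc += upper[i]
            if acc > threshold:
                pivot = doc_id
                break
        if pivot is None:
            break  # no remaining document can enter the top-k

        if active[0][0] == pivot:
            # Every cursor positioned on the pivot contributes; score it fully.
            score = 0.0
            for doc_id, i in active:
                if doc_id != pivot:
                    break
                score += lists[i][pos[i]][1]
                pos[i] += 1
            scored += 1
            if len(heap) < k:
                heapq.heappush(heap, (score, pivot))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, pivot))
        else:
            # The lagging cursor skips ahead to the pivot without scoring anything.
            i = active[0][1]
            while pos[i] < len(lists[i]) and lists[i][pos[i]][0] < pivot:
                pos[i] += 1

    return sorted(heap, reverse=True), scored

postings = {
    "keyword": [(0, 1.0), (2, 0.5), (5, 2.0), (7, 0.25)],
    "search":  [(1, 0.25), (2, 0.75), (5, 1.0), (6, 0.125), (9, 0.125)],
}
top, scored = wand_top_k(postings, k=2)
```

In this toy input, seven distinct documents match at least one term, but some are skipped once the top-2 threshold rises above what the "search" list alone can contribute.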

Block-Max WAND

Block-Max WAND (BMW) further refines this by dividing posting lists into fixed blocks (128 docs/block in Weaviate), each with local max impact metadata:

  • Shallow advances: skip entire blocks without decoding doc IDs, using block-level max impact
  • Deep block skipping: avoid loading blocks from disk entirely when they cannot beat the threshold
  • Reduces documents inspected from 15–30% (WAND) to 5–15% — roughly halving WAND’s work

Dataset                    WAND % scored  BMW % scored  Reduction
MS Marco (8.6M docs)       15.1%          6.7%          -56%
Fever (5.4M docs)          20.8%          8.4%          -60%
Climate Fever (5.4M docs)  29.3%          12.2%         -58%
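
The block-level metadata described above can be sketched in Python as follows. This is an illustration under stated assumptions: the dictionary layout, the function names, and the tiny block size of 4 are hypothetical, and postings are taken to carry precomputed score contributions. It shows only the bound check and the skip target, not a full Block-Max WAND query loop.

```python
from bisect import bisect_left

BLOCK_SIZE = 4  # illustrative only; the text cites 128 docs/block in Weaviate

def build_blocks(posting, block_size=BLOCK_SIZE):
    """Split a doc_id-sorted posting list into blocks with local metadata."""
    blocks = []
    for start in range(0, len(posting), block_size):
        chunk = posting[start:start + block_size]
        blocks.append({
            "last_doc": chunk[-1][0],                  # block boundary
            "max_impact": max(s for _, s in chunk),    # local score upper bound
            "entries": chunk,                          # would stay encoded on disk
        })
    return blocks

def find_block(blocks, doc_id):
    """Index of the first block whose last doc_id is >= doc_id (metadata only)."""
    return bisect_left([b["last_doc"] for b in blocks], doc_id)

def block_upper_bound(all_blocks, doc_id):
    """Sum of block-level max impacts for doc_id, without reading any entries."""
    total = 0.0
    for blocks in all_blocks:
        i = find_block(blocks, doc_id)
        if i < len(blocks):
            total += blocks[i]["max_impact"]
    return total

def next_candidate(all_blocks, doc_id):
    """First doc_id past the nearest block boundary, or None if lists are exhausted."""
    ends = []
    for blocks in all_blocks:
        i = find_block(blocks, doc_id)
        if i < len(blocks):
            ends.append(blocks[i]["last_doc"])
    return min(ends) + 1 if ends else None

keyword = [(0, 0.5), (1, 0.25), (2, 4.0), (3, 0.5),
           (10, 0.25), (11, 0.5), (12, 0.25), (13, 0.5)]
search = [(2, 1.0), (3, 0.5), (5, 0.25), (9, 0.5),
          (10, 0.5), (12, 0.25), (14, 0.5), (20, 2.0)]
all_blocks = [build_blocks(keyword), build_blocks(search)]

threshold = 3.0  # assumed current k-th best score
global_bound = max(s for _, s in keyword) + max(s for _, s in search)
skip_doc_10 = block_upper_bound(all_blocks, 10) <= threshold
```

With these toy lists, the global bound (6.0) exceeds the threshold, so plain WAND would have to score document 10. The block-level bound for document 10 is only 2.5, so the whole block range can be skipped and the search resumes at `next_candidate(all_blocks, 10)`.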

Compression

Block storage also enables aggressive compression:

varenc (variable-length encoding): store only the minimum bits needed to represent the max value in each block.
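
As a rough illustration of the idea, the sketch below packs a block of values using only as many bits per value as the block's maximum requires. It is generic bit-packing written for this note, with hypothetical function names, and does not reproduce Weaviate's on-disk format.

```python
def pack_block(values):
    """Pack non-negative ints using the bit width of the largest value in the block."""
    width = max(max(values).bit_length(), 1)
    packed = 0
    for i, v in enumerate(values):
        packed |= v << (i * width)
    n_bytes = (width * len(values) + 7) // 8
    return width, packed.to_bytes(n_bytes, "little")

def unpack_block(width, data, count):
    """Inverse of pack_block: recover `count` values of `width` bits each."""
    packed = int.from_bytes(data, "little")
    mask = (1 << width) - 1
    return [(packed >> (i * width)) & mask for i in range(count)]

tfs = [1, 3, 2, 7, 1, 1, 4, 2]          # term frequencies in one block
width, data = pack_block(tfs)           # max is 7, so 3 bits per value
restored = unpack_block(width, data, len(tfs))
```

Here eight term frequencies fit in 3 bytes, compared with 32 bytes if each were stored as a fixed 32-bit integer.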

Delta encoding (doc IDs): store differences between consecutive doc IDs. Consecutive IDs in a sorted posting list are close together, producing small deltas that compress well.
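
A minimal sketch of delta encoding, assuming a sorted list of integer doc IDs (the function names are illustrative):

```python
from itertools import accumulate

def delta_encode(doc_ids):
    """Store the first doc ID as-is, then the gap to each following doc ID."""
    deltas, prev = [], 0
    for doc_id in doc_ids:
        deltas.append(doc_id - prev)
        prev = doc_id
    return deltas

def delta_decode(deltas):
    """A running sum of the gaps restores the original doc IDs."""
    return list(accumulate(deltas))

doc_ids = [1000000, 1000003, 1000004, 1000010]
deltas = delta_encode(doc_ids)

# Bits needed for the largest raw ID versus the largest gap after the first entry.
raw_bits = max(doc_ids).bit_length()
gap_bits = max(deltas[1:]).bit_length()
```

In this example the raw IDs need 20 bits each, while the gaps after the first entry need only 3 bits, which is why delta encoding pairs well with per-block minimum-bit storage.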

Combined result: 50–90% reduction in index size depending on data distribution.

Dataset                          WAND index  BMW index  Reduction
MS Marco (8.6M docs)             10531 MB    941 MB     -91%
Fever/Climate Fever (5.4M docs)  9326 MB     1175 MB    -87%

Query Time Results

Dataset                    WAND p50  BMW p50  Reduction
MS Marco (8.6M docs)       136 ms    27 ms    -80%
Fever (5.4M docs)          517 ms    33 ms    -94%
Climate Fever (5.4M docs)  712 ms    87 ms    -88%
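
For readers who want to relate the Reduction column to a speedup factor, the arithmetic can be sketched as below. The latencies are copied from the table; the helper names are illustrative.

```python
def reduction_pct(before, after):
    """Percentage reduction, rounded to the nearest whole percent."""
    return round(100 * (1 - after / before))

def speedup(before, after):
    """How many times faster `after` is than `before`."""
    return before / after

p50_ms = {
    "MS Marco": (136, 27),
    "Fever": (517, 33),
    "Climate Fever": (712, 87),
}

for name, (wand, bmw) in p50_ms.items():
    print(f"{name}: -{reduction_pct(wand, bmw)}%, {speedup(wand, bmw):.1f}x faster")
```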

At 100M documents, throughput scaled from 1 to 50 QPS while p50 latency stayed at 100–200 ms and p99 at or below 1000 ms.

Status and Availability

Available in Weaviate v1.29 as a technical preview. Requires:

  • New collections (no transparent migration for existing data yet)
  • Specific environment variables to be enabled

The preview is not recommended for production environments. Weaviate is developing a transparent migration path.
