BBQ — Better Binary Quantization

Elasticsearch’s production approach to aggressive vector compression, combining binary quantization with techniques that recover precision. Part of Elasticsearch’s broader Vector Quantization story.

The Core Idea

Naive binary quantization (sign of each dimension → 1 bit) is fast but lossy. BBQ adds:

  1. Centroid centering — subtract the mean vector from all stored vectors before quantizing; reduces distortion from asymmetric distributions
  2. Rescoring — BBQ candidates are reranked with full-precision dot products, recovering most of the lost recall

Net result: 32× memory reduction with recall competitive with float32.

OSQ — Optimized Scalar Quantization

Elastic’s companion technique: maps float32 dimensions to a uniform integer grid, enabling integer SIMD arithmetic.

  • 10–40× faster than float32 HNSW traversal on CPU hardware
  • Uniform grid = every comparison becomes integer arithmetic (no float division)
  • Uses anisotropic loss — weights high-magnitude dimensions more heavily when choosing quantization boundaries, since they contribute more to dot product
  • CPU-friendly: leverages AVX2/AVX-512 SIMD without GPU requirement

OSQ vs TurboQuant

TurboQuant (Google Research, Zandieh et al. 2026) uses a rotation + Lloyd-Max codebook approach. OSQ uses a uniform integer grid — simpler, SIMD-native.

  • Elastic’s benchmark (Thomas Veasey): OSQ competitive with TurboQuant at practical recall targets; OSQ wins on CPU throughput
  • Qdrant’s benchmark (10 datasets): TurboQuant 4-bit (8×) beats SQ on 3/10 datasets by up to 4.6 pp; TurboQuant 2-bit beats BQ by 9–24 pp across all datasets

Elasticsearch BBQ in Practice

- Index type: dense_vector with element_type: byte or bit
- bbq_hnsw: Elasticsearch's binary quantized HNSW index
- bbq_flat: flat scan with binary quantization
- Rescoring: num_candidates >> k, then exact rescore

Articles

People

  • Thomas Veasey — Elastic principal engineer; BBQ and OSQ designer