Scalar Quantization

Maps each float32 dimension of a dense vector to a smaller integer type, one coordinate at a time and independently of the others. The simplest and most widely used form of Vector Quantization. Works on any embedding distribution; no isotropy assumption is required.

How It Works

For each coordinate independently:

  1. Determine the min/max range of values across the dataset (or segment)
  2. Map the float32 range linearly onto the integer range
  3. Store the integer; reconstruct float at score time via the inverse mapping
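
Worked example (one coordinate, SQ8):

  float32 value:  0.47
  range:          [-1.2, 1.4]  (width 1.4 - (-1.2) = 2.6)   →   int8 range: [-128, 127]
  encoded:        round((0.47 + 1.2) / 2.6 × 255 - 128)  =  round(35.79)  =  36
  decoded:        -1.2 + (36 + 128) / 255 × 2.6  ≈  0.472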
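
The three steps can be sketched in a few lines of pure Python. This is a minimal illustration, assuming one [min, max] range per dimension and the signed int8 grid used above. The function names (fit_ranges, encode, decode) are hypothetical and are not taken from any library. Values outside the fitted range are clipped to its ends.

```python
# Minimal per-coordinate SQ8 sketch (hypothetical names, not a library API).
# Each dimension gets its own [lo, hi] range learned from the data and is
# mapped linearly onto the signed int8 grid [-128, 127].

def fit_ranges(vectors):
    """Step 1: per-dimension min/max over the dataset (or segment)."""
    dims = len(vectors[0])
    lows = [min(v[d] for v in vectors) for d in range(dims)]
    highs = [max(v[d] for v in vectors) for d in range(dims)]
    return lows, highs

def encode(x, lows, highs):
    """Step 2: map each float linearly onto the 256-level int8 grid."""
    codes = []
    for value, lo, hi in zip(x, lows, highs):
        width = hi - lo
        if width == 0:                       # constant dimension: one code suffices
            codes.append(-128)
            continue
        clipped = min(max(value, lo), hi)    # out-of-range values saturate
        codes.append(round((clipped - lo) / width * 255 - 128))
    return codes

def decode(codes, lows, highs):
    """Step 3: inverse mapping, applied at score time."""
    return [lo + (q + 128) / 255 * (hi - lo)
            for q, lo, hi in zip(codes, lows, highs)]
```

For example, encode([0.47], [-1.2], [1.4]) returns [36], and decoding that code gives about 0.472. The reconstruction is always within half a grid step (2.6 / 255 / 2 ≈ 0.005) of any in-range input.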

Variants

Variant       Bits/dim   Compression (vs. float32)   Recall loss
SQ8 (int8)    8          4×                          < 1% on most embeddings
SQ4 (int4)    4          8×                          1–3% typically
SQ6           6          ~5.3×                       Between SQ8 and SQ4

SQ8 is the practical default: near-lossless at 4× compression.
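
The compression column is simply 32 bits divided by the code width. The precision cost follows from the grid step, range / (2^bits - 1). The sketch below makes both quantities concrete. It also shows one way to actually realize SQ4's 8× saving, by packing two 4-bit codes into each byte. It assumes a single uniform range per dimension and ignores range metadata. The nibble order (low nibble first) is an arbitrary choice for illustration, not any engine's on-disk format.

```python
# Sketch: how bit width trades memory for precision in uniform SQ.
# Assumptions: float32 inputs, one [lo, hi] range per dimension,
# range metadata ignored; function names are hypothetical.

def compression_ratio(bits):
    """float32 spends 32 bits per dimension."""
    return 32 / bits

def max_abs_error(lo, hi, bits):
    """Worst-case rounding error for in-range values: half a grid step."""
    levels = 2 ** bits                 # 256 for SQ8, 64 for SQ6, 16 for SQ4
    step = (hi - lo) / (levels - 1)
    return step / 2

def pack_int4(codes):
    """Store two unsigned 4-bit codes (0..15) per byte, low nibble first."""
    if len(codes) % 2:
        codes = codes + [0]            # pad odd-length vectors
    return bytes(lo | (hi << 4) for lo, hi in zip(codes[0::2], codes[1::2]))

def unpack_int4(packed, dims):
    """Inverse of pack_int4; drops the padding nibble if present."""
    out = []
    for byte in packed:
        out.extend((byte & 0x0F, byte >> 4))
    return out[:dims]

for name, bits in [("SQ8", 8), ("SQ6", 6), ("SQ4", 4)]:
    print(f"{name}: {compression_ratio(bits):.1f}x smaller, "
          f"max error {max_abs_error(-1.2, 1.4, bits):.4f} on a [-1.2, 1.4] range")
```

On the [-1.2, 1.4] range from the worked example, half a step grows from about 0.005 at SQ8 to about 0.087 at SQ4. This is why the recall loss rises as the bit width drops.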

Optimized Scalar Quantization (OSQ) — Elastic’s Variant

Elasticsearch’s production implementation adds:

  • Uniform integer grid → every similarity computation becomes integer arithmetic (no float division per comparison)
  • Anisotropic loss function — weights high-magnitude dimensions more heavily when choosing quantization boundaries, since they contribute more to the dot product
  • SIMD arithmetic: _mm_maddubs_epi16, AVX-512 integer dot products
  • Result: 10–40× faster HNSW traversal than float32 on CPU
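
The reason a uniform grid turns scoring into integer arithmetic is algebraic. If every coordinate of a vector is stored as x_i ≈ lo + step · q_i, the dot product of two reconstructions expands into a few scalar terms plus one integer dot product Σ q_a·q_b. That integer dot product is the only part that touches every dimension. The sketch below assumes, purely for illustration, one lower bound and one step per vector with unsigned 8-bit codes. Elastic's actual OSQ encoding and correction terms may differ. The names quantize, approx_dot, and dequantize are hypothetical.

```python
# Sketch: scoring quantized vectors with one integer dot product.
# Assumption (illustrative, not Elastic's code): each vector is stored as
# unsigned codes q_i in [0, 2^bits - 1] plus a per-vector `lo` and `step`,
# so that x_i ≈ lo + step * q_i.

def quantize(x, bits=8):
    lo, hi = min(x), max(x)
    levels = 2 ** bits - 1
    step = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / step) for v in x]
    return codes, lo, step, sum(codes)   # code sum precomputed once per vector

def approx_dot(a, b):
    """Dot product of the two reconstructions.

    sum_i (lo_a + s_a*qa_i)(lo_b + s_b*qb_i)
      = n*lo_a*lo_b + lo_a*s_b*sum(qb) + lo_b*s_a*sum(qa) + s_a*s_b*sum(qa_i*qb_i)
    Only the last sum runs over all dimensions, and it is integer-only,
    which is the loop that SIMD integer instructions can accelerate.
    """
    qa, lo_a, s_a, sum_a = a
    qb, lo_b, s_b, sum_b = b
    n = len(qa)
    int_dot = sum(x * y for x, y in zip(qa, qb))   # integer inner loop
    return (n * lo_a * lo_b + lo_a * s_b * sum_b + lo_b * s_a * sum_a
            + s_a * s_b * int_dot)

def dequantize(q):
    codes, lo, step, _ = q
    return [lo + step * c for c in codes]
```

The per-vector terms (lo, step, and the code sum) are computed once at index time. As a result, each comparison costs one integer dot product plus a handful of float multiply-adds.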

See BBQ (which covers OSQ as part of Elasticsearch’s quantization stack).

vs. Other Quantization Methods

Method                 Compression   Recall                      Distribution assumption
SQ8                    4×            ~float32                    None (universal)
Binary Quantization    16–32×        Drops without rescoring     Zero-mean, isotropic
TurboQuant 4-bit       8×            Competitive with SQ8        Handles anisotropy via calibration
TurboQuant 2-bit       16×           Beats BQ 2-bit by 9–24 pp   Handles anisotropy
Product Quantization   32–64×        Moderate                    Learned per dataset

SQ is the safest default: it works well regardless of the embedding model. Switch to TurboQuant 4-bit to halve memory at similar recall.
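
A back-of-the-envelope calculation shows what the compression ratios mean per vector. It assumes, hypothetically, 768-dimensional embeddings and counts only the packed code payload. Per-vector or per-segment metadata (ranges, norms, correction terms, PQ codebooks) is ignored. Binary Quantization is counted at 1 bit per dimension. PQ is omitted because its size depends on the chosen number of subquantizers.

```python
# Payload bytes per vector for the bit widths in the comparison table.
# Assumptions: 768 dimensions (hypothetical), metadata ignored.

DIMS = 768

BITS_PER_DIM = {
    "float32": 32,
    "SQ8": 8,
    "TurboQuant 4-bit": 4,
    "TurboQuant 2-bit": 2,
    "Binary Quantization (1-bit)": 1,
}

def payload_bytes(bits_per_dim, dims=DIMS):
    """Bytes for one vector's packed codes, rounded up to whole bytes."""
    return (bits_per_dim * dims + 7) // 8

for name, bits in BITS_PER_DIM.items():
    size = payload_bytes(bits)
    print(f"{name:28s} {size:5d} bytes  ({payload_bytes(32) / size:.0f}x vs float32)")
```

At this dimensionality, SQ8 needs 768 bytes per vector and a 4-bit code needs 384. That is the "halve memory" step mentioned above.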

L1 Compatibility

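SQ works for any distance metric, including L1, because it quantizes each coordinate in place without mixing dimensions. Rotation-based methods (TurboQuant, RaBitQ) first apply an orthogonal rotation, which preserves L2 distances and inner products but not L1 distances. This makes SQ the only option among the methods above for L1 similarity.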
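
A two-dimensional example makes the L1 point concrete. Rotating two points by 45° leaves their L2 distance unchanged but changes their L1 distance. Any method that scores in a rotated space therefore cannot preserve L1 rankings in general. SQ, by contrast, reconstructs each coordinate to within half a grid step. By the triangle inequality, the L1 distance between two reconstructions differs from the true L1 distance by at most the sum of the per-dimension step sizes. The helper names below are illustrative only.

```python
import math

def rotate2d(v, theta):
    """Orthogonal 2-D rotation by angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def l1(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a, b = (1.0, 0.0), (0.0, 0.0)
theta = math.pi / 4
ra, rb = rotate2d(a, theta), rotate2d(b, theta)

print(l2(a, b), l2(ra, rb))   # L2 distance is preserved by the rotation
print(l1(a, b), l1(ra, rb))   # L1 distance changes: 1.0 vs about 1.414
```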

Where Used

  • HNSW + SQ is the standard production combination in Elasticsearch, Qdrant, and Weaviate
  • IVF-SQ: SQ applied inside each inverted list
  • FAISS IndexIVFScalarQuantizer
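
To show where SQ sits inside an inverted-file index, here is a toy pure-Python IVF-SQ. A coarse quantizer routes each vector to its nearest centroid's list, and only SQ8 codes are stored inside the lists. At query time, only the nprobe nearest lists are scanned. Everything here is a hypothetical simplification, not FAISS's IndexIVFScalarQuantizer implementation. This includes the ToyIVFSQ class and its methods, caller-supplied centroids instead of k-means training, encoding of residuals (vector minus centroid), one global residual range per dimension, and squared-L2 scoring.

```python
# Toy IVF-SQ index (hypothetical, pure Python; not FAISS's implementation).

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

class ToyIVFSQ:
    def __init__(self, centroids):
        self.centroids = centroids                      # caller-supplied, no k-means
        self.lists = {i: [] for i in range(len(centroids))}  # list id -> [(id, codes)]

    def _nearest_lists(self, x, nprobe):
        order = sorted(range(len(self.centroids)),
                       key=lambda i: sq_dist(x, self.centroids[i]))
        return order[:nprobe]

    def _residual(self, v, lid):
        return [a - b for a, b in zip(v, self.centroids[lid])]

    def train(self, vectors):
        """Learn one [lo, hi] range per dimension over all residuals."""
        residuals = [self._residual(v, self._nearest_lists(v, 1)[0]) for v in vectors]
        dims = len(vectors[0])
        self.lo = [min(r[d] for r in residuals) for d in range(dims)]
        self.hi = [max(r[d] for r in residuals) for d in range(dims)]

    def _encode(self, r):
        codes = []
        for v, lo, hi in zip(r, self.lo, self.hi):
            width = (hi - lo) or 1.0                    # avoid dividing by zero
            v = min(max(v, lo), hi)                     # clip to the trained range
            codes.append(round((v - lo) / width * 255)) # unsigned SQ8 code, 0..255
        return codes

    def _decode(self, codes):
        return [lo + c / 255 * ((hi - lo) or 1.0)
                for c, lo, hi in zip(codes, self.lo, self.hi)]

    def add(self, vectors):
        """Ids are positions in `vectors`; call once in this toy."""
        for vid, v in enumerate(vectors):
            lid = self._nearest_lists(v, 1)[0]
            self.lists[lid].append((vid, self._encode(self._residual(v, lid))))

    def search(self, query, k=1, nprobe=1):
        """Scan the nprobe closest lists, scoring reconstructed vectors."""
        candidates = []
        for lid in self._nearest_lists(query, nprobe):
            c = self.centroids[lid]
            for vid, codes in self.lists[lid]:
                recon = [a + b for a, b in zip(c, self._decode(codes))]
                candidates.append((sq_dist(query, recon), vid))
        candidates.sort()
        return candidates[:k]
```

A production index would train the centroids (for example with k-means) and use SIMD distance kernels on the codes. The structure, however, is the same: coarse routing first, then SQ-compressed scanning within the probed lists.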