Scalar Quantization

Maps each float32 dimension of a dense vector to a smaller integer type, independently, coordinate by coordinate. The simplest and most widely used form of Vector Quantization. Works on any embedding distribution; no isotropy assumption required.

How It Works

For each coordinate independently:

Determine the min/max range of values across the dataset (or segment)
Map the float32 range linearly onto the integer range
Store the integer; reconstruct float at score time via the inverse mapping

float32 value: 0.47
range: [-1.2, 1.4]   →   int8 range: [-128, 127]
encoded:  round((0.47 + 1.2) / 2.6 × 255 - 128)  =  round(35.79)  =  36
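
The three steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not any particular library's implementation: the helper names (`fit_range`, `encode`, `decode`, `quantize_dataset`) are hypothetical, and clamping out-of-range values to the fitted range is an assumption about how values unseen at fit time would be handled.

```python
def fit_range(values):
    """Step 1: min/max of one coordinate across the dataset (or segment)."""
    return min(values), max(values)


def encode(x, lo, hi):
    """Step 2: map a float in [lo, hi] linearly onto an int8 code in [-128, 127]."""
    if hi == lo:
        return -128  # degenerate range: every value shares one code
    x = min(max(x, lo), hi)  # assumption: clamp values outside the fitted range
    return round((x - lo) / (hi - lo) * 255) - 128


def decode(code, lo, hi):
    """Step 3: inverse mapping, applied at score time."""
    return lo + (code + 128) / 255 * (hi - lo)


def quantize_dataset(vectors):
    """Fit one range per coordinate, then encode every vector."""
    ranges = [fit_range(column) for column in zip(*vectors)]
    codes = [
        [encode(x, lo, hi) for x, (lo, hi) in zip(vec, ranges)]
        for vec in vectors
    ]
    return codes, ranges


# The worked example: 0.47 with range [-1.2, 1.4].
code = encode(0.47, -1.2, 1.4)
approx = decode(code, -1.2, 1.4)
print(code, round(approx, 4))
```

The reconstruction error is bounded by half a grid step, here (1.4 + 1.2) / 255 / 2, which is why 8 bits per dimension loses so little.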

Variants

Variant	Bits/dim	Compression	Recall impact
SQ8 (int8)	8	4×	< 1% on most embeddings
SQ4 (int4)	4	8×	1–3% typically
SQ6	6	~5.3×	Between SQ8 and SQ4

SQ8 is the practical default: near-lossless at 4× compression.
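
The compression column follows directly from 32 bits divided by bits per dimension. The sketch below reproduces that arithmetic and shows one way SQ4 codes could be stored two per byte. The function names and the low-nibble-first layout are illustrative assumptions; real engines choose their own packing.

```python
def compression_ratio(bits_per_dim):
    """Size of float32 storage relative to the quantized storage."""
    return 32 / bits_per_dim


def bytes_per_vector(dims, bits_per_dim):
    """Bytes needed for one vector's codes, rounded up to a whole byte."""
    return (dims * bits_per_dim + 7) // 8


def pack_int4(codes):
    """Pack unsigned 4-bit codes (0..15) two per byte, low nibble first."""
    if any(not 0 <= c <= 15 for c in codes):
        raise ValueError("int4 codes must be in 0..15")
    padded = list(codes) + [0] * (len(codes) % 2)  # pad odd lengths with a zero code
    return bytes(lo | (hi << 4) for lo, hi in zip(padded[0::2], padded[1::2]))


def unpack_int4(packed, count):
    """Inverse of pack_int4; count drops the padding nibble if one was added."""
    out = []
    for byte in packed:
        out.append(byte & 0x0F)
        out.append(byte >> 4)
    return out[:count]


for bits in (8, 6, 4):
    print(bits, round(compression_ratio(bits), 2), bytes_per_vector(768, bits))
```

For a 768-dimensional embedding (3072 bytes as float32) this gives 768, 576, and 384 bytes for SQ8, SQ6, and SQ4 respectively, before any per-vector or per-segment metadata.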

Optimized Scalar Quantization (OSQ) — Elastic’s Variant

Elasticsearch’s production implementation adds:

Uniform integer grid → every similarity computation becomes integer arithmetic (no float division per compare)
Anisotropic loss function — weights high-magnitude dimensions more heavily when choosing quantization boundaries, since they contribute more to the dot product
SIMD arithmetic: _mm_maddubs_epi16, AVX-512 integer dot products
Result: 10–40× faster HNSW traversal than float32 on CPU
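
To see why a uniform grid turns scoring into integer arithmetic, the sketch below expands the dot product of two affine-quantized vectors. It is a generic illustration, not Elasticsearch's code: it assumes one `(lo, step)` pair per vector and unsigned 8-bit codes, and all function names are hypothetical. The only per-dimension work is the integer sum of products, which is the part SIMD instructions accelerate.

```python
def quantize_vector(vec, levels=255):
    """Affine-quantize one vector so that vec[i] ~= lo + step * codes[i]."""
    lo, hi = min(vec), max(vec)
    step = (hi - lo) / levels
    if step == 0:
        step = 1.0  # constant vector: all codes are 0 and lo carries the value
    codes = [round((x - lo) / step) for x in vec]
    return codes, lo, step


def int_dot(a_codes, b_codes):
    """Pure integer inner loop; no floats touched per dimension."""
    return sum(a * b for a, b in zip(a_codes, b_codes))


def approx_dot(qa, qb):
    """Expand sum((lo_a + s_a*a_i) * (lo_b + s_b*b_i)) into four terms.

    sum(a) and sum(b) depend on one vector each, so they can be
    precomputed at index time; only int_dot runs per comparison.
    """
    (a, lo_a, s_a), (b, lo_b, s_b) = qa, qb
    d = len(a)
    return (
        d * lo_a * lo_b
        + lo_a * s_b * sum(b)
        + lo_b * s_a * sum(a)
        + s_a * s_b * int_dot(a, b)
    )


x = [0.1, -0.5, 0.9, 0.3]
y = [0.4, 0.2, -0.7, 0.6]
exact = sum(p * q for p, q in zip(x, y))
approx = approx_dot(quantize_vector(x), quantize_vector(y))
print(exact, approx)
```

The float work per comparison is a handful of multiplications applied once to the integer result, independent of dimensionality.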

See BBQ (which covers OSQ as part of Elasticsearch’s quantization stack).

vs. Other Quantization Methods

Method	Compression	Recall	Distribution assumption
SQ8	4×	~float32	None — universal
Binary Quantization	16–32×	Drops without rescoring	Zero-mean, isotropic
TurboQuant 4-bit	8×	Competitive with SQ8	Handles anisotropy via calibration
TurboQuant 2-bit	16×	Beats BQ 2-bit by 9–24 pp	Handles anisotropy
Product Quantization	32–64×	Moderate	Learned per dataset

SQ is the safest default: it works well regardless of embedding model. Switch to TurboQuant 4-bit to halve memory relative to SQ8 at similar recall.

L1 compatibility

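SQ works for all distance metrics including L1. Rotation-based methods (TurboQuant, RaBitQ) preserve L2 distances but not L1 distances, because they quantize after an orthogonal rotation. Among the methods compared above, that leaves SQ as the safe choice for L1 similarity.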
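
A two-dimensional example makes the L1 problem concrete. The sketch below rotates two points by 45 degrees and compares distances before and after. It is only an illustration: the rotation-based methods named above use high-dimensional random rotations, and the helper names here are hypothetical.

```python
import math


def rotate2d(v, theta):
    """Apply a 2D rotation, the simplest orthogonal transform."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])


def l1(a, b):
    """Manhattan distance."""
    return sum(abs(x - y) for x, y in zip(a, b))


def l2(a, b):
    """Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


a, b = (1.0, 0.0), (0.0, 0.0)
theta = math.pi / 4
ra, rb = rotate2d(a, theta), rotate2d(b, theta)

print("L2 before/after:", l2(a, b), l2(ra, rb))
print("L1 before/after:", l1(a, b), l1(ra, rb))
```

The L2 distance is unchanged by the rotation, while the L1 distance grows from 1 to the square root of 2, so L1 rankings computed in the rotated space are not guaranteed to match the original ones. Per-coordinate SQ applies no rotation and avoids the issue.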

Where Used

HNSW + SQ is the standard production combination in Elasticsearch, Qdrant, and Weaviate
IVF-SQ: SQ applied inside each inverted list
FAISS IndexIVFScalarQuantizer

Related Concepts

Vector Quantization: parent concept
Binary Quantization: more aggressive compression; needs rescoring
TurboQuant: rotation-based alternative; 4-bit is competitive with SQ8 recall at 8× compression
BBQ: Elasticsearch’s OSQ + binary quantization stack
HNSW: primary index combined with SQ in production
IVF: IVF-SQ variant
Dense Vector Retrieval: where SQ is applied
