Dimensionality Reduction

Definition

Dimensionality reduction transforms high-dimensional vectors (e.g., 768-dim BERT embeddings) into lower-dimensional representations while preserving as much structure as possible. In search, this reduces storage, speeds up ANN search, and enables visualization.

Why It Matters for Search

Dense embedding models produce vectors of 768–4096 dimensions. At scale:

1B documents × 768 dims × 4 bytes = 3TB of vectors
ANN index build time and memory scale with dimensionality
Reducing to 256 dims can cut storage 3× with modest quality loss

Alternative: Vector Quantization reduces precision per dimension; dimensionality reduction reduces the number of dimensions.

Main Techniques

PCA (Principal Component Analysis)

PCA — linear projection onto axes of maximum variance. Fast, deterministic, reversible.

Good for: moderate compression, preserving global structure, ML preprocessing
Limitation: linear only — can’t capture non-linear manifolds

t-SNE (t-distributed Stochastic Neighbor Embedding)

t-SNE — non-linear projection optimized for preserving local neighborhoods. Produces beautiful 2D/3D visualizations.

Good for: visualization only — not suitable for retrieval (non-parametric, can’t project new points)
Limitation: slow on large datasets, non-deterministic, between-cluster spacing is meaningless

UMAP (Uniform Manifold Approximation and Projection)

UMAP — non-linear projection that preserves both local and global structure better than t-SNE. Faster and parametric (can project new points).

Good for: visualization + moderate retrieval use
Increasingly used to compress embeddings for search

Matryoshka / Truncation

Matryoshka Embeddings are trained so that the first N dimensions already form a good representation — just truncate to reduce dimensionality without any projection. This makes them dimension-flexible at inference time: pick any supported dimension (e.g. 64, 128, 256, 512) without re-encoding. The cleanest approach for models that support it.