Matryoshka embeddings: faster OpenAI vector search using Adaptive Retrieval
OpenAI’s text-embedding-3 models can return shortened embeddings via the dimensions API parameter. This is powered by Matryoshka Representation Learning (MRL), a training technique that embeds information at multiple levels of granularity, coarse to fine, within a single vector.
How shortening works
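Truncate from the end (keep the first k dimensions and drop the rest), then re-normalize to unit length. The normalization step is critical: a truncated unit vector no longer has magnitude 1, which skews inner-product comparisons.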
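The truncate-then-normalize step can be sketched in a few lines of plain Python. The function name `shorten` and the 4-d toy vector are illustrative assumptions, not part of any library. A real embedding would have hundreds or thousands of dimensions.

```python
import math

def shorten(embedding, dims):
    """Keep the first `dims` values, then rescale the result to unit length."""
    head = embedding[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]

# Toy 4-d unit vector standing in for a real embedding (hypothetical data)
full = [0.5, 0.5, 0.5, 0.5]
truncated = full[:2]        # truncation alone: magnitude drops to about 0.707
short = shorten(full, 2)    # truncation plus re-normalization: magnitude 1 again

print(math.sqrt(sum(x * x for x in truncated)))
print(math.sqrt(sum(x * x for x in short)))
```

Without the rescale, inner products between truncated vectors would no longer behave like cosine similarity.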
Adaptive Retrieval
Two-pass search exploiting the sub-vector hierarchy:
- First pass (fast, low-dim): ANN search at 512d → shortlist of match_count × 8 candidates
- Second pass (accurate, high-dim): KNN re-rank at full 3072d
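The two passes above can be illustrated with a small in-memory sketch. It is not the pgvector implementation: the first pass here is an exhaustive scan standing in for an ANN index, the vectors are random 16-d toys, and the names `adaptive_retrieval`, `shorten`, `dot`, and `factor` are assumptions made for this example.

```python
import math
import random

def shorten(vec, dims):
    """Truncate to the first `dims` values and re-normalize to unit length."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def adaptive_retrieval(query, docs, match_count, first_dims, factor=8):
    # First pass: rank every document by inner product on shortened vectors.
    # A production system would use an ANN index here; the full scan is a stand-in.
    q_short = shorten(query, first_dims)
    shortlist = sorted(
        range(len(docs)),
        key=lambda i: dot(q_short, shorten(docs[i], first_dims)),
        reverse=True,
    )[: match_count * factor]
    # Second pass: exact re-rank of the shortlist using the full-size vectors.
    reranked = sorted(shortlist, key=lambda i: dot(query, docs[i]), reverse=True)
    return reranked[:match_count]

# Hypothetical data: 200 random unit vectors of 16 dimensions
random.seed(0)
docs = [shorten([random.gauss(0, 1) for _ in range(16)], 16) for _ in range(200)]
query = docs[42]  # querying with a stored vector, so it should rank first
top = adaptive_retrieval(query, docs, match_count=3, first_dims=4)
print(top)
```

The shortlist multiplier trades speed for recall: a larger `factor` gives the exact second pass more chances to recover items the coarse first pass ranked too low.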
Results (1M DBpedia vectors)
| Method | Accuracy | QPS |
|---|---|---|
| Single-pass ANN 1536d | 89.2% | 670 |
| Adaptive Retrieval 512d→3072d | 99% | 580 |
pgvector implementation
-- Functional index on 512d sub-vectors
-- (sub_vector is assumed to be a user-defined function that truncates and re-normalizes)
CREATE INDEX ON documents
USING hnsw ((sub_vector(embedding, 512)::vector(512)) vector_ip_ops)
WITH (m = 32, ef_construction = 400);
-- Two-pass query: first shortlist at 512d, re-rank at 3072d
Key insights
- text-embedding-3-large @ 256d outperforms ada-002 @ 1536d (MTEB 62.0 vs 61.0)
- Optimal first-pass: 512d (likely a training granularity)
- N-pass extension = Funnel Retrieval (MRL paper)
- Shorter first-pass vectors aren’t always faster — lower accuracy requires retrieving more records
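The N-pass idea can be sketched as a loop over a schedule of (dimensions, shortlist size) pairs, with dimensions growing and the shortlist shrinking at each pass. This is a rough illustration of the funnel concept, not the MRL paper's code. The schedule values, the 32-d toy vectors, and the name `funnel_retrieval` are all assumptions chosen for the example.

```python
import math
import random

def shorten(vec, dims):
    """Truncate to the first `dims` values and re-normalize to unit length."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def funnel_retrieval(query, docs, schedule):
    """schedule: list of (dims, keep) pairs, dims increasing and keep decreasing."""
    candidates = list(range(len(docs)))
    for dims, keep in schedule:
        q = shorten(query, dims)
        # Re-rank only the surviving candidates at this pass's dimensionality.
        candidates = sorted(
            candidates,
            key=lambda i: dot(q, shorten(docs[i], dims)),
            reverse=True,
        )[:keep]
    return candidates

# Hypothetical data: 300 random unit vectors of 32 dimensions
random.seed(1)
docs = [shorten([random.gauss(0, 1) for _ in range(32)], 32) for _ in range(300)]
query = docs[7]  # querying with a stored vector, so it should rank first
result = funnel_retrieval(query, docs, [(4, 64), (8, 32), (16, 16), (32, 4)])
print(result)
```

The sketch also shows the trade-off in the last bullet: a very short first pass is cheap per comparison but has to keep a larger shortlist to avoid dropping true matches early.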