Matryoshka Representations: A Guide to Faster Semantic Search

OpenAI’s text-embedding-3 models are trained with MRL (Matryoshka Representation Learning), so an embedding can be shortened to fewer dimensions and still be useful for similarity search.

Two-phase search strategy

Rather than comparing the full query embedding against every indexed vector, search runs in two phases:

Phase 1: Compare shortened query embeddings (256d) against truncated database embeddings
Phase 2: Re-rank top candidates using full-length embeddings
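
The two phases can be sketched with NumPy as below. This is a minimal illustration, not the original implementation: the function name, the parameter names, and the defaults of 100 candidates and 10 results are assumptions, and every vector is assumed to be unit length already so that a dot product equals cosine similarity.

```python
import numpy as np

def two_phase_search(query_short, query_full, db_short, db_full,
                     n_candidates=100, top_k=10):
    """Return the indices of the top_k best-matching rows, best first.

    Hypothetical helper: all inputs are assumed to be unit-length vectors,
    so a dot product is the cosine similarity.
    """
    # Phase 1: cheap scan of the whole index using the short vectors.
    coarse = db_short @ query_short
    n_candidates = min(n_candidates, coarse.shape[0])
    # argpartition picks the best n_candidates without a full sort.
    candidates = np.argpartition(-coarse, n_candidates - 1)[:n_candidates]

    # Phase 2: rescore only the candidates with the full-length vectors.
    fine = db_full[candidates] @ query_full
    order = np.argsort(-fine)[:top_k]
    return candidates[order]
```

The candidate count is the main tuning knob in this sketch: a larger value makes phase 2 slower but lowers the chance that a good match is dropped in phase 1.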

The test index holds ~45K movies, each embedded from its title, overview, and genre through the OpenAI API using multithreaded requests. Total cost: under $0.08.
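
One way to parallelize the indexing step is a thread pool over batches of texts. The sketch below is an assumption about how this could look, not the code used for the numbers here: `embed_corpus`, `batched`, the batch size, and the worker count are hypothetical, and the embedding call is passed in as `embed_batch` so the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor

def batched(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def embed_corpus(texts, embed_batch, batch_size=256, workers=8):
    """Embed all texts in parallel and return the vectors in input order.

    `embed_batch` is a caller-supplied function (hypothetical here) that
    takes a list of strings and returns one vector per string.
    """
    batches = list(batched(texts, batch_size))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the input batches.
        results = pool.map(embed_batch, batches)
        return [vector for batch in results for vector in batch]

# With the official openai Python client, embed_batch could wrap
# client.embeddings.create(model=..., input=batch) and return
# [item.embedding for item in response.data]; that call is not run here.
```

Threads suit this workload because each request spends most of its time waiting on the network rather than computing.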

Utility: take the full-length embeddings, keep only the first 256 dimensions, and re-normalize each vector to unit length → shortened variants for phase 1.
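
A minimal sketch of such a utility, assuming the embeddings are held in a NumPy array with one vector per row. The name `shorten` and the zero-norm guard are illustrative choices, not part of the original code.

```python
import numpy as np

def shorten(embeddings, dim=256):
    """Keep the first `dim` components and rescale to unit length.

    Works on a single vector or on a matrix with one embedding per row.
    """
    embeddings = np.asarray(embeddings, dtype=np.float32)
    truncated = embeddings[..., :dim]
    norms = np.linalg.norm(truncated, axis=-1, keepdims=True)
    # Guard against division by zero if a truncated prefix is all zeros.
    return truncated / np.maximum(norms, 1e-12)
```

Re-normalizing matters because a truncated vector no longer has unit length, so a plain dot product would no longer equal the cosine similarity.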

Query	Non-MRL search time	MRL search time
“the godfather”	0.34s	0.045s

Roughly 7.5x faster on this query (0.34s vs 0.045s) while maintaining comparable result relevance.

Although the MRL approach adds steps and computes cosine similarity twice, it is faster overall: the first pass compares only 256 dimensions per vector, and the second pass scores only the top candidates rather than the whole index.
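
A rough operation count shows where the saving comes from. The index size and the 256d first pass come from the text above. The full dimension of 1536 and the 100 re-ranked candidates are assumptions chosen for illustration, since neither is stated here.

```python
def multiply_adds(n_vectors, dim):
    """Multiply-add operations needed to score n_vectors of length dim."""
    return n_vectors * dim

N = 45_000         # indexed movies (from the text)
SHORT_DIM = 256    # first-pass dimension (from the text)
FULL_DIM = 1536    # assumption: full embedding length
CANDIDATES = 100   # assumption: vectors re-ranked in phase 2

single_pass = multiply_adds(N, FULL_DIM)
two_phase = multiply_adds(N, SHORT_DIM) + multiply_adds(CANDIDATES, FULL_DIM)

print(single_pass, two_phase, round(single_pass / two_phase, 2))
# prints: 69120000 11673600 5.92
```

Operation counts are only a proxy for wall-clock time, so the measured speedup can differ from this ratio, and a longer full embedding would widen the gap.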

MRL eliminates the need to maintain multiple models for different filtering levels — a single model serves both coarse and fine-grained search.