Matryoshka Representations: A Guide to Faster Semantic Search

OpenAI’s text-embedding-3 models are trained with Matryoshka Representation Learning (MRL), so an embedding can be shortened to its leading dimensions and still be used for similarity search.

Two-phase search strategy

Rather than comparing full embeddings against all indexed vectors:

  1. Phase 1: Compare shortened query embeddings (256d) against truncated database embeddings
  2. Phase 2: Re-rank top candidates using full-length embeddings
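The two phases can be sketched with NumPy as follows. This is a minimal illustration rather than the original implementation: the function name, the `n_candidates` and `top_k` parameters, and the assumption that all vectors are already unit-normalized are mine.

```python
import numpy as np

def two_phase_search(query, db_full, db_short, n_candidates=100, top_k=5):
    """Return indices of the top_k rows of db_full most similar to query.

    Assumes query and every row of db_full and db_short are unit-normalized,
    so a dot product equals cosine similarity.
    """
    short_dim = db_short.shape[1]

    # Phase 1: cheap pass over every row, using only the leading dimensions.
    q_short = query[:short_dim]
    q_short = q_short / np.linalg.norm(q_short)
    coarse = db_short @ q_short
    n_candidates = min(n_candidates, len(coarse))
    candidates = np.argpartition(-coarse, n_candidates - 1)[:n_candidates]

    # Phase 2: exact re-rank of the candidates with full-length vectors.
    fine = db_full[candidates] @ query
    order = np.argsort(-fine)[:top_k]
    return candidates[order]
```

A larger `n_candidates` makes it less likely that a good match is dropped in phase 1, at the cost of more full-length comparisons in phase 2.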

Implementation (45K movie dataset)

Indexed ~45K movies (title + overview + genre) via OpenAI API with multithreading. Total cost: under $0.08.
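A multithreaded indexing loop might look like the sketch below. The original code is not shown, so everything here is an assumption: `embed_all` and its parameters are hypothetical, and `embed_batch` stands in for whatever function sends one batch of texts to the embeddings endpoint and returns one vector per text.

```python
from concurrent.futures import ThreadPoolExecutor

def embed_all(texts, embed_batch, batch_size=100, max_workers=8):
    """Embed texts in parallel batches, preserving input order.

    embed_batch is a hypothetical callable: list[str] -> list[list[float]].
    """
    batches = [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in submission order, so rows stay aligned
        # with the input texts even if batches finish out of order.
        results = pool.map(embed_batch, batches)
        return [vec for batch in results for vec in batch]
```

Threads suit this job because each call spends most of its time waiting on the network. With the OpenAI Python client, `embed_batch` could wrap `client.embeddings.create(model=..., input=batch)` and collect the `embedding` field of each returned item.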

Utility: truncate each full-length embedding to its first 256 dimensions and re-normalize it to unit length, giving the shortened variants used in phase 1.
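One possible version of that utility, assuming the embeddings are held as rows of a NumPy array (the name `shorten` and the zero-norm guard are illustrative, not taken from the original code):

```python
import numpy as np

def shorten(embeddings, dim=256):
    """Keep the first `dim` components of each row and re-normalize to unit length."""
    truncated = np.asarray(embeddings, dtype=np.float32)[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    # Guard against division by zero for an all-zero row.
    return truncated / np.maximum(norms, 1e-12)
```

Re-normalizing matters because a truncated unit vector no longer has length 1, so a plain dot product on it would no longer equal cosine similarity.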

Results

Query            Non-MRL   MRL
“the godfather”  0.34s     0.045s

Roughly 7.5x faster on this query (0.34s vs. 0.045s) while maintaining comparable result relevance.

Although the MRL path adds steps and computes cosine similarity twice, the first pass runs over much shorter vectors and the full-length comparison touches only the top candidates, so the total work drops sharply.
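A rough operation count shows why. The figures below are assumptions for illustration: a full dimension of 1536 (the default for text-embedding-3-small; the model used here is not stated) and 100 phase-1 candidates.

```python
def multiply_adds(n_vectors, full_dim, short_dim, n_candidates):
    """Count multiply-adds for a single full-length pass vs. two-phase search."""
    single_pass = n_vectors * full_dim
    two_phase = n_vectors * short_dim + n_candidates * full_dim
    return single_pass, two_phase

# Assumed sizes: 45K vectors, 1536d full, 256d short, 100 candidates.
single, two = multiply_adds(45_000, 1536, 256, 100)
print(single, two, round(single / two, 1))  # 69120000 11673600 5.9
```

This counts arithmetic only. Measured wall-clock speedups can differ, since memory traffic and sorting also depend on vector length and candidate count.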

Key takeaway

MRL removes the need to maintain separate models for different filtering levels: a single model serves both coarse and fine-grained search.
