Embeddings

An embedding is a numerical representation of an object (text, image, product, user) as a fixed-length vector of real numbers, in which geometric proximity corresponds to semantic similarity. Embeddings are the core primitive of modern semantic search.

"running shoes"  →  [0.21, -0.84, 0.03, 0.67, ...]   (768 dimensions)
"jogging sneakers" →  [0.22, -0.81, 0.05, 0.65, ...]  ← close in space
"tax return"       →  [-0.45, 0.12, -0.78, 0.03, ...] ← far in space

The Core Insight

Meaning can be encoded in direction and distance. Two vectors that point in similar directions encode similar meanings. This enables:

Similarity search — find items closest in vector space to a query
Clustering — group items by meaning
Algebra — king − man + woman ≈ queen (Word2Vec era)
Transfer — a model trained on general text produces useful representations for specialized domains
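
As a minimal sketch of the similarity-search idea, the snippet below scores vectors by cosine similarity and returns the nearest neighbours by brute force. The 4-dimensional vectors are just the visible prefixes of the illustrative vectors at the top of this page, not real model output, and the helper names `cosine_similarity` and `top_k` are made up for this example.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_k(query, corpus, k=1):
    """Return the indices of the k corpus rows most similar to the query."""
    corpus = np.asarray(corpus, dtype=np.float64)
    query = np.asarray(query, dtype=np.float64)
    # After L2 normalization, a dot product equals cosine similarity.
    corpus_n = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    query_n = query / np.linalg.norm(query)
    scores = corpus_n @ query_n
    return np.argsort(-scores)[:k].tolist()

# Toy vectors (illustrative values only).
running_shoes = [0.21, -0.84, 0.03, 0.67]
jogging_sneakers = [0.22, -0.81, 0.05, 0.65]
tax_return = [-0.45, 0.12, -0.78, 0.03]

print(cosine_similarity(running_shoes, jogging_sneakers))  # close to 1
print(cosine_similarity(running_shoes, tax_return))        # negative
print(top_k(running_shoes, [tax_return, jogging_sneakers], k=1))
```

Brute-force scoring like this is fine for small collections; larger ones typically rely on an approximate nearest-neighbour index instead.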

Brief History

Era	Model	Representation
2013–2014	Word2Vec, GloVe	Word-level; static
2016	FastText	Subword-aware
2018	ELMo	Contextual word embeddings
2018	BERT	Contextual; [CLS] token as sentence representation
2019+	Sentence-BERT, E5, BGE	Sentence/passage bi-encoders
2024+	Qwen3 Embedding, NV-Embed	Instruction-tuned; multi-task; Matryoshka (MRL) support in some models

Dense vs Sparse

The two main families differ in representation structure:

Property	Dense Embeddings	Sparse Embeddings
Dimensionality	256–3072 (all non-zero)	30k–100k (mostly zero)
Space	Semantic latent space	Vocabulary space
Strengths	Semantic similarity, paraphrase	Exact match, rare terms
Index type	ANN (HNSW, IVF)	Inverted index
Examples	E5, BGE, OpenAI text-embedding-3	BM25 (lexical term weighting, not learned), SPLADE, ELSER

In practice, the best results often come from Hybrid Search, which combines both.
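
One common way to combine a dense and a sparse result list is reciprocal rank fusion (RRF), which needs only the ranks and not the raw scores. The sketch below is one possible fusion method, not the only way hybrid search is done. The document IDs are hypothetical, and `k=60` is the constant conventionally used with RRF.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in;
    documents are then sorted by their summed score.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

# Hypothetical top-3 results from a dense (ANN) and a sparse (inverted index) retriever.
dense_ranking = ["d2", "d1", "d3"]
sparse_ranking = ["d1", "d4", "d2"]

fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
print(fused)
```

Documents that appear in both lists (`d1`, `d2`) rise above documents found by only one retriever.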

How Embeddings Are Trained

Contrastive learning (most common for retrieval):

Positive pairs: (query, relevant document)
Negative pairs: (query, irrelevant document)
Loss: pull positives together, push negatives apart (InfoNCE / NT-Xent)
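
The contrastive objective above can be sketched in NumPy as an InfoNCE loss with in-batch negatives: row i of `queries` is paired with row i of `documents`, and every other document in the batch serves as a negative. This is a forward-pass illustration only (no gradients or training loop). The temperature value and the random toy data are assumptions for the example.

```python
import numpy as np

def info_nce_loss(queries, documents, temperature=0.05):
    """InfoNCE with in-batch negatives; pair i is (queries[i], documents[i])."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    d = documents / np.linalg.norm(documents, axis=1, keepdims=True)
    # Cosine similarity of every query against every document, scaled.
    logits = (q @ d.T) / temperature
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The correct document for query i sits on the diagonal.
    return float(-np.mean(np.diag(log_probs)))

rng = np.random.default_rng(0)
docs = rng.normal(size=(8, 16))
aligned_queries = docs + 0.01 * rng.normal(size=docs.shape)  # near their positives
random_queries = rng.normal(size=docs.shape)                 # unrelated to positives

print(info_nce_loss(aligned_queries, docs))  # small
print(info_nce_loss(random_queries, docs))   # larger
```

The loss is low when each query is closer to its own document than to the others in the batch, which is the "pull together, push apart" behaviour described above.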

Masked language modeling (BERT pretraining):

Predict masked tokens — forces contextual understanding

Knowledge distillation:

Train a smaller model to mimic a larger teacher’s embedding space
See Distilling Retrieval Pipelines to a Single Embedding Model
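
As a rough illustration of what "mimicking the teacher's embedding space" can mean, the sketch below compares the pairwise cosine-similarity matrices of student and teacher embeddings for the same batch. This similarity-matching loss is only one possible distillation objective, chosen here because it works when the two models have different dimensionalities. It is not the bag-of-documents method from the linked article, and the function names and toy data are hypothetical.

```python
import numpy as np

def normalize(x):
    """L2-normalize each row."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def similarity_distillation_loss(student, teacher):
    """Mean squared difference between pairwise cosine-similarity matrices."""
    s = normalize(student)
    t = normalize(teacher)
    return float(np.mean((s @ s.T - t @ t.T) ** 2))

rng = np.random.default_rng(0)
teacher = rng.normal(size=(6, 32))            # toy teacher embeddings, 32 dims

# A rotated copy of the teacher preserves all pairwise similarities.
rotation, _ = np.linalg.qr(rng.normal(size=(32, 32)))
faithful_student = teacher @ rotation

# An unrelated 8-dimensional student does not.
untrained_student = rng.normal(size=(6, 8))

print(similarity_distillation_loss(faithful_student, teacher))   # about 0
print(similarity_distillation_loss(untrained_student, teacher))  # clearly above 0
```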

Fine-tuning:

Start from a general-purpose model; continue training on domain pairs
See Embedding Fine-tuning

Dimensionality

Dims	Models	Notes
256–384	MiniLM, all-MiniLM	Fast; light memory
768	BERT-base, E5-base	Standard; good quality
1024	E5-large, BGE-large	Higher quality; slower
1536	OpenAI text-embedding-3-small	API-served
3072	OpenAI text-embedding-3-large	Highest quality API

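Matryoshka Embeddings (MRL) are trained so that a vector can be truncated to its leading dimensions after the fact, which lets the same model serve multiple dimensionalities. Document and query vectors must be truncated to the same size, and are usually re-normalized afterwards.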
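
Truncating a Matryoshka-style embedding amounts to keeping a prefix of the vector and re-normalizing it. The sketch below shows that step on random stand-in vectors. The helper name `truncate_embedding` is made up for this example, and the result is only meaningful for models actually trained with an MRL objective.

```python
import numpy as np

def truncate_embedding(vectors, dims):
    """Keep the first `dims` dimensions of each row and L2-normalize."""
    vectors = np.atleast_2d(np.asarray(vectors, dtype=np.float32))
    if not 0 < dims <= vectors.shape[1]:
        raise ValueError("dims must be between 1 and the full dimensionality")
    prefix = vectors[:, :dims]
    return prefix / np.linalg.norm(prefix, axis=1, keepdims=True)

rng = np.random.default_rng(0)
full = rng.normal(size=(3, 768)).astype(np.float32)  # stand-in for model output

small = truncate_embedding(full, 256)
print(small.shape)                    # (3, 256)
print(np.linalg.norm(small, axis=1))  # each row has unit length
```

The same `dims` value has to be applied to both indexed documents and incoming queries, otherwise the vectors are not comparable.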

Embedding Quality vs Cost

At scale, the economics of embedding inference matter:

Embedding generation is compute-bound (not memory-bandwidth-bound)
RTX 4090 offers more FLOPS per dollar than H100 for inference
Roughly $0.01 per 1M tokens is achievable with commodity hardware
See Why Are Embeddings So Cheap
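
The per-token cost follows from two quantities: what a GPU costs per hour and how many tokens per second it can embed. The sketch below shows the arithmetic only. The rental price and throughput passed in are hypothetical placeholders, not measurements from the linked article or any benchmark.

```python
def cost_per_million_tokens(gpu_dollars_per_hour, tokens_per_second):
    """Dollar cost to embed one million tokens on a single GPU."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_dollars_per_hour / (tokens_per_hour / 1_000_000)

# Hypothetical inputs: a $0.40/hour GPU sustaining 12,000 tokens/second.
example = cost_per_million_tokens(gpu_dollars_per_hour=0.40, tokens_per_second=12_000)
print(f"${example:.4f} per 1M tokens")
```

Because the cost scales inversely with throughput, anything that raises tokens per second on the same hardware (batching, shorter inputs, smaller models) lowers the price proportionally.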

Compression via Quantization

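Full-precision (float32) embeddings are memory-hungry: a 768-dimensional vector takes 3,072 bytes, so 100 million vectors need about 307 GB before any index overhead. Vector Quantization compresses them (ratios below are relative to float32):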

SQ8 (int8): 4× smaller, near-lossless
Binary / BBQ: 32× smaller, fast Hamming distance
Product Quantization: 32–64× smaller
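
The 4× and 32× ratios above can be checked with a small NumPy sketch. It uses a single global scale for int8 and plain sign-bit binarization. Both are deliberately simplified assumptions, and the binary step is not Elasticsearch's BBQ, which is more elaborate. The function names and random vectors are illustrative.

```python
import numpy as np

def quantize_int8(vectors):
    """Symmetric scalar quantization to int8 with one global scale."""
    scale = float(np.abs(vectors).max()) / 127.0
    codes = np.round(vectors / scale).astype(np.int8)
    return codes, scale

def quantize_binary(vectors):
    """Keep only the sign of each dimension, packed 8 dimensions per byte."""
    return np.packbits(vectors > 0, axis=1)

def hamming_distance(a, b):
    """Number of differing bits between two packed binary codes."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

rng = np.random.default_rng(0)
vectors = rng.normal(size=(1000, 768)).astype(np.float32)

codes, scale = quantize_int8(vectors)
bits = quantize_binary(vectors)

print(vectors.nbytes, codes.nbytes, bits.nbytes)  # 3072000 768000 96000
print(hamming_distance(bits[0], bits[1]))
```

Approximate float values can be recovered from the int8 codes as `codes * scale`; the binary codes keep only enough information for a coarse first-pass ranking, which is why they are often paired with a rescoring step.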

See BBQ for Elasticsearch’s implementation.

Related Concepts

Dense Embeddings — all dimensions active; semantic latent space
Sparse Embeddings — vocabulary-space; mostly zero
Dense Vector Retrieval — how dense embeddings are indexed and queried
Sparse Vector Retrieval — inverted index approach
Hybrid Search — combining dense + sparse
Bi-Encoder — architecture that produces embeddings
Embedding Fine-tuning — domain adaptation
Matryoshka Embeddings — flexible-dimension embeddings
Vector Quantization — compressing embeddings for scale
Task-Aware Embeddings — instruction-guided representations

Articles

Why Are Embeddings So Cheap — Piotr Mazurek; economics of embedding inference
Fine-Tuning Qwen3 Embeddings for Product Category Classification — LoRA fine-tuning on domain data
Qwen3 Embedding Series — state-of-the-art open-source embedding models
Distilling Retrieval Pipelines to a Single Embedding Model — Daniel Tunkelang; bag-of-documents training approach

People

Daniel Tunkelang — bag-of-documents embedding model
Piotr Mazurek — embedding economics
