Modality Gap

Definition

The modality gap is the observation that in jointly trained multimodal embedding spaces (e.g., CLIP), image embeddings and text embeddings occupy distinct, non-overlapping regions of the embedding space — even after joint training intended to align them.

Images cluster in one cone of the hypersphere, text in another. The gap between these cones is the modality gap.

Why It Occurs

Two contributing factors:

Initialization: Different encoder architectures (vision transformer vs. text transformer) start from different random initializations, pushing representations into different regions.
Contrastive Gap: Contrastive training (InfoNCE loss) pushes matched pairs together and mismatched pairs apart, but doesn’t force both modalities to occupy the same geometric region — just to be closer to their match than to random negatives.

Impact on Multimodal Search

In Multimodal Embeddings systems (image-text retrieval, cross-modal search):

Cosine similarity between a text query and an image is systematically lower than same-modality similarity
Nearest neighbor searches can be biased toward same-modality results
Uniform thresholds for “relevant” don’t work across modalities

Mitigations

Temperature scaling: rescale similarity scores per modality before combining
Modality-specific normalization: normalize scores within modality before cross-modal comparison
Fine-tuning with cross-modal negatives: explicitly train to close the gap
Modality tokens: some models add explicit modality indicators to pull representations closer

Multimodal Embeddings — the broader context where modality gap appears
Contrastive Gap — related phenomenon from contrastive training dynamics
Dense Embeddings — the vector space where the gap manifests
Vector Similarity Metrics — gap affects cosine similarity comparisons
Embedding Fine-tuning — one way to mitigate the gap

Awesome Search KG

Explorer

Modality Gap

Modality Gap

Definition

Why It Occurs

Impact on Multimodal Search

Mitigations

Graph View

Table of Contents

Backlinks

Awesome Search KG

Explorer

Modality Gap

Modality Gap

Definition

Why It Occurs

Impact on Multimodal Search

Mitigations

Related Concepts

Graph View

Table of Contents

Backlinks