T-SNE Explained — Math and Intuition

Author: Achinoam Soroker
Source: https://medium.com/swlh/t-sne-explained-math-and-intuition-94599ab164cf

Summary

Three-stage derivation of the t-SNE algorithm, building intuition before introducing math. Covers why simple projection fails (density differences), how joint probability distributions handle it, and why Student-t solves the crowding problem.

The Three Stages

Stage 1: Joint probability distribution in high-dim

Compute Euclidean distances between all pairs
Convert distances to conditional probabilities via Gaussian centered at xᵢ (perplexity controls σᵢ)
Symmetrize: pᵢⱼ = (p_{j|i} + p_{i|j}) / 2N

Stage 2: Random low-dim dataset

Initialize random points in target dimension (usually 2D or 3D)
Model their similarities using Student-t distribution (not Gaussian)
Why t-distribution: heavy tails prevent “crowding” — moderate high-dim distances become more extreme in low-dim, separating clusters

Stage 3: Gradient descent

Minimize KL divergence between high-dim (P) and low-dim (Q) distributions
Cost function: C = Σᵢⱼ pᵢⱼ log(pᵢⱼ / qᵢⱼ)
Convergence = a 2D/3D layout where neighborhood structure matches the original

Perplexity

Controls σᵢ per point — a target number of effective neighbors. Robust between 5–50. Points in dense regions get smaller σᵢ (narrower Gaussian); sparse regions get larger σᵢ.

Key Limitation

t-SNE is non-parametric: no reusable transform. Cannot project new points — must rerun on the full dataset. This disqualifies it for retrieval/search use.

t-SNE — concept note
Dimensionality Reduction — parent concept
PCA — linear alternative that is parametric

t-SNE Explained - Visualising High-Dimensional Data — step-by-step numerical walkthrough
t-SNE Clearly Explained — includes CNN feature map application
PCA vs t-SNE vs UMAP - Visualizing the Invisible — comparative overview

People

Achinoam Soroker
Laurens van der Maaten — t-SNE original author
Geoffrey Hinton — t-SNE co-author

Awesome Search KG

Explorer

T-SNE Explained — Math and Intuition

T-SNE Explained — Math and Intuition

Summary

The Three Stages

Stage 1: Joint probability distribution in high-dim

Stage 2: Random low-dim dataset

Stage 3: Gradient descent

Perplexity

Key Limitation

People

Graph View

Table of Contents

Backlinks

Awesome Search KG

Explorer

T-SNE Explained — Math and Intuition

T-SNE Explained — Math and Intuition

Summary

The Three Stages

Stage 1: Joint probability distribution in high-dim

Stage 2: Random low-dim dataset

Stage 3: Gradient descent

Perplexity

Key Limitation

Related Concepts

Related Articles

People

Graph View

Table of Contents

Backlinks