t-SNE Explained: Visualising High-Dimensional Data

Author: Billy Chan
Source: https://medium.com/data-science-explained/t-sne-explained-visualising-high-dimensional-data-4f556041a80e

Summary

A highly detailed step-by-step walkthrough of the t-SNE algorithm that works through a toy 5-point dataset numerically at every stage. It explains the “why” behind each design decision: why probabilities rather than distances, why perplexity, and why a Student-t distribution in low-D.

Why Probabilities Over Distances

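Raw distances aren’t comparable across dense vs sparse regions. Converting to probabilities normalizes locally: each point’s similarities are divided by the sum over its own neighbours, and each point gets its own bandwidth σᵢ. As a result, a distance of 0.5 in a dense cluster typically gets a lower probability than the same distance in a sparse region, where 0.5 may be the nearest neighbour. This lets t-SNE give every point a comparable effective number of neighbours, regardless of local density.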
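The effect of local normalization can be illustrated with a minimal sketch. The two distance lists and both σ values below are hypothetical, picked by hand for illustration (in t-SNE itself, σᵢ comes from the perplexity search described in the next section).

```python
import numpy as np

def conditional_probs(dists, sigma):
    """p_{j|i} for one point i, given its distances to the other points."""
    scores = np.exp(-dists**2 / (2.0 * sigma**2))  # Gaussian similarity scores
    return scores / scores.sum()                   # normalise over i's neighbours

# Hypothetical point A in a dense cluster: three neighbours closer than 0.5.
dense = np.array([0.1, 0.2, 0.3, 0.5])
# Hypothetical point B in a sparse region: 0.5 is its nearest neighbour.
sparse = np.array([0.5, 2.0, 3.0, 4.0])

# Hand-picked bandwidths (assumption): narrow in the dense region, wide in the sparse one.
p_dense = conditional_probs(dense, sigma=0.3)
p_sparse = conditional_probs(sparse, sigma=1.0)

print("distance 0.5 seen from A:", p_dense[3])
print("distance 0.5 seen from B:", p_sparse[0])
```

The same raw distance of 0.5 receives a much larger probability from B than from A, because B has no closer neighbours competing for probability mass.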

Step-by-Step Algorithm

Pick point i; compute Euclidean distances to all j’s
Convert distances to Gaussian similarity scores (RBF kernel)
Normalize scores → conditional probabilities p_{j|i}
Choose σᵢ via perplexity: binary search for the σᵢ whose conditional distribution has entropy (in bits) equal to log₂(perplexity); this fixes the bandwidth used in the two steps above
Collect all conditional probabilities → asymmetric matrix
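Symmetrize: pᵢⱼ = (p_{j|i} + p_{i|j}) / (2N), so the joint probabilities sum to 1. For 5 points there are 10 unique pairs; each value appears twice in the symmetric matrix (as pᵢⱼ and pⱼᵢ), and it is the 20 off-diagonal entries together that sum to 1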
Initialize random low-dim positions yᵢ
Compute low-dim similarities using Cauchy distribution (Student-t with 1 degree of freedom)
Minimize KL divergence via gradient descent
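The steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the article's code: the 5-point dataset, the search bounds, the learning rate, and all function names are hypothetical, and it uses plain gradient descent without momentum or early exaggeration.

```python
import numpy as np

def cond_probs_row(sq_dists, sigma):
    """Steps 2-3: Gaussian scores for one point i, normalised to p_{j|i}."""
    # Subtracting the minimum is only for numerical stability; it cancels out.
    scores = np.exp(-(sq_dists - sq_dists.min()) / (2.0 * sigma**2))
    return scores / scores.sum()

def find_sigma(sq_dists, perplexity, n_steps=60):
    """Step 4: binary search for the sigma whose entropy is log2(perplexity)."""
    lo, hi = 1e-3, 1e3  # assumed search bounds, wide enough for the toy data
    target = np.log2(perplexity)
    for _ in range(n_steps):
        sigma = 0.5 * (lo + hi)
        p = cond_probs_row(sq_dists, sigma)
        entropy = -np.sum(p * np.log2(p + 1e-12))
        if entropy > target:
            hi = sigma  # distribution too flat: shrink sigma
        else:
            lo = sigma  # distribution too peaked: grow sigma
    return sigma

def joint_probs(X, perplexity):
    """Steps 1-6: pairwise distances -> symmetric joint matrix P."""
    n = X.shape[0]
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    P_cond = np.zeros((n, n))
    for i in range(n):
        others = np.arange(n) != i  # exclude the point itself
        sigma = find_sigma(sq[i, others], perplexity)
        P_cond[i, others] = cond_probs_row(sq[i, others], sigma)
    return (P_cond + P_cond.T) / (2.0 * n)

def low_dim_similarities(Y):
    """Step 8: Student-t (1 d.o.f.) kernel, normalised over all pairs."""
    sq = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    w = 1.0 / (1.0 + sq)
    np.fill_diagonal(w, 0.0)
    return w / w.sum(), w

def tsne(X, perplexity=2.0, n_iter=500, lr=1.0, seed=0):
    rng = np.random.default_rng(seed)
    P = joint_probs(X, perplexity)
    Y = rng.normal(scale=1e-2, size=(X.shape[0], 2))  # step 7: random start
    for _ in range(n_iter):                           # step 9: gradient descent
        Q, w = low_dim_similarities(Y)
        # Gradient of KL(P || Q): 4 * sum_j (p_ij - q_ij) * w_ij * (y_i - y_j)
        coeff = (P - Q) * w
        grad = 4.0 * (coeff.sum(axis=1)[:, None] * Y - coeff @ Y)
        Y -= lr * grad
    return Y, P

# Hypothetical 5-point dataset in 3-D: a group of three and a group of two.
X = np.array([[0, 0, 0], [1, 0, 0], [0, 2, 0], [5, 5, 5], [6, 5, 4]], dtype=float)
Y, P = tsne(X)
print(np.round(P, 3))  # symmetric, zero diagonal, entries sum to 1
print(np.round(Y, 2))  # 2-D positions
```

With only 5 points the perplexity has to stay well below 4 (the number of neighbours per point), which is why the sketch defaults to 2.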

Why Student-t (Cauchy) in Low-D

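In high-D, distances concentrate: most points are roughly equidistant, and a point can have many moderately distant neighbours. A 2-D or 3-D map has too little room to place them all at faithful distances. Gaussian tails drop too fast in low-D, so matching those moderate similarities would pull the points into a crowded lump in the middle of the map (the crowding problem). The heavy tail of the Student-t assigns the same similarity at a much larger map distance, which keeps distant points meaningfully apart in probability terms.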
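The difference in tail behaviour is easy to check numerically. The sketch below compares the two unnormalised kernels at a few hand-picked distances, and then asks how far apart two map points must be for each kernel to reach a similarity of 0.01 (an arbitrary value chosen for illustration).

```python
import numpy as np

d = np.array([0.5, 1.0, 2.0, 4.0, 8.0])  # hand-picked low-D distances
gaussian = np.exp(-d**2)                 # unnormalised Gaussian kernel
student_t = 1.0 / (1.0 + d**2)           # unnormalised Student-t kernel, 1 d.o.f.

for dist, g, t in zip(d, gaussian, student_t):
    print(f"d={dist:4.1f}  gaussian={g:.2e}  student_t={t:.2e}")

# Map distance at which each kernel equals a target similarity of 0.01.
target = 0.01
d_gaussian = np.sqrt(-np.log(target))      # solve exp(-d^2) = target
d_student_t = np.sqrt(1.0 / target - 1.0)  # solve 1 / (1 + d^2) = target
print(d_gaussian, d_student_t)
```

The Student-t kernel reaches the same similarity at a map distance several times larger than the Gaussian does, which is the extra room that relieves crowding.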

Key Warning

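Between-cluster spacing in t-SNE plots is arbitrary: clusters can be at any distance from each other. Only within-cluster structure is meaningful. The KL divergence asymmetry means t-SNE cares much more about preserving close neighbors than about placing distant clusters correctly: a large pᵢⱼ matched with a small low-dim similarity qᵢⱼ is penalized heavily, while a small pᵢⱼ matched with a large qᵢⱼ costs almost nothing.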
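The asymmetry can be seen by comparing single terms of KL(P‖Q) = Σ pᵢⱼ log(pᵢⱼ / qᵢⱼ). The probability values below are hypothetical, chosen only to contrast the two kinds of mistake.

```python
import numpy as np

def kl_term(p, q):
    """Contribution of one pair to KL(P || Q)."""
    return p * np.log(p / q)

# True neighbours (large p) placed far apart in the map (small q).
neighbours_split = kl_term(0.2, 0.001)
# Distant points (small p) placed close together in the map (large q).
strangers_merged = kl_term(0.001, 0.2)

print(neighbours_split)  # large positive penalty
print(strangers_merged)  # tiny in magnitude (a single term can even be negative)
```

Individual terms may be negative, but the full sum over all pairs is never negative. The first kind of error dominates the cost, so the optimizer has little incentive to get distances between far-apart clusters right.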

Related Concepts

t-SNE — concept note
Dimensionality Reduction — parent concept

Related Articles

t-SNE Explained - Math and Intuition — complementary derivation, same algorithm
t-SNE Clearly Explained — includes early exaggeration/compression tricks
PCA vs t-SNE vs UMAP - Visualizing the Invisible — comparison table

People

Billy Chan
Laurens van der Maaten — t-SNE original author
