PCA vs t-SNE vs UMAP: Visualizing the Invisible in Your Data

Author: Lakhan Bukkawar
Source: https://medium.com/@laakhanbukkawar/pca-vs-t-sne-vs-umap-visualizing-the-invisible-in-your-data-92cb2baebdbb

Summary

Side-by-side comparison of PCA, t-SNE, and UMAP with Python implementations on the Iris dataset. PCA maximizes variance through linear projection; t-SNE preserves local neighborhoods for cluster visualization; UMAP balances local and global structure with better speed and scalability than t-SNE.
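As a rough illustration of the kind of Iris comparison described above, the sketch below runs PCA and t-SNE with scikit-learn. It is not the article's own code: the `perplexity` and `random_state` values are assumptions chosen for illustration, and UMAP is left out because it lives in the separate umap-learn package.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Load Iris: 150 samples, 4 features, 3 species.
iris = load_iris()
X = StandardScaler().fit_transform(iris.data)
y = iris.target

# Linear projection onto the two directions of highest variance.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

# Non-linear embedding; perplexity and random_state are illustrative choices.
tsne = TSNE(n_components=2, perplexity=30, random_state=42)
X_tsne = tsne.fit_transform(X)

print("PCA explained variance ratio:", pca.explained_variance_ratio_)
print("PCA shape:", X_pca.shape, "t-SNE shape:", X_tsne.shape)
```

Either 2-D array can then be passed to a scatter plot colored by `y` to compare how the three species separate.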

Technique Summaries

PCA: Standardize → covariance matrix → eigenvectors → project onto the top-k eigenvectors (the principal components). Linear, deterministic, and fast. Preserves global variance structure. Suitable for ML preprocessing.
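The four PCA steps can be sketched directly in NumPy. This is a minimal version for illustration, assuming no feature is constant (otherwise the standardization would divide by zero). The function name `pca_project` and the toy data are hypothetical.

```python
import numpy as np

def pca_project(X, k):
    """Project X (n_samples, n_features) onto its top-k principal components."""
    # 1. Standardize: zero mean, unit variance per feature.
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    # 2. Covariance matrix of the standardized features.
    cov = np.cov(X_std, rowvar=False)
    # 3. Eigendecomposition; eigh suits symmetric matrices. Sort the
    #    eigenpairs so the largest eigenvalue comes first.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # 4. Project onto the k eigenvectors with the largest eigenvalues.
    components = eigvecs[:, :k]
    explained_ratio = eigvals[:k] / eigvals.sum()
    return X_std @ components, explained_ratio

# Toy data (an assumption for the demo): 4 features, two of them correlated.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
X[:, 1] = 2.0 * X[:, 0] + 0.1 * X[:, 1]
Z, explained = pca_project(X, k=2)
print(Z.shape, explained)
```

Because the projection is a fixed matrix multiplication, the same `components` can be applied to new rows, which is why PCA fits naturally into a preprocessing pipeline.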

t-SNE: Models high-dimensional pairwise similarities as Gaussian probabilities and low-dimensional ones with a Student-t distribution, then minimizes the KL divergence between the two. Non-linear and non-parametric (it cannot project new points). Best used for visualization only.
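The t-SNE cost can be written out in a few lines of NumPy. The sketch below is a simplification: it uses a single global `sigma` (an assumption), whereas real t-SNE tunes a bandwidth per point from the perplexity and symmetrizes conditional probabilities. It only evaluates the objective for a given layout and does not run the gradient descent that moves the points. All function names are hypothetical.

```python
import numpy as np

def pairwise_sq_dists(A):
    """Squared Euclidean distance between every pair of rows of A."""
    sq = np.sum(A**2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * A @ A.T
    return np.maximum(D, 0.0)  # clip tiny negatives caused by rounding

def gaussian_affinities(X, sigma=1.0):
    """High-dimensional similarities P: Gaussian kernel, normalized to sum to 1."""
    P = np.exp(-pairwise_sq_dists(X) / (2.0 * sigma**2))
    np.fill_diagonal(P, 0.0)
    return P / P.sum()

def student_t_affinities(Y):
    """Low-dimensional similarities Q: Student-t kernel, one degree of freedom."""
    Q = 1.0 / (1.0 + pairwise_sq_dists(Y))
    np.fill_diagonal(Q, 0.0)
    return Q / Q.sum()

def kl_divergence(P, Q):
    """KL(P || Q), the cost t-SNE minimizes over the low-dimensional positions."""
    mask = P > 0  # terms with P == 0 contribute nothing
    return float(np.sum(P[mask] * np.log(P[mask] / Q[mask])))

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))   # toy high-dimensional data
Y = rng.normal(size=(30, 2))   # random initial 2-D layout
P = gaussian_affinities(X)
Q = student_t_affinities(Y)
cost = kl_divergence(P, Q)
print(f"KL(P || Q) for a random layout: {cost:.4f}")
```

The heavy tails of the Student-t kernel let moderately distant points sit far apart in the embedding, which is part of why t-SNE plots show well-separated clusters.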

UMAP: Builds a fuzzy neighbor graph in the high-dimensional space, then optimizes a low-dimensional layout to match it. Preserves local and some global structure. Fast and scalable, and a fitted model can project new points. Generally a better fit than t-SNE for large datasets.
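To make the "fuzzy graph" step more concrete, here is a simplified NumPy sketch of how such a graph can be built. It is not the umap-learn implementation: it uses exact brute-force distances, a dense matrix, and a plain binary search, and it omits the layout optimization entirely. The function name `fuzzy_graph` and its parameters are hypothetical.

```python
import numpy as np

def fuzzy_graph(X, k=5, n_iter=64):
    """Symmetric fuzzy k-nearest-neighbor graph as a dense (n, n) weight matrix."""
    n = X.shape[0]
    # Exact pairwise Euclidean distances (only practical for small n).
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff**2).sum(axis=2))
    np.fill_diagonal(D, np.inf)            # a point is not its own neighbor
    nbrs = np.argsort(D, axis=1)[:, :k]    # indices of the k nearest neighbors
    A = np.zeros((n, n))
    target = np.log2(k)
    for i in range(n):
        d = D[i, nbrs[i]]
        rho = d[0]                         # distance to the nearest neighbor
        shifted = np.maximum(d - rho, 0.0)
        # Binary search for sigma so the membership weights sum to log2(k).
        lo, hi = 1e-8, 1e4
        for _ in range(n_iter):
            sigma = 0.5 * (lo + hi)
            if np.exp(-shifted / sigma).sum() > target:
                hi = sigma                 # weights too large: shrink sigma
            else:
                lo = sigma
        A[i, nbrs[i]] = np.exp(-shifted / sigma)
    # Fuzzy union of the two directed edges between each pair: a + b - a*b.
    return A + A.T - A * A.T

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))               # toy data, an assumption for the demo
W = fuzzy_graph(X, k=5)
print(W.shape, int((W > 0).sum()), "nonzero edge weights")
```

Each point's nearest neighbor always gets weight 1, so no point is left disconnected. The low-dimensional layout is then optimized so that its own edge weights resemble `W`.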

Decision Framework

Need dimensionality reduction?
├── Modeling/preprocessing → PCA or UMAP
│   └── Interpretability needed → PCA
└── Visualization?
    ├── Small dataset → t-SNE
    └── Large + noisy → UMAP
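
The tree above can also be read as a small lookup function. The sketch below is only a restatement of the branches. The function name and the boolean flags are hypothetical, and "large + noisy" is collapsed into a single flag for simplicity.

```python
def choose_method(goal, needs_interpretability=False, large_or_noisy=False):
    """Return the technique(s) suggested by the decision tree.

    goal: "modeling" (includes preprocessing) or "visualization".
    """
    if goal == "modeling":
        return ["PCA"] if needs_interpretability else ["PCA", "UMAP"]
    if goal == "visualization":
        return ["UMAP"] if large_or_noisy else ["t-SNE"]
    raise ValueError(f"unknown goal: {goal!r}")

print(choose_method("modeling", needs_interpretability=True))  # ['PCA']
print(choose_method("visualization", large_or_noisy=True))     # ['UMAP']
```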

Comparison Table

Feature             | PCA    | t-SNE      | UMAP
Type                | Linear | Non-linear | Non-linear
Preserves           | Global | Local      | Local + some global
Speed               | Fast   | Slow       | Fast
Use in ML models    | Yes    | No         | Yes
Clustering friendly | No     | Yes        | Yes
Inverse transform   | Yes    | No         | No
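
The "Use in ML models" and "Inverse transform" rows for PCA and t-SNE can be checked against scikit-learn. The sketch below is illustrative only: the 120/30 split of Iris is an arbitrary assumption, and UMAP is not covered because it lives in the separate umap-learn package.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X = load_iris().data
X_train, X_new = X[:120], X[120:]

# PCA learns a reusable linear map: it can project unseen rows and map back.
pca = PCA(n_components=2).fit(X_train)
Z_new = pca.transform(X_new)
X_back = pca.inverse_transform(Z_new)   # approximate reconstruction
print("reconstruction RMSE:", np.sqrt(np.mean((X_new - X_back) ** 2)))

# scikit-learn's TSNE only embeds the data it was fitted on.
print("TSNE has transform:", hasattr(TSNE, "transform"))
print("TSNE has inverse_transform:", hasattr(TSNE, "inverse_transform"))
```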

People