Understanding Principal Component Analysis (PCA)
Author: Roshmita Dey
Source: https://medium.com/@roshmitadey/understanding-principal-component-analysis-pca-d4bb40e12d33
Summary
Comprehensive guide covering why dimensionality reduction is needed, the mathematics of PCA, and applications across domains. Strong on motivating the curse of dimensionality before presenting the algorithm.
Why Dimensionality Reduction
Four problems caused by high dimensionality:
- Computational complexity — compute and memory costs grow with the number of features, and the amount of data needed to cover the feature space grows exponentially
- Overfitting — models fit noise in high-dim space, poor generalization
- Visualization — data with more than three dimensions cannot be plotted directly
- Redundancy — correlated features carry overlapping information, so some can be dropped with little loss
PCA Steps
- Standardize each feature (mean=0, std=1)
- Compute covariance matrix
- Compute eigenvalues and eigenvectors
- Sort by eigenvalue descending (first PC = most variance explained)
- Select top-k eigenvectors (transformation matrix)
- Project data into reduced space
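The steps above can be sketched in NumPy as follows. This is a minimal illustration rather than the article's own code: the function name `pca`, the synthetic data, and the choice of `k=2` are assumptions made for the example, and it assumes no feature is constant (a zero standard deviation would cause a division by zero).

```python
import numpy as np

def pca(X, k):
    """Project X (n_samples x n_features) onto its top-k principal components."""
    # 1. Standardize each feature to mean 0 and standard deviation 1.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # 2. Covariance matrix of the standardized features (columns are variables).
    cov = np.cov(Z, rowvar=False)
    # 3. Eigen-decomposition; eigh is meant for symmetric matrices
    #    and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # 4. Sort by eigenvalue, descending.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # 5. Keep the top-k eigenvectors as the transformation matrix.
    W = eigvecs[:, :k]
    # 6. Project the standardized data into the reduced space.
    return Z @ W, eigvals, W

# Hypothetical data: 200 samples, 3 features, the third nearly a copy of the first.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[:, 2] = 2 * X[:, 0] + 0.1 * rng.normal(size=200)

scores, eigvals, W = pca(X, k=2)
print(scores.shape)             # shape of the reduced data
print(eigvals / eigvals.sum())  # share of variance per component
```

Because one feature is almost redundant, the smallest eigenvalue should come out close to zero, which is the situation in which dropping a component costs little.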
Eigenvalue Interpretation
- Eigenvalue λ = variance explained by that principal component; always non-negative because the covariance matrix is positive semi-definite
- Eigenvector = direction of that component in feature space; normalized to length 1
- First PC has largest eigenvalue; PC directions are mutually orthogonal, so the projected components are uncorrelated by construction
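These properties can be checked numerically. The sketch below is an illustration on made-up data (the mixing matrix `A` and the sample sizes are assumptions). It only centers the data rather than standardizing it, which is enough to show the eigenvalue and orthogonality properties.

```python
import numpy as np

# Hypothetical data: 500 samples of 4 correlated features.
rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))          # random mixing matrix to induce correlation
X = rng.normal(size=(500, 4)) @ A
Xc = X - X.mean(axis=0)              # center each feature

cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)               # ascending order
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # flip to descending

scores = Xc @ eigvecs                        # data expressed in the PC basis
explained_ratio = eigvals / eigvals.sum()    # fraction of variance per PC
score_cov = np.cov(scores, rowvar=False)     # covariance of the projected data

print(np.linalg.norm(eigvecs, axis=0))  # length of each eigenvector
print(explained_ratio)
print(np.round(score_cov, 6))           # off-diagonal entries should be ~0
```

The covariance of the projected data should be diagonal, with the eigenvalues on the diagonal: each eigenvalue is the variance along its component, and the zero off-diagonals are what "uncorrelated" means here.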
Applications
- Image compression and face recognition
- Bioinformatics (gene expression dimensionality)
- Recommendation systems (user-item matrix compression)
- Finance (trend identification in price data)
Related Concepts
- PCA — concept note
- Dimensionality Reduction — parent concept
- Embeddings — the vectors being compressed
Related Articles
- Principal Component Analysis (PCA) In Depth — algorithm walkthrough with worked numerical example
- PCA vs t-SNE vs UMAP - Visualizing the Invisible — comparison with non-linear methods