PCA Dimensionality Reduction – Interactive 3D→2D Projection
Watch Principal Component Analysis find the directions of maximum variance in 3D data and project it onto a 2D plane. Generate clustered data, rotate the 3D view, and observe how PCA preserves the most important structure while discarding less informative dimensions.
Technical implementation:
Real eigendecomposition using power iteration method (no NumPy/SciPy).
Covariance matrix computation: C = (1/(n-1))·XᵀX where X is mean-centered.
Variance explained calculation: λᵢ/Σλⱼ for each eigenvalue.
Orthogonal projection: X_reduced = X_centered · W where W contains the top-k eigenvectors (see the sketch after this list).
Xavier initialization for stable convergence in power iteration.
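For concreteness, here is a minimal pure-Python sketch of the centering, covariance, variance-ratio, and projection steps above; the function names are illustrative, not the demo's actual identifiers.

    # Minimal sketch of the math above: pure Python, matrices stored as lists of lists.
    def mean_center(X):
        """Subtract each column's mean so the data is centered at the origin."""
        n, d = len(X), len(X[0])
        means = [sum(row[j] for row in X) / n for j in range(d)]
        return [[row[j] - means[j] for j in range(d)] for row in X]

    def covariance_matrix(Xc):
        """C = (1/(n-1)) * Xc^T Xc for mean-centered Xc."""
        n, d = len(Xc), len(Xc[0])
        return [[sum(Xc[i][a] * Xc[i][b] for i in range(n)) / (n - 1)
                 for b in range(d)] for a in range(d)]

    def explained_variance_ratio(eigenvalues):
        """lambda_i / sum(lambda_j) for each eigenvalue."""
        total = sum(eigenvalues)
        return [lam / total for lam in eigenvalues]

    def project(Xc, W):
        """X_reduced = Xc . W, where W's columns are the top-k eigenvectors."""
        d, k = len(W), len(W[0])
        return [[sum(row[j] * W[j][c] for j in range(d)) for c in range(k)] for row in Xc]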
Mathematical foundations: PCA solves the optimization problem: find orthogonal directions that maximize variance. The solution is the eigenvectors of the covariance matrix, ordered by eigenvalue magnitude. The first PC captures the most variance, the second captures the second-most (orthogonal to the first), and so on.
Why power iteration: Instead of using a black-box eigendecomposition library, this implements the power iteration algorithm:
v ← random vector
repeat until convergence:
    v ← A·v / ‖A·v‖
v converges to the dominant eigenvector of A.
Then uses deflation (A' = A - λvvᵀ) to find subsequent eigenvectors.
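A runnable sketch of that recipe for a symmetric covariance matrix stored as a list of lists. Note it starts from a plain Gaussian random vector; the demo's Xavier-style initialization is not reproduced here.

    import random

    def power_iteration(A, iters=200, tol=1e-10):
        """Return (eigenvalue, eigenvector) for the dominant eigenpair of symmetric A."""
        d = len(A)
        v = [random.gauss(0, 1) for _ in range(d)]             # plain random start vector
        for _ in range(iters):
            Av = [sum(A[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = sum(x * x for x in Av) ** 0.5
            v_new = [x / norm for x in Av]                      # v <- A.v / ||A.v||
            if sum((a - b) ** 2 for a, b in zip(v_new, v)) ** 0.5 < tol:
                v = v_new
                break
            v = v_new
        Av = [sum(A[i][j] * v[j] for j in range(d)) for i in range(d)]
        lam = sum(v[i] * Av[i] for i in range(d))               # Rayleigh quotient (v is unit length)
        return lam, v

    def top_k_eigenpairs(A, k):
        """Deflate A' = A - lambda * v v^T to pull out eigenvectors one by one."""
        A = [row[:] for row in A]
        d = len(A)
        pairs = []
        for _ in range(k):
            lam, v = power_iteration(A)
            pairs.append((lam, v))
            A = [[A[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
        return pairs  # for a covariance matrix, these come out in descending eigenvalue order

Because a covariance matrix is positive semi-definite, its largest-magnitude eigenvalue is also its largest, so repeated deflation yields the principal components already sorted from most to least variance.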
Interactive features:
3D rotation controls to explore data from different angles.
Adjustable parameters: number of points, number of clusters, and spread σ (see the data-generation sketch after this list).
Side-by-side comparison: original 3D vs PCA 2D projection.
Variance bars: visual breakdown of information retention per component.
Auto-regeneration: sliders update data in real-time.
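For reference, clustered data of the kind the sliders control could be generated roughly like this (a hypothetical sketch; the parameter names and cluster placement are assumptions, not the demo's actual code):

    import random

    def make_clusters(n_points=300, n_clusters=3, sigma=0.5, dim=3, seed=None):
        """Isotropic Gaussian blobs around randomly placed cluster centers."""
        rng = random.Random(seed)
        centers = [[rng.uniform(-3, 3) for _ in range(dim)] for _ in range(n_clusters)]
        data = []
        for i in range(n_points):
            center = centers[i % n_clusters]                    # round-robin over clusters
            data.append([rng.gauss(mu, sigma) for mu in center])
        return data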
Educational value:
Why 94% variance retained: The bars show exactly how much information each PC preserves.
Curse of dimensionality: Generate high-dimensional data to see why PCA matters for visualization.
When PCA fails: Try highly nonlinear manifolds (e.g., Swiss roll) to understand PCA's limitations.
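To experiment with that failure case, a Swiss roll can be generated with the standard parametrization below (an illustration only; the demo may build it differently):

    import math
    import random

    def swiss_roll(n=500, noise=0.05, seed=None):
        """Points on a rolled-up 2D sheet embedded in 3D."""
        rng = random.Random(seed)
        points = []
        for _ in range(n):
            t = rng.uniform(1.5 * math.pi, 4.5 * math.pi)       # position along the spiral
            h = rng.uniform(0.0, 10.0)                          # position across the sheet
            points.append([t * math.cos(t) + rng.gauss(0, noise),
                           h,
                           t * math.sin(t) + rng.gauss(0, noise)])
        return points

Projecting this onto the top two PCs flattens the spiral, so points far apart along the manifold can land close together in 2D; that overlap is the limitation to look for.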
Key insight displayed: The visualization makes it immediately obvious that PCA finds the "best" 2D plane from which to view the 3D data; it is the plane on which the projected points are most spread out. This is exactly what "maximum variance" means geometrically.
Comparison to t-SNE/UMAP: Unlike t-SNE or UMAP (which preserve local structure), PCA is a linear method that preserves global variance. The trade-off:
PCA: Interpretable axes (each PC is a linear combination of original features).
t-SNE/UMAP: Better at preserving cluster structure but no interpretable axes.
When to use in practice:
Exploratory data analysis: Quick check for linear structure in high-D data.
Feature extraction: Use top PCs as input to downstream models.
Data compression: Keep, e.g., 95% of the variance with 10% of the dimensions (see the sketch after this list).
Noise reduction: Discard low-variance PCs (often noise).
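A small sketch of the compression recipe: pick the smallest k whose leading eigenvalues reach a target variance fraction (assumes eigenvalues sorted in descending order, e.g., from the earlier top_k_eigenpairs sketch):

    def components_for_variance(eigenvalues, target=0.95):
        """Smallest k such that the top-k eigenvalues cover `target` of total variance."""
        total = sum(eigenvalues)
        cumulative = 0.0
        for k, lam in enumerate(eigenvalues, start=1):
            cumulative += lam
            if cumulative / total >= target:
                return k
        return len(eigenvalues)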
Implementation details worth noting:
Attention to numerical stability (e.g., a log-sum-exp trick for softmax, if the demo were extended in that direction).
Proper mean-centering before covariance computation.
Normalized eigenvectors (unit length).
Sorted by eigenvalue magnitude (descending).
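The eigenvector properties above are easy to sanity-check; the assertions below (a hypothetical test, not part of the demo) verify descending eigenvalue order, unit length, and approximate orthogonality for deflation-based eigenpairs:

    def check_eigenpairs(pairs, tol=1e-6):
        """Assert descending eigenvalues, unit-length and near-orthogonal eigenvectors."""
        eigenvalues = [lam for lam, _ in pairs]
        assert all(a >= b - tol for a, b in zip(eigenvalues, eigenvalues[1:]))  # descending
        for i, (_, v) in enumerate(pairs):
            norm = sum(x * x for x in v) ** 0.5
            assert abs(norm - 1.0) < tol                        # unit length
            for _, w in pairs[i + 1:]:
                dot = sum(a * b for a, b in zip(v, w))
                assert abs(dot) < 1e-3                          # approximately orthogonal (deflation is inexact)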