Gradient Descent Visualizer – Ball Rolling on a Loss Landscape
Watch optimization algorithms in action as a ball rolls down a 2D loss surface, following gradient vectors toward local minima. Compare vanilla gradient descent against momentum-based methods and see why modern optimizers work better on difficult landscapes.
What makes this stand out:
Real gradient computation – analytical derivatives for each loss function (no numerical approximations).
4 optimizer implementations from scratch – Vanilla GD, Momentum, Nesterov Momentum, and Adam (with bias correction).
4 loss functions, from benign to pathological – Rosenbrock (narrow curved valley), saddle point, Rastrigin (many local minima), and quadratic bowl.
Live visualization – ball position, gradient arrow, optimization trail, and synchronized loss curve.
Technical implementation:
Closed-form gradients for all functions (e.g., Rosenbrock: ∇f = [-2(1-x) - 4bx(y-x²), 2b(y-x²)]); a sketch follows this list.
Marching squares algorithm for smooth contour line generation (see the contour sketch after this list).
Customizable color maps (Turbo, Viridis, Warm-Cool) with gamma correction.
Finite-value clamping to prevent NaN explosions in unstable regions (shown alongside the gradient sketch below).
Log-scale loss plotting with automatic tick placement.
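For concreteness, here is a minimal TypeScript sketch of how the analytical Rosenbrock gradient and the finite-value clamping might fit together. The names (`rosenbrock`, `rosenbrockGrad`, `clampFinite`) and the clamp limit are illustrative assumptions, not the project's actual identifiers.

```ts
// Rosenbrock: f(x, y) = (1 - x)^2 + b * (y - x^2)^2, global minimum at (1, 1).
const b = 100; // standard valley-steepness coefficient

function rosenbrock(x: number, y: number): number {
  return (1 - x) ** 2 + b * (y - x * x) ** 2;
}

// Closed-form gradient, matching the formula quoted above.
function rosenbrockGrad(x: number, y: number): [number, number] {
  return [
    -2 * (1 - x) - 4 * b * x * (y - x * x), // ∂f/∂x
    2 * b * (y - x * x),                    // ∂f/∂y
  ];
}

// Finite-value clamping: non-finite values are reset and magnitudes are
// bounded, so one oversized step cannot poison the render loop with NaNs.
function clampFinite(v: number, limit = 1e6): number {
  if (!Number.isFinite(v)) return 0;
  return Math.min(limit, Math.max(-limit, v));
}
```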
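And a compact sketch of marching squares for a single contour level. The grid layout and names are assumed for illustration, and the two ambiguous saddle cases (5 and 10) get a fixed default resolution rather than value-based disambiguation.

```ts
type Pt = [number, number];
type Seg = [Pt, Pt];

// Extract contour line segments for one iso-level from a scalar grid.
// grid[j][i] is the field value at grid coordinates (x = i, y = j).
function marchingSquares(grid: number[][], level: number): Seg[] {
  const segs: Seg[] = [];
  // Linear interpolation parameter for the crossing point on an edge.
  const t = (a: number, c: number) => (level - a) / (c - a);

  for (let j = 0; j < grid.length - 1; j++) {
    for (let i = 0; i < grid[j].length - 1; i++) {
      const tl = grid[j][i],     tr = grid[j][i + 1];
      const bl = grid[j + 1][i], br = grid[j + 1][i + 1];
      // 4-bit case index: 1 where the corner is above the level.
      const idx = (Number(tl > level) << 3) | (Number(tr > level) << 2) |
                  (Number(br > level) << 1) |  Number(bl > level);
      if (idx === 0 || idx === 15) continue; // cell fully above/below

      // Crossing points on the four cell edges (points on uncrossed
      // edges may be non-finite, but they are never emitted).
      const top:    Pt = [i + t(tl, tr), j];
      const right:  Pt = [i + 1, j + t(tr, br)];
      const bottom: Pt = [i + t(bl, br), j + 1];
      const left:   Pt = [i, j + t(tl, bl)];

      const table: Record<number, Seg[]> = {
        1: [[left, bottom]],  2: [[bottom, right]], 3: [[left, right]],
        4: [[top, right]],    5: [[left, top], [bottom, right]],
        6: [[top, bottom]],   7: [[left, top]],
        8: [[left, top]],     9: [[top, bottom]],
        10: [[left, bottom], [top, right]], 11: [[top, right]],
        12: [[left, right]],  13: [[bottom, right]], 14: [[left, bottom]],
      };
      segs.push(...table[idx]);
    }
  }
  return segs;
}
```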
Optimizer details (a combined sketch follows this list):
Momentum: v ← βv + (1-β)∇L, θ ← θ - αv.
Nesterov: gradient evaluated at the lookahead point θ - αβv before the velocity update.
Adam: Adaptive learning rates with bias-corrected first/second moments.
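A combined sketch of the three stateful update rules, using the EMA-style momentum form given above. The function names and vector helpers are illustrative assumptions, not the project's API.

```ts
type Vec = [number, number];
const add = (a: Vec, c: Vec): Vec => [a[0] + c[0], a[1] + c[1]];
const scale = (a: Vec, s: number): Vec => [a[0] * s, a[1] * s];

// Momentum, matching the form above: v ← βv + (1-β)∇L, θ ← θ - αv.
function momentumStep(theta: Vec, v: Vec, grad: (p: Vec) => Vec,
                      alpha: number, beta: number): [Vec, Vec] {
  const g = grad(theta);
  const vNew = add(scale(v, beta), scale(g, 1 - beta));
  return [add(theta, scale(vNew, -alpha)), vNew];
}

// Nesterov: same velocity update, but the gradient is taken at the
// lookahead point θ - αβv instead of at θ.
function nesterovStep(theta: Vec, v: Vec, grad: (p: Vec) => Vec,
                      alpha: number, beta: number): [Vec, Vec] {
  const g = grad(add(theta, scale(v, -alpha * beta)));
  const vNew = add(scale(v, beta), scale(g, 1 - beta));
  return [add(theta, scale(vNew, -alpha)), vNew];
}

// Adam with bias-corrected first/second moments; t is the 1-based step.
function adamStep(theta: Vec, m: Vec, s: Vec, t: number,
                  grad: (p: Vec) => Vec, alpha: number,
                  beta1 = 0.9, beta2 = 0.999, eps = 1e-8): [Vec, Vec, Vec] {
  const g = grad(theta);
  const mNew: Vec = [beta1 * m[0] + (1 - beta1) * g[0],
                     beta1 * m[1] + (1 - beta1) * g[1]];
  const sNew: Vec = [beta2 * s[0] + (1 - beta2) * g[0] * g[0],
                     beta2 * s[1] + (1 - beta2) * g[1] * g[1]];
  const mc = 1 - beta1 ** t, sc = 1 - beta2 ** t; // bias-correction factors
  const step: Vec = [
    alpha * (mNew[0] / mc) / (Math.sqrt(sNew[0] / sc) + eps),
    alpha * (mNew[1] / mc) / (Math.sqrt(sNew[1] / sc) + eps),
  ];
  return [add(theta, scale(step, -1)), mNew, sNew];
}
```

One nice property of the EMA momentum form is that α sets the step scale directly and β only controls smoothing, so a single learning-rate slider stays meaningful across optimizers.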
Interactive controls:
Click canvas to set starting position.
Adjustable learning rate, momentum coefficients (β₁, β₂).
Gradient noise injection to simulate stochastic gradients (see the sketch after this list).
Play/pause/step for frame-by-frame analysis.
Randomize start position to explore different trajectories.
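A sketch of what the noise injection might look like, assuming zero-mean Gaussian noise added per gradient component via Box-Muller sampling; the names and the sigma parameter are illustrative.

```ts
// Sample a standard normal variate with the Box-Muller transform.
function gaussian(): number {
  const u = 1 - Math.random(); // (0, 1], keeps log(u) finite
  const v = Math.random();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

// Perturb an analytic gradient to mimic minibatch noise;
// sigma would be the UI noise-level setting.
function noisyGrad(g: [number, number], sigma: number): [number, number] {
  return [g[0] + sigma * gaussian(), g[1] + sigma * gaussian()];
}
```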
Why Rosenbrock matters: The Rosenbrock function (banana valley) is a classic test case because it has a narrow, curved valley where vanilla GD struggles—momentum methods follow the valley curvature much better. The visualizer makes that difference immediately obvious.
Educational value:
For interviews: "Explain why momentum helps" → Show this
For teaching: Demonstrate why learning rate tuning matters (try α=0.5 on Rosenbrock)
For debugging: Understand why your optimizer might be oscillating or diverging
Self-testing framework: Includes automated checks for coordinate finiteness and rendering stability to catch edge cases (NaN gradients, overflow, canvas sizing bugs).
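As one illustration of such a check, reusing the rosenbrock, rosenbrockGrad, and momentumStep sketches above: run a short trajectory and assert that every coordinate and loss value stays finite. The step count, hyperparameters, and starting point here are arbitrary.

```ts
// Coordinate-finiteness self-check: a short momentum run on Rosenbrock
// must never produce NaN/Infinity coordinates or loss values.
function checkFiniteTrajectory(steps = 200): void {
  let theta: [number, number] = [-1.5, 2];
  let v: [number, number] = [0, 0];
  for (let i = 0; i < steps; i++) {
    [theta, v] = momentumStep(theta, v,
                              (p) => rosenbrockGrad(p[0], p[1]), 0.001, 0.9);
    console.assert(Number.isFinite(theta[0]) && Number.isFinite(theta[1]),
                   `non-finite coordinate at step ${i}`);
    console.assert(Number.isFinite(rosenbrock(theta[0], theta[1])),
                   `non-finite loss at step ${i}`);
  }
}
```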