Mathematics of the Garden
The Dynamics Behind the Garden
In the Garden of Time and Steps, we visualize algorithms as discretizations of continuous-time dynamical systems. Here is the mathematical formulation of the vector fields governing the particles.
1. Objective Functions
We aim to minimize a function $f: \mathbb{R}^d \to \mathbb{R}$.
- Quadratic: $f(x) = \frac{1}{2} x^\top A x$. Simple, convex, single global minimum.
- Rastrigin: $f(x) = 10d + \sum_{i=1}^d (x_i^2 - 10 \cos(2\pi x_i))$. Highly non-convex with many local minima.
- Mixture of Gaussians (Log-Likelihood): $f(x) = -\log \left( \sum_k w_k \mathcal{N}(x; \mu_k, \Sigma_k) \right)$. Non-convex, multimodal.
2. Vector Fields & Dynamics
The movement of particles follows a specific flow determined by the chosen dynamics.
A. Gradient Flow
The steepest descent path. It is a first-order dissipative system. \(\dot{x}_t = -\nabla f(x_t)\)
- Discretization: Gradient Descent (GD).
- Behavior: Moves directly towards the nearest local minimum.
B. Hamiltonian Flow
A conservative system that preserves energy. It is a second-order system (position $x$ and velocity $v$). The Hamiltonian is $H(x, v) = f(x) + \frac{1}{2}|v|^2$. \(\begin{aligned} \dot{x}_t &= v_t \\ \dot{v}_t &= -\nabla f(x_t) \end{aligned}\)
- Discretization: Symplectic integrators (e.g., Leapfrog).
- Behavior: Particles orbit level sets of $H$. They do not converge to a minimum but explore the space with constant energy.
C. Damped Hamiltonian Flow (Nesterov)
To achieve convergence (optimization), we add friction (damping) to the Hamiltonian system. \(\begin{aligned} \dot{x}_t &= v_t \\ \dot{v}_t &= -\nabla f(x_t) - \gamma_t v_t \end{aligned}\)
- Constant Damping ($\gamma_t = \gamma > 0$): Heavy Ball method.
- Vanishing Damping ($\gamma_t = 3/t$): Nesterov’s Accelerated Gradient Flow (Su, Boyd, Candes ‘14).
- Behavior: Oscillates but eventually settles into a minimum. Can escape shallow local minima due to momentum.
3. Stochasticity (Langevin Dynamics)
In our simulation, we can also add noise (via the $\sigma$ parameter), turning the dynamics into Stochastic Differential Equations (SDEs): \(dX_t = -\nabla f(X_t) dt + \sqrt{2} dW_t\) This corresponds to Langevin Dynamics, used for sampling from $\pi(x) \propto e^{-f(x)}$.