The Dynamics Behind the Garden

In the Garden of Time and Steps, we visualize algorithms as discretizations of continuous-time dynamical systems. Here is the mathematical formulation of the vector fields governing the particles.

1. Objective Functions

We aim to minimize a function $f: \mathbb{R}^d \to \mathbb{R}$.

  • Quadratic: $f(x) = \frac{1}{2} x^\top A x$. Simple, convex, single global minimum.
  • Rastrigin: $f(x) = 10d + \sum_{i=1}^d (x_i^2 - 10 \cos(2\pi x_i))$. Highly non-convex with many local minima.
  • Mixture of Gaussians (Log-Likelihood): $f(x) = -\log \left( \sum_k w_k \mathcal{N}(x; \mu_k, \Sigma_k) \right)$. Non-convex, multimodal.

2. Vector Fields & Dynamics

The movement of particles follows a specific flow determined by the chosen dynamics.

A. Gradient Flow

The steepest descent path. It is a first-order dissipative system. \(\dot{x}_t = -\nabla f(x_t)\)

  • Discretization: Gradient Descent (GD).
  • Behavior: Moves directly towards the nearest local minimum.

B. Hamiltonian Flow

A conservative system that preserves energy. It is a second-order system (position $x$ and velocity $v$). The Hamiltonian is $H(x, v) = f(x) + \frac{1}{2}|v|^2$. \(\begin{aligned} \dot{x}_t &= v_t \\ \dot{v}_t &= -\nabla f(x_t) \end{aligned}\)

  • Discretization: Symplectic integrators (e.g., Leapfrog).
  • Behavior: Particles orbit level sets of $H$. They do not converge to a minimum but explore the space with constant energy.

C. Damped Hamiltonian Flow (Nesterov)

To achieve convergence (optimization), we add friction (damping) to the Hamiltonian system. \(\begin{aligned} \dot{x}_t &= v_t \\ \dot{v}_t &= -\nabla f(x_t) - \gamma_t v_t \end{aligned}\)

  • Constant Damping ($\gamma_t = \gamma > 0$): Heavy Ball method.
  • Vanishing Damping ($\gamma_t = 3/t$): Nesterov’s Accelerated Gradient Flow (Su, Boyd, Candes ‘14).
  • Behavior: Oscillates but eventually settles into a minimum. Can escape shallow local minima due to momentum.

3. Stochasticity (Langevin Dynamics)

In our simulation, we can also add noise (via the $\sigma$ parameter), turning the dynamics into Stochastic Differential Equations (SDEs): \(dX_t = -\nabla f(X_t) dt + \sqrt{2} dW_t\) This corresponds to Langevin Dynamics, used for sampling from $\pi(x) \propto e^{-f(x)}$.