Continuity equation

$\def \R {\mathbb R} \def \X {\mathcal X} \def \T {\mathsf T} \def \A {\mathcal A} \def \B {\mathcal B} \def \I {\mathsf{I}} \def \Tr {\mathsf{Tr}} \def \H {\mathcal H} \def \Diff {\mathsf{Diff}} \newcommand{part}[2]{\frac{\partial #1}{\partial #2}}$ We derive the continuity equation

\[\part{\rho}{t} + \nabla \cdot (\rho v) = 0\]

which describes the evolution of a density $\rho$ under the flow of a vector field $v$. This follows Mackey’s Time’s Arrow, particularly $\S4A$. First, we recall some definitions from measure theory.

Pushforward measure

Let $(X,\A,\mu)$ be a measure space. Recall that this means $X$ is a set (the phase space on which our dynamics operate), $\A$ is a $\sigma$-algebra that defines the measurable subsets of $X$, and $\mu \colon \A \to [0,\infty]$ is a (non-negative) measure on $X$. We also assume $\mu$ is $\sigma$-finite, which means we can write $X = \bigcup_{k=1}^\infty A_k$ as a countable union of measurable sets with $\mu(A_k) < \infty$. The simplest example we’ll keep in mind is $X = \R^d$ with $\mu$ the Lebesgue measure.

A map $S \colon X \to Y$ between two measurable spaces $(X,\A)$ and $(Y,\B)$ is said to be measurable if the inverse image of any measurable set is measurable:

\[S^{-1}(B) = \{a \in X \colon\; S(a) \in B\} \in \A ~~~~~~\forall~B \in \B\]

(Notice in particular that this definition does not require $S$ to map measurable sets to measurable sets; it only requires $S^{-1}$ to do so.)

Given a measurable map $S \colon X \to Y$ and a measure $\mu$ on $(X,\A)$, we define the pushforward measure $\nu = S_\ast \mu$ on $(Y,\B)$ by “pulling back” any set to $X$ and applying $\mu$:

\[S_\ast \mu(B) := \mu(S^{-1}(B))~~~~~\forall~B \in \B\]

Note that the measurability assumption on $S$ ensures $S^{-1}(B) \in \A$, so the above is well-defined.
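As a minimal sketch (not from Mackey), here is the pushforward on a finite measure space, where every subset is measurable and a measure is just a nonnegative weight on each point. The space `X`, the uniform weights, and the map `S` are hypothetical choices for illustration; `pushforward` literally implements $S_\ast \mu(B) = \mu(S^{-1}(B))$.

```python
from fractions import Fraction

# A finite measure space: X = {0, ..., 5}, every subset is measurable,
# and mu is given by its (hypothetical) weight on each point.
X = range(6)
mu_weights = {x: Fraction(1, 6) for x in X}   # uniform measure

def mu(A):
    """mu(A) = sum of the weights of the points in A."""
    return sum(mu_weights[x] for x in A)

def S(x):
    """A hypothetical (non-invertible) map from X to Y = {0, 1, 2}."""
    return x % 3

def preimage(S, B, X):
    """S^{-1}(B) = {x in X : S(x) in B}."""
    return {x for x in X if S(x) in B}

def pushforward(mu, S, X):
    """Return the measure S_* mu, defined by (S_* mu)(B) = mu(S^{-1}(B))."""
    return lambda B: mu(preimage(S, B, X))

nu = pushforward(mu, S, X)
print(nu({0}), nu({0, 1}))   # 1/3 2/3
```

Here $S^{-1}(\{0\}) = \{0, 3\}$, so the pushforward puts mass $2/6$ on the point $0$ even though no single point of $X$ carries that much mass.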

Given two $\sigma$-finite measures $\mu$, $\nu$ on the same measurable space $(X,\A)$, we say $\nu$ is absolutely continuous with respect to $\mu$, denoted by $\nu \ll \mu$, if $\nu(A) = 0$ whenever $\mu(A) = 0$. In this case, by the Radon-Nikodym theorem, $\nu$ has a density $f = \frac{d\nu}{d\mu} \colon X \to [0,\infty)$ with respect to $\mu$, called the Radon-Nikodym derivative, characterized by

\[\nu(A) = \int_A f d\mu\]

for any measurable set $A \in \A$. The function $f$ is uniquely defined $\mu$-almost everywhere.

A measurable map $S \colon X \to X$ is said to be nonsingular with respect to a base measure $\mu$ if $S_* \mu \ll \mu$.

Frobenius-Perron operator

Let $S \colon X \to X$ be a nonsingular transformation with respect to a $\sigma$-finite base measure $\mu$ on $X$.

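Any nonnegative measurable function $f$ on $X$ defines a measure $\mu_f(A) = \int_A f \: d\mu$, which we can push forward by $S$ to obtain $\nu_f = S_* \mu_f$. Since $S$ is nonsingular, $\nu_f \ll \mu$: if $\mu(A) = 0$ then $\mu(S^{-1}(A)) = 0$, hence $\nu_f(A) = \int_{S^{-1}(A)} f \: d\mu = 0$. (Note that we compare $\nu_f$ with $\mu$, not with $\mu_f$.) So $\nu_f$ has a density $g = \frac{d\nu_f}{d\mu}$ with respect to $\mu$, which satisfies the defining equation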

\[\int_A g \: d\mu = \nu_f(A) = \mu_f(S^{-1}(A)) = \int_{S^{-1}(A)} f \: d\mu\]

So there is a unique (up to $\mu$-almost everywhere equality) map that transforms any density $f \colon X \to [0,\infty)$ into another density $g \colon X \to [0,\infty)$ as defined above. By writing $f = f^+ - f^-$ and using linearity, we can extend this to any integrable function $f \colon X \to \R$, i.e., one with finite $L^1$-norm $\|f\| = \int_X |f| \: d\mu < \infty$.

So we define the Frobenius-Perron operator (corresponding to $S$) as the unique linear operator

\[P \colon L^1 \to L^1\]

defined by

\[\int_A Pf \: d\mu = \int_{S^{-1}(A)} f \: d\mu~~~~~\forall~A \in \A\]

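Then $P$ is also a Markov operator, which means that for $f \ge 0$ (pointwise) we have $Pf \ge 0$ and $\|Pf\| = \|f\|$, where $\|f\| = \int_X |f| \: d\mu$ is the $L^1$-norm. (Taking $A = X$ in the defining equation gives $\int_X Pf \: d\mu = \int_X f \: d\mu$; for signed $f$ we only have $\|Pf\| \le \|f\|$.) In particular, if we start with a probability density $f$, which means $f \ge 0$ with $\|f\|=1$, then we also end up with another probability density $Pf$.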
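On a finite space with positive point weights, taking $A = \{y\}$ in the defining equation gives $Pf(y)\,\mu(\{y\}) = \sum_{x \colon S(x) = y} f(x)\,\mu(\{x\})$. Here is a minimal sketch of that formula for a hypothetical non-invertible map, checking the Markov property numerically; the weights, map, and functions are made up for illustration. For a signed function, cancellation can make the $L^1$-norm strictly decrease.

```python
import numpy as np

def frobenius_perron_finite(f, S, mu_w):
    """Pf on a finite space {0, ..., n-1} with point weights mu_w > 0.

    From the defining equation with A = {y}:
        Pf(y) mu({y}) = sum over x with S(x) = y of f(x) mu({x}).
    """
    Pf = np.zeros_like(f, dtype=float)
    for x in range(len(f)):
        Pf[S[x]] += f[x] * mu_w[x]
    return Pf / mu_w

mu_w = np.array([0.1, 0.2, 0.3, 0.4])   # hypothetical positive weights
S = np.array([1, 1, 3, 0])               # S(0)=1, S(1)=1, S(2)=3, S(3)=0 (not invertible)
f = np.array([2.0, 0.5, 1.0, 1.0])      # a probability density w.r.t. mu

l1 = lambda h: np.sum(np.abs(h) * mu_w)   # L^1(mu) norm

Pf = frobenius_perron_finite(f, S, mu_w)
print(Pf.min() >= 0, np.isclose(l1(Pf), l1(f)))   # True True

g = np.array([1.0, -1.0, 0.0, 0.0])     # a signed function
Pg = frobenius_perron_finite(g, S, mu_w)
print(l1(Pg), l1(g))                     # cancellation at y = 1 shrinks the norm
```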

Now for concreteness, suppose $X = \R^d$ and $\mu$ is the Lebesgue measure, so we can write $d\mu = dx$. Suppose further, for simplicity, that $S$ is a diffeomorphism (smooth and invertible with a smooth inverse). Then by substituting $x = S^{-1}(y)$ (i.e., $y = S(x)$), using the change of variables formula, and renaming $y$ back to $x$, we can write

\[\int_A Pf(x)\:dx = \int_{S^{-1}(A)} f(x)\:dx = \int_A f(S^{-1}(x)) \: |\det J^{-1}(x)| \:dx\]

where $J^{-1}(x) = \part{S^{-1}(x)}{x}$ is the Jacobian matrix of partial derivatives of $S^{-1}$. By the inverse function theorem, it is the inverse of the Jacobian matrix $J$ of $S$ evaluated at the preimage point: $J^{-1}(x) = J(S^{-1}(x))^{-1}$.

Thus, we have the formula for the Frobenius-Perron operator (with respect to the Lebesgue measure):

\[Pf(x) = f(S^{-1}(x)) \: |\det J^{-1}(x)|\]
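As a sanity check of this formula, assume $S(x) = Mx + b$ is affine with $M$ invertible, so $S^{-1}(x) = M^{-1}(x - b)$ and $J^{-1} = M^{-1}$ is constant. If $f$ is the standard Gaussian density, then $Pf$ should be the density of $MX + b$ for $X \sim N(0, \I)$, namely $N(b, MM^\top)$. The matrix `M`, offset `b`, and test points below are hypothetical choices.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Density of N(mean, cov) evaluated at each row of x (shape (n, d))."""
    d = mean.shape[0]
    diff = x - mean
    quad = np.einsum("ni,ij,nj->n", diff, np.linalg.inv(cov), diff)
    return np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))

def frobenius_perron_affine(f, M, b):
    """Pf(x) = f(S^{-1}(x)) |det J^{-1}(x)| for the affine map S(x) = M x + b.

    Here S^{-1}(x) = M^{-1}(x - b) and J^{-1}(x) = M^{-1} is constant.
    """
    M_inv = np.linalg.inv(M)
    abs_det_inv = abs(np.linalg.det(M_inv))
    def Pf(x):
        # Each row of (x - b) @ M_inv.T is S^{-1} applied to the corresponding row of x.
        return f((x - b) @ M_inv.T) * abs_det_inv
    return Pf

M = np.array([[2.0, 1.0], [0.0, 1.5]])   # hypothetical invertible matrix, det M = 3
b = np.array([1.0, -1.0])
f = lambda x: gaussian_pdf(x, np.zeros(2), np.eye(2))   # standard Gaussian density
Pf = frobenius_perron_affine(f, M, b)

# If X ~ N(0, I) then S(X) = M X + b ~ N(b, M M^T), so Pf should be that density.
pts = np.random.default_rng(0).normal(size=(5, 2)) * 2.0
print(np.allclose(Pf(pts), gaussian_pdf(pts, b, M @ M.T)))   # True
```

Since $\det M = 3$ here, dropping the factor $|\det J^{-1}|$ would make the check fail by a factor of $3$.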

Dynamical system

A dynamical system $(S_t)_{t \in \R}$ on $X$ is a collection of maps $S_t \colon X \to X$ satisfying the flow property:

\[S_t \circ S_{t'} = S_{t+t'}~~~~~\forall\;t,t'\in\R\]

We also require $S_0 = \I$ to be the identity map on $X$; together with the flow property, this implies that each $S_t$ is invertible with $(S_t)^{-1} = S_{-t}$. We also say that the dynamics is reversible, or time-reversal invariant. For example, the solution maps of an autonomous system of ordinary differential equations whose solutions exist for all times are reversible; according to Mackey (p. 3), this includes all equations of classical and quantum physics.

If moreover each $S_t$ is smooth, then each $S_t$ is an element of the diffeomorphism group $M = \Diff(X)$ (its inverse $S_{-t}$ is smooth too), and the flow property above means a dynamical system $S = (S_t)_{t \in \R}$ is a one-parameter subgroup of $M$, that is, the map

\[S \colon \R \to M\]

is a group homomorphism from $\R$ (with addition as group operation) to the diffeomorphism group $M$ (with composition as group operation). Assuming $t \mapsto S_t$ is continuous, the image $S(\R) = (S_t)_{t \in \R}$ is a continuous path in $M$ passing through the identity. Such a one-parameter subgroup is generated by its “tangent vector” at the identity in $M$, i.e., its velocity at $t = 0$.

The diffeomorphism group $M = \Diff(X)$ is an infinite-dimensional Lie group, whose tangent space at the identity is the Lie algebra of vector fields on $X$. Thus, a dynamical system is generated by a vector field

\[v \colon X \to \R^d\]

that assigns to each point $x \in X$ the velocity vector $v(x) \in \T_xX \cong \R^d$ of the dynamics at that point (independent of time). That is, the dynamical system $S = (S_t)_{t \in \R}$ is the flow of $v$, obtained from the integral curves, or solutions, of the differential equation

\[\dot S_t = \frac{dS_t}{dt} = v(S_t)\]

(with a slight abuse of notation above; really we should write $\dot X_t = v(X_t)$ where $X_t := S_t(x_0) \in X$ for any initial point $x_0 \in X$). Then $S_t$ is obtained by integrating $v$:

\[S_t = S_0 + \int_0^t \dot S_u \:du = \I + \int_0^t v(S_u) \: du\]

which is really the exponential map in the Lie group $M$, so formally we can write $S_t = \exp(tv)$. In particular, for small $t$ we have the approximation

\[S_t = \I + tv + O(t^2)\]
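Numerically, the flow $S_t$ can be approximated by integrating $\dot X_t = v(X_t)$ with any ODE solver. The sketch below uses classical fourth-order Runge-Kutta (our choice, not something the derivation depends on) on a hypothetical example field, the pendulum $v(q,p) = (p, -\sin q)$, to check the flow property, the inverse $S_{-t}$, and that the error of $S_t \approx \I + tv$ shrinks like $t^2$.

```python
import numpy as np

def v(x):
    """Hypothetical example vector field on R^2 (a pendulum): v(q, p) = (p, -sin q)."""
    q, p = x
    return np.array([p, -np.sin(q)])

def flow(x0, t, n_steps=1000):
    """Approximate S_t(x0) by integrating dx/dt = v(x) with classical RK4 (t may be negative)."""
    x = np.asarray(x0, dtype=float)
    h = t / n_steps
    for _ in range(n_steps):
        k1 = v(x)
        k2 = v(x + 0.5 * h * k1)
        k3 = v(x + 0.5 * h * k2)
        k4 = v(x + h * k3)
        x = x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return x

x0 = np.array([1.0, 0.5])

# Flow property S_t o S_s = S_{t+s}, and S_{-t} undoes S_t (up to integration error).
print(np.allclose(flow(flow(x0, 0.3), 0.4), flow(x0, 0.7)))   # True
print(np.allclose(flow(flow(x0, 0.5), -0.5), x0))              # True

# First-order approximation S_t(x0) = x0 + t v(x0) + O(t^2):
# the ratio err / t^2 should level off as t shrinks.
ratios = []
for t in [0.1, 0.05, 0.025, 0.0125]:
    err = np.linalg.norm(flow(x0, t) - (x0 + t * v(x0)))
    ratios.append(err / t**2)
print(ratios)
```

The ratios settle near $\frac{1}{2}|\partial v(x_0)\, v(x_0)|$, since the next term in the expansion of $S_t(x_0)$ is $\frac{t^2}{2}\, \partial v(x_0)\, v(x_0)$.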

Evolution of a density

Let $S = (S_t)_{t \in \R}$ be a dynamical system on $X = \R^d$ equipped with the Lebesgue measure, with each $S_t$ a diffeomorphism. To each map $S_t \colon X \to X$ we associate its Frobenius-Perron operator $P^t \colon L^1 \to L^1$ as above:

\[P^t f(x) = f(S_{-t}(x)) \: |\det J^{-t}(x)|\]

where $J^{-t}$ is the Jacobian matrix of $S_{-t} = S_t^{-1}$.

So starting from an initial density $f(x)$, we can let it evolve into the density $f(t,x)$ at time $t$ given by

\[f(t,x) := P^t f(x)\]

We wish to determine the evolution equation for $f(t,x)$. Since $S_{t+s} = S_t \circ S_s$ implies $P^{t+s} = P^t P^s$, by shifting time it suffices to compute the time derivative at $t = 0$. Then we can use the small $t$ expansion above:

\[S_{-t}(x) = (\I - tv)(x) + O(t^2) = x - tv(x) + O(t^2)\]

so, assuming $f$ is differentiable, a first-order Taylor expansion gives:

\[f(S_{-t}(x)) = f(x - t v(x) + O(t^2)) = f(x) - t \langle \nabla f(x), v(x) \rangle + O(t^2)\]

Moreover, by differentiating the approximation for $S_{-t}$ above we also get an approximation for $J^{-t}$:

\[J^{-t}(x) = \frac{\partial S_{-t}(x)}{\partial x} = \I - t \partial v(x) + O(t^2)\]

where $\I$ is the $d \times d$ identity matrix, and $\partial v(x) \equiv \part{v(x)}{x}$ is the Jacobian matrix of $v$.

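So to compute $\det J^{-t}(x)$ we need to compute the determinant of a small perturbation $\I + t A$ of the identity, where $A = -\partial v(x)$. For simplicity, first suppose $A$ is symmetric, so we have an eigendecomposition $A = U \Lambda U^\top$, where $U$ is orthogonal and $\Lambda = \text{diag}(\lambda_i)$ is a diagonal matrix of eigenvalues. Then $\I + tA = U(\I + t \Lambda)U^\top$, so:

\[\det(\I + tA) = \prod_{i=1}^d (1 + t \lambda_i) = 1 + t \sum_{i=1}^d \lambda_i + O(t^2) = 1 + t \Tr(A) + O(t^2)\]

where $\Tr(A) = \sum_{i=1}^d \lambda_i = \sum_{i=1}^d A_{ii}$ is the trace operator. Symmetry is not actually needed: for any square matrix $A$, the eigenvalues of $\I + tA$ are $1 + t\lambda_i$, where $\lambda_1,\dots,\lambda_d$ are the (possibly complex) eigenvalues of $A$ counted with algebraic multiplicity, and the determinant and trace are still their product and sum. This matters since $\partial v(x)$ is generally not symmetric. Therefore, plugging in $A = -\partial v(x)$, and noting that $\det J^{-t}(x)$ is close to $1$ (hence positive) for small $t$, we get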

\[|\det J^{-t}(x)| = 1 - t \Tr(\partial v(x)) + O(t^2) = 1 - t \sum_{i=1}^d \part{v_i(x)}{x_i} + O(t^2) = 1 - t (\nabla \cdot v)(x) + O(t^2)\]

where $v(x) = (v_1(x),\dots,v_d(x))$ are the components of $v$, and $\nabla \cdot$ is the divergence operator.
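A quick numerical check of the expansion $\det(\I + tA) = 1 + t\Tr(A) + O(t^2)$, using a random matrix that is (almost surely) not symmetric; the seed and size are arbitrary choices. The ratio of the error to $t^2$ should stay bounded as $t \to 0$.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 3))   # a random, generally non-symmetric matrix

ratios = []
for t in [1e-1, 1e-2, 1e-3]:
    exact = np.linalg.det(np.eye(3) + t * A)
    first_order = 1 + t * np.trace(A)
    ratios.append(abs(exact - first_order) / t**2)
print(ratios)   # stays bounded as t -> 0, so the error is O(t^2)
```

For a $3 \times 3$ matrix the error is exactly $t^2 e_2 + t^3 \det A$, where $e_2$ is the sum of the principal $2 \times 2$ minors, so the ratios converge to $|e_2|$.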

Now combining the two approximations above, we can write:

\[\begin{align*} f(t,x) &= f(S_{-t}(x)) \: |\det J^{-t}(x)| \\ &= \big\{ f(x) - t \langle \nabla f(x), v(x) \rangle \big\} \big\{ 1 - t (\nabla \cdot v)(x) \big\} + O(t^2) \\ &= f(x) - t \big\{ \langle \nabla f(x), v(x) \rangle + f(x) (\nabla \cdot v)(x) \big\} + O(t^2) \\ &= f(x) - t \big\{\nabla \cdot (fv) \big\}(x) + O(t^2) \end{align*}\]

Notice how the first-order terms in the two approximations combine nicely, via the product rule $\nabla \cdot (fv) = \langle \nabla f, v \rangle + f \, (\nabla \cdot v)$, into the divergence of the weighted vector field $fv$.

Thus, we find that the time rate of change of the density $f(t,x)$ is equal to minus the spatial divergence of $fv$:

\[\part{f}{t}\Big|_{t=0} = \lim_{t \to 0} \frac{f(t,x) - f(x)}{t} = -\nabla \cdot (fv)\]

and again by shifting time, i.e., replacing the initial density $f$ by $f(t,\cdot)$, the above is valid for any $t$.
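To check the result numerically, assume a hypothetical linear field $v(x) = Bx$ on $\R^2$, whose flow is exactly $S_t = \exp(tB)$ (the matrix exponential, computed with `scipy.linalg.expm`), and take $f$ to be the standard Gaussian density. Then $\nabla f = -xf$ and $\nabla \cdot v = \Tr(B)$, so the product rule gives $-\nabla\cdot(fv)$ in closed form, which we compare against a finite difference in $t$ of the Frobenius-Perron formula. The matrix `B`, point `x`, and step `t` are illustrative choices.

```python
import numpy as np
from scipy.linalg import expm

B = np.array([[0.3, -1.0], [0.5, -0.2]])   # hypothetical linear field v(x) = B x (not symmetric)

def f(x):
    """Initial density: standard Gaussian on R^2."""
    return np.exp(-0.5 * x @ x) / (2 * np.pi)

def density_at(t, x):
    """f(t, x) = f(S_{-t}(x)) |det J^{-t}(x)|, using the exact linear flow S_t = expm(t B)."""
    S_minus_t = expm(-t * B)   # S_{-t} is linear, so it is also its own Jacobian J^{-t}
    return f(S_minus_t @ x) * abs(np.linalg.det(S_minus_t))

def minus_div_fv(x):
    """-div(f v) via the product rule: -(<grad f, v> + f div v), with grad f = -x f, div v = Tr B."""
    return -((-x * f(x)) @ (B @ x) + f(x) * np.trace(B))

x = np.array([0.7, -1.2])
t = 1e-5
time_derivative = (density_at(t, x) - f(x)) / t   # forward difference in t at t = 0
print(time_derivative, minus_div_fv(x))          # agree up to O(t)
```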

Continuity equation

As we derived above, the continuity equation

\[\part{f}{t} + \nabla \cdot (fv) = 0\]

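describes the evolution of a density $f(t,x)$ under the flow of a vector field $v(x)$. It expresses an intuitive conservation law, like “conservation of mass”, or “what comes in must go out”: integrating over a fixed region $A \subset X$ with smooth boundary and applying the divergence theorem gives $\frac{d}{dt} \int_A f \: dx = -\int_{\partial A} f \, \langle v, n \rangle \: dS$, where $n$ is the outward unit normal, so the mass in $A$ changes only through the flux of $fv$ across the boundary of $A$.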
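The “what comes in must go out” reading is exactly what a conservative finite-volume discretization builds in. The sketch below (a standard first-order upwind scheme, swapped in here for illustration and not part of the derivation) solves the 1D continuity equation on a periodic interval with a hypothetical positive velocity field: each flux leaving one cell enters its neighbour, so total mass is conserved up to round-off, and the CFL step size keeps the density nonnegative.

```python
import numpy as np

n = 200
dx = 1.0 / n
x_center = (np.arange(n) + 0.5) * dx    # cell centers on the periodic interval [0, 1)
x_face = np.arange(n) * dx              # left face of each cell
v_face = 1.0 + 0.5 * np.sin(2 * np.pi * x_face)   # hypothetical velocity field, always > 0

f = np.exp(-((x_center - 0.5) ** 2) / 0.01)
f /= f.sum() * dx                       # normalize to a probability density
mass0 = f.sum() * dx

dt = 0.4 * dx / v_face.max()            # CFL condition keeps the explicit scheme stable
for _ in range(500):
    flux_in = v_face * np.roll(f, 1)    # upwind flux through the left face (v > 0: comes from cell i-1)
    flux_out = np.roll(flux_in, -1)     # the right face of cell i is the left face of cell i+1
    f = f - (dt / dx) * (flux_out - flux_in)

mass = f.sum() * dx
print(abs(mass - mass0))   # round-off level: what leaves one cell enters its neighbour
```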

In particular, if we have a density $f(x)$ satisfying

\[\nabla \cdot (fv) = 0\]

then $f$ is preserved under the flow of $v$, since the above is simply the evolution equation with $\part{f}{t} = 0$. We then say $f$ is a stationary (or invariant) density for $v$, and $\mu_f$ is a stationary (or invariant) measure.
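For a concrete check with SymPy, assume the hypothetical example of a Gaussian $f(x,y) = e^{-(x^2+y^2)/2}$ on $\R^2$ (unnormalized; the constant does not matter). Under the rotation field $v = (-y, x)$ it is stationary, since $\langle \nabla f, v \rangle = 0$ and $\nabla \cdot v = 0$, whereas a constant drift $v = (1,0)$ moves it.

```python
import sympy as sp

x, y = sp.symbols("x y", real=True)
f = sp.exp(-(x**2 + y**2) / 2)   # Gaussian density (up to normalization)

def div_fv(f, v):
    """Divergence of the weighted field f v in the plane."""
    return sp.diff(f * v[0], x) + sp.diff(f * v[1], y)

rotation = (-y, x)   # hypothetical example: rotation about the origin
drift = (1, 0)       # hypothetical example: constant drift

print(sp.simplify(div_fv(f, rotation)))   # 0: the Gaussian is stationary under rotation
print(sp.simplify(div_fv(f, drift)))      # nonzero: a drift moves the Gaussian
```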

For example, suppose $v = (\part{\H}{p}, -\part{\H}{q})$ is the Hamiltonian vector field of a Hamiltonian function $\H(q,p)$ on phase space $X = \R^d \times \R^d$. Take $f(x) = 1$, so $\mu_f = \mu$ is the Lebesgue measure (not a probability measure, but the continuity equation still makes sense) and $\nabla \cdot (fv) = \nabla \cdot v$. Then we have that

\[\nabla \cdot v = \sum_{i=1}^d \left[ \part{\,}{q_i}\left(\part{\H}{p_i}\right) + \part{\,}{p_i}\left(-\part{\H}{q_i}\right) \right] = 0\]

since mixed partial derivatives of a twice continuously differentiable $\H$ commute. Thus, Hamiltonian flow preserves the Lebesgue measure, no matter what the Hamiltonian function is (this is Liouville's theorem).
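Liouville's theorem can also be checked on the flow map itself: for a diffeomorphism, the Lebesgue measure is preserved exactly when $|\det \partial S_t(x)| = 1$. The sketch below (a hypothetical example, using RK4 and finite differences, neither of which comes from the text) computes this determinant for the pendulum Hamiltonian $\H(q,p) = p^2/2 - \cos q$, and, for contrast, for a damped pendulum with friction coefficient `gamma`, which is not Hamiltonian and has $\nabla \cdot v = -\gamma$, so its volume should shrink like $e^{-\gamma t}$.

```python
import numpy as np

def rk4_flow(v, x0, t, n_steps=2000):
    """Approximate S_t(x0) for dx/dt = v(x) with classical RK4."""
    x = np.asarray(x0, dtype=float)
    h = t / n_steps
    for _ in range(n_steps):
        k1 = v(x)
        k2 = v(x + 0.5 * h * k1)
        k3 = v(x + 0.5 * h * k2)
        k4 = v(x + h * k3)
        x = x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return x

def flow_jacobian_det(v, x0, t, eps=1e-6):
    """det of the Jacobian of x -> S_t(x) at x0, by central finite differences."""
    d = len(x0)
    J = np.zeros((d, d))
    for j in range(d):
        e = np.zeros(d)
        e[j] = eps
        J[:, j] = (rk4_flow(v, x0 + e, t) - rk4_flow(v, x0 - e, t)) / (2 * eps)
    return np.linalg.det(J)

gamma = 0.3
# Pendulum with H(q, p) = p^2 / 2 - cos q, so v = (dH/dp, -dH/dq) = (p, -sin q)
hamiltonian = lambda x: np.array([x[1], -np.sin(x[0])])
# Damped pendulum (not Hamiltonian): div v = -gamma
damped = lambda x: np.array([x[1], -np.sin(x[0]) - gamma * x[1]])

x0 = np.array([1.0, 0.5])
t = 2.0
det_ham = flow_jacobian_det(hamiltonian, x0, t)
det_damped = flow_jacobian_det(damped, x0, t)
print(det_ham)                          # close to 1: phase-space volume is preserved
print(det_damped, np.exp(-gamma * t))   # volume shrinks like exp(t * div v) = exp(-gamma t)
```

Because RK4 is not a symplectic integrator, `det_ham` equals $1$ only up to the (small) integration and finite-difference errors.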

Finally, notice that we can write the continuity equation above as the spacetime divergence-free condition:

\[\tilde \nabla \cdot (f \tilde v) = 0\]

where $\tilde \nabla = (\nabla_x, \part{\,}{t})^\top$, so that $\tilde \nabla \cdot$ is the divergence in spacetime, $f(t,x)$ is the density in spacetime, and $\tilde v(t,x) = (v(x), 1)$ is the vector field in spacetime; indeed $\tilde \nabla \cdot (f \tilde v) = \nabla \cdot (fv) + \part{f}{t}$. In particular, note that time moves with a constant unit speed (which might be interesting to change later in the Lagrangian approach for optimization).
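As a symbolic check with SymPy, take the hypothetical one-dimensional dilation field $v(x) = x$. Its flow is $S_t(x) = xe^t$, so the Frobenius-Perron formula gives $f(t,x) = g(xe^{-t})\,e^{-t}$; with the illustrative choice $g(x) = e^{-x^2}$, the spacetime divergence of $f\tilde v = (xf, f)$ should vanish identically.

```python
import sympy as sp

t, x = sp.symbols("t x", real=True)
g = lambda z: sp.exp(-z**2)          # hypothetical initial density (unnormalized)
f = g(x * sp.exp(-t)) * sp.exp(-t)   # f(t, x) = g(S_{-t}(x)) |det J^{-t}(x)| for S_t(x) = x e^t

v_tilde = (x, 1)   # spacetime field (v(x), 1) with v(x) = x
spacetime_div = sp.diff(f * v_tilde[0], x) + sp.diff(f * v_tilde[1], t)
print(sp.simplify(spacetime_div))   # 0
```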

And as a side note, weirdly there is also this recent Wired article about the continuity equation.