Flow and Diffusion Models
Sampling is achieved by transforming samples from a simple distribution \(p_\text{init}\) into samples from the target distribution \(p_\text{data}\). This transformation can be obtained by simulating a suitably constructed differential equation: flow models (such as those trained with flow matching) simulate an ordinary differential equation (ODE), while diffusion models simulate a stochastic differential equation (SDE).
Flow models
A flow model is described by the initial value problem
\[X_0\sim p_\text{init} \\ \frac{d}{dt} X_t = u^\theta_t(X_t)\] where the vector field \(u^\theta: \R^d\times [0,1]\to \R^d\), \((x,t)\mapsto u^\theta_t(x)\), is a neural network with parameters \(\theta\). We want the endpoint \(X_1\) of the trajectory to have the distribution \(p_\text{data}\), namely
\[X_1 \sim p_\text{data} \Leftrightarrow \psi^\theta_1(X_0) \sim p_\text{data}\] where \(\psi^\theta_t\) is the flow induced by \(u^\theta_t\), so that \(X_t = \psi^\theta_t(X_0)\). Note that the neural network parameterizes the vector field, not the flow. To compute the flow, we must simulate the ODE numerically, for example with the Euler method.
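The Euler method replaces the ODE by the update \(X_{t+h} = X_t + h\,u^\theta_t(X_t)\) on a time grid with step size \(h = 1/n\). Below is a minimal NumPy sketch, assuming a hypothetical hand-written field `u_example` in place of a trained network. It is chosen so that its flow \(\psi_t(x_0) = x_0 + t(\mu + (s-1)x_0)\) carries \(\mathcal{N}(0,1)\) to \(\mathcal{N}(\mu, s^2)\), which makes the output easy to check. The names `u_example`, `euler_flow`, `mu`, and `s` are illustrative.

```python
import numpy as np

def u_example(x, t, mu=2.0, s=0.5):
    # Hypothetical stand-in for the network u^theta_t (not a trained model).
    # Its flow is psi_t(x0) = x0 + t * (mu + (s - 1) * x0), which maps
    # N(0, 1) at t = 0 to N(mu, s^2) at t = 1 (requires s > 0).
    return mu + (s - 1.0) * (x - t * mu) / (1.0 + t * (s - 1.0))

def euler_flow(u, x0, n_steps=100):
    """Approximate X_1 = psi_1(X_0) with the explicit Euler method on [0, 1]."""
    h = 1.0 / n_steps
    x = np.array(x0, dtype=float)
    for k in range(n_steps):
        t = k * h
        x = x + h * u(x, t)  # X_{t+h} = X_t + h * u_t(X_t)
    return x

rng = np.random.default_rng(0)
x0 = rng.standard_normal(100_000)  # X_0 ~ p_init = N(0, 1)
x1 = euler_flow(u_example, x0)
print(x1.mean(), x1.std())         # close to mu = 2.0 and s = 0.5
```

Because the trajectories of this particular field are straight lines, the Euler method is exact here up to rounding. For a learned \(u^\theta_t\), the number of steps controls the discretization error.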
Diffusion models
Recall that a stochastic process is simply a collection \((X_t)_{t\in T}\) of random variables. The idea of an SDE is to extend the deterministic dynamics of an ODE with stochastic dynamics driven by a Wiener process (Brownian motion) \(W=(W_t)_{t\in T}\), \(T=[0,1]\). This is a stochastic process with \(W_0 = 0\) and continuous trajectories \(t\mapsto W_t\) that satisfies the following two conditions:
- Normal increments: \(W_t - W_s \sim \mathcal{N}(0,(t-s)I_d)\) for all \(0\leq s < t\leq 1\); the increments are Gaussian with variance growing linearly in the elapsed time \(t-s\).
- Independent increments: for any \(0\leq t_0 < t_1 < \cdots < t_n\leq 1\), the increments \(W_{t_1} - W_{t_0}, \ldots, W_{t_n} - W_{t_{n-1}}\) are independent random variables.
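The two defining properties translate directly into a sampler on a time grid: draw independent \(\mathcal{N}(0, hI_d)\) increments and take cumulative sums starting from \(W_0 = 0\). A minimal NumPy sketch follows; the function name `sample_wiener` and the grid size are illustrative choices.

```python
import numpy as np

def sample_wiener(n_paths, n_steps, d=1, rng=None):
    """Sample Wiener paths at the grid times t_k = k / n_steps, k = 0..n_steps."""
    rng = np.random.default_rng() if rng is None else rng
    h = 1.0 / n_steps
    # Independent Gaussian increments W_{t+h} - W_t ~ N(0, h I_d).
    dW = np.sqrt(h) * rng.standard_normal((n_paths, n_steps, d))
    W = np.cumsum(dW, axis=1)
    # Prepend W_0 = 0 so that W[:, k] is a sample of W at time k * h.
    zeros = np.zeros((n_paths, 1, d))
    return np.concatenate([zeros, W], axis=1)

rng = np.random.default_rng(0)
W = sample_wiener(n_paths=20_000, n_steps=100, rng=rng)
print(W[:, 100, 0].var())  # Var(W_1) should be close to 1
print(W[:, 50, 0].var())   # Var(W_0.5) should be close to 0.5
```

At the grid points this reproduces the joint law of \(W\) exactly; only the path between grid points is not represented.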
We now extend an ODE with a Wiener process \(W\). Starting from the ODE
\[\frac{d}{dt} X_t = u_t (X_t),\] we replace the derivative by a difference quotient with step size \(h>0\) and get
\[\frac{1}{h}(X_{t+h}-X_t) = u_t(X_t)+R_t(h),\] or equivalently \(X_{t+h} = X_t + hu_t (X_t) + hR_t(h)\), where the remainder \(R_t(h)\) collects the higher-order terms and vanishes as \(h\to 0\). A trajectory \((X_t)_{0\leq t\leq 1}\) of an SDE likewise takes a small step in the direction \(u_t(X_t)\) at every step, but is additionally perturbed by the increment of a Wiener process:
\[X_{t+h} = X_t + hu_t (X_t) + \sigma_t(W_{t+h} -W_t)+ hR_t(h),\] where \(\sigma_t\geq 0\) is the diffusion coefficient. The remainder \(R_t\) is now a stochastic error term whose root-mean-square magnitude \(\mathbb{E}[\|R_t(h)\|^2]^{1/2}\) goes to zero as \(h\to 0\). Together with an initial condition, this is usually written in the following symbolic notation:
\[\mathrm{d} X_t = u_t(X_t)\mathrm{d}t + \sigma_t \mathrm{d}W_t, \\ X_0 = x_0.\] We have the following theorem.
Theorem (SDE solution existence and uniqueness). If \(u:\R^d \times [0,1] \to \R^d\) is continuously differentiable with a bounded derivative and \(t\mapsto\sigma_t\) is continuous, then the SDE above has a unique solution, given by a stochastic process \((X_t)_{t\in T}\).
For more on SDEs, see Xuerong Mao, Stochastic Differential Equations and Applications.
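Dropping the remainder \(hR_t(h)\) from the discretized update \(X_{t+h} = X_t + hu_t(X_t) + \sigma_t(W_{t+h}-W_t)\) gives the Euler–Maruyama scheme. Below is a minimal NumPy sketch, assuming as a test case the scalar Ornstein–Uhlenbeck process \(\mathrm{d}X_t = -\lambda X_t\,\mathrm{d}t + \sigma\,\mathrm{d}W_t\). Its exact mean \(x_0 e^{-\lambda t}\) and variance \(\sigma^2(1-e^{-2\lambda t})/(2\lambda)\) let us check the simulation. The names `euler_maruyama`, `u_ou`, `sigma_ou` and the values of \(\lambda\), \(\sigma\), \(x_0\) are illustrative.

```python
import numpy as np

def euler_maruyama(u, sigma, x0, n_steps=500, rng=None):
    """Simulate dX_t = u_t(X_t) dt + sigma_t dW_t on [0, 1]; return samples of X_1."""
    rng = np.random.default_rng() if rng is None else rng
    h = 1.0 / n_steps
    x = np.array(x0, dtype=float)
    for k in range(n_steps):
        t = k * h
        dW = np.sqrt(h) * rng.standard_normal(x.shape)  # W_{t+h} - W_t ~ N(0, h)
        x = x + h * u(x, t) + sigma(t) * dW
    return x

lam, sig, x_start = 2.0, 0.8, 1.5

def u_ou(x, t):
    # Drift of the Ornstein-Uhlenbeck test case.
    return -lam * x

def sigma_ou(t):
    # Constant diffusion coefficient.
    return sig

rng = np.random.default_rng(0)
x1 = euler_maruyama(u_ou, sigma_ou, np.full(50_000, x_start), rng=rng)

mean_exact = x_start * np.exp(-lam)
var_exact = sig**2 * (1 - np.exp(-2 * lam)) / (2 * lam)
print(x1.mean(), mean_exact)  # agree up to Monte Carlo and step-size error
print(x1.var(), var_exact)
```

The noise increment scales like \(\sqrt{h}\) rather than \(h\), which is why it cannot be absorbed into the remainder term.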
A diffusion model is simply given by
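\[\mathrm{d} X_t = u^\theta_t(X_t)\mathrm{d}t + \sigma_t \mathrm{d}W_t, \\ X_0 \sim p_\text{init},\] where, as for flow models, the vector field \(u^\theta_t\) is a neural network with parameters \(\theta\). A diffusion model with \(\sigma_t = 0\) is a flow model.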
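To make the last remark concrete, here is a minimal NumPy sketch of sampling from a diffusion model with the Euler–Maruyama scheme. It assumes a hypothetical drift `u_theta` in place of a trained network (its formula is arbitrary) and passes the diffusion coefficient as a function of \(t\). With \(\sigma_t = 0\) the noise term vanishes and every step coincides with the Euler step of the corresponding flow model. The helper names `simulate` and `euler` are illustrative.

```python
import numpy as np

def u_theta(x, t):
    # Hypothetical stand-in for the learned vector field u^theta_t.
    return np.sin(t) - 0.5 * x

def simulate(u, sigma, x0, n_steps=200, rng=None):
    """Euler-Maruyama for dX_t = u_t(X_t) dt + sigma_t dW_t on [0, 1]."""
    rng = np.random.default_rng() if rng is None else rng
    h = 1.0 / n_steps
    x = np.array(x0, dtype=float)
    for k in range(n_steps):
        t = k * h
        noise = np.sqrt(h) * rng.standard_normal(x.shape)  # W_{t+h} - W_t
        x = x + h * u(x, t) + sigma(t) * noise
    return x

def euler(u, x0, n_steps=200):
    """Plain Euler method for the ODE dX_t/dt = u_t(X_t) on [0, 1]."""
    h = 1.0 / n_steps
    x = np.array(x0, dtype=float)
    for k in range(n_steps):
        x = x + h * u(x, k * h)
    return x

rng = np.random.default_rng(0)
x0 = rng.standard_normal(1_000)                       # X_0 ~ p_init = N(0, 1)
x_sde = simulate(u_theta, lambda t: 0.0, x0, rng=rng)  # sigma_t = 0
x_ode = euler(u_theta, x0)
print(np.allclose(x_sde, x_ode))                      # True
```

With any \(\sigma_t > 0\), the same drift produces different, random endpoints.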