Diffusion process
Consider a sequence $x_0, x_1, \dots, x_T$, where $x_0$ is the original feature and $x_T \sim \mathcal{N}(\boldsymbol{0}, \boldsymbol{I})$ is Gaussian.
Let $\boldsymbol{x}_t = \alpha_t \boldsymbol{x}_{t-1} + \beta_t \boldsymbol{\varepsilon}_t,\quad \boldsymbol{\varepsilon}_t\sim\mathcal{N}(\boldsymbol{0}, \boldsymbol{I}),\ \alpha_t,\beta_t>0,\ \alpha_t^2+\beta_t^2=1$. Then
$$ \begin{aligned} \boldsymbol{x}_t =&\, \alpha_t \boldsymbol{x}_{t-1} + \beta_t \boldsymbol{\varepsilon}_t \\ =&\, \alpha_t \big(\alpha_{t-1} \boldsymbol{x}_{t-2} + \beta_{t-1} \boldsymbol{\varepsilon}_{t-1}\big) + \beta_t \boldsymbol{\varepsilon}_t \\ =&\,\cdots\\ =&\,(\alpha_t\cdots\alpha_1) \boldsymbol{x}_0 + \underbrace{(\alpha_t\cdots\alpha_2)\beta_1 \boldsymbol{\varepsilon}_1 + (\alpha_t\cdots\alpha_3)\beta_2 \boldsymbol{\varepsilon}_2 + \cdots + \alpha_t\beta_{t-1} \boldsymbol{\varepsilon}_{t-1} + \beta_t \boldsymbol{\varepsilon}_t}_{\text{sum of mutually independent normal noise terms}} \end{aligned} $$
The noise terms are independent zero-mean Gaussians, so their sum is again Gaussian, with variance equal to the sum of the squared coefficients. Using $\alpha_k^2+\beta_k^2=1$, that sum telescopes to $1-(\alpha_t\cdots\alpha_1)^2$. Hence
$$ \boldsymbol{x}_t = \underbrace{(\alpha_t\cdots\alpha_1)}_{\text{denoted }\bar{\alpha}_t} \boldsymbol{x}_0 + \underbrace{\sqrt{1 - (\alpha_t\cdots\alpha_1)^2}}_{\text{denoted }\bar{\beta}_t} \bar{\boldsymbol{\varepsilon}}_t,\quad \bar{\boldsymbol{\varepsilon}}_t\sim\mathcal{N}(\boldsymbol{0}, \boldsymbol{I}) $$
$$ \boldsymbol{x}_t=\bar{\alpha}_t \boldsymbol{x}_0+\sqrt{1-\bar{\alpha}_t ^2} \bar{\boldsymbol{\varepsilon}}_t, \bar{\boldsymbol{\varepsilon}}_t\sim\mathcal{N}(\boldsymbol{0}, \boldsymbol{I}) $$
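The step from the unrolled noise sum to the closed form $\sqrt{1-\bar{\alpha}_t^2}$ can be checked numerically. The sketch below is only an illustration: the function names and the four $\alpha$ values are arbitrary assumptions, not part of the notes. It compares the sum of squared noise coefficients with $1-\bar{\alpha}_t^2$.

```python
import math

def noise_variance_by_sum(alphas):
    """Sum the variances of the individual noise terms in the unrolled recursion."""
    total = 0.0
    for k in range(len(alphas)):              # k is the 0-based index of step k+1
        beta_sq = 1.0 - alphas[k] ** 2        # beta_{k+1}^2, from alpha^2 + beta^2 = 1
        tail = math.prod(alphas[k + 1:])      # product of all later alphas (1.0 if none)
        total += (tail ** 2) * beta_sq
    return total

def noise_variance_closed_form(alphas):
    """Variance predicted by the closed form: 1 - bar_alpha_t^2."""
    bar_alpha = math.prod(alphas)
    return 1.0 - bar_alpha ** 2

# Arbitrary illustrative values in (0, 1); any such values should agree.
alphas = [0.99, 0.95, 0.90, 0.80]
print(noise_variance_by_sum(alphas), noise_variance_closed_form(alphas))
```

The two printed numbers should agree up to floating-point rounding, which is the telescoping identity behind $\bar{\beta}_t$.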
Take $\alpha_t = \sqrt{1 - \frac{0.02t}{T}}$. Since $\log(1-x) < -x$ for $0<x<1$, we have $\log \bar{\alpha}_T = \sum_{t=1}^T \log\alpha_t = \frac{1}{2} \sum_{t=1}^T \log\left(1 - \frac{0.02t}{T}\right) < \frac{1}{2} \sum_{t=1}^T \left(- \frac{0.02t}{T}\right) = -0.005(T+1)$. For $T=1000$ this gives $\bar{\alpha}_T < e^{-5}\approx 0$, so we can treat $\boldsymbol{x}_T\sim \mathcal{N}(\boldsymbol{0}, \boldsymbol{I})$ as a good approximation.
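As a quick sanity check of this bound, the following sketch evaluates $\bar{\alpha}_T$ directly for the schedule above and compares it with $e^{-0.005(T+1)}$. The function name `bar_alpha` is an illustrative choice.

```python
import math

def bar_alpha(T):
    """bar_alpha_T for the schedule alpha_t = sqrt(1 - 0.02 t / T), computed in log space."""
    log_bar = 0.5 * sum(math.log(1.0 - 0.02 * t / T) for t in range(1, T + 1))
    return math.exp(log_bar)

T = 1000
value = bar_alpha(T)                    # exact product of the alphas
bound = math.exp(-0.005 * (T + 1))      # upper bound from log(1 - x) < -x
print(value, bound)
```

The computed value should lie slightly below the bound, and both are small enough that the $\boldsymbol{x}_0$ term in $\boldsymbol{x}_T$ is nearly negligible.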
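The model we want is $\boldsymbol{\mu}:\boldsymbol{x}_t\to \boldsymbol{x}_{t-1}$. We can take $\boldsymbol{\mu}(\boldsymbol{x}_t) = \frac{1}{\alpha_t}\left(\boldsymbol{x}_t - \beta_t \boldsymbol{\epsilon}_{\boldsymbol{\theta}}(\boldsymbol{x}_t, t)\right)$, which is the forward step solved for $\boldsymbol{x}_{t-1}$ with the unknown noise replaced by a learned estimate. Since $\boldsymbol{x}_t$ is known, the model we actually need to train is $\boldsymbol{\epsilon}_{\boldsymbol{\theta}}(\boldsymbol{x}_t, t)$, that is, a predictor of the noise $\boldsymbol{\varepsilon}_{t}$.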
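To see why this parameterization is reasonable, here is a minimal scalar sketch. The names `denoise_mean` and `oracle` are hypothetical, and `oracle` is an idealized predictor that returns the true noise, which a trained network would only approximate. With a perfect noise estimate, $\boldsymbol{\mu}$ recovers $\boldsymbol{x}_{t-1}$ exactly.

```python
import math
import random

def denoise_mean(x_t, t, alpha_t, eps_model):
    """mu(x_t) = (x_t - beta_t * eps_model(x_t, t)) / alpha_t, with beta_t = sqrt(1 - alpha_t^2)."""
    beta_t = math.sqrt(1.0 - alpha_t ** 2)
    return (x_t - beta_t * eps_model(x_t, t)) / alpha_t

random.seed(0)
alpha_t = 0.9                                  # arbitrary illustrative value
beta_t = math.sqrt(1.0 - alpha_t ** 2)
x_prev = 1.5                                   # plays the role of x_{t-1}
eps = random.gauss(0.0, 1.0)                   # the noise actually added
x_t = alpha_t * x_prev + beta_t * eps          # one forward step

def oracle(x, t):
    # Hypothetical perfect predictor: returns the true noise.
    return eps

recovered = denoise_mean(x_t, 5, alpha_t, oracle)
print(recovered, x_prev)
```

In practice the predictor is imperfect, so $\boldsymbol{\mu}(\boldsymbol{x}_t)$ is only an estimate of $\boldsymbol{x}_{t-1}$.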
Training process
while not converged:
$x_0\sim q(x_0)$: sample one example from the dataset
$t\sim \text{Uniform}(\{1,\dots,T\})$: sample a time step from $[1, T]$
$\boldsymbol{\varepsilon}_t\sim \mathcal{N}(\boldsymbol{0}, \boldsymbol{I})$: sample $\boldsymbol{\varepsilon}_t$ from the standard normal distribution to use as the noise
$\boldsymbol{x}_t=\bar{\alpha}_t \boldsymbol{x}_0+\sqrt{1-\bar{\alpha}_t ^2} \boldsymbol{\varepsilon}_t$: build the noisy sample
$\text{logits}=\text{Model}(x_t, t)$: predict the noise from the noisy sample
$\text{loss}=L_2( \boldsymbol{\varepsilon}_t , \text{logits})$: fit the predicted noise to the true noise
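The loop body above can be sketched in plain Python as follows. This is a simplified illustration under several assumptions: vectors are lists of floats, `model` is any callable `(x_t, t) -> predicted noise`, the $L_2$ loss is taken as the sum of squared differences, and `zero_model` is a placeholder rather than a real network. It computes the loss for one iteration and performs no gradient update.

```python
import math
import random

T = 1000
# Per-step alphas from the schedule alpha_t = sqrt(1 - 0.02 t / T), t = 1..T.
ALPHAS = [math.sqrt(1.0 - 0.02 * t / T) for t in range(1, T + 1)]
# BAR_ALPHAS[t] = alpha_1 * ... * alpha_t; index 0 holds the empty product 1.0.
BAR_ALPHAS = [1.0]
for a in ALPHAS:
    BAR_ALPHAS.append(BAR_ALPHAS[-1] * a)

def make_noisy(x0, t, eps):
    """x_t = bar_alpha_t * x0 + sqrt(1 - bar_alpha_t^2) * eps, elementwise."""
    a_bar = BAR_ALPHAS[t]
    b_bar = math.sqrt(1.0 - a_bar ** 2)
    return [a_bar * x + b_bar * e for x, e in zip(x0, eps)]

def training_loss(x0, model, rng):
    """One iteration of the loop: sample t and noise, build x_t, score the prediction."""
    t = rng.randint(1, T)                          # uniform over {1, ..., T}, both ends included
    eps = [rng.gauss(0.0, 1.0) for _ in x0]        # standard normal noise
    x_t = make_noisy(x0, t, eps)
    pred = model(x_t, t)                           # predicted noise
    return sum((e - p) ** 2 for e, p in zip(eps, pred))   # squared L2 distance

def zero_model(x_t, t):
    # Placeholder predictor (assumption): always predicts zero noise.
    return [0.0 for _ in x_t]

rng = random.Random(0)
x0 = [0.5, -1.2, 2.0]                              # stand-in for a dataset sample
loss = training_loss(x0, zero_model, rng)
print(loss)
```

A real setup would replace `zero_model` with a trainable network and minimize this loss over many samples of $x_0$, $t$, and $\boldsymbol{\varepsilon}_t$.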
Inference process