diff --git a/unit1/02_diffusion_models_from_scratch.ipynb b/unit1/02_diffusion_models_from_scratch.ipynb index 70b3c94..ebfd622 100644 --- a/unit1/02_diffusion_models_from_scratch.ipynb +++ b/unit1/02_diffusion_models_from_scratch.ipynb @@ -1478,19 +1478,23 @@ "source": [ "### The Corruption Process\n", "\n", - "The DDPM paper describes a corruption process that adds a small amount of noise for every 'timestep'. Given $x_{t-1}$ for some timestep, we can get the next (slightly more noisy) version $x_t$ with:

\n", + "The DDPM paper describes a corruption process that adds a small amount of noise for every 'timestep'. Given \\\\( x_{t-1} \\\\) for some timestep, we can get the next (slightly more noisy) version \\\\( x_t \\\\) with:

\n", "\n", - "$q(\\mathbf{x}_t \\vert \\mathbf{x}_{t-1}) = \\mathcal{N}(\\mathbf{x}_t; \\sqrt{1 - \\beta_t} \\mathbf{x}_{t-1}, \\beta_t\\mathbf{I}) \\quad\n", - "q(\\mathbf{x}_{1:T} \\vert \\mathbf{x}_0) = \\prod^T_{t=1} q(\\mathbf{x}_t \\vert \\mathbf{x}_{t-1})$

\n", + "$$q(\\mathbf{x}_t \\vert \\mathbf{x}_{t-1}) = \\mathcal{N}(\\mathbf{x}_t; \\sqrt{1 - \\beta_t} \\mathbf{x}_{t-1}, \\beta_t\\mathbf{I}) \\quad\n", + "q(\\mathbf{x}_{1:T} \\vert \\mathbf{x}_0) = \\prod^T_{t=1} q(\\mathbf{x}_t \\vert \\mathbf{x}_{t-1})$$\n", "\n", + "

\n", "\n", - "That is, we take $x_{t-1}$, scale it by $\\sqrt{1 - \\beta_t}$ and add noise scaled by $\\beta_t$. This $\\beta$ is defined for every t according to some schedule, and determines how much noise is added per timestep. Now, we don't necessarily want to do this operation 500 times to get $x_{500}$ so we have another formula to get $x_t$ for any t given $x_0$:

\n", "\n", - "$\\begin{aligned}\n", + "That is, we take \\\\(x_{t-1}\\\\), scale it by \\\\(\\sqrt{1 - \\beta_t}\\\\) and add noise scaled by \\\\(\\sqrt{\\beta_t}\\\\) (so the added noise has variance \\\\(\\beta_t\\\\)). This \\\\(\\beta\\\\) is defined for every \\\\(t\\\\) according to some schedule, and determines how much noise is added per timestep. Now, we don't necessarily want to do this operation 500 times to get \\\\(x_{500}\\\\), so we have another formula to get \\\\(x_t\\\\) for any \\\\(t\\\\) given \\\\(x_0\\\\):

\n", + "\n", + "$$\\begin{aligned}\n", "q(\\mathbf{x}_t \\vert \\mathbf{x}_0) &= \\mathcal{N}(\\mathbf{x}_t; \\sqrt{\\bar{\\alpha}_t} \\mathbf{x}_0, \\sqrt{(1 - \\bar{\\alpha}_t)} \\mathbf{I})\n", - "\\end{aligned}$ where $\\bar{\\alpha}_t = \\prod_{i=1}^T \\alpha_i$ and $\\alpha_i = 1-\\beta_i$

\n", + "\\end{aligned}$$\n", + "\n", + "where \\\\(\\bar{\\alpha}_t = \\prod_{i=1}^t \\alpha_i\\\\) and \\\\(\\alpha_i = 1 - \\beta_i\\\\)

\n", "\n", - "The maths notation always looks scary! Luckily the scheduler handles all that for us (uncomment the next cell to check out the code). We can plot $\\sqrt{\\bar{\\alpha}_t}$ (labelled as `sqrt_alpha_prod`) and $\\sqrt{(1 - \\bar{\\alpha}_t)}$ (labelled as `sqrt_one_minus_alpha_prod`) to view how the input (x) and the noise are scaled and mixed across different timesteps:\n" + "The maths notation always looks scary! Luckily the scheduler handles all that for us (uncomment the next cell to check out the code). We can plot \\\\(\\sqrt{\\bar{\\alpha}_t}\\\\) (labelled as `sqrt_alpha_prod`) and \\\\(\\sqrt{(1 - \\bar{\\alpha}_t)}\\\\) (labelled as `sqrt_one_minus_alpha_prod`) to view how the input (x) and the noise are scaled and mixed across different timesteps:\n" ] }, {