18 changes: 11 additions & 7 deletions unit1/02_diffusion_models_from_scratch.ipynb
@@ -1478,19 +1478,23 @@
"source": [
"### The Corruption Process\n",
"\n",
"The DDPM paper describes a corruption process that adds a small amount of noise for every 'timestep'. Given $x_{t-1}$ for some timestep, we can get the next (slightly more noisy) version $x_t$ with:<br><br>\n",
"The DDPM paper describes a corruption process that adds a small amount of noise at every 'timestep'. Given \\\\( x_{t-1} \\\\) for some timestep, we can get the next (slightly noisier) version \\\\( x_t \\\\) with:<br><br>\n",
"\n",
"$q(\\mathbf{x}_t \\vert \\mathbf{x}_{t-1}) = \\mathcal{N}(\\mathbf{x}_t; \\sqrt{1 - \\beta_t} \\mathbf{x}_{t-1}, \\beta_t\\mathbf{I}) \\quad\n",
"q(\\mathbf{x}_{1:T} \\vert \\mathbf{x}_0) = \\prod^T_{t=1} q(\\mathbf{x}_t \\vert \\mathbf{x}_{t-1})$<br><br>\n",
"$$q(\\mathbf{x}_t \\vert \\mathbf{x}_{t-1}) = \\mathcal{N}(\\mathbf{x}_t; \\sqrt{1 - \\beta_t} \\mathbf{x}_{t-1}, \\beta_t\\mathbf{I}) \\quad\n",
"q(\\mathbf{x}_{1:T} \\vert \\mathbf{x}_0) = \\prod^T_{t=1} q(\\mathbf{x}_t \\vert \\mathbf{x}_{t-1})$$\n",
"\n",
"<br><br>\n",
"\n",
"That is, we take $x_{t-1}$, scale it by $\\sqrt{1 - \\beta_t}$ and add noise scaled by $\\beta_t$. This $\\beta$ is defined for every t according to some schedule, and determines how much noise is added per timestep. Now, we don't necessarily want to do this operation 500 times to get $x_{500}$ so we have another formula to get $x_t$ for any t given $x_0$: <br><br>\n",
"\n",
"$\\begin{aligned}\n",
"That is, we take \\\\(x_{t-1}\\\\), scale it by \\\\(\\sqrt{1 - \\beta_t}\\\\) and add Gaussian noise with variance \\\\(\\beta_t\\\\) (standard normal noise scaled by \\\\(\\sqrt{\\beta_t}\\\\)). This \\\\(\\beta\\\\) is defined for every \\\\(t\\\\) according to some schedule, and determines how much noise is added per timestep. Now, we don't necessarily want to do this operation 500 times to get \\\\(x_{500}\\\\), so we have another formula to get \\\\(x_t\\\\) for any \\\\(t\\\\) given \\\\(x_0\\\\): <br><br>\n",
"\n",
"$$\\begin{aligned}\n",
"q(\\mathbf{x}_t \\vert \\mathbf{x}_0) &= \\mathcal{N}(\\mathbf{x}_t; \\sqrt{\\bar{\\alpha}_t} \\mathbf{x}_0, (1 - \\bar{\\alpha}_t) \\mathbf{I})\n",
"\\end{aligned}$ where $\\bar{\\alpha}_t = \\prod_{i=1}^T \\alpha_i$ and $\\alpha_i = 1-\\beta_i$<br><br>\n",
"\\end{aligned}$$\n",
"\n",
"where \\\\(\\bar{\\alpha}_t = \\prod_{i=1}^t \\alpha_i\\\\) and \\\\(\\alpha_i = 1-\\beta_i\\\\)<br><br>\n",
"\n",
"The maths notation always looks scary! Luckily the scheduler handles all that for us (uncomment the next cell to check out the code). We can plot $\\sqrt{\\bar{\\alpha}_t}$ (labelled as `sqrt_alpha_prod`) and $\\sqrt{(1 - \\bar{\\alpha}_t)}$ (labelled as `sqrt_one_minus_alpha_prod`) to view how the input (x) and the noise are scaled and mixed across different timesteps:\n"
"The maths notation always looks scary! Luckily the scheduler handles all that for us (uncomment the next cell to check out the code). We can plot \\\\(\\sqrt{\\bar{\\alpha}_t}\\\\) (labelled as `sqrt_alpha_prod`) and \\\\(\\sqrt{(1 - \\bar{\\alpha}_t)}\\\\) (labelled as `sqrt_one_minus_alpha_prod`) to view how the input (x) and the noise are scaled and mixed across different timesteps:\n"
]
},
{
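To make the two formulas in this cell concrete, here is a minimal sketch in plain Python (no PyTorch or diffusers). It assumes a linear β schedule from 0.0001 to 0.02 over 1000 steps, as in the DDPM paper; the helper names (`linear_betas`, `cumulative_alphas`, `noise_one_step`, `noise_to_step`, `mean_and_var`) are hypothetical and are not part of the notebook or the scheduler's API. It applies the single-step corruption 100 times and, separately, jumps straight to step 100 with the closed form. Both routes should produce samples whose mean is close to `sqrt(alpha_bar_t) * x_0` and whose variance is close to `1 - alpha_bar_t`.

```python
import math
import random


def linear_betas(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Assumed linear schedule: betas[t - 1] holds beta_t for t = 1..num_steps."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = prod_{i=1}^{t} (1 - beta_i), stored at index t - 1."""
    alpha_bars, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def noise_one_step(x_prev, beta_t, rng):
    """Sample x_t ~ N(sqrt(1 - beta_t) * x_{t-1}, beta_t * I), element by element."""
    return [math.sqrt(1.0 - beta_t) * x + math.sqrt(beta_t) * rng.gauss(0.0, 1.0)
            for x in x_prev]


def noise_to_step(x0, t, alpha_bars, rng):
    """Sample x_t ~ N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I) in one jump (t is 1-indexed)."""
    a = alpha_bars[t - 1]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]


def mean_and_var(values):
    """Empirical mean and (population) variance of a list of floats."""
    mean = sum(values) / len(values)
    return mean, sum((v - mean) ** 2 for v in values) / len(values)


betas = linear_betas()
alpha_bars = cumulative_alphas(betas)
# The two curves the notebook plots, named after its labels.
sqrt_alpha_prod = [math.sqrt(a) for a in alpha_bars]
sqrt_one_minus_alpha_prod = [math.sqrt(1.0 - a) for a in alpha_bars]

rng = random.Random(0)
t = 100
x0 = [1.0] * 4000  # many copies of a single "pixel" whose clean value is 1.0

# Route 1: apply the single-step corruption t times.
x_iter = x0
for s in range(1, t + 1):
    x_iter = noise_one_step(x_iter, betas[s - 1], rng)

# Route 2: jump straight to step t with the closed form.
x_direct = noise_to_step(x0, t, alpha_bars, rng)

print("expected mean/var:", sqrt_alpha_prod[t - 1], 1.0 - alpha_bars[t - 1])
print("iterated mean/var:", mean_and_var(x_iter))
print("direct   mean/var:", mean_and_var(x_direct))
```

Because `x0` is constant, all of the variance comes from the added noise. Each single step maps a variance `v` to `(1 - beta_t) * v + beta_t`, which after `t` steps works out to exactly `1 - alpha_bar_t`, so the two routes agree in distribution. The squares of `sqrt_alpha_prod` and `sqrt_one_minus_alpha_prod` sum to 1 at every timestep, which is why one curve falls as the other rises.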