diff --git a/unit1/02_diffusion_models_from_scratch.ipynb b/unit1/02_diffusion_models_from_scratch.ipynb
index 70b3c94..ebfd622 100644
--- a/unit1/02_diffusion_models_from_scratch.ipynb
+++ b/unit1/02_diffusion_models_from_scratch.ipynb
@@ -1478,19 +1478,23 @@
"source": [
"### The Corruption Process\n",
"\n",
- "The DDPM paper describes a corruption process that adds a small amount of noise for every 'timestep'. Given $x_{t-1}$ for some timestep, we can get the next (slightly more noisy) version $x_t$ with:\n",
+ "The DDPM paper describes a corruption process that adds a small amount of noise for every 'timestep'. Given \\\\( x_{t-1} \\\\) for some timestep, we can get the next (slightly more noisy) version \\\\( x_t \\\\) with:\n",
"\n",
- "$q(\\mathbf{x}_t \\vert \\mathbf{x}_{t-1}) = \\mathcal{N}(\\mathbf{x}_t; \\sqrt{1 - \\beta_t} \\mathbf{x}_{t-1}, \\beta_t\\mathbf{I}) \\quad\n",
- "q(\\mathbf{x}_{1:T} \\vert \\mathbf{x}_0) = \\prod^T_{t=1} q(\\mathbf{x}_t \\vert \\mathbf{x}_{t-1})$\n",
+ "$$q(\\mathbf{x}_t \\vert \\mathbf{x}_{t-1}) = \\mathcal{N}(\\mathbf{x}_t; \\sqrt{1 - \\beta_t} \\mathbf{x}_{t-1}, \\beta_t\\mathbf{I}) \\quad\n",
+ "q(\\mathbf{x}_{1:T} \\vert \\mathbf{x}_0) = \\prod^T_{t=1} q(\\mathbf{x}_t \\vert \\mathbf{x}_{t-1})$$\n",
"\n",
+ "\n",
"\n",
- "That is, we take $x_{t-1}$, scale it by $\\sqrt{1 - \\beta_t}$ and add noise scaled by $\\beta_t$. This $\\beta$ is defined for every t according to some schedule, and determines how much noise is added per timestep. Now, we don't necessarily want to do this operation 500 times to get $x_{500}$ so we have another formula to get $x_t$ for any t given $x_0$:\n",
"\n",
- "$\\begin{aligned}\n",
+ "That is, we take \\\\(x_{t-1}\\\\), scale it by \\\\(\\sqrt{1 - \\beta_t}\\\\) and add noise scaled by \\\\(\\sqrt{\\beta_t}\\\\) (so the added noise has variance \\\\(\\beta_t\\\\)). This \\\\(\\beta_t\\\\) is defined for every \\\\(t\\\\) according to some schedule, and determines how much noise is added per timestep. Now, we don't necessarily want to do this operation 500 times to get \\\\(x_{500}\\\\), so we have another formula to get \\\\(x_t\\\\) for any \\\\(t\\\\) given \\\\(x_0\\\\):\n",
+ "\n",
+ "$$\\begin{aligned}\n",
"q(\\mathbf{x}_t \\vert \\mathbf{x}_0) &= \\mathcal{N}(\\mathbf{x}_t; \\sqrt{\\bar{\\alpha}_t} \\mathbf{x}_0, \\sqrt{(1 - \\bar{\\alpha}_t)} \\mathbf{I})\n",
- "\\end{aligned}$ where $\\bar{\\alpha}_t = \\prod_{i=1}^T \\alpha_i$ and $\\alpha_i = 1-\\beta_i$\n",
+ "\\end{aligned}$$\n",
+ "\n",
+ "where \\\\(\\bar{\\alpha}_t = \\prod_{i=1}^t \\alpha_i\\\\) and \\\\(\\alpha_i = 1-\\beta_i\\\\)\n",
"\n",
- "The maths notation always looks scary! Luckily the scheduler handles all that for us (uncomment the next cell to check out the code). We can plot $\\sqrt{\\bar{\\alpha}_t}$ (labelled as `sqrt_alpha_prod`) and $\\sqrt{(1 - \\bar{\\alpha}_t)}$ (labelled as `sqrt_one_minus_alpha_prod`) to view how the input (x) and the noise are scaled and mixed across different timesteps:\n"
+ "The maths notation always looks scary! Luckily the scheduler handles all that for us (uncomment the next cell to check out the code). We can plot \\\\(\\sqrt{\\bar{\\alpha}_t}\\\\) (labelled as `sqrt_alpha_prod`) and \\\\(\\sqrt{(1 - \\bar{\\alpha}_t)}\\\\) (labelled as `sqrt_one_minus_alpha_prod`) to view how the input (x) and the noise are scaled and mixed across different timesteps:\n"
]
},
{