huggingface · burtenshaw · Sep 12, 2025 · Sep 12, 2025
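
Reviewer note: the markdown cell touched by the hunk below describes the DDPM corruption process. A minimal sketch for sanity-checking those formulas follows, in plain Python rather than the notebook's PyTorch so it has no dependencies. It assumes the usual DDPM convention that the second argument of \(\mathcal{N}\) is a covariance, so one step adds noise with standard deviation \(\sqrt{\beta_t}\), and that \(\bar{\alpha}_t\) is the product of \(\alpha_i = 1 - \beta_i\) for \(i = 1, \dots, t\). The function names and the linear schedule endpoints are illustrative assumptions, not taken from this PR or from the scheduler's code.

```python
import math
import random


def linear_betas(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Assumed linear beta schedule; the endpoints are illustrative values."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def q_step(x_prev, beta_t, rng):
    """One step of q(x_t | x_{t-1}): scale by sqrt(1 - beta_t), add noise of variance beta_t."""
    return [
        math.sqrt(1.0 - beta_t) * x + math.sqrt(beta_t) * rng.gauss(0.0, 1.0)
        for x in x_prev
    ]


def q_sample(x0, t, alpha_bar, rng):
    """Jump from x_0 straight to x_t (t is 1-indexed) using alpha_bar_t."""
    a = alpha_bar[t - 1]
    return [
        math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0)
        for x in x0
    ]


betas = linear_betas()

# alpha_bar[t - 1] holds the running product of (1 - beta_i) for i = 1..t.
alpha_bar = []
running = 1.0
for beta in betas:
    running *= 1.0 - beta
    alpha_bar.append(running)

# Track analytically what 500 single steps do to the signal scale and the
# accumulated noise variance, without drawing any random numbers.
signal_scale, noise_var = 1.0, 0.0
for beta_t in betas[:500]:
    signal_scale *= math.sqrt(1.0 - beta_t)
    noise_var = (1.0 - beta_t) * noise_var + beta_t

# Each pair should agree up to floating-point error.
print(signal_scale, math.sqrt(alpha_bar[499]))
print(noise_var, 1.0 - alpha_bar[499])
```

The two printed pairs agree up to rounding, which is why a single call such as `q_sample(x0, 500, alpha_bar, random.Random(0))` can stand in for 500 calls to `q_step`.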
diff --git a/unit1/02_diffusion_models_from_scratch.ipynb b/unit1/02_diffusion_models_from_scratch.ipynb
@@ -1478,19 +1478,23 @@
    "source": [
     "### The Corruption Process\n",
     "\n",
-    "The DDPM paper describes a corruption process that adds a small amount of noise for every 'timestep'. Given $x_{t-1}$ for some timestep, we can get the next (slightly more noisy) version $x_t$ with:<br><br>\n",
+    "The DDPM paper describes a corruption process that adds a small amount of noise for every 'timestep'. Given \\\\( x_{t-1} \\\\) for some timestep, we can get the next (slightly more noisy) version \\\\( x_t \\\\) with:<br><br>\n",
     "\n",
-    "$q(\\mathbf{x}_t \\vert \\mathbf{x}_{t-1}) = \\mathcal{N}(\\mathbf{x}_t; \\sqrt{1 - \\beta_t} \\mathbf{x}_{t-1}, \\beta_t\\mathbf{I}) \\quad\n",
-    "q(\\mathbf{x}_{1:T} \\vert \\mathbf{x}_0) = \\prod^T_{t=1} q(\\mathbf{x}_t \\vert \\mathbf{x}_{t-1})$<br><br>\n",
+    "$$q(\\mathbf{x}_t \\vert \\mathbf{x}_{t-1}) = \\mathcal{N}(\\mathbf{x}_t; \\sqrt{1 - \\beta_t} \\mathbf{x}_{t-1}, \\beta_t\\mathbf{I}) \\quad\n",
+    "q(\\mathbf{x}_{1:T} \\vert \\mathbf{x}_0) = \\prod^T_{t=1} q(\\mathbf{x}_t \\vert \\mathbf{x}_{t-1})$$\n",
     "\n",
+    "<br><br>\n",
     "\n",
-    "That is, we take $x_{t-1}$, scale it by $\\sqrt{1 - \\beta_t}$ and add noise scaled by $\\beta_t$. This $\\beta$ is defined for every t according to some schedule, and determines how much noise is added per timestep. Now, we don't necessarily want to do this operation 500 times to get $x_{500}$ so we have another formula to get $x_t$ for any t given $x_0$: <br><br>\n",
     "\n",
-    "$\\begin{aligned}\n",
+    "That is, we take \\\\(x_{t-1}\\\\), scale it by \\\\(\\sqrt{1 - \\beta_t}\\\\) and add noise scaled by \\\\(\\sqrt{\\beta_t}\\\\), so that the added noise has variance \\\\(\\beta_t\\\\). This \\\\(\\beta\\\\) is defined for every \\\\(t\\\\) according to some schedule, and determines how much noise is added per timestep. Now, we don't necessarily want to do this operation 500 times to get \\\\(x_{500}\\\\), so we have another formula to get \\\\(x_t\\\\) for any \\\\(t\\\\) given \\\\(x_0\\\\): <br><br>\n",
+    "\n",
+    "$$\\begin{aligned}\n",
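-    "q(\\mathbf{x}_t \\vert \\mathbf{x}_0) &= \\mathcal{N}(\\mathbf{x}_t; \\sqrt{\\bar{\\alpha}_t} \\mathbf{x}_0, \\sqrt{(1 - \\bar{\\alpha}_t)} \\mathbf{I})\n",
+    "q(\\mathbf{x}_t \\vert \\mathbf{x}_0) &= \\mathcal{N}(\\mathbf{x}_t; \\sqrt{\\bar{\\alpha}_t} \\mathbf{x}_0, (1 - \\bar{\\alpha}_t) \\mathbf{I})\n",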
-    "\\end{aligned}$ where $\\bar{\\alpha}_t = \\prod_{i=1}^T \\alpha_i$ and $\\alpha_i = 1-\\beta_i$<br><br>\n",
+    "\\end{aligned}$$\n",
+    "\n",
+    "where \\\\(\\bar{\\alpha}_t = \\prod_{i=1}^t \\alpha_i\\\\) and \\\\(\\alpha_i = 1 - \\beta_i\\\\)<br><br>\n",
     "\n",
-    "The maths notation always looks scary! Luckily the scheduler handles all that for us (uncomment the next cell to check out the code). We can plot $\\sqrt{\\bar{\\alpha}_t}$ (labelled as `sqrt_alpha_prod`) and $\\sqrt{(1 - \\bar{\\alpha}_t)}$ (labelled as `sqrt_one_minus_alpha_prod`) to view how the input (x) and the noise are scaled and mixed across different timesteps:\n"
+    "The maths notation always looks scary! Luckily the scheduler handles all that for us (uncomment the next cell to check out the code). We can plot \\\\(\\sqrt{\\bar{\\alpha}_t}\\\\) (labelled as `sqrt_alpha_prod`) and \\\\(\\sqrt{(1 - \\bar{\\alpha}_t)}\\\\) (labelled as `sqrt_one_minus_alpha_prod`) to view how the input (x) and the noise are scaled and mixed across different timesteps:\n"
    ]
   },
   {