Interactive Explainer
A diffusion model learns to reverse a noising process. Drag the timestep slider and watch a 2D shape dissolve into Gaussian noise; press "reverse" and watch the noise reassemble into the shape. Every modern image generator is built on this idea.
The forward process is fixed: repeatedly add small Gaussian noise until the data becomes indistinguishable from N(0, I). The reverse process is learned: a neural network predicts how much noise was added at each step, so you can subtract it and work your way back.
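Here is a minimal sketch of that stepwise noising loop in NumPy. The linear β schedule (the per-step noise variances), the number of steps T, and the toy 2D point cloud are illustrative assumptions, not the settings used by the widget above.

```python
# A sketch of the stepwise forward (noising) process.
import numpy as np

rng = np.random.default_rng(0)

T = 1000
betas = np.linspace(1e-4, 0.02, T)          # assumed per-step noise variances

def forward_step(x_prev, t):
    """One step of q(x_t | x_{t-1}): shrink the signal, add fresh Gaussian noise."""
    eps = rng.standard_normal(x_prev.shape)
    return np.sqrt(1.0 - betas[t]) * x_prev + np.sqrt(betas[t]) * eps

# Example: noise a toy 2D point cloud step by step.
x = rng.standard_normal((512, 2)) * 0.1 + np.array([1.0, -1.0])  # "clean" data
for t in range(T):
    x = forward_step(x, t)
# After T steps, x is approximately distributed as N(0, I).
```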
x_t = √(α̅_t) · x_0 + √(1 − α̅_t) · ε,   where ε ~ N(0, I)
Because a sum of independent Gaussians is still Gaussian, you can sample x_t directly from x_0 in a single step instead of iterating through every noising step, which is a huge training speedup. α̅_t is the cumulative product of the per-step noise-retention factors; it falls from ~1 at t = 0 to ~0 at t = T.
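A sketch of that one-shot jump, under the same assumed schedule as above; `q_sample` is a hypothetical helper written for this example, not part of any particular library.

```python
# Closed-form sampling of x_t from x_0, matching the formula above.
import numpy as np

rng = np.random.default_rng(0)

T = 1000
betas = np.linspace(1e-4, 0.02, T)          # assumed schedule
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)              # α̅_t: ~1 at t=0, ~0 at t=T-1

def q_sample(x0, t):
    """Sample x_t directly from x_0: x_t = √(α̅_t)·x_0 + √(1 − α̅_t)·ε."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps, eps

x0 = rng.standard_normal((512, 2)) * 0.1    # toy "clean" 2D data
x_mid, _ = q_sample(x0, t=T // 2)           # halfway: partly shape, partly noise
x_end, _ = q_sample(x0, t=T - 1)            # essentially pure Gaussian noise
```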
L = E_{t, x_0, ε} [ ∥ε − ε_θ(x_t, t)∥² ]
Sample a random clean image, a random timestep, and random noise ε. Compute x_t. Ask the network to predict ε. That's the entire loss. Simple, stable, and it works better than more elaborate alternatives.
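A training-step sketch in PyTorch under the same assumed schedule; the tiny MLP standing in for ε_θ, the crude timestep conditioning, and the toy 2D data are hypothetical stand-ins for the U-Net over images a real system would use.

```python
# One training step of the noise-prediction objective.
import torch
import torch.nn as nn

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

# Hypothetical noise-prediction network: takes (x_t, t) and returns predicted ε.
eps_model = nn.Sequential(nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, 2))
opt = torch.optim.Adam(eps_model.parameters(), lr=1e-3)

def train_step(x0):
    t = torch.randint(0, T, (x0.shape[0],))                  # random timestep per sample
    eps = torch.randn_like(x0)                               # random noise ε ~ N(0, I)
    ab = alpha_bar[t].unsqueeze(-1)
    x_t = torch.sqrt(ab) * x0 + torch.sqrt(1.0 - ab) * eps   # closed-form x_t
    t_feat = (t.float() / T).unsqueeze(-1)                   # crude timestep conditioning
    eps_hat = eps_model(torch.cat([x_t, t_feat], dim=-1))    # predict the noise
    loss = ((eps - eps_hat) ** 2).mean()                     # the entire loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

x0 = torch.randn(512, 2) * 0.1                               # toy "clean" 2D data
for _ in range(100):
    loss = train_step(x0)
```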
Part of the ES 667 Deep Learning course · IIT Gandhinagar · Aug 2026.