Diffusion Models — Theory

Lecture 21 · ES 667: Deep Learning

Prof. Nipun Batra
IIT Gandhinagar · Aug 2026

Learning outcomes

By the end of this lecture you will be able to:

  1. Explain the forward process in one sentence and write it down.
  2. Derive the closed-form for q(x_t | x₀) and use it in code.
  3. Write the DDPM training loss and describe what each term is doing.
  4. Describe the reverse process step-by-step.
  5. Understand noise schedules (linear vs cosine) and pick one for a task.
  6. Connect DDPM to score matching via Langevin dynamics.
  7. State why diffusion won over GANs and VAEs for image/video/audio.

Where we are

  • VAE (L19) · probabilistic encoder-decoder; good structure, blurry samples.
  • GAN (L20) · sharp samples, unstable training, mode collapse.

Today · diffusion. Sharp samples + stable training + tractable likelihood. SOTA since 2021 for image, video, audio, 3D generation.

Today maps to Prince Ch 18 (early) + Ho et al. 2020 (DDPM) + Song & Ermon 2020 (score-based).

Four questions:

  1. What's the forward process?
  2. What's the closed-form for q(x_t | x₀)?
  3. How do we train a diffusion model?
  4. What's the connection to score matching?

PART 1

Forward & reverse · the big picture

Corrupt then learn to uncorrupt

The intuition in one sentence

Gradually turn an image into pure noise, then train a network to reverse that process one tiny step at a time.

At the end of training, you can start from random noise and reverse-diffuse it into a brand new image. Each small step is easy to learn; chained together they generate.

A physical analogy · ink in water

Drop a drop of ink into a glass of water. It stays concentrated, then slowly spreads, then uniformly tints the water.

Forward (easy)

Ink diffuses into water · we can describe this with a simple diffusion equation. Watching a drop blur is what "noise corrupts the signal" looks like in pictures.

Reverse (hard)

"Un-diffuse" the ink back into a drop. Physics says impossible (entropy only grows). But with data · we have many examples of initial states. A neural network can learn the reverse direction from those examples.

Diffusion models learn the miracle "reverse" that physics doesn't give you — but they learn it from data, not first principles.

Why this is better than GANs

GAN problems

  • minimax: two networks playing a game
  • mode collapse
  • unstable training
  • hyper-sensitive to hyperparameters

Diffusion advantages

  • regression loss: MSE on predicted noise
  • one network
  • stable training
  • default settings usually work

A GAN asks a network to hit a moving target (the discriminator's decision boundary). A diffusion model asks a network to match a static target (the noise that was added). Static targets are fundamentally easier to optimize.

Forward corrupts · reverse reassembles

Forward + reverse · sequence view

▶ Interactive: slide t, see a 2D spiral dissolve into noise; press "Reverse animate" to watch it reassemble — diffusion-denoise.

PART 2

The forward process

Fixed · Markov · Gaussian

One forward step · slightly blurring a photo

Analogy. Take a sharp photo and make it one step blurrier:

  1. Fade the original a tiny bit (e.g. to 99.5% opacity).
  2. Add a faint layer of random static (Gaussian noise).

That's it. β_t controls both the fade amount and the static intensity.

Formally:

q(x_t | x_{t−1}) = N( x_t ; √(1 − β_t) · x_{t−1}, β_t · I )

i.e. x_t = √(1 − β_t) · x_{t−1} + √β_t · ε with ε ~ N(0, I).

Worked numeric. β_t = 0.01, x_{t−1} = 2.0.

  • Fade · √(1 − 0.01) = √0.99 ≈ 0.995, so mean = 0.995 · 2.0 = 1.99.
  • Noise · std = √0.01 = 0.1.
  • Update · x_t = 1.99 + 0.1 · ε, with ε ~ N(0, 1).

Over T steps, the signal washes out into pure N(0, I) noise. The forward process is not learned — it is a fixed dynamical system.
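
In code, one forward step is two lines. A minimal sketch (function and variable names are illustrative, not part of the lecture code):

import torch

def forward_step(x_prev, beta_t):
    # fade the signal, then add fresh Gaussian static
    eps = torch.randn_like(x_prev)
    return (1 - beta_t) ** 0.5 * x_prev + beta_t ** 0.5 * eps

x1 = forward_step(torch.tensor([2.0]), beta_t=0.01)   # mean 1.99, std 0.1 around it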

Why shrink and add noise

You might ask · why not just add noise? Why shrink the signal too?

Shrinking keeps the total variance bounded. If you only add noise, the variance grows without limit; x_T would be impossibly noisy and its scale would look nothing like x₀'s.

Variance check · if Var(x_{t−1}) = 1, then Var(x_t) = (1 − β_t) · Var(x_{t−1}) + β_t.

Variance of x_t = (1 − β_t) + β_t = 1. Always.

This is why the forward process preserves unit variance — it's a variance-preserving SDE.
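
A quick numerical check of the variance argument — a throwaway sketch, assuming unit-variance data and a constant β:

import torch

x = torch.randn(100_000)        # Var(x_0) = 1
beta = 0.01
for _ in range(200):            # 200 forward steps
    x = (1 - beta) ** 0.5 * x + beta ** 0.5 * torch.randn_like(x)
print(x.var())                  # stays ≈ 1.0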

A step-by-step · small β = 0.01

Start with x₀ = 2.0. Apply 5 forward steps with β = 0.01 (so √(1 − β) ≈ 0.995):

  t    x_t     noise ε drawn    update x_t = √0.99 · x_{t−1} + 0.1 · ε
  0    2.00    —                start
  1    2.07    ε = 0.8          0.995 · 2.00 + 0.08 = 2.07
  2    2.02    ε = −0.4         0.995 · 2.07 − 0.04 = 2.02
  3    1.99    ε = −0.2         0.995 · 2.02 − 0.02 = 1.99
  4    2.01    ε = 0.3          0.995 · 1.99 + 0.03 = 2.01
  5    2.02    ε = 0.2          0.995 · 2.01 + 0.02 = 2.02

After 5 steps the signal is barely disturbed. After 1000 steps with growing β, it becomes standard normal. The accumulated effect, not each step, turns signal into noise.

The closed form · compounding fades

After 1 step, the signal is faded by √α₁ (define α_t ≡ 1 − β_t). After 2 steps · √(α₁α₂). After t steps · √ᾱ_t, where ᾱ_t = α₁α₂⋯α_t.

All the per-step noise additions also "pool together" into one big Gaussian:

x_t = √ᾱ_t · x₀ + √(1 − ᾱ_t) · ε,  ε ~ N(0, I)   —   i.e.   q(x_t | x₀) = N( x_t ; √ᾱ_t · x₀, (1 − ᾱ_t) · I )

Variance check · we designed the process so the total variance stays 1. If ᾱ_t is the fraction of variance left from the signal, 1 − ᾱ_t is the fraction from noise. Sums to 1.

Worked numeric · jump straight from x₀ to x₅₀₀.
x₀ = 2.0; take ᾱ₅₀₀ ≈ 0.17 (linear schedule — see the table a few slides ahead).

  • Signal scale · √0.17 ≈ 0.41.
  • Noise scale · √(1 − 0.17) ≈ 0.91.
  • Sample ε = 0.5 (one draw from N(0, 1)).
  • x₅₀₀ = 0.41 · 2.0 + 0.91 · 0.5 = 0.82 + 0.46 ≈ 1.28 — the noise term is now as large as the signal term.

After 500 steps the original signal of 2.0 has nearly washed away. No iteration needed during training.
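
The same jump in code — one line once ᾱ₅₀₀ is known (the value 0.17 is taken from the linear-schedule table shown later in the lecture):

import torch

x0, abar_500 = torch.tensor([2.0]), 0.17
x500 = abar_500 ** 0.5 * x0 + (1 - abar_500) ** 0.5 * torch.randn_like(x0)
# one sample, no 500-step loop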

Why closed-form matters · training speed

Naive (iterative) forward

To get x₅₀₀, you'd apply 500 Gaussian noising steps sequentially — 500 sequential sampling operations per training example.

Batch of 128, 100k examples · ~10⁹ operations just to make noise targets. Days on a single GPU.

Closed-form

One sample of ε, one scaled add. 500× faster per example.

Batch of 128 in one step · microseconds. Hours instead of days.

This closed form is the single biggest practical convenience in DDPM training. Without it, preparing every training target would cost 500× more.

⚠️ optional · Closed-form · the derivation in 3 lines

Start from one step: x_t = √α_t · x_{t−1} + √(1 − α_t) · ε_t.

Unroll one more step (plug in the same identity for x_{t−1}):

x_t = √(α_t α_{t−1}) · x_{t−2} + √(α_t (1 − α_{t−1})) · ε_{t−1} + √(1 − α_t) · ε_t

Merge Gaussians (sum of independent Gaussians = Gaussian with summed variances): the two noise terms have total variance α_t(1 − α_{t−1}) + (1 − α_t) = 1 − α_t α_{t−1}. Repeating down to x₀:

x_t = √ᾱ_t · x₀ + √(1 − ᾱ_t) · ε

where a single ε ~ N(0, I) replaces the t-step chain of independent ε's. Gaussian closed under convolution — this is the magic.
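
You can check the merge numerically — a small sketch with a constant β, so that ᾱ_T = (1 − β)^T; iterating the chain for T steps and sampling the closed form give the same statistics:

import torch

T, beta = 200, 0.01
alpha_bar = (1 - beta) ** T

x0 = torch.full((100_000,), 2.0)

x = x0.clone()
for _ in range(T):                                    # iterate the chain
    x = (1 - beta) ** 0.5 * x + beta ** 0.5 * torch.randn_like(x)

x_direct = alpha_bar ** 0.5 * x0 + (1 - alpha_bar) ** 0.5 * torch.randn_like(x0)

print(x.mean().item(), x.std().item())                # ≈ √ᾱ_T · 2.0 ≈ 0.73, ≈ √(1−ᾱ_T) ≈ 0.93
print(x_direct.mean().item(), x_direct.std().item())  # same statistics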

Noise schedules · in one chart

Noise schedules · the numbers

  t      Linear ᾱ_t   Cosine ᾱ_t   What's left
  0      1.00         1.00         clean signal
  250    0.62         0.82         linear: 38% destroyed · cosine: 18%
  500    0.17         0.50         linear: already mostly gone
  750    0.02         0.18         linear: essentially pure noise
  1000   0.00         0.00         both: ≈ N(0, I)

Linear schedule wastes computation on steps near t = T, where everything is already noise. Cosine keeps the middle range useful — the middle is where the model actually learns.

Noise schedules · linear vs cosine

Two common schedules:

Linear (original DDPM) · β_t grows linearly from β₁ = 10⁻⁴ to β_T = 0.02 over T steps.

Cosine (Nichol & Dhariwal 2021) · set ᾱ_t ∝ cos²( (t/T + s)/(1 + s) · π/2 ) with a small offset s ≈ 0.008 — smoother, better for smaller T.

Cosine schedule adds noise more gradually at the start and faster at the end. Better quality at fewer diffusion steps. Used in most modern diffusion models.
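
Both schedules take a few lines to compute. A sketch (the linear β endpoints follow Ho et al. 2020; the printed ᾱ values will differ slightly from the table above depending on those endpoints):

import math
import torch

T = 1000

# Linear: beta_t from 1e-4 to 0.02, alpha-bar by cumulative product
betas = torch.linspace(1e-4, 0.02, T)
abar_linear = torch.cumprod(1 - betas, dim=0)

# Cosine (Nichol & Dhariwal 2021): define alpha-bar directly, offset s = 0.008
s = 0.008
t = torch.arange(T + 1)
f = torch.cos((t / T + s) / (1 + s) * math.pi / 2) ** 2
abar_cosine = f / f[0]

for step in (0, 250, 500, 750, 999):
    print(step, round(abar_linear[step].item(), 2), round(abar_cosine[step].item(), 2))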

Picking T · the hyperparameter most people ignore

  T      Behavior
  50     too coarse; each step must learn a big jump; sample quality hurts
  200    works, but poor quality at the extremes
  1000   default; great quality with cosine schedule
  4000   slight quality gain; 4× inference cost; rarely worth it

DDPM (Ho 2020) used T=1000 with linear schedule. Nichol & Dhariwal 2021 showed cosine + T=4000 gave marginal gains; T=1000+cosine is today's sweet spot.

PART 3

Training objective

Predict the noise

The reverse process · parameterized

We want p_θ(x_{t−1} | x_t) — learn to denoise.

Ho et al. 2020 showed that when the forward steps are Gaussian with small β_t (they are), the reverse step is also well approximated by a Gaussian. So parameterize:

p_θ(x_{t−1} | x_t) = N( x_{t−1} ; μ_θ(x_t, t), σ_t² · I )

Further: parameterize μ_θ so that the network predicts the noise ε rather than the mean directly. Simpler, better training signal.

Why predict instead of the mean?

Given x_t, there are three equivalent prediction targets:

  • Predict x₀ · reconstruct the clean signal directly ("x₀-prediction").
  • Predict μ_{t−1} · the mean of the reverse distribution.
  • Predict ε · the noise that was added.

Ho et al. 2020 showed ε-prediction gives the best sample quality. Intuition · the noise target is unit-variance and its scale is independent of the data, so the network never has to learn the scale of the signal.

Modern models (SDXL, Imagen) often use "v-prediction" — a weighted combination of ε and x₀ that is more numerically stable at low signal-to-noise ratios (small ᾱ_t).
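
Because x_t = √ᾱ_t · x₀ + √(1 − ᾱ_t) · ε, the three targets are linear functions of each other given x_t. A sketch of the conversions (the v formula follows Salimans & Ho 2022; names are illustrative):

def other_targets_from_eps(x_t, eps, alpha_bar_t):
    a = alpha_bar_t ** 0.5          # signal scale √ᾱ_t
    s = (1 - alpha_bar_t) ** 0.5    # noise scale √(1 − ᾱ_t)
    x0 = (x_t - s * eps) / a        # x0-prediction target
    v = a * eps - s * x0            # v-prediction target
    return x0, v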

Training · the noise-guessing game

How do we teach the network to "un-blur"?

Take a clean image. Add a known amount of random noise ε at a random noise level t. Show the noisy result to the network. Ask it: "What noise did I just add?"

The better it gets at guessing the noise, the better it is at denoising · because subtracting the predicted noise gets us back to a cleaner image.

That's the entire training objective · MSE between predicted noise and the true noise added.

DDPM loss · surprisingly simple

L_simple(θ) = E_{x₀, t, ε} ‖ ε − ε_θ( √ᾱ_t · x₀ + √(1 − ᾱ_t) · ε, t ) ‖²

In plain words:

  1. Sample a clean image x₀ from the dataset.
  2. Sample a timestep t ~ Uniform{1, …, T}.
  3. Sample Gaussian noise ε ~ N(0, I).
  4. Compute x_t = √ᾱ_t · x₀ + √(1 − ᾱ_t) · ε (closed form).
  5. Ask the network ε_θ(x_t, t) to predict ε.
  6. MSE loss between the prediction and ε.

That's it. Much simpler than GAN minimax or VAE ELBO.

Worked example · one training step

Suppose x₀ = (1.0, −2.0) (a 2D data point), t = 350, ᾱ₃₅₀ = 0.5. Sample ε = (0.5, 1.2).

  1. x_t = √0.5 · x₀ + √0.5 · ε = 0.71 · (1.0, −2.0) + 0.71 · (0.5, 1.2) = (0.71, −1.41) + (0.35, 0.85) = (1.06, −0.56)
  2. Feed (x_t, t) to the network. Prediction · ε_θ(x_t, t) = (0.4, 1.0).
  3. Loss · ‖ε − ε_θ‖² = (0.5 − 0.4)² + (1.2 − 1.0)² = 0.01 + 0.04 = 0.05.
  4. Backprop through ε_θ to update the network.

Single example, single t. Average the loss over a batch and the training step is ready. No adversarial game, no multiple networks, no cross-entropy.

DDPM in PyTorch · 30 lines

import torch
import torch.nn.functional as F

# Assumed precomputed 1-D schedule tensors of length T, e.g.
#   betas = torch.linspace(1e-4, 0.02, T)
#   alpha_schedule = 1 - betas
#   alpha_bar_schedule = torch.cumprod(alpha_schedule, dim=0)

def ddpm_loss(model, x0, T=1000):
    B = x0.size(0)
    t = torch.randint(0, T, (B,), device=x0.device)
    noise = torch.randn_like(x0)

    alpha_bar = alpha_bar_schedule[t].view(B, 1, 1, 1)
    x_t = alpha_bar.sqrt() * x0 + (1 - alpha_bar).sqrt() * noise

    pred_noise = model(x_t, t)                         # network input: noisy img + t
    return F.mse_loss(pred_noise, noise)

def sample(model, shape, T=1000):
    x = torch.randn(shape)                             # start from N(0, I)
    for t in reversed(range(T)):
        alpha_t     = alpha_schedule[t]
        alpha_bar_t = alpha_bar_schedule[t]
        predicted   = model(x, torch.tensor([t]))
        mean = (x - (1 - alpha_t) / (1 - alpha_bar_t).sqrt() * predicted) / alpha_t.sqrt()
        if t > 0:
            x = mean + (1 - alpha_t).sqrt() * torch.randn_like(x)   # σ_t = √β_t
        else:
            x = mean                                   # final step is deterministic
    return x

The network architecture is a U-Net (L9) with time-step conditioning injected into each block.

Reverse step · denoise then re-noise a little

To go from x_t to x_{t−1}:

  1. Denoise. Predict ε_θ(x_t, t) and subtract a scaled version of it from x_t → estimate of a cleaner signal.
  2. Re-noise a little. Add a small fresh noise σ_t · z so the chain stays stochastic.

Worked numeric (1D). x_t = 1.5. Schedule · α_t = 0.98, ᾱ_t = 0.3. Network predicts ε_θ = 0.7.

  • Mean · μ = ( x_t − (1 − α_t)/√(1 − ᾱ_t) · ε_θ ) / √α_t = (1.5 − 0.024 · 0.7) / 0.990 ≈ 1.498

Add noise · σ_t = √(1 − α_t) = √0.02 ≈ 0.14, draw z = −0.24 → noise term ≈ −0.034.
x_{t−1} = 1.498 − 0.034 ≈ 1.464.

We took one small step from noisier (1.5) to slightly cleaner (1.464). At the final step (the one that produces x₀), drop the noise term — that step is deterministic.
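
The same 1D reverse step in code — a tiny sketch plugging in the numbers above (the z draw is the illustrative value from the example):

x_t, eps_hat = 1.5, 0.7            # current sample, predicted noise
alpha_t, alpha_bar_t = 0.98, 0.3   # schedule values at this t

mean = (x_t - (1 - alpha_t) / (1 - alpha_bar_t) ** 0.5 * eps_hat) / alpha_t ** 0.5
sigma = (1 - alpha_t) ** 0.5       # σ_t = √β_t ≈ 0.14
x_prev = mean + sigma * (-0.24)    # ≈ 1.498 - 0.034 ≈ 1.464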

Network architecture · in one picture

Network architecture · U-Net with time

A diffusion model's ε_θ is typically a U-Net:

  • Encoder downsamples, decoder upsamples.
  • Skip connections between matching resolutions (from L9).
  • Time embedding · t becomes a sinusoidal vector, is projected through a small MLP, and added into every block.
  • Attention at low spatial resolutions (globally mix features).

For 512×512 images · ~1B param U-Net; ~50 steps of sampling; ~5 seconds on a single GPU. Stable Diffusion's architecture is a direct descendant.

Sinusoidal time embedding

The timestep t is an integer. Represent it as a dense vector using the same positional encoding from L13:

import math
import torch

def timestep_embedding(t, d):
    # t: integer timesteps, shape (B,); returns (B, d) sinusoidal embeddings
    half = d // 2
    freqs = torch.exp(
        -math.log(10000) * torch.arange(half).float() / half
    )
    args = t.float()[:, None] * freqs[None, :]
    return torch.cat([args.cos(), args.sin()], dim=-1)

The same reason as in Transformers (L13) · a sinusoidal basis gives a multi-scale representation of time that the network can read at any scale. Learned embeddings work too; sinusoidal is more robust to training-time changes in T.

Time conditioning · inject at every block

class TimestepBlock(nn.Module):
    # assumes: import torch.nn as nn, torch.nn.functional as F
    def __init__(self, channels, t_dim):
        super().__init__()
        self.norm1 = nn.GroupNorm(8, channels)          # layer sizes here are illustrative
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.norm2 = nn.GroupNorm(8, channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.time_mlp = nn.Linear(t_dim, channels)      # projects t_emb to a per-channel bias

    def forward(self, x, t_emb):
        # x: image features (B, C, H, W). t_emb: timestep embedding (B, t_dim)
        h = self.norm1(x)
        h = self.conv1(F.silu(h))
        # project time and add as bias (broadcast over spatial dims)
        h = h + self.time_mlp(t_emb)[:, :, None, None]
        h = self.conv2(F.silu(self.norm2(h)))
        return x + h

Each U-Net residual block receives the time embedding and adds it to the channel dimension. The same network weights handle all timesteps — time is just another input, not a different model per step.

PART 4

Connection to score matching

Same thing, different derivation

The score field in one picture

Score · the "uphill arrows" view

Pause and look at this from a different angle. Imagine the data density as a landscape: real data points sit on the high peaks.

The score is an arrow at every point in space that points in the steepest uphill direction.

If we can learn this field of "uphill" arrows, we can follow them from anywhere and climb toward the peaks (i.e., toward real data).

That's score-based generation. Mathematically equivalent to DDPM · just a different lens. Picking either lens is fine; many find score-based more intuitive (gradients pointing toward data).

Score · the mountain-range analogy

Imagine probability as a landscape. Real data points sit on high mountain peaks; noise sits on low flat plains.

The score is an arrow at every point pointing uphill — toward higher density.

If we learn this field of "uphill" arrows, we can generate data · start on a low plain (noise) and follow the arrows until we land on a peak (real data).

The score function · math

Define the score s(x) = ∇_x log p(x) — the gradient of the log density.

  • p(x) · density (high for real data, low elsewhere).
  • log · just makes the math nice (peaks of log p = peaks of p).
  • ∇_x · vector pointing in the direction of steepest ascent.

If we have s(x), we can sample with Langevin dynamics:

x_{k+1} = x_k + η · s(x_k) + √(2η) · z_k,   z_k ~ N(0, I)

A small step toward high-density regions, plus a bit of noise to keep exploring. Looks just like the reverse diffusion step · follow a learned signal + add a little noise.
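
A toy sketch of Langevin sampling when the score is known in closed form — here the target is a standard 2D Gaussian, so ∇_x log p(x) = −x (everything here is illustrative, not the lecture's code):

import torch

def langevin(score_fn, n=1000, steps=500, eta=0.01, dim=2):
    x = 5.0 * torch.randn(n, dim)                     # start far from the data
    for _ in range(steps):
        x = x + eta * score_fn(x) + (2 * eta) ** 0.5 * torch.randn_like(x)
    return x

samples = langevin(lambda x: -x)                      # score of N(0, I) is -x
print(samples.mean(0), samples.std(0))                # ≈ (0, 0), ≈ (1, 1)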

Score vs density · why use the score?

Density

  • Must be non-negative.
  • Must integrate to 1.
  • Intractable normalizing constant for complex distributions.

Hard to model with a neural network.

Score

  • Any vector field.
  • Normalizer disappears: ∇_x log( p̃(x)/Z ) = ∇_x log p̃(x).
  • Easy to model with a neural network.

Parametrize the derivative, not the function itself. Samples are what we want anyway.

Modeling the score sidesteps the normalizer problem — and the score is exactly what you need to run Langevin sampling.

Diffusion ≈ score matching · derivation

The noisy distribution is Gaussian:

q(x_t | x₀) = N( x_t ; √ᾱ_t · x₀, (1 − ᾱ_t) · I )

Log density (up to const):

log q(x_t | x₀) = −‖ x_t − √ᾱ_t · x₀ ‖² / ( 2(1 − ᾱ_t) ) + const

Differentiate w.r.t. x_t:

∇_{x_t} log q(x_t | x₀) = −( x_t − √ᾱ_t · x₀ ) / (1 − ᾱ_t)

But the forward equation rearranges to x_t − √ᾱ_t · x₀ = √(1 − ᾱ_t) · ε. Substitute:

∇_{x_t} log q(x_t | x₀) = −ε / √(1 − ᾱ_t)

Punchline. The true score is just the (negative, rescaled) noise. Predicting ε with MSE = predicting the score:

s_θ(x_t, t) = −ε_θ(x_t, t) / √(1 − ᾱ_t)

DDPM (Ho 2020) and score-SDE (Song 2020) are two lenses on the same model.
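
In code the two views are one line apart — a sketch assuming a trained noise predictor eps_model and a precomputed alpha_bar table (both names hypothetical):

def score_from_eps(eps_model, x_t, t, alpha_bar):
    # s_θ(x_t, t) = -ε_θ(x_t, t) / √(1 - ᾱ_t)
    return -eps_model(x_t, t) / (1 - alpha_bar[t]) ** 0.5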

Two views side-by-side

DDPM view (Ho 2020)

  • Forward · fixed Markov chain.
  • Reverse · learned Gaussian chain.
  • Loss · MSE between predicted and true noise.
  • Intuition · denoising at multiple scales.

Score-SDE view (Song 2020)

  • Forward · SDE driving data to noise.
  • Reverse · another SDE driving noise to data.
  • Loss · score matching.
  • Intuition · gradient field pointing to data.

Use whichever is easier for your problem. DDPM's discrete-time recipe is simpler to code; Score-SDE gives more flexibility for continuous-time / arbitrary-schedule models (e.g., flow matching in 2023+).

PART 5

Why diffusion won

Diffusion vs VAE vs GAN

                     VAE                  GAN                    Diffusion
  Sample quality     blurry               sharp                  SOTA
  Training           stable, fast         brittle                stable, slow
  Likelihood         ELBO                 none (implicit)        ELBO (loose)
  Sampling           1 forward pass       1 forward pass         T forward passes
  Mode coverage      strong               mode collapse risk     strong
  Interpretability   structured latent    messy latent           unstructured noise latent

Diffusion's big cost · slow sampling. This is what L22 will focus on — classifier-free guidance, latent diffusion, DDIM.

Why diffusion beat GANs on image quality

  1. Training signal is always strong · MSE on noise has a meaningful gradient at every step and every t. GAN's adversarial loss often gives near-zero gradient early in training.
  2. No mode collapse · every training example teaches the model to denoise independently. The model can't "cheat" by producing one output.
  3. Iterative refinement · generation is 50-1000 tiny corrections. Errors at each step are small; the chain self-corrects. GANs must produce the final output in one forward pass.
  4. Infinite data augmentation · every (x₀, t, ε) triple is a new training example. A dataset of 10k images gives you a virtually infinite training stream.

A picture of why iteration helps

Think about drawing a face. A GAN must commit · "these pixels are skin, these are eyes, this is hair" — all in one forward pass. Wrong commitments cascade.

Diffusion starts with pure noise; the first reverse step sketches the rough layout; the second adds features; the hundredth adds skin texture. The network revises its answer 1000 times, getting it right in the limit.

This is why diffusion samples look sharper and more coherent than any single-forward-pass generator.

Applications · 2026 state

  • Text-to-image · Stable Diffusion, Midjourney, DALL-E 3, Imagen.
  • Video · Sora, Runway Gen-3, VEO.
  • Audio · AudioGen, Riffusion.
  • Molecule design · RFdiffusion for proteins.
  • Robotics policies · diffusion policy (Chi et al. 2023).

Diffusion has become the default generative model across modalities.

Frontier · where diffusion is heading

Faster sampling

  • DDIM (L22) · deterministic, 20 steps.
  • Flow matching · 5-10 steps.
  • Consistency models · 1-4 steps.

Richer conditioning

  • CFG (L22) · text steering.
  • ControlNet · per-pixel conditioning (pose, depth).
  • Inpainting · mask what to regenerate.

Consistency models and flow matching are closing the "slow sampling" gap. In 2026 · expect 1-step diffusion samplers to become competitive with GANs on speed.

Common questions · FAQ

Q. Is diffusion a likelihood-based model?
A. Yes, approximately. The DDPM loss corresponds to a variational lower bound on log p(x₀), but with a specific weighting. Tight bounds need "improved DDPM" tricks.

Q. Why Gaussian noise rather than, say, uniform?
A. Because Gaussians are closed under convolution — that's what lets us write q(x_t | x₀) in closed form. Other noise distributions (uniform, Laplacian) don't give this gift.

Q. What if the data isn't image-like?
A. Use a different architecture (Transformer for sequences, GNN for graphs). The diffusion recipe is independent of architecture — only the noise-prediction network changes.

Lecture 21 — summary

  • Forward process · add small Gaussian noise over T steps; closed form x_t = √ᾱ_t · x₀ + √(1 − ᾱ_t) · ε.
  • Reverse process · neural net predicts the noise; subtract step by step.
  • DDPM loss · MSE between true noise and predicted noise. Stable, simple.
  • Schedule · linear or cosine; cosine is modern default.
  • Architecture · U-Net with sinusoidal time embedding + attention at low res.
  • Score matching · same model through a different lens; reverse diffusion ≈ Langevin dynamics along the score.

Read before Lecture 22

Prince Ch 18 (later sections) + HF diffusers docs + Rombach 2022 (Stable Diffusion).

Next lecture

Diffusion Models — Practice — classifier-free guidance, latent diffusion, DDIM, DiT.

Notebook 21 · 21-ddpm-2d.ipynb — implement DDPM on a 2D toy dataset (Swiss roll); visualize forward noising + reverse denoising animations.