Hidden variable
Write the generative story as · draw a hidden variable z ~ p(z), then decode it into data x ~ p(x|z). VAE (L19) and GAN (L20) use a single latent z; Diffusion (L21) is a layered latent model with a whole chain of latents z₁, …, z_T.
A brief taxonomy
| Family | How it samples | Training |
|---|---|---|
| VAE (L19) | sample z ~ p(z), decode | ELBO |
| GAN (L20) | sample z ~ p(z), generator | minimax |
| Normalizing flows | invertible transforms of p(z) | exact likelihood |
| Diffusion (L21-22) | iterative denoising from noise | score matching / denoising |
Today's topic is the VAE. Each family trades off sample quality, training stability, and tractability.
The building block
A perfect forger writes the most compact possible description of a painting on a postcard (the latent code · maybe 16 numbers).
They mail it to their partner. The partner must recreate the original painting using only the postcard.
If the postcard is too small, they're forced to learn what's truly essential · the "essence" of the painting · not every brushstroke. That's compression. That's what an autoencoder learns.
Train encoder + decoder together. The bottleneck forces the network to keep only what it needs to reconstruct the input.
Example · a tiny grayscale image goes in, the encoder squeezes it down to a short latent vector, the decoder blows it back up, and the loss is the pixel-wise reconstruction error between input and output.
Backprop adjusts encoder + decoder weights to push this lower.
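A minimal sketch of that training step, assuming PyTorch, MSE reconstruction loss, and made-up layer sizes:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical tiny autoencoder: 784-pixel image -> 16-dim code -> 784 pixels
enc = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 16))
dec = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 784), nn.Sigmoid())
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)

x = torch.rand(32, 784)            # stand-in batch of flattened images
recon = dec(enc(x))                # encode, then decode
loss = F.mse_loss(recon, x)        # pixel-wise reconstruction error
opt.zero_grad(); loss.backward(); opt.step()   # push the loss lower
```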
Uses · dimensionality reduction, denoising, anomaly detection (unusually high reconstruction error flags outliers), and pretraining representations for downstream tasks.
PCA is the linear autoencoder with orthogonal weights. What does nonlinearity buy you?
Concretely · PCA on MNIST reaches ~85% explained variance with 32 dims; a deep AE matches that reconstruction quality with ~16 dims. Curved manifold vs linear subspace · nonlinearity buys 2× compression.
If the latent were as large as the input, the network could simply copy pixels through and learn nothing.
The bottleneck is the forcing function · it has to be small enough that copying is impossible.
Modern variants add noise (denoising AE) or masking (MAE, L17) instead of a small bottleneck — same idea, different forcing.
Input · 28 × 28 = 784 pixels. Encode to latent z of size 16. Decode back to 784.
| Layer | Shape | Params |
|---|---|---|
| Input | 784 | — |
| Linear → ReLU | 256 | 200,960 |
| Linear → ReLU | 64 | 16,448 |
| Linear (bottleneck; μ only) | 16 | 1,040 |
| Linear → ReLU | 64 | 1,088 |
| Linear → ReLU | 256 | 16,640 |
| Linear → sigmoid | 784 | 201,488 |
Total · ~440k params. Reconstruction MSE on MNIST test · ~0.003 after 10 epochs. Compare PCA with 16 components · ~0.015. 5× better with nonlinearities.
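A quick sketch that builds the table's architecture and confirms the parameter count (layer sizes from the table; everything else is an assumption):

```python
import torch.nn as nn

# The plain autoencoder from the table above: 784 -> 16 -> 784
ae = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 64), nn.ReLU(),
    nn.Linear(64, 16),                      # bottleneck
    nn.Linear(16, 64), nn.ReLU(),
    nn.Linear(64, 256), nn.ReLU(),
    nn.Linear(256, 784), nn.Sigmoid(),
)
print(sum(p.numel() for p in ae.parameters()))   # 437,664 ≈ 440k
```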
Suppose you train an AE on MNIST. To generate a new digit, you'd · sample a random latent z, then push it through the decoder.
What happens? Usually garbage. Why?
The latent space is irregular. The encoder only learned to map actual training images to latent points. Random z values likely fall into "nothing-mapped-here" regions where the decoder is undefined.
You'd need the latent space to be dense and structured — that's what VAE adds.
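As a sketch, the naive attempt with the hypothetical `dec` from the earlier snippet:

```python
import torch

# Naive generation with a plain AE: decode a random latent.
z = torch.randn(1, 16)        # a random point in latent space
fake = dec(z)                 # usually garbage: z likely lands where nothing was encoded
```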
A prior and a KL penalty
Interactive: slide the KL weight β, watch the latent space go from clumpy to Gaussian — vae-latent-explorer.
The prior · fix p(z) = N(0, I), a standard Gaussian over the latent space.
At generation time we draw z ~ N(0, I) and run it through the decoder.
Without a prior, you wouldn't know how to initialize z for generation.
The KL term pulls every posterior q(z|x) toward that prior, so the training encodings fill the region the prior covers instead of scattering.
Without this, training points occupy disjoint clusters.
A VAE is a plain AE with a regularizer that makes the latent space match a known distribution. Everything else follows from making that regularizer principled (the ELBO).
The encoder no longer outputs a point z · it outputs the parameters of a distribution, q(z|x) = N(μ(x), σ²(x)).
Both μ and σ come out of the same encoder network (in the code below, a single linear layer of size 2·d_z, split in half).
The VAE loss is just reconstruction + KL-to-prior. Train to minimize it. That's all you need to use a VAE.
The next two slides derive this from first principles (Jensen's inequality). If you trust me, you can skip them · come back to the math later.
Hard problem. We want to maximize the data likelihood log p(x) = log ∫ p(x|z) p(z) dz, but the integral over z is intractable.
Easy trick (variational inference). Introduce an approximate posterior q(z|x) — the encoder — and bound the likelihood from below.
Step 1 · multiply and divide by q(z|x) inside the integral · log p(x) = log ∫ q(z|x) · [ p(x|z) p(z) / q(z|x) ] dz = log E_q[ p(x|z) p(z) / q(z|x) ].
Step 2 · Jensen's inequality. Log of an expectation ≥ expectation of the log · log p(x) ≥ E_q[ log p(x|z) + log p(z) − log q(z|x) ].
Step 3 · expand. Using linearity of expectation · = E_q[ log p(x|z) ] + E_q[ log p(z) − log q(z|x) ].
The second bracket is exactly −KL( q(z|x) ‖ p(z) ).
Evidence Lower Bound — a tractable lower bound on log p(x) · ELBO = E_q[ log p(x|z) ] − KL( q(z|x) ‖ p(z) ).
Maximize the ELBO = minimize the negative. This is the VAE loss. Every term is tractable and backpropagatable.
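For reference, the same chain written out as one display (standard ELBO algebra, nothing beyond the steps above):

```latex
\[
\begin{aligned}
\log p(x) &= \log \int p(x \mid z)\, p(z)\, dz
           = \log \mathbb{E}_{q(z \mid x)}\!\left[\frac{p(x \mid z)\, p(z)}{q(z \mid x)}\right] \\
          &\ge \mathbb{E}_{q(z \mid x)}\big[\log p(x \mid z) + \log p(z) - \log q(z \mid x)\big] \quad \text{(Jensen)} \\
          &= \underbrace{\mathbb{E}_{q(z \mid x)}\big[\log p(x \mid z)\big]}_{\text{reconstruction}}
             \;-\; \underbrace{\mathrm{KL}\big(q(z \mid x)\,\|\,p(z)\big)}_{\text{KL to prior}}
           \;=\; \mathrm{ELBO}(x).
\end{aligned}
\]
```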
The KL term is the price for being able to sample from a known prior at generation time · every posterior has to stay compatible with N(0, I).
For Gaussian q(z|x) = N(μ, σ²) and prior N(0, I) it has a closed form · KL = ½ Σⱼ ( μⱼ² + σⱼ² − log σⱼ² − 1 ).
Quick check on the variance term · σ² − log σ² − 1 is zero at σ = 1 and positive everywhere else, so the penalty vanishes exactly when the posterior's spread matches the prior's.
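A quick numerical sanity check of that closed form against torch.distributions (the μ and log σ² values are arbitrary):

```python
import torch
from torch.distributions import Normal, kl_divergence

# Compare the closed-form diagonal-Gaussian KL with torch's reference implementation.
mu = torch.tensor([0.5, -1.0, 0.0])
log_var = torch.tensor([0.0, -1.0, 0.5])
std = torch.exp(0.5 * log_var)

closed_form = 0.5 * (mu.pow(2) + log_var.exp() - log_var - 1).sum()
reference = kl_divergence(Normal(mu, std), Normal(torch.zeros(3), torch.ones(3))).sum()
print(closed_form.item(), reference.item())   # the two numbers agree
```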
The VAE pushes each posterior toward the prior (μ → 0, σ → 1) while the reconstruction term pushes posteriors apart, so different inputs keep distinguishable codes.
Suppose one latent dimension has μ = 2 and σ = 1.
Plugging in · KL = ½ (2² + 1 − 0 − 1) = 2 nats for that single dimension, a cost the reconstruction term has to buy back.
This is the trade-off the VAE balances at every sample.
If the decoder is too powerful, the KL term will drive q(z|x) all the way onto the prior (KL → 0), because the decoder can model x well enough without reading z.
Posterior collapse · the VAE becomes an autoencoder where z is just noise. Reconstructions are fine (the decoder ignores z), but samples are junk (there's no latent structure to exploit).
Fixes · KL annealing (ramp the KL weight from 0 up over training), free bits (exempt a small floor of KL per dimension from the penalty), or a weaker decoder so the model is forced to route information through z. A sketch of the first fix follows.
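KL annealing bolted onto the training loop shown later in these notes; `warmup_steps` is a hypothetical schedule length, and `model`, `opt`, `BETA`, `F` are as in that loop:

```python
# KL annealing: ramp the KL weight from 0 to BETA over the first warmup_steps updates.
warmup_steps = 10_000
for step, x in enumerate(loader):
    recon, kl = model(x)
    beta_t = BETA * min(1.0, step / warmup_steps)   # linear warm-up
    recon_loss = F.mse_loss(recon, x, reduction='none').sum(-1)
    loss = (recon_loss + beta_t * kl).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```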
Imagine training a robot arm that randomly picks a part from a bin. You can't train the picking motion · "your random pick was wrong" gives no gradient.
Now change the system · the robot picks a specific part from a conveyor belt. The randomness is in how the belt is loaded, not in the robot's motion.
The belt-loading randomness ≡ the noise ε ~ N(0, 1); the robot's deterministic, trainable motion ≡ the map z = μ + σ · ε.
How to backprop through a sample
Problem. We need gradients of the loss with respect to μ and σ, but sample_from_gaussian(μ, σ) is a black box — no gradient flows through it.
Trick (Kingma & Welling 2013). Any sample from N(μ, σ²) can be rewritten as z = μ + σ · ε with ε ~ N(0, 1).
The randomness now sits outside the learnable path · ε is just an input, and gradients flow through the deterministic map into μ and σ.
Worked numeric. Encoder outputs μ = 0.5 and log σ² = −1.0 for one latent dimension, so σ = e^(−0.5) ≈ 0.61.
Suppose the drawn noise is ε = 0.3.
Then · z = μ + σ · ε ≈ 0.5 + 0.61 × 0.3 ≈ 0.68, with ∂z/∂μ = 1 and ∂z/∂σ = ε = 0.3 — plain, finite derivatives.
Optimizer can now update μ and σ (and hence the encoder weights) using the loss evaluated at this sampled z.
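The same computation as an autograd check, using the illustrative numbers above:

```python
import torch

# Verify that gradients flow through the reparameterized sample z = mu + sigma * eps.
mu = torch.tensor([0.5], requires_grad=True)
log_var = torch.tensor([-1.0], requires_grad=True)

eps = torch.tensor([0.3])                  # the "belt-loading" randomness, held fixed here
z = mu + torch.exp(0.5 * log_var) * eps    # z ≈ 0.68
z.backward()

print(mu.grad, log_var.grad)   # dz/dmu = 1; dz/dlog_var = 0.5 * sigma * eps ≈ 0.091
```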
```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, d_in, d_z):
        super().__init__()
        # Encoder outputs 2*d_z numbers: mu and log-variance
        self.enc = nn.Sequential(nn.Linear(d_in, 256), nn.ReLU(),
                                 nn.Linear(256, 2 * d_z))
        self.dec = nn.Sequential(nn.Linear(d_z, 256), nn.ReLU(),
                                 nn.Linear(256, d_in))

    def forward(self, x):
        h = self.enc(x)
        mu, log_var = h.chunk(2, dim=-1)
        # Reparam: z = mu + sigma * eps
        std = torch.exp(0.5 * log_var)
        eps = torch.randn_like(std)
        z = mu + std * eps
        recon = self.dec(z)
        # KL against N(0, I), closed form
        kl = -0.5 * (1 + log_var - mu.pow(2) - log_var.exp()).sum(-1)
        return recon, kl
```
Entire VAE in 15 lines. The trick is making sure the KL goes in the loss alongside reconstruction.
```python
import torch.nn.functional as F

# Assumes: model = VAE(...), opt = an optimizer over model.parameters(), BETA = KL weight
for x in loader:
    recon, kl = model(x)
    # Per-example reconstruction error, summed over pixels
    recon_loss = F.mse_loss(recon, x, reduction='none').sum(-1)
    loss = (recon_loss + BETA * kl).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```
Tuning BETA:
- BETA = 1 · standard VAE (follows the ELBO derivation).
- BETA > 1 · β-VAE (Higgins 2017). Stronger regularization → more disentangled, often blurrier.
- BETA < 1 · more weight on reconstruction, less on latent structure.

With a large enough β, individual latent coordinates tend to line up with independent factors of variation.
No supervision — the structure emerges from the KL regularization plus the reconstruction pressure. Disentanglement lets you do editable generation · "same face with different smile" by perturbing one z coordinate.
The trade-off · stronger KL forces shared structure, but loses reconstruction detail. β = 1 is the theoretical sweet spot; higher β sacrifices quality for interpretability.
If you have class labels y, condition both networks on them · the encoder becomes q(z|x, y) and the decoder p(x|z, y) — a conditional VAE (CVAE).
At inference · sample z ~ N(0, I), choose the class label y you want, and decode p(x|z, y).
CVAE was used for controllable generation before diffusion + CFG took over. Still shipped in some specialized systems (molecule generation, time-series imputation).
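A minimal CVAE sketch on top of the VAE class above; concatenating a one-hot label to both encoder and decoder inputs is just one common choice, assumed here:

```python
import torch
import torch.nn as nn

class CVAE(nn.Module):
    """Conditional VAE: concatenate a one-hot label to encoder and decoder inputs."""
    def __init__(self, d_in, d_z, n_classes):
        super().__init__()
        self.n_classes = n_classes
        self.enc = nn.Sequential(nn.Linear(d_in + n_classes, 256), nn.ReLU(),
                                 nn.Linear(256, 2 * d_z))
        self.dec = nn.Sequential(nn.Linear(d_z + n_classes, 256), nn.ReLU(),
                                 nn.Linear(256, d_in))

    def forward(self, x, y):
        y_onehot = nn.functional.one_hot(y, self.n_classes).float()
        mu, log_var = self.enc(torch.cat([x, y_onehot], dim=-1)).chunk(2, dim=-1)
        z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)   # reparameterized sample
        recon = self.dec(torch.cat([z, y_onehot], dim=-1))
        kl = -0.5 * (1 + log_var - mu.pow(2) - log_var.exp()).sum(-1)
        return recon, kl
```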
Suppose · one flattened MNIST image x (784 pixels), latent size 16.
Step 1 · encoder. It outputs two 16-dim vectors, μ and log σ².
Step 2 · sample. Draw ε ~ N(0, I) and set z = μ + σ · ε.
Step 3 · decode. The decoder maps z back to a 784-dim reconstruction x̂.
Step 4 · loss. Reconstruction error Σ (x − x̂)² plus β · KL( N(μ, σ²) ‖ N(0, I) ).
Backprop through this. Update params. Repeat.
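The same single step, executed with the VAE class defined earlier (a random tensor stands in for a real MNIST image):

```python
import torch

torch.manual_seed(0)
model = VAE(784, 16)                     # the class defined above
x = torch.rand(1, 784)                   # stand-in for one flattened MNIST image

recon, kl = model(x)                                 # steps 1-3 happen inside forward()
recon_loss = ((x - recon) ** 2).sum(-1)              # step 4a: reconstruction term
loss = (recon_loss + 1.0 * kl).mean()                # step 4b: add the KL (BETA = 1)
loss.backward()                                      # backprop through the whole thing
print(recon_loss.item(), kl.item(), loss.item())
```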
```python
import torch

# Generate new examples
with torch.no_grad():
    z = torch.randn(16, d_z)          # sample from the prior N(0, I)
    samples = model.dec(z)            # decode into image space

# Interpolate between two inputs in latent space
with torch.no_grad():
    mu_a, _ = model.enc(x_a).chunk(2, dim=-1)   # mu for image A
    mu_b, _ = model.enc(x_b).chunk(2, dim=-1)   # mu for image B
    for alpha in torch.linspace(0, 1, 10):
        z = (1 - alpha) * mu_a + alpha * mu_b
        morph = model.dec(z)                     # smooth transition
```
The interpolation is the magic · it produces valid intermediate images because the latent space is smooth.
Truncated sampling. Sampling z from a truncated prior — discarding or clamping draws that land far from the origin — trades diversity for more typical, cleaner samples.
Decoder stochasticity. If the decoder outputs a Gaussian p(x|z) = N(x̂(z), σ²I), you can add pixel noise on top of x̂ or just display the mean; in practice almost everyone shows the mean.
VAE blur. The KL pulls posteriors toward a simple prior, so posteriors overlap significantly. The decoder averages over all the x's whose codes land in the same overlapping region — hence the characteristic blur.
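One simple way to truncate, as a sketch (the cutoff of 2 is an arbitrary choice; `model` and the latent size 16 come from the code above):

```python
import torch

# Truncated sampling: redraw any latent coordinate that falls outside [-cutoff, cutoff].
def truncated_normal(shape, cutoff=2.0):
    z = torch.randn(shape)
    while (z.abs() > cutoff).any():
        resample = z.abs() > cutoff
        z[resample] = torch.randn(int(resample.sum()))
    return z

with torch.no_grad():
    samples = model.dec(truncated_normal((16, 16)))   # 16 samples, d_z = 16
```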
| | Sample quality | Training stability | Likelihood | Sampling speed |
|---|---|---|---|---|
| VAE | ✗ often blurry | ✓ stable | ✓ ELBO | ✓✓ one pass |
| GAN | ✓✓ sharp | ✗ brittle | ✗ no | ✓✓ one pass |
| Diffusion | ✓✓✓ SOTA | ✓ stable | ≈ | ✗ many passes |
VAEs remain useful for latent-space exploration and pre-compression — Stable Diffusion uses a VAE to compress images into a much smaller latent space (8× downsampled in each spatial dimension) before running diffusion there.
Q. Why is VAE blurrier than GAN?
A. VAE's loss is pixel-wise MSE, whose optimal prediction is the mean of the possible reconstructions. When multiple outputs are plausible (e.g., any detailed face), that mean is a smoothed average — blurry. GANs don't average; they commit to one sharp sample.
Q. Can I use a perceptual loss (feature-space MSE) instead of pixel MSE?
A. Yes — it produces sharper reconstructions. VQ-VAE (van den Oord et al., 2017) adds discrete latents to the VAE-like structure; Stable Diffusion's autoencoder combines that kind of setup with perceptual and adversarial losses.
Q. Is the posterior truly Gaussian?
A. No — the true posterior is arbitrary. The Gaussian parameterization is an approximation (the "amortized variational" part). Normalizing flow encoders and hierarchical VAEs address this; vanilla VAE trades approximation quality for simplicity.