Interactive Explainer
The β knob in a VAE trades reconstruction for latent structure. At β=0 you have a vanilla autoencoder: great reconstruction, chaotic latent space. At β=10 the KL term dominates: the latent hugs the unit-Gaussian prior, but samples get blurry and, in the extreme, the model stops using the latent at all. Drag the slider and see it.
A VAE's loss has two terms: reconstruction (how well can the decoder recover x from z?) and a KL divergence pulling the approximate posterior q(z|x) toward the standard-normal prior. The β-VAE (Higgins et al., 2017) simply reweights the KL term: L = recon + β · KL. Slide β to see what that weight does.
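A minimal sketch of that loss in code, assuming PyTorch (the explainer doesn't name a framework) and the common convention of having the encoder output log_var for the variance; the function name beta_vae_loss is illustrative, not part of the demo.

import torch
import torch.nn.functional as F

def beta_vae_loss(x, x_recon, mu, log_var, beta=1.0):
    # Reconstruction term: how well does the decoder reproduce x?
    recon = F.mse_loss(x_recon, x, reduction="sum")
    # Closed-form KL(q(z|x) || N(0, I)) for a diagonal-Gaussian posterior.
    kl = -0.5 * torch.sum(1.0 + log_var - mu.pow(2) - log_var.exp())
    # beta = 0 recovers a plain autoencoder; large beta squeezes q(z|x) toward the prior.
    return recon + beta * kl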
What's happening: at β=0 the latent points cluster in a few tight clumps, one per class. At β=1 the clumps relax into blobs of width roughly 1 that together fill a unit-scale disc. At β=10 everything collapses to the origin (posterior collapse): the model stops using the latent.
z = μ(x) + σ(x) ⊙ ε,   ε ~ N(0, I)
The encoder outputs μ and σ for each input. We sample ε from a fixed noise distribution and compute z deterministically from it. All the randomness lives in ε, which has no parameters, so gradients flow cleanly through μ and σ. This is the reparameterization trick that makes VAE training work.
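A sketch of that sampling step under the same assumptions as above (PyTorch, σ parameterized via log_var):

import torch

def reparameterize(mu, log_var):
    sigma = torch.exp(0.5 * log_var)   # σ(x), kept positive by construction
    eps = torch.randn_like(sigma)      # ε ~ N(0, I): fixed noise, no parameters
    return mu + sigma * eps            # z = μ(x) + σ(x) ⊙ ε

Because ε is drawn outside the model's parameters, backpropagation sees z as a deterministic, differentiable function of μ and σ.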
log p(x) ≥ E_{q(z|x)}[log p(x|z)] − KL(q(z|x) ∥ p(z))
The ELBO is a lower bound on the marginal log-likelihood: log p(x) equals the ELBO plus KL(q(z|x) ∥ p(z|x)), and that extra KL is never negative. We can't compute log p(x) directly (the marginal is intractable), but we CAN compute and maximize its lower bound. That's what VAE training does.
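To make the link to the loss above concrete, here is a one-sample Monte Carlo estimate of the ELBO (a sketch; PyTorch, the decoder call, and treating the reconstruction error as log p(x|z) up to an additive constant are all assumptions, not part of the demo). Maximizing this is the same as minimizing recon + KL with β = 1.

import torch
import torch.nn.functional as F

def elbo_estimate(x, decoder, mu, log_var):
    # Draw one z from q(z|x) via the reparameterization above.
    z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)
    # log p(x|z) for a fixed-variance Gaussian decoder, up to an additive constant.
    log_px_given_z = -F.mse_loss(decoder(z), x, reduction="sum")
    # Closed-form KL(q(z|x) || N(0, I)) for a diagonal-Gaussian posterior.
    kl = -0.5 * torch.sum(1.0 + log_var - mu.pow(2) - log_var.exp())
    return log_px_given_z - kl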
Part of the ES 667 Deep Learning course · IIT Gandhinagar · Aug 2026.