Syllabus & reading map
Textbook backbone
The course follows Prince · Understanding Deep Learning (2023, free PDF). Each lecture maps to one or more UDL chapters — see the tables below and the full syllabus.
| Role | Resource |
| --- | --- |
| Primary — assigned reading | Prince, Understanding Deep Learning (2023) — free PDF, CC-BY figures |
| Supplement — rigour & the curious reader | Bishop & Bishop, Deep Learning: Foundations and Concepts (2024) |
| Labs — hands-on PyTorch | Zhang et al., Dive into Deep Learning |
| Video — Transformer / LLM | Karpathy, Neural Networks: Zero to Hero |
| Classic — bibliography | Goodfellow, Bengio & Courville, Deep Learning (2016) |
Module 0 · A Probabilistic View of ML (spans two sessions; recommended pre-read)
The spine that explains every loss and regularizer in the course. Linear and logistic regression, L1 / L2, KL — all derived from Bayes’ rule, MLE, MAP, and information theory. Recommended for anyone who hasn’t done probability recently; foundational for VAE, GAN, diffusion, RLHF, and distillation later.
- Session 1 · distributions (Bernoulli / Categorical / Normal · why the Normal keeps appearing · plate notation · sampling primitives · the reparameterization trick) · Bayes' rule · MLE for the coin, linear, logistic, and multiclass models.
- Session 2 · MAP estimation, with L1 / L2 penalties falling out of Laplace / Gaussian priors · KL divergence, forward vs reverse · how KL underlies VAE, diffusion, RLHF, and distillation.

A minimal PyTorch sketch of these primitives follows the table below.
| # | Lecture | Reading | Slides | Notebook |
| --- | --- | --- | --- | --- |
| 0 | A Probabilistic View of ML | Bishop & Bishop §2.1–2.3, §4.1–4.3, §5.4; Murphy Ch 2 | HTML · PDF | Notebook |
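For pre-readers who want to run these ideas, the sketch below (ours, not the course notebook) does three things in PyTorch: recovers the coin-flip MLE by gradient descent, backpropagates through a reparameterized Gaussian sample, and evaluates forward vs reverse KL with torch.distributions. All variable names and hyperparameters are illustrative.

```python
# Unofficial Module 0 sketch (illustrative names; not the course notebook).
import torch

torch.manual_seed(0)

# 1) MLE for a coin: minimize the Bernoulli NLL by gradient descent.
#    Closed form is heads/flips = 6/8 = 0.75; SGD should agree.
flips = torch.tensor([1., 1., 0., 1., 0., 1., 1., 1.])
logit = torch.zeros(1, requires_grad=True)               # theta = sigmoid(logit)
opt = torch.optim.SGD([logit], lr=0.5)
for _ in range(300):
    opt.zero_grad()
    nll = torch.nn.functional.binary_cross_entropy_with_logits(
        logit.expand_as(flips), flips, reduction="sum")  # = -log p(flips | theta)
    nll.backward()
    opt.step()
print("MLE theta:", torch.sigmoid(logit).item())         # ~0.75

# 2) Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, 1),
#    so a Monte Carlo loss stays differentiable in (mu, log_var).
mu = torch.tensor(0.0, requires_grad=True)
log_var = torch.tensor(0.0, requires_grad=True)
z = mu + torch.exp(0.5 * log_var) * torch.randn(10_000)
(z ** 2).mean().backward()                               # toy objective E[z^2]
print("grad mu:", mu.grad.item(), "grad log_var:", log_var.grad.item())  # ~0, ~1

# 3) Forward vs reverse KL between two Gaussians.
p = torch.distributions.Normal(0.0, 1.0)
q = torch.distributions.Normal(1.0, 2.0)
print("KL(p||q):", torch.distributions.kl_divergence(p, q).item(),
      "KL(q||p):", torch.distributions.kl_divergence(q, p).item())
```

The two KL values differ; that forward-vs-reverse asymmetry is the thread Session 2 follows into VAE, diffusion, RLHF, and distillation.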
Module 1 · Foundations & Deep Networks (L1–L3)
| # | Lecture | Reading | Slides | Notebook |
| --- | --- | --- | --- | --- |
| 1 | Why Deep Learning + MLP Recap | Ch 1, 3 | HTML · PDF | |
| 2 | Universal Approximation & Going Deep | Ch 4, 7, 11 | HTML · PDF | |
| 3 | Training Deep Networks in Practice | Ch 6, 8 | HTML · PDF | |
Module 2 · Optimization (L4–L5)
| # | Lecture | Reading | Slides | Notebook |
| --- | --- | --- | --- | --- |
| 4 | SGD, Momentum, Nesterov | Ch 6 | HTML · PDF | |
| 5 | Adam, AdamW, LR Schedules | Ch 6, 7 | HTML · PDF | |
Module 3 · Regularization (L6) — spans two sessions
| # | Lecture | Reading | Slides | Notebook |
| --- | --- | --- | --- | --- |
| 6 | Regularization (classical + dropout + normalization) | Ch 9, 11 | HTML · PDF | |
Module 4 · CNNs & Visual Recognition (L7–L9)
| # | Lecture | Reading | Slides | Notebook |
| --- | --- | --- | --- | --- |
| 7 | CNN Deep Dive + Classic Architectures | Ch 10 | HTML · PDF | |
| 8 | Modern Architectures & Transfer Learning | Ch 10, 11 | HTML · PDF | |
| 9 | Object Detection, Localization & Segmentation | Bishop Ch 10 | HTML · PDF | |
Module 5 · Sequence Models (L10–L11)
| # | Lecture | Reading | Slides | Notebook |
| --- | --- | --- | --- | --- |
| 10 | RNNs, LSTMs, GRUs | Bishop Ch 12 | HTML · PDF | |
| 11 | Seq2Seq & Motivation for Attention | Bishop Ch 12 | HTML · PDF | |
Module 7 · LLMs (L15–L16)
| # | Lecture | Reading | Slides | Notebook |
| --- | --- | --- | --- | --- |
| 15 | Large Language Models | Chinchilla + HF course | HTML · PDF | |
| 16 | Alignment & Fine-tuning | HF PEFT + DPO | HTML · PDF | |
Module 8 · Self-Supervised & Vision-Language (L17–L18)
| # | Lecture | Reading | Slides | Notebook |
| --- | --- | --- | --- | --- |
| 17 | Self-Supervised & Contrastive Learning | Ch 14 | HTML · PDF | |
| 18 | Vision-Language Models | Ch 12 + CLIP | HTML · PDF | |
Module 9 · Generative Models (L19–L22)
| # | Lecture | Reading | Slides | Notebook |
| --- | --- | --- | --- | --- |
| 19 | Autoencoders & VAEs | Ch 17 | HTML · PDF | |
| 20 | GANs | Ch 15 | HTML · PDF | |
| 21 | Diffusion Models — Theory | Ch 18 | HTML · PDF | |
| 22 | Diffusion Models — Practice | Ch 18 + diffusers | HTML · PDF | |
Module 10 · Wrap-up (L23–L24)
| # | Lecture | Reading | Slides | Notebook |
| --- | --- | --- | --- | --- |
| 23 | Efficient Inference (KV-cache, quantization, FlashAttention) | Chip Huyen + HF inference | HTML · PDF | |
| 24 | Frontier · Agents, Reasoning, Interpretability | Anthropic interp + o1 blog | HTML · PDF | |
Code threads
- Language · micrograd (L3) → MLP (L3) → Transformer (L12–13) → nanoGPT (L13–14) → LoRA fine-tune (L16)
- Vision · MLP-MNIST (L1) → CNN-CIFAR (L7) → transfer-ResNet (L8) → ViT-CLIP (L18) → diffusion (L21–22)
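
As a taste of the Language thread's endpoint, here is a minimal unofficial sketch of the LoRA idea behind the L16 fine-tune: freeze a pretrained linear layer and learn a low-rank additive update. The class name, rank, and scaling below are illustrative choices, not the course's lab code.

```python
# Unofficial LoRA sketch: W stays frozen; we learn a rank-r update B @ A.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():        # freeze pretrained weights
            p.requires_grad_(False)
        self.A = nn.Parameter(0.01 * torch.randn(r, base.in_features))
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x):
        # base(x) + scale * x A^T B^T  ==  x (W + scale * B A)^T + bias
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(512, 512))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)   # 8192 trainable params vs 262,656 frozen in the base layer
```

With r = 8 on a 512-wide layer that is roughly 3% of the base parameters, which is the memory saving the LoRA fine-tune step trades on.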
Canonical examples
Cat vs. not-cat · MNIST · CIFAR-10 · Tiny Shakespeare · CelebA