Primary textbook · Simon J. D. Prince, Understanding Deep Learning (MIT Press, 2023). Free PDF at udlbook.github.io.
The course follows the UDL chapter order closely. Read the cited chapter(s) before each lecture.
Lecture 0 is a probability / MLE primer; 24 main lectures follow. L6 (Regularization) is comprehensive — spans two class sessions.
Module 0 · Probability & MLE Primer

| # | Topic | Core reading | Key ideas |
|---|-------|--------------|-----------|
| 0 | Probability, MLE & NLL | Bishop & Bishop Ch 2–5; Prince Ch 1, Ch 3 | KL, MAP, reparameterization, score-function preview |
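As a warm-up for L0, a minimal NumPy sketch (illustrative, not from any of the readings) of the MLE/NLL connection: for Bernoulli data, the parameter that minimizes the negative log-likelihood is exactly the sample mean.

```python
import numpy as np

# Negative log-likelihood of Bernoulli data under parameter theta.
def bernoulli_nll(theta, x):
    return -np.sum(x * np.log(theta) + (1 - x) * np.log(1 - theta))

x = np.array([1, 0, 1, 1, 0, 1, 1, 0])        # observed coin flips
thetas = np.linspace(0.01, 0.99, 99)          # candidate parameters
nlls = [bernoulli_nll(t, x) for t in thetas]
theta_mle = thetas[np.argmin(nlls)]

print(theta_mle, x.mean())  # the NLL minimizer sits at the sample mean
```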
Module 1 · Foundations & Going Deep

| # | Topic | Core reading | Papers & extras |
|---|-------|--------------|-----------------|
| 1 | Why DL + MLP Recap | Ch 1 Introduction · Ch 3 Shallow networks | Bishop Ch 6 |
| 2 | UAT, ResNets & Initialization | Ch 4 Deep networks · Ch 7 Gradients & init · Ch 11 Residual networks | He et al. 2015 (ResNet) |
| 3 | Training Deep Networks in Practice | Ch 6 Fitting models (early) · Ch 8 Measuring performance | Karpathy makemore-1 |
Module 2 · Optimization

| # | Topic | Core reading | Papers & extras |
|---|-------|--------------|-----------------|
| 4 | SGD, Momentum, Nesterov | Ch 6 (SGD + momentum) | Sutskever et al. 2013 |
| 5 | Adam, AdamW, Schedules | Ch 6 (Adam) · Ch 7 (initialization revisited) | Kingma & Ba 2015; Loshchilov & Hutter 2017 |
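For L4–L5, an illustrative NumPy sketch (textbook update rules, not course code) of heavy-ball momentum and Adam, minimizing f(w) = w²/2 whose gradient is w:

```python
import numpy as np

def grad(w):
    return w  # gradient of f(w) = 0.5 * w**2

def sgd_momentum(w0, lr=0.1, beta=0.9, steps=200):
    w, v = w0, 0.0
    for _ in range(steps):
        v = beta * v + grad(w)   # accumulate a running direction
        w = w - lr * v           # step along the velocity
    return w

def adam(w0, lr=0.1, b1=0.9, b2=0.999, eps=1e-8, steps=200):
    w, m, v = w0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = b1 * m + (1 - b1) * g        # first-moment estimate
        v = b2 * v + (1 - b2) * g * g    # second-moment estimate
        m_hat = m / (1 - b1 ** t)        # bias correction
        v_hat = v / (1 - b2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

# Both end much closer to the minimum at 0 than where they started.
print(sgd_momentum(5.0), adam(5.0))
```

Note that Adam's per-coordinate step is roughly lr in magnitude regardless of gradient scale, which is why it can keep oscillating near an optimum while momentum SGD settles.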
Module 3 · Regularization

| # | Topic | Core reading | Papers & extras |
|---|-------|--------------|-----------------|
| 6 | Regularization (two sessions): classical penalties, data augmentation, dropout, normalization | Ch 9 Regularization · Ch 11 (BatchNorm) | Belkin et al. 2019 (double descent); Santurkar et al. 2018 (BatchNorm); Ba et al. 2016 (LayerNorm) |
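For the dropout session in L6, a sketch of the standard inverted-dropout formulation (assumed textbook form, not code from UDL): zero each activation with probability p at train time and rescale survivors by 1/(1−p), so the expected activation is unchanged and test time is the identity.

```python
import numpy as np

def dropout(x, p, rng, train=True):
    if not train or p == 0.0:
        return x
    mask = rng.random(x.shape) >= p   # keep each unit with probability 1-p
    return x * mask / (1.0 - p)       # rescale to preserve E[x]

rng = np.random.default_rng(0)
x = np.ones(10000)
y = dropout(x, p=0.5, rng=rng)
print(y.mean())  # close to 1.0: expectation is preserved
```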
Module 4 · CNNs & Visual Recognition

| # | Topic | Core reading | Papers & extras |
|---|-------|--------------|-----------------|
| 7 | CNN Deep Dive + Classic Architectures | Ch 10 Convolutional networks (early) | LeCun et al. 1998; Krizhevsky et al. 2012 |
| 8 | Modern CNNs & Transfer Learning | Ch 10 (advanced) · Ch 11 (skip connections in CNNs) | Szegedy et al. 2014; Howard et al. 2017 |
| 9 | Detection & Segmentation | Not in Prince — use Bishop Ch 10 + CS231n notes | Ren et al. 2015 (Faster R-CNN); Redmon et al. 2015 (YOLO); Ronneberger et al. 2015 (U-Net); SAM |
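For L9, a sketch of intersection-over-union (IoU), the box-matching metric underlying detectors like Faster R-CNN and YOLO (illustrative code, not from the cited papers). Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2:

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])   # intersection corners
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    iw, ih = max(0.0, ix2 - ix1), max(0.0, iy2 - iy1)
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7: unit overlap, union of 7
```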
Module 5 · Sequence Models

| # | Topic | Core reading | Papers & extras |
|---|-------|--------------|-----------------|
| 10 | RNNs, LSTMs, GRUs | Bishop Ch 12 · d2l §9–10 | Hochreiter & Schmidhuber 1997 |
| 11 | Seq2Seq & Motivation for Attention | Bishop Ch 12 · d2l §10.6–10.8 | Sutskever et al. 2014 |
Module 7 · LLMs

| # | Topic | Core reading | Papers & extras |
|---|-------|--------------|-----------------|
| 15 | Large Language Models | Hoffmann et al. 2022 (Chinchilla); HuggingFace course Ch 1 | Karpathy, State of GPT; Touvron et al. 2023 (Llama 2); Su et al. 2021 (RoPE) |
| 16 | Alignment & Fine-tuning | HF PEFT docs; Ouyang et al. 2022 (InstructGPT); Rafailov et al. 2023 (DPO) | Hu et al. 2021 (LoRA); Dettmers et al. 2023 (QLoRA); Anthropic, Constitutional AI |
Module 8 · Self-Supervision & Vision-Language

| # | Topic | Core reading | Papers & extras |
|---|-------|--------------|-----------------|
| 17 | Self-Supervised & Contrastive Learning | Ch 14 Unsupervised learning (contrastive) | Chen et al. 2020 (SimCLR); Grill et al. 2020 (BYOL); He et al. 2021 (MAE); Oquab et al. 2023 (DINOv2) |
| 18 | Vision-Language Models | Ch 12 (ViT) + papers | Dosovitskiy et al. 2020 (ViT); Radford et al. 2021 (CLIP); Liu et al. 2023 (LLaVA); Alayrac et al. 2022 (Flamingo) |
Module 9 · Generative Models

| # | Topic | Core reading | Papers & extras |
|---|-------|--------------|-----------------|
| 19 | Autoencoders & VAEs | Ch 17 Variational autoencoders | Kingma & Welling 2013 |
| 20 | GANs | Ch 15 GANs | Goodfellow et al. 2014; Radford et al. 2015 (DCGAN); Arjovsky et al. 2017 (WGAN) |
| 21 | Diffusion Models — Theory | Ch 18 Diffusion models (early) | Ho et al. 2020 (DDPM); Song et al. 2020 (Score-SDE) |
| 22 | Diffusion Models — Practice | Ch 18 (later) + HF diffusers docs | Rombach et al. 2022 (Stable Diffusion); Ho & Salimans 2022 (CFG); Peebles & Xie 2023 (DiT) |
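For L21, the DDPM forward process (Ho et al. 2020) has a closed form: x_t = √(ᾱ_t)·x₀ + √(1−ᾱ_t)·ε with ᾱ_t the cumulative product of (1−β_t). A NumPy sketch with an illustrative linear schedule:

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)        # illustrative variance schedule
alpha_bars = np.cumprod(1.0 - betas)      # cumulative signal retention

def q_sample(x0, t, rng):
    """Sample x_t ~ q(x_t | x_0) in one shot via the closed form."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

rng = np.random.default_rng(0)
x0 = np.full(100000, 2.0)                 # constant "image" of value 2
x_last = q_sample(x0, T - 1, rng)
print(x_last.mean(), x_last.std())        # near (0, 1): signal destroyed
```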
Module 10 · Wrap-up

| # | Topic | Core reading | Papers & extras |
|---|-------|--------------|-----------------|
| 23 | Efficient Inference: KV-cache, quantization, FlashAttention, distillation, speculative decoding | Chip Huyen blog; HF inference docs | Dao et al. 2022 (FlashAttention); Hinton et al. 2015 (distillation); Leviathan et al. 2022 (speculative decoding); Kwon et al. 2023 (vLLM) |
| 24 | Frontier: Agents, Reasoning, Interpretability + Course Recap | Curated blogs + papers | Yao et al. 2022 (ReAct); Wei et al. 2022 (CoT); OpenAI o1 blog; Anthropic interpretability blog; Elhage et al. 2021 (circuits) |
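For L23, a toy single-head NumPy sketch of KV caching (illustrative only; no projections, batching, or masking): during autoregressive decoding, keys and values of earlier positions are stored once and reused, so the new token attends against the cache instead of recomputing the whole prefix, and the output is identical.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
d = 8
q_all = rng.standard_normal((5, d))   # per-position queries
k_all = rng.standard_normal((5, d))   # per-position keys
v_all = rng.standard_normal((5, d))   # per-position values

# Full recompute: attention for the last position over the whole prefix.
full = softmax(q_all[-1] @ k_all.T / np.sqrt(d)) @ v_all

# Incremental decode: each step only appends its K/V to the cache.
k_cache, v_cache = [], []
for t in range(5):
    k_cache.append(k_all[t])
    v_cache.append(v_all[t])
K, V = np.stack(k_cache), np.stack(v_cache)
cached = softmax(q_all[-1] @ K.T / np.sqrt(d)) @ V

print(np.allclose(full, cached))  # True: caching changes cost, not output
```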
Gaps in UDL coverage
- L9 · Detection & Segmentation — UDL does not cover object detection or segmentation. Use Bishop Ch 10 + CS231n notes.
- L15 · LLMs at scale — scaling laws, RoPE, GQA, distributed training. Use Chinchilla paper + HF course.
- L16 · Alignment & Fine-tuning — LoRA, RLHF, DPO. Use HF PEFT docs + DPO paper.
- L23 · Efficient Inference — KV-cache, quantization, FlashAttention. Blog posts + papers.
- L24 · Frontier · Agents, Reasoning, Interpretability — active 2024–26 research. Curated blogs + papers.
Other references
- Bishop & Bishop, Deep Learning: Foundations and Concepts (2024) — rigorous second opinion.
- Zhang et al., Dive into Deep Learning (d2l.ai) — hands-on PyTorch notebooks.
- Karpathy, Neural Networks: Zero to Hero (YouTube) — build-from-scratch video series; backup for L12–L14.
- Goodfellow, Bengio, Courville, Deep Learning (2016) — classical reference.