Syllabus & reading map
Textbook backbone
The course follows Prince · Understanding Deep Learning (2023, free PDF). Each lecture maps to one or more UDL chapters — see the tables below and the full syllabus.
| Role | Resource |
| --- | --- |
| Primary — assigned reading | Prince, Understanding Deep Learning (2023) — free PDF, CC-BY figures |
| Supplement — rigour & the curious reader | Bishop & Bishop, Deep Learning: Foundations and Concepts (2024) |
| Labs — hands-on PyTorch | Zhang et al., Dive into Deep Learning |
| Video — Transformer / LLM | Karpathy, Neural Networks: Zero to Hero |
| Classic — bibliography | Goodfellow, Bengio & Courville, Deep Learning (2016) |
Module 0 · A Probabilistic View of ML (spans two sessions; recommended pre-read)
The spine that explains every loss and regularizer in the course. Linear and logistic regression, L1 / L2, KL — all derived from Bayes’ rule, MLE, MAP, and information theory. Recommended for anyone who hasn’t done probability recently; foundational for VAE, GAN, diffusion, RLHF, and distillation later.
- Session 1 · distributions (Bernoulli / Categorical / Normal · why the Normal keeps appearing · plate notation · sampling primitives · the reparameterization trick) · Bayes' rule · MLE for the coin, linear, logistic, and multiclass models.
- Session 2 · MAP estimation, with L1 / L2 penalties falling out of Laplace / Gaussian priors · KL divergence, forward vs reverse · how KL underlies VAE, diffusion, RLHF, and distillation.

A minimal PyTorch sketch of these primitives follows the table below.
| # | Lecture | Reading | Slides | Notebook |
| --- | --- | --- | --- | --- |
| 0 | A Probabilistic View of ML | Bishop & Bishop §2.1–2.3, §4.1–4.3, §5.4; Murphy Ch 2 | HTML · PDF | Notebook |
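For pre-readers who want to run these ideas, the sketch below (ours, not the course notebook) does three things in PyTorch: recovers the coin-flip MLE by gradient descent, backpropagates through a reparameterized Gaussian sample, and evaluates forward vs reverse KL with torch.distributions. All variable names and hyperparameters are illustrative.

```python
# Unofficial Module 0 sketch (illustrative names; not the course notebook).
import torch

torch.manual_seed(0)

# 1) MLE for a coin: minimize the Bernoulli NLL by gradient descent.
#    Closed form is heads/flips = 6/8 = 0.75; SGD should agree.
flips = torch.tensor([1., 1., 0., 1., 0., 1., 1., 1.])
logit = torch.zeros(1, requires_grad=True)               # theta = sigmoid(logit)
opt = torch.optim.SGD([logit], lr=0.5)
for _ in range(300):
    opt.zero_grad()
    nll = torch.nn.functional.binary_cross_entropy_with_logits(
        logit.expand_as(flips), flips, reduction="sum")  # = -log p(flips | theta)
    nll.backward()
    opt.step()
print("MLE theta:", torch.sigmoid(logit).item())         # ~0.75

# 2) Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, 1),
#    so a Monte Carlo loss stays differentiable in (mu, log_var).
mu = torch.tensor(0.0, requires_grad=True)
log_var = torch.tensor(0.0, requires_grad=True)
z = mu + torch.exp(0.5 * log_var) * torch.randn(10_000)
(z ** 2).mean().backward()                               # toy objective E[z^2]
print("grad mu:", mu.grad.item(), "grad log_var:", log_var.grad.item())  # ~0, ~1

# 3) Forward vs reverse KL between two Gaussians.
p = torch.distributions.Normal(0.0, 1.0)
q = torch.distributions.Normal(1.0, 2.0)
print("KL(p||q):", torch.distributions.kl_divergence(p, q).item(),
      "KL(q||p):", torch.distributions.kl_divergence(q, p).item())
```

The two KL values differ; that forward-vs-reverse asymmetry is the thread Session 2 follows into VAE, diffusion, RLHF, and distillation.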
Module 1 · Foundations & Deep Networks (L1–L3)
| # | Lecture | Reading | Slides | Notebook |
| --- | --- | --- | --- | --- |
| 1 | Why Deep Learning + MLP Recap | Ch 1, 3 | HTML · PDF | |
| 2 | Universal Approximation & Going Deep | Ch 4, 7, 11 | HTML · PDF | |
| 3 | Training Deep Networks in Practice | Ch 6, 8 | HTML · PDF | |
Module 2 · Optimization (L4–L5)
| # | Lecture | Reading | Slides | Notebook |
| --- | --- | --- | --- | --- |
| 4 | SGD, Momentum, Nesterov | Ch 6 | HTML · PDF | |
| 5 | Adam, AdamW, LR Schedules | Ch 6, 7 | HTML · PDF | |
Module 3 · Regularization (L6) — spans two sessions
| # | Lecture | Reading | Slides | Notebook |
| --- | --- | --- | --- | --- |
| 6 | Regularization (classical + dropout + normalization) | Ch 9, 11 | HTML · PDF | |
Module 4 · CNNs & Visual Recognition (L7–L9)
| # | Lecture | Reading | Slides | Notebook |
| --- | --- | --- | --- | --- |
| 7 | CNN Deep Dive + Classic Architectures | Ch 10 | HTML · PDF | |
| 8 | Modern Architectures & Transfer Learning | Ch 10, 11 | HTML · PDF | |
| 9 | Object Detection, Localization & Segmentation | Bishop Ch 10 | HTML · PDF | |
Module 5 · Sequence Models (L10–L11)
| # | Lecture | Reading | Slides | Notebook |
| --- | --- | --- | --- | --- |
| 10 | RNNs, LSTMs, GRUs | Bishop Ch 12 | HTML · PDF | |
| 11 | Seq2Seq & Motivation for Attention | Bishop Ch 12 | HTML · PDF | |
Module 7 · LLMs (L15–L16)
| # | Lecture | Reading | Slides | Notebook |
| --- | --- | --- | --- | --- |
| 15 | Large Language Models | Chinchilla + HF course | HTML · PDF | |
| 16 | Alignment & Fine-tuning | HF PEFT + DPO | HTML · PDF | |
Module 8 · Self-Supervised & Vision-Language (L17–L18)
| # | Lecture | Reading | Slides | Notebook |
| --- | --- | --- | --- | --- |
| 17 | Self-Supervised & Contrastive Learning | Ch 14 | HTML · PDF | |
| 18 | Vision-Language Models | Ch 12 + CLIP | HTML · PDF | |
Module 9 · Generative Models (L19–L22)
| # | Lecture | Reading | Slides | Notebook |
| --- | --- | --- | --- | --- |
| 19 | Autoencoders & VAEs | Ch 17 | HTML · PDF | |
| 20 | GANs | Ch 15 | HTML · PDF | |
| 21 | Diffusion Models — Theory | Ch 18 | HTML · PDF | |
| 22 | Diffusion Models — Practice | Ch 18 + diffusers | HTML · PDF | |
Module 10 · Wrap-up (L23–L24)
| # | Lecture | Reading | Slides | Notebook |
| --- | --- | --- | --- | --- |
| 23 | Efficient Inference (KV-cache, quantization, FlashAttention) | Chip Huyen + HF inference | HTML · PDF | |
| 24 | Frontier · Agents, Reasoning, Interpretability | Anthropic interp + o1 blog | HTML · PDF | |
Code threads
- Language · micrograd (L3) → MLP (L3) → Transformer (L12–13) → nanoGPT (L13–14) → LoRA fine-tune (L16)
- Vision · MLP-MNIST (L1) → CNN-CIFAR (L7) → transfer-ResNet (L8) → ViT-CLIP (L18) → diffusion (L21–22)
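
As a taste of the Language thread's endpoint, here is a minimal unofficial sketch of the LoRA idea behind the L16 fine-tune: freeze a pretrained linear layer and learn a low-rank additive update. The class name, rank, and scaling below are illustrative choices, not the course's lab code.

```python
# Unofficial LoRA sketch: W stays frozen; we learn a rank-r update B @ A.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():        # freeze pretrained weights
            p.requires_grad_(False)
        self.A = nn.Parameter(0.01 * torch.randn(r, base.in_features))
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x):
        # base(x) + scale * x A^T B^T  ==  x (W + scale * B A)^T + bias
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(512, 512))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)   # 8192 trainable params vs 262,656 frozen in the base layer
```

With r = 8 on a 512-wide layer that is roughly 3% of the base parameters, which is the memory saving the LoRA fine-tune step trades on.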
Canonical examples
Cat vs. not-cat · MNIST · CIFAR-10 · Tiny Shakespeare · CelebA