Compare model training to going to college.
Training compute is the years spent in college learning general knowledge.
After college, on a hard exam question, you don't just answer instantly. You pause, sketch on scratch paper, double-check. That's test-time compute.
Reasoning models (o1, Claude thinking) are exactly this · same base capability as a "college graduate" model · but allowed to use scratch paper before responding. The scratch paper is internal chain-of-thought that the user never sees.
Until 2024, the scaling story was almost entirely about training compute. From 2024 on, test-time compute became a second axis.
Two knobs now: (a) spend more on pretraining → better general capability; (b) spend more on per-query reasoning → better on hard problems.
Both pay off. OpenAI reported that o1's performance on math benchmarks scales smoothly with inference-time compute budget.
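One concrete way to "spend more per query" without retraining anything is self-consistency: sample several chains of thought and majority-vote the final answer. A minimal sketch below; `sample_chain_of_thought` is a hypothetical stand-in (stubbed with randomness here), where a real version would call an LLM at temperature > 0 and extract its final answer.

```python
# Sketch: spending more test-time compute via self-consistency
# (sample N reasoning chains, majority-vote the answer).
import random
from collections import Counter

def sample_chain_of_thought(question: str) -> str:
    """Hypothetical stub: a real implementation would call an LLM with
    temperature > 0 and return the final answer it extracts."""
    return random.choice(["42", "42", "41"])  # noisy reasoner, usually right

def answer_with_test_time_compute(question: str, n_samples: int = 16) -> str:
    # More samples = more inference-time compute = higher chance the
    # majority vote lands on the correct answer (up to a ceiling).
    answers = [sample_chain_of_thought(question) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

print(answer_with_test_time_compute("What is 6 * 7?"))
```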
| Model | AIME 2024 (math) | Codeforces |
|---|---|---|
| GPT-4 | 12% | ~800 Elo |
| o1 | 74% | ~1800 Elo |
| o3 | 97% | ~2700 Elo (grandmaster) |
Comparable gains on HumanEval, MATH, GPQA. This is an entirely new capability curve.
Practical rule · if the problem needs multi-step reasoning, use a reasoning model. If it needs a fast response or is simple, use a regular model.
What is the model actually doing?
A 70B Llama has 70 billion parameters. We trained it, and it works. But if it produces a wrong answer — or a dangerous one — we can't read the weights and know why.
Mechanistic interpretability (mech interp) tries to reverse-engineer specific computations inside trained networks.
Anthropic's interpretability team · circuits research (Olah et al., 2020+), dictionary learning with sparse autoencoders (2023+), scaled to production models (2024+).
A Transformer layer = experts working on a shared whiteboard.
The Transformer equation just says: read from the whiteboard (the residual stream) → add a contribution → write back.
Toy model · a 4-feature residual stream x (the whiteboard), one vector per token.
Step 1 · attention update. Attention reads x and the other tokens' streams, writes a contribution Δ_attn
(boost features 2 and 3 from related tokens).
Step 2 · FFN update. The FFN processes the updated stream, writes a contribution Δ_ffn
(slightly boost feature 1, decrease feature 2, clarify feature 4).
Step 3 · sum (the residual update). x_out = x + Δ_attn + Δ_ffn · nothing is overwritten, only added.
Same equation for every block: x ← x + Attn(LN(x)), then x ← x + FFN(LN(x)).
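A minimal numeric sketch of the three steps above, with made-up 4-feature vectors; the deltas mirror the parenthetical notes (boost features 2 and 3, then tweak 1, 2, 4).

```python
# Toy residual-stream view (illustrative numbers, not from a real model):
# each sublayer only *adds* a contribution to the shared 4-feature vector.
import numpy as np

x = np.array([1.0, 0.2, 0.0, 0.5])             # residual stream for one token

# Step 1: attention reads other tokens and writes its contribution
delta_attn = np.array([0.0, 0.6, 0.7, 0.0])    # boosts features 2 and 3
x = x + delta_attn                              # x is now [1.0, 0.8, 0.7, 0.5]

# Step 2: the FFN processes the updated stream and writes its contribution
delta_ffn = np.array([0.1, -0.3, 0.0, 0.4])    # boost 1, decrease 2, sharpen 4
x = x + delta_ffn                               # x is now [1.1, 0.5, 0.7, 0.9]

print(x)  # the "whiteboard" after one Transformer block
```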
Inside a model, a single neuron must represent many ideas at once · "bank" means river bank AND financial bank, plus context-dependent shades. This is superposition · confusing for analysis.
A sparse autoencoder forces the model to use a giant explicit dictionary. Instead of one ambiguous "bank" neuron, distinct river_bank and financial_institution features fire.
The dictionary is much wider than the residual stream (e.g., 100k features for a 12k-dim residual). Most features are 0 for any input · the active ones become human-interpretable concepts.
A normal autoencoder · narrow bottleneck. "Compress 100 words to 5."
A sparse autoencoder (SAE) · inverted bottleneck. "Have a 100,000-word dictionary, but only allowed to use 5 of them." You must pick extremely precise words.
Problem · superposition. A single neuron in the residual stream might fire for "Golden Gate Bridge", "the colour red", AND "Python syntax errors". Confusing.
SAE recipe:
1. Pick a layer and record its residual-stream activations x over a large corpus.
2. Encode into a much wider feature vector: f = ReLU(W_enc · x + b_enc).
3. Decode back: x̂ = W_dec · f + b_dec.
4. Train to minimize reconstruction error ‖x − x̂‖² plus an L1 sparsity penalty on f.
Worked numeric (toy) · push one activation vector through the SAE; out of the ~100k dictionary features, only a handful fire, one of them feature 6.
Researchers collect all inputs that activate feature 6 → all about the Golden Gate Bridge. Label: "Feature 6 = Golden Gate Bridge".
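A minimal sketch of that recipe, assuming PyTorch and toy sizes (d_model = 8, dictionary = 64) rather than the 12k-dim residual and 100k features of a real run.

```python
# Toy sparse autoencoder: reconstruct residual activations through a wide,
# sparse dictionary of features.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int = 8, dict_size: int = 64):
        super().__init__()
        self.encoder = nn.Linear(d_model, dict_size)   # wide dictionary
        self.decoder = nn.Linear(dict_size, d_model)   # back to residual dim

    def forward(self, x):
        f = torch.relu(self.encoder(x))   # feature activations, mostly ~0
        x_hat = self.decoder(f)           # reconstruction of the residual
        return x_hat, f

sae = SparseAutoencoder()
x = torch.randn(32, 8)                    # batch of residual-stream vectors
x_hat, f = sae(x)

# Training objective: reconstruct faithfully, but keep features sparse.
l1_coeff = 1e-3
loss = ((x - x_hat) ** 2).mean() + l1_coeff * f.abs().sum(dim=-1).mean()
loss.backward()
```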
Anthropic 2024 · SAE on Claude 3 Sonnet → millions of human-readable features. Clamp features → control behavior ("Golden Gate Claude" demo).
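A toy illustration of the clamping idea, with invented decoder weights, an invented feature index, and an invented scale; real steering (as in the "Golden Gate Claude" demo) operates on learned SAE features inside the running model.

```python
# Hypothetical "clamp a feature → control behavior" sketch: force one
# dictionary feature high and add its decoder direction to the residual stream.
import numpy as np

d_model, dict_size = 8, 64
rng = np.random.default_rng(0)
W_dec = rng.normal(size=(dict_size, d_model)) * 0.1   # decoder: feature -> residual

x = rng.normal(size=d_model)          # a token's residual-stream vector
golden_gate_feature = 6               # pretend feature 6 = "Golden Gate Bridge"
clamp_value = 10.0                    # much higher than it would fire naturally

# Nudge every downstream computation toward that concept.
x_steered = x + clamp_value * W_dec[golden_gate_feature]
print(x_steered)
```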
Two reasons this matters: (a) debugging · when the model gives a wrong answer, trace which internal features fired; (b) safety · look for dangerous computations (deception, hidden goals) that the output alone won't reveal.
Still a young field. Most interp results are about small models or narrow circuits. Scaling interp to frontier-level models is a 2026+ research agenda.
What the next decade of DL research looks like
As models become more capable, the cost of misalignment grows:
| | 2018 | 2024 | 2030 (?) |
|---|---|---|---|
| Failure mode | Misclassify an image | Give a wrong factual answer | Autonomously execute a bad plan |
| Cost | Annoy a user | Spread misinformation | Catastrophic |
Claude, GPT, Gemini all ship with elaborate safety stacks · constitutional AI, RL from safety feedback, red-teaming, classifier filters, refusal training. Safety is not a layer; it's the product.
| Module | Covered |
|---|---|
| Foundations (L1-L2) | why DL, UAT, depth vs width, residuals |
| Training craft (L3-L6) | recipe, SGD / Adam, schedules, regularization |
| Vision (L7-L9) | CNN mechanics, ResNet family, detection, SAM |
| Sequences → Transformers (L10-L14) | RNN/LSTM/GRU, Seq2Seq, attention, Transformer, tokenization |
| LLMs (L15-L16) | scaling laws, RoPE, GQA, LoRA, RLHF, DPO |
| Self-supervision + VLMs (L17-L18) | SimCLR, MAE, CLIP, LLaVA |
| Generative (L19-L22) | VAE, GAN, DDPM, CFG, latent diffusion |
| Systems + frontier (L23-L24) | KV-cache, quantization, agents, reasoning, interp |
Each is a PhD's worth of work. Pick one.
Predictions (take with a grain of salt):
What you learned
| Module | Lectures | Big ideas |
|---|---|---|
| 1 Foundations | L1–L3 | MLP, ResNets, training recipe |
| 2 Optimization | L4–L5 | SGD, momentum, Adam, schedules |
| 3 Regularization | L6 | double descent, augmentation, norm, dropout |
| 4 CNNs | L7–L9 | architecture evolution, detection, SAM |
| 5 Sequences | L10–L11 | LSTM, Seq2Seq, bottleneck |
| 6 Transformers | L12–L14 | attention, nanoGPT, tokenization |
| 7 LLMs | L15–L16 | scaling laws, LoRA, RLHF, DPO |
| 8 SSL + VLM | L17–L18 | SimCLR, CLIP, LLaVA |
| 9 Generative | L19–L22 | VAE, GAN, DDPM, Stable Diffusion |
| 10 Frontier | L23–L24 | inference, agents, interp |
This is the current skill floor for a DL engineer or research student.