Language Models

The Secret Behind ChatGPT

Learning to Predict the Next Token

Nipun Batra | IIT Gandhinagar

The Story So Far

Lecture What We Learned
5 Neural Networks: Learn from data
6 CNNs: See images
7 LLMs: Generate text

Today: Build a mini language model from scratch!

The Shocking Truth About ChatGPT

ChatGPT, Claude, Gemini, LLaMA...

All these AI systems do ONE thing:

Predict the next word. Then repeat.

That's it. That's the whole trick.

You Already Use This!

App You Type... It Predicts...
Phone keyboard "I'm running" late, now
Google Search "how to" cook, code
Gmail "Thanks for" your help

All next-word prediction!

Today's Goal

We'll build a model that generates Indian names!

Training Data Generated Names
Aarav, Priya, Nipun, Zara Arya, Neel, Riya, ...
Aditya, Kavya, Rohan Priti, Arav, Kavi, ...

Same principle as ChatGPT, just smaller!

Part 1: The Core Idea

Next Token Prediction

What is Next-Token Prediction?

The ONE question every language model answers:

"Given the text so far, what comes next?"

Context Next Token
"The capital of India is" "New"
"print('Hello" "World"
"nip" "u"

Why Prediction = Understanding

To predict well, you must understand!

To Predict... You Need to Know...
"Delhi is in ___" Geography
"2 + 2 = ___" Math
"for i in range(___" Python

Good prediction requires implicit knowledge!

Our Toy Problem: Name Generation

Goal: Generate Indian names character-by-character

Training data: A list of names

aarav
priya
nipun
kavya
rohan
zara
aditya

The Prediction Task

Given previous characters, predict the next one.

Previous Next
(start) a
a a
aa r
aar a
aara v
aarav (end)

This is ALL we need to learn!

Part 2: Building the Dataset

From Names to Training Examples

Our Vocabulary

First, define what characters we can use:

Token ID Meaning
. 0 Start/End marker
a 1 Letter a
b 2 Letter b
... ... ...
z 26 Letter z

Vocabulary size = 27 (26 letters + 1 special)

Creating Training Examples

From name "aarav", create (context → target) pairs:

Context Target Meaning
. a Start → first letter
a a After 'a', predict 'a'
a r After 'a', predict 'r'
r a After 'r', predict 'a'
a v After 'a', predict 'v'
v . After 'v', predict END
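
The pairs in the table can be enumerated with a short loop (a sketch for a single name; the full dataset code comes later):

```python
# Enumerate (previous char -> next char) pairs for one name.
# '.' marks both the start and the end of the name.
name = "." + "aarav" + "."
pairs = [(name[i], name[i + 1]) for i in range(len(name) - 1)]
# [('.', 'a'), ('a', 'a'), ('a', 'r'), ('r', 'a'), ('a', 'v'), ('v', '.')]
```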

Wait, Context is Just 1 Character?

Problem: Looking at only the last character isn't enough!

Context What We Want
Last char: a Could be followed by many things!
Last 2: ra More specific
Last 3: ara Even better

Solution: Use a window of k previous characters!

Context Window of Size 3

Each example has 3 chars of context:

Context Target
... a
..a a
.aa r
aar a
ara v
rav .

Sliding window creates training data!

Building the Dataset in Python

# Our training names
names = ["aarav", "priya", "nipun", "kavya", "rohan"]

# Create character to index mapping
chars = ['.'] + list('abcdefghijklmnopqrstuvwxyz')
char_to_idx = {c: i for i, c in enumerate(chars)}

# Context window size
context_size = 3

# Build training examples
X, Y = [], []  # X = contexts, Y = targets
for name in names:
    name = '.' * context_size + name + '.'
    for i in range(len(name) - context_size):
        context = name[i:i+context_size]
        target = name[i+context_size]
        X.append([char_to_idx[c] for c in context])
        Y.append(char_to_idx[target])

What Does Our Dataset Look Like?

For name "aarav" with context_size=3:

X (context indices) Y (target index)
[0, 0, 0] 1
[0, 0, 1] 1
[0, 1, 1] 18
[1, 1, 18] 1
[1, 18, 1] 22
[18, 1, 22] 0

X = input, Y = what we want to predict
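
Before training, the Python lists become tensors. A sketch using the six rows from the table above:

```python
import torch

# Rows from the table above ("aarav" with context_size=3)
X = [[0, 0, 0], [0, 0, 1], [0, 1, 1], [1, 1, 18], [1, 18, 1], [18, 1, 22]]
Y = [1, 1, 18, 1, 22, 0]

X = torch.tensor(X)  # shape: (num_examples, context_size) = (6, 3)
Y = torch.tensor(Y)  # shape: (num_examples,) = (6,)
```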

Part 3: Embeddings

From Characters to Vectors

The Problem: Neural Nets Need Numbers

Neural networks work with continuous numbers, not discrete tokens!

Bad Idea Why It Fails
a=1, b=2, c=3 Implies a is "closer" to b than c
One-hot: [1,0,0,...] 27-dimensional, no relationships

Better idea: Learn a vector for each character!

Character Embeddings

Give each character a learned vector:

Char Embedding (learned)
a [0.2, -0.5, 0.8, 0.1]
b [0.1, 0.3, -0.2, 0.4]
... ...
z [-0.3, 0.7, 0.1, -0.5]

These vectors are LEARNED during training!

The Embedding Layer

import torch
import torch.nn as nn

vocab_size = 27      # 26 letters + '.'
embed_dim = 10       # Each char → 10 numbers

# Create embedding layer
embedding = nn.Embedding(vocab_size, embed_dim)

# Look up a character
char_idx = 1  # 'a'
vector = embedding(torch.tensor([char_idx]))
# vector has shape (1, 10): one 10-dim vector for 'a'

Why Embeddings Are Powerful

The embedding layer is a learnable lookup table:

Character 10-dim Vector
Index 1 (a) Row 1 of weight matrix
Index 2 (b) Row 2 of weight matrix

During training:

  • Similar characters get similar vectors
  • The network learns useful representations
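
The lookup-table claim can be checked in a couple of lines (a sketch):

```python
import torch
import torch.nn as nn

embedding = nn.Embedding(27, 10)

# Looking up index 1 returns exactly row 1 of the weight matrix
looked_up = embedding(torch.tensor(1))
row = embedding.weight[1]
assert torch.equal(looked_up, row)
```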

Embeddings Learn Similarity!

After training, similar characters cluster together:

Character Pair In Names... Embedding Distance
'a' and 'i' Both are vowels, common Close!
'a' and 'z' Very different usage Far apart
'k' and 'c' Often interchangeable Close!

The network discovers:

  • Vowels group together
  • Common ending letters cluster
  • Rare letters are pushed to the edges

Embeddings capture meaning that we never explicitly taught!
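
Similarity between embeddings can be measured with cosine similarity. A sketch (the embedding here is untrained, so its rows are random; this only shows how you would measure closeness after training):

```python
import torch.nn as nn
import torch.nn.functional as F

embedding = nn.Embedding(27, 10)   # untrained: rows start out random

# Cosine similarity between the vectors for 'a' (index 1) and 'i' (index 9)
vec_a = embedding.weight[1]
vec_i = embedding.weight[9]
sim = F.cosine_similarity(vec_a.unsqueeze(0), vec_i.unsqueeze(0)).item()

# Before training this is near 0 on average; after training on names,
# characters used in similar positions tend to end up with higher similarity
```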

Embedding the Context

Our context is 3 characters. Embed each one!

context = [1, 1, 18]  # "aar"

# Embed each character
emb1 = embedding(torch.tensor(1))   # 10-dim vector for 'a'
emb2 = embedding(torch.tensor(1))   # 10-dim vector for 'a'
emb3 = embedding(torch.tensor(18))  # 10-dim vector for 'r'

# Concatenate them
context_emb = torch.cat([emb1, emb2, emb3])
# context_emb is now 30-dimensional!

Part 4: The Model

From Embeddings to Predictions

Model Architecture

The full pipeline:

Step Size
1. Embed 3 chars 3 × 10 = 30
2. Hidden layer 100
3. Output logits 27

Simple MLP!

The Model in PyTorch

class NameGenerator(nn.Module):
    def __init__(self, vocab_size=27, embed_dim=10,
                 hidden_dim=100, context_size=3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.fc1 = nn.Linear(context_size * embed_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x):
        # x: (batch, context_size) - character indices
        emb = self.embedding(x)       # (batch, context, embed_dim)
        emb = emb.view(emb.size(0), -1)  # (batch, context*embed_dim)
        h = torch.tanh(self.fc1(emb))    # (batch, hidden)
        logits = self.fc2(h)              # (batch, vocab_size)
        return logits

Understanding the Forward Pass

Layer Shape What Happens
Input (batch, 3) 3 character indices
Embedding (batch, 3, 10) Each char → 10-dim vector
Flatten (batch, 30) Concatenate all embeddings
Hidden (batch, 100) Learn patterns
Output (batch, 27) Score for each character
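
These shapes can be checked directly, re-using the NameGenerator class from the previous slide:

```python
import torch
import torch.nn as nn

class NameGenerator(nn.Module):
    def __init__(self, vocab_size=27, embed_dim=10,
                 hidden_dim=100, context_size=3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.fc1 = nn.Linear(context_size * embed_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x):
        emb = self.embedding(x)            # (batch, 3, 10)
        emb = emb.view(emb.size(0), -1)    # (batch, 30)
        h = torch.tanh(self.fc1(emb))      # (batch, 100)
        return self.fc2(h)                 # (batch, 27)

model = NameGenerator()
x = torch.tensor([[0, 1, 18]])   # one context: ". a r"
logits = model(x)
# logits.shape is (1, 27): one score per character in the vocabulary
```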

Part 5: Training

Learning to Predict

The Output: Logits

The model outputs 27 numbers (one per character):

logits = model(context)  # shape: (1, 27), one score per character
# logits = [2.1, 0.5, -1.2, 0.8, ...]

These are NOT probabilities yet!

Character Logit Meaning
a 2.1 Highest → most likely
b 0.5 Medium
c -1.2 Low → unlikely

Softmax: Logits → Probabilities

Convert logits to probabilities:

probs = torch.softmax(logits, dim=-1)
# probs = [0.45, 0.12, 0.02, 0.15, ...]
# Now they sum to 1!

Character Logit Probability
a 2.1 45%
b 0.5 12%
c -1.2 2%
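
Softmax is just exponentiate-and-normalize. A toy three-logit version (the slide's percentages assume all 27 logits, so the numbers here differ, but the two key properties hold either way):

```python
import math

logits = [2.1, 0.5, -1.2]            # three example scores
exps = [math.exp(z) for z in logits]
total = sum(exps)
probs = [e / total for e in exps]

# Property 1: probabilities sum to 1
assert abs(sum(probs) - 1.0) < 1e-9
# Property 2: order is preserved; the highest logit gets the highest probability
assert probs[0] > probs[1] > probs[2]
```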

The Loss Function: Cross-Entropy

How wrong was our prediction?

True Char P(true) Loss
a 0.95 0.05 (good!)
a 0.50 0.69 (okay)
a 0.01 4.6 (terrible!)

Lower probability for correct answer → higher loss!
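
The loss numbers in the table are simply -log of the probability assigned to the true character, which a few lines can reproduce:

```python
import math

# Cross-entropy for one example is -log P(true char);
# these recover the three rows of the table above
for p_true, expected in [(0.95, 0.05), (0.50, 0.69), (0.01, 4.6)]:
    loss = -math.log(p_true)
    assert abs(loss - expected) < 0.02
```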

Why Cross-Entropy Works

Cross-entropy punishes confident wrong predictions:

Scenario Loss
95% confident, correct 0.05
95% confident, WRONG 3.0

The model learns to be confident only when right!

Cross-Entropy: Worked Example

Context "aar" → True next char: 'a' (index 1)

Model predicts:

logits = [0.5, 2.1, 0.3, ...]  # 'a' has score 2.1
probs  = softmax(logits) = [0.12, 0.58, 0.08, ...]

Loss calculation:

If P(correct) was... Loss would be...
0.95 0.05 (great!)
0.58 0.54 (okay)
0.10 2.30 (bad!)

The Training Loop

model = NameGenerator()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

for epoch in range(1000):
    # Forward pass
    logits = model(X)  # X is our context tensor

    # Compute loss
    loss = criterion(logits, Y)  # Y is target tensor

    # Backward pass
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if epoch % 100 == 0:
        print(f"Epoch {epoch}: Loss = {loss.item():.4f}")

What Gets Learned?

Component What It Learns
Embeddings Vector for each character
Hidden layer Patterns like "aa" → likely "r"
Output layer Which chars follow which patterns

All weights are learned from data!

Training Progress

Epoch 0:    Loss = 3.29  (random guessing)
Epoch 100:  Loss = 2.45  (learning patterns)
Epoch 500:  Loss = 1.82  (getting better)
Epoch 1000: Loss = 1.54  (good predictions!)

Loss decreases as model learns patterns in names!

Part 6: Generation

Sampling New Names

Generating a Name

Algorithm:

  1. Start with context "..."
  2. Predict probabilities
  3. Sample one character
  4. Update context, repeat
  5. Stop at end token "."

Generation Code

def generate_name(model, context_size=3):
    context = [0] * context_size  # Start with "..."
    name = ""

    while True:
        # Get probabilities
        logits = model(torch.tensor([context]))
        probs = torch.softmax(logits, dim=-1)

        # Sample next character
        next_idx = torch.multinomial(probs, 1).item()

        if next_idx == 0:  # End token
            break

        name += chars[next_idx]
        context = context[1:] + [next_idx]  # Slide window

    return name

Generated Names

After training on Indian names:

>>> generate_name(model)
'arya'

>>> generate_name(model)
'priti'

>>> generate_name(model)
'kavish'

>>> generate_name(model)
'neha'

The model learned patterns of Indian names!

Wait... What Did We Just Build?

Let's appreciate this moment:

What You Built What It Learned (Without Being Told!)
27-char vocabulary Which chars are common
Embedding layer Vowels cluster together
Simple MLP "aa" often followed by "r"
Next-token prediction Patterns in names

You gave it names. It learned the RULES of names!

This is the SAME idea that powers ChatGPT. Just scaled up!

Why Sampling, Not Argmax?

If we always pick the most likely character:

generate()  →  "arya"
generate()  →  "arya"
generate()  →  "arya"  (same name every time!)

Sampling gives variety:

generate()  →  "arya"
generate()  →  "priti"
generate()  →  "neel"  (different each time!)
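
The difference is easy to see on a toy distribution (a sketch; the probabilities here are made up):

```python
import torch

torch.manual_seed(0)
probs = torch.tensor([0.45, 0.30, 0.15, 0.10])   # toy distribution

# Argmax: the same index on every single call
greedy = [probs.argmax().item() for _ in range(5)]
assert greedy == [0, 0, 0, 0, 0]

# Sampling: indices vary from call to call, weighted by probability;
# across 20 draws you will typically see several distinct values
samples = [torch.multinomial(probs, 1).item() for _ in range(20)]
assert all(0 <= s <= 3 for s in samples)
```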

Part 7: Temperature

Controlling Creativity

The Temperature Knob

Temperature controls how "peaked" the distribution is:

Temperature Effect
T → 0 Always pick highest (deterministic)
T = 1 Standard sampling
T → ∞ Uniform random

Temperature Examples

How T changes the distribution:

T Effect
0.5 Very peaked (safe)
1.0 Balanced
2.0 Flat (creative)

Low T = predictable, High T = surprising

Temperature in Code

def generate_with_temp(model, temperature=1.0):
    context = [0] * context_size
    name = ""

    while True:
        logits = model(torch.tensor([context]))

        # Apply temperature!
        logits = logits / temperature

        probs = torch.softmax(logits, dim=-1)
        next_idx = torch.multinomial(probs, 1).item()

        if next_idx == 0:
            break
        name += chars[next_idx]
        context = context[1:] + [next_idx]

    return name

Temperature Demo

Same model, different temperatures:

T Generated Names
0.5 arya, priya, aarav (common)
1.0 kavish, neeti, rohan (varied)
1.5 xylon, qira, zvak (unusual)

Low T = safe, High T = creative
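
The peaking effect can be checked numerically (a sketch with three toy logits; `probs_at` is just a helper for this demo):

```python
import torch

logits = torch.tensor([2.0, 1.0, 0.0])   # toy scores

def probs_at(T):
    # Divide logits by temperature before softmax
    return torch.softmax(logits / T, dim=-1)

# Probability of the top-scoring choice at each temperature
top = [probs_at(T)[0].item() for T in (0.5, 1.0, 2.0)]

# Lower temperature concentrates mass on the top choice
assert top[0] > top[1] > top[2]
```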

Temperature: A Real-World Analogy

Imagine the model is choosing where to eat:

Temperature Behavior
T = 0.1 "I'll just go to my favorite place. Always."
T = 1.0 "I'll try any good restaurant, weighted by preference."
T = 2.0 "Let's try that weird new place! Might be great, might be terrible."

When to use what?

Task Temperature Why
Code generation Low (0.2-0.5) Want correct, predictable code
Creative writing High (0.8-1.2) Want variety and surprise
Brainstorming Higher (1.0-1.5) Want unusual ideas

Part 8: The Big Picture

From Toy Model to ChatGPT

What We Built vs ChatGPT

Feature Our Model GPT-4 (reported estimates)
Vocab 27 chars ~100K tokens
Context 3 chars 128K tokens
Embedding 10 dim ~12,288 dim
Layers 2 ~120
Parameters ~6,000 ~1.8 trillion

Same principle. Different scale!

Key Differences in Real LLMs

Our Model Real LLMs
Characters Subword tokens
MLP Transformer (attention)
3 char context Thousands of tokens
Train on names Train on internet

Tokenization: Not Chars, Not Words

Real LLMs use subword tokens:

Method Problem
Characters Too slow
Words Vocab too big
Subwords Just right!

Tokenization Examples

Text Tokens
"Hello" ["Hello"]
"ChatGPT" ["Chat", "G", "PT"]
"unhappiness" ["un", "happiness"]
"Nipun" ["N", "ip", "un"]

Common words = 1 token, rare words = multiple tokens
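
Real tokenizers are trained with byte-pair encoding; a toy greedy longest-match over a hand-picked vocabulary (the `vocab` and `tokenize` below are made up for illustration) still shows the subword idea:

```python
# A toy subword tokenizer: greedy longest-match against a fixed vocabulary.
# (Real LLMs learn their vocabulary with BPE; this only illustrates the idea.)
vocab = {"un", "happiness", "happy", "ness",
         "h", "a", "p", "i", "n", "e", "s", "u"}

def tokenize(text):
    tokens, i = [], 0
    while i < len(text):
        # Try the longest matching substring first
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            raise ValueError(f"cannot tokenize {text[i]!r}")
    return tokens

# Common pieces become single tokens
assert tokenize("unhappiness") == ["un", "happiness"]
```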

The "Strawberry" Problem

"How many r's in strawberry?"

The model sees: ["str", "aw", "berry"]

It doesn't see individual letters!

Task LLMs struggle because...
Counting letters Tokens ≠ characters
Spelling Can't see each letter
Anagrams No character access

Try It Yourself: OpenAI Tokenizer

https://platform.openai.com/tokenizer

Type any text and see how it gets tokenized!

Your Name Tokens
"Nipun" ?
"Aarav" ?
Your name Try it!

What is Attention?

Problem: Long-range dependencies

"The cat sat on the mat. It was comfortable."

What does "It" refer to?

Attention: Let each position "look at" all other positions!

Attention: The Intuition

Our MLP can only look at fixed 3 characters.

What if we need to look at something 100 characters ago?

MLP (Fixed Window) Attention (Dynamic)
Always looks at last 3 Looks at whatever is relevant
"mat" can't see "cat" "It" can attend to "cat"
Fixed pattern Learns where to look!

Key insight: The model LEARNS which positions to attend to!

"When predicting after 'It', pay attention to 'cat' not 'mat'"
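
The core computation is small. A sketch of single-head self-attention on random vectors (no learned Q, K, V projections, which real transformers add):

```python
import math
import torch

torch.manual_seed(0)
seq_len, d = 5, 8                        # 5 positions, 8-dim vectors
x = torch.randn(seq_len, d)

# Scaled dot-product self-attention, stripped to the essentials
scores = x @ x.T / math.sqrt(d)          # (5, 5): how much i attends to j
weights = torch.softmax(scores, dim=-1)  # each row is a distribution over positions
out = weights @ x                        # each output is a weighted mix of ALL positions

# Every position gets its own attention distribution over the whole sequence
assert weights.shape == (5, 5)
assert torch.allclose(weights.sum(dim=-1), torch.ones(seq_len))
```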

The Transformer (2017)

"Attention Is All You Need"

Innovation Benefit
Self-attention Look at all context
Parallel processing Very fast training
Stacking layers Deep understanding

This enabled GPT, BERT, and all modern LLMs!

From Prediction to Assistant

Pre-training alone gives a text completer, not an assistant.

Stage What It Learns
Pre-training Predict next token (internet text)
Fine-tuning Follow instructions
RLHF Be helpful, safe, honest

Lecture 08 covers this journey!

Summary: The Recipe

  1. Define vocabulary (chars or tokens)
  2. Build training data (context → target pairs)
  3. Create embeddings (tokens → vectors)
  4. Build model (embed → hidden → output)
  5. Train with cross-entropy (predict correctly)
  6. Generate by sampling (predict → sample → repeat)
  7. Control with temperature (creativity knob)

Key Takeaways

Concept Key Insight
Next-token prediction The ONLY task LLMs do
Embeddings Tokens become meaningful vectors
Context window How much history the model sees
Softmax Turn scores into probabilities
Cross-entropy Punish wrong predictions
Sampling Create variety in generation
Temperature Control creativity vs safety

What's Next?

Lecture 08: From Language Model to Assistant

Topic Question
Pre-training How to train on internet scale?
Fine-tuning How to follow instructions?
RLHF How to be helpful and safe?
ChatGPT How it all comes together?

You Built a Language Model!

The Secret: Predict Next Token, Repeat

Key takeaways:

  • LLMs = next-token prediction at scale
  • Training = minimize cross-entropy loss
  • Generation = sample from predicted distribution
  • Temperature = creativity control

Try it yourself: Build makemore with Karpathy's tutorial!

Questions?