Week	What We Learned	Key Tool
Week 1-5	Data pipeline: collect → clean → label → augment	pandas, Label Studio, Snorkel
Week 6	Use foundation models via APIs	OpenAI, Gemini
Week 7	Evaluate models: train/test, CV, bias-variance	`cross_val_score`, StratifiedKFold

Section	Topic
Part 1	Hyperparameter Tuning — from brute force to smart search
Part 2	Experiment Tracking — tame the chaos of 100+ runs
Part 3	Reproducibility — make experiments repeatable
Part 4	AutoML — let the computer search for you

Model	Parameters (learned)	How many?
Linear Regression	Weights, bias	One per feature + 1
Neural Network	All weights and biases across every layer	Thousands to billions
Decision Tree	Split thresholds, split features, leaf values	Depends on depth
KNN	None! (stores all training data)	0

Hyperparameters: What YOU Choose

Hyperparameters are knobs you set before training — they control how the model learns.

Model	Hyperparameters (you choose)
Decision Tree	`max_depth`, `min_samples_leaf`, `min_samples_split`
Random Forest	`n_estimators` (number of trees), `max_depth`, `max_features`
Gradient Descent	`learning_rate`, `n_iterations`, `batch_size`
KNN	`n_neighbors` (k), distance metric
Neural Network	Number of layers, neurons per layer, activation function, dropout rate
SVM	`C` (regularization), `kernel`, `gamma`

from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(n_estimators=100, max_depth=10)  # YOU set these
model.fit(X_train, y_train)   # then training learns the parameters

	Parameters	Hyperparameters
Set by	Training algorithm	You (the engineer)
When	During `model.fit()`	Before `model.fit()`
How	Learned from data	Trial and error, or tuning (today!)
ML examples	Weights, split thresholds	max_depth, learning_rate
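The split in the table can be checked directly in scikit-learn. This is a minimal sketch, assuming the Iris dataset and a logistic regression model (neither appears in the slides): hyperparameters are readable before `fit()`, while learned parameters such as `coef_` exist only afterwards.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Hyperparameters: chosen by us, readable before fit() is ever called.
model = LogisticRegression(C=0.5, max_iter=1000)
hyperparams = model.get_params()
print("C before fit:", hyperparams["C"])

# Parameters: they do not exist until training has run.
had_coef_before = hasattr(model, "coef_")
model.fit(X, y)
had_coef_after = hasattr(model, "coef_")

print("coef_ before fit:", had_coef_before)
print("coef_ after fit: ", had_coef_after)
print("coef_ shape:", model.coef_.shape)  # one weight per (class, feature)
```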

Strategy	How It Works	Smart?
Grid Search	Drill every 100m, evenly spaced	No — wastes drills in barren areas
Random Search	Drill at random locations	Better — covers more ground
Bayesian Optimization	Look at past drills, build a map, drill where gold is likely	Yes!
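The first two strategies map onto `GridSearchCV` and `RandomizedSearchCV` in scikit-learn. The sketch below gives both the same budget of 9 configurations. The dataset, the value ranges, and the small forest size are illustrative assumptions, not values from the lecture.

```python
from scipy.stats import randint
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = load_breast_cancer(return_X_y=True)
forest = RandomForestClassifier(n_estimators=25, random_state=0)

# Grid search: every combination of the listed values (3 x 3 = 9 configs).
grid = GridSearchCV(
    forest,
    param_grid={"max_depth": [2, 4, 8], "min_samples_leaf": [1, 5, 10]},
    cv=3,
)
grid.fit(X, y)

# Random search: the same budget of 9 configs, drawn from ranges instead.
rand = RandomizedSearchCV(
    forest,
    param_distributions={
        "max_depth": randint(2, 17),         # integers 2..16
        "min_samples_leaf": randint(1, 11),  # integers 1..10
    },
    n_iter=9,
    cv=3,
    random_state=0,
)
rand.fit(X, y)

print("grid  :", grid.best_params_, round(grid.best_score_, 3))
print("random:", rand.best_params_, round(rand.best_score_, 3))
```

With the same budget, the grid only ever tries 3 distinct values per hyperparameter, while random search can try up to 9.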

	Problem 1: Active Learning	Problem 2: Bayesian Optimization
Goal	Estimate the gold distribution everywhere	Find the location of maximum gold
Question	"What does the whole landscape look like?"	"Where is the richest deposit?"
Strategy	Sample where most uncertain	Sample where score likely highest
Use case	Data labeling (Weeks 4-5)	Hyperparameter tuning (today!)

General Formulation	Gold Mining Version
Unknown function f(x)	Gold concentration at location x
Each evaluation is expensive	Each drill costs ₹10,000
We want the x that maximizes f(x)	We want the richest drilling location
Limited budget of evaluations	We can only afford a limited number of drills
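The formulation above can be sketched as a short loop: fit a model to the drills so far, score every candidate location, drill the best-scoring one, repeat. This is a rough sketch under stated assumptions: `gold` is a made-up stand-in for the expensive function, the model is scikit-learn's `GaussianProcessRegressor` with a Matern kernel, the acquisition function is Expected Improvement with an assumed margin `xi`, and the budget is 10 drills.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern


def gold(x):
    """Hypothetical gold concentration; in practice this is the expensive call."""
    return np.exp(-((x - 7.0) ** 2) / 2.0) + 0.5 * np.exp(-((x - 2.0) ** 2) / 0.5)


def expected_improvement(mean, std, best, xi=0.01):
    """Expected Improvement for maximization (xi is an assumed margin)."""
    std = np.maximum(std, 1e-12)  # avoid division by zero at drilled points
    z = (mean - best - xi) / std
    return (mean - best - xi) * norm.cdf(z) + std * norm.pdf(z)


candidates = np.linspace(0.0, 10.0, 201).reshape(-1, 1)
drilled_x = np.array([[1.0], [9.0]])  # two initial drills
drilled_y = gold(drilled_x).ravel()

for _ in range(8):  # 8 more drills, 10 in total
    gp = GaussianProcessRegressor(
        kernel=Matern(nu=2.5), normalize_y=True, alpha=1e-6, random_state=0
    )
    gp.fit(drilled_x, drilled_y)  # update the map
    mean, std = gp.predict(candidates, return_std=True)
    ei = expected_improvement(mean, std, drilled_y.max())
    next_x = candidates[np.argmax(ei)].reshape(1, 1)  # most promising spot
    drilled_x = np.vstack([drilled_x, next_x])
    drilled_y = np.append(drilled_y, gold(next_x).ravel())

best = drilled_x[np.argmax(drilled_y), 0]
print(f"best location found: {best:.2f}, gold: {drilled_y.max():.3f}")
```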

Active Learning	Bayesian Optimization
"Where am I most uncertain?"	"Where might the score be highest?"
Spreads samples evenly	Focuses samples near the peak
Great for labeling data	Great for tuning hyperparameters
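The difference comes down to one line: both approaches read the same model, but they rank candidate locations differently. A minimal sketch, assuming three made-up drill results, a fixed RBF kernel, and an upper-confidence-bound rule with an assumed weight `kappa` standing in for the BayesOpt acquisition function:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Hypothetical drill results: location -> gold found.
drilled_x = np.array([[1.0], [2.0], [3.0]])
drilled_y = np.array([0.2, 0.5, 0.9])

# Fixed kernel (optimizer=None) keeps the example deterministic.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), optimizer=None, alpha=1e-6)
gp.fit(drilled_x, drilled_y)

candidates = np.linspace(0.0, 10.0, 101).reshape(-1, 1)
mean, std = gp.predict(candidates, return_std=True)

# Active learning: drill where the map is least certain.
al_next = candidates[np.argmax(std), 0]

# Bayesian optimization (upper confidence bound): drill where the score
# could plausibly be highest, trading predicted mean against uncertainty.
kappa = 1.0  # assumed exploration weight
bo_next = candidates[np.argmax(mean + kappa * std), 0]

print(f"active learning drills at x = {al_next:.1f}")
print(f"BayesOpt (UCB) drills at  x = {bo_next:.1f}")
```

Active learning heads for the unexplored far end of the field, while the UCB rule stays close to the best drill so far.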

Gold mining concept	Optuna equivalent
One drill location	`trial`
How much gold found	`objective(trial)` — you write this
The map of all drills	`study`
Richest spot found	`study.best_params`

	Grid	Random	BayesOpt (Optuna)
Uses past results?	No	No	Yes
Intelligence	None	None	High
Efficiency	Low	Medium	High
Scales to many params	No	Yes	Yes
Pruning support	No	No	Yes

Problem	What Happens
Lost configs	"What hyperparameters gave me 85.7%?" — no record
No comparison	50 runs in a notebook, can't tell which is best at a glance
No history	Overwrite a cell → previous results gone forever
No reproducibility	"It worked yesterday" — but you don't know what changed
Wasted time	Re-run experiments you've already tried but forgot about

Category	Examples	Why
Config	Hyperparameters, model type, dataset version	Know what you tried
Metrics	Accuracy, loss, F1 — per step and final	Know what worked
Artifacts	Model weights, plots, confusion matrices	Reproduce the best
Environment	Python version, package versions, git hash	Debug differences
Metadata	Run name, tags, notes, timestamp	Organize and search
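To make the categories concrete, here is a minimal standard-library sketch of a run logger that appends one JSON line per run. The function names `log_run` and `best_run`, the file layout, and the example numbers are all hypothetical. Artifacts and the git hash are left out for brevity, and a real tracking tool handles far more.

```python
import json
import platform
import sys
import tempfile
import time
from pathlib import Path


def log_run(path, config, metrics, tags=()):
    """Append one run as a JSON line and return the record that was written."""
    record = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),  # metadata
        "config": config,            # know what you tried
        "metrics": metrics,          # know what worked
        "environment": {             # debug differences between machines
            "python": platform.python_version(),
            "platform": sys.platform,
        },
        "tags": list(tags),          # organize and search
    }
    with Path(path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


def best_run(path, metric):
    """Return the logged run with the highest value of `metric`."""
    with Path(path).open(encoding="utf-8") as f:
        runs = [json.loads(line) for line in f if line.strip()]
    return max(runs, key=lambda r: r["metrics"][metric])


# Usage with made-up numbers: two runs, then ask which config was best.
log_file = Path(tempfile.mkdtemp()) / "runs.jsonl"
log_run(log_file, {"max_depth": 4}, {"accuracy": 0.81})
log_run(log_file, {"max_depth": 8}, {"accuracy": 0.857}, tags=["baseline"])
print(best_run(log_file, "accuracy")["config"])
```

Because every run is appended rather than overwritten, the question "what hyperparameters gave me 85.7%?" becomes a lookup instead of a guess.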

Feature	Details
Local storage	SQLite in `~/.cache/huggingface/trackio/`
Dashboard	Gradio-based, runs locally (`trackio show`)
W&B-compatible API	`init`, `log`, `finish` — same pattern
Non-blocking logging	Background thread, thousands of logs/sec
CLI	`trackio list`, `trackio show` for quick access
Free forever	No account, no cloud, no cost

Tool	Hosting	Best For
Trackio	Local	Free, simple, course projects
MLflow	Self-hosted	Enterprise, model registry
W&B	Cloud	Teams, sweeps, rich visualizations
TensorBoard	Local	TF/PyTorch training curves

Part 2: Tracking	Part 3: Reproducibility
"Which run had 85.7%?"	"Can I get 85.7% again?"
Records the what	Ensures the how is repeatable
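"Can I get 85.7% again?" has two parts: the same seed should give the same number, and different seeds should give a similar one. A minimal sketch, assuming a random forest on the breast cancer dataset (both chosen for illustration):

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)


def run(seed):
    """Train and score once; the seed controls both the split and the model."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    model = RandomForestClassifier(n_estimators=50, random_state=seed)
    return model.fit(X_tr, y_tr).score(X_te, y_te)


# Same seed twice gives an identical score: the run is repeatable.
assert run(42) == run(42)

# Five seeds: report the spread rather than a single lucky number.
scores = np.array([run(seed) for seed in range(5)])
print(f"accuracy: {scores.mean():.3f} ± {scores.std():.3f} over {len(scores)} seeds")
```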

Good for	Be careful when
Tabular data (CSVs)	Model must be interpretable
Quick baselines	Latency matters (real-time serving)
Lack time or ML expertise	Model must fit on edge device
Kaggle competitions	Non-tabular data (images, text)

Strategy	Key Idea	When to Use
Grid search	Try every combination	2-3 params, small grid
Random search	Sample randomly, better coverage	Quick exploration, 4+ params
Bayesian (Optuna)	Learn from past results, focus on promising areas	Serious tuning, expensive evaluations
AutoML	Loop over model families + tuning	Find the performance ceiling
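The last row, "loop over model families + tuning", can be sketched by hand in a few lines. This is a toy illustration of the idea, not how any particular AutoML library works. The two families and their search spaces are assumptions made for the example.

```python
from scipy.stats import loguniform, randint
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Each model family gets its own search space (hypothetical choices).
families = {
    "logreg": (
        make_pipeline(StandardScaler(), LogisticRegression(max_iter=2000)),
        {"logisticregression__C": loguniform(1e-3, 1e2)},
    ),
    "forest": (
        RandomForestClassifier(n_estimators=30, random_state=0),
        {"max_depth": randint(2, 13), "min_samples_leaf": randint(1, 11)},
    ),
}

results = {}
for name, (estimator, space) in families.items():
    # Tune each family with the same small budget, then keep its best score.
    search = RandomizedSearchCV(estimator, space, n_iter=5, cv=3, random_state=0)
    search.fit(X, y)
    results[name] = (search.best_score_, search.best_params_)

winner = max(results, key=lambda name: results[name][0])
print("winner:", winner, results[winner])
```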

Practice	Key Idea
Experiment tracking	Log every run with Trackio (`init`, `log`, `finish`)
sklearn reproducibility	Setting `random_state=42` on every estimator and splitter is usually enough
Multi-seed reporting	Report mean ± std across 5 seeds
Pipeline	Put preprocessing inside to prevent leakage
AL vs BayesOpt	Same GP model, different goal (learn everywhere vs find the max)
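The Pipeline row is easiest to see in code. In this minimal sketch (dataset and KNN model assumed for illustration), the scaler is a pipeline step, so cross-validation refits it on the training folds only.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# The scaler lives inside the pipeline, so cross_val_score refits it on the
# training folds only; the held-out fold never influences its mean or std.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
])

# A seeded splitter keeps the folds identical from run to run.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(pipe, X, y, cv=cv)
print(f"CV accuracy: {scores.mean():.3f} ± {scores.std():.3f}")
```

Scaling `X` once before splitting would let test-fold statistics leak into training, which is what the pipeline prevents.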

Tuning, AutoML & Experiment Tracking

Week 8: CS 203 - Software Tools and Techniques for AI

Previously on CS 203...

Today's Roadmap

Where We Are

Part 1: Hyperparameter Tuning

Parameters: What the Model Learns

Hyperparameters: What YOU Choose

Parameters vs Hyperparameters: Summary

Motivating Example: Which Polynomial Degree?

The Gold Mining Analogy

Three Strategies for Finding Gold

Strategy 1: Grid Search

Grid Search in 1D: The Simplest Approach

Grid Search in Multiple Dimensions

Grid Search: The Explosion Problem

Grid Search: The sklearn Way

Strategy 2: Random Search

Grid's Hidden Problem: Wasted Evaluations

Why Random Beats Grid in 2D

Random Search in Code

But Both Are Still Blind!

Strategy 3: Bayesian Optimization

The Gold Field: What's Really Underground

Two Problems, One Tool

The Model: A Map with Error Bars

Before Any Drills: The Prior

After a Few Drills: The Posterior

Quick Detour: Active Learning

Active Learning Refresher (Weeks 4-5)

Active Learning: Iterations 0-9

Back to Bayesian Optimization

BayesOpt: The Problem Statement

BayesOpt: The Algorithm

Where to Drill Next? The Explore-Exploit Dilemma

Acquisition Functions: Scoring Every Location

Expected Improvement: The Most Common Acquisition Function

Other Ways to Pick the Next Drill

BayesOpt in Action: Iterations 0-9

AL vs BayesOpt: Side by Side

AL vs BayesOpt: Iterations 0-9

AL vs BayesOpt: The Takeaway

Going to 2D (and Beyond)

Gold Mining in 2D

BayesOpt in 2D: Iterations 0-8