Strategy	When to Use
Manual labeling	Small datasets, high-stakes domains
Active learning	Large unlabeled pool, expensive labels
Weak supervision	Heuristics + noisy labels at scale
LLM-assisted	Text tasks, when budget allows API calls
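
As a rough illustration of the active learning row, the sketch below picks which unlabeled samples to send for labeling by choosing the ones the model is least confident about (uncertainty sampling). The function name `least_confident` and the probabilities in `pool_probs` are made up for the example; a real pipeline would take them from a trained model.

```python
def least_confident(probabilities, k):
    """Return indices of the k samples whose top-class probability is lowest."""
    # Confidence of a prediction = probability assigned to its most likely class.
    confidences = [max(p) for p in probabilities]
    # Sort indices by ascending confidence so the least certain come first.
    ranked = sorted(range(len(probabilities)), key=lambda i: confidences[i])
    return ranked[:k]


# Hypothetical class probabilities from a model on four unlabeled samples.
pool_probs = [
    [0.95, 0.05],
    [0.55, 0.45],
    [0.80, 0.20],
    [0.51, 0.49],
]

to_label = least_confident(pool_probs, k=2)
print(to_label)  # indices of the two samples closest to a 50/50 prediction
```

Labeling budget goes to the borderline cases first, which is the point of active learning when labels are expensive.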

Domain	Techniques
Images	Flip, rotate, crop, color jitter
Text	Synonym replacement, back-translation, paraphrase
Tabular	SMOTE, noise injection
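
Two of the techniques in the table are simple enough to sketch without any library: a horizontal flip for images and Gaussian noise injection for tabular data. The helper names, the `scale` parameter, and the toy data are assumptions for illustration, not a standard API.

```python
import random


def flip_horizontal(image):
    """Mirror an image stored as a list of pixel rows."""
    return [row[::-1] for row in image]


def add_gaussian_noise(rows, scale=0.05, seed=0):
    """Return a noisy copy of numeric rows; noise is scaled per column."""
    rng = random.Random(seed)
    columns = list(zip(*rows))
    # Per-column standard deviation, so wide-range columns get larger noise.
    stds = []
    for col in columns:
        mean = sum(col) / len(col)
        variance = sum((x - mean) ** 2 for x in col) / len(col)
        stds.append(variance ** 0.5)
    # Build new rows; the input rows are left untouched.
    return [
        [x + rng.gauss(0.0, scale * s) for x, s in zip(row, stds)]
        for row in rows
    ]


image = [[1, 2, 3], [4, 5, 6]]
print(flip_horizontal(image))

table = [[1.0, 10.0, 5.0], [2.0, 20.0, 5.0], [3.0, 30.0, 5.0]]
print(add_gaussian_noise(table))
```

Augment only the training split; augmented copies in the test split would distort the evaluation.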

Concept	Why It Matters
Cross-validation	A single train/test split gives a noisy estimate; averaging over folds is more stable
Stratified K-Fold	Preserves class balance in each fold
Bias-variance tradeoff	Underfitting vs overfitting, visualized
Leakage	The #1 cause of "too good to be true" results
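
To make the stratified K-fold row concrete, here is a minimal hand-rolled sketch that deals each class's samples round-robin into folds so every fold keeps roughly the original class ratio. It skips shuffling for brevity, and `stratified_folds` is a hypothetical helper; in practice scikit-learn's `StratifiedKFold` does this job.

```python
from collections import defaultdict


def stratified_folds(labels, k):
    """Assign each sample index to one of k folds, keeping class ratios similar."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        # Deal each class's samples round-robin so every fold gets its share.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


# Toy imbalanced dataset: 8 negatives, 4 positives.
labels = [0] * 8 + [1] * 4
folds = stratified_folds(labels, k=4)

for fold_id, test_idx in enumerate(folds):
    held_out = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in held_out]
    positives = sum(labels[i] for i in test_idx)
    print(fold_id, len(train_idx), len(test_idx), positives)
```

Any preprocessing that learns from data (scaling, imputation, feature selection) should be fit on `train_idx` only inside the loop; fitting it on all rows first is a common source of leakage.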

Method	Tries	Finds Best?	When to Use
Grid Search	All combos	Best point on the grid only	< 3 hyperparameters
Random Search	Random subset	Often good enough, in fewer trials	3+ hyperparameters
Bayesian (Optuna)	Guided by past trials	Usually, in fewer trials	Expensive evaluations
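
The difference between the first two rows can be sketched on a toy objective. `validation_loss` below is a hypothetical stand-in for "train a model and return its validation loss", and the search ranges are arbitrary assumptions.

```python
import itertools
import random


def validation_loss(learning_rate, dropout):
    # Hypothetical stand-in for an expensive train-and-evaluate run.
    return (learning_rate - 0.03) ** 2 + (dropout - 0.25) ** 2


def grid_search(lr_values, dropout_values):
    """Evaluate every combination and return the best one."""
    candidates = itertools.product(lr_values, dropout_values)
    return min(candidates, key=lambda c: validation_loss(*c))


def random_search(n_trials, seed=0):
    """Evaluate n_trials random points drawn from continuous ranges."""
    rng = random.Random(seed)
    candidates = [
        (rng.uniform(0.0, 0.1), rng.uniform(0.0, 0.5)) for _ in range(n_trials)
    ]
    return min(candidates, key=lambda c: validation_loss(*c))


best_grid = grid_search([0.001, 0.01, 0.1], [0.0, 0.25, 0.5])
best_random = random_search(n_trials=9)

print("grid:  ", best_grid, validation_loss(*best_grid))
print("random:", best_random, validation_loss(*best_random))
```

Both searches spend nine evaluations here. The grid can only return values it was given, while random search tries nine distinct values per hyperparameter, which is why it tends to scale better as the number of hyperparameters grows.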

Framework	Best For	Lines of Code
Flask	Custom web apps (you write HTML)	~40
Streamlit	Data dashboards (pure Python)	~15
Gradio	ML demos (define inputs/outputs)	~10
FastAPI	Production APIs (JSON in/out)	~20

Profiling & Quantization: What We Learned

Profiling: Find the bottleneck before optimizing.

time.time() (wall clock) → cProfile (per function) → line_profiler (per line) → memory_profiler (memory use)
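
The first two steps of that progression need only the standard library. The sketch below times a call and then profiles it per function; `slow_sum` and `handle_request` are placeholder functions invented for the example, and `time.perf_counter()` is used in place of `time.time()` because it is better suited to measuring short intervals.

```python
import cProfile
import io
import pstats
import time


def slow_sum(n):
    # Deliberately slow pure-Python loop to give the profiler something to find.
    total = 0
    for i in range(n):
        total += i * i
    return total


def handle_request():
    return slow_sum(200_000)


# Step 1: coarse wall-clock timing of the whole call.
start = time.perf_counter()
handle_request()
elapsed = time.perf_counter() - start
print(f"wall time: {elapsed:.4f}s")

# Step 2: per-function breakdown with cProfile.
profiler = cProfile.Profile()
profiler.enable()
handle_request()
profiler.disable()

buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)  # top 5 entries by cumulative time
report = buffer.getvalue()
print(report)
```

Once the report points at a hot function, line_profiler and memory_profiler (both third-party packages) can narrow it down further.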

Quantization: Make models smaller with minimal accuracy loss.

FP32 (4 bytes) → INT8 (1 byte) = 4x smaller, typically < 1% accuracy drop
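
The arithmetic behind that line can be worked through by hand. This is a minimal sketch of symmetric per-tensor quantization on a made-up weight list, not how any particular framework implements it; real toolchains also handle calibration, per-channel scales, and activations.

```python
def quantize_int8(weights):
    """Symmetric quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One float scale is shared by the whole tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.03, 1.0]  # hypothetical FP32 weights
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

max_error = max(abs(a - b) for a, b in zip(weights, restored))
fp32_bytes = len(weights) * 4  # 4 bytes per FP32 value
int8_bytes = len(weights) * 1  # 1 byte per INT8 value (ignoring the one scale)

print(quantized)
print(f"max round-trip error: {max_error:.6f}")
print(f"size ratio: {fp32_bytes / int8_bytes:.0f}x")
```

The round-trip error per weight is bounded by half the scale, which is why accuracy usually degrades only slightly when the weight range is well behaved.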

ONNX: Train in Python, run anywhere (mobile, browser, edge).

Key insight: The #1 speedup for web apps is loading the model once at startup instead of per-request. Profile before you optimize.
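
The load-once pattern can be sketched independently of any web framework. Here `get_model` simulates an expensive load with a short sleep, and `LOAD_COUNT`, `get_model`, and `predict` are hypothetical names; a real app would deserialize a model file and call `predict` from a Flask or FastAPI route.

```python
import functools
import time

LOAD_COUNT = 0  # counts how many times the expensive load actually runs


@functools.lru_cache(maxsize=1)
def get_model():
    """Load the model on first call, then return the cached object."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    time.sleep(0.05)  # simulated cost of reading weights from disk
    return lambda x: x * 2  # stand-in for a real model's predict function


def predict(x):
    # Every request reuses the cached model instead of reloading it.
    model = get_model()
    return model(x)


get_model()  # warm the cache at startup so the first request is not slow

results = [predict(i) for i in range(3)]
print(results, "loads:", LOAD_COUNT)
```

Moving the load out of the request handler removes a fixed cost from every call, which is often a larger win than any model-level optimization.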

Topic	What It Covers	Where to Learn
Deep Learning	CNNs, transformers, fine-tuning	CS 337, fast.ai
MLOps	CI/CD, model registries, monitoring	Made With ML
Cloud Deployment	AWS/GCP/Azure, Kubernetes	Cloud provider docs
Data Engineering	ETL pipelines, data warehouses	dbt, Airflow
ML System Design	Scaling, A/B testing, feedback loops	Chip Huyen's book

Course Summary & What's Next