CS 203: Software Tools and Techniques for AI
Lecture Slides & Lab Exercises
Instructor: Prof. Nipun Batra, IIT Gandhinagar
Semester: January 2026
Format: 1.5-hour lecture + 3-hour hands-on lab each week
Course Website: nipunbatra.github.io/stt-ai-26
Repository: github.com/nipunbatra/stt-ai-teaching
About This Course
This 15-week course covers the complete software engineering stack for modern AI development. While most ML courses focus on algorithms, this course focuses on the engineering practices that make ML systems work in the real world.
Learning Outcomes
By the end of this course, you will:
- Build end-to-end ML systems: from data collection to production deployment
- Choose the right tools: know which tool to use for any data/ML task
- Debug systematically: understand why things break and fix them
- Write production code: code that others can understand, maintain, and reproduce
What You’ll Learn
| Module | Key Skills |
|---|---|
| Data Engineering | APIs, web scraping, data validation, annotation pipelines |
| Labeling at Scale | Active learning, weak supervision, LLM-based labeling |
| Model Development | AutoML, transfer learning, LLM fine-tuning |
| Deployment | Docker, FastAPI, Streamlit, CI/CD |
| Production | Model optimization, profiling, drift monitoring |
Part I: Data Engineering (Weeks 1-5)
Week 1: Data Collection
Build robust data collection pipelines using HTTP APIs and web scraping.
Key Tools: Chrome DevTools, curl, requests, BeautifulSoup, Playwright
| Materials | |
|---|---|
| Lecture | PDF ∙ HTML ∙ Source |
| Lab (Notebook) | View ∙ Download .ipynb |
Topics Covered:
- HTTP fundamentals (methods, status codes, headers)
- REST APIs and authentication
- Rate limiting and pagination
- Web scraping with BeautifulSoup
- Building a movie data collector
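The pagination and rate-limiting topics above can be sketched as a small loop. This is a minimal illustration rather than the lab's actual collector: `fetch_all_pages`, `fake_fetch_page`, and the `results` / `total_pages` response fields are hypothetical, and the in-memory `fake_fetch_page` stands in for a real HTTP call so the example runs offline.

```python
import time
from typing import Callable, Iterator


def fetch_all_pages(
    fetch_page: Callable[[int], dict],
    min_interval: float = 0.5,
    max_pages: int = 100,
) -> Iterator[dict]:
    """Yield every item from a paginated source, pausing between requests."""
    page = 1
    while page <= max_pages:
        started = time.monotonic()
        payload = fetch_page(page)
        yield from payload["results"]
        if page >= payload["total_pages"]:
            break
        page += 1
        # Simple client-side rate limit: wait out the rest of the interval.
        elapsed = time.monotonic() - started
        time.sleep(max(0.0, min_interval - elapsed))


# Hypothetical in-memory data standing in for a real movie API.
FAKE_MOVIES = [{"id": i, "title": f"Movie {i}"} for i in range(1, 8)]


def fake_fetch_page(page: int, page_size: int = 3) -> dict:
    """Return one page of FAKE_MOVIES in an assumed response shape."""
    start = (page - 1) * page_size
    total_pages = -(-len(FAKE_MOVIES) // page_size)  # ceiling division
    return {
        "results": FAKE_MOVIES[start:start + page_size],
        "total_pages": total_pages,
    }


movies = list(fetch_all_pages(fake_fetch_page, min_interval=0.01))
```

In a real collector, `fetch_page` would wrap an HTTP request, and the interval would follow the limits documented by the API being called.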
Week 2: Data Validation & Quality
Ensure data integrity through validation schemas and cleaning pipelines.
Key Tools: jq, csvkit, Pydantic, pandas
| Materials | |
|---|---|
| Lecture | PDF ∙ HTML ∙ Source |
| Lab (Notebook) | View ∙ Download .ipynb |
Topics Covered:
- CLI data inspection (head, tail, wc, jq)
- Schema validation with JSON Schema and Pydantic
- Data profiling and quality metrics
- Handling missing values and type mismatches
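To make the validation topics above concrete, here is a stdlib-only sketch that uses a dataclass as a stand-in for a Pydantic model. The `MovieRecord` fields, the accepted year range, and the `validate_rows` helper are assumptions made for illustration; Pydantic provides richer coercion and error reporting than this.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MovieRecord:
    title: str
    year: int
    rating: Optional[float] = None  # a missing rating is allowed

    def __post_init__(self) -> None:
        if not isinstance(self.title, str) or not self.title.strip():
            raise ValueError("title must be a non-empty string")
        # Coerce numeric strings such as "1995"; int() raises on "n/a".
        self.year = int(self.year)
        if not 1888 <= self.year <= 2100:  # assumed plausible range
            raise ValueError(f"year out of range: {self.year}")
        if self.rating in ("", None):
            self.rating = None  # normalize empty strings to missing
        else:
            self.rating = float(self.rating)
            if not 0.0 <= self.rating <= 10.0:
                raise ValueError(f"rating out of range: {self.rating}")


def validate_rows(rows):
    """Split raw dict rows into validated records and (index, error) pairs."""
    valid, errors = [], []
    for index, row in enumerate(rows):
        try:
            valid.append(MovieRecord(**row))
        except (TypeError, ValueError) as exc:
            errors.append((index, str(exc)))
    return valid, errors


rows = [
    {"title": "Heat", "year": "1995", "rating": "8.3"},
    {"title": "Unknown", "year": "n/a"},
    {"title": "", "year": 2001, "rating": 7.0},
    {"title": "Solaris", "year": 1972, "rating": ""},
]
valid, errors = validate_rows(rows)
```

Collecting errors instead of stopping at the first failure makes it easier to profile how many rows are bad and why.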
Week 3: Data Labeling & Annotation
Set up annotation workflows and measure inter-annotator agreement.
Key Tools & Metrics: Label Studio, CVAT, Cohen’s Kappa
| Materials | |
|---|---|
| Lecture | PDF ∙ HTML ∙ Source |
| Lab (Notebook) | View ∙ Download .ipynb |
Topics Covered:
- Annotation task types (text, image, audio, video)
- Setting up Label Studio
- Inter-annotator agreement metrics (Kappa, IoU)
- Quality control and annotation guidelines
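The two agreement metrics named above can be computed in a few lines. The following is a plain-Python sketch: Cohen's kappa for two annotators assigning categorical labels, and intersection over union (IoU) for two axis-aligned bounding boxes. The function names and the `(x_min, y_min, x_max, y_max)` box format are choices made for this example.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and equally long")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected if both annotators labeled independently at random
    # according to their own label frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1 - expected)


def box_iou(box_a, box_b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = min(box_a[2], box_b[2]) - max(box_a[0], box_b[0])
    inter_h = min(box_a[3], box_b[3]) - max(box_a[1], box_b[1])
    intersection = max(0, inter_w) * max(0, inter_h)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union else 0.0


kappa = cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0])  # 0.5
iou = box_iou((0, 0, 2, 2), (1, 1, 3, 3))         # 1/7
```

In the first call the annotators agree on 3 of 4 items (0.75) while chance agreement is 0.5, so kappa is (0.75 - 0.5) / (1 - 0.5) = 0.5.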
Week 4: Optimizing Labeling
Reduce labeling costs using active learning, weak supervision, and LLM-based labeling.
Key Tools: modAL, Snorkel, cleanlab, Gemini API
| Materials | |
|---|---|
| Lecture | PDF ∙ HTML ∙ Source |
| Lab (Notebook) | View ∙ Download .ipynb |
Topics Covered:
- Active learning strategies (uncertainty sampling, query by committee)
- Weak supervision with labeling functions (Snorkel)
- LLM-based labeling with Gemini API
- Handling noisy labels (cleanlab)
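As an illustration of uncertainty sampling from the list above, the sketch below ranks unlabeled pool items by the entropy of their predicted class probabilities and picks the most uncertain ones for annotation. The probabilities are made-up numbers, and `select_most_uncertain` is a hypothetical helper, not the modAL API.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_uncertain(probabilities, k):
    """Return indices of the k pool items with the highest entropy."""
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: entropy(probabilities[i]),
        reverse=True,
    )
    return ranked[:k]


# Made-up model outputs for four unlabeled examples (binary task).
pool_probs = [
    [0.98, 0.02],
    [0.55, 0.45],
    [0.80, 0.20],
    [0.50, 0.50],
]
to_label = select_most_uncertain(pool_probs, k=2)
```

Here items 3 and 1 are selected because their predictions are closest to a coin flip, which is where a new label is expected to be most informative.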
Week 5: Data Augmentation
Expand training datasets through synthetic transformations.
Key Tools: Albumentations, nlpaug, audiomentations
| Materials | |
|---|---|
| Lecture | PDF ∙ HTML ∙ Source |
| Lab (Notebook) | View ∙ Download .ipynb |
Topics Covered:
- Image augmentation (geometric, color, noise)
- Text augmentation (synonym replacement, back-translation)
- Audio augmentation (noise, pitch, speed)
- AutoAugment and test-time augmentation
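A geometric and a noise augmentation from the list above can be sketched without any imaging library. The example treats a grayscale image as a list of pixel rows; the function names and default parameters are assumptions, and libraries such as Albumentations offer far more complete transforms.

```python
import random


def horizontal_flip(image):
    """Mirror each row of a grayscale image (list of pixel rows)."""
    return [row[::-1] for row in image]


def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise and clamp pixels to 0..255."""
    return [
        [min(255, max(0, round(p + rng.gauss(0, sigma)))) for p in row]
        for row in image
    ]


def augment(image, rng, flip_prob=0.5, sigma=5.0):
    """Randomly flip, then add noise; the input image is left unchanged."""
    if rng.random() < flip_prob:
        out = horizontal_flip(image)
    else:
        out = [row[:] for row in image]
    return add_gaussian_noise(out, sigma, rng)


image = [[0, 50, 100], [150, 200, 255]]
rng = random.Random(0)  # seeded so the augmentation is repeatable
augmented = augment(image, rng)
```

Passing an explicit seeded `random.Random` keeps augmentation reproducible across runs, which matters when comparing training experiments.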
Part II: Model Development & Deployment (Weeks 6-11)
Week 6: LLM APIs & Multimodal AI
Leverage foundation models for text, vision, and audio tasks.
Key Tools: Gemini API, OpenRouter, OpenAI API
| Materials | |
|---|---|
| Lecture | PDF ∙ HTML ∙ Source |
| Lab (Notebook) | View ∙ Download .ipynb |
Topics Covered:
- LLM fundamentals (tokens, temperature, sampling)
- Prompt engineering (zero-shot, few-shot, chain-of-thought)
- Multimodal capabilities (vision, audio, video)
- Structured outputs and function calling
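The effect of temperature on sampling can be shown with a softmax over a toy set of logits. This is a sketch of the general idea only: the logits are invented, and hosted LLM APIs apply temperature internally with details that may differ.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities; lower temperature sharpens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [logit / temperature for logit in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


toy_logits = [2.0, 1.0, 0.0]  # invented scores for three candidate tokens
sharp = softmax_with_temperature(toy_logits, 0.5)
neutral = softmax_with_temperature(toy_logits, 1.0)
flat = softmax_with_temperature(toy_logits, 2.0)
token = sample_token(toy_logits, 1.0, random.Random(0))
```

At temperature 0.5 the top token takes most of the probability mass, while at 2.0 the distribution moves toward uniform and outputs become more varied.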
Week 7: Model Development & Training
Train and evaluate machine learning models effectively.
Key Tools: scikit-learn, AutoGluon, PyTorch, LoRA
| Materials | PDF | HTML | Source |
|---|---|---|---|
| Lecture | Download | View | GitHub |
| Lab | Download | View | GitHub |
Topics Covered:
- Baseline models and cross-validation
- AutoML with AutoGluon
- Transfer learning and fine-tuning
- LLM fine-tuning with LoRA
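The first topic above, baselines and cross-validation, can be sketched in plain Python: a k-fold index generator plus a majority-class baseline scored on each held-out fold. Both helpers are hypothetical illustrations; scikit-learn offers maintained equivalents such as `KFold` and `DummyClassifier`.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of k shuffled folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


def majority_baseline_accuracy(labels, k=5, seed=0):
    """Mean held-out accuracy of always predicting the majority train label."""
    scores = []
    for train, test in k_fold_indices(len(labels), k, seed):
        majority = Counter(labels[i] for i in train).most_common(1)[0][0]
        scores.append(sum(labels[i] == majority for i in test) / len(test))
    return sum(scores) / len(scores)


labels = [0] * 8 + [1] * 2  # toy imbalanced dataset
baseline = majority_baseline_accuracy(labels, k=5)
```

On this toy dataset the baseline reaches 0.8 accuracy without learning anything, which is the number any trained model needs to beat before it is worth keeping.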
Week 8: Reproducibility & Environments
Ensure experiments are reproducible and shareable.
Key Tools: Docker, DVC, MLflow, Weights & Biases
| Materials | PDF | HTML | Source |
|---|---|---|---|
| Lecture | Download | View | GitHub |
| Lab | Download | View | GitHub |
Topics Covered:
- Virtual environments and dependency management
- Docker containers for ML
- Data versioning with DVC
- Experiment tracking with MLflow
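Two ideas behind the tools above are fixing random seeds and recording content hashes of the configuration and data alongside each result. The sketch below shows both with the standard library; `fingerprint` and `run_experiment` are hypothetical helpers and do not reproduce the DVC or MLflow interfaces.

```python
import hashlib
import json
import random


def fingerprint(obj) -> str:
    """Stable SHA-256 digest of a JSON-serializable object."""
    blob = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def run_experiment(config, data):
    """Run a toy 'experiment' and record what is needed to reproduce it."""
    rng = random.Random(config["seed"])  # local RNG, no global state
    sample = rng.sample(data, k=config["sample_size"])
    return {
        "config_hash": fingerprint(config),
        "data_hash": fingerprint(data),
        "metric": sum(sample) / len(sample),
    }


data = list(range(100))
config = {"seed": 42, "sample_size": 10}
first = run_experiment(config, data)
second = run_experiment(config, data)
changed = run_experiment({"seed": 7, "sample_size": 10}, data)
```

Identical config and data give an identical record, so any difference in `config_hash` or `data_hash` between two runs points directly at what changed.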
Week 9: Interactive AI Demos
Build and deploy interactive AI applications.
Key Tools: Streamlit, Gradio, Hugging Face Spaces
| Materials | PDF | HTML | Source |
|---|---|---|---|
| Lecture | Download | View | GitHub |
| Lab | Download | View | GitHub |
Topics Covered:
- Streamlit app development
- Gradio interfaces
- State management and callbacks
- Deployment to Hugging Face Spaces
Week 10: HTTP, APIs & FastAPI
Develop production-grade ML APIs.
Key Tools: FastAPI, curl, Pydantic
| Materials | PDF | HTML | Source |
|---|---|---|---|
| Lecture | Download | View | GitHub |
| Lab | Download | View | GitHub |
Topics Covered:
- REST API design principles
- FastAPI basics and async programming
- Request validation with Pydantic
- Serving ML models as APIs
Week 11: Git, GitHub Actions & CI/CD
Automate testing and deployment with CI/CD pipelines.
Key Tools: PyGithub, GitHub Actions, pytest
| Materials | PDF | HTML | Source |
|---|---|---|---|
| Lecture | Download | View | GitHub |
| Lab | Download | View | GitHub |
Topics Covered:
- Git automation and hooks
- GitHub API with PyGithub
- CI/CD workflow design
- Automated testing for ML
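Automated tests for ML code often check properties of predictions rather than exact values. The sketch below uses a deliberately trivial, hypothetical word-count "model" and pytest-style test functions (plain functions named `test_*` with bare asserts) that a CI workflow could run with `pytest`.

```python
POSITIVE = {"good", "great", "excellent"}
NEGATIVE = {"bad", "poor", "terrible"}


def predict_positive_probability(text: str) -> float:
    """Toy sentiment scorer: smoothed share of positive keywords."""
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos + 1) / (pos + neg + 2)


def test_output_is_a_probability():
    for text in ["", "great film", "bad bad bad", "no opinion"]:
        assert 0.0 <= predict_positive_probability(text) <= 1.0


def test_prediction_is_deterministic():
    text = "a great but poor film"
    assert predict_positive_probability(text) == predict_positive_probability(text)


def test_case_invariance():
    assert predict_positive_probability("GREAT") == predict_positive_probability("great")


def test_directional_expectation():
    # A clearly positive input should score above a clearly negative one.
    assert predict_positive_probability("great") > predict_positive_probability("terrible")
```

Range, determinism, invariance, and directional checks like these tend to stay valid after retraining, unlike assertions on exact scores.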
Part III: Production & Optimization (Weeks 12-15)
Week 12: Deployment on Constrained Devices
Optimize models for edge devices and mobile.
Key Tools & Techniques: ONNX, quantization, pruning
| Materials | PDF | HTML | Source |
|---|---|---|---|
| Lecture | Download | View | GitHub |
| Lab | Download | View | GitHub |
Topics Covered:
- Model compression techniques
- Quantization (INT8, FP16)
- Pruning and distillation
- ONNX Runtime deployment
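INT8 quantization maps floating-point values onto 256 integer levels using a scale and a zero point. The sketch below shows one common affine (asymmetric) scheme in plain Python; it is an illustration of the arithmetic under that assumption, not the exact procedure used by ONNX Runtime or PyTorch.

```python
def quantize_int8(values):
    """Affine quantization of floats to signed 8-bit integers."""
    lo = min(min(values), 0.0)  # widen the range so 0.0 is representable
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 255
    if scale == 0:
        scale = 1.0  # all-zero input: any scale works
    zero_point = round(-128 - lo / scale)
    quantized = [
        max(-128, min(127, round(v / scale) + zero_point)) for v in values
    ]
    return quantized, scale, zero_point


def dequantize(quantized, scale, zero_point):
    """Map int8 codes back to approximate float values."""
    return [(q - zero_point) * scale for q in quantized]


weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
codes, scale, zero_point = quantize_int8(weights)
restored = dequantize(codes, scale, zero_point)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight now needs 1 byte instead of 4 for FP32, and the reconstruction error stays within about one quantization step (`scale`).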
Week 13: Profiling & Optimization
Identify and fix performance bottlenecks.
Key Tools: PyTorch Profiler, NVIDIA Nsight, AMP
| Materials | PDF | HTML | Source |
|---|---|---|---|
| Lecture | Download | View | GitHub |
| Lab | Download | View | GitHub |
Topics Covered:
- GPU profiling and bottleneck analysis
- Mixed precision training (AMP)
- Knowledge distillation
- Memory optimization
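The profile-first workflow behind bottleneck analysis can be tried on the CPU with the standard library. The sketch below uses `cProfile` and `pstats` as a stand-in for GPU tools such as PyTorch Profiler, which it does not replace; the two workload functions are hypothetical.

```python
import cProfile
import io
import pstats


def square(x):
    return x * x


def slow_sum_of_squares(n):
    """Deliberately naive loop with one function call per element."""
    total = 0
    for i in range(n):
        total += square(i)
    return total


def profile_call(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(5)  # top 5 entries
    return result, stream.getvalue()


result, report = profile_call(slow_sum_of_squares, 1000)
```

The report lists call counts and cumulative time per function, which shows where time is spent before any optimization is attempted.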
Week 14: Model Monitoring & Observability
Detect and respond to model degradation in production.
Key Tools & Techniques: Evidently AI, Prometheus, data drift detection
| Materials | PDF | HTML | Source |
|---|---|---|---|
| Lecture | Download | View | GitHub |
| Lab | Download | View | GitHub |
Topics Covered:
- Data drift vs concept drift
- Statistical tests for drift detection
- Monitoring dashboards
- Alerting and retraining triggers
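One statistical test commonly used for drift detection on a numeric feature is the two-sample Kolmogorov-Smirnov test. The sketch below computes only the KS statistic (the largest gap between the two empirical CDFs) and compares it to a threshold; the threshold of 0.3 and the synthetic data are assumptions for illustration, and a production setup would typically use a p-value or a monitoring library.

```python
import random


def ks_statistic(reference, current):
    """Largest absolute gap between the two empirical CDFs."""
    ref, cur = sorted(reference), sorted(current)
    i = j = 0
    gap = 0.0
    while i < len(ref) and j < len(cur):
        x = min(ref[i], cur[j])
        # Advance both samples past every value <= x, then compare CDFs.
        while i < len(ref) and ref[i] <= x:
            i += 1
        while j < len(cur) and cur[j] <= x:
            j += 1
        gap = max(gap, abs(i / len(ref) - j / len(cur)))
    return gap


def drift_detected(reference, current, threshold=0.3):
    """Flag drift when the KS statistic exceeds an assumed threshold."""
    return ks_statistic(reference, current) > threshold


rng = random.Random(0)
training = [rng.gauss(0.0, 1.0) for _ in range(500)]
same_dist = [rng.gauss(0.0, 1.0) for _ in range(500)]
shifted = [rng.gauss(1.5, 1.0) for _ in range(500)]

stable = drift_detected(training, same_dist)
drifted = drift_detected(training, shifted)
```

A check like this can run on a schedule against recent production inputs, with a positive result feeding an alert or a retraining trigger.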
Week 15: Course Summary & Future Trends
Review the ML pipeline and explore emerging topics.
Key Topics: LLMOps, AI Agents, Edge AI, Final Projects
| Materials | PDF | HTML | Source |
|---|---|---|---|
| Lecture | Download | View | GitHub |
Topics Covered:
- Complete ML pipeline review
- Emerging trends (LLMOps, AI Agents)
- Career paths in ML engineering
- Final project presentations
Additional Resources
- Course Syllabus: View on main website
- Course Specification: COURSE_SPEC.md - pedagogical philosophy and content guidelines
- Build Instructions: See README.md for building slides locally
- Report Issues: GitHub Issues
- Diagram Generators: diagram-generators/ - Python scripts for all course diagrams
Core Tools Covered
| Category | Tools |
|---|---|
| Data Collection | requests, BeautifulSoup, Playwright |
| Data Quality | Pydantic, pandas, jq |
| Labeling | Label Studio, modAL, Snorkel |
| LLM APIs | Gemini API, OpenAI API |
| Model Training | scikit-learn, AutoGluon, PyTorch |
| Deployment | FastAPI, Streamlit, Docker |
| Testing & CI | pytest, GitHub Actions |
| Monitoring | Evidently AI, Prometheus |
Course Materials by Prof. Nipun Batra | IIT Gandhinagar
Last updated: December 2024