CS 203: Software Tools and Techniques for AI

Lecture Slides & Lab Exercises

Course Information

Instructor: Prof. Nipun Batra, IIT Gandhinagar

Semester: January 2026

Format: 1.5-hour lecture + 3-hour hands-on lab each week

Course Website: nipunbatra.github.io/stt-ai-26

Repository: github.com/nipunbatra/stt-ai-teaching

About This Course

This 15-week course covers the complete software engineering stack for modern AI development. While most ML courses focus on algorithms, this course focuses on the engineering practices that make ML systems work in the real world.

Learning Outcomes

By the end of this course, you will:

  1. Build end-to-end ML systems: from data collection to production deployment
  2. Choose the right tools: know which tool to use for any data/ML task
  3. Debug systematically: understand why things break and fix them
  4. Write production code: code that others can understand, maintain, and reproduce

What You’ll Learn

Module | Key Skills
Data Engineering | APIs, web scraping, data validation, annotation pipelines
Labeling at Scale | Active learning, weak supervision, LLM-based labeling
Model Development | AutoML, transfer learning, LLM fine-tuning
Deployment | Docker, FastAPI, Streamlit, CI/CD
Production | Model optimization, profiling, drift monitoring

Quick Navigation

Part I — Data Engineering (Weeks 1-5)

Week | Topic | Lecture | Lab
1 | Data Collection | PDF, HTML | Notebook
2 | Data Validation | PDF, HTML | Notebook
3 | Data Labeling | PDF, HTML | Notebook
4 | Optimizing Labeling | PDF, HTML | Notebook
5 | Data Augmentation | PDF, HTML | Notebook

Part II — Model Development & Deployment (Weeks 6-11)

Week | Topic | Lecture | Lab
6 | LLM APIs | PDF, HTML | Notebook
7 | Model Development | PDF, HTML | Slides
8 | Reproducibility | PDF, HTML | Slides
9 | Interactive Demos | PDF, HTML | Slides
10 | HTTP & APIs | PDF, HTML | Slides
11 | Git & CI/CD | PDF, HTML | Slides

Part III — Production & Optimization (Weeks 12-15)

Week | Topic | Lecture | Lab
12 | Edge Deployment | PDF, HTML | Slides
13 | Profiling | PDF, HTML | Slides
14 | Monitoring | PDF, HTML | Slides
15 | Summary | PDF, HTML | -

Part I: Data Engineering (Weeks 1-5)

Learning Goal: Transform raw, messy data into clean, labeled datasets ready for ML training.

Week 1: Data Collection

Build robust data collection pipelines using HTTP APIs and web scraping.

Key Tools: Chrome DevTools, curl, requests, BeautifulSoup, Playwright

Materials
Lecture | PDF, HTML, Source
Lab (Notebook) | View, Download .ipynb

Topics Covered:

  • HTTP fundamentals (methods, status codes, headers)
  • REST APIs and authentication
  • Rate limiting and pagination
  • Web scraping with BeautifulSoup
  • Building a movie data collector
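
As an illustration of the rate limiting and pagination topics, here is a minimal sketch of a paginated collector. It assumes a hypothetical API whose responses carry a `results` list and a `next_page` field; `collect_pages`, `fake_fetch`, and `FAKE_PAGES` are illustrative names, and the fake fetcher is an in-memory stand-in so the example runs without network access. In a real collector, `fetch_page` would wrap an HTTP call, for example with `requests`.

```python
import time
from typing import Callable, Iterator


def collect_pages(
    fetch_page: Callable[[int], dict],
    min_interval: float = 0.0,
    max_pages: int = 100,
) -> Iterator[dict]:
    """Yield items page by page, pausing between requests."""
    page = 1
    while page is not None and page <= max_pages:
        started = time.monotonic()
        payload = fetch_page(page)        # one HTTP request in a real collector
        yield from payload["results"]
        page = payload.get("next_page")   # None signals the last page
        # Simple client-side rate limit: wait out the rest of the interval.
        elapsed = time.monotonic() - started
        if page is not None and elapsed < min_interval:
            time.sleep(min_interval - elapsed)


# Hypothetical in-memory stand-in for a paginated movie API.
FAKE_PAGES = {
    1: {"results": [{"title": "A"}, {"title": "B"}], "next_page": 2},
    2: {"results": [{"title": "C"}], "next_page": None},
}


def fake_fetch(page: int) -> dict:
    return FAKE_PAGES[page]


movies = list(collect_pages(fake_fetch, min_interval=0.01))
titles = [m["title"] for m in movies]
print(titles)
```

The `max_pages` cap is a safety net against an API that never reports a last page.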

Week 2: Data Validation & Quality

Ensure data integrity through validation schemas and cleaning pipelines.

Key Tools: jq, csvkit, Pydantic, pandas

Materials
Lecture | PDF, HTML, Source
Lab (Notebook) | View, Download .ipynb

Topics Covered:

  • CLI data inspection (head, tail, wc, jq)
  • Schema validation with JSON Schema and Pydantic
  • Data profiling and quality metrics
  • Handling missing values and type mismatches
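
To make the validation topics concrete, the sketch below checks raw records for missing values and type mismatches using only the standard library. It is a hand-rolled stand-in for the kind of checking a schema library such as Pydantic automates, not Pydantic's API; `MovieRecord`, `parse_movie`, `validate_all`, and the field names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MovieRecord:
    title: str
    year: int
    rating: Optional[float] = None  # a missing rating is allowed


def parse_movie(raw: dict) -> MovieRecord:
    """Validate one raw record; raise ValueError with a readable message."""
    title = str(raw.get("title") or "").strip()
    if not title:
        raise ValueError("title is required")
    try:
        year = int(raw["year"])  # accepts 1995 and "1995"
    except (KeyError, TypeError, ValueError):
        raise ValueError(f"invalid year: {raw.get('year')!r}") from None
    rating_raw = raw.get("rating")
    if rating_raw in (None, "", "N/A"):
        rating = None  # treat common placeholders as missing
    else:
        try:
            rating = float(rating_raw)
        except (TypeError, ValueError):
            raise ValueError(f"invalid rating: {rating_raw!r}") from None
    return MovieRecord(title, year, rating)


def validate_all(rows):
    """Split rows into parsed records and (row index, error message) pairs."""
    valid, errors = [], []
    for i, raw in enumerate(rows):
        try:
            valid.append(parse_movie(raw))
        except ValueError as exc:
            errors.append((i, str(exc)))
    return valid, errors


rows = [
    {"title": "Heat", "year": "1995", "rating": "8.3"},
    {"title": "", "year": 2000},
    {"title": "Untitled", "year": "n/a"},
]
valid, errors = validate_all(rows)
print(valid)
print(errors)
```

Collecting errors instead of stopping at the first bad row makes it possible to report data quality for a whole file in one pass.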

Week 3: Data Labeling & Annotation

Set up annotation workflows and measure inter-annotator agreement.

Key Tools: Label Studio, Cohen’s Kappa, CVAT

Materials
Lecture | PDF, HTML, Source
Lab (Notebook) | View, Download .ipynb

Topics Covered:

  • Annotation task types (text, image, audio, video)
  • Setting up Label Studio
  • Inter-annotator agreement metrics (Kappa, IoU)
  • Quality control and annotation guidelines
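
The two agreement metrics named above can be computed in a few lines. This is a minimal sketch in plain Python, assuming two annotators labeled the same items in the same order and that boxes are given as `(x1, y1, x2, y2)`; the function names `cohens_kappa` and `box_iou` are illustrative.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both annotators pick the same class.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


def box_iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


annotator_1 = ["cat", "cat", "dog", "dog"]
annotator_2 = ["cat", "dog", "dog", "dog"]
print(cohens_kappa(annotator_1, annotator_2))   # 0.5
print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))      # 1/7, about 0.143
```

In the kappa example the annotators agree on 3 of 4 items (0.75) while chance agreement is 0.5, which gives (0.75 - 0.5) / (1 - 0.5) = 0.5.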

Week 4: Optimizing Labeling

Reduce labeling costs using active learning, weak supervision, and LLM-based labeling.

Key Tools: modAL, Snorkel, cleanlab, Gemini API

Materials
Lecture | PDF, HTML, Source
Lab (Notebook) | View, Download .ipynb

Topics Covered:

  • Active learning strategies (uncertainty sampling, QBC)
  • Weak supervision with labeling functions (Snorkel)
  • LLM-based labeling with Gemini API
  • Handling noisy labels (cleanlab)
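
Uncertainty sampling, the first active learning strategy listed, can be sketched without any library. The example below uses the least-confidence variant and assumes a model has already produced class probabilities for each unlabeled sample; `least_confident` is an illustrative helper and does not reflect modAL's API.

```python
def least_confident(probabilities, k):
    """Return indices of the k samples whose top class probability is lowest."""
    scores = [1.0 - max(p) for p in probabilities]  # higher = more uncertain
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Hypothetical predicted probabilities for four unlabeled samples.
pool_probs = [
    [0.90, 0.10],
    [0.55, 0.45],
    [0.60, 0.40],
    [0.99, 0.01],
]
to_label = least_confident(pool_probs, k=2)
print(to_label)  # [1, 2]
```

The selected samples are the ones to send to annotators next; after labeling, the model is retrained and the pool is scored again.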

Week 5: Data Augmentation

Expand training datasets through synthetic transformations.

Key Tools: Albumentations, nlpaug, audiomentations

Materials
Lecture | PDF, HTML, Source
Lab (Notebook) | View, Download .ipynb

Topics Covered:

  • Image augmentation (geometric, color, noise)
  • Text augmentation (synonym replacement, back-translation)
  • Audio augmentation (noise, pitch, speed)
  • AutoAugment and test-time augmentation
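
For a sense of what a geometric and a noise augmentation do at the pixel level, here is a minimal sketch on a grayscale image stored as nested lists. It is plain Python for illustration only, not the Albumentations API; `horizontal_flip` and `add_gaussian_noise` are illustrative names, and the 0 to 255 pixel range is an assumption.

```python
import random


def horizontal_flip(image):
    """Mirror each row left to right (a geometric augmentation)."""
    return [row[::-1] for row in image]


def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise and clip to the 0..255 pixel range."""
    return [
        [min(255, max(0, round(p + rng.gauss(0, sigma)))) for p in row]
        for row in image
    ]


image = [
    [10, 20, 30],
    [40, 50, 60],
]
rng = random.Random(0)  # seeded so the augmentation is repeatable
flipped = horizontal_flip(image)
noisy = add_gaussian_noise(image, sigma=5.0, rng=rng)
print(flipped)  # [[30, 20, 10], [60, 50, 40]]
```

Passing a seeded `random.Random` instance keeps the augmented dataset reproducible across runs.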

Part II: Model Development & Deployment (Weeks 6-11)

Learning Goal: Build, train, and deploy ML models as production-ready applications.

Week 6: LLM APIs & Multimodal AI

Leverage foundation models for text, vision, and audio tasks.

Key Tools: Gemini API, OpenRouter, OpenAI API

Materials
Lecture | PDF, HTML, Source
Lab (Notebook) | View, Download .ipynb

Topics Covered:

  • LLM fundamentals (tokens, temperature, sampling)
  • Prompt engineering (zero-shot, few-shot, chain-of-thought)
  • Multimodal capabilities (vision, audio, video)
  • Structured outputs and function calling
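
The effect of temperature on sampling can be shown with a small standalone sketch. It assumes a model exposes raw scores (logits) for each candidate token; the functions `softmax_with_temperature` and `sample_token` and the example logits are illustrative and not part of any provider's API.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Turn logits into probabilities; lower temperature sharpens the peak."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature=1.0, rng=random):
    """Draw one token index according to the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
sharp = softmax_with_temperature(logits, temperature=0.5)
flat = softmax_with_temperature(logits, temperature=2.0)
print([round(p, 3) for p in sharp])
print([round(p, 3) for p in flat])
token = sample_token(logits, temperature=1.0, rng=random.Random(0))
```

At temperature 0.5 most of the probability mass sits on the top token, while at 2.0 the distribution is flatter and sampling becomes more varied.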

Week 7: Model Development & Training

Train and evaluate machine learning models effectively.

Key Tools: scikit-learn, AutoGluon, PyTorch, LoRA

Materials | PDF | HTML | Source
Lecture | Download | View | GitHub
Lab | Download | View | GitHub

Topics Covered:

  • Baseline models and cross-validation
  • AutoML with AutoGluon
  • Transfer learning and fine-tuning
  • LLM fine-tuning with LoRA
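
Cross-validation rests on one mechanical step: splitting sample indices into folds so each sample is used for validation exactly once. The sketch below shows that step in plain Python. `k_fold_indices` is an illustrative helper, not scikit-learn's `KFold`, and the fixed default seed is an assumption made for repeatability.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, val_indices) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)        # shuffle once, reproducibly
    folds = [indices[i::k] for i in range(k)]   # k roughly equal folds
    for i in range(k):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val


splits = list(k_fold_indices(n_samples=10, k=5))
for train, val in splits:
    print(len(train), len(val))  # 8 2 for every fold
```

A baseline model would be fitted on each `train` split and scored on the matching `val` split, and the k scores averaged.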

Week 8: Reproducibility & Environments

Ensure experiments are reproducible and shareable.

Key Tools: Docker, DVC, MLflow, Weights & Biases

Materials | PDF | HTML | Source
Lecture | Download | View | GitHub
Lab | Download | View | GitHub

Topics Covered:

  • Virtual environments and dependency management
  • Docker containers for ML
  • Data versioning with DVC
  • Experiment tracking with MLflow

Week 9: Interactive AI Demos

Build and deploy interactive AI applications.

Key Tools: Streamlit, Gradio, Hugging Face Spaces

Materials | PDF | HTML | Source
Lecture | Download | View | GitHub
Lab | Download | View | GitHub

Topics Covered:

  • Streamlit app development
  • Gradio interfaces
  • State management and callbacks
  • Deployment to Hugging Face Spaces

Week 10: HTTP, APIs & FastAPI

Develop production-grade ML APIs.

Key Tools: FastAPI, curl, Pydantic

Materials | PDF | HTML | Source
Lecture | Download | View | GitHub
Lab | Download | View | GitHub

Topics Covered:

  • REST API design principles
  • FastAPI basics and async programming
  • Request validation with Pydantic
  • Serving ML models as APIs

Week 11: Git, GitHub Actions & CI/CD

Automate testing and deployment with CI/CD pipelines.

Key Tools: PyGithub, GitHub Actions, pytest

Materials | PDF | HTML | Source
Lecture | Download | View | GitHub
Lab | Download | View | GitHub

Topics Covered:

  • Git automation and hooks
  • GitHub API with PyGithub
  • CI/CD workflow design
  • Automated testing for ML

Part III: Production & Optimization (Weeks 12-15)

Learning Goal: Optimize, deploy, and monitor ML systems in production environments.

Week 12: Deployment on Constrained Devices

Optimize models for edge devices and mobile.

Key Tools: ONNX, Quantization, Pruning

Materials | PDF | HTML | Source
Lecture | Download | View | GitHub
Lab | Download | View | GitHub

Topics Covered:

  • Model compression techniques
  • Quantization (INT8, FP16)
  • Pruning and distillation
  • ONNX Runtime deployment
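
To show what INT8 quantization does to a set of weights, here is a minimal sketch of symmetric per-tensor quantization in plain Python. It is one simple scheme chosen for illustration; real toolchains may use different schemes (for example per-channel scales or zero points), and the names `quantize_int8` and `dequantize` are assumptions.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using a single shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the quantized integers."""
    return [x * scale for x in q]


weights = [-1.0, -0.5, 0.0, 0.25, 1.0]  # hypothetical layer weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q)
print(max_error <= scale / 2 + 1e-12)  # True
```

Each weight is stored in one byte instead of four, at the cost of a rounding error of at most half a quantization step per value.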

Week 13: Profiling & Optimization

Identify and fix performance bottlenecks.

Key Tools: PyTorch Profiler, NVIDIA Nsight, AMP

Materials | PDF | HTML | Source
Lecture | Download | View | GitHub
Lab | Download | View | GitHub

Topics Covered:

  • GPU profiling and bottleneck analysis
  • Mixed precision training (AMP)
  • Knowledge distillation
  • Memory optimization

Week 14: Model Monitoring & Observability

Detect and respond to model degradation in production.

Key Tools: Evidently AI, Prometheus, Data drift detection

Materials | PDF | HTML | Source
Lecture | Download | View | GitHub
Lab | Download | View | GitHub

Topics Covered:

  • Data drift vs concept drift
  • Statistical tests for drift detection
  • Monitoring dashboards
  • Alerting and retraining triggers
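
One common statistical test for data drift on a single numeric feature is the two-sample Kolmogorov-Smirnov test. The sketch below computes its statistic in plain Python and compares it with the standard large-sample critical value. It is an illustration under stated assumptions, not the Evidently AI API; `ks_statistic`, `drift_detected`, and the synthetic data are made up for the example.

```python
import math
from bisect import bisect_right


def ks_statistic(reference, current):
    """Largest gap between the two empirical cumulative distributions."""
    ref, cur = sorted(reference), sorted(current)
    points = sorted(set(ref) | set(cur))
    return max(
        abs(bisect_right(ref, x) / len(ref) - bisect_right(cur, x) / len(cur))
        for x in points
    )


def drift_detected(reference, current, alpha=0.05):
    """Flag drift when the statistic exceeds the approximate critical value."""
    n, m = len(reference), len(current)
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))  # about 1.358 for 0.05
    threshold = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, current) > threshold


# Synthetic feature values: training-time data and a shifted production batch.
reference = [i / 100 for i in range(100)]
shifted = [x + 0.5 for x in reference]
print(drift_detected(reference, list(reference)))  # False
print(drift_detected(reference, shifted))          # True
```

A result of `True` could serve as an alerting or retraining trigger, ideally after confirming that the shift persists across several batches.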

Additional Resources

Useful Links

Core Tools Covered

Category | Tools
Data Collection | requests, BeautifulSoup, Playwright
Data Quality | Pydantic, pandas, jq
Labeling | Label Studio, modAL, Snorkel
LLM APIs | Gemini API, OpenAI API
Model Training | scikit-learn, AutoGluon, PyTorch
Deployment | FastAPI, Streamlit, Docker
Testing & CI | pytest, GitHub Actions
Monitoring | Evidently AI, Prometheus

Course Materials by Prof. Nipun Batra | IIT Gandhinagar

Last updated: December 2024