Hands-On Learning with Streamlit Apps
Interactive web apps provide visual, hands-on exploration of machine learning concepts through progressive complexity.
Each app starts with simple, interpretable features and gradually introduces more sophisticated techniques, building intuition along the way. This progression reveals a key insight: good features matter more than complex algorithms!
Image Clustering Apps
1. Image Clustering (ResNet18)
Focus: Semantic image embeddings and similarity search
Learn how pretrained CNNs create meaningful representations for clustering and search:
- Features: 512-D ResNet18 embeddings extracted from ImageNet-trained model
- Clustering: K-Means, DBSCAN, Hierarchical algorithms
- Visualization: PCA, t-SNE, UMAP dimensionality reduction
- Analysis: Cosine similarity, pairwise similarity matrices
- Optimization: Elbow method to find optimal number of clusters
Key Concepts:
- Extract semantic features before the final classification layer
- Cluster images by visual content, not just pixels
- Similarity search for reverse image lookup
- 45 diverse sample images included
Use the elbow curve to automatically find the optimal number of clusters, then explore how different images cluster together based on semantic content!
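A minimal sketch of this pipeline, assuming torchvision's pretrained ResNet18 and scikit-learn; the image paths and number of clusters below are placeholders, not the app's exact settings:

```python
# Sketch: 512-D ResNet18 embeddings -> K-Means clusters + cosine similarity.
# Image paths and n_clusters are placeholders.
import torch
import torch.nn as nn
from PIL import Image
from torchvision import models, transforms
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

# Drop the final fully connected layer so the global-average-pool output
# (a 512-D vector per image) becomes the embedding.
resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone = nn.Sequential(*list(resnet.children())[:-1]).eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def embed(paths):
    batch = torch.stack([preprocess(Image.open(p).convert("RGB")) for p in paths])
    return backbone(batch).squeeze(-1).squeeze(-1).numpy()  # shape (N, 512)

paths = ["sample_images/img1.jpg", "sample_images/img2.jpg", "sample_images/img3.jpg"]
X = embed(paths)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
similarity = cosine_similarity(X)  # pairwise matrix used for reverse image lookup
```

The same embedding matrix feeds both clustering and similarity search; only the downstream step changes.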
Repository: apps/image-clustering/
2. Simple Image Clustering (Progressive Features)
Focus: Understanding how feature choice affects clustering quality
Progressive exploration from hand-crafted to learned features:
Tab 1: RGB Features (3D)
- Cluster pixels by color similarity only
- Simple, fast, interpretable
- Problem: Ignores spatial information; similar colors anywhere group together

Tab 2: Position Features (2D)
- Cluster pixels by X,Y location only
- Creates spatially coherent regions
- Problem: Ignores color; might group unrelated regions

Tab 3: RGB + Position (5D)
- Combine color AND location
- Adjustable weights to balance features
- Better: Spatially coherent regions with similar colors

Tab 4: ResNet18 Dense Features (64-512D)
- Extract CNN features at each spatial location
- Bilinear upsampling to full resolution
- Layer selection (layer1-4) trades resolution vs semantics
- Best: Semantic understanding; recognizes objects, textures, patterns

Key Learning:
- Feature engineering matters enormously
- Deep features capture semantics, not just statistics
- Dense feature maps enable pixel-level understanding
- 7 sample images with diverse content
ResNet18 features dramatically outperform hand-crafted features (RGB, position) because they capture semantic meaning, not just surface statistics. This is true across all clustering algorithms!
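A minimal sketch of the dense-feature idea, assuming torchvision's ResNet18; the layer choice (layer1), image path, and cluster count are illustrative and may differ from the app's settings:

```python
# Sketch: dense ResNet18 features at every spatial location, bilinearly
# upsampled to full resolution, then clustered per pixel.
import torch
import torch.nn.functional as F
from PIL import Image
from torchvision import models, transforms
from sklearn.cluster import KMeans

resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()

img = Image.open("sample_images/scene.jpg").convert("RGB")
to_input = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
x = to_input(img).unsqueeze(0)       # (1, 3, H, W)
H, W = x.shape[-2:]

with torch.no_grad():
    # Stem + layer1 gives a 64-channel map; layer2-4 are more semantic but coarser.
    feats = resnet.layer1(resnet.maxpool(resnet.relu(resnet.bn1(resnet.conv1(x)))))
    feats = F.interpolate(feats, size=(H, W), mode="bilinear", align_corners=False)

# Reshape to (num_pixels, channels) and cluster every pixel.
pixels = feats.squeeze(0).permute(1, 2, 0).reshape(H * W, -1).numpy()
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(pixels)
segmentation = labels.reshape(H, W)  # pixel-level cluster map
```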
Repository: apps/simple-clustering/
Document Clustering App
3. Document Clustering (Progressive NLP)
Focus: Evolution of text representation from keywords to semantics
Progressive exploration of NLP techniques:
Tab 1: Simple Statistical Features (6D)
- Word count, sentence count, average word length
- Punctuation, digits, uppercase counts
- Clusters by: Document style (length, complexity)
- Problem: Ignores content; technical papers on different topics group together

Tab 2: TF-IDF Vectors (50-200D)
- Term Frequency × Inverse Document Frequency
- Highlights important words per document
- Clusters by: Shared vocabulary
- Better: Captures topic keywords
- Problem: Treats words independently; misses synonyms and context

Tab 3: BERT Embeddings (384D)
- Sentence-BERT (all-MiniLM-L6-v2) pretrained transformer
- Dense semantic representations
- Clusters by: Actual meaning and topic
- Best: Understands synonyms, context, paraphrasing
- Shows cluster cohesion with cosine similarity

Key Learning:
- Simple features cluster by style, not meaning
- TF-IDF captures keywords but treats words as independent tokens
- BERT understands semantic content: “ML” = “machine learning” = “AI”
- 19 sample documents across 4 categories (Tech, Health, Environment, Sports)
Compare clustering results across tabs. Notice how simple features group documents incorrectly, while BERT accurately identifies topics!
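A minimal sketch contrasting TF-IDF and Sentence-BERT representations, assuming scikit-learn and the sentence-transformers package; the four short documents are illustrative stand-ins for the app's sample files:

```python
# Sketch: cluster the same documents with TF-IDF vectors and with
# Sentence-BERT embeddings to compare keyword-based vs semantic grouping.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sentence_transformers import SentenceTransformer

docs = [
    "Neural networks power modern machine learning systems.",
    "Transformer models learn representations from large datasets.",
    "Regular exercise and a balanced diet improve long-term health.",
    "Doctors recommend sleep and good nutrition for wellbeing.",
]

# Keyword-based representation: one dimension per retained term.
tfidf = TfidfVectorizer(stop_words="english", max_features=200).fit_transform(docs)
tfidf_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(tfidf)

# Semantic representation: 384-D Sentence-BERT embeddings.
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(docs)      # shape (4, 384)
bert_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)

print("TF-IDF clusters:", tfidf_labels)
print("BERT clusters:  ", bert_labels)
```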
Repository: apps/document-clustering/
Common Features Across All Apps
Elbow Method Visualization
Every app includes elbow curve analysis:
- Plot: Within-Cluster Sum of Squares (WCSS) vs K
- Automatic detection: Second derivative finds maximum curvature
- Visual marker: Red star and line indicate suggested K
- Education: Learn when adding more clusters stops helping
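A minimal sketch of this heuristic, assuming scikit-learn; synthetic blobs stand in for whatever feature matrix an app produces:

```python
# Sketch of the elbow heuristic: compute WCSS (inertia) for each K and
# suggest the K at the point of maximum curvature via a discrete second
# derivative.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

ks = list(range(1, 11))
wcss = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_ for k in ks]

# Second difference of the WCSS curve; its maximum marks the sharpest bend.
curvature = np.diff(wcss, n=2)                 # length len(ks) - 2
suggested_k = ks[int(np.argmax(curvature)) + 1]
print("Suggested K:", suggested_k)
```

Plotting `wcss` against `ks` and marking `suggested_k` reproduces the red-star annotation shown in the apps.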
Clustering Algorithms
- K-Means: Fast, assumes spherical clusters, requires specifying K
- DBSCAN: Density-based, finds arbitrary shapes (image apps)
- Hierarchical: Agglomerative clustering (image apps)
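A minimal sketch comparing the three algorithms on the same 2-D data, assuming scikit-learn; the parameter values (n_clusters, eps, min_samples) are illustrative rather than tuned:

```python
# Sketch: the same 2-D data clustered three ways.
from sklearn.cluster import KMeans, DBSCAN, AgglomerativeClustering
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)

kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
dbscan_labels = DBSCAN(eps=0.25, min_samples=5).fit_predict(X)      # density-based, no K needed
hier_labels = AgglomerativeClustering(n_clusters=2).fit_predict(X)  # bottom-up merging
```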
Interactive Controls
- Adjust number of clusters with sliders
- Choose dimensionality reduction method (PCA/t-SNE/UMAP)
- Control visualization parameters
- Expand/collapse educational explanations
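A minimal sketch of this control pattern, assuming Streamlit; the widget labels, ranges, and options are illustrative, not the apps' exact UI:

```python
# Sketch: slider for K, selectbox for the projection method, expander for explanations.
import streamlit as st

k = st.slider("Number of clusters", min_value=2, max_value=10, value=3)
method = st.selectbox("Dimensionality reduction", ["PCA", "t-SNE", "UMAP"])

with st.expander("What's happening?"):
    st.write(f"K-Means runs with K={k}; the result is projected to 2-D with {method}.")
```

Streamlit reruns the script whenever a widget changes, which is what gives the instant-update behavior described above.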
Running Apps Locally
Prerequisites
```bash
# Install dependencies
cd apps/<app-name>
pip install -r requirements.txt
```

Launch App

```bash
# Start Streamlit server
streamlit run app.py

# Or specify port
streamlit run app.py --server.port 8501
```

Sample Data Included
- Image Clustering: 45 diverse images
- Simple Clustering: 7 carefully selected images
- Document Clustering: 19 documents across 4 categories
Deployment (Coming Soon)
Apps will be deployed on:
- Hugging Face Spaces: Free hosting with GPU support
- Streamlit Cloud: Official Streamlit platform
Deployed links will be added here when available.
Pedagogical Approach
Progressive Complexity
All apps follow this learning path:
- Simple: Start with interpretable features (RGB, word counts)
- Intermediate: Combine features or use TF-IDF
- Advanced: Deep learning features (ResNet18, BERT)
Visual Learning
- Clear matplotlib visualizations
- Side-by-side comparisons
- Annotated plots with explanations
- 2D projections of high-dimensional data
Active Exploration
- Modify parameters and see instant updates
- Sample data included - no setup needed
- Fast iteration for experimentation
- Expandable “What’s happening?” sections
Cross-Domain Insights
Images ↔︎ Documents Analogy:
| Concept | Images | Documents |
|---|---|---|
| Simple features | RGB colors, position (X, Y) | Word/sentence counts |
| Intermediate | RGB + position (5D) | TF-IDF vectors |
| Deep features | ResNet18 (512-D) | BERT (384-D) |
| Semantic space | Visual similarity | Topic similarity |
Universal Lesson: The same clustering algorithms work everywhere; features are what matter!
Learning Outcomes
After using these apps, students will understand:
Conceptual:
- How feature choice fundamentally affects clustering quality
- Why deep learning features capture semantics
- How progressive complexity reveals insights gradually
- Trade-offs between speed and quality

Technical:
- Extract features from pretrained models (ResNet18, BERT)
- Apply clustering algorithms (K-Means, DBSCAN)
- Use dimensionality reduction for visualization
- Determine optimal K with the elbow method

Practical:
- Build interactive ML visualizations
- Deploy Streamlit apps
- Process images and text with deep learning
- Interpret high-dimensional embeddings
Technical Stack
Framework: Streamlit
ML Libraries: scikit-learn, PyTorch, sentence-transformers
Visualization: matplotlib, numpy
Models:
- ResNet18 (torchvision): 11M params, 512-D embeddings
- Sentence-BERT (all-MiniLM-L6-v2): 22M params, 384-D embeddings
Repository Structure
apps/
├── image-clustering/ # ResNet18 image clustering
│ ├── app.py # Main Streamlit app
│ ├── requirements.txt # Dependencies
│ ├── sample_images/ # 45 sample images
│ └── README.md # Documentation
│
├── simple-clustering/ # Progressive image clustering
│ ├── app.py # Main Streamlit app
│ ├── requirements.txt # Dependencies
│ ├── sample_images/ # 7 sample images
│ ├── download_samples.py # Sample downloader
│ └── README.md # Documentation
│
├── document-clustering/ # Progressive document clustering
│ ├── app.py # Main Streamlit app
│ ├── requirements.txt # Dependencies
│ ├── sample_documents/ # 19 text files
│ ├── create_samples.py # Sample generator
│ └── README.md # Documentation
│
└── README.md # Main apps documentation
Contributing
Want to add a new app or improve existing ones?
- Follow the progressive complexity philosophy
- Include sample data for immediate use
- Add educational explanations in expandable sections
- Update this page and the main apps README
- Submit a pull request
For questions or feedback, visit Prof. Nipun Batra’s homepage or create an issue.