Week 6 Lab: LLM APIs & Prompt Engineering

CS 203: Software Tools and Techniques for AI
IIT Gandhinagar


Learning Objectives

By the end of this lab, you will be able to:

  1. Set up and use the Gemini API (free tier) and OpenRouter (free models)
  2. Apply prompt engineering techniques (zero-shot, few-shot, chain-of-thought)
  3. Use LLMs for data labeling (connecting to Weeks 3-4)
  4. Use LLMs for text augmentation (connecting to Week 5)
  5. Extract structured data using JSON mode
  6. Build practical NLP pipelines with LLMs

Connection to Previous Weeks

Previous Week                 What We Did                         How LLMs Help Today
Week 1: Data Collection       Collected movie data via APIs       Parse unstructured text to JSON
Week 2: Data Validation       Validated with Pydantic schemas     Fix/normalize messy data
Week 3: Data Labeling         Manual annotation, Cohen’s Kappa    Auto-label 100x faster
Week 4: Optimizing Labeling   Active learning, weak supervision   LLM as labeling function
Week 5: Data Augmentation     nlpaug, Albumentations              Generate paraphrases

Today’s Goal: Use LLMs to supercharge our Netflix movie pipeline!


Part 1: Environment Setup

1.1 Install Required Packages

# Install required packages
!pip install google-genai openai requests pandas numpy matplotlib seaborn pydantic

# Import libraries
import os
import json
import time
import pandas as pd
import numpy as np
from typing import List, Optional
from pydantic import BaseModel
import warnings
warnings.filterwarnings('ignore')

print("All imports successful!")

1.2 API Key Setup

You have two free options:

  1. Gemini API (Recommended): aistudio.google.com/apikey
  2. OpenRouter: openrouter.ai/keys

# Option 1: Set API keys directly (for quick testing)
# WARNING: Don't commit these to git!

# Uncomment and fill in your keys:
# os.environ['GEMINI_API_KEY'] = 'your-gemini-key-here'
# os.environ['OPENROUTER_API_KEY'] = 'your-openrouter-key-here'

# Option 2: Load from environment (recommended for production)
GEMINI_API_KEY = os.environ.get('GEMINI_API_KEY', '')
OPENROUTER_API_KEY = os.environ.get('OPENROUTER_API_KEY', '')

print(f"Gemini API Key configured: {'Yes' if GEMINI_API_KEY else 'No'}")
print(f"OpenRouter API Key configured: {'Yes' if OPENROUTER_API_KEY else 'No'}")

Part 2: Setting Up LLM Clients

2.1 Gemini Client Setup

# Gemini API Setup
try:
    from google import genai
    
    if GEMINI_API_KEY:
        gemini_client = genai.Client(api_key=GEMINI_API_KEY)
        GEMINI_MODEL = "gemini-2.0-flash-exp"  # Fast and free
        print(f"Gemini client initialized with model: {GEMINI_MODEL}")
    else:
        gemini_client = None
        print("Gemini client not initialized (no API key)")
        
except ImportError:
    gemini_client = None
    print("google-genai not installed. Run: pip install google-genai")

2.2 OpenRouter Client Setup

OpenRouter provides access to 100+ models with a unified API. Many are free!

# OpenRouter Setup - uses OpenAI-compatible API
import openai

if OPENROUTER_API_KEY:
    openrouter_client = openai.OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key=OPENROUTER_API_KEY
    )
    print("OpenRouter client initialized!")
else:
    openrouter_client = None
    print("OpenRouter client not initialized (no API key)")

# Free models on OpenRouter (as of 2024-2025)
FREE_MODELS = {
    "llama-3.1-8b": "meta-llama/llama-3.1-8b-instruct:free",
    "gemma-2-9b": "google/gemma-2-9b-it:free",
    "phi-3-mini": "microsoft/phi-3-mini-128k-instruct:free",
    "mistral-7b": "mistralai/mistral-7b-instruct:free",
    "qwen-2-7b": "qwen/qwen-2-7b-instruct:free",
}

print("\nFree models available:")
for name, model_id in FREE_MODELS.items():
    print(f"  - {name}: {model_id}")

2.3 Unified LLM Interface

Let’s create a unified interface that works with both providers.

def generate_text(prompt, provider="gemini", model=None, temperature=0.7, max_tokens=1024):
    """
    Unified text generation interface for Gemini and OpenRouter.
    
    Args:
        prompt: The text prompt
        provider: "gemini" or "openrouter"
        model: Model name (uses defaults if None)
        temperature: Sampling randomness (0 = near-deterministic, higher = more varied)
        max_tokens: Maximum output length
    
    Returns:
        Generated text string
    """
    if provider == "gemini" and gemini_client:
        model = model or GEMINI_MODEL
        response = gemini_client.models.generate_content(
            model=model,
            contents=prompt,
            config={"temperature": temperature, "max_output_tokens": max_tokens}
        )
        return response.text
    
    elif provider == "openrouter" and openrouter_client:
        model = model or FREE_MODELS["llama-3.1-8b"]
        response = openrouter_client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            temperature=temperature,
            max_tokens=max_tokens
        )
        return response.choices[0].message.content
    
    else:
        return f"[Mock response] No {provider} client available. Prompt was: {prompt[:100]}..."

# Test both providers
test_prompt = "What is 2 + 2? Answer in one word."

print("Testing Gemini:")
print(generate_text(test_prompt, provider="gemini"))

print("\nTesting OpenRouter (Llama 3.1):")
print(generate_text(test_prompt, provider="openrouter"))

Part 3: Our Netflix Movie Dataset

Let’s continue with our movie theme from previous weeks!

# Movie reviews from our Netflix pipeline
movie_reviews = [
    {"id": 1, "movie": "Inception", "review": "Mind-blowing! Nolan does it again with this masterpiece. The layers of dreams within dreams kept me on the edge of my seat."},
    {"id": 2, "movie": "The Room", "review": "So bad it's good. Hilarious unintentionally. Tommy Wiseau's acting is legendarily terrible."},
    {"id": 3, "movie": "Parasite", "review": "Gripping from start to finish. Deserved every Oscar. Bong Joon-ho is a genius."},
    {"id": 4, "movie": "Cats", "review": "What did I just watch? Truly bizarre and unsettling. Those CGI cats will haunt my nightmares."},
    {"id": 5, "movie": "The Godfather", "review": "A timeless classic. Marlon Brando's performance is perfect. The cinematography is stunning."},
    {"id": 6, "movie": "Avatar", "review": "Visually stunning but the story is predictable. James Cameron knows how to make a spectacle."},
    {"id": 7, "movie": "The Dark Knight", "review": "Heath Ledger's Joker is unforgettable. Best superhero movie ever made."},
    {"id": 8, "movie": "Twilight", "review": "Not my cup of tea but I can see the appeal for the target audience."},
    {"id": 9, "movie": "Interstellar", "review": "Made me cry. Beautiful exploration of love and time. Hans Zimmer's score is incredible."},
    {"id": 10, "movie": "Emoji Movie", "review": "Just... no. Avoid at all costs. A soulless cash grab."},
    {"id": 11, "movie": "Pulp Fiction", "review": "Tarantino's dialogue is unmatched. Non-linear storytelling at its finest."},
    {"id": 12, "movie": "Sharknado", "review": "Ridiculous premise but entertaining in a weird way. Perfect for a bad movie night."},
    {"id": 13, "movie": "The Shawshank Redemption", "review": "Hope is a good thing. Best movie ever made in my opinion. Tim Robbins and Morgan Freeman are incredible."},
    {"id": 14, "movie": "Transformers 5", "review": "Explosions. That's it. That's the review. Michael Bay gonna Michael Bay."},
    {"id": 15, "movie": "La La Land", "review": "Bittersweet ending that stays with you. Ryan Gosling and Emma Stone have great chemistry."},
]

df_reviews = pd.DataFrame(movie_reviews)
print(f"Loaded {len(df_reviews)} movie reviews")
df_reviews.head()

Part 4: Sentiment Classification (Zero-Shot)

4.1 Basic Zero-Shot Classification

# SOLVED: Zero-shot sentiment classification

def classify_sentiment_zero_shot(review, provider="gemini"):
    """
    Classify sentiment using zero-shot prompting.
    """
    prompt = f"""Classify the sentiment of this movie review as exactly one of: Positive, Negative, or Neutral.

Review: "{review}"

Respond with only the sentiment label (Positive, Negative, or Neutral)."""
    
    response = generate_text(prompt, provider=provider, temperature=0)
    return response.strip()

# Test on a few reviews
print("Zero-Shot Sentiment Classification:\n")
for _, row in df_reviews.head(5).iterrows():
    sentiment = classify_sentiment_zero_shot(row['review'])
    print(f"Movie: {row['movie']}")
    print(f"Review: {row['review'][:80]}...")
    print(f"Sentiment: {sentiment}\n")
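
Even with an instruction to reply with only the label, models sometimes answer with "positive.", "**Negative**", or "Sentiment: Neutral". A small normalization step makes the output safe to store in a DataFrame column. This is a sketch under the assumption that the three labels from the prompt above are the only valid ones; `normalize_label` is a hypothetical helper name.

```python
VALID_LABELS = ("Positive", "Negative", "Neutral")


def normalize_label(raw, valid=VALID_LABELS, default="Unknown"):
    """Map a free-form model reply onto one of the allowed labels."""
    # Drop surrounding whitespace, quotes, periods, and Markdown asterisks
    text = raw.strip().strip('."\'*').lower()
    for label in valid:
        if text == label.lower():
            return label
    # Fall back to a substring match, but only when exactly one label appears
    hits = [label for label in valid if label.lower() in text]
    return hits[0] if len(hits) == 1 else default


for reply in ["positive.", " **Negative** ", "Sentiment: Neutral", "Positive or Negative"]:
    print(f"{reply!r:28} -> {normalize_label(reply)}")
```

Returning "Unknown" for ambiguous replies keeps bad outputs visible instead of silently counting them as one of the classes.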

Question 4.1: Batch Classification

Classify all reviews and add the sentiment to our DataFrame.

# TODO: Classify all reviews and add a 'sentiment' column to df_reviews
# Use a loop with rate limiting (time.sleep(1) between requests)

# Your code here

Part 5: Few-Shot Learning

5.1 (Solved) Few-Shot Sentiment with Examples

# SOLVED: Few-shot classification with examples

def classify_sentiment_few_shot(review, provider="gemini"):
    """
    Classify sentiment using few-shot prompting with examples.
    """
    prompt = """Classify movie reviews as Positive, Negative, or Mixed.

Examples:

Review: "Amazing film! Best I've seen this year. A must-watch!"
Sentiment: Positive

Review: "Terrible waste of time. The acting was wooden and the plot made no sense."
Sentiment: Negative

Review: "Visually stunning but the story is weak. Great effects, poor writing."
Sentiment: Mixed

Review: "It was okay. Nothing special but not terrible either."
Sentiment: Mixed

Now classify this review:

Review: "{review}"
Sentiment:""".format(review=review)
    
    response = generate_text(prompt, provider=provider, temperature=0)
    return response.strip()

# Test
test_reviews = [
    "Visually stunning but the story is predictable.",
    "So bad it's good. Hilarious unintentionally.",
    "Perfect in every way. A masterpiece."
]

print("Few-Shot Classification:\n")
for review in test_reviews:
    sentiment = classify_sentiment_few_shot(review)
    print(f"Review: {review}")
    print(f"Sentiment: {sentiment}\n")
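
The prompt above hard-codes its examples in one long string. One alternative, sketched here, is to keep the examples as a list of (text, label) pairs and assemble the prompt from them, which makes it easier to add, remove, or swap examples. `build_few_shot_prompt` and its parameter names are hypothetical, chosen for illustration.

```python
def build_few_shot_prompt(instruction, examples, query,
                          input_name="Review", output_name="Sentiment"):
    """Assemble a few-shot prompt from an instruction, example pairs, and a query."""
    parts = [instruction, "", "Examples:", ""]
    for text, label in examples:
        parts.append(f'{input_name}: "{text}"')
        parts.append(f"{output_name}: {label}")
        parts.append("")  # Blank line between examples
    # End with the unanswered query so the model completes the label
    parts += ["Now classify this review:", "", f'{input_name}: "{query}"', f"{output_name}:"]
    return "\n".join(parts)


examples = [
    ("Amazing film! Best I've seen this year. A must-watch!", "Positive"),
    ("Terrible waste of time. The acting was wooden.", "Negative"),
    ("Visually stunning but the story is weak.", "Mixed"),
]

prompt = build_few_shot_prompt(
    "Classify movie reviews as Positive, Negative, or Mixed.",
    examples,
    "So bad it's good. Hilarious unintentionally.",
)
print(prompt)
```

The resulting string can be passed to `generate_text` exactly like the hand-written prompt. Changing `output_name` to "Genre" reuses the same builder for other label sets.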

Question 5.1: Create Your Own Few-Shot Classifier

Create a few-shot classifier for movie genre based on the review text.

# TODO: Create a few-shot genre classifier
# Genres: Action, Comedy, Drama, Horror, Sci-Fi, Romance
# Provide 2-3 examples for each genre

def classify_genre_few_shot(review, provider="gemini"):
    """
    Classify movie genre from review using few-shot prompting.
    """
    prompt = """Classify the likely genre of this movie based on the review.

Examples:

# Add your examples here

Now classify:

Review: "{review}"
Genre:""".format(review=review)
    
    # Your code here
    pass

# Test your classifier

Part 6: LLM-Based Data Labeling (Week 3-4 Connection)

Remember Week 3-4? We spent effort on manual labeling and active learning. LLMs can accelerate labeling 10-100x!
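
Speed only helps if the labels are trustworthy. One way to check, reusing the Week 3 idea, is to label a small sample by hand and measure agreement with the LLM using Cohen's kappa. The sketch below implements the standard two-rater formula with the standard library only. The label lists are made-up toy data for illustration, not results from this lab.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters: (observed - expected) / (1 - expected)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both label lists must be non-empty and the same length")
    n = len(labels_a)
    # Fraction of items where both raters chose the same label
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each rater's label frequencies
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # Both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Toy labels (assumption): a human annotator vs. an LLM on four reviews
human = ["Positive", "Negative", "Positive", "Negative"]
llm = ["Positive", "Negative", "Negative", "Negative"]
print(f"kappa = {cohens_kappa(human, llm):.2f}")
```

Here the raters agree on 3 of 4 items (observed 0.75) while chance agreement is 0.5, so kappa is 0.5.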

6.1 (Solved) Multi-Label Classification

# SOLVED: Multi-label classification for movie review attributes

def label_review_attributes(review, provider="gemini"):
    """
    Label multiple attributes of a movie review.
    Returns structured JSON with multiple labels.
    """
    prompt = f"""Analyze this movie review and provide labels for the following attributes:

Review: "{review}"

Provide your analysis in this exact JSON format:
{{
    "sentiment": "Positive" or "Negative" or "Mixed",
    "mentions_acting": true or false,
    "mentions_visuals": true or false,
    "mentions_story": true or false,
    "mentions_director": true or false,
    "would_recommend": true or false,
    "intensity": "Strong" or "Moderate" or "Mild"
}}

Respond with ONLY the JSON, no other text."""
    
    response = generate_text(prompt, provider=provider, temperature=0)
    
    # Parse JSON from response
    try:
        # Clean up response (remove markdown code blocks if present)
        cleaned = response.strip()
        if cleaned.startswith("```"):
            cleaned = cleaned.split("```")[1]
            if cleaned.startswith("json"):
                cleaned = cleaned[4:]
        return json.loads(cleaned)
    except json.JSONDecodeError:
        return {"error": "Failed to parse JSON", "raw": response}

# Test on a review
test_review = "Heath Ledger's Joker is unforgettable. Best superhero movie ever made."
labels = label_review_attributes(test_review)

print(f"Review: {test_review}")
print(f"\nLabels:")
print(json.dumps(labels, indent=2))
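
The fence-stripping logic in `label_review_attributes` is repeated in several later cells and only handles a reply that starts with a code fence. The block below is a sketch of a reusable helper that also copes with extra text before or after the JSON. `extract_json` is a hypothetical name, and the approach (try the whole reply, then scan for the first parsable object or array) is one reasonable choice rather than the only one.

```python
import json
import re

FENCE = "`" * 3  # Markdown code-fence marker
FENCE_RE = re.compile(re.escape(FENCE) + r"(?:json)?\s*(.*?)" + re.escape(FENCE), re.DOTALL)


def extract_json(response):
    """Return the first JSON object or array found in a model reply, or None."""
    text = response.strip()
    # If the reply contains a fenced block, keep only its contents
    fenced = FENCE_RE.search(text)
    if fenced:
        text = fenced.group(1).strip()
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # Fall back: try to decode a JSON value starting at each '{' or '['
    decoder = json.JSONDecoder()
    for i, ch in enumerate(text):
        if ch in "{[":
            try:
                value, _end = decoder.raw_decode(text[i:])
                return value
            except json.JSONDecodeError:
                continue
    return None


print(extract_json('Sure! Here it is: {"sentiment": "Positive"} Hope that helps.'))
print(extract_json(FENCE + 'json\n{"would_recommend": true}\n' + FENCE))
print(extract_json("no json here"))
```

Returning `None` on failure lets the caller decide whether to retry, log the raw reply, or abstain.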

6.2 (Solved) Named Entity Recognition

# SOLVED: Extract named entities from reviews

def extract_entities(review, provider="gemini"):
    """
    Extract named entities (people, movies, etc.) from review.
    """
    prompt = f"""Extract all named entities from this movie review.

Review: "{review}"

Categories:
- PERSON: Actors, directors, characters
- MOVIE: Movie titles mentioned
- AWARD: Awards or accolades
- ORGANIZATION: Studios, production companies

Respond in JSON format:
{{
    "PERSON": ["name1", "name2"],
    "MOVIE": ["movie1"],
    "AWARD": ["award1"],
    "ORGANIZATION": ["org1"]
}}

Only include categories with entities found. Respond with ONLY JSON."""
    
    response = generate_text(prompt, provider=provider, temperature=0)
    
    try:
        cleaned = response.strip()
        if cleaned.startswith("```"):
            cleaned = cleaned.split("```")[1]
            if cleaned.startswith("json"):
                cleaned = cleaned[4:]
        return json.loads(cleaned)
    except json.JSONDecodeError:
        return {"error": "Failed to parse", "raw": response}

# Test
test_reviews = [
    "Heath Ledger's Joker in The Dark Knight is unforgettable. Christopher Nolan is a genius.",
    "Bong Joon-ho's Parasite deserved every Oscar it won. The Oscars got it right for once.",
]

print("Named Entity Recognition:\n")
for review in test_reviews:
    entities = extract_entities(review)
    print(f"Review: {review}")
    print(f"Entities: {json.dumps(entities, indent=2)}\n")
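
LLM-based extraction can return entities that never appear in the input, for instance an actor the model associates with the film. A cheap sanity check is to keep only entities whose text occurs in the review. The sketch below assumes the output shape requested in the prompt above (category mapped to a list of strings); `filter_grounded_entities` is a hypothetical helper name.

```python
def filter_grounded_entities(entities, review):
    """Keep only entities whose text literally appears in the review (case-insensitive)."""
    review_lower = review.lower()
    grounded = {}
    for category, names in entities.items():
        if not isinstance(names, list):
            continue  # Skip non-list values such as the "error"/"raw" fallback
        kept = [n for n in names if isinstance(n, str) and n.lower() in review_lower]
        if kept:
            grounded[category] = kept
    return grounded


review = "Heath Ledger's Joker in The Dark Knight is unforgettable."
# Hand-written example output (assumption), including one entity not in the review
model_output = {
    "PERSON": ["Heath Ledger", "Christian Bale"],
    "MOVIE": ["The Dark Knight"],
    "AWARD": [],
}
print(filter_grounded_entities(model_output, review))
```

A substring check is strict: it would also drop a correct but reworded entity, so treat dropped items as candidates for review rather than as definite errors.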

Question 6.1: LLM as Labeling Function (Snorkel Connection)

In Week 4, we used Snorkel labeling functions. Create an LLM-based labeling function.

# TODO: Create an LLM-based labeling function for Snorkel
# The function should:
# 1. Take a review text
# 2. Return 1 (Positive), 0 (Negative), or -1 (Abstain)
# 3. Only label high-confidence cases

POSITIVE = 1
NEGATIVE = 0
ABSTAIN = -1

def lf_llm_sentiment(review, provider="gemini"):
    """
    LLM-based labeling function for sentiment.
    Returns ABSTAIN for uncertain cases.
    """
    prompt = f"""Rate your confidence in the sentiment of this review.

Review: "{review}"

Respond in JSON format:
{{
    "sentiment": "Positive" or "Negative" or "Neutral",
    "confidence": 0.0 to 1.0
}}

Only JSON, no other text."""
    
    # Your code here
    # Parse response and return POSITIVE, NEGATIVE, or ABSTAIN based on confidence
    pass

# Test your labeling function

Question 6.2: Batch Labeling with Cost Tracking

# TODO: Create a batch labeling function that:
# 1. Labels multiple reviews
# 2. Tracks the number of API calls
# 3. Estimates cost (assume $0.001 per 1K tokens)
# 4. Handles rate limiting with sleep

def batch_label_reviews(reviews, provider="gemini", delay=1.0):
    """
    Label multiple reviews with cost tracking.
    
    Returns:
        labels: List of sentiment labels
        stats: Dict with api_calls, estimated_cost, etc.
    """
    # Your code here
    pass

# Test on first 5 reviews
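
As a building block for cost tracking, you need some way to count tokens. Exact counts depend on each model's tokenizer, so the sketch below uses a rough rule of thumb of about 4 characters per token for English text. That ratio is an assumption, as are the helper names `estimate_tokens` and `estimate_cost`; the price default matches the $0.001 per 1K tokens stated in the exercise.

```python
import math


def estimate_tokens(text, chars_per_token=4):
    """Rough token count from character length (a heuristic, not a real tokenizer)."""
    return math.ceil(len(text) / chars_per_token)


def estimate_cost(prompt, response, price_per_1k=0.001):
    """Return (total_tokens, estimated_cost_in_dollars) for one request."""
    tokens = estimate_tokens(prompt) + estimate_tokens(response)
    return tokens, tokens / 1000 * price_per_1k


prompt = 'Classify the sentiment of this movie review: "Just... no. Avoid at all costs."'
tokens, cost = estimate_cost(prompt, "Negative")
print(f"~{tokens} tokens, ~${cost:.6f}")
```

Where the API response reports actual token usage, prefer those numbers over this estimate.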

Part 7: LLM-Based Data Augmentation (Week 5 Connection)

Remember Week 5? We used nlpaug for text augmentation. LLMs can generate more natural paraphrases!

7.1 (Solved) Paraphrase Generation

# SOLVED: Generate paraphrases using LLM

def generate_paraphrases(text, n=3, provider="gemini"):
    """
    Generate n paraphrases of the input text.
    Maintains the original meaning and sentiment.
    """
    prompt = f"""Generate {n} different paraphrases of this movie review.
Keep the same sentiment and meaning, but vary the wording.

Original: "{text}"

Respond with a JSON array of {n} paraphrases:
["paraphrase 1", "paraphrase 2", ...]

Only JSON, no other text."""
    
    response = generate_text(prompt, provider=provider, temperature=0.7)
    
    try:
        cleaned = response.strip()
        if cleaned.startswith("```"):
            cleaned = cleaned.split("```")[1]
            if cleaned.startswith("json"):
                cleaned = cleaned[4:]
        return json.loads(cleaned)
    except json.JSONDecodeError:
        return [response]  # Return raw response if parsing fails

# Test
original = "Mind-blowing! Nolan does it again with this masterpiece."
paraphrases = generate_paraphrases(original, n=3)

print(f"Original: {original}")
print(f"\nParaphrases:")
for i, p in enumerate(paraphrases, 1):
    print(f"  {i}. {p}")
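
Generated paraphrases are only useful for augmentation if they differ from the original and from each other. The sketch below filters out empty strings, exact duplicates, and near-copies using `difflib.SequenceMatcher` from the standard library. The 0.9 similarity cutoff and the name `filter_paraphrases` are assumptions for illustration; tune the threshold on your own data.

```python
from difflib import SequenceMatcher


def filter_paraphrases(original, candidates, max_similarity=0.9):
    """Drop empty, duplicate, and near-identical paraphrases."""
    kept = []
    seen = {original.strip().lower()}
    for cand in candidates:
        if not isinstance(cand, str):
            continue
        key = cand.strip().lower()
        if not key or key in seen:
            continue  # Empty, a copy of the original, or already kept
        # Character-level similarity to the original, between 0 and 1
        ratio = SequenceMatcher(None, original.lower(), key).ratio()
        if ratio >= max_similarity:
            continue  # Too close to the original to add variety
        seen.add(key)
        kept.append(cand.strip())
    return kept


original = "Mind-blowing! Nolan does it again with this masterpiece."
# Hand-written candidates (assumption) standing in for model output
candidates = [
    "Mind-blowing! Nolan does it again with this masterpiece.",
    "Nolan delivers another stunning masterpiece.",
    "Nolan delivers another stunning masterpiece.",
    "",
    "Mind-blowing! Nolan does it again with this masterpiece!!",
]
print(filter_paraphrases(original, candidates))
```

This checks surface similarity only. It does not verify that the sentiment is preserved, which you could test by running the classifier from Part 5 on each kept paraphrase.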

7.2 (Solved) Style Transfer for Augmentation

# SOLVED: Style transfer - rewrite in different styles

def style_transfer(text, style, provider="gemini"):
    """
    Rewrite text in a different style while keeping sentiment.
    
    Styles: formal, casual, enthusiastic, critical, brief, detailed
    """
    prompt = f"""Rewrite this movie review in a {style} style.
Keep the same sentiment (positive/negative) but change the writing style.

Original: "{text}"

Rewritten ({style} style):"""
    
    response = generate_text(prompt, provider=provider, temperature=0.7)
    return response.strip()

# Test different styles
original = "This movie was absolutely fantastic! A must-watch."
styles = ["formal", "casual", "enthusiastic", "brief"]

print(f"Original: {original}\n")
for style in styles:
    rewritten = style_transfer(original, style)
    print(f"{style.capitalize()}: {rewritten}\n")

Question 7.1: Augment Training Data

Create an augmentation pipeline that expands our dataset.

# TODO: Create a function that augments a dataset by:
# 1. Generating 2 paraphrases per review
# 2. Applying one style transfer per review
# 3. Keeping track of original vs augmented samples

def augment_dataset(reviews, labels, augmentation_factor=3, provider="gemini"):
    """
    Augment a dataset of reviews.
    
    Args:
        reviews: List of review texts
        labels: List of sentiment labels
        augmentation_factor: How many augmented samples per original
    
    Returns:
        augmented_reviews: List including originals + augmented
        augmented_labels: Corresponding labels
        is_augmented: Boolean list (True if augmented)
    """
    # Your code here
    pass

# Test on a small sample

Part 8: Chain-of-Thought Reasoning

8.1 (Solved) CoT for Complex Analysis

# SOLVED: Chain-of-thought for nuanced review analysis

def analyze_review_cot(review, provider="gemini"):
    """
    Analyze a review step-by-step using chain-of-thought.
    """
    prompt = f"""Analyze this movie review step by step.

Review: "{review}"

Think through this carefully:

Step 1: What specific aspects of the movie does the reviewer mention?
Step 2: For each aspect, is the reviewer positive, negative, or neutral?
Step 3: What is the overall tone of the review?
Step 4: Would the reviewer recommend this movie?
Step 5: Final sentiment classification and confidence.

Provide your analysis:"""
    
    response = generate_text(prompt, provider=provider, temperature=0.3)
    return response

# Test on a nuanced review
nuanced_review = "Visually stunning but the story is predictable. James Cameron knows how to make a spectacle."
analysis = analyze_review_cot(nuanced_review)

print(f"Review: {nuanced_review}")
print(f"\nChain-of-Thought Analysis:\n{analysis}")
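
Chain-of-thought output is free text, which makes the final label hard to use downstream. One common workaround is to ask the model to end with a fixed marker line and parse only that line. The sketch below assumes you append an instruction such as "End with a line of the form FINAL: <label>" to the prompt; that convention and the helper name `parse_final_label` are assumptions, not part of the prompt above.

```python
import re

# Matches a line like "FINAL: Mixed" (case-insensitive, leading spaces allowed)
FINAL_PATTERN = re.compile(
    r"^\s*FINAL\s*:\s*(Positive|Negative|Mixed|Neutral)\b",
    re.IGNORECASE | re.MULTILINE,
)


def parse_final_label(cot_text):
    """Return the label from the last 'FINAL: <label>' line, or None if absent."""
    matches = FINAL_PATTERN.findall(cot_text)
    return matches[-1].capitalize() if matches else None


# Hand-written example reply (assumption) standing in for model output
reply = (
    "Step 1: The reviewer mentions visuals and story.\n"
    "Step 2: Visuals are praised, story is criticized.\n"
    "Step 5: The review balances praise and criticism.\n"
    "FINAL: mixed"
)
print(parse_final_label(reply))
```

Taking the last match handles replies where the model revises its answer partway through the reasoning.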

Question 8.1: CoT for Comparison

Use chain-of-thought to compare two reviews and determine which movie is better reviewed.

# TODO: Create a CoT function that compares two reviews

def compare_reviews_cot(review1, movie1, review2, movie2, provider="gemini"):
    """
    Compare two movie reviews using chain-of-thought.
    
    Returns:
        winner: Which movie is better reviewed
        reasoning: The step-by-step analysis
    """
    # Your code here
    pass

# Test: Compare Inception vs Avatar reviews

Part 9: Structured Output with Pydantic

9.1 (Solved) Pydantic Models for Review Analysis

# SOLVED: Structured output with Pydantic models

from pydantic import BaseModel, Field
from typing import List, Optional

class ReviewAnalysis(BaseModel):
    """Structured analysis of a movie review."""
    sentiment: str = Field(description="Overall sentiment: Positive, Negative, or Mixed")
    confidence: float = Field(ge=0, le=1, description="Confidence score 0-1")
    key_points: List[str] = Field(description="Main points from the review")
    mentioned_aspects: List[str] = Field(description="Aspects mentioned: acting, visuals, story, etc.")
    recommendation: bool = Field(description="Would the reviewer recommend this movie?")
    summary: str = Field(description="One-sentence summary of the review")

def analyze_review_structured(review, provider="gemini"):
    """
    Analyze review and return structured Pydantic model.
    """
    schema = ReviewAnalysis.model_json_schema()
    
    prompt = f"""Analyze this movie review and provide structured output.

Review: "{review}"

Respond with JSON matching this schema:
{json.dumps(schema['properties'], indent=2)}

Only JSON, no other text."""
    
    response = generate_text(prompt, provider=provider, temperature=0)
    
    try:
        cleaned = response.strip()
        if cleaned.startswith("```"):
            cleaned = cleaned.split("```")[1]
            if cleaned.startswith("json"):
                cleaned = cleaned[4:]
        data = json.loads(cleaned)
        return ReviewAnalysis(**data)
    except Exception as e:
        return {"error": str(e), "raw": response}

# Test
review = "Heath Ledger's Joker is unforgettable. Best superhero movie ever made."
analysis = analyze_review_structured(review)

print(f"Review: {review}")
print(f"\nStructured Analysis:")
if isinstance(analysis, ReviewAnalysis):
    print(f"  Sentiment: {analysis.sentiment} (confidence: {analysis.confidence})")
    print(f"  Key Points: {analysis.key_points}")
    print(f"  Aspects: {analysis.mentioned_aspects}")
    print(f"  Recommend: {analysis.recommendation}")
    print(f"  Summary: {analysis.summary}")
else:
    print(analysis)

Question 9.1: Create Custom Pydantic Model

Create a Pydantic model for extracting movie metadata from reviews.

# TODO: Create a MovieMetadata Pydantic model that extracts:
# - likely_genre (list of genres)
# - mentioned_actors (list of names)
# - mentioned_director (optional string)
# - year_hints (optional int if mentioned)
# - similar_movies_mentioned (list of movie titles)

class MovieMetadata(BaseModel):
    # Your fields here
    pass

def extract_movie_metadata(review, provider="gemini"):
    # Your code here
    pass

# Test on a review

Part 10: Comparing Models

10.1 (Solved) Model Comparison on Same Task

# SOLVED: Compare different models on the same task

def compare_models(review, task="sentiment"):
    """
    Run the same prompt on multiple models and compare results.
    """
    if task == "sentiment":
        prompt = f"""Classify this review as Positive, Negative, or Mixed.
Review: "{review}"
Sentiment:"""
    else:
        prompt = task  # Use as custom prompt
    
    results = {}
    
    # Gemini
    if gemini_client:
        try:
            results["Gemini Flash"] = generate_text(prompt, provider="gemini")
        except Exception as e:
            results["Gemini Flash"] = f"Error: {e}"
    
    # OpenRouter models
    if openrouter_client:
        for name, model_id in list(FREE_MODELS.items())[:3]:  # Test first 3 free models
            try:
                results[name] = generate_text(prompt, provider="openrouter", model=model_id)
                time.sleep(1)  # Rate limiting
            except Exception as e:
                results[name] = f"Error: {e}"
    
    return results

# Compare on a tricky review
tricky_review = "So bad it's good. Hilarious unintentionally."
results = compare_models(tricky_review)

print(f"Review: {tricky_review}")
print(f"\nModel Responses:")
for model, response in results.items():
    print(f"  {model}: {response.strip()[:100]}")

Question 10.1: Benchmark Models

Create a benchmark comparing model accuracy on labeled test data.

# TODO: Create a benchmark with ground truth labels

# Ground truth labels for our reviews
ground_truth = {
    1: "Positive",   # Inception
    2: "Mixed",      # The Room (so bad it's good)
    3: "Positive",   # Parasite
    4: "Negative",   # Cats
    5: "Positive",   # The Godfather
    6: "Mixed",      # Avatar (stunning but predictable)
    7: "Positive",   # The Dark Knight
    8: "Neutral",    # Twilight
    9: "Positive",   # Interstellar
    10: "Negative",  # Emoji Movie
}

def benchmark_models(reviews_df, ground_truth, providers=("gemini",)):
    """
    Benchmark models on labeled data.
    
    Returns:
        results: Dict with accuracy per model
    """
    # Your code here
    pass

# Run benchmark

Part 11: Cost-Effective Strategies

11.1 (Solved) Batching Multiple Items

# SOLVED: Batch multiple reviews in one API call

def batch_classify(reviews, provider="gemini"):
    """
    Classify multiple reviews in a single API call.
    More cost-effective than individual calls.
    """
    # Format reviews with numbers
    formatted = "\n".join([f"{i+1}. \"{r}\"" for i, r in enumerate(reviews)])
    
    prompt = f"""Classify the sentiment of each movie review below.
For each review, respond with the number and sentiment (Positive/Negative/Mixed).

Reviews:
{formatted}

Format your response as:
1. [Sentiment]
2. [Sentiment]
..."""
    
    response = generate_text(prompt, provider=provider, temperature=0, max_tokens=500)
    
    # Parse results
    results = []
    for line in response.strip().split("\n"):
        line = line.strip()
        if line and line[0].isdigit():
            # Extract sentiment; record "Unknown" so results stay aligned with reviews
            for sentiment in ["Positive", "Negative", "Mixed", "Neutral"]:
                if sentiment.lower() in line.lower():
                    results.append(sentiment)
                    break
            else:
                results.append("Unknown")
    
    return results

# Test batch classification
test_reviews = [
    "Amazing film! Best I've seen this year.",
    "Terrible waste of time.",
    "Visually stunning but boring story.",
    "Perfect in every way.",
    "Meh. It was okay."
]

sentiments = batch_classify(test_reviews)

print("Batch Classification Results:\n")
for review, sentiment in zip(test_reviews, sentiments):
    print(f"{sentiment:10} | {review[:50]}...")

print(f"\nClassified {len(test_reviews)} reviews in 1 API call!")
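
A single call cannot hold an unlimited number of reviews, so larger datasets need to be split into fixed-size batches. The sketch below shows one way to do that and to guard against a batch whose reply has the wrong number of labels. `chunked`, `classify_in_batches`, and the stand-in classifier are hypothetical names; in the lab, `classify_fn` would be `batch_classify`.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def classify_in_batches(reviews, classify_fn, batch_size=10):
    """Classify reviews batch by batch, keeping labels aligned with inputs."""
    labels = []
    for batch in chunked(reviews, batch_size):
        result = list(classify_fn(batch))
        if len(result) != len(batch):
            # Alignment cannot be trusted, so mark the whole batch for a retry
            result = ["Unknown"] * len(batch)
        labels.extend(result)
    return labels


# Stand-in classifier (assumption): labels everything Positive without an API call
def fake_classify(batch):
    return ["Positive"] * len(batch)


reviews = [f"review {i}" for i in range(7)]
print(list(chunked(reviews, 3)))
print(classify_in_batches(reviews, fake_classify, batch_size=3))
```

With 7 reviews and a batch size of 3, this makes 3 calls instead of 7.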

Question 11.1: Calculate Cost Savings

# TODO: Calculate the cost difference between:
# 1. Individual API calls (one per review)
# 2. Batched API calls (10 reviews per call)

# Assume:
# - Average prompt length: 50 tokens
# - Average response length: 10 tokens per review
# - Cost: $0.001 per 1K input tokens, $0.002 per 1K output tokens

def calculate_cost_comparison(n_reviews, tokens_per_prompt=50, tokens_per_response=10):
    """
    Calculate cost comparison between individual and batched calls.
    """
    # Your code here
    pass

# Calculate for 1000 reviews

Part 12: Building a Complete Pipeline

Challenge: Movie Review Analysis Pipeline

Build a complete pipeline that processes movie reviews through multiple stages.

# Challenge: Build a complete analysis pipeline

class MovieReviewPipeline:
    """
    Complete pipeline for movie review analysis.
    
    Stages:
    1. Classification (sentiment)
    2. Entity extraction (actors, directors)
    3. Attribute labeling (acting, visuals, story)
    4. Augmentation (paraphrases for training)
    5. Summary generation
    """
    
    def __init__(self, provider="gemini"):
        self.provider = provider
        self.stats = {
            "api_calls": 0,
            "reviews_processed": 0,
            "errors": 0
        }
    
    def classify_sentiment(self, review):
        """Stage 1: Sentiment classification."""
        self.stats["api_calls"] += 1
        return classify_sentiment_few_shot(review, self.provider)
    
    def extract_entities(self, review):
        """Stage 2: Named entity extraction."""
        self.stats["api_calls"] += 1
        return extract_entities(review, self.provider)
    
    def label_attributes(self, review):
        """Stage 3: Multi-attribute labeling."""
        self.stats["api_calls"] += 1
        return label_review_attributes(review, self.provider)
    
    def augment(self, review, n=2):
        """Stage 4: Generate paraphrases."""
        self.stats["api_calls"] += 1
        return generate_paraphrases(review, n, self.provider)
    
    def summarize(self, review):
        """Stage 5: Generate one-line summary."""
        self.stats["api_calls"] += 1
        prompt = f'Summarize this movie review in one sentence: "{review}"'
        return generate_text(prompt, self.provider, temperature=0.3)
    
    def process_review(self, review, stages=("classify", "entities", "attributes", "summarize")):
        """
        Process a single review through selected stages.
        """
        result = {"original": review}
        
        try:
            if "classify" in stages:
                result["sentiment"] = self.classify_sentiment(review)
            
            if "entities" in stages:
                result["entities"] = self.extract_entities(review)
            
            if "attributes" in stages:
                result["attributes"] = self.label_attributes(review)
            
            if "augment" in stages:
                result["paraphrases"] = self.augment(review)
            
            if "summarize" in stages:
                result["summary"] = self.summarize(review)
            
            self.stats["reviews_processed"] += 1
            
        except Exception as e:
            result["error"] = str(e)
            self.stats["errors"] += 1
        
        return result
    
    def get_stats(self):
        """Get processing statistics."""
        return self.stats

# Test the pipeline
pipeline = MovieReviewPipeline(provider="gemini")

# Process a review through all stages
test_review = "Heath Ledger's Joker is unforgettable. Christopher Nolan's best work."
result = pipeline.process_review(test_review)

print(f"Pipeline Result for: {test_review}\n")
for key, value in result.items():
    print(f"{key}: {value}\n")

print(f"\nStats: {pipeline.get_stats()}")
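
The pipeline makes several API calls per review, and rerunning a notebook cell repeats all of them. A simple in-memory cache avoids paying twice for the same prompt. This is a sketch: `CachedGenerator` is a hypothetical wrapper, the stand-in function replaces a real LLM call, and caching mainly makes sense for calls at temperature 0, where you expect the same answer each time.

```python
import hashlib


class CachedGenerator:
    """Wrap a text-generation function and reuse results for repeated prompts."""

    def __init__(self, generate_fn):
        self.generate_fn = generate_fn
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def __call__(self, prompt, **kwargs):
        # The key covers the prompt and all keyword settings (provider, temperature, ...)
        key_source = prompt + "|" + repr(sorted(kwargs.items()))
        key = hashlib.sha256(key_source.encode("utf-8")).hexdigest()
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        result = self.generate_fn(prompt, **kwargs)
        self.cache[key] = result
        return result


# Stand-in for an LLM call (assumption) so the example runs offline
def fake_generate(prompt, **kwargs):
    return f"summary of: {prompt[:20]}"


cached = CachedGenerator(fake_generate)
cached("Heath Ledger's Joker is unforgettable.", temperature=0)
cached("Heath Ledger's Joker is unforgettable.", temperature=0)
print(f"hits={cached.hits}, misses={cached.misses}")
```

The cache lives only as long as the Python session. Persisting it to disk would be a natural extension if you rerun the notebook often.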

Challenge: Process Entire Dataset

# TODO: Process all 15 reviews through the pipeline
# Add rate limiting (1 second delay between reviews)
# Track total processing time and costs

# Your code here

Challenge Problems

Challenge 1: Hybrid Labeling System

# Challenge: Build a hybrid labeling system that:
# 1. Uses LLM for initial labeling with confidence scores
# 2. Sends low-confidence items to human review queue
# 3. Uses majority voting when multiple LLM models disagree

class HybridLabeler:
    def __init__(self, confidence_threshold=0.8):
        self.threshold = confidence_threshold
        self.human_queue = []
        self.auto_labeled = []
    
    def label(self, review):
        """
        Label a review using hybrid approach.
        """
        # Your code here
        pass
    
    def get_human_queue(self):
        return self.human_queue
    
    def get_auto_labeled(self):
        return self.auto_labeled

# Test your hybrid labeler

Challenge 2: LLM-Based Data Validator

# Challenge: Use LLM to validate and fix movie data (Week 2 connection)

messy_data = [
    {"title": "the godfather", "year": "1972", "rating": "9.2/10"},
    {"title": "INCEPTION", "year": "two thousand ten", "rating": "good"},
    {"title": "The Dark Knight  ", "year": "2008", "rating": "9.0"},
    {"title": "interstellar", "year": "14", "rating": "8.6 out of 10"},
]

def validate_and_fix_with_llm(data, provider="gemini"):
    """
    Use LLM to validate and fix messy movie data.
    
    Returns:
        cleaned_data: List of properly formatted records
        fixes_made: List of fixes that were applied
    """
    # Your code here
    pass

# Test your validator

Challenge 3: Build a Review Generator for Testing

# Challenge: Generate synthetic movie reviews for testing
# This is useful for testing your ML pipeline when real data is limited

def generate_synthetic_reviews(movie_title, sentiment, n=5, provider="gemini"):
    """
    Generate synthetic movie reviews with specified sentiment.

    Args:
        movie_title: Name of the movie
        sentiment: "Positive", "Negative", or "Mixed"
        n: Number of reviews to generate

    Returns:
        List of synthetic review texts
    """
    # Your code here
    pass

# Generate 5 positive and 5 negative reviews for a new movie

Part 13: Beyond Movies - Real-World LLM Applications

Now let’s explore the true power of LLMs across diverse tasks!

13.1 Text Summarization

# Text Summarization - Works on any content!

def summarize_text(text, max_sentences=3, provider="gemini"):
    """Summarize any text into key points."""
    prompt = f"""Summarize this text in {max_sentences} sentences or less.
Focus on the key points.

Text: {text}

Summary:"""
    return generate_text(prompt, provider=provider, temperature=0.3)

# Example: Summarize a research abstract
research_abstract = """
Machine learning models are increasingly being deployed in production environments,
but maintaining their performance over time remains challenging. This paper introduces
a novel approach to continuous model monitoring using statistical drift detection
combined with automated retraining pipelines. Our method achieves 95% accuracy in
detecting performance degradation within 24 hours, compared to 48+ hours for baseline
approaches. We validate our approach on three real-world datasets spanning recommendation
systems, fraud detection, and natural language processing. Results show a 40% reduction
in model downtime and 25% improvement in overall system reliability.
"""

summary = summarize_text(research_abstract)

print("Original (Research Abstract):")
print(research_abstract[:200] + "...")
print(f"\nSummary:\n{summary}")

13.2 Translation

# Translation - Any language pair!

def translate(text, source_lang, target_lang, provider="gemini"):
    """Translate text between languages."""
    prompt = f"""Translate this text from {source_lang} to {target_lang}.
Only provide the translation, no explanations.

Text: {text}

Translation:"""
    return generate_text(prompt, provider=provider, temperature=0.3)

# Examples
examples = [
    ("Hello, how are you today?", "English", "Hindi"),
    ("Machine learning is transforming industries.", "English", "Spanish"),
    ("Bonjour, je m'appelle Claude.", "French", "English"),
]

print("Translation Examples:\n")
for text, source, target in examples:
    translation = translate(text, source, target)
    print(f"{source}: {text}")
    print(f"{target}: {translation}\n")

13.3 Code Generation & Explanation

# Code Generation - Write code from descriptions!

def generate_code(description, language="Python", provider="gemini"):
    """Generate code from natural language description."""
    prompt = f"""Write {language} code that does the following:
{description}

Only provide the code with comments. No explanations outside the code."""
    return generate_text(prompt, provider=provider, temperature=0.3)

def explain_code(code, provider="gemini"):
    """Explain what a piece of code does."""
    prompt = f"""Explain what this code does in simple terms:

    ```{code}```

Explanation:"""
    return generate_text(prompt, provider=provider, temperature=0.3)

# Example 1: Generate code
description = "Calculate the Fibonacci sequence up to n terms and return as a list"
code = generate_code(description)
print(f"Task: {description}")
print(f"\nGenerated Code:\n{code}")

# Example 2: Explain code
mystery_code = """
def f(x):
    return x if x <= 1 else f(x-1) + f(x-2)
"""

explanation = explain_code(mystery_code)
print(f"\n\nCode to explain: {mystery_code}")
print(f"Explanation: {explanation}")

13.4 Question Answering from Context

# Question Answering - Extract answers from documents!

def answer_question(context, question, provider="gemini"):
    """Answer a question based on given context."""
    prompt = f"""Based on the following context, answer the question.
If the answer is not in the context, say "Not found in context."

Context:
{context}

Question: {question}

Answer:"""
    return generate_text(prompt, provider=provider, temperature=0)

# Example: Company FAQ
company_context = """
TechStart Inc. was founded in 2018 by Dr. Sarah Chen and Mark Williams.
The company is headquartered in Bangalore, India with offices in San Francisco and London.
TechStart specializes in AI-powered customer service solutions and has raised $50 million
in Series B funding. The company has 250 employees and serves over 500 enterprise clients.
Their flagship product, ChatAssist Pro, handles 10 million customer interactions monthly.
Office hours are 9 AM to 6 PM IST, Monday through Friday.
"""

questions = [
    "Who founded TechStart?",
    "How much funding has the company raised?",
    "Where is the headquarters?",
    "What is their main product?",
    "What is the company's stock price?"  # Not in context
]

print("Q&A from Company Document:\n")
for q in questions:
    answer = answer_question(company_context, q)
    print(f"Q: {q}")
    print(f"A: {answer}\n")

13.5 Structured Data Extraction

# Structured Data Extraction - Parse unstructured text to JSON!

def extract_structured_data(text, schema_description, provider="gemini"):
    """Extract structured data from unstructured text."""
    prompt = f"""Extract information from this text into structured JSON.

Text: {text}

Extract these fields: {schema_description}

Respond with only valid JSON."""
    response = generate_text(prompt, provider=provider, temperature=0)
    try:
        cleaned = response.strip()
        # Strip a Markdown code fence if the model wrapped its JSON in one
        if cleaned.startswith("```"):
            cleaned = cleaned.split("```")[1]
            if cleaned.startswith("json"):
                cleaned = cleaned[4:]
        return json.loads(cleaned)
    except Exception:
        # Fall back to the raw text if the response is not valid JSON
        return {"raw": response}

# Example 1: Extract from email
email = """
Hi Team,

Just wanted to confirm our meeting for next Tuesday, December 10th at 2:30 PM.
We'll be discussing the Q4 budget review in Conference Room B.
Please bring your laptops and the latest sales reports.

Thanks,
Jennifer Martinez
Senior Project Manager
"""

schema = "sender_name, sender_role, meeting_date, meeting_time, meeting_topic, location, required_items"
extracted = extract_structured_data(email, schema)
print("Email Extraction:")
print(json.dumps(extracted, indent=2))

# Example 2: Extract from job posting
job_posting = """
We're hiring a Senior Machine Learning Engineer at Google Bangalore.
Requirements: 5+ years experience, Python, PyTorch, distributed systems.
Salary: 40-60 LPA. Remote-friendly. Apply by January 15, 2025.
"""

schema = "job_title, company, location, experience_required, skills, salary_range, deadline, remote_policy"
extracted = extract_structured_data(job_posting, schema)
print("\nJob Posting Extraction:")
print(json.dumps(extracted, indent=2))

13.6 Intent Classification & Routing

# Intent Classification - Route messages to the right department!

def classify_intent(message, intents, provider="gemini"):
    """Classify user message intent for routing."""
    intent_list = ", ".join(intents)
    prompt = f"""Classify this customer message into one of these intents: {intent_list}

Message: "{message}"

Respond with JSON:
{{"intent": "chosen_intent", "confidence": 0.0-1.0, "reasoning": "brief explanation"}}

Only JSON."""
    response = generate_text(prompt, provider=provider, temperature=0)
    try:
        cleaned = response.strip()
        # Strip a Markdown code fence if the model wrapped its JSON in one
        if cleaned.startswith("```"):
            cleaned = cleaned.split("```")[1].strip()
            if cleaned.startswith("json"):
                cleaned = cleaned[4:].strip()
        return json.loads(cleaned)
    except Exception:
        # Fall back to the raw text if the response is not valid JSON
        return {"raw": response}

# Customer support routing example
intents = ["billing", "technical_support", "sales", "returns", "general_inquiry"]

messages = [
    "My credit card was charged twice for the same order!",
    "The app keeps crashing when I try to upload photos.",
    "I'd like to know about enterprise pricing for 100+ users.",
    "I want to return the headphones I bought last week.",
    "What are your office hours?",
    "The product I received is different from what I ordered."
]

print("Customer Support Intent Classification:\n")
for msg in messages:
    result = classify_intent(msg, intents)
    if isinstance(result, dict) and "intent" in result:
        print(f"Message: {msg[:50]}...")
        print(f"  -> Intent: {result['intent']} (confidence: {result.get('confidence', 'N/A')})\n")
    else:
        print(f"Message: {msg[:50]}... -> {result}\n")
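
The fence-stripping and JSON-parsing steps are repeated in 13.5 and 13.6. One way to share them is sketched below. `parse_json_response` is a hypothetical helper name, not part of the lab's code, and it assumes the model's reply is a string that either is JSON or wraps JSON in a single Markdown code fence.

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker (three backticks)


def parse_json_response(response):
    """Parse an LLM reply as JSON, tolerating a surrounding code fence.

    Returns the parsed object, or {"raw": response} if parsing fails.
    """
    cleaned = response.strip()
    if cleaned.startswith(FENCE):
        # Keep only the text between the first pair of fences
        cleaned = cleaned.split(FENCE)[1]
        # Drop an optional language tag right after the opening fence
        if cleaned.startswith("json"):
            cleaned = cleaned[len("json"):]
    try:
        return json.loads(cleaned.strip())
    except json.JSONDecodeError:
        return {"raw": response}
```

Keeping the parsing in one place means a change to the fallback behaviour, for example retrying the request instead of returning the raw text, only has to be made once.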

Question 13.1: Build Your Own Application

Choose one of these mini-projects and implement it using the techniques above.

# TODO: Choose and implement ONE of these mini-projects:

# Option A: Resume Parser
# - Extract name, email, skills, education, experience from resume text
# - Return structured JSON

# Option B: Recipe Assistant
# - Take a list of ingredients
# - Generate a recipe that uses those ingredients
# - Include cooking time and difficulty level

# Option C: Study Assistant
# - Take a paragraph of educational content
# - Generate 5 quiz questions with answers
# - Vary difficulty levels

# Option D: Sentiment-Aware Chatbot
# - Detect user sentiment from their message
# - Respond appropriately (empathetic if negative, enthusiastic if positive)

# Your implementation here:
def my_application(input_data, provider="gemini"):
    """
    Your mini-project implementation.
    """
    # Your code here
    pass

# Test your application:

Summary

In this lab, you learned:

  1. API Setup: Gemini (free tier) and OpenRouter (free models)
  2. Prompt Engineering: Zero-shot, few-shot, chain-of-thought
  3. Data Labeling: LLMs as 10-100x faster labelers (Week 3-4 connection)
  4. Data Augmentation: Paraphrases and style transfer (Week 5 connection)
  5. Structured Output: JSON extraction with Pydantic
  6. Cost Optimization: Batching, model selection

Key Takeaways

Task | LLM Approach | Benefit
Sentiment Classification | Zero-shot/Few-shot | Free on the Gemini free tier
Data Labeling | LLM + confidence filtering | 10-100x faster than humans
Text Augmentation | Paraphrase generation | More natural than rule-based
Data Validation | LLM-based fixing | Handles edge cases

Free Resources

Next Week

Week 7: Model Development - Train your own models with the labeled data!