# Install required packages
# !pip install google-genai pillow requests matplotlib pandas numpy

Introduction
This notebook demonstrates the multimodal capabilities of Google’s Gemini API using the google-genai SDK, covering text, vision, video, audio, and PDF inputs along with advanced features.
What You’ll Learn
API Fundamentals
- Single vs batch processing
- Streaming responses
- Configuration options
- Error handling
Text Capabilities (10 examples)
- Sentiment analysis, classification, NER
- Summarization, Q&A, translation
- Text completion and rewriting
- Entity and keyword extraction
Vision Capabilities (8 examples)
- Object counting and visual Q&A
- Chart and document analysis
- Image captioning and comparison
- Visual reasoning and diagrams
Video Capabilities (3 examples)
- Video understanding and analysis
- Frame-by-frame analysis
- Action recognition
Audio Capabilities (2 examples)
- Speech transcription
- Audio content analysis
PDF Capabilities (2 examples)
- Document analysis
- Multi-page extraction
Advanced Features (8 examples)
- Structured JSON output
- Function calling
- Code execution
- Search grounding
- Long context (1M tokens)
- Code generation
- Mathematical and scientific reasoning
- Creative writing
Model Used
Model: gemini-2.0-flash-thinking-exp-1219
- Input modalities: Text, Image, Video, Audio, PDF
- Output: Text
- Context: 1M+ tokens input, 65K+ tokens output
- Special features: Code execution, search grounding, thinking mode
Setup and Configuration
import os
import json
import time
from pathlib import Path
from typing import List, Dict, Any
from google import genai
from PIL import Image, ImageDraw, ImageFont
import requests
from io import BytesIO
import base64
import matplotlib.pyplot as plt
import numpy as np
# Check for API key
if 'GEMINI_API_KEY' not in os.environ:
    raise ValueError(
        "GEMINI_API_KEY not found in environment.\n"
        "Set it with: export GEMINI_API_KEY='your-key'\n"
        "Get your key at: https://aistudio.google.com/apikey"
    )
# Initialize client (new SDK)
client = genai.Client(api_key=os.environ['GEMINI_API_KEY'])
print("Gemini client initialized successfully")
print("Using google-genai SDK (new version)")
# Note: We'll use gemini-2.0-flash-thinking-exp-1219 as the default model
MODEL = "gemini-2.0-flash-thinking-exp-1219"
print(f"Default model: {MODEL}")

Gemini client initialized successfully
Using google-genai SDK (new version)
Default model: gemini-2.0-flash-thinking-exp-1219
Helper Functions
def print_section(title: str):
    """Print formatted section header."""
    print("\n" + "="*80)
    print(title)
    print("="*80)

def print_result(label: str, content: str, indent: int = 0):
    """Print formatted result."""
    prefix = " " * indent
    print(f"{prefix}{label}: {content}")

def load_image_from_url(url: str) -> Image.Image:
    """Load an image from a URL, failing fast on HTTP errors."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return Image.open(BytesIO(response.content))

def create_sample_image(text: str, size=(800, 600)) -> Image.Image:
    """Create a simple image with text for testing."""
    img = Image.new('RGB', size, color='white')
    draw = ImageDraw.Draw(img)
    draw.text((50, size[1]//2), text, fill='black')
    return img
print("Helper functions loaded")

Helper functions loaded
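Error handling is listed under the API fundamentals above but never shown explicitly. A minimal retry-with-backoff wrapper can guard the API calls that follow; the `with_retries` helper is illustrative, not part of the SDK:

```python
import random
import time

def with_retries(fn, max_attempts: int = 3, base_delay: float = 1.0):
    """Call fn(), retrying on exception with exponential backoff plus jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise
            # Sleep base_delay, 2*base_delay, 4*base_delay, ... plus a little jitter
            time.sleep(base_delay * 2 ** (attempt - 1) + random.random() * 0.1)

# Example usage against the API (hypothetical prompt):
# answer = with_retries(lambda: client.models.generate_content(
#     model=MODEL, contents="ping").text)
```

Transient failures (rate limits, network hiccups) then retry automatically instead of killing the whole notebook run.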
Part 1: API Usage Patterns
Single Request vs Batch Processing
print("Single Request")
print("="*80)
# Single request - simplest way
response = client.models.generate_content(
    model=MODEL,
    contents="What is the capital of France?"
)
print_result("Question", "What is the capital of France?")
print_result("Answer", response.text)

Single Request
================================================================================
Question: What is the capital of France?
Answer: The capital of France is **Paris**.
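Streaming responses are also promised in the introduction. A minimal sketch of consuming a stream; the `generate_content_stream` method name is assumed from the google-genai SDK, so verify it against your installed version:

```python
def consume_stream(chunks) -> str:
    """Print streamed chunks as they arrive and return the accumulated text."""
    parts = []
    for chunk in chunks:
        text = getattr(chunk, "text", "") or ""
        print(text, end="", flush=True)
        parts.append(text)
    print()
    return "".join(parts)

# With the real client (assumed SDK method):
# stream = client.models.generate_content_stream(
#     model=MODEL, contents="Write a haiku about Paris.")
# full_text = consume_stream(stream)
```

Streaming shows partial output immediately, which matters for long generations where waiting for the full response feels slow.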
print("\n" + "="*80)
print("Batch Processing")
print("="*80)
# Process multiple prompts efficiently
prompts = [
    "Translate 'Hello' to Spanish",
    "Translate 'Goodbye' to French",
    "Translate 'Thank you' to German",
    "Translate 'Welcome' to Italian"
]
# Method 1: Sequential (simple but slower)
print("Sequential processing:")
start = time.time()
results_seq = []
for prompt in prompts:
    response = client.models.generate_content(
        model="gemini-2.0-flash-thinking-exp-1219",
        contents=prompt
    )
    results_seq.append(response.text.strip())
time_seq = time.time() - start
for i, (prompt, result) in enumerate(zip(prompts, results_seq), 1):
    print(f" {i}. {prompt} → {result}")
print(f"Time: {time_seq:.2f}s")
================================================================================
Batch Processing
================================================================================
Sequential processing:
1. Translate 'Hello' to Spanish → The most common way to say "Hello" in Spanish is **Hola**.
2. Translate 'Goodbye' to French → The most common and direct translation of "Goodbye" in French is:
**Au revoir**
3. Translate 'Thank you' to German → The most common and direct translation of 'Thank you' to German is:
**Danke**
For a slightly more formal or emphatic "Thank you" (similar to "Thank you very much"):
**Danke schön**
Or, to say "Many thanks" / "Thank you very much":
**Vielen Dank**
4. Translate 'Welcome' to Italian → The most common translation for "Welcome" as a greeting in Italian is:
* To a **man**: **Benvenuto**
* To a **woman**: **Benvenuta**
* To a **group** (mixed or all male): **Benvenuti**
* To a **group** (all female): **Benvenute**
So, depending on who you are welcoming, you would use the appropriate form.
If you mean "You're welcome" in response to "Thank you," the common Italian phrases are:
* **Prego**
* **Di niente**
* **Di nulla**
Time: 10.11s
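Method 1 above is sequential; since the prompts are independent, a natural second method fans them out over a thread pool. A minimal sketch (the `run_batch` helper is illustrative, not an SDK feature; pass it a callable that wraps the API request):

```python
from concurrent.futures import ThreadPoolExecutor

def run_batch(fn, prompts, max_workers: int = 4):
    """Apply fn to each prompt concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fn, prompts))

# Real usage (each worker makes its own API request):
# results_par = run_batch(
#     lambda p: client.models.generate_content(
#         model=MODEL, contents=p).text.strip(),
#     prompts)
```

With four workers, wall-clock time for four prompts approaches the latency of the slowest single request rather than the sum of all four, though API rate limits cap how far this scales.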
Part 2: Text-Only Tasks (10 tasks)
print("\n" + "="*80)
print("Zero-Shot Sentiment Analysis")
print("="*80)
texts = [
    "This product is absolutely amazing! Best purchase I've made all year.",
    "Terrible experience. Waste of money and time.",
    "It's okay. Nothing special but does the job.",
    "I'm disappointed with the quality. Expected much better.",
    "Exceeded all my expectations! Highly recommend!"
]
prompt_template = """Classify the sentiment: Positive, Negative, or Neutral.
Reply with ONLY the sentiment label.
Text: {text}
Sentiment:"""
for i, text in enumerate(texts, 1):
    response = client.models.generate_content(
        model="gemini-2.0-flash-thinking-exp-1219",
        contents=prompt_template.format(text=text)
    )
    sentiment = response.text.strip()
    print(f"{i}. '{text[:50]}...'")
    print(f" → {sentiment}\n")
================================================================================
Zero-Shot Sentiment Analysis
================================================================================
1. 'This product is absolutely amazing! Best purchase ...'
→ Positive
2. 'Terrible experience. Waste of money and time....'
→ Negative
3. 'It's okay. Nothing special but does the job....'
→ Neutral
4. 'I'm disappointed with the quality. Expected much b...'
→ Negative
5. 'Exceeded all my expectations! Highly recommend!...'
→ Positive
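Even with "Reply with ONLY the sentiment label", models sometimes pad the answer with extra words. A small normalizer (an illustrative helper, not from the SDK) makes the labels safe to count or aggregate:

```python
def normalize_sentiment(raw: str) -> str:
    """Map a free-form model reply onto Positive/Negative/Neutral, else 'Unknown'."""
    lowered = raw.strip().lower()
    for label in ("Positive", "Negative", "Neutral"):
        if label.lower() in lowered:
            return label
    return "Unknown"
```

The "Unknown" fallback keeps downstream tallies honest when the model refuses or rambles instead of labeling.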
print("\n" + "="*80)
print("Few-Shot Text Classification")
print("="*80)
# Intent classification with examples
prompt = """Classify customer service queries into categories.
Examples:
"How do I reset my password?" → Technical Support
"I was charged twice" → Billing
"What are your hours?" → General Inquiry
"This is broken" → Complaint
"I want to cancel" → Account Management
Query: "{query}"
Category:"""
test_queries = [
    "My app keeps crashing when I upload photos",
    "Why was I charged for premium when I'm on free plan?",
    "Do you ship to Canada?",
    "The product arrived damaged",
    "How do I delete my account?"
]
for query in test_queries:
    response = client.models.generate_content(
        model="gemini-2.0-flash-thinking-exp-1219",
        contents=prompt.format(query=query)
    )
    print(f"Query: {query}")
    print(f"Category: {response.text.strip()}\n")
================================================================================
Few-Shot Text Classification
================================================================================
Query: My app keeps crashing when I upload photos
Category: Category: Technical Support
Query: Why was I charged for premium when I'm on free plan?
Category: Billing
Query: Do you ship to Canada?
Category: General Inquiry
Query: The product arrived damaged
Category: Complaint
Query: How do I delete my account?
Category: Account Management
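Note the first result above reads "Category: Category: Technical Support": the model echoed the "Category:" cue from the prompt. A tiny cleanup function (illustrative only) strips such echoed prefixes before the label is used:

```python
def clean_label(raw: str, prefix: str = "Category:") -> str:
    """Strip an echoed prompt prefix (e.g. 'Category:') from a model reply."""
    text = raw.strip()
    if text.lower().startswith(prefix.lower()):
        text = text[len(prefix):].strip()
    return text
```

Applying `clean_label(response.text)` in the loop would keep the printed categories uniform regardless of whether the model repeats the cue.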
print("\n" + "="*80)
print("Named Entity Recognition (NER)")
print("="*80)
text = """Apple Inc. CEO Tim Cook announced a $500 million investment in renewable
energy projects across California next month. The announcement was made at the
company's headquarters in Cupertino on December 15, 2024."""
prompt = f"""Extract all named entities and categorize them:
PERSON, ORGANIZATION, LOCATION, MONEY, DATE
Text: {text}
Format as JSON with entity type as key."""
response = client.models.generate_content(
    model="gemini-2.0-flash-thinking-exp-1219",
    contents=prompt)
print(f"Text: {text}\n")
print("Entities:")
print(response.text)
================================================================================
Named Entity Recognition (NER)
================================================================================
Text: Apple Inc. CEO Tim Cook announced a $500 million investment in renewable
energy projects across California next month. The announcement was made at the
company's headquarters in Cupertino on December 15, 2024.
Entities:
```json
{
"ORGANIZATION": [
"Apple Inc."
],
"PERSON": [
"Tim Cook"
],
"MONEY": [
"$500 million"
],
"LOCATION": [
"California",
"Cupertino"
],
"DATE": [
"December 15, 2024"
]
}
```
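As the output above shows, the model wraps its JSON in a Markdown code fence, so `json.loads(response.text)` would fail directly. A small parser (an illustrative helper, assuming the fence format seen above) recovers the data:

```python
import json

def parse_json_response(raw: str):
    """Parse model output that may be wrapped in a ```json ... ``` fence."""
    text = raw.strip()
    if text.startswith("```"):
        # Drop the opening fence (with optional language tag) and the closing fence
        text = text.split("\n", 1)[1]
        text = text.rsplit("```", 1)[0]
    return json.loads(text)
```

With this, `parse_json_response(response.text)["LOCATION"]` would return the extracted location list as a real Python object.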
Text Summarization
print("\n" + "="*80)
print("Text Summarization")
print("="*80)
article = """Artificial intelligence continues to transform industries worldwide. Recent
advances in large language models have enabled more natural conversations between humans
and machines. These models can understand context, generate coherent text, and even
perform complex reasoning tasks. However, challenges remain in ensuring factual accuracy,
reducing computational costs, and addressing ethical concerns around bias and privacy.
Researchers are actively working on making AI more efficient, transparent, and aligned
with human values. The field is evolving rapidly, with new breakthroughs announced weekly.
From healthcare to education, AI is reshaping how we work and live."""
prompts = [
    "Summarize in 1 sentence:",
    "Summarize in 3 bullet points:",
    "Create a tweet-length summary (280 chars):"
]
print(f"Original ({len(article)} chars):\n{article}\n")
for prompt_type in prompts:
    response = client.models.generate_content(
        model="gemini-2.0-flash-thinking-exp-1219",
        contents=f"{prompt_type}\n\n{article}"
    )
    print(f"{prompt_type}")
    print(f" {response.text.strip()}\n")
================================================================================
Text Summarization
================================================================================
Original (671 chars):
Artificial intelligence continues to transform industries worldwide. Recent
advances in large language models have enabled more natural conversations between humans
and machines. These models can understand context, generate coherent text, and even
perform complex reasoning tasks. However, challenges remain in ensuring factual accuracy,
reducing computational costs, and addressing ethical concerns around bias and privacy.
Researchers are actively working on making AI more efficient, transparent, and aligned
with human values. The field is evolving rapidly, with new breakthroughs announced weekly.
From healthcare to education, AI is reshaping how we work and live.
Summarize in 1 sentence:
Artificial intelligence, especially with advancements in large language models enabling natural interaction and complex reasoning, is rapidly transforming industries and daily life despite ongoing challenges in accuracy, cost, and ethical concerns that researchers are actively addressing.
Summarize in 3 bullet points:
Here's a 3-bullet point summary:
* Artificial intelligence (AI), particularly large language models, is transforming industries and enabling more natural human-machine interactions through advanced contextual understanding and reasoning.
* Recent AI breakthroughs are reshaping various sectors like healthcare and education, fundamentally changing how we work and live.
* Significant challenges remain, including ensuring factual accuracy, reducing computational costs, and addressing ethical concerns around bias and privacy, with ongoing research focused on making AI more efficient, transparent, and aligned with human values.
Create a tweet-length summary (280 chars):
AI is rapidly transforming industries globally, with advanced LLMs now enabling natural conversations & complex reasoning. While exciting, key challenges include ensuring accuracy, reducing costs, and addressing ethical concerns like bias & privacy. Researchers are striving for efficient, transparent, and human-aligned AI that reshapes our world.
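Notice that the "tweet-length" summary above actually runs past 280 characters: models treat length limits as suggestions. A hedged post-processing helper (illustrative, not from the SDK) enforces the limit at a word boundary:

```python
def fit_tweet(text: str, limit: int = 280) -> str:
    """Truncate text to the limit at a word boundary, appending an ellipsis if cut."""
    text = text.strip()
    if len(text) <= limit:
        return text
    # Cut one short of the limit to leave room for the ellipsis character
    cut = text[: limit - 1].rsplit(" ", 1)[0]
    return cut + "…"
```

Running the model's reply through `fit_tweet(response.text)` guarantees the character budget even when the prompt alone does not.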
print("\n" + "="*80)
print("Question Answering")
print("="*80)
context = """The Eiffel Tower is a wrought-iron lattice tower located on the Champ de Mars
in Paris, France. It was constructed from 1887 to 1889 as the centerpiece of the 1889
World's Fair. The tower is 330 meters (1,083 feet) tall, about the same height as an
81-story building. It was the tallest man-made structure in the world until the Chrysler
Building was completed in New York in 1930."""
questions = [
    "When was the Eiffel Tower built?",
    "How tall is the Eiffel Tower?",
    "Where is it located?",
    "What material is it made of?",
    "When did it stop being the tallest structure?"
]
print(f"Context: {context}\n")
for q in questions:
    prompt = f"Context: {context}\n\nQuestion: {q}\nAnswer (concise):"
    response = client.models.generate_content(
        model="gemini-2.0-flash-thinking-exp-1219",
        contents=prompt
    )
    print(f"Q: {q}")
    print(f"A: {response.text.strip()}\n")
================================================================================
Question Answering
================================================================================
Context: The Eiffel Tower is a wrought-iron lattice tower located on the Champ de Mars
in Paris, France. It was constructed from 1887 to 1889 as the centerpiece of the 1889
World's Fair. The tower is 330 meters (1,083 feet) tall, about the same height as an
81-story building. It was the tallest man-made structure in the world until the Chrysler
Building was completed in New York in 1930.
Q: When was the Eiffel Tower built?
A: 1887 to 1889
Q: How tall is the Eiffel Tower?
A: 330 meters (1,083 feet)
Q: Where is it located?
A: On the Champ de Mars in Paris, France.
Q: What material is it made of?
A: Wrought-iron
Q: When did it stop being the tallest structure?
A: 1930
print("\n" + "="*80)
print("Multi-Language Translation")
print("="*80)
text = "Artificial intelligence is changing the world."
languages = ["Spanish", "French", "German", "Japanese", "Hindi", "Arabic"]
print(f"Original (English): {text}\n")
for lang in languages:
    prompt = f"Translate to {lang}: {text}"
    response = client.models.generate_content(
        model="gemini-2.0-flash-thinking-exp-1219",
        contents=prompt
    )
    print(f"{lang}: {response.text.strip()}")
================================================================================
Multi-Language Translation
================================================================================
Original (English): Artificial intelligence is changing the world.
Spanish: **Inteligencia artificial está cambiando el mundo.**
French: **L'intelligence artificielle change le monde.**
German: **Künstliche Intelligenz ändert die Welt.**
Japanese: **人工知能は世界を変えています。**
* **人工知能 (Jinkō Chinō)**: Artificial intelligence
* **は (wa)**: Topic particle (like "is" or "as for")
* **世界 (sekai)**: World
* **を (o)**: Object particle
* **変えています (kaeteimasu)**: Is changing (from 変える "to change" in the -teiru form for continuous action, and -masu for politeness)
You could also use a slightly less polite form, which is common in general statements:
**人工知能は世界を変えている。**
Hindi: Here are a couple of ways to translate it, both common and correct:
**1. कृत्रिम बुद्धिमत्ता दुनिया को बदल रही है।**
(Kritrim Buddhimatta duniya ko badal rahi hai.)
*This is the most direct and common translation.*
**2. कृत्रिम बुद्धिमत्ता विश्व को बदल रही है।**
(Kritrim Buddhimatta vishva ko badal rahi hai.)
*This uses "विश्व" (vishva) for world, which is also correct and slightly more formal than "दुनिया" (duniya).*
Both are perfectly fine, with the first one being slightly more colloquial.
Arabic: **الذكاء الاصطناعي يغير العالم.**
(Adh-dhakā' al-iṣṭināʿī yughayyiru al-ʿālam.)
print("\n" + "="*80)
print("Text Completion")
print("="*80)
prompts = [
    "The secret to happiness is",
    "In the year 2050, technology will",
    "The most important skill for the future is"
]
for prompt in prompts:
    response = client.models.generate_content(
        model="gemini-2.0-flash-thinking-exp-1219",
        contents=f"Complete this sentence in 1-2 sentences: {prompt}"
    )
    print(f"Prompt: '{prompt}'")
    print(f"Completion: {response.text.strip()}\n")
================================================================================
Text Completion
================================================================================
Prompt: 'The secret to happiness is'
Completion: The secret to happiness is **not a destination, but a continuous journey of embracing the present moment and cultivating genuine connections.** It lies in finding joy in everyday experiences and nurturing relationships that bring meaning and support to your life.
Prompt: 'In the year 2050, technology will'
Completion: In the year 2050, technology will be seamlessly integrated into every facet of daily life, with advanced AI acting as a personal co-pilot, optimizing everything from urban infrastructure to individual health and learning pathways. This pervasive intelligence will anticipate our needs, enhance productivity, and enable sustainable living solutions across the globe.
Prompt: 'The most important skill for the future is'
Completion: Here are a few options, each focusing on a different crucial skill:
**Option 1 (Adaptability):**
The most important skill for the future is **adaptability**, as the pace of technological advancement and global change necessitates continuous learning and reinvention. The ability to quickly acquire new knowledge and unlearn outdated approaches will be paramount for navigating an ever-evolving landscape.
**Option 2 (Critical Thinking):**
The most important skill for the future is **critical thinking**, allowing individuals to analyze complex information, discern truth from misinformation, and solve novel problems. In an era of abundant data and AI-generated content, the capacity to evaluate, question, and innovate will be indispensable.
**Option 3 (Creativity):**
The most important skill for the future is **creativity**, as automation increasingly handles routine tasks, leaving uniquely human contributions like original thought and innovative problem-solving in high demand. The ability to generate novel ideas and solutions will drive progress in every field.
print("\n" + "="*80)
print("Structured Entity Extraction")
print("="*80)
resume = """JOHN DOE
john.doe@email.com | (555) 123-4567 | linkedin.com/in/johndoe
EXPERIENCE
Senior Software Engineer, TechCorp (2020-Present)
- Led team of 5 engineers in developing cloud infrastructure
- Expertise: Python, AWS, Docker, Kubernetes
EDUCATION
M.S. Computer Science, Stanford University (2018)
B.S. Computer Science, MIT (2016)"""
prompt = f"""Extract key information as JSON:
{{
"name": "",
"email": "",
"phone": "",
"current_role": "",
"company": "",
"skills": [],
"education": []
}}
Resume:
{resume}
Return only valid JSON:"""
response = client.models.generate_content(
    model="gemini-2.0-flash-thinking-exp-1219",
    contents=prompt)
print("Extracted data:")
print(response.text)
================================================================================
Structured Entity Extraction
================================================================================
Extracted data:
```json
{
"name": "JOHN DOE",
"email": "john.doe@email.com",
"phone": "(555) 123-4567",
"current_role": "Senior Software Engineer",
"company": "TechCorp",
"skills": [
"Python",
"AWS",
"Docker",
"Kubernetes"
],
"education": [
"M.S. Computer Science, Stanford University (2018)",
"B.S. Computer Science, MIT (2016)"
]
}
```
Keyword Extraction
print("\n" + "="*80)
print("Keyword Extraction")
print("="*80)
text = """Machine learning and deep learning are subsets of artificial intelligence
that focus on training algorithms to recognize patterns in data. Neural networks,
inspired by biological neurons, form the basis of deep learning systems. These
technologies power applications like computer vision, natural language processing,
and autonomous vehicles."""
prompt = f"""Extract the 5 most important keywords from this text.
Return as a comma-separated list.
Text: {text}
Keywords:"""
response = client.models.generate_content(
    model="gemini-2.0-flash-thinking-exp-1219",
    contents=prompt)
print(f"Text: {text}\n")
print(f"Keywords: {response.text.strip()}")
================================================================================
Keyword Extraction
================================================================================
Text: Machine learning and deep learning are subsets of artificial intelligence
that focus on training algorithms to recognize patterns in data. Neural networks,
inspired by biological neurons, form the basis of deep learning systems. These
technologies power applications like computer vision, natural language processing,
and autonomous vehicles.
Keywords: Artificial intelligence, Machine learning, Deep learning, Neural networks, Data
Text Rewriting
print("\n" + "="*80)
print("Text Rewriting for Different Audiences")
print("="*80)
original = """The algorithm leverages advanced neural architectures to optimize
multi-dimensional parameter spaces through stochastic gradient descent."""
audiences = [
    "Explain to a 10-year-old",
    "Make it poetic"
]
print(f"Original: {original}\n")
for audience in audiences:
    response = client.models.generate_content(
        model="gemini-2.0-flash-thinking-exp-1219",
        contents=f"{audience}:\n\n{original}"
    )
    print(f"{audience}:")
    print(f" {response.text.strip()}\n")
================================================================================
Text Rewriting for Different Audiences
================================================================================
Original: The algorithm leverages advanced neural architectures to optimize
multi-dimensional parameter spaces through stochastic gradient descent.
Explain to a 10-year-old:
Okay, imagine you have a super-smart robot, and you want it to learn how to do something really, really well – like drawing a perfect circle, or finding the absolute best way to win a tricky video game level.
Here's how that fancy sentence breaks down for our robot friend:
1. **"The algorithm leverages advanced neural architectures..."**
* **Algorithm:** This is just a fancy word for a step-by-step plan or a recipe. So, the robot has a special plan.
* **Leverages:** Means it *uses* or *makes the most of*.
* **Advanced neural architectures:** This is the cool part! Think of it like giving our robot a very special, super-flexible "brain" – not a real brain like yours, but one inspired by how your brain learns. It has lots of tiny connections, like tiny puzzle pieces, that can change and adapt as it learns. So, our robot uses this special, smart brain-like structure.
2. **"...to optimize multi-dimensional parameter spaces..."**
* **Optimize:** This means to find the *absolute best* way to do something, or the *perfect* settings.
* **Multi-dimensional parameter spaces:** Imagine our robot is trying to draw that perfect circle. There are so many things it can change, right? How much pressure to put on the pencil, how fast to move its arm, what angle to hold the pencil, how big the circle should be... Each of these is a "parameter" or a setting.
* "Multi-dimensional" means there are *hundreds* or even *thousands* of these settings it can change!
* "Spaces" means all the possible combinations of these settings. It's like a huge map where every spot on the map is a different way to draw the circle.
* So, our robot's goal is to find the *perfect spot* on that huge map of settings that makes the best circle.
3. **"...through stochastic gradient descent."**
* This is how the robot *finds* that perfect spot! Imagine you're blindfolded on a giant, bumpy hill, and you want to find the very bottom of the valley.
* **Descent:** You take a small step, and if you feel like you're going downhill (getting closer to the bottom, or in our robot's case, getting closer to a perfect circle), you take another step in that general direction. The robot tries something, sees if it got better, and tries to keep improving.
* **Gradient:** This is like feeling which way is "downhill." The robot checks how much better (or worse) its circle got after changing a setting.
* **Stochastic:** Sometimes, when you're blindfolded, you might stumble a little, or take a step that isn't *perfectly* downhill, but it's generally in the right direction. The robot doesn't know the whole map perfectly, so it takes slightly random, small steps, learns from each one, and slowly but surely moves towards the best settings.
**Putting it all together for our robot:**
"Our robot uses its special, smart, brain-like structure (advanced neural architecture) and a step-by-step plan (algorithm) to find the absolute best way (optimize) to adjust hundreds of its settings (multi-dimensional parameter spaces). It does this by taking many small, slightly random steps, always trying to get a little bit better, like someone blindfolded carefully feeling their way downhill to find the bottom of a valley (stochastic gradient descent)."
It's basically a really smart way for a computer to learn and get better at something complicated, bit by bit!
Make it poetic:
A silent spirit, born of code,
Through neural webs, profoundly wove,
Seeks to perfect, in realms untold,
Where countless futures softly flowed.
With steps both random, yet designed,
A slow descent, a path defined,
To sculpt the core, leave dross behind,
And find the perfect form enshrined.
print("\n" + "="*80)
print("Object Counting in Images")
print("="*80)
# Create a test image with multiple objects
fig, ax = plt.subplots(figsize=(10, 8))
ax.set_xlim(0, 10)
ax.set_ylim(0, 10)
ax.axis('off')
# Draw different shapes
circles = [(2, 2), (5, 5), (8, 3), (3, 7), (7, 8)]
squares_x = [1, 6, 9]
squares_y = [5, 2, 7]
for x, y in circles:
    circle = plt.Circle((x, y), 0.3, color='red', alpha=0.7)
    ax.add_patch(circle)
from matplotlib.patches import Rectangle
for x, y in zip(squares_x, squares_y):
    square = Rectangle((x-0.3, y-0.3), 0.6, 0.6, color='blue', alpha=0.7)
    ax.add_patch(square)
plt.title('Count the Objects', fontsize=16)
plt.savefig('/tmp/objects.png', dpi=150, bbox_inches='tight')
================================================================================
Object Counting in Images
================================================================================

plt.close()
image = Image.open('/tmp/objects.png')
prompt = """Count the objects in this image:
1. How many red circles?
2. How many blue squares?
3. Total number of objects?"""
response = client.models.generate_content(
    model="gemini-2.0-flash-thinking-exp-1219",
    contents=[prompt, image])
print(response.text)

Here are the counts for the objects in the image:
1. **Red circles:** 5
2. **Blue squares:** 3
3. **Total number of objects:** 8
Visual Question Answering (VQA)
print("\n" + "="*80)
print("Visual Question Answering")
print("="*80)
# Use sample images from URLs
try:
    image_url = "https://images.unsplash.com/photo-1506905925346-21bda4d32df4?w=800"
    image = load_image_from_url(image_url)
    # Show image
    plt.imshow(image)
    questions = [
        "What is the dominant color in this image?",
        "Describe the scenery",
        "What time of day does it appear to be?",
        "What mood does this image convey?"
    ]
    for q in questions:
        response = client.models.generate_content(
            model="gemini-2.0-flash-thinking-exp-1219",
            contents=[q, image]
        )
        print(f"Q: {q}")
        print(f"A: {response.text.strip()}\n")
except Exception as e:
    print(f"Note: Image loading requires internet. Error: {str(e)[:100]}")
================================================================================
Visual Question Answering
================================================================================
Q: What is the dominant color in this image?
A: The dominant color in this image is a **cool blue-grey**.
While there are beautiful warm orange and pink hues from the sunrise/sunset on the right and on some mountain peaks, the vast majority of the image is covered by the pale blue-grey sea of clouds in the valley and the cooler blue-purple tones of the sky, especially on the left side. The dark mountains also have a significant blue-grey cast.
Q: Describe the scenery
A: This image captures a breathtaking alpine landscape at either dawn or dusk, characterized by a stunning inversion layer.
In the foreground, the immediate terrain consists of dark, rugged, rocky ground, suggesting a high vantage point overlooking a vast valley. This dark, textured ground contrasts sharply with the brightness beyond.
Dominating the midground is an expansive, undulating sea of low-lying clouds. These clouds fill the valley below, resembling a soft, fluffy blanket of cotton or a tranquil, misty ocean. Their surface reflects the ambient light, appearing a luminous white and soft grey, with deeper shadows hinting at their depth and movement.
Above this cloud sea, a magnificent range of snow-capped mountains rises majestically. The prominent peaks, especially the central one, are rugged and jagged, still holding substantial snow and ice. These elevated summits are bathed in the warm, rosy glow of the rising or setting sun, with their snow and rock faces illuminated in hues of gold, pink, and orange. The shadowed sides of the mountains retain cooler tones of deep grey and blue, providing a dramatic contrast.
The sky above transitions gracefully from a clear, pale blue at its zenith to a vibrant canvas of soft purples, pinks, and oranges closer to the horizon, indicative of the golden hour. A few wispy clouds are scattered across the colorful horizon, catching the last (or first) rays of sunlight.
The overall impression is one of immense scale, serene beauty, and dramatic light. The scene evokes a sense of solitude and awe, placing the viewer high above the mundane world, looking out over a landscape transformed by light and cloud.
Q: What time of day does it appear to be?
A: It appears to be **sunrise**.
Here's why:
* **Warm, low-angle light:** The mountain peaks are bathed in a beautiful, warm, golden light, indicating that the sun is low on the horizon.
* **Sky colors:** The sky transitions from a soft blue/purple overhead to vibrant oranges, pinks, and yellows near the horizon, a classic characteristic of either sunrise or sunset.
* **Cloud inversion:** The valley is filled with a sea of clouds (a cloud inversion), which frequently forms overnight in cool valleys and begins to burn off or dissipate as the sun rises and warms the air.
* **Freshness of light:** There's a particular crispness to the light that often accompanies the early morning, as the world is just waking up and the sun's rays are just beginning to illuminate the highest points.
Q: What mood does this image convey?
A: This image conveys a mood of **profound serenity and majestic grandeur**.
The scene, likely captured at sunrise or sunset, features **snow-capped mountains rising above a vast, tranquil 'sea' of clouds** that fills the valley below. The soft, warm hues of the sky and the light catching the peaks create an **ethereal and dreamlike** atmosphere.
Overall, it evokes feelings of:
* **Awe and wonder** at the beauty and scale of nature.
* **Peace and tranquility** due to the still clouds and soft lighting.
* **Inspiration and quiet contemplation**, suggesting a place of solitude and reflection far above the world.

print("\n" + "="*80)
print("Chart Analysis")
print("="*80)
# Create a complex chart
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6))
# Bar chart
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']
sales = [45000, 52000, 48000, 61000, 58000, 72000]
ax1.bar(months, sales, color='steelblue')
ax1.set_title('Monthly Sales 2024', fontsize=14, fontweight='bold')
ax1.set_ylabel('Sales ($)')
ax1.grid(axis='y', alpha=0.3)
# Line chart
days = list(range(1, 31))
visitors = [100 + 50*np.sin(x/5) + np.random.randint(-10, 10) for x in days]
ax2.plot(days, visitors, marker='o', linewidth=2, markersize=4)
ax2.set_title('Daily Website Visitors', fontsize=14, fontweight='bold')
ax2.set_xlabel('Day of Month')
ax2.set_ylabel('Visitors')
ax2.grid(alpha=0.3)
plt.tight_layout()
plt.savefig('/tmp/charts.png', dpi=150)
================================================================================
Chart Analysis
================================================================================

plt.close()
chart_image = Image.open('/tmp/charts.png')
prompt = """Analyze these charts:
1. What trends do you see in the sales data?
2. Which month had the highest sales?
3. What pattern is visible in the website visitors chart?
4. Any notable insights?"""
response = client.models.generate_content(
model="gemini-2.0-flash-thinking-exp-1219",
contents=[prompt, chart_image])
print(response.text)
Here's an analysis of the provided charts:
### 1. What trends do you see in the sales data?
The monthly sales data for 2024 shows an overall upward trend from January to June, with some fluctuations:
* Sales started around $45,000 in January.
* They increased to approximately $52,000 in February.
* There was a slight dip in March to about $48,000.
* Sales then surged significantly in April, reaching around $61,000.
* A minor decrease occurred in May, bringing sales to about $58,000.
* Finally, June saw the highest sales figure, climbing to approximately $72,000.
In summary, the trend is generally positive, indicating growth over the first six months of 2024, with June being the strongest month by a considerable margin.
### 2. Which month had the highest sales?
**June** had the highest sales, reaching approximately $72,000.
### 3. What pattern is visible in the website visitors chart?
The daily website visitors chart shows a distinct "peak and trough" pattern within the month:
* Visitors start strong at the beginning of the month (around 110-115 on Day 1).
* They rapidly increase and peak around Day 8-9, reaching approximately 150 visitors.
* Following this peak, there's a relatively sharp and sustained decline through the middle of the month.
* The number of visitors hits its lowest point around Day 23-24, dropping significantly to about 40-42 visitors.
* Towards the end of the month, there's a moderate recovery, with visitors climbing back to around 80 by Day 30.
This pattern suggests a strong initial engagement in the first week, followed by a substantial drop-off mid-month, and then a partial recovery.
### 4. Any notable insights?
* **Positive Sales Momentum:** Despite some monthly fluctuations, the overall sales performance is robust, with a clear growth trajectory in the first half of 2024, culminating in an impressive June. This suggests effective sales strategies or increasing market demand.
* **Volatile Website Traffic:** The daily website visitor data reveals significant volatility. The dramatic drop in visitors from the peak (Day 8-9) to the trough (Day 23-24) is substantial (over a 70% decrease). This might indicate cyclical behavior (e.g., strong weekday traffic, weekend dips, or specific campaign timing), or it could point to issues like expiring marketing campaigns, technical problems, or a lack of fresh content mid-month.
* **Disconnection/Other Channels:** While website visitors fluctuate wildly within a month, the overall monthly sales are consistently growing. This could imply a few things:
* The website visitors shown might not be the primary driver of sales, or conversion rates are extremely high during peak visitor periods.
* Sales are heavily supported by other channels (offline, direct, recurring customers) not reflected in the daily website visitor chart.
* The "Daily Website Visitors" chart might represent a single, specific month where traffic was unusual, and not necessarily reflective of the visitor patterns for the months shown in the sales chart.
* **Areas for Investigation:**
* **Sales:** Understanding the reasons for the dips in March and May could help stabilize and further boost sales. Identifying what drove the significant growth in April and June is crucial for replication.
* **Website Visitors:** The sharp decline in visitors mid-month needs investigation. Is this a recurring pattern? What external factors or internal strategies could be causing this? Can the strategies that drive the early-month peak be sustained or replicated to mitigate the mid-month slump? Increasing and stabilizing website traffic could potentially lead to even higher sales.
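The model's reading of the sales chart can be cross-checked directly against the underlying data. A small self-contained sketch, repeating the `months`/`sales` lists from the plotting cell above:

```python
import numpy as np

# Same data used to draw the bar chart above
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']
sales = [45000, 52000, 48000, 61000, 58000, 72000]

# Month-over-month change in dollars
deltas = np.diff(sales)
best_month = months[int(np.argmax(sales))]

print(f"Best month: {best_month} (${max(sales):,})")
for m, d in zip(months[1:], deltas):
    print(f"{m}: {d:+,}")
```

The output confirms the model's claims: June is the peak, with dips in March and May.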
Document OCR and Understanding
print("\n" + "="*80)
print("Document OCR + Understanding")
print("="*80)
# Create a sample receipt
img = Image.new('RGB', (600, 800), color='white')
draw = ImageDraw.Draw(img)
receipt_lines = [
"ACME STORE",
"123 Main Street",
"Phone: (555) 123-4567",
"",
"Date: 2024-12-01",
"Receipt #: 45678",
"-" * 40,
"Coffee Beans (2kg) $24.99",
"Milk (1L) $3.49",
"Bread $2.99",
"Fresh Vegetables $12.50",
"-" * 40,
"Subtotal: $43.97",
"Tax (8%): $3.52",
"TOTAL: $47.49",
"",
"Payment: VISA ****1234",
"Thank you for shopping!"
]
y = 50
for line in receipt_lines:
draw.text((50, y), line, fill='black')
y += 35
img.save('/tmp/receipt.png')
receipt_img = Image.open('/tmp/receipt.png')
plt.imshow(receipt_img)
plt.axis('off')
================================================================================
Document OCR + Understanding
================================================================================

prompt = """Extract information from this receipt:
1. Store name and address
2. Date and receipt number
3. List of items purchased with prices
4. Total amount
5. Payment method
Format as structured JSON."""
response = client.models.generate_content(
model="gemini-2.0-flash-thinking-exp-1219",
contents=[prompt, receipt_img])
print(response.text)
```json
{
"store_info": {
"name": "ACME STORE",
"address": "123 Main Street"
},
"transaction_info": {
"date": "2024-12-01",
"receipt_number": "45678"
},
"items": [
{
"name": "Coffee Beans (2kg)",
"price": 24.99
},
{
"name": "Milk (1L)",
"price": 3.49
},
{
"name": "Bread",
"price": 2.99
},
{
"name": "Fresh Vegetables",
"price": 12.50
}
],
"summary": {
"subtotal": 43.97,
"tax_percentage": "8%",
"tax_amount": 3.52,
"total": 47.49
},
"payment_info": {
"method": "VISA",
"last_four_digits": "1234"
}
}
```
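Structured output like this is easy to validate programmatically, which guards against OCR misreads. A sketch that parses the extracted JSON and verifies the arithmetic; it embeds the response above as a literal string, whereas in practice you would strip the Markdown fences from `response.text` first:

```python
import json

# The structured output shown above, with the Markdown fences removed
# and trimmed to the fields needed for the check
raw = """{
  "items": [
    {"name": "Coffee Beans (2kg)", "price": 24.99},
    {"name": "Milk (1L)", "price": 3.49},
    {"name": "Bread", "price": 2.99},
    {"name": "Fresh Vegetables", "price": 12.50}
  ],
  "summary": {"subtotal": 43.97, "tax_amount": 3.52, "total": 47.49}
}"""

data = json.loads(raw)
subtotal = round(sum(item['price'] for item in data['items']), 2)
total = round(subtotal + data['summary']['tax_amount'], 2)

assert subtotal == data['summary']['subtotal'], "item prices don't sum to subtotal"
assert total == data['summary']['total'], "subtotal + tax doesn't match total"
print("Receipt arithmetic checks out")
```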
Image Captioning
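The `load_image_from_url` helper used in the cell below is assumed to have been defined earlier in the notebook; a minimal sketch of what it might look like:

```python
import requests
from io import BytesIO
from PIL import Image

def load_image_from_url(url: str, timeout: int = 10) -> Image.Image:
    """Fetch an image over HTTP and return it as a PIL Image."""
    resp = requests.get(url, timeout=timeout)
    resp.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return Image.open(BytesIO(resp.content))
```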
print("\n" + "="*80)
print("Image Captioning (Multiple Styles)")
print("="*80)
try:
image_url = "https://images.unsplash.com/photo-1506905925346-21bda4d32df4?w=800"
image = load_image_from_url(image_url)
# Show image
plt.imshow(image)
plt.axis('off')
caption_styles = [
"Write a short caption (1 sentence)",
"Write an Instagram caption with hashtags",
]
for style in caption_styles:
response = client.models.generate_content(
model="gemini-2.0-flash-thinking-exp-1219",
contents=[style, image]
)
print(f"{style}:")
print(f" {response.text.strip()}\n")
except Exception as e:
print(f"Could not load image from URL: {str(e)[:100]}")
================================================================================
Image Captioning (Multiple Styles)
================================================================================
Write a short caption (1 sentence):
Majestic mountains rise above a vast sea of clouds, bathed in the golden light of sunrise.
Write an Instagram caption with hashtags:
Here are a few options for an Instagram caption, ranging in tone:
**Option 1 (Evocative & Dreamy):**
Floating above the world on a sea of clouds. ✨ There's nothing quite like watching snow-capped peaks pierce through a golden hour glow, reminding you of the breathtaking magic our planet holds. Absolutely spellbinding.
#MountainViews #CloudInversion #GoldenHour #SunriseOrSunset #AlpineAdventures #NaturePhotography #BreathtakingViews #Wanderlust #AboveTheClouds #DreamyLandscape #EarthFocus #TravelGram
**Option 2 (Short & Sweet):**
When the sky meets the mountains, and the clouds fill the valley. Pure magic at golden hour! 🏔️☁️🧡
#Mountains #SeaOfClouds #SunsetVibes #SunriseMagic #NatureLover #EpicViews #HighAltitude #ExploreMore #LandscapePhotography
**Option 3 (Reflective):**
This view truly puts things into perspective. A blanket of clouds below, majestic peaks reaching for the pastel sky, and the quiet serenity of dawn or dusk. Grateful for moments like these that remind us to look up.
#Mountainscape #Cloudscape #PeacefulMoments #NatureHeals #Perspective #TravelInspiration #AdventureTime #PhotographyLovers #BeautifulDestinations #InstaNature
**Choose the one that best fits your personal style!**

print("\n" + "="*80)
print("Multi-Image Comparison")
print("="*80)
# Create two different chart images
fig, axes = plt.subplots(1, 2, figsize=(12, 5))
# Image 1: Pie chart
sizes = [30, 25, 20, 15, 10]
labels = ['A', 'B', 'C', 'D', 'E']
axes[0].pie(sizes, labels=labels, autopct='%1.1f%%')
axes[0].set_title('Product Distribution - Q1')
# Image 2: Pie chart with different values
sizes2 = [35, 20, 25, 10, 10]
axes[1].pie(sizes2, labels=labels, autopct='%1.1f%%')
axes[1].set_title('Product Distribution - Q2')
plt.tight_layout()
plt.savefig('/tmp/comparison.png', dpi=150)
================================================================================
Multi-Image Comparison
================================================================================

plt.close()
comp_image = Image.open('/tmp/comparison.png')
prompt = """Compare these two pie charts:
1. What are the main differences?
2. Which products increased/decreased?
3. What insights can you derive?"""
response = client.models.generate_content(
model="gemini-2.0-flash-thinking-exp-1219",
contents=[prompt, comp_image])
print(response.text)
Here's a comparison of the two pie charts:
**1. What are the main differences?**
The main differences lie in the shifts of percentage distribution among products A, B, C, and D from Q1 to Q2, while product E remained stable.
* **Product A** significantly increased its share, solidifying its position as the leading product.
* **Product B** decreased its share, dropping from the second-largest product to the third.
* **Product C** increased its share, moving up to become the second-largest product.
* **Product D** saw a decrease in its share.
* **Product E** maintained the exact same share in both quarters.
**2. Which products increased/decreased?**
* **Increased:**
* **Product A:** Increased from 30.0% to 35.0% (an increase of 5.0 percentage points).
* **Product C:** Increased from 20.0% to 25.0% (an increase of 5.0 percentage points).
* **Decreased:**
* **Product B:** Decreased from 25.0% to 20.0% (a decrease of 5.0 percentage points).
* **Product D:** Decreased from 15.0% to 10.0% (a decrease of 5.0 percentage points).
* **Remained Stable:**
* **Product E:** Stayed at 10.0% in both quarters.
**3. What insights can you derive?**
1. **Shifting Market Leadership and Competition:** Product A is strengthening its dominant position, while Product C is emerging as a stronger contender, effectively taking market share from Product B. The increases for A and C are directly offset by the decreases for B and D, suggesting a reallocation of customer preference or strategic focus within the existing market.
2. **Product A's Strong Performance:** Product A is performing exceptionally well, gaining the most market share. This indicates either successful marketing, increased demand, superior product features, or issues with competing products. It is now the clear market leader.
3. **Product C's Growth:** Product C has shown significant growth, moving into the second position. This product might be gaining popularity, benefiting from new initiatives, or capturing customers from declining competitors like Product B and D.
4. **Challenges for Products B and D:** Products B and D are losing ground. This warrants investigation into their performance, customer satisfaction, competitive threats, or potential internal issues (e.g., supply chain, marketing, quality). Product D's share has dropped to match Product E, putting it at the bottom alongside E.
5. **Product E's Stability:** Product E maintains a consistent 10% share. This could indicate a stable niche market, consistent baseline demand, or perhaps a product that is not heavily impacted by the market dynamics affecting A, B, C, and D. It's a reliable, though not growing, contributor.
6. **Zero-Sum Game (in terms of distribution):** Since these are percentage distributions, the gains of 10% (5% for A + 5% for C) are exactly balanced by the losses of 10% (5% for B + 5% for D). This implies that customers are shifting between the existing products rather than a significant change in the overall product mix or total volume (though we cannot infer total volume from these charts alone).
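The percentage-point shifts the model reports can be reproduced directly from the chart data. A self-contained sketch using the `labels`/`sizes`/`sizes2` lists from the plotting cell above:

```python
labels = ['A', 'B', 'C', 'D', 'E']
sizes = [30, 25, 20, 15, 10]   # Q1 shares (%)
sizes2 = [35, 20, 25, 10, 10]  # Q2 shares (%)

# Percentage-point change per product, Q1 -> Q2
deltas = {l: q2 - q1 for l, q1, q2 in zip(labels, sizes, sizes2)}
print(deltas)

# Since these are shares of a whole, gains and losses must cancel
assert sum(deltas.values()) == 0
```

This matches the model's reading exactly: A and C each gain 5 points, B and D each lose 5, and E is unchanged.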
Visual Reasoning
print("\n" + "="*80)
print("Visual Pattern Reasoning")
print("="*80)
# Create a visual pattern puzzle
fig, axes = plt.subplots(1, 4, figsize=(12, 3))
patterns = [
{'shape': 'circle', 'color': 'red', 'size': 0.3},
{'shape': 'square', 'color': 'blue', 'size': 0.4},
{'shape': 'circle', 'color': 'red', 'size': 0.5},
None # To be predicted
]
for i, (ax, pattern) in enumerate(zip(axes, patterns)):
ax.set_xlim(0, 1)
ax.set_ylim(0, 1)
ax.axis('off')
if pattern:
if pattern['shape'] == 'circle':
circle = plt.Circle((0.5, 0.5), pattern['size'], color=pattern['color'])
ax.add_patch(circle)
else:
square = plt.Rectangle(
(0.5-pattern['size'], 0.5-pattern['size']),
2*pattern['size'], 2*pattern['size'],
color=pattern['color']
)
ax.add_patch(square)
else:
ax.text(0.5, 0.5, '?', fontsize=60, ha='center', va='center')
ax.set_title(f'Position {i+1}')
plt.tight_layout()
plt.savefig('/tmp/pattern.png', dpi=150)
================================================================================
Visual Pattern Reasoning
================================================================================
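For reference, the rule behind the puzzle can be stated in code: shape and color alternate while size grows by 0.1 each step, so position 4 should be a blue square of size 0.6. A hypothetical `predict_next` helper (not part of the notebook's API) encoding that rule:

```python
def predict_next(patterns):
    """Predict the next item: shape and color alternate, size grows by 0.1."""
    last = patterns[-1]
    return {
        'shape': 'square' if last['shape'] == 'circle' else 'circle',
        'color': 'blue' if last['color'] == 'red' else 'red',
        'size': round(last['size'] + 0.1, 1),
    }

seen = [
    {'shape': 'circle', 'color': 'red', 'size': 0.3},
    {'shape': 'square', 'color': 'blue', 'size': 0.4},
    {'shape': 'circle', 'color': 'red', 'size': 0.5},
]
print(predict_next(seen))  # blue square, size 0.6
```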

Video Understanding
video_path = '14801276_2160_3840_30fps.mp4'
from IPython.display import Video
Video(video_path, embed=True)