Nipun Batra
March 17, 2025
independent, identically distributed, iid, random variables, joint probability, normal distribution
import matplotlib.pyplot as plt
import numpy as np
print(np.__version__)
import torch
import torch.nn as nn
import pandas as pd
# Retina mode
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
2.2.4
Independent and Identically Distributed (i.i.d) random variables are fundamental building blocks in probability theory and statistics. This concept forms the theoretical foundation for many statistical methods, from simple sampling to complex machine learning algorithms. When we say random variables are i.i.d, we mean two crucial things: they are independent (the outcome of one doesn’t affect another) and identically distributed (they all follow the same probability distribution).
Understanding i.i.d random variables is essential for:

- Statistical inference and hypothesis testing
- The Law of Large Numbers and Central Limit Theorem
- Monte Carlo simulations
- Machine learning model assumptions
- Data sampling and experimental design
By the end of this notebook, you will be able to:

- Define independence and identical distribution, and state the i.i.d factorization of a joint density
- Compute individual and joint probability densities for i.i.d variables
- Simulate and visualize i.i.d samples, and contrast them with non-i.i.d cases
- Connect i.i.d sampling to the Law of Large Numbers
Two random variables \(X_1\) and \(X_2\) are independent if:
\[P(X_1 = x_1, X_2 = x_2) = P(X_1 = x_1) \cdot P(X_2 = x_2)\]
For continuous random variables, this becomes:
\[f_{X_1,X_2}(x_1, x_2) = f_{X_1}(x_1) \cdot f_{X_2}(x_2)\]
where \(f_{X_1,X_2}(x_1, x_2)\) is the joint probability density function.
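To make the definition concrete, here is a minimal sketch (the sample size, seed, and use of torch.distributions.Bernoulli are illustrative choices) that estimates the joint probability \(P(X_1 = 1, X_2 = 1)\) for two independently simulated fair coins and compares it with the product of the marginal probabilities:

# Empirical check of the independence definition with two simulated fair coins
torch.manual_seed(0)  # for reproducibility
n = 100_000
coin = torch.distributions.Bernoulli(probs=0.5)
X1 = coin.sample((n,))  # first coin
X2 = coin.sample((n,))  # second coin, drawn separately, hence independent
# Empirical joint probability P(X1 = 1, X2 = 1)
joint = ((X1 == 1) & (X2 == 1)).float().mean()
# Product of empirical marginals P(X1 = 1) * P(X2 = 1)
product = (X1 == 1).float().mean() * (X2 == 1).float().mean()
print(f"Joint probability:    {joint:.4f}")
print(f"Product of marginals: {product:.4f}")

Both estimates should be close to \(0.5 \times 0.5 = 0.25\), as the definition requires.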
Random variables are identically distributed if they have the same probability distribution. This means:

- Same probability density function (PDF) or probability mass function (PMF)
- Same parameters (mean, variance, etc.)
- Same support (the set of possible values)
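As a quick sketch of what this means in practice (sample sizes and the particular distributions are chosen only for illustration), samples drawn from the same distribution show matching empirical summaries, while a sample from a distribution with different parameters does not:

# Two samples from the same N(0, 1) distribution vs. one sample from N(2, 0.5)
torch.manual_seed(0)
A = torch.distributions.Normal(0, 1).sample((10_000,))
B = torch.distributions.Normal(0, 1).sample((10_000,))
C = torch.distributions.Normal(2, 0.5).sample((10_000,))  # different parameters
for name, s in [("A ~ N(0,1)", A), ("B ~ N(0,1)", B), ("C ~ N(2,0.5)", C)]:
    print(f"{name}: mean={s.mean():.3f}, std={s.std():.3f}, median={s.median():.3f}")

A and B are identically distributed; C is not.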
When random variables \(X_1, X_2, \ldots, X_n\) are i.i.d:

- Independence: the joint density factorizes, \(f_{X_1,\ldots,X_n}(x_1,\ldots,x_n) = \prod_{i=1}^n f_{X_i}(x_i)\)
- Identical distribution: every marginal density is the same function, \(f_{X_i} = f\) for all \(i\)

Combined: \(f_{X_1,\ldots,X_n}(x_1,\ldots,x_n) = \prod_{i=1}^n f(x_i)\)
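In log form the product becomes a sum, which is how joint densities of i.i.d samples are usually computed in practice. A minimal sketch (the standard normal choice and variable names are assumptions for illustration) using torch.distributions.Independent:

# The joint log-density of an i.i.d sample equals the sum of individual log-densities
torch.manual_seed(0)
n = 5
x = torch.distributions.Normal(0, 1).sample((n,))  # one i.i.d sample of size n
# Sum of per-observation log-densities: log prod f(x_i) = sum log f(x_i)
log_joint_sum = torch.distributions.Normal(0, 1).log_prob(x).sum()
# Same quantity computed as the log-density of the n-dimensional i.i.d vector
iid_vector = torch.distributions.Independent(
    torch.distributions.Normal(torch.zeros(n), torch.ones(n)), 1)
log_joint_direct = iid_vector.log_prob(x)
print(log_joint_sum.item(), log_joint_direct.item())  # equal up to floating-point error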
Let’s explore these concepts through computational examples.
We’ll create two independent normal random variables, both following \(N(0,1)\) (standard normal distribution). Since they have the same distribution parameters and are independent, they are i.i.d.
For independent random variables, we can compute their individual probability densities separately:
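A minimal sketch of that computation (the evaluation points \(x_1 = 0.2\) and \(x_2 = 0.4\) are the same ones interpreted in the results at the end of this notebook; the variable names are illustrative):

# Two i.i.d standard normal random variables, represented by their distributions
X1 = torch.distributions.Normal(0, 1)
X2 = torch.distributions.Normal(0, 1)
x1, x2 = 0.2, 0.4  # points at which to evaluate the densities
# Individual densities f_X1(x1) and f_X2(x2), via exp(log_prob)
f_x1 = torch.exp(X1.log_prob(torch.tensor(x1)))
f_x2 = torch.exp(X2.log_prob(torch.tensor(x2)))
print(f"f_X1({x1}) = {f_x1:.3f}")  # approximately 0.391
print(f"f_X2({x2}) = {f_x2:.3f}")  # approximately 0.368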
For i.i.d random variables, the joint probability density is the product of individual densities:
\[f_{X_1,X_2}(x_1, x_2) = f_{X_1}(x_1) \cdot f_{X_2}(x_2)\]
Let’s verify this with our example:
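Continuing the sketch from above, the joint density at \((x_1, x_2)\) is just the product of the two individual densities:

# Joint density of two independent standard normals at (x1, x2) = (0.2, 0.4)
X1 = torch.distributions.Normal(0, 1)
X2 = torch.distributions.Normal(0, 1)
x1, x2 = 0.2, 0.4
f_x1 = torch.exp(X1.log_prob(torch.tensor(x1)))
f_x2 = torch.exp(X2.log_prob(torch.tensor(x2)))
f_joint = f_x1 * f_x2  # independence: joint = product of marginals
print(f"f_X1(x1) * f_X2(x2) = {f_x1:.3f} * {f_x2:.3f} = {f_joint:.3f}")  # ~0.144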
Definition: i.i.d random variables are both independent (outcomes don’t affect each other) and identically distributed (same probability distribution)
Mathematical Property: For i.i.d variables \(X_1, \ldots, X_n\): \[f_{X_1,\ldots,X_n}(x_1,\ldots,x_n) = \prod_{i=1}^n f(x_i)\]
Visual Indicators: scatter plots of i.i.d pairs show no systematic pattern (sample correlation ≈ 0), and their marginal histograms overlap almost exactly.

Practical Importance: the i.i.d assumption underlies the Law of Large Numbers, the Central Limit Theorem, Monte Carlo simulation, and the sampling assumptions behind many machine learning models.
Understanding i.i.d random variables provides the foundation for advanced topics in probability, statistics, and machine learning. This concept bridges theoretical probability with practical data analysis applications.
# Example 4: Simulating Coin Flips (Classic i.i.d Example)
torch.manual_seed(42) # For reproducibility
# Simulate 1000 coin flips (Bernoulli random variables)
n_flips = 1000
p_heads = 0.5 # Fair coin
# Each flip is an i.i.d Bernoulli(0.5) random variable
flips = torch.distributions.Bernoulli(p_heads).sample((n_flips,))
# Plot results
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
# 1. Sequence of flips (first 100)
axes[0].plot(range(100), flips[:100].numpy(), 'o-', markersize=3, alpha=0.7)
axes[0].set_title('First 100 Coin Flips\n(0=Tails, 1=Heads)')
axes[0].set_xlabel('Flip Number')
axes[0].set_ylabel('Outcome')
axes[0].set_ylim(-0.1, 1.1)
axes[0].grid(True, alpha=0.3)
# 2. Running proportion of heads
cumulative_heads = torch.cumsum(flips, dim=0)
proportion_heads = cumulative_heads / torch.arange(1, n_flips + 1)
axes[1].plot(range(1, n_flips + 1), proportion_heads.numpy(), 'b-', alpha=0.7)
axes[1].axhline(y=0.5, color='red', linestyle='--', label='True probability (0.5)')
axes[1].set_title('Running Proportion of Heads\n(Converges to true probability)')
axes[1].set_xlabel('Number of Flips')
axes[1].set_ylabel('Proportion of Heads')
axes[1].legend()
axes[1].grid(True, alpha=0.3)
# 3. Histogram of outcomes
axes[2].hist(flips.numpy(), bins=[-0.25, 0.25, 0.75, 1.25], alpha=0.7,
density=True, rwidth=0.8)
axes[2].set_title(f'Distribution of Outcomes\n({int(flips.sum())} heads, {n_flips - int(flips.sum())} tails)')
axes[2].set_xlabel('Outcome')
axes[2].set_ylabel('Probability')
axes[2].set_xticks([0, 1])
axes[2].set_xticklabels(['Tails', 'Heads'])
axes[2].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
print(f"Final proportion of heads: {proportion_heads[-1]:.4f}")
print(f"Expected proportion: {p_heads}")
print(f"Difference from expected: {abs(proportion_heads[-1] - p_heads):.4f}")
print("\nThis demonstrates the Law of Large Numbers:")
print("As n increases, the sample proportion converges to the true probability.")
# Case 1: i.i.d variables (both N(0,1))
X1_iid = torch.distributions.Normal(0, 1).sample((1000,))
X2_iid = torch.distributions.Normal(0, 1).sample((1000,))
# Case 2: Independent but NOT identically distributed
X1_ind = torch.distributions.Normal(0, 1).sample((1000,)) # N(0,1)
X2_ind = torch.distributions.Normal(2, 0.5).sample((1000,)) # N(2,0.5)
# Case 3: Identically distributed but NOT independent (correlated)
# Using multivariate normal with correlation
mean = torch.tensor([0.0, 0.0])
cov = torch.tensor([[1.0, 0.7], [0.7, 1.0]]) # correlation = 0.7
correlated_samples = torch.distributions.MultivariateNormal(mean, cov).sample((1000,))
X1_cor = correlated_samples[:, 0]
X2_cor = correlated_samples[:, 1]
# Create comparison plot
fig, axes = plt.subplots(2, 3, figsize=(15, 10))
# Row 1: Scatter plots
titles = ['i.i.d Variables', 'Independent, Not Identical', 'Identical, Not Independent']
X_pairs = [(X1_iid, X2_iid), (X1_ind, X2_ind), (X1_cor, X2_cor)]
for i, (X1, X2) in enumerate(X_pairs):
    axes[0, i].scatter(X1.numpy(), X2.numpy(), alpha=0.5, s=10)
    axes[0, i].set_title(titles[i])
    axes[0, i].set_xlabel('X1')
    axes[0, i].set_ylabel('X2')
    axes[0, i].grid(True, alpha=0.3)
    # Add correlation info
    corr = torch.corrcoef(torch.stack([X1, X2]))[0, 1]
    axes[0, i].text(0.05, 0.95, f'Corr: {corr:.3f}', transform=axes[0, i].transAxes,
                    bbox=dict(boxstyle="round,pad=0.3", facecolor="yellow"))
# Row 2: Histograms
for i, (X1, X2) in enumerate(X_pairs):
    axes[1, i].hist(X1.numpy(), bins=30, alpha=0.6, label='X1', density=True)
    axes[1, i].hist(X2.numpy(), bins=30, alpha=0.6, label='X2', density=True)
    axes[1, i].set_title('Marginal Distributions')
    axes[1, i].set_xlabel('Value')
    axes[1, i].set_ylabel('Density')
    axes[1, i].legend()
    axes[1, i].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# Summary statistics
print("COMPARISON SUMMARY:")
print("="*50)
for name, (X1, X2) in zip(titles, X_pairs):
    corr = torch.corrcoef(torch.stack([X1, X2]))[0, 1]
    print(f"\n{name}:")
    print(f"  X1: mean={X1.mean():.3f}, std={X1.std():.3f}")
    print(f"  X2: mean={X2.mean():.3f}, std={X2.std():.3f}")
    print(f"  Correlation: {corr:.3f}")
    # Check properties (thresholds are rough tolerances for sampling noise)
    same_mean = (abs(X1.mean() - X2.mean()) < 0.2).item()
    same_std = (abs(X1.std() - X2.std()) < 0.2).item()
    independent = (abs(corr) < 0.1).item()
    print(f"  ✓ Identically distributed: {same_mean and same_std}")
    print(f"  ✓ Independent: {independent}")
    print(f"  ✓ i.i.d: {same_mean and same_std and independent}")
Having contrasted i.i.d variables with non-i.i.d ones above, let’s now generate samples from two i.i.d random variables and visualize their properties.
# Generate samples from i.i.d normal random variables
n_samples = 1000
# Two i.i.d normal random variables
X1_samples = torch.distributions.Normal(0, 1).sample((n_samples,))
X2_samples = torch.distributions.Normal(0, 1).sample((n_samples,))
# Plot the samples
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
# Individual distributions
axes[0].hist(X1_samples.numpy(), bins=30, alpha=0.7, label='X1', color='blue', density=True)
axes[0].hist(X2_samples.numpy(), bins=30, alpha=0.7, label='X2', color='red', density=True)
axes[0].set_title('Individual Distributions\n(Should be identical)')
axes[0].set_xlabel('Value')
axes[0].set_ylabel('Density')
axes[0].legend()
axes[0].grid(True, alpha=0.3)
# Joint distribution (scatter plot)
axes[1].scatter(X1_samples.numpy(), X2_samples.numpy(), alpha=0.5, s=10)
axes[1].set_title('Joint Distribution\n(Should show no correlation)')
axes[1].set_xlabel('X1')
axes[1].set_ylabel('X2')
axes[1].grid(True, alpha=0.3)
# Correlation check
correlation = torch.corrcoef(torch.stack([X1_samples, X2_samples]))[0, 1]
axes[2].text(0.1, 0.7, f'Sample Correlation: {correlation:.4f}', fontsize=12,
transform=axes[2].transAxes, bbox=dict(boxstyle="round,pad=0.3", facecolor="lightblue"))
axes[2].text(0.1, 0.5, f'Expected (theory): 0.0000', fontsize=12,
transform=axes[2].transAxes, bbox=dict(boxstyle="round,pad=0.3", facecolor="lightgreen"))
axes[2].text(0.1, 0.3, 'Independence verified if\ncorrelation ≈ 0', fontsize=11,
transform=axes[2].transAxes)
axes[2].set_title('Independence Check')
axes[2].axis('off')
plt.tight_layout()
plt.show()
print(f"Sample means: X1 = {X1_samples.mean():.4f}, X2 = {X2_samples.mean():.4f}")
print(f"Sample stds: X1 = {X1_samples.std():.4f}, X2 = {X2_samples.std():.4f}")
print(f"Sample correlation: {correlation:.4f}")
print("\nFor i.i.d N(0,1) variables, we expect:")
print("- Means ≈ 0, Standard deviations ≈ 1, Correlation ≈ 0")
Finally, let’s return to the density calculation for our two i.i.d \(N(0,1)\) variables evaluated at \(x_1 = 0.2\) and \(x_2 = 0.4\).
Result Interpretation:

- \(f_{X_1}(0.2) \approx 0.391\): individual probability density at \(x_1 = 0.2\)
- \(f_{X_2}(0.4) \approx 0.368\): individual probability density at \(x_2 = 0.4\)
- \(f_{X_1,X_2}(0.2, 0.4) \approx 0.144\): joint probability density

Notice that the joint density equals the product of the individual densities, confirming independence: \(0.391 \times 0.368 \approx 0.144\).