Independent and Identically Distributed (i.i.d) Random Variables

Probability
Statistics
Random Variables
Mathematics
Understanding the concept of independent and identically distributed random variables, their properties, and applications in probability theory and statistics
Author

Nipun Batra

Published

March 17, 2025

Keywords

independent, identically distributed, iid, random variables, joint probability, normal distribution

import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.nn as nn
import pandas as pd

# Retina mode for sharper inline figures
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

print(np.__version__)
2.2.4

Independent and Identically Distributed (i.i.d) Random Variables

Introduction

Independent and Identically Distributed (i.i.d) random variables are fundamental building blocks in probability theory and statistics. This concept forms the theoretical foundation for many statistical methods, from simple sampling to complex machine learning algorithms. When we say random variables are i.i.d, we mean two crucial things: they are independent (the outcome of one doesn’t affect another) and identically distributed (they all follow the same probability distribution).

Understanding i.i.d random variables is essential for:

  • Statistical inference and hypothesis testing
  • The Law of Large Numbers and Central Limit Theorem
  • Monte Carlo simulations
  • Machine learning model assumptions
  • Data sampling and experimental design

Learning Objectives

By the end of this notebook, you will be able to:

  1. Define independence and identical distribution for random variables
  2. Compute joint probability density functions for i.i.d random variables
  3. Apply the multiplication rule for independent random variables
  4. Recognize when the i.i.d assumption is appropriate in real-world scenarios
  5. Implement simulations involving i.i.d random variables using Python
  6. Analyze the properties and implications of i.i.d assumptions

Theoretical Background

Independence of Random Variables

Two random variables \(X_1\) and \(X_2\) are independent if, for all values \(x_1\) and \(x_2\):

\[P(X_1 = x_1, X_2 = x_2) = P(X_1 = x_1) \cdot P(X_2 = x_2)\]

For continuous random variables, this becomes:

\[f_{X_1,X_2}(x_1, x_2) = f_{X_1}(x_1) \cdot f_{X_2}(x_2)\]

where \(f_{X_1,X_2}(x_1, x_2)\) is the joint probability density function.

Identical Distribution

Random variables are identically distributed if they have the same probability distribution. This means:

  • Same probability density function (PDF) or probability mass function (PMF)
  • Same parameters (mean, variance, etc.)
  • Same support (the set of possible values)

The i.i.d Property

When random variables \(X_1, X_2, \ldots, X_n\) are i.i.d:

  1. Independence: \(f_{X_1,\ldots,X_n}(x_1,\ldots,x_n) = \prod_{i=1}^n f_{X_i}(x_i)\)
  2. Identical Distribution: \(f_{X_1}(x) = f_{X_2}(x) = \cdots = f_{X_n}(x) = f(x)\)

Combined: \(f_{X_1,\ldots,X_n}(x_1,\ldots,x_n) = \prod_{i=1}^n f(x_i)\)
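
In code, this factorization means the joint log-density of an i.i.d sample is simply the sum of the individual log-densities. A minimal sketch of this, assuming the imports from the setup cell above (the distribution and sample values are purely illustrative):

# Common distribution f(x): the standard normal N(0, 1)
f = torch.distributions.Normal(0.0, 1.0)

# An illustrative i.i.d sample x_1, ..., x_n
x = torch.tensor([0.5, -1.2, 0.3, 2.0])

# Independence + identical distribution:
# log f(x_1, ..., x_n) = sum_i log f(x_i)
joint_log_density = f.log_prob(x).sum()
print(joint_log_density.exp())       # joint density via the sum of log-densities
print(f.log_prob(x).exp().prod())    # same value, computed as a direct product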


Practical Implementation

Let’s explore these concepts through computational examples.

Example 1: Two Independent Normal Random Variables

We’ll create two independent normal random variables, both following \(N(0,1)\) (standard normal distribution). Since they have the same distribution parameters and are independent, they are i.i.d.

Computing Individual Probabilities

For independent random variables, we can compute their individual probability densities separately:

# Two independent standard normal random variables, each following N(0, 1)
X1 = torch.distributions.Normal(0, 1)
X2 = torch.distributions.Normal(0, 1)

Computing Joint Probability for i.i.d Variables

For i.i.d random variables, the joint probability density is the product of individual densities:

\[f_{X_1,X_2}(x_1, x_2) = f_{X_1}(x_1) \cdot f_{X_2}(x_2)\]

Let’s verify this with our example:
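
A minimal sketch of this check, assuming the X1 and X2 defined above: torch.distributions.Independent builds the corresponding two-dimensional joint distribution, and its density at a point agrees with the product of the marginal densities (the evaluation point (0.2, 0.4) is illustrative; a fuller worked example with outputs appears later in this notebook).

# Product of the two marginal densities at (x1, x2) = (0.2, 0.4)
x = torch.tensor([0.2, 0.4])
product_of_marginals = (X1.log_prob(x[0]) + X2.log_prob(x[1])).exp()

# The same joint density, built as a single 2-D distribution with independent components
joint = torch.distributions.Independent(
    torch.distributions.Normal(torch.zeros(2), torch.ones(2)), 1
)
print(product_of_marginals, joint.log_prob(x).exp())  # the two values agree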


Summary and Key Takeaways

What We’ve Learned:

  1. Definition: i.i.d random variables are both independent (outcomes don’t affect each other) and identically distributed (same probability distribution)

  2. Mathematical Property: For i.i.d variables \(X_1, \ldots, X_n\): \[f_{X_1,\ldots,X_n}(x_1,\ldots,x_n) = \prod_{i=1}^n f(x_i)\]

  3. Visual Indicators:

    • Zero correlation between variables (independence)
    • Identical marginal distributions (identical distribution)
    • Circular scatter plots for bivariate normal i.i.d variables
  4. Practical Importance:

    • Foundation for statistical inference
    • Enables the Law of Large Numbers
    • Assumption in many machine learning algorithms
    • Critical for sampling theory

Key Connections to Broader Concepts:

  • Law of Large Numbers: Sample means of i.i.d variables converge to population mean
  • Central Limit Theorem: Sums of i.i.d variables approach a normal distribution (see the sketch after this list)
  • Statistical Inference: Many hypothesis tests assume i.i.d observations
  • Machine Learning: Training examples are often assumed to be i.i.d
  • Monte Carlo Methods: Rely on i.i.d random sampling
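
To make the Central Limit Theorem connection concrete, here is a minimal sketch assuming i.i.d draws from a decidedly non-normal distribution, Uniform(0, 1); the sample size and number of replications are illustrative:

torch.manual_seed(0)

# 10,000 replications of the mean of n i.i.d Uniform(0, 1) draws
n, n_reps = 30, 10_000
uniform = torch.distributions.Uniform(0.0, 1.0)
sample_means = uniform.sample((n_reps, n)).mean(dim=1)

# By the CLT, these means are approximately N(0.5, 1/(12n))
plt.hist(sample_means.numpy(), bins=50, density=True, alpha=0.7)
plt.title(f'Means of {n} i.i.d Uniform(0, 1) draws\n(approximately normal by the CLT)')
plt.xlabel('Sample mean')
plt.ylabel('Density')
plt.show()

print(f"Empirical mean: {sample_means.mean():.4f} (theory: 0.5)")
print(f"Empirical std:  {sample_means.std():.4f} (theory: {(1/(12*n))**0.5:.4f})")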

When to Question i.i.d Assumptions:

  • Time series data (autocorrelation)
  • Spatial data (spatial correlation)
  • Clustered data (within-cluster correlation)
  • Sequential learning (changing distributions)
  • Measurement instruments (systematic errors)

Understanding i.i.d random variables provides the foundation for advanced topics in probability, statistics, and machine learning. This concept bridges theoretical probability with practical data analysis applications.

# Example 4: Simulating Coin Flips (Classic i.i.d Example)
torch.manual_seed(42)  # For reproducibility

# Simulate 1000 coin flips (Bernoulli random variables)
n_flips = 1000
p_heads = 0.5  # Fair coin

# Each flip is an i.i.d Bernoulli(0.5) random variable
flips = torch.distributions.Bernoulli(p_heads).sample((n_flips,))

# Plot results
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# 1. Sequence of flips (first 100)
axes[0].plot(range(100), flips[:100].numpy(), 'o-', markersize=3, alpha=0.7)
axes[0].set_title('First 100 Coin Flips\n(0=Tails, 1=Heads)')
axes[0].set_xlabel('Flip Number')
axes[0].set_ylabel('Outcome')
axes[0].set_ylim(-0.1, 1.1)
axes[0].grid(True, alpha=0.3)

# 2. Running proportion of heads
cumulative_heads = torch.cumsum(flips, dim=0)
proportion_heads = cumulative_heads / torch.arange(1, n_flips + 1)

axes[1].plot(range(1, n_flips + 1), proportion_heads.numpy(), 'b-', alpha=0.7)
axes[1].axhline(y=0.5, color='red', linestyle='--', label='True probability (0.5)')
axes[1].set_title('Running Proportion of Heads\n(Converges to true probability)')
axes[1].set_xlabel('Number of Flips')
axes[1].set_ylabel('Proportion of Heads')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

# 3. Histogram of outcomes
axes[2].hist(flips.numpy(), bins=[-0.25, 0.25, 0.75, 1.25], alpha=0.7, 
             density=True, rwidth=0.8)
axes[2].set_title(f'Distribution of Outcomes\n({int(flips.sum())} heads, {n_flips - int(flips.sum())} tails)')
axes[2].set_xlabel('Outcome')
axes[2].set_ylabel('Probability')
axes[2].set_xticks([0, 1])
axes[2].set_xticklabels(['Tails', 'Heads'])
axes[2].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"Final proportion of heads: {proportion_heads[-1]:.4f}")
print(f"Expected proportion: {p_heads}")
print(f"Difference from expected: {abs(proportion_heads[-1] - p_heads):.4f}")
print("\nThis demonstrates the Law of Large Numbers:")
print("As n increases, the sample proportion converges to the true probability.")

Real-World Applications and When i.i.d Assumptions Hold

Common Examples of i.i.d Random Variables:

  1. Coin Flips: Each flip is independent of previous flips and has the same probability distribution
  2. Measurement Errors: In well-controlled experiments, measurement errors are often i.i.d
  3. Random Sampling: Drawing samples with replacement from a population
  4. Manufacturing Quality: Products from a stable manufacturing process
  5. Network Packet Arrivals: In some network models

When i.i.d Assumptions Break Down:

  1. Time Series Data: Today’s stock price depends on yesterday’s price (not independent; see the sketch after this list)
  2. Spatial Data: Nearby locations are often similar (not independent)
  3. Learning Systems: Performance improves over time (not identically distributed)
  4. Batch Effects: Different experimental batches may have different distributions
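
As a concrete illustration of the time-series case, here is a minimal AR(1) sketch (the coefficient 0.9 and the sample size are illustrative): each value depends on the previous one, so consecutive draws are correlated rather than independent.

torch.manual_seed(0)

# A simple AR(1) series: x_t = 0.9 * x_{t-1} + noise_t
n = 1000
noise = torch.distributions.Normal(0.0, 1.0).sample((n,))
x = torch.zeros(n)
for t in range(1, n):
    x[t] = 0.9 * x[t - 1] + noise[t]

# Lag-1 autocorrelation: far from 0, so the series is not i.i.d
lag1_corr = torch.corrcoef(torch.stack([x[:-1], x[1:]]))[0, 1]
print(f"Lag-1 autocorrelation (AR(1)): {lag1_corr:.3f}")

# For comparison, i.i.d draws have lag-1 autocorrelation near 0
iid = torch.distributions.Normal(0.0, 1.0).sample((n,))
iid_corr = torch.corrcoef(torch.stack([iid[:-1], iid[1:]]))[0, 1]
print(f"Lag-1 autocorrelation (i.i.d):  {iid_corr:.3f}")
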
# Case 1: i.i.d variables (both N(0,1))
X1_iid = torch.distributions.Normal(0, 1).sample((1000,))
X2_iid = torch.distributions.Normal(0, 1).sample((1000,))

# Case 2: Independent but NOT identically distributed
X1_ind = torch.distributions.Normal(0, 1).sample((1000,))    # N(0,1)
X2_ind = torch.distributions.Normal(2, 0.5).sample((1000,))  # N(2, 0.5^2): different mean and std

# Case 3: Identically distributed but NOT independent (correlated)
# Using multivariate normal with correlation
mean = torch.tensor([0.0, 0.0])
cov = torch.tensor([[1.0, 0.7], [0.7, 1.0]])  # correlation = 0.7
correlated_samples = torch.distributions.MultivariateNormal(mean, cov).sample((1000,))
X1_cor = correlated_samples[:, 0]
X2_cor = correlated_samples[:, 1]

# Create comparison plot
fig, axes = plt.subplots(2, 3, figsize=(15, 10))

# Row 1: Scatter plots
titles = ['i.i.d Variables', 'Independent, Not Identical', 'Identical, Not Independent']
X_pairs = [(X1_iid, X2_iid), (X1_ind, X2_ind), (X1_cor, X2_cor)]

for i, (X1, X2) in enumerate(X_pairs):
    axes[0, i].scatter(X1.numpy(), X2.numpy(), alpha=0.5, s=10)
    axes[0, i].set_title(titles[i])
    axes[0, i].set_xlabel('X1')
    axes[0, i].set_ylabel('X2')
    axes[0, i].grid(True, alpha=0.3)
    
    # Add correlation info
    corr = torch.corrcoef(torch.stack([X1, X2]))[0, 1]
    axes[0, i].text(0.05, 0.95, f'Corr: {corr:.3f}', transform=axes[0, i].transAxes,
                    bbox=dict(boxstyle="round,pad=0.3", facecolor="yellow"))

# Row 2: Histograms
for i, (X1, X2) in enumerate(X_pairs):
    axes[1, i].hist(X1.numpy(), bins=30, alpha=0.6, label='X1', density=True)
    axes[1, i].hist(X2.numpy(), bins=30, alpha=0.6, label='X2', density=True)
    axes[1, i].set_title('Marginal Distributions')
    axes[1, i].set_xlabel('Value')
    axes[1, i].set_ylabel('Density')
    axes[1, i].legend()
    axes[1, i].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Summary statistics
print("COMPARISON SUMMARY:")
print("="*50)
for i, (name, (X1, X2)) in enumerate(zip(titles, X_pairs)):
    corr = torch.corrcoef(torch.stack([X1, X2]))[0, 1]
    print(f"\n{name}:")
    print(f"  X1: mean={X1.mean():.3f}, std={X1.std():.3f}")
    print(f"  X2: mean={X2.mean():.3f}, std={X2.std():.3f}")
    print(f"  Correlation: {corr:.3f}")
    
    # Check properties
    same_mean = abs(X1.mean() - X2.mean()) < 0.2
    same_std = abs(X1.std() - X2.std()) < 0.2
    independent = abs(corr) < 0.1
    
    print(f"  ✓ Identically distributed: {same_mean and same_std}")
    print(f"  ✓ Independent: {independent}")
    print(f"  ✓ i.i.d: {same_mean and same_std and independent}")

Example 3: Comparing i.i.d vs Non-i.i.d Variables

Let’s contrast i.i.d variables with non-i.i.d ones to understand the difference.

# Generate samples from i.i.d normal random variables
n_samples = 1000

# Two i.i.d normal random variables
X1_samples = torch.distributions.Normal(0, 1).sample((n_samples,))
X2_samples = torch.distributions.Normal(0, 1).sample((n_samples,))

# Plot the samples
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Individual distributions
axes[0].hist(X1_samples.numpy(), bins=30, alpha=0.7, label='X1', color='blue', density=True)
axes[0].hist(X2_samples.numpy(), bins=30, alpha=0.7, label='X2', color='red', density=True)
axes[0].set_title('Individual Distributions\n(Should be identical)')
axes[0].set_xlabel('Value')
axes[0].set_ylabel('Density')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Joint distribution (scatter plot)
axes[1].scatter(X1_samples.numpy(), X2_samples.numpy(), alpha=0.5, s=10)
axes[1].set_title('Joint Distribution\n(Should show no correlation)')
axes[1].set_xlabel('X1')
axes[1].set_ylabel('X2')
axes[1].grid(True, alpha=0.3)

# Correlation check
correlation = torch.corrcoef(torch.stack([X1_samples, X2_samples]))[0, 1]
axes[2].text(0.1, 0.7, f'Sample Correlation: {correlation:.4f}', fontsize=12, 
             transform=axes[2].transAxes, bbox=dict(boxstyle="round,pad=0.3", facecolor="lightblue"))
axes[2].text(0.1, 0.5, f'Expected (theory): 0.0000', fontsize=12, 
             transform=axes[2].transAxes, bbox=dict(boxstyle="round,pad=0.3", facecolor="lightgreen"))
axes[2].text(0.1, 0.3, 'Independence verified if\ncorrelation ≈ 0', fontsize=11, 
             transform=axes[2].transAxes)
axes[2].set_title('Independence Check')
axes[2].axis('off')

plt.tight_layout()
plt.show()

print(f"Sample means: X1 = {X1_samples.mean():.4f}, X2 = {X2_samples.mean():.4f}")
print(f"Sample stds:  X1 = {X1_samples.std():.4f}, X2 = {X2_samples.std():.4f}")
print(f"Sample correlation: {correlation:.4f}")
print("\nFor i.i.d N(0,1) variables, we expect:")
print("- Means ≈ 0, Standard deviations ≈ 1, Correlation ≈ 0")

Example 2: Visualizing i.i.d Random Variables

Let’s generate samples from i.i.d random variables and visualize their properties.

Result Interpretation:

  • \(f_{X_1}(0.2) \approx 0.391\): the individual probability density at \(x_1 = 0.2\)
  • \(f_{X_2}(0.4) \approx 0.368\): the individual probability density at \(x_2 = 0.4\)
  • \(f_{X_1,X_2}(0.2, 0.4) \approx 0.144\): the joint probability density

Notice that the joint density equals the product of the individual densities, confirming independence: \(0.391 \times 0.368 \approx 0.144\).

# Evaluate each marginal density at the sample point (x1, x2) = (0.2, 0.4)
sample = torch.tensor([0.2, 0.4])
P_X_x1_ = X1.log_prob(sample[0]).exp()   # f_{X1}(0.2)
P_X_x2_ = X2.log_prob(sample[1]).exp()   # f_{X2}(0.4)

print(P_X_x1_, P_X_x2_)
tensor(0.3910) tensor(0.3683)
# By independence, the joint density is the product of the marginal densities
joint_pdf = P_X_x1_ * P_X_x2_
print(joint_pdf)
tensor(0.1440)