import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.nn as nn
import pandas as pd
# Retina mode
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
# PMF and some common discrete distributions
## PMF of Bernoulli distribution
Let \(X\) be a Bernoulli random variable. \(X\) can take on one of two values, 0 or 1, with probabilities \(1-p\) and \(p\), respectively.
Example: Suppose we flip a coin with probability \(p\) of landing heads. Let \(X\) be the random variable that is 1 if the coin lands heads and 0 if the coin lands tails.
The probability mass function (PMF) of a Bernoulli random variable is given by:
\[ p_X(x) = \begin{cases} 1-p, & \text{if } x = 0, \\ p, & \text{if } x = 1. \end{cases} \]
or, equivalently,
\[ p_X(x) = p^x(1-p)^{1-x}, \quad x \in \{0, 1\}. \]
where \(0 < p < 1\) is called the Bernoulli parameter. We write
\[ X \sim \text{Bernoulli}(p) \]
to denote that \(X\) is drawn from a Bernoulli distribution with parameter \(p\).
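As a quick sanity check, we can evaluate the closed-form PMF directly in plain Python (a minimal sketch; the helper `bernoulli_pmf` is introduced here only for illustration):

def bernoulli_pmf(x, p):
    # p^x * (1-p)^(1-x) for x in {0, 1}
    return p**x * (1 - p)**(1 - x)

p = 0.3
print(bernoulli_pmf(0, p), bernoulli_pmf(1, p))  # 0.7 and 0.3, as expected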
## Plotting the PMF
def plot_pmf_bernoulli(p, title):
    x = np.array([0, 1])
    y = np.array([1-p, p])
    plt.bar(x, y)
    plt.title(title)
    plt.xlabel('x')
    plt.ylabel('P(X=x)')
    plt.xticks([0, 1])

plot_pmf_bernoulli(0.3, 'Bernoulli PMF with p=0.3')
plot_pmf_bernoulli(0.8, 'Bernoulli PMF with p=0.8')
torch.distributions.Bernoulli?
Init signature: torch.distributions.Bernoulli(probs=None, logits=None, validate_args=None)
Docstring:
Creates a Bernoulli distribution parameterized by :attr:`probs`
or :attr:`logits` (but not both).
Samples are binary (0 or 1). They take the value `1` with probability `p`
and `0` with probability `1 - p`.
Example::
>>> # xdoctest: +IGNORE_WANT("non-deterministic")
>>> m = Bernoulli(torch.tensor([0.3]))
>>> m.sample() # 30% chance 1; 70% chance 0
tensor([ 0.])
Args:
probs (Number, Tensor): the probability of sampling `1`
logits (Number, Tensor): the log-odds of sampling `1`
File: ~/mambaforge/lib/python3.12/site-packages/torch/distributions/bernoulli.py
Type: type
Subclasses:
## Bernoulli Distribution in PyTorch
# Create a Bernoulli distribution with p=0.9
dist = torch.distributions.Bernoulli(probs=0.9)
dist
Bernoulli(probs: 0.8999999761581421)
# List the public attributes of the Bernoulli distribution (those not starting with an underscore)
attrs = [attr for attr in dir(dist) if not attr.startswith('_')]
pd.Series(attrs)
0 arg_constraints
1 batch_shape
2 cdf
3 entropy
4 enumerate_support
5 event_shape
6 expand
7 has_enumerate_support
8 has_rsample
9 icdf
10 log_prob
11 logits
12 mean
13 mode
14 param_shape
15 perplexity
16 probs
17 rsample
18 sample
19 sample_n
20 set_default_validate_args
21 stddev
22 support
23 variance
dtype: object
dist.mean
tensor(0.9000)
dist.probs
tensor(0.9000)
dist.support
Boolean()
dist.log_prob(torch.tensor(1.0)).exp()
tensor(0.9000)
dist.log_prob(torch.tensor(0.0)).exp()
tensor(0.1000)
try:
    dist.log_prob(torch.tensor(0.5)).exp()
except Exception as e:
    print(e)
Expected value argument (Tensor of shape ()) to be within the support (Boolean()) of the distribution Bernoulli(probs: 0.8999999761581421, logits: 2.1972243785858154), but found invalid values:
0.5
dist.sample()
tensor(0.)
samples = dist.sample(torch.Size([1000]))
samples[:10]
tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
samples.mean()
tensor(0.9110)
## Lime vs Lemon
https://www.healthline.com/nutrition/lime-vs-lemon
Limes are small, green, and more tart than lemons, which are larger, oval-shaped, and yellow. Nutritionally, they’re almost identical and share many of the same potential health benefits.
Lemons are usually bright yellow, while limes are typically a bright shade of green. However, certain types of limes will turn yellow as they ripen, making the distinction a little more difficult.
Limes are also smaller and rounder than lemons. They can vary in size but are usually 1–2 inches (3–6 centimeters) in diameter.
In comparison, lemons tend to be 2–4 inches (7–12 centimeters) in diameter and have a more oval or oblong shape.
## Main question
Given a fruit's radius, we want to predict whether it is a lime or a lemon.
Let us denote the radius of the fruit by \(r\) and the type of the fruit by \(y\) where \(y=0\) if the fruit is a lime and \(y=1\) if the fruit is a lemon.
We want to model the probability of the fruit being a lemon given its radius, i.e., we want to model \(p(y=1|r)\).
## Generative process
# Set random seed
torch.manual_seed(42)
<torch._C.Generator at 0x12ab80ab0>
radius_array = torch.distributions.Uniform(0, 3).sample((1000,))
radius_array[:10]
tensor([2.6468, 2.7450, 1.1486, 2.8779, 1.1713, 1.8027, 0.7697, 2.3809, 2.8223,
0.3996])
_ = plt.hist(radius_array)
We start by modeling the generative process of the data. We assume that if \(w \cdot r + b > 0\) the fruit is a lemon, and otherwise it is a lime. Let us assume that `w_true = 1.2` and `b_true = -2.0`.
def linear(r, w, b):
    return w * r + b

w_true = 1.2
b_true = -2.0

logits = linear(radius_array, w_true, b_true)
pd.Series(logits.numpy()).describe()
count 1000.000000
mean -0.249277
std 1.045096
min -1.991556
25% -1.134019
50% -0.342291
75% 0.646800
max 1.599296
dtype: float64
Can we use `logits` to model the probability of the fruit being a lemon given its radius?
No! These logits can be any real number, but we want to model the probability of the fruit being a lemon given its radius, which is a number between 0 and 1.
We can use the sigmoid function to map the logits to a number between 0 and 1.
\[ \sigma(x) = \frac{1}{1 + e^{-x}} \]
def sigmoid(x):
    return 1 / (1 + torch.exp(-x))

probs = sigmoid(logits)
df = pd.DataFrame({
    'radius': radius_array.numpy(),
    'logits': logits.numpy(),
    'probabilities': probs.numpy()
})
df.head()
|   | radius   | logits    | probabilities |
|---|----------|-----------|---------------|
| 0 | 2.646808 | 1.176169  | 0.764258      |
| 1 | 2.745012 | 1.294014  | 0.784826      |
| 2 | 1.148591 | -0.621690 | 0.349397      |
| 3 | 2.877917 | 1.453500  | 0.810537      |
| 4 | 1.171345 | -0.594386 | 0.355629      |
df.query('radius < 0.2').head()
|     | radius   | logits    | probabilities |
|-----|----------|-----------|---------------|
| 29  | 0.018481 | -1.977822 | 0.121551      |
| 50  | 0.015137 | -1.981835 | 0.121123      |
| 74  | 0.012186 | -1.985377 | 0.120747      |
| 99  | 0.048608 | -1.941670 | 0.125464      |
| 108 | 0.187182 | -1.775382 | 0.144874      |
We can observe that, under our model, smaller fruits are more likely to be limes (lower probability of being a lemon) and larger fruits are more likely to be lemons (higher probability of being a lemon).
## Generate a dataset
y_true = torch.distributions.Bernoulli(probs=probs).sample()
df['y_true'] = y_true.numpy()
df.query('y_true == 0').head(10)
|    | radius   | logits    | probabilities | y_true |
|----|----------|-----------|---------------|--------|
| 6  | 0.769717 | -1.076339 | 0.254199      | 0.0    |
| 9  | 0.399558 | -1.520531 | 0.179383      | 0.0    |
| 15 | 1.288213 | -0.454144 | 0.388376      | 0.0    |
| 17 | 1.721713 | 0.066056  | 0.516508      | 0.0    |
| 18 | 0.799740 | -1.040312 | 0.261090      | 0.0    |
| 19 | 1.882347 | 0.258817  | 0.564345      | 0.0    |
| 24 | 0.315945 | -1.620866 | 0.165085      | 0.0    |
| 25 | 0.808484 | -1.029819 | 0.263119      | 0.0    |
| 26 | 1.076438 | -0.708274 | 0.329980      | 0.0    |
| 28 | 1.641575 | -0.030110 | 0.492473      | 0.0    |
df.query('y_true == 1').head(10)
|    | radius   | logits    | probabilities | y_true |
|----|----------|-----------|---------------|--------|
| 0  | 2.646808 | 1.176169  | 0.764258      | 1.0    |
| 1  | 2.745012 | 1.294014  | 0.784826      | 1.0    |
| 2  | 1.148591 | -0.621690 | 0.349397      | 1.0    |
| 3  | 2.877917 | 1.453500  | 0.810537      | 1.0    |
| 4  | 1.171345 | -0.594386 | 0.355629      | 1.0    |
| 5  | 1.802686 | 0.163223  | 0.540715      | 1.0    |
| 7  | 2.380924 | 0.857109  | 0.702056      | 1.0    |
| 8  | 2.822314 | 1.386777  | 0.800077      | 1.0    |
| 10 | 2.803794 | 1.364553  | 0.796499      | 1.0    |
| 11 | 1.780739 | 0.136887  | 0.534168      | 1.0    |
Notice that even when the probability of an outcome is low, it can still occur. This randomness is the nature of the Bernoulli distribution.
df.query('y_true == 0').head(10)
|    | radius   | logits    | probabilities | y_true |
|----|----------|-----------|---------------|--------|
| 0  | 2.646808 | 1.176169  | 0.764258      | 0.0    |
| 3  | 2.877917 | 1.453500  | 0.810537      | 0.0    |
| 4  | 1.171345 | -0.594386 | 0.355629      | 0.0    |
| 5  | 1.802686 | 0.163223  | 0.540715      | 0.0    |
| 7  | 2.380924 | 0.857109  | 0.702056      | 0.0    |
| 9  | 0.399558 | -1.520531 | 0.179383      | 0.0    |
| 10 | 2.803794 | 1.364553  | 0.796499      | 0.0    |
| 14 | 2.223282 | 0.667939  | 0.661041      | 0.0    |
| 15 | 1.288213 | -0.454144 | 0.388376      | 0.0    |
| 18 | 0.799740 | -1.040312 | 0.261090      | 0.0    |
# Plot the data
plt.scatter(radius_array, y_true, alpha=0.1, marker='|', color='k')
plt.xlabel('Radius')

# Use 'Limes' and 'Lemons' labels on the y-axis
plt.yticks([0, 1], ['Limes', 'Lemons'])
plt.ylabel('Fruit')
Text(0, 0.5, 'Fruit')
# Logistic regression model
class LogisticRegression(nn.Module):
    def __init__(self):
        super(LogisticRegression, self).__init__()
        self.linear = nn.Linear(1, 1)

    def forward(self, x):
        return self.linear(x)

model = LogisticRegression()
# Training the model
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Convert the data to PyTorch tensors
radius_tensor = radius_array.unsqueeze(1)
y_true_tensor = y_true.unsqueeze(1)

# Training loop
n_epochs = 1000
for epoch in range(n_epochs):
    model.train()
    optimizer.zero_grad()

    # Forward pass
    y_pred = model(radius_tensor)

    # Compute loss
    loss = criterion(y_pred, y_true_tensor)

    # Backward pass
    loss.backward()

    # Update weights
    optimizer.step()

    if epoch % 100 == 0:
        print(f'Epoch: {epoch}, Loss: {loss.item()}')
Epoch: 0, Loss: 0.8052948117256165
Epoch: 100, Loss: 0.6282714605331421
Epoch: 200, Loss: 0.593073308467865
Epoch: 300, Loss: 0.5790674686431885
Epoch: 400, Loss: 0.5744827389717102
Epoch: 500, Loss: 0.5731877088546753
Epoch: 600, Loss: 0.5728737711906433
Epoch: 700, Loss: 0.5728095769882202
Epoch: 800, Loss: 0.5727986693382263
Epoch: 900, Loss: 0.5727971196174622
# Learned weights and bias
w_learned = model.linear.weight.item()
b_learned = model.linear.bias.item()

# Compare the true and learned weights and bias
print(f'True weights: {w_true}, Learned weights: {w_learned}')
print(f'True bias: {b_true}, Learned bias: {b_learned}')
True weights: 1.2, Learned weights: 1.2514334917068481
True bias: -2.0, Learned bias: -2.0183537006378174
# Test if a new fruit is a lime or a lemon
def predict_fruit(radius, model):
    model.eval()
    radius_tensor = torch.tensor([[radius]])
    logits = model(radius_tensor)
    prob = sigmoid(logits).item()
    fruit = ['Lime', 'Lemon'][int(prob > 0.5)]
    return fruit, prob
predict_fruit(0.5, model)
('Lime', 0.19898711144924164)
predict_fruit(1.5, model)
('Lime', 0.46475765109062195)
predict_fruit(2.0, model)
('Lemon', 0.6188130378723145)
# Decision surface
radius_values = torch.linspace(0, 3, 100).unsqueeze(1)
probs = sigmoid(model(radius_values)).detach()

plt.plot(radius_values, probs)
plt.xlabel('Radius')
plt.ylabel('Probability of being a lemon')
plt.title('Decision surface')
plt.ylim(0, 1)
plt.axhline(0.5, color='r', linestyle='--')
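Since the model predicts a lemon whenever the logit \(w \cdot r + b\) is positive (equivalently, whenever the sigmoid output exceeds 0.5), the decision boundary sits at the radius \(r = -b/w\). A minimal sketch, reusing the `w_true`, `b_true`, `w_learned`, and `b_learned` values from above:

# Radius at which sigmoid(w*r + b) = 0.5, i.e. w*r + b = 0
r_boundary_true = -b_true / w_true        # -(-2.0) / 1.2 ≈ 1.67
r_boundary_learned = -b_learned / w_learned
print(f'True boundary radius: {r_boundary_true:.3f}')
print(f'Learned boundary radius: {r_boundary_learned:.3f}')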
## Categorical distribution
Say, we have a random variable \(X\) that can take on one of \(K\) possible values, \(1, 2, \ldots, K\). The probability mass function (PMF) of a categorical distribution is given by:
\[ p_X(x) = \begin{cases} \theta_1, & \text{if } x = 1, \\ \theta_2, & \text{if } x = 2, \\ \vdots \\ \theta_K, & \text{if } x = K. \end{cases} \]
where \(\theta_1, \theta_2, \ldots, \theta_K\) are the parameters of the categorical distribution and satisfy the following constraints:
\[ 0 \leq \theta_i \leq 1, \quad \sum_{i=1}^K \theta_i = 1. \]
We write
\[ X \sim \text{Categorical}(\theta_1, \theta_2, \ldots, \theta_K) \]
to denote that \(X\) is drawn from a categorical distribution with parameters \(\theta_1, \theta_2, \ldots, \theta_K\). The categorical distribution is a generalization of the Bernoulli distribution to more than two outcomes.
If we had a fair 6-sided die, the PMF of the die roll would be given by:
\[ p_X(x) = \frac{1}{6}, \quad x \in \{1, 2, 3, 4, 5, 6\}. \]
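As a quick illustration, a fair die can be represented with `torch.distributions.Categorical` (a minimal sketch; note that PyTorch labels the outcomes 0 through 5 rather than 1 through 6):

# A fair six-sided die as a Categorical distribution over outcomes 0..5
die = torch.distributions.Categorical(probs=torch.ones(6) / 6)
print(die.log_prob(torch.tensor(3.0)).exp())  # each outcome has probability 1/6
print(die.sample(torch.Size([5])))            # five simulated rolls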
## Imagenet
The ImageNet project is a large visual database designed for use in visual object recognition research. A classifier trained on ImageNet outputs a categorical distribution over its classes; here we illustrate the idea with just four outcomes.
theta_vec = torch.tensor([0.1, 0.2, 0.3, 0.4])

# Plot the PMF of the categorical distribution
ser = pd.Series(theta_vec.numpy())
ser.plot(kind='bar', rot=0)
plt.xlabel('Outcome')
plt.ylabel('Probability')
Text(0, 0.5, 'Probability')
dist = torch.distributions.Categorical(probs=theta_vec)
print(dist)
Categorical(probs: torch.Size([4]))
dist.support
IntegerInterval(lower_bound=0, upper_bound=3)
dist.log_prob(torch.tensor(0.0)).exp()
tensor(0.1000)
try:
    dist.log_prob(torch.tensor(4.0)).exp()
except Exception as e:
    print(e)
Expected value argument (Tensor of shape ()) to be within the support (IntegerInterval(lower_bound=0, upper_bound=3)) of the distribution Categorical(probs: torch.Size([4]), logits: torch.Size([4])), but found invalid values:
4.0
samples = dist.sample(torch.Size([1000]))
samples[:10]
tensor([2, 3, 2, 1, 0, 3, 3, 3, 2, 3])
pd.Series(samples.numpy()).value_counts(normalize=True).sort_index()
0 0.115
1 0.192
2 0.276
3 0.417
Name: proportion, dtype: float64
## Quality control in factories
A factory produces electronic chips, and each chip has a probability \(p\) of being defective due to manufacturing defects. The quality control team randomly selects 10 chips from a large production batch and checks how many are defective.
How can we model the number of defective chips in the sample?
We can model the number of defective chips in the sample using a binomial distribution.
Let \(X\) be the number of defective chips in the sample. \(X\) can take on values \(0, 1, 2, \ldots, 10\). The probability mass function (PMF) of a binomial distribution is given by:
\[ p_X(x) = \binom{n}{x} p^x(1-p)^{n-x}, \quad x \in \{0, 1, 2, \ldots, 10\}. \]
where \(n\) is the number of chips in the sample, \(0 < p < 1\) is the probability of a chip being defective, and \(\binom{n}{x}\) is the binomial coefficient, which is the number of ways to choose \(x\) defective chips from \(n\) chips.
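Before turning to PyTorch, we can sanity-check this formula directly with `math.comb` (a minimal sketch; for example, \(P(X=1) = 10 \cdot 0.1 \cdot 0.9^9 \approx 0.387\)):

from math import comb

def binomial_pmf(x, n, p):
    # C(n, x) * p^x * (1-p)^(n-x)
    return comb(n, x) * p**x * (1 - p)**(n - x)

[round(binomial_pmf(x, 10, 0.1), 5) for x in range(11)]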
p_failure = 0.1
n_chips = 10
dist = torch.distributions.Binomial(n_chips, p_failure)
dist
Binomial(total_count: 10.0, probs: 0.10000000149011612)
dist.support
IntegerInterval(lower_bound=0, upper_bound=10.0)
x = torch.arange(0, n_chips+1)
y = dist.log_prob(x).exp()
df_prob_binomial = pd.DataFrame({
    'x': x.numpy(),
    'P(X=x)': y.numpy().round(5)
})
df_prob_binomial
|    | x  | P(X=x)  |
|----|----|---------|
| 0  | 0  | 0.34868 |
| 1  | 1  | 0.38742 |
| 2  | 2  | 0.19371 |
| 3  | 3  | 0.05740 |
| 4  | 4  | 0.01116 |
| 5  | 5  | 0.00149 |
| 6  | 6  | 0.00014 |
| 7  | 7  | 0.00001 |
| 8  | 8  | 0.00000 |
| 9  | 9  | 0.00000 |
| 10 | 10 | 0.00000 |
df_prob_binomial.plot(kind='bar', x='x', y='P(X=x)', rot=0)
plt.xlabel('x')
plt.ylabel('P(X=x)')
Text(0, 0.5, 'P(X=x)')
samples = dist.sample(torch.Size([1000]))
samples[:5]
tensor([2., 0., 1., 3., 0.])
pd.Series(samples.numpy()).value_counts().sort_index()
0.0 354
1.0 390
2.0 184
3.0 58
4.0 12
5.0 1
6.0 1
Name: count, dtype: int64
## Number of SMS received per day
Hat tip: Bayesian Methods for Hackers book
= "https://raw.githubusercontent.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/refs/heads/master/Chapter1_Introduction/data/txtdata.csv"
url = pd.read_csv(url, header=None) data
= 'Day'
data.index.name = ['Count'] data.columns
data
| Day | Count |
|-----|-------|
| 0   | 13.0  |
| 1   | 24.0  |
| 2   | 8.0   |
| 3   | 24.0  |
| 4   | 7.0   |
| ... | ...   |
| 69  | 37.0  |
| 70  | 5.0   |
| 71  | 14.0  |
| 72  | 13.0  |
| 73  | 22.0  |

74 rows × 1 columns
fig, ax = plt.subplots(figsize=(21, 7))
data.plot(kind='bar', rot=0, ax=ax)
How can you model the number of SMS messages you receive per day?
We can model the number of SMS messages you receive per day using a Poisson distribution.
Let \(X\) be the number of SMS messages you receive per day. \(X\) can take on values \(0, 1, 2, \ldots\). The probability mass function (PMF) of a Poisson distribution is given by:
\[ p_X(x) = \frac{\lambda^x e^{-\lambda}}{x!}, \quad x \in \{0, 1, 2, \ldots\}. \]
where \(\lambda > 0\) is the average number of SMS messages you receive per day.
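As with the binomial example, the formula can be checked directly in plain Python (a minimal sketch using `math.exp` and `math.factorial`):

from math import exp, factorial

def poisson_pmf(x, lam):
    # lambda^x * e^(-lambda) / x!
    return lam**x * exp(-lam) / factorial(x)

[round(poisson_pmf(x, 6), 6) for x in range(5)]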
rate_param = 6
dist = torch.distributions.Poisson(rate_param)
dist
Poisson(rate: 6.0)
dist.support
IntegerGreaterThan(lower_bound=0)
x_range = torch.arange(0, 20)
y = dist.log_prob(x_range).exp()
df_prob_poisson = pd.DataFrame({
    'x': x_range.numpy(),
    'P(X=x)': y.numpy().round(6)
})
df_prob_poisson
|    | x  | P(X=x)   |
|----|----|----------|
| 0  | 0  | 0.002479 |
| 1  | 1  | 0.014873 |
| 2  | 2  | 0.044618 |
| 3  | 3  | 0.089235 |
| 4  | 4  | 0.133853 |
| 5  | 5  | 0.160623 |
| 6  | 6  | 0.160623 |
| 7  | 7  | 0.137677 |
| 8  | 8  | 0.103258 |
| 9  | 9  | 0.068839 |
| 10 | 10 | 0.041303 |
| 11 | 11 | 0.022529 |
| 12 | 12 | 0.011264 |
| 13 | 13 | 0.005199 |
| 14 | 14 | 0.002228 |
| 15 | 15 | 0.000891 |
| 16 | 16 | 0.000334 |
| 17 | 17 | 0.000118 |
| 18 | 18 | 0.000039 |
| 19 | 19 | 0.000012 |
df_prob_poisson.plot(kind='bar', x='x', y='P(X=x)', rot=0)
## Sales Calls Before a Successful Sale
A sales representative makes cold calls to potential customers. Each call has a probability \(p\) of resulting in a successful sale. We want to model the number of calls needed before achieving a successful sale.
We can model the number of calls needed before achieving a successful sale using a geometric distribution.
Let \(X\) be the number of calls needed before achieving a successful sale. \(X\) can take on values \(1, 2, 3, \ldots\). The probability mass function (PMF) of a geometric distribution is given by:
\[ p_X(x) = (1-p)^{x-1}p, \quad x \in \{1, 2, 3, \ldots\}. \]
where \(0 < p < 1\) is the probability of a successful sale on each call.
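A minimal sketch of this PMF in plain Python; the expected number of calls up to and including the first sale is \(1/p\):

def geometric_pmf(x, p):
    # (1-p)^(x-1) * p for x = 1, 2, 3, ...
    return (1 - p)**(x - 1) * p

p = 0.3  # same success probability used below
print([round(geometric_pmf(x, p), 4) for x in range(1, 6)])
print('Expected number of calls:', 1 / p)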
p = 0.3
dist = torch.distributions.Geometric(p)
dist
Geometric(probs: 0.30000001192092896)
dist.support
IntegerGreaterThan(lower_bound=0)
dist.sample(torch.Size([10]))
tensor([5., 7., 1., 4., 5., 0., 3., 0., 5., 3.])

Note that `torch.distributions.Geometric` counts the number of failures before the first success, so its support starts at 0; in terms of the PMF above, a sample of 0 corresponds to a successful sale on the very first call.