Assignment 3 (released 18 September, due 27 September, 7 PM)

Published: September 18, 2023

Instructions

  • Total marks: 9
  • Use torch for the assignment
  • For sampling, use torch.distributions and do not use torch.random directly
  • For computing log_prob, use torch.distributions and do not use custom formulas
  • The assignment has to be done in groups of two.
  • The assignment should be a single Jupyter notebook.
  • The results for every question of your assignment should be presented in visual formats such as plots and tables; do not show the model's raw training logs directly during the viva. All plots should have appropriate labels and legends. If this is not done, we may deduct some marks for presentation (e.g. 10%).
  • To learn more about a distribution, look at its Wikipedia page (the table on the right will have everything you need).

Questions

  1. [2.5 marks] Approximate the normalization constant of the standard normal distribution using Monte Carlo integration. Assume that we only know the numerator term \(e^{-x^2/2}\) and want to find \(I = \int_{-\infty}^{\infty} e^{-x^2/2}\, dx\): \[\int f(x)\, p(x)\, dx \approx \frac{1}{N} \sum_{i=1}^{N} f(x_i), \quad \text{where } x_i \sim p(x).\]
    1. Take p(x) = Uniform(-a, a). Choose the following values of a: {0.01, 0.02, 0.05, 0.1, 0.5, 1, 2, 3, 5}. Draw 1000 samples for each a and plot the estimated normalizing constant (y-axis) against each value of a (x-axis). Draw a horizontal line showing the analytical normalizing constant as ground truth. [1 mark]
    2. Estimate I using Monte Carlo integration for varying numbers of MC samples \(\{10, 100, 10^3, 10^4, 10^5\}\) with a = 4. For each number of MC samples, repeat the experiment 10 times and plot the mean estimate with plt.plot and the standard deviation of the estimate using plt.fill_between. [1 mark]
    3. Repeat (i.) using scipy.integrate.quad and compare the result with the estimates obtained in (i.), using a plot similar to the one in (i.). [0.5 mark]
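For reference, a minimal sketch of the Monte Carlo estimator used throughout question 1 could look like the following; the helper name mc_estimate and the default sample count are only illustrative, not part of the assignment.

import torch
import torch.distributions as D

def mc_estimate(a, n_samples=1000):
    # Proposal p(x) = Uniform(-a, a), sampled via torch.distributions (not torch.random)
    p = D.Uniform(-a, a)
    x = p.sample((n_samples,))
    f = torch.exp(-x ** 2 / 2)                 # unnormalized integrand e^{-x^2/2}
    # I = \int [f(x) / p(x)] p(x) dx  ≈  (1/N) sum_i f(x_i) / p(x_i)
    return (f / p.log_prob(x).exp()).mean()

For (iii.), scipy.integrate.quad(lambda x: np.exp(-x**2 / 2), -a, a) (with numpy imported as np) returns the corresponding deterministic estimate, together with an error bound, for each interval.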
  2. [1.5 marks] Inverse CDF sampling for Cauchy distribution:
    1. Analytically derive the Inverse CDF from the CDF of the Cauchy distribution. [0.5 mark]
    2. Draw samples from the Cauchy distribution (loc=0, scale=1) with inverse CDF sampling, using the inverse CDF derived in (i.). While generating samples from the uniform distribution, restrict them to be between 0.05 and 0.95 to avoid numerical instability. Verify that the drawn samples are correct by plotting the kernel density estimate (empirical pdf) with sns.kdeplot (sns stands for the seaborn library) along with the pdf obtained from dist.log_prob, where dist is torch.distributions.Cauchy(loc=0, scale=1). [0.5 mark]
    3. Repeat (ii.) using inverse CDF from torch.distributions.Cauchy. You can access the inverse CDF at dist.icdf where dist is an instance of torch.distributions.Cauchy. [0.5 mark]
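A minimal sketch of part (iii.), using the built-in inverse CDF, is shown below; for part (ii.) you would substitute the inverse CDF derived in (i.). The sample size and plotting range are only illustrative.

import torch
import torch.distributions as D
import seaborn as sns
import matplotlib.pyplot as plt

dist = D.Cauchy(loc=0.0, scale=1.0)

# u ~ Uniform(0.05, 0.95), restricted to avoid the extreme tails, mapped through the inverse CDF
u = D.Uniform(0.05, 0.95).sample((10_000,))
samples = dist.icdf(u)

# Compare the empirical density of the samples with the true pdf
sns.kdeplot(samples.numpy(), label="inverse-CDF samples")
xs = torch.linspace(-10, 10, 500)
plt.plot(xs.numpy(), dist.log_prob(xs).exp().numpy(), label="Cauchy pdf")
plt.xlabel("x"); plt.ylabel("density"); plt.legend(); plt.show()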
  3. [1.5 marks] Rejection sampling:
    1. Sample from the unnormalized distribution shown in the code below using rejection sampling. Use Normal(loc=5, scale=5), Uniform(-15, 15), and Laplace(loc=5, scale=5) as the proposal distributions. Report the acceptance ratio for each proposal distribution (you may choose a suitable multiplier value M while considering the support -15 < x < 15). [1 mark]
    2. Create and compare plots showing the target distribution (the target_pdf function), the proposal distribution (pdf via its log_prob method), the scaled proposal distribution (scaled by M), and the empirical pdf of the accepted samples (i.e. the final normalized target distribution) using sns.kdeplot. [0.5 mark]
import torch
import torch.distributions as D

def target_pdf(x):
    # Unnormalized target density: a weighted sum of a Gaussian and a Cauchy component
    gaussian_pdf = D.Normal(0, 1.5).log_prob(x).exp()
    cauchy_pdf = D.Cauchy(5, 3).log_prob(x).exp()
    return 0.5 * gaussian_pdf + 0.7 * cauchy_pdf
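Continuing from the code above, a rough sketch of rejection sampling with one of the proposals might look like this; here the multiplier M is found numerically on a dense grid (you may instead reason about a suitable value yourself), and the number of proposals drawn is only illustrative.

proposal = D.Normal(5.0, 5.0)

# Choose M so that M * q(x) >= target_pdf(x) on the support (-15, 15)
grid = torch.linspace(-15, 15, 2001)
M = (target_pdf(grid) / proposal.log_prob(grid).exp()).max()

n = 100_000
x = proposal.sample((n,))
u = D.Uniform(0.0, 1.0).sample((n,))
in_support = (x > -15) & (x < 15)

# Accept x_i with probability target_pdf(x_i) / (M * q(x_i)); treat the target as zero outside the support
accept = in_support & (u < target_pdf(x) / (M * proposal.log_prob(x).exp()))
samples = x[accept]
acceptance_ratio = accept.float().mean()

The same pattern applies to the Uniform(-15, 15) and Laplace(loc=5, scale=5) proposals, each with its own M and acceptance ratio.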
  4. [3.5 marks] Generate the following classification dataset:
from sklearn.datasets import make_circles
X, y = make_circles(n_samples=100, noise=0.02, random_state=42)

We want to perform classification with the following Neural Network which returns logits for the cross-entropy loss:

import torch.nn as nn
model = nn.Sequential(
    nn.Linear(2, 8),
    nn.ReLU(),
    nn.Linear(8, 1)
)
    1. What is the total number of parameters in this model? Apply a \(\mathcal{N}(0, 1)\) prior on all the parameters of the neural network, find the MAP estimate of the parameters, and plot the predicted probability surface on a dense grid along with the data (see this example for code reference). You may choose appropriate limits for the grid to cover the data with some additional margin. (A starter sketch for the MAP objective appears after this list.) [0.5 mark]
    2. What is the expected size of the Hessian matrix? Compute the full Hessian of the MAP objective (the negative log joint) at the MAP estimate and invert it to get the covariance matrix. Visualize this covariance matrix as a heatmap with seaborn or matplotlib. A valid covariance matrix is always positive semi-definite (PSD); check whether the obtained covariance matrix is PSD. [1.5 marks]
    3. As an approximation, we can ignore the off-diagonal elements of the Hessian. Why might such a matrix be easier to invert? After inverting it to get the covariance matrix, visualize it with a heatmap and compare it with the full covariance obtained in (ii.). [0.5 mark]
    4. Sample parameters from the posterior distribution. Use these parameters to obtain the posterior predictive with Monte Carlo sampling on a uniform 2D grid. Plot the mean and standard deviation surfaces side by side using plt.subplots() or similar. What are your observations? (A combined sketch for (ii.) and (iv.) appears after the hamiltorch snippet below.) [1 mark]
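As referenced in (i.) above, one possible way to set up the MAP objective is to treat the \(\mathcal{N}(0, 1)\) prior as a quadratic penalty on the parameters. The sketch below reuses X, y and model from the code above; the optimizer, learning rate, and number of steps are only illustrative.

import torch
import torch.nn as nn

X_t = torch.tensor(X, dtype=torch.float32)
y_t = torch.tensor(y, dtype=torch.float32)

# Negative log joint = summed binary cross-entropy (negative log likelihood)
#                      + 0.5 * ||theta||^2 (negative log N(0, 1) prior, up to a constant)
def neg_log_joint(model):
    logits = model(X_t).squeeze(-1)
    nll = nn.functional.binary_cross_entropy_with_logits(logits, y_t, reduction="sum")
    neg_log_prior = 0.5 * sum((p ** 2).sum() for p in model.parameters())
    return nll + neg_log_prior

optimizer = torch.optim.Adam(model.parameters(), lr=0.05)
for _ in range(2000):
    optimizer.zero_grad()
    loss = neg_log_joint(model)
    loss.backward()
    optimizer.step()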

The following functions from hamiltorch will be useful while working on this:

# flatten function returns a 1d tensor of parameters from the model.
flat_params = hamiltorch.util.flatten(model)

# unflatten function returns a list of parameters which has
# the same structure as list(model.parameters())
params_list = hamiltorch.util.unflatten(model, flat_params)

# Put back the weights in the model in-place
hamiltorch.util.update_model_params_in_place(model, params_list)
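Building on these utilities and on the MAP sketch above (reusing X_t and y_t), a rough sketch for (ii.) and (iv.) follows. It assumes that hamiltorch.util.unflatten keeps the computation differentiable with respect to the flat parameter vector, uses torch.autograd.functional.hessian as one possible way of obtaining the Hessian, and picks grid limits and sample counts that are purely illustrative.

import torch
import torch.distributions as D
import hamiltorch

flat_map = hamiltorch.util.flatten(model).detach().clone()   # MAP estimate as a flat vector

def neg_log_joint_flat(theta):
    # Forward pass written as a function of the flat parameter vector
    w1, b1, w2, b2 = hamiltorch.util.unflatten(model, theta)
    logits = (torch.relu(X_t @ w1.T + b1) @ w2.T + b2).squeeze(-1)
    nll = torch.nn.functional.binary_cross_entropy_with_logits(logits, y_t, reduction="sum")
    return nll + 0.5 * (theta ** 2).sum()                     # N(0, 1) prior term

# Laplace-style approximation: covariance = inverse Hessian at the MAP.
# If the result is not positive semi-definite (see (ii.)), a small jitter on the diagonal
# or the diagonal approximation from (iii.) may be needed before sampling.
H = torch.autograd.functional.hessian(neg_log_joint_flat, flat_map)
cov = torch.linalg.inv(H)

# Posterior predictive by Monte Carlo over sampled parameter vectors, for (iv.)
posterior = D.MultivariateNormal(flat_map, covariance_matrix=cov)
xx, yy = torch.meshgrid(torch.linspace(-1.5, 1.5, 100),
                        torch.linspace(-1.5, 1.5, 100), indexing="ij")
grid = torch.stack([xx.reshape(-1), yy.reshape(-1)], dim=1)

probs = []
for _ in range(100):
    theta = posterior.sample()
    w1, b1, w2, b2 = hamiltorch.util.unflatten(model, theta)
    logits = (torch.relu(grid @ w1.T + b1) @ w2.T + b2).squeeze(-1)
    probs.append(torch.sigmoid(logits))
probs = torch.stack(probs)
mean_surface = probs.mean(dim=0).reshape(xx.shape)
std_surface = probs.std(dim=0).reshape(xx.shape)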