Lab 5: Truncated data

STA 602L: Bayesian and Modern Statistics

Due: 11:59pm, Friday, May 28

Housekeeping

R/RStudio

You all should have R and RStudio installed on your computers by now. If you do not, first install the latest version of R here: https://cran.rstudio.com (remember to select the right installer for your operating system). Next, install the latest version of RStudio here: https://www.rstudio.com/products/rstudio/download/. Scroll down to the “Installers for Supported Platforms” section and find the right installer for your operating system.

R Markdown

You are required to use R Markdown to type up this lab report. If you do not already know how to use R Markdown, here is a very basic R Markdown template: https://sta-360-602l-su21.github.io/Course-Website/labs/resources/LabReport.Rmd. Refer to the resources tab of the course website (here: https://sta-360-602l-su21.github.io/Course-Website/resources/) for links to help you learn how to use R Markdown.

Gradescope

You MUST submit both your .Rmd and .pdf files to the course site on Gradescope here: https://www.gradescope.com/courses/190490/assignments. Make sure to knit to pdf and not html; ask the TA about knitting to pdf if you cannot figure it out. Be sure to submit under the right assignment entry.

Getting started

You will need the following R packages. If you do not already have them installed, please do so first using the install.packages function.

library(tidyverse)
library(magrittr)
library(bayesplot)
library(loo)
library(readxl)
library(plyr)
library(ggrepel)
library(cowplot)
library(truncnorm)

Introduction

So far we have only worked with the uniform density where the boundaries of the support are known. That is, so far, if \(X \sim \textrm{Unif}(a,b)\), then \(a\) and \(b\) are known. This lab will take you through the process of building a generative model using a conjugate prior for the uniform density on an interval \([0, \theta]\), with some parameter \(\theta\). The model we will encounter restricts, or truncates, the support of some of the parameters. Using these truncated distributions, we will write and implement a Gibbs sampling algorithm to infer the value of the boundary parameter \(\theta\) given noisy measurements from a uniform interval.

Uniform likelihood

Truncated distributions are usually described by a density function accompanied by an indicator function, which restricts the support of the random variable. For instance, a random variable \(X\) with a uniform density on the interval \([0, \theta]\) has density function:

\[ p(x | \theta) = \frac{1}{\theta} \cdot \mathbb{I}[0 < x \leq \theta] \]
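
To connect this with R’s built-in functions, note that dunif implements exactly this density, returning zero outside the support (a trivial check):

# The indicator formulation matches R's dunif on and off the support
theta0 <- 2
dunif(c(0.5, 1.9, 2.5), min = 0, max = theta0)  # 0.5 0.5 0.0: density is 1/theta0 inside, 0 outside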

Let’s work through the conjugate update for the uniform likelihood to practice working with indicator functions.

Inference on boundary parameter \(\theta\)

Conjugate prior

As a function of \(\theta\) alone, the uniform likelihood is proportional to a Pareto density function. If \(\theta \sim \textrm{Pareto}(m,k)\), then \[ \pi(\theta | m,k) = \frac{k m^k}{\theta^{k + 1}} \cdot \mathbb{I}[\theta \geq m], \]

for hyperparameters \(m\) and \(k\).

The Pareto prior is conjugate for the uniform likelihood, so that the posterior distribution of \(\theta\) will also be Pareto. What is the form of this posterior distribution?

Posterior update

Since we assume an independent sampling mechanism, we can combine the indicator functions from the likelihood densities of the individual \(X_i\). The product of indicator functions then becomes a single indicator function operating on a combination of the \(X_i\). What does this indicator function look like? What is the posterior support of \(\theta\)?

\[ \begin{aligned} \pi(\theta | X_1, \dots, X_n) &\propto p(x_1, \dots, x_n | \theta) \cdot \pi(\theta) \\ & \propto \left(\prod^n_{i=1}\frac{1}{\theta} \mathbb{I}[0 < x_i \leq \theta] \right) \cdot \frac{1}{\theta^{k + 1}} \mathbb{I}[\theta \geq m] \\ & \propto \frac{1}{\theta^n} \mathbb{I}[0 < \textrm{all elements of } \{x_1,\ldots,x_n\} \leq \theta] \cdot \frac{1}{\theta^{k + 1}} \mathbb{I}[\theta \geq m] \\ & \propto \frac{1}{\theta^n} \mathbb{I}[0 < \max(x_1, \dots, x_n) \leq \theta] \cdot \frac{1}{\theta^{k + 1}} \mathbb{I}[m \leq \theta] \\ &= \frac{1}{\theta^{k + n + 1}} \cdot \mathbb{I}[0 < \max(x_1, \dots, x_n, m) \leq \theta] \\ &\propto \text{Pareto}\bigg(\max(x_1, \dots, x_n, m), k + n \bigg) \end{aligned} \]
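
As a quick numerical check of this conjugate update, we can simulate uniform draws and compute the posterior Pareto parameters directly; this is a minimal sketch, and the sample size and hyperparameter values below are arbitrary choices for illustration.

# Numerical check of the Pareto-uniform conjugate update
set.seed(42)
theta_true <- 4
x <- runif(20, 0, theta_true)   # n = 20 draws from Unif(0, theta_true)
m0 <- 1; k0 <- 2                # illustrative Pareto(m0, k0) prior hyperparameters
m_post <- max(c(x, m0))         # posterior scale: max(x_1, ..., x_n, m0)
k_post <- k0 + length(x)        # posterior shape: k + n
c(m_post = m_post, k_post = k_post)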

Noisy samples from a disk of unknown radius

Suppose that we obtain “noisy” measurements of points sampled uniformly from a circular area (a disk) of unknown radius. Our goal will be to infer the radius of this disk, \(R\).

Suppose further that we know in advance:

  1. the measurement noise is additive and normally distributed, with unknown variance;
  2. the points are sampled uniformly over the area of the disk; and
  3. for a point sampled uniformly from a disk of radius \(R\), the probability that it lands within distance \(d\) of the center is \((d/R)^2\).

The second and third items on the list above tell us that if we observe data drawn uniformly from a disk of radius \(R\), each point’s squared distance from the center will be uniformly distributed on \((0, R^2)\). Given these facts, let’s think about how we might do inference on \(R\).
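
We can verify this fact by simulation (a quick illustrative sketch, not part of the lab’s required code): sample points uniformly from a disk by rejection from a bounding square, then inspect the distribution of their squared distances from the center.

# Simulation check: for points uniform on a disk of radius R_disk,
# the squared distance from the center should be Unif(0, R_disk^2)
set.seed(1)
R_disk <- 2
pts <- matrix(runif(2*5000, -R_disk, R_disk), ncol = 2)  # uniform on the bounding square
pts <- pts[rowSums(pts^2) <= R_disk^2, ]                 # keep only points inside the disk
hist(rowSums(pts^2), breaks = 30)                        # looks flat on (0, R_disk^2)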

0. Visualize the data generating process

It is often helpful to have a picture of what’s going on when we describe a generative model. Here we can simulate and plot our data, overlaying the true radius \(R\) on the sampled points \(r_1, \dots, r_n\):

# Simulation settings: sample size, true squared radius, true noise SD
n <- 1000
true_Rsquared <- 5
true_sigma <- 1.25
# Generate noiseless squared distances u, noisy measurements r, and angles theta
u <- runif(n, 0, true_Rsquared)
r <- u + rnorm(n, sd = true_sigma)
theta <- runif(n, 0, 2*pi)
# Plot the noisy points (black circles) with the true radius overlaid (red)
ggplot2::ggplot() +
  geom_point(data = data.frame(x = sign(r)*sqrt(abs(r))*cos(theta), y = sign(r)*sqrt(abs(r))*sin(theta)),
             aes(x = x, y = y), shape = 1) +
  geom_path(data = data.frame(R = true_Rsquared) %>%
                              plyr::ddply(.(R), function(d){
                                data.frame(x = sqrt(d$R)*cos(seq(0, 2*pi, length.out = 100)),
                                           y = sqrt(d$R)*sin(seq(0, 2*pi, length.out = 100)))
                              }),
                       aes(x = x, y = y), alpha = 1, colour = "red") +
  coord_fixed()

1. Write down the rest of the generative model

Let \(r_i\) denote the noisy squared radial distance of observed data point \(i\). Given the noiseless squared radial positions \(u_i\) of the points, we assume that the observations \(r_i\) are normally distributed, each with mean \(u_i\) and variance \(\sigma^2\). A model for the data generating process might look something like this: \[ \begin{aligned} r_i | u_i, \sigma^2 &\sim N(u_i, \sigma^2) \\ u_i | R^2 &\overset{iid}\sim U(0, R^2), \\ \end{aligned} \] with priors: \[ \begin{aligned} R^2 | m,k &\sim Pa(m, k) \\ 1 / \sigma^2 | \alpha, \beta &\sim Ga(\alpha, \beta) \end{aligned} \]

Here we have used the conjugate prior, the Pareto distribution, for the uniform likelihood. We have also used the conjugate Gamma prior for the precision parameter.

2. Write down the data likelihood

Write \(r = (r_1, \ldots, r_n)\) and \(u = (u_1, \ldots, u_n)\). Conditional on \(u\) and \(\sigma^2\), the generative model implies that the likelihood takes the form

\[ \begin{aligned} L(r ; u, \sigma^2) &= \left(\frac{1}{2 \pi \sigma^2}\right)^{n/2} \textrm{exp} \left\{-\frac{1}{2 \sigma^2}\sum_{i=1}^n (r_i - u_i)^2\right\} \end{aligned} \]

3. Write down (and factorize) the joint distribution for all parameters

Which parameters are independent of one another? How does this inform the way we can decompose the joint distribution of the parameters and the data? What does this decomposition imply for a Gibbs sampling algorithm that produces draws from the posterior distribution of \(R^2\)?

\[ \begin{aligned} p(r, u, 1/\sigma^2, R^2 | m, k, \alpha, \beta) &= p(r | u, 1/\sigma^2) \cdot p(u | R^2) \cdot p(R^2 | m, k) \cdot p(1/\sigma^2 | \alpha, \beta) \end{aligned} \]

4. Write down (if possible) full conditional distributions

In this case, we can explicitly write out the full conditional distributions for the model parameters \(u\), \(R^2\), and \(1/\sigma^2\). What are they? Make sure to keep track of indicator functions that may truncate the support of a density function.

\[ u_i | r, \sigma^2, R^2 \sim \text{Truncated Normal}\left(r_i, \sigma^2, 0, R^2 \right) \]

\[ 1 / \sigma^2 | r, u, R^2 \sim Ga\left(n/2 + \alpha, \frac{1}{2}\sum_{i=1}^n (r_i - u_i)^2 + \beta \right) \]

\[ R^2 | u, r, \sigma^2 \sim Pa\left(\max(u_1, \dots, u_n, m), k + n \right) \]

If you are seeing the truncated normal distribution for the first time, take a few minutes to read https://en.wikipedia.org/wiki/Truncated_normal_distribution.
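
For intuition, here is a small illustration (not required for the lab) of drawing from a normal density truncated to an interval using truncnorm::rtruncnorm, the same function we will use in the Gibbs sampler below:

# Draws from a N(2, 1^2) density truncated to the interval [0, 3]
samps <- truncnorm::rtruncnorm(10000, a = 0, b = 3, mean = 2, sd = 1)
range(samps)               # all draws fall inside [0, 3]
hist(samps, breaks = 50)   # the density is renormalized over the truncated support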

Sampling and results

Now we are ready to write a Gibbs sampler that uses truncated full conditional distributions to make inferences about the squared radius parameter \(R^2\) (and hence the radius \(R\)).

# hyper-parameters
m <- 3
k <- 1
alpha <- 5/2
beta <- 5/2

# Draw one sample from a Pareto(m, k) distribution via inverse-CDF sampling;
# if trunc is supplied, reject draws above trunc (an upper-truncated Pareto)
rpareto <- function(m, k, trunc = NULL){
  p <- m*(1 - runif(1))^(-1/k)
  if(!is.null(trunc)){
    while(p > trunc){
      p <- m*(1 - runif(1))^(-1/k)
    }
  }
  return(p)
}
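
As an optional sanity check of this inverse-CDF sampler, we can compare the average of many draws to the theoretical Pareto mean \(km/(k-1)\), which exists whenever \(k > 1\); the parameter values below are purely illustrative.

# Sanity check: a Pareto(m = 2, k = 3) distribution has mean k*m/(k - 1) = 3
set.seed(123)
draws <- replicate(10000, rpareto(m = 2, k = 3))
mean(draws)  # should be close to 3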

# Gibbs sampler for the uniform-Pareto model with truncated full conditionals
uni_pareto_gibbs <- function(S, r, m, k, alpha, beta, burn_in = min(1000, S / 2), thin = 5){
  # r holds the observed (noisy) squared radial distances
  Rsq <- r
  n <- length(Rsq)
  R <- rep(1, S)
  U <- matrix(0, nrow = S, ncol = n)
  U[1, ] <- runif(n, 0, R[1])
  sigma <- rep(1, S)
  # current state of the chain
  U_curr <- U[1, ]
  R_curr <- R[1]
  sigma_curr <- sigma[1]
  for(s in 1:S){
    # Sample from full conditional of the squared radius R^2
    R_curr <- rpareto(max(c(U_curr, m)), k + n)
    R[s] <- R_curr
    # Sample from full conditional of the latent U values
    U_curr <- truncnorm::rtruncnorm(n, a = 0, b = R_curr, mean = Rsq, sd = sigma_curr)
    U[s, ] <- U_curr
    # Sample from full conditional of sigma
    sigma_curr <-  #complete this line
    sigma[s] <- sigma_curr
  }
  # Discard burn-in and thin the retained draws
  return(list(R = R[seq(burn_in, S, by = thin)], 
              U = U[seq(burn_in, S, by = thin), ], 
              sigma = sigma[seq(burn_in, S, by = thin)]))
}
# Run the sampler (the sigma step must be completed first)
gibbs_samps <- uni_pareto_gibbs(S = 100000, r, m, k, alpha, beta, burn_in = 2000)

  1. Complete the Gibbs sampling step for \(\sigma\). Use the full conditional from “part 4” above. Make sure to save the standard deviation \(\sigma\), not the variance \(\sigma^2\).

Let’s visualize our samples along with the data and the true radius \(R\):

# Overlay posterior draws of the radius (blue) and the true radius (red) on the data
ggplot2::ggplot() +
  geom_point(data = data.frame(x = sign(r)*sqrt(abs(r))*cos(theta), y = sign(r)*sqrt(abs(r))*sin(theta)),
             aes(x = x, y = y), shape = 1) +
  geom_path(data = data.frame(R = gibbs_samps$R) %>%
                              plyr::ddply(.(R), function(d){
                                data.frame(x = sqrt(d$R)*cos(seq(0, 2*pi, length.out = 100)),
                                           y = sqrt(d$R)*sin(seq(0, 2*pi, length.out = 100)))
                              }),
                       aes(x = x, y = y), alpha = 0.005, colour = "blue") +
  geom_path(data = data.frame(R = true_Rsquared) %>%
                              plyr::ddply(.(R), function(d){
                                data.frame(x = sqrt(d$R)*cos(seq(0, 2*pi, length.out = 100)),
                                           y = sqrt(d$R)*sin(seq(0, 2*pi, length.out = 100)))
                              }),
                       aes(x = x, y = y), alpha = 1, colour = "red") +
  coord_fixed()


  2. Make a plot to visualize the marginal posterior density of \(R^2\). How does this density compare to the true value of \(R^2\)?
  3. Make a plot to visualize the marginal posterior density of \(\sigma^2\). How does this density compare to the true value of \(\sigma^2\)? Note that the posterior samples are stored as \(\sigma\).
  4. Make a contour plot (you can use the cowplot package if you want) to visualize the bivariate posterior density of \(R^2\) and \(\sigma^2\). Indicate the truth on the plot. Comment on this bivariate plot.

Grading

10 points: 4 points for question 1; 2 points each for questions 2, 3, and 4.

Acknowledgement

This lab was created by Jordan Bryan and Becky Tang.