class: center, middle, inverse, title-slide

# STA 360/602L: Module 6.2

## Bayesian linear regression (illustration)

### Dr. Olanrewaju Michael Akande

---

## Bayesian linear regression recap

- Sampling model:
.block[
`$$\boldsymbol{Y} \sim \mathcal{N}_n(\boldsymbol{X}\boldsymbol{\beta}, \sigma^2\boldsymbol{I}_{n\times n}).$$`
]

--

- Semi-conjugate prior for `\(\boldsymbol{\beta}\)`:
.block[
`$$\pi(\boldsymbol{\beta}) = \mathcal{N}_p(\boldsymbol{\mu}_0, \Sigma_0).$$`
]

--

- Semi-conjugate prior for `\(\sigma^2\)`:
.block[
`$$\pi(\sigma^2) = \mathcal{IG} \left(\dfrac{\nu_0}{2}, \dfrac{\nu_0\sigma_0^2}{2}\right).$$`
]

---

## Full conditionals

.block[
$$
`\begin{split}
\pi(\boldsymbol{\beta} | \boldsymbol{y}, \boldsymbol{X}, \sigma^2) & = \mathcal{N}_p(\boldsymbol{\mu}_n, \Sigma_n),
\end{split}`
$$
]

--

where
.block[
$$
`\begin{split}
\Sigma_n & = \left[\Sigma_0^{-1} + \frac{1}{\sigma^2} \boldsymbol{X}^T \boldsymbol{X} \right]^{-1}\\
\boldsymbol{\mu}_n & = \Sigma_n \left[\Sigma_0^{-1}\boldsymbol{\mu}_0 + \frac{1}{\sigma^2} \boldsymbol{X}^T \boldsymbol{y} \right],
\end{split}`
$$
]

--

and
.block[
$$
`\begin{split}
\pi(\sigma^2 | \boldsymbol{y}, \boldsymbol{X}, \boldsymbol{\beta}) & = \mathcal{IG} \left(\dfrac{\nu_n}{2}, \dfrac{\nu_n\sigma_n^2}{2}\right),
\end{split}`
$$
]

--

where
.block[
$$
`\begin{split}
\nu_n & = \nu_0 + n\\
\sigma_n^2 & = \dfrac{1}{\nu_n} \left[ \nu_0 \sigma_0^2 + (\boldsymbol{y} - \boldsymbol{X}\boldsymbol{\beta})^T (\boldsymbol{y} - \boldsymbol{X}\boldsymbol{\beta}) \right] = \dfrac{1}{\nu_n} \left[ \nu_0 \sigma_0^2 + \text{SSR}(\boldsymbol{\beta}) \right].
\end{split}`
$$
]

---

## Swimming data

- Back to the swimming example. The data come from Exercise 9.1 in Hoff.

--

- The data set we consider contains times (in seconds) of four high school swimmers swimming 50 yards.

```r
Y <- read.table("http://www2.stat.duke.edu/~pdh10/FCBS/Exercises/swim.dat")
Y
```

```
##     V1   V2   V3   V4   V5   V6
## 1 23.1 23.2 22.9 22.9 22.8 22.7
## 2 23.2 23.1 23.4 23.5 23.5 23.4
## 3 22.7 22.6 22.8 22.8 22.9 22.8
## 4 23.7 23.6 23.7 23.5 23.5 23.4
```

--

- There are six times for each swimmer, taken every two weeks; that is, each swimmer has six measurements at `\(t = 2, 4, 6, 8, 10, 12\)` weeks.

--

- Each row corresponds to a swimmer, and a higher column index indicates a later date.

---

## Swimming data

- Given how little data we have on each swimmer, we could explore hierarchical models; that would let us borrow information across swimmers.

--

- For now, however, we will fit a separate linear regression model for each swimmer, with swimming time as the response and week as the explanatory variable (which we will mean-center).

--

- For setting priors, we have one piece of information: times for this age group tend to be between 22 and 24 seconds.

--

- Based on that, we can set relatively uninformative parameters for the prior on `\(\sigma^2\)`. For the prior on `\(\boldsymbol{\beta}\)`, we can set
.block[
.small[
`\begin{eqnarray*}
\pi(\boldsymbol{\beta}) = \mathcal{N}_2\left(\boldsymbol{\mu}_0 = \left(\begin{array}{c}
23\\
0
\end{array}\right),\Sigma_0 = \left(\begin{array}{cc}
5 & 0 \\
0 & 2
\end{array}\right)\right).
\end{eqnarray*}`
]
]

--

- This centers the intercept at 23 (the middle of the given range) and the slope at 0 (so, a priori, we assume no change in times over the weeks), but we choose the variances to be a bit large to err on the side of being less informative.
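---

## Checking the prior

- Before fitting the model, we can eyeball what this prior implies about swim times by simulating a few draws of `\(\boldsymbol{\beta}\)` and computing the implied mean times at each week. Below is a minimal sketch; the seed, the number of draws, and the name `X_prior` are illustrative choices, with `X_prior` mirroring the mean-centered design matrix we build on the next slide.

```r
library(mvtnorm) #for rmvnorm

set.seed(360)
W <- seq(2,12,by=2)                          #weeks at which times are recorded
X_prior <- cbind(rep(1,length(W)),W-mean(W)) #mean-centered design matrix
#draw 5 values of (beta_0, beta_1) from the prior
beta_prior <- rmvnorm(5,mean=c(23,0),sigma=matrix(c(5,0,0,2),nrow=2,ncol=2))
round(X_prior%*%t(beta_prior),2)             #implied mean times; one column per draw
```

- The draws should be centered around 23 seconds, consistent with the stated 22 to 24 second range, though with enough spread that the prior will not dominate the data.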
---

## Posterior computation

```r
#Create X matrix, transpose Y for easy computation
Y <- t(Y)
n_swimmers <- ncol(Y)
n <- nrow(Y)
W <- seq(2,12,length.out=n)
X <- cbind(rep(1,n),(W-mean(W)))
p <- ncol(X)

#Hyperparameters for the priors
mu_0 <- matrix(c(23,0),ncol=1)
Sigma_0 <- matrix(c(5,0,0,2),nrow=2,ncol=2)
nu_0 <- 1
sigma_0_sq <- 1/10

#Initial values for Gibbs sampler
#No need to set an initial value for sigma^2; we can simply sample it first
beta <- matrix(c(23,0),nrow=p,ncol=n_swimmers)
sigma_sq <- rep(1,n_swimmers)

#Set number of iterations and burn-in
n_iter <- 10000; burn_in <- 0.3*n_iter

#Set null matrices to save samples
BETA <- array(0,c(n_swimmers,n_iter,p))
SIGMA_SQ <- matrix(0,n_swimmers,n_iter)
```

---

## Posterior computation

```r
#Now, to the Gibbs sampler
library(mvtnorm) #for rmvnorm, to sample beta from its multivariate normal full conditional

#set seed for reproducibility
set.seed(1234)

for(s in 1:(n_iter+burn_in)){
  for(j in 1:n_swimmers){
    
    #update sigma_sq
    nu_n <- nu_0 + n
    SSR <- t(Y[,j] - X%*%beta[,j])%*%(Y[,j] - X%*%beta[,j])
    nu_n_sigma_n_sq <- nu_0*sigma_0_sq + SSR
    sigma_sq[j] <- 1/rgamma(1,(nu_n/2),(nu_n_sigma_n_sq/2))
    
    #update beta
    Sigma_n <- solve(solve(Sigma_0) + (t(X)%*%X)/sigma_sq[j])
    mu_n <- Sigma_n %*% (solve(Sigma_0)%*%mu_0 + (t(X)%*%Y[,j])/sigma_sq[j])
    beta[,j] <- rmvnorm(1,mu_n,Sigma_n)
    
    #save results only past burn-in
    if(s > burn_in){
      BETA[j,(s-burn_in),] <- beta[,j]
      SIGMA_SQ[j,(s-burn_in)] <- sigma_sq[j]
    }
  }
}
```

---

## Results

- Before looking at the posterior samples, what are the OLS estimates of all the parameters?

```r
beta_ols <- matrix(0,nrow=p,ncol=n_swimmers)
for(j in 1:n_swimmers){
  beta_ols[,j] <- solve(t(X)%*%X)%*%t(X)%*%Y[,j]
}
colnames(beta_ols) <- c("Swimmer 1","Swimmer 2","Swimmer 3","Swimmer 4")
rownames(beta_ols) <- c("beta_0","beta_1")
beta_ols
```

```
##          Swimmer 1   Swimmer 2 Swimmer 3   Swimmer 4
## beta_0 22.93333333 23.35000000  22.76667 23.56666667
## beta_1 -0.04571429  0.03285714   0.02000 -0.02857143
```

--

<br/>

- <div class="question">
Can you interpret the parameters?
</div>

<br/>

--

- Any thoughts on who the coach should recommend based on this alone? Is this how we should be answering the question?

---

## Posterior inference

- Posterior means are almost identical to the OLS estimates.

```r
beta_postmean <- t(apply(BETA,c(1,3),mean))
colnames(beta_postmean) <- c("Swimmer 1","Swimmer 2","Swimmer 3","Swimmer 4")
rownames(beta_postmean) <- c("beta_0","beta_1")
beta_postmean
```

```
##         Swimmer 1   Swimmer 2   Swimmer 3   Swimmer 4
## beta_0 22.9339174 23.34963191 22.76617785 23.56614309
## beta_1 -0.0453998  0.03251415  0.01991469 -0.02854268
```

--

- How about credible intervals?

```r
beta_postCI <- apply(BETA,c(1,3),function(x) quantile(x,probs=c(0.025,0.975)))
colnames(beta_postCI) <- c("Swimmer 1","Swimmer 2","Swimmer 3","Swimmer 4")
beta_postCI[,,1]; beta_postCI[,,2]
```

```
##       Swimmer 1 Swimmer 2 Swimmer 3 Swimmer 4
## 2.5%   22.76901  23.15949  22.60097  23.40619
## 97.5%  23.09937  23.53718  22.93082  23.73382
```

```
##          Swimmer 1   Swimmer 2   Swimmer 3   Swimmer 4
## 2.5%  -0.093131856 -0.02128792 -0.02960257 -0.07704344
## 97.5%  0.002288246  0.08956464  0.06789081  0.01940960
```
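---

## Convergence diagnostics

- Before reading too much into these summaries, we should check that the chains have converged and mix well. A minimal sketch (the coda package used here is one convenient option, not the only one), looking at the slopes stored in `BETA[,,2]`:

```r
library(coda) #for mcmc and effectiveSize

#trace plot of the slope for Swimmer 1
plot(BETA[1,,2],type="l",xlab="iteration",ylab="beta_1 for Swimmer 1")

#effective sample sizes for all four slopes
apply(BETA[,,2],1,function(x) effectiveSize(mcmc(x)))
```

- With semi-conjugate full conditionals like these, the Gibbs sampler usually mixes well, so we would expect effective sample sizes close to the number of saved samples.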
---

## Posterior inference

- <div class="question">
Is there any evidence that the swim times are changing over the weeks?
</div>

```r
beta_pr_great_0 <- t(apply(BETA,c(1,3),function(x) mean(x > 0)))
colnames(beta_pr_great_0) <- c("Swimmer 1","Swimmer 2","Swimmer 3","Swimmer 4")
rownames(beta_pr_great_0) <- c("beta_0","beta_1")
beta_pr_great_0
```

```
##        Swimmer 1 Swimmer 2 Swimmer 3 Swimmer 4
## beta_0    1.0000    1.0000    1.0000    1.0000
## beta_1    0.0287    0.9044    0.8335    0.0957
```

```r
#or alternatively,
beta_pr_less_0 <- t(apply(BETA,c(1,3),function(x) mean(x < 0)))
colnames(beta_pr_less_0) <- c("Swimmer 1","Swimmer 2","Swimmer 3","Swimmer 4")
rownames(beta_pr_less_0) <- c("beta_0","beta_1")
beta_pr_less_0
```

```
##        Swimmer 1 Swimmer 2 Swimmer 3 Swimmer 4
## beta_0    0.0000    0.0000    0.0000    0.0000
## beta_1    0.9713    0.0956    0.1665    0.9043
```

---

## Posterior predictive inference

- How about the posterior predictive distributions for a future time two weeks after the last recorded observation (that is, at week 14)?

```r
x_new <- matrix(c(1,(14-mean(W))),ncol=1)
post_pred <- matrix(0,nrow=n_iter,ncol=n_swimmers)
for(j in 1:n_swimmers){
  post_pred[,j] <- rnorm(n_iter,BETA[j,,]%*%x_new,sqrt(SIGMA_SQ[j,]))
}
colnames(post_pred) <- c("Swimmer 1","Swimmer 2","Swimmer 3","Swimmer 4")

plot(density(post_pred[,"Swimmer 1"]),col="red3",xlim=c(21.5,25),ylim=c(0,3.5),lwd=1.5,
     main="Predictive Distributions",xlab="swimming times")
legend("topleft",2,c("Swimmer1","Swimmer2","Swimmer3","Swimmer4"),
       col=c("red3","blue3","orange2","black"),lwd=2,bty="n")
lines(density(post_pred[,"Swimmer 2"]),col="blue3",lwd=1.5)
lines(density(post_pred[,"Swimmer 3"]),col="orange2",lwd=1.5)
lines(density(post_pred[,"Swimmer 4"]),lwd=1.5)
```

---

## Posterior predictive inference

<img src="6-2-linear-regression-II_files/figure-html/unnamed-chunk-10-1.png" style="display: block; margin: auto;" />

---

## Posterior predictive inference

- How else can we answer the question of who the coach should recommend for the swim meet in two weeks' time? There are a few options.

--

- Let `\(Y_j^\star\)` be the predicted swimming time for swimmer `\(j\)`. We can do the following: using draws from the predictive distributions, compute the posterior predictive probability `\(P(Y_j^\star = \text{min}(Y_1^\star,Y_2^\star,Y_3^\star,Y_4^\star))\)` for each swimmer `\(j\)`, and base the recommendation to the coach on that.

--

- That is,

```r
post_pred_min <- as.data.frame(apply(post_pred,1,function(x) which(x==min(x))))
colnames(post_pred_min) <- "Swimmers"
post_pred_min$Swimmers <- as.factor(post_pred_min$Swimmers)
levels(post_pred_min$Swimmers) <- c("Swimmer 1","Swimmer 2","Swimmer 3","Swimmer 4")
table(post_pred_min$Swimmers)/n_iter
```

```
## 
## Swimmer 1 Swimmer 2 Swimmer 3 Swimmer 4 
##    0.7790    0.0078    0.1994    0.0138
```

- <div class="question">
Which swimmer would you recommend?
</div>

---

class: center, middle

# What's next?

### Move on to the readings for the next module!