class: center, middle, inverse, title-slide

# STA 360/602L: Module 3.5

## The normal model: joint inference for mean and variance

### Dr. Olanrewaju Michael Akande

---

## Joint inference for mean and variance

- We have derived the posterior for `\(\mu\)`, conditional on `\(\sigma\)`/`\(\tau\)` being known. What happens when `\(\sigma\)`/`\(\tau\)` is unknown? We need a joint prior `\(\pi(\mu,\sigma^2)\)` for `\(\mu\)` and `\(\sigma^2\)`.

--

- Write the joint prior distribution for the mean and variance as the product of a conditional and a marginal distribution. That is, .block[ .small[ `$$\pi(\mu,\sigma^2) \ = \ \pi(\mu | \sigma^2)\pi(\sigma^2).$$` ] ]

--

- From the previous module, we have seen that we can set the conditional prior `\(\pi(\mu | \sigma^2)\)` to be a normal distribution.

--

- For `\(\pi(\sigma^2)\)`, we need a distribution with support on `\((0,\infty)\)`. One such family is the gamma family, but this is NOT conjugate for the variance of a normal distribution.

--

- The gamma distribution is, however, conjugate for the precision `\(\tau\)`, and in that case, we say that `\(\sigma^2\)` has an .hlight[inverse-gamma] distribution.

---

## Joint inference for mean and variance

- Recall that conjugacy means that for a prior `\(\pi(\theta)\)` in a class of distributions `\(\mathcal{P}\)`, `\(\pi(\theta | Y)\)` is also in the class `\(\mathcal{P}\)`.

--

- However, when we have multiple parameters, the dependence structure in the prior must also be preserved in the posterior for conjugacy to hold.

--

- So, if .block[ .small[ `$$\pi(\mu,\sigma^2) \ = \ \pi(\mu | \sigma^2)\pi(\sigma^2),$$` ] ] with `\(\pi(\mu | \sigma^2)\)` a normal distribution and `\(\pi(\sigma^2)\)` an inverse-gamma distribution, we will have conjugacy if `\(\pi(\mu,\sigma^2 | Y)\)` can also be written as .block[ .small[ `$$\pi(\mu,\sigma^2 | Y) \ = \ \pi(\mu | \sigma^2, Y)\pi(\sigma^2|Y),$$` ] ] where `\(\pi(\mu | \sigma^2, Y)\)` is also a normal distribution and `\(\pi(\sigma^2|Y)\)` is an inverse-gamma distribution, just like the prior.

---

## Inverse-gamma distribution

- As before, we will continue to work mostly in terms of the precision `\(\tau\)`.

--

- That is, we will deal with the already familiar gamma distribution, instead of the inverse-gamma distribution.

--

- However, as a quick review, if `\(\theta \sim \mathcal{IG}(a,b)\)`, then the pdf is .block[ `$$p(\theta) = \frac{b^a}{\Gamma(a)} \theta^{-(a+1)} e^{-\frac{b}{\theta} }\ \ \ \textrm{for} \ \ \ a, b > 0,$$` ] where
  + `\(\mathbb{E}[\theta] = \frac{b}{a - 1} \ \ \textrm{for} \ \ a > 1\)`;
  + `\(\mathbb{V}[\theta] = \frac{b^2}{(a - 1)^2(a-2)} \ \ \textrm{for} \ \ a > 2\)`;
  + `\(\textrm{Mode}[\theta] = \frac{b}{a+1}\)`.

---

## Conjugate prior

- Once again, suppose `\(Y = (y_1,y_2,\ldots,y_n)\)`, where each .block[ `$$y_i \sim \mathcal{N}(\mu, \tau^{-1}).$$` ]

--

- A conjugate joint prior is given by .block[ $$ `\begin{split} \tau = \frac{1}{\sigma^2} \ & \sim \textrm{Gamma}\left(\frac{\nu_0}{2}, \frac{\nu_0\sigma_0^2}{2}\right)\\ \mu|\tau & \sim \mathcal{N}\left(\mu_0, \frac{1}{\kappa_0 \tau}\right).\\ \end{split}` $$ ]

--

- This is often called a .hlight[normal-gamma] prior distribution.

--

- `\(\sigma_0^2\)` is the prior guess for `\(\sigma^2\)`, while `\(\nu_0\)` is often referred to as the "prior degrees of freedom", reflecting our degree of confidence in `\(\sigma_0^2\)`.

--

- We do not have conjugacy if we replace `\(\frac{1}{\kappa_0 \tau}\)` in the normal prior with an arbitrary prior variance independent of `\(\tau/\sigma^2\)`. To do inference in that scenario, we need .hlight[Gibbs sampling] (to come soon!).
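---

## Aside: simulating from the normal-gamma prior

- As a quick aside, one way to get a feel for this prior (and for the gamma/inverse-gamma relationship) is to simulate from it: draw `\(\tau\)` from its gamma distribution, then draw `\(\mu\)` given `\(\tau\)`. Below is a minimal R sketch; the hyperparameter values are made up purely for illustration.

```r
# Sketch: draw (mu, sigma^2) from the normal-gamma prior
#   tau ~ Gamma(nu0/2, nu0*sigma0^2/2),   mu | tau ~ N(mu0, 1/(kappa0*tau)).
nu0 <- 10; sigma02 <- 1; mu0 <- 0; kappa0 <- 1   # illustrative values only

S      <- 100000
tau    <- rgamma(S, shape = nu0 / 2, rate = nu0 * sigma02 / 2)
sigma2 <- 1 / tau                                # sigma^2 is then inverse-gamma
mu     <- rnorm(S, mean = mu0, sd = sqrt(1 / (kappa0 * tau)))

mean(sigma2)  # compare to b/(a - 1) = (nu0*sigma02/2)/(nu0/2 - 1) = 1.25 here
```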
---

## Conjugate prior

- So, we have .block[ .small[ `$$\pi(\mu|\tau) = \mathcal{N}\left(\mu_0, \frac{1}{\kappa_0 \tau}\right)\propto \ \tau^{\frac{1}{2}} \cdot \ \textrm{exp}\left\{-\frac{1}{2} \kappa_0 \tau (\mu - \mu_0)^2 \right\},$$` ] ]

--

- and .block[ .small[ `$$\pi(\tau) = \textrm{Ga}\left(\frac{\nu_0}{2}, \frac{\nu_0\sigma_0^2}{2}\right) \propto \tau^{\frac{\nu_0}{2}-1} \textrm{exp}\left\{-\frac{\tau\nu_0\sigma_0^2}{2}\right\}.$$` ] ]

- Thus, the kernel of the normal-gamma prior distribution is .block[ .small[ $$ `\begin{split} \Rightarrow \pi(\mu,\tau) \ & \boldsymbol{ =} \ \pi(\mu|\tau) \cdot \pi(\tau) = \mathcal{N}\left(\mu_0, \frac{1}{\kappa_0 \tau} \right) \cdot \textrm{Gamma}\left(\frac{\nu_0}{2}, \frac{\nu_0\sigma_0^2}{2}\right)\\ & \boldsymbol{ \propto} \underbrace{\ \tau^{\frac{1}{2}}\textrm{exp}\left\{-\frac{1}{2} \kappa_0 \tau (\mu - \mu_0)^2 \right\}}_{\propto \ \pi(\mu|\tau)} \cdot \underbrace{\tau^{\frac{\nu_0}{2}-1} \textrm{exp}\left\{-\frac{\tau\nu_0\sigma_0^2}{2}\right\}}_{\propto \ \pi(\tau)}.\\ \end{split}` $$ ] ]

--

- Take note of this form. When we derive the posterior kernel, we will try to match it to this form to recognize the parameters.

---

## Posterior for the mean given variance, under normal-gamma prior

- Based on the normal-gamma prior, we need `\(\pi(\mu|Y,\tau)\)` and `\(\pi(\tau|Y)\)`.

--

- For `\(\pi(\mu|Y,\tau)\)`, we already know from the previous module that it will be a normal distribution.

--

- However, some algebra is required to get `\(\pi(\tau|Y)\)`.

--

- In fact, we need to write out the full joint posterior and work from there, because we will need to keep some of the terms we discarded in the derivation in the last module.

--

- First, recall that the likelihood is .block[ .large[ `$$P(Y| \mu,\tau) \propto \tau^{\frac{n}{2}} \ \textrm{exp}\left\{-\frac{1}{2} \tau s^2(n-1) \right\} \ \textrm{exp}\left\{-\frac{1}{2} \tau n(\mu - \bar{y})^2 \right\}.$$` ] ]
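---

## Aside: checking the likelihood decomposition

- The `\(s^2(n-1)\)` and `\(n(\mu - \bar{y})^2\)` pieces of the likelihood come from the identity `\(\sum_{i=1}^n (y_i - \mu)^2 = (n-1)s^2 + n(\mu - \bar{y})^2\)`. A quick numerical check in R (with made-up data and a made-up value of `\(\mu\)`) can be reassuring:

```r
# Verify sum((y - mu)^2) = (n - 1) * s^2 + n * (mu - ybar)^2 numerically.
set.seed(360)
y  <- rnorm(20, mean = 5, sd = 2)   # made-up data; any data set works
mu <- 4.3                           # any value of mu works
n  <- length(y); ybar <- mean(y); s2 <- var(y)  # var() uses the (n - 1) denominator

sum((y - mu)^2)
(n - 1) * s2 + n * (mu - ybar)^2    # same number, up to rounding
```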
---

## Posterior derivation

.block[ .small[ $$ `\begin{split} \text{Then, } \ \ \ \ \pi(\mu,\tau | Y) \ & \boldsymbol{ \propto} \ \pi(\mu|\tau) \times \pi(\tau) \times P(Y| \mu,\tau)\\ \\ & \boldsymbol{ \propto} \underbrace{\ \tau^{\frac{1}{2}}\textrm{exp}\left\{-\frac{1}{2} \kappa_0 \tau (\mu - \mu_0)^2 \right\}}_{\propto \ \pi(\mu|\tau)} \times \underbrace{\tau^{\frac{\nu_0}{2}-1} \textrm{exp}\left\{-\frac{\tau\nu_0\sigma_0^2}{2}\right\}}_{\propto \ \pi(\tau)}\\ & \ \ \ \ \ \ \ \ \ \ \ \ \times \underbrace{\tau^{\frac{n}{2}} \ \textrm{exp}\left\{-\frac{1}{2} \tau s^2(n-1) \right\} \ \textrm{exp}\left\{-\frac{1}{2} \tau n(\mu - \bar{y})^2 \right\}}_{\propto \ P(Y| \mu,\tau)}\\ \\ & \boldsymbol{=} \underbrace{\textrm{exp}\left\{-\frac{1}{2} \kappa_0 \tau (\mu - \mu_0)^2 \right\} \ \textrm{exp}\left\{-\frac{1}{2} \tau n(\mu - \bar{y})^2 \right\}}_{\textrm{Terms involving } \mu}\\ & \ \ \ \ \ \ \ \ \ \ \ \ \times \underbrace{\ \tau^{\frac{1}{2}}\tau^{\frac{\nu_0}{2}-1} \textrm{exp}\left\{-\frac{\tau\nu_0\sigma_0^2}{2}\right\} \tau^{\frac{n}{2}} \ \textrm{exp}\left\{-\frac{1}{2} \tau s^2(n-1) \right\}}_{\text{Terms involving } \tau \text{ but NOT } \mu}\\ \end{split}` $$ ] ]

---

## Posterior derivation

.block[ .small[ $$ `\begin{split} \pi(\mu,\tau | Y) \ & \boldsymbol{ \propto} \underbrace{\textrm{exp}\left\{-\frac{1}{2} \kappa_0 \tau (\mu^2 - 2\mu\mu_0 + \mu_0^2) \right\}\ \textrm{exp}\left\{-\frac{1}{2} \tau n(\mu^2 - 2\mu\bar{y} + \bar{y}^2) \right\}}_{\textrm{Terms involving } \mu}\\ & \ \ \ \ \ \ \ \ \ \ \ \ \times \underbrace{\ \tau^{\frac{1}{2}}\tau^{\frac{\nu_0 + n}{2}-1} \textrm{exp}\left\{-\frac{\tau \left[\nu_0\sigma_0^2 + s^2(n-1) \right] }{2}\right\}}_{\text{Terms involving } \tau \text{ but NOT } \mu}\\ & \boldsymbol{=} \ \ \underbrace{\textrm{exp}\left\{-\frac{1}{2} \left[\kappa_0 \tau (\mu^2 - 2\mu\mu_0) + \tau n(\mu^2 - 2\mu\bar{y}) \right] \right\}}_{\textrm{Terms involving } \mu}\\ & \ \ \ \ \ \ \ \ \ \ \ \ \times \underbrace{\ \tau^{\frac{1}{2}}\textrm{exp}\left\{-\frac{1}{2} \left[ \kappa_0 \tau \mu_0^2 + \tau n \bar{y}^2\right] \right\} \ \cdot \tau^{\frac{\nu_0 + n}{2}-1} \textrm{exp}\left\{-\frac{\tau \left[\nu_0\sigma_0^2 + s^2(n-1) \right] }{2}\right\}}_{\text{Terms involving } \tau \text{ but NOT } \mu}\\ & \boldsymbol{=} \ \ \underbrace{\textrm{exp}\left\{-\frac{1}{2} \left[ \mu^2(n\tau + \kappa_0 \tau) - 2\mu(n\tau \bar{y} + \kappa_0 \tau\mu_0) \right] \right\}}_{\textrm{Terms involving } \mu}\\ & \ \ \ \ \ \ \ \ \ \ \ \ \times \underbrace{\ \tau^{\frac{1}{2}}\textrm{exp}\left\{-\frac{1}{2} \left[ \kappa_0 \tau \mu_0^2 + \tau n \bar{y}^2\right] \right\} \ \cdot \tau^{\frac{\nu_0 + n}{2}-1} \textrm{exp}\left\{-\frac{\tau \left[\nu_0\sigma_0^2 + s^2(n-1) \right] }{2}\right\}}_{\text{Terms involving } \tau \text{ but NOT } \mu}\\ \end{split}` $$ ] ]

---

## Posterior derivation

- To match the terms involving `\(\mu\)` to the normal kernel in the prior, we need to complete the square so that we have something that looks like the `\((\mu - \mu_0)^2\)` term in our prior.

--

- Recall how to complete the square. Specifically, we can write .block[ .small[ `$$a\mu^2 + b\mu$$` ] ] as .block[ .small[ `$$a(\mu + d)^2 + e,$$` ] ] where
  + `\(d = \dfrac{b}{2a}\)`, and
  + `\(e = - \dfrac{b^2}{4a}\)`.

---

## Posterior derivation

- First, write out the posterior again: .block[ .small[ $$ `\begin{split} \pi(\mu,\tau | Y) \ & \boldsymbol{=} \ \ \underbrace{\textrm{exp}\left\{-\frac{1}{2} \left[ (n\tau + \kappa_0 \tau)\mu^2 - 2\mu(n\tau \bar{y} + \kappa_0 \tau\mu_0) \right] \right\}}_{\textrm{Terms involving } \mu}\\ & \ \ \ \ \ \ \ \ \ \ \ \ \times \underbrace{\ \tau^{\frac{1}{2}} \textrm{exp}\left\{-\frac{1}{2} \left[ \kappa_0 \tau \mu_0^2 + \tau n \bar{y}^2\right] \right\} \ \cdot \tau^{\frac{\nu_0 + n}{2}-1} \textrm{exp}\left\{-\frac{\tau \left[\nu_0\sigma_0^2 + s^2(n-1) \right] }{2}\right\}}_{\text{Terms involving } \tau \text{ but NOT } \mu}\\ \end{split}` $$ ] ]

- Set `\(a^\star = (n\tau + \kappa_0 \tau)\)` and `\(b^\star = (n\tau \bar{y} + \kappa_0 \tau\mu_0)\)`, then complete the square for the first part.
.block[ .small[ $$ `\begin{split} \Rightarrow \ \pi(\mu,\tau | Y) \ & \boldsymbol{ \propto} \ \ \underbrace{\textrm{exp}\left\{-\frac{1}{2} \left[ a^\star \mu^2 - 2 b^\star \mu \right] \right\}}_{\textrm{Terms involving } \mu}\\ & \ \ \ \ \ \ \ \ \ \ \ \ \times \underbrace{\ \tau^{\frac{1}{2}}\textrm{exp}\left\{-\frac{1}{2} \left[ \kappa_0 \tau \mu_0^2 + \tau n \bar{y}^2\right] \right\} \ \cdot \tau^{\frac{\nu_0 + n}{2}-1} \textrm{exp}\left\{-\frac{\tau \left[\nu_0\sigma_0^2 + s^2(n-1) \right] }{2}\right\}}_{\text{Terms involving } \tau \text{ but NOT } \mu}\\ \end{split}` $$ ] ] --- ## Posterior derivation .block[ .small[ $$ `\begin{split} \Rightarrow \ \pi(\mu,\tau | Y) \ & \boldsymbol{ \propto} \ \ \ \tau^{\frac{1}{2}} \textrm{exp}\left\{-\frac{1}{2} a^\star \left[ \mu- \frac{b^\star}{a^\star} \right]^2 + \frac{(b^\star)^2}{2a^\star} \right\} \ \cdot \ \textrm{exp}\left\{-\frac{1}{2} \left[ \kappa_0 \tau \mu_0^2 + \tau n \bar{y}^2\right] \right\}\\ & \ \ \ \ \ \ \ \ \ \ \ \ \times \tau^{\frac{\nu_0 + n}{2}-1} \textrm{exp}\left\{-\frac{\tau \left[\nu_0\sigma_0^2 + s^2(n-1) \right] }{2}\right\}\\ \\ & \boldsymbol{=} \ \ \ \tau^{\frac{1}{2}} \textrm{exp}\left\{-\frac{1}{2} a^\star \left[ \mu- \frac{b^\star}{a^\star} \right]^2 \right\} \ \underbrace{\textrm{exp}\left\{-\frac{1}{2} \left[ \kappa_0 \tau \mu_0^2 + \tau n \bar{y}^2 - \frac{(b^\star)^2}{a^\star}\right] \right\}}_{\textrm{Next, substitute the values for } a^\star \textrm{ and } b^\star \textrm{ back}}\\\ & \ \ \ \ \ \ \ \ \ \ \ \ \times \tau^{\frac{\nu_0 + n}{2}-1} \textrm{exp}\left\{-\frac{\tau \left[\nu_0\sigma_0^2 + s^2(n-1) \right] }{2}\right\}\\ \\ & \boldsymbol{=} \ \ \ \tau^{\frac{1}{2}} \textrm{exp}\left\{-\frac{1}{2} a^\star \left[ \mu- \frac{b^\star}{a^\star} \right]^2 \right\} \ \textrm{exp} \underbrace{\left\{-\frac{1}{2} \left[ \kappa_0 \tau \mu_0^2 + \tau n \bar{y}^2 - \frac{(n\tau \bar{y} + \kappa_0 \tau\mu_0)^2}{(n\tau + \kappa_0 \tau)}\right] \right\}}_{\textrm{Next, expand terms and recombine}} \\ & \ \ \ \ \ \ \ \ \ \ \ \ \times \tau^{\frac{\nu_0 + n}{2}-1} \textrm{exp}\left\{-\frac{\tau \left[\nu_0\sigma_0^2 + s^2(n-1) \right] }{2}\right\}\\ \end{split}` $$ ] ] --- ## Posterior derivation .block[ .small[ $$ `\begin{split} \Rightarrow \ \pi(\mu,\tau | Y) \ & \boldsymbol{ \propto} \ \ \ \tau^{\frac{1}{2}} \textrm{exp}\left\{-\frac{1}{2} a^\star \left[ \mu- \frac{b^\star}{a^\star} \right]^2 \right\} \ \textrm{exp}\left\{-\frac{1}{2} \left[\frac{n\kappa_0\tau^2 (\mu_0^2- 2\mu_0\bar{y} + \bar{y}^2)}{\tau(\kappa_0 + n)}\right] \right\}\\\ & \ \ \ \ \ \ \ \ \ \ \ \ \times \tau^{\frac{\nu_0 + n}{2}-1} \textrm{exp}\left\{-\frac{\tau \left[\nu_0\sigma_0^2 + s^2(n-1) \right] }{2}\right\}\\ \\ & \boldsymbol{=} \ \ \ \tau^{\frac{1}{2}} \textrm{exp}\left\{-\frac{1}{2} a^\star \left[ \mu- \frac{b^\star}{a^\star} \right]^2 \right\} \ \textrm{exp}\left\{-\frac{\tau}{2} \left[\frac{n\kappa_0 (\bar{y} - \mu_0)^2}{(\kappa_0 + n)}\right] \right\}\\ & \ \ \ \ \ \ \ \ \ \ \ \ \times \tau^{\frac{\nu_0 + n}{2}-1} \textrm{exp}\left\{-\frac{\tau \left[\nu_0\sigma_0^2 + s^2(n-1) \right] }{2}\right\}\\ \\ & \boldsymbol{=} \ \ \ \tau^{\frac{1}{2}}\textrm{exp} \underbrace{\left\{-\frac{1}{2} a^\star \left[ \mu- \frac{b^\star}{a^\star} \right]^2 \right\}}_{\textrm{Substitute the values for } a^\star \textrm{ and } b^\star \textrm{ back}} \\ & \ \ \ \ \ \ \ \ \ \ \ \ \times \tau^{\frac{\nu_0 + n}{2}-1} \textrm{exp}\left\{-\frac{\tau \left[\nu_0\sigma_0^2 + s^2(n-1) \right] }{2}\right\} \textrm{exp}\left\{-\frac{\tau}{2} \left[\frac{n\kappa_0 
(\bar{y} - \mu_0)^2}{(\kappa_0 + n)}\right] \right\}\\ \end{split}` $$ ] ]

---

## Posterior derivation

.block[ .small[ $$ `\begin{split} \Rightarrow \ \pi(\mu,\tau | Y) \ & \boldsymbol{ \propto} \ \ \underbrace{\ \tau^{\frac{1}{2}}\textrm{exp}\left\{-\frac{1}{2} (n\tau + \kappa_0 \tau) \left[ \mu- \frac{(n\tau \bar{y} + \kappa_0 \tau\mu_0)}{(n\tau + \kappa_0 \tau)} \right]^2 \right\}}_{\textrm{Normal Kernel}}\\ & \ \ \ \ \ \ \ \ \ \ \ \ \times \underbrace{\tau^{\frac{\nu_0 + n}{2}-1} \textrm{exp}\left\{-\frac{\tau}{2} \left[\nu_0\sigma_0^2 + s^2(n-1) + \frac{n\kappa_0}{(\kappa_0 + n)} (\bar{y} - \mu_0)^2 \right] \right\}}_{\textrm{Gamma Kernel}}\\ \\ & \boldsymbol{=} \ \ \underbrace{\ \tau^{\frac{1}{2}} \textrm{exp}\left\{-\frac{1}{2} \tau(\kappa_0 + n) \left[ \mu - \frac{(\kappa_0 \mu_0 + n \bar{y})}{(\kappa_0 + n)} \right]^2 \right\}}_{\textrm{Normal Kernel}}\\ & \ \ \ \ \ \ \ \ \ \ \ \ \times \underbrace{\tau^{\frac{\nu_0 + n}{2}-1} \textrm{exp}\left\{-\frac{\tau}{2} \left[\nu_0\sigma_0^2 + s^2(n-1) + \frac{n\kappa_0}{(\kappa_0 + n)} (\bar{y} - \mu_0)^2 \right] \right\}}_{\textrm{Gamma Kernel}}\\ \end{split}` $$ ] ]

---

## Posterior derivation

.block[ .small[ $$ `\begin{split} \Rightarrow \ \pi(\mu,\tau | Y) \ & \boldsymbol{=} \ \ \mathcal{N}\left(\mu_n, \frac{1}{\kappa_n\tau} \right) \times \textrm{Gamma}\left(\frac{\nu_n}{2}, \frac{\nu_n \sigma_n^2}{2}\right) = \pi(\mu|Y, \tau) \pi(\tau|Y), \end{split}` $$ ] ] where .block[ .small[ $$ `\begin{split} \kappa_n & = \kappa_0 + n\\ \mu_n & = \frac{\kappa_0 \mu_0 + n\bar{y}}{\kappa_n} = \frac{\kappa_0}{\kappa_n} \mu_0 + \frac{n}{\kappa_n} \bar{y}\\ \nu_n & = \nu_0 + n\\ \sigma_n^2 & = \frac{1}{\nu_n}\left[\nu_0\sigma_0^2 + s^2(n-1) + \frac{n\kappa_0}{\kappa_n} (\bar{y} - \mu_0)^2 \right] = \frac{1}{\nu_n}\left[\nu_0\sigma_0^2 + \sum_{i=1}^n (y_i-\bar{y})^2 + \frac{n\kappa_0}{\kappa_n} (\bar{y} - \mu_0)^2 \right]\\ \end{split}` $$ ] ]

--

- It turns out that the marginal posterior of `\(\mu\)`, that is, `\(\pi(\mu|Y) = \int_0^\infty \pi(\mu, \tau|Y) d\tau\)`, is a **t-distribution**.

--

- You can derive that distribution if you are interested; we won't spend time on it in class. We will be able to sample from it through Monte Carlo anyway (see the sketch on the appendix slide at the end).

---

class: center, middle

# What's next?

### Move on to the readings for the next module!
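---

## Appendix: Monte Carlo sketch (optional)

- As mentioned earlier, we can sample from the joint posterior (and hence from the marginal posterior of `\(\mu\)`) through Monte Carlo: draw `\(\tau | Y\)` from its gamma posterior, then `\(\mu | \tau, Y\)` from its normal posterior. A minimal R sketch follows; the data and prior hyperparameter values are made up purely for illustration.

```r
# Sketch of Monte Carlo sampling from the joint posterior derived above:
#   tau | Y ~ Gamma(nu_n/2, nu_n*sigma_n^2/2),   mu | tau, Y ~ N(mu_n, 1/(kappa_n*tau)).
set.seed(602)
y <- rnorm(30, mean = 10, sd = 2)   # made-up "observed" data
n <- length(y); ybar <- mean(y); s2 <- var(y)

mu0 <- 8; kappa0 <- 1; nu0 <- 2; sigma02 <- 4    # illustrative prior hyperparameters

# posterior updates from the last derivation slide
kappa_n  <- kappa0 + n
mu_n     <- (kappa0 * mu0 + n * ybar) / kappa_n
nu_n     <- nu0 + n
sigma_n2 <- (nu0 * sigma02 + (n - 1) * s2 + (n * kappa0 / kappa_n) * (ybar - mu0)^2) / nu_n

S   <- 10000
tau <- rgamma(S, shape = nu_n / 2, rate = nu_n * sigma_n2 / 2)
mu  <- rnorm(S, mean = mu_n, sd = sqrt(1 / (kappa_n * tau)))

quantile(mu, c(0.025, 0.975))   # e.g., an approximate 95% credible interval for mu
```

- Because each `mu` draw is made jointly with a `tau` draw, keeping only the `mu` values gives draws from the t-distributed marginal posterior mentioned earlier.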