class: center, middle, inverse, title-slide # STA 360/602L: Module 5.4 ## Hierarchical normal modeling of means and variances ### Dr. Olanrewaju Michael Akande --- ## Hierarchical modeling of means recap - We've looked at the hierarchical normal model of the form .block[ .small[ $$ `\begin{split} y_{ij} | \theta_j, \sigma^2 & \sim \mathcal{N} \left( \theta_j, \sigma^2 \right); \ \ \ i = 1,\ldots, n_j\\ \theta_j | \mu, \tau^2 & \sim \mathcal{N} \left( \mu, \tau^2 \right); \ \ \ j = 1, \ldots, J. \end{split}` $$ ] ] -- - The model gives us an extra hierarchy through the prior on the means, leading to sharing of information across the groups, when estimating the group-specific means. -- - We set the variance, `\(\sigma^2\)`, as the same for all groups, to simplify posterior inference. -- - We will relax that assumption in this module. --- ## Hierarchical modeling of means and variances - Often researchers emphasize differences in means. However, variances can be very important. -- - If we think means vary across groups, why shouldn't we worry about variances also varying across groups? -- - In that case, we have the model .block[ .small[ $$ `\begin{split} y_{ij} | \theta_j, \sigma^2 & \sim \mathcal{N} \left( \theta_j, \sigma^2_j \right); \ \ \ i = 1,\ldots, n_j\\ \theta_j | \mu, \tau^2 & \sim \mathcal{N} \left( \mu, \tau^2 \right); \ \ \ j = 1, \ldots, J, \end{split}` $$ ] ] -- - However, now we also need a model on all the `\(\sigma^2_j\)`'s that lets us borrow information about across groups. --- ## Posterior inference - Now we need to find a semi-conjugate distribution for the `\(\sigma^2_j\)`'s. Before, with one `\(\sigma^2\)`, we had .block[ .small[ $$ `\begin{split} \pi(\sigma^2) & = \mathcal{IG} \left(\dfrac{\nu_0}{2}, \dfrac{\nu_0\sigma_0^2}{2}\right),\\ \end{split}` $$ ] ] which was nicely semi-conjugate. -- - That suggests that maybe we should start with. .block[ .small[ $$ `\begin{split} \sigma^2_1, \ldots, \sigma^2_J | \nu_0, \sigma_0^2 & \sim \mathcal{IG} \left(\dfrac{\nu_0}{2}, \dfrac{\nu_0\sigma_0^2}{2}\right) \end{split}` $$ ] ] -- - However, if we just fix the hyperparameters `\(\nu_0\)` and `\(\sigma_0^2\)` in advance, the prior on the `\(\sigma^2_j\)`'s does not allow borrowing of information across other values of `\(\sigma^2_j\)`, to aid in estimation. -- - Thus, we actually need to treat `\(\nu_0\)` and `\(\sigma_0^2\)` as parameters in a hierarchical model for both means and variances. --- ## Posterior inference - Therefore, the full posterior is now: .block[ .small[ $$ `\begin{split} \pi(\theta_1, \ldots, \theta_J, \sigma^2_1, \ldots, \sigma^2_J, \mu,\tau^2,\nu_0, \sigma_0^2 | Y) & \boldsymbol{\propto} p(y | \theta_1, \ldots, \theta_J, \sigma^2_1, \ldots, \sigma^2_J, \mu, \tau^2, \nu_0, \sigma_0^2)\\ & \ \ \ \ \times p(\theta_1, \ldots, \theta_J | \sigma^2_1, \ldots, \sigma^2_J, \mu, \tau^2, \nu_0, \sigma_0^2)\\ & \ \ \ \ \times p(\sigma^2_1, \ldots, \sigma^2_J | \mu, \tau^2,\nu_0, \sigma_0^2)\\ & \ \ \ \ \times \pi(\mu, \tau^2,\nu_0, \sigma_0^2)\\ \\ & \boldsymbol{=} p(y | \theta_1, \ldots, \theta_J, \sigma^2_1, \ldots, \sigma^2_J )\\ & \ \ \ \ \times p(\theta_1, \ldots, \theta_J | \mu,\tau^2)\\ & \ \ \ \ \times p(\sigma^2_1, \ldots, \sigma^2_J | \nu_0, \sigma_0^2)\\ & \ \ \ \ \times \pi(\mu) \cdot \pi(\tau^2) \cdot \pi(\nu_0) \cdot \pi(\sigma_0^2)\\ \\ & \boldsymbol{=} \left\{ \prod_{j=1}^{J} \prod_{i=1}^{n_j} p(y_{ij} | \theta_j, \sigma^2_j ) \right\}\\ & \ \ \ \ \times \left\{ \prod_{j=1}^{J} p(\theta_j | \mu,\tau^2) \right\}\\ & \ \ \ \ \times \left\{ \prod_{j=1}^{J} p(\sigma^2_j | \nu_0, \sigma_0^2) \right\}\\ & \ \ \ \ \times \pi(\mu) \cdot \pi(\tau^2) \cdot \pi(\nu_0) \cdot \pi(\sigma_0^2)\\ \end{split}` $$ ] ] --- ## Full conditionals - Notice that this new factorization won't affect the full conditionals for `\(\mu\)` and `\(\tau^2\)` from before, since those have nothing to do with all the new `\(\sigma^2_j\)`'s. -- - That is, .block[ .small[ $$ `\begin{split} & \pi(\mu | \cdots \cdots ) = \mathcal{N}\left(\mu_n, \gamma^2_n \right) \ \ \ \ \textrm{where}\\ \\ & \gamma^2_n = \dfrac{1}{ \dfrac{J}{\tau^2} + \dfrac{1}{\gamma_0^2} } ; \ \ \ \ \ \ \ \ \mu_n = \gamma^2_n \left[ \dfrac{J}{\tau^2} \bar{\theta} + \dfrac{1}{\gamma_0^2} \mu_0 \right], \end{split}` $$ ] ] and .block[ .small[ $$ `\begin{split} & \pi(\tau^2 | \cdots \cdots ) = \mathcal{IG} \left(\dfrac{\eta_n}{2}, \dfrac{\eta_n\tau_n^2}{2}\right) \ \ \ \ \textrm{where}\\ \\ & \eta_n = \eta_0 + J ; \ \ \ \ \ \ \ \tau_n^2 = \dfrac{1}{\eta_n} \left[\eta_0\tau_0^2 + \sum\limits_{j=1}^{J} (\theta_j - \mu)^2 \right].\\ \end{split}` $$ ] ] --- ## Full conditionals - The full conditional for each `\(\theta_j\)`, we have .block[ .small[ $$ `\begin{split} \pi(\theta_j | \theta_{-j}, \mu, \sigma^2_1, \ldots, \sigma^2_J,\tau^2, Y) & \boldsymbol{\propto} \left\{ \prod_{i=1}^{n_j} p(y_{ij} | \theta_j, \sigma^2_j ) \right\} \cdot p(\theta_j | \mu,\tau^2) \\ \end{split}` $$ ] ] with the only change from before being `\(\sigma^2_j\)`. -- - That is, those terms still include a normal density for `\(\theta_j\)` multiplied by a product of normals in which `\(\theta_j\)` is the mean, again mirroring the previous case, so you can show that .block[ .small[ $$ `\begin{split} \pi(\theta_j | \theta_{-j}, \mu, \sigma^2_1, \ldots, \sigma^2_J,\tau^2, Y) & = \mathcal{N}\left(\mu_j^\star, \tau_j^\star \right) \ \ \ \ \textrm{where}\\ \\ \tau_j^\star & = \dfrac{1}{ \dfrac{n_j}{\sigma^2_j} + \dfrac{1}{\tau^2} } ; \ \ \ \ \ \ \ \mu_j^\star = \tau_j^\star \left[ \dfrac{n_j}{\sigma^2_j} \bar{y}_j + \dfrac{1}{\tau^2} \mu \right] \end{split}` $$ ] ] --- ## How about within-group variances? - Before we get to the choice of the priors for `\(\nu_0\)` and `\(\sigma_0^2\)`, we have enough to derive the full conditional for each `\(\sigma^2_j\)`. This actually takes a similar form to what we had before we indexed by `\(j\)`, that is, .block[ .small[ $$ `\begin{split} \pi(\sigma^2_j | \sigma^2_{-j}, \theta_1, \ldots, \theta_J, \mu, \tau^2, \nu_0, \sigma_0^2,Y) & \boldsymbol{\propto} \left\{ \prod_{i=1}^{n_j} p(y_{ij} | \theta_j, \sigma^2_j ) \right\} \cdot \pi(\sigma^2_j | \nu_0, \sigma_0^2)\\ \end{split}` $$ ] ] -- - This still looks like what we had before, that is, products of normals and one inverse-gamma, so that .block[ .small[ $$ `\begin{split} \pi(\sigma^2_j | \sigma^2_{-j}, \theta_1, \ldots, \theta_J, \mu, \tau^2, \nu_0, \sigma_0^2, Y) & = \mathcal{IG} \left(\dfrac{\nu_{j}^\star}{2}, \dfrac{\nu_{j}^\star\sigma_{j}^{2(\star)}}{2}\right) \ \ \ \ \textrm{where}\\ \\ \nu_{j}^\star = \nu_0 + n_j ; \ \ \ \ \ \ \ \sigma_{j}^{2(\star)} & = \dfrac{1}{\nu_{j}^\star} \left[\nu_0\sigma_0^2 + \sum\limits_{i=1}^{n_j} (y_{ij} - \theta_j)^2 \right].\\ \end{split}` $$ ] ] --- ## Remaining hyper-priors - Now we can get back to priors for `\(\nu_0\)` and `\(\sigma_0^2\)`. Turns out that a semi-conjugate prior for `\(\sigma_0^2\)` (you have seen this on the homework) is a gamma distribution. That is, if we set .block[ .small[ $$ `\begin{split} \pi(\sigma_0^2) & = \mathcal{Ga} \left(a,b\right),\\ \end{split}` $$ ] ] then, .block[ .small[ $$ `\begin{split} \pi(\sigma_0^2 | \theta_1, \ldots, \theta_J, \sigma^2_1, \ldots, \sigma^2_J, \mu, \tau^2, \nu_0, Y) & \boldsymbol{\propto} \left\{ \prod_{j=1}^{J} p(\sigma^2_j | \nu_0, \sigma_0^2) \right\} \cdot \pi(\sigma_0^2)\\ & \propto \ \mathcal{IG} \left(\sigma^2_j; \dfrac{\nu_0}{2}, \dfrac{\nu_0\sigma_0^2}{2}\right) \ \cdot \ \mathcal{Ga} \left(\sigma_0^2 ; a,b\right) \\ \end{split}` $$ ] ] -- - Recall that + `\(\mathcal{Ga}(y; a, b) \equiv \dfrac{b^a}{\Gamma(a)} y^{a - 1} e^{-b y}\)`, and + `\(\mathcal{IG}(y; a, b) \equiv \dfrac{b^a}{\Gamma(a)} y^{-(a+1)} e^{-\dfrac{b}{y}}\)`. --- ## Remaining hyper-priors - So `\(\pi(\sigma_0^2 | \theta_1, \ldots, \theta_J, \sigma^2_1, \ldots, \sigma^2_J, \mu, \tau^2, \nu_0, Y)\)` .block[ .small[ $$ `\begin{split} & \boldsymbol{\propto} \left\{ \prod_{j=1}^{J} p(\sigma^2_j | \nu_0, \sigma_0^2) \right\} \cdot \pi(\sigma_0^2)\\ & \boldsymbol{\propto} \ \prod_{j=1}^{J} \ \mathcal{IG} \left(\sigma^2_j; \dfrac{\nu_0}{2}, \dfrac{\nu_0\sigma_0^2}{2}\right) \ \cdot \ \mathcal{Ga} \left(\sigma_0^2 ; a,b\right) \\ & \boldsymbol{=} \left[ \prod_{j=1}^{J} \dfrac{\left( \dfrac{\nu_0\sigma_0^2}{2} \right)^\left(\dfrac{\nu_0}{2} \right)}{\Gamma\left(\dfrac{\nu_0}{2} \right)} (\sigma^2_j)^{-\left(\dfrac{\nu_0}{2}+1 \right)} e^{-\dfrac{\nu_0\sigma_0^2}{2 (\sigma^2_j)}} \right] \cdot \left[ \dfrac{b^a}{\Gamma(a)} (\sigma_0^2)^{a - 1} e^{-b \sigma_0^2} \right] \\ & \boldsymbol{\propto} \left[ \prod_{j=1}^{J} \left( \sigma_0^2 \right)^\left(\dfrac{\nu_0}{2} \right) e^{-\dfrac{\nu_0\sigma_0^2}{2 (\sigma^2_j)}} \right] \cdot \left[ (\sigma_0^2)^{a - 1} e^{-b \sigma_0^2} \right] \\ & \boldsymbol{\propto} \left[ \left( \sigma_0^2 \right)^\left(\dfrac{J \nu_0}{2} \right) e^{- \sigma_0^2 \left[ \dfrac{\nu_0}{2} \sum\limits_{j=1}^{J} \dfrac{1}{\sigma^2_j} \right]} \right] \cdot \left[ (\sigma_0^2)^{a - 1} e^{-b \sigma_0^2} \right] \\ \end{split}` $$ ] ] --- ## Remaining hyper-priors - That is, the full conditional is .block[ .small[ $$ `\begin{split} \pi(\sigma_0^2 | \cdots \cdots ) & \boldsymbol{\propto} \left[ \left( \sigma_0^2 \right)^\left(\dfrac{J \nu_0}{2} \right) e^{- \sigma_0^2 \left[ \dfrac{\nu_0}{2} \sum\limits_{j=1}^{J} \dfrac{1}{\sigma^2_j} \right]} \right] \cdot \left[ (\sigma_0^2)^{a - 1} e^{-b \sigma_0^2} \right] \\ & \boldsymbol{\propto} \left[ \left( \sigma_0^2 \right)^\left( a + \dfrac{J \nu_0}{2} - 1 \right) e^{- \sigma_0^2 \left[ b + \dfrac{\nu_0}{2} \sum\limits_{j=1}^{J} \dfrac{1}{\sigma^2_j} \right]} \right] \\ & \equiv \mathcal{Ga} \left(\sigma_0^2 ; a_n ,b_n \right), \end{split}` $$ ] ] where .block[ .small[ $$ `\begin{split} a_n = a + \dfrac{J \nu_0}{2}; \ \ \ \ b_n = b + \dfrac{\nu_0}{2} \sum\limits_{j=1}^{J} \dfrac{1}{\sigma^2_j}. \end{split}` $$ ] ] --- ## Remaining hyper-priors - Ok that leaves us with one parameter to go, i.e., `\(\nu_0\)`. Turns out there is no simple conjugate/semi-conjugate prior for `\(\nu_0\)`. -- - Common practice is to restrict `\(\nu_0\)` to be an integer (which makes sense when we think of it as being degrees of freedom, which also means it cannot be zero). With the restriction, we need a discrete distribution as the prior with support on `\(\nu_0 = 1, 2, 3, \ldots\)`. -- - **Question: Can we use either a binomial or a Poisson prior on for `\(\nu_0\)`?** -- - A popular choice is the geometric distribution with pmf `\(p(\nu_0) = (1-p)^{\nu_0-1} p\)`. -- - However, we will rewrite the kernel as `\(\pi(\nu_0) \propto e^{-\alpha \nu_0}\)`. How did we get here from the geometric pmf and what is `\(\alpha\)`? --- ## Final full conditional - With this prior, `\(\pi(\nu_0 | \theta_1, \ldots, \theta_J, \sigma^2_1, \ldots, \sigma^2_J, \mu, \tau^2, \sigma_0^2, Y)\)` .block[ .small[ $$ `\begin{split} & \boldsymbol{\propto} \left\{ \prod_{j=1}^{J} p(\sigma^2_j | \nu_0, \sigma_0^2) \right\} \cdot \pi(\nu_0)\\ & \propto \ \prod_{j=1}^{J} \ \mathcal{IG} \left(\sigma^2_j; \dfrac{\nu_0}{2}, \dfrac{\nu_0\sigma_0^2}{2}\right) \ \cdot \ e^{-\alpha \nu_0} \\ & \boldsymbol{=} \left[ \prod_{j=1}^{J} \dfrac{\left( \dfrac{\nu_0\sigma_0^2}{2} \right)^\left(\dfrac{\nu_0}{2} \right)}{\Gamma\left(\dfrac{\nu_0}{2} \right)} \left(\sigma^2_j\right)^{-\left(\dfrac{\nu_0}{2}+1 \right)} e^{-\dfrac{\nu_0\sigma_0^2}{2 (\sigma^2_j)}} \right] \cdot e^{-\alpha \nu_0} \\ & \boldsymbol{\propto} \left[ \left( \dfrac{\left( \dfrac{\nu_0\sigma_0^2}{2} \right)^\left(\dfrac{\nu_0}{2} \right)}{\Gamma\left(\dfrac{\nu_0}{2} \right)} \right)^J \cdot \left(\prod_{j=1}^{J} \dfrac{1}{\sigma^2_j} \right)^{\left(\dfrac{\nu_0}{2}+1 \right)} \cdot e^{- \nu_0 \left[ \dfrac{\sigma_0^2}{2} \sum\limits_{j=1}^{J} \dfrac{1}{\sigma^2_j} \right]} \right] \cdot e^{-\alpha \nu_0} \\ \end{split}` $$ ] ] --- ## Final full conditional - That is, the full conditional is .block[ .small[ $$ `\begin{split} \pi(\nu_0 | \cdots \cdots ) & \boldsymbol{\propto} \left[ \left( \dfrac{\left( \dfrac{\nu_0\sigma_0^2}{2} \right)^\left(\dfrac{\nu_0}{2} \right)}{\Gamma\left(\dfrac{\nu_0}{2} \right)} \right)^J \cdot \left(\prod_{j=1}^{J} \dfrac{1}{\sigma^2_j} \right)^{\left(\dfrac{\nu_0}{2}+1 \right)} \cdot e^{- \nu_0 \left[ \alpha + \dfrac{\sigma_0^2}{2} \sum\limits_{j=1}^{J} \dfrac{1}{\sigma^2_j} \right]} \right], \\ \end{split}` $$ ] ] which is not a well known kernel and is unnormalized (i.e., does not integrate to 1 in its current form). -- - This sure looks like a lot, but it will be relatively easy to compute in R. -- - Now, technically, the support is `\(\nu_0 = 1, 2, 3, \ldots\)`. -- - However, at every iteration, we can compute this unnormalized density across a grid of `\(\nu_0\)` values, say, `\(\nu_0 = 1, 2, 3, \ldots, K\)` for some large `\(K\)`, and then sample. --- ## Final full conditional - One more thing, computing these probabilities on the raw scale can be problematic especially because of the product inside. Good idea to transform to the log scale instead. -- - That is, .block[ .small[ $$ `\begin{split} \pi(\nu_0 | \cdots \cdots ) & \boldsymbol{\propto} \left[ \left( \dfrac{\left( \dfrac{\nu_0\sigma_0^2}{2} \right)^\left(\dfrac{\nu_0}{2} \right)}{\Gamma\left(\dfrac{\nu_0}{2} \right)} \right)^J \cdot \left(\prod_{j=1}^{J} \dfrac{1}{\sigma^2_j} \right)^{\left(\dfrac{\nu_0}{2}-1 \right)} \cdot e^{- \nu_0 \left[ \alpha + \dfrac{\sigma_0^2}{2} \sum\limits_{j=1}^{J} \dfrac{1}{\sigma^2_j} \right]} \right]\\ \\ \Rightarrow \ \text{ln} \pi(\nu_0 | \cdots \cdots ) & \boldsymbol{\propto} \left(\dfrac{J\nu_0}{2} \right) \text{ln} \left( \dfrac{\nu_0\sigma_0^2}{2} \right) - J\text{ln}\left[ \Gamma\left(\dfrac{\nu_0}{2} \right) \right] \\ & \ \ \ \ + \left(\dfrac{\nu_0}{2}+1 \right) \left(\sum_{j=1}^{J} \text{ln} \left[\dfrac{1}{\sigma^2_j} \right] \right) \\ & \ \ \ \ - \nu_0 \left[ \alpha + \dfrac{\sigma_0^2}{2} \sum\limits_{j=1}^{J} \dfrac{1}{\sigma^2_j} \right] \\ \end{split}` $$ ] ] --- ## Full model As a recap, the final model is therefore: .block[ .small[ $$ `\begin{split} y_{ij} | \theta_j, \sigma^2 & \sim \mathcal{N} \left( \theta_j, \sigma^2_j \right); \ \ \ i = 1,\ldots, n_j; \ \ \ j = 1, \ldots, J\\ \\ \\ \theta_j | \mu, \tau^2 & \sim \mathcal{N} \left( \mu, \tau^2 \right); \ \ \ j = 1, \ldots, J\\ \\ \sigma^2_1, \ldots, \sigma^2_J | \nu_0, \sigma_0^2 & \sim \mathcal{IG} \left(\dfrac{\nu_0}{2}, \dfrac{\nu_0\sigma_0^2}{2}\right); \ \ \ j = 1, \ldots, J\\ \\ \\ \mu & \sim \mathcal{N}\left(\mu_0, \gamma^2_0\right)\\ \\ \tau^2 & \sim \mathcal{IG} \left(\dfrac{\eta_0}{2}, \dfrac{\eta_0\tau_0^2}{2}\right).\\ \\ \\ \pi(\nu_0) & \propto e^{-\alpha \nu_0} \\ \\ \sigma_0^2 & \sim \mathcal{Ga} \left(a,b\right).\\ \end{split}` $$ ] ] --- class: center, middle # What's next? ### Move on to the readings for the next module!