STA 360/602L: Module 5.4

class: center, middle, inverse, title-slide

# STA 360/602L: Module 5.4
## Hierarchical normal modeling of means and variances
### Dr. Olanrewaju Michael Akande

---

## Hierarchical modeling of means recap

- We've looked at the hierarchical normal model of the form
.block[
.small[
$$
`\begin{split}
y_{ij} | \theta_j, \sigma^2 & \sim \mathcal{N} \left( \theta_j, \sigma^2 \right); \ \ \ i = 1,\ldots, n_j\\
\theta_j | \mu, \tau^2 & \sim \mathcal{N} \left( \mu, \tau^2 \right); \ \ \ j = 1, \ldots, J.
\end{split}`
$$
]
]

- The model gives us an extra hierarchy through the prior on the means, leading to sharing of information across the groups, when estimating the group-specific means.
  
--

- We set the variance, `$\sigma^2$`, as the same for all groups, to simplify posterior inference.

- We will relax that assumption in this module.

---
## Hierarchical modeling of means and variances

- Often researchers emphasize differences in means. However, variances can be very important.

- If we think means vary across groups, why shouldn't we worry about variances also varying across groups?

- In that case, we have the model
.block[
.small[
$$
`\begin{split}
y_{ij} | \theta_j, \sigma^2 & \sim \mathcal{N} \left( \theta_j, \sigma^2_j \right); \ \ \ i = 1,\ldots, n_j\\
\theta_j | \mu, \tau^2 & \sim \mathcal{N} \left( \mu, \tau^2 \right); \ \ \ j = 1, \ldots, J,
\end{split}`
$$
]
]

- However, now we also need a model on all the `$\sigma^2_j$`'s that lets us borrow information about across groups.

---
## Posterior inference

- Now we need to find a semi-conjugate distribution for the `$\sigma^2_j$`'s. Before, with one `$\sigma^2$`, we had
.block[
.small[
$$
`\begin{split}
\pi(\sigma^2) & = \mathcal{IG} \left(\dfrac{\nu_0}{2}, \dfrac{\nu_0\sigma_0^2}{2}\right),\\
\end{split}`
$$
]
]

which was nicely semi-conjugate.

- That suggests that maybe we should start with.
.block[
.small[
$$
`\begin{split}
\sigma^2_1, \ldots, \sigma^2_J | \nu_0, \sigma_0^2 & \sim \mathcal{IG} \left(\dfrac{\nu_0}{2}, \dfrac{\nu_0\sigma_0^2}{2}\right)
\end{split}`
$$
]
]

- However, if we just fix the hyperparameters `$\nu_0$` and `$\sigma_0^2$` in advance, the prior on the `$\sigma^2_j$`'s does not allow borrowing of information across other values of `$\sigma^2_j$`, to aid in estimation.

- Thus, we actually need to treat `$\nu_0$` and `$\sigma_0^2$` as parameters in a hierarchical model for both means and variances.

---
## Posterior inference

- Therefore, the full posterior is now:
.block[
.small[
$$
`\begin{split}
\pi(\theta_1, \ldots, \theta_J, \sigma^2_1, \ldots, \sigma^2_J, \mu,\tau^2,\nu_0, \sigma_0^2 | Y) & \boldsymbol{\propto} p(y | \theta_1, \ldots, \theta_J, \sigma^2_1, \ldots, \sigma^2_J, \mu, \tau^2, \nu_0, \sigma_0^2)\\
& \ \ \ \ \times p(\theta_1, \ldots, \theta_J | \sigma^2_1, \ldots, \sigma^2_J, \mu, \tau^2, \nu_0, \sigma_0^2)\\
& \ \ \ \ \times p(\sigma^2_1, \ldots, \sigma^2_J | \mu, \tau^2,\nu_0, \sigma_0^2)\\
& \ \ \ \ \times \pi(\mu, \tau^2,\nu_0, \sigma_0^2)\\
\\
& \boldsymbol{=} p(y | \theta_1, \ldots, \theta_J, \sigma^2_1, \ldots, \sigma^2_J )\\
& \ \ \ \ \times p(\theta_1, \ldots, \theta_J | \mu,\tau^2)\\
& \ \ \ \ \times p(\sigma^2_1, \ldots, \sigma^2_J | \nu_0, \sigma_0^2)\\
& \ \ \ \ \times \pi(\mu) \cdot \pi(\tau^2) \cdot \pi(\nu_0) \cdot \pi(\sigma_0^2)\\
\\
& \boldsymbol{=} \left\{ \prod_{j=1}^{J} \prod_{i=1}^{n_j} p(y_{ij} | \theta_j, \sigma^2_j ) \right\}\\
& \ \ \ \ \times \left\{ \prod_{j=1}^{J} p(\theta_j | \mu,\tau^2) \right\}\\
& \ \ \ \ \times \left\{ \prod_{j=1}^{J} p(\sigma^2_j | \nu_0, \sigma_0^2) \right\}\\
& \ \ \ \ \times \pi(\mu)  \cdot \pi(\tau^2) \cdot \pi(\nu_0) \cdot \pi(\sigma_0^2)\\
\end{split}`
$$
]
]

---
## Full conditionals

- Notice that this new factorization won't affect the full conditionals for `$\mu$` and `$\tau^2$` from before, since those have nothing to do with all the new `$\sigma^2_j$`'s.

- That is,
.block[
.small[
$$
`\begin{split}
& \pi(\mu | \cdots \cdots ) = \mathcal{N}\left(\mu_n, \gamma^2_n \right) \ \ \ \ \textrm{where}\\
\\
& \gamma^2_n = \dfrac{1}{ \dfrac{J}{\tau^2} + \dfrac{1}{\gamma_0^2}  } ; \ \ \ \ \ \ \ \  \mu_n = \gamma^2_n \left[ \dfrac{J}{\tau^2} \bar{\theta} + \dfrac{1}{\gamma_0^2} \mu_0 \right],
\end{split}`
$$
]
]

and
.block[
.small[
$$
`\begin{split}
& \pi(\tau^2 | \cdots \cdots ) = \mathcal{IG} \left(\dfrac{\eta_n}{2}, \dfrac{\eta_n\tau_n^2}{2}\right) \ \ \ \ \textrm{where}\\
\\
& \eta_n = \eta_0 + J ; \ \ \ \ \ \ \ \tau_n^2 = \dfrac{1}{\eta_n} \left[\eta_0\tau_0^2 + \sum\limits_{j=1}^{J} (\theta_j - \mu)^2 \right].\\
\end{split}`
$$
]
]

---
## Full conditionals

- The full conditional for each `$\theta_j$`, we have
.block[
.small[
$$
`\begin{split}
\pi(\theta_j | \theta_{-j}, \mu, \sigma^2_1, \ldots, \sigma^2_J,\tau^2, Y) & \boldsymbol{\propto} \left\{ \prod_{i=1}^{n_j} p(y_{ij} | \theta_j, \sigma^2_j ) \right\} \cdot p(\theta_j | \mu,\tau^2) \\
\end{split}`
$$
]
]

with the only change from before being `$\sigma^2_j$`.

- That is, those terms still include a normal density for `$\theta_j$` multiplied by a product of normals in which `$\theta_j$` is the mean, again mirroring the previous case, so you can show that
.block[
.small[
$$
`\begin{split}
\pi(\theta_j | \theta_{-j}, \mu, \sigma^2_1, \ldots, \sigma^2_J,\tau^2, Y) & = \mathcal{N}\left(\mu_j^\star, \tau_j^\star \right) \ \ \ \ \textrm{where}\\
\\
\tau_j^\star & = \dfrac{1}{ \dfrac{n_j}{\sigma^2_j} + \dfrac{1}{\tau^2}  } ; \ \ \ \ \ \ \ \mu_j^\star = \tau_j^\star \left[ \dfrac{n_j}{\sigma^2_j} \bar{y}_j + \dfrac{1}{\tau^2} \mu \right]
\end{split}`
$$
]
]

---
## How about within-group variances?

- Before we get to the choice of the priors for `$\nu_0$` and `$\sigma_0^2$`, we have enough to derive the full conditional for each `$\sigma^2_j$`. This actually takes a similar form to what we had before we indexed by `$j$`, that is,
.block[
.small[
$$
`\begin{split}
\pi(\sigma^2_j | \sigma^2_{-j}, \theta_1, \ldots, \theta_J, \mu, \tau^2, \nu_0, \sigma_0^2,Y) & \boldsymbol{\propto} \left\{ \prod_{i=1}^{n_j} p(y_{ij} | \theta_j, \sigma^2_j ) \right\} \cdot \pi(\sigma^2_j | \nu_0, \sigma_0^2)\\
\end{split}`
$$
]
]

- This still looks like what we had before, that is, products of normals and one inverse-gamma, so that
.block[
.small[
$$
`\begin{split}
\pi(\sigma^2_j | \sigma^2_{-j}, \theta_1, \ldots, \theta_J, \mu, \tau^2, \nu_0, \sigma_0^2, Y) & = \mathcal{IG} \left(\dfrac{\nu_{j}^\star}{2}, \dfrac{\nu_{j}^\star\sigma_{j}^{2(\star)}}{2}\right) \ \ \ \ \textrm{where}\\
\\
\nu_{j}^\star = \nu_0 + n_j ; \ \ \ \ \ \ \ \sigma_{j}^{2(\star)} & = \dfrac{1}{\nu_{j}^\star} \left[\nu_0\sigma_0^2 + \sum\limits_{i=1}^{n_j} (y_{ij} - \theta_j)^2 \right].\\
\end{split}`
$$
]
]

---
## Remaining hyper-priors

- Now we can get back to priors for `$\nu_0$` and `$\sigma_0^2$`. Turns out that a semi-conjugate prior for `$\sigma_0^2$` (you have seen this on the homework) is a gamma distribution. That is, if we set
.block[
.small[
$$
`\begin{split}
\pi(\sigma_0^2) & = \mathcal{Ga} \left(a,b\right),\\
\end{split}`
$$
]
]

then,
.block[
.small[
$$
`\begin{split}
\pi(\sigma_0^2 | \theta_1, \ldots, \theta_J, \sigma^2_1, \ldots, \sigma^2_J, \mu, \tau^2, \nu_0, Y) & \boldsymbol{\propto} \left\{ \prod_{j=1}^{J} p(\sigma^2_j | \nu_0, \sigma_0^2) \right\} \cdot \pi(\sigma_0^2)\\
& \propto \ \mathcal{IG} \left(\sigma^2_j; \dfrac{\nu_0}{2}, \dfrac{\nu_0\sigma_0^2}{2}\right) \  \cdot \ \mathcal{Ga} \left(\sigma_0^2 ; a,b\right) \\
\end{split}`
$$
]
]

- Recall that
  + `$\mathcal{Ga}(y; a, b) \equiv \dfrac{b^a}{\Gamma(a)} y^{a - 1} e^{-b y}$`, and
  + `$\mathcal{IG}(y; a, b) \equiv \dfrac{b^a}{\Gamma(a)} y^{-(a+1)} e^{-\dfrac{b}{y}}$`.

---
## Remaining hyper-priors

- So `$\pi(\sigma_0^2 | \theta_1, \ldots, \theta_J, \sigma^2_1, \ldots, \sigma^2_J, \mu, \tau^2, \nu_0, Y)$`
.block[
.small[
$$
`\begin{split}
& \boldsymbol{\propto} \left\{ \prod_{j=1}^{J} p(\sigma^2_j | \nu_0, \sigma_0^2) \right\} \cdot \pi(\sigma_0^2)\\
& \boldsymbol{\propto} \ \prod_{j=1}^{J} \ \mathcal{IG} \left(\sigma^2_j; \dfrac{\nu_0}{2}, \dfrac{\nu_0\sigma_0^2}{2}\right) \  \cdot \ \mathcal{Ga} \left(\sigma_0^2 ; a,b\right) \\
& \boldsymbol{=} \left[ \prod_{j=1}^{J} \dfrac{\left( \dfrac{\nu_0\sigma_0^2}{2} \right)^\left(\dfrac{\nu_0}{2} \right)}{\Gamma\left(\dfrac{\nu_0}{2} \right)} (\sigma^2_j)^{-\left(\dfrac{\nu_0}{2}+1 \right)} e^{-\dfrac{\nu_0\sigma_0^2}{2 (\sigma^2_j)}} \right]  \cdot \left[ \dfrac{b^a}{\Gamma(a)} (\sigma_0^2)^{a - 1} e^{-b \sigma_0^2} \right] \\
& \boldsymbol{\propto} \left[ \prod_{j=1}^{J} \left( \sigma_0^2 \right)^\left(\dfrac{\nu_0}{2} \right)  e^{-\dfrac{\nu_0\sigma_0^2}{2 (\sigma^2_j)}} \right]  \cdot \left[  (\sigma_0^2)^{a - 1} e^{-b \sigma_0^2} \right] \\
& \boldsymbol{\propto} \left[ \left( \sigma_0^2 \right)^\left(\dfrac{J \nu_0}{2} \right)  e^{- \sigma_0^2 \left[ \dfrac{\nu_0}{2} \sum\limits_{j=1}^{J} \dfrac{1}{\sigma^2_j} \right]} \right]  \cdot \left[  (\sigma_0^2)^{a - 1} e^{-b \sigma_0^2} \right] \\
\end{split}`
$$
]
]

---
## Remaining hyper-priors

- That is, the full conditional is
.block[
.small[
$$
`\begin{split}
\pi(\sigma_0^2 | \cdots \cdots ) & \boldsymbol{\propto} \left[ \left( \sigma_0^2 \right)^\left(\dfrac{J \nu_0}{2} \right)  e^{- \sigma_0^2 \left[ \dfrac{\nu_0}{2} \sum\limits_{j=1}^{J} \dfrac{1}{\sigma^2_j} \right]} \right]  \cdot \left[  (\sigma_0^2)^{a - 1} e^{-b \sigma_0^2} \right] \\
& \boldsymbol{\propto} \left[ \left( \sigma_0^2 \right)^\left( a + \dfrac{J \nu_0}{2} - 1 \right)  e^{- \sigma_0^2 \left[ b +  \dfrac{\nu_0}{2} \sum\limits_{j=1}^{J} \dfrac{1}{\sigma^2_j} \right]} \right] \\
& \equiv \mathcal{Ga} \left(\sigma_0^2 ; a_n ,b_n \right),
\end{split}`
$$
]
]

where
.block[
.small[
$$
`\begin{split}
a_n = a + \dfrac{J \nu_0}{2}; \ \ \ \ b_n = b +  \dfrac{\nu_0}{2} \sum\limits_{j=1}^{J} \dfrac{1}{\sigma^2_j}.
\end{split}`
$$
]
]

---
## Remaining hyper-priors

- Ok that leaves us with one parameter to go, i.e., `$\nu_0$`. Turns out there is no simple conjugate/semi-conjugate prior for `$\nu_0$`.

- Common practice is to restrict  `$\nu_0$` to be an integer (which makes sense when we think of it as being degrees of freedom, which also means it cannot be zero). With the restriction, we need a discrete distribution as the prior with support on `$\nu_0 = 1, 2, 3, \ldots$`.

- **Question: Can we use either a binomial or a Poisson prior on for `$\nu_0$`?**

- A popular choice is the geometric distribution with pmf `$p(\nu_0) = (1-p)^{\nu_0-1} p$`.

- However, we will rewrite the kernel as `$\pi(\nu_0) \propto e^{-\alpha \nu_0}$`. How did we get here from the geometric pmf and what is `$\alpha$`?

---
## Final full conditional

- With this prior, `$\pi(\nu_0 | \theta_1, \ldots, \theta_J, \sigma^2_1, \ldots, \sigma^2_J, \mu, \tau^2, \sigma_0^2, Y)$`
.block[
.small[
$$
`\begin{split}
& \boldsymbol{\propto} \left\{ \prod_{j=1}^{J} p(\sigma^2_j | \nu_0, \sigma_0^2) \right\} \cdot \pi(\nu_0)\\
& \propto \ \prod_{j=1}^{J} \ \mathcal{IG} \left(\sigma^2_j; \dfrac{\nu_0}{2}, \dfrac{\nu_0\sigma_0^2}{2}\right) \  \cdot \ e^{-\alpha \nu_0} \\
& \boldsymbol{=} \left[ \prod_{j=1}^{J} \dfrac{\left( \dfrac{\nu_0\sigma_0^2}{2} \right)^\left(\dfrac{\nu_0}{2} \right)}{\Gamma\left(\dfrac{\nu_0}{2} \right)} \left(\sigma^2_j\right)^{-\left(\dfrac{\nu_0}{2}+1 \right)} e^{-\dfrac{\nu_0\sigma_0^2}{2 (\sigma^2_j)}} \right]  \cdot e^{-\alpha \nu_0} \\
& \boldsymbol{\propto} \left[ \left( \dfrac{\left( \dfrac{\nu_0\sigma_0^2}{2} \right)^\left(\dfrac{\nu_0}{2} \right)}{\Gamma\left(\dfrac{\nu_0}{2} \right)} \right)^J \cdot \left(\prod_{j=1}^{J} \dfrac{1}{\sigma^2_j} \right)^{\left(\dfrac{\nu_0}{2}+1 \right)} \cdot  e^{- \nu_0 \left[ \dfrac{\sigma_0^2}{2} \sum\limits_{j=1}^{J} \dfrac{1}{\sigma^2_j} \right]} \right]  \cdot e^{-\alpha \nu_0} \\
\end{split}`
$$
]
]

---
## Final full conditional

- That is, the full conditional is
.block[
.small[
$$
`\begin{split}
\pi(\nu_0 | \cdots \cdots ) & \boldsymbol{\propto} \left[ \left( \dfrac{\left( \dfrac{\nu_0\sigma_0^2}{2} \right)^\left(\dfrac{\nu_0}{2} \right)}{\Gamma\left(\dfrac{\nu_0}{2} \right)} \right)^J \cdot \left(\prod_{j=1}^{J} \dfrac{1}{\sigma^2_j} \right)^{\left(\dfrac{\nu_0}{2}+1 \right)} \cdot  e^{- \nu_0 \left[ \alpha +  \dfrac{\sigma_0^2}{2} \sum\limits_{j=1}^{J} \dfrac{1}{\sigma^2_j} \right]} \right], \\
\end{split}`
$$
]
]

which is not a well known kernel and is unnormalized (i.e., does not integrate to 1 in its current form).

- This sure looks like a lot, but it will be relatively easy to compute in R.

- Now, technically, the support is `$\nu_0 = 1, 2, 3, \ldots$`.

- However, at every iteration, we can compute this unnormalized density across a grid of `$\nu_0$` values, say, `$\nu_0 = 1, 2, 3, \ldots, K$` for some large `$K$`, and then sample.

---
## Final full conditional

- One more thing, computing these probabilities on the raw scale can be problematic especially because of the product inside. Good idea to transform to the log scale instead.

- That is,
.block[
.small[
$$
`\begin{split}
\pi(\nu_0 | \cdots \cdots ) & \boldsymbol{\propto} \left[ \left( \dfrac{\left( \dfrac{\nu_0\sigma_0^2}{2} \right)^\left(\dfrac{\nu_0}{2} \right)}{\Gamma\left(\dfrac{\nu_0}{2} \right)} \right)^J \cdot \left(\prod_{j=1}^{J} \dfrac{1}{\sigma^2_j} \right)^{\left(\dfrac{\nu_0}{2}-1 \right)} \cdot  e^{- \nu_0 \left[ \alpha +  \dfrac{\sigma_0^2}{2} \sum\limits_{j=1}^{J} \dfrac{1}{\sigma^2_j} \right]} \right]\\
\\
\Rightarrow \ \text{ln} \pi(\nu_0 | \cdots \cdots ) &  \boldsymbol{\propto} \left(\dfrac{J\nu_0}{2} \right) \text{ln} \left( \dfrac{\nu_0\sigma_0^2}{2} \right) - J\text{ln}\left[ \Gamma\left(\dfrac{\nu_0}{2} \right) \right] \\
& \ \ \ \ + \left(\dfrac{\nu_0}{2}+1 \right) \left(\sum_{j=1}^{J} \text{ln} \left[\dfrac{1}{\sigma^2_j} \right] \right) \\
& \ \ \ \ - \nu_0 \left[ \alpha +  \dfrac{\sigma_0^2}{2} \sum\limits_{j=1}^{J} \dfrac{1}{\sigma^2_j} \right] \\
\end{split}`
$$
]
]

---
## Full model

As a recap, the final model is therefore:
.block[
.small[
$$
`\begin{split}
y_{ij} | \theta_j, \sigma^2 & \sim \mathcal{N} \left( \theta_j, \sigma^2_j \right); \ \ \ i = 1,\ldots, n_j; \ \ \ j = 1, \ldots, J\\
\\
\\
\theta_j | \mu, \tau^2 & \sim \mathcal{N} \left( \mu, \tau^2 \right); \ \ \ j = 1, \ldots, J\\
\\
\sigma^2_1, \ldots, \sigma^2_J | \nu_0, \sigma_0^2 & \sim \mathcal{IG} \left(\dfrac{\nu_0}{2}, \dfrac{\nu_0\sigma_0^2}{2}\right); \ \ \ j = 1, \ldots, J\\
\\
\\
\mu & \sim \mathcal{N}\left(\mu_0, \gamma^2_0\right)\\
\\
\tau^2 & \sim \mathcal{IG} \left(\dfrac{\eta_0}{2}, \dfrac{\eta_0\tau_0^2}{2}\right).\\
\\
\\
\pi(\nu_0) & \propto e^{-\alpha \nu_0} \\
\\
\sigma_0^2 & \sim \mathcal{Ga} \left(a,b\right).\\
\end{split}`
$$
]
]

---

class: center, middle

# What's next?

### Move on to the readings for the next module!