class: center, middle, inverse, title-slide

# STA 360/602L: Module 7.4

## Metropolis within Gibbs

### Dr. Olanrewaju Michael Akande

---

## Combining Metropolis and Gibbs

- It is often the case that full conditionals are available for some parameters but not all.

--

- A very useful trick is to combine Gibbs and Metropolis.

--

- We will illustrate this by analyzing time series data on global warming.

---

## Carbon dioxide and temperature

- Data are from an analysis of ice cores from East Antarctica.

--

- Temperature (recorded as the difference from the current temperature, in degrees `\(C\)`) and `\(\text{CO}_2\)` (measured in ppm by volume) are standardized to have mean `\(0\)` and variance `\(1\)`.

--

- `\(200\)` values, each roughly `\(2000\)` years apart.

--

- `\(\text{CO}_2\)` values are matched with temperature values roughly `\(1000\)` years later.

---

## Data

<img src="7-4-metropolis-within-gibbs_files/figure-html/unnamed-chunk-2-1.png" style="display: block; margin: auto;" />

`\(\text{CO}_2\)` and temperature follow similar patterns over time.

---

## Inference

- Interest lies in predicting temperature as a function of `\(\text{CO}_2\)`.

--

- In these data, the error terms are temporally correlated, so that a reasonable model for temperature is
.block[
`$$\boldsymbol{Y} \sim \mathcal{N}_n(\boldsymbol{X}\boldsymbol{\beta}, \Sigma),$$`
]

--

  where `\(\boldsymbol{X}\)` contains a column for the intercept plus a column for `\(\text{CO}_2\)`, and `\(\Sigma\)` has a first-order autoregressive structure, so that:
.small[
$$
\Sigma = \sigma^2 \boldsymbol{C}_\rho = \sigma^2
`\begin{bmatrix}
1 & \rho & \rho^2 & \ldots & \rho^{n-1} \\
\rho & 1 & \rho & \ldots & \rho^{n-2} \\
\rho^2 & \rho & 1 & \ldots & \rho^{n-3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\rho^{n-1} & \rho^{n-2} & \rho^{n-3} & \ldots & 1 \\
\end{bmatrix}`
$$
]

--

- The covariance model assumes constant variance but a correlation that decreases as the time between temperature measurements increases.

---

## Posterior inference

- We need to specify prior distributions for `\(\boldsymbol{\beta}\)`, `\(\sigma^2\)` and `\(\rho\)`.

--

- If we assume
.block[
`$$\pi(\boldsymbol{\beta}) = \mathcal{N}_p(\boldsymbol{\mu}_0, \Lambda_0),$$`
]

--

  then
.block[
$$
`\begin{split}
\pi(\boldsymbol{\beta} | \boldsymbol{y}, \boldsymbol{X}, \sigma^2, \rho) & = \mathcal{N}_p(\boldsymbol{\mu}_n, \Lambda_n),\\
\end{split}`
$$
]

--

  where
.block[
$$
`\begin{split}
\Lambda_n & = \left[\Lambda_0^{-1} + \frac{1}{\sigma^2} \boldsymbol{X}^T \boldsymbol{C}_\rho^{-1}\boldsymbol{X} \right]^{-1}\\
\\
\boldsymbol{\mu}_n & = \Lambda_n \left[\Lambda_0^{-1}\boldsymbol{\mu}_0 + \frac{1}{\sigma^2} \boldsymbol{X}^T \boldsymbol{C}_\rho^{-1}\boldsymbol{y} \right].
\end{split}`
$$
]

---

## Posterior inference

- If we assume
.block[
`$$\pi(\sigma^2) = \mathcal{IG} \left(\frac{\nu_0}{2}, \frac{\nu_0\sigma_0^2}{2}\right),$$`
]

--

  then
.block[
$$
`\begin{split}
\pi(\sigma^2 | \boldsymbol{y}, \boldsymbol{X}, \boldsymbol{\beta}, \rho) & = \mathcal{IG} \left(\dfrac{\nu_n}{2}, \dfrac{\nu_n\sigma_n^2}{2}\right), \\
\end{split}`
$$
]

--

  where
.block[
$$
`\begin{split}
\nu_n & = \nu_0 + n\\
\\
\sigma_n^2 & = \dfrac{1}{\nu_n} \left[ \nu_0 \sigma_0^2 + (\boldsymbol{y} - \boldsymbol{X}\boldsymbol{\beta})^T \boldsymbol{C}_\rho^{-1} (\boldsymbol{y} - \boldsymbol{X}\boldsymbol{\beta}) \right] = \dfrac{1}{\nu_n} \left[ \nu_0 \sigma_0^2 + \text{SSR}(\boldsymbol{\beta},\rho) \right].
\end{split}`
$$
]

--

- Therefore, given `\(\rho\)`, we can use Gibbs sampling to cycle through the full conditionals for `\(\boldsymbol{\beta}\)` and `\(\sigma^2\)`.

---

## Posterior inference

- Next, we need a prior for the correlation `\(\rho\)`. There is no semi-conjugate option here.

--

- Since we expect `\(\rho\)` to be positive, we could use `\(\pi(\rho) = \text{Unif}(0,1)\)`.

--

- Unfortunately, this does not lead to a standard full conditional.

--

- However, we can use Metropolis-Hastings to sample from the resulting full conditional for `\(\rho\)`. In fact, if we can come up with a symmetric proposal for `\(\rho\)`, we can just use the Metropolis algorithm.

--

- So, technically, we have a Gibbs sampler, since we will cycle through full conditionals. However, the sampling step for `\(\rho\)` will rely on Metropolis.

--

- Therefore, we have a .hlight[Metropolis within Gibbs] sampler.

---

## Posterior inference

- Update for `\(\rho\)` (Metropolis) at iteration `\((s+1)\)`:

  1. Generate a _candidate_ value `\(\rho^\star \sim \text{Unif}(\rho^{(s)} - \delta, \rho^{(s)} + \delta)\)`. If `\(\rho^\star < 0\)`, reassign it as `\(|\rho^\star|\)`. If `\(\rho^\star > 1\)`, reassign it as `\(2-\rho^\star\)`. *I leave the proof that this "reflecting random walk" is symmetric to you.*

--

  2. Compute the acceptance ratio
.block[
.small[
$$
`\begin{split}
r & = \frac{p(\boldsymbol{y} | \boldsymbol{X}, \boldsymbol{\beta}^{(s+1)}, \sigma^{2(s+1)}, \rho^\star) \cdot \pi(\rho^\star)}{p(\boldsymbol{y} | \boldsymbol{X}, \boldsymbol{\beta}^{(s+1)}, \sigma^{2(s+1)}, \rho^{(s)}) \cdot \pi(\rho^{(s)})}.
\end{split}`
$$
]
]

--

  3. Sample `\(u \sim \text{Unif}(0,1)\)` independently and set
.block[
.small[
`\begin{eqnarray*}
\rho^{(s+1)} = \left\{ \begin{array}{ll}
\rho^\star & \quad \text{if} \quad u < r \\
\rho^{(s)} & \quad \text{otherwise} \\
\end{array} \right. .
\end{eqnarray*}`
]
]

--

- So, for each iteration, we first sample from the full conditionals for `\(\boldsymbol{\beta}\)` and `\(\sigma^2\)`, and then use this step to update `\(\rho\)`; the next two slides give a rough sketch of the sampler in code.
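---

## Sketch of the sampler in R

One iteration of the sampler might look like the following minimal sketch. This is **not** the course's `IceCore.R` script; all names here (e.g., `mwg_step()`) are made up for illustration, and `y`, `X` and the hyperparameters are assumed to be defined already.

```r
library(mvtnorm)  # rmvnorm() and dmvnorm() for the multivariate normal

# One Metropolis-within-Gibbs iteration (illustrative names throughout)
mwg_step <- function(y, X, beta, s2, rho, mu0, Lambda0, nu0, s20, delta) {
  n  <- length(y)
  D  <- abs(outer(1:n, 1:n, "-"))          # |i - j|, so C_rho = rho^D
  iC <- solve(rho^D)                       # C_rho^{-1}

  # Gibbs update for beta: normal full conditional N(mu_n, Lambda_n)
  iL0      <- solve(Lambda0)
  Lambda_n <- solve(iL0 + t(X) %*% iC %*% X / s2)
  mu_n     <- Lambda_n %*% (iL0 %*% mu0 + t(X) %*% iC %*% y / s2)
  beta     <- as.numeric(rmvnorm(1, as.numeric(mu_n), Lambda_n))

  # Gibbs update for sigma^2: inverse-gamma full conditional
  res <- as.numeric(y - X %*% beta)
  SSR <- as.numeric(res %*% iC %*% res)
  s2  <- 1 / rgamma(1, (nu0 + n) / 2, (nu0 * s20 + SSR) / 2)

  # Metropolis update for rho: reflecting random walk proposal;
  # the Unif(0,1) prior terms cancel in the acceptance ratio
  rho_star <- abs(runif(1, rho - delta, rho + delta))
  rho_star <- min(rho_star, 2 - rho_star)  # reflect values above 1
  mu_y  <- as.numeric(X %*% beta)
  log_r <- dmvnorm(y, mu_y, s2 * rho_star^D, log = TRUE) -
           dmvnorm(y, mu_y, s2 * rho^D, log = TRUE)
  if (log(runif(1)) < log_r) rho <- rho_star

  list(beta = beta, s2 = s2, rho = rho)
}
```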
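---

## Sketch of the sampler in R

A driver loop for the `mwg_step()` sketch, again with illustrative names and starting values: run `\(S\)` iterations, store the draws, and keep an eye on the acceptance rate for `\(\rho\)`, tuning `\(\delta\)` as needed.

```r
S     <- 10000                     # number of iterations (illustrative)
delta <- 0.1                       # proposal half-width (illustrative)
draws <- matrix(NA, nrow = S, ncol = 4,
                dimnames = list(NULL, c("beta0", "beta1", "s2", "rho")))

# Illustrative starting values
state <- list(beta = c(0, 0), s2 = 1, rho = 0.5)

for (s in 1:S) {
  state <- mwg_step(y, X, state$beta, state$s2, state$rho,
                    mu0, Lambda0, nu0, s20, delta)
  draws[s, ] <- c(state$beta, state$s2, state$rho)
}

mean(diff(draws[, "rho"]) != 0)    # rough acceptance rate for rho
```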
---
class: center, middle

# Move to the R script [here](https://sta-602l-s21.github.io/Course-Website/slides/IceCore.R).

---
class: center, middle

# What's next?

### Move on to the readings for the next module!