class: center, middle, inverse, title-slide

# STA 360/602L: Module 3.4

## The normal model: conditional inference for the mean

### Dr. Olanrewaju Michael Akande

---

## Normal model

- Suppose we have independent observations `\(Y = (y_1,y_2,\ldots,y_n)\)`, where each `\(y_i \sim \mathcal{N}(\mu, \sigma^2)\)`, or equivalently `\(y_i \sim \mathcal{N}(\mu, \tau^{-1})\)`, with unknown parameters `\(\mu\)` and `\(\sigma^2\)` (or `\(\tau\)`).

--

- Then, the likelihood is
.block[
.small[
$$
`\begin{split}
P(Y| \mu,\sigma^2) & = \prod_{i=1}^n \dfrac{1}{\sqrt{2\pi}} \tau^{\frac{1}{2}} \ \textrm{exp}\left\{-\frac{1}{2} \tau (y_i-\mu)^2\right\}\\
& \propto \tau^{\frac{n}{2}} \ \textrm{exp}\left\{-\frac{1}{2} \tau \sum_{i=1}^n (y_i-\mu)^2\right\}\\
& \propto \tau^{\frac{n}{2}} \ \textrm{exp}\left\{-\frac{1}{2} \tau \sum_{i=1}^n \left[ (y_i-\bar{y}) - (\mu - \bar{y}) \right]^2 \right\}\\
& \propto \tau^{\frac{n}{2}} \ \textrm{exp}\left\{-\frac{1}{2} \tau \left[ \sum_{i=1}^n (y_i-\bar{y})^2 + \sum_{i=1}^n (\mu - \bar{y})^2 \right] \right\}\\
& \propto \tau^{\frac{n}{2}} \ \textrm{exp}\left\{-\frac{1}{2} \tau \left[ \sum_{i=1}^n (y_i-\bar{y})^2 + n(\mu - \bar{y})^2 \right] \right\}\\
& \propto \tau^{\frac{n}{2}} \ \textrm{exp}\left\{-\frac{1}{2} \tau s^2(n-1) \right\} \ \textrm{exp}\left\{-\frac{1}{2} \tau n(\mu - \bar{y})^2 \right\},\\
\end{split}`
$$
]
]
where the cross term `\(-2(\mu - \bar{y}) \sum_{i=1}^n (y_i-\bar{y})\)` drops out in the fourth line because `\(\sum_{i=1}^n (y_i-\bar{y}) = 0\)`.

---

## Likelihood for normal model

- Likelihood:
.block[
.large[
`$$P(Y| \mu,\sigma^2) \propto \tau^{\frac{n}{2}} \ \textrm{exp}\left\{-\frac{1}{2} \tau s^2(n-1) \right\} \ \textrm{exp}\left\{-\frac{1}{2} \tau n(\mu - \bar{y})^2 \right\},$$`
]
]
where
  + `\(\bar{y} = \frac{1}{n} \sum_{i=1}^n y_i\)` is the sample mean; and
  + `\(s^2 = \sum_{i=1}^n (y_i-\bar{y})^2/(n-1)\)` is the sample variance.

- Sufficient statistics:
  + Sample mean `\(\bar{y}\)`; and
  + Sample sum of squares `\(SS = s^2(n-1) = \sum_{i=1}^n (y_i-\bar{y})^2\)`.

--

- MLEs:
  + `\(\hat{\mu} = \bar{y}\)`.
  + `\(\hat{\tau} = n/SS\)`, and `\(\hat{\sigma}^2 = SS/n\)` (see the quick check on the next slide).
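---

## Quick check of the MLEs in R

- A minimal sanity check of the sufficient statistics and MLEs. This is just a sketch: the data are simulated with made-up values of `\(\mu\)` and `\(\sigma\)`, not taken from any dataset in the course.

```r
set.seed(360)                    # reproducibility

n          <- 100
mu_true    <- 5                  # made-up "true" mean
sigma_true <- 2                  # made-up "true" sd

y <- rnorm(n, mean = mu_true, sd = sigma_true)   # simulated sample

ybar       <- mean(y)            # sample mean = MLE of mu
SS         <- sum((y - ybar)^2)  # sample sum of squares
s2         <- SS / (n - 1)       # sample variance
sigma2_hat <- SS / n             # MLE of sigma^2
tau_hat    <- n / SS             # MLE of tau (precision)

c(ybar = ybar, s2 = s2, sigma2_hat = sigma2_hat, tau_hat = tau_hat)
```

- With larger `\(n\)`, the estimates should settle close to the made-up true values above.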
---

## Inference for mean, conditional on variance

- We can break down the inference problem for this two-parameter model into two one-parameter problems.

--

- We first develop inference for `\(\mu\)` when `\(\sigma^2\)` is known. It turns out we can use a conjugate prior for `\(\pi(\mu|\sigma^2)\)`. We will get to unknown `\(\sigma^2\)` in the next module.

--

- For `\(\sigma^2\)` known, the normal likelihood further simplifies to
.block[
.small[
`$$\propto \ \textrm{exp}\left\{-\frac{1}{2} \tau n(\mu - \bar{y})^2 \right\},$$`
]
]
leaving out everything else that does not depend on `\(\mu\)`.

--

- For `\(\pi(\mu|\sigma^2)\)`, we consider `\(\mathcal{N}(\mu_0, \sigma_0^2)\)`, i.e., `\(\mathcal{N}(\mu_0, \tau_0^{-1})\)`, where `\(\tau_0^{-1} = \sigma_0^2\)`.

--

- Let's derive the posterior `\(\pi(\mu|Y,\sigma^2)\)`.

---

## Inference for mean, conditional on variance

- First, the prior `\(\pi(\mu|\sigma^2) = \mathcal{N}(\mu_0, \tau_0^{-1})\)` can be written as
.block[
.small[
$$
`\begin{split}
\pi(\mu|\sigma^2) \ & = \ \dfrac{1}{\sqrt{2\pi}} \tau_0^{\frac{1}{2}} \cdot \textrm{exp}\left\{-\frac{1}{2} \tau_0 (\mu-\mu_0)^2 \right\} \\
\\
& \propto \ \textrm{exp}\left\{-\frac{1}{2} \tau_0 (\mu^2 - 2\mu\mu_0 + \mu_0^2) \right\} \\
\\
& \propto \ \textrm{exp}\left\{-\frac{1}{2} \tau_0 (\mu^2 - 2\mu\mu_0) \right\}.\\
\end{split}`
$$
]
]

--

- **When the normal density is written in this form, note the following details in the exponent.**
  + First, we must have the form `\(\mu^2 - 2\mu a\)`, and whatever term multiplies `\(2\mu\)` must be the mean; in this case, `\(\mu_0\)`.
  + Second, the precision `\(\tau_0\)` is outside the parentheses.

---

## Inference for mean, conditional on variance

- Now to the posterior:
.block[
.small[
`$$\pi(\mu|Y,\sigma^2) \ \propto \ \pi(\mu|\sigma^2) P(Y| \mu,\sigma^2) \ \propto \ \textrm{exp}\left\{-\frac{1}{2} \tau_0 (\mu - \mu_0)^2 \right\}\ \textrm{exp}\left\{-\frac{1}{2} \tau n(\mu - \bar{y})^2 \right\}.$$`
]
]

--

- Expanding out the squared terms,
.block[
.small[
`$$\Rightarrow \pi(\mu|Y,\sigma^2) \ \propto \ \textrm{exp}\left\{-\frac{1}{2} \tau_0 (\mu^2 - 2\mu\mu_0 + \mu_0^2) \right\}\ \textrm{exp}\left\{-\frac{1}{2} \tau n(\mu^2 - 2\mu\bar{y} + \bar{y}^2) \right\}.$$`
]
]

--

- Ignoring terms not containing `\(\mu\)`,
.block[
.small[
$$
`\begin{split}
\Rightarrow \pi(\mu|Y,\sigma^2) \ & \propto \ \textrm{exp}\left\{-\frac{1}{2} \tau_0 (\mu^2 - 2\mu\mu_0) \right\}\ \textrm{exp}\left\{-\frac{1}{2} \tau n(\mu^2 - 2\mu\bar{y}) \right\}\\
\\
& = \ \textrm{exp}\left\{-\frac{1}{2} \left[\tau_0 (\mu^2 - 2\mu\mu_0) + \tau n(\mu^2 - 2\mu\bar{y}) \right] \right\}\\
\\
& = \ \textrm{exp}\left\{-\frac{1}{2} \left[ \mu^2(\tau n + \tau_0) - 2\mu(\tau n\bar{y} + \tau_0\mu_0) \right] \right\}.\\
\end{split}`
$$
]
]

---

## Inference for mean, conditional on variance

- This sort of looks like a normal kernel, but we need to do a bit more work to get there.

--

- In particular, we need it to be of the form `\(b(\mu^2 - 2\mu a)\)`, so that we can read off `\(a\)` as the mean and `\(b\)` as the precision.

--

- We have
.block[
.small[
$$
`\begin{split}
\pi(\mu|Y,\sigma^2) \ & \propto \ \textrm{exp}\left\{-\frac{1}{2} \left[ \mu^2(\tau n + \tau_0) - 2\mu(\tau n\bar{y} + \tau_0\mu_0) \right] \right\}\\
\\
& = \ \textrm{exp}\left\{-\frac{1}{2} \cdot (\tau n + \tau_0) \left[ \mu^2 - 2\mu \left( \frac{\tau n\bar{y} + \tau_0\mu_0}{\tau n + \tau_0} \right) \right] \right\},\\
\end{split}`
$$
]
]
which now looks like the kernel of a normal distribution.

---

## Posterior with precision terms

- Again, the posterior is
.block[
$$
`\begin{split}
\pi(\mu|Y,\sigma^2) \ & \propto \ \textrm{exp}\left\{-\frac{1}{2} \cdot (\tau n + \tau_0) \left[ \mu^2 - 2\mu \left( \frac{\tau n\bar{y} + \tau_0\mu_0}{\tau n + \tau_0} \right) \right] \right\}.\\
\end{split}`
$$
]

--

- So, in terms of precision, we have
.block[
`$$\mu|Y,\sigma^2 \sim \mathcal{N}(\mu_n, \tau_n^{-1}),$$`
]
where
.block[
`$$\mu_n = \dfrac{\tau n\bar{y} + \tau_0\mu_0}{\tau n + \tau_0}$$`
]
and
.block[
`$$\tau_n = \tau n + \tau_0.$$`
]
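---

## Posterior update in R

- A minimal sketch of the conjugate update above, with made-up prior parameters and data summaries (purely illustrative numbers, not from any dataset in the course):

```r
sigma2 <- 4                # sampling variance, assumed known here
tau    <- 1 / sigma2       # sampling precision

mu0  <- 0                  # prior mean (illustrative)
tau0 <- 1                  # prior precision (illustrative)

n    <- 25                 # sample size
ybar <- 1.8                # sample mean (illustrative)

tau_n <- tau * n + tau0                          # posterior precision
mu_n  <- (tau * n * ybar + tau0 * mu0) / tau_n   # posterior mean

c(mu_n = mu_n, tau_n = tau_n, sigma2_n = 1 / tau_n)
```

- Note how `\(\mu_n\)` falls between `\(\mu_0\)` and `\(\bar{y}\)`, and is pulled toward `\(\bar{y}\)` as `\(n\)` grows.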
---

## Posterior with precision terms

- As mentioned before, Bayesians often prefer to talk about precision instead of variance.

--

- We have
  + `\(\tau\)` as the sampling precision (how close the `\(y_i\)`'s are to `\(\mu\)`);
  + `\(\tau_0\)` as the prior precision (how certain we are, a priori, that `\(\mu\)` is close to our prior guess `\(\mu_0\)`); and
  + `\(\tau_n\)` as the posterior precision.

--

- From the posterior, we can see that _the posterior precision equals the prior precision plus the data precision_: `\(\tau_n = \tau_0 + n\tau\)`.

--

- That is, once again, the posterior information is a combination of the prior information and the information from the data.

---

## Posterior with precision terms: combining information

- The posterior mean is a weighted average of the prior mean `\(\mu_0\)` and the sample mean `\(\bar{y}\)`:
.block[
$$
`\begin{split}
\mu_n & = \dfrac{n\tau\bar{y} + \tau_0\mu_0}{\tau n + \tau_0}\\
& = \dfrac{\tau_0}{\tau_0 + \tau n} \mu_0 + \dfrac{n\tau}{\tau_0 + \tau n} \bar{y}.
\end{split}`
$$
]

--

- Recall that `\(\sigma^2\)` (and thus `\(\tau\)`) is known for now.

--

- If we think of the prior mean as being based on `\(\kappa_0\)` prior observations from a similar population as `\(y_1,y_2,\ldots,y_n\)`, then we might set `\(\sigma_0^2 = \frac{\sigma^2}{\kappa_0}\)`, which implies `\(\tau_0 = \kappa_0 \tau\)`, and then the posterior mean is given by
.block[
$$
`\begin{split}
\mu_n & = \dfrac{\kappa_0}{\kappa_0 + n} \mu_0 + \dfrac{n}{\kappa_0 + n} \bar{y}.
\end{split}`
$$
]

---

## Posterior with variance terms

- In terms of variances, we have
.block[
`$$\mu|Y,\sigma^2 \sim \mathcal{N}(\mu_n, \sigma_n^2),$$`
]
where
.block[
`$$\mu_n = \dfrac{ \dfrac{n}{\sigma^2}\bar{y} + \dfrac{1}{\sigma^2_0} \mu_0}{\dfrac{n}{\sigma^2} + \dfrac{1}{\sigma^2_0}}$$`
]
and
.block[
`$$\sigma^2_n = \dfrac{1}{\dfrac{n}{\sigma^2} + \dfrac{1}{\sigma^2_0}}.$$`
]

--

- It is still easy to see that we can re-express the posterior information as a sum of the prior information and the information from the data.

---

class: center, middle

# What's next?

### Move on to the readings for the next module!