class: center, middle, inverse, title-slide

# STA 360/602L: Module 4.1

## Multivariate normal model I

### Dr. Olanrewaju Michael Akande

---

## Multivariate data

- So far we have only considered basic models with scalar/univariate outcomes, `\(Y_1, \ldots, Y_n\)`.

--

- In practice, however, outcomes of interest are often multivariate, e.g.,

  + Repeated measures of weight over time in a weight loss study

  + Measures of multiple disease markers

  + Tumor counts at different locations along the intestine

--

- Longitudinal data is just a special case of multivariate data.

--

- Interest is then often in how the multiple outcomes are correlated, and in how that correlation may change across outcomes or time points.

---

## Multivariate normal distribution

- The most common model for multivariate outcomes is the .hlight[multivariate normal distribution].

--

- Let `\(\boldsymbol{Y} = (Y_1,\ldots,Y_p)^T\)`, where `\(p\)` represents the dimension of the multivariate outcome variable for a single unit of observation.

--

- For multiple observations, `\(\boldsymbol{Y}_i = (Y_{i1},\ldots,Y_{ip})^T\)` for `\(i = 1,\ldots, n\)`.

--

- `\(\boldsymbol{Y}\)` follows a multivariate normal distribution, that is, `\(\boldsymbol{Y} \sim \mathcal{N}_p(\boldsymbol{\mu}, \Sigma)\)`, if
.block[
`$$p(\boldsymbol{y} | \boldsymbol{\mu}, \Sigma) = (2\pi)^{-\frac{p}{2}} \left|\Sigma\right|^{-\frac{1}{2}} \ \textrm{exp} \left\{-\dfrac{1}{2} (\boldsymbol{y} - \boldsymbol{\mu})^T \Sigma^{-1} (\boldsymbol{y} - \boldsymbol{\mu})\right\},$$`
]
where `\(\left|\Sigma\right|\)` denotes the determinant of `\(\Sigma\)`.

---

## Multivariate normal distribution

If `\(\boldsymbol{Y} \sim \mathcal{N}_p(\boldsymbol{\mu}, \Sigma)\)`, then

--

+ `\(\boldsymbol{\mu}\)` is the `\(p \times 1\)` mean vector, that is,
`$$\boldsymbol{\mu} = \mathbb{E}[\boldsymbol{Y}] = \left(\mathbb{E}[Y_1],\ldots,\mathbb{E}[Y_p]\right)^T = (\mu_1, \ldots,\mu_p)^T.$$`

--

+ `\(\Sigma\)` is the `\(p \times p\)` **positive definite and symmetric** covariance matrix, that is, `\(\Sigma = \{\sigma_{jk}\}\)`, where `\(\sigma_{jk}\)` denotes the covariance between `\(Y_j\)` and `\(Y_k\)`.

--

+ `\(Y_1,\ldots,Y_p\)` may be dependent, depending on the structure of `\(\Sigma\)`, which characterizes the (linear) associations between them.

--

+ For each `\(j = 1, \ldots, p\)`, `\(Y_j \sim \mathcal{N}(\mu_j, \sigma_{jj})\)`.

---

## Bivariate normal distribution

- In the bivariate case, we have
.block[
.small[
`\begin{eqnarray*}
\boldsymbol{Y} = \begin{pmatrix}Y_1\\ Y_2 \end{pmatrix} & \sim & \mathcal{N}_2\left[\boldsymbol{\mu} = \left(\begin{array}{c} \mu_1\\ \mu_2 \end{array}\right),\Sigma = \left(\begin{array}{cc} \sigma_{11} = \sigma^2_1 & \sigma_{12} \\ \sigma_{21} & \sigma_{22} = \sigma^2_2 \end{array}\right)\right],\\
\end{eqnarray*}`
]
]
and `\(\sigma_{12} = \sigma_{21} = \mathbb{Cov}[Y_1,Y_2]\)`.

--

- The correlation between `\(Y_1\)` and `\(Y_2\)` is defined as
.block[
.small[
`$$\rho_{1,2} = \dfrac{\mathbb{Cov}[Y_1,Y_2]}{\sqrt{\mathbb{Var}[Y_1]}\sqrt{\mathbb{Var}[Y_2]}} = \dfrac{\sigma_{12}}{\sigma_1 \sigma_2}.$$`
]
]

--

- `\(-1 \leq \rho_{1,2} \leq 1\)`.

--

- The correlation coefficient is free of the measurement units of `\(Y_1\)` and `\(Y_2\)`.
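
---

## Bivariate normal distribution

- To make the density formula concrete, here is a minimal numerical check, a sketch in Python with made-up values for `\(\boldsymbol{\mu}\)` and `\(\Sigma\)`: we evaluate the density formula from earlier directly, and compare against `scipy.stats.multivariate_normal`.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Made-up bivariate example: sigma_1 = 1, sigma_2 = 2, rho = 0.5
mu = np.array([0.0, 1.0])
s1, s2, rho = 1.0, 2.0, 0.5
Sigma = np.array([[s1**2,         rho * s1 * s2],
                  [rho * s1 * s2, s2**2        ]])
y = np.array([0.5, 2.0])

# Density formula from the slide:
# (2 pi)^(-p/2) |Sigma|^(-1/2) exp{-(1/2) (y - mu)^T Sigma^(-1) (y - mu)}
p = len(mu)
diff = y - mu
quad = diff @ np.linalg.solve(Sigma, diff)
dens = ((2 * np.pi) ** (-p / 2)
        * np.linalg.det(Sigma) ** (-0.5)
        * np.exp(-0.5 * quad))

# The two values should agree
print(dens, multivariate_normal(mean=mu, cov=Sigma).pdf(y))
```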

---

## Back to the multivariate normal

- The multivariate normal has many special properties, which we will see as we continue to work with the distribution.

--

- First, the dependence between any `\(Y_j\)` and `\(Y_k\)` does not depend on the other `\(p-2\)` variables.

--

- Second, while in general **independence implies zero covariance**, for the normal family the converse is also true. That is, **zero covariance also implies independence**.

--

- Thus, the covariance matrix `\(\Sigma\)` carries a lot of information about marginal relationships, especially **marginal independence**.

--

- If `\(\boldsymbol{\epsilon} = (\epsilon_1, \ldots, \epsilon_p)^T \sim \mathcal{N}_p(\boldsymbol{0}, \boldsymbol{I}_p)\)`, that is, `\(\epsilon_1, \ldots, \epsilon_p \overset{iid}{\sim} \mathcal{N}(0,1)\)`, then
.block[
`$$\boldsymbol{Y} = \boldsymbol{\mu} + A\boldsymbol{\epsilon} \ \Rightarrow \ \boldsymbol{Y} \sim \mathcal{N}_p(\boldsymbol{\mu}, \Sigma)$$`
]
holds for any matrix square root `\(A\)` of `\(\Sigma\)`, that is, any `\(A\)` with `\(AA^T = \Sigma\)` (see Cholesky decomposition); we will use this construction in the simulation check after the conditional distributions below.

---

## Conditional distributions

- Partition `\(\boldsymbol{Y} = (Y_1,\ldots,Y_p)^T\)` as
.block[
.small[
`\begin{eqnarray*}
\boldsymbol{Y} = \begin{pmatrix}\boldsymbol{Y}_1\\ \boldsymbol{Y}_2 \end{pmatrix} & \sim & \mathcal{N}_p\left[\left(\begin{array}{c} \boldsymbol{\mu}_1\\ \boldsymbol{\mu}_2 \end{array}\right),\left(\begin{array}{cc} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{array}\right)\right],\\
\end{eqnarray*}`
]
]
where

  + `\(\boldsymbol{Y}_1\)` and `\(\boldsymbol{\mu}_1\)` are `\(q \times 1\)`,

  + `\(\boldsymbol{Y}_2\)` and `\(\boldsymbol{\mu}_2\)` are `\((p-q) \times 1\)`,

  + `\(\Sigma_{11}\)` is `\(q \times q\)`, and

  + `\(\Sigma_{22}\)` is `\((p-q) \times (p-q)\)`, with `\(\Sigma_{22} > 0\)` (positive definite).

--

- Then,
.block[
`$$\boldsymbol{Y}_1 | \boldsymbol{Y}_2 = \boldsymbol{y}_2 \sim \mathcal{N}_q\left(\boldsymbol{\mu}_1 + \Sigma_{12}\Sigma_{22}^{-1} (\boldsymbol{y}_2-\boldsymbol{\mu}_2), \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\right).$$`
]

--

- The marginal distributions are once again normal, that is,
.block[
`$$\boldsymbol{Y}_1 \sim \mathcal{N}_q\left(\boldsymbol{\mu}_1, \Sigma_{11}\right); \ \ \ \boldsymbol{Y}_2 \sim \mathcal{N}_{p-q}\left(\boldsymbol{\mu}_2, \Sigma_{22}\right).$$`
]

---

## Conditional distributions

- In the bivariate normal case with
.block[
.small[
`\begin{eqnarray*}
\boldsymbol{Y} = \begin{pmatrix}Y_1\\ Y_2 \end{pmatrix} & \sim & \mathcal{N}_2\left[\boldsymbol{\mu} = \left(\begin{array}{c} \mu_1\\ \mu_2 \end{array}\right),\Sigma = \left(\begin{array}{cc} \sigma_{11} = \sigma^2_1 & \sigma_{12} \\ \sigma_{21} & \sigma_{22} = \sigma^2_2 \end{array}\right)\right],\\
\end{eqnarray*}`
]
]
we have
.block[
.small[
`$$Y_1 | Y_2 = y_2 \sim \mathcal{N}\left(\mu_1 + \dfrac{\sigma_{12}}{\sigma_2^2} (y_2-\mu_2), \ \sigma_1^2 - \dfrac{\sigma_{12}^2}{\sigma_2^2}\right),$$`
]
]

--

which can also be written as
.block[
.small[
`$$Y_1 | Y_2 = y_2 \sim \mathcal{N}\left(\mu_1 + \dfrac{\sigma_{1}}{\sigma_2} \rho (y_2-\mu_2), \ (1-\rho^2) \sigma_1^2 \right).$$`
]
]
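
---

## Conditional distributions

- We can check these formulas by simulation, a sketch in Python with made-up parameter values: draw from the joint distribution using the `\(\boldsymbol{Y} = \boldsymbol{\mu} + A\boldsymbol{\epsilon}\)` construction from earlier, with `\(A\)` the Cholesky factor of `\(\Sigma\)`, then compare draws with `\(Y_2 \approx y_2\)` to the theoretical conditional moments.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up bivariate parameters: sigma_1 = 1, sigma_2 = 2, sigma_12 = 0.6
mu = np.array([1.0, 2.0])
Sigma = np.array([[1.0, 0.6],
                  [0.6, 4.0]])

# Sample via Y = mu + A eps, where A A^T = Sigma (Cholesky factor)
A = np.linalg.cholesky(Sigma)
eps = rng.standard_normal((200_000, 2))
Y = mu + eps @ A.T

# Theoretical moments of Y_1 | Y_2 = y2 from the slide
y2 = 3.0
cond_mean = mu[0] + (Sigma[0, 1] / Sigma[1, 1]) * (y2 - mu[1])
cond_var = Sigma[0, 0] - Sigma[0, 1] ** 2 / Sigma[1, 1]

# Empirical check: keep only draws whose Y_2 lands near y2
near = np.abs(Y[:, 1] - y2) < 0.05
print(cond_mean, Y[near, 0].mean())   # should be close
print(cond_var, Y[near, 0].var())     # should be close
```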

---

## Multivariate normal likelihood

- Suppose `\(\boldsymbol{Y}_i = (Y_{i1},\ldots,Y_{ip})^T \sim \mathcal{N}_p(\boldsymbol{\theta}, \Sigma)\)`, `\(i = 1, \ldots, n\)`.

--

- Write `\(\boldsymbol{Y} = (\boldsymbol{y}_1,\ldots,\boldsymbol{y}_n)^T\)`. The resulting likelihood can then be written as
.block[
.small[
$$
`\begin{split}
p(\boldsymbol{Y} | \boldsymbol{\theta}, \Sigma) & = \prod^n_{i=1} (2\pi)^{-\frac{p}{2}} \left|\Sigma\right|^{-\frac{1}{2}} \ \textrm{exp} \left\{-\dfrac{1}{2} (\boldsymbol{y}_i - \boldsymbol{\theta})^T \Sigma^{-1} (\boldsymbol{y}_i - \boldsymbol{\theta})\right\}\\
\\
& \propto \left|\Sigma\right|^{-\frac{n}{2}} \ \textrm{exp} \left\{-\dfrac{1}{2} \sum^n_{i=1} (\boldsymbol{y}_i - \boldsymbol{\theta})^T \Sigma^{-1} (\boldsymbol{y}_i - \boldsymbol{\theta})\right\}.
\end{split}`
$$
]
]

--

- It will be super useful to be able to write this likelihood in two different formulations, depending on whether we care about the posterior of `\(\boldsymbol{\theta}\)` or of `\(\Sigma\)`.

---

## Multivariate normal likelihood

- For inference on `\(\boldsymbol{\theta}\)`, it is convenient to write `\(p(\boldsymbol{Y} | \boldsymbol{\theta}, \Sigma)\)` as
.block[
.small[
$$
`\begin{split}
p(\boldsymbol{Y} | \boldsymbol{\theta}, \Sigma) & \propto \underbrace{\left|\Sigma\right|^{-\frac{n}{2}}}_{\text{does not involve } \boldsymbol{\theta}} \ \ \cdot \ \ \textrm{exp} \left\{-\dfrac{1}{2} \sum^n_{i=1} (\boldsymbol{y}_i - \boldsymbol{\theta})^T \Sigma^{-1} (\boldsymbol{y}_i - \boldsymbol{\theta})\right\}\\
& \propto \textrm{exp} \left\{-\dfrac{1}{2} \sum^n_{i=1} (\boldsymbol{y}_i^T - \boldsymbol{\theta}^T) \Sigma^{-1} (\boldsymbol{y}_i - \boldsymbol{\theta})\right\}\\
& = \textrm{exp} \left\{-\dfrac{1}{2} \sum^n_{i=1} \left[\underbrace{\boldsymbol{y}_i^T\Sigma^{-1}\boldsymbol{y}_i}_{\text{does not involve } \boldsymbol{\theta}} - \ \ \ \underbrace{ \boldsymbol{y}_i^T\Sigma^{-1}\boldsymbol{\theta} - \boldsymbol{\theta}^T\Sigma^{-1}\boldsymbol{y}_i}_{\text{same term}} \ \ \ + \ \ \ \boldsymbol{\theta}^T\Sigma^{-1}\boldsymbol{\theta} \right] \right\}\\
& \propto \textrm{exp} \left\{-\dfrac{1}{2} \sum^n_{i=1} \left[\boldsymbol{\theta}^T\Sigma^{-1}\boldsymbol{\theta} -2\boldsymbol{\theta}^T\Sigma^{-1}\boldsymbol{y}_i \right] \right\}\\
& = \textrm{exp} \left\{-\dfrac{1}{2} \sum^n_{i=1} \boldsymbol{\theta}^T\Sigma^{-1}\boldsymbol{\theta} - \dfrac{1}{2}\sum^n_{i=1} (-2)\boldsymbol{\theta}^T\Sigma^{-1}\boldsymbol{y}_i \right\}\\
& = \textrm{exp} \left\{-\dfrac{1}{2} n \boldsymbol{\theta}^T\Sigma^{-1}\boldsymbol{\theta} + \boldsymbol{\theta}^T\Sigma^{-1} \sum^n_{i=1}\boldsymbol{y}_i \right\}\\
& = \textrm{exp} \left\{-\dfrac{1}{2} \boldsymbol{\theta}^T(n\Sigma^{-1})\boldsymbol{\theta} + \boldsymbol{\theta}^T (n\Sigma^{-1} \bar{\boldsymbol{y}}) \right\},\\
\end{split}`
$$
]
]
where `\(\bar{\boldsymbol{y}} = (\bar{y}_1,\ldots,\bar{y}_p)^T\)` is the vector of sample means.
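
---

## Multivariate normal likelihood

- We can verify this algebra numerically: as functions of `\(\boldsymbol{\theta}\)`, the first and last exponents above should differ only by a constant that does not involve `\(\boldsymbol{\theta}\)`. A sketch in Python follows, where the data and `\(\Sigma\)` are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
Y = rng.normal(size=(n, p))        # made-up data matrix
Sigma = np.cov(Y, rowvar=False)    # any positive definite Sigma will do
Sinv = np.linalg.inv(Sigma)
ybar = Y.mean(axis=0)

def exponent_full(theta):
    # -(1/2) sum_i (y_i - theta)^T Sigma^(-1) (y_i - theta)
    R = Y - theta
    return -0.5 * np.sum((R @ Sinv) * R)

def exponent_theta(theta):
    # -(1/2) theta^T (n Sigma^(-1)) theta + theta^T (n Sigma^(-1) ybar)
    return -0.5 * n * theta @ Sinv @ theta + theta @ (n * Sinv @ ybar)

# The difference should be the same constant for every theta
for theta in (np.zeros(p), np.ones(p), rng.normal(size=p)):
    print(exponent_full(theta) - exponent_theta(theta))
```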

---

## Multivariate normal likelihood

- For inference on `\(\Sigma\)`, we need to rewrite the likelihood a bit.

--

- First, a few results from matrix algebra:

  1. `\(\text{tr}(\boldsymbol{A}) = \sum^p_{j=1}a_{jj}\)`, where `\(a_{jj}\)` is the `\(j\)`th diagonal element of a square `\(p \times p\)` matrix `\(\boldsymbol{A}\)`, and `\(\text{tr}(\cdot)\)` is the **trace function** (the sum of the diagonal elements).

--

  2. Cyclic property:
  `$$\text{tr}(\boldsymbol{A}\boldsymbol{B}\boldsymbol{C}) = \text{tr}(\boldsymbol{B}\boldsymbol{C}\boldsymbol{A}) = \text{tr}(\boldsymbol{C}\boldsymbol{A}\boldsymbol{B}),$$`
  given that the product `\(\boldsymbol{A}\boldsymbol{B}\boldsymbol{C}\)` is a square matrix.

--

  3. If `\(\boldsymbol{A}\)` is a `\(p \times p\)` matrix, then for a `\(p \times 1\)` vector `\(\boldsymbol{x}\)`,
  `$$\boldsymbol{x}^T \boldsymbol{A} \boldsymbol{x} = \text{tr}(\boldsymbol{x}^T \boldsymbol{A} \boldsymbol{x})$$`
  holds by (1), since `\(\boldsymbol{x}^T \boldsymbol{A} \boldsymbol{x}\)` is a scalar.

--

  4. `\(\text{tr}(\boldsymbol{A} + \boldsymbol{B}) = \text{tr}(\boldsymbol{A}) + \text{tr}(\boldsymbol{B})\)`.

---

## Multivariate normal likelihood

- Using these results, it is convenient to rewrite `\(p(\boldsymbol{Y} | \boldsymbol{\theta}, \Sigma)\)` as
.block[
.small[
$$
`\begin{split}
p(\boldsymbol{Y} | \boldsymbol{\theta}, \Sigma) & \propto \underbrace{\left|\Sigma\right|^{-\frac{n}{2}} \ \textrm{exp} \left\{-\dfrac{1}{2} \sum^n_{i=1} (\boldsymbol{y}_i - \boldsymbol{\theta})^T \Sigma^{-1} (\boldsymbol{y}_i - \boldsymbol{\theta})\right\}}_{\text{no algebra/change yet}}\\
& = \left|\Sigma\right|^{-\frac{n}{2}} \ \textrm{exp} \left\{-\dfrac{1}{2} \sum^n_{i=1} \underbrace{\text{tr}\left[(\boldsymbol{y}_i - \boldsymbol{\theta})^T \Sigma^{-1} (\boldsymbol{y}_i - \boldsymbol{\theta}) \right]}_{\text{by result 3}} \right\}\\
& = \left|\Sigma\right|^{-\frac{n}{2}} \ \textrm{exp} \left\{-\dfrac{1}{2} \sum^n_{i=1} \underbrace{\text{tr}\left[(\boldsymbol{y}_i - \boldsymbol{\theta})(\boldsymbol{y}_i - \boldsymbol{\theta})^T \Sigma^{-1} \right]}_{\text{by the cyclic property}} \right\}\\
& = \left|\Sigma\right|^{-\frac{n}{2}} \ \textrm{exp} \left\{-\dfrac{1}{2} \underbrace{\text{tr}\left[\sum^n_{i=1} (\boldsymbol{y}_i - \boldsymbol{\theta})(\boldsymbol{y}_i - \boldsymbol{\theta})^T \Sigma^{-1} \right]}_{\text{by result 4}} \right\}\\
& = \left|\Sigma\right|^{-\frac{n}{2}} \ \textrm{exp} \left\{-\dfrac{1}{2}\text{tr}\left[\boldsymbol{S}_\theta \Sigma^{-1} \right] \right\},\\
\end{split}`
$$
]
]
where `\(\boldsymbol{S}_\theta = \sum^n_{i=1}(\boldsymbol{y}_i - \boldsymbol{\theta})(\boldsymbol{y}_i - \boldsymbol{\theta})^T\)` is the residual sum of squares matrix; the appendix slide checks this trace identity numerically.

---

class: center, middle

# What's next?

### Move on to the readings for the next module!
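
---

## Appendix: checking the trace identity

- As a companion to the `\(\boldsymbol{S}_\theta\)` derivation, here is a minimal sketch in Python, with made-up data, confirming that `\(\sum^n_{i=1} (\boldsymbol{y}_i - \boldsymbol{\theta})^T \Sigma^{-1} (\boldsymbol{y}_i - \boldsymbol{\theta}) = \text{tr}\left[\boldsymbol{S}_\theta \Sigma^{-1}\right]\)`.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 4
Y = rng.normal(size=(n, p))        # made-up data
theta = rng.normal(size=p)
Sigma = np.cov(Y, rowvar=False)
Sinv = np.linalg.inv(Sigma)

# Left-hand side: sum of quadratic forms over observations
R = Y - theta
lhs = np.sum((R @ Sinv) * R)

# Right-hand side: tr(S_theta Sigma^(-1)),
# with S_theta = sum_i r_i r_i^T = R^T R
S_theta = R.T @ R
rhs = np.trace(S_theta @ Sinv)

print(lhs, rhs)   # identical up to floating-point error
```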