class: center, middle, inverse, title-slide

# STA 360/602L: Module 7.3

## The Metropolis-Hastings algorithm

### Dr. Olanrewaju Michael Akande

---

## Metropolis-Hastings algorithm

- Gibbs sampling and the Metropolis algorithm are special cases of the .hlight[Metropolis-Hastings algorithm].

--

- The Metropolis-Hastings algorithm is more general in that it allows arbitrary proposal distributions.

--

- These can be symmetric around the current values, full conditionals, or something else entirely, as long as they depend on our sequence only through the most current values.

--

- That last restriction is what ensures the sequence is a Markov chain!

--

- In terms of how this works, the only real change from before is that the acceptance probability must now also incorporate the proposal density whenever the proposal is not symmetric.

---

## Metropolis-Hastings algorithm

- Suppose our target distribution is `\(p_0(\theta)\)`. The algorithm proceeds as follows:

1. Given a current draw `\(\theta^{(s)}\)`, propose a new value `\(\theta^\star \sim g_{\theta}[\theta^\star | \theta^{(s)}]\)`.

--

2. Compute the acceptance ratio
  .block[
$$
`\begin{split}
r & = \dfrac{p_0(\theta^\star)}{p_0(\theta^{(s)})} \times \dfrac{g_{\theta}[\theta^{(s)} | \theta^\star]}{g_{\theta}[\theta^\star | \theta^{(s)}]}.
\end{split}`
$$
  ]

--

3. Sample `\(u \sim U(0,1)\)` and set
  .block[
`\begin{eqnarray*}
\theta^{(s+1)} = \left\{ \begin{array}{ll}
\theta^\star & \quad \text{if} \quad u < r \\
\theta^{(s)} & \quad \text{otherwise} \\
\end{array} \right. .
\end{eqnarray*}`
  ]

---

## Metropolis-Hastings algorithm

- If `\(p_0(\theta)\)` corresponds to a posterior distribution `\(\pi(\theta | y)\)`, as is often the case for us, then we have

1. Propose a new value `\(\theta^\star \sim g_{\theta}[\theta^\star | \theta^{(s)}]\)`.

--

2. Compute the acceptance ratio
  .block[
$$
`\begin{split}
r & = \dfrac{\pi(\theta^\star | y)}{\pi(\theta^{(s)} | y)} \times \dfrac{g_{\theta}[\theta^{(s)} | \theta^\star]}{g_{\theta}[\theta^\star | \theta^{(s)}]}\\
& = \dfrac{p(y | \theta^\star) \pi(\theta^\star)}{p(y | \theta^{(s)}) \pi(\theta^{(s)})} \times \dfrac{g_{\theta}[\theta^{(s)} | \theta^\star]}{g_{\theta}[\theta^\star | \theta^{(s)}]}.
\end{split}`
$$
  ]

--

3. Sample `\(u \sim U(0,1)\)` and set
  .block[
`\begin{eqnarray*}
\theta^{(s+1)} = \left\{ \begin{array}{ll}
\theta^\star & \quad \text{if} \quad u < r \\
\theta^{(s)} & \quad \text{otherwise} \\
\end{array} \right. .
\end{eqnarray*}`
  ]

---

## Metropolis-Hastings algorithm

- Suppose our target distribution is `\(p_0(u,v)\)`, a bivariate distribution for random variables `\(U\)` and `\(V\)`.

--

- For example, `\(p_0(u,v)\)` could be the joint posterior distribution for `\(U\)` and `\(V\)`.

--

- Two options:
  + Define one joint proposal density `\(g_{u,v}[u^\star, v^\star | u^{(s)}, v^{(s)}]\)` for `\(U\)` and `\(V\)` if possible; or
  + Define two proposal densities, one for `\(U\)` and the other for `\(V\)`. That is, `\(g_u[u^\star | u^{(s)}, v^{(s)}]\)` and `\(g_v[v^\star | u^{(s)}, v^{(s)}]\)`.

--

- The first option follows directly from the main algorithm and often works very well when possible. The second option needs a little modification, shown on the next slide.

---

## Metropolis-Hastings algorithm

1. Update `\(U\)`: first, sample `\(u^\star \sim g_u[u^\star | u^{(s)}, v^{(s)}]\)`. Then,
  + Compute the acceptance ratio
    `$$r = \dfrac{p_0(u^\star, v^{(s)})}{p_0(u^{(s)}, v^{(s)})} \times \dfrac{g_u[u^{(s)} | u^\star, v^{(s)}]}{g_u[u^\star | u^{(s)}, v^{(s)}]}.$$`
  + Sample `\(w \sim U(0,1)\)`. Set `\(u^{(s+1)}\)` to `\(u^\star\)` if `\(w < r\)`, or set `\(u^{(s+1)}\)` to `\(u^{(s)}\)` otherwise.

--

2. Update `\(V\)`: first, sample `\(v^\star \sim g_v[v^\star | u^{(s+1)}, v^{(s)}]\)`. Then,
  + Compute the acceptance ratio
    `$$r = \dfrac{p_0(u^{(s+1)}, v^\star)}{p_0(u^{(s+1)}, v^{(s)})} \times \dfrac{g_v[v^{(s)} | u^{(s+1)}, v^\star]}{g_v[v^\star | u^{(s+1)}, v^{(s)}]}.$$`
  + Sample `\(w \sim U(0,1)\)`. Set `\(v^{(s+1)}\)` to `\(v^\star\)` if `\(w < r\)`, or set `\(v^{(s+1)}\)` to `\(v^{(s)}\)` otherwise.
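---

## Metropolis-Hastings algorithm in R

- A minimal R sketch of the single-parameter algorithm from a few slides back. The choices here are purely illustrative: the target `\(p_0(\theta)\)` is a Gamma(3,2) density (standing in for any target we can evaluate up to a constant), and the proposal is a log-normal random walk, which is asymmetric, so the proposal ratio does not cancel. Everything is computed on the log scale for numerical stability.

```r
# Illustrative target: Gamma(3, 2) density, standing in for any p0(theta)
# we can evaluate up to a normalizing constant
log_p0 <- function(theta) dgamma(theta, shape = 3, rate = 2, log = TRUE)

S <- 10000; delta <- 0.5    # number of draws; proposal scale (a tuning choice)
theta <- rep(NA, S); theta[1] <- 1

set.seed(360)
for (s in 2:S) {
  # asymmetric proposal: log-normal random walk keeps theta > 0
  theta_star <- rlnorm(1, meanlog = log(theta[s - 1]), sdlog = delta)

  # log acceptance ratio: log target ratio plus log proposal ratio
  log_r <- log_p0(theta_star) - log_p0(theta[s - 1]) +
    dlnorm(theta[s - 1], log(theta_star), delta, log = TRUE) -
    dlnorm(theta_star, log(theta[s - 1]), delta, log = TRUE)

  # accept theta_star with probability min(1, r)
  theta[s] <- if (log(runif(1)) < log_r) theta_star else theta[s - 1]
}

mean(theta)   # should be close to the true mean, 3/2 = 1.5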
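---

## Metropolis-Hastings algorithm in R

- A sketch of the two-block version from the previous slide, again with illustrative choices: the joint target sets `\(U \sim \text{Gamma}(2,1)\)` and `\(V | U \sim \text{Gamma}(2,U)\)`, and each block uses its own log-normal random-walk proposal.

```r
# Illustrative joint target, evaluated on the log scale:
# U ~ Gamma(2, 1) and V | U ~ Gamma(2, U), both positive
log_p0 <- function(u, v) {
  dgamma(u, shape = 2, rate = 1, log = TRUE) +
    dgamma(v, shape = 2, rate = u, log = TRUE)
}

S <- 10000; delta <- 0.5
u <- v <- rep(NA, S); u[1] <- v[1] <- 1

set.seed(602)
for (s in 2:S) {
  # step 1: update U, holding v at its current value
  u_star <- rlnorm(1, log(u[s - 1]), delta)
  log_r <- log_p0(u_star, v[s - 1]) - log_p0(u[s - 1], v[s - 1]) +
    dlnorm(u[s - 1], log(u_star), delta, log = TRUE) -
    dlnorm(u_star, log(u[s - 1]), delta, log = TRUE)
  u[s] <- if (log(runif(1)) < log_r) u_star else u[s - 1]

  # step 2: update V, using the just-updated u
  v_star <- rlnorm(1, log(v[s - 1]), delta)
  log_r <- log_p0(u[s], v_star) - log_p0(u[s], v[s - 1]) +
    dlnorm(v[s - 1], log(v_star), delta, log = TRUE) -
    dlnorm(v_star, log(v[s - 1]), delta, log = TRUE)
  v[s] <- if (log(runif(1)) < log_r) v_star else v[s - 1]
}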
---

## Metropolis-Hastings algorithm

- The acceptance ratio looks like what we had before, except with an additional factor.

--

- That factor is the ratio of the probability of generating the current value from the proposed value to the probability of generating the proposed value from the current value (the ratio equals one for a symmetric proposal -- see homework!).

--

- Also, it is often the case that full conditionals are available for some parameters but not all.

--

- A very useful trick in that setting is to combine Gibbs and Metropolis updates. See the next module.

---

class: center, middle

# What's next?

### Move on to the readings for the next module!