class: center, middle, inverse, title-slide

# STA 360/602L: Module 2.7

## Gamma-Poisson model I

### Dr. Olanrewaju Michael Akande

---

## Poisson distribution recap

- `\(Y_1,\ldots,Y_n \overset{iid}{\sim} \textrm{Poisson}(\theta)\)` denotes that each `\(Y_i\)` is a .hlight[Poisson random variable].

--

- The Poisson distribution is commonly used to model count data, that is, the number of events occurring in a given time interval.

--

- Some examples: # children, # lifetime romantic partners, # songs on iPhone, # tumors on mouse, etc.

--

- The Poisson distribution is parameterized by `\(\theta\)`, and the pmf is given by
.block[
`$$\Pr[Y_i = y_i | \theta] = \dfrac{\theta^{y_i} e^{-\theta}}{y_i!}; \ \ \ \ y_i=0,1,2,\ldots; \ \ \ \ \theta > 0,$$`
]

  where
.block[
`$$\mathbb{E}[Y_i] = \mathbb{V}[Y_i] = \theta.$$`
]

--

- <div class="question">
What is the joint likelihood? What is the best guess (MLE) for the Poisson parameter? What is the sufficient statistic for the Poisson parameter?
</div>

---

## Gamma density recap

- The .hlight[gamma density] will be useful as a prior for parameters that are strictly positive.

--

- If `\(\theta \sim \textrm{Ga}(a,b)\)`, we have the pdf
.block[
.small[
`$$p(\theta) = \frac{b^a}{\Gamma(a)} \theta^{a-1}e^{-b\theta},$$`
]
]

  where `\(a\)` is known as the .hlight[shape parameter] and `\(b\)`, the .hlight[rate parameter].

--

- Another parameterization uses the .hlight[scale parameter] `\(\phi = 1/b\)` instead of `\(b\)`.

--

- Some properties:
  + `\(\mathbb{E}[\theta] = \dfrac{a}{b}\)`
  + `\(\mathbb{V}[\theta] = \dfrac{a}{b^2}\)`
  + `\(\textrm{Mode}[\theta] = \dfrac{a-1}{b} \ \ \textrm{for} \ \ a \geq 1\)`

---

## Gamma density

- If our prior guess of the expected count is `\(\mu\)` and we have a prior "scale" `\(\phi\)`, we can let
.block[
.small[
`$$\mathbb{E}[\theta] = \mu = \dfrac{a}{b}; \ \ \mathbb{V}[\theta] = \mu \phi = \dfrac{a}{b^2},$$`
]
]

  and solve for `\(a\)` and `\(b\)`: dividing the second equation by the first gives `\(1/b = \phi\)`, so that `\(b = 1/\phi\)` and `\(a = \mu/\phi\)`. We can play the same game if we instead have a prior variance or standard deviation.

--

- More properties:
  + If `\(\theta_1,\ldots,\theta_p \overset{ind}{\sim} \textrm{Ga}(a_i,b)\)`, then `\(\sum_i \theta_i \sim \textrm{Ga}(\sum_i a_i,b)\)`.

--

  + If `\(\theta \sim \textrm{Ga}(a,b)\)`, then for any `\(c > 0\)`, `\(c\theta \sim \textrm{Ga}(a,b/c)\)`.

--

  + If `\(\theta \sim \textrm{Ga}(a,b)\)`, then `\(1/\theta\)` has an .hlight[Inverse-Gamma distribution].

--

*We'll take advantage of these soon!*

---

## Example gamma distributions

<img src="2-7-gamma-poisson_files/figure-html/unnamed-chunk-2-1.png" style="display: block; margin: auto;" />

*R lets you specify either the rate or the scale parameter, so always make sure you are using the right one when calling `dgamma`, `rgamma`, etc.!*

---

## Gamma-Poisson

Generally, it turns out that .hlight[Poisson data]:
.block[
.small[
`$$p(y_i| \theta): y_1,\ldots,y_n \overset{iid}{\sim} \textrm{Poisson}(\theta)$$`
]
]

--

`\(+\)` .hlight[Gamma prior]:
.block[
.small[
`$$\pi(\theta) = \frac{b^a}{\Gamma(a)} \theta^{a-1}e^{-b\theta} = \textrm{Ga}(a,b)$$`
]
]

--

`\(\Rightarrow\)` .hlight[Gamma posterior]:
.block[
.small[
`$$\pi(\theta | \{y_i\}): \theta | \{y_i\} \sim \textrm{Ga}\left(a+\sum y_i, \ b+n\right).$$`
]
]

That is, updating a gamma prior with a Poisson likelihood leads to a gamma posterior -- we once again have conjugacy.

<div class="question">
Can we derive the posterior distribution and its parameters? Let's do some work on the board.
</div>
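---

## Gamma-Poisson: a quick numerical check

Before (or after) doing the algebra, we can check the conjugacy claim numerically. Below is a minimal R sketch, not part of the Hoff material: the seed, the grid, and the simulated counts are all made up purely for illustration. It compares the exact `\(\textrm{Ga}(a+\sum y_i, \ b+n)\)` density to a brute-force evaluation of prior `\(\times\)` likelihood on a grid.

```r
set.seed(360)
a <- 2; b <- 1                      # Ga(a, b) prior (illustrative values)
y <- rpois(20, lambda = 3)          # simulated Poisson counts, n = 20
theta <- seq(0.01, 8, by = 0.01)    # grid of theta values

# Brute force: prior x likelihood at each grid point, then normalize
post_grid <- dgamma(theta, a, rate = b) *
  sapply(theta, function(t) prod(dpois(y, t)))
post_grid <- post_grid / (sum(post_grid) * 0.01)

# Conjugate answer: Ga(a + sum(y), b + n)
post_exact <- dgamma(theta, a + sum(y), rate = b + length(y))

max(abs(post_grid - post_exact))    # tiny, up to grid error
```

The two densities agree up to grid error, which is exactly what the conjugacy result promises.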
---

## Gamma-Poisson

- With `\(\pi(\theta | \{y_i\}) = \textrm{Ga}(a+\sum y_i,b+n)\)`, we can think of
  + `\(b\)` as the "number of prior observations" from some past data, and
  + `\(a\)` as the "sum of the counts from the `\(b\)` prior observations".

--

- Using the properties of the gamma distribution, we have
  + `\(\mathbb{E}[\theta | \{y_i\}] = \dfrac{a+\sum y_i}{b+n}\)`
  + `\(\mathbb{V}[\theta | \{y_i\}] = \dfrac{a+\sum y_i}{(b+n)^2}\)`

--

- So, as we did with the beta-binomial, we can once again write the posterior expectation as a weighted average of prior and data (see the appendix slide at the end for a quick numerical check).
.block[
.small[
`$$\mathbb{E}(\theta | \{y_i\}) = \dfrac{a+\sum y_i}{b+n} = \dfrac{b}{b+n} \times \textrm{prior mean} + \dfrac{n}{b+n} \times \textrm{MLE}.$$`
]
]

--

- Again, as we get more and more data, the majority of our information about `\(\theta\)` comes from the data as opposed to the prior.

---

## Hoff example: birth rates

- Survey data on educational attainment and number of children of 155 forty-year-old women during the 1990s.

--

- These women were in their 20s during the 1970s, a period of historically low fertility rates in the US.

--

- **Goal**: compare the birth rate `\(\theta_1\)` for women without bachelor's degrees to the rate `\(\theta_2\)` for women with bachelor's degrees.

--

- **Data**:
  + 111 women without a bachelor's degree had 217 children `\((\bar{y}_1 = 1.95)\)`;
  + 44 women with bachelor's degrees had 66 children `\((\bar{y}_2 = 1.50)\)`.

--

- Based on the data alone, it looks like `\(\theta_1\)` should be greater than `\(\theta_2\)`. But...how sure are we?

--

- **Priors**: `\(\theta_1, \theta_2 \sim \textrm{Ga}(2,1)\)` (not much prior information; equivalent to one prior woman with two children). Posterior means will be close to the MLEs.

---

## Hoff example: birth rates

- Then,
  + `\(\theta_1 | \{n_1=111, \sum y_{i,1}=217\} \sim \textrm{Ga}(2+217,1+111) = \textrm{Ga}(219,112).\)`
  + `\(\theta_2 | \{n_2=44, \sum y_{i,2}=66\} \sim \textrm{Ga}(2+66,1+44) = \textrm{Ga}(68,45).\)`

--

- Use R to calculate posterior means and 95% credible intervals for `\(\theta_1\)` and `\(\theta_2\)`.

```r
a=2; b=1                               #prior
n1=111; sumy1=217; n2=44; sumy2=66     #data
(a+sumy1)/(b+n1); (a+sumy2)/(b+n2)     #posterior means
qgamma(c(0.025, 0.975), a+sumy1, b+n1) #95% CI for theta_1
qgamma(c(0.025, 0.975), a+sumy2, b+n2) #95% CI for theta_2
```

--

- Posterior means: `\(\mathbb{E}[\theta_1 | \{y_{i,1}\}] = 1.955\)` and `\(\mathbb{E}[\theta_2 | \{y_{i,2}\}] = 1.511\)`.

--

- 95% credible intervals:
  + `\(\theta_1\)`: [1.71, 2.22].
  + `\(\theta_2\)`: [1.17, 1.89].

---

## Hoff example: birth rates

Prior and posteriors:

<img src="2-7-gamma-poisson_files/figure-html/unnamed-chunk-4-1.png" style="display: block; margin: auto;" />

---

## Hoff example: birth rates

- The posteriors indicate considerable evidence that birth rates are higher among women without bachelor's degrees.

--

- This confirms what we observed in the raw data.

--

- Using sampling, we can quickly approximate `\(\Pr(\theta_1 > \theta_2 | \textrm{data})\)`.

```r
mean(rgamma(10000, 219, 112) > rgamma(10000, 68, 45))
```

  We get `\(\Pr(\theta_1 > \theta_2 | \textrm{data}) \approx 0.97\)`.

--

- Why/how does this work?

--

- .hlight[Monte Carlo approximation] coming soon!

--

- Clearly, that probability will change with different priors.

---

class: center, middle

# What's next?

### Move on to the readings for the next module!
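---

## Appendix: checking the weighted average

As referenced earlier, here is a small sketch verifying the weighted-average form of the posterior mean. This is not from Hoff; it simply plugs in the prior and the data for the women without bachelor's degrees from the birth-rates example.

```r
a <- 2; b <- 1         # Ga(a, b) prior
n <- 111; sumy <- 217  # women without bachelor's degrees: n, sum of counts

post_mean  <- (a + sumy) / (b + n)     # direct formula
prior_mean <- a / b                    # a/b = 2
mle        <- sumy / n                 # ybar = 1.95
weighted   <- (b / (b + n)) * prior_mean +
              (n / (b + n)) * mle      # weighted-average form

c(post_mean, weighted)  # both equal 1.955; the two expressions coincide
```

With `\(n = 111\)` observations against `\(b = 1\)` prior observation, the data weight is `\(n/(b+n) \approx 0.99\)`, which is why the posterior mean sits so close to the MLE.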