21.2. The Beta-Binomial Distribution#

As in the previous section, let \(X\) have the beta \((r, s)\) prior, and given \(X = p\) let \(S_n\) be the number of heads in the first \(n\) tosses of a \(p\)-coin.

All the calculations we carried out in the previous section were under the condition that \(S_n = k\), but we never needed to find the probability of this event. It was part of the constant that made the posterior density of \(X\) integrate to 1.

We can now find \(P(S_n = k)\) by writing the posterior density in two ways:

  • By recalling that it is the beta \((r+k, s+n-k)\) density:

\[ f_{X \vert S_n=k} (p) ~ = ~ C(r+k, s+n-k)p^{r+k-1}(1-p)^{s+n-k-1}, ~~~~ 0 < p < 1 \]
  • By using Bayes’ Rule:

\[ f_{X \vert S_n=k} (p) ~ = ~ \frac{C(r, s) p^{r-1}(1-p)^{s-1} \cdot \binom{n}{k} p^k (1-p)^{n-k}}{P(S_n = k)}, ~~~~ 0 < p < 1 \]

Now equate constants:

\[ \frac{C(r, s) \binom{n}{k}}{P(S_n = k)} ~ = ~ C(r+k, s+n-k) \]

21.2.1. Beta-Binomial Probabilities#

So for \(k\) in the range 0 through \(n\),

\[ P(S_n = k) ~ = ~ \binom{n}{k} \frac{C(r, s)}{C(r+k, s+n-k)} \]

where \(C(r,s)\) is the constant in the beta \((r, s)\) density, given by

\[ C(r, s) ~ = ~ \frac{\Gamma(r+s)}{\Gamma(r)\Gamma(s)} \]

That’s not as awful as it looks. A better way to think of the formula is

\[ P(S_n = k) ~ = ~ \binom{n}{k} \frac{\text{constant in the prior beta}}{\text{constant in the posterior beta given }k \text{ heads in } n \text{ tosses}} \]

This discrete distribution is called the beta-binomial distribution with parameters \(r\), \(s\), and \(n\). It is the distribution of the number of heads in \(n\) tosses of a coin that lands heads with a probability picked according to the beta \((r, s)\) distribution.
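The formula is easy to check numerically. Here is a minimal sketch using SciPy; `scipy.stats.betabinom` (available in SciPy 1.4 and later) implements this distribution directly, with parameters `n`, `a`, `b` corresponding to our \(n\), \(r\), and \(s\). The values of \(r\), \(s\), and \(n\) below are arbitrary choices for illustration.

```python
# Compare the formula above with scipy's built-in beta-binomial pmf.
from scipy.special import comb, beta as beta_fn
from scipy.stats import betabinom

r, s, n = 2, 3, 10   # arbitrary example parameters

def beta_binomial_pmf(k, r, s, n):
    # C(r, s) = 1/B(r, s), so the ratio of constants is B(r+k, s+n-k)/B(r, s)
    return comb(n, k) * beta_fn(r + k, s + n - k) / beta_fn(r, s)

for k in range(n + 1):
    assert abs(beta_binomial_pmf(k, r, s, n) - betabinom.pmf(k, n, r, s)) < 1e-10
```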

One \((r, s)\) pair is particularly interesting: \(r = s = 1\). That’s the case when \(X\) has the uniform prior. The distribution of \(S_n\) reduces to

\[ P(S_n = k ) ~ = ~ \frac{n!}{k!(n-k)!} \cdot \frac{1!}{0!0!} \cdot \frac{k!(n-k)!}{(n+1)!} ~ = ~ \frac{1}{n+1} \]

There’s no \(k\) in the answer! The conclusion is that if you choose \(p\) uniformly between 0 and 1 and toss a \(p\)-coin \(n\) times, the distribution of the number of heads is uniform on \(\{ 0, 1, 2, \ldots, n\}\).

If you choose \(p\) uniformly between 0 and 1, then the conditional distribution of \(S_n\) given that \(p\) was the selected value is binomial \((n, p)\). But the unconditional distribution of \(S_n\) is uniform.
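A short simulation illustrates this; it is a sketch only, with the number of repetitions chosen arbitrarily. Pick \(p\) uniformly, toss a \(p\)-coin \(n\) times, and repeat many times; every value of the number of heads appears about equally often.

```python
# Simulate the two-stage experiment: p ~ uniform(0, 1), then S_n ~ binomial(n, p).
import numpy as np

rng = np.random.default_rng(0)
n, reps = 5, 100_000

p = rng.uniform(0, 1, size=reps)     # the randomly chosen chance of heads
heads = rng.binomial(n, p)           # number of heads in n tosses, one per repetition

# Each value 0, 1, ..., n should appear with probability about 1/(n+1).
print(np.bincount(heads, minlength=n + 1) / reps)   # all entries near 1/6 for n = 5
```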

21.2.2. Checking by Integration#

If you prefer, you can find the distribution of \(S_n\) directly, by conditioning on \(X\).

\[\begin{split} \begin{align*} P(S_n = k) ~ &= \int_0^1 P(S_n = k \mid X = p)f_X(p)dp \\ \\ &= ~ \int_0^1 \binom{n}{k} p^k(1-p)^{n-k}C(r, s)p^{r-1}(1-p)^{s-1}dp \\ \\ &= ~ \binom{n}{k} C(r, s) \int_0^1 p^{r+k-1}(1-p)^{s+n-k-1} dp \\ \\ &= ~ \binom{n}{k} C(r, s) \frac{1}{C(r+k, s+n-k)} \end{align*} \end{split}\]
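If you would rather not carry out the integral by hand, you can confirm the calculation numerically for particular values. The sketch below applies `scipy.integrate.quad` to the integrand in the first line of the display; the choice of \(r\), \(s\), \(n\), and \(k\) is arbitrary.

```python
# Integrate P(S_n = k | X = p) f_X(p) over (0, 1) and compare with the closed form.
from scipy.integrate import quad
from scipy.special import comb, beta as beta_fn
from scipy.stats import beta as beta_dist, binom

r, s, n, k = 2, 3, 10, 4   # arbitrary example values

integrand = lambda p: binom.pmf(k, n, p) * beta_dist.pdf(p, r, s)
numeric, _ = quad(integrand, 0, 1)
closed_form = comb(n, k) * beta_fn(r + k, s + n - k) / beta_fn(r, s)
print(numeric, closed_form)   # the two agree to numerical precision
```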

21.2.3. Expectation#

Given \(X = p\), the conditional distribution of \(S_n\) is binomial \((n, p)\). Therefore

\[ E(S_n \mid X = p) ~ = ~ np \]

or, equivalently,

\[ E(S_n \mid X) ~ = ~ nX \]

By iteration,

\[ E(S_n) ~ = ~ E(nX) ~ = ~ nE(X) ~ = ~ n\frac{r}{r+s} \]

The expected proportion of heads in \(n\) tosses is

\[ E\big( \frac{S_n}{n} \big) ~ = ~ \frac{r}{r+s} \]

which is the expectation of the prior distribution of \(X\).
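As a quick sanity check, `scipy.stats.betabinom` agrees with this expectation formula; the \((r, s, n)\) triples below are arbitrary.

```python
# E(S_n) = n r / (r + s), checked against scipy's beta-binomial mean.
from scipy.stats import betabinom

for r, s, n in [(1, 1, 10), (2, 3, 50), (5, 5, 7)]:
    assert abs(betabinom.mean(n, r, s) - n * r / (r + s)) < 1e-10
```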

In the next section we will examine the long run behavior of this random proportion.

21.2.4. Endnote#

The unconditional probability \(P(S_n = k)\) appeared in the denominator of our calculation of the posterior density of \(X\) given \(S_n\). Because of the simplifications that result from using conjugate priors, we were able to calculate the denominator in a couple of different ways. But often the calculation can be intractable, especially in high dimensional settings. Methods of dealing with this problem are covered in more advanced courses.