The Beta-Binomial Distribution


As in the previous section, let $X$ have the beta $(r, s)$ prior, and given $X = p$ let $S_n$ be the number of heads in the first $n$ tosses of a $p$-coin.

All the calculations we carried out in the previous section were under the condition that $S_n = k$, but we never needed to find the probability of this event. It was part of the constant that made the posterior density of $X$ integrate to 1.

We can now find $P(S_n = k)$ by writing the posterior density in two ways:

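  • By recalling that it is the beta $(r+k, s+n-k)$ density:

$$
f_{X \mid S_n = k}(p) = C(r+k, s+n-k) \, p^{r+k-1}(1-p)^{s+n-k-1}, \qquad 0 < p < 1
$$

  • By using Bayes’ Rule:

$$
f_{X \mid S_n = k}(p) = \frac{C(r, s) \, p^{r-1}(1-p)^{s-1} \binom{n}{k} p^k (1-p)^{n-k}}{P(S_n = k)}, \qquad 0 < p < 1
$$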

Now equate constants:

$$
\frac{C(r, s) \binom{n}{k}}{P(S_n = k)} = C(r+k, s+n-k)
$$

Beta-Binomial Probabilities

So for $k$ in the range 0 through $n$,

$$
P(S_n = k) = \binom{n}{k} \frac{C(r, s)}{C(r+k, s+n-k)}
$$

where $C(r,s)$ is the constant in the beta $(r, s)$ density, given by

$$
C(r, s) = \frac{\Gamma(r+s)}{\Gamma(r)\Gamma(s)}
$$

This discrete distribution is called the beta-binomial distribution with parameters $r$, $s$, and $n$. It is the distribution of the number of heads in $n$ tosses of a coin that lands heads with a probability picked according to the beta $(r, s)$ distribution.
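The formula for the beta-binomial probabilities can be evaluated numerically. The following is a minimal sketch using only the Python standard library; the function names `log_beta_const` and `beta_binomial_pmf` and the example values $n = 10$, $r = 2$, $s = 3$ are illustrative choices, not part of the text. It works with logarithms of gamma functions to avoid overflow when the parameters are large.

```python
from math import comb, exp, lgamma


def log_beta_const(r, s):
    """Log of C(r, s) = Gamma(r+s) / (Gamma(r) * Gamma(s))."""
    return lgamma(r + s) - lgamma(r) - lgamma(s)


def beta_binomial_pmf(k, n, r, s):
    """P(S_n = k) = (n choose k) * C(r, s) / C(r+k, s+n-k) for k = 0, ..., n."""
    if k < 0 or k > n:
        return 0.0
    log_ratio = log_beta_const(r, s) - log_beta_const(r + k, s + n - k)
    return comb(n, k) * exp(log_ratio)


# Illustrative parameters (assumed for this example only)
n, r, s = 10, 2, 3
pmf = [beta_binomial_pmf(k, n, r, s) for k in range(n + 1)]
total = sum(pmf)  # should be 1 up to floating point error

for k, prob in enumerate(pmf):
    print(k, round(prob, 5))
```

Since the probabilities form a distribution on $0, 1, \ldots, n$, `total` should equal 1 up to rounding error, which is a quick sanity check on the formula.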

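One $(r, s)$ pair is particularly interesting: $r = s = 1$. That’s the case when $X$ has the uniform prior. The distribution of $S_n$ reduces to

$$
P(S_n = k) = \binom{n}{k} \frac{C(1, 1)}{C(k+1, n-k+1)} = \frac{n!}{k!(n-k)!} \cdot \frac{1!}{0!\,0!} \cdot \frac{k!(n-k)!}{(n+1)!} = \frac{1}{n+1}
$$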

There’s no $k$ in the answer! The conclusion is that if you choose $p$ uniformly between 0 and 1 and toss a $p$-coin $n$ times, the distribution of the number of heads is uniform on $\{0, 1, 2, \ldots, n\}$.

If you choose $p$ uniformly between 0 and 1, then the conditional distribution of $S_n$ given that $p$ was the selected value is binomial $(n, p)$. But the unconditional distribution of $S_n$ is uniform.
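A simulation can make the uniform result more concrete. The sketch below, which uses an arbitrary seed, $n = 4$ tosses, and 100,000 repetitions (all choices made here for illustration), picks $p$ uniformly, tosses a $p$-coin $n$ times, and records the number of heads. The empirical proportions should all be close to $1/(n+1)$.

```python
import random
from collections import Counter


def simulate_heads(n, rng):
    """Pick p uniformly on (0, 1), then count heads in n tosses of a p-coin."""
    p = rng.random()
    return sum(rng.random() < p for _ in range(n))


rng = random.Random(0)   # fixed seed so the run is reproducible
n, reps = 4, 100_000     # illustrative values

counts = Counter(simulate_heads(n, rng) for _ in range(reps))
proportions = [counts[k] / reps for k in range(n + 1)]

# Each entry should be near 1/(n+1) = 0.2
print(proportions)
```

Within each repetition the tosses are binomial $(n, p)$ for that repetition's $p$, yet the counts pooled across repetitions spread evenly over $0, 1, \ldots, n$.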

Checking by Integration

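If you prefer, you can find the distribution of $S_n$ directly, by conditioning on $X$. The last step below uses the fact that the beta $(r+k, s+n-k)$ density integrates to 1.

$$
\begin{aligned}
P(S_n = k) &= \int_0^1 P(S_n = k \mid X = p) f_X(p) \, dp \\
&= \int_0^1 \binom{n}{k} p^k (1-p)^{n-k} \, C(r, s) \, p^{r-1}(1-p)^{s-1} \, dp \\
&= \binom{n}{k} C(r, s) \int_0^1 p^{r+k-1}(1-p)^{s+n-k-1} \, dp \\
&= \binom{n}{k} \frac{C(r, s)}{C(r+k, s+n-k)}
\end{aligned}
$$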
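The agreement between the integral and the closed form can also be checked numerically. The sketch below approximates the integral of the binomial $(n, p)$ probability of $k$ heads against the beta $(r, s)$ density with a simple midpoint rule, and compares it with $\binom{n}{k} C(r, s) / C(r+k, s+n-k)$. The function names, the step count, and the parameter values are assumptions made for this example; the midpoint rule is used only because it is easy to read, and it assumes $r, s \ge 1$ so that the integrand is bounded.

```python
from math import comb, exp, lgamma


def beta_const(r, s):
    """C(r, s) = Gamma(r+s) / (Gamma(r) * Gamma(s))."""
    return exp(lgamma(r + s) - lgamma(r) - lgamma(s))


def pmf_by_formula(k, n, r, s):
    """Closed form: (n choose k) * C(r, s) / C(r+k, s+n-k)."""
    return comb(n, k) * beta_const(r, s) / beta_const(r + k, s + n - k)


def pmf_by_integration(k, n, r, s, steps=20_000):
    """Midpoint-rule approximation of the integral over p in (0, 1) of
    P(S_n = k | X = p) times the beta (r, s) density."""
    h = 1.0 / steps
    const = comb(n, k) * beta_const(r, s)
    total = 0.0
    for i in range(steps):
        p = (i + 0.5) * h  # midpoint of the i-th subinterval
        total += p ** (r + k - 1) * (1 - p) ** (s + n - k - 1)
    return const * total * h


n, r, s = 10, 2, 3  # illustrative parameters
max_gap = max(
    abs(pmf_by_formula(k, n, r, s) - pmf_by_integration(k, n, r, s))
    for k in range(n + 1)
)
print(max_gap)  # expected to be very small
```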


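Expectation

Given $X = p$, the conditional distribution of $S_n$ is binomial $(n, p)$. Therefore

$$
E(S_n \mid X = p) = np
$$

or, equivalently,

$$
E(S_n \mid X) = nX
$$

By iteration,

$$
E(S_n) = E(nX) = nE(X) = n\frac{r}{r+s}
$$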

The expected proportion of heads in $n$ tosses is

$$
E\left(\frac{S_n}{n}\right) = \frac{r}{r+s}
$$

which is the expectation of the prior distribution of $X$.
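The expectation $E(S_n) = nr/(r+s)$ can be confirmed exactly from the beta-binomial probabilities when $r$ and $s$ are positive integers, because then $C(r, s) = (r+s-1)!/((r-1)!(s-1)!)$ and all the arithmetic can be done with fractions. The sketch below makes that integer assumption, and its function names and parameter values are chosen only for illustration.

```python
from fractions import Fraction
from math import comb, factorial


def beta_const(r, s):
    """C(r, s) for positive integers r and s (assumption of this sketch):
    Gamma(r+s) / (Gamma(r) Gamma(s)) = (r+s-1)! / ((r-1)! (s-1)!)."""
    return Fraction(factorial(r + s - 1), factorial(r - 1) * factorial(s - 1))


def beta_binomial_pmf(k, n, r, s):
    """Exact P(S_n = k) = (n choose k) * C(r, s) / C(r+k, s+n-k)."""
    return comb(n, k) * beta_const(r, s) / beta_const(r + k, s + n - k)


n, r, s = 10, 2, 3  # illustrative parameters

# E(S_n) computed directly from the distribution
mean = sum(k * beta_binomial_pmf(k, n, r, s) for k in range(n + 1))

# Compare with n * r / (r + s), and the expected proportion with r / (r + s)
print(mean, Fraction(n * r, r + s))
print(mean / n, Fraction(r, r + s))
```

Using exact fractions avoids any question of rounding: the two quantities on each printed line are equal as rational numbers, not just approximately.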

In the next section we will examine the long-run behavior of this random proportion.


The unconditional probability $P(S_n = k)$ appeared in the denominator of our calculation of the posterior density of $X$ given $S_n$. Because of the simplifications that result from using conjugate priors, we were able to calculate the denominator in a couple of different ways. But often the calculation can be intractable, especially in high-dimensional settings. Methods of dealing with this problem are covered in more advanced courses.