Odds Ratios


Binomial (n,p) probabilities involve powers and factorials, both of which are difficult to compute when n is large. This section shows a simpler way to compute the entire distribution. The result also helps us understand the shape of binomial histograms.

Consecutive Odds Ratios

Fix n and p, and let P(k) be the binomial (n,p) probability of k. That is, let P(k) be the chance of getting k successes in n independent trials with probability p of success on each trial.

For $k \ge 1$, define the $k$th consecutive odds ratio

$$R(k) = \frac{P(k)}{P(k-1)}$$

To see how this helps us calculate each P(k) without having to calculate factorials and powers each time, notice that

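$$
\begin{aligned}
P(0) &= (1-p)^n \\
P(1) &= P(0) \cdot \frac{P(1)}{P(0)} = P(0)R(1) \\
P(2) &= P(1) \cdot \frac{P(2)}{P(1)} = P(0)R(1)R(2)
\end{aligned}
$$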

and so on.

How is this more illuminating than plugging into the binomial formula? To see this, fix $k \ge 1$ and calculate the ratio $R(k)$.

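$$
\begin{aligned}
R(k) &= \frac{\binom{n}{k} p^k (1-p)^{n-k}}{\binom{n}{k-1} p^{k-1} (1-p)^{n-k+1}} \\
&= \frac{n-k+1}{k} \cdot \frac{p}{1-p} \quad \text{(after cancellation)} \\
&= \left(\frac{n+1}{k} - 1\right) \frac{p}{1-p}
\end{aligned}
$$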
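As a quick numerical check of the cancellation, here is a sketch that compares $P(k)/P(k-1)$, computed from the binomial formula with `math.comb`, against $\frac{n-k+1}{k}\cdot\frac{p}{1-p}$. The helper names `pmf` and `ratio_formula` are hypothetical, and $n = 23$, $p = 0.7$ are simply the values this section uses later.

```python
from math import comb

def pmf(n, p, k):
    """Binomial (n, p) probability of k, straight from the formula."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def ratio_formula(n, p, k):
    """R(k) = (n - k + 1)/k * p/(1 - p), for 1 <= k <= n."""
    return (n - k + 1) / k * p / (1 - p)

n, p = 23, 0.7
for k in range(1, n + 1):
    direct = pmf(n, p, k) / pmf(n, p, k - 1)
    # The two computations agree up to floating-point rounding.
    assert abs(direct - ratio_formula(n, p, k)) <= 1e-9 * direct
print("ratio formula matches P(k)/P(k-1) for k = 1, ...,", n)
```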

First, notice that the formulas for $R(k)$ are simple. For example, if $n \ge 3$, it is easy to calculate $P(3)$ as

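$$P(3) = (1-p)^n \cdot \frac{n-1+1}{1}\cdot\frac{p}{1-p} \cdot \frac{n-2+1}{2}\cdot\frac{p}{1-p} \cdot \frac{n-3+1}{3}\cdot\frac{p}{1-p}$$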

No factorials involved.
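The same idea produces the whole distribution. Below is a minimal sketch, assuming $0 < p < 1$; the function name `binomial_pmf_by_ratios` is ours (hypothetical, not from any library). It starts at $P(0) = (1-p)^n$ and repeatedly multiplies by $R(k)$, and uses `math.comb` only as an independent check.

```python
from math import comb

def binomial_pmf_by_ratios(n, p):
    """Return [P(0), ..., P(n)] for binomial (n, p), assuming 0 < p < 1.

    Starts from P(0) = (1 - p)**n and uses P(k) = P(k-1) * R(k),
    so no factorials are computed.
    """
    odds = p / (1 - p)
    probs = [(1 - p) ** n]
    for k in range(1, n + 1):
        probs.append(probs[-1] * (n - k + 1) / k * odds)
    return probs

n, p = 23, 0.7
probs = binomial_pmf_by_ratios(n, p)
# Independent check against the binomial formula, which does use factorials.
for k, prob in enumerate(probs):
    assert abs(prob - comb(n, k) * p**k * (1 - p)**(n - k)) < 1e-12
print(round(sum(probs), 10))   # the probabilities add up to 1
```

One caveat of this sketch: for very large $n$ the starting value $(1-p)^n$ can underflow to 0 in floating point, and then every product is 0. A sturdier variant would start from an arbitrary value at the mode, apply the ratios in both directions, and normalize at the end.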

Shapes of Binomial Histograms

Now observe that comparing R(k) to 1 tells us whether the histogram is going up, staying level, or going down at k.

$$
\begin{aligned}
R(k) > 1 &\iff P(k) > P(k-1) \\
R(k) = 1 &\iff P(k) = P(k-1) \\
R(k) < 1 &\iff P(k) < P(k-1)
\end{aligned}
$$
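The three cases can be turned into a small classifier. In this sketch (function names hypothetical) we pass $p$ as a `fractions.Fraction`, so $R(k)$ is computed exactly; with a floating-point $p$, an exactly level step could be misclassified because of rounding.

```python
from fractions import Fraction

def odds_ratio(n, p, k):
    """R(k) = (n - k + 1)/k * p/(1 - p), exact when p is a Fraction."""
    return Fraction(n - k + 1, k) * p / (1 - p)

def direction(n, p, k):
    """Is the binomial (n, p) histogram going up, level, or down at k?"""
    r = odds_ratio(n, p, k)
    if r > 1:
        return "up"       # P(k) > P(k-1)
    if r == 1:
        return "level"    # P(k) = P(k-1)
    return "down"         # P(k) < P(k-1)

n, p = 5, Fraction(1, 2)
print([direction(n, p, k) for k in range(1, n + 1)])
```

For $n = 5$ and $p = 1/2$ this prints `['up', 'up', 'level', 'down', 'down']`; the histogram is level at $k = 3$ because $R(3) = \frac{3}{3}\cdot 1 = 1$.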

Note also that the form

$$R(k) = \left(\frac{n+1}{k} - 1\right)\frac{p}{1-p}$$

tells us that the ratios are a decreasing function of $k$. In the formula, $n$ and $p$ are the parameters of the distribution and hence constant. It is $k$ that varies, and it appears only in the denominator of $(n+1)/k$; since $p/(1-p) > 0$, increasing $k$ decreases $R(k)$.

This implies that once R(k)<1 for some k, it will remain less than 1 for all larger k. In other words, once the histogram starts going down, it will keep going down. It cannot come back up again.

That is why binomial histograms are either non-increasing or non-decreasing, or they go up and come down. But they can’t look like waves on the seashore. They can’t go up, come down, and go up again.
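The argument above is a proof. The sketch below is only a numerical sanity check: over an arbitrarily chosen grid of $n$ and $p$, it confirms with exact arithmetic that once some $R(k) < 1$, no later ratio gets back to 1 or above. The function name is hypothetical.

```python
from fractions import Fraction

def never_rises_after_falling(n, p):
    """True if R(1), ..., R(n) never return to >= 1 after dropping below 1."""
    odds = p / (1 - p)
    fallen = False
    for k in range(1, n + 1):
        r = Fraction(n - k + 1, k) * odds
        if r < 1:
            fallen = True
        elif fallen:      # R(k) >= 1 after an earlier ratio was < 1
            return False
    return True

# Every n up to 40, and p = 1/20, 2/20, ..., 19/20.
assert all(never_rises_after_falling(n, Fraction(j, 20))
           for n in range(1, 41) for j in range(1, 20))
print("no histogram in this grid goes down and then back up")
```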

Let’s visualize this for $n = 23$ and $p = 0.7$, values with no significance beyond being our choice for this example.

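import numpy as np
from scipy import stats
from datascience import *   # Table, as used throughout this textbook
from prob140 import *       # Plot and the distribution methods on Table

n = 23
p = 0.7
k = range(n+1)                          # possible values 0, 1, ..., n
bin_23_7 = stats.binom.pmf(k, n, p)     # binomial (23, 0.7) probabilities
bin_dist = Table().values(k).probability(bin_23_7)
Plot(bin_dist)                          # probability histogram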


# It is important to define k as an array here,
# so you can do array operations
# to find all the ratios at once.
k = np.arange(1, n+1, 1)
((n - k + 1)/k)*(p/(1-p))
array([ 53.66666667,  25.66666667,  16.33333333,  11.66666667,
         8.86666667,   7.        ,   5.66666667,   4.66666667,
         3.88888889,   3.26666667,   2.75757576,   2.33333333,
         1.97435897,   1.66666667,   1.4       ,   1.16666667,
         0.96078431,   0.77777778,   0.61403509,   0.46666667,
         0.33333333,   0.21212121,   0.10144928])

What Python is helpfully telling us is that the invisible bar at 1 is about 53.67 times as tall as the even more invisible bar at 0. The ratios decrease after that, but they stay above 1 through $k = 16$, so the histogram rises until it reaches its peak at $k = 16$; indeed $R(16) \approx 1.1667 > 1$. From $k = 17$ on, the ratios are below 1 (starting with $R(17) \approx 0.9608$), so the histogram goes down.

We can solve an inequality to show that the largest $k$ for which $R(k) \ge 1$ is the integer part of $(n+1)p$. In our example this is $k = 16$, because $(n+1)p = 24 \times 0.7 = 16.8$, as Python confirms up to floating-point rounding:

(n+1)*p
16.799999999999997
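The same conclusion can be reached directly from the ratios. This sketch (function name hypothetical) finds the largest $k$ with $R(k) \ge 1$ using exact `Fraction` arithmetic and compares it with the integer part of $(n+1)p$; writing `Fraction(7, 10)` instead of `0.7` sidesteps the rounding visible in the output above.

```python
from fractions import Fraction
from math import floor

def largest_k_not_falling(n, p):
    """Largest k in 1..n with R(k) >= 1, or 0 if there is no such k."""
    odds = p / (1 - p)
    ks = [k for k in range(1, n + 1) if Fraction(n - k + 1, k) * odds >= 1]
    return max(ks, default=0)

n, p = 23, Fraction(7, 10)
print(largest_k_not_falling(n, p), floor((n + 1) * p))
```

Both numbers printed are 16.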

Mode of the Binomial

A mode of a discrete distribution is a possible value that has the highest probability. There may be more than one such value, so there may be more than one mode.

For all $k$ such that $R(k) \ge 1$, we will say that the binomial histogram is either rising or flat at $k$. The largest $k$ for which $R(k) \ge 1$ has to be a mode; for all larger $k$, the histogram will be falling.

Let $q = 1-p$. Every value $k$ for which $R(k) \ge 1$ must satisfy

$$\left(\frac{n+1}{k} - 1\right)\frac{p}{q} \ge 1$$

That is,

$$\frac{n+1}{k} \ge \frac{q}{p} + 1 = \frac{1}{p}$$

which is equivalent to

$$k \le (n+1)p$$

Therefore the largest k for which R(k)1 is the integer part of (n+1)p. That’s a mode of the binomial.

Because the odds ratios are strictly decreasing in $k$, at most one $k$ can satisfy $R(k) = 1$, and that is the only way there can be more than one mode. If $R(k) = 1$, then $P(k) = P(k-1)$, so both $k$ and $k-1$ are modes. To summarize:

The Mode

The mode of the binomial $(n, p)$ distribution is the integer part of $(n+1)p$. If $(n+1)p$ is an integer, then $(n+1)p - 1$ is also a mode.
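The rule above can be written as a short function and checked against brute force. In this sketch, `binomial_modes` and `modes_by_brute_force` are hypothetical names, and $0 < p < 1$ is assumed; exact `Fraction` probabilities make ties such as $P(2) = P(3)$ for $n = 5$, $p = 1/2$ detectable with `==`.

```python
from fractions import Fraction
from math import comb, floor

def binomial_modes(n, p):
    """Modes of binomial (n, p) by the (n+1)p rule, for 0 < p < 1."""
    m = floor((n + 1) * p)
    if (n + 1) * p == m:        # (n+1)p is an integer: two modes
        return [m - 1, m]
    return [m]

def modes_by_brute_force(n, p):
    """All k whose exact binomial probability is the largest."""
    probs = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    top = max(probs)
    return [k for k, prob in enumerate(probs) if prob == top]

print(binomial_modes(5, Fraction(1, 2)))     # [2, 3]
print(binomial_modes(23, Fraction(7, 10)))   # [16]
assert all(binomial_modes(n, Fraction(j, 12)) == modes_by_brute_force(n, Fraction(j, 12))
           for n in range(1, 30) for j in range(1, 12))
```

For $n = 5$ and $p = 1/2$, $(n+1)p = 3$ is an integer, and indeed $P(2) = P(3) = 10/32$.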

But in fact, np is a more natural quantity to calculate. For example, if you are counting the number of heads in 100 tosses of a coin, then the distribution is binomial (100,1/2) and you naturally expect np=50 heads. You don’t want to be worrying about 101×(1/2).

You don’t have to worry when $n$ is large, because $(n+1)p = np + p$ differs from $np$ by only $p < 1$, which is small compared to $np$. In a later section we will examine a situation in which you can use $np$ to get an approximation to the shape of the binomial distribution when $n$ is large.
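Since $(n+1)p = np + p$ with $0 < p < 1$, the integer part of $(n+1)p$ is always either the integer part of $np$ or one more than that. The sketch below (hypothetical helper name, example parameters chosen arbitrarily apart from the coin-tossing case) prints $np$ next to that mode for a few cases.

```python
from fractions import Fraction
from math import floor

def mode_and_mean(n, p):
    """Return (integer part of (n+1)p, np) for binomial (n, p)."""
    return floor((n + 1) * p), n * p

for n, p in [(100, Fraction(1, 2)), (23, Fraction(7, 10)), (10, Fraction(1, 3))]:
    mode, mean = mode_and_mean(n, p)
    # (n+1)p = np + p with 0 < p < 1, so the mode is floor(np) or floor(np) + 1.
    assert mode - floor(mean) in (0, 1)
    print(f"n={n}, p={p}: np={float(mean):.2f}, mode={mode}")
```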