# 13.2. Sums of IID Samples

After the dry, algebraic discussion of the previous section, it is a relief to finally be able to compute some variances.

Let $$X_1, X_2, \ldots, X_n$$ be random variables with sum $$S_n = \sum_{i=1}^n X_i$$. The variance of the sum is

\begin{align*}
Var(S_n) &= Cov(S_n, S_n) \\
&= \sum_{i=1}^n\sum_{j=1}^n Cov(X_i, X_j) ~~~~ \text{(bilinearity)} \\
&= \sum_{i=1}^n Var(X_i) + \mathop{\sum \sum}_{1 \le i \ne j \le n} Cov(X_i, X_j)
\end{align*}

We say that the variance of the sum is the sum of all the variances and all the covariances.
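As a quick numerical illustration of this identity, here is a minimal sketch that uses NumPy on simulated data. The three variables and the way they are made dependent are assumptions chosen only for the example. The same bilinearity argument applies to sample variances and covariances, so the variance of the row sums should match the sum of all entries of the covariance matrix, while the sum of the variances alone should not.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three deliberately dependent variables (hypothetical example data):
# x2 and x3 are both built from x1.
x1 = rng.normal(size=1000)
x2 = 0.5 * x1 + rng.normal(size=1000)
x3 = -x1 + rng.uniform(size=1000)

data = np.vstack([x1, x2, x3])   # one row per variable
cov_matrix = np.cov(data)        # entry (i, j) is the sample Cov(X_i, X_j)

var_of_sum = np.var(data.sum(axis=0), ddof=1)  # sample variance of the sum
sum_of_all_entries = cov_matrix.sum()          # all variances plus all covariances
sum_of_variances = np.trace(cov_matrix)        # diagonal terms only

print(var_of_sum, sum_of_all_entries, sum_of_variances)
```

The first two printed numbers agree up to rounding; the third differs because the covariance terms are not zero here.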

If $$X_1, X_2, \ldots, X_n$$ are independent, then $$Cov(X_i, X_j) = 0$$ for all $$i \ne j$$, so every covariance term in the formula above is 0.

Therefore, if $$X_1, X_2, \ldots, X_n$$ are independent, then $$Var(S_n) = \sum_{i=1}^n Var(X_i)$$.

Thus for independent random variables $$X_1, X_2, \ldots, X_n$$, both the expectation and the variance add up nicely:

$E(S_n) = \sum_{i=1}^n E(X_i), ~~~~~~ Var(S_n) = \sum_{i=1}^n Var(X_i)$
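The additivity can be checked exactly on a small case. The sketch below assumes two independent random variables chosen just for illustration, a fair die roll and a fair coin indicator, and builds their joint distribution as the product of the marginals. The helper `mean_var` is a hypothetical name introduced here.

```python
import numpy as np

def mean_var(values, probs):
    """Mean and variance of a distribution on finitely many values."""
    m = np.sum(values * probs)
    v = np.sum((values - m) ** 2 * probs)
    return m, v

die_vals, die_probs = np.arange(1, 7), np.full(6, 1 / 6)
coin_vals, coin_probs = np.array([0, 1]), np.array([0.5, 0.5])

# Under independence, the joint probability is the product of the marginals.
joint_probs = np.outer(die_probs, coin_probs)
sum_vals = die_vals[:, None] + coin_vals[None, :]   # value of the sum in each cell

m_sum = np.sum(sum_vals * joint_probs)
v_sum = np.sum((sum_vals - m_sum) ** 2 * joint_probs)

m_die, v_die = mean_var(die_vals, die_probs)
m_coin, v_coin = mean_var(coin_vals, coin_probs)

print(m_sum, m_die + m_coin)   # expectation of the sum vs. sum of expectations
print(v_sum, v_die + v_coin)   # variance of the sum vs. sum of variances
```

The two random variables need not have the same distribution; independence is what makes the variances add.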

When the random variables are i.i.d., this simplifies even further.

## 13.2.1. Sum of an IID Sample

Let $$X_1, X_2, \ldots, X_n$$ be i.i.d., each with mean $$\mu$$ and SD $$\sigma$$. You can think of $$X_1, X_2, \ldots, X_n$$ as draws at random with replacement from a population, or as the results of independent replications of the same experiment.

Let $$S_n$$ be the sample sum, as above. Then

$E(S_n) = n\mu ~~~~~~~~~~ Var(S_n) = n\sigma^2 ~~~~~~~~~~ SD(S_n) = \sqrt{n}\sigma$

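This implies that as the sample size $$n$$ increases, the distribution of the sum $$S_n$$ becomes more spread out and, when $$\mu > 0$$, shifts to the right. The spread grows more slowly than the center: the expectation is proportional to $$n$$, but the SD is proportional only to $$\sqrt{n}$$.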
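To see these formulas at work, the following sketch computes the exact distribution of the sum of $$n$$ rolls of a fair die by repeated convolution, and compares its mean and SD with $$n\mu$$ and $$\sqrt{n}\sigma$$. The die is an assumed example, and `sum_distribution` is a hypothetical helper name, not part of any library.

```python
import numpy as np

face_vals = np.arange(1, 7)
face_probs = np.full(6, 1 / 6)
mu = np.sum(face_vals * face_probs)
sigma = np.sqrt(np.sum((face_vals - mu) ** 2 * face_probs))

def sum_distribution(n):
    """Exact distribution of the sum of n i.i.d. die rolls, by repeated convolution."""
    probs = face_probs
    for _ in range(n - 1):
        probs = np.convolve(probs, face_probs)
    values = np.arange(n, 6 * n + 1)   # possible sums: n, n+1, ..., 6n
    return values, probs

for n in [1, 4, 16]:
    values, probs = sum_distribution(n)
    mean_n = np.sum(values * probs)
    sd_n = np.sqrt(np.sum((values - mean_n) ** 2 * probs))
    # Columns: n, E(S_n), n*mu, SD(S_n), sqrt(n)*sigma
    print(n, mean_n, n * mu, sd_n, np.sqrt(n) * sigma)
```

Each time $$n$$ is multiplied by 4, the expectation is multiplied by 4 but the SD only doubles.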

Here is one of the most important applications of these results.

## 13.2.2. Variance of the Binomial

Let $$X$$ have the binomial $$(n, p)$$ distribution. We know that $$X = \sum_{i=1}^n I_i$$ where $$I_1, I_2, \ldots, I_n$$ are i.i.d. indicators, each taking the value 1 with probability $$p$$. Each of these indicators has expectation $$p$$ and variance $$pq = p(1-p)$$. Therefore

$E(X) = np ~~~~~~~~~~ Var(X) = npq ~~~~~~~~~~ SD(X) = \sqrt{npq}$

For example, if $$X$$ is the number of heads in 100 tosses of a coin, then

$E(X) = 100 \times 0.5 = 50 ~~~~~~~~~~ SD(X) = \sqrt{100 \times 0.5 \times 0.5} = 5$
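The same numbers can be recovered directly from the binomial probabilities, without the indicator argument. This is a minimal sketch that assumes SciPy is available; it sums over all 101 possible values and compares the results with $$np$$, $$npq$$, and SciPy's own `stats.binom.std`.

```python
import numpy as np
from scipy import stats

n, p = 100, 0.5
k = np.arange(n + 1)               # all possible values 0, 1, ..., n
pmf = stats.binom.pmf(k, n, p)

mean_from_pmf = np.sum(k * pmf)
var_from_pmf = np.sum((k - mean_from_pmf) ** 2 * pmf)

print(mean_from_pmf, n * p)                          # expectation vs. np
print(var_from_pmf, n * p * (1 - p))                 # variance vs. npq
print(np.sqrt(var_from_pmf), stats.binom.std(n, p))  # SD computed two ways
```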

Here is the distribution of $$X$$. You can see that there is almost no probability outside the range $$E(X) \pm 3SD(X)$$.

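```python
import numpy as np
from scipy import stats
from datascience import *   # provides Table
from prob140 import *       # provides Plot and the values/probabilities Table methods
k = np.arange(25, 75, 1)                       # values of X to display
binom_probs = stats.binom.pmf(k, 100, 0.5)     # binomial (100, 0.5) probabilities
binom_dist = Table().values(k).probabilities(binom_probs)
Plot(binom_dist, show_ev=True, show_sd=True)   # mark E(X) and SD(X) on the plot
```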
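The statement about the range $$E(X) \pm 3SD(X)$$ can also be checked numerically. The sketch below assumes only SciPy and computes the chance that the number of heads in 100 tosses falls between 35 and 65 inclusive. The variable names are illustrative.

```python
from scipy import stats

n, p = 100, 0.5
ev = n * p
sd = (n * p * (1 - p)) ** 0.5
lo, hi = ev - 3 * sd, ev + 3 * sd   # 35 and 65 for these values of n and p

# X is integer-valued and lo is a whole number here, so
# P(lo <= X <= hi) = P(X <= hi) - P(X <= lo - 1).
prob_within = stats.binom.cdf(hi, n, p) - stats.binom.cdf(lo - 1, n, p)
print(prob_within)
```

The printed probability is very close to 1, which is what the plot suggests.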