Sums of Simple Random Samples
When the random variables being added are not independent, finding the variance of the sum does involve finding covariances. As before, let $X_1, X_2, \ldots, X_n$ be random variables with sum

$$
S_n = \sum_{i=1}^n X_i
$$

The variance of the sum is

$$
Var(S_n) = \sum_{i=1}^n Var(X_i) + \mathop{\sum \sum}_{1 \le i \ne j \le n} Cov(X_i, X_j)
$$

Before we apply this formula, let’s start out by finding a simple covariance.
Indicators
Let $A$ and $B$ be two events. Let $I_A$ be the indicator of $A$ and let $I_B$ be the indicator of $B$. This is going to be one of the rare instances where we use an expected product to find a covariance. That’s because we know that products of indicators are themselves indicators.

$$
Cov(I_A, I_B) = E(I_A I_B) - E(I_A)E(I_B) = P(AB) - P(A)P(B)
$$

You can see that the covariance is 0 if $A$ and $B$ are independent, consistent with the more general result of the previous section. When $A$ and $B$ are not independent, covariance helps us understand the nature of the dependence. For example, if $Cov(I_A, I_B)$ is positive, then

$$
P(AB) > P(A)P(B) \implies P(A)P(B \mid A) > P(A)P(B) \implies P(B \mid A) > P(B)
$$

That is, given that $A$ has occurred, the chance of $B$ is higher than it is overall. This is called positive association or positive dependence of $A$ and $B$.
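As a small numerical illustration of the covariance of two indicators, here is a sketch based on one roll of a fair die. The events chosen ("even" and "at least 4") and the helper names `prob` and `indicator_cov` are assumptions made for this example, not part of the text above.

```python
from fractions import Fraction

# Hypothetical setting: one roll of a fair six-sided die.
outcomes = range(1, 7)
A = {2, 4, 6}   # event "the roll is even"
B = {4, 5, 6}   # event "the roll is at least 4"

def prob(event):
    """Probability of an event when all six outcomes are equally likely."""
    return Fraction(sum(1 for w in outcomes if w in event), 6)

def indicator_cov(A, B):
    """Cov(I_A, I_B) = P(AB) - P(A)P(B)."""
    return prob(A & B) - prob(A) * prob(B)

cov_AB = indicator_cov(A, B)
p_B_given_A = prob(A & B) / prob(A)

print("Cov(I_A, I_B) =", cov_AB)
print("P(B | A) =", p_B_given_A, " P(B) =", prob(B))
```

Here the covariance is positive, and correspondingly $P(B \mid A)$ comes out larger than $P(B)$, which is the positive association described above.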
Variance of the Hypergeometric
Suppose you have a population of $N$ elements of which $G$ are good. Let $X$ be the number of good elements in a simple random sample of $n$ elements drawn from the population. Remember that simple random samples are drawn without replacement.
We know that
$$
X = \sum_{j=1}^n I_j
$$

where $I_j$ is the indicator that draw $j$ yields a good element.

By symmetry, we know that $E(I_j) = \frac{G}{N}$ for each $j$. That is why

$$
E(X) = n\frac{G}{N} = np ~~~~ \text{where } p = \frac{G}{N}
$$

That’s the same formula as for the binomial.

We also know that $Var(I_j) = \frac{G}{N} \cdot \frac{B}{N}$ where $B = N - G$ is the number of bad elements in the population.
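Also by symmetry, $Cov(I_j, I_k)$ is the same for each pair $j, k$ where $j \ne k$. The indicator calculation above tells us how to find this common value, because $P(I_j = 1, I_k = 1) = \frac{G}{N} \cdot \frac{G-1}{N-1}$.

$$
Cov(I_j, I_k) = \frac{G}{N} \cdot \frac{G-1}{N-1} - \frac{G}{N} \cdot \frac{G}{N}
$$

Therefore

$$
\begin{aligned}
Var(X) &= \sum_{j=1}^n Var(I_j) + \mathop{\sum \sum}_{1 \le j \ne k \le n} Cov(I_j, I_k) \\
&= n\frac{G}{N} \cdot \frac{B}{N} + n(n-1)\Big( \frac{G}{N} \cdot \frac{G-1}{N-1} - \frac{G}{N} \cdot \frac{G}{N} \Big) \\
&= n\frac{G}{N} \cdot \frac{B}{N} + n(n-1)\frac{G}{N}\Big( \frac{G-1}{N-1} - \frac{G}{N} \Big) \\
&= n\frac{G}{N} \cdot \frac{B}{N} - n(n-1)\frac{G}{N} \cdot \frac{N-G}{N(N-1)} \\
&= n\frac{G}{N} \cdot \frac{B}{N} \cdot \Big( 1 - \frac{n-1}{N-1} \Big) \\
&= n\frac{G}{N} \cdot \frac{B}{N} \cdot \Big( \frac{N-n}{N-1} \Big) \\
&= npq \cdot \frac{N-n}{N-1}
\end{aligned}
$$

where $p = \frac{G}{N}$ and $q = 1 - p$.

Notice that the formula is the same as the formula for the variance of the binomial, apart from the factor of $\frac{N-n}{N-1}$.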
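The variance formula for the hypergeometric can be checked against a direct calculation from the hypergeometric probabilities. The sketch below uses exact fractions. The function names and the example values (13 good elements among 52, sample of 5) are assumptions chosen for illustration.

```python
from fractions import Fraction
from math import comb

def hypergeometric_variance_from_pmf(N, G, n):
    """Exact Var(X) computed directly from the hypergeometric probabilities."""
    total = comb(N, n)
    # X can only take values g for which both counts are feasible.
    pmf = {
        g: Fraction(comb(G, g) * comb(N - G, n - g), total)
        for g in range(max(0, n - (N - G)), min(n, G) + 1)
    }
    mean = sum(g * p for g, p in pmf.items())
    return sum((g - mean) ** 2 * p for g, p in pmf.items())

def hypergeometric_variance_formula(N, G, n):
    """npq times the factor (N - n)/(N - 1), with p = G/N and q = 1 - p."""
    p = Fraction(G, N)
    return n * p * (1 - p) * Fraction(N - n, N - 1)

# Hypothetical example: 52 elements, 13 of them good, sample of size 5.
direct = hypergeometric_variance_from_pmf(52, 13, 5)
formula = hypergeometric_variance_formula(52, 13, 5)
print(direct, formula, direct == formula)
```

Because the arithmetic is exact, the two values should agree exactly rather than only up to rounding.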
We can generalize this result to the case where the population isn’t binary.
Variance of a Simple Random Sample Sum
Suppose we have a population of $N$ numbers which need not be only zeros and ones. Suppose the population has mean $\mu$ and standard deviation $\sigma$. Draw a simple random sample of size $n$ from the population. For $j$ in the range 1 through $n$, let $X_j$ be the $j$th value drawn.

Let $S_n = X_1 + X_2 + \cdots + X_n$. Then $E(S_n) = n\mu$, and

$$
Var(S_n) = \sum_{i=1}^n Var(X_i) + \mathop{\sum \sum}_{1 \le i \ne j \le n} Cov(X_i, X_j) = n\sigma^2 + n(n-1)Cov(X_1, X_2)
$$

by symmetry.
How can we find $Cov(X_1, X_2)$? It’s not a good idea to try to multiply the two variables, as they are dependent and their distributions might be unpleasant. The expected product will be hard to find.

What we can use is the observation that the equation we derived above for $Var(S_n)$ is valid for any sample size. In particular, it is valid in the case when we take a census, that is, when we sample all the elements of the population. In that case $n = N$ and the equation is

$$
Var(S_N) = N\sigma^2 + N(N-1)Cov(X_1, X_2)
$$

Why is this helpful? To answer this, think about the variability in $S_N$. We have sampled the entire population without replacement. Therefore $S_N$ is just the total of the entire population. There is no sampling variability in $S_N$, because there is only one possible sample of size $N$.
That means $Var(S_N) = 0$. We can use this to solve for $Cov(X_1, X_2)$.

$$
0 = N\sigma^2 + N(N-1)Cov(X_1, X_2) ~~ \implies ~~ Cov(X_1, X_2) = -\frac{\sigma^2}{N-1}
$$

Now plug this into the formula for $Var(S_n)$ for any smaller sample size $n$.

$$
Var(S_n) = n\sigma^2 - n(n-1)\frac{\sigma^2}{N-1} = n\sigma^2\Big( 1 - \frac{n-1}{N-1} \Big) = n\sigma^2\frac{N-n}{N-1}
$$

Recall that the variance of the sample sum is $n\sigma^2$ when the sample is drawn with replacement. When the sample is drawn without replacement, the formula is the same apart from the factor of $\frac{N-n}{N-1}$.
That is exactly what we saw in the special case of the binary population. In the final section of this chapter we will investigate this relation between sampling with and without replacement.
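For a small population, both the covariance $-\sigma^2/(N-1)$ and the variance $n\sigma^2\frac{N-n}{N-1}$ can be confirmed by brute force, by listing every equally likely ordered sample drawn without replacement. The following is a minimal sketch. The population values and the function names are made up for this example.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical population of N = 5 numbers (repeated values are allowed).
population = [1, 2, 2, 5, 10]
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N   # population variance

def exact_moments(n):
    """Enumerate all ordered samples of size n (n >= 2) without replacement.

    Returns the exact pair (Var(S_n), Cov(X_1, X_2)).
    """
    # permutations treats positions as distinct, so every ordered
    # sample is listed once and all are equally likely.
    samples = list(permutations(population, n))
    count = len(samples)
    mean_sum = Fraction(sum(sum(s) for s in samples), count)
    var_sum = sum((sum(s) - mean_sum) ** 2 for s in samples) / count
    # E(X_1) = E(X_2) = mu by symmetry.
    cov12 = Fraction(sum(s[0] * s[1] for s in samples), count) - mu * mu
    return var_sum, cov12

def sample_sum_variance_formula(n):
    """n * sigma^2 * (N - n)/(N - 1)."""
    return n * sigma2 * Fraction(N - n, N - 1)

for n in range(2, N + 1):
    var_sum, cov12 = exact_moments(n)
    print(n, var_sum, sample_sum_variance_formula(n), cov12)
```

In the census case $n = N$ every sample has the same total, so the enumerated variance is 0, matching the argument used to solve for the covariance.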