Sums of Simple Random Samples


When the random variables that are being added are not independent, finding the variance of the sum does involve finding covariances. As before, let $X_1, X_2, \ldots, X_n$ be random variables with sum

$$S_n = \sum_{i=1}^n X_i$$

The variance of the sum is

$$\mathrm{Var}(S_n) = \sum_{i=1}^n \mathrm{Var}(X_i) + \mathop{\sum\sum}_{1 \le i \ne j \le n} \mathrm{Cov}(X_i, X_j)$$
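As a quick sanity check of this formula, here is a minimal sketch that compares the two sides exactly on a small made-up example. The list `outcomes` and the helpers `expect` and `cov` are hypothetical names introduced only for this illustration; the outcomes are assumed equally likely and were chosen so that the coordinates are dependent.

```python
from fractions import Fraction

# Hypothetical example: five equally likely outcomes (x1, x2, x3).
# The coordinates are dependent by construction.
outcomes = [(0, 0, 1), (0, 1, 1), (1, 1, 1), (1, 0, 0), (2, 1, 0)]
prob = Fraction(1, len(outcomes))
n = 3

def expect(f):
    """Exact expectation of f(outcome) under the uniform distribution."""
    return sum(prob * f(w) for w in outcomes)

def cov(i, j):
    """Cov(X_i, X_j) = E(X_i X_j) - E(X_i) E(X_j); cov(i, i) is Var(X_i)."""
    return expect(lambda w: w[i] * w[j]) - expect(lambda w: w[i]) * expect(lambda w: w[j])

# Left side: variance of the sum, computed directly.
var_direct = expect(lambda w: sum(w) ** 2) - expect(lambda w: sum(w)) ** 2

# Right side: sum of variances plus the sum of covariances over all i != j.
var_formula = sum(cov(i, i) for i in range(n)) + sum(
    cov(i, j) for i in range(n) for j in range(n) if i != j
)

print(var_direct == var_formula)
```

Exact fractions are used so that the comparison does not depend on floating-point rounding.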

Before we apply this formula, let’s start out by finding a simple covariance.

Indicators

Let A and B be two events. Let $I_A$ be the indicator of A and let $I_B$ be the indicator of B. This is going to be one of the rare instances where we use an expected product to find a covariance. That’s because we know that products of indicators are themselves indicators.

$$\mathrm{Cov}(I_A, I_B) = E(I_A I_B) - E(I_A)E(I_B) = P(AB) - P(A)P(B)$$

You can see that the covariance is 0 if A and B are independent, consistent with the more general result of the previous section. When A and B are not independent, covariance helps us understand the nature of the dependence. For example, if $\mathrm{Cov}(I_A, I_B)$ is positive, then

$$P(AB) > P(A)P(B) \implies P(A)P(B \mid A) > P(A)P(B) \implies P(B \mid A) > P(B)$$

That is, given that A has occurred, the chance of B is higher than it is overall. This is called positive association or positive dependence of A and B.
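To make positive association concrete, here is a small sketch based on an example that is not in the text above: one roll of a fair die, with A the event that the roll is even and B the event that the roll is at least 4. The helper `P` is a hypothetical name for this illustration.

```python
from fractions import Fraction

# Assumed example: one roll of a fair six-sided die.
omega = range(1, 7)
A = {2, 4, 6}   # the roll is even
B = {4, 5, 6}   # the roll is at least 4

def P(event):
    """Probability of an event when all outcomes in omega are equally likely."""
    return Fraction(len([w for w in omega if w in event]), len(omega))

# Covariance of the indicators: P(AB) - P(A)P(B)
cov_indicators = P(A & B) - P(A) * P(B)

# Conditional chance of B given A, to compare with the overall chance of B
p_B_given_A = P(A & B) / P(A)

print(cov_indicators)      # 1/12
print(p_B_given_A, P(B))   # 2/3 1/2
```

The covariance is positive, and correspondingly the chance of B given A (2/3) is higher than the overall chance of B (1/2).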

Variance of the Hypergeometric

Suppose you have a population of N elements of which G are good. Let X be the number of good elements in a simple random sample of n elements drawn from the population. Remember that simple random samples are drawn without replacement.

We know that

$$X = \sum_{j=1}^n I_j$$

where $I_j$ is the indicator that draw $j$ yields a good element.

By symmetry, we know that $E(I_j) = \frac{G}{N}$ for each $j$. That is why

$$E(X) = n\frac{G}{N} = np \quad \text{where } p = \frac{G}{N}$$

That’s the same formula as for the binomial.

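We also know that $\mathrm{Var}(I_j) = \frac{G}{N} \cdot \frac{B}{N}$ where $B = N - G$ is the number of bad elements in the population.

Also by symmetry, $\mathrm{Cov}(I_j, I_k)$ is the same for each pair $j, k$ where $j \ne k$. The example above tells us how to calculate this common value.

$$\mathrm{Cov}(I_j, I_k) = \frac{G}{N} \cdot \frac{G-1}{N-1} - \frac{G}{N} \cdot \frac{G}{N}$$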

Therefore

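$$
\begin{aligned}
\mathrm{Var}(X) &= \sum_{j=1}^n \mathrm{Var}(I_j) + \mathop{\sum\sum}_{1 \le j \ne k \le n} \mathrm{Cov}(I_j, I_k) \\
&= n\frac{G}{N} \cdot \frac{B}{N} + n(n-1)\left(\frac{G}{N} \cdot \frac{G-1}{N-1} - \frac{G}{N} \cdot \frac{G}{N}\right) \\
&= n\frac{G}{N} \cdot \frac{B}{N} + n(n-1)\frac{G}{N}\left(\frac{G-1}{N-1} - \frac{G}{N}\right) \\
&= n\frac{G}{N} \cdot \frac{B}{N} - n(n-1)\frac{G}{N} \cdot \frac{N-G}{N(N-1)} \\
&= n\frac{G}{N} \cdot \frac{B}{N}\left(1 - \frac{n-1}{N-1}\right) \\
&= n\frac{G}{N} \cdot \frac{B}{N} \cdot \frac{N-n}{N-1} \\
&= npq\frac{N-n}{N-1}
\end{aligned}
$$

where $p = \frac{G}{N}$ and $q = 1 - p$.

Notice that the formula is the same as the formula for the variance of the binomial, apart from the factor of $\frac{N-n}{N-1}$.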
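The hypergeometric variance formula can be checked against a direct calculation from the distribution. The sketch below assumes illustrative values N = 52, G = 13, n = 5 (for instance, the number of hearts in a five-card hand); these numbers are not from the text. It builds the hypergeometric probabilities with `math.comb` and compares the exact mean and variance with $np$ and $npq\frac{N-n}{N-1}$.

```python
from fractions import Fraction
from math import comb

# Assumed illustrative values: population size, number of good elements, sample size
N, G, n = 52, 13, 5
B = N - G  # number of bad elements

# Exact hypergeometric probabilities P(X = g) over the possible values of g
pmf = {
    g: Fraction(comb(G, g) * comb(B, n - g), comb(N, n))
    for g in range(max(0, n - B), min(n, G) + 1)
}

# Mean and variance computed directly from the distribution
mean = sum(g * prob for g, prob in pmf.items())
var = sum(g ** 2 * prob for g, prob in pmf.items()) - mean ** 2

# The formulas derived in this section
p = Fraction(G, N)
q = 1 - p
formula_mean = n * p
formula_var = n * p * q * Fraction(N - n, N - 1)

print(mean == formula_mean, var == formula_var)
```

Because the arithmetic uses exact fractions, agreement here is exact for these values rather than approximate.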

We can generalize this result to the case where the population isn’t binary.

Variance of a Simple Random Sample Sum

Suppose we have a population of N numbers which need not be only zeros and ones. Suppose the population has mean $\mu$ and standard deviation $\sigma$. Draw a simple random sample of size n from the population. For $j$ in the range 1 through n, let $X_j$ be the $j$th value drawn.

Let $S_n = X_1 + X_2 + \cdots + X_n$. Then $E(S_n) = n\mu$, and

$$\mathrm{Var}(S_n) = \sum_{i=1}^n \mathrm{Var}(X_i) + \mathop{\sum\sum}_{1 \le i \ne j \le n} \mathrm{Cov}(X_i, X_j) = n\sigma^2 + n(n-1)\mathrm{Cov}(X_1, X_2)$$

by symmetry.

How can we find $\mathrm{Cov}(X_1, X_2)$? It’s not a good idea to try to multiply the two variables, as they are dependent and their distributions might be unpleasant. The expected product will be hard to find.

What we can use is the observation that the equation we derived above for $\mathrm{Var}(S_n)$ is valid for any sample size. In particular, it is valid in the case when we take a census, that is, when we sample all the elements of the population. In that case $n = N$ and the equation is

$$\mathrm{Var}(S_N) = N\sigma^2 + N(N-1)\mathrm{Cov}(X_1, X_2)$$

Why is this helpful? To answer this, think about the variability in $S_N$. We have sampled the entire population without replacement. Therefore $S_N$ is just the total of the entire population. There is no sampling variability in $S_N$, because there is only one possible sample of size N.

That means $\mathrm{Var}(S_N) = 0$. We can use this to solve for $\mathrm{Cov}(X_1, X_2)$.

$$0 = N\sigma^2 + N(N-1)\mathrm{Cov}(X_1, X_2) \implies \mathrm{Cov}(X_1, X_2) = -\frac{\sigma^2}{N-1}$$
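The value of the covariance between the first two draws can be confirmed by brute force on a tiny population. The sketch below assumes a made-up population of five numbers and enumerates all $N(N-1)$ equally likely ordered pairs of draws without replacement; the variable names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

# Assumed toy population (values chosen arbitrarily for illustration)
population = [2, 3, 3, 7, 10]
N = len(population)

# Population mean and variance
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N

# All N(N-1) equally likely ordered pairs (first draw, second draw).
# permutations treats positions as distinct, so the repeated 3 is handled correctly.
pairs = list(permutations(population, 2))

# Cov(X1, X2) = E(X1 X2) - E(X1) E(X2), with E(X1) = E(X2) = mu by symmetry
e_product = Fraction(sum(x1 * x2 for x1, x2 in pairs), len(pairs))
cov_12 = e_product - mu * mu

print(cov_12, -sigma2 / (N - 1))  # -23/10 -23/10
```

The brute-force covariance matches $-\frac{\sigma^2}{N-1}$ for this population, and its negative sign reflects that a large first draw leaves smaller values behind for the second.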

Now plug this into the formula for $\mathrm{Var}(S_n)$ for any smaller sample size n.

$$\mathrm{Var}(S_n) = n\sigma^2 - n(n-1)\frac{\sigma^2}{N-1} = n\sigma^2\left(1 - \frac{n-1}{N-1}\right) = n\sigma^2\frac{N-n}{N-1}$$

Recall that the variance of the sample sum is $n\sigma^2$ when the sample is drawn with replacement. When the sample is drawn without replacement, the formula is the same apart from the factor of $\frac{N-n}{N-1}$.

That is exactly what we saw in the special case of the binary population. In the final section of this chapter we will investigate this relation between sampling with and without replacement.
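As an illustration of the two variance formulas for the sample sum, the following sketch enumerates every possible sample from a small made-up population, both without and with replacement, and compares the exact variances of the sample sum. The population values, the sample size n = 3, and the helper `variance` are assumptions made for this example.

```python
from fractions import Fraction
from itertools import combinations, product

# Assumed toy population and sample size
population = [2, 3, 3, 7, 10]
N, n = len(population), 3

# Population mean and variance
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N

def variance(values):
    """Exact variance of a list of equally likely values."""
    values = [Fraction(v) for v in values]
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

# Without replacement: every subset of n positions is equally likely
sums_without = [sum(s) for s in combinations(population, n)]
# With replacement: every sequence of n draws is equally likely
sums_with = [sum(s) for s in product(population, repeat=n)]

var_without = variance(sums_without)
var_with = variance(sums_with)

print(var_without == n * sigma2 * Fraction(N - n, N - 1))
print(var_with == n * sigma2)
```

For this population the variance of the sum without replacement is smaller than the variance with replacement by exactly the factor $\frac{N-n}{N-1}$, which is 1/2 here.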