Variance Via Covariance


In this chapter we return to random sampling, in particular to the variability in the sum of a random sample. Binomial and hypergeometric random variables are such sums. Means of random samples are easily calculated once we have the sample sum. So it is worth taking some time to understand how sample sums behave.

By Chebyshev's inequality, the value of a random variable $X$ is likely to be in the range "$E(X) \pm$ a few $SD(X)$". The measure of spread $SD(X)$ is the root mean squared deviation of $X$ from its mean:

$$ SD(X) = \sqrt{E[(X-E(X))^2]} $$

Variance is the square of the standard deviation. We know that variance has better computational properties than the SD, so this chapter will focus on ways to find variance. We will use some familiar shorthand:

  • $\mu_X = E(X)$
  • $\sigma_X = SD(X)$

Let $D_X = X - \mu_X$ denote the deviation of $X$ from its mean. Then

$$ Var(X) = \sigma_X^2 = E(D_X^2) $$
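As a quick check of these definitions, here is a minimal sketch that computes $\mu_X$, $Var(X)$, and $SD(X)$ directly from a distribution. The choice of $X$ as one roll of a fair six-sided die is an assumption made only for illustration, as are the variable names.

```python
from fractions import Fraction
import math

# Assumed example (not from the text): X is one roll of a fair die.
values = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 6)] * 6

# mu_X = E(X)
mu_X = sum(x * p for x, p in zip(values, probs))

# Var(X) = E(D_X^2), where D_X = X - mu_X is the deviation from the mean
var_X = sum((x - mu_X) ** 2 * p for x, p in zip(values, probs))

# SD(X) is the square root of the variance
sd_X = math.sqrt(var_X)

print(mu_X, var_X, sd_X)
```

Exact fractions keep the variance free of rounding error. Only the SD, which needs a square root, is a floating-point number.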

Variance of a Sum

Let $X$ and $Y$ be two random variables on the same space, and let $S = X+Y$. Then $E(S) = \mu_X + \mu_Y$, and the deviation of $S$ is the sum of the deviations of $X$ and $Y$:

$$ D_S ~ = ~ S - \mu_S ~ = ~ X + Y - (\mu_X + \mu_Y) ~ = ~ D_X + D_Y $$

This gives us some insight into the variance of the sum $S$.

\begin{align*} Var(S) &= E(D_S^2) \\ &= E[(D_X + D_Y)^2] \\ &= E(D_X^2) + E(D_Y^2) + 2E(D_XD_Y) \\ &= Var(X) + Var(Y) + 2E(D_XD_Y) \end{align*}

The first thing to note is that while the expectation of a sum is the sum of the expectations, the calculation above shows that the variance of a sum is in general not the sum of the variances. There's an extra term.
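To see that the extra term can be nonzero, here is a small sketch that computes each piece of the calculation exactly. The setting is an assumption chosen for illustration: $X$ and $Y$ are the first and second of two tickets drawn at random without replacement from the numbers 1, 2, 3. The helper `expect` is a hypothetical name.

```python
from fractions import Fraction
from itertools import permutations

# Assumed example: two draws without replacement from {1, 2, 3}.
# Each ordered pair (x, y) with x != y has chance 1/6.
outcomes = list(permutations([1, 2, 3], 2))
p = Fraction(1, len(outcomes))

def expect(g):
    """Expectation of g(x, y) under the joint distribution of (X, Y)."""
    return sum(g(x, y) * p for x, y in outcomes)

mu_X = expect(lambda x, y: x)
mu_Y = expect(lambda x, y: y)
mu_S = mu_X + mu_Y

var_X = expect(lambda x, y: (x - mu_X) ** 2)
var_Y = expect(lambda x, y: (y - mu_Y) ** 2)
var_S = expect(lambda x, y: (x + y - mu_S) ** 2)

# The extra term: 2 E(D_X D_Y)
extra = 2 * expect(lambda x, y: (x - mu_X) * (y - mu_Y))

print(var_X + var_Y, extra, var_S)
```

In this example the extra term is negative, so $Var(S)$ is smaller than $Var(X) + Var(Y)$: a large first draw makes a large second draw less likely.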

To calculate the variance of a sum, we have to understand that extra term.


Covariance

The covariance of $X$ and $Y$, denoted $Cov(X, Y)$, is the expected product of the deviations of $X$ and $Y$:

$$ Cov(X, Y) ~ = ~ E(D_XD_Y) ~=~ E[(X - \mu_X)(Y - \mu_Y)] $$

In this chapter we will learn how to use covariance to find variances of sums. The fundamental calculation is the one we did above; here is the result again, in the language of covariance.

$$ Var(X+Y) ~ = ~ Var(X) + Var(Y) + 2Cov(X, Y) $$
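The formula can be checked on a sample sum of the kind mentioned at the start of the chapter. The following is a sketch under assumed values, not a general implementation: a hypothetical population of 3 red and 2 blue tickets, two draws without replacement, with $X$ and $Y$ the indicators that the first and second draws are red. The helpers `expect` and `cov` are illustrative names.

```python
from fractions import Fraction
from itertools import permutations

# Assumed population: 3 red ("R") and 2 blue ("B") tickets.
population = ["R", "R", "R", "B", "B"]

# All equally likely ordered pairs of distinct ticket positions.
draws = list(permutations(range(len(population)), 2))
p = Fraction(1, len(draws))

# Joint values (x, y): x = 1 if the first draw is red, y = 1 if the second is.
pairs = [(int(population[i] == "R"), int(population[j] == "R"))
         for i, j in draws]

def expect(g):
    """Expectation of g(x, y) under the joint distribution."""
    return sum(g(x, y) * p for x, y in pairs)

def cov(f, g):
    """Cov(f, g) = E[(f - E(f)) (g - E(g))], the expected product of deviations."""
    mu_f, mu_g = expect(f), expect(g)
    return expect(lambda x, y: (f(x, y) - mu_f) * (g(x, y) - mu_g))

X = lambda x, y: x
Y = lambda x, y: y
S = lambda x, y: x + y

# By the definitions above, Var(X) = E(D_X^2) is the covariance of X with itself.
var_X, var_Y, var_S = cov(X, X), cov(Y, Y), cov(S, S)
cov_XY = cov(X, Y)

print(var_S, var_X + var_Y + 2 * cov_XY)
```

The two printed values agree. Here $S = X + Y$ is the number of red tickets in the sample, so it has a hypergeometric distribution.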
