22.3. Variance by Conditioning

Iteration allows us to find expectation by conditioning. We now have the tools to find variance by conditioning as well.

Recall the notation of the previous section:

  • \(X\) and \(Y\) are jointly distributed random variables

  • \(b(X) = E(Y \mid X)\)

  • \(D_w = Y - b(X)\)

Define \(D_Y = Y - E(Y)\). Then

\[ D_Y ~ = ~ D_w + (b(X) - E(Y)) ~ = ~ D_w + D_b \]

where \(D_b = b(X) - E(Y)\) is the deviation of the random variable \(b(X)\) from its expectation \(E(Y)\).

In the graph below, the black line is at the level \(E(Y)\), and the dark blue point is a generic point \((X, Y)\) in the scatter plot. Its vertical distance from the black line is \(D_Y\), which equals the sum of two lengths:

  • \(D_w\), the length of the purple segment

  • \(D_b\), the length of the green segment

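[Figure: scatter plot of \((X, Y)\) with a horizontal black line at level \(E(Y)\). For a generic point, the purple segment has length \(D_w\) and the green segment has length \(D_b\).]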

22.3.1. Decomposition of Variance

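The expectation \(E(Y)\) is a constant, so \(D_b = b(X) - E(Y)\) is a function of \(X\). Given \(X\), the deviation \(D_w\) has conditional expectation \(E(D_w \mid X) = E(Y \mid X) - b(X) = 0\), so by iteration \(E(D_wD_b) = E\big(D_b E(D_w \mid X)\big) = 0\). Also, \(E(D_Y) = 0\), so \(Var(Y) = E(D_Y^2)\). So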

\[ \begin{aligned} Var(Y) ~ = ~ E(D_Y^2) ~ &= ~ E\big( (D_w + D_b)^2 \big) \\ &= ~ E(D_w^2) + E(D_b^2) + 2 E(D_wD_b) \\ &= ~ E(D_w^2) + E(D_b^2) \end{aligned} \]

Let’s take a closer look at the two terms on the right hand side. In the previous section we saw that

\[ E(D_w^2) ~ = ~ MSE(b) ~ = ~ E(Var(Y \mid X)) \]

Thus the first term on the right hand side is the expectation of the conditional variance.

To understand the second term, recall that \(E(b(X)) = E(Y)\) by iteration, so \(D_b = b(X) - E(Y) = b(X) - E(b(X))\) is the deviation of \(b(X)\) from its own expectation. So

\[ E(D_b^2) ~ = ~ Var(b(X)) ~ = ~ Var(E(Y \mid X)) \]

Thus the second term on the right is the variance of the conditional expectation.

We thus have a decomposition of variance:

\[ Var(Y) ~ = ~ E(Var(Y \mid X)) + Var(E(Y \mid X)) \]

That is, the variance is equal to the expectation of the conditional variance plus the variance of the conditional expectation.
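The derivation can be checked exactly on a small example. The sketch below is only an illustration: the joint distribution `joint` and the helper names `expect`, `b`, `D_w`, and `D_b` are hypothetical choices made for this example, not part of the text. It computes the cross term \(E(D_wD_b)\) and the two squared-deviation terms using exact fractions.

```python
from fractions import Fraction as F

# Hypothetical joint distribution of (X, Y): {(x, y): P(X = x, Y = y)}
joint = {
    (0, 0): F(1, 8), (0, 1): F(1, 8),
    (1, 0): F(1, 8), (1, 2): F(3, 8),
    (2, 1): F(1, 8), (2, 3): F(1, 8),
}

def expect(g):
    """E(g(X, Y)) under the joint distribution."""
    return sum(p * g(x, y) for (x, y), p in joint.items())

def b(x0):
    """Conditional expectation E(Y | X = x0)."""
    p_x = sum(p for (x, _), p in joint.items() if x == x0)
    return sum(p * y for (x, y), p in joint.items() if x == x0) / p_x

EY = expect(lambda x, y: y)

def D_w(x, y):
    """Deviation of Y from the center of its own strip."""
    return y - b(x)

def D_b(x, y):
    """Deviation of the strip center from the overall mean E(Y)."""
    return b(x) - EY

cross = expect(lambda x, y: D_w(x, y) * D_b(x, y))   # E(D_w D_b)
within = expect(lambda x, y: D_w(x, y) ** 2)         # E(D_w^2)
between = expect(lambda x, y: D_b(x, y) ** 2)        # E(D_b^2)
var_Y = expect(lambda x, y: (y - EY) ** 2)           # Var(Y)

print(cross, within, between, var_Y)
```

For this distribution the cross term is 0, and the two remaining terms, 11/16 and 19/64, add up to \(Var(Y) = 63/64\).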

It makes sense that the two quantities on the right hand side are involved in the calculation of \(Var(Y)\). The variability of \(Y\) has two components:

  • the rough size of the variability within the individual vertical strips, that is, the expectation of the conditional variance

  • the variability between strips, measured by the variance of the centers of the strips.

The variance decomposition shows that you can just add the two terms to get \(Var(Y)\).
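The two components can also be computed strip by strip. The following sketch assumes a small hypothetical joint distribution (the dictionary `joint` and all variable names are illustrative). It groups the outcomes into vertical strips by the value of \(X\), finds the center and the variance of each strip, and then combines them.

```python
from collections import defaultdict
from fractions import Fraction as F

# Hypothetical joint distribution of (X, Y): {(x, y): P(X = x, Y = y)}
joint = {
    (0, 0): F(1, 8), (0, 1): F(1, 8),
    (1, 0): F(1, 8), (1, 2): F(3, 8),
    (2, 1): F(1, 8), (2, 3): F(1, 8),
}

def mean(dist):
    """Expectation of a distribution given as {value: probability}."""
    return sum(p * v for v, p in dist.items())

def variance(dist):
    """Variance of a distribution given as {value: probability}."""
    m = mean(dist)
    return sum(p * (v - m) ** 2 for v, p in dist.items())

# Group the joint probabilities into vertical strips, one per value of X
strips = defaultdict(dict)
for (x, y), p in joint.items():
    strips[x][y] = p

p_x, center, spread = {}, {}, {}
for x, strip in strips.items():
    p_x[x] = sum(strip.values())                       # P(X = x)
    cond = {y: p / p_x[x] for y, p in strip.items()}   # distribution of Y given X = x
    center[x] = mean(cond)                             # E(Y | X = x)
    spread[x] = variance(cond)                         # Var(Y | X = x)

# Within strips: expectation of the conditional variance
exp_cond_var = sum(p_x[x] * spread[x] for x in p_x)

# Between strips: variance of the strip centers, weighted by P(X = x)
mean_center = sum(p_x[x] * center[x] for x in p_x)
var_cond_mean = sum(p_x[x] * (center[x] - mean_center) ** 2 for x in p_x)

# Direct calculation from the marginal distribution of Y
marginal_y = defaultdict(F)
for (x, y), p in joint.items():
    marginal_y[y] += p
var_Y = variance(marginal_y)

print(exp_cond_var, var_cond_mean, var_Y)
```

Here the within-strip term is 11/16 and the between-strip term is 19/64; their sum, 63/64, agrees with the variance computed directly from the marginal distribution of \(Y\).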

This decomposition is the basis of analysis of variance (ANOVA), widely used in statistics. In this course we are going to use it to find variances by conditioning.
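As an illustration of finding a variance by conditioning, consider a hypothetical setup that is not taken from the text: roll a fair die to get \(N\), then toss a fair coin \(N\) times and let \(Y\) be the number of heads. Given \(N = n\), \(Y\) is binomial \((n, 1/2)\), so \(E(Y \mid N) = N/2\) and \(Var(Y \mid N) = N/4\). The sketch below applies the decomposition and compares the answer with a brute-force calculation from the distribution of \(Y\).

```python
from fractions import Fraction as F
from math import comb

# Hypothetical example: N is the roll of a fair die
faces = range(1, 7)
p_n = F(1, 6)

E_N = sum(p_n * n for n in faces)
Var_N = sum(p_n * (n - E_N) ** 2 for n in faces)

# Given N = n, Y is binomial (n, 1/2):
#   E(Y | N) = N/2   and   Var(Y | N) = N/4
exp_cond_var = E_N / 4       # E(Var(Y | N)) = E(N/4)
var_cond_mean = Var_N / 4    # Var(E(Y | N)) = Var(N/2)
var_by_conditioning = exp_cond_var + var_cond_mean

# Brute-force check: build the distribution of Y by summing over n
pmf_Y = {}
for n in faces:
    for k in range(n + 1):
        pmf_Y[k] = pmf_Y.get(k, 0) + p_n * comb(n, k) * F(1, 2) ** n

E_Y = sum(p * k for k, p in pmf_Y.items())
var_direct = sum(p * (k - E_Y) ** 2 for k, p in pmf_Y.items())

print(var_by_conditioning, var_direct)
```

Both routes give \(Var(Y) = 77/48\). The conditioning route needs only the mean and variance of \(N\), while the direct route requires the whole distribution of \(Y\).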