# 22.1. Conditional Expectation As a Projection

Suppose we are trying to predict the value of a random variable \(Y\) based on a related random variable \(X\). As you saw in Data 8, a natural method of prediction is to use the “center of the vertical strip” at the given value of \(X\).

Formally, given \(X=x\), we are proposing to predict \(Y\) by \(E(Y \mid X=x)\).

The conditional expectation \(E(Y \mid X)\) is the function of \(X\) defined by

\[ b(x) ~ = ~ E(Y \mid X=x) \]

We are using the letter \(b\) to signify the “best guess” of \(Y\) given the value of \(X\). Later in this chapter we will make precise the sense in which it is the best.

In random variable notation,

\[ E(Y \mid X) ~ = ~ b(X) \]

For a point \((X, Y)\), the error in this guess is

\[ D_w ~ = ~ Y - b(X) \]

The subscript \(w\) reminds us that this error is a deviation *within* a vertical strip – it is the difference between \(Y\) and the center of the strip at the given value of \(X\).
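As a small illustration, the sketch below computes the strip centers and the within-strip errors for a made-up joint distribution, taking \(b(x) = E(Y \mid X=x)\) and the error at a point to be \(y - b(x)\). The table `joint` and the helper names are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
from fractions import Fraction

# Hypothetical joint distribution of (X, Y):
# keys are (x, y) and values are P(X = x, Y = y).
joint = {
    (0, 1): Fraction(1, 8), (0, 3): Fraction(1, 8),
    (1, 2): Fraction(1, 4), (1, 6): Fraction(1, 4),
    (2, 5): Fraction(1, 4),
}

def marginal_x(joint):
    """Return a dict giving P(X = x) for each possible x."""
    px = {}
    for (x, _), p in joint.items():
        px[x] = px.get(x, 0) + p
    return px

def strip_centers(joint):
    """Return a dict with b[x] = E(Y | X = x), the center of the strip at x."""
    px = marginal_x(joint)
    totals = {}
    for (x, y), p in joint.items():
        totals[x] = totals.get(x, 0) + y * p
    return {x: totals[x] / px[x] for x in px}

centers = strip_centers(joint)

# Error within the strip at each possible point: y - b(x)
errors = {(x, y): y - centers[x] for (x, y) in joint}

for x in sorted(centers):
    print(f"b({x}) = {centers[x]}")
```

For this table the strip centers are 2, 4, and 5, and each error is the signed vertical distance from a point to the center of its own strip.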

To find properties of \(b(X)\) as an estimate of \(Y\) it will be helpful to recall some properties of conditional expectation.

## 22.1.1. Conditional Expectation: Review

The properties of conditional expectation are analogous to those of expectation, but the identities are of random variables, not real numbers. There are also some additional properties due to the aspect of conditioning. We provide a list of the properties here for ease of reference.

- **Linear transformation**: \(E(aY + b \mid X) ~ = ~ aE(Y \mid X) + b\)
- **Additivity**: \(E(Y + W \mid X) ~ = ~ E(Y \mid X) + E(W \mid X)\)
- **“The given variable is a constant”**: \(E(g(X) \mid X) ~ = ~ g(X)\)
- **“Pulling out” constants**: \(E(g(X)Y \mid X) ~ = ~ g(X)E(Y \mid X)\)
- **Independence**: If \(X\) and \(Y\) are independent then \(E(Y \mid X) = E(Y)\), a constant.
- **Iteration**: \(E(Y) = E\big{(}E(Y \mid X)\big{)}\)
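A few of these identities can be checked numerically on a small example. The sketch below uses a hypothetical joint distribution and a hypothetical helper `cond_exp`; it is a spot check of the linear transformation, “pulling out”, and iteration properties for one table, not a proof.

```python
from fractions import Fraction

# Hypothetical joint distribution: keys (x, y), values P(X = x, Y = y).
joint = {
    (1, 0): Fraction(1, 6), (1, 4): Fraction(1, 3),
    (2, 1): Fraction(1, 4), (2, 3): Fraction(1, 4),
}
x_values = sorted({x for (x, _) in joint})

def prob_x(x):
    """P(X = x)."""
    return sum(p for (xx, _), p in joint.items() if xx == x)

def cond_exp(h, x):
    """E(h(X, Y) | X = x), computed directly from the joint table."""
    total = sum(h(xx, y) * p for (xx, y), p in joint.items() if xx == x)
    return total / prob_x(x)

def g(x):
    """An arbitrary function of X, used to test 'pulling out'."""
    return x ** 2

for x in x_values:
    e_y = cond_exp(lambda x_, y: y, x)
    # Linear transformation, with a = 3 and b = 2
    assert cond_exp(lambda x_, y: 3 * y + 2, x) == 3 * e_y + 2
    # "Pulling out": E(g(X)Y | X = x) = g(x) E(Y | X = x)
    assert cond_exp(lambda x_, y: g(x_) * y, x) == g(x) * e_y

# Iteration: E(Y) = E(E(Y | X))
e_y_direct = sum(y * p for (_, y), p in joint.items())
e_y_iterated = sum(cond_exp(lambda x_, y: y, x) * prob_x(x) for x in x_values)
print(e_y_direct, e_y_iterated)
```

Exact fractions are used so that the equalities hold exactly rather than up to rounding.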

## 22.1.2. Expected Error is Zero

By additivity,

\[ E(D_w \mid X) ~ = ~ E(Y \mid X) - E\big{(}b(X) \mid X\big{)} ~ = ~ b(X) - b(X) ~ = ~ 0 \]

In other words, the average of the deviations within a strip is \(0\).

By iteration,

\[ E(D_w) ~ = ~ 0 ~~~~~~ \text{and} ~~~~~~ E\big{(}b(X)\big{)} = E(Y) \]
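These facts can be confirmed exactly in a simple case. The sketch below assumes, purely as an example, that \(X\) is the first roll of a fair die and \(Y\) is the sum of two rolls. It takes \(b(x) = E(Y \mid X=x)\) and the error to be \(Y - b(X)\); the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Assumed example: X = first roll of a fair die, Y = sum of two rolls.
# Each (x, y) pair below corresponds to one of 36 equally likely outcomes.
p = Fraction(1, 36)
outcomes = [(i, i + j) for i, j in product(range(1, 7), repeat=2)]

def b(x):
    """E(Y | X = x): the average of y over the strip at x."""
    strip = [y for (xx, y) in outcomes if xx == x]
    return Fraction(sum(strip), len(strip))

# Average error within each strip: E(Y - b(X) | X = x); each strip has 6 points
cond_mean_error = {
    x: sum(y - b(x) for (xx, y) in outcomes if xx == x) / 6
    for x in range(1, 7)
}

# Overall expectations
expected_error = sum(p * (y - b(x)) for (x, y) in outcomes)
expected_b = sum(p * b(x) for (x, _) in outcomes)
expected_y = sum(p * y for (_, y) in outcomes)

print(cond_mean_error)
print(expected_error, expected_b, expected_y)
```

In this example \(b(x) = x + 7/2\), the average error is 0 in every strip and overall, and both \(E\big{(}b(X)\big{)}\) and \(E(Y)\) equal 7.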