Expectations of Functions

Once we start using random variables as estimators, we will want to see how far the estimate is from a desired value. For example, we might want to see how far a random variable X is from the number 10. That’s a function of X. Let’s call it Y. Then

$$Y = |X - 10|$$

which is not a linear function. To find E(Y), we need a bit more technique. Throughout, we will assume that all the expectations that we are discussing are well defined.

This section is about finding the expectation of a function of a random variable whose distribution you know.

In what follows, let X be a random variable whose distribution (and hence also its expectation) is known.

Linear Function Rule

Let Y=aX+b for some constants a and b. In an earlier section we showed that

E(Y)=aE(X)+b

This includes the case a=0, in which Y is just the constant b and therefore has expectation b.
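As a quick numerical check of the linear function rule, here is a minimal sketch in plain Python. The distribution, the constants `a` and `b`, and the helper name `expectation` are illustrative assumptions, not part of the text above.

```python
import math

# Illustrative distribution of X (assumed for this check only).
values = [1, 2, 3, 4, 5]
probs = [0.15, 0.25, 0.3, 0.2, 0.1]

def expectation(values, probs):
    """Weighted average of the values, weighted by their probabilities."""
    return math.fsum(v * p for v, p in zip(values, probs))

a, b = 2, -7  # arbitrary constants chosen for the check

# Left side: E(aX + b), computed by transforming each value of X.
lhs = expectation([a * v + b for v in values], probs)

# Right side: a E(X) + b.
rhs = a * expectation(values, probs) + b

# Special case a = 0: Y is the constant b, so its expectation should be b.
constant_case = expectation([0 * v + b for v in values], probs)

print(lhs, rhs, constant_case)
```

Up to floating-point rounding, the two sides agree, and the constant case returns b.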

Non-linear Function Rule

Now let Y=g(X) where g is any numerical function. Remember that X is a function on Ω. So the function that defines the random variable Y is a composition:

$$Y(\omega) = (g \circ X)(\omega) \quad \text{for } \omega \in \Omega$$

This allows us to write E(Y) in three equivalent ways:

On the range of Y

$$E(Y) = \sum_{\text{all } y} y P(Y = y)$$

On the domain Ω

$$E(Y) = E(g(X)) = \sum_{\omega \in \Omega} (g \circ X)(\omega) P(\omega)$$

On the range of X

$$E(Y) = E(g(X)) = \sum_{\text{all } x} g(x) P(X = x)$$

As before, it is a straightforward matter of grouping to show that all the forms are equivalent.
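To see the grouping concretely, the sketch below computes the same expectation in all three ways on a small outcome space. The outcome space (one roll of a fair die), the random variable `X`, and the function `g` are hypothetical choices made only for illustration; exact fractions are used so the three answers can be compared without rounding.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical outcome space: one roll of a fair six-sided die.
omega = [1, 2, 3, 4, 5, 6]
P = {w: Fraction(1, 6) for w in omega}

def X(w):
    # Illustrative random variable: remainder of the roll on division by 3.
    return w % 3

def g(x):
    # Illustrative non-linear function.
    return (x - 1) ** 2

# On the domain: one term per outcome.
e_domain = sum(g(X(w)) * P[w] for w in omega)

# On the range of X: group outcomes by the value of X, then sum g(x) P(X=x).
dist_X = defaultdict(Fraction)
for w in omega:
    dist_X[X(w)] += P[w]
e_range_X = sum(g(x) * p for x, p in dist_X.items())

# On the range of Y: group outcomes by the value of Y = g(X), then sum y P(Y=y).
dist_Y = defaultdict(Fraction)
for w in omega:
    dist_Y[g(X(w))] += P[w]
e_range_Y = sum(y * p for y, p in dist_Y.items())

print(e_domain, e_range_X, e_range_Y)
```

All three sums come out to 2/3. Each form adds up the same terms; they differ only in how the outcomes are grouped before adding.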

The first form looks the simplest, but there’s a catch: you need to first find P(Y=y). The second form involves an unnecessarily high level of detail.

The third form is the one to use. It uses the known distribution of X. It says that to find E(Y) where Y=g(X) for some function g:

  • Take a generic value x of X.
  • Apply g to x; this g(x) is a generic value of Y.
  • Weight g(x) by P(X=x), which is known.
  • Do this for all x and add. The sum is E(Y).
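The steps above can be sketched as a small helper in plain Python. The function name `expectation_of_function` and the example distribution are assumptions introduced here for illustration.

```python
import math

def expectation_of_function(values, probs, g):
    """Return E(g(X)) as the sum of g(x) * P(X = x) over all values x."""
    if not math.isclose(math.fsum(probs), 1.0):
        raise ValueError("probabilities must sum to 1")
    # Apply g to each value x, weight by P(X = x), and add.
    return math.fsum(g(x) * p for x, p in zip(values, probs))

# Hypothetical distribution of X, used only to exercise the helper.
values = [0, 1, 2]
probs = [0.25, 0.5, 0.25]

# E(X^2): the function g squares its argument.
e_square = expectation_of_function(values, probs, lambda x: x ** 2)

# E(|X - 10|): the distance of X from 10.
e_distance = expectation_of_function(values, probs, lambda x: abs(x - 10))

print(e_square, e_distance)
```

The helper takes only the distribution of X and the function g; nothing about the distribution of g(X) is computed along the way.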

The crucial thing to note about this method is that we didn’t have to first find the distribution of Y. That saves us a lot of work. Let’s see how our method works in some examples.

Example 1: $Y = |X - 3|$

Let X have a distribution we worked with earlier:

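from datascience import *   # Table, make_array
from prob140 import *       # distribution methods: values, probability
import numpy as np

# Distribution of X: possible values and their probabilities
x = np.arange(1, 6)
probs = make_array(0.15, 0.25, 0.3, 0.2, 0.1)
dist = Table().values(x).probability(probs)
dist = dist.relabel('Value', 'x').relabel('Probability', 'P(X=x)')
dist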
x P(X=x)
1 0.15
2 0.25
3 0.3
4 0.2
5 0.1

Let g be the function defined by $g(x) = |x - 3|$, and let $Y = g(X)$. In other words, $Y = |X - 3|$.

To calculate E(Y), we first have to create a column that transforms the values of X into values of Y:

dist_with_Y = dist.with_column('g(x)', np.abs(dist.column('x')-3)).move_to_end('P(X=x)')

dist_with_Y
x g(x) P(X=x)
1 2 0.15
2 1 0.25
3 0 0.3
4 1 0.2
5 2 0.1

To get E(Y), find the appropriate weighted average: multiply the g(x) and P(X=x) columns, and add. The calculation shows that E(Y)=0.95.

ev_Y = sum(dist_with_Y.column('g(x)') * dist_with_Y.column('P(X=x)'))
ev_Y
0.94999999999999996
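The displayed value is 0.95 up to floating-point representation. As a cross-check that does not rely on the table library, here is a minimal plain-Python sketch that computes E(Y) both from the distribution of X and from the distribution of Y obtained by grouping. The variable names are assumptions made for this sketch.

```python
import math
from collections import defaultdict

x_values = [1, 2, 3, 4, 5]
probs = [0.15, 0.25, 0.3, 0.2, 0.1]

def g(x):
    return abs(x - 3)

# On the range of X: weight each g(x) by P(X = x) and add.
ev_from_X = math.fsum(g(x) * p for x, p in zip(x_values, probs))

# On the range of Y: first build P(Y = y) by grouping the values of x
# that map to the same y, then take the weighted average of the y values.
dist_Y = defaultdict(float)
for x, p in zip(x_values, probs):
    dist_Y[g(x)] += p
ev_from_Y = math.fsum(y * p for y, p in dist_Y.items())

# Compare with 0.95 using a tolerance rather than exact equality.
print(math.isclose(ev_from_X, 0.95), math.isclose(ev_from_Y, 0.95))
```

Comparing floats with `math.isclose` avoids being misled by the last printed digits. The second route needs the extra grouping step, which is exactly the work the non-linear function rule lets us skip.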

Example 2: Y=min(X,3)

Let X be as above, but now let Y=min(X,3). We want E(Y). What we know is the distribution of X:

dist
x P(X=x)
1 0.15
2 0.25
3 0.3
4 0.2
5 0.1

To find E(Y) we can just go row by row and replace the value of x by the value of min(x,3), and then find the weighted average:

ev_Y = 1*0.15 + 2*0.25 + 3*0.3 + 3*0.2 + 3*0.1
ev_Y
2.45
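The row-by-row replacement can also be written as a short computation instead of being typed out by hand. This is a plain-Python sketch with assumed variable names; with NumPy arrays, the elementwise analogue of `min(x, 3)` would be `np.minimum(x, 3)`.

```python
import math

x_values = [1, 2, 3, 4, 5]
probs = [0.15, 0.25, 0.3, 0.2, 0.1]

# Replace each x by min(x, 3), weight by P(X = x), and add.
y_values = [min(x, 3) for x in x_values]
ev_Y = math.fsum(y * p for y, p in zip(y_values, probs))

print(y_values)
print(math.isclose(ev_Y, 2.45))
```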

Example 3: $E(X^2)$ for a Poisson Variable X

Let X have the Poisson $(\mu)$ distribution. You will see in the next chapter that it will be useful to know the value of $E(X^2)$. By our non-linear function rule,

$$E(X^2) = \sum_{k=0}^{\infty} k^2 e^{-\mu} \frac{\mu^k}{k!}$$

This sum turns out to be hard to simplify. The term for $k = 0$ is 0. In each term for $k \ge 1$, one of the $k$'s in the numerator cancels a $k$ in the denominator but the other factor of $k$ in the numerator remains. It would be so nice if that factor $k$ were $k - 1$ instead, so it could cancel $k - 1$ in the denominator.

This motivates the following calculation. No matter what X is, if we know $E(X)$ and can find $E(X(X-1))$, then we can use additivity to find $E(X^2)$ as follows:

$$E(X(X-1)) = E(X^2 - X) = E(X^2) - E(X)$$

so

$$E(X^2) = E(X(X-1)) + E(X)$$

Let’s see if we can find $E(X(X-1))$ by applying the non-linear function rule.

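$$E(X(X-1)) = \sum_{k=0}^{\infty} k(k-1) e^{-\mu} \frac{\mu^k}{k!} = e^{-\mu} \mu^2 \sum_{k=2}^{\infty} \frac{\mu^{k-2}}{(k-2)!} = e^{-\mu} \mu^2 e^{\mu} = \mu^2$$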
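The middle steps of this calculation can be spelled out as follows. The index $j$ is introduced here only for the change of variable, and the last step uses the power series of the exponential function.

```latex
\begin{align*}
\sum_{k=0}^{\infty} k(k-1)\, e^{-\mu} \frac{\mu^k}{k!}
  &= \sum_{k=2}^{\infty} k(k-1)\, e^{-\mu} \frac{\mu^k}{k!}
     && \text{(the terms for } k = 0 \text{ and } k = 1 \text{ are } 0\text{)} \\
  &= e^{-\mu} \sum_{k=2}^{\infty} \frac{\mu^k}{(k-2)!}
     && \text{(since } k! = k(k-1)(k-2)!\text{)} \\
  &= e^{-\mu} \mu^2 \sum_{k=2}^{\infty} \frac{\mu^{k-2}}{(k-2)!} \\
  &= e^{-\mu} \mu^2 \sum_{j=0}^{\infty} \frac{\mu^{j}}{j!}
     && \text{(substituting } j = k - 2\text{)} \\
  &= e^{-\mu} \mu^2 e^{\mu} = \mu^2
\end{align*}
```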

We know that E(X)=μ, so

$$E(X^2) = \mu^2 + \mu$$

Notice that $E(X^2) > (E(X))^2$. This is an instance of a general fact. Later in the course we will see why it matters.
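These formulas can be checked numerically by applying the non-linear function rule to a truncated Poisson sum. This is a rough sketch under stated assumptions: the value `mu = 3.5`, the cutoff of 100 terms, and the helper names are arbitrary choices made here, and the truncation is harmless only because the Poisson probabilities beyond the cutoff are negligible for this `mu`.

```python
import math

def poisson_probs(mu, n_terms):
    """Return P(X = k) for k = 0, ..., n_terms - 1, built term by term."""
    probs = []
    p = math.exp(-mu)          # P(X = 0)
    for k in range(n_terms):
        probs.append(p)
        p = p * mu / (k + 1)   # P(X = k + 1) from P(X = k)
    return probs

def truncated_expectation(g, probs):
    """Approximate E(g(X)) by summing g(k) P(X = k) over the stored terms."""
    return math.fsum(g(k) * p for k, p in enumerate(probs))

mu = 3.5                       # illustrative parameter value
probs = poisson_probs(mu, 100)

e_x = truncated_expectation(lambda k: k, probs)
e_x_xm1 = truncated_expectation(lambda k: k * (k - 1), probs)
e_x2 = truncated_expectation(lambda k: k ** 2, probs)

print(math.isclose(e_x, mu))                 # E(X) = mu
print(math.isclose(e_x_xm1, mu ** 2))        # E(X(X-1)) = mu^2
print(math.isclose(e_x2, mu ** 2 + mu))      # E(X^2) = mu^2 + mu
print(e_x2 > e_x ** 2)                       # E(X^2) > (E(X))^2
```

Building each probability from the previous one avoids computing large factorials and large powers separately.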

For now, as an exercise, see if you can find $E(X(X-1)(X-2))$ and hence $E(X^3)$.