12.2. Prediction and Estimation

One way to think about the SD is in terms of errors in prediction. Suppose I am going to generate a value of the random variable X, and I ask you to predict the value I am going to get. What should you use as your predictor?

A natural choice is $\mu_X$, the expectation of $X$. But you could choose any number $c$. The error that you will make is $X - c$. About how big is that? For most reasonable choices of $c$, the error will sometimes be positive and sometimes negative. To find the rough size of this error, we will avoid cancellation as before, and start by calculating the mean squared error of the predictor $c$:

$$
MSE(c) = E[(X - c)^2]
$$

Notice that by definition, the variance of $X$ is the mean squared error of using $\mu_X$ as the predictor.

$$
MSE(\mu_X) = E[(X - \mu_X)^2] = \sigma_X^2
$$

We will now show that $\mu_X$ is the least squares constant predictor, that is, it has the smallest mean squared error among all constant predictors. Since we have guessed that $\mu_X$ is the best choice, we will organize the algebra around that value.

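$$
\begin{align*}
MSE(c) &= E[(X - c)^2] \\
&= E[((X - \mu_X) + (\mu_X - c))^2] \\
&= E[(X - \mu_X)^2] + 2(\mu_X - c)E[(X - \mu_X)] + (\mu_X - c)^2 \\
&= \sigma_X^2 + 0 + (\mu_X - c)^2 \\
&\ge \sigma_X^2 \\
&= MSE(\mu_X)
\end{align*}
$$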

with equality if and only if $c = \mu_X$.
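The decomposition above can be checked numerically. The following is a minimal sketch, assuming for illustration that $X$ is the number of spots on one roll of a fair die (an example that is not part of the text above). The names `values`, `probs`, and `mse` are hypothetical and introduced only for this check.

```python
import numpy as np

# Illustrative assumption: X is one roll of a fair six-sided die
values = np.arange(1, 7)
probs = np.ones(6) / 6

mu_X = np.sum(values * probs)                  # E(X)
var_X = np.sum((values - mu_X)**2 * probs)     # Var(X), which is MSE(mu_X)

def mse(c):
    """Exact mean squared error E[(X - c)^2] of the constant predictor c."""
    return np.sum((values - c)**2 * probs)

# For each c, print MSE(c) next to var_X + (mu_X - c)^2; the two should agree
for c in [2, 3, mu_X, 4, 5]:
    print(c, mse(c), var_X + (mu_X - c)**2)
```

In each printed row the last two numbers should match, and the smallest value should appear in the row where `c` equals `mu_X`.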

12.2.1. The Mean as a Least Squares Predictor

What we have shown is that the predictor $\mu_X$ has the smallest mean squared error among all choices $c$. That smallest mean squared error is the variance of $X$, and hence the smallest root mean squared error is the SD $\sigma_X$.

This is why a common approach to prediction is, “My guess is the mean, and I’ll be off by about an SD.”
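As a rough illustration of this rule of thumb, the sketch below simulates draws from a distribution with known mean and SD and measures the typical size of the error made by guessing the mean. The exponential distribution with mean 2 (and therefore SD 2), the seed, the sample size, and the alternative guess of 3 are all arbitrary choices made only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the run is reproducible

# Illustrative assumption: X is exponential with mean 2, so SD(X) is also 2
mu_X, sigma_X = 2, 2
x = rng.exponential(scale=mu_X, size=100_000)

# Root mean squared error when every prediction is the constant mu_X
rmse_mean = np.sqrt(np.mean((x - mu_X)**2))

# The same calculation for a different constant guess, for comparison
rmse_other = np.sqrt(np.mean((x - 3)**2))

print(rmse_mean, sigma_X, rmse_other)
```

The first two printed numbers should be close to each other, and the third should be larger, since any constant other than the mean has a bigger root mean squared error.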

Quick Check

Your friend has a random dollar amount X in their wallet. Suppose you know that E(X)=16 dollars and SD(X)=3 dollars. In all your answers below, please include units of measurement.

(a) What is the least squares constant predictor of X?

(b) What is the mean squared error of this predictor?

(c) What is the root mean squared error of this predictor?

12.2.2. German Tanks, Revisited

Recall the German tanks problem in which we have a sample $X_1, X_2, \ldots, X_n$ drawn at random without replacement from $1, 2, \ldots, N$ for some fixed $N$, and we are trying to estimate $N$.

We came up with two unbiased estimators of $N$:

  • An estimator based on the sample mean: $T_1 = 2\bar{X}_n - 1$ where $\bar{X}_n$ is the sample average $\frac{1}{n}\sum_{i=1}^n X_i$

  • An estimator based on the sample maximum: $T_2 = M \cdot \frac{n+1}{n} - 1$ where $M = \max(X_1, X_2, \ldots, X_n)$.
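For reference, the unbiasedness of both estimators follows from linearity of expectation. The short calculation below takes as given two facts that are not derived in this section: each $X_i$ is uniform on $1, 2, \ldots, N$, so $E(X_i) = (N+1)/2$, and the expected sample maximum is $E(M) = n(N+1)/(n+1)$.

$$
\begin{align*}
E(T_1) &= 2E(\bar{X}_n) - 1 = 2 \cdot \frac{N+1}{2} - 1 = N \\
E(T_2) &= E(M) \cdot \frac{n+1}{n} - 1 = \frac{n(N+1)}{n+1} \cdot \frac{n+1}{n} - 1 = N
\end{align*}
$$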

Here are simulated distributions of $T_1$ and $T_2$ in the case $N = 300$ and $n = 30$, based on 5000 repetitions.

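import numpy as np
import matplotlib.pyplot as plt
from datascience import Table   # Table is assumed to come from the datascience package

def simulate_T1_T2(N, n):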
    """Returns one pair of simulated values of T_1 and T_2
    based on the same simple random sample"""
    tanks = np.arange(1, N+1)
    sample = np.random.choice(tanks, size=n, replace=False)
    t1 = 2*np.mean(sample) - 1
    t2 = max(sample)*(n+1)/n - 1
    return [t1, t2]

def compare_T1_T2(N, n, repetitions):
    """Returns a table of simulated values of T_1 and T_2, 
    with the number of rows = repetitions
    and each row containing the two estimates based on the same simple random sample"""
    tbl = Table(['T_1 = 2*Mean-1', 'T_2 = Augmented Max'])
    for i in range(repetitions):
        tbl.append(simulate_T1_T2(N, n))
    return tbl

N = 300
n = 30
repetitions = 5000
comparison = compare_T1_T2(N, n, repetitions)
comparison.hist(bins=np.arange(N/2, 3*N/2))
plt.title('$N =$'+str(N)+', $n =$'+str(n)+' ('+str(repetitions)+' repetitions)');
[Figure: overlaid histograms of the simulated values of T_1 and T_2, titled "N = 300, n = 30 (5000 repetitions)"]

We know that both estimators are unbiased: $E(T_1) = N = E(T_2)$. But it is clear from the simulation that $SD(T_1) > SD(T_2)$, and hence $T_2$ is a better estimator than $T_1$.

The empirical values of the two means and standard deviations based on this simulation are calculated below.

t1 = comparison.column(0)
np.mean(t1), np.std(t1)
(300.07926666666668, 30.068877736808055)
t2 = comparison.column(1)
np.mean(t2), np.std(t2)
(299.98106666666666, 9.1113762209668376)

These standard deviations are calculated based on empirical data given a specified value of the parameter $N = 300$ and a specified sample size $n = 30$. In the next chapter we will develop properties of the SD that will allow us to obtain algebraic expressions for $SD(T_1)$ and $SD(T_2)$ for all $N$ and $n$.
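Even without an algebraic expression, the exact mean and SD of $T_2$ can be computed numerically for particular values of $N$ and $n$. The sketch below is one way to do that, not a method from this section. It relies on the counting fact that $P(M = m) = \binom{m-1}{n-1} / \binom{N}{n}$ for $m = n, n+1, \ldots, N$, which is not stated here and is taken as an assumption. The function name `exact_mean_sd_T2` is hypothetical.

```python
from math import comb, sqrt

def exact_mean_sd_T2(N, n):
    """Exact mean and SD of T_2 = M*(n+1)/n - 1, where M is the maximum of a
    simple random sample of size n drawn from 1, 2, ..., N."""
    total = comb(N, n)   # number of equally likely samples
    mean = 0.0
    second_moment = 0.0
    for m in range(n, N + 1):
        # M = m when m is in the sample and the other n-1 values are below m
        p = comb(m - 1, n - 1) / total
        t2 = m * (n + 1) / n - 1
        mean += t2 * p
        second_moment += t2**2 * p
    return mean, sqrt(second_moment - mean**2)

print(exact_mean_sd_T2(300, 30))
```

With $N = 300$ and $n = 30$, the mean should come out as 300 up to rounding, consistent with unbiasedness, and the SD can be compared with the empirical value from the simulation.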