# 24.5. Exercises¶

1. Let $$X$$ and $$Y$$ be jointly distributed random variables and let $$\hat{Y}$$ be the linear regression estimate of $$Y$$ based on $$X$$. Show that the mean squared error of this estimate is $$(1 - r^2)Var(Y)$$ where $$r$$ is the correlation between $$X$$ and $$Y$$. This leads to the Data 8 formula for the SD of the residuals in simple linear regression.

[Use Alternative Form I of the regression equation, and preserve deviations as we did here.]

2. Let $$X$$ and $$Y$$ be standard bivariate normal with correlation $$\rho$$.

(a) Suppose I ask you for the least squares estimate of $$Y$$ based on $$X$$, but I don’t tell you $$X$$. What is your estimate, and what is its mean squared error?

(b) Suppose I now show you $$X$$. Now what is your least squares estimate of $$Y$$, and what is its mean squared error?

(c) What is your least squares estimate of $$Y$$ based only on linear functions of $$X$$, and what is its mean squared error?

3. Let $$(X, Y)$$ be the weight and height of a person picked at random from a population, and suppose the distribution of $$(X, Y)$$ is bivariate normal with correlation 0.6. Suppose also that

• $$X$$ has mean 150 pounds and SD 25 pounds

• $$Y$$ has mean 68 inches and SD 3 inches

Sketch the conditional density of $$Y$$ given $$X = 170$$ pounds. Mark the numerical values of the conditional mean and SD appropriately on your sketch.

4. Let $$X$$ and $$Y$$ have a bivariate normal distribution (not necessarily standard) with correlation $$\rho \in (0, 1)$$. Suppose you are given that $$X$$ is on the 30th percentile.

(a) Pick the right option for the least squares estimate of $$Y$$, and explain.

(i) Below the 30th percentile

(ii) On the 30th percentile

(iii) Above the 30th percentile

(b) Write a single math expression for the percentile rank corresponding to the least squares estimate of $$Y$$. Your answer can involve $$\rho$$ and the standard normal cdf $$\Phi$$.

5. Let $$X$$ and $$Y$$ be standard bivariate normal with correlation $$\rho \in (0, 1)$$.

(a) Without calculation, pick the right option and explain. $$P(X > 0, Y < 0)$$ is

(i) less than $$0.25$$

(ii) equal to $$0.25$$

(iii) greater than $$0.25$$

(b) Now find $$P(X > 0, Y < 0)$$ in terms of $$\rho$$.

[No integration is needed. Write $$Y$$ in terms of $$X$$ and standard normal $$Z$$ independent of $$X$$, sketch the region, and use what you know about the joint density of $$(X, Z)$$.]

6. Let $$X$$ and $$Y$$ be standard bivariate normal with correlation $$\rho$$. Find $$E(\max(X, Y))$$. The easiest way is to use the fact that for any two numbers $$a$$ and $$b$$, $$\max(a, b) = (a + b + \vert a - b \vert)/2$$. Check the fact first, and then use it.

7. Suppose that $$X$$ is normal $$(\mu_X, \sigma_X^2$$), $$Y$$ is normal $$(\mu_Y, \sigma_Y^2)$$, and the two random variables are independent. Let $$S = X+Y$$.

(a) Find the conditional distribution of $$X$$ given $$S=s$$.

(b) Find the least squares predictor of $$X$$ based on $$S$$ and provide its mean squared error.

(c) Find the least squares linear predictor of $$X$$ based on $$S$$ and provide its mean squared error.