Further Review Exercises
Many of these exercises require the material of Chapter 19 onwards, which of course relies on the material of the previous chapters. However, some of them can be solved using earlier material alone.
According to students and alumni, some of the exercises have appeared as questions in “quant” interviews.
1. A coin lands heads with probability p. Let X be the number of tosses till the first head appears and let Y be the number of tails before the first head.
(a) Find the moment generating function of X.
(b) Use the answer to (a) to find E(X). Note that by now you have found E(X) in several ways: by the tail sum formula, by conditioning on the first toss, by the pgf, and now by the mgf.
(c) Use the answer to (a) to find the moment generating function of Y.
2. Let X1,X2,…,Xn be i.i.d. Poisson (μ) random variables. Find the maximum likelihood estimate of μ.
3. Let X1,X2,…,Xn be i.i.d. uniform on (0,θ).
(a) Find the MLE of θ. [Don’t leap into a calculation. Sketch a graph of the function you are trying to maximize, and be careful about its domain.]
(b) Is the MLE unbiased? If not, use the MLE to construct an unbiased estimate of θ.
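If you want a numerical sanity check for this exercise, here is a minimal simulation sketch. It assumes numpy and uses illustrative values of θ, n, and the number of repetitions; none of these are part of the exercise.

```python
# Simulation sketch for Exercise 3: look at the average value of the sample
# maximum to see whether it is a biased estimate of theta.
import numpy as np

rng = np.random.default_rng(0)
theta, n, reps = 10.0, 5, 100_000          # illustrative values only

samples = rng.uniform(0, theta, size=(reps, n))
estimates = samples.max(axis=1)            # the sample maximum as an estimate of theta
print("average value of the estimate:", estimates.mean())   # compare with theta
```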
4. $X$ and $Y$ are i.i.d. with moment generating function $M(t) = e^{t + t^2}$, $-\infty < t < \infty$. What is the distribution of $(X - Y)^2$?
5. Capture-recapture methods are sometimes used to estimate population sizes. A standard image is that a pond contains N fish for some fixed but unknown N, and that G of the N fish have been captured, tagged, and returned alive to the pond. You can assume that G/N isn’t close to 0.
In the recapture phase, assume that a simple random sample of n fish is drawn from the N fish in the pond (you might have to use some imagination to believe this assumption). We can observe X, the random number of tagged fish in the sample.
The goal is to use the observation to estimate N.
(a) For large n, the sample proportion X/n is likely to be close to a constant. Identify the constant and hence construct an estimate of N based on X.
Later in this exercise you will see how your estimate is related to the MLE of N.
(b) For N≥n, find the likelihood lik(N). You can assume n>G.
(c) Find the likelihood ratio $R(N) = \text{lik}(N)/\text{lik}(N-1)$ for $N > n$. Simplify the answer as much as you can.
(d) Find the maximum likelihood estimate of N by comparing the likelihood ratios and 1. How does the MLE compare with your estimate in (a)?
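A small simulation of the recapture phase can help you get a feel for the estimate in (a). The sketch below assumes numpy; the values of N, G, and n are illustrative, not part of the exercise.

```python
# Simulation sketch for Exercise 5: draw a simple random sample from the pond
# and compare the sample proportion of tagged fish with G/N.
import numpy as np

rng = np.random.default_rng(0)
N, G, n = 1000, 100, 200               # pond size, tagged fish, recapture sample size

pond = np.zeros(N, dtype=int)
pond[:G] = 1                           # 1 marks a tagged fish
X = rng.choice(pond, size=n, replace=False).sum()   # tagged fish in the sample
print("X/n =", X / n, "   G/N =", G / N)
```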
6. Show that if r>1 and s>1 then the mode of the beta (r,s) distribution is (r−1)/(r+s−2). Remember to ignore multiplicative constants and take the log before maximizing.
7. Suppose that X has the beta (r,s) distribution, and that given X=p, the conditional distribution of H is binomial (10,p). Find
(a) the conditional distribution of X given H=7
(b) E(X∣H=7)
(c) the MAP estimate of X given H=7
(d) P(H=7)
(e) E(H)
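If you want to check parts (d) and (e) numerically, scipy's beta-binomial distribution does the mixing for you. The sketch below assumes scipy (version 1.4 or later) and uses illustrative values of r and s, which the exercise leaves general.

```python
# Numerical check sketch for Exercise 7(d) and 7(e), with illustrative r and s.
from scipy import stats

r, s, n, h = 2, 3, 10, 7
print("P(H = 7):", stats.betabinom(n, r, s).pmf(h))    # compare with your answer to (d)
print("E(H):    ", stats.betabinom(n, r, s).mean())    # compare with your answer to (e)
```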
8. The chance of heads of a random coin is picked according to the beta (r,s) distribution. The coin is tossed repeatedly.
(a) What is the chance that the first three tosses are heads and the next three tosses are tails?
(b) Given that the first three tosses are heads and the next three tosses are tails, what is the chance that the seventh toss is a head?
(c) Given that three out of the first six tosses are heads, what is the chance that the seventh toss is a head? Compare with the answer to (b).
9. Person A creates a coin by picking its chance of heads uniformly on (0,1). In three tosses of that coin, Person A gets two heads.
Independently of Person A, Person B creates a coin by picking its chance of heads uniformly on (0,1). In three tosses of that coin, Person B gets one head.
(a) Given the data, what is the distribution of the chance of heads of Person A’s coin?
(b) Given the data, what is the distribution of the chance of heads of Person B’s coin?
(c) Given the data, what is the probability that Person A’s coin has a higher chance of heads than Person B’s coin?
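One way to check your answer to (c) is to simulate the whole generative story and condition on the observed data by rejection. The sketch below assumes numpy; the number of repetitions is an arbitrary choice.

```python
# Rejection-sampling sketch for Exercise 9(c): keep only the simulated runs in
# which A gets two heads in three tosses and B gets one head in three tosses.
import numpy as np

rng = np.random.default_rng(0)
reps = 1_000_000

p_A = rng.uniform(size=reps)                 # Person A's chance of heads
p_B = rng.uniform(size=reps)                 # Person B's chance of heads
heads_A = rng.binomial(3, p_A)               # heads in A's three tosses
heads_B = rng.binomial(3, p_B)               # heads in B's three tosses

match = (heads_A == 2) & (heads_B == 1)      # condition on the observed data
print("estimate of (c):", (p_A[match] > p_B[match]).mean())
```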
10: Markov and Chebyshev Bounds on the Poisson-Binomial Upper Tail. For $j \ge 1$ let $I_j$ be independent indicators such that $P(I_j = 1) = p_j$. Let $X = I_1 + I_2 + \cdots + I_n$. Then $X$ is the number of successes in $n$ independent trials that are not necessarily identically distributed.
We say that $X$ has the Poisson-binomial distribution with parameters $p_1, p_2, \ldots, p_n$. The binomial is the special case when all the $p_j$'s are equal.
You saw in lab that the number of occupied tables in a Chinese Restaurant process had a Poisson-binomial distribution. These distributions arise in statistical learning theory, the theory of randomized algorithms, and other areas.
Let E(X)=μ. For c>0, you are going to find an upper bound on P(X≥(1+c)μ). That’s the chance that X exceeds its mean by some percent.
In the special case of the binomial, $\mu = np$ and so $P(X \ge (1+c)\mu)$ can be rewritten as $P\big(\frac{X}{n} - p \ge cp\big)$. That's the chance that the sample proportion exceeds $p$ by some percent.
(a) Find $\mu = E(X)$ and $\sigma^2 = \text{Var}(X)$ in terms of $p_1, p_2, \ldots, p_n$.
(b) Find Markov’s bound on P(X≥(1+c)μ).
(c) Find Chebyshev’s bound on P(X≥(1+c)μ) in terms of μ and σ.
(d) If all the pj’s are equal to p, what is the value of the bound in (c)?
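To see how tight your bounds are, you can simulate a Poisson-binomial tail probability and compare. The sketch below assumes numpy; the p_j's, c, and the number of repetitions are illustrative choices.

```python
# Simulation sketch for Exercise 10: empirical upper-tail probability of a
# Poisson-binomial random variable, to compare against the bounds in (b) and (c).
import numpy as np

rng = np.random.default_rng(0)
n, c = 50, 0.2                                          # illustrative choices
p = rng.uniform(0.1, 0.9, size=n)                       # the p_j's

X = (rng.uniform(size=(100_000, n)) < p).sum(axis=1)    # 100,000 draws of X
mu_hat = X.mean()                                       # compare with your answer to (a)
print("empirical mean and variance:", mu_hat, X.var())
print("empirical P(X >= (1+c)mu):", (X >= (1 + c) * mu_hat).mean())
```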
11: Chernoff Bound on Poisson-Binomial Upper Tail. This exercise continues the previous one and uses the same notation.
(a) Show that the mgf of $I_j$ is given by $M_{I_j}(t) = 1 + p_j(e^t - 1)$ for all $t$.
(b) Use (a) to derive an expression for $M_X(t)$, the mgf of $X$ evaluated at $t$.
(c) A useful exponential bound is that $e^x \ge 1 + x$ for all $x$. You don't have to show it but please look at the graphs of the two sides. Use the fact to show that $M_X(t) \le \exp\big(\mu(e^t - 1)\big)$ for all $t$. Notice that the right hand side is the mgf of a Poisson random variable that has the same mean as $X$.
(d) Use Chernoff’s method and the bound in (c) to show that
$$P(X \ge (1+c)\mu) ~ \le ~ \left( \frac{e^c}{(1+c)^{1+c}} \right)^\mu$$
Remember that $\mu = np$ when all the $p_j$'s are equal. If $g(c) = e^c/(1+c)^{1+c}$ is small then the bound above will decrease exponentially as $n$ gets large. That is the focus of the next exercise.
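It can be illuminating to see how fast the bound above shrinks. The sketch below, assuming numpy, simply evaluates it for a few illustrative values of c and μ.

```python
# Numerical sketch for Exercise 11(d): the bound above for several c and mu.
import numpy as np

for c in (0.1, 0.5, 1.0):
    g = np.exp(c) / (1 + c) ** (1 + c)        # the quantity g(c) defined above
    for mu in (10, 50, 100):
        print(f"c = {c}, mu = {mu}: bound = {g ** mu:.3e}")
```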
12: Simplified Chernoff Bounds on Poisson-Binomial Upper Tail. This exercise continues the previous one and uses the same notation.
The bound in the previous exercise is a bit complicated. Often, simpler versions are used because they are good enough even though they are weaker.
(a) It is not hard to show that $\log(1+c) \ge \frac{2c}{2+c}$ for $c > 0$. You don't have to show it but please look at the graphs of the two sides. Use the fact to show that $c - (1+c)\log(1+c) \le -\frac{c^2}{2+c}$ for $c > 0$.
(b) Show that if X has a Poisson-binomial distribution with mean μ then
$$P(X \ge (1+c)\mu) ~ \le ~ \exp\left( -\frac{c^2}{2+c} \, \mu \right), ~~~ c > 0$$
(c) A simpler but weaker version of the bound in (b) is also often used. Show that
$$P(X \ge (1+c)\mu) ~ \le ~ \exp\left( -\frac{c^2}{3} \, \mu \right), ~~~ c \in (0, 1)$$
13. A positive random variable $V$ has expectation $\mu$ and variance $\sigma^2$.
(a) For each v>0, the conditional distribution of X given V=v is Poisson (v). Find E(X) and Var(X) in terms of μ and σ.
(b) For each v>0, the conditional distribution of X given V=v is gamma (v,λ) for some fixed λ. Find E(X) and Var(X) in terms of μ and σ.
14. Let $X_1, X_2, \ldots, X_n$ be i.i.d. with expectation $\mu$ and variance $\sigma^2$. Let $S = \sum_{i=1}^n X_i$.
(a) Find the least squares predictor of S based on X1, and find the mean squared error (MSE) of the predictor.
(b) Find the least squares predictor of X1 based on S, and find the MSE of the predictor. Is the predictor a linear function of S? If so, it must also be the best among all linear predictors based on S, which is commonly known as the regression predictor.
15. A $p$-coin is tossed repeatedly. Let $W_H$ be the number of tosses till the first head appears, and $W_{HH}$ the number of tosses till two consecutive heads appear.
(a) Describe a random variable $X$ that depends only on the tosses after $W_H$ and satisfies $W_{HH} = W_H + X$.
(b) Use Part (a) to find $E(W_{HH})$ and $\text{Var}(W_{HH})$.
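Here is a simulation sketch for checking your answers to (b). It assumes numpy; the value of p and the number of repetitions are illustrative.

```python
# Simulation sketch for Exercise 15: estimate E(W_HH) and Var(W_HH) directly.
import numpy as np

rng = np.random.default_rng(0)
p, reps = 0.5, 50_000                     # illustrative choices

def tosses_until_hh(p, rng):
    tosses, run = 0, 0
    while run < 2:                        # stop the first time two heads appear in a row
        tosses += 1
        run = run + 1 if rng.random() < p else 0
    return tosses

waits = np.array([tosses_until_hh(p, rng) for _ in range(reps)])
print("estimated E(W_HH):", waits.mean(), "  estimated Var(W_HH):", waits.var())
```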
16. Let N be a non-negative integer valued random variable, and let X,X1,X2,… be i.i.d. and independent of N. As before, define the random sum S by
$$S ~ = ~ \begin{cases} 0 & \text{if } N = 0 \\ X_1 + X_2 + \cdots + X_n & \text{if } N = n > 0 \end{cases}$$
(a) Let $M$ be our usual notation for moment generating functions. By conditioning on $N$, show that
$$M_S(t) ~ = ~ M_N\big(\log M_X(t)\big)$$
assuming that all the quantities above are well defined. [The identity $(e^a)^n = e^{an}$ might be handy.]
(b) Let N have the geometric (p) distribution on 1,2,3,…. Find the mgf of N. This doesn’t use Part (a).
(c) Let X1,X2,… be i.i.d. exponential (λ) variables and let N be geometric as in Part (b). Use the results of Parts (a) and (b) to identify the distribution of S.
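To check your answer to (c), you can build the random sum directly and compare its empirical behavior with the distribution you identify. The sketch assumes numpy; p, λ, and the number of repetitions are illustrative.

```python
# Simulation sketch for Exercise 16(c): a geometric number of i.i.d. exponentials.
import numpy as np

rng = np.random.default_rng(0)
p, lam, reps = 0.25, 2.0, 50_000

N = rng.geometric(p, size=reps)                          # geometric on 1, 2, 3, ...
S = np.array([rng.exponential(1 / lam, size=n).sum() for n in N])
print("mean of S:", S.mean(), "  SD of S:", S.std())     # compare with your candidate distribution
```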
17. A random vector $Y = [Y_1 ~ Y_2 ~ \cdots ~ Y_n]^T$ has mean vector $\mu$ and covariance matrix $\sigma^2 I_n$ where $\sigma > 0$ is a number and $I_n$ is the $n \times n$ identity matrix.
(a) Pick one option and explain: Y1 and Y2 are
(i) independent. (ii) uncorrelated but might not be independent. (iii) not uncorrelated.
(b) Pick one option and explain: Var(Y1) and Var(Y2) are
(i) equal. (ii) possibly equal, but might not be. (iii) not equal.
(c) For $m \le n$ let $A$ be an $m \times n$ matrix of real numbers, and let $b$ be an $m \times 1$ vector of real numbers. Let $V = AY + b$. Find the mean vector $\mu_V$ and covariance matrix $\Sigma_V$ of $V$.
(d) Let $c$ be an $m \times 1$ vector of real numbers and let $W = c^T V$ for $V$ defined in Part (c). In terms of $c$, $\mu_V$, and $\Sigma_V$, find $E(W)$ and $\text{Var}(W)$.
18. Let $[X_1 ~ X_2 ~ X_3]^T$ be multivariate normal with mean vector $\mu$ and covariance matrix $\Sigma$ given by
$$\mu ~ = ~ \begin{bmatrix} \mu \\ \mu \\ \mu \end{bmatrix} ~~~~~~ \Sigma ~ = ~ \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \sigma_{13} \\ \sigma_{21} & \sigma_2^2 & \sigma_{23} \\ \sigma_{31} & \sigma_{32} & \sigma_3^2 \end{bmatrix}$$
Find $P\big((X_1 + X_2)/2 < X_3 + 1\big)$.
19. Let X be standard normal. Construct a random variable Y as follows:
- Toss a fair coin.
- If the coin lands heads, let Y=X.
- If the coin lands tails, let Y=−X.
(a) Find the cdf of Y.
(b) Find E(XY) by conditioning on the result of the toss.
(c) Are X and Y uncorrelated?
(d) Are X and Y independent?
(e) Is the joint distribution of X and Y bivariate normal?
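A simulation of the construction can guide your answers, especially to (d) and (e). The sketch assumes numpy; the number of repetitions is arbitrary.

```python
# Simulation sketch for Exercise 19: build Y from X and a fair coin, then look
# at the empirical correlation and at the behavior of X + Y.
import numpy as np

rng = np.random.default_rng(0)
reps = 100_000

X = rng.standard_normal(reps)
signs = rng.choice([1, -1], size=reps)          # the fair coin: heads keeps X, tails negates it
Y = signs * X

print("empirical correlation of X and Y:", np.corrcoef(X, Y)[0, 1])
print("empirical P(X + Y = 0):", (X + Y == 0).mean())    # a clue for part (e)
```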
20. Let X and Y be standard bivariate normal with correlation ρ. Find E(max(X,Y)). The easiest way is to use the fact that for any two numbers a and b, max(a,b)=(a+b+|a−b|)/2. Check the fact first, and then use it.
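A quick Monte Carlo check of your formula, assuming numpy; the value of ρ used below is an illustrative choice.

```python
# Monte Carlo sketch for Exercise 20: estimate E(max(X, Y)) for one value of rho.
import numpy as np

rng = np.random.default_rng(0)
rho, reps = 0.6, 1_000_000

cov = [[1, rho], [rho, 1]]
X, Y = rng.multivariate_normal([0, 0], cov, size=reps).T
print("estimated E(max(X, Y)):", np.maximum(X, Y).mean())   # compare with your formula at rho = 0.6
```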
21. Suppose that $X$ is normal $(\mu_X, \sigma_X^2)$, $Y$ is normal $(\mu_Y, \sigma_Y^2)$, and the two random variables are independent. Let $S = X + Y$.
(a) Find the conditional distribution of X given S=s.
(b) Find the least squares predictor of X based on S and provide its mean squared error.
(c) Find the least squares linear predictor of X based on S and provide its mean squared error.
22. Let $X$ be a $p \times 1$ random vector and suppose we are trying to predict a random variable $Y$ by a linear function of $X$. A section of the textbook identifies the least squares linear predictor by restricting the search to linear functions $h(X)$ for which $E(h(X)) = \mu_Y$. Show that this is a legitimate move.
Specifically, let $\hat{Y}_1 = c^T X + d$ be a linear predictor such that $E(\hat{Y}_1) \ne \mu_Y$. Find a non-zero constant $k$ such that $\hat{Y}_2 = \hat{Y}_1 + k$ satisfies $E(\hat{Y}_2) = \mu_Y$. Then show that $MSE(\hat{Y}_1) \ge MSE(\hat{Y}_2)$. This will show that the least squares linear predictor has to have the same mean as $Y$.
23. Let $U_1, U_2, U_3, \ldots$ be i.i.d. uniform on $(0, 1)$. For $n \ge 1$, let $S_n = \sum_{i=1}^n U_i$.
Let $f_{S_n}$ be the density of $S_n$. The formula for $f_{S_n}$ is piecewise polynomial on the possible values $(0, n)$. In this problem we will just focus on the density on the interval $(0, 1)$ and discover a nice consequence.
(a) For $0 < x < 1$, find $f_{S_2}(x)$.
(b) Use Part (a) and the convolution formula to find $f_{S_3}(x)$ for $0 < x < 1$.
(c) Guess a formula for $f_{S_n}(x)$ for $0 < x < 1$ and prove it by induction.
(d) Find $P(S_n < 1)$.
(e) Let $N = \min\{n : S_n > 1\}$. Use Part (d) to find $E(N)$.
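Part (e) has a pleasant answer, and a simulation (assuming numpy) is a good way to check it; the number of repetitions is arbitrary.

```python
# Simulation sketch for Exercise 23(e): estimate E(N), where N is the number of
# i.i.d. uniform (0, 1) variables needed for the running sum to exceed 1.
import numpy as np

rng = np.random.default_rng(0)
reps = 100_000

def first_n_with_sum_above_one(rng):
    total, n = 0.0, 0
    while total <= 1:
        total += rng.uniform()
        n += 1
    return n

print("estimated E(N):", np.mean([first_n_with_sum_above_one(rng) for _ in range(reps)]))
```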
24: Normal Sample Mean and Sample Variance, Part 1. Let $X_1, X_2, \ldots, X_n$ be i.i.d. with mean $\mu$ and variance $\sigma^2$. Let
$$\bar{X} ~ = ~ \frac{1}{n} \sum_{i=1}^n X_i$$
denote the sample mean and
$$S^2 ~ = ~ \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X})^2$$
denote the sample variance as defined earlier in the course.
(a) For $1 \le i \le n$ let $D_i = X_i - \bar{X}$. Find $\text{Cov}(D_i, \bar{X})$.
(b) Now assume in addition that $X_1, X_2, \ldots, X_n$ are i.i.d. normal $(\mu, \sigma^2)$. What is the joint distribution of $\bar{X}, D_1, D_2, \ldots, D_{n-1}$? Explain why $D_n$ isn't on the list.
(c) True or false (justify your answer): The sample mean and sample variance of an i.i.d. normal sample are independent of each other.
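If you want to see Part (a) in action, a simulation like the one below can help. It assumes numpy and uses normal samples with illustrative parameters, though the covariance in (a) does not require normality.

```python
# Simulation sketch for Exercise 24: estimate Cov(D_1, X-bar) from repeated samples.
import numpy as np

rng = np.random.default_rng(0)
n, reps = 10, 100_000                              # illustrative choices

samples = rng.normal(0, 1, size=(reps, n))         # each row is one i.i.d. sample
xbar = samples.mean(axis=1)
d1 = samples[:, 0] - xbar
print("estimated Cov(D_1, X-bar):", np.cov(d1, xbar)[0, 1])   # compare with your answer to (a)
```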
25: Normal Sample Mean and Sample Variance, Part 2
(a) Let R have the chi-squared distribution with n degrees of freedom. What is the mgf of R?
(b) For R as in Part (a), suppose R=V+W where V and W are independent and V has the chi-squared distribution with m<n degrees of freedom. Can you identify the distribution of W? Justify your answer.
(c) Let $X_1, X_2, \ldots, X_n$ be any sequence of random variables and let $\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i$. Let $\alpha$ be any constant. Prove the sum of squares decomposition
$$\sum_{i=1}^n (X_i - \alpha)^2 ~ = ~ \sum_{i=1}^n (X_i - \bar{X})^2 ~ + ~ n(\bar{X} - \alpha)^2$$
(d) Now let $X_1, X_2, \ldots, X_n$ be i.i.d. normal with mean $\mu$ and variance $\sigma^2 > 0$. Let $S^2$ be the “sample variance” defined by
$$S^2 ~ = ~ \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X})^2$$
Find a constant $c$ such that $cS^2$ has a chi-squared distribution. Provide the degrees of freedom.
[Use Parts (b) and (c) as well as the result of the previous exercise.]
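The algebraic identity in Part (c) holds for any list of numbers and any α, so a quick numerical check (assuming numpy) can catch slips before you start the proof. The data and α below are arbitrary.

```python
# Numerical check sketch for the decomposition in Exercise 25(c).
import numpy as np

x = np.array([3.1, -0.4, 2.2, 5.0, 1.7])     # any numbers will do
alpha = 0.8                                  # any constant will do
xbar = x.mean()

lhs = ((x - alpha) ** 2).sum()
rhs = ((x - xbar) ** 2).sum() + len(x) * (xbar - alpha) ** 2
print(lhs, rhs)                              # the two sides should agree up to rounding
```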
26. The heights of a population of mother-daughter pairs have the bivariate normal distribution with equal means of 67 inches, equal SDs of 2 inches, and correlation 0.5.
(a) Of the mothers on the 90th percentile of heights, what proportion have daughters who are taller than the 90th percentile?
(b) In what proportion of mother-daughter pairs are both women taller than average?
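Both parts can be checked by simulating mother-daughter pairs from the given bivariate normal distribution. The sketch assumes numpy; the number of repetitions and the tolerance used to pick out mothers at the 90th percentile are illustrative choices.

```python
# Simulation sketch for Exercise 26: estimate both proportions from simulated pairs.
import numpy as np

rng = np.random.default_rng(0)
mean, sd, rho, reps = 67.0, 2.0, 0.5, 1_000_000

cov = [[sd**2, rho * sd**2], [rho * sd**2, sd**2]]
mothers, daughters = rng.multivariate_normal([mean, mean], cov, size=reps).T

q90 = mean + sd * 1.2816                               # approximate 90th percentile of heights
near_q90 = np.abs(mothers - q90) < 0.05                # mothers at (roughly) the 90th percentile
print("(a):", (daughters[near_q90] > q90).mean())
print("(b):", ((mothers > mean) & (daughters > mean)).mean())
```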