18.2. Sums of Independent Normal Variables

This section consists of examples based on one important fact:

The sum of independent normal variables is normal.

We will prove the fact in a later section using moment generating functions. For now, we will just run a quick simulation and then see how to use the fact in examples.

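from datascience import Table
from scipy import stats
import matplotlib.pyplot as plt

# Parameters of the two independent normal variables
mu_X = 10
sigma_X = 2
mu_Y = 15
sigma_Y = 3
# Draw 10,000 independent copies of X and of Y, then add them pairwise
x = stats.norm.rvs(mu_X, sigma_X, size=10000)
y = stats.norm.rvs(mu_Y, sigma_Y, size=10000)
s = x+y

# Histogram of the simulated sums
Table().with_column('S = X+Y', s).hist(bins=20)
plt.title('$X$ is normal (10, $2^2$); $Y$ is normal (15, $3^2$) independent of $X$');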
[Figure: histogram of the 10,000 simulated values of S = X+Y]

The simulation above generates 10,000 copies of $X+Y$ where $X$ has the normal distribution with mean 10 and SD 2, and $Y$ is independent of $X$ and has the normal distribution with mean 15 and SD 3. The histogram of the sum is clearly bell shaped, consistent with a normal distribution. You can vary the parameters and check that the distribution of the sum has the same shape, though with different labels on the axes.

To identify which normal, you have to find the mean and variance of the sum. Just use properties of the mean and variance:

If $X$ has the normal $(\mu_X, \sigma_X^2)$ distribution, and $Y$ independent of $X$ has the normal $(\mu_Y, \sigma_Y^2)$ distribution, then the distribution of $X+Y$ is normal with mean $\mu_X + \mu_Y$ and variance $\sigma_X^2 + \sigma_Y^2$.

This means that we don’t need the joint density of X and Y to find probabilities of events determined by X+Y.
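As a rough check of the rule above, the following sketch compares the theoretical mean and variance of $X+Y$ with simulated values, and then compares one tail probability. The seed, the number of repetitions, and the threshold of 30 are arbitrary choices made for illustration; they are not part of the original example.

```python
import numpy as np
from scipy import stats

mu_X, sigma_X = 10, 2
mu_Y, sigma_Y = 15, 3

# Parameters of S = X + Y: means add, and variances add because of independence
mu_S = mu_X + mu_Y
sigma_S = (sigma_X**2 + sigma_Y**2) ** 0.5

# Simulate independent copies of X and Y (seed fixed only for reproducibility)
rng = np.random.default_rng(0)
x = rng.normal(mu_X, sigma_X, size=100_000)
y = rng.normal(mu_Y, sigma_Y, size=100_000)
s = x + y

print('theoretical mean and variance:', mu_S, sigma_S**2)
print('simulated mean and variance:  ', np.mean(s), np.var(s))

# P(S > 30) from the normal distribution of S, with no joint density needed
threshold = 30  # arbitrary illustrative value
exact = 1 - stats.norm.cdf(threshold, mu_S, sigma_S)
empirical = np.mean(s > threshold)
print('exact:', exact, ' empirical:', empirical)
```

The simulated values should be close to the theoretical ones but will not match exactly, since they are subject to sampling variability.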

18.2.1. Sums of IID Normal Variables

Let $X_1, X_2, \ldots, X_n$ be i.i.d. normal with mean $\mu$ and variance $\sigma^2$. Let $S_n = X_1 + X_2 + \cdots + X_n$. Then the distribution of $S_n$ is normal with mean $n\mu$ and variance $n\sigma^2$.

This looks rather like the Central Limit Theorem but notice that there is no assumption that n is large, and no approximation.

If the underlying distribution is normal, then the distribution of the i.i.d. sample sum is normal regardless of the sample size.
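To illustrate that no large-sample approximation is involved, here is a minimal sketch with a deliberately small sample size. The values $n = 5$, $\mu = 10$, $\sigma = 2$, the cutoffs, and the seed are all hypothetical choices for illustration.

```python
import numpy as np
from scipy import stats

n, mu, sigma = 5, 10, 2   # hypothetical values; note that n is small
reps = 100_000
rng = np.random.default_rng(1)

# Each row is one i.i.d. normal sample of size n; sum across each row to get S_n
samples = rng.normal(mu, sigma, size=(reps, n))
s_n = samples.sum(axis=1)

# Parameters of S_n according to the result above
mean_S = n * mu
sd_S = (n ** 0.5) * sigma

# Compare P(S_n <= c) under the normal (n*mu, n*sigma^2) distribution
# with the simulated proportions, at a few arbitrary cutoffs
cutoffs = np.array([45, 50, 55])
exact = stats.norm.cdf(cutoffs, mean_S, sd_S)
empirical = np.array([np.mean(s_n <= c) for c in cutoffs])
print('exact:    ', exact)
print('empirical:', empirical)
```

Agreement at a handful of cutoffs is only suggestive, not a proof; the proof is the one promised in the later section on moment generating functions.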

18.2.2. The Difference of Two Independent Normal Variables

If $Y$ is normal, then so is $-Y$. So if $X$ and $Y$ are independent normal variables then $X-Y$ is normal with mean $\mu_X - \mu_Y$ and variance given by

$$ Var(X-Y) = Var(X) + Var(-Y) = \sigma_X^2 + (-1)^2\sigma_Y^2 = \sigma_X^2 + \sigma_Y^2 $$

For example, let the heights of Persons A and B be $H_A$ and $H_B$ respectively, and suppose $H_A$ and $H_B$ are i.i.d. normal with mean 66 inches and SD 3 inches. Then the chance that Person A is more than 2 inches taller than Person B is

$$ P(H_A > H_B + 2) = P(H_A - H_B > 2) = 1 - \Phi\Big(\frac{2 - 0}{\sqrt{18}}\Big) $$

because $H_A - H_B$ is normal with mean 0 and SD $\sqrt{3^2 + 3^2} = \sqrt{18} \approx 4.24$ inches.

# H_A - H_B is normal with mean 0 and SD sqrt(18)
mu = 0
sigma = 18**0.5
# Chance that the difference exceeds 2 inches
1 - stats.norm.cdf(2, mu, sigma)
0.31867594411696853
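The height calculation can be checked by simulation. The sketch below draws the two heights independently and counts how often Person A is more than 2 inches taller; the seed and the number of repetitions are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
reps = 200_000

# Independent heights, each normal with mean 66 inches and SD 3 inches
h_A = rng.normal(66, 3, size=reps)
h_B = rng.normal(66, 3, size=reps)

# Proportion of simulated pairs in which A is more than 2 inches taller than B
empirical = np.mean(h_A > h_B + 2)

# Exact chance, using the normal (0, 18) distribution of H_A - H_B
exact = 1 - stats.norm.cdf(2, 0, 18**0.5)
print('exact:', exact, ' empirical:', empirical)
```

The empirical proportion should be close to the exact chance, up to simulation error.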

18.2.3. Comparing Two Sample Proportions

A candidate is up for election. In State 1, 50% of the voters favor the candidate. In State 2, only 27% of the voters favor the candidate. A simple random sample of 1000 voters is taken in each state. You can assume that the samples are independent of each other and that there are millions of voters in each state.

Question. Approximately what is the chance that in the sample from State 1, the proportion of voters who favor the candidate is more than twice as large as the proportion in the State 2 sample?

Answer. For $i = 1, 2$, let $X_i$ be the proportion of voters who favor the candidate in the sample from State $i$. We want the approximate value of $P(X_1 > 2X_2)$. By the Central Limit Theorem, both $X_1$ and $X_2$ are approximately normal. Since the samples are independent, $X_1 - 2X_2$ is also approximately normal.

Now it’s just a matter of figuring out the mean and the SD.

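$$ E(X_1 - 2X_2) = 0.5 - 2 \times 0.27 = -0.04 $$
$$ Var(X_1) = \frac{0.5 \times 0.5}{1000} = 0.00025, \qquad Var(X_2) = \frac{0.27 \times 0.73}{1000} = 0.000197 $$
$$ Var(X_1 - 2X_2) = Var(X_1) + 4Var(X_2) = 0.00104, \qquad SD(X_1 - 2X_2) = 0.03222 $$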

So

$$ P(X_1 > 2X_2) = P(X_1 - 2X_2 > 0) \approx 1 - \Phi\Big(\frac{0 - (-0.04)}{0.03222}\Big) \approx 10.7\% $$
# Mean and SD of X1 - 2*X2
mu = 0.5 - 2*0.27
var = (0.5*0.5/1000) + 4*(0.27*0.73/1000)
sigma = var**0.5
# Chance that X1 - 2*X2 is positive, by the normal approximation
1 - stats.norm.cdf(0, mu, sigma)
0.1072469993885582
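One way to see how good the normal approximation is here is to simulate the two samples directly. The sketch below assumes that, because each state has millions of voters, the number of favorable voters in a simple random sample of 1000 can be treated as binomial. That modeling shortcut, the seed, and the number of repetitions are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
reps = 200_000
n = 1000

# Counts of voters favoring the candidate in each sample
# (binomial model: an assumption justified by the very large populations)
count_1 = rng.binomial(n, 0.5, size=reps)
count_2 = rng.binomial(n, 0.27, size=reps)

# Both samples have the same size, so comparing counts is
# equivalent to comparing the proportions X1 and 2*X2
empirical = np.mean(count_1 > 2 * count_2)

# Normal approximation from the calculation above
mu = 0.5 - 2 * 0.27
sigma = ((0.5 * 0.5 / n) + 4 * (0.27 * 0.73 / n)) ** 0.5
normal_approx = 1 - stats.norm.cdf(0, mu, sigma)
print('normal approximation:', normal_approx, ' empirical:', empirical)
```

The two numbers should be close but need not agree to many decimal places: the sample proportions are discrete, and the calculation above uses the normal curve without a continuity correction.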

Quick Check

$X$ is normal $(20, 5^2)$ and $Y$ is normal $(75, 10^2)$. Assume $X$ and $Y$ are independent.

Sketch the density of $Y - 2X$. Mark the numerical values of the expectation and SD appropriately on your sketch. You can use the code cell below to calculate them.
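A possible code cell for the calculation is sketched here. The helper `linear_combination_params` is a hypothetical name introduced for this illustration; it simply applies the rules for the mean and variance of a linear combination of independent variables.

```python
def linear_combination_params(a, mu_X, sigma_X, b, mu_Y, sigma_Y):
    """Mean and SD of a*X + b*Y for independent X and Y (hypothetical helper)."""
    mean = a * mu_X + b * mu_Y
    # Variances add under independence; each coefficient enters squared
    variance = (a ** 2) * sigma_X ** 2 + (b ** 2) * sigma_Y ** 2
    return mean, variance ** 0.5

# Y - 2X has coefficient a = -2 on X and coefficient b = 1 on Y
mean_D, sd_D = linear_combination_params(-2, 20, 5, 1, 75, 10)
print('expectation:', mean_D)
print('SD:', sd_D)
```

Because $Y - 2X$ is a linear combination of independent normal variables, its density is the normal curve centered at the expectation, with points of inflection one SD on either side.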