Birth Weight | Gestational Days | Maternal Age | Maternal Height | Maternal Pregnancy Weight | Maternal Smoker | \n", "
---|---|---|---|---|---|
120 | 284 | 27 | 62 | 100 | False | \n", "
113 | 282 | 33 | 64 | 135 | False | \n", "
128 | 279 | 28 | 64 | 115 | True | \n", "
108 | 282 | 23 | 67 | 125 | True | \n", "
136 | 286 | 25 | 62 | 93 | False | \n", "
138 | 244 | 33 | 62 | 178 | False | \n", "
132 | 245 | 23 | 65 | 140 | False | \n", "
120 | 289 | 25 | 62 | 125 | False | \n", "
143 | 299 | 30 | 66 | 136 | True | \n", "
140 | 351 | 27 | 68 | 120 | False | \n", "
... (1164 rows omitted)
" ], "text/plain": [ "Birth Weight | Gestational Days | Maternal Age | Maternal Height | Maternal Pregnancy Weight | Maternal Smoker\n", "120 | 284 | 27 | 62 | 100 | False\n", "113 | 282 | 33 | 64 | 135 | False\n", "128 | 279 | 28 | 64 | 115 | True\n", "108 | 282 | 23 | 67 | 125 | True\n", "136 | 286 | 25 | 62 | 93 | False\n", "138 | 244 | 33 | 62 | 178 | False\n", "132 | 245 | 23 | 65 | 140 | False\n", "120 | 289 | 25 | 62 | 125 | False\n", "143 | 299 | 30 | 66 | 136 | True\n", "140 | 351 | 27 | 68 | 120 | False\n", "... (1164 rows omitted)" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "baby" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The third column consists of the ages of the mothers. Let's construct an approximate 95% confidence interval for the mean age of mothers in the population. We did this in Data 8 using the bootstrap, so we will be able to compare results.\n", "\n", "We can apply the methods of this section because our data come from a large random sample." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "27.228279386712096" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ages = baby.column('Maternal Age')\n", "\n", "samp_mean = np.mean(ages)\n", "samp_mean" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1174" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "n = baby.num_rows\n", "n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The observed value of $\\bar{X}_n$ in the sample is 27.23 years. We know that $n = 1174$, so all we need is the population SD $\\sigma$ and then we can complete our calculation.\n", "\n", "But of course we don't know the population SD $\\sigma$. 
We only have a sample.\n", "\n", "As data scientists, we are used to lifting ourselves by our own bootstraps. Notice that the SD of the sample mean is $\\sigma/\\sqrt{n}$. If we estimate $\\sigma$ by the SD of the data, there will be some error in the estimate, but that error gets divided by $\\sqrt{n}$, so with a sample as large as ours it won't have much effect.\n", "\n", "That means we can use \"sample SD divided by $\\sqrt{n}$\" as an estimate of $\\sigma/\\sqrt{n}$.\n", "\n", "The sample SD, our estimate of $\\sigma$, is about 5.82 years." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5.815360404190897" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sigma_estimate = np.std(ages)\n", "sigma_estimate" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An approximate 95% confidence interval for the mean age of mothers in the population is $(26.89, 27.57)$ years." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([26.89562086, 27.56093791])" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sd_sample_mean = sigma_estimate/(n ** 0.5)\n", "\n", "ci_95_pop_mean = samp_mean + 1.96 * make_array(-1, 1) * sd_sample_mean\n", "ci_95_pop_mean" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "No bootstrapping required!" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "tags": [ "remove-input", "hide-output" ] }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "