{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": { "tags": [ "remove_cell" ] }, "outputs": [], "source": [ "# HIDDEN\n", "from datascience import *\n", "from prob140 import *\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "plt.style.use('fivethirtyeight')\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Examples ##" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This section is a workout in finding probabilities of events determined by two jointly distributed random variables. We will apply the computational methods of the previous section and also the mathematical framework for finding probabilities that was developed there.\n", "\n", "In some of the examples you may find yourself wondering why we are bothering to write out math notation for numerical answers that we have already obtained using Python. It is because the Python visualizations help us understand the math. That understanding then helps us answer questions in generality, not just in particular numerical settings, as you will see in the final example." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Fair and Loaded Dice ###\n", "I have two dice, one of which is fair. The other die is shaped so that it is biased towards larger numbers of spots. The distribution of the number of spots on one roll of this biased shape is given by\n", "\n", "|value| 1 | 2 | 3 | 4 | 5 | 6 |\n", "|----:|:---:|:---:|:---:|:---:|:---:|:---:|\n", "|**probability**| 1/16 | 1/16 | 3/16 | 3/16 | 4/16 | 4/16 |\n", "\n", "Suppose I roll each die once." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Equality ###\n", "\n", "**Question.** What is the chance that I get the same number on both dice?\n", "\n", "**Answer 1, by symmetry.** No matter what number appears on the biased shape, the chance that the fair die shows that number is 1/6. So the answer is 1/6.\n", "\n", "Not convinced? 
Then let's calculate.\n", "\n", "**Answer 2.** Let $F$ be the number on the fair die and let $S$ be the number on the biased shape. It is reasonable to assume that the outcome of one die doesn't affect chances for the other. So for every $i$ and $j$ such that $1 \\le i, j \\le 6$, we have\n", "\n", "$$\n", "P(F = i, S = j) ~ = ~ P(F = i)P(S = j) ~ = ~ \\frac{1}{6}P(S = j)\n", "$$\n", "\n", "We want $P(F = S)$. For this we have to add up the probabilities of the $(i, j)$ pairs that satisfy $i = j$. Those are just the pairs $(i, i)$ for $1 \\le i \\le 6$.\n", "\n", "$$\n", "P(F = S) ~ = ~ \\sum_{i=1}^6 P(F = i, S = i) ~ = ~ \\sum_{i=1}^6 \\frac{1}{6}P(S = i) ~ = ~ \\frac{1}{6} \\sum_{i=1}^6 P(S = i) ~ = ~ \\frac{1}{6} \\cdot 1 ~ = ~ \\frac{1}{6}\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Difference ###\n", "\n", "**Question.** What is the chance that the number on the biased shape exceeds the number on the fair die by more than 2?\n", "\n", "**Answer.** We can visualize the event $\\{ S > F + 2 \\}$ using the methods of the previous section.\n", "\n", "We know that the joint distribution of $F$ and $S$ is given by \n", "\n", "$$\n", "P(F = i, S = j) ~ = ~\n", "\\begin{cases} \n", "\\frac{1}{6} \\cdot \\frac{1}{16}, ~~~ 1 \\le i \\le 6, ~ j = 1, 2 \\\\\n", "\\frac{1}{6} \\cdot \\frac{3}{16}, ~~~ 1 \\le i \\le 6, ~ j = 3, 4 \\\\\n", "\\frac{1}{6} \\cdot \\frac{4}{16}, ~~~ 1 \\le i \\le 6, ~ j = 5, 6\n", "\\end{cases}\n", "$$\n", "\n", "We can display this in a joint distribution table. " ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table>\n",
"  <thead>\n",
"    <tr><th></th><th>F=1</th><th>F=2</th><th>F=3</th><th>F=4</th><th>F=5</th><th>F=6</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>S=6</th><td>0.041667</td><td>0.041667</td><td>0.041667</td><td>0.041667</td><td>0.041667</td><td>0.041667</td></tr>\n",
"    <tr><th>S=5</th><td>0.041667</td><td>0.041667</td><td>0.041667</td><td>0.041667</td><td>0.041667</td><td>0.041667</td></tr>\n",
"    <tr><th>S=4</th><td>0.031250</td><td>0.031250</td><td>0.031250</td><td>0.031250</td><td>0.031250</td><td>0.031250</td></tr>\n",
"    <tr><th>S=3</th><td>0.031250</td><td>0.031250</td><td>0.031250</td><td>0.031250</td><td>0.031250</td><td>0.031250</td></tr>\n",
"    <tr><th>S=2</th><td>0.010417</td><td>0.010417</td><td>0.010417</td><td>0.010417</td><td>0.010417</td><td>0.010417</td></tr>\n",
"    <tr><th>S=1</th><td>0.010417</td><td>0.010417</td><td>0.010417</td><td>0.010417</td><td>0.010417</td><td>0.010417</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " F=1 F=2 F=3 F=4 F=5 F=6\n", "S=6 0.041667 0.041667 0.041667 0.041667 0.041667 0.041667\n", "S=5 0.041667 0.041667 0.041667 0.041667 0.041667 0.041667\n", "S=4 0.031250 0.031250 0.031250 0.031250 0.031250 0.031250\n", "S=3 0.031250 0.031250 0.031250 0.031250 0.031250 0.031250\n", "S=2 0.010417 0.010417 0.010417 0.010417 0.010417 0.010417\n", "S=1 0.010417 0.010417 0.010417 0.010417 0.010417 0.010417" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "spots = np.arange(1, 7) # possible values of F; same set for S\n", "\n", "fair = (1/6) * np.ones(6) \n", "biased = make_array(1/16, 1/16, 3/16, 3/16, 4/16, 4/16)\n", "\n", "def joint_probability(i, j): # returns P(F = i, S = j)\n", " return fair.item(i-1) * biased.item(j-1)\n", "\n", "# joint distribution table of F and S\n", "two_dice = Table().values('F', spots, 'S', spots).probability_function(joint_probability)\n", "two_dice" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Define the indicator function of the event $\\{S > F + 2 \\}$ and then use the event method." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "P(Event) = 0.2395833333333333\n" ] }, { "data": { "text/html": [ "
<div>\n",
"<table>\n",
"  <thead>\n",
"    <tr><th></th><th>F=1</th><th>F=2</th><th>F=3</th><th>F=4</th><th>F=5</th><th>F=6</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>S=6</th><td>0.0416667</td><td>0.0416667</td><td>0.0416667</td><td></td><td></td><td></td></tr>\n",
"    <tr><th>S=5</th><td>0.0416667</td><td>0.0416667</td><td></td><td></td><td></td><td></td></tr>\n",
"    <tr><th>S=4</th><td>0.03125</td><td></td><td></td><td></td><td></td><td></td></tr>\n",
"    <tr><th>S=3</th><td></td><td></td><td></td><td></td><td></td><td></td></tr>\n",
"    <tr><th>S=2</th><td></td><td></td><td></td><td></td><td></td><td></td></tr>\n",
"    <tr><th>S=1</th><td></td><td></td><td></td><td></td><td></td><td></td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " F=1 F=2 F=3 F=4 F=5 F=6\n", "S=6 0.0416667 0.0416667 0.0416667 \n", "S=5 0.0416667 0.0416667 \n", "S=4 0.03125 \n", "S=3 \n", "S=2 \n", "S=1 " ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def indicator(i, j):\n", " return j > i + 2\n", "\n", "two_dice.event(indicator, 'F', 'S')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The answer $P(S > F + 2) \\approx 0.24$ has been obtained by adding the chances of all the cells $(i, j)$ for which $j > i+2$ or equivalently $i < j-2$. Since $j$ can be at most 6, this implies $1 \\le i \\le 3$, as is visible in the display.\n", "\n", "Expressed in math notation, the calculation is\n", "\n", "$$\n", "P(S > F + 2) ~ = ~ \\mathop{\\sum \\sum}_{j > i+2} P(F = i, S= j)\n", "~ = ~ \\sum_{i=1}^3\\sum_{j=i+3}^6 P(F = i, S= j)\n", "$$\n", "\n", "For each fixed value of $i$, the inner sum is the sum of the terms visible in the column labeled $F = i$. The outer sum adds up the three column sums." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Absolute Difference ###\n", "\n", "**Question.** What is the chance that the numbers on the two dice differ by no more than 1?\n", "\n", "**Answer.** The goal is to find $P(\\vert F - S \\vert \\le 1)$. We defined two_dice, the joint distribution of $F$ and $S$, in the previous problem. So now our work is simple." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "P(Event) = 0.44791666666666674\n" ] }, { "data": { "text/html": [ "
<div>\n",
"<table>\n",
"  <thead>\n",
"    <tr><th></th><th>F=1</th><th>F=2</th><th>F=3</th><th>F=4</th><th>F=5</th><th>F=6</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>S=6</th><td></td><td></td><td></td><td></td><td>0.0416667</td><td>0.0416667</td></tr>\n",
"    <tr><th>S=5</th><td></td><td></td><td></td><td>0.0416667</td><td>0.0416667</td><td>0.0416667</td></tr>\n",
"    <tr><th>S=4</th><td></td><td></td><td>0.03125</td><td>0.03125</td><td>0.03125</td><td></td></tr>\n",
"    <tr><th>S=3</th><td></td><td>0.03125</td><td>0.03125</td><td>0.03125</td><td></td><td></td></tr>\n",
"    <tr><th>S=2</th><td>0.0104167</td><td>0.0104167</td><td>0.0104167</td><td></td><td></td><td></td></tr>\n",
"    <tr><th>S=1</th><td>0.0104167</td><td>0.0104167</td><td></td><td></td><td></td><td></td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " F=1 F=2 F=3 F=4 F=5 F=6\n", "S=6 0.0416667 0.0416667\n", "S=5 0.0416667 0.0416667 0.0416667\n", "S=4 0.03125 0.03125 0.03125 \n", "S=3 0.03125 0.03125 0.03125 \n", "S=2 0.0104167 0.0104167 0.0104167 \n", "S=1 0.0104167 0.0104167 " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def indicator_absdiff_atmost_1(i, j):\n", "    return abs(i - j) <= 1\n", "\n", "two_dice.event(indicator_absdiff_atmost_1, 'F', 'S')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The calculation shows that $P(\\vert F - S \\vert \\le 1) \\approx 0.448$.\n", "\n", "The event is a diagonal band of cells, bounded by the lines $j = i-1$ and $j = i+1$. That is because the condition $\\vert i - j \\vert \\le 1$ is the same as $i-1 \\le j \\le i+1$.\n", "\n", "Notice the edge cases $i=1$ and $i=6$. When $i=1$, the condition $\\vert i - j \\vert \\le 1$ is only satisfied by $j=1$ and $j=2$, because $j = 0$ is not a possible outcome of the dice. Nor is $j = 7$ when $i = 6$. So there are two terms to add in each of the columns labeled $F=1$ and $F=6$, and three in each of the other columns.\n", "\n", "Check carefully that you agree that in math notation the calculation is\n", "\n", "$$\n", "P(\\vert F - S \\vert \\le 1) ~ = ~ \\sum_{j=1}^2 P(F = 1, S = j) ~ + ~ \\sum_{i=2}^5\\sum_{j=i-1}^{i+1} P(F = i, S = j) ~ + ~ \\sum_{j=5}^6 P(F = 6, S = j)\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Sum ###\n", "\n", "**Question.** What is the chance that the sum of the numbers on the two dice is 7?\n", "\n", "**Answer.** This time we'll write out the math first.\n", "\n", "The event $\\{ F + S = 7 \\}$ consists of all possible pairs $(i, j)$ such that $i + j = 7$. For each fixed $i$, there is exactly one $j$ that satisfies $i+j = 7$, and that's $j = 7-i$. 
So\n", "\n", "$$\n", "P(F+S = 7) ~ = ~ \\sum_{i=1}^6 P(F = i, S = 7-i) ~ = ~ \\sum_{i=1}^6 \\frac{1}{6}P(S = 7-i)\n", "~ = ~ \\frac{1}{6} \\sum_{i=1}^6 P(S = 7-i) ~ = ~ \\frac{1}{6}\n", "$$\n", "\n", "because $\\sum_{i=1}^6 P(S = 7-i) = P(S=6) + P(S=5) + \\cdots + P(S=1) = 1$.\n", "\n", "Notice that the argument doesn't depend on the nature of the bias in $S$. The chance that the sum of the numbers on two dice equals 7 is 1/6 as long as one of the dice is fair.\n", "\n", "We can check the answer by computation." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "P(Event) = 0.16666666666666663\n" ] }, { "data": { "text/html": [ "
<div>\n",
"<table>\n",
"  <thead>\n",
"    <tr><th></th><th>F=1</th><th>F=2</th><th>F=3</th><th>F=4</th><th>F=5</th><th>F=6</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>S=6</th><td>0.0416667</td><td></td><td></td><td></td><td></td><td></td></tr>\n",
"    <tr><th>S=5</th><td></td><td>0.0416667</td><td></td><td></td><td></td><td></td></tr>\n",
"    <tr><th>S=4</th><td></td><td></td><td>0.03125</td><td></td><td></td><td></td></tr>\n",
"    <tr><th>S=3</th><td></td><td></td><td></td><td>0.03125</td><td></td><td></td></tr>\n",
"    <tr><th>S=2</th><td></td><td></td><td></td><td></td><td>0.0104167</td><td></td></tr>\n",
"    <tr><th>S=1</th><td></td><td></td><td></td><td></td><td></td><td>0.0104167</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " F=1 F=2 F=3 F=4 F=5 F=6\n", "S=6 0.0416667 \n", "S=5 0.0416667 \n", "S=4 0.03125 \n", "S=3 0.03125 \n", "S=2 0.0104167 \n", "S=1 0.0104167" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def indicator_sum_7(i, j):\n", " return i + j == 7\n", "\n", "two_dice.event(indicator_sum_7, 'F', 'S')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Question.** Now suppose I have two $n$-sided dice, at least one of which is fair. I roll each of them once. What is the chance that the sum of the two numbers is $n+1$?\n", "\n", "**Answer.** We can't use our computational methods for this one because the model isn't numerical. But we know how to solve the problem mathematically.\n", "\n", "Let $F_n$ be the number on a fair die and $D_n$ the number on the other die.\n", "\n", "\n", "\\begin{align*}\n", "P(F_n + D_n = n+1) ~ &= ~ \\sum_{i=1}^n P(F_n = i, D_n = n+1-i) \\\\\n", "&= ~ \\sum_{i=1}^n \\frac{1}{n}P(D_n = n+1-i) \\\\\n", "&= ~ \\frac{1}{n} \\sum_{i=1}^n P(D_n = n+1-i) \\\\\n", "&= ~ \\frac{1}{n}\n", "\\end{align*}\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 2 }