{ "cells": [ { "cell_type": "code", "execution_count": 16, "metadata": { "tags": [ "remove_cell" ] }, "outputs": [], "source": [ "# HIDDEN\n", "from datascience import *\n", "from prob140 import *\n", "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "plt.style.use('fivethirtyeight')\n", "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Functions on an Outcome Space ##" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Random sampling can be thought of as repeated random trials, and therefore many outcome spaces consist of sequences. An outcome space representing two tosses of a coin is\n", "\n", "$$\n", "\\Omega = \\{ \\text{HH, HT, TH, TT} \\}\n", "$$\n", "\n", "If you were tossing 10 times, the outcome space would consist of the $2^{10}$ sequences of 10 elements where each element is H or T. The outcomes are a pain to list by hand, but computers are good at saving us that kind of pain." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Product Spaces ###\n", "The *product* of two sets $A$ and $B$ is the set of all pairs $(a, b)$ where $a \\in A$ and $b \\in B$. This concept is exactly what we need to describe spaces representing multiple trials.\n", "\n", "For example, the space representing the outcome of one toss of a coin is $\\Omega_1 = \\{ \\text{H, T}\\}$. The *product* of $\\Omega_1$ with itself is the set of pairs (H, H), (H, T), (T, H) and (T, T), which you will recognize as the outcomes of two tosses of a coin. The product of this new space and $\\Omega_1$ is the space representing three tosses. And so on.\n", "\n", "The Python module itertools contains a function product that constructs product spaces. Let's import it." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "from itertools import product" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To see how product works, we will start with the outcomes of one toss of a coin. We are creating an array using make_array but you could use any other way of creating an array or list." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "one_toss = make_array('H', 'T')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To use product, we have to specify the base space and the number of repetitions, and then covert the result to a list." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[('H', 'H'), ('H', 'T'), ('T', 'H'), ('T', 'T')]" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "two_tosses = list(product(one_toss, repeat=2))\n", "two_tosses" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For three tosses, just change the number of repetitions:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[('H', 'H', 'H'),\n", " ('H', 'H', 'T'),\n", " ('H', 'T', 'H'),\n", " ('H', 'T', 'T'),\n", " ('T', 'H', 'H'),\n", " ('T', 'H', 'T'),\n", " ('T', 'T', 'H'),\n", " ('T', 'T', 'T')]" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "three_tosses = list(product(one_toss, repeat=3))\n", "three_tosses" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A *probability space* is an outcome space accompanied by the probabilities of all the outcomes. If you assume all eight outcomes of three tosses are equally likely, the probabilities are all 1/8:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "three_toss_probs = (1/8)*np.ones(8)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The corresponding probability space:" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
omega P(omega)
['H' 'H' 'H'] 0.125
['H' 'H' 'T'] 0.125
['H' 'T' 'H'] 0.125
['H' 'T' 'T'] 0.125
['T' 'H' 'H'] 0.125
['T' 'H' 'T'] 0.125
['T' 'T' 'H'] 0.125
['T' 'T' 'T'] 0.125
" ], "text/plain": [ "omega | P(omega)\n", "['H' 'H' 'H'] | 0.125\n", "['H' 'H' 'T'] | 0.125\n", "['H' 'T' 'H'] | 0.125\n", "['H' 'T' 'T'] | 0.125\n", "['T' 'H' 'H'] | 0.125\n", "['T' 'H' 'T'] | 0.125\n", "['T' 'T' 'H'] | 0.125\n", "['T' 'T' 'T'] | 0.125" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "three_toss_space = Table().with_columns(\n", " 'omega', three_tosses,\n", " 'P(omega)', three_toss_probs\n", ")\n", "three_toss_space" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Product spaces get large very quickly. If you roll a die 5 times, there are almost 8,000 possible outcomes:" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "7776" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "6**5" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "But we have product so we can still list them all! Here is a probability space representing 5 rolls of a die." ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
omega P(omega)
[1 1 1 1 1] 0.000128601
[1 1 1 1 2] 0.000128601
[1 1 1 1 3] 0.000128601
[1 1 1 1 4] 0.000128601
[1 1 1 1 5] 0.000128601
[1 1 1 1 6] 0.000128601
[1 1 1 2 1] 0.000128601
[1 1 1 2 2] 0.000128601
[1 1 1 2 3] 0.000128601
[1 1 1 2 4] 0.000128601
\n", "

... (7766 rows omitted)

" ], "text/plain": [ "omega | P(omega)\n", "[1 1 1 1 1] | 0.000128601\n", "[1 1 1 1 2] | 0.000128601\n", "[1 1 1 1 3] | 0.000128601\n", "[1 1 1 1 4] | 0.000128601\n", "[1 1 1 1 5] | 0.000128601\n", "[1 1 1 1 6] | 0.000128601\n", "[1 1 1 2 1] | 0.000128601\n", "[1 1 1 2 2] | 0.000128601\n", "[1 1 1 2 3] | 0.000128601\n", "[1 1 1 2 4] | 0.000128601\n", "... (7766 rows omitted)" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "die = np.arange(1, 7, 1)\n", "\n", "five_rolls = list(product(die, repeat=5)) # All possible results of 5 rolls\n", "\n", "five_rolls_probs = (1/6**5)**np.ones(6**5) # Each result has chance 1/6**5\n", "\n", "five_rolls_space = Table().with_columns(\n", " 'omega', five_rolls,\n", " 'P(omega)', five_rolls_probs\n", ")\n", "\n", "five_rolls_space" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### A Function on the Outcome Space ###\n", "Suppose you roll a die five times and add up the number of spots you see. If that seems artificial, be patient for a moment and you'll soon see why it's interesting. \n", "\n", "The sum of the rolls is a numerical function on the outcome space $\\Omega$ of five rolls. The sum is thus a *random variable*. Let's call it $S$. Then, formally,\n", "$$\n", "S: \\Omega \\rightarrow \\{ 5, 6, \\ldots, 30 \\}\n", "$$\n", "The range of $S$ is the integers 5 through 30, because each die shows at least one spot and at most six spots. We can also use the equivalent notation\n", "\n", "$$\n", "\\Omega \\stackrel{S}{\\rightarrow} \\{ 5, 6, \\ldots, 30 \\}\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "From a computational perspective, the elements of $\\Omega$ are in the column omega of five_roll_space. Let's apply this function and create a larger table." ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
omega S(omega) P(omega)
[1 1 1 1 1] 5 0.000128601
[1 1 1 1 2] 6 0.000128601
[1 1 1 1 3] 7 0.000128601
[1 1 1 1 4] 8 0.000128601
[1 1 1 1 5] 9 0.000128601
[1 1 1 1 6] 10 0.000128601
[1 1 1 2 1] 6 0.000128601
[1 1 1 2 2] 7 0.000128601
[1 1 1 2 3] 8 0.000128601
[1 1 1 2 4] 9 0.000128601
\n", "

... (7766 rows omitted)

" ], "text/plain": [ "omega | S(omega) | P(omega)\n", "[1 1 1 1 1] | 5 | 0.000128601\n", "[1 1 1 1 2] | 6 | 0.000128601\n", "[1 1 1 1 3] | 7 | 0.000128601\n", "[1 1 1 1 4] | 8 | 0.000128601\n", "[1 1 1 1 5] | 9 | 0.000128601\n", "[1 1 1 1 6] | 10 | 0.000128601\n", "[1 1 1 2 1] | 6 | 0.000128601\n", "[1 1 1 2 2] | 7 | 0.000128601\n", "[1 1 1 2 3] | 8 | 0.000128601\n", "[1 1 1 2 4] | 9 | 0.000128601\n", "... (7766 rows omitted)" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "five_rolls_sum = five_rolls_space.with_column(\n", " 'S(omega)', five_rolls_space.apply(sum, 'omega')\n", ").move_to_end('P(omega)')\n", "\n", "five_rolls_sum" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We now have every possible outcome of five rolls of a die, along with its total number of spots. You can see that the first row of the table shows the smallest possible number of spots, corresponding to all the rolls showing 1 spot. The 7776th row shows the largest:" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
omega S(omega) P(omega)
[6 6 6 6 6] 30 0.000128601
" ], "text/plain": [ "omega | S(omega) | P(omega)\n", "[6 6 6 6 6] | 30 | 0.000128601" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "five_rolls_sum.take(7775)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "All the other values of $S$ are between these two extremes. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Functions of Random Variables ####\n", "A random variable is a numerical function on $\\Omega$. Therefore by composition, a numerical function of a random variable is also a random variable. \n", "\n", "For example, $S^2$ is a random variable, calculated as follows:\n", "\n", "$$\n", "S^2(\\omega) = \\big{(} S(\\omega)\\big{)}^2\n", "$$\n", "\n", "Thus for example $S^2(\\text{[6 6 6 6 6]}) = 30^2 = 900$." ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "### Events Determined by $S$ ###\n", "From the table five_rolls_sum it is hard to tell how many rows show a sum of 6, or 10, or any other value. To better understand the properties of $S$, we have to organize the information in five_rolls_sum.\n", "\n", "For any subset $A$ of the range of $S$, define the event $\\{S \\in A\\}$ as\n", "\n", "$$\n", "\\{S \\in A \\} = \\{\\omega: S(\\omega) \\in A \\}\n", "$$\n", "\n", "That is, $\\{ S \\in A\\}$ is the collection of all $\\omega$ for which $S(\\omega)$ is in $A$. In terms of the table, the set consists of the values of $\\omega$ in all the rows in which the sum is in $A$.\n", "\n", "Try out the definition in a special case. Take $A = \\{5, 30\\}$. Then $\\{S \\in A\\}$ if and only if either all the rolls show 1 spot or all the rolls show 6 spots. So \n", "$$\n", "\\{S \\in A\\} = \\{\\text{[1 1 1 1 1], [6 6 6 6 6]}\\}\n", "$$\n", "\n", "It is natural to ask about the chance the sum is a particular value, say 10. That's not easy to read off the table, but we can access the corresponding rows:" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
omega S(omega) P(omega)
[1 1 1 1 6] 10 0.000128601
[1 1 1 2 5] 10 0.000128601
[1 1 1 3 4] 10 0.000128601
[1 1 1 4 3] 10 0.000128601
[1 1 1 5 2] 10 0.000128601
[1 1 1 6 1] 10 0.000128601
[1 1 2 1 5] 10 0.000128601
[1 1 2 2 4] 10 0.000128601
[1 1 2 3 3] 10 0.000128601
[1 1 2 4 2] 10 0.000128601
\n", "

... (116 rows omitted)

" ], "text/plain": [ "omega | S(omega) | P(omega)\n", "[1 1 1 1 6] | 10 | 0.000128601\n", "[1 1 1 2 5] | 10 | 0.000128601\n", "[1 1 1 3 4] | 10 | 0.000128601\n", "[1 1 1 4 3] | 10 | 0.000128601\n", "[1 1 1 5 2] | 10 | 0.000128601\n", "[1 1 1 6 1] | 10 | 0.000128601\n", "[1 1 2 1 5] | 10 | 0.000128601\n", "[1 1 2 2 4] | 10 | 0.000128601\n", "[1 1 2 3 3] | 10 | 0.000128601\n", "[1 1 2 4 2] | 10 | 0.000128601\n", "... (116 rows omitted)" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "five_rolls_sum.where('S(omega)', are.equal_to(10))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are 126 values of $\\omega$ for which $S(\\omega) = 10$. Since all the $\\omega$ are equally likely, the chance that $S$ has the value 10 is 126/7776. \n", "\n", "We will usually be informal with notation and write $\\{ S = 10 \\}$ instead of $\\{ S \\in \\{10\\} \\}$:\n", "$$\n", "P(S = 10) = \\frac{126}{7776} = 1.62\\%\n", "$$" ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 1 }