Functions on an Outcome Space

Random sampling can be thought of as repeated random trials, and therefore many outcome spaces consist of sequences. An outcome space representing two tosses of a coin is

$$ \Omega = \{ \text{HH, HT, TH, TT} \} $$

If you were tossing 10 times, the outcome space would consist of the $2^{10}$ sequences of 10 elements where each element is H or T. The outcomes are a pain to list by hand, but computers are good at saving us that kind of pain.

Product Spaces

The product of two sets $A$ and $B$ is the set of all pairs $(a, b)$ where $a \in A$ and $b \in B$. This concept is exactly what we need to describe spaces representing multiple trials.

For example, the space representing the outcome of one toss of a coin is $ \Omega_1 = \{ \text{H, T}\}$. The product of $\Omega_1$ with itself is the set of pairs (H, H), (H, T), (T, H) and (T, T), which you will recognize as the outcomes of two tosses of a coin. The product of this new space and $\Omega_1$ is the space representing three tosses. And so on.
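The definition of a product can be spelled out directly in plain Python. The following is a minimal sketch, not part of the original development; the names `omega_1`, `two_toss_pairs`, and `three_toss_triples` are illustrative choices. It builds the two-toss space from the definition and then extends it by one more toss.

```python
# One toss of a coin.
omega_1 = ['H', 'T']

# The product of omega_1 with itself: all pairs (a, b)
# with a in omega_1 and b in omega_1.
two_toss_pairs = [(a, b) for a in omega_1 for b in omega_1]
print(two_toss_pairs)

# The product of the two-toss space with omega_1 consists of pairs
# ((a, b), c). Appending c to the tuple (a, b) flattens each one
# into a triple (a, b, c), which represents three tosses.
three_toss_triples = [pair + (c,) for pair in two_toss_pairs for c in omega_1]
print(len(three_toss_triples))
```

Strictly speaking, the product of the two-toss space with $\Omega_1$ contains nested pairs such as ((H, T), H); identifying each of these with the triple (H, T, H) is what lets us treat the result as the space of three tosses.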

The Python module itertools contains a function product that constructs product spaces. Let's import it.

from itertools import product

To see how product works, we will start with the outcomes of one toss of a coin. We are creating an array using make_array, but any other array or list would work just as well.

one_toss = make_array('H', 'T')

To use product, we have to specify the base space and the number of repetitions, and then convert the result to a list.

two_tosses = list(product(one_toss, repeat=2))
two_tosses
[('H', 'H'), ('H', 'T'), ('T', 'H'), ('T', 'T')]

For three tosses, just change the number of repetitions:

three_tosses = list(product(one_toss, repeat=3))
three_tosses
[('H', 'H', 'H'),
 ('H', 'H', 'T'),
 ('H', 'T', 'H'),
 ('H', 'T', 'T'),
 ('T', 'H', 'H'),
 ('T', 'H', 'T'),
 ('T', 'T', 'H'),
 ('T', 'T', 'T')]
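The same call handles the 10 tosses mentioned at the start of the section. As a small sketch (the name `ten_tosses` is an illustrative choice, and the string `'HT'` is used as the base space because a string is an iterable of its characters), we can confirm that the number of sequences is $2^{10}$ without listing them by hand.

```python
from itertools import product

# All sequences of 10 tosses, each element being 'H' or 'T'.
ten_tosses = list(product('HT', repeat=10))

# The number of outcomes should equal 2**10.
print(len(ten_tosses), 2**10)

# The first outcome in the list is ten heads in a row.
print(ten_tosses[0])
```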

A probability space is an outcome space accompanied by the probabilities of all the outcomes. If you assume all eight outcomes of three tosses are equally likely, the probabilities are all 1/8:

three_toss_probs = (1/8)*np.ones(8)

The corresponding probability space:

three_toss_space = Table().with_columns(
    'omega', three_tosses,
    'P(omega)', three_toss_probs
)
three_toss_space
omega P(omega)
['H' 'H' 'H'] 0.125
['H' 'H' 'T'] 0.125
['H' 'T' 'H'] 0.125
['H' 'T' 'T'] 0.125
['T' 'H' 'H'] 0.125
['T' 'H' 'T'] 0.125
['T' 'T' 'H'] 0.125
['T' 'T' 'T'] 0.125
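A quick sanity check on any probability space is that the probabilities of all the outcomes add up to 1. The sketch below does this check in plain Python rather than with a Table; the dictionary `probs` and the use of `Fraction` for exact arithmetic are choices made for this illustration only.

```python
from fractions import Fraction
from itertools import product

# The eight outcomes of three tosses.
outcomes = list(product('HT', repeat=3))

# Assign each outcome the same probability, 1/8, as an exact fraction.
probs = {omega: Fraction(1, len(outcomes)) for omega in outcomes}

# The probabilities over the whole outcome space should sum to 1.
total = sum(probs.values())
print(total)
```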

Product spaces get large very quickly. If you roll a die 5 times, there are almost 8,000 possible outcomes:

6**5
7776

But we have product so we can still list them all! Here is a probability space representing 5 rolls of a die.

die = np.arange(1, 7, 1)

five_rolls = list(product(die, repeat=5))  # All possible results of 5 rolls

five_rolls_probs = (1/6**5)*np.ones(6**5)  # Each result has chance 1/6**5

five_rolls_space = Table().with_columns(
    'omega', five_rolls,
    'P(omega)', five_rolls_probs
)

five_rolls_space
omega P(omega)
[1 1 1 1 1] 0.000128601
[1 1 1 1 2] 0.000128601
[1 1 1 1 3] 0.000128601
[1 1 1 1 4] 0.000128601
[1 1 1 1 5] 0.000128601
[1 1 1 1 6] 0.000128601
[1 1 1 2 1] 0.000128601
[1 1 1 2 2] 0.000128601
[1 1 1 2 3] 0.000128601
[1 1 1 2 4] 0.000128601

... (7766 rows omitted)

A Function on the Outcome Space

Suppose you roll a die five times and add up the number of spots you see. If that seems artificial, be patient for a moment and you'll soon see why it's interesting.

The sum of the rolls is a numerical function on the outcome space $\Omega$ of five rolls. The sum is thus a random variable. Let's call it $S$. Then, formally, $$ S: \Omega \rightarrow \{ 5, 6, \ldots, 30 \} $$ The range of $S$ is the integers 5 through 30, because each die shows at least one spot and at most six spots. We can also use the equivalent notation

$$ \Omega \stackrel{S}{\rightarrow} \{ 5, 6, \ldots, 30 \} $$

From a computational perspective, the elements of $\Omega$ are in the column omega of five_rolls_space. Let's apply this function and create a larger table.

five_rolls_sum = five_rolls_space.with_column(
    'S(omega)', five_rolls_space.apply(sum, 'omega')
).move_to_end('P(omega)')

five_rolls_sum
omega S(omega) P(omega)
[1 1 1 1 1] 5 0.000128601
[1 1 1 1 2] 6 0.000128601
[1 1 1 1 3] 7 0.000128601
[1 1 1 1 4] 8 0.000128601
[1 1 1 1 5] 9 0.000128601
[1 1 1 1 6] 10 0.000128601
[1 1 1 2 1] 6 0.000128601
[1 1 1 2 2] 7 0.000128601
[1 1 1 2 3] 8 0.000128601
[1 1 1 2 4] 9 0.000128601

... (7766 rows omitted)

We now have every possible outcome of five rolls of a die, along with its total number of spots. You can see that the first row of the table shows the smallest possible number of spots, corresponding to all the rolls showing 1 spot. The 7776th row shows the largest:

five_rolls_sum.take(7775)
omega S(omega) P(omega)
[6 6 6 6 6] 30 0.000128601

All the other values of $S$ are between these two extremes.
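The claim about the range of $S$ can be checked by brute force. This is a sketch in plain Python, independent of the tables above; the names `sums` and `values_of_S` are illustrative.

```python
from itertools import product

# All 6**5 outcomes of five rolls of a die.
five_rolls = list(product(range(1, 7), repeat=5))

# The value of S for each outcome.
sums = [sum(omega) for omega in five_rolls]

# The smallest and largest values of S.
print(min(sums), max(sums))

# The distinct values of S, compared with the integers 5 through 30.
values_of_S = sorted(set(sums))
print(values_of_S == list(range(5, 31)))
```

The comparison confirms that every integer from 5 through 30 is the sum of at least one outcome, so no value in the stated range is missed.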

Functions of Random Variables

A random variable is a numerical function on $\Omega$. Therefore by composition, a numerical function of a random variable is also a random variable.

For example, $S^2$ is a random variable, calculated as follows:

$$ S^2(\omega) = \big{(} S(\omega)\big{)}^2 $$

Thus for example $S^2(\text{[6 6 6 6 6]}) = 30^2 = 900$.
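Composition is easy to express in code. In this sketch the function names `S`, `square`, and `S_squared` are chosen for illustration; an outcome is represented as a tuple of five rolls.

```python
def S(omega):
    """The random variable S: the sum of the rolls in the outcome omega."""
    return sum(omega)

def square(x):
    """A numerical function applied to the value of S."""
    return x ** 2

def S_squared(omega):
    """The composition of square with S, which is again a function on Omega."""
    return square(S(omega))

# Evaluate S^2 at the outcome in which every roll shows 6 spots.
print(S_squared((6, 6, 6, 6, 6)))
```

Because `S_squared` takes an outcome and returns a number, it is a random variable in exactly the same sense as `S`.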

Events Determined by $S$

From the table five_rolls_sum it is hard to tell how many rows show a sum of 6, or 10, or any other value. To better understand the properties of $S$, we have to organize the information in five_rolls_sum.

For any subset $A$ of the range of $S$, define the event $\{S \in A\}$ as

$$ \{S \in A \} = \{\omega: S(\omega) \in A \} $$

That is, $\{ S \in A\}$ is the collection of all $\omega$ for which $S(\omega)$ is in $A$. In terms of the table, the set consists of the values of $\omega$ in all the rows in which the sum is in $A$.

Try out the definition in a special case. Take $A = \{5, 30\}$. Then the event $\{S \in A\}$ occurs if and only if either all the rolls show 1 spot or all the rolls show 6 spots. So $$ \{S \in A\} = \{\text{[1 1 1 1 1], [6 6 6 6 6]}\} $$
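The definition of $\{S \in A\}$ translates directly into a filter over the outcome space. The helper `event` below is a hypothetical name introduced for this sketch, which uses plain Python tuples rather than the table.

```python
from itertools import product

# The outcome space of five rolls of a die.
omega_space = list(product(range(1, 7), repeat=5))

def event(A):
    """Return the event {S in A}: all outcomes whose sum lies in the set A."""
    return [omega for omega in omega_space if sum(omega) in A]

# The special case A = {5, 30}.
extremes = event({5, 30})
print(extremes)
```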

It is natural to ask about the chance the sum is a particular value, say 10. That's not easy to read off the table, but we can access the corresponding rows:

five_rolls_sum.where('S(omega)', are.equal_to(10))
omega S(omega) P(omega)
[1 1 1 1 6] 10 0.000128601
[1 1 1 2 5] 10 0.000128601
[1 1 1 3 4] 10 0.000128601
[1 1 1 4 3] 10 0.000128601
[1 1 1 5 2] 10 0.000128601
[1 1 1 6 1] 10 0.000128601
[1 1 2 1 5] 10 0.000128601
[1 1 2 2 4] 10 0.000128601
[1 1 2 3 3] 10 0.000128601
[1 1 2 4 2] 10 0.000128601

... (116 rows omitted)

There are 126 values of $\omega$ for which $S(\omega) = 10$. Since all the $\omega$ are equally likely, the chance that $S$ has the value 10 is 126/7776.

We will usually be informal with notation and write $\{ S = 10 \}$ instead of $\{ S \in \{10\} \}$: $$ P(S = 10) = \frac{126}{7776} \approx 1.62\% $$
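The count of 126 and the resulting chance can be verified by tallying the sums over the whole outcome space. This is a sketch using `collections.Counter` and exact fractions from the standard library; the names `counts` and `p_10` are illustrative, and the approach relies on the assumption stated above that all outcomes are equally likely.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# The outcome space of five rolls of a die.
omega_space = list(product(range(1, 7), repeat=5))

# For each possible value s, count the outcomes omega with S(omega) = s.
counts = Counter(sum(omega) for omega in omega_space)

# With equally likely outcomes, P(S = 10) is the number of outcomes
# with sum 10 divided by the total number of outcomes.
p_10 = Fraction(counts[10], len(omega_space))
print(counts[10], len(omega_space), float(p_10))
```

The same `counts` object holds the number of outcomes for every value of $S$, so the chance of any other sum can be read off in the same way.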