Joint Distributions
Suppose $X$ and $Y$ are two random variables defined on the same outcome space. For example, in three tosses of a coin, $X$ could be the number of heads in the first two tosses and $Y$ the number of heads in the last two tosses.
We will use the notation $P(X = x, Y = y)$ for the probability that $X$ has the value $x$ and $Y$ has the value $y$. In our example,
$$ P(X = 1, Y = 2) = P(\text{THH}) = \frac{1}{8} $$
The joint distribution of $X$ and $Y$ consists of all the probabilities $P(X=x, Y=y)$ where $x$ ranges over all the possible values of $X$ and $y$ ranges over all the possible values of $Y$.
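As a cross-check on this definition, here is a minimal sketch in plain Python that builds the joint distribution by listing the eight equally likely outcomes of three tosses. It assumes a fair coin; the names `x_value`, `y_value`, and `joint` are illustrative and are not part of `prob140`.

```python
from fractions import Fraction
from itertools import product

# All 8 equally likely outcomes of three tosses, e.g. ('T', 'H', 'H').
outcomes = list(product('HT', repeat=3))

def x_value(outcome):
    """Number of heads in the first two tosses."""
    return outcome[:2].count('H')

def y_value(outcome):
    """Number of heads in the last two tosses."""
    return outcome[1:].count('H')

# Accumulate P(X = x, Y = y): each outcome contributes 1/8 to its pair (x, y).
joint = {}
for outcome in outcomes:
    pair = (x_value(outcome), y_value(outcome))
    joint[pair] = joint.get(pair, Fraction(0)) + Fraction(1, len(outcomes))

print(joint[(1, 2)])  # 1/8
```

Pairs that no outcome produces, such as $(0, 2)$, do not appear in the dictionary; their probability is 0.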
In our example, both $X$ and $Y$ have values in the range 0, 1, 2, and so there are nine pairs of values. We could use `product` to list them all, but the `domain` method extends to two variables and is simpler to use. It takes as its arguments the name of one variable, the range of that variable, the name of the other variable, and the range of that variable.
from datascience import *
from prob140 import *
import numpy as np

joint_table = Table().domain('X', np.arange(3), 'Y', np.arange(3))
joint_table
This display contains no probabilities yet, so let's put them in. For now, we will simply make an array of probabilities in the order in which the outcomes appear. Later we will see how to replace the array by a function that will compute the probability of each outcome.
probs = make_array(1/8, 1/8, 0, 1/8, 2/8, 1/8, 0, 1/8, 1/8 )
joint_table = joint_table.probability(probs)
joint_table
This table displays the joint distribution. To check that this is indeed a distribution, we can add up all the probabilities. The sum is 1, as it should be for a distribution.
joint_table.column(2).sum()
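The hand-typed array can also be checked against a small function that computes each probability by counting outcomes. The following is only a sketch in plain Python: `joint_prob` is a hypothetical helper, not a `prob140` function, and it assumes the pairs are listed with $x$ varying slowest and $y$ fastest.

```python
from fractions import Fraction
from itertools import product

def joint_prob(x, y):
    """P(X = x, Y = y) for three fair tosses (hypothetical helper)."""
    outcomes = list(product('HT', repeat=3))
    favorable = [o for o in outcomes
                 if o[:2].count('H') == x and o[1:].count('H') == y]
    return Fraction(len(favorable), len(outcomes))

# Assumed ordering of the nine pairs: x varies slowest, y fastest.
computed = [joint_prob(x, y) for x in range(3) for y in range(3)]

# The probabilities entered by hand in the text, in eighths.
by_hand = [Fraction(n, 8) for n in (1, 1, 0, 1, 2, 1, 0, 1, 1)]

print(computed == by_hand)  # True
print(sum(computed))        # 1
```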
A Joint Distribution Table
Though the table above does display the joint distribution, it is more conventional and also more useful to display the same data in a different way.
The `prob140` method `toJoint` converts the table above into a JointDistribution object that is displayed as a conventional joint distribution table for $X$ and $Y$.
joint_dist = joint_table.toJoint()
joint_dist
This way of displaying the information makes it easier to understand the relation between the two variables, as we will soon see. For now, observe that each cell corresponds to a pair $(x, y)$, where $x$ is a value of $X$ and $y$ a value of $Y$. In the cell you see $P(X = x, Y = y)$, the probability of the pair $(x, y)$.
For example, the cell whose labels are `X=1` and `Y=0` contains the probability 0.125. That is because
$$
P(X = 1, Y = 0) = P(\text{HTT}) = \frac{1}{8} = 0.125
$$
You can check all the other cells in the same way.
The table shows it is most likely that both $X$ and $Y$ will be equal to 1. Two outcomes make this happen: HTH and THT.
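One way to check every cell at once is to group the eight outcomes by the pair $(x, y)$ they produce. The sketch below uses plain Python and assumes a fair coin; the dictionary name `cells` is illustrative.

```python
from itertools import product

# Map each pair (x, y) to the list of outcomes that produce it.
cells = {}
for tosses in product('HT', repeat=3):
    outcome = ''.join(tosses)
    x = outcome[:2].count('H')   # heads in the first two tosses
    y = outcome[1:].count('H')   # heads in the last two tosses
    cells.setdefault((x, y), []).append(outcome)

# Each outcome has chance 1/8, so a cell's probability is its count over 8.
for (x, y), members in sorted(cells.items()):
    print(f"X={x}, Y={y}: {members}  probability {len(members)}/8")
```

Cells that do not show up in the listing, such as `X=2` and `Y=0`, correspond to no outcome and have probability 0.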
Finding Probabilities
The table contains complete information about $X$ and $Y$. To find the probability of any event determined by $X$ and $Y$, simply identify the cells that make the event happen, and add up their chances. This is the random variable version of the Fundamental Method of finding probabilities (see Section 2.4).
For example,
\begin{align*} P(X > Y ) &= P(X = 1, Y = 0) + P(X = 2, Y = 0) + P(X = 2 , Y = 1) \\ &= 0.125 + 0 + 0.125 \\ &= 0.25 \end{align*}
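The same add-up-the-cells method can be sketched in plain Python. Here the joint probabilities from this section are typed into a dictionary, and `prob` is a hypothetical helper (not a `prob140` method) that sums the cells satisfying a condition.

```python
from fractions import Fraction

# Joint probabilities from this section, in eighths, keyed by (x, y).
eighths = {(0, 0): 1, (0, 1): 1, (0, 2): 0,
           (1, 0): 1, (1, 1): 2, (1, 2): 1,
           (2, 0): 0, (2, 1): 1, (2, 2): 1}
joint = {pair: Fraction(n, 8) for pair, n in eighths.items()}

def prob(event):
    """Add the chances of all cells (x, y) for which event(x, y) is true."""
    return sum((p for (x, y), p in joint.items() if event(x, y)), Fraction(0))

p_x_greater = prob(lambda x, y: x > y)
print(p_x_greater, float(p_x_greater))  # 1/4 0.25
```

Any other event determined by $X$ and $Y$ can be handled by changing the condition, for example `prob(lambda x, y: x == y)` for the chance that the two counts agree.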