Joint Distributions¶
This is a brief introduction to working with Joint Distributions from the prob140 library.
Table of Contents
Getting Started¶
As always, this should be the first cell if you are using a notebook.
# HIDDEN
from datascience import *
from prob140 import *
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
plt.style.use('fivethirtyeight')
Constructing Joint Distributions¶
Distribution basics¶
We can construct a joint distribution by starting with a Table. Calling Table().values() with two lists will create a Table with X and Y taking on those values.
In [1]: from prob140 import *
In [2]: dist = Table().values(make_array(2, 3), np.arange(1, 6, 2))
In [3]: dist
Out[3]:
X | Y
2 | 1
2 | 3
2 | 5
3 | 1
3 | 3
3 | 5
We can then assign probabilities using probability() if we have an explicit list or array.
In [4]: dist = dist.probability([0.1, 0.1, 0.2, 0.3, 0.1, 0.2])
In [5]: dist
Out[5]:
X | Y | Probability
2 | 1 | 0.1
2 | 3 | 0.1
2 | 5 | 0.2
3 | 1 | 0.3
3 | 3 | 0.1
3 | 5 | 0.2
To turn it into a Joint Distribution object, call the to_joint() method.
In [6]: dist.to_joint()
Out[6]:
X=2 X=3
Y=5 0.2 0.2
Y=3 0.1 0.1
Y=1 0.1 0.3
By default, the joint distribution will display the Y values in reverse. To turn this functionality off, use the optional parameter reverse=False.
In [7]: dist.to_joint(reverse=False)
Out[7]:
X=2 X=3
Y=1 0.1 0.3
Y=3 0.1 0.1
Y=5 0.2 0.2
Naming the Variables¶
When defining a distribution, you can also give a name to each random variable rather than the default ‘X’ and ‘Y’. You must alternate between strings and lists when calling values()
In [8]: heads_table = Table().values("H1",[0.2,0.9],"H2",[2,1,0]).probability(make_array(.75*.04, .75*.32,.75*.64,.25*.81,.25*.18,.25*.01))
In [9]: heads_table
Out[9]:
H1 | H2 | Probability
0.2 | 2 | 0.03
0.2 | 1 | 0.24
0.2 | 0 | 0.48
0.9 | 2 | 0.2025
0.9 | 1 | 0.045
0.9 | 0 | 0.0025
In [10]: heads = heads_table.to_joint(reverse=False)
In [11]: heads
Out[11]:
H1=0.2 H1=0.9
H2=0 0.48 0.0025
H2=1 0.24 0.0450
H2=2 0.03 0.2025
You can also use strings for the values of the domain.
In [12]: coins_table = Table().values("Coin1",['H','T'],"Coin2", ['H','T']).probability(np.array([0.24, 0.36, 0.16, 0.24]))
In [13]: coins = coins_table.to_joint(reverse=False)
In [14]: coins
Out[14]:
Coin1=H Coin1=T
Coin2=H 0.24 0.16
Coin2=T 0.36 0.24
Probability Functions¶
You can also use a joint probability function that will take in the values of the random variables.
In [15]: def joint_func(dice1, dice2):
....: return (dice1 + dice2)/252
....:
In [16]: dice = Table().values("D1", np.arange(1,7),"D2", np.arange(1,7)).probability_function(joint_func).to_joint()
In [17]: dice
Out[17]:
D1=1 D1=2 D1=3 D1=4 D1=5 D1=6
D2=6 0.027778 0.031746 0.035714 0.039683 0.043651 0.047619
D2=5 0.023810 0.027778 0.031746 0.035714 0.039683 0.043651
D2=4 0.019841 0.023810 0.027778 0.031746 0.035714 0.039683
D2=3 0.015873 0.019841 0.023810 0.027778 0.031746 0.035714
D2=2 0.011905 0.015873 0.019841 0.023810 0.027778 0.031746
D2=1 0.007937 0.011905 0.015873 0.019841 0.023810 0.027778
Marginal Distributions¶
To see the marginal distribution of a variable, call the method marginal(label) where label is a string for the label.
In [18]: heads.marginal("H1")
Out[18]:
H1=0.2 H1=0.9
H2=0 0.48 0.0025
H2=1 0.24 0.0450
H2=2 0.03 0.2025
Sum: Marginal of H1 0.75 0.2500
In [19]: heads.marginal("H2")