Single-Variable Distributions
This is a brief introduction to the functionality in prob140.
Getting Started
Make sure you are on the most recent version of the prob140 library. You can check your version of prob140 (or any other Python library) by running the following:
In [1]: import prob140
In [2]: print(prob140.__version__)
0.3.5.1
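Not every Python package exposes a `__version__` attribute. As an alternative, the standard-library `importlib.metadata` module (Python 3.8+) can look up an installed version by distribution name. The following is a minimal sketch: the helper name `installed_version` is hypothetical, and it assumes the library was installed under the distribution name `prob140`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is not installed."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Assumption: the distribution name is "prob140"; prints None if it is not installed.
print(installed_version("prob140"))
```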
If you are using an IPython notebook, use this as your first cell:
# HIDDEN
from datascience import *
from prob140 import *
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
plt.style.use('fivethirtyeight')
You may want to familiarize yourself with Data8’s datascience documentation first.
Creating a Distribution
The prob140 library adds distribution methods to the datascience Table class, which you should already be familiar with. A distribution is a two-column table: the first column holds the possible values and the second column holds the probability associated with each value.
To build a distribution, pass a list or array to each of the methods values and probability to fill in those two columns:
In [3]: from prob140 import *
In [4]: dist1 = Table().values(make_array(2, 3, 4)).probability(make_array(0.25, 0.5, 0.25))
In [5]: dist1
Out[5]:
Value | Probability
2 | 0.25
3 | 0.5
4 | 0.25
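For the table above to describe a distribution, the probabilities must be non-negative and sum to 1, with one probability per distinct value. The sketch below checks those conditions in plain Python. It is an illustration only, not part of prob140, and the helper name `is_valid_distribution` is hypothetical.

```python
import math


def is_valid_distribution(values, probabilities):
    """Check the basic requirements of a finite probability distribution."""
    same_length = len(values) == len(probabilities)
    distinct_values = len(set(values)) == len(values)
    non_negative = all(p >= 0 for p in probabilities)
    # isclose tolerates the rounding error that floating-point sums can pick up
    sums_to_one = math.isclose(sum(probabilities), 1.0)
    return same_length and distinct_values and non_negative and sums_to_one


print(is_valid_distribution([2, 3, 4], [0.25, 0.5, 0.25]))  # True
print(is_valid_distribution([2, 3, 4], [0.25, 0.5, 0.5]))   # False (sums to 1.25)
```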
We can also construct a distribution by listing the possible values explicitly and then applying a probability function to each value in that domain:
In [6]: def p(x):
   ...:     return 0.25
   ...:
In [7]: dist2 = Table().values(np.arange(1, 8, 2)).probability_function(p)
In [8]: dist2
Out[8]:
Value | Probability
1 | 0.25
3 | 0.25
5 | 0.25
7 | 0.25
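Conceptually, the result above pairs each value in the domain with the function evaluated at that value. The plain-Python sketch below mirrors that idea; it does not claim to reproduce how prob140 implements probability_function, and the name `uniform_p` is hypothetical.

```python
def uniform_p(x):
    """Assign the same probability to every value in a four-value domain."""
    return 0.25


values = list(range(1, 8, 2))  # 1, 3, 5, 7
probabilities = [uniform_p(x) for x in values]

for value, probability in zip(values, probabilities):
    print(value, probability)
```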
This is especially useful when the distribution has a known probability mass function:
In [9]: from scipy.special import comb
In [10]: def pmf(x):
   ....:     n = 10
   ....:     p = 0.3
   ....:     return comb(n, x) * p**x * (1-p)**(n-x)
   ....:
In [11]: binomial = Table().values(np.arange(11)).probability_function(pmf)
In [12]: binomial
Out[12]:
Value | Probability
0 | 0.0282475
1 | 0.121061
2 | 0.233474
3 | 0.266828
4 | 0.200121
5 | 0.102919
6 | 0.0367569
7 | 0.00900169
8 | 0.0014467
9 | 0.000137781
... (1 rows omitted)
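The display omits the last row (value 10). As a sanity check, the same probability mass function can be evaluated with only the standard library, using `math.comb` (Python 3.8+) in place of `scipy.special.comb`. This is a sketch for verification; the name `binomial_pmf` and its default arguments are chosen here to match the example above.

```python
import math


def binomial_pmf(k, n=10, p=0.3):
    """Probability of exactly k successes in n independent trials with success chance p."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)


probabilities = [binomial_pmf(k) for k in range(11)]

# The eleven probabilities should add up to 1, up to floating-point error.
print(math.isclose(sum(probabilities), 1.0))

# Probability for the omitted row, value 10; mathematically equal to 0.3**10.
print(binomial_pmf(10))
```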
Events
Often, we are concerned with specific values in a distribution rather than all the values.
Calling event allows us to see a subset of the values in a distribution and the associated probabilities.
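The idea of restricting a distribution to a set of outcomes can be sketched in plain Python with a dictionary. This is an illustration, not prob140's implementation: the helper name `event_probabilities` is hypothetical, and giving probability 0 to outcomes outside the distribution is an assumption of this sketch.

```python
def event_probabilities(distribution, outcomes):
    """Map each requested outcome to its probability.

    Assumption of this sketch: outcomes that are not in the
    distribution are assigned probability 0.
    """
    return {x: distribution.get(x, 0.0) for x in outcomes}


dist = {2: 0.25, 3: 0.5, 4: 0.25}
subset = event_probabilities(dist, [2, 3])

print(subset)
print(sum(subset.values()))  # 0.75, the chance that the value is 2 or 3
```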
In [13]: dist1
Out[13]:
Value | Probability
2 | 0.25
3 | 0.5
4 | 0.25
In [14]: dist1.event(np.arange(1,4))