{
"cells": [
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"tags": [
"remove_cell"
]
},
"outputs": [],
"source": [
"# HIDDEN\n",
"import warnings\n",
"warnings.filterwarnings('ignore')\n",
"from datascience import *\n",
"from prob140 import *\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"plt.style.use('fivethirtyeight')\n",
"%matplotlib inline\n",
"from scipy import stats"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Poissonizing the Binomial ##"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Suppose $N_H$ is the number of heads in 100 tosses of a coin, and $N_T$ the number of tails. Then $N_H$ and $N_T$ are far from independent. They are linear functions of each other because $N_T = 100 - N_H$. \n",
"\n",
"The same is true of any fixed number of tosses: if you know the number of heads, then you also know the number of tails. \n",
"\n",
"In any fixed number of Bernoulli trials, the number of successes and the number of failures are as dependent as it gets. If you know one, you know the other.\n",
"\n",
"However, something remarkable happens when the *number of trials is itself random and has a Poisson distribution.* After we see what happens, we will be able to understand why it matters."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"tags": [
"remove-input",
"hide-output"
]
},
"outputs": [
{
"data": {
"text/html": [
"\n",
" VIDEO"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# VIDEO: Poisson Number of Trials\n",
"from IPython.display import YouTubeVideo\n",
"\n",
"YouTubeVideo('lnLUsr24r88')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Randomizing the Number of Bernoulli Trials ###\n",
"\n",
"Let $N$ have the Poisson $(\\mu)$ distribution, let $S$ be the number of successes in $N$ i.i.d. Bernoulli $(p)$ trials. More formally: \n",
"- Given $N = 0$, define $S$ to be 0 with probability 1. Given that there are no trials, there are also no successes.\n",
"- For $n \\ge 1$, let the conditional distribution of $S$ given $N = n$ be binomial $(n, p)$.\n",
"\n",
"Then the joint distribution of $N$ and $S$ is given by:\n",
"\n",
"$$\n",
"P(N=n, S=s) ~ = ~ e^{-\\mu}\\frac{\\mu^n}{n!} \\cdot \n",
"\\frac{n!}{s!(n-s)!} p^s(1-p)^{n-s}, ~~ 0 \\le s \\le n\n",
"$$\n",
"\n",
"You should check that the formula is correct when $n=0$.\n",
"\n",
"We can sum the terms in this joint distribution appropriately to get the marginal distribution of $S$."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"tags": [
"remove-input",
"hide-output"
]
},
"outputs": [
{
"data": {
"text/html": [
"\n",
" VIDEO"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# VIDEO: Poisson Number of Successes\n",
"\n",
"YouTubeVideo('JQT1NQNbxsQ')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### A Poisson Number of Successes ###\n",
"The possible values of $S$ are $0, 1, 2, \\ldots$ with no upper limit because there is no upper limit on the possible values of $N$. For $s \\ge 0$,\n",
"\n",
"$$\n",
"\\begin{align*}\n",
"P(S = s) &= \\sum_{n=s}^\\infty P(N=n, S=s) \\\\ \\\\\n",
"&= \\sum_{n=s}^\\infty e^{-\\mu}\\frac{\\mu^n}{n!} \\cdot \n",
"\\frac{n!}{s!(n-s)!} p^sq^{n-s} ~~~~ \\text{where } q = 1-p \\\\ \\\\\n",
"&= e^{-\\mu} \\frac{\\mu^sp^s}{s!} \\sum_{n=s}^\\infty\n",
"\\frac{\\mu^{n-s}q^{n-s}}{(n-s)!} \\\\ \\\\\n",
"&= e^{-\\mu} \\frac{(\\mu p)^s}{s!} \\sum_{n=s}^\\infty\n",
"\\frac{(\\mu q)^{n-s}}{(n-s)!} \\\\ \\\\\n",
"&= e^{-\\mu} \\frac{(\\mu p)^s}{s!} \\sum_{j=0}^\\infty\n",
"\\frac{(\\mu q)^j}{j!} \\\\ \\\\\n",
"&= e^{-\\mu} \\frac{(\\mu p)^s}{s!} e^{\\mu q} \\\\ \\\\\n",
"&= e^{-\\mu p} \\frac{(\\mu p)^s}{s!} ~~ \\text{because } \\mu p+ \\mu q = \\mu\n",
"\\end{align*}\n",
"$$\n",
"\n",
"Thus the distribution of $S$ is Poisson with parameter $\\mu p$.\n",
"\n",
"Notice what we have just proved. \n",
"\n",
"- If the number of trials $n$ is fixed, you know that the distribution of the number of successes is binomial $(n, p)$. \n",
"- But if the the number of trials is random with a Poisson $(\\mu)$ distribution, then the distribution of the number of successes is Poisson $(\\mu p)$. \n",
"\n",
"This is a major step in *Poissonizing* the binomial.\n",
"\n",
"The best is yet to come, but let's take a moment to look at the result numerically. Suppose you run a Poisson $(12)$ number of i.i.d. Bernoulli $(1/3)$ trials. Then the number of trials is most likely to be somewhere around 12, but you can't say exactly what it will be because it's random. What we have shown is that the number of successes is Poisson with parameter $12 \\times \\frac{1}{3} = 4$.\n",
"\n",
"The parameter 4 is not hard to understand intuitively. You're most likely to see around 12 trials, and about 1/3 of them are going to be successes, so you're most likely to see around 4 successes."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```{admonition} Quick Check\n",
"Each time I throw a dart, I have chance $1/4$ of hitting the bullseye, independently of all other times. Find the chance that I hit the bullseye 5 times if\n",
"\n",
"(a) I throw the dart 24 times\n",
"\n",
"(b) I pick a random number $N$ from a Poisson $(24)$ distribution, and then throw the dart $N$ times\n",
"\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```{admonition} Answer\n",
":class: dropdown\n",
"(a) $\\binom{24}{5} 0.25^5 0.75^{19}$\n",
"\n",
"(b) $e^{-6}\\frac{6^5}{5!}$\n",
"\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"tags": [
"remove-input",
"hide-output"
]
},
"outputs": [
{
"data": {
"text/html": [
"\n",
" VIDEO"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# VIDEO: Successes and Failures\n",
"\n",
"YouTubeVideo('2qLF6nNWPiw')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Successes and Failures are Independent ###\n",
"Yes, you read that right. If you run a Poisson number of i.i.d. Bernoulli trials, then the number of successes and the number of failures are *independent*.\n",
"\n",
"Randomizing parameters (in this case the number of trials) can have a dramatic effect on the relations between random variables.\n",
"\n",
"Let's prove our result, and then we will see a way in which it is used.\n",
"\n",
"Suppose as before that we are running $N$ i.i.d. Bernoulli $(p)$ trials, where $N$ has the Poisson $(\\mu)$ distribution independent of the results of the trials. Also as before, let $S$ be the number of successes. \n",
"\n",
"Now let $F$ be the number of failures. Then the distribution of $F$ is Poisson $(\\mu q)$ where $q = 1-p$. This follows by redefining \"success\" as \"failure\" in our previous argument.\n",
"\n",
"The joint distribution of $S$ and $F$ is\n",
"\n",
"$$\n",
"\\begin{align*} \n",
"P(S = s, F = f) &= P(N = s+f, S = s) \\\\ \\\\\n",
"&= e^{-\\mu} \\frac{\\mu^{s+f}}{(s+f)!} \\frac{(s+f)!}{s!f!} p^s q^f \\\\ \\\\\n",
"&= \\big{(} e^{-\\mu p} \\frac{ (\\mu p)^s}{s!} \\big{)} \n",
"\\big{(} e^{-\\mu q} \\frac{ (\\mu q)^f}{f!} \\big{)} \\\\ \\\\\n",
"&= P(S = s)P(F = f)\n",
"\\end{align*}\n",
"$$\n",
"\n",
"This shows that $S$ and $F$ are independent."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Summary: Poissonization of the Binomial ###\n",
"Suppose you run $N$ i.i.d. Bernoulli $(p)$ trials, where $N$ has the Poisson $(\\mu)$ distribution independent of the results of the trials. Let $S$ be the number of successes and $F$ the number of failures, and let $q = 1-p$. Then:\n",
"\n",
"- $S$ has the Poisson $(\\mu p)$ distribution\n",
"- $F$ has the Poisson $(\\mu q)$ distribution\n",
"- $S$ and $F$ are independent\n",
"\n",
"For example, suppose 90% of the individuals in a population are of Class A and 10% are of Class B. Suppose you draw $N$ times at random with replacement from the population, where $N$ has the Poisson $(20)$ distribution independent of the results of your draws. Then in your sample,\n",
"\n",
"- the number of people of Class A has the Poisson $(18)$ distribution,\n",
"- the number in Class B has the Poisson $(2)$ distribution,\n",
"- and the counts in the two classes are independent.\n",
"\n",
"Thus for example the chance that each class appears at least five times in your sample is\n",
"\n",
"$$\n",
"\\big{(} \\sum_{i=5}^\\infty e^{-{18}} \\frac{18^i}{i!} \\big{)}\n",
"\\big{(} \\sum_{j=5}^\\infty e^{-{2}} \\frac{2^j}{j!} \\big{)}\n",
"~ = ~ \n",
"\\big{(}1 - \\sum_{i=0}^4 e^{-{18}} \\frac{18^i}{i!} \\big{)}\n",
"\\big{(}1- \\sum_{j=0}^4 e^{-{2}} \\frac{2^j}{j!} \\big{)}\n",
"$$\n",
"\n",
"This is just over 5%."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.052648585218160585"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(1 - stats.poisson.cdf(4, 18))*(1 - stats.poisson.cdf(4, 2))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```{admonition} Quick Check\n",
"Each time I throw a dart, I have chance $1/4$ of hitting the bullseye, independently of all other times. Suppose I pick a random number $N$ from a Poisson $(24)$ distribution, and then throw the dart $N$ times. What is the chance that I hit the bullseye 5 times and miss 10 times?\n",
"\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```{admonition} Answer\n",
":class: dropdown\n",
"$e^{-6}\\frac{6^5}{5!} \\cdot e^{-18}\\frac{18^{10}}{10!}$\n",
"\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"celltoolbar": "Tags",
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.4"
}
},
"nbformat": 4,
"nbformat_minor": 4
}