{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"jupyter": {
"outputs_hidden": true
},
"tags": [
"remove_cell"
]
},
"outputs": [],
"source": [
"# HIDDEN\n",
"from datascience import *\n",
"from prob140 import *\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"plt.style.use('fivethirtyeight')\n",
"%matplotlib inline\n",
"import math\n",
"from scipy import stats"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The Matching Problem ##"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This famous problem has been stated variously in terms of hats and people, letters and envelopes, tea cups and saucers – indeed, any situation in which you might want to match two kinds of items seems to have appeared somewhere as a setting for the matching problem. \n",
"\n",
"In the letter-envelope setting there are $n$ letters labeled 1 through $n$ and also $n$ envelopes labeled 1 through $n$. The letters are permuted randomly into the envelopes, one letter per envelope (a mishap usually blamed on an unfortunate hypothetical secretary), so that all permutations are equally likely. The main questions are about the number of letters that are placed into their matching envelopes.\n",
"\n",
"\"Real life\" settings aside, the problem is about the number of fixed points of a random permutation. A fixed point is an element whose position is unchanged by the shuffle."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Matches at Fixed Locations ###\n",
"Consider a random permutation of $n$ elements which for simplicity we will call $\\{1, 2, \\ldots , n\\}$. For any $i$ in the range 1 through $n$, what is the chance that Position $i$ is a fixed point? In other words, what is the chance that letter $i$ falls in envelope $i$?"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"tags": [
"remove-input",
"hide-output"
]
},
"outputs": [
{
"data": {
"text/html": [
"\n",
" VIDEO"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# VIDEO: Fixed Locations\n",
"from IPython.display import YouTubeVideo\n",
"\n",
"YouTubeVideo(\"wM-DD_j2npE\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We know that there are $n!$ possible permutations, all of which are equally likely. To find $P(\\text{match at Position }i)$ all we have to do is count the number of permutations that put letter $i$ in envelope $i$. Here is a good way to count these:\n",
"- Put letter $i$ in envelope $i$.\n",
"- Once that is done, the remaining $n-1$ letters can be permuted in $(n-1)!$ ways.\n",
"\n",
"So\n",
"\n",
"$$\n",
"P(\\text{match at Position }i) = \\frac{(n-1)!}{n!} \n",
"= \\frac{1}{n}\n",
"$$\n",
"\n",
"Notice the absence of $i$ from the answer. No matter which position you fix, the chance of a match at that position is $1/n$. This formalizes the intuitive notion that each letter is equally likely to fall in any envelope, so the chance that it falls in the matching envelope is $1/n$.\n",
"\n",
"Now fix any pair of positions $i \\ne j$. To find $P(\\text{matches at Positions } i \\text{ and } j)$, extend the method we used above:\n",
"- Put letter $i$ in envelope $i$ and letter $j$ in envelope $j$.\n",
"- Once that is done, the remaining $n-2$ letters can be permuted in $(n-2)!$ ways.\n",
"\n",
"So\n",
"\n",
"$$\n",
"P(\\text{matches at Positions } i \\text{ and } j) = \n",
"\\frac{(n-2)!}{n!} \n",
"= \\frac{1}{n} \\cdot \\frac{1}{n-1}\n",
"$$\n",
"\n",
"The second term in the product is \n",
"$P(\\text{match at } j \\mid \\text{match at } i)$ and is just the chance of a match at a fixed spot in the reduced set of $n-1$ letters after letter $i$ and envelope $i$ have been removed.\n",
"\n",
"You should check by induction that for $k = 1, 2, \\ldots , n$,\n",
"\n",
"$$\n",
"P(\\text{matches at a specified set of } k \\text{ positions})\n",
"= \\frac{1}{n} \\cdot \\frac{1}{n-1} \\cdot \\cdots \\cdot \\frac{1}{n-k+1}\n",
"$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### No Matches ###\n",
"If letters falling in the right envelopes are good events, then the worst possible event is every letter falling in a wrong envelope. That is the event that there are no matches, and is called a *derangement*. Let's find the chance of a derangement."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"tags": [
"remove-input",
"hide-output"
]
},
"outputs": [
{
"data": {
"text/html": [
"\n",
" VIDEO"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# VIDEO: Derangement\n",
"from IPython.display import YouTubeVideo\n",
"\n",
"YouTubeVideo(\"PYp39ZsLxOg\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The key is to notice that the complement is a union, and then use the inclusion-exclusion formula.\n",
"\n",
"$$\n",
"\\begin{align*}P(\\text{no match}) &= 1 - P(\\text{at least one match}) \\\\\n",
"&= 1 - P\\big{(}\\bigcup_{i=1}^n \\{\\text{match at Position } i\\} \\big{)} \\\\\n",
"&= 1 - P\\big{(}\\bigcup_{i=1}^n A_i \\big{)}\n",
"\\end{align*}\n",
"$$\n",
"\n",
"where $A_i$ is the event \"match at Position $i$\".\n",
"\n",
"By the inclusion-exclusion formula and our calculations above,\n",
"\n",
"$$\n",
"\\begin{align*}\n",
"& P\\big{(}\\bigcup_{i=1}^n A_i \\big{)} \\\\\n",
"&=\n",
"\\sum_{i=1}^n P(A_i) - \\mathop{\\sum \\sum}_{1 \\le i < j \\le n} P(A_iA_j) + \\mathop{\\sum \\sum \\sum}_{1 \\le i < j < k \\le n} P(A_iA_jA_j) - \\cdots + (-1)^{n+1} P(A_1A_2 \\ldots A_n) \\\\\n",
"&= \\sum_{i=1}^n \\frac{1}{n} - \\mathop{\\sum \\sum}_{1 \\le i < j \\le n} \\frac{1}{n} \\cdot \\frac{1}{n-1} + \n",
"\\mathop{\\sum \\sum \\sum}_{1 \\le i < j < k \\le n}\n",
"\\frac{1}{n} \\cdot \\frac{1}{n-1} \\cdot \\frac{1}{n-2} -\n",
"\\cdots + (-1)^{n+1} \\frac{1}{n!}\n",
"\\end{align*}\n",
"$$\n",
"\n",
"If those sums look hair-raising, look again. None of the terms being added has an index ($i$, $j$, etc) in it! Each sum consists of adding a constant value multiple times, and is therefore equal to the constant times the number of terms in the sum. \n",
"\n",
"The number of terms in the first sum is $n$. As we observed in an earlier section, the number of terms being added in the second sum is\n",
"\n",
"$$\n",
"\\frac{n(n-1)}{2!}\n",
"$$\n",
"\n",
"In the third sum the number of terms is\n",
"\n",
"$$\n",
"\\frac{n(n-1)(n-2)}{3!}\n",
"$$\n",
"\n",
"and so on. Therefore\n",
"\n",
"$$\n",
"\\begin{align*}\n",
"& P\\big{(}\\bigcup_{i=1}^n A_i \\big{)} \\\\ \\\\\n",
"&= n \\cdot \\frac{1}{n}\n",
"~-~ \\frac{n(n-1)}{2!} \\cdot \\frac{1}{n} \\cdot \\frac{1}{n-1}\n",
"~+~ \\frac{n(n-1)(n-2)}{3!} \\cdot \\frac{1}{n} \\cdot \\frac{1}{n-1} \\cdot \\frac{1}{n-2} ~-~\n",
"\\cdots + (-1)^{n+1} \\frac{1}{n!} \\\\ \\\\\n",
"&= 1 - \\frac{1}{2!} + \\frac{1}{3!} - \\cdots (-1)^{n+1}\\frac{1}{n!}\n",
"\\end{align*}\n",
"$$\n",
"\n",
"Remember that\n",
"\n",
"$$\n",
"P\\big{(}\\bigcup_{i=1}^n A_i \\big{)} = \n",
"P(\\text{at least one match})\n",
"$$\n",
"\n",
"So the chance of a derangement is\n",
"\n",
"$$\n",
"\\begin{align*}\n",
"P(\\text{no match}) &= 1 - \\big{(}1 - \\frac{1}{2!} + \\frac{1}{3!} - \\cdots (-1)^{n+1}\\frac{1}{n!}\\big{)} \\\\\n",
"&= 1 - 1 + \\frac{1}{2!} - \\frac{1}{3!} + \\cdots (-1)^n\\frac{1}{n!} \\\\\n",
"&\\sim e^{-1}\n",
"\\end{align*}\n",
"$$\n",
"\n",
"when $n$ is large.\n",
"\n",
"In the language of random variables, let $M_n$ be the number of fixed points (matches) in a random permutation of $n$ elements. Then for every $n \\ge 1$ we have an exact formula for the chance that $M_n$ is 0:\n",
"\n",
"$$\n",
"P(M_n = 0) = 1 - 1 + \\frac{1}{2!} - \\frac{1}{3!} + \\cdots (-1)^n\\frac{1}{n!}\n",
"$$\n",
"\n",
"For large $n$, we also have an approximation:\n",
"\n",
"$$\n",
"P(M_n = 0) \\sim e^{-1} = 36.8\\%\n",
"$$\n",
"\n",
"roughly. When $n$ is large, about 36.8% of all permutations of $n$ elements move all of the elements away from their original positions. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### $k$ Matches ###\n",
"For $0 \\le k \\le n$, you can find $P(M_n = k)$ by using the following observations.\n",
"\n",
"- There are $\\binom{n}{k}$ ways of fixing the $k$ places for the matches.\n",
"- Once the places have been fixed, you have to get a match at those $k$ places; the chance of that is $1/(n(n-1) \\cdots (n-k+1))$.\n",
"- Given that there are matches at those $k$ places, there are $n-k$ letters left, with the corresponding $n-k$ envelopes, and there has to be a derangement of these. The conditional chance is equal to $P(M_{n-k} = 0)$.\n",
"\n",
"So for a fixed $k$ in the range $0 \\le k \\le n$,\n",
"\n",
"$$\n",
"\\begin{align*}\n",
"& P(M_n = k) \\\\\n",
"&= \\binom{n}{k} \\cdot \\frac{1}{n(n-1) \\cdots (n-k+1)} \\cdot \n",
"\\big{(} 1 - 1 + \\frac{1}{2!} - \\frac{1}{3!} + \\cdots (-1)^{n-k}\\frac{1}{(n-k)!} \\big{)} \\\\\n",
"&= \\frac{1}{k!} \\cdot \\big{(} 1 - 1 + \\frac{1}{2!} - \\frac{1}{3!} + \\cdots (-1)^{n-k}\\frac{1}{(n-k)!} \\big{)} \\\\\n",
"&\\approx \\frac{1}{k!} e^{-1} ~~~~~~~~~ \\text{for large } n\n",
"\\end{align*}\n",
"$$\n",
"\n",
"We will see later that these probabilities form a *Poisson* distribution on the infinite set of non-negative integers."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
}
},
"outputs": [],
"source": []
}
],
"metadata": {
"anaconda-cloud": {},
"celltoolbar": "Tags",
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.3"
},
"widgets": {
"application/vnd.jupyter.widget-state+json": {
"state": {},
"version_major": 2,
"version_minor": 0
}
}
},
"nbformat": 4,
"nbformat_minor": 4
}