5.4. Sampling Without Replacement
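Consider a set of $n$ individuals labeled $1, 2, \ldots, n$. The result of $n$ draws made at random without replacement is a random permutation of all the elements.
Let’s call such a permutation $(X_1, X_2, \ldots, X_n)$. For any permutation $i_1, i_2, \ldots, i_n$ of the integers 1 through $n$,
$$P(X_1 = i_1, X_2 = i_2, \ldots, X_n = i_n) = \frac{1}{n!}$$
Notice that the right hand side doesn’t depend on the particular permutation specified on the left. We say that the “coordinates $X_1, X_2, \ldots, X_n$ are exchangeable.”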
5.4.1. Symmetry
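For each fixed $i$, the $i$th coordinate $X_i$ is an integer between 1 and $n$. To find the marginal distribution of $X_i$, we need $P(X_i = k)$ for each $k$ in the range 1 through $n$. Since all permutations are equally likely,
$$P(X_i = k) = \frac{(n-1)!}{n!} = \frac{1}{n}$$
using a now-familiar method of putting item $k$ at coordinate $i$ and letting the other $n-1$ elements vary arbitrarily. Thus for each $i$, the coordinate $X_i$ has the uniform distribution on 1 through $n$.
For any two coordinates $i \ne j$,
$$P(X_i = k, X_j = l) = \frac{(n-2)!}{n!} = \frac{1}{n} \cdot \frac{1}{n-1}, ~~~ 1 \le k \ne l \le n$$
Once again, the probability on the right doesn’t depend on the particular $i$ and $j$ on the left.
We have seen these probabilities earlier in the context of the matching problem. In that problem we were finding probabilities of matches, for example $P(X_i = i, X_j = j)$. But the answers didn’t depend on $i$ and $j$; it just mattered that we were looking at two positions. The same is true here.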
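The symmetry claims can be checked by brute force in a small case. The sketch below assumes $n = 4$ (an arbitrary value chosen for illustration), lists all $n!$ equally likely permutations, and computes the marginal and joint probabilities exactly. The helper names `marginal` and `joint` are hypothetical, not part of any library.

```python
from fractions import Fraction
from itertools import permutations

n = 4  # small population size, chosen only for illustration
perms = list(permutations(range(1, n + 1)))  # all n! equally likely outcomes


def marginal(i, k):
    """Exact P(X_i = k); coordinates i are numbered from 1."""
    favorable = sum(1 for p in perms if p[i - 1] == k)
    return Fraction(favorable, len(perms))


def joint(i, j, k, l):
    """Exact P(X_i = k, X_j = l) for two different coordinates i and j."""
    favorable = sum(1 for p in perms if p[i - 1] == k and p[j - 1] == l)
    return Fraction(favorable, len(perms))


# Every coordinate is uniform on 1 through n.
print(all(marginal(i, k) == Fraction(1, n)
          for i in range(1, n + 1)
          for k in range(1, n + 1)))

# Every pair of coordinates has the same joint distribution.
print(all(joint(i, j, k, l) == Fraction(1, n * (n - 1))
          for i in range(1, n + 1)
          for j in range(1, n + 1) if i != j
          for k in range(1, n + 1)
          for l in range(1, n + 1) if k != l))
```

Both checks print `True`: neither the marginal nor the joint probabilities depend on which coordinates are examined.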
5.4.2. Example: A Well Shuffled Deck
Suppose a standard deck of cards is well shuffled, by which we will mean that all permutations are equally likely.
Question 1. What is the chance that the 17th card is an ace?
Answer 1. By our calculation above, the 17th card is equally likely to be any of the 52 cards. Of these, four are aces, so the chance that the 17th card is an ace is 4/52.
That’s the same as the chance that the first card is an ace, or the chance that the 32nd card is an ace. All of these unconditional marginal probabilities are equal by symmetry. If this seems mysterious, imagine the cards dealt in a circle. You can’t tell from that which is “first” and which is “17th”.
Question 2. What is the chance that the 17th card is an ace, given that the 32nd card is an ace?
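Answer 2. By our calculation of the joint distribution of $X_i$ and $X_j$ above, the answer is the same as the chance that the second card is an ace given that the first card is an ace. That’s 3/51.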
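Both answers can be checked approximately by simulation. The following is a minimal sketch, not part of the original text: it assumes the deck is represented by the labels 0 through 51, with labels 0 through 3 standing for the four aces, and it uses a fixed seed and 50,000 shuffles, both arbitrary choices.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumption: cards are the labels 0..51 and labels 0..3 are the aces.
deck = list(range(52))
trials = 50_000

ace_17 = 0         # shuffles in which the 17th card is an ace
ace_32 = 0         # shuffles in which the 32nd card is an ace
ace_17_and_32 = 0  # shuffles in which both are aces

for _ in range(trials):
    rng.shuffle(deck)
    is_ace_17 = deck[16] < 4  # position 17 is index 16
    is_ace_32 = deck[31] < 4  # position 32 is index 31
    ace_17 += is_ace_17
    ace_32 += is_ace_32
    ace_17_and_32 += is_ace_17 and is_ace_32

p_17 = ace_17 / trials
p_17_given_32 = ace_17_and_32 / ace_32

print(p_17, 4 / 52)           # estimate next to the exact answer
print(p_17_given_32, 3 / 51)  # estimate next to the exact answer
```

The estimates should land close to 4/52 and 3/51, though they vary a little with the seed and the number of shuffles.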
Quick Check
In a class of 100 students, 30 are Data Science majors. Each student submits an assignment. If the tutor grades the submissions in random order, what is the chance that the fifth assignment she grades was submitted by a Data Science major?
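Answer: 30/100. By symmetry, the fifth submission she grades is equally likely to be any of the 100 submissions, and 30 of them are from Data Science majors.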
Quick Check
A playlist consists of 12 songs, two of which are by Queen. If the playlist is shuffled randomly, what is the chance that the last two songs are by Queen?
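Answer: $\frac{2}{12} \cdot \frac{1}{11} = \frac{1}{66}$. By symmetry, this is the same as the chance that the first two songs are by Queen.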
5.4.3. Simple Random Samples
A simple random sample is a sample drawn at random without replacement from a finite population. The sample is a random subset of the population, not a rearrangement of the entire population. If you take a simple random sample of 5 cards from a standard deck of 52, then the resulting “hand” is the subset of five cards that you get. The five cards could have appeared in your hand in any sequence, but the sequence doesn’t matter. All that matters is the set of five cards.
To find the chance of getting a particular subset of five cards in your hand, you have to count the number of sequences that result in that hand.
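There are $52 \times 51 \times 50 \times 49 \times 48$ sequences of five cards.
To get the particular set of 5 in the hand, put one of them in Position 1; you can do this in 5 ways. Then put the next in Position 2; you can do this in 4 ways, and so on.
Thus the chance of a particular hand is
$$\frac{5 \times 4 \times 3 \times 2 \times 1}{52 \times 51 \times 50 \times 49 \times 48} = \frac{5! \, 47!}{52!} = \frac{1}{\binom{52}{5}}$$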
This shows that dealing 5 cards one by one at random without replacement is probabilistically equivalent to shuffling the cards and pulling out five cards.
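The arithmetic behind the chance of a particular hand can be confirmed exactly. This short sketch uses Python's standard `math` and `fractions` modules; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb, factorial, perm

sequences = perm(52, 5)   # 52 * 51 * 50 * 49 * 48 ordered deals of five cards
orderings = factorial(5)  # number of sequences that produce one particular hand

chance_of_hand = Fraction(orderings, sequences)

# The chance of a particular hand equals 1 over the number of 5-card subsets.
print(chance_of_hand == Fraction(1, comb(52, 5)))  # True
```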
The special module in scipy allows you to compute these combinatorial terms.
from scipy import special
special.comb(52, 5)
2598960.0
5.4.4. The Number of Simple Random Samples
There are almost 2.6 million five-card poker hands. That’s a lot of hands. It would be nice to have a theory that helps us work with them and with other simple random samples. In the next section we will start developing such a theory. We will end this one by counting the number of simple random samples drawn from a population.
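Suppose you have a population of size $N$ (a fixed integer, not a random variable), and you want to take a simple random sample of size $n \le N$. How many different samples can you draw?
We will assume that the “sample” is the subset of $n$ individuals, who could have appeared in any order. That’s just like the poker hands.
An analogous argument tells us that the number of different simple random samples is
$$\binom{N}{n}$$
and they are all equally likely.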
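For a small population the counting argument can be verified by listing everything. The sketch below assumes a population of size 6 and a sample of size 3 (values chosen only to keep the enumeration short) and groups the ordered draws by the subset they produce.

```python
from collections import Counter
from itertools import combinations, permutations
from math import comb, factorial

N, n = 6, 3  # small illustrative values
population = range(N)

# All possible samples, viewed as subsets of size n.
subsets = list(combinations(population, n))
print(len(subsets) == comb(N, n))  # True

# Group every ordered sequence of n draws by the subset it results in.
counts = Counter(frozenset(seq) for seq in permutations(population, n))

# Each subset arises from the same number of sequences (n! of them),
# so equally likely sequences give equally likely subsets.
print(len(counts) == comb(N, n))               # True
print(set(counts.values()) == {factorial(n)})  # True
```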
5.4.5. Counting Good Elements in a Simple Random Sample
If the population consists of two classes of individuals, the two classes are conventionally called “successes and failures” or “good and bad”. Here “good” almost invariably stands for the kind of individual you are trying to count. For example, if you are trying to count voters who support a particular candidate in an election, then that class of voters would be labeled “good” regardless of your opinion about their political beliefs.
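Suppose a population of $N$ individuals contains $G$ good individuals, and you take a simple random sample of size $n$. How many samples contain $g$ good elements?
The number of samples that contain $g$ good individuals is obtained by the product rule of counting:
Pick $g$ individuals from the $G$ good individuals in the population. You can do this in $\binom{G}{g}$ ways.
For each choice of these $g$ good individuals, there are $\binom{N-G}{n-g}$ choices of bad individuals you can make.
So the total number of samples containing $g$ good individuals is
$$\binom{G}{g}\binom{N-G}{n-g}$$
The chance of getting $g$ good individuals in the sample is
$$\frac{\binom{G}{g}\binom{N-G}{n-g}}{\binom{N}{n}}$$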
These are called hypergeometric probabilities because the formula is related to the hypergeometric series of mathematics. We won’t be dealing with that series in this course, but we can still use the impressive name. We will have a lot more to do with these probabilities later in the course.
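As an illustration, the hypergeometric formula can be written as a small function. This is a sketch under stated assumptions: the function name `hypergeom_prob` and its argument order are hypothetical, and the example (aces in a five-card hand) is chosen only for illustration.

```python
from fractions import Fraction
from math import comb


def hypergeom_prob(g, N, G, n):
    """Chance of g good elements in a simple random sample of size n
    drawn from a population of N individuals, G of which are good.

    Assumes 0 <= g <= n; math.comb returns 0 when asked to choose
    more items than are available.
    """
    return Fraction(comb(G, g) * comb(N - G, n - g), comb(N, n))


# Chance of exactly 2 aces in a 5-card hand: N = 52 cards, G = 4 aces, n = 5.
p = hypergeom_prob(2, N=52, G=4, n=5)
print(p, float(p))

# The chances of 0, 1, 2, 3, or 4 aces account for every possible hand.
total = sum(hypergeom_prob(g, N=52, G=4, n=5) for g in range(5))
print(total == 1)  # True
```

Using `Fraction` keeps the probabilities exact, so the check that they sum to 1 is not affected by rounding.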
Technical Note:
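If you are really careful, you will have started by trying to figure out which values of $g$ should be considered. Because $g$ is the number of good elements in the sample, we know $g \le \min(n, G)$. By considering the number of bad elements in the sample, we have $n - g \le \min(n, N-G)$ and so $g \ge \max(0, n-N+G)$.
But you need not worry about these technical details. Just define $\binom{a}{b}$ to be 0 if it is counting impossible choices, for example $\binom{5}{10}$ or $\binom{6}{-4}$. Then the hypergeometric formula for the chance of $g$ good elements will just work out to be 0 if it is impossible to get $g$ good elements in the sample.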
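The convention of treating impossible choices as 0 is easy to build into code. Python's `math.comb` already returns 0 when asked to choose more items than are available, but it raises `ValueError` for a negative argument, so the sketch below wraps it in a hypothetical helper `choose`. The population values are assumptions picked so that some values of $g$ are impossible.

```python
from fractions import Fraction
from math import comb


def choose(a, b):
    """Binomial coefficient, defined to be 0 for impossible choices
    (b negative, or b larger than a). Hypothetical helper."""
    if b < 0 or b > a:
        return 0
    return comb(a, b)


def hypergeom_prob(g, N, G, n):
    """Hypergeometric chance of g good elements, valid for any integer g."""
    return Fraction(choose(G, g) * choose(N - G, n - g), choose(N, n))


# Only 3 bad individuals in the population, so a sample of 5
# must contain at least 2 good ones.
N, G, n = 10, 7, 5
probs = {g: hypergeom_prob(g, N, G, n) for g in range(n + 1)}

print(probs[0], probs[1])        # both are 0
print(sum(probs.values()) == 1)  # True
```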
Quick Check
A bridge hand is 13 cards dealt from a standard deck of 52 cards, of which 4 are aces. What is the chance that there are two aces in a bridge hand?
Answer: $\binom{4}{2}\binom{48}{11} \big/ \binom{52}{13}$, which is about 0.213.