Sampling Without Replacement


Sampling without Replacement

We will continue our theme of exploring aspects of dependence, and study properties of simple random samples. That's a standard name for a very natural kind of random sampling: drawing at random without replacement from a population.

When you sample in this way, information about how the first few draws came out affects the conditional chances of how the other draws will come out – the draws are not independent of each other. However, many remarkable symmetries arise in the distributions of simple random samples. Understanding these is a key to simplifying calculations and gaining insight.

Random Permutations and Symmetry

Consider a population of $n$ individuals labeled $1, 2 \ldots , n$. A simple random sample of size $n$ is a random permutation of all the elements. Let's call this sample $(X_1, X_2, \ldots , X_n)$. For any permutation $i_1, i_2, \ldots , i_n$ of the integers 1 through $n$,

$$ P(X_1 = i_1, X_2 = i_2, \ldots, X_n = i_n) = \frac{1}{n!} $$

Notice that the right hand side doesn't depend on the particular permutation specified on the left. We say that the "coordinates $X_1, X_2, \ldots , X_n$ are exchangeable."

For each fixed $i$, the $i$th coordinate $X_i$ is an integer between 1 and $n$. To find the marginal distribution of $X_i$, we need to find $P(X_i = k)$ for each $k$ in the range 1 through $n$. Since all permutations are equally likely,

$$ P(X_i = k) = \frac{(n-1)!}{n!} = \frac{1}{n} $$

using a now-familiar method of putting item $k$ at coordinate $i$ and letting the other $n-1$ elements vary arbitrarily. Thus for each $i$, the distribution of $X_i$ is uniform on 1 through $n$.

For any two coordinates $i$ and $j$, $$ P(X_i = k, X_j = l) = \frac{1}{n} \cdot \frac{1}{n-1}, ~~ 1 \le k \ne l \le n $$ Once again, the probability on the right doesn't depend on the particular $i$ and $j$ on the left.

We have seen these probabilities earlier in the context of the matching problem. In that problem we were finding probabilities of matches, for example $P(X_i = i, X_j = j)$. But the answers didn't depend on $i$ and $j$; it just mattered that we were looking at two positions. The same is true here.

Example: A Well Shuffled Deck

Suppose a standard deck of cards is well shuffled, by which we will mean that all permutations are equally likely.

Question 1. What is the chance that the 17th card is an ace?

Answer 1. By our calculation above, the 17th card is equally likely to be any of the 52 cards. Of these, four are aces, so the chance that the 17th card is an ace is 4/52.

That's the same as the chance that the first card is an ace, or the chance that the 32nd card is an ace. All of these unconditional marginal probabilities are equal by symmetry. If this seems mysterious, imagine the cards dealt in a circle. You can't tell from that which is "first" and which is "17th".

Question 2. What is the chance that the 17th card is an ace, given that the 32nd card is an ace?

Answer 2. By our calculation of the joint distribution of $X_i$ and $X_j$ above, the answer is the same as the chance that the second card is an ace given that the first card is an ace. That's 3/51.

Example: A Five-Card Poker Hand

Often, our sample is a subset of the population, not a rearrangement of the entire population. If you take a simple random sample of 5 cards from a standard deck of 52, then the resulting "hand" is the subset of five cards that you get; the five cards could have appeared in your hand in any sequence.

To find the chance of getting a particular subset of five cards in your hand, you have to count the number of sequences that result in that hand.

  • There are $52 \times 51 \times 50 \times 49 \times 48 $ sequences of five cards.
  • To get the particular set of 5 in the hand, put one of them in Position 1; you can do this in 5 ways. Then put the next in Position 4, and so on.

Thus the chance of a particular hand is $$ \frac{5 \times 4 \times 3 \times 2 \times 1}{52 \times 51 \times 50 \times 49 \times 48} = \frac{5! 47!}{52!} = \frac{1}{\binom{52}{5}} $$

This shows that dealing 5 cards one by one at random without replacement is probabilistically equivalent to shuffling the cards and pulling out five cards.

The misc module in scipy allows you to compute these combinatorial terms.

from scipy import misc
misc.comb(52, 5)

There are almost 2.6 million five-card poker hands. That's a lot of hands. It would be nice to have a theory that helps us work with them and with other simple random samples. In the next section we will start developing such a theory. We will end this one by counting the number of simple random samples drawn from a population.

The Number of Simple Random Samples

Suppose you have a population of size $N$ (a fixed integer, not a random variable), and you want to take a simple random sample of size $n \le N$. How many different samples can you draw?

We will assume that the "sample" is the subset of $n$ individuals, who could have appeared in any sequence. That's just like the poker hands, and an analogous argument tells us that the number of different simple random samples is

$$ \binom{N}{n} $$

results matching ""

    No results matching ""