Teacher resources and professional development across the curriculum
Let's return to thinking about coin flips. If our coin is fair, the probability that the result will be heads is 1/2, and the probability that the result will be tails is the same. If we flip the coin 100 times, the Law of Large Numbers says that we should get about 50 heads and about 50 tails. Furthermore, the more times we flip the coin, the closer the proportion of heads gets to 1/2.
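A quick simulation can make this concrete. The Python sketch below flips a simulated fair coin more and more times and prints the fraction of heads. The function name `heads_fraction` and the fixed seed are illustrative choices, not part of the original text. The fraction tends to settle nearer 0.5 as the number of flips grows, although any single run still wobbles.

```python
import random

def heads_fraction(num_flips, rng):
    """Flip a simulated fair coin num_flips times and return the fraction of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(num_flips))
    return heads / num_flips

rng = random.Random(1)  # fixed seed so the run is repeatable
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, heads_fraction(n, rng))
```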
Let's now shift our thinking to consider sets of 100 coin flips. Flipping a coin 100 times is like running one marble through a 100-row version of our Galton board. Running many marbles through this system is like doing the 100-coin flip experiment many times, one for each marble. Instead of being concerned with each flip, or each left or right deflection of a marble, we are only concerned with the total result of 100 such individual events. According to the Law of Large Numbers, the more times we flip the coin, the closer our overall results will come to a 1-to-1 ratio of heads to tails.
However, if we cap each set at 100 events and run many such sets, we will find that most of them do not split exactly 50-50 between heads and tails. Some will have more heads than tails, and others more tails than heads. In principle, a set could even come out all heads or all tails, although for 100 fair flips the chance of that is so small that we should not expect ever to see it. To explain these results, we are going to need something a bit more powerful than the Law of Large Numbers.
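The following Python sketch simulates many sets of 100 flips, with each flip standing in for one left-or-right deflection of a marble on a 100-row Galton board. The helper `heads_in_set`, the 10,000 sets, and the seed are hypothetical choices made for illustration. It tallies how many heads each set produced, so you can see how the results scatter around 50 rather than landing on it every time.

```python
import random
from collections import Counter

def heads_in_set(rng, flips=100):
    """One set of 100 fair flips (one marble through a 100-row board):
    count the heads, i.e. the right-hand deflections."""
    return sum(rng.random() < 0.5 for _ in range(flips))

rng = random.Random(2)  # fixed seed so the run is repeatable
results = [heads_in_set(rng) for _ in range(10_000)]
tally = Counter(results)  # maps number of heads -> number of sets

print("fewest heads in a set:", min(results))
print("most heads in a set:", max(results))
print("sets with exactly 50 heads:", tally[50])
print("sets with all heads or all tails:", tally[0] + tally[100])
```

Typically only a small minority of the sets land on exactly 50 heads, and none come out all heads or all tails.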
What is amazing is that we can predict with a fair level of accuracy how many of these 100-flip tests should come out all heads, or all tails, or any mixture in between. In fact, the distribution of outcomes of our 100-flip tests will follow a normal distribution very closely. The guiding principle behind this reality is the Central Limit Theorem.
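To see how closely the normal curve tracks the 100-flip results, the sketch below compares the exact probability of getting k heads, C(100, k)/2^100, with the height of a normal curve at k. It assumes the standard choice of normal curve for 100 fair flips: mean 100 × 1/2 = 50 and standard deviation √(100 × 1/2 × 1/2) = 5. These values come from the usual binomial formulas rather than from the text. Multiplying either probability by the number of sets gives the expected number of sets with that many heads.

```python
import math

N_FLIPS = 100
MEAN = N_FLIPS * 0.5                  # 50 heads expected
SD = math.sqrt(N_FLIPS * 0.5 * 0.5)   # 5 heads

def exact_prob(k):
    """Exact (binomial) probability of exactly k heads in 100 fair flips."""
    return math.comb(N_FLIPS, k) / 2**N_FLIPS

def normal_approx(k):
    """Height of the normal curve with mean MEAN and standard deviation SD at k."""
    z = (k - MEAN) / SD
    return math.exp(-z * z / 2) / (SD * math.sqrt(2 * math.pi))

for k in (40, 45, 50, 55, 60, 100):
    print(f"{k:3d} heads: exact {exact_prob(k):.4g}, normal {normal_approx(k):.4g}")
```

Near the middle, the two columns agree to about three decimal places. At 100 heads, both are vanishingly small (the exact value is 1/2^100).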
Abraham De Moivre first developed the Central Limit Theorem shortly after Bernoulli's work on the Law of Large Numbers. De Moivre's work went relatively unnoticed until Pierre-Simon Laplace extended it decades later. Even so, the Central Limit Theorem did not receive much recognition until the beginning of the 20th century. It is one of the jewels of probability theory.
The Central Limit Theorem can be quite useful in making predictions about a large group of results from a small sample. For instance, in our sets of 100 coin flips, we don't actually have to do numerous rounds of 100 flips to say with a fair amount of confidence what would happen if we did. We can complete just one round of 100 flips, look at the outcome (say, 75 heads and 25 tails), and ask, "How closely does this one experiment represent the whole?" This is essentially what happens during elections when television networks conduct exit polling.
In an exit poll, voters are asked whether or not they voted for a particular candidate. If you ask 100 voters and find that 75 voted for Candidate A and 25 voted for Candidate B, how representative of the overall tally is this? The mean of this sample is 0.75, or 75% for Candidate A. To calculate it, we assign a score of 1 to each vote for Candidate A and a score of 0 to each vote for Candidate B, multiply each candidate's number of votes by that score, add the products, and divide by the total number of votes: (75 × 1 + 25 × 0) / 100 = 0.75.
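The scoring described above takes only a few lines of Python. The poll data below is the hypothetical 75-to-25 split from the text. The variable names are illustrative.

```python
# Hypothetical exit-poll data: 75 votes for Candidate A, 25 for Candidate B.
votes = ["A"] * 75 + ["B"] * 25

# Score each vote: 1 for Candidate A, 0 for Candidate B.
scores = [1 if v == "A" else 0 for v in votes]

# Sample mean: add up the scores and divide by the number of votes.
sample_mean = sum(scores) / len(scores)
print(sample_mean)  # 0.75, i.e. 75% for Candidate A
```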
Intuition tells us that it would be unwise to assume that the final tally of all the votes will exhibit exactly the same ratio as this one sampling. That would be akin to flipping a coin 100 times, getting 75 heads, and assuming that this is what would happen more or less every time. In other words, we can't assume that the mean value of this one sample of 100 voters is the same as the true mean value of the election at large. Even so, we can say something about how the mean we found in the exit poll relates to the true mean.
We can use the Central Limit Theorem to conclude that the means of all possible 100-voter samples will be approximately normally distributed, and, therefore, the 68-95-99.7 rule applies. Recall that this rule says that about 68% of sample means will fall within one standard deviation of the true mean (the actual vote breakdown of the whole election), about 95% within two standard deviations, and about 99.7% within three. However, this rule is useful only if we know the standard deviation and the true mean, and if we knew the true mean, why would we need to conduct an exit poll in the first place?
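The three percentages in the 68-95-99.7 rule can be recovered from the normal curve itself. For a normal distribution, the fraction lying within k standard deviations of the mean is erf(k/√2). The small Python check below uses the standard library's `math.erf`. The function name `fraction_within` is just an illustrative label.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k):.4f}")
```

The printed values round to roughly 68%, 95%, and 99.7%.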
To find an approximation of the standard deviation, we must first find the variance. Recall from the previous section that the variance is based on the difference between each person's vote score and the mean. Because the only possible votes are A or B, and A is assigned a score of 1 whereas B gets a score of 0, there are only two possible differences: "1 minus the mean," which corresponds to the people who voted for A, and "0 minus the mean" (that is, minus the mean), which corresponds to the people who voted for B. The total number of voters multiplied by the mean is the number of voters who voted for A, and the total number of voters multiplied by "1 minus the mean" is the number of voters who voted for B. To find the variance, we square each difference, multiply it by the number of voters it applies to, add the results, and divide by the total number of votes. If the total number of votes is V, then the variance is:
Var = [V × mean × (1 – mean)² + V × (1 – mean) × mean²] / V
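The Vs cancel out. Factoring mean (1 – mean) out of both remaining terms leaves mean (1 – mean) × [(1 – mean) + mean], and the bracketed sum is 1, so we find: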
Var = mean (1 – mean)
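As a sanity check on the algebra, the Python sketch below computes the variance of 0/1 vote scores directly, as the average squared difference from the mean, and compares it with the shortcut mean (1 – mean). The polls are hypothetical ones with V = 100 voters, and the function names are illustrative.

```python
def variance_direct(scores):
    """Average of the squared differences between each score and the mean."""
    mean = sum(scores) / len(scores)
    return sum((s - mean) ** 2 for s in scores) / len(scores)

def variance_shortcut(mean):
    """The simplified formula Var = mean * (1 - mean) for 0/1 scores."""
    return mean * (1 - mean)

# Hypothetical polls of V = 100 voters with 0, 25, 50, 75, 100 votes for A.
V = 100
for votes_for_a in range(0, V + 1, 25):
    scores = [1] * votes_for_a + [0] * (V - votes_for_a)
    mean = votes_for_a / V
    print(votes_for_a, variance_direct(scores), variance_shortcut(mean))
```

The two columns match (up to floating-point rounding), and both peak at a 50-50 split.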
The standard deviation is thus √(mean (1 – mean)).
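The formula √(mean (1 – mean)) measures the spread of a single vote score. The means of 100-voter polls spread out much less, by a factor of √100 = 10 (a standard result, not stated in the text). The Python simulation below checks this. It assumes a hypothetical true share of 0.75 for Candidate A, which in a real election would be unknown. It then runs 5,000 simulated polls and compares the observed spread of their means with the predicted one.

```python
import math
import random
import statistics

TRUE_SHARE_A = 0.75   # hypothetical true proportion of A voters (unknown in practice)
SAMPLE_SIZE = 100     # voters per exit poll

def one_poll_mean(rng):
    """Mean score (fraction voting A) in one simulated 100-voter exit poll."""
    return sum(rng.random() < TRUE_SHARE_A for _ in range(SAMPLE_SIZE)) / SAMPLE_SIZE

rng = random.Random(3)  # fixed seed so the run is repeatable
poll_means = [one_poll_mean(rng) for _ in range(5_000)]

sd_one_vote = math.sqrt(TRUE_SHARE_A * (1 - TRUE_SHARE_A))  # about 0.433
sd_of_means = sd_one_vote / math.sqrt(SAMPLE_SIZE)          # about 0.0433

print(f"SD of a single vote:        {sd_one_vote:.4f}")
print(f"predicted SD of poll means: {sd_of_means:.4f}")
print(f"observed SD of poll means:  {statistics.pstdev(poll_means):.4f}")
```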
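The mean we are interested in here is the true mean, but so far we have only a sample mean. Luckily, a sample mean and the true mean usually give standard deviations that are quite close to one another, so we can compute the standard deviation from the sample mean and use it to estimate approximately where the true mean lies. One caution: the 68-95-99.7 rule describes the spread of sample means, which is narrower than the spread of individual votes. For samples of 100 voters, the standard deviation of the sample means is the single-vote standard deviation divided by √100 = 10, that is, √(mean (1 – mean) / 100).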
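Applied to the exit poll from the text (75 of 100 voters for Candidate A), this approach gives a range of plausible values for the true mean. The Python sketch below plugs the sample mean into √(mean (1 – mean) / n) and steps out one, two, and three standard deviations on either side. The helper `poll_range` is a hypothetical name, and the ranges are only as good as the approximations described above.

```python
import math

def poll_range(sample_mean, sample_size, num_sds):
    """Range num_sds standard deviations either side of the sample mean,
    using the sample mean in place of the unknown true mean."""
    sd_of_means = math.sqrt(sample_mean * (1 - sample_mean) / sample_size)
    return sample_mean - num_sds * sd_of_means, sample_mean + num_sds * sd_of_means

# The exit poll from the text: 75 of 100 voters chose Candidate A.
for k in (1, 2, 3):
    low, high = poll_range(0.75, 100, k)
    print(f"{k} SD: {low:.3f} to {high:.3f}")
```

At two standard deviations, the range runs from roughly 0.66 to 0.84. By the 68-95-99.7 rule, ranges built this way would capture the true mean about 95% of the time.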
We have now seen how probability theory can be used to make powerful predictions about certain situations. Up to this point, however, we have been chiefly concerned with simple, idealized examples such as coin tosses, the rolling of dice, and quincunx machines. Let's now turn our attention to probabilities that are more in line with what happens in the real world.