
Learning Math: Data Analysis, Statistics, and Probability

Statistics As Problem Solving Part D: Bias in Sampling (20 minutes)

In This Part: Population and Sample

In data analysis, we use graphs, tables, and numerical summaries to study the variation present in our data. Often, we want to extend our interpretation to a larger group beyond the particular group studied. Such generalizations are only valid, however, if the data we examine are representative of that larger group. If not, our interpretation may misrepresent the larger group! See Note 4.

The entire group that we want information about is called the population. We can gain information about this group by examining a portion of the population, called a sample.

To gain useful information, the sample must be representative of the population. A representative sample is one in which the relevant characteristics of the sample members are generally the same as the characteristics of the population.

There are several good reasons to use samples to study populations; chief among them are feasibility and cost. For instance, in a nationwide political survey of the population of all voters in the United States, it would be difficult, if not impossible, to poll every voter, and it would also be quite expensive. Statistical theory shows that a survey of 1,000 carefully selected voters is enough to represent, with reasonable accuracy, the opinions of the millions of people in the population of voters.
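
To put a number on "enough," a common approximation for a poll that reports a proportion is the 95% margin of error, 1.96 × √(p(1 − p)/n). The sketch below assumes a simple random sample (the method described under Random Sampling) drawn from a population much larger than the sample, and it uses p = 0.5, the most conservative case. The function name margin_of_error is only an illustrative choice.

```python
import math

def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a sample proportion.

    Assumes a simple random sample of size n from a population much larger
    than n; p = 0.5 gives the widest (most conservative) margin.
    """
    return z * math.sqrt(p * (1 - p) / n)

for n in (100, 1_000, 10_000):
    print(f"n = {n:>6}: about ±{100 * margin_of_error(n):.1f} percentage points")
```

Under these assumptions, 1,000 respondents give a margin of roughly ±3 percentage points. The population size does not appear in the formula, which is why about the same sample size can serve for a city or a whole nation.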

Another problem in answering questions about a population arises when we want to inspect or test products. For example, testing an airbag to see if it works properly means that we have to destroy it. We certainly can't test every airbag, but testing a carefully selected sample of airbags lets us draw reliable conclusions about all the airbags in the population.


Problem D1

Think of a statistical question and a population. How could you determine a representative sample of that population? What would be a sample that is not representative?

A population might be the students at a certain school, the members of the Republican party, or all the soda cans shipped to the nearest convenience store this year. A representative sample should reflect the characteristics of the population that are relevant to the question being asked.


How we select a sample is extremely important. Improper or biased sample selection can produce misleading conclusions. Sample selection is biased if it systematically favors certain outcomes. If we select only Democrats to participate in a political survey, the outcome will reflect Democrats' opinions but not the opinions of members of other parties. If we personally select a sample of students we know and like for a school survey, we have just eliminated the differing opinions of students we don't know or don't like. We need to select our sample in an unbiased fashion.


In This Part: Random Sampling
Random sampling is a way to remove bias in sample selection. For example, to pick a random sample of 20 people out of a population of 1,000, you might put all 1,000 names in a hat, then draw 20 of them. Random sampling reduces bias in sample selection because every member of the population has an equal chance of being selected. See Note 5.
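
The names-in-a-hat procedure can be imitated on a computer. In the minimal sketch below, the population of 1,000 placeholder names is hypothetical, and Python's random.sample plays the role of the hat by drawing without replacement. The helper selection_rate, also an illustrative name, repeats the draw many times to check the "equal chance" property: each name should turn up in about 20 out of every 1,000 draws.

```python
import random

# Hypothetical population: 1,000 placeholder names standing in for real people.
population = [f"person_{i:04d}" for i in range(1, 1001)]

rng = random.Random(2001)  # fixed seed so the draw can be reproduced

# Draw 20 distinct names without replacement, like pulling slips from a hat.
sample = rng.sample(population, k=20)
print(sample)

def selection_rate(name, k=20, trials=10_000):
    """Fraction of repeated draws of size k that include the given name."""
    hits = sum(name in rng.sample(population, k=k) for _ in range(trials))
    return hits / trials

# Every name should be picked about k / N = 20 / 1,000 = 2% of the time.
print(selection_rate("person_0001"), selection_rate("person_1000"))
```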

In this Interactive Activity, you will have the opportunity to see if you can personally select a sample that is representative of a particular population.

Here are 60 circles. Can you select five circles that best represent the size of all the circles? (The average diameter of your five circles should be as close as possible to the average diameter of all 60 circles.)


Look at the picture for no longer than 20 seconds, and mark the five circles you choose. Then use the scale on the picture to measure the diameters of those five circles, and find the average diameter of your sample.

The average diameter of all 60 circles is 1 unit. How close to that is your sample?

(Note that a computer, selecting any five of the 60 circles randomly, might generate average diameters ranging from as small as 0.5 units to as large as 2.2 units.)
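
A short simulation shows what random selection does in the long run. The sketch below does not use the activity's actual picture: the 60 diameters are invented (many small circles and a few large ones, arranged so the mean is exactly 1 unit). It compares simple random samples of five with a toy "biased eye" that favors circles in proportion to their diameter. That second selector is an assumption chosen to mimic the tendency described in Note 5, not a model of how people actually choose.

```python
import random
import statistics

# Hypothetical population of 60 circle diameters (NOT the activity's real picture):
# many small circles and a few large ones, chosen so the mean is exactly 1 unit.
diameters = [0.5] * 32 + [1.0] * 16 + [2.0] * 8 + [3.0] * 4
assert len(diameters) == 60 and statistics.mean(diameters) == 1.0

rng = random.Random(5)

def random_sample_mean(k=5):
    """Average diameter of k circles chosen by simple random sampling."""
    return statistics.mean(rng.sample(diameters, k))

def size_biased_sample_mean(k=5):
    """Toy biased selector: each pick favors circles in proportion to diameter."""
    remaining = list(diameters)
    chosen = []
    for _ in range(k):
        pick = rng.choices(range(len(remaining)), weights=remaining)[0]
        chosen.append(remaining.pop(pick))  # no circle can be picked twice
    return statistics.mean(chosen)

trials = 10_000
random_means = [random_sample_mean() for _ in range(trials)]
biased_means = [size_biased_sample_mean() for _ in range(trials)]

print(f"random:      average {statistics.mean(random_means):.2f}, "
      f"range {min(random_means):.1f} to {max(random_means):.1f}")
print(f"size-biased: average {statistics.mean(biased_means):.2f}")
```

Any single random sample of five can miss badly in either direction, but across many samples the random averages center on the true mean of 1 unit, while the size-favoring selector lands consistently above it. That consistent miss is what "bias" means.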


Problem D2
Can you think of any circumstances in which it would be difficult or impossible to select a simple random sample?


You may have noticed that each of the problems you looked at in this session began with a question. Providing answers to questions like these is the goal of statistics. But sometimes, the variation in our data makes it difficult to answer statistical questions.

In order to identify any patterns present in the variation, we must analyze our data by organizing and summarizing it. Once this analysis is complete, we can interpret the data to answer our questions. In later sessions, we will look at the analysis and interpretation components in more detail.

It’s also important to remember that when you conduct a statistical investigation, the question you pose is designed to investigate a group (“the population”). The results of an investigation involving a sample are frequently used to draw conclusions about the entire population. If an attempt is made to include every individual from the population in a sample, then the investigation is called a census.


Problem D3
Why is a census still considered a sample?

Notes

Note 4
A voter poll taken during the 1936 presidential election provides a good example of the danger of biased sampling. The magazine Literary Digest mailed a survey to 10 million Americans to find out how they would vote in the upcoming election between Democrat Franklin Roosevelt and Republican Alf Landon. More than two million Americans responded, and about 57% of them supported Landon. The magazine published these findings and predicted that Landon would win.

Despite the poll, Roosevelt defeated Landon in one of the largest landslides in U.S. presidential history. What went wrong? The Literary Digest sample, drawn from magazine subscription lists, lists of car owners, and telephone directories, was not representative. In 1936, not all Americans owned cars, had telephones, or subscribed to magazines, and Democrats were much less likely than Republicans to own a car or have a telephone, so they were less likely to be included in the sample. As a result, the poll failed to predict the outcome of the election.
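
The effect of a biased list can be simulated. Every rate in the sketch below is invented for illustration and is not a historical 1936 figure: a hypothetical electorate in which 60% favor the Democrat, but Democrats are less likely than Republicans to appear on the phone, car, and subscriber lists. Drawing at random *from the lists* still inherits their bias, while a simple random sample of the whole electorate does not.

```python
import random

rng = random.Random(1936)

# Hypothetical electorate; these rates are invented for illustration only,
# not historical 1936 figures.
N = 100_000
SHARE_DEM = 0.60          # true share of voters supporting the Democrat
P_LISTED_IF_DEM = 0.25    # chance a Democrat appears on phone/car/subscriber lists
P_LISTED_IF_REP = 0.55    # chance a Republican appears on those lists

voters = []  # each voter is a tuple (is_dem, is_listed)
for _ in range(N):
    is_dem = rng.random() < SHARE_DEM
    p_listed = P_LISTED_IF_DEM if is_dem else P_LISTED_IF_REP
    voters.append((is_dem, rng.random() < p_listed))

def dem_share(group):
    """Fraction of a group of voters who support the Democrat."""
    return sum(is_dem for is_dem, _ in group) / len(group)

listed = [v for v in voters if v[1]]        # the biased sampling frame
frame_sample = rng.sample(listed, 2_000)    # random draw, but only from the lists
random_sample = rng.sample(voters, 2_000)   # random draw from the whole electorate

print(f"true Democratic share:      {dem_share(voters):.2f}")
print(f"sample from the lists only: {dem_share(frame_sample):.2f}")
print(f"simple random sample:       {dem_share(random_sample):.2f}")
```

With these made-up rates, the list-based sample shows the Democrat losing even though 60% of the electorate supports him. Drawing more names from the same lists would not fix this; only changing who can be selected would.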

Note 5
Good sampling practices rely on some form of random selection in order to remove the bias caused by human involvement in the selection process. The Interactive Activity in Part D is intended to demonstrate how human selection can produce biased results. You are asked to select a sample of five circles from a population of 60 circles in order to estimate the size of the circles in the entire population. You then compare the accuracy of your sample with the accuracy of a random sample. A bias should appear: most people pick a sample that greatly overestimates the size of the circles.

Solutions

Problem D1

One such question is "Are girls better math students than boys?" Consider this question for the population of a certain school. A representative sample would be a selection of girls across grade and ability levels and a selection of boys across grade and ability levels. An unrepresentative sample might include only one grade level or one ability level. Comparing only the girls and boys in the most challenging math course at the school would rely on a very unrepresentative sample.
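
Selecting "across grade levels" can be done by dividing the population into groups and drawing randomly within each group, a technique usually called stratified sampling (the solution above does not name it). In the sketch below the roster is invented, the helper stratified_sample is a hypothetical name, and only grade and gender are used as groups; an ability-level field could be added to the key in the same way.

```python
import random
from collections import Counter, defaultdict

rng = random.Random(88)

# Hypothetical roster: 50 girls and 50 boys in each of grades 9-12 (invented data).
roster = [
    {"id": f"{gender}-{grade}-{k:02d}", "gender": gender, "grade": grade}
    for grade in (9, 10, 11, 12)
    for gender in ("girl", "boy")
    for k in range(50)
]

def stratified_sample(students, per_group, key):
    """Randomly draw per_group students from every group (stratum) defined by key."""
    groups = defaultdict(list)
    for student in students:
        groups[key(student)].append(student)
    sample = []
    for members in groups.values():
        sample.extend(rng.sample(members, per_group))
    return sample

sample = stratified_sample(roster, per_group=5, key=lambda s: (s["gender"], s["grade"]))
print(len(sample))  # 8 groups x 5 students = 40
print(Counter((s["gender"], s["grade"]) for s in sample))
```

Taking the same number from each group works here because every group in this invented roster has the same size; when groups differ in size, a common choice is to sample each group in proportion to its share of the population.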

Problem D2


It is difficult to select a simple random sample if a complete list of the population is not available. It would be extremely difficult to select a simple random sample of the world's ant population, for instance, since it would be impractical (if not impossible) to obtain enough information about the population to set up the random sample.

Problem D3

A census is still considered a sample because there is no guarantee that the attempt to include everyone has succeeded. For example, the U.S. census, taken every 10 years, misses between 1% and 3% of the individuals in the population and accidentally counts some people more than once. For all but the smallest populations, a complete and error-free census is practically impossible.


Credits

Produced by WGBH Educational Foundation. 2001.
  • ISBN: 1-57680-481-X
