## Learning Math: Data Analysis, Statistics, and Probability

# Random Sampling and Estimation Part A: Random Samples (15 minutes)

**In This Part: Counting Penguins**

Statisticians often use a random sample to estimate characteristics of a population when the population is very large and they cannot obtain data on every individual in the population. Statistical estimation asks the fundamental questions “What can I say about a whole population based on information from a random sample of that population?” and “To what degree can I say that my estimate is accurate?” Let’s put random sampling into action to answer a question about demographics: “How many penguins are there on a particular ice floe in the Antarctic?”

Counting a penguin population can be tricky. Penguins tend to move around and swim off, and it’s cold! So scientists use aerial photographs and statistical sampling to estimate population size. Some of the techniques they use are quite sophisticated, but we can look at a simplified version of their approach to examine the basic ideas of random sampling and estimation.

Imagine a large, snow-covered, square region of the Antarctic that is inhabited by penguins. From above, it would look like a white square sprinkled with black dots:

If you had access to such an aerial view, you could count the dots to determine the number of penguins in this region. But suppose the region was too large to see in one photo. You might instead take 100 photographs of the 100 smaller square sub-regions, count the penguins in each sub-region, and total these to obtain a count for the entire region.

However, this might take too long and be too expensive. So here’s another alternative: You can select a representative sample of the sub-regions, obtain photos of only these, and use the counts from these sub-regions to estimate the total number of penguins in the entire region. See Note 2 below.

**In This Part: Making Estimates**

A possible sample might look like the one below. Let’s explore how we might use the information in this sample to estimate the total number of penguins in the entire region.

**Problem A1**

Suppose you had access to three samples: one with a single photo of one of the 100 sub-regions, one with photos of two sub-regions, and one with photos of three sub-regions. Use the results from each of these samples (pictured below) to make an estimate of the total number of penguins in the entire region (i.e., all 100 sub-regions).

Record your counts and estimates in this table:

In Samples B and C, you will need to use the sample results to make a “best guess” for the number of penguins in the entire region. What methods have you learned for coming up with such a guess?

In Problem A1, you may have determined a general rule for estimating the number of penguins in the entire population. One useful method is to find the mean of the counts in the sample and then multiply the mean by 100 (the number of sub-regions). See Note 3 below.
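The mean-then-scale rule can be sketched in a few lines of Python. The function name `estimate_total` and its parameters are illustrative choices, not part of the course materials; the counts in the example are those of Sample C as reported in the Solutions section.

```python
from statistics import mean


def estimate_total(sample_counts, num_subregions=100):
    """Estimate a population total from counts in sampled sub-regions.

    Hypothetical helper: multiplies the sample mean by the number of
    sub-regions that make up the whole region.
    """
    if not sample_counts:
        raise ValueError("need at least one sampled sub-region")
    return mean(sample_counts) * num_subregions


# Sample C from the Solutions section: sub-regions with 5, 6, and 3 penguins
estimate_c = estimate_total([5, 6, 3])
print(round(estimate_c))  # estimate rounded to the nearest penguin
```

With a sample of one sub-region, the mean is just that sub-region's count, so the same function covers the "multiply by 100" idea as a special case.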

**Problem A2**

Below is a sample of 10 sub-regions. Based on the number of penguins in this sample, make an estimate of the number of penguins in the entire region:

In making estimates by sampling, there is a balancing act in selecting the sample size. A larger sample may cost more money or be more difficult to obtain, but it should provide a more accurate estimate of the population characteristic you are studying. On the other hand, an estimate based on a sample that is too small may not be accurate enough for you to be confident in your results.
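One way to see this trade-off is to simulate it. The sketch below invents a population of 100 sub-regions (the counts are randomly generated, not taken from the course's photographs), draws many random samples of several sizes, and measures how widely the resulting estimates are spread. The function name `simulate_estimates` and the choice of sample sizes are assumptions made for illustration.

```python
import random
from statistics import mean, pstdev


def simulate_estimates(population, sample_size, trials, rng):
    """Draw repeated simple random samples; return the estimated totals."""
    n_sub = len(population)
    return [
        mean(rng.sample(population, sample_size)) * n_sub
        for _ in range(trials)
    ]


rng = random.Random(0)  # fixed seed so the run is repeatable

# Invented population: 100 sub-regions, each holding 0 to 9 penguins
population = [rng.randint(0, 9) for _ in range(100)]
true_total = sum(population)

spread = {}
for size in (1, 10, 50):
    estimates = simulate_estimates(population, size, trials=2000, rng=rng)
    spread[size] = pstdev(estimates)  # standard deviation of the estimates
    print(f"sample size {size:>2}: spread of estimates {spread[size]:.0f} "
          f"(true total {true_total})")
```

Under these assumptions, the spread shrinks as the sample size grows, which is the sense in which a larger sample gives a more accurate estimate.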

### Notes

**Note 2**

Working with a spatial representation of a population offers several advantages for the introduction of sampling ideas. You can picture the population, and, more importantly, you can view samples in relation to the population.

You are asked to think about how you might use the information in a sample to estimate the total number of penguins in the entire region. Some textbook presentations of sampling and estimation skip this question; the text gives the definition of the estimator and proceeds from there. It is important to understand that the estimator is a human invention, and that you can choose your own method for estimating a total number based on a sample.

**Note 3**

Some people will come up with a workable idea for estimating the total number of penguins immediately, while others may need some direction. It helps to start with a sample of one sub-region, as directed in this part. Some people will suggest multiplying the number of dots in the one sub-region by 100. Next, a sample of two and then three sub-regions can evolve into the idea of averaging the number of dots in the sub-regions before multiplying by 100.

### Solutions

**Problem A1**

**Sample A:**

This sample of one sub-region shows five penguins:

Based on this limited information, you might guess that each and every sub-region contains five penguins. Since there are 100 sub-regions, your estimate of the total number of penguins would be 100 x 5 = 500.

**Sample B:**

This sample of two sub-regions contains 5 + 6 = 11 penguins, or an average of 11/2 penguins per sub-region:

Based on this limited information, you might guess that the average for all 100 sub-regions is 11/2 penguins. Since there are 100 sub-regions, your estimate of the total number of penguins would be 100 x (11/2) = 550.

**Sample C:**

This sample of three sub-regions contains 5 + 6 + 3 = 14 penguins, or an average of 14/3 penguins per sub-region:

Based on this limited information, you might guess that the average for all 100 sub-regions is 14/3 penguins. Since there are 100 sub-regions, your estimate of the total number of penguins would be 100 x (14/3) = 1,400/3, or, to the nearest penguin, 467 penguins.
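The three calculations above follow the same pattern, so they can be checked together. This is a small sketch using Python's `fractions` module to keep the means exact; the dictionary layout and variable names are illustrative only.

```python
from fractions import Fraction

NUM_SUBREGIONS = 100

# Penguin counts per sampled sub-region, as given in the solutions above
samples = {"A": [5], "B": [5, 6], "C": [5, 6, 3]}

estimates = {}
for name, counts in samples.items():
    avg = Fraction(sum(counts), len(counts))  # exact mean per sub-region
    estimates[name] = avg * NUM_SUBREGIONS    # scale up to all sub-regions
    print(f"Sample {name}: mean {avg}, estimate {estimates[name]}, "
          f"rounded {round(estimates[name])}")
```

Keeping the mean as a fraction avoids rounding until the final step, which matches how Sample C is handled above (1,400/3 rounded to the nearest penguin).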

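Here is the completed table:

| Sample | Sub-regions in sample | Penguins counted | Mean per sub-region | Estimated total |
|---|---|---|---|---|
| A | 1 | 5 | 5 | 500 |
| B | 2 | 11 | 11/2 | 550 |
| C | 3 | 14 | 14/3 | 467 |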

**Problem A2**

First, find the average number of penguins in each sub-region of the sample. The total number of penguins is 5 + 6 + 6 + 7 + 5 + 2 + 1 + 5 + 5 + 3 = 45. Since there are 10 sub-regions in the sample, the average number of penguins per sub-region is 45/10 = 4.5. Therefore, a good estimate for the total number of penguins is 100 x 45/10 = 450 penguins.
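The arithmetic can be verified with a short script. It also shows an equivalent way to view the estimate: the sample covers 10 of the 100 sub-regions, so scaling the sample total by 100/10 gives the same answer as multiplying the mean by 100. Variable names here are illustrative.

```python
# Penguin counts in the 10 sampled sub-regions, from the solution above
counts = [5, 6, 6, 7, 5, 2, 1, 5, 5, 3]
num_subregions = 100

sample_total = sum(counts)
sample_mean = sample_total / len(counts)

# Method 1: mean per sub-region times the number of sub-regions
estimate_from_mean = sample_mean * num_subregions

# Method 2: scale the sample total by the inverse of the fraction sampled
estimate_from_scaling = sample_total * num_subregions / len(counts)

print(sample_total, sample_mean, estimate_from_mean, estimate_from_scaling)
```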