Learning Math: Data Analysis, Statistics, and Probability
Random Sampling and Estimation Part A: Random Samples (15 minutes)
In This Part: Counting Penguins
Statisticians often use a random sample to estimate characteristics of a population when the population is very large and they cannot obtain data on every individual in the population. Statistical estimation asks the fundamental questions “What can I say about a whole population based on information from a random sample of that population?” and “To what degree can I say that my estimate is accurate?” Let’s put random sampling into action to answer a question about demographics: “How many penguins are there on a particular ice floe in the Antarctic?”
Counting a penguin population can be tricky. Penguins tend to move around and swim off, and it’s cold! So scientists use aerial photographs and statistical sampling to estimate population size. Some of the techniques they use are quite sophisticated, but we can look at a simplified version of their approach to examine the basic ideas of random sampling and estimation.
Imagine a large, snow-covered, square region of the Antarctic that is inhabited by penguins. From above, it would look like a white square sprinkled with black dots:
If you had access to such an aerial view, you could count the dots to determine the number of penguins in this region. But suppose the region was too large to see in one photo. You might instead take 100 photographs of the 100 smaller square sub-regions, count the penguins in each sub-region, and total these to obtain a count for the entire region.
However, this might take too long and be too expensive. So here’s another alternative: You can select a representative sample of the sub-regions, obtain photos of only these, and use the counts from these sub-regions to estimate the total number of penguins in the entire region. See Note 2 below.
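One way to choose the sub-regions so that each is equally likely to be picked is simple random sampling without replacement. The sketch below assumes the 100 sub-regions are numbered 0 through 99, a labeling the text does not give; the function name and the seed are illustrative only.

```python
import random

def choose_subregions(sample_size, num_subregions=100, seed=None):
    """Return a sorted list of distinct sub-region labels chosen at random."""
    rng = random.Random(seed)  # a fixed seed makes the selection reproducible
    # random.sample draws without replacement, so no sub-region is picked twice
    return sorted(rng.sample(range(num_subregions), sample_size))

# Hypothetical usage: pick 10 of the 100 sub-regions to photograph
chosen = choose_subregions(10, seed=1)
print(chosen)
```

Only the sub-regions in `chosen` would then be photographed and counted.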
In This Part: Making Estimates
A possible sample might look like the one below. Let’s explore how we might use the information in this sample to estimate the total number of penguins in the entire region.
Suppose you had access to three samples: one with a single photo of one of the 100 sub-regions, one with photos of two sub-regions, and one with photos of three sub-regions. Use the results from each of these samples (pictured below) to make an estimate of the total number of penguins in the entire region (i.e., all 100 sub-regions).
Record your counts and estimates in this table:
In Samples B and C, you will need to use the sample results to make a “best guess” for the number of penguins in the entire region. What methods have you learned for coming up with such a guess?
In Problem A1, you may have determined a general rule for estimating the number of penguins in the entire population. One useful method is to find the mean of the counts in the sample and then multiply the mean by 100 (the number of sub-regions). See Note 3 below.
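The rule above (sample mean times the number of sub-regions) can be sketched as a small function. The function name and the example counts are hypothetical and are not taken from the course's samples.

```python
def estimate_total(sample_counts, num_subregions=100):
    """Estimate the population total as (sample mean) x (number of sub-regions)."""
    if not sample_counts:
        raise ValueError("need at least one sub-region count")
    sample_mean = sum(sample_counts) / len(sample_counts)
    return sample_mean * num_subregions

# Hypothetical counts from four photographed sub-regions
print(estimate_total([4, 7, 2, 6]))  # mean 4.75 -> 475.0
```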
Below is a sample of 10 sub-regions. Based on the number of penguins in this sample, make an estimate of the number of penguins in the entire region:
In making estimates by sampling, there is a balancing act in selecting the sample size. A larger sample may cost more money or be more difficult to obtain, but it tends to provide a more accurate estimate of the population characteristic you are studying. On the other hand, an estimate based on a sample that is too small may not be accurate enough for you to have confidence in your results.
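The trade-off can be illustrated with a small simulation. This is a sketch under stated assumptions: the population of 100 sub-region counts is made up (each count drawn between 0 and 9), and the spread of the estimates is measured with a standard deviation, a summary this part of the course does not itself use.

```python
import random
import statistics

rng = random.Random(0)
# Hypothetical population: made-up penguin counts for 100 sub-regions
population = [rng.randint(0, 9) for _ in range(100)]
true_total = sum(population)

def spread_of_estimates(sample_size, trials=2000):
    """Standard deviation of the estimated totals over repeated random samples."""
    estimates = []
    for _ in range(trials):
        sample = rng.sample(population, sample_size)  # without replacement
        estimates.append(100 * sum(sample) / sample_size)
    return statistics.pstdev(estimates)

spreads = {n: spread_of_estimates(n) for n in (1, 5, 25)}
print(f"true total: {true_total}")
for n, s in spreads.items():
    print(f"sample size {n:2d}: spread of estimates is about {s:.0f}")
```

In runs like this one, the estimates from larger samples cluster more tightly around the true total, which is the sense in which a larger sample is "more accurate."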
Working with a spatial representation of a population offers several advantages for the introduction of sampling ideas. You can picture the population, and, more importantly, you can view samples in relation to the population.
You are asked to think about how you might use the information in a sample to estimate the total number of penguins in the entire region. Some textbook presentations of sampling and estimation skip this question; the text gives the definition of the estimator and proceeds from there. It is important to understand that the estimator is a human invention, and that you can choose your own method for estimating a total number based on a sample.
Some people will come up with a workable idea for estimating the total number of penguins immediately, while others may need some direction. It helps to start with a sample of one sub-region, as directed in this part. Some people will suggest multiplying the number of dots in the one sub-region by 100. Next, a sample of two and then three sub-regions can evolve into the idea of averaging the number of dots in the sub-regions before multiplying by 100.
This sample of one sub-region shows five penguins:
Based on this limited information, you might guess that each and every sub-region contains five penguins. Since there are 100 sub-regions, your estimate of the total number of penguins would be 100 x 5 = 500.
This sample of two sub-regions contains 5 + 6 = 11 penguins, or an average of 11/2 penguins per sub-region:
Based on this limited information, you might guess that the average for all 100 sub-regions is 11/2 penguins. Since there are 100 sub-regions, your estimate of the total number of penguins would be 100 x (11/2) = 550.
This sample of three sub-regions contains 5 + 6 + 3 = 14 penguins, or an average of 14/3 penguins per sub-region:
Based on this limited information, you might guess that the average for all 100 sub-regions is 14/3 penguins. Since there are 100 sub-regions, your estimate of the total number of penguins would be 100 x (14/3) = 1,400/3, or, to the nearest penguin, 467 penguins.
Here is the completed table:
Sample   Sub-regions in sample   Penguins counted   Estimated total
A        1                       5                  500
B        2                       11                 550
C        3                       14                 467
First, find the average number of penguins in each sub-region of the sample. The total number of penguins in the sample is 5 + 6 + 6 + 7 + 5 + 2 + 1 + 5 + 5 + 3 = 45. Since there are 10 sub-regions in the sample, the average number of penguins per sub-region is 45/10. Therefore, a good estimate for the total number of penguins is 100 x (45/10) = 450 penguins.
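The arithmetic in these solutions can be checked with exact fractions, rounding only at the end. This is a sketch; the dictionary labels are illustrative, and the counts are the ones listed in the solutions above.

```python
from fractions import Fraction

NUM_SUBREGIONS = 100

samples = {
    "1 sub-region": [5],
    "2 sub-regions": [5, 6],
    "3 sub-regions": [5, 6, 3],
    "10 sub-regions": [5, 6, 6, 7, 5, 2, 1, 5, 5, 3],
}

estimates = {}
for label, counts in samples.items():
    mean = Fraction(sum(counts), len(counts))        # exact mean, e.g. 14/3
    estimates[label] = round(NUM_SUBREGIONS * mean)  # nearest whole penguin
    print(f"{label}: mean = {mean}, estimate = {estimates[label]}")
```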