Learning Math: Data Analysis, Statistics, and Probability
Describing Distributions Part D: Ordering Hats (35 minutes)
In This Part: Understanding the Question
In Parts A-C of this session, you learned several ways to organize numerical data by forming groups. Grouping is especially useful for wide-ranging data or data measured on the number line.
In the following activity, you will apply several of the methods you have learned for grouping data to solve a problem about how many hats to order. Hats are made in a variety of styles and sizes. A merchant must decide what styles to keep in stock and how many of each size to order. At our theoretical hat shop, a unisex “Standard Fit” hat is available in the following sizes:
Are certain hat sizes more common than others? If not, then an equal number of each hat size can be ordered. But if certain sizes are more common, the merchant needs to order larger quantities of the more common sizes.
Before you begin, make an initial guess about whether you expect all hat sizes to be equally common. Explain your answer. If you think some hat sizes will be more common than others, which hat size would you expect to be the most common, and why?
In this video segment, the hat-ordering problem is introduced, and participants describe their initial expectations for a hat-size distribution. Watch this segment after completing Problem D1. How could a hat merchant determine which hat sizes are most common?
You can find this segment on the session video approximately 7 minutes and 46 seconds after the Annenberg Media logo.
Hat size is clearly determined by head size. Several possible measurements of the human head might be used to describe head size. Mail-order catalogs ask you to measure your head circumference and then determine your hat size from a table of circumferences they provide. To find your head circumference, measure the largest part of your head by placing the measuring tape just above your eyebrows.
What size Standard Fit hat would you wear?
In This Part: Data Analysis Using a Stem and Leaf Plot
Let’s plan an order for 1,000 Standard Fit hats.
1. Ask a Question
How large are people’s heads? Are some head sizes more common than others? For an order of 1,000 unisex Standard Fit hats, how many of each size should you order?
2. Collect Data
We used a metric tape measure to measure the head circumferences of 55 people to the nearest millimeter:
3. Analyze the Data
Create a stem and leaf plot for these data, using stems that correspond to Standard Fit hat sizes. Keep in mind that these are three-digit numbers and that, for these data, the stems will be based on the left two digits of the values.
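As a rough illustration of the stem-and-leaf idea, the sketch below splits each three-digit value into a two-digit stem and a one-digit leaf. The function name stem_and_leaf and the sample values are hypothetical; the sample is not the course data set.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group three-digit values by their left two digits (the stem)."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # e.g. 563 -> stem 56, leaf 3
        plot[stem].append(leaf)
    return dict(plot)

# Hypothetical head circumferences in mm (NOT the course data set).
sample = [563, 548, 571, 555, 590, 552, 568, 534, 577, 561]

plot = stem_and_leaf(sample)
# Print every stem in the range, including stems with no leaves,
# so that gaps in the distribution stay visible.
for stem in range(min(plot), max(plot) + 1):
    leaves = " ".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem} | {leaves}")
```

Sorting the values first keeps the leaves in order on each stem, which makes the most crowded stems easy to spot.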
4. Interpret the Results
Based on the stem and leaf plot, what are some things you can say about the way head sizes are distributed? Give two descriptive statements to answer the question “How large are people’s heads?”
Based on the stem and leaf plot, do some head sizes appear to be more common than others? Which head sizes are most common? Least common?
What would happen if you ordered an equal number of each size of Standard Fit hats?
In This Part: Using a Histogram to Analyze the Hat-Size Data
The stem and leaf plot for this data set looks like this:
3. Analyze the Data
Use the stem and leaf plot to determine the following:
a. The grouped frequency and relative frequency tables for the head-circumference data
b. The frequency histogram for the head-circumference data
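One possible way to build both tables and a text version of the histogram is sketched below. It assumes intervals of 10 mm that include their lower bound and exclude their upper bound; the function name grouped_frequency and the sample values are hypothetical and do not reproduce the course data.

```python
from collections import Counter

def grouped_frequency(values, width=10):
    """Return rows of (low, high, frequency, relative frequency)
    for consecutive intervals low <= value < high."""
    counts = Counter((v // width) * width for v in values)
    total = len(values)
    table = []
    for start in range(min(counts), max(counts) + width, width):
        freq = counts.get(start, 0)  # empty intervals get a count of 0
        table.append((start, start + width, freq, freq / total))
    return table

# Hypothetical head circumferences in mm (NOT the course data set).
sample = [563, 548, 571, 555, 590, 552, 568, 534, 577, 561]

table = grouped_frequency(sample)
for low, high, freq, rel in table:
    # The row of '#' marks is a sideways frequency histogram.
    print(f"{low} to < {high}: {freq:2d}  {rel:6.1%}  {'#' * freq}")
```

Each row keeps only a count for its interval, so the individual measurements can no longer be read back from the table.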
What information in the data is “lost” when the distribution is represented by a grouped frequency table and histogram instead of a stem and leaf plot?
4. Interpret the Results
What can you say about the way head sizes are distributed? Based on the grouped relative frequency table and the histogram, give two descriptive statements to answer the question “How large are people’s heads?”
Based on the grouped frequency table and the histogram, do some head sizes appear to be more common than others? Which head sizes are most common? Least common?
Based on these data, plan an order for 1,000 Standard Fit hats.
You will need to convert the relative frequencies into quantities of hats, adding up to 1,000. If you listed the frequencies as percentages, your data are already represented as portions of 100. Think about how you might convert your data so that they represent portions of 1,000.
Using frequency computations, the total number of hats might not be exactly 1,000.
a. Why did this happen?
b. To complete the order of 1,000, for which size would you order one more?
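The rounding issue can be reproduced with a small sketch. The size labels and counts below are hypothetical, and the allocate function uses a largest-remainder rule, which is one reasonable choice and not necessarily the approach the course intends. Depending on the data, simple rounding can land above or below 1,000.

```python
def allocate(counts, total_order=1000):
    """Scale counts to total_order, round down, then give the leftover
    hats to the sizes with the largest discarded fractional parts."""
    n = sum(counts.values())
    exact = {size: c * total_order / n for size, c in counts.items()}
    order = {size: int(x) for size, x in exact.items()}  # round down
    leftover = total_order - sum(order.values())
    by_fraction = sorted(exact, key=lambda s: exact[s] - order[s], reverse=True)
    for size in by_fraction[:leftover]:
        order[size] += 1
    return order

# Hypothetical frequencies for three made-up sizes (NOT the course data).
counts = {"small": 2, "medium": 3, "large": 2}

# Rounding each size independently does not guarantee a total of 1,000.
naive = {size: round(c * 1000 / sum(counts.values())) for size, c in counts.items()}
order = allocate(counts)

print(sum(naive.values()), naive)
print(sum(order.values()), order)
```

Choosing the sizes whose unrounded quantities were closest to being rounded up is the same reasoning a person would apply by hand when deciding where to add or remove a hat.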
Do you see anything unusual in the variation illustrated in the stem and leaf plot and the relative frequency histogram? Can you think of a reason for this unusual pattern?
It may help to recall that this is a unisex “Standard Fit” hat size.
In this video segment, participants use a stem and leaf plot to analyze head-circumference data collected by the class. Based on what they see, they revise their initial expectations for the distribution of hat sizes. Watch this segment after completing Problem D13.
Note: The data set used by the onscreen participants is different from the one provided above.
What might one expect the middle values to be like? What accounts for the unexpected results?
You can find this segment on the session video approximately 13 minutes and 31 seconds after the Annenberg Media logo.
Use these same data to plan an order for two more hat styles:
a. Loose Fit: Five hat sizes; hat sizes are separated by 20 mm.
b. Exclusive Fit: 20 hat sizes; hat sizes are separated by 5 mm.
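Regrouping the same measurements with a different interval width could look like the sketch below. It assumes the sizes start at 520 mm, so that five sizes of 20 mm or 20 sizes of 5 mm both span 520 to 620 mm; the function name size_counts and the sample values are hypothetical, and other starting points are equally defensible.

```python
from collections import Counter

def size_counts(values, width, n_sizes, start=520):
    """Count values in n_sizes consecutive intervals of `width` mm,
    each covering low <= value < high, beginning at `start`."""
    bins = Counter((v - start) // width for v in values)
    return {(start + i * width, start + (i + 1) * width): bins[i]
            for i in range(n_sizes)}

# Hypothetical head circumferences in mm (NOT the course data set).
sample = [563, 548, 571, 555, 590, 552, 568, 534, 577, 561]

loose = size_counts(sample, width=20, n_sizes=5)      # Loose Fit
exclusive = size_counts(sample, width=5, n_sizes=20)  # Exclusive Fit

for (low, high), freq in loose.items():
    print(f"{low} to < {high}: {freq}")
```

Wider intervals smooth the distribution into a few large groups, while narrower intervals leave many sizes with very few or no heads in a sample this small.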
In This Part: Summary
In this session, we examined several different ways to organize continuous data measured on a number line. It is often helpful to organize this kind of data by grouping it. The stem and leaf plot is a grouping device that is useful for moderately sized data sets.
The grouped relative frequency table and histogram are more useful devices for larger data sets, since they allow you to visualize your data as portions of larger intervals. These representations, along with grouped cumulative frequency and relative frequency tables, allow you to recognize trends in large data sets by comparing the relative number of data values in each interval.
The following Interactive Activity uses the hat-size data from Part D to allow you to review these tabular and graphic representations of your data. Click through each representation to see how the display in one format relates to the display in the other format.
All of these methods are useful for summarizing the variation in numeric data so that we can provide better answers to the statistical questions we’re investigating.
In This Part: Homework
This series of problems leads you through the creation of a histogram and its corresponding tables for the data in Parts B and C, which you will now group by fives. Start with the stem and leaf plot for the grouping-by-fives scenario:
Create a grouped frequency table for this data set where the intervals have a width of five seconds.
The first interval will be 30 to < 35, the next will be 35 to < 40, etc. The last interval will be 90 to < 95.
Create a histogram for this data set where the intervals have a width of five seconds.
If you have difficulty, refer to the guide in Part B.
Using either the histogram or the grouped frequency table, create a relative frequency table and relative frequency histogram for this data set.
Refer to the guide in Part C if you have trouble here.
Use the information from Problems H1-H3 to create a cumulative frequency and relative cumulative frequency chart for this data set.
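A cumulative frequency column is a running total of the grouped frequencies, and the relative cumulative column divides that running total by the number of data values. The sketch below shows the idea with hypothetical frequencies for a few intervals of width five; these numbers are invented and are not the course data.

```python
from itertools import accumulate

# Hypothetical grouped frequencies (interval start -> count);
# each interval covers start <= value < start + 5. NOT the course data.
freq = {30: 1, 35: 0, 40: 2, 45: 4, 50: 3}

total = sum(freq.values())
cumulative = list(accumulate(freq.values()))            # running totals
relative_cumulative = [c / total for c in cumulative]   # share at or below

for (start, f), c, rc in zip(freq.items(), cumulative, relative_cumulative):
    print(f"{start} to < {start + 5}: freq={f}  cum={c}  rel cum={rc:.1%}")
```

The last cumulative entry always equals the number of data values, which is a quick check that no interval was skipped.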
Using only the histogram and grouped relative frequency table based on an interval width of five, give two descriptive statements that provide an answer to the question “How well do people judge when a minute has elapsed?”
Based on the information in these problems, is it now possible to go back and answer any of the questions in Problem C1 that previously could not be answered with a histogram? Can you give more accurate answers for some of the questions in Problem C1? Are there some questions that still cannot be answered with a histogram?
Kader, Gary and Perry, Mike (September-October, 1994). Learning Statistics with Technology. Mathematics Teaching in the Middle School, 1 (2), 130-136.
Reproduced with permission from Mathematics Teaching in the Middle School. Copyright © 1994 by the National Council of Teachers of Mathematics. All rights reserved.
Pereira-Mendoza, Lionel and Dunkels, Andrejs (Summer, 1989). Stem-and-Leaf Plots in the Primary Grades. Teaching Statistics, 11 (2), 34-37.
This article first appeared in Teaching Statistics <http://science.ntu.ac.uk/rsscse/ts/> and is used with permission.
If you are working in a group, use your own data for the Ordering Hats activity. Each person should measure the head circumferences of several adults ahead of time, then bring their data to class. The group should have a total of 50-60 head circumferences for their data set. Also, consider having each person measure an equal number of men’s and women’s head circumferences. As an extension, you can look at the data separately for each sex.
Fathom Dynamic Statistics™ Software, used by the onscreen participants, is helpful in creating graphical representations of data. You can use Fathom software to complete Problems D3-D13, as well as Homework Problems H1-H6. For more information, go to the Key Curriculum Press Web site at http://www.keypress.com/fathom/.
Measure your head and find out!
Here is the completed stem and leaf plot:
- All heads are between 520 and 615 mm.
- There is a range of 95 mm, which indicates a lot of variation in head circumferences.
- Thirty-five of the 55 head circumferences (63.6%) are between 550 and 587 mm, a range of 37 mm.
- Twenty-three of the 55 head circumferences (41.8%) are between 550 and 569 mm, a range of only 19 mm.
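Descriptive statements like these come from a few simple computations: the minimum, the maximum, their difference, and the share of values inside an interval. The sketch below uses a hypothetical sample and a hypothetical helper named describe; only the two final percentage checks use figures quoted above.

```python
def describe(values, low, high):
    """Return (minimum, maximum, range, share of values with low <= v <= high)."""
    inside = [v for v in values if low <= v <= high]
    smallest, largest = min(values), max(values)
    return smallest, largest, largest - smallest, len(inside) / len(values)

# Hypothetical head circumferences in mm (NOT the course data set).
sample = [563, 548, 571, 555, 590, 552, 568, 534, 577, 561]

smallest, largest, spread, share = describe(sample, 550, 569)
print(f"min={smallest} max={largest} range={spread} share in 550-569={share:.1%}")

# The percentages quoted above follow from the stated counts out of 55.
print(f"{35 / 55:.1%}")
print(f"{23 / 55:.1%}")
```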
Head sizes between 550 and 569 mm are the most common. Head sizes below 540 mm and above 610 mm are the least common.
You would quickly sell out of the more common sizes and have many of the least common sizes still on hand.
Note that the relative frequencies add up to 99.9%, due to rounding.
You no longer have the actual data values, only the number of values within intervals of 10 millimeters.
Answers will vary, but here are some observations:
- All heads are between 520 and 620 mm.
- There is a range of 100 mm, which indicates a lot of variation in head circumferences.
- Thirty-five of the 55 head circumferences (63.6%) are between 550 and 590 mm, a range of 40 mm.
- Twenty-three of the 55 head circumferences (41.8%) are between 550 and 570 mm, a range of only 20 mm.
Head sizes between 550 and 570 mm are the most common. Head sizes below 540 mm and above 610 mm are the least common.
To do this, express the relative frequency as a decimal and multiply it by 1,000. (If you would rather work with the percentage value directly, remember that percentages are per 100, so multiply the percentage value by 10 to find the number per 1,000.)
a. You should have found a total of only 999 hats, due to the rounding in the relative frequencies from Problem D7.
b. Answers will vary. One possible answer is to use S4 or S5, since they are the most common sizes. Another is to use either S3 or S9, since the numbers of hats in these sizes when written as decimals are closest to being rounded up (S9, for example, would be 145.4545… hats).
Yes. There are two distinct peaks in the histogram, which may be due to the fact that male and female head sizes are mixed together in this data set. This raises several questions: Do men and women have similar-sized heads? If not, do men tend to have larger heads than women, or do women tend to have larger heads than men?
To calculate these answers, you will first need to set the hat sizes, then use the data values to determine the relative frequency of the hat sizes you selected, then multiply these frequencies expressed as decimals by 1,000 to determine how many of each you will order. Answers will vary, due to the flexibility in selecting the intervals for the hat sizes.