Learning Math: Data Analysis, Statistics, and Probability
Min, Max and the Five-Number Summary Part C: Quartiles and the Five-Number Summary (35 minutes)
Let’s go back to your 12 ordered noodles, arranged from shortest to longest on a new piece of paper or cardboard. As before, two of the noodles will be Min and Max. Now we’re going to identify three noodles that divide the 12 (including Min and Max) into four groups of the same size.
First, divide your noodles into four groups with an equal number of noodles in each:
As with the Three-Noodle Summary for a data set with an even number of values, we need to insert three extra lines, which we’ll label Q1, Q2, and Q3, to divide and define the groups:
Note that Q2 is the median (Med) of this data set, since six noodles are to the left of Q2 and six are to the right.
What is the median of the six noodles to the left of Q2? What is the median of the six noodles to the right of Q2?
The median divides the set equally, so the median in a set of six noodles is the value that has three noodles to the left of it and three noodles to the right.
Q1, Q2, and Q3 are called quartiles, since they divide the noodles into four groups (i.e., quarters), with an equal number of noodles in each group. The line Q1 is the median of the six noodles to the left of Q2, and Q3 is the median of the six noodles to the right of Q2. Q2 is the median of the entire set of noodles.
The Five-Noodle Summary consists of Min, Q1, Med (Q2), Q3, and Max:
Using the information given in this Five-Noodle Summary, describe what you know about the 12 noodles. For example, what do you know about the ninth noodle, and what information are you still missing?
To convert the Five-Noodle Summary to the Five-Number Summary, use the same procedure you’ve followed throughout this session. Add a vertical number line so that you can indicate the lengths of the five noodles:
Remove the noodles, and you’re left with the Five-Number Summary:
The number Q1 is called the first or lower quartile. The number Q3 is called the third or upper quartile.
If N4 is the length of the fourth noodle, what information would you know about N4 from the Five-Number Summary?
Ralph claims that the Five-Number Summary is enough to know that N4 is closer to Q1 than it is to Med. He says, “Since N4, N5, and N6 are all between Q1 and Med, N4 has to be closer to Q1 than it is to Med.” Is his reasoning valid? Why or why not?
Try to build a data set that shows whether or not Ralph’s claim is valid.
In This Part: More Five-Number Summaries
In the previous example, there were 12 noodles. Twelve is a convenient number of data values for introducing quartiles, because it is an even number and it is divisible by four. In this case, the quartiles separate the data into groups that each contain three values.
Quartiles always produce four groups of data with an equal number of data values in each group. But when the total number of data values is not divisible by four, it’s trickier to determine exactly how many values will be in each of the four groups.
Determining quartiles is a two-step process:
• First, find the median, or Med. Med divides the ordered data into two groups of equal size. One group contains data values to the left of Med, and the other contains data values to the right of Med.
• Next, find the median of the data values to the left of Med, which is the first quartile (Q1). Similarly, the third quartile (Q3) is the median of the data values to the right of Med.
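The two steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `five_number_summary` is made up for this example, and the data are the whole numbers 1 through 12 and 1 through 13 rather than real noodle lengths. It follows the median-of-halves method described here; other textbooks and software packages define quartiles slightly differently and may give different values.

```python
from statistics import median


def five_number_summary(values):
    """Return (Min, Q1, Med, Q3, Max) using the median-of-halves method.

    Hypothetical helper for illustration; needs at least two values.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2                 # number of values on each side of Med
    lower = data[:half]           # values to the left of Med
    upper = data[n - half:]       # values to the right of Med
                                  # (when n is odd, Med itself is skipped)
    return (data[0], median(lower), median(data), median(upper), data[-1])


print(five_number_summary(range(1, 13)))   # (1, 3.5, 6.5, 9.5, 12)
print(five_number_summary(range(1, 14)))   # (1, 3.5, 7, 10.5, 13)
```

A value such as 3.5 corresponds to a line drawn halfway between two noodles; a whole-number result corresponds to one of the noodles themselves.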
Let’s illustrate how this works for 13 data values (i.e., noodles). Since the total number of noodles is now odd, the median will be one of the original 13 noodles. You cannot divide the 13 noodles into two equal groups without splitting a noodle, so take the one noodle in the middle as the median and divide the other 12 noodles into two equal groups of six. Note that there is then the same number of noodles to the left of Med as there is to the right of Med. This will occur whenever there is an odd number of noodles: one noodle is left in the middle as the median, and each of the two equal groups contains exactly half of the remaining noodles.
Now we find Q1, the median of the six noodles to the left of Med, and Q3, the median of the six noodles to the right of Med. Because there is an even number of noodles to the left and right of Med, Q1 and Q3 will be represented by lines between a pair of noodles.
Note that there are three noodles to the left of Q1, three noodles between Q1 and Med, three noodles between Med and Q3, and three noodles to the right of Q3. Also note that each group of three noodles is approximately one-fourth of the total of 13 noodles. As with the calculation of the median, the quartiles split each half of the noodles into two equal groups; if there is an odd number of noodles in a half, one will be left in the middle as the quartile.
In This Part: Review
Review how you would find the Five-Noodle Summary using a set of 12 noodles and a set of 13 noodles.
Explain how you would create a Five-Noodle Summary for 14 noodles. How many noodles are in each of the four groups?
Remember that the median of a group may be represented by a noodle or by a line drawn halfway between two noodles. Since a quartile is the median of half the data, it may also be represented by a noodle or by a line drawn halfway between two noodles.
Explain how you would create a Five-Number Summary for 15 noodles. How many numbers are in each of the four groups?
TAKE IT FURTHER
How many numbers are in each of the four groups if you started with 57 noodles? With 112 noodles? Can you find a rule that would allow you to determine the number of values in each group without creating a Five-Number Summary?
In general, the Five-Number Summary divides ordered numeric data into four groups, with each group having the same number of data values. If you know only the Five-Number Summary (Min, Q1, Med, Q3, and Max), these five values still give you a lot of information:
• All the data values are between Min and Max.
• Med divides the ordered data into two groups, with an equal number of values (approximately half) in each group:
• One group contains data values to the left of Med.
• One group contains data values to the right of Med.
• The quartiles divide the ordered data into four groups, with an equal number of values (approximately one-fourth) in each group:
• One group contains values to the left of Q1 (and includes Min).
• One group contains values between Q1 and Med.
• One group contains values between Med and Q3.
• One group contains values to the right of Q3 (and includes Max).
What information is learned from the interquartile range, the length of the interval between Q1 and Q3? Think about why this might be useful in describing the variation in your data.
For six noodles, the median is located between the third and fourth noodles. For the six noodles to the left of Q2, this median will be Q1. Similarly, the median of the six noodles to the right of Q2 will be Q3.
You would know that all of the lengths are between Min and Max, and that Med (Q2) divides the ordered data into two equal-sized groups: six noodles are shorter than Med, and six are longer. The quartiles then divide the ordered data into four equal-sized groups. The first group contains three noodles whose lengths are at least Min and less than Q1. The second group contains three noodles that are longer than Q1 but shorter than Med. The third group contains three noodles that are longer than Med but shorter than Q3. The final group contains three noodles whose lengths are greater than Q3 and at most Max. (For example, the ninth noodle is longer than Med and shorter than Q3.) You still don’t know how the three noodles in each group are distributed; you know only the endpoints of each interval. (For example, you don’t know whether the ninth noodle is closer to Q3 or to Med.)
You would know that N4 is longer than Q1, the first quartile, and shorter than Med, the median.
No, Ralph’s reasoning is not necessarily valid. Here is a sample data set of noodle lengths, measured to the nearest millimeter: 30, 35, 38, 60, 61, 62, 64, 67, 70, 75, 90, 96. The fourth noodle, N4, has a length of 60 mm. The first quartile, Q1, is (38 + 60) / 2 = 49 mm. The median is (62 + 64) / 2 = 63 mm. In this set, N4 is 11 mm from Q1 but only 3 mm from Med, so it is closer to Med than to Q1. Remind Ralph that the information in the Five-Number Summary, while valuable, does not tell us anything about the actual values within each interval. Ralph’s claim would hold if, for example, the data were equally spaced (such as lengths of 10, 20, 30, and so on up to 120 mm).
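The counterexample can be checked with a short Python sketch. The helper name `n4_closer_to_q1` and the equally spaced comparison list are assumptions introduced for illustration; only the first data set comes from the solution above.

```python
from statistics import median


def n4_closer_to_q1(lengths):
    """Return True if the fourth noodle is closer to Q1 than to Med.

    Hypothetical helper; Q1 is the median of the lower half of the data.
    """
    data = sorted(lengths)
    half = len(data) // 2
    q1 = median(data[:half])      # median of the values to the left of Med
    med = median(data)
    n4 = data[3]                  # fourth noodle (lists are 0-indexed)
    return abs(n4 - q1) < abs(n4 - med)


sample = [30, 35, 38, 60, 61, 62, 64, 67, 70, 75, 90, 96]
evenly_spaced = list(range(10, 130, 10))   # 10, 20, ..., 120 (illustrative)

print(n4_closer_to_q1(sample))          # False: 11 mm from Q1, 3 mm from Med
print(n4_closer_to_q1(evenly_spaced))   # True: 5 mm from Q1, 25 mm from Med
```

Both data sets have 12 values, so they share the same positions for Q1 and Med; what differs is how the lengths are spread within each interval, which the Five-Number Summary does not record.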
First, find the median, between the seventh and eighth noodles. The first quartile is the median of the seven shortest noodles, which is the fourth noodle. The third quartile is the median of the seven longest noodles, which is the 11th noodle. There will be three noodles in each of the four groups.
First, find the median, which will be the eighth noodle. The first quartile is the median of the seven shortest noodles, which is the fourth noodle. The third quartile is the median of the seven longest noodles, which is the 12th noodle. There will be three noodles in each of the four groups.
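The positions named in these solutions can be double-checked with a small Python sketch. The function `quartile_positions` is a hypothetical helper, not part of the course materials; it reports 1-based positions in the ordered line of noodles, and a position ending in .5 stands for a line drawn halfway between two neighboring noodles.

```python
def quartile_positions(n):
    """Return the 1-based positions of (Q1, Med, Q3) among n ordered values.

    Hypothetical helper using the median-of-halves method.
    """
    half = n // 2                      # values on each side of Med
    med = (n + 1) / 2                  # middle of positions 1..n
    q1 = (half + 1) / 2                # middle of positions 1..half
    q3 = (n - half) + (half + 1) / 2   # middle of the last `half` positions
    return q1, med, q3


for n in (12, 13, 14, 15):
    print(n, quartile_positions(n))
# 12 (3.5, 6.5, 9.5)
# 13 (3.5, 7.0, 10.5)
# 14 (4.0, 7.5, 11.0)
# 15 (4.0, 8.0, 12.0)
```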
If you started with 57 noodles, there would be 14 noodles in each group (the median is the 29th noodle). If you started with 112 noodles, there would be 28 noodles in each group (the median is between the 56th and 57th noodles). One possible rule is to take the number of noodles, divide by four, and then round down if you have a fractional result.
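One way to test the proposed rule is to count the noodles that fall strictly between the dividing points and compare the count with the result of dividing by four and rounding down. The sketch below does this in Python; `group_sizes` is a hypothetical helper, and noodles that serve as Med or as a quartile are not counted in any group, matching the solutions above.

```python
def group_sizes(n):
    """Count the values in each of the four groups for n ordered values.

    Hypothetical helper; positions are 1-based, and a value sitting exactly
    on Q1, Med, or Q3 belongs to no group.
    """
    half = n // 2
    q1 = (half + 1) / 2
    med = (n + 1) / 2
    q3 = (n - half) + q1
    cuts = [0, q1, med, q3, n + 1]     # boundaries of the four groups
    positions = range(1, n + 1)
    return [sum(1 for p in positions if lo < p < hi)
            for lo, hi in zip(cuts, cuts[1:])]


for n in (12, 13, 14, 15, 57, 112):
    print(n, group_sizes(n), n // 4)
# 12 [3, 3, 3, 3] 3
# 13 [3, 3, 3, 3] 3
# 14 [3, 3, 3, 3] 3
# 15 [3, 3, 3, 3] 3
# 57 [14, 14, 14, 14] 14
# 112 [28, 28, 28, 28] 28
```

In Python, `n // 4` is integer division, which rounds down for positive numbers, so it expresses the rule directly.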
The interquartile range spans the middle 50% of the data. This is a useful interval for describing variation; if the interquartile range is small compared to the overall range (from Min to Max), it suggests that there are a lot of extreme values in the data. If the interquartile range is wide compared to the overall range, it suggests that there are few extreme values and that the data outside the middle half lie fairly close to it.
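As a rough illustration, the interquartile range can be compared with the overall range for the sample noodle lengths used in the solution about Ralph’s claim. The variable names below are assumptions made for this sketch, and the quartiles are computed with the same median-of-halves method used throughout this part.

```python
from statistics import median

# Sample noodle lengths in millimeters, taken from the earlier solution.
lengths = [30, 35, 38, 60, 61, 62, 64, 67, 70, 75, 90, 96]

data = sorted(lengths)
half = len(data) // 2
q1 = median(data[:half])                 # median of the lower half
q3 = median(data[len(data) - half:])     # median of the upper half

iqr = q3 - q1                            # length of the interval from Q1 to Q3
overall_range = data[-1] - data[0]       # Max minus Min

print(q1, q3)                            # 49.0 72.5
print(iqr, overall_range)                # 23.5 66
print(round(iqr / overall_range, 2))     # 0.36
```

Here the middle half of the noodles spans only about a third of the distance from Min to Max, so the shortest and longest noodles lie relatively far from the middle of the data.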