Learning Math: Data Analysis, Statistics, and Probability
Min, Max and the Five-Number Summary
Part B: The Median and the Three-Number Summary (35 Minutes)
In This Part: The Median
Another useful summary measure for a collection of data is the median. As you learned in Session 2, the median is the middle data value in an ordered list. Here’s one way to find the median of our ordered noodles.
First, place your 11 noodles in order from shortest to longest on a new piece of paper or cardboard. Your arrangement should look something like this:
Next, remove two noodles at a time, one from each end, and put them to the side:
Continue this process until only one noodle remains. This noodle is the median. Label it “Med”:
Notice that the median divides the set of 11 noodles into two groups of equal size — the five noodles shorter than the median and the five noodles longer than the median. Another way to say this is that there are just as many noodles before the median as there are after the median.
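The pair-removal procedure above can be sketched in a few lines of Python. This is only an illustration: the function name `median_by_removing_pairs` and the noodle lengths (in millimeters) are made up for the example and are not part of the course materials.

```python
def median_by_removing_pairs(lengths):
    """Find the median of an odd-sized collection by peeling off both ends."""
    remaining = sorted(lengths)          # order the noodles shortest to longest
    if len(remaining) % 2 == 0:
        raise ValueError("this sketch handles an odd number of noodles only")
    while len(remaining) > 1:
        remaining = remaining[1:-1]      # set aside the shortest and the longest
    return remaining[0]                  # the one noodle left is the median


# Hypothetical lengths in millimeters for 11 noodles (not from the course).
noodles = [62, 15, 98, 41, 77, 23, 110, 54, 36, 85, 70]
med = median_by_removing_pairs(noodles)
shorter = [n for n in noodles if n < med]
longer = [n for n in noodles if n > med]
print(med, len(shorter), len(longer))    # 62 5 5
```

With these made-up lengths, five noodles fall on each side of the median, matching the two equal groups described above.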
Problem B1: If you could see only the median noodle, what would you know about the other noodles?
What would knowing the median tell you about each of the first five (the shortest five) noodles? What would it tell you about each of the last five (the longest five) noodles?
Problem B2: If you could see only the median noodle, describe some information you would not know about the other noodles.
In This Part: The Three-Noodle Summary
Now remove all the noodles except Min, Med, and Max.
We’ll call this display the “Three-Noodle Summary.”
Problem B3: If you could see Min, Med, and Max, what would you know about the other noodles? Be specific about how this compares to Problem A3 (where you only knew Min and Max) and Problem B1 (where you only knew Med).
Problem B4: Describe some information you still wouldn’t know about the other noodles from the Three-Noodle Summary.
In This Part: The Three-Number Summary
Now let’s convert the Three-Noodle Summary to the Three-Number Summary. If they’re not already there, place the three noodles — Min, Med, and Max — in order on the horizontal axis.
Next add a vertical number line, and mark the lengths of the three noodles. (Left)
Remove the noodles, and you’re left with the Three-Number Summary. (Right)
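As a rough sketch, the Three-Number Summary can also be computed directly from a list of lengths. The helper name `three_number_summary` and the lengths below are hypothetical; `statistics.median` is the median function in Python's standard library.

```python
import statistics


def three_number_summary(lengths):
    """Return (Min, Med, Max) for a non-empty collection of lengths."""
    return min(lengths), statistics.median(lengths), max(lengths)


# Hypothetical lengths in millimeters for 11 noodles (not from the course).
noodles = [62, 15, 98, 41, 77, 23, 110, 54, 36, 85, 70]
print(three_number_summary(noodles))     # (15, 62, 110)
```

The three numbers are exactly the marks left on the number line once the noodles are removed.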
Problem B5: If we call the length of the fourth noodle N4, how does N4 compare to Min, Med, and Max? What wouldn’t you know about N4 if you only knew Min, Med, and Max?
In This Part: Even Data Sets
In the previous example, it wasn’t hard to find the median because there were 11 noodles — an odd number. For an odd number of noodles, the median is the noodle in the middle. But how do we find the median for an even number of noodles?
Add a 12th noodle, with a different length from the other 11 noodles, to the original collection. Arrange the noodles in order from shortest to longest.
Problem B6: Using the method of removing pairs of noodles (the longest and the shortest), try to determine the median noodle length. What happens?
This time, there won’t be one remaining noodle in the middle — there will be two! If you remove this middle pair, you’ll have no noodles left.
Therefore, you’ll need to draw a line midway between the two remaining noodles to play the role of the median. The length of this line should be halfway between the lengths of the two middle noodles:
Move the middle pair aside, and you can see your new median:
Notice that this median still divides the set of noodles into two groups of the same size — the six noodles shorter than the median and the six noodles longer than the median:
The major difference is that, this time, the median is not one of the original noodles; it was computed to divide the set into two equal parts.
Note: It is a common mistake to include this median in your data set when you’ve added it in this way. This median, however, is not part of your data set.
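The even case can be sketched by extending the pair-removal idea: stop when two noodles remain and take the value halfway between them. The function name `median_even_or_odd` and the 12 lengths are assumptions made for illustration only.

```python
def median_even_or_odd(lengths):
    """Peel off shortest/longest pairs; average the last two if two remain."""
    ordered = sorted(lengths)
    if not ordered:
        raise ValueError("need at least one noodle")
    while len(ordered) > 2:
        ordered = ordered[1:-1]          # remove one noodle from each end
    if len(ordered) == 1:
        return ordered[0]                # odd count: the middle noodle itself
    return (ordered[0] + ordered[1]) / 2 # even count: halfway between the middle pair


# Hypothetical lengths in millimeters for 12 noodles (not from the course).
twelve = [62, 15, 98, 41, 77, 23, 110, 54, 36, 85, 70, 66]
med = median_even_or_odd(twelve)
shorter = [n for n in twelve if n < med]
longer = [n for n in twelve if n > med]
print(med, med in twelve, len(shorter), len(longer))   # 64.0 False 6 6
```

Here the median, 64.0, lies halfway between the middle pair (62 and 66) and is not one of the 12 data values, which is why it should not be counted as part of the data set.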
In this video segment, participants discuss the process of finding the median of a data set with an even number of values (in this case n = 20). Watch this video segment to review the process you used in Problem B6 or if you would like further explanation.
Note: The data set used by the onscreen participants is different from the one provided above.
Problem B7: If you could see only the median of a set of 12, what would you know about the other noodles?
You can convert the Three-Noodle Summary for these 12 noodles to the Three-Number Summary in the same way you did it for the set of 11 noodles:
Add a vertical number line, and mark the lengths of the three noodles:
Remove the noodles, and you’re left with the Three-Number Summary:
In This Part: Review
As we have seen with the noodle examples, the median divides ordered numeric data into two groups, each with the same number of data values.
If you only know the Three-Number Summary (Min, Med, and Max) for a set of data, you can still glean quite a bit of information about the data. You know that all the data values are between Min and Max, and you know that Med divides the data into two groups of equal size. One group contains data values to the left of Med, and the other group contains data values to the right of Med. You also know that the group of values to the left of the median must be lower than (or equal to) the median in value, and that the group of values to the right of the median must be greater than (or equal to) the median in value.
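The properties in this review can be checked on a small example. The sketch below splits an ordered list by position around the median. The name `split_at_median` and the data values are hypothetical, and repeated values are included on purpose to show why the comparison is "lower than (or equal to)" rather than strictly lower.

```python
import statistics


def split_at_median(values):
    """Return (lower group, median, upper group), splitting by position."""
    ordered = sorted(values)
    n = len(ordered)
    med = statistics.median(ordered)
    lower = ordered[: n // 2]            # values to the left of Med
    upper = ordered[(n + 1) // 2 :]      # values to the right of Med
    return lower, med, upper


# Hypothetical data with repeated values (not from the course).
data = [5, 3, 12, 5, 9, 5, 8]
lower, med, upper = split_at_median(data)
print(lower, med, upper)                 # [3, 5, 5] 5 [8, 9, 12]
```

Both groups have three values; every value in the lower group is at most the median, and every value in the upper group is at least the median.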
Solution to Problem B1: You would know that there must be exactly five noodles shorter than the median noodle and five noodles longer than the median noodle.
Solution to Problem B2: You would not know the actual lengths of any of the other noodles. The five shorter noodles could be extremely short, the five longer noodles could be many feet long, or they could all be fairly close in size to the median. You also could not determine, or even estimate, the minimum or maximum length of the other noodles.
Solution to Problem B3: You would know that all of the noodles are between Min and Max, and that you can divide the noodles into two equal groups: five that are shorter than Med (including Min) and five that are longer than Med (including Max). This information gives you two specific intervals that contain an equal number of noodles, and all of the noodles are contained in these intervals. This is different from Problem A3, where you knew nothing about the size of the noodles between Min and Max, and from Problem B1, where you knew nothing about the upper and lower boundaries of your data set.
Solution to Problem B4: You still wouldn’t know the lengths of the noodles in the two intervals between Min and Med, or between Med and Max. These noodles could be very close to Med, very close to the extreme values, evenly spread within the intervals, or something else entirely. There is no way to know without more information.
Solution to Problem B5: You would know that N4 must be larger than Min, smaller than Med, and smaller than Max. This is true because N6 is the median, and N4 must be smaller than N6. You still wouldn’t know N4’s actual value or whether N4 was closer to Min or to Med. (A common mistake is to claim that N4 must be closer to Med than it is to Min. This is not necessarily true, since the values of N2 through N5 can be anywhere in the interval between Min and Med; for example, they could all be very close to Min.)
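To make the last point concrete, here is a small sketch with two made-up sets of 11 lengths that share the same Three-Number Summary but have very different values for N4. The data and the helper name `summary` are assumptions for illustration, not course data.

```python
import statistics


def summary(values):
    """Return the Three-Number Summary (Min, Med, Max)."""
    return min(values), statistics.median(values), max(values)


# Two hypothetical ordered sets of 11 lengths (not from the course).
set_a = [10, 11, 12, 13, 14, 50, 86, 87, 88, 89, 90]  # interior values hug the extremes
set_b = [10, 46, 47, 48, 49, 50, 51, 52, 53, 54, 90]  # interior values hug the median

print(summary(set_a), summary(set_b))    # (10, 50, 90) (10, 50, 90)
print(set_a[3], set_b[3])                # N4 in each set: 13 48
```

Both sets report Min = 10, Med = 50, and Max = 90, yet N4 is 13 in the first set (close to Min) and 48 in the second (close to Med), so the summary alone cannot say where N4 falls within its interval.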