Learning Math: Data Analysis, Statistics, and Probability
Min, Max and the Five-Number Summary Part D: The Box Plot (25 minutes)
In This Part: Five-Number Summary with Measurement Data
Now we’ll look at how you can represent the Five-Number Summary graphically, using a box plot. For this activity, we will work with a set of 12 noodles with the following measurements (in millimeters):
Why is it necessary to order the data before creating a Five-Number Summary?
Let’s create a Five-Number Summary for this set of ordered data:
Determine Q1, Med, and Q3:
The lines representing Q1, Med, and Q3 each have lengths that are halfway between their adjacent noodles:
Q1 = (33 + 41) / 2 = 37
Med = (74 + 81) / 2 = 77.5
Q3 = (102 + 109) / 2 = 105.5
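The three midpoint calculations above can be reproduced with a short Python sketch. It assumes the quartiles are taken as the medians of the lower and upper halves of the ordered data, with the middle value left out of both halves when the count is odd. The function name `quartiles_by_halves` is illustrative. In the sample list, only 33, 41, 74, 81, 102, and 109 come from the text; the other six lengths are placeholders, because the full noodle list is not reproduced here.

```python
from statistics import median


def quartiles_by_halves(values):
    """Return (Q1, Med, Q3) using the median of each half of the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]        # values below the median position
    upper = ordered[n - half:]    # values above the median position
    return median(lower), median(ordered), median(upper)


# Only 33, 41, 74, 81, 102 and 109 are taken from the text.
# The remaining six values are placeholders that keep the list ordered.
noodles = [10, 20, 33, 41, 50, 74, 81, 90, 102, 109, 120, 130]

q1, med, q3 = quartiles_by_halves(noodles)
print(q1, med, q3)
```

With 12 values, each quartile falls halfway between two neighboring noodles, so the result matches the hand calculation: Q1 = 37, Med = 77.5, Q3 = 105.5.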
Here is the Five-Noodle Summary:
Add a vertical number line:
Here is the Five-Number Summary:
In This Part: Drawing a Box Plot
Once we have the Five-Number Summary, we can display it using a kind of graph known as a box plot. Here is the box plot for the noodle data we’ve been using:
The box plot is also called a box-and-whiskers plot. Though it looks very different from previous graphs, it’s just another way to represent the distribution of the data we’ve been working with all along:
- The lower whisker extends from Min to Q1. The length of this whisker indicates the range of the lowest (or, in this case, the shortest) fourth of the ordered data.
- The upper whisker extends from Q3 to Max. The length of this whisker indicates the range of the highest (or, in this case, the longest) fourth of the ordered data.
- The box (the rectangular portion of the graph) extends from Q1 to Q3, with a horizontal line segment indicating Med.
- The portion of the rectangle between Q1 and Med indicates the range of the second fourth of the ordered data.
- The portion of the rectangle between Med and Q3 indicates the range of the third fourth of the ordered data.
- The entire rectangle indicates the range of the middle half (the interquartile range) of the ordered data.
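As a rough illustration of the list above, the following Python sketch computes the length of each part of a box plot from a Five-Number Summary. The class and function names are illustrative. The numbers are the Brand A raisin summary given in the solutions later on this page, used here because all five of its values are stated in the text.

```python
from typing import NamedTuple


class FiveNumberSummary(NamedTuple):
    minimum: float
    q1: float
    med: float
    q3: float
    maximum: float


def segment_lengths(s):
    """Return the length of each whisker and box segment of the box plot."""
    return {
        "lower whisker (Min to Q1)": s.q1 - s.minimum,
        "box, lower part (Q1 to Med)": s.med - s.q1,
        "box, upper part (Med to Q3)": s.q3 - s.med,
        "upper whisker (Q3 to Max)": s.maximum - s.q3,
        "whole box (interquartile range)": s.q3 - s.q1,
    }


# Brand A raisin summary, as stated in the solutions on this page.
brand_a = FiveNumberSummary(23, 27, 29.5, 32, 39)

for part, length in segment_lengths(brand_a).items():
    print(f"{part}: {length}")
```

Each of the four segments covers one fourth of the ordered data, so a longer segment means those values are more spread out, not that there are more of them.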
Note that box plots can be drawn vertically or horizontally, depending on whether you display the Five-Number Summary along a vertical or a horizontal axis. See Note 2 below.
In this video segment, Professor Kader introduces the process of building a box plot. Watch this segment to review the process or to help you draw the box plots for the following problem.
Note: The data set used by the onscreen participants is different from the one provided above.
Let’s compare our noodle data as represented by the Five-Noodle Summary, the Five-Number Summary, and the box plot.
Review the sequence of illustrations on the previous page and on this page, to follow the progression from noodles through box plot.
Using the same scale for each plot, create a box plot for each of the data sets below, which we first saw in Session 2. Each is an ordered list of the number of raisins in a group of boxes from a particular brand. You may want to save your data for use in Session 6.
Start by listing the position for each value in the data set. For example, in the set of Brand A raisins, the value 23 is in the first position, 25 is in the second position, the second 25 is in the third position, and so forth.
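The position-listing step can be sketched in Python as follows. Both function names are illustrative, and the quartile positions assume the same convention as the noodle example: each quartile is the median of one half of the ordered data. A position ending in .5 means "halfway between the two neighboring values." Only the first three Brand A values quoted above are used, since the full list is not reproduced here.

```python
def list_positions(ordered_values):
    """Pair each value with its 1-based position in the ordered list."""
    return list(enumerate(ordered_values, start=1))


def summary_positions(n):
    """Return the 1-based positions of Q1, Med and Q3 among n ordered values.

    A position such as 6.5 means halfway between the 6th and 7th values.
    """
    half = n // 2                          # size of each half of the data
    med_pos = (n + 1) / 2
    q1_pos = (half + 1) / 2                # median position of the lower half
    q3_pos = (n - half) + (half + 1) / 2   # same offset within the upper half
    return q1_pos, med_pos, q3_pos


# The first three Brand A values quoted in the text.
print(list_positions([23, 25, 25]))

# For 12 ordered values, as in the noodle example.
print(summary_positions(12))
```

For 12 values the positions are 3.5, 6.5, and 9.5, which is why each noodle quartile fell halfway between two adjacent noodles.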
Compare the two box plots from Problem D2 side by side. What conclusions can you draw about Brand A raisins in comparison to Brand B raisins, using only the box plots?
In this video segment, Professor Kader and participants use the box plot to compare different brands of raisins. They then discuss the usefulness of the box plot as a summary of data. Watch this segment after completing Problem D3.
Note: The data sets used by the onscreen participants are different from the ones provided above.
Is the box plot more useful for making comparisons between different distributions than a line plot? Why or why not?
Fathom Dynamic Statistics™ Software used with permission of Key Curriculum Press.
The Five-Number Summary uses intervals to describe the variation in different segments of your data. The longer the interval, the greater the variation. Some people will misinterpret a box plot. For example, given a box plot with the Q3-Max whisker considerably longer than the Min-Q1 whisker, one could think, “Wow, there are a lot more data in the highest interval than there are in the lowest interval.” We’re used to associating length with “how many” rather than “how far apart,” and we forget that the same number of values falls within each of these intervals.
It is also important to note the difference between a histogram and a box plot, another potential source of confusion. To construct a histogram, you prescribe intervals of uniform length and then count how many data values fall within each interval. To determine the five numbers for the box plot, you do the reverse: prescribe how many data values you want in each interval and then determine the intervals.
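The contrast between the two constructions can be made concrete with a small Python sketch. The data set `sample` is hypothetical, the function names are illustrative, and the quartiles are again assumed to be the medians of the lower and upper halves of the ordered data.

```python
from statistics import median


def histogram_counts(values, start, width, bins):
    """Histogram: fix intervals of equal width, then count the values in each."""
    counts = [0] * bins
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < bins:
            counts[index] += 1
    return counts


def quartile_intervals(values):
    """Box plot: fix equal counts per interval, then find the interval ends."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    cuts = [
        ordered[0],
        median(ordered[:half]),
        median(ordered),
        median(ordered[n - half:]),
        ordered[-1],
    ]
    return list(zip(cuts, cuts[1:]))


# Hypothetical data, not taken from the course materials.
sample = [1, 2, 3, 4, 5, 6, 8, 10, 13, 17, 22, 30]

print(histogram_counts(sample, start=0, width=10, bins=4))
print(quartile_intervals(sample))
```

For this sample, the histogram intervals all have width 10 but hold 7, 3, 1, and 1 values. The four box plot intervals each hold 3 values, but their lengths are 2.5, 3.5, 8, and 15. The long upper whisker reflects spread, not a larger count.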
Fathom Software, used by the onscreen participants, is helpful in creating graphical representations of data. You can use Fathom Software to complete Problems D2-D3. For more information, go to the Key Curriculum Press Web Site at
Since the median and quartiles require separating the data into halves that are larger or smaller than a central value, it is necessary to order the data. If the data are unordered, it is much more difficult to find the value that splits the list into two equal groups.
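A quick way to see this is to take the "middle" of a list before and after sorting. The sketch below uses a hypothetical six-value list and an illustrative helper, `middle_value`, that averages the one or two middle entries of a list in whatever order it is given.

```python
def middle_value(values):
    """Average the one or two middle entries of a list, in the order given."""
    n = len(values)
    mid = n // 2
    if n % 2 == 1:
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2


# Hypothetical unordered measurements.
unordered = [81, 33, 109, 41, 102, 74]

print(middle_value(unordered))          # middle of the list as given
print(middle_value(sorted(unordered)))  # the median, after ordering
```

Unordered, the middle entries are 109 and 41, giving 75. Once the list is ordered, the middle entries are 74 and 81, giving the median, 77.5. Only the ordered list guarantees that half of the values lie on each side of the result.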
To create a box plot, first create a Five-Number Summary for each data set:
a. For Brand A, here is the Five-Number Summary:
Min = 23
Q1 = 27
Med = 29.5
Q3 = 32
Max = 39
Here is the box plot:
b. For Brand B, here is the Five-Number Summary:
Min = 17
Q1 = 25
Med = 26
Q3 = 29
Max = 30
Here is the box plot:
Placing the box plots side by side shows that Brand A's box sits higher than Brand B's: the median for Brand A (29.5) is above the third quartile for Brand B (29). The interquartile range is a little wider for Brand A (5 versus 4), and the top 25% of Brand A boxes, those at or above Q3 = 32, all contain more raisins than Brand B's maximum of 30. This strongly suggests that a typical box of Brand A contains more raisins than a typical box of Brand B.
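The comparisons in this solution can be checked directly from the two Five-Number Summaries listed above. This is a minimal Python sketch; the dictionary layout and the helper name `iqr` are illustrative choices.

```python
# Five-Number Summaries as stated in the solution above.
brand_a = {"Min": 23, "Q1": 27, "Med": 29.5, "Q3": 32, "Max": 39}
brand_b = {"Min": 17, "Q1": 25, "Med": 26, "Q3": 29, "Max": 30}


def iqr(summary):
    """Interquartile range: the length of the box, Q3 minus Q1."""
    return summary["Q3"] - summary["Q1"]


iqr_a = iqr(brand_a)
iqr_b = iqr(brand_b)

# Is the top fourth of Brand A entirely above Brand B's largest box?
top_quarter_a_above_b_max = brand_a["Q3"] > brand_b["Max"]

# Is Brand A's median above Brand B's third quartile?
median_a_above_q3_b = brand_a["Med"] > brand_b["Q3"]

print(iqr_a, iqr_b, top_quarter_a_above_b_max, median_a_above_q3_b)
```

The interquartile ranges are 5 for Brand A and 4 for Brand B, and both comparisons are true.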