Learning Math: Data Analysis, Statistics, and Probability
Min, Max and the Five-Number Summary Part E: Finding the Five-Number Summary Numerically (30 minutes)
In This Part: Locating the Median from Ordered Data
In Part D, we used noodles to help us visualize the concept of quartiles. In practice, however, the task of determining quartiles is treated strictly as a numerical problem. It is based on an ordered list of numerical measurements and the position of each measurement in the list. In Part E, we’ll transition to this numerical approach. See Note 3 below.
Remember the procedure for determining quartiles described earlier: First find the median; then find the first and third quartile values.
Let’s begin with 13 noodles, arranged in ascending order:
Each noodle has a position in this ordered list: (1) indicates the shortest noodle, (2) the next shortest, and so on. The longest noodle is (13):
The letter n is often used in statistics to indicate the number of data values in a set. In this case, there are n = 13 noodles, and 13 positions are indicated on the line above. The median is in position (7), because there are just as many positions (six) to the left of the median as there are to the right of the median:
The position of the median in an ordered list with n = 13 is (7). If there had been 14 items in the list, the position would have been halfway between positions (7) and (8), or (7.5). So if n = 14, the position of the median is (7.5).
Problem E1: Find the position of the median for at least three other values of n. Then use this information to come up with a general mathematical rule for determining the position of the median if you know the number of items in an ordered list.
Try consecutive numbers, like 10, 11, and 12. To get you started, if n = 10, the median will be halfway between the fifth and sixth items, so the position of the median is (5.5).
In This Part: Calculating the Position of the Median
The general rule for determining the position of the median is that the median will always be in position (n + 1) / 2 in an ordered list. The positions can be indicated from smallest to largest (ascending order) or from largest to smallest (descending order). The median is in the same position and is the same value, regardless of the ordering method you use.
It is important to remember that this rule indicates the position of the median, and not the value of the median. The value of the median is the value at that position. In our 13-noodle example, n = 13, and the position of the median is determined by (13 + 1) / 2 = 14 / 2 = 7. So the median is in position (7), and the value of Med is the length of the seventh noodle.
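The difference between the position of the median and its value can be sketched in a few lines of Python. This is only an illustration: the function names and the sample lengths are assumptions made for this example, and the lengths are made-up values rather than the course's noodle data.

```python
def median_position(n):
    """Position of the median in an ordered list of n items, counting from 1."""
    return (n + 1) / 2


def value_at_position(ordered, position):
    """Value at a 1-based position; a half position averages its two neighbors."""
    lower = int(position)          # whole-number part of the position
    if position == lower:
        return ordered[lower - 1]  # Python lists count from 0, positions from 1
    return (ordered[lower - 1] + ordered[lower]) / 2


# Hypothetical lengths (not the course's noodle data), already in ascending order.
lengths = [2, 3, 5, 8, 9, 11, 12, 14, 15, 17, 20, 21, 24]

pos = median_position(len(lengths))    # (13 + 1) / 2 = 7.0
med = value_at_position(lengths, pos)  # the seventh value in the list
print(pos, med)                        # prints: 7.0 12
```

The rule returns 7 for any list of 13 items; only the lookup step depends on the actual measurements.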
Note that there are six noodles to the left (1L to 6L) and six noodles to the right (6R to 1R) of the median. To find the positions of the remaining quantities for the Five-Number Summary, it’s convenient to label the noodles to the left of the median in ascending order and the noodles to the right of the median in descending order:
Again, notice that Med is in position (7) from each end of the ordered list, and notice that Min (the shortest noodle) and Max (the longest noodle) are each in position (1) on their respective ends of the ordered data.
Problem E2: What is the position of Q1 in this ordered list?
Remember that you should only consider noodles to the left of the median. Do not include the median itself in this count.
Problem E3: What is the position of Q3 in this ordered list?
Problem E4: Here is the ordered list of the numerical values for the 13 noodles and the corresponding position of each measurement from its respective end of the data:
Use the information from the position of the data in this ordered list and the results from Problems E2 and E3 to build the Five-Number Summary for this data.
Remember that a summary value that lies in a position halfway between two items in an ordered list is the average of the adjacent pair of values.
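The whole procedure from this part (order the data, locate the median, then take the median of each half, averaging the adjacent pair whenever a position falls halfway between two items) can be sketched in Python. This is a minimal sketch under the course's convention that the median is left out of both halves when n is odd. The function names and the sample measurements are hypothetical; they are not the noodle data.

```python
def median(ordered):
    """Median of an already ordered list: the value at position (n + 1) / 2."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # a single middle item
    return (ordered[mid - 1] + ordered[mid]) / 2  # average of the middle pair


def five_number_summary(data):
    """Return (Min, Q1, Med, Q3, Max); the median is excluded from both halves when n is odd."""
    if len(data) < 2:
        raise ValueError("need at least two measurements")
    ordered = sorted(data)           # the list must be ordered first
    n = len(ordered)
    half = n // 2                    # number of items on each side of the median
    lower = ordered[:half]           # items to the left of the median
    upper = ordered[n - half:]       # items to the right of the median
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])


# Hypothetical measurements (n = 13), deliberately given out of order.
sample = [12, 3, 24, 8, 15, 2, 20, 9, 17, 5, 21, 11, 14]
print(five_number_summary(sample))   # prints: (2, 6.5, 12, 18.5, 24)
```

With 13 values, Q1 and Q3 each sit at position (3.5) from their end of the list, so each is the average of the third and fourth values from that end.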
Problem E5: Use the techniques you’ve learned in Part E to build the Five-Number Summary for the following set of measurements, where n = 15:
Problem E6: Build the Five-Number Summary for the following set of measurements, where n = 20:
Ignore the values of the data when finding the positions of the median and quartiles. It is possible for the values surrounding the median and quartiles to be identical.
Problem E7: Here are the lengths of 20 pine needles, to the nearest millimeter, from Session 1, Problem H1:
a. Determine the Five-Number Summary for these 20 measurements.
b. Draw a box plot for these 20 measurements.
c. Give a brief interpretation of this summary. What does it tell you about the lengths of the pine needles?
Don’t forget that in order to build a Five-Number Summary or a box plot, you will need to order the list first!
Note 3: You should be aware that several different algorithms are commonly used to determine the values of quartiles. The concept is the same but the details of each method may differ.
For instance, the method we use in this course depends on a specific definition of “upper half” and “lower half.” If you have an odd number of data values, you do not include the median in either half. This has become the popular method for teaching statistics in schools. It is also the method used in NCTM literature. Some statistics books or teaching materials, however, may use a slightly different method. For example, if you have an odd number of data values, you might include the median in both “halves.” These two methods will sometimes produce the same or similar values for quartiles, but sometimes these values will be quite different, depending on the patterns and variation in your data.
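The two definitions of "halves" described in this note can be compared with a short Python sketch. The function names and the nine-value list are assumptions made for illustration; other textbooks and software packages use further variations that are not shown here.

```python
def median(ordered):
    """Median of an already ordered list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles_excluding_median(data):
    """Course method: for odd n, the median belongs to neither half."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    return median(ordered[:half]), median(ordered[n - half:])


def quartiles_including_median(data):
    """Alternative method: for odd n, the median is counted in both halves."""
    ordered = sorted(data)
    n = len(ordered)
    half = (n + 1) // 2              # one item larger than n // 2 when n is odd
    return median(ordered[:half]), median(ordered[n - half:])


# Hypothetical data with an odd number of values (n = 9).
values = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(quartiles_excluding_median(values))   # prints: (2.5, 7.5)
print(quartiles_including_median(values))   # prints: (3, 7)
```

When n is even, the median is not itself a data value, so both functions split the list the same way and return identical quartiles.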
Solution to Problem E1: Here are some examples. If there were 15 items (n = 15), the median would be in position (8). If n = 16, the median would be in position (8.5). If n = 17, the median would be in position (9). A general mathematical rule is that the position of the median is (n + 1) / 2, where n is the number of items in the list.
Solution to Problem E2: Since six is an even number, this is a case where you would need to draw a line to represent the position of Q1, the median of the six noodles to the left of Med. Using the formula (n + 1) / 2 from Problem E1 gives us (6 + 1) / 2 = 7 / 2 = 3.5. Therefore, position (3.5L), halfway between positions (3L) and (4L) from the low end of the noodles, is the position (though not the value) of Q1.
Solution to Problem E3: Again, you’ll need to draw a line to represent the position of Q3. As in Problem E2, the formula (n + 1) / 2 gives us (6 + 1) / 2 = 7 / 2 = 3.5. Therefore, position (3.5R), halfway between positions (3R) and (4R) from the high end of the noodles, is the position (though not the value) of Q3.
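The same position rule can be applied twice, once to the whole list and once to the items on one side of the median. The Python sketch below does this for the three list sizes used in this part. The function names are assumptions made for illustration, and the quartile rule follows the course's convention of leaving the median out of both halves when n is odd.

```python
def median_position(n):
    """Position of the median in an ordered list of n items."""
    return (n + 1) / 2


def quartile_position(n):
    """Position of Q1 counted from the low end (and of Q3 counted from the high end)."""
    half = n // 2                  # items on each side of the median
    return median_position(half)   # apply the same rule to one half


for n in (13, 15, 20):
    print(n, median_position(n), quartile_position(n))
# prints:
# 13 7.0 3.5
# 15 8.0 4.0
# 20 10.5 5.5
```

Only the count n is needed here; the data values play no role until the positions are looked up in the ordered list.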
Solution to Problem E4: Here is the Five-Number Summary:
• Min is in position (1L); Min = 13.
• Max is in position (1R); Max = 127.
• Med is in position (13 + 1) / 2 = (7); Med = 74.
• There are six positions to the left of (7), so Q1 is in position (6 + 1) / 2 = (3.5L). The value of Q1 is (28 + 33) / 2; Q1 = 30.5.
• There are six positions to the right of (7), so Q3 is in position (6 + 1) / 2 = (3.5R). The value of Q3 is (102 + 118) / 2; Q3 = 110.
Solution to Problem E5: First, number the positions as you did in Problem E4. The center position will be marked with an (8). Here is the Five-Number Summary:
• The minimum is in position (1L); Min = 10.
• The maximum is in position (1R); Max = 89.
• The median is in position (15 + 1) / 2 = (8); Med = 26.
• The first quartile is in position (7 + 1) / 2 = (4L); Q1 = 18.
• The third quartile is in position (4R); Q3 = 51.
Solution to Problem E6: Again, number the positions as you did in Problem E4. This time, there will be two values numbered (10) in the center of the ordered list. Here is the Five-Number Summary:
• The minimum is in position (1L); Min = 1.
• The maximum is in position (1R); Max = 53.
• The median is in position (20 + 1) / 2 = (10.5), which means it is the average of the two values numbered (10), or (17 + 20) / 2; Med = 18.5.
• The first quartile is in position (10 + 1) / 2 = (5.5L), which is (4 + 4) / 2; Q1 = 4.
• The third quartile is in position (5.5R), which is (34 + 38) / 2; Q3 = 36.
Solution to Problem E7: The ordered list is as follows:
a. Here is the Five-Number Summary:
Min = 37
Q1 = 48.5
Med = 70
Q3 = 109
Max = 120
b. Here is the box plot:
c. Based on these measurements:
• All pine needles have lengths between 37 mm and 120 mm.
• Approximately half the pine needles have lengths less than 70 mm.
• Approximately half the pine needles have lengths greater than 70 mm.
• Approximately half the pine needles have lengths between 48.5 mm and 109 mm.
• The widest range of needle lengths seems to be in the third quarter of the data (between Med and Q3), where 25% of the needles are between 70 mm and 109 mm.
• The longest and shortest needles fall in very tight ranges; the longest 25% of needles are between 109 mm and 120 mm, and the shortest 25% are between 37 mm and 48.5 mm.
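The comparison of ranges in this interpretation can be checked by subtracting neighboring values of the Five-Number Summary. The Python sketch below uses the five values reported above for the pine needles; the variable names are assumptions made for illustration.

```python
# Five-Number Summary reported above for the 20 pine needles (in mm).
summary = {"Min": 37, "Q1": 48.5, "Med": 70, "Q3": 109, "Max": 120}
labels = ["Min", "Q1", "Med", "Q3", "Max"]

# Width of each quarter of the data: the distance between neighboring summary values.
widths = {
    f"{low} to {high}": summary[high] - summary[low]
    for low, high in zip(labels, labels[1:])
}
for quarter, width in widths.items():
    print(quarter, width)
# prints:
# Min to Q1 11.5
# Q1 to Med 21.5
# Med to Q3 39
# Q3 to Max 11

# Width of the box in the box plot, which spans the middle half of the data.
box_width = summary["Q3"] - summary["Q1"]
print(box_width)   # prints: 60.5
```

The quarter between Med and Q3 spans 39 mm, while the lowest and highest quarters span only 11.5 mm and 11 mm.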