Learning Math: Data Analysis, Statistics, and Probability
Session 4: Min, Max and the Five-Number Summary
Part A: The Data Set (20 minutes)
When working with a large collection of data, it can be difficult to keep an accurate picture of your data in mind. One way to make it easier to work with large data sets is to reduce the entire data set to just a few summary measures (or numbers that describe significant characteristics of the data). In this session, you will learn how to determine summary measures from ordered data. For convenience, we’ll be looking at small data sets, but these methods and interpretations apply to larger data sets as well.
For the following activities, you will need these materials:
• a package of spaghetti or linguine
• a metric ruler with millimeter markings
• three pieces of paper or cardboard
• a pen or pencil
How long is a broken piece of spaghetti?
Break several spaghetti noodles into pieces to obtain 11 noodles of varying lengths. Make sure that no two noodles in your set are the same length. Draw a horizontal line on a piece of paper or cardboard large enough to display all the noodles in a row. Next, arrange the 11 noodles in order from shortest to longest along the horizontal line, so that their lengths increase from left to right.
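Ordering the data is the step every later summary builds on. As a minimal sketch, assuming 11 made-up noodle lengths in millimetres (hypothetical values, not measurements from this session), the arrangement from shortest to longest is just a sort:

```python
# Hypothetical lengths (mm) of 11 broken noodles, in the order they were broken.
# These values are invented for illustration; use your own measurements.
lengths_mm = [92, 35, 148, 61, 120, 47, 183, 76, 104, 58, 131]

# The activity requires that no two noodles have the same length.
assert len(set(lengths_mm)) == len(lengths_mm), "two noodles share a length"

# Arrange from shortest to longest, like the row along the horizontal line.
ordered = sorted(lengths_mm)
print(ordered)  # [35, 47, 58, 61, 76, 92, 104, 120, 131, 148, 183]
```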
The Two-Noodle Summary
Two useful summary measures are the smallest (minimum) and largest (maximum) data values. To find these values in your ordered arrangement of noodles, remove all but the shortest and longest (keeping the others in size order for use later on).
Label the shortest “Min” (for minimum length) and label the longest “Max” (for maximum length).
We’ll refer to these two noodles (Min and Max) as the “Two-Noodle Summary.”
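In the ordered arrangement, the Two-Noodle Summary is simply the first and last noodle. A minimal sketch with hypothetical millimetre lengths (invented values; the variable names are illustrative only):

```python
# Hypothetical noodle lengths in millimetres (invented for illustration).
lengths_mm = [92, 35, 148, 61, 120, 47, 183, 76, 104, 58, 131]
ordered = sorted(lengths_mm)

# Shortest noodle is first in the ordered row; longest is last.
noodle_min = ordered[0]
noodle_max = ordered[-1]

# The other nine noodles, kept in size order for later use.
set_aside = ordered[1:-1]

# Python's built-ins give the same two values without sorting first.
assert noodle_min == min(lengths_mm) and noodle_max == max(lengths_mm)

print("Min:", noodle_min, "Max:", noodle_max)  # Min: 35 Max: 183
print(len(set_aside), "noodles set aside")     # 9 noodles set aside
```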
Problem A1: If you could see only Min and Max, what could you say about any of the other nine noodles in the set?
The Two-Number Summary
We will now determine the Two-Number Summary from the Two-Noodle Summary.
We add a vertical number line, mark the lengths of the two noodles on it, and then remove the noodles. The two marked values that remain are the Two-Number Summary.
Suppose we also recorded the length of the fourth noodle in the original set, N4, on the same vertical number line.
Problem A2: What can you say about the length of noodle N4, given the information in the Two-Number Summary?
Problem A3: If you knew only the values of Max and Min, describe some information you would not know about the remaining nine noodles.
Problem A4: Suppose someone asked you to find the “typical” value of the noodle data in Problems A1-A3. How would you answer this question? How would you answer it if you had only the information from the Two-Number Summary?
In this video segment, Professor Kader asks participants to identify the “typical” value in a data set. Watch this segment after completing Problem A4.
Note: The data set used by the onscreen participants is different from the one provided above.
How do participants define the “center” of a data set?
Answer to Problem A1: You would know that the lengths of the other nine noodles must be between the lengths of these two; in other words, none of the other nine noodles can be shorter than Min, and none of them can be longer than Max.
Answer to Problem A2: The length of noodle N4 must be between Min and Max.
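The answers to Problems A1 and A2 can be checked mechanically. The sketch below assumes hypothetical lengths and takes N4 to mean the fourth noodle counting from the shortest; both are illustrative assumptions, not data from this session.

```python
# Hypothetical noodle lengths in millimetres (invented for illustration).
lengths_mm = [92, 35, 148, 61, 120, 47, 183, 76, 104, 58, 131]
ordered = sorted(lengths_mm)

# The Two-Number Summary keeps only these two values.
summary_min, summary_max = ordered[0], ordered[-1]

# Problem A1: each of the other nine noodles lies between Min and Max.
others = ordered[1:-1]
all_between = all(summary_min <= x <= summary_max for x in others)

# Problem A2: N4, assumed here to be the 4th noodle from the shortest (index 3).
n4 = ordered[3]
n4_between = summary_min <= n4 <= summary_max

print(all_between, n4, n4_between)  # True 61 True
```

The check confirms the bounds but says nothing about where inside the interval the other lengths sit.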
Answer to Problem A3: You would not know the mean length or the median length. You would also not know whether the remaining nine noodles were closer to Min or to Max, only that they were between those values.
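To see how much the Two-Number Summary hides, here are two made-up sets of 11 distinct lengths that share the same Min and Max but have very different centers (hypothetical data, deliberately chosen to be extreme):

```python
from statistics import mean, median

# Two hypothetical sets of 11 distinct lengths (mm) with identical Min and Max.
near_min = [35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 183]          # bunched near Min
near_max = [35, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183]  # bunched near Max

for name, data in [("near_min", near_min), ("near_max", near_max)]:
    # Same Min and Max, but a different median and mean.
    print(name, min(data), max(data), median(data), round(mean(data), 1))
# near_min 35 183 40 52.5
# near_max 35 183 178 165.5
```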
Answer to Problem A4: If you had the actual noodles or knew their lengths, you could use the mean as a “typical” value; you find it by adding the lengths of all 11 noodles and dividing the sum by 11. You could also use the median, which is the length of the noodle in the center of the ordered list (the sixth of the 11 noodles). However, if you had only the information from the Two-Number Summary, your best answer would be the average of Max and Min. This number, sometimes called the midrange, can be very far from the mean and median, depending on how the noodle lengths are distributed.
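The three candidate “typical” values in the answer to Problem A4 can be computed side by side. A minimal sketch, assuming hypothetical millimetre lengths (invented values); `statistics.mean` and `statistics.median` from Python's standard library are used only as a cross-check:

```python
from statistics import mean, median

# Hypothetical noodle lengths in millimetres (invented for illustration).
lengths_mm = [92, 35, 148, 61, 120, 47, 183, 76, 104, 58, 131]
ordered = sorted(lengths_mm)
n = len(ordered)  # 11

# Mean: add all 11 lengths and divide the sum by 11.
mean_len = sum(ordered) / n

# Median: the middle noodle of the ordered list (the sixth of 11, index 5).
median_len = ordered[n // 2]

# Midrange: needs only the Two-Number Summary (Min and Max).
midrange = (ordered[0] + ordered[-1]) / 2

# Cross-check against the standard library.
assert abs(mean_len - mean(lengths_mm)) < 1e-9
assert median_len == median(lengths_mm)

print(round(mean_len, 1), median_len, midrange)  # 95.9 92 109.0
```

The index `n // 2` picks a single middle value only because n is odd; with an even number of noodles the median would be the average of the two middle lengths. In this invented data the midrange (109.0) lies above both the mean and the median, because the longest noodle is well above the next-longest.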