In This Part: The Data Set
When working with a large collection of data, it can be difficult to keep an accurate picture of your data in mind. One way to make large data sets easier to work with is to reduce the entire data set to just a few summary measures (numbers that describe significant characteristics of the data). In this session, you will learn how to determine summary measures from ordered data. For convenience, we’ll be looking at small data sets, but these methods and interpretations apply to larger data sets as well.
For the following activities, you will need these materials:
• a package of spaghetti or linguine
• a metric ruler with millimeter markings
• three pieces of paper or cardboard
• a pen or pencil
How long is a broken piece of spaghetti?
Break several spaghetti noodles into pieces to obtain 11 noodles of varying lengths. Make sure that no two noodles in your set are the same length. Draw a horizontal line on a piece of paper or cardboard large enough to display all the noodles in a row. Next, arrange the 11 noodles in order from shortest to longest along the horizontal line. Your arrangement should look something like this:
In This Part: The Two-Noodle Summary
Two useful summary measures are the smallest (minimum) and largest (maximum) data values. To find these values in your ordered arrangement of noodles, set aside all but the shortest and longest noodles (keep the others in size order for use later on).
Label the shortest “Min” (for minimum length) and label the longest “Max” (for maximum length):
We’ll refer to these two noodles (Min and Max) as the “Two-Noodle Summary.”
Problem A1
If you could see only Min and Max, what could you say about any of the other nine noodles in the set?
In This Part: The Two-Number Summary
Here is our Two-Noodle Summary:
We will now determine the Two-Number Summary from the Two-Noodle Summary.
We will add a vertical axis, mark the lengths of the two noodles on it (left), and then remove the noodles. What remains is the Two-Number Summary (right).
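As a rough sketch of the same step in code, the following Python snippet reduces a list of lengths to its Two-Number Summary. The lengths (in millimeters) and the function name two_number_summary are made up for illustration; they are not measurements from this activity.

```python
# Hypothetical noodle lengths in millimeters (illustrative values only,
# not data from the activity). All 11 lengths are different.
lengths_mm = [92, 31, 140, 58, 23, 105, 79, 163, 47, 118, 64]


def two_number_summary(values):
    """Return (Min, Max) for a non-empty collection of numbers."""
    ordered = sorted(values)        # arrange from shortest to longest
    return ordered[0], ordered[-1]  # first is Min, last is Max


min_mm, max_mm = two_number_summary(lengths_mm)
print(f"Min = {min_mm} mm, Max = {max_mm} mm")
```

Sorting mirrors lining the noodles up from shortest to longest; Python's built-in min() and max() would give the same two numbers without sorting.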
If we recorded the length of the fourth noodle in the original set on the same vertical number line, it might look something like this:
Problem A2
What can you say about the length of noodle N4, given the information in the Two-Number Summary?
Problem A3
If you knew only the values of Max and Min, describe some information you would not know about the remaining nine noodles.
Problem A4
Suppose someone asked you to find the “typical” value of the noodle data in Problems A1-A3. How would you answer this question? How would you answer this question if you only had the information from the Two-Number Summary?
Video Segment
In this video segment, Professor Kader asks participants to identify the “typical” value in a data set. Watch this segment after completing Problem A4.
Note: The data set used by the onscreen participants is different from the one provided above.
How do participants define the “center” of a data set?
Solutions
Problem A1
You would know that the lengths of the other nine noodles must be between the lengths of these two; in other words, none of the other nine noodles can be shorter than Min, and none of them can be longer than Max.
Problem A2
The length of noodle N4 must be between Min and Max.
Problem A3
You would not know the mean length or the median length. You would not know whether the remaining nine noodles were closer to Min or to Max — only that they were between those values.
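To illustrate, here is a small Python sketch with two made-up sets of 11 lengths (in millimeters) that share the same Min and Max. The values and the names mostly_short and mostly_long are assumptions chosen for the example, not data from the activity.

```python
from statistics import median

# Two hypothetical sets of 11 lengths (mm) with identical Min and Max.
mostly_short = [20, 22, 25, 27, 30, 33, 36, 40, 45, 50, 160]
mostly_long = [20, 130, 135, 138, 142, 145, 148, 151, 154, 157, 160]

for name, data in [("mostly_short", mostly_short), ("mostly_long", mostly_long)]:
    # Min and Max match for both sets, but the median does not.
    print(name, "Min:", min(data), "Max:", max(data), "median:", median(data))
```

Both sets have the Two-Number Summary (20, 160), yet the median is 33 in the first set and 145 in the second, so the summary alone cannot tell them apart.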
Problem A4
If you had the actual noodles or knew their lengths, you could use the mean as a “typical” value, which you find by adding the lengths of all 11 noodles and dividing the sum by 11. You could also use the median, which is the length of the noodle in the center of the ordered list (i.e., the sixth noodle). However, if you had only the information from the Two-Number Summary, your best answer would be the average of Max and Min. This number, which is sometimes called the midrange, can turn out to be very far away from the mean and median, depending on how the noodle lengths are distributed.
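The three candidates for a “typical” value can be compared with a short Python sketch. The lengths below (in millimeters) are hypothetical and deliberately include one very long noodle so that the midrange lands far from the other two measures.

```python
from statistics import mean, median

# Hypothetical lengths in millimeters, ordered from shortest to longest.
lengths_mm = [18, 21, 24, 26, 29, 32, 35, 41, 48, 56, 220]

mean_mm = mean(lengths_mm)      # sum of all 11 lengths divided by 11
median_mm = median(lengths_mm)  # the sixth of the 11 ordered values
midrange_mm = (min(lengths_mm) + max(lengths_mm)) / 2  # average of Min and Max

print(f"mean = {mean_mm:.1f} mm")
print(f"median = {median_mm:.1f} mm")
print(f"midrange = {midrange_mm:.1f} mm")
```

For this made-up set the mean is 50 mm and the median is 32 mm, while the midrange is 119 mm, because the single 220 mm noodle pulls Max far above the rest of the data.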