Learning Math: Data Analysis, Statistics, and Probability
Data Organization and Representation Part B: Line Plots (40 minutes)
In This Part: Counting Raisins
Let’s begin with a recap of Problem B7 from Session 1.
1. Ask a Question
How many raisins are in a half-ounce box of Brand X raisins? The weight of a box of raisins appears on the package, but the number of raisins in the box does not. In this activity, you will investigate how many raisins are in a box of a particular brand, which we will call Brand X.
2. Collect the Data
Determine the number of raisins in 17 boxes of raisins. Get some half-ounce boxes of raisins and try counting them yourself!
Using the data above, answer this question: How many raisins are in a half-ounce box of Brand X raisins? Answer the question in whatever manner seems the most descriptive.
You may have come up with a single number for the answer to the question above, or perhaps you came up with an interval for the answer. But because different boxes have different raisin counts, a single number will not provide a complete answer to the question. We cannot, for instance, say that a box contains 28 raisins — some do, but some don’t. The raisin counts vary from box to box, so answers to this question must consider the variation in the data.
Suppose we count the raisins in 17 half-ounce boxes of Brand Y raisins, and every one of the 17 boxes turns out to contain 28 raisins.
Answer this statistical question: How many raisins are in a half-ounce box of Brand Y raisins?
Problem B2 suggests that when there is no variation in the data, it is very easy to answer a statistical question about it.
Does Problem B2’s data strongly suggest that the next box will have 28 raisins? Does it prove that the next box will have 28 raisins? If so, why? If not, would there be a way to prove, statistically, that the next box must have 28 raisins?
Now go back to the question in Problem B1, and use that data to describe the raisin count in a box of Brand X raisins. This time, try to consider the variation in the number of raisins per box.
The raisin activity is adapted from Investigations in Number, Data, and Space, Grade 4. Copyright 1998 by Dale Seymour. Used with permission of Pearson Education, Inc.
In This Part: Making a Line Plot
As we mentioned before, looking at quantitative data (numbers that come from measurements) provides answers to statistical questions. We are mainly concerned with situations where the measurements differ; that is, where there is variation in the data. Our answers to statistical questions must take this variation into account, so we need appropriate tools for describing the differences in measurements.
One such tool is a graphical representation known as a line plot. In a line plot, we mark each possible value between the minimum and maximum data values and then stack dots above each of these values to represent actual counts. A line plot is sometimes called a dot plot.
Recall the raisin counts for 17 boxes of Brand X raisins:
To construct a line plot, we’ll begin by setting up the horizontal axis for this set of data. Since the lowest (minimum) value is 25 and the highest (maximum) value is 31, we’ll display this segment of the number line along the horizontal axis.
Next, for each raisin count, place a dot above its corresponding value on the horizontal axis. For example, to display the count of our first box of Brand X raisins, we put a dot above the number 29.
To complete the line plot, we’ll place a dot over the value 27, follow that with another dot over the value 27, and so forth, until there is a dot for each value in the data set.
Now construct the line plot for this raisin data by using a piece of paper to add the rest of the data to the line plot we began above.
When the line plot is complete, the number of dots above each value indicates the frequency, or the number of times, that this particular raisin count appears in the data.
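The construction steps above can be sketched in a few lines of Python. This is only an illustration, not part of the original activity. The frequency table FREQ is an assumption reconstructed from the counts quoted elsewhere in this session (one box of 25, two of 26, three of 27, five of 28, four of 29, and one each of 30 and 31); the original box-by-box order is not reproduced, and the function name line_plot is hypothetical.

```python
from collections import Counter

# Assumed frequencies, reconstructed from the counts quoted in this session.
FREQ = {25: 1, 26: 2, 27: 3, 28: 5, 29: 4, 30: 1, 31: 1}

# Expand the table into 17 individual raisin counts (sorted, not in box order).
counts = [value for value, n in FREQ.items() for _ in range(n)]


def line_plot(data):
    """Return a text line plot: one '*' per data value, stacked above the axis."""
    freq = Counter(data)
    lo, hi = min(data), max(data)
    axis = list(range(lo, hi + 1))  # every value from Min to Max, even unused ones
    tallest = max(freq.values())
    rows = []
    # Build the stacks from the top row down to the row just above the axis.
    for level in range(tallest, 0, -1):
        row = " ".join(" * " if freq[v] >= level else "   " for v in axis)
        rows.append(row.rstrip())
    # The horizontal axis: each value centered under its stack.
    rows.append(" ".join(f"{v:^3}" for v in axis))
    return "\n".join(rows)


print(line_plot(counts))
```

The height of each stack is the frequency of that raisin count. Because the axis is built from every integer between the minimum and the maximum, a value with no boxes would still get a position on the axis, just with no marks above it.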
Does it make a difference that there is at least one of each discrete value between the lowest and highest values in the data (i.e., every raisin count between 25 and 31 has at least one box)?
In This Part: Interpreting a Line Plot
Here is the line plot for the 17 raisin counts of Brand X raisins:
Let’s take a closer look at this graph. The horizontal axis corresponds to the number of raisins in a box. Each dot indicates one box of raisins, and the dots are placed above the numbers that indicate how many raisins are in the box. For instance, the four red dots tell us that four boxes contain 29 raisins. The two green dots tell us that two boxes contain 26 raisins.
It is important to note that the raisin counts are ordered on the horizontal axis. A proper interpretation of this graph depends on this ordering.
Use the line plot to answer the following questions:
a. What is the minimum (smallest) raisin count for a box of Brand X raisins?
b. What is the maximum (largest) raisin count for a box?
c. How many boxes have between 26 and 28 raisins, inclusively (i.e., including 26 and 28)?
d. How many boxes have between 25 and 31 raisins, inclusively (i.e., including 25 and 31)?
e. Which raisin count occurred most frequently?
f. How many boxes contain more than 29 raisins?
g. How many boxes contain 29 or fewer raisins?
h. How many boxes contain fewer than 26 raisins?
i. How many boxes contain 25 or fewer raisins?
j. How many boxes contain between 26 and 29 raisins, inclusively?
Look back at the answers you gave in Problem B6 (f) and (g). Are these answers related? If so, how? And why? What about your answers to Problem B6 (h) and (i)?
Based on your observations above, give three descriptive statements that provide an answer to the question “How many raisins are there in a half-ounce box of Brand X raisins?” At least two of your statements should take into account the variation in the data.
In this video segment, participants attempt to answer the question “How many raisins are there in a box of Brand X raisins?” by collecting data, analyzing data, and interpreting the results. Watch the segment after you have completed Problems B1-B8, and compare your strategy with the onscreen participants’.
Is there a lot of overall variation in the data collected by Georgina’s group? Are there places where there isn’t a lot of variation in this data?
You can find the first part of this segment on the session video approximately 12 minutes and 50 seconds after the Annenberg Media logo. The second part of this segment begins approximately 14 minutes and 42 seconds after the Annenberg Media logo.
In This Part: Intervals
When there is variation in data, a statistical question can have many different valid answers, because each answer must take this variation into account. Frequently, answers to statistical questions are given in the form of intervals, that is, ranges of data values. Here are two common ways to use intervals to answer statistical questions:
- Naming the interval in which all the data are located; that is, from the minimum data value (Min) to the maximum data value (Max). For example, in the Brand X raisin-count data, the interval is 25 to 31.
- Naming an interval with the highest concentration of data; that is, an interval with little variation that contains a lot of data. For example, in the raisin-count data, a large proportion (14/17) of the Brand X raisin counts are between 26 and 29 (inclusively); this interval is 26 to 29.
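Both kinds of interval can be checked with a short Python sketch. It is offered only as an illustration: the frequency table FREQ is an assumption reconstructed from the counts quoted in this session, and the helper names full_interval and share_in_interval are hypothetical.

```python
from fractions import Fraction

# Assumed frequencies, reconstructed from the counts quoted in this session.
FREQ = {25: 1, 26: 2, 27: 3, 28: 5, 29: 4, 30: 1, 31: 1}
counts = [value for value, n in FREQ.items() for _ in range(n)]


def full_interval(data):
    """Interval containing all the data: (Min, Max)."""
    return min(data), max(data)


def share_in_interval(data, low, high):
    """Proportion of the data lying between low and high, inclusively."""
    inside = sum(1 for x in data if low <= x <= high)
    return Fraction(inside, len(data))


print(full_interval(counts))              # (25, 31)
print(share_in_interval(counts, 26, 29))  # 14/17
```

The first interval is wide but certain for this sample; the second is narrower and still covers 14 of the 17 boxes.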
In this video segment, Professor Kader leads a discussion about two potential answers to the question “How many raisins are there in a box?”
Consider Paul’s and Phil’s answers to Professor Kader’s question. When might Paul’s answer be more useful? How about Phil’s? Which answer provides a better overall way of looking at the data? Why?
You can find this segment on the session video approximately 17 minutes and 08 seconds after the Annenberg Media logo.
Sometimes it’s useful to answer a statistical question with a single value that you’ve chosen to represent all of your data. The most frequently occurring value, the mode, may be a good choice. For example, in the Brand X raisin-count data, the most common raisin count is 28, which occurred five times. As we continue, we will encounter two other such representative values, the mean (the arithmetic average of the data set) and the median (the value in the exact center of an ordered list of data).
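As a small illustration, the mode can be found by tallying how often each count occurs. The sketch below assumes a frequency table reconstructed from the counts quoted in this session, and the function name modes is hypothetical. It returns a list because a data set can have more than one most-frequent value.

```python
from collections import Counter

# Assumed frequencies, reconstructed from the counts quoted in this session.
FREQ = {25: 1, 26: 2, 27: 3, 28: 5, 29: 4, 30: 1, 31: 1}
counts = [value for value, n in FREQ.items() for _ in range(n)]


def modes(data):
    """Return (most frequent values, how many times each of them occurs)."""
    freq = Counter(data)
    top = max(freq.values())
    return sorted(v for v, n in freq.items() if n == top), top


print(modes(counts))  # ([28], 5)
```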
If you are doing this activity hands-on, count and record the number of raisins in each of your 17 boxes. Then consider the following question:
- For your brand, do all the boxes have the same number of raisins? Would you say that variation is present in your data?
The goal of this section is to investigate a graphical representation that can help you better understand the variation in your data and provide various answers to this question: How many raisins are in a box of Brand X raisins?
The line plot provides a picture of the distribution of the raisin counts. It shows us what values the raisin-count variable takes and how often each value occurs. It also shows the pattern of variation in the data.
If you are working with real raisins, construct a line plot for your data.
If you are working in small groups, the groups can present their line plots to the whole group at this time.
Next, you will consider 10 questions about the distribution of the raisin counts. Note that these questions all deal with the values of the counts or how often they occur. For example, Problem B6 (c)-(j) are all concerned with how many times a specific count occurs or how many times the counts occur within a specified range. Problem B8 is concerned with the principle that statistical problem solving requires answering questions in the presence of variation.
If you are working in groups, you can now return to your small groups to answer the questions in Problem B6.
There are many possible answers. For example, we can be fairly confident that a box of Brand X raisins will have between 25 and 31 raisins.
We can be almost certain that a new box of Brand Y raisins will have 28 raisins.
Yes, the data strongly suggests that the next box will have 28 raisins, but it does not prove this. We would need to be sure that all boxes would have 28 raisins, not just our sample of 17 boxes. Statistically, there is no way to guarantee that the next box must have 28 raisins without sampling all boxes or without knowing something about the process involved in packaging Brand Y raisins. This is true regardless of how many boxes we sample; it only becomes more and more likely that the next box will have 28 raisins.
We can still be fairly confident that a box of Brand X raisins will have between 25 and 31 raisins. In looking at the variation, though, it is quite likely that a box of Brand X raisins will have between 27 and 29 raisins. Still, no single number can describe our expectation for the number of raisins in a box of Brand X raisins.
The range of possible data values is between 25 and 31 raisins, so the horizontal axis will include each number from 25 to 31:
Solution Graph B5
It would not matter if, for example, there were no values of 30 in the data set. All values in the range of possible data values must be included in the line plot, just as all numbers are indicated on a number line even though some of them may not be included in a list.
a. The minimum raisin count is the leftmost dot, which indicates 25 raisins.
b. The maximum raisin count is the rightmost dot, which indicates 31 raisins.
c. Counting the dots tells us that a total of 10 boxes contain between 26 and 28 raisins.
d. All 17 boxes do.
e. The most frequent count is the one with the tallest stack of dots, 28 raisins. Five boxes have this count.
f. Two boxes contain more than 29 raisins.
g. Fifteen boxes contain 29 or fewer raisins.
h. One box contains fewer than 26 raisins.
i. One box contains 25 or fewer raisins.
j. Fourteen boxes contain between 26 and 29 raisins.
For Problem B6 (f) and (g), if you already know how many boxes contain more than 29 raisins, all other boxes must contain 29 or fewer raisins. Instead of counting a large number of boxes for Problem B6 (g), you could subtract the answer to Problem B6 (f), which is 2, from the total number of boxes, 17, to get your answer, 15.
As for Problem B6 (h) and (i), they are identical questions because the data is discrete. There is no way to have between 25 and 26 raisins, so asking how many boxes have fewer than 26 raisins is the same as asking how many have 25 or fewer.
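Both relationships can be confirmed with a quick Python check. This is a sketch only, and the frequency table FREQ is an assumption reconstructed from the counts quoted in this session.

```python
# Assumed frequencies, reconstructed from the counts quoted in this session.
FREQ = {25: 1, 26: 2, 27: 3, 28: 5, 29: 4, 30: 1, 31: 1}
counts = [value for value, n in FREQ.items() for _ in range(n)]
total = len(counts)  # 17 boxes

# Problem B6 (f) and (g): the two groups are complements of each other.
more_than_29 = sum(1 for x in counts if x > 29)
at_most_29 = sum(1 for x in counts if x <= 29)
print(more_than_29, at_most_29, total - more_than_29)  # 2 15 15

# Problem B6 (h) and (i): with whole-number counts, "fewer than 26"
# and "25 or fewer" pick out exactly the same boxes.
fewer_than_26 = sum(1 for x in counts if x < 26)
at_most_25 = sum(1 for x in counts if x <= 25)
print(fewer_than_26, at_most_25)  # 1 1
```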
a. The box of raisins should have between 25 and 31 raisins.
b. The box of raisins is very likely to have between 26 and 29 raisins.
c. No single count is likely on its own: even 28, the most frequent count in our sample, appeared in only 5 of the 17 boxes, so a box is more likely to hold some other number of raisins than exactly 28.