Learning Math: Data Analysis, Statistics, and Probability
Variation About the Mean, Part E: Measuring Variation (45 minutes)
In This Part: Mean Absolute Deviation (MAD)
We will now focus on how to measure variation from the mean within a data set. There are several different ways to do this. The first measure we will explore is called the mean absolute deviation, or MAD.
Three line plots are pictured below; each has 9 values, and the mean of each is 5:
[Figure: Line Plots A, B, and C]
Problem E1. Of the three, which line plot’s data has the least variation from the mean? Which has the most variation from the mean?
Tip: The plot with the most variation will have data values that are, in general, farthest away from the mean. The plot with the least variation will generally have the values closest to the mean.
From working on Problem E1, you probably got an intuitive sense of the variation in the three data sets. But is there a way to measure exactly how much the values in a line plot differ from the mean?
Recall that in Part B, we described the unfairness of allocations of coins by counting the number of moves required to transform an ordered allocation into an equal-shares allocation. The idea of variation from the mean is related to the idea of fairness in the coin allocation.
For example, consider this ordered allocation:
Eight moves are required to make this allocation fair. This is true because there is an excess of 8 coins above the mean from 4 stacks (+1, +1, +3, +3), and a deficit of 8 coins below the mean from 4 other stacks (-3, -2, -2, -1).
The number of moves required to make an allocation fair tells us how much the original allocation differs from the fair allocation and thus gives us a measure of the variation in our data. (The fair allocation has no variation — no moves = no variation.)
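A minimal Python sketch of this count, assuming each move transfers a single coin from one stack to another. The stack sizes in `STACKS_A` are not listed in the text; they are reconstructed from the deviations given above (mean 5, with the ninth stack sitting at the mean), and `moves_to_fair` is a hypothetical helper name.

```python
# Line Plot A, reconstructed from the deviations in the text:
# mean 5, deviations +1, +1, +3, +3, -3, -2, -2, -1, and one 0.
STACKS_A = [2, 3, 3, 4, 5, 6, 6, 8, 8]


def moves_to_fair(stacks):
    """Count the one-coin moves needed to reach an equal-shares allocation.

    Assumes the coins can be shared equally, i.e. the mean is a whole number.
    """
    total = sum(stacks)
    if total % len(stacks) != 0:
        raise ValueError("coins cannot be shared equally without splitting them")
    mean = total // len(stacks)
    excess = sum(s - mean for s in stacks if s > mean)
    deficit = sum(mean - s for s in stacks if s < mean)
    assert excess == deficit  # the mean is the balance point
    # Each move can carry at most one surplus coin to a short stack,
    # so the minimum number of moves equals the total excess.
    return excess


print(moves_to_fair(STACKS_A))  # 8
```

Because a single move relocates at most one surplus coin, no sequence of moves can finish in fewer steps than the total excess, which is why the function only needs to add up the amounts above the mean.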
Here is the line plot that corresponds to this allocation:
Here are the deviations from the mean for each value in the set (i.e., how much each value differs from the mean):
Now consider only the magnitude of these deviations — that is, forget for the moment whether they are positive or negative. These are called the absolute deviations. The absolute deviations for this set are plotted below:
We will now find the mean of these absolute deviations, which tells us how far, on average, the values in our data are from the mean. As usual, find the mean by adding all the absolute deviations and then dividing by how many there are. Here is a table for this calculation:
The mean of these absolute deviations — the MAD (Mean Absolute Deviation) — is 16 / 9 = 1 7/9, or approximately 1.78. This measure tells us how much, on average, the values in a line plot differ from the mean. If the MAD is small, it tells us that the values in the set are clustered closely around the mean. If it is large, we know that at least some values are quite far away from the mean.
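The table for this calculation is not reproduced here, so the following Python sketch rebuilds it under one assumption: `VALUES_A` is reconstructed from the deviations of this allocation rather than copied from the original figure. The helper names are hypothetical, and `fractions.Fraction` keeps the arithmetic exact so the result prints as 16/9 instead of a rounded decimal.

```python
from fractions import Fraction

# Line Plot A, reconstructed from its deviations (mean 5).
VALUES_A = [2, 3, 3, 4, 5, 6, 6, 8, 8]


def mean_absolute_deviation(values):
    """Return the MAD as an exact fraction: the mean of |value - mean|."""
    mean = Fraction(sum(values), len(values))
    absolute_deviations = [abs(v - mean) for v in values]
    return sum(absolute_deviations) / len(values)


def print_mad_table(values):
    """Print each value with its deviation and absolute deviation."""
    mean = Fraction(sum(values), len(values))
    print(f"{'value':>5} {'deviation':>9} {'abs. dev.':>9}")
    for v in values:
        d = v - mean
        print(f"{v:>5} {str(d):>9} {str(abs(d)):>9}")
    print(f"MAD = {mean_absolute_deviation(values)}")


print_mad_table(VALUES_A)
mad = mean_absolute_deviation(VALUES_A)
print(mad, round(float(mad), 2))  # 16/9 1.78
```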
In this video segment, Professor Kader introduces the MAD as a method for quantifying variation. Watch this segment to review the process of finding the MAD.
How can the MAD be used to compare different distributions of data?
Problem E2. Below is Line Plot B from Problem E1. Create a table like the one above, find the MAD for this allocation, and compare it to the MAD of Line Plot A from the same problem.
Problem E3. Below is Line Plot C from Problem E1. Create another table, find the MAD for this allocation, and compare it to the MADs of Line Plots A and B.
In This Part: Working with the MAD
Use the Interactive Activity below or your paper/poster board to answer the questions in Problems E4-E6. You will be asked to form several arrangements with a specified total for the absolute deviations. You might want to write the size of each absolute deviation on each adhesive dot or note to help you determine whether you have the desired result.
For example, here is Line Plot A from the earlier example, with the size of each absolute deviation written on its dot:
In the following problems, the mean is 5, and there are 9 values in the data set. See if you can find more than one arrangement for each description.
Problem E4. Create a line plot with a MAD of 24 / 9.
Problem E5. Create a line plot with a MAD of 22 / 9, with no 5s.
Problem E6. Create a line plot with a MAD of 12 / 9, with exactly two 5s, 5 values larger than 5, and 2 values smaller than 5.
Explain why it is not possible to create a line plot with 9 values that has a MAD of 1.
In This Part: Variance and Standard Deviation
The MAD is a measure of the variation in a data set about the mean. Professional statisticians more commonly use two other measures of variation: the variance and the standard deviation.
The method for calculating variance is very similar to the method you just used to calculate the MAD. First, let’s go back to Line Plot A from Problem E1:
The first step in calculating the variance is the same one you used to find the MAD: Find the deviation for each value in the set (i.e., how much each value differs from the mean). The deviations for this data set are plotted below:
The next step in calculating the variance is to square each deviation. Note the difference between this and the MAD, which requires us to find the absolute value of each deviation. The squares of the deviations for this data set are plotted below:
The final step is to find the variance by calculating the mean of the squares. As usual, find the mean by adding all the values and then dividing by how many there are. Here is a table for this calculation:
The mean of the squared deviations is 38 / 9 = 4 2/9, or approximately 4.22. This value is the variance for this data set. As with the MAD, the variance is a measure of variation about the mean. Data sets with more variation will have a higher variance.
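The three steps translate directly into a short Python sketch. It again uses `VALUES_A`, reconstructed from the deviations of Line Plot A (an assumption, since the figure is not reproduced), and exact fractions so the result matches 38 / 9; the function name `variance` is illustrative.

```python
from fractions import Fraction

# Line Plot A, reconstructed from its deviations (mean 5).
VALUES_A = [2, 3, 3, 4, 5, 6, 6, 8, 8]


def variance(values):
    """Mean of the squared deviations, dividing by n as in this course."""
    mean = Fraction(sum(values), len(values))
    deviations = [v - mean for v in values]   # step 1: find each deviation
    squared = [d * d for d in deviations]     # step 2: square each deviation
    return sum(squared) / len(values)         # step 3: mean of the squares


var_a = variance(VALUES_A)
print(var_a, round(float(var_a), 2))  # 38/9 4.22
```

This follows the course's definition, which divides by the number of values n. Python's `statistics.pvariance` uses the same definition, while `statistics.variance` divides by n - 1 and so would give a larger number for these data.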
Because the variance is the average of the squared deviations, it is expressed in squared units (for example, coins squared), which makes it hard to read as a typical distance from the mean. To return to the original units and gauge a typical (or standard) deviation, we take the square root of the variance. This measure, the square root of the variance, is called the standard deviation for a data set.
For the data set given above, the standard deviation is the square root of 4.22, which is approximately 2.05. Note that this value is fairly close to the MAD of 1.78 that we calculated earlier.
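As a cross-check, here is a floating-point Python sketch that takes the square root of the variance and compares it with the standard library's `statistics.pstdev`, which also divides by n. `VALUES_A` is reconstructed from the deviations of Line Plot A, an assumption rather than data copied from the figure.

```python
import math
import statistics

# Line Plot A, reconstructed from its deviations (mean 5).
VALUES_A = [2, 3, 3, 4, 5, 6, 6, 8, 8]

mean = statistics.fmean(VALUES_A)
variance = sum((v - mean) ** 2 for v in VALUES_A) / len(VALUES_A)
std_dev = math.sqrt(variance)  # square root brings us back to the original units
mad = sum(abs(v - mean) for v in VALUES_A) / len(VALUES_A)

# statistics.pstdev uses the same divide-by-n definition.
assert math.isclose(std_dev, statistics.pstdev(VALUES_A))

print(f"variance = {variance:.2f}, SD = {std_dev:.2f}, MAD = {mad:.2f}")
# variance = 4.22, SD = 2.05, MAD = 1.78
```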
The standard deviation, first introduced in the late 19th century, has become the most frequently used measure of variation in statistics today. For example, the SAT is scaled so that its mean is 500 points and its standard deviation is 100 points. IQ tests are created with an expected mean of 100 and a standard deviation of 15.
Below is Line Plot B from Problem E1. Create a table like the one above, and find the variance and standard deviation for this allocation. Compare the standard deviation to the MAD of Line Plot B you found in Problem E2 and to the standard deviation of Line Plot A.
Tip: Remember that in these problems, the mean is 5. Calculate the variance first, then take its square root to find the standard deviation.
Below is Line Plot C from Problem E1. Create another table, and find the variance and standard deviation for this allocation. Compare the standard deviation to the MAD of Line Plot C you found in Problem E3 and to the standard deviations of Line Plots A and B.
Again, remember to calculate the variance first, then take its square root.
a. What would happen to the mean of a data set if you added 3 to every number in it?
b. What would happen to the MAD of a data set if you added 3 to every number in it?
c. What would happen to the variance of a data set if you added 3 to every number in it?
d. What would happen to the standard deviation of a data set if you added 3 to every number in it?
e. What would happen to the mean of a data set if you doubled every number in it?
f. What would happen to the MAD of a data set if you doubled every number in it?
g. What would happen to the variance of a data set if you doubled every number in it?
h. What would happen to the standard deviation of a data set if you doubled every number in it?
In this video segment, Andrea Rex, director of the Massachusetts Water Resources Authority, discusses the use of statistics in assessing water quality in Boston Harbor. Watch this segment for a real-world application of the mean and standard deviation.
How does Andrea Rex use the mean and standard deviation to assess water quality?
Part E investigates how deviations from the mean can be used to develop a summary measure of the degree of variation in your data. Is there a single number that can describe how much the values in a line plot differ from the mean?
The line plot representation helps us develop an understanding of the mean absolute deviation (MAD). As you become more familiar with the MAD, take some time to think about how this numerical measure relates to your intuitive sense of the degree of variation in the line plots you’re working with.
The idea of variation from the mean is related to the idea of fairness in an allocation of coins, which you discussed earlier in this session. Think back to the method of determining the unfairness of an allocation — counting the number of moves required to transform an ordered allocation to an equal-shares allocation. Here’s the connection: The sum of the absolute deviations is equal to twice this required number of moves.
The absolute deviations balance out because the mean is the “balance point” for the set: the total absolute deviation of the values above the mean equals the total absolute deviation of the values below it, so each side accounts for half of the sum. When a move is made, one coin goes from a stack above the mean to a stack below the mean. This reduces two absolute deviations by 1 each: the one for the stack above the mean and the one for the stack below. Since each move reduces the sum of the absolute deviations by 2, and the allocation is fair exactly when that sum reaches 0, the sum of the absolute deviations must be twice the required number of moves.
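The argument can be checked with a small Python simulation. The hypothetical function `equalize` repeatedly moves one coin from the tallest stack (which must be above the mean) to the shortest stack (which must be below it) and asserts that every move lowers the sum of absolute deviations by exactly 2. The stacks for Line Plots A and B are reconstructed from numbers given in this part (A from its deviations, B from its reported MAD and variance), not copied from the figures.

```python
def equalize(stacks):
    """Move one coin at a time from above the mean to below it.

    Returns (starting sum of absolute deviations, number of moves) and checks
    that each move lowers that sum by exactly 2. Assumes a whole-number mean.
    """
    stacks = list(stacks)  # work on a copy
    mean, remainder = divmod(sum(stacks), len(stacks))
    if remainder:
        raise ValueError("the mean is not a whole number of coins")

    def total_abs():
        return sum(abs(s - mean) for s in stacks)

    start = total_abs()
    moves = 0
    while total_abs() > 0:
        donor = stacks.index(max(stacks))     # tallest stack, above the mean
        receiver = stacks.index(min(stacks))  # shortest stack, below the mean
        before = total_abs()
        stacks[donor] -= 1
        stacks[receiver] += 1
        moves += 1
        assert before - total_abs() == 2      # each move removes exactly 2
    return start, moves


for name, stacks in [("A", [2, 3, 3, 4, 5, 6, 6, 8, 8]),
                     ("B", [4, 4, 5, 5, 5, 5, 5, 6, 6])]:
    start, moves = equalize(stacks)
    print(name, start, moves)
# A 16 8
# B 4 2
```

The argument depends on every move going from above the mean to below it. A move between two stacks on the same side of the mean leaves the sum unchanged, so such moves never help reach the minimum.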
By inspection, Line Plot B has the least variation and Line Plot C has the most. Remember that variation is measured from the mean, not between the values themselves; Line Plot C has the most variation because its values lie farthest from the mean.
The MAD is 4 / 9, or approximately 0.44. This is much smaller than the MAD for Line Plot A, which indicates that the values in this allocation are much more closely grouped around the mean.
The MAD is 40 / 9, or approximately 4.44. This is more than twice the MAD for Line Plot A and 10 times as large as the MAD for Line Plot B. The much larger MAD indicates that the values of Line Plot C are very far from the mean, as compared to the other two.
Answers will vary. Here is one possible line plot:
Answers will vary. Here is one possible line plot:
Answers will vary. Here is one possible line plot:
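The example line plots for these answers are not reproduced here. As a substitute, this Python sketch checks whether a proposed arrangement meets a problem's conditions. The three lists are hypothetical answers constructed for illustration, one of many valid choices for each problem, and not the arrangements pictured in the original solutions; `check_arrangement` is likewise a hypothetical helper.

```python
def check_arrangement(values, abs_dev_total, mean=5, size=9):
    """True if values has the given size and mean, and MAD = abs_dev_total / size."""
    return (len(values) == size
            and sum(values) == mean * size
            and sum(abs(v - mean) for v in values) == abs_dev_total)


# Hypothetical candidate answers (one of many possibilities for each problem).
e4 = [1, 1, 1, 5, 5, 8, 8, 8, 8]   # MAD 24/9
e5 = [2, 3, 3, 3, 3, 7, 8, 8, 8]   # MAD 22/9, no 5s
e6 = [2, 2, 5, 5, 6, 6, 6, 6, 7]   # MAD 12/9, two 5s, five above 5, two below

assert check_arrangement(e4, 24)
assert check_arrangement(e5, 22) and 5 not in e5
assert check_arrangement(e6, 12) and e6.count(5) == 2
assert sum(v > 5 for v in e6) == 5 and sum(v < 5 for v in e6) == 2
print("all candidates check out")
```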
It is impossible because the sum of the absolute deviations is always twice the number of moves needed to make the allocation fair, so it must be an even number. The MAD is this sum divided by 9, the number of values. For the MAD to equal 1, the sum of the absolute deviations would have to be exactly 9 (9 / 9 = 1). That could happen only if the total excess and the total deficit were each equal to 4.5, which would require splitting coins, and that cannot be done.
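An exhaustive search in Python illustrates the argument for one assumed setting: 9 whole-number values from 0 to 10 (a hypothetical range; the argument itself does not depend on it) with a mean of 5. It collects every achievable sum of absolute deviations.

```python
from itertools import combinations_with_replacement

# Hypothetical setting: 9 whole-number values from 0 to 10 with mean 5.
LOW, HIGH, SIZE, MEAN = 0, 10, 9, 5

# Collect every achievable sum of absolute deviations over all line plots
# (multisets of values) whose mean is exactly 5.
achievable = set()
for values in combinations_with_replacement(range(LOW, HIGH + 1), SIZE):
    if sum(values) == MEAN * SIZE:
        achievable.add(sum(abs(v - MEAN) for v in values))

print(min(achievable), max(achievable))             # 0 40
print(9 in achievable)                              # False: no MAD of 9 / 9 = 1
print(all(total % 2 == 0 for total in achievable))  # True: every sum is even
```

The search only confirms the claim for that range; the parity argument (each move changes the sum by 2) is what makes it hold in general.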
The variance is 4 / 9, or approximately 0.44.
The standard deviation, the square root of the variance, is 2/3, or approximately 0.67. The standard deviation is somewhat higher than the MAD (which is 0.44) and much smaller than the standard deviation of Line Plot A (which is 2.05). The large difference in the standard deviations indicates that the values in Line Plot B are much more closely clustered around the mean.
The variance for Line Plot C is 180 / 9 = 20.
The standard deviation — the square root of 20 — is approximately 4.47. This is very close to the MAD calculated in Problem E3 (4.44), and is much higher than the standard deviations for Line Plot A (2.05) and Line Plot B (0.67).
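To compare the two measures side by side, the following Python sketch computes the MAD and standard deviation for each plot. Line Plot A is reconstructed from its deviations. Line Plot B is forced by its answers: because its MAD and variance are both 4 / 9, every nonzero whole-number deviation d must satisfy |d| = d², so each is +1 or -1. Line Plot C cannot be recovered uniquely, so the list used for it is a hypothetical arrangement that merely matches the reported MAD (40 / 9) and variance (20).

```python
import math


def mad_and_sd(values):
    """Return (MAD, standard deviation), both dividing by n."""
    n = len(values)
    mean = sum(values) / n
    mad = sum(abs(v - mean) for v in values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return mad, sd


plots = {
    "A": [2, 3, 3, 4, 5, 6, 6, 8, 8],                  # reconstructed from deviations
    "B": [4, 4, 5, 5, 5, 5, 5, 6, 6],                  # forced by MAD = variance = 4/9
    "C (one possibility)": [1, 1, 1, 1, 1, 10, 10, 10, 10],  # hypothetical match
}
for name, values in plots.items():
    mad, sd = mad_and_sd(values)
    print(f"{name}: MAD = {mad:.2f}, SD = {sd:.2f}")
    assert sd >= mad  # the SD is never smaller than the MAD
# A: MAD = 1.78, SD = 2.05
# B: MAD = 0.44, SD = 0.67
# C (one possibility): MAD = 4.44, SD = 4.47
```

The assertion reflects a general fact: the standard deviation is never smaller than the MAD, with equality only when all of the absolute deviations are the same.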
a. The mean would increase by 3.
b. The MAD would not change. Since the values in the list are each 3 larger, and the mean is also 3 larger, the deviations from the mean would remain the same.
c. The variance would not change, since it depends only on the deviations from the mean, not the values themselves. Since the mean increases by 3 along with the rest of the data set, none of the deviations will change.
d. Since the standard deviation is the square root of the (unchanged) variance, it will not change.
e. The mean would be doubled.
f. The MAD would be doubled, since all the deviations are now doubled, and the MAD is the average of these deviations.
g. The variance would be multiplied by 4. Since calculating the variance involves squaring the deviations, the newly doubled deviations would all be squared, resulting in values that are four times as large. For example, if a deviation was (+3), it now becomes (+6). The value used in the variance calculation changes from 3² = 9 to 6² = 36, which is four times as large.
h. The standard deviation would be doubled, since it is the square root of the variance.
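These answers can be spot-checked numerically. The Python sketch below applies both transformations to Line Plot A (reconstructed from its deviations, an assumption) and asserts each of the eight answers. A check on one data set illustrates the rules but does not prove them; the reasoning in the answers is what makes them hold in general. The helper name `summary` is hypothetical.

```python
import math
import statistics


def summary(values):
    """Mean, MAD, variance, and SD, all dividing by n as in this course."""
    mean = statistics.fmean(values)
    mad = statistics.fmean(abs(v - mean) for v in values)
    var = statistics.pvariance(values)
    return mean, mad, var, math.sqrt(var)


data = [2, 3, 3, 4, 5, 6, 6, 8, 8]  # Line Plot A, reconstructed from the text
base = summary(data)
shifted = summary([v + 3 for v in data])
doubled = summary([2 * v for v in data])

# a-d: adding 3 raises the mean by 3 and leaves every spread measure alone.
assert math.isclose(shifted[0], base[0] + 3)
assert all(math.isclose(s, b) for s, b in zip(shifted[1:], base[1:]))

# e-h: doubling doubles the mean, MAD, and SD, and multiplies the variance by 4.
assert math.isclose(doubled[0], 2 * base[0])
assert math.isclose(doubled[1], 2 * base[1])
assert math.isclose(doubled[2], 4 * base[2])
assert math.isclose(doubled[3], 2 * base[3])
print("all eight answers confirmed for this data set")
```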