
Private: Learning Math: Data Analysis, Statistics, and Probability

Designing Experiments Part B: Comparative Observational Studies (35 minutes)

In This Part: A New Raisin Question
Let’s begin our investigation of comparative observational studies by returning to the raisin problem from Session 2 and the homework in Session 4. In this session, you will revisit the question of comparing two different brands of raisins: When the boxes weigh the same, how does the number of raisins per box compare between the two brands?

1. Ask a Question
How does the number of raisins per box compare between Brand C and Brand D?

2. Collect Appropriate Data
We counted the raisins in 28 boxes of Brand C and 36 boxes of Brand D. Here are the ordered raisin counts for each brand:

 

 

3. Analyze the Data
Here are the mean and median counts for each brand:

 

 

 

According to these data, Brand D typically has a few more raisins than Brand C. On average, Brand D has two more raisins than Brand C, and the median number of Brand D raisins (29) is one more than the median number of Brand C raisins (28).

Based on the means and medians alone, you might conclude that the two brands differ only slightly, by a raisin or two per box. Although it is useful to look at the means and medians, there are other aspects of the distributions you might want to consider.


Problem B1
Why is this raisin study observational as opposed to experimental?


In This Part: Using Five-Number Summaries and Box Plots
Comparing two sets of measurements is not quite as simple as comparing two numbers. Because we are comparing a set of 28 measurements for Brand C with a set of 36 measurements for Brand D, any comparison must be based on percentages rather than absolute frequencies. A comparison of the Five-Number Summaries is useful, since these quantities divide the ordered data into four groups, with approximately 25% of the data in each group. Here are the Five-Number Summaries for these data (see Note 3):

| Brand | Min | Q1 | Med | Q3 | Max |
|---|---|---|---|---|---|
| C | 25 | 26 | 28 | 29 | 32 |
| D | 23 | 27 | 29 | 33 | 38 |
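For readers who want to reproduce a summary like this from raw counts, here is a minimal Python sketch. It assumes the "median of each half" rule for quartiles (the middle value is left out of both halves when the number of counts is odd); that convention is our assumption, and statistical software often uses other quartile rules that can give slightly different Q1 and Q3 values. The function name `five_number_summary` is hypothetical, and because the raw Brand C and Brand D counts are not reproduced here, the example uses a small made-up list.

```python
from statistics import median


def five_number_summary(counts):
    """Return (Min, Q1, Med, Q3, Max) using the median-of-halves rule for quartiles."""
    data = sorted(counts)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two counts")
    half = n // 2
    lower = data[:half]             # lower half of the ordered counts
    upper = data[half + n % 2:]     # upper half; the middle value is left out when n is odd
    return (data[0], median(lower), median(data), median(upper), data[-1])


# Made-up counts for illustration only (not the Brand C or Brand D data)
example = [24, 25, 25, 26, 27, 28, 28, 29, 30]
print(five_number_summary(example))   # (24, 25.0, 27, 28.5, 30)
```

Because each quartile marks roughly the same percentage of the data whatever the sample size, the same function summarizes 28 boxes or 36 boxes on a comparable scale.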

 

 

Here are the comparative box plots for these data:

[Figure: comparative box plots of the raisin counts for Brands C and D]

You might start by comparing the actual values in the Five-Number Summaries, which tell you where one set of measurements is located relative to the other.

Note that with the exception of the minimum values, all summary measures for Brand D are higher than those for Brand C. This suggests that boxes of Brand D tend to have more raisins than boxes of Brand C. In fact, since the third quartile for Brand D (33) is greater than the maximum for Brand C (32), roughly 25% or more of the Brand D boxes contain more raisins than any box of Brand C.
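The value-by-value comparison described above can be sketched in a few lines of Python. The helper name `compare_summaries` is hypothetical; the two tuples are the Five-Number Summaries for Brands C and D given in this session, in the order (Min, Q1, Med, Q3, Max).

```python
LABELS = ("Min", "Q1", "Med", "Q3", "Max")


def compare_summaries(first, second):
    """For each of the five measures, say whether `second` is higher, lower, or equal to `first`."""
    result = {}
    for label, a, b in zip(LABELS, first, second):
        if b > a:
            result[label] = "higher"
        elif b < a:
            result[label] = "lower"
        else:
            result[label] = "equal"
    return result


brand_c = (25, 26, 28, 29, 32)   # (Min, Q1, Med, Q3, Max)
brand_d = (23, 27, 29, 33, 38)

print(compare_summaries(brand_c, brand_d))
# {'Min': 'lower', 'Q1': 'higher', 'Med': 'higher', 'Q3': 'higher', 'Max': 'higher'}

# Brand D's Q3 lies above Brand C's maximum, so roughly the top quarter
# of Brand D boxes hold more raisins than any Brand C box
print(brand_d[3] > brand_c[4])   # True
```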


In This Part: The Interquartile Range
Your comparison of the two sets of measurements should also consider the degree of variation within each set. This comparison can be based on the range of all the data (Max – Min) as well as on the range of the middle half of the data (Q3 – Q1), which is called the Q-Spread or Interquartile Range (IQR).

Let’s look at the box plots again, and then calculate the range and the IQR:

[Figure: comparative box plots of the raisin counts for Brands C and D]

Based on the comparative box plots, there is more variation in the raisin counts for Brand D raisins than for Brand C raisins. The values for the ranges and IQR confirm this (Range C = 7, Range D = 15; IQR C = 3, IQR D = 6). Both the range and the IQR for Brand D are at least twice the range and the IQR for Brand C.

Consequently, although Brand D tends to have more raisins per box than Brand C, the smaller range and IQR for Brand C tell us that Brand C's counts are more consistent from box to box. Since the boxes all weigh the same, this also suggests that the sizes of the raisins vary less for Brand C.
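The range and IQR are simple differences of entries in the Five-Number Summary, so the values above and the "at least twice" claim can be checked directly. In this sketch, `spread` is a hypothetical helper name, and the summaries are the Brand C and Brand D values from this session.

```python
def spread(summary):
    """Return (range, IQR) for a Five-Number Summary (Min, Q1, Med, Q3, Max)."""
    mn, q1, _med, q3, mx = summary
    return mx - mn, q3 - q1


brand_c = (25, 26, 28, 29, 32)
brand_d = (23, 27, 29, 33, 38)

range_c, iqr_c = spread(brand_c)   # 7, 3
range_d, iqr_d = spread(brand_d)   # 15, 6

# "At least twice": compare Brand D's spread with double Brand C's
print(range_d >= 2 * range_c, iqr_d >= 2 * iqr_c)   # True True
```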


Problem B2
Brand A raisins come in boxes of the same weight as Brands C and D. Here are the ordered counts for 30 boxes of Brand A raisins:

 

 

 

 

Compare the counts for Brands A and D. Make sure you consider where the data are located and the degree of variation. (You may have already determined the Five-Number Summary in Session 4.)


Problem B3
Brand B raisins come in boxes of the same weight as Brands A, C, and D. Here are the ordered counts for 27 boxes of Brand B raisins:

 

 

 

 

Compare the counts for three brands: A, B, and C. Make sure you consider where the data are located and the degree of variation. (You may have already determined the Five-Number Summary in Session 4.)

 

Notes

Note 3
Most people do not have difficulty comparing a single number with a single number, for example, noting that the median of one set of counts is greater than the median of another, or comparing one upper quartile with another. Some people, though, may have difficulty in comparing the distribution of one set of counts with the distribution of another.

To compare one Five-Number Summary with another in the proper way requires a composite comparison of five numbers to five numbers; you must think beyond single-number comparisons. The box plots help to clarify this comparison, especially the variation within a group as indicated by the range and the interquartile range.
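To see how a box plot turns a Five-Number Summary into a picture, here is a rough, text-only sketch. The helper `text_box_plot` is hypothetical; it draws each brand on a shared integer scale (one character per raisin count) and assumes whiskers that run all the way from Min to Max with no separate outlier rule, which matches box plots drawn directly from a Five-Number Summary.

```python
def text_box_plot(summary, lo, hi):
    """Draw one box plot as text on an integer scale from lo to hi.

    Each character stands for one raisin count. Whiskers ('-') run from Min to Max,
    the box runs from '[' at Q1 to ']' at Q3, and '|' marks the median.
    Values are rounded to the nearest position (Python's round: ties go to even).
    """
    mn, q1, med, q3, mx = summary
    if mn < lo or mx > hi:
        raise ValueError("scale [lo, hi] must cover the data")

    def pos(value):
        return int(round(value - lo))

    line = [" "] * (hi - lo + 1)
    for i in range(pos(mn), pos(mx) + 1):   # whiskers from Min to Max
        line[i] = "-"
    for i in range(pos(q1), pos(q3) + 1):   # box from Q1 to Q3
        line[i] = "="
    line[pos(q1)] = "["
    line[pos(q3)] = "]"
    line[pos(med)] = "|"                    # median drawn last so it is always visible
    return "".join(line).rstrip()


# Five-Number Summaries for Brands C and D, as given in this session
summaries = {"C": (25, 26, 28, 29, 32), "D": (23, 27, 29, 33, 38)}
for brand, s in summaries.items():
    print(f"Brand {brand}: {text_box_plot(s, 20, 40)}")
# Brand C:      -[=|]---
# Brand D:    ----[=|===]-----
```

Drawn on the same scale, Brand D's longer whiskers and wider box make its greater range and IQR visible at a glance.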

Solutions

Problem B1
The raisin studies are observational because they observe the objects (raisins) as they are. There is no treatment deliberately imposed on any group of raisins, so there is no “cause and effect” to study.

Problem B2
Here are the Five-Number Summaries and box plots for each brand:

| Brand | Min | Q1 | Med | Q3 | Max |
|---|---|---|---|---|---|
| A | 23 | 27 | 29.5 | 32 | 39 |
| D | 23 | 27 | 29 | 33 | 38 |

[Figure: comparative box plots of the raisin counts for Brands A and D]

The box plots indicate that the two sets of counts are very similar. The location indicators are all about the same: The Mins and Q1s are exactly the same, and the Meds, Q3s, and Maxes differ by 0.5, 1, and 1, respectively. These are not large differences relative to the magnitudes of the numbers being compared.

The degree of variation is also similar for the two brands. The ranges for Brands A and D are 16 and 15, respectively, and the IQRs are 5 and 6.

Problem B3

Here are the Five-Number Summaries for each brand:

| Brand | Min | Q1 | Med | Q3 | Max |
|---|---|---|---|---|---|
| A | 23 | 27 | 29.5 | 32 | 39 |
| B | 17 | 25 | 26 | 29 | 30 |
| C | 25 | 26 | 28 | 29 | 32 |

The Five-Number Summaries for Brands A, B, and C suggest that Brand B generally has the fewest raisins: It has the smallest median (26), the smallest minimum (17), and the smallest maximum (30). Brand C has the least total variation (range 7, IQR 3) and the highest minimum (25). Brand A generally has the most raisins, with the largest median (29.5) and by far the largest maximum (39); it also has the greatest variation (range 16, IQR 5).
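The comparisons in the solutions to Problems B2 and B3 can also be checked against the Five-Number Summaries. This sketch hard-codes the summaries listed in this session for Brands A, B, C, and D; the dictionary and helper names are hypothetical.

```python
# Five-Number Summaries (Min, Q1, Med, Q3, Max) as listed in this session
SUMMARIES = {
    "A": (23, 27, 29.5, 32, 39),
    "B": (17, 25, 26, 29, 30),
    "C": (25, 26, 28, 29, 32),
    "D": (23, 27, 29, 33, 38),
}
LABELS = ("Min", "Q1", "Med", "Q3", "Max")


def spread(summary):
    """Return the range (Max - Min) and the IQR (Q3 - Q1) of a Five-Number Summary."""
    mn, q1, _med, q3, mx = summary
    return {"range": mx - mn, "IQR": q3 - q1}


# Problem B2: how far apart are Brands A and D on each measure?
diffs = {lab: abs(a - d) for lab, a, d in zip(LABELS, SUMMARIES["A"], SUMMARIES["D"])}
print(diffs)                                           # {'Min': 0, 'Q1': 0, 'Med': 0.5, 'Q3': 1, 'Max': 1}
print(spread(SUMMARIES["A"]), spread(SUMMARIES["D"]))  # {'range': 16, 'IQR': 5} {'range': 15, 'IQR': 6}

# Problem B3: order Brands A, B, C from smallest to largest median, and by range
trio = ("A", "B", "C")
by_median = sorted(trio, key=lambda b: SUMMARIES[b][2])
by_range = sorted(trio, key=lambda b: spread(SUMMARIES[b])["range"])
print(by_median, by_range)                             # ['B', 'C', 'A'] ['C', 'B', 'A']
```

Ordering by a single measure is only part of the story; as Note 3 stresses, a full comparison looks at all five numbers and the spreads together.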

 


Credits

Produced by WGBH Educational Foundation. 2001.
  • Closed Captioning
  • ISBN: 1-57680-481-X
