Learning Math: Data Analysis, Statistics, and Probability
Bivariate Data and Analysis Part A: Scatter Plots (45 minutes)
In This Part: A Bivariate Data Question
Have you ever wondered whether tall people have longer arms than short people? We’ll explore this question by collecting data on two variables — height and arm span (measured from left fingertip to right fingertip).
Ask a Question
One way to ask this question is, “Is there a positive association between height and arm span?”
Through this question, we are seeking to determine whether there is an association between height and arm span. A positive association between two variables exists when larger values of one variable generally go with larger values of the other. For example, the association between a student’s grades and the number of hours per week that student spends studying is generally a positive association. A negative association, in contrast, exists when larger values of one variable generally go with smaller values of the other. For example, the association between the number of doctors in a country and the percentage of the population that dies before adulthood is generally a negative one.
There are many other ways to ask this same question about height and arm span. Here are two, which we will concentrate on in Part A:
• Do people with above-average arm spans tend to have above-average heights?
• Do people with below-average arm spans tend to have below-average heights?
Collect Appropriate Data
In Session 1, measurements (in centimeters) were given for the heights and arm spans of 24 people. Here are the collected data, sorted in increasing order of arm span:
These are bivariate data, since two measurements are given for each person.
The data given above are sorted by arm span. Are they also sorted by height? If not exactly, are they generally sorted by height, and, if so, in which direction? Does this suggest any type of association between height and arm span?
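One way to make the sortedness question precise is to check it mechanically. The Python sketch below is illustrative only: `is_nondecreasing` and `share_of_rises` are hypothetical helper names, the heights are made-up values rather than the course data, and the share of consecutive rises is just one rough gauge of whether a list is "generally" increasing.

```python
def is_nondecreasing(values):
    """Return True if each value is at least as large as the one before it."""
    return all(a <= b for a, b in zip(values, values[1:]))


def share_of_rises(values):
    """Fraction of consecutive pairs in which the later value is larger.

    A share near 1 suggests the list is generally increasing even when it
    is not perfectly sorted (a rough heuristic, not a standard statistic).
    """
    pairs = list(zip(values, values[1:]))
    if not pairs:
        return 0.0
    rises = sum(1 for a, b in pairs if b > a)
    return rises / len(pairs)


# Hypothetical heights (cm), listed in the order of a table sorted by arm span.
heights = [158, 161, 160, 166, 170, 168, 175, 181]

print(is_nondecreasing(heights))  # False: 161 is followed by 160
print(share_of_rises(heights))    # 5 of the 7 consecutive pairs rise
```

When the rows are sorted by arm span, a share of rises close to 1 in the height column hints at a positive association, and a share close to 0 hints at a negative one.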
a. Measure the arm span (fingertip to fingertip) and height (without shoes) to the nearest centimeter for six people, including yourself.
b. Does the information you collected generally support or contradict the observation you made in Problem A1?
c. Identify the person in the table whose arm span and height are closest to your own arm span and height.
In This Part: Building a Scatter Plot
Analyze the Data
We will now begin our analysis of the bivariate data and explore the co-variation in the arm span and height data. Here again are the collected arm spans and heights for 24 people, sorted in increasing order by arm span:
Bivariate data analysis employs a special “X-Y” coordinate plot of the data that allows you to visualize the simultaneous changes taking place in two variables. This type of plot is called a scatter plot.
For our data, we will assign the X and Y variables as follows:
X = Arm Span
Y = Height
To see how this works, let’s examine the 10th person in the data table. Here are the measurements for Person 10:
X = Arm Span = 170 and Y = Height = 167
Person 10 corresponds to the coordinate pair (170, 167), which appears in the scatter plot as this point:
[Figure: scatter plot showing the single point (170, 167) for Person 10]
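Assigning X to arm span and Y to height amounts to turning each person's row into an ordered pair. A minimal Python sketch of that step, assuming each row is stored as a `Person` record with the hypothetical field names `arm_span` and `height` (the course does not specify any data format); the only data used are Person 10's measurements from the text.

```python
from typing import NamedTuple


class Person(NamedTuple):
    """One row of the data table (hypothetical record type; values in cm)."""
    arm_span: int
    height: int


def to_point(person):
    """Map a person to a scatter-plot point: X = arm span, Y = height."""
    return (person.arm_span, person.height)


person_10 = Person(arm_span=170, height=167)
print(to_point(person_10))  # (170, 167)
```

Applying `to_point` to every row of the table yields the 24 points of the completed scatter plot.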
Let’s add two more points to the scatter plot, corresponding to Persons 2 and 23:
Here is the completed scatter plot for all 24 people:
[Figure: scatter plot of arm span (X) versus height (Y) for all 24 people]
Judging from the scatter plot, does there appear to be a positive association between arm span and height? That is, do people with larger arm spans generally have greater heights?
In this video segment, Professor Kader introduces bivariate analysis. The participants measure their heights and arm spans and then create a scatter plot of the data. Professor Kader then asks them to analyze the association between the two variables, height and arm span.
The scatter plots illustrate the general nature of the association between arm span and height. Reading from left to right on the horizontal scale, you can observe that narrow arm spans tend to be associated with people who are shorter, and wider arm spans tend to be associated with people who are taller — that is, there appears to be an overall positive association between arm span and height.
Now that we have observed a positive association between arm span and height, a new question emerges: How strong is this association? Here again are the data for the 24 people:
To answer this question, let’s note the mean arm span and mean height for these 24 adults:
• Mean arm span = 175.5 cm
• Mean height = 174.8 cm
a. Are your arm span and height above the averages for these 24 adults?
b. How many of the 24 people have above-average arm spans?
c. How many of the 24 people have above-average heights?
d. It is possible to divide the 24 people into four categories: above-average arm span and above-average height; above-average arm span and below-average height; below-average arm span and above-average height; and below-average arm span and below-average height. How many of the 24 people fall into each of these categories?
a. Where would your arm span and height appear on the scatter plot?
b. Can you identify a person with an above-average arm span and height?
c. Can you identify a person with a below-average arm span and an above-average height?
d. Can you identify a person with a below-average arm span and height?
e. Can you identify a person with an above-average arm span and a below-average height?
Adding a vertical line to the scatter plot that intersects the arm span (X) axis at the mean, 175.5 cm, separates the points into two groups:
a. Note that there are 12 arm spans above the mean and 12 below. Will this always happen? Why or why not?
b. What is true about anyone whose point in the scatter plot appears to the right of this line? What is true about anyone whose point appears to the left of this line?
Adding a horizontal line to the scatter plot that intersects the height (Y) axis at the mean, 174.8 cm, also separates the points into two groups:
What is true about anyone whose scatter plot point appears above this line? How many such points are there?
Enter your own measurements or those of one of the other subjects you measured into the Interactive Activity below to plot these additional heights and arm spans against those of the people in the data set. Note that adding these measurements may affect the values of the means.
For a non-interactive version of this activity, print this page, plot your additional measurements on the scatter plot in Problem A5, and calculate the new means.
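The new means can be found without re-adding all of the original values: if n values have mean m, their total is n × m, so appending new values gives a mean of (n × m + the sum of the new values) divided by the new count. A Python sketch, where `updated_mean` is a hypothetical helper and the added arm span of 180 cm is an invented example, not a measurement from the course. Because the published means are rounded to one decimal place, results computed this way are approximate.

```python
def updated_mean(n, mean, new_values):
    """Mean after appending new_values to n values whose mean is `mean`."""
    total = n * mean + sum(new_values)  # recover the original total, then add
    return total / (n + len(new_values))


# 24 people with mean arm span 175.5 cm, plus one hypothetical arm span of 180 cm.
print(updated_mean(24, 175.5, [180]))  # 175.68
```

A new value above the current mean pulls the mean up, and a new value below it pulls the mean down; a value equal to the mean leaves it unchanged.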
When each of the two variables is compared with its mean, there are four possible categories of data pairs. Accordingly, each person in the table can be placed into one of four categories:
a. People with above-average arm spans and heights are in orange.
b. People with below-average arm spans and above-average heights are in blue.
c. People with below-average arm spans and heights are in purple.
d. People with above-average arm spans and below-average heights are in green.
We can represent these categories similarly on the scatter plot:
a. Points for people with above-average arm spans and heights are in orange.
b. Points for people with below-average arm spans and above-average heights are in blue.
c. Points for people with below-average arm spans and heights are in purple.
d. Points for people with above-average arm spans and below-average heights are in green.
Adding both the vertical line at the mean arm span (175.5 cm) and the horizontal line at the mean height (174.8 cm) separates the points in the scatter plot into four groups, known as quadrants:
Use this scatter plot to answer the following:
a. Describe the heights and arm spans of people in Quadrant I.
b. Describe the heights and arm spans of people in Quadrant II.
c. Describe the heights and arm spans of people in Quadrant III.
d. Describe the heights and arm spans of people in Quadrant IV.
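The two mean lines turn the four categories into a rule that can be applied point by point. A Python sketch under stated assumptions: quadrants use the standard numbering (counterclockwise from the upper right), which matches the quadrant descriptions in the solutions to this part; a point lying exactly on either mean line is reported as `None`, because the text does not assign such points to a quadrant; `quadrant` and `quadrant_counts` are hypothetical names; and all sample points except Person 10's (170, 167) are invented.

```python
from collections import Counter
from statistics import mean


def quadrant(x, y, x_bar, y_bar):
    """Return 'I', 'II', 'III', or 'IV' for (x, y) relative to the mean lines
    x = x_bar and y = y_bar; return None for a point on either line."""
    if x == x_bar or y == y_bar:
        return None
    if x > x_bar:
        return "I" if y > y_bar else "IV"
    return "II" if y > y_bar else "III"


def quadrant_counts(points):
    """Count (arm span, height) points per quadrant, using the means of the
    points themselves as the dividing lines."""
    x_bar = mean(x for x, _ in points)
    y_bar = mean(y for _, y in points)
    return Counter(quadrant(x, y, x_bar, y_bar) for x, y in points)


# Person 10 (from the text) plus hypothetical points, all in centimeters.
points = [(170, 167), (160, 158), (185, 183), (178, 171), (165, 176), (190, 188)]
counts = quadrant_counts(points)
for q in ("I", "II", "III", "IV"):
    print(q, counts[q])  # I 2, II 1, III 2, IV 1 (one per line)
```

Points in Quadrants I and III are consistent with a positive association, while points in Quadrants II and IV run against it.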
a. Based on the scatter plot, do most people with above-average arm spans also have above-average heights?
b. Based on the scatter plot, do most people with below-average arm spans also have below-average heights?
Solutions
No, the data are not sorted by height; for example, the first three heights are 162 cm, 160 cm, and 162 cm. However, the heights do generally increase down the table. The wider a person’s arm span, the greater we might expect that person’s height to be, although there are clearly exceptions to this pattern. The fact that height generally increases along with arm span suggests a positive association between height and arm span.
a. Answers will vary.
b. Answers will vary, but generally the recorded information should support the observation that there is a positive association between height and arm span.
c. Answers will vary.
Yes, there appears to be a positive association. In general, the points in the graph rise from left to right. There are exceptions, but people with larger arm spans typically have greater heights.
a. Answers will vary.
b. Twelve of the 24 people have above-average arm spans.
c. Thirteen of the 24 people have above-average heights.
d. Eleven people have above-average arm spans and heights. One person has an above-average arm span but a below-average height. Two people have below-average arm spans but above-average heights. Ten people have below-average arm spans and heights.
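The four category counts in answer (d) should agree with answers (b) and (c): adding the counts that share an arm-span category, or a height category, recovers those totals. A short Python check; the tuple keys are an illustrative encoding, not notation from the course.

```python
# Category counts from answer (d), keyed by (arm span, height) relative to the means.
counts = {
    ("above", "above"): 11,
    ("above", "below"): 1,
    ("below", "above"): 2,
    ("below", "below"): 10,
}

total = sum(counts.values())
above_arm_span = sum(n for (arm, _), n in counts.items() if arm == "above")
above_height = sum(n for (_, h), n in counts.items() if h == "above")

print(total)           # 24
print(above_arm_span)  # 12, matching answer (b)
print(above_height)    # 13, matching answer (c)
```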
Answers will vary.
a. No, this will not always happen, because the dividing line is at the mean, not the median. The mean need not equal the median, so it need not split the data in half; for example, when considering the heights for this group, we see that 13 people are above the mean and 11 are below it.
b. Anyone whose point is to the right of this line has an above-average arm span. In contrast, anyone whose point is to the left of the line has a below-average arm span.
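The reason behind answer (a) can be seen with a small example: a few unusually small (or large) values pull the mean away from the middle of the data, so the counts above and below the mean can differ, whereas at most half the values lie strictly above the median and at most half strictly below it. A Python sketch using invented heights, not the 24-person data set:

```python
from statistics import mean, median

# Hypothetical heights in centimeters (not the course data).
heights = [160, 162, 171, 176, 177, 178]

m = mean(heights)
above = sum(h > m for h in heights)  # count of values strictly above the mean
below = sum(h < m for h in heights)  # count of values strictly below the mean

print(round(m, 2), median(heights))  # 170.67 173.5
print(above, below)                  # 4 2
```

Here the two short heights drag the mean below the median, so four values sit above the mean and only two below it.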
Anyone whose point appears above this line has an above-average height. There are 13 such points.
Answers will vary.
a. People in Quadrant I have above-average arm spans and heights.
b. People in Quadrant II have below-average arm spans and above-average heights.
c. People in Quadrant III have below-average arm spans and heights.
d. People in Quadrant IV have above-average arm spans and below-average heights.
a. Yes, most people who have above-average arm spans also have above-average heights. By counting the points, we can see that 11 of the 12 people with above-average arm spans also have above-average heights.
b. Yes, most people who have below-average arm spans also have below-average heights. By counting the points, we can see that 10 of the 12 people with below-average arm spans also have below-average heights.
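Turning the counts in these two answers into proportions makes the comparison easier to read. A short Python sketch; `share` is a hypothetical helper, and the percentages are simple arithmetic on the counts reported above (11 of 12, 10 of 12, and 11 + 10 of all 24 people in Quadrants I and III), rounded for display.

```python
def share(matching, group_size):
    """Fraction of a group whose height falls on the same side of the mean."""
    return matching / group_size


# Counts from answers (a) and (b) above.
for label, matching, size in [("above-average arm spans", 11, 12),
                              ("below-average arm spans", 10, 12)]:
    print(f"{label}: {matching}/{size} = {share(matching, size):.0%}")

# Points in Quadrants I and III combined, out of all 24 people.
print(f"{share(11 + 10, 24):.1%}")  # 87.5%
```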
Session 1 Statistics As Problem Solving
Consider statistics as a problem-solving process and examine its four components: asking questions, collecting appropriate data, analyzing the data, and interpreting the results. This session investigates the nature of data and its potential sources of variation. Variables, bias, and random sampling are introduced.
Session 2 Data Organization and Representation
Explore different ways of representing, analyzing, and interpreting data, including line plots, frequency tables, cumulative and relative frequency tables, and bar graphs. Learn how to use intervals to describe variation in data. Learn how to determine and understand the median.
Session 3 Describing Distributions
Continue learning about organizing and grouping data in different graphs and tables. Learn how to analyze and interpret variation in data by using stem and leaf plots and histograms. Learn about relative and cumulative frequency.
Session 4 Min, Max and the Five-Number Summary
Investigate various approaches for summarizing variation in data, and learn how dividing data into groups can help provide other types of answers to statistical questions. Understand numerical and graphic representations of the minimum, the maximum, the median, and quartiles. Learn how to create a box plot.
Session 5 Variation About the Mean
Explore the concept of the mean and how variation in data can be described relative to the mean. Concepts include fair and unfair allocations, and how to measure variation about the mean.
Session 6 Designing Experiments
Examine how to collect and compare data from observational and experimental studies, and learn how to set up your own experimental studies.
Session 7 Bivariate Data and Analysis
Analyze bivariate data and understand the concepts of association and co-variation between two quantitative variables. Explore scatter plots, the least squares line, and modeling linear relationships.
Session 8 Probability
Investigate some basic concepts of probability and the relationship between statistics and probability. Learn about random events, games of chance, mathematical and experimental probability, tree diagrams, and the binomial probability model.
Session 9 Random Sampling and Estimation
Learn how to select a random sample and use it to estimate characteristics of an entire population. Learn how to describe variation in estimates, and the effect of sample size on an estimate's accuracy.
Session 10 Classroom Case Studies, Grades K-2
Explore how the concepts developed in this course can be applied through a case study of a K-2 teacher, Ellen Sabanosh, a former course participant who has adapted her new knowledge to her classroom.
Session 11 Classroom Case Studies, Grades 3-5
Explore how the concepts developed in this course can be applied through case studies of a grade 3-5 teacher, Suzanne L'Esperance, and a grade 6-8 teacher, Paul Snowden, both former course participants who have adapted their new knowledge to their classrooms.