You should have decided in Problem D3 that two of the three lines are better candidates for describing the trend in the data points. The line Height = Arm Span has nine points that are above the line, three that are on the line, and 12 that are below the line. The line Height = Arm Span - 1 has 12 points that are above the line, four that are on the line, and eight that are below the line.
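The above/on/below tallies can be checked mechanically. The following is a minimal sketch, not the course's own method: the function name `classify_points` is hypothetical, and the sample points are made up for illustration, except (173, 185), which is Person 11 from this session.

```python
def classify_points(points, slope=1, intercept=0):
    """Count points above, on, and below the line y = slope * x + intercept."""
    counts = {"above": 0, "on": 0, "below": 0}
    for x, y in points:
        predicted = slope * x + intercept  # height the line predicts for this arm span
        if y > predicted:
            counts["above"] += 1
        elif y == predicted:
            counts["on"] += 1
        else:
            counts["below"] += 1
    return counts


# Illustrative (arm span, height) pairs in cm. These are NOT the study's data,
# apart from (173, 185), which is Person 11.
sample = [(173, 185), (160, 160), (170, 167), (188, 187)]

same_line = classify_points(sample)                # Height = Arm Span
minus_one = classify_points(sample, intercept=-1)  # Height = Arm Span - 1
print(same_line)
print(minus_one)
```

Running the same function over all 24 data points with each candidate line would reproduce the counts quoted above.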
So which of these lines is "better" at describing the relationship? While personal judgement is useful, statisticians prefer to use more objective methods. To develop criteria for identifying the "better" line, we'll use a concept developed in Part C: the vertical distance from a point to a line.
Person 11, whose arm span is 173 cm and whose height is 185 cm, is represented by the point (173, 185) in the scatter plot. If you were to use the line Height = Arm Span to predict Person 11's height from his or her arm span, the predicted height would be 173 cm, represented by the point (173, 173), which lies on that line. The scatter plot thus far looks like this:
[Figure: scatter plot of height versus arm span with the line Height = Arm Span]
The difference between the actual observed height (Y) and the corresponding hypothetical, predicted height (on the line) is called the error. If we use YL (Y on the line) to designate the Y coordinate that represents the predicted height, then we can calculate the error as follows:
Error = Y - YL
In other words, Error = Actual Observed Height - Predicted Height (on the line).
Finally, the vertical distance between an observed height and a predicted height can be expressed as:
Distance = |Y - YL| = |Error|
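As a quick check of these definitions, here is a minimal sketch that computes the error and the vertical distance for Person 11 using the line Height = Arm Span. The function names are hypothetical, chosen only for this illustration, and the second point (170, 167) is made up to show a negative error.

```python
def predicted_height(arm_span):
    """Predicted height YL on the line Height = Arm Span (YL = X)."""
    return arm_span


def error(y, y_l):
    """Error = actual observed height minus predicted height."""
    return y - y_l


def distance(y, y_l):
    """Vertical distance = absolute value of the error."""
    return abs(error(y, y_l))


# Person 11 from the text: arm span 173 cm, height 185 cm.
x, y = 173, 185
y_l = predicted_height(x)
err = error(y, y_l)
dist = distance(y, y_l)
print(err, dist)  # 12 12

# Hypothetical point below the line (not from the study's data):
# the error is negative, but the distance is still positive.
err_below = error(167, predicted_height(170))
dist_below = distance(167, predicted_height(170))
print(err_below, dist_below)  # -3 3
```

The sign of the error tells you whether the point lies above (positive) or below (negative) the line, while the distance discards that sign.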
Let's see how this works for the line Height = Arm Span (i.e., YL = X).
The following table shows the arm span (X), the actual observed height (Y), the predicted height based on the line Height = Arm Span (i.e., YL = X), the error, and the vertical distance between the person's observed height (Y) and predicted height (YL) for Persons 1 through 6 in our study:
|