Linear Regression
Data Analysis Home

Problem 3

Return to Problem 1

Return to Introduction


On-Line Glossary

Excel How To...

last modified on 1/23/07

email comments to harvey@depauw.edu or william.otto@maine.edu

Problem 2

A linear model of the data in Problem 1 gives the following results:

Y = 0.500*X + 3.00

R2 = 0.6665

R = 0.8164

The coefficient of determination (R2 ) tells us that about 2/3 of the variation in the values of Y can be explained by assuming that there is a linear relationship between X and Y. There is no compelling visual evidence that another mathematical model is more appropriate, so the model seems reasonable. The significant scatter in the points, however, suggests that there is substantial uncertainty in X, in Y or in both X and Y.

Let's consider some additional data sets.

Task 1. The table below contains X and Y values for the data from the previous problem (Data Set 1), and for two additional data sets. All three data sets have identical values for X but have different values for Y. Examine the data sets without using any mathematical approaches. Do the data sets appear to be show a similar relationship between X and Y? Explain.

Data Set1

Data Set 2
Data Set 3
X
Y
X
Y
X
Y
10.00
8.04
10.00
9.14
10.00
7.46
8.00
6.95
8.00
8.14
8.00
6.77
13.00
7.58
13.00
8.74
13.00
12.74
9.00
8.81
9.00
8.77
9.00
7.11
11.00
8.33
11.00
9.26
11.00
7.81
14.00
9.96
14.00
8.10
14.00
8.84
6.00
7.24
6.00
6.13
6.00
6.08
4.00
4.26
4.00
3.10
4.00
5.39
12.00
10.84
12.00
9.13
12.00
8.15
7.00
4.82
7.00
7.26
7.00
6.42
5.00
5.68
5.00
4.74
5.00
5.73

Task 2. Data Sets 2 and 3 have statistical characteristics that are identical to those for Data Set 1. All three data sets have the same means for X and Y (9.00 and 7.50, respectively), the same standard deviations for X and Y (3.32 and 2.03, respectively), the same linear models (Y = 0.500*X + 3.00), and nearly identical values for R2 (0.6665, 0.6662 and 0.6663). Given this additional information, reconsider your answer to the question in Task 1: Do you expect that all three data sets are described equally well by the same linear model? Explain.

After completing this task, proceed to Problem 3.