It is hard to overstate the importance of visually examining a regression line against the data on which the regression is based. The Anscombe data set, created by the statistician F. J. Anscombe (American Statistician, February 1973, pp. 17-21), provides four sets of data, each of which yields essentially the same regression results (identical to the precision shown):
y = 3.00 + 0.500x
correlation coefficient of 0.816
A visual examination of the data sets and their respective regression lines shows that a linear model is inappropriate for all but one of the data sets. The Excel file in this post contains the four data sets, each on a separate tab.
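
For readers who would rather check the shared regression results in code than in a spreadsheet, here is a minimal Python sketch. It assumes the values below, transcribed from Anscombe's published quartet, match the four tabs of the Excel file. The names QUARTET, X_SHARED and fit_line are illustrative only and do not come from the post or the file.

```python
from math import sqrt

# Values transcribed from Anscombe's published quartet.
# Assumption: the Excel file attached to the post holds the same numbers.
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # sets I-III share x values

QUARTET = {
    "I": (X_SHARED,
          [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED,
           [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED,
            [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def fit_line(xs, ys):
    """Return (intercept, slope, r) for an ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)  # Pearson correlation coefficient
    return intercept, slope, r


results = {name: fit_line(xs, ys) for name, (xs, ys) in QUARTET.items()}

for name, (intercept, slope, r) in results.items():
    print(f"Set {name}: y = {intercept:.2f} + {slope:.3f}x, r = {r:.3f}")
```

The four printed lines should agree with one another to the precision quoted above, which is exactly why the summary numbers alone are not enough. Plotting each set's points together with its fitted line (in Excel, or with any plotting library) is what reveals the differences.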