Outliers
last modified on 1/23/07

email comments to harvey@depauw.edu or william.otto@maine.edu

Problem 1

(Applet courtesy of Prof. C. E. Efstathiou, http://www.chem.uoa.gr/applets/Applet_Index2.htm)

In this problem you will use an applet to explore how the position of one data point relative to the remaining data points affects your ability to classify it as an outlier.

Task 1. Begin by placing two points about 1 cm apart somewhere in the middle of one of the lines, and place one point (the possible "outlier") near the right end of the same line. Click on the CALC button and wait for the results. In the box to the right you will find the calculated value for Q and the probability (P) that rejecting the data point is an error. A P-value of 0.22, for example, means that there is a 22% probability that the "outlier" comes from the same population as the remaining data. At the bottom of the applet is an indication of whether the "outlier" can be rejected at the 90%, 95% and 99% confidence levels. Is the "outlier" really an outlier? What happens if you add an additional point to the middle of the line? What if you then add one more point in the middle of the line?
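The applet's Q value can also be checked by hand. The following is a minimal sketch, assuming the applet uses Dixon's Q-test in its usual form, Q = gap / range, where the gap is the distance from the suspect point to its nearest neighbor. The function names `dixon_q` and `can_reject`, the point positions, and the critical values (commonly tabulated two-tailed values) are assumptions made for illustration; the applet's own table may differ.

```python
# Commonly tabulated two-tailed critical values of Q, keyed by the number
# of points and the confidence level in percent.
# Assumption: the applet may use a slightly different table.
Q_CRIT = {
    3: {90: 0.941, 95: 0.970, 99: 0.994},
    4: {90: 0.765, 95: 0.829, 99: 0.926},
    5: {90: 0.642, 95: 0.710, 99: 0.821},
}


def dixon_q(data):
    """Return Q = gap / range for the most extreme point in data."""
    values = sorted(data)
    if len(values) < 3:
        raise ValueError("the Q-test needs at least three points")
    spread = values[-1] - values[0]
    if spread == 0:
        raise ValueError("all points are identical, so Q is undefined")
    # The suspect point is whichever end point sits farther from its neighbor.
    gap = max(values[1] - values[0], values[-1] - values[-2])
    return gap / spread


def can_reject(data, level):
    """True if Q exceeds the tabulated critical value at this confidence level."""
    return dixon_q(data) > Q_CRIT[len(data)][level]


# Hypothetical positions along one line of the applet (arbitrary units):
# two points 1 unit apart, a suspect point far to the right, and then
# one or two extra points added between the first two.
three = [5.0, 6.0, 12.0]
four = [5.0, 5.5, 6.0, 12.0]
five = [5.0, 5.3, 5.5, 6.0, 12.0]

for data in (three, four, five):
    verdicts = {level: can_reject(data, level) for level in (90, 95, 99)}
    print(len(data), round(dixon_q(data), 3), verdicts)
```

In this sketch the extra points fall between the original two, so Q itself does not change; only the critical value it is compared against depends on the number of points.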

Consider how the results of this task might affect how you perform an experiment. A common choice in lab, where time is limited by the length of the lab period, is to repeat an analysis three times. What would you do if two of your results are in good agreement, but the third result is much larger or much smaller?

Task 2. Starting with a new line in the applet, place three data points in the middle of the line, click on the CALC button and note the results of the Q-test. Next, add a point on the far right-hand side of the line. Click on the CALC button again. Is this data point an outlier? Finally, add a point on the far left-hand side and click on the CALC button. How does this new data point affect your ability to reject a data point? What can you conclude about the importance of precision in rejecting a data point?
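The effect of the second extreme point can be seen in the arithmetic. This is a minimal sketch, assuming the usual Dixon form Q = gap / range; the helper name `q_value` and the point positions are hypothetical and chosen only for illustration.

```python
def q_value(data):
    """Dixon's Q (gap / range) for the most extreme point in data."""
    values = sorted(data)
    # Distance from the more isolated end point to its nearest neighbor.
    gap = max(values[1] - values[0], values[-1] - values[-2])
    return gap / (values[-1] - values[0])


# Hypothetical positions: three points in the middle plus one on the far right.
before = [5.0, 5.5, 6.0, 12.0]
# The same data after adding a point on the far left.
after = before + [0.0]

q_before = q_value(before)  # gap 6.0 over a range of 7.0
q_after = q_value(after)    # gap 6.0 over a range of 12.0

print(round(q_before, 3), round(q_after, 3))
```

The gap for the right-hand point is the same in both cases, but the left-hand point widens the range in the denominator, so Q falls. Note that the Q-test is normally applied to a single suspect point, so this sketch illustrates the arithmetic rather than a recommended procedure for two suspect points.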

Task 3. Use the CLEAR ALL button to remove any data points already entered into the applet. Add three data points on the left-hand side of each line in the applet; try to arrange these so that their positions on each line are similar. On each line add one additional data point, arranged such that on each successive line the new data point is farther from the first three data points. Beginning with the first data set, use the CALC button to obtain the results of the Q-test. What can you conclude about how the distance of an outlier from the remaining data affects a Q-test? Does this make sense?
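The trend in this task can be tabulated before using the applet. This is a minimal sketch, assuming the usual Dixon form Q = gap / range; the helper name `q_value`, the cluster positions, and the list of distances are hypothetical.

```python
def q_value(data):
    """Dixon's Q (gap / range) for the most extreme point in data."""
    values = sorted(data)
    # Distance from the more isolated end point to its nearest neighbor.
    gap = max(values[1] - values[0], values[-1] - values[-2])
    return gap / (values[-1] - values[0])


# Hypothetical cluster of three points on the left-hand side of each line.
cluster = [1.0, 1.5, 2.0]
# Position of the fourth point, moved farther away on each successive line.
suspects = [3.0, 4.0, 6.0, 10.0]

qs = [q_value(cluster + [x]) for x in suspects]
for x, q in zip(suspects, qs):
    print(f"suspect at {x:4.1f}: Q = {q:.3f}")
```

Because the suspect point sets both the gap and the range, Q rises as the point moves away from the cluster but always stays below 1.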

When you finish these tasks, move on to Problem 2.