Outliers

email comments to harvey@depauw.edu or william.otto@maine.edu

Problem 3

In Problem 2 you should have identified two possible outliers. You can reject one of the possible outliers, the penny from 1943, without a Q-test because you know, from external evidence, that its composition is different from that of the other pennies. The 1943 penny must, therefore, come from a different population and should not be included with the remaining pennies. This is an important point. If you know that there is a significant error affecting one data point that does not affect other data points, then you should eliminate that data point without regard to whether its value is similar to or different from the remaining data. Suppose, for example, that you are titrating several replicate samples and using the color change of an indicator to signal the endpoint. The indicator is blue before the endpoint, is green at the endpoint and is yellow after the endpoint. You should, in this case, immediately discard the result of any titration in which you overshoot the endpoint by titrating to the indicator's yellow color.

The other apparent outlier in this data set, one of the three pennies from 1950, has no simple explanation. Examining the penny might show, for example, that it is substantially corroded or that it is coated with some sort of deposit. In this case, we would again have good reason to reject the result. But what if the 1950 penny does not appear different from any other penny? Without a clear reason to reject the penny as an outlier, we cannot discard it on the basis of inspection alone.

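In a situation such as this, where no external evidence settles the question, the Q-test provides a statistical method for evaluating the suspected outlier. The test statistic, Q, is the gap between the suspect value and its nearest neighbor divided by the range of the entire data set; the suspect value is rejected only if Q exceeds the critical value at the chosen confidence level. For small data sets consisting of 3 to 10 data points, the critical values of Q are given here: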

               Confidence Level
Samples     90%      95%      99%
   3       0.941    0.970    0.994
   4       0.765    0.829    0.926
   5       0.642    0.710    0.821
   6       0.560    0.625    0.740
   7       0.507    0.568    0.680
   8       0.468    0.526    0.634
   9       0.437    0.493    0.598
  10       0.412    0.466    0.568
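The comparison against the table can be sketched in a few lines of Python, which may be useful for checking a hand calculation. This is a minimal sketch, not part of the original module: the function name q_test, the dictionary Q_CRIT, and the five example values are hypothetical and chosen only for illustration. It assumes the usual form of the test, in which Q is the gap between the suspect value and its nearest neighbor divided by the range of the data, and it tests only the single most extreme value.

```python
# Critical values of Q copied from the table above:
# {number of samples: {confidence level in %: critical Q}}
Q_CRIT = {
    3: {90: 0.941, 95: 0.970, 99: 0.994},
    4: {90: 0.765, 95: 0.829, 99: 0.926},
    5: {90: 0.642, 95: 0.710, 99: 0.821},
    6: {90: 0.560, 95: 0.625, 99: 0.740},
    7: {90: 0.507, 95: 0.568, 99: 0.680},
    8: {90: 0.468, 95: 0.526, 99: 0.634},
    9: {90: 0.437, 95: 0.493, 99: 0.598},
    10: {90: 0.412, 95: 0.466, 99: 0.568},
}


def q_test(values, confidence=95):
    """Apply the Q-test to the most extreme value in a small data set.

    Returns (suspect, q_exp, q_crit, reject), where reject is True
    when q_exp exceeds the critical value at the given confidence level.
    """
    n = len(values)
    if n not in Q_CRIT:
        raise ValueError("the table covers only 3 to 10 data points")
    if confidence not in Q_CRIT[n]:
        raise ValueError("confidence must be 90, 95, or 99")

    data = sorted(values)
    spread = data[-1] - data[0]          # range of the data set
    if spread == 0:
        raise ValueError("all values are identical")

    gap_low = data[1] - data[0]          # gap if the smallest value is suspect
    gap_high = data[-1] - data[-2]       # gap if the largest value is suspect
    if gap_high >= gap_low:
        suspect, gap = data[-1], gap_high
    else:
        suspect, gap = data[0], gap_low

    q_exp = gap / spread
    q_crit = Q_CRIT[n][confidence]
    return suspect, q_exp, q_crit, q_exp > q_crit


# Hypothetical data, invented for illustration only.
example = [10.1, 10.2, 10.3, 10.4, 11.2]
for level in (90, 95, 99):
    suspect, q_exp, q_crit, reject = q_test(example, level)
    verdict = "reject" if reject else "retain"
    print(f"{level}%: Q = {q_exp:.3f}, critical Q = {q_crit:.3f}, {verdict} {suspect}")
```

For these illustrative values Q is approximately 0.727, so the suspect value is rejected at the 90% and 95% confidence levels but retained at the 99% level. The decision can depend on the confidence level you choose.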

Task 1. In the introduction we examined results for the concentration of Pb in seven samples of a sediment. Given the following data (in ppb):

4.5, 4.9, 5.6, 4.2, 6.2, 5.2, 9.9

is there any evidence at the 95% confidence level that the result of 9.9 ppb is an outlier? Is your answer different at the 90% or 99% confidence levels?

After completing this task, proceed to the module's summary using the link on the left.