Comparing Data Sets

Introduction

In the Preliminary Analysis module you inferred that pennies minted between 1977 and 1987 come from two populations, with the dividing line occurring in 1982. How confident are you, however, that a mean mass of 3.093 g for a penny minted in 1979 is significantly different from a mean mass of 2.513 g for a penny minted in 1985?

The answer to this question depends not only on the mean values, but also on their respective standard deviations. Consider the two figures shown below:

Each figure shows two normally distributed populations. For each population the distribution's maximum frequency corresponds to the population's mean and the distribution's width is proportional to the population's standard deviation.
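To make the role of the standard deviation concrete, here is a minimal sketch that computes how much two normal curves overlap. It assumes both populations share the same standard deviation σ, in which case the curves cross midway between the means and the shared area is 2Φ(−|μ₁ − μ₂| / 2σ), where Φ is the standard normal cumulative distribution. The function names and the means and standard deviations used below are hypothetical, chosen only for illustration.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def overlap_equal_sd(mean1: float, mean2: float, sd: float) -> float:
    """Fraction of area shared by two normal curves with the same standard deviation.

    Assumption: both populations have standard deviation `sd`. The curves then
    cross halfway between the means, so the shared area is the tail of each
    curve beyond that midpoint: 2 * Phi(-|mean1 - mean2| / (2 * sd)).
    """
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    separation = abs(mean1 - mean2)
    return 2.0 * normal_cdf(-separation / (2.0 * sd))


# Hypothetical populations (masses in g): same separation of means, different widths.
for sd in (0.05, 0.5):
    print(f"sd = {sd} g: overlap = {overlap_equal_sd(3.0, 2.5, sd):.3f}")
```

With the same 0.5 g separation between the means, narrowing the standard deviation from 0.5 g to 0.05 g shrinks the shared area from about 62% to essentially zero, which is the contrast between well-separated and overlapping populations described here.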

The two populations in the figure on the left are clearly well separated, so we can state confidently that a sample drawn from one population will have a mean that is significantly different from the mean of a sample drawn from the other population. The two populations on the right, however, are more problematic. Because the two populations overlap substantially, the mean of a sample drawn from one population may be similar to that of a sample drawn from the other population. Fortunately, there are statistical tools that help us decide whether the difference between the means of two samples is statistically significant.
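One such tool is the t-test. The minimal sketch below assumes the common two-sample form that pools the sample variances, which presumes the two populations have similar standard deviations; the applet used in this module may implement a different variant. It computes t = |x̄_A − x̄_B| / (s_pool √(1/n_A + 1/n_B)) with n_A + n_B − 2 degrees of freedom and compares it with a tabulated critical value. The function name `pooled_t_test` and the sample masses are hypothetical, not measured penny data.

```python
import math
import statistics


def pooled_t_test(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for a two-sample t-test with pooled variance.

    Assumption: both samples come from normal populations with equal
    standard deviations, so their variances can be pooled.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two values")
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    dof = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / dof
    # Standard error of the difference between the two means.
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("both samples have zero spread; t is undefined")
    t = abs(mean_a - mean_b) / std_err
    return t, dof


# Hypothetical masses in grams, for illustration only.
sample_1979 = [3.08, 3.10, 3.09, 3.11, 3.07]
sample_1985 = [2.51, 2.52, 2.50, 2.53, 2.49]
t, dof = pooled_t_test(sample_1979, sample_1985)
t_crit = 2.306  # two-tailed critical value for 8 degrees of freedom at 95% confidence
print(f"t = {t:.1f}, dof = {dof}, significant: {t > t_crit}")
```

If the calculated t exceeds the critical value, the difference between the means is judged significant at the chosen confidence level. Because these hypothetical samples are tightly clustered, t is about 58, far above 2.306; samples with more spread would give a smaller t for the same difference in means.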

After you complete this module you should:

- appreciate how the means and standard deviations of two data sets affect our ability to determine if they come from different populations
- be able to use an applet to carry out a t-test for comparing two samples and understand how to interpret the result of this statistical test

Before tackling some problems, read a description of the t-test by following the link on the left.