Comparing Data Sets

email comments to harvey@depauw.edu or william.otto@maine.edu

Problem 1

(Applet courtesy of Prof. C. E. Efstathiou, http://www.chem.uoa.gr/applets/Applet_Index2.htm)

In this problem you will use an applet to explore how the means, standard deviations and sizes of two samples affect the ability of a t-test to discriminate between the samples.

Task 1. With the "Demo data" radio button selected, enter three data points for Data Set A at approximately 48, 48.5 and 49. (To place a point, position the cursor and left-click with your mouse; each point appears as a red dot and a red line.) Add three data points for Data Set B at approximately 51, 51.5 and 52.

Click on the CALCULATE button to display the results. The number of data points, the mean and the standard deviation for each data set are shown in the box to the right, and the mean and standard deviation are superimposed on the data as vertical and horizontal blue lines. Of particular interest to us are the confidence levels (CL) for the t-test. The statement "The means ARE different at CL 95%" indicates that there is at least a 95% probability that the two samples are from different populations. A statement such as "The means ARE NOT different at CL 95%" means that we cannot be 95% sure that the samples are different. The value for P(type 1 error) also provides useful information; the exact confidence level at which we can show a difference between the two samples is

100 * {1.000 - P(type 1 error)}

For example, if P(type 1 error) is 0.212, then we can be

100 * (1.000 - 0.212) = 78.8%

78.8% confident that the samples are different (and there is a 21.2% probability that this conclusion is incorrect).
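
If you would like to check this arithmetic outside the applet, the conversion can be sketched in a few lines of Python. The function name confidence_level is our own choice for illustration and is not part of the applet.

```python
def confidence_level(p_type1: float) -> float:
    """Convert P(type 1 error) into a confidence level in percent."""
    if not 0.0 <= p_type1 <= 1.0:
        raise ValueError("P(type 1 error) must lie between 0 and 1")
    return 100.0 * (1.000 - p_type1)


# The worked example from the text: P(type 1 error) = 0.212
print(f"{confidence_level(0.212):.1f}%")  # prints 78.8%
```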

Examine the calculated mean and standard deviation for each data set and compare the numerical results to the visual picture provided by the horizontal and vertical blue lines. Do the two data sets overlap? Does it appear that the data sets represent different populations? What does the t-test report state?
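
As a cross-check on the applet's report for Task 1, the sketch below computes the mean, the standard deviation and a t value by hand. It assumes a pooled (equal-variance) two-sample t-test and two-tailed critical values from a standard t-table; the applet may use a different variant of the test, and the exact positions of your points will differ slightly from the nominal values used here. The name pooled_t is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def pooled_t(a, b):
    """Return (t, degrees of freedom) for a pooled two-sample t-test."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pooled standard deviation: variances weighted by degrees of freedom
    s_pool = sqrt(((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / df)
    t = abs(mean(a) - mean(b)) / (s_pool * sqrt(1 / na + 1 / nb))
    return t, df


# Nominal Task 1 values (assumed; your clicked points are only approximate)
set_a = [48.0, 48.5, 49.0]
set_b = [51.0, 51.5, 52.0]
t, df = pooled_t(set_a, set_b)

# Two-tailed critical values of t for 4 degrees of freedom only
T_CRIT = {95: 2.776, 99: 4.604}

print(f"A: mean = {mean(set_a):.2f}, s = {stdev(set_a):.2f}")
print(f"B: mean = {mean(set_b):.2f}, s = {stdev(set_b):.2f}")
print(f"t = {t:.3f} with {df} degrees of freedom")
for cl, t_crit in T_CRIT.items():
    verdict = "ARE" if t > t_crit else "ARE NOT"
    print(f"The means {verdict} different at CL {cl}%")
```

Because the table of critical values is hard-coded for 4 degrees of freedom, this sketch applies only to two sets of three points each.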

Task 2. Clear the data sets by clicking on the button labeled CLEAR. Create two new data sets by adding points at approximately 49, 49.5 and 50 for Data Set A and at approximately 50, 50.5 and 51 for Data Set B. How confident can you be that these two data sets come from different populations?

Add an additional data point to Data Set A between 49 and 50. How does this additional point change your analysis? Add an additional data point to Data Set B between 50 and 51. How does this additional point change your analysis? Continue adding data points to the two data sets (between 49 and 50 for Data Set A and between 50 and 51 for Data Set B) until the difference between the two data sets is significant at the 99% CL. What happens to the means and standard deviations as you add more samples? How many samples did you need?
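
To follow how P(type 1 error) responds as you add points, you can estimate it from a t value and its degrees of freedom. The sketch below does this by numerically integrating the Student's t distribution with Simpson's rule; it assumes a two-tailed test, and the names t_pdf and p_type1 are our own. Whether the applet computes the value this way is not known.

```python
from math import gamma, pi, sqrt


def t_pdf(x, df):
    """Probability density of Student's t distribution."""
    coeff = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def p_type1(t, df, steps=2000):
    """Two-tailed P(type 1 error) for a t value (steps must be even)."""
    t = abs(t)
    if t == 0:
        return 1.0
    h = t / steps
    # Simpson's rule for the area under the density between 0 and t
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    # The distribution is symmetric, so both tails lie outside -t..t
    return 1.0 - 2.0 * area_0_to_t


# Nominal Task 2 starting data: means 49.5 and 50.5, both s = 0.5, n = 3 each
t_start = (50.5 - 49.5) / (0.5 * sqrt(1 / 3 + 1 / 3))
p = p_type1(t_start, df=4)
print(f"t = {t_start:.3f}, P(type 1 error) = {p:.3f}, CL = {100 * (1 - p):.1f}%")
```

Recompute t and the degrees of freedom after each added point and pass them to p_type1 to compare with the applet's display.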

Task 3. Clear the data sets and create two new data sets by adding points at approximately 49.5, 50 and 50.5 for Data Set A and at approximately 50, 50.5 and 51 for Data Set B. Continue to add data points to each set (between 49.5 and 50.5 for Data Set A and between 50 and 51 for Data Set B) until you can show that the samples are different at the 99% CL. How many samples did you need?

Task 4. Briefly summarize your general conclusions from these three tasks. In your answer, consider how factors such as the location of the means, the size of the standard deviations and the size of the samples affect the results of the t-test.

When you are done, proceed to Problem 2.