MAT 155 Statistical Analysis, Chapter 9 Inferences from Two Samples (0901-NotesCh9.doc)

Chapter 9 Inferences from Two Samples

Consider the following problem. The Engineering Department at Sims Software, Inc., recently developed two chemical solutions designed to increase the usable life of computer disks. A sample of disks treated with the first solution lasted 86, 78, 66, 83, 84, 81, 84, 109, 65, and 102 hours. Those treated with the second solution lasted 91, 71, 75, 76, 87, 79, 73, 76, 79, 78, 87, 90, 76, and 72 hours. At the 0.10 significance level, can we conclude that there is a difference in the length of time the two types of treatment lasted?

From the question, we see that we are to compare the mean number of hours each treatment lasted. So we use a hypothesis test to determine whether the two population means are equal (...can we conclude that there is a difference...?).

SOLUTION: In this problem, we are not given the means and standard deviations of the samples. Thus, we calculate the mean and standard deviation for each of the samples before beginning our 5-step procedure. Using the TI-84 calculator or Statdisk, we find the means and standard deviations (rounded to one decimal place) as listed in the table below.

Treatment             1       2
Sample Size           10      14
Mean                  83.8    79.3
Standard Deviation    13.7    6.7

We do not know the population standard deviations. However, the histograms, boxplots, and Normal Quantile plots from Statdisk indicate that both sets of data may be considered to come from normally distributed populations. Thus, we use the t-test.

Step 1. State the null hypothesis (contains =, ≤, or ≥) and alternate hypothesis (contains ≠, <, or >).
Think of the statement “can we conclude that there is a difference in the length of time the two types of treatment lasted?” From “…there is a difference…,” we write the alternate hypothesis as $H_1: \mu_1 \ne \mu_2$.

$H_0: \mu_1 = \mu_2$
$H_1: \mu_1 \ne \mu_2$ (Claim)

Step 2. Select a level of significance.
Stated in the problem as 10%, or $\alpha = 0.10$.

Step 3. Identify the statistical test to use.
Since the two sets of data are not related, paired, or matched, we consider the two samples to be independent. Also, by Statdisk’s Prelim F-test, we reject the hypothesis of equal variances. Thus, we calculate the test statistic with the following formula.

$$ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} $$

Step 4. Formulate a decision rule.
Since the alternate hypothesis states $H_1: \mu_1 \ne \mu_2$, this is a two-tailed test. For the degrees of freedom, we use the smaller of $n_1 - 1$ and $n_2 - 1$. Since $n_1 - 1 = 10 - 1 = 9$ and $n_2 - 1 = 14 - 1 = 13$, we use df = 9 and $\alpha = 0.10$ for a two-tailed test to find the critical values $t = \pm 1.833$ from Table A-3. [Note: Statdisk yields critical values $t = \pm 1.781$ because it uses a more accurate formula for the degrees of freedom.] So the decision rule is to reject the null hypothesis if the computed t-value is less than -1.833 or greater than 1.833.

Step 5. Take a sample; arrive at a decision.
The two sample means and standard deviations, as shown in the table above Step 1, are used to compute the t-value for comparison with the critical t-values in Step 4. Under the null hypothesis, $\mu_1 - \mu_2 = 0$.

$$ t = \frac{(83.8 - 79.3) - 0}{\sqrt{\dfrac{13.7^2}{10} + \dfrac{6.7^2}{14}}} = 0.9599 $$

Since the computed t = 0.960 is between the critical t-values of -1.833 and 1.833, we do not reject the null hypothesis that the means are equal at the 10% level of significance. Thus, there is not sufficient evidence to conclude that the mean lengths of time the two types of treatment lasted are different. [Note: Both Statdisk and the TI-84 Plus yield the same computed t-value of 0.9599 and the same p-value of 0.3559.]

If it is determined that the population variances are equal, we pool the sample variances using the following formulas.
$$ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_p^2}{n_1} + \dfrac{s_p^2}{n_2}}} $$

where

$$ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{(n_1 - 1) + (n_2 - 1)} \quad \text{(pooled variance)} $$

and the number of degrees of freedom is given by df = $n_1 + n_2 - 2$.

[The following notes are from the Help session of Statdisk.]

Hypothesis Testing for the Proportion of Two Samples

This test is used to test a claim about two different population proportions (or percentages), such as this claim: "The percentage of males who smoke is equal to the percentage of females who smoke."

First select the format of the original claim that is to be tested. Note that there are 6 different choices, which can be viewed by clicking on the small box to the right of the default claim ("Pop. Proportion 1= Pop. Proportion 2"). Use the up/down arrow keys to highlight the desired choice, then click on that desired choice. STATDISK will determine the appropriate null hypothesis from the claim selected.

Significance (level): Enter a positive value in decimal form between 0.00001 and 0.50, such as 0.05.

For Sample 1, make these entries:
Sample Size, n1: Enter the size of the FIRST sample as a positive whole number.
Num Successes, x1: Enter the number of successes as a positive whole number.
For example, if testing the claim that "the percentage of males who smoke is equal to the percentage of females who smoke," and the first sample consists of 200 males with 80 of them smoking, then n1 = 200 and x1 = 80.

For Sample 2, make the corresponding entries as described above.

Click on the Evaluate button to obtain the results.
Click on the Print button to print the results.
Click on the Plot button to display a graph showing the distribution with the test statistic and critical value(s).
To close the window, click on the X at the top of the module window.

Hypothesis Testing for the Mean of Two Matched Pairs

Note: This test requires that paired data are already listed in the STATDISK data window.
If the data are not already there, close this window and enter or open data sets so that they are listed in columns of the STATDISK Data Window. (To open a data set from Appendix B in the textbook, click on "Datasets" at the top. To manually enter a data set, click on Data, then click on Sample Editor. The data can also be copied into STATDISK from another application, such as Minitab or Excel.)

This test is used to test a claim about the mean of the differences between matched pairs of data. The data must be matched or paired. For example, in testing the effectiveness of a new diet, obtain before and after sample weights for each subject, then proceed to test the claim that the differences have a mean that is positive.

[Matched or paired data arise, for example, in a before-and-after experiment that uses the same subjects for both measurements. Consider the effectiveness of a diet using the weights of subjects measured before and after the diet treatment. Each “before” value is matched or paired with the “after” value because each before/after pair of measurements comes from the same person.]

First select the format of the original claim that is to be tested. Note that there are 6 different choices, which can be viewed by clicking on the small box to the right of the default claim ("Mean of Differences = 0"). Use the up/down arrow keys to highlight the desired choice, then click on that desired choice. STATDISK will determine the appropriate null hypothesis from the claim selected.

Significance (level): Enter a positive value in decimal form, such as 0.05.

Choose the two columns to be used as sample data. The columns must be matched or paired, so they must have the same number of entries.

Click on the Evaluate button to obtain the results.
Click on the Print button to print the results.
Click on the Plot button to display a graph showing the distribution with the test statistic and critical value(s).
To close the window, click on the X at the top of the module window.

Hypothesis Testing: Mean Two Independent Samples

This test is used to test a claim about the equality of the means of two independent populations. (If the data are somehow matched or paired, do not use this test; use the hypothesis test for "Mean Matched Pairs.") For example, this test could be used for the claim that "Men have a mean IQ score that is equal to the mean IQ score of women," which would be denoted as "Pop. Mean 1 = Pop. Mean 2."

First select the format of the original claim that is to be tested. Note that there are 6 different choices, which can be viewed by clicking on the small box to the right of the default claim ("Pop. Mean 1= Pop. Mean 2"). Use the up/down arrow keys to highlight the desired choice, then click on that desired choice. STATDISK will determine the appropriate null hypothesis from the claim selected.

Significance (level): Enter a positive value in decimal form, such as 0.05.

For Sample 1, make these entries:
Sample Size, n1: Enter the sample size as a positive whole number.
Sample 1 mean: Enter the sample mean of the first sample.
Sample 1 StDev: Enter the standard deviation of the first sample as a positive number.
Population StDev: Enter the value of the standard deviation of the first POPULATION only if that value is known. In reality, this value is usually not known, so this box is usually left blank. DO NOT ENTER THE VALUE OF THE SAMPLE STANDARD DEVIATION HERE.

For the second sample, make the corresponding entries as described above.

If the two POPULATION standard deviations are entered, the results will be based on calculations using a normal distribution, as described in the textbook.
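As a rough illustration of the known-population-standard-deviation case, the sketch below computes a two-sample z statistic and a two-tailed p-value from the standard normal distribution. It is only a minimal sketch and not STATDISK's own code: the function name `two_sample_z` and all of the numbers passed to it are hypothetical values chosen for illustration, and it assumes the null hypothesis that the two population means are equal.

```python
import math
from statistics import NormalDist


def two_sample_z(mean1, sigma1, n1, mean2, sigma2, n2):
    """Return (z, two-tailed p-value) for H0: mu1 = mu2.

    sigma1 and sigma2 are KNOWN population standard deviations,
    not sample standard deviations.
    """
    # Standard error of the difference between the two sample means.
    std_error = math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    z = (mean1 - mean2) / std_error
    # Two-tailed p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical inputs (not taken from these notes): two samples of 50,
# sample means 100 and 97, both population standard deviations 15.
z_stat, p_val = two_sample_z(100.0, 15.0, 50, 97.0, 15.0, 50)
print(f"z = {z_stat:.4f}, two-tailed p-value = {p_val:.4f}")
```

With these made-up inputs the standard error is 3, so z is 1 and the two-tailed p-value is about 0.32.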
If the two POPULATION standard deviations are not both known (as is usually the case), you must choose among these options:

Not Eq vars: NO POOL (That is, do not assume that the two population variances are equal, and do not use a pooled estimate of a common population variance.)
Eq. vars: POOL (That is, assume that the two population variances are equal, and use a pooled estimate of the common population variance.)
Prelim F-Test (That is, have STATDISK do a preliminary F test to decide whether the two population variances appear to be equal. Then, based on the result, proceed by either assuming that the two population variances are not equal (NO POOL) or by assuming that the two population variances are equal (POOL).)

Among the above three options, the NO POOL option generally yields the best results and it is the recommended option.

Note: If the NO POOL option is used, STATDISK proceeds with calculations using the t distribution with the number of degrees of freedom found from the formula included in the textbook (instead of using the simplified "df = smaller of n1-1 and n2-1"). STATDISK displays the number of degrees of freedom. Because STATDISK uses the more accurate formula for df, the results may differ somewhat from those given in the textbook.

Click on the Evaluate button to obtain the results.
Click on the Print button to print the results.
Click on the Plot button to display a graph showing the distribution with the test statistic and critical value(s).
To close the window, click on the X at the top of the module window.
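To make the difference between the two df choices concrete, here is a minimal sketch that applies them to the disk-treatment summary statistics from the worked example (means 83.8 and 79.3, standard deviations 13.7 and 6.7, sample sizes 10 and 14). These notes do not reproduce the textbook's df formula, so the sketch assumes it is the usual Welch-Satterthwaite approximation; the function name `welch_t_and_df` is hypothetical.

```python
import math


def welch_t_and_df(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df) for the unpooled (NO POOL) two-sample t-test.

    df uses the Welch-Satterthwaite approximation, which is ASSUMED
    here to be the "more accurate formula" mentioned in the notes.
    """
    v1 = sd1 ** 2 / n1  # estimated variance of the first sample mean
    v2 = sd2 ** 2 / n2  # estimated variance of the second sample mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Rounded summary statistics from the table in the worked example.
t_stat, df_welch = welch_t_and_df(83.8, 13.7, 10, 79.3, 6.7, 14)

# Simplified rule used in the hand calculation: smaller of n1-1 and n2-1.
df_simple = min(10 - 1, 14 - 1)

print(f"t = {t_stat:.4f}")
print(f"Welch-Satterthwaite df = {df_welch:.2f}")
print(f"Simplified df = {df_simple}")
```

Under this assumption the t-value is about 0.96 and the df is roughly 12.1 rather than 9. That is consistent with the critical values of about ±1.78 that Statdisk reports for this example, compared with ±1.833 from the table lookup at df = 9.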