Inferences about the Difference in Two Population Means
o Independent Samples
o Paired (or Related) Samples

When you finish these notes, for each procedure, you should know:
a. When to use it
b. Its requirements
c. How to determine if the requirements are met
d. How to test a hypothesis about the difference
e. How to estimate the difference in means

1. Inferences about Difference in Two Means for Independent Samples

1.1 Notes

1.1.1 When it should be used
o You wish to infer about the difference (on average) between two populations
o You have collected a sample from each population
Example of use: Best Management Practices (search for "two sample"): https://my.sfwmd.gov/pls/portal/docs/PAGE/COMMON/NEWSR/rog_final_schedule_4_2_01_12_09.pdf

1.1.2 Requirements for t distribution
o Same variance: Both populations have the same unknown variance
o Independence: Random sample from population 1 and another independent random sample from population 2
o Normality: Both populations are normally distributed or the sample sizes are large
o Use the tests from NCSS to check the assumptions of normality and equal variance

1.1.3 Estimate of difference in population means
o Difference in sample means, $\bar{X}_1 - \bar{X}_2$

1.1.4 Degrees of freedom
o $n_1 - 1$ from sample one and $n_2 - 1$ from sample two
o Total degrees of freedom = $n_1 + n_2 - 2$

1.1.5 Variance Estimate
o Since the population variances are the same, only one estimate is needed.
o Use information from both sample variances
o Weight the variances by the share of information from each sample, using the degrees of freedom
o The estimate is then a pooled variance:
  $S_p^2 = \frac{n_1 - 1}{n_1 + n_2 - 2} S_1^2 + \frac{n_2 - 1}{n_1 + n_2 - 2} S_2^2$
  or
  $S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2}$
o Example: the first sample of size 91 has a variance of 10 and the second sample of size 11 has a variance of 30
  o total degrees of freedom = 90 + 10 = 100
  o first variance is multiplied by 90/100 = 0.90 and second by 10/100 = 0.10
  o pooled variance is then 0.90*10 + 0.10*30 = 12

1.1.6 Standard error of difference in sample means
o Standard errors: standard deviation divided by square root of sample size
o Here we have two samples and therefore two sample sizes
o Estimate of standard error is
  $S_{(\bar{x}_1 - \bar{x}_2)} = S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$

1.2 Hypothesis test of differences in population averages
o Used when you are testing a given value of the difference

1.2.1 Approach
Overall: See if there is too much distance, measured in standard errors, between the sample difference in means and the hypothesized difference.
Step 1: Determine what you wish to show; this goes in the alternative hypothesis.
Step 2: Determine the null hypothesis. This hypothesis must contain an equal sign specifying the hypothesized difference.
Step 3: Determine what values of sample differences would reject the null and support what you wish to show. Use a t-table to determine how far the sample difference must be from the hypothesized difference to reject the null.
Step 4: Calculate how far the difference in sample means is from the hypothesized value in number of standard errors. Using step 3, decide if the data supports the alternative.
Step 4 (alternative): Calculate how far your sample difference is from the hypothesized value in number of standard errors. If the likelihood of a result this extreme is small (less than α), you can reject the null and support the alternative.
Step 5: Restate your conclusion in terms of the problem.

1.2.2 Example
Is there a difference in average fill of two box filling machines?
A random sample of 25 boxes from the first machine showed a mean of 379.5 ounces with a standard deviation of 15, while a random sample of 25 boxes from the second machine showed a sample average of 374.5 with a standard deviation of 14. The pooled standard deviation is then 14.509. Test at the 0.05 level.

H0: $\mu_1 - \mu_2 = 0$ (There is no difference in average fill between two machines)
H1: $\mu_1 - \mu_2 \neq 0$ (There is a difference in average fill between two machines)

This is a two-sided rejection region since only sample differences far above zero or far below zero would cause you to reject the null and support the alternative. Therefore degrees of freedom = 24 + 24 = 48 and $t_{48, 0.025} = 2.0106$.

Rejection Region: t > 2.0106 or t < -2.0106

Test statistic:
$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{S_{(\bar{x}_1 - \bar{x}_2)}} = \frac{(379.5 - 374.5) - (0)}{14.509 \sqrt{\frac{1}{25} + \frac{1}{25}}} = 1.22$

One of the two sides for the p-value is found by finding 1.22 in the t-table on row 48. It falls between the 0.10 and 0.25 columns. The p-value is then twice that, or between 0.20 and 0.50.

Make statistical decision. Fail to reject that $\mu_1 - \mu_2 = 0$, since 1.22 lies inside the interval from -2.0106 to 2.0106.

Conclusion: We do not have enough evidence to conclude that there is a difference in average fill of two machines.

1.3 Confidence interval for difference in population averages

1.3.1 Notes
o Uses the same parts and requirements as a hypothesis test

1.3.2 Margin of error
o Knowledge: Sample sizes
o Confidence: t-table value
o Variance: pooled estimate of variance

1.3.3 Formula: difference in sample averages plus and minus the margin of error

1.3.4 Example: What is the difference in average fill of two machines that fill boxes of cereal?
A random sample of 25 boxes from the first machine showed a mean of 379.5 ounces with a standard deviation of 15, while a random sample of 25 boxes from the second machine showed a sample average of 374.5 with a standard deviation of 14. The pooled standard deviation is then 14.509. Use a 90% confidence level.
Formula: $(\bar{X}_1 - \bar{X}_2) \pm t_{n_1 + n_2 - 2} \, S_{(\bar{x}_1 - \bar{x}_2)}$
o $t_{48, 0.05} = 1.6772$
o $S_{(\bar{x}_1 - \bar{x}_2)} = S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = (14.509) \sqrt{\frac{1}{25} + \frac{1}{25}} = 4.10$
Substitution: $(379.5 - 374.5) \pm 1.6772(4.10) = 5 \pm 6.9$
Conclusion: With 90% confidence we can say that the average fill of machine one is 5 ounces more than the average fill of machine two with a margin of error of 6.9 ounces. Because the interval (about -1.9 to 11.9) contains zero, this agrees with the test in 1.2.2, which found no significant difference.

2. Inferences about Difference in Two Means for Related Samples

2.1 Notes

2.1.1 When it should be used
o You wish to infer about a difference in means
o You have a random sample of pairs of values
Examples:
o Exam 1 and exam 2 grades for each of 20 students
o Restaurant 1 sales and restaurant 2 sales on the same 10 days
o Assessed value and sales price for each of 15 houses

2.1.2 Requirements for t distribution
o Random sample of pairs
o Difference in paired values is normally distributed or there is a large number of pairs
o Use NCSS to test the assumption of normality

2.1.3 Estimate of difference in population means
o Average of the differences between paired values in the sample

2.1.4 Degrees of freedom
o (n-1) where n is the number of pairs

2.1.5 Variance Estimate
o Find the sample variance of the differences in paired values

2.1.6 Standard error of difference in sample means
o Standard errors: standard deviation divided by square root of sample size (here, the standard deviation of the differences and the number of pairs)

2.2 Hypothesis test of differences in population averages

2.2.1 Approach
Overall: See if there is too much distance, measured in standard errors, between the sample average difference and the hypothesized difference.
Step 1: Determine what you wish to show; this goes in the alternative hypothesis.
Step 2: Determine the null hypothesis. This hypothesis must contain an equal sign specifying the hypothesized difference.
Step 3: Determine what values of sample average difference would reject the null and support what you wish to show. Use a t-table with n-1 degrees of freedom to determine how far the sample average difference must be from the hypothesized difference to reject the null.
Step 4.
Calculate how far your sample mean difference is from the hypothesized mean difference in number of standard errors. Using step 3, decide if the data supports the alternative.
Step 4 (alternative): Calculate how far your sample average difference is from the hypothesized value in number of standard errors. If the likelihood of a result this extreme is small (less than α), you can reject the null and support the alternative.
Step 5: Restate your conclusion in terms of the problem.

2.2.2 Example
Is there a difference in average number of customers served by two workers doing similar jobs? Because the demand is not the same on different days, a random sample of ten days is selected and the number of customers served is measured for both workers on those days. The sample mean difference is 1.4 with a sample standard deviation of approximately 3.5 (3.47 to two decimals).

Day   Worker 1   Worker 2   Difference
 1       20         18           2
 2       19         21          -2
 3       14         11           3
 4        3          2           1
 5       24         14          10
 6       14         15          -1
 7        9          9           0
 8       14         16          -2
 9       11         10           1
10       18         16           2

H0: $\mu_D = 0$ (The average difference between workers is zero)
H1: $\mu_D \neq 0$ (There is a difference between workers on average.)

This is a two-sided rejection region since only sample averages far above zero or far below zero would cause you to reject the null and support the alternative. Degrees of freedom = 9 and the t-table value is $t_{9, 0.025} = 2.262$.

Rejection Region: Reject H0 if t < -2.262 or t > 2.262

Test statistic, given the sample mean difference is 1.4 and the sample standard deviation is 3.47:
$t = \frac{\bar{X} - \mu_D}{s / \sqrt{n}} = \frac{(1.4 - 0)}{3.47 / \sqrt{10}} = 1.28$

One of the two sides for the p-value is found by finding 1.28 in the t-table on row 9. It falls between the 0.10 and 0.25 columns (the table values are 1.383 and 0.703). The p-value is then twice that, or between 0.20 and 0.50.

Make statistical decision. Fail to reject that $\mu_D = 0$, since 1.28 lies inside the interval from -2.262 to 2.262.
Conclusion: We do not have enough evidence to conclude that there is a difference in average number of customers served by two workers doing similar jobs.

2.3 Confidence interval for difference in population averages

2.3.1 Notes
o Uses the same parts and requirements as a hypothesis test

2.3.2 Margin of error
o Knowledge: number of pairs
o Confidence: t-table value
o Variance: sample variance of differences

2.3.3 Formula: Sample average difference plus and minus the margin of error

2.3.4 Example: What is the difference in average number of customers served by two workers doing similar jobs?
The sample mean difference is 1.4 with a sample standard deviation of 3.5. Use a 99% confidence level.
Formula: $\bar{X} \pm t_{n-1} \frac{S}{\sqrt{n}}$
o $t_{9, 0.005} = 3.250$
Substitution: $1.4 \pm 3.25 \cdot \frac{3.5}{\sqrt{10}} = 1.4 \pm 3.6$
Conclusion: With 99% confidence we can say that the number of customers served by worker one is, on average, 1.4 more than worker two with a margin of error of 3.6 people.

3. Use in Business: Government reports on small business research
http://www.sba.gov/advo/research/rs205.pdf#search=%22t-tests%20business%20-.edu%22
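
Checking the worked examples: The arithmetic in the examples of sections 1.1.5, 1.2.2, 1.3.4, 2.2.2 and 2.3.4 can be re-checked with a short script. What follows is a minimal Python sketch, not part of the original notes (which rely on NCSS and a printed t-table). The function names are hypothetical, only the standard library is used, and the t-table values are copied from the notes rather than computed.

```python
import math
from statistics import mean, stdev


def pooled_variance(n1, var1, n2, var2):
    """Pooled variance: each sample variance weighted by its degrees of freedom."""
    df = n1 + n2 - 2
    return ((n1 - 1) * var1 + (n2 - 1) * var2) / df


def pooled_std_error(n1, s1, n2, s2):
    """Standard error of (xbar1 - xbar2), assuming equal population variances."""
    sp = math.sqrt(pooled_variance(n1, s1 ** 2, n2, s2 ** 2))
    return sp * math.sqrt(1 / n1 + 1 / n2)


def t_statistic(estimate, hypothesized, std_error):
    """Distance from the hypothesized value in number of standard errors."""
    return (estimate - hypothesized) / std_error


# Section 1.1.5: variances 10 and 30 from samples of size 91 and 11.
sp2_example = pooled_variance(91, 10, 11, 30)

# Sections 1.2.2 and 1.3.4: two box filling machines (independent samples).
se_fill = pooled_std_error(25, 15, 25, 14)
t_fill = t_statistic(379.5 - 374.5, 0, se_fill)
reject_fill = abs(t_fill) > 2.0106   # t-table value for 48 df, 0.025 per tail
margin_fill = 1.6772 * se_fill       # t-table value for 48 df, 90% confidence

# Sections 2.2.2 and 2.3.4: customers served by two workers (paired samples).
worker1 = [20, 19, 14, 3, 24, 14, 9, 14, 11, 18]
worker2 = [18, 21, 11, 2, 14, 15, 9, 16, 10, 16]
diffs = [a - b for a, b in zip(worker1, worker2)]
d_bar = mean(diffs)                  # sample mean of the paired differences
s_d = stdev(diffs)                   # sample standard deviation (n - 1 divisor)
se_d = s_d / math.sqrt(len(diffs))
t_paired = t_statistic(d_bar, 0, se_d)
reject_paired = abs(t_paired) > 2.262  # t-table value for 9 df, 0.025 per tail
margin_paired = 3.250 * se_d           # t-table value for 9 df, 99% confidence

print(f"pooled variance (1.1.5): {sp2_example:.2f}")
print(f"independent: SE={se_fill:.2f}, t={t_fill:.2f}, "
      f"reject={reject_fill}, 90% margin={margin_fill:.2f}")
print(f"paired: mean={d_bar:.2f}, s={s_d:.2f}, t={t_paired:.2f}, "
      f"reject={reject_paired}, 99% margin={margin_paired:.2f}")
```

The paired margin here uses the unrounded standard deviation of the differences (about 3.47) instead of the rounded 3.5 used in the hand calculation, so it can differ slightly in the second decimal while rounding to the same 3.6.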