INTRODUCTION TO HYPOTHESIS TESTING

Purpose

A hypothesis test allows us to draw conclusions or make decisions regarding a population from sample data. In the following cases, we are making decisions regarding the population mean (or means).

One Sample Hypothesis Tests

• Applied to determine if the population mean is consistent with a specified value or standard
• Two tests
    • the z-test
    • the t-test

Assumptions: z-test
• the underlying distribution is normal or the Central Limit Theorem can be assumed to hold
• the sample has been randomly selected
• the population standard deviation is known or the sample size is at least 25

Assumptions: t-test
• the underlying distribution is normal or the Central Limit Theorem can be assumed to hold
• the sample has been randomly selected

Two Samples Hypothesis Tests

• Applied to compare the values of two population means
• Two tests
    • the z-test
    • the t-test

Assumptions: z-test
• the underlying distribution is normal or the CLT can be assumed to hold
• the samples have been randomly and independently selected from two populations
• the population standard deviations are known or the sample size of each sample is at least 25

Assumptions: t-test
• the underlying distribution is normal or the CLT can be assumed to hold
• the samples have been randomly and independently selected from two populations
• the variability of the measurements in the two populations is the same and can be measured by a common variance (There is a t-test that does not make this assumption; it is available when using Minitab.)

Goodson/ 3360hyp

The Logic of Hypothesis Tests

• Assume a population distribution with a specified population mean.
• State the hypothesized population mean. This statement is referred to as the null hypothesis and is designated Ho. For example, μ = 10.
• State the logical alternative to this hypothesis. This is called the alternate hypothesis and is designated Ha.
  For example, μ ≠ 10. (Note that the alternate hypothesis can take other forms, since "not equal" can mean μ > 10 or μ < 10.)
• Draw a random sample from the population.
• Calculate the sample mean x̄. This sample mean represents one point on the distribution of sample means.
• Determine the “relative position” of the calculated sample mean on the distribution of sample means.
• If the sample mean is “close” to the specified population mean, we do not have evidence to reject the hypothesized population mean.
• If the sample mean is “not close” to the specified population mean, we conclude that our sample is unlikely to have been drawn from the hypothesized distribution, and thus we reject the null hypothesis.

Problems for Discussion

Example: One Sample Hypothesis Test

Large Sample: Sample size n > 30

1. The scores on an aptitude test required for entry into a certain job position have a mean of 500 and a standard deviation of 120. If a random sample of 36 applicants has a mean of 546, is there evidence that their mean score is different from the mean that is expected from all applicants?

Null and Alternative Hypothesis
Ho: μ = 500
Ha: μ ≠ 500

Convert 546 to a z-score to compare it to the assumed population mean.

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{546 - 500}{120/\sqrt{36}} = \frac{46}{20} = 2.3$$

This means that 546 is 2.3 standard errors (standard deviations of the sample mean) above the hypothesized mean. Using the z-table, we find that the probability that a value is to the right of 2.3 or to the left of -2.3 is 2*(0.0107) = 0.0214. This value is called the p value: p = 0.0214. This probability is considered very small (values less than 0.05 are typically considered small). Thus, if the mean is really 500, it is unlikely that we would get a sample mean that is 2.3 standard errors from it. We therefore reject the null hypothesis and accept the alternate, concluding that the population mean is not 500.
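The z-score and two-sided p value from problem 1 can be checked with a few lines of Python. This is a minimal sketch and not part of the original notes: the function name one_sample_z_test is hypothetical, and it assumes the population standard deviation (120) is known, as the z-test requires.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sigma, n):
    """Return (z, two-sided p value) for Ho: mu = mu0 with known sigma.

    Hypothetical helper written for illustration only.
    """
    standard_error = sigma / sqrt(n)
    z = (sample_mean - mu0) / standard_error
    # Two-sided p value: area in both tails beyond |z| under the standard normal.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Values from problem 1: sample mean 546, hypothesized mean 500, sigma 120, n 36.
z, p = one_sample_z_test(sample_mean=546, mu0=500, sigma=120, n=36)
print(f"z = {z:.2f}, p = {p:.4f}")
```

The printed values should agree with the hand calculation (z = 2.3 and a p value near 0.0214) up to the rounding used in the z-table.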
The p value, 0.0214, is the probability of obtaining a sample mean at least this far from 500 when the null hypothesis is true.

Let’s construct a 95% confidence interval estimate of the population mean.

$$546 \pm 1.96 \cdot \frac{120}{\sqrt{36}} = 546 \pm 39.2$$

The lower limit of the interval is 546 - 39.2 = 506.8
The upper limit of the interval is 546 + 39.2 = 585.2

Thus, we are 95% confident that the actual mean score for the population from which this sample was drawn falls between 507 and 585.

Small Samples

2. Do problem number 1 assuming that the sample size is 16.

Approach the problem the same way as in 1, using the t-distribution.

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{546 - 500}{120/\sqrt{16}} = \frac{46}{30} = 1.53$$

The degrees of freedom are 16 - 1 = 15.

Using the t-table with 15 degrees of freedom, we find that the closest t-value to 1.53 is 1.753 and that the associated one-tail probability is 0.05. The two-tail probability is 2*(0.05) = 0.1. Because 1.53 is less than 1.753, the p value is larger than this, and we write p > 0.1. At the 0.05 level we do not reject Ho.

Paired Samples Test

• Find the difference in the paired values.
• Treat the difference scores as one sample.
• Apply a one-sample test.

Example

3. Compare the difference in cost for 1997 vs. 1996.

 1997     1996    Difference
18.36    18.41      -0.05
32.82    31.34       1.48
23.58    37.36     -13.78
17.52    16.58       0.94
19.12    21.35      -2.23
14.85    14.59       0.26
30.50    31.00      -0.50
25.06    26.21      -1.15
30.89    31.52      -0.63
35.74    35.21       0.53
19.33    19.55      -0.22
30.92    25.75       5.17
34.30    33.91       0.39

Do a one-sample test on the difference values.

t = -0.63 (13 - 1 = 12 degrees of freedom)
p = 0.54

Do not reject Ho; we have no evidence to conclude that there has been a change in the cost for 1997 over 1996.

Hypothesis Test: Two Independent Samples

Large Samples

4. The training department of a company wishes to determine if there is any difference in the performance between the workers that have completed a training program and those that have not completed the program. A sample of 100 trained workers reveals an average output of 74.3 parts per hour with a sample standard deviation of 16 parts per hour.
A sample of 100 who have not been trained has an average output of 69.7 parts per hour with a standard deviation of 18 parts per hour. Is there evidence of a difference in output between the two groups? Write a 95% confidence interval estimate of the difference.

Null and Alternative Hypothesis
Ho: μ1 = μ2
Ha: μ1 ≠ μ2
(You could use μ1 > μ2 if you want to test that the output of the trained group is greater than that of the untrained group.)

            Training        No Training
Mean        74.3            69.7
Variance    16*16 = 256     18*18 = 324
N           100             100

Difference in Sample Means = 74.3 - 69.7 = 4.6

$$\text{Standard Error} = \sqrt{\frac{256}{100} + \frac{324}{100}} = 2.408$$

Calculated z = 4.6/2.408 = 1.91
Table z = 1.96

Conclusion: Note 1.91 < 1.96. Thus, do not reject Ho. The data do not support a difference in production between the trained and untrained groups.

95% Confidence Interval Estimate of the Difference

Margin of Error = 1.96*2.408 = 4.72
Lower Limit: 4.6 - 4.72 = -0.12
Upper Limit: 4.6 + 4.72 = 9.32

Note that the interval [-0.12, 9.32] contains 0. This is consistent with the null hypothesis of no difference in population means.
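The two-sample calculation in problem 4 can be reproduced with the short Python sketch below. It is an illustration under stated assumptions rather than part of the original notes: the function name two_sample_z is hypothetical, and the sample standard deviations stand in for the population values, which the large samples (100 each) are taken to justify.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, sd1, n1, mean2, sd2, n2, confidence=0.95):
    """Large-sample z test and confidence interval for mu1 - mu2.

    Hypothetical helper written for illustration only.
    """
    diff = mean1 - mean2
    # Standard error of the difference between two independent sample means.
    standard_error = sqrt(sd1**2 / n1 + sd2**2 / n2)
    z = diff / standard_error
    # Two-sided p value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Critical z for the requested confidence level (about 1.96 for 95%).
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_crit * standard_error
    return z, p_value, (diff - margin, diff + margin)


# Values from problem 4: trained group first, untrained group second.
z, p, (lower, upper) = two_sample_z(74.3, 16, 100, 69.7, 18, 100)
print(f"z = {z:.2f}, p = {p:.3f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```

The two-sided p value comes out slightly above 0.05, which matches the table comparison 1.91 < 1.96, and the interval limits should agree with the hand calculation up to rounding.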
