Hypothesis Testing

Methods for making inferences about population parameters fall into one of two categories: either we estimate the value of the population parameter of interest, or we test a hypothesis about the value of the parameter. With confidence interval estimates, there was no supposition about the actual value of the parameter prior to collecting the data. In hypothesis testing, there is a preconceived idea about the value of the population parameter. For example, in studying the antipsychotic properties of an experimental compound, we might ask whether the average shock-avoidance response $\mu$ of rats treated with a specific dose of the compound is greater than 60 ($\mu > 60$), the value that has been observed after extensive testing using a suitable standard drug.

Thus, there are two hypotheses involved in a statistical study. The first is the hypothesis being proposed by the person conducting the study, called the research hypothesis ($\mu > 60$ in our example). The second is the negation of the research hypothesis, called the null hypothesis ($\mu \le 60$ in our example). The goal of the study is to decide whether the sample data tend to support the research hypothesis.

The fundamental idea behind hypothesis testing procedures is this: we reject the null hypothesis (H0) if the observed sample is very unlikely to have occurred when H0 is true. We begin by assuming that the null hypothesis is correct. The sample is then examined in light of this assumption. If the observed sample would not be unusual when H0 is true, then chance variation from one sample to another is a plausible explanation for what has been observed, and H0 is not rejected. On the other hand, if the observed sample would have been quite unlikely were H0 true, we take the sample as convincing evidence against the null hypothesis and reject H0. We base a decision to reject or to fail to reject the null hypothesis on an assessment of how extreme or unlikely the observed sample is when H0 is true.
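The idea of judging a sample by how unusual it would be when H0 is true can be illustrated with a small simulation. The Python sketch below is not part of the original notes. The rat example supplies only the reference value 60, so the standard deviation (10), the sample size (25), and the two observed sample means (61 and 66) are hypothetical values chosen purely for illustration, and the population is assumed to be normal.

```python
import random
import statistics

def fraction_at_least_as_extreme(mu0, sigma, n, observed_mean, trials=10_000, seed=0):
    """Simulate sample means under H0 (true mean = mu0) and return the
    fraction of simulated means that are >= the observed sample mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    count = 0
    for _ in range(trials):
        # Draw one sample of size n from the H0 population and average it.
        sample_mean = statistics.fmean(rng.gauss(mu0, sigma) for _ in range(n))
        if sample_mean >= observed_mean:
            count += 1
    return count / trials

# Hypothetical illustration values (not from the notes): sigma = 10, n = 25.
typical = fraction_at_least_as_extreme(mu0=60, sigma=10, n=25, observed_mean=61)
extreme = fraction_at_least_as_extreme(mu0=60, sigma=10, n=25, observed_mean=66)
print("share of H0 samples with mean >= 61:", typical)
print("share of H0 samples with mean >= 66:", extreme)
```

Under these assumptions, a sample mean of 61 is matched or exceeded by a sizeable share of the samples generated under H0, so chance variation is a plausible explanation. A sample mean of 66 is almost never reached, which is the kind of outcome that would lead us to reject H0.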
A statistical hypothesis test is only capable of demonstrating strong support for the alternative (research) hypothesis, by rejection of the null hypothesis. When the null hypothesis is not rejected, this does not mean strong support for H0, only a lack of strong evidence against it.

Hypothesis Test for a Population Mean Using a Large Sample (n > 30)

H0: $\mu = \mu_0$
Ha: 1. $\mu > \mu_0$
    2. $\mu < \mu_0$
    3. $\mu \ne \mu_0$

Test Statistic: $z = \dfrac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$

Rejection Region: For a probability $\alpha$ of a Type-I error, we can reject H0 if
1. $z \ge z_{\alpha}$
2. $z \le -z_{\alpha}$
3. $z \ge z_{\alpha/2}$ or $z \le -z_{\alpha/2}$

If $\sigma$ (the population standard deviation) is unknown, $s$ may be used instead as an approximation.

Example

The average (mean) live weight of a farmer's steers prior to slaughter was 380 pounds in past years. This year his 50 steers were fed on a new diet. Suppose we consider these 50 steers on the new diet as a random sample taken from a population of all possible steers that may be fed the diet now or in the future. Use the sample data given below and $\alpha = .01$ to test the research hypothesis that the mean live weight for steers on the new diet is greater than 380.

$\bar{x} = 390$, $s = 35.2$

$\mu$ = population mean live weight of steers fed on the new diet
H0: $\mu = 380$
Ha: $\mu > 380$
Significance Level: $\alpha$ = P(Type-I Error) = .01
Test Statistic: $z = \dfrac{\bar{x} - 380}{s / \sqrt{n}}$
Rejection Region: For $\alpha = .01$, we reject H0 if $z \ge z_{\alpha}$, where $z_{\alpha} = 2.33$
Calculations: $z = \dfrac{390 - 380}{35.2 / \sqrt{50}} = 2.01$
Conclusion: Using $\alpha = .01$, we fail to reject H0. There is not sufficient evidence to conclude that the mean live weight for steers on the new diet is greater than 380.

Example

WSU uses thousands of fluorescent light bulbs each year. The brand of bulb it currently uses has a mean life of 900 hours. A manufacturer claims that its new brand of bulbs, which cost the same as the brand the university currently uses, has a mean life of more than 900 hours. The university has decided to purchase the new brand if, when tested, the test evidence supports the manufacturer's claim at $\alpha = .05$.
Suppose sixty-four bulbs were tested with the following results: $\bar{x} = 930$ hours, $s = 80$ hours. Will WSU purchase the new brand of fluorescent bulbs? Conduct a hypothesis test.

$\mu$ = population mean life for the new brand of bulbs
H0: $\mu = 900$
Ha: $\mu > 900$ (the mean life for the new brand of bulbs is higher than the mean life for the old brand)
Significance Level: $\alpha = .05$
Test Statistic: $z = \dfrac{\bar{x} - 900}{s / \sqrt{n}}$
Rejection Region: For $\alpha = .05$, we reject H0 if $z \ge z_{\alpha}$, where $z_{\alpha} = 1.645$
Calculations: $z = \dfrac{930 - 900}{80 / \sqrt{64}} = 3.00$
Conclusion: Using $\alpha = .05$, we reject H0. There is sufficient evidence to conclude that the mean life for the new brand of bulbs is greater than 900.

P-value Approach to Hypothesis Testing

The p-value associated with the observed value of the test statistic is the probability of getting the observed value or a value more extreme (in the direction of Ha), assuming H0 is true. In other words, the p-value is the probability, computed under H0, of observing a sample outcome at least as contradictory to H0 as the observed sample result. The smaller the p-value, the stronger the evidence against the null hypothesis. To draw conclusions using the p-value, compare it with $\alpha$ and apply the following rule.

If the p-value $\le \alpha$, reject H0 and conclude that Ha is true.
If the p-value $> \alpha$, fail to reject H0.

Type-I Error, Type-II Error, and Power in Hypothesis Testing

A Type I error refers to the decision of rejecting H0 when it is actually true. The probability of making a Type I error is denoted by $\alpha$.
A Type II error refers to the decision of failing to reject H0 when it is false. The probability of making a Type II error is denoted by $\beta$.
The power of a test is the probability of rejecting a false null hypothesis. It is denoted by $1 - \beta$, where $\beta$ is the probability of making a Type-II error.
With the sample size held fixed, if you decrease $\alpha$, $\beta$ will increase. If you increase $\alpha$, $\beta$ will decrease.
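The two worked examples (steers and light bulbs) and the p-value rule can be checked numerically. The following is a minimal Python sketch, not part of the original notes, using only the standard library. The function name `upper_tailed_z_test` and the returned dictionary layout are hypothetical, and the sketch covers only the upper-tailed case (Ha: $\mu > \mu_0$) with $s$ used in place of $\sigma$, as the notes allow for large samples.

```python
from math import sqrt
from statistics import NormalDist

def upper_tailed_z_test(xbar, mu0, s, n, alpha):
    """Large-sample z test of H0: mu = mu0 against Ha: mu > mu0."""
    std_normal = NormalDist()                # standard normal, mean 0 and sd 1
    z = (xbar - mu0) / (s / sqrt(n))         # test statistic
    z_crit = std_normal.inv_cdf(1 - alpha)   # critical value z_alpha
    p_value = 1 - std_normal.cdf(z)          # upper-tail area beyond z
    return {"z": z, "z_crit": z_crit, "p_value": p_value, "reject": z >= z_crit}

# Data taken from the two examples in the notes.
steers = upper_tailed_z_test(xbar=390, mu0=380, s=35.2, n=50, alpha=0.01)
bulbs = upper_tailed_z_test(xbar=930, mu0=900, s=80, n=64, alpha=0.05)

for name, result in (("steers", steers), ("bulbs", bulbs)):
    rounded = {k: round(v, 4) if isinstance(v, float) else v for k, v in result.items()}
    print(name, rounded)
```

Rejecting when $z \ge z_{\alpha}$ and rejecting when the p-value is at most $\alpha$ are the same decision, so the `reject` flag should agree with a direct comparison of `p_value` and `alpha` in both examples.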
Statistical versus Practical Significance

When the value of the test statistic falls in the rejection region, it is customary to say that the result is statistically significant at the chosen level $\alpha$. However, a statistically significant result may not have any practical consequences. This is something to be wary of when a very large sample is used in carrying out the hypothesis test. The following example illustrates this point.

Example

Let $\mu$ denote the population average IQ for children in a certain region of the United States. The average IQ for all children in the United States is 100. Education authorities are interested in testing
H0: $\mu = 100$
Ha: $\mu > 100$
A sample of 2500 students resulted in the following: $\bar{x} = 101$, $s = 15$. Using $\alpha = .01$, carry out a hypothesis test.

The test statistic is $z = \dfrac{101 - 100}{15 / \sqrt{2500}} = 3.33$, which exceeds $z_{.01} = 2.33$, so H0 is rejected. With n = 2500, the point estimate $\bar{x} = 101$ is almost surely very close to the true value of $\mu$. So it looks as though H0 was rejected because $\mu \approx 101$ rather than 100. From a practical point of view, a 1-point IQ difference has no significance. So the statistically significant result does not have any practical consequences.

Practice Problems

Hypothesis Testing

Practice Problem 1

WSU uses thousands of fluorescent light bulbs each year. The brand of bulb it currently uses has a mean life of 900 hours. A manufacturer claims that its new brand of bulbs, which cost the same as the brand the university currently uses, has a mean life of more than 900 hours. The university has decided to purchase the new brand if, when tested, the test evidence supports the manufacturer's claim at $\alpha = .05$. Suppose sixty-four bulbs were tested with the following results: $\bar{x} = 930$ hours, $s = 80$ hours. Will WSU purchase the new brand of fluorescent bulbs? Conduct a hypothesis test. Use both the traditional and the p-value approach.

Practice Problem 2

A nutritionist believes that a 12-ounce box of breakfast cereal should contain an average of 1.2 ounces of bran. The nutritionist measures a random sample of sixty boxes of a popular cereal for bran content.
Suppose the data yield $\bar{x} = 1.170$, $s = .111$. Do the data indicate that the mean bran content of all boxes of this brand of cereal differs from 1.2 ounces? Use $\alpha = .05$. Use both the traditional and the p-value approach.

Practice Problem 3

Speed, size, and strength are thought to be important factors in football performance. The paper "Physical and Performance Characteristics of NCAA Division I Football Players" (Research Quarterly for Exercise and Sport (1990): 395-401) reported on physical characteristics of Division I starting football players in the 1988 football season. Information for teams ranked in the top 20 was easily obtained, and it was reported that the mean weight of starters on top-20 teams was 105 kg. A sample of 33 starting players (various positions were represented) from Division I teams that were not ranked in the top 20 resulted in a sample mean of 103.3 kg and a sample standard deviation of 16.3 kg. Is there sufficient evidence to conclude that the mean weight for non-top-20 starters is less than the known value for top-20 teams? Conduct a hypothesis test using $\alpha = .01$. Use both the traditional and the p-value approach.

Practice Problem 4

Factors Influencing the Power of a Hypothesis Test

Suppose we are interested in testing H0: $\mu = 105$ vs. Ha: $\mu > 105$ using a sample of size n = 35 and $\alpha = .025$. Based on historical data we estimate $\sigma$ to be 17.
a. If $\mu$ is actually equal to 108, what is the probability that we would reject H0: $\mu = 105$ using a sample of size n = 35 and $\alpha = .025$?
b. If $\mu$ is actually equal to 110, what is the probability that we would reject H0: $\mu = 105$ using a sample of size n = 35 and $\alpha = .025$?
c. If $\mu$ is actually equal to 114, what is the probability that we would reject H0: $\mu = 105$ using a sample of size n = 35 and $\alpha = .025$?
What do you learn from the above calculations?
d. If $\mu$ is actually equal to 108, what is the probability that we would reject H0: $\mu = 105$ using a sample of size n = 50 and $\alpha = .025$?
e. If $\mu$ is actually equal to 108, what is the probability that we would reject H0: $\mu = 105$ using a sample of size n = 100 and $\alpha = .025$?
f. If $\mu$ is actually equal to 108, what is the probability that we would reject H0: $\mu = 105$ using a sample of size n = 200 and $\alpha = .025$?
What do you learn?
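For readers who want to check their hand calculations in Practice Problem 4, the power of an upper-tailed large-sample z test can be written as $1 - \Phi\left(z_{\alpha} - \dfrac{\mu_a - \mu_0}{\sigma / \sqrt{n}}\right)$, where $\mu_a$ is the true mean and $\Phi$ is the standard normal cumulative distribution function. This follows from rejecting H0 when $\bar{x} \ge \mu_0 + z_{\alpha}\,\sigma / \sqrt{n}$. The Python sketch below is one possible way to evaluate it and is not part of the original notes. The function name `upper_tailed_power` is hypothetical, and the sketch assumes $\sigma$ is known and the sampling distribution of $\bar{x}$ is normal.

```python
from math import sqrt
from statistics import NormalDist

def upper_tailed_power(mu0, mu_true, sigma, n, alpha):
    """Probability of rejecting H0: mu = mu0 (vs. Ha: mu > mu0)
    when the population mean is actually mu_true."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)         # critical value
    shift = (mu_true - mu0) / (sigma / sqrt(n))     # true shift in standard-error units
    return 1 - std_normal.cdf(z_alpha - shift)

# Parts a-c: vary the true mean with n = 35.
for mu_true in (108, 110, 114):
    power = upper_tailed_power(mu0=105, mu_true=mu_true, sigma=17, n=35, alpha=0.025)
    print(f"mu = {mu_true}, n = 35: power = {power:.4f}")

# Parts d-f (with part a repeated for comparison): vary n with true mean 108.
for n in (35, 50, 100, 200):
    power = upper_tailed_power(mu0=105, mu_true=108, sigma=17, n=n, alpha=0.025)
    print(f"mu = 108, n = {n}: power = {power:.4f}")
```

The printed values should increase as the true mean moves farther above 105 and as the sample size grows, which is the pattern the two "What do you learn?" questions ask about.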