252doctor 9/13/07

The Doctor’s Problem

In Lucy Horwitz and Lou Ferleger’s book Statistics for Social Change (South End Press, 1980) the authors talk about a doctor who believes that a group of workers is suffering from respiratory illness. She knows that a respiratory illness will increase the number of sick days taken, that similar workers nationally take an average of 1.5 sick days a month, and that, nationally, the standard deviation is 0.8. She takes a sample of 100 workers and finds a sample mean of 2.00. This is thus a rare example of a test of a population mean when the population standard deviation is known.

There are three ways we can test this mean of 1.5.

We can simply test whether the mean is 1.5. This would give us the following hypotheses: $H_0: \mu = 1.5$, $H_1: \mu \ne 1.5$.

We can take seriously her belief that the mean number of sick days is above the norm (1.5). This statement, because it does not contain an equality, must be the alternative hypothesis, which would give us the following hypotheses: $H_0: \mu \le 1.5$, $H_1: \mu > 1.5$.

We may hear management claim that their workers are healthier than the norm, which would imply that they take fewer sick days than the norm, which would imply that the mean number of sick days is below 1.5. This would give us the following hypotheses: $H_0: \mu \ge 1.5$, $H_1: \mu < 1.5$.

The doctor wrongly uses the first of these tests, which is bad as a precedent but excellent for teaching purposes. We thus have $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{0.8}{\sqrt{100}} = 0.08$. We will assume that the confidence level is 95%, so that the significance level is $\alpha = .05$. We are testing the two-sided hypotheses $H_0: \mu = 1.5$, $H_1: \mu \ne 1.5$.

Test Ratio Method: $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}}$, where $\mu_0 = 1.5$. Make a diagram showing a ‘Normal’ curve with a mean at zero and two values of $z$ cutting off 2.5% tails on both sides of zero. According to the t table $z_{\alpha/2} = z_{.025} = 1.960$, so the two ‘reject’ regions are the area below -1.960 and the area above +1.960. $z = \frac{2.0 - 1.5}{0.08} = 6.25$ is in the upper ‘reject’ region, so reject the null hypothesis.
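The test ratio calculation above can be checked numerically. The following is only a minimal sketch using Python's standard library; the variable names are illustrative assumptions, and the values $\sigma = 0.8$ and $n = 100$ are the ones behind the standard error of 0.08 used in the calculation.

```python
from math import sqrt
from statistics import NormalDist

# Values taken from the example; the variable names are illustrative only.
mu_0 = 1.5      # mean under the null hypothesis
sigma = 0.8     # known population standard deviation
n = 100         # sample size
x_bar = 2.00    # observed sample mean
alpha = 0.05    # significance level for a 95% confidence level

std_error = sigma / sqrt(n)                    # standard error of the mean, 0.08
z = (x_bar - mu_0) / std_error                 # test ratio
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value of z

# Two-sided test: reject if z falls in either tail.
reject = abs(z) > z_crit
print(f"z = {z:.2f}, critical z = {z_crit:.3f}, reject H0: {reject}")
```

Comparing the absolute value of the ratio with a single critical value is equivalent to checking both ‘reject’ regions, because the two cutoffs differ only in sign.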
Critical Value Method for $\bar{x}$: The formula table says $\bar{x}_{cv} = \mu_0 \pm z_{\alpha/2}\,\sigma_{\bar{x}}$, and this is a formula for two critical values, which is what we need if the null hypothesis simply says that the mean is 1.5. We want two critical values, one above and one below 1.5, since it should be obvious that if our sample mean is too far above or below 1.5, we would reject the null hypothesis. We use $\bar{x}_{cv} = \mu_0 \pm z_{\alpha/2}\,\sigma_{\bar{x}} = 1.5 \pm 1.960(0.08) = 1.5 \pm 0.157$. Make a diagram with 1.5 in the middle showing a 95% ‘accept’ region between 1.343 and 1.657 and two 2.5% ‘reject’ regions, one below 1.343 and one above 1.657. Since $\bar{x} = 2.00$ falls in the upper ‘reject’ region, reject the null hypothesis.

Confidence Interval Method: The formula for a confidence interval for the mean is $\mu = \bar{x} \pm z_{\alpha/2}\,\sigma_{\bar{x}}$, and a two-sided hypothesis requires a two-sided confidence interval. The interval becomes $\mu = \bar{x} \pm z_{\alpha/2}\,\sigma_{\bar{x}} = 2.00 \pm 1.960(0.08) = 2.00 \pm 0.157$, or we can write $P(1.843 \le \mu \le 2.157) = .95$. Make a diagram; you should use 2.00 as the middle. To represent the confidence interval, shade the area between 1.843 and 2.157. Since the null hypothesis mean of 1.5 does not fall in the confidence interval, the confidence interval and the null hypothesis contradict one another, so reject the null hypothesis.

Test ratio method using p-values: A p-value is a measure of the credibility of the null hypothesis and is defined as the probability that a test statistic or ratio as extreme as or more extreme than the observed statistic or ratio (that is, as low or lower, or as high or higher) could occur, assuming that the null hypothesis is true. In this case, values are extreme, relative to the null hypothesis $H_0: \mu = 1.5$, if the sample mean is way above or way below 1.5. The easiest way to measure the probability is with the test ratio $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}}$. We find the value of $z$ and call it $z_1$. If we are doing a 2-sided test and $z_1$ is a positive number, we must find both $P(z \ge z_1)$ and $P(z \le -z_1)$. These two probabilities are identical, since the two cutoffs differ only in sign.
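The equality of the two tail probabilities can be seen with a quick check. This is only a sketch using Python's standard library; the value 2.50 for $z_1$ is chosen purely for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()   # standardized Normal: mean 0, standard deviation 1
z1 = 2.50                   # positive test ratio chosen for illustration only

upper_tail = 1 - std_normal.cdf(z1)   # P(z >= z1)
lower_tail = std_normal.cdf(-z1)      # P(z <= -z1)

print(f"P(z >= {z1}) = {upper_tail:.4f}")
print(f"P(z <= -{z1}) = {lower_tail:.4f}")
print(f"sum of both tails = {upper_tail + lower_tail:.4f}")
```

Because the Normal curve is symmetric about zero, the sum of the two tails is simply twice either one.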
So if $z_1$ is a negative number, we must find $2P(z \le z_1)$ and if $z_1$ is a positive number, we must find $2P(z \ge z_1)$. We already know that $z = \frac{2.0 - 1.5}{0.08} = 6.25$, so $p\text{-value} = 2P(z \ge z_1) = 2P(z \ge 6.25)$. To find $P(z \ge 6.25)$ we use the Normal table. To do this, make a diagram of the standardized Normal distribution and shade the area above 6.25. The Normal table says: if $z_0$ is 3.90 and up, $P(0 \le z \le z_0)$ is .5000. So, to the four decimal places of the table, $P(z \ge 6.25) = P(z \ge 0) - P(0 \le z \le 6.25) = .5 - .5000 = 0$ and $p\text{-value} = 2P(z \ge z_1) = 2P(z \ge 6.25) = 2(0) = 0$. We could amplify this by saying $p\text{-value} = 2P(\bar{x} \ge 2.00) = 2P(z \ge 6.25)$. If we are using a 5% significance level, we can say that the p-value is below .05 and reject the null hypothesis.

If we need a second example of this, assume that we are testing the same hypotheses, but find that $\bar{x} = 1.30$. Since this value of the sample mean is below $\mu_0 = 1.5$, we will look at the probability that the sample mean is at or below 1.30: $p\text{-value} = 2P(\bar{x} \le 1.30) = 2P\left(z \le \frac{1.30 - 1.5}{0.08}\right) = 2P(z \le -2.50)$. Again make a diagram of the standardized Normal distribution and shade the area below -2.50. The Normal table says that $P(0 \le z \le 2.50) = .4938$, so we have $P(z \le -2.50) = P(z \le 0) - P(-2.50 \le z \le 0) = .5 - .4938 = .0062$ and $p\text{-value} = 2P(\bar{x} \le 1.30) = 2P(z \le -2.50) = 2(.0062) = .0124$. If we are using a 5% significance level, we can say that the p-value is below .05 and reject the null hypothesis, but, if we are using a 1% significance level, we cannot reject the null hypothesis, since .0124 is above .01.
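The critical values, confidence interval and p-values worked out above can be reproduced in a few lines. This is a minimal sketch, not part of the original handout: the function name `two_sided_p_value` and the variable names are hypothetical, and the exact Normal distribution from Python's standard library stands in for the printed table, so results may differ from the table values in the last decimal place.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()   # standardized Normal distribution

# Values from the example; names are illustrative assumptions.
mu_0, sigma, n, alpha = 1.5, 0.8, 100, 0.05
std_error = sigma / sqrt(n)                    # 0.08
z_crit = std_normal.inv_cdf(1 - alpha / 2)     # about 1.960
margin = z_crit * std_error                    # about 0.157

# Critical values for the sample mean, centered on the null mean.
critical_values = (mu_0 - margin, mu_0 + margin)


def two_sided_p_value(x_bar):
    """p-value for H0: mu = mu_0 against H1: mu != mu_0 with sigma known."""
    z = (x_bar - mu_0) / std_error
    # Twice the tail area beyond |z|; using the lower tail avoids
    # subtracting a number very close to 1 from 1.
    return 2 * std_normal.cdf(-abs(z))


print(f"critical values: {critical_values[0]:.3f} and {critical_values[1]:.3f}")
for x_bar in (2.00, 1.30):
    # Confidence interval, centered on the sample mean.
    interval = (x_bar - margin, x_bar + margin)
    p = two_sided_p_value(x_bar)
    print(f"x_bar = {x_bar:.2f}: interval {interval[0]:.3f} to {interval[1]:.3f}, "
          f"p-value = {p:.4f}, reject at 5%: {p < 0.05}, reject at 1%: {p < 0.01}")
```

For a sample mean of 2.00 the exact p-value is tiny but not literally zero; the table rounds it to 0 because it carries only four decimal places.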