MAS1401 Handout 6

Hypothesis Tests for the Population Mean

Similar arguments to those used to develop the idea of a confidence interval allow us to test the hypothesis that the population mean is equal to a particular value. For example, suppose we were told that the average IQ of UK students was 115, and we were interested in whether or not the mean IQ of the MAS1401 students was different from this. The hypothesis that µ (the population mean IQ for MAS1401) is equal to 115 is known as the null hypothesis and is denoted by H0. Here the null hypothesis is:

H0: µ = 115.

The other possibility, that the mean IQ of the MAS1401 students is not 115, is known as the alternative hypothesis, or the experimental hypothesis. Here the alternative hypothesis is:

HA: µ ≠ 115.

In order to investigate which hypothesis the data support, we can carry out a hypothesis test. In the example we are considering, the test we use is called a 1-sample t-test (because there is one sample of data, and we use the t-distribution!). Minitab will carry this test out for us, using the procedure:

Stat>Basic Statistics>1-sample t...

Here, the outcome is p = 0.015. What does this mean? The outcome of a hypothesis test is always a “p-value”, which is a probability. It tells us how likely we would be to see a sample at least as extreme as the one we have actually observed if, in fact, the null hypothesis were true. Small p-values tell us that the sample we have observed would be unlikely to have occurred if the null hypothesis really were true. Therefore a small p-value constitutes evidence against the null hypothesis.

For the IQs of the MAS1401 students, we had H0: µ = 115 and p = 0.015. This is a small probability: it tells us that if the mean IQ really were 115, our sample would be very unusual. A much more plausible explanation is that H0 is in fact false, and that the population mean is not 115 but higher than that (the direction comes from the sample mean, which lies above 115).
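The calculation behind the 1-sample t-test can be sketched in a few lines. The Python below is a minimal illustration under stated assumptions, not Minitab's own algorithm: it computes the test statistic t = (x̄ - µ0)/(s/√n), where x̄ is the sample mean, s the sample standard deviation and n the sample size, and then obtains the two-sided p-value from the t-distribution with n - 1 degrees of freedom by numerically integrating its density with Simpson's rule, so that only the standard library is needed. The function names are hypothetical.

```python
import math
from statistics import mean, stdev


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p_value(t: float, df: int, steps: int = 2000) -> float:
    """P(|T| >= |t|) for T ~ t(df), computed as 1 - 2 * P(0 <= T <= |t|).

    The integral is approximated with Simpson's rule (steps must be even).
    """
    a = abs(t)
    if a == 0.0:
        return 1.0
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)


def one_sample_t_test(sample: list[float], mu0: float) -> tuple[float, float]:
    """Test H0: mu = mu0 against HA: mu != mu0; returns (t, p)."""
    n = len(sample)
    se = stdev(sample) / math.sqrt(n)  # estimated standard error of the mean
    t = (mean(sample) - mu0) / se      # how many SEs the sample mean lies from mu0
    return t, two_sided_p_value(t, n - 1)
```

With a hypothetical list `iq_scores` holding the MAS1401 IQ measurements (the raw data are not reproduced in this handout), the test of H0: µ = 115 would be `t, p = one_sample_t_test(iq_scores, 115)`. The numerical integration is adequate for moderate values of t; in practice a statistics package (Minitab, or for example `scipy.stats.ttest_1samp` in Python) would be used instead.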
We use guidelines to interpret the result of a hypothesis test.

Guidelines for interpreting p-values:
• If p > 0.05, we do not reject the null hypothesis at the 5% level. There is no evidence against the null hypothesis.
• If p < 0.05, we reject the null hypothesis at the 5% level. There is moderate evidence against the null hypothesis.
• If p < 0.01, we reject the null hypothesis at the 1% level. There is strong evidence against the null hypothesis.
• If p < 0.001, we reject the null hypothesis at the 0.1% level. There is very strong evidence against the null hypothesis.

When more than one of these lines applies, the last one that applies describes the strength of the evidence; for example, p = 0.015 in the IQ example gives moderate evidence against H0. Note that these are guidelines only, and should not be interpreted as hard and fast rules when making decisions!

Two independent samples

Let’s return to the haematocrit data we looked at in Practical 2. There were measurements on 126 women and 61 men. We would like to use these data to compare the population haematocrit distributions for females and males. One method is to construct a confidence interval for the difference between the mean haematocrit for women and men. It is also possible to carry out a hypothesis test of whether the difference between the means takes a particular value. The most commonly tested value is zero, since this amounts to a test of whether or not there is any difference between the population means. We won’t worry about the details, but we will use Minitab to carry out the work for us, using the 2-sample t-test option:

Stat>Basic Statistics>2-sample t…

Additional Notes: Two dependent samples

In the haematocrit example, we called the samples independent because there was nothing to relate any particular female measurement to any particular male measurement. Sometimes, however, every measurement in one group is related to a particular measurement in the other group, for example when the same animal is measured twice.
For example, the cholesterol level of each of 8 animals was measured before and after treatment with a drug:

Animal                1    2    3    4    5    6    7    8
Cholesterol Before  210  217  208  215  202  209  207  210
Cholesterol After   212  210  210  213  200  208  203  199

Each ‘cholesterol before’ measurement is directly related to the ‘cholesterol after’ measurement immediately below it, because both come from the same animal. We need to take this into account, but the two-sample t-test described above doesn’t do this. Instead we must carry out a paired t-test (because the data are in pairs!) using the Minitab procedure:

Stat>Basic Statistics>Paired t…

    Paired T-Test and CI: cholesterol before, cholesterol after

    Paired T for cholesterol before - cholesterol after

                        N     Mean    StDev  SE Mean
    cholesterol befo    8  209.750    4.652    1.645
    cholesterol afte    8  206.875    5.463    1.931
    Difference          8  2.87500  4.42194  1.56339

    95% CI for mean difference: (-0.82184, 6.57184)
    T-Test of mean difference = 0 (vs not = 0): T-Value = 1.84  P-Value = 0.109

We see that the 95% confidence interval for (mean before - mean after) is (-0.82, 6.57). Note that this range of numbers includes the value zero, which corresponds to the population mean cholesterol level being the same before and after treatment.

The null hypothesis is H0: mean before - mean after = 0, which we could equivalently express as H0: the drug is not effective. The result of the paired t-test is p = 0.109. This is not small (it is above 0.05), so there is no evidence against the null hypothesis; that is, there is no evidence that the drug is effective.

We must be careful how we interpret a non-significant result from a hypothesis test:
• It is correct to say: “there is no evidence from this study that the drug is effective.”
• It is not correct to say: “there is evidence from this study that the drug is not effective.”
• In general, absence of evidence does not automatically imply evidence of absence, and we should think carefully about how good our study is before we say it does!
• In this example, collecting data from just 8 animals has not given our drug much chance to prove its worth.
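This study was not very powerful. (The power of a test is the probability that it rejects the null hypothesis when the null hypothesis is in fact false.) Bigger samples mean more power.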
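A paired t-test is simply a 1-sample t-test applied to the differences (before - after), testing whether their population mean is zero. The Python sketch below reproduces the Minitab calculation for the cholesterol data in this handout using only the standard library: the t-distribution is handled by numerically integrating its density (Simpson's rule), and the 95% confidence interval is built as mean difference ± t* × SE, where t* is the 97.5th percentile of the t-distribution with n - 1 = 7 degrees of freedom, found here by bisection. It is an illustration, not Minitab's internal method, and the function names are hypothetical.

```python
import math
from statistics import mean, stdev

# Cholesterol measurements for animals 1-8 (from the table in this handout).
before = [210, 217, 208, 215, 202, 209, 207, 210]
after = [212, 210, 210, 213, 200, 208, 203, 199]


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x), using symmetry about 0 and Simpson's rule on [0, |x|]."""
    a = abs(x)
    if a == 0.0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |x|)
    return 0.5 + area if x > 0 else 0.5 - area


def t_quantile(prob, df):
    """Value q with P(T <= q) = prob, for 0.5 < prob < 1, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def paired_t_test(x, y, conf=0.95):
    """Paired t-test of H0: mean(x - y) = 0 vs HA: mean(x - y) != 0, plus a CI."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    d_bar = mean(diffs)
    sd = stdev(diffs)
    se = sd / math.sqrt(n)
    t = d_bar / se
    p = 2 * (1 - t_cdf(abs(t), n - 1))
    t_star = t_quantile(1 - (1 - conf) / 2, n - 1)
    return {"n": n, "mean_diff": d_bar, "sd_diff": sd, "se": se,
            "t": t, "p": p, "ci": (d_bar - t_star * se, d_bar + t_star * se)}


result = paired_t_test(before, after)
ci_low, ci_high = result["ci"]
print(f"Mean difference = {result['mean_diff']:.5f}, SE = {result['se']:.5f}")
print(f"95% CI for mean difference: ({ci_low:.5f}, {ci_high:.5f})")
print(f"T-Value = {result['t']:.2f}  P-Value = {result['p']:.3f}")
```

The printed values should agree with the Minitab output above up to rounding. Since the interval contains 0 and p > 0.05, the sketch leads to the same conclusion: no evidence that the drug is effective.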
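The remark that bigger samples give more power can be made concrete with a rough calculation. The sketch below is an assumption-laden illustration rather than part of the original analysis: it pretends that the true mean difference and the true standard deviation of the differences are exactly this study's sample values (2.875 and 4.42194), and uses the normal approximation to the power of a two-sided 5% test, power ≈ Φ(δ√n/σ - 1.96) + Φ(-δ√n/σ - 1.96), where Φ is the standard normal distribution function, δ the true mean difference and σ the true standard deviation. Because this approximation treats σ as known, it somewhat overstates the power of the t-test for small samples. The function name `approx_power` is hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def approx_power(delta, sigma, n, z_crit=1.959964):
    """Approximate power of a two-sided 5%-level test of H0: mean difference = 0.

    Assumes the differences have true mean delta and known SD sigma
    (normal approximation; optimistic for the t-test when n is small).
    """
    shift = delta * math.sqrt(n) / sigma
    return normal_cdf(shift - z_crit) + normal_cdf(-shift - z_crit)


# Hypothetical scenario: true values equal to this study's sample estimates.
for n in (8, 20, 40):
    print(f"n = {n:2d}: approximate power = {approx_power(2.875, 4.42194, n):.2f}")
```

Under these assumptions the approximate power is below one half with 8 animals but above 0.95 with 40, which is the sense in which this study was not very powerful.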