Notes Mar 2, 2003

What does the hypothesis testing method do? It uses data from a sample to judge whether or not a statement about a population may be true.

11.1 Formulating Hypothesis Statements

Many of the questions that researchers ask can be expressed as questions about which of two statements might be true for a population. For example:

1) Do female students study, on average, more than male students do?
2) Does a new drug have smaller side effects than the old one does?
3) Do smokers tend to drink more as well?
4) Is the proportion of male students who have at least one tattoo different from the proportion of female students who have at least one tattoo?

All these questions can be answered with a “yes” or “no”, and each possible answer is a specific statement about a situation. For instance, we can break question 2) into two competing hypotheses:

a. The new drug has smaller side effects than the old one does.
b. The new drug does not have smaller side effects than the old one does.

In statistics, the two possible answers are called the null hypothesis and the alternative hypothesis.

The null hypothesis, represented by the symbol H0, is a statement that there is nothing happening. Generally, we hope to disprove or reject the null hypothesis.

The alternative hypothesis, represented by the symbol Ha, is a statement that something is happening. In most situations, we hope to show that the alternative hypothesis is right.

For instance, in example 2), the null hypothesis, H0, is “the new drug does not have smaller side effects than the old one does”. This is the statement that “nothing is happening”. The alternative hypothesis, Ha, is “the new drug does have smaller side effects than the old one does”. This is the statement that “something is happening”.

Question 1: Write down the null and alternative hypotheses for examples 1), 3), and 4).
As we might notice, examples 1), 2), and 3) have alternative hypotheses that test whether something is greater (or smaller) than something else. These alternative hypotheses include values in one direction only (either greater or smaller, but not both). We call this kind of test a one-sided or one-tailed hypothesis test.

Example 4) asks whether one proportion is different from the other. As long as it differs, the first proportion can be either greater or smaller than the second. An alternative hypothesis like this includes values in either direction from a specific standard. We call this kind of test a two-sided or two-tailed hypothesis test.

The logic of hypothesis testing: what if the null is true?

In hypothesis testing, we always assume that the null hypothesis is a possible truth until the sample data conclusively demonstrate otherwise.

11.3 Deciding Between the Two Hypotheses

Two terms we should pay attention to:

1. The test statistic is the data summary that we use to evaluate the two hypotheses.
2. The p-value describes the likelihood that we would have observed what we did, or something even more extreme, if the null hypothesis is true.

NOTE: This is the second time we have met the notion of a “p-value”. The first time, we used the p-value to decide whether the relationship between 2 variables is statistically significant. This time, the p-value is computed by assuming the null hypothesis is true and then determining the probability of a result as extreme as (or more extreme than) the observed test statistic in the direction of the alternative hypothesis.

In general, a test statistic is simply a summary that compares the sample data to the null hypothesis. The chi-square statistic we mentioned before is a special case of a test statistic.

We also need to emphasize that a p-value does not tell us the probability that the null hypothesis is true.
Instead, it only tells us the probability that our test statistic could have been as extreme as it is, or more extreme, if we assume the null hypothesis is true.

In order to introduce the idea of rejecting the null hypothesis, we need two more terms first:

1. Statistically significant describes the result when the researcher has decided that the p-value is small enough to decide in favor of the alternative hypothesis.
2. The level of significance, also called the α level, is the borderline for deciding that the p-value is small enough to justify choosing the alternative hypothesis.

A result is statistically significant when the p-value is less than the chosen level of significance. Again, we usually choose 0.05 as our level of significance.

We can summarize the relationship between the p-value and hypothesis testing using the following table:

p-value small (usually less than 0.05)            | p-value not small (usually larger than 0.05)
Reject the null hypothesis or, equivalently,      | Do not reject the null hypothesis.
accept the alternative hypothesis.                |

Question 2: For example 1), suppose we know that the average study time of male students, μ_male, is 13.5 hours per week. Let μ_female denote the average study time of female students. Write down the two formal hypothesis statements. Moreover, assume that the p-value we calculate is 0.12. Using a significance level of 0.05, what should our conclusion be? Interpret the meaning of the p-value in this case.

11.4 Testing Hypotheses about a Proportion

NOTE: The formulas in this section are specific to testing proportions, but the basic steps of hypothesis testing are the same in any setting.

Steps in any hypothesis test:

1. Determine the null and alternative hypotheses;
2. Summarize the data into an appropriate test statistic;
3. Assuming the null hypothesis is true, find the p-value;
4. Decide whether or not the result is statistically significant based on the p-value.

So far, we have discussed step 1 in previous sections.
For step 2, when we have a sufficiently large random sample, we can use a “z-test” to examine hypotheses about a population proportion. The corresponding test statistic is called the z-statistic or z-value. A “sufficiently large” random sample is one for which both np0 and n(1 − p0) are at least 10, where p0 is the value of the population proportion specified in the null hypothesis.

Recall: the 2 conditions sufficient for using the normal approximation rule for the distribution of a sample proportion.

Most software will provide the z-statistic. How do we calculate the z-statistic by hand?

$$z = \frac{\text{sample estimate} - \text{null value}}{\text{standard error}} = \frac{\hat{p} - p_0}{\text{standard error}}, \quad \text{with standard error} = \sqrt{\frac{p_0(1 - p_0)}{n}},$$

where:

1. p̂ represents the sample estimate of the proportion;
2. p0 represents the specific value in the null hypothesis;
3. n is the sample size.

In order to use the z-test, we must make certain assumptions. They are:

1. The sample should be a random sample from the population.
2. The quantities np0 and n(1 − p0) should both be at least 10.

Example: Suppose the present success rate in the treatment of a particular psychiatric disorder is 0.65 (65%). A research group hopes to demonstrate that the success rate of a new treatment will be better than this standard. Suppose we look at 200 patients and find that 140 have success with the new treatment.

Example Minitab Output

Test and CI for One Proportion

Test of p = 0.65 vs p > 0.65

Sample    X    N  Sample p  95.0% Lower Bound  Z-Value  P-Value
1       140  200  0.700000           0.646701     1.48    0.069

Using the 4 steps we mentioned previously, we have:

a. H0: p = 0.65 vs. Ha: p > 0.65;
b. A z-test is appropriate in this case (why?). The z-statistic for our problem is 1.48 (how do you compute it?).
c. The p-value is 0.069 from the Minitab output.
d. Since the p-value (0.069) is greater than 0.05, the result for the new treatment is not statistically significant at the 0.05 significance level.
Hence, we cannot reject the null hypothesis, and we cannot conclude that the new treatment is more effective.

Question 3: Decide whether you can use a z-test in each of the following cases. If yes, determine the z-statistic as well.

1. In order to test whether the proportion of students who own a PC at PSU is 0.6, we draw a random sample of 100 students from PSU. We then find that 58 of them have their own PCs.
2. In order to test whether the proportion of students who own a PC at PSU is 0.6, we draw a random sample of 20 students from PSU. We then find that 13 of them have their own PCs.
3. In order to test whether the proportion of students who own a PC at PSU is 0.6, we use a Stat 200 class as a sample (assume there are 50 people in the class). We then find that 31 of them have their own PCs.
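
The new-treatment example in section 11.4 can also be checked by hand. The following is a minimal Python sketch, not the Minitab procedure itself. The function name one_proportion_z_test and its parameter names are made up here for illustration. It assumes a random sample and a one-sided alternative Ha: p > p0. It checks the sample-size conditions, computes the z-statistic from the standard error under the null value, and finds the p-value from the standard normal distribution.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0):
    """One-sided z-test of H0: p = p0 against Ha: p > p0 (illustrative helper)."""
    # Sample-size conditions: n*p0 and n*(1 - p0) must both be at least 10.
    if n * p0 < 10 or n * (1 - p0) < 10:
        raise ValueError("sample too small for the z-test")
    p_hat = successes / n                      # sample estimate of the proportion
    standard_error = sqrt(p0 * (1 - p0) / n)   # computed from the null value p0
    z = (p_hat - p0) / standard_error
    # p-value: probability of a z-value at least this large if H0 is true.
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


# Treatment example: 140 successes among 200 patients, null value 0.65.
z, p_value = one_proportion_z_test(140, 200, 0.65)
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
print("significant at 0.05" if p_value < 0.05 else "not significant at 0.05")
```

Run as written, the sketch should agree with the Z-Value of 1.48 and P-Value of 0.069 in the Minitab output. For a two-sided alternative, the p-value would instead count both tails of the normal distribution.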