Mention Errors! Statistical vs Practical Significance.

Introduction to Hypothesis Testing or Statistical Inference

Pre-Lecture Items

Recent research has suggested that dogs may be helpful as a supplement to standard medical diagnostic tests in detecting whether a person has cancer. Naturally there are doubters. With this in mind, consider the following: An experiment was conducted to analyze a dog's ability to detect the correct urine specimen of a person with cancer in comparison to what would be expected by random guessing. Six (6) dogs were run through nine (9) separate trials (this makes 54 total trials) where in each trial a dog was presented with six (6) urine samples: of these 6, one (1) was from a bladder cancer patient while the remaining five (5) were clean.

Questions to ponder:

1. For each trial, the probability of guessing correctly would be 1/6 since one of the six samples was from a cancer patient. So for 54 trials, about how many would you expect to get right just from random guessing?

2. Based on your answer to question 1, if you as the experimenter wanted to demonstrate that dogs could detect cancer (i.e. better than random guessing), how many would the dogs need to get correct in order for you to believe they did significantly better than random guessing? What if you were trying to show that dogs were NOT better than random guessing?

3. In just such an experiment, the dogs got 22 out of 54 (about 40%) correct. Assuming that to start we couldn't say the dogs were better (or worse) than random guessing, what do you think the chances (i.e. probability) were of the dogs getting at least 22 out of 54 correct? Do you think this result would be likely or unlikely, again keeping in mind that we start from the assumption that they are NOT different from random guessing?

4. If 540 instead of 54 trials were conducted, would the differences you chose in question 2 change much (e.g.
if you said you would need the dogs to get at least 5 more correct than what is expected by guessing, would you still say 5 for the larger number of trials)?

Statistical inference: drawing conclusions about our population based on our sample statistics.

Last lesson we constructed 1-proportion and 1-mean confidence intervals to estimate the true population proportion or true population mean. Now we will introduce hypothesis tests for 1-proportion and 1-mean.

Five Steps in a Hypothesis Test

1. Write null and alternative hypotheses.
2. Set a level of significance called alpha.
3. Calculate an appropriate test statistic.
4. Determine a p-value associated with the test statistic.
5. Decide between the null and alternative hypotheses and state a "real world" conclusion.

(In the walkthrough that follows, item 5 is split into Step 5, the decision, and Step 6, the conclusion in words.)

Step 1:

TERMINOLOGY: A statistical hypothesis test is a procedure for deciding between two possible statements about a population. The phrase significance test means the same thing as the phrase "hypothesis test." The two competing statements about a population are called the null hypothesis and the alternative hypothesis.

A typical null hypothesis, Ho, is a statement that two variables are not related. Other examples are statements that there is no difference between two groups (or treatments) or that there is no difference from an existing standard value.

An alternative hypothesis, Ha, is a statement that there is a relationship between two variables, or there is a difference between two groups, or there is a difference from a previous or existing standard.

Considering the pre-lecture scenario, as the experimenter what would you construct as the null and alternative hypotheses?

Ho: The dogs were no different from random guessing in identifying cancer patients.
Ha: There is a difference between dogs and random guessing in identifying cancer patients.

NOTATION: The notation Ho represents a null hypothesis and Ha represents an alternative hypothesis.
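Pre-lecture questions 1 and 3 can also be checked numerically. The following is a minimal Python sketch, not part of the original notes, that assumes the 54 trials are independent and that each has a 1/6 chance of success under random guessing (a binomial model). The function name `binomial_upper_tail` and the variable names are hypothetical, chosen only for illustration.

```python
from math import comb


def binomial_upper_tail(n: int, k: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_trials = 54          # 6 dogs x 9 trials each
p_guess = 1 / 6        # chance of picking the cancer sample by guessing
observed_correct = 22  # result reported in the pre-lecture scenario

# Question 1: expected number correct under random guessing.
expected_correct = n_trials * p_guess

# Question 3: chance of at least 22 correct if the dogs are only guessing.
tail_probability = binomial_upper_tail(n_trials, observed_correct, p_guess)

print(f"Expected correct by guessing: {expected_correct:.0f}")
print(f"P(at least {observed_correct} correct by guessing): {tail_probability:.2e}")
```

Under these assumptions the expected count is 9, and the tail probability comes out far below 0.05, which matches the intuition that 22 correct would be very unlikely from guessing alone.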
The possible hypothesis statements are:

1-Proportion
Ho: p = po
Ha: p ≠ po or Ha: p > po or Ha: p < po [Remember, only select one Ha]

1-Mean
Ho: μ = μo
Ha: μ ≠ μo or Ha: μ > μo or Ha: μ < μo [Remember, only select one Ha]

The first Ha is called a two-sided test since "not equal" implies that the true value could be either greater than or less than the hypothesized value. This two-sided alternative is the most common setup. The other two Ha are referred to as one-sided tests since they restrict the conclusion to a specific side of a hypothesized value.

Returning to our pre-lecture scenario, random guessing gives a 1/6 chance of correctly identifying a patient. Since this is categorical data (i.e. for each trial we would either say the dog was "correct" or "incorrect" and then calculate the proportion of the total the dogs got correct), using our new notation we have:

Ho: p = 1/6
Ha: p ≠ 1/6

Special Note: po can be any value from 0 to 1. E.g. in the lab activity where we analyzed the proportion of students who smoke cigarettes and compared this interval to the U.S. Dept of Health’s statement that 24% of U.S. adults between 18 and 24 smoke, the po value is 0.24.

Step 2

The level of significance, alpha (α), is a “cut-off” that is used to determine if a particular hypothesis test can be considered significant. For our class we will set this value to 0.05 or 5%. Referring to the pre-lecture, this answers the question "how unlikely would our results have to be before we conclude they are too unlikely to be due to chance?"

Step 3

The general test statistic format is: (sample statistic − hypothesized value)/S.E.

1-Proportion

$$Z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}}$$

1-Mean

$$t = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$$

Keep in mind that the use of these test statistics is based on the “rules” we discussed for confidence intervals.
That is, for 1-proportion the number of successes AND failures must each be at least 10; and for 1-mean either the distribution of the population is approximately normal or, if not, the sample size is at least 30 (i.e. the Central Limit Theorem).

From the pre-lecture, the sample proportion, p̂, is 22/54 ≈ 0.4. This results in:

$$Z = \frac{0.4 - 1/6}{\sqrt{\dfrac{(1/6)(1 - 1/6)}{54}}} = \frac{0.233}{0.051} \approx 4.56$$

Step 4

Keep this in mind: The method for finding the p-value is based on the null hypothesis. Minitab will provide the p-value. If doing by hand, then find the p-value from Table A1 for 1-Proportion and the T-table for 1-Mean.

Probability Value (p-value): the probability, assuming the null hypothesis is true, of getting a result at least as extreme as the one the data produced. Therefore the smaller the p-value, the stronger the evidence against the null hypothesis.

1-Proportion

For Ha: p ≠ po, p-value = 2*P(Z ≥ |z|). That is, find 1 − P(Z < |z|) and then multiply this value by 2.
For Ha: p > po, p-value = P(Z ≥ z)
For Ha: p < po, p-value = P(Z ≤ z)

1-Mean

Keep in mind that Degrees of Freedom (DF) is N − 1 and that table values represent the area to the right of the absolute value of the t test statistic.

a. Locate the row of Table A3 for the proper DF from N − 1.
b. Using the t test statistic from Step 3, go across that row to locate where the test statistic would “fall.” Usually the test statistic will not be found exactly but will be compared to the listed values (i.e. less than the first one, between two, or greater than the last one).
c. Get the p-value(s) from the top of the table that correspond to the column t-value(s) found in part b.
d. If Ha is one-sided, use the p-value(s) in part c. If Ha is two-sided (i.e. not equal), then double the p-value(s) found in part c.

Step 5

Decision Rule: If the p-value is less than alpha (i.e. 0.05), then reject the null hypothesis, Ho. If the p-value is greater than alpha, then fail to reject Ho.

Step 6

Put into words your decision.
That is, recap your p-value and decision, and state what this means in terms of concluding the alternative or not having enough evidence to conclude the alternative.

EXAMPLES

1-Proportion

Continuing with the pre-lecture example...

Check conditions: note we now use n·po instead of n × (sample proportion). Here n·po = 54(1/6) = 9 and n(1 − po) = 45, so the expected number of successes falls just short of the guideline of 10 and the normal approximation is only rough.

Step 1:
Ho: p = 1/6
Ha: p ≠ 1/6

Step 2: We set our level of significance, α, at 0.05.

Step 3: Calculate our test statistic. Since our test is a one-proportion test we use:

$$Z = \frac{0.4 - 1/6}{\sqrt{\dfrac{(1/6)(1 - 1/6)}{54}}} = \frac{0.233}{0.051} \approx 4.56$$

Step 4: We next get the p-value. Since our alternative is "not equal" we use 2*P(Z ≥ |z|) = 2*P(Z ≥ |4.56|). From the Z-table, the closest we get to 4.56 is 3.49, which has an area to the right of 0.0002 (from 1 − 0.9998). Then 2*0.0002 gets us 0.0004, so we know the p-value is less than 0.0004. The interpretation of this is that if dogs were no different than random chance (i.e. 1/6) in identifying cancer patients, then there would be less than a 0.0004 chance that this sample of dogs would produce a proportion at least as far from 1/6 as the observed 40%. Pretty unlikely!

Step 5: We compare this p-value to 0.05; if it is less we reject Ho, and if it is greater we fail to reject Ho. Here, the p-value is less than 0.0004, which in turn is less than 0.05, requiring us to reject the null hypothesis. We had expected the dogs to get about 9 out of the 54 correct (1/6) if they were no different from random guessing. However, the dogs getting 22 out of 54 correct was just too unlikely a difference from 9 out of 54 to be due simply to chance.

Step 6: Putting all this together into words: With our p-value being less than 0.0004, we reject the null hypothesis and conclude that dogs are better than random chance in identifying patients with cancer.

1-Mean

Example (from Spring 2013 class survey): From www.collegedata.com, 2011 PSU-UP graduates exited with an average loan debt of $33,530. Using our survey as a random sample of all PSU-UP, do we have evidence that conflicts with this amount?
Check conditions: Sample size is 177, which is greater than 30.

Step 1:
Ho: μ = 33,530
Ha: μ ≠ 33,530

Step 2: α = 0.05

Step 3:

$$t = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} = \frac{36352 - 33530}{54056/\sqrt{177}} \approx 0.69$$

Step 4: First, our DF are 177 − 1 = 176, so use the row for 100. Going across the row for 100, we see that 0.69 is less than the first value of 1.29, which corresponds to a right-tail probability (from the top of this column) of 0.100. The right-tail probability for our test statistic is therefore greater than 0.100. Since the alternative hypothesis, Ha, is "not equal," we double this value to get a final p-value range of 0.200 to 1.00.

Step 5: Using a significance value, α, of 0.05 we fail to reject the null hypothesis, Ho, since the p-value is greater than 0.05.

Step 6: With a p-value greater than 0.200, we fail to reject the null hypothesis. There is not enough evidence to conclude that the true mean loan debt of PSU-UP graduates differs from $33,530.
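Both worked examples can be recomputed with a short script. The following is a minimal Python sketch, not part of the original notes (which use Minitab or the printed tables). It uses only the standard library; all function and variable names are hypothetical. Two assumptions to keep in mind: the 1-proportion calculation uses the unrounded p̂ = 22/54, so its Z comes out somewhat larger than the hand value obtained from the rounded 0.4, and the 1-mean part does not compute an exact p-value but only repeats the table comparison against 1.29, the value quoted above for the DF = 100 row.

```python
from math import erfc, sqrt

ALPHA = 0.05  # significance level used throughout these notes


def one_proportion_z(successes: int, n: int, p0: float) -> float:
    """Z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)."""
    p_hat = successes / n
    standard_error = sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / standard_error


def two_sided_normal_p_value(z: float) -> float:
    """Return 2 * P(Z >= |z|) for a standard normal Z."""
    # P(Z >= a) = 0.5 * erfc(a / sqrt(2)), so doubling it leaves erfc(...).
    return erfc(abs(z) / sqrt(2))


def one_mean_t(sample_mean: float, mu0: float, sample_sd: float, n: int) -> float:
    """t = (x_bar - mu0) / (S / sqrt(n))."""
    return (sample_mean - mu0) / (sample_sd / sqrt(n))


# 1-proportion example: 22 of 54 correct, Ho: p = 1/6, Ha: p != 1/6.
z_stat = one_proportion_z(successes=22, n=54, p0=1 / 6)
z_p_value = two_sided_normal_p_value(z_stat)
reject_dogs_null = z_p_value < ALPHA

# 1-mean example: survey summary values, Ho: mu = 33530, Ha: mu != 33530.
t_stat = one_mean_t(sample_mean=36352, mu0=33530, sample_sd=54056, n=177)

# Table value quoted in the notes (DF = 100 row, right-tail area 0.100).
TABLE_T_RIGHT_TAIL_0_100 = 1.29

# If |t| is below that table value, the right-tail area exceeds 0.100, so the
# two-sided p-value exceeds 0.200 > ALPHA and we fail to reject Ho.
# If this were False, the table comparison alone would be inconclusive.
fail_to_reject_loan_null = abs(t_stat) < TABLE_T_RIGHT_TAIL_0_100

print(f"1-proportion: Z = {z_stat:.2f}, two-sided p-value = {z_p_value:.1e}, "
      f"reject Ho: {reject_dogs_null}")
print(f"1-mean: t = {t_stat:.2f}, two-sided p-value > 0.200: "
      f"{fail_to_reject_loan_null}")
```

The decisions agree with the hand calculations: reject Ho for the dogs, and fail to reject Ho for the loan debt.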