Exam #1: Wednesday, July 18th

Questions about the Assignment
For Part II, only one person calculated the 95% confidence interval using the standard error method.

Exams will be written and in class. They will combine defining terms, solving problems, and interpreting/discussing results.
Exams will be closed book and closed computer. You may bring a calculator (not a cell phone) and one double-sided 8 ½ x 11 page of notes. You must prepare this page of notes yourself and submit it along with your exam.
There will be no make-up exams. If an exam must be missed, the absence must be officially excused in advance.

Hypothesis Testing II
What is the probability that our observed outcome could have occurred by random chance?
Topics: randomization distribution, p-value, statistical significance

Exercise and Gender Study: p-value
We use the randomization sampling distribution to calculate the p-value of the observed sample statistic.
The p-value is the probability of getting a sample statistic as extreme as the observed sample statistic, just by random chance, if the null hypothesis is true.
The smaller the p-value (i.e., the smaller the probability), the stronger the evidence against H0 and in favor of Ha.
[Figure: StatKey randomization sampling distribution (www.lock5stat.com/statkey/) with Right Tail selected and the observed sample statistic 3 entered; the statistics in the right tail are shown as red dots.]
The p-value is the proportion of randomization sample statistics that are as extreme as our observed sample statistic. You could get the p-value by counting the red dots.
If time spent exercising did not differ by gender, we would see a difference in sample means as extreme as 3 hours in about 10% of our studies.

p-value Example
The observed sample statistic from Study A has a p-value of 0.002, and the observed sample statistic from Study B has a p-value of 0.2. Which study provides stronger evidence against the null hypothesis?
A. Study A
B. Study B
C. Study A and Study B provide equally strong evidence
Answer: A. The lower the p-value, the stronger the evidence against the null hypothesis.

An Experiment on Cocaine Addiction
Research Question: Is Desipramine effective at treating cocaine addiction?
In a randomized experiment on treating cocaine addiction, 48 cocaine addicts were randomly assigned to take either Desipramine (a new drug) or a placebo. They were then followed to see who relapsed.
Sample size (n) = 48 cocaine addicts
Two variables:
Treatment given: Desipramine or a placebo
Outcome: Relapsed or No Relapse

Cocaine Addiction
Research Question: Is Desipramine effective at treating cocaine addiction?
Here pD and pP denote the true proportions of people who relapse when treated with Desipramine and with a placebo, respectively.
The null hypothesis is the claim that there is no effect or no difference. What would be our null hypothesis for this experiment?
Desipramine is equally effective as a placebo at treating cocaine addiction.
H0: pD = pP (or pD − pP = 0)
The alternative hypothesis is the claim that we seek evidence for. What would be our alternative hypothesis?
Desipramine is more effective than a placebo at treating cocaine addiction (i.e., fewer people relapse).
Ha: pD < pP (or pD − pP < 0)
What are the sample statistics we need for this study?
p̂D: the proportion of people treated with Desipramine who relapsed
p̂P: the proportion of people treated with a placebo who relapsed

Conducting the Experiment
1. Randomly assign participants to treatment groups.
2. Carry out the treatment phase for both groups.
3. Observe relapse counts in each group (R = Relapsed, N = No Relapse).
[Figure: 48 participants (P) split into two groups of 24, then each participant marked R or N.]
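Step 1 of the experiment, random assignment, can be sketched in Python. This is a minimal illustration, not the procedure the researchers actually used: the participant IDs, the function name `randomly_assign`, and the seed are hypothetical choices for this sketch.

```python
import random

def randomly_assign(participants, seed=None):
    """Randomly split participants into two treatment groups of equal size.

    If the count is odd, the second group gets the extra participant.
    """
    rng = random.Random(seed)
    shuffled = list(participants)   # copy so the caller's list is unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# 48 participants with hypothetical IDs P1..P48
participants = [f"P{i}" for i in range(1, 49)]
desipramine_group, placebo_group = randomly_assign(participants, seed=0)
print(len(desipramine_group), len(placebo_group))  # 24 24
```

Because every participant is equally likely to land in either group, any systematic difference in relapse rates can be attributed to the treatment rather than to how people were sorted.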
Observed Sample Statistic
[Figure: the 24 Desipramine participants and the 24 placebo participants, each marked R (relapsed) or N (no relapse).]
Desipramine (24 participants): 10 relapsed, 14 no relapse, so p̂D = 10/24 ≈ 0.417
Placebo (24 participants): 20 relapsed, 4 no relapse, so p̂P = 20/24 ≈ 0.833
Observed sample statistic: p̂D − p̂P = 10/24 − 20/24 ≈ −0.417

Cocaine Addiction: Measuring Evidence against H0
Research Question: Is Desipramine effective at treating cocaine addiction?
Two options:
1. H0 is true (Desipramine and the placebo produce the same proportion of relapses).
2. Ha is true (Desipramine produces a smaller proportion of relapses than a placebo).
If H0 is true, how would you explain the observed difference in the sample proportions of relapses? The observed difference could reasonably have happened by chance.
To see whether an observed sample statistic provides evidence against H0, we need to see what kinds of sample statistics we would observe, just by random chance, if H0 were true.
How can we determine the probability that our observed sample statistic could have occurred by random chance?

Cocaine Addiction: Randomization Process
The sample size of our observed sample is 48. Imagine having 48 pieces of paper: 30 have an "R" on them and 18 have an "N". These match the total numbers of relapsers and non-relapsers in our experiment (i.e., the observed sample).
We want to generate samples where the null hypothesis is true (i.e., Desipramine is equally effective as a placebo at treating cocaine addiction).

Cocaine Addiction: Statistical Test
Research Question: Is Desipramine effective at treating cocaine addiction?
Observed sample statistic: p̂D − p̂P ≈ −0.417 (difference in sample proportions)
How unusual would it be to observe this sample statistic by random chance if the null hypothesis were true (i.e., if pD − pP = 0)?
What is the probability that we would observe, by random chance, a difference in sample proportions as extreme as −0.417 if Desipramine is equally effective as a placebo at treating cocaine addiction?
To answer this question, we need a distribution of the sample statistics that would occur if the null hypothesis were true. We can generate these sample statistics using the randomization process.
To do this, we randomly assign each piece of paper to a treatment group. To be consistent with our observed sample, we randomly assign 24 pieces to the Desipramine group and 24 pieces to the placebo group. We then calculate the sample statistic (i.e., the difference in sample proportions) for this randomization sample.

Create a Randomization Sample
[Figure: the 48 slips from the observed sample (Desipramine: 10 relapsed, 14 no relapse; Placebo: 20 relapsed, 4 no relapse) reshuffled into two new groups of 24.]
Randomization sample statistic:
Desipramine: 16 relapsed, 8 no relapse
Placebo: 14 relapsed, 10 no relapse
p̂D − p̂P = 16/24 − 14/24 ≈ 0.083

Create a Randomization Sampling Distribution
Repeat this process 1,000 times to obtain 1,000 randomization sample statistics, which together form a randomization sampling distribution.
Another randomization sample statistic:
Desipramine: 17 relapsed, 7 no relapse
Placebo: 13 relapsed, 11 no relapse
p̂D − p̂P = 17/24 − 13/24 ≈ 0.167
(StatKey: www.lock5stat.com/statkey/)
The p-value is the area in the tail(s) beyond the observed sample statistic in the randomization sampling distribution.
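The slips-of-paper procedure can be simulated in a few lines of Python. This is a sketch under stated assumptions: relapses are coded 1 and non-relapses 0, and the function names and the seed are hypothetical choices, not StatKey's. Because each run draws different randomization samples, the p-value it prints should be close to, but not necessarily exactly, the value StatKey reports for this study.

```python
import random

def diff_in_props(group_d, group_p):
    """Difference in sample proportions: p-hat_D minus p-hat_P.

    Each group is a list of 1s (relapsed) and 0s (no relapse).
    """
    return sum(group_d) / len(group_d) - sum(group_p) / len(group_p)

def randomization_stats(group_d, group_p, reps=1000, seed=0):
    """Shuffle the pooled 'slips of paper', re-deal them into groups of the
    original sizes, and record the difference in proportions each time."""
    rng = random.Random(seed)
    slips = group_d + group_p          # new list: 30 R's (1) and 18 N's (0)
    n_d = len(group_d)
    stats = []
    for _ in range(reps):
        rng.shuffle(slips)             # random reallocation, as if H0 were true
        stats.append(diff_in_props(slips[:n_d], slips[n_d:]))
    return stats

# Observed data from the slides: 10 of 24 Desipramine patients relapsed,
# 20 of 24 placebo patients relapsed.
desipramine = [1] * 10 + [0] * 14
placebo = [1] * 20 + [0] * 4
observed = diff_in_props(desipramine, placebo)   # 10/24 - 20/24, about -0.417

stats = randomization_stats(desipramine, placebo)
# Ha: pD < pP, so the p-value is the proportion in the left tail
p_value = sum(s <= observed for s in stats) / len(stats)
print(f"observed = {observed:.3f}, left-tail p-value = {p_value:.3f}")
```

Every randomization sample keeps the same 30 R's and 18 N's; only which group each slip lands in changes, which is exactly what "H0 is true" means here.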
Cocaine Addiction: p-value
[Figure: StatKey randomization sampling distribution (www.lock5stat.com/statkey/) with the observed sample statistic marked; the proportion of randomization sample statistics as extreme as the observed sample statistic is the p-value of the observed sample statistic.]
The probability of getting a difference in sample proportions as low as −0.417 just by random chance, if Desipramine is equally effective as a placebo, is 0.003.
Which tail(s) to include (i.e., left tail, right tail, or two tails) depends on the alternative hypothesis.

Alternative Hypothesis
The alternative hypothesis is determined by the research question.
A one-sided Ha contains either > or <.
A two-sided Ha contains ≠.
For a one-sided Ha, the p-value is the proportion of randomization sample statistics in the tail specified by Ha (< → left tail; > → right tail).
For a two-sided Ha, the p-value is the proportion of randomization sample statistics in both tails. In the example below, it is computed as twice the proportion in one tail.

Exercise and Gender: A Two-Tail Test
Research Question: Among college students, does one gender spend more time exercising than the other?
What are the parameters of interest?
μM = mean number of hours male students spend exercising
μF = mean number of hours female students spend exercising
What are H0 and Ha?
H0: μM − μF = 0 (Time spent exercising does not differ by gender.)
Ha: μM − μF ≠ 0 (Time spent exercising does differ by gender.)

Exercise and Gender: Result
(StatKey: www.lock5stat.com/statkey)
p-value = 2 × 0.109 = 0.218
Little evidence against H0. Do not reject H0.
Conclusion: This study does not provide adequate evidence of any association between gender and exercise time among college students.
Think: A result this extreme would happen about 22% of the time just by random chance if H0 were true, so this study does not provide adequate evidence against H0.
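The tail rules for one-sided and two-sided alternatives can be written as one small function. In this sketch the name `p_value` and the labels "less", "greater", and "two-sided" are hypothetical choices, and the two-sided rule doubles the smaller tail, matching the 2 × 0.109 calculation for the exercise study. The list of statistics is made up purely to show the arithmetic.

```python
def p_value(stats, observed, alternative):
    """Proportion of randomization statistics as extreme as `observed`.

    alternative: "less" (left tail), "greater" (right tail), or
    "two-sided" (twice the smaller tail, capped at 1).
    """
    n = len(stats)
    left = sum(s <= observed for s in stats) / n
    right = sum(s >= observed for s in stats) / n
    if alternative == "less":
        return left
    if alternative == "greater":
        return right
    if alternative == "two-sided":
        return min(1.0, 2 * min(left, right))
    raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")

# A made-up list of ten randomization statistics, for illustration only
toy = [-3, -2, -1, 0, 0, 1, 2, 3, 4, 5]
print(p_value(toy, 3, "greater"))    # 0.3  (3 of 10 statistics are >= 3)
print(p_value(toy, 3, "two-sided"))  # 0.6  (2 x 0.3)
```

Doubling one tail assumes the randomization distribution is roughly symmetric; capping at 1 keeps the result a valid proportion when the observed statistic sits near the center.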
Strength of Evidence
The p-value measures our evidence against the null hypothesis.
The smaller the p-value, the smaller the proportion of randomization sample statistics as extreme as our sample statistic.
The smaller the p-value, the stronger the evidence against H0.
[Figure: p-value scale from 0 to 1 with reference marks at 0.01, 0.05, and 0.10; evidence against H0 grows stronger toward 0.]

Hypothesis Testing
The p-value is the probability of getting results as extreme as our observed sample statistic, if the null hypothesis is true.
If the p-value is small enough, we reject the null hypothesis in favor of the alternative hypothesis.
How small is small enough?

Statistical Significance
The significance level (α) is the threshold (e.g., 0.05 or 0.01) below which the p-value is deemed small enough to reject the null hypothesis.
If the p-value is less than α, the results are statistically significant, and we reject the null hypothesis in favor of the alternative hypothesis.
When the proportion of randomization sample statistics as extreme as our observed sample statistic is less than α (e.g., 0.05 or 0.01), we say that our observed sample statistic is "statistically significant".
Saying that our observed sample statistic is statistically significant means that we have convincing evidence against H0 (and for Ha).

Formal Decisions
A formal hypothesis test has only two possible conclusions:
1. If the p-value < α: Reject the null hypothesis in favor of the alternative.
2. If the p-value ≥ α: Do not reject the null hypothesis.

Statistical Conclusions
[Figure: strength of evidence against H0 on the p-value scale (0.01, 0.05, 0.10, up to 1), alongside the formal decision based on α = 0.05: p-value < α is statistically significant; p-value ≥ α is not statistically significant.]

Assignment
Part I: 4.52, 4.76, and 4.84
Hint for #4.84: A correlation (r) between two variables is a type of sample statistic.
Part II: See next slide.
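The formal decision rule fits in a few lines. This sketch uses a hypothetical function name, `formal_decision`, with α = 0.05 as the default; the two p-values are the ones reported in these slides for the cocaine study and the exercise study.

```python
def formal_decision(p_value, alpha=0.05):
    """Formal decision rule: reject H0 when the p-value is below alpha."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1")
    return "Reject H0" if p_value < alpha else "Do not reject H0"

print(formal_decision(0.003))              # cocaine study: Reject H0
print(formal_decision(0.218))              # exercise and gender: Do not reject H0
print(formal_decision(0.03, alpha=0.01))   # Do not reject H0
```

The last line shows why α must be chosen before looking at the data: a p-value of 0.03 is statistically significant at α = 0.05 but not at α = 0.01.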
Assignment
Part II (type up this assignment in a Word document) [Worth 100 points]
Construct a research question that uses the GSS variables DIVORCE and SEX.
1. Provide the symbol and value for the sample mean/sample proportion for each variable.
2. Provide the symbol and value for the sample statistic you'll be testing.
3. State your null hypothesis in words and with an equation.
4. State your alternative hypothesis in words and with an equation.
5. Indicate whether this will be a left-tail, right-tail, or two-tail test.
6. Use StatKey to generate a randomization sampling distribution where H0 is true. (Provide a screenshot of your randomization sampling distribution.)
7. Calculate and interpret the p-value for your observed sample statistic.
8. Assess the strength of evidence this data provides against H0.
9. Select a significance level and make a formal decision based on that significance level.
10. Interpret/explain the results/conclusions of your study.
Hint: This is similar to the cocaine study (i.e., a difference in proportions).

Obtaining Proportions from the GSS
[Screenshot: GSS data-analysis page.]
To compare proportions across two variables, enter the first variable here and the second variable here.
Uncheck the "Weighted" box and check the "Unweighted" box.
Click this button, and the values/statistics needed to calculate the difference in sample proportions will open in a new window.

Entering Data into StatKey to Create a Randomization Sampling Distribution
[Screenshot: StatKey data-entry window.]
Click this button to enter your data, and this window will pop up.

Identifying where your Observed Sample Statistic fits on the Randomization Sampling Distribution
[Screenshot: StatKey randomization sampling distribution.]
Once you've created your randomization sampling distribution, check the appropriate tail-test box. To see where your observed sample statistic fits on this distribution, click here. A window will pop up where you can enter the value of your observed sample statistic.

Summary
A randomization sampling distribution shows the distribution of statistics that would be observed if H0 were true.
A p-value is the probability of getting a sample statistic as extreme as the observed sample statistic, just by random chance, if H0 is true.
The p-value measures the strength of evidence against H0.
Results are statistically significant if the p-value is < α (the significance level).
In making formal decisions, reject H0 if the p-value < α; otherwise, do not reject H0.

Hypothesis Testing
1. Construct a research question
2. Define the parameter(s) of interest
3. State H0 and Ha
4. Set the significance level (α) [usually 0.05 if unspecified]
5. Collect data
6. Generate descriptive statistics
7. Calculate the appropriate observed sample statistic
8. Create a randomization sampling distribution (where H0 is true)
9. Calculate the p-value of the observed sample statistic
10. Assess the strength of evidence against H0
11. Make a formal decision based on the significance level
12. Interpret the conclusion in context

Randomized Experiments
In randomized experiments, the "randomness" is the random allocation of cases to treatment groups.
If the null hypothesis is true, it makes no difference which treatment group a respondent is placed in.
Generate randomization samples assuming H0 is true by reallocating units to treatment groups while keeping the response values the same.

Formal Decisions
Reject H0 if observing a sample statistic this extreme is unlikely when H0 is true. This means that the observed sample data provide strong evidence to support Ha.
Do not reject H0 if observing a sample statistic this extreme is reasonably likely when H0 is true. This means that the observed sample data do not provide strong enough evidence to reject H0 (and support Ha).
For a given significance level (α):
p-value < α → Reject H0
p-value ≥ α → Do not reject H0

Elephant Example
The mystery animal X is unknown, so we set up the following hypothesis test:
H0: X is an elephant
Ha: X is not an elephant
What would you conclude if you had the following data?
Data: X has four legs.
Since it remains plausible that X could be an elephant, we don't have enough evidence to reject H0. However, with this data we also cannot accept H0 and conclude that X is an elephant.
Data: X walks on two legs.
Since it is highly unusual for an elephant to walk on two legs, we can reject H0 and conclude that X is probably not an elephant.

Randomization Process
Through the randomization process we can generate a randomization sampling distribution, which is the distribution of sample statistics we would observe, just by random chance, if the null hypothesis were true.
1. Simulate many randomization samples, assuming H0 is true.
2. For each randomization sample, calculate the randomization sample statistic.
3. These randomization sample statistics form a randomization sampling distribution.
4. Find the proportion of these randomization sample statistics that are as extreme as our observed sample statistic.

Statistical Significance
[Comic from www.xkcd.com]
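The four steps of the randomization process, combined with the "reallocate units, keep responses fixed" idea, can be sketched generically for a difference in means. This is a minimal sketch under stated assumptions: the function names, the seed, and the exercise-hours data are hypothetical and for illustration only. For step 4 it uses one common two-sided convention, counting statistics at least as far from 0 as the observed one (a small tolerance guards against floating-point ties); this can differ slightly from doubling a single tail.

```python
import random
from statistics import mean

def randomization_distribution_means(group1, group2, reps=1000, seed=None):
    """Steps 1-3: randomization sampling distribution of mean(group1) - mean(group2).

    Under H0 the group labels carry no information, so each randomization
    sample reallocates the pooled response values to groups of the original
    sizes; the response values themselves never change.
    """
    rng = random.Random(seed)
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    stats = []
    for _ in range(reps):                                    # Step 1
        rng.shuffle(pooled)
        stats.append(mean(pooled[:n1]) - mean(pooled[n1:]))  # Step 2
    return stats                                             # Step 3

def p_value_two_sided_abs(stats, observed, tol=1e-12):
    """Step 4 (two-sided): share of statistics at least as far from 0 as observed."""
    return sum(abs(s) >= abs(observed) - tol for s in stats) / len(stats)

# Hypothetical weekly exercise hours, for illustration only
group1 = [5.0, 7.0, 8.0, 6.0, 9.0]
group2 = [4.0, 5.0, 3.0, 6.0, 4.0]
observed = mean(group1) - mean(group2)
stats = randomization_distribution_means(group1, group2, reps=1000, seed=1)
print(f"observed = {observed:.2f}, two-sided p-value = {p_value_two_sided_abs(stats, observed):.3f}")
```

Passing a fixed seed makes a run reproducible; leaving it as None gives a fresh set of randomization samples each time, as StatKey does.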