Intro to Hypothesis Tests – Stat 120, Spring 2023

Activity 1: Building intuition

Background

Evidence suggests that reward systems may operate in the opposite way from what is intended (e.g., ranking systems may decrease productivity; grading systems may not stimulate learning). To investigate this phenomenon, T. M. Amabile (1985) designed a study to explore whether motivation type (intrinsic or extrinsic) affected creativity scores. In this study, 47 creative writers were randomly assigned to one of two questionnaires in which they ranked reasons they write. One questionnaire listed intrinsic motivations and the other listed extrinsic motivations. After completing the questionnaire, all subjects wrote a haiku about laughter, which was graded for creativity by a panel of poets. The average rating (out of 40 points) for each subject was recorded. In this activity, you will explore whether there are discernible differences in creativity between the two groups.

GROUP QUESTIONS

1. Is this an experimental or observational study? How do you know?
2. What research question is being investigated? That is, what is the objective of this study?
3. Write down the competing claims (i.e., hypotheses) being investigated.
4. Identify the response and explanatory variables.
5. What type of plot could you use to compare the two treatment groups? Why did you choose this plot type? How does this plot help you investigate the competing claims?

INDIVIDUAL QUESTIONS

Please do not discuss your answers with your group until you start question 8.

Below is a lineup of plots where one plot displays the observed data and the other 19 plots are "decoy" plots where there is no difference between the treatment groups.

[Figure: lineup of numbered plots, each showing Score (axis ticks at 10, 20, 30) by Treatment (Extrinsic, Intrinsic).]

6. Which plot do you think is the most different from the others?
7. What feature(s) of the plot led you to this choice?

GROUP QUESTIONS

8. Which plot does your group think is the most different from the others?
9. What feature(s) of the plot led you to this choice?

The observed data are in plot # ______.

10. Did your group choose the observed (data) plot?
11. Based on your answer to question 10, what does this suggest about your competing claims?

STOP HERE! We will have a large group discussion sharing the results, and then the plot that displays the observed data will be revealed.

Activity 1 Part 2: Permutation test

Rather than using a "lineup" of plots, we will construct a randomization distribution to represent the behavior of the test statistic when the null hypothesis is in fact true.

1. Suppose we had slips of paper, each with one response written on it. How could we use these slips to simulate a new sample under the null hypothesis?
2. Create a randomization distribution using the StatKey app:
• Open a web browser and go to http://lock5stat.com/statkey.
• Click on the appropriate link to conduct your simulation (Test for difference in means).
• Go to https://aloy.rbind.io/data/CreativeWriting.csv.
• In StatKey, click on the button that says Edit Data.
• Highlight all of the data in the window and press Delete.
• Copy the data from the CSV page and paste it into the Edit Data window. Click Ok.
3. What is the value of the test statistic? This is given in the Original Sample section in the top right.
4. Click "Generate 1000 samples" to create a randomization distribution with 1000 samples. Describe what the plot of the randomization distribution represents. Note: If you hover over a dot, the lower right section shows you the corresponding sample.
5. Where is the plot centered? Why does this make sense?
6. Where does the sample statistic fall in the plot?
7. Does the observed test statistic appear to be rare? What does this indicate about the null hypothesis?
8. In StatKey, calculate the proportion of simulated statistics that are at least as far out in the tail as your observed test statistic:
• Check the "Left Tail" checkbox (this direction is determined by Ha, the alternative hypothesis).
• Click on the box under the x-axis and enter the value of the observed test statistic.
• The proportion will display in the text box above the x-axis.

Because the simulation assumes the null hypothesis to be true, we sometimes refer to the distribution you simulated as the null distribution. When you determine the proportion of the results in the null distribution that are at least as extreme as the observed results, you are estimating a probability. We will refer to this probability as the p-value. We use the p-value to evaluate the strength of evidence against the null hypothesis.

Activity 2

Shargorodsky et al. (2010) compared the current hearing loss rate among teens (12-19 years) to the rate in previous years to see whether teen hearing loss is increasing, possibly due to heavier use of earbuds. They collected hearing loss data from 1771 teens in the NHANES 2005-2006 study (333 with some level of hearing loss) and 2928 teens from NHANES III (1988-1994; 480 with some level of hearing loss). Is there a discernible difference between these two groups?

1. Identify the two populations and the variable being considered in this study.
2. Create a two-way (contingency) table displaying the results from the NHANES studies. Have the columns represent the time span and the rows represent the level of hearing loss (some vs. no hearing loss).
3. Write a sentence or two comparing the distributions of hearing loss between these two studies. Be sure to report an appropriate statistic.
4. Do these data convince you that there is a difference in the population proportions? If not, what could be another explanation for the difference you see in these numerical and graphical summaries for these two samples?
5. Let p94 represent the proportion of all American teenagers in 1994 with at least some hearing loss, and let p06 represent the corresponding proportion in 2006. Define the parameter of interest to be p94 − p06, the difference in the population proportions between these two years. State appropriate null and alternative hypotheses about this parameter to reflect the researchers' conjecture that hearing loss among teens is becoming more prevalent.
6. Create a randomization distribution using http://lock5stat.com/statkey:
• Click on the appropriate link to conduct your simulation (Test for difference in proportions).
• In StatKey, click on the button that says Edit Data.
• Enter the count and sample size for each group and press Ok.
Sketch a plot of the randomization distribution below.
7. Describe what the plot of the randomization distribution represents.
8. Where is the plot centered? Why does this make sense?
9. Where does the sample statistic fall in the plot?
10. Based on your answer to the previous question, would you expect to see a small or large p-value?
11. Use StatKey to calculate the p-value and report it. What information does the p-value tell us?
12. Provide an answer to the research question in context.
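For readers who want to see the shuffling behind the Activity 2 randomization distribution outside of StatKey, here is a minimal Python sketch. It is not StatKey's implementation. The function name `randomization_test`, the seed, and the number of repetitions are illustrative assumptions, and the left-tail direction assumes the alternative hypothesis is p94 − p06 < 0. Only the counts (480 of 2928 and 333 of 1771) come from the activity.

```python
import random


def randomization_test(count_a, n_a, count_b, n_b, reps=1000, seed=120):
    """Left-tail randomization test for a difference in proportions (a - b).

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    rng = random.Random(seed)
    observed = count_a / n_a - count_b / n_b

    # One "slip of paper" per teen: 1 = some hearing loss, 0 = none.
    total_loss = count_a + count_b
    pooled = [1] * total_loss + [0] * (n_a + n_b - total_loss)

    sims = []
    for _ in range(reps):
        # Under the null hypothesis the group labels carry no information,
        # so shuffle the slips and deal the first n_a of them to group a.
        rng.shuffle(pooled)
        sim_a = sum(pooled[:n_a])
        sim_b = total_loss - sim_a
        sims.append(sim_a / n_a - sim_b / n_b)

    # Left tail: share of simulated differences at or below the observed one.
    p_value = sum(d <= observed for d in sims) / reps
    return observed, sims, p_value


# Counts from the activity: NHANES III (1988-1994) vs. NHANES 2005-2006.
observed, sims, p_value = randomization_test(480, 2928, 333, 1771)
print(f"observed difference (p94 - p06): {observed:.4f}")
print(f"mean of simulated differences:   {sum(sims) / len(sims):.4f}")
print(f"estimated left-tail p-value:     {p_value:.3f}")
```

The simulated differences should center near 0, which is what the null hypothesis of no difference predicts. The estimated p-value will vary slightly with the seed and the number of repetitions, just as it does between runs in StatKey.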