Example: Helper vs. Hinderer

In a study reported in the November 2007 issue of Nature, researchers investigated whether infants take into account an individual’s actions towards others in evaluating that individual as appealing or aversive, perhaps laying the foundation for social interaction (Hamlin, Wynn, and Bloom, 2007). In one component of the study, sixteen 10-month-old infants were shown a “climber” character (a piece of wood with “googly” eyes glued onto it) that could not make it up a hill in two tries. Then they were shown two scenarios for the climber’s next try: one where the climber was pushed to the top of the hill by another character (the “helper”) and one where the climber was pushed back down the hill by another character (the “hinderer”). Each infant was shown these two scenarios alternately, several times. Then the infant was presented with both pieces of wood (the helper and the hinderer) and was asked to pick one to play with. The color, shape, and order (left/right) of the toys were varied and balanced among the 16 infants. A video of the experiment can be found at the following link: http://www.yale.edu/infantlab/socialevaluation/Helper-Hinderer.html.

Sources:
Rossman, Chance, Cobb, and Holcomb. Introducing Concepts of Statistical Inference. NSF/DUE/CCLI # 0633349.
Hamlin, J. Kiley, Karen Wynn, and Paul Bloom. “Social evaluation by preverbal infants.” Nature, Volume 450, November 22, 2007.

Research Question – Do 10-month-old infants tend to prefer the helper toy over the hinderer toy?

Questions:
1. Why is it important for the researchers to balance the color, shape, and order of the toys across the study? For example, how would the study results have been affected if the researchers always made the helper toy a blue circle and the hinderer toy a yellow triangle?
2. Identify the population of interest in the study.
3. Identify the sample in the study.
4. What is the single categorical variable of interest in the study?
5.
Recall that the study involves sixteen 10-month-old infants. If the population of all 10-month-old infants has no real preference for one toy over the other, how many infants would you expect to choose the helper toy? Explain.
6. Suppose that 10 out of the 16 infants chose the helper toy (62.5%). Since this value is more than what is expected (50%), a researcher argues that these data show that the majority of ALL 10-month-old infants would choose the helper toy. What is wrong with this reasoning?

Once again, the key question is how to determine whether the study’s result is surprising under the assumption that there is no real preference for one toy over the other in the population of all 10-month-old infants. To answer this, we will simulate the process of 16 infants simply choosing a toy at random, over and over again. Each time we simulate the process, we’ll keep track of how many infants out of the 16 chose the helper toy (note that you could also keep track of the number that chose the hinderer toy). Once we’ve repeated this process a large number of times, we’ll have a pretty good sense of which outcomes would be very surprising, somewhat surprising, or not so surprising if the population of all 10-month-old infants has no real preference.

Use the instructions outlined previously to carry out the Tinkerplots simulation. Note that you will have to revise a few elements of the simulation that relate to the following questions:
What are the two possible outcomes on each of the trials? Change the values on your spinner accordingly.
What is the probability that each outcome occurs, given that the population of all 10-month-old infants has no real preference for either toy? Change your spinner accordingly.
Be sure to change the Draw value to 1, since only one infant is choosing a toy at a time.
How many infants were used in this study? Keep this value in mind when setting the Repeat value.
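For readers without access to Tinkerplots, the spinner setup described above can be approximated in code. The following is a minimal Python sketch, not the handout's prescribed method: the function name simulate_helper_counts, the fixed seed, and the use of Python's random module are assumptions made for illustration. Each simulated infant picks the helper toy with probability 0.5 (the no-preference assumption), 16 choices make up one trial, and the trial is repeated 1000 times.

```python
import random
from collections import Counter


def simulate_helper_counts(n_infants=16, p_helper=0.5, n_trials=1000, seed=2007):
    """Return one helper-toy count per simulated trial of n_infants choices."""
    rng = random.Random(seed)  # fixed seed (arbitrary) so the run is repeatable
    counts = []
    for _ in range(n_trials):
        # One trial: each infant "spins the spinner" once (Draw = 1),
        # repeated once per infant in the study (Repeat = 16).
        helpers = sum(rng.random() < p_helper for _ in range(n_infants))
        counts.append(helpers)
    return counts


counts = simulate_helper_counts()

# Text version of the dot plot: one row per possible number of helper choices.
tally = Counter(counts)
for k in range(17):
    print(f"{k:2d} chose helper: {tally[k]:4d} trials")

# Share of simulated trials with 10 or more helper choices (see question 6).
share_at_least_10 = sum(c >= 10 for c in counts) / len(counts)
print(f"Share of trials with 10 or more: {share_at_least_10:.3f}")
```

Changing the seed changes the individual counts slightly, but the overall shape of the tally, centered near 8, should stay about the same.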
Carry out the simulation study 1000 times overall, keeping track of the number of infants that choose the helper toy in each trial of the simulation. Sketch in your results below:

Questions:
7. What does each dot in the plot represent?
8. Suppose that in the actual study 10 out of 16 infants chose the helper toy. Would this convince you that the majority of the population of all 10-month-old infants had a preference for the helper toy? Why or why not?
9. The actual study results are as follows: 14 out of 16 infants chose the helper toy. Based on this statistical investigation, what should the researchers conclude?

Example: Bone Density

Forty percent of postmenopausal women are diagnosed with low bone density (osteopenia), placing them at risk for osteoporosis with ensuing spontaneous fractures. Osteoporosis is estimated to cost $14 billion per year in medical expenses alone, and yet it can be prevented if treated early enough. A new test for detecting the early stages of low bone density has been developed. It allows postmenopausal women susceptible to osteoporosis to receive treatment early in order to prevent the full-blown onset of the disease. A random sample of 248 postmenopausal women was given the new test and treated accordingly, and 82 were diagnosed with low bone density ten years later.

Research Question – Is there evidence that the new test (which allows for early detection of osteoporosis) reduces the percentage of postmenopausal women diagnosed with low bone density?

Questions:
10. Identify the population of interest in the study.
11. Identify the sample in the study.
12. What is the single categorical variable of interest in the study?
13. If the new test does not allow for early detection and ultimately prevention of low bone density, how many of the 248 postmenopausal women would you expect to be diagnosed with low bone density? Explain.
Once again, the key question is how to determine whether the study’s result is surprising under the assumption that the new test does not aid in early detection and prevention of low bone density. To answer this, we will simulate the process of 248 postmenopausal women being tested for low bone density, over and over again. Each time we simulate the process, we’ll keep track of how many postmenopausal women out of the 248 get diagnosed with low bone density (note that you could also keep track of the number that do not get diagnosed with low bone density). Once we’ve repeated this process a large number of times, we’ll have a pretty good sense of which outcomes would be very surprising, somewhat surprising, or not so surprising if the new test is not better at early detection and prevention of low bone density.

Use the instructions outlined previously to carry out the Tinkerplots simulation. Note that you will have to revise a few elements of the simulation that relate to the following questions:
What are the two possible outcomes on each of the trials? Change the values on your spinner accordingly.
What is the probability that each outcome occurs, given that the new test is not better at detecting low bone density? Change your spinner accordingly.
Be sure to change the Draw value to 1, since only one woman is being tested at a time.
How many postmenopausal women were used in this study? Keep this value in mind when setting the Repeat value.

Carry out the simulation study 1000 times overall, keeping track of the number of postmenopausal women diagnosed with low bone density in each trial of the simulation. Sketch in your results below:

Questions:
14. What does each dot in the plot represent?
15. Recall that 82 out of the 248 postmenopausal women were diagnosed with low bone density. Based on this statistical investigation, what should the researchers conclude?
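The bone density simulation described above can be sketched the same way in code. This is a hedged Python illustration rather than the Tinkerplots procedure itself: the function name simulate_diagnosed_counts and the seed are assumptions, and each woman is treated as an independent draw with a 40% chance of a low bone density diagnosis (the "new test makes no difference" assumption). Because the research question asks about a reduction, the outcomes counted are those with 82 or fewer diagnoses.

```python
import random


def simulate_diagnosed_counts(n_women=248, p_low_density=0.40, n_trials=1000, seed=248):
    """Return the number diagnosed with low bone density in each simulated trial."""
    rng = random.Random(seed)  # fixed seed (arbitrary) so the run is repeatable
    return [
        # One trial: 248 independent draws, each "diagnosed" with probability 0.40.
        sum(rng.random() < p_low_density for _ in range(n_women))
        for _ in range(n_trials)
    ]


observed = 82  # diagnoses in the actual sample of 248 women
sim_counts = simulate_diagnosed_counts()

mean_count = sum(sim_counts) / len(sim_counts)
# Trials at least as far below the expected count as the observed result.
as_extreme = sum(c <= observed for c in sim_counts)
share_as_extreme = as_extreme / len(sim_counts)

print(f"Average simulated count: {mean_count:.1f}")
print(f"Trials with {observed} or fewer diagnoses: {as_extreme} of {len(sim_counts)}")
print(f"Share of trials: {share_as_extreme:.3f}")
```

A small share here indicates that a result like 82 out of 248 would rarely occur if the diagnosis rate were still 40%.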
Formal Hypothesis Testing

In the previous examples, we have used a logical process to make statistical inferences in problems involving a single categorical variable. Next, we will add more structure to these statistical investigations by introducing a procedure known as _______________________________________. Before we discuss this procedure, we need a few more definitions.

In each of the previous examples, we tested a claim about a population parameter of interest.

Parameter – A numerical descriptive measure of a ___________________. This value is almost always unknown, and our goal is to estimate this parameter or test claims regarding it.

Statistic – A numerical descriptive measure of a ___________________. This value is calculated from the observed data.

Example | Statistic | Parameter
Auto Accident Numbness | ________ | ________
Helper vs. Hinderer | ________ | ________
Bone Density | ________ | ________

Hypothesis testing is a procedure, based on sample evidence and probability, used to test a claim regarding a population parameter. The test measures how well our observed sample statistic agrees with some assumption about this population parameter.

Before you begin a hypothesis test, you should clearly state your research question. For instance, let’s reconsider the research questions from three of our previous examples.

Example | Research Question
Auto Accident Numbness | Is the patient intentionally answering incorrectly?
Helper vs. Hinderer | Do 10-month-old infants tend to prefer the helper toy over the hinderer toy?
Bone Density | Does the new test reduce the percentage of postmenopausal women diagnosed with low bone density?

Setting Up the Null and Alternative Hypotheses

The null hypothesis, H0, is what we will assume to be true, and we will evaluate the observed data from our study against what we would expect to see under the null hypothesis. The null hypothesis always contains a statement saying that the population parameter is equal to some value. The alternative hypothesis, Ha, is what we are trying to show.
Therefore, the research question is restated here in the alternative hypothesis. The alternative hypothesis always contains a statement of inequality, saying that the population parameter is less than, greater than, or different from the value in the null hypothesis. For our three examples, the null and alternative hypotheses are shown below.

Research Question – Is the patient intentionally answering incorrectly?
Hypotheses –
H0: The patient is not intentionally answering incorrectly.
Ha: The patient is intentionally answering incorrectly.

Research Question – Do 10-month-old infants tend to prefer the helper toy over the hinderer toy?
Hypotheses –
H0: There is no preference for one toy over the other in the population of all 10-month-olds.
Ha: The majority of all 10-month-old infants prefer the helper toy.

Research Question – Does the new test reduce the percentage of postmenopausal women diagnosed with low bone density?
Hypotheses –
H0: The new test is no better than the previous tests in diagnosing low bone density, i.e., the percentage diagnosed with low bone density is the same.
Ha: The new test is better, i.e., the percentage diagnosed with low bone density has decreased.

Note that we can also state these hypotheses in terms of the population parameter of interest:

Research Question – Is the patient intentionally answering incorrectly?
Hypotheses –
H0: ________
Ha: ________

Research Question – Do 10-month-old infants tend to prefer the helper toy over the hinderer toy?
Hypotheses –
H0: ________
Ha: ________

Research Question – Does the new test reduce the percentage of postmenopausal women diagnosed with low bone density?
Hypotheses –
H0: ________
Ha: ________

Evaluating Evidence Using P-Values

In each of our three examples, we assumed the null hypothesis was true when setting up our spinner for the Tinkerplots investigation. Then, we used the results simulated under this scenario to help us decide whether observing results such as our sample data would be an unusual event if the null hypothesis were true. Up to this point, whether an observed result was considered unusual (or extreme) has been a rather subjective decision.
Now, we will discuss the guidelines used by statisticians to determine whether an observed result is extreme enough under the null hypothesis for us to conclude that the evidence supports the research question. Statisticians use what is called a _________________ to quantify the amount of evidence that an observed result from a set of data provides for a research question.

P-value – The probability of observing an outcome as __________________ (or more extreme) than the observed study result, assuming the _____ hypothesis is true.

Note that in each of the above examples, we obtained the simulation results assuming the null hypothesis was true. Therefore, to estimate the p-value, we simply determine how often outcomes as extreme as (or more extreme than) the observed study results appeared in our simulation study.

Research Question | Observed Results | Outcomes as Extreme | p-value
Is the patient intentionally answering incorrectly? | ________ | ________ | ________
Do 10-month-old infants tend to prefer the helper toy over the hinderer toy? | ________ | ________ | ________
Does the new test reduce the percentage of postmenopausal women diagnosed with low bone density? | ________ | ________ | ________

Making a Decision with P-Values

If the p-value is less than ________, then the data provide enough ___________________ evidence to support the research question.

If the p-value is ________ less than 0.05, then the data _____________ provide enough statistical evidence to support the research question.

Next, we will review the steps involved in a formal hypothesis test for each of our three examples. Note that our conclusions are written in the context of the problem. Moreover, even a person with no statistical background should be able to understand these conclusions (i.e., a conclusion should NOT say something like “We reject the null hypothesis.”).

Auto Accident Numbness Example
Research Question – Is the patient intentionally answering incorrectly?
Hypotheses –
H0: The patient is not intentionally answering incorrectly.
Ha: The patient is intentionally answering incorrectly.
p-value –
Conclusion –

Helper vs. Hinderer Example
Research Question – Do 10-month-old infants tend to prefer the helper toy over the hinderer toy?
Hypotheses –
H0: There is no preference for one toy over the other in the population of all 10-month-olds.
Ha: The majority of all 10-month-old infants prefer the helper toy.
p-value –
Conclusion –

Bone Density Example
Research Question – Does the new test reduce the percentage of postmenopausal women diagnosed with low bone density?
Hypotheses –
H0: The new test is no better than the previous tests in diagnosing low bone density, i.e., the percentage diagnosed with low bone density is the same.
Ha: The new test is better, i.e., the percentage diagnosed with low bone density has decreased.
p-value –
Conclusion –

Example: Effectiveness of an Experimental Drug

Suppose a commonly prescribed drug for relieving nervous tension is believed to be only 70% effective. Experimental results with a new drug administered to a random sample of 20 adults who were suffering from nervous tension show that 18 received relief.

Research Question – Is there statistical evidence that the new experimental drug is more than 70% effective?

Questions:
16. Identify both the population and sample of interest.
17. Identify the single categorical variable of interest.
18. Identify both the parameter and statistic of interest.
19. Carry out the formal hypothesis test to address the research question.
Research Question –
Hypotheses –
p-value –
Conclusion –

Example: Obesity in America

In 2000 it was reported that 60% of Americans were categorized as overweight or obese. According to recent studies, it appears that even more Americans are now categorized as overweight or obese. A random sample of 125 Americans was taken, and 83 of them were categorized as overweight or obese.

Research Question – Is there evidence that the obesity rate of Americans has increased since 2000?

Questions:
20.
Identify both the population and sample of interest.
21. Identify the single categorical variable of interest.
22. Identify both the parameter and statistic of interest.
23. Carry out the formal hypothesis test to address the research question.
Research Question –
Hypotheses –
p-value –
Conclusion –
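A simulated p-value for the last two examples can be cross-checked against an exact binomial tail probability. Note that this is a different technique from the Tinkerplots simulation used throughout this handout. The Python sketch below assumes each sampled person is an independent trial whose success probability is the value stated in the null hypothesis; the function name binomial_upper_tail is hypothetical.

```python
from math import comb


def binomial_upper_tail(n, p_null, observed):
    """P(X >= observed) when X counts successes in n independent trials,
    each with success probability p_null (the null-hypothesis value)."""
    return sum(
        comb(n, k) * p_null**k * (1 - p_null) ** (n - k)
        for k in range(observed, n + 1)
    )


# Experimental drug: 18 of 20 adults received relief; null value is 70%.
drug_p = binomial_upper_tail(20, 0.70, 18)

# Obesity: 83 of 125 Americans overweight or obese; null value is 60%.
obesity_p = binomial_upper_tail(125, 0.60, 83)

print(f"Drug example, P(18 or more of 20):      {drug_p:.4f}")
print(f"Obesity example, P(83 or more of 125):  {obesity_p:.4f}")
```

Both research questions ask whether a percentage has increased, so the upper tail (the observed count or more) is the set of outcomes "as extreme or more extreme." A simulation with 1000 trials should give an estimate close to, but not exactly equal to, these values.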