Example: Helper vs. Hinderer
In a study reported in the November 2007 issue of Nature, researchers investigated whether infants
take into account an individual’s actions towards others in evaluating that individual as appealing or
aversive, perhaps laying the foundation for social interaction (Hamlin, Wynn, and Bloom, 2007). In one
component of the study, sixteen 10-month-old infants were shown a “climber” character (a piece of
wood with “googly” eyes glued onto it) that could not make it up a hill in two tries. Then they were
shown two scenarios for the climber’s next try, one where the climber was pushed to the top of the hill
by another character (“helper”) and one where the climber was pushed back down the hill by another
character (“hinderer”). The infant was alternately shown these two scenarios several times. Then the
child was presented with both pieces of wood (the helper and the hinderer) and was asked to pick one
to play with. The color, shape, and order (left/right) of the toys were varied and balanced among the
16 infants. A video of the experiment can be found at the following link:
http://www.yale.edu/infantlab/socialevaluation/Helper-Hinderer.html.
Sources:
• Rossman, Chance, Cobb, and Holcomb. Introducing Concepts of Statistical Inference. NSF/DUE/CCLI #0633349.
• Hamlin, J. Kiley, Karen Wynn, and Paul Bloom. “Social evaluation by preverbal infants.” Nature, Volume 450, November 22, 2007.
Research Question – Do 10-month-old infants tend to prefer the helper toy over the hinderer toy?
Questions:
1. Why is it important for the researchers to balance the color, shape and order of the toys across
the study? For example, how would the study results have been affected if the researchers
always made the helper toy a blue circle and the hinderer toy a yellow triangle?
2. Identify the population of interest in the study.
3. Identify the sample in the study.
4. What is the single categorical variable of interest in the study?
5. Recall that the study involves sixteen 10-month-old infants. If the population of all 10-month-old
infants has no real preference for one toy over the other, how many infants would you expect to
choose the helper toy? Explain.
6. Suppose that 10 out of the 16 infants chose the helper toy (62.5%). Since this value is more
than what is expected (50%), a researcher argues that these data show that the majority of ALL
10-month-old infants would choose the helper toy. What is wrong with this reasoning?
Once again, the key question is how to determine whether the study’s result is surprising under the
assumption that there is no real preference for one toy over the other in the population of all 10-month-old infants. To answer this, we will simulate the process of 16 infants simply choosing a toy at random,
over and over again. Each time we simulate the process, we’ll keep track of how many infants out of the
16 chose the helper toy (note that you could also keep track of the number that chose the hinderer toy).
Once we’ve repeated this process a large number of times, we’ll have a pretty good sense of what
outcomes would be very surprising, or somewhat surprising, or not so surprising if the population of all
10-month-old infants has no real preference.
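The repeated-sampling process described above can also be sketched in code. The Python snippet below is a minimal illustration that is not part of the original activity (which uses TinkerPlots): it assumes that, with no preference, each infant independently picks the helper toy with probability 0.5, repeats the 16-infant process 1000 times, and prints a rough text dot plot of the results. The function name `simulate_counts` and the seed value are hypothetical choices made for this sketch.

```python
import random
from collections import Counter


def simulate_counts(n, p, reps, seed=None):
    """Simulate `reps` studies of `n` subjects. Each subject independently
    'succeeds' (here: picks the helper toy) with probability `p`.
    Returns one success count per simulated study."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) for _ in range(reps)]


# Assumed null model: no preference, so each of 16 infants picks the helper toy with probability 0.5
counts = simulate_counts(n=16, p=0.5, reps=1000, seed=1)

# Rough text dot plot: one row per possible count, one '*' per 10 simulated studies
tally = Counter(counts)
for k in range(17):
    print(f"{k:2d} | {'*' * (tally[k] // 10)} ({tally[k]})")
```

Changing `seed` (or omitting it) gives a different but similarly shaped distribution centered near 8.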
Use the instructions outlined previously to carry out the TinkerPlots simulation. Note that you will
have to revise a few elements of the simulation that relate to the following questions:
• What are the two possible outcomes on each of the trials? Change the values on your spinner
accordingly.
• What is the probability that each outcome occurs, given that the population of all 10-month-old
infants has no real preference for either toy? Change your spinner accordingly.
• Be sure to change the Draw value to 1, since only one infant is choosing a toy at a time.
• How many infants were used in this study? Keep this value in mind when setting the Repeat
value.
Carry out the simulation study 1000 times overall, keeping track of the number of infants that choose
the helper toy in each trial of the simulation. Sketch in your results below:
Questions:
7. What does each dot in the plot represent?
8. Suppose that in the actual study 10 out of 16 infants chose the helper toy. Would this convince
you that the majority of the population of all 10-month-old infants had a preference for the
helper toy? Why or why not?
9. The actual study results are as follows: 14 out of 16 infants chose the helper toy. Based on this
statistical investigation, what should the researchers conclude?
Example: Bone Density
Forty percent of postmenopausal women are diagnosed with low bone density (osteopenia), placing
them at risk for osteoporosis with ensuing spontaneous fractures. Osteoporosis is estimated to cost $14
billion per year in medical expenses alone, and yet it can be prevented if treated early enough. A new
test for detecting the early stages of low bone density has been developed; it allows postmenopausal
women susceptible to osteoporosis to receive treatment early in order to prevent the full-blown onset
of the disease. A random sample of 248 postmenopausal women was given the new test and treated
accordingly, and 82 were diagnosed with low bone density ten years later.
Research Question – Is there evidence that the new test (which allows for early detection of
osteoporosis) reduces the percentage of postmenopausal women diagnosed with low bone density?
Questions:
10. Identify the population of interest in the study.
11. Identify the sample in the study.
12. What is the single categorical variable of interest in the study?
13. If the new test does not allow for early detection and ultimately prevention of low bone density,
how many of the 248 postmenopausal women would you expect to be diagnosed with low bone density?
Explain.
Once again, the key question is how to determine whether the study’s result is surprising under the
assumption that the new test does not aid in early detection and prevention of low bone density. To
answer this, we will simulate the process of 248 postmenopausal women being tested for low bone density,
over and over again. Each time we simulate the process, we’ll keep track of how many postmenopausal
women out of the 248 get diagnosed with low bone density (note that you could also keep track of the
number that do not get diagnosed with low bone density). Once we’ve repeated this process a large
number of times, we’ll have a pretty good sense of what outcomes would be very surprising, or
somewhat surprising, or not so surprising if the new test is not better at early detection and prevention
of low bone density.
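The same kind of sketch adapts to the bone density example. Here the assumption (taken from the 40% figure above) is that, if the new test does not help, each of the 248 women is independently diagnosed with probability 0.40; because the research question is about a decrease, the sketch counts simulated studies with 82 or fewer diagnoses. As before, `simulate_counts` is a hypothetical helper, not a TinkerPlots feature.

```python
import random


def simulate_counts(n, p, reps, seed=None):
    """Return `reps` simulated counts of 'successes' out of `n` trials,
    where each trial succeeds independently with probability `p`."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) for _ in range(reps)]


# Assumed null model: the new test does not help, so each woman is diagnosed with probability 0.40
n_women, p_null, observed = 248, 0.40, 82
counts = simulate_counts(n_women, p_null, reps=1000, seed=2)

print(f"Expected count under H0: {n_women * p_null:.1f}")   # 248 * 0.40
print(f"Observed count: {observed} ({observed / n_women:.1%})")

# Fewer diagnoses is the direction of interest, so look at the lower tail
as_extreme = sum(c <= observed for c in counts)
print(f"Simulated studies with {observed} or fewer diagnoses: {as_extreme} of {len(counts)}")
```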
Use the instructions outlined previously to carry out the TinkerPlots simulation. Note that you will
have to revise a few elements of the simulation that relate to the following questions:
• What are the two possible outcomes on each of the trials? Change the values on your spinner
accordingly.
• What is the probability that each outcome occurs, given that the new test is not better at detecting
low bone density? Change your spinner accordingly.
• Be sure to change the Draw value to 1, since only one woman is being tested at a time.
• How many postmenopausal women were used in this study? Keep this value in mind when
setting the Repeat value.
Carry out the simulation study 1000 times overall, keeping track of the number of postmenopausal
women diagnosed with low bone density in each trial of the simulation. Sketch in your results below:
Questions:
14. What does each dot in the plot represent?
15. Recall that 82 of the 248 postmenopausal women were diagnosed with low bone density. Based
on this statistical investigation, what should the researchers conclude?
Formal Hypothesis Testing
In the previous examples, we have used a logical process to make statistical inferences in problems
involving a single categorical variable. Next, we will add more structure to these statistical
investigations by introducing a procedure known as _______________________________________.
Before we discuss this procedure, we need a few more definitions. In each of the previous examples, we
tested a claim about a population parameter of interest.

• Parameter – A numerical descriptive measure of a ___________________. This value is almost
always unknown, and our goal is to estimate this parameter or test claims regarding it.
• Statistic – A numerical descriptive measure of a ___________________. This value is calculated
from the observed data.
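Example | Statistic | Parameter
Auto Accident Numbness | __________________ | __________________
Helper vs. Hinderer | __________________ | __________________
Bone Density | __________________ | __________________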
Hypothesis testing is a procedure, based on sample evidence and probability, used to test a claim
regarding a population parameter. The test will measure how well our observed sample statistic agrees
with some assumption about this population parameter.
Before you begin a hypothesis test, you should clearly state your research question. For instance, let’s
reconsider the research questions from three of our previous examples.
Example | Research Question
Auto Accident Numbness | Is the patient intentionally answering incorrectly?
Helper vs. Hinderer | Do 10-month-old infants tend to prefer the helper toy over the hinderer toy?
Bone Density | Does the new test reduce the percentage of postmenopausal women diagnosed with low bone density?
Setting Up the Null and Alternative Hypotheses
• The null hypothesis, H0, is what we will assume to be true; we evaluate the observed data from
our study against what we would expect to see under the null hypothesis. It always contains a
statement saying that the population parameter is equal to some value.
• The alternative hypothesis, Ha, is what we are trying to show, so it restates the research question.
It always contains a statement of inequality, saying that the population parameter is less than,
greater than, or different from the value in the null hypothesis.
For our three examples, the null and alternative hypotheses are shown below.
Research Question – Is the patient intentionally answering incorrectly?
H0: The patient is not intentionally answering incorrectly.
Ha: The patient is intentionally answering incorrectly.

Research Question – Do 10-month-old infants tend to prefer the helper toy over the hinderer toy?
H0: There is no preference for one toy over the other in the population of all 10-month-olds.
Ha: The majority of all 10-month-old infants prefer the helper toy.

Research Question – Does the new test reduce the percentage of postmenopausal women diagnosed with low bone density?
H0: The new test is no better than the previous tests in diagnosing low bone density, i.e., the
percentage diagnosed with low bone density is the same.
Ha: The new test is better, i.e., the percentage diagnosed with low bone density has decreased.
Note that we can also state these hypotheses in terms of the population parameter of interest:
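Research Question – Is the patient intentionally answering incorrectly?
H0: ______________________________
Ha: ______________________________

Research Question – Do 10-month-old infants tend to prefer the helper toy over the hinderer toy?
H0: ______________________________
Ha: ______________________________

Research Question – Does the new test reduce the percentage of postmenopausal women diagnosed with low bone density?
H0: ______________________________
Ha: ______________________________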
Evaluating Evidence Using P-Values
In each of our three examples, we assumed the null hypothesis was true when setting up our spinner
for the TinkerPlots investigation. Then, we used the results simulated under this scenario to help us
decide whether observing results such as our sample data would be an unusual event if the null
hypothesis were true.
Up to this point, whether an observed result was considered unusual (or extreme) has been a rather
subjective decision. Now, we will discuss the guidelines used by statisticians to determine whether an
observed result is extreme enough under the null hypothesis for us to conclude that the evidence
supports the research question.
Statisticians use what is called a _________________ to quantify the amount of evidence that an
observed result from a set of data provides for a research question.

• P-value – The probability of observing an outcome at least as __________________ as the observed
study result, assuming the _____ hypothesis is true.
Note that in each of the above examples, we obtained the simulation results assuming the null
hypothesis was true. Therefore, to estimate the p-value, we simply determine how often outcomes as
extreme as (or more extreme than) the observed study results appeared in our simulation study.
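The estimation step can be written as a small function. This is a hypothetical helper, sketched under the assumption that the simulated counts are already available (for example, copied out of a TinkerPlots run) and that the research question is one-sided, as in all three examples here; the five counts in the usage lines are made up purely to show the arithmetic.

```python
def estimate_p_value(simulated_counts, observed, direction="greater"):
    """Fraction of simulated counts at least as extreme as `observed`.

    direction="greater": extreme means >= observed (e.g., more infants choosing the helper toy)
    direction="less":    extreme means <= observed (e.g., fewer women diagnosed)
    """
    if direction == "greater":
        hits = sum(c >= observed for c in simulated_counts)
    elif direction == "less":
        hits = sum(c <= observed for c in simulated_counts)
    else:
        raise ValueError("direction must be 'greater' or 'less'")
    return hits / len(simulated_counts)


# Made-up simulated counts, used only to illustrate the calculation
example_counts = [5, 8, 10, 14, 15]
print(estimate_p_value(example_counts, observed=14))                    # 2 of 5 are >= 14
print(estimate_p_value(example_counts, observed=8, direction="less"))   # 2 of 5 are <= 8
```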
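Research Question – Is the patient intentionally answering incorrectly?
Observed Results: ____________________
Outcomes as Extreme: ____________________
p-value: ____________________

Research Question – Do 10-month-old infants tend to prefer the helper toy over the hinderer toy?
Observed Results: ____________________
Outcomes as Extreme: ____________________
p-value: ____________________

Research Question – Does the new test reduce the percentage of postmenopausal women diagnosed with low bone density?
Observed Results: ____________________
Outcomes as Extreme: ____________________
p-value: ____________________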
Making a Decision with p-values

• If the p-value is less than ________, then the data provide enough ___________________
evidence to support the research question.
• If the p-value is ________ less than 0.05, then the data _____________ provide enough
statistical evidence to support the research question.
Next, we will review the steps involved in a formal hypothesis test for each of our three examples. Note
that our conclusions are written in the context of the problem. Moreover, even a person with no
statistical background should be able to understand these conclusions (i.e., a conclusion should NOT say
something like “We reject the null hypothesis”).
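The decision guideline and the advice about plain-language conclusions can be sketched together. The functions `decide` and `conclusion` below are hypothetical, use the 0.05 cutoff from the guideline above, and the p-value passed in the usage line is an arbitrary number chosen for illustration, not a result from any of the studies.

```python
def decide(p_value, alpha=0.05):
    """Return True when the p-value is below the cutoff, i.e., the data provide
    enough statistical evidence to support the research question."""
    return p_value < alpha


def conclusion(p_value, research_claim, alpha=0.05):
    """Phrase the decision in the context of the problem, avoiding jargon such as
    'reject the null hypothesis'."""
    if decide(p_value, alpha):
        return f"The data provide enough statistical evidence that {research_claim}."
    return f"The data do not provide enough statistical evidence that {research_claim}."


# Arbitrary illustrative p-value
print(conclusion(0.01, "10-month-old infants tend to prefer the helper toy"))
```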
Auto Accident Numbness Example
Research Question – Is the patient intentionally answering incorrectly?
Hypotheses –
H0: The patient is not intentionally answering incorrectly.
Ha: The patient is intentionally answering incorrectly.
p-value –
Conclusion –
Helper vs. Hinderer Example
Research Question – Do 10-month-old infants tend to prefer the helper toy over the hinderer toy?
Hypotheses –
H0: There is no preference for one toy over the other in the population of all 10-month-olds.
Ha: The majority of all 10-month-old infants prefer the helper toy.
p-value –
Conclusion –
Bone Density Example
Research Question – Does the new test reduce the percentage of postmenopausal women diagnosed
with low bone density?
Hypotheses –
H0: The new test is no better than the previous tests in diagnosing low bone density, i.e.
the percentage diagnosed with low bone density is the same.
Ha: The new test is better, i.e. the percentage diagnosed with low bone density has
decreased.
p-value –
Conclusion –
Example: Effectiveness of an Experimental Drug
Suppose a commonly prescribed drug for relieving nervous tension is believed to be only 70% effective.
Experimental results with a new drug administered to a random sample of 20 adults who were suffering
from nervous tension show that 18 received relief.
Research Question – Is there statistical evidence that the new experimental drug is more than
70% effective?
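As a cross-check on a simulation for this example, the p-value can also be computed exactly by adding binomial probabilities, which is what a very long simulation would approximate. This exact calculation is not the method used in this handout; the sketch assumes that, under the null hypothesis, each of the 20 adults independently gets relief with probability 0.70, and `upper_tail_prob` is a hypothetical helper name.

```python
from math import comb


def upper_tail_prob(n, p, observed):
    """P(X >= observed) for X ~ Binomial(n, p), summing C(n, k) p^k (1 - p)^(n - k)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(observed, n + 1))


# Assumed null model: the new drug is only 70% effective; observed 18 of 20 adults got relief
p_value = upper_tail_prob(n=20, p=0.70, observed=18)
print(f"Exact P(18 or more of 20 | p = 0.70) = {p_value:.4f}")
```

A TinkerPlots simulation with many repetitions should give an estimate close to this exact value.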
Questions:
16. Identify both the population and sample of interest.
17. Identify the single categorical variable of interest.
18. Identify both the parameter and statistic of interest.
19. Carry out the formal hypothesis test to address the research question.
Research Question –
Hypotheses –
p-value –
Conclusion –
Example: Obesity in America
In 2000, it was reported that 60% of Americans were categorized as overweight or obese. According to
recent studies, it appears that even more Americans are now categorized as overweight or obese. A
random sample of 125 Americans was taken, and 83 of them were categorized as overweight or obese.
Research Question – Is there evidence that the obesity rate of Americans has increased since
2000?
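A simulation sketch for the obesity example follows the same pattern. It assumes that, if the rate has not changed since 2000, each of the 125 sampled Americans is independently overweight or obese with probability 0.60, and, because the research question asks about an increase, it counts simulated samples with 83 or more. The helper name, seed, and the choice of 10,000 repetitions are illustrative assumptions.

```python
import random


def simulate_counts(n, p, reps, seed=None):
    """Return `reps` simulated counts of successes out of `n` trials,
    each trial succeeding independently with probability `p`."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) for _ in range(reps)]


# Assumed null model: the rate is still 60%, so each person is overweight/obese with probability 0.60
n_people, p_null, observed = 125, 0.60, 83
counts = simulate_counts(n_people, p_null, reps=10_000, seed=3)

# "Increased" is the direction of interest, so count simulated samples with 83 or more
p_value_estimate = sum(c >= observed for c in counts) / len(counts)
print(f"Sample proportion: {observed / n_people:.1%}")   # 83/125
print(f"Estimated p-value: {p_value_estimate:.3f}")
```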
Questions:
20. Identify both the population and sample of interest.
21. Identify the single categorical variable of interest.
22. Identify both the parameter and statistic of interest.
23. Carry out the formal hypothesis test to address the research question.
Research Question –
Hypotheses –
p-value –
Conclusion –