Statistical significance with simulation

Can swimming with dolphins reduce depression?
Statistical Significance using Simulation and Randomization
Adapted from ISCAM II (Chance/Rossman, 2011, Investigation 2.3)
In this lab you will explore the above research question. You will use simulation to model the process of
randomly assigning subjects to groups, assuming dolphins had no effect on depression. You will repeatedly
randomly assign subjects to groups and will collect statistics after each randomization. By plotting these statistics
in a distribution we can see the possible results from many different random assignments. Then we will assess the
statistical significance by finding out where the result actually observed in the study falls in this “null
distribution” of the randomly collected statistics to see if it is common or rare.
Study background. Antonioli and Reveley (2005) investigated whether swimming with dolphins was
therapeutic for patients suffering from clinical depression. The researchers recruited 30 subjects aged 18-65 with
a clinical diagnosis of mild to moderate depression through announcements on the internet, radio, newspapers,
and hospitals in the U.S. and Honduras. Subjects were required to discontinue use of any antidepressant drugs or
psychotherapy four weeks prior to the experiment, and throughout the experiment. These 30 subjects went to an
island off the coast of Honduras, where they were randomly assigned to one of two treatment groups. Both
groups engaged in one hour of swimming and snorkeling each day, but one group (Dolphin Therapy) did so in
the presence of bottlenose dolphins and the other group (Control) did not. At the end of two weeks, each
subject’s level of depression was evaluated, as it had been at the beginning of the study, and each subject was
categorized as experiencing substantial improvement in their depression symptoms or not. (Afterwards, the
control group had a one-day session with dolphins.)
(a) Is this an observational study or an experiment?
(b) What are the explanatory and response variables in this study?
Explanatory Variable:
Response Variable:
The contingency table as it appears in Chance & Rossman (2011) summarizes the Dolphin study results:
                                        Dolphin Therapy   Control Group   Total
Showed substantial improvement                 10                3          13
Did not show substantial improvement            5               12          17
Total                                          15               15          30
(c) What is the risk in the Dolphin group (the proportion of subjects who showed substantial improvement)?
(d) What is the risk in the Control group (the proportion of subjects who showed substantial improvement)?
(e) What is the observed difference between the proportions? What would we expect this difference to be if
there’s no effect of Dolphin Therapy?
(f) Does the difference in proportions support the claim that dolphin therapy is more effective than the
control?
(g) Do you think this difference in proportions could have arisen by random chance alone? Yes or No
(h) State null and alternative hypotheses in terms of a difference in proportions.
Null hypothesis (no effect):
Alternative (research) hypothesis:
We will create a simulation model of how these statistics are expected to vary under the null
hypothesis.
Key assumption: 13 of the 30 people in the study would see a substantial improvement, regardless of whether
they swam with dolphins or not. This is consistent with the idea that whether people improved was not related
to the group they were put in (our null hypothesis).
Key question: How unlikely is it for the random assignment process alone to produce a difference of .467 or
larger in the success rates?
Logic of statistical significance (p-values): If the observed difference would rarely occur in a world where
dolphin therapy had the same result as the control group, then we would have strong evidence to reject the
null hypothesis and conclude that dolphin therapy is more effective.
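The .467 in the key question comes straight from the contingency table. A minimal Python sketch of that arithmetic is shown here; the variable names are illustrative and are not part of the original lab.

```python
# Counts taken from the contingency table of study results.
improved = {"dolphin": 10, "control": 3}
group_size = {"dolphin": 15, "control": 15}

# Proportion of each group that showed substantial improvement.
p_dolphin = improved["dolphin"] / group_size["dolphin"]
p_control = improved["control"] / group_size["control"]

# Observed difference in proportions (dolphin - control).
observed_diff = p_dolphin - p_control

print(f"Dolphin:    {p_dolphin:.3f}")      # 0.667
print(f"Control:    {p_control:.3f}")      # 0.200
print(f"Difference: {observed_diff:.3f}")  # 0.467
```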
(i) Simulate the variability expected from random assignment under the null hypothesis using cards.
Work with a partner or two. Take out 30 cards from the deck, one for each subject in the study.
Designate 13 of the cards to represent “improvers” (e.g., red cards) and then designate 17 of the cards to
represent “non-improvers” (e.g., black cards). Shuffle the cards and deal them (the subjects) into two
stacks of 15. One stack is the reshuffled Dolphin Therapy group and the other stack is the reshuffled
Control Group. Then, complete the following contingency table to show this version of
how the results “could have been.”
Reshuffle 1:
                                        Dolphin Therapy   Control Group   Total
Showed substantial improvement                ____             ____         13
Did not show substantial improvement          ____             ____         17
Total                                          15               15          30
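The card shuffle can also be mimicked in code. The sketch below is one possible way to do it in Python, assuming a list of 30 labels stands in for the deck; the function names `reshuffle` and `diff_in_proportions` are made up for this illustration.

```python
import random

def reshuffle(n_improvers=13, n_non_improvers=17, n_dolphin=15, rng=random):
    """Shuffle the 30 'cards' and deal the first 15 to the Dolphin group."""
    deck = ["improver"] * n_improvers + ["non-improver"] * n_non_improvers
    rng.shuffle(deck)
    dolphin, control = deck[:n_dolphin], deck[n_dolphin:]
    # Two-way table for this "could have been" assignment.
    return {
        "dolphin": {"yes": dolphin.count("improver"),
                    "no": dolphin.count("non-improver")},
        "control": {"yes": control.count("improver"),
                    "no": control.count("non-improver")},
    }

def diff_in_proportions(table):
    """Difference in improvement proportions (dolphin - control)."""
    d, c = table["dolphin"], table["control"]
    return d["yes"] / (d["yes"] + d["no"]) - c["yes"] / (c["yes"] + c["no"])

table = reshuffle()
print(table)
print(f"Difference in proportions: {diff_in_proportions(table):.3f}")
```

Each run prints a different table, but the row totals stay at 13 and 17 and the column totals stay at 15, just as with the physical cards.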
(j) With your partner, repeat this hypothetical random assignment, reshuffling four times and
recording four more contingency tables for groupings that “could have been.” The first row is
for Improved = Yes (the “Success” condition had a total of 13 improvers). The second row is
for Improved = No (the “Failure” condition had a total of 17 non-improvers).
Reshuffle 2     Dolphin   Control
Yes              ____      ____
No               ____      ____

Reshuffle 3     Dolphin   Control
Yes              ____      ____
No               ____      ____

Reshuffle 4     Dolphin   Control
Yes              ____      ____
No               ____      ____

Reshuffle 5     Dolphin   Control
Yes              ____      ____
No               ____      ____
(k) For each reshuffle, report the difference in proportions (dolphin – control)
Reshuffle #                   1      2      3      4      5
Difference in proportions   ____   ____   ____   ____   ____
(l) Pool your results with the remainder of the class. Describe the plot for differences in
proportions (shape, center, spread). This plot represents the distribution we’d expect under
the null hypothesis where treatment is unrelated to the response.
(m) Does it seem like the actual experimental results (difference in proportions of .467) would be
surprising to arise purely from the random assignment process under the null model that
Dolphin Therapy is not effective? Briefly explain.
We really need to do this simulated random assignment process hundreds, preferably thousands of
times. This would be very tedious and time-consuming with cards, so let’s turn to technology.
(n) Open the Dolphin Simulation applet:
http://www.rossmanchance.com/applets/ChiSqShuffle.html?dolphins=1
The “Analyzing Two-way Tables” applet shows only the left panel when it is first opened. Click “Show
Shuffle Options” in the right panel to reveal the right-hand side of the screen.
In the left panel is the original contingency (two-way) table from the 2005 Antonioli and Reveley study.
Note that in this applet the explanatory groups are in the columns; namely, GroupA = Dolphin Therapy
and GroupB = Control Group. The response variable is in the rows; namely, “Showed substantial
improvement” = Success and “Did not show substantial improvement” = Failure. There is also a stacked
bar chart with the explanatory groups on the x axis with the response variable defined in the stacks. We
can see that a larger proportion of the successes are in GroupA for this sample.
In the right panel we see that the blue cards represent the Successes (“improvers”), with 10 in Group A and
3 in Group B. The green cards represent the Failures (“non-improvers”). In this panel you can shuffle
the cards and the applet will compute the difference in the proportions as you have been doing. The
applet will also plot the differences after each shuffle in the “Shuffled DIFFS” plot on the right.
Click “Shuffle” and watch as the applet repeats what you have been doing with your partner.
The applet “Shuffles” the 30 online cards and “deals out” 15 to the “Dolphin Therapy” group and creates
the table of the results that “could have been.” The applet then adds a “square dot” to the dotplot on the
right for the difference in proportions.
Click “Show table” in the left panel to be able to see the original table on the left and the reshuffled table
on the right. (You probably need to scroll down to see these two tables.)
(o) Now click on Shuffle four more times. Does the number of successes vary among the repetitions?
Yes or No
(p) Now enter 995 for the Number of shuffles. This produces a total of 1000 repetitions of the simulated
random assignment process under the null hypothesis. Next to the “Count Samples Beyond” in the right
panel, type in the difference in proportions observed in our original study (i.e., difference = .467.)
In what proportion of your 1000 simulated random assignments was the difference in proportions as extreme as
(or more extreme than) the one in the actual study? That is, what is the value of the count divided by 1000?
________ This is the p-value based on your simulated randomization results!
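The same 1000-shuffle experiment can be sketched outside the applet. The Python below is an illustration only, not the applet's own code: the function name `simulate_p_value` and its `seed` parameter are assumptions made for this example. It counts shuffles that put 10 or more improvers in the Dolphin group, which is equivalent to a difference in proportions of .467 or larger because the difference, (2k - 13)/15, increases with the number k of improvers dealt to that group.

```python
import random

def simulate_p_value(n_shuffles=1000, seed=None):
    """Estimate P(difference >= observed) under the null by reshuffling."""
    rng = random.Random(seed)
    deck = [1] * 13 + [0] * 17   # 1 = improver, 0 = non-improver
    observed_improvers = 10      # improvers in the actual Dolphin group
    count_as_extreme = 0
    diffs = []
    for _ in range(n_shuffles):
        rng.shuffle(deck)
        k = sum(deck[:15])       # improvers dealt to the Dolphin group
        diffs.append(k / 15 - (13 - k) / 15)
        if k >= observed_improvers:
            count_as_extreme += 1
    return count_as_extreme / n_shuffles, diffs

p_value, diffs = simulate_p_value(n_shuffles=1000)
print(f"Simulated p-value: {p_value:.3f}")
print(f"Mean of shuffled differences: {sum(diffs) / len(diffs):.3f}")
```

Because the shuffles are random, the estimate changes slightly from run to run unless a seed is supplied, just as different students get slightly different counts from the applet.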
(q) What is the general shape of the “Shuffled DIFFs” distribution? What is the center (value of the mean) of
this distribution (rounded off)? Does this make intuitive sense? Think about it!
Finally, click on the “overlay Normal Distribution” button to see if the mathematical normal model
could be used to represent your simulation model. How do these distributions compare?
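Besides the normal model, the null distribution here can also be written down exactly, which gives a cross-check on the simulated p-value. Let X denote the number of the 13 improvers who land in the Dolphin group of 15 (X is notation introduced for this note, not used in the original lab). Under random assignment X follows a hypergeometric distribution, and a difference in proportions of .467 or larger corresponds to X of 10 or more, so the one-sided probability is:

```latex
P(X \ge 10)
  = \sum_{k=10}^{13} \frac{\binom{13}{k}\,\binom{17}{15-k}}{\binom{30}{15}}
  = \frac{1{,}769{,}768 + 185{,}640 + 8{,}840 + 136}{155{,}117{,}520}
  = \frac{1{,}964{,}384}{155{,}117{,}520}
  \approx 0.0127
```

A simulated p-value from 1000 shuffles should land near this value, though not exactly on it.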
(r) Is the p-value from this simulation small enough to convince you that the original experimental data
provide evidence that dolphin therapy is indeed effective (i.e., that the null hypothesis should be
rejected)? Briefly explain.
(s) Can we say that dolphin therapy “caused” the reduced depression based on this study design? Why or
why not?
To what population are you willing to generalize, if any?