Altruistic punishment: A closer look (ESM) 1
Electronic Supplementary Material
S1. Supporting Methods and Results
S1.1 Random assignment to experimental condition and cell sizes
Experiment 1: The condition in which witnesses of unfairness started with $9 was
substantially smaller than the other conditions because we stopped enrolling subjects in this
condition to increase power for statistical analyses involving the four cells of the 2 (Target:
Self, Other) x 2 (Treatment: Fair, Unfair) design. This decision was made prior to any
analysis of the data from this condition and was based solely on an attempt to maximize the
number of subjects in each of the other four cells, given the rate at which we managed to
recruit subjects into the study. All other cell size differences are due to the random nature of
the assignments.
Experiments 2a and 2b: Cell size differences are due to random assignment. The discrepancy
in sample sizes between punishment/rewarding analyses and emotion analyses is a result of
our adding the emotion questions halfway through data collection.
S1.2 Data excluded from Experiment 1 analyses based on debriefing responses
Data from 26 subjects (12 female; Age: M = 18.85, SD = 1.62) were excluded from all
analyses, figures, and tables (including ESM) because they expressed scepticism during
debriefing (see ESM Appendix for debriefing script) that they had been interacting with other
people. Decisions to exclude individual subjects were made without knowledge of their
experimental data. The number of subjects excluded did not vary by condition, χ²(4, N = 341)
= 0.459, p = .977, suggesting that none of the conditions induced greater scepticism than any
other.
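As a minimal sketch of the goodness-of-fit check above: the per-condition enrolment and exclusion counts below are hypothetical (only the totals, N = 341 and 26 exclusions, come from the text), so the statistic will not reproduce the reported χ² = 0.459; the sketch only shows the computation, with expected exclusions proportional to condition size.

```python
# Hypothetical illustration of the chi-square test reported above.
# The per-condition splits are made-up numbers, NOT the actual data;
# only the totals (26 excluded out of N = 341) match the text.
observed_excluded = [6, 5, 5, 6, 4]      # excluded subjects per condition (hypothetical)
condition_sizes = [70, 65, 86, 68, 52]   # subjects enrolled per condition (hypothetical)

total_excluded = sum(observed_excluded)  # 26
total_n = sum(condition_sizes)           # 341

# Expected exclusions per condition if scepticism is independent of condition:
expected = [total_excluded * n / total_n for n in condition_sizes]

# Pearson chi-square statistic with df = 5 - 1 = 4
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed_excluded, expected))
print(round(chi_sq, 3))  # → 0.581 for these hypothetical counts
```

A value this far below the df (4) would, as in the reported test, give no evidence that exclusion rates differ by condition.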
Additionally, we re-ran all of the analyses with flagged subjects included and found that the
results did not change qualitatively in any way—all of the significant relationships presented
in the main text remained significant when flagged subjects were added to the analyses.
S1.3 Lexical decision task measure of implicit anger following round 1 (Experiment 1):
Method and results
In addition to the self-report anger data reported in the main paper for Experiment 1, we
collected data from a lexical decision task (LDT) following Round 1: Subjects decided, as
quickly as possible, whether a string of letters was a word or a non-word; reaction times to 15
hostility-related words (e.g., “angry,” “kill”) in the LDT serve as an implicit measure of anger
(with faster reaction times indicating more anger; see refs 34,35).
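As a concrete illustration of how an implicit score of this kind can be computed (the reaction times below are invented for illustration; only the natural-log scaling matches what is used here):

```python
import math

# Hypothetical sketch of the implicit-anger score: average the natural-log-
# transformed reaction times (ms) to hostility-related words in the LDT.
# These reaction times are invented, not data from the experiment.
hostility_rts_ms = [780, 820, 760, 805, 790]  # correct "word" responses only

ldt_anger_score = sum(math.log(rt) for rt in hostility_rts_ms) / len(hostility_rts_ms)
print(round(ldt_anger_score, 2))  # → 6.67; lower (faster) scores indicate more anger
```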
A 2 (Target: Self, Other) x 2 (Treatment: Fair, Unfair) ANOVA revealed a significant
target*treatment interaction for LDT anger following Round 1, F (1, 261) = 4.66, p = .032,
partial η² = .02 (see Fig. S1). Main effects were not significant. The reaction times to
hostility-related words for subjects who witnessed another person being treated unfairly
(natural log-transformed ms, M = 6.70, SD = 0.28, N = 62) did not differ from those of
subjects who witnessed another being treated fairly (M = 6.66, SD = 0.27, N = 80), p = .416,
partial η² = .00. However, subjects who were treated unfairly (M = 6.58, SD = 0.29, N = 59)
had faster reaction times than did subjects who were treated fairly (M = 6.69, SD = 0.32,
N = 64), p = .029, partial η² = .02, suggesting greater anger among those who had been
treated unfairly.
Reaction times to neutral words are typically controlled for in analysing LDT data [34,35].
However, we did not control for reaction times to neutral words in our analysis because (a)
reaction times to neutral words did not differ across conditions, F (3, 261) = 1.54, p = .205,
partial η² = .02, and (b) the inclusion of reaction times to neutral words as a covariate
eliminated the significant interaction. Because statistical control of a non-significant
covariate decreases power by subtracting degrees of freedom from the error term while not
removing adequate sums of squares for error [36], we dropped it from our analysis.
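The degrees-of-freedom argument can be made concrete with a small numerical sketch. The sums of squares below are invented (chosen only to give an F near the one reported above); the point is simply that a covariate which removes essentially no error variance still costs an error degree of freedom, inflating the error mean square and shrinking F.

```python
# Illustration (with made-up sums of squares) of why a covariate that
# removes no error variance reduces power: it subtracts one error df while
# leaving SS_error essentially unchanged, so MS_error grows and F shrinks.
ss_effect = 0.40   # hypothetical interaction sum of squares
ss_error = 21.90   # hypothetical error sum of squares
df_effect = 1

def f_ratio(ss_err, df_err):
    return (ss_effect / df_effect) / (ss_err / df_err)

f_without_covariate = f_ratio(ss_error, 261)  # error df from the reported ANOVA
f_with_covariate = f_ratio(ss_error, 260)     # covariate costs one error df

print(round(f_without_covariate, 3), round(f_with_covariate, 3))  # → 4.767 4.749
assert f_with_covariate < f_without_covariate
```

With one covariate the loss is small, but the direction is always the same; a covariate is only worth its df if it removes a meaningful share of SS_error.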
S1.4 Fairness/moral wrongness of the round 1 dictator’s behavior (Experiment 1):
Witnesses and receivers view the transgression identically.
A possible explanation for the differences between the emotional and behavioral responses of
recipients and witnesses of unfairness is that recipients paid more attention to the Dictator’s
unfair behavior. However, the target*treatment interaction was not significant for perceived
fairness (F (1, 266) = 1.73, p = .190) or moral wrongness (F (1, 266) = 1.33, p = .249) of the
Dictator’s decision (see Fig. S2). There were, however, significant effects of treatment:
Unfair Dictators’ behavior was perceived as more unfair, F (1, 266) = 354.13, p < .001, and
more morally wrong, F (1, 266) = 182.03, p < .001, than fair Dictators’ behavior. Thus, subjects in
both unfair Dictator conditions clearly judged that a transgression had occurred, but they
were only angered and motivated to punish when they personally were targets of
unfairness—consistent with previous findings [27].
S1.5 Effects of round 1 dictator behavior on self-reported anger toward the round 1
dictator, envy uncontrolled
When envy was not controlled, witnesses of unfairness did appear to get angry at unfair
Dictators, relative to witnesses’ anger in response to fair Dictators: A 2 (Target: Self, Other) x
2 (Treatment: Fair, Unfair) ANOVA revealed significant main effects of both target, F (1,
266) = 10.81, p = .001, and treatment, F (1, 266) = 85.09, p < .001, on anger, as well as a
significant target*treatment interaction, F (1, 266) = 12.44, p < .001. Witnesses of
unfairness (M = .813, SE = .110, N = 65) reported more anger than did witnesses of fairness
(M = .198, SE = .098, N = 80), p < .001, partial η² = .06, and recipients of unfairness (M =
1.55, SE = .115, N = 61) reported more anger than recipients of fairness (M = .172, SE = .108,
N = 64), p < .001, partial η² = .22. However, as discussed in the main text, witnesses of
unfairness reported no more anger than witnesses of fairness when envy was statistically
controlled. Thus, witnesses’ anger at unfair Dictators can be attributed to envy rather than to
moralistic anger.
S2. Supplementary Notes
S2.1 The use of deception
Though experimental economists typically resist the use of deception in experiments, its use
here is justified: there was no practical way we could have obtained a sufficiently large
sample size without deception. We sought to gather data from at least 50 subjects per cell of
our main 2x2 design in order to have adequate statistical power. Without the use of
deception, we would have had to rely on a minimum of 100 subjects, in the role of the Round
1 Dictator, to take exactly $4.00 from the Recipient (unfair conditions) and at least 100 more
subjects to take exactly $0.00 from the Recipient (fair conditions). Assume that Round 1
Dictators would be equally likely to make one of the 11 possible choices on the give-take
continuum, from giving $5 to taking $5, in whole-dollar increments.¹ We would need N =
1,100 (100 subjects per choice*11 choices) to achieve the same statistical power without the
use of deception as with 200 subjects in our actual paradigm. Considering that our interest
lies entirely with how subjects responded to Dictator actions, and not the actions themselves,
such a design would be wasteful of subjects’ time (thereby altering the ratio of benefits to
risks of the experiment, and thus its ethicality) and resources.
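A minimal sketch of the arithmetic above, under the same uniform-choice assumption:

```python
# Back-of-the-envelope calculation from the text: under the simplifying
# assumption that Round 1 Dictators choose uniformly among the 11 whole-dollar
# options (give $5 ... take $5), each specific choice is made by 1/11 of
# Dictators, so obtaining 100 Dictators behind each studied action requires:
n_choices = 11                  # whole-dollar options on the give-take continuum
needed_per_target_choice = 100  # subjects required behind each studied action

total_without_deception = needed_per_target_choice * n_choices
print(total_without_deception)  # → 1100, versus 200 in the deception paradigm
```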
Below we address some possible concerns with the use of deception in our method, and why
these concerns do not affect the validity of our results.
Authenticity of debriefing responses. It could be argued that our debriefing process was
inadequate to identify all cases of scepticism about our deception because participants were
compensated both with money and with partial course credit; that is, perhaps they felt
pressured to report that they believed the deception out of fear that they would otherwise not
be compensated. This is highly unlikely. Participants read and sign consent forms prior to
participation in any experiment at
the University of Miami—as is the case with any IRB-approved study involving human
subjects in the United States—that explicitly state that participants cannot be denied
compensation based on their responses; actual amount of money earned may vary based on
decisions in experimental tasks, but course credit is required to be granted once a consent
form is signed.
Possible contamination of the subject pool. If participants are regularly subjected to
experiments in which deception is used, it could be argued that perhaps they come to expect
to be deceived in future experiments in which they participate [37], and consequently provide
responses that do not accurately reflect natural behavior. This is unlikely to be a significant
issue in the psychology department’s subject pool at the University of Miami for several
reasons. First, our subject pool consists only of undergraduates currently enrolled in a
specific introductory psychology course; once they have completed the course they are no
longer eligible to participate in experiments. Second, members of the subject pool participate
in only five total hours of experiments, which translates to participation in
approximately two to four experiments. Third, many of the experiments that members of our
subject pool participate in do not involve deception. Thus, it is unlikely that participants in
our subject pool have become accustomed to participating in experiments involving
deception to the point where their behavior is altered by its expectation.
Running subjects in individual sessions. The decision to run subjects either individually or in
groups presented a trade-off between experimental control and a more realistic setting. We
decided to run subjects individually because the benefits of doing so greatly outweighed the
potential costs of a less realistic setting. First, though subjects may have been more
sceptical that they were actually interacting with others, since there were no other people in
the room, the scenario we presented was very plausible: subjects were told that two other
participants were located elsewhere in the psychology building (a five-story building with lots
of foot traffic). Second, running a third-party punishment study with two other people in the
room introduces variables related to their physical appearance, including sex, clothing style,
coalitional markings (e.g., fraternity symbols, sports team logos), perceived formidability,
friendliness, and ethnicity. Even though subjects’ identities are anonymous in the game and
subjects are separated by partitions during play, subjects will inevitably interact at some point
during the experimental session. Thus, running subjects individually, under the perfectly
plausible guise of interacting with other people in another room, avoids the experimental
noise that these other factors can introduce.

¹ It is probably unrealistic to assume that subjects would be equally likely to choose any one of the 11 options,
but this oversimplification is more than adequate to demonstrate the present point. Also, in our actual design
with predetermined Round 1 Dictator actions, subjects in Round 2 were allowed to reward or punish any amount
in this range, down to the nearest cent: they made their decision by typing in a precise amount.
Concern has previously been raised that running subjects in isolated rooms may potentially
increase scepticism that interactions are legitimate, thereby influencing behavioural results.
Frohlich et al. [38] empirically tested whether running a dictator game with isolated subjects
led to different results compared with running the game with subjects in the same room. In
their discussion, Frohlich et al. state, “Contrary to the hypothesis, most measures of subjects’
uncertainties were not significantly different as a result of the change in the number of rooms.
Indeed, only doubt that the money left in the envelope would be given to the paired other was
significantly reduced in the One Room experiments.” We note that this effect was only
statistically significant by one-tailed test (p = .038), whereas the effects on other, similar,
dependent variables were not statistically significant even by one-tailed test (“Did not view
experiment as a game”; “Not sure that description was accurate”; and, most importantly, “Not
sure that there were real people paired”).
S2.2 Inconsistency with previous experimental results
It could be argued that the reason we found no altruistic punishment in Experiment 1 is
because we changed multiple aspects of the standard design of the third-party punishment
game at once. Indeed, we made several changes in an attempt to minimize or eliminate (1)
audience effects, (2) experimental demand, (3) affective forecasting, and (4) potential
extraneous variables introduced by brief interactions with other participants. However, there
are several reasons this argument is not compelling. First, we observed a significant amount
of second-party punishment, so clearly there was no general suppression of punishment in
our design. Second, punishment in our design (a 1:4 cost-to-punishment ratio) was less
expensive than under the 1:3 ratio typically used in the third-party punishment game [e.g., 6].
Indeed, our design should have encouraged punishment relative to previous designs. Third,
our purpose here was to test for the presence of altruistic third-party punishment in a well-controlled experimental design, not to iteratively make changes to a design that contained
several features that likely produced artefactual results—to achieve this well-controlled
design, all of our changes needed to be made simultaneously.
References
[34] Ayduk, O., Mischel, W. & Downey, G. 2002 Attentional mechanisms linking rejection to hostile reactivity: The role of “hot” versus “cool” focus. Psychol Sci 13, 443-448.
[35] Gollwitzer, M. & Denzler, M. 2009 What makes revenge sweet: Seeing the offender suffer or delivering a message? J Exp Soc Psychol 45, 840-844. (DOI:10.1016/j.jesp.2009.03.001)
[36] Tabachnick, B. G. & Fidell, L. S. 1989 Using Multivariate Statistics. New York: Harper & Row.
[37] Hertwig, R. & Ortmann, A. 2001 Experimental practices in economics: A methodological challenge for psychologists? Behav Brain Sci 24, 383-451.
[38] Frohlich, N., Oppenheimer, J. & Bernard Moore, J. 2001 Some doubts about measuring self-interest using dictator experiments: The costs of anonymity. J Econ Behav Organ 46, 271-290.
S3. Supplementary Figures and Captions
Figure S1.
Reaction times to hostility-related words in the lexical decision task.
Figure S2.
Ratings of the fairness and moral wrongness of the Round 1 Dictator’s decision. Scale from 1
(Not at all fair/morally wrong) to 9 (Totally fair/morally wrong).
S4. Supplementary Tables
| Variable | Overall M (SD) | Fair Recipient M (SD) | Unfair Recipient M (SD) | Fair Witness M (SD) | Unfair Witness ($5) M (SD) | Unfair Witness ($9) M (SD) |
|---|---|---|---|---|---|---|
| $ Punished/Rewarded* | -0.18 (1.46) | 0.17 (0.68) | -1.12 (2.08) | 0.34 (1.05) | -0.24 (1.37) | -0.22 (1.42) |
| LDT RT** | 6.65 (0.29) | 6.69 (0.32) | 6.58 (0.29) | 6.66 (0.28) | 6.70 (0.28) | 6.61 (0.31) |
| Moral wrongness | 3.34 (2.67) | 1.31 (1.19) | 4.48 (2.49) | 1.61 (1.79) | 5.18 (2.30) | 5.13 (2.31) |
| Fairness | 6.07 (2.95) | 8.61 (1.08) | 3.64 (2.40) | 8.48 (1.24) | 4.26 (2.43) | 4.07 (2.00) |
| Anger | 0.64 (1.01) | 0.14 (0.44) | 1.54 (1.29) | 0.17 (0.46) | 0.84 (1.07) | 0.67 (0.88) |
| Envy | 0.85 (1.17) | 0.27 (0.61) | 1.47 (1.40) | 0.36 (0.76) | 1.48 (1.27) | 0.80 (1.02) |

* Negative values indicate punishment
** Response time to hostility-related words in lexical decision task, ln-transformed ms

Table S1. Variable summary statistics – Experiment 1.
| Variable | Overall M (SD) | Fair Recipient M (SD) | Unfair Recipient M (SD) | Fair Witness M (SD) | Unfair Witness M (SD) |
|---|---|---|---|---|---|
| $ Punished/Rewarded* | -0.23 (2.08) | 0.05 (1.69) | -0.59 (2.48) | 0.17 (1.51) | -0.45 (2.28) |
| Moral wrongness | 3.08 (3.23) | 1.04 (2.13) | 5.32 (3.04) | 0.79 (1.81) | 4.72 (2.68) |
| Fairness | 5.28 (3.38) | 7.85 (2.21) | 2.64 (2.57) | 7.74 (2.01) | 3.37 (2.58) |
| Anger | 1.12 (1.32) | 0.16 (0.52) | 2.25 (1.20) | 0.28 (0.64) | 1.63 (1.13) |
| Envy | 0.98 (1.25) | 0.46 (0.92) | 1.85 (1.39) | 0.34 (0.69) | 1.12 (1.16) |

* Negative values indicate punishment

Table S2. Variable summary statistics – Experiment 2a.
| Variable | Overall M (SD) | Fair Recipient M (SD) | Unfair Recipient M (SD) | Fair Witness M (SD) | Unfair Witness M (SD) |
|---|---|---|---|---|---|
| $ Punished/Rewarded* | 0.25 (1.83) | 0.42 (1.71) | -0.04 (1.99) | 0.58 (1.74) | 0.06 (1.79) |
| Moral wrongness | 2.87 (2.88) | 0.75 (1.79) | 4.56 (2.41) | 1.00 (1.92) | 4.97 (2.32) |
| Fairness | 5.40 (3.25) | 7.92 (2.17) | 3.24 (2.24) | 7.70 (2.34) | 3.02 (2.30) |
| Anger | 0.90 (1.15) | 0.15 (0.51) | 1.60 (1.21) | 0.24 (0.62) | 1.51 (1.11) |
| Envy | 0.92 (1.26) | 0.39 (0.91) | 1.68 (1.28) | 0.21 (0.53) | 1.27 (1.42) |

* Negative values indicate punishment

Table S3. Variable summary statistics – Experiment 2b.
Appendix
Debriefing script followed by experimenter:
(1) Tell participant that the study is over. Ask if he/she has any questions. If the questions are
about hypotheses or the deceptive elements of the experiment, explain that you will address
those specific questions in just a few moments.
(2) Ask whether the entire experiment was clear in its overall purpose, and whether all
aspects of the procedure made sense. Was there anything that the participant found confusing
or unclear? “Were you, at any point, unsure about what we were asking you to do?”
(3) We would find it very helpful to hear about any of your personal feelings and reactions to
the experiment. Probe about what made the participant feel the way he or she felt.
(4) Today’s experiment was designed to help us test some very specific hypotheses about
human behavior. Do you have any idea what those hypotheses were? If you had to guess,
what would you say were the hypotheses we were testing today? We would like to know as
many of your guesses about our hypotheses as you can come up with.
(5) Ask whether participant found any aspect of the procedure odd, upsetting or disturbing.
(6) Did you wonder at any point whether there was more than meets the eye to any of the
procedures that we had you complete today? That is, do you think that there might have been
any information that I held back from explaining to you about the experiment until now? Ask
participant to say more about their suspicions, and to elaborate on their questions about the
procedure.
(7) Ask how participant thinks (the suspicions he/she mentioned) affected his or her behavior
during the study.
The experimenter then fully explained the nature of the deception to the participant and why
it was a necessary part of the experiment. The experimenter also discussed the aims of the
research with the participant, and answered any questions the participant had about their
experience. Lastly, the experimenter and participant discussed possible ways for the
participant to talk about the experiment with his or her peers in a manner that, while honest,
would not spoil the deception for others in the subject pool.
Our experimenters reported anecdotally that participants generally found the experiment fun
and interesting, and were typically fascinated with the study design after the deception was
revealed in debriefing. Furthermore, the experimenters’ discussion of how to talk to others
about the study helped, in a sense, to bring participants in as collaborators on the research
such that (a) they would not feel that they had been taken advantage of, and (b) they would
not spoil the study for others in the subject pool.