Altruistic punishment: A closer look (ESM)

Electronic Supplementary Material

S1. Supporting Methods and Results

S1.1 Random assignment to experimental condition and cell sizes

Experiment 1: The condition in which witnesses of unfairness started with $9 was significantly smaller than the other conditions because we stopped enrolling subjects in this condition in order to increase power for statistical analyses involving the four cells of the 2 (Target: Self, Other) x 2 (Treatment: Fair, Unfair) design. This decision was made prior to any analysis of the data from this condition and was based solely on an attempt to maximize the number of subjects in each of the other four cells, given the rate at which we were able to recruit subjects into the study. All other cell size differences are due to the random nature of the assignments.

Experiments 2a and 2b: Cell size differences are due to random assignment. The discrepancy in sample sizes between the punishment/rewarding analyses and the emotion analyses is a result of our adding the emotion questions halfway through data collection.

S1.2 Data excluded from Experiment 1 analyses based on debriefing responses

Data from 26 subjects (12 female; Age: M = 18.85, SD = 1.62) were excluded from all analyses, figures, and tables (including ESM) because they expressed scepticism during debriefing (see ESM Appendix for debriefing script) that they had been interacting with other people. Decisions to exclude individual subjects were made without knowledge of their experimental data. The number of subjects excluded did not vary by condition, χ²(4, N = 341) = .459, p = .977, suggesting that none of the conditions induced greater scepticism than any other. Additionally, we re-ran all of the analyses with the flagged subjects included and found that the results did not change qualitatively in any way: all of the significant relationships presented in the main text remained significant when the flagged subjects were added to the analyses.
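The condition-independence of the exclusions can be checked with a standard chi-square test of independence on a retained/excluded-by-condition contingency table. A minimal sketch in Python with SciPy follows; the per-condition counts below are hypothetical (only the totals of 26 excluded and N = 341 are reported above), chosen solely to illustrate the shape of the test.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical counts (illustration only; NOT the study's actual data):
# rows = retained / excluded, columns = the five experimental conditions.
counts = np.array([
    [63, 64, 65, 64, 59],   # retained
    [ 5,  5,  6,  5,  5],   # excluded as sceptical (sums to 26)
])

# Chi-square test of independence; df = (2 - 1) * (5 - 1) = 4.
chi2, p, df, expected = chi2_contingency(counts, correction=False)
print(f"chi2({df}, N = {counts.sum()}) = {chi2:.3f}, p = {p:.3f}")
```

A non-significant p here would indicate, as in the analysis above, that exclusion rates did not differ across conditions.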
S1.3 Lexical decision task measure of implicit anger following Round 1 (Experiment 1): Method and results

In addition to the self-report anger data reported in the main paper for Experiment 1, we collected data from a lexical decision task (LDT) following Round 1: Subjects decided, as quickly as possible, whether a string of letters was a word or a non-word. Reaction times to 15 hostility-related words (e.g., “angry,” “kill”) in the LDT serve as an implicit measure of anger, with faster reaction times indicating more anger (see refs 34,35). A 2 (Target: Self, Other) x 2 (Treatment: Fair, Unfair) ANOVA revealed a significant target*treatment interaction for LDT anger following Round 1, F(1, 261) = 4.66, p = .032, partial η² = .02 (see Fig. S1). Main effects were not significant. The reaction times to hostility-related words for subjects who witnessed another person being treated unfairly (natural log-transformed ms; M = 6.70, SD = 0.28, N = 62) did not differ from those of subjects who witnessed another being treated fairly (M = 6.66, SD = 0.27, N = 80), p = .416, partial η² = .00. However, subjects who were treated unfairly (M = 6.58, SD = 0.29, N = 59) had faster reaction times than did subjects who were treated fairly (M = 6.69, SD = 0.32, N = 64), p = .029, partial η² = .02, suggesting greater anger among those who had been treated unfairly. Reaction times to neutral words are typically controlled for in analysing LDT data [34,35]. However, we did not control for reaction times to neutral words in our analysis because (a) reaction times to neutral words did not differ across conditions, F(3, 261) = 1.54, p = .205, partial η² = .02, and (b) the inclusion of reaction times to neutral words as a covariate eliminated the significant interaction.
Because statistical control of a non-significant covariate decreases power by subtracting degrees of freedom from the error term while not removing adequate sums of squares for error, we dropped it from our analysis.

S1.4 Fairness/moral wrongness of the Round 1 Dictator’s behavior (Experiment 1): Witnesses and recipients view the transgression identically

A possible explanation for the differences between the emotional and behavioral responses of recipients and witnesses of unfairness is that recipients paid more attention to the Dictator’s unfair behavior. However, the target*treatment interaction was not significant for the perceived fairness, F(1, 266) = 1.73, p = .190, or moral wrongness, F(1, 266) = 1.33, p = .249, of the Dictator’s decision (see Fig. S2). There were, however, significant effects of treatment: Unfair Dictators’ behavior was perceived as more unfair, F(1, 266) = 354.13, p < .001, and more morally wrong, F(1, 266) = 182.03, p < .001, than fair Dictators’ behavior. Thus, subjects in both unfair Dictator conditions clearly judged that a transgression had occurred, but they were only angered and motivated to punish when they personally were the targets of unfairness, consistent with previous findings.

S1.5 Effects of Round 1 Dictator behavior on self-reported anger toward the Round 1 Dictator, envy uncontrolled

When envy was not controlled, witnesses of unfairness did appear to get angry at unfair Dictators, relative to witnesses’ anger in response to fair Dictators: A 2 (Target: Self, Other) x 2 (Treatment: Fair, Unfair) ANOVA revealed significant main effects of both target, F(1, 266) = 10.81, p = .001, and treatment, F(1, 266) = 85.09, p < .001, on anger, as well as a significant target*treatment interaction, F(1, 266) = 12.44, p < .001.
Witnesses of unfairness (M = .813, SE = .110, N = 65) reported more anger than did witnesses of fairness (M = .198, SE = .098, N = 80), p < .001, partial η² = .06, and recipients of unfairness (M = 1.55, SE = .115, N = 61) reported more anger than recipients of fairness (M = .172, SE = .108, N = 64), p < .001, partial η² = .22. However, as discussed in the main text, witnesses of unfairness reported no more anger than witnesses of fairness when envy was statistically controlled. Thus, witnesses’ anger at unfair Dictators can be attributed to envy rather than to moralistic anger.

S2. Supplementary Notes

S2.1 The use of deception

Though experimental economists typically resist the use of deception in experiments, its use here is justified: there was no practical way we could have obtained a sufficiently large sample size without deception. We sought to gather data from at least 50 subjects per cell of our main 2x2 design in order to have adequate statistical power. Without the use of deception, we would have had to rely on a minimum of 100 subjects, in the role of the Round 1 Dictator, to take exactly $4.00 from the Recipient (unfair conditions) and at least 100 more subjects to take exactly $0.00 from the Recipient (fair conditions). Assume that Round 1 Dictators would be equally likely to make any one of the 11 possible choices on the give-take continuum, from giving $5 to taking $5, in whole-dollar increments.¹ We would need N = 1,100 (100 subjects per choice x 11 choices) to achieve the same statistical power without the use of deception as with 200 subjects in our actual paradigm. Considering that our interest lies entirely in how subjects responded to Dictator actions, and not in the actions themselves, such a design would be wasteful of subjects’ time (thereby altering the ratio of benefits to risks of the experiment, and thus its ethicality) and resources.
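The sample-size arithmetic above can be made explicit. A toy calculation, under the stated (and admittedly unrealistic) uniform-choice simplification:

```python
# Sample-size arithmetic for running the design without deception, under the
# uniform-choice simplification stated above: each of the 11 possible Round 1
# Dictator actions would need 100 responders.
subjects_per_choice = 100    # subjects needed per Dictator choice
n_choices = 11               # give $5 ... take $5, in whole-dollar steps
n_without_deception = subjects_per_choice * n_choices

# With deception, scripted fair and unfair Dictator actions mean only the
# two treatments of interest need to be filled.
n_with_deception = 2 * subjects_per_choice

print(n_without_deception)                      # 1100
print(n_without_deception / n_with_deception)   # 5.5
```

That is, under this simplification the no-deception design would require 5.5 times as many subjects for the same power.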
Below we address some possible concerns with the use of deception in our method, and explain why these concerns do not affect the validity of our results.

Authenticity of debriefing responses. It could be argued that our debriefing process was inadequate to identify all cases of scepticism about our deception because participants were being compensated both with money and with partial course credit; that is, perhaps they felt coerced into responding that they believed the deception because they worried they would not receive compensation if they admitted that they did not. This is highly unlikely. Participants read and sign consent forms prior to participation in any experiment at the University of Miami, as is the case with any IRB-approved study involving human subjects in the United States, and these forms explicitly state that participants cannot be denied compensation based on their responses; the actual amount of money earned may vary based on decisions in experimental tasks, but course credit must be granted once a consent form is signed.

Possible contamination of the subject pool. If participants are regularly subjected to experiments in which deception is used, it could be argued that they come to expect to be deceived in future experiments in which they participate, and consequently provide responses that do not accurately reflect natural behavior. This is unlikely to be a significant issue in the psychology department’s subject pool at the University of Miami for several reasons. First, our subject pool consists only of undergraduates currently enrolled in a specific introductory psychology course; once they have completed the course they are no longer eligible to participate in experiments. Second, members of the subject pool participate in only five total hours of experiments, which translates to participation in approximately two to four experiments.
Third, many of the experiments that members of our subject pool participate in do not involve deception. Thus, it is unlikely that participants in our subject pool have become so accustomed to experiments involving deception that their behavior is altered by its expectation.

Running subjects in individual sessions. The decision to run subjects either individually or in groups presented a trade-off between experimental control and a more realistic setting. We decided to run subjects individually because the benefits of doing so greatly outweighed the potential costs of the less realistic setting. First, though subjects may have been more sceptical that they were actually interacting with others, since there were no other people in the room, the scenario we presented was very plausible: subjects were told that the two other participants were located elsewhere in the psychology building (a five-story building with lots of foot traffic). Second, running a third-party punishment study with two other people in the room introduces variables related to their physical appearance, including sex, clothing style, coalitional markings (e.g., fraternity symbols, sports team logos), perceived formidability, friendliness, ethnicity, etc. Even though subjects’ identities are anonymous in the game and subjects are separated by partitions during play, subjects will inevitably interact at some point during the experimental session.

¹ It is probably unrealistic to assume that subjects would be equally likely to choose any one of the 11 options, but this oversimplification is more than adequate to demonstrate the present point. Also, in our actual design with predetermined Round 1 Dictator actions, subjects in Round 2 were allowed to reward or punish any amount in this range, down to the nearest cent: they made their decision by typing in a precise amount.
Thus, running subjects individually under the perfectly plausible guise of their interacting with other people in another room avoids the potential experimental noise that can be introduced by these other factors. Concern has previously been raised that running subjects in isolated rooms may increase scepticism that interactions are legitimate, thereby influencing behavioural results. Frohlich et al. empirically tested whether running a dictator game with isolated subjects led to different results as compared with running the game with subjects in the same room. In their discussion, Frohlich et al. state, “Contrary to the hypothesis, most measures of subjects’ uncertainties were not significantly different as a result of the change in the number of rooms. Indeed, only doubt that the money left in the envelope would be given to the paired other was significantly reduced in the One Room experiments.” We note that this effect was statistically significant only by a one-tailed test (p = .038), whereas the effects on other, similar dependent variables were not statistically significant even by one-tailed test (Did not view experiment as a game; Not sure that description was accurate; and, most importantly, Not sure that there were real people paired).

S2.2 Inconsistency with previous experimental results

It could be argued that the reason we found no altruistic punishment in Experiment 1 is that we changed multiple aspects of the standard design of the third-party punishment game at once. Indeed, we made several changes in an attempt to minimize or eliminate (1) audience effects, (2) experimental demand, (3) affective forecasting, and (4) potential extraneous variables introduced by brief interactions with other participants. However, there are several reasons this argument is not compelling. First, we observed a significant amount of second-party punishment, so, clearly, there was not a general suppression of punishment in our design.
Second, punishment in our design (a 1:4 cost-to-punishment ratio) was less expensive than under the 1:3 ratio typically used in the third-party punishment game [e.g., 6]. Indeed, our design should have encouraged punishment relative to previous designs. Third, our purpose here was to test for the presence of altruistic third-party punishment in a well-controlled experimental design, not to iteratively make changes to a design that contained several features that likely produced artefactual results; to achieve this well-controlled design, all of our changes needed to be made simultaneously.

References

34 Ayduk, O., Mischel, W. & Downey, G. 2002 Attentional mechanisms linking rejection to hostile reactivity: The role of “hot” versus “cool” focus. Psychol Sci 13, 443-448.
35 Gollwitzer, M. & Denzler, M. 2009 What makes revenge sweet: Seeing the offender suffer or delivering a message? J Exp Soc Psychol 45, 840-844. (DOI:10.1016/j.jesp.2009.03.001)
36 Tabachnick, B. G. & Fidell, L. S. 1989 Using Multivariate Statistics. New York: Harper & Row.
37 Hertwig, R. & Ortmann, A. 2001 Experimental practices in economics: A methodological challenge for psychologists? Behav Brain Sci 24, 383-451.
38 Frohlich, N., Oppenheimer, J. & Bernard Moore, J. 2001 Some doubts about measuring self-interest using dictator experiments: The costs of anonymity. J Econ Behav Organ 46, 271-290.

S3. Supplementary Figures and Captions

Figure S1. Reaction times to hostility-related words in the lexical decision task.

Figure S2. Ratings of the fairness and moral wrongness of the Round 1 Dictator’s decision. Scale from 1 (Not at all fair/morally wrong) to 9 (Totally fair/morally wrong).

S4. Supplementary Tables

Table S1. Variable summary statistics – Experiment 1. Cell entries are M (SD).

Variable               Overall        Fair Recipient   Unfair Recipient   Fair Witness    Unfair Witness ($5)   Unfair Witness ($9)
$ Punished/Rewarded*   -0.18 (1.46)   0.17 (0.68)      -1.12 (2.08)       0.34 (1.05)     -0.24 (1.37)          -0.22 (1.42)
LDT RT**               6.65 (0.29)    6.69 (0.32)      6.58 (0.29)        6.66 (0.28)     6.70 (0.28)           6.61 (0.31)
Moral wrongness        3.34 (2.67)    1.31 (1.19)      4.48 (2.49)        1.61 (1.79)     5.18 (2.30)           5.13 (2.31)
Fairness               6.07 (2.95)    8.61 (1.08)      3.64 (2.40)        8.48 (1.24)     4.26 (2.43)           4.07 (2.00)
Anger                  0.64 (1.01)    0.14 (0.44)      1.54 (1.29)        0.17 (0.46)     0.84 (1.07)           0.67 (0.88)
Envy                   0.85 (1.17)    0.27 (0.61)      1.47 (1.40)        0.36 (0.76)     1.48 (1.27)           0.80 (1.02)
* Negative values indicate punishment
** Reaction time to hostility-related words in the lexical decision task, ln-transformed ms

Table S2. Variable summary statistics – Experiment 2a. Cell entries are M (SD).

Variable               Overall        Fair Recipient   Unfair Recipient   Fair Witness    Unfair Witness
$ Punished/Rewarded*   -0.23 (2.08)   0.05 (1.69)      -0.59 (2.48)       0.17 (1.51)     -0.45 (2.28)
Moral wrongness        3.08 (3.23)    1.04 (2.13)      5.32 (3.04)        0.79 (1.81)     4.72 (2.68)
Fairness               5.28 (3.38)    7.85 (2.21)      2.64 (2.57)        7.74 (2.01)     3.37 (2.58)
Anger                  1.12 (1.32)    0.16 (0.52)      2.25 (1.20)        0.28 (0.64)     1.63 (1.13)
Envy                   0.98 (1.25)    0.46 (0.92)      1.85 (1.39)        0.34 (0.69)     1.12 (1.16)
* Negative values indicate punishment

Table S3. Variable summary statistics – Experiment 2b. Cell entries are M (SD).

Variable               Overall        Fair Recipient   Unfair Recipient   Fair Witness    Unfair Witness
$ Punished/Rewarded*   0.25 (1.83)    0.42 (1.71)      -0.04 (1.99)       0.58 (1.74)     0.06 (1.79)
Moral wrongness        2.87 (2.88)    0.75 (1.79)      4.56 (2.41)        1.00 (1.92)     4.97 (2.32)
Fairness               5.40 (3.25)    7.92 (2.17)      3.24 (2.24)        7.70 (2.34)     3.02 (2.30)
Anger                  0.90 (1.15)    0.15 (0.51)      1.60 (1.21)        0.24 (0.62)     1.51 (1.11)
Envy                   0.92 (1.26)    0.39 (0.91)      1.68 (1.28)        0.21 (0.53)     1.27 (1.42)
* Negative values indicate punishment
Appendix

Debriefing script followed by experimenter:

(1) Tell the participant that the study is over. Ask if he/she has any questions. If the questions are about hypotheses or the deceptive elements of the experiment, explain that you will address those specific questions in just a few moments.

(2) Ask whether the entire experiment was clear in its overall purpose, and whether all aspects of the procedure made sense. Was there anything that the participant found confusing or unclear? “Were you, at any point, unsure about what we were asking you to do?”

(3) “We would find it very helpful to hear about any of your personal feelings and reactions to the experiment.” Probe about what made the person feel the way he or she felt.

(4) “Today’s experiment was designed to help us test some very specific hypotheses about human behavior. Do you have any idea what those hypotheses were? If you had to guess, what would you say were the hypotheses we were testing today? We would like to know as many of your guesses about our hypotheses as you can come up with.”

(5) Ask whether the participant found any aspect of the procedure odd, upsetting, or disturbing.

(6) “Did you wonder at any point whether there was more than meets the eye to any of the procedures that we had you complete today? That is, do you think that there might have been any information that I held back from explaining to you about the experiment until now?” Ask the participant to say more about their suspicions, and to elaborate on their questions about the procedure.

(7) Ask how the participant thinks the suspicions he/she mentioned affected his or her behavior during the study.

The experimenter then fully explained the nature of the deception to the participant and why it was a necessary part of the experiment. The experimenter also discussed the aims of the research with the participant, and answered any questions the participant had about their experience.
Lastly, the experimenter and participant discussed possible ways for the participant to talk about the experiment with his or her peers in a manner that, while honest, would not spoil the deception for others in the subject pool. Our experimenters reported anecdotally that participants generally found the experiment fun and interesting, and were typically fascinated with the study design after the deception was revealed in debriefing. Furthermore, the experimenters’ discussion of how to talk to others about the study helped, in a sense, to bring participants in as collaborators on the research, such that (a) they would not feel that they had been taken advantage of, and (b) they would not spoil the study for others in the subject pool.