Supplemental Materials
Mechanisms of Social Avoidance Learning Can Explain the Emergence of Adaptive and Arbitrary Behavioral Traditions in Humans
by B. Lindström & A. Olsson, 2015, JEP: General
http://dx.doi.org/10.1037/xge0000071

Additional analyses of Exp. 1-4

To ascertain that our conclusions do not depend on the random effects regression strategy, we also analyzed the data using simple parametric (one-sample t-tests) and non-parametric (Wilcoxon signed-rank) tests. The 20 choices of each individual were aggregated into a proportion, which we refer to as P(A)* to distinguish it from the random effects regression. We tested whether the average P(A)* in each experiment differed from 0.5 (see Figure S1). Both the parametric and the non-parametric tests led to the same conclusions as the random effects regressions: P(A)* was above 0.5 in Exp. 1 (t(24) = 3.65, p = .001; W = 271, p = .003), did not differ from 0.5 in Exp. 2 (t(19) = 0.92, p = .36; W = 103.4, p = .44), and was above 0.5 in Exp. 3 (t(24) = 2.64, p = .01; W = 208, p = .03) and in Exp. 4 (t(49) = 2.23, p = .03; W = 755, p = .014).

Figure S1. The empirical distributions of individual P(A) in Experiments 1-4.

Additional analyses of Exp. 4

In the main text, we report a quadratic effect of Generation on P(A). This GLMM was based on a simplified random effects structure (random intercepts for each Transmission Chain and Generation, but no random terms for individual subjects), because the more complex model including random intercepts for all subjects failed to converge. A model with only a linear term for Generation (β = 0.37, SE = 0.23, z = -1.62, p = .1) had an inferior fit (likelihood ratio test: χ2(1) = 5.4, p = .02) relative to a model that also included a quadratic term for Generation (linear effect: β = -16.65, SE = 5.38, z = -3.1, p = .002; quadratic effect: β = 17.74, SE = 5.37, z = 3.3, p < .001). Figure S2 shows the empirical data and the best-fitting curve.
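The aggregation into P(A)*, the one-sample t-test against 0.5, and the likelihood ratio test between nested models can be sketched in a few lines. This is a minimal Python sketch using only the standard library, not the authors' code: the 'A'/'B' coding and all function names are illustrative assumptions, and the Wilcoxon test and the GLMM fit itself are not reproduced. The p-value helper relies on the fact that a χ2 variable with one degree of freedom is the square of a standard normal variable.

```python
import math
from statistics import mean, stdev


def p_a_star(choices):
    """P(A)* for one participant: the proportion of 'A' among their choices."""
    return sum(1 for c in choices if c == "A") / len(choices)


def one_sample_t(values, mu=0.5):
    """One-sample t statistic and degrees of freedom for H0: mean(values) == mu."""
    n = len(values)
    se = stdev(values) / math.sqrt(n)  # standard error of the mean
    return (mean(values) - mu) / se, n - 1


def lrt_statistic(loglik_reduced, loglik_full):
    """Likelihood ratio statistic for nested models: 2 * (lnL_full - lnL_reduced)."""
    return 2.0 * (loglik_full - loglik_reduced)


def chi2_df1_pvalue(stat):
    """Upper-tail p-value of a chi-square statistic with 1 df.

    A chi-square(1) variable is Z**2 with Z standard normal, so
    P(X > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2)).
    """
    return math.erfc(math.sqrt(stat / 2.0))


if __name__ == "__main__":
    # Hypothetical participant with 15 A choices out of 20.
    print(p_a_star(["A"] * 15 + ["B"] * 5))  # 0.75
    # Chi-square statistics reported for Exp. 4 in this supplement.
    for chi2 in (5.4, 4.3, 0.0945):
        print(chi2, round(chi2_df1_pvalue(chi2), 3))
```

Applied to χ2(1) = 5.4, the helper gives p ≈ .020, consistent with the value reported for the linear versus quadratic comparison.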
As a complement to the average correlation across generations reported in the main text (Fig. 5), we also conducted an LMM analysis with random intercepts for the different transmission chains. This analysis included an interaction term (Previous generation × Generation) to test whether the strength of the effect reported in the main text differed across generations. It did not (Previous generation × Generation interaction: χ2(1) = 0.0945, p = .76). The main effect of Previous generation remained significant when controlling for Generation, χ2(1) = 4.3, p < .05.

Figure S2. Experiment 4. The boxplot shows the median (black lines) and the first and third quartiles (the box height). The whiskers indicate 1.5 interquartile ranges above and below the first and third quartiles. The red line shows the quadratic effect of Generation on P(A) described in the main text.

Parameter estimation

Parameters for the Experiment 1-4 data were estimated by maximum-likelihood optimization, which finds the set of parameters that maximizes the probability of the participant's trial-by-trial choices given the model. Optimization was done by minimizing the negative log-likelihood, −ln L, computed as:

$$-\ln L = -\sum_{t=1}^{T} \ln\big(P_{choice}(t)\big) \qquad [1]$$

where T denotes the total number of trials and P_choice(t) is the probability that the model assigns to the choice the participant actually made on trial t. Parameters were fitted independently to each subject using the BFGS optimization method. Model implementation and parameter fitting were done in R (R Development Core Team, 2012). The results of the model fitting for Exp. 1-4 are shown in Table S1. The Akaike Information Criterion was used as the index of model fit:

$$AIC = 2k - 2\ln L$$

where k is the number of fitted parameters and −ln L is the negative log-likelihood from Eq. 1. As Table S1 shows, the fit was particularly good (low AIC) in Exp. 1 and 4, which reflects the low within-subject variance in behavior in these experiments (see Fig. S1).
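Eq. 1 and the AIC can be computed directly from the per-trial choice probabilities of a fitted model. The Python sketch below assumes those probabilities are already available; the reinforcement learning model that produces them is not reproduced, and the function names are illustrative. Minimizing negative_log_likelihood over the model parameters with a BFGS routine (for example R's optim(method = "BFGS") or scipy.optimize.minimize(method="BFGS")) would correspond to the fitting step described above. The example assumes k = 3 fitted parameters (αI, Ω, αO, as in Table S1) and 20 choice trials per participant.

```python
import math


def negative_log_likelihood(p_choice):
    """Eq. 1: -ln L = -sum over trials t of ln P_choice(t).

    p_choice[t] is the probability the model assigned, on trial t, to the
    action the participant actually chose.
    """
    return -sum(math.log(p) for p in p_choice)


def aic(neg_log_lik, k):
    """AIC = 2k - 2 ln L, written in terms of the negative log-likelihood."""
    return 2 * k + 2 * neg_log_lik


if __name__ == "__main__":
    # Reference point: a model that gives every one of 20 choices probability
    # 0.5 (chance) while using k = 3 fitted parameters.
    nll = negative_log_likelihood([0.5] * 20)  # equals 20 * ln 2
    print(round(nll, 3), round(aic(nll, k=3), 3))  # 13.863 33.726
```

Under these assumptions, AIC ≈ 33.7 corresponds to a three-parameter model that does no better than chance on 20 choices, which gives a rough point of comparison for the AIC values in Table S1.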
We used the sample median as the point estimate of the empirical parameter values, because the distribution of estimates for the Ω parameter was highly skewed (see Fig. S3).

Table S1. Mean (M) and median (Mdn) of the estimated parameters from the reinforcement learning model for Exp. 1–4. AIC = Akaike Information Criterion.

Experiment   αI (M / Mdn)     Ω (M / Mdn)      αO (M / Mdn)     AIC (M / Mdn)
Exp. 1       0.567 / 0.594    0.227 / 0.034    0.424 / 0.440    13.608 / 7.366
Exp. 2       0.502 / 0.507    0.552 / 0.711    0.392 / 0.288    28.96 / 32.34
Exp. 3       0.344 / 0.277    0.455 / 0.353    0.531 / 0.521    24.378 / 28.254
Exp. 4       0.531 / 0.55     0.059 / 0.011    0.541 / 0.512    12.725 / 6.0

Figure S3. The distribution of the estimated Ω parameter in Exp. 1 and 4. The red line shows the sample median and the blue line the sample mean.

Sensitivity Analyses

We systematically varied the key parameters of the models to assess the robustness of the results. Unless otherwise noted, all simulations used the parameter values reported in the main text (αI = 0.5, αO = 0.5, Ω = 0.03, M = 20, and N = 100).

Effect of punishment probability

First, we calculated the value of P(Punishment|B) that maximized the difference in P(A) between groups with BC (αO > 0) and RPO (R = 1) and groups without these mechanisms. As seen in Figure S4, the maximum difference occurred at P(Punishment|B) ≈ 0.01, which clearly shows the importance of BC and RPO when punishment is rare.

Figure S4. The difference in P(A) between groups with RPO and BC and groups without these mechanisms (based on 10 simulation runs). The dotted vertical line indicates where the difference is maximal (P(Punishment|B) ≈ 0.009).

Effect of group size

The simulation results reported in the main text were based on N = 100. To assess the stability of avoidance traditions in smaller groups, we ran simulations with different group sizes. As seen in Fig. S5, smaller groups showed, on average, less pronounced avoidance traditions.

Figure S5. The effect of group size (N). Average P(A) for different simulated group sizes (10 simulation runs each), averaged across 10000 time steps. P(Punishment|B) = .005. Error bars indicate SEM.

Of particular relevance is the stability of behavior in groups of 10 agents, the empirical group size in Exp. 4. Figure S6 shows two typical simulation runs with (Fig. S6A) and without (Fig. S6B) punishment. In groups with punishment, the average behavior occasionally switched but was on average above random (mean P(A) ≈ .74). In groups without punishment, the average behavior oscillated between A and B.

Figure S6. Avoidance traditions with N = 10. Two typical simulation runs, with (A) and without (B) actual punishment, in groups of 10 agents. In (A), behavior B had a 0.5% risk of being punished (P(Punishment|B) = 0.005). All other parameter values were identical to the simulations reported in the main text.

Effect of M (observation trials)

Figure S7 shows the effect of varying M, the parameter that determines the number of observation trials, in the presence of rare punishment (P(Punishment|B) = 0.005). The number of individual choice trials was also M, so the average lifespan was 2M. In the empirical experiments and all other simulations, M = 20. The strength of the avoidance traditions is predicted to increase with M.

Figure S7. The effect of M (observation trials). Each value is based on 10 simulation runs, averaged across 100000 time steps. In the simulations and experiment reported in the main text, M = 20. Error bars indicate the within-run standard deviation. P(Punishment|B) = 0.005.

Effect of individual differences in BC

Figure S8 shows how P(A), across different punishment probabilities, is affected by the proportion of individuals in the group with BC (cf. Figure 1, main text).

Figure S8. Variation in BC.
The strength of avoidance traditions for five levels of individual differences in BC propensity. Each line is based on 10 simulation runs, averaged across 100000 time steps.

Effect of individual-level model parameters

We also varied the parameters of the basic RL model that control individual-level behavior. Figure S9 shows that the emergence of avoidance traditions depends on the parameter Ω, which determines the psychological impact of RPO. The occurrence of arbitrary avoidance traditions at zero probability of punishment (Figure S9, bottom) is indicated by the high standard deviation in the lower range of the Ω parameter. Because arbitrary avoidance traditions can emerge for either behavior, the mean will naturally be close to zero when averaged across multiple simulation runs (see Figure 2 in the main text for a representative run). The estimated empirical Ω values fell in the lower range of the values that reliably resulted in avoidance traditions.

Figure S10 shows the effect of varying the αI parameter, and Figure S11 shows the effect of varying the αO parameter. As is evident, neither parameter strongly affects avoidance traditions, although higher values of αI promote stronger avoidance traditions. A higher αI results in faster convergence on one action (e.g., A), which is especially important if the choices of the Demonstrator agent were evenly distributed over A and B (i.e., if the Q-values of the two actions are similar at the outset of the Choice phase). When the probability of punishment (Fig. 1, main text) and the Ω parameter (Figure S9) already promote avoidance traditions, the effect of αI is weaker.

Figure S9. Variations in Ω magnitude. (Top) With punishment (P(Punishment|B) = 0.005) and (bottom) without punishment. Each value is based on 10 simulation runs, averaged across 100000 time steps. The dotted gray lines show the within-run standard deviation. The green and blue lines show the estimated median Ω in Exp. 1 and Exp. 4, respectively.

Figure S10. Variations in αI magnitude. (Top) With punishment (P(Punishment|B) = 0.005) and (bottom) without punishment. Each value is based on 10 simulation runs, averaged across 100000 time steps. Error bars indicate the within-run standard deviation.

Figure S11. Variations in αO magnitude. (Top) With punishment (P(Punishment|B) = 0.005) and (bottom) without punishment. Each value is based on 10 simulation runs, averaged across 100000 time steps. Error bars indicate the within-run standard deviation.

Multi-demonstrator model

To confirm that our basic results generalize to a setting with more than one Demonstrator (see Discussion, main text), we analyzed a modified version of the model presented above (the multi-demonstrator model). The only difference between the original and the modified version was that in the latter, each agent randomly selected a Demonstrator from the population of other agents on every time step. Because the model is non-spatial, there were no restrictions based on proximity. As shown in Fig. S12, this model produced both adaptive and arbitrary avoidance traditions (the results were similar for lower punishment probabilities). The multi-demonstrator model produced even more stable avoidance traditions than the original model, both in the presence and in the absence of punishment, and was more sensitive to the value of the Ω parameter.

Figure S12. Multi-demonstrator model. The influence of the Ω parameter on avoidance traditions with (top) and without (bottom) rare punishment. The results are averaged over 10000 time steps and 20 simulation runs. The dotted gray lines show the within-run standard deviation.

Three-action model

To confirm that our results generalize to choices between more than two actions, we analyzed a version of the basic model that included an additional, third choice (C).
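Both extensions change how the population is simulated rather than the individual learning rules. The toy Python sketch below shows one way to organize such a simulation. N agents have staggered ages. Each agent spends M observation steps followed by M choice steps and is replaced by a naive agent after 2M steps. Observers either keep one Demonstrator (in this sketch, reassigned only when the old one dies) or draw a new one on every time step (multi_demonstrator=True), and n_actions=3 gives a three-action variant. This is a hypothetical illustration, not the authors' implementation. The softmax choice rule with inverse temperature beta, the copying rule used for BC, and coding an omitted punishment as a reward of Ω for RPO are placeholder assumptions, not the equations of the main text. All names are illustrative.

```python
import math
import random


def softmax_choice(q, beta, rng):
    """Draw an action index with probability proportional to exp(beta * Q)."""
    top = max(q)
    weights = [math.exp(beta * (v - top)) for v in q]
    return rng.choices(range(len(q)), weights=weights)[0]


def simulate(n_agents=100, M=20, n_actions=2, p_punish=0.005, alpha_i=0.5,
             alpha_o=0.5, omega=0.03, rpo=True, multi_demonstrator=False,
             steps=2000, beta=5.0, seed=0):
    """Toy overlapping-generations model (placeholder learning rules).

    Action 0 plays the role of 'A'; action 1 ('B') is punished with
    probability p_punish. Returns the share of choice-phase trials on which
    action 0 was chosen, i.e. a population-level P(A).
    """
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_agents)]
    age = [rng.randrange(2 * M) for _ in range(n_agents)]  # staggered life stages
    demo = [None] * n_agents  # fixed Demonstrator per observer (single-demonstrator case)
    chose_a = total = 0
    for _ in range(steps):
        # Choice phase (age >= M): act, possibly get punished, learn individually.
        acted = {}
        for i in range(n_agents):
            if age[i] >= M:
                a = softmax_choice(q[i], beta, rng)
                punished = a == 1 and rng.random() < p_punish
                # Placeholder RPO: an omitted punishment is coded as reward omega.
                r = -1.0 if punished else (omega if rpo else 0.0)
                q[i][a] += alpha_i * (r - q[i][a])
                acted[i] = (a, punished)
                chose_a += a == 0
                total += 1
        # Observation phase (age < M): watch one agent that is in its choice phase.
        actors = list(acted)
        for i in range(n_agents):
            if age[i] < M and actors:
                if multi_demonstrator or demo[i] not in acted:
                    demo[i] = rng.choice(actors)
                a, punished = acted[demo[i]]
                # Placeholder BC: pull the observed action's value toward +1
                # (toward -1 if the Demonstrator was punished) at rate alpha_o.
                target = -1.0 if punished else 1.0
                q[i][a] += alpha_o * (target - q[i][a])
        # Ageing: an agent that has lived 2M steps is replaced by a naive newborn.
        for i in range(n_agents):
            age[i] += 1
            if age[i] >= 2 * M:
                age[i], q[i], demo[i] = 0, [0.0] * n_actions, None
    return chose_a / total if total else float("nan")


if __name__ == "__main__":
    print(simulate(steps=1000, seed=1))
    print(simulate(steps=1000, seed=1, multi_demonstrator=True, n_actions=3))
```

Setting alpha_o = 0 switches the placeholder copying off, loosely mirroring the αO > 0 criterion for BC, and rpo=False codes omitted punishment as 0. The sketch is meant to show the population bookkeeping only and makes no claim to reproduce Figures S12-S14.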
With three choices, too, the combination of BC and RPO led to a highly reduced risk of individual punishment (Figure S13; cf. Figure 1B, main text). Figure S14 shows a representative run of the behavioral dynamics with RPO and BC. As is evident, the population avoided the dangerous B action (red). The two safe actions displayed temporary oscillations in the same manner as in the original model with zero probability of punishment (cf. Figure 2B, main text).

Figure S13. Probability of incurring punishment in the three-action model. Ω = 0.03, αI = 0.5, αO = 0.5, M = 20, N = 100.

Figure S14. Representative run of the three-action model. The red action risked being punished (P(Punishment|Red) = 0.005). Ω = 0.03, αI = 0.5, αO = 0.5, M = 20, N = 100. Generation = 2M.