Supplemental Materials
Mechanisms of Social Avoidance Learning Can Explain the Emergence of Adaptive and
Arbitrary Behavioral Traditions in Humans
by B. Lindström & A. Olsson, 2015, JEP: General
http://dx.doi.org/10.1037/xge0000071
Additional analyses of Exp. 1-4
To ascertain that our conclusions did not depend on this specific analysis strategy, we also analyzed the data using both simple parametric (one-sample t-tests) and non-parametric (Wilcoxon signed-rank test) tests. The 20 choices of each individual were aggregated into a proportion, which we refer to as P(A)* to differentiate it from the random effects regression. We tested whether the average P(A)* in each experiment differed from 0.5 (see Figure S1). Both parametric and non-parametric tests led to the same conclusions as the random effects regressions: P(A)* in Exp. 1 was above 0.5 (t(24) = 3.65, p = .001; W = 271, p = .003), P(A)* in Exp. 2 did not differ from 0.5 (t(19) = 0.92, p = .36; W = 103.4, p = .44), P(A)* in Exp. 3 was above 0.5 (t(24) = 2.64, p = .01; W = 208, p = .03), and P(A)* in Exp. 4 was above 0.5 (t(49) = 2.23, p = .03; W = 755, p = .014).
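For reference, a minimal sketch of these tests in R is given below; the vector pA_star and its values are illustrative placeholders for one experiment's individual P(A)* values, not the actual data.

```r
# Minimal sketch of the simple tests described above. The vector pA_star is an
# illustrative placeholder for one experiment's individual P(A)* values
# (each participant's 20 choices aggregated into a proportion); toy values, not real data.
pA_star <- c(0.65, 0.55, 0.70, 0.45, 0.60, 0.80, 0.50)

# One-sample t-test against chance level (0.5)
t.test(pA_star, mu = 0.5)

# Wilcoxon signed-rank test against chance level (0.5)
wilcox.test(pA_star, mu = 0.5)
```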
Figure S1. The empirical distributions of individual P(A) in Experiments 1-4.
Additional analyses of Exp. 4.
In the main text, we report a quadratic effect of Generation on P(A). This GLMM was based on a simplified random effects structure (random intercepts for each Transmission Chain and for Generation, but no random terms for individual subjects), because the more complex model including random intercepts for all subjects failed to converge. A model with only a linear term for Generation (β = 0.37, SE = 23, z = -1.62, p = .1) had an inferior fit (likelihood ratio test: χ2(1) = 5.4, p = .02) relative to a model that included a quadratic term for Generation (linear effect: β = -16.65, SE = 5.38, z = -3.1, p = .002; quadratic effect: β = 17.74, SE = 5.37, z = 3.3, p < .001). The empirical data and the best-fitting line are shown in Fig. S2.
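The model comparison could be carried out as in the sketch below, assuming the lme4 package and a long-format data frame d with a binary choice indicator (1 = A), Generation, and Chain; these object and variable names are illustrative, not the actual analysis script.

```r
# Sketch of the GLMM comparison, assuming lme4 and a long-format data frame 'd'
# with columns choice (1 = A, 0 = B), Generation, and Chain (illustrative names).
library(lme4)

m_linear    <- glmer(choice ~ poly(Generation, 1) + (1 | Chain) + (1 | Generation),
                     data = d, family = binomial)
m_quadratic <- glmer(choice ~ poly(Generation, 2) + (1 | Chain) + (1 | Generation),
                     data = d, family = binomial)

# Likelihood ratio test of the quadratic term (cf. the chi-square(1) test above)
anova(m_linear, m_quadratic)
```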
As a complement to the average correlation across generations reported in the main text (Fig. 5), we also conducted an LMM analysis with random intercepts for the different transmission chains. This analysis included an interaction term (Previous generation * Generation) to test whether the strength of the effect reported in the main text differed across generations. This was not the case (Previous generation * Generation interaction: χ2(1) = 0.0945, p = .76). The main effect of Previous generation remained significant when controlling for Generation, χ2(1) = 4.3, p < .05.
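A corresponding sketch of this LMM is given below, again with illustrative names only (pA for the current generation's P(A), prev_pA for the previous generation's P(A)).

```r
# Sketch of the complementary LMM, assuming a data frame 'd4' with one row per
# generation and chain: pA (current P(A)), prev_pA (previous generation's P(A)),
# Generation, and Chain (illustrative names). REML = FALSE for likelihood ratio tests.
library(lme4)

m_main <- lmer(pA ~ prev_pA + Generation + (1 | Chain), data = d4, REML = FALSE)
m_int  <- lmer(pA ~ prev_pA * Generation + (1 | Chain), data = d4, REML = FALSE)

# Test of the Previous generation x Generation interaction
anova(m_main, m_int)
```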
Figure S2. Experiment 4. The boxplot shows the median (black lines) and the first and third quartiles (the box height).
The whiskers indicate 1.5 interquartile ranges above and below the first and third quartiles. The red line shows the quadratic
effect of Generation on P(A) described in the main text.
Parameter estimation
Parameter estimation for the Experiment 1-4 data was conducted using maximum-likelihood optimization, which finds the set of parameters that maximizes the probability of the participant's trial-by-trial choices given the model. Optimization was done by minimizing the negative log-likelihood, -L, computed as:

-L = -\sum_{t=1}^{T} \ln\left(P_{\text{choice}}(t)\right) \qquad [1]
where T denotes the total number of trials. Parameters were fitted independently to each subject using the BFGS optimization method. Model implementations and parameter fitting were done in R (R Development Core Team, 2012). The results of the model fitting for Exp. 1-4 are shown in Table S1. The Akaike Information Criterion was used as the index of model fit: AIC = 2k - 2ln(L), where k is the number of fitted parameters and -ln(L) is the negative log-likelihood. As shown, the goodness of fit in Exp. 1 and 4 was extremely good, which reflects the low variance in within-subject behavior (see Fig. S1). We used the sample median as the point estimate of the empirical parameter values, because the distribution of estimates for the Ω parameter was heavily skewed (see Fig. S3).
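To make the fitting procedure concrete, the sketch below implements Equation 1 for a deliberately simplified two-parameter Q-learner with a softmax choice rule (learning rate alpha, inverse temperature beta). It stands in for, but is not, the authors' full model with αI, αO, and Ω, and all data shown are toy values.

```r
# Schematic illustration of the fitting procedure (Eq. 1, BFGS, AIC).
# Simplified two-parameter Q-learning model with a softmax choice rule;
# not the authors' exact model. All data below are toy values.
neg_log_lik <- function(par, choices, rewards) {
  alpha <- par[1]                             # learning rate
  beta  <- par[2]                             # softmax inverse temperature
  Q <- c(0, 0)                                # initial values of actions A and B
  nll <- 0
  for (t in seq_along(choices)) {
    p <- exp(beta * Q) / sum(exp(beta * Q))   # softmax choice probabilities
    nll <- nll - log(p[choices[t]])           # accumulate -ln P_choice(t), Eq. [1]
    pe <- rewards[t] - Q[choices[t]]          # prediction error
    Q[choices[t]] <- Q[choices[t]] + alpha * pe
  }
  nll
}

choices <- c(1, 2, 1, 1, 2, 1, 1, 1)          # toy choice sequence (1 = A, 2 = B)
rewards <- c(1, 0, 1, 1, 0, 1, 1, 1)          # toy outcomes

# BFGS minimization of the negative log-likelihood for one subject
fit <- optim(par = c(0.5, 1), fn = neg_log_lik,
             choices = choices, rewards = rewards, method = "BFGS")

# AIC = 2k - 2 ln(L) = 2k + 2 * (negative log-likelihood)
aic <- 2 * length(fit$par) + 2 * fit$value
```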
Experiment   αI            Ω             αO            AIC
Exp. 1       M = 0.567     M = 0.227     M = 0.424     M = 13.608
             Mdn = 0.594   Mdn = 0.034   Mdn = 0.440   Mdn = 7.366
Exp. 2       M = 0.502     M = 0.552     M = 0.392     M = 28.96
             Mdn = 0.507   Mdn = 0.711   Mdn = 0.288   Mdn = 32.34
Exp. 3       M = 0.344     M = 0.455     M = 0.531     M = 24.378
             Mdn = 0.277   Mdn = 0.353   Mdn = 0.521   Mdn = 28.254
Exp. 4       M = 0.531     M = 0.059     M = 0.541     M = 12.725
             Mdn = 0.55    Mdn = 0.011   Mdn = 0.512   Mdn = 6.0
Table S1. Mean (M) and median (Mdn) of the estimated parameters from the reinforcement learning model for Exp. 1-4. AIC = Akaike Information Criterion.
Figure S3. The distribution of the estimated Ω parameter in Exp. 1 and 4. The red line shows the sample median and the blue line the sample mean.
Sensitivity Analyses
We systematically varied the key parameters of the simulation models to assess the robustness of the results. Unless otherwise noted, the parameter values for all simulations were the same as those reported in the main text (αI = 0.5, αO = 0.5, Ω = 0.03, M = 20, and N = 100).
Effect of punishment probability
First, we calculated the value of P(Punishment|B) that maximized the difference in P(A) between groups with BC (αO > 0) and RPO (R = 1) and groups without these mechanisms. As seen in Figure S4, the maximum difference occurred at P(Punishment|B) ≈ 0.01, which unequivocally shows the importance of BC and RPO in the presence of rare punishment.
Figure S4. The maximum difference in P(A) between groups with RPO and BC and groups without these mechanisms (based on 10 simulation runs). The dotted vertical line indicates the maximum difference (~0.009).
Effect of group size
The simulation results reported in the main text were based on N = 100. To assess the stability of avoidance traditions in smaller groups, we ran simulations with different group sizes. As seen in Fig. S5, smaller groups showed, on average, less pronounced avoidance traditions.
Figure S5. The effect of group size (N). Average P(A) for different simulated group sizes (10 simulation runs each)
averaged across 10000 time steps. P(Punishment |B) = .005. Error bars indicate SEM.
Of specific relevance is how stable behavior is in groups of 10 agents, the empirical group size in Exp. 4. Figure S6 shows two typical simulation runs, with (Fig. S6A) and without (Fig. S6B) punishment. In groups with punishment, the average behavior occasionally switched but was on average above chance (mean P(A) ≈ .74). In groups without punishment, the average behavior oscillated between A and B.
Figure S6. Avoidance traditions with N = 10. Two typical simulation runs, with (A) and without (B) actual punishment in groups of 10 agents. In (A), behavior B had a 0.5% risk of being punished (P(Punishment|B) = 0.005). All other parameter values were identical to the simulations reported in the main text.
Effect of M (observation trials)
Figure S7 shows the effect of varying M, the parameter that determines the number of observation
trials, in the presence of rare punishment (P(Punishment |B) = 0.005). The number of individual choice
trials was also M, and average life was thus 2M. In the empirical experiments and all other simulations,
M = 20. The strength of the avoidance traditions is predicted to increase with M.
.
Figure S7. The effect of M (observation trials). Each value is based on 10 simulation runs, averaged across 100000 time steps. In the simulations and experiment reported in the main text, M = 20. Error bars indicate the within-run standard deviation. P(Punishment|B) = 0.005.
Effect of individual differences in BC
Figure S8 shows how P(A), across different punishment probabilities, is affected by the proportion of individuals in the group with BC (cf. Figure 1, main text).
Figure S8. Variation in BC. The strength of avoidance traditions for five levels of individual differences in BC
propensity. Each line is based on 10 simulation runs, averaged across 100000 time steps.
Effect of individual-level model parameters
We also varied the parameters of the basic RL model that controlled the individual-level behavior. Figure S9 shows that the emergence of avoidance traditions depends on the parameter Ω, which determines the psychological impact of RPO. The occurrence of arbitrary avoidance traditions (Figure S9, bottom) at zero probability of punishment is indicated by the high standard deviation at the lower range of the Ω parameter (because arbitrary avoidance traditions can emerge for either of the two behaviors, the mean will naturally be zero when averaged across multiple simulation runs; see Figure 2 in the main text for an illustration of a representative run). The estimated empirical Ω values fell in the lower range of values that reliably resulted in avoidance traditions.
Figure S10 shows the effect of varying the αI parameter, and Figure S11 shows the effect of varying the αO parameter. As is evident, neither of these two parameters strongly affects avoidance traditions, although higher values of the αI parameter promote stronger avoidance traditions. A higher αI results in faster convergence on one action (e.g., A), which is especially important if the choices of the Demonstrator agent were evenly distributed over both A and B (i.e., if the Q-values of the actions are comparable at the outset of the Choice phase). If the probability of punishment (Fig. 1, main text) and the Ω parameter (Figure S9) promote avoidance traditions, the effect of αI is weaker.
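As a purely illustrative sketch (not the authors' exact update equations), the snippet below shows one way the three individual-level parameters could enter Rescorla-Wagner-style value updates: αI scales learning from the agent's own outcomes, αO scales learning from observed outcomes, and Ω is assumed here to set the positive value of observing that a behavior goes unpunished (RPO).

```r
# Illustrative only: one possible way the individual-level parameters could
# enter simple value updates; not the authors' exact model equations.

# Observational update: punishment of the Demonstrator is aversive (-1);
# its omission is treated here as a small positive outcome scaled by Omega (RPO).
update_observational <- function(Q, a, punished, alpha_O, Omega) {
  r <- if (punished) -1 else Omega
  Q[a] <- Q[a] + alpha_O * (r - Q[a])
  Q
}

# Individual update: standard prediction-error learning from the agent's own
# outcome r, with learning rate alpha_I.
update_individual <- function(Q, a, r, alpha_I) {
  Q[a] <- Q[a] + alpha_I * (r - Q[a])
  Q
}

# Example: observing the Demonstrator choose action 2 without being punished
Q <- c(0, 0)
Q <- update_observational(Q, a = 2, punished = FALSE, alpha_O = 0.5, Omega = 0.03)
```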
Figure S9. Variations in Ω magnitude. (Top) With punishment (P(Punishment|B) = 0.005), and (bottom) without punishment. Each value is based on 10 simulation runs, averaged across 100000 time steps. The dotted gray lines show the within-run standard deviation. The green and blue lines show the estimated median Ω in Exp. 1 and Exp. 4, respectively.
Figure S10. Variations in αI magnitude. (Top) With punishment (P(Punishment|B) = 0.005), and (bottom) without punishment. Each value is based on 10 simulation runs, averaged across 100000 time steps. Error bars indicate the within-run standard deviation.
Figure S11. Variations in αO magnitude. (Top) With punishment (P(Punishment|B) = 0.005), and (bottom) without punishment. Each value is based on 10 simulation runs, averaged across 100000 time steps. Error bars indicate the within-run standard deviation.
Multi-demonstrator model
To confirm that our basic results generalize to a setting with more than one Demonstrator (see Discussion, main text), we analyzed a modified version of the model presented above (the multi-demonstrator model). The only difference between the original model (see above) and the modified version was that in the latter, each agent randomly selected a Demonstrator from the population of other agents on every time step. Because the model is non-spatial, there were no restrictions based on proximity. As shown in Fig. S12, this model created both adaptive and arbitrary avoidance traditions (the results were similar for lower punishment probabilities). The multi-demonstrator model created even more stable avoidance traditions than the original model, both in the presence and absence of punishment, and was more sensitive to the value of the Ω parameter.
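The modification itself amounts to one extra sampling step; a minimal sketch (assuming agents are simply indexed 1 to N) is shown below.

```r
# Minimal sketch of the multi-demonstrator modification (illustrative):
# on each time step, agent i observes one randomly chosen other agent.
N <- 100
pick_demonstrator <- function(i, N) sample(setdiff(seq_len(N), i), 1)

pick_demonstrator(3, N)   # returns a random index in 1..N other than 3
```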
Figure S12. Multi-demonstrator model. The influence of the Ω parameter on avoidance traditions with (top) and
without (bottom) rare punishment. The results are averaged over 10000 time-steps and 20 simulation runs. The dotted
gray lines show the within-run standard deviation.
Three-action model
To confirm that our results generalize to choices between more than two actions, we analyzed a version of the basic model that included an additional, third choice (C). Also with three choices, the combination of BC and RPO led to a greatly reduced risk of individual punishment (Figure S13; cf. Figure 1B, main text). Figure S14 shows a representative run of the behavioral dynamics with RPO and BC. As is evident, the population avoided the dangerous B action (red color). The two safe actions displayed temporary oscillations in the same manner as the original model with zero probability of punishment (cf. Figure 2B, main text).
Figure S13. Probability of incurring punishment in the three-action model. Ω = 0.03, αI = 0.5, αO = 0.5, M = 20, N = 100.
Figure S14. Representative run of the three-action model. The red action risked being punished
(P(Punishment |Red) = 0.005). Ω = 0.03, αI = 0.5, αO = 0.5, M = 20, N = 100. Generation = 2M.