Electronic supplementary material A: Lexicon-syntax coevolution model

A1 Language and individuals

The model encodes language as meaning-utterance mappings (M-U mappings). On the meaning side, individuals share a semantic space containing a fixed number of integrated meanings, each of which has a predicate-argument structure, e.g., "predicate<agent>" or "predicate<agent, patient>", where predicate, agent, and patient are thematic notations. Predicates refer to actions that individuals conceptualize (e.g., "run" or "chase"), and arguments to entities on or by which those actions are performed (e.g., "fox" or "tiger"). Some predicates take a single argument, e.g., "run<tiger>" ("a tiger is running"); others take two, e.g., "chase<tiger, fox>" ("a tiger is chasing a fox"), where the first constituent in < >, "tiger", denotes the agent (the action instigator) of the predicate "chase", and the second, "fox", the patient (the entity that undergoes the action). Meanings with identical agent and patient constituents (e.g., "fight<fox, fox>") are excluded. On the utterance side, integrated meanings are encoded by utterances, each comprising a string of syllables chosen from a signaling space. An utterance encoding an integrated meaning can be segmented into subparts, each mapping onto one or two semantic constituents; conversely, subparts can combine to encode an integrated meaning. Using predicate-argument structures to denote semantics and combinable syllables to form utterances has been adopted in many structured simulations [1].

Individuals are simulated as artificial agents. By means of learning mechanisms (see A3), they can: a) acquire linguistic knowledge (see A2) from M-U mappings obtained in previous communications; and b) produce utterances to encode integrated meanings and comprehend utterances during communications (see A4). In addition, during reproduction, some agents (parents) create new agents (offspring) who have no linguistic knowledge but inherit the learning mechanisms and language-related abilities (e.g., joint attention) of their parents. After learning from their parents or other agents during inter-generational communications, offspring replace their parents and start to communicate with other agents during intra-generational communications.

A2 Linguistic knowledge

Linguistic knowledge is encoded by a lexicon, syntax, and syntactic categories. An individual's lexicon consists of a number of lexical rules (see figure A1). Some of these rules are holistic, each mapping an integrated meaning onto a whole utterance (sentence); e.g., "run<tiger>" ↔ /abcd/ indicates that the meaning "run<tiger>" can be encoded by the utterance /abcd/, and also that /abcd/ can be decoded as "run<tiger>". Others are compositional, each mapping particular semantic constituent(s) onto a subpart of an utterance, e.g., "fox" ↔ /ef/ (a word rule) or "chase<wolf, #>" ↔ /gh*i/ (a phrase rule, where "#" denotes an unspecified constituent and "*" unspecified syllable(s)).

In order to form a sentence using compositional rules, the utterances of these rules need to be ordered. A syntactic rule (see figure A1) specifies an order between two lexical items; e.g., "tiger" ↔ /ef/ << "fox" ↔ /abc/ denotes that the utterance /ef/ encoding the constituent "tiger" lies before, but not necessarily immediately before, /abc/ encoding "fox". Based on one such local order, "predicate<agent>" meanings can be expressed; based on a combination of two or three local orders, "predicate<agent, patient>" meanings can be expressed.
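To make these representations concrete, the following is a minimal sketch in Python; the class and field names are illustrative choices rather than the model's own, and the strength field anticipates the rule strengths introduced later in this section.

```python
# A minimal sketch of the rule representations described above.
# Names are illustrative, not taken from the original model.
from dataclasses import dataclass

@dataclass
class LexicalRule:
    meaning: tuple    # semantic constituents; "#" marks an unspecified constituent
    utterance: str    # syllable string; "*" marks unspecified syllable(s)
    strength: float = 0.5   # rule strength, introduced below; new rules start at 0.5

@dataclass
class SyntacticRule:
    first: LexicalRule    # this rule's utterance lies before ("<<") ...
    second: LexicalRule   # ... this one's, not necessarily immediately before
    strength: float = 0.5

holistic = LexicalRule(("run", "tiger"), "abcd")       # "run<tiger>" <-> /abcd/
word = LexicalRule(("fox",), "ef")                     # "fox" <-> /ef/ (word rule)
phrase = LexicalRule(("chase", "wolf", "#"), "gh*i")   # "chase<wolf, #>" <-> /gh*i/
order = SyntacticRule(LexicalRule(("tiger",), "ef"), LexicalRule(("fox",), "abc"))
```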
Syntactic categories are formed so that syntactic rules acquired from some lexical items can be applied productively to other lexical items with the same thematic notation. A syntactic category (see figure A1) comprises a set of lexical rules and a set of syntactic rules that specify the orders in utterances between these lexical rules and those from other categories. For the sake of simplicity, we simulate a nominative-accusative language and exclude the passive voice. A category that associates lexical rules encoding the thematic notation of agent can be denoted as a subject (S) category, since that thematic notation usually corresponds to the syntactic role of S. Similarly, patient corresponds to object (O), and predicate to verb (V). A local order between two categories can be denoted by their syntactic roles; e.g., an order before between an S and a V category can be denoted by S<<V, or simply SV.

Lexical and syntactic knowledge collectively encode integrated meanings. As shown in figure A1, to express "fight<wolf, fox>" using the lexical rules respectively from the S, V, and O categories and the syntactic rules SV and SO, the resulting sentence can be /bcea/ or /bcae/, following SVO or SOV.

Every lexical or syntactic rule has a strength, indicating the probability of successfully using its M-U mapping or local order. A lexical rule also has an association weight to each category that contains it, indicating the probability of successfully applying the syntactic rules of that category to the utterance of the lexical rule. Strengths and association weights lie in [0.0, 1.0]. A newly acquired rule has strength 0.5, the same as the weight of a new association between a lexical rule and a category. These numerical parameters realize a strength-based competition during communications (see A4) and a gradual forgetting of linguistic knowledge. Forgetting occurs regularly after a number (scaled to the population size) of communications. During forgetting, all individuals deduct a fixed amount from each of their rules' strengths and association weights; after that, lexical or syntactic rules with negative strengths are discarded, lexical rules with negative association weights to some categories are removed from those categories, and categories having no lexical rules are also discarded, together with their syntactic rules.

Figure A1. Examples of lexical rules, syntactic rules, and syntactic categories. "#" denotes an unspecified semantic item, and "*" unspecified syllable(s). S, V, and O are syntactic roles of categories. Numbers enclosed by ( ) denote rule strengths, and those by [ ] association weights. "<<" denotes the local order before, and ">>" after. Compositional rules can combine if they specify each constituent in an integrated meaning exactly once; e.g., rules (c) and (d) can combine to form "chase<wolf, bear>", and the corresponding utterance is /ehfg/.

A3 Acquisition mechanisms

Individuals use general learning mechanisms to acquire linguistic knowledge. Lexical rules are acquired by detecting recurrent patterns (meanings and syllables appearing recurrently in at least two M-U mappings). Each individual has a buffer storing M-U mappings obtained from previous communications. New mappings, before being inserted into the buffer, are compared with those already in the buffer. As shown in figure A2a, by comparing "hop<fox>" ↔ /ab/ with "run<fox>" ↔ /acd/, an individual can detect the recurrent pattern "fox" and /a/; if no lexical rule in the individual's rule list records such a mapping, a lexical rule "fox" ↔ /a/ with initial strength 0.5 is acquired (a sketch of this detection step follows figure A2).

Figure A2. Examples of the acquisition of lexical rules (a) and of syntactic categories and syntactic rules (b). M-U mappings are itemized by Arabic numerals, and lexical rules by Roman numerals.
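The detection step in figure A2a can be sketched as follows, assuming meanings are tuples of constituents and utterances are strings of single-character syllables; the naive longest-common-substring search stands in for the model's actual alignment procedure, which is not specified here.

```python
# A simplified sketch of recurrent-pattern detection between two buffered
# M-U mappings; hypothetical helper, not the model's exact procedure.
def detect_recurrent_pattern(mapping_a, mapping_b):
    """Return (shared constituents, shared syllables), or None if either is empty."""
    (meaning_a, utt_a), (meaning_b, utt_b) = mapping_a, mapping_b
    shared_meaning = set(meaning_a) & set(meaning_b)
    # Naive longest common substring over the two syllable strings.
    shared_syllables = ""
    for i in range(len(utt_a)):
        for j in range(i + 1, len(utt_a) + 1):
            if utt_a[i:j] in utt_b and j - i > len(shared_syllables):
                shared_syllables = utt_a[i:j]
    if shared_meaning and shared_syllables:
        return shared_meaning, shared_syllables
    return None

# Comparing "hop<fox>" <-> /ab/ with "run<fox>" <-> /acd/ yields ({'fox'}, 'a'),
# from which the lexical rule "fox" <-> /a/ (initial strength 0.5) is created.
print(detect_recurrent_pattern((("hop", "fox"), "ab"), (("run", "fox"), "acd")))
```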
Syntactic categories and syntactic rules are acquired according to the thematic notations of lexical rules and the local order relations of their utterances in M-U mappings. As shown in figure A2b and evident in M-U mappings (1) and (2), the syllables /d/ of rule (i) and /ac/ of rule (iii) precede /m/ of rule (ii). Since both "wolf" and "fox" are agents in these meanings, rules (i) and (iii) are associated into an S category (Category 1), and the order before between these two rules and rule (ii) is acquired as a syntactic rule. Similarly, in M-U mappings (1) and (3), /m/ of rule (ii) and /b/ of rule (iv) follow /d/ of rule (i), thus leading to a V category (Category 2) associating rules (ii) and (iv), and to a syntactic rule after. Now, since Categories 1 and 2 respectively associate rules (i) and (iii), and rules (ii) and (iv), the two syntactic rules are updated as "Category 1 (S) << Category 2 (V)", indicating that the syllables of lexical rules in the S category should precede those of lexical rules in the V category. Such item-based learning mechanisms have been traced in empirical studies [2], and the categorization process resembles the verb-island hypothesis [3].

A4 Communication

A communication involves two individuals (a speaker and a listener), who perform a number of utterance exchanges, each proceeding as follows (see figure A3).

Figure A3. Diagram of an utterance exchange. The dotted block indicates unreliable cues.

In production, the speaker (hereafter "she") randomly selects an integrated meaning from the semantic space to produce. She activates: a) lexical rules encoding some (compositional rules) or all (holistic rules) constituents in this meaning; b) categories associating those lexical rules and having appropriate syntactic roles; and c) syntactic rules in those categories that can regulate those lexical rules in sentences. These rules form candidate sets for production. She calculates the combined strength (CS_production) of each set (see equation (A.1), where Avg denotes averaging, aso association weights, and str rule strengths):

    CS_production = Avg(str(LexRule(s))) + Avg(aso(Cats) × str(SynRule(s)))    (A.1)

CS_production is the sum of the lexical contribution (the average strength of the lexical rules in the set) and the syntactic contribution (the average product of (i) the strengths of the syntactic rules regulating the lexical rules and (ii) the association weights of those lexical rules to the categories in the set). As shown in figure A1, the three categories, the three lexical rules encoding "wolf", "fight<#, #>", and "fox", and the two syntactic rules SV and SO form a candidate set to encode "fight<wolf, fox>". Its CS_production is 0.98, where the lexical contribution is 0.6 ((0.7 + 0.6 + 0.5)/3) and the syntactic contribution is 0.38 ((0.8 × (0.7 + 0.6)/2 + 0.4 × (0.7 + 0.5)/2)/2).
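This worked example can be verified with a short calculation. The helper below is an illustrative reconstruction of equation (A.1), not the model's code; the strengths and association weights are those given in figure A1.

```python
# Illustrative reconstruction of equation (A.1); numbers follow figure A1.
def cs_production(lex_strengths, syn_rules):
    # syn_rules: list of (syntactic rule strength, association weights of the
    # lexical rules it regulates, to their respective categories)
    lexical = sum(lex_strengths) / len(lex_strengths)
    syntactic = sum(s * sum(w) / len(w) for s, w in syn_rules) / len(syn_rules)
    return lexical + syntactic

# Lexical rules for "wolf" (S), "fight<#, #>" (V), and "fox" (O) with strengths
# 0.7, 0.6, 0.5; syntactic rules SV (strength 0.8) and SO (strength 0.4).
score = cs_production([0.7, 0.6, 0.5], [(0.8, [0.7, 0.6]), (0.4, [0.7, 0.5])])
print(round(score, 2))  # 0.98 = lexical 0.6 + syntactic 0.38
```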
After calculation, she selects the set of winning rules with the highest CS_production, builds the sentence accordingly, and transmits it to the listener. If she lacks sufficient rules to encode the integrated meaning, she occasionally (based on a random creation rate) creates a holistic rule to encode the whole meaning.

In comprehension, the listener (hereafter "he") receives the sentence from the speaker and an environmental cue (whose meaning is set according to RC). Based on his linguistic knowledge, he activates: (i) lexical rules whose syllables fully or partially match the heard sentence; (ii) categories associating these lexical rules; and (iii) syntactic rules in those categories whose orders match those of the lexical rules in the heard sentence. Then he compares the cue's meaning with the meaning comprehended using the activated rules. According to the three conditions discussed in the main text, he forms candidate sets for comprehension and calculates the combined strength of each set (see equation (A.2)). For a set without a cue, the calculation is the same as for CS_production; for a set with a cue, the cue strength is added; and for a set with only a cue, the cue strength alone constitutes CS_comprehension:

    CS_comprehension = Avg(str(LexRule(s))) + Avg(aso(Cats) × str(SynRule(s))) + str(Cue)    (A.2)

After calculation, the listener selects the set of winning rules with the highest CS_comprehension to interpret the heard sentence. If the CS_comprehension of the winning rules exceeds a confidence threshold, he adds the perceived M-U mapping to his buffer and transmits positive feedback to the speaker; both individuals then reward their winning rules by adding a fixed amount to their strengths and association weights, and penalize other competing ones by deducting the same amount from their strengths and association weights. Otherwise, he discards the perceived mapping and sends negative feedback to the speaker; both individuals then penalize their winning rules. Such a linear inhibition mechanism has been used in many models, though its details may differ slightly (e.g., [4]). For activated rules having initial strength and association weight values, the linguistic (lexical and syntactic) contribution is 0.75 (0.5 + 0.5 × 0.5). In order to treat linguistic and non-linguistic information equally, we set both the cue strength and the confidence threshold to 0.75. Together with forgetting, this strength-based competition leads to the conventionalization of linguistic knowledge among individuals (a sketch of the listener's scoring and the reward-penalty step is given at the end of this section).

Equations (A.1) and (A.2) illustrate how non-linguistic information aids linguistic comprehension, by clarifying unspecified constituent(s) and by reinforcing rules that help achieve similar interpretations. Recent neuroscience studies (e.g., [5]) have shown that certain brain regions exert domain-general cognitive control over categorization, stimulus-response mapping, and voluntary attention shifts among perceptual inputs and memory representations. This provides a neural basis for the coordination of various types of information.

Throughout the utterance exchange, there is no check of whether the speaker's encoded meaning matches the listener's decoded one. This makes the model suitable for studying whether unreliable cues can trigger the development of linguistic knowledge and whether a transition from relying on cues to relying on linguistic rules could take place.
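The following sketch illustrates equation (A.2) under the three conditions above, together with the reward-penalty step; the function and constant names are illustrative, and capping strengths at 1.0 is our assumption, since the text specifies the [0.0, 1.0] bounds but not the exact boundary handling during competition.

```python
# Illustrative sketch of equation (A.2) and the linear inhibition update.
CUE_STRENGTH = 0.75          # equals the confidence threshold, as set in the text
CONFIDENCE_THRESHOLD = 0.75
ADJUSTMENT = 0.1             # fixed amount for rewarding/penalizing

def cs_comprehension(linguistic=None, cue_supported=False):
    """linguistic: the set's lexical + syntactic contribution per (A.1), or None
    for a cue-only set; cue_supported: whether the cue matches the set's meaning."""
    if linguistic is None:                # set with only a cue
        return CUE_STRENGTH
    return linguistic + (CUE_STRENGTH if cue_supported else 0.0)

def give_feedback(winning_score, winning_rules, competing_rules):
    """Return True (positive feedback) iff the winning set exceeds the threshold.
    Works with any rule object exposing a mutable `strength` (e.g., the
    LexicalRule sketch above); association weights are updated analogously."""
    success = winning_score > CONFIDENCE_THRESHOLD
    if success:
        for rule in winning_rules:        # reward the winning rules ...
            rule.strength = min(1.0, rule.strength + ADJUSTMENT)
        for rule in competing_rules:      # ... and inhibit their competitors
            rule.strength -= ADJUSTMENT
    else:
        for rule in winning_rules:        # penalize the winners on failure
            rule.strength -= ADJUSTMENT
    return success
```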
Table A1. Parameter settings for the semantic and signaling spaces and for the acquisition and communication mechanisms. The effects of these parameters on language evolution are discussed in [6].

Parameter                                                                   Value
Size of semantic space                                                      64
Size of signaling space                                                     30
Size of individual buffer                                                   40
Size of rule list                                                           60
Number of utterance exchanges per communication/transmission                20
Random creation rate of holistic rules                                      0.25
Amount of adjustment on strengths and association weights in competition    0.1
Amount of deduction on strengths and association weights in forgetting      0.01
Size of population                                                          10
Frequency of forgetting (scaled to population size)                         10

Table A1 lists the values of the parameters defining the semantic and signaling spaces and the acquisition and communication mechanisms. In each simulation, individuals in the first generation share 8 holistic rules with strength 1.0. This knowledge can help exchange 8 out of the 64 integrated meanings in the semantic space. Apart from these holistic rules, individuals have no compositional rules, syntactic rules, or categories, all of which have to be acquired via their learning mechanisms during communications. This initial condition can be summarized as follows.
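A minimal sketch of this configuration and initial condition, assuming Python; the dictionary keys and the placeholder meanings and utterances are illustrative, not the model's actual initial rules.

```python
# Table A1 parameters and the first-generation setup as a runnable sketch;
# key names and placeholder meanings/utterances are illustrative.
PARAMS = {
    "semantic_space_size": 64,
    "signaling_space_size": 30,
    "buffer_size": 40,
    "rule_list_size": 60,
    "exchanges_per_communication": 20,
    "holistic_creation_rate": 0.25,
    "competition_adjustment": 0.1,
    "forgetting_deduction": 0.01,
    "population_size": 10,
    "forgetting_frequency": 10,  # scaled to the population size
}

def first_generation():
    """Every first-generation agent shares the same 8 holistic rules (strength
    1.0), covering 8 of the 64 integrated meanings; all compositional rules,
    syntactic rules, and categories must be acquired during communications."""
    shared = [{"meaning": f"meaning_{i}", "utterance": f"utterance_{i}",
               "strength": 1.0} for i in range(8)]
    return [{"rules": [dict(r) for r in shared], "buffer": []}
            for _ in range(PARAMS["population_size"])]
```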
At the early stage of language origin, since individuals fail to express many integrated meanings, they randomly create many holistic expressions to encode those meanings. These numerous linguistic instances cause recurrent patterns to appear more frequently. Then, based on the learning mechanisms above, individuals start extracting some of these patterns as compositional rules, and the competition between compositional and holistic rules begins, both inside and among idiolects. A holistic rule can express only one integrated meaning, whereas a compositional rule, through combination, can potentially express many integrated meanings, namely all those involving the constituent(s) it encodes. Compositional rules are therefore referred to more frequently than holistic ones in communications and gradually win the competition against holistic rules. As more compositional rules become shared among individuals, a communal compositional language with a high level of mutual understandability emerges.

In summary, this model presumes that: a) individuals can conceptualize semantic constituents and integrated meanings having simple predicate-argument structures; b) they can use general learning mechanisms to handle lexical and syntactic rules and compositional relationships; and c) they can optimize their linguistic knowledge during communications. Compared with the iterated learning (e.g., [7]) and language game models (e.g., [8]), this model traces a simultaneous acquisition and close interaction of lexical and syntactic knowledge during language processing; compared with the artificial neural network (e.g., [9]) and gene-culture/language coevolution models (e.g., [10]), it implements concrete, instance-based learning mechanisms and the major behaviors of linguistic communication.

References

1 Wagner, K., Reggia, J. A., Uriagereka, J., & Wilkinson, G. S. (2003). Progress in the simulation of emergent communication and language. Adaptive Behavior, 11(1), 37–69.
2 Mellow, J. D. (2008). The emergence of complex syntax: A longitudinal case study of the ESL development of dependency resolution. Lingua, 118(4), 499–521.
3 Tomasello, M. (2003). Constructing a language: A usage-based theory of language acquisition. Cambridge, MA: Harvard University Press.
4 Steels, L. (2005). The emergence and evolution of linguistic structure: From lexical to grammatical communication systems. Connection Science, 17(3–4), 213–230.
5 Esterman, M., Chiu, Y.-C., Tamber-Rosenau, B. J., & Yantis, S. (2009). Decoding cognitive control in human parietal cortex. Proceedings of the National Academy of Sciences of the United States of America, 106, 17974–17979.
6 Gong, T. (2009). Computational simulation in evolutionary linguistics: A study on language emergence. Taipei: Institute of Linguistics, Academia Sinica.
7 Kirby, S., Christiansen, M. H., & Chater, N. (2009). Syntax as an adaptation to the learner. In D. Bickerton & E. Szathmáry (Eds.), Biological foundations and origin of syntax (pp. 325–344). Cambridge, MA: MIT Press.
8 Puglisi, A., Baronchelli, A., & Loreto, V. (2008). Cultural route to the emergence of linguistic categories. Proceedings of the National Academy of Sciences of the United States of America, 105, 7936–7940.
9 Christiansen, M. H., & Ellefson, M. R. (2002). Linguistic adaptation without linguistic constraints: The role of sequential learning in language evolution. In A. Wray (Ed.), The transition to language (pp. 335–358). Oxford: Oxford University Press.
10 Nowak, M. A., Komarova, N. L., & Niyogi, P. (2002). Computational and evolutionary aspects of language. Nature, 417, 611–617.

Electronic supplementary material B: Simulation results in 2000 and 5000 generations

As for UR, the ANCOVA shows that natural selection has a significant main effect on UR (under 2000 generations: F1,8 = 96.131, p < 0.001, ηp² = 0.923; under 5000 generations: F1,8 = 521.499, p < 0.001, ηp² = 0.985), but cultural selection does not (under 2000 generations: F1,8 = 3.619, p = 0.095, ηp² = 0.311; under 5000 generations: F1,8 = 1.282, p = 0.290, ηp² = 0.138), and there is no significant interaction between the two (under 2000 generations: F1,8 = 2.713, p = 0.138, ηp² = 0.253; under 5000 generations: F1,8 = 1.583, p = 0.244, ηp² = 0.165). In addition, the RC condition has a significant main effect on UR (under 2000 generations: F8,7.136 = 5.469, p < 0.018, ηp² = 0.860; under 5000 generations: F8,4.925 = 6.384, p < 0.029, ηp² = 0.912) and interacts significantly with natural selection (under 2000 generations: F8,8 = 4.873, p < 0.019, ηp² = 0.830; under 5000 generations: F8,8 = 1.696, p < 0.036, ηp² = 0.629).
As for RC, a similar ANCOVA shows that natural selection (under 2000 generations: F1,8 = 30.011, p < 0.001, ηp² = 0.790; under 5000 generations: F1,8 = 71.470, p < 0.0005, ηp² = 0.899) and the RC condition (under 2000 generations: F8,7.090 = 19.382, p < 0.0005, ηp² = 0.956; under 5000 generations: F8,6.999 = 10.840, p < 0.003, ηp² = 0.925) have significant main effects on RC, but cultural selection does not (under 2000 generations: F1,8 = 1.978, p = 0.197, ηp² = 0.198; under 5000 generations: F1,8 = 3.500, p = 0.098, ηp² = 0.304); there is a significant interaction between natural selection and the RC condition (under 2000 generations: F8,8 = 4.021, p < 0.033, ηp² = 0.801; under 5000 generations: F8,8 = 0.854, p < 0.020, ηp² = 0.828), but not between the two types of selection (under 2000 generations: F1,8 = 2.980, p = 0.123, ηp² = 0.271; under 5000 generations: F1,8 = 1.460, p = 0.261, ηp² = 0.154). Figure B1 shows these results; figure B2 shows the increasing and maintaining effects of natural selection on RC based on the results in certain RC conditions; and figures B3 and B4 show the results in the other RC conditions.

Figure B1. Marginal mean UR as a function of natural (Nat) and cultural (Cul) selection under 2000 (a) and 5000 (e) generations, and as a function of Nat and the RC condition under 2000 (b) and 5000 (f) generations; marginal mean RC as a function of Nat and Cul under 2000 (c) and 5000 (g) generations, and as a function of Nat and the RC condition under 2000 (d) and 5000 (h) generations. Error bars denote standard errors.

Figure B2. Mean RC along generations in the RC conditions 0.2 (a) and 0.8 (b) under 2000 generations, and in the RC conditions 0.3 (c) and 0.9 (d) under 5000 generations.

Figure B3. RC evolution in the RC conditions 0.1 (a), 0.3–0.7 (b–f), and 0.9 (g) under 2000 generations.

Figure B4. RC evolution in the RC conditions 0.1 (a), 0.2 (b), and 0.4–0.8 (c–g) under 5000 generations.

Electronic supplementary material C: Evolution of RC in other RC conditions under 1000 generations

Figure C1. RC evolution in the RC conditions 0.1–0.3 (a–c), 0.5 (d), 0.6 (e), 0.8 (f), and 0.9 (g) under 1000 generations.