Electronic supplementary material A: Lexicon-syntax coevolution model

A1 Language and individuals
The model encodes language as meaning-utterance mappings (M-U mappings). On the meaning side, individuals share a semantic space containing a fixed number of integrated meanings, each of which has a predicate-argument structure, e.g., “predicate<agent>” or “predicate<agent, patient>”, where predicate, agent, and patient are thematic notations. Predicates refer to actions that individuals conceptualize (e.g., “run” or “chase”), and arguments to the entities on or by which those actions are performed (e.g., “fox” or “tiger”). Some predicates take a single argument, e.g., “run<tiger>” (“a tiger is running”); others take two, e.g., “chase<tiger, fox>” (“a tiger is chasing a fox”), where the first constituent in < >, “tiger”, denotes the agent (the action instigator) of the predicate “chase”, and the second, “fox”, the patient (the entity that undergoes the action). Meanings with identical agent and patient constituents (e.g., “fight<fox, fox>”) are excluded. On the utterance side, integrated meanings are encoded by utterances, each comprising a string of syllables chosen from a signaling space. An utterance encoding an integrated meaning can be segmented into subparts, each mapping one or two semantic constituents; conversely, subparts can combine to encode an integrated meaning. Using predicate-argument structures to denote semantics and combinable syllables to form utterances has been adopted in many structured simulations [1].
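
To make the meaning side concrete, the following is a minimal sketch of how such a semantic space could be represented; the names (Meaning, build_semantic_space) are our own illustration, not the authors' implementation.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional

@dataclass(frozen=True)
class Meaning:
    """An integrated meaning with a predicate-argument structure."""
    predicate: str                  # action, e.g., "run" or "chase"
    agent: str                      # action instigator, e.g., "tiger"
    patient: Optional[str] = None   # None for one-argument predicates

    def __str__(self):
        args = self.agent if self.patient is None else f"{self.agent}, {self.patient}"
        return f"{self.predicate}<{args}>"

def build_semantic_space(one_arg_preds, two_arg_preds, entities):
    """Enumerate integrated meanings; identical agent/patient pairs
    (e.g., "fight<fox, fox>") are excluded, as in the model."""
    space = [Meaning(p, a) for p in one_arg_preds for a in entities]
    space += [Meaning(p, a, b) for p in two_arg_preds
              for a, b in product(entities, repeat=2) if a != b]
    return space

# str(Meaning("chase", "tiger", "fox")) == "chase<tiger, fox>"
```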
Individuals are simulated as artificial agents. By means of learning mechanisms (see A3), they can: a) acquire linguistic knowledge (see A2) from M-U mappings obtained in previous communications; and b) produce utterances to encode integrated meanings and comprehend utterances during communications (see A4). In addition, during reproduction, some agents (parents) can create new agents (offspring) who have no linguistic knowledge but inherit the learning mechanisms and language-related abilities (e.g., joint attention) of their parents. After learning from their parents or other agents during inter-generational communications, offspring replace their parents and start to communicate with other agents during intra-generational communications.

A2 Linguistic knowledge
Linguistic knowledge is encoded by lexicon, syntax, and syntactic categories. An individual’s lexicon consists of a number of lexical rules (see figure A1). Some of these rules are holistic, each mapping an integrated meaning onto an utterance (sentence): e.g., “run<tiger>” ↔ /abcd/ indicates that the meaning “run<tiger>” can be encoded by the utterance /abcd/, and also that /abcd/ can be decoded as “run<tiger>”. Others are compositional, each mapping particular semantic constituent(s) onto a subpart of an utterance: e.g., “fox” ↔ /ef/ (a word rule) or “chase<wolf, #>” ↔ /gh*i/ (a phrase rule, where “#” denotes an unspecified constituent and “*” unspecified syllable(s)).
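
A hedged sketch of how these rule types might be encoded follows; the field names and the tuple encodings of “#” and “*” are assumptions for illustration, not the authors' data structures.

```python
from dataclasses import dataclass, field

UNSPEC = "#"   # unspecified semantic constituent in a phrase rule
GAP = "*"      # unspecified syllable(s) in an utterance

@dataclass
class LexicalRule:
    """A bidirectional meaning-utterance mapping with a usage strength."""
    meaning: tuple                 # semantic constituents (or an integrated meaning)
    utterance: tuple               # syllable string, possibly containing GAP
    strength: float = 0.5          # newly acquired rules start at 0.5
    associations: dict = field(default_factory=dict)  # category id -> weight

# The three rule types from the text:
holistic = LexicalRule(("run", "tiger"), tuple("abcd"))                  # "run<tiger>" <-> /abcd/
word = LexicalRule(("fox",), tuple("ef"))                                # "fox" <-> /ef/
phrase = LexicalRule(("chase", "wolf", UNSPEC), ("g", "h", GAP, "i"))    # "chase<wolf,#>" <-> /gh*i/
```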
To form a sentence using compositional rules, the utterances of these rules need to be ordered. A syntactic rule (see figure A1) specifies an order between two lexical items: e.g., “tiger” ↔ /ef/ << “fox” ↔ /abc/ denotes that the utterance /ef/ encoding the constituent “tiger” lies before (but not necessarily immediately before) /abc/ encoding “fox”. Based on one such local order, “predicate<agent>” meanings can be expressed; based on a combination of two or three local orders, “predicate<agent, patient>” meanings can be expressed.
Syntactic categories are formed so that syntactic rules acquired from some lexical items can be applied productively to other lexical items with the same thematic notation. A syntactic category (see figure A1) comprises a set of lexical rules and a set of syntactic rules that specify the orders in utterance between these lexical rules and those from other categories. For simplicity, we simulate a nominative-accusative language and exclude the passive voice. A category that associates lexical rules encoding the thematic notation of agent can be denoted as a subject (S) category, since that thematic notation usually corresponds to the syntactic role of S. Similarly, patient corresponds to object (O), and predicate to verb (V). A local order between two categories can be denoted by their syntactic roles: e.g., an order before between an S and a V category can be written S<<V, or simply SV.
Lexical and syntactic knowledge collectively encode integrated meanings. As shown in figure A1, to express “fight<wolf, fox>” using the lexical rules from the S, V, and O categories respectively and the syntactic rules SV and SO, the resulting sentence can be /bcea/ or /bcae/, following SVO or SOV.
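
The way two local orders underdetermine the full word order can be checked with a small enumeration; consistent_orders is our own helper, not part of the model's code.

```python
from itertools import permutations

def consistent_orders(roles, local_orders):
    """Enumerate total orders of syntactic roles compatible with a set of
    pairwise 'before' constraints (local orders need not be adjacent)."""
    return [p for p in permutations(roles)
            if all(p.index(a) < p.index(b) for a, b in local_orders)]

# SV ("S before V") and SO ("S before O") leave the V/O order free:
print(consistent_orders(("S", "V", "O"), [("S", "V"), ("S", "O")]))
# -> [('S', 'V', 'O'), ('S', 'O', 'V')], i.e., SVO or SOV
```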
Every lexical or syntactic rule has a strength, indicating the probability of successfully using its M-U mapping or local order. A lexical rule also has an association weight to each category that contains it, indicating the probability of successfully applying the syntactic rules of that category to the utterance of the lexical rule. Strengths and association weights lie in [0.0, 1.0]. A newly acquired rule has strength 0.5, the same as the weight of a new association of a lexical rule to a category. These numerical parameters realize a strength-based competition during communications (see A4) and a gradual forgetting of linguistic knowledge. Forgetting occurs regularly after a number (scaled to the population size) of communications. During forgetting, all individuals deduct a fixed amount from each of their rules’ strengths and association weights; after that, lexical or syntactic rules with negative strengths are discarded, lexical rules with negative association weights to some categories are removed from those categories, and categories having no lexical rules are also discarded, together with their syntactic rules.
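
A minimal sketch of this forgetting step, assuming agents hold lexical_rules, syntactic_rules, and categories collections shaped like the structures sketched above (the attribute names, including cat_a/cat_b on syntactic rules, are ours):

```python
def forget(agent, deduction=0.01):
    """Deduct a fixed amount from all strengths and association weights,
    then prune rules, associations, and categories that fall below zero
    (Table A1 uses a deduction of 0.01)."""
    for rule in agent.lexical_rules + agent.syntactic_rules:
        rule.strength -= deduction
    agent.lexical_rules = [r for r in agent.lexical_rules if r.strength >= 0]
    agent.syntactic_rules = [r for r in agent.syntactic_rules if r.strength >= 0]
    for rule in agent.lexical_rules:
        for cat in list(rule.associations):
            rule.associations[cat] -= deduction
            if rule.associations[cat] < 0:
                del rule.associations[cat]   # the rule leaves that category
    # categories left with no lexical rules are discarded,
    # together with the syntactic rules they contain
    live = {cat for r in agent.lexical_rules for cat in r.associations}
    agent.categories = [c for c in agent.categories if c.id in live]
    agent.syntactic_rules = [r for r in agent.syntactic_rules
                             if r.cat_a in live and r.cat_b in live]
```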

Figure A1. Examples of lexical rules, syntactic rules, and syntactic categories. “#” denotes an unspecified semantic item, and “*” unspecified syllable(s). S, V, and O are the syntactic roles of categories. Numbers enclosed in ( ) denote rule strengths, and those in [ ] association weights. “<<” denotes the local order before, and “>>” after. Compositional rules can combine if, together, they specify each constituent in an integrated meaning exactly once: e.g., rules (c) and (d) can combine to form “chase<wolf, bear>”, and the corresponding utterance is /ehfg/.
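
The caption’s combination condition can be expressed as a predicate over the LexicalRule sketch above (with UNSPEC as defined there); this is our paraphrase of the stated condition, not the authors' code.

```python
from collections import Counter

def can_combine(rules, integrated_meaning):
    """Compositional rules may combine iff, jointly, they specify each
    constituent of the integrated meaning exactly once and nothing else."""
    specified = Counter(item for rule in rules
                        for item in rule.meaning if item != UNSPEC)
    return specified == Counter(integrated_meaning)

# e.g., a phrase rule for "chase<wolf, #>" plus a word rule for "bear"
# jointly specify ("chase", "wolf", "bear") exactly once -> they combine.
```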

A3 Acquisition mechanisms
Individuals use general learning mechanisms to acquire linguistic knowledge. Lexical rules are acquired by detecting recurrent patterns (meanings and syllables appearing recurrently in at least two M-U mappings). Each individual has a buffer storing M-U mappings obtained from previous communications. New mappings, before being inserted into the buffer, are compared with those already in the buffer. As shown in figure A2a, by comparing “hop<fox>” ↔ /ab/ with “run<fox>” ↔ /acd/, an individual can detect the recurrent pattern “fox” and /a/; if no lexical rule recording such a mapping exists in the individual’s rule list, a lexical rule “fox” ↔ /a/ with initial strength 0.5 is acquired.
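
A hedged sketch of this pattern detection over a pair of buffered mappings; the real mechanism may filter candidates further, and the helper names are ours.

```python
def substrings(u):
    """All non-empty contiguous syllable strings of an utterance (a tuple)."""
    return {u[i:j] for i in range(len(u)) for j in range(i + 1, len(u) + 1)}

def detect_recurrent_patterns(m1, u1, m2, u2):
    """Pair semantic constituents shared by two meanings with syllable
    strings shared by the corresponding utterances."""
    shared_meaning = set(m1) & set(m2)
    shared_form = substrings(u1) & substrings(u2)
    return {(m, s) for m in shared_meaning for s in shared_form}

# Comparing "hop<fox>" <-> /ab/ with "run<fox>" <-> /acd/:
# detect_recurrent_patterns(("hop", "fox"), tuple("ab"),
#                           ("run", "fox"), tuple("acd"))
# -> {("fox", ("a",))}, i.e., the candidate lexical rule "fox" <-> /a/
```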

Figure A2. Examples of the acquisition of lexical rules (a) and of syntactic categories and syntactic rules (b). M-U mappings are itemized by Arabic numerals, and lexical rules by Roman numerals.
Syntactic categories and syntactic rules are acquired according to the thematic notations of lexical rules and the local order relations of their utterances in M-U mappings. As shown in figure A2b, in M-U mappings (1) and (2), the syllables /d/ of rule (i) and /ac/ of rule (iii) precede /m/ of rule (ii). Since both “wolf” and “fox” are agents in these meanings, rules (i) and (iii) are associated into an S category (Category 1), and the order before between these two rules and rule (ii) is acquired as a syntactic rule. Similarly, in M-U mappings (1) and (3), /m/ of rule (ii) and /b/ of rule (iv) follow /d/ of rule (i), thus leading to a V category (Category 2) associating rules (ii) and (iv), and a syntactic rule after. Now, since Categories 1 and 2 respectively associate rules (i) and (iii), and rules (ii) and (iv), the two syntactic rules are updated as “Category 1 (S) << Category 2 (V)”, indicating that the syllables of lexical rules in the S category should precede those of lexical rules in the V category.
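
One way to sketch this clustering step; the evidence tuples and grouping key are our own simplification of the process figure A2b describes.

```python
from collections import defaultdict

def group_by_order_evidence(evidence):
    """Cluster lexical rules that fill the same thematic role and stand in
    the same local order to a common reference rule (cf. figure A2b). Each
    cluster seeds a syntactic category, and the pairwise orders are then
    lifted to category-level syntactic rules."""
    groups = defaultdict(set)
    for rule, role, relation, ref_rule in evidence:
        groups[(role, relation, ref_rule)].add(rule)
    return groups

# Rules (i) "wolf" and (iii) "fox" both precede rule (ii) as agents:
# evidence = [("i", "agent", "before", "ii"), ("iii", "agent", "before", "ii")]
# -> one ("agent", "before", "ii") cluster, i.e., an S category.
```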
Such item-based learning mechanisms have been documented in empirical studies [2], and the categorization process resembles the verb-island hypothesis [3].

A4 Communication

A communication involves two individuals (a speaker and a listener), who perform a number of utterance exchanges, each proceeding as follows (see figure A3).

Figure A3. Diagram of an utterance exchange. The dotted block indicates unreliable cues.
In production, the speaker (hereafter “she”) randomly selects an integrated meaning from the semantic space to produce. She activates: a) lexical rules encoding some (compositional rules) or all (holistic rules) constituents in this meaning; b) categories associating those lexical rules and having appropriate syntactic roles; and c) syntactic rules in those categories that can regulate those lexical rules in sentences. These rules form candidate sets for production. She calculates the combined strength (CS_production) of each set (see equation (A.1), where Avg denotes taking the average, aso association weights, and str rule strengths):

CS_production = Avg(str(LexRule(s))) + Avg(aso(Cats) × str(SynRule(s)))    (A.1)
CS_production is the sum of the lexical contribution (the average strength of the lexical rules in the set) and the syntactic contribution (the average product of (i) the strengths of the syntactic rules regulating the lexical rules and (ii) the association weights of those lexical rules to the categories in the set). As shown in figure A1, the three categories, the three lexical rules encoding “wolf”, “fight<#, #>”, and “fox”, and the two syntactic rules SV and SO form a candidate set to encode “fight<wolf, fox>”. Its CS_production is 0.98, where the lexical contribution is 0.6 ((0.7 + 0.6 + 0.5)/3) and the syntactic contribution is 0.38 ((0.8 × (0.7 + 0.6)/2 + 0.4 × (0.7 + 0.5)/2)/2).
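
The worked arithmetic can be reproduced directly; the grouping of association weights with syntactic-rule strengths below mirrors one reading of the parenthesized expression in the text.

```python
def cs_production(lex_strengths, cat_terms):
    """Equation (A.1): average lexical strength plus the average, over the
    categories used, of association weight x average syntactic-rule strength."""
    lexical = sum(lex_strengths) / len(lex_strengths)
    syntactic = sum(aso * sum(syn) / len(syn)
                    for aso, syn in cat_terms) / len(cat_terms)
    return lexical + syntactic

# Figure A1 example for "fight<wolf, fox>":
cs = cs_production([0.7, 0.6, 0.5],        # lexical rule strengths
                   [(0.8, [0.7, 0.6]),     # association weight, rule strengths
                    (0.4, [0.7, 0.5])])
print(round(cs, 2))  # 0.98 = 0.6 + 0.38
```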
After calculation, she selects the set of winning rules having the highest CS_production, builds the sentence accordingly, and transmits it to the listener. If she lacks sufficient rules to encode the integrated meaning, she occasionally (based on a random creation rate) creates a holistic rule to encode the whole meaning.
In comprehension, the listener (hereafter “he”) receives the sentence from the speaker and an environmental cue (whose meaning is set according to RC). Based on his linguistic knowledge, he activates: (i) lexical rules whose syllables fully or partially match the heard sentence; (ii) categories associating these lexical rules; and (iii) syntactic rules in those categories whose orders match those of the lexical rules in the heard sentence. Then, he compares the cue’s meaning with the meaning comprehended using the activated rules. According to the three conditions discussed in the main text, he forms candidate sets for comprehension and calculates the combined strength of each set (see equation (A.2)). For a set without a cue, the calculation is the same as CS_production; for a set with a cue, the cue strength is added; and for a set with only a cue, the cue strength alone constitutes CS_comprehension:

CS_comprehension = Avg(str(LexRule(s))) + Avg(aso(Cats) × str(SynRule(s))) + str(Cue)    (A.2)
After calculation, the listener selects the set of winning rules having the highest CS_comprehension to interpret the heard sentence. If the CS_comprehension of the winning rules exceeds a confidence threshold, he adds the perceived M-U mapping to his buffer and transmits positive feedback to the speaker. Then, both individuals reward their winning rules by adding a fixed amount to their strengths and association weights, and penalize other competing rules by deducting the same amount from their strengths and association weights. Otherwise, he discards the perceived mapping and sends negative feedback to the speaker, and both individuals penalize their winning rules. Such a linear inhibition mechanism has been used in many models, though its details may differ slightly (e.g., [4]). For activated rules having initial strength and association weight values, the linguistic (lexical and syntactic) contribution is 0.75 (0.5 + 0.5 × 0.5). To treat linguistic and non-linguistic information equally, we set both the cue strength and the confidence threshold to 0.75. Together with forgetting, this strength-based competition leads to the conventionalization of linguistic knowledge among individuals.
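
A minimal sketch of this reward/penalty step, over rules shaped like the LexicalRule sketch above; the clamping to [0.0, 1.0] matches the stated range of strengths and weights but is our assumption about how the bounds are enforced.

```python
def update_after_feedback(winning, competing, success, amount=0.1):
    """Linear reward/inhibition after an utterance exchange
    (Table A1 uses an adjustment of 0.1)."""
    def adjust(rule, delta):
        rule.strength = min(1.0, max(0.0, rule.strength + delta))
        for cat in getattr(rule, "associations", {}):
            rule.associations[cat] = min(1.0, max(0.0, rule.associations[cat] + delta))

    for rule in winning:
        adjust(rule, amount if success else -amount)
    if success:                       # competitors are penalized on success
        for rule in competing:
            adjust(rule, -amount)
```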
Equations (A.1) and (A.2) illustrate how non-linguistic information aids linguistic comprehension, by clarifying unspecified constituent(s) and reinforcing rules that help achieve similar interpretations. Recent neuroscience studies (e.g., [5]) have shown that certain brain regions exert domain-general cognitive control over categorization, stimulus-response mapping, and voluntary attention shifts among perceptual inputs and memory representations. This provides a neural basis for the coordination of various types of information.
Throughout an utterance exchange, there is no check of whether the speaker’s encoded meaning matches the listener’s decoded one. This makes the model suitable for studying whether unreliable cues can trigger the development of linguistic knowledge and whether a transition from relying on cues to relying on linguistic rules can take place.

Table A1. Parameter settings for the semantic and signaling spaces and the acquisition and communication mechanisms. The effects of these parameters on language evolution are discussed in [6].

Parameter                                                                  Value
Size of semantic space                                                     64
Size of signaling space                                                    30
Size of individual buffer                                                  40
Size of rule list                                                          60
Number of utterance exchanges per communication/transmission               20
Random creation rate of holistic rules                                     0.25
Amount of adjustment on strengths and association weights in competition   0.1
Amount of deduction on strengths and association weights in forgetting     0.01
Size of population                                                         10
Frequency of forgetting (scaled to population size)                        10
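
For reference, Table A1's values collected as a single configuration record; the key names are our own shorthand for the row labels above.

```python
# Table A1 as a configuration record (key names are illustrative shorthand).
PARAMS = {
    "semantic_space_size": 64,
    "signaling_space_size": 30,
    "buffer_size": 40,
    "rule_list_size": 60,
    "exchanges_per_communication": 20,
    "random_creation_rate": 0.25,
    "competition_adjustment": 0.1,
    "forgetting_deduction": 0.01,
    "population_size": 10,
    "forgetting_frequency": 10,   # scaled to population size
}
```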
Table A1 lists the values of the parameters defining the semantic and signaling spaces and the acquisition and communication mechanisms. In each simulation, individuals in the first generation share 8 holistic rules with strength 1.0. This knowledge allows them to exchange 8 of the 64 integrated meanings in the semantic space. Apart from these holistic rules, individuals have no compositional or syntactic rules, or categories; all of these have to be acquired via their learning mechanisms during communications.
At the early stage of language origin, since individuals fail to express many integrated meanings, they randomly create many holistic expressions to encode those meanings. These numerous linguistic instances cause recurrent patterns to appear more frequently. Then, based on the learning mechanisms, individuals start extracting some of these patterns as compositional rules, and the competition between compositional and holistic rules begins, both within and among idiolects. A holistic rule can express only one integrated meaning, but a compositional rule, through combination, can potentially express many integrated meanings, namely all those involving the constituent(s) encoded by that rule. Compositional rules are therefore referred to more frequently than holistic ones in communications, and they gradually win the competition against holistic rules. As more compositional rules become shared among individuals, a communal compositional language with a high level of mutual understandability emerges.
In summary, this model presumes that: a) individuals can conceptualize semantic constituents and integrated meanings having simple predicate-argument structures; b) they can use general learning mechanisms to handle lexical and syntactic rules and compositional relationships; and c) they can optimize their linguistic knowledge during communications. Compared with iterated learning (e.g., [7]) and language game models (e.g., [8]), this model traces the simultaneous acquisition and close interaction of lexical and syntactic knowledge during language processing; compared with artificial neural network (e.g., [9]) and gene-culture/language coevolution models (e.g., [10]), it implements concrete instance-based learning mechanisms and the major behaviors of linguistic communication.

References

1. Wagner, K., Reggia, J. A., Uriagereka, J., & Wilkinson, G. S. (2003). Progress in the simulation of emergent communication and language. Adaptive Behavior, 11(1), 37–69.
2. Mellow, J. D. (2008). The emergence of complex syntax: A longitudinal case study of the ESL development of dependency resolution. Lingua, 118(4), 499–521.
3. Tomasello, M. (2003). Constructing a language: A usage-based theory of language acquisition. Cambridge, MA: Harvard University Press.
4. Steels, L. (2005). The emergence and evolution of linguistic structure: From lexical to grammatical communication systems. Connection Science, 17(3–4), 213–230.
5. Esterman, M., Chiu, Y.-C., Tamber-Rosenau, B. J., & Yantis, S. (2009). Decoding cognitive control in human parietal cortex. Proceedings of the National Academy of Sciences of the United States of America, 106, 17974–17979.
6. Gong, T. (2009). Computational simulation in evolutionary linguistics: A study on language emergence. Taipei: Institute of Linguistics, Academia Sinica.
7. Kirby, S., Christiansen, M. H., & Chater, N. (2009). Syntax as an adaptation to the learner. In D. Bickerton & E. Szathmáry (Eds.), Biological foundations and origin of syntax (pp. 325–344). Cambridge, MA: MIT Press.
8. Puglisi, A., Baronchelli, A., & Loreto, V. (2008). Cultural route to the emergence of linguistic categories. Proceedings of the National Academy of Sciences of the United States of America, 105, 7936–7940.
9. Christiansen, M. H., & Ellefson, M. R. (2002). Linguistic adaptation without linguistic constraints: The role of sequential learning in language evolution. In A. Wray (Ed.), The transition to language (pp. 335–358). Oxford: Oxford University Press.
10. Nowak, M. A., Komarova, N. L., & Niyogi, P. (2002). Computational and evolutionary aspects of language. Nature, 417, 611–617.

Electronic supplementary material B: Simulation results in 2000 and 5000 generations
As for UR, the ANCOVA shows that natural selection has a significant main effect on UR (under 2000 generations: F(1,8) = 96.131, p < 0.001, η_p² = 0.923; under 5000 generations: F(1,8) = 521.499, p < 0.001, η_p² = 0.985), but cultural selection does not (under 2000 generations: F(1,8) = 3.619, p = 0.095, η_p² = 0.311; under 5000 generations: F(1,8) = 1.282, p = 0.290, η_p² = 0.138), and there is no significant interaction between the two (under 2000 generations: F(1,8) = 2.713, p = 0.138, η_p² = 0.253; under 5000 generations: F(1,8) = 1.583, p = 0.244, η_p² = 0.165). In addition, RC condition has a significant main effect on UR (under 2000 generations: F(8,7.136) = 5.469, p < 0.018, η_p² = 0.860; under 5000 generations: F(8,4.925) = 6.384, p < 0.029, η_p² = 0.912) and interacts significantly with natural selection (under 2000 generations: F(8,8) = 4.873, p < 0.019, η_p² = 0.830; under 5000 generations: F(8,8) = 1.696, p < 0.036, η_p² = 0.629).
As for RC, a similar ANCOVA shows that natural selection (under 2000 generations: F(1,8) = 30.011, p < 0.001, η_p² = 0.790; under 5000 generations: F(1,8) = 71.470, p < 0.0005, η_p² = 0.899) and RC condition (under 2000 generations: F(8,7.090) = 19.382, p < 0.0005, η_p² = 0.956; under 5000 generations: F(8,6.999) = 10.840, p < 0.003, η_p² = 0.925) have significant main effects on RC, but cultural selection does not (under 2000 generations: F(1,8) = 1.978, p = 0.197, η_p² = 0.198; under 5000 generations: F(1,8) = 3.500, p = 0.098, η_p² = 0.304); and there is a significant interaction between natural selection and RC condition (under 2000 generations: F(8,8) = 4.021, p < 0.033, η_p² = 0.801; under 5000 generations: F(8,8) = 0.854, p < 0.020, η_p² = 0.828), but not between the two types of selection (under 2000 generations: F(1,8) = 2.980, p = 0.123, η_p² = 0.271; under 5000 generations: F(1,8) = 1.460, p = 0.261, η_p² = 0.154). Below, figure B1 shows these results, figure B2 shows the increasing and maintaining effects of natural selection on RC in selected RC conditions, and figures B3 and B4 show the results in the other RC conditions.

Figure B1. Marginal mean UR as a function of natural (Nat) and cultural (Cul) selection under 2000 (a) and 5000 (e) generations, and as a function of Nat and RC condition under 2000 (b) and 5000 (f) generations; marginal mean RC as a function of Nat and Cul under 2000 (c) and 5000 (g) generations, and as a function of Nat and RC condition under 2000 (d) and 5000 (h) generations. Error bars denote standard errors.

Figure B2. Mean RC along generations in the RC conditions 0.2 (a) and 0.8 (b) under 2000 generations, and in the RC conditions 0.3 (c) and 0.9 (d) under 5000 generations.

Figure B3. RC evolution in RC conditions 0.1 (a), 0.3–0.7 (b–f), and 0.9 (g) under 2000 generations.

Figure B4. RC evolution in RC conditions 0.1 (a), 0.2 (b), and 0.4–0.8 (c–g) under 5000 generations.

Electronic supplementary material C: Evolution of RC in other RC conditions under 1000 generations

Figure C1. RC evolution in RC conditions 0.1–0.3 (a–c), 0.5 (d), 0.6 (e), 0.8 (f), and 0.9 (g) under 1000 generations.