Reassessing the Ability of Simple Recurrent Networks to Account for Verbal Working Memory Performance
Nicolas Ruh1, Kerstin Klöckner2 and Lars Konieczny1,2
1University of Freiburg, 2Saarland University, SFB 378
Introduction
English Subject-relative clauses (SRCs, 1) are easier to process
than Object-RCs (ORCs, 2), and more notably so for readers with a low
reading span (King and Just, 1991). MacDonald & Christiansen
(2002; henceforth MC) attributed this asymmetry to differing degrees of
word order regularity: whereas word order in English Subject-RCs (1)
resembles that of frequent main clauses (S-verb-O),
Object-RCs (2) follow an irregular word order (O-S-verb).
Fig. 1. Reading times for German Subject and Object-RCs
1. The reporter that attacked the senator praised the judge.
2. The reporter that the senator attacked praised the judge.
MC (2002) backed their claim with simulations based on simple
recurrent networks (SRNs; Elman, 1990). The modulating effect of reading
span was attributed to low-span readers' lower degree of reading
experience, simulated by the number of training epochs in the SRNs
(frequency × regularity interaction).
S-O asymmetries in German
In German, the preference for S-before-O orderings is well
established (e.g. Mecklinger et al. 1995, Hemforth et al. 1993,
Bader and Meng 1999), albeit with different experimental
paradigms and clause types.
In a self-paced reading experiment we were able to confirm the
S-before-O preference in German RCs (SRC: 3, ORC: 4).
3. Der Wärter, der den Häftling beleidigte, entdeckte den Tunnel.
The guard, who[nom] the[acc] prisoner insulted, discovered the tunnel.
“The guard who insulted the prisoner discovered the tunnel.”
4. Der Wärter, den der Häftling beleidigte, entdeckte den Tunnel.
The guard, who[acc] the[nom] prisoner insulted, discovered the tunnel.
“The guard whom the prisoner insulted discovered the tunnel.”
Fig. 2. GPEs for German RCs.
As in English, reading times on the embedded verb show a clear
penalty for ORCs. However, while reading span affected reading
times overall, it did not interact with word order (see fig. 1).
Questions
Can MC’s model account for the SO-asymmetry in German RCs
as well? In contrast to English, German places the verb in clause-final
position in sub-clauses. It is thus unclear whether the
S-before-O regularity will be transferred from main clauses to
sub-clauses like SRCs (3) and ORCs (4).
Simulation 1: German sentences
Simulation setting.
Parameters were kept as close to the original simulation as possible.
• 30-word vocabulary (+EOS), three genders, no past tense.
• Each corpus contained 10,000 sentences, 5% of them SRCs and 5% ORCs.
• SRN with 31 units each in the input and output layers, and 60 hidden and 60 context units.
• One epoch = 55,000 sweeps, learning rate = 0.1, no momentum.
• Test sentences (20 SRCs, 20 ORCs) were not in the training corpus.
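The network dimensions listed above can be sketched as a minimal Elman-style SRN. This is an illustrative sketch, not the simulator used for the poster: the softmax output with a cross-entropy error, the initial weight range, the neutral context value of 0.5, and the context reset at sentence onset are all assumptions, and the names `SRN` and `train_sentence` are hypothetical.

```python
import numpy as np

N_WORDS = 31         # 30 words + EOS, one unit per word
N_HIDDEN = 60        # hidden layer; the context layer has the same size
LEARNING_RATE = 0.1  # plain gradient descent, no momentum


class SRN:
    """Simple recurrent network with a copy-back context layer (sketch)."""

    def __init__(self, n_words=N_WORDS, n_hidden=N_HIDDEN, seed=0):
        rng = np.random.default_rng(seed)
        r = 0.1  # assumed initial weight range, not taken from the poster
        self.w_in = rng.uniform(-r, r, (n_hidden, n_words))
        self.w_ctx = rng.uniform(-r, r, (n_hidden, n_hidden))
        self.w_out = rng.uniform(-r, r, (n_words, n_hidden))
        self.b_hid = np.zeros(n_hidden)
        self.b_out = np.zeros(n_words)
        self.reset()

    def reset(self):
        """Set the context layer to a neutral state (assumed: 0.5)."""
        self.context = np.full(self.w_ctx.shape[0], 0.5)

    def step(self, word, target=None, lr=LEARNING_RATE):
        """Process one word index; with a target, perform one training sweep.

        Returns the output distribution computed before any weight update.
        """
        x = np.zeros(self.w_in.shape[1])
        x[word] = 1.0
        ctx = self.context
        net_hid = self.w_in @ x + self.w_ctx @ ctx + self.b_hid
        hidden = 1.0 / (1.0 + np.exp(-net_hid))
        logits = self.w_out @ hidden + self.b_out
        out = np.exp(logits - logits.max())
        out /= out.sum()
        if target is not None:
            # Error is propagated one step only; the context is treated
            # as a fixed extra input, as in Elman's copy-back scheme.
            d_out = out.copy()
            d_out[target] -= 1.0
            d_hid = (self.w_out.T @ d_out) * hidden * (1.0 - hidden)
            self.w_out -= lr * np.outer(d_out, hidden)
            self.b_out -= lr * d_out
            self.w_in -= lr * np.outer(d_hid, x)
            self.w_ctx -= lr * np.outer(d_hid, ctx)
            self.b_hid -= lr * d_hid
        self.context = hidden  # copy hidden activations to the context layer
        return out


def train_sentence(net, sentence):
    """One sweep per word: predict the next word index of the sentence."""
    net.reset()
    for word, nxt in zip(sentence, sentence[1:]):
        net.step(word, target=nxt)
```

Under these assumptions, one epoch would correspond to calling `train_sentence` over the corpus until 55,000 sweeps have been performed.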
Results (epochs 1–3).
• There was no indication that the net learned to transfer the
main clause S-O regularity to RCs. On the contrary, we found
lower Grammatical Prediction Errors (GPEs) for the embedded
verb in ORCs, a pattern that remained constant across training epochs.
• We found extraordinarily high GPEs at the matrix verb. A
careful analysis of the output vector at this position revealed
that the net predicts either the end-of-sentence marker (EOS)
or another NP, but not the (congruent) verb. Both predictions
are ungrammatical.
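The poster does not spell out how the GPE is computed. The sketch below shows one way a hit / false-alarm / miss score of this kind can be calculated from an output vector; the scaling of the target activations by the total output activation is an assumption, and the function name `gpe` and the three-unit toy vocabulary are hypothetical.

```python
import numpy as np


def gpe(output, grammatical_probs):
    """Grammatical prediction error for one output vector (sketch).

    output            -- activations over the vocabulary
    grammatical_probs -- probability of each word in this context;
                         zero marks an ungrammatical continuation
    """
    output = np.asarray(output, dtype=float)
    probs = np.asarray(grammatical_probs, dtype=float)
    grammatical = probs > 0

    hits = output[grammatical].sum()           # activation on legal words
    false_alarms = output[~grammatical].sum()  # activation on illegal words

    # Assumed target: total activation distributed according to the
    # grammatical probabilities; a miss is any shortfall on a legal word.
    target = (hits + false_alarms) * probs / probs.sum()
    misses = np.clip(target - output, 0.0, None)[grammatical].sum()

    denom = hits + false_alarms + misses
    return 1.0 if denom == 0 else 1.0 - hits / denom


# Toy vocabulary order: [verb, determiner, EOS]; only the verb is legal.
matrix_verb_context = [1.0, 0.0, 0.0]
score_correct = gpe([1.0, 0.0, 0.0], matrix_verb_context)
score_eos_or_np = gpe([0.0, 0.4, 0.6], matrix_verb_context)
```

With all activation on the legal verb the score is 0; with all activation split between a determiner and the EOS, as described for the matrix verb position, it is 1.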
Discussion.
Both results suggest that the net’s performance was mainly
determined by local regularities (spanning up to four words). The lower
GPE on ORC verbs was due to the local NPnom-verb regularity
transferred from simple main clauses to ORCs (fig. 4, 5).
Furthermore, the network failed to learn even simple embedding,
despite ample training (15 epochs; see fig. 3).
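To illustrate what a purely local regularity predicts, the following sketch counts n-grams over word-category sequences. It is a toy illustration rather than an analysis from the poster: the category labels, the corpus proportions (90 main clauses, 5 SRCs, 5 ORCs) and the helper names `train_ngram` and `predict` are assumptions.

```python
from collections import Counter, defaultdict


def train_ngram(sentences, max_context=3):
    """Count next-token frequencies for every context of 0..max_context tokens."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = list(sentence) + ["EOS"]
        for i, nxt in enumerate(tokens):
            for k in range(max_context + 1):
                if i - k < 0:
                    break
                counts[tuple(tokens[i - k:i])][nxt] += 1
    return counts


def predict(counts, context, max_context=3):
    """Next-token distribution, backing off to shorter contexts if unseen."""
    context = tuple(context)[-max_context:] if max_context > 0 else ()
    while context and context not in counts:
        context = context[1:]
    seen = counts[context]
    total = sum(seen.values())
    return {token: n / total for token, n in seen.items()}


# Hypothetical category-level corpus mirroring German word order.
main_clause = ["NPnom", "V", "NPacc"]
src = ["NPnom", "RPnom", "NPacc", "V", "V", "NPacc"]
orc = ["NPnom", "RPacc", "NPnom", "V", "V", "NPacc"]
corpus = [main_clause] * 90 + [src] * 5 + [orc] * 5

counts = train_ngram(corpus, max_context=1)
# Embedded verb position: preceded by NPnom in ORCs, by NPacc in SRCs.
p_verb_orc = predict(counts, ["NPnom"], max_context=1).get("V", 0.0)
p_verb_src = predict(counts, ["NPacc"], max_context=1).get("V", 0.0)
```

In this toy corpus a one-word context already favours the verb after a nominative NP, because that sequence is frequent in main clauses, whereas an accusative NP is mostly followed by the EOS.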
Simulation 2: Revisiting English
In order to validate our parameter settings, we re-ran MC’s
simulation on English materials.
Results.
We succeeded in replicating MC’s pattern of GPEs (albeit with
minor differences). A careful analysis of the output activation
patterns at relevant positions revealed the following picture:
Matrix verb
• The high GPE in SRCs (after … v-NP) is mainly due to the
false prediction of the EOS (see fig. 7).
• The high GPE in ORCs (after … NP-v) is due to the false
prediction of another NP (determiner) or of the EOS (fig. 8).
Embedded verb (ORCs, after … that-NP)
• In early epochs, the highest proportion of activation is on the EOS.
In later epochs, the network learns to predict the embedded
verb correctly (fig. 9).
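An analysis of this kind can be sketched by summing the output activations within each word class and checking where most of the activation falls. The seven-unit vocabulary, the class-to-unit mapping and the function name `collapse_by_class` are hypothetical and serve only as an illustration.

```python
import numpy as np

# Hypothetical mapping from word class to output unit indices.
WORD_CLASSES = {
    "noun": [0, 1],
    "verb": [2, 3],
    "determiner": [4],
    "that": [5],
    "EOS": [6],
}


def collapse_by_class(output, word_classes=WORD_CLASSES):
    """Sum output activations over all units belonging to each word class."""
    output = np.asarray(output, dtype=float)
    return {name: float(output[idx].sum()) for name, idx in word_classes.items()}


# Made-up output vector with most activation on the EOS unit.
example_output = [0.05, 0.05, 0.10, 0.10, 0.20, 0.00, 0.50]
by_class = collapse_by_class(example_output)
strongest = max(by_class, key=by_class.get)
```

Comparing the collapsed activations with the class-level probabilities of the grammar shows at a glance whether activation goes to the EOS or a determiner where a verb is required.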
Discussion.
Again, the data suggest that the networks failed to capture long
distance dependencies and embedding per se: Following an RC,
the network “forgot” to predict the matrix verb, the highest output
activation hence being on the EOS. The nets also failed to acquire
the syntax of (O-)RCs, indicated by the prediction of another NP
following an embedded transitive verb.
Most crucially, the experience-based effect on the embedded verb
turned out to arise because, in early epochs, the network
fails to distinguish verbs from the relative pronoun “that”, since both
are legitimate successors of NPs. The system therefore predicts
the EOS after an NP-that-NP sequence, treating it as a simple
main clause. Only in later epochs did the network learn
to distinguish the relative pronoun “that” from verbs.
General discussion
During the critical training epochs (1-3), the networks trained on
both English and German sentences failed to capture long-distance
dependencies and embedding. The supposed regularity ×
frequency interaction in the English networks can readily be attributed
to the network learning to distinguish the relative pronoun “that” from verbs
within these epochs. MacDonald and Christiansen’s (2002)
results thus appear to be simulation artefacts. It remains to be
seen, though, whether different architectures and/or
training procedures (“starting small”) yield the desired results.
Regularity transfer to German RCs?
Even if a revised architecture or training procedure could be
found that behaved correctly on English materials, it is still
unclear how such a model would predict the reading time pattern
obtained for German sentences. It would have to predict (i) a clear
SRC advantage and (ii) that the effect is largest at the embedded
verb. If the transfer of the SO regularity from main clauses to RCs
is limited by the clause-final verb placement in RCs, the SRC
advantage might stem from SRCs simply being more
frequent than ORCs, from SOV transfer from other types of
subclauses, or from both. However, the present simulations exhibit a
clear regularity transfer for NPnom-verb sequences from main
clauses to Object-RCs, which the network would have to overcome.
References
Bader, M., & Meng, M. (1999). Subject-Object ambiguities in German embedded clauses: An across-the-board comparison. Journal of Psycholinguistic Research, 28(2), 121-143.
Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14, 179-211.
King, J., & Just, M. A. (1991). Individual differences in syntactic processing: The role of working memory. Journal of Memory and Language, 30, 580-602.
MacDonald, M. C., & Christiansen, M. H. (2002). Reassessing working memory: Comment on Just and Carpenter (1992) and Waters and Caplan (1996). Psychological Review, 109(1), 35-54.
Mecklinger, A., Schriefers, H., Steinhauer, K., & Friederici, A. D. (1995). Processing relative clauses varying on syntactic and semantic dimensions: An analysis with event-related potentials. Memory and Cognition, 23, 477-494.
Fig. 3-5. Output activations and correct probabilities compared directly. The figures
show the prediction of the main verb (fig. 3), the embedded verb in SRCs (fig.
4) and in ORCs (fig. 5).
Fig. 6. Performance of the SRNs in the replicated simulation with English
data.
Fig. 7. Output activations and probabilities collapsed over word classes.
Fig. 8. ORCs, matrix verb.
Fig. 9. ORCs, embedded verb.
Download: http://www.iig.uni-freiburg.de/team/members/konieczny/cuny02.pdf