Reassessing the Ability of Simple Recurrent Networks to Account for Verbal Working Memory Performance

Nicolas Ruh¹, Kerstin Klöckner² and Lars Konieczny¹,²
¹University of Freiburg, ²Saarland University
SFB 378

Introduction

English Subject-relative clauses (SRCs, 1) are easier to process than Object-RCs (ORCs, 2), and more notably so for readers with a low reading span (King and Just, 1991). MacDonald & Christiansen (2002; henceforth MC) attributed this asymmetry to differing degrees of word order regularity: whereas word order in English Subject-RCs (1) resembles that of frequent main clauses (S-verb-O), Object-RCs (2) follow an irregular word order (O-S-verb).

1. The reporter that attacked the senator praised the judge.
2. The reporter that the senator attacked praised the judge.

MC (2002) backed their claim with simulations based on simple recurrent networks (SRNs; Elman, 1990). The modulating effect of reading span could be attributed to low-span readers' lower degree of reading experience, simulated by the number of training epochs in SRNs (frequency x regularity interaction).

S-O asymmetries in German

In German, the preference for S-before-O orderings is well established (e.g. Mecklinger et al. 1995, Hemforth et al. 1993, Bader and Meng 1999), albeit with different experimental paradigms and clause types. In a self-paced reading experiment, we were able to confirm the S-before-O preference in German RCs (SRC: 3, ORC: 4).

3. Der Wärter, der den Häftling beleidigte, entdeckte den Tunnel.
   The guard, who[nom] the[acc] prisoner insulted, discovered the tunnel.
   "The guard who insulted the prisoner discovered the tunnel."
4. Der Wärter, den der Häftling beleidigte, entdeckte den Tunnel.
   The guard, who[acc] the[nom] prisoner insulted, discovered the tunnel.
   "The guard whom the prisoner insulted discovered the tunnel."

Fig. 1. Reading times for German Subject- and Object-RCs.
Fig. 2. GPEs for German RCs.

As in English, reading times on the embedded verb exhibit a clear penalty for ORCs.
However, while reading span affected reading times overall, there was no interaction with word order (see Fig. 1).

Questions

Can MC's model account for the S-O asymmetry in German RCs as well? Unlike in English, the verb is placed in clause-final position in German subordinate clauses. It is thus unclear whether the S-before-O regularity will be transferred from main clauses to subordinate clauses such as SRCs (3) and ORCs (4).

Simulation 1: German sentences

Simulation setting. Parameters were kept as close to the original simulation as possible. The vocabulary comprised 30 words (+EOS), with three genders and no past tense. One corpus contained 10000 sentences, of which 5% were SRCs and 5% were ORCs. The SRN had 31 units each in the input and output layers, and 60 units each in the hidden and context layers. One epoch = 55000 sweeps, learning rate = 0.1, no momentum. Test sentences (20 SRCs, 20 ORCs) were not in the training corpus.

Results (epochs 1-3). There was no indication that the net learned to transfer the main clause S-O regularity to RCs. On the contrary, we found lower Grammatical Prediction Errors (GPEs) for the embedded verb in ORCs than in SRCs, a pattern that remained constant over training epochs. We found extraordinarily high GPEs at the matrix verb. A careful analysis of the output vector at this position revealed that the net predicts either the end-of-sentence marker (EOS) or another NP, but not the (congruent) verb. Both predictions are ungrammatical.

Discussion. Both results suggest that the net's performance was mainly determined by local regularities (spanning up to four words). The lower GPE on ORC verbs was due to the local NPnom-verb regularity transferred from simple main clauses to ORCs (Figs. 4, 5). Furthermore, the network failed to capture simple embedding, even after ample training (15 epochs; see Fig. 3).

Simulation 2: Revisiting English

In order to validate our parameter settings, we re-ran MC's simulation on English materials.

Results. We succeeded in replicating MC's pattern of GPEs (albeit with minor differences).
A careful analysis of the output activation patterns at the relevant positions revealed the following picture:

Matrix verb. The high GPE in SRCs (after … v-NP) is mainly due to the false prediction of EOS (see Fig. 7). The high GPE in ORCs (after … NP-v) is due to the false prediction of another NP (determiner) or of EOS (Fig. 8).

Embedded verb (ORCs, after … that-NP). In early epochs, the highest proportion of activation is on EOS. In later epochs, the network learned to predict the embedded verb correctly (Fig. 9).

Discussion. Again, the data suggest that the networks failed to capture long-distance dependencies and embedding per se: following an RC, the network "forgot" to predict the matrix verb, so that the highest output activation fell on EOS. The nets also failed to acquire the syntax of (O-)RCs, as indicated by the prediction of another NP following an embedded transitive verb. Most crucially, the experience-based effect on the embedded verb turned out to arise because, in early epochs, the network fails to distinguish verbs from the relative pronoun "that", since both are legitimate successors of NPs. The system therefore treats an NP-that-NP sequence as a simple main clause and predicts EOS after it. Only in later epochs did the network learn to distinguish the relative pronoun "that" from verbs.

General discussion

During the critical training epochs (1-3), the networks trained on English and on German sentences alike failed to capture long-distance dependencies and embedding. The supposed regularity x frequency interaction in the English networks can easily be attributed to the network learning to distinguish the relative pronoun "that" from verbs within these epochs. MacDonald and Christiansen's (2002) results thus appear to be simulation artefacts. It remains to be seen, though, whether different architectures and/or training procedures ("starting small") yield the desired results.

Regularity transfer to German RCs?
Even if a revised architecture or training procedure could be found that behaved correctly on English materials, it is still unclear how such a model would predict the reading-time pattern obtained for German sentences. It would have to predict (i) a clear SRC advantage and (ii) that this effect is largest at the embedded verb. If the transfer of the S-O regularity from main clauses to RCs is limited by the clause-final verb placement in RCs, the SRC advantage might stem from SRCs simply being more frequent than ORCs, from SOV transfer from other types of subordinate clauses, or from both. However, the present simulations exhibit a clear regularity transfer for NPnom-verb sequences from main clauses to Object-RCs, which the network would have to overcome.

References

Bader, M., & Meng, M. (1999). Subject-Object ambiguities in German embedded clauses: An across-the-board comparison. Journal of Psycholinguistic Research, 28(2), 121-143.
Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14, 179-211.
King, J., & Just, M. A. (1991). Individual differences in syntactic processing: The role of working memory. Journal of Memory and Language, 30, 580-602.
MacDonald, M. C., & Christiansen, M. H. (2002). Reassessing Working Memory: Comment on Just and Carpenter (1992) and Waters and Caplan (1996). Psychological Review, 109(1), 35-54.
Mecklinger, A., Schriefers, H., Steinhauer, K., & Friederici, A. D. (1995). Processing relative clauses varying on syntactic and semantic dimensions: An analysis with event-related potentials. Memory and Cognition, 23, 477-494.

Figs. 3-5. Output activations and correct probabilities compared directly. The figures show the prediction of the main verb (Fig. 3), the embedded verb in SRCs (Fig. 4), and the embedded verb in ORCs (Fig. 5).
Fig. 6. Performance of the SRNs in the replicated simulation with English data.
Fig. 7. Output activations and probabilities collapsed over word classes.
Fig. 8. ORCs, matrix verb.
Fig. 9. ORCs, embedded verb.

Download: http://www.iig.uni-freiburg.de/team/members/konieczny/cuny02.pdf
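
Supplementary sketch. The simulation setting given under Simulation 1 (31 input and output units, 60 hidden and context units, learning rate 0.1, no momentum) can be illustrated with a minimal Elman-style SRN in Python. This is a sketch under stated assumptions, not the code used for the simulations reported here: the logistic hidden units, softmax output, cross-entropy error, weight initialization and context reset value are not specified above and are chosen only for illustration, and the class name `SRN` and the toy word indices are hypothetical.

```python
import numpy as np

VOCAB = 31    # 30 words + EOS, as in the simulation setting
HIDDEN = 60   # hidden units; the context layer has the same size
LR = 0.1      # learning rate; plain gradient descent, no momentum


class SRN:
    """Elman-style simple recurrent network for next-word prediction."""

    def __init__(self, n_io=VOCAB, n_hidden=HIDDEN, seed=0):
        rng = np.random.default_rng(seed)
        # Assumption: small uniform initial weights (not stated in the source).
        self.W_in = rng.uniform(-0.1, 0.1, (n_hidden, n_io))
        self.W_ctx = rng.uniform(-0.1, 0.1, (n_hidden, n_hidden))
        self.W_out = rng.uniform(-0.1, 0.1, (n_io, n_hidden))
        self.b_h = np.zeros(n_hidden)
        self.b_o = np.zeros(n_io)
        self.reset()

    def reset(self):
        # Assumption: context units start at 0.5 at each sentence onset.
        self.context = np.full(self.W_ctx.shape[0], 0.5)

    def step(self, word, target=None, lr=LR):
        """Process one word (one sweep); train if a target word is given."""
        x = np.zeros(self.W_in.shape[1])
        x[word] = 1.0                      # localist (one-hot) input
        ctx = self.context
        h = 1.0 / (1.0 + np.exp(-(self.W_in @ x + self.W_ctx @ ctx + self.b_h)))
        z = self.W_out @ h + self.b_o
        p = np.exp(z - z.max())
        p /= p.sum()                       # softmax over the next word
        if target is not None:
            d_out = p.copy()
            d_out[target] -= 1.0           # softmax + cross-entropy gradient
            # Error is propagated one step only; the context layer is
            # treated as a fixed extra input, as in Elman-style training.
            d_h = (self.W_out.T @ d_out) * h * (1.0 - h)
            self.W_out -= lr * np.outer(d_out, h)
            self.b_o -= lr * d_out
            self.W_in -= lr * np.outer(d_h, x)
            self.W_ctx -= lr * np.outer(d_h, ctx)
            self.b_h -= lr * d_h
        self.context = h                   # copy hidden state to context
        return p


# Toy usage: word indices are hypothetical; index 30 stands for EOS.
sentence = [0, 5, 12, 30]
net = SRN()
for _ in range(50):
    net.reset()
    for word, nxt in zip(sentence[:-1], sentence[1:]):
        net.step(word, target=nxt)
net.reset()
prediction = net.step(sentence[0])         # output vector after the first word
```

The output vector returned by `step` corresponds to the kind of activation pattern inspected at the matrix and embedded verb positions above; summing its entries over the indices of a word class would give the class-level activations referred to in the caption of Fig. 7.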