
Veríssimo & Clahsen (2014), Journal of Memory and Language

Journal of Memory and Language 76 (2014) 61–79
Variables and similarity in linguistic generalization: Evidence from inflectional classes in Portuguese
João Veríssimo a,⇑, Harald Clahsen b
a Faculty of Psychology, University of Lisbon, Alameda da Universidade, 1649-013 Lisboa, Portugal
b Potsdam Research Institute for Multilingualism, University of Potsdam, Haus 2, Campus Golm, Karl-Liebknecht-Straße 24–25, 14476 Potsdam, Germany
Article info
Article history:
Received 11 November 2013
Received in revised form 19 May 2014
Keywords:
Variables
Similarity
Rules
Morphological generalization
Productivity
Computational modeling
Abstract
Two opposing viewpoints have been advanced to account for morphological productivity,
one according to which some knowledge is couched in the form of operations over variables, and another in which morphological generalization is primarily determined by similarity. We investigated this controversy by examining the generalization of Portuguese
verb stems, which fall into one of three conjugation classes. In Study 1, an elicited production task revealed that the generalization of 2nd and 3rd conjugation stems is influenced by
the degree of phonological similarity between novel roots and existing verbs, whereas the
1st conjugation generalizes beyond similarity. In Study 2, we directly contrasted two distinct computational implementations of conjugation class assignment in how well they
matched the human data: a similarity-driven model that captures phonological similarities, and a dual-mechanism model that implements an explicit distinction between context-free and similarity-based generalizations. The similarity-driven model consistently
underestimated 1st conjugation responses and overestimated proportions of 2nd and 3rd
conjugation responses, especially for novel verbs that are highly similar to existing verbs
of those classes. In contrast, the expected proportions produced by the dual-mechanism
model were statistically indistinguishable from human responses. We conclude that both
context-free and context-sensitive processes determine the generalization of conjugations
in Portuguese, and that similarity-based algorithms of morphological acquisition are insufficient to exhibit default-like generalization.
© 2014 Elsevier Inc. All rights reserved.
Introduction
One of the striking features of human language is its productivity, that is, the fact that speakers are able to produce
and comprehend linguistic expressions that they have not
encountered before (Chomsky, 1965). At the heart of this
ability is the generalization of linguistic patterns and constraints to novel items. For example, if a novel verb such
as to ploamph were to enter the English language, speakers
would be readily able to form its different variants
(e.g., ploamphed, ploamphing; Prasada & Pinker, 1993), as
well as incorporate them into acceptable sentences
(e.g., Why do you think I should have ploamphed it?). Given
that knowledge of language is finite, but the number of
complex forms and sentences that can be produced and
understood is infinite, one of the central goals of the
language sciences is to characterize the representational
substrate that accounts for linguistic generalizations.

⇑ Corresponding author. Present address: Potsdam Research Institute for Multilingualism, University of Potsdam, Haus 2, Campus Golm, Karl-Liebknecht-Straße 24–25, 14476 Potsdam, Germany. Fax: +49 (0) 331/977 2687.
E-mail address: joao.verissimo@uni-potsdam.de (J. Veríssimo).
http://dx.doi.org/10.1016/j.jml.2014.06.001
0749-596X/© 2014 Elsevier Inc. All rights reserved.

Broadly speaking, two opposing viewpoints have been
advanced. On one side, proponents of symbol-manipulation
approaches hold that linguistic knowledge is primarily
couched in the form of operations over variables, that is,
placeholders that stand for every instance of a category
(e.g., Chomsky, 1980; Fodor & Pylyshyn, 1988; Marcus,
2001). Variables are insensitive to the idiosyncratic properties of the tokens they instantiate, and as such, allow free
and unbounded generalization to novel instances. For
example, if the rules or constraints of sentence formation
make reference to a variable ‘(V)erb’, then every lexical item
that satisfies this condition can be used in a well-formed
sentence. Likewise, if producing a progressive form involves
concatenating an instance of a variable with the appropriate
suffix (i.e., V + -ing), then this operation can be productively
extended to any novel verb.
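As a minimal sketch (not a claim about any particular grammar formalism), an operation over a variable can be written as a function that accepts any string standing in for V and never inspects its content. The function name `progressive` is hypothetical, and English spelling adjustments (e.g., e-deletion) are deliberately ignored.

```python
def progressive(verb: str) -> str:
    """Concatenate an instance of the variable V with the suffix -ing.

    The body never inspects the verb's phonology, so the operation
    extends to any novel verb exactly as it does to familiar ones.
    """
    return verb + "ing"


for v in ["walk", "sing", "ploamph"]:
    print(v, "->", progressive(v))
```

Because nothing in the function depends on the properties of its argument, a novel item such as ploamph is treated exactly like walk.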
A radically different approach to linguistic generalization is espoused by proponents of similarity-based
approaches, which here we will take to encompass a broad
class of models, including (amongst others) connectionist
and exemplar-based architectures (e.g., Daelemans, 2002;
Elman, 1993; Rumelhart & McClelland, 1986; Skousen,
Lonsdale, & Parkinson, 2002).
A distinctive feature of such models is the notion that
generalization is primarily determined by similarity. More
specifically, the higher the representational overlap
between a novel item and a set of learned instances, the
higher the probability that it will be responded to in the
same way (e.g., Hahn & Nakisa, 2000). This aspect stands
in sharp contrast to how generalization is treated in variable-based approaches. Rather than the same operation
being applied equally to all members of a category, analogical models typically produce graded and probabilistic
outcomes which reflect overlap at different levels of representation (e.g., phonological, semantic) and are influenced
by the statistical properties of previously learned input–
output pairs.
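The contrast can be made concrete with a toy exemplar-style sketch. Everything here is an assumption for illustration: the six-item lexicon is hypothetical, and "similarity" is approximated by the number of characters two forms share at the right edge, a crude stand-in for phonological overlap that does not reproduce any of the published models cited above.

```python
from collections import defaultdict

# Hypothetical toy lexicon of (stored form, response pattern); illustrative only.
LEXICON = [
    ("sing", "irregular"), ("ring", "irregular"), ("spring", "irregular"),
    ("walk", "regular"), ("talk", "regular"), ("jump", "regular"),
]


def right_overlap(a: str, b: str) -> int:
    """Count identical characters shared at the right edge of two forms."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


def response_probabilities(novel: str) -> dict:
    """Every stored exemplar votes for its pattern with weight 1 + overlap,
    so more similar exemplars pull harder; votes are normalized to sum to 1."""
    votes = defaultdict(float)
    for form, pattern in LEXICON:
        votes[pattern] += 1 + right_overlap(novel, form)
    total = sum(votes.values())
    return {pattern: v / total for pattern, v in votes.items()}


print(response_probabilities("spling"))   # pulled toward the sing/ring cluster
print(response_probabilities("ploamph"))  # no similar exemplars: an even split
```

The output is graded rather than categorical: spling is drawn toward the irregular cluster, whereas a dissimilar item like ploamph receives no default treatment at all in this toy setup. In such models, default-like behavior would have to emerge from class frequency or phonological heterogeneity (Hahn & Nakisa, 2000; Hare & Elman, 1995).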
The study of morphological generalization and processing has played an important role in this debate, particularly
with respect to the contrast between regular (e.g., walked)
and irregular (e.g., sang) past tense forms in English:
whereas the regular -ed pattern is productively extended
to new roots, in a way that appears to be insensitive to their
phonological characteristics (e.g., ploamphed; see, e.g.,
Berko, 1958; Prasada & Pinker, 1993; Ullman, 1999; but
see Albright & Hayes, 2003), irregular patterns are seldom
generalized and are applied only to novel items that phonologically resemble clusters of irregular verbs (e.g., spling,
which in analogy with verbs such as sing, can be inflected
as splang; Bybee & Moder, 1983).
The case of the English past tense clearly illustrates a
tension that is also visible in the inflectional and derivational systems of many other languages: that between
context-independence and context-sensitivity (Keuleers
et al., 2007). More specifically, because many inflectional
contrasts and word-formation processes are not applied in
the same way for each and every member of a grammatical
class (such as all verbs), morphological operations
must, at least for some items, be conditioned by lexical
information. At the same time, because some patterns can
be productively extended in an unbounded fashion
(i.e., even to novel items that are very dissimilar to existing
forms in the language), it would appear that at least
some morphological operations can behave as a default,
applying when “all else fails” (Bybee, 1995, p. 452) and in
a way that is independent of the idiosyncratic properties
of individual tokens (see, e.g., Berent, Pinker, & Shimron,
1999, for Hebrew; Marcus, Brinkmann, Clahsen, Wiese, &
Pinker, 1995, for German; Prasada & Pinker, 1993, for
English).
The balance between lexically conditioned and
productive generalizations is most explicitly incorporated
in a class of dual-mechanism models of morphology
(e.g., Clahsen, 1999; Marslen-Wilson & Tyler, 1997, 2007;
Pinker, 1999; Pinker & Ullman, 2002). According to such
models, morphological operations can either be instantiated: (1) by the application of a grammatical rule, which
operates over a variable and generates a structured representation (e.g., adding the English regular -ed affix to any
verbal root); or (2) through direct retrieval of an exceptional form (e.g., an irregular past tense, such as brought),
and in the case of generalization to novel words, via analogy
from the associations between lexically specified representations (e.g., splought as a possible inflection for spling).
In contrast, according to the class of similarity-based
models mentioned above, a single context-sensitive mechanism based on the overlap between lexical representations
is purported to be sufficient to capture both the generalizations that are similarity-based and those that are made outside of restricted areas of the similarity space. In such
models, approximation to default-like behavior is thought
to emerge naturally for those morphological patterns that
are most frequent or that display significant heterogeneity
in their phonological distributions (e.g., Hahn & Nakisa,
2000; Hare & Elman, 1995; Hare, Elman, & Daugherty,
1995). In other words, in the single-mechanism view, what
appears to be the result of an operation over a variable is in
fact the product of the same frequency- and similarity-based mechanisms that are responsible for restricted and
lexically conditioned generalizations.
In the present paper, we set out to investigate the generalization properties of conjugation classes in Portuguese, an
example of pure morphology, which we believe provides a
better test case for assessing the mechanisms involved in
morphological generalization than the familiar contrast
between regular and irregular inflection. In order to assess
the role of phonological similarity, we have used a metric
derived from a computational implementation of a similarity-based model, the Minimal Generalization Learner (MGL)
proposed by Albright (Albright, 2002a; Albright & Hayes,
2003). In addition, by minimally changing the MGL model
to embed a more explicit dual-mechanism distinction, we
were able to directly compare two specific computational
implementations in how well they matched elicited production data obtained from native speakers of Portuguese.
Linguistic background
In linguistic treatments of Portuguese morphology, the
structure of Portuguese verbs has been proposed to display
three hierarchical levels: a root constituent, a stem constituent, and a word constituent (Villalva, 2000, 2003). The root
(e.g., cant-, in the infinitive form cantar, ‘to sing’) is taken to
be the locus of all semantic, syntactic and morphological
information, and transmits this information to the stem
node above it. The stem is constituted by a root and, in the
case of verbs, a theme vowel, which is the affix that realizes
the verb’s morphological class or conjugation. In Portuguese,
verbs can belong to one of three conjugation classes, which
are commonly identified by the theme vowels that they display in the stem of infinitive forms: these are -a-, for 1st
conjugation verbs; -e-, for 2nd conjugation verbs; and -i-,
for 3rd conjugation verbs.
In most cases, the stem is the linguistic unit that functions as the base for further morphological processes, that
is, both inflectional and derivational affixes generally attach
to the stem (e.g., cant-a-r ‘to sing’, transfer-í-vel ‘transferable’). One of the few exceptions is the 1sg present indicative form which displays an affix that attaches directly to
the root (e.g., cant-o ‘I sing’). For that reason, novel 1sg
present indicative forms are ambiguous regarding class
membership and can therefore be used to gauge the generalization properties of the different conjugations.
A full form such as cantássemos, for example, will therefore be composed of four morphological constituents: the
root cant-, the theme vowel -a- for the 1st conjugation,
the inflectional affix -sse for the imperfect subjunctive,
and the affix -mos expressing person and number information. Of these four constituents, only three can be conceived
of as morphemes: the root, bearing all idiosyncratic information associated with the lexical entry; -sse, encoding
tense, mood, and aspect; and -mos, an agreement morpheme. However, the theme vowel -a- is not a morpheme
under any sensible definition, because it does not have
any meaning or function. Beyond being class markers,
theme vowels specify no further information. Such constituents are semantically and syntactically empty, and only
convey purely morphological information, namely, which
conjugation the verb belongs to.
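The decomposition above, and the reason why root-based 1sg forms are ambiguous about class membership, can be sketched as follows. The class and function names are hypothetical, and the orthography is simplified (the accent of cantássemos is dropped).

```python
from typing import NamedTuple, Optional

THEME_TO_CLASS = {"a": 1, "e": 2, "i": 3}


class VerbForm(NamedTuple):
    root: str         # carries all idiosyncratic lexical information
    theme_vowel: str  # "" when the affix attaches directly to the root
    affixes: tuple    # inflectional material following the stem or root

    def surface(self) -> str:
        return self.root + self.theme_vowel + "".join(self.affixes)


def conjugation(form: VerbForm) -> Optional[int]:
    """Class signalled by the theme vowel; None for root-based forms."""
    return THEME_TO_CLASS.get(form.theme_vowel)


cantassemos = VerbForm("cant", "a", ("sse", "mos"))  # cant-a-sse-mos
canto = VerbForm("cant", "", ("o",))                 # cant-o '(I) sing'
print(cantassemos.surface(), conjugation(cantassemos))
print(canto.surface(), conjugation(canto))
```

The theme vowel contributes nothing except the class value, and when it is absent (as in the 1sg present indicative) the class simply cannot be read off the form.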
Another important aspect of the verbal conjugation system of Portuguese is that the formal spell-out of an inflectional morpheme may differ depending on what class the
verb belongs to. Most, but not all, inflectional affixes are
the same across conjugations; however, some forms display
inflectional affixes that are sensitive to conjugation membership. For example, the past imperfect form shows two
different affixes: -va in the case of 1st conjugation verbs,
and -a, in the case of 2nd and 3rd conjugation verbs.
Summing up, conjugational stems in Portuguese display
three advantageous properties. Firstly, most stems are
combinatorial, that is, they consist of a root and a theme
vowel; therefore, all stems are ‘regular’ in the sense that
they do not need to show any phonological changes and
can be potentially segmented into their constituents.
Secondly, stems contain an empty ‘morph’, a theme vowel
that expresses nothing except morphological information.
Thirdly, stems define inflectional classes, that is, they
select particular affixes. Therefore, rather than being a
mapping between a phonological form and a meaning (or
morphosyntactic feature), theme vowels and stems determine the mappings of sound to meaning (for example, that
the past imperfect is expressed by the affixes -va or -a).
Because of these properties, stems and inflectional classes
constitute purely morphological concepts (see Aronoff,
1994), and therefore, they can be used as a very
clear-cut and unconfounded test case of morphological
generalization.
In addition, conjugations in Portuguese display a striking
discrepancy between the 1st conjugation, on the one hand,
and the 2nd and 3rd conjugations, on the other. In the
Portuguese verb lexicon, the 1st conjugation is the most
productive class. For example, a count of type frequencies
in a lexical database of Portuguese, created from a corpus
of 16,210,438 words (Bacelar do Nascimento et al., 2000),
showed a predominance of 1st conjugation verbs in both
the whole corpus (3,396 1st conjugation, 380 2nd conjugation, and 348 3rd conjugation verbs) and amongst the verbs
with the lowest lemma frequency (0.37 per million; 123 1st
conjugation verbs, but only 10 2nd or 3rd conjugation
verbs). More importantly, the formation of 1st conjugation
stems in Portuguese (as well as in other Romance languages) qualifies as a default process according to the criteria laid out in Marcus et al. (1995), that is, the 1st
conjugation exhibits unrestricted productivity in that it
can apply to foreign borrowings, onomatopoeias, denominal verbs, etc. Consequently, novel words that enter the language are always assigned to the 1st conjugation (e.g.,
blogar ‘to blog’).
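The type counts cited above can be turned into proportions to make the asymmetry explicit. This uses only the figures given in the text. Note that the three counts sum to 4,124, the number of lemmatized verbs in the same lexicon.

```python
# Type counts reported above (Bacelar do Nascimento et al., 2000 lexicon).
counts = {"1st": 3396, "2nd": 380, "3rd": 348}
total = sum(counts.values())
shares = {conj: n / total for conj, n in counts.items()}

# Lowest lemma-frequency band (0.37 per million): 123 1st vs. 10 2nd/3rd verbs.
low_band_share_1st = 123 / (123 + 10)

print(total)
print({conj: round(s, 3) for conj, s in shares.items()})
print(round(low_band_share_1st, 3))
```

About 82% of all verb types belong to the 1st conjugation, rising to about 92% in the lowest-frequency band.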
Psycholinguistic models of stem formation
From a psycholinguistic perspective, the above considerations raise the question of how stems and inflectional classes are mentally represented and generalized to novel
forms. We can think of three general possibilities, which
differ in the role ascribed to context-sensitive mechanisms
or to variables referring to grammatical categories.
Following linguistic treatments of Portuguese morphology (Villalva, 2000, 2003), according to which verbal stems
of all three inflectional classes constitute the output of morphological rules that join a root and a class marker, one
would expect all conjugations to be generalized in the same
manner, that is, in a way that is insensitive to phonological
characteristics of the root. Therefore, whilst the differences
in the productivity of the three conjugation classes are indeed
acknowledged in traditional linguistic treatments of Portuguese, no explanation for that discrepancy is provided.
A second possibility is a dual-mechanism account of
stem formation, as was proposed by Say and Clahsen
(2002) for Italian verbal conjugations. According to this
account, 1st conjugation stems are generated by a stem-formation rule that applies to any verbal root, accounting
for its unbounded productivity, whilst 2nd and 3rd conjugation stems have to be stored in the lexicon and block the
application of the general stem-formation rule.
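A minimal sketch of the blocking logic, assuming a hypothetical lookup table of stored stems. The two stored entries correspond to the real verbs vender ‘to sell’ (2nd conjugation) and partir ‘to leave’ (3rd conjugation). The sketch leaves out the similarity-based route by which such accounts allow novel items to be analogized to stored 2nd/3rd conjugation stems.

```python
# Hypothetical stored stems for two 2nd/3rd conjugation verbs (root -> stem).
STORED_STEMS = {"vend": "vende", "part": "parti"}


def form_stem(root: str) -> str:
    """Stored 2nd/3rd conjugation stems block the general rule;
    any other root receives the default theme vowel -a-."""
    stored = STORED_STEMS.get(root)
    if stored is not None:
        return stored
    return root + "a"  # stem-formation rule over the variable 'verb root'


def infinitive(root: str) -> str:
    return form_stem(root) + "r"


for root in ["vend", "part", "cant", "blog"]:
    print(root, "->", infinitive(root))
```

The default branch applies to any root without a stored entry, which is why a borrowing such as blog comes out as blogar without reference to its phonology.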
This account has been tested for Portuguese by
Veríssimo and Clahsen (2009) using a morphological
cross-modal priming experiment, in which infinitive forms
belonging to the 1st (e.g., limit-a-r ‘to limit’) or the 3rd conjugation (e.g., resist-i-r ‘to resist’) primed root-based present
tense indicative forms (e.g., limit-o, resist-o). The results
showed that only 1st conjugation infinitives produced as
much facilitation as identity primes, which was interpreted
as support for a dual-mechanism account along the lines of
Say and Clahsen (2002).
With respect to generalization properties, a dual-mechanism account would predict differences between 1st and
2nd/3rd conjugation stems: 1st conjugation forms should
be more widely generalized and unaffected by analogies
to existing words, because they are the result of a stem-formation operation over a variable, whilst 2nd or 3rd conjugation forms should reveal graded effects of phonological
similarity.
A third possibility is that the conjugational stems of
Romance languages do not have any internal morphological
structure, and that generalization of all stems is performed
by a frequency- and similarity-driven mechanism that is
sensitive to phonological overlap. For Italian, there are several different computational implementations of these
assumptions: Eddington’s (2002) exemplar-based model,
Colombo, Stoianov, Pasini, and Zorzi’s (2006) connectionist
network of massively interconnected units, and Albright’s
(2002a) rule-based MGL.
Eddington’s (2002) exemplar model is an implementation of the Analogical Modelling of Language (AML) framework developed by Skousen (1992) and Skousen et al.
(2002). In AML, the probability of a particular output is
obtained by first searching lexical memory for the entries
that are most similar to the given context (i.e., to an input
form). The members of the database are then grouped into
subcontexts containing forms that share phonological constituents with the input (e.g., the last consonant or the
nucleus vowels). Those lexical items that share more features with the elicitation form (i.e., that belong to more subcontexts) and that display cohesive change patterns have a
larger influence on the probability that a particular output
is chosen. Therefore, such properties make Eddington’s
model highly sensitive to analogical effects of phonological
similarity, especially those arising from particularly dense
neighborhoods. Eddington argued that this model accurately simulates the human data presented by Say and
Clahsen (2002) (see below for discussion).
Another computational implementation of analogical
principles has been proposed by Colombo et al. (2006),
who trained two connectionist networks (both with input,
hidden, and output layers) to produce Italian participles
from different inflectional variants. In these networks,
inflected forms are represented in a distributed fashion
across a set of phonological units and learned connection
weights mediate the activation of phonological outputs.
Because of these properties, the networks generalize conjugation membership through a graded and analogical mechanism, that is, novel roots are assigned to a conjugation on
the basis of their phonological overlap with similar existing
verbs. Colombo et al. (2006) performed an elicited production task with adult native speakers of Italian, the results of
which they argued could be accurately simulated by their
model.
Finally, an influential account of conjugation assignment
has been proposed by Albright (2002a), also for Italian, in
the form of a computational implementation: the MGL
model. Like analogical models of morphology, the MGL
model generalizes on the basis of phonological similarity,
but unlike them it does so by extracting rules that incorporate both variables and restricted phonological contexts. The algorithm
proposed in this model takes pairs of morphologically
related words as input and, for each pair, posits a morphophonological rule that describes the mapping. For example,
conjugation assignment in Portuguese can be represented
as a mapping between a 1sg present indicative form, which
does not contain a theme vowel (and displays the same
inflectional affix -o in all conjugations) and an infinitive
form ending in -ar, -er and -ir, in which the theme vowels
distinguish the three verb classes. The rules that the model
extracts for each input–output pair have the form “change
A into B, in the presence of C and D”, where C and D are
the phonological environments on the left- and right-hand
side of the change, respectively. The first rules that are
learned are word-specific, that is, they can apply only to a
single input form. For example, the rule that relates the
1sg form [fiku] (fico ‘(I) stay’) to its 1st conjugation infinitive
[fikar] (ficar ‘to stay’) could be described as “change [u] into
[ar], in the environment [fik]”. These word-specific rules
cannot be used for generalization and are indistinguishable
from a direct lexical association.
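The form of a word-specific rule can be sketched as a small record with the four parts A, B, C and D. This is a simplifying sketch, not the MGL implementation. It assumes that the change sits at the right edge (as with the suffixal 1sg/infinitive mapping), so C is the longest shared material at the left edge and D is empty. Transcriptions are single characters per segment, and all names are hypothetical.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    change_from: str  # A
    change_to: str    # B
    left: str         # C, environment to the left of the change
    right: str        # D, environment to the right of the change


def word_specific_rule(sg1: str, infinitive: str) -> Rule:
    """Posit 'change A into B in the environment C __ D' for one pair.

    Simplifying assumption: the change is at the right edge (suffixal),
    so C is the longest shared left-edge material and D is empty.
    """
    i = 0
    while i < min(len(sg1), len(infinitive)) and sg1[i] == infinitive[i]:
        i += 1
    return Rule(sg1[i:], infinitive[i:], sg1[:i], "")


def applies(rule: Rule, form: str) -> bool:
    """A word-specific rule matches only the exact form it was built from."""
    return form == rule.left + rule.change_from + rule.right


r = word_specific_rule("fiku", "fikar")
print(r)
print(applies(r, "fiku"), applies(r, "tiku"))
```

Because its environment is the entire word, the rule matches [fiku] and nothing else, which is why such rules behave like stored associations.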
However, by comparing different rules with the same
structural change, the model then posits ‘generalized’ rules
that can encompass the phonological contexts in both rules.
In order to do that, the algorithm first preserves the common phonemes in both contexts; then, when different phonemes are found, it maintains whatever phonological
features these phonemes have in common; and finally, if
there is additional unmatched material, it is substituted
by a variable. For example, comparing the word-specific
rule that relates [fiku] and [fikar] to the rule that relates
[ʃtiku] (estico ‘(I) stretch’) and [ʃtikar] (esticar ‘to stretch’)
would lead the algorithm to extract a ‘generalized’ rule that
changes [u] into [ar], in the presence of certain left-side
phonological material: the common phonemes [ik], preceded by a featural description that encompasses the two
different phonemes, [f] and [t] (i.e., non-sonorant unvoiced
consonants), together with a variable that can match any
extra material (in this case, because the remaining
phoneme [ʃ] in [ʃtiku] was not already covered by the
phonological description). By iterating the process (i.e., by
comparing all word-specific rules both to each other and
to all ‘generalized’ rules), additional rules are extracted that
match increasingly wider phonological environments. In
the case of morphological transformations that apply to sets
of forms with some degree of phonological heterogeneity
(rather than clustering into relatively constrained phonological environments), even a context-free rule can be discovered (i.e., “change [u] into [ar], in the presence of x”).
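The three-step comparison described above can be sketched for the left-hand environments of the [fiku] and [ʃtiku] rules. This is a sketch under strong assumptions. The feature table is hypothetical and reduced to the two segments being compared. Segments are single characters. Only the left environment is generalized, and the comparison is done only once rather than iterated over the whole rule set.

```python
# Hypothetical, drastically reduced feature specifications for the two
# segments compared in the example; a real model uses a full feature chart.
FEATURES = {
    "f": {"son": "-", "voice": "-", "cont": "+", "place": "labial"},
    "t": {"son": "-", "voice": "-", "cont": "-", "place": "coronal"},
}


def generalize_left(c1: str, c2: str):
    """Minimally generalize two left-hand environments of the same change.

    1. Keep the segments the two contexts share next to the change.
    2. At the first mismatch, keep only the features both segments share.
    3. Replace any leftover material with a variable.
    """
    n = 0
    while n < min(len(c1), len(c2)) and c1[-1 - n] == c2[-1 - n]:
        n += 1
    shared = c1[len(c1) - n:]
    rest1, rest2 = c1[:len(c1) - n], c2[:len(c2) - n]

    feature_class = None
    if rest1 and rest2:
        f1 = FEATURES.get(rest1[-1], {})
        f2 = FEATURES.get(rest2[-1], {})
        feature_class = {k: v for k, v in f1.items() if f2.get(k) == v}
        rest1, rest2 = rest1[:-1], rest2[:-1]

    has_variable = bool(rest1 or rest2)
    return has_variable, feature_class, shared


# Left environments of the word-specific rules for [fiku] and [ʃtiku].
print(generalize_left("fik", "ʃtik"))
```

The result corresponds to the generalized rule in the text: the shared [ik], preceded by a class of non-sonorant, voiceless segments, preceded in turn by a variable that absorbs the unmatched [ʃ].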
The resulting grammar therefore contains many redundant rules, which cover the phonological space in both general and specific ways. For example, in Portuguese, the MGL
model derives the following 1st conjugation rules, with the
first of these necessarily subsumed under the second
(because the phoneme [i] in the first rule matches the featural description in the second rule):
(1) u → ar / [x fik ___]
(2) u → ar / [x [+cont, −nas, −lat] k ___]
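Rules (1) and (2) can be expressed as predicates over 1sg forms, which makes the subsumption relation checkable. This is a sketch only. The feature values are a hypothetical reduced table for the segment before [k], the transcriptions are simplified, and [anku] and [alku] are made-up nonce forms included to show rule (2) failing.

```python
# Hypothetical reduced feature values for the segment preceding [k].
FEATURES = {
    "i": {"cont": "+", "nas": "-", "lat": "-"},
    "n": {"cont": "-", "nas": "+", "lat": "-"},
    "l": {"cont": "+", "nas": "-", "lat": "+"},
}
RULE2_CLASS = {"cont": "+", "nas": "-", "lat": "-"}


def rule1_applies(sg1: str) -> bool:
    """(1) u -> ar / [x fik ___]"""
    return sg1.endswith("fiku")


def rule2_applies(sg1: str) -> bool:
    """(2) u -> ar / [x [+cont, -nas, -lat] k ___]"""
    if len(sg1) < 3 or not sg1.endswith("ku"):
        return False
    feats = FEATURES.get(sg1[-3], {})
    return all(feats.get(k) == v for k, v in RULE2_CLASS.items())


forms = ["fiku", "ʃtiku", "anku", "alku"]  # the last two are nonce forms
print({f: (rule1_applies(f), rule2_applies(f)) for f in forms})
# Subsumption: every form that rule (1) matches is also matched by rule (2).
print(all(rule2_applies(f) for f in forms if rule1_applies(f)))
```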
In the MGL model, competition between rules is
resolved not by their specificity (Kiparsky, 1973), but by
evaluating their reliability, a measure of a rule’s ‘success’
(e.g., Albright, 2002b). In particular, the reliability of a rule
is the number of forms a rule can derive correctly (the rule’s
‘hits’) divided by the number of forms that the rule can be
applied to (its ‘scope’; reliabilities are then adjusted using
a lower confidence limit, which penalizes rules with smaller
scopes; see Mikheev, 1997). For example, rule (2) above can
apply to 155 verbs in a lexicon of Portuguese, and derives
the right infinitive (i.e., correctly assigns to the 1st conjugation) for 101 of them, which leads to an adjusted reliability
score of .625 (slightly less than the raw 65.2%, due to the
adjustment). In contrast, rule (1) derives 48 out of 48 verbs
correctly (e.g., ficar ‘to stay’; verificar ‘to verify’); because
every Portuguese verb root that ends in [fik] forms an infinitive in -ar (i.e., it is a 1st conjugation verb), this rule has a
very high adjusted reliability of .979 and takes precedence
over the more general rule in (2).
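The reliability arithmetic can be sketched as follows. The exact adjustment is an assumption here. The proportion is smoothed to (hits + 0.5) / (scope + 1), and z standard errors are then subtracted to obtain a lower confidence limit. With z ≈ 0.67 (roughly a one-sided 75% limit), this reproduces the .625 reported for rule (2). For rule (1) it gives about .98 rather than exactly .979, so the original implementation presumably differs in some detail.

```python
import math


def raw_reliability(hits: int, scope: int) -> float:
    """Proportion of the forms in a rule's scope that it derives correctly."""
    return hits / scope


def adjusted_reliability(hits: int, scope: int, z: float = 0.6745) -> float:
    """Lower confidence limit on reliability, penalizing small scopes.

    Assumed form: smooth the proportion as (hits + 0.5) / (scope + 1),
    then subtract z standard errors. The value of z is an assumption.
    """
    p = (hits + 0.5) / (scope + 1)
    return p - z * math.sqrt(p * (1 - p) / scope)


print(round(raw_reliability(101, 155), 3))        # rule (2), raw
print(round(adjusted_reliability(101, 155), 3))   # rule (2), adjusted
print(round(adjusted_reliability(48, 48), 3))     # rule (1)
print(round(adjusted_reliability(1, 1), 3))       # a perfect rule with scope 1
```

The last line illustrates the penalty on small scopes. A rule that is correct for its only form has the same raw reliability as rule (1) (1.0), but a much lower adjusted score.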
The MGL model therefore incorporates both variables
and similarity as part of its mechanisms for morphological
generalization. However, Albright’s (2002a) proposal can
be crucially distinguished from the ‘standard’ dual-mechanism model in two important ways. Firstly, the MGL model
is similarity-driven in that it preserves all phonologically
specific rules, instead of a single context-free one. That is,
the generalizations learned by the MGL algorithm are phonologically restricted for all conjugation classes (even when
the extracted rules incorporate variables, as in examples 1
and 2 above) and many of these phonologically-sensitive
rules will take precedence over a context-free rule that
applies unboundedly to a grammatical category. In contrast,
a dual-mechanism model of stem formation, such as the
one proposed by Say and Clahsen (2002) and Veríssimo
and Clahsen (2009), predicts that context-free and phonologically restricted generalizations align with conjugation
membership: only the generalization of 2nd and 3rd (but
not 1st) conjugation stems should display sensitivity to
phonological properties.
Secondly, in the MGL model, the ordering of rules by a
reliability metric is inherently input-driven, that is, it
reflects the predictive power of each phonological environment as indicative of conjugation membership. This is the
case even for any context-free rule extracted by the model,
which will still be ranked according to its success. In contrast, in a dual-mechanism approach, the 1st conjugation
in Romance languages is the default class: rather than having its rank determined by statistical and distributional
properties, the default has unlimited applicability.
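The reliability-ordering side of this contrast can be sketched as a simple "most reliable applicable rule wins" selection. This is the simplest reading of precedence, not necessarily how the MGL implementation derives response proportions. The two rules, their environments and their reliabilities are hypothetical, and [trebu] and [ploamfu] are nonce forms. The point is that even a context-free -ar rule sits on the same scale and can be outranked, whereas in a dual-mechanism model the default would not be ranked in this way at all.

```python
from typing import Callable, List, NamedTuple, Optional


class ScoredRule(NamedTuple):
    name: str
    ending: str                     # infinitive ending the rule assigns
    applies: Callable[[str], bool]  # does the environment match the 1sg form?
    reliability: float


def most_reliable_ending(sg1: str, rules: List[ScoredRule]) -> Optional[str]:
    """Reliability ordering: the best applicable rule wins, and a
    context-free rule competes on the same scale as every other rule."""
    candidates = [r for r in rules if r.applies(sg1)]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r.reliability).ending


# Hypothetical rules and reliabilities, invented for illustration only.
RULES = [
    ScoredRule("context-free -ar", "ar", lambda f: f.endswith("u"), 0.80),
    ScoredRule("specific -er", "er", lambda f: f.endswith("ebu"), 0.90),
]

print(most_reliable_ending("trebu", RULES))    # the specific rule outranks -ar
print(most_reliable_ending("ploamfu", RULES))  # only the context-free rule applies
```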
These contrasts between Albright’s (2002a) MGL and the
dual-mechanism proposals of Say and Clahsen (2002) and
Veríssimo and Clahsen (2009), together with the fact that
specific quantitative predictions can be derived from the
MGL computational implementation, make stem formation
in Romance languages an interesting test case for the wider
theoretical questions regarding the role of context-free and
context-sensitive generalizations in morphology.
As support for his proposal, Albright (2002a) conducted
an acceptability judgement study in Italian and found that
mean acceptabilities of novel infinitives belonging to the
1st, 2nd or 3rd conjugations were positively correlated with
MGL rule reliabilities, suggesting that, in Italian at least,
speakers are sensitive to the phonological shape of a nonce
word when generating stems belonging to all three
conjugations.
The specific question we will be addressing in Study 1 is
whether the generalization of 1st, 2nd and 3rd conjugation
stems in Portuguese displays sensitivity to the phonological
properties of novel roots. In order to assess this, we conducted an elicited production experiment in which participants were presented with novel roots that were
constructed such that they fell into a range of reliability values and associated phonological environments. If generalization of all conjugational stems can be appropriately
described by a context-sensitive mechanism, then proportions of infinitives in -ar, -er and -ir should be influenced
by this variation in phonological properties.
In Study 2, we will directly compare the proportions of
participant responses belonging to the different conjugations to the predicted proportions of the MGL implementation. Furthermore, the MGL model will be contrasted with a
minimally different computational model that implements
an explicit distinction between context-free and similarity-based generalizations for the different verbal conjugations of Portuguese.
Study 1: Similarity effects in the generalization of inflectional classes
To examine similarity effects in Portuguese, we followed
the same steps as Albright did for Italian. We first applied
the MGL algorithm to a large lexical database of European
Portuguese. When the MGL implementation was run over
the verbs in this database, the model extracted morphophonological rules and corresponding reliability scores reflecting similarity clusters or – in Albright’s terms – ‘islands of
reliability’ within the three inflectional classes. Using these
reliability scores, we then constructed a set of novel verbs
and tested their generalization properties in an elicited production task with a group of native speakers of Portuguese.
Finally, in order to determine similarity effects, we tested
whether participant performance was predicted by the
model’s reliability scores.
Participants were presented with root-based 1sg present
indicative forms of novel verbs, and asked to produce a
stem-based infinitive form. Because the 1sg present indicative is constituted by a verbal root coupled with an inflectional affix that is the same for all conjugations (-o), the
presented form was ambiguous regarding conjugation
membership. However, infinitive forms are constituted by
a verbal root and a theme vowel (i.e., the stem), to which
the infinitival affix (-r) attaches. Therefore, the elicitation
of an infinitive form requires assigning a conjugation class
to the novel verbal roots.
By manipulating phonological similarity in a graded,
continuous manner, it is possible to ascertain whether the
generalization of the three conjugation classes is sensitive
to contextual properties of novel roots, or instead, based
on an operation over an unbounded class, that is, a variable
such as ‘(V)erb root’. Therefore, the experiment allows us to
contrast the similarity-based and dual-mechanism models
that have been proposed for the generalization of stems in
Romance languages.
If the generalization of Portuguese verbal stems can be
captured by Albright’s (2002a) similarity-driven model, as
has been proposed for Italian, we would expect to find that
the model’s reliability scores predict proportions of
responses belonging to all three conjugations. In other
words, we should find evidence that the generalization of
1st, 2nd and 3rd conjugations is sensitive to the phonological properties of novel roots and displays ‘gang’ effects,
even when similarity to the competing classes is factored
out.
In contrast, if conjugational stems of Romance languages
are represented in a dual architecture, we should find a dissociation between the generalization of the three conjugation classes. If 1st conjugation stems constitute structured
representations, then they should be generalized on the
basis of an operation over a variable, which potentially
encompasses any novel verbal root. Therefore, we expect
the proportions of novel 1st conjugation responses to be
insensitive to variation in phonological properties, once
similarity to the competing classes is factored out. At the
same time, if 2nd and 3rd conjugation stems are represented in a format that supports a similarity-based relation
to their roots, we might expect them to be susceptible to
graded similarity-based generalizations, as captured by
the reliability scores of the MGL algorithm.
Method
Participants
Fifty-four adult native speakers of European Portuguese
(mean age: 25.3, 30 males) participated in the elicited production experiment. All of the participants were from mainland Portugal, had completed at least 12 years of schooling,
had normal or corrected-to-normal vision and were naive
with respect to the purpose of the experiment. None of
them had ever experienced language or literacy-related difficulties. Participants were randomly assigned to one of two
experimental versions, 32 to one version, and 22 to the
other (see Procedure below).
Simulation
The first step in constructing the materials was to build an implementation of the MGL learning algorithm
described by Albright and Hayes (2002), which was programmed in Visual Basic.1 The model’s input was selected
from a large lexical database, the Léxico Multifuncional
Computorizado do Português Contemporâneo (Bacelar do
Nascimento et al., 2000), a frequency lexicon created from a
corpus of 16,210,438 words of modern Portuguese. Starting
with the 4,124 lemmatized verbs in Bacelar
do Nascimento et al.’s frequency lexicon, we selected those
verbs that had a lemma frequency of 1 per million or higher
(3,543 verbs). In addition, because the target form for both
the model and the participants was the infinitive, verbs
whose infinitive form did not appear in the corpus were
excluded from the model’s input. This resulted in a set of
3,117 verbs, each of these represented by a pair of forms:
1sg present indicative (which does not contain a theme
vowel) and infinitive.
1 We have also programmed an equivalent (but more efficient) implementation in R, which is available for download at http://software.jverissimo.net.
For all verbs in the resulting set, both forms were
encoded in phonetic transcription, following standard European Portuguese pronunciation, and in particular, reflecting
the pronunciation variety that is more common in Lisbon
(the region where most of the participants in this study
lived). The inventory of Portuguese consonants and vowels
was taken from Mateus and d’Andrade (2000, pp. 29–30),
albeit with several modifications: we only employed phonemes that occur in European Portuguese, excluded glides
(which have the same featural specification as their corresponding vowels), and excluded alternative realizations.
For the resulting set of phonemes, we constructed a matrix
of distinctive features, also from the work of Mateus and
d’Andrade, excluding only the phonological features that
contained redundant information in terms of the MGL
generalization algorithm. This resulted in a matrix of 33
phonemes by 13 features, with feature values encoded as
+, −, or 0 (i.e., unspecified).2
When the MGL implementation was run over the 3,117
pairs of 1sg present indicative and corresponding infinitive
forms in the database, it extracted many morphophonological rules: first, a set of word-specific rules, one for each of the
input pairs, and then a set of 6,389 ‘generalized’ rules
obtained from the iterative comparison of the word-specific
rules. For each of the ‘generalized’ rules extracted by the
model, the type reliability ratio was computed, that is, the
number of verbs that undergo the particular morphological
transformation divided by the number of all verbs that contain the relevant phonological context.3 Reliability scores
were then adjusted using a lower confidence limit of 75%,
which was the value used in Albright’s (2002a) simulation
for Italian. Rules were then sorted in descending order by
their adjusted reliabilities.
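The reliability computation described above can be sketched as follows. The raw score is the type-based ratio given in the text. The adjustment is an assumption on our part: the sketch uses a Mikheev-style lower confidence limit (a smoothed proportion minus a one-sided normal quantile times its standard error), which may differ in detail from the formula used in Albright and Hayes’s implementation. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def raw_reliability(hits: int, scope: int) -> float:
    """Type reliability: verbs the rule derives correctly / verbs matching its context."""
    return hits / scope


def adjusted_reliability(hits: int, scope: int, confidence: float = 0.75) -> float:
    """Assumed lower-confidence-limit adjustment (not necessarily the MGL's exact formula)."""
    # Smoothed proportion, so that small scopes are not pinned at exactly 0 or 1
    p = (hits + 0.5) / (scope + 1)
    # Estimated sampling variance of that proportion
    var = p * (1 - p) / scope
    # One-sided normal quantile: confidence = 0.5 gives z = 0 (no extra penalty)
    z = NormalDist().inv_cdf(confidence)
    return p - z * sqrt(var)


# A narrow rule (5/5) is penalized more than a broad one (50/50)
print(round(adjusted_reliability(5, 5), 3), round(adjusted_reliability(50, 50), 3))
```

Under this sketch, raising the confidence level lowers the adjusted score of narrow rules more than that of broad ones, which is why the choice of confidence limit can reorder the rules.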
Materials
We constructed 78 novel verbs on the basis of the MGL
rules and reliability values (listed in the Appendix). Novel
verbs contained specific phonological environments that
the MGL model identified as constituting particularly reliable contexts for one or more inflectional classes. In order
to achieve this, we first selected rules that encompass
‘islands of reliability’ for the different conjugations. Following Albright (2002a), we wanted to use correlation statistics
to test how well the adjusted reliabilities of the rules discovered by the model predict the conjugation class of the
forms produced by our participants. Therefore, we selected
not only rules with high and low adjusted reliabilities for a
given class, but also rules that span a wide range of intermediate reliability values for all three conjugations.
2 Phonemes excluded from Mateus and d’Andrade’s (2000, pp. 29–30) descriptive table were [ʧ], [ʤ], [v], [ɫ], [R], [j] and [w]. Features excluded were [laryngeal], [height] and [±round].
3 This calculation of reliabilities takes only type frequencies into account. In previous comparisons of type- and token-based reliability measures in Italian and English, Albright (2002a, 2002b) showed that type measures display greater correlations with human judgements, especially for the 1st conjugation in Italian.
The next step was to create novel verbs constituted by a
root together with the 1sg present indicative suffix -o
(pronounced [u]), such that the novel roots contained phonological material that matched the conditions of application of
selected rules. One problem that arises when creating novel
forms is that there are several rules that can potentially apply
to any given form. Following Albright (2002a), we assumed
that, for each given novel 1sg form, only the most reliable
rule for each conjugation is relevant for the acceptability of
the resulting infinitive form. Therefore, if we wanted a novel
form to tap into a lower reliability ‘island’, care was taken to
ensure that the form could not be covered by any other rule
that was associated with a higher reliability value. In other
words, the rules that were used to create the novel words
were the most reliable ones that could apply to them.
Note that because novel forms match different rules for
each conjugation, each of the items used in the present
study is associated with three reliability values, which can
vary independently. A novel form’s reliability for a conjugation corresponds to the value of the most reliable rule that
assigns it to the 1st, 2nd or 3rd conjugation. In other words,
the most reliable -ar, -er, and -ir rules that can apply to a
novel form define its similarity to a conjugation class.
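The mapping from rules to an item’s three reliability values can be illustrated with a small sketch. As a simplifying assumption, each rule’s phonological context is represented here as a plain string that the novel root must end in (an empty string standing in for a context-free rule), whereas MGL contexts are actually stated over distinctive features. The Rule type, the example rules and their values, and the default of 0.0 when no rule applies are all hypothetical.

```python
from typing import Dict, List, NamedTuple


class Rule(NamedTuple):
    """Hypothetical, simplified stand-in for an MGL rule."""
    root_suffix: str    # context the root must end in ("" = context-free)
    conj: str           # infinitive ending the rule outputs: "ar", "er" or "ir"
    reliability: float  # adjusted reliability of the rule


def item_reliabilities(root: str, rules: List[Rule]) -> Dict[str, float]:
    """For each conjugation, keep the reliability of the most reliable applicable rule."""
    best = {"ar": 0.0, "er": 0.0, "ir": 0.0}  # assumed default when no rule applies
    for rule in rules:
        if root.endswith(rule.root_suffix):
            best[rule.conj] = max(best[rule.conj], rule.reliability)
    return best


# Illustrative rules only; these are not rules or values from the study
RULES = [
    Rule("", "ar", 0.60),
    Rule("k", "ar", 0.80),
    Rule("uz", "er", 0.30),
    Rule("duz", "ir", 0.90),
]
print(item_reliabilities("praduz", RULES))  # {'ar': 0.6, 'er': 0.3, 'ir': 0.9}
```

Because each conjugation takes its own maximum, the three values can vary independently, as the text notes: the same root can sit in a high-reliability island for one class and a low one for another.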
Further considerations that were taken into account
when building the novel forms were that they should sound
as natural as possible in Portuguese and that they should
not contain existing verbs within them.4
Since we were most interested in testing for graded similarity effects in the 1st conjugation, we made sure that the
78 items fell into an ‘island’ of reliability for this class. However, because the MGL model also discovers a context-free
rule for 1st conjugation infinitives (albeit one that is ranked
lower than the more reliable phonologically restricted
rules), there is necessarily a lower boundary to the reliability values for the 1st conjugation, limiting their variance
and range in the experimental materials. Another important
constraint in the reliabilities of the nonce forms is that, due
to the phonological distribution of 2nd and 3rd conjugation
verbs into relatively tighter ‘islands’ than in the case of 1st
conjugation verbs, it is difficult to create materials with
medium reliability values for the non-default conjugations
(see descriptive statistics in the Appendix).
In order to minimize these problems, we attempted to
create forms that covered the whole reliability space as much
as possible, for all classes, but especially for the 1st conjugation. The resulting set of items displays a greater range and
standard deviation (SD) in their reliabilities than the items
used by Albright (2002a) in Italian (1st conj. range: .534 vs.
.230; SD: .178 vs. .085), thereby providing enough power to
detect continuous and graded effects of similarity.
4 This last consideration was followed to the extent possible, but the features and segments specified by some of the MGL rules, together with the phonological well-formedness constraints of Portuguese, necessarily yield forms that contain existing verbs (e.g., many verb roots ending in [duz] belong to the 3rd conjugation, but any novel 1sg form that falls into this ‘island’ will necessarily contain the 1st conjugation form uso [uzu] ‘I use’). In the very few cases in which this was unavoidable (5 items), the possibility of a direct association to a memorized verb was reduced by ensuring that the existing form did not occur as a separate syllable (e.g., by creating a diphthong).
In addition, besides testing whether reliability scores
predict probabilities of production, we also wanted to
assess the MGL model’s success in predicting each
response’s relative acceptability. To this end, we included
items for which the most reliable rule outputs a 2nd or
3rd conjugation form (12 items for 2nd and 12 for 3rd
conjugation). These 24 items also fall into phonological
‘islands’ for the 1st conjugation, but because they contain
phonological properties that are highly predictive of membership in the 2nd or 3rd conjugations, assigning them to
one of the non-default classes is rated by the MGL model
as the most acceptable alternative. Unlike in Albright’s
(2002a) experiment in Italian, we made sure that such
items were also included so that we could specifically
examine the proportions of 1st conjugation responses in
cases of high similarity to the other classes.
Procedure
Novel verbs were presented in written form in a paper
booklet, and participants were asked to perform an open
response sentence completion task. The first page of the
booklet detailed the instructions for the task. The experiment
was introduced as a study about how new words enter everyday language, and participants were informed that they
would be asked to transform verbs they had never heard
before. Each novel form was presented only once and was
embedded in a frame ‘conversation’ consisting of two well-formed sentences. There were 78 ‘conversations’, one for
each of the experimental items. The first sentence presented
the novel verb inflected for the 1sg present indicative, in bold
type. The second sentence created a syntactic context that
required an infinitive form, but contained a blank space in
place of the main verb, as in the following example:
(3) Quase sempre acuo sozinho.
Mas amanhã vou _______ acompanhado.
‘I almost always acuo alone.
But tomorrow I will ________ with someone.’
Before the experimental sentences were introduced, participants were presented with four example ‘conversations’
that were similar in every respect to the experimental
frames except that they contained existing verbs and that
the underlined space was already filled with an infinitive
form in bold font. The verbs used in these introductory
frames were the most frequent verbs in Portuguese that display each of the three possible infinitive theme vowels (and
the verb pôr ‘to put’, which is considered not to have a
theme vowel in the infinitive). In order to reduce the influence of metalinguistic knowledge and elicit more natural
responses, no mention was made of conjugation classes,
nor that responses should be given in the infinitive.
Participants were instructed to read the first sentence in
each of the experimental frames and to complete the second sentence by filling in the blank spaces with a form that
they considered appropriate. Participants were encouraged
to read every frame ‘conversation’ and carefully consider
their responses. However, it was also emphasized that there
were no right or wrong answers, and that they should rely
on their intuitions by completing the spaces using the form
that “sounded best” to them.
Experimental sentences were constructed so that the
contexts for the novel verbs did not elicit any obvious
semantic associations with existing verbs. This reduced
the possibility that effects from similarity to real verbs were
based on any semantic properties, rather than exclusively
based on phonological similarity, which was the crucial
independent variable.
Two versions of the task were constructed. The sentences were presented in the same order in both versions,
but the order of the novel verbs was pseudo-randomised
in each version. Therefore, because not all participants
saw the same novel verb in the same sentence, the possibility of conjugation assignments reflecting associations created by particular sentences was further reduced.
The 24 items for which the MGL model prefers a 2nd or
3rd conjugation form were scattered throughout the task,
with 14 items appearing in the first half of the booklet in
one of the versions, and in the second half of the booklet
in the other. In addition, there were no more than 2 of these
items presented sequentially in any of the versions. This
was done to ensure that participant responses were diverse
enough during the task, without several items eliciting the
same response in a sequence.
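One simple way to satisfy the ordering constraint described above (no more than two of the 24 critical items in a row) is rejection sampling: shuffle, check the longest run of critical items, and reshuffle if needed. This is only an illustrative sketch; the text does not say how the two versions were actually generated, and the half-booklet balancing across versions is not implemented here. The function names and item IDs are hypothetical.

```python
import random
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")


def max_run(seq: Iterable[T], is_critical: Callable[[T], bool]) -> int:
    """Length of the longest run of consecutive critical items."""
    longest = current = 0
    for item in seq:
        current = current + 1 if is_critical(item) else 0
        longest = max(longest, current)
    return longest


def pseudo_randomize(items: Iterable[T], is_critical: Callable[[T], bool],
                     max_allowed: int = 2, seed: Optional[int] = None,
                     max_tries: int = 10_000) -> List[T]:
    """Shuffle until no more than max_allowed critical items appear consecutively."""
    rng = random.Random(seed)
    order = list(items)
    for _ in range(max_tries):
        rng.shuffle(order)
        if max_run(order, is_critical) <= max_allowed:
            return order
    raise RuntimeError("no order satisfying the run constraint was found")


# Hypothetical item IDs 0-77; IDs 0-23 stand for the 24 items favouring -er/-ir
order = pseudo_randomize(range(78), lambda i: i < 24, seed=1)
print(max_run(order, lambda i: i < 24))
```

Rejection sampling keeps every admissible order equally likely, at the cost of occasionally needing several shuffles; with 24 critical items out of 78, a valid order is typically found quickly.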
Data scoring and analysis
Participant responses were coded into a nominal scale
reflecting the conjugation that the supplied infinitive form
belonged to. Two items were excluded from the analysis
due to a spelling inconsistency in the two versions of the
task. A total of 45 blank, illegible, or non-infinitive
responses were discarded (accounting for 1.10% of the
data). Another 8 responses (0.20% of the remaining data)
were considered valid but were not further analyzed,
because they were infinitive forms with the ending -ôr
(by analogy to the verb pôr and its compounds, which
belong to the 2nd conjugation, but do not display a theme
vowel).
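Assuming that each of the 54 participants responded once to each of the 76 items retained for analysis, the reported percentages follow from simple arithmetic:

```latex
\begin{aligned}
54 \times 76 &= 4104 \text{ responses} \\
\frac{45}{4104} &\approx 1.10\% \quad \text{(discarded responses)} \\
\frac{8}{4104 - 45} = \frac{8}{4059} &\approx 0.20\% \quad \text{(valid -ôr responses, of the remaining data)}
\end{aligned}
```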
Instead of directly analyzing proportions of each type
of response, several adjustments needed to be made to
the raw counts. Firstly, when analyzing nominal data
(such as conjugation membership), there are potential
problems associated with using methods such as linear
regression, because proportions are inherently bounded
between 0 and 1 and their error variance is not independent of the mean (Barr, 2008). These violations can
lead to biases in the associated p-values, and give rise to
spurious significances or null results (Jaeger, 2008). One
way to avoid the statistical problems involved in the analysis of categorical scales is to convert proportions to relative odds, which are then subjected to a logarithmic
transformation (Woolf, 1954). When this conversion is
applied, the scale of the resulting log-odds or logits has
the advantageous properties of being unbounded and
symmetric around zero (with a logit of zero corresponding
to a proportion of 50%).
In the present paper, all analyses were performed on log-odds, rather than on proportions. This allows the human
responses to be analyzed using linear methods (Study 1),
but also, to statistically compare the human data to
expected proportions predicted by the computational
implementations (Study 2).
In order to avoid infinite values for proportions of 0 and
1, all results were analyzed by applying the method recommended by Agresti (2002), in which raw counts of
responses are converted to empirical logits, using an
estimator originally proposed by Haldane (1955) and
Anscombe (1956).5 In addition, when performing linear
regression with empirical logits, McCullagh and Nelder
(1989) and Jaeger (2008) argued for weighting cases by the
inverse of their variance, due to the fact that log-odds with
lower variance (i.e., those closer to zero and based on a
higher number of valid responses) should be more informative to the estimated model than those with higher variance.
In the present analysis, cases were weighted using an estimator recommended by McCullagh and Nelder and originally
proposed by Gart (1966).6
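The empirical logit and its inverse-variance weight can be computed per item as follows, using the Haldane–Anscombe estimator g′ = ln((y + .5) / (n − y + .5)) and the variance estimate v = 1/(y + .5) + 1/(n − y + .5). The function names are ours and purely illustrative.

```python
from math import log


def empirical_logit(y: int, n: int) -> float:
    """ln((y + .5) / (n - y + .5)); finite even when y == 0 or y == n."""
    return log((y + 0.5) / (n - y + 0.5))


def logit_weight(y: int, n: int) -> float:
    """Regression weight 1/v, with v = 1/(y + .5) + 1/(n - y + .5)."""
    v = 1 / (y + 0.5) + 1 / (n - y + 0.5)
    return 1 / v


# y = participants giving a given response to an item, n = valid responses to it
for y in (0, 27, 50, 54):
    print(y, round(empirical_logit(y, 54), 2), round(logit_weight(y, 54), 2))
```

An item at y = 27 of 54 gets a logit of 0 and the largest weight, while items near 0 or 54 get large-magnitude (but finite) logits and small weights, in line with the rationale for weighting given above.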
Log-odds of responses belonging to the 1st, 2nd and 3rd
conjugations were each submitted to a separate weighted linear
regression (three regressions in total). In each regression, the reliability scores for
each of the three conjugations were entered as simultaneous predictor variables, that is, the contribution of each
predictor is estimated by controlling for the reliabilities
for the other conjugations. Similarity to the competing conjugations can also have an effect (likely a negative one) in a
particular response type; this inhibition could arise either
through a linguistic principle, morphological blocking
(Aronoff, 1976), or simply by virtue of the mutual exclusivity of the different possible responses. Therefore, the multiple regression method we employ here allows us to estimate
the independent contribution of phonological similarity to
each conjugation and answers the question of whether proportions of responses for any one conjugation are predicted
by similarity to that class.
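The weighted multiple regressions can be sketched as weighted least squares: each item contributes one row (an intercept plus its three reliabilities), one empirical logit as the outcome, and one weight. The sketch below solves the normal equations (XᵀWX)b = XᵀWy in plain Python and returns coefficients only, without standard errors or t tests. It is an illustration, not the software used in the study, and the data in the usage example are synthetic.

```python
from typing import List


def weighted_least_squares(X: List[List[float]], y: List[float],
                           w: List[float]) -> List[float]:
    """Solve (X'WX) b = X'Wy; each row of X starts with 1 for the intercept."""
    n, k = len(X), len(X[0])
    # Normal equations A b = c
    A = [[sum(w[i] * X[i][r] * X[i][s] for i in range(n)) for s in range(k)]
         for r in range(k)]
    c = [sum(w[i] * X[i][r] * y[i] for i in range(n)) for r in range(k)]
    # Gaussian elimination with partial pivoting
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        c[col], c[piv] = c[piv], c[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for s in range(col, k):
                A[r][s] -= f * A[col][s]
            c[r] -= f * c[col]
    # Back substitution
    b = [0.0] * k
    for r in reversed(range(k)):
        b[r] = (c[r] - sum(A[r][s] * b[s] for s in range(r + 1, k))) / A[r][r]
    return b


# Synthetic items: [intercept, reliability -ar, reliability -er, reliability -ir]
X = [[1, .5, .1, .2], [1, .7, .9, .1], [1, .6, .2, .8],
     [1, .9, .3, .3], [1, .4, .6, .5], [1, .8, .1, .9]]
true_b = [1.5, 0.1, -1.7, -1.4]
y = [sum(b * x for b, x in zip(true_b, row)) for row in X]
w = [2.0, 1.0, 0.5, 3.0, 1.5, 1.0]
print([round(b, 3) for b in weighted_least_squares(X, y, w)])  # recovers true_b
```

Entering all three reliabilities in the same design matrix is what makes each coefficient an estimate of one class’s contribution with the other two held constant.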
Results
Table 1 displays the results of a set of regressions, in
which the reliability scores for each conjugation were
simultaneously entered as predictors. The table shows the
(unstandardized) coefficients of each predictor (i.e., the reliability score for each class) in the estimation of the log-odds
of responses belonging to each of the three conjugations.
As can be seen in Table 1, a clear effect of phonological
similarity was obtained for the 2nd and 3rd conjugations,
but not for 1st conjugation responses. For 2nd conjugation
responses, the only significant predictor was the MGL model’s reliabilities for the 2nd conjugation (t(72) = 6.35,
p < .001). Neither the reliabilities for the 1st conjugation
(t(72) = 0.39, p = .696), nor those for the 3rd conjugation
(t(72) = 1.83, p = .071) were significant predictors. The
total model with the three reliability values as predictors
had an associated r² of .583 (F(3,72) = 33.58, p < .001).
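As a consistency check, the overall F statistic of a regression with k predictors fitted to N cases follows from its r² as F = (r²/k) / ((1 − r²)/(N − k − 1)). With 76 analyzed items and three predictors, this gives the F(3,72) degrees of freedom reported here; small discrepancies arise because the published r² values are rounded.

```python
def f_from_r2(r2: float, k: int, n: int) -> float:
    """Overall F for a regression with k predictors and n cases, given its r-squared."""
    df_residual = n - k - 1
    return (r2 / k) / ((1 - r2) / df_residual)


for label, r2 in [("2nd conj.", 0.583), ("3rd conj.", 0.382), ("1st conj.", 0.395)]:
    print(label, round(f_from_r2(r2, k=3, n=76), 2))
```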
The same pattern was found for 3rd conjugation
responses. The only significant predictor was the reliability
scores for the 3rd conjugation (t(72) = 5.45, p < .001). Again,
there was no effect of similarity to competing classes, as
both the reliabilities for the 1st (t(72) = .09, p = .928) and
the 2nd conjugations (t(72) = .09, p = .368) were not
significant predictors. The model with the three reliability
scores as independent variables explained 38.2% of the variance in the log-odds of 3rd conjugation responses (F(3,72)
= 14.81, p < .001).

Table 1
Estimated coefficients for the MGL reliabilities for the 1st, 2nd, and 3rd conjugations in the three conducted regressions (with log-odds of 1st, 2nd and 3rd conjugations as dependent variables).

                               Dependent variables (log-odds)
Predictors                     1st conj. (-ar)   2nd conj. (-er)   3rd conj. (-ir)
Reliabilities for 1st conj.     0.14              0.28              0.06
Reliabilities for 2nd conj.    −1.69*             2.19*             0.45
Reliabilities for 3rd conj.    −1.37*             1.03              2.11*

* Indicates statistical significance at α = .05.

5 Empirical logits $g'$ are given by $g' = \ln\left(\frac{y + .5}{n - y + .5}\right)$, in which $y$ is the number of participants that gave a certain response to a particular item, and $n$ is the number of participants that gave a valid response to that item. Gart and Zweifel (1967) have shown in an empirical comparison that this estimator fares very well against the alternatives.
6 Cases were weighted by $1/v$, where $v = \frac{1}{y + .5} + \frac{1}{n - y + .5}$. In the empirical comparison conducted by Gart and Zweifel (1967), this variance estimator showed no substantial biases.
A very different picture emerged for the log-odds of producing 1st conjugation infinitives. Both the reliabilities for
the 2nd conjugation (t(72) = −5.25, p < .001) and the 3rd
conjugation (t(72) = −3.73, p < .001) were highly significant
negative predictors. In contrast, the MGL reliabilities for the
1st conjugation did not play a statistically significant role in
the prediction of the corresponding log-odds (t(72) = .24,
p = .809). The total model r² for 1st conjugation responses
was .395 (F(3,72) = 15.63, p < .001). These results stand in
sharp contrast to those for 2nd and 3rd conjugation
responses, in which the coefficients of the reliabilities of
the corresponding classes were highly significant and of
moderate magnitude.
The three regression models were also combined into a
single multivariate analysis, so that the general effect of
each predictor across all three response types could be
assessed. Whilst each of the regressions above examined
the role of reliability scores in the production of one particular conjugation (a one-vs.-rest analysis of multinomial
data; Agresti, 2002), a multivariate regression model allows
simultaneously taking into account all three dependent
variables, as well as their interdependence. Consistent
with the previous analyses, the results showed an effect
of the reliabilities for the 2nd (F(3,70) = 24.32, p < .001)
and the 3rd conjugations (F(3,70) = 13.00, p < .001), but no
effect of the reliabilities for the 1st conjugation (F(3,70)
= 0.32, p = .814) in the assignment of novel words to a
conjugation.
To further assess the robustness of these results,
we have investigated the influence of different types of reliability adjustment, which is a computational parameter of
Albright and Hayes’s (2002) MGL model. Recall that raw
reliabilities, which are obtained by dividing the number of
verbs correctly derived by a phonological rule by the number of verbs to which the rule applies, are adjusted using a
lower confidence limit. This is the only free parameter in
the MGL model, varying from 50% to 100%, and its effect
is to penalize the reliability of rules that apply to fewer verbs.
Therefore, different confidence adjustments can dramatically change the reliability values of rules with narrower
scopes, in turn leading to a substantial reordering of the
rules. Furthermore, because the different conjugations in
Portuguese display different phonological distributions,
the type of adjustment that is used might differentially
impact the similarity effects for the three conjugation
classes.
The potential effect of the strength of the confidence
adjustment was investigated by using four additional ways
of calculating the reliability function, all of them previously
employed in comparisons conducted by Albright (Albright,
2002b; Albright & Hayes, 2003).7 For each of these four variants of the reliability function, the rules extracted by the MGL
model were sorted in descending order of their reliabilities,
and the novel words employed in the elicitation task were
fed into the model. As before, each item’s reliability for the
1st, 2nd, and 3rd conjugations corresponded to the scores
of the most reliable rules that output -ar, -er, and -ir infinitives, and the new reliability scores were then used as simultaneous predictors in weighted linear regressions on the
production log-odds of 1st, 2nd, and 3rd conjugation
responses.
The different versions of the reliability function had a
very small effect on the overall fit of the regression models
(r² coefficients ranged from .36 to .42 for 1st conj., .57 to .59
for 2nd conj., and .34 to .40 for log-odds of 3rd conj.
responses). More importantly, all versions of the reliability
function yielded a dissociation between conjugations:
regardless of the type of adjustment, reliabilities for the
1st conjugation were never a significant predictor of 1st
(all ps > .723), 2nd (all ps > .271), or 3rd conjugation
responses (all ps > .534). In contrast, higher reliability values for both the 2nd and 3rd conjugations were systematically associated with higher proportions of responses
belonging to these classes (all ps < .001) and with lower
proportions of 1st conjugation responses (all ps < .001).
Therefore, this set of regressions replicated the results
above and showed that the obtained dissociation between
conjugations is robust to changes in this implementational
parameter of the MGL model.
In sum, the results showed that the log-odds of 2nd and
3rd conjugation responses were solely determined by their
corresponding reliability values, with no significant effects
from the reliabilities of competing classes. In contrast, the
MGL reliabilities for the 1st conjugation had no effect in
any of the conducted analyses, that is, they did not reliably
predict 1st conjugation responses or responses belonging to
the other classes. Instead, the log-odds of producing a 1st
conjugation response were predicted by phonological similarity to the 2nd and 3rd conjugations, such that the higher
the reliabilities for these classes, the lower the proportion of
participants producing 1st conjugation infinitives.
Additional analyses
In order to determine whether the obtained dissociation
between conjugations, and in particular, the absence of a
similarity effect for the 1st conjugation, could be explained
by potentially confounding factors, a range of subsequent
analyses was conducted on the log-odds of 1st conjugation
responses, scrutinizing the possible influence of a number
of statistical and methodological aspects of our study.
7 The four additional types of adjustment, besides the 75% lower confidence limit used in the main analysis, were: raw reliabilities with no adjustment; a 90% lower confidence limit; a 55% lower confidence limit; and an adjustment that multiplied raw reliability values by $1.2^n$, where $n$ is the number of full segments specified in a rule’s phonological context.
Statistical factors. Concerning statistical factors, we first
asked whether a similarity effect for the 1st conjugation
could have been rendered undetectable because of limited
statistical power. An inspection of the estimated coefficients of the reliabilities for the 1st conjugation in all
regression models indicates that this is not the case. In five
regression models (on log-odds of 1st conj. responses, each
employing a different way of calculating the reliability
function; see above), coefficients for the reliabilities for
the 1st conjugation were extremely small, ranging from
0.00 to 0.14. Even for the largest of these coefficients (Table
1, with a model intercept of 1.56), a back-transformation
from log-odds to proportions shows that the effect of the
whole reliability scale corresponds to an increase of only
2% of 1st conjugation responses, from 83% (at a reliability
score of .00) to 85% (at a reliability of 1.00). Therefore, it
does not appear to be the case that an effect of similarity
to the 1st conjugation was not detected due to limitations
in statistical power.
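The back-transformation above can be reproduced with the inverse logit, p = 1 / (1 + e^(−x)), evaluated at the intercept (reliability .00) and at the intercept plus the coefficient (reliability 1.00). This sketch assumes, as the reported figures suggest, that the other predictors contribute nothing to the linear predictor at these two points.

```python
from math import exp


def inv_logit(x: float) -> float:
    """Convert log-odds back to a proportion."""
    return 1 / (1 + exp(-x))


intercept, slope = 1.56, 0.14  # values reported for the 1st conj. model (Table 1)
low = inv_logit(intercept + slope * 0.0)   # reliability of .00
high = inv_logit(intercept + slope * 1.0)  # reliability of 1.00
print(f"{low:.0%} -> {high:.0%}")  # 83% -> 85%
```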
A related possibility, however, is that differences in the
distribution of the predictors, that is, in the distribution of
reliabilities for the three conjugations, could have made
the dissociation in their effect sizes more extreme. As can
be seen in the Appendix, reliabilities for the 1st conjugation
across the items in Study 1 had both a smaller range and SD
than the reliabilities for the 2nd and 3rd conjugations. As
noted before (see Materials subsection), this is a consequence of the phonological distribution of the three classes
in interaction with properties of the MGL model. Despite
the impossibility of generating items with a wider range
of reliabilities, this limited variation could potentially
reduce the contribution of similarity to the 1st conjugation
in determining proportions of responses–an instance of the
well-known problem of “restriction of range” (e.g.,
Gulliksen, 1950), which can sometimes reduce correlation
coefficients (though by no means always; see Wiseman,
1967; Zimmerman & Williams, 2000).
In order to investigate this potential confound, we estimated how much larger the “unrestricted” effect of the reliabilities for the 1st conjugation would be, by applying a
common correction for restriction of range in correlations,
Thorndike’s Case 2 correction (Hunter & Schmidt, 1990;
Thorndike, 1949). We first performed a (weighted) correlation between the reliabilities for the 1st conjugation and the
residuals of a (weighted) regression, in which the effect of
the reliabilities for the 2nd and the 3rd conjugations had
been removed from the log-odds of 1st conjugation
responses. This correlation coefficient was then corrected,
by assuming that the SD of the reliabilities for the 1st conjugation was as large as that of the reliabilities for the 2nd
conjugation (i.e., raising the SD from .177 to .293). The results showed that the
corrected correlation was inflated from only .02 to .04, a
minute increase of an extremely small coefficient, suggesting that restriction of range played no role in reducing the
effect of similarity to the 1st conjugation.
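Thorndike’s Case 2 correction rescales a correlation r observed under a restricted SD s to an estimated correlation under an unrestricted SD S. With u = S/s, the corrected value is R = r·u / √(1 − r² + r²u²). The sketch below applies the formula to the SDs given in the text; because the reported correlation of .02 is itself rounded, plugging it in will not reproduce the second decimal of the reported corrected value exactly.

```python
from math import sqrt


def thorndike_case2(r: float, sd_restricted: float, sd_unrestricted: float) -> float:
    """Correct a correlation for direct range restriction on the predictor (Case 2)."""
    u = sd_unrestricted / sd_restricted
    return r * u / sqrt(1 - r**2 + (r**2) * (u**2))


print(round(thorndike_case2(0.02, sd_restricted=0.177, sd_unrestricted=0.293), 3))
```

With u ≈ 1.66 and a correlation this close to zero, the correction is nearly a simple multiplication by u, so even the corrected coefficient stays negligible.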
Finally, one last statistical factor that could have reduced
an effect of the reliabilities for the 1st conjugation is the
presence of multicollinearity (see, e.g., Baayen, 2008). In
our case, because multiple regression coefficients assess
only the unique contribution of the predictors, if the reliabilities for the 1st conjugation are negatively correlated
with those for the 2nd and 3rd conjugations, then their
independent contribution could be underestimated.
We assessed this possibility in two different ways: by
calculating the unique variance of the reliability predictors
and by conducting a stepwise regression. We first calculated each predictor’s tolerance, a measure of its variance
that cannot be accounted for by the other independent variables. Tolerance values were similar for all three reliabilities, .65 for 1st, .55 for 2nd and .70 for 3rd conjugation.
That is, all variables suffer from slight multicollinearity,
with the reliabilities for the 2nd, not 1st, conjugation being
the most affected. Therefore, if a null effect of the reliabilities for 1st conjugation was due to multicollinearity, one
would expect the same pattern of results to emerge for
2nd and 3rd conjugation responses, instead of a clear dissociation between conjugations.
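Tolerance is 1 − R², where R² comes from regressing one predictor on the remaining ones; its reciprocal is the variance inflation factor. With three predictors, this R² can be written in terms of the pairwise correlations, as in the sketch below. The correlations in the example are made up for illustration and are not those of the study’s items, and the sketch ignores case weights.

```python
def tolerance(r12: float, r13: float, r23: float) -> float:
    """Tolerance of predictor 1 given predictors 2 and 3 (unweighted correlations)."""
    # Squared multiple correlation of predictor 1 on predictors 2 and 3
    r_squared = (r12**2 + r13**2 - 2 * r12 * r13 * r23) / (1 - r23**2)
    return 1 - r_squared


tol = tolerance(r12=-0.4, r13=-0.3, r23=0.2)  # hypothetical correlations
print(round(tol, 3), round(1 / tol, 3))  # tolerance and VIF
```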
In addition, the same three-predictor regression on the
log-odds of 1st conjugation responses was conducted in a
stepwise fashion, that is, the MGL reliabilities for -ar, -er,
and -ir were allowed to enter the model one at a time according to how much variance they explained (see Albright,
2002a, for a similar analysis). If similarity to the 1st conjugation determines log-odds of 1st conjugation responses, then
it should be considered a better predictor for inclusion.
Importantly, the contribution of the first predictor to enter
the model is assessed without controlling for the other predictors, and therefore, in a way that is immune to effects of
multicollinearity. The results showed that the first predictor
to be included was the reliabilities for the 2nd conjugation
(B = −1.29, t(74) = −4.91, p < .001) and the second and last
predictor was the reliabilities for the 3rd conjugation
(B = −1.41, t(73) = −4.23, p < .001), but at no point were the
reliabilities for the 1st conjugation considered to enter the
model. Therefore, the stepwise regression produced an
identical result to the previous set of regressions (in which
all predictors were simultaneously considered), indicating
that multicollinearity cannot account for the observed dissociation between conjugations.
Methodological factors. As for potential methodological confounds, a plausible concern in a nonce-word
elicitation task, in which none of the possible answers is
the single ‘correct’ one, is that participants might converge
on a pattern of responding in which they give the same type
of answer for every item (e.g., by realizing at some point in
the task that most of their answers involved the same transformation and then repeating the same type of answer). If
this was the case in the present task, then participants
could have repeatedly assigned novel forms to the 1st conjugation, which was the most common response, without
actually reading or considering the different novel forms.
Crucially, this could reduce or altogether eliminate a similarity effect for the 1st conjugation, because even items
with low reliability for the 1st conjugation would still get
a very large proportion of 1st conjugation responses, should
they occur closer to the end of the task.
In the present study, various steps were taken to minimize such task habituation effects. First, because each item
was embedded in a different frame ‘conversation’, participants were led to read each individual sentence and, presumably, to give more natural responses than if they only
needed to focus on the novel verbs. Second, because there
were two versions of the task containing items presented
in different orders, the same items should not be affected
by the same list composition effects for all participants.
Third, the 24 items that are highly similar to the 2nd and
3rd conjugations were scattered throughout the task in
order to maximize the probability that different types of
response were given at all points in the task.
More importantly, it is exactly because items that are
very similar to the 2nd and 3rd conjugations were randomly distributed that potential effects of previous
answers on subsequent ones cannot, by themselves,
account for the observed dissociation between conjugations. If participants settled on a pattern of 1st conjugation
responding, then most items would elicit very high proportions of -ar responses, regardless of their phonological similarity to the different conjugations, in which case one
would expect all conjugations to show weak or non-significant similarity effects, contrary to what was obtained.
In addition to the care taken to minimize such task-related artifacts, we have attempted to estimate a possible
influence of task habituation on participants’ responses.
This was done by repeating the regressions on the production log-odds of 1st conjugation responses, but including as
an additional predictor a (centered) variable encoding item
position, thereby estimating and, more importantly, controlling for any potential influence of habituation throughout the task. Because two different versions of the task
were administered, each with a different item order, log-odds and case weightings were recalculated and these
four-predictor regressions were conducted separately for
each version. The results showed no effect of item position
in either version (version 1: t(71) = 1.16, p = .251; version
2: t(71) = 0.78, p = .441). In addition, even when item position was controlled for, there were no significant effects of
the reliabilities for the 1st conjugation on the corresponding production log-odds (version 1: t(71) = 0.13, p = .900;
version 2: t(71) = 0.18, p = .858). Instead, and as before,
production log-odds for the 1st conjugation were, in both
versions, significantly predicted by the reliabilities for the
2nd (version 1: t(71) = 5.28, p < .001; version 2: t(71)
= 4.29, p < .001) and 3rd conjugations (version 1: t(71)
= 3.59, p < .001; version 2: t(71) = 2.96, p = .004), such
that higher reliabilities for these classes are associated with
smaller log-odds of 1st conjugation responses. These results
show that the observed dissociation between conjugations
was immune to habituation effects and also that it is robust
enough to hold for two distinct subgroups of participants.
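The habituation check amounts to adding one centered covariate to the regression. The sketch below assumes precomputed case weights (the weighting formula itself is not reproduced here), fits weighted least squares by rescaling each row by the square root of its weight, and uses synthetic, noise-free data for a single task version; it illustrates the model structure, not the authors' analysis code.

```python
import numpy as np


def wls_with_position(reliabilities, position, log_odds, weights):
    """Weighted least-squares fit of log-odds on three reliability scores plus
    a centered item-position covariate.

    Returns coefficients ordered as [intercept, b_ar, b_er, b_ir, b_position].
    """
    position_c = position - position.mean()  # centered item position
    A = np.column_stack([np.ones(len(log_odds)), reliabilities, position_c])
    sw = np.sqrt(weights)  # WLS = OLS on rows scaled by sqrt(weight)
    beta, *_ = np.linalg.lstsq(A * sw[:, None], log_odds * sw, rcond=None)
    return beta


rng = np.random.default_rng(1)
n_items = 76
rel = rng.uniform(0.0, 1.0, size=(n_items, 3))     # hypothetical reliabilities
position = np.arange(1, n_items + 1, dtype=float)  # presentation order in one version
weights = rng.uniform(1.0, 10.0, size=n_items)     # hypothetical case weights
# Synthetic log-odds with no effect of position and none of the -ar reliability.
y = 0.5 - 1.3 * rel[:, 1] - 1.4 * rel[:, 2]
print(np.round(wls_with_position(rel, position, y, weights), 3))
```

Centering the position variable leaves the slopes unchanged; it only makes the intercept refer to an item presented at the average position.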
Discussion
The results of the elicited production task showed a clear
dissociation between the 1st conjugation, on the one hand,
and the 2nd and 3rd conjugations on the other, which can
be summarized as follows. Firstly, a substantial proportion
of the variance in the probabilities of producing 2nd and
3rd conjugation forms was accounted for by their respective
reliability scores, indicating that generalizations of stems
belonging to these classes were determined by the degree
of phonological similarity (as defined by Albright’s, 2002a,
MGL model) between novel roots and the roots of existing
2nd and 3rd conjugation verbs. Secondly, the reliabilities
for the 1st conjugation did not predict the probabilities of
producing 1st conjugation forms, once the reliabilities for
competing classes were factored out. Thirdly, the majority of
variance in producing 1st conjugation forms was explainable
by the reliabilities for the two other classes, such that the
higher the reliabilities for the 2nd or 3rd conjugations, the
lower the likelihood of a 1st conjugation response.
These results were found to be unaffected by a number
of computational, statistical and methodological factors. In
all analyses, generalization of 1st conjugation forms was
not influenced by the degree of phonological similarity,
which is precisely what is expected if 1st conjugation stem
formation is based on a context-free operation that is insensitive to the idiosyncratic properties of individual tokens.
On the other hand, the results suggest that the generalization of 2nd and 3rd conjugation stems is based on a
context-sensitive and graded process that produces similar
responses to similar inputs, that is, on an operation of
similarity-based generalization.
Study 2: A comparison of two competing computational implementations
In the present study, we contrasted two different computational implementations in how well they match the
elicited production data obtained in Study 1. The first of
these was Albright’s (2002a) MGL, a model in which multiple morphophonological rules are extracted from input–
output pairs and then evaluated by their reliability, that
is, by the adjusted proportion of verbs in the lexicon that
they correctly derive. The second implementation is our
own model, the Default Generalization Learner (DGL), in
which the evaluation metric makes a principled distinction
between a context-free default rule and context-sensitive
rules, which contain phonological material as part of their
conditions of application.
As explained before, a number of specific and general
rules are extracted by the MGL algorithm during the learning stage, possibly including a context-free rule that applies
a given morphological transformation to any possible input
(i.e., it contains only a variable as part of its conditions of
application). Importantly, all MGL rules, regardless of their
specificity, are ordered by their reliability values, that is,
by how successful they are in deriving existing forms in
the language.
The DGL model we propose here is based on the same
algorithm for the discovery of phonological environments
and rule learning, but differs from the MGL model in the
ranking of rules, and in particular, in how the context-free
rule is evaluated. More specifically, the DGL model is
endowed with a built-in principle according to which a
maximal reliability score is attributed to the first context-free rule that is derived during the rule-learning process,
that is, to the rule that forms a 1st conjugation infinitive
from any 1sg form.
The assignment of a maximal score follows from the
very notion of a linguistic default. By definition, a default
can be applied to an unbounded number of cases, so that
in terms of a reliability metric of rule evaluation, a default
should have maximal reliability or confidence. Therefore,
the crucial distinction between Albright’s (2002a) MGL
model and the DGL model we are proposing is that our
model is innately guided to distinguish between a context-free rule and all other phonologically restricted rules:
only the context-free rule is ascribed maximal reliability,
or in other words, the morphological transformation that
it performs is ascribed maximal well-formedness. As such,
the DGL model becomes an implementation of a dual-mechanism proposal, similar to the ones by Say and
Clahsen (2002) and Veríssimo and Clahsen (2009), in two
important ways. First, by making a principled distinction
between the two types of rule, the model invokes separate
mechanisms (context-free and context-sensitive) for the
generalization of the default class (i.e., the 1st conjugation)
and of the two other phonologically restricted classes (i.e.,
the 2nd and the 3rd conjugations). Second, by biasing rule
evaluation towards the maximal well-formedness of a
default, our model postulates an innate principle that goes
beyond the input-driven statistics that guide rule evaluation in Albright’s MGL model.
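The difference between the two evaluation schemes can be sketched in a few lines. This is a toy illustration under assumptions of our own: rules are listed in the order in which they are derived, a context-free rule is marked by an empty context string, and the rule contexts and reliability values are hypothetical placeholders rather than output of either model.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical stem-formation rule mapping a 1sg form to an infinitive."""
    output_class: str   # "-ar", "-er" or "-ir"
    context: str        # phonological context; "" means context-free (variable only)
    reliability: float  # input-driven reliability, as in the MGL


def evaluate(rules, model):
    """Rank rules by well-formedness under "MGL" or "DGL" evaluation.

    MGL: every rule keeps its input-driven reliability.
    DGL: the first context-free rule in derivation order receives reliability 1.0;
    all other rules keep their input-driven reliability.
    """
    default_assigned = False
    scored = []
    for rule in rules:
        score = rule.reliability
        if model == "DGL" and rule.context == "" and not default_assigned:
            score = 1.0
            default_assigned = True
        scored.append((score, rule))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical rules in derivation order; contexts are placeholders.
rules = [
    Rule("-ar", "", 0.70),
    Rule("-er", "context-A", 0.89),
    Rule("-ir", "context-B", 0.60),
]
for model in ("MGL", "DGL"):
    best_score, best_rule = evaluate(rules, model)[0]
    print(model, best_rule.output_class, best_score)
```

Only the score of the default changes; the phonologically restricted rules are evaluated identically in both schemes.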
In the present study, these two competing implementations were contrasted by comparing their predicted proportions of 1st, 2nd and 3rd conjugation responses for the
items employed in Study 1 with the actual proportions of
responses given by participants. Whilst the analyses in
Study 1 involved correlating the MGL reliability scores with
log-odds of responses (i.e., whether responses increase as
the corresponding similarity to a class increases), here we
ask how the MGL model and our dual-mechanism implementation fare in predicting actual proportions of
responses, that is, whether they overestimate or underestimate responses belonging to the different conjugations.
In the present analysis, the 76 items used in Study 1
were divided into three groups according to the
MGL model’s preference for the 1st, 2nd, or 3rd conjugation.
That is, items were grouped according to the transformation
performed by the most reliable MGL rule that they match.
Under the assumptions that the well-formedness of a novel
stem is derived from the reliability of its best matching rule,
and that most participants choose the most well-formed
transformation when producing an output, the MGL prediction is that the 24 items that are highly similar to the 2nd or
3rd conjugations should elicit a majority of responses
belonging to these classes. In contrast, the DGL implementation specifically favors the application of the context-free
default regardless of the phonological properties of novel
roots. Therefore, the DGL model predicts that even items
that are similar to the 2nd and 3rd conjugations receive a
majority of 1st conjugation responses. For the remaining
52 items, both models would predict a majority of 1st
conjugation responses.
Method
The predictions of the two computational models were
compared to human data by deriving expected proportions
from each model for each of the items that were part of the
elicited production task. Expected proportions of responses
were derived from the MGL rule evaluations by applying
Luce’s (1959) choice axiom to each of the well-formedness
scores (i.e., reliabilities) for all the novel items that were
used in Study 1. In other words, it was assumed that the
probability of selecting an output from a set of candidates
with different well-formedness scores is given by dividing
each corresponding score by the sum of well-formedness
scores for all possible outputs.
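In our notation (not the paper's), with w1, w2 and w3 denoting an item's well-formedness scores for the 1st, 2nd and 3rd conjugations, the choice rule reads as follows. The second line applies it to the sarrolvo reliabilities discussed in the following paragraph.

```latex
\begin{aligned}
P_i &= \frac{w_i}{w_1 + w_2 + w_3}, \qquad i \in \{1, 2, 3\} \\
P_1(\textit{sarrolvo}) &= \frac{.482}{.482 + .489 + .171} = \frac{.482}{1.142} \approx .4221
\end{aligned}
```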
More specifically, in order to calculate the MGL model’s
expected proportions of 1st, 2nd and 3rd conjugation
responses for a given item, we first obtained every item’s
reliabilities for -ar, -er and -ir, that is, the same values that
were used as predictors in Study 1. Secondly, each of the
scores was divided by the sum of all three reliability values
to yield expected proportions of -ar, -er and -ir responses.
For example, the best rules for each conjugation that match
the novel 1sg present indicative form sarrolvo [sɐʁolvu]
have reliabilities of .482 (for [sɐʁolar], in the 1st conjugation), .489 (for [sɐʁoler], in the 2nd conjugation) and .171
(for [sɐʁolvir], in the 3rd conjugation). In this example,
the sum of all three reliability scores for sarrolvo is 1.142,
and dividing each of the three scores by this sum yields predicted proportions of 42.21%, 42.82%, and 14.97% for the 1st,
2nd, and 3rd conjugations, respectively (which necessarily
add up to 100%). Thirdly, as in Study 1, statistical analyses
were conducted on log-odds, rather than expected proportions. In order to make the log-odds calculation exactly parallel to the one for the human data, expected counts for
each particular item were obtained by multiplying expected
proportions by the number of valid responses for that same
item in Study 1; these predicted counts were then subjected
to the same empirical logit transformation (as per the equation in Footnote 3).
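The three steps can be traced in code for the sarrolvo example. This Python sketch reproduces the reported proportions. Two inputs are assumptions: the empirical logit is taken to be the common log((k + 0.5) / (n - k + 0.5)) form, since the paper's own equation is given in Footnote 3 and is not repeated here, and the number of valid responses (40) is a hypothetical placeholder.

```python
import math


def luce_proportions(scores):
    """Luce choice rule: each score divided by the sum of all scores."""
    total = sum(scores)
    return [s / total for s in scores]


def empirical_logit(count, n):
    """Empirical logit of `count` out of `n` responses.

    ASSUMPTION: the common +0.5 correction; see Footnote 3 for the paper's formula.
    """
    return math.log((count + 0.5) / (n - count + 0.5))


# Best-rule MGL reliabilities for sarrolvo (-ar, -er, -ir), as reported in the text.
reliabilities = [0.482, 0.489, 0.171]
props = luce_proportions(reliabilities)
print([round(100 * p, 2) for p in props])  # [42.21, 42.82, 14.97]

n_valid = 40  # HYPOTHETICAL number of valid responses for this item
expected_counts = [p * n_valid for p in props]
log_odds = [empirical_logit(c, n_valid) for c in expected_counts]
print([round(v, 3) for v in log_odds])
```

Expected counts need not be whole numbers; the logit is simply applied to them as if they were observed counts, mirroring the treatment of the human data.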
The same procedure was employed to derive expected
proportions and log-odds from our DGL model, which differs from Albright’s (2002a) MGL model in that the 1st conjugation context-free rule is assigned a maximal well-formedness score (i.e., a reliability score of 1). Therefore,
expected proportions of 1st conjugation responses for an
item with reliabilities w2 and w3 (for the 2nd and 3rd conjugations) were given by the inverse of 1 + w2 + w3, whilst
expected proportions of 2nd and 3rd conjugation responses
were given by dividing the reliabilities for each of these
classes by 1 + w2 + w3. As above, expected counts were calculated and then converted to log-odds.⁷
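Applying the DGL variant to the same sarrolvo reliabilities only changes the 1st conjugation score in the denominator. The values below are our own arithmetic for illustration; the study does not report DGL proportions for this item.

```python
def dgl_proportions(w2, w3):
    """Expected (-ar, -er, -ir) proportions when the context-free 1st conjugation
    rule is assigned reliability 1 and the other classes keep their MGL reliabilities."""
    total = 1.0 + w2 + w3
    return (1.0 / total, w2 / total, w3 / total)


# sarrolvo's best-rule reliabilities for the 2nd (-er) and 3rd (-ir) conjugations.
p_ar, p_er, p_ir = dgl_proportions(0.489, 0.171)
print(round(p_ar, 4), round(p_er, 4), round(p_ir, 4))  # 0.6024 0.2946 0.103
```

Because no MGL reliability can exceed 1, the 1st conjugation share under this scheme is never smaller than the share of either competing class.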
Finally, items were grouped in terms of their highest
MGL reliability value, that is, separated into items more
similar to the 1st, 2nd and 3rd conjugations (52 items for
a 1st conjugation group, and 12 items for each of the 2nd
and 3rd conjugation groups). The data were submitted to
Analyses of Variance (ANOVAs) and t-tests, directly comparing model expected log-odds to human data.⁸
⁸ The case weighting that was employed in Study 1 cannot be used in repeated measures analyses of multinomial data, such as the ones in Study 2, because the comparisons involve two different sets of weights.
⁹ All paired t-tests performed on subsets of 12 items were also conducted using a non-parametric method, Wilcoxon signed rank tests, which showed exactly the same pattern of results.
Fig. 1. Mean proportions (and associated 95% confidence intervals) of 1st, 2nd, and 3rd conj. responses obtained with human participants and predicted by
the MGL and DGL models, for the group of items that highly resemble (a) 1st conj., (b) 2nd conj. and (c) 3rd conj. verbs.
Results
The mean proportion of 1st conjugation responses across
all 76 analyzed items was 73.2%, demonstrating that participants displayed a clear preference for producing 1st conjugation infinitives. Mean proportions of 2nd and 3rd
conjugation infinitives were substantially smaller, corresponding to 12.3% and 14.3%, respectively.
Fig. 1 displays proportions of 1st, 2nd and 3rd conjugation responses obtained in the elicited production task with
human participants, as well as expected proportions
derived from the MGL and DGL computational models, for
each group of items: novel items highly similar to the 1st
(panel a), 2nd (panel b) and 3rd conjugations (panel c). In
what follows, we will contrast the different types of
responses given by participants with the predictions of both
the MGL and the DGL models. Finally, the two models are
directly compared with respect to the absolute error of their
predictions across all items.
Human responses vs. MGL model predictions
For the group of items that are highly similar to the 1st
conjugation (see Fig. 1, panel a), the MGL model predicts
higher proportions of 1st conjugation infinitives (73.7%)
than of those belonging to the 2nd (6.9%; t(51) = 20.78,
p < .001) or 3rd conjugations (19.4%; t(51) = 13.17,
p < .001). This same pattern was obtained in the human
data, that is, 1st conjugation responses were overwhelmingly prevalent (80.1%) and more common than 2nd
(7.0%; t(51) = 18.96, p < .001) or 3rd conjugation responses
(12.6%; t(51) = 16.12, p < .001). However, a direct comparison between the log-odds of 1st conjugation responses produced by participants (80.1%) and those predicted by the
MGL model (73.7%) shows that the model significantly
underestimates proportions of infinitives in -ar (t(51)
= 3.73, p < .001).
As for responses to items highly similar to the 2nd conjugation (see Fig. 1, panel b), we directly compared the relative proportion of 1st and 2nd conjugation responses given
by participants with the predictions of the MGL model, by
conducting a repeated measures Analysis of Variance
(ANOVA) with two factors: Output Source (MGL vs. participants) and Response Type (1st conj. vs. 2nd conj.).
The results revealed no main effects (Output Source:
F(1,11) = 1.41, p = .260; Response Type: F(1,11) = 1.90,
p = .196), but a significant Output Source × Response Type
interaction (F(1,11) = 10.09, p = .009). That is, the difference
between log-odds of 1st and 2nd conjugation responses
was significantly larger in the human data than in the model’s predictions.
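For a 2 × 2 design with items as the repeated-measures unit, the Output Source × Response Type interaction can be computed without a full ANOVA routine: F(1, n - 1) equals the square of a one-sample t statistic on each item's difference of differences. A minimal NumPy sketch, with hypothetical argument names and toy inputs:

```python
import numpy as np


def interaction_f(a1b1, a1b2, a2b1, a2b2):
    """Interaction F for a 2 x 2 repeated-measures design.

    Each argument holds one value per item (e.g. log-odds) for one cell:
    a1/a2 = output source (model, participants), b1/b2 = response type.
    Returns (F, (df1, df2)), where F is the squared one-sample t statistic
    on the per-item difference of differences.
    """
    d = (np.asarray(a1b1, float) - np.asarray(a1b2, float)) - (
        np.asarray(a2b1, float) - np.asarray(a2b2, float)
    )
    n = d.size
    t = d.mean() / (d.std(ddof=1) / np.sqrt(n))
    return t ** 2, (1, n - 1)


# Toy per-item values for four hypothetical items.
f, df = interaction_f([1, 2, 3, 4], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0])
print(f, df)
```

With the 12 items in each of the 2nd and 3rd conjugation groups, this yields the F(1,11) degrees of freedom reported in these analyses.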
Even though, for these items, participants produced a
majority of 1st conjugation responses (50.9%) and a slightly
smaller proportion of 2nd conjugation responses (44.2%),
with the log-odds of these responses being statistically
indistinguishable (t(11) = 0.71, p = .491), the MGL model
predicted higher proportions of responses belonging to
the 2nd conjugation (57.2%) than to the 1st conjugation
(34.5%), and this difference is statistically significant
(t(11) = 9.07, p < .001). In addition, direct contrasts between
the human data and the MGL model showed that the model
overestimated log-odds of 2nd conjugation responses
(44.2% vs. 57.2%; t(11) = 2.64, p = .023) and underestimated log-odds of 1st conjugation responses (50.9% vs.
34.5%; t(11) = 3.52, p = .005).
When considering the group of items that are highly
similar to the 3rd conjugation (see Fig. 1, panel c), participant and MGL model data were submitted to the same
repeated measures ANOVA, but including 1st and 3rd conjugation log-odds as the two Response Types. A very similar
pattern emerged. There was no main effect of Output
Source (F(1,11) = 1.53, p = .242), and a marginally significant effect of Response Type (F(1,11) = 4.25, p = .064). More
importantly, the ANOVA also revealed a significant Output
Source × Response Type interaction (F(1,11) = 12.94,
p = .004), demonstrating a larger difference between the
log-odds of 1st and 3rd conjugation responses in the participant data than in the MGL model’s predictions.
Participants produced a majority of 1st conjugation
responses (65.4%), that is, proportions of infinitives in -ar
were higher than those of 3rd conjugation infinitives (t(11)
= 2.85, p = .016), with the latter being associated with a
mean proportion of only 31.2%. In contrast, the model predicts a significantly higher proportion of 3rd conjugation
(53.9%), rather than 1st conjugation (42.1%) responses
(t(11) = 7.22, p < .001). Furthermore, relative to the
responses produced by human participants, the MGL model
overestimates proportions of 3rd conjugation responses
(31.2% vs. 53.9%; t(11) = 3.58, p = .004) and underestimates proportions of 1st conjugation responses (65.4% vs.
42.1%; t(11) = 3.56, p = .004), as was the case for items
highly similar to the 2nd conjugation.
Human responses vs. DGL model predictions
In order to investigate how the predictions of the DGL
model match the human data, we conducted analyses parallel to those reported above for the MGL model.
First, considering only the group of items that are more similar to the 1st conjugation than to the other classes (see
Fig. 1, panel a), recall that both the participant data and
the MGL predictions showed a large majority of 1st conjugation responses. The same pattern is obtained in the analyses of the predictions of the DGL model, that is, 1st
conjugation infinitives (79.0%) are predicted to be a more
common response than both 2nd conjugation (5.6%; t(51)
= 26.01, p < .001) and 3rd conjugation infinitives (15.4%;
t(51) = 18.08, p < .001). Crucially, however, whilst the MGL
model still significantly underestimated proportions of 1st
conjugation responses (see above), the DGL predictions
approximated the proportions of 1st conjugation responses
remarkably well (80.1% vs. 79.0%; t(51) = 1.24, p = .220).
Likewise, for the group of items more similar to the 2nd
conjugation (see Fig. 1, panel b), the same analysis that was
employed in the comparison of the predictions of the MGL
model with the human data was now conducted for the
DGL predictions. More specifically, we conducted the same repeated measures ANOVA with two factors, Output Source (DGL vs. participants) and Response Type (1st conj. vs. 2nd conj.). The
results revealed no main effects (Output Source: F(1,11)
= 0.09, p = .775; Response Type: F(1,11) = 1.71, p = .217),
and, crucially, no interaction between the two factors
(F(1,11) = 0.003, p = .955). That is, the difference between
mean log-odds of 1st and 2nd conjugation responses for
these items was similar in the human data and the DGL
predictions. Even though this interaction was not significant, we further tested the DGL predictions by comparing
log-odds of 1st and 2nd conjugation responses, within and
across each Output Source. The DGL model predicted larger
mean log-odds of 1st (50.8%), rather than 2nd (43.0%) conjugation responses (t(11) = 3.20, p = .008). This same contrast was not statistically significant in the analysis of the
human data (see above), but the discrepancy can be attributed to the larger variability in participant responses. In
fact, as can be seen in Fig. 1 (panel b), mean proportions
predicted by the DGL model are almost identical to those
in the participant data, and they do not differ for either
1st conjugation responses (50.9% vs. 50.8%; t(11) = 0.02,
p = .982), or 2nd conjugation responses (44.2% vs. 43.0%;
t(11) = 0.13, p = .901).
Considering now the predictions of the DGL model for
the items that are highly similar to the 3rd conjugation
(see Fig. 1, panel c), the same repeated measures ANOVA
was conducted, albeit on 1st and 3rd conjugation responses.
The ANOVA revealed no main effect of Output Source
(F(1,11) = 2.64, p = .133), but a main effect of Response Type
(F(1,11) = 11.72, p = .006), reflecting more 1st, rather than
3rd, conjugation responses. The interaction between these
two factors approached significance (F(1,11) = 4.44,
p = .059), which suggests a larger difference between 1st
and 3rd conjugation responses in the participant data than
in the DGL predictions. Paired contrasts showed that the
DGL model predicted larger proportions of 1st (56.0%)
rather than 3rd (41.0%) conjugation responses (t(11)
= 7.72, p < .001), which is the same pattern that was
obtained in the human data (see above). However, marginally significant comparisons across Output Sources suggest
that the model slightly underestimates 1st conjugation
responses (65.4% vs. 56.0%; t(11) = 1.96, p = .076) and
slightly overestimates 3rd conjugation responses (31.2%
vs. 41.0%; t(11) = 2.20, p = .051).
MGL vs. DGL in absolute error
In the last analysis contrasting the two computational
implementations, the MGL and DGL models were directly
compared in how well they fit the data across all items
employed in the present study. In order to assess this, absolute error was calculated for each item and each response
type, for each of the two models. We calculated the difference between the MGL and DGL predicted proportions
and the obtained proportions of responses given by participants (after conversion to log-odds), regardless of whether
the human data were being under- or overestimated. Average absolute error produced by the MGL and DGL models
was then compared for all three response types, by conducting t-tests.
The results revealed that average absolute error for 1st
conjugation responses was significantly lower for the DGL
(0.68) than for the MGL model (0.85; t(75) = 3.81,
p < .001). The same advantage was obtained for responses
belonging to the non-default classes: the DGL model outperformed the MGL model in its predictions of the proportions of 2nd conjugation responses (0.97 vs. 1.05; t(75)
= 2.42, p = .018) and of 3rd conjugation responses (0.80 vs.
0.99; t(75) = 6.22, p < .001).
Finally, root mean squared error (a commonly used measure of model accuracy) was also calculated and was uniformly lower for the DGL model, for 1st (MGL vs. DGL: 1.00 vs.
0.80), 2nd (1.29 vs. 1.23), and 3rd conjugation responses
(1.21 vs. 1.02).
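RMSE differs from mean absolute error only in squaring the residuals before averaging and taking the square root afterwards. A short sketch with hypothetical inputs:

```python
import numpy as np


def rmse(predicted, observed):
    """Root mean squared error between predicted and observed log-odds."""
    diff = np.asarray(predicted, float) - np.asarray(observed, float)
    return float(np.sqrt(np.mean(diff ** 2)))


def mae(predicted, observed):
    """Mean absolute error between predicted and observed log-odds."""
    diff = np.asarray(predicted, float) - np.asarray(observed, float)
    return float(np.mean(np.abs(diff)))


pred = [1.0, -2.0, 1.0, -3.0]  # hypothetical predicted log-odds
obs = [0.0, 0.0, 0.0, 0.0]     # hypothetical observed log-odds
print(mae(pred, obs), round(rmse(pred, obs), 4))
```

For any set of residuals, RMSE is at least as large as the mean absolute error, and it weights large misfits more heavily.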
Summary
The results of Study 2 showed that the preference for
the 1st conjugation is sufficiently strong to override high
levels of similarity to the other conjugations. For items
very similar to the 2nd conjugation, proportions of 1st
and 2nd conjugation infinitives were statistically indistinguishable (but numerically larger for the 1st conjugation).
Similarly, for items that match highly reliable 3rd conjugation rules, the mean proportion of 1st conjugation infinitives was even larger than that of responses belonging to
the 3rd conjugation.
When comparing Albright’s (2002a) MGL model of stem
formation with the participant data, it was clear that the
model failed to account for the data in a very specific
way: it consistently overestimated the role of similarity in
the generalization of inflectional classes in Portuguese. This
led to inflated predictions of the proportions of 2nd and
3rd conjugation responses for verbs that fall into specific
phonological contexts that are characteristic of these classes. Conversely, the MGL model underestimated the proportion of 1st conjugation responses, which were found to be
more numerous than what could be predicted by the reliability metric that determines rule priority in the model.
In contrast, the predicted proportions of responses
extracted from our DGL model showed a remarkably close
fit to the human data, which was, in fact, statistically indistinguishable from the obtained proportions.
General discussion
The combined results of Study 1, that is, the default-like
behavior of 1st conjugation and the similarity effects of 2nd
and 3rd conjugation stems, are straightforwardly explained
by the dual-mechanism account put forward by Say and
Clahsen (2002), for Italian, and Veríssimo and Clahsen
(2009) for Portuguese. This account distinguishes between
a general rule for 1st conjugation stem formation that
may apply to any verbal root and a restrictive set of associatively represented 2nd and 3rd conjugation stems and
roots.
Alternatively, it has been proposed that conjugational
stems do not have any internal morphological structure
and that the default-like behavior of the 1st conjugation is
a consequence of its higher frequency or of its more heterogeneous phonological distribution. Consider firstly Eddington’s
(2002) attempt to simulate the results from Say and Clahsen
(2002). It is true that this model correctly simulated the finding that novel forms of verbs that bear no similarity to existing words are preferably assigned to the 1st conjugation.
Closer inspection, however, reveals that the simulation output was inaccurate in several other ways. For novel verbs
resembling existing 2nd and 3rd conjugation verbs with
irregular stems, for example, speakers of Italian preferred
1st conjugation forms (59% and 71%, respectively), but in
Eddington’s model the majority of responses belonged to
the 2nd (55%) or 3rd conjugations (56%). For novel verbs that
rhymed with existing 2nd conjugation verbs, Say and
Clahsen’s participants produced 1st and 2nd conjugation
forms with similar percentages (43% for the former, 45% for
the latter), but Eddington’s model dispreferred 1st over 2nd
conjugation forms (27% vs. 54%). These discrepancies
between the model’s output and the performance of human
participants are a result of the model’s overreliance on phonologically similar patterns as the basis for generalization.
Similar problems arise in Colombo et al.’s (2006) connectionist network model of stem formation in Italian. Again,
the proportions of 1st conjugation responses (i.e., participles in -ato) to 2nd or 3rd conjugation pseudoverbs were
much smaller than for human participants, indicating an
oversensitivity of the network to phonological similarity.
Furthermore, unlike human participants, Colombo et al.’s
model produced a large proportion of incorrect (‘unclassifiable’) responses, especially for 2nd and 3rd conjugation
pseudoverbs. This relatively poor performance is likely to
be a consequence of the high proportion of 1st conjugation
forms in the training set, which enabled the network to
approximate the default-like behavior of the 1st conjugation. However, Colombo et al.’s results suggest that this
can only be achieved at the expense of an unacceptable
level of performance for the other classes. Thus, purely
frequency and similarity-based models such as the ones
proposed by Eddington (2002) and Colombo et al. fail to
accurately simulate the different generalization properties
of the inflectional classes.
Finally, Albright’s (2002a) MGL model is not able to capture the generalization properties of 1st conjugation stems
in Portuguese. If, as postulated by the MGL model, all conjugations were generalized on the basis of a context-sensitive
mechanism, we would expect our manipulation of phonological similarity to have an effect on the proportion of
responses belonging to all classes. However, our results
showed that the MGL reliability scores only predicted proportions of 2nd and 3rd conjugation responses. Furthermore,
the MGL model consistently underestimated the proportion
of 1st conjugation responses given by participants, that is,
the 1st conjugation is generalized to novel roots beyond
what would be expected given their phonological similarity
to existing roots.
In contrast to the present findings, however, Albright’s
(2002a) acceptability judgement study in Italian produced
a different pattern of results, in which acceptability ratings
of novel infinitival forms belonging to all conjugations were
found to be predicted by their respective reliability scores.
Although ratings given to 1st conjugation forms were
(negatively) predicted by ratings given to other classes,
and were influenced by a phonological well-formedness
metric, there was still a significant effect of the reliabilities
for the 1st conjugation. Therefore, whilst our results for the
2nd and 3rd conjugations are in accordance with those
reported by Albright in Italian, there is an important inconsistency regarding the role of phonological similarity in the
generalization of 1st conjugation stems. How can these
discrepancies be explained?
First, the possibility that the conflicting results are attributable to language-specific differences cannot be ruled out.
Although the morphological characteristics of Portuguese
and Italian are very similar, the distribution of the different
conjugations in terms of phonological properties and their
type and token frequencies are different, and such differences could perhaps bias the morphological system towards
a larger reliance on context-sensitive or variable-based generalization. However, given the scarcity of psycholinguistic
evidence pertaining to this issue, there is no principled
reason for assuming that the generalization of conjugation
classes in different Romance languages should be characterized by distinct representational mechanisms.
Setting aside this basic divergence, the only obvious
differences between Albright’s study and the present experiment are of a methodological nature. Firstly, regarding the
MGL input and the construction of stimuli, our elicited production experiment featured a number of methodological
improvements over Albright’s (2002a) experiment. One of
these is the size of the frequency lexicon that served as
input to the MGL simulation. We used a considerably larger
set of verbs than Albright (n = 3,117 vs. n = 2,022). A smaller
frequency-ordered lexicon will differ from a larger one primarily in the number of 1st conjugation verbs it contains,
which, in turn, might lead to dramatic changes in the corresponding reliabilities, possibly causing them to be underestimated. It is difficult to assess the effects of this difference
without running MGL simulations comparing lexicons of
different sizes, but it should be clear that the reliability values we obtained are more realistic than the values used by
Albright. In addition, our materials are also more representative of the language than Albright’s in that they covered a
wide range of reliability scores, whereas Albright’s items
were confined to a subset of verbs with very high reliabilities for the 1st conjugation (see Method of Study 1).
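The lexicon-size point can be made concrete with a small sketch. It computes the raw reliability of a rule as defined by Albright and Hayes (2002), namely the number of forms the rule derives correctly (hits) divided by the number of forms that meet its structural description (scope), on a toy frequency-ordered list. The verbs are real Portuguese verbs with their correct conjugations, but their frequency order is invented for illustration and is not taken from the frequency lexicon used in Study 1. The function name `raw_reliability` is hypothetical. Albright and Hayes also adjust raw reliability into a confidence score that penalizes rules with small scopes; that step is omitted here.

```python
# Sketch: raw MGL-style reliability (hits / scope) on a toy lexicon.
# ASSUMPTION: the frequency order below is invented for illustration only.

Lexicon = list[tuple[str, int]]  # (1sg present indicative form, conjugation class)


def raw_reliability(lexicon: Lexicon, context: str, conjugation: int) -> float:
    """Hits / scope for a rule mapping 1sg forms ending in `context` to `conjugation`.

    scope: verbs whose 1sg form matches the rule's structural description.
    hits:  verbs in the scope that actually belong to `conjugation`.
    """
    scope = [conj for form, conj in lexicon if form.endswith(context)]
    if not scope:
        return 0.0
    hits = sum(1 for conj in scope if conj == conjugation)
    return hits / len(scope)


# Hypothetical frequency-ordered lexicon (most frequent first).
TOY_LEXICON: Lexicon = [
    ("vendo", 2),      # vender
    ("entendo", 2),    # entender
    ("defendo", 2),    # defender
    ("recomendo", 1),  # recomendar
    ("estendo", 2),    # estender
    ("encomendo", 1),  # encomendar
    ("emendo", 1),     # emendar
    ("remendo", 1),    # remendar
]

full = raw_reliability(TOY_LEXICON, "endo", 1)           # 4 hits / 8 in scope
truncated = raw_reliability(TOY_LEXICON[:5], "endo", 1)  # 1 hit / 5 in scope
print(f"1st conj. reliability of -endo: full = {full:.2f}, truncated = {truncated:.2f}")
# prints: 1st conj. reliability of -endo: full = 0.50, truncated = 0.20
```

Because the low-frequency tail of this toy list happens to consist of 1st conjugation verbs, cutting the list lowers the 1st conjugation reliability of the -endo rule. Whether real lexicons of different sizes behave this way would, as noted above, have to be checked with actual MGL simulations.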
Secondly, and more importantly, the two studies crucially
differ in the experimental tasks that were used. The present
experiment used elicited production as a means to assess
generalization, whilst Albright’s (2002a) study employed
an acceptability judgement task. In Albright’s experiment,
participants were presented with a 1sg present indicative
form of a given verb followed by the corresponding infinitive forms (belonging to different conjugations) and they
had to rate how ‘typical’ each form sounded to them. It is
conceivable that this particular presentation format recruits
processes that are very different from the ones normally
involved in language use. For example, the emphasis on
the ‘typicality’ of a form might trigger (perhaps conscious)
processes of lexical search for similar existing forms.
Furthermore, the presentation of all possible infinitival
stems for a given root may encourage participants to perform artificial comparisons as to how typical each form is
as a member of a particular class. In this way, Albright’s task
may have produced inflated similarity effects. Compare this
with an elicited production experiment, in which participants are only given one ambiguous form and asked to
complete a sentence blank in the way they consider most
appropriate. We consider this a more natural way of examining the mechanisms involved in the generalization of
stems than Albright’s acceptability judgement task.
In Study 2, we also directly contrasted the predictions of the MGL model with a revised model of stem formation. To produce a novel implementation, we changed
the original MGL model in a single specific way, which we
argued would bring the model closer to a dual-mechanism
account (Clahsen, 1999; Say & Clahsen, 2002; Veríssimo &
Clahsen, 2009). Instead of confidence in a 1st conjugation
output being measured by Albright and Hayes’s (2002) reliability function (which measures the ‘success’ of a given
rule), we proposed that a default context-free rule is treated
by the linguistic system in a special way, in that it is associated with a maximum confidence value. In other words, the
default morphological transformation is associated with a
maximum well-formedness score.
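The modification can be sketched as follows, assuming (as the discussion of minor 1st conjugation rules below implies) that the confidence a model assigns to a conjugation is that of the best rule of that conjugation whose context matches the novel form. The `Rule` type, the suffix contexts and all confidence values are hypothetical illustrations, not rules or values from the actual MGL simulation. The only difference between the two settings is whether the default (1st) conjugation receives an additional context-free rule with confidence 1.0.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Rule:
    conjugation: int   # 1 = -ar, 2 = -er, 3 = -ir
    context: str       # required 1sg suffix; "" matches any form (context-free)
    confidence: float  # hypothetical value, not taken from the MGL simulation


# Hypothetical rule inventory learned from a lexicon.
RULES = [
    Rule(1, "", 0.45),     # context-free 1st conj. rule, input-driven confidence
    Rule(1, "ivo", 0.93),  # hypothetical minor (island) rule for the 1st conj.
    Rule(2, "endo", 0.83),
    Rule(3, "uo", 0.57),
]


def class_scores(form: str, rules: Sequence[Rule],
                 default: Optional[int] = None) -> dict[int, float]:
    """Confidence of the best matching rule for each conjugation.

    With `default` set (DGL-style), that conjugation also receives a
    context-free rule with maximal confidence 1.0, so no minor rule of the
    same class can ever outrank it.
    """
    candidates = list(rules)
    if default is not None:
        candidates.append(Rule(default, "", 1.0))
    scores = {1: 0.0, 2: 0.0, 3: 0.0}
    for rule in candidates:
        if form.endswith(rule.context):
            scores[rule.conjugation] = max(scores[rule.conjugation], rule.confidence)
    return scores


for form in ("livo", "bendo"):
    print(form, "MGL-style:", class_scores(form, RULES))
    print(form, "DGL-style:", class_scores(form, RULES, default=1))
```

In the MGL-style setting the 1st conjugation score depends on which rules match (0.93 for livo, 0.45 for bendo); in the DGL-style setting it is 1.0 for both forms, while the 2nd and 3rd conjugation scores are unchanged.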
The results of this study showed that the DGL model outperforms the MGL model both in overall error and in
accounting for the patterns of participant responses for different types of items. In particular, whilst Albright’s (2002a)
MGL underestimated the proportions of 1st conjugation
responses that were given to items that are highly similar
to the 2nd and 3rd conjugations, the DGL model predicts
average proportions that are remarkably similar to the ones
found in the human data. Furthermore, whilst the MGL
model underestimated 1st conjugation responses even for
novel forms that it preferably assigns to that class, the
DGL model does not.
Importantly, this implementation also provided proof of
principle in two respects. Firstly, it shows that a model
embedding a dual-mechanism architecture does not overgeneralize the default transformation. That is, even though
the 1st conjugation has a quantitative advantage relative
to the other classes, mean proportions of 1st conjugation
responses were not overestimated. Secondly, it demonstrates that even substantial proportions of non-default
responses (in our study, 20% of 2nd and 3rd conjugation
responses for items that are relatively more similar to the
1st conjugation) are not inexplicable under a dual-mechanism account. On the contrary, these responses follow naturally from the fact that such items bear some resemblance
to existing 2nd and 3rd conjugation verbs, and in fact, the
strength of this process relative to the preference for the
1st conjugation appears to have been almost perfectly
approximated by the dual-mechanism model.
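How a maximally confident default can coexist with a sizeable share of non-default responses can be illustrated with simple arithmetic. The sketch assumes that class scores are turned into predicted response proportions with a Luce (1959) ratio (each class's score divided by the sum of all scores), and it treats the MGL reliabilities listed for the item perfenso in the list of experimental stimuli as those scores. Both are simplifying assumptions rather than a description of the exact procedure used in Study 2, and `luce_proportions` is a hypothetical helper.

```python
def luce_proportions(scores: dict[int, float]) -> dict[int, float]:
    """Predicted share of responses per conjugation (Luce-style ratio).

    ASSUMPTION: share = class score / sum of all class scores.
    """
    total = sum(scores.values())
    return {conj: score / total for conj, score in scores.items()}


# MGL reliabilities listed for "perfenso" in the stimulus list,
# treated here (assumption) as the class scores.
perfenso_mgl = {1: 0.812, 2: 0.215, 3: 0.067}
# DGL-style variant: only the default (1st) conjugation changes, to 1.0.
perfenso_dgl = {**perfenso_mgl, 1: 1.0}

for label, scores in (("MGL", perfenso_mgl), ("DGL", perfenso_dgl)):
    shares = luce_proportions(scores)
    print(f"{label}: 1st = {shares[1]:.3f}, non-default = {1 - shares[1]:.3f}")
# prints:
# MGL: 1st = 0.742, non-default = 0.258
# DGL: 1st = 0.780, non-default = 0.220
```

Even with the default fixed at 1.0, about a fifth of the predicted responses for this item go to the 2nd and 3rd conjugations, because the item's resemblance to existing non-default verbs still contributes to the denominator.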
Finally, another crucial respect in which the DGL model
outperforms Albright’s (2002a) MGL is by predicting that
the generalization of the 1st conjugation to novel forms is
insensitive to phonological similarity. Given that the context-free rule is ascribed maximal reliability, it will
always serve as the best 1st conjugation rule; any minor
1st conjugation rules that apply in restricted parts of the
phonological space have lower reliability and do not make
any contribution to the output. As such, our DGL model
treats the generalization of the 1st conjugation as always
based on a context-free mechanism in which phonological
similarity plays no role, which accounts for the results of
our regression analyses of the elicited production task.
Even though we demonstrated that the kind of structured phonological similarity that is captured by the MGL
model is not sufficient to explain the generalization of 1st
conjugation stems, we have not explored the potential role
of other sources of similarity for morphological processes.
For example, Ramscar (2002) has argued that semantic similarity plays a role in English past tense inflection. Likewise,
Keuleers et al. (2007) have used a plural inflection task in
Dutch to show that an analogical memory-based model that
makes use of both phonological and orthographic information produces generalization patterns similar to those of human
participants. One direction for future work is to investigate
whether such effects can be found in stem formation, for
example, by applying the same algorithm of minimal
generalization to inputs and outputs that incorporate non-phonological representations.
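The core step of minimal generalization can be sketched in a much simplified, string-based form. The sketch assumes that a word-specific rule pairs the material to the left of the 1sg ending -o with the change to the infinitive ending, and that two rules with the same change are merged by keeping only their longest shared right-aligned material and replacing the rest with a variable X. Albright and Hayes's model additionally generalizes over the phonological features of the next segment, which is omitted here, and all function names are hypothetical. Applying the idea to non-phonological representations would amount to changing what `shared_suffix` compares, for instance sequences of orthographic or semantic features instead of letters.

```python
from typing import Optional, Tuple

Change = Tuple[str, str]    # (1sg ending, infinitive ending), e.g. ("o", "er")
RuleT = Tuple[str, Change]  # (context to the left of the change, change)


def word_specific_rule(form_1sg: str, infinitive: str) -> RuleT:
    """Build e.g. ('vend', ('o', 'er')) from the pair ('vendo', 'vender')."""
    if not form_1sg.endswith("o"):
        raise ValueError("expected a 1sg present indicative form ending in -o")
    root = form_1sg[:-1]
    if not infinitive.startswith(root):
        raise ValueError("stem alternations are not handled in this sketch")
    return root, ("o", infinitive[len(root):])


def shared_suffix(a: str, b: str) -> str:
    """Longest right-aligned string shared by a and b."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return a[len(a) - n:]


def generalize(rule_a: RuleT, rule_b: RuleT) -> Optional[RuleT]:
    """Merge two rules with the same change; None if their changes differ."""
    (ctx_a, change_a), (ctx_b, change_b) = rule_a, rule_b
    if change_a != change_b:
        return None
    return "X" + shared_suffix(ctx_a, ctx_b), change_a


vender = word_specific_rule("vendo", "vender")
defender = word_specific_rule("defendo", "defender")
beber = word_specific_rule("bebo", "beber")
emendar = word_specific_rule("emendo", "emendar")

print(generalize(vender, defender))  # ('Xend', ('o', 'er'))
print(generalize(vender, beber))     # ('X', ('o', 'er'))
print(generalize(vender, emendar))   # None
```

Merging vender with defender yields a restricted -end- rule for the 2nd conjugation, whereas merging it with beber leaves no shared material and yields a context-free 2nd conjugation rule; rules with different changes are not merged.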
Note, however, that such accounts also predict effects of
phonological similarity for all types of morphological generalization (besides effects of other types of similarity).
Therefore, the challenge for any similarity-driven account
is showing that effects of phonological similarity in Portuguese dissociate along conjugation classes: they are absent
in the generalization of the 1st conjugation, but present in
the case of the non-default classes.
Conclusions
In sum, when the three Portuguese conjugations were
explicitly distinguished such that the default conjugation is
‘maximally reliable’, and the non-default classes are still generalized in a restricted manner on the basis of similarity, we
arrived at a much better account, which produced less overall
error and explained both the qualitative and quantitative
patterns in the results of our elicited production study. In
contrast, when the three conjugations were generalized by
similarity and their strength was derived purely from input-driven statistics, as is the case in the MGL model, the role
ascribed to similarity was larger than what participants’ responses reveal.
This specific discrepancy from the human data is evident
not only in the behavior of Albright’s (2002a) MGL model,
but also of other similarity-based models of conjugation
assignment in Romance languages (e.g., Colombo et al.,
2006; Eddington, 2002; see above). We believe this to be
a consequence of these models’ single-mechanism architectures and, in particular, that the pattern of errors they display suggests that purely similarity-driven algorithms of
morphological acquisition are insufficient to exhibit
default-like generalizations.
By taking Romance conjugations as a case of pure
morphology and contrasting two minimally differing
computational models, we have provided evidence for
a distinction between two different mechanisms of
morphological generalization: context-free, unbounded
operations and context-sensitive restricted generalizations.
More generally, our results demonstrate the need to
postulate variable-based symbolic operations to account
for linguistic productivity and support approaches that
argue for a dual architecture of the language faculty (e.g.,
Clahsen, 2006; Pinker & Ullman, 2002).
Acknowledgments
Supported by doctoral (SFRH/BD/13195/2003) and postdoctoral fellowships (SFRH/BPD/65164/2009) awarded to
João Veríssimo by the Fundação para a Ciência e a Tecnologia, Portugal, and by an Alexander von Humboldt Professorship awarded to Harald Clahsen. We thank Adam Albright
for advice on implementing the computational simulation,
Constança Carvalho for recruiting many of the participants,
Patrícia Vidigal for help in designing the materials, and
three JML reviewers (T.M. Bailey, E. Keuleers, one anonymous) for detailed and helpful comments.
List of experimental stimuli
Items used in the current studies (in the 1st person singular present
indicative), their corresponding MGL reliabilities for the
1st, 2nd and 3rd conjugations, and descriptive statistics for
their distribution.
MGL reliabilities
                 1st conj. (-ar)   2nd conj. (-er)   3rd conj. (-ir)
Descriptives
  Mean           .698              .192              .280
  SD             .177              .293              .248
  Min.           .459              .000              .067
  Max.           .994              .983              .923
  Range          .534              .983              .856
Item
  prizo          .994              .084              .120
  lico           .990              .040              .081
  rento          .983              .040              .069
  bito           .970              .040              .080
  apreio         .970              .104              .078
  matreio        .970              .104              .078
  alfego         .967              .102              .123
  buro           .951              .034              .089
  faugo          .938              .102              .123
  livo           .933              .082              .171
  zalo           .933              .000              .074
  fanso          .929              .036              .067
  sulho          .916              .000              .074
  lauso          .911              .084              .487
  feduzo         .911              .084              .923
  frigo          .908              .102              .123
  jasto          .908              .040              .069
  beço           .899              .036              .570
  saurro         .872              .061              .123
  faivo          .852              .082              .171
  pretuo         .852              .043              .570
  launo          .821              .000              .719
  perfenso       .812              .215              .067
  labenso        .812              .215              .067
  astenso        .812              .215              .067
  renso          .812              .215              .067
  dechenço       .812              .215              .067
  beilo          .805              .000              .074
  acuo           .787              .043              .512
  sôcho          .785              .036              .067
  fraimo         .785              .000              .292
  lauvo          .775              .082              .171
  micho          .739              .036              .067
  sempo          .733              .058              .145
  taucho         .723              .036              .067
  laido          .719              .102              .297
  empido         .719              .102              .297
  estumo         .719              .000              .292
  defido         .706              .102              .355
  tromo          .646              .000              .238
  vundo          .636              .102              .354
  manstituo      .612              .043              .872
  lustituo       .612              .043              .872
  bendo          .610              .833              .150
  assido         .607              .102              .433
  ajido          .601              .102              .355
  espoudo        .601              .102              .251
  chinzo         .584              .084              .120
  quendo         .584              .959              .150
  inuo           .575              .043              .742
  ambuo          .575              .043              .685
  anhuo          .575              .043              .629
  prijo          .574              .075              .523
  azijo          .574              .075              .523
  lajo           .574              .075              .300
  mecisto        .574              .040              .719
  conzuo         .570              .043              .804
  fubo           .544              .102              .355
  saubo          .544              .102              .355
  tureo          .503              .983              .067
  anheo          .503              .983              .067
  treso          .503              .872              .067
  azeso          .503              .872              .067
  apreso         .503              .872              .067
  dezêbo         .492              .852              .171
  feibo          .492              .102              .607
  taibo          .492              .102              .607
  denfo          .486              .058              .145
  pinvo          .482              .064              .171
  estenvo        .482              .064              .171
  arro           .482              .061              .123
  dorro          .482              .916              .123
  murêvo         .482              .908              .171
  perzolvo       .482              .719              .171
  sarrolvo       .482              .489              .171
  faio           .480              .034              .694
  esgaio         .480              .034              .694
  solho          .459              .000              .074

References
Agresti, A. (2002). Categorical data analysis (2nd ed.). New York, NY: John
Wiley & Sons.
Albright, A. (2002a). Islands of reliability for regular morphology: Evidence
from Italian. Language, 78, 684–709.
Albright, A. (2002b). The lexical bases of morphological well-formedness.
In S. Bendjaballah, W. U. Dressler, O. E. Pfeiffer, & M. D. Voeikova (Eds.),
Morphology 2000 (pp. 5–15). Amsterdam: John Benjamins.
Albright, A., & Hayes, B. (2002). Modeling English past tense intuitions with
minimal generalization. In M. Maxwell (Ed.), Proceedings of the sixth
meeting of the ACL special interest group in computational phonology
(pp. 58–69). Philadelphia: Association for Computational Linguistics.
Albright, A., & Hayes, B. (2003). Rules vs. analogy in English past tenses: A
computational/experimental study. Cognition, 90, 119–161.
Anscombe, F. J. (1956). On estimating binomial response relations.
Biometrika, 43, 461–464.
Aronoff, M. (1976). Word formation in generative grammar. Cambridge, MA:
MIT Press.
Aronoff, M. (1994). Morphology by itself: Stems and inflectional classes.
Cambridge, MA: MIT Press.
Baayen, R. H. (2008). Analyzing linguistic data: A practical introduction to
statistics using R. Cambridge: Cambridge University Press.
Bacelar do Nascimento, M. F., Casteleiro, J. M., Marques, M. L. G., Barreto, F.,
Amaro, R., & Veloso, R. (2000). Léxico multifuncional computorizado do
português contemporâneo [Computerized multifunctional lexicon of
contemporary Portuguese]. Lisboa: Centro de Linguística da
Universidade de Lisboa.
Barr, D. J. (2008). Analyzing ‘visual world’ eyetracking data using
multilevel logistic regression. Journal of Memory and Language, 59,
457–474.
Berent, I., Pinker, S., & Shimron, J. (1999). Default nominal inflection in
Hebrew: Evidence for mental variables. Cognition, 72, 1–44.
Berko, J. G. (1958). The child’s learning of English morphology. Word, 14,
150–177.
Bybee, J. L. (1995). Regular morphology and the lexicon. Language and
Cognitive Processes, 10, 425–455.
Bybee, J. L., & Moder, C. L. (1983). Morphological classes as natural
categories. Language, 59, 251–270.
Chomsky, N. (1965). Aspects of the theory of syntax. Cambridge, MA: MIT
Press.
Chomsky, N. (1980). Rules and representations. New York, NY: Columbia
University Press.
Clahsen, H. (1999). Lexical entries and rules of language: A
multidisciplinary study of German inflection. Behavioral and Brain
Sciences, 22, 991–1013.
Clahsen, H. (2006). Dual-mechanism morphology. In K. Brown (Ed.),
Encyclopedia of language and linguistics (Vol. 4, pp. 1–5). Oxford:
Elsevier.
Colombo, L., Stoianov, I., Pasini, M., & Zorzi, M. (2006). The role of
phonology in the inflection of Italian verbs: A connectionist
investigation. The Mental Lexicon, 1, 147–181.
Daelemans, W. (2002). A comparison of analogical modeling of language to
memory-based language processing. In R. Skousen, D. Lonsdale, & D. B.
Parkinson (Eds.), Analogical modeling: An exemplar-based approach to
language (pp. 157–179). Amsterdam: John Benjamins.
Eddington, D. (2002). Dissociation in Italian conjugations: A single-route
account. Brain and Language, 81, 291–302.
Elman, J. L. (1993). Learning and development in neural networks: The
importance of starting small. Cognition, 48, 71–99.
Fodor, J., & Pylyshyn, Z. W. (1988). Connectionism and cognitive
architecture: A critical analysis. In S. Pinker & J. Mehler (Eds.),
Connections and symbols (pp. 3–71). Cambridge, MA: MIT Press.
Gart, J. J. (1966). Alternative analyses of contingency tables. Journal of the
Royal Statistical Society, B28, 164–179.
Gart, J. J., & Zweifel, J. R. (1967). On the bias of various estimators of the
logit and its variance with application to quantal bioassay. Biometrika,
54, 181–187.
Gulliksen, H. (1950). Theory of mental tests. New York, NY: John Wiley &
Sons.
Hahn, U., & Nakisa, R. C. (2000). German inflection: Single route or dual
route? Cognitive Psychology, 41, 313–360.
Haldane, J. B. S. (1955). The estimation and significance of the logarithm of
a ratio of frequencies. Annals of Human Genetics, 20, 309–311.
Hare, M. L., & Elman, J. L. (1995). Learning and morphological change.
Cognition, 56, 61–98.
Hare, M. L., Elman, J. L., & Daugherty, K. G. (1995). Default generalization in
connectionist networks. Language and Cognitive Processes, 10, 601–630.
Hunter, J. E., & Schmidt, F. L. (1990). Methods of meta-analysis: Correcting
error and bias in research findings. London: Sage Publications.
Jaeger, T. F. (2008). Categorical data analysis: Away from ANOVAs
(transformation or not) and towards logit mixed models. Journal of
Memory and Language, 59, 434–446.
Keuleers, E., Sandra, D., Daelemans, W., Gillis, S., Durieux, G., & Martens, E.
(2007). Dutch plural inflection: The exception that proves the analogy.
Cognitive Psychology, 54, 283–318.
Kiparsky, P. (1973). Elsewhere in phonology. In S. Anderson & P. Kiparsky
(Eds.), A festschrift for Morris Halle (pp. 93–106). New York, NY: Holt,
Rinehart and Winston.
Luce, R. D. (1959). Individual choice behavior: A theoretical analysis. New
York, NY: Wiley.
Marcus, G. F. (2001). The algebraic mind: Integrating connectionism and
cognitive science. Cambridge, MA: MIT Press.
Marcus, G. F., Brinkmann, U., Clahsen, H., Wiese, R., & Pinker, S. (1995).
German inflection: The exception that proves the rule. Cognitive
Psychology, 29, 189–256.
Marslen-Wilson, W. D., & Tyler, L. K. (1997). Dissociating types of mental
computation. Nature, 387, 592–594.
Marslen-Wilson, W. D., & Tyler, L. K. (2007). Morphology, language and the
brain: The decompositional substrate for language comprehension.
Philosophical Transactions of the Royal Society B, 362, 823–836.
Mateus, M. H. M., & d’Andrade, E. (2000). The phonology of Portuguese. New
York, NY: Oxford University Press.
McCullagh, P., & Nelder, J. A. (1989). Generalized linear models. New York,
NY: Chapman and Hall.
Mikheev, A. (1997). Automatic rule induction for unknown-word guessing.
Computational Linguistics, 23, 405–423.
Pinker, S. (1999). Words and rules: The ingredients of language. New York,
NY: Basic Books.
Pinker, S., & Ullman, M. T. (2002). The past and future of the past tense.
Trends in Cognitive Sciences, 6, 456–463.
Prasada, S., & Pinker, S. (1993). Generalization of regular and irregular
morphological patterns. Language and Cognitive Processes, 8, 1–56.
Ramscar, M. (2002). The role of meaning in inflection: Why the past tense
does not require a rule. Cognitive Psychology, 45, 45–94.
Rumelhart, D. E., & McClelland, J. L. (1986). On learning the past tenses of
English verbs. In J. L. McClelland, D. E. Rumelhart, & The PDP Research
Group (Eds.), Parallel distributed processing: Explorations in the
microstructure of cognition (Vol. 2, pp. 216–271). Cambridge, MA: MIT Press.
Say, T., & Clahsen, H. (2002). Words, rules and stems in the Italian mental
lexicon. In S. Nooteboom, F. Weerman, & F. Wijnen (Eds.), Storage and
computation in the language faculty (pp. 93–129). Dordrecht: Kluwer.
Skousen, R. (1992). Analogy and structure. Dordrecht: Kluwer Academic.
Skousen, R., Lonsdale, D., & Parkinson, D. B. (2002). Analogical modeling: An
exemplar-based approach to language. Amsterdam: John Benjamins.
Thorndike, R. L. (1949). Personnel selection: Test and measurement
techniques. New York, NY: John Wiley & Sons.
Ullman, M. T. (1999). Acceptability ratings of regular and irregular past-tense forms: Evidence for a dual-system model of language from word
frequency and phonological neighbourhood effects. Language and
Cognitive Processes, 14, 47–67.
Veríssimo, J., & Clahsen, H. (2009). Morphological priming by itself: A
study of Portuguese conjugations. Cognition, 112, 187–194.
Villalva, A. (2000). Estruturas morfológicas: Unidades e hierarquias nas
palavras do português [Morphological structures: Units and hierarchies in
Portuguese words]. Lisboa: Fundação Calouste Gulbenkian, Fundação
para a Ciência e a Tecnologia.
Villalva, A. (2003). Estrutura morfológica básica [Basic morphological
structure]. In M. H. M. Mateus, A. M. Brito, I. Duarte, & I. H. Faria (Eds.),
Gramática da língua portuguesa (6th ed., pp. 917–938). Lisboa: Editorial
Caminho.
Wiseman, S. (1967). The effect of restriction of range upon correlation
coefficients. British Journal of Educational Psychology, 37, 248–252.
Woolf, B. (1954). On estimating the relation between blood group and
disease. Annals of Human Genetics, 19, 251–253.
Zimmerman, D. W., & Williams, R. H. (2000). Restriction of range and
correlation in outlier-prone distributions. Applied Psychological
Measurement, 24, 267–280.