Dependency grammar (and syntactic government)

for Anna Kibort (ed.) Syntactic Government and Subcategorisation
1. A brief history
The terms government and subcategorization are an interesting pair in terms of the
history of syntactic theory because one is ancient while the other is an invention of the
twentieth century with much the same meaning – a clear case of reinventing the
wheel.
By the twelfth century, grammarians were already using the Latin verb regere, ‘to
rule’, to describe the way in which a preposition or verb dictated the case of its
complement (Robins 1967:83), and (according to the Oxford English Dictionary) the
verb govern is used in the same sense by the early seventeenth century. Moreover, the
intellectual and metaphorical foundations for these terms go even further back in time;
so in second-century Alexandria, Apollonius discussed the ways in which different
verbs and prepositions selected different cases in dependent nouns, and even fathered
the term ‘transitive’ (Robins 1967:37). These selection relations received considerable
attention from the Arabic grammarians of the eighth century onwards, who described a
word as ‘governing’ (Arabic ‘a:mil) another word whose case it selected, and even
went so far as to notice that in Arabic (a head-initial language) the governor generally
precedes the governed (Owens 1988:53). In short, the terms govern and government
are at least five hundred years old, and the underlying idea of an asymmetrical
relation in which one word controls another is almost two thousand years old. (Similar
ideas in Panini may well push the history even further back to the fifth century BC;
Robins 1967:145.)
In contrast, the term subcategorization dates back only to 1965, when Chomsky
introduced it (Chomsky 1965) as a solution to a problem that arose from the phrase-structure theory that he had espoused. The basis for phrase structure is the assumption
that only one kind of structure can be represented: that between a whole and its parts.
Thus, the phrase cows eat grass could be related to its parts (cows and eat grass) and
eat grass to its parts (eat and grass) but, crucially, these two parts could not be related
directly to one another. This curiously restrictive assumption meant that there was no
way to show that the presence of grass had anything to do with the properties of eat
as a transitive verb. Chomsky’s solution was to add ‘features’ to eat which showed
how it could combine with other elements: ‘selection features’ for semantic
restrictions and ‘subcategorization features’ for syntactic restrictions.
The term subcategorization is odd, because it would normally mean nothing
but ‘subclassification’ although its intended meaning is much more specific:
subclassification according to the syntactic properties of accompanying elements.
However, there are deeper objections to this ‘solution’ because it is part of a theory
which starts by claiming that part-part relations are not permitted in syntactic
structure. Subcategorization features, like selection features, undermine this principle
by allowing part-part relations in the guise of a classification of the head word, but
without acknowledging that each such feature implies a direct relation between the
parts. Fortunately, at least the terminology in Chomskyan linguistics has reverted to
the more traditional govern and government, a direct relation between governor and
governed, though this relation is still treated as parasitic on the whole-part relations of
phrase structure. Unfortunately, both the terminology and the ideas of the earlier
theory persist in models such as Phrase Structure Grammar (Pollard and Sag 1994).
It is interesting to trace the recent history of these ideas, going back to the first
attempts to produce formal underpinnings for syntactic analysis – systems of analysis
which were sufficiently explicit and clear to be represented diagrammatically. This
history is interesting not only because it concerns some of the most elementary
assumptions of modern syntactic theory (such as government), but also because it
shows varying influence of the various ‘stakeholders’ in syntax – descriptive
linguistics, logic and education. For each theory considered below, the immediate
question for us is how the theory concerned accommodated government relations.
We start with the ‘sentence diagramming’ which was invented in the United States
in the early nineteenth century and reached maturity in 1877 in the work of Reed and
Kellogg (Gleason 1965:73). Their diagramming system allowed structures like Figure
1 for the sentence Cows eat fresh grass, with horizontal lines for government relations
and diagonals for what we call adjuncts. The vertical lines distinguish subjects from
objects, with the verb in the centre of the diagram as the heart of the sentence. Each
relation is shown as a single line, so government relations are shown directly.
Figure 1: A sentence diagram
This diagramming system was intended for use in schools, and was so
successful that it is still taught today in many American (and other) schools – indeed
some readers of this chapter may have learned it as children. It even has a twenty-first-century face in a website (http://1aiway.com/nlp4net/services/enparser/) that generates 'Reed and Kellogg' sentence diagrams to order, and an informative page on Wikipedia.
However, so far as I know it was never used in descriptive linguistics, so it remained a
product of, and for, the school classroom, without any theoretical or research-based
underpinnings. On the other hand, it may well have been part of the school education
of academic linguists, so it is hard to rule out the possibility that it at least suggested
the possibility of using diagrams to display sentence structure.
One feature of Reed and Kellogg diagrams is that (in modern parlance) they
show dependency relations (government and adjunction) but not precedence (word
order); for instance, Figure 1 shows that fresh is an adjunct of grass, but does not
show which word follows which. Another important feature is that they do not
recognise phrases as such, although phrases are implicit in the dependency lines.
These features were given a more thorough theoretical foundation during the 1920s
and 1930s by at least two European linguists, both of whom wanted to improve the
teaching of grammar in schools. On the one hand, Otto Jespersen recognised the
hierarchical ordering of words in phrases such as a furiously barking dog, but
(confusingly) concluded that the word classes concerned could be arranged in three
‘ranks’ so that ‘tertiary’ words such as furiously consistently attach to ‘secondary’
words like barking, which in turn attach to ‘primary’ words such as dog (Jespersen
1924, Pierce 2006). And on the other hand, Lucien Tesnière not only wrote a major
theoretical discussion of dependency relations (published posthumously as Tesnière
1959), but also produced a simple diagramming system. He does not seem to have
known about the Reed and Kellogg system, but he may have been influenced by
German grammarians who developed the idea of dependency, as well as the name, in
the early nineteenth century (Forsgren 2006). His notation showed dependency
relations more consistently and iconically, with dependents consistently written lower
than the words on which they depend in a tree-diagram called a ‘stemma’ such as
Figure 2. Notice that the stemma has the same features as the Reed and Kellogg
diagrams: showing dependency but not precedence, and leaving phrases implicit in
the word-word dependencies.
Figure 2: A stemma
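As a rough illustration of how little machinery such an analysis needs, here is a minimal sketch in Python (my own, and not part of Tesnière's or Reed and Kellogg's systems) in which each word simply records its parent and its relation to it; word order is not represented, and phrases are left implicit:

# Dependency analysis of 'Cows eat fresh grass' as a word-to-parent mapping.
sentence = {
    "eat":   {"parent": None,    "relation": "root"},
    "cows":  {"parent": "eat",   "relation": "subject"},
    "grass": {"parent": "eat",   "relation": "object"},
    "fresh": {"parent": "grass", "relation": "adjunct"},
}

def dependents(head):
    """All the words that depend directly on the given head."""
    return [word for word, link in sentence.items() if link["parent"] == head]

print(dependents("eat"))    # ['cows', 'grass'] - the phrase is only implicit
print(dependents("grass"))  # ['fresh']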
Another European attempt to formalise the notion of government led to Categorial
Grammar, but this time the development was driven by logic (Morrill 2006). A verb
such as eat is incomplete in itself, and needs to combine with a following noun to
produce a phrase such as eat grass; but this too is incomplete until it combines with a
preceding noun to produce a phrase such as cows eat grass, which is complete. These
notions of one word ‘needing’ another accurately reflect the old tradition of
government, although they are extended to include subjects; but they are also
extended in another direction to include adjuncts such as fresh, which is said to need a
noun in order to combine with it and produce another noun – a case of a dependent
being sanctioned by itself rather than by the head.
Categorial Grammar is sensitive to word order, but like Jespersen’s theory it
bases the classification of words directly on their combinatorial needs rather than (as
in traditional grammar) on a bundle of morphological, syntactic and semantic criteria.
The ‘categories’ of Categorial Grammar replace the traditional word classes such as
‘noun’ and ‘verb’, so (at least in early versions of the theory) there is a category for
intransitive verbs (N\S) and another for transitive verbs ((N\S)/N), but none for ‘verb’.
On the other hand, the basis in logic allows a very simple translation from a syntactic
structure to a logical semantic structure. Given the orientation to logic rather than
pedagogy, it is unsurprising that there is no standard diagrammatic representation for
syntactic structure in Categorial Grammar, comparable with Reed and Kellogg
sentence diagrams or stemmas. Figure 3 uses an ad hoc notation which at least reflects
the spirit of Categorial Grammar.
fresh: N/N + grass: N = fresh grass: N
eat: (N\S)/N + fresh grass: N = eat fresh grass: N\S
Cows: N + eat fresh grass: N\S = Cows eat fresh grass: S
Figure 3: A categorial grammar analysis
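To make the combinatorial logic concrete, the following sketch (my own simplification, treating categories as plain strings rather than using any standard categorial toolkit) implements the two combination rules that Figure 3 assumes and replays its derivation:

def forward(left_cat, right_cat):
    """X/Y combines with a Y on its right to give X (None if they do not combine)."""
    if left_cat.endswith("/" + right_cat):
        return left_cat[: -(len(right_cat) + 1)].strip("()")
    return None

def backward(left_cat, right_cat):
    """A Y on the left combines with Y\\X to give X."""
    if right_cat.startswith(left_cat + "\\"):
        return right_cat[len(left_cat) + 1:].strip("()")
    return None

# The derivation in Figure 3, step by step:
fresh_grass = forward("N/N", "N")               # the string N
eat_phrase  = forward("(N\\S)/N", fresh_grass)  # the string N\S
sentence    = backward("N", eat_phrase)         # the string S
print(fresh_grass, eat_phrase, sentence)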
Meanwhile, in the USA the main demand for syntactic theory came from
descriptive linguists working on the local Native American languages, for whom the
tradition developed for highly inflected, case-based, languages such as Latin, Greek,
Hebrew and Arabic proved hard to apply. Bloomfield’s reaction to the problem was to
start from scratch, making the minimum of basic assumptions about how sentences
were structured (Bloomfield 1933). The result was immediate-constituent analysis, in
which the only relation needed, or allowed, in syntax is the whole-part relation
between a phrase and its parts. When diagrams started to be used in works such as
Nida’s analysis of English (Nida 1960) they were the tree diagrams which later
became familiar through Chomsky’s work, as in Figure 4.
[S [NP [N Cows]] [VP [V eat] [NP [Aj fresh] [N grass]]]]
Figure 4: A phrase-structure tree
It is true that some of these early systems, including Nida’s, acknowledge the
importance of the traditional grammatical relations such as ‘subject’ and ‘object’ by
recognising these as sub-divisions of the whole-part relations; but these are still not
true part-part relations like traditional government; and phrase structure, Chomsky’s
purification of immediate-constituency analysis, left no room even for these
concessions to traditional analyses.
When Chomsky was developing his ideas about syntactic structure, he was aware
of Categorial Grammar but not, apparently, of other theories which treated
government relations as basic (personal communication). The main models for his work were,
paradoxically, the post-Bloomfieldian theories of Zellig Harris (Harris 1951) which
he later attacked so vehemently, and the branch of mathematics called ‘formal
language theory’ (and in particular the theory of recursive functions - Smith 1999:56).
Government relations played little or no part in these models, so they were entirely
absent from Chomsky’s earliest work, and it was only in 1965 that he recognised
them through the introduction of the ‘subcategorization’ features discussed earlier.
Since then, the part-part relations of government have increased in importance, but the
framework of whole-part relations remains basic. It is unfortunate that it was possible
to argue in the 1960s that dependency grammars were equivalent to phrase-structure
grammars (Gaifman 1965, Robinson 1970), because this allowed syntacticians to
conclude that dependency grammars could safely be ignored. In fact, the arguments
only showed that one very limited version of dependency grammar was equivalent to
an equally limited version of phrase structure; and in any case, the equivalence was
only weak – in terms of the strings of symbols that could be generated – rather than
the much more important strong equivalence of the structural analyses, where
dependency structures were clearly not equivalent to phrase structures.
During the decades since Chomsky’s generative grammar rose to prominence in
syntactic theory, other approaches have also been growing and developing. Categorial
Grammar has turned into the most popular option for logically oriented linguists, and
has been combined with phrase structure in Head-driven Phrase Structure Grammar
(Pollard and Sag 1994). A number of approaches have combined phrase structure with
a traditional functional analysis (in terms of subjects, objects and the like), but
interpreted as whole-part relations rather than as government relations between a head
and its dependents:
• Systemic Functional Grammar (Halliday 1985)
• Relational Grammar (Blake 1990)
• Functional Grammar (Dik 1991)
• Lexical Functional Grammar (Bresnan 2001)
• Role-and-Reference Grammar (Van Valin 1993)
• Construction Grammar (Croft 2007, Fillmore and others 1988, Goldberg 1995, Tomasello 1998)
What these approaches share is the basic Bloomfieldian assumption that the
foundation for syntactic structure is the whole-part relation between a phrase and its
parts, rather than the part-part relation between a word and its dependents.
During the same period, however, the dependency-based tradition also developed
a range of theories that is almost as diverse as the tradition based on phrase structure.
Many of these theories are inspired by the formal developments of generative
grammar and computational linguistics (Kruijff 2006), including at least the following
list (where some theories are given ad hoc names):
• Abhängigkeitsgrammatik (for computing) (Kunze 1975)
• Generative Dependency Grammar (Vater 1975, Diaconescu 2002)
• Case Grammar (Anderson 1977)
• Functional Generative Description (Sgall and others 1986)
• Lexicase (Starosta 1988)
• Abhängigkeitsgrammatik (for schools) (Heringer 1993)
• Meaning Text Theory (Mel'cuk 2004)
• Tree-Adjoining Grammar (Joshi and Rambow 2003)
• Link Grammar (Sleator and Temperley 1993)
• Dependency Parsing (Kübler and others 2009)
• Catena Theory (Osborne and others 2012)
There is also a very productive research tradition called Valency Theory which
focuses specifically on government relations (Allerton 2006, Herbst and others 2004)
and whose key term, valency, was introduced into syntax by Tesnière.
The enormous diversity within the dependency tradition undermines most simple
generalisations about ‘dependency grammar’ in relation to government. Instead of
trying to survey the diversity, I shall present the version of dependency grammar that
I have been developing since about 1980 under the name ‘Word Grammar’ (Gisborne
2010, Gisborne 2011, Duran-Eppler 2011, Sugayama 2003, Sugayama and Hudson
2006, Hudson 1984, Hudson 1990, Hudson 2007a, Hudson 2010). First, however, I
start with a survey in section 2 of the kinds of information that could, and I believe
should, be covered by the term government. Section 3 then introduces the relevant
general characteristics of Word Grammar (including, of course, its use of word-word
dependencies), and shows how these characteristics allow Word Grammar to
accommodate government in all the diversity surveyed in section 2.
2. The scope of government and valency
Traditionally, government applied either to a word’s complement (e.g. a preposition
governs its complement) or to that complement’s case (e.g. the Latin preposition de,
‘about’, governs the ablative case). However, the Arabic grammarians extended the
same notion to the inflectional categories of dependent verbs, and it is easy to argue
for a much wider application of the term to cover any word-complement pair. For
instance, in the ‘DP’ analysis, a boy consists of a determiner followed by its
complement, so a counts as the governor of boy. This extension is fully justified by
the similarities across word-classes in the way they restrict their complements. And,
of course, if we insist on fidelity to the tradition, we are left with a terminological gap:
what do we call the relation between a word and its complement when the word is not
a verb or preposition? Extending the term govern fills this gap perfectly, and removes
the otherwise arbitrary historical restriction to verbs, prepositions and case.
Some dependency grammarians have extended the term in another direction,
so that governor is simply the converse of dependent (e.g. Tarvainen 1987:76). This
regrettable extension misses the main point of the idea (and terminology), which is
that the governor controls both the presence and other properties of the dependent; in
this sense, Cows eat grass constantly shows a verb eat governing grass but not
constantly. If govern is extended to cover constantly as well as grass, then we need
another term to distinguish them, so we may as well stick with govern for the relation
between eat and grass, with constantly as its adjunct (or Tesnière’s circonstant). On
the other hand, we do need a converse of the term dependent. One obvious possibility
is head (the term I once used), but this term is used in phrase structure to relate a word
to its containing phrase, rather than to its individual dependents; for example, in the
phrase green grass, the word grass is head of green grass, and not of its dependent
green. To avoid this confusion I now prefer the term parent, so in the pair green
grass, grass is the parent of green, which is its dependent; and in Cows eat grass
constantly, the verb eat is the parent of all the other words.
If the governor of an element is the word that controls its presence and
properties, then two further extensions are needed to the traditional scope of
government. On the one hand, it must include subjects as well as complements; after
all, the finite verb eats demands a subject even more strongly than it demands a
complement. With this extension, the scope of government includes all ‘arguments’ or
(following Tesnière) ‘actants’. Whatever mechanism is responsible for providing a
word with its correct complements can also be extended to deal with subjects.
The other extension is to parents (in the sense I introduced above, where a
word’s parent is the word on which it depends). This extension is already established
in all but name in Categorial Grammar, where (as explained earlier) adjuncts are
elements that take their parent as argument; so constantly might be classified as a
form that takes a verb as its argument to produce another (larger) verb. It is also
justified by the facts, because dependents can ‘govern’ their parents in much the same
way as vice versa. The reason why government was traditionally applied only to
complements was that the classical languages Latin and Greek are both 'dependent-marking', marking most dependencies by case inflections on the dependent. But not
all languages are like this, and ‘head-marking’ languages locate the marker of a
dependency on the parent (Nichols 1986, Nichols 2006). In such a language, the
dependent controls the inflectional form of the parent. Moreover, it is easy to find
examples even in more familiar dependent-marking languages where a dependent
selects its parent. For example, in many European languages a past participle selects
between ‘have’ and ‘be’ as its auxiliary verb, but we all agree that the participle
depends on the auxiliary. Similarly, many nouns select their preferred preposition –
e.g. in school but at university – and once again, it is the governor in this relation that
depends syntactically on the governed. And of course, even adjuncts are fussy about
the words they can depend on, so very can depend on an adjective or an adverb, but
not on a verb (very interesting, but not *very interest). And even more generally, the
one thing we know about the syntax of almost every word except finite verbs is that it
needs a parent – again a clear case of government.
In conclusion, I am suggesting that the relationship described as ‘government’
can be found not only when the ‘governed’ is the complement of the governor, but
also when it is the subject or the parent. The extension to parents means that
complements and subjects govern their parents reciprocally, while adjuncts merely
govern their parents but not vice versa. But even where government is reciprocal, it is
generally unequal in its effects; for example, in Cows eat grass, the nouns cows and
grass only govern eat to the extent that they each need some word as their parent,
whereas the verb eat imposes much more specific requirements on its subject and
complement.
Given this extended definition of government, we can now ask what kinds of
demands a governor may make. Following Tesnière once again, we can call these
demands the word’s valency (a metaphor from chemistry, where atoms have valencies
which determine how they combine with other atoms). In these terms, then, the
question is: What is the scope of valency? The following survey raises the
fundamental question of whether there are any special limitations on valency. Is
language special – a dedicated module of the mind – or is it just ordinary cognition
applied to words? We start with some rather obvious or well established limitations of
government, which are the best case for special status; but in section 3 I offer a
functional explanation for even these facts.
The most obvious limitation on government is the list of relations to which it
applies. For example, although a word governs its complement, it cannot govern a
complement of its complement. We can call this limitation the Locality restriction:
(1) Locality. A word A can govern a word B only if B is linked directly to A by a dependency relation.
For example, we might imagine a language in which word A requires a dative case on
the complement of any preposition that happens to act as complement of A; but
according to locality, this language is impossible. It is true that there are well-attested
examples of words that require a particular case on the complement of a dependent
preposition; for example, in German the preposition auf governs the dative when it
means ‘on’ and the accusative when it means ‘onto’, but when it is itself governed by
the verb warten, ‘wait’, it only governs the accusative (Er wartet auf Dich, ‘He is
waiting for you’). But valency patterns like this always involve a specific lexeme (in
this case, auf) as the intervening complement, so can easily be broken down into two
steps by recognising two ‘sub-lexemes’ of the preposition: AUF/acc and AUF/dat,
governing the accusative and dative respectively. Given this distinction, WARTEN
simply governs AUF/acc, and the government is purely local.
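To make the bookkeeping concrete, here is a small sketch of the sub-lexeme analysis (the attribute names and the dictionary format are my own, not a published formalism): each government fact holds between a word and its own dependent, so nothing reaches further than one link.

# WARTEN governs its complement AUF/acc; AUF/acc governs the case of its own
# complement. Both facts are local to a single dependency.
lexicon = {
    "AUF/dat": {"isa": "AUF", "complement_case": "dative"},      # 'on' (location)
    "AUF/acc": {"isa": "AUF", "complement_case": "accusative"},  # 'onto' (direction)
    "WARTEN":  {"complement_lexeme": "AUF/acc"},
}

def complement_case(preposition):
    """The case a preposition requires on its own complement - a purely local fact."""
    return lexicon[preposition]["complement_case"]

selected = lexicon["WARTEN"]["complement_lexeme"]   # WARTEN selects AUF/acc ...
print(complement_case(selected))                    # ... and AUF/acc selects 'accusative'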
Another limitation on government is the apparent ‘blindness’ of syntax to both
phonology and morphology (Miller and others 1997, Zwicky 1992). This limitation
prevents valency from referring to either phonological or morphological properties of
a word which it governs; for example, a verb cannot require its object to start with /b/
or to have two syllables, nor can it require it to be (in Latin) a member of the first
declension-class (inflected like amicus, ‘friend’, but not like agricola, ‘farmer’, dux,
‘leader’ or manus, ‘hand’), or to be morphologically regular. Thanks to extensive
research by Zwicky, Pullum and others, these limitations appear to be real. They seem
to be especially well attested in valency, even though other parts of syntax such as
word order may be affected by phonological features. Section 3 below suggests an
explanation for these restrictions, and indeed argues that the blindness of valency to
both phonology and morphology has the same explanation as the locality principle.
The two restrictions are summarised here:
(2) Phonological blindness. A word cannot directly govern any phonological property of another word.
(3) Morphological blindness. A word cannot directly govern any morphological property of another word.
Having noted these three limitations on valency, the natural question is what
other limits there are; and my answer, which I try to justify below, is that there are
none. More precisely, any limits that exist can be explained in functional terms. For
instance, there must be some upper limit to the number of complements a given word
can have. English seems to have a maximum of about three complements, found with
verbs such as BET, as in (4), and although more may be possible, the world record is
likely to be in single figures.
(4) I bet him a pound that he couldn't do it.
Assuming this is so, the explanation is easy to find in the processing demands of
tracking multiple complements, combined with the very small communicative
benefits of verbs with multiple complements (in comparison with structures that
distribute the same number of arguments across two or more verbs). In other words,
the valency patterns that exist in different languages are simply different ‘engineering
solutions’ to the problem of creating an efficient communication system with a
reasonable balance between the various competing demands and pressures on such a
system (Evans and Levinson 2009).
To give an idea of the flexibility of valency, consider what we might imagine
to be the most basic requirement of all: that a word’s valency should only include
other words. Standard views of valency, in all theories, follow the structuralist
tradition of isolating language from everything else, so this restriction might seem to
be a theoretical given. However, this isolation of language from other behaviour is
arguably a research strategy rather than a fact. In reality, a speaker’s knowledge of
language is intimately integrated with the rest of their knowledge. A particularly clear
example of this integration is the valency of the verb GO, as in The train went ....,
where (in speech) the dots stand for some sound such as a whistle or some other noise
(Hudson 1985). Let us call this sub-lexeme of GO ‘GO/noise’. Crucially, speakers in
my generation allow the complement of GO/noise to be any appropriate non-verbal
noise, but not a word; so The train went woosh is barely acceptable. (In contrast,
younger speakers allow GO to introduce speech, as in He went ‘Look out!’.) A word
that can take a mere noise as its complement is clearly a very serious threat to the
supposed boundary round language, and should discourage us from expecting any
substantive constraints on valency.
What properties of a word may be governed by another word? The answer
seems to be ‘any property other than its morphology or phonology’. The following list
(based on Hudson 1988) is meant to be simply suggestive of the diversity of
possibilities, and in each case the examples show restrictions on parents as well as on
complements or subjects:
• word class (a preposition takes a noun as its complement; very takes an adjective or adverb as its parent)
• inflection (German FOLGEN, 'follow', takes a dative object; in Eastern Tzutujil, the phrase r-cheeuaal ja kinaq' '3s-stick the bean', i.e. 'beanpole', contains the prefix r- marking the possessed-possessor dependency on the parent, so it must be part of the valency of the possessor noun (Nichols 2006))
• lexical identity (the complement of DEPEND is ON; the parent of French perfect ALLER, 'go', is ÊTRE, 'be')
• word order (the adjectival dependent of SOMETHING follows it, as in something nice; the parent of ENOUGH precedes it, as in big enough)
• meaning (PUT takes a complement which identifies a place; the parent of GREEN denotes a solid object, hence the badness of green ideas)
In short, all the familiar 'linguistic' properties of words, other than their pure morphology and phonology, seem to be subject to valency restrictions.
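To suggest how restrictions of these kinds might be stated side by side, here is a purely schematic sketch (the attribute names are my own invention, not those of any existing grammar): each entry bundles the restrictions that a word imposes on a dependent or on its parent.

# Valency entries covering word class, inflection, lexical identity, word order and meaning.
valency = {
    "DEPEND": {"complement": {"lexeme": "ON"}},                    # lexical identity
    "FOLGEN": {"object": {"case": "dative"}},                      # inflection
    "PUT":    {"complement": {"meaning": "place"}},                # meaning
    "VERY":   {"parent": {"word_class": {"adjective", "adverb"}}}, # restriction on the parent
    "ENOUGH": {"parent": {"order": "precedes ENOUGH"}},            # word order
}

print(valency["DEPEND"]["complement"])   # {'lexeme': 'ON'}
print(valency["VERY"]["parent"])         # very needs an adjective or adverb as parent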
On the other hand, these are not the only properties that words have. Another
property of any word is the language to which it belongs. Are there any words that
require their complement or parent to be in a particular language? This is a question
for research on code-switching, and the answer seems to be positive: there are indeed
cases in the literature where a bilingual community allows code-switching after some
words but not after others. For example, German-English bilinguals in London were
found (Duran-Eppler 2011:188) to allow the English because to have either an
English complement or a German one (e.g. either because it rained or because es
regnete), but the German equivalent, weil, allowed only a German complement (weil
es regnete, but not weil it rained). Other word properties include the sociolinguistic
categories of dialect and register, and the diachronic category of etymology; do these
properties ever figure in valency restrictions? No case comes to mind, but it is a
reasonable question for future research.
Another striking feature of valency restrictions is their flexibility. Take the
English ‘caused-motion’ construction, exemplified in (5) and (6) (Goldberg
1995:152).
(5) They laughed the poor guy out of the room.
(6) Frank sneezed the tissue off the table.
It is clear that LAUGH and SNEEZE are basically intransitive verbs, and yet
examples like these are incontestably good English. This extension of basic valency
fits into a larger pattern in which verbs that basically describe an activity can be
combined with a ‘satellite’ to describe a motion (Talmy 1985, Slobin 1996). Other
examples that illustrate the same pattern are these:
(7) The bottle floated out of the cave.
(8) They waltzed into the room.
Unlike basic valency patterns, which are stored lexeme by lexeme or derived from
very specific word classes, these seem to be created as needed, rather like metaphors.
Indeed, the term ‘grammatical metaphor’ fits them rather well (Halliday 2002:282).
On the other hand, these extensions are clearly part of English grammar rather than
creative innovations, because typological studies show that languages such as Spanish
and French do not allow them (Talmy 1991).
To conclude this survey of the scope of government restrictions, I have
suggested a much broader research agenda than is usually recognised in research on
government or valency. On the one hand, I have suggested that valency restrictions
apply not only to a word’s complements but also to its subject and to its parent, on the
grounds that in all three relations, one word can impose similar restrictions on the
kinds of word with which it may contract dependency relations. And on the other
hand, I have also suggested that these valency restrictions may involve a wider range
of properties than are usually considered relevant, including the most fundamental
property of all: being a word (where GO/noise is an example of a word which requires
a non-word as its complement). This very wide range of relevant properties suggests
that, in principle, any kind of word-property may be relevant to valency restrictions.
On the other hand, I have also noted three limitations that almost certainly do apply:
the locality principle which prevents a word’s valency from reaching beyond the
words with which it is directly connected by dependencies, and the principles of
phonology-free and morphology-free syntax, which prevent a word’s valency from
reaching into the word’s phonological and morphological structure. The question for
any theoretical model is why valency is so free in most respects, but so limited in
these three ways.
3. Word Grammar and valency
Word Grammar (WG) has roots not only in dependency grammar, but also in a
number of other traditions of linguistic theory. Perhaps the most important influence,
for present purposes, comes from formal generative linguistics, which focuses
attention on the formal properties of language as revealed in the fine detail of more or
less formalized grammars. This attention to formal detail explains why WG may have
a special contribution to make in the theory of government.
Almost equally important is the theory’s cognitive orientation, which owes a
great deal to Stratificational Grammar (Lamb 1966, Lamb 1998) and early AI
(Winograd 1972, Anderson 1983). Consequently, WG is one of the earliest members
of the ‘cognitive linguistics’ family of theories (Hudson 2007b). The cognitive
orientation is a reaction against the traditional view of language as a separate
phenomenon; so instead of assuming from the outset that language is special, we
assume the opposite: that language is a branch of knowledge, special only in that it
involves the knowledge of words and how to use them. Once again this is important
for government because it predicts that valency expectations can range widely over
everything we know about words, and may even go beyond words (as indeed it does
in the case of GO/noise). A cognitive orientation is also relevant to the choice
between dependency structure and phrase structure, given that the latter rests on the
claim that human minds are incapable of grasping direct relations between individual
words – a most implausible assumption, given the ease with which we grasp similar
relations in other areas of life such as social relations. (Rather oddly, other ‘cognitive’
theories of language follow the phrase structure tradition without question.)
The effect of combining a focus on formal structures with a cognitive
orientation is the WG claim that language is a cognitive network (Hudson 1984:1).
This claim goes well beyond the widely held view that the lexicon is a network of
lexical entries (Pinker 1998), or that language is a network of conventional units
(Langacker 2000:12) or constructions (Croft 2004). In WG, language is a network of
atomic concepts, each of which is merely an atomic node whose ‘content’ consists
entirely in its links to other nodes. For example, the word MAN is merely a node,
which only carries a label as a matter of convenience for the analyst and whose
properties consist entirely of links to other nodes – the node for the general concept
‘man’; the nodes for the morphemes {man} (also linked to other words such as
MANLY) and {men}; the node for ‘noun’; and the node for ‘English word’. This tiny
fragment of the network can be diagrammed, using WG notation, as in Figure 5.
Figure 5: A network for MAN
The importance of the Network Notion (Reisberg 2007:252), the idea that
concepts have no content except for their links to other concepts, is that a conceptual
network can model the flow of activation in the underlying neural network because a
concept’s content is part of the network. For example, if in speaking we activate the
concept ‘man’, this activation can spill over to the neighbouring concept MAN, so
that this becomes available for utterance; and conversely, if in listening we hear (and
activate) the concept {man}, this activity can spill over onto MAN (or MANLY),
allowing us to recognise this lexeme. In contrast, activation cannot spread to words
from their meanings or forms if these are encapsulated in a complex structure such as
a ‘construction’.
The small triangles in Figure 5 indicate the ‘isa’ relation; for instance, this
network shows that MAN isa noun and also isa English word, and that ‘MAN, plural’
isa MAN. This relation has its own notation to reflect its fundamental status as the
basis for both classification and generalisation. ‘Isa’ is the relation not only between
any class and its members, but also between a type and its tokens or instances.
Furthermore, it carries all ‘inheritance’, whereby an instance inherits properties from
the stored types and classes; for instance, in Figure 5 the isa links from 'MAN, plural'
to 'MAN' and to 'plural' (not shown) allow it to inherit properties such as inflectional
patterns from these categories, and (recursively) from even more general categories
such as ‘noun’ and ‘English word’. But of course the underlying logic is default
inheritance, which accommodates exceptions such as the irregular realization of
‘MAN, plural’ as {men} rather than the regular {mans}.
One further tenet of WG is relevant to valency: that language-learning is
‘usage-based’ (Bybee 1999, Kemmer and Israel 1994, Langacker 2000, Barlow and
Kemmer 2000). This tenet goes beyond the truism that we learn our language by
observing other people’s usage, by claiming that this mode of learning is reflected in
the structure of the language network. First, as I have just implied, this network is
implemented in a neural network which gives each node a quantitatively variable
‘activation level’ – or, more precisely, two such levels: a relatively permanent one
which reflects one’s lifetime experience in terms of frequency, recency and emotional
impact; and a constantly varying one which reflects current concerns and activity.
And second, usage-based learning means, inevitably, that one’s memory includes a
vast collection of half-remembered specific tokens or exemplars as well as the
generalised memories that collectively build into a grammar and lexicon. The specific
cases are inevitable companions of the generalisations if (as I believe) we have no
mental mechanism for ‘forgetting’ specifics once we have generalised from them.
How does this theoretical framework help in the understanding of government
and valency? It throws light on a number of questions. First, why do government and
valency restrictions exist at all? After all, if we were designing a language from
scratch, and the only aim was to allow it to express indefinitely many complex
meanings, it is not obvious that we would include any government or valency
restrictions in the language. Each word would have its meaning, which could be
expanded and modified in various ways by adding other words as dependents. For
example, we might imagine a very simple version of English which contained just the
words likes, he, her, with the network structures shown in Figure 6. In this language,
syntax would consist of nothing but a general cognitive principle that variables (such
as x and y) should be bound to suitable constants (Hudson 2007a:46-50). In this case,
the suitable constants would be ‘known male’ and ‘known female’, expressed
respectively by he and her, so to express the more complex meaning ‘known male
like known female’ one could say any of He likes her, her likes he, he her likes and so
on. If, as I argued earlier, ‘government’ should expand to include any restrictions on
dependents or parents, then every restriction which would limit this freedom qualifies
as ‘government’ so government includes the whole of syntax, and the question is why
syntactic restrictions exist.
Figure 6: Three-word English
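A toy implementation (entirely my own) brings out how little such a language would constrain interpretation: with no valency at all, syntax can only bind the verb's variables to whatever suitable constants are present, so every ordering of the three words yields the same readings.

meanings = {
    "likes": {"predicate": "like"},
    "he":    {"constant": "known male"},
    "her":   {"constant": "known female"},
}

def interpret(words):
    """Bind the predicate to the constants present, in either order."""
    constants = [meanings[w]["constant"] for w in words if "constant" in meanings[w]]
    predicate = next(meanings[w]["predicate"] for w in words if "predicate" in meanings[w])
    # Nothing says which constant is the liker and which the liked,
    # so both bindings are available.
    return [(predicate, tuple(constants)), (predicate, tuple(reversed(constants)))]

print(interpret(["he", "likes", "her"]))   # two readings ...
print(interpret(["her", "likes", "he"]))   # ... and the same two for any other order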
The functional explanations for syntax are obvious and well known: syntactic
restrictions provide helpful guidance to the hearer, so that a sentence such as He likes
her has just one possible meaning, not two. When syntax provides multiple
independent restrictions, incompatible combinations such as *Her likes him or *He
her likes are ungrammatical rather than ambiguous, so syntax is autonomous.
Moreover, syntax is driven by pressures other than mere avoidance of ambiguity,
notably the vagaries of usage. To take a simple example, when the verb HANG is
used literally the natural preposition for use with things such as hooks is ON, as in (9).
(9) He hung his coat on the hook.
But when used metaphorically, HANG still takes ON:
(10) He hung on her every word.
And the same is true for related verbs such as DEPEND and RELY. The answer to the
question about why government exists, therefore, seems to be a mixture of processing
pressures (guiding the hearer) and other pressures such as building on the patterns
experienced in usage.
The remaining questions concern the technicalities of government. First, why
is it that so many valency restrictions are lexical, in the sense that a lexically-specified
governor requires a particular, lexically-specified, governed? As we have just seen,
the verbs HANG, DEPEND and RELY require the preposition ON, but there are
many other similar restrictions found not only in conventional government patterns
but also in the vast territory of collocation (e.g. white wine), cliche (e.g. I hesitate to
say this, but ...) and idiom (e.g. kick the bucket). In each case, a stored unit (in the
conceptual network of language) consists of a sequence of specific words related just
as they would be in a larger sentence. In the terminology of Construction Grammar,
each of these stored units is a construction. The question is why so many
constructions appear to link individual words directly to one another. This fact is a
problem for phrase-structure theory given its ban on direct relations between
individual words. For example, in a phrase-structure analysis of (10) the verb hang is
only related indirectly to the preposition on via the preposition phrase on her every
word. In contrast, any version of dependency grammar, including Word Grammar,
has a simple explanation: stored constructions can relate single words to each other,
either as lexical items or as more general classes, because words are directly related in
sentence structure.
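The following sketch (my own representation, not a fragment of any published grammar) shows the form such stored units can take in a dependency network: each unit pairs two individual words directly, whether the pattern counts as government, collocation or idiom.

stored_units = [
    {"head": "HANG",   "dependent": "ON",    "relation": "complement"},
    {"head": "DEPEND", "dependent": "ON",    "relation": "complement"},
    {"head": "RELY",   "dependent": "ON",    "relation": "complement"},
    {"head": "WINE",   "dependent": "WHITE", "relation": "adjunct"},   # collocation
]

def preferred_dependents(head):
    """The specific words stored as dependents of a given head."""
    return [unit["dependent"] for unit in stored_units if unit["head"] == head]

print(preferred_dependents("HANG"))   # ['ON'] - a direct word-word link, no phrase needed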
We now turn to the restrictions on valency discussed in section 2, starting with
the locality principle (1) which prevents a governor from governing any words other
than its own direct dependents or parents. For instance, it can restrict the
morphosyntactic case of its own dependents, but not of their dependents. The reason
why locality applies is that any such restriction would itself imply a direct relation between the governor and the governed, so an extra link would automatically be added to the network and the relation would in fact be direct. And, indeed, there are clear cases in
language where this seems to have happened; consider, for example, the collocation
nice little (as in a nice little cottage). The fact is that there is no direct dependency
link, as such, between two adjectives modifying the same noun; rather, the
dependencies only link them indirectly via their shared parent. This is confirmed by
the fact that nice little cannot be used without a shared parent, as in the hypothetical
(11):
(11) *That cottage is nice little.
Consequently we must assume some kind of link between nice and little, which is not
a syntactic dependency but which allows a government link between them. The
example shows that locality is an automatic consequence of the assumption that
language is a network. A similar analysis may be possible for German adjective
inflections which are influenced by the choice of determiner as in (12) and (13):
(12) ein junger Mann ‘a young man’ (nominative)
(13) der junge Mann ‘the young man’ (nominative)
As in the example of nice little, there seems to be a non-dependency link between the
determiner and the adjective, but here we are not dealing with a mere collocation but
with the central part of grammar, morphosyntax.
The other two constraints on government were phonological blindness (2) and
morphological blindness (3). Both of these constraints follow from locality, combined
with the WG assumption that morphology and phonology are autonomous levels. This
assumption leads to network fragments such as Figure 7, in which a lexical item such
as BARK is related directly to its subject, its base and its meaning. By government,
BARK can restrict its subject y in various ways, but locality means that these
restrictions must be summarised directly in y, by recognising a general category to
which y can be linked as an example. But such a category would be unlikely to arise
for a purely morphological or phonological property of y. A morphological restriction
on y would require a restriction on y’s base (such as belonging to a particular
inflectional class), and a phonological restriction on y would apply not to y itself, but
to the realization of the base of y. According to the WG theory of learning (Hudson
2007a:52-9), it would in fact be possible to create a special concept such as ‘word
beginning with /b/’, but this could not happen unless this property (beginning with /b/)
correlated with some other property. Consequently, any property which is purely
morphological or purely phonological must be invisible outside morphology or
phonology.
Figure 7: A four-level network for BARK
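The following sketch (my own coding of the idea, with hypothetical entries {dog} and /d o g/ standing in for the dependent y) shows why a phonological restriction would breach locality: a word's phonology is not one of its own properties but the realization of its base, two links away, so no single local dependency can reach it.

network = {
    "y":     {"word_class": "noun", "base": "{dog}"},   # a hypothetical subject of BARK
    "{dog}": {"realization": "/d o g/"},
}

def governable(word, prop):
    """Only properties stored on the dependent word itself are visible to its governor."""
    return prop in network[word]

print(governable("y", "word_class"))    # True - BARK can restrict this
print(governable("y", "realization"))   # False - phonology lives two links away,
                                        # on the realization of y's base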
And finally, there is the question about why valency allows so much
flexibility. How is it that we can take what is basically an intransitive description of
an action such as LAUGH, SNEEZE, FLOAT or WALTZ and turn it into a
description of a movement resulting from this action, as in (5) to (8), simply by
adding a direction adjunct and (in some cases) an object (Holmes 2005)? And why is
this extra ‘result’ structure possible in English, but not in French? Given the formal
apparatus of a conceptual network and default inheritance, there are two possible
answers. One is that the result structure is an optional property of ‘verb’, so it is
available for inheritance by any verb – provided its meaning and its existing syntax
are compatible. The other solution is to treat the relation between the basic verb and
its resultative version as an example of word-formation. In this analysis, English
allows any verb to have a ‘resultative’ with the same properties but with extra
resultative structure; but this relation is not found in French. The choice between these
two alternatives is a question for research, but the main point is that the existence of
the pattern is easy to explain within the framework of WG.
In conclusion, then, dependency-based grammars are better than those based
on phrase structure as a basis for accommodating the full range of restrictions that
qualify as ‘government’ because most such restrictions require a direct link between
one word and another – i.e. between a parent and a dependent. Moreover, the
traditional examples of government involve restrictions which can be found not only
between verbs or prepositions and their complements, but also in a much wider range
of syntactic patterns. If ‘government’ is extended to include all these patterns, then we
find that the restriction may apply in either direction, with mutual government as a
common arrangement. These patterns also favour dependency structure over phrase
structure. But government and valency patterns are also relevant to another basic
theoretical choice, the one between traditional structuralist analyses and modern
cognitive analyses in which language patterns may be related directly to non-linguistic
patterns. Examples such as the English verb GO show that a word’s valency may
require something other than a word, which supports the cognitive approach. A full
understanding of government therefore requires a theory which combines dependency
structure with a cognitive orientation, such as Word Grammar.
References
Allerton, David 2006. 'Valency Grammar', in Keith Brown (ed.) Encyclopedia of
Language & Linguistics. Oxford: Elsevier, pp.301-314.
Anderson, J. R. 1983. 'A spreading activation theory of memory', Journal of Verbal
Learning and Verbal Behavior 22: 261-295.
Anderson, John 1977. On case grammar: Prolegomena to a theory of grammatical
relations. London: Croom Helm
Barlow, Michael and Kemmer, Suzanne 2000. Usage based models of language.
Stanford: CSLI
Blake, Barry 1990. Relational Grammar. London: Croom Helm
Bloomfield, Leonard 1933. Language. New York: Holt, Rinehart and Winston
Bresnan, Joan 2001. Lexical-Functional Syntax. Oxford: Blackwell
Bybee, Joan 1999. 'Usage-based phonology', in Michael Darnell, Edith Moravcsik,
Frederick Newmeyer, Michael Noonan, & Kathleen Wheatley (eds.)
Functionalism and formalism in Linguistics. I: General papers. Amsterdam:
Benjamins, pp.211-242.
Chomsky, Noam 1965. Aspects of the Theory of Syntax. Cambridge, MA: MIT Press
Croft, William 2004. 'Logical and typological arguments for Radical Construction
Grammar', in Mirjam Fried & Jan-Ola Östman (eds.) Construction
Grammar(s): Cognitive and cross-language dimensions (Constructional
Approaches to Language, 1). Amsterdam: John Benjamins, pp.273-314.
Croft, William 2007. 'Construction grammar', in Dirk Geeraerts & Hubert Cuyckens
(eds.) The Oxford Handbook of Cognitive Linguistics. Oxford: Oxford
University Press, pp.463-508.
Diaconescu, Stefan 2002. 'A Generative Dependency Grammar', in Mitsuru Ishizuka
& Abdul Sattar (eds.) 7th Pacific Rim International Conference on Artificial
Intelligence Tokyo, Japan, August 18–22, 2002 Proceedings. Berlin: Springer,
pp.605-605.
Dik, Simon 1991. 'Functional Grammar', in Flip Droste & John Joseph (eds.)
Linguistic Theory and Grammatical Description. Amsterdam: Benjamins,
pp.247-274.
Duran-Eppler, Eva 2011. Emigranto. The syntax of German-English code-switching.
Vienna: Braumüller
Evans, Nicholas and Levinson, Stephen 2009. 'The Myth of Language Universals:
Language diversity and its importance for cognitive science', Behavioral and
Brain Sciences 32: 429-492.
Fillmore, Charles, Kay, Paul, and O'Connor, Mary 1988. 'Regularity and idiomaticity
in grammatical constructions: the case of let alone.', Language 64: 501-538.
Forsgren, Kjell-Åke 2006. 'Tesnière, Lucien Valerius (1893-1954)', in Keith Brown
(ed.) Encyclopedia of Language & Linguistics (Second Edition). Oxford:
Elsevier, pp.593-594.
Gaifman, Haim 1965. 'Dependency systems and phrase-structure systems.',
Information and Control 8: 304-337.
Gisborne, Nikolas 2010. The event structure of perception verbs. Oxford: Oxford
University Press
Gisborne, Nikolas 2011. 'Constructions, Word Grammar, and grammaticalization',
Cognitive Linguistics 22: 155-182.
Gleason, Henry 1965. Linguistics and English Grammar. New York: Holt, Rinehart
and Winston
Goldberg, Adele 1995. Constructions. A Construction Grammar Approach to
Argument Structure. Chicago: University of Chicago Press
Halliday, Michael 1985. An Introduction to Functional Grammar. London: Arnold
Halliday, Michael 2002. On Grammar. New York: Continuum
Harris, Zellig 1951. Structural Linguistics. Chicago: University of Chicago Press
Herbst, Thomas, Heath, David, Roe, Ian, and Götz, Dieter 2004. A Valency
Dictionary of English: A Corpus-based Analysis of the Complementation
Patterns of English Verbs, Nouns and Adjectives. Berlin: Mouton de Gruyter
Heringer, Hans J. 1993. 'Dependency syntax - basic ideas and the classical model', in
Joachim Jacobs, Arnim von Stechow, Wolfgang Sternefeld, & Theo
Venneman (eds.) Syntax - An International Handbook of Contemporary
Research, volume 1. Berlin: Walter de Gruyter, pp.298-316.
Holmes, Jasper (2005). Lexical Properties of English Verbs. PhD dissertation, UCL,
London.
Hudson, Richard 1984. Word Grammar. Oxford: Blackwell.
Hudson, Richard 1985. 'The limits of subcategorization', Linguistic Analysis 15: 233-255.
Hudson, Richard 1988. 'The linguistic foundations for lexical research and dictionary
design.', International Journal of Lexicography 1: 287-312.
Hudson, Richard 1990. English Word Grammar. Oxford: Blackwell.
Hudson, Richard 2007a. Language networks: the new Word Grammar. Oxford:
Oxford University Press
Hudson, Richard 2007b. 'Word Grammar', in Hubert Cuyckens & Dirk Geeraerts
(eds.) Handbook of Cognitive Linguistics. Oxford: Oxford University Press,
pp.777-819.
Hudson, Richard 2010. An Introduction to Word Grammar. Cambridge: Cambridge
University Press
Jespersen, Otto 1924. The Philosophy of Grammar. London: Allen and Unwin
Joshi, Aravind and Rambow, Owen 2003. 'A Formalism for Dependency Grammar
Based on Tree Adjoining Grammar.', in Sylvain Kahane & Alexis Nasr (eds.)
Proceedings of the First International Conference on Meaning-Text Theory.
Paris: Ecole Normale Supérieure.
Kemmer, Suzanne and Israel, Michael 1994. 'Variation and the usage-based model', in
Katherine Beals, Jeanette Denton, Robert Knippen, Lynette Melnar, Hisame
Suzuki, & Enca Zeinfeld (eds.) Papers from the 30th regional meeting of the
Chicago Linguistics Society: The parasession on variation in linguistic theory.
Chicago: Chicago Linguistics Society, pp.165-179.
Kruijff, Geert-Jan. 2006. 'Dependency Grammar', in Keith Brown (ed.) Encyclopedia
of Language & Linguistics. Oxford: Elsevier, pp.444-450.
Kübler, Sandra, McDonald, Ryan, and Nivre, Joakim 2009. 'Dependency Parsing',
Synthesis Lectures on Human Language Technologies 2: 1-127.
Kunze, Jürgen 1975. Abhängigkeitsgrammatik. Berlin: Akademie-Verlag
Lamb, Sydney 1966. Outline of Stratificational Grammar. Washington, DC:
Georgetown University Press
Lamb, Sydney 1998. Pathways of the Brain. The neurocognitive basis of language.
Amsterdam: Benjamins
Langacker, Ronald 2000. 'A dynamic usage-based model.', in Michael Barlow &
Suzanne Kemmer (eds.) Usage-based Models of Language. Stanford: CSLI,
pp.1-63.
Mel'cuk, Igor 2004. 'Levels of dependency in linguistic description: concepts and
problems', in Vilmos Àgel, Ludwig Eichinger, Hans-Werner Eroms, Peter
Hellwig, Hans-Jürgen Heringer, & Henning Lobin (eds.) Dependency and
Valency: An international handbook of contemporary research. Berlin: Walter
de Gruyter, pp.188-229.
Miller, Philip., Pullum, Geoffrey, and Zwicky, Arnold 1997. 'The Principle of
Phonology-Free Syntax: Four apparent counterexamples in French', Journal of
Linguistics 33: 67-90.
Morrill, Glyn 2006. 'Categorial Grammars: Deductive Approaches', in Keith Brown
(ed.) Encyclopedia of Language & Linguistics (Second Edition). Oxford:
Elsevier, pp.242-248.
Nichols, Johanna 1986. 'Head-marking and dependent-marking grammar.', Language
62: 56-119.
Nichols, Johanna 2006. 'Head/Dependent Marking', in Keith
Brown (ed.) Encyclopedia of Language & Linguistics (Second Edition).
Oxford: Elsevier, pp.234-237.
Nida, Eugene 1960. A synopsis of English syntax. Norman: Summer Institute of
Linguistics of the University of Oklahoma
Osborne, Timothy, Putnam, M, and Gross, Thomas 2012. 'Catenae: introducing a
novel unit of syntactic analysis.', Syntax 15: 354-396.
Owens, Jonathan 1988. The Foundations of Grammar: an Introduction to Mediaeval
Arabic Grammatical Theory. Amsterdam: Benjamins
Pierce, Marc 2006. 'Jespersen, Otto (1860-1943)', in Keith Brown (ed.)
Encyclopedia of Language & Linguistics (Second Edition). Oxford: Elsevier,
pp.119-120.
Pinker, Steven 1998. 'Words and rules', Lingua 106: 219-242.
Pollard, Carl and Sag, Ivan 1994. Head-Driven Phrase Structure Grammar. Chicago:
University of Chicago Press
Reisberg, Daniel 2007. Cognition. Exploring the Science of the Mind. Third media
edition. New York: Norton
Robins, Robert 1967. A Short History of Linguistics. London: Longman
Robinson, Jane 1970. 'Dependency structure and transformational rules', Language
46: 259-285.
Sgall, Petr, Hajicová, Eva, and Panevova, Jarmila 1986. The Meaning of the Sentence
in its Semantic and Pragmatic Aspects. Prague: Academia
Sleator, Daniel D. and Temperley, David 1993. 'Parsing English with a link grammar',
in Anon. Proceedings of the Third International Workshop on Parsing
Technologies. Tilburg: 277-292.
Slobin, Dan 1996. 'Two ways to travel. Verbs of motion in English and Spanish.', in
Masayoshi Shibatani & Sandra Thompson (eds.) Grammatical Constructions.
Their form and meaning. Oxford: Clarendon, pp.195-219.
Smith, Neil 1999. Chomsky. Ideas and ideals. Cambridge: Cambridge University
Press
Starosta, Stanley 1988. The case for lexicase. Pinter
Sugayama, Kensei 2003. Studies on Word Grammar. Kobe: Kobe City University of
Foreign Studies
Sugayama, Kensei and Hudson, Richard 2006. Word Grammar. New perspectives on
a theory of language structure. London: Continuum
Talmy, Leonard 1985. 'Lexicalisation patterns: semantic structure in lexical forms', in
Tim Shopen (ed.) Language Typology and Syntactic Description III:
Grammatical categories and the lexicon. Cambridge: Cambridge University
Press, pp.57-149.
Talmy, Leonard 1991. 'Path to realization: A typology of event conflation.', in Anon.
Proceedings of the Seventeenth Annual Meeting of the Berkeley Linguistics
Society. Berkeley: 480-519.
Tarvainen, Kalevi 1987. 'Semantic cases in the framework of dependency theory.', in
René Dirven & Gunter Radden (eds.) Concepts of Case. Gunter Narr, pp.75-102.
Tesnière, Lucien 1959. Éléments de syntaxe structurale. Paris: Klincksieck
Tomasello, Michael 1998. 'Constructions: A construction grammar approach to
argument structure', Journal of Child Language 25: 431-442.
Van Valin, Robert 1993. Advances in Role and Reference Grammar. Amsterdam:
Benjamins
Vater, Heinz 1975. 'Toward a generative dependency grammar', Lingua 36: 121-145.
Winograd, Terry 1972. Understanding Natural Language. New York: Academic
Press
Zwicky, Arnold 1992. 'Morphology: Morphology and syntax', in William Bright (ed.)
International Encyclopedia of Linguistics. Oxford: Oxford University Press,
pp.10-12.