for Anna Kibort (ed.), Syntactic Government and Subcategorisation

Dependency grammar

1. A brief history

The terms government and subcategorization are an interesting pair in terms of the history of syntactic theory because one is ancient while the other is an invention of the twentieth century with much the same meaning – a clear case of reinventing the wheel. By the twelfth century, grammarians were already using the Latin verb regere, ‘to rule’, to describe the way in which a preposition or verb dictated the case of its complement (Robins 1967:83), and (according to the Oxford English Dictionary) the verb govern was being used in the same sense by the early seventeenth century. Moreover, the intellectual and metaphorical foundations for these terms go even further back in time; in second-century Alexandria, Apollonius discussed the ways in which different verbs and prepositions selected different cases in dependent nouns, and even fathered the term ‘transitive’ (Robins 1967:37). These selection relations received considerable attention from the Arabic grammarians of the eighth century onwards, who described a word as ‘governing’ (Arabic ‘a:mil) another word whose case it selected, and who even went so far as to notice that in Arabic (a head-initial language) the governor generally precedes the governed (Owens 1988:53). In short, the terms govern and government are at least five hundred years old, and the underlying idea of an asymmetrical relation in which one word controls another is almost two thousand years old. (Similar ideas in Panini may well push the history even further back to the fifth century BC; Robins 1967:145.)

In contrast, the term subcategorization dates back only to 1965, when Chomsky introduced it (Chomsky 1965) as a solution to a problem that arose from the phrase-structure theory that he had espoused. The basis for phrase structure is the assumption that only one kind of structure can be represented: that between a whole and its parts. Thus, the phrase cows eat grass could be related to its parts (cows and eat grass) and eat grass to its parts (eat and grass) but, crucially, these two parts could not be related directly to one another. This curiously restrictive assumption meant that there was no way to show that the presence of grass had anything to do with the properties of eat as a transitive verb. Chomsky’s solution was to add ‘features’ to eat which showed how it could combine with other elements: ‘selection features’ for semantic restrictions and ‘subcategorization features’ for syntactic restrictions. The term subcategorization is odd, because it would normally mean nothing but ‘subclassification’, although its intended meaning is much more specific: subclassification according to the syntactic properties of accompanying elements. However, there are deeper objections to this ‘solution’ because it is part of a theory which starts by claiming that part-part relations are not permitted in syntactic structure. Subcategorization features, like selection features, undermine this principle by allowing part-part relations in the guise of a classification of the head word, but without acknowledging that each such feature implies a direct relation between the parts. Fortunately, at least the terminology in Chomskyan linguistics has reverted to the more traditional govern and government, a direct relation between governor and governed, though this relation is still treated as parasitic on the whole-part relations of phrase structure.
Unfortunately, both the terminology and the ideas of the earlier theory persist in models such as Phrase Structure Grammar (Pollard and Sag 1994). It is interesting to trace the recent history of these ideas, going back to the first attempts to produce formal underpinnings for syntactic analysis – systems of analysis which were sufficiently explicit and clear to be represented diagrammatically. This history is interesting not only because it concerns some of the most elementary assumptions of modern syntactic theory (such as government), but also because it shows the varying influence of the various ‘stakeholders’ in syntax – descriptive linguistics, logic and education. For each theory considered below, the immediate question for us is how the theory concerned accommodated government relations.

We start with the ‘sentence diagramming’ which was invented in the United States in the early nineteenth century and reached maturity in 1877 in the work of Reed and Kellogg (Gleason 1965:73). Their diagramming system allowed structures like Figure 1 for the sentence Cows eat fresh grass, with horizontal lines for government relations and diagonals for what we call adjuncts. The vertical lines distinguish subjects from objects, with the verb in the centre of the diagram as the heart of the sentence. Each relation is shown as a single line, so government relations are shown directly.

Figure 1: A sentence diagram

This diagramming system was intended for use in schools, and was so successful that it is still taught today in many American (and other) schools – indeed some readers of this chapter may have learned it as children. It even has a twenty-first-century face in a website (http://1aiway.com/nlp4net/services/enparser/) that generates ‘Reed and Kellogg’ sentence diagrams to order, and an informative page on Wikipedia. However, so far as I know it was never used in descriptive linguistics, so it remained a product of, and for, the school classroom, without any theoretical or research-based underpinnings. On the other hand, it may well have been part of the school education of academic linguists, so it is hard to rule out the possibility that it at least suggested the idea of using diagrams to display sentence structure.

One feature of Reed and Kellogg diagrams is that (in modern parlance) they show dependency relations (government and adjunction) but not precedence (word order); for instance, Figure 1 shows that fresh is an adjunct of grass, but does not show which word follows which. Another important feature is that they do not recognise phrases as such, although phrases are implicit in the dependency lines. These features were given a more thorough theoretical foundation during the 1920s and 1930s by at least two European linguists, both of whom wanted to improve the teaching of grammar in schools. On the one hand, Otto Jespersen recognised the hierarchical ordering of words in phrases such as a furiously barking dog, but (confusingly) concluded that the word classes concerned could be arranged in three ‘ranks’ so that ‘tertiary’ words such as furiously consistently attach to ‘secondary’ words like barking, which in turn attach to ‘primary’ words such as dog (Jespersen 1924, Pierce 2006). And on the other hand, Lucien Tesnière not only wrote a major theoretical discussion of dependency relations (published posthumously as Tesnière 1959), but also produced a simple diagramming system.
He does not seem to have known about the Reed and Kellogg system, but he may have been influenced by German grammarians who developed the idea of dependency, as well as the name, in the early nineteenth century (Forsgren 2006). His notation showed dependency relations more consistently and iconically, with dependents consistently written lower than the words on which they depend in a tree diagram called a ‘stemma’, such as Figure 2. Notice that the stemma has the same features as the Reed and Kellogg diagrams: showing dependency but not precedence, and leaving phrases implicit in the word-word dependencies.

Figure 2: A stemma

Another European attempt to formalise the notion of government led to Categorial Grammar, but this time the development was driven by logic (Morrill 2006). A verb such as eat is incomplete in itself, and needs to combine with a following noun to produce a phrase such as eat grass; but this too is incomplete until it combines with a preceding noun to produce a phrase such as cows eat grass, which is complete. These notions of one word ‘needing’ another accurately reflect the old tradition of government, although they are extended to include subjects; but they are also extended in another direction to include adjuncts such as fresh, which is said to need a noun in order to combine with it and produce another noun – a case of a dependent being sanctioned by itself rather than by the head. Categorial Grammar is sensitive to word order, but like Jespersen’s theory it bases the classification of words directly on their combinatorial needs rather than (as in traditional grammar) on a bundle of morphological, syntactic and semantic criteria. The ‘categories’ of Categorial Grammar replace the traditional word classes such as ‘noun’ and ‘verb’, so (at least in early versions of the theory) there is a category for intransitive verbs (N\S) and another for transitive verbs ((N\S)/N), but none for ‘verb’. On the other hand, the basis in logic allows a very simple translation from a syntactic structure to a logical semantic structure. Given the orientation to logic rather than pedagogy, it is unsurprising that there is no standard diagrammatic representation for syntactic structure in Categorial Grammar, comparable with Reed and Kellogg sentence diagrams or stemmas. Figure 3 uses an ad hoc notation which at least reflects the spirit of Categorial Grammar.

fresh: N/N + grass: N → fresh grass: N
eat: (N\S)/N + fresh grass: N → eat fresh grass: N\S
Cows: N + eat fresh grass: N\S → Cows eat fresh grass: S
Figure 3: A categorial grammar analysis

Meanwhile, in the USA the main demand for syntactic theory came from descriptive linguists working on the local Native American languages, for whom the tradition developed for highly inflected, case-based languages such as Latin, Greek, Hebrew and Arabic proved hard to apply. Bloomfield’s reaction to the problem was to start from scratch, making the minimum of basic assumptions about how sentences were structured (Bloomfield 1933). The result was immediate-constituent analysis, in which the only relation needed, or allowed, in syntax is the whole-part relation between a phrase and its parts. When diagrams started to be used in works such as Nida’s analysis of English (Nida 1960), they were the tree diagrams which later became familiar through Chomsky’s work, as in Figure 4.

Figure 4: A phrase-structure tree
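The difference between the two kinds of representation can be made concrete with a small sketch. The fragment below (Python, used purely for exposition; the relation labels and the data layout are my own illustrative choices, not part of any of the formalisms discussed above) encodes the dependency analysis of Cows eat fresh grass as a set of direct word-word relations, and the phrase-structure analysis as nested whole-part groupings; only the first states the link between eat and grass directly.

```python
# A dependency analysis: direct part-part (word-word) relations,
# with no phrase nodes and no encoding of word order.
dependency_analysis = [
    ("eat", "subject", "cows"),     # eat governs its subject
    ("eat", "object", "grass"),     # eat governs its object
    ("grass", "adjunct", "fresh"),  # fresh depends on grass
]

# A phrase-structure analysis: only whole-part relations between
# a phrase and its immediate constituents.
phrase_structure_analysis = (
    "S",
    ("NP", ("N", "cows")),
    ("VP",
     ("V", "eat"),
     ("NP", ("Aj", "fresh"), ("N", "grass"))),
)

# In the dependency version the relation between 'eat' and 'grass' is stated
# directly; in the phrase-structure version it can only be inferred from the
# fact that both words are contained in the same VP.
```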
It is true that some of these early systems, including Nida’s, acknowledge the importance of the traditional grammatical relations such as ‘subject’ and ‘object’ by recognising these as sub-divisions of the whole-part relations; but these are still not true part-part relations like traditional government; and phrase structure, Chomsky’s purification of immediate-constituent analysis, left no room even for these concessions to traditional analyses. When Chomsky was developing his ideas about syntactic structure, he was aware of Categorial Grammar but not, apparently, of other theories which treated government relations as basic (p.c.). The main models for his work were, paradoxically, the post-Bloomfieldian theories of Zellig Harris (Harris 1951), which he later attacked so vehemently, and the branch of mathematics called ‘formal language theory’ (and in particular the theory of recursive functions – Smith 1999:56). Government relations played little or no part in these models, so they were entirely absent from Chomsky’s earliest work, and it was only in 1965 that he recognised them through the introduction of the ‘subcategorization’ features discussed earlier. Since then, the part-part relations of government have increased in importance, but the framework of whole-part relations remains basic.

It is unfortunate that it was possible to argue in the 1960s that dependency grammars were equivalent to phrase-structure grammars (Gaifman 1965, Robinson 1970), because this allowed syntacticians to conclude that dependency grammars could safely be ignored. In fact, the arguments only showed that one very limited version of dependency grammar was equivalent to an equally limited version of phrase structure; and in any case, the equivalence was only weak – in terms of the strings of symbols that could be generated – rather than the much more important strong equivalence of the structural analyses, where dependency structures were clearly not equivalent to phrase structures.

During the decades since Chomsky’s generative grammar rose to prominence in syntactic theory, other approaches have also been growing and developing. Categorial Grammar has turned into the most popular option for logically oriented linguists, and has been combined with phrase structure in Head-driven Phrase Structure Grammar (Pollard and Sag 1994). A number of approaches have combined phrase structure with a traditional functional analysis (in terms of subjects, objects and the like), but interpreted as whole-part relations rather than as government relations between a head and its dependents:

Systemic Functional Grammar (Halliday 1985)
Relational Grammar (Blake 1990)
Functional Grammar (Dik 1991)
Lexical Functional Grammar (Bresnan 2001)
Role-and-Reference Grammar (Van Valin 1993)
Construction Grammar (Croft 2007, Fillmore and others 1988, Goldberg 1995, Tomasello 1998)

What these approaches share is the basic Bloomfieldian assumption that the foundation for syntactic structure is the whole-part relation between a phrase and its parts, rather than the part-part relation between a word and its dependents. During the same period, however, the dependency-based tradition also developed a range of theories that is almost as diverse as the tradition based on phrase structure.
Many of these theories are inspired by the formal developments of generative grammar and computational linguistics (Kruijff 2006), including at least the following list (where some theories are given ad hoc names):

Abhängigkeitsgrammatik (for computing) (Kunze 1975)
Generative Dependency Grammar (Vater 1975, Diaconescu 2002)
Case Grammar (Anderson 1977)
Functional Generative Description (Sgall and others 1986)
Lexicase (Starosta 1988)
Abhängigkeitsgrammatik (for schools) (Heringer 1993)
Meaning Text Theory (Mel'cuk 2004)
Tree-Adjoining Grammar (Joshi and Rambow 2003)
Link Grammar (Sleator and Temperley 1993)
Dependency Parsing (Kübler and others 2009)
Catena Theory (Osborne and others 2012)

There is also a very productive research tradition called Valency Theory which focuses specifically on government relations (Allerton 2006, Herbst and others 2004) and whose key term, valency, was introduced into syntax by Tesnière. The enormous diversity within the dependency tradition undermines most simple generalisations about ‘dependency grammar’ in relation to government. Instead of trying to survey the diversity, I shall present the version of dependency grammar that I have been developing since about 1980 under the name ‘Word Grammar’ (Gisborne 2010, Gisborne 2011, Duran-Eppler 2011, Sugayama 2003, Sugayama and Hudson 2006, Hudson 1984, Hudson 1990, Hudson 2007a, Hudson 2010). First, however, I start with a survey in section 2 of the kinds of information that could, and I believe should, be covered by the term government. Section 3 then introduces the relevant general characteristics of Word Grammar (including, of course, its use of word-word dependencies), and shows how these characteristics allow Word Grammar to accommodate government in all the diversity surveyed in section 2.

2. The scope of government and valency

Traditionally, government applied either to a word’s complement (e.g. a preposition governs its complement) or to that complement’s case (e.g. the Latin preposition de, ‘about’, governs the ablative case). However, the Arabic grammarians extended the same notion to the inflectional categories of dependent verbs, and it is easy to argue for a much wider application of the term to cover any word-complement pair. For instance, in the ‘DP’ analysis, a boy consists of a determiner followed by its complement, so a counts as the governor of boy. This extension is fully justified by the similarities across word-classes in the way they restrict their complements. And, of course, if we insist on fidelity to the tradition, we are left with a terminological gap: what do we call the relation between a word and its complement when the word is not a verb or preposition? Extending the term govern fills this gap perfectly, and removes the otherwise arbitrary historical restriction to verbs, prepositions and case.

Some dependency grammarians have extended the term in another direction, so that governor is simply the converse of dependent (e.g. Tarvainen 1987:76). This regrettable extension misses the main point of the idea (and terminology), which is that the governor controls both the presence and other properties of the dependent; in this sense, Cows eat grass constantly shows a verb eat governing grass but not constantly. If govern is extended to cover constantly as well as grass, then we need another term to distinguish them, so we may as well stick with govern for the relation between eat and grass, with constantly as its adjunct (or Tesnière’s circonstant).
On the other hand, we do need a converse of the term dependent. One obvious possibility is head (the term I once used), but this term is used in phrase structure to relate a word to its containing phrase, rather than to its individual dependents; for example, in the phrase green grass, the word grass is head of green grass, and not of its dependent green. To avoid this confusion I now prefer the term parent, so in the pair green grass, grass is the parent of green, which is its dependent; and in Cows eat grass constantly, the verb eat is the parent of all the other words. If the governor of an element is the word that controls its presence and properties, then two further extensions are needed to the traditional scope of government. On the one hand, it must include subjects as well as complements; after all, the finite verb eats demands a subject even more strongly than it demands a complement. With this extension, the scope of government includes all ‘arguments’ or (following Tesnière) ‘actants’. Whatever mechanism is responsible for providing a word with its correct complements can also be extended to deal with subjects. The other extension is to parents (in the sense I introduced above, where a word’s parent is the word on which it depends). This extension is already established in all but name in Categorial Grammar, where (as explained earlier) adjuncts are elements that take their parent as argument; so constantly might be classified as a form that takes a verb as its argument to produce another (larger) verb. It is also justified by the facts, because dependents can ‘govern’ their parents in much the same way as vice versa. The reason why government was traditionally applied only to complements was that the classical languages Latin and Greek are both ‘dependentmarking’, marking most dependencies by case inflections on the dependent. But not all languages are like this, and ‘head-marking’ languages locate the marker of a dependency on the parent (Nichols 1986, Nichols 2006). In such a language, the dependent controls the inflectional form of the parent. Moreover, it is easy to find examples even in more familiar dependent-marking languages where a dependent selects its parent. For example, in many European languages a past participle selects between ‘have’ and ‘be’ as its auxiliary verb, but we all agree that the participle depends on the auxiliary. Similarly, many nouns select their preferred preposition – e.g. in school but at university – and once again, it is the governor in this relation that depends syntactically on the governed. And of course, even adjuncts are fussy about the words they can depend on, so very can depend on an adjective or an adverb, but not on a verb (very interesting, but not *very interest). And even more generally, the one thing we know about the syntax of almost every word except finite verbs is that it needs a parent – again a clear case of government. In conclusion, I am suggesting that the relationship described as ‘government’ can be found not only when the ‘governed’ is the complement of the governor, but also when it is the subject or the parent. The extension to parents means that complements and subjects govern their parents reciprocally, while adjuncts merely govern their parents but not vice versa. 
But even where government is reciprocal, it is generally unequal in its effects; for example, in Cows eat grass, the nouns cows and grass only govern eat to the extent that they each need some word as their parent, whereas the verb eat imposes much more specific requirements on its subject and complement. Given this extended definition of government, we can now ask what kinds of demands a governor may make. Following Tesnière once again, we can call these demands the word’s valency (a metaphor from chemistry, where atoms have valencies which determine how they combine with other atoms). In these terms, then, the question is: What is the scope of valency? The following survey raises the fundamental question of whether there are any special limitations on valency. Is language special – a dedicated module of the mind – or is it just ordinary cognition applied to words? We start with some rather obvious or well established limitations of government, which are the best case for special status; but in section 3 I offer a functional explanation for even these facts. The most obvious limitation on government is the list of relations to which it applies. For example, although a word governs its complement, it cannot govern a complement of its complement. We can call this limitation the Locality restriction: (1) Locality. A word A can govern a word B only if B is linked directly to A by a dependency relation. For example, we might imagine a language in which word A requires a dative case on the complement of any preposition that happens to act as complement of A; but according to locality, this language is impossible. It is true that there are well-attested examples of words that require a particular case on the complement of a dependent preposition; for example, in German the preposition auf governs the dative when it means ‘on’ and the accusative when it means ‘onto’, but when it is itself governed by the verb warten, ‘wait’, it only governs the accusative (Er wartet auf Dich, ‘He is waiting for you’). But valency patterns like this always involve a specific lexeme (in this case, auf) as the intervening complement, so can easily be broken down into two steps by recognising two ‘sub-lexemes’ of the preposition: AUF/acc and AUF/dat, governing the accusative and dative respectively. Given this distinction, WARTEN simply governs AUF/acc, and the government is purely local. Another limitation on government is the apparent ‘blindness’ of syntax to both phonology and morphology (Miller and others 1997, Zwicky 1992). This limitation prevents valency from referring to either phonological or morphological properties of a word which it governs; for example, a verb cannot require its object to start with /b/ or to have two syllables, nor can it require it to be (in Latin) a member of the first declension-class (inflected like amicus, ‘friend’, but not like agricola, ‘farmer’, dux, ‘leader’ or manus, ‘hand’), or to be morphologically regular. Thanks to extensive research by Zwicky, Pullum and others, these limitations appear to be real. They seem to be especially well attested in valency, even though other parts of syntax such as word order may be affected by phonological features. Section 3 below suggests an explanation for these restrictions, and indeed argues that the blindness of valency to both phonology and morphology has the same explanation as the locality principle. The two restrictions are summarised here: (2) Phonological blindness. 
A word cannot directly govern any phonological property of another word. (3) Morphological blindness. A word cannot directly govern any morphological property of another word.

Having noted these three limitations on valency, the natural question is what other limits there are; and my answer, which I try to justify below, is that there are none. More precisely, any limits that exist can be explained in functional terms. For instance, there must be some upper limit to the number of complements a given word can have. English seems to have a maximum of about three complements, found with verbs such as BET, as in (4), and although more may be possible, the world record is likely to be in single figures. (4) I bet him a pound that he couldn’t do it. Assuming this is so, the explanation is easy to find in the processing demands of tracking multiple complements, combined with the very small communicative benefits of verbs with multiple complements (in comparison with structures that distribute the same number of arguments across two or more verbs). In other words, the valency patterns that exist in different languages are simply different ‘engineering solutions’ to the problem of creating an efficient communication system with a reasonable balance between the various competing demands and pressures on such a system (Evans and Levinson 2009).

To give an idea of the flexibility of valency, consider what we might imagine to be the most basic requirement of all: that a word’s valency should only include other words. Standard views of valency, in all theories, follow the structuralist tradition of isolating language from everything else, so this restriction might seem to be a theoretical given. However, this isolation of language from other behaviour is arguably a research strategy rather than a fact. In reality, a speaker’s knowledge of language is intimately integrated with the rest of their knowledge. A particularly clear example of this integration is the valency of the verb GO, as in The train went ...., where (in speech) the dots stand for some sound such as a whistle or some other noise (Hudson 1985). Let us call this sub-lexeme of GO ‘GO/noise’. Crucially, speakers in my generation allow the complement of GO/noise to be any appropriate non-verbal noise, but not a word; so The train went woosh is barely acceptable. (In contrast, younger speakers allow GO to introduce speech, as in He went ‘Look out!’.) A word that can take a mere noise as its complement is clearly a very serious threat to the supposed boundary round language, and should discourage us from expecting any substantive constraints on valency.

What properties of a word may be governed by another word? The answer seems to be ‘any property other than its morphology or phonology’. The following list (based on Hudson 1988) is meant to be simply suggestive of the diversity of possibilities, and in each case the examples show restrictions on parents as well as on complements or subjects:

word class (a preposition takes a noun as its complement; very takes an adjective or adverb as its parent)
inflection (German FOLGEN, ‘follow’, takes a dative object; in Eastern Tzutujil, the phrase r-cheeuaal ja kinaq’ ‘3s-stick the bean’, i.e. ‘beanpole’, contains the prefix r- marking the possessed-possessor dependency on the parent, so it must be part of the valency of the possessor noun; Nichols 2006)
lexical identity (the complement of DEPEND is ON; the parent of French perfect ALLER, ‘go’, is ÊTRE, ‘be’)
word order (the adjectival dependent of SOMETHING follows it, as in something nice; the parent of ENOUGH precedes it, as in big enough)
meaning (PUT takes a complement which identifies a place; the parent of GREEN denotes a solid object, hence the badness of green ideas)

In short, all the familiar ‘linguistic’ properties of words, other than their pure morphology and phonology, seem to be subject to valency restrictions. On the other hand, these are not the only properties that words have. Another property of any word is the language to which it belongs. Are there any words that require their complement or parent to be in a particular language? This is a question for research on code-switching, and the answer seems to be positive: there are indeed cases in the literature where a bilingual community allows code-switching after some words but not after others. For example, German-English bilinguals in London were found (Duran-Eppler 2011:188) to allow the English because to have either an English complement or a German one (e.g. either because it rained or because es regnete), but the German equivalent, weil, allowed only a German complement (weil es regnete, but not weil it rained). Other word properties include the sociolinguistic categories of dialect and register, and the diachronic category of etymology; do these properties ever figure in valency restrictions? No case comes to mind, but it is a reasonable question for future research.

Another striking feature of valency restrictions is their flexibility. Take the English ‘caused-motion’ construction, exemplified in (5) and (6) (Goldberg 1995:152). (5) They laughed the poor guy out of the room. (6) Frank sneezed the tissue off the table. It is clear that LAUGH and SNEEZE are basically intransitive verbs, and yet examples like these are incontestably good English. This extension of basic valency fits into a larger pattern in which verbs that basically describe an activity can be combined with a ‘satellite’ to describe a motion (Talmy 1985, Slobin 1996). Other examples that illustrate the same pattern are these: (7) The bottle floated out of the cave. (8) They waltzed into the room. Unlike basic valency patterns, which are stored lexeme by lexeme or derived from very specific word classes, these seem to be created as needed, rather like metaphors. Indeed, the term ‘grammatical metaphor’ fits them rather well (Halliday 2002:282). On the other hand, these extensions are clearly part of English grammar rather than creative innovations, because typological studies show that languages such as Spanish and French do not allow them (Talmy 1991).

To conclude this survey of the scope of government restrictions, I have suggested a much broader research agenda than is usually recognised in research on government or valency. On the one hand, I have suggested that valency restrictions apply not only to a word’s complements but also to its subject and to its parent, on the grounds that in all three relations, one word can impose similar restrictions on the kinds of word with which it may contract dependency relations. And on the other hand, I have also suggested that these valency restrictions may involve a wider range of properties than are usually considered relevant, including the most fundamental property of all: being a word (where GO/noise is an example of a word which requires a non-word as its complement).
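The flavour of this survey can be suggested with a small sketch. The fragment below (Python, purely for exposition; the dictionary keys, attribute names and the allowed() helper are my own inventions, not a claim about how any grammar actually stores valency) records restrictions of the kinds listed above, including the German pattern from section 2 in which WARTEN selects the sub-lexeme AUF/acc rather than reaching down, non-locally, to the case of the preposition's own complement.

```python
# Illustrative valency entries; every name here is invented for this sketch.
valency = {
    # word class: a preposition takes a noun as its complement
    "IN":       {"complement": {"word_class": "noun"}},
    # lexical identity: the complement of DEPEND is ON
    "DEPEND":   {"complement": {"lexeme": "ON"}},
    # restriction on the parent: VERY takes an adjective or adverb as its parent
    "VERY":     {"parent": {"word_class": ["adjective", "adverb"]}},
    # inflection: German FOLGEN takes a dative object
    "FOLGEN":   {"complement": {"word_class": "noun", "case": "dative"}},
    # meaning: PUT takes a complement which identifies a place
    "PUT":      {"complement": {"meaning": "place"}},
    # language: German WEIL allows only a German complement (code-switching data)
    "WEIL":     {"complement": {"language": "German"}},
    # beyond words: GO/noise takes a non-verbal noise as its complement
    "GO/noise": {"complement": {"category": "non-verbal noise"}},
    # locality: WARTEN selects the sub-lexeme AUF/acc, not the case of auf's
    # own complement; AUF/acc then governs the accusative locally.
    "WARTEN":   {"complement": {"lexeme": "AUF/acc"}},
    "AUF/acc":  {"complement": {"word_class": "noun", "case": "accusative"}},
}

def allowed(governor: str, relation: str, properties: dict) -> bool:
    """Check a proposed dependent or parent against a word's valency entry."""
    restriction = valency.get(governor, {}).get(relation, {})
    for attribute, required in restriction.items():
        value = properties.get(attribute)
        if isinstance(required, list):
            if value not in required:
                return False
        elif value != required:
            return False
    return True

print(allowed("WARTEN", "complement", {"lexeme": "AUF/acc"}))   # True
print(allowed("AUF/acc", "complement",
              {"word_class": "noun", "case": "accusative"}))    # True
print(allowed("VERY", "parent", {"word_class": "verb"}))        # False (*very interest)
```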
This very wide range of relevant properties suggests that, in principle, any kind of word-property may be relevant to valency restrictions. On the other hand, I have also noted three limitations that almost certainly do apply: the locality principle which prevents a word’s valency from reaching beyond the words with which it is directly connected by dependencies, and the principles of phonology-free and morphology-free syntax, which prevent a word’s valency from reaching into the word’s phonological and morphological structure. The question for any theoretical model is why valency is so free in most respects, but so limited in these three ways. 3. Word Grammar and valency Word Grammar (WG) has roots not only in dependency grammar, but also in a number of other traditions of linguistic theory. Perhaps the most important influence, for present purposes, comes from formal generative linguistics, which focuses attention on the formal properties of language as revealed in the fine detail of more or less formalized grammars. This attention to formal detail explains why WG may have a special contribution to make in the theory of government. Almost equally important is the theory’s cognitive orientation, which owes a great deal to Stratificational Grammar (Lamb 1966,Lamb 1998) and early AI (Winograd 1972, Anderson 1983). Consequently, WG is one of the earliest members of the ‘cognitive linguistics’ family of theories (Hudson 2007b). The cognitive orientation is a reaction against the traditional view of language as a separate phenomenon; so instead of assuming from the outset that language is special, we assume the opposite: that language is a branch of knowledge, special only in that it involves the knowledge of words and how to use them. Once again this is important for government because it predicts that valency expectations can range widely over everything we know about words, and may even go beyond words (as indeed it does in the case of GO/noise). A cognitive orientation is also relevant to the choice between dependency structure and phrase structure, given that the latter rests on the claim that human minds are incapable of grasping direct relations between individual words – a most implausible assumption, given the ease with which we grasp similar relations in other areas of life such as social relations. (Rather oddly, other ‘cognitive’ theories of language follow the phrase structure tradition without question.) The effect of combining a focus on formal structures with a cognitive orientation is the WG claim that language is a cognitive network (Hudson 1984:1). This claim goes well beyond the widely held view that the lexicon is a network of lexical entries (Pinker 1998), or that language is a network of conventional units (Langacker 2000:12) or constructions (Croft 2004). In WG, language is a network of atomic concepts, each of which is merely an atomic node whose ‘content’ consists entirely in its links to other nodes. For example, the word MAN is merely a node, which only carries a label as a matter of convenience for the analyst and whose properties consist entirely of links to other nodes – the node for the general concept ‘man’; the nodes for the morphemes {man} (also linked to other words such as MANLY) and {men}; the node for ‘noun’; and the node for ‘English word’. This tiny fragment of the network can be diagrammed, using WG notation, as in Figure 5. 
Figure 5: A network for MAN

The importance of the Network Notion (Reisberg 2007:252), the idea that concepts have no content except for their links to other concepts, is that a conceptual network can model the flow of activation in the underlying neural network because a concept’s content is part of the network. For example, if in speaking we activate the concept ‘man’, this activation can spill over to the neighbouring concept MAN, so that this becomes available for utterance; and conversely, if in listening we hear (and activate) the concept {man}, this activity can spill over onto MAN (or MANLY), allowing us to recognise this lexeme. In contrast, activation cannot spread to words from their meanings or forms if these are encapsulated in a complex structure such as a ‘construction’. The small triangles in Figure 5 indicate the ‘isa’ relation; for instance, this network shows that MAN isa noun and also isa English word, and that ‘MAN, plural’ isa MAN. This relation has its own notation to reflect its fundamental status as the basis for both classification and generalisation. ‘Isa’ is the relation not only between any class and its members, but also between a type and its tokens or instances. Furthermore, it carries all ‘inheritance’, whereby an instance inherits properties from the stored types and classes; for instance, in Figure 5 the isa links from ‘MAN, plural’ to ‘MAN’ and to ‘plural’ (not shown) allow it to inherit properties such as inflectional patterns from these categories, and (recursively) from even more general categories such as ‘noun’ and ‘English word’. But of course the underlying logic is default inheritance, which accommodates exceptions such as the irregular realization of ‘MAN, plural’ as {men} rather than the regular {mans}.

One further tenet of WG is relevant to valency: that language-learning is ‘usage-based’ (Bybee 1999, Kemmer and Israel 1994, Langacker 2000, Barlow and Kemmer 2000). This tenet goes beyond the truism that we learn our language by observing other people’s usage, by claiming that this mode of learning is reflected in the structure of the language network. First, as I have just implied, this network is implemented in a neural network which gives each node a quantitatively variable ‘activation level’ – or, more precisely, two such levels: a relatively permanent one which reflects one’s lifetime experience in terms of frequency, recency and emotional impact; and a constantly varying one which reflects current concerns and activity. And second, usage-based learning means, inevitably, that one’s memory includes a vast collection of half-remembered specific tokens or exemplars as well as the generalised memories that collectively build into a grammar and lexicon. The specific cases are inevitable companions of the generalisations if (as I believe) we have no mental mechanism for ‘forgetting’ specifics once we have generalised from them.

How does this theoretical framework help in the understanding of government and valency? It throws light on a number of questions. First, why do government and valency restrictions exist at all? After all, if we were designing a language from scratch, and the only aim was to allow it to express indefinitely many complex meanings, it is not obvious that we would include any government or valency restrictions in the language.
Each word would have its meaning, which could be expanded and modified in various ways by adding other words as dependents. For example, we might imagine a very simple version of English which contained just the words likes, he, her, with the network structures shown in Figure 6. In this language, syntax would consist of nothing but a general cognitive principle that variables (such as x and y) should be bound to suitable constants (Hudson 2007a:46-50). In this case, the suitable constants would be ‘known male’ and ‘known female’, expressed respectively by he and her, so to express the more complex meaning ‘known male like known female’ one could say any of He likes her, her likes he, he her likes and so on. If, as I argued earlier, ‘government’ should expand to include any restrictions on dependents or parents, then every restriction which would limit this freedom qualifies as ‘government’, so government includes the whole of syntax, and the question is why syntactic restrictions exist.

Figure 6: Three-word English

The functional explanations for syntax are obvious and well known: syntactic restrictions provide helpful guidance to the hearer, so that a sentence such as He likes her has just one possible meaning, not two. When syntax provides multiple independent restrictions, incompatible combinations such as *Her likes him or *He her likes are ungrammatical rather than ambiguous, so syntax is autonomous. Moreover, syntax is driven by pressures other than mere avoidance of ambiguity, notably the vagaries of usage. To take a simple example, when the verb HANG is used literally the natural preposition for use with things such as hooks is ON, as in (9). (9) He hung his coat on the hook. But when used metaphorically, HANG still takes ON: (10) He hung on her every word. And the same is true for related verbs such as DEPEND and RELY. The answer to the question about why government exists, therefore, seems to be a mixture of processing pressures (guiding the hearer) and other pressures such as building on the patterns experienced in usage.

The remaining questions concern the technicalities of government. First, why is it that so many valency restrictions are lexical, in the sense that a lexically-specified governor requires a particular, lexically-specified, governed? As we have just seen, the verbs HANG, DEPEND and RELY require the preposition ON, but there are many other similar restrictions found not only in conventional government patterns but also in the vast territory of collocation (e.g. white wine), cliché (e.g. I hesitate to say this, but ...) and idiom (e.g. kick the bucket). In each case, a stored unit (in the conceptual network of language) consists of a sequence of specific words related just as they would be in a larger sentence. In the terminology of Construction Grammar, each of these stored units is a construction. The question is why so many constructions appear to link individual words directly to one another. This fact is a problem for phrase-structure theory given its ban on direct relations between individual words. For example, in a phrase-structure analysis of (10) the verb hang is only related indirectly to the preposition on via the preposition phrase on her every word.
In contrast, any version of dependency grammar, including Word Grammar, has a simple explanation: stored constructions can relate single words to each other, either as lexical items or as more general classes, because words are directly related in sentence structure.

We now turn to the restrictions on valency discussed in section 2, starting with the locality principle (1), which prevents a governor from governing any words other than its own direct dependents or parents. For instance, it can restrict the morphosyntactic case of its own dependents, but not of their dependents. The reason why locality applies is that any such restriction would imply a direct relation between the governor and the governed, so an extra link would automatically be added to the network and the relation would thereby become direct. And, indeed, there are clear cases in language where this seems to have happened; consider, for example, the collocation nice little (as in a nice little cottage). The fact is that there is no direct dependency link, as such, between two adjectives modifying the same noun; rather, the dependencies only link them indirectly via their shared parent. This is confirmed by the fact that nice little cannot be used without a shared parent, as in the hypothetical (11): (11) *That cottage is nice little. Consequently we must assume some kind of link between nice and little, which is not a syntactic dependency but which allows a government link between them. The example shows that locality is an automatic consequence of the assumption that language is a network. A similar analysis may be possible for German adjective inflections, which are influenced by the choice of determiner as in (12) and (13): (12) ein junger Mann ‘a young man’ (nominative); (13) der junge Mann ‘the young man’ (nominative). As in the example of nice little, there seems to be a non-dependency link between the determiner and the adjective, but here we are not dealing with a mere collocation but with the central part of grammar, morphosyntax.

The other two constraints on government were phonological blindness (2) and morphological blindness (3). Both of these constraints follow from locality, combined with the WG assumption that morphology and phonology are autonomous levels. This assumption leads to network fragments such as Figure 7, in which a lexical item such as BARK is related directly to its subject, its base and its meaning. By government, BARK can restrict its subject y in various ways, but locality means that these restrictions must be summarised directly in y, by recognising a general category to which y can be linked as an example. But such a category would be unlikely to arise for a purely morphological or phonological property of y. A morphological restriction on y would require a restriction on y’s base (such as belonging to a particular inflectional class), and a phonological restriction on y would apply not to y itself, but to the realization of the base of y. According to the WG theory of learning (Hudson 2007a:52-9), it would in fact be possible to create a special concept such as ‘word beginning with /b/’, but this could not happen unless this property (beginning with /b/) correlated with some other property. Consequently, any property which is purely morphological or purely phonological must be invisible outside morphology or phonology.

Figure 7: A four-level network for BARK
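The logic of this argument can be sketched in a few lines of code. The fragment below (Python, purely for exposition; the node names, attribute names and data layout are my own assumptions rather than Word Grammar notation) mimics the four-level fragment of Figure 7: the word node carries its word class, meaning and a pointer to its base, while the base and its realization sit at other levels. A valency check can only consult the word node of a direct dependent, so a restriction on phonology or inflection class has nothing to attach to.

```python
# Toy rendering of the four-level fragment (word, base, realization, meaning);
# all names here are invented for illustration.

words = {
    "DOG":  {"word_class": "noun", "meaning": "dog", "base": "{dog}"},
    "BARK": {"word_class": "verb", "meaning": "bark", "base": "{bark}",
             # Valency may only mention properties stored on the word node
             # of a direct dependent (locality).
             "valency": {"subject": {"word_class": "noun"}}},
}

bases = {  # morphology and phonology are separate levels, reached via the base
    "{dog}":  {"realization": "/d ɒ g/"},
    "{bark}": {"realization": "/b ɑː k/"},
}

def subject_ok(verb: str, subject: str) -> bool:
    """Check a proposed subject against the verb's valency.
    Only the subject's own word node is visible here, so a restriction such as
    'must begin with /b/' or 'must belong to a particular inflection class'
    cannot even be stated: those properties sit on the base and its
    realization, which the verb is not directly linked to."""
    restriction = words[verb]["valency"]["subject"]
    return all(words[subject].get(attribute) == value
               for attribute, value in restriction.items())

print(subject_ok("BARK", "DOG"))  # True: DOG is a noun, which is all BARK can ask for
```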
And finally, there is the question about why valency allows so much flexibility. How is it that we can take what is basically an intransitive description of an action such as LAUGH, SNEEZE, FLOAT or WALTZ and turn it into a description of a movement resulting from this action, as in (5) to (8), simply by adding a direction adjunct and (in some cases) an object (Holmes 2005)? And why is this extra ‘result’ structure possible in English, but not in French? Given the formal apparatus of a conceptual network and default inheritance, there are two possible answers. One is that the result structure is an optional property of ‘verb’, so it is available for inheritance by any verb – provided its meaning and its existing syntax are compatible. The other solution is to treat the relation between the basic verb and its resultative version as an example of word-formation. In this analysis, English allows any verb to have a ‘resultative’ with the same properties but with extra resultative structure; but this relation is not found in French. The choice between these two alternatives is a question for research, but the main point is that the existence of the pattern is easy to explain within the framework of WG.

In conclusion, then, dependency-based grammars are better than those based on phrase structure as a basis for accommodating the full range of restrictions that qualify as ‘government’, because most such restrictions require a direct link between one word and another – i.e. between a parent and a dependent. Moreover, the traditional examples of government involve restrictions which can be found not only between verbs or prepositions and their complements, but also in a much wider range of syntactic patterns. If ‘government’ is extended to include all these patterns, then we find that the restriction may apply in either direction, with mutual government as a common arrangement. These patterns also favour dependency structure over phrase structure. But government and valency patterns are also relevant to another basic theoretical choice, the one between traditional structuralist analyses and modern cognitive analyses in which language patterns may be related directly to non-linguistic patterns. Examples such as the English verb GO show that a word’s valency may require something other than a word, which supports the cognitive approach. A full understanding of government therefore requires a theory which combines dependency structure with a cognitive orientation, such as Word Grammar.

References

Allerton, David 2006. 'Valency Grammar', in Keith Brown (ed.) Encyclopedia of Language & Linguistics. Oxford: Elsevier, pp.301-314.
Anderson, J. R. 1983. 'A spreading activation theory of memory', Journal of Verbal Learning and Verbal Behavior 22: 261-295.
Anderson, John 1977. On case grammar: Prolegomena to a theory of grammatical relations. London: Croom Helm
Barlow, Michael and Kemmer, Suzanne 2000. Usage-based models of language. Stanford: CSLI
Blake, Barry 1990. Relational Grammar. London: Croom Helm
Bloomfield, Leonard 1933. Language. New York: Holt, Rinehart and Winston
Bresnan, Joan 2001. Lexical-Functional Syntax. Oxford: Blackwell
Bybee, Joan 1999. 'Usage-based phonology', in Michael Darnell, Edith Moravcsik, Frederick Newmeyer, Michael Noonan, & Kathleen Wheatley (eds.) Functionalism and formalism in Linguistics. I: General papers. Amsterdam: Benjamins, pp.211-242.
Chomsky, Noam 1965. Aspects of the Theory of Syntax. Cambridge, MA: MIT Press
Croft, William 2004. 'Logical and typological arguments for Radical Construction Grammar', in Mirjam Fried & Jan-Ola Östman (eds.) Construction Grammar(s): Cognitive and cross-language dimensions (Constructional Approaches to Language, 1). Amsterdam: John Benjamins, pp.273-314.
Croft, William 2007. 'Construction grammar', in Dirk Geeraerts & Hubert Cuyckens (eds.) The Oxford Handbook of Cognitive Linguistics. Oxford: Oxford University Press, pp.463-508.
Diaconescu, Stefan 2002. 'A Generative Dependency Grammar', in Mitsuru Ishizuka & Abdul Sattar (eds.) 7th Pacific Rim International Conference on Artificial Intelligence, Tokyo, Japan, August 18–22, 2002, Proceedings. Berlin: Springer, pp.605-605.
Dik, Simon 1991. 'Functional Grammar', in Flip Droste & John Joseph (eds.) Linguistic Theory and Grammatical Description. Amsterdam: Benjamins, pp.247-274.
Duran-Eppler, Eva 2011. Emigranto. The syntax of German-English code-switching. Vienna: Braumüller
Evans, Nicholas and Levinson, Stephen 2009. 'The Myth of Language Universals: Language diversity and its importance for cognitive science', Behavioral and Brain Sciences 32: 429-492.
Fillmore, Charles, Kay, Paul, and O'Connor, Mary 1988. 'Regularity and idiomaticity in grammatical constructions: the case of let alone', Language 64: 501-538.
Forsgren, Kjell-Åke 2006. 'Tesnière, Lucien Valerius (1893-1954)', in Keith Brown (ed.) Encyclopedia of Language & Linguistics (Second Edition). Oxford: Elsevier, pp.593-594.
Gaifman, Haim 1965. 'Dependency systems and phrase-structure systems', Information and Control 8: 304-337.
Gisborne, Nikolas 2010. The event structure of perception verbs. Oxford: Oxford University Press
Gisborne, Nikolas 2011. 'Constructions, Word Grammar, and grammaticalization', Cognitive Linguistics 22: 155-182.
Gleason, Henry 1965. Linguistics and English Grammar. New York: Holt, Rinehart and Winston
Goldberg, Adele 1995. Constructions. A Construction Grammar Approach to Argument Structure. Chicago: University of Chicago Press
Halliday, Michael 1985. An Introduction to Functional Grammar. London: Arnold
Halliday, Michael 2002. On Grammar. New York: Continuum
Harris, Zellig 1951. Structural Linguistics. Chicago: University of Chicago Press
Herbst, Thomas, Heath, David, Roe, Ian, and Götz, Dieter 2004. A Valency Dictionary of English: A Corpus-based Analysis of the Complementation Patterns of English Verbs, Nouns and Adjectives. Berlin: Mouton de Gruyter
Heringer, Hans J. 1993. 'Dependency syntax - basic ideas and the classical model', in Joachim Jacobs, Arnim von Stechow, Wolfgang Sternefeld, & Theo Venneman (eds.) Syntax - An International Handbook of Contemporary Research, volume 1. Berlin: Walter de Gruyter, pp.298-316.
Holmes, Jasper 2005. Lexical Properties of English Verbs. PhD dissertation, UCL, London.
Hudson, Richard 1984. Word Grammar. Oxford: Blackwell
Hudson, Richard 1985. 'The limits of subcategorization', Linguistic Analysis 15: 233-255.
Hudson, Richard 1988. 'The linguistic foundations for lexical research and dictionary design', International Journal of Lexicography 1: 287-312.
Hudson, Richard 1990. English Word Grammar. Oxford: Blackwell
Hudson, Richard 2007a. Language networks: the new Word Grammar. Oxford: Oxford University Press
Hudson, Richard 2007b. 'Word Grammar', in Hubert Cuyckens & Dirk Geeraerts (eds.) Handbook of Cognitive Linguistics. Oxford: Oxford University Press, pp.777-819.
Hudson, Richard 2010. An Introduction to Word Grammar. Cambridge: Cambridge University Press
Jespersen, Otto 1924. The Philosophy of Grammar. London: Allen and Unwin
Joshi, Aravind and Rambow, Owen 2003. 'A Formalism for Dependency Grammar Based on Tree Adjoining Grammar', in Sylvain Kahane & Alexis Nasr (eds.) Proceedings of the First International Conference on Meaning-Text Theory. Paris: Ecole Normale Supérieure.
Kemmer, Suzanne and Israel, Michael 1994. 'Variation and the usage-based model', in Katherine Beals, Jeanette Denton, Robert Knippen, Lynette Melnar, Hisame Suzuki, & Enca Zeinfeld (eds.) Papers from the 30th regional meeting of the Chicago Linguistics Society: The parasession on variation in linguistic theory. Chicago: Chicago Linguistics Society, pp.165-179.
Kruijff, Geert-Jan 2006. 'Dependency Grammar', in Keith Brown (ed.) Encyclopedia of Language & Linguistics. Oxford: Elsevier, pp.444-450.
Kübler, Sandra, McDonald, Ryan, and Nivre, Joakim 2009. 'Dependency Parsing', Synthesis Lectures on Human Language Technologies 2: 1-127.
Kunze, Jürgen 1975. Abhängigkeitsgrammatik. Berlin: Akademie-Verlag
Lamb, Sydney 1966. Outline of Stratificational Grammar. Washington, DC: Georgetown University Press
Lamb, Sydney 1998. Pathways of the Brain. The neurocognitive basis of language. Amsterdam: Benjamins
Langacker, Ronald 2000. 'A dynamic usage-based model', in Michael Barlow & Suzanne Kemmer (eds.) Usage-based Models of Language. Stanford: CSLI, pp.1-63.
Mel'cuk, Igor 2004. 'Levels of dependency in linguistic description: concepts and problems', in Vilmos Ágel, Ludwig Eichinger, Hans-Werner Eroms, Peter Hellwig, Hans-Jürgen Heringer, & Henning Lobin (eds.) Dependency and Valency: An international handbook of contemporary research. Berlin: Walter de Gruyter, pp.188-229.
Miller, Philip, Pullum, Geoffrey, and Zwicky, Arnold 1997. 'The Principle of Phonology-Free Syntax: Four apparent counterexamples in French', Journal of Linguistics 33: 67-90.
Morrill, Glyn 2006. 'Categorial Grammars: Deductive Approaches', in Keith Brown (ed.) Encyclopedia of Language & Linguistics (Second Edition). Oxford: Elsevier, pp.242-248.
Nichols, Johanna 1986. 'Head-marking and dependent-marking grammar', Language 62: 56-119.
Nichols, Johanna 2006. 'Head/Dependent Marking', in Keith Brown (ed.) Encyclopedia of Language & Linguistics (Second Edition). Oxford: Elsevier, pp.234-237.
Nida, Eugene 1960. A synopsis of English syntax. Norman: Summer Institute of Linguistics of the University of Oklahoma
Osborne, Timothy, Putnam, M., and Gross, Thomas 2012. 'Catenae: introducing a novel unit of syntactic analysis', Syntax 15: 354-396.
Owens, Jonathan 1988. The Foundations of Grammar: an Introduction to Mediaeval Arabic Grammatical Theory. Amsterdam: Benjamins
Pierce, Marc 2006. 'Jespersen, Otto (1860-1943)', in Keith Brown (ed.) Encyclopedia of Language & Linguistics (Second Edition). Oxford: Elsevier, pp.119-120.
Pinker, Steven 1998. 'Words and rules', Lingua 106: 219-242.
Pollard, Carl and Sag, Ivan 1994. Head-Driven Phrase Structure Grammar. Chicago: Chicago University Press
Reisberg, Daniel 2007. Cognition. Exploring the Science of the Mind. Third media edition. New York: Norton
Robins, Robert 1967. A Short History of Linguistics. London: Longman
Robinson, Jane 1970. 'Dependency structure and transformational rules', Language 46: 259-285.
Sgall, Petr, Hajicová, Eva, and Panevova, Jarmila 1986. The Meaning of the Sentence in its Semantic and Pragmatic Aspects. Prague: Academia
Sleator, Daniel D. and Temperley, David 1993. 'Parsing English with a link grammar', in Proceedings of the Third International Workshop on Parsing Technologies. Tilburg: 277-292.
Slobin, Dan 1996. 'Two ways to travel. Verbs of motion in English and Spanish', in Masayoshi Shibatani & Sandra Thompson (eds.) Grammatical Constructions. Their form and meaning. Oxford: Clarendon, pp.195-219.
Smith, Neil 1999. Chomsky. Ideas and ideals. Cambridge: Cambridge University Press
Starosta, Stanley 1988. The case for lexicase. Pinter
Sugayama, Kensei 2003. Studies on Word Grammar. Kobe: Kobe City University of Foreign Studies
Sugayama, Kensei and Hudson, Richard 2006. Word Grammar. New perspectives on a theory of language structure. London: Continuum
Talmy, Leonard 1985. 'Lexicalisation patterns: semantic structure in lexical forms', in Tim Shopen (ed.) Language Typology and Syntactic Description III: Grammatical categories and the lexicon. Cambridge: Cambridge University Press, pp.57-149.
Talmy, Leonard 1991. 'Path to realization: A typology of event conflation', in Proceedings of the Seventeenth Annual Meeting of the Berkeley Linguistics Society. Berkeley: 480-519.
Tarvainen, Kalevi 1987. 'Semantic cases in the framework of dependency theory', in René Dirven & Gunter Radden (eds.) Concepts of Case. Gunter Narr, pp.75-102.
Tesnière, Lucien 1959. Éléments de syntaxe structurale. Paris: Klincksieck
Tomasello, Michael 1998. 'Constructions: A construction grammar approach to argument structure', Journal of Child Language 25: 431-442.
Van Valin, Robert 1993. Advances in Role and Reference Grammar. Amsterdam: Benjamins
Vater, Heinz 1975. 'Toward a generative dependency grammar', Lingua 36: 121-145.
Winograd, Terry 1972. Understanding Natural Language. New York: Academic Press
Zwicky, Arnold 1992. 'Morphology: Morphology and syntax', in William Bright (ed.) International Encyclopedia of Linguistics. Oxford: Oxford University Press, pp.10-12.