Richard Hudson. 1990. English Word Grammar. (Blackwell)
6 SYNTACTIC STRUCTURES
6.1 THE ELEMENTS OF SYNTACTIC STRUCTURE
Like any other syntactic theory, WG (Word Grammar) recognizes words as basic elements of syntactic
structure. However, as the name of the theory suggests, words have rather a special status. First, they are the
smallest units of syntax, if we take syntax as defined by a particular set of syntagmatic relations - the relations,
in fact, with which this chapter is concerned. These relations are not the same as those which are found among
morphological elements, as can be seen most clearly in Semitic-type morphology where inflectional distinctions
are shown by means of a pattern of vowels and consonants interleaved with the consonants of the root. For
example, in the Cushitic language Beja (Hudson 1973, 1974) what is common to all inflected forms of the verb
'write' is the consonant-pattern /ktb/ (presumably borrowed from Arabic). The third-person masculine form is
/iktib/ in the past tense, but /kanti:b/ in the present, and similar differences are found regularly in a large subset
of Beja verbs. It can be seen that in some respects the patterns found in morphology are potentially far more
complex than anything found in syntax; but at the same time they are much simpler, because the rules for
combining elements are generally far more rigid than those of syntax.
I assume, then, that the kind of rule which is responsible for arranging words in sentences is not suitable for
defining the ways in which morphemes (and phonological elements, as in the Beja example) combine inside a
word. This is a very traditional view, of course, and has been defended by many linguists since Robins 1959
under the title 'Word and Paradigm' morphology. The principle underlying it is that syntax is 'blind' to
morphological structure; syntactic rules may of course refer to the morpho-syntactic features of words, but they
may not refer to whatever affixes etc. realize these features.
In this respect WG contrasts sharply with the transformational tradition, whose analytic practices follow the
neo-Bloomfieldian tradition of syntactic analysis with the morpheme rather than the word as the basic unit. The
clearest example of this is the treatment of 'Infl', the inflectional affix of a finite verb, which is an increasingly
crucial element in the syntactic analysis of transformational grammarians; e.g. a sentence like Fred came has a
structure in which there is an Infl between Fred and the verb. Such an analysis is completely at odds with the
spirit of Word-and-Paradigm morphology, as are other radical transformational analyses like the one advocated
by Baker et al. (1989), in which the inflectional suffix of a passive verb is treated as a syntactic argument of the
verb.
The principle that words are the smallest units of syntax allows one word to consist of two or more smaller
words with strictly syntactic relations between them. This is clearly needed in the case of clitics - for instance, a
clitic object pronoun satisfies the syntactic valency requirements of its head verb in just the same way as it
would have done if it had been a non-clitic (compare French je connais, 'I know/am acquainted with', with je connais Jean and je le connais, 'I know John/him'). As I suggested in Hudson (1984: 48ff), the clitic and its
head are both part of a larger word whose classification is irrelevant but whose internal structure defines the
order in which they occur (in the typically rigid fashion to be expected inside a word). I have nothing significant
to add to the brief discussion of clitics in that place, except for a novel analysis of gerunds in which the -ing
ending is taken as a clitic (see section 11.9). There is a good discussion of the issues involved in Taylor (1989:
176ff). Nor can I add to the equally brief discussion in Hudson (1984: 50ff) of compounds, which are also
single words composed out of smaller ones.
The status of words as the smallest units of syntax is now accepted in most mono-stratal theories, and is
controversial only in relation to GB. What is much more controversial is the WG claim that words are also the
largest units of syntax, in the sense that most of syntax is handled without reference to any larger unit. This is
possible if the grammar refers only to the relations between pairs of words (typically, but not only, dependency
relations); the structure of a whole sentence then consists of the total set of pairwise relations among its words,
and nothing more. For example, in Fred likes modern music, the four words are related in just three pairs:
Fred - likes
likes - music
modern - music
and once these relations are defined the analysis of the sentence's internal structure is complete. This is a very
different view of syntactic structure from the earlier versions of Phrase-structure Grammar (and especially from
those like Systemic Grammar which allow mother nodes to carry a great number of features which are only
loosely related to those of their daughters), but it is much less different from theories based on the X-bar theory
of constituent structure, in which all phrases are projections of lexical categories.
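
The pairwise view lends itself to a very direct representation. The following is a minimal sketch (my illustration, not part of the WG formalism): a structure is nothing but a set of head-dependent pairs, with words identified by position so that repeated forms stay distinct.

```python
# 'Fred likes modern music' as a bare set of head-dependent links.
# Words are (position, form) pairs; there are no phrase nodes at all.
sentence = [(1, "Fred"), (2, "likes"), (3, "modern"), (4, "music")]

dependencies = {
    ((2, "likes"), (1, "Fred")),    # likes - Fred
    ((2, "likes"), (4, "music")),   # likes - music
    ((4, "music"), (3, "modern")),  # music - modern
}
```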
As we shall see shortly, the notion 'phrase' can be defined in WG in terms of dependency relations, but it is
probably never needed in the grammar - i.e. no rules refer to 'phrase', 'noun-phrase', 'verb-phrase' etc. What the
grammar does however refer to is the notion 'word-string' which we discussed in section 5.4. A word-string is a
string of words which may also be a phrase, but which is primarily defined by the part it plays in a coordinate
structure. Thus in He drinks coffee because he likes it and tea for purely social reasons, the words coffee
because he likes it make up a word-string, but not a phrase. Word-strings play an important part in the grammar,
but it is a completely different part from that played by phrases in constituency-based grammars because it is
unrelated to dependency notions: unlike a phrase, a word-string may have more than one root, and its
classification is completely unrelated to the classification of any of its roots.
6.2 DEPENDENCY
As we have already seen (section 5.5), most of syntax is handled, in WG, in terms of dependency relations,
which involve the relational categories 'head', 'dependent', 'root' and 'subordinate', where 'root' and 'subordinate'
are generalized versions of 'head' and 'dependent' respectively. For detailed justification of this theoretical
position I refer to other works, such as Anderson (1977), Kodama (1982), Hudson (1980a, b), Hudson (1984:
chapter 3), Siewierska (1988: chapter 4), Mel'cuk (1988: 3ff.). Briefly, some of the more obvious advantages of
dependency analysis include the following:
1 Very general word-order rules can be formulated in terms of heads and their dependents (e.g. in Japanese all heads follow their dependents, but in Welsh all heads precede their dependents).
2 Words which need to be related directly to one another can be so related in dependency grammar but not in
constituency grammar, where phrase nodes usually intervene - e.g. the relation between a verb and a
preposition that it selects lexically is a direct dependency, but only an 'aunt-niece' relation in terms of
constituency.
3 The use of dependency in non-coordinate structures leaves constituency free for use where it is needed, in
coordinate structures, so conjuncts like coffee at eleven in I drank coffee at eleven and tea at four are
unproblematic.
For each word, a phrase can be defined consisting of the word plus any words that are subordinate to it; e.g.
with great difficulty is a phrase whose root is with. (In terms of dependency chains, a word's phrase consists of
the union of all its down-chains.) As we have just explained, these phrases are entirely derivative, and actually
play absolutely no part in the grammar. Thus the relations between dependency structure and constituent
structure are precisely the reverse of those in constituency-based theories, where relational notions like 'subject' can be defined in terms of phrase-structure configurations, but are never referred to in the grammar.
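
As an informal sketch (assuming the pair representation given above), the derivative notion of a word's phrase can be computed as the closure of its down-chains:

```python
def phrase(word, dependencies):
    """A word's phrase: the word itself plus every word on one of its
    down-chains (i.e. all of its subordinates). A sketch only, using the
    (head, dependent) pair representation introduced above."""
    members = {word}
    frontier = [word]
    while frontier:
        head = frontier.pop()
        for h, d in dependencies:
            if h == head and d not in members:
                members.add(d)
                frontier.append(d)
    return members

# With the 'Fred likes modern music' links above:
# phrase((4, "music"), dependencies) == {(3, "modern"), (4, "music")}
```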
Like other linguistic categories, 'head' is defined by a combination of properties. Heads that have all these
properties are typical, but as we should expect there are exceptions - heads which have some but not all of these
properties. Zwicky (1985) lists six properties which define some head-like notion and argues that when applied
to a particular sample of constructions in English these properties actually disagree significantly, and show no
tendency to coincide; however Hudson (1987a) reviews Zwicky's evidence and shows that these particular
constructions can in fact be analysed in such a way that their properties are congruent. If this conclusion is
correct, and generalizes to a wide range of other constructions, then the notions of dependency reflect an
important characteristic of human language, namely that its syntactic structures are organized in such a way that
a typical construction contains one word which has enough properties in common with the prototypical 'head'
to count as the head of the construction.
The properties shared by most heads are the following. We can assume a construction C, consisting of the
head word H plus at least one other word D (for 'dependent').
1 C refers to a hyponym of what H refers to; e.g. big book refers to a kind of book, jam sandwich refers to a kind of sandwich, Leaves fall in the autumn refers to a kind of falling, in which the faller is leaves and the time is the autumn.
2 The semantic relation between H and D involves different parts of their respective meanings: H's sense, but
D's referent. For example, in picture of a girl the referent of a girl helps to define the sense of picture as
'picture of X', where X is the particular girl referred to by a girl.
3 Each dependent D takes its position from H, at least in the sense of having to be 'adjacent' to H (a notion that
we shall develop below). Because of this, in the autumn could not have been *the in autumn, because
autumn must take its position from the, and not from in.
4 H is referred to in any rule which restricts the relative position (i.e. before/after some other word) of any D.
In simple cases, the rule relates the position of D to that of H; and in some languages there are extremely
simple rules of this kind which generalize to virtually every construction, such as those for Japanese and
Welsh mentioned above. In more complex cases the position of one D may be fixed relative to that of
another D of the same H (e.g. the indirect-object of a verb in English must precede the direct object of the
same verb).
5 The external syntactic relations of C are all due to the properties of H: C can be used as it is used because H
is the kind of word that allows this. Typically there is one word outside C which is the head of H, and which
we can call E (for 'external'). Whatever dependency relation exists between E and H is due to the combined
requirements of these two words; e.g. if E requires its direct-object dependent to be a noun, and H is a noun,
then H may be the direct object of E. (This is what Zwicky means by defining the head as the
'morpho-syntactic locus' of its construction, and also as the word which is 'distributionally equivalent' to the
construction.)
6 The range of possible complement Ds is determined by H - i.e. H is the 'subcategorizand' and the 'governor',
in Zwicky's terms. That is, it is H that determines the pattern of complements within C. Adjuncts, alias
'modifiers', are different from other dependents, however, because H is not responsible for their occurrence
or interpretation, as we shall see in the next section and in section 9.3.
There is a wide measure of agreement about the dependency analysis of some constructions. For example,
everybody agrees that objects are dependents of verbs, and almost everybody takes subjects as dependents of
verbs as well. The reason why subjects are less clear than objects is because the form of a verb may vary
according to the subject's number (etc.), by so-called 'subject-verb agreement'. Since the subject is taken as
fixed and the verb as variable, this rule is often seen as evidence that (in this respect) the verb 'depends on' the
subject. However, there is a much simpler explanation: the subject is fixed not because it is the subject, but
because it is a noun and its number (etc.) are tied to its meaning, so the generalization is simply that the fixed
element in any 'agreement' relation is always a noun, irrespective of the direction of dependency.
6.3 DEPENDENCY IN MODERN SYNTACTIC THEORY
Dependency relations have recently become more important in the kind of syntactic theorizing that is reported
in English. The view that dependency is the basis of most of syntax is commonplace in the Slavic-speaking
world, and this tradition is now easily accessible to those of us who do not read Russian, thanks to Mel'cuk and
his colleagues in Moscow and Montreal (Mel'cuk 1979, 1988, Mel'cuk and Pertsov 1987), and to Sgall and his
colleagues in Prague (Sgall et al. 1986). We have been indebted until now to slavists such as Nichols (1978,
1986) and Kilby (see Atkinson et al. 1982) for keeping us in touch with this tradition. The dependency-grammar
tradition has also flourished in the German-speaking world, especially, of late, in the work of Kunze and his
colleagues in Berlin (Kunze 1975, 1982), and it could reasonably be described as the 'indigenous' syntactic
theory of Europe - so much so that it has been adopted as the basis for the European machine-translation system
EUROTRA.
More recently the dependency tradition has been recognized in an English-language textbook on syntax,
Matthews 1981, and also in two important books on English grammar by Huddleston (Huddleston 1984, 1988).
Moreover we now have a book-length description of Lexicase, a dependency-based theory of grammar which
Starosta and his colleagues in Hawaii have been developing since the early 1970s (Starosta 1988).
However it remains true that the dependency tradition is totally ignored in most introductory textbooks on
grammar, and in most discussions of syntax published in English. This is strange considering how frequently
the notion of dependency is invoked in referring e.g. to 'unbounded dependencies' and 'anaphoric dependence'.
Indeed a recent book about GB contains the following sentence (Koster 1987: 8): 'The most fundamental notion
of the theory of grammar is the dependency relation.' It is true that the dependency tradition has not produced a
body of theory of the kind that most theoretical linguists would find helpful. But the basic dependency approach
to grammatical analysis is a serious alternative to the constituent-structure analysis that most linguists take for
granted, and it is time that more attention was paid to the need to argue the case for constituent structure.
I shall now comment on the following recent developments in syntactic theory, all of which seem to show an
increase in the role of dependency.
1 Reduced information in phrase categories.
2 Increased interest in Categorial Grammar.
3 Increased use of grammatical relations and/or Case.
4 Increased use of 'head'.
5 The use of 'government' in GB.
Further comments on these trends can be found in Hudson (forthcoming a).
Reduced Information in Phrase Categories
In X-bar theory, the features of a phrase are just the same as those of its head with only one exception: the
number of 'bars' (which in GPSG are treated as a special kind of feature). This obviously means that any other
classificatory information that is provided by a phrase node is also available on the head node, so it makes no
difference whether the rules refer to the former (as in constituency-based grammars) or to the latter (as in
dependency-based grammars). This fact follows from the fact that the head of a construction is typically the
word whose properties determine how it is used. For example, the dependency rule 'the object of a verb is a
noun' is exactly equivalent, in its information content, to the constituency rule 'the object of a verb is a
noun-phrase'.
The main difference between constituency and dependency thus boils down to the role of 'bar' differences - the difference between N, N' and N'' for example. It is interesting to see the increased use of the notion 'Xmax',
which corresponds to the dependency notion of a phrase - i.e. a word plus all its dependents. In X-bar theory it
has to be stipulated, as a theoretical claim which might not be true, that any complement must be an 'Xmax'
(Stowell 1981: 70). In dependency theory this claim cannot help being true, because it follows from the general
principle that all valency requirements must be satisfied. Every word is thus the head of an 'Xmax', by definition,
and it is quite impossible to require some kind of dependent to be anything other than the head of an 'Xmax'.
It is also interesting to notice a tendency for the range of possible bar-levels to dwindle, from the exuberance
of the 1970s, when three or four levels were contemplated, to the austerity of e.g. Stuurman (1985), with a
maximum of one bar level - a restriction which Starosta has argued (1988) is equivalent to dependency
grammar.
Increased Interest in Categorial Grammar
A recent development in syntactic theory which has involved dependency directly is the increasing interest in
Categorial Grammar (CG), due to the work of Steedman (Steedman 1983, 1985, 1987), Dowty (1982, 1988a)
and others. According to both CG and (ordinary) dependency theory, each word has a 'valency', which (in
general) defines the kinds of dependents that it may take; and in both theories we decide whether a particular
string of words is well formed by applying extremely general rules to the valencies of the individual words to
see if they can be fitted together in such a way that all valencies are satisfied.
One special characteristic of CG is that a word's valency is reflected directly in the name of its grammatical
category (e.g. the category 'intransitive verb' might be named 'v/n', meaning 'something which combines with a
(subject) n(oun) to make a complete v(erb)'; or more standardly, 's/n', meaning 'something which combines with
an n to make a complete s(entence)'). This leads to a great many problems without any apparent benefits; for
example, it is impossible to use this naming system to name the category 'verb', because transitive and
intransitive verbs have distinct (and ungeneralizable) category-names. In other theories, including WG, categories have arbitrary names which allow them to be referred to in statements of many different kinds (e.g. in
statements about morphology as well as about syntax).
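
A toy sketch may make the naming idea concrete (the function and the category strings are my own, not drawn from the CG literature): a category 'X/Y' is something that combines with a Y to yield an X.

```python
def apply_category(functor, argument):
    """Toy CG application: a functor of category 'X/Y' combines with an
    argument of category 'Y' to yield a result of category 'X'.
    Direction of combination is ignored in this sketch."""
    result, slash, wanted = functor.partition("/")
    if slash and wanted == argument:
        return result
    return None

# An intransitive verb as 's/n': combining it with an 'n' yields an 's'.
assert apply_category("s/n", "n") == "s"
# A transitive verb would need a different name, e.g. '(s/n)/n' - which is
# the problem noted above: there is no single category-name for 'verb'.
```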
There are other important differences between CG and WG, such as the fact that dependency relations cannot
be specified directly in CG (e.g. as subject versus object), and the fact that CG appears not to be able to express
the fact that some dependent is optional, except by providing two distinct categorisations of the same word.
However, one of the distinctive characteristics of CG is an important contribution to general linguistic theory
which I have tried to incorporate into WG, namely the treatment of adjuncts.
The basic insight of CG is that an adjunct is less dependent on the head than a subject or complement is. Take
an example like He likes her because of her eyes. The verb likes has three dependents, he, her and because, which all satisfy the first five criteria for head-dependent relations (hyponymy, sense-referent links, adjacency,
relative position and external relations).
In addition, he and her depend on likes for their existence - i.e. the verb provides a subject 'slot' and an object
'slot' plus a set of properties for each of them (relating to semantic roles, position, syntactic type, etc.). The same
is not true of because, which is a typical adjunct. Instead, because defines its own semantic relation (cause) to
its head and defines the kind of head it can have (typically a verb). It is easy to see that it does not fill a slot
provided by the verb, because if it did the slot would be filled up by one such adjunct; but in fact there is no
upper limit to the number of adjuncts that are possible, even where the semantic relations are the same (cf. He
likes her because of her eyes because of their brown colour because of his experiences of girls with brown eyes
... ). All this follows if we allow D to depend on H under two circumstances: if H provides a slot which D could
fill, and which is not already filled, or if D provides a slot (namely, 'head') which H could fill, and which is not
already filled. The former circumstances allow subjects and complements, the latter adjuncts. As we shall see in
discussing English, there are cases where both circumstances seem to apply, such as the 'complement-cum-adjunct' of place after put.
Increased Use of Grammatical Relations and/or Case
We now turn to a different trend in modern linguistics, the tendency to recognize particular grammatical
relations (e.g. 'subject' and 'object') as basic. This is obvious in some theories which escaped the influence of the
Neo-Bloomfieldians - e.g. Functional Grammar (Dik 1978, Siewierska forthcoming) and Systemic Grammar
(Butler 1985) - but also in some theories derived from transformational grammar - Relational Grammar (Blake
1990) and Lexical-Functional Grammar (Bresnan 1982a).
Apart from these theories in which explicit grammatical relations are recognized openly as legitimate, they
have also found a place even in those theories which claim not to recognize them, namely GB and GPSG. In
both these theories, the features which an NP can have now include some which in fact have only one function,
to show its grammatical relation to its head. The features concerned include those which show the NP's 'Case' (a
more abstract notion than 'case', which is the traditional inflectional category). Particular Cases (e.g.
'Nominative') are used to distinguish different grammatical relations from one another in GPSG -e.g. to indicate
that the 'missing' dependent of an infinitive following easy must not be its subject (Gazdar et al. 1985: 150).
Similarly, the 'feature' PRD (for 'predicate') is used to distinguish predicative NPs from object NPs, something
which has always been a problem for theories in which grammatical relations are supposed to be derivable from
the position of an NP in its tree.
In GB, on the other hand, it is the notion of 'Case' in general, rather than particular Cases, that is important,
because of the 'Case Filter' (e.g. McCloskey 1988). This requires that every (overt) NP must have a Case, which
it is given by some element that governs it. As we shall see below the notion 'govern' is closely related to our
notion 'depend', but excludes adjuncts, so the GB Case Filter is similar in content to a general claim to the effect
that every noun must be either a subject or a complement. Examples like I saw him this morning, where an NP
is an adjunct, have received less attention than they deserve in the GB literature, but more importantly no
attempt has ever been made (so far as I know) to demonstrate that the GB apparatus based on abstract (and
invisible) Case is preferable to one based on abstract and invisible grammatical relations or dependencies.
A further example of the trend towards 'covert' relations is noted by Radford (1988: 555), who explains that
categories like NP may name either a slot or its filler; so when NP movement moves an NP from the object
'position' to the subject 'position', it is moving a filler (labelled 'NP') from one slot (also labelled 'NP') to another.
Radford compares this change to a tenant moving from one house to another, a useful analogy except that it
would be very odd to assign tenants and houses to the same category! While NP is used ambiguously, either to
indicate class-membership or to indicate function, Comp is very clearly a functional category (though this fact
is not generally acknowledged), given that in the Barriers tradition (Chomsky 1986b) it can be 'filled' not only
by complementisers proper, such as THAT and WHETHER, but also by finite verbs.
Yet another example of the surreptitious presentation of relational information is the increasing use of indices
to show dependency relations between words. This is particularly highly developed in Brame (1981), who
concludes that 'it appears that we can do away with tree structure and bracketing altogether. The necessary
relations are shown by the indices themselves.'
Increased Use of 'Head'
A further point of contact between dependency theory and modern syntactic theories is the central place of the
notion 'head' in X-bar theory - which indeed could be defined as a version of phrase-structure grammar in
which every phrase is required to have a lexical head, where 'head' is used in almost the same way as in
dependency theory. According to Radford (1988: 545ff) every construction is now assumed to have a head, an
assumption expressed as 'The Endocentricity Constraint'.
One important difference between the two theories is that for dependency grammar a word is the head of one
or more other words, whereas for X-bar theory it is the head of the phrase containing it - which means, in effect,
that the 'head' of X-bar theory corresponds to the 'root' of WG. Thus in the phrase in London, in is the head of
London but the root of in London.
This difference is not just a matter of terminology, but reflects the fact that in X-bar theory there is no direct
relationship between the head of a phrase and its various dependents; it is related to these only via a more or
less long chain of 'mother' nodes. This fact is particularly troublesome if it is necessary to refer to a verb's
subject, because the subject is not a sister but a 'niece' of its verb, so the relation is quite indirect. The subject
does in fact need to be referred to because the verb assigns a semantic role to its subject - hence the need for the
awkward stipulation that a verb assigns a semantic role to an NP if this is either its complement or its subject
(Horrocks 1987: 103). However in some languages it is also necessary for purely syntactic reasons, such as the
fact that some verbs require their subjects to be in some case other than nominative; this seems to be so in
Icelandic, for example (see the references in Zaenen et al. 1985). Needless to say, no such problems arise in
dependency analyses, because the verb is directly related to its subject as well as to its complements; and if we
distinguish adjuncts from other dependents in the way suggested earlier, it will automatically follow that the
verb assigns no semantic role to adjuncts (because these assign semantic roles to themselves).
Another significant difference between dependency theory and X-bar theory is that in the latter the head is
not indicated, as such, in the syntactic structures that are generated, even if it is represented by a special variable
in the underlying rules (as it is in GPSG). This interesting point is due to Zwicky (1986), who argues that GPSG
should mark heads explicitly by means of features (in the same way that it uses features for Case). It is true that
the head of a construction can always be identified on the basis of an X-bar phrase-structure, just as any other
grammatical relation can, but no rule or generalization can refer to it in the grammar. It is hard to know what
might count as an example of such a rule, but it is interesting to speculate how the lexical entry for a word like
enough could be formulated so as to locate it (exceptionally) after its head.
The Use of 'Government' in GB
Our final cross-theory comparison involves the notion of 'government' which is so central to GB theory (to the
extent of being part of the theory's name). As mentioned earlier, an NP is assigned Case by the element that
governs it, a claim which is linked explicitly to the traditional idea that a verb or preposition 'governs' its
complement, and determines its case. In dependency analyses, of course, government is closely linked to
dependency, since the governor is always the head of the governed and the latter is always its complement or
subject - i.e. the two kinds of dependent whose properties are determined by the head.
It is interesting to see how hard it is to define the notion 'governor of W' when dependency relations as such
are not available. If W was always a complement or subject then the main problem would be that the notion
'head' is not available in phrase markers, as mentioned above. W would always be either a sister of its governor
or in whatever relation the subject stands to the finite element that assigns it nominative case. But the problem is
compounded by a number of analytical decisions which make the governor even harder to find, notably the
decision not to recognize the subject of the infinitive in examples like (1) as also being the object of the main
verb.
(1) I expect him to win.
In GB expect assigns case to him, so it must also govern him; but him is not a complement of expect, being
merely the subject of to (win). This requires a much more complex definition of 'governor', which allows a word
to govern not only its sisters, but also its nieces - the 'specifiers' of its sisters (Chomsky 1986a: 162). It is
particularly interesting to see that another extension is allowed (ibid.) which means that a word governs the
head of each of its sisters, because this move makes government into a relation between single words - the
governor and the head of each governed phrase.
The result of these qualifications to the definition of government is that government is exactly equivalent to a
non-adjunct dependency in WG. The difference between the two theories is that the relation can be read directly
off the basic syntactic structures of WG, whereas in GB it has to be inferred via a relatively complex and
stipulative definition. What has been noticeably absent from the literature is any discussion of the relative
merits of deriving grammatical relations and government relations from constituent structure, as opposed to
deriving the latter from the former. This is particularly odd now that it is becoming clear that the same kind of
configurational relations apply not just to government but also to subcategorization, theta-marking, agreement,
anaphor binding, NP and WH movement, obligatory control, predication and gapping (Koster 1987:13).
6.4 RELAXING THE FORMAL CONSTRAINTS ON DEPENDENCY STRUCTURES
One of the reasons for the relative neglect of dependency theory by modern linguists is surely the fact that
dependency grammarians have tended to favour a very conservative version of dependency theory which makes
it equivalent to X-bar theory (except for the explicit recognition given, in the structures generated, to the status
of the head). We all know that natural languages contain phenomena that cannot be explained by means of an
ordinary context-free phrase-structure grammar, of which X-bar theory is an example (short of a massive
expansion of the feature system, as in GPSG and HPSG), so this simple kind of dependency grammar adds little
to our understanding of syntax and has little appeal to linguists who are accustomed to more powerful systems
like GB, LFG, GPSG and CG. What I shall argue below is that a more powerful version of dependency
grammar is needed.
First, however, I must explain the very general principle that allows dependency structures to be mapped onto
well-formed constituent structures. In Europe this principle is called the principle of 'projectivity' (e.g. Mel'cuk
1988: 35), but I have followed Robinson (1970) in calling it the 'adjacency' principle. I shall continue to do so,
but with the warning that this adjacency principle is quite different from its namesake in GB (Horrocks 1987:
100). The effect of the WG adjacency principle is that every phrase (defined as above) must be continuous.
For example, consider the phrase with great difficulty, which involves two dependency relations:
with → difficulty
great ← difficulty
(We shall use arrows like these, pointing from head to dependent, as our standard notation for dependencies.)
Each of these relations respects the relevant rules of English, which require a preposition's complement to
follow it, and a noun's attributive adjective to precede it. But what rules out the ungrammatical *great with
difficulty, with the same dependency relations? The ordering restrictions on the individual dependencies are still
satisfied - difficulty follows with, and great precedes difficulty - but the phrase defined by difficulty is now
discontinuous (great ... difficulty); hence the ungrammaticality.
A preliminary version of the Adjacency Principle is as follows:
The Adjacency Principle
A word must be adjacent to any other word which is its head.
Roughly speaking, a word is adjacent to its head provided it is as close as possible to its head, given the needs
of its own subordinates to be adjacent to their heads. More precisely:
Adjacency (preliminary)
D is adjacent to H provided that every word between D and H is a
subordinate of H.
The Adjacency Principle rules out *great with difficulty because great is not adjacent to its head, difficulty,
being separated from the latter by a word, with, which is not a subordinate of difficulty. In contrast with the GB
'adjacency principle', this allows one dependent to be separated from its head by another dependent of the same
head, as in big black boots, where both big and black are dependents of boots. The following are examples of
dependency structures that are permitted by the Adjacency Principle. Once again, arrows point from heads to
dependents.
(1) big black boots
(2) Fred often drinks wine with meals.
(3) Students with problems worry busy lecturers.
[Diagrams not reproduced; each arrow points from a head to one of its dependents.]
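
The preliminary definition can be stated as a simple check. The following sketch reuses the pair representation and the phrase() helper given earlier; it is an illustration, not a parsing algorithm.

```python
def is_adjacent(dep, head, words, dependencies):
    """Preliminary adjacency: dep is adjacent to head iff every word
    between them is a subordinate of head. A sketch, using (position,
    form) word pairs and the phrase() helper defined earlier."""
    lo, hi = sorted((dep[0], head[0]))
    between = [w for w in words if lo < w[0] < hi]
    subordinates = phrase(head, dependencies)  # head plus its down-chains
    return all(w in subordinates for w in between)

# *great with difficulty: 'with' separates 'great' from its head
# 'difficulty' and is not a subordinate of 'difficulty', so the check fails.
bad = [(1, "great"), (2, "with"), (3, "difficulty")]
bad_deps = {((2, "with"), (3, "difficulty")),    # with - difficulty
            ((3, "difficulty"), (1, "great"))}   # difficulty - great
assert not is_adjacent((1, "great"), (3, "difficulty"), bad, bad_deps)
```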
Just as with phrase-structure grammars, it is clear that the Adjacency Principle, as formulated here, captures
an important fact about language: that phrases are normally continuous. There are said to be languages of which
this is not true - so-called 'flat', or 'W*', languages - but even these may turn out to follow the same principle if
we bear in mind that even in languages of this kind there is a limit to the apparent discontinuity, namely that any
phrase defined by a verb (i.e. any clause) must be continuous. For a brief discussion of this possibility see
Hudson (1984: 81f).
However, as we all know there are constructions which do involve discontinuous phrases, and which are the
source of a good deal of the complexity (and interest) of syntax. Two fairly uncontroversial examples are given
below. Their dependency diagrams show the discontinuity, but are incomplete in certain crucial respects that we
discuss below.
(4) It keeps raining.
(5) What do you think he said?
[Diagrams not reproduced; the discontinuous dependencies - it as a dependent of raining in (4), what as the object of said in (5) - are described below.]
In (4), the phrase it ... raining is discontinuous. This has to be recognized as a phrase because the choice of it is
due to the requirements of raining. (Notice that the choice cannot be explained, as in GB, simply by treating it
as a default for use when a subject is obligatory but no NP is required by the meaning; it is needed as subject of
RAIN even when the latter is a gerund, and therefore need not have a subject - compare It raining during the
outing spoilt the fun with *Raining during the outing spoilt the fun.) And in (5) what ... said is a discontinuous
phrase, because what has to be taken as object of said in order to explain its semantic relation to the rest of the
sentence, and also to explain why said, which has to have an object, is satisfied.
One way to allow these phrases would be simply to abandon the Adjacency Principle, but this would then
leave us without any explanation for the ill-formedness of phrases like *great with difficulty - a clear case of
throwing the baby out with the bath water. What is needed is a slight relaxation of the Adjacency Principle,
which will build on a relaxation of the normal assumptions about the formal properties of dependency
structures. To summarise, if we allow a word to have more than one head, then we can rephrase the definition of
adjacency so that an intervening shared head guarantees adjacency. In this way we shall maintain the Adjacency
Principle, while allowing discontinuities under specified conditions.
First, then, how many heads can a word have? The traditional answer in dependency theory has always been
that each word has just one head, the only exception being that the root of the whole sentence - typically a finite
verb - has no head (by definition). No justification is ever given for this restriction, and the only justification
one can imagine is that it brings dependency structures into line with constituency structures. With this
restriction, plus some form of the Adjacency Principle, any dependency structure can be mapped onto a
well-formed constituent structure. But we know that constituent structures are inherently incapable of showing
discontinuous phrases; this limitation is inherent because a constituent structure is by definition equivalent to a
bracketing of the word-string into continuous sub-strings (Chomsky 1965: 64). Thus proposals to allow
discontinuities in constituent structure (e.g. Sampson 1975, McCawley 1987)
strike at the very basis of the theory. Since we know that some phrases are discontinuous, it is obvious that
constituent structure is inadequate without some considerable enrichment, so it seems odd to take constituent
structure as a model for the formal properties of dependency structures.
In contrast, dependency structures are simply sets of pair-wise relations between single words, without
implications for word-order, so the ban on discontinuity has to be stipulated (by means of the Adjacency
Principle). It is possible to imagine a language in which the Adjacency Principle does not apply at all - and
indeed it is still open to debate whether so-called flat languages are examples. So proposals to allow
discontinuities in dependency structure, by allowing a word to have more than one head, are simply matters of
detail. A few proposals of this kind have been made by other dependency linguists (notably Nichols 1978), but
WG is probably the only theory which builds on this possibility in a systematic way. It is also interesting to note
that Baker (1989) suggests a GB analysis of serial verb constructions in which an NP which is shared as object
by two verbs would in effect have them both as its head.
According to WG, then, a word normally has just one head, but exceptions are allowed in both directions.
Some words (e.g. finite verbs) are allowed to have no head at all; and others are allowed to have more than one
head. For example, take sentence (4), It keeps raining. We have seen that it is a dependent of raining, but there
are also good reasons for treating it as a dependent of keeps. For example, the latter is a tensed verb, and in
English tensed verbs must have a subject. In this sentence keeps is well-formed, so it must have a subject; and
since it is the only candidate, it must be the subject, and therefore a dependent, of keeps. Therefore it is a
dependent of two words, and has two heads. The grammar must of course explicitly permit each part of this
structure, and it allows the double-head pattern for it by a rule which refers to the grammatical relation between
keeps and raining, 'incomplement' (for 'Incomplete complement'): the subject of the incomplement of word W is
also the subject of W. The complete dependency structure for (4) is therefore as shown in (4').
(4') It keeps raining.
[Diagram not reproduced: it repeats (4) with the additional arrow that makes it a dependent of keeps as well as of raining.]
How does this extra dependency help to explain the discontinuity of it raining? The crucial point is that in the
new structure, the word which separates it from raining is the head of both of them, a relation which makes it
not a 'separator', but in fact a strong 'binder'. After all, this is the pattern in e.g. John loves Mary, where it is
loves that holds the other two words together. It would be very natural, then, if this pattern were to count as
adjacency. On this assumption we shall revise our definition of adjacency.
Adjacency (revised and final)
D is adjacent to H provided that every word between D and H is a subordinate either of H, or of a
mutual head of D and H.
It should be remembered that we defined 'subordinate of X' in such a way that X counts as one of its own
subordinates; the subordinates of X are all the words on a down-chain of X, and X is part of all its down-chains
(as well as of all its up-chains). Bearing this in mind, we can see that the revised definition of adjacency solves
our problem. It is adjacent to raining in It keeps raining because the only word between it and raining is the
head of both of them.
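
Again as a sketch (reusing the phrase() helper from earlier), the revised definition only widens the set of permissible interveners to include the subordinates of any mutual head:

```python
def is_adjacent_revised(dep, head, words, dependencies):
    """Revised adjacency: every word between dep and head must be a
    subordinate of head, or of a mutual head of dep and head.
    A sketch; reuses the phrase() helper defined earlier."""
    def heads_of(w):
        return {h for (h, d) in dependencies if d == w}
    allowed = phrase(head, dependencies)
    for m in heads_of(dep) & heads_of(head):   # mutual heads of dep and head
        allowed |= phrase(m, dependencies)
    lo, hi = sorted((dep[0], head[0]))
    return all(w in allowed for w in words if lo < w[0] < hi)

# 'It keeps raining': 'keeps' intervenes between 'it' and 'raining', but
# it is a head of both, so 'it' still counts as adjacent to 'raining'.
s = [(1, "it"), (2, "keeps"), (3, "raining")]
d = {((2, "keeps"), (1, "it")),        # subject of keeps
     ((2, "keeps"), (3, "raining")),   # incomplement of keeps
     ((3, "raining"), (1, "it"))}      # shared subject
assert is_adjacent_revised((1, "it"), (3, "raining"), s, d)
```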
This formulation has the advantage of preserving the basic restriction on discontinuities which rules out
structures like *great with difficulty, without adding greatly to the search-space for a parser. This is so because
discontinuity presupposes multiple heads, but these are only allowed by quite specific rules (e.g. the one given
above which refers to 'incomplement'). Indeed, the rules concerned have very similar formal properties to those
needed for 'agreement' phenomena, and most of them have little more power. For example, the 'incomplement'
rule is given below, together with the rule that links the features of a determiner to those of its following
common-noun.
[1] subject of incomplement of word = subject of it.
[2] feature of complement of determiner = feature of it.
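
Rule [1] can be pictured as a structure-sharing operation over labelled dependencies. The following sketch is mine; only the relation labels come from the text.

```python
def apply_rule_1(relations):
    """Rule [1], sketched: 'subject of incomplement of word = subject of
    it' - if W has subject S and incomplement C, then S is also the
    subject of C, so S acquires a second head.
    `relations` maps (head, dependent) pairs to relation labels."""
    extended = dict(relations)
    for (w, c), label in relations.items():
        if label == "incomplement":
            for (h, s), label2 in relations.items():
                if h == w and label2 == "subject":
                    extended[(c, s)] = "subject"   # the shared subject
    return extended

# 'It keeps raining':
rels = {((2, "keeps"), (1, "it")): "subject",
        ((2, "keeps"), (3, "raining")): "incomplement"}
rels = apply_rule_1(rels)
# rels now also maps ((3, 'raining'), (1, 'it')) to 'subject':
# 'it' has two heads, as required.
```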
This is the only mechanism in WG for handling discontinuities (apart from those that arise in coordination),
and it is very similar in spirit to the GB analyses in which discontinuities are produced by the application of
Move-alpha which produces a chain of NP positions (cf. dependency relations) all shared by the same 'filler'
NP. For example, in Fred keeps talking, the NP Fred effectively occupies two NP subject positions at the same
time, each of which gives it a different head. However in contrast with GB, the WG analysis shows the relevant
relations without the use of empty NP nodes or co-indexing or movement transformations, all of which vastly
increase the search-space for a parser. What WG has instead of these things is the possibility of building a
relatively rich relational structure among surface words. We shall see that this is a very general and important
characteristic of WG.
The Adjacency Principle, with the definition of adjacency that I have just given, applies very generally. As I
mentioned earlier, it is debatable whether it applies to languages with very free word-order, and this is certainly
an important research topic. However until there is incontrovertible evidence to the contrary I shall assume that
it applies to all languages, and is never overridden (except in performance). However I should prepare the
reader for a slight twist to its interpretation that we shall meet in section 11.9, in connection with clitics, where
we shall see that if one word W is part of a larger word W', then the restrictions that the Adjacency Principle
places on the position of W can be satisfied by the position of W' instead; for example, in the French sentence Il
en mange beaucoup, 'He eats a lot of it' (literally: 'He of-it eats a-lot'), en is part of a larger word il en mange,
which is adjacent to the head of en, beaucoup, though en itself is not.
Having relaxed the restrictions on the number of heads per word, we should now consider what other formal
restrictions apply to dependency structures. Another candidate is a ban on mutual dependency, or more
generally, a ban on 'loops' in a dependency structure. According to this ban, if A is subordinate to B, then B
must not be subordinate to A. Once again, this restriction is observed in typical dependency trees, but there are
exceptions where it is easy to show the need for a loop.
Take the sentence What happened?, for example. It is easy to see that what depends on happened, because the
former is clearly fulfilling the latter's need for a subject. But it is almost as easy to see that happened depends
on what, because the latter can occur without the former, but not vice versa - as in exchanges like:
A: Something happened. B: What?
Moreover, if the sentence is used as a complement to a verb such as wonder, this is by virtue of the
(Interrogative) properties of what, not those of happened. The conclusion, then, is that each of them depends on
the other, though the particular dependency relation is different in each case - what is subject of happened, but
happened is complement of what. Each of these dependencies is the result of a distinct rule, and in no case does
a single rule impose an interdependence; but we now know that there is at least one construction in which
mutual dependency exists.
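
In the pair notation used in the earlier sketches, the loop comes out as two labelled links running in opposite directions (my rendering):

```python
# 'What happened?': a loop in which each word is a head of the other,
# under different relations (the labels follow the text above).
what_happened = {
    ((2, "happened"), (1, "what")): "subject",     # what is subject of happened
    ((1, "what"), (2, "happened")): "complement",  # happened is complement of what
}
```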
A complication of a different kind arises in coordinate structures, because a given dependency can be shared
by either a number of different dependents, or a number of different heads. This can be seen in sentence (7), where both Fred and Mary are subjects of arrived, and in (8), where Fred is subject of both stretched and yawned.
(7) Fred and Mary arrived.
(8) Fred stretched and yawned.
[Diagrams not reproduced: an extended '=' sign links the shared dependency arrows.]
The extended 'equals' sign shows that the relation is the same as the one to which it connects - e.g. in (7), that
Fred has the same relation to arrived as Mary does. This kind of sharing, found in coordination, provides
another set of conditions under which a word may have more than one head, but it also interacts with all
dependency rules to multiply the number of possible structures. We shall explain these possibilities further in
chapter 14.
What, then, are the formal restrictions on dependency structures? If we look at the total set of possibilities, we
find that the Adjacency Principle is the only generalization that remains true. But if we restrict ourselves to
typical cases, we find that all the traditionally recognized restrictions hold - just one head per word, no
overlapping phrases, no loops. In conclusion, then, what we find is that the notion 'dependent' has a set of
properties which are characteristic of typical cases, but exceptions are allowed. It would have been surprising if
we had found anything different.
6.5 GRAMMATICAL RELATIONS AS TYPES OF DEPENDENT
Dependency theory has always allowed one word to have more than one dependent, in contrast with its single
head. Different dependents of a single word or word-type often have different characteristics, all of which need
to be defined in rules, so it is necessary to distinguish one dependent from another - hence the traditional set of
'grammatical relation' categories like 'subject', 'object', 'complement' and so on. In contrast, there is generally no
need to subdivide 'head' because each word has just one head, so its head can be distinguished from other heads
just by referring to 'head of W', where 'W' stands for the word or word-type concerned. However now that we
have allowed more than one head per word we face the possibility of having to distinguish different types of
head from one another, and we shall indeed find this to be necessary in section 13.3.
The same possibility of referring to the relatum also exists for dependents, of course, so we can refer to
'dependent of W', where W ranges over a wide variety of different word-types, each of which takes a different
kind of word as its dependent. The need for further subdivisions of 'dependent' arises only in two cases: either
where a word allows two or more dependents which need to be distinguished, or where some generalization can
be made across word-types. Both of these situations are commonplace, so we need to sub-classify dependents.
Unlike many other theories in which relational categories are basic (e.g. Relational Grammar, Functional
Grammar, Lexical-Functional Grammar), WG does not attempt to provide a universal inventory of grammatical
relations. On the contrary, it is based on the assumption that the subdivisions of 'dependent' have much the same
status as the subdivisions of 'word', since properties are generalized in both cases by means of default inheritance. In both cases, we may expect some categories to recur in many, or even all, languages - e.g. it is widely
believed that every language recognizes a category similar to the 'subject' of English, just as every language
probably recognizes a category 'verb'. At the same time, some categories are certainly specific to particular
languages - e.g. 'indirect-object', in the English sense, does not exist in French, nor does 'auxiliary verb' (which
we shall in fact call 'polarity-verb').
One might expect that the most general categories, near the top of the hierarchy, would be less parochial than
their subdivisions. This may well be the case for word-types, but according to the analysis of English that I shall
offer, it is not true of grammatical relations. The very first distinction applied to 'dependent' is that between 'pre-dependent' and 'post-dependent', reflecting the fact that English is a 'mixed' language in which some kinds of
dependents precede their heads, and others follow. The distinction between dependents that typically (though
not always) precede and follow their heads, respectively, is fundamental to a number of important rules (e.g.
passivization and extraction). However it is clear that no such distinction would be needed for languages in
which the order of dependents is consistent across constructions (e.g. Japanese and Welsh), so 'pre-dependent'
and 'post-dependent' are needed for English but not for these languages.
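
By way of illustration only - the groupings below are my guesses, anticipating categories discussed later in the book - such a classification might be recorded as a simple isa-hierarchy over which default inheritance operates:

```python
# A sketch (not the book's analysis): a fragment of an English-specific
# hierarchy of dependent types, as a child -> parent ('isa') mapping.
# Default inheritance lets a rule stated for 'dependent' hold by default
# for everything below it, unless a more specific rule overrides it.
dependent_isa = {
    "pre-dependent": "dependent",
    "post-dependent": "dependent",
    "subject": "pre-dependent",       # typically precedes its head
    "complement": "post-dependent",   # typically follows its head
    "object": "complement",
    "incomplement": "complement",
}

def inherits_from(child, ancestor, isa=dependent_isa):
    """True if `child` is `ancestor` or lies below it in the hierarchy."""
    while child is not None:
        if child == ancestor:
            return True
        child = isa.get(child)
    return False

assert inherits_from("object", "post-dependent")
```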
6.6 SUMMARY OF SYNTACTIC STRUCTURES AND NOTATION
According to WG a sentence has just one syntactic structure (barring ambiguity), which must of course show all
the relevant surface facts about each word - including its position relative to other words - as well as the
relatively abstract facts about its relations to other words, most of which are shown explicitly as syntactic
dependency relations. All these facts are expressed in WG as propositions, from which inferences can be drawn
and which can themselves be produced by inference. However, there is also a friendlier notation which consists
of an ordinary orthographic presentation of the words concerned, with time increasing in the usual way from left
to right, plus various arrows and brackets to show the syntactic structure.
As we have seen, dependencies are indicated by arrows pointing from the head towards the dependent. The
particular kind of dependency - i.e. the 'grammatical relation'- can be shown by the relation's name
superimposed on the arrow to show the type of dependency. The following is an example.
(9) The man is sleeping.
[Diagram not reproduced: arrows labelled 'subject' (twice), 'complement' and 'incomplement' link the four words.]
Coordination, on the other hand, involves constituent structure. Once again the facts concerned are properly
stored and processed as propositions, but there is a notation to help the human user which we introduced in
section 5.4. Coordinations are surrounded by curly brackets: {}, and conjunct-strings - i.e. conjuncts which are
not themselves also coordinations - by square brackets. As we saw above, there is a special notation for showing
dependency relations that are shared: =. The last three examples illustrate these possibilities.
(10) He ate {[an apple] and [two bananas]}.
(11) He {[gives her presents] and [makes her laugh]}.
(12) She reads {[novels here] and [journals there]}.
[Diagrams not reproduced: the arrows carry labels such as 'subj', 'obj', 'comp', 'ind obj', 'incomp' and 'post-dependent'.]
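
The bracketed notation maps directly onto a nested data structure; the encoding below is a rough sketch of mine, not a format defined by WG.

```python
# Example (10) encoded with Python lists: curly brackets become a
# ('coord', ...) node whose parts are the square-bracketed conjunct
# word-strings; everything else stays a flat string of words.
example_10 = ["He", "ate",
              ("coord", [["an", "apple"], ["two", "bananas"]])]
```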