HPSG without PS?
Richard Hudson, UCL
draft: August 1995

1. Introduction

There are two ways of thinking about the structure of a simple example such as Small babies cry: in terms of phrases and the relations among their parts, or in terms of the words and their relationships to one another. At a very elementary level, the two approaches could be diagrammed as in Fig. 1.

Fig. 1

The arrows will be explained shortly, but their main significance is to represent some kind of `horizontal' word-word dependency, in contrast with the vertical relationships (following the diagonal lines) in the first diagram. The approaches are based respectively on phrase-structure (PS) and dependency-structure (DS). Both diagrams show that small babies combine to form a phrase, but they show it in different ways. In the PS analysis the phrase itself is explicit, and the word-word relationship is implicit: node 4 stands for the phrase, and the lines connect 1 and 2 to 4, but not to each other. For the DS analysis this balance is reversed: the arrow shows the word-word relationship explicitly, but the resulting phrase is left implicit. The purpose of this paper is to argue that syntactic analysis which includes DS should not use PS as well; more precisely, I shall argue that this is true of most of syntax, in which constructions are headed (and involve subordination), though I shall not try to argue the same for coordination. (Indeed, I have argued elsewhere that coordination is precisely the one area of syntax where PS is appropriate; see Hudson 1990: chapter 14.) Virtually everyone accepts DS as part of syntax, even if not by name - the notion `long-distance dependency' makes it explicit, but government, agreement, valency (alias subcategorization) and selection are all horizontal dependency relationships, and all word order rules seem to be expressible as dependencies. Similarly, most theories now recognise `grammatical relations' such as head, complement, adjunct and subject; although usually expressed in terms of a function in the larger phrase, these can all be translated easily into types of word-word dependency. As PS was originally defined by Chomsky, none of these notions was available, so there really was no alternative to PS as a basis for syntactic analysis. But now that so many dependency notions are available in most syntactic theories, it is time to ask whether we still need PS as well. The question applies particularly urgently to Head-driven Phrase Structure Grammar (HPSG; Pollard & Sag 1994), as can be seen from the simplified version of Pollard and Sag's analysis of Kim gives Sandy Fido (p. 33) in Fig. 2.

Fig. 2

The most interesting thing about this diagram is the way the verb's structure cross-refers directly to the nouns by means of the numbers [1], [2] and [3]. These cross-references are pure DS and could be displayed equally well by means of dependency arcs. Almost equally interesting is the way in which the verb shares its class-membership, indexed as [4], with the VP and S nodes. An even simpler way to show this identity would be to collapse the nodes themselves into one. The only contribution that the phrase nodes make is to record the word-word dependencies via their `SUBCAT' slots: the top node records that the verb's subcategorization requirements have all been met (hence the empty list for SUBCAT), while the VP node shows that it still lacks one dependent, the subject.
This separation of the subject from other dependents is the sole independent contribution that PS makes in this diagram; but why is it needed? Pollard and Sag argue persuasively (Chapter 6) against using the VP node in binding theory, they allow languages with free constituent order to have flat, VP-less structures (40), and in any case HPSG recognises a separate functional slot for subjects (345). It is therefore important to compare the HPSG diagram in Fig. 2 with its pure-DG equivalent in Fig. 3.

Fig. 3

What empirical difference is there between these two diagrams? What does Fig. 3 lose, if anything, by not having a separate node for the sentence? Could an analysis like Fig. 3 even have positive advantages over Fig. 2? Questions like these are hardly ever raised, still less taken seriously. Pollard and Sag go further in this respect than most syntacticians by at least recognising the need to justify PS:

    But for all that a theory that successfully dispenses with a notion of surface constituent structure is to be preferred (other things being equal, of course), the explanatory power of such a notion is too great for many syntacticians to be willing to relinquish it. (p. 10)

Unfortunately they do not take the discussion further; for them the `explanatory power' of PS is self-evident, as it no doubt is for most syntacticians. The evidence may be robust and overwhelming, but it should be presented and debated. A reading of the rest of Pollard and Sag's book yields very few examples of potential evidence. PS seems to play an essential role only in the following areas of syntax: in adjunct recursion (55-6), in some kinds of subcategorization where S and VP have to be distinguished (125), in coordination (203), and in the analysis of internally-headed relative clauses, for which they suggest a non-headed structure with N' dominating S (233). Apart from coordination (where, as mentioned earlier, I agree that PS is needed) the PS-based analysis is at least open to dispute, though the dispute may of course turn out in Pollard and Sag's favour. The question, then, is whether a theory such as HPSG which is so well endowed with machinery for handling dependencies really needs PS as well. My personal view is that PS can now be thrown away, having served its purpose as a kind of crutch in the development of sophisticated and explicit theories of syntax; but whether or not this conclusion is correct, our discipline will be all the stronger for having debated the question. The rest of the paper is a contribution to this debate in which I present, as strongly as I can, the case for doing away with PS. The basis for my case will not be simply that PS is redundant, but that it is positively harmful, because it prevents us from capturing valid generalisations. My main case will rest on the solutions to two specific syntactic problems: the interaction of ordinary wh-fronting with adverb-fronting, as in (1), and the phenomenon in German and Dutch called `partial-VP fronting', illustrated by (2).

(1) Tomorrow what shall we do?

(2) Blumen geben wird er seiner Frau.
    Flowers give will he to-his wife.
    `He'll give his wife flowers.'

First, however, I must explain how a PS-free analysis might work.

2. Word Grammar

My aim as stated above is `to argue that syntactic analysis [of non-coordinate structures] which includes DS should not use PS as well'.
Clearly it is impossible to prove that one PS-free analysis is better than all possible analyses that include PS, so the immediate goal is to compare two specific published theories, one with PS and the other without it, in the hope of being able to isolate this particular difference from other differences. Fortunately there are two such theories: HPSG and Word Grammar (WG; see Hudson 1984, 1990, 1992, 1993, 1994, forthcoming; Fraser and Hudson 1992; Rosta 1994). Apart from the presence of PS in HPSG and its absence from WG, the two theories are very similar: both are `head-driven' in the sense that constructions are sanctioned by information on the head word; both include a rich semantic structure in parallel with the syntactic structure; both are monostratal; both are declarative; both make use of inheritance in generating structures; neither relies on tree geometry to distinguish grammatical functions; both include contextual information about the utterance event (e.g. the identities of speaker and hearer) in the linguistic structure; and perhaps most important of all for present purposes, both allow `structure sharing', in which a single element fills more than one structural role. Admittedly there are some theoretical differences as well: HPSG allows phonologically empty elements, and it distinguishes complements from one another by means of the ordered SUBCAT list rather than by explicit labels such as `object'. And not surprisingly there are disagreements in published accounts over the vocabulary of analytical categories (e.g. Pollard and Sag's `specifier' and `marker') and over the analysis of particular constructions (e.g. Hudson's analysis of determiners as pronouns and total rejection of case for English; see Hudson 1990: 268ff, 230ff; 1995a). However these differences, both theoretical and descriptive, are only indirectly related to the question about the status of PS, so we can ignore them for present purposes. One problem in comparing theories is to find a notation which does justice to both. The standard notation for HPSG uses either attribute-value boxes-within-boxes or trees, both of which are specific to PS, whereas DS structures are usually shown in WG by means of arrows between words whose class-membership is shown separately. To help comparison we can start by using a compromise notation which combines WG arrows with the HPSG unification-based notation, so that the information supplied by the grammar (including the lexicon) will be a partial description of the structures in which the word concerned may be used. For example, a noun normally depends on another word, to which it is connected by an arrow, and (obviously) can be used only at a node labelled `noun'; so Mary emerges from the grammar with the point of an arrow (whose shaft will eventually connect it to the word on which it depends), and also with the label `N' for `noun' (as well as `nN' for `naming noun', alias `proper noun'). In terms of morphosyntactic features it is singular, i.e. `[sg]'. A simple way of showing this information is by an entry like (3):

(3) Mary  N, nN [sg]

For some words the grammar supplies a little more information. For example, deeply must be an adjunct (abbreviated to `a') of whatever word it depends on, and he must be subject (`s') of a tensed (`[td]') verb (which typically follows it).

(4) -a deeply  Av

(5) he sN, pN  V [td]

For a slightly more interesting word consider loves. As in HPSG this is supplied with a valency consisting of a singular subject and any kind of noun as object.
These requirements will eventually be instantiated by dependencies to some other word, which we show by a labelled dependency arrow. The English word-order rules fix their (default) positions, hence their positions to the left and right of loves in the entry. Unlike the previous examples, loves can be the `root' of the whole sentence, so it does not need to depend on another word (though it may depend on one, in which case it is the root of a subordinate clause); this is shown by the brackets above the top arrow.

(6) () N s- loves -o N [sg]  V

Putting these four entries together generates the structure for He loves Mary deeply in Fig. 4.

Fig. 4

Word order is handled, as in HPSG, by means of separate `linear precedence' statements. Some apply to very general patterns, the most general of all being the combination of any word with its `anchor' (the word on which it depends[1]). By default the anchor comes first in English, though this default may be overridden by increasingly specific patterns. At the extreme of specificity are combinations of specific lexical items, which we can illustrate with the adverb deeply. Normally this follows a verbal anchor as in (7)-(9), but it can precede the verb resent as in (10):

(7) I love her deeply. *I deeply love her.
(8) I slept deeply. *I deeply slept.
(9) We looked deeply into each other's eyes. *We deeply looked into each other's eyes.
(10) I resent the suggestion deeply. I deeply resent the suggestion.

[1] The word anchor has not been used before in this sense, so far as I am aware. The normal term in dependency analysis is either head or regent, and I have always used head in my own work. However it has caused a great deal of confusion for readers familiar with PS uses of the same term. For example, in a PP such as with great care, the `head' of great care is care for PS but with in DS. (More precisely, with is the head of care, but the looser terminology is tempting.) This confusion is a potentially serious problem and best avoided. In any case, the notion `head' isn't very helpful as a metaphor for word-word dependencies. Regent is tied to complements and subjects, which are `governed' by the word on which they depend, so it brings false expectations for adjuncts. The metaphor behind the word anchor is that the sentence-root is a fixed, firm `anchor' for the words which depend on it, each of which then acts as an anchor for the words that depend on it, and so on recursively along each chain of dependencies.

This idiosyncrasy can be shown by a special lexical entry for the sequence deeply resent, supplementing the normal entry:

(11) () deeply a- resent -o N  Av V

This example illustrates an important general characteristic of WG, which it again shares with HPSG. Default inheritance and unification allow complex and specific entries to be composed on the basis of much more general entries, with the consequence that there is no clear division between `the lexicon' and `the grammar'. In WG the basic units of grammatical description are in fact elementary propositions such as those in (12).

(12) resent is a verb.
     resent has an object.
     The object of a verb is a noun.
     A word's anchor is obligatory.
     A verb's anchor is optional.
     A word follows its anchor.
     deeply is an adverb.
     An adverb may depend on a verb.
     deeply may both depend on resent and precede it.

Only the last of these propositions is specific to the sequence deeply resent; but all the rest are available in the grammar, and when applied to this sequence they generate the more complex structure in (11).
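The way these propositions combine can be made concrete with a minimal sketch in Python. The encoding is purely illustrative and mine, not WG notation: the general anchor-first proposition supplies a default position, and the lexical proposition about deeply and resent adds the extra option of preceding, without removing the default (cf. (10), where both orders are good).

GENERAL = {"after"}                            # A word follows its anchor.
LEXICAL = {("deeply", "resent"): {"before"}}   # deeply may precede resent.

def allowed_positions(dep, anchor):
    """Positions of `dep` relative to `anchor`: the general default plus
    whatever a more specific (here lexical) proposition contributes."""
    return GENERAL | LEXICAL.get((dep, anchor), set())

print(allowed_positions("deeply", "loves"))   # {'after'}
print(allowed_positions("deeply", "resent"))  # {'after', 'before'} (cf. (10))

A genuinely overriding case, where the more specific proposition replaces the default rather than adding to it, will appear with extractees in section 3.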
Even at this elementary stage of explanation we can illustrate one of the positive disadvantages of PS. In a PS-based analysis, deeply is the head of an adverb-phrase and resent is the head of a verb-phrase, so the combination is in fact not a combination of words, but of phrases. In a tree-structure, the two words are likely to be separated by at least one AP and two VP nodes. These cannot simply be ignored in a lexical entry, because they are part of the definition of the relationship between the words. In contrast, a pure-DS analysis shows the words themselves as directly related to each other. A similar problem arises with lexically selected prepositions such as the with required by cope or the on after depend, which can be handled very simply and directly in WG (with `c' standing for `complement').

(13) () cope -c with  V P

In contrast, PS imposes a PP node between the verb and the preposition, so the only sister node available for subcategorization by the verb is labelled simply PP; but cope does not simply take a PP, it takes one headed by with. Pollard and Sag discuss the fact that regard lexically selects a PP headed by as, but their suggestion that the relevant phrases could be identified by the feature [+/-AS] (ibid: 110) is hard to take seriously. Similarly, stop requires a subordinate verb to be a present participle, which can easily be expressed in a DS entry as a restriction imposed directly by one word on the other:

(14) () stop -c V  V [prpt]

But if PS is available the first verb's sister is the VP headed by the participle, so the latter's inflectional features have to be projected up to the phrasal node. The problem in each case is that the phrase node is a positive hindrance to satisfactory analysis. The problems created can of course be solved by projecting the head word's features onto the phrase; but the PS is part of the problem, not of the solution.

3. Structure sharing and (dis)continuity

As mentioned earlier, perhaps the most important characteristic of HPSG and WG is the notion of `structure sharing'. (`It is not going too far to say that in HPSG structure sharing is the central explanatory mechanism ...', Pollard and Sag 1994: 19.) In both theories virtually all the major complexities of syntactic structure require some kind of structure sharing. We start with a simple example of subject-to-subject raising, which I shall assume needs no justification. The entry for (intransitive) stop shows the raising as structure sharing whereby the two verbs share the same subject:

(15) () N s- stop -c V ___ s _/ [prpt]

The sharing appears in sentence structure as two dependencies converging on the same word, as in the two diagrams of Fig. 5. The second diagram shows the recursive application of raising where stop is combined with another raising verb, the auxiliary have.

Fig. 5

Structures such as this are a major innovation in dependency theory (as they once were in PS theory) because they disrupt the normally simple relationship among dependencies, phrases and linear order. If we assume that all the words which depend (directly or indirectly) on a single word constitute a phrase, it remains true that phrases are normally continuous, i.e. not interrupted. Normally an infringement of this generalization leads to ungrammaticality. For example, what happens if we combine the following entries?
(16) -a after -c N  P

(17) parties  N [pl]

(18) big a-  N  A

The ordering restrictions on after and big require parties to follow both these words, but nothing in the entries restricts the relative order of after and big; and yet after big parties is fine while *big after parties (with big depending on parties) is totally ungrammatical. The obvious explanation is that big parties is a phrase, so it must be continuous - i.e. it must not be split by a word, such as after, which is not part of the phrase. Traditionally PS-based theories have assumed that phrases should be equivalent to a bracketing of the string of words, which means that discontinuous phrases are ruled out a priori as a fundamental impossibility - a discontinuous phrase simply cannot be separated by brackets from the rest of the sentence, because any brackets round the whole phrase must also include the interruption. Admittedly alternative views of PS have been proposed in which discontinuity is possible (McCawley 1982), and indeed the same is true of any PS-based theory (such as HPSG) which admits structure sharing. But if discontinuous structures are admitted in principle, what then excludes strings like *big after parties? The question seems trivially easy to answer from an HPSG perspective: discontinuity is only permitted when it is caused by structure sharing, and structure sharing is only permitted when it is explicitly sanctioned in the grammar. Structure sharing removes a single element from one constituent and makes it act as part of another (higher) constituent. For example, subject-raising causes discontinuity by locating the lower clause's subject in a named place (subject) in the higher clause; but since the grammar says this structure sharing is ok, it is ok. In contrast, there is no structure-sharing pattern which allows big to move away from parties, so that discontinuity is not ok. What will turn out to be a weakness in this answer is that the grammar has to stipulate each structure-sharing possibility. In most cases this is no problem, but the problem data that we consider below will show that more flexibility is needed. There is an alternative which is much more obviously compatible with DS than with PS, and which will play a crucial part in the later discussion; so to the extent that the explanation works it will count as evidence against PS. The alternative is to ask what the essential difference is between the structures of It stopped raining, where the discontinuous phrase it ... raining is grammatical, and of *big after parties, with the illegal *big ... parties - apart from the fact that one is allowed and the other isn't. The difference is that all the words in It stopped raining are held together by one continuous phrase (the whole sentence). It has a legitimate place in the sentence thanks to its dependency on stopped, whereas the only link between big and the rest of the sentence lies through the discontinuous phrase. In other words, the grammatical sentence has a substructure which is free of discontinuities but holds all the words together. This observation provides the basis for a very general principle which controls discontinuity in DS. For rather obvious reasons discontinuity always results in tangling arrows (provided we consider the vertical arrow above the sentence root to be infinitely long).

The No-tangling Principle
A string of words is not grammatical unless its dependency structure includes a substructure in which:
a. there is a single `root' word,
b. every other word depends either directly or indirectly on the root,
c. for each word, there is a single, minimal chain of dependencies to the root,
d. no dependencies tangle.

We can call this tangle-free substructure the phrase's `surface structure', because this is the structure which is responsible for surface word-order. The only clause that needs any comment is (c). The point of this requirement is to force `raising' while excluding `lowering'; so if a word has two anchors, one nearer to the root than the other, the one to be included in the substructure must be the former. For example, the verb stop shares its subject with its participle complement, but clause (c) guarantees that this shared word is treated in surface structure as the subject of stop, not of the complement. This means that there is no need to build this asymmetry into the lexical entry for stop, and indeed in true WG notation (in contrast with the compromise notation used in (15)) the entry is neutral regarding word order. The relevant WG information on stop is actually a proposition[2] which is equally compatible with either position for the shared subject:

(19) The subject of stop is also the subject of its complement.

[2] Strictly speaking this proposition is in turn derived, by inheritance, from a simpler proposition about stop plus a general proposition about the dependency-type `xcomp' (named after its counterpart in LFG): (i) stop has an xcomp. (ii) A verb's subject is also the subject of its xcomp. Proposition (ii) is a default which is overridden when the verb has an object: (iii) A transitive verb's object is also the subject of its xcomp.

In contrast, the HPSG analysis builds the raising into each relevant lexical entry and structure-description, as a stipulation rather than as a general principle. The No-tangling Principle will be crucial to the arguments in the rest of the paper. It expresses a putative universal of syntax which limits the kinds of discontinuity that are permitted in any language; but it also removes the need to stipulate how structure sharing affects surface word order. This means, on the one hand, that we can simplify lexical entries as in (19); but much more importantly it provides flexibility at just the points where it is needed. It will allow us to solve one of our problems by providing two alternative surface structures for certain rather special dependency structures; and the other problem will be solved by allowing an indefinite number of raising slots controlled only by one general rule plus the No-tangling Principle. In diagrams it is convenient to separate the surface structure from the other dependencies by drawing it above the words (literally `on the surface', if the word-string defines `ground level'), with all other dependencies added below. With this change the structure for It has stopped raining is shown in Fig. 6.

Fig. 6

The effect of the No-tangling Principle is to guarantee that any example of discontinuity must also involve structure sharing (with the possible exception of discontinuities caused by parenthetical elements such as vocatives, which we shall simply ignore here). The converse does not apply: not all examples of structure sharing lead to discontinuity. For example, both WG and HPSG treat so-called `Exceptional Case-Marking' verbs as subject-to-object raisers, so the lower verb's subject is also the higher verb's object, but this does not in itself lead to discontinuity, as can be seen from the structure in Fig. 7 for John expected Mary to come.

Fig. 7
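The Principle is concrete enough to be checked mechanically. The following Python sketch is mine, not part of WG: words are integer positions, a candidate surface structure is a map from each non-root word to its single anchor (so clause (c)'s unique chain is built into the encoding), and the root's vertical arrow is treated as infinitely long, so any arc spanning the root counts as tangling.

from itertools import combinations

def is_surface_structure(n, root, anchor_of):
    """Check a candidate surface structure against the No-tangling
    Principle. n words sit at positions 0..n-1; `root` is the single
    root word (clause a); `anchor_of` maps every other word to its one
    anchor, giving each word a unique chain to the root (clause c)."""
    # clause b: every other word depends, directly or indirectly, on the root
    for w in range(n):
        seen, cur = set(), w
        while cur != root:
            if cur in seen or cur not in anchor_of:
                return False
            seen.add(cur)
            cur = anchor_of[cur]
    # clause d: no two dependency arcs may cross ...
    arcs = [tuple(sorted((w, a))) for w, a in anchor_of.items()]
    for (a1, b1), (a2, b2) in combinations(arcs, 2):
        if a1 < a2 < b1 < b2 or a2 < a1 < b2 < b1:
            return False
    # ... and no arc may span the root, whose own vertical arrow is
    # treated as infinitely long
    return not any(a < root < b for a, b in arcs)

# It has stopped raining: it(0) has(1) stopped(2) raining(3), root = has.
# Raised surface structure: it and stopped -> has; raining -> stopped.
print(is_surface_structure(4, 1, {0: 1, 2: 1, 3: 2}))  # True
# `Lowered' alternative with it -> raining: the arc 0-3 spans the root.
print(is_surface_structure(4, 1, {0: 3, 2: 1, 3: 2}))  # False

The second call fails because the arc from it to raining would span the root has and so cross its vertical arrow - exactly the lowering which the Principle excludes.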
To prepare for the next section we need to illustrate two other kinds of discontinuous structure: extraction, and extraposition from NP. The grammar entries responsible for extraction are shown in (20), using the HPSG terminology for the three entries (Pollard & Sag 1994: 160). The `Top' entry optionally allows any word to be attached to a finite verb (i.e. a tensed verb or an imperative) by the dependency labelled `x+' (short for `extractee', but distinguished from `+x' for `extraposee' by the `+' which marks the relative position of the anchor; in Hudson (1990) this dependency is called `visitor'). This is the entry for topicalisation, as in Beans I like, and other Top entries are needed for other types of extraction such as relative clauses and tough-movement. The `Middle' entry allows the extractee of one word to be passed recursively down a chain of dependents, and the `Bottom' entry allows the extractee dependency to be `cashed' for some kind of complement relationship. Without this final trade-in the extractee has no semantic role in the sentence, so extraction is impossible. Fig. 8 shows a sample structure in which all three entries have applied.

(20) Extraction:
     a. `Top'    word x+- V [fin]
     b. `Middle' word x+- word -c word __ x+ __/
     c. `Bottom' word x+,c- word

Fig. 8

Fig. 8 shows two discontinuous phrases, beans ... that you like and beans ... you like, both of which are justified by the structure sharing of the `Middle' entry. Why does the `extractee' relation override the word-order demands of the object dependency? In this case the explanation involves default inheritance. In WG generalisations about dependencies, as well as those about word-classes, apply only by default, i.e. unless overridden by more specific requirements. As we shall see in section 6, dependencies, like word-classes, are organised in an inheritance hierarchy with `dependent' as the most general type and more specific types such as `subject' and `complement' below it. The word-order rule which applies to objects is actually just one application of the very general anchor-first default rule for English. The rule which positions extractees before their anchors is more specific, applying just to extractees[3].

(21) General: A word follows its anchor.
     Extractees only: An extractee precedes its anchor.

The general principle of default inheritance ensures that the latter overrides the former.

[3] In Hudson 1990 I argued for a basic division of English dependencies into `predependent' and `postdependent', with `extractee' (alias `visitor') as a type of predependent. In that analysis the word-order rule for extractees is in fact the one for predependents in general. This is a side issue which is not relevant to the present discussion.
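The recursive effect of the Middle entry can be sketched in a few lines. This Python toy is mine (the function name and the dictionary encoding are not WG notation): it enumerates the words on which an extractee introduced at the top of a chain of dependents could be cashed.

def cash_sites(word, complements_of):
    """All words on which an extractee introduced at `word` (Top) could
    be cashed as a complement (Bottom), after being passed down by zero
    or more applications of the Middle entry."""
    yield word                                       # Bottom: cash it here
    for c in complements_of.get(word, ()):
        yield from cash_sites(c, complements_of)     # Middle: pass it down

# A sentence of the shape of Fig. 8's example, e.g. Beans I think that
# you like: the extractee is introduced on think and cashed on like.
complement_chain = {"think": ["that"], "that": ["like"]}
print(list(cash_sites("think", complement_chain)))  # ['think', 'that', 'like']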
Extraposition from NP is somewhat simpler as it involves only a single, non-recursive entry which generates examples like the following:

(22) Reports have emerged of widespread atrocities.
     Someone came to see me who I hadn't seen for years.

(23) Extraposition from NP: V -+x word / N _/

In this entry[4] the vertical dependency arrow from V to N means that the N concerned may either precede or follow the `pivot' verb. This verb accepts one of the N's dependents as its own `extraposee', labelled `+x'; more specifically, it accepts one of the N's `post-dependents' (i.e. one of its complements or post-anchor adjuncts). Fig. 9 shows the structure generated for the first example.

Fig. 9

[4] The entry given here ignores what may turn out to be a serious problem for the WG analysis of determiners as heads: an intervening determiner does not affect extraposability, in contrast with other intervening material which does. For example, (ii) is just as good as (i) in spite of the determiner some which (according to WG) adds an extra dependency between reports and are. (i) Reports have emerged of widespread atrocities. (ii) Some reports have emerged of widespread atrocities. In contrast, indisputable extra dependencies can make extraposition harder (the so-called `Right-Roof Effect'), as in (iv) contrasting with (iii). (iii) Reports of atrocities have emerged that have been denied. (iv) ?Reports of atrocities have emerged that were committed by government forces. On the other hand it is possible that the constraint is due to processing difficulties. This might explain the relative goodness of examples like (v), where the intervening structure is semantically rather `bland'. (v) A collection of reports has been published of widespread atrocities. If there is a processing explanation, then the entry for Extraposition should allow the `source' noun to depend only indirectly on the pivot verb.

Why does the structure sharing have the effect of moving the extraposee away from its first anchor? Because raising is obligatory, so in the surface structure what counts is its extraposee relation to the V, rather than its adjunct relation to the N. The most important point to emerge from this discussion of structure sharing and discontinuity in WG is that the consequences of structure sharing for word order are due to the No-tangling Principle rather than to stipulated entries. We are now ready to consider our two main empirical problems.

4. Preposed adverbials in English wh-questions

Adverbials (adjuncts of place, time, cause, etc.) can generally be preposed, giving examples like the following, which I take to be straightforwardly grammatical. The preposed adverbial phrases are bracketed.

(24) [After that] I saw Mary.
     [When he calls] tell him to ring back.
     [For supper] we had ice-cream.

This preposing involves the same extraction process which we illustrated above, but the exact mechanics are irrelevant. The main point is that the adverbial is a dependent of the first verb. The structure for the last example is in Fig. 10, with the adverbial shown simply as `a' (for `adjunct') of the verb.

Fig. 10

Adverbial preposing seems to apply freely to wh-interrogative sentences as well, as shown by the following examples.

(25) [After that] who did you see?
     [When he calls] what shall I say to him?
     [For supper] who had ice-cream?

Only one order is possible: preposed adverbial followed by wh-pronoun.

(26) *Who [after that] did you see?
     *What [when he calls] shall I say to him?
     *Who [for supper] had ice-cream?

In fact, the preposed adverbial is in general the very first element in any sentence, as it also has to precede the uninverted subjects in (24). In a DS analysis there is no reason to doubt that the preposed adverb and the wh-pronoun are both dependents of the first verb, as shown in Fig. 11.

Fig. 11

The most natural PS analysis would presumably treat the adverbial as an adjunct of the rest of the sentence, as in Fig. 12.

Fig. 12

The crucial point about this structure is that whatever the label on the top node may be, adjunction guarantees that it is identical to the one on the lower `sentence' node. Whichever kind of analysis we use, then, there is no reason why the fronted adverbial should affect the distribution of the whole clause. Now we come to the problem.
In the dependency analysis in Fig. 11, the root of the sentence is clearly the tensed verb, had, and the two preceding elements, the adverbial and the wh-pronoun, are simply dependents. We should therefore expect to be able to use the sentence as a subordinate clause in a larger sentence such as (27).

(27) *For supper who had ice-cream is unclear.
     (compare: Who had ice-cream for supper is unclear.)

But this is ungrammatical, although the corresponding sentence with the adverbial un-fronted is fine. The same is true more generally of all our examples when embedded as subordinate clauses.

(28) *I wonder [after that] who you saw.
     *I wonder [when he calls] what I should say to him.
     *I wonder [for supper] who had ice-cream.

Why? It is easy to describe the problem informally: the wh-pronoun marks the absolute beginning of the subordinate clause, in much the same way as a subordinator such as that, so the above examples are bad for just the same reason as the following:

(29) *I know [after that] that you saw Mary.
     *I know [when he calls] that I should tell him to ring back.
     *I know [for supper] that we had ice-cream.

These sentences can all be mended, of course, by reversing the order of that and the adverbial, which shows that adverbial fronting is possible in subordinate clauses (thus eliminating a pragmatic explanation along the lines that fronting is pragmatically allowed only in main clauses):

(30) I know that [after that] you saw Mary.
     I know that [when he calls] I should tell him to ring back.
     I know that [for supper] we had ice-cream.

But if the wh-pronoun blocks pre-posing in the subordinate clause, why not in the main clause as well? And why can't we mend the subordinate questions in the same way as the that clauses, simply by reversing the order of the wh-pronoun and the adverbial?

(31) *I wonder who [after that] you saw.
     *I wonder what [when he calls] I should say to him.
     *I wonder who [for supper] had ice-cream.

For those attracted by a GB-style analysis, it is worth pointing out the problems raised by the data just cited[5]. If the wh-pronoun was positioned before the `Comp' position of that, then a wh-pronoun ought to be able to replace that before a preposed adverbial; so why is (31) not possible? Similarly, if the preposed adverb was adjoined to CP and the wh-pronoun was moved to the specifier of Comp, the combination adverbial + wh-pronoun should be just as possible in a subordinate clause as in a main clause; but it is not (cf. (28)). (It should be noticed, incidentally, that the interrogative clause need not be a complement, as witness (27), so the solution does not lie in the general ban on adjunction to complements suggested in Chomsky (1986: 6).)

[5] These facts are among the evidence against the existence of the categories `Comp' and `complementizer' in Hudson 1995b.

The WG explanation requires a discussion of the structure of wh-questions. Take a very simple example like Who came? Which word depends on the other? In the last paragraph we took the verb to be the root of this sentence, as in virtually every other sentence. The centrality of the verb is one of the main tenets not only of dependency analysis but of virtually every other modern analysis (including HPSG, in which a sentence is a kind of VP). After all, if who is the subject of came, and subjects are dependents, then who certainly depends on came. But there is also another set of facts about wh-questions which supports the opposite analysis, with the verb depending on the wh-pronoun (Hudson 1990: 362ff):

- The verb (and therefore the rest of the clause) can be omitted, leaving just the wh-pronoun (`sluicing'); e.g.
I know someone came, but I don't know who.
- The wh-pronoun is the link between the clause and a higher verb; for example, wonder subcategorizes for an interrogative clause, which means a clause introduced by an `interrogative word' - a wh-pronoun, whether or if. In DS terms this means that in I wonder who came, who depends on wonder, and came on who. It would be extremely difficult to write the necessary rules if came depended directly on wonder: `a tensed verb which depends on wonder must itself have a dependent interrogative word in first position'.
- The wh-pronoun seems to select the form of the following verb. After most pronouns this may be either a tensed verb or TO plus infinitive, but one word is different: after why only a tensed verb is permitted. (Compare I was wondering when/why we should do it with I was wondering when/*why to do it.) Selection of this kind is typical of anchors in relation to complements, which again suggests that the verb depends on the wh-pronoun.
- Semantically the verb (on behalf of the rest of the clause) modifies the wh-pronoun in much the same way as a relative clause. For example, the question Who came? is asking for the identity of the person that came, where that came clearly modifies person.

In short, there is at least a good prima facie case for taking who as the anchor of came in Who came?. This descriptive conclusion is supported by similar conclusions from relative clauses and possibly other areas of grammar as well, and leads to a very important theoretical conclusion: that mutual dependency is possible. There are good reasons for taking who as a dependent of came, but there are equally good reasons for taking came as a dependent of who. At this point DS and PS part company severely. Although DS theory has traditionally set its face against mutual dependency, this is just a matter of theoretical stipulation and could be abandoned under pressure. For PS, as far as I can see, it is impossible to show mutual dependency in a monostratal system, and even a transformational analysis would face severe problems. It remains to be seen whether this pessimistic conclusion about PS is justified. How does all this help with our problem? Because mutual dependency interacts with the No-tangling Principle to give a free choice of surface structure. Either of the dependencies in Who came? will satisfy the Principle, so either could be the (entire) surface structure. Thus there are two ways in which we could draw the structure for Who came?, both shown in Fig. 13.

Fig. 13

This choice is not a syntactic ambiguity, but two alternative ways of satisfying the No-tangling Principle. In this respect the diagramming system is misleading, so it should be remembered that there is just a single syntactic structure for each sentence, in which each `surface structure' is simply a sub-structure which satisfies the No-tangling Principle. We now return to the examples with fronted adverbials. Take the sentence [For supper] who had ice-cream? Which word is its root (i.e. the word on which the rest of the sentence depends) - who or had? Fig. 11 took had as the root, which is the only correct answer. In this case who is not a possible sentence-root because its arrow would tangle with the one linking for supper to had, as can be seen from Fig. 14, which shows the choice which is now offered by the mutual dependency of who and had.

Fig. 14
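Reusing the is_surface_structure sketch from section 3 (again an encoding of my own, with for supper collapsed into a single token), both the free choice in Who came? and the loss of one option in the adverbial case can be checked mechanically:

# Who(0) came(1): mutual dependency puts both arrows in the full
# structure; either one alone is a tangle-free surface structure.
print(is_surface_structure(2, 1, {0: 1}))  # True: came as root
print(is_surface_structure(2, 0, {1: 0}))  # True: who as root

# For supper(0) who(1) had(2) ice-cream(3), `for supper' one token.
# who as root: the arc for supper -> had spans who (cf. Fig. 14).
print(is_surface_structure(4, 1, {0: 2, 2: 1, 3: 2}))  # False
# had as root: no tangling, so this is the only surface structure.
print(is_surface_structure(4, 2, {0: 2, 1: 2, 3: 2}))  # True

Embedding the clause under wonder adds a dependency from wonder to who which forces who to be the clause's root, so in the embedded case both options fail together, as we now see.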
For the embedded example, however, even this option is impossible, because who has to be taken as the root of the subordinate clause. Fig. 15 shows the only possible structure for an example without the offending adverbial, where the main point is the dependency between wonder and who.

Fig. 15

Given this dependency, there is no way to avoid the tangling caused by the fronted adverbial, which can be seen in Fig. 16.

Fig. 16

To summarise the argument: in who had ice-cream there are two ways to satisfy the No-tangling Principle, because both who and had make acceptable sentence-roots. If we add a fronted adverbial such as for supper, this precludes who as a root; and embedding the clause precludes had as root. It follows that if we combine the two patterns, neither word qualifies as a possible root and the structure collapses completely. Hence the ungrammaticality of *I wonder for supper who had ice-cream.

5. Partial verb-phrases in German and Dutch

The second empirical problem which will show the advantages of PS-free analysis is that German and Dutch both allow movement of a `partial VP'. This phenomenon has received a good deal of attention, especially within the HPSG community (e.g. Baker 1994, Haider 1990, Johnson 1986, Nerbonne 1994, Uszkoreit 1987a, b, van Noord & Bouma 1995). Native speakers seem to agree that German examples like the following (from Uszkoreit 1987a) are grammatical:

(32) [Den Brief zustecken] sollte der Kurier nachher einem Spion.
     [the letter to-slip] was the courier afterwards to-a spy
     `The courier was to slip the letter afterwards to a spy.'

The problem is that on the one hand there is solid evidence that the bracketed material forms a single phrase (because German grammar requires the finite verb, in this case sollte, to follow just a single constituent), but on the other hand this would be less than a complete phrase in any standard analysis of the un-fronted equivalent (33).

(33) Der Kurier sollte nachher einem Spion den Brief zustecken.
     the courier was afterwards to-a spy the letter to-slip

The facts are fairly simple. German allows the finite verb to follow any one of its own dependents:

(34) Er isst einen Apfel jeden Tag.
     he eats an apple every day
     Einen Apfel isst er jeden Tag.
     Jeden Tag isst er einen Apfel.

If the finite verb is an auxiliary such as sollte in the earlier examples, the element before it may be a dependent of the auxiliary's complement infinitive:

(35) Er sollte einen Apfel jeden Tag essen.
     he was an apple every day to-eat
     Einen Apfel sollte er jeden Tag essen.
     Jeden Tag sollte er einen Apfel essen.

In each case the first element is an entire phrase; in general, phrases must be kept intact (as in every other language). The exception to this generalisation is that putative verb-phrases, called `partial VPs', may be fronted as illustrated in (32) above, as well as in the following further permutations of the same elements:

(36) [Zustecken] sollte der Kurier nachher den Brief einem Spion.
     [Einem Spion zustecken] sollte der Kurier nachher den Brief.
     [Nachher einem Spion zustecken] sollte der Kurier den Brief.

It is even possible for the syntactic subject to be included in the front-shifted group of words, though this is only possible for very `inactive' subjects:

(37) [Ein wirklicher Fehler unterlaufen] war ihm noch nie.
     [a real error happened] was to-him yet never
     `He had never yet made a real error.'

(38) [Eine Concorde gelandet] ist hier noch nie.[6]
     [a Concorde landed] is here yet never

(39) [Ein Witz erzählt] wurde.[7]
     [a joke told] was
(40) [Ein solch schönes Geschenk gemacht] wurde mir noch nie.
     [a so beautiful present made] was to-me yet never

In spite of the arguments in Baker (1994) it is hard to avoid the conclusion that the first phrase in each of the bracketed groups really is the subject of both the verbs. Most obviously, it is in the nominative case, which is normally found only in clear subjects (or predicative nominals). Slightly more subtly, the last two examples have passive participles (erzählt, gemacht), which is evidence that the nominative phrases have been raised from object to subject, and are therefore subjects (Müller 1995).

[6] Example from Andreas Kathol, via the HPSG network.
[7] This example and the next are from Müller 1995.

Similar patterns are found in Dutch. The following examples are from van Noord & Bouma 1995.

(41) [Bezocht] heeft Jan dat congres nog nooit.
     [visited] has Jan that conference still never

(42) [Bezoeken] zou Jan dat congres niet willen.
     [to-visit] would Jan that conference not want

(43) [Bezoeken willen] heeft Jan dat congres nooit.
     [to-visit wanted] has Jan that conference never

The last example is particularly significant, illustrating as it does the recursive power of the process behind the fronting. We shall return to it later, but meanwhile we shall concentrate on German. These structures are a serious challenge for most theories of syntactic structure because the fronted groups do not correspond to phrases that are recognised in the structure of a sentence whose subordinate verb is in its normal position at the end of the sentence. In other words, they involve movement of non-constituents. To take a simple example given earlier in (35), it is possible to front-shift essen, `to eat', along with any combination of einen Apfel, `an apple', and jeden Tag, `every day'.

(44) [Essen] sollte er jeden Tag einen Apfel.
     [Einen Apfel essen] sollte er jeden Tag.
     [Jeden Tag essen] sollte er einen Apfel.
     [Jeden Tag einen Apfel essen] sollte er.

Admittedly German word order is sufficiently free for each fronted group to be able to constitute the tail of a `normal' clause, but there is no evidence at all that they constitute a constituent when they do. Moreover, to assume such phrases would be to abandon the idea that a verb and its object form a phrase which excludes adjuncts; for example it would oblige us to recognise a `verb-phrase' consisting of jeden Tag, a clear adjunct, and essen, but excluding the latter's object einen Apfel: ... einen Apfel [jeden Tag essen]. Suitable grammars can be constructed (as Uszkoreit 1987a demonstrates), but at great theoretical cost and with few descriptive insights. Turning to DS-based analyses, if anything the challenge is even more serious for conventional DS accounts than for PS-based ones, because the object and adjunct depend on the infinitive but need not move with it, contrary to basic tenets of dependency grammar. The problem can be seen in Fig. 17, a conventional DS analysis of Er sollte jeden Tag einen Apfel essen, `He should eat an apple every day'.

Fig. 17

If essen is fronted, then it should take all its dependents with it, because the alternative is either for them to be left without anchors, or for the dependencies to tangle with others; and in either case the No-tangling Principle is broken. These alternatives are illustrated in Fig. 18 for Essen sollte er jeden Tag einen Apfel.

Fig. 18
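The dilemma of Fig. 18 can be replayed through the is_surface_structure sketch of section 3 (multi-word phrases collapsed into single tokens, an encoding of my own):

# Essen(0) sollte(1) er(2) jeden Tag(3) einen Apfel(4), root = sollte.
# If jeden Tag and einen Apfel stay dependent on the fronted essen,
# their arcs cross the root's vertical arrow: the Principle is broken.
print(is_surface_structure(5, 1, {0: 1, 2: 1, 3: 0, 4: 0}))  # False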
Whereas the PS problem was that the fronted `phrases' implied a very clumsy and unrevealing analysis of ordinary clauses, the DS problem is that a fronted verb which leaves its dependents behind is simply impossible to analyse. One way to solve these problems is to assume a flat, VP-less structure (which is no problem for HPSG, and obligatory in any case for WG), plus optional raising dependencies controlled by the No-tangling Principle. First, however, we must provide a little background grammar for German. The verb-second phenomenon is easy to handle if we assume that each (non-interrogative) finite verb has just one `extractee', again labelled `x+', which must be combined with one dependent of that verb, or (recursively) of any non-finite verb which depends on it. (In many German dialects extraction is limited to non-finite clauses.) For example, in Einen Apfel isst er jeden Tag, `An apple eats he every day', einen (Apfel) is both the extractee and the object of isst; and similarly for Einen Apfel sollte er jeden Tag essen, where einen is extractee of both sollte and essen, as well as the latter's object. The structure is in Fig. 19.

Fig. 19

With extractees as an exception, a sentence-root verb precedes all its dependents; but this order is reversed for other verbs - i.e. for any verbs which depend on another word.

(45) German/Dutch word order rules:
     A sentence root precedes all its dependents.
     Other verbs follow their dependents.

For example, in the last example the sentence-root verb sollte has three dependents: einen Apfel (which as extractee precedes it), and er and essen, both of which follow it; but essen follows both of its un-extracted dependents, er and jeden (Tag). Suppose we now, following Baker 1994, apply a `universal raising' rule to the extracted verbal complement of any auxiliary verb. This would raise any number of dependents of the complement verb to serve also as a dependent of the higher auxiliary verb. Since raising is obligatory, these dependents of the lower verb will be positioned as though dependents of the higher verb, while their unraised co-dependents stick to the lower verb. This pattern involves a new kind of raising. It is clearly not raising to subject, extraction or extraposition (the three types of raising discussed so far), so we must invent a new name for the relationship which is involved. The obvious name is `raisee', which we shall adopt with the abbreviation `r'.

(46) German/Dutch universal raising
     Any number of dependents of an extracted verbal complement of an auxiliary verb A may also be raisees of A.

This raising option is only available for a very restricted range of sentences, and does not lead to ambiguity. It applies only to verbs which are both complements of an auxiliary verb and also extracted, and when a dependent receives the extra `raisee' dependency this affects its position in the sentence. For example, if essen depends on the auxiliary sollte, then at one extreme we can raise every single dependent into the surface structure so that they take their position after sollte: Essen sollte er jeden Tag einen Apfel. At the other extreme, there could be no raisees at all, so all the words concerned line up before essen: Jeden Tag einen Apfel essen sollte er. And in between we could raise, say, jeden Tag but not einen Apfel: Einen Apfel essen sollte er jeden Tag. Structures for the three variants on our simple example are shown in Figs. 20 to 22.

Fig. 20
Fig. 21
Fig. 22
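The effect of (46) can be emulated with a toy generator. The sketch below is mine, not a WG formalism: it enumerates the subsets of the infinitive's dependents that may be raised, places raisees after the auxiliary, leaves the rest in the fronted partial VP, and keeps the subject with the auxiliary (ordinary subject-raising); finer points of clause-internal order are ignored.

from itertools import combinations

def fronting_variants(aux, subject, infinitive, dependents):
    """Enumerate partial-VP frontings licensed by universal raising (46):
    any subset of the extracted infinitive's dependents may be raised."""
    for k in range(len(dependents) + 1):
        for raised in combinations(dependents, k):
            kept = [d for d in dependents if d not in raised]
            # unraised dependents precede the fronted (non-root) verb;
            # the root auxiliary precedes its remaining dependents
            yield " ".join(kept + [infinitive, aux, subject] + list(raised))

for variant in fronting_variants("sollte", "er", "essen",
                                 ["jeden Tag", "einen Apfel"]):
    print(variant)
# jeden Tag einen Apfel essen sollte er
# einen Apfel essen sollte er jeden Tag
# jeden Tag essen sollte er einen Apfel
# essen sollte er jeden Tag einen Apfel

These are exactly the four permutations in (44).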
What about subjects that are front-shifted with the infinitive, as in (37) to (40)? These are surprising because one would expect them to be covered by ordinary subject-raising, which ought to apply obligatorily. If Ein (Fehler), `an error', in (37) is the subject of the infinitive unterlaufen, and if war is an auxiliary (as it obviously is), then the link between Ein Fehler and war ought to be in the surface structure instead of the lower subject link. The solution must be more tentative, but it is possible that this pattern is linked to the fact that German tensed verbs, unlike English ones, do not have to have a subject. In the following classic examples there is no subject.

(47) Mich friert.
     `Me freezes', i.e. `I am cold'
     Jetzt wird gearbeitet.
     `Now is worked', i.e. `Let's get down to work now'

If the subject slot is optional for the auxiliary verb, then raising to it must also be optional. In that case the problem disappears, as the infinitive's subject has no dependency link to the main clause. This analysis may also help to explain why the unraised subject must be inactive; subjectless sentences seem only to be possible with inactive verbs (such as frieren) and passives. As stated, the German Universal Raising rule predicts that raising applies recursively across auxiliary verbs. This may be true for German, but it is certainly true for Dutch, to judge by the example quoted earlier in (43) and repeated here:

(48) [Bezoeken willen] heeft Jan dat congres nooit.
     [to-visit wanted] has Jan that conference never

In this example one auxiliary (willen) depends on another (heeft), so any dependent of willen may qualify also as a raisee of heeft, including dependents which are merely raisees of willen. This applies specifically to dat (congres), which must be a raisee of heeft as well as raisee of willen and object of bezoeken. Van Noord & Bouma (1995) show that the partial-VP fronting which we have been discussing is closely related to another construction found in Dutch and in some dialects of German: cross-serial dependencies among final verbs. The facts are well known and have been much discussed since Bresnan et al (1983). The following examples are from van Noord & Bouma 1995.

(49) ... dat Jan het boek wil lezen.
     ... that Jan the book wants to-read

(50) ... dat Jan Marie het boek laat lezen.
     `that Jan lets Mary read the book'

(51) ... dat Jan Marie het boek wil laten lezen.
     `that Jan wants to-let Mary read the book'

The examples are challenging because the verbs occur in the same order as their dependents, which inevitably leads to tangling (`cross-serial') dependencies. However, universal raising may offer a solution here as well. Let's assume that the dependent verbs are delayed by extraposition; e.g. in (49) lezen could have preceded wil but is extraposed instead. This is an extra `raising' dependency, labelled `+x', which allows us to define another class of verbs that depend on auxiliaries. If universal raising were to apply to these verbs as well, any of their dependents could become raisees of the auxiliary, which would avoid tangling in surface structure, as shown in Figs. 23 to 25.

Fig. 23
Fig. 24
Fig. 25

It should be noticed that universal raising simply provides a background to other rules, and in particular for the rules which allow verb extraposition. This explains why standard German does not allow such examples, although it does have universal raising. The details of the rules are a separate topic which is only indirectly relevant here.
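The cross-serial case, too, can be run through the is_surface_structure sketch of section 3 (het boek treated as one token, an encoding of my own): without raising, the two object arcs cross each other, while raising Marie and het boek to wil - as the extension formulated below licenses - yields a tangle-free surface structure.

# (dat) Jan(0) Marie(1) het boek(2) wil(3) laten(4) lezen(5), root = wil.
# Unraised: the arcs Marie -> laten and het boek -> lezen cross.
print(is_surface_structure(6, 3, {0: 3, 1: 4, 2: 5, 4: 3, 5: 4}))  # False
# With Marie and het boek as raisees of wil, the surface is tangle-free.
print(is_surface_structure(6, 3, {0: 3, 1: 3, 2: 3, 4: 3, 5: 4}))  # True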
The main point is that a small extension of Universal Raising may cover this construction as well. Moreover, this extension involves what looks like a rather natural class of dependents: extractee and extraposee.

(52) German/Dutch universal raising (extended)
     Any number of dependents of an extracted or extraposed verbal complement of an auxiliary verb A may also be raisees of A.

This section has suggested that Dutch and German share the rule of Universal Raising, which interacts with the No-tangling Principle to explain not only why `partial verb-phrases' can be fronted but also why the expected order of verbs in final clusters can be reversed without producing dependency chaos. The analysis relied crucially on the absence of any predetermined raising structure, and more generally of any predetermined PS; raisee relationships are added freely to the ordinary DS, generating patterns which are otherwise not possible. It is hard to see how this kind of flexibility could be provided by a PS-based analysis which required a separate entry for each structure-sharing pattern. The price we have paid for the explanation is the introduction of a new dependency, `raisee', whose sole function is to give extra flexibility in word order. Knowing that one word is the raisee of another tells us nothing at all about the semantic relationship between them - indeed it almost guarantees the absence of such a relationship; nor does it `carry' any other syntactic rules, such as agreement or selection. This emptiness distinguishes the raisee dependency from most others, but it is not unique. The next section takes this classification of dependencies a little further.

6. Dependencies and binding

One of the general themes running through this paper has been that structure sharing is the correct way to handle departures from basic syntax. On this WG and HPSG are agreed. However one rather fundamental point of difference is over the treatment of dependency categories, the named attributes whose values may be words or phrases. HPSG has no unifying concept or slot which covers them all - complements are in SUBCAT, subjects in SUBJ, and adjuncts in ADJUNCT. In contrast, for WG they are all instances of the category `dependent', which may be further subdivided into more or less general subcategories such as `complement' (divided further into `object', `indirect object', and so on). This hierarchical classification of dependencies not only allows generalisations across them, but encourages one to look for similarities and differences among specific dependency types, so we can now survey the dependencies which we have mentioned in this paper. Do any broad generalisations emerge about any cluster of dependencies? The examples of structure sharing have involved only a small number of dependency types: subject, extractee, extraposee and raisee. These fall into two groups. Subject can be used without any structure sharing (e.g. in any simple single-verb sentence), but the other three are never used except in structure sharing. An extractee is never just an extractee, because this does not give enough information to integrate the word into the sentence's semantic structure; it also has to bear some `proper' dependency relation such as object. Similarly for extraposees and raisees. Subjects are not alone in being able to occur either with or without structure sharing (i.e. subject raising); if a subject-to-object raising analysis is right, then the same is also true at least of objects.
For most dependency types, however, structure sharing may not be possible, so we seem to have the basis at least for a broad two-way classification of dependencies according to whether or not sharing is their sole function. We shall arrive at an important generalisation about binding in which PS plays no part at all, and indeed in which any notion of PS would simply get in the way. Before we approach binding we can review the distinctive features of these three dependency types which we have already discussed.

- As just mentioned, they are always involved in structure sharing.
- They are always `raisers' - i.e. when they combine with some other dependency type, they are always the higher of the two. Therefore they are always part of surface structure (except when used recursively, in which case the `lower' dependencies are excluded from surface structure).
- They never combine with each other, but only with dependencies of the other type (e.g. an extractee of one word may also, recursively, be an extractee of another word, but it cannot be a raisee of another word; in contrast, the subject of one word may be the object of another word).

In recognition of these similarities we can call extractees, extraposees and raisees `s-links', where the `s-' recalls the s's of `surface structure' and `structure sharing'. As van Noord & Bouma (1995) point out, the s-links extractee and raisee are irrelevant to binding. (The same may be true of extraposition, but it is hard to construct critical examples.) The following Dutch examples are from van Noord & Bouma's paper:

(53) dat Jan_i Piet_j zichzelf_j/*i ziet wassen
     `that Jan_i sees Piet_j wash himself_j/*i'

The binding possibilities are exactly the same in the Dutch example as in its English translation, but in the Dutch example this is surprising given the dependency structures argued for above. Universal raising has raised zichzelf into the top clause, as raisee of ziet, so it is a co-dependent of Jan. In general a reflexive pronoun can be bound by the subject of its clause; in DS terms, it can be bound by a co-dependent subject (Hudson 1984: 173ff). Why then can't Jan bind zichzelf in this example? Because the raisee dependency is invisible to binding. The next example confirms this conclusion.

(54) dat Jan_i Piet_j hem_i/*j ziet wassen
     `that Jan_i sees Piet_j wash him_i/*j'

Once again the raisee link between hem and ziet must be invisible to binding, because otherwise hem would count as a co-dependent of Jan and should be non-coreferential, as it is for Piet. The same is true for the extractee relationship in English, as can be seen from the following (rather stilted) examples.

(55) Himself/me/*him/*myself I think John admires.

The possibilities are exactly the same as in the un-extracted version:

(56) I think John admires himself/me/*him/*myself.

And yet there is no reason to doubt the existence of the extractee dependency between the pronoun and think in (55), which makes the pronoun into a co-dependent of I. The only possible conclusion is that this dependency, like the raisee one, is invisible to binding. According to van Noord and Bouma, a recent development in HPSG theory (Iida et al 1994) would have two separate lists of dependents: SUBCAT, for all complements, and ARG-S (`argument-structure') just for those that are visible to binding. To make the contrast in this way is to shift the explanatory focus away from tree geometry (and PS) to the classification of the relationships themselves (and DS).
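The invisibility of s-links to binding can be stated very compactly once dependencies carry explicit type labels. The sketch below is illustrative only (the triple encoding and the function name are mine): it computes the co-dependents that binding may see, filtering out all s-links.

# s-links exist only for structure sharing and are invisible to binding.
S_LINKS = {"extractee", "extraposee", "raisee"}

def visible_codependents(word, deps):
    """Co-dependents of `word` reached through b-links only.
    `deps` is a set of (anchor, dependent, dependency-type) triples."""
    anchors = {a for a, d, rel in deps if d == word and rel not in S_LINKS}
    return {d for a, d, rel in deps
            if a in anchors and d != word and rel not in S_LINKS}

# (53) dat Jan Piet zichzelf ziet wassen, with zichzelf raised to ziet:
deps = {("ziet", "Jan", "subject"), ("ziet", "Piet", "object"),
        ("ziet", "wassen", "complement"), ("ziet", "zichzelf", "raisee"),
        ("wassen", "Piet", "subject"), ("wassen", "zichzelf", "object")}
print(visible_codependents("zichzelf", deps))
# {'Piet'} - the raisee link to ziet is ignored, so Jan is not a
# visible co-dependent and cannot bind zichzelf.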
Similarly, WG dependencies, as mentioned earlier, are organised in an inheritance hierarchy from `dependent' at the top to specific grammatical relations like `subject' and `object' at the bottom. This allows us to make the same distinction as in HPSG, but to incorporate it into the overall classification of dependencies. Fig. 26 presents a revised version of the dependency system in Hudson (1990: 208). Dependents are of two kinds: s-links (explained above) and b-links, with b- to remind us that they are visible to binding, that they are basic, and that they are the bottom relationship in a structure-sharing pattern. This distinction is presumably universal, though there may be some languages which have no s-links at all. (After all, `raisee' is part of German and Dutch but not of English, and Japanese seems not to have any extraction.) A very different contrast applies to languages with `mixed' word order (where head-final constructions exist alongside head-initial ones). These seem to need a general distinction between `pre-dependent' and `post-dependent' (respectively, dependents that precede and follow their anchor). In Fig. 26, unlike Hudson (1990), this distinction is separate from the main system.

This diagram needs a little more explanation. The convention for showing inheritance (or `isa') relations is also new (Hudson 1995c): a triangle has its (large) base against the larger, more general, category, and its (small) apex linked to the subcategories or members. The diagram thus allows multiple inheritance for dependencies; e.g. `subject' inherits both from `valent' and from `pre-dependent'. The dotted lines apply to English but probably not to German and Dutch (where the relative order of verb complements is generally controlled by the German/Dutch word order rules given earlier); and `raisee' is bracketed because it is part of German and Dutch but not of English.

26 Fig. 26

The b-link/s-link distinction is similar to that between A- (`argument') and A'-positions in GB, but does not involve tree geometry and PS at all. Like the HPSG analysis of binding in terms of obliqueness, it shows not only that non-PS analyses can `work', but also that they can be expressed in terms of natural and possibly universal categories.
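Since Fig. 26 itself cannot be reproduced here, the following sketch shows the kind of multiple-inheritance hierarchy the text describes. The exact set of labels and links is my reconstruction from the surrounding discussion, not a transcription of the figure.

```python
# A guessed reconstruction of the dependency hierarchy around Fig. 26.
# Each entry maps a category to its immediate supercategories, so
# multiple inheritance is just a multi-valued mapping.
ISA = {
    'b-link':          ['dependent'],
    's-link':          ['dependent'],
    'pre-dependent':   ['dependent'],   # separate word-order dimension
    'post-dependent':  ['dependent'],
    'valent':          ['b-link'],
    'adjunct':         ['b-link'],
    'subject':         ['valent', 'pre-dependent'],  # multiple inheritance
    'complement':      ['valent'],
    'object':          ['complement'],
    'indirect object': ['complement'],
    'extractee':       ['s-link'],
    'extraposee':      ['s-link'],
    'raisee':          ['s-link'],      # German/Dutch only (bracketed)
}

def isa(sub, sup):
    """True if `sub` inherits, directly or transitively, from `sup`."""
    return sub == sup or any(isa(p, sup) for p in ISA.get(sub, []))

assert isa('subject', 'pre-dependent') and isa('subject', 'b-link')
assert isa('raisee', 's-link') and not isa('raisee', 'b-link')
```

On this encoding, a generalisation about s-links (e.g. their invisibility to binding) is stated once, at the `s-link' node, and inherited by extractee, extraposee and raisee alike.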
7. Conclusions

A reasonable conclusion is that PS does not help to solve some of the most interesting problems, and indeed probably makes them even harder to solve; as I put it earlier, the PS is part of the problem, not of the solution. This case is particularly easy to argue when a lot of dependency apparatus is already available and transformational solutions are unavailable, as in HPSG. The main evidence for this conclusion involved our two empirical problems:

- The interaction between adverb-preposing, wh-movement and clause subordination in English is quite easy to explain in terms of DS. The explanation hinged crucially on the assumption that the wh-pronoun and the clause's root verb are mutually dependent, which would be difficult if not impossible to express in a PS analysis.
- `Partial verb phrases' in German and Dutch are problematic for a PS analysis because they involve phrases for which there is otherwise no evidence, and which in fact make normal structures much harder to analyse. For a DS analysis all we need is a new relationship, `raisee', which can be added freely to any dependent just in those environments where partial VPs occur.

However, we also considered some other supporting evidence:

- Many lexical or grammatical facts involve restrictions imposed by one word on the head-word of an accompanying phrase; e.g. a word may select a PP with a specific preposition, a VP whose verb has a particular inflection, and so on. In these cases the particularities of the head-word itself are directly visible to the controlling word in a DS analysis, but not in a PS analysis, where they have to be projected up onto the phrase node. This can be done, but the PS is the cause of the problem.
- The peculiarities of s-links are hard to express in terms of PS, especially when they include the German/Dutch `raisee' links but not subjects; at best PS is a harmless irrelevance. If dependencies are available, they can be classified with `s-link' as one important subclass which can be mentioned, as appropriate, in generalisations about raising, surface structure and binding.

What are the implications of this conclusion for syntactic theory in general, and for HPSG in particular? It is hard to draw general conclusions beyond a plea for the issue to be given the attention it deserves. For most syntacticians the need for PS is one of the few things that they are really sure of, and which they might well describe as one of the pillars of modern syntactic theory. As I have already observed, most modern syntacticians also accept a lot of dependency-based concepts which were not recognised in the early days of PS theory, so it is high time for a general review of the evidence for PS.

As far as HPSG is concerned, however, the conclusion is simple: the theory no longer needs PS. Apart from the theory's name (why not HG?), the main thing which requires attention is the `official' notation, where boxes-within-boxes can give way to boxes-beside-boxes. For example, to return to the elementary example Kim gives Sandy Fido, instead of the HPSG analysis with VP and S boxes to hold the word and phrase boxes together we could have the following, which in every other respect preserves the insights of the HPSG analysis.

27 Fig. 27

The cross-referencing between boxes could be made more visible by adding dependency arcs, but that really is just a matter of notation. Apart from these notational changes, the main change needed in the theory is a revised `subcategorization principle' (Pollard & Sag 1994: 34). As it stands this principle reads as follows:

(57) SUBCATEGORIZATION PRINCIPLE (ORIGINAL VERSION)
In a headed phrase (i.e. a phrasal sign whose DTRS value is of sort head-struc), the SUBCAT value of the head daughter is the concatenation of the phrase's SUBCAT list with the list (in order of increasing obliqueness) of SYNSEM values of the complement daughters.

Without PS, sisters (i.e. accompanying words) take over the role of the daughters:

(58) SUBCATEGORIZATION PRINCIPLE (REVISED VERSION)
The SUBCAT value of any word is a list (in order of increasing obliqueness) containing the SYNSEM values of some of its sister words.

Various other revisions could also be considered, but the main point that emerges from this paper is that PS could be removed from current HPSG with considerable gains and little, if any, loss.
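To see that the revised principle is no less checkable than the original, here is a small sketch of (58) as a constraint on words rather than phrases. The attribute encoding (plain strings standing in for SYNSEM values) and the function name are mine; a real implementation would unify typed feature structures rather than compare strings.

```python
# Hypothetical checker for the revised Subcategorization Principle (58):
# every item on a word's SUBCAT list must be matched by the SYNSEM value
# of a distinct sister word. No phrase nodes are involved.
def satisfies_subcat(word, sentence):
    sisters = [w['SYNSEM'] for w in sentence if w is not word]
    for requirement in word['SUBCAT']:   # least to most oblique
        if requirement in sisters:
            sisters.remove(requirement)  # each sister saturates one slot
        else:
            return False
    return True

# Toy encoding of `Kim gives Sandy Fido' (cf. Fig. 27).
kim   = {'SYNSEM': 'NP[Kim]',   'SUBCAT': []}
sandy = {'SYNSEM': 'NP[Sandy]', 'SUBCAT': []}
fido  = {'SYNSEM': 'NP[Fido]',  'SUBCAT': []}
gives = {'SYNSEM': 'V[gives]',
         'SUBCAT': ['NP[Kim]', 'NP[Sandy]', 'NP[Fido]']}

print(satisfies_subcat(gives, [kim, gives, sandy, fido]))  # True
```

Nothing in the check mentions daughters or dominance: the verb's requirements are discharged directly by its sister words, which is all that Fig. 27's boxes-beside-boxes notation needs.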
References

Baker, Kathryn. 1994. An extended account of `modal flip' and partial verb phrase fronting in German = CMU-LCL-94-4. Pittsburgh: Carnegie Mellon University, Laboratory for Computational Linguistics.
Bresnan, Joan; Kaplan, Ronald; Peters, Stan; & Zaenen, Annie. 1983. Cross-serial dependencies in Dutch. Linguistic Inquiry 13: 613-35.
Chomsky, Noam. 1986. Barriers. Cambridge, MA: MIT Press.
Fraser, Norman & Hudson, Richard. 1992. Inheritance in Word Grammar. Computational Linguistics 18: 133-59.
Haider, Hubert. 1990. Topicalization and other puzzles of German syntax. In Günther Grewendorf & Wolfgang Sternefeld (eds.) Scrambling and Barriers. Amsterdam: Benjamins, 43-112.
Hudson, Richard. 1984. Word Grammar. Oxford: Blackwell.
Hudson, Richard. 1990. English Word Grammar. Oxford: Blackwell.
Hudson, Richard. 1992. Raising in syntax, semantics and cognition. In Iggy Roca (ed.) Thematic Structure: Its Role in Grammar. Berlin: Mouton de Gruyter, 175-98.
Hudson, Richard. 1993. Do we have heads in our minds? In Greville Corbett, Scott McGlashan & Norman Fraser (eds.) Heads in Grammatical Theory. Cambridge: Cambridge University Press, 266-91.
Hudson, Richard. 1994. Word Grammar. In Ronald Asher (ed.) Encyclopedia of Language and Linguistics. Oxford: Pergamon Press, 4990-3.
Hudson, Richard. 1995a. Does English really have case? Journal of Linguistics 31.
Hudson, Richard. 1995b. Competence without Comp? In Bas Aarts & Charles Meyer (eds.) The Verb in Contemporary English. Cambridge: Cambridge University Press, 40-53.
Hudson, Richard. 1995c. Word Meaning. London: Routledge.
Hudson, Richard. forthcoming. Really bare phrase-structure = Dependency structure. Studies in English Language Usage and English Language Teaching 17 (Kyoto: Yamaguchi).
Iida, Masayo; Manning, Christopher; O'Neill, Patrick; & Sag, Ivan. 1994. The lexical integrity of Japanese causatives. Paper presented at the LSA 1994 Annual Meeting.
Johnson, Mark. 1986. A GPSG account of VP structure in German. Linguistics 24: 871-82.
McCawley, James. 1982. Parentheticals and discontinuous constituent structure. Linguistic Inquiry 13: 91-106.
Müller, Stefan. 1995. Message to the HPSG list = hpsg@ling.ohio-state.edu on 21st July.
Nerbonne, John. 1994. Partial verb phrases and spurious ambiguities. In John Nerbonne, Klaus Netter & Carl Pollard (eds.) German Grammar in HPSG. Stanford: Center for the Study of Language and Information.
Pollard, Carl & Sag, Ivan. 1994. Head-Driven Phrase Structure Grammar. Stanford: CSLI and Chicago: University of Chicago Press.
Rosta, Andrew. 1994. Dependency and grammatical relations. UCL Working Papers in Linguistics 6, 219-58.
Uszkoreit, Hans. 1987a. Linear precedence and discontinuous constituents: Complex fronting in German. In Geoffrey Huck & Almerindo Ojeda (eds.) Discontinuous Constituency = Syntax and Semantics 20, 406-27.
Uszkoreit, Hans. 1987b. Word Order and Constituent Structure in German = CSLI Lecture Notes 8. Stanford: Center for the Study of Language and Information.
van Noord, Gertjan & Bouma, Gosse. 1995. Dutch verb clustering without verb clusters. MS.

POSTSCRIPT

Since writing this draft I've discovered more facts about German which undermine my proposed analysis of fronted partial VPs which include the subject. According to three native speakers (replying independently to a query on the HPSG network), the auxiliary verb agrees with the fronted subject:

(59) Wirkliche Fehler unterlaufen waren ihm noch nie.
Real mistakes happened were to-him yet never.
`Real mistakes had never yet happened to him.'

This seems to show beyond doubt that Fehler really is the subject of the auxiliary as well as of the infinitive. We therefore seem to have a clear example of subject lowering, unless some clever alternative can be found ... Any suggestions gratefully received! Notice that this has nothing to do with PS, VP and so on - it's to do with the basic structure-sharing mechanism.
Does anyone know of any other examples of lowering?