ON PARSING PREFERENCES

Lenhart K. Schubert
Department of Computing Science
University of Alberta, Edmonton

Abstract. It is argued that syntactic preference principles such as Right Association and Minimal Attachment are unsatisfactory as usually formulated. Among the difficulties are: (1) dependence on ill-specified or implausible principles of parser operation; (2) dependence on questionable assumptions about syntax; (3) lack of provision, even in principle, for integration with semantic and pragmatic preference principles; and (4) apparent counterexamples, even when discounting (1)-(3). A possible approach to a solution is sketched.

while the latter chooses the longest reduction among those possible reductions whose initial constituent is "strongest" (e.g., reducing V NP PP to VP is preferred to reducing NP PP to NP). In (5), Minimal Attachment would predict association of the PP on that rack with wanted, while the actual preference is for association with dress. Both Ford et al. and Shieber account for this fact by appeal to lexical preferences: for Ford et al., the strongest form of want takes an NP complement only, so that Final Arguments prevails; for Shieber, the NP the dress is stronger than wanted, viewed as a V requiring NP and PP complements, so that the shorter reduction prevails.

1. Some preference principles

The following are some standard kinds of sentences illustrating the role of syntactic preferences.
(1) John bought the book which I had selected for Mary
(2) John promised to visit frequently
(3) The girl in the chair with the spindly legs looks bored
(4) John carried the groceries for Mary
(5) She wanted the dress on that rack
(6) The horse raced past the barn fell
(7) The boy got fat melted

Sentence (6) leads most people "down the garden path", a fact explainable in terms of Minimal Attachment or its variants. The explanation also works for (7) (in the case of Ford et al. with appeal to the additional principle that re-analysis of complete phrases requiring re-categorization of lexical constituents is not possible). Purportedly, this is an advantage over Marcus' (1980) parsing model, whose three-phrase buffer should allow trouble-free parsing of (7).

(1)-(3) illustrate Right Association of PP's and adverbs, i.e., the preferred association of these modifiers with the rightmost verb (phrase) or noun (phrase) they can modify (Kimball 1973). Some variants of Right Association (also characterized as Late Closure or Low Attachment) which have been proposed are Final Arguments (Ford et al. 1982) and Shifting Preference (Shieber 1983); the former is roughly Late Closure restricted to the last obligatory constituent and any following optional constituents of verb phrases, while the latter is Late Closure within the context of an LR(1) shift-reduce parser.

2. Problems with the preference principles

2.1 Dependence on ill-specified or implausible principles of parser operation.

Frazier & Fodor's (1979) model does not completely specify what structures are built as each new word is accommodated.
Consequently it is hard to tell exactly what the effects of their preference principles are.

Shieber's (1983) shift-reduce parser is well-defined. However, it postulates complete phrases only, whereas human parsing appears to involve integration of completely analyzed phrases into larger, incomplete phrases. Consider for example the following sentence beginnings:

(8) So I says to the ...
(9) The man reconciled herself to the ...
(10) The news announced on the ...
(11) The reporter announced on the ...
(12) John beat a rather hasty and undignified ...

People presented with complete, spoken sentences beginning like (8) and (9) are able to signal detection of the errors about two or three syllables after their occurrence.

Regarding (4), it would seem that according to Right Association the PP for Mary should be preferred as postmodifier of groceries rather than carried; yet the opposite is the case. Frazier & Fodor's (1979) explanation is based on the assumed phrase structure rules VP -> V NP PP, and NP -> NP PP: attachment of the PP into the VP minimizes the resultant number of nodes. This principle of Minimal Attachment is assumed to take precedence over Right Association. Ford et al.'s (1982) variant is Invoked Attachment, and Shieber's (1983) variant is Maximal Reduction; roughly speaking, the former amounts to early closure of non-final constituents,

Thus agreement features appear to propagate upward from incomplete constituents. (10) and (11) suggest that even semantic features (logical translations?) are propagated before phrase completion. The "premature" recognition of the idiom in (12) provides further evidence for early integration of partial structures.
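The node-counting argument behind Frazier & Fodor's account of (4) can be made concrete with a small sketch. The nested-tuple tree encoding and the helper `count_nodes` are illustrative assumptions, not part of any of the cited parsers; the two trees simply instantiate the rules VP -> V NP PP and NP -> NP PP for the verb phrase of (4).

```python
def count_nodes(tree):
    """Count the phrase-structure nodes in a nested-tuple tree.

    A tree is (label, child, child, ...); a child that is a plain string
    stands for the words a node dominates and is not counted as a node.
    """
    label, *children = tree
    return 1 + sum(count_nodes(c) for c in children if isinstance(c, tuple))

# PP attached into the VP, via VP -> V NP PP
vp_attachment = ("VP", ("V", "carried"),
                       ("NP", "the groceries"),
                       ("PP", "for Mary"))

# PP attached to the object NP, via VP -> V NP and NP -> NP PP
np_attachment = ("VP", ("V", "carried"),
                       ("NP", ("NP", "the groceries"),
                              ("PP", "for Mary")))

print(count_nodes(vp_attachment), count_nodes(np_attachment))
```

Under these assumed rules the VP attachment needs one node fewer, which is the difference Minimal Attachment appeals to. The comparison is only as good as the rules it presupposes.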
the theory; (however, see Crain & Steedman 1981). Two views that seem to underlie some discussions of this issue are (a) that syntactic preferences are "defaults" that come into effect only in the absence of semantic/pragmatic preferences; or (b) that alternatives are tried in order of syntactic preference, with semantic tests serving to reject incoherent combinations. Evidence against both positions is found in sentences in which syntactic preferences prevail over much more coherent alternatives:

(21) Mary saw the man who had lived with her while on maternity leave.
(22) John met the tall, slim, auburn-haired girl from Montreal that he married at a dance
(23) John was named after his twin sister

What we apparently need is not hard and fast decision rules, but some way of trading off syntactic and non-syntactic preferences of various strengths against each other.

These considerations appear to favour a "full-paths" parser which integrates each successive word (in possibly more ways than one) into a comprehensive parse tree (with overlaid alternatives) spanning all of the text processed. Ford et al.'s (1982) parser does develop complete top-down paths, but the nodes on these paths dominate no text. Nodes postulated bottom-up extend only one level above complete nodes.

2.2 Dependence on questionable assumptions about syntax

2.4 Apparent

The successful prediction of observed preferences in (4) depended on an assumption that PP postmodifiers are added to carried via the rule VP -> V NP PP and to groceries via the rule NP -> NP PP.
However, these rules fail to do justice to certain systematic similarities between verb phrases and noun phrases, evident in such pairs as

(13) John loudly quarreled with Mary in the kitchen
(14) John's loud quarrel with Mary in the kitchen

When the analyses are aligned by postulating two levels of postmodification for both verbs and nouns, the accounts of many examples that supposedly involve Minimal Attachment (or Maximal Reduction) are spoiled. These include (4) as well as standard examples involving non-preferred relative clauses, such as

(15) John told the girl that he loved the story
(16) Is the block sitting in the box?

counterexamples.

There appear to be straightforward counterexamples to the syntactic preference principles which have been proposed, even if we discount evidence for integration of incomplete structures, accept the syntactic assumptions made, and restrict ourselves to cases where none of the alternatives show any semantic anomaly.

The following are apparent counterexamples to Right Association (and Shifting Preference, etc.):

(24) John stopped speaking frequently
(25) John discussed the girl that he met with his mother
(26) John was alarmed by the disappearance of the administrator from head office
(27) The deranged inventor announced that he had perfected his design of a clip car shoe (shoe car clip, clip shoe car, shoe clip car, etc.)
(28) Lee and Kim or Sandy departed
(29) a. John removed all of the fat and some of the bones from the roast
     b. John removed all of the fat and sinewy pieces of meat

The point of (24)-(26) should be clear. (27) and (28) show the lack of right-associative tendencies in compound nouns and coordinated phrases.
(29a) illustrates the non-occurrence of a garden path predicted by Right Association (at least by Shieber's version); note the possible adjectival reading of fat and ..., as illustrated in (29b).

2.3 Lack of provision for integration with semantic/pragmatic preference principles

Right Association and Minimal Attachment (and their variants) are typically presented as principles which prescribe particular parser choices. As such, they are simply wrong, since the choices often do not coincide with human choices for text which is semantically or pragmatically biased. For example, there are conceivable contexts in which the PP in (4) associates with the verb, or in which (7) is trouble-free. (For the latter, imagine a story in which a young worker in a shortening factory toils long hours melting down hog fat in clarifying vats.) Indeed, even isolated sentences demonstrate the effect of semantics:

(17) John met the girl that he married at a dance
(18) John saw the bird with the yellow wings
(19) She wanted the gun on her night table
(20) This lens gets light focused

These sentences should be contrasted with (1), (4), (5), and (7) respectively.

The following are apparent counterexamples to Minimal Attachment (or Maximal Reduction):

(30) John abandoned the attempt to please Mary
(31) Kim overheard John and Mary's quarrel with Sue
(32) John carried the umbrella, the transistor radio, the bundle of old magazines, and the groceries for Mary
(33) The boy got fat spattered on his arm

While the account of (30) and (31) can be rescued by distinguishing subcategorized and nonsubcategorized noun postmodifiers, such a move would lead to the failures already mentioned in section 2.2. Ford et al.
(1982) would have no

While the reversal of choices by semantic and pragmatic factors is regularly acknowledged, these factors are rarely assigned any explicit role in

expectation potential to the total potential of the node. The expectation potential contributed by a daughter is maximal if the daughter immediately follows the mother's head lexeme, and decreases as the distance (in words) of the daughter from the head lexeme increases. The decay of expectation potentials with distance evidently results in a right-associative tendency.

The maximal expectation potentials of the daughters of a node are fixed parameters of the rule instantiated by the node. They can be thought of as encoding the "affinity" of the head daughter for the remaining constituents, with "strongly expected" constituents having relatively large expectation potentials. For example, I would assume that verbs have a generally stronger affinity for (certain kinds of) PP adjuncts than do nouns. This assumption can explain PP-association with the verb in examples like (4), even if the rules governing verb and noun postmodification are taken to be structurally analogous. Similarly the scheme allows for counterexamples to Right Association like (24), where the affinity of the first verb (stop) for the frequency adverbial may be assumed to be sufficiently great compared to that of the second (speak) to overpower a weak right-associative effect resulting from the decay of expectation potentials with distance.
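The interplay of affinity and distance described above can be illustrated numerically. The sketch below is only one possible reading: the exponential form of the decay, the decay rate, and every affinity value are assumptions chosen for illustration, since the text requires only that the potential be maximal immediately after the head lexeme and decrease with distance.

```python
def expectation_potential(max_potential, distance, decay=0.8):
    """Expectation potential of a daughter `distance` words after the head lexeme.

    distance == 0 means the daughter immediately follows the head, so the
    full (maximal) potential is contributed. The exponential decay is an
    assumed form, not one given in the text.
    """
    if distance < 0:
        raise ValueError("the daughter must follow the head lexeme")
    return max_potential * decay ** distance

# Hypothetical affinities (maximal expectation potentials) for (4):
# verbs are assumed to expect PP adjuncts more strongly than nouns do.
VERB_PP_AFFINITY = 1.0
NOUN_PP_AFFINITY = 0.3

# (4) John carried the groceries for Mary
# Two words ("the groceries") separate "carried" from the PP;
# the PP immediately follows the noun "groceries".
verb_reading_4 = expectation_potential(VERB_PP_AFFINITY, distance=2)
noun_reading_4 = expectation_potential(NOUN_PP_AFFINITY, distance=0)

# Hypothetical affinities for the frequency adverbial in (24).
STOP_ADV_AFFINITY = 1.0
SPEAK_ADV_AFFINITY = 0.6

# (24) John stopped speaking frequently
# One word ("speaking") separates "stopped" from the adverb.
stop_reading_24 = expectation_potential(STOP_ADV_AFFINITY, distance=1)
speak_reading_24 = expectation_potential(SPEAK_ADV_AFFINITY, distance=0)

print(verb_reading_4 > noun_reading_4)    # strong verb affinity outweighs the decay
print(stop_reading_24 > speak_reading_24)
```

With equal affinities the nearer head always wins, which is the right-associative tendency; the further head prevails only when its affinity is large enough to survive the decay.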
trouble with (30) or (31), but they, too, pay a price: they would erroneously predict association of the PP with the object NP in

(34) Sue had difficulties with the teachers
(35) Sue wanted the dress for Mary
(36) Sue returned the dress for Mary

(32) is the sort of example which motivated Frazier & Fodor's (1979) Local Attachment principle, but their parsing model remains too sketchy for the implications of the principle to be clear.

Concerning (33), a small-scale experiment indicates that this is not a garden path. This result appears to invalidate the accounts of (7) based on irreversible closure at fat. Moreover, the difference between (7) and (33) cannot be explained in terms of one-word lookahead, since a further experiment has indicated that

(37) The boy got fat spattered.

is quite as difficult to understand as (7).

3. Towards an account of preference trade-offs

My main objective has been to point out deficiencies in current theories of parsing preferences, and hence to spur their revision. I conclude with my own rather speculative proposals, which represent work in progress.

I suggest that the effect of semantics and pragmatics can in principle be captured through a semantic potential contributed to each node potential by semantic/pragmatic processing of the node. The semantic potential of a terminal node (i.e., a lexical node with a particular choice of word sense for the word it dominates) is high to the extent that the associated word sense refers to a familiar (highly consolidated) and contextually salient concept (entity, predicate, or function).
For example, a noun node dominating star, with a translation expressing the astronomical sense of the word, presumably has a higher semantic potential than a similar node for the show-business sense of the word, when an astronomical context (but no show-business context) has been established; and vice versa. Possibly a spreading activation mechanism could account for the context-dependent part of the semantic potential (cf. Quillian 1968, Collins & Loftus 1975, Charniak 1983).

In summary, the proposed model involves (1) a full-paths parser that schedules tree pruning decisions so as to limit the number of ambiguous constituents to three; and (2) a system of numerical "potentials" as a way of implementing preference trade-offs. These potentials (or "levels of activation") are assigned to nodes as a function of their syntactic/semantic/pragmatic structure, and the preferred structures are those which lead to a globally high potential. The total potential of a node consists of (a) a negative rule potential, (b) a positive semantic potential, (c) positive expectation potentials contributed by all daughters following the head (where these decay with distance from the head lexeme), and (d) transmitted potentials passed on from the daughters to the mother.

I have already argued for a full-paths approach in which not only complete phrases but also all incomplete phrases are fully integrated into (overlaid) parse trees dominating all of the text seen so far. Thus features and partial logical translations can be propagated and checked for consistency as early as possible, and alternatives chosen or discarded on the basis of all of the available information.
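The four components (a)-(d) of a node's total potential listed in the summary can be written out as a small recursive computation. This is a minimal sketch under stated assumptions: the `Node` class, its field names, the helper `preferred`, and all numerical values are hypothetical, and the components are simply added, with the transmitted potential taken to be the sum of the daughters' total potentials.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A parse-tree node carrying the potentials of the proposed scheme."""
    label: str
    rule_potential: float = 0.0        # (a) negative increment from the rule used
    semantic_potential: float = 0.0    # (b) positive, from semantic/pragmatic processing
    expectation_potentials: List[float] = field(default_factory=list)  # (c) one per expected daughter
    daughters: List["Node"] = field(default_factory=list)

    def total_potential(self) -> float:
        # (d) transmitted potential: assumed to be the sum over the daughters
        transmitted = sum(d.total_potential() for d in self.daughters)
        return (self.rule_potential
                + self.semantic_potential
                + sum(self.expectation_potentials)
                + transmitted)


def preferred(alternatives: List[Node]) -> Node:
    """Pick the alternative analysis with the highest total potential."""
    return max(alternatives, key=lambda node: node.total_potential())


# Two hypothetical analyses that differ only in one expectation potential.
noun = Node("N", rule_potential=-1.0, semantic_potential=2.0)
close_attachment = Node("NP", rule_potential=-0.5,
                        expectation_potentials=[0.75], daughters=[noun])
distant_attachment = Node("NP", rule_potential=-0.5,
                          expectation_potentials=[0.25], daughters=[noun])

print(close_attachment.total_potential())    # 1.25
print(distant_attachment.total_potential())  # 0.75
```

Because each node's total includes its daughters' totals, the value at the root reflects the whole tree beneath it, so comparing roots compares complete analyses.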
The rule potential is a negative increment contributed by a phrase structure rule to any node which instantiates that rule. Rule potentials lead to a minimal-attachment tendency: they "inhibit" the use of rules, so that a parse tree using few rules will generally be preferred to one using many. Lexical preferences can be captured by making the rule potential more negative for the more unusual rules (e.g., for N --> fat, and for V --> time).

Each "expected" daughter of a node which follows the node's head lexeme contributes a non-negative

The semantic potential of a nonterminal node is high to the extent that its logical translation (obtained by suitably combining the logical translations of the daughters) is easily transformed and elaborated into a description of a familiar and contextually relevant kind of object or situation. (My assumption is that an unambiguous meaning representation of a phrase is computed on the basis of its initial logical form by context-dependent pragmatic processes; see Schubert & Pelletier 1982.) For example, the sentences Time flies, The years pass swiftly, The minutes creep by, etc., are instances of the familiar pattern of predication <predicate of locomotion>(<time term>), and as such are easily transformable into certain commonplace (and unambiguous) assertions about one's personal sense of progression through time. Thus they are likely to be assigned high semantic potentials, and so will not easily admit any alternative analysis.
Similarly the phrases met [someone] at a dance (versus married [someone] at a dance) in sentence (17), and bird with the yellow wings (versus saw [something] with the yellow wings) in (18) are easily interpreted as descriptions of familiar kinds of objects and situations, and as such contribute semantic potentials that help to edge out competing analyses.

semantic potential), preliminary investigation suggests that they can do justice to examples like (1)-(37). Schubert & Pelletier 1982 briefly described a full-paths parser which chains upward from the current word to current "expectations" by "left-corner stack-ups" of rules. However, this parser searched alternatives by backtracking only and did not handle gaps or coordination. A new version designed to handle most aspects of Generalized Phrase Structure Grammar (see Gazdar et al., to appear) is currently being implemented.

Crain & Steedman's (1981) very interesting suggestion that readings with few new presuppositions are preferred has a possible place in the proposed scheme: the mapping from logical form to unambiguous meaning representation may often be relatively simple when few presuppositions need to be added to the context. However, their more general plausibility principle appears to fail for examples like (21)-(23).

Acknowledgements

I thank my unpaid informants who patiently answered strange questions about strange sentences. I have also benefited from discussions with members of the Logical Grammar Study Group at the University of Alberta, especially Matthew Dryer, who suggested some relevant references.
The research was supported by the Natural Sciences and Engineering Research Council of Canada under Operating Grant A8818.

Note that the above pattern of temporal predication may well be considered to violate a selectional restriction, in that predicates of locomotion cannot literally apply to times. Thus the nodes with the highest semantic potential are not necessarily those conforming most fully with selectional restrictions. This leads to some departures from Wilks' theory of semantic preferences (e.g., 1976), although I suppose that normally the most easily interpretable nodes, and hence those with the highest semantic potential, are indeed the ones that conform with selectional restrictions.

References

Charniak, E. (1983). Passing markers: a theory of contextual influence in language comprehension. Cognitive Science 7, pp. 171-190.
Collins, A. M. & Loftus, E. F. (1975). A spreading activation theory of semantic processing. Psychological Review 82, pp. 407-428.
Crain, S. & Steedman, M. (1981). The use of context by the Psychological Parser. Paper presented at the Symposium on Modelling Human Parsing Strategies, Center for Cognitive Science, Univ. of Texas, Austin.
Ford, M., Bresnan, J. & Kaplan, R. (1981). A competence-based theory of syntactic closure. In Bresnan, J. (ed.), The Mental Representation of Grammatical Relations, MIT Press, Cambridge, MA.
Frazier, L. & Fodor, J. (1979). The Sausage Machine: a new two-stage parsing model. Cognition 6, pp. 191-325.
Gazdar, G., Klein, E., Pullum, G. K. & Sag, I. A. (to appear). Generalized Phrase Structure Grammar: A Study in English Syntax.
Kimball, J. (1973). Seven principles of surface structure parsing in natural language.
Cognition 2, pp. 15-47.
Marcus, M. (1980). A Theory of Syntactic Recognition for Natural Language, MIT Press, Cambridge, MA.
Quillian, M. R. (1968). Semantic memory. In Minsky, M. (ed.), Semantic Information Processing, MIT Press, Cambridge, MA, pp. 227-270.
Schubert, L. K. & Pelletier, F. J. (1982). From English to logic: context-free computation of 'conventional' logical translations. Am. J. of Computational Linguistics 8, pp. 26-44.
Shieber, S. M. (1983). Sentence disambiguation by a shift-reduce parsing technique. Proc. 8th Int. Conf. on Artificial Intelligence, Aug. 8-12, Karlsruhe, W. Germany, pp. 699-703. Also in Proc. of the 21st Ann. Meet. of the Assoc. for Computational Linguistics, June 15-17, MIT, Cambridge, MA, pp. 113-118.
Wilks, Y. (1976). Parsing English II. In Charniak, E. & Wilks, Y. (eds.), Computational Semantics, North-Holland, Amsterdam, pp. 155-184.

The difference between such pairs of sentences as (17) and (22) can now be explained in terms of semantic/syntactic potential trade-offs. In both sentences the semantic potential of the reading which associates the PP with the first verb is relatively high. However, only in (17) is the PP close enough to the first verb for this effect to overpower the right-associative tendency inherent in the decay of expectation potentials.

The final contribution to the potential of a node is the transmitted potential, i.e., the sum of potentials of the daughters. Thus the total potential at a node reflects the syntactic/semantic/pragmatic properties of the entire tree it dominates.

A crucial question that remains concerns the scheduling of decisions to discard globally weak hypotheses. Examples like (33) have convinced me that Marcus (1980) was essentially correct in positing a three-phrase limit on successive ambiguous constituents.
(In the context of a full-paths parser, ambiguous constituents can be defined in terms of "upward or-forks" in phrase structure trees.) Thus I propose to discard the globally weakest alternative at the latest when it is not possible to proceed rightward without creating a fourth ambiguous constituent. Very weak alternatives (relative to the others) may be discarded earlier, and this assumption can account for early disambiguation in cases like (10) and (11).

Although these proposals are not fully worked out (especially with regard to the definition of