From: AAAI-86 Proceedings. Copyright ©1986, AAAI (www.aaai.org). All rights reserved.

A CHINESE NATURAL LANGUAGE PROCESSING SYSTEM BASED UPON THE THEORY OF EMPTY CATEGORIES

Long-Ji Lin*, James Huang**, K.J. Chen*** and Lin-Shan Lee*

*Dept. of Electrical Engineering, National Taiwan University, Taiwan, R.O.C.
**Dept. of Modern Languages and Linguistics, Cornell University, U.S.A.
***Institute of Information Science, Academia Sinica, Taiwan, R.O.C.

ABSTRACT

In this paper, we present a device specially designed on the basis of the theory of empty categories. This device cooperates with a bottom-up parser and offers an elegant and efficient approach to the troublesome problems posed in Chinese natural language by the transformations of passivization, relativization, topicalization and ba-transformation, and by the use of zero pronouns. With the aid of the device, the grammar rules for Chinese become much simpler and easier to design, and the processing capability can be significantly improved.

I INTRODUCTION

Passivization, ba-transformation, relativization, topicalization, and the use of zero pronouns play major roles in Chinese. To deal with these syntactic phenomena, the conventional approach is to collect a set of grammar rules that covers all the possible sentence patterns derived from those transformations. But such an approach needs a very large set of grammar rules to cover the great number of possibilities. In particular, the complexity resulting from the interactions of several transformations makes such an approach infeasible.

The approach adopted in this paper is instead the use of the raise-bind mechanism, based upon the theory of empty categories. The above syntactic phenomena do not seem to be related to each other. However, the sentences derived from them all involve the common use of empty categories. With the raise-bind mechanism, the parser treats all of these transformations in the same way.

The following sections first briefly describe our parsing algorithm, and then discuss empty categories in Chinese and how the raise-bind mechanism operates.

II THE PARSING ALGORITHM

The SASC system uses a bottom-up parser instead of a top-down parser, because the former tends to be more efficient for Chinese sentence analysis. The parser uses charts (Kay, 1973; Kaplan, 1973) as global working structures, because many natural language processing systems, such as MIND (Kay, 1973) and GSP (Kaplan, 1973), have proved the chart to be an efficient data structure for recording what has been done so far in the course of parsing. A parser based on charts can avoid the inefficiency of duplicating many computations, which a top-down parser often suffers when backtracking occurs.

The input Chinese sentence is submitted to a preprocessor, which segments the input sentence (a sequence of Chinese characters) into words. The result of the preprocessor is represented by a chart and is sent to the parser.

The parser builds phrases up on the chart by starting with their heads and adjoining constituents on the left or the right of the heads. For example, according to the phrase structure rule (PSR) "NP -> QP N", N (noun) is the head of NP. When encountering a noun, the parser tries to build an NP by starting with the noun and adjoining the preceding quantity phrase (QP). According to the PSR "VP -> V-n NP", V-n (transitive verb) is the head of VP. When encountering a transitive verb, the parser's action is similar to that for "NP -> QP N", except that it tries to adjoin the following NP as its object. But if the following NP has not yet been parsed, the expectation to build a VP is suspended until an NP is built up in the object position.

A parser using the above algorithm constructs the syntax trees of input sentences strictly from bottom to top. The algorithm seems to be a good combination of data-driven parsing and hypothesis-driven parsing.
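The head-driven, bottom-up strategy described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the SASC implementation: the `Edge` record, the tuple encoding of `RULES`, and the helpers `adjoin` and `parse` are hypothetical names introduced here, and the suspension of an expectation is approximated by rescanning the chart until nothing new can be built.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    cat: str    # category of the constituent, e.g. "N" or "NP"
    start: int  # index of the first word covered
    end: int    # index one past the last word covered


# Each rule is (parent, left sisters, head, right sisters).
# The two rules are the PSRs quoted in the text; the encoding is an assumption.
RULES = [
    ("NP", ("QP",), "N", ()),    # NP -> QP N, head N
    ("VP", (), "V-n", ("NP",)),  # VP -> V-n NP, head V-n
]


def adjoin(chart, pos, cats, leftward):
    """Return the positions reachable from `pos` by adjoining `cats` in order."""
    positions = {pos}
    for cat in (reversed(cats) if leftward else cats):
        reached = set()
        for edge in chart:
            if edge.cat != cat:
                continue
            if leftward and edge.end in positions:
                reached.add(edge.start)
            if not leftward and edge.start in positions:
                reached.add(edge.end)
        positions = reached
    return positions


def parse(lexical_edges):
    """Grow phrases outward from their heads until the chart stops changing."""
    chart = set(lexical_edges)
    changed = True
    while changed:
        changed = False
        for head in list(chart):
            for parent, left, head_cat, right in RULES:
                if head.cat != head_cat:
                    continue
                # If a needed sister is not on the chart yet, nothing is built
                # now; the rule is retried on a later pass (the "suspension").
                for s in adjoin(chart, head.start, left, leftward=True):
                    for t in adjoin(chart, head.end, right, leftward=False):
                        new_edge = Edge(parent, s, t)
                        if new_edge not in chart:
                            chart.add(new_edge)
                            changed = True
    return chart


# Word categories for a sentence of the shape "QP N V-n QP N".
words = [Edge("QP", 0, 1), Edge("N", 1, 2), Edge("V-n", 2, 3),
         Edge("QP", 3, 4), Edge("N", 4, 5)]
chart = parse(words)
```

In this sketch the VP over positions 2 to 5 can only be added once the object NP over positions 3 to 5 is on the chart, which mirrors the suspended expectation described in the text. A real chart parser would keep an agenda of suspended expectations rather than rescanning.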
In the SASC system presented here, Chinese sentences are syntactically analyzed from the viewpoint of generative grammar (Huang, 1982). The implementation of the parsing algorithm and the grammar that models Chinese syntax can be found in (Lin et al., 1986).

III EMPTY CATEGORIES

Let's consider the following Chinese sentences, given here in word-by-word gloss.

(1) he hurt Chang-san

(2) ba-transformation:
    he ba Chang-san_i hurt e_i
    (He hurt Chang-san)

(3) passivization:
    Chang-san_i by him hurt e_i
    (Chang-san was hurt by him)

(4) topicalization:
    that dog_i I never have seen e_i
    (I have never seen that dog)

(5) relativization:
    e_i playing de children_i
    (the children who were playing)

(6) Chang-san_i tried e_i escape
    (Chang-san tried to escape)

(7) pivot construction:
    he asked children_i e_i go to dinner
    (He asked the children to go to dinner)

(8) using zero pronoun:
    Chang-san likes e
    (Chang-san likes someone or something)

Sentences (2)-(8) all involve a missing subject or object (indicated by "e"). But what does each missing subject or object refer to? The index i in sentences (2)-(7) marks the element each one refers to. The missing object in (8), however, does not refer to any element within (8). In fact, it is an omitted pronoun, which refers to someone or something understood in the situation.

According to current linguistic theory (Chomsky, 1981; Huang, 1982), sentence (2) is derived from sentence (1) by a transformation called "move α". The transformation is performed as follows: the object, "Chang-san" in (1), is moved by the carrier "把" ("ba") to the position indicated in (2), and leaves behind a "trace" (indicated by "e"). The trace dominates no lexical material, but is "bound" to its antecedent, "Chang-san". In addition to ba-transformation, passivization, topicalization and relativization can also be analyzed as involving some form of "move α". Thus there are traces within these constructions.

Sentences (6)-(8) also contain vacant NP positions, but these are not traces, because they are not derived by "move α". They are called "null pronominals". Null pronominals are in general free, as in sentence (8). But those in certain constructions are bound, as in sentences (6) and (7). Sentence (7) is called a pivot construction; that is, the object of the first verb is also the subject of the second verb. So the null pronominal in the subject position is bound to the object.

Traces and null pronominals are known as empty categories (or empty NPs). The syntactic behavior of null pronominals is different from that of traces. They are, however, treated indiscriminately in our implementation.

IV THE RAISE-BIND MECHANISM

The raise-bind mechanism is used to cope with empty categories; in other words, to find the antecedent of each empty category except those which are free (e.g. sentence (8)). With the aid of the raise-bind mechanism, the parser generates an empty NP and inserts it into a vacant position where an NP is expected to appear. The empty NP is then raised up along the parsing tree as the tree grows (recall that the parser works bottom-up), until its antecedent is parsed. At this point, the parser binds the empty NP by setting it to refer to its antecedent. Once bound, the empty NP is not raised any further, because an empty NP has exactly one antecedent and cannot be bound more than once.

Not every NP position can be filled by an empty category. In Chinese, empty categories appear only in the subject position and the direct object position, and never in the indirect object position or the prepositional object position.
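As a rough illustration of the generate, raise and bind life cycle just described, the following Python sketch may help. It is not the authors' implementation: the class and function names (`EmptyNP`, `Node`, `make_empty`, `build`, `bind`) are hypothetical, words are represented as plain strings, and the final binding to the topic in example (4) is only illustrative, since the text does not spell out the topicalization rule.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Per the text, empty categories occur only in subject and direct object position.
ALLOWED_ORIGINS = {"subject", "object"}


@dataclass(eq=False)
class EmptyNP:
    origin: str                       # position the empty NP was generated in
    antecedent: Optional[str] = None  # set exactly once, when the NP is bound

    @property
    def bound(self) -> bool:
        return self.antecedent is not None


@dataclass(eq=False)
class Node:
    label: str
    children: list = field(default_factory=list)
    raised: List[EmptyNP] = field(default_factory=list)  # unbound empty NPs


def make_empty(origin: str) -> EmptyNP:
    """Generate an empty NP for a vacant position, if that position allows one."""
    if origin not in ALLOWED_ORIGINS:
        raise ValueError(f"no empty category allowed in {origin} position")
    return EmptyNP(origin)


def build(label: str, children: list) -> Node:
    """Build a parent node and raise every still-unbound empty NP to it."""
    raised = []
    for child in children:
        if isinstance(child, EmptyNP):
            if not child.bound:
                raised.append(child)
        elif isinstance(child, Node):
            raised.extend(e for e in child.raised if not e.bound)
    return Node(label, list(children), raised)


def bind(empty: EmptyNP, antecedent: str) -> None:
    """Bind an empty NP to its antecedent; a second binding is an error."""
    if empty.bound:
        raise ValueError("an empty NP cannot be bound more than once")
    empty.antecedent = antecedent


# Example (4): "that dog I never have seen e"
e = make_empty("object")
vp = build("VP", ["seen", e])
s = build("S", ["I", "never", "have", vp])  # e is raised from VP to S
bind(e, "that dog")                         # the antecedent has been parsed
top = build("S'", ["that dog", s])          # e is bound, so it is not raised
```

Because `build` only passes unbound empty NPs upward, a bound empty NP stops rising, which is the behavior the section attributes to the mechanism.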
In our implementation, an empty NP contains three fields: (1) a field to keep the pointer to its antecedent, (2) a field to keep where it came from, and (3) a field to keep the syntactic or semantic constraints on the empty NP for later checking.

We can informally formulate the rules for relativization as follows: for a noun and a relative clause to be combined into an NP, the relative clause must contain an empty NP which is unbound and marked as coming from either the subject position or the object position, and the empty NP will be bound to the (head) noun.

We can also state the rules for passivization as follows: once a clause is constructed, the parser checks whether the prepositional phrase "被 + NP" (similar to "by + NP" in English) is involved in the clause. If so, there must be an empty NP which is unbound and marked as coming from the object position, and it will be bound to the subject of the clause.

Rules for pivot constructions can be formulated as follows: in a pivot construction, the direct object binds the empty NP coming from the subject position of the embedded clause. Similarly, rules for topicalization, ba-transformation and others can be designed.

To illustrate the above rules, let's consider example (9) and its parsing tree in figure 1.

(9) e3 by Li-szu ask e2 e1 go to dinner de children
    (the children who were asked by Li-szu to go to dinner)

[Figure 1. The parsing tree of (9).]

Now we can follow the bottom-up parser as it parses example (9):
(1) Node S1 is constructed, and e1 serves as the dummy subject.
(2) Node V' is constructed. Node V' is a pivot construction, so e1 is bound to e2.
(3) Node S2 is constructed. S2 is a passive clause, because of the PP "by Li-szu". According to the rules for passivization, e3 binds e2.
(4) Node NP is constructed. According to the rules for relativization, e3 is bound to "children".
Notice that only e3 was raised up across node S2, because e1 and e2 had been bound before S2 was constructed.

Once the parsing tree in figure 1 is finished, it is easy to answer who was asked and who went to dinner. Since e1 is the dummy subject of "go to dinner" and the binder of e1 is e2, whose binder is e3, whose binder is "children", we can conclude that it is "children" who went to dinner. In the same way, we also conclude that it is "children" who were asked.

The raise-bind mechanism also serves as a filter to rule out incorrect sentences or incorrect parsing trees. For example, if no empty NP is raised within a construction involving passivization or relativization, such a construction will be ruled out. If the mechanism is adopted for English sentence analysis, a test must be performed to rule out sentences with one or more empty categories which have no binder. But such sentences are in general grammatical in Chinese (see (8)).

V MORE SYNTACTIC PHENOMENA

Relativization in Chinese is a long-distance movement; that is, it can move an object across several S (sentence) nodes. Noun phrase (10) is an example.

(10) [S I ask Li-szu e help me buy e] de book
     (the book which I asked Li-szu to buy for me)

(11) [e2 like e1] de man

Noun phrase (11) is ambiguous. If the head noun ("the man") binds e1, the NP means "the man whom someone likes". If the head noun binds e2, it means "the man who likes someone or something". Removing the ambiguity requires semantic interaction.

Now we can formulate the rules for relativization as follows: for a noun and a relative clause to be combined into an NP, the parser checks the "empty-NP list" raised from the relative clause, and "if no empty NP is raised, rule out the NP; if an empty NP is raised and marked as coming from the subject position, the object position or an embedded object position (as in (10)), set the empty NP to be bound to the head noun; and if two empty NPs are raised from the subject and object positions (as in (11)), employ semantic analysis to determine the proper binding."
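The relativization rule stated over the "empty-NP list" can be sketched as follows. This is an illustration under assumptions rather than the system's code: `EmptyNP`, `relativize`, `resolve` and the `choose` callback are hypothetical names, and `choose` merely stands in for the semantic analysis that the paper leaves unspecified. The sketch also shows how the binder chain of example (9) can be followed back to the head noun.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Union

# Origins that the rule accepts for binding to the head noun.
RELATIVIZABLE = {"subject", "object", "embedded object"}


@dataclass(eq=False)
class EmptyNP:
    origin: str
    # The binder is either a lexical antecedent or another empty NP.
    antecedent: Union[str, "EmptyNP", None] = None


def relativize(
    head_noun: str,
    raised: List[EmptyNP],
    choose: Optional[Callable[[str, List[EmptyNP]], EmptyNP]] = None,
) -> Optional[EmptyNP]:
    """Combine a head noun with a relative clause, given its empty-NP list.

    Returns the empty NP that was bound, or None if the NP is ruled out.
    """
    candidates = [e for e in raised
                  if e.antecedent is None and e.origin in RELATIVIZABLE]
    if not candidates:
        return None  # no empty NP was raised: rule out the NP
    if len(candidates) == 1:
        target = candidates[0]
    else:
        if choose is None:
            raise ValueError("ambiguous binding: semantic analysis is needed")
        target = choose(head_noun, candidates)  # stand-in for semantics
    target.antecedent = head_noun
    return target


def resolve(empty: EmptyNP) -> Optional[str]:
    """Follow the chain of binders until a lexical antecedent is reached."""
    target = empty.antecedent
    while isinstance(target, EmptyNP):
        target = target.antecedent
    return target


# Example (9): e1 is bound to e2 (pivot) and e2 to e3 (passive);
# relativization then binds e3 to the head noun.
e3 = EmptyNP("subject")
e2 = EmptyNP("object", antecedent=e3)
e1 = EmptyNP("subject", antecedent=e2)
relativize("children", [e3])
print(resolve(e1))  # children
```

For a phrase like (11), where two unbound empty NPs reach the head noun, calling `relativize` without a `choose` function raises an error in this sketch, which corresponds to the point at which the text calls for semantic analysis.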
Like relativization, topicalization is also a long-distance movement and is treated in a similar way.

Another syntactic phenomenon crucial to the parser is known as the Complex NP Constraint (CNPC) (Radford, 1981):

CNPC -- No transformation rule can move any element out of a complex NP.

A complex NP (CNP) is an NP containing a relative clause. The CNPC can be easily encoded in our grammar in this way -- no empty NP can be raised up across an NP node. Hence it is impossible for an empty NP within a CNP to be bound to any element outside that CNP.

In most cases, ba-transformation and passivization move the direct objects of verbs. But the phenomenon known as "subject-to-object raising" (Radford, 1981) makes a difference:

--The subject of an embedded clause can be moved into the subject (or ba-object) position of the higher clause by passivization (or ba-transformation).

For example, sentence (13) is derived from sentence (12) by such a movement.

(12) people believe this mistake will be right

(13) (This mistake is believed to be right)

To cope with subject-to-object raising, the rules for passivization in the previous section are modified as follows: the subject of a passive clause will bind the empty NP in either the object position or the subject position of an embedded clause.

VI A COMPARISON WITH THE HOLD-LIST MECHANISM

In ATN (Bates, 1978), the hold-list mechanism is used for a purpose similar to that of the raise-bind mechanism. But we object to such an approach, for (1) it is not fit for a bottom-up parser; (2) it cannot deal with null pronominals (e.g. examples (6)-(8)); (3) it handles left extraposition (e.g. examples (2)-(4)), but not right extraposition (e.g. example (5)). To deal with right extraposition, ATN uses another mechanism.

In linguistic theory, a movement is called a left (right) extraposition if it moves an NP to the left (right) of its trace. Both left extraposition and right extraposition move an NP to a position dominating its trace, and a null pronominal, if it is bound, is always bound to an NP dominating the null pronominal (Chomsky, 1981). So the raise-bind mechanism is sufficient to cope with all empty categories, since its function is to raise an empty category up to be bound to an NP which dominates this empty category.

VII CONCLUSION

We have presented how the raise-bind mechanism copes with traces and null pronominals in Chinese. With the use of the mechanism, many sophisticated syntactic phenomena can be encoded in the grammar easily.

The mechanism is simple and theoretically complete. If semantic analysis is employed to remove ambiguities such as that in example (11), the correct bindings of empty categories can always be reached.

ACKNOWLEDGEMENTS

Thanks to the enlightening discussions of J.C. Chen and J.J. Chen.

REFERENCES

[1] Bates, M. (1978) "The Theory and Practice of Augmented Transition Network Grammars", Natural Language Communication with Computers, pp. 191-259.
[2] Chomsky, N. (1981) Lectures on Government and Binding, Foris, Dordrecht.
[3] Huang, J. (1982) Logical Relations in Chinese and the Theory of Grammar, MIT doctoral dissertation.
[4] Kaplan, R.M. (1973) "A General Syntactic Processor", in [Rustin 1973].
[5] Kay, M. (1973) "The MIND System", in [Rustin 1973].
[6] L.J. Lin, K.J. Chen, James Huang and L.S. Lee (1986) "SASC: A Syntactic Analysis System for Chinese Sentences", International Journal of Computer Processing of Chinese and Oriental Languages, Published by Chinese Language Computer Society.
[7] Radford, A. (1981) Transformational Syntax: A Student's Guide to Chomsky's Extended Standard Theory, Cambridge Univ. Press.
[8] Rustin, R., ed. (1973) Natural Language Processing, Algorithmics Press, N.Y.