English-Chinese Machine Translation System IMT/EC

Chen Zhaoxiong and Gao Qingshi
Institute of Computing Technology, Chinese Academy of Science, Beijing, PRC.

Abstract

IMT/EC is an English-Chinese machine translation system which integrates some outstanding features of case grammar and semantic grammar into a uniform frame, uses various kinds of knowledge in disambiguation, and tries to modify the object language by itself. In this paper, we first introduce IMT/EC's design motivation and overall architecture, then describe the design philosophy of its translation mechanisms and their processing algorithms.

1. The design motivation

The design of the IMT/EC system is motivated by the wish to develop new approaches to English-Chinese machine translation, such as providing the system with powerful analysis mechanisms and an MT knowledge base management system, as well as some exceptional processing and learning mechanisms, that is, to make the system intelligent. In addition, it also tries to integrate as many advantages of conventional machine translation systems into a single system as possible, such as providing the system with powerful mechanisms for the processing of various ambiguities and contextual relations.

The design of the IMT/EC's translation mechanisms is based on the following considerations:

(1) SC-analysis. In the development of a machine translation system, in order to disambiguate the source language, we have to analyze the input deeply to get the internal meaning representation of the source language. However, the deeper we analyze the input, the more we lose the clues about how to express the translation; it also results in extremely poor or no translations of sentences for which complete analyses cannot be derived [Slocum 85]. To find a suitable analysis depth that both keeps the clues about how to express the translation of the input and disambiguates the input completely is almost impossible. In the IMT/EC, we try to design a simple grammar analysis mechanism, the SC-grammar analysis mechanism, which inherits the outstanding features of both case grammar analysis and semantic grammar analysis so as to produce a high quality translation.

(2) Multi-language translation oriented. Under present technical conditions, it is impossible to design a general internal meaning representation for all natural languages. Thus, a knowledge based multi-language oriented machine translation system is difficult to bring to market in the near future. A feasible way of designing multi-language oriented machine translation systems might be to separate the processing mechanisms from the language specific rules [King et al. 85], that is, to apply the same processing mechanism with different language specific rules for the translation of different natural language pairs. In the IMT/EC, we develop a general rule representation form for the various kinds of knowledge used in the translation. Knowledge for the translation of different language pairs is stored in different packages of the knowledge base IMT-KB. The knowledge base is organized in a multi-package and multi-level way so as to store rules for the translation of different language pairs and for different phases of the processing. Thus, the system can be easily extended for multi-language translation purposes.

(3) Diversity processing. As the disambiguation rules are rather word specific, it is difficult to manage them in the same way. To deal with this problem, we store these rules in their respective word entries and classify them into several categories in the IMT/EC. Each category corresponds to a general subroutine application mechanism, which applies the word specific rules and subroutines in the processing of translation. The subroutines are stored in a natural language specific subroutine package. Some word specific subroutines are directly stored in the respective word entry.

(4) Powerful exceptional processing. Since natural language phenomena are so abundant that no existing machine translation system can process all of them, it is essential to provide an exceptional processing mechanism in the system to deal with exceptional phenomena. As IMT/EC incorporates some learning mechanisms, it is more powerful in dealing with exceptions than other systems.

(5) Automatic modification of the translation. Generally speaking, machine translation systems can only produce rigid translations; it is desirable that MT systems be able to modify the output by themselves so as to produce more fluent translations. IMT/EC tries to apply some common sense knowledge and linguistic knowledge of the object language to disambiguate the input and modify the translations, and thus to improve the translation quality.

In the following paragraphs, we focus on the translation procedure of the system and the algorithms related to it, ignoring the knowledge base organization and management mechanisms.

2. The overall architecture of the system

The architecture of the IMT/EC system is as follows:

[Figure: The architecture of the IMT/EC. Recoverable component labels: English input; morphological analysis and dictionary retrieving; SC-grammar analysis; disambiguation and transfer; modification of the translation; knowledge application; knowledge base management system IMT-KB; knowledge base augmentation and knowledge acquisition.]

As the rule base and dictionary in a machine translation system are so vast, it is impossible for human beings to find the conflicts and implications among the rules. Modifying a rule in the knowledge base often results in many side effects on other rules. Thus, it is necessary to provide self-reorganization and refinement mechanisms in the knowledge base. In the IMT/EC, we design a special knowledge base management system, IMT-KB, to manage all the knowledge used in the various processing phases of the translation. In addition, IMT/EC also provides a knowledge base augmentation and knowledge acquisition environment for the system to augment its performance by itself and for the users to improve the knowledge base. The call relations connected by dotted lines in the figure above are executed only when the user sets the learning mechanisms in working status. These mechanisms can acquire new knowledge in the dynamic interactive, static interactive, or disconnected ways. They are primarily used to resolve the exceptional phenomena in the translation.
Dynamic Interactive Learning (DIL): Whenever the system encounters a sentence out of its processing range, it produces various possible translations for each segment of the sentence, interacts with human beings when necessary to select an appropriate translation of each segment, and combines them to get a correct translation of the sentence. At the same time, it also creates some new rules to reflect the selections. That is, it learns some new knowledge.

Static Interactive Learning (SIL): Whenever the system encounters a sentence out of its processing range, it records the sentence and the context in which it appears in a file. After the text has been translated, it begins to analyze the sentence in detail to get various possible translations for each segment of the sentence, interacts with human beings when necessary to get appropriate translations of the segments, and combines them to get a correct translation of the sentence. At the same time, it also creates some new rules to reflect the selections, and thus learns new knowledge.

Disconnected Learning (DL): Whenever the system encounters a sentence out of its processing range, it analyzes the sentence in detail to get all the possible translations, then evaluates these translations according to the preference rules stored in the IMT-KB to select an appropriate translation, and modifies the related rules used in the analysis to reflect the selections. It skips over sentences whose translation cannot be determined by the preference rules instead of interacting with human beings.

3. The translation procedure

IMT/EC's translation procedure is divided into several phases, i.e., morphology analysis and dictionary retrieving, SC-grammar analysis, disambiguation and transfer, modification of the translation, etc. The communications between the translation mechanisms and the knowledge base are performed by the knowledge base management system IMT-KB; these operations include getting a set of related rules and returning some information for the modification as well as the augmentation of the MT knowledge base.

3.1. Morphology analysis and dictionary retrieving

In the IMT/EC, the most commonly used words can be retrieved by either their base forms or their surface forms, while most of the other words can only be retrieved by their base forms. The tasks of the morphology analysis are to process prefixes, suffixes, and compound words. Since this processing is completely natural language specific, in order for the processing mechanisms to be language independent, we develop a language independent morphology analysis mechanism which applies the language specific morphology rules in the morphology analysis. The morphology analysis rule form is

<surface pattern> -> <conditions> | <result>

Here, <surface pattern> is the surface form of the word to be analyzed, <conditions> is the application conditions of the rule, and <result> is the definition of the analyzed base form of the word.
For example,

(1) (* s) -> (verb *) | (def(*), SV)
(2) (* s) -> (noun *) | (def(*), PN)
(3) (*1 - *2) -> (word *1)(word *2) | ((def(morphology *1), def(morphology *2)), COM)

Here, *, *1 and *2 are variables indicating that they can be bound to any sub-character string of the word to be analyzed, def(X) is the definition of X in the IMT-KB, and SV, PN, COM are surface features of the word.

Rule (1) indicates that when the last character of the surface form of a word is 's' and the remaining character string * of the word is a verb, then its surface feature is the singular verb form (SV) of the verb *. Thus, it returns the value of (def(*), SV) as result.

Rule (2) indicates that when the last character of the surface form of a word is 's' and the remaining character string * of the word is a noun, then its surface feature is the plural noun form (PN) of the noun *. Thus, it returns the value of (def(*), PN) as result.
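The three example rules can be sketched in Python as follows. This is only a minimal illustration, not the IMT/EC implementation: the toy `DICTIONARY` stands in for the IMT-KB, the function name `morphology` and the feature label "BASE" for words found directly by their base form are assumptions made for the sketch, and the compound rule simply takes the first analysis of each part.

```python
# Hypothetical toy dictionary standing in for the IMT-KB word entries.
DICTIONARY = {
    "run": {"cat": "verb"},
    "book": {"cat": "noun"},
    "machine": {"cat": "noun"},
    "translation": {"cat": "noun"},
}


def morphology(word):
    """Return a list of (definition, surface_feature) analyses for `word`."""
    if word in DICTIONARY:
        # Word retrievable by its base form; "BASE" is our own label.
        return [(DICTIONARY[word], "BASE")]

    results = []

    # Rules (1) and (2): (* s) -> (verb *) | (def(*), SV)
    #                    (* s) -> (noun *) | (def(*), PN)
    if len(word) > 1 and word.endswith("s"):
        entry = DICTIONARY.get(word[:-1])
        if entry is not None and entry["cat"] == "verb":
            results.append((entry, "SV"))
        if entry is not None and entry["cat"] == "noun":
            results.append((entry, "PN"))

    # Rule (3): (*1 - *2) -> (word *1)(word *2) | ((def(...), def(...)), COM)
    if "-" in word:
        left, right = word.split("-", 1)
        left_res, right_res = morphology(left), morphology(right)
        if left_res and right_res:  # both parts must themselves be words
            results.append(((left_res[0][0], right_res[0][0]), "COM"))

    return results


if __name__ == "__main__":
    for w in ("runs", "books", "machine-translation", "xyz"):
        print(w, [feature for _, feature in morphology(w)])
```

An empty result list corresponds to a word that none of the rules can analyze.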
Rule (3) indicates that when the character string of a word contains the character '-', and the left part *1 and the right part *2 of '-' are both words, then it is a compound word of *1 and *2. Thus, it applies the morphology rules to analyze the words *1 and *2, and returns the value of ((def(morphology *1), def(morphology *2)), COM) as result.

Suppose that $X indicates that X is a variable, #X returns the character list of X, &X returns the last character of X, >X returns the first part of rule X or the first element of a list, <X returns the remaining part of X (so that >X followed by <X equals X), f(X,Y) returns the first differing item pair of X and Y, lookup(X) looks up the dictionary and returns the definition of the word X, search(X) returns the morphology rules which include the character X, check(X) tests whether the two elements of the item pair X are unifiable or not, null(X) tests whether the list X is empty, apply(g,X) returns the result of g(X), and t(X) tests whether the result X needs further analysis and performs recursive analysis when necessary. The algorithm for morphology analysis and dictionary retrieving is as follows.

INITIALIZE
    $X <- #word;
    $P <- search(& $X);
    $P <- $P U search(> $X);
    $result <- ( );
    for $rule in $P do {
MATCH
        $PAT <- > $rule;
        $COND <- > < $rule;
        $RES <- < < $rule;
Loop
        $pair <- f($PAT, $X);
        if (null($pair)) goto TEST;
        if (not(check($pair))) break;
        $PAT <- $PAT ~ $pair;
        $PAIRL <- $PAIRL U {$pair};
        goto Loop;
TEST
        for $CONDi in $COND do {
            $PROP <- lookup(> < ($CONDi));
            if (not(apply(> > $CONDi, $PROP))) break;
        }
RESULT
        $PROP <- lookup(> ($RES ~ $PAIRL));
        $result <- $result U {($RES ~ $PAIRL, $PROP)};
    }
    if (null($result)) return word else return t($result);
END.

3.2. SC-Grammar Analysis

The SC-grammar analysis mechanism of IMT/EC applies the SC-rules stored in the IMT-KB to disambiguate the structural ambiguities of the input sentences and produces the structural descriptions for them. The grammar has some outstanding features of case grammar and semantic grammar. The rule form is as follows:

<S-STRUCTURE> -> <S-ENVIRONMENT> | <R-STRUCTURE>, <R-ENVIRONMENT>, <TRANSFER>.

Here, <S-STRUCTURE> and <S-ENVIRONMENT> are the rule conditions, which define the current structural form and contextual features of the input; <R-STRUCTURE> and <R-ENVIRONMENT> are the resulting structural form and contextual features of the input; <TRANSFER> is the transformation related to the rule. The structural forms <S-STRUCTURE> and <R-STRUCTURE> are represented as strings of syntagmas and words. The contextual environments <S-ENVIRONMENT> and <R-ENVIRONMENT> are represented as vectors, of which each element corresponds to an inter-sentential relation or a special case; their values are used to resolve ellipsis, anaphora, tense, aspect, etc. This is the principal contextual processing mechanism in the IMT/EC. Since the contextual vector is used only as a supplement to the pure semantic grammar analysis, especially in the processing of contextual relations, it is not necessary to analyze the input to the extent that
one can get all the semantic relations of the input. Thus, the vector processing formalism is completely acceptable. Two example rules are as follows:

NP VP -> A | S, change(B1,X), !NP !VP.
in NP -> A1 | PP, change(B2,X), zai !NP nei.

The SC-grammar analysis mechanisms receive the results of the morphology analysis or of the previous SC-reduction, send messages to the IMT-KB to get the related rules, and apply these rules to reduce the input until the non-terminal symbol S is reduced, thus producing the structural description of the input. The SC-grammar analysis algorithm of the system is:

(1) In the entries of the IMT-KB dictionary, we store not only the word meanings and their disambiguation conditions, but also the SC-phrase and semantic rules specific to the entry word. When analyzing a sentence, the system first retrieves the SC-phrase rules specific to the words appearing in the sentence, and applies these rules to find a list of possible phrases of the sentence from the context of the words in the sentence. The phrase list returned is as follows:

X1(i1, j1) X2(i2, j2) ... Xm(im, jm)

Here, X1, X2, ..., Xm are phrase syntagma identifiers, and i1, i2, ..., im and j1, j2, ..., jm are the starting and ending positions of the phrases in the input sentence.

(2) Find a list of expectation paths from the phrase list. Each path is either a chain of adjacent phrases or a sequence of word items:

X(i, j) X(j+1, k) ... X(m, n)
P(w_t) P(w_t+1) ... P(w_t+l)

(Here, P(w) is the word w itself or its property, t is the current analysis position, which initially is the beginning of the sentence, and l is the expectation length defined by the user.) Order the paths by their phrase ending positions n1, n2, ..., nk from larger to smaller. These paths are used as heuristics in the analysis of the sentence. We try one new path at each backtracking.
(3) Send the analyzed component

M = V1(...) V2(...) ... Vk(...)

and the current expectation path to the IMT-KB to retrieve the SC-rules whose head patterns contain a sub-string of

V1(...) ... Vk(...) X1(...) ... Xl(...)
or
V1(...) ... Vk(...) P(w_t) ... P(w_t+l)

and organize these rules in a list according to their preferences from higher to lower. Then, take one rule from the rule list at each backtracking and go to (4) to apply the rule to reduce the input. If no rule in the list can be successfully applied to reduce the input, the system gets the next expectation path from (2) and repeats (3). If all the expectation paths have been tried and no rule has been successfully applied, it returns to the last analysis position to re-analyze the input. If the current analysis position is the beginning of the sentence, the system calls the exception processing mechanism to deal with this un-analyzable sentence.

(4) Match the rule head pattern with the current form of the input sentence. If there is a sub-pattern of the current sentence pattern that can match the rule head, then go to (5); else get the next rule from (3) and try to re-match.

(5) First, add the newly formed phrases into the phrase list for the backtracking of the analysis, then call the case analysis mechanism to check the current analysis results and the current form of the sentence to fill in the case frames A, B in the rule and the context vector. The case analysis algorithm is described in the following paragraphs.

(6) Check A and the context vector to see whether their values are unifiable.
If they are unifiable, then go to (7); else get the next rule from (3) and return to (4).

(7) Store the backtracking information into the temporary stack, substitute the reducing part of the current sentence form with the reduced form, change the current analysis position to the last word of the newly reduced syntagma, and change the related element values of the context vector according to the element values of B. If the current position is not the end of a sentence, then go to (2). If the current position is the end of a sentence and the current form of the sentence is not S, then go to (2). If the current position is the end of a sentence and the current form of the sentence is S, then go to (8).

(8) Call the semantic processing mechanism to check the result of the analysis to see whether it violates the English collocation rules. If the result violates the collocation rules, the system recovers the status before the last reduction and gets the next rule from (3) to re-analyze the input. Otherwise, there are two cases:
a. If the user only needs the most adequate translation, the system proceeds to analyze the next sentence.
b. If the user needs all possible translations, the system records the current result, recovers the status before the last reduction, and gets the next rule from (3) to re-analyze the input in order to get other analysis results.

As we have mentioned before, the case analysis in the SC-analysis is only a complement to the semantic analysis.
It is mainly used to deal with contextual relations and aspect, tense, modality, etc. Thus, the system only needs to analyze those cases which can be used for those purposes. It is much simpler than the case analysis in case grammar analysis. The case analysis in the SC-analysis is performed by the following algorithm:

(1) Get the case expressions defined in the elements of the vectors A and B. The form of the element expressions of A and B is

S#[i]:E

Here, S#[i] indicates that the element case identifier (S#) of the case frame A or B corresponds to the case identifier S[i] of the system case frame, i.e., the system context vector. E is the expression used to get the value of the respective case.

(2) Retrieve the definitions of the case identifiers from the system case frame and organize these case identifiers into a list according to their preferences from higher to lower. The form is

(S[i1].subject:E1, S[i2].object:E2, ..., S[im].Em, ...)

(3) Evaluate the values of the elements in the case identifier list, and fill them into the respective positions in the case frames A and B. There are several cases in the evaluation:
a. If E is a constant, return E.
b. If E is empty, evaluate the case value according to the definition of the case identifier.
c. If the case identifier is a syntagma identifier, find the value of the identifier from the analyzed input according to the heuristics provided by the expression E.
d. If the case identifier is a semantic identifier, call the semantic mechanism to get the value which can be filled into the case identifier from the input according to the heuristics provided by the expression E.
e. For other case identifiers, call their respective subroutines to get the value of the case. These subroutines are defined by the rule designer.

The case analysis in the SC-analysis can solve ellipsis, anaphora, and other contextual problems.

3.3. Semantic disambiguation and transformation

The SC-rules define not only the relations for the syntagma reduction, but also the contextual vector value changes with respect to the reduction of a sentence, and the transformations related to the rules. The transformation operation defined in an SC-rule is in the following form:

!X1 !X2 ... !Xn

Here, !X1, !X2, ..., !Xn are the translations of the syntagmas X1, X2, ..., Xn in the rule head. Their positions indicate the positions of the translations of the syntagmas. There will also be some indicators in the string which are used to indicate the positions in the translation for inserting tense, voice and modal modifiers. These indicators are used as heuristics in the semantic processing.

The transformation in the IMT/EC is relatively simple. It travels over the whole analysis tree from top to bottom and left to right, and transfers every node when the node is traveled. The result of the transformation is the Chinese utterance of the sentence.

Rules with the same head pattern may have different case frames A and B; in this case, they may correspond to different transformation operations. These rules are defined as different rules by the rule designer.
In the IMT-KB, however, the system stores them as one rule with many candidate right patterns. Whenever the head pattern is successfully matched, the system sequentially checks these candidates until one of them is satisfied, and records the current successful position so that the backtracking mechanism can get the other candidates when necessary.

The tasks of the semantic processing in the IMT/EC are to check the results of the analysis to see whether they satisfy the syntactic or semantic collocation rules defined in the IMT-KB, and to produce suitable modifiers for expressing tense, voice, aspect and so on in Chinese. In some cases, it also applies the well formed world knowledge defined in the IMT-KB to eliminate some illegal expressions and extend the meanings of some ambiguous words. Since the SC-analysis is based on semantic grammar analysis, most of the syntactic and semantic ambiguities are solved in the reduction operations. Even though the case analysis in the SC-analysis is aimed mainly at resolving contextual problems, it can also solve some ambiguities within a sentence. That is, the semantic processing in the IMT/EC is oriented to specific ambiguities and inter-sentential case value evaluations. Though the processing is different in different phases of the translation, it can be categorized as follows:

(1) Determining the value of a specific semantic identifier, such as time adverbial, place adverbial, anaphora, etc.
When a specific semantic identifier is concerned, the semantic processing mechanism first finds the key word in the sentence which can match the semantic identifier, such as a word with time or place properties, then gets the phrase which contains the key word in the sentence, and returns the phrase as the value of the identifier. Only simple anaphora phenomena are considered in the IMT/EC. They are processed in two different ways. One is to compare synonyms to find the anaphoric words; the other is to find the suitable anaphora content through position relations, such as, in some specific contexts, the word 'which' can refer to the noun phrase immediately before it.

(2) Checking the collocation of syntagmas. There are three possible categories of collocation in the analysis results:

<1> X W -> (W => C1)
<2> W Y -> (W => C2)
<3> X W Y -> (W => C3)

Here, X and Y may be strings of words or syntagmas, and W is a specific word. The above expressions mean that <1> W appears after string X and functions as speech category C1, <2> W appears before string Y and functions as speech category C2, and <3> W appears between strings X and Y and functions as speech category C3. The related word definition in the IMT-KB dictionary is as follows:

W := C1 (E1 => M1)
     C2 (E2 => M2)
     ...
     Cm (Em => Mm)

Here, C is the speech category, E is the context structure of word W, and M is the meaning of word W. The semantic processing mechanism retrieves the collocation rules specific to the words of the sentence from the IMT-KB, and applies these rules to check the analysis result to see whether there is any violation between the analysis result and the collocation rules. If there is, it returns fail.
(3) Checking the distant contextual relations. There are also three possible categories of distant contextual relations appearing in a sentence:

X ... W [m] -> (W => C1)
W ... Y [n] -> (W => C2)
X ... W ... Y [m,n] -> (W => C3)

Here, X, Y, W, C have the same meanings as in (2). n and m are optional; they define the relative position between the word W and X/Y. When n, m = 0, these are the cases described in (2). When n, m are not defined, they indicate any position before/after the word W. These distant contextual relation rules are defined in the IMT-KB in the same way as in (2). If m and/or n are present, the semantic processing mechanism finds the m-th/n-th word before/after the word W in the sentence, and tries to reduce that word and its adjacent words into X or Y. If they can be reduced, and the word W functions as the same category as defined in the rule, then it succeeds; else it eliminates the analysis. If m and n are not defined, then it tries to find a word before/after the word W which can be reduced into X or Y together with its adjacent words. If there is no such element in the sentence, then it returns fail.

(4) Creating Chinese modifiers to express tense, voice, modality and so on, and inserting these modifiers in the translation according to the position marks appearing in the rule. The processing procedure is as follows:
a. Get the marks of the tense, voice, modal, etc.
b. Call the corresponding subroutines defined by the rule designer to determine an appropriate modifier for the mark.
This is based mainly on the contextual structure of the analysis result.
c. Insert the modifier in the position of the translation marked by the marker.

For example, if a verb is in the '-ing' form, there is no time adverbial in the input, and the tense of the context is all progressive, then it ignores the time mark. If the predicate is 'be going to', then it translates it as 'dashuang', ignoring the time mark.

The world knowledge rules are defined in the same form as the semantic rules. The application of these rules is to test the context to find the semantic features of the situation, compare these with the world model definition given in the world knowledge rule to determine the situation of the utterance, and then determine the correct translation or extend the meanings of related words.

Every semantic processing mechanism mentioned above corresponds to a specific processing subroutine. These subroutines are called in the grammar analysis and transformation processing to perform the related semantic processing. (1) is primarily used in the case analysis, (2) and (3) are primarily used in checking the analysis result and in disambiguation in the analysis and transformation, and (4) is primarily used in the transformation.
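The distant contextual check of category (3) can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the IMT/EC mechanism: X and Y are treated as single category tags rather than syntagmas reduced from adjacent words, an offset of 0 is taken to mean the word immediately next to W, and the function name `context_holds` and the tag names are hypothetical.

```python
def context_holds(tags, w_index, left=None, right=None, m=None, n=None):
    """Check a rule of the form  X ... W ... Y [m,n].

    tags    -- list of category tags, one per word of the sentence
    w_index -- position of the word W in the sentence
    left    -- required category X before W (None: no requirement)
    right   -- required category Y after W (None: no requirement)
    m, n    -- optional offsets; 0 means adjacent to W, None means
               "any position before/after W"
    """

    def side_ok(target, offset, step):
        if target is None:
            return True
        if offset is not None:
            # Fixed relative position: look at exactly one word.
            pos = w_index + step * (offset + 1)
            return 0 <= pos < len(tags) and tags[pos] == target
        # Undefined offset: search every position on that side of W.
        candidates = tags[:w_index] if step < 0 else tags[w_index + 1:]
        return target in candidates

    return side_ok(left, m, -1) and side_ok(right, n, +1)


if __name__ == "__main__":
    sentence_tags = ["DET", "NP", "ADV", "W", "VP"]
    print(context_holds(sentence_tags, 3, left="NP", m=1))   # NP two words back
    print(context_holds(sentence_tags, 3, left="NP", m=0))   # adjacent word is ADV
    print(context_holds(sentence_tags, 3, left="DET"))       # anywhere before W
```

With m = n = 0 the same function covers the adjacent collocation patterns of category (2), which matches the remark that those are the special case of zero offsets.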
The grammar and word transformation algorithm is:
(1) Current-node <- root of the analysis tree.
(2) If the current-node is a leaf node, go to (4).
(3) The current-node is not a leaf node; it is processed as follows:
a. If all the elements in the transformation expression of the node are constants, go to (5).
b. If all the variables in the transformation expression of the node have been substituted by constants, call the semantic processing mechanism to create suitable modifiers, then go to (5).
c. If there are unsubstituted variables in the expression of the node, set these variables as the current-node one by one, and use the result returned by each subnode to replace the corresponding variable.
(4) When the current-node is a leaf node, that is, a specific word or an idiom, retrieve its definition from the IMT-KB and call the semantic processing mechanism to determine an appropriate meaning for it according to the tree structure.
(5) If the current-node is the root node, return the current form of the transformation expression as the translation of the sentence. Otherwise, return the expression to the parent node, restore the parent node as the current-node, and go to (2).

The main tasks in modifying the translation comprise:
a. Change the order of the phrases and words of the translation.
b. Replace words whose collocation is not commonly used in Chinese utterances with synonymous words.
c. Insert conjunctive words when necessary.
d. Eliminate redundant words.
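The grammar and word transformation algorithm in steps (1) to (5) above amounts to a post-order walk of the analysis tree, which can be sketched as follows. This is an illustrative sketch only: `Node`, `transform`, `choose_meaning` and `add_modifiers` are hypothetical names, recursion stands in for the explicit current-node bookkeeping and "go to" steps, variables are modelled as integer indexes into a node's children, and the romanized tokens are placeholder dictionary entries.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Union


@dataclass
class Node:
    """A node of the analysis tree.

    A leaf carries a word or idiom. An inner node carries a transformation
    expression whose elements are constants (str) or variables (int indexes
    into `children`).
    """
    word: str = ""
    expression: List[Union[str, int]] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def transform(node: Node,
              lexicon: Dict[str, List[str]],
              choose_meaning: Callable[[Node, List[str]], str],
              add_modifiers: Callable[[Node, List[str]], List[str]]) -> str:
    if node.word:
        # Step (4): leaf node, look up the entry and pick a meaning.
        return choose_meaning(node, lexicon[node.word])
    has_variables = any(isinstance(e, int) for e in node.expression)
    # Step (3c): replace each variable by the result of its subnode.
    parts = [
        transform(node.children[e], lexicon, choose_meaning, add_modifiers)
        if isinstance(e, int) else e
        for e in node.expression
    ]
    if has_variables:
        # Step (3b): all variables substituted, create modifiers.
        parts = add_modifiers(node, parts)
    # Step (5): hand the finished expression back to the caller (the parent
    # node, or the user when this is the root).
    return " ".join(parts)


# Hypothetical example: "books of students" with the target order reversed.
lexicon = {"books": ["shu"], "students": ["xuesheng"]}
tree = Node(expression=[1, "de", 0],
            children=[Node(word="books"), Node(word="students")])
translation = transform(tree, lexicon,
                        choose_meaning=lambda node, senses: senses[0],
                        add_modifiers=lambda node, parts: parts)
```

Passing the two semantic hooks as callables keeps the tree walk independent of how meanings and modifiers are actually chosen, which mirrors the separation between the transformation procedure and the semantic processing subroutines it calls.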
The algorithm for this processing is:
(1) According to the Chinese collocation rules defined in the IMT-KB, change the order of words and phrases in the translation that do not accord with the collocation conventions of Chinese, such as 'Budon ... Erchia ...'.
(2) According to the co-occurrence rules of Chinese words defined in the IMT-KB, check the uses of the Chinese words in the translation. If they do not accord with the co-occurrence rules, replace these words with Chinese synonyms until they accord with the rules. If there is no suitable synonym, try to extend the meaning of some words. The meaning-extending rules are defined in the word entries. Their form is as follows:

    <word> :- <condition 1> <extension 1>
              <condition 2> <extension 2>
              ...
              <condition n> <extension n>

Here <word> indicates the word that appears in the sentence, <condition> defines the extending conditions, and <extension> is the extended utterance. If the word cannot be replaced or extended, the source translation is simply returned.
(3) Check the translation to find redundant words and eliminate them. The form of the deletion rule is

    X Y X Z -> p(X), p(Y) | X Y Z

such as 'NP de NP de -> NP NP de'.

Since the modification has no absolute standard and requires a large amount of world knowledge, it is rather difficult to solve this problem all at once. In IMT/EC, we deal only with the simplest cases. More complex situations can be solved with the application and improvement of the system.
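Step (2) above, synonym replacement with a fallback to meaning extension, can be sketched as follows. This is a minimal sketch under stated assumptions rather than the system's own code: `repair_word`, `accords`, `FORBIDDEN` and the single-letter tokens are hypothetical, and the co-occurrence rules are reduced to a set of forbidden adjacent pairs purely for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Condition = Callable[[Sequence[str], int], bool]


def repair_word(words: Sequence[str], i: int,
                accords: Callable[[Sequence[str], int], bool],
                synonyms: Dict[str, List[str]],
                extensions: Dict[str, List[Tuple[Condition, str]]]) -> List[str]:
    """Make words[i] accord with the co-occurrence check if possible."""
    if accords(words, i):
        return list(words)
    # First try each synonym until one accords with the rules.
    for candidate in synonyms.get(words[i], []):
        trial = list(words)
        trial[i] = candidate
        if accords(trial, i):
            return trial
    # Otherwise apply the first extension rule whose condition holds.
    for condition, extension in extensions.get(words[i], []):
        if condition(words, i):
            trial = list(words)
            trial[i] = extension
            return trial
    # Neither replaceable nor extendable: return the source translation.
    return list(words)


# Hypothetical co-occurrence rules: adjacent pairs that must not occur.
FORBIDDEN = {("A", "B"), ("A", "B1")}


def accords(words: Sequence[str], i: int) -> bool:
    left_ok = i == 0 or (words[i - 1], words[i]) not in FORBIDDEN
    right_ok = i == len(words) - 1 or (words[i], words[i + 1]) not in FORBIDDEN
    return left_ok and right_ok


synonyms = {"B": ["B1", "B2"]}
extensions = {"B": [(lambda ws, i: i > 0 and ws[i - 1] == "A", "B-ext")]}

by_synonym = repair_word(["A", "B"], 1, accords, synonyms, extensions)
by_extension = repair_word(["A", "B"], 1, accords, {}, extensions)
unchanged = repair_word(["A", "B"], 1, accords, {}, {})
```

The three calls exercise the three outcomes in order: a synonym that satisfies the rules, an extension applied when no synonym is available, and the unchanged source translation when neither applies.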
The system is therefore designed to be easily extended as it is applied. The objective of the automatic modification of the translation is to improve the readability of the translation, but this sacrifices part of the accuracy; it is more suitable for translating non-scientific literature. If the user needs high-quality translation, he may call the post-editing subroutine to have the translation modified by human beings or with the aid of human beings. At the same time, the learning mechanisms can be set to working status to trace the human modification procedure and produce some useful rules for the system.

4. Summary

In conclusion, we have introduced the translation processing procedure of the English-Chinese machine translation system IMT/EC and described its principal processing algorithms.

Acknowledgement: We would like to thank Hang Xiong, Zhang Yujie, Ye Yimin, Tong Jioxion, Zong Liyi, Zhong Zife, Chen Zizong, Chen Zizeng and Fu Wei for their cooperation in the implementation of IMT/EC.
