English-Chinese Machine Translation System IMT/EC

Chen Zhaoxiong and Gao Qingshi
Institute of Computing Technology
Chinese Academy of Science
Beijing, PRC.
Abstract
IMT/EC is an English-Chinese machine translation system which integrates some outstanding features of the case grammar and semantic grammar into a uniform frame, uses various kinds of knowledge in the disambiguation, and tries to modify the object language by itself. In this paper, we first introduce IMT/EC's design motivation and overall architecture, then describe the design philosophy of its translation mechanisms and their processing algorithms.
1. The design motivation
The design of the IMT/EC system is motivated by the aim of developing new approaches to English-Chinese machine translation, such as providing the system with powerful analysis mechanisms and an MT knowledge base management system, as well as some exceptional processing and learning mechanisms, that is, making the system intelligent. In addition, it also tries to integrate as many advantages of conventional machine translation systems into a single system as possible, such as providing the system with powerful mechanisms for the processing of various ambiguities and contextual relations. The design of the IMT's translation mechanisms is based on the following considerations.
(1) SC-analysis
In the development of a machine translation system, in order to disambiguate the source language, we have to analyze the input deeply to get the internal meaning representation of the source language. However, the deeper we analyze the input, the more we lose the clues about how to express the translation; deep analysis also results in extremely poor or no translations of sentences for which complete analyses cannot be derived [Slocum 85]. To find a suitable analysis depth that both keeps the clues about how to express the translation of the input and disambiguates the input completely is almost impossible. In the IMT/EC, we try to design a simple grammar analysis mechanism, the SC-grammar analysis mechanism, which inherits the outstanding features of both case grammar analysis and semantic grammar analysis so as to produce a high quality translation.
(2) Multi-language translation oriented
Under present technical conditions, it is impossible to design a general internal meaning representation for all natural languages. Thus, a knowledge-based multi-language oriented machine translation system is difficult to bring to market in the near future. A feasible way of designing multi-language oriented machine translation systems might be to separate the processing mechanisms from the language specific rules [as King et al. 85], that is, to apply the same processing mechanism with different language specific rules for different natural language pair translations. In the IMT/EC, we develop a general rule representation form for the representation of the various kinds of knowledge used in the translation. Knowledge for different language pair translations is stored in different packages of the knowledge base IMT-KB. The knowledge base is organized in a multi-package and multi-level way so as to store rules for the translation of different language pairs and different phases of the processing. Thus, the system can be easily extended for multi-language translation purposes.
(3) Diversity processing
As the disambiguation rules are rather word specific, it is difficult to manage them in the same way. To deal with this problem, we store these rules in their respective word entries and classify them into several categories in the IMT/EC. Each category corresponds to a general subroutine application mechanism, which applies the word specific rules and subroutines in the processing of translation. The subroutines are stored in a natural language specific subroutine package. Some word specific subroutines are directly stored in the respective word entry.
(4) Powerful exceptional processing
Since natural language phenomena are so abundant that no existing machine translation system can process all of them, it is essential to provide an exceptional processing mechanism in the system to deal with exceptional phenomena. As IMT/EC incorporates some learning mechanisms, it is more powerful in dealing with exceptions than other systems.
(5) Automatic modification of the translation
Generally speaking, a machine translation system can only produce rigid translations. It is desirable that MT systems be able to modify their own output so as to produce more fluent translations. IMT/EC tries to apply some common sense knowledge and linguistic knowledge of the object language to disambiguate the input and modify the translations, and thus to improve the translation quality.
In the following sections, we focus on the translation procedure of the system and the algorithms related to it, ignoring the knowledge base organization and management mechanisms.
2. The overall architecture of the system
The architecture of the IMT/EC system is as follows.
[Figure: block diagram linking English Input, Morphological Analysis & Dictionary Retrieving, SC-Analysis, Disambiguation & Transfer, and Modification of the Translation with the knowledge base management system IMT-KB and its Knowledge Application, Knowledge Base Augmentation & Modification, and Knowledge Acquisition components.]
Fig. The architecture of the IMT/EC
The rule base and dictionary in a machine translation system are so vast that it is impossible for human beings to find the conflicts and implications among the rules. Modifying a rule in the knowledge base often results in many side effects on other rules. Thus, it is necessary to provide self-reorganization and refinement mechanisms in the knowledge base.
In the IMT/EC, we design a special knowledge base management system, IMT-KB, to manage all the knowledge used in the various processing phases of the translation. In addition, IMT/EC also provides a knowledge base augmentation and knowledge acquisition environment for the system to augment its performance by itself and for the users to improve the knowledge base.
The call relations connected by dotted lines in the figure above are executed only when the user sets the learning mechanisms in working status. These mechanisms can acquire new knowledge in dynamic interactive, static interactive, or disconnected ways. They are primarily used to resolve the exceptional phenomena in the translation.
Dynamic Interactive Learning (DIL): Whenever the system encounters a sentence out of its processing range, it produces various possible translations for each segment of the sentence and interacts with human beings when necessary to select an appropriate translation of each segment, then combines them to get a correct translation of the sentence. At the same time, it also creates some new rules to reflect the selections. That is, it learns some new knowledge.
Static Interactive Learning (SIL): Whenever the system encounters a sentence out of its processing range, it records the sentence and the context in which it appears in a file. After the text has been translated, it begins to analyze the sentence in detail to get various possible translations for each segment of the sentence and interacts with human beings when necessary to get appropriate translations of the segments, then combines them to get a correct translation of the sentence. At the same time, it also creates some new rules to reflect the selections, and thus learns new knowledge.
Disconnected Learning (DL): Whenever the system encounters a sentence out of its processing range, it analyzes the sentence in detail to get all the possible translations, then evaluates these translations according to the preference rules stored in the IMT-KB to select an appropriate translation and modifies the related rules used in the analysis to reflect the selection. Instead of interacting with human beings, it skips over sentences whose translation cannot be determined by the preference rules.
3. The translation procedure
IMT/EC's translation procedure is divided into several phases, i.e., morphology analysis and dictionary retrieving, SC-grammar analysis, disambiguation and transfer, modification of the translation, etc.
The communications between the translation mechanisms and the knowledge base are performed by the knowledge base management system IMT-KB. These operations include getting a set of related rules and returning some information for the modification as well as augmentation of the MT knowledge base.
3.1. Morphology analysis and dictionary retrieving
In the IMT/EC, the most commonly used words can be retrieved by either their base forms or their surface forms, while most other words can only be retrieved by their base forms. The tasks of the morphology analysis are to process prefixes, suffixes, and compound words. Since this processing is completely natural language specific, in order for the processing mechanisms to be language independent, we develop a language independent morphology analysis mechanism which applies the language specific morphology rules in the morphology analysis.
The morphology analysis rule form is
<surface pattern> -> <conditions> | <result>
Here,
<surface pattern> is the surface form of the word to be analyzed,
<conditions> is the application conditions of the rule,
<result> is the definition of the word base form analyzed.
For example,
(1) (* s) -> (verb *) | (def(*), SV)
(2) (* s) -> (noun *) | (def(*), PN)
(3) (*1 - *2) -> (word *1)(word *2) | ((def(morphology *1), def(morphology *2)), COM)
Here, *, *1 and *2 are variables which can be bound to any sub-character string of the word to be analyzed, def(X) is the definition of X in the IMT-KB, and SV, PN, COM are surface features of the word.
Rule (1) indicates that when the last character of the surface form of a word is 's' and the remaining character string * in the word is a verb, then its surface feature is the singular verb form (SV) of the verb *. Thus, it returns the value of (def(*), SV) as result.
Rule (2) indicates that when the last character of the surface form of a word is 's' and the remaining character string * in the word is a noun, then its surface feature is the plural noun form (PN) of the noun *. Thus, it returns the value of (def(*), PN) as result.
Rule (3) indicates that when the character string of a word comprises a character '-', and the left part *1 and the right part *2 of '-' are both words, then it is a compound word of *1 and *2. Thus, it applies the morphology rules to analyze the words *1 and *2, and returns the value of ((def(morphology *1), def(morphology *2)), COM) as result.
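The three example rules can be illustrated with a small sketch. It hard-codes the rules instead of interpreting a general rule form, so it shows their effect rather than the language independent mechanism itself. The toy `LEXICON`, the `analyze` function and the "BASE" feature for direct dictionary hits are assumptions made for illustration, and only the first '-' of a word is tried as a split point.

```python
# Toy dictionary: word -> part of speech. Entries are illustrative only.
LEXICON = {"run": "verb", "book": "noun", "machine": "noun", "made": "verb"}


def analyze(word):
    """Return a list of (base form(s), surface feature) analyses of a word."""
    if word in LEXICON:
        # Direct dictionary hit; "BASE" is a hypothetical feature name.
        return [(word, "BASE")]
    results = []
    if word.endswith("s"):
        stem = word[:-1]
        if LEXICON.get(stem) == "verb":   # rule (1): (* s) -> (verb *)
            results.append((stem, "SV"))
        if LEXICON.get(stem) == "noun":   # rule (2): (* s) -> (noun *)
            results.append((stem, "PN"))
    if "-" in word:                       # rule (3): (*1 - *2)
        left, right = word.split("-", 1)
        left_analyses, right_analyses = analyze(left), analyze(right)
        if left_analyses and right_analyses:
            # Both parts are words: recursive analysis, as in rule (3).
            results.append(((left_analyses, right_analyses), "COM"))
    return results
```

An empty result list plays the role of an unanalyzable word, for which the algorithm described next returns the word itself.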
Suppose that
$X indicates that X is a variable,
#X returns the character list of X,
&X returns the last character of X,
>X returns the first part of rule X or the first element of a list,
<X returns the remaining part of X, such that (>X ^ <X = X),
f(X,Y) returns the first different item pair of X and Y,
lookup(X) looks up the dictionary and returns the definition of the word X,
search(X) returns the morphology rules which include character X,
check(X) tests whether the two elements of the item pair X are unifiable or not,
null(X) tests whether list X is empty,
apply(g,X) returns the result of g(X),
t(X) tests whether result X needs further analysis and performs recursive analysis when necessary.
The algorithm for morphology analysis and dictionary retrieving is as follows.
INITIALIZE
  $X <- #word;
  $P <- search(& $X);
  $P <- $P U search(> $X);
  $result <- ( );
  for $rule in $P do
  {MATCH
     $PAT <- > $rule;
     $COND <- >< $rule;
     $RES <- << $rule;
   LOOP
     $pair <- f($PAT, $X);
     if (null($pair)) goto TEST;
     if (not(check($pair))) break;
     $PAT <- $PAT ^ $pair;
     $PAIRL <- $PAIRL U {$pair};
     goto LOOP;
   TEST
     for $CONDI in $COND do
       {$PROP <- lookup(>< ($CONDI));
        if (not(apply(> $CONDI, $PROP))) break;
       }
   RESULT
     $PROP <- lookup(> ($RES ^ $PAIRL));
     $result <- $result U {($RES ^ $PAIRL, $PROP)};
  }
  if (null($result)) return word
  else return t($result);
END.
3.2. SC-Grammar Analysis
The SC-grammar analysis mechanism of IMT/EC applies the SC-rules stored in the IMT-KB to disambiguate the structural ambiguities of the input sentences and produces the structural description for them. The grammar has some outstanding features of the case grammar and semantic grammar. The rule form is as follows,
<S-STRUCTURE> -> <S-ENVIRONMENT> | <R-STRUCTURE>, <R-ENVIRONMENT>, <TRANSFER>.
Here,
<S-STRUCTURE> and <S-ENVIRONMENT> are rule conditions which define the current structural form and contextual features of the input,
<R-STRUCTURE> and <R-ENVIRONMENT> are the result structural form and contextual features of the input,
<TRANSFER> is the transformation related to the rule.
The structural forms, <S-STRUCTURE> and <R-STRUCTURE>, are represented as strings of syntagmas and words. The contextual environments, <S-ENVIRONMENT> and <R-ENVIRONMENT>, are represented as vectors, of which each element corresponds to an inter-sentential relation or a special case. Their values are used to resolve ellipsis, anaphora, tense, aspect, etc. This is the principal contextual processing mechanism in the IMT/EC.
Since the contextual vector is used only as a supplement to the pure semantic grammar analysis, especially in the processing of contextual relations, it is not necessary to analyze the input to the extent that one can get all the semantic relations of the input. Thus, the vector processing formalism is completely acceptable.
Two example rules are as follows,
NP VP -> A | S, change(B1,X), INP IVP.
in NP -> A1 | PP, change(B2,X), zai INP nuei.
The SC-grammar analysis mechanisms receive the results of morphology analysis or of the previous SC-reduction, send messages to the IMT-KB to get related rules, and apply these rules to reduce the input until the non-terminal symbol S is reduced, thus producing the structural description of the input.
The SC-grammar analysis algorithm of the system is:
(1) In the entries of the IMT-KB dictionary, we store not only the word meanings and their disambiguation conditions, but also the SC-phrase and semantic rules specific to the entry word. When analyzing a sentence, the system first retrieves the SC-phrase rules specific to the words appearing in the sentence, and applies these rules to find a list of possible phrases of the sentence from the context of the words in the sentence. The phrase list returned is as follows,
X_1(i_1, j_1) X_2(i_2, j_2) ... X_m(i_m, j_m).
Here, X_1, X_2, ..., X_m are phrase syntagma identifiers, and i_1, i_2, ..., i_m and j_1, j_2, ..., j_m are the starting and ending positions of the phrases in the input sentence.
(2) Find a list of expectation paths from the phrase list. Each path is either a chain of adjacent phrases starting at the current analysis position,
X_a(t, j) X_b(j+1, k) ... X_c(m+1, n),
or the word sequence
P(w_t) P(w_t+1) ... P(w_t+l)
(Here, P(w) is the word w itself or its property, t is the current analysis position, which is initially the beginning of the sentence, and l is the expectation length defined by the user), and order them by the phrase ending positions n_1, n_2, ... from larger to smaller. These paths are used as heuristics in the analysis of the sentence. We try one new path at each backtracking.
(3) Send the analyzed component
M = V_1(...) V_2(...) ... V_k(...)
and the current expectation path to the IMT-KB to retrieve the SC-rules whose head patterns contain a sub-string of
M Path = V_1(...) ... V_k(...) X_1(...) ... X_l(...)
or
V_1(...) ... V_k(...) P(w_t) ... P(w_t+l)
and organize these rules in a list according to their preferences from higher to lower. Then, it takes one rule from the rule list at each backtracking and goes to (4) to apply the rule to reduce the input.
If no rule in the list can be successfully applied to reduce the input, the system gets the next expectation path from (2) and repeats (3). If all the expectation paths have been tried and no rule has been successfully applied, it returns to the last analysis position to re-analyze the input. If the current analysis position is the beginning of the sentence, the system calls the exception processing mechanism to deal with this un-analyzable sentence.
(4) Match the rule head pattern with the current form of the input sentence. If there is a sub-pattern of the current sentence pattern that can match the rule head, then go to (5), else get the next rule from (3) and try to re-match.
(5) First, add the newly formed phrases into the phrase list for use in the backtracking of the analysis, then call the case analysis mechanism to check the current analysis results and the current form of the sentence to fill in the case frames A, B in the rule and the context vector. The case analysis algorithm is described in the following paragraphs.
(6) Check A and the context vector to see whether their values are unifiable. If they are unifiable, then go to (7), else get the next rule from (3) and return to (4).
(7) Store the backtracking information into the temporary stack, substitute the reducing part of the current sentence form with the reduced form, change the current analysis position to the last word of the newly reduced syntagma, and change the related element values of the context vector according to the element values of B.
If the current position is not the end of a sentence, then go to (2).
If the current position is the end of a sentence and the current form of the sentence is not S, then go to (2).
If the current position is the end of a sentence and the current form of the sentence is S, then go to (8).
(8) Call the semantic processing mechanism to check the result of the analysis to see whether it violates the English collocation rules. If the result violates the collocation rules, the system recovers to the status before the last reduction and gets the next rule from (3) to re-analyze the input. Otherwise, there are two cases,
a. If the user only needs the most adequate translation, the system proceeds to analyze the next sentence.
b. If the user needs all possible translations, the system records the current result, recovers to the status before the last reduction, and gets the next rule from (3) to re-analyze the input in order to get other analysis results.
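The control structure of steps (3) to (7), that is, taking rules in preference order, reducing a matching sub-pattern, and backtracking when a reduction leads nowhere, can be sketched as a depth-first search. This is a simplified illustration under stated assumptions, not the IMT/EC algorithm: the function `reduce_to_s` and its rule format are hypothetical, rule order stands in for rule preference, and the expectation paths, case frames, context vector and collocation check are left out.

```python
def reduce_to_s(symbols, rules, depth=0, max_depth=20):
    """Depth-first search for a sequence of reductions ending in ('S',).

    symbols: tuple of category names (the current sentence form).
    rules:   list of (head, result) pairs; head is a tuple of categories.
    Returns the list of applied rules, or None if every path fails.
    """
    if symbols == ("S",):
        return []
    if depth >= max_depth:          # guard against non-shrinking rule cycles
        return None
    for head, result in rules:
        n = len(head)
        for i in range(len(symbols) - n + 1):
            if symbols[i:i + n] == head:
                # Substitute the matched part with the reduced form.
                reduced = symbols[:i] + (result,) + symbols[i + n:]
                rest = reduce_to_s(reduced, rules, depth + 1, max_depth)
                if rest is not None:
                    return [(head, result)] + rest
                # Otherwise backtrack and try the next match or rule.
    return None
```

For instance, with the rules `NP V -> X`, `V NP -> VP` and `NP VP -> S` tried in that order, the input `NP V NP` first reduces to `X NP`, fails, and is then re-analyzed through `V NP -> VP`.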
As we have mentioned before, the case analysis in the SC-analysis is only a complement to the semantic analysis. It is mainly used to deal with context relations and aspect, tense, modality, etc. Thus, the system only needs to analyze those cases which can be used for those purposes. It is much simpler than the case analysis in case grammar analysis.
The case analysis in the SC-analysis is performed by the following algorithm,
(1) Get the case expressions defined in the elements of vectors A and B. The form of the element expressions of A and B is
S#[i]:E
Here, S#[i] indicates that the element with case identifier S# of the case frame A or B corresponds to the case identifier S[i] of the system case frame, i.e., the system context vector. E is the expression used to get the value of the respective case.
(2) Retrieve the definitions of the case identifiers from the system case frame and organize these case identifiers into a list according to their preferences from higher to lower. The form is,
(S[i1].subject:E1, S[i2].object:E2, ..., S[im].Em, ...)
(3) Evaluate the values of the elements in the case identifier list, and fill them into the respective positions in the case frames A and B. There are several cases in the evaluation.
a. If E is a constant, return E.
b. If E is empty, evaluate the case value according to the definition of the case identifier.
c. If the case identifier is a syntagma identifier, then find the value of the identifier from the analyzed input according to the heuristics provided by the expression E.
d. If the case identifier is a semantic identifier, then call the semantic mechanism to get the value which can be filled into the case identifier from the input according to the heuristics provided by the expression E.
e. For other case identifiers, call their respective subroutines to get the value of the case. These subroutines are defined by the rule designer.
The case analysis in the SC-analysis can solve ellipsis, anaphora, and other contextual problems.
3.3. Semantic disambiguation and transformation
The SC-rules define not only the relations for the syntagma reduction, but also the contextual vector value changes with respect to the reduction of a sentence, and the transformations related to the rules.
The transformation operation defined in the SC-rule is in the following form,
IX_1 IX_2 ... IX_n
Here, IX_1, IX_2, ..., IX_n are the translations of the syntagmas X_1, X_2, ..., X_n in the rule head. Their positions indicate the positions of the translations of the syntagmas. There may also be some indicators in the string which mark the positions in the translation for inserting tense, voice, and modal modifiers. These indicators are used as heuristics for the semantic processing.
The transformation in the IMT/EC is relatively simple. It traverses the whole analysis tree from top to bottom and from left to right, transferring every node as it is visited. The result of the transformation is the Chinese utterance of the sentence.
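The top-down, left-to-right transfer can be sketched as a recursive walk in which every internal node carries a transformation template. This is a minimal sketch under assumptions, not the system's data structures: the `Node` class and `transfer` function are hypothetical, a template is a list mixing child indices with literal target words, and the gloss "fangjian" is a placeholder chosen for illustration. The example template is modelled on the second example rule, which wraps the translation of NP between two Chinese words.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    # Internal nodes: child indices (int) mixed with literal target words (str).
    template: List[Union[int, str]] = field(default_factory=list)
    translation: str = ""   # used by leaf nodes only


def transfer(node: Node) -> List[str]:
    """Return the target-language word list for the subtree rooted at node."""
    if not node.children:
        return [node.translation]
    out: List[str] = []
    for item in node.template:
        if isinstance(item, int):
            out.extend(transfer(node.children[item]))   # translation of a child
        else:
            out.append(item)                            # literal target word
    return out


# "in NP" -> zai <translation of NP> nuei; the leaf gloss is a placeholder.
pp = Node("PP",
          children=[Node("in"), Node("NP", translation="fangjian")],
          template=["zai", 1, "nuei"])
print(" ".join(transfer(pp)))   # prints "zai fangjian nuei"
```

Because the template decides both order and literal insertions, reordering between English and Chinese needs no separate mechanism in this sketch.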
Rules with the same head pattern may have different case frames A and B. In this case, they may correspond to different transformation operations. These rules are defined as two different rules by the rule designer, while in the IMT-KB the system stores them as one rule with many candidate right patterns. Whenever the head pattern is successfully matched, the system sequentially checks these candidates until one of them is satisfied, and records the current successful position so that the backtracking mechanism can get the other candidates when necessary.
The tasks of the semantic processing in the IMT/EC are to check the results of the analysis to see whether they satisfy the syntactic or semantic collocation rules defined in the IMT-KB, and to produce suitable modifiers for expressing tense, voice, aspect and so on in Chinese. In some cases, it also applies the well-formed world knowledge defined in the IMT-KB to eliminate some illegal expressions and extend the meanings of some ambiguous words.
Since the SC-analysis is based on semantic grammar analysis, most of the syntactic and semantic ambiguities are solved in the reduction operations. Even though the case analysis in the SC-analysis is aimed mainly at resolving contextual problems, it can also solve some ambiguities within a sentence. That is, the semantic processing in the IMT/EC is oriented to specific ambiguities and inter-sentential case value evaluations. Though the processing differs in different phases of the translation, it can be categorized as follows,
(1) determining the value of a specific semantic identifier, such as time adverbial, place adverbial, anaphora, etc.
When a specific semantic identifier is concerned, the semantic processing mechanism first finds the key word in the sentence which can match the semantic identifier, such as a word with time or place properties, then gets the phrase which comprises the key word in the sentence, and returns the phrase as the value of the identifier.
Only simple anaphora phenomena are considered in the IMT/EC. They are processed in two different ways. One is to compare synonyms to find the anaphoric words. The other is to find the suitable anaphoric content through position relations; for example, in some specific contexts the word 'which' can refer to the noun phrase immediately before it.
(2) checking the collocation of syntagmas.
There are three possible categories of collocation in the analysis results,
<1> X W -> (W => C_1)
<2> W Y -> (W => C_2)
<3> X W Y -> (W => C_3)
Here, X, Y may be strings of words or syntagmas, W is a specific word. The above expressions mean that,
<1> W appears after string X and functions as speech C_1,
<2> W appears before string Y and functions as speech C_2,
<3> W appears between strings X and Y and functions as speech C_3.
The related word definition in the IMT-KB dictionary is as follows,
W := C_1 (E => M)
     C_2 (E => M)
         (E => M)
     ...
     C_m (E => M)
Here, C is the speech category, E is the context structure of word W, M is the meaning of word W.
The semantic processing mechanism retrieves the collocation rules specific to the words of the sentence from the IMT-KB, and applies these rules to check the analysis result to see whether there is any violation between the analysis result and the collocation rules. If there is, it returns fail.
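The collocation check can be illustrated with a small sketch. It simplifies the rule forms <1> to <3> by letting X and Y each be the category of a single neighbouring token rather than a string of words or syntagmas. The function `violates_collocation` and the rule triples `(left, right, category)` are hypothetical names, with `None` meaning that the corresponding side is unconstrained.

```python
def violates_collocation(tokens, categories, rules):
    """Return True if the analysis contradicts a word-specific collocation rule.

    tokens:     list of words in the sentence.
    categories: category the analysis assigned to each token.
    rules:      dict word -> list of (left, right, category) triples; the word
                must have `category` whenever its neighbours match left/right.
    """
    for i, word in enumerate(tokens):
        for left, right, cat in rules.get(word, []):
            left_ok = left is None or (i > 0 and categories[i - 1] == left)
            right_ok = right is None or (
                i + 1 < len(tokens) and categories[i + 1] == right)
            if left_ok and right_ok and categories[i] != cat:
                return True   # context matches but the category does not: fail
    return False
```

A `True` result corresponds to the "fail" returned by the semantic processing mechanism, after which the analysis recovers to the status before the last reduction.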
(3) checking the distant contextual relations.
There are also three possible categories of distant contextual relations appearing in a sentence,
X ... W [m] -> (W => C_1)
W ... Y [n] -> (W => C_2)
X ... W ... Y [m,n] -> (W => C_3)
Here, X, Y, W, C have the same meanings as in (2). n, m are optional; they define the relative position between the word W and X/Y. When n, m = 0, these are the cases described in (2). When n, m are not defined, they indicate any position before/after the word W. These distant contextual relation rules are defined in the IMT-KB in the same way as in (2).
If m and/or n are present, the semantic processing mechanism finds the m-th/n-th word before/after the word W in the sentence, and tries to reduce that word and its adjacent words into X or Y. If they can be reduced, and the word W functions as the same category as defined in the rule, then it succeeds, else it eliminates the analysis.
If m and n are not defined, then it tries to find a word before/after the word W which can be reduced into X or Y together with its adjacent words. If there is no such element in the sentence, then it returns fail.
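The role of the optional positions m and n can be sketched as follows. The sketch assumes that m and n count the tokens lying between the context element and W, so that 0 means adjacent, and it treats X and Y as single-token categories instead of reducing adjacent words into a syntagma. Both assumptions, and the function name `matches_distant`, are illustrative and are not taken from the IMT/EC definition.

```python
def matches_distant(cats, w, x=None, m=None, y=None, n=None):
    """Check the context of the word at index w in the category list cats.

    x / y: required category before / after W (None = no requirement).
    m / n: number of tokens between that element and W (None = any distance).
    """
    def side(target, gap, positions):
        # positions lists candidate indices, nearest to W first.
        if target is None:
            return True
        if gap is None:
            return any(cats[p] == target for p in positions)
        return gap < len(positions) and cats[positions[gap]] == target

    before = list(range(w - 1, -1, -1))
    after = list(range(w + 1, len(cats)))
    return side(x, m, before) and side(y, n, after)
```

With m = n = 0 the check degenerates to the adjacent collocation of (2), which matches the relationship between the two rule families described above.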
(4) creating Chinese modifiers to express tense, voice, modality and so on, and inserting these modifiers into the translation according to the position marks appearing in the rule.
The processing procedure is as follows,
a. Get the marks of the tense, voice, modality, etc.
b. Call the corresponding subroutines defined by the rule designer to determine an appropriate modifier for the mark. This is based mainly on the contextual structure of the analysis result.
c. Insert the modifier in the position of the translation marked by the marker.
For example, if a verb is in the '-ing' form, there is no time adverbial in the input, and the tenses of the context are all progressive, then it ignores the time mark. If the predicate is 'be going to', then it translates it as 'dashuang', ignoring the time mark.
The world knowledge rules are defined in the same form as the semantic rules. These rules are applied by testing the context to find the semantic features of the situation and comparing them with the world model definition given in the world knowledge rule, in order to determine the situation of the utterance and then determine the correct translation or extend the meanings of related words.
Every semantic processing mechanism mentioned above corresponds to a specific processing subroutine. These subroutines are called during grammar analysis and transformation processing to perform the related semantic processing. (1) is primarily used in the case analysis; (2) and (3) are primarily used in checking the analysis result and in disambiguation during analysis and transformation; (4) is primarily used in the transformation.
The grammar and word transformation algorithm is:
(1) current-node <-- root of the analysis tree.
(2) If the current-node is a leaf node, go to (4).
(3) The current-node is not a leaf node. The processing is as follows:
a. If all the elements in the transformation expression of the node are constants, go to (5).
b. If all the variables in the transformation expression of the node have been substituted by constants, call the semantic processing mechanism to create suitable modifiers. Go to (5).
c. If there are some unsubstituted variables in the expression of the node, set these variables as the current-node one by one, and use the results returned by each subnode to replace the variables.
(4) When the current-node is a leaf node, that is, a specific word or an idiom, retrieve its definition from the IMT-KB and call the semantic processing mechanism to determine an appropriate meaning for it according to the tree structure.
(5) If the current-node is the root node, return the current form of the transformation expression as the translation of the sentence. Otherwise, return the expression to the parent node and restore the parent node as the current-node. Go to (2).
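The traversal in steps (1) to (5) can be sketched recursively in Python. This is only an illustration under stated assumptions: the `Node` class, the `$name` variable syntax, the `lexicon` dictionary standing in for the IMT-KB, and the `pick_meaning` and `add_modifiers` callbacks standing in for the semantic processing mechanism are all hypothetical. Recursion replaces the explicit current-node bookkeeping of the description above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Node:
    """One node of the analysis tree (hypothetical layout)."""
    word: Optional[str] = None                            # set on leaf nodes only
    expression: List[str] = field(default_factory=list)   # constants and "$name" variables
    children: Dict[str, "Node"] = field(default_factory=dict)


def transform(node: Node,
              lexicon: Dict[str, List[str]],
              pick_meaning: Callable[[Node, List[str]], str],
              add_modifiers: Callable[[Node, List[str]], List[str]]) -> str:
    # Step (4): a leaf is a word or idiom; look it up and choose a meaning.
    if node.word is not None:
        return pick_meaning(node, lexicon[node.word])

    # Step (3c): replace each variable by the result returned by its subnode.
    parts: List[str] = []
    had_variables = False
    for element in node.expression:
        if element.startswith("$"):
            had_variables = True
            child = node.children[element[1:]]
            parts.append(transform(child, lexicon, pick_meaning, add_modifiers))
        else:
            parts.append(element)  # constant element, copied as is

    # Step (3b): once all variables are substituted, create modifiers.
    if had_variables:
        parts = add_modifiers(node, parts)

    # Step (5): hand the finished expression back to the caller, which is
    # the parent node or, for the root, the user of the translation.
    return " ".join(parts)


# Illustrative data only: "the book on the table" with made-up rules.
lexicon = {"table": ["zhuozi"], "book": ["shu"]}
pp = Node(expression=["$np", "shang"], children={"np": Node(word="table")})
np_node = Node(expression=["$pp", "de", "$head"],
               children={"pp": pp, "head": Node(word="book")})


def first_meaning(node: Node, candidates: List[str]) -> str:
    return candidates[0]


def no_modifiers(node: Node, parts: List[str]) -> List[str]:
    return parts


result = transform(np_node, lexicon, first_meaning, no_modifiers)
print(result)
```

The example also shows why the transformation expression mixes constants and variables: the constant "de" and the order of `$pp` and `$head` encode the Chinese word order, while the variables are filled bottom-up from the subnodes.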
The main tasks comprise:
a. Change the order of the phrases and words of the translation.
b. Substitute synonymous words for words whose collocation is not commonly used in Chinese utterances.
c. Insert conjunctive words when necessary.
d. Eliminate redundant words.
The algorithm for this processing is:
(1) According to the Chinese collocation rules defined in the IMT-KB, change the order of those words and phrases of the translation which are not in accord with the collocation conventions of Chinese, such as 'Budon ... Erchia ...'.
(2) According to the co-occurrence rules of the Chinese words defined in the IMT-KB, check the uses of the Chinese words in the translation. If they are not in accord with the co-occurrence rules, replace these words with Chinese synonyms until they accord with the rules. If there are no suitable synonyms, try to extend the meaning of some words. The meaning extending rules are defined in the word entries.
Their form is as follows:
<word> :- <condition 1> <extension 1>
          <condition 2> <extension 2>
          ...
          <condition n> <extension n>
Here, <word> indicates the word that appears in the sentence, <condition> defines the extending conditions, and <extension> is the extended utterance. If the word cannot be replaced or extended, the source translation is simply returned.
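Step (2) and its fallbacks can be sketched as a three-stage repair. This is a minimal sketch with assumed names: the `GOOD_PAIRS` table, the `allowed` test, the `synonyms` dictionary and the `(condition, extension)` tuples are hypothetical stand-ins for the co-occurrence rules and word entries of the IMT-KB, and the sample words are illustrative only.

```python
from typing import Callable, Dict, List, Tuple

Condition = Callable[[List[str], int], bool]
ExtensionRule = Tuple[Condition, str]  # (<condition k>, <extension k>)

# Illustrative co-occurrence table: (verb, object) pairs accepted in Chinese.
GOOD_PAIRS = {("dai", "maozi"), ("chuan", "yifu")}


def allowed(words: List[str], i: int, candidate: str) -> bool:
    """Hypothetical co-occurrence test for `candidate` at position i."""
    return i + 1 >= len(words) or (candidate, words[i + 1]) in GOOD_PAIRS


def repair_word(words: List[str], i: int,
                synonyms: Dict[str, List[str]],
                extensions: Dict[str, List[ExtensionRule]]) -> str:
    """Return the word to use at position i of the translation."""
    word = words[i]
    if allowed(words, i, word):
        return word                       # already accords with the rules
    for candidate in synonyms.get(word, []):
        if allowed(words, i, candidate):
            return candidate              # replaced by a suitable synonym
    for condition, extension in extensions.get(word, []):
        if condition(words, i):
            return extension              # meaning extended by a word-entry rule
    return word                           # nothing fits: keep the source translation


synonyms = {"chuan": ["dai"]}
fixed = repair_word(["chuan", "maozi"], 0, synonyms, {})
print(fixed)
```

The order of the three stages mirrors the text: synonyms are tried before meaning extension, and the original word is returned unchanged when neither applies.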
(3) Check the translation to find redundant words and eliminate them. The form of a deletion rule is
X V X Z -> p(X), p(V) | X V Z
such as 'NP de NP de -> NP NP de'.
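The quoted example rule can be sketched as a rewrite over category-tagged tokens. The sketch below implements only the example 'NP de NP de -> NP NP de'; it does not model the general rule form or its predicates p(...). The `(category, surface form)` token layout and the sample phrase are assumptions made for illustration.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (category, surface form), a hypothetical layout


def drop_redundant_de(tokens: List[Token]) -> List[Token]:
    """Apply 'NP de NP de -> NP NP de' wherever the pattern occurs."""
    out = list(tokens)
    i = 0
    while i + 3 < len(out):
        categories = [category for category, _ in out[i:i + 4]]
        if categories == ["NP", "de", "NP", "de"]:
            del out[i + 1]   # remove the first 'de', keep the second
        else:
            i += 1           # no match here, move one token to the right
    return out


phrase = [("NP", "wo"), ("de", "de"), ("NP", "pengyou"),
          ("de", "de"), ("NP", "shu")]
cleaned = [surface for _, surface in drop_redundant_de(phrase)]
print(" ".join(cleaned))
```

Each loop iteration either deletes a token or advances the index, so the scan always terminates, including on chains of several 'NP de' pairs.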
Since the modification has no absolute standard and requires a large amount of world knowledge, it is rather difficult to solve this problem in one day. In the IMT/EC, we only deal with the simplest cases. More complex situations can be solved with the application and improvement of the system. Thus, the system is designed to be easily extended as it is applied.
If the user needs high quality translation, the post-editing subroutine may be called to modify the translation by human beings or with the aid of human beings. At the same time, the learning mechanisms can be set to working status to trace the human modification procedure and produce useful rules for the system.
IV. Summary
In conclusion, we have introduced the translation processing procedure of the English-Chinese machine translation system IMT/EC and described its principal processing algorithms.
Acknowledgement: We would like to thank Hang Xiong, Zhang Yujie, Ye Yimin, Tong Jioxion, Zong Liyi, Zhong Zife, Chen Zizong, Chen Zizeng and Fu Wei for their cooperation in the implementation of IMT/EC.
The modification of the translation
The objective of the automatic modification of the translation is to improve the readability of the translation, but this sacrifices part of the accuracy. It is more suitable for non-scientific literature translation.