A CHINESE NATURAL LANGUAGE PROCESSING SYSTEM BASED UPON THE THEORY OF EMPTY CATEGORIES

From: AAAI-86 Proceedings. Copyright ©1986, AAAI (www.aaai.org). All rights reserved.

Long-Ji Lin*, James Huang**, K.J. Chen*** and Lin-Shan Lee*

*Dept. of Electrical Engineering, National Taiwan University, Taiwan, R.O.C.
**Dept. of Modern Languages and Linguistics, Cornell University, U.S.A.
***Institute of Information Science, Academia Sinica, Taiwan, R.O.C.

ABSTRACT

In this paper, we will present a device specially designed on the basis of the theory of empty categories. This device cooperates with a bottom-up parser and is used as an elegant and efficient approach to treat the troublesome problems of the transformations of passivization, relativization, topicalization, ba-transformation and the use of zero pronouns in Chinese natural language. With the aid of the device, the grammar rules for Chinese will be much more simplified and easier to design, and the processing capability can be significantly improved.
I INTRODUCTION

Passivization, ba-transformation, relativization, topicalization, and the use of zero pronouns play major roles in Chinese. To deal with these syntactic phenomena, the conventional approach is to collect a set of grammar rules that covers all the possible sentence patterns derived from those transformations. But such an approach needs a very large set of grammar rules to cover all the possibilities. In particular, the complexity resulting from the interactions of several transformations makes such an approach infeasible.

The approach adopted in this paper is the use of the raise-bind mechanism, based upon the theory of empty categories. At first sight, the above syntactic phenomena seem unrelated to each other. However, the sentences derived from them all involve the use of empty categories. With the raise-bind mechanism, the parser treats all these transformations in the same way.

The following sections first briefly describe our parsing algorithm, and then discuss empty categories in Chinese and how the raise-bind mechanism operates.

II THE PARSING ALGORITHM

In the SASC system presented here, Chinese sentences are syntactically analyzed from the viewpoint of generative grammar (Huang, 1982).

The SASC system uses a bottom-up parser instead of a top-down parser, because the former tends to be more efficient for Chinese sentence analysis. The parser uses charts (Kay, 1973; Kaplan, 1973) as global working structures, because many natural language processing systems, such as MIND (Kay, 1973) and GSP (Kaplan, 1973), have proved the chart to be an efficient data structure for recording what has been done so far in the course of parsing. A parser based on charts can avoid the inefficiency of duplicating many computations, which a top-down parser often suffers when backtracking occurs.

The input Chinese sentence is submitted to a preprocessor, which segments the input sentence (a sequence of Chinese characters) into words. The result of the preprocessor is represented by a chart and is sent to the parser.

The parser builds phrases up on the chart by starting with their heads and adjoining constituents on the left or the right of the heads. For example, according to the phrase structure rule (PSR) "NP -> QP N", N (noun) is the head of NP. When encountering a noun, the parser will try to build an NP by starting with the noun and adjoining the preceding quantity phrase (QP). According to the PSR "VP -> V-n NP", V-n (transitive verb) is the head of VP. When encountering a transitive verb, the parser's action is similar to that for "NP -> QP N", except that it tries to adjoin the following NP as its object. But if the following NP has not yet been parsed, the expectation to build a VP is suspended until an NP is built up in the object position.

The parser using the above algorithm constructs the syntax trees of input sentences strictly from bottom to top. The algorithm seems to be a good combination of data-driven parsing and hypothesis-driven parsing. The implementation of the parsing algorithm and the grammar used to model Chinese syntax can be found in (Lin et al., 1986).

III EMPTY CATEGORIES

Let's consider the following Chinese sentences.

(1) he hurt Chang-san
(2) ba-transformation:
    he ba Chang-san hurt e
    (He hurt Chang-san)

(3) passivization:
    Chang-san by him hurt e
    (Chang-san was hurt by him)

(4) topicalization:
    that dog I never have seen e
    (I have never seen that dog)

(5) relativization:
    e playing de children
    (the children who were playing)

(6) pivot construction:
    Chang-san tried e escape
    (Chang-san tried to escape)

(7) he asked children e go to dinner
    (He asked the children to go to dinner)

(8) using zero pronoun:
    Chang-san likes e
    (Chang-san likes someone or something)

Sentences (2)-(8) all involve a missing subject or object (indicated by "e"). But what does each missing subject or object refer to? The solid lines under sentences (2)-(7) indicate the reference of each one. The missing object in (8), however, does not refer to any element within (8). In fact, it is an omitted pronoun, which refers to someone or something understood in the situation.

According to current linguistic theory (Chomsky, 1981; Huang, 1982), sentence (2) is derived from sentence (1) by a transformation called "move α". The transformation is performed as follows: the object, "Chang-san" in (1), is moved by the carrier "把" ("ba") to the position indicated in (2), and leaves behind a "trace" (indicated by "e"). The trace dominates no lexical material, but is "bound" to its antecedent, "Chang-san". In addition to ba-transformation, passivization, topicalization and relativization can also be analyzed as involving some form of "move α". Thus there are traces within these constructions.

Sentences (6)-(8) also contain vacant NP-positions, which are not traces, because they are not derived from "move α". They are called "null pronominals". Null pronominals are in general free, as in sentence (8). But those in certain constructions are bound, as in sentences (6) and (7). Sentence (7) is called a pivot construction; that is, the object of the first verb is also the subject of the second verb. So, the null pronominal in the subject position is "bound" to the object.

Traces and null pronominals are known as empty categories (or empty NPs). The syntactic behavior of null pronominals is different from that of traces. They are, however, treated indiscriminately in our implementation.

IV THE RAISE-BIND MECHANISM

The raise-bind mechanism is used to cope with empty categories; in other words, to find the antecedent of each empty category except those which are free (e.g. sentence (8)). With the aid of the raise-bind mechanism, the parser generates an empty NP and inserts it into the vacant position where an NP is expected to appear. The empty NP is then raised up along the parsing tree as the tree grows (recall that the parser works bottom-up), until its antecedent is parsed. At this point, the parser binds the empty NP by setting it to refer to its antecedent. Once bound, the empty NP will not be raised any further; this is because every empty NP has exactly one antecedent and cannot be bound more than once.

Not every NP position can be filled by an empty category. In Chinese, empty categories appear only in the subject position and the direct object position, and never in the indirect object position or the prepositional object position.
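The generation step described above can be sketched as follows. This is a minimal illustration, not the SASC implementation: the function name `fill_vacant_positions`, the dictionary representation of an empty NP, and the position labels are all assumptions introduced here. It only shows the restriction that an empty NP may be generated for a vacant subject or direct object, and never for an indirect or prepositional object.

```python
# Positions where, according to the text, a Chinese empty NP may appear.
# The label strings are illustrative assumptions.
EMPTY_ALLOWED = ("subject", "object")


def fill_vacant_positions(expected, found):
    """Fill each expected NP position; generate an empty NP where allowed.

    expected: list of position names the head (e.g. a verb) expects.
    found:    dict mapping position name -> NP already parsed.
    Returns (filled, raised): the completed argument map and the list of
    newly generated, still unbound empty NPs to be raised up the tree.
    """
    filled, raised = {}, []
    for position in expected:
        if position in found:
            filled[position] = found[position]
        elif position in EMPTY_ALLOWED:
            empty = {"empty": True, "origin": position, "antecedent": None}
            filled[position] = empty
            raised.append(empty)
        else:
            # No empty NP may stand in this position, so the parse fails.
            raise ValueError(f"no NP found for position: {position}")
    return filled, raised
```

For a sentence such as (8), calling the function with an expected subject and object but only a parsed subject would yield one empty NP marked as coming from the object position.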
In our implementation, an empty NP contains three fields: (1) a field to keep the pointer to its antecedent, (2) a field to keep where it came from, and (3) a field to keep the syntactic or semantic constraints on the empty NP for later checking.
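One possible rendering of this three-field record is sketched below. The class name `EmptyNP`, the field names, and the choice to model constraints as predicate functions are assumptions made for illustration; the paper does not specify how the fields are represented.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class EmptyNP:
    # Field (2): the position the empty NP came from, e.g. "subject".
    origin: str
    # Field (1): pointer to the antecedent; None while still unbound.
    antecedent: Optional[object] = None
    # Field (3): constraints kept for later checking, modelled here as
    # predicates over a candidate antecedent (an assumption).
    constraints: List[Callable[[object], bool]] = field(default_factory=list)

    @property
    def bound(self) -> bool:
        return self.antecedent is not None

    def bind(self, candidate: object) -> bool:
        """Bind to candidate if every stored constraint accepts it.

        An empty NP has exactly one antecedent, so a second binding
        attempt is an error.
        """
        if self.bound:
            raise ValueError("empty NP is already bound")
        if not all(check(candidate) for check in self.constraints):
            return False
        self.antecedent = candidate
        return True
```

Keeping the constraints on the record itself lets the check be deferred until a candidate antecedent is actually parsed, which matches the "for later checking" wording above.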
We can informally formulate the rules for relativization as follows: for a noun and a relative clause to be combined into an NP, the relative clause must contain an empty NP which is unbound and marked as coming from either the subject position or the object position, and this empty NP will be bound to the (head) noun.

We can also state the rules for passivization as follows: once a clause is constructed, the parser checks whether the prepositional phrase "被+NP" (similar to "by+NP" in English) is involved in the clause. If so, there must be an empty NP which is unbound and marked as coming from the object position, and it will be bound to the subject of the clause.

Rules for pivot constructions can be formulated as follows: in a pivot construction, the direct object will bind the empty NP coming from the subject position of the embedded clause. Similarly, rules for topicalization, ba-transformation and others can be designed.
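The three informal rules above can be sketched as small functions over a list of raised empty NPs. Everything named here (`NP`, `EmptyNP`, `pivot`, `passive`, `relative`, `resolve`) is a hypothetical illustration rather than the SASC code, and the sketch simply binds the first matching candidate. The usage lines build the kind of chain that arises when a pivot construction is passivized and then relativized.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class NP:
    word: str


@dataclass
class EmptyNP:
    name: str
    origin: str  # "subject" or "object"
    antecedent: Optional[Union["NP", "EmptyNP"]] = None


def _bind_first(binder, empties: List[EmptyNP], origins, rule: str):
    """Bind the first unbound empty NP from one of `origins` to `binder`.

    Returns the empty NPs that are still unbound and must be raised further.
    Raises ValueError when the construction has no suitable empty NP.
    """
    candidates = [e for e in empties
                  if e.antecedent is None and e.origin in origins]
    if not candidates:
        raise ValueError(f"{rule}: no unbound empty NP from {sorted(origins)}")
    candidates[0].antecedent = binder
    return [e for e in empties if e.antecedent is None]


def pivot(direct_object, embedded_empties):
    # The direct object binds the empty subject of the embedded clause.
    return _bind_first(direct_object, embedded_empties, {"subject"}, "pivot")


def passive(subject, clause_empties):
    # The subject of a passive clause binds the empty object.
    return _bind_first(subject, clause_empties, {"object"}, "passive")


def relative(head_noun, clause_empties):
    # The head noun binds an empty subject or object of the relative clause.
    return _bind_first(head_noun, clause_empties,
                       {"subject", "object"}, "relative")


def resolve(item):
    """Follow antecedent pointers until a lexical NP or a free empty NP."""
    while isinstance(item, EmptyNP) and item.antecedent is not None:
        item = item.antecedent
    return item


children = NP("children")
e1 = EmptyNP("e1", "subject")   # empty subject of the embedded clause
e2 = EmptyNP("e2", "object")    # empty object of the higher verb
e3 = EmptyNP("e3", "subject")   # empty subject of the passive clause
raised = pivot(e2, [e1])                    # e1 is bound to e2
raised = passive(e3, raised + [e2])         # e2 is bound to e3
raised = relative(children, raised + [e3])  # e3 is bound to "children"
print(resolve(e1).word)  # children
```

Because every rule raises an error when no suitable empty NP is present, the same functions also act as a filter on ill-formed constructions.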
To illustrate the above rules, let's consider example (9) and its parsing tree in figure 1.

(9) [S2 e3 by Li-szu ask e2 [S1 e1 go to dinner]] de children
    (the children who were asked by Li-szu to go to dinner)

Figure 1. The parsing tree of (9).

Now we can follow the bottom-up parser as it parses example (9).

(1) Node S1 is constructed, and e1 serves as the dummy subject.
(2) Node V' is constructed. V' is a pivot construction, so e1 is bound to e2.
(3) Node S2 is constructed. S2 is a passive clause, because of the PP "by Li-szu". According to the rules for passivization, e3 binds e2.
(4) Node NP is constructed. According to the rules for relativization, e3 is bound to "children".

Notice that only e3 was raised up across node S2, because e1 and e2 had been bound before S2 was constructed. Once the parsing tree in figure 1 is finished, it is easy to answer who were asked and who went to dinner. Since e1 is the dummy subject of "go to dinner" and the binder of e1 is e2, whose binder is e3, whose binder is "children", we can conclude that it is "children" who went to dinner. In the same way, we also conclude that it is "children" who were asked.

The raise-bind mechanism also serves as a filter to rule out incorrect sentences or incorrect parsing trees. For example, if no empty NP is raised within a construction involving passivization or relativization, such a construction will be ruled out. If the mechanism is adopted for English sentence analysis, a test must be performed to rule out sentences with one or more empty categories which have no binder. But such sentences are in general grammatical in Chinese (see (8)).

V MORE SYNTACTIC PHENOMENA

Relativization in Chinese is a long-distance movement; that is, it can move an object across several S (sentence) nodes. Noun phrase (10) is an example.

(10) [S I ask Li-szu [S e help me buy e]] de book
     (the book which I asked Li-szu to buy for me)

(11) e like e de man

Noun phrase (11) is ambiguous; its two empty NPs are referred to as e1 and e2. If the head noun ("the man") binds e1, the NP means "the man whom someone likes". If the head noun binds e2, it means "the man who likes someone or something". Removing the ambiguity requires semantic analysis.

We can formulate the rules for relativization as follows: for a noun and a relative clause to be combined into an NP, the parser checks the "empty-NP list" raised from the relative clause, and:

- if no empty NP is raised, rule out the NP;
- if an empty NP is raised and marked as coming from the subject position, the object position or an embedded object position (as in (10)), set the empty NP to be bound to the head noun;
- if two empty NPs are raised from the subject and object positions (as in (11)), employ semantic analysis to determine the proper binding.

Like relativization, topicalization is also a long-distance movement and is treated in a similar way.

Another syntactic phenomenon crucial to the parser is known as the Complex NP Constraint (CNPC) (Radford, 1981):

CNPC: No transformation rule can move any element out of a complex NP. A complex NP (CNP) is an NP containing a relative clause.
The CNPC can be easily encoded in our grammar in this way: no empty NP can be raised up across an NP node. Hence it is impossible for the empty NP within a CNP to be bound to any element outside that CNP.
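The relativization rule over the "empty-NP list" and the CNPC encoding described in this section can be sketched together as follows. The names `make_empty`, `raise_across` and `combine_relative`, the dictionary representation, and the `choose` callback standing in for semantic analysis are all assumptions introduced for illustration, not part of the SASC system.

```python
from typing import Callable, List, Optional

# Origins from which, per the rule above, the head noun may bind an empty NP.
RELATIVIZABLE = {"subject", "object", "embedded object"}


def make_empty(origin: str) -> dict:
    """Create an unbound empty NP marked with the position it came from."""
    return {"origin": origin, "antecedent": None}


def raise_across(node_label: str, empty_np_list: List[dict]) -> List[dict]:
    """Return the empty NPs that may be raised above a finished node.

    CNPC encoding: nothing is raised across an NP node. Across any other
    node, only the still unbound empty NPs are passed upward.
    """
    if node_label == "NP":
        return []
    return [e for e in empty_np_list if e["antecedent"] is None]


def combine_relative(head: str, raised: List[dict],
                     choose: Optional[Callable] = None) -> Optional[dict]:
    """Combine a head noun with a relative clause, or return None.

    raised: the empty-NP list raised from the relative clause.
    choose: hypothetical stand-in for semantic analysis; it receives the
            head and the candidate list and returns the empty NP to bind.
    """
    candidates = [e for e in raised if e["origin"] in RELATIVIZABLE]
    if not candidates:
        return None  # no empty NP raised: rule out the NP
    if len(candidates) == 1:
        chosen = candidates[0]
    elif choose is not None:
        chosen = choose(head, candidates)
    else:
        raise ValueError("ambiguous binding: semantic analysis is needed")
    chosen["antecedent"] = head
    return {"label": "NP", "head": head, "bound": chosen}
```

In the ambiguous case only the chosen empty NP is bound; the other one stays unbound and, because of `raise_across`, can never leave the resulting NP, so it remains a free null pronominal.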
In most cases, ba-transformation and passivization move the direct objects of verbs. But the phenomenon known as "subject-to-object raising" (Radford, 1981) makes a difference: the subject of an embedded clause can be moved into the subject (or ba-object) position of the higher clause by passivization (or ba-transformation). For example, sentence (13) is derived from sentence (12) by such a movement.

(12) people believe this mistake will be right

(13) (This mistake is believed to be right)

To cope with subject-to-object raising, the rules for passivization in the previous section are modified as follows: the subject of a passive clause will bind the empty NP in either the object position or the subject position of an embedded clause.

VI A COMPARISON WITH THE HOLD-LIST MECHANISM

In ATN (Bates, 1978), the hold-list mechanism is used for a purpose similar to that of the raise-bind mechanism. But we object to its approach, for (1) it is not fit for a bottom-up parser; (2) it cannot deal with null pronominals (e.g. examples (6)-(8)); (3) it handles left extraposition (e.g. examples (2)-(4)), but not right extraposition (e.g. example (5)). To deal with right extraposition, ATN uses another mechanism.

In linguistic theory, a movement is called left (right) extraposition if it moves an NP to the left (right). Both left extraposition and right extraposition move an NP to a position dominating its trace, and a null pronominal, if bound, is always bound to an NP dominating the null pronominal (Chomsky, 1981). So, the raise-bind mechanism is sufficient to cope with all such empty categories, since its function is to raise up an empty category to be bound to an NP which dominates this empty category.

VII CONCLUSION

We have presented how the raise-bind mechanism copes with traces and null pronominals in Chinese. With the use of the mechanism, many sophisticated syntactic phenomena can be encoded in the grammar easily. The mechanism is simple and theoretically complete. If semantic analysis is employed to remove ambiguities, such as that of example (11), the correct bindings of empty categories can always be reached.

ACKNOWLEDGEMENTS

Thanks to the enlightening discussions of Chen, J.J. and Chen, J.C.

REFERENCES

[1] Bates, M. (1978) "The Theory and Practice of Augmented Transition Network Grammars", Natural Language Communication with Computers, pp. 191-259.
[2] Chomsky, N. (1981) Lectures on Government and Binding, Foris, Dordrecht.
[3] Huang, J. (1982) Logical Relations in Chinese and the Theory of Grammar, MIT doctoral dissertation.
[4] Kaplan, R.M. (1973) "A General Syntactic Processor", in [Rustin 1973].
[5] Kay, M. (1973) "The MIND System", in [Rustin 1973].
[6] L.J. Lin, K.J. Chen, James Huang and L.S. Lee (1986) "SASC: A Syntactic Analysis System for Chinese Sentences", International Journal of Computer Processing of Chinese and Oriental Languages, published by the Chinese Language Computer Society.
[7] Radford, A. (1981) Transformational Syntax: A Student's Guide to Chomsky's Extended Standard Theory, Cambridge Univ. Press.
[8] Rustin, R., ed. (1973) Natural Language Processing, Algorithmics Press, N.Y.