A Stochastic Parts Program and Noun Phrase Parser for Unrestricted Text

Kenneth Ward Church
Ambiguity Resolution
In A Reductionist Parser
Atro Voutilainen & Pasi Tapanainen
Presented by Mat Kelly
CS895 – Web-based Information Retrieval
Old Dominion University
October 18, 2011
Church’s Ideas
• Objective: develop a system to tag parts of speech and resolve ambiguity
• When ambiguity occurs, use stochastic processes to determine the optimal lexical possibility
Candidate tags for "I see a bird":
I    → PPSS or NP
see  → VB or UH
a    → AT or IN
bird → NN

PPSS = pronoun
NP = proper noun
VB = verb
UH = interjection
AT = article
IN = preposition
NN = noun
Various Ambiguities
• Noun-Verb Ambiguity: wind
– e.g. wind your watch versus the wind blows
• Noun-Complementizer Ambiguity: that
– Did you see that? vs.
– It is a shame that he is leaving.
• Noun-Noun and Adjective-Noun Distinction
– e.g. oily FLUID versus TRANSMISSION fluid
– The first puts the stress on fluid, the second on transmission
Overcoming Lexical Ambiguity
• Linear-time dynamic programming algorithm that optimizes the product of lexical and contextual probabilities
• Recognize that no finite-order Markov process can sufficiently capture English grammar (Chomsky)
– e.g. "The man who said that statement is arriving today": "man" and "is arriving" are dependent on each other across a long distance
• Dependencies spanning distances > 1 prevent a purely Markovian analysis
Parsing Difficulties
• No amount of syntactic sugar will help resolve
the ambiguity*:
– Time flies like an arrow.
– Flying planes can be dangerous.
• Parser must allow for multiple possibilities
*Voutilainen states otherwise
Parsing Impossibilities
• Even a parser that considers likelihood will sometimes become confused by "garden path" sentences
– The horse raced past the barn fell.
– "raced" is ambiguous between a past-tense verb and a passive participle
• Other than these, there is always a unique
best interpretation that can be found with
very limited resources.
Considering Likelihood
• Have/VB the students take the exam. (Imperative)
• Have/AUX the students taken the exam?
(question)
• Fidditch (a parser) proposed lexical
disambiguation rule [**n+prep] != n [npstarters]
– i.e. if a word ambiguous between noun and preposition is followed by something that can start a noun phrase, rule out the noun possibility
• Most lexical rules in Fidditch can be reformulated
in terms of bigram and trigram statistics
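Such a reformulation can be sketched in a few lines of Python. This is a hypothetical illustration (toy tags and counts, not Fidditch or Church's code): the configuration the categorical rule forbids simply receives near-zero probability in the n-gram table.

```python
# A hypothetical sketch of recasting a categorical rule as n-gram statistics:
# the configuration the rule forbids receives (near-)zero probability.
from collections import Counter

tag_sequence = ["PPSS", "VB", "AT", "NN", "IN", "AT", "NN"]  # toy tagged text
bigrams = Counter(zip(tag_sequence, tag_sequence[1:]))

def bigram_prob(prev, cur):
    """P(cur | prev) estimated from the toy counts."""
    total = sum(c for (p, _), c in bigrams.items() if p == prev)
    return bigrams[(prev, cur)] / total if total else 0.0

# A noun directly before an NP starter such as AT is unattested here, so the
# statistics achieve what the categorical rule states by fiat.
print(bigram_prob("NN", "AT"))  # 0.0 in this toy corpus
```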
…With the Help of a Dictionary
• Dictionaries tend to convey possibilities and
not likelihood.
• Initially consider all assignments for words
– [NP [N I] [N see] [N a] [N bird]]
– [S [NP [N I] [N see] [N a]] [VP [V bird]]]
• Some parts of speech assignments are more
likely than others
Cross-Referencing Likelihood With
The Tagged Brown Corpus
• A Manually Tagged Corpus
• Parsing Likelihood Based on Brown Corpus
Word | Parts of speech (Brown Corpus counts)
I    | PPSS (pronoun) 5837, NP (proper noun) 1
see  | VB (verb) 771, UH (interjection) 1
a    | AT (article) 23013, IN (French preposition) 6
bird | NN (noun) 26
• Probability that “I” is a pronoun = 5837/5838
• frequency(PPSS|“I”)/frequency(“I”)
Contextual Probability
• Estimate the probability of observing part of speech X:
– given following parts of speech Y and Z: P(X | Y, Z) = freq(XYZ)/freq(YZ)
– e.g. the probability of observing a verb before an article and a noun is the ratio freq(VB,AT,NN)/freq(AT,NN)
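Both estimates can be sketched in a few lines of Python (a toy corpus stands in for the Brown Corpus, and the function names are illustrative, not Church's):

```python
# A minimal sketch of estimating both probability tables from a tagged corpus.
from collections import Counter

# Toy "tagged corpus": (word, tag) pairs standing in for the Brown Corpus.
corpus = [("I", "PPSS"), ("see", "VB"), ("a", "AT"), ("bird", "NN"),
          ("I", "PPSS"), ("see", "VB"), ("a", "AT"), ("man", "NN")]

word_tag = Counter(corpus)                        # freq(word, tag)
word = Counter(w for w, _ in corpus)              # freq(word)
tags = [t for _, t in corpus]
trigram = Counter(zip(tags, tags[1:], tags[2:]))  # freq(X, Y, Z)
bigram = Counter(zip(tags, tags[1:]))             # freq(Y, Z)

def lexical_prob(w, t):
    """P(tag | word) = freq(word, tag) / freq(word)."""
    return word_tag[(w, t)] / word[w]

def contextual_prob(x, y, z):
    """P(X | Y, Z) = freq(X, Y, Z) / freq(Y, Z); Church conditions each tag
    on the two tags that FOLLOW it, scanning the sentence right to left."""
    return trigram[(x, y, z)] / bigram[(y, z)]

print(lexical_prob("I", "PPSS"))          # 1.0 in this toy corpus
print(contextual_prob("VB", "AT", "NN"))  # 1.0 in this toy corpus
```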
Enumerate all potential parsings
I see a bird — eight candidate tag sequences:
PPSS VB AT NN
PPSS VB IN NN
PPSS UH AT NN
PPSS UH IN NN
NP VB AT NN
NP VB IN NN
NP UH AT NN
NP UH IN NN
• Score each sequence by the product of lexical and contextual probabilities, and select the best sequence
• Not necessary to enumerate all possible assignments, as the scoring algorithm cannot see more than two words away
Complexity Reduction
• Some sequences cannot possibly compete
with others in the parsing process and are
abandoned
• Only O(n) paths are enumerated
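The pruned dynamic-programming search can be sketched as follows. This is a simplified first-order (bigram) variant of Church's trigram model, with invented probabilities; only the best path per current tag survives each step, so the number of live paths stays bounded instead of growing exponentially.

```python
# A hedged sketch of the pruned dynamic-programming (Viterbi-style) search.
import math

# Hypothetical lexical probabilities P(tag | word) for "I see a bird";
# real values would come from Brown Corpus counts.
lexical = {("I", "PPSS"): 0.999, ("I", "NP"): 0.001,
           ("see", "VB"): 0.99, ("see", "UH"): 0.01,
           ("a", "AT"): 0.9996, ("a", "IN"): 0.0004,
           ("bird", "NN"): 1.0}

def contextual(prev, cur):
    return 0.5  # flat stand-in for the corpus-estimated P(cur | prev)

def tag(words, tagsets):
    paths = {(): 0.0}  # tag sequence -> log score
    for w in words:
        best = {}      # current tag -> (best score, best path)
        for path, score in paths.items():
            for t in tagsets[w]:
                s = score + math.log(lexical[(w, t)])
                if path:
                    s += math.log(contextual(path[-1], t))
                if t not in best or s > best[t][0]:
                    best[t] = (s, path + (t,))
        paths = {p: s for s, p in best.values()}  # prune: one path per tag
    return max(paths, key=paths.get)

tagsets = {"I": ["PPSS", "NP"], "see": ["VB", "UH"],
           "a": ["AT", "IN"], "bird": ["NN"]}
print(tag("I see a bird".split(), tagsets))  # ('PPSS', 'VB', 'AT', 'NN')
```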
An example
I see a bird
Find all assignments of parts of speech and score partial sequence
(log probabilities)
(-4.848072 "NN")
(-7.4453945 "AT" "NN")
(-15.01957 "IN" "NN")
(-10.1914 "VB" "AT" "NN")
(-18.54318 "VB" "IN" "NN")
(-29.974142 "UH" "AT" "NN")
(-36.53299 "UH" "IN" "NN")
(-12.927581 "PPSS" "VB" "AT" "NN")
(-24.177242 "NP" "VB" "AT" "NN")
(-35.667458 "PPSS" "UH" "AT" "NN")
(-44.33943 "NP" "UH" "AT" "NN")
Note that all four paths built on the French usage of "IN" score worse than the others, and no additional input can change that, so they are abandoned.
The algorithm continues for two more iterations, padding the context with blanks, to score the complete sequence:
(-12.262333 "" "" "PPSS" "VB" "AT" "NN")
I/PPSS see/VB a/AT bird/NN.
Further Attempts at Ambiguity
Resolution (Voutilainen’s Paper)
• Assigning annotations using a finite-state
parser
• Facilitates knowledge-based reductionist grammatical analysis by first enriching the text with alternative analyses, which introduces more ambiguity
• The amount of ambiguity, as shown, does not
predict the speed of analysis
Constraint Grammar (CG) Parsing
• Preprocessing and morphological analysis
• Disambiguation of morphological (part-of-speech) ambiguities
• Mapping of syntactic functions onto
morphological categories
• Disambiguation of syntactic functions
Morphological Description
("<*i>"
  ("i" <*> ABBR NOM SG)
  ("I" <*> <NonMod> PRON PERS NOM SG1))
("<see>"
  ("see" <SVO> V SUBJUNCTIVE VFIN)
  ("see" <SVO> V IMP VFIN)
  ("see" <SVO> V INF)
  ("see" <SVO> V PRES -SG3 VFIN))
("<a>"
  ("a" <Indef> DET CENTRAL ART SG))
("<bird>"
  ("bird" <SV> V SUBJUNCTIVE VFIN)
  ("bird" <SV> V IMP VFIN)
  ("bird" <SV> V INF)
  ("bird" <SV> V PRES -SG3 VFIN)
  ("bird" N NOM SG))
("<$.>")

Removing Ambiguity

("<*i>"
  ("I" <*> <NonMod> PRON PERS NOM SG1))
("<see>"
  ("see" <SVO> V PRES -SG3 VFIN))
("<a>"
  ("a" <Indef> DET CENTRAL ART SG))
("<bird>"
  ("bird" N NOM SG))
("<$.>")
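The reading-elimination step can be sketched as a toy reductionist disambiguator in the spirit of Constraint Grammar (simplified stand-in tags and a made-up constraint, not ENGCG's actual rules or feature strings):

```python
# Toy reductionist disambiguator: each word carries a cohort of candidate
# readings; context constraints REMOVE readings, but never the last one.

def disambiguate(cohorts, constraints):
    for i in range(len(cohorts)):
        for constraint in constraints:
            kept = [r for r in cohorts[i] if not constraint(cohorts, i, r)]
            if kept:                  # a cohort must never become empty
                cohorts[i] = kept
    return cohorts

def no_verb_after_article(cohorts, i, reading):
    """Discard a verb reading when the preceding word is unambiguously DET."""
    return (i > 0 and reading.startswith("V")
            and all(r == "DET" for r in cohorts[i - 1]))

sentence = [
    ["ABBR", "PRON"],                             # I
    ["V-SUBJ", "V-IMP", "V-INF", "V-PRES"],       # see
    ["DET"],                                      # a
    ["V-SUBJ", "V-IMP", "V-INF", "V-PRES", "N"],  # bird
]
result = disambiguate(sentence, [no_verb_after_article])
print(result[3])  # ['N'] -- after "a", only the noun reading of "bird" survives
```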
Disambiguator Performance
• The best-known competitors mispredict the part of speech of up to 5% of words
• This (ENGCG) disambiguator makes a false prediction in only up to 0.3% of all cases
Finite-State Syntax
• All three types of structural ambiguity are represented in parallel
– morphological, clause boundary, and syntactic
• No separate subgrammars for morphological disambiguation are needed – one uniform rule component suffices for expressing the grammar
• The FS parser considers each sentence reading separately; CG only distinguishes between alternative word readings
• FS rules only have to parse one unambiguous sentence reading at a time – improves parsing accuracy
• Syntax is more expressive than CG
– the full power of regular expressions is available
The Implication Rule
• Implication rules express distributions in a straightforward, positive fashion and are very compact
– several CG rules that express bits and pieces of the same grammatical phenomenon can usually be expressed with one or two transparent finite-state rules
Experimentation
• Experiment: 200 syntactic rules applied to a test text
• Objective: Were morphological ambiguities that
are too hard for ENGCG disambiguator to resolve
resolvable with more expressive grammatical
description and a more powerful parsing
formalism?
• Difficult to write parser as mature as ENGCG, so
some of the rules were “inspired” by the test text
– Though all rules were tested against various other
corpora to assure generality
Experimentation
• Test data first analyzed with the ENGCG disambiguator – of 1400 words, 43 remained ambiguous in morphological category
• Then, finite-state parser enriched text (creating
more ambiguities)
• After parsing was complete with FS parser, only 3
words remained morphologically ambiguous
• Thus, introduction of more descriptive elements
into sentence resolved almost all 43 ambiguities
Caveat
• Morphological ambiguities went from 434
But syntactic ambiguities raised amounted to
64 sentences:
– 48 sentences (75%) received a single syntactic
analysis
– 13 sentences received two analyses
– 1 sentence received 3 analyses
– 2 received 4 analyses
• A new notation was developed
A Tagging Example
• Add boundary markers (@ between words, @@ between sentences)
• Give words functional tags
• @mv = main verb in a non-finite construction
• @MV = main verb in a finite construction ("inspires" here)
• @MAINC = main clause tag to distinguish the primary verb
• @>N = determiner or premodifier of a nominal
• [[fat butcher's] wife] vs. [fat [butcher's wife]]: an irresolvable ambiguity, kept covert by the notation
• Uppercase is used for finite constructions – eases the grammarian's task
@@
smoking PCP1 @mv
cigarettes N @obj SUBJ@
@
inspires V @MV
@
the DET @>N
@
fat A @>N
@
butcher's N @>N
@
wife N @OBJ
@
and CC @CC
@
daughters N @OBJ
@
. FULLSTOP MAINC@
@@
If we could not treat these non-finite clauses separately, extra checks for further subjects in non-finite clauses would have been necessary.
Grouping Non-Finite Clauses
• Note the two simplex subjects in the same sentence
• "Henry" @SUBJ: subject in the finite clause, whose main verb is "dislikes" @MV
• "her" @subj: subject in the non-finite clause, whose main verb is "leaving" @mv
• Note the difference in case (uppercase = finite, lowercase = non-finite)
• Unable to attach the adverbial "so early" to either "dislikes" or "leaving"
• Structurally irresolvable ambiguity – the description through notation is shallow
@@
Henry N @SUBJ
@
dislikes V @MV
@
her PRON @subj
leaving PCP1 @mv
so ADV @>A
@
early ADV @ADVL
@ OBJ@
. FULLSTOP MAINC@
@@
Extended Subjects
• "What makes them acceptable" acts as a finite clause functioning as subject
• "that they have different verbal regents" acts as a subject complement
@@
What PRON @SUBJ
@
makes V @MV
@
them PRON @OBJ
@
acceptable A @OC
@/ SUBJ@
is V @MV MAINC@
@
that CS @CS
@
they PRON @SUBJ
@
have V @MV
@
different A @>N
@
verbal A @>N
@
regents N @OBJ
@ SC@
. FULLSTOP
@/
@@
Deferred Prepositions
• @>>P signifies a deferred (delayed) preposition
• i.e. "about" has no right-hand complement; its complement is earlier in the clause or non-existent
• Adverbs can also be deferred in this fashion
• A construction without a main verb ("Tolstoy her greatest novelist") is granted clause status
• Signified by the clause boundary symbol @\
• Note it gets no MAINC function tag (only main verbs get these)
[Tagged examples (three sentences, interleaved in the original slide):
– "What are you talking about?" – the deferred "about" is PREP <Deferred> @ADVL, with "What" tagged PRON @>>P
– "This is the house she was looking for." – deferred "for"
– "Pushkin was Russia's greatest poet, and Tolstoy her greatest novelist." – the verbless clause "Tolstoy her greatest novelist" is marked off with the clause boundary symbol @\]
Ambiguity Resolution with a
Finite-State (FS) Parser
A pressure lubrication system is employed, the pump, driven from the distributor shaft
extension, drawing oil from the sump through a strainer and distributing it through the cartridge
oil filter to a main gallery in the cylinder block casting.
• ~10 million sentence readings from morphological ambiguity alone
– 10^32 readings if each boundary between words is made four-ways ambiguous
– 10^64 readings if all syntactic ambiguities are added
• In isolation, each word is ambiguous in 1-70 ways
• We can show that # of readings does not alone
predict parsing complexity.
Reduction of Parsing Complexity in
Reducing Ambiguity
• A window of more than 2 or 3 words requires excessively hard computation
• Acquiring collocation matrices based on 4- or 5-grams requires tagged corpora far larger than the current manually validated ones
• Mispredictions accumulate, and more mispredictions are likely to occur in later stages with this scheme
– no reason to use unreliable probabilistic information as long as we can use well-defined linguistic knowledge
Degree of Complexity Reduction
• Illegitimate readings are discarded along the way
– a sentence that is 10^66-way ambiguous might have only 10^45 readings left after initial processing through one automaton
– this takes a fraction of a second and reduces the readings by a factor of 10^21
– another rule can then be applied, and the process repeated
– reduces ambiguity to an acceptable level quickly
Applying Rules Prior to Parsing:
Four Methods
[Diagram: Sentence Automaton ∩ Rule Automaton → Intersection Result; Intersection Result ∩ next Rule Automaton → … → End Result; iteratively repeat with all rules]
1. Process rules iteratively: Takes a long time
2. Order rule automata before parsing – the most
efficient rules are applied first
3. Process all rules together
4. Use extra information to direct parsing
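The intersection idea (method 1) can be illustrated with a toy sketch. Here the sentence's readings are enumerated explicitly as a set and each rule is a predicate; the real parser intersects finite-state automata instead of enumerating readings, which is what keeps astronomically ambiguous sentences tractable. Tags and rules are made up for illustration.

```python
# Toy illustration of iterative rule intersection over sentence readings.
from itertools import product

# "Sentence automaton": every tag-sequence reading of a 4-word sentence.
readings = set(product(["PRON", "N"], ["V", "N"], ["DET"], ["V", "N"]))

# Each "rule automaton" accepts exactly the sequences obeying one constraint.
rules = [
    lambda r: not any(a == "DET" and b.startswith("V")   # no verb right
                      for a, b in zip(r, r[1:])),        # after a determiner
    lambda r: any(t.startswith("V") for t in r),         # clause needs a verb
]

for rule in rules:                       # iteratively repeat with all rules
    readings = {r for r in readings if rule(r)}   # ~ automaton intersection

print(sorted(readings))  # ambiguity shrinks from 8 readings to 2
```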
Before Parsing,
Reduce Number of Automata
• A set of rule automata can easily be combined using intersection
• Not all rules are needed in parsing, because some categories might not be present in the sentence – select the applicable rules at runtime
Execution Times (sec.)
method        1     2     3     4    5
Non-opt.  31000   730  1500   500  290
Optimized  7000   840   350   110   30
Summing up Process Described
1. Preprocess text (text normalization and boundary
detection)
2. Morphologically analyze and enrich text with syntactic
and clause boundary ambiguities
3. Transform each sentence into FSA
4. Select relevant rules for sentence
5. Intersect a couple of rule groups with the sentence
automaton
6. Apply all remaining rules in parallel
7. Rank the resulting multiple analyses according to heuristic rules and select the best one if a totally unambiguous result is desired
Conclusions
• Church:
– tag parts of speech
– disregard illegitimate permutations based on unlikelihood (through probabilistic analysis)
• Voutilainen:
– the grammar rules, not the amount of ambiguity, determine the hardness of ambiguity resolution
– tag parts of speech, but distinguish finite and non-finite constructions to reduce the complexity of overcoming ambiguity
References
Church, K. W. (1988). A Stochastic Parts Program and Noun Phrase
Parser for Unrestricted Text. Proceedings of the second conference on
Applied natural language processing (Vol. 136, pp. 136-143).
Association for Computational Linguistics. Retrieved from
http://portal.acm.org/citation.cfm?id=974260
Voutilainen, A., & Tapanainen, P. (1995). Ambiguity resolution in a
reductionistic parser. Sixth Conference of the European Chapter of the
Association for Computational Linguistics (pp. 394-403).
Association for Computational Linguistics. Retrieved from
http://arxiv.org/abs/cmp-lg/9502013