A Stochastic Parts Program and Noun Phrase Parser for Unrestricted Text
Kenneth Ward Church

Ambiguity Resolution in a Reductionistic Parser
Atro Voutilainen & Pasi Tapanainen

Presented by Mat Kelly
CS895 – Web-based Information Retrieval
Old Dominion University
October 18, 2011

Church's Ideas
• Objective: develop a system to tag parts of speech and resolve ambiguity
• When ambiguity occurs, use stochastic processes to determine the optimal lexical possibility
• Example: "I see a bird"
  – I: PPSS (pronoun) or NP (proper noun)
  – see: VB (verb) or UH (interjection)
  – a: AT (article) or IN (preposition)
  – bird: NN (noun)

Various Ambiguities
• Noun–verb ambiguity: wind
  – e.g., wind your watch vs. the wind blows
• Noun–complementizer ambiguity: that
  – Did you see that? vs. It is a shame that he is leaving.
• Noun–noun vs. adjective–noun distinction
  – e.g., oily FLUID vs. TRANSMISSION fluid
  – The first puts the emphasis on fluid, the second on transmission

Overcoming Lexical Ambiguity
• A linear-time dynamic programming algorithm optimizes the product of lexical and contextual probabilities
• Recognize that no Markov process can fully capture English grammar (Chomsky): words can depend on words arbitrarily far away
  – e.g., "The man who said that statement is arriving today": "man" and "is" must agree across the embedded clause
• Dependencies spanning distances greater than one word prevent a purely Markovian analysis

Parsing Difficulties
• No amount of syntactic sugar will help resolve genuine ambiguity*:
  – Time flies like an arrow.
  – Flying planes can be dangerous.
• The parser must allow for multiple possibilities
* Voutilainen states otherwise

Parsing Impossibilities
• Even a parser that considers likelihood will sometimes be confused by "garden path" sentences
  – The horse raced past the barn fell.
  – "raced" could be a past-tense verb or a passive participle
• Other than these, there is always a unique best interpretation that can be found with very limited resources

Considering Likelihood
• Have/VB the students take the exam. (imperative)
• Have/AUX the students taken the exam?
(question)
• Fidditch (a parser) proposed the lexical disambiguation rule [**n+prep] != n [npstarters]
  – i.e., if a word that could be a noun or a preposition is followed by something that starts a noun phrase, rule out the noun possibility
• Most lexical rules in Fidditch can be reformulated in terms of bigram and trigram statistics

…With the Help of a Dictionary
• Dictionaries tend to convey possibilities, not likelihoods
• Initially consider all assignments for the words:
  – [NP [N I] [N see] [N a] [N bird]]
  – [S [NP [N I] [N see] [N a]] [VP [V bird]]]
• Some part-of-speech assignments are far more likely than others

Cross-Referencing Likelihood with the Tagged Brown Corpus
• The Brown Corpus is manually tagged; lexical likelihoods are taken from its frequencies:

Word   Part of speech            Frequency
I      PPSS (pronoun)            5837
       NP (proper noun)          1
see    VB (verb)                 771
       UH (interjection)         1
a      AT (article)              2301
       IN (French preposition)   3
bird   NN (noun)                 26

• Probability that "I" is a pronoun: freq(PPSS | "I") / freq("I") = 5837/5838

Contextual Probability
• Estimate the probability of observing part of speech X given the following parts of speech Y and Z as freq(X, Y, Z) / freq(Y, Z)
  – e.g., the probability of observing a verb before an article and a noun is freq(VB, AT, NN) / freq(AT, NN)

Enumerate All Potential Parsings
• For "I see a bird":
  – PPSS VB AT NN
  – PPSS VB IN NN
  – PPSS UH AT NN
  – PPSS UH IN NN
  – NP VB AT NN
  – NP VB IN NN
  – NP UH AT NN
  – NP UH IN NN
• Score each sequence by the product of its lexical and contextual probabilities and select the best sequence
• It is not necessary to enumerate all possible assignments, because the scoring function never looks more than two words away

Complexity Reduction
• Some sequences cannot possibly compete with others and are abandoned during parsing
• Only O(n) paths are enumerated

An Example
• For "I see a bird", find all part-of-speech assignments and score each partial sequence (log probabilities), working backward from the end of the sentence:
  (-4.848072 "NN")
  (-7.4453945 "AT" "NN")
  (-15.01957 "IN" "NN")
  (-10.1914 "VB" "AT" "NN")
  (-18.54318 "VB" "IN" "NN")
  (-29.974142 "UH" "AT" "NN")
  (-36.53299 "UH" "IN" "NN")
  (-12.927581 "PPSS" "VB" "AT" "NN")
  (-24.177242 "NP" "VB" "AT" "NN")
  (-35.667458 "PPSS" "UH" "AT" "NN")
  (-44.33943 "NP" "UH" "AT" "NN")
• Note that all four paths using the French-preposition reading ("IN") of "a" score lower than the alternatives, and no additional input can make a difference, so they are pruned
• Two more iterations extend the paths past the start of the sentence, yielding the best complete path:
  (-12.262333 "" "" "PPSS" "VB" "AT" "NN")
  I/PPSS see/VB a/AT bird/NN.

Further Attempts at Ambiguity Resolution (Voutilainen's Paper)
• Assign annotations using a finite-state parser
• Knowledge-based, reductionist grammatical analysis: the description is first enriched, introducing more ambiguity, and rules then remove the illegitimate readings
• The amount of ambiguity, as shown, does not predict the speed of analysis

Constraint Grammar (CG) Parsing
• Preprocessing and morphological analysis
• Disambiguation of morphological (part-of-speech) ambiguities
• Mapping of syntactic functions onto morphological categories
• Disambiguation of syntactic functions

Morphological Description
• ENGCG analysis of "I see a bird." before disambiguation:

("<*i>"
  ("i" <*> ABBR <NonMod>)
  ("I" <*> <NonMod> PRON PERS NOM SG1))
("<see>"
  ("see" <SVO> V SUBJUNCTIVE VFIN)
  ("see" <SVO> V IMP VFIN)
  ("see" <SVO> V INF)
  ("see" <SVO> V PRES -SG3 VFIN))
("<a>"
  ("a" <Indef> DET CENTRAL ART SG))
("<bird>"
  ("bird" <SV> V SUBJUNCTIVE VFIN)
  ("bird" <SV> V IMP VFIN)
  ("bird" <SV> V INF)
  ("bird" <SV> V PRES -SG3 VFIN)
  ("bird" N NOM SG))
("<$.>")

Removing Ambiguity
• After disambiguation, one reading remains per word:

("<*i>"
  ("I" <*> <NonMod> PRON PERS NOM SG1))
("<see>"
  ("see" <SVO> V PRES -SG3 VFIN))
("<a>"
  ("a" <Indef> DET CENTRAL ART SG))
("<bird>"
  ("bird" N NOM SG))
("<$.>")

Disambiguator Performance
• The best-known competitors mispredict the part of speech of up to 5% of words
• The ENGCG disambiguator makes a false prediction in only about 0.3% of all cases

Finite-State Syntax
• All three types of structural ambiguity are represented in parallel
  – morphological, clause-boundary, and syntactic
• No subgrammars for morphological disambiguation are needed: a single uniform rule component suffices for expressing the grammar
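The reductionist scheme just described (assign every reading from the lexicon, then let context-sensitive rules discard the illegitimate ones) can be sketched in a few lines. The toy lexicon below mirrors the "I see a bird" example; both constraint rules and all function names are illustrative assumptions, not actual ENGCG constraints:

```python
# Minimal reductionist-disambiguation sketch: every word starts with all of
# its lexicon readings, and constraint rules discard readings in context.
# The lexicon and both rules are illustrative, not actual ENGCG constraints.

LEXICON = {
    "I":    [("PRON", "PERS NOM SG1")],
    "see":  [("V", "SUBJUNCTIVE"), ("V", "IMP"), ("V", "INF"), ("V", "PRES -SG3")],
    "a":    [("DET", "ART SG")],
    "bird": [("V", "SUBJUNCTIVE"), ("V", "IMP"), ("V", "INF"), ("V", "PRES -SG3"),
             ("N", "NOM SG")],
}

def rule_no_verb_after_determiner(readings, i):
    """Discard verb readings of a word directly preceded by an unambiguous determiner."""
    if i > 0 and all(cat == "DET" for cat, _ in readings[i - 1]):
        kept = [r for r in readings[i] if r[0] != "V"]
        return kept or readings[i]  # a rule may never delete a word's last reading
    return readings[i]

def rule_present_after_sg1_subject(readings, i):
    """After an unambiguous first-person subject pronoun, keep only
    present-tense non-SG3 verb readings."""
    if i > 0 and readings[i - 1] == [("PRON", "PERS NOM SG1")]:
        kept = [r for r in readings[i] if r == ("V", "PRES -SG3")]
        return kept or readings[i]
    return readings[i]

def disambiguate(words):
    readings = [list(LEXICON[w]) for w in words]
    for rule in (rule_no_verb_after_determiner, rule_present_after_sg1_subject):
        readings = [rule(readings, i) for i in range(len(words))]
    return readings

print(disambiguate(["I", "see", "a", "bird"]))
# One reading survives per word: PRON, V PRES -SG3, DET, N NOM SG
```

Here two rules fully disambiguate the example; a real CG system applies hundreds of such constraints, and (as in the sketch) a rule is never allowed to remove a word's last remaining reading.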
• The FS parser considers each sentence reading separately; CG only distinguishes between alternative word readings
• FS rules therefore only ever have to parse one unambiguous sentence reading at a time, which improves parsing accuracy
• The syntax is more expressive than CG's
  – the full power of regular expressions is available

The Implication Rule
• Expresses distributions in a straightforward, positive fashion and is very compact
  – Several CG rules that express bits and pieces of the same grammatical phenomenon can usually be expressed with one or two transparent finite-state rules

Experimentation
• Experiment: some 200 syntactic rules applied to a test text
• Objective: can morphological ambiguities that are too hard for the ENGCG disambiguator be resolved with a more expressive grammatical description and a more powerful parsing formalism?
• It is difficult to write a parser as mature as ENGCG, so some of the rules were "inspired" by the test text
  – though all rules were tested against various other corpora to assure generality

Experimentation (cont.)
• The test data were first analyzed with the ENGCG disambiguator
  – of 1,400 words, 43 remained ambiguous as to morphological category
• Then the finite-state parser enriched the text (creating more ambiguities)
• After FS parsing completed, only 3 words remained morphologically ambiguous
• Thus, introducing more descriptive elements into the sentence representation resolved almost all of the 43 ambiguities

Caveat
• Morphological ambiguities dropped from 43 to 3, but syntactic ambiguity remained; of the 64 test sentences:
  – 48 sentences (75%) received a single syntactic analysis
  – 13 sentences received two analyses
  – 1 sentence received three analyses
  – 2 sentences received four analyses
• A new notation was developed

A Tagging Example
• Add boundary markers and give words functional tags:
  – @mv: main verb in a non-finite construction
  – @MV: main verb in a finite construction (e.g., inspires below)
  – @MAINC: main-clause tag, distinguishing the primary verb of the sentence
  – @>N: determiner or premodifier of a nominal
• [[fat butcher's] wife] vs.
[fat [butcher's wife]]
• Irresolvable ambiguity is kept covert by the notation
• Uppercase tags are used for finite constructions
• This eases the grammarian's task

Tagged example ("Smoking cigarettes inspires the fat butcher's wife and daughters."):

@@
smoking     PCP1      @mv
cigarettes  N         @obj   SUBJ@
inspires    V         @MV    MAINC@
the         DET       @>N
fat         A         @>N
butcher's   N         @>N
wife        N         @OBJ
and         CC        @CC
daughters   N         @OBJ
.           FULLSTOP
@@

• If we could not treat these non-finite clauses separately, extra checks for further subjects in non-finite clauses would have been necessary

Grouping Non-Finite Clauses
• Note the two simplex subjects in the same sentence: a subject in the finite clause with its main verb, and a subject in the non-finite clause with its own main verb
• Note the difference in case (Henry vs. her)
• The adverbial "so early" cannot be attached to either "dislikes" or "leaving": structurally irresolvable ambiguity
• The description given by the notation is shallow

Tagged example ("Henry dislikes her leaving so early."):

@@
Henry     N         @SUBJ
dislikes  V         @MV    MAINC@
her       PRON      @subj
leaving   PCP1      @mv    OBJ@
so        ADV       @>A
early     ADV       @ADVL
.         FULLSTOP
@@

Extended Subjects
• "What makes them acceptable" is a finite clause acting as subject
• "that they have different verbal regents" acts as subject complement

Tagged example ("What makes them acceptable is that they have different verbal regents."):

@@
What        PRON      @SUBJ
makes       V         @MV    SUBJ@
them        PRON      @OBJ
acceptable  A         @OC
@/
is          V         @MV    MAINC@
that        CS        @CS
they        PRON      @SUBJ
have        V         @MV    SC@
different   A         @>N
verbal      A         @>N
regents     N         @OBJ
.           FULLSTOP
@@

Deferred Prepositions
• @>>P signifies a deferred preposition
• i.e.,
"about" has no right-hand context: its complement occurs earlier in the sentence or not at all
• Adverbs can also be deferred in this fashion
• A construction without a main verb, such as "Tolstoy her greatest novelist", is still granted clause status
  – signified by the clause-boundary symbol @\
  – note that it carries no main-verb function tag (only main verbs get those)

Tagged examples:

"What are you talking about?"
@@
What     PRON      @>>P
are      V         @AUX
you      PRON      @SUBJ
talking  PCP1      @MV    MAINC@
about    PREP      <Deferred> @ADVL
?        QUES
@@

"This is the house she was looking for."
@@
This     PRO       @SUBJ
is       V         @MV    MAINC@
the      DET       @>N
house    N         @SC
she      PRON      @SUBJ
was      V         @AUX
looking  PCP1      @MV    N<@
for      PREP      <Deferred> @ADVL
.        FULLSTOP
@@

"Pushkin was Russia's greatest poet, and Tolstoy her greatest novelist."
@@
Pushkin   N        @SUBJ
was       V        @MV    MAINC@
Russia's  N        @>N
greatest  A        @>N
poet      N        @SC
,         COMM
and       CC       @CC
@\
Tolstoy   N        @SUBJ
her       PRON     @>N
greatest  A        @>N
novelist  N        @SC
.         FULLSTOP
@@

Ambiguity Resolution with a Finite-State (FS) Parser
• Example sentence: "A pressure lubrication system is employed, the pump, driven from the distributor shaft extension, drawing oil from the sump through a strainer and distributing it through the cartridge oil filter to a main gallery in the cylinder block casting."
• In isolation, each word is ambiguous in 1 to 70 ways, giving about 10 million sentence readings
  – 10^32 readings if each boundary between words is made four-ways ambiguous
  – 10^64 readings if all syntactic ambiguities are added
• We can show that the number of readings alone does not predict parsing complexity

Reduction of Parsing Complexity in Reducing Ambiguity
• A probability window of more than 2 or 3 words requires excessively hard computation
• Acquiring collocation matrices based on 4- or 5-grams would require tagged corpora far larger than the current manually validated ones
• Mispredictions accumulate, and more mispredictions are likely to occur in the later stages of such a scheme
  – There is no reason to use uncertain probabilistic information as long as we can use well-defined linguistic knowledge

Degree of Complexity Reduction
• Illegitimate readings are discarded along the way
  – A sentence that is 10^66-way ambiguous might have only 10^45 ambiguities left after initial processing through an automaton
  – This takes a fraction of a second and reduces the number of readings by a factor of 10^21
  – Another rule can then be applied, and so on
  – Ambiguity is quickly reduced to an acceptable level

Applying Rules Prior to Parsing: Four Methods
• Basic scheme: intersect a rule automaton with the sentence automaton, then intersect the result with the next rule automaton, repeating for all rules
1. Process the rules iteratively: takes a long time
2. Order the rule automata before parsing so that the most efficient rules are applied first
3. Process all rules together
4. Use extra information to direct parsing

Before Parsing, Reduce the Number of Automata
• A set of rule automata can easily be combined using intersection
• Not all rules are needed in parsing, because some categories may not be present in the sentence: select the applicable rules at runtime

Execution times (sec.):

Method       1     2     3    4    5
Non-opt.  31000   730  1500  500  290
Optimized  7000   840   350  110   30

Summing Up the Process Described
1. Preprocess the text (text normalization and boundary detection)
2. Morphologically analyze the text and enrich it with syntactic and clause-boundary ambiguities
3. Transform each sentence into a finite-state automaton
4. Select the rules relevant to the sentence
5. Intersect a couple of rule groups with the sentence automaton
6. Apply all remaining rules in parallel
7. If a totally unambiguous result is desired, rank the resulting analyses according to heuristic rules and select the best one

Conclusions
• Church:
  – Tag parts of speech
  – Discard illegitimate permutations based on their unlikelihood (through probabilistic analysis)
• Voutilainen:
  – The grammar rules, not the amount of ambiguity, determine the hardness of ambiguity resolution
  – Tag parts of speech, but distinguish finite from non-finite constructions to reduce the complexity of resolving ambiguity

References
Church, K. W. (1988). A stochastic parts program and noun phrase parser for unrestricted text. In Proceedings of the Second Conference on Applied Natural Language Processing (pp. 136–143). Association for Computational Linguistics.
Retrieved from http://portal.acm.org/citation.cfm?id=974260
Voutilainen, A., & Tapanainen, P. (1995). Ambiguity resolution in a reductionistic parser. In Proceedings of the Sixth Conference of the European Chapter of the Association for Computational Linguistics (pp. 394–403). Association for Computational Linguistics. Retrieved from http://arxiv.org/abs/cmp-lg/9502013