Prepositional Phrase Attachment
Chris Brew, Ohio State University
795M, Winter 2000

Prepositional Phrase Attachment
– Hindle and Rooth: partial parser to get statistics
– Collins and Brooks: backed-off estimation from treebank data + attachment decision
– Merlo, Crocker and Berthouzoz: multiple PPs disambiguated
– Ratnaparkhi: entirely unsupervised

The problem
[Figure: two parse trees. In "I bought the shirt with pockets" the PP "with pockets" attaches inside the NP "the shirt"; in "I washed the shirt with soap" the PP "with soap" attaches to the VP.]

Hindle and Rooth
– Whittemore, Ferrara and Brunner
  » structural heuristics (Kimball's Right Association, Frazier's Minimal Attachment) account for only 55% of behaviour
  » lexical preferences do much better
– H and R
  » note that the preferences in that experiment were provided by human judgement
  » ask how to obtain a good list of lexical preferences automatically

Discovering lexical association in text
– Church's part-of-speech analyser
– Hindle's Fidditch partial parser
– 13 million words of AP newswire

Fidditch
[Figure: Fidditch partial parse of "The radical changes in export and customs regulations evidently are aimed at remedying an extreme shortage of consumer goods in the Soviet Union ..."]

Extract information about words

  ID  Verb     Noun        Prep  Syntax
  a   -        change      in    -V
  b   -        regulation  -     -
  c   aim      PRO-+       at    -
  d   -        VING        -     -
  e   remedy   shortage    of    -
  f   -        good        in    -
  g   -        DART-PNP    -     -
  h   -        VING        -     -
  i   assuage  citizen     -     -
  j   -        scarcity    of    -
  k   -        item        as    -
  l   -        wiper       -     -

What the table means
– the Noun column has the head noun of a noun phrase (or various special cases)
– the Verb column has the head verb if the noun phrase was its object
– the Prep column has the following preposition
– the Syntax column has -V if there is no preceding verb

Counting attachments
The parser isn't reliable, so a decision procedure is used to decide, for each extracted tuple, whether the preposition attaches to the noun or to the verb, adding noun-attach (na) or verb-attach (va) counts accordingly. A sketch of the whole procedure follows the rules below.

No preposition: add a count for <noun,NULL> or <verb,NULL>.

Sure Verb Attach 1: if the noun phrase head is a pronoun, add a count for <verb,prep>.

Sure Verb Attach 2: if the verb is passivized, verb attach, unless the preposition is "by".

Sure Noun Attach: if no verb is available, noun attach.

Ambiguous Attach 1: if the LA score is > 2.0, verb attach; if it is < -2.0, noun attach. Use the statistics gathered so far to calculate the score. Repeat until stable.

Ambiguous Attach 2: share the counts between noun and verb.

Unsure Attach: attach to the noun by default.
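Before the scores themselves are defined, here is a minimal Python sketch of how the counting procedure above might be organized. It is illustrative only: the tuple fields (is_passive, is_pronoun), the unsmoothed la_score helper (the LA score and its smoothing are defined on the following slides), and the treatment of the leftover ambiguous cases are assumptions rather than Hindle and Rooth's exact algorithm.

```python
import math
from collections import defaultdict

# <word, prep> counts and per-word totals, with prep == "NULL" for "no PP follows".
noun_counts, noun_totals = defaultdict(float), defaultdict(float)
verb_counts, verb_totals = defaultdict(float), defaultdict(float)

def add(counts, totals, word, prep, amount=1.0):
    counts[(word, prep)] += amount
    totals[word] += amount

def la_score(verb, noun, prep):
    """Unsmoothed stand-in for the LA score defined on the next slides:
    log2 of P(verb attach) / P(noun attach), estimated from the counts so far."""
    p_prep_verb = verb_counts[(verb, prep)] / (verb_totals[verb] or 1.0)
    p_null_noun = noun_counts[(noun, "NULL")] / (noun_totals[noun] or 1.0)
    p_prep_noun = noun_counts[(noun, prep)] / (noun_totals[noun] or 1.0)
    if p_prep_verb * p_null_noun == 0.0 or p_prep_noun == 0.0:
        return 0.0                                   # no evidence either way
    return math.log2(p_prep_verb * p_null_noun / p_prep_noun)

def assign_counts(tuples, max_rounds=10):
    """tuples: (verb, noun, prep, is_passive, is_pronoun); None marks a missing field."""
    ambiguous = []
    for verb, noun, prep, is_passive, is_pronoun in tuples:
        if prep is None:                             # No preposition: add NULL counts
            if noun:
                add(noun_counts, noun_totals, noun, "NULL")
            if verb:
                add(verb_counts, verb_totals, verb, "NULL")
        elif is_pronoun and verb:                    # Sure Verb Attach 1
            add(verb_counts, verb_totals, verb, prep)
        elif is_passive and verb and prep != "by":   # Sure Verb Attach 2
            add(verb_counts, verb_totals, verb, prep)
        elif verb is None:                           # Sure Noun Attach
            add(noun_counts, noun_totals, noun, prep)
        else:
            ambiguous.append((verb, noun, prep))

    for _ in range(max_rounds):                      # Ambiguous Attach 1: iterate to stability
        undecided = []
        for verb, noun, prep in ambiguous:
            score = la_score(verb, noun, prep)
            if score > 2.0:
                add(verb_counts, verb_totals, verb, prep)
            elif score < -2.0:
                add(noun_counts, noun_totals, noun, prep)
            else:
                undecided.append((verb, noun, prep))
        if len(undecided) == len(ambiguous):
            break                                    # stable: nothing more was decided
        ambiguous = undecided

    for verb, noun, prep in ambiguous:               # Ambiguous Attach 2: split the count
        add(verb_counts, verb_totals, verb, prep, 0.5)
        add(noun_counts, noun_totals, noun, prep, 0.5)
```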
LA scores
– va: (send (soldier NULL) (into Afghanistan))
– na: (send (soldier (into Afghanistan)))
LA = log2( P(va p | v, n) / P(na p | v, n) )
   = log2( P(va into | send, soldier) / P(na into | send, soldier) )
and we approximate this using the collected counts:
P(va into | send, soldier) ≈ P(into | send) * P(NULL | soldier)
P(na into | send, soldier) ≈ P(into | soldier)

Estimating the counts
P(into | send) = |send, into| / |send| = .049
P(NULL | soldier) = |soldier, NULL| / |soldier| = .800
P(into | soldier) = |soldier, into| / |soldier| = .0007
LA = log2(.049 * .800 / .0007) = 5.81
which is enough to be very sure that verb attachment is right.

Smooth the estimates
using the typical association rates of prepositions with the whole classes of nouns and verbs:
P(p | n) = (|n, p| + P(p | N)) / (|n| + 1)
where P(p | N) is |any noun, p| / |any noun|, and similarly for verbs.
Laplace's m-estimate again.

Performance
– ~80% correct
– can get better precision by accepting lower recall (useful for exploratory text analysis)
– "good enough to be added to a parser like Fidditch"

Backed-off estimation
Collins and Brooks: use n2 as well as n1.
[Figure: the same two parse trees, with the four heads labelled V, N1, P, N2, e.g. (bought, shirt, with, pockets) and (washed, shirt, with, soap).]

Use treebank data
– similar approaches: Ratnaparkhi, Reynar and Roukos; Brill and Resnik
– difficult to compare results with Hindle and Rooth, because the corpora used are different (but raw scores are around 80% in both cases)

The data
– 20801 training and 3097 test examples
– about 95% of the quadruples in the test data had not been seen in the training set
– compare H&R's 200,000 triples

The backed-off method
Katz's approach to n-grams:
– if there are enough trigrams: p(wn | wn-1, wn-2) = |wn-2, wn-1, wn| / |wn-2, wn-1|
– otherwise back off to bigrams: p(wn | wn-1, wn-2) = α1 * |wn-1, wn| / |wn-1|
– otherwise back off to unigrams: p(wn | wn-1, wn-2) = α1 * α2 * |wn| (suitably normalized)

Take this method and apply it to the PP data
– start with full quadruples (v, n1, p, n2)
– four possible triples to back off to
– six possible pairs to back off to
  » restrict attention to those containing p

How to combine counts from triples and pairs
ptriple(1 | v, n1, p, n2) ≈ [ p(1, v, n1, p) + p(1, v, p, n2) + p(1, n1, p, n2) ] / [ p(v, n1, p) + p(v, p, n2) + p(n1, p, n2) ]
ppair(1 | v, n1, p, n2) ≈ [ p(1, v, p) + p(1, p, n2) + p(1, n1, p) ] / [ p(v, p) + p(p, n2) + p(n1, p) ]
Other combinations were tried; this formula is better than simple averaging for this task.

What was "enough data"?
In this task it turns out that using a threshold of 0 for the denominator is best: if there is even one instance of the quadruple, trust it. For n-grams it was better to ignore low counts; the reason for the difference is not obvious, but in such situations trying things out is essential. A sketch of the resulting back-off procedure follows.
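The following is a minimal sketch of a Collins and Brooks-style backed-off attachment decision under the threshold-of-0 policy just described. The class name, the key encoding, and the default decision for a completely unseen preposition are illustrative assumptions; training examples are assumed to be (v, n1, p, n2, is_noun_attach) tuples.

```python
from collections import defaultdict

class BackoffPPAttacher:
    """Backed-off PP attachment in the style of Collins and Brooks (sketch)."""

    def __init__(self):
        self.noun_attach = defaultdict(int)   # times the tuple occurred with noun attachment
        self.total = defaultdict(int)         # times the tuple occurred at all

    @staticmethod
    def _levels(v, n1, p, n2):
        # Back-off levels: the quadruple, then the triples containing p,
        # then the pairs containing p, then the preposition alone.
        return [
            [("v n1 p n2", v, n1, p, n2)],
            [("v n1 p", v, n1, p), ("v p n2", v, p, n2), ("n1 p n2", n1, p, n2)],
            [("v p", v, p), ("n1 p", n1, p), ("p n2", p, n2)],
            [("p", p)],
        ]

    def train(self, examples):
        for v, n1, p, n2, is_noun_attach in examples:
            for level in self._levels(v, n1, p, n2):
                for key in level:
                    self.total[key] += 1
                    if is_noun_attach:
                        self.noun_attach[key] += 1

    def prob_noun_attach(self, v, n1, p, n2):
        # At each level, pool the counts of all sub-tuples (the combination
        # formula above); a threshold of 0 means any nonzero denominator is trusted.
        for level in self._levels(v, n1, p, n2):
            num = sum(self.noun_attach[k] for k in level)
            den = sum(self.total[k] for k in level)
            if den > 0:
                return num / den
        return 1.0                            # unseen preposition: default to noun attachment

    def decide(self, v, n1, p, n2):
        return "noun" if self.prob_noun_attach(v, n1, p, n2) >= 0.5 else "verb"


model = BackoffPPAttacher()
model.train([("bought", "shirt", "with", "pocket", True),
             ("washed", "shirt", "with", "soap", False)])
print(model.decide("bought", "shirt", "with", "button"))   # 'noun', decided at the triple level
```

Pooling the counts within a level mirrors the combination formula on the slide; raising the threshold above 0 would only change the `den > 0` test.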
Results
– 84.1% correct without morphological analysis, 84.5% with it
– quadruples are more accurate than triples, which in turn are more accurate than pairs, and so on
– but there are only 148 quadruples in the test data, vs. 764 triples, 1965 pairs and 216 singles

Comparison with Hindle and Rooth
– there are 1924 test cases where H&R would have made a decision
– the back-off method using just the |v, p| and |n1, p| counts (86.5%) outscores the H&R-style method (82.1%)

Extra experiments
– setting the count threshold to 5 reduces performance to 81.6%
– the tuples that contain the preposition are the most effective

Attaching multiple PPs
Merlo, Crocker and Berthouzoz
– for a single PP there are two possible structures, for 2 PPs there are 5, and for 3 PPs 14, so the problem is harder and a dumb algorithm will do poorly
– a generalization of Collins and Brooks

Five structures for V NP PP PP
Structure 1 (535 cases): The agency said it will [keep]v [the debt]np [under review]pp [for possible downgrade]pp
Structure 2 (1160 cases): Penney will [extend]v [[its involvement]np [with the service]pp]np [for at least five years]pp
Structure 3 (1394 cases): [address]v [[budget limits]np [on [credit allocations [for the Federal Housing agency]pp]np]pp]np
Structure 4 (1055 cases): [abandon] [the everyday pricing approach] [in the face of [the poor results]]
Structure 5 (539 cases): [answering] [questions [from members of Parliament]] [after his announcement]

Algorithm
– the model for PP1 is as in Collins and Brooks, but excluding p2
– the model for 2 PPs backs off over sextuples (i, v, n1, p1, n2, p2) until we reach tuples that lack p1 or lack p2
– then Competitive Back-off

Competitive Back-off
– do standard back-off for PP1 using v, n1, p1
– do standard back-off for PP2 using v, n2, p2
– do back-off for PP2 using n1 instead of n2 (i.e. v, n1, p2)
– combine these results using a simple procedure, with a tiebreak where they conflict (a rough sketch follows)
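The slides leave the combination step as "a simple procedure with a tiebreak", so the sketch below is only a guess at how the pieces might fit together. It assumes a backoff_attach(v, n, p) helper in the spirit of the Collins and Brooks sketch above that returns a noun-attachment probability together with the back-off level it stopped at; both the interface and the tiebreak rule are assumptions, not Merlo, Crocker and Berthouzoz's actual procedure.

```python
def competitive_backoff(backoff_attach, v, n1, p1, n2, p2):
    """Sketch of competitive back-off for a V NP PP PP sequence.

    `backoff_attach(v, n, p)` is assumed to return (prob_noun_attach, level),
    where `level` records how far the model had to back off (0 = full triple
    observed, larger = less specific evidence).  Both the interface and the
    combination rule below are illustrative assumptions.
    """
    # PP1: standard back-off over (v, n1, p1) decides verb vs. n1 attachment.
    pp1_prob, _ = backoff_attach(v, n1, p1)
    pp1_site = "n1" if pp1_prob >= 0.5 else "v"

    # PP2: two competing noun attachment sites, n2 (low) and n1 (high).
    n2_prob, n2_level = backoff_attach(v, n2, p2)   # standard back-off: v vs. n2
    n1_prob, n1_level = backoff_attach(v, n1, p2)   # n1 substituted for n2: v vs. n1

    if n2_prob < 0.5 and n1_prob < 0.5:
        pp2_site = "v"                              # both noun sites lose to the verb
    elif n2_prob >= 0.5 and n1_prob < 0.5:
        pp2_site = "n2"
    elif n1_prob >= 0.5 and n2_prob < 0.5:
        pp2_site = "n1"
    # Both noun sites beat the verb: prefer the one supported by more specific
    # counts (a lower back-off level), breaking ties on the probability itself.
    elif (n2_level, -n2_prob) <= (n1_level, -n1_prob):
        pp2_site = "n2"
    else:
        pp2_site = "n1"

    return pp1_site, pp2_site
```

Wrapping a back-off model like the earlier sketch so that it also reports its level makes this function return an attachment site for each of the two PPs.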
Results
– PP1 (2 structures): 84.3%; baseline 61.2% (choose the most frequent structure)
– PP2 (5 structures): 69.6%; baseline 29.8%
– PP3 (14 structures): 43.6%; baseline 18.5%

Take-home messages
– devise a baseline
– measure performance
– pick tasks where beating the baseline is
  » impressive
  » useful

Ratnaparkhi (Coling 98)
– 970K unannotated sentences of WSJ text
– a tagger and a simple chunker
– heuristic extraction of unambiguous cases

Heuristic extraction
Extract (v, p, n2) if
» p is a real preposition (not "of")
» v is the first verb that occurs within K words to the left of p
» v is not a form of the verb "to be"
» no noun occurs between v and p
» n2 is the first word within K words to the right of p
» no verb occurs between p and n2

Heuristic extraction 2
Extract (n, p, n2) if
» p is a real preposition (not "of")
» n is the first noun that occurs within K words to the left of p
» no verb occurs between n and p
» n2 is the first word within K words to the right of p
» no verb occurs between p and n2
(A code sketch of the first heuristic appears at the end of these notes.)

Accuracy of extraction
– the data is noisy (c. 69% correct)
– but abundant

Evaluation
– 81.91% with a back-off technique
– 81.85% with interpolation in the style of H&R
– the baseline for this data is 70.39%

Portability
– moved to Spanish and got similar performance
– H&R would have had to port Fidditch to Spanish

Where to get more information
– Charniak, chapter 8
– Hindle and Rooth, Computational Linguistics 19(1), pp. 103-120, 1993
– Manning and Schütze, section 8.3
– the original papers
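To make the extraction heuristic concrete, here is a rough Python sketch of the first heuristic, assuming the input sentence has already been tagged with Penn Treebank-style tags. The window size K, the tag tests, and the choice to take the first noun to the right of p (the slide says "first word") are illustrative assumptions, not Ratnaparkhi's exact implementation.

```python
K = 5  # window size; the slides do not specify a value


def is_verb(tag):
    return tag.startswith("VB")


def is_noun(tag):
    return tag.startswith("NN")


def is_prep(word, tag):
    return tag == "IN" and word.lower() != "of"   # "a real preposition (not 'of')"


def is_be(word):
    return word.lower() in {"be", "am", "is", "are", "was", "were", "been", "being"}


def extract_vpn(tagged):
    """Extract unambiguous (v, p, n2) triples from one tagged sentence,
    where `tagged` is a list of (word, tag) pairs."""
    triples = []
    for i, (word, tag) in enumerate(tagged):
        if not is_prep(word, tag):
            continue
        # v: the first verb within K words to the left of p, with no noun
        # in between, and not a form of "to be".
        v = None
        for j in range(i - 1, max(i - K - 1, -1), -1):
            w, t = tagged[j]
            if is_noun(t):
                break                     # a noun intervenes: reject
            if is_verb(t):
                if not is_be(w):
                    v = w
                break
        if v is None:
            continue
        # n2: the first noun within K words to the right of p, with no verb
        # in between (the head noun is used here rather than the literal first word).
        n2 = None
        for j in range(i + 1, min(i + K + 1, len(tagged))):
            w, t = tagged[j]
            if is_verb(t):
                break                     # a verb intervenes: reject
            if is_noun(t):
                n2 = w
                break
        if n2 is not None:
            triples.append((v, word, n2))
    return triples


# Example: "relies heavily on imports" yields the triple (relies, on, imports).
print(extract_vpn([("relies", "VBZ"), ("heavily", "RB"),
                   ("on", "IN"), ("imports", "NNS")]))
```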