Probabilistic and Lexicalized Parsing
COMS 4705, Fall 2004

Probabilistic CFGs
• The probabilistic model
  – Assigning probabilities to parse trees
  – Why? Disambiguation, language modeling for ASR, faster parsing
• Getting the probabilities for the model
• Parsing with probabilities
  – Slight modification to the dynamic programming approach
  – Task: find the maximum-probability tree for an input string

Probability Model
• Attach probabilities to grammar rules
• The expansions for a given non-terminal sum to 1
    VP -> V        .55
    VP -> V NP     .40
    VP -> V NP NP  .05
• Read this as P(specific rule | LHS)
  – "What's the probability that VP will expand to V, given that we have a VP?"

Probability of a Derivation
• A derivation (tree) consists of the set of grammar rules used in the parse tree
• The probability of a tree is just the product of the probabilities of the rules in the derivation
• Note the independence assumption – why don't we use conditional probabilities?

Probability of a Sentence
• The probability of a word sequence (sentence) is the probability of its tree in the unambiguous case
• It is the sum of the probabilities of the possible trees in the ambiguous case

Getting the Probabilities
• From an annotated database
  – E.g. the Penn Treebank
  – To get the probability for a particular VP rule, count all the times the rule is used and divide by the total number of VPs
  – What if you have no treebank (e.g. for a 'new' language)?

Assumptions
• We have a grammar to parse with
• We have a large, robust dictionary with parts of speech
• We have a parser
• Given all that… we can parse probabilistically

Typical Approach
• Bottom-up (CYK) dynamic programming approach
• Assign probabilities to constituents as they are completed and placed in the table
• Use the maximum probability for each constituent going up the tree

How do we fill in the DP table?
• Say we're finding the final part of a parse – an S built by the rule S -> NP VP, where the NP spans positions 0 to i and the VP spans i to j
• The probability of the S is P(S -> NP VP) * P(NP) * P(VP)
• P(NP) and P(VP) are already known, since we're parsing bottom-up: we never need to recalculate the probabilities of constituents lower in the tree

Using the Maxima
• P(NP) is known
• But what if there are multiple NPs for the span of text in question (0 to i)?
• Take the max (Why?)
• This does not mean that other kinds of constituents for the same span are ignored (they might still be in the solution)
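The dynamic-programming scheme just described is easy to sketch. Below is a minimal probabilistic CYK sketch in Python, assuming the grammar is in Chomsky Normal Form; the `lexical` and `binary` dictionaries are a hypothetical encoding of the rule probabilities P(rule | LHS), not code supplied with the course.

```python
from collections import defaultdict

def pcky(words, lexical, binary):
    """Minimal probabilistic CYK sketch (grammar assumed to be in CNF).

    lexical: dict mapping (A, word) -> P(A -> word | A)   (hypothetical format)
    binary:  dict mapping (A, B, C) -> P(A -> B C | A)    (hypothetical format)
    """
    n = len(words)
    # best[i][j][A] = max probability of a constituent A spanning words[i:j]
    best = [[defaultdict(float) for _ in range(n + 1)] for _ in range(n + 1)]
    back = [[dict() for _ in range(n + 1)] for _ in range(n + 1)]

    # Base case (A -> w): fill the diagonal from the lexical rules
    for i, w in enumerate(words):
        for (A, word), p in lexical.items():
            if word == w:
                best[i][i + 1][A] = p

    # Recursive case (A -> B C): build longer spans from shorter ones
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):                      # split point
                for (A, B, C), p in binary.items():
                    prob = p * best[i][k][B] * best[k][j][C]
                    if prob > best[i][j][A]:               # keep only the max per label
                        best[i][j][A] = prob
                        back[i][j][A] = (k, B, C)

    # best[0][n]['S'] is the probability of the best parse; back recovers the tree
    return best, back
```

The example grammar on the next slide already consists only of binary and lexical rules, so once probabilities were attached it could be fed to this sketch directly.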
CYK Parsing: John called Mary from Denver
• Grammar:
    S  -> NP VP        NP -> John | Mary | Denver
    VP -> V NP         V  -> called
    VP -> VP PP        P  -> from
    NP -> NP PP
    PP -> P NP

Example
• The sentence has two parses: one attaches the PP "from Denver" to the VP (via VP -> VP PP), the other attaches it to the NP "Mary" (via NP -> NP PP)
  (tree diagrams omitted)

Filling the CYK chart
• Base case (A -> w): the diagonal cells get NP (John), V (called), NP (Mary), P (from), NP (Denver)
• Recursive case (A -> B C): longer spans are built from shorter ones – VP over "called Mary", S over "John called Mary", PP over "from Denver", NP over "Mary from Denver", then two VP analyses over "called Mary from Denver" (one via VP -> V NP with the larger NP, one via VP -> VP PP), and finally S over the whole sentence
  (step-by-step chart diagrams omitted)

Problems with PCFGs
• The probability model we're using is based only on the rules in the derivation…
  – It doesn't use the words in any real way
    • E.g. PP attachment often depends on the verb, its object, and the preposition (I ate pickles with a fork. I ate pickles with relish.)
  – It doesn't take into account where in the derivation a rule is used
    • E.g. pronouns are more often subjects than objects (She hates Mary. Mary hates her.)

Solution
• Add lexical dependencies to the scheme…
  – Add the predilections of particular words into the probabilities in the derivation
  – I.e. condition the rule probabilities on the actual words

Heads
• Make use of the notion of the head of a phrase, e.g.
  – The head of an NP is its noun
  – The head of a VP is its verb
  – The head of a PP is its preposition
• Phrasal heads
  – Can 'take the place of' whole phrases, in some sense
  – Define the most important characteristics of the phrase
  – Phrases are generally identified by their heads

Example (correct parse)
  (tree diagram omitted)

Example (wrong)
  (tree diagram omitted)
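To make the head idea above concrete, here is a tiny head-percolation sketch, assuming trees are written as nested tuples; the HEAD_CHILD table is a hypothetical, deliberately simplified set of head rules that only mirrors the NP/VP/PP cases listed on the Heads slide.

```python
# Head rules (hypothetical and simplified): which child carries the head upward.
HEAD_CHILD = {
    'S':  ['VP'],        # the head of S is the head of its VP
    'VP': ['V', 'VP'],   # VP -> V NP ...  or  VP -> VP PP
    'NP': ['NP'],        # NP -> NP PP percolates the inner NP's head
    'PP': ['P'],         # the head of a PP is its preposition
}

def head_word(tree):
    """Return the head word of a constituent written as a nested tuple."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return children[0]                      # preterminal, e.g. ('V', 'called')
    for child in children:
        if child[0] in HEAD_CHILD.get(label, []):
            return head_word(child)
    return head_word(children[0])               # fallback: leftmost child

tree = ('S', ('NP', 'John'),
             ('VP', ('V', 'called'),
                    ('NP', ('NP', 'Mary'),
                           ('PP', ('P', 'from'), ('NP', 'Denver')))))
print(head_word(tree))   # -> 'called', percolated up from V to VP to S
```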
How?
• We started with rule probabilities
  – VP -> V NP PP        P(rule | VP)
    • E.g., the count of this rule divided by the number of VPs in a treebank
• Now we want lexicalized probabilities
  – VP(dumped) -> V(dumped) NP(sacks) PP(into)
  – P(r | VP ^ dumped is the verb ^ sacks is the head of the NP ^ into is the head of the PP)
  – Not likely to have significant counts in any treebank

Declare Independence
• So, exploit the independence assumption and collect the statistics you can…
• Focus on capturing two things
  – Verb subcategorization
    • Particular verbs have affinities for particular VP expansions
  – Objects have affinities for their predicates (mostly their mothers and grandmothers)
    • Some objects fit better with some predicates than others

Verb Subcategorization
• Condition particular VP rules on their head… so
    r: VP -> V NP PP    P(r | VP)
  becomes
    P(r | VP ^ dumped)
• What's the count? How many times this rule was used with dumped, divided by the total number of VPs that dumped appears in

Preferences
• Subcategorization captures the affinity between VP heads (verbs) and the VP rules they go with
• What about the affinity between VP heads and the heads of the other daughters of the VP?

Example (correct parse)
  (tree diagram omitted)

Example (wrong)
  (tree diagram omitted)

Preferences
• The issue here is the attachment of the PP
• So the affinities we care about are those between dumped and into vs. sacks and into
  – Count the times dumped is the head of a constituent that has a PP daughter with into as its head, and normalize (alternatively, P(into | PP, dumped is the mother's head))
  – Vs. the situation where sacks is the head of a constituent with into as the head of a PP daughter (or, P(into | PP, sacks is the mother's head))
• (a small counting sketch for these estimates appears at the end of these notes)

Another Example
• Consider the VPs
  – ate spaghetti with gusto
  – ate spaghetti with marinara
• The affinity of gusto for ate is much larger than its affinity for spaghetti
• On the other hand, the affinity of marinara for spaghetti is much higher than its affinity for ate

But Not a Head-Probability Relationship
• Note that the relationship here is more distant and doesn't involve a headword, since gusto and marinara aren't the heads of the PPs (Hindle & Rooth '91)
• In "ate spaghetti with gusto" the PP(with) attaches to VP(ate); in "ate spaghetti with marinara" it attaches to NP(spaghetti)
  (tree diagrams omitted)

Next Time
• Midterm
  – Covers everything assigned and all lectures up through today
  – Short answers (2-3 sentences), exercises, longer answers (1 paragraph)
  – Closed book; calculators allowed but no laptops
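The counting sketch referred to above: both lexicalized estimates on the preceding slides are plain relative frequencies, so here is a minimal Python sketch, assuming a hypothetical list of head-annotated rule occurrences pulled from a treebank; the record format and function names are illustrative only.

```python
# Hypothetical input: one record per rule occurrence in a head-annotated treebank.
# Each record is (lhs, rhs, lhs_head, child_heads), e.g.
#   ('VP', ('V', 'NP', 'PP'), 'dumped', ('dumped', 'sacks', 'into'))

def subcat_prob(occurrences, rule, verb):
    """P(rule | VP, verb): times this rule is used with `verb` as the VP's head,
    divided by the total number of VPs headed by `verb`."""
    lhs, rhs = rule
    used  = sum(1 for (l, r, h, _) in occurrences if l == lhs and r == rhs and h == verb)
    total = sum(1 for (l, _, h, _) in occurrences if l == lhs and h == verb)
    return used / total if total else 0.0

def pp_attach_prob(occurrences, prep, mother_head):
    """P(prep | PP daughter, mother's head): times a constituent headed by
    `mother_head` has a PP daughter headed by `prep`, normalized over all
    PP daughters appearing under `mother_head`."""
    with_prep = total = 0
    for (_, rhs, head, child_heads) in occurrences:
        if head != mother_head:
            continue
        for cat, child_head in zip(rhs, child_heads):
            if cat == 'PP':
                total += 1
                if child_head == prep:
                    with_prep += 1
    return with_prep / total if total else 0.0

# Toy illustration (made-up counts):
occurrences = [
    ('VP', ('V', 'NP', 'PP'), 'dumped', ('dumped', 'sacks', 'into')),
    ('VP', ('V', 'NP'),       'dumped', ('dumped', 'sacks')),
    ('NP', ('NP', 'PP'),      'sacks',  ('sacks', 'into')),
]
print(subcat_prob(occurrences, ('VP', ('V', 'NP', 'PP')), 'dumped'))  # 0.5
print(pp_attach_prob(occurrences, 'into', 'dumped'))                  # 1.0
```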