Basic Parsing with Context-Free Grammars
Slides adapted from Julia Hirschberg and Dan Jurafsky

Homework – Getting Started
Data:
– News articles in TDT4
– Make sure you are looking in the ENG sub-directory
– You need to represent each article in .arff form
– You need to write a program that will extract “features” from each article (note that files with POS tags are now available in Eng_POSTAGGED)
– The .arff file contains the independent variables as well as the dependent variable

An Example
Start with classifying into topic. Suppose you want to start with just the words. Two approaches:
1. Use your intuition to choose a few words that might disambiguate
2. Start with all words

What would your .arff file look like?
Words are the attributes. What are the values?
– Binary: present or not
– Frequency: how many times it occurs
– TF*IDF: how many times it occurs in this document (TF = term frequency) divided by how many times it occurs in all documents (DF = document frequency)

news_2865.input
<DOC> <SOURCE_TYPE>NWIRE</SOURCE_TYPE> <SOURCE_LANG>ENG</SOURCE_LANG> <SOURCE_ORG>NYT</SOURCE_ORG> <DOC_DATE>20010101</DOC_DATE> <BROAD_TOPIC>BT_9</BROAD_TOPIC> <NARROW_TOPIC>NT_34</NARROW_TOPIC> <TEXT>
Reversing a policy that has kept medical errors secret for more than two decades, federal officials say they will soon allow Medicare beneficiaries to obtain data about doctors who botched their care. Tens of thousands of Medicare patients file complaints each year about the quality of care they receive from doctors and hospitals. But in many cases, patients get no useful information because doctors can block the release of assessments of their performance. Under a new policy, officials said, doctors will no longer be able to veto disclosure of the findings of investigations. Federal law has for many years allowed for review of care received by Medicare patients, and the law says a peer review organization must inform the patient of the ``final disposition of the complaint'' in each case.
But the federal rules used to carry out the law say the peer review organization may disclose information about a doctor only ``with the consent of that practitioner.'' The federal manual for peer review organizations includes similar language about disclosure. Under the new plan, investigators will have to tell patients whether their care met ``professionally recognized standards of health care'' and inform them of any action against the doctor or the hospital. Patients could use such information in lawsuits and other actions against doctors and hospitals that provided substandard care. The new policy came in response to

All words for news_2865.input
The attributes are the words (independent variables) plus Class (the dependent variable), e.g.:
– Reversing, A, Policy, That, Has, Kept, …: counts in news_2865 (1, 100, 20, 50, 75, 3, …)
– Commonwealth, Independent, States, Preemptive, Refugees (words occurring in news_2816.input): 0, 0, 0, 0, …
– Class: BT_9

Try it!
– Open your file
– Select attributes using Chi-square
– You can cut and paste the resulting attributes to a file
– Classify
– How does it work?
– Try n-grams, POS, or date next in the same way: how many features would each give you?

CFG: Example
Many possible CFGs for English; here is an example (fragment):
– S -> NP VP
– VP -> V NP
– NP -> DetP N | AdjP NP
– AdjP -> Adj | Adv AdjP
– N -> boy | girl
– V -> sees | likes
– Adj -> big | small
– Adv -> very
– DetP -> a | the
“the very small boy likes a girl”

Modify the grammar

Derivations of CFGs
String rewriting system: we derive a string (= derived structure), but the derivation history is represented by a phrase-structure tree (= derivation structure)!

Formal Definition of a CFG
G = (V, T, P, S)
– V: finite set of nonterminal symbols
– T: finite set of terminal symbols; V and T are disjoint
– P: finite set of productions of the form A -> α, with A ∈ V and α ∈ (T ∪ V)*
– S ∈ V: start symbol

Context?
The notion of context in CFGs has nothing to do with the ordinary meaning of the word context in language. All it really means is that the non-terminal on the left-hand side of a rule is out there all by itself (free of context):
A -> B C
means that I can rewrite an A as a B followed by a C regardless of the context in which A is found.

Key Constituents (English)
– Sentences
– Noun phrases
– Verb phrases
– Prepositional phrases

Sentence-Types
– Declaratives: I do not. S -> NP VP
– Imperatives: Go around again! S -> VP
– Yes-No Questions: Do you like my hat? S -> Aux NP VP
– WH Questions: What are they going to do? S -> WH Aux NP VP

NPs
– NP -> Pronoun: I came, you saw it, they conquered
– NP -> Proper-Noun: New Jersey is west of New York City; Lee Bollinger is the president of Columbia
– NP -> Det Noun: The president
– NP -> Det Nominal; Nominal -> Noun Noun: A morning flight to Denver

PPs
PP -> Preposition NP
– Over the house
– Under the house
– To the tree
– At play
– At a party on a boat at night

Recursion
We’ll have to deal with rules such as the following, where the non-terminal on the left also appears somewhere on the right (directly):
– NP -> NP PP: [[The flight] [to Boston]]
– VP -> VP PP: [[departed Miami] [at noon]]

Recursion
Of course, this is what makes syntax interesting:
Flights from Denver
Flights from Denver to Miami
Flights from Denver to Miami in February
Flights from Denver to Miami in February on a Friday
Flights from Denver to Miami in February on a Friday under $300
Flights from Denver to Miami in February on a Friday under $300 with lunch

Recursion
[[Flights] [from Denver]]
[[[Flights] [from Denver]] [to Miami]]
[[[[Flights] [from Denver]] [to Miami]] [in February]]
[[[[[Flights] [from Denver]] [to Miami]] [in February]] [on a Friday]]
Etc.
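The bracketings above follow mechanically from repeated application of the recursive rule NP -> NP PP. A tiny sketch (our own helper, not from the slides) that builds them:

```python
# Sketch: each application of the recursive rule NP -> NP PP wraps the
# NP built so far in one more layer of brackets. Illustrative only.
def attach_pps(np, pps):
    for pp in pps:
        np = f"[{np} [{pp}]]"   # one application of NP -> NP PP per PP
    return np

print(attach_pps("[Flights]", ["from Denver", "to Miami", "in February"]))
# -> [[[[Flights] [from Denver]] [to Miami]] [in February]]
```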
NP -> NP PP

Implications of recursion and context-freeness
If you have a rule like VP -> V NP, it only cares that the thing after the verb is an NP; it doesn’t have to know about the internal affairs of that NP.

The point
VP -> V NP
(I) hate
– flights from Denver
– flights from Denver to Miami
– flights from Denver to Miami in February
– flights from Denver to Miami in February on a Friday
– flights from Denver to Miami in February on a Friday under $300
– flights from Denver to Miami in February on a Friday under $300 with lunch

Grammar Equivalence
Can have different grammars that generate the same set of strings (weak equivalence):
– Grammar 1: NP -> DetP N and DetP -> a | the
– Grammar 2: NP -> a N | the N
Can have different grammars that have the same set of derivation trees (strong equivalence):
– With CFGs, possible only with useless rules
– Grammar 2: NP -> a N | the N
– Grammar 3: NP -> a N | the N, DetP -> many
Strong equivalence implies weak equivalence.

Normal Forms &c
There are weakly equivalent normal forms (Chomsky Normal Form, Greibach Normal Form). There are ways to eliminate useless productions and so on.

Chomsky Normal Form
A CFG is in Chomsky Normal Form (CNF) if all productions are of one of two forms:
– A -> BC, with A, B, C nonterminals
– A -> a, with A a nonterminal and a a terminal
Every CFG has a weakly equivalent CFG in CNF.

“Generative Grammar”
– Formal languages: a formal device to generate a set of strings (such as a CFG)
– Linguistics (Chomskyan linguistics in particular): an approach in which a linguistic theory enumerates all possible strings/structures in a language (= competence)
– Chomskyan theories do not really use formal devices – they use CFG + informally defined transformations

Nobody Uses Simple CFGs (Except Intro NLP Courses)
– All major syntactic theories (Chomsky, LFG, HPSG, TAG-based theories) represent both phrase structure and dependency, in one way or another
– All successful parsers currently use statistics about phrase structure and about dependency
– Derive dependency through “head percolation”: for each rule, say which daughter is the head

Penn Treebank (PTB)
– Syntactically annotated corpus of newspaper texts (phrase structure)
– The newspaper texts are naturally occurring data, but the PTB is not!
– PTB annotation represents a particular linguistic theory (but a fairly “vanilla” one)
– Particularities:
  – Very indirect representation of grammatical relations (need for head percolation tables)
  – Completely flat structure in NP (brown bag lunch, pink-and-yellow child seat)
  – Has flat Ss, flat VPs

Example from PTB
( (S (NP-SBJ It) (VP 's (NP-PRD (NP (NP the latest investment craze) (VP sweeping (NP Wall Street))) : (NP (NP a rash) (PP of (NP (NP new closed-end country funds) , (NP (NP those (ADJP publicly traded) portfolios) (SBAR (WHNP-37 that) (S (NP-SBJ *T*-37) (VP invest (PP-CLR in (NP (NP stocks) (PP of (NP a single foreign country)))))))))))

Types of syntactic constructions
Is this the same construction?
– An elf decided to clean the kitchen
– An elf seemed to clean the kitchen
(An elf cleaned the kitchen)
Is this the same construction?
– An elf decided to be in the kitchen
– An elf seemed to be in the kitchen
(An elf was in the kitchen)

Types of syntactic constructions (ctd)
Is this the same construction?
– There is an elf in the kitchen
– *There decided to be an elf in the kitchen
– There seemed to be an elf in the kitchen
Is this the same construction?
– It is raining/it rains
– ??It decided to rain/be raining
– It seemed to rain/be raining

Types of syntactic constructions (ctd)
Conclusion:
– to seem: whatever is the embedded surface subject can appear in the upper clause
– to decide: only full nouns that are referential can appear in the upper clause
Two types of verbs.

Types of syntactic constructions: Analysis
[Tree diagrams, not recoverable as text: parallel structures for “an elf decided to clean the kitchen” and “an elf seemed to be in the kitchen”, introducing an empty subject PRO in the lower clause of decided and a raised subject with trace t under seemed.]

Types of syntactic constructions: Analysis
to seem: the lower surface subject raises to the upper clause; raising verb
– seems (there to be an elf in the kitchen)
– there seems (t to be an elf in the kitchen)
– it seems (there is an elf in the kitchen)

Types of syntactic constructions: Analysis (ctd)
to decide: the subject is in the upper clause and co-refers with an empty subject in the lower clause; control verb
– an elf decided (an elf to clean the kitchen)
– an elf decided (PRO to clean the kitchen)
– an elf decided (he cleans/should clean the kitchen)
– *it decided (an elf cleans/should clean the kitchen)

Lessons Learned from the Raising/Control Issue
– Use the distribution of data to group phenomena into classes
– Use different underlying structures as the basis for explanations
– Allow things to “move” around from underlying structure -> transformational grammar
– Check whether the explanation you give makes predictions

The Big Picture
[Diagram: the Empirical Matter (e.g., “Maud expects there to be a riot” / “*Teri promised there to be a riot” / “Maud expects the shit to hit the fan” / “*Teri promised the shit to hit the fan”) is what a descriptive theory is about and predicts; the descriptive theory uses Formalisms (data structures, formalisms, algorithms, distributional models); an explanatory theory is about Linguistic Theory. Content: relate morphology to semantics – surface representation (e.g., ps), deep representation (e.g., dep), correspondence.]

Syntactic Parsing
– Declarative formalisms like CFGs and FSAs define the legal strings of a language, but only tell you “this is a legal string of the language X”
– Parsing algorithms specify how to recognize the strings of a language and assign each string one (or more) syntactic analyses

Parsing as a Form of Search
Searching FSAs:
– Finding the right path through the automaton
– Search space defined by the structure of the FSA
Searching CFGs:
– Finding the right parse tree among all possible parse trees
– Search space defined by the grammar
Constraints provided by the input sentence and the automaton or grammar.

CFG for Fragment of English
S -> NP VP
S -> Aux NP VP
VP -> V
VP -> V NP
NP -> Det Nom
NP -> PropN
Nom -> N
Nom -> Nom N
Nom -> Nom PP
Nom -> Adj Nom
PP -> Prep NP
Det -> that | this | a | the
N -> old | dog | footsteps | young
V -> dog | include | prefer
Aux -> does
Prep -> from | to | on | of
PropN -> Bush | McCain | Obama

Parse Tree for “The old dog the footsteps of the young” for the Prior CFG
[S [NP [Det The] [Nom [N old]]] [VP [V dog] [NP [Det the] [Nom [Nom [N footsteps]] [PP [Prep of] [NP [Det the] [Nom [N young]]]]]]]]

Top-Down Parser
– Builds from the root S node to the leaves
– Expectation-based
– Common search strategy: top-down, left-to-right, backtracking
  – Try the first rule with LHS = S
  – Next expand all constituents in these trees/rules
  – Continue until leaves are POS
  – Backtrack when a candidate POS does not match the input string

Rule Expansion
“The old dog the footsteps of the young.”
– Where does backtracking happen?
– What are the computational disadvantages?
– What are the advantages?
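A top-down, left-to-right, backtracking recognizer for (a variant of) the fragment grammar can be sketched as follows. This is our own illustration, not the lecture’s code; note that the left-recursive rules Nom -> Nom N and Nom -> Nom PP are replaced here by right-recursive stand-ins (Nom -> N Nom, Nom -> N PP), because naive top-down depth-first search loops forever on left recursion – a problem taken up later in these slides.

```python
# Naive top-down backtracking recognizer (illustrative sketch).
# Left-recursive rules from the slide grammar are replaced by
# right-recursive stand-ins so depth-first search terminates.
GRAMMAR = {
    "S":     [["NP", "VP"], ["Aux", "NP", "VP"]],
    "NP":    [["Det", "Nom"], ["PropN"]],
    "Nom":   [["N"], ["N", "Nom"], ["N", "PP"]],  # stand-ins for Nom -> Nom N / Nom -> Nom PP
    "VP":    [["V", "NP"], ["V"]],
    "PP":    [["Prep", "NP"]],
    "Det":   [["that"], ["this"], ["a"], ["the"]],
    "N":     [["old"], ["dog"], ["footsteps"], ["young"]],
    "V":     [["dog"], ["include"], ["prefer"]],
    "Aux":   [["does"]],
    "Prep":  [["from"], ["to"], ["on"], ["of"]],
    "PropN": [["Bush"], ["McCain"], ["Obama"]],
}

def expand(symbol, words, i):
    """Yield every position j such that `symbol` can cover words[i:j]."""
    if symbol not in GRAMMAR:                  # terminal: must match the next word
        if i < len(words) and words[i] == symbol:
            yield i + 1
        return
    for rhs in GRAMMAR[symbol]:                # try rules top-down, in order;
        yield from expand_seq(rhs, words, i)   # a dead end = backtrack to next rule

def expand_seq(rhs, words, i):
    if not rhs:
        yield i
        return
    for mid in expand(rhs[0], words, i):       # match the first symbol, then the rest
        yield from expand_seq(rhs[1:], words, mid)

def recognize(sentence):
    words = sentence.split()
    return any(j == len(words) for j in expand("S", words, 0))

print(recognize("the old dog the footsteps of the young"))  # True
```

Backtracking happens wherever a chosen expansion fails against the input (e.g., trying “dog” as an N before retrying it as the V); the generators simply fall through to the next alternative.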
Bottom-Up Parsing
The parser begins with the words of the input and builds up trees, applying grammar rules whose RHS matches:
– Det N V Det N Prep Det N: The old dog the footsteps of the young.
– Det Adj N Det N Prep Det N: The old dog the footsteps of the young.
The parse continues until an S root node is reached or no further node expansion is possible.

What’s right/wrong with….
– Top-Down parsers: they never explore illegal parses (e.g., ones which can’t form an S) – but waste time on trees that can never match the input
– Bottom-Up parsers: they never explore trees inconsistent with the input – but waste time exploring illegal parses (with no S root)
– For both: find a control strategy – how to explore the search space efficiently?
  – Pursue all parses in parallel, or backtrack, or …?
  – Which rule to apply next?
  – Which node to expand next?

Some Solutions
Dynamic programming approaches – use a chart to represent partial results
CKY Parsing Algorithm:
– Bottom-up
– Grammar must be in Normal Form
– The parse tree might not be consistent with linguistic theory
Earley Parsing Algorithm:
– Top-down
– Expectations about constituents are confirmed by input
– A POS tag for a word that is not predicted is never added
– Chart parser

Earley Parsing
– Allows arbitrary CFGs
– Fills a table in a single sweep over the input words
  – Table is length N+1; N is the number of words
  – Table entries represent: completed constituents and their locations; in-progress constituents; predicted constituents

States
The table entries are called states and are represented with dotted rules.
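A dotted-rule state can be represented directly in code; this is a sketch, with field names of our own choosing:

```python
from dataclasses import dataclass

# A chart state: a grammar rule, a dot position, and an input span.
# Illustrative representation; field names are our own.
@dataclass(frozen=True)
class State:
    lhs: str      # e.g. "NP"
    rhs: tuple    # e.g. ("Det", "Nominal")
    dot: int      # how many RHS symbols have been recognized so far
    start: int    # input position where this constituent begins
    end: int      # input position of the dot

    def complete(self):
        return self.dot == len(self.rhs)

    def next_symbol(self):
        return None if self.complete() else self.rhs[self.dot]

s = State("NP", ("Det", "Nominal"), 1, 1, 2)  # NP -> Det . Nominal [1,2]
print(s.next_symbol(), s.complete())          # Nominal False
```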
– S -> · VP: a VP is predicted
– NP -> Det · Nominal: an NP is in progress
– VP -> V NP ·: a VP has been found

States/Locations
It would be nice to know where these things are in the input, so:
– S -> · VP [0,0]: a VP is predicted at the start of the sentence
– NP -> Det · Nominal [1,2]: an NP is in progress; the Det goes from 1 to 2
– VP -> V NP · [0,3]: a VP has been found starting at 0 and ending at 3

Graphically
[Chart diagram of the states above]

Earley
As with most dynamic programming approaches, the answer is found by looking in the table in the right place. In this case, there should be an S state in the final column that spans from 0 to N and is complete:
– S -> α · [0,N]
If that’s the case, you’re done.

Earley Algorithm
March through the chart left-to-right. At each step, apply one of three operators:
– Predictor: create new states representing top-down expectations
– Scanner: match word predictions (rule with a word after the dot) to words
– Completer: when a state is complete, see what rules were looking for that completed constituent

Predictor
Given a state
– with a non-terminal to the right of the dot
– that is not a part-of-speech category:
– Create a new state for each expansion of the non-terminal
– Place these new states into the same chart entry as the generating state, beginning and ending where the generating state ends
So the Predictor looking at
– S -> . VP [0,0]
results in
– VP -> . Verb [0,0]
– VP -> . Verb NP [0,0]

Scanner
Given a state
– with a non-terminal to the right of the dot
– that is a part-of-speech category:
– If the next word in the input matches this part of speech, create a new state with the dot moved over the non-terminal, and add this state to the chart entry following the current one
So the Scanner looking at
– VP -> . Verb NP [0,0]
– if the next word, “book”, can be a verb, adds the new state:
– VP -> Verb . NP [0,1]
Note: the Earley algorithm uses top-down input to disambiguate POS! Only a POS predicted by some state can get added to the chart!

Completer
Applied to a state when its dot has reached the right end of the rule.
The parser has discovered a category over some span of the input. Find and advance all previous states that were looking for this category: copy the state, move the dot, insert in the current chart entry.
Given:
– NP -> Det Nominal . [1,3]
– VP -> Verb . NP [0,1]
Add:
– VP -> Verb NP . [0,3]

Earley: how do we know we are done?
Find an S state in the final column that spans from 0 to N and is complete:
– S -> α · [0,N]
If that’s the case, you’re done.

Earley
More specifically…
1. Predict all the states you can upfront
2. Read a word
   1. Extend states based on matches
   2. Add new predictions
   3. Go to 2
3. Look at the last chart entry (N) to see if you have a winner

Example
“Book that flight”
We should find… an S from 0 to 3 that is a completed state…
[Worked chart example, shown over several slides]

Details
What kind of algorithm did we just describe? Not a parser – a recognizer:
– The presence of an S state with the right attributes in the right place indicates a successful recognition
– But no parse tree… no parser
– That’s how we solve (not) an exponential problem in polynomial time

Converting Earley from Recognizer to Parser
– With the addition of a few pointers we have a parser
– Augment the “Completer” to point to where we came from

Augmenting the chart with structural information
[Diagram: chart states S8–S13 with backpointers]

Retrieving Parse Trees from Chart
– All the possible parses for an input are in the table
– We just need to read off all the backpointers from every complete S in the last column of the table
– Find all the S -> α . [0,N]
– Follow the structural traces from the Completer
– Of course, this won’t be polynomial time, since there could be an exponential number of trees
– So we can at least represent ambiguity efficiently

Left Recursion vs. Right Recursion
Depth-first search will never terminate if the grammar is left recursive (e.g., NP --> NP PP)
Solutions:
– Rewrite the grammar (automatically?) to a weakly equivalent one which is not left-recursive, e.g.
The man {on the hill with the telescope…}
NP --> NP PP (wanted: Nom plus a sequence of PPs)
NP --> Nom PP
NP --> Nom
Nom --> Det N
…becomes…
NP --> Nom NP’
Nom --> Det N
NP’ --> PP NP’ (wanted: a sequence of PPs)
NP’ --> e
Not so obvious what these rules mean…
– Harder to detect and eliminate non-immediate left recursion:
  – NP --> Nom PP
  – Nom --> NP
– Fix depth of search explicitly
– Rule ordering: non-recursive rules first
  – NP --> Det Nom
  – NP --> NP PP

Another Problem: Structural ambiguity
Multiple legal structures:
– Attachment (e.g., I saw a man on a hill with a telescope)
– Coordination (e.g., younger cats and dogs)
– NP bracketing (e.g., Spanish language teachers)

NP vs. VP Attachment
Solution?
– Return all possible parses and disambiguate using “other methods”

Summing Up
– Parsing is a search problem which may be implemented with many control strategies
  – Top-Down or Bottom-Up approaches each have problems
    – Combining the two solves some but not all issues
  – Left recursion
  – Syntactic ambiguity
– Next time: making use of statistical information about syntactic constituents
  – Read Ch 14
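To tie the Earley slides together, here is a minimal recognizer sketch for “Book that flight”. The grammar, lexicon, and the one-POS-per-word simplification are our own (a real lexicon maps a word to a set of tags); the dummy GAMMA start rule is a common implementation trick, not something from the slides.

```python
# Sketch of the Earley recognizer described above, for "book that flight".
# States are tuples (lhs, rhs, dot, start); chart[i] holds states ending at i.
GRAMMAR = {
    "S":       [("VP",), ("NP", "VP")],
    "VP":      [("Verb",), ("Verb", "NP")],
    "NP":      [("Det", "Nominal")],
    "Nominal": [("Noun",)],
}
LEXICON = {"book": "Verb", "that": "Det", "flight": "Noun"}  # simplified: one POS per word

def earley_recognize(words):
    n = len(words)
    chart = [[] for _ in range(n + 1)]

    def add(state, i):
        if state not in chart[i]:
            chart[i].append(state)

    add(("GAMMA", ("S",), 0, 0), 0)             # dummy start state
    for i in range(n + 1):
        for state in chart[i]:                  # chart[i] may grow as we iterate
            lhs, rhs, dot, start = state
            if dot < len(rhs):
                nxt = rhs[dot]
                if nxt in GRAMMAR:              # Predictor: top-down expectations
                    for prod in GRAMMAR[nxt]:
                        add((nxt, prod, 0, i), i)
                elif i < n and LEXICON.get(words[i]) == nxt:
                    add((lhs, rhs, dot + 1, start), i + 1)  # Scanner: move dot over POS
            else:                               # Completer: advance states waiting on lhs
                for olhs, orhs, odot, ostart in chart[start]:
                    if odot < len(orhs) and orhs[odot] == lhs:
                        add((olhs, orhs, odot + 1, ostart), i)
    return ("GAMMA", ("S",), 1, 0) in chart[n]  # a complete S spanning [0,N]?

print(earley_recognize("book that flight".split()))  # True
```

With backpointers recorded in the Completer, the same chart would yield parse trees rather than just acceptance.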