Constituency parsing (句構造解析)
Naoaki Okazaki
okazaki at ecei.tohoku.ac.jp
http://www.chokkan.org/
http://twitter.com/#!/chokkanorg
2011-10-25 Information Communication Theory (情報伝達学)

Acknowledgements
• Portions of this material are from:
  • D. Jurafsky and J. H. Martin (2009). Speech and Language Processing, Pearson.
  • J. Nivre and S. Kübler (2006). Dependency Parsing. Tutorial at Coling-ACL 2006.
  • M. Collins (1999). Head-Driven Statistical Models for Natural Language Processing. Ph.D. thesis, University of Pennsylvania.
  • C. Macleod et al. (1998). COMLEX Syntax Reference Manual Version 3.0. Linguistic Data Consortium.

Syntactic parsing (構文解析)
• Analyze a sentence to determine its grammatical structure
  • Syntactic parsing (mostly) builds a tree for a sentence
  • Every token in a given sentence needs to appear as a node or leaf of its parse tree
• An initial step toward semantic analysis
  • Used by many NLP problems and applications (e.g., machine translation, summarization, question answering)
• One of the core technologies of NLP
  • A large number of research papers have been published
  • Many interesting ideas and technologies have been proposed

Constituency and dependency
[Figure (Nivre and Kübler, 2006): the sentence “Economic news had little effect on financial markets .” analyzed in two ways]
• Constituent parsing: a phrase-structure tree with the labels S, NP, VP, and PP above the part-of-speech tags JJ NN VBD JJ NN IN JJ NNS PU
• Dependency parsing: labeled arcs between words (nmod, sbj, obj, pc, p)

Constituency and dependency
• Constituency (構成, 構造)
  • Groups of words behaving as a single unit (e.g., phrases)
  • e.g., John loves Mary
    • The verb “loves” and the noun “Mary” form a verb phrase
    • The noun “John” and the verb phrase “loves Mary” form a sentence
• Dependency (依存, 係り受け)
  • Describes relationships between two words
  • e.g., John loves Mary
    • “John” is the nominal subject of the verb “loves”
    • “Mary” is the direct object of the verb “loves”
  • Well suited to languages with scrambling (語順の入れ替え)

Table of contents
• Constituency
• Context Free Grammar (CFG)
• Brief overview of the formal grammar of English
• Penn Treebank
• CKY algorithm
• Probabilistic Context Free Grammar (PCFG)
• Probabilistic CKY algorithm
• Limitations of PCFG
• Enhancements of PCFG
• Evaluation
• Let’s look at how a tree is built from a sentence!
  • Lecture #2: feature → tag (label)
  • Lecture #3: token sequence → tag sequence
  • Lectures #4 and #5: token sequence → tree

Constituency and Context-Free Grammar (CFG)
Chapter 12.2, D. Jurafsky and J. H. Martin. Speech and Language Processing, 2009.

Constituency
• Constituent
  • Groups of words behaving as a single unit (e.g., phrase, clause)
  • Linguists do not agree on the details of constituency
• Examples of noun phrases
  • my neighbor Totoro; the spy who loved me;
  • outstanding continued performance by an actor in a leading role

Evidence for constituency
• Substitution
  • [My neighbor Totoro | The spy who loved me | The Dark Side of the Moon | Michael Jackson | I] was very popular.
• Construction
  • e.g., a noun phrase consists of a head noun and its modifiers
• Ordering
  • e.g., a noun phrase can appear before a verb
• Relocation
  • A constituent may be placed in a number of different locations
    • On 18 Oct 2011, we learned the formal grammars of English
    • We learned the formal grammars of English on 18 Oct 2011

Context-Free Grammars (CFG) (文脈自由文法)
• Used for describing formation rules of constituents
• Also called:
  • Phrase-Structure Grammars (句構造文法)
  • Backus-Naur Form (BNF) (BN記法)
• Consists of a set of production rules
  • NP → Det Nominal
  • NP → ProperNoun
  • Nominal → Noun | Nominal Noun
  • Det → a | an | the
  • ProperNoun → I | you | he | she | it | they | …
  • Noun → abandon | abduction | … | zoo
• Derivation of “a zoo”
  • NP → Det Nominal → Det Noun → a zoo
• Parse tree (derivation): [NP [Det a] [Nominal [Noun zoo]]]
  • A node dominates the nodes below it (e.g., NP dominates Nominal)

Formal definition of CFG
• Four parameters:
  • N: a set of non-terminal symbols (or variables) (非終端記号)
    • e.g., NP, VP, PP, AP, Noun, Verb, Adj, Det
  • Σ: a set of terminal symbols (disjoint from N) (終端記号)
    • e.g., a, the, flight, book, that, I, my
  • R: a set of production rules (生成規則): A → B
    • A (mother): a non-terminal symbol
    • B (daughters): a string of terminal/non-terminal symbols, i.e., B ∈ (N ∪ Σ)*
    • e.g., S → VP, S → NP VP, S → Wh-NP Aux NP VP, NP → NP PP
  • S: a start symbol (初期記号, 開始記号)
    • e.g., S

An example of CFG grammar
• Grammar rules
  • S → NP VP
  • S → VP NP
  • VP → Verb
  • VP → Verb NP
  • VP → VP PP
  • NP → Noun
  • NP → Det NP
  • NP → Noun NP
  • NP → NP PP
  • PP → Prep NP
• Lexicon
  • Noun → time | flies | arrow
  • Verb → time | flies | like | arrow
  • Prep → like
  • Det → an

A derivation of “Time flies like an arrow”
• Tree notation
  [Figure: parse tree with S at the root, an NP over “Time”, and a VP made of the VP “flies” and the PP “like an arrow”]
• Bracketed notation
  [S [NP [Noun Time]] [VP [VP [Verb flies]] [PP [Prep like] [NP [Det an] [Noun arrow]]]]]

Notes on CFG
• Grammatical and ungrammatical
  • Grammatical: sentences that can be derived by a grammar
    • “Time flies flies like an arrow like an arrow” is grammatical with respect to the example grammar
  • Ungrammatical: sentences that cannot be derived by a grammar
• Generative grammar (生成文法)
  • Generation: rewrite the start symbol to produce a new sentence
  • Recognition: check the grammaticality of a given sentence
  • Parsing: build a tree for a given sentence using the grammar

Brief Overview of Formal Grammar of English
Chapter 12.3, Jurafsky and Martin. Speech and Language Processing, 2009.

Sentence constructions
• Declarative (平叙文): S → NP VP
  • This is a pen.
• Imperative (命令文): S → VP
  • Mind the gap.
• Yes-no question: SQ → Aux NP VP ?
  • Do you have a pen?
• Wh-questions
  • Wh-subject-question: S → Wh-NP VP ?
    • Who wants to be a millionaire?
  • Wh-non-subject-question: S → Wh-NP Aux NP VP
    • What kind of sushi do you like?
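The generation view of a CFG (rewrite the start symbol until only terminal symbols remain) can be made concrete with the example grammar for “Time flies like an arrow”. The following is a minimal sketch, not part of the original slides: the names RULES, CHOICES, and leftmost_derivation are hypothetical, and the dictionary encoding of the grammar is an assumption made for illustration.

```python
# Example grammar from the slides; the dict encoding is an illustrative assumption.
RULES = {
    "S": [["NP", "VP"], ["VP", "NP"]],
    "VP": [["Verb"], ["Verb", "NP"], ["VP", "PP"]],
    "NP": [["Noun"], ["Det", "NP"], ["Noun", "NP"], ["NP", "PP"]],
    "PP": [["Prep", "NP"]],
    "Noun": [["time"], ["flies"], ["arrow"]],
    "Verb": [["time"], ["flies"], ["like"], ["arrow"]],
    "Prep": [["like"]],
    "Det": [["an"]],
}


def leftmost_derivation(choices, start="S"):
    """Rewrite the leftmost non-terminal once per entry in `choices`.

    Each entry is the index of the rule to apply for that non-terminal.
    Returns every intermediate sentential form, starting with [start].
    """
    form = [start]
    steps = [list(form)]
    for choice in choices:
        # Symbols that are keys of RULES are non-terminals; everything else is a word.
        pos = next(i for i, sym in enumerate(form) if sym in RULES)
        form[pos:pos + 1] = RULES[form[pos]][choice]
        steps.append(list(form))
    return steps


# Rule choices that follow the tree shown for "Time flies like an arrow":
# S -> NP VP, NP -> Noun, Noun -> time, VP -> VP PP, VP -> Verb, Verb -> flies,
# PP -> Prep NP, Prep -> like, NP -> Det NP, Det -> an, NP -> Noun, Noun -> arrow
CHOICES = [0, 0, 0, 2, 0, 1, 0, 0, 1, 0, 0, 2]

steps = leftmost_derivation(CHOICES)
for form in steps:
    print(" ".join(form))
```

Recognition and parsing run this process in reverse: given the final word sequence, they search for rule choices that could have produced it.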
Noun phrase (1/2)
• Basic rule:
  • NP → Det Nominal
  • NP → ProperNoun
  • Nominal → Noun | Nominal Noun (recursive rule)
  • Noun → abandon | abduction | … | zoo
• Premodifiers (before the head noun)
  • NP → (Det) (Card) (Ord) (Quant) (AP) Nominal
  • Card → one | two | … (cardinal numbers, 基数)
  • Ord → first | second | … (ordinal numbers, 序数)
  • Quant → some | many | … (quantifiers, 数量詞)
  • AP → Adj | AP Adj
  • Adj → short | long | … (adjectives, 形容詞)

Noun phrase (2/2)
• Postmodifiers (after the head noun)
  • Nominal → Nominal PP (prepositional phrase)
    • e.g., trips to the moon, books for beginners, live from Sydney to Vegas
  • Nominal → Nominal GerundVP (gerundive, 動名詞)
    • GerundVP → GerundV NP | GerundV PP | GerundV | GerundV NP PP
    • GerundV → being | arriving | …
    • e.g., train departing for London at 10:30
    • Other non-finite postmodifiers (非定形節): -ed and infinitive forms
  • Nominal → Nominal RelClause (relative clause, 関係詞節)
    • RelClause → (who | that) VP
    • e.g., book that I wrote

A parse tree of “All the morning flights from Denver to Tampa leaving before 10”
[Figure 12.5, Jurafsky and Martin, Speech and Language Processing: the NP consists of the PreDet “all” and an NP made of the Det “the” and a Nom; the nested Nom nodes cover the nouns “morning” and “flights”, the PPs “from Denver” and “to Tampa”, and the GerundiveVP “leaving before 10”]
• Head (主辞): the word in a phrase that is grammatically the most important

Verb phrase
• Various constructions:
  • VP → Verb: appear (intransitive, 自動詞)
  • VP → Verb NP: love Mary (transitive, 他動詞)
  • VP → Verb NP PP: leave London at ten
  • VP → Verb PP: appear suddenly
  • VP → Verb S: think it is cool
  • VP → Verb to VP: want to fly
  • …
• VP constructions depend on the head verb
  • These sentences are unacceptable
    • * I appear University
    • * I find to fly

Verb subcategorization (下位範疇化)
• Subcategorization
  • Numbers, orders, and types of syntactic arguments of verbs
  • Traditional grammar: transitive or intransitive
  • Modern grammar: as many as 100 subcategorizations
• COMLEX (Macleod+, 1998): frame (example verbs): example sentence
  • intrans (appear, fly, go): He went
  • NP (give, find, buy): I bought the book
  • NP-NP (give, send): He gave his mother a big kiss
  • NP-to-NP (give, send): He gave a big kiss to his mother
  • to-NP-NP (give, send): He gave to his mother a big kiss
  • S (think, find): They thought he was always late
  • to-inf-sc (want, need): I wanted to come
  • be-ing-sc (begin, suggest): He began drinking at 9:00 every night
  • …

Handling subcategorization in CFG
• Split symbols VP and Verb into subcategories
  • Verb-with-NP-complement → find | leave | repeat
  • Verb-with-S-complement → think | believe | say
  • Verb-with-Inf-VP-complement → want | try | need
  • VP → Verb-with-NP-complement NP
  • VP → Verb-with-S-complement S
  • VP → Verb-with-Inf-VP-complement to VP
• This explodes production rules!
  • Use feature structures and unification (lecture #6) instead

Agreement
• Relationship between words in terms of number, gender, etc.
  • This flight / * this flights
  • These flights / * these flight
  • I work at a company / * I works at a company
  • He works at a company / * he work at a company
  • Is he old enough to drink? / * [Am | Are] he old enough to drink?
• A main verb and its subject noun agree in number
  • That is, the noun and the verb are either both singular or both plural

Handling agreement in CFG
• Split non-terminal symbols Verb and Noun
  • S → 3SgNP 3SgVP
  • S → Non3SgNP Non3SgVP
  • Non3SgNP → Det PlNominal
  • SgNominal → SgNoun
  • PlNominal → PlNoun
  • SgNoun → flight | train | car | …
  • PlNoun → flights | trains | cars | …
• This also explodes production rules!
  • Agreement between determiners and head nouns
  • Agreement for a noun’s case: nominative (主格), accusative (目的格)
• Also use feature structures and unification

Penn Treebank
Chapter 12.4, D. Jurafsky and J. H. Martin. Speech and Language Processing, 2009.

Penn Treebank
• Sentences are annotated with their parse trees
  • Brown: balanced corpus (15 text categories)
  • Switchboard (電話の交換台): telephone conversations
  • Air Travel Information System (ATIS): spoken language
  • Wall Street Journal (WSJ): news
• A practical grammar with actual sentences
  • The annotation guideline is a good source of English grammar
• Some additional information
  • Grammatical functions (e.g., SUBJ)
  • Empty nodes to mark long-distance dependencies
• http://www.cis.upenn.edu/~treebank/

A sentence from WSJ portion
• Represented by LISP style notation (S-expression)
• Trace: the empty node (-NONE- *-1), co-indexed with NP-SBJ-1
  ( (S (NP-SBJ (NNP Carnival) (NNP Cruise) (NNP Lines) (NNP Inc|.|) )
       (VP (VBD said)
           (SBAR (-NONE- 0)
                 (S (NP-SBJ-1 (NP (JJ potential) (NNS problems) )
                              (PP (IN with)
                                  (NP (NP (DT the) (NN construction) )
                                      (PP (IN of)
                                          (NP (NP (CD two) (JJ big) (NN cruise) (NNS ships) )
                                              (PP (IN from) (NP (NNP Finland) )))))))
                    (VP (VBP have)
                        (VP (VBN been)
                            (VP (VBN averted) (NP (-NONE- *-1) )))))))
       (|.| |.|) ))

Extracting grammars from treebanks
• Grammar used in Penn Treebank is relatively flat
• Approx. 4,500 distinct rules for VPs:
  • VP → (VBD PP) | (VBD PP PP) | (VBD PP PP PP) | (VBD PP PP PP PP)
  • VP → (VB ADVP PP) | (VB PP ADVP) | (ADVP VB PP)
  • VP → VBP PP PP PP PP PP ADVP PP
  • …
• Thousands of NP rules:
  • NP → (DT JJ NN) | (DT JJ NNS) | (DT JJ NN NN) | (DT JJ JJ NN) | (DT JJ CD NN)
  • NP → (RB DT JJ NN NN) | (RB DT JJ JJ NNS)
  • NP → DT NNP NNP NNP NNP JJ NN
  • NP → DT JJ JJ VBG NN NNP NNP FW NNP
    • The state-owned industrial holding company Instituto Nacional de Industria
  • …

Long-distance dependencies
• Passive voice
  • The UFO was found *
• Implicit subjects (e.g., infinitives)
  • We are expected * to go
• WH-movement
  • What did he buy *T*?
  • The girl who John saw *T*
  • The girl who *T* saw John
• Moved clauses
  • The show must go on, Freddie said *

Parsing with CFG rules
Chapter 13, D. Jurafsky and J. H. Martin. Speech and Language Processing, 2009.

CFG grammar used in this section
• Grammar rules
  • S → NP VP
  • S → VP NP
  • VP → Verb
  • VP → Verb NP
  • VP → VP PP
  • NP → Noun
  • NP → Det NP
  • NP → Noun NP
  • NP → NP PP
  • PP → Prep NP
• Lexicon
  • Noun → time | flies | arrow
  • Verb → time | flies | like | arrow
  • Prep → like
  • Det → an

Top-down parsing
• Build parse trees from the root node S down to the leaves
  • Sub-trees always include S
  • This algorithm can generate new sentences
  • Sub-trees may not reach a sentence
  • Infinite loops caused by recursive rules, e.g., NP → NP PP
[Figure: partial trees expanded from S by applying S → NP VP, S → VP NP, and then the NP and VP rules (… more to be generated …)]

Bottom-up parsing
• Build parse trees from the sentence up to the root node
  • Sub-trees may not reach S
  • Cannot handle rules with an empty right-hand side
  • Sub-trees are guaranteed to include the input sentence
[Figure: partial trees built over “Time flies like an arrow” from the part-of-speech level upwards]

The two parsing algorithms are inefficient!
[Figure: alternative partial parse trees of “Time flies like an arrow”; the sub-tree for “like an arrow” (marked “Repeated”) is rebuilt in each of them]
• A systematic algorithm is necessary!
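The cost of rebuilding the same sub-tree can be observed directly. The sketch below is an illustration added here, not the lecture's algorithm: a recursive top-down recognizer over spans [i, j) of the example sentence, run once naively and once with results cached per (symbol, i, j). The names GRAMMAR, WORDS, and make_recognizer are hypothetical. Recursion on spans terminates because binary rules always split a span into strictly shorter ones and the only unit rules lead to part-of-speech symbols.

```python
from functools import lru_cache

# Example grammar of this section; tuple right-hand sides are an illustrative encoding.
GRAMMAR = {
    "S": [("NP", "VP"), ("VP", "NP")],
    "VP": [("Verb",), ("Verb", "NP"), ("VP", "PP")],
    "NP": [("Noun",), ("Det", "NP"), ("Noun", "NP"), ("NP", "PP")],
    "PP": [("Prep", "NP")],
    "Noun": [("time",), ("flies",), ("arrow",)],
    "Verb": [("time",), ("flies",), ("like",), ("arrow",)],
    "Prep": [("like",)],
    "Det": [("an",)],
}
WORDS = ("time", "flies", "like", "an", "arrow")


def make_recognizer(memoize):
    """Return (recognizer, call counter); the counter counts executed calls only."""
    calls = {"n": 0}

    def derives(sym, i, j):
        calls["n"] += 1
        if sym not in GRAMMAR:  # terminal symbol: must match exactly one word
            return j == i + 1 and WORDS[i] == sym
        for rhs in GRAMMAR[sym]:
            if len(rhs) == 1:
                if rec(rhs[0], i, j):
                    return True
            elif any(rec(rhs[0], i, k) and rec(rhs[1], k, j)
                     for k in range(i + 1, j)):
                return True
        return False

    # With memoization, each (sym, i, j) is computed at most once.
    rec = lru_cache(maxsize=None)(derives) if memoize else derives
    return rec, calls


naive, naive_calls = make_recognizer(memoize=False)
memo, memo_calls = make_recognizer(memoize=True)
naive_result = naive("S", 0, len(WORDS))
memo_result = memo("S", 0, len(WORDS))
print(naive_result, naive_calls["n"])
print(memo_result, memo_calls["n"])
```

Storing one answer per symbol and span is the idea that dynamic programming parsers organize into a chart.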
Dynamic programming for parsing
• Cocke-Kasami-Younger (CKY) algorithm
  • Bottom-up parsing algorithm
  • Simple and easy to implement
  • Used by a great number of NLP studies
• Earley algorithm
  • Top-down parsing algorithm
  • (See Chapter 13.4.2)
• Chart parsing
  • (See Chapter 13.4.3)

Cocke-Kasami-Younger (CKY)
• Bottom-up parsing algorithm
• Works for CNF rules
  • Binary branching rules for non-terminals only (A → B C)
• Regions of an input string are represented by [i, j)
• Sub-trees spanning [i, j) are built from
  • sub-trees spanning [i, k) and [k, j) (i < k < j)
• Kasami: 嵩 忠雄
[Figure: word positions 0 time 1 flies 2 like 3 an 4 arrow 5, with a span [i, j) split at k into two sub-trees]

Chomsky normal form (CNF)
• CFG is restricted to
  • ε-free (the right-hand side of each rule is not empty)
  • Rules are in either of the two forms:
    • A → B C (binary branching of non-terminals only)
    • A → α (a single terminal symbol; terminal symbols do not appear in binary branching rules)
• CFG and CNF are weakly equivalent
  • They generate the same set of sentences
  • They do not always assign the same derivation to each sentence
[Figure: example VP and INF-VP tree fragments]

Conversion from CFG to CNF
• Any CFG grammar can be converted to a weakly equivalent CNF grammar
• Rules with terminal and non-terminal symbols mixed
  • INF-VP → to VP  ⇒  INF-VP → TO VP, TO → to
• Unary rules
  • S → VP, VP → Verb, Verb → go  ⇒  S → go, VP → go, Verb → go
• Rules with more than two branches (next slide)

Binarization
• Left binarization (Aho and Ullman, 1972; Charniak+ 1998)
  • VP → Verb NP NP PP: 0.7  ⇒
    • VP → Verb_NP_NP PP: 0.7
    • Verb_NP_NP → Verb_NP NP: 1.0
    • Verb_NP → Verb NP: 1.0
• Right binarization
  • The opposite of left binarization (e.g., VP → Verb NP_NP_PP: 0.7)
• Head binarization (Klein and Manning, 2003)
  • Left binarization if the first child is the head; right binarization otherwise
• Compact binarization (Schmid, 2004)
  • Greedy strategy to combine frequently occurring symbols first

CKY algorithm
  function cky(words, grammar)
    for j ← 1 to length(words)
      table[j−1, j] ← {A | A → words[j] ∈ grammar}
      for i ← (j−2) downto 0
        for k ← (i+1) to (j−1)
          table[i, j] ← table[i, j] ∪ {A | A → B C ∈ grammar, B ∈ table[i, k], C ∈ table[k, j]}
    return table
[Figure: triangular chart indexed by i (rows) and j (columns); cell [i,j] is filled from cells [i,k] and [k,j]]

Grammar for CKY parsing
• The CFG grammar used in this section, with an empty chart of cells [i,j] over 0 time 1 flies 2 like 3 an 4 arrow 5
• Remove and expand unit productions

Grammar converted to CNF
• S → NP VP | VP NP
• VP → time | flies | like | arrow
• VP → Verb NP | VP PP
• NP → time | flies | arrow
• NP → Det NP | Noun NP | NP PP
• PP → Prep NP
• Noun → time | flies | arrow
• Verb → time | flies | like | arrow
• Prep → like
• Det → an
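The left binarization step used when converting a grammar to CNF can be sketched in a few lines. This is an illustrative sketch only: the function name left_binarize and the (left-hand side, right-hand side, probability) tuple format are assumptions, and the new symbols are named by joining the combined children with underscores, as in the VP → Verb NP NP PP example.

```python
def left_binarize(lhs, rhs, prob):
    """Split a rule with more than two children into binary rules.

    The leftmost children are grouped first. Only the top rule keeps the
    original probability; each introduced rule gets probability 1.0.
    """
    rules = []
    rhs = list(rhs)
    while len(rhs) > 2:
        new_sym = "_".join(rhs[:-1])          # e.g., Verb_NP_NP
        rules.append((lhs, (new_sym, rhs[-1]), prob))
        lhs, rhs, prob = new_sym, rhs[:-1], 1.0
    rules.append((lhs, tuple(rhs), prob))
    return rules


binary_rules = left_binarize("VP", ["Verb", "NP", "NP", "PP"], 0.7)
for left, right, p in binary_rules:
    print(left, "→", " ".join(right), ":", p)
```

Right binarization would group the rightmost children instead, and head binarization would choose between the two depending on where the head child sits.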
CKY chart filling, step by step
[Figures: the chart over 0 time 1 flies 2 like 3 an 4 arrow 5 is shown after each step, next to the grammar converted to CNF; N = Noun, V = Verb]
• [0,1]: find rules that generate “time”: VP, NP, N, V
• [1,2]: find rules that generate “flies”: VP, NP, N, V
• [0,2] from [0,1] + [1,2]: S (VP NP), S (NP VP), NP (N NP), VP (V NP)
• [2,3]: find rules that generate “like”: VP, V, Prep
• [1,3] from [1,2] + [2,3]: S (NP VP)
• [0,3] from [0,1] + [1,3] and [0,2] + [2,3]: S (NP VP)
• Exercise: continue the algorithm and write the 7 parse trees
• [3,4]: find rules that generate “an”: Det
• [2,4] from [2,3] + [3,4]: (empty)
• [1,4] from [1,2] + [2,4] and [1,3] + [3,4]: (empty)
• [0,4] from [0,1] + [1,4], [0,2] + [2,4], and [0,3] + [3,4]: (empty)
• [4,5]: find rules that generate “arrow”: VP, NP, N, V
• [3,5] from [3,4] + [4,5]: NP (Det N)
• [2,5] from [2,3] + [3,5] and [2,4] + [4,5]: S (VP NP), VP (V NP), PP (Prep NP)
• [1,5] from [1,2] + [2,5], [1,3] + [3,5], and [1,4] + [4,5]: VP (VP PP), S (NP VP), NP (NP PP)
• [0,5] from [0,1] + [1,5]: S (VP NP), S (NP VP), NP (N NP), VP (V NP)
• [0,5] from [0,2] + [2,5]: S (NP VP), NP (NP PP), VP (VP PP)
• [0,5] from [0,3] + [3,5] and [0,4] + [4,5]: nothing is added
2011-10-25 2 like an 3 4 arrow 5 S (NP VP) time [0,3] [0,4] S (NP VP) [0,5] VP (VP PP) S (NP VP) NP (NP PP) [1,3] [1,4] VP, V, Prep [1,5] S (VP NP) VP (V NP) PP (Prep NP) [2,3] [2,4] Det S (VP NP) S (NP VP) NP (N NP) VP (V NP) S (NP VP) NP (NP PP) VP (VP PP) Information Communication Theory (情報伝達学) flies like [2,5] NP (Det N) an [3,4] [3,5] VP, NP, N, V arrow [4,5] 68 Obtained parse trees (1/7) S VP NP NP PP Prep Time 2011-10-25 flies like NP Det Noun an arrow Information Communication Theory (情報伝達学) 69 Obtained parse trees (2/7) S NP VP VP PP Prep Time 2011-10-25 flies like NP Det Noun an arrow Information Communication Theory (情報伝達学) 70 Obtained parse trees (3/7) NP Noun NP NP PP Prep Time 2011-10-25 flies like NP Det Noun an arrow Information Communication Theory (情報伝達学) 71 Obtained parse trees (4/7) VP Verb NP NP PP Prep Time 2011-10-25 flies like NP Det Noun an arrow Information Communication Theory (情報伝達学) 72 Obtained parse trees (5/7) S NP Noun Time 2011-10-25 VP NP flies VP like NP Det Noun an arrow Information Communication Theory (情報伝達学) 73 Obtained parse trees (6/7) NP NP Noun Time 2011-10-25 PP NP flies Prep like NP Det Noun an arrow Information Communication Theory (情報伝達学) 74 Obtained parse trees (7/7) VP VP Verb Time 2011-10-25 PP NP flies Prep like NP Det Noun an arrow Information Communication Theory (情報伝達学) 75 Python implementation (1/2) import collections def build(CNF): G = collections.defaultdict(list) for left, right in CNF: G[right].append(left) return G def cky(G, W): T = [[[] for j in range(len(W)+1)] for i in range(len(W))] for j in range(1, len(W)+1): T[j-1][j] += G.get(W[j-1], []) print "[%d,%d]: %r" % (j-1, j, G.get(W[j-1])) for i in range(j-2, -1, -1): for k in range(i+1, j): for x in T[i][k]: for y in T[k][j]: T[i][j] += G.get((x,y), []) print "[%d,%d]: %r (%s [%d,%d] and %s [%d,%d])" % ( i,j,G.get((x,y)),x,i,k,y,k,j) return T 2011-10-25 Information Communication Theory (情報伝達学) 76 Python implementation (2/2) if __name__ == '__main__': 
CNF = ( ('S', ('NP','VP')), ('S', ('VP','NP')), ('VP', 'time'), ('VP', 'flies'), ('VP', 'like'), ('VP', 'arrow'), ('VP', ('Verb','NP')), ('VP', ('VP','PP')), ('NP', 'time'), ('NP', 'flies'), ('NP', 'arrow'), ('NP', ('Det','NP')), ('NP', ('Noun','NP')), ('NP', ('NP','PP')), ('PP', ('Preposition','NP')), ('Noun', 'time'), ('Noun', 'flies'), ('Noun', 'arrow'), ('Verb', 'time'), ('Verb', 'flies'), ('Verb', 'like'), ('Verb', 'arrow'), ('Preposition', 'like'), ('Det', 'an'), ) G = build(CNF) T = cky(G, ('time', 'flies', 'like', 'an', 'arrow')) 2011-10-25 [0,1]: ['VP', 'NP', 'Noun', 'Verb'] [1,2]: ['VP', 'NP', 'Noun', 'Verb'] [0,2]: ['S'] (VP [0,1] and NP [1,2]) [0,2]: ['S'] (NP [0,1] and VP [1,2]) [0,2]: ['NP'] (Noun [0,1] and NP [1,2]) [0,2]: ['VP'] (Verb [0,1] and NP [1,2]) [2,3]: ['VP', 'Verb', 'Preposition'] [1,3]: ['S'] (NP [1,2] and VP [2,3]) [0,3]: ['S'] (NP [0,2] and VP [2,3]) [3,4]: ['Det'] [4,5]: ['VP', 'NP', 'Noun', 'Verb'] [3,5]: ['NP'] (Det [3,4] and NP [4,5]) [2,5]: ['S'] (VP [2,3] and NP [3,5]) [2,5]: ['VP'] (Verb [2,3] and NP [3,5]) [2,5]: ['PP'] (Preposition [2,3] and NP [3,5]) [1,5]: ['VP'] (VP [1,2] and PP [2,5]) [1,5]: ['S'] (NP [1,2] and VP [2,5]) [1,5]: ['NP'] (NP [1,2] and PP [2,5]) [0,5]: ['S'] (VP [0,1] and NP [1,5]) [0,5]: ['S'] (NP [0,1] and VP [1,5]) [0,5]: ['NP'] (Noun [0,1] and NP [1,5]) [0,5]: ['VP'] (Verb [0,1] and NP [1,5]) [0,5]: ['S'] (NP [0,2] and VP [2,5]) [0,5]: ['NP'] (NP [0,2] and PP [2,5]) [0,5]: ['VP'] (VP [0,2] and PP [2,5]) Output (empty slots omitted) Information Communication Theory (情報伝達学) 77 Notes on CKY • Computational cost 𝑂(|𝑅|𝑛3 ), where 𝑛 denotes # tokens • Loop counters 𝑖, 𝑗, 𝑘 range 0, 𝑛 VP, NP, N, V • |𝑅| presents the number of CNF rules [1,2] • Recognition and parsing • Recognition: successful if 0, 𝑛 includes S • Parsing: trace back from 0, 𝑛 to terminals using back links S (NP VP) [1,3] VP, V, Prep [2,3] • We can fill cells in either of these orderings: • 0,1 → 1,2 → 0,2 → 2,3 → 1,3 → 0,3 → ⋯ • 0,1 → 1,2 → 2,3 → 
3,4 → 4,5 → 0,2 → 1,3 → 2,4 → ⋯ • Unary rules are handled by modifying the algorithm: • Every time a cell in the chart is filled with a non-terminal 𝐴, add all symbols 𝑋 that could be produced by unary rules 𝑋 → 𝐴 2011-10-25 Information Communication Theory (情報伝達学) 78 Ambiguity • We have multiple parse trees for a sentence because of: • Multiple semantic interpretations • Constituents (may hopefully) correspond to semantic structures • Overgeneration • Strict grammar → likely to receive no tree for a given sentence • Loose grammar → likely to receive many trees for a given sentence • Impossible to design a CFG grammar that exactly corresponds to a human language • Disambiguation (again!) • Similarly to part-of-speech tagging, we need to find the best parse tree for a given sentence, using a conditional probability (scoring) • Assign a conditional probability to each CFG rule • → Probabilistic Context Free Grammar (PCFG) 2011-10-25 Information Communication Theory (情報伝達学) 79 Probabilistic Context Free Grammar (PCFG) Chapter 14, D. Jurafsky and J. H. Martin. Speech and Language Processing, 2009. 
Probabilistic Context Free Grammar (PCFG)
• Four parameters:
  • $N$: a set of non-terminal symbols (or variables) (非終端記号)
    • e.g., NP, VP, PP, AP, Noun, Verb, Adj, Det
  • $\Sigma$: a set of terminal symbols (disjoint from $N$) (終端記号)
    • e.g., a, the, flight, book, that, I, my
  • $R$: a set of production rules (生成規則): $A \to B\ [p]$
    • $A$ (mother): a non-terminal symbol
    • $B$ (daughters): a sequence of terminal/non-terminal symbols, i.e., $B \in (N \cup \Sigma)^*$
    • $p$: conditional probability $p(B \mid A)$
    • e.g., S → VP [0.1], S → NP VP [0.6], NP → NP PP [0.2]
  • $S$: a start symbol (初期記号, 開始記号)
    • e.g., S

PCFG = CFG production rules with probabilities
• A PCFG simply attaches a conditional probability $p$ to each production rule of a CFG
• We can write the probability of $A \to B\ [p]$ in any of these forms:
  • $p(A \to B)$
  • $p(A \to B \mid A)$
  • $p(B \mid A)$
• The conditional probabilities must sum to one for every left-hand side $A$:
  • $\sum_{B} p(A \to B) = 1$

Computing PCFG rule probabilities
• Maximum likelihood estimation (if we have a treebank)

$$P(\alpha \to \beta \mid \alpha) = \frac{C(\alpha \to \beta)}{\sum_{\gamma} C(\alpha \to \gamma)} = \frac{C(\alpha \to \beta)}{C(\alpha)} = \frac{\text{number of times the production } \alpha \to \beta \text{ appears}}{\text{number of occurrences of the symbol } \alpha}$$

• As simple as counting the frequency of co-occurrences in the treebank!
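The counting behind the maximum likelihood estimate can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two-tree treebank, the nested-tuple tree encoding, and the names `productions` and `estimate` are all made up here and do not come from the lecture or from any treebank toolkit.

```python
from collections import Counter

# Hypothetical two-tree treebank (made up for illustration). Each node is
# (label, child, ...); a pre-terminal node is (label, word).
TOY_TREEBANK = [
    ("S", ("NP", ("Noun", "time")), ("VP", ("Verb", "flies"))),
    ("S", ("NP", ("Noun", "time")),
          ("VP", ("Verb", "flies"), ("NP", ("Noun", "arrow")))),
]

def productions(tree):
    """Yield (lhs, rhs) for every production used in a nested-tuple tree."""
    label, children = tree[0], tree[1:]
    if len(children) == 1 and isinstance(children[0], str):
        yield (label, children[0])                    # lexical rule: A -> w
        return
    yield (label, tuple(child[0] for child in children))
    for child in children:
        yield from productions(child)

def estimate(treebank):
    """Return P(alpha -> beta | alpha) = C(alpha -> beta) / C(alpha)."""
    rule_counts = Counter(p for tree in treebank for p in productions(tree))
    lhs_counts = Counter()
    for (lhs, _rhs), count in rule_counts.items():
        lhs_counts[lhs] += count                      # C(alpha) = sum over gamma
    return {rule: count / lhs_counts[rule[0]]
            for rule, count in rule_counts.items()}

probs = estimate(TOY_TREEBANK)
for (lhs, rhs), p in sorted(probs.items(), key=str):
    print(lhs, "->", rhs, "[%.3f]" % p)
```

In this toy treebank VP is expanded once as Verb and once as Verb NP, so both rules receive probability 0.5, and the probabilities of all rules sharing a left-hand side sum to one by construction.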
An example of PCFG grammar
• S → NP VP [0.8]
• S → VP NP [0.2]
• VP → Verb [0.1]
• VP → Verb NP [0.5]
• VP → VP PP [0.4]
• NP → Noun [0.1]
• NP → Det NP [0.4]
• NP → Noun NP [0.3]
• NP → NP PP [0.2]
• PP → Prep NP [1.0]
• Noun → time [0.4]
• Noun → flies [0.2]
• Noun → arrow [0.4]
• Verb → time [0.1]
• Verb → flies [0.4]
• Verb → like [0.4]
• Verb → arrow [0.1]
• Prep → like [1.0]
• Det → an [1.0]

PCFGs for disambiguation
• Disambiguation (resolving ambiguity) of parse trees
• Probabilistic approach: find the best parse tree $\hat{T}$ among all possible trees that yield a given sentence $S$

$$\hat{T} = \operatorname*{argmax}_{T \text{ s.t. } S = \text{yield}(T)} P(T \mid S)$$

  • $\hat{T}$ means "our estimate of $T$"
  • argmax: find the $T$ that maximizes $P(T \mid S)$
  • $T$ s.t. $S = \text{yield}(T)$: all possible trees $T_1, T_2, T_3, \ldots$ that yield $S$
• By the definition of conditional probability, and because $P(S)$ is constant for a given $S$ (it does not depend on $T$):

$$\hat{T} = \operatorname*{argmax}_{T \text{ s.t. } S = \text{yield}(T)} P(T \mid S) = \operatorname*{argmax}_{T \text{ s.t. } S = \text{yield}(T)} \frac{P(T, S)}{P(S)} = \operatorname*{argmax}_{T \text{ s.t. } S = \text{yield}(T)} P(T, S)$$

$$P(T, S) = \prod_{(A \to B) \in T} P(A \to B), \qquad P(T, S) = P(T)\, P(S \mid T) = P(T)$$

  • $P(S \mid T) = 1$ because $T$ is defined to yield $S$

Computing $P(T, S)$
• $T_1$ = [S [NP [Noun Time]] [VP [VP [Verb flies]] [PP like an arrow]]]
• $T_2$ = [S [VP [Verb Time]] [NP [NP [Noun flies]] [PP like an arrow]]]
• Shared subtree: [PP [Prep like] [NP [Det an] [NP [Noun arrow]]]]

$$P(\text{PP}, \textit{like an arrow}) = P(\text{PP} \to \text{Prep NP})\, P(\text{Prep} \to \textit{like})\, P(\text{NP} \to \text{Det NP})\, P(\text{Det} \to \textit{an})\, P(\text{NP} \to \text{Noun})\, P(\text{Noun} \to \textit{arrow}) = 1.0 \times 1.0 \times 0.4 \times 1.0 \times 0.1 \times 0.4 = 0.016$$

$$P(T_1, S) = P(\text{S} \to \text{NP VP})\, P(\text{NP} \to \text{Noun})\, P(\text{Noun} \to \textit{Time})\, P(\text{VP} \to \text{VP PP})\, P(\text{VP} \to \text{Verb})\, P(\text{Verb} \to \textit{flies})\, P(\text{PP}, \textit{like an arrow}) = 0.8 \times 0.1 \times 0.4 \times 0.4 \times 0.1 \times 0.4 \times 0.016 = 0.000008192$$

$$P(T_2, S) = P(\text{S} \to \text{VP NP})\, P(\text{VP} \to \text{Verb})\, P(\text{Verb} \to \textit{Time})\, P(\text{NP} \to \text{NP PP})\, P(\text{NP} \to \text{Noun})\, P(\text{Noun} \to \textit{flies})\, P(\text{PP}, \textit{like an arrow}) = 0.2 \times 0.1 \times 0.1 \times 0.2 \times 0.1 \times 0.2 \times 0.016 = 0.000000128$$

Repeated computations in $\operatorname{argmax}_{T \text{ s.t. } S = \text{yield}(T)} P(T \mid S)$
[Figure: four candidate parse trees of "Time flies like an arrow" with rule probabilities. The same subtrees, such as PP (like an arrow) with probability 0.016, appear in several trees, so scoring each tree separately repeats the same computations.]
• Let's extend CKY to PCKY!

Probabilistic CKY

    function pcky(words, grammar)
      for j ← 1 to length(words)
        for (A → w) in rules(A → words[j] ∈ grammar)
          table[j−1, j, A] ← P(A → w)
        for i ← (j − 2) downto 0
          for k ← (i + 1) to (j − 1)
            for (A → B C) in rules(A → B C ∈ grammar and table[i, k, B] > 0 and table[k, j, C] > 0)
              if table[i, j, A] < P(A → B C) × table[i, k, B] × table[k, j, C]
                table[i, j, A] ← P(A → B C) × table[i, k, B] × table[k, j, C]
                back[i, j, A] ← (k, B, C)
      return table

Grammar for PCKY parsing
• The grammar is the example PCFG above: S → NP VP [0.8], S → VP NP [0.2], VP → Verb [0.1], VP → Verb NP [0.5], VP → VP PP [0.4], NP → Noun [0.1], NP → Det NP [0.4], NP → Noun NP [0.3], NP → NP PP [0.2], PP → Prep NP [1.0], plus the lexical rules for Noun, Verb, Prep, and Det
• The chart over 0 time 1 flies 2 like 3 an 4 arrow 5 starts empty
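The grammar for PCKY parsing still contains the unary rules VP → Verb [0.1] and NP → Noun [0.1], whereas the chart cells hold entries such as VP: 0.01 for "time". One way to read this is that each unary rule is folded into the lexical rules by multiplying probabilities. The following is a minimal sketch of that folding, assuming unary rules only ever rewrite to a pre-terminal; the dictionary layout and the function name `fold_unary` are illustrative assumptions, not part of the lecture code.

```python
# Rule probabilities copied from the example PCFG. The split into two
# dictionaries and the name fold_unary are assumptions made for this sketch.
UNARY = {("VP", "Verb"): 0.1, ("NP", "Noun"): 0.1}
LEXICAL = {
    ("Noun", "time"): 0.4, ("Noun", "flies"): 0.2, ("Noun", "arrow"): 0.4,
    ("Verb", "time"): 0.1, ("Verb", "flies"): 0.4,
    ("Verb", "like"): 0.4, ("Verb", "arrow"): 0.1,
    ("Prep", "like"): 1.0, ("Det", "an"): 1.0,
}

def fold_unary(unary, lexical):
    """Add a lexical rule A -> w with p(A -> B) * p(B -> w) for every
    unary rule A -> B and lexical rule B -> w; keep the original lexical rules."""
    folded = dict(lexical)
    for (parent, child), p_unary in unary.items():
        for (tag, word), p_word in lexical.items():
            if tag == child:
                key = (parent, word)
                folded[key] = folded.get(key, 0.0) + p_unary * p_word
    return folded

folded = fold_unary(UNARY, LEXICAL)
for (lhs, word), p in sorted(folded.items()):
    if lhs in ("VP", "NP"):
        print("%s -> %s [%g]" % (lhs, word, p))
```

Under these assumptions the folded rules give VP → time 0.01, VP → flies 0.04, VP → like 0.04, and VP → arrow 0.01, which together carry the 0.1 probability mass of VP → Verb, so the distribution over VP expansions still sums to one.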
flies [1,2] [1,3] [1,4] [1,5] like [2,3] [2,4] [2,5] an [3,4] [0,1]: Find rules that generate “time” 2011-10-25 4 Information Communication Theory (情報伝達学) [3,5] arrow [4,5] 90 S → NP VP S → VP NP VP → Verb VP → Verb NP VP → VP PP NP → Noun NP → Det NP NP → Noun NP NP → NP PP PP → Prep NP Noun → time Noun → flies Noun → arrow Verb → time Verb → flies Verb → like Verb → arrow Prep → like Det → an 0 [0.8] [0.2] [0.1] [0.5] [0.4] [0.1] [0.4] [0.3] [0.2] [1.0] [0.4] [0.2] [0.4] [0.1] [0.4] [0.4] [0.1] [1.0] [1.0] time 1 flies 2 like 3 an arrow VP (0.01) NP (0.04) N (0.4) V (0.1) [0,1] 5 time [0,2] [0,3] [0,4] [0,5] VP: 0.04 NP: 0.02 N: 0.2 V: 0.4 [1,2] flies [1,3] [1,4] [1,5] like [2,3] [2,4] [2,5] an [3,4] [1,2]: Find rules that generate “flies” 2011-10-25 4 Information Communication Theory (情報伝達学) [3,5] arrow [4,5] 91 S → NP VP S → VP NP VP → Verb VP → Verb NP VP → VP PP NP → Noun NP → Det NP NP → Noun NP NP → NP PP PP → Prep NP Noun → time Noun → flies Noun → arrow Verb → time Verb → flies Verb → like Verb → arrow Prep → like Det → an 0 [0.8] [0.2] [0.1] [0.5] [0.4] [0.1] [0.4] [0.3] [0.2] [1.0] [0.4] [0.2] [0.4] [0.1] [0.4] [0.4] [0.1] [1.0] [1.0] time VP: 0.01 NP: 0.04 N: 0.4 V: 0.1 1 flies 2 like 3 4 arrow S(VP NP):0.00004 S(NP VP):0.00128 NP(N NP):0.0024 VP V NP):0.0010 [0,1] [0,2] [0,3] [0,4] [0,5] VP: 0.04 NP: 0.02 N: 0.2 V: 0.4 [1,2] 5 time flies [1,3] [1,4] [1,5] like [2,3] [2,4] [2,5] an [3,4] [0,2] → [0,1] + [1,2] 2011-10-25 an Information Communication Theory (情報伝達学) [3,5] arrow [4,5] 92 S → NP VP S → VP NP VP → Verb VP → Verb NP VP → VP PP NP → Noun NP → Det NP NP → Noun NP NP → NP PP PP → Prep NP Noun → time Noun → flies Noun → arrow Verb → time Verb → flies Verb → like Verb → arrow Prep → like Det → an 0 [0.8] [0.2] [0.1] [0.5] [0.4] [0.1] [0.4] [0.3] [0.2] [1.0] [0.4] [0.2] [0.4] [0.1] [0.4] [0.4] [0.1] [1.0] [1.0] time VP: 0.01 NP: 0.04 N: 0.4 V: 0.1 [0,1] 1 flies 2 like 3 4 arrow S(NP VP):0.00128 NP(N NP):0.0024 VP V NP):0.0010 [0,2] [0,3] [0,4] 
[0,5] VP: 0.04 NP: 0.02 N: 0.2 V: 0.4 [1,2] 5 time flies [1,3] [1,4] [1,5] VP: 0.04 V: 0.4 Prep: 1.0 [2,3] like [2,4] [2,5] an [3,4] [2,3]: Find rules that generate “like” 2011-10-25 an Information Communication Theory (情報伝達学) [3,5] arrow [4,5] 93 S → NP VP S → VP NP VP → Verb VP → Verb NP VP → VP PP NP → Noun NP → Det NP NP → Noun NP NP → NP PP PP → Prep NP Noun → time Noun → flies Noun → arrow Verb → time Verb → flies Verb → like Verb → arrow Prep → like Det → an 0 [0.8] [0.2] [0.1] [0.5] [0.4] [0.1] [0.4] [0.3] [0.2] [1.0] [0.4] [0.2] [0.4] [0.1] [0.4] [0.4] [0.1] [1.0] [1.0] time VP: 0.01 NP: 0.04 N: 0.4 V: 0.1 [0,1] 1 flies 2 like 3 4 arrow S(NP VP):0.00128 NP(N NP):0.0024 VP V NP):0.0010 [0,2] VP: 0.04 NP: 0.02 N: 0.2 V: 0.4 [1,2] 5 time [0,3] [0,4] [0,5] S (NP VP): 0.00064 flies [1,3] [1,4] [1,5] VP: 0.04 V: 0.4 Prep: 1.0 [2,3] like [2,4] [2,5] an [3,4] [1,3]: [1,2] + [2,3] 2011-10-25 an Information Communication Theory (情報伝達学) [3,5] arrow [4,5] 94 S → NP VP S → VP NP VP → Verb VP → Verb NP VP → VP PP NP → Noun NP → Det NP NP → Noun NP NP → NP PP PP → Prep NP Noun → time Noun → flies Noun → arrow Verb → time Verb → flies Verb → like Verb → arrow Prep → like Det → an 0 [0.8] [0.2] [0.1] [0.5] [0.4] [0.1] [0.4] [0.3] [0.2] [1.0] [0.4] [0.2] [0.4] [0.1] [0.4] [0.4] [0.1] [1.0] [1.0] time VP: 0.01 NP: 0.04 N: 0.4 V: 0.1 [0,1] 1 flies like 2 3 4 arrow S(NP VP):0.00128 NP(N NP):0.0024 VP V NP):0.0010 [0,2] VP: 0.04 NP: 0.02 N: 0.2 V: 0.4 5 time [0,3] [0,4] [0,5] S (NP VP): 0.00064 flies [1,2] [1,3] [1,4] [1,5] VP: 0.04 V: 0.4 Prep: 1.0 [2,3] like [2,4] [2,5] an [3,4] [0,3]: [0,1] + [1,3] and … 2011-10-25 an Information Communication Theory (情報伝達学) [3,5] arrow [4,5] 95 S → NP VP S → VP NP VP → Verb VP → Verb NP VP → VP PP NP → Noun NP → Det NP NP → Noun NP NP → NP PP PP → Prep NP Noun → time Noun → flies Noun → arrow Verb → time Verb → flies Verb → like Verb → arrow Prep → like Det → an 0 [0.8] [0.2] [0.1] [0.5] [0.4] [0.1] [0.4] [0.3] [0.2] [1.0] [0.4] [0.2] [0.4] 
[0.1] [0.4] [0.4] [0.1] [1.0] [1.0] time VP: 0.01 NP: 0.04 N: 0.4 V: 0.1 [0,1] 1 flies like 2 S(NP VP):0.00128 NP(N NP):0.0024 VP V NP):0.0010 S (NP VP): 0.0000768 [0,2] [0,3] VP: 0.04 NP: 0.02 N: 0.2 V: 0.4 3 4 arrow 5 time [0,4] [0,5] S (NP VP): 0.00064 flies [1,2] [1,3] [1,4] [1,5] VP: 0.04 V: 0.4 Prep: 1.0 [2,3] like [2,4] [2,5] an [3,4] [0,3]: … and [0,2] + [2,3] 2011-10-25 an Information Communication Theory (情報伝達学) [3,5] arrow [4,5] 96 S → NP VP S → VP NP VP → Verb VP → Verb NP VP → VP PP NP → Noun NP → Det NP NP → Noun NP NP → NP PP PP → Prep NP Noun → time Noun → flies Noun → arrow Verb → time Verb → flies Verb → like Verb → arrow Prep → like Det → an 0 [0.8] [0.2] [0.1] [0.5] [0.4] [0.1] [0.4] [0.3] [0.2] [1.0] [0.4] [0.2] [0.4] [0.1] [0.4] [0.4] [0.1] [1.0] [1.0] time VP: 0.01 NP: 0.04 N: 0.4 V: 0.1 [0,1] 1 flies 2 like S(NP VP):0.00128 NP(N NP):0.0024 VP V NP):0.0010 S (NP VP): 0.0000768 [0,2] [0,3] VP: 0.04 NP: 0.02 N: 0.2 V: 0.4 [1,2] an 3 arrow 5 time [0,4] [0,5] S (NP VP): 0.00064 flies [1,3] [1,4] [1,5] VP: 0.04 V: 0.4 Prep: 1.0 like [2,3] [2,4] [2,5] Det: 1.0 an [3,4] [3,4]: Find rules that generate “an” 2011-10-25 4 Information Communication Theory (情報伝達学) [3,5] arrow [4,5] 97 S → NP VP S → VP NP VP → Verb VP → Verb NP VP → VP PP NP → Noun NP → Det NP NP → Noun NP NP → NP PP PP → Prep NP Noun → time Noun → flies Noun → arrow Verb → time Verb → flies Verb → like Verb → arrow Prep → like Det → an 0 [0.8] [0.2] [0.1] [0.5] [0.4] [0.1] [0.4] [0.3] [0.2] [1.0] [0.4] [0.2] [0.4] [0.1] [0.4] [0.4] [0.1] [1.0] [1.0] time VP: 0.01 NP: 0.04 N: 0.4 V: 0.1 1 flies 2 like S(NP VP):0.00128 NP(N NP):0.0024 VP V NP):0.0010 S (NP VP): 0.0000768 [0,2] [0,3] [0,1] VP: 0.04 NP: 0.02 N: 0.2 V: 0.4 [1,2] an 3 arrow 5 time [0,4] [0,5] S (NP VP): 0.00064 flies [1,3] [1,4] [1,5] VP: 0.04 V: 0.4 Prep: 1.0 like [2,3] [2,4] [2,5] Det: 1.0 an [3,4] [2,4] → [2,3] + [3,4] 2011-10-25 4 Information Communication Theory (情報伝達学) [3,5] arrow [4,5] 98 S → NP VP S → VP NP VP → Verb 
VP → Verb NP VP → VP PP NP → Noun NP → Det NP NP → Noun NP NP → NP PP PP → Prep NP Noun → time Noun → flies Noun → arrow Verb → time Verb → flies Verb → like Verb → arrow Prep → like Det → an 0 [0.8] [0.2] [0.1] [0.5] [0.4] [0.1] [0.4] [0.3] [0.2] [1.0] [0.4] [0.2] [0.4] [0.1] [0.4] [0.4] [0.1] [1.0] [1.0] time VP: 0.01 NP: 0.04 N: 0.4 V: 0.1 [0,1] 1 flies 2 like S(NP VP):0.00128 NP(N NP):0.0024 VP V NP):0.0010 S (NP VP): 0.0000768 [0,2] [0,3] VP: 0.04 NP: 0.02 N: 0.2 V: 0.4 [1,2] an 3 arrow 5 time [0,4] [0,5] S (NP VP): 0.00064 flies [1,3] [1,4] [1,5] VP: 0.04 V: 0.4 Prep: 1.0 like [2,3] [2,4] [2,5] Det: 1.0 an [3,4] [1,4] → [1,2] + [2,4] and … 2011-10-25 4 Information Communication Theory (情報伝達学) [3,5] arrow [4,5] 99 S → NP VP S → VP NP VP → Verb VP → Verb NP VP → VP PP NP → Noun NP → Det NP NP → Noun NP NP → NP PP PP → Prep NP Noun → time Noun → flies Noun → arrow Verb → time Verb → flies Verb → like Verb → arrow Prep → like Det → an 0 [0.8] [0.2] [0.1] [0.5] [0.4] [0.1] [0.4] [0.3] [0.2] [1.0] [0.4] [0.2] [0.4] [0.1] [0.4] [0.4] [0.1] [1.0] [1.0] time VP: 0.01 NP: 0.04 N: 0.4 V: 0.1 [0,1] 1 flies 2 like S(NP VP):0.00128 NP(N NP):0.0024 VP V NP):0.0010 S (NP VP): 0.0000768 [0,2] [0,3] VP: 0.04 NP: 0.02 N: 0.2 V: 0.4 [1,2] an 3 arrow 5 time [0,4] [0,5] S (NP VP): 0.00064 flies [1,3] [1,4] [1,5] VP: 0.04 V: 0.4 Prep: 1.0 like [2,3] [2,4] [2,5] Det: 1.0 an [3,4] [1,4] → … and [1,3] + [3,4] 2011-10-25 4 Information Communication Theory (情報伝達学) [3,5] arrow [4,5] 100 S → NP VP S → VP NP VP → Verb VP → Verb NP VP → VP PP NP → Noun NP → Det NP NP → Noun NP NP → NP PP PP → Prep NP Noun → time Noun → flies Noun → arrow Verb → time Verb → flies Verb → like Verb → arrow Prep → like Det → an 0 [0.8] [0.2] [0.1] [0.5] [0.4] [0.1] [0.4] [0.3] [0.2] [1.0] [0.4] [0.2] [0.4] [0.1] [0.4] [0.4] [0.1] [1.0] [1.0] time VP: 0.01 NP: 0.04 N: 0.4 V: 0.1 [0,1] 1 flies 2 like S(NP VP):0.00128 NP(N NP):0.0024 VP V NP):0.0010 S (NP VP): 0.0000768 [0,2] [0,3] VP: 0.04 NP: 0.02 N: 0.2 V: 0.4 
[1,2] an 3 arrow 5 time [0,4] [0,5] S (NP VP): 0.00064 flies [1,3] [1,4] [1,5] VP: 0.04 V: 0.4 Prep: 1.0 like [2,3] [2,4] [2,5] Det: 1.0 an [3,4] [0,4] → [0,1] + [1,4], … 2011-10-25 4 Information Communication Theory (情報伝達学) [3,5] arrow [4,5] 101 S → NP VP S → VP NP VP → Verb VP → Verb NP VP → VP PP NP → Noun NP → Det NP NP → Noun NP NP → NP PP PP → Prep NP Noun → time Noun → flies Noun → arrow Verb → time Verb → flies Verb → like Verb → arrow Prep → like Det → an 0 [0.8] [0.2] [0.1] [0.5] [0.4] [0.1] [0.4] [0.3] [0.2] [1.0] [0.4] [0.2] [0.4] [0.1] [0.4] [0.4] [0.1] [1.0] [1.0] time VP: 0.01 NP: 0.04 N: 0.4 V: 0.1 [0,1] 1 flies 2 like S(NP VP):0.00128 NP(N NP):0.0024 VP V NP):0.0010 S (NP VP): 0.0000768 [0,2] [0,3] VP: 0.04 NP: 0.02 N: 0.2 V: 0.4 [1,2] an 3 arrow 5 time [0,4] [0,5] S (NP VP): 0.00064 flies [1,3] [1,4] [1,5] VP: 0.04 V: 0.4 Prep: 1.0 like [2,3] [2,4] [2,5] Det: 1.0 an [3,4] [0,4] → …, [0,2] + [2,4], … 2011-10-25 4 Information Communication Theory (情報伝達学) [3,5] arrow [4,5] 102 S → NP VP S → VP NP VP → Verb VP → Verb NP VP → VP PP NP → Noun NP → Det NP NP → Noun NP NP → NP PP PP → Prep NP Noun → time Noun → flies Noun → arrow Verb → time Verb → flies Verb → like Verb → arrow Prep → like Det → an 0 [0.8] [0.2] [0.1] [0.5] [0.4] [0.1] [0.4] [0.3] [0.2] [1.0] [0.4] [0.2] [0.4] [0.1] [0.4] [0.4] [0.1] [1.0] [1.0] time VP: 0.01 NP: 0.04 N: 0.4 V: 0.1 [0,1] 1 flies 2 like S(NP VP):0.00128 NP(N NP):0.0024 VP V NP):0.0010 S (NP VP): 0.0000768 [0,2] [0,3] VP: 0.04 NP: 0.02 N: 0.2 V: 0.4 [1,2] an 3 arrow 5 time [0,4] [0,5] S (NP VP): 0.00064 flies [1,3] [1,4] [1,5] VP: 0.04 V: 0.4 Prep: 1.0 like [2,3] [2,4] [2,5] Det: 1.0 an [3,4] [0,4] → …, [0,3] + [3,4] 2011-10-25 4 Information Communication Theory (情報伝達学) [3,5] arrow [4,5] 103 S → NP VP S → VP NP VP → Verb VP → Verb NP VP → VP PP NP → Noun NP → Det NP NP → Noun NP NP → NP PP PP → Prep NP Noun → time Noun → flies Noun → arrow Verb → time Verb → flies Verb → like Verb → arrow Prep → like Det → an 0 [0.8] [0.2] 
[0.1] [0.5] [0.4] [0.1] [0.4] [0.3] [0.2] [1.0] [0.4] [0.2] [0.4] [0.1] [0.4] [0.4] [0.1] [1.0] [1.0] time VP: 0.01 NP: 0.04 N: 0.4 V: 0.1 [0,1] 1 flies 2 like S(NP VP):0.00128 NP(N NP):0.0024 VP V NP):0.0010 S (NP VP): 0.0000768 [0,2] [0,3] VP: 0.04 NP: 0.02 N: 0.2 V: 0.4 [1,2] an 3 arrow 5 time [0,4] [0,5] S (NP VP): 0.00064 flies [1,3] [1,4] [1,5] VP: 0.04 V: 0.4 Prep: 1.0 like [2,3] [2,4] [2,5] Det: 1.0 an [3,4] [4,5]: Find rules that generate “arrow” 2011-10-25 4 Information Communication Theory (情報伝達学) [3,5] VP: 0.01 NP: 0.04 N: 0.4 V: 0.1 arrow [4,5] 104 S → NP VP S → VP NP VP → Verb VP → Verb NP VP → VP PP NP → Noun NP → Det NP NP → Noun NP NP → NP PP PP → Prep NP Noun → time Noun → flies Noun → arrow Verb → time Verb → flies Verb → like Verb → arrow Prep → like Det → an 0 [0.8] [0.2] [0.1] [0.5] [0.4] [0.1] [0.4] [0.3] [0.2] [1.0] [0.4] [0.2] [0.4] [0.1] [0.4] [0.4] [0.1] [1.0] [1.0] time VP: 0.01 NP: 0.04 N: 0.4 V: 0.1 1 flies 2 like S(NP VP):0.00128 NP(N NP):0.0024 VP V NP):0.0010 S (NP VP): 0.0000768 [0,2] [0,3] [0,1] VP: 0.04 NP: 0.02 N: 0.2 V: 0.4 [1,2] an 3 arrow 5 time [0,4] [0,5] S (NP VP): 0.00064 flies [1,3] [1,4] [1,5] VP: 0.04 V: 0.4 Prep: 1.0 like [2,3] [2,4] Det: 1.0 [2,5] NP (Det N): 0.016 an [3,4] [3,5] → [3,4] + [4,5] 2011-10-25 4 Information Communication Theory (情報伝達学) [3,5] VP: 0.01 NP: 0.04 N: 0.4 V: 0.1 arrow [4,5] 105 S → NP VP S → VP NP VP → Verb VP → Verb NP VP → VP PP NP → Noun NP → Det NP NP → Noun NP NP → NP PP PP → Prep NP Noun → time Noun → flies Noun → arrow Verb → time Verb → flies Verb → like Verb → arrow Prep → like Det → an 0 [0.8] [0.2] [0.1] [0.5] [0.4] [0.1] [0.4] [0.3] [0.2] [1.0] [0.4] [0.2] [0.4] [0.1] [0.4] [0.4] [0.1] [1.0] [1.0] time VP: 0.01 NP: 0.04 N: 0.4 V: 0.1 [0,1] 1 flies 2 like S(NP VP):0.00128 NP(N NP):0.0024 VP V NP):0.0010 S (NP VP): 0.0000768 [0,2] [0,3] VP: 0.04 NP: 0.02 N: 0.2 V: 0.4 [1,2] an 3 arrow 5 time [0,4] [0,5] S (NP VP): 0.00064 flies [1,3] [1,4] [1,5] S(VP NP):1.28e-4 VP(V NP):3.2e-3 
PP(Prep NP):0.016 VP: 0.04 V: 0.4 Prep: 1.0 [2,3] [2,4] Det: 1.0 like [2,5] NP (Det N): 0.016 an [3,4] [2,5] → [2,3] + [3,5], … 2011-10-25 4 Information Communication Theory (情報伝達学) [3,5] VP: 0.01 NP: 0.04 N: 0.4 V: 0.1 arrow [4,5] 106 S → NP VP S → VP NP VP → Verb VP → Verb NP VP → VP PP NP → Noun NP → Det NP NP → Noun NP NP → NP PP PP → Prep NP Noun → time Noun → flies Noun → arrow Verb → time Verb → flies Verb → like Verb → arrow Prep → like Det → an 0 [0.8] [0.2] [0.1] [0.5] [0.4] [0.1] [0.4] [0.3] [0.2] [1.0] [0.4] [0.2] [0.4] [0.1] [0.4] [0.4] [0.1] [1.0] [1.0] time VP: 0.01 NP: 0.04 N: 0.4 V: 0.1 [0,1] 1 flies 2 like S(NP VP):0.00128 NP(N NP):0.0024 VP V NP):0.0010 S (NP VP): 0.0000768 [0,2] [0,3] VP: 0.04 NP: 0.02 N: 0.2 V: 0.4 [1,2] an 3 arrow 5 time [0,4] [0,5] S (NP VP): 0.00064 flies [1,3] [1,4] [1,5] S(VP NP):1.28e-4 VP(V NP):3.2e-3 PP(Prep NP):0.016 VP: 0.04 V: 0.4 Prep: 1.0 [2,3] [2,4] Det: 1.0 like [2,5] NP (Det N): 0.016 an [3,4] [2,5] → …, [2,4] + [4,5] 2011-10-25 4 Information Communication Theory (情報伝達学) [3,5] VP: 0.01 NP: 0.04 N: 0.4 V: 0.1 arrow [4,5] 107 S → NP VP S → VP NP VP → Verb VP → Verb NP VP → VP PP NP → Noun NP → Det NP NP → Noun NP NP → NP PP PP → Prep NP Noun → time Noun → flies Noun → arrow Verb → time Verb → flies Verb → like Verb → arrow Prep → like Det → an 0 [0.8] [0.2] [0.1] [0.5] [0.4] [0.1] [0.4] [0.3] [0.2] [1.0] [0.4] [0.2] [0.4] [0.1] [0.4] [0.4] [0.1] [1.0] [1.0] time VP: 0.01 NP: 0.04 N: 0.4 V: 0.1 [0,1] 1 flies 2 like S(NP VP):0.00128 NP(N NP):0.0024 VP V NP):0.0010 S (NP VP): 0.0000768 [0,2] [0,3] VP: 0.04 NP: 0.02 N: 0.2 V: 0.4 [1,2] an 3 arrow 5 time [0,4] [0,5] VP(VP PP):2.56e-4 S(NP VP):5.12e-5 NP(NP PP): 6.4e-5 S (NP VP): 0.00064 [1,3] [1,4] [2,3] [2,4] Det: 1.0 flies [1,5] S(VP NP):1.28e-4 VP(V NP):3.2e-3 PP(Prep NP):0.016 VP: 0.04 V: 0.4 Prep: 1.0 like [2,5] NP (Det N): 0.016 an [3,4] [1,5] → [1,2] + [2,5], … 2011-10-25 4 Information Communication Theory (情報伝達学) [3,5] VP: 0.01 NP: 0.04 N: 0.4 V: 0.1 arrow 
[4,5] 108 S → NP VP S → VP NP VP → Verb VP → Verb NP VP → VP PP NP → Noun NP → Det NP NP → Noun NP NP → NP PP PP → Prep NP Noun → time Noun → flies Noun → arrow Verb → time Verb → flies Verb → like Verb → arrow Prep → like Det → an 0 [0.8] [0.2] [0.1] [0.5] [0.4] [0.1] [0.4] [0.3] [0.2] [1.0] [0.4] [0.2] [0.4] [0.1] [0.4] [0.4] [0.1] [1.0] [1.0] time VP: 0.01 NP: 0.04 N: 0.4 V: 0.1 [0,1] 1 flies 2 like S(NP VP):0.00128 NP(N NP):0.0024 VP V NP):0.0010 S (NP VP): 0.0000768 [0,2] [0,3] VP: 0.04 NP: 0.02 N: 0.2 V: 0.4 [1,2] an 3 arrow 5 time [0,4] [0,5] VP(VP PP):2.56e-4 S(NP VP):5.12e-5 NP(NP PP): 6.4e-5 S (NP VP): 0.00064 [1,3] [1,4] [2,3] [2,4] Det: 1.0 flies [1,5] S(VP NP):1.28e-4 VP(V NP):3.2e-3 PP(Prep NP):0.016 VP: 0.04 V: 0.4 Prep: 1.0 like [2,5] NP (Det N): 0.016 an [3,4] [1,5] → …, [1,4] + [4,5], … 2011-10-25 4 Information Communication Theory (情報伝達学) [3,5] VP: 0.01 NP: 0.04 N: 0.4 V: 0.1 arrow [4,5] 109 S → NP VP S → VP NP VP → Verb VP → Verb NP VP → VP PP NP → Noun NP → Det NP NP → Noun NP NP → NP PP PP → Prep NP Noun → time Noun → flies Noun → arrow Verb → time Verb → flies Verb → like Verb → arrow Prep → like Det → an 0 [0.8] [0.2] [0.1] [0.5] [0.4] [0.1] [0.4] [0.3] [0.2] [1.0] [0.4] [0.2] [0.4] [0.1] [0.4] [0.4] [0.1] [1.0] [1.0] time VP: 0.01 NP: 0.04 N: 0.4 V: 0.1 [0,1] 1 flies 2 like S(NP VP):0.00128 NP(N NP):0.0024 VP V NP):0.0010 S (NP VP): 0.0000768 [0,2] [0,3] VP: 0.04 NP: 0.02 N: 0.2 V: 0.4 [1,2] an 3 arrow 5 time [0,4] [0,5] VP(VP PP):2.56e-4 S(NP VP):5.12e-5 NP(NP PP): 6.4e-5 S (NP VP): 0.00064 [1,3] [1,4] [2,3] [2,4] Det: 1.0 flies [1,5] S(VP NP):1.28e-4 VP(V NP):3.2e-3 PP(Prep NP):0.016 VP: 0.04 V: 0.4 Prep: 1.0 like [2,5] NP (Det N): 0.016 an [3,4] [1,5] → …, [1,4] + [4,5] 2011-10-25 4 Information Communication Theory (情報伝達学) [3,5] VP: 0.01 NP: 0.04 N: 0.4 V: 0.1 arrow [4,5] 110 S → NP VP S → VP NP VP → Verb VP → Verb NP VP → VP PP NP → Noun NP → Det NP NP → Noun NP NP → NP PP PP → Prep NP Noun → time Noun → flies Noun → arrow Verb → time 
Verb → flies Verb → like Verb → arrow Prep → like Det → an 0 [0.8] [0.2] [0.1] [0.5] [0.4] [0.1] [0.4] [0.3] [0.2] [1.0] [0.4] [0.2] [0.4] [0.1] [0.4] [0.4] [0.1] [1.0] [1.0] time VP: 0.01 NP: 0.04 N: 0.4 V: 0.1 [0,1] 1 flies 2 like S(NP VP):0.00128 NP(N NP):0.0024 VP V NP):0.0010 S (NP VP): 0.0000768 [0,2] [0,3] VP: 0.04 NP: 0.02 N: 0.2 V: 0.4 [1,2] an 3 [0,4] 5 [0,5] VP(VP PP):2.56e-4 S(NP VP):5.12e-5 NP(NP PP): 6.4e-5 S (NP VP): 0.00064 [1,3] [1,4] [2,3] [2,4] Det: 1.0 Information Communication Theory (情報伝達学) flies [1,5] S(VP NP):1.28e-4 VP(V NP):3.2e-3 PP(Prep NP):0.016 [0,5] → [0,1] + [1,5], … 2011-10-25 arrow time VP: 0.04 V: 0.4 Prep: 1.0 S (VP NP): 1.28e-7 S (NP VP): 8.192e-6 NP (N NP): 7.68e-6 VP (V NP): 3.20e-6 4 like [2,5] NP (Det N): 0.016 an [3,4] [3,5] VP: 0.01 NP: 0.04 N: 0.4 V: 0.1 arrow [4,5] 111 S → NP VP S → VP NP VP → Verb VP → Verb NP VP → VP PP NP → Noun NP → Det NP NP → Noun NP NP → NP PP PP → Prep NP Noun → time Noun → flies Noun → arrow Verb → time Verb → flies Verb → like Verb → arrow Prep → like Det → an 0 [0.8] [0.2] [0.1] [0.5] [0.4] [0.1] [0.4] [0.3] [0.2] [1.0] [0.4] [0.2] [0.4] [0.1] [0.4] [0.4] [0.1] [1.0] [1.0] time VP: 0.01 NP: 0.04 N: 0.4 V: 0.1 [0,1] 1 flies 2 like S(NP VP):0.00128 NP(N NP):0.0024 VP V NP):0.0010 S (NP VP): 0.0000768 [0,2] [0,3] VP: 0.04 NP: 0.02 N: 0.2 V: 0.4 [1,2] an 3 arrow 5 time [0,4] [1,3] [1,4] [2,3] [2,4] Det: 1.0 Information Communication Theory (情報伝達学) flies [1,5] S(VP NP):1.28e-4 VP(V NP):3.2e-3 PP(Prep NP):0.016 VP: 0.04 V: 0.4 Prep: 1.0 S (VP NP): 1.28e-7 S (NP VP): 8.192e-6 NP (N NP): 7.68e-6 VP (V NP): 3.20e-6 S (NP VP): 6.144e-6 NP (NP PP): 7.68e-6 VP (VP PP): 6.4e-6 [0,5] VP(VP PP):2.56e-4 S(NP VP):5.12e-5 NP(NP PP): 6.4e-5 S (NP VP): 0.00064 [0,5] → …, [0,2] + [2,5], … 2011-10-25 4 like [2,5] NP (Det N): 0.016 an [3,4] [3,5] VP: 0.01 NP: 0.04 N: 0.4 V: 0.1 arrow [4,5] 112 S → NP VP S → VP NP VP → Verb VP → Verb NP VP → VP PP NP → Noun NP → Det NP NP → Noun NP NP → NP PP PP → Prep NP Noun → time 
Noun → flies Noun → arrow Verb → time Verb → flies Verb → like Verb → arrow Prep → like Det → an 0 [0.8] [0.2] [0.1] [0.5] [0.4] [0.1] [0.4] [0.3] [0.2] [1.0] [0.4] [0.2] [0.4] [0.1] [0.4] [0.4] [0.1] [1.0] [1.0] time VP: 0.01 NP: 0.04 N: 0.4 V: 0.1 [0,1] 1 flies 2 like S(NP VP):0.00128 NP(N NP):0.0024 VP V NP):0.0010 S (NP VP): 0.0000768 [0,2] [0,3] VP: 0.04 NP: 0.02 N: 0.2 V: 0.4 [1,2] an 3 arrow 5 time [0,4] [1,3] [1,4] [2,3] [2,4] Det: 1.0 Information Communication Theory (情報伝達学) flies [1,5] S(VP NP):1.28e-4 VP(V NP):3.2e-3 PP(Prep NP):0.016 VP: 0.04 V: 0.4 Prep: 1.0 S (VP NP): 1.28e-7 S (NP VP): 8.192e-6 NP (N NP): 7.68e-6 VP (V NP): 3.20e-6 S (NP VP): 6.144e-6 NP (NP PP): 7.68e-6 VP (VP PP): 6.4e-6 [0,5] VP(VP PP):2.56e-4 S(NP VP):5.12e-5 NP(NP PP): 6.4e-5 S (NP VP): 0.00064 [0,5] → …, [0,3] + [3,5], … 2011-10-25 4 like [2,5] NP (Det N): 0.016 an [3,4] [3,5] VP: 0.01 NP: 0.04 N: 0.4 V: 0.1 arrow [4,5] 113 S → NP VP S → VP NP VP → Verb VP → Verb NP VP → VP PP NP → Noun NP → Det NP NP → Noun NP NP → NP PP PP → Prep NP Noun → time Noun → flies Noun → arrow Verb → time Verb → flies Verb → like Verb → arrow Prep → like Det → an 0 [0.8] [0.2] [0.1] [0.5] [0.4] [0.1] [0.4] [0.3] [0.2] [1.0] [0.4] [0.2] [0.4] [0.1] [0.4] [0.4] [0.1] [1.0] [1.0] time VP: 0.01 NP: 0.04 N: 0.4 V: 0.1 [0,1] 1 flies 2 like S(NP VP):0.00128 NP(N NP):0.0024 VP V NP):0.0010 S (NP VP): 0.0000768 [0,2] [0,3] VP: 0.04 NP: 0.02 N: 0.2 V: 0.4 [1,2] an 3 [0,4] 5 [0,5] VP(VP PP):2.56e-4 S(NP VP):5.12e-5 NP(NP PP): 6.4e-5 S (NP VP): 0.00064 [1,3] [1,4] [2,3] [2,4] Det: 1.0 Information Communication Theory (情報伝達学) flies [1,5] S(VP NP):1.28e-4 VP(V NP):3.2e-3 PP(Prep NP):0.016 [0,5] → …, [0,4] + [4,5] 2011-10-25 arrow time VP: 0.04 V: 0.4 Prep: 1.0 S (VP NP): 1.28e-7 S (NP VP): 8.192e-6 NP (N NP): 7.68e-6 VP (V NP): 3.20e-6 S (NP VP): 6.144e-6 NP (NP PP): 7.68e-6 VP (VP PP): 6.4e-6 4 like [2,5] NP (Det N): 0.016 an [3,4] [3,5] VP: 0.01 NP: 0.04 N: 0.4 V: 0.1 arrow [4,5] 114 The tree with the maximum 
probability $T$

(S [0.8]
  (NP [0.1] (Noun [0.4] time))
  (VP [0.4]
    (VP [0.1] (Verb [0.4] flies))
    (PP [1.0]
      (Prep [1.0] like)
      (NP [0.4]
        (Det [1.0] an)
        (NP [0.1] (Noun [0.4] arrow))))))

Each bracketed number is the probability of the rule that expands that node.

$P(T, S) = 8.192 \times 10^{-6}$

Python implementation (1/2)

import collections
import math

def build(CNF):
    # G: RHS -> list of (LHS, log(p))
    # Probabilities are kept in the logarithm domain.
    G = collections.defaultdict(list)
    for left, right, p in CNF:
        G[right].append((left, math.log(p)))
    return G

def show_cell(T, i, j):
    # Print every symbol stored in cell [i,j] with its probability and back-pointers.
    for x, (p, l, r) in T[i][j].items():
        print("[%d,%d,%s]=%g: %r and %r" % (i, j, x, math.exp(p), l, r))

def pcky(G, W):
    # T[i][j] maps a symbol to (log probability, left back-pointer, right back-pointer).
    T = [[{} for j in range(len(W)+1)] for i in range(len(W))]
    for j in range(1, len(W)+1):
        # Lexical rules: fill the cell for the single token W[j-1].
        for left, p in G.get(W[j-1], []):
            T[j-1][j][left] = (p, (W[j-1], j, j), (W[j-1], j, j))
        show_cell(T, j-1, j)
        for i in range(j-2, -1, -1):
            for k in range(i+1, j):
                for x, (px, lx, rx) in T[i][k].items():
                    for y, (py, ly, ry) in T[k][j].items():
                        for left, p in G.get((x, y), []):
                            # Compute the log probability of the new node.
                            pnew = px + py + p
                            # Maintain the maximum log probability with which [i,j] has the symbol left.
                            if left not in T[i][j] or T[i][j][left][0] < pnew:
                                T[i][j][left] = (pnew, (x, i, k), (y, k, j))
            show_cell(T, i, j)
    return T

Python implementation (2/2)

if __name__ == '__main__':
    PCNF = (
        ('S', ('NP','VP'), 0.8),
        ('S', ('VP','NP'), 0.2),
        ('VP', 'time', 0.01),
        ('VP', 'flies', 0.04),
        ('VP', 'like', 0.04),
        ('VP', 'arrow', 0.01),
        ('VP', ('Verb','NP'), 0.5),
        ('VP', ('VP','PP'), 0.4),
        ('NP', 'time', 0.04),
        ('NP', 'flies', 0.02),
        ('NP', 'arrow', 0.04),
        ('NP', ('Det','NP'), 0.4),
        ('NP', ('Noun','NP'), 0.3),
        ('NP', ('NP','PP'), 0.2),
        ('PP', ('Preposition','NP'), 1.0),
        ('Noun', 'time', 0.4),
        ('Noun', 'flies', 0.2),
        ('Noun', 'arrow', 0.4),
        ('Verb', 'time', 0.1),
        ('Verb', 'flies', 0.4),
        ('Verb', 'like', 0.4),
        ('Verb', 'arrow', 0.1),
        ('Preposition', 'like', 1.0),
        ('Det', 'an', 1.0),
    )
    G = build(PCNF)
    T = pcky(G, ('time', 'flies', 'like', 'an', 'arrow'))

Output (the order of entries within a cell follows dictionary iteration order and may differ between Python versions):

[0,1,VP]=0.01: ('time', 1, 1) and ('time', 1, 1)
[0,1,NP]=0.04: ('time', 1, 1) and ('time', 1, 1)
[0,1,Verb]=0.1: ('time', 1, 1) and ('time', 1, 1)
[0,1,Noun]=0.4: ('time', 1, 1) and ('time', 1, 1)
[1,2,VP]=0.04: ('flies', 2, 2) and ('flies', 2, 2)
[1,2,NP]=0.02: ('flies', 2, 2) and ('flies', 2, 2)
[1,2,Verb]=0.4: ('flies', 2, 2) and ('flies', 2, 2)
[1,2,Noun]=0.2: ('flies', 2, 2) and ('flies', 2, 2)
[0,2,VP]=0.001: ('Verb', 0, 1) and ('NP', 1, 2)
[0,2,NP]=0.0024: ('Noun', 0, 1) and ('NP', 1, 2)
[0,2,S]=0.00128: ('NP', 0, 1) and ('VP', 1, 2)
[2,3,VP]=0.04: ('like', 3, 3) and ('like', 3, 3)
[2,3,Preposition]=1: ('like', 3, 3) and ('like', 3, 3)
[2,3,Verb]=0.4: ('like', 3, 3) and ('like', 3, 3)
[1,3,S]=0.00064: ('NP', 1, 2) and ('VP', 2, 3)
[0,3,S]=7.68e-05: ('NP', 0, 2) and ('VP', 2, 3)
[3,4,Det]=1: ('an', 4, 4) and ('an', 4, 4)
[4,5,VP]=0.01: ('arrow', 5, 5) and ('arrow', 5, 5)
[4,5,NP]=0.04: ('arrow', 5, 5) and ('arrow', 5, 5)
[4,5,Verb]=0.1: ('arrow', 5, 5) and ('arrow', 5, 5)
[4,5,Noun]=0.4: ('arrow', 5, 5) and ('arrow', 5, 5)
[3,5,NP]=0.016: ('Det', 3, 4) and
('NP', 4, 5)
[2,5,VP]=0.0032: ('Verb', 2, 3) and ('NP', 3, 5)
[2,5,S]=0.000128: ('VP', 2, 3) and ('NP', 3, 5)
[2,5,PP]=0.016: ('Preposition', 2, 3) and ('NP', 3, 5)
[1,5,VP]=0.000256: ('VP', 1, 2) and ('PP', 2, 5)
[1,5,NP]=6.4e-05: ('NP', 1, 2) and ('PP', 2, 5)
[1,5,S]=5.12e-05: ('NP', 1, 2) and ('VP', 2, 5)
[0,5,VP]=6.4e-06: ('VP', 0, 2) and ('PP', 2, 5)
[0,5,NP]=7.68e-06: ('Noun', 0, 1) and ('NP', 1, 5)
[0,5,S]=8.192e-06: ('NP', 0, 1) and ('VP', 1, 5)

Notes on PCKY
• Efficiency of PCKY
  • $n$: # of tokens, $|R|$: # of CNF rules, $|S|$: # of symbols
  • Computational cost: $O(|R| n^3)$, because each of the counters $i$, $j$, $k$ ranges over $[0, n]$
  • Space requirement: $O(|S| n^2)$

Limitations of PCFG
Chapter 14.4, D. Jurafsky and J. H. Martin. Speech and Language Processing, 2009.

Independence assumptions miss structural dependencies
• NPs have different constructions depending on their positions
  • Pronouns appear in syntactic subject position more often than in syntactic object position
  • This is because pronouns tend to refer to old information
• A PCFG cannot model this bias: $P(\text{NP} \to \text{PRP})$ and $P(\text{NP} \to \text{DT NN})$ are each a single value, although the true preference depends on the position of the NP

Distribution of NP constructions in the Switchboard corpus (Francis et al., 1999)

            Pronoun    Non-pronoun
  Subject   90.8%      9.2%
  Object    34.3%      65.7%
  Total     79.8%      20.2%

Attachment ambiguity

Left tree (VP attachment):
(S (NP (PRP I)) (VP (VBD ate) (NP (NNP Sushi)) (PP (IN with) (NP (NNS hands)))))

Right tree (NP attachment):
(S (NP (PRP I)) (VP (VBD ate) (NP (NP (NNP Sushi)) (PP (IN with) (NP (NNS hands))))))

• The two trees differ only in:
  • $P(\text{VP} \to \text{VBD NP PP})$ and $P(\text{VP} \to \text{VBD NP}) \, P(\text{NP} \to \text{NP PP})$
• In order to choose the left tree (VP attachment), we need
  • $P(\text{VP} \to \text{VBD NP PP}) > P(\text{VP} \to \text{VBD NP}) \, P(\text{NP} \to \text{NP PP})$

Independence assumptions miss lexical dependencies

Left tree (VP attachment):
(S (NP (PRP I)) (VP (VBD ate) (NP (NNP Sushi)) (PP (IN with) (NP (NNS shrimps)))))

Right tree (NP attachment):
(S (NP (PRP I)) (VP (VBD ate) (NP (NP (NNP Sushi)) (PP (IN with) (NP (NNS shrimps))))))
• What if the word “hands” is changed to “shrimps”?
  • The two trees still differ only in the rules from the previous slide, so their relative probabilities are unchanged
  • Impossible to choose the right tree (NP attachment) as long as
    $P(\text{VP} \to \text{VBD NP PP}) > P(\text{VP} \to \text{VBD NP}) \, P(\text{NP} \to \text{NP PP})$
• Attachment is the most difficult issue in parsing

Coordination ambiguity

Two parses of “dogs in houses and cats” (Collins, 1999):
(NP (NP (NP (NNS dogs)) (PP (IN in) (NP (NNS houses)))) (CC and) (NP (NNS cats)))
(NP (NP (NNS dogs)) (PP (IN in) (NP (NP (NNS houses)) (CC and) (NP (NNS cats)))))

• Coordination disambiguation also requires lexical information
  • “dogs” is semantically a better conjunct (接続) for “cats” than “houses” is
  • “dogs in cats” is implausible: dogs do not fit inside cats

Enhancing PCFGs
Chapter 14.5, D. Jurafsky and J. H. Martin. Speech and Language Processing, 2009.

Basic strategies for enhancing PCFG
• Problem of PCFGs
  • The independence assumption (context-freeness) is too strong
• Remedy for PCFGs
  • Encode more context into PCFG non-terminals/rules
  • Refine the events of constituents
• Lexicalized PCFG
  • Encode lexical information (e.g., words) into non-terminal symbols
• Unlexicalized PCFG

Parent annotation
• Attach the symbol of the parent node to each non-terminal (Johnson, 1998)

Before:
(S (NP (PRP I)) (VP (VBD need) (NP (DT a) (NN flight))))

After:
(S (NP^S (PRP I)) (VP^S (VBD need) (NP^VP (DT a) (NN flight))))

Figure 14.8, Jurafsky and Martin, Speech and Language Processing

Fine-grained non-terminals
• Split part-of-speech nodes as well (Klein and Manning, 2003)

Incorrect parse obtained (confusion between preposition and subordinating conjunction):
(VP^S (TO to) (VP^VP (VB see) (PP^VP (IN if) (NP^PP (NN advertising) (NNS works)))))

Correct parse obtained after node splitting (sentential complement):
(VP^S (TO^VP to) (VP^VP (VB^VP see) (SBAR^VP (IN^SBAR if) (S^SBAR (NP^S (NN^NP advertising)) (VP^S (VBZ^VP works))))))
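The parent annotation from the “I need a flight” example can be sketched in a few lines of Python. This is only an illustrative sketch: the nested-tuple tree encoding, the function name `parent_annotate`, and the `split_tags` switch are assumptions made here, not the implementation of Johnson (1998) or Klein and Manning (2003). With `split_tags=False` only phrasal nodes are annotated, as in the Before/After figure; with `split_tags=True` part-of-speech nodes receive the parent symbol as well.

```python
def parent_annotate(tree, parent=None, split_tags=False):
    """Return a copy of `tree` whose non-terminal labels carry ^parent.

    A tree is a tuple (label, child, ...); a word (leaf) is a plain string.
    This encoding is a hypothetical one chosen for the sketch.
    """
    label, children = tree[0], tree[1:]
    # A preterminal (part-of-speech node) dominates exactly one word.
    is_preterminal = len(children) == 1 and isinstance(children[0], str)
    if parent is not None and (split_tags or not is_preterminal):
        new_label = "%s^%s" % (label, parent)
    else:
        new_label = label  # the root, or a POS tag when split_tags is off
    new_children = tuple(
        c if isinstance(c, str) else parent_annotate(c, label, split_tags)
        for c in children
    )
    return (new_label,) + new_children


before = ("S",
          ("NP", ("PRP", "I")),
          ("VP", ("VBD", "need"),
                 ("NP", ("DT", "a"), ("NN", "flight"))))
after = parent_annotate(before)
print(after)
```

Note that the original (unannotated) label is passed down as the parent, so annotations do not accumulate along a path: the object NP becomes NP^VP, not NP^VP^S.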
Other node-splitting strategies
• Klein and Manning (2003)
  • Subcategorize AUX (auxiliary verb)
    • AUX-BE and AUX-HAVE
  • Separate demonstratives (e.g., that, these) and regular determiners (e.g., a, the) from DT
    • DT, UNARY-DT
    • (NP (UNARY-DT these)) vs. (NP (DT the) (NN book))
  • Separate finite and infinitival VPs
    • VP is used everywhere!
    • Without the split: (S (NP^S (DT This)) (VP^S (VBZ is) (VP^VP (VB panic) (NP^VP (NN buying)))))
    • With the split: (S (NP^S (DT This)) (VP^S-VBF (VBZ is) (NP^VP (NN panic) (NN buying))))
  • F1: 85.7 from 77.77 (baseline)
• Petrov et al. (2006)
  • An automatic approach to node-splitting
  • State-of-the-art performance
  • F1: 89.7

Lexicalized PCFGs
• Lexicalized grammar
  • Each non-terminal is annotated with its lexical head
• Attach the head word to each non-terminal
  • VP(dumped) → VBD(dumped) NP(sacks) PP(into)
• Attach the head word and its POS to each non-terminal
  • VP(dumped, VBD) → VBD(dumped, VBD) NP(sacks, NNS) PP(into, IN)

Finding the head of each non-terminal
• Head rule (Magerman, 1995; Collins 1999, Appendix A)
• Rules for NPs
  • If the last word is tagged POS, return the last word
  • Else search from right to left for the first child which is an NN, NNP, NNPS, NNS, NX, POS, or JJR
  • Else search from left to right for the first child which is an NP
  • Else search from right to left for the first child which is a $, ADJP, or PRN
  • Else search from right to left for the first child which is a CD
  • Else search from right to left for the first child which is a JJ, JJS, RB, or QP
  • Else return the last word
• Rules for other non-terminals

  Parent   Search from   Priority list
  S        Left          TO IN VP S SBAR ADJP UCP NP
  VP       Left          TO VBD VBN MD VBZ VB VBG VBP VP ADJP NN NNS NP
  PP       Right         IN TO VBG VBN RP FW

Lexicalized tree (Collins, 1999)

(S(dumped)
  (NP(workers) (NNS(workers) workers))
  (VP(dumped)
    (VBD(dumped) dumped)
    (NP(sacks) (NNS(sacks) sacks))
    (PP(into)
      (IN(into) into)
      (NP(bin) (DT(a) a) (NN(bin) bin)))))

Lexicalized internal rules
ROOT → S(dumped)
S(dumped) → NP(workers) VP(dumped)
NP(workers) → NNS(workers)
VP(dumped) → VBD(dumped) NP(sacks) PP(into)
PP(into) → IN(into) NP(bin)
NP(bin) → DT(a) NN(bin)

Computing rule probabilities of lexicalized PCFG
• Maximum likelihood estimation (as we did before)

  $$P(\text{VP}(dumped) \to \text{VBD}(dumped)\ \text{NP}(sacks)\ \text{PP}(into)) = \frac{Count(\text{VP}(dumped) \to \text{VBD}(dumped)\ \text{NP}(sacks)\ \text{PP}(into))}{Count(\text{VP}(dumped))}$$

• As simple as counting the number of occurrences of tree fragments, ... but wait!
• Lexicalized PCFG rules are too specific to appear frequently in the treebank (data sparseness problem)
• Collins (1999) decomposed the probability of each lexicalized PCFG rule into smaller units

Collins model 1 (simplified)
• Decompose a rule probability into a product over its elements
  • Generate the head daughter using $P_H$
  • Generate the daughters to the left of the head using $P_L$
  • Generate the STOP symbol using $P_L$
  • Generate the daughters to the right of the head using $P_R$
  • Generate the STOP symbol using $P_R$
• $P(\text{VP}(dumped) \to \text{VBD}(dumped)\ \text{NP}(sacks)\ \text{PP}(into))$ is the product of
  • $P_H(\text{VBD}(dumped) \mid \text{VP}(dumped))$
  • $P_L(\text{STOP} \mid \text{VP}(dumped))$
  • $P_R(\text{NP}(sacks) \mid \text{VP}(dumped))$
  • $P_R(\text{PP}(into) \mid \text{VP}(dumped))$
  • $P_R(\text{STOP} \mid \text{VP}(dumped))$
• These factors are less subject to the sparseness problem than the original probability
• The actual model is more complicated; refer to Collins (1999)

Efficiency of lexicalized PCKY
• Unlexicalized PCKY
  • $n$: # of tokens, $|R|$: # of CNF rules, $|S|$: # of symbols
  • Computational cost: $O(|R| n^3)$, because each of the counters $i$, $j$, $k$ ranges over $[0, n]$
  • Space requirement: $O(|S| n^2)$
• Lexicalized PCKY
  • $|R| \to |R| n^2$ and $|S| \to |S| n$, because each symbol carries one of the $n$ words as its head and each binary rule is bi-lexicalized
  • Computational cost: $O(|R| n^5)$
  • Space requirement: $O(|S| n^3)$
    • 10,000 symbols × (50 words)$^3$ × (8 bytes/double) = 10 GB
  • Eisner and Satta (1999) reduced the computational cost to $O(|R| n^4)$
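The 10 GB figure above is plain arithmetic, and the following sketch reproduces it next to the unlexicalized case. The function name `chart_bytes` and its parameters are made up for this illustration, and the estimate assumes a dense chart that stores one double per (symbol, span) entry, which is a simplification of what a real parser would allocate.

```python
def chart_bytes(num_symbols, num_tokens, lexicalized=False, bytes_per_entry=8):
    """Estimate the size of a dense (P)CKY chart holding one double per entry.

    Hypothetical helper: O(|S| n^2) entries without lexicalization,
    O(|S| n^3) entries when every symbol is paired with a head word.
    """
    spans = num_tokens ** 2            # number of [i, j] cells, up to a constant
    symbols = num_symbols
    if lexicalized:
        symbols *= num_tokens          # each symbol combined with one of n head words
    return symbols * spans * bytes_per_entry


unlexicalized = chart_bytes(10_000, 50)
lexicalized = chart_bytes(10_000, 50, lexicalized=True)
print("unlexicalized: %.0f MB" % (unlexicalized / 1e6))  # 200 MB
print("lexicalized:   %.0f GB" % (lexicalized / 1e9))    # 10 GB
```

Under these assumptions, lexicalization multiplies the chart size by the sentence length (here 50), which is why the space requirement grows from 200 MB to 10 GB for the same symbol inventory.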
Evaluation

System and gold-standard trees

System:
(S (NP (PRP I)) (VP (VBD ate) (NP (NNP Sushi)) (PP (IN with) (NP (NNS shrimps)))))

Gold-standard:
(S (NP (PRP I)) (VP (VBD ate) (NP (NP (NNP Sushi)) (PP (IN with) (NP (NNS shrimps))))))

PARSEVAL (Black et al., 1991)
• Brackets in the parse tree by the system
  • (S, 0, 5), (NP, 0, 1), (VP, 1, 5), (NP, 2, 3), (PP, 3, 5), (NP, 4, 5)
• Brackets in the gold-standard tree
  • (S, 0, 5), (NP, 0, 1), (VP, 1, 5), (NP, 2, 5), (NP, 2, 3), (PP, 3, 5), (NP, 4, 5)
• Precision: 6/6 = 100%
• Recall: 6/7 = 85.7%
• Labeled precision: 6/6 = 100%
• Labeled recall: 6/7 = 85.7%
• Crossing brackets: 0 (an example of crossing brackets would be “I ate” and “ate Sushi”)
• Crossing accuracy: 100%
• Tagging accuracy: 5/5 = 100%

Further readings
• Dan Klein’s lecture
  • CS294-5: http://www.cs.berkeley.edu/~klein/cs294-5/
  • CS288: http://www.cs.berkeley.edu/~klein/cs288/sp10/
• Christopher Manning’s lecture
  • CS224N: https://courseware.stanford.edu/pg/courses/214428/cs224n-fall-2011
• Jason Eisner’s lecture
  • #600.465: http://www.cs.jhu.edu/~jason/465/
• Erhard W. Hinrichs. Course on Linguistic Annotation and Treebanks - Winter 2007
  • http://www.ling.ohio-state.edu/~hinrichs/
• Ann Bies, Mark Ferguson, Karen Katz, Robert MacIntyre. 1995. Bracketing Guidelines for Treebank II Style Penn Treebank Project.
  • ftp://ftp.cis.upenn.edu/pub/treebank/doc/manual/root.ps.gz

References (1/2)
• A. V. Aho and J. D. Ullman (1972). The theory of parsing, translation, and compiling. Prentice-Hall, Inc.
• E. Black, et al. (1991). A procedure for quantitatively comparing the syntactic coverage of English grammars. In Proceedings of DARPA Speech and Natural Language Workshop, pp. 306-311.
• E. Charniak, et al. (1998).
  Edge-based best-first chart parsing. In Proceedings of the Sixth Workshop on Very Large Corpora, pp. 127–133.
• M. Collins (1999). Head-Driven Statistical Models for Natural Language Processing. Ph.D. thesis, University of Pennsylvania.
• J. Eisner and G. Satta (1999). Efficient parsing for bilexical context-free grammars and head automaton grammars. In ACL-99, pp. 457-464.
• H. S. Francis, et al. (1999). Are lexical subjects deviant? In CLS-99, University of Chicago.

References (2/2)
• M. Johnson (1998). PCFG models of linguistic tree representations. Computational Linguistics, 24(4), 613-632.
• M. Johnson (2002). A simple pattern-matching algorithm for recovering empty nodes and their antecedents. In ACL-2002.
• D. Klein and C. D. Manning (2003). Accurate unlexicalized parsing. In ACL-2003, pp. 423–430.
• C. Macleod, et al. (1998). COMLEX Syntax Reference Manual Version 3.0. Linguistic Data Consortium.
• D. M. Magerman (1995). Statistical decision-tree models for parsing. In ACL-95, pp. 276-283.
• J. Nivre and S. Kübler (2006). Tutorial on Dependency Parsing. In Coling-ACL 2006.
• S. Petrov, et al. (2006). Learning accurate, compact, and interpretable tree annotation. In Coling/ACL 2006, pp. 433-440.
• H. Schmid (2004). Efficient parsing of highly ambiguous context-free grammars with bit vectors. In Coling 2004, pp. 162–168.
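Supplementary sketch for the PARSEVAL slide: the bracket-based precision and recall from the Evaluation section can be recomputed with a few lines of Python. The function name `parseval` and the set-based matching are assumptions of this sketch (treating brackets as a set ignores repeated identical brackets, which the example does not contain), and the F1 value is an addition here that the PARSEVAL slide itself does not report.

```python
def parseval(system, gold):
    """Return (precision, recall, F1) over labeled brackets (label, start, end).

    Hypothetical helper: brackets are compared as sets of exact matches.
    """
    system, gold = set(system), set(gold)
    matched = len(system & gold)
    precision = matched / len(system)
    recall = matched / len(gold)
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f1


# Brackets copied from the PARSEVAL slide ("I ate Sushi with shrimps").
system = [("S", 0, 5), ("NP", 0, 1), ("VP", 1, 5),
          ("NP", 2, 3), ("PP", 3, 5), ("NP", 4, 5)]
gold = [("S", 0, 5), ("NP", 0, 1), ("VP", 1, 5), ("NP", 2, 5),
        ("NP", 2, 3), ("PP", 3, 5), ("NP", 4, 5)]

p, r, f1 = parseval(system, gold)
print("precision=%.1f%% recall=%.1f%% F1=%.1f%%" % (100 * p, 100 * r, 100 * f1))
```

The only gold bracket the system misses is (NP, 2, 5), the NP that groups “Sushi with shrimps” under NP attachment, which is why recall is 6/7 while precision stays at 6/6.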