Learning for Semantic Parsing Using Statistical Syntactic Parsing Techniques. Ruifang Ge, Ph.D. Final Defense. Supervisor: Raymond J. Mooney. Machine Learning Group, Department of Computer Science, The University of Texas at Austin. 1
Semantic Parsing. Semantic parsing: transforming natural language (NL) sentences into completely formal meaning representations (MRs). Sample application domains where MRs are directly executable by another computer system to perform some task: CLang (RoboCup Coach Language) and Geoquery (a database query application). 2
CLang (RoboCup Coach Language). In the RoboCup Coach competition, teams compete to coach simulated players; the coaching instructions are given in a formal language called CLang. Example: the coach's instruction "If our player 2 has the ball, then position our player 5 in the midfield." is semantically parsed into ((bowner (player our {2})) (do (player our {5}) (pos (midfield)))) and sent to the simulated soccer field. 3
GeoQuery: A Database Query Application. Query application for a U.S. geography database [Zelle & Mooney, 1996]. The user asks "What are the rivers in Texas?", semantic parsing produces the query answer(x1, (river(x1), loc(x1,x2), equal(x2,stateid(texas)))), and the database returns "Angelina, Blanco, ...". 4
Motivation for Semantic Parsing. Theoretically, it addresses the question of how people interpret language. Practical applications: question answering, natural language interfaces, knowledge acquisition, reasoning. 5
Motivating Example. "If our player 2 has the ball, our player 4 should stay in our half" maps to ((bowner (player our {2})) (do (player our {4}) (pos (half our)))). Semantic parsing is a compositional process; sentence structures are needed for building meaning representations. (bowner: ball owner; pos: position) 6
Syntax-Based Approaches. Meaning composition follows the tree structure of a syntactic parse: the meaning of a constituent is composed from the meanings of its sub-constituents in the syntactic parse. Hand-built approaches (Woods, 1970; Warren and Pereira, 1982). Learned approaches:
Miller et al. (1996): conceptually simple; Zettlemoyer & Collins (2005): hand-built Combinatory Categorial Grammar (CCG) template rules. 7
Example. MR: bowner(player(our,2)). Syntactic parse of "our player 2 has the ball": (S (NP (PRP$ our) (NN player) (CD 2)) (VP (VB has) (NP (DT the) (NN ball)))). Use the structure of a syntactic parse. 8
Example. MR: bowner(player(our,2)). Assign semantic concepts to words: PRP$-our, NN-player(_,_), CD-2, VB-bowner(_), DT-null, NN-null. 9
Example. MR: bowner(player(our,2)). Compose meaning for the internal nodes: NP-player(our,2) from PRP$-our, NN-player(_,_), CD-2. 10
Example. MR: bowner(player(our,2)). Compose meaning for the internal nodes: NP-null from DT-null and NN-null; VP-bowner(_) from VB-bowner(_) and NP-null. 11
Example. MR: bowner(player(our,2)). Compose meaning for the internal nodes: S-bowner(player(our,2)) from NP-player(our,2) and VP-bowner(_). 12
Semantic Grammars. Non-terminals in a semantic grammar correspond to semantic concepts in application domains. Hand-built approaches (Hendrix et al., 1978). Learned approaches: Tang & Mooney (2001), Kate & Mooney (2006), Wong & Mooney (2006). 13
Example. MR: bowner(player(our,2)). Semantic-grammar derivation of "our player 2 has the ball", using productions such as bowner → player has the ball. 14
Thesis Contributions. Introduce two novel syntax-based approaches to semantic parsing, theoretically well-founded in computational semantics (Blackburn and Bos, 2005). Great opportunity: leverage the significant progress made in statistical syntactic parsing for semantic parsing (Collins, 1997; Charniak and Johnson, 2005; Huang, 2008). 15
Thesis Contributions. SCISSOR: a novel integrated syntactic-semantic parser. SYNSEM: exploits an existing syntactic parser to produce disambiguated parse trees that drive the meaning composition process. Investigate when knowledge of syntax can help. 16
Representing Semantic Knowledge in a Meaning Representation Language Grammar (MRLG). Assumes a meaning representation language (MRL) is defined by an unambiguous context-free grammar. Each production rule introduces a single predicate into the MRL. The parse of an MR gives its predicate-argument structure.
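To make the bottom-up composition illustrated on slides 8-12 concrete, here is a minimal, hypothetical Python sketch (not the thesis implementation): each node carries an optional semantic concept, semantically NULL constituents contribute nothing, and a parent's predicate takes the composed meanings of its remaining children as arguments. The Node class, the tuple encoding of MRs, and the argument-selection heuristic are all illustrative assumptions.

```python
# Hypothetical sketch of meaning composition over a parse tree whose nodes
# carry semantic labels (slides 8-12); not the thesis implementation.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    syn: str                          # syntactic label, e.g. "NP"
    sem: Optional[str] = None         # semantic concept, e.g. "player", or None for NULL
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None        # set for leaf (pre-terminal) nodes

def compose(node: Node):
    """Bottom-up composition: return the MR fragment this node dominates, or None."""
    if node.word is not None:                        # leaf: its meaning is its own concept
        return (node.sem,) if node.sem else None
    child_mrs = [m for m in (compose(c) for c in node.children) if m is not None]
    if node.sem is None:                             # semantically NULL constituent
        return child_mrs[0] if child_mrs else None
    # Simplification: the child repeating this node's predicate is the head;
    # the other children's meanings become the predicate's arguments.
    args = [m for m in child_mrs if m[0] != node.sem]
    return (node.sem, *args)

# "our player 2 has the ball"  ->  bowner(player(our, 2))
tree = Node("S", "bowner", [
    Node("NP", "player", [
        Node("PRP$", "our", word="our"),
        Node("NN", "player", word="player"),
        Node("CD", "2", word="2"),
    ]),
    Node("VP", "bowner", [
        Node("VB", "bowner", word="has"),
        Node("NP", None, [Node("DT", None, word="the"), Node("NN", None, word="ball")]),
    ]),
])
print(compose(tree))    # ('bowner', ('player', ('our',), ('2',)))
```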
Production | Predicate
CONDITION → (bowner PLAYER) | P_BOWNER
PLAYER → (player TEAM {UNUM}) | P_PLAYER
UNUM → 2 | P_UNUM
TEAM → our | P_OUR 17
Roadmap: SCISSOR, SYNSEM, Future Work, Conclusions. 18
SCISSOR: Semantic Composition that Integrates Syntax and Semantics to get Optimal Representations. Integrated syntactic-semantic parsing: allows both syntax and semantics to be used simultaneously to obtain an accurate combined syntactic-semantic analysis. A statistical parser is used to generate a semantically augmented parse tree (SAPT). 19
Syntactic Parse: (S (NP (PRP$ our) (NN player) (CD 2)) (VP (VB has) (NP (DT the) (NN ball)))). 20
SAPT: (S-P_BOWNER (NP-P_PLAYER (PRP$-P_OUR our) (NN-P_PLAYER player) (CD-P_UNUM 2)) (VP-P_BOWNER (VB-P_BOWNER has) (NP-NULL (DT-NULL the) (NN-NULL ball)))). Non-terminals now have both syntactic and semantic labels. Semantic labels give the dominant predicates in the sub-trees. 21
SAPT (the same tree as above). MR: P_BOWNER(P_PLAYER(P_OUR,P_UNUM)). 22
SCISSOR Overview (training): SAPT training examples → learner → integrated semantic parser. 23
SCISSOR Overview (testing): NL sentence → integrated semantic parser → SAPT → compose MR → MR. 24
Extending Collins' (1997) Syntactic Parsing Model. Find the SAPT with the maximum probability. A lexicalized head-driven syntactic parsing model. Extend the parsing model to generate semantic labels simultaneously with syntactic labels. 25
Why Extend Collins' (1997) Syntactic Parsing Model? Suitable for incorporating semantic knowledge. Head dependency: predicate-argument relation. Syntactic subcategorization: a set of arguments that a predicate appears with. Bikel (2004) implementation: easily extendable. 26
Parser Implementation. Supervised training on annotated SAPTs is just frequency counting. Testing: a variant of the standard CKY chart-parsing algorithm. Details in the thesis. 27
Smoothing. Each label in a SAPT is the combination of a syntactic label and a semantic label, which increases data sparsity. Break the parameters down: P_h(H | P, w) = P_h(H_syn, H_sem | P, w) = P_h(H_syn | P, w) × P_h(H_sem | P, w, H_syn). 28
Experimental Corpora. CLang (Kate, Wong & Mooney, 2005): 300 pieces of coaching advice, 22.52 words per sentence. Geoquery (Zelle & Mooney, 1996): 880 queries on a geography database, 7.48 words per sentence; MRLs: Prolog and FunQL. 29
Prolog vs. FunQL (Wong, 2007). "What are the rivers in Texas?" Prolog (x1: river, x2: texas): answer(x1, (river(x1), loc(x1,x2), equal(x2,stateid(texas)))). FunQL: answer(river(loc_2(stateid(texas)))). Logical forms are widely used as MRLs in computational semantics and support reasoning. 30
Prolog vs. FunQL (Wong, 2007). "What are the rivers in Texas?"
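The smoothing factorization on slide 28 can be illustrated with relative-frequency estimates gathered from annotated SAPTs (slide 27 notes that supervised training is essentially frequency counting). The sketch below is a hypothetical simplification: the event tuples are invented for illustration, and the back-off and interpolation used in Collins-style parsers are omitted.

```python
# Hypothetical sketch: estimating the head-generation probability of the
# extended model as P(Hsyn | P, w) * P(Hsem | P, w, Hsyn) from SAPT counts.
from collections import Counter

syn_counts, syn_ctx = Counter(), Counter()   # counts for P(Hsyn | P, w)
sem_counts, sem_ctx = Counter(), Counter()   # counts for P(Hsem | P, w, Hsyn)

def observe(parent, head_word, h_syn, h_sem):
    """Record one head-generation event read off an annotated SAPT."""
    syn_counts[(h_syn, parent, head_word)] += 1
    syn_ctx[(parent, head_word)] += 1
    sem_counts[(h_sem, parent, head_word, h_syn)] += 1
    sem_ctx[(parent, head_word, h_syn)] += 1

def p_head(h_syn, h_sem, parent, head_word):
    """P(Hsyn, Hsem | P, w) estimated as P(Hsyn | P, w) * P(Hsem | P, w, Hsyn)."""
    p_syn = syn_counts[(h_syn, parent, head_word)] / max(1, syn_ctx[(parent, head_word)])
    p_sem = sem_counts[(h_sem, parent, head_word, h_syn)] / max(1, sem_ctx[(parent, head_word, h_syn)])
    return p_syn * p_sem

# One toy event from the running SAPT: parent S-P_BOWNER generates the head
# child VP-P_BOWNER over the head word "has".
observe(parent=("S", "P_BOWNER"), head_word="has", h_syn="VP", h_sem="P_BOWNER")
print(p_head("VP", "P_BOWNER", ("S", "P_BOWNER"), "has"))   # 1.0 with this single count
```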
Prolog (flexible order): answer(x1, (river(x1), loc(x1,x2), equal(x2,stateid(texas)))). FunQL (strict order): answer(river(loc_2(stateid(texas)))). Better generalization on Prolog. 31
Experimental Methodology. Standard 10-fold cross validation. Correctness: on CLang, the output exactly matches the correct MR; on Geoquery, the output retrieves the same answers as the correct MR. Metrics: precision, the percentage of returned MRs that are correct; recall, the percentage of NL sentences with their MRs correctly returned; F-measure, the harmonic mean of precision and recall. 32
Compared Systems. COCKTAIL (Tang & Mooney, 2001): deterministic, inductive logic programming. WASP (Wong & Mooney, 2006): semantic grammar, machine translation. KRISP (Kate & Mooney, 2006): semantic grammar, string kernels. Z&C (Zettlemoyer & Collins, 2007): syntax-based, combinatory categorial grammar (CCG). LU (Lu et al., 2008): semantic grammar, generative parsing model. 33
Compared Systems (continued). Z&C additionally uses a hand-built lexicon for Geoquery and manual CCG template rules. 34
Compared Systems (continued). λ-WASP: the extension of WASP handling logical forms. 35
Results on CLang (Precision / Recall / F-measure):
COCKTAIL: memory overflow
SCISSOR: 89.5 / 73.7 / 80.8
WASP: 88.9 / 61.9 / 73.0
KRISP: 85.2 / 61.9 / 71.7
Z&C: not reported
LU: 82.4 / 57.7 / 67.8
(LU: F-measure after reranking is 74.4%.) 36
Results on CLang (repeated without COCKTAIL and Z&C): SCISSOR 89.5 / 73.7 / 80.8; WASP 88.9 / 61.9 / 73.0; KRISP 85.2 / 61.9 / 71.7; LU 82.4 / 57.7 / 67.8. (LU: F-measure after reranking is 74.4%.) 37
Results on Geoquery (Precision / Recall / F-measure):
FunQL systems: SCISSOR 92.1 / 72.3 / 81.0; WASP 87.2 / 74.8 / 80.5; KRISP 93.3 / 71.7 / 81.1; LU 86.2 / 81.8 / 84.0
Prolog systems: COCKTAIL 89.9 / 79.4 / 84.3; λ-WASP 92.0 / 86.6 / 89.2; Z&C 95.5 / 83.2 / 88.9
(LU: F-measure after reranking is 85.2%.) 38
Results on Geoquery (FunQL): SCISSOR 92.1 / 72.3 / 81.0; WASP 87.2 / 74.8 / 80.5; KRISP 93.3 / 71.7 / 81.1; LU 86.2 / 81.8 / 84.0. The systems perform competitively. (LU: F-measure after reranking is 85.2%.) 39
Why Knowledge of Syntax does not Help. Geoquery: 7.48 words per sentence; with such short sentences, sentence structure can be feasibly learned from NLs paired with MRs. Gain from knowledge of syntax vs. flexibility loss. 40
Limitation of Using Prior Knowledge of Syntax. Traditional syntactic analysis of "What is the smallest state", MR: answer(smallest(state(all))). 41
Limitation of Using Prior Knowledge of Syntax. Traditional syntactic analysis vs. a semantic grammar for "What is the smallest state" / answer(smallest(state(all))): the semantic grammar yields a syntactic structure isomorphic with the MR, giving better generalization. 42
Why Prior Knowledge of Syntax does not Help. Geoquery: 7.48 words per sentence; with such short sentences, sentence structure can be feasibly learned from NLs paired with MRs. Gain from knowledge of syntax vs. flexibility loss.
LU vs. WASP and KRISP: a decomposed model for the semantic grammar. 43
Detailed CLang Results on Sentence Length (sentence-length bins: 0-10 (7%), 11-20 (33%), 21-30 (46%), 31-40 (13%)). 44
SCISSOR Summary. An integrated syntactic-semantic parsing approach. Learns accurate semantic interpretations by utilizing the SAPT annotations. Knowledge of syntax improves performance on long sentences. 45
Roadmap: SCISSOR, SYNSEM, Future Work, Conclusions. 46
SYNSEM Motivation. SCISSOR requires extra SAPT annotation for training, and must learn both syntax and semantics from the same limited training corpus. High-performance syntactic parsers are available that are trained on existing large corpora (Collins, 1997; Charniak & Johnson, 2005). 47
SCISSOR Requires SAPT Annotation (the SAPT for "our player 2 has the ball" shown earlier). Time consuming. Automate it! 48
Part I: Syntactic Parse. (S (NP (PRP$ our) (NN player) (CD 2)) (VP (VB has) (NP (DT the) (NN ball)))). Use a statistical syntactic parser. 49
Part II: Word Meanings. Align the words of "our player 2 has the ball" with the predicates P_OUR, P_PLAYER, P_UNUM, P_BOWNER ("the" and "ball" align to NULL). Use a word alignment model (Wong & Mooney, 2006). 50
Learning a Semantic Lexicon. IBM Model 5 word alignment (GIZA++); top 5 word/predicate alignments for each training example. Assume each word alignment and syntactic parse defines a possible SAPT for composing the correct MR. 51
Attach the word meanings to the syntactic parse: our → P_OUR, player → λa1λa2P_PLAYER, 2 → P_UNUM, has → λa1P_BOWNER, the/ball → NULL. Introduce λ-variables in semantic labels for missing arguments (a1: the first argument). 52
Part III: Internal Semantic Labels. The internal nodes of the parse (NP, VP, S) still need semantic labels. How to choose the dominant predicates? 53
Learning Semantic Composition Rules. λa1λa2P_PLAYER + P_UNUM ⇒ {λa1P_PLAYER, a2=c2} (c2: child 2). 54
Learning Semantic Composition Rules. The rule λa1λa2P_PLAYER + P_UNUM ⇒ {λa1P_PLAYER, a2=c2} labels the node over "player 2". 55
Learning Semantic Composition Rules. P_OUR + λa1P_PLAYER ⇒ {P_PLAYER, a1=c1}: the NP "our player 2" receives the label P_PLAYER. 56
Learning Semantic Composition Rules. With the VP over "has the ball" labeled λa1P_BOWNER, what is the label of the root S? 57
Learning Semantic Composition Rules. P_PLAYER + λa1P_BOWNER ⇒ {P_BOWNER, a1=c1}: the root S receives the label P_BOWNER. 58
Ensuring Meaning Composition. Example: "What is the smallest state" / answer(smallest(state(all))): non-isomorphism between the syntactic parse and the MR. 59
Ensuring Meaning Composition. Non-isomorphism between the NL parse and the MR parse arises from various linguistic phenomena and from the machine translation between NL and MRL. Use automated syntactic parses. Introduce macro-predicates that combine multiple predicates.
Ensure that the MR can be composed using a syntactic parse and a word alignment. 60
SYNSEM Overview. Before training and testing: the training/test sentence S is given to a syntactic parser to produce a syntactic parse tree T. Training: from the unambiguous CFG of the MRL and the training set {(S, T, MR)}, semantic knowledge acquisition produces a semantic lexicon and composition rules, and parameter estimation produces a probabilistic parsing model. Testing: semantic parsing maps an input sentence and its parse T to the output MR. 61
SYNSEM Overview (the same pipeline diagram, highlighting the input sentence S at testing time). 62
Parameter Estimation. Apply the learned semantic knowledge to all training examples to generate possible SAPTs. Use a standard maximum-entropy model similar to that of Zettlemoyer & Collins (2005) and Wong & Mooney (2006). Training finds parameters that (approximately) maximize the sum of the conditional log-likelihood of the training set, including syntactic parses. The data is incomplete since SAPTs are hidden variables. 63
Features. Lexical features: unigram features (the number of times a word is assigned a predicate) and bigram features (the number of times a word is assigned a predicate given its previous/subsequent word). Rule features: the number of times a composition rule is applied in a derivation. 64
Handling Logical Forms. "What are the rivers in Texas?" answer(x1, (river(x1), loc(x1,x2), equal(x2,stateid(texas)))) decomposes into λv1P_ANSWER(x1), λv1P_RIVER(x1), λv1λv2P_LOC(x1,x2), λv1P_EQUAL(x2). Handle shared logical variables using lambda calculus (v: variable). 65
Prolog Example. The same decomposition, with the conjuncts grouped under answer: λv1P_ANSWER(x1) (λv1P_RIVER(x1), λv1λv2P_LOC(x1,x2), λv1P_EQUAL(x2)). Handle shared logical variables using lambda calculus (v: variable). 66-67
Prolog Example. answer(x1, (river(x1), loc(x1,x2), equal(x2,stateid(texas)))). Start from a syntactic parse: (SBARQ (WHNP What) (SQ (VBP are) (NP (NP the rivers) (PP (IN in) (NP Texas))))). 68
Prolog Example. Add predicates to words: What → λv1λa1P_ANSWER, are/the → NULL, rivers → λv1P_RIVER, in → λv1λv2P_LOC, Texas → λv1P_EQUAL. 69
Prolog Example. Learn a rule with variable unification: λv1λv2P_LOC(x1,x2) + λv1P_EQUAL(x2) ⇒ λv1P_LOC, so the PP "in Texas" is labeled λv1P_LOC. 70
Experimental Results: CLang and Geoquery (Prolog). 71
Syntactic Parsers (Bikel, 2004). WSJ only: CLang (SYN0) F-measure = 82.15%; Geoquery (SYN0) F-measure = 76.44%. WSJ + in-domain sentences: CLang (SYN20, 20 sentences) F-measure = 88.21%; Geoquery (SYN40, 40 sentences) F-measure = 91.46%. Gold-standard syntactic parses (GOLDSYN). 72
Questions. Q1. Can SYNSEM produce accurate semantic interpretations? Q2. Can more accurate Treebank syntactic parsers produce more accurate semantic parsers? Q3. Does it also improve on long sentences? Q4. Does it improve on limited training data due to the prior knowledge from large treebanks? Q5. Can it handle syntactic errors? 73
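Before turning to the results, a minimal sketch of the composition rules from Part III (slides 54-58): a semantic label is a predicate with λ-bound open argument slots, and a learned rule records which child fills which slot. The MR class, slot names, and render helper below are hypothetical illustrations, and the additional logical-variable unification used for Prolog (slide 70) is not shown.

```python
# Hypothetical sketch of slot-filling composition rules (slides 54-58);
# names and data structures are illustrative, not the thesis implementation.
from dataclasses import dataclass, field

@dataclass
class MR:
    pred: str                                  # predicate, e.g. "player"
    args: dict = field(default_factory=dict)   # slot name -> child MR (None while open)

def fill(head: MR, child: MR, slot: str) -> MR:
    """One composition step: the child meaning fills an open slot of the head,
    mirroring rules like  λa1λa2 P_PLAYER + P_UNUM  =>  {λa1 P_PLAYER, a2 = c2}."""
    assert head.args.get(slot, None) is None, "slot already filled"
    new_args = dict(head.args)
    new_args[slot] = child
    return MR(head.pred, new_args)

def render(mr: MR) -> str:
    """Write out a fully composed MR, e.g. player(our, 2)."""
    if not mr.args:
        return mr.pred
    return f"{mr.pred}({', '.join(render(v) for _, v in sorted(mr.args.items()))})"

player = MR("player", {"a1": None, "a2": None})   # λa1 λa2 player
player = fill(player, MR("2"), "a2")              # child 2 fills slot a2
player = fill(player, MR("our"), "a1")            # child 1 fills slot a1
bowner = fill(MR("bowner", {"a1": None}), player, "a1")
print(render(bowner))                             # bowner(player(our, 2))
```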
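And a sketch of the feature counts described on slide 64 that feed the maximum-entropy model: unigram and bigram word/predicate features plus rule-application features. The feature encodings and the tiny example derivation are illustrative assumptions, not the thesis's actual feature templates; in a log-linear model, a candidate SAPT's score would be the dot product of these counts with the learned weights.

```python
# Hypothetical sketch of the lexical and rule features from slide 64.
from collections import Counter

def extract_features(word_preds, rules):
    """word_preds: list of (word, predicate) assignments in a derivation;
    rules: list of composition-rule identifiers applied in the derivation."""
    feats = Counter()
    for i, (word, pred) in enumerate(word_preds):
        feats[("unigram", word, pred)] += 1
        prev_w = word_preds[i - 1][0] if i > 0 else "<s>"
        next_w = word_preds[i + 1][0] if i + 1 < len(word_preds) else "</s>"
        feats[("bigram_prev", prev_w, word, pred)] += 1     # predicate given previous word
        feats[("bigram_next", word, next_w, pred)] += 1     # predicate given subsequent word
    for r in rules:
        feats[("rule", r)] += 1                             # composition-rule applications
    return feats

# "our player 2 has the ball" with the predicates from the running example.
assignment = [("our", "P_OUR"), ("player", "P_PLAYER"), ("2", "P_UNUM"),
              ("has", "P_BOWNER"), ("the", "NULL"), ("ball", "NULL")]
rules = ["P_PLAYER<-P_OUR+lam_a1_P_PLAYER", "P_BOWNER<-P_PLAYER+lam_a1_P_BOWNER"]
print(extract_features(assignment, rules).most_common(3))
```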
Results on CLang (Precision / Recall / F-measure):
SYNSEM variants: GOLDSYN 84.7 / 74.0 / 79.0; SYN20 85.4 / 70.0 / 76.9; SYN0 87.0 / 67.0 / 75.7
SAPT-trained: SCISSOR 89.5 / 73.7 / 80.8
Other systems: WASP 88.9 / 61.9 / 73.0; KRISP 85.2 / 61.9 / 71.7; LU 82.4 / 57.7 / 67.8
(LU: F-measure after reranking is 74.4%.) GOLDSYN > SYN20 > SYN0. 74
Questions. Q1. Can SYNSEM produce accurate semantic interpretations? [yes] Q2. Can more accurate Treebank syntactic parsers produce more accurate semantic parsers? [yes] Q3. Does it also improve on long sentences? 75
Detailed CLang Results on Sentence Length (bins: 0-10 (7%), 11-20 (33%), 21-30 (46%), 31-40 (13%)). Prior knowledge + flexibility loss + syntactic errors = ? 76
Questions. Q1 [yes], Q2 [yes]. Q3. Does it also improve on long sentences? [yes] Q4. Does it improve on limited training data due to the prior knowledge from large treebanks? 77
Results on CLang, training size = 40 (Precision / Recall / F-measure):
SYNSEM variants: GOLDSYN 61.1 / 35.7 / 45.1; SYN20 57.8 / 31.0 / 40.4; SYN0 53.5 / 22.7 / 31.9
SAPT-trained: SCISSOR 85.0 / 23.0 / 36.2
Other systems: WASP 88.0 / 14.4 / 24.7; KRISP 68.35 / 20.0 / 31.0
The quality of the syntactic parser is critically important! 78
Questions. Q1 [yes], Q2 [yes], Q3 [yes]. Q4. Does it improve on limited training data due to the prior knowledge from large treebanks? [yes] Q5. Can it handle syntactic errors? 79
Handling Syntactic Errors. Training ensures meaning composition from syntactic parses with errors. For test NLs that generate correct MRs, measure the F-measures of their syntactic parses: SYN0 85.5%, SYN20 91.2%. Example: "If DR2C7 is true then players 2, 3, 7 and 8 should pass to player 4". 80
Questions. Q1 [yes], Q2 [yes], Q3 [yes], Q4 [yes]. Q5. Is it robust to syntactic errors?
[yes] 81
Results on Geoquery, Prolog (Precision / Recall / F-measure):
SYNSEM variants: GOLDSYN 91.9 / 88.2 / 90.0; SYN40 90.2 / 86.9 / 88.5; SYN0 81.8 / 79.0 / 80.4
Other systems: COCKTAIL 89.9 / 79.4 / 84.3; λ-WASP 92.0 / 86.6 / 89.2; Z&C 95.5 / 83.2 / 88.9
SYN0 does not perform well; all other recent systems perform competitively. 82
SYNSEM Summary. Exploits an existing syntactic parser to drive the meaning composition process. Prior knowledge of syntax improves performance on long sentences and on limited training data. Handles syntactic errors. 83
Discriminative Reranking for Semantic Parsing. Adapt the global features used for reranking syntactic parses to semantic parsing. Improvement on CLang. No improvement on Geoquery, where sentences are short and global features are less likely to help. 84
Roadmap: SCISSOR, SYNSEM, Future Work, Conclusions. 85
Future Work. Improve SCISSOR: discriminative SCISSOR (Finkel et al., 2008); handling logical forms; SCISSOR without extra annotation (Klein and Manning, 2002, 2004). Improve SYNSEM: utilizing syntactic parsers with improved accuracy and in other syntactic formalisms. 86
Future Work. Utilizing wide-coverage semantic representations (Curran et al., 2007): better generalization over syntactic variations. Utilizing semantic role labeling (Gildea and Palmer, 2002): provides a layer of correlated semantic information. 87
Roadmap: SCISSOR, SYNSEM, Future Work, Conclusions. 88
Conclusions. SCISSOR: a novel integrated syntactic-semantic parser. SYNSEM: exploits an existing syntactic parser to produce disambiguated parse trees that drive the meaning composition process. Both produce accurate semantic interpretations. Using knowledge of syntax improves performance on long sentences, and SYNSEM also improves performance on limited training data. 89
Thank you! Questions? 90