Using String-Kernels for Learning Semantic Parsers
Rohit J. Kate and Raymond J. Mooney
Machine Learning Group, Department of Computer Sciences, University of Texas at Austin, USA

Semantic Parsing
• Semantic Parsing: transforming natural language (NL) sentences into computer-executable complete meaning representations (MRs) for some application
• Example application domains
  – CLang: RoboCup Coach Language
  – Geoquery: a database query application

CLang: RoboCup Coach Language
• In the RoboCup Coach competition, teams compete to coach simulated players [http://www.robocup.org]
• The coaching instructions are given in a formal language called CLang [Chen et al., 2003]
  NL: If the ball is in our goal area then player 1 should intercept it.
  Semantic Parsing ↓
  CLang: (bpos (goal-area our) (do our {1} intercept))
  [Figure: simulated soccer field]

Geoquery: A Database Query Application
• Query application for a U.S. geography database containing about 800 facts [Zelle & Mooney, 1996]
  NL: Which rivers run through the states bordering Texas?
  Semantic Parsing ↓
  Query: answer(traverse(next_to(stateid('texas'))))
  Answer: Arkansas, Canadian, Cimarron, Gila, Mississippi, Rio Grande …

Learning Semantic Parsers
• We assume meaning representation languages (MRLs) have deterministic context-free grammars
  – True for almost all computer languages
  – MRs can be parsed unambiguously

NL: Which rivers run through the states bordering Texas?
MR: answer(traverse(next_to(stateid('texas'))))
Parse tree of the MR:
  ANSWER
    answer
    RIVER
      TRAVERSE
        traverse
      STATE
        NEXT_TO
          next_to
        STATE
          STATEID
            stateid 'texas'
Non-terminals: ANSWER, RIVER, TRAVERSE, STATE, NEXT_TO, STATEID
Terminals: answer, traverse, next_to, stateid, 'texas'
Productions: ANSWER → answer(RIVER), RIVER → TRAVERSE(STATE), STATE → NEXT_TO(STATE), TRAVERSE → traverse, NEXT_TO → next_to, STATEID → 'texas'

Learning Semantic Parsers
• Assume meaning representation languages (MRLs) have deterministic context-free grammars
  – True for almost all computer languages
  – MRs can be parsed unambiguously
• Training data consists of NL sentences paired with their MRs
• Induce a semantic parser which can map novel NL sentences to their correct MRs
• The learning problem differs from that of syntactic parsing, where the training data has trees annotated over the NL sentences

KRISP: Kernel-based Robust Interpretation for Semantic Parsing
• Learns a semantic parser from NL sentences paired with their respective MRs, given the MRL grammar
• Productions of the MRL are treated like semantic concepts
• An SVM classifier with a string subsequence kernel is trained for each production to identify whether an NL substring represents that semantic concept
• These classifiers are used to compositionally build the MRs of sentences

Overview of KRISP
[Flow diagram]
Training: MRL grammar + NL sentences with MRs → collect positive and negative examples → train string-kernel-based SVM classifiers → semantic parser; the parser's best MRs (correct and incorrect) feed back into example collection
Testing: novel NL sentences → semantic parser → best MRs

KRISP's Semantic Parsing
• We first define the semantic derivation of an NL sentence
• We next define the probability of a semantic derivation
• Semantic parsing of an NL sentence involves finding its most probable semantic derivation
• It is straightforward to obtain the MR from a semantic derivation
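Since the MRL grammar is deterministic, every MR parses unambiguously. As a minimal illustrative sketch (Python, not from the paper), the example MR parse tree above can be written as nested (production, children) pairs, with a helper that lists the productions it uses, which is the information KRISP's training algorithm later needs to decide which sentences are initial positives for each production.

```python
# A minimal sketch: the MR parse tree from the slide, as nested (production, children)
# pairs, and a helper that collects the productions it uses.
MR_PARSE = ("ANSWER -> answer(RIVER)", [
    ("RIVER -> TRAVERSE(STATE)", [
        ("TRAVERSE -> traverse", []),
        ("STATE -> NEXT_TO(STATE)", [
            ("NEXT_TO -> next_to", []),
            ("STATE -> STATEID", [
                ("STATEID -> 'texas'", []),
            ]),
        ]),
    ]),
])

def productions_used(node):
    """Return the list of productions in an MR parse tree (pre-order)."""
    production, children = node
    result = [production]
    for child in children:
        result.extend(productions_used(child))
    return result

print(productions_used(MR_PARSE))
```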
Semantic Derivation of an NL Sentence
• Start from the MR parse, first with non-terminals on the nodes and then with productions on the nodes, placed over the NL sentence "Which rivers run through the states bordering Texas?"
• In a semantic derivation, each node covers an NL substring: each node contains a production and the substring of the NL sentence it covers (word positions in brackets):

  (ANSWER → answer(RIVER), [1..9])
    (RIVER → TRAVERSE(STATE), [1..9])
      (TRAVERSE → traverse, [1..4])
      (STATE → NEXT_TO(STATE), [5..9])
        (NEXT_TO → next_to, [5..7])
        (STATE → STATEID, [8..9])
          (STATEID → 'texas', [8..9])

  Which(1) rivers(2) run(3) through(4) the(5) states(6) bordering(7) Texas(8) ?(9)

• The covered substrings may occur in a different order in the NL sentence, e.g. "Through the states that border Texas which rivers run?"
• Nodes are therefore allowed to permute the children productions of the original MR parse:

  (ANSWER → answer(RIVER), [1..10])
    (RIVER → TRAVERSE(STATE), [1..10])
      (STATE → NEXT_TO(STATE), [1..6])
        (NEXT_TO → next_to, [1..5])
        (STATE → STATEID, [6..6])
          (STATEID → 'texas', [6..6])
      (TRAVERSE → traverse, [7..10])

  Through(1) the(2) states(3) that(4) border(5) Texas(6) which(7) rivers(8) run(9) ?(10)

Probability of a Semantic Derivation
• Let Pπ(s[i..j]) be the probability that production π covers the substring s[i..j] of sentence s
• For example, P_{NEXT_TO → next_to}("the states bordering") = 0.99 for the node (NEXT_TO → next_to, [5..7]) over "the(5) states(6) bordering(7)"
• These probabilities are obtained from the string-kernel-based SVM classifiers trained for each production π
• Assuming independence, the probability of a semantic derivation D is:

  P(D) = ∏_{(π,[i..j]) ∈ D} Pπ(s[i..j])

• For the derivation of "Which rivers run through the states bordering Texas?" shown above, with node probabilities

  (ANSWER → answer(RIVER), [1..9])     0.98
  (RIVER → TRAVERSE(STATE), [1..9])    0.90
  (TRAVERSE → traverse, [1..4])        0.95
  (STATE → NEXT_TO(STATE), [5..9])     0.89
  (NEXT_TO → next_to, [5..7])          0.99
  (STATE → STATEID, [8..9])            0.93
  (STATEID → 'texas', [8..9])          0.98

  P(D) = ∏_{(π,[i..j]) ∈ D} Pπ(s[i..j]) = 0.673
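To make the independence assumption concrete, here is a small sketch (hypothetical Python, using the node probabilities shown above; in KRISP these come from the string-kernel SVM classifiers) that stores a derivation as (production, span) nodes and multiplies their probabilities.

```python
from math import prod

# A semantic derivation as a set of (production, span) nodes with the classifier
# probabilities P_pi(s[i..j]) from the slide.
derivation = {
    ("ANSWER -> answer(RIVER)",  (1, 9)): 0.98,
    ("RIVER -> TRAVERSE(STATE)", (1, 9)): 0.90,
    ("TRAVERSE -> traverse",     (1, 4)): 0.95,
    ("STATE -> NEXT_TO(STATE)",  (5, 9)): 0.89,
    ("NEXT_TO -> next_to",       (5, 7)): 0.99,
    ("STATE -> STATEID",         (8, 9)): 0.93,
    ("STATEID -> 'texas'",       (8, 9)): 0.98,
}

def derivation_probability(derivation):
    """P(D) = product of P_pi(s[i..j]) over all (production, span) nodes in D."""
    return prod(derivation.values())

print(round(derivation_probability(derivation), 3))   # 0.673
```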
Computing the Most Probable Semantic Derivation
• The task of semantic parsing is to find the most probable semantic derivation of the NL sentence, given all the probabilities Pπ(s[i..j])
• Implemented by extending Earley's [1970] context-free grammar parsing algorithm
• Resembles PCFG parsing, but differs because:
  – The probability of a production depends on which substring of the sentence it covers
  – The leaves are not terminals but substrings of words
• The parser does a greedy approximation search with beam width ω = 20 and returns the ω most probable derivations it finds
• It uses a threshold θ = 0.05 to prune low-probability trees

Overview of KRISP
[Flow diagram, as before, now showing the classifier outputs Pπ(s[i..j]) feeding the semantic parser and the best semantic derivations (correct and incorrect) feeding back into example collection]

KRISP's Training Algorithm
• Takes NL sentences paired with their respective MRs as input
• Obtains the MR parses
• Induces the semantic parser and refines it over iterations
• In the first iteration, for every production π:
  – Call those sentences positives whose MR parses use that production
  – Call the remaining sentences negatives

KRISP's Training Algorithm contd.
First iteration, production STATE → NEXT_TO(STATE):
Positives
  • which rivers run through the states bordering texas ?
  • what is the most populated state bordering oklahoma ?
  • what is the largest city in states that border california ?
  …
Negatives
  • what state has the highest population ?
  • which states have cities named austin ?
  • what states does the delaware river run through ?
  • what is the lowest point of the state with the largest area ?
  …
These examples are fed to a string-kernel-based SVM classifier.

String Subsequence Kernel
• Define the kernel between two strings as the number of common subsequences between them [Lodhi et al., 2002]
  s = "states that are next to"
  t = "the states next to"
• Counting the common subsequences u = "states", "next", "to", "states next", … gives K(s,t) = 7
• The kernel is normalized to remove any bias due to different string lengths:

  K_normalized(s,t) = K(s,t) / √(K(s,s) · K(t,t))

• Lodhi et al. [2002] give an O(n|s||t|) algorithm for computing the string subsequence kernel
• It has been used for text categorization [Lodhi et al., 2002] and information extraction [Bunescu & Mooney, 2005]; a small sketch of the kernel follows below
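A brute-force sketch of the (unweighted) subsequence kernel from these slides: it enumerates all word subsequences of each string, counts the common ones, and applies the length normalization. The enumeration is exponential and only for illustration; Lodhi et al. [2002] give the O(n|s||t|) dynamic program, and the gap-penalized variant is described in the Extra slides at the end.

```python
from itertools import combinations

def subsequences(tokens):
    """All non-empty (not necessarily contiguous) subsequences of a token list."""
    subs = set()
    for r in range(1, len(tokens) + 1):
        for idx in combinations(range(len(tokens)), r):
            subs.add(tuple(tokens[i] for i in idx))
    return subs

def subsequence_kernel(s, t):
    """Number of common word subsequences, as on the slides (brute force)."""
    return len(subsequences(s.split()) & subsequences(t.split()))

def normalized_kernel(s, t):
    """Length-normalized version: K(s,t) / sqrt(K(s,s) * K(t,t))."""
    return subsequence_kernel(s, t) / (subsequence_kernel(s, s) * subsequence_kernel(t, t)) ** 0.5

s = "states that are next to"
t = "the states next to"
print(subsequence_kernel(s, t))             # 7, as on the slide
print(round(normalized_kernel(s, t), 3))    # about 0.325
```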
String Subsequence Kernel contd.
• The examples are implicitly mapped to the feature space of all subsequences, and the kernel computes the dot products there
  [Figure: NL substrings such as "the states next to", "states that border", "states bordering" and "states that share border" fall close together in the implicit subsequence feature space, apart from substrings such as "state with the capital of", "states with area larger than" and "states through which"]

Support Vector Machines
• SVMs find a separating hyperplane such that the margin is maximized
  [Figure: the same substrings separated by a hyperplane; the test substring "the states next to" falls on the positive side with probability estimate 0.97]
• The probability estimate of an example belonging to a class can be obtained from its distance to the hyperplane [Platt, 1999]

KRISP's Training Algorithm contd.
First iteration, production STATE → NEXT_TO(STATE): the positive and negative sentences listed earlier are used to train a string-kernel-based SVM classifier, which then provides P_{STATE → NEXT_TO(STATE)}(s[i..j]).
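A sketch of how such a production classifier could be trained with an off-the-shelf SVM over a precomputed string kernel, with Platt-scaled probability estimates. This uses scikit-learn and the brute-force normalized kernel from the earlier sketch; the toy substrings are adapted from the slides, and none of this is KRISP's own implementation.

```python
import numpy as np
from itertools import combinations
from sklearn.svm import SVC

def subseqs(s):
    """All non-empty word subsequences of a string."""
    toks = s.split()
    return {tuple(toks[i] for i in idx)
            for r in range(1, len(toks) + 1)
            for idx in combinations(range(len(toks)), r)}

def kernel(s, t):
    """Normalized subsequence kernel (brute force, illustration only)."""
    ks, kt = subseqs(s), subseqs(t)
    return len(ks & kt) / (len(ks) * len(kt)) ** 0.5

# Toy substrings (adapted from the slides) for the production STATE -> NEXT_TO(STATE).
positives = ["the states bordering texas", "state bordering oklahoma",
             "states that border california", "states which share border",
             "next to state of iowa"]
negatives = ["what state has the highest population", "which states have cities named austin",
             "what is the lowest point of the state", "what is the largest city in",
             "what states does the delaware river run through"]
X = positives + negatives
y = [1] * len(positives) + [0] * len(negatives)

gram = np.array([[kernel(a, b) for b in X] for a in X])         # training Gram matrix
clf = SVC(kernel="precomputed", probability=True).fit(gram, y)  # Platt-scaled probabilities

test = ["the states next to"]
gram_test = np.array([[kernel(a, b) for b in X] for a in test])  # rows: test, columns: train
print(clf.predict_proba(gram_test)[0, 1])   # estimated P(substring expresses STATE -> NEXT_TO(STATE))
```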
Overview of KRISP
[Flow diagram repeated: the trained classifiers Pπ(s[i..j]) drive the semantic parser, whose best semantic derivations (correct and incorrect) are used to collect new positive and negative examples]

KRISP's Training Algorithm contd.
• Using these classifiers Pπ(s[i..j]), obtain the ω best semantic derivations of each training sentence
• Some of these derivations give the correct MR (correct derivations); others give incorrect MRs (incorrect derivations)
• For the next iteration, collect positives from the most probable correct derivation
• The extended Earley's algorithm can be forced to derive only correct derivations by making sure all subtrees it generates exist in the correct MR parse
• Collect negatives from incorrect derivations with higher probability than the most probable correct derivation

KRISP's Training Algorithm contd.
Both derivations below are over the sentence Which(1) rivers(2) run(3) through(4) the(5) states(6) bordering(7) Texas(8) ?(9)

Most probable correct derivation (positive examples are collected from its nodes):

  (ANSWER → answer(RIVER), [1..9])
    (RIVER → TRAVERSE(STATE), [1..9])
      (TRAVERSE → traverse, [1..4])
      (STATE → NEXT_TO(STATE), [5..9])
        (NEXT_TO → next_to, [5..7])
        (STATE → STATEID, [8..9])
          (STATEID → 'texas', [8..9])

Incorrect derivation with probability greater than the most probable correct derivation (negative examples are collected from it):

  (ANSWER → answer(RIVER), [1..9])
    (RIVER → TRAVERSE(STATE), [1..9])
      (TRAVERSE → traverse, [1..7])
      (STATE → STATEID, [8..9])
        (STATEID → 'texas', [8..9])

  Incorrect MR: answer(traverse(stateid('texas')))

KRISP's Training Algorithm contd.
To collect the negative examples, the two derivations are compared (a sketch of this step follows the list below):
• Traverse both trees in breadth-first order until the first nodes where their productions differ are found
• Mark the words under these nodes
• Consider all the productions covering the marked words; collect negatives for productions which cover any marked word in the incorrect derivation but not in the correct derivation
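A minimal sketch of this negative-collection step, using hypothetical (production, span, children) trees for the two derivations above. Pairing nodes by zipping the two breadth-first traversals is a simplification, but it reproduces the slide's example: the negative collected is TRAVERSE → traverse over words [1..7], "which rivers run through the states bordering".

```python
from collections import deque

def nodes_bfs(tree):
    """Yield (production, span) for every node of a (production, span, children) tree."""
    queue = deque([tree])
    while queue:
        production, span, children = queue.popleft()
        yield production, span
        queue.extend(children)

def collect_negatives(correct, incorrect):
    correct_nodes = list(nodes_bfs(correct))
    incorrect_nodes = list(nodes_bfs(incorrect))
    marked = set()
    # First position (simplified pairing of the two BFS orders) where the productions differ.
    for (p1, s1), (p2, s2) in zip(correct_nodes, incorrect_nodes):
        if p1 != p2:
            marked.update(range(s1[0], s1[1] + 1))
            marked.update(range(s2[0], s2[1] + 1))
            break
    # Negatives: incorrect-derivation nodes covering a marked word and absent from the correct one.
    return [(p, s) for p, s in incorrect_nodes
            if any(s[0] <= w <= s[1] for w in marked) and (p, s) not in correct_nodes]

correct = ("ANSWER -> answer(RIVER)", (1, 9), [
    ("RIVER -> TRAVERSE(STATE)", (1, 9), [
        ("TRAVERSE -> traverse", (1, 4), []),
        ("STATE -> NEXT_TO(STATE)", (5, 9), [
            ("NEXT_TO -> next_to", (5, 7), []),
            ("STATE -> STATEID", (8, 9), [("STATEID -> 'texas'", (8, 9), [])])])])])

incorrect = ("ANSWER -> answer(RIVER)", (1, 9), [
    ("RIVER -> TRAVERSE(STATE)", (1, 9), [
        ("TRAVERSE -> traverse", (1, 7), []),
        ("STATE -> STATEID", (8, 9), [("STATEID -> 'texas'", (8, 9), [])])])])

print(collect_negatives(correct, incorrect))
# [('TRAVERSE -> traverse', (1, 7))], i.e. "which rivers run through the states bordering"
```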
KRISP's Training Algorithm contd.
Next iteration: more refined positive and negative examples for the production STATE → NEXT_TO(STATE):
Positives
  • the states bordering texas ?
  • state bordering oklahoma ?
  • states that border california ?
  • states which share border
  • next to state of iowa
  …
Negatives
  • what state has the highest population ?
  • what states does the delaware river run through ?
  • which states have cities named austin ?
  • what is the lowest point of the state with the largest area ?
  • which rivers run through states bordering
  …
These are used to retrain the string-kernel-based SVM classifier that gives P_{STATE → NEXT_TO(STATE)}(s[i..j]); the overall training loop is sketched below.

Overview of KRISP
[Flow diagram repeated: training alternates between collecting examples from the best semantic derivations and retraining the classifiers; at test time, novel NL sentences are mapped to their best MRs by the semantic parser]
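Putting the pieces together, an outline (not the authors' code) of the iterative training procedure described on the preceding slides. train_classifier, best_derivations, mr_of and probability are placeholder callables standing in for the SVM training and the extended Earley search; productions_used, nodes_bfs and collect_negatives refer to the earlier sketches.

```python
def substring(sentence, span):
    """Return the words of the sentence covered by a [i..j] span (1-based, inclusive)."""
    i, j = span
    return " ".join(sentence.split()[i - 1:j])

def train_krisp(data, productions, train_classifier, best_derivations,
                mr_of, probability, iterations=3, omega=20):
    # First iteration: a sentence is a positive for a production iff its MR parse uses it.
    positives = {p: [s for s, mr in data if p in productions_used(mr)] for p in productions}
    negatives = {p: [s for s, mr in data if p not in productions_used(mr)] for p in productions}
    classifiers = {}
    for _ in range(iterations):
        classifiers = {p: train_classifier(positives[p], negatives[p]) for p in productions}
        positives = {p: [] for p in productions}
        negatives = {p: [] for p in productions}
        for sentence, mr in data:
            derivations = best_derivations(sentence, classifiers, beam=omega)
            correct = [d for d in derivations if mr_of(d) == mr]
            if not correct:
                continue
            best_correct = max(correct, key=probability)
            # Refined positives: substrings covered by the most probable correct derivation.
            for production, span in nodes_bfs(best_correct):
                positives[production].append(substring(sentence, span))
            # Refined negatives: collected from incorrect derivations that outscore it.
            for d in derivations:
                if mr_of(d) != mr and probability(d) > probability(best_correct):
                    for production, span in collect_negatives(best_correct, d):
                        negatives[production].append(substring(sentence, span))
    return classifiers
```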
Experimental Corpora
• CLang [Kate, Wong & Mooney, 2005]
  – 300 randomly selected pieces of coaching advice from the log files of the 2003 RoboCup Coach Competition
  – 22.52 words on average in the NL sentences
  – 13.42 tokens on average in the MRs
• Geoquery [Tang & Mooney, 2001]
  – 880 queries for the given U.S. geography database
  – 7.48 words on average in the NL sentences
  – 6.47 tokens on average in the MRs

Experimental Methodology
• Evaluated using standard 10-fold cross validation
• Correctness
  – CLang: the output exactly matches the correct representation
  – Geoquery: the resulting query retrieves the same answer as the correct representation
• Metrics
  – Precision = (number of correct MRs) / (number of test sentences with complete output MRs)
  – Recall = (number of correct MRs) / (number of test sentences)

Experimental Methodology contd.
• Compared systems:
  – CHILL [Tang & Mooney, 2001]: Inductive Logic Programming based semantic parser
  – SILT [Kate, Wong & Mooney, 2005]: learns transformation rules relating NL sentences to MR expressions
  – SCISSOR [Ge & Mooney, 2005]: learns an integrated syntactic-semantic parser; needs extra annotations
  – WASP [Wong & Mooney, 2006]: uses statistical machine translation techniques
  – Zettlemoyer & Collins (2005): CCG-based semantic parser
    • Different experimental setup (600 training, 280 test examples)
    • Results available only for the Geoquery corpus

Experimental Methodology contd.
• KRISP gives probabilities for its semantic derivations, which are taken as confidences of the MRs
• We plot precision-recall curves by first sorting the best MR for each sentence by confidence and then finding precision at every recall value
• WASP and SCISSOR also output confidences, so we show their precision-recall curves
• Results of the other systems are shown as points on the precision-recall graphs

Results on CLang
[Precision-recall curves for the compared systems; the annotation notes that SCISSOR requires more annotation on the training corpus; CHILL gives 49.2% precision and 12.67% recall with 160 training examples and cannot be run beyond that]

Results on Geoquery
[Precision-recall curves for the compared systems]

Experiments with Noisy NL Sentences
• Any application of a semantic parser is likely to face noise in the input
• If the input is coming from a speech recognizer:
  – Interjections (um's and ah's)
  – Environment noise (door slams, phone rings, etc.)
  – Out-of-domain words, ill-formed utterances, etc.
• KRISP does not use hard-matching rules, unlike other systems, and is hence more robust to noise
• We show this by introducing simulated speech recognition errors into the corpus

Experiments with Noisy NL Sentences contd.
• Interjections, environment noise, etc. are likely to be recognized as real words; we simulate this by adding a word with probability Padd after every word
  – An extra word w is added with probability P(w) proportional to its frequency in the BNC
• A speech recognizer may completely fail to detect a word, so with probability Pdrop a word is dropped
  – Example: "you If the ball is in our goal area then our player 1 should intercept it."
• A speech recognizer may confuse a word with a high-frequency, phonetically close word; a word is substituted by another word w with probability p^ed(w) · P(w)
  – where p is a parameter in [0,1]
  – ed(w) is w's edit distance from the original word [Levenshtein, 1966]
  – P(w) is w's probability, proportional to its frequency in the BNC
  – Example: "you when If the ball is in our goal area then our 1 should intercept it."
• Four noise levels were created by:
  – Varying the parameters Padd and Pdrop from 0 at level zero to 0.1 at level four
  – Varying the parameter p from 0 at level zero to 0.01 at level four
• Results are shown when only the test sentences are corrupted; results are qualitatively similar when both test and training sentences are corrupted
• We report the best F-measure (harmonic mean of precision and recall); a sketch of the noise simulation follows below
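A sketch of the simulated speech-recognition noise just described, under one simple reading of the substitution model; the unigram table is a tiny hypothetical stand-in for the BNC frequencies used in the experiments, and the default parameters correspond to the highest noise level on the slides.

```python
import random

def edit_distance(a, b):
    """Levenshtein distance between two words (standard dynamic program)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

# Hypothetical unigram distribution standing in for BNC frequencies; P(w) ~ frequency.
UNIGRAMS = {"the": 0.30, "you": 0.20, "when": 0.15, "in": 0.15, "ball": 0.10, "are": 0.10}

def corrupt(sentence, p_add=0.1, p_drop=0.1, p=0.01):
    """Simulate recognizer noise: word insertions, deletions and substitutions."""
    words, vocab = sentence.split(), list(UNIGRAMS)
    weights = [UNIGRAMS[w] for w in vocab]
    out = []
    for word in words:
        # Deletion: the recognizer fails to detect the word.
        if random.random() < p_drop:
            continue
        # Substitution: each candidate w replaces the word with probability p**ed(w) * P(w).
        r, kept = random.random(), word
        for w in vocab:
            if w == word:
                continue
            r -= (p ** edit_distance(word, w)) * UNIGRAMS[w]
            if r < 0:
                kept = w
                break
        out.append(kept)
        # Insertion: an extra word after this one with probability p_add, drawn by frequency.
        if random.random() < p_add:
            out.append(random.choices(vocab, weights=weights)[0])
    return " ".join(out)

random.seed(0)
print(corrupt("if the ball is in our goal area then our player 1 should intercept it"))
```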
Results on Noisy CLang Corpus
[Best F-measure at increasing noise levels]

Conclusions
• KRISP: a new string-kernel-based approach for learning semantic parsers
• String-kernel-based SVM classifiers are trained for each MRL production
• The classifiers are used to compositionally build complete MRs of NL sentences
• Evaluated on two real-world corpora
  – Performs better than rule-based systems
  – Performs comparably to other statistical systems
  – More robust to noise

Thank You!
Our corpora can be downloaded from: http://www.cs.utexas.edu/~ml/nldata.html
Check out our online demo for Geoquery at: http://www.cs.utexas.edu/~ml/geo.html
Questions?

Extra: Experiments with Other Natural Languages

Extra: Dealing with Constants
• The MRL grammar may contain productions corresponding to constants in the domain:
  STATEID → 'new york'   RIVERID → 'colorado'   NUM → '2'   STRING → 'DR4C10'
• The user can specify these as constant productions, giving their NL substrings
• Classifiers are not learned for these productions
• A matching substring's probability is taken as 1
• If n constant productions share the same substring, then each gets probability 1/n, e.g.:
  STATEID → 'colorado'   RIVERID → 'colorado'

Extra: String Subsequence Kernel
• Subsequences with gaps should be downweighted
• A decay factor λ in the range (0,1] penalizes gaps
• All subsequences are the implicit features, and the penalties are the feature values
  s = "left side of our penalty area"
  t = "our left penalty area"
• For u = "left penalty": a gap of 3 in s contributes λ³ and a gap of 0 in t contributes λ⁰, so u adds λ³ · λ⁰ to the kernel
• Summing over all common subsequences gives K(s,t) = 4 + 3λ + 3λ³ + λ⁵ (a brute-force sketch of this gap-weighted kernel follows below)
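A brute-force sketch of this gap-weighted subsequence kernel: each occurrence of a subsequence is weighted by λ raised to the number of gaps in that occurrence, matching the λ³ · λ⁰ example above. The enumeration is exponential and only illustrative; the efficient computation is the dynamic program of Lodhi et al. [2002].

```python
from itertools import combinations

def weighted_occurrences(tokens, lam):
    """Map each subsequence to the sum of lam**gaps over its occurrences in the token list."""
    feats = {}
    for r in range(1, len(tokens) + 1):
        for idx in combinations(range(len(tokens)), r):
            u = tuple(tokens[i] for i in idx)
            gaps = (idx[-1] - idx[0] + 1) - r          # skipped positions inside the window
            feats[u] = feats.get(u, 0.0) + lam ** gaps
    return feats

def gap_weighted_kernel(s, t, lam=0.5):
    """Sum over common subsequences of the product of their gap-penalized feature values."""
    fs = weighted_occurrences(s.split(), lam)
    ft = weighted_occurrences(t.split(), lam)
    return sum(fs[u] * ft[u] for u in fs.keys() & ft.keys())

s, t = "left side of our penalty area", "our left penalty area"
# The subsequence "left penalty" has a gap of 3 in s and a gap of 0 in t,
# so it contributes lam**3 * lam**0, as on the slide.
print(gap_weighted_kernel(s, t, lam=0.5))
```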
Extra: KRISP's Average Running Times
Average running times per fold, in minutes, taken by KRISP:
  Corpus    Average training time    Average testing time
  Geo250    1.44                     0.05
  Geo880    18.1                     0.65
  CLang     58.85                    3.18

Extra: Experimental Methodology
• Correctness
  – CLang: the output exactly matches the correct representation
  – Geoquery: the resulting query retrieves the same answer as the correct representation
• Example: "If the ball is in our penalty area, all our players except player 4 should stay in our half."
  Correct: ((bpos (penalty-area our)) (do (player-except our{4}) (pos (half our))))
  Output:  ((bpos (penalty-area opp)) (do (player-except our{4}) (pos (half our))))
  The output does not exactly match the correct representation, so it is counted as incorrect.

Extra: Computing the Most Probable Semantic Derivation
• The task of semantic parsing is to find the most probable semantic derivation of the NL sentence
• Let E_{n,s[i..j]}, a partial derivation, denote any subtree of a derivation tree with n as the LHS non-terminal of the root production, covering sentence s from index i to j
• Example of E_{STATE,s[5..9]} over "the(5) states(6) bordering(7) Texas(8) ?(9)":

  (STATE → NEXT_TO(STATE), [5..9])
    (NEXT_TO → next_to, [5..7])
    (STATE → STATEID, [8..9])
      (STATEID → 'texas', [8..9])

• A derivation D is then E_{ANSWER,s[1..|s|]}

Extra: Computing the Most Probable Semantic Derivation contd.
• Let E*_{STATE,s[5..9]} denote the most probable partial derivation among all E_{STATE,s[5..9]}
• It is computed recursively, by considering every production π = n → n1…nt in G and every way of partitioning s[i..j] among the children; for E*_{STATE,s[5..9]} with STATE → NEXT_TO(STATE), the candidate splits are E*_{NEXT_TO,s[5..5]} with E*_{STATE,s[6..9]}, E*_{NEXT_TO,s[5..6]} with E*_{STATE,s[7..9]}, E*_{NEXT_TO,s[5..7]} with E*_{STATE,s[8..9]}, and E*_{NEXT_TO,s[5..8]} with E*_{STATE,s[9..9]}
• In general:

  E*_{n,s[i..j]} = makeTree( argmax_{π = n → n1…nt ∈ G, (p1,…,pt) ∈ partition(s[i..j],t)} Pπ(s[i..j]) · ∏_{k=1..t} P(E*_{nk,pk}) )
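A minimal sketch of this recursion on a toy version of the Geoquery grammar, with the classifier outputs Pπ(s[i..j]) hard-coded to the values from the earlier probability slide and everything else given a small default. Child reordering, the beam, and the pruning threshold of the real extended Earley parser are omitted here.

```python
from functools import lru_cache
from itertools import combinations

GRAMMAR = {   # LHS non-terminal -> list of (production, RHS non-terminals)
    "ANSWER":   [("ANSWER -> answer(RIVER)",  ["RIVER"])],
    "RIVER":    [("RIVER -> TRAVERSE(STATE)", ["TRAVERSE", "STATE"])],
    "STATE":    [("STATE -> NEXT_TO(STATE)",  ["NEXT_TO", "STATE"]),
                 ("STATE -> STATEID",         ["STATEID"])],
    "TRAVERSE": [("TRAVERSE -> traverse",     [])],
    "NEXT_TO":  [("NEXT_TO -> next_to",       [])],
    "STATEID":  [("STATEID -> 'texas'",       [])],
}
# Hand-set classifier outputs from the probability slide; anything else gets a small default.
P = {("ANSWER -> answer(RIVER)", (1, 9)): 0.98, ("RIVER -> TRAVERSE(STATE)", (1, 9)): 0.90,
     ("TRAVERSE -> traverse", (1, 4)): 0.95, ("STATE -> NEXT_TO(STATE)", (5, 9)): 0.89,
     ("NEXT_TO -> next_to", (5, 7)): 0.99, ("STATE -> STATEID", (8, 9)): 0.93,
     ("STATEID -> 'texas'", (8, 9)): 0.98}

def p_pi(production, span):
    return P.get((production, span), 0.01)

def partitions(i, j, t):
    """All ways to split word span [i..j] into t contiguous, non-empty, in-order spans."""
    for cuts in combinations(range(i, j), t - 1):
        bounds = (i - 1,) + cuts + (j,)
        yield [(bounds[k] + 1, bounds[k + 1]) for k in range(t)]

@lru_cache(maxsize=None)
def best_partial(n, i, j):
    """(probability, tree) of the most probable partial derivation E*_{n, s[i..j]}."""
    best = (0.0, None)
    for production, rhs in GRAMMAR[n]:
        if not rhs:                                   # production with no non-terminal children
            cand = (p_pi(production, (i, j)), (production, (i, j), []))
            best = max(best, cand, key=lambda x: x[0])
            continue
        for spans in partitions(i, j, len(rhs)):
            prob, children = p_pi(production, (i, j)), []
            for nk, span in zip(rhs, spans):
                child_prob, child_tree = best_partial(nk, *span)
                prob *= child_prob
                children.append(child_tree)
            best = max(best, (prob, (production, (i, j), children)), key=lambda x: x[0])
    return best

prob, tree = best_partial("ANSWER", 1, 9)   # the whole sentence, word positions 1..9
print(round(prob, 3))                       # 0.673, the derivation from the slides
```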