Using String-Kernels for
Learning Semantic Parsers
Rohit J. Kate Raymond J. Mooney
Machine Learning Group
Department of Computer Sciences
University of Texas at Austin
USA
Semantic Parsing
• Semantic Parsing: Transforming natural language
(NL) sentences into computer executable complete
meaning representations (MRs) for some
application
• Example application domains
– CLang: Robocup Coach Language
– Geoquery: A Database Query Application
2
CLang: RoboCup Coach Language
• In RoboCup Coach competition teams compete to
coach simulated players [http://www.robocup.org]
• The coaching instructions are given in a formal
language called CLang [Chen et al. 2003]
NL: “If the ball is in our goal area then player 1 should intercept it.”
    ↓ Semantic Parsing
CLang: (bpos (goal-area our) (do our {1} intercept))
[Figure: simulated soccer field]
3
Geoquery: A Database Query Application
• Query application for U.S. geography database
containing about 800 facts [Zelle & Mooney, 1996]
NL: “Which rivers run through the states bordering Texas?”
    ↓ Semantic Parsing
Query: answer(traverse(next_to(stateid(‘texas’))))
Answer: Arkansas, Canadian, Cimarron, Gila, Mississippi, Rio Grande …
4
Learning Semantic Parsers
• We assume meaning representation languages
(MRLs) have deterministic context free grammars
– True for almost all computer languages
– MRs can be parsed unambiguously
5
NL: Which rivers run through the states bordering Texas?
MR: answer(traverse(next_to(stateid(‘texas’))))
Parse tree of MR:
[The MR parsed according to the grammar: ANSWER at the root expands to answer(RIVER); RIVER to TRAVERSE(STATE); TRAVERSE to traverse; STATE to NEXT_TO(STATE); NEXT_TO to next_to; the inner STATE to STATEID, which yields stateid ‘texas’]
Non-terminals: ANSWER, RIVER, TRAVERSE, STATE, NEXT_TO, STATEID
Terminals: answer, traverse, next_to, stateid, ‘texas’
Productions: ANSWER → answer(RIVER), RIVER → TRAVERSE(STATE),
STATE → NEXT_TO(STATE), TRAVERSE → traverse,
NEXT_TO → next_to, STATEID → ‘texas’
6
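For concreteness, the grammar fragment above can be written down directly as data. The sketch below is an illustrative Python encoding, not KRISP's internal representation; it also lists STATE → STATEID, which appears in the derivation slides later in the deck.

```python
# Geoquery MRL grammar fragment from the slide above, as plain data.
# Upper-case names inside a template are child non-terminals; everything
# else is terminal MR text.  (Names here are illustrative, not KRISP's.)
MRL_PRODUCTIONS = [
    ("ANSWER",   "answer(RIVER)"),
    ("RIVER",    "TRAVERSE(STATE)"),
    ("STATE",    "NEXT_TO(STATE)"),
    ("STATE",    "STATEID"),          # used in the derivation slides below
    ("TRAVERSE", "traverse"),
    ("NEXT_TO",  "next_to"),
    ("STATEID",  "'texas'"),
]

# Because the grammar is deterministic, an MR such as
# answer(traverse(next_to(stateid('texas')))) has a single parse tree.
```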
Learning Semantic Parsers
• Assume meaning representation languages
(MRLs) have deterministic context free grammars
– True for almost all computer languages
– MRs can be parsed unambiguously
• Training data consists of NL sentences paired with
their MRs
• Induce a semantic parser which can map novel NL
sentences to their correct MRs
• Learning problem differs from that of syntactic
parsing where training data has trees annotated
over the NL sentences
7
KRISP: Kernel-based Robust Interpretation
for Semantic Parsing
• Learns semantic parser from NL sentences paired
with their respective MRs given MRL grammar
• Productions of MRL are treated like semantic
concepts
• SVM classifier with string subsequence kernel is
trained for each production to identify if an NL
substring represents the semantic concept
• These classifiers are used to compositionally build
MRs of the sentences
8
Overview of KRISP
[Overview diagram —
Training: MRL Grammar + NL sentences with MRs → Collect positive and negative examples → Train string-kernel-based SVM classifiers → Semantic Parser; the best MRs (correct and incorrect) are fed back to collect further examples.
Testing: Novel NL sentences → Semantic Parser → Best MRs]
9
KRISP’s Semantic Parsing
• We first define Semantic Derivation of an NL
sentence
• We next define Probability of a Semantic
Derivation
• Semantic parsing of an NL sentence involves
finding its Most Probable Semantic Derivation
• Straightforward to obtain MR from a semantic
derivation
11
Semantic Derivation of an NL Sentence
MR parse with non-terminals on the nodes:
[Parse tree of the MR with non-terminals on the nodes: ANSWER over RIVER; RIVER over TRAVERSE and STATE; STATE over NEXT_TO and STATE; the inner STATE over STATEID; terminals answer, traverse, next_to, stateid, ‘texas’ at the leaves]
Which rivers run through the states bordering Texas?
12
Semantic Derivation of an NL Sentence
MR parse with productions on the nodes:
ANSWER → answer(RIVER)
  RIVER → TRAVERSE(STATE)
    TRAVERSE → traverse
    STATE → NEXT_TO(STATE)
      NEXT_TO → next_to
      STATE → STATEID
        STATEID → ‘texas’
Which rivers run through the states bordering Texas?
13
Semantic Derivation of an NL Sentence
Semantic Derivation: Each node covers an NL substring:
ANSWER → answer(RIVER)
  RIVER → TRAVERSE(STATE)
    TRAVERSE → traverse
    STATE → NEXT_TO(STATE)
      NEXT_TO → next_to
      STATE → STATEID
        STATEID → ‘texas’
Which rivers run through the states bordering Texas?
14
Semantic Derivation of an NL Sentence
Semantic Derivation: Each node contains a production
and the substring of NL sentence it covers:
(ANSWER → answer(RIVER), [1..9])
  (RIVER → TRAVERSE(STATE), [1..9])
    (TRAVERSE → traverse, [1..4])
    (STATE → NEXT_TO(STATE), [5..9])
      (NEXT_TO → next_to, [5..7])
      (STATE → STATEID, [8..9])
        (STATEID → ‘texas’, [8..9])
Which(1) rivers(2) run(3) through(4) the(5) states(6) bordering(7) Texas(8) ?(9)
15
Semantic Derivation of an NL Sentence
Substrings in NL sentence may be in a different order:
ANSWER → answer(RIVER)
  RIVER → TRAVERSE(STATE)
    TRAVERSE → traverse
    STATE → NEXT_TO(STATE)
      NEXT_TO → next_to
      STATE → STATEID
        STATEID → ‘texas’
Through the states that border Texas which rivers run?
16
Semantic Derivation of an NL Sentence
Nodes are allowed to permute the children productions
from the original MR parse
(ANSWER → answer(RIVER), [1..10])
  (RIVER → TRAVERSE(STATE), [1..10])
    (STATE → NEXT_TO(STATE), [1..6])
      (NEXT_TO → next_to, [1..5])
      (STATE → STATEID, [6..6])
        (STATEID → ‘texas’, [6..6])
    (TRAVERSE → traverse, [7..10])
Through(1) the(2) states(3) that(4) border(5) Texas(6) which(7) rivers(8) run(9) ?(10)
17
Probability of a Semantic Derivation
• Let Pπ(s[i..j]) be the probability that production π covers
the substring s[i..j] of sentence s
• For example, P_{NEXT_TO → next_to}(“the states bordering”) = 0.99:
  (NEXT_TO → next_to, [5..7])  covering  the(5) states(6) bordering(7)
• Obtained from the string-kernel-based SVM classifiers
trained for each production π
• Assuming independence, probability of a semantic
derivation D:
P(D) = ∏_{(π,[i..j]) ∈ D} Pπ(s[i..j])
18
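Under the independence assumption, P(D) is just the product of the node probabilities. A minimal sketch follows; the derivation is assumed to be a list of (production, span, probability) nodes, which is an illustrative format rather than KRISP's, and the probability values are taken from the worked example on the next slide.

```python
import math

def derivation_probability(nodes):
    """Product of P_pi(s[i..j]) over all (production, span, prob) nodes."""
    return math.prod(prob for _, _, prob in nodes)

# The example derivation on the next slide multiplies out to about 0.673.
nodes = [("ANSWER -> answer(RIVER)",   (1, 9), 0.98),
         ("RIVER -> TRAVERSE(STATE)",  (1, 9), 0.90),
         ("TRAVERSE -> traverse",      (1, 4), 0.95),
         ("STATE -> NEXT_TO(STATE)",   (5, 9), 0.89),
         ("NEXT_TO -> next_to",        (5, 7), 0.99),
         ("STATE -> STATEID",          (8, 9), 0.93),
         ("STATEID -> 'texas'",        (8, 9), 0.98)]
print(round(derivation_probability(nodes), 3))   # 0.673
```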
Probability of a Semantic Derivation contd.
(ANSWER → answer(RIVER), [1..9])          0.98
  (RIVER → TRAVERSE(STATE), [1..9])       0.90
    (TRAVERSE → traverse, [1..4])         0.95
    (STATE → NEXT_TO(STATE), [5..9])      0.89
      (NEXT_TO → next_to, [5..7])         0.99
      (STATE → STATEID, [8..9])           0.93
        (STATEID → ‘texas’, [8..9])       0.98
Which(1) rivers(2) run(3) through(4) the(5) states(6) bordering(7) Texas(8) ?(9)

P(D) = ∏_{(π,[i..j]) ∈ D} Pπ(s[i..j]) = 0.673
19
Computing the Most Probable Semantic
Derivation
• Task of semantic parsing is to find the most
probable semantic derivation of the NL sentence
given all the probabilities Pπ(s[i..j])
• Implemented by extending Earley’s [1970]
context-free grammar parsing algorithm
• Resembles PCFG parsing but different because:
– Probability of a production depends on which substring
of the sentence it covers
– Leaves are not terminals but substrings of words
20
Computing the Most Probable Semantic
Derivation contd.
• Performs a greedy approximation search with beam width ω = 20 and returns the ω most probable derivations it finds
• Uses a threshold θ = 0.05 to prune low-probability trees
21
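A minimal sketch of just this pruning step (the beam search itself, which extends Earley's algorithm, is not reproduced here); candidate derivations are assumed to arrive as (probability, tree) pairs.

```python
def prune(derivations, omega=20, theta=0.05):
    """Keep at most the omega most probable derivations, dropping any whose
    probability falls below the threshold theta (omega=20, theta=0.05 above)."""
    kept = [d for d in derivations if d[0] >= theta]
    return sorted(kept, key=lambda d: d[0], reverse=True)[:omega]
```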
Overview of KRISP
[Overview diagram —
Training: MRL Grammar + NL sentences with MRs → Collect positive and negative examples → Train string-kernel-based SVM classifiers → Pπ(s[i..j]) → Semantic Parser; the best semantic derivations (correct and incorrect) are fed back to collect further examples.
Testing: Novel NL sentences → Semantic Parser → Best MRs]
22
KRISP’s Training Algorithm
• Takes NL sentences paired with their respective
MRs as input
• Obtains MR parses
• Induces the semantic parser and refines it in
iterations
• In the first iteration, for every production π:
– Call those sentences positives whose MR parses use
that production
– Call the remaining sentences negatives
23
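In code, the first iteration's collection is a simple partition of the training set. The sketch below assumes each training pair is (sentence, set of productions used in its MR parse), which is an illustrative format, not KRISP's internal one.

```python
def first_iteration_examples(training_pairs, all_productions):
    """For every production, positives are the sentences whose MR parse uses
    it; all remaining sentences are negatives."""
    examples = {}
    for prod in all_productions:
        positives = [s for s, prods in training_pairs if prod in prods]
        negatives = [s for s, prods in training_pairs if prod not in prods]
        examples[prod] = (positives, negatives)
    return examples
```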
KRISP’s Training Algorithm contd.
First Iteration
STATE → NEXT_TO(STATE)
Positives (sentences whose MR parse uses the production):
• which rivers run through the states bordering texas ?
• what is the most populated state bordering oklahoma ?
• what is the largest city in states that border california ?
• …
Negatives (all remaining sentences):
• what state has the highest population ?
• which states have cities named austin ?
• what states does the delaware river run through ?
• what is the lowest point of the state with the largest area ?
• …
String-kernel-based
SVM classifier
24
String Subsequence Kernel
• Define kernel between two strings as the number
of common subsequences between them [Lodhi et al.,
2002]
s = “states that are next to”
t = “the states next to”
K(s,t) = ?
25
String Subsequence Kernel
• Define kernel between two strings as the number
of common subsequences between them [Lodhi et al.,
2002]
s = “states that are next to”
t = “the states next to”
u = states
K(s,t) = 1+?
26
String Subsequence Kernel
• Define kernel between two strings as the number
of common subsequences between them [Lodhi et al.,
2002]
s = “states that are next to”
t = “the states next to”
u = next
K(s,t) = 2+?
27
String Subsequence Kernel
• Define kernel between two strings as the number
of common subsequences between them [Lodhi et al.,
2002]
s = “states that are next to”
t = “the states next to”
u = to
K(s,t) = 3+?
28
String Subsequence Kernel
• Define kernel between two strings as the number
of common subsequences between them [Lodhi et al.,
2002]
s = “states that are next to”
t = “the states next to”
u = states next
K(s,t) = 4+?
29
String Subsequence Kernel
• Define kernel between two strings as the number
of common subsequences between them [Lodhi et al.,
2002]
s = “states that are next to”
t = “the states next to”
K(s,t) = 7
30
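A brute-force version of this kernel is easy to sketch. The implementation below is my own illustration (not the O(n|s||t|) algorithm of Lodhi et al. cited on the next slide); it counts common word subsequences, optionally decayed by a gap factor lam, and with lam = 1.0 it reproduces K(s,t) = 7 for the example above.

```python
def subsequence_kernel(s, t, lam=1.0):
    """Count common (possibly gappy) word subsequences of s and t.

    Each common subsequence contributes lam**(gaps in s + gaps in t); with
    lam = 1.0 this is a plain count.  Naive O(|s|^2 |t|^2) dynamic program:
    M[i][j] sums the weights of all common subsequences whose last matched
    pair of words is (s[i], t[j]).
    """
    n, m = len(s), len(t)
    M = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            if s[i] != t[j]:
                continue
            total = 1.0                     # the single-word subsequence s[i]
            for ip in range(i):
                for jp in range(j):
                    gaps = (i - ip - 1) + (j - jp - 1)
                    total += (lam ** gaps) * M[ip][jp]
            M[i][j] = total
    return sum(sum(row) for row in M)


s = "states that are next to".split()
t = "the states next to".split()
print(subsequence_kernel(s, t))   # 7.0
```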
String Subsequence Kernel contd.
• The kernel is normalized to remove any bias due to different string lengths:
  K_normalized(s, t) = K(s, t) / √( K(s, s) · K(t, t) )
• Lodhi et al. [2002] give an O(n|s||t|) algorithm for computing the string subsequence kernel
• Used for Text Categorization [Lodhi et al., 2002] and Information Extraction [Bunescu & Mooney, 2005]
31
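Continuing the sketch above, the normalization is a one-liner.

```python
import math

def normalized_kernel(s, t, lam=1.0):
    """K_normalized(s,t) = K(s,t) / sqrt(K(s,s) * K(t,t))."""
    return subsequence_kernel(s, t, lam) / math.sqrt(
        subsequence_kernel(s, s, lam) * subsequence_kernel(t, t, lam))
```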
String Subsequence Kernel contd.
• The examples are implicitly mapped to the feature space
of all subsequences and the kernel computes the dot
products
Example NL substrings mapped as points in this feature space: “state with the capital of”, “states with area larger than”, “states through which”, “the states next to”, “states that border”, “states bordering”, “states that share border”
32
Support Vector Machines
• SVMs find a separating hyperplane such that the margin
is maximized
[Figure: the example substrings — “state with the capital of”, “states that are next to”, “states with area larger than”, “states through which”, “the states next to”, “states that border”, “states bordering”, “states that share border” — plotted in the implicit feature space and separated by a hyperplane; one example is annotated with the probability estimate 0.97.]
Probability estimate of an example belonging to a class can be obtained using its distance from the hyperplane [Platt, 1999]
33
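One way to reproduce this setup with off-the-shelf tools is scikit-learn's SVC with a precomputed Gram matrix and Platt-scaled probability outputs, reusing normalized_kernel from the sketch above. This is only an illustration of the idea, not the implementation used in KRISP, and the class labels below are assigned purely for the example.

```python
import numpy as np
from sklearn.svm import SVC

# Substrings from the figure above; labels are illustrative
# (1 = expresses STATE -> NEXT_TO(STATE), 0 = does not).
train = ["states that are next to", "the states next to", "states that border",
         "states bordering", "states that share border",
         "state with the capital of", "states with area larger than",
         "states through which"]
labels = [1, 1, 1, 1, 1, 0, 0, 0]

def gram(a, b):
    """Gram matrix of normalized subsequence-kernel values (sketches above)."""
    return np.array([[normalized_kernel(x.split(), y.split()) for y in b]
                     for x in a])

clf = SVC(kernel="precomputed", probability=True)   # Platt scaling inside
clf.fit(gram(train, train), labels)

test = ["the states bordering texas"]
print(clf.predict_proba(gram(test, train)))          # class probabilities
```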
KRISP’s Training Algorithm contd.
First Iteration
STATE → NEXT_TO(STATE)
Positives (sentences whose MR parse uses the production):
• which rivers run through the states bordering texas ?
• what is the most populated state bordering oklahoma ?
• what is the largest city in states that border california ?
• …
Negatives (all remaining sentences):
• what state has the highest population ?
• which states have cities named austin ?
• what states does the delaware river run through ?
• what is the lowest point of the state with the largest area ?
• …
String-kernel-based SVM classifier → P_{STATE → NEXT_TO(STATE)}(s[i..j])
34
Overview of KRISP
[Overview diagram —
Training: MRL Grammar + NL sentences with MRs → Collect positive and negative examples → Train string-kernel-based SVM classifiers → Pπ(s[i..j]) → Semantic Parser; the best semantic derivations (correct and incorrect) are fed back to collect further examples.
Testing: Novel NL sentences → Semantic Parser → Best MRs]
35
KRISP’s Training Algorithm contd.
• Using these classifiers Pπ(s[i..j]), obtain the ω best
semantic derivations of each training sentence
• Some of these derivations will give the correct MR, called
correct derivations, some will give incorrect MRs, called
incorrect derivations
• For the next iteration, collect positives from most probable
correct derivation
• Extended Earley’s algorithm can be forced to derive only
the correct derivations by making sure all subtrees it
generates exist in the correct MR parse
• Collect negatives from incorrect derivations with higher
probability than the most probable correct derivation
37
KRISP’s Training Algorithm contd.
Most probable correct derivation:
(ANSWER → answer(RIVER), [1..9])
  (RIVER → TRAVERSE(STATE), [1..9])
    (TRAVERSE → traverse, [1..4])
    (STATE → NEXT_TO(STATE), [5..9])
      (NEXT_TO → next_to, [5..7])
      (STATE → STATEID, [8..9])
        (STATEID → ‘texas’, [8..9])
Which(1) rivers(2) run(3) through(4) the(5) states(6) bordering(7) Texas(8) ?(9)
38
KRISP’s Training Algorithm contd.
Most probable correct derivation: Collect positive
examples
(ANSWER → answer(RIVER), [1..9])
  (RIVER → TRAVERSE(STATE), [1..9])
    (TRAVERSE → traverse, [1..4])
    (STATE → NEXT_TO(STATE), [5..9])
      (NEXT_TO → next_to, [5..7])
      (STATE → STATEID, [8..9])
        (STATEID → ‘texas’, [8..9])
Which(1) rivers(2) run(3) through(4) the(5) states(6) bordering(7) Texas(8) ?(9)
39
KRISP’s Training Algorithm contd.
Incorrect derivation with probability greater than the
most probable correct derivation:
(ANSWER → answer(RIVER), [1..9])
  (RIVER → TRAVERSE(STATE), [1..9])
    (TRAVERSE → traverse, [1..7])
    (STATE → STATEID, [8..9])
      (STATEID → ‘texas’, [8..9])
Which(1) rivers(2) run(3) through(4) the(5) states(6) bordering(7) Texas(8) ?(9)
Incorrect MR: answer(traverse(stateid(‘texas’)))
40
KRISP’s Training Algorithm contd.
Incorrect derivation with probability greater than the most probable correct derivation: collect negative examples
(ANSWER → answer(RIVER), [1..9])
  (RIVER → TRAVERSE(STATE), [1..9])
    (TRAVERSE → traverse, [1..7])
    (STATE → STATEID, [8..9])
      (STATEID → ‘texas’, [8..9])
Which(1) rivers(2) run(3) through(4) the(5) states(6) bordering(7) Texas(8) ?(9)
Incorrect MR: answer(traverse(stateid(‘texas’)))
41
KRISP’s Training Algorithm contd.
Most probable correct derivation:
(ANSWER → answer(RIVER), [1..9])
  (RIVER → TRAVERSE(STATE), [1..9])
    (TRAVERSE → traverse, [1..4])
    (STATE → NEXT_TO(STATE), [5..9])
      (NEXT_TO → next_to, [5..7])
      (STATE → STATEID, [8..9])
        (STATEID → ‘texas’, [8..9])
Which rivers run through the states bordering Texas?

Incorrect derivation:
(ANSWER → answer(RIVER), [1..9])
  (RIVER → TRAVERSE(STATE), [1..9])
    (TRAVERSE → traverse, [1..7])
    (STATE → STATEID, [8..9])
      (STATEID → ‘texas’, [8..9])
Which rivers run through the states bordering Texas?

Traverse both trees in breadth-first order till the first nodes where their productions differ are found.
42
KRISP’s Training Algorithm contd.
(The same two derivations as above.)
Mark the words under these nodes.
47
KRISP’s Training Algorithm contd.
(The same two derivations as above.)
Consider all the productions covering the marked words.
Collect negatives for productions which cover any marked word in the incorrect derivation but not in the correct derivation.
49
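Putting the collection procedure of the last few slides into code, a minimal sketch: the parser interface (best_derivations, and the mr, probability, nodes attributes) is hypothetical, and the negative-example rule is simplified relative to the word-marking procedure above.

```python
def collect_examples(sentence, gold_mr, parser, omega=20):
    """Example collection for one sentence in a later training iteration
    (a sketch only; the helpers are assumed, not KRISP's actual API)."""
    derivations = parser.best_derivations(sentence, beam=omega)
    correct = [d for d in derivations if d.mr == gold_mr]
    if not correct:
        return [], []
    best_correct = max(correct, key=lambda d: d.probability)
    # Positives: substrings covered by each production in the best correct derivation.
    positives = [(n.production, n.substring) for n in best_correct.nodes]
    # Negatives: from incorrect derivations that outscore the best correct one,
    # take (production, substring) pairs that do not occur in the correct derivation.
    negatives = []
    for d in derivations:
        if d.mr != gold_mr and d.probability > best_correct.probability:
            negatives += [(n.production, n.substring) for n in d.nodes
                          if (n.production, n.substring) not in positives]
    return positives, negatives
```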
KRISP’s Training Algorithm contd.
Next Iteration: more refined positive and negative examples
STATE → NEXT_TO(STATE)
Positives:
• the states bordering texas ?
• state bordering oklahoma ?
• states that border california ?
• states which share border
• next to state of iowa
• …
Negatives:
• what state has the highest population ?
• what states does the delaware river run through ?
• which states have cities named austin ?
• what is the lowest point of the state with the largest area ?
• which rivers run through states bordering
• …
String-kernel-based SVM classifier → P_{STATE → NEXT_TO(STATE)}(s[i..j])
51
Overview of KRISP
[Overview diagram —
Training: MRL Grammar + NL sentences with MRs → Collect positive and negative examples → Train string-kernel-based SVM classifiers → Pπ(s[i..j]) → Semantic Parser; the best semantic derivations (correct and incorrect) are fed back to collect further examples.
Testing: Novel NL sentences → Semantic Parser → Best MRs]
52
Experimental Corpora
• CLang [Kate, Wong & Mooney, 2005]
– 300 randomly selected pieces of coaching advice from
the log files of the 2003 RoboCup Coach Competition
– 22.52 words on average in NL sentences
– 13.42 tokens on average in MRs
• Geoquery [Tang & Mooney, 2001]
– 880 queries for the given U.S. geography database
– 7.48 words on average in NL sentences
– 6.47 tokens on average in MRs
53
Experimental Methodology
• Evaluated using standard 10-fold cross validation
• Correctness
– CLang: output exactly matches the correct
representation
– Geoquery: the resulting query retrieves the same
answer as the correct representation
• Metrics
Precision = (Number of correct MRs) / (Number of test sentences with complete output MRs)
Recall = (Number of correct MRs) / (Number of test sentences)
54
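In code, with hypothetical containers (outputs maps each test sentence to the parser's complete MR or None; gold maps it to the reference MR), the two metrics are:

```python
def precision_recall(outputs, gold):
    """Metrics as defined above.  Equality here stands in for the
    domain-specific correctness check (exact match for CLang, same
    retrieved answer for Geoquery)."""
    completed = {s: mr for s, mr in outputs.items() if mr is not None}
    correct = sum(1 for s, mr in completed.items() if mr == gold[s])
    precision = correct / len(completed) if completed else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall
```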
Experimental Methodology contd.
• Compared Systems:
– CHILL [Tang & Mooney, 2001]: Inductive Logic Programming based
semantic parser
– SILT [Kate, Wong & Mooney, 2005]: learns transformation rules
relating NL sentences to MR expressions
– SCISSOR [Ge & Mooney, 2005]: learns an integrated syntactic-semantic parser, needs extra annotations
– WASP [Wong & Mooney, 2006]: uses statistical machine translation
techniques
– Zettlemoyer & Collins (2005): CCG-based semantic parser
• Different Experimental Setup (600 training, 280 testing
examples)
• Results available only for Geoquery corpus
55
Experimental Methodology contd.
• KRISP gives probabilities for its semantic derivations, which are taken as confidences of the MRs
• We plot precision-recall curves by sorting each sentence’s best MR by confidence and then computing precision at every recall value
• WASP and SCISSOR also output confidences so
we show their precision-recall curves
• Results of other systems shown as points on
precision-recall graphs
56
Results on CLang
[Figure: precision–recall curves on CLang. Annotations: “requires more annotation on the training corpus”; “CHILL gives 49.2% precision and 12.67% recall with 160 examples, can’t run beyond.”]
57
Results on Geoquery
58
Experiments with Noisy NL Sentences
• Any application of a semantic parser is likely to face noise in the input
• If the input is coming from a speech recognizer:
– Interjections (um’s and ah’s)
– Environment noise (door slams, phone rings etc.)
– Out-of-domain words, ill-formed utterances etc.
• KRISP does not use hard-matching rules unlike
other systems and is hence more robust to noise
• We show this by introducing simulated speech
recognition errors in the corpus
59
Experiments with Noisy NL Sentences contd.
• Interjections, environment noise, etc. are likely to be recognized as real words; simulate this by adding a word with probability P_add after every word
– An extra word w is added with probability P(w), proportional to its frequency in the BNC
• A speech recognizer may completely fail to detect a word, so with probability P_drop a word is dropped
Example: an extra word such as “you” is inserted into “If the ball is in our goal area then our player 1 should intercept it.”
60
Experiments with Noisy NL Sentences contd.
• A speech recognizer may confuse a word with a high-frequency, phonetically close word; a word is substituted by another word w with probability p^ed(w) · P(w)
– where p is a parameter in [0,1]
– ed(w) is w’s edit distance from the original word [Levenshtein, 1966]
– P(w) is w’s probability, proportional to its frequency in the BNC
Example: words in “If the ball is in our goal area then our player 1 should intercept it.” are replaced by phonetically close words such as “you” and “when”.
61
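A sketch of this noise model follows; the unigram table is a hypothetical stand-in for BNC frequencies, and the exact procedure used in the experiments may differ in its details.

```python
import random

# Hypothetical unigram frequencies standing in for the BNC.
UNIGRAMS = {"the": 0.05, "you": 0.02, "when": 0.01, "our": 0.004,
            "one": 0.003, "ball": 0.001}

def edit_distance(a, b):
    """Levenshtein distance between two words."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def corrupt(words, p_add=0.1, p_drop=0.1, p=0.01):
    """Apply the three noise types: dropped words, substitutions, extra words."""
    vocab = list(UNIGRAMS)
    freqs = list(UNIGRAMS.values())
    noisy = []
    for w in words:
        if random.random() < p_drop:                # word not detected at all
            continue
        out, r, acc = w, random.random(), 0.0
        for v in vocab:                             # substitute w by v with
            if v == w:                              # probability p**ed(v) * P(v)
                continue
            acc += p ** edit_distance(v, w) * UNIGRAMS[v]
            if r < acc:
                out = v
                break
        noisy.append(out)
        if random.random() < p_add:                 # interjection / background noise
            noisy.append(random.choices(vocab, weights=freqs)[0])
    return noisy

print(" ".join(corrupt("if the ball is in our goal area".split())))
```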
Experiments with Noisy NL Sentences contd.
• Four noise levels were created by:
– Varying parameters P_add and P_drop from 0 at level zero to 0.1 at level four
– Varying parameter p from 0 at level zero to 0.01 at level four
• Results shown when only test sentences are
corrupted, qualitatively similar results when both
test and train sentences are corrupted
• We show best F-measures (harmonic mean of
precision and recall)
62
Results on Noisy CLang Corpus
63
Conclusions
• KRISP: A new string-kernel-based approach for
learning semantic parser
• String-kernel-based SVM classifiers trained for
each MRL production
• Classifiers used to compositionally build complete
MRs of NL sentences
• Evaluated on two real-world corpora
– Performs better than rule-based systems
– Performs comparably to other statistical systems
– More robust to noise
64
Thank You!
Our corpora can be downloaded from:
http://www.cs.utexas.edu/~ml/nldata.html
Check out our online demo for Geoquery at:
http://www.cs.utexas.edu/~ml/geo.html
Questions??
65
Extra: Experiments with Other Natural
Languages
66
Extra: Dealing with Constants
• MRL grammar may contain productions
corresponding to constants in the domain:
STATEID → ‘new york’     RIVERID → ‘colorado’
NUM → ‘2’     STRING → ‘DR4C10’
• User can specify these as constant productions
giving their NL substrings
• Classifiers are not learned for these productions
• Matching substring’s probability is taken as 1
• If n constant productions have the same substring, then each gets probability 1/n
  STATEID → ‘colorado’     RIVERID → ‘colorado’
67
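A small sketch of this rule (the data layout is illustrative, not KRISP's):

```python
def constant_production_probability(substring, constant_productions):
    """Constant productions are not learned: a production whose user-supplied
    NL substring matches gets probability 1, shared equally when n constants
    have the same substring."""
    matches = [prod for prod, text in constant_productions if text == substring]
    return {prod: 1.0 / len(matches) for prod in matches} if matches else {}

constants = [("STATEID -> 'colorado'", "colorado"),
             ("RIVERID -> 'colorado'", "colorado"),
             ("NUM -> '2'", "2")]
print(constant_production_probability("colorado", constants))  # each gets 0.5
```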
Extra: String Subsequence Kernel
• Subsequences with gaps should be downweighted
• Decay factor λ in the range of (0,1] penalizes gaps
• All subsequences are the implicit features and
penalties are the feature values
s = “left side of our penalty area”
t = “our left penalty area”
u = left penalty
K(s,t) = 4+?
68
Extra: String Subsequence Kernel
• Subsequences with gaps should be downweighted
• Decay factor λ in the range of (0,1] penalizes gaps
• All subsequences are the implicit features and
penalties are the feature values
s = “left side of our penalty area”   (gap of 3 ⇒ λ³)
t = “our left penalty area”   (gap of 0 ⇒ λ⁰)
u = left penalty
K(s,t) = 4 + λ³·λ⁰ + ?
69
Extra: String Subsequence Kernel
• Subsequences with gaps should be downweighted
• Decay factor λ in the range of (0,1] penalizes gaps
• All subsequences are the implicit features and
penalties are the feature values
s = “left side of our penalty area”
t = “our left penalty area”
K(s,t) = 4 + 3λ + 3λ³ + λ⁵
70
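The subsequence_kernel sketch given earlier already accepts a decay factor, so gappy matches can be downweighted in this spirit; the slides' exact weighting convention may differ in detail, so no particular numeric value is claimed here.

```python
s = "left side of our penalty area".split()
t = "our left penalty area".split()
print(subsequence_kernel(s, t, lam=0.5))   # gappy subsequences count for less
```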
Extra: KRISP’s Average Running Times
Corpus    Average Training Time (minutes)    Average Testing Time (minutes)
Geo250    1.44                               0.05
Geo880    18.1                               0.65
CLang     58.85                              3.18
Average running times per fold in minutes taken by KRISP.
71
Extra: Experimental Methodology
• Correctness
– CLang: output exactly matches the correct
representation
– Geoquery: the resulting query retrieves the same
answer as the correct representation
If the ball is in our penalty area, all our players
except player 4 should stay in our half.
Correct: ((bpos (penalty-area our)) (do (player-except our {4}) (pos (half our))))
Output:  ((bpos (penalty-area opp)) (do (player-except our {4}) (pos (half our))))
72
Extra: Computing the Most Probable
Semantic Derivation
• Task of semantic parsing is to find the most probable
semantic derivation of the NL sentence
• Let E_{n,s[i..j]}, a partial derivation, denote any subtree of a derivation tree whose root production has n as its LHS non-terminal and which covers sentence s from index i to j
• Example of E_{STATE,s[5..9]}:
  (STATE → NEXT_TO(STATE), [5..9])
    (NEXT_TO → next_to, [5..7])
    (STATE → STATEID, [8..9])
      (STATEID → ‘texas’, [8..9])
  the(5) states(6) bordering(7) Texas(8) ?(9)
• The derivation D of the whole sentence is then E_{ANSWER,s[1..|s|]}
73
Extra: Computing the Most Probable
Semantic Derivation contd.
• Let E*_{STATE,s[5..9]} denote the most probable partial derivation among all E_{STATE,s[5..9]}
• This is computed recursively as follows:
  E*_{STATE,s[5..9]}
    (STATE → NEXT_TO(STATE), [5..9])
      E*_{NEXT_TO,s[i..j]}   E*_{STATE,s[i..j]}
  the(5) states(6) bordering(7) Texas(8) ?(9)

  E*_{n,s[i..j]} = makeTree( argmax_{π = n → n₁…n_t ∈ G} ( Pπ(s[i..j]) · … ) )   (completed on the later slide)
74
Extra: Computing the Most Probable Semantic Derivation contd.
• The recursion is evaluated for every way of splitting the substring [5..9] between the two children: E*_{NEXT_TO,s[5..5]} with E*_{STATE,s[6..9]}, E*_{NEXT_TO,s[5..6]} with E*_{STATE,s[7..9]}, E*_{NEXT_TO,s[5..7]} with E*_{STATE,s[8..9]}, and E*_{NEXT_TO,s[5..8]} with E*_{STATE,s[9..9]}.
75–78
Extra: Computing the Most Probable
Semantic Derivation contd.
• Let E*_{STATE,s[5..9]} denote the most probable partial derivation among all E_{STATE,s[5..9]}
• This is computed recursively as follows:
  E*_{STATE,s[5..9]}
    (STATE → NEXT_TO(STATE), [5..9])
      E*_{NEXT_TO,s[i..j]}   E*_{STATE,s[i..j]}
  the(5) states(6) bordering(7) Texas(8) ?(9)

  E*_{n,s[i..j]} = makeTree( argmax_{π = n → n₁…n_t ∈ G, (p₁,…,p_t) ∈ partition(s[i..j], t)} ( Pπ(s[i..j]) · ∏_{k=1..t} P(E*_{n_k, p_k}) ) )
79
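One way to realize this recursion is memoized dynamic programming over (non-terminal, span) pairs. The sketch below is an illustration, not KRISP's extended Earley parser: the grammar fragment and the listed Pπ values come from the derivation example earlier in the deck (with a small made-up default for unlisted spans), and it enumerates ordered partitions only, ignoring the child reordering that KRISP additionally allows.

```python
from functools import lru_cache

# P_pi(s[i..j]) values from the worked derivation example; anything not
# listed gets a small default (purely illustrative).
PPROB = {
    ("STATE -> NEXT_TO(STATE)", 5, 9): 0.89,
    ("NEXT_TO -> next_to", 5, 7): 0.99,
    ("STATE -> STATEID", 8, 9): 0.93,
    ("STATEID -> 'texas'", 8, 9): 0.98,
}

GRAMMAR = {
    "STATE":   [("STATE -> NEXT_TO(STATE)", ["NEXT_TO", "STATE"]),
                ("STATE -> STATEID", ["STATEID"])],
    "NEXT_TO": [("NEXT_TO -> next_to", [])],
    "STATEID": [("STATEID -> 'texas'", [])],
}

def pprob(prod, i, j):
    return PPROB.get((prod, i, j), 0.01)

def partitions(i, j, t):
    """Yield all ways to split the word span [i..j] into t contiguous parts."""
    if t == 1:
        yield [(i, j)]
        return
    for k in range(i, j):                       # first part is [i..k]
        for rest in partitions(k + 1, j, t - 1):
            yield [(i, k)] + rest

@lru_cache(maxsize=None)
def best_partial(n, i, j):
    """E*_{n,s[i..j]}: (probability, tree) of the best partial derivation."""
    best = (0.0, None)
    for prod, children in GRAMMAR[n]:
        if not children:                        # leaf production covers [i..j]
            cand = (pprob(prod, i, j), (prod, (i, j), []))
            best = max(best, cand, key=lambda c: c[0])
            continue
        for parts in partitions(i, j, len(children)):
            p = pprob(prod, i, j)
            subtrees = []
            for child, (ci, cj) in zip(children, parts):
                cp, ct = best_partial(child, ci, cj)
                p *= cp
                subtrees.append(ct)
            cand = (p, (prod, (i, j), subtrees))
            best = max(best, cand, key=lambda c: c[0])
    return best

prob, tree = best_partial("STATE", 5, 9)
print(round(prob, 3), tree)   # ~0.803 = 0.89 * 0.99 * 0.93 * 0.98
```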