Learning for Semantic Parsing with Kernels under Various Forms of Supervision

Rohit J. Kate
Ph.D. Final Defense
Supervisor: Raymond J. Mooney
Machine Learning Group
Department of Computer Sciences
University of Texas at Austin
Semantic Parsing
• Semantic Parsing: Transforming natural language
(NL) sentences into computer executable complete
meaning representations (MRs) for domain-specific
applications
• Requires deeper semantic analysis than other
semantic tasks such as semantic role labeling, word
sense disambiguation, and information extraction
• Example application domains
– CLang: Robocup Coach Language
– Geoquery: A Database Query Application
CLang: RoboCup Coach Language
• In RoboCup Coach competition teams compete to
coach simulated players [http://www.robocup.org]
• The coaching instructions are given in a formal
language called CLang [Chen et al. 2003]
Example (simulated soccer field):
NL: “If the ball is in our goal area then player 1 should intercept it.”
⇓ Semantic Parsing
CLang: (bpos (goal-area our) (do our {1} intercept))
Geoquery: A Database Query Application
• Query application for U.S. geography database
containing about 800 facts [Zelle & Mooney, 1996]
Example:
NL: “Which rivers run through the states bordering Texas?”
⇓ Semantic Parsing
Query: answer(traverse(next_to(stateid(‘texas’))))
Answer: Arkansas, Canadian, Cimarron, Gila, Mississippi, Rio Grande …
Engineering Motivation for Semantic Parsing
• Most computational language-learning research
analyzes open-domain text but the analysis is
shallow
• Realistic semantic parsing currently entails domain
dependence
• Applications of domain-dependent semantic
parsing
– Natural language interfaces to computing systems
– Communication with robots in natural language
– Personalized software assistants
– Question-answering systems
• Machine Learning makes developing semantic
parsers for specific applications more tractable
Cognitive Science Motivation for Semantic
Parsing
• Most natural-language learning methods require
supervised training data that is not available to a
child
– No POS-tagged or treebank data
• Assuming a child can infer the likely meaning of
an utterance from context, NL-MR pairs are more
cognitively plausible training data
Thesis Contributions
• A new framework for learning for semantic parsing based
on kernel-based string classification
– Requires no feature engineering
– Uses no hard-matching rules and no grammar rules for
natural language, which makes it robust
• First semi-supervised learning system for semantic parsing
• Considers learning for semantic parsing under a cognitively
motivated, weaker, and more general form of supervision:
ambiguous supervision
• Introduces transformations for meaning representation
grammars to make them conform better to natural
language semantics
Outline
• KRISP: A Semantic Parsing Learning System
• Utilizing Weaker Forms of Supervision
– Semi-supervision
– Ambiguous supervision
• Transforming meaning representation grammar
• Directions for Future Work
• Conclusions
KRISP: Kernel-based Robust Interpretation
for Semantic Parsing [Kate & Mooney, 2006]
• Learns a semantic parser from NL sentences paired
with their respective MRs, given the meaning
representation language (MRL) grammar
• Productions of the MRL grammar are treated as
semantic concepts
• An SVM classifier with a string subsequence kernel is
trained for each production to identify whether an NL
substring represents that production’s semantic concept
• These classifiers are used to compositionally build
MRs of the sentences
Overview of KRISP
[Flowchart — Training: the MRL grammar and NL sentences with MRs are used to collect positive and negative examples, which train string-kernel-based SVM classifiers, yielding the semantic parser. Testing: novel NL sentences go through the semantic parser to produce the best MRs.]
Meaning Representation Language
MR: answer(traverse(next_to(stateid(‘texas’))))
Parse tree of the MR:
ANSWER → answer(RIVER)
  RIVER → TRAVERSE(STATE)
    TRAVERSE → traverse
    STATE → NEXT_TO(STATE)
      NEXT_TO → next_to
      STATE → STATEID
        STATEID → ‘texas’
Productions: ANSWER → answer(RIVER), RIVER → TRAVERSE(STATE),
TRAVERSE → traverse, STATE → NEXT_TO(STATE), NEXT_TO → next_to,
STATE → STATEID, STATEID → ‘texas’
Semantic Parsing by KRISP
• SVM classifier for each production gives the
probability that a substring represents the semantic
concept of the production
NEXT_TO  next_toNEXT_TO
0.02
 next_to
NEXT_TO
0.01  next_to 0.95
Which rivers run through the states bordering Texas?
13
TRAVERSE → traverse:   0.91      0.21
“Which rivers run through the states bordering Texas?”
Semantic Parsing by KRISP
• Semantic parsing is done by finding the most
probable derivation of the sentence [Kate & Mooney
2006]
ANSWER  answer(RIVER) 0.89
RIVER  TRAVERSE(STATE) 0.92
TRAVERSE  traverse 0.91
STATE  NEXT_TO(STATE) 0.81
NEXT_TO  next_to 0.95
STATE  STATEID 0.98
STATEID  ‘texas’ 0.99
Which rivers run through the states bordering Texas?
Probability of the derivation is the product of the probabilities
15
at the nodes.
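As a concrete illustration of this product, here is a minimal sketch; the tuple representation of derivation nodes is hypothetical, chosen only for this example, not KRISP’s actual data structure:

```python
from math import prod  # Python 3.8+

# A derivation node as (production, children, classifier_probability).
def derivation_probability(node):
    """Probability of a derivation = product of the SVM probabilities at all nodes."""
    production, children, p = node
    return p * prod(derivation_probability(child) for child in children)

# The derivation shown above:
texas    = ("STATEID -> 'texas'",       [],                 0.99)
stateid  = ("STATE -> STATEID",         [texas],            0.98)
next_to  = ("NEXT_TO -> next_to",       [],                 0.95)
state    = ("STATE -> NEXT_TO(STATE)",  [next_to, stateid], 0.81)
traverse = ("TRAVERSE -> traverse",     [],                 0.91)
river    = ("RIVER -> TRAVERSE(STATE)", [traverse, state],  0.92)
answer   = ("ANSWER -> answer(RIVER)",  [river],            0.89)

print(derivation_probability(answer))  # ≈ 0.56
```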
Overview of KRISP
[Flowchart: training now iterates — the trained classifiers produce the best semantic derivations (correct and incorrect) of the training sentences, and the resulting classification probabilities are used to collect new positive and negative examples for retraining.]
KRISP’s Training Algorithm
• Takes NL sentences paired with their respective
MRs as input
• Obtains MR parses
• Induces the semantic parser and refines it over
successive iterations
• In the first iteration, for every production:
– Call those sentences positives whose MR parses use
that production
– Call the remaining sentences negatives
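A minimal sketch of this first-iteration collection step, assuming each training pair carries the set of productions used in its MR parse (this interface is hypothetical):

```python
def first_iteration_examples(corpus, production):
    """Split training sentences into positives and negatives for one production.

    corpus: list of (sentence, productions_used) pairs, where productions_used
    is the set of productions appearing in the sentence's MR parse.
    """
    positives = [s for s, used in corpus if production in used]
    negatives = [s for s, used in corpus if production not in used]
    return positives, negatives

# e.g. first_iteration_examples(corpus, "STATE -> NEXT_TO(STATE)")
```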
KRISP’s Training Algorithm contd.
First Iteration
STATE  NEXT_TO(STATE)
Positives
Negatives
•which rivers run through the states bordering
texas?
•what state has the highest population ?
•what is the most populated state bordering
oklahoma ?
•which states have cities named austin ?
•what states does the delaware river run through ?
•what is the largest city in states that border
california ?
•what is the lowest point of the state with the
largest area ?
…
…
String-kernel-based
SVM classifier
18
String Subsequence Kernel
• Define the kernel between two strings as the number of
common subsequences between them [Lodhi et al., 2002]
s = “states that are next to”
t = “the states next to”
Common subsequences u: “states”, “next”, “to”, “states next”,
“states to”, “next to”, “states next to”
K(s,t) = 7
String Subsequence Kernel contd.
• The kernel is normalized to remove any bias due to different
string lengths:

  K_normalized(s,t) = K(s,t) / √( K(s,s) · K(t,t) )

• Lodhi et al. [2002] give an O(n|s||t|) algorithm for computing the
string subsequence kernel
• Used for text categorization [Lodhi et al., 2002] and
information extraction [Bunescu & Mooney, 2005]
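A small sketch of the unweighted, word-level version of this kernel; the full Lodhi et al. kernel also down-weights long gaps with a decay factor λ, which is omitted here:

```python
from math import sqrt

def subsequence_kernel(s, t):
    """Count matching subsequence pairs between word sequences s and t."""
    s, t = s.split(), t.split()
    n, m = len(s), len(t)
    # N[i][j] = number of matches between s[:i] and t[:j],
    # counting the empty subsequence once.
    N = [[1] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            N[i][j] = N[i - 1][j] + N[i][j - 1] - N[i - 1][j - 1]
            if s[i - 1] == t[j - 1]:
                N[i][j] += N[i - 1][j - 1]
    return N[n][m] - 1  # exclude the empty subsequence

def kernel_normalized(s, t):
    """Length-normalized kernel, as in the formula above."""
    return subsequence_kernel(s, t) / sqrt(
        subsequence_kernel(s, s) * subsequence_kernel(t, t))

print(subsequence_kernel("states that are next to", "the states next to"))  # 7
```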
String Subsequence Kernel contd.
• The examples are implicitly mapped to the feature space
of all subsequences, and the kernel computes the dot
products there
[Figure: example substrings for STATE → NEXT_TO(STATE), e.g. “the states next to”, “states that border”, “states bordering”, “states that share border” vs. “state with the capital of”, “states with area larger than”, “states through which”.]
Support Vector Machines
• SVMs find a separating hyperplane such that the margin
is maximized
[Figure: a separating hyperplane for STATE → NEXT_TO(STATE), with positives such as “states that are next to”, “the states next to”, “states that border”, “states bordering”, “states that share border” on one side, negatives such as “state with the capital of”, “states with area larger than”, “states through which” on the other, and a new example scored 0.97.]
The probability estimate of an example belonging to a class can be
obtained from its distance from the hyperplane [Platt, 1999]
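A rough sketch of how such a string-kernel SVM with Platt-scaled probabilities could be assembled with scikit-learn — an illustration of the idea only, not necessarily the implementation used in the thesis; kernel_normalized is the function sketched earlier, and the toy training strings are illustrative:

```python
import numpy as np
from sklearn.svm import SVC

def gram(A, B):
    """Gram matrix of normalized subsequence-kernel values."""
    return np.array([[kernel_normalized(a, b) for b in B] for a in A])

train = ["the states next to", "states that border", "states bordering",
         "state with the capital of", "states with area larger than",
         "states through which"]
labels = [1, 1, 1, 0, 0, 0]

clf = SVC(kernel="precomputed", probability=True)  # Platt-scaled probabilities
clf.fit(gram(train, train), labels)

test = ["states that share border"]
p = clf.predict_proba(gram(test, train))[:, 1]  # P(positive) per test string
```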
The string-kernel-based SVM classifiers trained from these first-iteration examples then output classification probabilities for substrings.
KRISP’s Training Algorithm contd.
• Using these classifiers, obtain the ω best semantic
derivations of each training sentence
• Derivations that yield the correct MR are called correct
derivations; those that yield incorrect MRs are called
incorrect derivations
• For the next iteration, collect positives from most
probable correct derivation
• Collect negatives from incorrect derivations with
higher probability than the most probable correct
derivation
KRISP’s Training Algorithm contd.
Most probable correct derivation:
(ANSWER → answer(RIVER), [1..9])
  (RIVER → TRAVERSE(STATE), [1..9])
    (TRAVERSE → traverse, [1..4])
    (STATE → NEXT_TO(STATE), [5..9])
      (NEXT_TO → next_to, [5..7])
      (STATE → STATEID, [8..9])
        (STATEID → ‘texas’, [8..9])
Which(1) rivers(2) run(3) through(4) the(5) states(6) bordering(7) Texas(8) ?(9)
Positive examples are collected from the (production, substring) pairs of this most probable correct derivation.
KRISP’s Training Algorithm contd.
Incorrect derivation with probability greater than the
most probable correct derivation:
(ANSWER → answer(RIVER), [1..9])
  (RIVER → TRAVERSE(STATE), [1..9])
    (TRAVERSE → traverse, [1..7])
    (STATE → STATEID, [8..9])
      (STATEID → ‘texas’, [8..9])
Which(1) rivers(2) run(3) through(4) the(5) states(6) bordering(7) Texas(8) ?(9)
Incorrect MR: answer(traverse(stateid(‘texas’)))
Negative examples are collected from the (production, substring) pairs of such incorrect derivations.
KRISP’s Training Algorithm contd.
Next iteration: more refined positive and negative examples
STATE → NEXT_TO(STATE)
Positives:
• the states bordering texas
• state bordering oklahoma
• states that border california
• states which share border
• next to state of iowa
• …
Negatives:
• what state has the highest population ?
• what states does the delaware river run through ?
• which states have cities named austin ?
• what is the lowest point of the state with the largest area ?
• which rivers run through states bordering
• …
→ String-kernel-based SVM classifier → better classification probabilities
Experimental Corpora
• CLang [Kate, Wong & Mooney, 2005]
– 300 randomly selected pieces of coaching advice from the log files
of the 2003 RoboCup Coach Competition
– 22.52 words on average in NL sentences
– 13.42 tokens on average in MRs
• Geoquery [Tang & Mooney, 2001]
– 880 queries for the given U.S. geography database
– 7.48 words on average in NL sentences
– 6.47 tokens on average in MRs
Experimental Methodology
• Evaluated using standard 10-fold cross validation
• Correctness
– CLang: output exactly matches the correct
representation
– Geoquery: the resulting query retrieves the same
answer as the correct representation
• Metrics:

  Precision = (Number of correct MRs) / (Number of test sentences with complete output MRs)

  Recall = (Number of correct MRs) / (Number of test sentences)
Experimental Methodology contd.
• Compared Systems:
– CHILL [Tang & Mooney, 2001]: Inductive Logic
Programming based semantic parser
– SCISSOR [Ge & Mooney, 2005]: learns an integrated
syntactic-semantic parser, needs extra annotations
– WASP [Wong & Mooney, 2006]: uses statistical machine
translation techniques
– Zettlemoyer & Collins (2007): Combinatory
Categorial Grammar (CCG) based semantic parser
• Different Experimental Setup (600 training, 280
testing examples)
• Requires an initial hand-built lexicon
Experimental Methodology contd.
• KRISP gives probabilities for its semantic
derivations, which are taken as confidences in the
output MRs
• We plot precision-recall curves by sorting the best
MR of each test sentence by confidence and then
computing precision at every recall value
• WASP and SCISSOR also output confidences so
we show their precision-recall curves
• Results of other systems shown as points on
precision-recall graphs
Results on CLang
[Precision-recall curves for KRISP, WASP, SCISSOR (which requires more annotation on the training corpus), and Zettlemoyer & Collins. CHILL gives 49.2% precision and 12.67% recall with 160 examples and cannot be run beyond that.]
Results on Geoquery
[Precision-recall curves for the compared systems on the Geoquery corpus.]
Robustness of KRISP
• KRISP does not use grammar rules for natural
language
• String-kernel-based classification softly captures a
wide range of natural language expressions
⇒ Robust to rephrasing and noise
TRAVERSE → traverse  0.95: Which rivers run through the states bordering Texas?
TRAVERSE → traverse  0.78: Which are the rivers that run through the states bordering Texas? (rephrased)
TRAVERSE → traverse  0.68: Which rivers run though the states bordering Texas? (substituted word)
TRAVERSE → traverse  0.65: Which rivers ___ through the states bordering Texas? (dropped word)
TRAVERSE → traverse  0.81: Which rivers ahh.. run through the states bordering Texas? (interjection)
Experiments with Noisy NL Sentences
• Any application of a semantic parser is likely to face
noise in the input
• If the input is coming from a speech recognizer:
– Interjections (um’s and ah’s)
– Environment noise (door slams, phone rings etc.)
– Out-of-domain words, ill-formed utterances etc.
• We demonstrate robustness of KRISP by
introducing simulated speech recognition errors in
the corpus
Experiments with Noisy NL Sentences contd.
• Noise was introduced into the NL sentences by:
– Adding extra words chosen according to their frequencies in the
BNC
– Dropping words at random
– Substituting words with phonetically close high-frequency words
• Four levels of noise were created by increasing the
above probabilities
• Results are shown with only the test sentences corrupted;
results are qualitatively similar when both training and test
sentences are corrupted
• We report best F-measures (harmonic mean of precision
and recall)
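A sketch of such a corruption process; the probabilities and word lists here are illustrative stand-ins (e.g. the frequent-word list substitutes for true BNC frequencies), not the parameters used in the experiments:

```python
import random

def corrupt(words, p_add=0.1, p_drop=0.1, p_sub=0.1,
            common_words=("the", "a", "of", "to"), confusions=None):
    """Simulate speech-recognition noise on a tokenized sentence."""
    confusions = confusions or {}  # word -> phonetically close frequent words
    noisy = []
    for w in words:
        if random.random() < p_add:           # insert an extra word
            noisy.append(random.choice(common_words))
        if random.random() < p_drop:          # drop this word
            continue
        if w in confusions and random.random() < p_sub:
            w = random.choice(confusions[w])  # substitute a similar-sounding word
        noisy.append(w)
    return noisy

# Higher noise levels would simply raise p_add, p_drop, and p_sub.
print(corrupt("which rivers run through the states bordering texas ?".split()))
```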
Results on Noisy CLang Corpus
[Plot: best F-measure vs. noise level (0–5) for KRISP, WASP, and SCISSOR.]
Outline
• KRISP: A Supervised Learning System
• Utilizing Weaker Forms of Supervision
– Semi-supervision
– Ambiguous supervision
• Transforming meaning representation grammar
• Directions for Future Work
• Conclusions
Semi-Supervised Semantic Parsing
• Building annotated training data is expensive
• Utilize NL sentences not annotated with their
MRs, usually cheaply available
• KRISP can be turned into a semi-supervised
learner if the SVM classifiers are given
appropriate unlabeled examples
• Which substrings should be the unlabeled
examples for which productions’ SVMs?
SEMISUP-KRISP: Semi-Supervised
Semantic Parser Learner
[Kate & Mooney, 2007a]
• First learns a semantic parser from the supervised
data using KRISP
SEMISUP-KRISP: Semi-Supervised
Semantic Parser Learner contd.
[Diagram: KRISP is trained on the supervised corpus (NL sentences paired with MRs, e.g. “Which rivers run through the states bordering Texas?” / answer(traverse(next_to(stateid(‘texas’))))), collecting labeled examples for the SVM classifiers; the learned parser is then applied to the unsupervised corpus of unannotated sentences (e.g. “Which states have a city named Springfield?”, “How many rivers flow through Mississippi?”).]
SEMISUP-KRISP: Semi-Supervised
Semantic Parser Learner
• First learns a semantic parser from the supervised
data using KRISP
• Applies the learned parser to the unsupervised NL
sentences
• Whenever an SVM classifier is called to estimate
the probability of a substring, that substring
becomes an unlabeled example for that classifier
• These substrings are representative of the examples
that the classifiers will encounter during testing
[Figure: the classifiers for TRAVERSE → traverse and NEXT_TO → next_to scoring substrings of “Which rivers run through the states bordering Texas?”]
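One simple way to realize this collection step is to wrap each production’s classifier so that it records every substring it is asked to score — a sketch only; the classifier interface here is hypothetical:

```python
class RecordingClassifier:
    """Wraps one production's SVM classifier and records every substring it
    scores; the recorded substrings later serve as unlabeled examples."""

    def __init__(self, svm_classifier):
        self.svm_classifier = svm_classifier
        self.unlabeled = []

    def probability(self, substring):
        self.unlabeled.append(substring)                    # collect unlabeled example
        return self.svm_classifier.probability(substring)   # assumed interface
```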
SVMs with Unlabeled Examples
Production: NEXT_TO  next_to
-
state with the capital of
area larger than
through which
- -
Separating
hyperplane
+
-
states that are next to
+
+
the states next to
states that border
+
the states bordering
+
+
+
states that share border
59
SVMs with Unlabeled Examples
Production: NEXT_TO  next_to
-
+
- - -
+
+
+
+
+
+
Using unlabeled test examples during training can help find a
better hyperplane [Joachims 1999]
60
Transductive SVMs
• Find a labeling that separates all the examples with
maximum margin
• Finding the exact solution is intractable, but
approximation algorithms exist [Joachims, 1999;
Chen et al., 2003; Collobert et al., 2006]
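Transductive SVMs themselves are not available in scikit-learn; as a rough stand-in, a self-training wrapper around an SVM also exploits unlabeled examples. This is plain self-training, not Joachims’ TSVM, so results will differ; the random data is purely illustrative:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.semi_supervised import SelfTrainingClassifier

# X_lab, y_lab: labeled feature vectors and 0/1 labels;
# X_unlab: unlabeled feature vectors. By sklearn convention,
# unlabeled points carry the label -1.
X_lab, y_lab = np.random.rand(20, 5), np.random.randint(0, 2, 20)
X_unlab = np.random.rand(50, 5)

X = np.vstack([X_lab, X_unlab])
y = np.concatenate([y_lab, -np.ones(len(X_unlab), dtype=int)])

clf = SelfTrainingClassifier(SVC(probability=True), threshold=0.8)
clf.fit(X, y)  # iteratively labels confident unlabeled points and retrains
```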
SEMISUP-KRISP: Semi-Supervised
Semantic Parser Learner contd.
[Diagram: labeled examples collected from the supervised corpus and unlabeled examples collected from the unsupervised corpus together train transductive SVM classifiers, yielding the learned semantic parser.]
Experiments
• Compared the performance of SEMISUP-KRISP
and KRISP on the Geoquery domain
• Corpus contains 250 NL sentences annotated with
their correct MRs
• Collected 1037 unannotated sentences from our
web-based demo
• Evaluated by 10-fold cross validation, keeping the
unsupervised data the same in each fold
• Increased the amount of supervised training data
and measured the best F-measure
Results
Learning curves on the Geoquery corpus
[Plot: best F-measure vs. number of supervised training examples (0–240) for SEMISUP-KRISP and KRISP, with GEOBASE shown as a baseline; SEMISUP-KRISP reaches the same performance with roughly 25% fewer supervised examples.]
GEOBASE: hand-built semantic parser [Borland International, 1988]
Outline
• KRISP: A Supervised Learning System
• Utilizing Weaker Forms of Supervision
– Semi-supervision
– Ambiguous supervision
• Transforming meaning representation grammar
• Directions for Future Work
• Conclusions
Unambiguous Supervision for Learning
Semantic Parsers
• The training data for semantic parsing consists of
hundreds of natural language sentences
unambiguously paired with their meaning
representations
Which rivers run through the states bordering Texas?
answer(traverse(next_to(stateid(‘texas’))))
What is the lowest point of the state with the largest area?
answer(lowest(place(loc(largest_one(area(state(all)))))))
What is the largest city in states that border California?
answer(largest(city(loc(next_to(stateid( 'california'))))))
……
Shortcomings of Unambiguous Supervision
• It requires considerable human effort to annotate
each sentence with its correct meaning
representation
• Does not model the type of supervision children
receive when they are learning a language
– Children are not taught meanings of individual
sentences
– They learn to identify the correct meaning of a sentence
from several meanings possible in their perceptual
context
Ambiguous Supervision for Learning
Semantic Parsers
• A computer system simultaneously exposed to perceptual
contexts and natural language utterances should be able to
learn the underlying language semantics
• We consider ambiguous training data of sentences associated
with multiple potential meaning representations
– Siskind (1996) uses this type of “referentially uncertain” training data to
learn meanings of words
• Capturing meaning representations from perceptual contexts
is a difficult unsolved problem
– Our system directly works with symbolic meaning representations
Ambiguous Training Example
“Mary is on the phone”   ???
Candidate MRs from the perceptual context:
  Ironing(Mommy, Shirt)
  Carrying(Daddy, Bag)
  Working(Sister, Computer)
  Talking(Mary, Phone)
  Sitting(Mary, Chair)
Next Ambiguous Training Example
“Mommy is ironing shirt”   ???
Candidate MRs from the perceptual context:
  Ironing(Mommy, Shirt)
  Working(Sister, Computer)
  Talking(Mary, Phone)
  Sitting(Mary, Chair)
Ambiguous Supervision for Learning
Semantic Parsers contd.
• Our model of ambiguous supervision corresponds
to the type of data that will be gathered from a
temporal sequence of perceptual contexts with
occasional language commentary
• We assume each sentence has exactly one meaning
in a perceptual context
• Each meaning is associated with at most one
sentence in a perceptual context
Sample Ambiguous Corpus
[Figure: a bipartite graph linking each sentence to several candidate MRs.
Sentences: Daisy gave the clock to the mouse. / Mommy saw that Mary gave the hammer to the dog. / The dog broke the box. / John gave the bag to the mouse. / The dog threw the ball.
Candidate MRs: gave(daisy, clock, mouse), ate(mouse, orange), ate(dog, apple), saw(mother, gave(mary, dog, hammer)), broke(dog, box), gave(woman, toy, mouse), gave(john, bag, mouse), threw(dog, ball), runs(dog), saw(john, walks(man, dog))]
Forms a bipartite graph
KRISPER: KRISP with EM-like Retraining
• Extension of KRISP that learns from ambiguous
supervision
• Uses an iterative EM-like method to gradually
converge on a correct meaning for each sentence
• Given a sentence and a meaning representation,
KRISP can also find the probability that it is the
correct meaning representation for the sentence
KRISPER’s Training Algorithm
1. Assume every possible meaning for a sentence is correct
[Figure: every sentence in the bipartite graph is linked to all of its candidate MRs.]
KRISPER’s Training Algorithm contd.
2. The resulting NL–MR pairs are weighted and given to KRISP
[Figure: each sentence’s candidate links receive equal weights — 1/2, 1/4, 1/5, or 1/3, depending on its number of candidate MRs.]
KRISPER’s Training Algorithm contd.
3. Estimate the confidence of each NL–MR pair using the resulting parser
[Figure: the links now carry parser confidences, e.g. 0.92 for “Daisy gave the clock to the mouse.” ↔ gave(daisy, clock, mouse), against lower scores such as 0.11 and 0.32 for its competing candidates.]
KRISPER’s Training Algorithm contd.
4. Use maximum-weighted matching on the bipartite graph
to find the best NL–MR pairs [Munkres, 1957]
[Figure: the highest-confidence consistent pairing is selected from the weighted bipartite graph.]
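A sketch of this matching step using SciPy’s implementation of the Hungarian (Munkres) algorithm; the confidence matrix below is illustrative, with 0 where an MR is not a candidate for a sentence:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# conf[i][j] = parser confidence that MR j is the meaning of sentence i.
conf = np.array([
    [0.92, 0.11, 0.32, 0.00],
    [0.00, 0.22, 0.24, 0.88],
])

rows, cols = linear_sum_assignment(conf, maximize=True)
best_pairs = list(zip(rows, cols))  # sentence index -> matched MR index
```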
KRISPER’s Training Algorithm contd.
5. Give the best pairs to KRISP in the next iteration;
continue until convergence
[Figure: each sentence is now linked only to its single best-matching MR.]
Ambiguous Corpus Construction
• To our knowledge, no real-world ambiguous corpus is
yet available for semantic parsing
• We artificially obfuscated the real-world
unambiguous corpus by adding extra distracter
MRs to each training pair (Ambig-Geoquery)
• We also created an artificial ambiguous corpus
(Ambig-ChildWorld) that more accurately
models real-world ambiguities, in which potential
candidate MRs are often related
Ambiguity in Corpora
• Three levels of ambiguity were created:

  MRs per NL:   1    2    3    4    5    6    7
  Level 1:     25%  50%  25%
  Level 2:     11%  22%  34%  22%  11%
  Level 3:      6%  13%  19%  26%  18%  12%   6%
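A sketch of drawing the number of candidate MRs per sentence from these distributions:

```python
import random

LEVELS = {
    1: ([1, 2, 3],             [0.25, 0.50, 0.25]),
    2: ([1, 2, 3, 4, 5],       [0.11, 0.22, 0.34, 0.22, 0.11]),
    3: ([1, 2, 3, 4, 5, 6, 7], [0.06, 0.13, 0.19, 0.26, 0.18, 0.12, 0.06]),
}

def num_candidate_mrs(level):
    """Sample how many MRs a sentence is paired with at a given ambiguity level."""
    values, weights = LEVELS[level]
    return random.choices(values, weights=weights)[0]
```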
Results on Ambig-Geoquery Corpus
[Plot: best F-measure vs. number of training examples (225–900) for no ambiguity and ambiguity levels 1–3.]
Results on Ambig-ChildWorld Corpus
[Plot: best F-measure vs. number of training examples (225–900) for no ambiguity and ambiguity levels 1–3.]
Outline
• KRISP: A Supervised Learning System
• Utilizing Weaker Forms of Supervision
– Semi-supervision
– Ambiguous supervision
• Transforming meaning representation grammar
• Directions for Future Work
• Conclusions
Why Transform Meaning Representation
Grammar?
• Productions of meaning representation grammar
(MRG) may not correspond well with NL semantics
CLang MR expression: (rec (pt -32 -35) (pt 0 35))
REGION → (rec POINT POINT)
  POINT → (pt NUM NUM)
    NUM → -32
    NUM → -35
  POINT → (pt NUM NUM)
    NUM → 0
    NUM → 35
“our midfield” ↔ ????
Geoquery MR: answer(longest(river(loc_2(stateid(‘Texas’)))))
ANSWER → answer ( RIVER )
RIVER → longest ( RIVER )
RIVER → river ( LOCATIONS )
LOCATIONS → loc_2 ( STATE )
STATE → STATEID
STATEID → stateid ( ‘Texas’ )
“Which is the longest river in Texas?”
Manual Engineering of MRG
• Several awkward constructs in the original CLang
grammar were manually replaced with NL-compatible
MR expressions
• The MRG for Geoquery was manually constructed for
its functional MRL, which was derived from the
original Prolog expressions
• This requires expertise in the MRL and domain
knowledge
⇒ Automatically transform the MRG to improve semantic
parsing
Transforming Meaning Representation
Grammar
• Train KRISP using the given MRG and parse the
training sentences
• Collect “bad” productions which KRISP often
uses incorrectly (its output MR parses use them
but the correct MR parses do not, or vice versa)
• Modify these productions using four Context-Free
Grammar transformation operators
• The transformed MRG accepts the same MRL as
the original MRG
Transformation Operators
1. Create non-terminal from a terminal: introduces a
new semantic concept
Bad productions:
  STATE → largest STATE
  CITY → largest CITY
  PLACE → largest PLACE
After the operator:
  STATE → LARGEST STATE
  CITY → LARGEST CITY
  PLACE → LARGEST PLACE
  LARGEST → largest
Transformation Operators
2. Merge non-terminals: generalizes productions (see the sketch below)
Bad productions:
  STATE → LARGEST STATE      STATE → SMALLEST STATE
  CITY → LARGEST CITY        CITY → SMALLEST CITY
  PLACE → LARGEST PLACE      PLACE → SMALLEST PLACE
After the operator:
  STATE → QUALIFIER STATE
  CITY → QUALIFIER CITY
  PLACE → QUALIFIER PLACE
  QUALIFIER → LARGEST
  QUALIFIER → SMALLEST
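A sketch of the merge operator on a grammar represented as (lhs, rhs) pairs — a hypothetical representation chosen for illustration; the thesis operates on the actual MRG:

```python
def merge_nonterminals(productions, group, new_nt):
    """Merge the non-terminals in `group` into a single new one.

    Occurrences of the merged non-terminals on right-hand sides are
    replaced by `new_nt`, and a unary production new_nt -> old_nt is
    added for each, so the accepted MRL stays unchanged.
    """
    merged = {(lhs, tuple(new_nt if sym in group else sym for sym in rhs))
              for lhs, rhs in productions}
    merged |= {(new_nt, (nt,)) for nt in group}
    return merged

grammar = {("STATE", ("LARGEST", "STATE")), ("CITY", ("LARGEST", "CITY")),
           ("STATE", ("SMALLEST", "STATE")), ("CITY", ("SMALLEST", "CITY"))}
print(merge_nonterminals(grammar, {"LARGEST", "SMALLEST"}, "QUALIFIER"))
```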
Transformation Operators
3. Combine non-terminals: combines the concepts
Bad productions:
  CITY → SMALLEST MAJOR CITY
  LAKE → SMALLEST MAJOR LAKE
After the operator:
  CITY → SMALLEST_MAJOR CITY
  LAKE → SMALLEST_MAJOR LAKE
  SMALLEST_MAJOR → SMALLEST MAJOR
Transformation Operators
4. Delete production: eliminates a semantic concept
Bad productions:
  LEFTBR → (
  RIGHTBR → )
used in:
  NUM → AREA LEFTBR STATE RIGHTBR
  NUM → DENSITY LEFTBR CITY RIGHTBR
After the operator:
  NUM → AREA ( STATE )
  NUM → DENSITY ( CITY )
MRG Transformation Algorithm
• A heuristic search is used to find a good MRG
among all possible MRGs
• All possible instances of each type of operator are
applied, then the training examples are re-parsed
and the semantic parser is re-trained
• Two iterations were sufficient for convergence of
performance
Results on Geoquery Using Transformation
Operators
[Plot: results on the Geoquery corpus with the original vs. transformed MRG.]
Rest of the Dissertation
• Utilizing more supervision
– Utilize syntactic parses using a tree kernel
– Utilize semantically augmented parse trees [Ge &
Mooney, 2005]
⇒ Not much improvement in performance
• Meaning representation macros to transform the MRG
• Ensembles of semantic parsers
– A simple majority ensemble of KRISP, WASP, and
SCISSOR achieves the best overall performance
Outline
• KRISP: A Supervised Learning System
• Utilizing Weaker Forms of Supervision
– Semi-supervision
– Ambiguous supervision
• Transforming meaning representation grammar
• Directions for Future Work
• Conclusions
Directions for Future Work
• Improve KRISP’s semantic parsing framework
– Do not make the independence assumption
– Allow words to overlap
⇒ Will increase the complexity of the system
ANSWER  answer(RIVER) 0.89
RIVER  TRAVERSE(STATE) 0.92
TRAVERSE  traverse 0.91
STATE  NEXT_TO(STATE) 0.81
NEXT_TO  next_to 0.95
STATE  STATEID 0.98
STATEID  ‘texas’ 0.99
Which rivers run through the states bordering Texas? 11
4
Directions for Future Work contd.
• Better kernels:
– Dependency tree kernels
– Use word categories or domain-specific word ontology
– Noise resistant kernel
• Learn from perceptual contexts
– Combine with a vision-based system to map real-world
perceptual contexts into symbolic MRs
Directions for Future Work contd.
Structured Information Extraction
• Most IE work has focused on extracting single
entities or binary relations, e.g. “person”,
“company”, “employee-of”
• Structured IE like extracting complex n-ary
relations [McDonald et al., 2005] is more useful in
automatically building databases and text mining
• Level of semantic analysis required is intermediate
between normal IE and semantic parsing
Directions for Future Work contd.
Complex relation: (person, job, company)
NL sentence: John Smith is the CEO of Inc. Corp.
MR: (John Smith, CEO, Inc. Corp.)
[Figure: the ternary relation (person, job, company) decomposed over the sentence into the sub-relations (person, job) and (job, company).]
KRISP should be applicable to extracting complex relations
by treating a complex relation like a higher-level production
composed of lower-level productions.
Directions for Future Work contd.
Broaden the applicability of semantic parsers to open
domains
• It is difficult to construct one MRL for an open domain
• But a suitable MRL may be constructed by
narrowing down the meaning of open-domain
natural language based on the actions expected
from the computer
• Will need help from open-domain techniques such as
word-sense disambiguation, anaphora resolution,
etc.
Conclusions
• A new string-kernel-based approach for learning
semantic parsers that is more robust to noisy input
• An extension for semi-supervised semantic parsing
that utilizes unannotated training data
• Learns from a more general and weaker form of
supervision: ambiguous supervision
• Transforms the meaning representation grammar to
improve semantic parsing
• In the future, the scope and applicability of semantic
parsing can be broadened
Thank You!
Questions??