CS 388:
Natural Language Processing:
Semantic Parsing
Raymond J. Mooney
University of Texas at Austin
1
Representing Meaning
• Representing the meaning of natural language is
ultimately a difficult philosophical question, i.e.
the “meaning of meaning”.
• Traditional approach is to map ambiguous NL to
unambiguous logic in first-order predicate
calculus (FOPC).
• Standard inference (theorem proving) methods
exist for FOPC that can determine when one
statement entails (implies) another. Questions can
be answered by determining what potential
responses are entailed by given NL statements and
background knowledge all encoded in FOPC.
2
Model Theoretic Semantics
• Meaning of traditional logic is based on model theoretic
semantics which defines meaning in terms of a model (a.k.a.
possible world), a set-theoretic structure that defines a
(potentially infinite) set of objects with properties and relations
between them.
• A model is a connecting bridge between language and the world
by representing the abstract objects and relations that exist in a
possible world.
• An interpretation is a mapping from logic to the model that
defines predicates extensionally, in terms of the set of tuples of
objects that make them true (their denotation or extension).
– The extension of Red(x) is the set of all red things in the world.
– The extension of Father(x,y) is the set of all pairs of objects <A,B> such
that A is B’s father.
3
Truth-Conditional Semantics
• Model theoretic semantics gives the truth
conditions for a sentence, i.e. a model satisfies a
logical sentence iff the sentence evaluates to true
in the given model.
• The meaning of a sentence is therefore defined as
the set of all possible worlds in which it is true.
4
Semantic Parsing
• Semantic Parsing: Transforming natural
language (NL) sentences into completely
formal logical forms or meaning
representations (MRs).
• Sample application domains where MRs are
directly executable by another computer
system to perform some task.
– CLang: Robocup Coach Language
– Geoquery: A Database Query Application
5
CLang: RoboCup Coach Language
• In the RoboCup Coach competition, teams compete to
coach simulated players [http://www.robocup.org]
• The coaching instructions are given in a formal
language called CLang [Chen et al. 2003]

    NL:    If the ball is in our goal area then player 1 should intercept it.
             |  Semantic Parsing
             v
    CLang: (bpos (goal-area our) (do our {1} intercept))

    [Figure: simulated soccer field]
6
Geoquery:
A Database Query Application
• Query application for U.S. geography database
containing about 800 facts [Zelle & Mooney, 1996]
    NL:     Which rivers run through the states bordering Texas?
              |  Semantic Parsing
              v
    Query:  answer(traverse(next_to(stateid('texas'))))
              |
              v
    Answer: Arkansas, Canadian, Cimarron, Gila, Mississippi, Rio Grande, ...
7
Procedural Semantics
• The meaning of a sentence is a formal
representation of a procedure that performs
some action that is an appropriate response.
– Answering questions
– Following commands
• In philosophy, the “late” Wittgenstein was
known for the “meaning as use” view of
semantics compared to the model theoretic
view of the “early” Wittgenstein and other
logicians.
8
Predicate Logic Query Language
• Most existing work on computational
semantics is based on predicate logic
What is the smallest state by area?
answer(x1,smallest(x2,(state(x1),area(x1,x2))))
x1 is a logical variable that denotes “the
smallest state by area”
9
Functional Query Language (FunQL)
• Transform a logical language into a functional,
variable-free language (Kate et al., 2005)
What is the smallest state by area?
answer(x1,smallest(x2,(state(x1),area(x1,x2))))
answer(smallest_one(area_1(state(all))))
10
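To make the variable-free form concrete, here is a minimal sketch in Python of how such a nested FunQL expression can be executed as ordinary function composition. The three-row database and the function bodies are illustrative assumptions; the real Geoquery system evaluates queries against its full 800-fact database.

    # Toy database: state name -> area in square miles (illustrative values only)
    STATE_AREAS = {"texas": 268596, "alaska": 665384, "rhode island": 1545}

    def state(_all):              # state(all): the set of all states
        return set(STATE_AREAS)

    def area_1(state_set):        # area_1(S): map each state in S to its area
        return {s: STATE_AREAS[s] for s in state_set}

    def smallest_one(area_map):   # smallest_one(M): the state whose area is smallest
        return min(area_map, key=area_map.get)

    def answer(x):                # answer(X): the final result
        return x

    print(answer(smallest_one(area_1(state("all")))))   # rhode island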
Learning Semantic Parsers
• Manually programming robust semantic parsers
is difficult due to the complexity of the task.
• Semantic parsers can be learned automatically
from sentences paired with their logical form.
NLMR
Training Exs
Natural
Language
Semantic-Parser
Learner
Semantic
Parser
Meaning
Rep
11
Engineering Motivation
• Most computational language-learning research strives
for broad coverage while sacrificing depth.
– “Scaling up by dumbing down”
• Realistic semantic parsing currently entails domain
dependence.
• Domain-dependent natural-language interfaces have a
large potential market.
• Learning makes developing specific applications more
tractable.
• Training corpora can be easily developed by tagging
existing corpora of formal statements with natural-language glosses.
12
Cognitive Science Motivation
• Most natural-language learning methods
require supervised training data that is not
available to a child.
– General lack of negative feedback on grammar.
– No POS-tagged or treebank data.
• Assuming a child can infer the likely
meaning of an utterance from context,
NLMR pairs are more cognitively
plausible training data.
13
Our Semantic-Parser Learners
• CHILL+WOLFIE (Zelle & Mooney, 1996; Thompson & Mooney,
1999, 2003)
– Separates parser-learning and semantic-lexicon learning.
– Learns a deterministic parser using ILP techniques.
• COCKTAIL (Tang & Mooney, 2001)
– Improved ILP algorithm for CHILL.
• SILT (Kate, Wong & Mooney, 2005)
– Learns symbolic transformation rules for mapping directly from NL to LF.
• SCISSOR (Ge & Mooney, 2005)
– Integrates semantic interpretation into Collins’ statistical syntactic parser.
• WASP (Wong & Mooney, 2006)
– Uses syntax-based statistical machine translation methods.
• KRISP (Kate & Mooney, 2006)
– Uses a series of SVM classifiers employing a string-kernel to iteratively build
semantic representations.
14
CHILL
(Zelle & Mooney, 1992-96)
• Semantic parser acquisition system using Inductive
Logic Programming (ILP) to induce a parser
written in Prolog.
• Starts with a deterministic parsing “shell” written
in Prolog and learns to control the operators of this
parser to produce the given I/O pairs.
• Requires a semantic lexicon, which for each word
gives one or more possible meaning
representations.
• Parser must disambiguate words, introduce proper
semantic representations for each, and then put
them together in the right way to produce a proper
representation of the sentence.
15
CHILL Example
• U.S. Geographical database
– Sample training pair
  • ¿Cuál es el capital del estado con la población más grande?
    (What is the capital of the state with the largest population?)
  • answer(C, (capital(S,C), largest(P, (state(S), population(S,P)))))
– Sample semantic lexicon
  • cuál : answer(_,_)
  • capital : capital(_,_)
  • estado : state(_)
  • más grande : largest(_,_)
  • población : population(_,_)
16
WOLFIE
(Thompson & Mooney, 1995-1999)
• Learns a semantic lexicon for CHILL from the
same corpus of semantically annotated sentences.
• Determines hypotheses for word meanings by
finding largest isomorphic common subgraphs
shared by meanings of sentences in which the
word appears.
• Uses a greedy-covering style algorithm to learn a
small lexicon sufficient to allow compositional
construction of the correct representation from the
words in a sentence.
17
WOLFIE + CHILL
Semantic Parser Acquisition
NLMR
Training Exs
WOLFIE
Lexicon Learner
Semantic
Lexicon
CHILL
Parser Learner
Natural
Language
Semantic
Parser
Meaning
Rep
18
Compositional Semantics
• Approach to semantic analysis based on building up
an MR compositionally based on the syntactic
structure of a sentence.
• Build MR recursively bottom-up from the parse tree.
    BuildMR(parse-tree):
        if parse-tree is a terminal node (word):
            return an atomic lexical meaning for the word
        else:
            for each child subtree_i of parse-tree:
                MR_i = BuildMR(subtree_i)
            return an MR formed by properly combining the resulting MRs
                for the children into an MR for the overall parse-tree
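A minimal sketch of BuildMR in Python. The list-based tree, the toy lexicon, and the compose() function below are illustrative assumptions, not the composition functions of any of the actual systems discussed later.

    def build_mr(tree, lexicon, compose):
        """Recursively build a meaning representation (MR) bottom-up from a parse tree."""
        if isinstance(tree, str):                # terminal node: a word
            return lexicon.get(tree)             # atomic lexical meaning (None if semantically vacuous)
        child_mrs = [build_mr(child, lexicon, compose) for child in tree]
        return compose(child_mrs)                # combine the children's MRs into one MR

    def compose(child_mrs):
        """Toy combination: plug each non-null child MR into the previous result's empty () slot."""
        mrs = [m for m in child_mrs if m is not None]
        if not mrs:
            return None
        result = mrs[0]
        for m in mrs[1:]:
            result = result.replace("()", "(" + m + ")", 1)
        return result

    lexicon = {"capital": "capital()", "of": "loc_2()", "Ohio": "stateid('ohio')"}
    tree = ["is", ["the", "capital", ["of", "Ohio"]]]    # "is" and "the" are vacuous
    print(build_mr(tree, lexicon, compose))              # capital(loc_2(stateid('ohio')))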
Composing MRs from Parse Trees

What is the capital of Ohio?

    Ohio → stateid('ohio')
    of → loc_2();   "of Ohio" → loc_2(stateid('ohio'))
    capital → capital();   "the capital of Ohio" → capital(loc_2(stateid('ohio')))
    What → answer();   whole sentence (S) → answer(capital(loc_2(stateid('ohio'))))

    [Figure: the syntactic parse tree of the sentence with these MRs attached to
     each node, composed bottom-up]
20
Disambiguation with
Compositional Semantics
• The composition function that combines the MRs
of the children of a node can return ⊥ (a null meaning) if there is no
sensible way to compose the children's meanings.
• Could compute all parse trees up front and then
compute semantics for each, eliminating any that
ever generate a ⊥ semantics for any constituent.
• More efficient method (see the sketch below):
– When filling the (CKY) chart of syntactic phrases, also
compute all possible compositional semantics of each
phrase as it is constructed and make an entry for each.
– If a given phrase only gives ⊥ semantics, then remove
this phrase from the chart, thereby eliminating any parse
that includes this meaningless phrase.
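A minimal sketch of the pruning step just described. The chart-cell representation and the compose_syntax/compose_semantics helpers are hypothetical placeholders for whatever grammar and composition function a particular system uses.

    def combine_cells(left_entries, right_entries, compose_syntax, compose_semantics):
        """Combine two adjacent chart cells; entries whose semantics is None (⊥) never enter the chart."""
        combined = []
        for left_cat, left_sem in left_entries:
            for right_cat, right_sem in right_entries:
                cat = compose_syntax(left_cat, right_cat)
                if cat is None:                  # no syntactic rule applies
                    continue
                sem = compose_semantics(left_sem, right_sem)
                if sem is None:                  # ⊥: meaningless phrase, prune it
                    continue
                combined.append((cat, sem))
        return combined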
Composing MRs from Parse Trees

What is the capital of Ohio?

    [Figure: the same parse where "Ohio" is instead interpreted as a river,
     riverid('ohio'); the meanings then fail to compose (⊥) higher in the tree,
     so this interpretation is eliminated.]
22
Composing MRs from Parse Trees

What is the capital of Ohio?

    [Figure: an alternative syntactic parse of the same sentence (a different
     attachment of the PP "of Ohio"); some constituents' meanings cannot be
     sensibly composed, so they receive ⊥ and this parse is eliminated.]
SCISSOR:
Semantic Composition that Integrates Syntax
and Semantics to get Optimal Representations
24
SCISSOR
• An integrated syntax-based approach
– Allows both syntax and semantics to be used
simultaneously to build meaning representations
• A statistical parser is used to generate a semantically
augmented parse tree (SAPT)
– Example SAPT for "our player 2 has the ball":
      (S-bowner (NP-player (PRP$-team our) (NN-player player) (CD-unum 2))
                (VP-bowner (VB-bowner has) (NP-null (DT-null the) (NN-null ball))))
• Translate a SAPT into a complete formal meaning
representation (MR) using a meaning composition process
      MR: bowner(player(our,2))
25
Semantic Composition Example

SAPT for "our player 2 has the ball", with the composed semantics at each node:

    S-bowner(player(our,2))
      NP-player(our,2):  PRP$-our "our", NN-player(_,_) "player", CD-2 "2"
      VP-bowner(_):      VB-bowner(_) "has", NP-null (DT-null "the", NN-null "ball")

Predicate argument frames: player(team, unum) and bowner(player).
Labels such as PRP$-our and CD-2 require no arguments; NN-player(_,_) and
VB-bowner(_) require arguments to be filled; the NP-null constituent
("the ball") is semantically vacuous.
26
(The same SAPT is shown in two further animation steps, filling the arguments of
player(team, unum) and bowner(player) bottom-up until the root carries the
complete MR bowner(player(our,2)).)
SCISSOR
• An integrated syntax-based approach
– Allows both syntax and semantics to be used
simultaneously to build meaning representations
• A statistical parser is used to generate a semantically
augmented parse tree (SAPT)
• Translate a SAPT into a complete formal meaning
representation (MR) using a meaning composition
process
• Allow statistical modeling of semantic selectional
constraints in application domains
– (AGENT pass) = PLAYER
29
Overview of SCISSOR
    [Diagram: Training: SAPT training examples → learner → Integrated Semantic Parser.
     Testing: NL sentence → Integrated Semantic Parser → SAPT → ComposeMR → MR.]
30
Extending Collins’ (1997) Syntactic Parsing Model
• Collins (1997) introduced a lexicalized head-driven syntactic parsing model
• Bikel (2004) provides an easily-extended open-source version of the Collins statistical parser
• The parsing model is extended to generate semantic labels simultaneously with
syntactic labels, constrained by the semantic constraints of the application
domain
31
Integrating Semantics into the Model
• Use the same Markov processes
• Add a semantic label to each node
• Add semantic subcat frames
– Give semantic subcategorization preferences
– bowner takes a player as its argument
    [Figure: for "our player 2 has the ball", the purely syntactic tree
     S(has) → NP(player) VP(has), with NP(ball) inside the VP, is augmented with
     semantic labels to become S-bowner(has) → NP-player(player) VP-bowner(has),
     with pre-terminals PRP$-team, NN-player, CD-unum, VB-bowner, DT-null, NN-null.]
32
Adding Semantic Labels into the Model
S-bowner(has)
VP-bowner(has)
Ph(VP-bowner | S-bowner, has)
33
Adding Semantic Labels into the Model
S-bowner(has)
VP-bowner(has)
{NP}-{player}
{ }-{ }
Ph(VP-bowner | S-bowner, has) ×
Plc({NP}-{player} | S-bowner, VP-bowner, has) × Prc({}-{}| S-bowner, VP-bowner, has)
34
Adding Semantic Labels into the Model
S-bowner(has)
NP-player(player) VP-bowner(has)
{NP}-{player}
{ }-{ }
Ph(VP-bowner | S-bowner, has) ×
Plc({NP}-{player} | S-bowner, VP-bowner, has) × Prc({}-{}| S-bowner, VP-bowner, has) ×
Pd(NP-player(player) | S-bowner, VP-bowner, has, LEFT, {NP}-{player})
35
Adding Semantic Labels into the Model
S-bowner(has)
NP-player(player) VP-bowner(has)
{ }-{ }
{ }-{ }
Ph(VP-bowner | S-bowner, has) ×
Plc({NP}-{player} | S-bowner, VP-bowner, has) × Prc({}-{}| S-bowner, VP-bowner, has) ×
Pd(NP-player(player) | S-bowner, VP-bowner, has, LEFT, {NP}-{player})
36
Adding Semantic Labels into the Model
S-bowner(has)
STOP NP-player(player) VP-bowner(has)
{ }-{ }
{ }-{ }
Ph(VP-bowner | S-bowner, has) ×
Plc({NP}-{player} | S-bowner, VP-bowner, has) × Prc({}-{}| S-bowner, VP-bowner, has) ×
Pd(NP-player(player) | S-bowner, VP-bowner, has, LEFT, {NP}-{player}) ×
Pd(STOP | S-bowner, VP-bowner, has, LEFT, {}-{})
37
Adding Semantic Labels into the Model
S-bowner(has)
STOP NP-player(player) VP-bowner(has)
{ }-{ }
STOP
{ }-{ }
Ph(VP-bowner | S-bowner, has) ×
Plc({NP}-{player} | S-bowner, VP-bowner, has) × Prc({}-{}| S-bowner, VP-bowner, has) ×
Pd(NP-player(player) | S-bowner, VP-bowner, has, LEFT, {NP}-{player}) ×
Pd(STOP | S-bowner, VP-bowner, has, LEFT, {}-{}) ×
Pd(STOP | S-bowner, VP-bowner, has, RIGHT, {}-{})
38
SCISSOR Parser Implementation
• Supervised training on annotated SAPTs is just
frequency counting
• Augmented smoothing technique is employed to
account for additional data sparsity created by
semantic labels.
• Parsing of test sentences to find the most probable
SAPT is performed using a variant of the standard
CKY chart-parsing algorithm.
39
Smoothing
• Each label in SAPT is the combination of a
syntactic label and a semantic label
• Increases data sparsity
• Use the chain rule to decompose the parameters
Ph(H | P, w)
= Ph(Hsyn, Hsem | P, w)
= Ph(Hsyn | P, w) × Ph(Hsem | P, w, Hsyn)
40
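The factorization above can be estimated by relative frequency from the annotated SAPTs. A minimal sketch with made-up toy counts (the event tuples below are illustrative, not real training data):

    from collections import Counter

    # (parent label, head word, head syntactic label, head semantic label)
    events = [("S-bowner", "has", "VP", "bowner")] * 3 + [("S-bowner", "has", "NP", "player")]

    ctx, syn, sem = Counter(), Counter(), Counter()
    for parent, word, hsyn, hsem in events:
        ctx[(parent, word)] += 1
        syn[(parent, word, hsyn)] += 1
        sem[(parent, word, hsyn, hsem)] += 1

    def p_head(hsyn, hsem, parent, word):
        p_syn = syn[(parent, word, hsyn)] / ctx[(parent, word)]              # Ph(Hsyn | P, w)
        p_sem = sem[(parent, word, hsyn, hsem)] / syn[(parent, word, hsyn)]  # Ph(Hsem | P, w, Hsyn)
        return p_syn * p_sem

    print(p_head("VP", "bowner", "S-bowner", "has"))   # (3/4) * (3/3) = 0.75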
Learning Semantic Parsers with a
Formal Grammar for Meaning Representations
• Our other techniques assume that meaning
representation languages (MRLs) have
deterministic context free grammars
– True for almost all computer languages
– MRs can be parsed unambiguously
41
NL: Which rivers run through the states bordering Texas?
MR: answer(traverse(next_to(stateid(‘texas’))))
Parse tree of MR:
    (ANSWER answer (RIVER (TRAVERSE traverse)
                          (STATE (NEXT_TO next_to)
                                 (STATE (STATEID stateid 'texas')))))

Non-terminals: ANSWER, RIVER, TRAVERSE, STATE, NEXT_TO, STATEID
Terminals: answer, traverse, next_to, stateid, 'texas'
Productions: ANSWER → answer(RIVER), RIVER → TRAVERSE(STATE),
    STATE → NEXT_TO(STATE), STATE → STATEID, TRAVERSE → traverse,
    NEXT_TO → next_to, STATEID → 'texas'
42
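Because the MRL grammar is deterministic, every MR has a unique parse. A minimal sketch (the nested-tuple representation of that parse is a hypothetical choice made only for this example) showing that the MR string can be rebuilt from its parse:

    # (label, children): children are either nested nodes or terminal strings
    mr_parse = ("answer", [("traverse", [("next_to", [("stateid", ["'texas'"])])])])

    def mr_string(node):
        """Rebuild the FunQL string bottom-up from the MR parse."""
        if isinstance(node, str):
            return node
        label, children = node
        return label + "(" + ",".join(mr_string(c) for c in children) + ")"

    print(mr_string(mr_parse))   # answer(traverse(next_to(stateid('texas'))))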
KRISP: Kernel-based Robust Interpretation
for Semantic Parsing
• Learns semantic parser from NL sentences paired
with their respective MRs given MRL grammar
• Productions of MRL are treated like semantic
concepts
• SVM classifier with string subsequence kernel is
trained for each production to identify if an NL
substring represents the semantic concept
• These classifiers are used to compositionally build
MRs of the sentences
43
Overview of KRISP
    [Diagram: Training: NL sentences paired with MRs, together with the MRL grammar,
     are used to collect positive and negative examples and to train the
     string-kernel-based SVM classifiers that make up the Semantic Parser.
     Testing: novel NL sentences → Semantic Parser → best MRs.]
44
KRISP’s Semantic Parsing
• We first define Semantic Derivation of an NL
sentence
• We next define Probability of a Semantic
Derivation
• Semantic parsing of an NL sentence involves
finding its Most Probable Semantic Derivation
• Straightforward to obtain MR from a semantic
derivation
46
Semantic Derivation of an NL Sentence
MR parse with non-terminals on the nodes:
    (ANSWER answer (RIVER (TRAVERSE traverse)
                          (STATE (NEXT_TO next_to)
                                 (STATE (STATEID stateid 'texas')))))
Which rivers run through the states bordering Texas?
47
Semantic Derivation of an NL Sentence
MR parse with productions on the nodes:
    ANSWER → answer(RIVER)
      RIVER → TRAVERSE(STATE)
        TRAVERSE → traverse
        STATE → NEXT_TO(STATE)
          NEXT_TO → next_to
          STATE → STATEID
            STATEID → 'texas'
Which rivers run through the states bordering Texas?
48
Semantic Derivation of an NL Sentence
Semantic Derivation: Each node covers an NL substring:
    ANSWER → answer(RIVER)
      RIVER → TRAVERSE(STATE)
        TRAVERSE → traverse
        STATE → NEXT_TO(STATE)
          NEXT_TO → next_to
          STATE → STATEID
            STATEID → 'texas'
Which rivers run through the states bordering Texas?
49
Semantic Derivation of an NL Sentence
Semantic Derivation: Each node contains a production
and the substring of NL sentence it covers:
    (ANSWER → answer(RIVER), [1..9])
    (RIVER → TRAVERSE(STATE), [1..9])
    (TRAVERSE → traverse, [1..4])
    (STATE → NEXT_TO(STATE), [5..9])
    (NEXT_TO → next_to, [5..7])   (STATE → STATEID, [8..9])
    (STATEID → 'texas', [8..9])

    Which(1) rivers(2) run(3) through(4) the(5) states(6) bordering(7) Texas(8) ?(9)
50
Semantic Derivation of an NL Sentence
Substrings in NL sentence may be in a different order:
    ANSWER → answer(RIVER)
      RIVER → TRAVERSE(STATE)
        TRAVERSE → traverse
        STATE → NEXT_TO(STATE)
          NEXT_TO → next_to
          STATE → STATEID
            STATEID → 'texas'
Through the states that border Texas which rivers run?
51
Semantic Derivation of an NL Sentence
Nodes are allowed to permute the children productions
from the original MR parse
    (ANSWER → answer(RIVER), [1..10])
    (RIVER → TRAVERSE(STATE), [1..10])
    (STATE → NEXT_TO(STATE), [1..6])
    (NEXT_TO → next_to, [1..5])   (STATE → STATEID, [6..6])
    (STATEID → 'texas', [6..6])
    (TRAVERSE → traverse, [7..10])

    Through(1) the(2) states(3) that(4) border(5) Texas(6) which(7) rivers(8) run(9) ?(10)
52
Probability of a Semantic Derivation
• Let Pπ(s[i..j]) be the probability that production π covers
the substring s[i..j] of sentence s
• For example, PNEXT_TO → next_to("the states bordering") is the probability that
the production NEXT_TO → next_to covers the substring "the states bordering"
(words 5..7 of the sentence); the classifier might assign it, say, 0.99
• Obtained from the string-kernel-based SVM classifiers
trained for each production π
• Assuming independence, the probability of a semantic
derivation D is

      P(D) = ∏(π,[i..j])∈D Pπ(s[i..j])
53
Probability of a Semantic Derivation contd.
Derivation of "Which rivers run through the states bordering Texas?" with the
classifier probability attached to each production:

    (ANSWER → answer(RIVER), [1..9])        0.98
    (RIVER → TRAVERSE(STATE), [1..9])       0.9
    (TRAVERSE → traverse, [1..4])           0.95
    (STATE → NEXT_TO(STATE), [5..9])        0.89
    (NEXT_TO → next_to, [5..7])             0.99
    (STATE → STATEID, [8..9])               0.93
    (STATEID → 'texas', [8..9])             0.98

    P(D) = ∏(π,[i..j])∈D Pπ(s[i..j]) ≈ 0.673
54
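As a quick check of the product shown above, a few lines of Python using the same probabilities:

    production_probs = {
        "ANSWER -> answer(RIVER)":   0.98,
        "RIVER -> TRAVERSE(STATE)":  0.90,
        "TRAVERSE -> traverse":      0.95,
        "STATE -> NEXT_TO(STATE)":   0.89,
        "NEXT_TO -> next_to":        0.99,
        "STATE -> STATEID":          0.93,
        "STATEID -> 'texas'":        0.98,
    }
    p_derivation = 1.0
    for p in production_probs.values():
        p_derivation *= p              # independence assumption: multiply the production scores
    print(round(p_derivation, 3))      # 0.673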
Computing the Most Probable Semantic
Derivation
• Task of semantic parsing is to find the most
probable semantic derivation of the NL sentence
given all the probabilities Pπ(s[i..j])
• Implemented by extending Earley’s [1970]
context-free grammar parsing algorithm
• Resembles PCFG parsing but different because:
– Probability of a production depends on which substring
of the sentence it covers
– Leaves are not terminals but substrings of words
55
Computing the Most Probable Semantic
Derivation contd.
• Does a greedy approximation search, with beam
width ω=20, and returns ω most probable
derivations it finds
• Uses a threshold θ=0.05 to prune low probability
trees
56
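A minimal sketch of the beam pruning just described (the scored-derivation representation is a hypothetical simplification; KRISP's actual search is an extension of Earley's algorithm):

    def prune_beam(scored_derivations, beam_width=20, threshold=0.05):
        """Keep at most beam_width derivations whose probability is above threshold."""
        kept = [(p, d) for p, d in scored_derivations if p >= threshold]
        kept.sort(key=lambda pd: pd[0], reverse=True)     # most probable first
        return kept[:beam_width]

    print(prune_beam([(0.67, "D1"), (0.02, "D2"), (0.31, "D3")]))   # [(0.67, 'D1'), (0.31, 'D3')]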
Overview of KRISP
    [Diagram: Training: NL sentences paired with MRs and the MRL grammar are used
     to collect positive and negative examples and to train the string-kernel-based
     SVM classifiers, which provide Pπ(s[i..j]); the current parser's best semantic
     derivations (correct and incorrect) are fed back to collect new examples for
     the next iteration. Testing: novel NL sentences → Semantic Parser → best MRs.]
57
KRISP’s Training Algorithm
• Takes NL sentences paired with their respective
MRs as input
• Obtains MR parses
• Induces the semantic parser using an SVM with a
string subsequence kernel and refines it in
iterations
• In the first iteration, for every production π:
– Call those sentences positives whose MR parses use
that production
– Call the remaining sentences negatives
58
Support Vector Machines
• A recent approach based on extending neural-network approaches like the perceptron.
• Finds the linear separator that maximizes the
margin between the classes.
• Based in computational learning theory, which
explains why max-margin is a good approach
(Vapnik, 1995).
• Good at avoiding over-fitting in high-dimensional
feature spaces.
• Performs well on various text and language
problems, which tend to be high-dimensional.
59
Picking a Linear Separator
• Which of the alternative linear separators
is best?
60
Classification Margin
• Consider the distance of points from the separator.
• Examples closest to the hyperplane are support vectors.
• Margin ρ of the separator is the width of separation
between classes.
61
SVM Algorithms
• Finding the max-margin separator is an
optimization problem called quadratic
optimization.
• Algorithms that guarantee an optimal margin take
at least O(n2) time and do not scale well to large
data sets.
• Approximation algorithms like SVM-light
(Joachims, 1999) and SMO (Platt, 1999) allow
scaling to realistic problems.
62
Kernels
• SVMs can be extended to learning non-linear separators by
using kernel functions.
• A kernel function is a similarity function between two
instances, K(x1,x2), that must satisfy certain mathematical
constraints.
• A kernel function implicitly maps instances into a higher
dimensional feature space where (hopefully) the categories
are linearly separable.
• A kernel-based method (like SVMs) can use a kernel to
implicitly operate in this higher-dimensional space without
having to explicitly map instances into this much larger
(perhaps infinite) space (called “the kernel trick”).
• Kernels can be defined on non-vector data like strings,
trees, and graphs, allowing the application of kernel-based
methods to complex, unbounded, non-vector data
structures.
63
Non-linear SVMs: Feature spaces
• General idea: the original feature space can
always be mapped to some higher-dimensional
feature space where the training set is separable:
Φ: x → φ(x)
64
String Subsequence Kernel
• Define the kernel between two strings as the number of common subsequences
  between them [Lodhi et al., 2002]

    s = "states that are next to"
    t = "the states next to"

  Common subsequences u: states; next; to; states next; states to; next to;
  states next to

    K(s,t) = 7
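The count above can be reproduced with a small dynamic program over words. This is a minimal sketch that counts distinct common (possibly non-contiguous) word subsequences, matching the simplified definition used here; the full kernel of Lodhi et al. (2002) additionally downweights subsequences by the size of their gaps.

    def num_common_subsequences(s, t):
        """Count distinct non-empty word subsequences shared by s and t
        (assumes no word is repeated within either sentence)."""
        a, b = s.split(), t.split()
        m, n = len(a), len(b)
        # dp[i][j] = number of distinct common subsequences of a[:i] and b[:j]
        dp = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                dp[i][j] = dp[i - 1][j] + dp[i][j - 1] - dp[i - 1][j - 1]
                if a[i - 1] == b[j - 1]:
                    dp[i][j] += dp[i - 1][j - 1] + 1
        return dp[m][n]

    print(num_common_subsequences("states that are next to", "the states next to"))   # 7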
KRISP’s Training Algorithm contd.
First Iteration
STATE → NEXT_TO(STATE)

Positives:
  • which rivers run through the states bordering texas ?
  • what is the most populated state bordering oklahoma ?
  • what is the largest city in states that border california ?
  • ...

Negatives:
  • what state has the highest population ?
  • which states have cities named austin ?
  • what states does the delaware river run through ?
  • what is the lowest point of the state with the largest area ?
  • ...

    ↓ train a string-kernel-based SVM classifier
73
String Subsequence Kernel
• The examples are implicitly mapped to the feature space
of all subsequences and the kernel computes the dot
products
    [Figure: example substrings plotted implicitly in the subsequence feature
     space, e.g. "the states next to", "states that border", "states bordering",
     "states that share border", "state with the capital of", "states with area
     larger than", "states through which"]
74
Support Vector Machines
• SVMs find a separating hyperplane such that the margin
is maximized
    [Figure: a maximum-margin separating hyperplane between substrings that express
     NEXT_TO ("the states next to", "states that border", "states bordering",
     "states that share border") and substrings that do not ("state with the capital
     of", "states with area larger than", "states through which"); a new substring
     such as "states that are next to" falls on the positive side with probability
     about 0.97.]

A probability estimate of an example belonging to a class can be obtained from its
distance from the hyperplane [Platt, 1999]
75
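For illustration, a per-production classifier of this kind can be trained with scikit-learn's precomputed-kernel SVM. Everything below is an illustrative assumption: the kernel is a simple common-word-count kernel (a bag-of-words dot product) standing in for the string subsequence kernel, and the phrases and labels are toy data for STATE → NEXT_TO(STATE).

    import numpy as np
    from sklearn.svm import SVC

    def kernel(s, t):
        # stand-in kernel: number of shared word types (a valid dot-product kernel)
        return len(set(s.split()) & set(t.split()))

    phrases = ["the states next to", "states that border", "states bordering",
               "state with the capital of", "states through which"]
    labels = [1, 1, 1, 0, 0]    # 1 = expresses STATE -> NEXT_TO(STATE)

    gram = np.array([[kernel(a, b) for b in phrases] for a in phrases], dtype=float)
    clf = SVC(kernel="precomputed").fit(gram, labels)

    test = "states that are next to"
    row = np.array([[kernel(test, b) for b in phrases]], dtype=float)
    print(clf.predict(row), clf.decision_function(row))
    # expected: class 1; Platt scaling (probability=True) would map the margin to a probability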
KRISP's Training Algorithm contd.
First Iteration

(The positive and negative sentences collected above for STATE → NEXT_TO(STATE)
are used to train its string-kernel-based SVM classifier, which then provides
PSTATE→NEXT_TO(STATE)(s[i..j]).)
76
KRISP’s Training Algorithm contd.
• Using these classifiers Pπ(s[i..j]), obtain the ω best
semantic derivations of each training sentence
• Some of these derivations will give the correct MR, called
correct derivations, some will give incorrect MRs, called
incorrect derivations
• For the next iteration, collect positives from most probable
correct derivation
• Extended Earley’s algorithm can be forced to derive only
the correct derivations by making sure all subtrees it
generates exist in the correct MR parse
• Collect negatives from incorrect derivations with higher
probability than the most probable correct derivation
79
KRISP’s Training Algorithm contd.
Most probable correct derivation:
    (ANSWER → answer(RIVER), [1..9])
    (RIVER → TRAVERSE(STATE), [1..9])
    (TRAVERSE → traverse, [1..4])
    (STATE → NEXT_TO(STATE), [5..9])
    (NEXT_TO → next_to, [5..7])   (STATE → STATEID, [8..9])
    (STATEID → 'texas', [8..9])

    Which(1) rivers(2) run(3) through(4) the(5) states(6) bordering(7) Texas(8) ?(9)
80
KRISP’s Training Algorithm contd.
Most probable correct derivation: Collect positive
examples
(Same derivation as above; the NL substring covered by each production becomes a
positive example for that production.)
81
KRISP’s Training Algorithm contd.
Incorrect derivation with probability greater than the
most probable correct derivation:
    (ANSWER → answer(RIVER), [1..9])
    (RIVER → TRAVERSE(STATE), [1..9])
    (TRAVERSE → traverse, [1..7])
    (STATE → STATEID, [8..9])
    (STATEID → 'texas', [8..9])

    Which(1) rivers(2) run(3) through(4) the(5) states(6) bordering(7) Texas(8) ?(9)

    Incorrect MR: answer(traverse(stateid('texas')))
82
KRISP’s Training Algorithm contd.
Incorrect derivation with probability greater than the
most probable correct derivation: collect negative examples.
(Same incorrect derivation as above, yielding the incorrect MR
answer(traverse(stateid('texas'))).)
83
KRISP's Training Algorithm contd.

To decide which substrings become negative examples, compare the most probable
correct derivation with each incorrect derivation that scored higher:

Most probable correct derivation:
    (ANSWER → answer(RIVER), [1..9])
    (RIVER → TRAVERSE(STATE), [1..9])
    (TRAVERSE → traverse, [1..4])
    (STATE → NEXT_TO(STATE), [5..9])
    (NEXT_TO → next_to, [5..7])   (STATE → STATEID, [8..9])
    (STATEID → 'texas', [8..9])

Incorrect derivation:
    (ANSWER → answer(RIVER), [1..9])
    (RIVER → TRAVERSE(STATE), [1..9])
    (TRAVERSE → traverse, [1..7])
    (STATE → STATEID, [8..9])
    (STATEID → 'texas', [8..9])

    Which rivers run through the states bordering Texas?

• Traverse both trees in breadth-first order until the first nodes where their
  productions differ are found.
• Mark the words under these nodes.
• Consider all the productions covering the marked words, and collect negatives
  for productions that cover any marked word in the incorrect derivation but not
  in the correct derivation.
92
KRISP’s Training Algorithm contd.
Next Iteration: more refined positive and negative examples
STATE → NEXT_TO(STATE)

Positives:
  • the states bordering texas ?
  • state bordering oklahoma ?
  • states that border california ?
  • states which share border
  • next to state of iowa
  • ...

Negatives:
  • what state has the highest population ?
  • what states does the delaware river run through ?
  • which states have cities named austin ?
  • what is the lowest point of the state with the largest area ?
  • which rivers run through states bordering
  • ...

    ↓ retrain the string-kernel-based SVM classifier, giving PSTATE→NEXT_TO(STATE)(s[i..j])
93
WASP
A Machine Translation Approach to Semantic Parsing
• Uses statistical machine translation
techniques
– Synchronous context-free grammars (SCFG)
(Wu, 1997; Melamed, 2004; Chiang, 2005)
– Word alignments (Brown et al., 1993; Och &
Ney, 2003)
• Hence the name: Word Alignment-based
Semantic Parsing
95
A Unifying Framework for Parsing and Generation

All of the following are instances of synchronous parsing over a pair of languages:
• Machine translation: Natural Language ↔ Natural Language
• Semantic parsing: Natural Language → Formal Language
• Tactical generation: Formal Language → Natural Language
• Compiling (Aho & Ullman, 1972): Formal Language → Formal Language
100
Synchronous Context-Free Grammars
(SCFG)
• Developed by Aho & Ullman (1972) as a
theory of compilers that combines syntax
analysis and code generation in a single
phase
• Generates a pair of strings in a single
derivation
101
Context-Free Semantic Grammar
    QUERY → What is CITY
    CITY  → the capital CITY
    CITY  → of STATE
    STATE → Ohio

    [Figure: parse tree deriving "What is the capital of Ohio" with this grammar]
102
Productions of
Synchronous Context-Free Grammars
Each production pairs a natural-language pattern with a formal-language pattern:

    QUERY → What is CITY / answer(CITY)
103
Synchronous Context-Free
Grammar Derivation
The synchronous derivation builds an NL string and an MR simultaneously:

    QUERY → What is CITY / answer(CITY)
    CITY  → the capital CITY / capital(CITY)
    CITY  → of STATE / loc_2(STATE)
    STATE → Ohio / stateid('ohio')

    NL: What is the capital of Ohio
    MR: answer(capital(loc_2(stateid('ohio'))))
104
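A minimal sketch of this synchronous derivation in Python. Representing rules as plain string rewrites is a simplification made only for illustration; WASP's rules also carry weights and are extracted from word alignments.

    def expand(nl, mr, nonterminal, nl_rhs, mr_rhs):
        """Apply one synchronous rule: rewrite the same non-terminal on both sides."""
        return nl.replace(nonterminal, nl_rhs, 1), mr.replace(nonterminal, mr_rhs, 1)

    nl, mr = "What is CITY", "answer(CITY)"        # QUERY -> What is CITY / answer(CITY)
    nl, mr = expand(nl, mr, "CITY", "the capital CITY", "capital(CITY)")
    nl, mr = expand(nl, mr, "CITY", "of STATE", "loc_2(STATE)")
    nl, mr = expand(nl, mr, "STATE", "Ohio", "stateid('ohio')")
    print(nl)    # What is the capital of Ohio
    print(mr)    # answer(capital(loc_2(stateid('ohio'))))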
Probabilistic Parsing Model

Two competing derivations of "capital of Ohio" (each rule carries a weight λ):

    Derivation d1 (Ohio as a state):
        CITY  → capital CITY / capital(CITY)        0.5
        CITY  → of STATE / loc_2(STATE)             0.3
        STATE → Ohio / stateid('ohio')              0.5
      MR: capital(loc_2(stateid('ohio')))

    Derivation d2 (Ohio as a river):
        CITY  → capital CITY / capital(CITY)        0.5
        CITY  → of RIVER / loc_2(RIVER)             0.05
        RIVER → Ohio / riverid('ohio')              0.5
      MR: capital(loc_2(riverid('ohio')))

    Pr(d1 | capital of Ohio) = exp(0.5 + 0.3 + 0.5) / Z = exp(1.3) / Z
    Pr(d2 | capital of Ohio) = exp(0.5 + 0.05 + 0.5) / Z = exp(1.05) / Z
    where Z is a normalization constant.
107
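Assuming, purely for illustration, that d1 and d2 were the only derivations of "capital of Ohio", the log-linear scores above translate into probabilities as follows (in WASP, Z sums over all derivations of the sentence):

    import math

    score_d1 = 0.5 + 0.3 + 0.5      # sum of the rule weights used in d1
    score_d2 = 0.5 + 0.05 + 0.5     # sum of the rule weights used in d2
    z = math.exp(score_d1) + math.exp(score_d2)
    print(round(math.exp(score_d1) / z, 3))   # 0.562  (Ohio as a state)
    print(round(math.exp(score_d2) / z, 3))   # 0.438  (Ohio as a river)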
Overview of WASP
    [Diagram: Training: the unambiguous CFG of the MRL and the training set {(e,f)}
     feed lexical acquisition, producing the lexicon L; parameter estimation then
     yields a parsing model parameterized by λ. Testing: input sentence e' →
     semantic parsing → output MR f'.]
108
Lexical Acquisition
• Transformation rules are extracted from
word alignments between an NL sentence, e,
and its correct MR, f, for each training
example, (e, f)
109
Word Alignments
    [Figure: word alignment between the French sentence "Le programme a été mis en
     application" and the English sentence "And the program has been implemented".]

• A mapping from French words to their meanings expressed in English
110
Lexical Acquisition
• Train a statistical word alignment model
(IBM Model 5) on training set
• Obtain most probable n-to-1 word
alignments for each training example
• Extract transformation rules from these
word alignments
• Lexicon L consists of all extracted
transformation rules
111
Word Alignment for Semantic Parsing
    [Figure: the NL sentence "The goalie should always stay in our half" aligned to
     the CLang MR ( ( true ) ( do our { 1 } ( pos ( half our ) ) ) ).]

• How can syntactic tokens such as parentheses be introduced into the alignment?
112
Use of MRL Grammar
Rather than aligning NL words to raw MR tokens, align them (n-to-1) to the
productions in the top-down, left-most derivation of the MR under the
unambiguous MRL CFG:

    NL: The goalie should always stay in our half

    RULE      → (CONDITION DIRECTIVE)
    CONDITION → (true)
    DIRECTIVE → (do TEAM {UNUM} ACTION)
    TEAM      → our
    UNUM      → 1
    ACTION    → (pos REGION)
    REGION    → (half TEAM)
    TEAM      → our
113
Extracting Transformation Rules

Transformation rules are extracted bottom-up from the word alignment. For
"The goalie should always stay in our half", the rules extracted include:

    TEAM   → our / our
    REGION → TEAM half / (half TEAM)
    ACTION → stay in REGION / (pos REGION)

Each rule pairs an NL pattern (which may contain non-terminals) with the MR
pattern it corresponds to; after a rule is extracted, the words and MR
productions it covers are replaced by its left-hand non-terminal, and
extraction continues up the derivation.
116
Probabilistic Parsing Model
• Based on a maximum-entropy (log-linear) model:

      Prλ(d | e) = (1 / Zλ(e)) exp( Σi λi fi(d) )

• Features fi(d) are the number of times each
transformation rule is used in a derivation d
• The output translation is the yield of the most
probable derivation:

      f* = m( argmaxd Prλ(d | e) )

  where m(d) denotes the MR yield of derivation d
117
Parameter Estimation
• Maximum conditional log-likelihood criterion:

      λ* = argmaxλ Σ(e,f) log Prλ(f | e)
• Since correct derivations are not included in
training data, parameters λ* are learned in
an unsupervised manner
• EM algorithm combined with improved
iterative scaling, where hidden variables are
correct derivations (Riezler et al., 2000)
118
Experimental Corpora
• CLang
– 300 randomly selected pieces of coaching advice from
the log files of the 2003 RoboCup Coach Competition
– 22.52 words on average in NL sentences
– 14.24 tokens on average in formal expressions
• GeoQuery [Zelle & Mooney, 1996]
– 250 queries for the given U.S. geography database
– 6.87 words on average in NL sentences
– 5.32 tokens on average in formal expressions
– Also translated into Spanish, Turkish, and Japanese
119
Experimental Methodology
• Evaluated using standard 10-fold cross validation
• Correctness
– CLang: output exactly matches the correct
representation
– Geoquery: the resulting query retrieves the same
answer as the correct representation
• Metrics
| Correct Completed Parses |
Precision 
| Completed Parses |
|Correct Completed Parses|
Recall 
|Sentences|
120
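In code, the two metrics are just ratios over the test set; the counts below are hypothetical.

    def precision_recall(num_correct, num_completed, num_sentences):
        """Precision over completed parses; recall over all test sentences."""
        precision = num_correct / num_completed if num_completed else 0.0
        recall = num_correct / num_sentences
        return precision, recall

    print(precision_recall(num_correct=180, num_completed=200, num_sentences=250))   # (0.9, 0.72)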
Precision Learning Curve for CLang
[figure]
121
Recall Learning Curve for CLang
[figure]
122
Precision Learning Curve for GeoQuery
[figure]
123
Recall Learning Curve for GeoQuery
[figure]
124
Precision Learning Curve for GeoQuery (WASP)
[figure]
125
Recall Learning Curve for GeoQuery (WASP)
[figure]
126
λWASP
• Logical forms can be made more
isomorphic to NL sentences than FunQL
and allow for better compositionality and
generalization.
• Version of WASP that uses λ calculus to
introduce and bind logical variables.
– Standard in compositional formal semantics,
e.g. Montague semantics.
• Modify SCFG to λ-SCFG
127
SCFG Derivations

With a plain SCFG, the derivation of "What is the smallest state by area" pairs
NL rules with MR rules that contain logical variables:

    QUERY → What is FORM / answer(x1, FORM)
    FORM  → the smallest FORM FORM / smallest(x2, (FORM, FORM))
    FORM  → state / state(x1)
    FORM  → by area / area(x1, x2)

    NL: What is the smallest state by area
    MR: answer(x1, smallest(x2, (state(x1), area(x1,x2))))

The problem: the MR sides mention variables such as x1 and x2, but a plain SCFG
has no mechanism for ensuring that the variables introduced by different rules
end up bound and named consistently in the final logical form.

λ-SCFG Derivations

In a λ-SCFG, each MR side is a λ-function over its logical variables, and the
non-terminals on the MR side carry argument lists for function application:

    QUERY → What is FORM / answer(x1, FORM(x1))
    FORM  → the smallest FORM FORM / λx1.smallest(x2, (FORM(x1), FORM(x1,x2)))
    FORM  → state / λx1.state(x1)
    FORM  → by area / λx1.λx2.area(x1,x2)

    NL: What is the smallest state by area
    MR: answer(x1, smallest(x2, (state(x1), area(x1,x2))))
λ-SCFG Production Rules

    FORM → smallest FORM FORM / λx1.smallest(x2, (FORM(x1), FORM(x1,x2)))
           (NL string)          (MR string)

• Variable-binding λ-operator: binds the occurrences of x1 in the MR string
• Argument lists on the non-terminals: used for function applications
138
Yield of λ-SCFG Derivations:
Computing the Yield with Lambda Calculus

The MR yield of a λ-SCFG derivation is computed bottom-up by function
application (β-reduction). For "What is the smallest state by area":

    Start with:  answer(x1, FORM(x1))
                 where FORM = λx2.smallest(x1, (FORM(x2), FORM(x2,x1)))
                 and its children are FORM = λx1.state(x1) and FORM = λx1.λx2.area(x1,x2)

    Apply the child lambda functions:
                 λx2.smallest(x1, ((λx1.state(x1))(x2), (λx1.λx2.area(x1,x2))(x2,x1)))
               = λx2.smallest(x1, (state(x2), area(x2,x1)))

    Apply this lambda function to x1 in answer(x1, FORM(x1)), renaming bound
    variables to avoid capture:

                 answer(x1, smallest(x3, (state(x1), area(x1,x3))))

The final logical form is free of λ-operators, with the logical variables
properly named.
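The same composition can be mimicked directly with Python lambdas. This is a minimal sketch: MRs are built as strings, and the fresh variable introduced by the smallest rule is simply hard-coded as x2, so the variable naming differs from the capture-avoiding renaming shown above.

    state = lambda x: f"state({x})"            # λx1.state(x1)
    area = lambda x, y: f"area({x},{y})"       # λx1.λx2.area(x1,x2)

    def smallest(f1, f2):
        # λx1.smallest(x2, (f1(x1), f2(x1, x2))), with x2 introduced fresh
        return lambda x: f"smallest(x2,({f1(x)},{f2(x, 'x2')}))"

    def answer(form):
        # QUERY -> What is FORM / answer(x1, FORM(x1))
        return f"answer(x1,{form('x1')})"

    print(answer(smallest(state, area)))
    # answer(x1,smallest(x2,(state(x1),area(x1,x2))))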
Learning in λWASP
• Must update the induction of SCFG rules to introduce λ-functions and produce a
λ-SCFG.
λWASP Results on Geoquery
[figure]
152
Tactical Natural Language Generation
• Mapping a formal MR into NL
• Can be done using statistical machine
translation
– Previous work focuses on using generation in
interlingual MT (Hajič et al., 2004)
– There has been little, if any, research on
exploiting statistical MT methods for
generation
153
Tactical Generation
• Can be seen as inverse of semantic parsing
    The goalie should always stay in our half
        semantic parsing ↓    ↑ tactical generation
    ((true) (do our {1} (pos (half our))))
154
Generation by Inverting WASP
• Same synchronous grammar is used for
both generation and semantic parsing
    Rule: QUERY → What is CITY / answer(CITY)

                           NL side    MRL side
    Semantic parsing:      input      output
    Tactical generation:   output     input
155
Generation by Inverting WASP
• Same procedure for lexical acquisition
• Chart generator very similar to chart parser,
but treats MRL as input
• Log-linear probabilistic model inspired by
Pharaoh (Koehn et al., 2003), a phrase-based MT system
• Uses a bigram language model for target
NL
• The resulting system is called WASP⁻¹
156
Geoquery (NIST score; English)
[figure]
157
RoboCup (NIST score; English)
[figure: contiguous phrases only]
Similar human evaluation results in terms of fluency and adequacy.
158
Conclusions
• Semantic parsing maps NL sentences to completely formal
MRs.
• Semantic parsers can be effectively learned from
supervised corpora consisting of only sentences paired
with their formal MRs (and possibly also SAPTs).
• Learning methods can be based on:
– Adding semantics to an existing statistical syntactic parser and
then using compositional semantics.
– Using SVM with string kernels to recognize concepts in the NL
and then composing them into a complete MR using the MRL
grammar.
– Using probabilistic synchronous context-free grammars to learn an
NL/MR grammar that supports both semantic parsing and
generation.