Learning for Semantic Parsing
Raymond J. Mooney
Yuk Wah Wong
Ruifang Ge
Rohit Kate
Machine Learning Group
Department of Computer Sciences
University of Texas at Austin
Syntactic Natural Language Learning
• Most computational research in natural-language
learning has addressed “low-level” syntactic
processing.
– Morphology (e.g. past-tense generation)
– Part-of-speech tagging
– Shallow syntactic parsing (chunking)
– Syntactic parsing
2
Semantic Natural Language Learning
• Learning for semantic analysis has been restricted
to relatively “shallow” meaning representations.
– Word sense disambiguation (e.g. SENSEVAL)
– Semantic role assignment (determining agent, patient,
instrument, etc., e.g. FrameNet, PropBank)
– Information extraction
3
Semantic Parsing
• A semantic parser maps a natural-language
sentence to a complete, detailed semantic
representation: logical form or meaning
representation (MR).
• For many applications, the desired output is
immediately executable by another program.
• Two application domains:
– CLang: RoboCup Coach Language
– GeoQuery: A Database Query Application
4
CLang: RoboCup Coach Language
• In the RoboCup Coach competition, teams compete to
coach simulated players
• The coaching instructions are given in a formal
language called CLang
Coach (on a simulated soccer field): "If the ball is in our
penalty area, then all our players except player 4
should stay in our half."
  ↓ Semantic Parsing
CLang: ((bpos (penalty-area our)) (do (player-except our{4}) (pos (half our))))
5
GeoQuery: A Database Query Application
• Query application for U.S. geography database
containing about 800 facts [Zelle & Mooney, 1996]
User: "How many states does the Mississippi run through?"
  ↓ Semantic Parsing
Query: answer(A, count(B, (state(B), C=riverid(mississippi), traverse(C,B)), A))
6
Learning Semantic Parsers
• Manually programming robust semantic parsers is
difficult due to the complexity of the task.
• Semantic parsers can be learned automatically from
sentences paired with their logical form.
NLMR
Training Exs
Natural
Language
Semantic-Parser
Learner
Semantic
Parser
Meaning
Rep
7
Engineering Motivation
• Most computational language-learning research
strives for broad coverage while sacrificing depth.
– “Scaling up by dumbing down”
• Realistic semantic parsing currently entails
domain dependence.
• Domain-dependent natural-language interfaces
have a large potential market.
• Learning makes developing specific applications
more tractable.
• Training corpora can be easily developed by
tagging existing corpora of formal statements
with natural-language glosses.
8
Cognitive Science Motivation
• Most natural-language learning methods require
supervised training data that is not available to a
child.
– General lack of negative feedback on grammar.
– No treebank, sense-tagged, or semantic-role-tagged data.
• Assuming a child can infer the likely meaning of
an utterance from context, NL/MR pairs are more
cognitively plausible training data.
9
Our Semantic-Parser Learners
• CHILL+WOLFIE (Zelle & Mooney, 1996; Thompson & Mooney,
1999, 2003)
– Separates parser-learning and semantic-lexicon learning.
– Learns a deterministic parser using ILP techniques.
• COCKTAIL (Tang & Mooney, 2001)
– Improved ILP algorithm for CHILL.
• SILT (Kate, Wong & Mooney, 2005)
– Learns symbolic transformation rules for mapping directly from NL to MR.
• SCISSOR (Ge & Mooney, 2005)
– Integrates semantic interpretation into Collins’ statistical syntactic parser.
• WASP (Wong & Mooney, 2006)
– Uses syntax-based statistical machine translation methods.
• KRISP (Kate & Mooney, 2006)
– Uses a series of SVM classifiers employing a string-kernel to iteratively build
semantic representations.
10
GeoQuery On-Line Demo
http://www.cs.utexas.edu/users/ml/geo.html
11
SCISSOR:
Semantic Composition that Integrates Syntax
and Semantics to get Optimal Representations
• Based on a fairly standard approach to compositional
semantics [Jurafsky and Martin, 2000]
• A statistical parser is used to generate a semantically
augmented parse tree (SAPT)
– Augment Collins’ head-driven model 2 to incorporate
semantic labels
• Translate a complete SAPT into a formal meaning
representation (MR)

Example SAPT for "our player 2 has the ball":
(S-bowner (NP-player (PRP$-team our) (NN-player player) (CD-unum 2))
          (VP-bowner (VB-bowner has) (NP-null (DT-null the) (NN-null ball))))
MR: bowner(player(our,2))
12
Overview of SCISSOR
TRAINING: NL sentences + SAPT training examples → learner → Integrated Semantic Parser
TESTING: NL sentence → Integrated Semantic Parser → SAPT → ComposeMR → MR
13
SCISSOR SAPT Parser Implementation
• Semantic labels added to Bikel’s (2004) open-source version of the Collins statistical parser.
• Head-driven derivation of production rules
augmented to also generate semantic labels.
• Parameter estimates during training employ an
augmented smoothing technique to account for
additional data sparsity created by semantic labels.
• Parsing of test sentences to find the most probable
SAPT is performed using a standard beam-search-constrained
version of the CKY chart-parsing algorithm.
14
ComposeMR
[SAPT semantic labels for "our player 2 has the ball": bowner at the root; a player subtree with children team (our), player (player), unum (2); a bowner subtree with children null (has), bowner, null (the), null (ball)]
15
ComposeMR
[Same tree with semantic labels replaced by MR predicates with unfilled arguments: bowner(_) at the root and the verb, player(_,_) over the noun phrase; "has", "the", and "ball" remain null]
16
ComposeMR
[Arguments are filled bottom-up using the predicates' argument types player(team, unum) and bowner(player): team our and unum 2 fill player(_,_) to give player(our,2), which in turn fills bowner(_) to give the final MR bowner(player(our,2)). A code sketch of this composition follows this slide.]
17
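To make the composition concrete, here is a minimal Python sketch, assuming a hypothetical SAPT node structure and MR templates such as player(team, unum) and bowner(player); this is an illustration, not SCISSOR's actual implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# MR templates keyed by semantic label: predicate name plus the labels that fill
# its argument slots (hypothetical encoding, not SCISSOR's data structures).
TEMPLATES = {
    "player": ("player", ["team", "unum"]),   # player(team, unum)
    "bowner": ("bowner", ["player"]),         # bowner(player)
}
LEAF_VALUES = {"team": "our", "unum": "2"}    # leaf labels mapped directly to MR constants

@dataclass
class Node:
    label: str                                # semantic label from the SAPT, e.g. "bowner" or "null"
    children: List["Node"] = field(default_factory=list)

def compose_mr(node: Node) -> Optional[str]:
    """Compose the MR for a SAPT node bottom-up; null-labeled material yields None."""
    if not node.children:                     # leaf word
        return LEAF_VALUES.get(node.label)
    child_mrs = {c.label: compose_mr(c) for c in node.children if c.label != "null"}
    if node.label in TEMPLATES:               # fill this predicate's argument slots
        pred, arg_labels = TEMPLATES[node.label]
        args = [child_mrs.get(a) or LEAF_VALUES.get(a) for a in arg_labels]
        if all(args):
            return f"{pred}({','.join(args)})"
    # otherwise pass a single non-null child MR upward unchanged
    return next((mr for mr in child_mrs.values() if mr), None)

# SAPT for "our player 2 has the ball" (semantic labels only, syntax omitted)
sapt = Node("bowner", [
    Node("player", [Node("team"), Node("player"), Node("unum")]),   # our player 2
    Node("bowner", [Node("bowner"), Node("null"), Node("null")]),   # has the ball
])
print(compose_mr(sapt))   # bowner(player(our,2))
```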
WASP
A Machine Translation Approach to Semantic Parsing
• Based on a semantic grammar of the natural
language.
• Uses machine translation techniques
– Synchronous context-free grammars (SCFG) (Wu, 1997;
Melamed, 2004; Chiang, 2005)
– Word alignments (Brown et al., 1993; Och & Ney, 2003)
• Hence the name: Word Alignment-based
Semantic Parsing
18
Synchronous Context-Free Grammars (SCFG)
• Developed by Aho & Ullman (1972) as a theory of
compilers that combines syntax analysis and code
generation in a single phase
• Generates a pair of strings in a single derivation
19
Compiling, Machine Translation, and
Semantic Parsing
• SCFG: formal language to formal language
(compiling)
• Alignment models: natural language to natural
language (machine translation)
• WASP: natural language to formal language
(semantic parsing)
20
Context-Free Semantic Grammar
QUERY → What is CITY
CITY → the capital CITY
CITY → of STATE
STATE → Ohio

[Parse tree for "What is the capital of Ohio": QUERY rewrites to "What is CITY", CITY to "the capital CITY", the inner CITY to "of STATE", and STATE to "Ohio".]
21
Productions of
Synchronous Context-Free Grammars
Each production pairs an NL pattern with an MR template:
QUERY → What is CITY / answer(CITY)
• Referred to as transformation rules in Kate, Wong
& Mooney (2005)
22
Synchronous Context-Free Grammars
The paired trees derive:
NL:  What is the capital of Ohio
MRL: answer(capital(loc_2(stateid('ohio'))))

using the synchronous productions:
QUERY → What is CITY / answer(CITY)
CITY → the capital CITY / capital(CITY)
CITY → of STATE / loc_2(STATE)
STATE → Ohio / stateid('ohio')

Each production rewrites the same non-terminal in the NL string and in the MR string simultaneously. (A toy code sketch of this derivation follows this slide.)
23
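To illustrate, a toy sketch of one synchronous derivation: each rule rewrites the same non-terminal in the NL string and the MR string in lockstep. The rule encoding (a plain dictionary plus string substitution) is purely hypothetical, not WASP's representation.

```python
# Toy synchronous derivation: each rule rewrites a non-terminal on both sides at once.
RULES = {
    "QUERY": ("What is CITY", "answer(CITY)"),
    "CITY#1": ("the capital CITY", "capital(CITY)"),
    "CITY#2": ("of STATE", "loc_2(STATE)"),
    "STATE": ("Ohio", "stateid('ohio')"),
}

def derive(rule_sequence):
    """Apply rules in order, rewriting the leftmost occurrence of the non-terminal."""
    nl, mr = "QUERY", "QUERY"
    for rule_id in rule_sequence:
        nonterm = rule_id.split("#")[0]
        pattern, template = RULES[rule_id]
        nl = nl.replace(nonterm, pattern, 1)   # rewrite in the NL side
        mr = mr.replace(nonterm, template, 1)  # and in the MR side
    return nl, mr

nl, mr = derive(["QUERY", "CITY#1", "CITY#2", "STATE"])
print(nl)  # What is the capital of Ohio
print(mr)  # answer(capital(loc_2(stateid('ohio'))))
```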
Parsing Model of WASP
• N (non-terminals) = {QUERY, CITY, STATE, …}
• S (start symbol) = QUERY
• Tm (MRL terminals) = {answer, capital, loc_2, (, ), …}
• Tn (NL words) = {What, is, the, capital, of, Ohio, …}
• L (lexicon) =
  QUERY → What is CITY / answer(CITY)
  CITY → the capital CITY / capital(CITY)
  CITY → of STATE / loc_2(STATE)
  STATE → Ohio / stateid('ohio')
• λ (parameters of probabilistic model) = ?
24
Probabilistic Parsing Model
Derivation d1 pairs "capital of Ohio" with capital(loc_2(stateid('ohio'))), using:
CITY → capital CITY / capital(CITY)
CITY → of STATE / loc_2(STATE)
STATE → Ohio / stateid('ohio')
25
Probabilistic Parsing Model
Derivation d2 pairs "capital of Ohio" with capital(loc_2(riverid('ohio'))), using:
CITY → capital CITY / capital(CITY)
CITY → of RIVER / loc_2(RIVER)
RIVER → Ohio / riverid('ohio')
26
Probabilistic Parsing Model
d1:  CITY → capital CITY / capital(CITY)    λ = 0.5
     CITY → of STATE / loc_2(STATE)         λ = 0.3
     STATE → Ohio / stateid('ohio')         λ = 0.5
     Pr(d1 | "capital of Ohio") = exp(0.5 + 0.3 + 0.5) / Z = exp(1.3) / Z

d2:  CITY → capital CITY / capital(CITY)    λ = 0.5
     CITY → of RIVER / loc_2(RIVER)         λ = 0.05
     RIVER → Ohio / riverid('ohio')         λ = 0.5
     Pr(d2 | "capital of Ohio") = exp(0.5 + 0.05 + 0.5) / Z = exp(1.05) / Z

(Z is a normalization constant. A code sketch of this scoring follows this slide.)
27
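A minimal sketch of this log-linear scoring over the two competing derivations, using the λ weights from the slide above (illustrative only, not WASP's code):

```python
import math

# Rule weights (λ) from the example above.
weights = {
    "CITY -> capital CITY / capital(CITY)": 0.5,
    "CITY -> of STATE / loc_2(STATE)": 0.3,
    "STATE -> Ohio / stateid('ohio')": 0.5,
    "CITY -> of RIVER / loc_2(RIVER)": 0.05,
    "RIVER -> Ohio / riverid('ohio')": 0.5,
}
d1 = ["CITY -> capital CITY / capital(CITY)",
      "CITY -> of STATE / loc_2(STATE)",
      "STATE -> Ohio / stateid('ohio')"]
d2 = ["CITY -> capital CITY / capital(CITY)",
      "CITY -> of RIVER / loc_2(RIVER)",
      "RIVER -> Ohio / riverid('ohio')"]

def score(derivation):
    # f_i(d) counts how often rule i fires in d; here each listed rule fires once.
    return sum(weights[r] for r in derivation)

Z = math.exp(score(d1)) + math.exp(score(d2))   # sum over the competing derivations
for name, d in [("d1", d1), ("d2", d2)]:
    print(name, math.exp(score(d)) / Z)         # d1 ≈ 0.56, d2 ≈ 0.44
```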
Parsing Model of WASP
• N (non-terminals) = {QUERY, CITY, STATE, …}
• S (start symbol) = QUERY
• Tm (MRL terminals) = {answer, capital, loc_2, (, ), …}
• Tn (NL words) = {What, is, the, capital, of, Ohio, …}
• L (lexicon) =
  QUERY → What is CITY / answer(CITY)
  CITY → the capital CITY / capital(CITY)
  CITY → of STATE / loc_2(STATE)
  STATE → Ohio / stateid('ohio')
• λ (parameters of probabilistic model)
28
Overview of WASP
Training: training set {(e, f)} + unambiguous CFG of the MRL → Lexical acquisition → Lexicon L → Parameter estimation → parsing model parameterized by λ
Testing: input sentence e' → Semantic parsing (using L and λ) → output MR f'
29
Lexical Acquisition
• Transformation rules are extracted from word
alignments between an NL sentence, e, and its
correct MR, f, for each training example, (e, f)
30
Word Alignments
English: And the program has been implemented
French:  Le programme a été mis en application
• A mapping from French words to their meanings
expressed in English
31
Lexical Acquisition
• Train a statistical word alignment model (IBM
Model 5) on training set
• Obtain most probable n-to-1 word alignments for
each training example
• Extract transformation rules from these word
alignments
• Lexicon L consists of all extracted transformation
rules
32
Word Alignment for Semantic Parsing
NL: The goalie should always stay in our half
MR: ( ( true ) ( do our { 1 } ( pos ( half our ) ) ) )
• How to introduce syntactic tokens such as parens?
33
Use of MRL Grammar
NL: The goalie should always stay in our half
MRL parse (top-down, left-most derivation of an unambiguous CFG), with words aligned n-to-1 to productions:
RULE → (CONDITION DIRECTIVE)
CONDITION → (true)
DIRECTIVE → (do TEAM {UNUM} ACTION)
TEAM → our
UNUM → 1
ACTION → (pos REGION)
REGION → (half TEAM)
TEAM → our
34
Extracting Transformation Rules
[Alignment: "The goalie should always stay in our half" aligned to the MRL derivation above; the word "our" is aligned to the production TEAM → our.]
Extracted rule: TEAM → our / our
35
Extracting Transformation Rules
[The extracted "our" is replaced by its non-terminal TEAM; "TEAM half" is aligned to REGION → (half TEAM).]
Extracted rule: REGION → TEAM half / (half TEAM)
36
Extracting Transformation Rules
["stay in REGION" is aligned to ACTION → (pos REGION), where REGION now covers (half our). A code sketch of this extraction follows this slide.]
Extracted rule: ACTION → stay in REGION / (pos REGION)
37
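A hypothetical sketch of the bottom-up rule extraction illustrated on slides 35–37, assuming each node of the MRL derivation records its production, the NL words aligned to it, and its children; this is an illustration, not the actual WASP code.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class DNode:
    lhs: str                              # left-hand-side non-terminal, e.g. "REGION"
    mr_template: str                      # right-hand side of the MRL production, e.g. "(half TEAM)"
    aligned: List[Tuple[int, str]]        # (sentence position, word) pairs aligned to this production
    children: List["DNode"] = field(default_factory=list)

def span_start(node: DNode) -> int:
    """Leftmost sentence position covered by this node's subtree."""
    return min([p for p, _ in node.aligned] + [span_start(c) for c in node.children])

def extract_rules(node: DNode) -> List[Tuple[str, str, str]]:
    """Bottom-up extraction: the NL pattern interleaves the node's own aligned words
    with one slot per child non-terminal, in sentence order."""
    rules = []
    for child in node.children:
        rules.extend(extract_rules(child))
    items = list(node.aligned) + [(span_start(c), c.lhs) for c in node.children]
    pattern = " ".join(tok for _, tok in sorted(items))
    rules.append((node.lhs, pattern, node.mr_template))
    return rules

# "The goalie should always stay in our half": word positions 0..7
team   = DNode("TEAM",   "our",          [(6, "our")])
region = DNode("REGION", "(half TEAM)",  [(7, "half")], [team])
action = DNode("ACTION", "(pos REGION)", [(4, "stay"), (5, "in")], [region])
for lhs, pattern, template in extract_rules(action):
    print(f"{lhs} -> {pattern} / {template}")
# TEAM -> our / our
# REGION -> TEAM half / (half TEAM)
# ACTION -> stay in REGION / (pos REGION)
```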
Probabilistic Parsing Model
• Based on a maximum-entropy model:
  Pr_λ(d | e) = (1 / Z_λ(e)) exp( Σ_i λ_i f_i(d) )
• Features f_i(d) are the number of times each
transformation rule is used in a derivation d
• Output translation is the yield of the most probable
derivation:
  f* = m( argmax_d Pr_λ(d | e) )
38
Parameter Estimation
• Maximum conditional log-likelihood criterion:
  λ* = argmax_λ Σ_{(e,f)} log Pr_λ(f | e)
• Since correct derivations are not included in
training data, parameters λ* are learned in an
unsupervised manner
• EM algorithm combined with improved iterative
scaling, where hidden variables are correct
derivations (Riezler et al., 2000)
39
KRISP: Kernel-based Robust Interpretation
by Semantic Parsing
• Learns semantic parser from NL sentences paired
with their respective MRs given MRL grammar
• Productions of MRL are treated like semantic
concepts
• SVM classifier is trained for each production with
string subsequence kernel
• These classifiers are used to compositionally build
MRs of the sentences
40
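For the string-kernel ingredient, here is a brute-force sketch of a gap-weighted word-subsequence kernel of the general kind KRISP builds on; the decay parameter and word-level formulation are assumptions for illustration, not KRISP's implementation.

```python
from itertools import combinations

def subsequence_kernel(s, t, n=2, lam=0.5):
    """Gap-weighted subsequence kernel over word sequences (brute-force sketch):
    sum over all common word subsequences of length n, each weighted by
    lam ** (span in s + span in t), where span = last index - first index + 1."""
    s, t = s.split(), t.split()
    total = 0.0
    for i in combinations(range(len(s)), n):
        for j in combinations(range(len(t)), n):
            if all(s[a] == t[b] for a, b in zip(i, j)):
                total += lam ** ((i[-1] - i[0] + 1) + (j[-1] - j[0] + 1))
    return total

print(subsequence_kernel("our player 2 has the ball",
                         "the ball is near our player 2", n=2))
```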
Experimental Corpora
• CLang
– 300 randomly selected pieces of coaching advice from
the log files of the 2003 RoboCup Coach Competition
– 22.52 words on average in NL sentences
– 14.24 tokens on average in formal expressions
• GeoQuery [Zelle & Mooney, 1996]
– 250 queries for the given U.S. geography database
– 6.87 words on average in NL sentences
– 5.32 tokens on average in formal expressions
– Also translated into Spanish, Turkish, & Japanese.
41
Experimental Methodology
• Evaluated using standard 10-fold cross validation
• Correctness
– CLang: output exactly matches the correct
representation
– GeoQuery: the resulting query retrieves the same
answer as the correct representation
• Metrics (a code sketch follows this slide):
  Precision = |Correct Completed Parses| / |Completed Parses|
  Recall = |Correct Completed Parses| / |Sentences|
42
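A small sketch of these metrics, assuming a parser that returns None when it fails to produce a parse; this is a hypothetical helper, not the evaluation scripts used in these experiments.

```python
def precision_recall(outputs, gold):
    """outputs[i] is the parser's MR for sentence i, or None if no parse was produced."""
    completed = [(o, g) for o, g in zip(outputs, gold) if o is not None]
    correct = sum(o == g for o, g in completed)     # CLang: exact match
    # (for GeoQuery, correctness would instead compare the retrieved answers)
    precision = correct / len(completed) if completed else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

# Three sentences: one correct parse, one wrong parse, one failure to parse.
p, r = precision_recall(
    ["bowner(player(our,2))", "bowner(player(our,3))", None],
    ["bowner(player(our,2))", "bowner(player(our,2))", "bpos(penalty-area(our))"])
print(p, r)   # 0.5 0.333...
```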
Precision Learning Curve for CLang
43
Recall Learning Curve for CLang
44
Precision Learning Curve for GeoQuery
45
Recall Learning Curve for GeoQuery
46
Precision Learning Curve for GeoQuery (WASP)
47
Recall Learning Curve for GeoQuery (WASP)
48
Tactical Natural Language Generation
• Mapping a formal MR into NL
• Can be done using statistical machine translation
– Previous work focuses on using generation in
interlingual MT (Hajič et al., 2004)
– There has been little, if any, research on exploiting
statistical MT methods for generation
49
Tactical Generation
• Can be seen as inverse of semantic parsing
NL: The goalie should always stay in our half
  Semantic parsing ↓    ↑ Tactical generation
MR: ((true) (do our {1} (pos (half our))))
50
Generation by Inverting WASP
• Same synchronous grammar is used for both
generation and semantic parsing
Semantic parsing: NL is the input, MRL is the output.
Tactical generation: MRL is the input, NL is the output.
The same rule, e.g. QUERY → What is CITY / answer(CITY), is used in both directions.
51
Generation by Inverting WASP
• Same procedure for lexical acquisition
• Chart generator very similar to chart parser, but
treats MRL as input
• Log-linear probabilistic model inspired by
Pharaoh (Koehn et al., 2003), a phrase-based MT
system
• Uses a bigram language model for target NL
• Resulting system is called WASP⁻¹ (inverted WASP)
52
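For the bigram-language-model component, a generic add-one-smoothed sketch follows; this is illustrative only, and is not the language model or training data actually used in WASP⁻¹.

```python
from collections import Counter
import math

def train_bigram_lm(sentences):
    """Add-one-smoothed bigram model over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        unigrams.update(tokens[:-1])
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    vocab = len(set(unigrams)) + 1            # +1 for the end-of-sentence symbol
    def logprob(sentence):
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        return sum(math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
                   for a, b in zip(tokens[:-1], tokens[1:]))
    return logprob

lm = train_bigram_lm(["the goalie should always stay in our half",
                      "all our players should stay in our half"])
print(lm("our players should stay in our half"))
```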
GeoQuery (NIST score; English)
53
RoboCup (NIST score; English)
contiguous phrases only
Similar human evaluation results
in terms of fluency and adequacy
54
Future Work
• Explore methods that can automatically generate
SAPTs to minimize the annotation effort for
SCISSOR.
• Learning semantic parsers just from sentences
paired with “perceptual context.”
55
Conclusions
• Learning semantic parsers is an important and
challenging problem in natural-language learning.
• We have obtained promising results on several
applications using a variety of approaches with
different strengths and weaknesses.
• One of our semantic parsers has been inverted to
produce a generation system.
• Not many others have explored this problem; I
would encourage others to consider it.
• More and larger corpora are needed for training
and testing semantic parser induction.
56
Thank You!
Our papers on learning semantic parsers are on-line at:
http://www.cs.utexas.edu/~ml/publication/lsp.html
Our corpora can be downloaded from:
http://www.cs.utexas.edu/~ml/nldata.html
Try our GeoQuery demo at:
http://www.cs.utexas.edu/~ml/geo.html
Questions??
57
PR Curves
• SCISSOR, WASP, and KRISP give probabilities
for their semantic derivations which are taken as
confidences of the MRs
• We plot precision-recall curves (PR curves) at the
last points of the learning curves by sorting the
best MR for each sentence by confidence and then
computing precision at every recall value (a code
sketch follows this slide)
• The result of COCKTAIL on GeoQuery is shown
as a point on the PR Curve, while its result on
CLang is not shown since it failed to run at the last
point of the learning curve.
58
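A small sketch of constructing such a curve from per-sentence confidences; this is a hypothetical helper, not the original plotting code.

```python
def pr_curve(predictions, n_sentences):
    """predictions: (confidence, is_correct) for each sentence that received a parse.
    Returns (recall, precision) points as the confidence threshold is lowered."""
    points, correct = [], 0
    ranked = sorted(predictions, key=lambda p: p[0], reverse=True)
    for k, (conf, is_correct) in enumerate(ranked, start=1):
        correct += is_correct
        points.append((correct / n_sentences, correct / k))   # (recall, precision)
    return points

preds = [(0.9, True), (0.8, True), (0.7, False), (0.4, True), (0.2, False)]
for recall, precision in pr_curve(preds, n_sentences=6):
    print(f"recall={recall:.2f}  precision={precision:.2f}")
```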
PR Curve for CLang
59
PR Curve for GeoQuery
60
Precision Learning Curve for GeoQuery (880)
61
Recall Learning Curve for GeoQuery (880)
62
PR Curve for GeoQuery (880)
63
Precision Learning Curve for GeoQuery
(WASP with lambda-calculus)
64
Recall Learning Curve for GeoQuery
(WASP with lambda-calculus)
65