Grounded Language Learning
Models for Ambiguous Supervision
Joohyun Kim
Supervising Professor: Raymond J. Mooney
Ph.D Thesis Defense Talk
August 23, 2013
Outline
• Introduction/Motivation
• Grounded Language Learning in Limited Ambiguity (Kim
and Mooney, COLING 2010)
– Learning to sportscast
• Grounded Language Learning in High Ambiguity
(Kim and Mooney, EMNLP 2012)
– Learn to follow navigational instructions
• Discriminative Reranking for Grounded Language
Learning (Kim and Mooney, ACL 2013)
• Future Directions
• Conclusion
2
Language Grounding
• The process of acquiring the semantics of natural language with respect to relevant perceptual contexts
• A human child grounds language in perceptual contexts through repeated exposure, in a statistical way (Saffran et al. 1999, Saffran 2003)
• Ideally, we want a computational system to learn in a similar way
4
Language Grounding: Machine
Iran’s goalkeeper
blocks the ball
5
Language Grounding: Machine
Iran’s goalkeeper
blocks the ball
Block(IranGoalKeeper)
Machine
6
Language Grounding: Machine
Language Learning
Iran’s goalkeeper
blocks the ball
Computer Vision
Block(IranGoalKeeper)
7
Natural Language and
Meaning Representation
Iran’s goalkeeper
blocks the ball
Block(IranGoalKeeper)
8
Natural Language and
Meaning Representation
Natural Language (NL)
Iran’s goalkeeper
blocks the ball
Block(IranGoalKeeper)
NL: A language that arises naturally from the innate nature of
human intellect, such as English, German, French, Korean, etc.
9
Natural Language and
Meaning Representation
Natural Language (NL)
Iran’s goalkeeper
blocks the ball
Meaning Representation Language
(MRL)
Block(IranGoalKeeper)
NL: A language that arises naturally from the innate nature of
human intellect, such as English, German, French, Korean, etc.
MRL: A formal language that machines can understand, such as
logic or any computer-executable code
10
Semantic Parsing and
Surface Realization
NL
Iran’s goalkeeper
blocks the ball
MRL
Block(IranGoalKeeper)
Semantic Parsing (NL → MRL)
Semantic Parsing: maps a natural-language sentence to a
full, detailed semantic representation
→ Machine understands natural language
11
Semantic Parsing and
Surface Realization
NL
Surface Realization (MRL → NL)
Iran’s goalkeeper
blocks the ball
MRL
Block(IranGoalKeeper)
Semantic Parsing (NL → MRL)
Semantic Parsing: maps a natural-language sentence to a
full, detailed semantic representation
→ Machine understands natural language
Surface Realization: Generates a natural-language sentence
from a meaning representation.
→ Machine communicates with natural language
12
Conventional Language Learning Systems
• Requires manually annotated corpora
• Time-consuming, hard to acquire, and not
scalable
Semantic Parser
Learner
Manually Annotated
Training Corpora
(NL/MRL pairs)
Semantic Parser
NL
MRL
13
Learning from Perceptual Environment
• Motivated by how children learn language in
rich, ambiguous perceptual environment with
linguistic input
• Advantages
– Naturally obtainable corpora
– Relatively easy to annotate
– Motivated by natural process of human language
learning
14
Navigation Example
Alice: 식당에서 우회전 하세요 (Turn right at the restaurant)
Bob
Slide from David Chen
15
Navigation Example
Alice: 병원에서 우회전 하세요 (Turn right at the hospital)
Bob
Slide from David Chen
16
Navigation Example
Scenario 1
식당에서 우회전 하세요 (Turn right at the restaurant)
Scenario 2
병원에서 우회전 하세요 (Turn right at the hospital)
Slide from David Chen
17
Navigation Example
Scenario 1
병원에서 우회전 하세요 (Turn right at the hospital)
Scenario 2
식당에서 우회전 하세요 (Turn right at the restaurant)
Slide from David Chen
18
Navigation Example
Scenario 1
병원에서 우회전 하세요 (Turn right at the hospital)
Scenario 2
식당에서 우회전 하세요 (Turn right at the restaurant)
Make a right turn
Slide from David Chen
19
Navigation Example
Scenario 1
식당에서 우회전 하세요 (Turn right at the restaurant)
Scenario 2
병원에서 우회전 하세요 (Turn right at the hospital)
Slide from David Chen
20
Navigation Example
Scenario 1
식당 (restaurant)
Scenario 2
Slide from David Chen
21
Navigation Example
Scenario 1
식당에서 우회전 하세요 (Turn right at the restaurant)
Scenario 2
병원에서 우회전 하세요 (Turn right at the hospital)
Slide from David Chen
22
Navigation Example
Scenario 1
Scenario 2
병원 (hospital)
Slide from David Chen
23
Thesis Contributions
• Generative models for grounded language learning
from ambiguous, perceptual environment
– Unified probabilistic model incorporating linguistic cues
and MR structures (vs. previous approaches)
– General framework of probabilistic approaches that learn
NL-MR correspondences from ambiguous supervision
• Adapting discriminative reranking to grounded
language learning
– Standard reranking is not directly applicable
– No single gold-standard reference for training data
– Weak response from the perceptual environment can train a
discriminative reranker
24
Outline
• Introduction/Motivation
• Grounded Language Learning in Limited Ambiguity (Kim
and Mooney, COLING 2010)
– Learning to sportscast
• Grounded Language Learning in High Ambiguity
(Kim and Mooney, EMNLP 2012)
– Learn to follow navigational instructions
• Discriminative Reranking for Grounded Language
Learning (Kim and Mooney, ACL 2013)
• Future Directions
• Conclusion
25
Navigation Task (Chen and Mooney, 2011)
• Learn to interpret and follow navigation
instructions
– e.g. Go down this hall and make a right when you
see an elevator to your left
• Use virtual worlds and instructor/follower
data from MacMahon et al. (2006)
• No prior linguistic knowledge
• Infer language semantics by observing how
humans follow instructions
26
Sample Environment (MacMahon et al., 2006)
[Map figure. Legend: H – Hat Rack, L – Lamp, E – Easel, S – Sofa, B – Barstool, C – Chair]
27
Executing Test Instruction
28
Task Objective
• Learn the underlying meanings of instructions by
observing human actions for the instructions
– Learn to map instructions (NL) into correct formal plan
of actions (MR)
• Learn from high ambiguity
– Training input of NL instruction / landmarks plan (Chen
and Mooney, 2011) pairs
– Landmarks plan
◦ Describes actions in the environment along with notable objects encountered on the way
◦ Over-approximates the meaning of the instruction, including unnecessary details
◦ Only a subset of the plan is relevant to the instruction
29
Challenges
Instruction: "at the easel, go left and then take a right onto the blue
path at the corner"
Landmarks plan:
Travel ( steps: 1 ) ,
Verify ( at: EASEL , side: CONCRETE HALLWAY ) ,
Turn ( LEFT ) ,
Verify ( front: CONCRETE HALLWAY ) ,
Travel ( steps: 1 ) ,
Verify ( side: BLUE HALLWAY , front: WALL ) ,
Turn ( RIGHT ) ,
Verify ( back: WALL , front: BLUE HALLWAY , front: CHAIR ,
front: HATRACK , left: WALL , right: EASEL )
30
Challenges
Instruction: "at the easel, go left and then take a right onto the blue
path at the corner"
Correct plan:
Travel ( steps: 1 ) ,
Verify ( at: EASEL , side: CONCRETE HALLWAY ) ,
Turn ( LEFT ) ,
Verify ( front: CONCRETE HALLWAY ) ,
Travel ( steps: 1 ) ,
Verify ( side: BLUE HALLWAY , front: WALL ) ,
Turn ( RIGHT ) ,
Verify ( back: WALL , front: BLUE HALLWAY , front: CHAIR ,
front: HATRACK , left: WALL , right: EASEL )
Exponential Number of Possibilities!
→ Combinatorial matching problem between instruction and landmarks plan
32
Previous Work (Chen and Mooney, 2011)
• Circumvent combinatorial NL-MR correspondence
problem
– Constructs supervised NL-MR training data by refining the
landmarks plan with a learned semantic lexicon
◦ Greedily selects high-score lexemes to choose probable MR components out of the landmarks plan
– Trains a supervised semantic parser to map a novel
instruction (NL) to the correct formal plan (MR)
– Loses information during refinement
◦ Deterministically selects high-score lexemes
◦ Ignores possibly useful low-score lexemes
◦ Some relevant MR components are not considered at all
33
Proposed Solution (Kim and Mooney, 2012)
• Learn probabilistic semantic parser directly
from ambiguous training data
– Disambiguate input + learn to map NL instructions
to formal MR plan
– Semantic lexicon (Chen and Mooney, 2011) as basic unit
for building NL-MR correspondences
– Transforms into standard PCFG (Probabilistic
Context-Free Grammar) induction problem with
semantic lexemes as nonterminals and NL words
as terminals
34
System Diagram (Chen and Mooney, 2011)
[Diagram. Training: observation (world state + action trace) → landmarks plan → plan refinement (possible information loss) → supervised refined plan → (supervised) semantic parser learner. Testing: instruction + world state → semantic parser → execution module (MARCO) → action trace]
35
System Diagram of Proposed Solution
[Diagram. Training: observation (world state + action trace) → navigation plan constructor → landmarks plan; instruction + landmarks plan → probabilistic semantic parser learner (from ambiguous supervision). Testing: instruction → semantic parser; world state → execution module (MARCO) → action trace]
36
PCFG Induction Model for Grounded
Language Learning (Borschinger et al. 2011)
• PCFG rules to describe generative process from
MR components to corresponding NL words
37
Hierarchy Generation PCFG Model
(Kim and Mooney, 2012)
• Limitations of Borschinger et al. 2011
– Only works in low-ambiguity settings
◦ 1 NL – a handful of MRs (≤ order of 10s)
– Only outputs MRs included in the PCFG constructed from
training data
• Proposed model
– Uses semantic lexemes as units of semantic concepts
– Disambiguates NL-MR correspondences at the semantic concept
(lexeme) level
– Disambiguates a much higher level of ambiguous supervision
– Outputs novel MRs not appearing in the PCFG by composing an
MR parse from semantic lexeme MRs
38
Semantic Lexicon (Chen and Mooney, 2011)
• Pair of NL phrase w and MR subgraph g
• Based on correlations between NL instructions and
context MRs (landmarks plans)
– How probable graph g is given phrase w: the cooccurrence
of g and w vs. the general occurrence of g without w
• Examples
– “to the stool”, Travel(), Verify(at: BARSTOOL)
– “black easel”, Verify(at: EASEL)
– “turn left and walk”, Turn(), Travel()
39
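This kind of correlation scoring can be sketched in a few lines of Python. This is a minimal, hypothetical version: the data layout and the `p(g|w) − p(g|¬w)` scoring function are illustrative, not the exact GILL formula.

```python
from collections import Counter

def lexicon_scores(pairs, all_contexts):
    """Score (phrase, subgraph) pairs: a subgraph g scores high when it
    cooccurs with phrase w but rarely occurs in contexts without w."""
    cooccur = Counter()       # (phrase, graph) cooccurrence counts
    phrase_count = Counter()  # contexts in which each phrase is seen
    graph_count = Counter()   # contexts in which each graph occurs at all
    for phrase, graphs in pairs:
        phrase_count[phrase] += 1
        for g in graphs:
            cooccur[(phrase, g)] += 1
    for graphs in all_contexts:
        for g in graphs:
            graph_count[g] += 1
    scores = {}
    for (phrase, g), c in cooccur.items():
        p_with = c / phrase_count[phrase]                    # p(g | w)
        rest = max(1, len(all_contexts) - phrase_count[phrase])
        p_without = (graph_count[g] - c) / rest              # p(g | not w)
        scores[(phrase, g)] = p_with - p_without
    return scores
```

A subgraph that appears every time the phrase does, and never otherwise, gets the maximum score of 1.0.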
Lexeme Hierarchy Graph (LHG)
• Hierarchy of semantic lexemes
by subgraph relationship,
constructed for each training
example
– Lexeme MRs = semantic
concepts
– Lexeme hierarchy = semantic
concept hierarchy
– Shows how complex semantic
concepts hierarchically generate
smaller concepts, which are further
connected to NL word groundings
[Figure: example LHG with lexeme nodes such as Turn(RIGHT), Verify(side: HATRACK, front: SOFA); Turn(RIGHT), Verify(side: HATRACK); Travel(steps: 3), Verify(at: EASEL); Travel(), Verify(at: EASEL); Verify(side: HATRACK); Verify(at: EASEL); Travel(); and Turn(), arranged by subgraph relationship]
40
PCFG Construction
• Add rules for each node in the LHG
– Each complex concept chooses which subconcepts
to describe, which are finally connected to the NL
instruction
◦ Each node generates all k-permutations of its children
nodes – we do not know which subset is correct
– NL words are generated from lexeme nodes by a
unigram Markov process (Borschinger et al. 2011)
– PCFG rule weights are optimized by EM
◦ The most probable MR components out of all possible
combinations are estimated
41
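The k-permutation rule generation above can be sketched in Python. This is an illustrative fragment (the nonterminal naming scheme `S_<mr>` is made up for the sketch), showing only the rule right-hand sides, not the weights.

```python
from itertools import permutations

def k_permutation_rules(parent_mr, child_mrs):
    """One PCFG rule per k-permutation of the node's children (k = 1..n):
    the parent concept may describe any ordered subset of its subconcepts,
    since we do not know which subset is correct."""
    rules = []
    for k in range(1, len(child_mrs) + 1):
        for perm in permutations(child_mrs, k):
            rules.append((f"S_{parent_mr}", [f"S_{m}" for m in perm]))
    return rules
```

For two children this yields four rules: each child alone, plus both orderings of the pair.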
PCFG Construction
∀c ∈ contexts:
  Root → S_c
(child concepts are generated selectively from parent concepts)

∀ non-leaf node with MR m:
  S_m → S_m1 … S_mk, for every k-permutation of the
  children lexeme MRs m1, …, mn of m (k = 1, …, n)

∀ lexeme MR m (each semantic concept generates at least one NL word):
  S_m → Phrase_m
  Phrase_m → Word_m | PhX_m Word_m | Ph_m Word_∅
  Ph_m → PhX_m Word_m | Ph_m Word_∅ | Word_m
  PhX_m → PhX_m Word_m | PhX_m Word_∅ | Word_m | Word_∅

(all semantic concepts generate relevant NL words)
  Word_m → s, ∀s s.t. (s, m) ∈ lexicon L
  Word_m → w, ∀word w ∈ s s.t. (s, m) ∈ lexicon L
  Word_∅ → w, ∀word w ∈ NLs
42
Parsing New NL Sentences
• PCFG rule weights are optimized by the Inside-Outside
algorithm on the training data
• Obtain the most probable parse tree for each test NL
sentence from the learned weights using the CKY algorithm
• Compose the final MR parse from the lexeme MRs appearing in
the parse tree
– Consider only the lexeme MRs responsible for generating
NL words
– From the bottom of the tree, mark only responsible MR
components, which propagate to the top level
– Able to compose novel MRs never seen in the training
data
43
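The bottom-up composition step can be sketched as a simple tree walk. This is a minimal Python illustration with a made-up `Node` type; the real model composes structured MR graphs rather than a flat list of labels.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str = ""                  # lexeme MR carried by this node, if any
    is_lexeme: bool = False
    generated_words: bool = False    # did this lexeme emit any NL words?
    children: list = field(default_factory=list)

def compose_mr(node):
    """Collect, bottom-up, only the lexeme MRs responsible for generating
    NL words; these components propagate up to form the final MR plan."""
    parts = []
    if node.is_lexeme and node.generated_words:
        parts.append(node.label)
    for child in node.children:
        parts.extend(compose_mr(child))
    return parts
```

Lexeme nodes that never ground to a word contribute nothing, which is how the model drops irrelevant parts of the landmarks plan.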
Most probable parse tree for a test NL instruction
[Figure: most probable parse tree for the test instruction “Turn left and find the sofa then turn around the corner”, showing how lexeme MRs such as Turn(LEFT); Travel(), Verify(at: SOFA); and Turn() are grounded to the corresponding NL word groups]
46
Unigram Generation PCFG Model
• Limitations of the Hierarchy Generation PCFG
Model
– Complexity caused by the Lexeme Hierarchy Graph
and k-permutations
– Tends to over-fit the training data
• Proposed solution: simpler model
– Generates relevant semantic lexemes one by one
– No extra PCFG rules for k-permutations
– Maintains a simpler PCFG rule set, faster to train
47
PCFG Construction
• Unigram Markov generation of relevant
lexemes
– Each context MR generates relevant lexemes one
by one
– Permutations of the order in which relevant
lexemes appear are implicitly covered
48
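To see why the unigram model is simpler, the rule-set growth of the two constructions can be compared with a quick sketch (illustrative counts of the concept-level rules only, not the thesis' actual grammar sizes):

```python
from math import perm  # n! / (n-k)!

def hierarchy_rule_count(n):
    """Hierarchy model: one rule per k-permutation of n children, k = 1..n."""
    return sum(perm(n, k) for k in range(1, n + 1))

def unigram_rule_count(n):
    """Unigram model: S_c -> L_m S_c and S_c -> L_m per lexeme; linear in n."""
    return 2 * n
```

With just 5 child lexemes the hierarchy construction already needs 325 permutation rules, while the unigram construction needs 10.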
PCFG Construction
∀c ∈ contexts:
  Root → S_c

∀ lexeme MR m (each semantic concept is generated by a
unigram Markov process):
  S_c → L_m S_c | L_m
  L_m → Phrase_m

(all semantic concepts generate relevant NL words)
  Phrase_m → Word_m | PhX_m Word_m | Ph_m Word_∅
  Ph_m → PhX_m Word_m | Ph_m Word_∅ | Word_m
  PhX_m → PhX_m Word_m | PhX_m Word_∅ | Word_m | Word_∅
  Word_m → s, ∀s s.t. (s, m) ∈ lexicon L
  Word_m → w, ∀word w ∈ s s.t. (s, m) ∈ lexicon L

  S_∅ → Phrase_∅
  Phrase_∅ → Phrase_∅ Word_∅
  Word_∅ → w, ∀word w ∈ NLs
49
Parsing New NL Sentences
• Follows a scheme similar to the Hierarchy
Generation PCFG model
• Compose the final MR parse from the lexeme MRs
appearing in the parse tree
– Consider only the lexeme MRs responsible for
generating NL words
– Mark the relevant lexeme MR components in the
context MR appearing in the top nonterminal
50
Most probable parse tree for a test NL instruction
[Figure: most probable parse tree under the Unigram Generation model for the test instruction “Turn left and find the sofa then turn around the corner”. The context MR Turn(LEFT), Verify(front: BLUE HALL, front: SOFA), Travel(steps: 2), Verify(at: SOFA), Turn(RIGHT) generates the relevant lexemes Turn(LEFT); Travel(), Verify(at: SOFA); and Turn(), each grounded to the corresponding NL words]
54
Data
• 3 maps, 6 instructors, 1–15 followers/direction
• Hand-segmented into single-sentence steps to make the learning
easier (Chen & Mooney, 2011)
• Mandarin Chinese translation of each sentence (Chen, 2012)
• Word-segmented version by the Stanford Chinese Word Segmenter
• Character-segmented version

Paragraph:
“Take the wood path towards the easel. At the easel, go left
and then take a right on the blue path at the corner. Follow
the blue path towards the chair and at the chair, take a right
towards the stool. When you reach the stool, you are at 7.”
→ Turn, Forward, Turn left, Forward, Turn right, Forward x 3,
Turn right, Forward

Single sentence:
“Take the wood path towards the easel.” → Turn
“At the easel, go left and then take a right on the blue path
at the corner.” → Forward, Turn left, Forward, Turn right
55
Data Statistics
                                Paragraph      Single-Sentence
# Instructions                  706            3236
Avg. # sentences                5.0 (±2.8)     1.0 (±0)
Avg. # actions                  10.4 (±5.7)    2.1 (±2.4)
Avg. # words/sent (English)     37.6 (±21.1)   7.8 (±5.1)
Avg. # words/sent (Ch-Word)     31.6 (±18.1)   6.9 (±4.9)
Avg. # words/sent (Ch-Char)     48.9 (±28.3)   10.6 (±7.3)
Vocabulary (English)            660            508
Vocabulary (Chinese-Word)       629            448
Vocabulary (Chinese-Character)  661            328
56
Evaluations
• Leave-one-map-out approach
– 2 maps for training and 1 map for testing
– Parse accuracy & plan execution accuracy
• Compared with Chen and Mooney (2011) and Chen (2012)
– Ambiguous context (landmarks plan) is refined by greedy
selection of high-score lexemes, with two different lexicon
learning algorithms
◦ Chen and Mooney (2011): Graph Intersection Lexicon Learning (GILL)
◦ Chen (2012): Subgraph Generation Online Lexicon Learning (SGOLL)
– Semantic parser KRISP (Kate and Mooney, 2006) trained on the
resulting supervised data
57
Parse Accuracy
• Evaluate how well the learned semantic
parsers can parse novel sentences in test data
• Metric: partial parse accuracy
58
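A partial-credit metric of this kind can be sketched in Python. This is an illustrative version that scores the overlap of MR plan components as sets; the thesis' exact partial-matching scheme may differ.

```python
def partial_parse_accuracy(predicted, gold):
    """Partial-credit precision/recall/F1 over MR plan components."""
    pred, ref = set(predicted), set(gold)
    correct = len(pred & ref)
    p = correct / len(pred) if pred else 0.0
    r = correct / len(ref) if ref else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```

A parse that produces a correct but incomplete plan gets full precision but reduced recall, which is exactly the trade-off the following slides report.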
Parse Accuracy (English)
                                 Precision   Recall   F1
Chen & Mooney (2011)             90.16       55.41    68.59
Chen (2012)                      88.36       57.03    69.31
Hierarchy Generation PCFG Model  87.58       65.41    74.81
Unigram Generation PCFG Model    86.1        68.79    76.44
59
Parse Accuracy (Chinese-Word)
                                 Precision   Recall   F1
Chen (2012)                      88.87       58.76    70.74
Hierarchy Generation PCFG Model  80.56       71.14    75.53
Unigram Generation PCFG Model    79.45       73.66    76.41
60
Parse Accuracy (Chinese-Character)
                                 Precision   Recall   F1
Chen (2012)                      92.48       56.47    70.01
Hierarchy Generation PCFG Model  79.77       67.38    73.05
Unigram Generation PCFG Model    79.73       75.52    77.55
61
End-to-End Execution Evaluations
• Test how well the formal plan output by the
semantic parser reaches the destination
• Strict metric: only successful if the final
position matches exactly
– Also considers facing direction in single-sentence
– Paragraph execution is affected by even one wrong
single-sentence execution
62
End-to-End Execution Evaluations
(English)
                                 Single-Sentence   Paragraph
Chen & Mooney (2011)             54.4              16.18
Chen (2012)                      57.28             19.18
Hierarchy Generation PCFG Model  57.22             20.17
Unigram Generation PCFG Model    67.14             28.12
63
End-to-End Execution Evaluations
(Chinese-Word)
                                 Single-Sentence   Paragraph
Chen (2012)                      58.7              20.13
Hierarchy Generation PCFG Model  61.03             19.08
Unigram Generation PCFG Model    63.4              23.12
64
End-to-End Execution Evaluations
(Chinese-Character)
                                 Single-Sentence   Paragraph
Chen (2012)                      57.27             16.73
Hierarchy Generation PCFG Model  55.61             12.74
Unigram Generation PCFG Model    62.85             23.33
65
Discussion
• Better recall in parse accuracy
– Our probabilistic model uses useful but low-score lexemes as
well → more coverage
– Unified models are not vulnerable to intermediate information
loss
• Hierarchy Generation PCFG model over-fits the training data
– Complexity: LHG and k-permutation rules
◦ Particularly weak on the Chinese-character corpus
◦ Longer avg. sentence length: harder to estimate PCFG weights
• Unigram Generation PCFG model is better
– Less complexity, avoids over-fitting, better generalization
• Better than Borschinger et al. 2011
– Overcomes intractability with a complex MRL
– Learns from more general, complex ambiguity
– Produces novel MR parses never seen during training
66
Comparison of Grammar Size
and EM Training Time
                     Hierarchy Generation       Unigram Generation
                     PCFG Model                 PCFG Model
Data                 |Grammar|   Time (hrs)     |Grammar|   Time (hrs)
English              20451       17.26          16357       8.78
Chinese (Word)       21636       15.99          15459       8.05
Chinese (Character)  19792       18.64          13514       12.58
67
Outline
• Introduction/Motivation
• Grounded Language Learning in Limited Ambiguity (Kim
and Mooney, COLING 2010)
– Learning to sportscast
• Grounded Language Learning in High Ambiguity
(Kim and Mooney, EMNLP 2012)
– Learn to follow navigational instructions
• Discriminative Reranking for Grounded Language
Learning (Kim and Mooney, ACL 2013)
• Future Directions
• Conclusion
68
Discriminative Reranking
• Effective approach to improve performance of
generative models with secondary discriminative
model
• Applied to various NLP tasks
– Syntactic parsing (Collins, ICML 2000; Collins, ACL 2002; Charniak & Johnson, ACL 2005)
– Semantic parsing (Lu et al., EMNLP 2008; Ge and Mooney, ACL 2006)
– Part-of-speech tagging (Collins, EMNLP 2002)
– Semantic role labeling (Toutanova et al., ACL 2005)
– Named entity recognition (Collins, ACL 2002)
– Machine translation (Shen et al., NAACL 2004; Fraser and Marcu, ACL 2006)
– Surface realization in language generation (White & Rajkumar, EMNLP 2009;
Konstas & Lapata, ACL 2012)
• Goal:
– Adapt discriminative reranking to grounded language
learning
69
Discriminative Reranking
• Generative model
– The trained model outputs the best result with max probability
[Diagram: testing example → trained generative model → 1-best candidate with maximum probability]
70
Discriminative Reranking
• Can we do better?
– A secondary discriminative model picks the best out of the n-best
candidates from the baseline model
[Diagram: testing example → trained baseline generative model → GEN: n-best candidates (Candidate 1 … Candidate n) → trained secondary discriminative model → best prediction]
71
How can we apply discriminative
reranking?
• Impossible to apply standard discriminative
reranking to grounded language learning
– Lacks a single gold-standard reference for each
training example
– Instead, provides weak supervision of surrounding
perceptual context (landmarks plan)
• Use response feedback from the perceptual world
– Evaluate candidate formal MRs by executing them in
simulated worlds
◦ Also used in evaluating the final end-task, plan execution
– Weak indication of whether a candidate is good/bad
– Multiple candidate parses for each parameter update
◦ The response signal is weak and distributed over all candidates
72
Reranking Model:
Averaged Perceptron (Collins, 2000)
• The parameter weight vector is updated when the
trained model predicts a wrong candidate
[Diagram: a training example passes through our trained baseline generative model (GEN) to produce n-best candidates with feature vectors a_1 … a_n; the perceptron W scores each candidate (W · a); the weights would be updated by a_g − a_4, the difference between the gold-standard feature vector and the best prediction's, but the gold-standard reference is Not Available in our setting]
73
Response-based Weight Update
• Pick a pseudo-gold parse out of all candidates
– The most preferred one in terms of plan execution
– Evaluate the composed MR plans from candidate parses
– The MARCO (MacMahon et al. AAAI 2006) execution module runs
and evaluates each candidate MR in the world
◦ Also used for evaluating the end-goal, plan execution
performance
– Record the execution success rate
◦ Whether each candidate MR reaches the intended
destination
◦ MARCO is nondeterministic, so average over 10 trials
– Prefer the candidate with the best execution success
rate during training
74
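The response-based perceptron update can be sketched in Python. This is a minimal illustration with dict-based sparse feature vectors; the names are made up, and the averaging step of the averaged perceptron is omitted.

```python
def perceptron_update(weights, candidates, features, exec_rate):
    """One response-based update: pick the pseudo-gold candidate (highest
    execution success rate) and move the weights toward its features when
    the model's current best-scoring candidate differs."""
    def score(c):
        return sum(weights.get(f, 0.0) * v for f, v in features[c].items())
    predicted = max(candidates, key=score)
    pseudo_gold = max(candidates, key=lambda c: exec_rate[c])
    if predicted != pseudo_gold:
        for f, v in features[pseudo_gold].items():   # reward pseudo-gold
            weights[f] = weights.get(f, 0.0) + v
        for f, v in features[predicted].items():     # penalize prediction
            weights[f] = weights.get(f, 0.0) - v
    return weights
```

After the update, the pseudo-gold candidate outscores the previously predicted one.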
Response-based Update
• Select the pseudo-gold reference based on MARCO execution
results
[Diagram: the n-best candidates and their derived MRs are run through the MARCO execution module; the execution success rates (e.g. 0.6, 0.4, 0.0, 0.9, …, 0.2) select the pseudo-gold reference (highest rate), and the perceptron W is updated by the feature-vector difference between the pseudo-gold reference and the current best prediction]
75
Weight Update with Multiple Parses
• Candidates other than the pseudo-gold could be useful
– Multiple parses may have the same maximum execution
success rate
– “Lower” execution success rates could still mean a correct plan,
given the indirect supervision of human follower actions
◦ MR plans may be underspecified or have ignorable details attached
◦ Sometimes inaccurate, but contain the correct MR components to
reach the desired goal
• Weight update with multiple candidate parses
– Use candidates with higher execution success rates than the
currently best-predicted candidate
– Update with the feature-vector difference weighted by the
difference between execution success rates
76
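The multi-parse variant can be sketched the same way (again a minimal, illustrative Python version): every candidate that executes better than the current prediction contributes its feature-vector difference, scaled by the gap in execution success rates.

```python
def multi_parse_update(weights, candidates, features, exec_rate):
    """Update with each candidate whose execution success rate exceeds the
    currently predicted candidate's, weighting each feature-vector
    difference by the gap in success rates."""
    def score(c):
        return sum(weights.get(f, 0.0) * v for f, v in features[c].items())
    predicted = max(candidates, key=score)
    for c in candidates:
        gap = exec_rate[c] - exec_rate[predicted]
        if gap > 0:  # c executes better than the current prediction
            for f in set(features[c]) | set(features[predicted]):
                diff = features[c].get(f, 0.0) - features[predicted].get(f, 0.0)
                weights[f] = weights.get(f, 0.0) + gap * diff
    return weights
```

Candidates far better than the prediction move the weights a lot; marginally better ones move them only slightly, so the weak feedback is spread in proportion to its strength.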
Weight Update with Multiple Parses
• Weight update with multiple candidates that have a higher
execution success rate than the currently predicted parse
[Diagram, update (1): the feature-vector difference between Candidate 4 (execution success rate 0.9) and the predicted Candidate 2 (rate 0.4) is weighted by (0.9 − 0.4) and added to the perceptron weights W]
77
Weight Update with Multiple Parses
• Weight update with multiple candidates that have a higher
execution success rate than the currently predicted parse
[Diagram, update (2): likewise, the feature-vector difference between Candidate 1 (rate 0.6) and the predicted Candidate 2 (rate 0.4) is weighted by (0.6 − 0.4) and added to W]
78
Features
• Binary indicators of whether a certain composition of
nonterminals/terminals appears in the parse tree
(Collins, EMNLP 2002; Lu et al., EMNLP 2008; Ge & Mooney, ACL 2006)
L1: Turn(LEFT), Verify(front:SOFA, back:EASEL),
Travel(steps:2), Verify(at:SOFA), Turn(RIGHT)
L2: Turn(LEFT), Verify(front:SOFA)
L3: Travel(steps:2), Verify(at:SOFA), Turn(RIGHT)
L4: Turn(LEFT)
L5: Travel(), Verify(at:SOFA)
L6: Turn()
e.g. f(L3 → L5 L6) = 1,  f(L5 ↝ “find”) = 1
NL: Turn left and
find the sofa
then turn around the corner
79
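Extracting such binary indicator features can be sketched as a tree walk. This is an illustrative Python fragment with a made-up tuple encoding of parse trees: `(nonterminal, children)`, where each child is a subtree or a terminal word.

```python
def parse_tree_features(tree):
    """Binary indicator features over a parse tree: one feature per
    production (label -> children) and one per word grounding (label ~> word)."""
    feats = {}
    def walk(node):
        label, children = node
        kids = [c[0] if isinstance(c, tuple) else c for c in children]
        feats[f"{label} -> {' '.join(kids)}"] = 1.0   # rule indicator
        for c in children:
            if isinstance(c, tuple):
                walk(c)
            else:
                feats[f"{label} ~> {c}"] = 1.0        # word-grounding indicator
    walk(tree)
    return feats
```

The resulting sparse dict is exactly the feature-vector form consumed by the perceptron updates described earlier.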
Evaluations
• Leave-one-map-out approach
– 2 maps for training and 1 map for testing
– Parse accuracy
– Plan execution accuracy (end goal)
• Compared with the two baseline models
– Hierarchy and Unigram Generation PCFG models
– All reranking results use 50-best parses
– Try to get 50-best distinct composed MR plans and their
corresponding parses out of the 1,000,000-best parses
◦ Many parse trees differ insignificantly, leading to the same
derived MR plans
◦ Generate a sufficiently large 1,000,000-best parse list from the
baseline model
80
Response-based Update vs. Baseline
(English)
           Parse F1             Single-sentence      Paragraph
           Baseline  Response   Baseline  Response   Baseline  Response
HIERARCHY  74.81     73.32      57.22     59.65      20.17     22.62
UNIGRAM    76.44     77.24      67.14     68.27      28.12     29.2
81
Response-based Update vs. Baseline
(Chinese-Word)
           Parse F1             Single-sentence      Paragraph
           Baseline  Response   Baseline  Response   Baseline  Response
HIERARCHY  75.53     77.26      61.03     64.12      19.08     21.29
UNIGRAM    76.41     77.74      63.4      65.64      23.12     23.74
Response-based
82
Response-based Update vs. Baseline
(Chinese-Character)
           Parse F1             Single-sentence      Paragraph
           Baseline  Response   Baseline  Response   Baseline  Response
HIERARCHY  73.05     76.26      55.61     64.08      12.74     22.25
UNIGRAM    77.55     79.76      62.85     65.5       23.33     25.35
83
Response-based Update vs. Baseline
• vs. Baseline
– The response-based approach performs better in the final
end-task, plan execution
– It optimizes the model for plan execution
84
Response-based Update with
Multiple vs. Single Parses (English)
           Parse F1           Single-sentence    Paragraph
           Single   Multi     Single   Multi     Single   Multi
HIERARCHY  73.32    73.43     59.65    62.81     22.62    26.57
UNIGRAM    77.24    77.81     68.27    68.93     29.2     29.1
85
Response-based Update with
Multiple vs. Single Parses (Chinese-Word)
           Parse F1           Single-sentence    Paragraph
           Single   Multi     Single   Multi     Single   Multi
HIERARCHY  77.26    78.11     64.12    64.15     21.29    21.55
UNIGRAM    77.74    78.8      65.64    66.27     23.74    25.95
86
Response-based Update with
Multiple vs. Single Parses (Chinese-Character)
           Parse F1           Single-sentence    Paragraph
           Single   Multi     Single   Multi     Single   Multi
HIERARCHY  76.26    79.44     64.08    64.08     22.25    22.58
UNIGRAM    79.76    79.94     65.5     66.84     25.35    27.16
87
Response-based Update with
Multiple vs. Single Parses
• Using multiple parses generally improves
performance
– A single-best pseudo-gold parse provides only weak
feedback
– Candidates with low execution success rates
produce underspecified plans or plans with
ignorable details, but capture the gist of the preferred
actions
– A variety of preferable parses improves both the
amount and the quality of the weak feedback
88
Outline
• Introduction/Motivation
• Grounded Language Learning in Limited Ambiguity (Kim
and Mooney, COLING 2010)
– Learning to sportscast
• Grounded Language Learning in High Ambiguity
(Kim and Mooney, EMNLP 2012)
– Learn to follow navigational instructions
• Discriminative Reranking for Grounded Language
Learning (Kim and Mooney, ACL 2013)
• Future Directions
• Conclusion
89
Future Directions
• Integrating syntactic components
– Learn joint model of syntactic and semantic
structure
• Large-scale data
– Data collection, model adaptation to large-scale
• Machine translation
– Application to summarized translation
• Real perceptual data
– Learn with raw features (sensory and vision data)
90
Outline
• Introduction/Motivation
• Grounded Language Learning in Limited Ambiguity (Kim
and Mooney, COLING 2010)
– Learning to sportscast
• Grounded Language Learning in High Ambiguity
(Kim and Mooney, EMNLP 2012)
– Learn to follow navigational instructions
• Discriminative Reranking for Grounded Language
Learning (Kim and Mooney, ACL 2013)
• Future Directions
• Conclusion
91
Conclusion
• Conventional language learning is expensive and not
scalable due to the annotation of training data
• Grounded language learning from relevant perceptual
context is promising, and its training corpus is easy to obtain
• Our proposed models provide a general framework of full
probabilistic models for learning NL-MR correspondences
from ambiguous supervision
• Discriminative reranking is possible and effective with
weak feedback from the perceptual environment
92
Thank You!