Adaptation Tutorial

Bridging Rule and Data-Driven Models
for Robust Dialogue State Tracking
Kai Yu
Shanghai Jiao Tong University
Universal Data Driven Solution
• Pro
• High performance with big data
• Less expert knowledge required
• Con
• Hard to explain
• Feature engineering is still important
Interpretable or knowledge-inspired structure?
Kai Yu. Bridging Rule and Data-driven Models.
Prior Knowledge Incorporation
• Bayesian era
• Prior distribution, e.g. MAP
• Tree questions
• Deep learning era – structured deep learning
• Task-related structure changes, e.g. phone tgt
sharing
• Explicit attention/memory networks
• Interpretable feature augmentation, e.g. i-vector
• How about high-level knowledge, e.g. Rules?
Content
• Part I: Dialog State Tracking
• Part II: Data-driven and Rule-based DST
• Part III: Bridging Rule and Data-Driven Models
Part I
Dialog State Tracking
Architecture of SDS
• User
• Input Module
– Automatic Speech Recognition
– Spoken Language Understanding
• Control Module (Dialogue Management)
– Dialogue State Tracking
– Decision Making
• Output Module
– Natural Language Generation
– Text-to-speech Synthesis
Key Concepts
• Dialogue Manager
– Input observation: acttype-slot-value tuples
– Dialogue State Tracking: state is (goal, user-action, history)
– Decision Making
– Output result
Dialogue Act
Inform(Restaurant=Chinese)
• acttype denotes how the slot-value
information is delivered by the user
– Inform
– Deny
– Affirm
– Negate
– Select
–…
• System act and user act may be different
Dialogue Act
Inform(Restaurant=Chinese)
• What is slot & value?
– Slot: Food
• Value: American food, Chinese, Indian, Japanese, …
– Slot: Area
• Value: north, west, south, east, …
– Slot: Price range
• Value: cheap, moderate, expensive, …
– Slot: Name
• Value: rice house, royal spice, sala thong, …
Dialogue State
• Dialogue state denotes the machine’s
understanding of the status of the dialogue
• Widely used definition: a tuple consisting of
S = (g,u,h)
– User’s goal g
– User’s action u
– Dialogue history h
State-of-the-art Dialogue Management
• Inference and decision under uncertainty
Dialogue Act Distribution -> Dialogue State Tracking -> Dialogue State Distribution -> Decision Making -> Dialogue Act
• Input observation:
acttype-slot-value tuples with confidence score
• Maintained state status
belief state (i.e. state distribution)
Formulate Dialogue State Tracking
As An Independent Problem
Dialogue Act Distribution -> Dialogue State Tracking -> Dialogue State Distribution
• Input Observation
– act-slot-value tuples with confidence score of each turn
– N-best ASR hypotheses of each user turn
• Output Hypothesis
– Dialogue state (goal) distribution
– One-best dialogue state (goal) estimate
• Output Reference
– Single Dialogue state
Dialogue State Tracking: Example dialogue
• (Machine)
– Hello, welcome to the restaurant system. How may I help you?
• (User)
– I’d like some Chinese food.
Dialogue State
Food | Area | Price range | Name
/ | / | / | /
Observations
Act | Slot | Value | Confidence
Inform | Food | Chinese | 0.7
Deny | Food | Chinese | 0.3
Dialogue State Tracking: Example dialogue
• (Machine)
– OK. What area would you like?
• (User)
– College Town.
Dialogue State
Food | Area | Price range | Name
Chinese | / | / | /
Observations
Act | Slot | Value | Confidence
Inform | Area | College of Veterinary Medicine | 0.55
Inform | Area | College Town | 0.45
Dialogue State Tracking: Example dialogue
• (Machine)
– College of Veterinary Medicine, is that right?
• (User)
– No. College Town.
Dialogue State
Food | Area | Price range | Name
Chinese | College of Veterinary Medicine | / | /
Observations
Act | Slot | Value | Confidence
Negate | Area | College of Veterinary Medicine | 0.8
Inform | Area | College Town | 0.2
Dialogue State Tracking: Example dialogue
• (Machine)
– There are 3 restaurants serving Chinese in College Town: Apollo Chinese Restaurant, Hai Hong Lou, Asian Noodle House. What price range would you like?
• (User)
– Low
Dialogue State
Food | Area | Price range | Name
Chinese | College Town | / | /
Observations
Act | Slot | Value | Confidence
Inform | Price range | Cheap | 0.8
Inform | Price range | Moderate | 0.1
Inform | Price range | Expensive | 0.1
Dialogue State Tracking: Example dialogue
• (Machine)
– Apollo Chinese Restaurant is a good restaurant with low price.
• (User)
– OK. What’s the phone number?
Dialogue State
Food | Area | Price range | Name
Chinese | College Town | Cheap | /
Observations
Act | Slot | Value | Confidence
Inform | Name | Apollo Chinese Restaurant | 1.0
Dialogue State Tracking: Example dialogue
• (Machine)
– Telephone is 607 272 1188
• (User)
– Thank you. Good bye.
Dialogue State
Food | Area | Price range | Name
Chinese | College Town | Cheap | Apollo Chinese Restaurant
Observations
Act | Slot | Value | Confidence
Bye | / | / | 1.0
Dialog State Tracking Challenge (DSTC) 2
• Restaurant domain
• User goal change allowed
• 8 slots (requestable/informable) -> more complex
“state” definition than DSTC 1
[Williams, et al. SigDial 2013]
[Henderson, et al. SigDial 2014]
DSTC 3
• Domain extension: restaurant -> hotel
• 8 slots -> 13 slots (5 new)
• Only 10 dialogues are given as seed domain data
[Henderson, et al. IEEE SLT 2014]
State Definition of DSTC 2 & 3
• Goals
– For each informable slot in the ontology, a
distribution over the values for that slot.
– A distribution over joint goals.
• Method
– A distribution over methods. The list of possible values is given in the
ontology. (e.g. by-constraint, by-name etc.)
• Requested Slots
– For each requestable slot in the ontology, a binary distribution over
whether this slot has been requested by the user and the system
should inform it.
[Henderson, et al. IEEE SLT 2014]
Evaluation Metrics
• Only evaluate turns in which a slot appears in the SLU output or the system response
• All candidate values that haven’t been observed up to the current turn are clustered together as a special value “none”
• Evaluation metrics
– Accuracy (1-best quality), the bigger the better
• Percentage of turns in which the tracker’s 1-best joint-goal hypothesis is correct.
– L2 (probability calibration), the smaller the better
• The L2 norm between the vector of scores output by the dialogue state tracker and a vector with 1 in the position of the correct item, and 0 elsewhere.
[Henderson, et al. SigDial 2014]
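The two metrics above can be sketched in a few lines of Python. This is a minimal illustration that follows the wording on the slide literally, not the official DSTC scoring script; the function names, the string encoding of joint goals, and the toy distributions are all assumptions made for the example.

```python
import math

def joint_goal_accuracy(predictions, references):
    """Fraction of evaluated turns whose 1-best hypothesis equals the reference."""
    correct = 0
    for dist, ref in zip(predictions, references):
        best = max(dist, key=dist.get)  # 1-best hypothesis of this turn
        correct += (best == ref)
    return correct / len(references)

def l2_score(dist, ref):
    """L2 norm between the tracker's score vector and the one-hot reference vector."""
    items = set(dist) | {ref}
    return math.sqrt(sum(
        (dist.get(x, 0.0) - (1.0 if x == ref else 0.0)) ** 2 for x in items
    ))

# Hypothetical tracker output for two turns: joint-goal hypothesis -> score.
turns = [
    {"food=chinese": 0.7, "none": 0.3},
    {"food=chinese,area=college town": 0.4, "food=chinese": 0.6},
]
refs = ["food=chinese", "food=chinese,area=college town"]

accuracy = joint_goal_accuracy(turns, refs)
mean_l2 = sum(l2_score(d, r) for d, r in zip(turns, refs)) / len(refs)
print(accuracy, mean_l2)
```

The second turn shows why both metrics are reported: the 1-best hypothesis is wrong, and the L2 score also penalises how much probability mass was placed away from the reference.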
Part II
Data-driven and Rule-based DST
Data Driven DST Approaches
• Generative Model
– POMDP Bayesian Update
• Discriminative Model
– DNN: binary classification
– RNN: multi-class classification
– CRF
– Decision trees
– …
Dialogue State Tracking in POMDP
• Belief state is updated using Bayes’ theorem with
consideration of Markov and independence
assumptions
• Generative model
• Parameters are estimated from data
• Parameters can be refined using reinforcement learning
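As a rough illustration of the Bayesian update, the sketch below runs one step of a standard discrete Bayes filter: the new belief is proportional to the observation likelihood times the belief predicted through the transition model. The slide's own formula is not reproduced in this transcript, so the function name, the two-goal state space, and all probabilities here are hypothetical.

```python
def belief_update(belief, transition, obs_likelihood):
    """One Bayes-filter step over a discrete set of states.

    belief[s]          : b(s), belief at the previous turn
    transition[s][s2]  : P(s2 | s, a) for the system action just taken
    obs_likelihood[s2] : P(o | s2) for the observation just received
    """
    states = list(belief)
    unnormalised = {}
    for s2 in states:
        # Markov assumption: the prediction depends only on the previous belief.
        predicted = sum(transition[s][s2] * belief[s] for s in states)
        unnormalised[s2] = obs_likelihood[s2] * predicted
    total = sum(unnormalised.values())
    return {s2: p / total for s2, p in unnormalised.items()}

# Hypothetical two-goal example.
belief = {"chinese": 0.5, "indian": 0.5}
transition = {
    "chinese": {"chinese": 0.9, "indian": 0.1},
    "indian": {"chinese": 0.1, "indian": 0.9},
}
obs_likelihood = {"chinese": 0.7, "indian": 0.3}

new_belief = belief_update(belief, transition, obs_likelihood)
print(new_belief)
```

In a generative tracker the transition and observation tables are the parameters estimated from data.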
Deep Neural Network
• Slots are independent of each other, one DNN per slot
• Value v of the slot is one feature dimension -> binary
classifier
[Henderson, et al. SigDial 2013]
[Sun, et al. SigDial 2014]
Recurrent Neural Network
• Tracks belief of all values simultaneously
• Use n-gram ASR feature
• Figure labels: belief over all values, n-gram feature, internal memory, delex n-gram for slot s, delex n-gram for value v
1. Calculate hidden layer
2. For each value v, compute g_v
3. All g_v combined to form joint vector g
4. Update belief and memory
[Henderson, et al. SigDial 2014]
Features
• Positive-negative features (probabilistic features)
– Inform(i,v) = sum of the scores of the SLU hypotheses informing that the value is v at turn i
• Statistics of features (statistics of probabilistic features)
– Max(i,v) = the largest score given by SLU for informing, affirming, denying, or negating that the value is v at turn i
• Act-type features (indicator features)
– Acttype(i,m) = number of occurrences of act type m at turn i
• Slot answers’ status (indicator features)
– Canthelp(i,v) = indicator of whether the system cannot offer a venue with the constraint “value is v” at turn i
• ASR hypothesis word n-gram (indicator features)
• Delexicalized word n-gram
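A few of these features can be computed directly from one turn's SLU output. The sketch below assumes each SLU hypothesis is an (act, slot, value, score) tuple, which is a representation chosen for illustration; the function names are hypothetical, and the scores are those of the third turn of the example dialogue.

```python
from collections import Counter

# SLU output of one turn as (act, slot, value, score) tuples (assumed representation).
slu_turn = [
    ("negate", "area", "college of veterinary medicine", 0.8),
    ("inform", "area", "college town", 0.2),
]

def inform_feature(hyps, value):
    """Inform(i, v): summed score of hypotheses informing that the value is v."""
    return sum(score for act, _, v, score in hyps if act == "inform" and v == value)

def max_feature(hyps, value):
    """Max(i, v): largest score of any inform/affirm/deny/negate hypothesis about v."""
    acts = {"inform", "affirm", "deny", "negate"}
    return max((score for act, _, v, score in hyps if act in acts and v == value),
               default=0.0)

def acttype_feature(hyps, act_type):
    """Acttype(i, m): number of occurrences of act type m in the turn."""
    return Counter(act for act, *_ in hyps)[act_type]

print(inform_feature(slu_turn, "college town"))
print(max_feature(slu_turn, "college of veterinary medicine"))
print(acttype_feature(slu_turn, "negate"))
```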
Rule-based Approach for DST
• Example
For each (act, slot, value) with confidence score b
{
if b ≥ 0.8
{
if act ∈ {‘inform’, ‘affirm’}
{
state[slot] := value
}
}
}
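The same rule as a runnable Python sketch. The tuple layout of the observations and the function name are assumptions made for the example; the 0.8 threshold and the two accepted act types come from the pseudocode above, and the observations are those of the price-range turn in the example dialogue.

```python
def apply_threshold_rule(state, observations, threshold=0.8):
    """Overwrite a slot only for confident 'inform' or 'affirm' observations."""
    new_state = dict(state)  # leave the caller's state untouched
    for act, slot, value, confidence in observations:
        if confidence >= threshold and act in {"inform", "affirm"}:
            new_state[slot] = value
    return new_state

state = {"food": "chinese", "area": "college town"}
observations = [
    ("inform", "price range", "cheap", 0.8),
    ("inform", "price range", "moderate", 0.1),
    ("inform", "price range", "expensive", 0.1),
]
updated = apply_threshold_rule(state, observations)
print(updated)
```

A rule of this kind keeps a single value per slot, so any hypothesis below the threshold is simply lost.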
Rule Based Approaches: Past
• Historically, most commercial systems have used handcrafted rules for state tracking, selecting the values
with the highest confidence so far and discarding
alternatives.
Figure (built up over four animation steps): confidence scores of three candidate values, one column per turn
Candidate | Turn 1 | Turn 2 | Turn 3 | Turn 4
Value 1 | 0.6 | 0.4 | 0.2 | 0.1
Value 2 | 0.1 | 0.2 | 0.5 | 0.7
Value 3 | 0.3 | 0.4 | 0.3 | 0.2
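The "highest confidence so far" rule can be sketched as below. Reading the slide's figure as three candidate values scored over four turns is an assumption, as are the labels "value 1" to "value 3" and the function name.

```python
def track_highest_so_far(turn_scores):
    """After each turn, keep the value with the highest confidence seen so far."""
    best_value, best_score = None, 0.0
    selected = []
    for scores in turn_scores:
        for value, score in scores.items():
            if score > best_score:
                best_value, best_score = value, score
        selected.append(best_value)  # every alternative is discarded
    return selected

# Per-turn confidence scores read from the figure (grouping assumed).
turn_scores = [
    {"value 1": 0.6, "value 2": 0.1, "value 3": 0.3},
    {"value 1": 0.4, "value 2": 0.2, "value 3": 0.4},
    {"value 1": 0.2, "value 2": 0.5, "value 3": 0.3},
    {"value 1": 0.1, "value 2": 0.7, "value 3": 0.2},
]
selected = track_highest_so_far(turn_scores)
print(selected)
```

Under this reading the tracker stays on the first value until a single later score exceeds 0.6, even though the evidence for the second value has been accumulating for two turns.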
Rule-based Approach: Past
• Rule: Bayes’ theorem, and Markovian state
transition
• Most approaches are assumed to be common
to all slots
Generative Bayesian Rule-based
Approach for DST
• Follows a Bayesian state transition similar to POMDP
• Two assumptions (set by rules)
– Model completely trusts what the user says
– User goal does not change when the user is silent
• Slide annotation: for $a_{t-1}$ being inform or affirm
[Zilka, et al. SigDial 2013]
Believability of Observed Information
• Assume event A is positively mentioned independently at each turn; then the belief that event A has ever happened is
• Assume events A and B are mutually exclusive; given the belief of A at the previous turn, after observing B (i.e. A is negatively mentioned), the new belief of A is
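The formulas for these two statements are not reproduced in this transcript. A sketch that agrees with the HWU tracker update shown later in the deck is given below; the symbols $P_i(A)$ (score with which A is positively mentioned at turn i), $P_t(B)$ (score with which B is observed at turn t) and $b_t(A)$ (belief of A after turn t) are notation assumed here, not taken from the slide.

```latex
% Belief that A has ever happened after t independent positive mentions:
b_t(A) = 1 - \prod_{i=1}^{t} \bigl(1 - P_i(A)\bigr)

% New belief of A after observing the mutually exclusive event B at turn t:
b_t(A) = b_{t-1}(A)\,\bigl(1 - P_t(B)\bigr)
```

The first expression has the recursive form $b_t(A) = 1 - (1 - b_{t-1}(A))(1 - P_t(A))$, which is the shape of the inform rule in the HWU tracker.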
[Wang, et al. SigDial 2013]
Example of Rule-based Model in DSTC
HWU Belief Tracker
• For a specific slot, the belief of value v at turn i + 1 is updated by
$b_{i+1}(v) = 1 - (1 - b_i(v))(1 - P^+_{i+1}(v))$ if u = inform
$b_{i+1}(v) = b_i(v)\,(1 - P^-_{i+1}(v))$ if u = deny
• Combined together (implemented in the code)
$b_{i+1}(v) = \bigl(1 - (1 - b_i(v))(1 - P^+_{i+1}(v))\bigr)\,(1 - P^-_{i+1}(v))$
– $P^+_i(v)$: sum of the scores of the SLU hypotheses informing or affirming that the value is v in the i-th turn.
– $P^-_i(v)$: sum of the scores of the SLU hypotheses denying or negating that the value is v in the i-th turn.
[Wang, et al. SigDial 2013]
Rule-based Approaches for DST
• Pro
– Simple and Efficient
– Understandable
– Easy to transfer to new domain
• Con
– Usually perform worse than statistical models
– Cannot improve performance with more data
DSTC-2/3 Performance Comparison
• DSTC-2 Performance Comparison (11 teams incl. baseline and HWU)
System | Rank | Accuracy | L2
Naïve Baseline | 9 | 0.619 | 0.738
HWU | 6 | 0.711 | 0.466
DNN | 3 | 0.750 | 0.416
RNN | 2 | 0.768 | 0.346
• DSTC-3 Performance Comparison (9 teams incl. baseline and HWU)
System | Rank | Accuracy | L2
Naïve Baseline | 8 | 0.555 | 0.860
HWU | 6 | 0.575 | 0.744
DNN | --- | 0.583 | 0.583
RNN | 1 | 0.646 | 0.538
Part III
Bridging Rule and Data-driven
Models
Bridging Rule-based and Data-driven
Approaches
Figure: Rule-based and Data-driven approaches compared on Portability, Performance, Interpretability, Evolve with Data, and Efficiency.
Bridging Rule-based and Data-driven
Approaches
Figure: a scale from Rule-based to Data-driven, with two intermediate positions marked “?”.
Constrained Markov Bayesian
Polynomial
Figure: CMBP and RPN placed on the scale between Rule-based and Data-driven.
[Sun, et al. IEEE SLT 2014]
[Yu, et al. IEEE/ACM TASL 2015]
General View of HWU Belief Tracker
• Original Form
$b_{i+1}(v) = \bigl(1 - (1 - b_i(v))(1 - P_i^+(v))\bigr)\,(1 - P_i^-(v))$
$= P_i^+(v) + b_i(v) - b_i(v)P_i^+(v) - P_i^+(v)P_i^-(v) - b_i(v)P_i^-(v) + b_i(v)P_i^+(v)P_i^-(v)$
• General Form
$b_{i+1}(v) = \mathcal{P}\bigl(b_i(v), P_i^+(v), P_i^-(v)\bigr)$
where $\mathcal{P}(\cdot)$ is a polynomial function with coefficients of 1, 0 or -1
– Feature: $b_i(v)$, $P_i^+(v)$, $P_i^-(v)$
– Model form: 3-order $\mathcal{P}(\cdot)$
– Rule: Set coefficients of $\mathcal{P}(\cdot)$ according to prior knowledge (probability calculation of two contiguous events)
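To make the polynomial view concrete, the sketch below stores the HWU rule as a table of monomial coefficients and checks numerically that it matches the factored original form. The dictionary encoding (exponents of b, P+, P− as keys) and the evaluation grid are choices made for this illustration.

```python
import itertools

# Monomial exponents (b, P+, P-) -> coefficient, read off the expanded original form.
HWU_COEFFS = {
    (0, 1, 0): 1,    # P+
    (1, 0, 0): 1,    # b
    (1, 1, 0): -1,   # b * P+
    (0, 1, 1): -1,   # P+ * P-
    (1, 0, 1): -1,   # b * P-
    (1, 1, 1): 1,    # b * P+ * P-
}

def evaluate(coeffs, b, p_pos, p_neg):
    """Evaluate the polynomial given by the coefficient table."""
    return sum(c * b**eb * p_pos**ep * p_neg**en
               for (eb, ep, en), c in coeffs.items())

def factored(b, p_pos, p_neg):
    """Original (factored) form of the HWU update."""
    return (1 - (1 - b) * (1 - p_pos)) * (1 - p_neg)

grid = [0.0, 0.25, 0.5, 1.0]
max_gap = max(abs(evaluate(HWU_COEFFS, b, p, n) - factored(b, p, n))
              for b, p, n in itertools.product(grid, repeat=3))
print(max_gap)
```

Seen this way, a "rule" is just one assignment of values from {-1, 0, 1} to the entries of the table, which is what makes a search over rules possible.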
New Thoughts of Rule Based Model
• Extend features to all features related to prior
knowledge
• Model arbitrary Bayesian probability
operations using a polynomial function
• Use constraints to enforce probability
requirement and incorporate prior knowledge
Generalized Feature for Rule-based Probability Calculation
– $p_i^+(v)$: sum of the scores of the SLU hypotheses informing or affirming that the value is v in the i-th turn.
– $p_i^-(v)$: sum of the scores of the SLU hypotheses denying or negating that the value is v in the i-th turn.
– $\tilde{p}_i^+(v) = \sum_{v' \neq v} p_i^+(v')$
– $\tilde{p}_i^-(v) = \sum_{v' \neq v} p_i^-(v')$
– $b_i^r$: belief of “the value being ‘the rest’ in the i-th turn” by the rule-based model.
– $b_i(v)$: belief of “the value being v in the i-th turn” by the rule-based model.
Generalized Model for Rule-based
Probability Calculation
• A Markov Bayesian Polynomial (MBP) model is
defined as a polynomial function of probability
quantities
• A regular MBP model of order k is an MBP model with polynomial order k in which all coefficients belong to {-1,0,1}
Generalized Prior Knowledge
Incorporation - Constraints
• Physical and logical constraints
– Probability sum-to-one constraints
– Feature definition requirements
• Intuition knowledge constraints
– E.g. the belief should be unchanged or positively
correlated with the positive scores from SLU
Physical and Logical Constraints
Intuition Knowledge Constraints
• If neither positive nor negative information is
collected, the belief should not change
• If both ASR and SLU are perfectly correct, the model should always give the correct result
• The belief should be unchanged or positively correlated with:
1. positive scores from SLU.
2. sum of the negative scores of the other values.
3. belief of the last turn.
• …
Constrained Markov Bayesian
Polynomial
• M(k) denotes the search space of regular MBP of
order k, sufficient to just exploit M(3)
• Rewrite the above MBP for M(3): (x0=1)
• Constraints need to be formalized
Polynomial coefficients
are “parameters”
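The rewritten form for M(3) is not reproduced in this transcript. One reading of the hint "(x0=1)" is that every term of order at most 3 can be written as a product of exactly three variables once a constant x0 = 1 is added, so a model is fully described by one coefficient per index triple. The sketch below follows that reading; the feature ordering, function names, and the use of six features (matching the six quantities on the generalized feature slide) are assumptions.

```python
from itertools import combinations_with_replacement
from math import comb

def monomials(num_features, order=3):
    """Index triples k1 <= k2 <= k3 over x0..xn; x0 = 1 pads lower-order terms."""
    return list(combinations_with_replacement(range(num_features + 1), order))

def evaluate(coeffs, features):
    """Evaluate a polynomial given as {(k1, k2, k3): coefficient}."""
    x = [1.0] + list(features)  # prepend the constant x0 = 1
    return sum(g * x[k1] * x[k2] * x[k3] for (k1, k2, k3), g in coeffs.items())

terms = monomials(num_features=6)
print(len(terms))  # number of coefficients, i.e. "parameters", of an M(3) model

# The HWU rule in this encoding, with the hypothetical ordering
# x1 = b, x2 = P+, x3 = P- (the remaining three features are unused).
hwu = {(0, 0, 2): 1, (0, 0, 1): 1, (0, 1, 2): -1,
       (0, 2, 3): -1, (0, 1, 3): -1, (1, 2, 3): 1}
print(evaluate(hwu, [0.55, 0.0, 0.8, 0.0, 0.0, 0.0]))
```

With each coefficient restricted to {-1, 0, 1}, the unconstrained space already has 3 to the power of the number of terms candidates, which is why constraints are needed to prune it.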
Constraints Formalization
– Formalize constraints using mathematics
– E.g. for constraint:
• The belief should be unchanged or positively correlated with the positive scores from SLU
Linear Constraints Approximation
Approximate the exact constraints with more relaxed
easy-to-evaluate linear constraints
denotes all possible input vectors
Rule Generation & Selection
The optimal CMBP is the solution of the following
integer programming problem
DST accuracy evaluated on training data
Procedure of CMBP Optimization
1. Solve integer programming with a dummy
criterion to get all feasible CMBP solutions
2. Calculate state tracking performance of all
feasible solutions on the training data
3. Find the optimal rule from top N candidates
– Select the one with low complexity
– Select multiple ones and perform score
combination
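The procedure can be mimicked on a toy problem. The sketch below is a deliberately reduced stand-in, not the method from the papers: it replaces the integer-programming solver with exhaustive enumeration, uses only six hypothetical monomials over three features (previous belief b, positive score p, negative score n), checks three constraints on a coarse grid, and scores candidates on four made-up training dialogues.

```python
import itertools

# Exponents of (b, p, n) for the six monomials allowed in this toy search space.
MONOMIALS = [(0, 1, 0), (1, 0, 0), (1, 1, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)]
GRID = [0.0, 0.5, 1.0]

def evaluate(coeffs, b, p, n):
    return sum(c * b**eb * p**ep * n**en
               for c, (eb, ep, en) in zip(coeffs, MONOMIALS))

def feasible(coeffs):
    """Check the constraints on a grid of input points."""
    for b, p, n in itertools.product(GRID, repeat=3):
        if p + n > 1.0:
            continue  # SLU scores of one turn cannot sum to more than 1
        out = evaluate(coeffs, b, p, n)
        if not -1e-9 <= out <= 1.0 + 1e-9:
            return False  # the result must be a probability
        if p == 0.0 and n == 0.0 and abs(out - b) > 1e-9:
            return False  # no new information: belief unchanged
        if p + 0.5 + n <= 1.0 and evaluate(coeffs, b, p + 0.5, n) < out - 1e-9:
            return False  # belief must not drop when the positive score grows
    return True

def accuracy(coeffs, dialogues):
    """Share of dialogues where thresholding the final belief matches the label."""
    correct = 0
    for turns, label in dialogues:
        belief = 0.0
        for p, n in turns:
            belief = evaluate(coeffs, belief, p, n)
        correct += (belief >= 0.5) == label
    return correct / len(dialogues)

# Step 1: collect all feasible candidates (enumeration instead of a solver).
candidates = [c for c in itertools.product((-1, 0, 1), repeat=len(MONOMIALS))
              if feasible(c)]

# Step 2: score every candidate on (made-up) training dialogues.
train = [
    ([(0.45, 0.0), (0.2, 0.0)], True),
    ([(0.55, 0.0), (0.0, 0.8)], False),
    ([(0.8, 0.0)], True),
    ([(0.3, 0.0)], False),
]

# Step 3: prefer high accuracy, then low complexity (fewest non-zero coefficients).
ranked = sorted(candidates, key=lambda c: (-accuracy(c, train), sum(map(abs, c))))
best = ranked[0]
print(len(candidates), best, accuracy(best, train))
```

The HWU rule, written as coefficients (1, 1, -1, -1, -1, 1) in this encoding, is one of the feasible candidates, which illustrates how a hand-written rule becomes one point in the constrained search space.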
Real-coefficient CMBP
• Prior knowledge has helped us to find the structure,
can we tune more to the data-driven direction?
• Extend integer-coefficient CMBP to real-coefficient
CMBP
– Get an integer solution
– Perform hill climbing using grid search
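A minimal sketch of the second step, assuming a coordinate-wise hill climb that moves one coefficient at a time along a fixed grid step and keeps a move only if the score improves. The step size, the stopping rule, and the toy scoring function are assumptions; in the setting above the score would be tracking accuracy on the training data.

```python
def hill_climb(start, score, step=0.1, max_rounds=50):
    """Greedy coordinate-wise search starting from an integer solution."""
    current = list(start)
    best = score(current)
    for _ in range(max_rounds):
        improved = False
        for i in range(len(current)):
            for delta in (-step, step):
                trial = list(current)
                trial[i] += delta
                s = score(trial)
                if s > best:  # accept only strict improvements
                    current, best, improved = trial, s, True
        if not improved:
            break  # local optimum on this grid
    return current, best

# Toy stand-in for "accuracy on training data": closeness to a hypothetical optimum.
target = [0.8, 1.0, -1.2]

def toy_score(coeffs):
    return -sum((c - t) ** 2 for c, t in zip(coeffs, target))

integer_solution = [1, 1, -1]
real_solution, real_score = hill_climb(integer_solution, toy_score)
print(real_solution, real_score)
```

Every accepted move requires re-scoring the model on the full training set, which is one way to see why this search becomes expensive as the number of coefficients grows.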
Performance on DSTC-2
• Comparison with state-of-the-art DST trackers on DSTC-2
System | Rank | Accuracy | L2
Baseline* | 5 | 0.719 | 0.464
LambdaMART (Williams) | 1 | 0.784 | 0.735
RNN (Henderson et al.) | 2 | 0.768 | 0.346
DNN (Sun et al.) | 3 | 0.750 | 0.416
Int CMBP | 2.5 | 0.756 | 0.370
Real CMBP | 2.5 | 0.762 | 0.436
Baseline* is the best baseline from the 4 organizer-provided baselines
Performance on DSTC-3
• Comparison with state-of-the-art DST trackers on DSTC-3
System | Rank | Accuracy | L2
Baseline* | 6 | 0.575 | 0.691
RNN (Henderson et al.) | 1 | 0.646 | 0.538
IBM Rule (Kadlec et al.) | 2 | 0.630 | 0.627
Int CMBP (Sun et al.) | 3 | 0.610 | 0.556
Int CMBP | 2.5 | 0.623 | 0.552
Real CMBP | 1.5 | 0.632 | 0.591
Baseline* is the best baseline from the 4 organizer-provided baselines
Weakness of CMBP
• Adding features and increasing model
complexity are difficult
– Additional prior knowledge is needed to keep the
search space small
– Hard to incorporate other intuitive features (e.g.
machine actions)
• Hill climbing is inefficient
Recurrent Polynomial Network
Figure: RPN and CMBP placed on the scale between Rule-based and Data-driven.
[Xie, et al. SigDial 2015]
RPN Definition and Example
• Input node
• Computation node
– Sum node
– Product node
– Activation node
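The node types can be illustrated with a tiny network that reproduces the HWU update. This is a sketch under assumptions: the slide's figure is not available here, so the wiring (input nodes, one layer of product nodes, one weighted sum node, one activation node), the clamp-to-[0, 1] activation, and all names are chosen for illustration.

```python
import math

def product_node(inputs, indices):
    """Product node: multiplies the selected input nodes."""
    return math.prod(inputs[k] for k in indices)

def sum_node(values, weights):
    """Sum node: weighted sum of its incoming nodes."""
    return sum(w * v for w, v in zip(weights, values))

def clip(x):
    """Activation node; assumed here to clamp its input to [0, 1]."""
    return min(1.0, max(0.0, x))

# Which inputs each product node multiplies (x0 = 1, x1 = b, x2 = P+, x3 = P-).
PRODUCT_TERMS = [(2,), (1,), (1, 2), (2, 3), (1, 3), (1, 2, 3)]

def rpn_step(belief, p_pos, p_neg, weights):
    """One recurrent step: the output belief feeds the next turn's input."""
    inputs = [1.0, belief, p_pos, p_neg]
    products = [product_node(inputs, idx) for idx in PRODUCT_TERMS]
    return clip(sum_node(products, weights))

# Weights taken from the expanded HWU polynomial.
weights = [1.0, 1.0, -1.0, -1.0, -1.0, 1.0]

belief = 0.0
for p_pos, p_neg in [(0.55, 0.0), (0.0, 0.8)]:
    belief = rpn_step(belief, p_pos, p_neg, weights)
print(belief)
```

Because the weights of the sum node are ordinary real numbers, they can be adjusted by gradient-based training once the network is initialised from a rule.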
RPN for DST
• A layered RPN structure for dialogue state tracking
which essentially corresponds to 3-order CMBP
Activation Function
Figure panels:
(a) $f(x) = \mathrm{clip}(x)$
(b) $f(x) = \frac{1}{1 + e^{-5(x - 0.5)}}$
(c) $f(x) = \mathrm{softclip}(x)$
(d) Partial enlargement of $\mathrm{softclip}(\cdot)$
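Panels (a) and (b) can be compared numerically. The sketch assumes that clip(x) clamps its input to the interval [0, 1]; the definition of softclip is not given in this transcript, so it is left out rather than guessed.

```python
import math

def clip(x):
    """Panel (a), assumed to clamp x to [0, 1]."""
    return min(1.0, max(0.0, x))

def shifted_sigmoid(x):
    """Panel (b): 1 / (1 + exp(-5 (x - 0.5)))."""
    return 1.0 / (1.0 + math.exp(-5.0 * (x - 0.5)))

for x in (-0.5, 0.0, 0.5, 1.0, 1.5):
    print(x, clip(x), round(shifted_sigmoid(x), 3))
```

Under this assumption clip leaves every valid probability unchanged but has zero gradient outside [0, 1], while the sigmoid in (b) is smooth everywhere but maps the inputs 0 and 1 to values strictly inside the interval.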
RPN with Activation Function
More Complex Structure
• Add features and recurrent connections
Training
• Initialize RPN weights using CMBP
• Each 3-order CMBP corresponds to a set of
weights in RPN
• Taking advantage of prior knowledge and
constraints
• MSE criterion
• Backpropagation through time (BPTT) to
update all weights
Performance on DSTC-2
• Comparison with state-of-the-art DST trackers on DSTC-2
System | Rank | Accuracy | L2
Baseline* | 5 | 0.719 | 0.464
LambdaMART (Williams) | 1 | 0.784 | 0.735
RNN (Henderson et al.) | 2 | 0.768 | 0.346
DNN (Sun et al.) | 3 | 0.750 | 0.416
Int CMBP | 2.5 | 0.756 | 0.370
Real CMBP | 2.5 | 0.762 | 0.436
RPN | 2.5 | 0.757 | 0.347
Baseline* is the best baseline from the 4 organizer-provided baselines
Performance on DSTC-3
• Comparison with state-of-the-art DST trackers on DSTC-3
System | Rank | Accuracy | L2
Baseline* | 6 | 0.575 | 0.691
RNN (Henderson et al.) | 1 | 0.646 | 0.538
IBM Rule (Kadlec et al.) | 2 | 0.630 | 0.627
Int CMBP (Sun et al.) | 3 | 0.610 | 0.556
Int CMBP | 2.5 | 0.623 | 0.552
Real CMBP | 1.5 | 0.632 | 0.591
RPN | 0.5 | 0.650 | 0.549
Baseline* is the best baseline from the 4 organizer-provided baselines
Summary
• Structures are useful for data-driven learning, especially for generalization
• High-level rules can be used to achieve efficient and understandable models
• Bridge rule-based and data-driven models
– Use rules to find an appropriate model structure
– Encode knowledge into constraints
– Use data to enhance model power
Questions?
Acknowledgement
Thank my students Lu Chen and Kai Sun
for helping to prepare the slides
References
• Williams J D, Raux A, Ramachandran D, et al. The Dialog State Tracking Challenge. 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SigDial), 2013.
• Henderson M, Thomson B, Williams J D. The Second Dialog State Tracking Challenge. 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SigDial), 2014.
• Henderson M, Thomson B, Williams J D. The Third Dialog State Tracking Challenge. IEEE Spoken Language Technology Workshop (SLT), 2014.
• Henderson M, Thomson B, Young S. Deep neural network approach for the dialog state tracking challenge. 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SigDial), 2013.
• Sun K, Chen L, Zhu S, et al. The SJTU system for dialog state tracking challenge 2. 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SigDial), 2014.
• Henderson M, Thomson B, Young S. Word-based dialog state tracking with recurrent neural networks. 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SigDial), 2014.
• Henderson M, Thomson B, Young S. Robust dialog state tracking using delexicalised recurrent neural networks and unsupervised adaptation. IEEE Spoken Language Technology Workshop (SLT), 2014.
• Zilka L, Marek D, Korvas M, et al. Comparison of Bayesian Discriminative and Generative Models for Dialogue State Tracking. 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SigDial), 2013.
• Wang Z, Lemon O. A simple and generic belief tracking mechanism for the dialog state tracking challenge: On the believability of observed information. 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SigDial), 2013.
References
• Kadlec R, Libovický J, Macek J, et al. IBM’s Belief Tracker: Results On Dialog State Tracking Challenge Datasets. Dialog in Motion Workshop at EACL, 2014.
• Kadlec R, Vodolán M, Libovický J, et al. Knowledge-based Dialog State Tracking. IEEE Spoken Language Technology Workshop (SLT), 2014.
• Vodolán M, Kadlec R, Kleindienst J. Hybrid Dialog State Tracker. NIPS Workshop on Machine Learning for Spoken Language Understanding and Interaction, 2015.
• Williams J D. Web-style ranking and SLU combination for dialog state tracking. 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SigDial), 2014.
• Yu K, Sun K, Chen L, et al. Constrained Markov Bayesian Polynomial for Efficient Dialogue State Tracking. IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), 2015.
• Sun K, Chen L, Zhu S, et al. A Generalized Rule Based Tracker for Dialogue State Tracking. IEEE Spoken Language Technology Workshop (SLT), 2014.
• Xie Q, Sun K, Zhu S, et al. Recurrent Polynomial Network for Dialogue State Tracking with Mismatched Semantic Parsers. 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SigDial), 2015.