Bridging Rule and Data-Driven Models for Robust Dialogue State Tracking
Kai Yu
Shanghai Jiao Tong University

Universal Data-Driven Solution
• Pro
  – High performance with big data
  – Less expert knowledge required
• Con
  – Hard to explain
  – Feature engineering is still important
• Interpretable or knowledge-inspired structure?
Kai Yu. Bridging Rule and Data-driven Models.

Prior Knowledge Incorporation
• Bayesian era
  – Prior distribution, e.g. MAP
  – Tree questions
• Deep learning era: structured deep learning
  – Task-related structure changes, e.g. phone target sharing
  – Explicit attention/memory networks
  – Interpretable feature augmentation, e.g. i-vector
• What about high-level knowledge, e.g. rules?

Content
• Part I: Dialog State Tracking
• Part II: Data-driven and Rule-based DST
• Part III: Bridging Rule and Data-Driven Models

Part I
Dialog State Tracking

Architecture of SDS
• Input module: Automatic Speech Recognition → Spoken Language Understanding
• Control module (Dialogue Management): Dialogue State Tracking → Decision Making
• Output module: Natural Language Generation → Text-to-Speech Synthesis
• The user interacts with the system through the input and output modules

Key Concepts
• Dialogue Manager: Input Observation → Dialogue State Tracking → Decision Making → Output Result
  – Input observation: acttype-slot-value tuples
  – Dialogue state: (goal, user action, history)

Dialogue Act: Inform(food=Chinese)
• acttype denotes how the slot-value information is delivered by the user
  – Inform
  – Deny
  – Affirm
  – Negate
  – Select
  – …
• System acts and user acts may be different
• What are slots and values?
  – Slot: Food; values: American, Chinese, Indian, Japanese, …
  – Slot: Area; values: north, west, south, east, …
  – Slot: Price range; values: cheap, moderate, expensive, …
  – Slot: Name; values: rice house, royal spice, sala thong, …
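The act type, slot and value of a dialogue act map naturally onto a small record. The sketch below parses the act(slot=value) notation used on these slides into such a record and pairs each act with an SLU confidence, reproducing the first turn of the example dialogue that follows. The `DialogueAct` class, the `parse_act` helper, the surface syntax it accepts and the lower-casing convention are illustrative assumptions, not part of any DSTC toolkit.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class DialogueAct:
    """One act-slot-value tuple, e.g. inform(food=chinese)."""
    act: str              # act type: inform, deny, affirm, negate, select, ...
    slot: Optional[str]   # e.g. "food"; None for acts such as bye()
    value: Optional[str]  # e.g. "chinese"; None when no value is given


def parse_act(text: str) -> DialogueAct:
    """Parse the hypothetical surface forms act(slot=value), act(slot) and act()."""
    text = text.strip()
    head, paren, rest = text.partition("(")
    if not paren or not rest.endswith(")"):
        raise ValueError(f"not a dialogue act: {text!r}")
    args = rest[:-1].strip()
    if not args:
        return DialogueAct(head.strip().lower(), None, None)
    slot, eq, value = args.partition("=")
    return DialogueAct(
        head.strip().lower(),
        slot.strip().lower(),
        value.strip().lower() if eq else None,  # lower-casing is an assumption
    )


# An SLU N-best list for one turn: (act, confidence) pairs.
SluNBest = List[Tuple[DialogueAct, float]]

# First user turn of the example dialogue: "I'd like some Chinese food."
turn1: SluNBest = [
    (parse_act("inform(food=Chinese)"), 0.7),
    (parse_act("deny(food=Chinese)"), 0.3),
]
```

Keeping the act type separate from the slot-value pair is what lets a tracker treat `inform` and `deny` hypotheses about the same value as positive and negative evidence respectively.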
Dialogue State
• The dialogue state denotes the machine's understanding of the status of the dialogue
• Widely used definition: a tuple S = (g, u, h)
  – User's goal g
  – User's action u
  – Dialogue history h

State-of-the-art Dialogue Management
• Inference and decision under uncertainty: Dialogue Act Distribution → Dialogue State Tracking → Dialogue State Distribution → Decision Making → Dialogue Act
• Input observation: acttype-slot-value tuples with confidence scores
• Maintained state: belief state (i.e. a distribution over dialogue states)

Formulate Dialogue State Tracking as an Independent Problem
• Dialogue Act Distribution → Dialogue State Tracking → Dialogue State Distribution
• Input observation
  – act-slot-value tuples with the confidence score of each turn
  – N-best ASR hypotheses of each user turn
• Output hypothesis
  – Dialogue state (goal) distribution
  – One-best dialogue state (goal) estimate
• Output reference
  – A single dialogue state

Dialogue State Tracking: Example Dialogue
• (Machine) Hello, welcome to the restaurant system. How may I help you?
• (User) I'd like some Chinese food.
• Dialogue state: Food = /, Area = /, Price range = /, Name = /
• Observations
  Act     Slot  Value    Confidence
  Inform  Food  Chinese  0.7
  Deny    Food  Chinese  0.3

Dialogue State Tracking: Example Dialogue
• (Machine) OK. What area would you like?
• (User) College Town.
• Dialogue state: Food = Chinese, Area = /, Price range = /, Name = /
• Observations
  Act     Slot  Value                           Confidence
  Inform  Area  College of Veterinary Medicine  0.55
  Inform  Area  College Town                    0.45

Dialogue State Tracking: Example Dialogue
• (Machine) College of Veterinary Medicine, is that right?
• (User) No. College Town.
• Dialogue state: Food = Chinese, Area = College of Veterinary Medicine, Price range = /, Name = /
• Observations
  Act     Slot  Value                           Confidence
  Negate  Area  College of Veterinary Medicine  0.8
  Inform  Area  College Town                    0.2

Dialogue State Tracking: Example Dialogue
• (Machine) There are 3 restaurants serving Chinese food in College Town: Apollo Chinese Restaurant, Hai Hong Lou, Asian Noodle House. What price range would you like?
• (User) Low.
• Dialogue state: Food = Chinese, Area = College Town, Price range = /, Name = /
• Observations
  Act     Slot         Value      Confidence
  Inform  Price range  Cheap      0.8
  Inform  Price range  Moderate   0.1
  Inform  Price range  Expensive  0.1

Dialogue State Tracking: Example Dialogue
• (Machine) Apollo Chinese Restaurant is a good restaurant with low prices.
• (User) OK. What's the phone number?
• Dialogue state: Food = Chinese, Area = College Town, Price range = Cheap, Name = /
• Observations
  Act     Slot  Value                      Confidence
  Inform  Name  Apollo Chinese Restaurant  1.0

Dialogue State Tracking: Example Dialogue
• (Machine) The telephone number is 607 272 1188.
• (User) Thank you. Good bye.
• Dialogue state: Food = Chinese, Area = College Town, Price range = Cheap, Name = Apollo Chinese Restaurant
• Observations
  Act  Slot  Value  Confidence
  Bye  /     /      1.0

Dialog State Tracking Challenge (DSTC) 2
• Restaurant domain
• User goal changes allowed
• 8 slots (requestable/informable) → a more complex "state" definition than DSTC 1 [Williams, et al. SigDial 2013]
[Henderson, et al. SigDial 2014]

DSTC 3
• Domain extension: restaurant → hotel
• 8 slots → 13 slots (5 new)
• Only 10 dialogues are given as seed data for the new domain
[Henderson, et al. IEEE SLT 2014]

State Definition of DSTC 2 & 3
• Goals
  – For each informable slot in the ontology, a distribution over the values for that slot
  – A distribution over joint goals
• Method
  – A distribution over methods; the list of possible values (e.g. by-constraint, by-name) is given in the ontology
• Requested slots
  – For each requestable slot in the ontology, a binary distribution over whether the slot has been requested by the user and the system should inform it
[Henderson, et al. IEEE SLT 2014]

Evaluation Metrics
• Only turns in which a slot appears in the SLU output or the system response are evaluated
• All candidate values that have not been observed up to the current turn are clustered together into a special value "none"
• Evaluation metrics
  – Accuracy (1-best quality), higher is better: the percentage of turns in which the tracker's 1-best joint-goal hypothesis is correct
  – L2 (probability calibration), lower is better: the L2 norm between the vector of scores output by the tracker and a vector with 1 in the position of the correct item and 0 elsewhere
[Henderson, et al. SigDial 2014]

Part II
Data-driven and Rule-based DST

Data-Driven DST Approaches
• Generative model
  – POMDP Bayesian update
• Discriminative model
  – DNN: binary classification
  – RNN: multi-class classification
  – CRF
  – Decision trees
  – …

Dialogue State Tracking in POMDP
• The belief state is updated using Bayes' theorem under Markov and independence assumptions
• Generative model
• Parameters are estimated from data
• Parameters can be refined using reinforcement learning

Deep Neural Network
• Slots are assumed independent of each other: one DNN per slot
• Each value v of the slot corresponds to one feature dimension → binary classifier per value
[Henderson, et al. SigDial 2013] [Sun, et al. SigDial 2014]

Recurrent Neural Network
• Tracks the belief over all values simultaneously, using an internal memory
• Uses n-gram ASR features
• At each turn:
  1. Calculate the hidden layer from the n-grams delexicalised for slot s
  2. For each value v, compute g_v using the n-grams delexicalised for value v
  3. Combine all g_v to form the joint vector g
  4. Update the belief and the memory
[Henderson, et al. SigDial 2014]

Features
• Positive-negative features (probabilistic features)
  – Inform(i, v) = sum of the scores of the SLU hypotheses informing that the value is v at turn i
• Statistics of features (statistics of probabilistic features)
  – Max(i, v) = the largest score given by the SLU to informing, affirming, denying or negating that the value is v at turn i
• Act-type features (indicator features)
  – Acttype(i, m) = number of occurrences of act type m at turn i
• Slot answer status (indicator features)
  – Canthelp(i, v) = indicator of whether the system cannot offer a venue with the constraint "value is v" at turn i
• ASR hypothesis word n-grams (indicator features)
• Delexicalised word n-grams

Rule-based Approach for DST
• Example
  for each (act, slot, value) with confidence score b {
      if b ≥ 0.8 {
          if act ∈ {'inform', 'affirm'} {
              state[slot] := value
          }
      }
  }
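The threshold rule in the example above can be made runnable as follows. This is a minimal sketch, assuming the SLU output of a turn arrives as (act, slot, value, confidence) tuples; the function name `threshold_rule_update` is hypothetical, and the default threshold of 0.8 is the one on the slide.

```python
from typing import Dict, Iterable, Tuple

# One SLU hypothesis: (act type, slot, value, confidence score b).
Hypothesis = Tuple[str, str, str, float]

POSITIVE_ACTS = frozenset({"inform", "affirm"})


def threshold_rule_update(
    state: Dict[str, str],
    hypotheses: Iterable[Hypothesis],
    threshold: float = 0.8,
) -> Dict[str, str]:
    """Overwrite state[slot] with the value of every inform/affirm hypothesis
    whose confidence b is at least the threshold; everything else is ignored."""
    new_state = dict(state)  # do not mutate the caller's state
    for act, slot, value, b in hypotheses:
        if b >= threshold and act in POSITIVE_ACTS:
            new_state[slot] = value  # later hypotheses win if several qualify
    return new_state


# Two of the user turns from the example dialogue, in the order they occur.
state: Dict[str, str] = {}
state = threshold_rule_update(state, [("inform", "food", "chinese", 0.7),
                                      ("deny", "food", "chinese", 0.3)])
state = threshold_rule_update(state, [("inform", "price range", "cheap", 0.8),
                                      ("inform", "price range", "moderate", 0.1),
                                      ("inform", "price range", "expensive", 0.1)])
```

The resulting state is `{"price range": "cheap"}`: the food preference, observed at confidence 0.7, never passes the 0.8 threshold, which shows how a fixed hand-set threshold can silently discard useful evidence.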
Rule-based Approaches: Past
• Historically, most commercial systems have used handcrafted rules for state tracking, selecting the values with the highest confidence so far and discarding alternatives.
(Animated figure, final frame: SLU confidence scores of three candidate values over four turns; a check mark (√) marks the selected value.)
            Turn 1  Turn 2  Turn 3  Turn 4
  Value 1   0.6     0.4     0.2     0.1
  Value 2   0.1     0.2     0.5     0.7
  Value 3   0.3     0.4     0.3     0.2

Rule-based Approach: Past
• Rule: Bayes' theorem and Markovian state transition
• Most approaches are assumed to be common to all slots

Generative Bayesian Rule-based Approach for DST
• Follows a Bayesian state transition similar to POMDP
• Two assumptions, set by rules
  – The model completely trusts what the user says
  – The user goal does not change when the user is silent
• (Update equation for the case where a_{t-1} is inform or affirm)
[Zilka, et al. SigDial 2013]

Believability of Observed Information
• Assume event A is positively mentioned independently at each turn, with probability P_i(A) at turn i; then the belief that A ever happens is
  $b(A) = 1 - \prod_i \left(1 - P_i(A)\right)$
• Assume events A and B are mutually exclusive. Given the belief $b_i(A)$ of the previous turn, after observing B (i.e. A is negatively mentioned) with probability $P_{i+1}(B)$, the new belief of A is
  $b_{i+1}(A) = b_i(A)\left(1 - P_{i+1}(B)\right)$
[Wang, et al. SigDial 2013]

Example of Rule-based Model in DSTC: HWU Belief Tracker
• For a specific slot, the belief of value v at turn i + 1 is updated by
  $b_{i+1}(v) = \begin{cases} 1 - \left(1 - b_i(v)\right)\left(1 - P^+_{i+1}(v)\right) & u = \text{inform} \\ b_i(v)\left(1 - P^-_{i+1}(v)\right) & u = \text{deny} \end{cases}$
• Combined together (as implemented in the tracker's code):
  $b_{i+1}(v) = \left[1 - \left(1 - b_i(v)\right)\left(1 - P^+_{i+1}(v)\right)\right]\left(1 - P^-_{i+1}(v)\right)$
  – $P^+_i(v)$: sum of the scores of the SLU hypotheses informing or affirming that the value is v at turn i
  – $P^-_i(v)$: sum of the scores of the SLU hypotheses denying or negating that the value is v at turn i
[Wang, et al. SigDial 2013]

Rule-based Approaches for DST
• Pro
  – Simple and efficient
  – Understandable
  – Easy to transfer to new domains
• Con
  – Usually perform worse than statistical models
  – Cannot improve performance with more data
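To contrast the two kinds of rule on the preceding slides, the sketch below implements the "highest confidence so far" rule and the combined HWU update for a single slot, and runs both on the two Area turns of the example dialogue. It is a minimal sketch, assuming each turn's SLU output for the slot has already been summed into per-value positive scores P+ (inform/affirm) and negative scores P- (deny/negate); the function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Scores = Dict[str, float]  # value -> summed SLU score for one slot at one turn


def max_so_far_update(best: Optional[Tuple[str, float]], p_pos: Scores) -> Optional[Tuple[str, float]]:
    """'Highest confidence so far' rule: keep the single best value ever seen;
    alternatives and negative evidence are discarded."""
    for value, score in p_pos.items():
        if best is None or score > best[1]:
            best = (value, score)
    return best


def hwu_update(belief: Scores, p_pos: Scores, p_neg: Scores) -> Scores:
    """Combined HWU rule for every value of one slot:
    b_{i+1}(v) = [1 - (1 - b_i(v)) (1 - P+_{i+1}(v))] * (1 - P-_{i+1}(v))."""
    new_belief: Scores = {}
    for v in set(belief) | set(p_pos) | set(p_neg):
        b = belief.get(v, 0.0)
        boosted = 1.0 - (1.0 - b) * (1.0 - p_pos.get(v, 0.0))  # inform/affirm evidence
        new_belief[v] = boosted * (1.0 - p_neg.get(v, 0.0))      # deny/negate evidence
    return new_belief


CVM, CT = "College of Veterinary Medicine", "College Town"
# (P+, P-) for the Area slot in the two Area turns of the example dialogue.
turns: List[Tuple[Scores, Scores]] = [
    ({CVM: 0.55, CT: 0.45}, {}),  # user: "College Town." (top SLU hypothesis is wrong)
    ({CT: 0.2}, {CVM: 0.8}),      # user: "No. College Town."
]

best: Optional[Tuple[str, float]] = None
belief: Scores = {}
for p_pos, p_neg in turns:
    best = max_so_far_update(best, p_pos)
    belief = hwu_update(belief, p_pos, p_neg)

hwu_choice = max(belief, key=belief.get)
```

After the second turn the max-so-far rule still holds College of Veterinary Medicine (0.55), because it ignores the negation, whereas the HWU belief is about 0.56 for College Town against about 0.11 for College of Veterinary Medicine.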
DSTC-2/3 Performance Comparison
• DSTC-2 performance comparison (11 teams, including the baseline and HWU)
  System          Rank  Accuracy  L2
  Naïve baseline  9     0.619     0.738
  HWU             6     0.711     0.466
  DNN             3     0.750     0.416
  RNN             2     0.768     0.346
• DSTC-3 performance comparison (9 teams, including the baseline and HWU)
  System          Rank  Accuracy  L2
  Naïve baseline  8     0.555     0.860
  HWU             6     0.575     0.744
  DNN             ---   0.583     0.583
  RNN             1     0.646     0.538

Part III
Bridging Rule and Data-driven Models

Bridging Rule-based and Data-driven Approaches
(Figure: rule-based and data-driven approaches compared on portability, performance, interpretability, ability to evolve with data, and efficiency.)

Bridging Rule-based and Data-driven Approaches
(Figure: a spectrum from rule-based to data-driven, with the bridging models in between marked "?".)

Constrained Markov Bayesian Polynomial
(Figure: the same spectrum, with CMBP [Sun, et al. IEEE SLT 2014] [Yu, et al. IEEE/ACM TASL 2015] and RPN placed between rule-based and data-driven.)

General View of HWU Belief Tracker
• Original form
  $b_{i+1}(v) = \left[1 - \left(1 - b_i(v)\right)\left(1 - P^+_{i+1}(v)\right)\right]\left(1 - P^-_{i+1}(v)\right)$
  $\phantom{b_{i+1}(v)} = P^+_{i+1}(v) + b_i(v) - b_i(v)P^+_{i+1}(v) - P^+_{i+1}(v)P^-_{i+1}(v) - b_i(v)P^-_{i+1}(v) + b_i(v)P^+_{i+1}(v)P^-_{i+1}(v)$
• General form
  $b_{i+1}(v) = \mathcal{P}\left(b_i(v), P^+_{i+1}(v), P^-_{i+1}(v)\right)$
  – Feature: $b_i(v)$, $P^+_{i+1}(v)$, $P^-_{i+1}(v)$
  – Model form: $\mathcal{P}(\cdot)$ is a 3-order polynomial function with coefficients in {−1, 0, 1}
  – Rule: set the coefficients of $\mathcal{P}(\cdot)$ according to prior knowledge (probability calculation of two contiguous events)
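The Accuracy and L2 columns in the DSTC-2/3 comparison follow the metric definitions given in Part I. A minimal sketch of both, assuming each evaluated turn provides the tracker's scores over joint-goal hypotheses (including the special value "none") together with the reference; the function names and the toy scores are hypothetical, averaging L2 over turns is an assumption about aggregation, and the turn-selection rules of the official scoring scripts are not reproduced.

```python
import math
from typing import Dict, List, Tuple

# One evaluated turn: (tracker scores over joint-goal hypotheses, reference hypothesis).
Turn = Tuple[Dict[str, float], str]


def accuracy(turns: List[Turn]) -> float:
    """Fraction of turns whose 1-best hypothesis equals the reference (ties: first max wins)."""
    correct = sum(1 for scores, ref in turns if max(scores, key=scores.get) == ref)
    return correct / len(turns)


def l2(turns: List[Turn]) -> float:
    """Mean over turns of the L2 distance between the score vector and the one-hot reference."""
    total = 0.0
    for scores, ref in turns:
        labels = set(scores) | {ref}  # the reference may have received no score at all
        total += math.sqrt(sum((scores.get(x, 0.0) - (1.0 if x == ref else 0.0)) ** 2
                               for x in labels))
    return total / len(turns)


# Hypothetical single-slot scores, shaped like the example dialogue.
example: List[Turn] = [
    ({"chinese": 0.7, "none": 0.3}, "chinese"),
    ({"college of veterinary medicine": 0.55, "college town": 0.45}, "college town"),
]
```

On this toy input the tracker is right on one of two turns (accuracy 0.5); the L2 term still penalises the first turn, because 0.3 of the mass sits on a wrong hypothesis even though the 1-best answer is correct.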
New Thoughts on Rule-based Models
• Extend the features to all features related to prior knowledge
• Model arbitrary Bayesian probability operations using a polynomial function
• Use constraints to enforce probability requirements and incorporate prior knowledge

Generalized Features for Rule-based Probability Calculation
• $p^+_i(v)$: sum of the scores of the SLU hypotheses informing or affirming that the value is v at turn i
• $p^-_i(v)$: sum of the scores of the SLU hypotheses denying or negating that the value is v at turn i
• $\tilde{p}^+_i(v) = \sum_{v' \neq v} p^+_i(v')$
• $\tilde{p}^-_i(v) = \sum_{v' \neq v} p^-_i(v')$
• $b^r_i$: belief of "the value being 'the rest'" at turn i
• $b_i(v)$: belief of "the value being v" at turn i, given by the rule-based model

Generalized Model for Rule-based Probability Calculation
• A Markov Bayesian Polynomial (MBP) model is defined as a polynomial function of probability quantities
• A regular MBP model of order k is an MBP model with polynomial order k and all coefficients belonging to {−1, 0, 1}
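The claim that the HWU rule is a regular MBP of order 3 can be checked numerically. The sketch below represents a polynomial as a map from monomials (tuples of feature names) to integer coefficients, encodes the HWU expansion from the general-view slide, and provides helpers for the polynomial order, the monomial basis obtained with the x_0 = 1 trick, and the "rest" features p̃(v). The names and the dictionary representation are illustrative assumptions, not the authors' code.

```python
import math
from itertools import combinations_with_replacement
from typing import Dict, List, Sequence, Tuple

Monomial = Tuple[str, ...]        # ("b", "p_pos") stands for b * p_pos; () is the constant 1
Polynomial = Dict[Monomial, int]  # monomial -> integer coefficient

# HWU rule from the general-view slide, with b = b_i(v), p_pos = P+_{i+1}(v), p_neg = P-_{i+1}(v).
HWU_MBP: Polynomial = {
    ("p_pos",): 1,
    ("b",): 1,
    ("b", "p_pos"): -1,
    ("p_pos", "p_neg"): -1,
    ("b", "p_neg"): -1,
    ("b", "p_pos", "p_neg"): 1,
}


def evaluate(poly: Polynomial, x: Dict[str, float]) -> float:
    """Sum over monomials of coefficient * product of the named features."""
    return sum(c * math.prod(x[name] for name in mono) for mono, c in poly.items())


def order(poly: Polynomial) -> int:
    """Degree of the highest-degree monomial with a non-zero coefficient."""
    return max((len(m) for m, c in poly.items() if c != 0), default=0)


def is_regular_of_order(poly: Polynomial, k: int) -> bool:
    """Regular MBP of order k: polynomial order k, all coefficients in {-1, 0, 1}."""
    return order(poly) == k and all(c in (-1, 0, 1) for c in poly.values())


def monomials_up_to(features: Sequence[str], k: int) -> List[Monomial]:
    """All monomials of degree <= k, enumerated as size-k multisets over
    {x0} + features with x0 = 1; the x0 factors are then dropped."""
    return [tuple(f for f in combo if f != "x0")
            for combo in combinations_with_replacement(["x0", *features], k)]


def rest_features(p: Dict[str, float]) -> Dict[str, float]:
    """p~(v) = sum of p(v') over all other values v' != v of the same slot."""
    total = sum(p.values())
    return {v: total - s for v, s in p.items()}
```

With the six generalized features listed above, `monomials_up_to` returns 84 monomials, which is the number of integer coefficients a search over order-3 regular MBPs has to fix.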
Generalized Prior Knowledge Incorporation: Constraints
• Physical and logical constraints
  – Probability sum-to-one constraints
  – Feature definition requirements
• Intuitive knowledge constraints
  – E.g. the belief should be unchanged or positively correlated with the positive scores from the SLU

Physical and Logical Constraints
(Equations: formal statements of the physical and logical constraints.)

Intuitive Knowledge Constraints
• If neither positive nor negative information is collected, the belief should not change
• If both ASR and SLU are perfectly correct, the model should always give the correct result
• The belief should be unchanged or positively correlated with:
  1. the positive scores from the SLU
  2. the sum of the negative scores of the other values
  3. the belief of the last turn
• …

Constrained Markov Bayesian Polynomial
• M(k) denotes the search space of regular MBPs of order k; it is sufficient to exploit M(3)
• Rewrite the above MBP for M(3), with $x_0 = 1$:
  $b_{i+1}(v) = \sum_{0 \le j \le k \le l \le 6} g_{jkl}\, x_j x_k x_l$, where $x_1, \dots, x_6$ are the generalized features and $g_{jkl} \in \{-1, 0, 1\}$
• The polynomial coefficients are the "parameters"
• The constraints need to be formalized

Constraints Formalization
• Formalize the constraints mathematically
• E.g. for the constraint "the belief should be unchanged or positively correlated with the positive scores from the SLU"

Linear Constraints Approximation
• Approximate the exact constraints with more relaxed, easy-to-evaluate linear constraints, evaluated over the set of all possible input vectors

Rule Generation & Selection
• The optimal CMBP is the solution of an integer programming problem whose objective is the DST accuracy evaluated on the training data

Procedure of CMBP Optimization
1. Solve the integer programming problem with a dummy criterion to get all feasible CMBP solutions
2. Calculate the state tracking performance of all feasible solutions on the training data
3. Find the optimal rule from the top N candidates
   – Select the one with low complexity, or
   – Select multiple ones and perform score combination

Real-coefficient CMBP
• Prior knowledge has helped us find the structure; can we tune further in the data-driven direction?
• Extend integer-coefficient CMBP to real-coefficient CMBP
  – Get an integer solution
  – Perform hill climbing using grid search

Performance on DSTC-2
• Comparison with state-of-the-art DST trackers on DSTC-2
  System                  Rank  Accuracy  L2
  Baseline*               5     0.719     0.464
  LambdaMART (Williams)   1     0.784     0.735
  RNN (Henderson et al.)  2     0.768     0.346
  DNN (Sun et al.)        3     0.750     0.416
  Int CMBP                2.5   0.756     0.370
  Real CMBP               2.5   0.762     0.436
  Baseline* is the best of the 4 organizer-provided baselines.

Performance on DSTC-3
• Comparison with state-of-the-art DST trackers on DSTC-3
  System                   Rank  Accuracy  L2
  Baseline*                6     0.575     0.691
  RNN (Henderson et al.)   1     0.646     0.538
  IBM Rule (Kadlec et al.) 2     0.630     0.627
  Int CMBP (Sun et al.)    3     0.610     0.556
  Int CMBP                 2.5   0.623     0.552
  Real CMBP                1.5   0.632     0.591
  Baseline* is the best of the 4 organizer-provided baselines.

Weaknesses of CMBP
• Adding features and increasing model complexity are difficult
  – Additional prior knowledge is needed to keep the search space small
  – It is hard to incorporate other intuitive features (e.g. machine actions)
• Hill climbing is inefficient

Recurrent Polynomial Network
(Figure: the rule-based to data-driven spectrum, with RPN placed nearer the data-driven end than CMBP.) [Xie, et al. SigDial 2015]

RPN Definition and Example
• Input node
• Computation node
  – Sum node
  – Product node
  – Activation node

RPN for DST
• A layered RPN structure for dialogue state tracking, which essentially corresponds to a 3-order CMBP
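Step 1 of the CMBP optimization procedure (find every polynomial that satisfies the constraints) can be illustrated at toy scale. The slides solve an integer program; the sketch below swaps in brute-force enumeration, which is only practical because it restricts the search to seven monomials over three features (b = b_i(v), pp = P+, pn = P-) and checks the probability range plus three of the intuitive constraints on a coarse grid of inputs instead of the linear relaxation. The monomial set, the grid and the assumption pp + pn ≤ 1 are illustrative choices, not the authors' setup.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Coeffs = Tuple[int, ...]

# Hypothetical monomial set over b = b_i(v), pp = P+_{i+1}(v), pn = P-_{i+1}(v).
MONOMIALS: Dict[str, Callable[[float, float, float], float]] = {
    "b": lambda b, pp, pn: b,
    "pp": lambda b, pp, pn: pp,
    "pn": lambda b, pp, pn: pn,
    "b*pp": lambda b, pp, pn: b * pp,
    "b*pn": lambda b, pp, pn: b * pn,
    "pp*pn": lambda b, pp, pn: pp * pn,
    "b*pp*pn": lambda b, pp, pn: b * pp * pn,
}
NAMES = list(MONOMIALS)
STEP = 0.25
GRID = [i * STEP for i in range(5)]  # 0, 0.25, ..., 1 (exact in binary floating point)
EPS = 1e-9
# Grid inputs, assuming the positive and negative scores of one value sum to at most 1.
POINTS = [(b, pp, pn) for b, pp, pn in product(GRID, repeat=3) if pp + pn <= 1.0]


def evaluate(coeffs: Coeffs, b: float, pp: float, pn: float) -> float:
    return sum(c * MONOMIALS[name](b, pp, pn) for c, name in zip(coeffs, NAMES))


def is_feasible(coeffs: Coeffs) -> bool:
    """Check the constraints on every grid point (a stand-in for the linear relaxation)."""
    values = {pt: evaluate(coeffs, *pt) for pt in POINTS}
    for (b, pp, pn), y in values.items():
        if not -EPS <= y <= 1.0 + EPS:  # output must be a probability
            return False
        if pp == 0.0 and pn == 0.0 and abs(y - b) > EPS:  # no evidence: belief unchanged
            return False
        up_pp = values.get((b, pp + STEP, pn))  # neighbour with a larger positive score
        if up_pp is not None and up_pp < y - EPS:
            return False
        up_b = values.get((b + STEP, pp, pn))  # neighbour with a larger previous belief
        if up_b is not None and up_b < y - EPS:
            return False
    return True


def feasible_polynomials() -> List[Coeffs]:
    """Step 1 by brute force: keep every one of the 3^7 integer coefficient vectors that passes."""
    return [c for c in product((-1, 0, 1), repeat=len(NAMES)) if is_feasible(c)]
```

The HWU rule, (1, 1, 0, −1, −1, −1, 1) in this monomial order, is among the survivors, as is the trivial "never change the belief" rule. Steps 2 and 3 would then score every surviving vector on training dialogues and choose among the top N, which is where data, rather than the constraints, decides between them.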
Activation Function
(Figure: (a) f(x) = clip(x); (b) f(x) = 1 / (1 + e^{−5(x−0.5)}); (c) f(x) = softclip(x); (d) partial enlargement of softclip(·).)

RPN with Activation Function
(Figure: RPN with activation function.)

More Complex Structure
• Add features and recurrent connections

Training
• Initialize the RPN weights using CMBP
  – Each 3-order CMBP corresponds to a set of weights in the RPN
  – This takes advantage of prior knowledge and constraints
• MSE criterion
• Backpropagation through time (BPTT) to update all weights

Performance on DSTC-2
• Comparison with state-of-the-art DST trackers on DSTC-2
  System                  Rank  Accuracy  L2
  Baseline*               5     0.719     0.464
  LambdaMART (Williams)   1     0.784     0.735
  RNN (Henderson et al.)  2     0.768     0.346
  DNN (Sun et al.)        3     0.750     0.416
  Int CMBP                2.5   0.756     0.370
  Real CMBP               2.5   0.762     0.436
  RPN                     2.5   0.757     0.347
  Baseline* is the best of the 4 organizer-provided baselines.

Performance on DSTC-3
• Comparison with state-of-the-art DST trackers on DSTC-3
  System                   Rank  Accuracy  L2
  Baseline*                6     0.575     0.691
  RNN (Henderson et al.)   1     0.646     0.538
  IBM Rule (Kadlec et al.) 2     0.630     0.627
  Int CMBP (Sun et al.)    3     0.610     0.556
  Int CMBP                 2.5   0.623     0.552
  Real CMBP                1.5   0.632     0.591
  RPN                      0.5   0.650     0.549
  Baseline* is the best of the 4 organizer-provided baselines.

Summary
• Structures are useful for data-driven learning, especially for generalization
• High-level rules can be used to achieve efficient and understandable models
• Bridge rule-based and data-driven models
  – Use rules to find an appropriate model structure
  – Encode knowledge into constraints
  – Use data to enhance model power
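The training slide states that each 3-order CMBP corresponds to a set of RPN weights. A minimal sketch of that correspondence, assuming product nodes that multiply their inputs, a single weighted sum node, a clip activation to [0, 1] (panel (a) of the activation-function figure; the [0, 1] range is an assumption) and a recurrent connection that feeds the output back as the next turn's b. Initialised with the HWU coefficients, the network reproduces the rule exactly; those real-valued weights are what BPTT would then adjust. `TinyRPN` and its methods are hypothetical, this is far smaller than the authors' layered RPN, and training is not shown.

```python
from typing import List, Sequence, Tuple


def clip(x: float, lo: float = 0.0, hi: float = 1.0) -> float:
    """Activation node: clip the sum node's output to [lo, hi]."""
    return max(lo, min(hi, x))


class TinyRPN:
    """Input nodes (b, P+, P-) -> one product node per monomial -> weighted sum node
    -> clip activation node; the output is fed back as the next turn's b."""

    def __init__(self, monomials: Sequence[Tuple[int, ...]], weights: Sequence[float]):
        if len(monomials) != len(weights):
            raise ValueError("one weight per product node is required")
        self.monomials = list(monomials)  # input indices multiplied by each product node
        self.weights = list(weights)      # sum-node weights (real-valued, trainable)

    def step(self, b: float, p_pos: float, p_neg: float) -> float:
        inputs = (b, p_pos, p_neg)
        total = 0.0
        for mono, w in zip(self.monomials, self.weights):
            prod = 1.0
            for idx in mono:   # product node
                prod *= inputs[idx]
            total += w * prod  # sum node
        return clip(total)     # activation node

    def run(self, turns: Sequence[Tuple[float, float]], b0: float = 0.0) -> List[float]:
        """Unroll over turns for one value; each turn supplies (P+, P-)."""
        beliefs: List[float] = []
        b = b0
        for p_pos, p_neg in turns:
            b = self.step(b, p_pos, p_neg)  # recurrent connection
            beliefs.append(b)
        return beliefs


# Initialise from the HWU rule viewed as a 3-order CMBP (input indices: 0 = b, 1 = P+, 2 = P-).
B, P, N = 0, 1, 2
HWU_MONOMIALS = [(P,), (B,), (B, P), (P, N), (B, N), (B, P, N)]
HWU_WEIGHTS = [1.0, 1.0, -1.0, -1.0, -1.0, 1.0]
rpn = TinyRPN(HWU_MONOMIALS, HWU_WEIGHTS)
```

Running `rpn.run([(0.45, 0.0), (0.2, 0.0)])` for College Town in the example dialogue gives beliefs of about 0.45 and 0.56, matching the HWU rule. Because HWU outputs already lie in [0, 1], the clip is inactive here; once the weights become real-valued during training, it keeps the belief a valid probability.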
Questions?

Acknowledgement
Thanks to my students Lu Chen and Kai Sun for helping to prepare the slides.

References
• Williams J D, Raux A, Ramachandran D, et al. The Dialog State Tracking Challenge. 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SigDial), 2013.
• Henderson M, Thomson B, Williams J D. The Second Dialog State Tracking Challenge. 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SigDial), 2014.
• Henderson M, Thomson B, Williams J D. The Third Dialog State Tracking Challenge. IEEE Spoken Language Technology Workshop (SLT), 2014.
• Henderson M, Thomson B, Young S. Deep Neural Network Approach for the Dialog State Tracking Challenge. 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SigDial), 2013.
• Sun K, Chen L, Zhu S, et al. The SJTU System for Dialog State Tracking Challenge 2. 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SigDial), 2014.
• Henderson M, Thomson B, Young S. Word-based Dialog State Tracking with Recurrent Neural Networks. 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SigDial), 2014.
• Henderson M, Thomson B, Young S. Robust Dialog State Tracking Using Delexicalised Recurrent Neural Networks and Unsupervised Adaptation. IEEE Spoken Language Technology Workshop (SLT), 2014.
• Zilka L, Marek D, Korvas M, et al. Comparison of Bayesian Discriminative and Generative Models for Dialogue State Tracking. 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SigDial), 2013.
• Wang Z, Lemon O. A Simple and Generic Belief Tracking Mechanism for the Dialog State Tracking Challenge: On the Believability of Observed Information. 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SigDial), 2013.
• Kadlec R, Libovický J, Macek J, et al. IBM's Belief Tracker: Results on Dialog State Tracking Challenge Datasets. Dialog in Motion Workshop at EACL, 2014.
• Kadlec R, Vodolán M, Libovický J, et al. Knowledge-based Dialog State Tracking. IEEE Spoken Language Technology Workshop (SLT), 2014.
• Vodolán M, Kadlec R, Kleindienst J. Hybrid Dialog State Tracker. NIPS Workshop on Machine Learning for Spoken Language Understanding and Interaction, 2015.
• Williams J D. Web-style Ranking and SLU Combination for Dialog State Tracking. 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SigDial), 2014.
• Yu K, Sun K, Chen L, et al. Constrained Markov Bayesian Polynomial for Efficient Dialogue State Tracking. IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), 2015.
• Sun K, Chen L, Zhu S, et al. A Generalized Rule Based Tracker for Dialogue State Tracking. IEEE Spoken Language Technology Workshop (SLT), 2014.
• Xie Q, Sun K, Zhu S, et al. Recurrent Polynomial Network for Dialogue State Tracking with Mismatched Semantic Parsers. 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SigDial), 2015.