Grounding in Conversational Systems

Dan Bohus

January 2003

Dialogs on Dialogs Reading Group

Carnegie Mellon University


 Early grounding theories

 Discourse Contributions - Clark & Schaefer

 Conversational acts – Traum

 A Computational Framework (Horvitz, Paek)


 Principles

 Systems

 Grounding in RavenClaw

Clark & Schaefer

 In discourse, humans collaborate to establish/maintain mutual ground

 Discourse is structured in contributions

 Contribution : Presentation + Acceptance

 Grounding criterion:

“A and B mutually believe that the partners have understood what A said to a criterion sufficient for the current purposes”

Clark & Schaefer (2)

 Evidence of understanding:

 Display

 Demonstration

 Acknowledgement

 Initiating the next relevant contribution

 Continued attention

Display/Demonstration order challenged…

Clark & Schaefer (3)

 Infinite recursion avoided by the Strength of Evidence Principle

 4 possible states of non-understanding

 L did not notice S’s utterance

 L notices it but does not hear it correctly

 L hears it correctly but does not understand it

 L understands it


Traum

 Conversational acts: an extension of speech act theory

 Turn-taking

 Grounding

 Initiate, Continue, Cancel, ReqAck, Ack, ReqRepair, Repair

 Core speech acts

 Argumentational acts

Eliminates infinite recursion: acknowledgements do not themselves need further acknowledgements

Traum (2)

 In later work, the following computational model is introduced:

[Model formulas lost in extraction; only the symbols U, GC and G survive]

 Finally, Brennan (& Clark)

 another computational formulation;

 studies the different types of grounding behaviors in different media


These models are largely descriptive.

 They cannot be used to determine the next best action for achieving the grounding criterion.

Moreover, they neither quantify nor make use of the uncertainty in contributions.

 They are insensitive to differences in channels, content, populations, etc.

 They cannot be used for guidance.

 Decision theory to the rescue!

Decision Theory

 Action under uncertainty

 Given a set of states S = {s}, evidence e, and a set of actions A = {a}, where:

 P(s|e) is a probabilistic model of the state conditioned on the evidence

 U(a,s) is the utility of taking action a when in state s

 Take the action that maximizes the expected utility:

 $EU(a \mid e) = \sum_{s \in S} P(s \mid e)\, U(a, s)$
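The decision rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from any of the systems discussed: the state names, action names and all numbers are made up, and `expected_utility` and `best_action` are hypothetical helper names.

```python
from typing import Dict, Iterable, Tuple


def expected_utility(action: str,
                     posterior: Dict[str, float],
                     utility: Dict[Tuple[str, str], float]) -> float:
    """EU(a|e) = sum over states s of P(s|e) * U(a, s)."""
    return sum(p * utility[(action, s)] for s, p in posterior.items())


def best_action(actions: Iterable[str],
                posterior: Dict[str, float],
                utility: Dict[Tuple[str, str], float]):
    """Return the action with the highest expected utility, plus all scores."""
    scores = {a: expected_utility(a, posterior, utility) for a in actions}
    return max(scores, key=scores.get), scores


# Illustrative numbers only (assumptions, not taken from the talk).
posterior = {"s1": 0.7, "s2": 0.3}           # P(s|e)
utility = {                                   # U(a, s)
    ("a1", "s1"): 10.0, ("a1", "s2"): -20.0,
    ("a2", "s1"): 4.0,  ("a2", "s2"): 2.0,
}

choice, scores = best_action(["a1", "a2"], posterior, utility)
print(choice, scores)
```

With these made-up numbers the cautious action a2 wins: a1 pays off more in the likely state, but its large penalty in the unlikely state drags its expected utility down.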



Conversation under Uncertainty

Conversation = action under uncertainty

Example: I want to fly to Pittsburgh …

 States = {grounded, not_grounded}

Inaccessible (not directly observable), but describable by a probabilistic model

P(g | e) = P(Pittsburgh | e) … confidence annotation

 Actions = {explicit_confirm, implicit_confirm, continue_dialog}

 Utilities:

 U(ec,g) < U(ic,g) < U(cd,g)

 U(ec,ng) > U(ic,ng) > U(cd,ng)

I want to fly to Pittsburgh (2)

 States:

 NotGrounded (ng)

 Grounded (g)

 Actions:

 ExplicitConfirm (ec)

 ImplicitConfirm (ic)

 ContinueDialog (cd)

 Utilities:

 U(ec,g) < U(ic,g) < U(cd,g)

 U(ec,ng) > U(ic,ng) > U(cd,ng)

[Figure: expected-utility lines for ec, ic and cd between ng and g, with thresholds t1 and t2]
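Because there are only two states, each action's expected utility is a straight line in P(g), and the points where the lines cross act as confidence thresholds (presumably what t1 and t2 denote). The sketch below assumes one set of utility values that respects the two orderings above; the numbers and the helper names `eu`, `crossover` and `choose` are illustrative assumptions, not values from the talk.

```python
# Hypothetical utilities satisfying
#   U(ec,g) < U(ic,g) < U(cd,g)   and   U(ec,ng) > U(ic,ng) > U(cd,ng)
U = {
    "ec": {"g": 6.0,  "ng": 2.0},    # explicit confirmation
    "ic": {"g": 8.0,  "ng": -2.0},   # implicit confirmation
    "cd": {"g": 10.0, "ng": -10.0},  # continue dialog
}


def eu(action: str, p_g: float) -> float:
    """Expected utility of an action given P(grounded) = p_g."""
    return p_g * U[action]["g"] + (1.0 - p_g) * U[action]["ng"]


def crossover(a: str, b: str) -> float:
    """P(g) at which actions a and b have equal expected utility."""
    slope_a = U[a]["g"] - U[a]["ng"]
    slope_b = U[b]["g"] - U[b]["ng"]
    return (U[b]["ng"] - U[a]["ng"]) / (slope_a - slope_b)


def choose(p_g: float) -> str:
    """Pick the action with maximum expected utility."""
    return max(U, key=lambda a: eu(a, p_g))


t1 = crossover("ec", "ic")  # below t1: explicit confirmation
t2 = crossover("ic", "cd")  # above t2: continue the dialog

for p in (0.5, 0.73, 0.9):
    print(p, choose(p))
```

With these assumed utilities the thresholds fall at roughly 0.67 and 0.8, so a confidence score between them leads to an implicit confirmation. Other utility values would move the thresholds, or remove the implicit-confirmation region entirely.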


 Early grounding theories

 Discourse Contributions - Clark & Schaefer

 Conversational acts – Traum

 A Computational Framework (Horvitz, Paek)

 Principles

 Systems


 Bayesian Receptionist (Quartet architecture)

 Presenter (Quartet architecture)

 Grounding in RavenClaw

DeepListener - Domain

 Domain

 Provides spoken command-and-control functionality for LookOut

 Respond to offers of assistance (Yes/No)

 Small domain, but illustrates the core ideas very well

DeepListener - States

States: 5 possible “intentions” of the user

 Acknowledgement

 Negation

 Reflection

 Unrecognized Signal

 No Signal

State model P(S|E) – temporal Bayesian network.

 E = user’s actions, content, ASR results and reliability, plus the previous time step (t-1)
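The slides do not show the structure of the network, so the following sketch substitutes a much simpler stand-in: a hidden-Markov-style forward update over the five intentions, in which the belief at time t depends on the belief at t-1 and on the likelihood of the current evidence. The transition and likelihood numbers, and the function name `filter_step`, are assumptions made for illustration.

```python
STATES = ["Acknowledgement", "Negation", "Reflection",
          "UnrecognizedSignal", "NoSignal"]


def filter_step(prior, transition, likelihood):
    """One forward step: predict from t-1, then weight by current evidence.

    prior[s]          = P(S_{t-1} = s | evidence so far)
    transition[s][s2] = P(S_t = s2 | S_{t-1} = s)
    likelihood[s]     = P(current evidence | S_t = s)
    """
    predicted = {
        s2: sum(prior[s1] * transition[s1][s2] for s1 in STATES)
        for s2 in STATES
    }
    unnormalized = {s: predicted[s] * likelihood[s] for s in STATES}
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("evidence has zero probability under the model")
    return {s: v / total for s, v in unnormalized.items()}


# Hypothetical model: intentions tend to persist between time steps.
transition = {s1: {s2: (0.6 if s1 == s2 else 0.1) for s2 in STATES}
              for s1 in STATES}
prior = {s: 0.2 for s in STATES}

# Hypothetical likelihood of the recognizer reporting a confident "yes".
likelihood = {"Acknowledgement": 0.7, "Negation": 0.05, "Reflection": 0.1,
              "UnrecognizedSignal": 0.1, "NoSignal": 0.05}

posterior = filter_step(prior, transition, likelihood)
print(max(posterior, key=posterior.get))
```

The resulting distribution is what a decision-theoretic controller would plug into the expected-utility computation as P(S|E).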

DeepListener - Actions

 Actions:

 Execute the service

 Repeat

 Note a hesitation and try again

 Was that meant for me?

 Try to get the user’s attention

 Apologize for the interruption and forego the service

 Troubleshoot the overall dialog

DeepListener - Utilities

 Utilities

 Elicited through psychological experiments

 Elicited through slidebars

 Works when you have two or three grounding actions and a small, clear state space, but what about when the problem gets more complex?

 Example (paper)

Bayesian Receptionist, Presenter

Bayesian Receptionist – performs the tasks of a receptionist at a Microsoft front desk

 “I’m here to see Rashid”

 “Bathroom?”

 “Beam me to 25 please”

 … 32 goals

Presenter – command & control interface to PowerPoint presentations.

 Both based on Quartet architecture


 Uses decision theory (DT) and Bayesian networks (BN) to ensure grounding at 4 different levels:

 Signal

 Channel

 Intention

 Conversation

 The actual DM task is encapsulated in the same framework at the Intention level

 Different domains = different intention levels

Quartet – Signal & Channel

 At each level infer a distribution over possible states. Key variables:

 Signal level – signal identified (low/med/hi)

 Channel level – user’s focus of attention

 Maintenance module integrates Signal & Channel levels -> Maintenance Status:

 Channel x Signal: NoChannel, NoSignal, ChannelButNoSignal, SignalButNoChannel,


Quartet – Intention Level

 Domain is mostly goal inference

 Hierarchical decomposition on levels, where lower levels refine the goals into more specific needs

 Use BN to model p(goal | e) at each level

 Psychological studies to identify key variables and utilities

 Visual cues

 Linguistic variables; both syntactic and semantic

Quartet – Intention Level (2)

 To move between levels, compare probability of goal to…

 p-progress

 (above: do it)

 p-guess

 (above: seek confirmation)

 (below: gather more information via VOI)

 p-backtrack

 used on return nodes

Use Value-of-Information (VOI) analysis to determine which variable should be queried next.
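The threshold logic and the VOI step can be sketched as follows. This is a simplified illustration under stated assumptions: the threshold values, goal names and utilities are invented, p-backtrack is left out because the slide does not say how it is applied, and VOI is computed as the myopic expected gain from observing one variable before acting. The function names are hypothetical.

```python
def next_move(p_goal, p_progress=0.9, p_guess=0.5):
    """Compare the top goal's probability with the two thresholds."""
    if p_goal >= p_progress:
        return "progress"        # confident enough: do it
    if p_goal >= p_guess:
        return "confirm"         # plausible: seek confirmation
    return "query_via_voi"       # too uncertain: gather more information


def max_eu(belief, utility, actions):
    """Best achievable expected utility under a belief over goals."""
    return max(sum(p * utility[(a, g)] for g, p in belief.items())
               for a in actions)


def value_of_information(prior, likelihoods, utility, actions):
    """Myopic VOI of observing one variable.

    likelihoods[x][g] = P(variable takes value x | goal g)
    """
    expected_after = 0.0
    for lik in likelihoods.values():
        p_x = sum(prior[g] * lik[g] for g in prior)
        if p_x == 0.0:
            continue
        posterior = {g: prior[g] * lik[g] / p_x for g in prior}
        expected_after += p_x * max_eu(posterior, utility, actions)
    return expected_after - max_eu(prior, utility, actions)


# Hypothetical two-goal example.
prior = {"A": 0.5, "B": 0.5}
actions = ["do_A", "do_B"]
utility = {("do_A", "A"): 1.0, ("do_A", "B"): 0.0,
           ("do_B", "A"): 0.0, ("do_B", "B"): 1.0}

candidates = {
    # perfectly informative variable
    "x1": {"a": {"A": 1.0, "B": 0.0}, "b": {"A": 0.0, "B": 1.0}},
    # uninformative variable
    "x2": {"a": {"A": 0.5, "B": 0.5}, "b": {"A": 0.5, "B": 0.5}},
}
voi = {name: value_of_information(prior, lik, utility, actions)
       for name, lik in candidates.items()}
query = max(voi, key=voi.get)
print(next_move(max(prior.values())), query, voi)
```

A real system would also subtract the cost of asking from each VOI value before choosing what to query; that cost is omitted here.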

Comments on Intention level

 What is the size of the learning problem?

(How many BNs are needed?) How much data is needed for training?

 Not very clear:

 how to deal with attribute/value pairs with rich value ranges (e.g. which bus station?)

 how to deal with richer dialog mechanisms (beyond C&C applications)

 focus shifts, mixed initiative

 providing help

Quartet – Conversation Level

 See image. Use Intention and Maintenance Status to infer:

 Grounding: diagnoses mutual understanding

 Okay, ChannelFailure, IntentionFailure,


 Activity goal: measures whether the user is engaged in an activity with the system

 Compute expected utility for each action (utilities elicited through psychological studies)

Bayesian Receptionist, Presenter

 Runtime behavior (section 3)

 Presenter

 The Signal and Channel levels allow continuous listening to be treated uniformly within the same framework

 Experiments show that it performs better than random, but significantly worse than humans

 But then again, the experiments were not very fair, being performed only at that level (i.e. no engaging in dialog allowed)

My Research …

 Deal with misunderstandings

 Use probabilistic modeling and decision theory to make grounding decisions (but not task decisions)

 I want a room tomorrow morning (0.73)

 States: time correctly understood/not

 Grounding Actions: no_action, expl_conf, impl_conf, reject

 Utilities: try to learn them by relating the actions to an overall dialog/grounding metric

RavenClaw: Dialog Task / Grounding





[Figure: Dialog Task (nodes include ExecuteQuery and DiscussResults) and Grounding Model; labels: State/how well are things going, Strategies/Grounding Actions, Optimal action]

States and Actions

 Actions: see Strategies.xls

 States (must be kept small!)

 Single “state-space” model

 What are the variables? Which are observable and which are stochastically modeled?

 Multiple “state-space” models

 First 5 strategies: state = amount of grounding on each concept

 What should state be for the rest? What are the indicators? Which are fully observable and which are not?

 How to combine decisions from different spaces


 Utilities: learn them! How?

 Idea 1: POMDPs; at this small a scale they may be tractable

 Idea 2: Regression to some overall dialog metric

 What should that be?

 (hmm) number of non-null grounding actions taken

 …
