Dan Bohus
January 2003
Dialogs on Dialogs Reading Group
Carnegie Mellon University
Early grounding theories
Discourse Contributions - Clark & Schaefer
Conversational acts – Traum
A Computational Framework (Horvitz, Paek)
Principles
Systems
Grounding in RavenClaw
In discourse, humans collaborate to establish/maintain mutual ground
Discourse is structured in contributions
Contribution: Presentation + Acceptance
Grounding criterion:
“A and B mutually believe that the partners have understood what A said to a criterion sufficient for the current purposes”
Evidence of understanding:
Display
Demonstration
Acknowledgement
Initiating the next relevant contribution
Continued attention
Display/Demonstration order challenged…
Infinite recursion avoided by the Strength of Evidence Principle
4 possible states of (non-)understanding:
L did not notice S’s utterance
L notices it but does not hear it correctly
L hears it correctly but does not understand it
L understands it
Conversational acts: an extension of speech act theory
Turn-taking
Grounding
Initiate, Continue, Cancel, ReqAck, Ack, ReqRepair, Repair
Core speech acts
Argumentational acts
Eliminates infinite recursion: acknowledgements don't need further acknowledgements
Later work introduces a computational, utility-based model of grounding.
Finally, Brennan (& Clark)
another computational formulation;
studies the different types of grounding behaviors in different media
These models are by-and-large descriptive.
Cannot be used to determine the next best action to take to achieve the grounding criterion.
Moreover, they don't quantitatively describe or make use of the uncertainty in contributions
Are insensitive to differences in channels, content, populations, etc.
Cannot be used for guidance
Decision Theory to the rescue!!!
Action under uncertainty
Given a set of states S = {s}, evidence e, and a set of actions A = {a}, if:
P(s|e) = a probabilistic model of the state conditioned on the evidence
U(a,s) = the utility of taking action a when in state s
Take the action that maximizes the expected utility:
EU(a|e) = Σ_s U(a,s) · P(s|e)
Conversation = action under uncertainty
Example: I want to fly to Pittsburgh …
States = {grounded, not_grounded}
Inaccessible, but describable by a probabilistic model
P(g|e) = P(Pittsburgh|e) … confidence annotation
Actions = {explicit_confirm, implicit_confirm, continue_dialog}
Utilities:
U(ec,g) < U(ic,g) < U(cd,g)
U(ec,ng) > U(ic,ng) > U(cd,ng)
States:
NotGrounded (ng)
Grounded (g)
Actions:
ExplicitConfirm (ec)
ImplicitConfirm (ic)
ContinueDialog (cd)
Utilities:
U(ec,g) < U(ic,g) < U(cd,g)
U(ec,ng) > U(ic,ng) > U(cd,ng)
[figure: expected utility of ec, ic, cd as a function of P(g); the optimal action switches at thresholds t1 and t2]
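A minimal sketch of this decision rule in Python; the numeric utilities are invented so as to satisfy the two orderings above, and the confidence values are illustrative:

```python
# Expected-utility action selection over S = {g, ng}:
#   EU(a|e) = U(a,g) * P(g|e) + U(a,ng) * (1 - P(g|e))

# Hypothetical utilities satisfying U(ec,g) < U(ic,g) < U(cd,g)
# and U(ec,ng) > U(ic,ng) > U(cd,ng).
UTILITIES = {
    #                    U(a,g)  U(a,ng)
    "explicit_confirm": (0.70,   0.60),
    "implicit_confirm": (0.90,   0.30),
    "continue_dialog":  (1.00,   0.00),
}

def best_grounding_action(p_g: float) -> str:
    """Return the action that maximizes expected utility given P(g|e)."""
    return max(UTILITIES,
               key=lambda a: UTILITIES[a][0] * p_g + UTILITIES[a][1] * (1 - p_g))

print(best_grounding_action(0.40))  # low confidence  -> explicit_confirm
print(best_grounding_action(0.73))  # mid confidence  -> implicit_confirm
print(best_grounding_action(0.95))  # high confidence -> continue_dialog
```

The three EU curves are linear in P(g), and their crossover points are exactly the thresholds t1 and t2 in the figure.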
Early grounding theories
Discourse Contributions - Clark & Schaefer
Conversational acts – Traum
A Computational Framework (Horvitz, Paek)
Principles
Systems
DeepListener
Bayesian Receptionist (Quartet architecture)
Presenter (Quartet architecture)
Grounding in RavenClaw
Domain
Provides spoken command-and-control functionality for LookOut
Respond to offers of assistance (Yes/No)
Small domain, but illustrates the core ideas very well
States: 5 possible “intentions” of the user
Acknowledgement
Negation
Reflection
Unrecognized Signal
No Signal
State model P(S|E): a temporal Bayesian network
E = the user's actions, content, ASR results and reliability, plus the same variables at time t−1
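A toy sketch of the kind of temporal update such a network performs; this is just an HMM-style forward step over the slide's five states, and every number in it is invented:

```python
import numpy as np

# Five user-intention states, as listed above.
STATES = ["acknowledgement", "negation", "reflection",
          "unrecognized_signal", "no_signal"]

# Hypothetical transition model P(s_t | s_{t-1}); rows sum to 1.
T = np.full((5, 5), 0.1) + 0.5 * np.eye(5)

def forward_step(belief, likelihood):
    """P(s_t|e_1..t) ∝ P(e_t|s_t) * Σ_{s_{t-1}} P(s_t|s_{t-1}) P(s_{t-1}|e_1..t-1)."""
    predicted = T.T @ belief            # fold in the time t-1 belief
    posterior = likelihood * predicted  # fold in current evidence (ASR, content, ...)
    return posterior / posterior.sum()

belief = np.full(5, 0.2)                              # uniform prior
asr_likelihood = np.array([0.6, 0.1, 0.1, 0.1, 0.1])  # e.g. ASR suggests "yes"
belief = forward_step(belief, asr_likelihood)
print(dict(zip(STATES, belief.round(3))))
```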
Actions:
Execute the service
Repeat
Note a hesitation and try again
Was that meant for me?
Try to get the user’s attention
Apologize for the interruption and forego the service
Troubleshoot the overall dialog
Utilities
Elicited through psychological experiments
Elicited through slider bars
This works when you have 2–3 grounding actions and a clear, small state-space design, but what happens when the problem gets more complex?
Example (paper)
Bayesian Receptionist – performs the tasks of a receptionist at a Microsoft front desk
“I’m here to see Rashid”
“Bathroom?”
“Beam me to 25 please”
… 32 goals
Presenter – command & control interface to PowerPoint presentations.
Both are based on the Quartet architecture
Uses decision theory (DT) and Bayesian networks (BN) to ensure grounding at 4 different levels:
Signal
Channel
Intention
Conversation
The actual DM task is encapsulated in the same framework at the Intention level
Different domains = different intention levels
At each level, infer a distribution over possible states. Key variables:
Signal level – signal identified (low/med/hi)
Channel level – the user's focus of attention
Maintenance module integrates Signal &
Channel levels -> Maintenance Status:
Channel x Signal: NoChannel, NoSignal,
ChannelButNoSignal, SignalButNoChannel,
Signal
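As a rough illustration, a possible deterministic reading of the Channel × Signal combination (the actual module infers the status probabilistically; the boolean/ternary encoding below is my assumption):

```python
from enum import Enum

class SignalLevel(Enum):
    NONE = 0          # nothing heard
    UNIDENTIFIED = 1  # audio heard, but not identified as a usable signal
    IDENTIFIED = 2    # a usable signal was identified

def maintenance_status(channel_open: bool, signal: SignalLevel) -> str:
    """Map Channel (user attending?) x Signal onto a Maintenance Status."""
    if not channel_open:
        return ("SignalButNoChannel" if signal is SignalLevel.IDENTIFIED
                else "NoChannel")    # e.g. side talk vs. user absent
    if signal is SignalLevel.NONE:
        return "NoSignal"            # user attending, nothing heard
    if signal is SignalLevel.UNIDENTIFIED:
        return "ChannelButNoSignal"  # user attending, speech not understood
    return "Signal"                  # attention + identified signal
```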
The domain task is mostly goal inference
Hierarchical decomposition into levels, where lower levels refine the goals into more specific needs
Use BNs to model P(goal | e) at each level
Psychological studies to identify key variables and utilities
Visual cues
Linguistic variables; both syntactic and semantic
To move between levels, compare the probability of the goal to…
p-progress (above it: do it)
p-guess (above it: seek confirmation; below it: seek more info via VOI)
p-backtrack (used on return nodes)
Use Value-of-Information analysis to infer which variable should be queried next (sketched below)
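A sketch of how these thresholds and a VOI step might drive the control loop; the threshold values, the Candidate class, and the variable names are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    expected_utility_gain: float  # VOI: expected gain in EU from observing this variable

P_PROGRESS, P_GUESS = 0.90, 0.60  # hypothetical thresholds, tuned per level

def next_move(p_goal: float, candidates: list) -> str:
    if p_goal >= P_PROGRESS:
        return "progress"             # confident: refine the goal at the next level down
    if p_goal >= P_GUESS:
        return "confirm_goal"         # plausible guess: seek confirmation
    best = max(candidates, key=lambda c: c.expected_utility_gain)
    return f"ask_about:{best.name}"   # below p-guess: seek more info via VOI

print(next_move(0.72, [Candidate("visual_cue", 0.05),
                       Candidate("syntactic_form", 0.12)]))  # -> confirm_goal
```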
What is the size of the learning problem?
(How many BNs are needed?) How much data is needed for training?
Not very clear:
how to deal with attribute/value concepts with rich ranges (e.g., which bus station?)
how to deal with richer dialog mechanisms (beyond command-and-control applications):
focus shifts, mixed initiative
providing help
Use Intention and Maintenance Status to infer:
Grounding: diagnoses mutual understanding (Okay, ChannelFailure, IntentionFailure, ConversationFailure)
Activity goal: measures if the user is engaged or not in an activity with the system
Compute expected utility for each action (utilities elicited through psychological studies)
Runtime behavior (section 3)
Presenter
The Signal & Channel level allow a uniform treatment in the same framework of continuous listening
Experiments show that it performs better than random, but significantly worse than humans do
But then again, the experiments were not very fair, being performed only at that level (i.e., no engaging in dialog was allowed)
Deal with misunderstandings
Use probabilistic modeling and decision theory to make grounding decisions (but not task decisions)
I want a room tomorrow morning (0.73)
States: time correctly understood/not
Grounding Actions: no_action, expl_conf, impl_conf, reject
Utilities: try to learn them by relating the actions to an overall dialog/grounding metric
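As a sketch of what per-concept grounding state could look like, here is a Bayes update of P(grounded) after an explicit confirmation; the observation probabilities are invented, not RavenClaw's actual model:

```python
def update_grounding(p_grounded: float, action: str, user_response: str) -> float:
    """Bayes update of P(concept grounded) after a grounding action.

    Hypothetical observation model: the user answers "yes" to an explicit
    confirmation with prob. 0.95 if the value is right, 0.10 if it is wrong.
    """
    if action != "expl_conf":
        return p_grounded  # no_action / impl_conf / reject handled elsewhere
    p_yes_if_g, p_yes_if_ng = 0.95, 0.10
    like_g = p_yes_if_g if user_response == "yes" else 1.0 - p_yes_if_g
    like_ng = p_yes_if_ng if user_response == "yes" else 1.0 - p_yes_if_ng
    num = like_g * p_grounded
    return num / (num + like_ng * (1.0 - p_grounded))

# "tomorrow morning" recognized with confidence 0.73; the user confirms:
print(round(update_grounding(0.73, "expl_conf", "yes"), 3))  # -> 0.963
```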
[figure: RoomLine dialog task tree – RoomLine at the root, with subtasks Login, GetQuery, ExecuteQuery, DiscussResults, Bye]
[figure: the Grounding Model runs alongside the Dialog Task, tracking state (how well things are going) at the grounding level and selecting the optimal action from a set of strategies/grounding actions (Strategies.xls)]
States (have to keep it small!!!)
Single “state-space” model
What are the variables? Which are observable and which are stochastically modeled?
Multiple “state-space” models
First 5 strategies: state = amount of grounding on each concept
What should state be for the rest? What are the indicators? Which are fully observable and which are not?
How to combine decisions from different spaces?
Learn them! How?
Idea 1: POMDPs; maybe at this size they are small enough to be tractable
Idea 2: Regression to some overall dialog metric
What should that be?
(hmm) the number of non-null grounding actions taken
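A very loose sketch of Idea 2, regressing an overall metric onto per-strategy usage counts so the fitted weights can act as per-action costs; the features, the toy numbers, and the use of least squares are all assumptions:

```python
import numpy as np

# Toy example data (fabricated): each row counts how often each grounding
# strategy was used in one dialog; y is the overall dialog metric
# (e.g. the number of non-null grounding actions taken; lower = better).
X = np.array([[3., 1., 0.],
              [1., 2., 1.],
              [0., 0., 4.],
              [2., 3., 0.]])
y = np.array([4., 4., 4., 5.])

# Least-squares fit: metric ≈ X @ w; the learned weights serve as
# per-strategy costs, i.e. negative utilities for the grounding actions.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w.round(2))
```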
…
…