Dialogue management for an advice-giving virtual assistant
Ana García-Serrano, Javier Calle, and Josefa Z. Hernández
Computer Science Department
Technical University of Madrid
Boadilla del Monte
28660 Madrid, Spain
{agarcia, jcalle, phernan}@isys.dia.fi.upm.es
Abstract
In this contribution we present the work on dialogue
management performed in the ADVICE project, an
on-going European Commission research project. The
overall objective of this project is to design and
implement an advice-giving system for e-commerce,
supported by natural language, graphics and an animated
3D character. Classical interfaces used to leave almost all
the responsibility for the interaction to the user, instead of
sharing commitments with them; such commitments
motivate the clarifications and confirmations always
present in conversations. There also appears the need for a
'common ground' between user and system, which is
achieved in this work through the 'threads model'. The
virtual assistant performs a dialogue-based interaction
supported by joint intention management and by a generic
product offer tree provided by the domain model of the
product specialist; these drive the system's participation in
the dialogue (pro-active assistance to the user) and
improve the chances of a successful interaction.
Keywords: Dialogue, Communicative acts, Thread,
User-Computer Interaction.
1. Introduction
Searching for and selecting products in digital markets is a
difficult task for consumers, mainly due to the lack of
intelligent support or assistance on the Web. In addition to
the capability of obtaining good-quality information for the
user, it is also necessary to endow the system with
sufficiently expressive capabilities to adequately present this
output. Multimodal interfaces make possible the integrated
performance of different media, improving in this way the
chances of a successful interaction.
The overall objective of the ADVICE project (EU project
IST 1999-11305, www.advice.iao.fhg.de) is to design and
implement an advice-giving system for e-commerce,
supporting a move from the current catalogue-based
customer services to customer-adapted intelligent
assistance, emulating in some way the performance of a
human seller. The agent-based design allows concurrency
in task processing, the distribution of resources such as
knowledge and methods, and flexible interaction among the
system components: the Interface Agent, the Interaction
Agent and the Intelligent Agent.
The Interface Agent is responsible for the multimedia
interaction. There are two input modes: the English
language, through a Natural Language Interpreter, and a
Graphical User Interface (clicking on predefined items,
such as icons, menus, etc.). The Interface Agent collects the
user's utterances and transforms them into semantic
structures (streams of speech acts). Regarding the output,
there are three different modes: a NL Generator, the GUI,
and a 3D-character avatar. The semantic structures (speech
acts) coming from the Interaction Agent are the input from
which the Interface Agent constructs a coordinated, complete output.
The Interaction Agent is responsible for the adequate
management of the interaction between the Interface Agent
and the Intelligent Agent, as well as with the user. This means
that it has to (i) manage the evolution of the conversation in
a coherent way, (ii) deliver to the Intelligent Agent the
query of the user together with relevant information about
this user that may influence the selection of the appropriate
answer and (iii) send to the Interface Agent the information
to be presented to the user at every moment.
The Intelligent Agent is responsible for the generation
of the information required by the customer. A two-layer
knowledge model supports it: the reasoning model structure
and the domain structure. The reasoning model structure
contains the collection of problem solving methods
specialized in performing the different required tasks (e.g.
sales-oriented tasks) according to the user's demand.
Classical interfaces used to leave almost all the
responsibility for the interaction to the user, instead of
sharing a commitment. A dialogue is a jointly convened
process in which both speakers need to be attuned for the
best performance. A flexible dialogue requires at least one
joint commitment (so that the speakers can understand one
another). These commitments motivate the clarifications
and confirmations always present in conversations.
Research on modelling commitment has been carried out in
recent years, such as the theoretical models of joint action
[Cohen & Levesque, 91]. The recent IST project Trindi
[Cooper & Larsson, 99] argues for the need of a 'common
ground' between user and system. The IST project ADVICE
joins this line of research by prototyping the 'threads model'.
The interaction, as conceived for an advice-giving assistant
for e-commerce, can be seen as a sequence of alternating
interventions from both participants, user and system. A
session is built through several dialogues, which may
eventually be related to each other, or even nested. This can
be seen as a chess game, in which both players make their
moves to attain their goals, goals that may imply smaller
nested sub-goals. In a formal analysis, certain states of the
'dialogue game' can be observed; they can be represented as
the states of a finite-state automaton (FSA). The transitions
from one state to the next stand for dialogue steps or moves
in that game. Those steps keep a conventional relation,
shaping a 'valid dialogue' for both interlocutors throughout
their interaction. There is also information that remains
static throughout the dialogue (the context).
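To make the finite-state view concrete, the following Java sketch (the state
and move names are our own illustration, not the ADVICE prototype's) shows a
dialogue game whose transitions are the conventional moves; a move is accepted
only if it keeps the dialogue 'valid':

import java.util.List;
import java.util.Map;

// Illustrative sketch of the 'dialogue game' as a finite-state automaton:
// states are dialogue situations, transitions are moves made by user or system.
// State and move names are invented for the example.
public class DialogueGame {

    enum State { OPENING, AUTHORIZED, SOLVING_TASK, REQUIRE_CLARIFICATION, CLOSED }
    enum Move  { GREETING, AUTHORIZE, COMMAND, REQUEST_DATA, INFORM, FAREWELL }

    // Transition table: (current state, move) -> next state.
    private static final Map<State, Map<Move, State>> TRANSITIONS = Map.of(
            State.OPENING, Map.of(Move.GREETING, State.OPENING,
                                  Move.AUTHORIZE, State.AUTHORIZED),
            State.AUTHORIZED, Map.of(Move.COMMAND, State.SOLVING_TASK),
            State.SOLVING_TASK, Map.of(Move.REQUEST_DATA, State.REQUIRE_CLARIFICATION,
                                       Move.FAREWELL, State.CLOSED),
            State.REQUIRE_CLARIFICATION, Map.of(Move.INFORM, State.SOLVING_TASK));

    private State current = State.OPENING;

    // A move is a 'valid dialogue' step only if the automaton has a transition for it.
    public boolean step(Move move) {
        State next = TRANSITIONS.getOrDefault(current, Map.of()).get(move);
        if (next == null) return false;   // not a conventional continuation
        current = next;
        return true;
    }

    public State state() { return current; }

    public static void main(String[] args) {
        DialogueGame game = new DialogueGame();
        for (Move m : List.of(Move.GREETING, Move.AUTHORIZE, Move.COMMAND,
                              Move.REQUEST_DATA, Move.INFORM)) {
            System.out.println(m + " accepted: " + game.step(m) + " -> " + game.state());
        }
    }
}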
From the locutive point of view, both interlocutors play
their roles in turns, so the dialogue is divided into
discourses generated alternately by user and system.
Finally, there is a third locutive level, the utterance, the
atomic information piece of the discourse.
A satisfactory management of dialogue generally requires
both a semantic representation (the content of what has
been expressed) and pragmatic information. The semantic
structures used to link the Interface and Interaction Agents
in the ADVICE project are based on Searle's speech acts
[Searle, 69]. The currently identified set of speech acts is
shown in figure 1:
Courtesy acts
  Salute: <c, a>   Farewell: <c, a>   Thank: <c, a>   Disannoy: <c, a>
  Empathetic: <c, a>   Satisfy: <c, a>   Wish: <c, a>
  [c]: conventional (formal/informal)   [a]: allowable (open/close)
  Examples:
    Nice to see you [name]: salute(i,c)
    Excuse me...: disannoy(f,c)
    Don't worry...: empathetic(i,c)
    Excellent: satisfy(f,c)
Representative acts
  Inform: <t, m, s, c>
  [t]: type (confirmation/data/...)   [m]: matter (approve/deny/identity/...)
  [s]: subject (product/user/system/...)   [c]: content (...)
  Example:
    I'm Al: inform(data,identity,user,Al)
Authoritative acts
  Authorize: <m, a>
  [m]: matter (start/offer/task/...)   [a]: allowable (open/closed)
  Example:
    Can I help you?: authorize(task,open)
Directive acts
  Request: <t, m, s, c>   Command: <t, s, c>
  [t]: type (choice/data/comparison/...)   [m]: matter (approve/deny/identity/...)
  [s]: subject (user/system/...)   [c]: content (values...)
  Examples:
    Who are...?: request(data,identity,...)
    Show me some saws: command(search, system, product)
Null speech
  Null: < >
  Example:
    Well,...: null()
Figure 1: Set of speech acts for ADVICE
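As an illustration of how such acts could be carried between the Interface
Agent and the Interaction Agent, the sketch below encodes a speech act as a
kind plus an ordered parameter list; the class and field names are hypothetical
and only mirror the notation of figure 1:

import java.util.List;

// Hypothetical sketch of the speech-act structures exchanged between the
// Interface Agent and the Interaction Agent (names are illustrative only).
public class SpeechActs {

    enum Kind { SALUTE, FAREWELL, THANK, DISANNOY, EMPATHETIC, SATISFY, WISH,
                INFORM, AUTHORIZE, REQUEST, COMMAND, NULL }

    // A speech act is a kind plus an ordered list of parameters,
    // e.g. inform(data, identity, user, "Al") or authorize(task, open).
    record SpeechAct(Kind kind, List<String> params) {
        @Override public String toString() {
            return kind.name().toLowerCase() + "(" + String.join(",", params) + ")";
        }
    }

    public static void main(String[] args) {
        // "Hi there. I'm John Smith." -> a greeting plus an inform act.
        List<SpeechAct> discourse = List.of(
                new SpeechAct(Kind.SALUTE, List.of("informal", "open")),
                new SpeechAct(Kind.INFORM, List.of("data", "identity", "user", "John Smith")));
        discourse.forEach(System.out::println);
    }
}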
2.2 Components of the Interaction
A distinction can be made between controlling the
conversation (Dialogue Manager) and handling the details
of the interaction (Session Model). In addition, after the
decision-taking process performed to identify the system's
next move, that move has to be generated (Discourse Maker).
The Dialogue Manager controls the state and the
intention (thread) of the interaction. Interventions in the
dialogue are divided into 'discourse pieces' (subsets of the
discourse that can be understood independently), and each of
them is represented through a collection of speech acts.
Throughout an intervention, the thread or locutive line has
to be matched and updated, and a 'transition' has to be
produced from the previous 'dialogue state' to the next
one(s). Finally, there is another kind of information carried
by the speech acts: the 'details' of the conversation (names,
features, etc.). This information is 'static' throughout the
session and is stored in a 'Session Model'.
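A minimal sketch of such a Session Model, assuming a simple
subject/feature/value store (the method name addData mirrors the Add_data
operation that appears later in figure 5; the structure itself is our assumption):

import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Sketch of a Session Model: the 'static' details of the conversation
// (names, product features, ...) gathered from the speech acts.
// Structure and method names are assumptions, not the ADVICE code.
public class SessionModel {

    // subject -> (feature -> value), e.g. "product" -> {"name": "saw", "type": "plunge-cut"}
    private final Map<String, Map<String, String>> details = new HashMap<>();

    public void addData(String subject, String feature, String value) {
        details.computeIfAbsent(subject, s -> new HashMap<>()).put(feature, value);
    }

    public Optional<String> get(String subject, String feature) {
        return Optional.ofNullable(details.getOrDefault(subject, Map.of()).get(feature));
    }

    public static void main(String[] args) {
        SessionModel session = new SessionModel();
        session.addData("user", "identity", "John Smith");
        session.addData("product", "name", "circular saw");
        session.addData("product", "type", "plunge-cut");   // cf. Add_data in figure 5
        System.out.println(session.get("product", "type")); // Optional[plunge-cut]
    }
}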
[Fig. 2: Components of the ADVICE Interaction Agent. Semantic structures
coming from the Interface Agent feed the Session Model and the Dialogue
Manager (which keeps the Dialogue State and the Thread); the Discourse Maker
produces the outgoing semantic structures and receives a configurable offer
from the Intelligent Agent.]

The Dialogue Manager component has the control of
the conversation (the formal aspects through the dialogue
state and the intentional ones through the thread). After
processing the information carried by the speech acts, the
Discourse Maker should be able to construct the system
response. At this point, some content is required for filling
the response. This content is obtained from the context.
When 'new' information is required, the system asks the
Intelligent Agent to produce it (using data from the thread
and the context). The Intelligent Agent's outcome is a
'configurable offer' modeled as a decision tree structure.
The Discourse Maker receives the decision tree and extracts
the information needed to reach a solution, asking the user
for precise clarifications when necessary.
Once the 'content filling' is completed, a semantic
structure is obtained as a stream of chained speech acts. The
Interface Agent translates these structures into sentences,
icons or menus.
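The following sketch illustrates, under assumed names and an assumed tree
layout, how a Discourse Maker could walk a 'configurable offer' decision tree:
if the known context no longer discriminates among the remaining solutions, it
emits a request about the next bifurcation; otherwise it informs the solution.

import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of how a Discourse Maker might exploit the Intelligent
// Agent's 'configurable offer' (a decision tree over product features).
// The tree layout, feature names and product labels are assumptions.
public class DiscourseMaker {

    // A node either carries a solution (leaf) or branches on a feature.
    record Node(String solution, String feature, Map<String, Node> branches) {
        static Node leaf(String solution) { return new Node(solution, null, Map.of()); }
        static Node choice(String feature, Map<String, Node> branches) {
            return new Node(null, feature, branches);
        }
        boolean isLeaf() { return solution != null; }
    }

    // Walk the tree with the known context; return either an 'inform' with the
    // solution or a 'request' asking about the first unresolved bifurcation.
    static String makeResponse(Node node, Map<String, String> context) {
        while (!node.isLeaf()) {
            String value = context.get(node.feature());
            Node next = value == null ? null : node.branches().get(value);
            if (next == null) {
                return "request(data," + node.feature() + ",product) options="
                        + node.branches().keySet();
            }
            node = next;
        }
        return "inform(data,solution,product," + node.solution() + ")";
    }

    public static void main(String[] args) {
        Node offer = Node.choice("type", Map.of(
                "pendulum-cover", Node.leaf("saw model A"),
                "plunge-cut", Node.choice("application", Map.of(
                        "precision cuts", Node.leaf("saw model B"),
                        "rough work", Node.leaf("saw model C")))));

        Map<String, String> context = new HashMap<>(Map.of("type", "plunge-cut"));
        System.out.println(makeResponse(offer, context)); // asks about 'application'
        context.put("application", "precision cuts");
        System.out.println(makeResponse(offer, context)); // informs a solution
    }
}

This is also the behaviour illustrated later in figure 5, where the remaining
bifurcation concerns the intended application of the saw.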
3. Processing example
User and system make discourses in turn (all discourses
shown in the example are taken from the NL interpreter, just
for simplicity). The utterances are interpreted and the
corresponding streams of 'speech acts' are generated.
Figure 2 shows two ways of conveying the same meaning
(except for the pet word, which is interpreted as a NULL
speech act).
For the communication to succeed, both entities should
share a common discourse line. Likewise, there should be a
third entity reconciling them both. This is the task of the
'thread joint', which manages the user's and the system's
new and past threads. Figure 4 shows some discourses of a
dialogue and the corresponding threads generated.
[Figure 4: intentions and threads. A dialogue fragment, S: "Welcome to
Tooltechnics, I'm the virtual sales assistant." / U: "Hi there. I'm John
Smith." / S: "Hello Mr. Smith. Nice to meet you. What can I do for you?" /
U: "Well, I want a circular saw." / S: "What kind of saw do you want to
purchase, a pendulum-cover saw or a plunge-cut saw?" / U: "What's the
difference between them?" / S: "Well, you see...", shown together with the
threads it opens and closes, among them Authorize(task), Command(product),
Request(product), Request(data) and Solve.]
As for the intentional processing, it is unfortunately a bit
more complex. A distinction must be made between the
user's intention (or the system's image of it) and the
system's monitoring of intentional reactions (that is, the
intentions or illocutive goals generated by the system to
succeed in the interaction). These reactions usually arise
from some event during processing (in the dialogue or in the
search for intelligent content), so it is most appropriate to
set up a dedicated entity to manage them.
[Figure 2: utterance interpretation. The utterance "Well, I want a pendulum
[saw]" goes through the Natural Language Interpreter and becomes a NULL act
(for the pet word "Well"), a Command (matter: purchase, subject: system,
object: product) and an Inform (type: data, matter: identity, subject: product,
object: saw); the equivalent GUI interaction (given a set of accessories, the
user clicks on the corresponding item) yields an Inform (type: data, matter:
feature, subject: product, ...).]
Then, the dialogue manager converts them into steps, as
shown in figure 3. These steps are obtained thanks to a
prediction of the feasible steps for the given circumstance,
besides the input from the interface, of course. They are fed
into the dialogue state manager, which processes them and
consequently forces the next state (of the FSA). This state
defines the form of the system's next discourse.
turn | Discourse                                                              | Pattern
 S   | Welcome to Tooltechnics, I'm the virtual sales assistant.              | Greeting
 U   | Hi there. I'm John Smith.                                              | Greeting
 S   | Hello Mr. Smith. Nice to meet you. What can I do for you?              | Authorize
 U   | Well, I want a circular saw.                                           | Command
 S   | What kind of saw do you want to purchase, a pendulum-cover saw or a    | System requires
     | plunge-cut saw?                                                        | Explanation
 U   | What's the difference between them?                                    | Question
 S   | Well, you see...                                                       | System provides
     |                                                                        | Explanation
Figure 3: dialogue steps
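A rough sketch of that conversion is given below: an incoming stream of speech
acts is mapped onto one of the patterns of figure 3. The matching rules are
invented for the illustration; the prototype additionally uses the prediction
of feasible steps mentioned above.

import java.util.List;

// Illustrative mapping from a stream of speech acts to a dialogue step
// ('pattern'), as in figure 3. The rules below are invented for the example.
public class StepRecognizer {

    static String recognize(List<String> speechActs) {
        if (speechActs.stream().anyMatch(a -> a.startsWith("salute"))) return "Greeting";
        if (speechActs.stream().anyMatch(a -> a.startsWith("command"))) return "Command";
        if (speechActs.stream().anyMatch(a -> a.startsWith("request"))) return "Question";
        if (speechActs.stream().anyMatch(a -> a.startsWith("authorize"))) return "Authorize";
        return "Unknown";
    }

    public static void main(String[] args) {
        // "Well, I want a circular saw." -> null() + command(search, system, product)
        System.out.println(recognize(List.of("null()", "command(search,system,product)")));
        // "What's the difference between them?" -> request(data, feature, product)
        System.out.println(recognize(List.of("request(data,feature,product)")));
    }
}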
[Figure 5: interaction flow. The user says "I think I need a plunge-cut saw"
(IN); the NL Interpreter produces Inform<data, purchase, product, "plunge-cut
saw">; the Session Model stores Add_data(product, type, "plunge-cut"); the
Dialogue Manager is in state SolveTask and the thread joint stack holds
Request(product) and Command(r_prod.); the Discourse Maker solves the
bifurcation, simplifies the tree, finds no single solution yet and generates a
new request for explanation (a choice among two options in this case), emitting
Satisfy<formal, closed> plus Request<data, application, product>, which the NL
Generator renders as "Excellent. What do you need it for?" (OUT).]

Figure 5 shows the full process at one moment of the
dialogue. The user makes a discourse that implies an inform
act. The information (a circular saw and, more precisely, a
plunge-cut one) is stored in the session model. The manager
changes the state (from 'require clarification' back to
'solve task'). The thread joint closes the system's 'asking
about type' thread and then returns to the previous one (the
user's 'request product'). The discourse maker fails to
construct a response with the final solution, because the tree
of solutions still contains several of them. Hence a new
system thread springs up: 'request more data'. The discourse
maker is now able to build the response: a question referring
to the next bifurcation in the solutions tree. This response is
made definite as a stream of speech acts (satisfy + request)
that will be interpreted by the Natural Language Generator.
Finally, the 'action' taken by the system is used to update the
state (again to 'require clarification').
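The thread handling just described behaves like a stack of open intentions:
the system's 'asking about type' thread is closed, the user's 'request product'
thread resurfaces, and a new 'request more data' thread is pushed while the
offer tree still branches. The sketch below captures only that behaviour; the
API is hypothetical.

import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of the 'thread joint' as a stack of open intentions shared by user
// and system. The names only mirror the walk-through above.
public class ThreadJoint {

    record DialogueThread(String owner, String intention) {}

    private final Deque<DialogueThread> stack = new ArrayDeque<>();

    public void open(String owner, String intention) {
        stack.push(new DialogueThread(owner, intention));
    }

    // Closing a thread pops it; the previous one becomes current again.
    public DialogueThread close() { return stack.pop(); }

    public DialogueThread current() { return stack.peek(); }

    public static void main(String[] args) {
        ThreadJoint joint = new ThreadJoint();
        joint.open("user", "request product");       // "I want a circular saw"
        joint.open("system", "asking about type");   // "pendulum-cover or plunge-cut?"

        joint.close();                               // user answers: thread closed
        System.out.println(joint.current());         // back to user's 'request product'

        joint.open("system", "request more data");   // tree still has several solutions
        System.out.println(joint.current());
    }
}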
4. Current Related Work
The approach that presents the dialogue as a game, in which
two players 'act' in turn seeking a goal, is founded on the
regularity observed in conversations: similar goals in
similar contexts should follow the same pattern (global
organizing principle). Moreover, a discourse of a given kind
(a 'dialogue act') should be answered with a certain kind of
discourse (local organizing principle). For example, a
question should be followed by an answer (adjacency
pairs). Even when these predictions fail, the divergences are
signalled with some extra information (often prosodic).
Summarizing, interlocutors 'act' according to a set of shared
scripts, better named 'games' [Levin, Moore, 77].
The TRAINS project [Allen et al., 94] pursues an English-speaking
interface for solving a task (in collaboration with
the user) in a rigid domain. [Kowtko, Isard, 93] adds a
theory of intonational pragmatics in which the intonational
tune may distinguish the sort of (functional) move being made
in a conversational game. [Poesio, Mikheev, 98] show some
predictive power for the sub-game nesting problem (based on a
notion of game structure).
The plan-based approach rests on the fact that humans plan
their actions to achieve goals, so to understand a user the
system has to recognize those plans. That is, to answer the
user's utterances it has to uncover and respond appropriately
to the underlying plan; to help the user achieve his goals, it
has to predict them and construct a suitable plan. This
approach suits real dialogue, in which not every piece
appears explicitly (or perhaps is not perceived). It also suits
language eccentricities, such as metaphor, in which goals
can be grasped although not explicitly stated. Cutting
dispensable phrases out can enhance the conversation.
Finally, if an emotional model is added [Breese and Ball,
98; Lisetti, 99], some additional goals may be added to the
plan, so that the system can react to the user's aims in order
to make him comfortable.
The Longbow project, from the University of Pittsburgh,
illustrates this sort of model. Longbow [Young 96] is a
hierarchical discourse planning system that combines
decompositional reasoning (planning to perform an abstract
action by performing that action's sub-steps) and causal
reasoning (planning to perform an action by ensuring that
the action's preconditions all hold before its execution) in
the same representation. It’s based on a domain-independent
hierarchical AI planning algorithm named DPOCL
(Decompositional, Partial-Order Causal Link planner)
[Young et al., 94].
The plan-based modeling approach provides a
generalization in which dialogue is handled as a particular
act among other, even unconscious, communicative acts.
Furthermore, dialogue can be seen as a special case of other
rational non-communicative behavior [Cohen, 97].
However, these models spend much effort on redundant act
recognition. Moreover, the certainty obtained for each
'move' is very low, both for the analysis and for the
response generation.
A distinction must also be made between task speech and
dialogue-control speech. For this, a further approach has
been presented: multilevel plan structures. Their complexity,
however, is hard to handle, being unplumbed and sometimes
undecidable; restricted planning may be reasonable in some
cases. Moreover, goals do not explain every human act,
because some acts are unmotivated (or have hidden, complex
goals). Thus, this model needs to be supplemented.
[Breese and Ball, 98] explain that some motivations are
better seen as a 'state' of the user, either long-lasting
(personality) or short-lived (emotion). Hence, the
conversation is interpreted and constructed by an ordinary
conversational model (interpreter plus generator) and
conducted by an emotional (affective) conversational
model: a model that determines 'what mood the user is in'
(as well as 'what to say when the user is...') by choosing
among semantically equivalent, but emotionally diverse,
paraphrases.
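As a toy illustration of that idea (and only of the idea, not of Breese and
Ball's actual model), the choice among semantically equivalent but emotionally
diverse paraphrases could be driven by an estimated mood label:

import java.util.Map;

// Toy illustration: pick one of several semantically equivalent paraphrases
// according to the user's estimated mood. Mood labels and sentences are invented.
public class AffectiveParaphrase {

    private static final Map<String, String> REQUEST_DATA = Map.of(
            "neutral", "What do you need it for?",
            "impatient", "Just one more question: what do you need it for?",
            "pleased", "Great choice. What do you need it for?");

    static String say(String mood) {
        return REQUEST_DATA.getOrDefault(mood, REQUEST_DATA.get("neutral"));
    }

    public static void main(String[] args) {
        System.out.println(say("impatient"));
        System.out.println(say("pleased"));
    }
}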
[Lisetti, 99] uses prototypes or schemata of emotional
situations (a memory of emotional context), so that each
emotion has a script containing its causal chain and its
components. These scripts complement the conversation
model by describing the user state. In addition, they can also
be used to dynamically instantiate motivational states for
the computer entities interacting with the user, so goals are
provided for plan generation.
The Joint-Action model [Cohen, Levesque 91] is based
upon the premise that a dialogue is fed by both talking
entities, and that they have at least one common reference
point, a commitment. These joints have to be kept 'alive' by
both entities with an acceptable level of certainty and
efficiency, so that they can refer to them. Therefore, the
interlocutors need to ask for clarification to avoid letting
that certainty drop too low, or to confirm or make
reinforcing utterances to let the other know that there is no
need for redundant speech at that point.
Conclusion
We have presented the Interaction Agent, which enables an
advanced multimedia user interface ensuring a high level of
user-system interaction. The main advantage comes from
the integration of advanced understanding and expression
capabilities with the intelligent configuration of answers,
which leads to the generation of the right information in the
right way and at the right moment.
The role played by the Intelligent Agent in the
underlying interaction processes, as well as the dialogue
management, was illustrated through an example. The
current prototype runs on Ciao Prolog [Bueno, 99] and
Java on a Pentium II machine.
References
Allen, J.F., Schubert, L.K., Ferguson, G.M., Heeman, P.A.,
Hwang, C.H., Kato, T., Light, M.N., Martin, N.G., Miller,
B.W., Poesio, M., Traum, D.R. (1994). The TRAINS
project: A case study in building a conversational
planning agent. Technical Report 532, Department of
Computer Science, University of Rochester.
Breese, J., Ball, G. (1998). Modeling Emotional State and
Personality for Conversational Agents. Interactive and
Mixed-Initiative Decision Theoretic Systems, Technical
Report SS-98-03, AAAI Press.
Bueno, F., Cabeza, D., Carro, M., Hermenegildo, M., López,
P., Puebla, G. (1999). The Ciao Prolog System: A Next
Generation Logic Programming Environment. Reference
Manual. The Ciao System Documentation Series Technical
Report CLIP 3/97.1, The CLIP Group, School of Computer
Science, Technical University of Madrid.
Cohen, P.R., Levesque, H.J. (1991) Confirmation and Joint
Action, Proceedings of International Joint Conf. on
Artificial Intelligence, 1991
Cohen, P.R. (1997). Dialogue Modeling. Survey of the State
of the Art in Human Language Technology, 6, pp. 234-240.
Kowtko, J.C., Isard, S.D., Doherty, G.M. (1993).
Conversational Games Within Dialogue. Research Paper 31,
Human Communication Research Centre, Edinburgh.
Levin, J.A., Moore, J.A. (1977). Dialogue games:
Metacommunication structures for natural language
interaction. Cognitive Science 1(4), 395-420.
Lisetti, C. (1999). A User Model of Emotion-Cognition.
Proceedings of Seventh International Conference on User
Modeling, 1999.
Poesio, M, Mikheev, A. (1998) The Predictive Power of
Game Structure in Dialogue Act Recognition:
Experimental Results Using Maximum Entropy
Estimation. Proceedings of ICSLP-98, Nov 1998
Searle, J.R., (1969). Speech Acts: an essay in the philosophy
of language. Cambridge Univ. Press.
Young, R. M., Moore, J. D., Pollack, M. E., (1994).
Towards a Principled Representation of Discourse Plans.
Proceedings of the Sixteenth Conference of the Cognitive
Science Society, Atlanta, GA, 1994.
Young, R.M. (1996). A Developer's Guide to the Longbow
Discourse Planning System. University of Pittsburgh
Intelligent Systems Program Technical Report 94-4.