Dialogue management for an advice-giving virtual assistant

Ana García-Serrano, Javier Calle, and Josefa Z. Hernández
Computer Science Department, Technical University of Madrid
Boadilla del Monte, 28660 Madrid, Spain
{agarcía, jcalle, phernan}@isys.dia.fi.upm.es

Abstract

In this contribution we present the work on dialogue management performed in the ADVICE project, an on-going European Commission research project. The overall objective of this project is to design and implement an advice-giving system for e-commerce, supported by natural language, graphics and an animated 3D character. Classical interfaces used to leave almost all the responsibility for the interaction to the user, instead of sharing the commitment. Such commitments motivate the clarifications and confirmations always present in conversations. The need for a 'common ground' between user and system also arises, and is addressed in this work through the 'threads model'. The virtual assistant performs a dialogue-based interaction, supported by joint intention management and a generic product offer tree provided by the domain model of the product specialist, to accomplish the system's participation in the dialogue (pro-active assistance to the user) and improve the chances of a successful interaction.

Keywords: Dialogue, Communicative acts, Thread, User-Computer Interaction.

1. Introduction

Searching for and selecting products in digital markets is a difficult task for consumers, mainly due to the lack of intelligent support or assistance on the Web. In addition to the capability of obtaining good quality information for the user, it is also necessary to endow the system with expressive capabilities rich enough to adequately present this output. Multimodal interfaces make possible the integrated performance of different media, improving in this way the chances of a successful interaction.
The overall objective of the ADVICE project (EU Project IST 1999-11305, www.advice.iao.fhg.de) is to design and implement an advice-giving system for e-commerce, supporting a move from current catalogue-based customer services to customer-adapted intelligent assistance, emulating in some way the performance of a human seller.

The agent-based design allows concurrent task processing, the distribution of resources such as knowledge and methods, and flexible interaction among the system components: the Interface Agent, the Interaction Agent and the Intelligent Agent.

The Interface Agent is responsible for the multimedia interaction. There are two input modes: the English language, through a Natural Language Interpreter, and a Graphical User Interface (clicking on predefined items such as icons, menus, etc.). The Interface Agent collects the user utterances and transforms them into semantic structures (streams of speech acts). Regarding the output, there are three different modes: NL Generator, GUI, and a 3D-character avatar. The semantic structures (speech acts) coming out of the Interaction Agent are the input with which the Interface Agent constructs a coordinated, complete output.

The Interaction Agent is responsible for the adequate management of the interaction between the Interface Agent and the Intelligent Agent, as well as with the user. This means it has to (i) manage the evolution of the conversation in a coherent way, (ii) deliver to the Intelligent Agent the user's query together with relevant information about the user that may influence the selection of the appropriate answer, and (iii) send to the Interface Agent the information to be presented to the user at every moment.

The Intelligent Agent is responsible for the generation of the information required by the customer. It is supported by a two-layer knowledge model: the reasoning model structure and the domain structure.
The reasoning model structure contains the collection of problem-solving methods specialized in performing the different required tasks (e.g. sales-oriented tasks) according to the user demand.

Classical interfaces used to help the user through the interaction without sharing a commitment. A dialogue, however, is a fully joint process in which both speakers need to be attuned for the best performance. A flexible dialogue requires at least one joint commitment (so that the speakers can understand one another). These commitments motivate the clarifications and confirmations always present in conversations. Research on modelling this commitment has been carried out in recent years, such as theoretical models of joint action [Cohen & Levesque, 91]. The recent IST project Trindi [Cooper & Larsson, 99] argues for the need of a 'common ground' between user and system. The IST project ADVICE joins this line of research by prototyping the 'threads model'.

The interaction, as conceived for an advice-giving assistant for e-commerce, can be seen as a sequence of alternating interventions from both participants, user and system. A session is built up through several dialogues. These can be related to each other, even nested. This can be seen as a chess game, in which both players make their moves to attain their goals, which may in turn imply smaller nested sub-goals. In a formal analysis, some of the states of the 'dialogue game' can be observed. They can be represented as states in a finite-state automaton (FSA). The transition from one state to the next stands for a dialogue step, or move in that game. Those steps keep a conventional relation, shaping a 'valid dialogue' for both interlocutors throughout their interaction. There is also static information that persists over the whole dialogue (the context). From the locutive point of view, both interlocutors play their roles in turns, so the dialogue is divided into discourses generated alternately by user and system.
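This FSA view of the dialogue game can be illustrated with a toy transition table. A minimal sketch, assuming invented state and move names; the actual ADVICE automaton is richer than this:

```python
# Toy FSA for the 'dialogue game': each dialogue move triggers a
# transition, and invalid moves are rejected. States and moves here
# are illustrative only, not the project's actual automaton.

TRANSITIONS = {
    ("start", "salute"): "greeting",
    ("greeting", "salute"): "open",
    ("open", "command"): "solve_task",
    ("solve_task", "request"): "require_clarification",
    ("require_clarification", "inform"): "solve_task",
    ("solve_task", "farewell"): "closed",
}

def step(state, move):
    """One dialogue move = one FSA transition; invalid moves are rejected."""
    try:
        return TRANSITIONS[(state, move)]
    except KeyError:
        raise ValueError(f"move {move!r} is not valid in state {state!r}")

state = "start"
for move in ["salute", "salute", "command", "request", "inform"]:
    state = step(state, move)
print(state)  # the dialogue ends up back in 'solve_task'
```

The conventional relation between steps is what the table encodes: a 'valid dialogue' is exactly a path through the automaton.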
Finally, there is a third locutive level: the utterance, the atomic information piece of the discourse.

A satisfactory management of dialogue requires, in general, both semantic representation (the content of what has been expressed) and pragmatic information. The semantic structures used to link the Interface and Interaction Agents in the ADVICE project are based on Searle's speech acts [Searle, 69]. The currently identified set of speech acts is shown in Figure 1:

[Figure 1: Set of Speech Acts for ADVICE.
Courtesy acts — Salute: <c,a>, Farewell: <c,a>, Thank: <c,a>, Disannoy: <c,a>, Empathetic: <c,a>, Satisfy: <c,a>, Wish: <c,a>; [c]: conventional (formal/informal), [a]: allowable (open/close). Examples: "Nice to see you [name]": salute(i,c); "Excuse me...": disannoy(f,c); "Don't worry...": empathetic(i,c); "Excellent": satisfy(f,c).
Command: <t,s,c>; [s]: subject (user/system/...), [c]: content (values...). Examples: "Who are...?": request(data,identity,...); "Show me some saws": command(search,system,product).
Null speech — Null: <>. Example: "Well,...": null().]

2.2 Components of the Interaction

A distinction can be made between controlling the conversation (Dialogue Manager) and handling the details of the interaction (Session Model). Also, after the decision-taking process performed to identify the system's next move, it is necessary to generate it (Discourse Maker).

The Dialogue Manager controls the state and the intention (thread) of the interaction. Interventions in the dialogue are divided into 'discourse pieces' (subsets of the discourse with independent meaning), and each of them is represented through a collection of speech acts. Throughout an intervention, the thread or locutive line has to be adjusted and updated, and a 'transition' produced from the previous 'dialogue state' to the next one(s). Finally, there is another kind of information carried by the speech acts: the 'details' of the conversation (names, features, etc.).
This information is 'static' throughout the session and is stored in a 'Session Model'.

[Figure 1, continued:
Representative acts — Inform: <t,m,s,c>; [t]: type (confirmation/data/...), [m]: matter (approve/deny/identity/...), [s]: subject (product/user/system/...), [c]: content (...). Example: "I'm Al": inform(data,identity,user,Al).
Authoritative acts — Authorize: <m,a>; [m]: matter (start/offer/task/...), [a]: allowable (open/closed). Example: "Can I help you?": authorize(task,open).
Directive acts — Request: <t,m,s,c>; [t]: type (choice/data/comparison/...), [m]: matter (approve/deny/identity/...).]

[Fig. 2: Components of the ADVICE Interaction Agent — the Session Model, the Dialogue Manager (dialogue state and thread) and the Discourse Maker, linked by semantic structures; the Intelligent Agent supplies a configurable offer.]

The Dialogue Manager component has control of the conversation (the formal aspects through the dialogue state, and the intentional ones through the thread). After processing the information carried by the speech acts, the Discourse Maker should be able to construct the system response. At this point, some content will be required to fill the response. This content is obtained from the context. When 'new' information is required, the system asks the Intelligent Agent to produce it (using data from the thread and the context). The Intelligent Agent's outcome is a 'configurable offer', modelled as a decision-tree structure. The Discourse Maker receives the decision tree and extracts the information needed to reach a solution, inquiring the user for precise clarifications. Once the 'content filling' is completed, a semantic structure is obtained as a stream of chained speech acts. The Interface Agent translates these structures into sentences, icons or menus.

3. Processing example

User and system make discourses by turns (all discourses shown in the example are taken from the NL interpreter, just for simplicity). The utterances are interpreted and subsequent streams of 'speech acts' are generated.
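The speech-act structures of Figure 1 might be represented as simple typed records. A minimal sketch, with slot names taken from the figure; the record classes themselves are hypothetical, not the project's representation:

```python
# Speech acts of Figure 1 as named tuples; slot names follow the figure
# ([t]: type, [m]: matter, [s]: subject, [c]: content). The classes are
# illustrative, not the ADVICE implementation (which runs in Ciao Prolog).
from collections import namedtuple

Inform = namedtuple("Inform", "type matter subject content")
Request = namedtuple("Request", "type matter subject content")
Command = namedtuple("Command", "type subject content")
Salute = namedtuple("Salute", "conventional allowable")

# "I'm Al" -> inform(data, identity, user, Al)
act = Inform("data", "identity", "user", "Al")

# "Show me some saws" -> command(search, system, product)
cmd = Command("search", "system", "product")

# "Nice to see you [name]" -> salute(informal, closed)
greeting = Salute("informal", "closed")

print(act)
print(cmd)
```

A stream of chained speech acts, as exchanged between the agents, would then simply be a list of such records.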
In Figure 2 two ways of expressing the same meaning can be seen (except for the pet word, interpreted as a NULL speech act). For success in the communication, both entities should share a common discourse line. In the same way, there should be a third entity reconciling them both. This is the task of the 'thread joint', which manages the user's and the system's new and past threads.

[Figure 2: utterance interpretation. Through the Natural Language Interpreter, the utterance "Well, I want a pendulum ..." yields a NULL act ("Well,") plus COMMAND (matter: purchase, subject: system, object: product) and INFORM (type: data, matter: identity, subject: product, object: saw); in the equivalent GUI interaction, where the user clicks on an item from a set of accessories, an INFORM (type: data, matter: feature, subject: product) is produced directly.]

Figure 4 shows some discourses of a dialogue and the corresponding threads generated.

[Figure 4: intentions and threads.
turn / Discourse / Thread
S: Welcome to Tooltechnics, I'm the virtual sales assistant. —
U: Hi there. I'm John Smith. —
S: Hello Mr. Smith. Nice to meet you. What can I do for you? — Authorize (task)
U: Well, I want a circular saw. — Command (product)
S: What kind of saw do you want to purchase, a pendulum-cover saw or a plunge-cut saw? — Request (product)
U: What's the difference between them? — Request (data)
S: Well, you see... — Request (data) / Solve]

As for the intentional processing, it is unfortunately a bit more complex. A distinction must be made between the user's intention (or the system's image of it) and the system's monitoring of intentional reactions (that is, the intentions or illocutive goals generated by the system to succeed in the interaction). These reactions usually arise from some event in the processing (in the dialogue or in the search for intelligent content), so it is most appropriate to set up a dedicated entity to manage them.

Then, the Dialogue Manager converts them into steps, as shown in Figure 3. These steps are obtained thanks to some prediction of the feasible steps for the given circumstances, besides the input from the interface, of course.
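The two input routes of Figure 2, NL and GUI arriving at the same kind of speech-act stream, might be sketched as follows; the 'parsing' is a hard-coded toy for this one example, not the project's interpreter:

```python
# Toy illustration of Figure 2: an NL utterance and a GUI click carrying
# the same meaning both end up as speech-act tuples. The parsing is
# hard-coded for one example; the real NL interpreter is far richer.

def interpret_nl(utterance):
    """Map a natural-language utterance to a stream of speech acts."""
    acts = []
    text = utterance.lower()
    if text.startswith("well"):
        acts.append(("null",))                        # pet word -> NULL act
    if "i want" in text:
        acts.append(("command", "purchase", "system", "product"))
        acts.append(("inform", "data", "identity", "product", "saw"))
    return acts

def interpret_gui(clicked_item):
    """A GUI click on a product feature yields an inform act directly."""
    return [("inform", "data", "feature", "product", clicked_item)]

print(interpret_nl("Well, I want a circular saw"))
print(interpret_gui("circular saw"))
```

Whichever route produced them, the resulting acts feed the same Dialogue Manager, which is what makes the multimodal input coherent.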
They are fed into the dialogue state manager, which processes them and consequently forces the next state (FSA). This state defines the form of the system's next discourse.

[Figure 3: dialogue steps.
turn / Discourse / Pattern
S: Welcome to Tooltechnics, I'm the virtual sales assistant. — Greeting
U: Hi there. I'm John Smith. — Greeting
S: Hello Mr. Smith. Nice to meet you. What can I do for you? — Greeting / Authorize
U: Well, I want a circular saw. — Command
S: What kind of saw do you want to purchase, a pendulum-cover saw or a plunge-cut saw? — System requires Explanation
U: What's the difference between them? — Question
S: Well, you see... — System provides Explanation]

Figure 5 shows the full process at one moment of the dialogue. The user makes a discourse that implies an inform act. The information (circular saw; furthermore, a plunge-cut type one) is stored in the Session Model. The manager changes the state (from 'require clarification' back to 'solve task').

[Figure 5: interaction flow. The user's utterance "I think I need a plunge-cut saw" enters through the NL Interpreter of the Interface Agent. In the Interaction Agent, the Session Model keeps the recent discourse and the context (user: John Smith; toolname: circular; type: plunge-cut; add_data(product, type, "plunge-cut")). The Dialogue Manager (state: SolveTask) processes Inform<data, purchase, product, "plunge-cut saw">; the Thread Joint stack holds Request(product), Command(r_prod.), Request(data), SolveTask(prod), Request. The Discourse Maker proceeds: it solves the bifurcation, simplifies the tree, finds no single solution, and generates a new request for explanation: Satisfy<formal, closed> and Request<data, application, product> (a choice among two options, in this case). The NL Generator outputs "Excellent. What do you need it for?"]

The thread will close the system's 'asking about type', and then return to the previous one (the user's 'request product'). The Discourse Maker fails to construct a response with the final solution, because the tree of solutions still contains several of them.
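The bookkeeping of this walkthrough can be sketched with a thread stack and a solutions tree. A minimal sketch: thread names are taken from the example, but the tree contents and the code itself are invented for illustration:

```python
# Sketch of the Figure 5 walkthrough: the user's answer solves one
# bifurcation of the solutions tree; the 'asking about type' thread is
# closed; since several solutions remain, a new thread is opened and a
# new request generated. Tree contents and thread names are illustrative.

tree = {
    "feature": "type",
    "options": {
        "pendulum-cover": {"solution": "saw PS-200"},
        "plunge-cut": {                       # still ambiguous below here
            "feature": "application",
            "options": {
                "fine cuts": {"solution": "saw PC-300"},
                "rough cuts": {"solution": "saw PC-310"},
            },
        },
    },
}

threads = ["request product", "asking about type"]   # thread-joint stack

# User informs: type = plunge-cut -> solve the bifurcation, simplify tree.
tree = tree["options"]["plunge-cut"]
threads.pop()                           # close system's 'asking about type'

if "solution" in tree:
    response = ("inform", "data", "product", tree["solution"])
else:
    # Several solutions remain: open a new system thread and ask about
    # the next bifurcation (satisfy + request, as in the example).
    threads.append("request more data")
    response = [("satisfy", "formal", "closed"),
                ("request", "data", tree["feature"], "product")]

print(threads[-1])
print(response)
```

After the pop, the current thread is again the user's 'request product'; the push then records the system's new intention on top of it.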
Hence a new system thread springs up: 'request more data'. The Discourse Maker is now able to make the response: a question referring to the next bifurcation in the solutions tree. This response is made concrete as a stream of speech acts (satisfy + request) that will be interpreted by the Natural Language Generator. Finally, the 'action' taken by the system is used to update the state (again to 'require clarification').

4. Current Related Work

The approach that views the dialogue as a game, in which two players 'act' by turns seeking a goal, is founded on the regularity observed in conversations: similar goals in similar contexts tend to follow the same pattern (global organizing principle). Moreover, a discourse of a given kind (a 'dialogue act') should be answered with a certain kind of discourse (local organizing principle). For example, a question should be followed by an answer (adjacency pairs). Even when these predictions fail, divergences will be marked with some extra information (often prosodic). Summarizing, interlocutors tend to 'act' according to a set of shared scripts, better named 'games' [Levin, Moore, 77].

The Trains project [Allen et al., 94] pursues an English-speaking interface for solving a task (collaborating with the user) in a rigid domain. [Kowtko, Isard, 93] add a theory of intonational pragmatics in which an intonational tune may distinguish the sort of (functional) move being made in a conversational game. [Poesio, Mikheev, 98] show some predictive results on the sub-game nesting problem (based on a notion of game structure).

The plan-based approach rests on the fact that humans plan actions to achieve goals: to understand a user, the system has to infer those plans. That is, to answer the user's utterances it has to uncover and respond appropriately to the underlying plan; to help the user achieve his goals, it has to predict them and construct a suitable plan.
This approach is well suited to real dialogue, in which not every piece appears explicitly (or is perhaps not perceived). It also copes well with language eccentricities, such as metaphor, in which goals can be grasped although not explicitly expressed. Cutting dispensable phrases out can enhance conversation. Finally, if an emotional model is added [Breese and Ball, 98; Lisetti, 99], some additional goals might be added to the plan, so the system can react to the user's mood to make him comfortable.

The Longbow project, from the University of Pittsburgh, illustrates this sort of model. Longbow [Young, 96] is a hierarchical discourse planning system that combines decompositional reasoning (planning to perform an abstract action by performing that action's sub-steps) and causal reasoning (planning to perform an action by ensuring that the action's preconditions all hold before its execution) in the same representation. It is based on a domain-independent hierarchical AI planning algorithm named DPOCL (Decompositional, Partial-Order Causal Link planner) [Young et al., 94].

The plan-based modelling approach provides a generalization in which dialogue is handled as one particular act among other, even unconscious, communicative acts. Furthermore, dialogue can be seen as a special case of other rational non-communicative behaviour [Cohen, 97]. However, these models spend much effort in redundant act recognition, and the certainty available for each 'move' is very low, both for the analysis and for the response generation. A distinction must be made between task speech and dialogue-control speech; for this, a new approach has been presented: multilevel plan structures. The complexity, however, remains largely unplumbed and is sometimes undecidable; restricted planning may be reasonable in some cases. Goals do not explain every human act, because some acts are unmotivated (or have hidden, complex goals). Thus, a supplement to this model is needed.
[Breese and Ball, 98] explain that some motivations are better seen as a 'state' of the user, either long-term (personality) or short-term (emotion). Hence, the conversation is interpreted and constructed by a conversational model (interpreter plus generator), and conducted by an emotional (affective) conversational model: a model that answers 'what mood is the user in' (as well as 'what to say when the user is...') by choosing among semantically equivalent, but emotionally diverse, paraphrases. [Lisetti, 99] uses prototypes or schemata of emotional situations (a memory of emotional contexts), so that each emotion has a script containing its causal chain and its components. These scripts complement the conversation model by describing the user's state. In addition, they can also be used to dynamically instantiate motivational states for the computer entities interacting with the user, so that goals are provided for plan generation.

The joint-action model [Cohen, Levesque, 91] is based on the premise that a dialogue is fed by both talking entities, and that they have at least one common reference point, a commitment. These joints have to be kept 'alive' by both entities with an acceptable level of certainty and efficiency, so that they can refer to them. Therefore, the interlocutors need to ask for clarification to avoid the certainty getting too low, or to confirm and make reinforcement utterances to let the other know there is no need for redundant speech at that point.

Conclusion

We have presented the Interaction Agent, which enables an advanced multimedia user interface ensuring a high level of user-system interaction. The main advantage comes from the integration of advanced understanding and expression capabilities with the intelligent configuration of answers, which leads to the generation of the right information in the right way and at the right moment.
The role played by the Intelligent Agent in the underlying interaction processes, as well as the dialogue management, was illustrated through an example. The current prototype runs in Ciao Prolog [Bueno, 99] and Java on a PII processor machine.

References

Allen, J.F., Schubert, L.K., Ferguson, G.M., Heeman, P.A., Hwang, C.H., Kato, T., Light, M.N., Martin, N.G., Miller, B.W., Poesio, M., Traum, D.R. (1994). The TRAINS project: A case study in building a conversational planning agent. Technical Report 532, Dept. of Computer Science, University of Rochester.

Breese, J., Ball, G. (1998). Modeling Emotional State and Personality for Conversational Agents. Interactive and Mixed-Initiative Decision Theoretic Systems, Technical Report SS-98-03, AAAI Press.

Bueno, F., Cabeza, D., Carro, M., Hermenegildo, M., López, P., Puebla, G. (1999). The Ciao Prolog System: A Next Generation Logic Programming Environment. Reference Manual. The Ciao System Documentation Series, Technical Report CLIP 3/97.1, The CLIP Group, School of Computer Science, Technical University of Madrid.

Cohen, P.R., Levesque, H.J. (1991). Confirmation and Joint Action. Proceedings of the International Joint Conference on Artificial Intelligence.

Cohen, P.R. (1997). Dialogue Modeling. Survey of the State of the Art in Human Language Technology, 6, pp. 234-240.

Kowtko, J.C., Isard, S.D., Doherty, G.M. (1993). Conversational Games Within Dialogue. Research Paper 31, Human Communication Research Centre, Edinburgh.

Levin, J.A., Moore, J.A. (1977). Dialogue games: Metacommunication structures for natural language interaction. Cognitive Science, 1(4), 395-420.

Lisetti, C. (1999). A User Model of Emotion-Cognition. Proceedings of the Seventh International Conference on User Modeling.

Poesio, M., Mikheev, A. (1998). The Predictive Power of Game Structure in Dialogue Act Recognition: Experimental Results Using Maximum Entropy Estimation. Proceedings of ICSLP-98, November 1998.

Searle, J.R. (1969).
Speech Acts: An Essay in the Philosophy of Language. Cambridge University Press.

Young, R.M., Moore, J.D., Pollack, M.E. (1994). Towards a Principled Representation of Discourse Plans. Proceedings of the Sixteenth Conference of the Cognitive Science Society, Atlanta, GA.

Young, R.M. (1996). A Developer's Guide to the Longbow Discourse Planning System. University of Pittsburgh Intelligent Systems Program, Technical Report 94-4.