Dialog with Robots: Papers from the AAAI Fall Symposium (FS-10-05)
Framework of Communication Activation Robot
Participating in Multiparty Conversation
Yoichi Matsuyama, Hikaru Taniyama, Shinya Fujie and Tetsunori Kobayashi
Department of Computer Science and Engineering, Waseda University, Japan
3-4-1 Okubo, Room 55N-05-09, Shinjuku-ku, Tokyo 169-8555, Japan
matsuyama@pcl.cs.waseda.ac.jp
Abstract

We propose a framework for a robot to participate in and activate multiparty conversation. In multiparty conversation, the robot should select its behavior based on both linguistic information and the participation structure. In this paper, we focus on the multiparty conversation game "Nandoku," which is often played in elderly care facilities. The robot acts as one of the participants and tries to promote communication activeness. The framework handles the dialogue situation from three aspects: multiparty conversation, game progress and communication activation, and selects the robot's most effective behavior according to these three aspects.
1 Introduction

We propose a framework for a conversational robot that participates in multiparty conversation and activates it.

Embodiment is indispensable for conversation. In this paper, we define embodiment in conversation as existing in the conversational situation and possessing the capabilities to understand conversational partners' intentions and to express one's own intention to them with a physical body. Fujie et al. argue that the combination of linguistic and paralinguistic information can improve communication efficiency (Fujie, Fukushima, and Kobayashi 2004). Paralanguage consists of the physical signals that complement linguistic information, such as prosody, timing of utterance, direction of eye gaze, facial expression, gesture, and position.

In real life, most of our conversations at home and in the workplace involve more than two people. Recognition and expression of paralanguage are more complicated there than in the one-to-one situation. There are several studies on the behaviors of robots and character agents in multiparty conversation. Their goals are to recognize the current role of a robot and to select appropriate behavior to improve the quality of conversation. Such work is based on the psychological analysis of participation structure in multiparty conversation organized by Goffman (Goffman 1981) and Clark (Clark 1996). Matsusaka et al. (Matsusaka, Tojo, and Kobayashi 2003) presented the first case of an actual robot system that participates in multiparty conversation using estimation of participation structure based on these analyses. Mutlu et al. (Mutlu et al. 2009) showed that a robot can establish the participant roles of its conversational partners using eye-gaze cues. Bohus and Horvitz (Bohus and Horvitz 2009) considered a model of multiparty engagement, in which multiple people may enter and leave conversations and interact with the system, using paralinguistic information such as eye gaze, although they used a character agent rather than a physical robot. Thus, to participate in multiparty conversation, a robotic system should decide its behavior using the results of participation role estimation based on not only linguistic but also paralinguistic information. Breazeal et al. developed robots capable of expressing paralinguistic information (Breazeal et al. 2008). They are equipped with sufficient degrees of freedom to express such information; their MDS robot can express paralanguage for social interaction using its eyelids, eyebrows, mouth and other facial parts.

However, there are few case studies that apply both recognition and expression of paralanguage to multiparty conversation using a physical robot system. Therefore, in this paper, we propose a framework, comprising both hardware and software, for an embodied robot that can recognize and express paralanguage in order to participate in multiparty conversation.

In this paper, we focus on an entertainment game task played by elderly people and staff at an elderly care facility (Matsuyama et al. 2008). Various robots for elderly care have previously been produced. In the case of Ifbot (Kato et al. 2004), tens of thousands of utterance scripts were prepared for human-robot conversation. Some users reported that Ifbot can change relationships within families and relieve depressed moods when people confront the terminal stages of life. The seal robot Paro (Wada and Shibata 2006) was also designed for elderly care. Its developers reported not only psychological and physiological effects, such as relaxation, increased motivation to communicate and improved vital signs, but also social effects, such as an increased frequency of communication among patients and nurses.

In particular, we focus on the "Nandoku" game as a task for activating multiparty conversation in a daycare center. Nandoku is a recreational game that can be described as a multiparty conversation between a master of ceremonies (MC) and several panelists. In a Nandoku game, the MC writes down a kanji question whose pronunciation is difficult even for Japanese people, and the panelists try to figure out its pronunciation.
We consider a conversational robot system that participates in this kind of multiparty conversation and activates it, using its capabilities for recognizing and expressing paralanguage.

In this task, the robot should share the conversational situation and obey the rules of both the multiparty conversation and the game task. Moreover, it should select behavior that promotes communication activeness. In this paper, we consider a framework for evaluating and deciding the behaviors of a conversational robot from three points of view: multiparty conversation, game progress and conversation activation.

In the following section, we describe a model for participation role estimation in multiparty conversation. In the third section, we describe a methodology to activate multiparty conversation. In the fourth section, we describe our new robotic hardware, called "SCHEMA," and its software architecture. In the fifth section, we discuss this framework, and in the final section we conclude the paper.
2 Participating in Multiparty Dialogue

2.1 Participation Structure

Goffman (Goffman 1981) and Clark (Clark 1996) describe the participants' roles in the participation structure of multiparty conversation. The roles are divided into those people who are allowed to participate in the conversation and those who are not. The former is further divided into three roles: the Speaker, who is the primary speaker; the Addressee, who is the primary listener; and the Side-Participant, who is a listener that is not addressed.

Participants progress the conversation by changing their roles dynamically. Not only the speaker but also the addressee and the side-participants must behave according to their own roles; otherwise, the conversation cannot proceed properly.

In multiparty conversation research, next-speaker estimation is the most important problem to solve. In addition, a robot needs policies for selecting its behavior depending on its role in order to participate in multiparty conversation.

Clark (Clark 1996) observed human-human interaction in multiparty conversation and indicated that conversation has the objective structure shown in Fig. 1. Each participant is assigned one of the following roles.

• Participant in conversation
  – Speaker (SPK)
  – Addressee (ADR)
  – Side-Participant (SPT)
• Not a participant but a listener
  – Bystander (BYS)
  – Eavesdropper (EAD)

Participants change their roles dynamically to progress the multiparty conversation smoothly.

Figure 1: Participation structure (Clark 1996)
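This role inventory maps directly onto a small data type. The following is a minimal Python sketch, not code from the paper (the names are ours), of how a system might represent Clark's roles and a participation structure as a role assignment per participant.

    from enum import Enum

    class Role(Enum):
        """Participation roles in Clark's structure (labels follow the list above)."""
        SPEAKER = "SPK"           # primary speaker
        ADDRESSEE = "ADR"         # primary listener
        SIDE_PARTICIPANT = "SPT"  # ratified listener who is not addressed
        BYSTANDER = "BYS"         # present but not a participant
        EAVESDROPPER = "EAD"      # listening without being ratified

    # A participation structure is then simply a role assignment per participant.
    participation = {"MC": Role.SPEAKER, "Panelist A": Role.ADDRESSEE,
                     "Panelist B": Role.SIDE_PARTICIPANT, "Robot": Role.SIDE_PARTICIPANT}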
2.2 Request-Answer Model

In this section, we propose a model of the effects of participants' behaviors, including the robot's, within the participation structure.

Some behaviors of participants have functions that change the states of other participants. For instance, when a speaker gazes at a side-participant, this has the ability to change the role of the side-participant into addressee. We use the term "ability to change" here because the speaker needs the side-participant's acceptance to change the role.

Participants request each other to change the roles of other participants, and they repeatedly accept or reject these requests in order to change their own and others' roles. We propose the description methodology shown in Fig. 2. Examples are as follows.

Figure 2: Function of participation role

Requests are categorized as assignments to oneself ("Which role do I want to take?") and assignments to other participants ("Which other participant's role do I want to change?"). For instance, a behavior maintaining one's turn can be regarded as "the speaker's request to assign itself (the speaker) to the speaker role" (Fig. 3(a)). An addressing behavior can be regarded as "the speaker's request to assign a side-participant to the addressee role" (Fig. 3(b)).

Figure 3: Example of Request. (a) the speaker's request to assign itself (the speaker) to the speaker role, (b) the speaker's request to assign a side-participant to the addressee role

Answers are categorized as acceptance and rejection. A behavior accepting a turn can be regarded as "the addressee's acceptance of the speaker's request to assign the addressee to the speaker role" (Fig. 4(a)). In contrast, a behavior rejecting a turn can be regarded as "the addressee's rejection of the speaker's request to assign the addressee to the speaker role" (Fig. 4(b)).

Figure 4: Example of Answer. (a) the addressee's acceptance of the speaker's request to assign the addressee to the speaker role, (b) the addressee's rejection of the speaker's request to assign the addressee to the speaker role
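As a rough illustration of the Request-Answer model, the sketch below (our own simplification, not the paper's implementation) encodes a request as "requester asks that target take a given role" and an answer as acceptance or rejection; the role labels are the abbreviations from Section 2.1.

    from dataclasses import dataclass
    from enum import Enum

    class Answer(Enum):
        ACCEPT = "accept"
        REJECT = "reject"

    @dataclass
    class RoleRequest:
        """A request as in Fig. 2: `requester` asks that `target` take role `role`."""
        requester: str  # participant issuing the request
        target: str     # participant whose role should change (may equal requester)
        role: str       # requested role label, e.g. "SPK" or "ADR"

    # Turn-maintaining behavior: the speaker requests to keep the speaker role itself.
    keep_turn = RoleRequest(requester="Robot", target="Robot", role="SPK")
    # Addressing behavior: the speaker requests that side-participant A become addressee.
    address_a = RoleRequest(requester="Robot", target="Panelist A", role="ADR")
    # Accepting a turn corresponds to answering such a request with ACCEPT.
    reply = Answer.ACCEPT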
3 Communication Activation Task

3.1 NANDOKU Game

The Request-Answer model of the previous section describes behaviors from the point of view of the participation structure of multiparty conversation. For a robot to progress a particular task while participating in multiparty conversation, we also need to describe the functions of behaviors from other points of view.

In this paper, we focus on participation in the Nandoku game and the activation of conversation among the panelists.
In a Nandoku game, the MC writes a kanji question, whose pronunciation is difficult even for Japanese people, on a whiteboard or projects it on a screen, and the panelists then answer its pronunciation. The situation is shown in Fig. 5. The MC poses the questions, encourages panelists to answer, evaluates the answers and offers information related to the questions. The panelists can answer when they are asked by the MC or at any point in time.

Figure 5: Situation of the NANDOKU game

This kind of game is played as recreation in elderly care facilities to activate communication and stimulate the participants' minds. The MC is usually a care staff member at the facility, and the elderly people are the panelists. In this research, a robot participates in the game as one of the panelists in order to activate communication. For this reason, we should also consider the functions of the robot's behavior from two more points of view: "progressing the Nandoku game" and "activating communication."
3.2 Functions of Behaviors in the Quiz Game Task

Panelists' behaviors in the Nandoku game consist not only of answering questions but also of encouraging other panelists to answer. Therefore, we define the following four functions of the robot's behavior from the point of view of progressing the game.

1. ANSWER ... Function to offer an answer
2. ASK_HINT ... Function to ask the MC for a hint
3. LET_ANSWER ... Function to encourage other panelists to answer
4. INFORM ... Function to offer trivia related to the current question
3.3 Functions of Behaviors in Communication Activation

In the Nandoku game, communication activation succeeds when other panelists are given chances to answer. Such situations can be created not only by directly encouraging someone to answer, but also by answering in a way that gives a hint to the other panelists, or by asking the MC for a hint. They can also be created when the robot reacts to the MC's utterances and thereby attracts the attention of the other panelists.

In addition, when either the MC or the robot offers interesting information, the situation should become activated. However, because this type of function is already covered by the functions defined from the point of view of progressing the game, we can reuse those functions here.

For these reasons, we define the following four functions from the point of view of activating communication.

1. REACT_TO_ALMOST ... Function to react to the MC's utterance "Almost"
2. REACT_TO_CORRECT ... Function to react to the MC's utterance "Correct"
3. HESITATE ... Function to hesitate to answer (to say something when the MC encourages the robot to answer but it has no idea)
4. MUTTER ... Function to mutter (to say something that suggests a hint)

Functions 1 and 2 are functions of reactive behaviors responding to the MC's specific actions. Function 3 is a function without a substantial utterance, outside the ANSWER and INFORM functions of the previous section. Function 4 is a function of behaviors that say something independent of the game progress but dependent on the current question.
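The two function inventories of Sections 3.2 and 3.3 can be restated directly as enumerations. The sketch below is only such a restatement in code (the class names are ours, not the paper's); it is reused in the behavior dictionary sketch in Section 4.3.

    from enum import Enum, auto

    class GameFunction(Enum):
        """Functions from the game-progress viewpoint (Section 3.2)."""
        ANSWER = auto()       # offer an answer
        ASK_HINT = auto()     # ask the MC for a hint
        LET_ANSWER = auto()   # encourage another panelist to answer
        INFORM = auto()       # offer trivia related to the question

    class ActivationFunction(Enum):
        """Functions from the communication-activation viewpoint (Section 3.3)."""
        REACT_TO_ALMOST = auto()   # react to the MC's utterance "Almost"
        REACT_TO_CORRECT = auto()  # react to the MC's utterance "Correct"
        HESITATE = auto()          # hesitate when asked but having no answer
        MUTTER = auto()            # mutter something that hints at the answer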
4 Framework

4.1 Hardware Design

To implement the proposed methodology, we built a conversational robot called "SCHEMA" (Matsuyama et al. 2009), which has the necessary and sufficient degrees of freedom for multiparty conversation, as shown in Fig. 6.

Figure 6: Conversational robot SCHEMA

It is approximately 1.2 m tall, which is the eye level of an adult male sitting on a chair. It has 10 degrees of freedom for the left and right eyebrows, the eyelids, the left and right eyes (roll and pitch) and the neck (pitch and yaw). It can express anxiousness and surprise using its eyelids, and it controls its eye gaze using the eyes, the neck and an autonomous turret. It also has 6 degrees of freedom in each arm, which can express gestures. One degree of freedom is assigned to the mouth to indicate explicitly whether the robot is speaking or not.

A computer inside the belly controls the robot's actions, and an external computer sends commands to execute various behaviors through a WiFi network.
4.2 Software Modules

The system overview is shown in Fig. 7. The system is divided into the input group, the behavior selection group and the output group. The details of the behavior selection group are described in the following section.

Figure 7: System architecture

The input group consists of a mobile device and speech recognizers, which serve as sensors for understanding the environment. The mobile device is used for the MC's question selection. When a question is selected on the device, it is projected on a screen and the system is notified of the current question through a WiFi network. Headset microphones for each participant, including the MC, are used as input to the speech recognizers.

The output group consists of an action player and a speech synthesizer. The action player executes the robot's physical actions and the speech synthesizer outputs speech; the two are synchronized with each other.
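As a minimal sketch of this three-group split (the queue names and event format are our assumptions, not the paper's implementation), the input group can feed events to the behavior selection group, which emits commands to the output group:

    import queue

    # Event buses between the groups: sensors feed behavior selection, which feeds output.
    sensor_events = queue.Queue()    # from the mobile device and the speech recognizers
    output_commands = queue.Queue()  # to the action player and the speech synthesizer

    def behavior_selection_step(select_behavior):
        """One cycle: consume a sensor event and, if a behavior is selected, emit it."""
        event = sensor_events.get()        # e.g. {"source": "asr", "speaker": "MC", "text": "..."}
        behavior = select_behavior(event)  # evaluation process described in Section 4.3
        if behavior is not None:
            output_commands.put(behavior)  # utterance and gesture are played back in sync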
4.3 Behavior Selection

The behavior selection group selects, from the behaviors in the behavior dictionary, the behavior with the highest value for the current situation.

Behavior Dictionary The behavior dictionary contains various predefined behaviors. For each behavior, its utterances, gestures and the functions used for evaluation are defined. As described in 3.1, the functions of behaviors are interpreted differently from different points of view. In this research, because we focus on activating the Nandoku game, we deal with functions from the multiparty conversation viewpoint, the game progress viewpoint and the conversation activation viewpoint.

The functions of behaviors in multiparty conversation follow the Request-Answer model described in 2.2. For instance, answering something is a behavior that has the function "somebody's request to assign himself/herself to the speaker role."

There are two important points here. First, a behavior often has multiple functions. For instance, the answering behavior also has the function "acceptance of somebody's request to assign himself/herself to the speaker role" and the function "rejection of somebody's request to assign another participant to the speaker role."

Second, because these roles and the related requests change from moment to moment, the system has to evaluate the specific situation before the functions are fully determined. For instance, the function "acceptance of the speaker's request" of a behavior makes no sense when there is no such request.

The functions of behaviors in progressing the game are the four types described in 3.2. Some behaviors have one of these functions, and some have none; behaviors without these functions are independent of the progress of the game. For instance, mutter behaviors can sometimes accidentally serve as hints to other panelists, but they are independent of the game progress.

The functions of behaviors in communication activation are the four types described in 3.3. As with the game progress functions, some behaviors have one of these functions, and others have none.
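A behavior dictionary entry therefore bundles an utterance, a gesture and the functions attributed to the behavior in each viewpoint. The sketch below is a guess at such a structure (the field names and example entries are ours) and reuses the GameFunction enum from the sketch after Section 3.3:

    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class Behavior:
        """One behavior dictionary entry (illustrative field names)."""
        name: str
        utterance: str                  # text sent to the speech synthesizer
        gesture: Optional[str] = None   # command for the action player, if any
        conversation_functions: List[str] = field(default_factory=list)  # Request-Answer functions
        game_functions: List[GameFunction] = field(default_factory=list)
        activation_functions: List[ActivationFunction] = field(default_factory=list)

    behavior_dictionary = [
        Behavior(name="give_answer",
                 utterance="I think it is read this way.",
                 gesture="raise_hand",
                 conversation_functions=["request: assign self to SPK",
                                         "accept: request to assign self to SPK"],
                 game_functions=[GameFunction.ANSWER]),
        Behavior(name="encourage_panelist",
                 utterance="What do you think?",
                 gesture="gaze_at_target",
                 conversation_functions=["request: assign target to ADR"],
                 game_functions=[GameFunction.LET_ANSWER]),
    ]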
Situation Understanding We also deal with situation understanding from the multiparty conversation viewpoint, the game progress viewpoint and the conversation activation viewpoint.

The situation from the multiparty conversation viewpoint can be interpreted as each participant's role in the participation structure, as described in Section 2. The multiparty conversation state manager estimates this situation from the states of the speech recognizers (which identify the current speaker), the state of the robot's speech synthesizer and the history of the robot's behaviors. As a general rule, a participant whose speech recognizer is active is estimated to be the current speaker, and when the result of a speech recognizer includes a participant's name, that participant is estimated to have been requested as the addressee. When several participants speak at the same time, the state manager estimates the next speaker according to predefined rules; in particular, the MC has first priority as the speaker while the MC is speaking.

The situation in the Nandoku game is divided into a long-term and a short-term situation. The long-term situation is the status that changes over a long period of the game task, for instance the current question and the current game state (the pre-answering or post-answering state). The long-term situation changes when the MC selects a question with the mobile device or when the speech recognizer recognizes one of the MC's specific keywords, such as "Correct."

The short-term situation is the status that changes over a short period within a game state, for instance when the MC asks someone to answer or evaluates someone's answer. The short-term situation changes when the speech recognizer recognizes one of the MC's specific keywords, such as a participant's name or "Almost."
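A state manager for the game situation might therefore hold the long-term state (current question and answering phase) plus a short-term event, and fire triggers on the keyword events described above. The following is a rough sketch with our own naming and simplified keyword handling, not the paper's implementation:

    from dataclasses import dataclass
    from typing import Callable, List, Optional

    @dataclass
    class GameState:
        current_question: Optional[str] = None
        phase: str = "pre-answering"            # long-term: "pre-answering" or "post-answering"
        short_term_event: Optional[str] = None  # e.g. "Almost" or "MC asked Panelist A"

    class GameStateManager:
        def __init__(self):
            self.state = GameState()
            self.triggers: List[Callable[[GameState], None]] = []

        def on_question_selected(self, question):
            """Long-term change: the MC picks a question on the mobile device."""
            self.state = GameState(current_question=question)
            self._fire()

        def on_mc_keyword(self, keyword):
            """A recognized keyword from the MC updates the long- or short-term situation."""
            if keyword == "Correct":                 # long-term change
                self.state.phase = "post-answering"
            else:                                    # e.g. "Almost" or a participant's name
                self.state.short_term_event = keyword
            self._fire()

        def _fire(self):
            for trigger in self.triggers:
                trigger(self.state)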
The situation in communication activation describes how activated each panelist is. Specifically, each panelist has an activeness value from 0 to 100. We assume that a panelist who answers more frequently is more activated. The activation state manager increases the activeness of a panelist who becomes the speaker in the multiparty conversation viewpoint or who is asked to answer, and it gently decreases activeness over time to avoid an unreasonable accumulation of activeness.
Figure 8: Behavior Evaluation.
Behavior Evaluation The behavior evaluation process starts when one of the state managers generates a trigger. It evaluates all behaviors in the behavior dictionary and transfers the behavior with the highest value to the output group.

Each state manager generates triggers according to predefined rules. The multiparty conversation state manager generates triggers when participation roles change and when participants make requests or answers. The game state manager generates a trigger when the game situation changes. The activation state manager generates a trigger when a panelist's communication activeness falls below a threshold.

The flow of behavior evaluation is shown in Fig. 8. Behavior evaluation starts by computing the functions of the behaviors. Although most functions of behaviors are predefined statically, as described in 4.3, the functions for the multiparty conversation viewpoint are determined from the current situation using partially predefined functions. After this understanding process, the system can evaluate the value of each behavior using its functions and the situation in each viewpoint.

The evaluation process is carried out by multiple evaluators. Each evaluator computes a value using optional information from the state managers and the functions defined for each behavior, and the evaluation value of a behavior is the sum of the weighted values.

An evaluator returns either a simple true/false value or a real-number value. In the true/false case, the evaluator applies rules such as "from the viewpoint of game progress, the current situation is that the MC asks the robot to answer; therefore a behavior with the ANSWER function is appropriate."

In the real-number case, the evaluation value changes continuously; this case is used for communication activation. For instance, the behavior "ask panelist A to answer" has little effect when A's activeness is already high, but when A's activeness is low the behavior should be effective. The value is therefore defined as

e = \begin{cases} \dfrac{a_{MAX} - a_A}{a_{MAX}} & \text{if } a_A < a_{MAX} \\ 0 & \text{otherwise} \end{cases}

where a_A is A's activeness and a_{MAX} is a predefined activeness level above which asking A to answer is no longer expected to be effective.

The final evaluation value of each behavior is the weighted sum of the values of all evaluators.

An example of an evaluator (a true/false evaluator) is shown in Table 1. This evaluator follows (1) the multiparty conversation viewpoint, in which the robot should not assign participants to the bystander role, and (2) the game progress viewpoint, in which the robot should reply to the MC's requests and should not disturb the game progress; it is independent of the activation viewpoint. It is only one of the multiple evaluators in this framework, and system designers can easily add their own evaluators to improve the robot's behaviors.
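To make the weighted-sum evaluation concrete, the sketch below combines one true/false evaluator and one real-number evaluator implementing the formula above. Behaviors are simplified to dictionaries, and the weights and a_MAX value are placeholders rather than the paper's settings:

    def game_progress_evaluator(behavior, situation):
        """True/false rule: reward ANSWER when the MC asks the robot to answer."""
        asked = (situation.get("phase") == "pre-answering"
                 and situation.get("mc_request") == "robot_answer")
        return 1.0 if asked and "ANSWER" in behavior.get("game_functions", []) else 0.0

    def activation_evaluator(behavior, activeness, a_max=80.0):
        """Real-number rule: value of asking panelist `target` to answer."""
        if "LET_ANSWER" not in behavior.get("game_functions", []):
            return 0.0
        a = activeness.get(behavior.get("target"), 0.0)
        return (a_max - a) / a_max if a < a_max else 0.0

    def evaluate(behavior, situation, activeness, w_game=100.0, w_act=100.0):
        """Final value of a behavior: the weighted sum over all evaluators."""
        return (w_game * game_progress_evaluator(behavior, situation)
                + w_act * activation_evaluator(behavior, activeness))

    # The behavior with the highest evaluate(...) value is passed to the output group.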
5 Discussion

This robotic system understands situations and evaluates behaviors using both linguistic and paralinguistic information in order to participate in and activate multiparty conversation. In this paper, we have proposed a methodology for understanding the situation from three viewpoints and evaluating behaviors with multiple evaluators.
Table 1: Example of an evaluator that computes true/false values

Point of view            | Weight  | Situation                                               | Function
Multiparty Conversation  | −100.0  | * (any)                                                 | Request to assign another side-participant to bystander
Multiparty Conversation  | +100.0  | Optional request                                        | Acceptance of the request
Nandoku Game             | +100.0  | Pre-answering state; MC requests the robot to answer    | ANSWER
Nandoku Game             | −100.0  | Pre-answering state; another panelist is answering      | ANSWER
Activation               | +100.0  | Pre-answering state; MC's utterance "Almost"            | REACT_TO_ALMOST
Activation               | +100.0  | Pre-answering state; MC's utterance "Correct"           | REACT_TO_CORRECT
Currently, the system recognizes the MC's speech as linguistic information and the duration of all participants' utterances as paralinguistic information.

However, this paralinguistic information is not sufficient for the robot to participate in multiparty conversation. The next-speaker estimation problem requires not only speech information but also visual information, such as the participants' eye-gaze directions. We will consider adding both visual and auditory information to this framework for understanding paralanguage.

System designers can add various evaluators to change the robot's behavior. However, the weight of each evaluator is currently predefined. We will consider learning mechanisms for updating the weights and the evaluation methods of the evaluators. In addition, the current system can only respond passively to the participants' auditory information. For instance, after the robot asks one of the panelists to answer, it cannot expand the conversation. Therefore, we will consider a planning methodology for expanding the conversation depending on the current topic or question in the game.
6 Conclusion

We have proposed an integrated framework for a robot that participates in and activates multiparty conversation using paralinguistic information. The system understands the situation and evaluates the functions of behaviors from three viewpoints. In order to further activate multiparty conversation, we will consider a learning mechanism that updates the multiple evaluators automatically according to the situation, as well as a mechanism for expanding the topic.

7 Acknowledgments

TOSHIBA Corporation provided the speech synthesizer engine customized for our spoken dialogue system. We also wish to thank the staff of the NPO Community Care Link Tokyo Care-town Kodaira Day Care Service Center.

References

Bohus, D., and Horvitz, E. 2009. Models for multiparty engagement in open-world dialog. In Proceedings of SIGDIAL 2009, 225–234.

Breazeal, C.; Grupen, R.; Deegan, R.; Weber, J.; and Narendran, K. 2008. Mobile, dexterous, social robots for mobile manipulation and human-robot interaction. In International Conference on Computer Graphics and Interactive Techniques, ACM SIGGRAPH 2008, New Tech Demos.

Clark, H. 1996. Using Language. Cambridge, UK: Cambridge University Press.

Fujie, S.; Fukushima, K.; and Kobayashi, T. 2004. A conversation robot with back-channel feedback function based on linguistic and non-linguistic information. In Proc. of 2nd Intl. Conf. on Autonomous Robots and Agents, ICARA 2004, 379–384.

Goffman, E. 1981. Forms of Talk. University of Pennsylvania Press.

Kato, S.; Ohshiro, S.; Itoh, H.; and Kimura, K. 2004. Development of a communication robot Ifbot. In Proceedings of the 2004 IEEE International Conference on Robotics and Automation (ICRA), 697–702.

Matsusaka, Y.; Tojo, T.; and Kobayashi, T. 2003. Conversation robot participating in group conversation. IEICE Transactions on Information and Systems E86-D(1):26–36.

Matsuyama, Y.; Taniyama, H.; Fujie, S.; and Kobayashi, T. 2008. Designing communication activation system in group communication. In Proc. Humanoids 2008, 629–634.

Matsuyama, Y.; Taniyama, H.; Hosoya, K.; Tsuboi, H.; Fujie, S.; and Kobayashi, T. 2009. SCHEMA: multi-party interaction-oriented humanoid robot. In International Conference on Computer Graphics and Interactive Techniques, ACM SIGGRAPH ASIA 2009 Emerging Technologies.

Mutlu, B.; Shiwa, T.; Kanda, T.; Ishiguro, H.; and Hagita, N. 2009. Footing in human-robot conversations: How robots might shape participant roles using gaze cues. In Proceedings of Human-Robot Interaction 2009, 61–68.

Wada, K., and Shibata, T. 2006. Robot therapy in a care house - its sociopsychological and physiological effects on the residents. In Proceedings of the 2006 IEEE International Conference on Robotics and Automation (ICRA), 3966–3971.