Control of facial expressions of the humanoid robot head ROMAN

Karsten Berns
Robotics Research Lab, Department of Computer Science
University of Kaiserslautern, Germany
Email: berns@informatik.uni-kl.de

Jochen Hirth
Robotics Research Lab, Department of Computer Science
University of Kaiserslautern, Germany
Email: j_hirth@informatik.uni-kl.de
Abstract— For humanoid robots which are able to assist humans in their daily life, the capability for adequate interaction with human operators is a key feature. If one considers that more than 60% of human communication is conducted non-verbally (by using facial expressions and gestures), an important research topic is how interfaces for this non-verbal communication can be developed. To achieve this goal, several robotic heads have been designed. However, it remains unclear what exactly such a head should look like and what skills it should have to be able to interact properly with humans. This paper describes an approach that aims at answering some of these design questions. A behavior-based control to realize facial expressions, which is a basic ability needed for interaction with humans, is presented. Furthermore, an experimental evaluation in which the generated facial expressions had to be identified by human subjects is described. Additionally, the mechatronic design of the head and the accompanying neck joint are given.
Index Terms— humanoid robot head, facial expressions, mechanical design, behavior based control
I. INTRODUCTION
Worldwide, several research projects focus on the development of humanoid robots. Especially for the head design there is an ongoing discussion whether it should look like a human head or whether a more technically optimized head construction [1], [2] should be developed. The advantage of a technical head is that there are no restrictions on design parameters like head size or shape, which reduces the effort for the mechanical construction. On the other hand, if realistic facial expressions are to be used to support communication between a robot and a person, human likeness could increase the performance of the system. The aim of the humanoid head project of the University of Kaiserslautern is to develop a very complex robot head that is able to simulate the facial expressions of humans while perceiving its environment with a sensor system (stereo-camera system, artificial nose, several microphones, ...) similar to the senses of a human. In addition, the robot head should look like a human head in order to examine whether its performance in non-verbal communication is higher compared to technical heads. In the following, a behavior-based control architecture for realizing facial expressions is introduced. Starting from the state of research, new goals for the realization of facial expressions are defined. Second, the mechatronic system of ROMAN, including a neck and an eye construction, is described. Then the behavior-based software architecture of ROMAN is presented. Based on this control approach an experimental evaluation is realized, in which the facial expressions of ROMAN should be detected. The results of this evaluation are presented at the end of the paper.
Fig. 1. The humanoid robot head "ROMAN" (ROMAN = RObot huMan interAction machiNe) of the University of Kaiserslautern.
II. STATE OF RESEARCH
Several projects worldwide focus on the implementation of facial expressions on different types of heads (e.g. see [3]). Certainly, Kismet is one of the most advanced projects in the area of human-robot interaction [4], [5], [2]. Kismet is able to show different types of emotions by using facial expressions. The main disadvantage of the selected approach is the way a facial expression is activated. Each facial-expression behavior is related to a so-called "releaser". This "releaser" calculates the activation of the corresponding facial-expression behavior. The behavior with the highest activation determines the resulting facial expression of the robot; that means the so-called "winner-takes-all" concept is used. In order to show more than a certain number of predefined facial expressions, it is necessary to fuse the single facial expressions. With this fusion it is also possible to show graduated facial expressions and to get smooth transitions between the different expressions. A fusion of the determined control parameters is also necessary with regard to the use of speech and the corresponding movements of the mouth. Otherwise the robot would change its facial expression when it starts to talk, or new specific behaviors would have to be implemented for the combination of speech and facial expressions.
Another project concerning facial expressions is WE-4RII of the Takanishi Lab at Waseda University [1], [6]. For the realization of facial expressions a 3-dimensional emotion map is used. In this map, fixed areas are defined for 6 emotions. If the input value is located in the area of a certain emotion, the corresponding facial expression is presented by the robot. This activation system is the main disadvantage of this approach, because it only allows a predefined number of facial expressions to be shown. A better approach should be able to show more than a predefined number of facial expressions. This could be realized with a mixture of the different facial expressions and with a graduated activation of the different expressions.
The Alpha project of the University of Freiburg [7], [8] realizes an approach that uses the fusion of the different facial expressions. A disadvantage of this approach, however, is that the head of Alpha only has a few degrees of freedom to show facial expressions.
All these considerations lead to the following questions:
• What should the architecture of such a behavior-based control for facial expressions look like?
• How should the activation of such a behavior work?
• How should the fusion of the facial expressions be calculated?
• How can interference between different facial expressions be prevented?
III. MECHATRONICS OF ROMAN
Fig. 2. Mechanical construction of the humanoid robot head.
Mechanics - The mechanics of the head consist of a basic unit (cranial bone) including the lower jaw, the neck, and the motor unit for the artificial eyes. Apart from the eye construction, which is currently being built in our mechanical workshop, all mechatronic components are installed in the head. In the basic unit, 8 metal plates, which can be moved via wires, are glued onto the silicone skin at the positions of Ekman's action units. The plate areas as well as their fixing positions on the skin and the directions of their movement were optimized in a simulation system according to the basic emotions which should be expressed. As actuators, 10 servo motors are used to pull and push the wires. Additionally, a servo motor is used to raise and lower the lower jaw.
For the neck construction, a concept was selected in which motions are realized by a kinematic chain with 4 DOF. For the design of the neck, basic characteristics of the geometry, kinematics and dynamics of a human neck were considered. From the analysis of the human neck, a ball joint could be selected for the construction of the 3 DOF for basic motion. Unfortunately, it is very hard to design an appropriate driving system for such a solution. Therefore it was decided to realize the kinematic functions of the human neck with a serial chain similar to a Cardan joint. The first degree of freedom is the rotation around the vertical axis; the range of this rotation for the artificial neck was chosen as ±60°. The second degree of freedom is the inclination of the neck around the horizontal axis in the side plane (range of motion ±30°). The third degree of freedom is the inclination of the neck in the frontal plane; it rotates around an axis which moves according to the second degree of freedom (range of motion ±30°). In addition, a 4th joint is used for nodding the head (range of motion ±40°).
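As a small illustration of these ranges, the following sketch clamps commanded neck angles to the limits listed above. The joint names and the degree-based interface are assumptions for illustration, not the actual ROMAN control interface.

# Joint limits of the 4-DOF neck in degrees (taken from the text above).
NECK_LIMITS = {
    "pan":          (-60.0, 60.0),   # rotation around the vertical axis
    "side_tilt":    (-30.0, 30.0),   # inclination in the side plane
    "frontal_tilt": (-30.0, 30.0),   # inclination in the frontal plane
    "nod":          (-40.0, 40.0),   # 4th joint used for nodding
}

def clamp_neck_command(command):
    """Limit a commanded neck pose (degrees) to the mechanical ranges."""
    return {joint: max(lo, min(hi, command.get(joint, 0.0)))
            for joint, (lo, hi) in NECK_LIMITS.items()}

print(clamp_neck_command({"pan": 75.0, "nod": -10.0}))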
Sensor system - Besides the encoders fixed on the DC motors for the neck and the eye movements, an inertial system will be integrated in the head which measures the angular velocity and the acceleration in all 3 DOF. This gives an estimation of the pose of the head. Two microphones (fixed in the ears) and a loudspeaker are also included in the head. The main sensor system for the interaction with humans is the stereo-vision system, which consists of two Dragonfly cameras. Several experiments, such as the detection of a human head, have already been performed with it (see [9]). At the moment, the integration of the stereo camera system into the eye design of the head is under development.
Control architecture - The control of the servo and DC motors as well as the determination of the pose from the inertial system is done with a DSP (Motorola 56F803) connected to a CPLD (Altera EPM 7128). In total, 5 of these computing units are installed in the head: one for the inertial system, one for the stepping motors of the eyes, two for the 4 DC motors of the neck and one for the 11 servo motors which move the skin. These computing units are connected via CAN bus to an embedded PC. The two microphones and the loudspeaker are connected to the sound card of the embedded PC. The cameras, which are included in the eye construction, use the FireWire (IEEE 1394) input channel of the embedded PC (see figure 3).
The calculation of the movements of the different facial expressions is done on a Linux PC. The behavior-based control is implemented with the help of the Modular Controller Architecture (MCA). MCA is a modular, network-transparent and real-time capable C/C++ framework for controlling robots (see [10] and [11] for details). MCA is conceptually based on modules and edges between them. Modules may be organised in module groups, which allows the implementation of hierarchical structures.
Fig. 4. A single behavior node: a = the activity, r = the target rating, ι = the activation, ~e = the input, i = the inhibition, F(~e, ι, i) = the transfer function and ~u = the output calculated by the transfer function.
Fig. 3. Computer architecture of ROMAN.
IV. CONCEPT FOR THE BEHAVIOR-BASED CONTROL OF FACIAL EXPRESSIONS
In the following, our control approach is presented. To reach the above-mentioned targets for the control concept, a behavior-based approach was selected. The design of a single behavior node is shown in figure 4. The basic concept of our behavior nodes is presented in [12]; a minimal sketch of such a node is given below.
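To make the node structure of figure 4 concrete, the following sketch shows how such a behavior node could be organized. It is an illustration only, not the actual MCA implementation used on ROMAN; in particular, the formulas for the activity a and the target rating r are placeholder assumptions.

from typing import Callable, Sequence

class BehaviorNode:
    """Illustrative behavior node (cf. figure 4); not the actual MCA code."""

    def __init__(self, transfer: Callable[[Sequence[float], float, float], list]):
        self.transfer = transfer                    # transfer function F(~e, iota, i)

    def step(self, e: Sequence[float], iota: float, inhibition: float):
        u = self.transfer(e, iota, inhibition)      # output vector ~u
        a = iota * (1.0 - inhibition)               # activity a (placeholder formula)
        r = min(1.0, sum(abs(x) for x in u))        # target rating r (placeholder formula)
        return u, a, r

# Example: a node whose transfer function simply scales its input vector.
node = BehaviorNode(lambda e, iota, i: [iota * (1.0 - i) * x for x in e])
print(node.step([0.2, 0.5], iota=0.8, inhibition=0.1))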
There are 6 basic facial expressions (anger, disgust, fear, happiness, sadness and surprise). Every expression is related to one behavior node. The control of these 6 behaviors is done, similar to the Kismet project [2], by a 3-dimensional input vector (A, V, S) (A = arousal, V = valence, S = stance). This vector is represented by a point in the emotion map, which is a cube. In this cube every basic facial expression is also represented by a point. The activation ι_i of facial expression i is calculated by equation 1, where diag stands for the diagonal of the cube, which is the maximum possible distance between two points, P_i is the point that represents facial expression i and I is the input vector.
ι_i = (diag − |P_i − I|) / diag                                          (1)
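As an illustration of equation (1), the following sketch computes the activation ι_i of every basic expression from an (A, V, S) input. The coordinates of the emotion points P_i are not given in the paper; the values below are placeholders chosen only for demonstration.

import math

# Anchor points of the 6 basic emotions in the (A, V, S) cube.
# The concrete coordinates are NOT stated in the paper; these are placeholders.
EMOTION_POINTS = {
    "anger":     (1.0, -1.0, -1.0),
    "disgust":   (-1.0, -1.0, 1.0),
    "fear":      (1.0, -1.0, 1.0),
    "happiness": (1.0, 1.0, -1.0),
    "sadness":   (-1.0, -1.0, -1.0),
    "surprise":  (1.0, 1.0, 1.0),
}

# Diagonal of the cube [-1, 1]^3: the maximum possible distance of two points.
DIAG = math.dist((-1, -1, -1), (1, 1, 1))

def activation(input_avs):
    """Equation (1): iota_i = (diag - |P_i - I|) / diag for every emotion."""
    return {name: (DIAG - math.dist(p, input_avs)) / DIAG
            for name, p in EMOTION_POINTS.items()}

# Example: an aroused, negative, withdrawn input mostly activates fear and anger.
print(activation((0.8, -0.7, 0.6)))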
Fig. 5. Concept of the behavior-based control. A facial expression behavior consists of two steps: the first step considers the time-dependent character of facial expressions, the second step realizes the movements of the face.
Figure 5 shows the concept of the behavior-based architecture. Every facial expression consists of two steps. The first step receives the activation calculated from the input vector (A, V, S). It determines a function act(t, ι) for the activation depending on time (equation 2 and figure 6). In equation 2, g stands for the gradient, h for the time the maximum value is held, n for the negative gradient and p for the percentage of the maximum value that is held until the input falls below a certain threshold. The resulting functions for the 6 basic facial expressions are shown in figure 7.
act(t, ι) =
    g · ι · t                     if t ≤ 1/g and ι > 0.1,
    ι                             if 1/g < t ≤ (1+h)/g and ι > 0.1,
    ι + n · ι · t                 if (1+h)/g < t < (1+h)/g + (p−1)/n and ι > 0.1,
    p · ι                         if t ≥ (1+h)/g + (p−1)/n and ι > 0.1,
    max(p · ι + n · ι · t, 0)     if (1+h)/g + (p−1)/n < t < −p/n and ι ≤ 0.1,
    0                             otherwise.
                                                                          (2)
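The sketch below implements one plausible reading of equation (2): the decay terms are applied relative to the start of their phase so that the curve of figure 6 is continuous, and the parameter values for g, h, n and p are illustrative, not the ones used on ROMAN.

def act(t, iota, g=1.0, h=2.0, n=-0.5, p=0.4):
    """Sketch of equation (2): time course of a facial-expression activation.

    g = rising gradient, h = hold time at the maximum, n = negative gradient,
    p = fraction of the maximum held until the input drops below the threshold.
    """
    t1 = 1.0 / g                  # end of the rise phase
    t2 = (1.0 + h) / g            # end of the hold phase
    t3 = t2 + (p - 1.0) / n       # end of the decay towards p * iota (n < 0)
    if iota > 0.1:
        if t <= t1:
            return g * iota * t                    # rise towards the maximum iota
        if t <= t2:
            return iota                            # hold the maximum
        if t < t3:
            return iota + n * iota * (t - t2)      # decay towards p * iota
        return p * iota                            # hold the reduced level
    # input has fallen below the threshold: decay from p * iota down to zero
    return max(p * iota + n * iota * t, 0.0)

# Example: sample the curve for a fully activated expression.
print([round(act(t, 1.0), 2) for t in (0.5, 1.0, 2.0, 3.5, 5.0)])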
Fig. 7. The activation functions for the facial expressions of the robot head
ROMAN, A = anger, B = disgust, C = fear, D = happiness, E = sadness and
F = surprise.
Fig. 6. The general function that represents the time course of a facial expression. The parameters that change depending on the facial expression are the gradient g, the time h that the maximum is held, the percentage p and the negative gradient n.
This activation is fed to the second behavior step, which realizes the facial expressions. For this purpose, several action units, similar to [13], are defined as shown in table I. The difference between the action units in [13] and the "new ones" is that the action units of ROMAN do not only move in one direction: the "muscles" of ROMAN are able to pull and push. Because of this, the action units used here have two parameters, AU(x, y), where x ∈ {−1, 1} stands for the direction (−1 = push, 1 = pull) and y stands for the strength of the movement. The action units used by the behaviors are shown in table II. The strength s_i of the action unit AU_i(x_i, y_i) caused by facial expression j is calculated as shown in equation 4. The parameter a_j in this equation stands for the activation of facial expression j (see equation 3). The output ~u_j of facial expression j is ~u_j = (s_1, s_2, s_9, s_12, s_15, s_20, s_24, s_26).

a_i = act_i(t, ι),  i ∈ [1, ..., 6]                                      (3)

s_i = AU_i(x_i, y_i) · a_j  ⇔  s_i = x_i · y_i · a_j                     (4)
TABLE I
TABLE OF THE ROBOT HEAD ROMAN'S ACTION UNITS.

Action Unit Number    Description
1                     raise and lower inner eyebrow
2                     raise and lower outer eyebrow
9                     nose wrinkle
12                    raise mouth corner
15                    lower mouth corner
20                    stretch lips
24                    press lips
26                    lower chin

Finally, the action unit alignments of all basic facial expression behaviors are fused. This fusion is done by equation 5, where ~u_i stands for the action unit alignment of facial expression i, a_i for the activation of facial expression i and ~u for the resulting action unit alignment.

~u = Σ_{k=0}^{5} ( a_k / Σ_{j=0}^{5} a_j ) · ~u_k                        (5)
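Equations (3) to (5) can be summarized in a short sketch: each active expression produces a strength vector over the eight action units (equation 4), and the vectors are fused by an activation-weighted average (equation 5). The concrete (x, y) parameters below are placeholders chosen in the spirit of table II.

AU_ORDER = [1, 2, 9, 12, 15, 20, 24, 26]      # action unit indices used in ~u_j

def output_vector(au_params, a_j):
    """Equation (4): s_i = x_i * y_i * a_j for the action units of one expression.

    au_params maps an action unit number to its (x, y) parameters; action
    units not used by the expression contribute a strength of 0.
    """
    return [au_params.get(i, (0, 0.0))[0] * au_params.get(i, (0, 0.0))[1] * a_j
            for i in AU_ORDER]

def fuse(outputs, activations):
    """Equation (5): activation-weighted average of the action unit alignments."""
    total = sum(activations)
    if total == 0.0:
        return [0.0] * len(AU_ORDER)
    return [sum((a / total) * u[idx] for a, u in zip(activations, outputs))
            for idx in range(len(AU_ORDER))]

# Example: happiness (a = 0.9) fused with surprise (a = 0.3); the y values are
# placeholders, the pull/push directions follow table II.
happiness = output_vector({12: (1, 1.0), 26: (1, 0.5)}, 0.9)
surprise = output_vector({1: (1, 0.8), 2: (1, 0.8), 26: (1, 1.0)}, 0.3)
print(fuse([happiness, surprise], [0.9, 0.3]))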
To prevent interference between different facial expressions, the inhibition option of the behaviors is used. If a facial expression is activated to more than 80%, "contrary" facial expressions are inhibited (e.g. if happiness is activated to more than 80%, the activation of sadness is set to 0%), and if one facial expression is activated to more than 95%, all other facial expressions are inhibited.
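A minimal sketch of this inhibition rule follows. The 80% and 95% thresholds are taken from the text; which expressions count as "contrary" beyond the happiness/sadness example is an assumption for illustration.

# The set of "contrary" pairs beyond happiness/sadness is an assumption.
CONTRARY = {
    "happiness": ["sadness"],
    "sadness": ["happiness"],
}

def apply_inhibition(activations):
    """Suppress contrary (>80%) or all other (>95%) facial expressions."""
    result = dict(activations)
    for name, a in activations.items():
        if a > 0.95:
            for other in result:
                if other != name:
                    result[other] = 0.0
        elif a > 0.8:
            for other in CONTRARY.get(name, []):
                result[other] = 0.0
    return result

print(apply_inhibition({"happiness": 0.85, "sadness": 0.4, "anger": 0.2}))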
TABLE II
THE ACTION UNIT COMBINATIONS TO SHOW THE 6 BASIC FACIAL EXPRESSIONS.

Facial Expression    Action Unit Combination
Anger                AU1(−1, y) + AU2(1, y) + AU20(1)
Disgust              AU1(−1, y) + AU2(−1, y) + AU9(1, y) + AU20(1, y) + AU24(1, y)
Fear                 AU1(1, y) + AU2(1, y) + AU26(1, y)
Happiness            AU12(1, y) + AU26(1, y)
Sadness              AU1(1, y) + AU2(−1, y) + AU15(1, y)
Surprise             AU1(1, y) + AU2(1, y) + AU26(1)
The resulting facial expressions of our robot head ROMAN¹ are shown in figure 8.
¹ The silicone skin of ROMAN was produced and designed by Clostermann Design, Ettlingen.
Fig. 8. The facial expressions generated with the robot head ROMAN: A = anger, B = disgust, C = fear, D = happiness, E = sadness and F = surprise.
V. EXPERIMENTAL EVALUATION
The experimental set-up was as follows: we presented 9 pictures and 9 videos of facial expressions of ROMAN to 32 persons (13 women and 19 men) aged 21 to 61 years. Every person had to rate the correlation between the presented expression and the 6 basic facial expressions on a scale from 1 to 5 (1 meaning a weak and 5 a strong correlation).
The results of the experiment should help to gain more information on the recognition and the presentation of facial expressions. Furthermore, the results will be used to refine the facial expressions of ROMAN.
The program used for the analysis of the evaluation was SPSS (Statistical Package for the Social Sciences, http://www.spss.com/de/). The results of the evaluation are shown in table III and table IV. The left column contains the presented facial expression; the right column contains the average values of the detected correlation between the 6 basic facial expressions and the current picture or video.
TABLE III
THE RESULTS OF THE EXPERIMENTAL EVALUATION FOR THE PICTURES.

Presented emotion          Detected strength
Anger                      Anger: 4.5, Disgust: 1.8, Fear: 1.5, Happiness: 1.0, Sadness: 1.4, Surprise: 1.2
Disgust                    Anger: 1.7, Disgust: 2.6, Fear: 1.0, Happiness: 2.6, Sadness: 1.2, Surprise: 3.7
Fear                       Anger: 1.4, Disgust: 1.8, Fear: 3.6, Happiness: 1.5, Sadness: 1.8, Surprise: 3.8
Happiness                  Anger: 1.1, Disgust: 1.0, Fear: 1.2, Happiness: 4.3, Sadness: 1.0, Surprise: 2.3
Sadness                    Anger: 2.2, Disgust: 2.3, Fear: 2.8, Happiness: 1.0, Sadness: 3.9, Surprise: 1.3
Surprise                   Anger: 1.3, Disgust: 1.3, Fear: 2.7, Happiness: 1.4, Sadness: 1.6, Surprise: 4.2
50% Fear                   Anger: 1.5, Disgust: 1.5, Fear: 3.0, Happiness: 1.6, Sadness: 1.8, Surprise: 2.8
Anger, Fear and Disgust    Anger: 3.0, Disgust: 2.4, Fear: 2.0, Happiness: 1.0, Sadness: 2.5, Surprise: 1.4
50% Sadness                Anger: 2.2, Disgust: 2.4, Fear: 2.8, Happiness: 1.0, Sadness: 3.7, Surprise: 1.3
TABLE IV
THE RESULTS OF THE EXPERIMENTAL EVALUATION FOR THE VIDEOS.

Presented emotion          Detected strength
Anger                      Anger: 3.7, Disgust: 2.1, Fear: 2.7, Happiness: 1.4, Sadness: 1.7, Surprise: 1.7
Disgust                    Anger: 3.0, Disgust: 2.6, Fear: 1.9, Happiness: 1.0, Sadness: 2.3, Surprise: 1.5
Fear                       Anger: 1.9, Disgust: 1.5, Fear: 3.3, Happiness: 2.5, Sadness: 1.5, Surprise: 4.2
Happiness                  Anger: 1.1, Disgust: 1.0, Fear: 1.1, Happiness: 4.3, Sadness: 1.1, Surprise: 2.5
Sadness                    Anger: 1.6, Disgust: 1.5, Fear: 1.9, Happiness: 1.1, Sadness: 3.6, Surprise: 1.5
Surprise                   Anger: 1.6, Disgust: 1.6, Fear: 3.5, Happiness: 1.3, Sadness: 1.3, Surprise: 4.6
50% Fear                   Anger: 2.0, Disgust: 1.8, Fear: 3.2, Happiness: 1.4, Sadness: 1.8, Surprise: 3.5
Anger, Fear and Disgust    Anger: 1.9, Disgust: 1.4, Fear: 2.4, Happiness: 1.2, Sadness: 3.4, Surprise: 1.5
50% Sadness                Anger: 1.6, Disgust: 1.4, Fear: 2.0, Happiness: 1.1, Sadness: 3.5, Surprise: 1.5
The results of the evaluation show that the correct recognition of the facial expressions anger, happiness, sadness and surprise is significant (significance level α < 5%), but the facial expressions fear and disgust are not identified. Furthermore, the results show that in most cases there are no significant differences between the evaluation of the pictures and the videos (significance level α < 5%). For continuous human-machine interaction the result of the video experiment is the more important one. The analysis of the attenuated emotions (fear at 50% activation and sadness at 50% activation) shows that the subjects recognize that the facial expression is not as strong as in the case of 100% activation. The evaluation of the anger, fear and disgust mixture shows that in this case no single facial expression is identified, but the selected facial expressions are named in most cases (similar to the interpretation of comparable human facial expressions). Compared to Ekman's experiments on the recognition of human facial expressions, there was no significant difference in the evaluation of the basic emotions anger, happiness, sadness and surprise. To improve the perception of the facial expressions fear and disgust, it is planned to optimize the action units at the wings of the nose. In addition, new action units of the neck and of the eyes should be used to strengthen these facial expressions [14].
VI. CONCLUSION
In this paper a humanoid head construction is introduced which will be used to interact with humans. One focus of the present research is how the facial expressions of a human being can be realized on the robot head ROMAN. From our point of view, the complexity of this problem is reduced if the robot head is human-like. Using a behavior-based control architecture for the robot head ROMAN, facial expressions were realized. In experiments with several persons it was shown that the generated facial expressions are in general classified correctly.
The next steps taken in the course of this work will include the addition of facial expression analysis to the image processing subsystem and, in parallel to this, the completion of the mechatronic design as well as enhancements of the implementation of the behavior-based control concept for interaction with humans.
ACKNOWLEDGMENT
Special thanks to Prof. Dr. Stephan Dutke of the Department of Psychology of the University of Kaiserslautern, who helped us with the experimental evaluation.
REFERENCES
[1] A. Takanishi, H. Miwa, and H. Takanobu, “Development of humanlike head robots for modeling human mind and emotional human-robot
interaction,” IARP International workshop on Humanoid and human
Friendly Robotics, pp. 104–109, December 2002.
[2] C. L. Breazeal, “Emotion and sociable humanoid robots,” International
Journal of Human-Computer Studies, vol. 59, no. 1–2, pp. 119–155,
2003.
[3] N. Esau, B. Kleinjohann, L. Kleinjohann, and D. Stichling, “Mexi machine with emotionally extended intelligence: A software architecture
for behavior based handling of emotions and drives,” in Proceedings
of the 3rd International Conference on Hybrid and Intelligent Systems
(HIS’03), Melbourne, Australia, December 14-17 2003, pp. 961–970.
[4] C. Breazeal, “Sociable machines: Expressive social exchange between
humans and robots,” Ph.D. dissertation, Massachusetts Institute Of
Technology, May 2000.
[5] “Kismet,”
http://www.ai.mit.edu/projects/humanoid-robotics-group/
kismet/kismet.html, 2001.
[6] “Emotion expression humanoid robot,” http://www.takanishi.mech.
waseda.ac.jp/research/eyes/we-4rII/index.htm, 2005.
[7] M. Bennewitz, F. Faber, D. Joho, M. Schreiber, and S. Behnke, “Towards
a humanoid museum guide robot that interacts with multiple persons,”
in Proceedings of the IEEE-RAS/RSJ International Conference on Humanoid Robots (Humanoids), Tsukuba, Japan, December 5-7 2005, pp.
418–423.
[8] ——, “Enabling a humanoid robot to interact with multiple persons,”
in Proceedings of the International Conference on Dextrous Autonomous
Robots and Humanoids (DARH), Yverdon-les-Bains - Switzerland, May
19-22 2005.
[9] K. Berns and T. Braun, “Design concept of a human-like robot head,”
in Proceedings of the IEEE-RAS/RSJ International Conference on Humanoid Robots (Humanoids), Tsukuba, Japan, December 5-7 2005, pp.
32–37.
[10] K.-U. Scholl, V. Kepplin, J. Albiez, and R. Dillmann, “Developing
robot prototypes with an expandable modular controller architecture,” in
Proceedings of the International Conference on Intelligent Autonomous
Systems, Venedig, June 2000, pp. 67–74.
[11] K. Scholl, V. Kepplin, J. Albiez, and R. Dillmann, “Developing robot
prototypes with an expandable modular controller architecture,” in
Proceedings of the International Conference on Intelligent Autonomous
Systems, Venedig, June 2000, pp. 67 – 74.
[12] J. Albiez, T. Luksch, K. Berns, and R. Dillmann, “An activation-based
behavior control architecture for walking machines,” The International
Journal on Robotics Research, Sage Publications, vol. 22, pp. 203–211,
2003.
[13] P. Ekman and W. Friesen, Facial Action Coding System. Consulting
Psychologists Press, Inc., 1978.
[14] P. Ekman, W. Friesen, and J. Hager, Facial Action Coding System. A
Human Face, 2002.