Control of facial expressions of the humanoid robot head ROMAN

Karsten Berns and Jochen Hirth
Robotics Research Lab, Department of Computer Science, University of Kaiserslautern, Germany
Email: berns@informatik.uni-kl.de, j_hirth@informatik.uni-kl.de

Abstract— For humanoid robots that are intended to assist humans in their daily life, the capability for adequate interaction with human operators is a key feature. Considering that more than 60% of human communication is conducted non-verbally (using facial expressions and gestures), an important research topic is how interfaces for this non-verbal communication can be developed. To achieve this goal, several robotic heads have been designed. However, it remains unclear how exactly such a head should look and what skills it should have to interact properly with humans. This paper describes an approach that aims at answering some of these design questions. A behavior-based control for realizing facial expressions, a basic ability needed for interaction with humans, is presented, together with the results of a poll in which the generated facial expressions had to be identified. Additionally, the mechatronic design of the head and the accompanying neck joint are given.

Index Terms— humanoid robot head, facial expressions, mechanical design, behavior-based control

I. INTRODUCTION

Worldwide, several research projects focus on the development of humanoid robots. Especially for the head design there is an ongoing discussion whether it should look like a human head or whether a more technically optimized head construction [1], [2] should be developed. The advantage of a technical head is that there are no restrictions on design parameters such as head size or shape, which reduces the effort for the mechanical construction. On the other hand, if realistic facial expressions are to support the communication between a robot and a person, human likeness could increase the performance of the system. The aim of the humanoid head project of the University of Kaiserslautern is to develop a complex robot head able to simulate the facial expressions of humans while perceiving its environment with a sensor system (stereo-camera system, artificial nose, several microphones, ...) similar to the senses of a human. In addition, the robot head should look like a human head in order to examine whether its performance with respect to non-verbal communication is higher compared to technical heads. In the following, a behavior-based control architecture for realizing facial expressions is introduced. Starting from the state of research, new goals for the realization of facial expressions are defined. Second, the mechatronic system of ROMAN, including a neck and an eye construction, is described. Then the behavior-based software architecture of ROMAN is presented. Based on this control approach an experimental evaluation is carried out, in which the facial expressions of ROMAN had to be identified. The results of this evaluation are presented at the end of the paper.

Fig. 1. The humanoid robot head "ROMAN" (ROMAN = RObot huMan interAction machiNe) of the University of Kaiserslautern.

II. STATE OF RESEARCH

Several projects worldwide focus on the implementation of facial expressions on different types of heads (e.g. see [3]). Kismet is certainly one of the most advanced projects in the area of human-robot interaction [4], [5], [2].
Kismet is able to show different types of emotions by using facial expressions. The main disadvantage of the selected approach is the way a facial expression is activated. Each facial-expression behavior is related to a so-called "releaser", which calculates the activation of the corresponding behavior. The behavior with the highest activation determines the resulting facial expression of the robot; in other words, a "winner-takes-all" concept is used. In order to show more than a certain number of predefined facial expressions, it is necessary to fuse the single facial expressions. With this fusion it is also possible to show graduated facial expressions and to obtain smooth transitions between the different expressions. A fusion of the determined control parameters is also necessary with regard to the use of speech and the corresponding mouth movements: otherwise the robot would change its facial expression when it starts to talk, or new specific behaviors would have to be implemented for the combination of speech and facial expressions.

Another project concerning facial expressions is WE-4RII of the Takanishi Lab at Waseda University [1], [6]. For the realization of facial expressions a 3-dimensional emotion map is used. In this map, fixed areas are defined for 6 emotions. If the input value is located in the area of a certain emotion, the corresponding facial expression is presented by the robot. This activation scheme is the main disadvantage of the approach, because it only allows a predefined number of facial expressions to be shown. A better approach should be able to show more than a predefined number of facial expressions. This could be realized with a mixture of the different facial expressions and with a graduated activation of the different expressions. The Alpha project of the University of Freiburg [7], [8] realizes an approach that uses such a fusion of the different facial expressions. A disadvantage of this approach, however, is that the head of Alpha has only a few degrees of freedom to show facial expressions. All these considerations lead to the following questions:
• What should the architecture of such a behavior-based control for facial expressions look like?
• How should the activation of such a behavior work?
• How should the fusion of the facial expressions be calculated?
• How can influence between different facial expressions be prevented?

III. MECHATRONICS OF ROMAN

Fig. 2. Mechanical construction of the humanoid robot head.

Mechanics - The mechanics of the head consists of a basic unit (cranial bone) including the lower jaw, the neck, and the motor unit for the artificial eyes. Apart from the eye construction, which is currently being built in our mechanical workshop, all mechatronic components are installed in the head. In the basic unit, 8 metal plates, which can be moved via wires, are glued to the silicone skin at the positions of Ekman's action units. The plate areas as well as their fixing positions on the skin and the directions of their movement were optimized in a simulation system according to the basic emotions that should be expressed. As actuators, 10 servo motors are used to pull and push the wires. Additionally, a servo motor is used to raise and lower the lower jaw. For the neck, a construction was selected in which motions are realized by a kinematic chain with 4 DOF. For the design of the neck, basic characteristics of the geometry, kinematics and dynamics of a human neck are considered.
From the analysis of the human neck, a ball joint could be selected for the construction of the 3 DOF of the basic motion. Unfortunately, it is very hard to design an appropriate driving system for such a solution. It was therefore decided to realize the kinematic functions of the human neck with a serial chain similar to a Cardan joint. The first degree of freedom is the rotation about the vertical axis; the range of this rotation for the artificial neck was set to ±60°. The second degree of freedom is the inclination of the neck about the horizontal axis in the side plane (range of motion ±30°). The third degree of freedom is the inclination of the neck in the frontal plane; it rotates around an axis which moves according to the second degree of freedom (range of motion ±30°). In addition, there is a 4th joint used for nodding the head (range of motion ±40°).

Sensor system - Besides the encoders fixed on the DC motors for the neck and the eye movements, an inertial system will be integrated into the head, which measures the angular velocity and the acceleration in all 3 DOF. This gives an estimation of the pose of the head. Two microphones (fixed in the ears) and a loudspeaker are also included in the head. The main sensor system for the interaction with humans is the stereo-vision system, which consists of two Dragonfly cameras. Several experiments have already been performed, such as the detection of a human head (see [9]). At the moment the integration of the stereo camera system into the eye design of the head is under development.

Control architecture - The control of the servo and DC motors as well as the determination of the pose from the inertial system is done with a DSP (Motorola 56F803) connected to a CPLD (Altera EPM 70 128). In total, 5 of these computing units are installed in the head: one for the inertial system, one for the stepping motors of the eyes, two for the 4 DC motors of the neck and one for the 11 servo motors which move the skin. These computing units are connected via CAN bus to an embedded PC. The two microphones and the loudspeaker are connected to the sound card of the embedded PC. The cameras, which are included in the eye construction, use the FireWire (IEEE 1394) input channel of the embedded PC (see figure 3). The calculation of the movements for the different facial expressions is done on a Linux PC. The behavior-based control is implemented with the help of the Modular Controller Architecture (MCA). MCA is a modular, network-transparent and realtime-capable C/C++ framework for controlling robots (see [10] and [11] for details). MCA is conceptually based on modules and edges between them. Modules may be organized in module groups, which allows the implementation of hierarchical structures.

Fig. 3. Computer architecture of ROMAN.

Fig. 4. A single behavior node: a = activity, r = target rating, ι = activation, e = input vector, i = inhibition, F(e, ι, i) = transfer function and u = the output vector calculated by the transfer function.

IV. CONCEPT FOR THE BEHAVIOR-BASED CONTROL OF FACIAL EXPRESSIONS

In the following, our control approach is presented. To reach the above mentioned targets for the control concept, a behavior-based approach was selected. The design of a single behavior node is shown in figure 4; the basic concept of our behavior nodes is presented in [12]. There are 6 basic facial expressions (anger, disgust, fear, happiness, sadness and surprise). Every expression is related to one behavior node.
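To make the structure of figure 4 more tangible, the following minimal Python sketch models such a behavior node. It only illustrates the interface (input vector e, activation ι, inhibition i, transfer function F, and the meta signals activity a and target rating r); the names, the activity formula and the target-rating placeholder are our assumptions for illustration, not the MCA C++ implementation running on ROMAN.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class BehaviorNode:
    """Illustrative behavior node in the spirit of figure 4 (names are ours, not MCA's).

    Inputs:  e (input vector), iota (activation), inhibition (i)
    Outputs: u = F(e, iota, i) plus the meta signals a (activity) and r (target rating).
    """
    transfer: Callable[[Sequence[float], float, float], Sequence[float]]  # F(e, iota, i)

    def step(self, e: Sequence[float], iota: float, inhibition: float):
        u = self.transfer(e, iota, inhibition)
        a = iota * (1.0 - inhibition)   # activity: how strongly the node currently acts (assumed form)
        r = 1.0 - a                     # target rating: crude placeholder for the node's "contentment"
        return u, a, r


# usage: a node that simply scales its input by the effective activation
node = BehaviorNode(transfer=lambda e, iota, i: [iota * (1.0 - i) * x for x in e])
print(node.step([0.2, 0.8], iota=0.6, inhibition=0.1))
```

In MCA, such nodes would be realized as modules connected by edges and organized in module groups, as described above.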
The control of these 6 behaviors is done, similar to the Kismet project [2], by a 3-dimensional input vector (A, V, S) (A = arousal, V = valence, S = stance). This vector is represented by a point in the emotion map, which is a cube. In this cube, every basic facial expression is also represented by a point. The activation ι_i of facial expression i is calculated by equation 1, where diag stands for the diagonal of the cube, which is the maximum possible distance between two points, P_i is the point that represents facial expression i and I is the input vector.

\iota_i = \frac{diag - |P_i - I|}{diag} \qquad (1)

Fig. 5. Concept of the behavior-based control. A facial expression behavior consists of two steps: the first step considers the time-dependent character of facial expressions, the second step realizes the movements of the face.

Figure 5 shows the concept of the behavior-based architecture. Every facial expression consists of two steps. The first step receives the activation calculated from the input vector (A, V, S). It determines a function act(t, ι) for the activation depending on time (equation 2 and figure 6). In equation 2, g stands for the gradient, h for the time the maximum value is held, n for the negative gradient and p for the fraction of the maximum value that is held until the input falls below a certain threshold. The resulting functions for the 6 basic facial expressions are shown in figure 7.

act(t, \iota) =
\begin{cases}
g \cdot \iota \cdot t & \text{if } t \le \frac{1}{g} \text{ and } \iota > 0.1,\\
\iota & \text{if } \frac{1}{g} < t \le \frac{1+h}{g} \text{ and } \iota > 0.1,\\
\iota + n \cdot \iota \cdot t & \text{if } \frac{1+h}{g} < t < \frac{1+h}{g} + \frac{p-1}{n} \text{ and } \iota > 0.1,\\
p \cdot \iota & \text{if } t \ge \frac{1+h}{g} + \frac{p-1}{n} \text{ and } \iota > 0.1,\\
\max(p \cdot \iota + n \cdot \iota \cdot t,\, 0) & \text{if } \frac{1+h}{g} + \frac{p-1}{n} < t < -\frac{p}{n} \text{ and } \iota \le 0.1,\\
0 & \text{else.}
\end{cases} \qquad (2)

Fig. 6. The general function that represents the time dependency of facial expressions. The parameters that change depending on the facial expression are the gradient, the time the maximum value is held, and the negative gradient.

Fig. 7. The activation functions for the facial expressions of the robot head ROMAN: A = anger, B = disgust, C = fear, D = happiness, E = sadness and F = surprise.

This activation is passed to the second behavior step, which realizes the facial expressions. For this purpose, several action units, similar to [13], are defined as shown in table I. The difference between the action units in [13] and the ones used here is that the action units of ROMAN do not only move in one direction: the "muscles" of ROMAN are able to pull and push. Because of this, the action units used here have two parameters, AU(x, y): x ∈ {−1, 1} stands for the direction (−1 = push, 1 = pull) and y stands for the strength of the movement.

TABLE I
TABLE OF THE ROBOT HEAD ROMAN'S ACTION UNITS.

Action Unit Number | Description
1  | raise and lower inner eyebrow
2  | raise and lower outer eyebrow
9  | nose wrinkle
12 | raise mouth corner
15 | lower mouth corner
20 | stretch lips
24 | press lips
26 | lower chin

The action units used by the behaviors are shown in table II. The strength s_i of the action unit AU_i(x_i, y_i) caused by facial expression j is calculated as shown in equation 4. The parameter a_j in this equation stands for the activation of the facial expression j (see equation 3). The output u_j of facial expression j is u_j = (s_1, s_2, s_9, s_12, s_15, s_20, s_24, s_26).

a_i = act_i(t, \iota), \quad i \in \{1, \ldots, 6\} \qquad (3)

s_i = AU_i(x_i, y_i) \cdot a_j \;\Leftrightarrow\; s_i = x_i \cdot y_i \cdot a_j \qquad (4)

Finally, the action unit alignments of every basic facial expression behavior are fused.
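Before the fusion step is described, the following Python sketch illustrates equations 1 and 2: the cube-based activation and a continuous reading of the time profile of figure 6. The cube range of [−1, 1] per dimension, the measurement of the decay from the end of the hold phase, and the parameter values g, h, n, p are assumptions made for this example; the per-expression parameters of figure 7 are not reproduced here.

```python
import math


# Equation 1: activation of facial expression i from the (A, V, S) input point.
# Assumption: the emotion cube spans [-1, 1] in each dimension, so its diagonal is 2*sqrt(3).
def cube_activation(input_avs, expression_point, diag=2 * math.sqrt(3.0)):
    return (diag - math.dist(input_avs, expression_point)) / diag


# Equation 2 / figure 6, read as a continuous profile (our interpretation of the piecewise
# form): rise with slope g*iota, hold the maximum for a duration h/g, decay with slope
# n*iota (n < 0) down to p*iota, and hold that level while the input activation stays above
# the 0.1 threshold. The release branch (iota <= 0.1) fades out from p*iota to zero.
def act(t, iota, g=2.0, h=1.0, n=-0.5, p=0.6):
    if iota <= 0.1:
        return max(p * iota + n * iota * t, 0.0)   # release after the input has vanished
    t_rise = 1.0 / g                               # end of the rising phase
    t_hold = (1.0 + h) / g                         # end of the hold phase
    t_decay = t_hold + (p - 1.0) / n               # n < 0 and p < 1, so t_decay > t_hold
    if t <= t_rise:
        return g * iota * t
    if t <= t_hold:
        return iota
    if t < t_decay:
        return iota + n * iota * (t - t_hold)
    return p * iota


# example: activation of one expression for an input point near its corner of the cube
iota = cube_activation((0.6, 0.8, 0.2), (0.7, 0.9, 0.3))
print(round(iota, 3), [round(act(t, iota), 3) for t in (0.1, 0.5, 1.5, 3.0)])
```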
This fusion is done by equation 5, where u_i stands for the action unit alignment of facial expression i, a_i for the activation of facial expression i and u for the resulting action unit alignment.

\vec{u} = \sum_{k=0}^{5} \frac{a_k}{\sum_{j=0}^{5} a_j} \cdot \vec{u}_k \qquad (5)

To prevent influence between different facial expressions, the inhibition option of the behaviors is used. If a facial expression is activated by more than 80%, "contrary" facial expressions are inhibited (e.g. if happiness is activated by more than 80%, the activation of sadness is set to 0%), and if one facial expression is activated by more than 95%, all other facial expressions are inhibited.

TABLE II
THE ACTION UNIT COMBINATIONS TO SHOW THE 6 BASIC FACIAL EXPRESSIONS.

Facial Expression | Action Unit Combination
Anger     | AU1(−1, y) + AU2(1, y) + AU20(1)
Disgust   | AU1(−1, y) + AU2(−1, y) + AU9(1, y) + AU20(1, y) + AU24(1, y)
Fear      | AU1(1, y) + AU2(1, y) + AU26(1, y)
Happiness | AU12(1, y) + AU26(1, y)
Sadness   | AU1(1, y) + AU2(−1, y) + AU15(1, y)
Surprise  | AU1(1, y) + AU2(1, y) + AU26(1)

The resulting facial expressions of our robot head ROMAN¹ are shown in figure 8.

Fig. 8. The facial expressions generated with the robot head ROMAN: A = anger, B = disgust, C = fear, D = happiness, E = sadness and F = surprise.

¹ The silicone skin of ROMAN was produced and designed by Clostermann Design, Ettlingen.
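The following sketch combines the fusion of equation 5 with the inhibition rules described above, using the action unit combinations of table II for happiness and sadness as example output vectors. The representation of the contrary-expression pairs and the exact way the 80% and 95% thresholds zero out activations are our reading of the text, not the original implementation.

```python
# Equation 5 as a normalized weighted sum of the per-expression action-unit vectors,
# preceded by the inhibition rules from the text. Only the happiness/sadness pair is
# named in the paper; further contrary pairs would have to be added here.
CONTRARY = {"happiness": "sadness", "sadness": "happiness"}


def fuse(outputs, activations):
    """outputs: dict expression -> action-unit vector u_k (8 strengths, order AU1..AU26)
       activations: dict expression -> activation a_k in [0, 1]"""
    a = dict(activations)
    # 95% rule: one dominant expression suppresses all others
    for name, ak in activations.items():
        if ak > 0.95:
            a = {k: (ak if k == name else 0.0) for k in a}
            break
    else:
        # 80% rule: a strongly active expression suppresses its contrary expression
        for name, ak in activations.items():
            if ak > 0.8 and CONTRARY.get(name) in a:
                a[CONTRARY[name]] = 0.0
    total = sum(a.values())
    if total == 0.0:
        return [0.0] * 8
    return [sum(a[k] * u[i] for k, u in outputs.items()) / total for i in range(8)]


# usage: happiness = AU12 + AU26, sadness = AU1(pull) + AU2(push) + AU15 (cf. table II),
# channel order (AU1, AU2, AU9, AU12, AU15, AU20, AU24, AU26)
u = {"happiness": [0, 0, 0, 1, 0, 0, 0, 1], "sadness": [1, -1, 0, 0, 1, 0, 0, 0]}
print(fuse(u, {"happiness": 0.9, "sadness": 0.4}))
```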
V. EXPERIMENTAL EVALUATION

The experimental set-up was as follows: we presented 9 pictures and 9 videos with facial expressions of ROMAN to 32 persons (13 women and 19 men) between 21 and 61 years of age. Every person had to rate the correlation between the presented expression and the 6 basic facial expressions on a scale from 1 to 5 (1 means a weak and 5 a strong correlation). The results of the experiment should help to gain more information about the recognition and the presentation of facial expressions; furthermore, the facial expressions of ROMAN should be rectified with the help of the results. The program used for the analysis of the evaluation was SPSS² (Statistical Package for the Social Sciences). The results of the evaluation are shown in table III and table IV. The left column contains the shown facial expression; the right column contains the average values of the detected correlation between the 6 basic facial expressions and the current picture or video.

² http://www.spss.com/de/

TABLE III
THE RESULTS OF THE EXPERIMENTAL EVALUATION FOR THE PICTURES.

Presented emotion | Detected strength
Anger | Anger: 4.5, Disgust: 1.8, Fear: 1.5, Happiness: 1.0, Sadness: 1.4, Surprise: 1.2
Disgust | Anger: 1.7, Disgust: 2.6, Fear: 1.0, Happiness: 2.6, Sadness: 1.2, Surprise: 3.7
Fear | Anger: 1.4, Disgust: 1.8, Fear: 3.6, Happiness: 1.5, Sadness: 1.8, Surprise: 3.8
Happiness | Anger: 1.1, Disgust: 1.0, Fear: 1.2, Happiness: 4.3, Sadness: 1.0, Surprise: 2.3
Sadness | Anger: 2.2, Disgust: 2.3, Fear: 2.8, Happiness: 1.0, Sadness: 3.9, Surprise: 1.3
Surprise | Anger: 1.3, Disgust: 1.3, Fear: 2.7, Happiness: 1.4, Sadness: 1.6, Surprise: 4.2
50% Fear | Anger: 1.5, Disgust: 1.5, Fear: 3.0, Happiness: 1.6, Sadness: 1.8, Surprise: 2.8
Anger, Fear and Disgust | Anger: 3.0, Disgust: 2.4, Fear: 2.0, Happiness: 1.0, Sadness: 2.5, Surprise: 1.4
50% Sadness | Anger: 2.2, Disgust: 2.4, Fear: 2.8, Happiness: 1.0, Sadness: 3.7, Surprise: 1.3

TABLE IV
THE RESULTS OF THE EXPERIMENTAL EVALUATION FOR THE VIDEOS.

Presented emotion | Detected strength
Anger | Anger: 3.7, Disgust: 2.1, Fear: 2.7, Happiness: 1.4, Sadness: 1.7, Surprise: 1.7
Disgust | Anger: 3.0, Disgust: 2.6, Fear: 1.9, Happiness: 1.0, Sadness: 2.3, Surprise: 1.5
Fear | Anger: 1.9, Disgust: 1.5, Fear: 3.3, Happiness: 2.5, Sadness: 1.5, Surprise: 4.2
Happiness | Anger: 1.1, Disgust: 1.0, Fear: 1.1, Happiness: 4.3, Sadness: 1.1, Surprise: 2.5
Sadness | Anger: 1.6, Disgust: 1.5, Fear: 1.9, Happiness: 1.1, Sadness: 3.6, Surprise: 1.5
Surprise | Anger: 1.6, Disgust: 1.6, Fear: 3.5, Happiness: 1.3, Sadness: 1.3, Surprise: 4.6
50% Fear | Anger: 2.0, Disgust: 1.8, Fear: 3.2, Happiness: 1.4, Sadness: 1.8, Surprise: 3.5
Anger, Fear and Disgust | Anger: 1.9, Disgust: 1.4, Fear: 2.4, Happiness: 1.2, Sadness: 3.4, Surprise: 1.5
50% Sadness | Anger: 1.6, Disgust: 1.4, Fear: 2.0, Happiness: 1.1, Sadness: 3.5, Surprise: 1.5

The results of the evaluation show that the correct recognition of the facial expressions anger, happiness, sadness and surprise is significant (significance level α < 5%), but the facial expressions fear and disgust are not identified. Furthermore, the results show that in most cases there are no significant differences between the evaluation of the pictures and of the videos (significance level α < 5%). Since human-machine interaction is continuous, the result of the video experiment is the more important one. The analysis of the understated emotions (fear at 50% activation and sadness at 50% activation) shows that the subjects recognize that the facial expression is not as strong as in the case of 100% activation. The evaluation of the anger, fear and disgust mixture shows that in this case no single facial expression is identified, but the involved facial expressions are named in most cases (similar to the interpretation of comparable human facial expressions). Compared to Ekman's experiments on the recognition of human facial expressions, there was no significant difference in the evaluation of the basic emotions anger, happiness, sadness and surprise. To improve the perception of the facial expressions fear and disgust, it is planned to optimize the action units at the wings of the nose. In addition, new action units of the neck and of the eyes should be used to strengthen these facial expressions [14].

VI. CONCLUSION

In this paper a humanoid head construction is introduced, which will be used to interact with humans. One focus of the present research is how the facial expressions of a human being can be realized on the robot head ROMAN. From our point of view the complexity of this problem is reduced if the robot head is human-like. Using a behavior-based control architecture for the robot head ROMAN, facial expressions were realized. In experiments with several persons it is shown that the generated facial expressions are in general classified correctly. The next steps taken in the course of this work will include the addition of facial expression analysis to the image processing subsystem and, in parallel, the completion of the mechatronic design as well as enhancements of the implementation of the behavior-based control concept for interaction with humans.

ACKNOWLEDGMENT

Special thanks to Prof. Dr. Stephan Dutke of the Department of Psychology of the University of Kaiserslautern, who helped us with the experimental evaluation.

REFERENCES

[1] A. Takanishi, H. Miwa, and H. Takanobu, "Development of human-like head robots for modeling human mind and emotional human-robot interaction," IARP International Workshop on Humanoid and Human Friendly Robotics, pp. 104–109, December 2002.
[2] C. L. Breazeal, "Emotion and sociable humanoid robots," International Journal of Human-Computer Studies, vol. 59, no. 1–2, pp. 119–155, 2003.
[3] N. Esau, B. Kleinjohann, L. Kleinjohann, and D. Stichling, "MEXI - machine with emotionally extended intelligence: A software architecture for behavior based handling of emotions and drives," in Proceedings of the 3rd International Conference on Hybrid and Intelligent Systems (HIS'03), Melbourne, Australia, December 14-17, 2003, pp. 961–970.
[4] C. Breazeal, "Sociable machines: Expressive social exchange between humans and robots," Ph.D. dissertation, Massachusetts Institute of Technology, May 2000.
[5] "Kismet," http://www.ai.mit.edu/projects/humanoid-robotics-group/kismet/kismet.html, 2001.
[6] "Emotion expression humanoid robot," http://www.takanishi.mech.waseda.ac.jp/research/eyes/we-4rII/index.htm, 2005.
[7] M. Bennewitz, F. Faber, D. Joho, M. Schreiber, and S. Behnke, "Towards a humanoid museum guide robot that interacts with multiple persons," in Proceedings of the IEEE-RAS/RSJ International Conference on Humanoid Robots (Humanoids), Tsukuba, Japan, December 5-7, 2005, pp. 418–423.
[8] M. Bennewitz, F. Faber, D. Joho, M. Schreiber, and S. Behnke, "Enabling a humanoid robot to interact with multiple persons," in Proceedings of the International Conference on Dextrous Autonomous Robots and Humanoids (DARH), Yverdon-les-Bains, Switzerland, May 19-22, 2005.
[9] K. Berns and T. Braun, "Design concept of a human-like robot head," in Proceedings of the IEEE-RAS/RSJ International Conference on Humanoid Robots (Humanoids), Tsukuba, Japan, December 5-7, 2005, pp. 32–37.
[10] K.-U. Scholl, V. Kepplin, J. Albiez, and R. Dillmann, "Developing robot prototypes with an expandable modular controller architecture," in Proceedings of the International Conference on Intelligent Autonomous Systems, Venice, June 2000, pp. 67–74.
[11] K.-U. Scholl, V. Kepplin, J. Albiez, and R. Dillmann, "Developing robot prototypes with an expandable modular controller architecture," in Proceedings of the International Conference on Intelligent Autonomous Systems, Venice, June 2000, pp. 67–74.
[12] J. Albiez, T. Luksch, K. Berns, and R. Dillmann, "An activation-based behavior control architecture for walking machines," The International Journal of Robotics Research, Sage Publications, vol. 22, pp. 203–211, 2003.
[13] P. Ekman and W. Friesen, Facial Action Coding System. Consulting Psychologists Press, Inc., 1978.
[14] P. Ekman, W. Friesen, and J. Hager, Facial Action Coding System. A Human Face, 2002.