Facial Expression Based Real-Time Emotion Recognition Mechanism for Students with High-Functioning Autism

Hui-Chuan Chu (1), William Wei-Jen Tsai (3), Min-Ju Liao (2), Wei-Kai Cheng (3), Yuh-Min Chen (3), Su-Chen Wang (3)
(1) Department of Special Education, National University of Tainan, Taiwan
(2) Department of Psychology, National Chung-Cheng University, Taiwan
(3) Institute of Manufacturing Information and Systems, National Cheng Kung University, Taiwan

ABSTRACT

The emotional problems of students with autism may greatly affect their learning in e-learning environments. This paper presents the development of an emotion recognition mechanism, based on a proposed emotional adjustment model, for students with high-functioning autism in a mathematics e-learning environment. Physiological signals and facial expressions were obtained by evoking the emotions of autistic students in a mathematics e-learning environment, and were used to train the emotion classification model and to verify the performance of the emotion classification mechanism. In total, 34 facial features were obtained from experiments conducted using a counterbalanced design, and the feature set was further reduced by 46% using the chi-square, Information Gain (IG), and Wrapper feature selection methods. A Support Vector Machine (SVM) was used to train the emotion recognition model and assess the performance of the proposed emotion recognition mechanism. Four emotional categories, calmness, happiness, anxiety, and anger, were identified based on the 34 features. The accuracy rate of the recognition model was 82.64%. By balancing feature reduction against recognition accuracy, using the Wrapper method and the SVM, we were able to reach an 81.63% accuracy rate while reducing the feature set by 46%. This emotion recognition mechanism can operate within an affective tutoring system by recognizing emotional changes in students with high-functioning autism, thus enabling timely emotional adjustments.

KEYWORDS

Students with high-functioning autism, Face measures, Emotion recognition

INTRODUCTION

Improving achievement for all students, including students with disabilities, is a focus of current educational reform efforts in many countries. However, because students with disabilities learn differently from normally achieving students, a major issue in special education is the provision of adaptive education in various academic disciplines and the development of learning approaches suited to the diverse characteristics of students with disabilities. Research has indicated that elementary and secondary school students consider mathematics to be the most difficult subject (Mazzocco & Myers, 2003). As students advance through the grades, their difficulties with mathematics continue to increase and the proportion of students who dislike mathematics grows, while learning interest and motivation plummet, all of which reduces learning effectiveness. Students with disabilities may have more severe problems in this regard, so it is important for researchers to explore innovative instructional methods and delivery systems that more effectively address the serious challenges this population faces (Fuchs & Vaughn, 2012). Many e-learning methodologies have recently been proposed. Advances in information and communication technologies have had positive effects, such as eliminating time and space barriers, reducing learning costs, and providing adaptive learning services, and are significantly improving learning effectiveness.
Furthermore, among the various disabilities students may have, those with high-functioning autism, despite their deficiencies in social communication skills and resistance to change (Wainer & Ingersoll, 2011), often have excellent spatial cognition and memory. For these students, e-learning environments can meet their behavioral and cognitive requirements (Swezey, 2003; Cheng & Ye, 2010), stimulate their learning motivation, and resolve many difficulties they experience in the learning process (Vullamparthi et al., 2011).

A profound emotional impairment is a core feature of autism spectrum disorder (ASD) (Begeer et al., 2008; Fabio et al., 2011), and it frequently influences internal and external emotional factors in the learning process, particularly during failures and setbacks. The emotional reactions of students with ASD are typically very intense and can involve anxiety or other negative states (Reaven, 2009). Despite the numerous benefits that e-learning environments can provide for students with ASD, emotional problems are a main cause of interruptions in learning and reduce learning effectiveness. Applications of e-learning in special education typically emphasize cognitive assistance, instructional design, and the multimedia presentation of teaching materials, but often neglect the emotional experiences of students with ASD during learning. Providing emotion-related assistance to students with ASD is therefore an emerging issue.

Since the concept of affective computing was first proposed by Picard (1997), e-learning platforms have evolved from intelligent tutoring systems (ITSs) (Schiaffino et al., 2008; Curilem et al., 2007) to affective tutoring systems (ATSs) (Mao & Li, 2010; Moridis & Economides, 2009; Shen et al., 2009; Afzal & Robinson, 2011) that integrate new technologies with the conventional e-learning environment. ATSs are based on the idea that an emotion detection mechanism can enable a learning system to automatically identify a student's emotional state. This capability satisfies the requirement of emotional adaptation for students with ASD. Currently, the main research direction is the identification of emotional features, such as physiological signals, facial expressions, speech, and physical postures (Moridis & Economides, 2008).

Although using an ATS can increase the learning performance of autistic students, some critical issues remain. Students with ASD have severe difficulties communicating emotions, interacting socially, presenting their own facial expressions, and describing those of others. Notably, their facial expressions change less than those of typically developing people (Bieberich & Morgan, 2004; Czapinski & Bryson, 2003). Physiological sensors are unsuitable for these students because wearing a device makes them uncomfortable, and these negative feelings may cause students to become restive during the learning process. Additionally, improving real-time recognition performance is an important issue for timely emotional adjustment.

To address these challenges, this work uses a facial-expression-based approach to recognize student emotional states during the learning process, for two reasons. First, human emotional expressions take numerous forms, such as facial expressions, body language, and vocalizations; among these, 55% of emotional information is conveyed by facial features (Mehrabian, 1968). Second, facial-feature-based recognition does not require body-worn devices.
To address performance issues in real-time emotion recognition, feature selection techniques are needed to reduce the dimensionality of the data set. Beyond that, accelerating training, preventing over-fitting, and identifying which features are useful in the recognition process are of critical importance.

PROPOSED AFFECTIVE TUTORING SYSTEM FRAMEWORK

An ATS framework was designed that identifies students' emotional states and combines this with emotional adaptation in the learning process (Fig. 1). The framework has two layers, a cognitive layer and an affective layer, which support students in different ways.

Figure 1. Affective Tutoring System for students with autism (cognitive layer: domain knowledge model, learning adaptation model, and student model with a cognitive content repository, performing cognitive adaptation through learning plan execution, mid-term assessment, and adaptive learning outcomes assessment; affective layer: emotion model and emotion adaptation model with an affective content repository, performing emotion recognition and emotion adaptation; remote support from parents, teachers, advisors, and experts, plus analysis of strategy effectiveness)

EMOTION RECOGNITION MECHANISM

Figure 2. Emotion Recognition Mechanism (modeling phase: video source, feature extraction, emotion tagging, recognition model construction, and evaluation/optimization generate the emotion model; online recognition phase: a PC with a webcam feeds extracted emotion features to the model, which returns the recognition result)

An emotion classification mechanism for students with high-functioning autism was designed to resolve emotion recognition problems and ensure smooth operation of the affective system framework (Fig. 2). The Emotion Recognition Mechanism has a modeling phase and an online recognition phase. In the modeling phase, the proposed mechanism extracts a feature set from video records of an emotion evocation experiment and builds the emotion recognition model. In the online recognition phase, a feature set extracted from the webcam stream is input to the emotion model for real-time emotion recognition. These phases are explained in more detail below.

Feature Extraction

The goal of the facial expression feature extraction process (Fig. 3) is to identify changes in the facial expressions of students with ASD. Facial feature anchor points were transformed into expression features characteristic of different emotional states, producing 34 primary facial features. This process includes facial feature tracking, expression feature extraction, and expression feature normalization.

Figure 3. Facial Expression Feature Extraction Process (facial feature tracking and preprocessing of the video frame set yield distance-based and angle-based feature sets, which pass through statistical processing and normalization to form the synthetic feature set)

Facial Feature Tracking: Changes in facial expression are closely related to facial features. Thus, the number and position of the corresponding facial feature points must be defined for subsequent positioning and tracking. Most studies have used two-dimensional coordinate positioning on the X- and Y-axes. The core tracking technology used in this work, FaceAPI (Zintel, 2008), additionally considers the Z-axis, applying a three-dimensional Cartesian head coordinate positioning method to automatically follow features across sequences of face images and capture feature point coordinates. This allows subjects to turn their heads +/- 90° while preserving excellent tracking. Figure 4 shows the selected facial feature points, 21 in total: five points for each eye, three for each eyebrow, four for the mouth, and one for the nose.

Figure 4. Facial feature point distribution
Figure 5. Facial feature distances and angles
Figure 6. An example of a signal series (feature D5)

Facial Feature Preprocessing: From the 21 acquired feature points, feature distances and angles for judging facial expressions were defined between points to serve as features of a facial expression (Fig. 5). Twelve feature distances and five feature angles were calculated from the coordinates of the facial feature points, as in Eqs. (3) and (4). For example, the mouth width D2 is the Euclidean distance between the two mouth corner points, and each feature angle is obtained from the vectors joining its vertex point to two neighboring feature points:

D_2 = \sqrt{(x_{mP1} - x_{mP3})^2 + (y_{mP1} - y_{mP3})^2 + (z_{mP1} - z_{mP3})^2}   (3)

\theta = \arccos\left(\frac{(P_1 - P_v) \cdot (P_2 - P_v)}{\|P_1 - P_v\| \, \|P_2 - P_v\|}\right)   (4)

where D2 is the mouth width, mP1 is the right corner positioning point of the mouth, mP3 is the left corner positioning point of the mouth, P_v is the vertex point of an angle, and P_1 and P_2 are the two feature points that span it.
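To make Eqs. (3) and (4) concrete, the following is a minimal sketch (not the authors' implementation) of how distance- and angle-based features could be computed from tracked 3-D feature points. The coordinate values are hypothetical; the point names follow Figure 4, and the use of A13 as a mouth angle is only illustrative.

```python
import numpy as np

def feature_distance(p, q):
    """Euclidean distance between two 3-D feature points, as in Eq. (3)."""
    return float(np.linalg.norm(np.asarray(p) - np.asarray(q)))

def feature_angle(vertex, p, q):
    """Angle (radians) at `vertex` spanned by points p and q, as in Eq. (4)."""
    v1 = np.asarray(p) - np.asarray(vertex)
    v2 = np.asarray(q) - np.asarray(vertex)
    cos_a = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return float(np.arccos(np.clip(cos_a, -1.0, 1.0)))  # clip guards against rounding

# Hypothetical tracked coordinates (mm) for one video frame.
mP1 = (28.0, -35.0, 5.0)    # right mouth corner
mP3 = (-27.5, -34.0, 5.5)   # left mouth corner
mP2 = (0.0, -30.0, 9.0)     # upper mouth point

D2 = feature_distance(mP1, mP3)      # mouth width
A13 = feature_angle(mP2, mP1, mP3)   # a mouth angle (illustrative)
```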
Statistical Processing: The average and standard deviation of the changes in each feature are calculated over preset time intervals (Fig. 6). The interval length depends on the configuration of the data collection procedure (see the Dataset section).

Normalization: Because the ranges of the raw feature values vary widely, SVM algorithms will not work properly without normalization. If one feature has a broad range of values, the distance computation will be dominated by that feature. Therefore, the range of every feature must be normalized so that each contributes a proportionate ratio to the final distance, scaling each feature linearly to a fixed range:

x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}   (5)

Recognition Model Construction

This section describes how the emotion recognition model is constructed using a machine learning approach. The process includes feature selection and model training, explained in sequence below.

Figure 7. Construction process for the emotion classification model (the synthetic feature set passes through filter or wrapper feature selection and SVM model training, followed by model evaluation, to produce the emotion model)

Feature selection

Feature selection algorithms can be classified as filters and wrappers. With filter algorithms, features are first scored and ranked according to their relevance to the class label, and are then selected according to a threshold. Wrappers use a specific machine learning algorithm as a black box and score a feature subset by evaluating its predictive power. The filter approach is often more computationally cost-effective, but the wrapper approach usually leads to better generalization in data separation. This work used both approaches to search for the optimal feature set and compared the results. For filter-based feature selection, this work applied IG (Quinlan, 1979) and the chi-square value to rank each attribute.

Chi-Square Value (χ²): This method measures the lack of independence between an emotion feature and the emotion classes; the resulting statistic can be compared against the chi-square distribution with one degree of freedom to assess its extremeness.
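As an illustration of the normalization and filter-ranking steps, the sketch below uses scikit-learn's MinMaxScaler (matching the linear scaling of Eq. (5)) and its chi2 scorer. The data are random placeholders standing in for the 34 extracted features and the four emotion labels; the choice of k = 15 mirrors Table 1.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.feature_selection import SelectKBest, chi2

# Placeholder data: 592 samples x 34 features, labels 0..3 for the four emotions.
rng = np.random.default_rng(0)
X = rng.normal(size=(592, 34))
y = rng.integers(0, 4, size=592)

# Eq. (5): map each feature to [0, 1]; chi2 also requires non-negative inputs.
X_scaled = MinMaxScaler().fit_transform(X)

selector = SelectKBest(score_func=chi2, k=15)    # keep the 15 highest-scoring features
X_selected = selector.fit_transform(X_scaled, y)
ranking = np.argsort(selector.scores_)[::-1]     # feature indices, most relevant first
```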
Information Gain (IG): According to information theory, IG is fundamentally defined as "the amount of information prior to testing" minus "the amount of information after testing". As in Eq. (8), the amount of information a feature carries is calculated to judge whether the feature should be selected. This method reduces the original feature set to a subset that is easier to process, thereby lowering the number of feature dimensions. In this work, "the amount of information before testing" is the total amount of information contained in the emotion categories, while "the amount of information after testing" is the total amount of information in a single emotion feature after the information S has been categorized. The larger the IG, the more information an emotion feature carries and the more important it is to the classification algorithm:

InfoGain(A_j) = Info(S) - Entropy(A_j)   (8)

where Info(S) is the total amount of information contained in all the emotion categories, Entropy(A_j) is the total amount of information contained in a given emotion feature after the information is categorized, and A_j is a single emotion feature.

For wrapper feature selection, this work used an SVM as the black box and iterated over feature sets with Sequential Forward Selection (SFS) to search for the best feature set.

Emotion recognition model training

Support Vector Machine (SVM): An SVM is a machine learning method derived from statistical learning theory. Guided by structural risk minimization (SRM), it constructs a separating hyperplane through its learning mechanism and differentiates data from two or more classes. This work poses a multi-class emotion classification problem that is not linearly separable, so the radial basis function (RBF) kernel (Hsu et al., 2003), Eq. (9), serves as the kernel function: the feature data are transformed from the input space to the feature space, and linear classification is then applied in that space.

K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2), \quad \gamma > 0   (9)

The RBF kernel is useful for classifying nonlinear, high-dimensional data, as it requires adjusting only the cost (C) and gamma (γ) parameters. Finding the optimal C and γ is therefore critical; SFS was likewise used to seek the optimal combination of C and γ when training the enhanced SVM-based emotion classification model.
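The sketch below shows one way to realize the RBF-kernel SVM of Eq. (9) with a search over C and γ. It substitutes scikit-learn's GridSearchCV over an exponential grid (in the spirit of Hsu et al., 2003) for the paper's sequential parameter search, and the data are placeholders rather than the study's features.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# Placeholder data standing in for the selected facial features and emotion labels.
rng = np.random.default_rng(0)
X = rng.random((592, 15))
y = rng.integers(0, 4, size=592)

# Exponential grid over C and gamma; grid search stands in for the sequential search.
param_grid = {"C": [2.0**k for k in range(-5, 16, 4)],
              "gamma": [2.0**k for k in range(-15, 4, 4)]}
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=cv, n_jobs=-1)
search.fit(X, y)  # multi-class handled internally by the SVC (one-vs-one)
print(search.best_params_, round(search.best_score_, 4))
```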
EVALUATION RESULT

Dataset

Our previous study (Chu et al., 2012) proposed a dual-mode offline classification mechanism for students with high-functioning autism; this work extends that research. To improve real-time facial feature recognition, this work removed the physiological data, used the legacy video source files for data refinement, and captured video data at 15 frames per second (FPS) in 30-second intervals (450 frames). If data loss or noise exceeded 20% during an interval (fewer than 360 usable frames), the sample was removed. In total, 592 emotion samples remained after filtering to construct the emotion recognition model: 241 samples were tagged as calmness, 91 as happiness, 218 as anxiety, and 42 as anger. Figures 8 and 9 show the emotion evocation experimental environment.

Figure 8. Emotion evocation experimental environment (participant with autism at a webcam-equipped workstation, with monitors for data collection observed by the parents, the researcher, and a special education teacher)
Figure 9. A student with high-functioning autism doing the exercise

Evaluation Protocol

The protocol evaluates the effectiveness of the model and the proposed methodology (Fig. 10). In this work, 10-fold cross-validation was applied to prevent over-fitting (Delen et al., 2005). The evaluation protocol is described below.

Figure 10. The evaluation procedure (SMOTE and non-SMOTE (raw source) configurations undergo feature extraction; the full feature set and the IG, chi-square, and wrapper subsets are fed to the SVM classifier for evaluation and emotion recognition)

Number of features preserved: To optimize performance and identify the importance of features, this work determined the number of features preserved by each selection method.

TP Rate and F-Measure: After completing model training, precision (P), recall (R), and the F-measure were used to assess performance in emotion classification using the following equations:

P = \frac{TP}{TP + FP}   (10)

R = \frac{TP}{TP + FN}   (11)

F = \frac{2PR}{P + R}   (12)

where TP is the number of samples of an emotion category classified correctly, FP is the number of samples incorrectly assigned to an emotion category, and FN is the number of samples belonging to an emotion category but classified as another category.

ROC (Receiver Operating Characteristic) AUC (Area Under Curve): The ROC curve is a standard technique for summarizing classifier performance over a range of trade-offs between true positive and false positive error rates (Swets, 1988). The ROC convex hull can also be used as a robust method for identifying potentially optimal classifiers (Provost & Fawcett, 2001).

Result of Feature Selection

Through statistical processing and normalization, 34 face-related emotional features were obtained as input for feature selection. Based on the analytical results, this work searched for the best feature selection configuration through emotion recognition evaluation. The feature selection details are listed in Table 1.

Table 1. Number of features preserved
Feature Selection   Feature Quantity   Reduced Rate
IG                  15                 65%
Chi-Square          15                 65%
Wrapper             18                 47%

Evaluation of Emotion Recognition Mechanism

In the evaluation protocol, all configuration sets were input variables for evaluating the emotion recognition model. Testing and verification used 10-fold cross-validation to obtain the final average accuracy and prevent over-fitting. The optimal emotion classification accuracy was 68.24% (Table 4).

Table 4. Results of the emotion recognition model
Method         Index         Calmness   Anxiety   Happy    Angry    Overall
All Features   TP Rate (%)   74.30      71.10     59.30    38.10    68.24
               F-Measure     0.710      0.706     0.632    0.457    0.679
               ROC AUC       0.752      0.767     0.771    0.680    0.755
IG             TP Rate (%)   71.00      68.30     47.30    28.60    63.34
               F-Measure     0.668      0.665     0.528    0.393    0.626
               ROC AUC       0.712      0.733     0.707    0.636    0.714
Chi-Square     TP Rate (%)   73.40      73.90     47.30    31.00    66.55
               F-Measure     0.704      0.709     0.528    0.406    0.658
               ROC AUC       0.746      0.769     0.707    0.647    0.742
Wrapper        TP Rate (%)   75.10      71.10     47.30    33.00    66.39
               F-Measure     0.702      0.703     0.541    0.412    0.657
               ROC AUC       0.742      0.765     0.711    0.656    0.739
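To illustrate how the reported indices can be obtained, the sketch below derives per-class precision, recall (the TP rate), and F-measure (Eqs. (10)-(12)), plus one-vs-rest ROC AUC, from 10-fold cross-validated predictions. The data and the C and γ values are placeholders, not the values behind Table 4.

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support, roc_auc_score
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC

# Placeholder data standing in for the selected features and four emotion labels.
rng = np.random.default_rng(0)
X = rng.random((592, 15))
y = rng.integers(0, 4, size=592)

# 10-fold cross-validated class probabilities; hypothetical C and gamma values.
clf = SVC(kernel="rbf", C=8.0, gamma=0.5, probability=True)
y_prob = cross_val_predict(clf, X, y, cv=10, method="predict_proba")
y_pred = y_prob.argmax(axis=1)

# Per-class precision, recall (TP rate), and F-measure, as in Eqs. (10)-(12).
p, r, f, _ = precision_recall_fscore_support(y, y_pred, labels=[0, 1, 2, 3],
                                             zero_division=0)
# Per-class ROC AUC, computed one-vs-rest.
auc = [roc_auc_score((y == k).astype(int), y_prob[:, k]) for k in range(4)]
for name, ri, fi, ai in zip(["calmness", "happiness", "anxiety", "anger"], r, f, auc):
    print(f"{name}: TP rate={ri:.3f}  F-measure={fi:.3f}  ROC AUC={ai:.3f}")
```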
CONCLUSIONS AND FUTURE WORK

Providing emotion-related assistance to students with ASD is important. This work applied a novel ATS framework to assist students with high-functioning autism in e-learning, together with an automated real-time emotion recognition mechanism that supports the proposed framework. The mechanism uses facial expressions to recognize emotions. The emotions of students with ASD during the learning process, namely calmness, happiness, anxiety, and anger, were evoked in real mathematics e-learning situations, and 34 features were extracted to build an emotion recognition model with the SVM. Emotion classification accuracy was 68.24%. Through cross-validation of all configuration subsets, including the three feature selection methods, this work shows that chi-square selection is the best solution for recognition: the trade-off is a reasonable accuracy loss of only 1.69%, more than half of the feature set is removed (65%), and accuracy in recognizing negative emotions increases. This demonstrates that the mechanism is feasible, effective, and practical. Combined with an e-learning system, this approach could serve as an effective, timely, and unobtrusive monitor that assists students in achieving a harmonious learning experience. Moreover, the proposed mechanism is a necessary component of an ATS, whatever its purpose.

However, the following limitations and issues deserve comment. First, children with autism spectrum disorders often have emotions that are highly correlated with repetitive actions or spontaneous non-verbal sounds. These actions and sounds could also be incorporated into an emotion recognition mechanism, potentially making emotion identification more accurate and faster. At the same time, a larger-scale experiment is needed to ensure the stability of the emotion classification model. Second, researchers developing an emotional adjustment model for mathematics e-learning must consider not only the accuracy of emotion recognition but also emotional adaptation strategies for autistic students. Such strategies should consider two important factors, emotions and learning, and then develop new types of teaching methods. Currently, few studies examine emotional adaptation strategies for students with disabilities in a mathematics e-learning context. Finally, further development of emotion recognition and adaptation strategies is needed to improve mathematics e-learning for autistic students. Since the ultimate goal of such work is to enhance learning effectiveness, the practical application of an emotional adjustment model for learning will need to be undertaken and compared with traditional classroom learning or other e-learning systems.

ACKNOWLEDGEMENT

The authors would like to thank the National Science Council of the Republic of China, Taiwan, for financially supporting this research under Contract No. NSC 100-2628-S-024-003-MY3.

REFERENCES

Afzal, S., & Robinson, P. (2011). Designing for automatic affect inference in learning environments. Educational Technology & Society, 14(4), 21–34.
Alexander, S., & Sarrafzadeh, A. (2006). Easy with Eve: A functional affective tutoring system. Proceedings of the 8th International Conference on Intelligent Tutoring Systems.
Begeer, S., Koot, H. M., Rieffe, C., Meerum Terwogt, M., & Stegge, H. (2008). Emotional competence in children with autism: Diagnostic criteria and empirical evidence. Developmental Review, 28, 342–369.
Ben Ammar, M., Neji, M., Alimi, A. M., & Gouardères, G. (2010). The affective tutoring system. Expert Systems with Applications, 37, 3013–3023.
Bieberich, A. A., & Morgan, S. B. (2004). Self-regulation and affective expression during play in children with autism or Down syndrome: A short-term longitudinal study. Journal of Autism and Developmental Disorders, 34(4), 439–448.
Brusilovsky, P., & Peylo, C. (2003). Adaptive and intelligent web-based educational systems. International Journal of Artificial Intelligence in Education, 13(2-4), 159–172.
Cheng, Y., & Ye, J. (2010). Exploring the social competence of students with autism spectrum conditions in a collaborative virtual learning environment: The pilot study. Computers & Education, 54, 1068–1077.
Chu, H.-C., Liao, M.-J., Cheng, W.-K., Tsai, W. W.-J., & Chen, Y.-M. (2012). Emotion classification for students with autism in mathematics e-learning using physiological and facial expression measures. WASET 2012, Paris, France.
Curilem, S. G., Barbosa, A. R., & De Azevedo, F. M. (2007). Intelligent tutoring systems: Formalization as automata and interface design using neural networks. Computers & Education, 49(3), 545–561.
Czapinski, P., & Bryson, S. E. (2003). Reduced facial muscle movements in autism: Evidence for dysfunction in the neuromuscular pathway. Brain and Cognition, 51(2), 177–179.
Delen, D., Walker, G., & Kadam, A. (2005). Predicting breast cancer survivability: A comparison of three data mining methods. Artificial Intelligence in Medicine, 34(2), 113–127.
Fabio, R. A., Oliva, P., & Murdaca, A. M. (2011). Systematic and emotional contents in overselectivity processes in autism. Research in Autism Spectrum Disorders, 5(1), 575–583.
Hsu, C. W., Chang, C. C., & Lin, C. J. (2003). A practical guide to support vector classification. Retrieved March 24, 2012, from http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf
Jabon, M., Ahn, S. J., & Bailenson, J. N. (2011). Automatically analyzing facial-feature movements to identify human errors. IEEE Intelligent Systems, 26(2), 54–63.
Kapoor, A., Burleson, W., & Picard, R. W. (2007). Automatic prediction of frustration. International Journal of Human-Computer Studies, 65, 724–736.
Litman, D., & Forbes, K. (2003). Recognizing emotions from student speech in tutoring dialogues. Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
Madsen, M., el Kaliouby, R., Goodwin, M., & Picard, R. (2008). Technology for just-in-time in-situ learning of facial affect for persons diagnosed with an autism spectrum disorder. Proceedings of the 10th International ACM SIGACCESS Conference on Computers and Accessibility, Halifax, Nova Scotia, Canada.
Mao, X., & Li, Z. (2010). Agent based affective tutoring systems: A pilot study. Computers & Education, 55(1), 202–208.
Mehrabian, A. (1968). Communication without words. Psychology Today, 2(9), 52–55.
Moridis, C. N., & Economides, A. A. (2008). Toward computer-aided affective learning systems: A literature review. Journal of Educational Computing Research, 39(4), 313–337.
Moridis, C. N., & Economides, A. A. (2009). Prediction of student's mood during an online test using formula-based and neural network-based method. Computers & Education, 53(3), 644–652.
Nkambou, R., Bourdeau, J., & Psyché, V. (2010). Building intelligent tutoring systems: An overview. In R. Nkambou, J. Bourdeau, & R. Mizoguchi (Eds.), Advances in Intelligent Tutoring Systems: Studies in Computational Intelligence (pp. 361–375). Heidelberg: Springer Verlag.
Picard, R. W. (1997). Affective Computing. Cambridge, MA: MIT Press.
Provost, F., & Fawcett, T. (2001). Robust classification for imprecise environments. Machine Learning, 42, 203–231.
Quinlan, J. R. (1979). Discovering rules from large collections of examples: A case study. In D. Michie (Ed.), Expert Systems in the Microelectronic Age (pp. 168–201). Edinburgh, Scotland: Edinburgh University Press.
Reaven, J. (2009). Children with high-functioning autism spectrum disorders and co-occurring anxiety symptoms: Implications for assessment and treatment. Journal for Specialists in Pediatric Nursing, 14(3), 192–199.
Schiaffino, S., Garcia, P., & Amandi, A.
(2008). eTeacher: Providing personalized assistance to e-learning students. Computers & Education, 51(4), 1744–1754.
Shen, L., Wang, M., & Shen, R. (2009). Affective e-learning: Using emotional data to improve learning in pervasive learning environment. Educational Technology & Society, 12(2), 176–189.
Swets, J. (1988). Measuring the accuracy of diagnostic systems. Science, 240, 1285–1293.
Swezey, S. (2003). Book review: Autism and ICT: A guide for teachers and parents. Computers & Education, 40, 95–96.
Vullamparthi, A. J., Khargharia, H. S., Bindhumadhava, B. S., & Babu, N. S. C. (2011). A smart tutoring aid for the autistic: Educational aid for learners on the autism spectrum. IEEE International Conference on Technology for Education (pp. 43–50). Los Alamitos: IEEE Computer Society.
Wainer, A., & Ingersoll, B. (2011). The use of innovative computer technology for teaching social communication to individuals with autism spectrum disorders. Research in Autism Spectrum Disorders, 5, 96–107.
Whitehill, J., Bartlett, M., & Movellan, J. (2008). Automatic facial expression recognition for intelligent tutoring systems. Proceedings of the IEEE Computer Vision and Pattern Recognition Conference.
Zintel, E. (2008). Tools and products. IEEE Computer Graphics and Applications, 28(6), 14–17.