2019 15th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS)

MeltdownCrisis: Dataset of Autistic Children during Meltdown Crisis

1st Marwa Masmoudi, Mir@cl Laboratory, University of Sfax, Sfax, Tunisia, marwa.masmoudi19@gmail.com
2nd Salma Kammoun Jarraya, Mir@cl Laboratory, CS Department, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, KSA, Saudi Arabia, smohamad1@kau.edu.sa
3rd Mohamed Hammami, Mir@cl Laboratory, CS Department, Faculty of Science, Sfax, Tunisia, mohamed.hammami@fss.rnu.tn

Abstract: No one refutes the importance of datasets in the development of any new approach. Despite their importance, datasets in computer vision remain insufficient for some applications. Presently, very few autism datasets associated with clinical tests or screening are available, and most of them are genetic in nature. Moreover, there is no database that combines both the abnormal facial expressions and the aggressive behaviors of an autistic child during a Meltdown crisis. This paper introduces MeltdownCrisis, a new and rich dataset that can be used for the evaluation and development of computer vision-based applications that serve as a security tool for children with autism, e.g. Meltdown crisis detection. In particular, the "MeltdownCrisis" dataset includes video streams captured with a Kinect, which offers a wide range of visual information. It is divided into facial expression data and physical activity data. The current version of the "MeltdownCrisis" dataset covers several Meltdown crisis scenarios of autistic children, along with various normal-state scenarios grouped into one set. Each scenario is represented through a rich set of features that can be extracted from the Kinect camera.

Index Terms: Autism, autistic children, Meltdown Crisis, behavior, facial expression, physical activities

I. INTRODUCTION

Autism Spectrum Disorders (ASD) represent a heterogeneous set of neurodevelopmental disorders characterized by deficits in social communication and reciprocal interactions as well as stereotypical behaviors [1]. The prevalence of ASD has been increasing over the past two decades, with a global population prevalence of about 1% [2]. The diagnosis of an autistic child can be confirmed only after the age of 2 to 3 years, from the child's abnormal facial expressions and activities. Many facilities enable autistic children to overcome their disability in daily life. However, these facilities do not meet exceptional needs. Some of these needs relate to the safety of autistic children during an autism crisis. A child with high-grade autism frequently goes through a dangerous autistic crisis (Meltdown). According to [3], the Meltdown results from an overload and can occur when the child is alone. Children who go through a meltdown have no control over their actions; they can hurt themselves or others even if they do not want to.

Among the preventive measures, we examine the development of vision-based applications that ensure the safety of autistic children and alert the caregiver when a meltdown crisis appears. The improvement of such applications relies chiefly on datasets for evaluation and comparison. Datasets and benchmarks are essential for comparing new approaches with existing ones, yet their shortage remains an important problem. In autism application domains, the few available datasets have many limitations: some are not open access, and others do not cover a wide range of realistic scenarios, stream types, visual information, etc. Furthermore, we note that there is no database of children with autism during a meltdown crisis recorded in realistic conditions. Collecting our "Meltdown Crisis" dataset in realistic conditions to address this problem is therefore the challenge of this work.

To be widely usable, the dataset should hold as many types of information as possible and should be organized to facilitate semantic access. Even though the semantic organization/access depends highly on the application, it can be defined based on scenarios common to the application domain. For instance, in the domain of applications intended for autistic children, the semantics can be defined in terms of two main scenario classes: "Normal" vs. "Abnormal", or "Comfortable" vs. "Meltdown Crisis" states. This paper introduces a "Meltdown Crisis" dataset arranged according to meltdown crisis scenarios that involve abnormal facial expressions, abnormal activities, or both. To achieve our goal, we first describe the dataset, whose videos were captured with a Microsoft Kinect camera, which allows full-body 3D motion capture and offers facial and vocal recognition capabilities [4]. The Kinect has been used in various systems to identify emotions through facial expressions [5], [6], [7], to track the 3D skeleton [8], [9], and to discover the different types of activities a person can perform [10].

The rest of this paper is organized as follows: Section II gives an overview of existing autism datasets. Section III presents the different meltdown crisis scenarios and provides details about the Meltdown Crisis dataset. In Section IV, we evaluate it using a simple approach as an example. Section V displays the evaluation results. Finally, Section VI draws the conclusion of the whole work and presents our future work.

978-1-7281-5686-6/19/$31.00 ©2019 IEEE. DOI 10.1109/SITIS.2019.00048. Authorized licensed use limited to: Auckland University of Technology. Downloaded on June 08, 2020 at 06:17:12 UTC from IEEE Xplore. Restrictions apply.

II. EXISTING DATASETS FOR AUTISTS

Since 2011, many datasets have been acquired and built using the Kinect camera. The Kinect with Microsoft's SDK offers several advantages over other cameras, such as RGB data, skeleton detection, and face localization. Moreover, all these features work even in the presence of more than one person. However, datasets for people with Autism Spectrum Disorder (ASD) are still in their infancy, and there has been limited research on this topic. In this context, several works have been proposed using datasets for autistic people. These datasets can be divided into three groups: works employing facial expression datasets, works dealing with physical activity datasets, and, in very limited research, works using a multi-modal dataset that combines facial expressions and physical activities.

A. Facial autistic dataset

Facial expression is an important means of expressing and interpreting the emotional and mental states of a human being [11]. Given the difficulty autistic children have in expressing a true emotion and communicating properly with others, several works consider this a very rich domain in which to develop new approaches and datasets for recognizing autistic emotions through facial expressions.

In [6], the authors used a machine learning method to study an eye movement dataset from a face recognition task [12] in order to categorize children with and without ASD. The dataset employed in [6] encompassed three groups of participants: 29 Chinese children with ASD aged 4 to 11, 29 Chinese TD (Typically Developing) children matched on chronological age, and another group of 29 Chinese TD children matched on IQ. Many children with ASD were examined by highly qualified doctors, and the resulting diagnoses met the diagnostic criteria for Autism Spectrum Disorder in accordance with the DSM-IV 1. Children were asked to memorize six faces (three Chinese faces as the same-race faces and three Caucasian faces as the other-race faces) and were later tested on identifying these faces among 18 novel faces, including same- and other-race faces.

In [5], the authors suggested an approach for identifying the characteristics of facial gestures that cause a feeling of perceived atypicality among observers. It also examines the differences between ASD and TD (Typical Development) subjects in terms of facial regions, based on the dynamics and activity of emotion-specific expressions. To assess their method, the authors of [5] used the MIMICRY dataset. They also used a MoCap marker database (designed and created by R. Grossman at the Facelab), with 45 subjects (24 with ASD and 21 TD) aged between 9 and 14 years, to create it. The subjects were shown videos of emotional facial expressions (reference stimuli) from the Mind Reading CD, a psychology resource [13], and were instructed to imitate those expressions. There are two predefined and similar sets of expressions, with 18 tasks in each set. Each subject has to mimic one set of expressions, i.e., 18 different expressions. These expressions include smiling, frowning, being tearful, etc., and belong to one of the six basic emotion groups: Anger, Fear, Disgust, Happiness, Sadness and Surprise. Data were collected from 32 facial markers worn by each child, using 6 MoCap cameras at 100 fps. Four stability markers were placed on the forehead and ears and are used to measure and correct head motion. The positions of the remaining 28 markers are recomputed relative to the stability markers to account for movement caused by head motion, so that the analysis can focus on expression-related motion. These markers provide the necessary data for the analysis of facial expressions.

The authors of [7] gathered the Kinect Data for Objective Measurement of ADHD and ASD (KOMAA) dataset, which includes video recordings of 55 subjects. Each video lasts approximately 12 minutes and was recorded with a Kinect 2.0 device, which can capture high-resolution RGB and depth images. All participants were over the age of 18. During the recordings, they sat in front of a computer screen to read and listen to a set of 12 pieces of news, each accompanied by 2-3 questions that the participants answered orally. Subjects in this database fall into four categories. The first is a control group of subjects who have no symptoms of ADHD/ASD and who have never been diagnosed with either Attention Deficit Hyperactivity Disorder (ADHD) or ASD. The other three categories are the ASD group (subjects with ASD), the ADHD group (subjects with ADHD), and the ASD ADHD group (subjects with both ADHD and ASD). The KOMAA dataset is used to evaluate an application meant to help automatically screen people with ADHD and ASD by extracting facial animation units (AnUs). They adapted a dynamic deep learning method to identify facial units from RGB data.

Overall, the works mentioned above have developed different datasets for different purposes. However, these datasets cannot cover the several challenges that persist in acquisition under real conditions. In addition, in many crisis cases the face cannot be detected, and the crisis is characterized by abnormal and aggressive physical activity. In the next subsection, we present a sample of works describing these abnormal activities.

B. Physical Activities autistic dataset

Children with ASD are often restricted, inflexible, and even obsessive in their behaviors, activities, and interests [10]. These autistic people can show different stereotypical and aggressive activities. The recognition of the activities of autistic people is therefore a rich source of data that has been treated in several research works.

1 Diagnostic and statistical manual of mental disorders, 2018
In this context, [14] introduced the development of an integrated system for children with autism (surveillance, rehabilitation and daily life assistance). A hierarchical classifier for the recognition of human position was developed, and scalable symbols for Hidden Markov Models were created. For data acquisition, a Microsoft Kinect 2.0 depth sensor is used to record the skeleton data of autistic children and to create some basic activities introduced by psychologists. A few experiments on basic action models were conducted, and the results proved to be robust.

The authors of [10] created the first online 3D dataset for autistic people. The 3D-Autism Dataset (3D-AD) is captured with a Kinect sensor. They explore different categories of autistic repetitive behaviors, static and dynamic, simple and complex: motions such as hands on the face, hands back, tapping ears, head banging (or rocking back and forth), flicking, hand stimming, moving a hand in front of the face, toe walking, walking in circles, and playing with a toy from/to different positions repetitively were each repeated at least 10 times by non-autistic people. The depth maps were captured at a rate of 33 frames per second with a Kinect-v2 camera. The 3D-AD was experimentally evaluated using dynamic time warping to detect these abnormal behaviors.

Although these works recognize a large number of abnormal activities of autistic children, they do not cover the abnormal activities that appear during a meltdown crisis. Furthermore, although facial expressions play a crucial role in understanding emotion [15], [16], body language expressed through poses offers complementary information. Therefore, building a multi-modal dataset is considered a big challenge, as it contains both facial expression features and skeleton features.
An example of this type of dataset is presented in the next subsection.

C. Multi-modal Autistic dataset

In psychology, behavior is defined as a set of observable reactions of an individual placed in his or her environment and in given circumstances. The authors of [17] examined the expression of emotion through body movement and posture along with facial expressions, and studied their importance in the context of embodied conversational agents [10]. In this context, [18] introduced new fine-grained action and emotion recognition tasks defined on non-staged videos. These videos of autistic children were taken during robot-assisted therapy sessions. The sessions are either therapist-only or also robot-assisted; the former are the ones of interest in this work, while the latter are used for control purposes. In robot-assisted sessions, a child and a therapist sit in front of a robot placed on a table. The therapist uses a remote-controlled robot to engage the child in the process of learning emotions. The therapy is based on scenarios in which the therapist shows cards that display various emotions (happy, sad, angry, etc.). These emotions are reproduced by the robot, and the child has to match them to those performed. The child has to pick up cards which are either in the therapist's hand or lie on the same table as the robot. The authors of [18] considered only the RGB + depth modalities recorded using a Kinect v2 camera (at 30 FPS). The camera is placed directly above the robot's head, facing the child. The child faces the camera frontally, but due to constraints related to the position of the robot and the recording cameras, only the upper body is visible most of the time. They developed the first open-access online dataset for autistic children, called DE-ENIGMA. It contains 2,031 annotated videos picturing children's body movements and facial expressions. Many of these sequences (749) describe actions performed by children responding to or collaborating with the therapist. Overall, the dataset is designed to support robust, context-sensitive, multi-modal and naturalistic human-robot interaction solutions for enhancing the social imagination skills of such children [18]. However, it does not cover the challenging situations and scenarios of autistic children during a meltdown crisis. Hence our need for a "MeltdownCrisis" dataset that covers many crisis cases and scenarios and, most importantly, that is recorded in all the streams provided by the Kinect camera, to serve the needs of different techniques.

III. MELTDOWN CRISIS DATASET: FACIAL EXPRESSIONS AND ACTIVITIES

To overcome the lack of realistic datasets, we have built a new dataset covering real scenarios of autistic children's behaviors in the normal state and during a Meltdown crisis. The most critical, delicate and serious task is recording videos of autistic children; it is not easy in any culture. We conducted an in-depth study of autistic children hosted in healthcare centers worldwide, and particularly in Tunisia, and reached an agreement with the "ASSAADA" healthcare center for autistic people. This center supports 23 children aged between 6 and 15 years. The work was conducted in cooperation with a professional team (psychiatrist, psychomotor therapist and others) that helped us carry out our various tasks, such as studying the medical and psychological records of the autistic children, analyzing videos, and conducting interviews with parents. The outcome of this study was to consider the most severe Meltdown crises of 13 autistic children.

The goal of constructing our "MeltdownCrisis" dataset is to obtain a vision-based dataset that targets autistic children and is captured in various image-based streams using the Kinect sensor. To accomplish this goal, we must first determine:
• The Kinect streams with which to record our "MeltdownCrisis" dataset.
• The autism behaviors during different crisis types and scenarios.
• The facial expressions and physical activities of autistic children during a Meltdown crisis.
Then, we arrange the "MeltdownCrisis" dataset content based on the scenarios and present a detailed description of our dataset, including the facial expressions and physical activities that can be shown during a meltdown crisis.

A. "MeltdownCrisis" dataset Streams

Because the Kinect camera combines RGB color, depth, skeleton, infrared, body index, and audio in a single camera [19], we record our "MeltdownCrisis" dataset with all these streams except the audio. By taking advantage of all these capabilities, we build a rich multi-modal vision-based dataset that is the first "MeltdownCrisis" dataset available in all these streams. As illustrated in Fig. 1, the Kinect can provide four types of streams that can be a rich source for the computation of various features.

Fig. 1. Image-Based Streams Offered by the Kinect and recorded in our dataset.

B. Previous Meltdown Crisis Scenarios

A child with a severe degree of autism frequently goes through a dangerous meltdown crisis. The meltdown crisis arises from being overwhelmed by a challenging situation. Children with autism suffer from sensory, emotional and informational overload, or even too much unpredictability, which can trigger various abnormal behaviors that resemble angry reactions (crying, screaming or violent gestures). Meltdown crisis symptoms are variable and are manifested by abnormal behavior of the autistic child according to several scenarios. These scenarios are essentially related to stereotypical movements and qualitative impairment of behavior, ranging from partial or total mutism to hyperactivity or hypoactivity, and from aggression to self-mutilation.

As mentioned, a Meltdown crisis can be triggered by a stimulus that puts an autistic child in an uncomfortable situation. According to the international manuals of child psychiatry, DSM-V and CIM-10 2, the number of scenarios that precede a meltdown crisis cannot be limited: each child is a specific case with his or her own stimuli and own crisis. In this work, we define only the most frequent previous scenarios shown by the 13 children participating in our dataset. Consequently, we have identified the following previous scenarios:
• A child cannot bear to stay in a closed room.
• A child does not accept to practice an educational activity.
• A child can suddenly lose control over her body.
• A child is stimulated by the agitation of his comrade.
• A child cannot bear a certain type of clothing fabric.
• A child does not accept to leave his cover.
• A child cannot bear a high sound volume.
All these previous scenarios preceded the children's crises. The scenarios of these crises are explained and described in the next section.

2 Classification internationale des maladies (International Classification of Diseases), 2008

C. "MeltdownCrisis" dataset Design

Our recorded dataset contains three basic categories, as shown in Fig. 2:
1) Children in Normal State: in a normal state, an autistic child can show behaviors that a normal person can express, in his/her own way. The child often presents stereotyped facial expressions and activities, but he/she is not aggressive. So, even in a normal state, the behaviors of an autistic child differ from those of a normal child.
2) Children in Post-Crisis State: in a post-crisis situation, a child exhibits abnormal repetitive behaviors such as smiling nervously, putting his/her hands over his/her ears, making stereotyped movements with his/her hands, moving quickly, etc.
3) Children in Meltdown Crisis State: in a meltdown crisis state, an autistic child expresses a rapid and subtle combination of several emotions (anger, sadness, disgust and fear) and aggressive physical activities such as striking himself, hitting others, biting himself, bashing against the ground with his hands, and throwing himself on the floor and moving quickly while rolling. These behaviors are expressed intensively and aggressively in a very fast and stereotyped way.

Fig. 2. MeltdownCrisis dataset content.

D. "MeltdownCrisis" dataset Setting

The videos are recorded using a Kinect V2 camera. This camera simplifies the extraction of facial features and uses interaction techniques. Moreover, over time, it has become a low-cost solution for capturing gestural and facial movements. For 3 months, we observed and recorded the behavior of the thirteen selected children in real conditions and throughout the whole day. Video acquisition is done in 3 rooms with predefined settings for the Kinect: a height that varies between 0.5 and 0.8 meters, a horizontal viewing angle of 70 and a vertical viewing angle of 60, and a distance that varies between 0.5 and 2 meters. The Meltdown Crisis dataset takes the form of several video clips (Extended Event File, XEF) recorded with a single Kinect camera and Kinect Studio v2.0 [20]. As mentioned above, the thirteen selected children are between 5 and 9 years old. All the recorded videos are described in Table I, which contains scenarios of a child in the normal state, in the post-crisis state and in the meltdown crisis state. These scenarios cover the facial expressions and behavioral variations of the autistic children during a Meltdown crisis and also in normal states. The "MeltdownCrisis" dataset includes 59 videos: 18 videos during Post-Meltdown crisis, 18 videos in Meltdown crisis, and 23 videos in the normal state. To illustrate the content of the MeltdownCrisis dataset, Fig. 3 shows a case in which children are in the post-meltdown crisis state and in the meltdown crisis state.

Fig. 3. Child in post-Meltdown crisis state and in Meltdown crisis state.

IV. "MeltdownCrisis" DATASET EVALUATION

In this section, we evaluate a set of facial expression features extracted from the MeltdownCrisis dataset. From the MeltdownCrisis dataset (59 videos), we prepare the training, testing and validation data covering all the scenarios. The learning dataset (8806 samples from 10 autistic children) is divided into 2 parts: 70% for learning and 30% for testing. Another 3151 samples (unseen videos of 3 autistic children) are used for validation.

To accomplish this evaluation task, we first adopt a geometric approach [21] to extract a set of 15 continuous features: five face properties [20] provided by the Kinect Face Basics API and 10 geometric distances (Euclidean distance) [22] computed between the five facial points of interest provided by the Kinect (Fig. 4). Table II presents a detailed description of the 15 proposed features. Then, to prove the relevance of these features, we conducted a feature selection experiment: we apply two filter methods (Relief-F [23] and Information Gain (IG) [24]) and a wrapper method with 10-fold cross-validation. These experiments show that using all of the proposed features gives the highest classification rate.

V. RESULTS AND DISCUSSIONS

The evaluation experiments are based on True Positives (TP), False Positives (FP), and Accuracy (AC) [25]. TP is the number of positive instances (emotion frames) that were classified as positive (emotion). FP is the number of positive instances (emotion frames) that were classified as negative (Not emotions). AC is the overall classification performance. We evaluate the set of features using several classifier algorithms: Support Vector Machine (SVM), Random Forest (RF), Random Tree (RT), C4.5 and MultiLayer Perceptron (MLP). We obtain the best results using RF (89.23%) and SVM (88.96%). The results are shown in Table III. The set of features from the MeltdownCrisis dataset records encouraging results.

Table III shows the results obtained by the RF and SVM classifiers with and without feature selection methods. The Information Gain algorithm with average rank, which found a set of 8 selected attributes (D4, D6, D3, D7, D1, D2, D8, D9), yields a classification rate of 87.53% with RF and 87.71% with SVM. The average rank is the average of the relevance weights of each feature. In addition, the Relief-F algorithm with average rank and the wrapper method with 10-fold cross-validation give the same 9 relevant features (MouthOpen, D1, D2, D3, D5, D6, D8, D9, D10). With these selected features, the RF classifier gives an accuracy of 88.31% and the SVM classifier 88.16%. With the Relief-F algorithm without average rank, the selected features are D9, D2, D5, D8, D3, MouthOpen, D6, D4 and D7. The results show an accuracy of 88.39% with RF and 88.29% with SVM. For the Information Gain algorithm without average rank, the selected features are D4, D6, D3, D7, D1, D2, D8, D9, D10, D5, MouthMoved and MouthOpen. The obtained results are 88.59% with RF and 88.49% with SVM. Therefore, the best results are recorded by using all of the proposed features, with which we obtain 89.23% with RF and 88.96% with SVM.

The bar chart in Fig. 5 shows a comparison between the results. From it, we can see that RF gives better results than SVM. Overall, the above results show high accuracy when we use a greater number of the features of our approach with our MeltdownCrisis dataset. Also, the results remain accurate even when the model is applied to other children, as shown on the validation dataset.

TABLE I
MELTDOWN CRISIS DATASET DESCRIPTION
State  Video NB  Children NB  Scenarios  Facial expressions  Behaviors  Challenges
Post-Meltdown Crisis 18 1 girl The girl makes stereotypical movements and shows abnormal facial expressions very rapidly Nervous Smiles. |Showing grimace of anger. |Howl by opening mouth and closing eyes Stereotyped movement with hands and head. |Fixing eyes at high ceiling. |Biting himself. |Head in rapid motion 1 girl Child moves in the room in a very fast and continuous way. |She presents stereotyped behaviors with her hands and head with the presence of angry expressions. The child moves his eyes to different parts of the room in a quick and uncontrolled way. The child is in an uncomfortable situation. He is stimulated by the agitation of his comrade. The girls present abnormal behaviors when they are in post-meltdown crisis with stereotypical movements with their head and hands Children are in uncomfortable condition, cannot stand in closed room. Showing grimace of anger |Howl by closing eyes Stereotyped movement with hands and head. |Biting herself. |Biting the toys. The girl is not engaged with the camera. |The presence of a boy who stands. |He disrupts the field of acquisition. |Child's head is still moving very fast The girl is not engaged with the camera. |The girl is agitated. Nervous Smiles Moving quickly in the room Child is not engaged with camera Nervous Smiles Biting himself Child is not engaged with camera Showing grimace of anger. |Howl by closing eyes. |Nervous Smiles Stereotyped movement with head and hands The girls are not engaged with the camera. Crying |Howl by opening mouth.
|Showing grimace of anger The girl is in crisis and her face cannot be captured in a continuous way. The girl in meltdown crisis of very rare type, she shows no abnormal behavior but her eyes are fixed with loss of control. child moves in the room quickly and he gets on the ground and hits his feet.|He is in a state of severe crisis. The child moves into the room quickly. He cannot stand being in a closed room. |Put his hands over his ears. The child is in a state of severe crisis, he gets on the ground and he strikes him with his arms and his head. The child moves in the room in a very fast way. Fixing eyes Hitting their feet on the floor |Hitting themselves. |Moving head very fast. |Striking hands on each other. |Jumping by hitting their feet on the floor Losing of control over her body Howl by closing eyes.|Nervous Smiles.|Showing grimace of anger. Howl by closing eyes. |Showing grimace of anger Putting his hands over his ears.|Hitting his feet in the floor. Child is not engaged with camera Putting his hands over his ears. Child is not engaged with camera Howl by opening mouth and closing eyes.|Showing grimace of anger. Striking himself with his arms and his head.|Hitting himself. Child is not engaged with camera. |The child is very agitated. Nervous smiles. |Showing grimace of anger Putting his hands over his ears. 1 boy The child cannot stand to be in a closed space and refuses to practice an educational activity. 1 boy In order to have a toy, the child expresses himself in an aggressive way by screaming and closing his eyes Child plays with doll Nervous smiles. |Showing grimace of anger. |Howl by opening mouth and closing eyes Nervous smiles.|Showing grimace of anger. |Howl by opening mouth and closing eyes Smiles Moving very fast in the room. |Hitting his feet on the floor. |Putting his hands over his ears. Moving in the room with stereotyped movements. with hands. Child is not engaged with camera. |The child is very agitated. 
Child is not engaged with camera. |The child is very agitated. 1 boy 1 boy 2 girls 1 girl 1 boy Meltdown Crisis 18 1 girl 1 boy 1 boy 1 boy 1 boy 1 girl Normal State 23 4 boys 1 girl 1 boy 1 girl Children are not engaged with their caregivers and they are isolated in their world. Child makes stereotyped movement with the toys Girl combs her hair Neutral Smiles Neutral The girl is not engaged with the camera. Child is not engaged with camera Child plays with doll without abnormal physical activities. Play with toys. |Practice educational activities. Child is not engaged with camera. Make stereotyped movement with toys. Combs her hair. Child is not engaged with camera. Child is not engaged with camera Children are not engaged with camera.

TABLE II
FEATURES DESCRIPTION
Features            Description
D1                  Distance between Eye Left and Eye Right [21]
D2                  Distance between Eye Left and Nose [21]
D3                  Distance between Eye Left and Mouth Corner Right [21]
D4                  Distance between Eye Left and Mouth Corner Left [21]
D5                  Distance between Eye Right and Nose [21]
D6                  Distance between Eye Right and Mouth Corner Right [21]
D7                  Distance between Eye Right and Mouth Corner Left [21]
D8                  Distance between Mouth Corner Left and Mouth Corner Right [21]
D9                  Distance between Mouth Corner Left and Nose [21]
D10                 Distance between Mouth Corner Right and Nose [21]
Left Eye Closed     The user's left eye is closed.
Looking Away        The user is looking away.
Mouth Moved         The user's mouth moved.
Mouth Open          The user's mouth is open.
Right Eye Closed    The user's right eye is closed.

Fig. 4. The 10 geometric distances: D1, D2, D3, D4, D5, D6, D7, D8, D9 and D10.

Fig. 5. Classification rates with RF and SVM classifiers.
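The ten distances D1 to D10 of Table II are the pairwise Euclidean distances among the five facial points, since five points give C(5,2) = 10 pairs. A minimal sketch of that computation is given below. The point labels, the function name `geometric_distances`, the use of 2D coordinates and the example values are all assumptions made for illustration; they are not taken from the authors' implementation or from the Kinect SDK.

```python
import math

# Point pairs in the order of Table II (D1..D10). The labels are
# hypothetical names for the five facial points, not Kinect SDK identifiers.
DISTANCE_PAIRS = [
    ("eye_left", "eye_right"),      # D1
    ("eye_left", "nose"),           # D2
    ("eye_left", "mouth_right"),    # D3
    ("eye_left", "mouth_left"),     # D4
    ("eye_right", "nose"),          # D5
    ("eye_right", "mouth_right"),   # D6
    ("eye_right", "mouth_left"),    # D7
    ("mouth_left", "mouth_right"),  # D8
    ("mouth_left", "nose"),         # D9
    ("mouth_right", "nose"),        # D10
]


def geometric_distances(points):
    """Map a dict of labelled (x, y) points to {"D1": ..., ..., "D10": ...}."""
    return {
        f"D{i}": math.dist(points[a], points[b])
        for i, (a, b) in enumerate(DISTANCE_PAIRS, start=1)
    }


# Made-up coordinates for a single frame, for illustration only.
example_points = {
    "eye_left": (0.0, 0.0),
    "eye_right": (6.0, 0.0),
    "nose": (3.0, 4.0),
    "mouth_left": (0.0, 8.0),
    "mouth_right": (6.0, 8.0),
}
distances = geometric_distances(example_points)
```

Under these assumptions, a full 15-feature sample would append the five face properties of Table II to the ten distances. Whether the distances are measured in image coordinates or in 3D camera space is left open here, because the text does not specify it.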
TABLE III
RESULTS OF RF AND SVM

Feature Selection Method | Algorithm | AC Training | AC Testing | AC Validation | TP Training | TP Testing | TP Validation | FN Training | FN Testing | FN Validation
Without Feature Selection method | RF | 89,23% | 87,10% | 93,60% | 89,0% | 87,1% | 93,6% | 1,14% | 1,32% | 0,74%
Without Feature Selection method | SVM | 88,96% | 86,41% | 92,80% | 8,92% | 86,4% | 92,8% | 1,18% | 1,50% | 0,75%
Relief-f Algorithm with average | RF | 88,31% | 86,14% | 93,85% | 88,3% | 86,1% | 93,9% | 1,20% | 1,41% | 0,71%
Relief-f Algorithm with average | SVM | 88,16% | 85,84% | 92,83% | 88,2% | 85,8% | 92,8% | 1,31% | 1,58% | 0,76%
Relief-f Algorithm without average | RF | 88,39% | 86,93% | 93,36% | 88,4% | 86,9% | 93,4% | 1,20% | 1,33% | 0,77%
Relief-f Algorithm without average | SVM | 88,29% | 86,06% | 92,62% | 88,3% | 86,1% | 92,6% | 1,29% | 1,55% | 0,76%
Information Gain Algorithm with average | RF | 87,53% | 86,19% | 92,68% | 87,5% | 86,2% | 92,7% | 1,29% | 1,41% | 0,85%
Information Gain Algorithm with average | SVM | 87,71% | 86,36% | 92,92% | 87,7% | 86,4% | 92,9% | 1,35% | 1,51% | 0,74%
Information Gain Algorithm without average | RF | 88,59% | 86,93% | 93,54% | 88,6% | 86,8% | 93,5% | 1,17% | 1,34% | 0,75%
Information Gain Algorithm without average | SVM | 88,40% | 86,06% | 92,46% | 88,4% | 86,1% | 92,5% | 1,28% | 1,55% | 0,79%
Wrapper method | RF | 88,31% | 86,14% | 93,91% | 88,3% | 86,1% | 93,9% | 1,20% | 1,41% | 0,70%
Wrapper method | SVM | 88,16% | 85,84% | 92,83% | 88,2% | 85,8% | 92,8% | 1,31% | 1,58% | 0,76%

VI. CONCLUSION

In this study, we described the setting and steps for building a dataset of meltdown crisis scenarios. The proposed MeltdownCrisis dataset is feature-based and covers all the stream types provided by the Kinect: color, body (skeleton), depth, infrared and body index. To facilitate its exploitation, the dataset is organized by scenario and covers 13 specific meltdown crisis scenarios: it contains 23 videos of children in a normal state, 18 videos of children in a post-crisis state and 18 videos of children in meltdown crisis. A set of features from our MeltdownCrisis dataset was evaluated with several classification algorithms, and the best results were obtained with the RF classifier. In future work, we will investigate the temporal aspect, evaluate the skeleton features and explore deep learning algorithms.

REFERENCES

[1] A. Zunino, P. Morerio, A. Cavallo, C. Ansuini, J. Podda, F. Battaglia, E. Veneselli, C. Becchio, and V. Murino, “Video gesture analysis for autism spectrum disorder detection,” in 2018 24th International Conference on Pattern Recognition (ICPR). IEEE, 2018, pp. 3421–3426.
[2] Centers for Disease Control and Prevention, 2018. [Online]. Available: https://www.cdc.gov/
[3] M. Bennie, “Tantrum vs autistic meltdown: What is the difference,” Autism Awareness, 2016.
[4] Z. Zhang, “Microsoft Kinect sensor and its effect,” IEEE Multimedia, vol. 19, no. 2, pp. 4–10, 2012.
[5] T. Guha, Z. Yang, A. Ramakrishna, R. B. Grossman, D. Hedley, S. Lee, and S. S. Narayanan, “On quantifying facial expression-related atypicality of children with autism spectrum disorder,” in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2015, pp. 803–807.
[6] W. Liu, M. Li, and L. Yi, “Identifying children with autism spectrum disorder based on their face processing abnormality: A machine learning framework,” Autism Research, vol. 9, no. 8, pp. 888–898, 2016.
[7] S. Jaiswal, M. F. Valstar, A. Gillott, and D. Daley, “Automatic detection of ADHD and ASD from expressive behaviour in RGBD data,” in 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017). IEEE, 2017, pp. 762–769.
[8] M. L. Anjum, O. Ahmad, S. Rosa, J. Yin, and B. Bona, “Skeleton tracking based complex human activity recognition using Kinect camera,” in International Conference on Social Robotics. Springer, 2014, pp. 23–33.
[9] C. Sinthanayothin, N. Wongwaen, and W. Bholsithi, “Skeleton tracking using Kinect sensor & displaying in 3D virtual scene,” International Journal of Advancements in Computing Technology, vol. 4, no. 11, 2012.
[10] O. Rihawi, D. Merad, and J.-L. Damoiseaux, “3D-AD: 3D-autism dataset for repetitive behaviours with Kinect sensor,” in 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS). IEEE, 2017, pp. 1–6.
[11] L. Zhao, Z. Wang, and G. Zhang, “Facial expression recognition from video sequences based on spatial-temporal motion local binary pattern and Gabor multiorientation fusion histogram,” Mathematical Problems in Engineering, vol. 2017, 2017.
[12] L. Yi, P. C. Quinn, Y. Fan, D. Huang, C. Feng, L. Joseph, J. Li, and K. Lee, “Children with autism spectrum disorder scan own-race faces differently from other-race faces,” Journal of Experimental Child Psychology, vol. 141, pp. 177–186, 2016.
[13] O. Golan and S. Baron-Cohen, “Systemizing empathy: Teaching adults with Asperger syndrome or high-functioning autism to recognize complex emotions using interactive multimedia,” Development and Psychopathology, vol. 18, no. 2, pp. 591–617, 2006.
[14] A. Postawka and P. Śliwiński, “A Kinect-based support system for children with autism spectrum disorder,” in International Conference on Artificial Intelligence and Soft Computing. Springer, 2016, pp. 189–199.
[15] M. A. Nicolaou, H. Gunes, and M. Pantic, “Automatic segmentation of spontaneous data using dimensional labels from multiple coders,” in Proc. of LREC Int. Workshop on Multimodal Corpora: Advances in Capturing, Coding and Analyzing Multimodality. Citeseer, 2010, pp. 43–48.
[16] J. Kossaifi, G. Tzimiropoulos, S. Todorovic, and M. Pantic, “AFEW-VA database for valence and arousal estimation in-the-wild,” Image and Vision Computing, vol. 65, pp. 23–36, 2017.
[17] R. Calvo, S. D’Mello, J. Gratch, A. Kappas, M. Lhommet, and S. Marsella, “Expressing emotion through posture and gesture,” 2015.
[18] E. Marinoiu, M. Zanfir, V. Olaru, and C. Sminchisescu, “3D human sensing, action and emotion recognition in robot assisted therapy of children with autism,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 2158–2167.
[19] H.-H. Wu and A. Bainbridge-Smith, “Advantages of using a Kinect camera in various applications,” University of Canterbury, 2011.
[20] Microsoft, “Kinect Studio v20,” 2016. [Online]. Available: https://www.microsoft.com/en-us/download/details.aspx?id=44561
[21] F. Z. Salmam, A. Madani, and M. Kissi, “Emotion recognition from facial expression based on fiducial points detection and using neural network,” International Journal of Electrical and Computer Engineering, vol. 8, no. 1, p. 52, 2018.
[22] P. Boyer, Algèbre et géométries. Calvage & Mounet, 2015.
[23] Y. Zhang, S. Wang, P. Phillips, and G. Ji, “Binary PSO with mutation operator for feature selection using decision tree applied to spam detection,” Knowledge-Based Systems, vol. 64, pp. 22–31, 2014.
[24] O. Villacampa, “Feature selection and classification methods for decision making: a comparative analysis,” 2015.
[25] O. D. Lara and M. A. Labrador, “A survey on human activity recognition using wearable sensors,” IEEE Communications Surveys & Tutorials, vol. 15, no. 3, pp. 1192–1209, 2012.