2019 15th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS)
MeltdownCrisis: Dataset of Autistic Children during
Meltdown Crisis
1st Marwa Masmoudi
Mir@cl Laboratory, CS Department, Faculty of Science, University of Sfax, Sfax, Tunisia
marwa.masmoudi19@gmail.com

2nd Salma Kammoun Jarraya
Mir@cl Laboratory, CS Department, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
smohamad1@kau.edu.sa

3rd Mohamed Hammami
Mir@cl Laboratory, CS Department, Faculty of Science, University of Sfax, Sfax, Tunisia
mohamed.hammami@fss.rnu.tn
Abstract—No one refutes the importance of datasets in the development of any new approach. Despite their importance, datasets in computer vision remain insufficient for some applications. Presently, very limited autism datasets associated with clinical tests or screening are available, and most of them are genetic in nature. Moreover, there is no database that combines both the abnormal facial expressions and the aggressive behaviors of an autistic child during a meltdown crisis. This paper introduces "MeltdownCrisis", a new and rich dataset that can be used for the evaluation and development of computer-vision-based applications pertinent to children who suffer from autism, such as security tools for meltdown crisis detection. In particular, the "MeltdownCrisis" dataset includes video streams captured with Kinect, which offers a wide range of visual information. It is divided into facial expression data and physical activity data. The current "MeltdownCrisis" dataset version covers several meltdown crisis scenarios of autistic children along with various normal-state scenarios grouped into one set. Each scenario is represented through a rich set of features that can be extracted from the Kinect camera.

Index Terms—Autism, autistic children, Meltdown Crisis, behavior, facial expression, physical activities

I. INTRODUCTION

Autism Spectrum Disorders (ASD) represent a heterogeneous set of neurodevelopmental disorders characterized by deficits in social communication and reciprocal interactions as well as stereotypical behaviors [1]. The prevalence of ASD has been increasing over the past two decades, with a global population prevalence of about 1% worldwide [2]. The diagnosis of an autistic child can be confirmed only after the age of 2 to 3 years, based on abnormal facial expressions and activities.

There are many facilities that enable autistic children to overcome their disability in daily life. However, these facilities do not meet all exceptional needs. Some of these needs are related to the security of autistic children during an autism crisis. A child with high-grade autism frequently goes through a dangerous autistic crisis (meltdown). According to [3], the meltdown results from an overload and can occur when the child is alone. Children who go through a meltdown have no control over their actions; they can hurt themselves or others even if they do not want to.

Among the preventive measures, we examine the development of vision-based applications in order to ensure the safety of autistic children and to alert the caregiver when a meltdown crisis appears. The improvement of such applications chiefly relies on datasets for the sake of evaluation and comparison. We cannot deny the importance of datasets and benchmarks in the advancement of new approaches compared with existing ones; meanwhile, their shortage remains an important matter.

In autism application domains, the few available datasets have many limitations: some do not have open access, others do not cover a wide range of realistic scenarios, stream types, visual information, etc. Furthermore, we note that there is no database of children with autism during a meltdown crisis recorded in realistic conditions. Collecting our "MeltdownCrisis" dataset in realistic conditions to deal with this problem is our challenge in this work.

To have a wide range of usability, the dataset should carry a maximum of information types and should be organized to facilitate semantic access. Even though the semantic organization and access highly depend on the application, they can be defined based on scenarios common to the application domain. For instance, in the domain of applications destined for autistic children, the semantics can be defined in terms of two main scenario classes: "Normal" vs. "Abnormal", or "Comfortable" vs. "Meltdown Crisis" states.

This paper introduces the "MeltdownCrisis" dataset, which is arranged according to meltdown crisis scenarios involving abnormal facial expressions, abnormal activities, or both. To achieve our goal, we first describe the dataset through the videos captured by a Microsoft Kinect camera, which allows full-body 3D motion capture as well as facial and vocal recognition [4]. It has been used in various systems to identify emotions through facial expressions [5], [6], [7], to track 3D skeletons [8], [9] and to discover the different types of activities a person can do [10].

The rest of this paper is organized as follows: Section II gives an overview of existing autism datasets. Section III presents the different meltdown crisis scenarios and provides details about the MeltdownCrisis dataset. In Section IV, we evaluate it using a simple approach as an example. Section V presents the evaluation results. Finally, Section VI draws conclusions from the whole work and presents our future work.

978-1-7281-5686-6/19/$31.00 ©2019 IEEE
DOI 10.1109/SITIS.2019.00048
Authorized licensed use limited to: Auckland University of Technology. Downloaded on June 08,2020 at 06:17:12 UTC from IEEE Xplore. Restrictions apply.
II. EXISTING DATASETS FOR AUTISTIC PEOPLE

Since 2011, many datasets have been acquired and built using the Kinect V2 camera. Kinect with Microsoft's SDK offers several advantages over other cameras, such as RGB data, skeleton detection and face localization. Moreover, all these benefits work even in the presence of more than one person. In addition, datasets for people with Autism Spectrum Disorder (ASD) are still in their infancy, and there has been limited research dealing with this topic. In this context, many different works have been proposed using datasets for autistic people. These datasets can be divided into three groups: on the one hand, works employing facial expression datasets; on the other hand, many papers dealing with physical activity datasets; and, finally, the use of a multi-modal dataset that combines facial expressions and physical activities, which is treated in very limited research.

A. Facial autistic datasets

Facial expression is an important means of expressing and interpreting the emotional and mental states of a human being [11]. Given the incapacity of autistic children to express a true emotion and to communicate properly with others, a set of works finds it a very rich domain in which to develop new approaches and datasets that allow recognizing autistic emotions through facial expressions.

In [6], the authors used a machine learning method to study an eye-movement dataset from a face recognition task [12], in order to categorize children with and without ASD. [6] employed a dataset encompassing three groups of participants: 29 Chinese children with ASD aged from 4 to 11, 29 Chinese TD (Typically Developing) children matched by chronological age, and another group of 29 Chinese TD children matched by IQ. Many children with ASD were examined by highly qualified doctors, whose diagnosis met the diagnostic criteria for Autism Spectrum Disorder in accordance with the DSM-IV¹. Children were asked to memorize six faces (three Chinese faces as the same-race faces, and three Caucasian faces as the other-race faces), and later they underwent a test to identify these faces among 18 novel faces, including same- and other-race faces.

In [5], the authors suggested an approach that allows finding out the characteristics of the facial gestures that cause a feeling of perceived atypicality among observers. It also examines the difference between ASD and TD (Typical Development) subjects in terms of regions, based on the dynamics and activity of the expressions proper to each emotion. In order to assess their method, [5] utilized the MIMICRY dataset. They also used a MoCap marker database (designed and created by R. Grossman at the Facelab) that has 45 subjects (24 with ASD and 21 TD) aged between 9 and 14 years and took a year to create. The subjects were shown emotional facial expression videos (reference stimuli) from the Mind Reading CD, a psychology resource [13], and were asked to imitate those expressions. There are two predefined and similar sets of expressions with 18 tasks for each set. Each subject has to mimic one set of expressions, i.e., 18 different expressions. These expressions include smiling, frowning, being tearful, etc., and belong to one of the six basic emotion groups: Anger, Fear, Disgust, Happiness, Sadness and Surprise. Data were collected from 32 facial markers worn by each child through the use of 6 MoCap cameras at 100 fps. Four stability markers were placed on the forehead and ears; they are used to measure and correct head motion. The positions of the remaining 28 markers are recomputed according to the stability markers to compensate for movement caused by head motion, so that the analysis can focus on expression-related motion. These markers provide the necessary data for the analysis of facial expressions.

[7] gathered the Kinect Data for Objective Measurement of ADHD and ASD (KOMAA) dataset, which includes video recordings of a total of 55 subjects. The duration of each video is approximately 12 minutes. It was recorded with a Kinect 2.0 device, which is capable of capturing high-resolution RGB and depth images. All participants who underwent this experiment were over the age of 18. During the recordings, they sat in front of a computer screen to read and listen to a set of 12 news items, each accompanied by 2-3 questions to be answered orally by the participants. Participants in this database can be categorized as follows: the first category is a control group that includes subjects who have no symptoms of ADHD/ASD and who have never been diagnosed with either Attention Deficit Hyperactivity Disorder (ADHD) or ASD. The other three categories are the ASD group (subjects with ASD), the ADHD group (subjects with ADHD) and the ASD+ADHD group (subjects with both ADHD and ASD). The KOMAA dataset is used to evaluate an application meant to automatically help in screening people who suffer from ADHD and ASD by extracting facial animation units (AnUs). They adapted a dynamic deep learning method to identify facial units from RGB data.

Overall, all the works mentioned above have developed different datasets with different purposes. However, these datasets cannot cover the several challenges that can persist in real acquisition conditions. In addition, in many cases of crisis the face cannot be detected, and the crisis is characterized by abnormal and aggressive physical activity. In the next subsection, we show a sample of works describing these abnormal activities.

¹ Diagnostic and Statistical Manual of Mental Disorders, 2018

B. Physical activities autistic datasets

Children with ASD are often restricted, inflexible, and even obsessive in their behaviors, activities, and interests [10]. Different stereotypical and aggressive activities can be shown by these autistic people. Thus, the recognition of autistic people's activities presents a rich source of data that has been treated in several research works.
In this context, [14] introduced the development of an integrated system for children with autism (surveillance, rehabilitation and daily-life assistance). A hierarchical classifier for the recognition of human positions was developed, and scalable symbols for Hidden Markov Models were created. For data acquisition, a Microsoft Kinect 2.0 depth sensor was used to record the skeleton data of autistic children and to capture some basic activities introduced by psychologists. A few experiments on basic action models were conducted, and the results achieved proved to be robust.

[10] created the first online 3D dataset for autistic people. The 3D-Autism Dataset (3D-AD) was captured with a Kinect sensor. The authors explore different categories of autistic repetitive behaviors, static and dynamic, simple and complex: motions like hands on the face, hands back, tapping ears, head banging (or rocking back and forth), flicking, hand stimming, moving a hand in front of the face, toe walking, walking in circles and playing with a toy from/to different positions have been repeated at least 10 times by non-autistic people. The depth maps were captured at a rate of 33 frames per second with a Kinect v2 camera. 3D-AD has been experimentally evaluated using dynamic time warping to detect these abnormal behaviors. Although these works recognize an important number of abnormal activities of autistic children, they do not cover the abnormal activities that appear during a meltdown crisis.
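The dynamic-time-warping matching used to evaluate 3D-AD can be sketched in plain Python. The sequences and the `dtw_distance` helper below are illustrative placeholders, not the authors' implementation:

```python
# Minimal dynamic time warping (DTW) between two 1-D motion sequences,
# as might be used to match a recorded movement against a behavior template.
def dtw_distance(a, b):
    n, m = len(a), len(b)
    INF = float("inf")
    # cost[i][j] = minimal cost of aligning a[:i] with b[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # skip a frame of a
                                 cost[i][j - 1],      # skip a frame of b
                                 cost[i - 1][j - 1])  # match both frames
    return cost[n][m]

# Toy joint-angle sequences: a template motion cycle and a
# time-stretched observation of the same motion.
template = [0.0, 1.0, 0.0, -1.0, 0.0]
observed = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5, 0.0]
print(dtw_distance(template, observed))
```

Because DTW aligns sequences of different lengths, the same repetitive motion performed faster or slower still yields a small distance, which is what makes it suitable for detecting stereotyped behaviors.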
Furthermore, although facial expressions play a crucial role in emotion understanding [15], [16], body language expressed through poses offers complementary information. Therefore, a multi-modal dataset is considered a big challenge, as it contains both facial expression features and skeleton features. An example of this type of dataset is presented in the next subsection.
C. Multi-modal autistic datasets

In psychology, behavior is defined as a set of observable reactions of an individual placed in his environment and in given circumstances. [17] examined the expression of emotion through body movement and posture along with facial expressions, and studied their importance in the context of embodied conversational agents [10].

In this context, [18] introduced new fine-grained action and emotion recognition tasks defined on non-staged videos. These videos of autistic children were taken during robot-assisted therapy sessions. The sessions are conducted either by a therapist only, or with a robot as well; the latter are of interest in this work, while the former are used for control purposes. In a robot-assisted session, a child and a therapist sit in front of a robot placed on a table. The therapist uses a remote-controlled robot to engage the child in the process of learning emotions. The therapy is based on scenarios in which the therapist shows cards that display various emotions (happy, sad, angry, etc.). These emotions are reproduced by the robot, and the child has to match them to those performed: the child has to pick up cards which are either in the therapist's hand or lie on the same table as the robot. [18] considered only the RGB and depth modalities, recorded using a Kinect v2 camera (at 30 FPS). The camera is placed directly above the robot's head, pointed towards the child. The child faces the camera frontally, but due to the constraints related to the robot's position and the recording cameras, only the upper body is visible most of the time. The authors developed the first open-access online dataset for autistic children, called DE-ENIGMA. It contains 2,031 annotated videos picturing children's body movements and facial expressions. Most of these sequences (749) describe actions performed by children responding or collaborating with the therapist. Overall, the dataset is designed to support robust, context-sensitive, multi-modal and naturalistic human-robot interaction solutions for enhancing the social imagination skills of such children [18]. However, it does not cover the challenging situations and scenarios of autistic children during a meltdown crisis.

Hence our need for a "MeltdownCrisis" dataset that covers many crisis cases and scenarios and, most importantly, is recorded in all the streams provided by the Kinect camera, in order to serve the needs of different techniques.

III. MELTDOWN CRISIS DATASET: FACIAL EXPRESSIONS AND ACTIVITIES

In order to overcome the lack of realistic datasets, we have built a new dataset covering real scenarios of autistic children's behaviors in the normal state and during a meltdown crisis.

The most critical, delicate and serious mission is that of recording videos of autistic children; it is not an easy task in any culture. Deep research was conducted on autistic children hosted in healthcare centers around the world and particularly in Tunisia. We reached an agreement with the "ASSAADA" healthcare center for autistic children. This center supports 23 children aged between 6 and 15 years. The work was conducted with the cooperation of a professional team (psychiatrist, psychomotor therapist and others) that helped us carry out our various tasks, such as studying the medical and psychological records of autistic children, analyzing videos, and conducting interviews with parents. The outcome of this study was to consider the most severe meltdown crises of 13 autistic children.

The goal of our "MeltdownCrisis" dataset construction is to obtain a vision-based dataset targeting autistic children and captured in various image-based streams using the Kinect sensor. To accomplish this goal, we must first determine:
• The Kinect streams with which to record our "MeltdownCrisis" dataset.
• The autism behaviors during different crisis types and scenarios.
• The facial expressions and physical activities of autistic children during a meltdown crisis.
Then, we arrange the "MeltdownCrisis" dataset content based on the scenarios and present a detailed description of our dataset, including the facial expressions and physical activities that can be shown during a meltdown crisis.
A. "MeltdownCrisis" dataset streams

Because the Kinect camera combines RGB color, depth, skeleton, infrared, body index and audio into one single device [19], we record our "MeltdownCrisis" dataset with all these streams except the audio. By availing of all the benefits of this camera, we build a rich multi-modal vision-based dataset, the first "MeltdownCrisis" dataset available in all these streams. As illustrated in "Fig. 1", Kinect can provide four types of image-based streams that can be a rich source for the computation of various features.

Fig. 1. Image-based streams offered by the Kinect and recorded in our dataset.

B. Previous meltdown crisis scenarios

A child with a severe degree of autism frequently goes through a dangerous meltdown crisis. A meltdown crisis results from being overwhelmed by a challenging situation. Children with autism suffer from sensory, emotional and informational overload, or even too much unpredictability, which can trigger various abnormal behaviors that resemble angry reactions (crying, screaming or violent gestures). Meltdown crisis symptoms are variable and are manifested by abnormal behavior of the autistic child according to several scenarios. These scenarios are essentially related to stereotypical movements and qualitative impairment of behavior, ranging from partial or total mutism to hyperactivity or hypoactivity, and from aggression to self-mutilation. As mentioned, a meltdown crisis can be triggered by a stimulus that puts an autistic child in an uncomfortable situation. According to the international manuals of paedopsychiatry, DSM-V and CIM-10², we cannot limit the number of previous meltdown crisis scenarios; each child is a specific case with his own stimuli and his own crises. In this work, we only define the most frequent previous scenarios shown by the 13 children participating in our dataset. Consequently, we have identified the following previous scenarios:
• A child cannot bear staying in a closed room.
• A child refuses to practice an educational activity.
• A child suddenly loses control over his or her body.
• A child is stimulated by the agitation of a comrade.
• A child cannot bear a certain type of clothing fabric.
• A child refuses to leave his cover.
• A child cannot bear a high sound volume.
All these collected previous scenarios preceded the children's crises. The scenarios of these crises are explained and described in the next section.

² Classification internationale des maladies (International Classification of Diseases), 2008

C. "MeltdownCrisis" dataset design

Our recorded dataset contains three basic categories, as shown in "Fig. 2":
1) Children in Normal State: in a normal state, an autistic child can show behaviors that a normal person can express in his/her own way. Indeed, the child often presents stereotyped facial expressions and activities, but he/she is not aggressive. So, even in a normal state, the behaviors of an autistic child differ from those of a normal child.
2) Children in Post-Crisis State: in a post-crisis situation, a child exhibits abnormal repetitive behaviors: nervous smiles, putting his/her hands over his/her ears, making stereotyped movements with his/her hands, moving quickly, etc.
3) Children in Meltdown Crisis State: in the meltdown crisis state, an autistic child expresses a rapid and subtle combination of several emotions (anger, sadness, disgust, fear) and aggressive physical activities such as striking himself, hitting others, biting himself, banging his hands against the ground, throwing himself on the floor and moving quickly while rolling, etc. These behaviors are expressed intensively and aggressively in a very fast and stereotyped way.

Fig. 2. MeltdownCrisis dataset content.

D. "MeltdownCrisis" dataset setting

The videos are recorded using a Kinect V2 camera. This camera adds simplicity to the extraction of facial features and the use of interaction techniques. Moreover, over time, it has
become a low-cost solution for capturing gestural and facial movements. For 3 months, we observed and recorded the behavior of the thirteen selected children in real conditions and throughout the whole day. Video acquisition was done in 3 rooms with predefined settings for the Kinect: a height varying between 0.5 and 0.8 meters, a horizontal view angle of 70°, a vertical view angle of 60°, and a distance varying between 0.5 and 2 meters.

The Meltdown Crisis dataset takes the form of several video clips (eXtended Event Files, XEF) recorded using a single Kinect camera and Kinect Studio v2.0 [20]. As mentioned above, the thirteen selected children are 5 to 9 years old. All the recorded videos are described in "Table I", which contains scenarios of children in the normal state, the post-crisis state and the meltdown crisis state. These scenarios cover the facial expressions and behavioral variations of the children during a meltdown crisis as well as in normal states. The "MeltdownCrisis" dataset includes 59 videos: 18 videos during post-meltdown crisis, 18 videos in meltdown crisis and 23 videos in the normal state.

To illustrate the content of the MeltdownCrisis dataset, "Fig. 3" presents cases of children in the post-meltdown crisis state and in the meltdown crisis state.

Fig. 3. Child in post-Meltdown crisis state and in Meltdown crisis state.

IV. "MeltdownCrisis" DATASET EVALUATION

In this section, we evaluate a set of facial expression features extracted from the MeltdownCrisis dataset. From the MeltdownCrisis dataset (59 videos), we prepare training, testing and validation data covering all the scenarios. The learning dataset (8,806 samples of 10 autistic children) is divided into two parts: 70% for learning and 30% for testing. Another 3,151 samples (unseen videos of 3 autistic children) are used for validation.

To accomplish this evaluation task, we first adopt a geometric approach [21] to extract a set of 15 continuous features: the five face properties [20] proposed by the Kinect API Face Basics, and 10 geometric distances (Euclidean distances) [22] computed between the five facial interest points proposed by Kinect ("Fig. 4"). "Table II" presents a detailed description of the 15 proposed features. Then, to prove the relevance of these features, we conducted a feature selection experiment: we applied two filtering methods (Relief-F [23] and Information Gain (IG) [24]) and a wrapper method with 10-fold cross-validation. These experiments showed that using the totality of the proposed features gave the highest classification rate.

V. RESULTS AND DISCUSSION

The evaluation experiments are based on True Positives (TP), False Positives (FP), and Accuracy (AC) [25]. TP is the number of positive instances (emotion frames) that were classified as positive (emotion). FP is the number of positive instances (emotion frames) that were classified as negative (not emotion). AC is the overall classification performance. We evaluate the set of features using several classification algorithms: Support Vector Machine (SVM), Random Forest (RF), Random Tree (RT), C4.5 and Multi-Layer Perceptron (MLP). We obtain the best results with RF (89.23%) and SVM (88.96%). The results are shown in "Table III".

The set of features from the MeltdownCrisis dataset records encouraging results. "Table III" shows the results obtained by the RF and SVM classifiers with and without feature selection methods. The results exhibit a classification rate of 87.53% with RF and 87.71% with SVM for the Information Gain algorithm with average rank, which found a set of 8 selected attributes (D4, D6, D3, D7, D1, D2, D8, D9). The average rank is the average of the relevance weights of each feature. In addition, the Relief-F algorithm with average rank and the wrapper method with 10-fold cross-validation give the same 9 relevant features (MouthOpen, D1, D2, D3, D5, D6, D8, D9, D10); with these selected features, the RF classifier gives 88.31% accuracy and the SVM classifier 88.16%. With the Relief-F algorithm without average rank, the selected features are D9, D2, D5, D8, D3, MouthOpen, D6, D4 and D7; the results show an accuracy of 88.39% with RF and 88.29% with SVM. For the Information Gain algorithm without average rank, the selected features are D4, D6, D3, D7, D1, D2, D8, D9, D10, D5, MouthMoved and MouthOpen; the obtained results are 88.59% with RF and 88.49% with SVM. Therefore, the best results are recorded using the totality of the proposed features, with which we obtain 89.23% with RF and 88.96% with SVM.

The bar chart in "Fig. 5" shows a comparison between these results, from which we notice that RF gives better results than SVM. Moreover, the above results show high accuracy when we use a greater number of features of our approach with our MeltdownCrisis dataset. The results also remain accurate even when the model is applied to other children, as proved on the validation dataset.
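The per-frame evaluation measures defined above (TP, FP and AC, following the paper's own wording for FP) can be sketched in plain Python. The labels below are made-up illustrations, not the authors' evaluation code:

```python
# Per the paper's definitions: TP counts emotion frames classified as emotion,
# FP (as defined in the text) counts emotion frames classified as non-emotion,
# and AC is the overall classification rate.
def evaluate(y_true, y_pred, positive="emotion"):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    ac = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    return tp, fp, ac

# Hypothetical ground-truth and predicted labels for five frames of a clip.
truth = ["emotion", "emotion", "neutral", "emotion", "neutral"]
pred  = ["emotion", "neutral", "neutral", "emotion", "emotion"]
print(evaluate(truth, pred))  # -> (2, 1, 0.6)
```

In practice, these counts would be accumulated separately over the training, testing and validation splits, mirroring the three column groups of Table III.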
TABLE I
MELTDOWN CRISIS DATASET DESCRIPTION

State: Post-Meltdown Crisis (18 videos)
Children: 1 girl; 1 girl; 1 boy; 1 boy; 1 boy; 1 boy; 2 girls; 1 girl; 1 boy
Scenarios:
- The girl makes stereotypical movements and shows abnormal facial expressions very rapidly.
- The child moves in the room in a very fast and continuous way; she presents stereotyped behaviors with her hands and head, with the presence of angry expressions.
- The child moves his eyes to different parts of the room in a quick and uncontrolled way.
- The child is in an uncomfortable situation; he is stimulated by the agitation of his comrade.
- The girls present abnormal behaviors when they are in post-meltdown crisis, with stereotypical movements of their head and hands.
- The children are in an uncomfortable condition and cannot stand being in a closed room.
Facial expressions: nervous smiles; grimaces of anger; howling with open mouth and closed eyes; howling with closed eyes.
Behaviors: stereotyped movements with hands and head; fixing eyes on the high ceiling; biting himself/herself; head in rapid motion; biting the toys; moving quickly in the room.
Challenges: the child is not engaged with the camera; a boy standing nearby disrupts the field of acquisition; the child's head keeps moving very fast; the child is agitated.

State: Meltdown Crisis (18 videos)
Children: 1 girl; 1 boy; 1 boy; 1 boy; 1 boy; 1 girl
Scenarios:
- The girl is in a meltdown crisis of a very rare type: she shows no abnormal behavior, but her eyes are fixed with loss of control.
- The child moves in the room quickly, gets on the ground and hits his feet; he is in a state of severe crisis.
- The child moves into the room quickly; he cannot stand being in a closed room and puts his hands over his ears.
- The child is in a state of severe crisis; he gets on the ground and strikes himself with his arms and his head.
- The child moves in the room in a very fast way.
- The child cannot stand being in a closed space and refuses to practice an educational activity.
- In order to have a toy, the child expresses himself in an aggressive way by screaming and closing his eyes.
Facial expressions: crying; howling with open mouth; howling with closed eyes; grimaces of anger; nervous smiles; fixed eyes.
Behaviors: hitting his feet on the floor; hitting himself; moving the head very fast; striking his hands against each other; jumping while hitting his feet on the floor; losing control over her body; putting his hands over his ears; striking himself with his arms and head; moving very fast in the room; moving in the room with stereotyped hand movements.
Challenges: the child is not engaged with the camera; the girl is in crisis and her face cannot be captured in a continuous way; the child is very agitated.

State: Normal State (23 videos)
Children: 4 boys; 1 girl; 1 boy; 1 girl
Scenarios:
- The children are not engaged with their caregivers and are isolated in their own world.
- The child plays with a doll.
- The child makes stereotyped movements with the toys.
- The girl combs her hair.
Facial expressions: neutral; smiles.
Behaviors: playing with a doll without abnormal physical activities; playing with toys; practicing educational activities; making stereotyped movements with toys; combing her hair.
Challenges: the children are not engaged with the camera.
TABLE II
FEATURES DESCRIPTION

Features            Description
D1                  Distance between Eye Left and Eye Right [21]
D2                  Distance between Eye Left and Nose [21]
D3                  Distance between Eye Left and Mouth Corner Right [21]
D4                  Distance between Eye Left and Mouth Corner Left [21]
D5                  Distance between Eye Right and Nose [21]
D6                  Distance between Eye Right and Mouth Corner Right [21]
D7                  Distance between Eye Right and Mouth Corner Left [21]
D8                  Distance between Mouth Corner Left and Mouth Corner Right [21]
D9                  Distance between Mouth Corner Left and Nose [21]
D10                 Distance between Mouth Corner Right and Nose [21]
Left Eye Closed     The user's left eye is closed.
Looking Away        The user is looking away.
Mouth Moved         The user's mouth moved.
Mouth Open          The user's mouth is open.
Right Eye Closed    The user's right eye is closed.
Fig. 4. The 10 geometric distances: D1, D2, D3, D4, D5, D6, D7, D8, D9 and D10.
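The features D1 to D10 in Table II are pairwise Euclidean distances between detected facial landmarks. A minimal sketch of how such features could be computed is shown below; the landmark names and pixel coordinates are illustrative assumptions, not the output of the paper's tracking pipeline:

```python
import math

def euclidean(p, q):
    """Euclidean distance between two 2-D landmark points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def distance_features(lm):
    """Compute D1..D10 from a dict of landmark coordinates.

    The landmark pairs mirror the D1..D10 definitions in Table II;
    `lm` maps hypothetical landmark names to (x, y) coordinates.
    """
    pairs = {
        "D1":  ("eye_left", "eye_right"),
        "D2":  ("eye_left", "nose"),
        "D3":  ("eye_left", "mouth_corner_right"),
        "D4":  ("eye_left", "mouth_corner_left"),
        "D5":  ("eye_right", "nose"),
        "D6":  ("eye_right", "mouth_corner_right"),
        "D7":  ("eye_right", "mouth_corner_left"),
        "D8":  ("mouth_corner_left", "mouth_corner_right"),
        "D9":  ("mouth_corner_left", "nose"),
        "D10": ("mouth_corner_right", "nose"),
    }
    return {name: euclidean(lm[a], lm[b]) for name, (a, b) in pairs.items()}

# Illustrative landmark positions (arbitrary pixel coordinates).
landmarks = {
    "eye_left": (100, 120), "eye_right": (160, 120), "nose": (130, 150),
    "mouth_corner_left": (110, 180), "mouth_corner_right": (150, 180),
}
features = distance_features(landmarks)
```

Because the distances are computed per frame, such a vector can be fed directly to a frame-level classifier such as RF or SVM.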
Fig. 5. Classification rates with RF and SVM classifiers.
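One of the feature-selection criteria compared in Table III is Information Gain, i.e. H(class) - H(class | feature). The following self-contained sketch computes it for discrete features; the toy labels below are illustrative and are not taken from the MeltdownCrisis dataset:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """IG(class; feature) = H(class) - sum_v p(v) * H(class | feature = v)."""
    n = len(labels)
    conditional = 0.0
    for v in set(feature_values):
        subset = [l for f, l in zip(feature_values, labels) if f == v]
        conditional += (len(subset) / n) * entropy(subset)
    return entropy(labels) - conditional

# Toy example: a feature that perfectly separates the two classes has
# IG equal to the class entropy (1 bit here); an uninformative one has IG 0.
labels  = ["meltdown", "meltdown", "normal", "normal"]
perfect = [1, 1, 0, 0]
useless = [0, 1, 0, 1]
ig_perfect = information_gain(perfect, labels)
ig_useless = information_gain(useless, labels)
```

Ranking features by this score and keeping the top-scoring ones is the "Information Gain algorithm" setting evaluated in Table III.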
TABLE III
RESULTS OF RF AND SVM

Accuracy (AC)
Feature Selection Method                  Algorithm   Training   Testing   Validation
Without feature selection method          RF          89,23%     87,10%    93,60%
                                          SVM         88,96%     86,41%    92,80%
Relief-f algorithm with average           RF          88,31%     86,14%    93,85%
                                          SVM         88,16%     85,84%    92,83%
Relief-f algorithm without average        RF          88,39%     86,93%    93,36%
                                          SVM         88,29%     86,06%    92,62%
Information Gain algorithm with average   RF          87,53%     86,19%    92,68%
                                          SVM         87,71%     86,36%    92,92%
Information Gain algorithm without average RF         88,59%     86,93%    93,54%
                                          SVM         88,40%     86,06%    92,46%
Wrapper method                            RF          88,31%     86,14%    93,91%
                                          SVM         88,16%     85,84%    92,83%

True Positive rate (TP)
Feature Selection Method                  Algorithm   Training   Testing   Validation
Without feature selection method          RF          89,0%      87,1%     93,6%
                                          SVM         89,0%      86,4%     92,8%
Relief-f algorithm with average           RF          88,3%      86,1%     93,9%
                                          SVM         88,2%      85,8%     92,8%
Relief-f algorithm without average        RF          88,4%      86,9%     93,4%
                                          SVM         88,3%      86,1%     92,6%
Information Gain algorithm with average   RF          87,5%      86,2%     92,7%
                                          SVM         87,7%      86,4%     92,9%
Information Gain algorithm without average RF         88,6%      86,8%     93,5%
                                          SVM         88,4%      86,1%     92,5%
Wrapper method                            RF          88,3%      86,1%     93,9%
                                          SVM         88,2%      85,8%     92,8%

False Negative rate (FN)
Feature Selection Method                  Algorithm   Training   Testing   Validation
Without feature selection method          RF          1,14%      1,32%     0,74%
                                          SVM         1,18%      1,50%     0,75%
Relief-f algorithm with average           RF          1,20%      1,41%     0,71%
                                          SVM         1,31%      1,58%     0,76%
Relief-f algorithm without average        RF          1,20%      1,33%     0,77%
                                          SVM         1,29%      1,55%     0,76%
Information Gain algorithm with average   RF          1,29%      1,41%     0,85%
                                          SVM         1,35%      1,51%     0,74%
Information Gain algorithm without average RF         1,17%      1,34%     0,75%
                                          SVM         1,28%      1,55%     0,79%
Wrapper method                            RF          1,20%      1,41%     0,70%
                                          SVM         1,31%      1,58%     0,76%

VI. CONCLUSION
In this study, we described the setting and the steps followed to build the MeltdownCrisis dataset. The proposed dataset is feature-based and covers all the stream types provided by the Kinect: color, body (skeleton), depth, infrared, and body index. To facilitate its exploitation, the dataset is organized in terms of scenarios covering 13 specific meltdown crisis cases: it contains 23 videos of children in a normal state, 18 videos of children in a post-crisis state, and 18 videos of children during a meltdown crisis. A set of features extracted from the MeltdownCrisis dataset was evaluated with several classification algorithms, and the best results were obtained with the RF classifier.
In future work, we will investigate the temporal aspect, evaluate the skeleton features, and explore deep learning algorithms.
REFERENCES
[1] A. Zunino, P. Morerio, A. Cavallo, C. Ansuini, J. Podda, F. Battaglia, E. Veneselli, C. Becchio, and V. Murino, "Video gesture analysis for autism spectrum disorder detection," in 2018 24th International Conference on Pattern Recognition (ICPR). IEEE, 2018, pp. 3421-3426.
[2] Centers for Disease Control and Prevention, 2018. [Online]. Available: https://www.cdc.gov/
[3] M. Bennie, "Tantrum vs autistic meltdown: What is the difference," Autism Awareness, 2016.
[4] Z. Zhang, "Microsoft Kinect sensor and its effect," IEEE Multimedia, vol. 19, no. 2, pp. 4-10, 2012.
[5] T. Guha, Z. Yang, A. Ramakrishna, R. B. Grossman, D. Hedley, S. Lee, and S. S. Narayanan, "On quantifying facial expression-related atypicality of children with autism spectrum disorder," in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2015, pp. 803-807.
[6] W. Liu, M. Li, and L. Yi, "Identifying children with autism spectrum disorder based on their face processing abnormality: A machine learning framework," Autism Research, vol. 9, no. 8, pp. 888-898, 2016.
[7] S. Jaiswal, M. F. Valstar, A. Gillott, and D. Daley, "Automatic detection of ADHD and ASD from expressive behaviour in RGBD data," in 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017). IEEE, 2017, pp. 762-769.
[8] M. L. Anjum, O. Ahmad, S. Rosa, J. Yin, and B. Bona, "Skeleton tracking based complex human activity recognition using Kinect camera," in International Conference on Social Robotics. Springer, 2014, pp. 23-33.
[9] C. Sinthanayothin, N. Wongwaen, and W. Bholsithi, "Skeleton tracking using Kinect sensor & displaying in 3D virtual scene," International Journal of Advancements in Computing Technology, vol. 4, no. 11, 2012.
[10] O. Rihawi, D. Merad, and J.-L. Damoiseaux, "3D-AD: 3D-autism dataset for repetitive behaviours with Kinect sensor," in 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS). IEEE, 2017, pp. 1-6.
[11] L. Zhao, Z. Wang, and G. Zhang, "Facial expression recognition from video sequences based on spatial-temporal motion local binary pattern and Gabor multiorientation fusion histogram," Mathematical Problems in Engineering, vol. 2017, 2017.
[12] L. Yi, P. C. Quinn, Y. Fan, D. Huang, C. Feng, L. Joseph, J. Li, and K. Lee, "Children with autism spectrum disorder scan own-race faces differently from other-race faces," Journal of Experimental Child Psychology, vol. 141, pp. 177-186, 2016.
[13] O. Golan and S. Baron-Cohen, "Systemizing empathy: Teaching adults with Asperger syndrome or high-functioning autism to recognize complex emotions using interactive multimedia," Development and Psychopathology, vol. 18, no. 2, pp. 591-617, 2006.
[14] A. Postawka and P. Śliwiński, "A Kinect-based support system for children with autism spectrum disorder," in International Conference on Artificial Intelligence and Soft Computing. Springer, 2016, pp. 189-199.
[15] M. A. Nicolaou, H. Gunes, and M. Pantic, "Automatic segmentation of spontaneous data using dimensional labels from multiple coders," in Proc. of LREC Int. Workshop on Multimodal Corpora: Advances in Capturing, Coding and Analyzing Multimodality. Citeseer, 2010, pp. 43-48.
[16] J. Kossaifi, G. Tzimiropoulos, S. Todorovic, and M. Pantic, "AFEW-VA database for valence and arousal estimation in-the-wild," Image and Vision Computing, vol. 65, pp. 23-36, 2017.
[17] R. Calvo, S. D'Mello, J. Gratch, A. Kappas, M. Lhommet, and S. Marsella, "Expressing emotion through posture and gesture," 2015.
[18] E. Marinoiu, M. Zanfir, V. Olaru, and C. Sminchisescu, "3D human sensing, action and emotion recognition in robot assisted therapy of children with autism," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 2158-2167.
[19] H.-H. Wu and A. Bainbridge-Smith, "Advantages of using a Kinect camera in various applications," University of Canterbury, 2011.
[20] Microsoft, "Kinect Studio v2.0," 2016. [Online]. Available: https://www.microsoft.com/en-us/download/details.aspx?id=44561
[21] F. Z. Salmam, A. Madani, and M. Kissi, "Emotion recognition from facial expression based on fiducial points detection and using neural network," International Journal of Electrical and Computer Engineering, vol. 8, no. 1, p. 52, 2018.
[22] P. Boyer, Algèbre et géométries. Calvage & Mounet, 2015.
[23] Y. Zhang, S. Wang, P. Phillips, and G. Ji, "Binary PSO with mutation operator for feature selection using decision tree applied to spam detection," Knowledge-Based Systems, vol. 64, pp. 22-31, 2014.
[24] O. Villacampa, "Feature selection and classification methods for decision making: A comparative analysis," 2015.
[25] O. D. Lara and M. A. Labrador, "A survey on human activity recognition using wearable sensors," IEEE Communications Surveys & Tutorials, vol. 15, no. 3, pp. 1192-1209, 2012.