Emotional State Classification in Patient–Robot
Interaction Using Wavelet Analysis and
Statistics-Based Feature Selection
Manida Swangnetr and David B. Kaber, Member, IEEE
Abstract—Due to a major shortage of nurses in the U.S., future
healthcare service robots are expected to be used in tasks involving
direct interaction with patients. Consequently, there is a need to
design nursing robots with the capability to detect and respond
to patient emotional states and to facilitate positive experiences
in healthcare. The objective of this study was to develop a new
computational algorithm for accurate patient emotional state
classification in interaction with nursing robots during medical
service. A simulated medicine delivery experiment was conducted
at two nursing homes using a robot with different human-like
features. Physiological signals, including heart rate (HR) and galvanic skin response (GSR), as well as subjective ratings of valence
(happy–unhappy) and arousal (excited–bored) were collected on
elderly residents. A three-stage emotional state classification algorithm was applied to these data, including: 1) physiological feature extraction; 2) statistics-based feature selection; and 3) a machine-learning model of emotional states. A pre-processed HR signal
was used. GSR signals were nonstationary and noisy and were
further processed using wavelet analysis. A set of wavelet coefficients, representing GSR features, was used as a basis for
current emotional state classification. Arousal and valence were
significantly explained by statistical features of the HR signal and
GSR wavelet features. Wavelet-based de-noising of GSR signals led to an increase in the percentage of correct classifications of emotional states and clearer relationships between the physiological responses and arousal and valence. The new algorithm may serve
as an effective method for future service robot real-time detection
of patient emotional states and behavior adaptation to promote
positive healthcare experiences.
Index Terms—Emotions, machine learning, physiological variables, regression analysis, service robots, wavelet analysis.

Manuscript received June 6, 2011; revised February 10, 2012; accepted June 18, 2012. Date of publication September 12, 2012; date of current version December 21, 2012. This work was supported in part by the Edward P. Fitts Department of Industrial and Systems Engineering at North Carolina State University. This paper was recommended by Associate Editor R. Roberts of the former IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans (2011 Impact Factor: 2.123).

M. Swangnetr is with the Back, Neck, and Other Joint Pain Research Group, Department of Production Technology, Khon Kaen University, Khon Kaen 40002, Thailand (e-mail: manida@kku.ac.th).

D. B. Kaber is with the Edward P. Fitts Department of Industrial and Systems Engineering, North Carolina State University, Raleigh, NC 27695-7906 USA (e-mail: dbkaber@ncsu.edu).

Digital Object Identifier 10.1109/TSMCA.2012.2210408
I. INTRODUCTION

USE OF service robots in healthcare operations represents a potential technological solution to reduce overload on nursing staffs in hospitals [1] for critical tasks and to increase accuracy and reliability in basic nursing task performance (e.g., medication administration). During the past two decades, the healthcare industry, including hospitals and nursing homes, has identified the capability of mobile transport robots to assist nurses in routine patient services. Hospital delivery robots are now used to automatically transport prescribed medicines to nurse stations; meals and linens to patient rooms; and
medical records or specimens to labs. Existing commercial
robots, like the Aethon Tug, are capable of autonomous point-to-point navigation in healthcare environments. Although such
robots have been implemented in many hospitals, they do not
deliver medicines or other healthcare-related materials directly
to patients. Nurses must always serve as intermediaries between robots and patients. As a result, current robot designs do not
support direct interaction with patients. The robotics industry
is currently seeking research results on how different features
of robots support interactive tasks or social interaction between
humans and robots as part of healthcare operations.
In nursing operations, patient care requires timely and careful
task performance as well as support of positive patient emotions. If future service robots are to be used in tasks involving
direct interaction with patients, it is important to understand
how patients perceive robots and evaluate robot performance
as a basis for judging healthcare quality. Since emotions play
an important role for patients in communication and interaction
with hospital staff (e.g., describing pain), there is a need to
design service robots to be capable of detecting, classifying, and
responding to patient emotions to achieve positive healthcare
experiences.
A. Subjective Measures of Emotional States
Russell's [2] theory of emotion is the most widely accepted in psychology and contends that all emotions can be organized in terms of continuous dimensions. The theory includes a 2-D emotion space, defined by valence (pleasantness/unpleasantness) and arousal (strong engagement/disengagement), as presented in Fig. 1.

Fig. 1. Two-dimensional valence-arousal structure of affect (adapted from Watson & Tellegen, 1985).

Watson and Tellegen [3] suggested that the axes of the emotion space should pass through regions where emotion labels used by individuals are most densely clustered. They conducted a factor analysis with varimax rotation resulting in a model, including positive affect (PA) and negative affect (NA), which was a 45° rotation from Russell's [2] valence-arousal model. However, Russell and Feldman-Barrett [4] showed that when placing more emotion terms into the 2-D space, the increased term density provided
no further guidance for factor model rotation. Reisenzien [5]
also offered that the valence-arousal model provided conceptually separate building blocks of core affective feelings. For
example, Reisenzien said enthusiasm is a PA that is a blend of
pleasantness and strong engagement. He also said distress is a
NA that is a blend of unpleasantness and strong engagement.
For this reason, in the present study, Russell’s [2] 2-D model of
emotion was used.
Subjective measures of emotion include self-reports, interviews on emotional experiences, and questionnaires on which
a participant identifies images of expressions or phrases that
most closely resemble their current feelings. Previous research
has used image-based self-report measures for assessing human
emotional states in terms of valence and arousal, such as the
self-assessment Manikin (SAM) [6]. Image-based questionnaires are designed to overcome the disadvantage of subjects
having to label emotions, which can lead to inconsistency in
responses. The SAM consists of pictures of manikins representing five states of arousal (ranging from “excited” to “bored”)
and five states of valence (ranging from “very happy” to “very
unhappy”). Subjects can rate their current emotional state by
either selecting a manikin or marking in a space between two
manikins, resulting in a nine-point scale.
B. Physiological Measures of Emotional States
Physiological responses have also been identified as reliable
indicators of human emotional and cognitive states. They are
considered automatic outcomes of the autonomic nervous system (ANS), primarily driven by emotions. The ANS is composed of two main subsystems: the sympathetic nervous system, which tends to mobilize the body for emergencies (the "fight-or-flight" response), and the parasympathetic nervous system, which tends to conserve and store bodily resources (the "rest-and-digest" response).
Among several physiological measures, heart rate (HR) and
galvanic skin response (GSR) are the two most commonly used
for revealing states of arousal and valence [7]. These responses
are also relatively simple and inexpensive to measure with
minimal intrusiveness to subject behavior [8].
1) HR: HR is usually measured in milliseconds by detecting "QRS" wave complexes in an electrocardiographic (ECG) record and determining the intervals between adjacent "R" wave peaks.
wave peaks. HR, in beats per minute (bpm), can be directly
calculated from “RR” intervals. There are also several statistical
measures that can be determined on HR (see Malik et al. [9] for
a comprehensive description). Common features used in studies
of human emotion and cognitive states include: mean HR (in
bpm) and the standard deviation of HR (SDHR), also expressed
in bpm.
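As a rough illustration of this computation (our own sketch, not the authors' code; function and variable names are assumptions), the following Python fragment derives instantaneous HR in bpm from R-R intervals and returns the two features named above:

```python
import numpy as np

def hr_features(rr_intervals_ms):
    """Mean HR and SDHR (bpm) from ECG R-R intervals in milliseconds.

    Illustrative sketch only; names are assumptions, not the paper's code.
    """
    hr_bpm = 60000.0 / np.asarray(rr_intervals_ms, dtype=float)  # ms -> bpm
    return {"mean_hr": hr_bpm.mean(),       # mean HR (bpm)
            "sdhr": hr_bpm.std(ddof=1)}     # standard deviation of HR (bpm)

# Example: R-R intervals near 800 ms correspond to roughly 75 bpm.
print(hr_features([790, 810, 805, 795, 800]))
```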
HR has been previously used to differentiate between user
positive and negative emotions in human–computer interaction
(HCI) tasks. Mandryk and Atkins [10] developed fuzzy rules,
based on a literature review, defining how physiological signals
related to arousal and valence. They asserted that when HR
is high, arousal and valence are also high. However, other
studies have shown that there are no observed HR differences
between positive and negative emotions [11]–[13]. Therefore,
the relationships between HR and valence and arousal may not
be definite. Due to the practicality of HR for emotion state
classification, further assessment of these relationships was
conducted in this study.
2) GSR: GSR measures electrodermal activity in terms of
changes in resistance across two regions of skin. A voltage
is applied between two electrodes attached to the skin and
the resulting current is proportional to the skin conductance
(SC, μSiemens), or inversely proportional to the skin resistance
(MOhms). The response is typically large and varies slowly over
time; however, it has been found to fluctuate quickly during
mental, physical, and emotional arousal. A GSR signal consists
of two main components: SC level (SCL) or tonic level—this
refers to the baseline level of response; and SC response (SCR)
or phasic response—this refers to changes from the baseline
causing a momentary increase in SC (i.e., a small wave superimposed on the SCL). SCR normally occurs in the presence of
a stimulus; however, an SCR that appears during rest periods, or in the absence of stimuli, is referred to as "nonspecific" SCR (NS-SCR). Fig. 2 illustrates a typical GSR waveform.
Dawson et al. [7] identified GSR measures related to emotional state, including: SCL; change in SCL; frequency of NS-SCRs; SCR amplitude; SCR latency; SCR rise time; SCR half recovery time; SCR habituation (number of stimuli occurring before no response); and slope of SCR habituation. The most commonly used measure is the amplitude of SCR. Change in
commonly used measure is the amplitude of SCR. Change in
SC is widely accepted to reflect cognitive activity and emotional
response with a linear correlation to arousal. Specifically, SC
has been found to increase as arousal increases [7], [10], [14],
[15]. However, the relationship between SC and valence is not
definite. Although Dawson et al. [7] report that in most studies
SC does not distinguish positive from negative valence stimuli,
other research has demonstrated some relationship between SC
and valence. For example, increases in the magnitude of SC
have been associated with negative valence [15], [16].
Relatively few studies have examined the psychological significance of SCL and NS-SCRs produced during the performance of on-going tasks. Dawson et al. [7] reported relations
between these measures and task engagement as well as emotions. Typically, SCL increases about 1 μSiemen above resting
level during anticipation of a task and then increases another
1 to 2 μSiemens during performance. Pecchinenda and Smith
[17] measured NS-SCR rate, and maximum amplitude and
slope of SCL during a difficult problem-solving task. Results
showed that SC activity increased at the start of trials but
decreased by the end under the most difficult condition.
C. Current Modeling Approaches for Classifying
Emotional States
Although physiological signals can be noisy and some may
lack definitive relationships with emotional states, numerous
studies have been conducted on emotional state classification
using such objective data (e.g., [10], [15]–[20]). Emotion modeling approaches vary in terms of both physiological variable inputs and classification methods. Analysis of variance
(ANOVA) and linear regression are the most common for
identifying physiological signal features that may be significant
in differentiating among emotional states and selecting sets of
significant features for predicting emotions, respectively [18],
[19]. However, these approaches make the assumption of linear
relationships between physiological responses and emotional
states.
Fuzzy logic models are an alternative classification approach
to deal with nonlinear relationships and uncertainty among system inputs and outputs. Such models can represent continuous
processes that are not easily divided into discrete segments;
that is, when a change from one linguistically defined state to
another is not clear. Mandryk and Atkins [10] developed a fuzzy
logic model to transform HR, GSR, and facial electromyography (EMG) for smiling and frowning to arousal and valence
states during video game play. A second model was used to
classify arousal and valence into five lower level emotional
states related to the gaming situation, including: boredom, challenge, excitement, frustration, and fun. Model results revealed
the same trends as self-reported emotions for fun, boredom,
and excitement. This approach has the advantage of describing
variations among specific emotional states during the course of
a complete emotional experience. The major drawback is that the fuzzy rules used in a fuzzy system for classification problems must be constructed manually by experts in the problem domain.
However, as described above, previous research has not been
able to clearly define the relationships between physiological
responses (e.g., HR and GSR) and specific emotional states
(valence and arousal).
Machine learning approaches have also been used to deal
with nonlinear relationships and uncertainty in emotion clas-
sification based on physiological responses. Machine learning
algorithms can automatically generate models, including rules
and patterns, from data. Supervised learning algorithms are
used to identify functions for mapping inputs to desired outputs
based on training data and are later validated against test data.
Artificial neural networks (ANN) are a common form used for
human emotional state classification. Lee et al. [16] applied a
multilayer perceptron (MLP) network to recognize emotions
from the standard deviation of ECG RR intervals (in ms), the
root mean-square of the difference in successive RR intervals
(in ms), meanHR, the low-frequency/high-frequency ratio of
HR variability (HRV) and the SC magnitude for GSR. By using
ratings from the SAM questionnaire as desired outputs, the network was able to learn sadness, calm pleasure, interesting pleasure, and fear with correct classifications of 80.2%. Lisetti and
Nasoz [20] compared three different machine learning algorithms, including k-nearest neighbor (KNN), discriminant function analysis (DFA), and a neural network using a Marquardt
backpropagation (MBP) algorithm, for emotion classification
in a HCI application. They used minimum, maximum, mean,
and variance values of normalized GSR, body temperature, and
HR as algorithm inputs while emotion states/outputs (sadness,
anger, fear, surprise, frustration, and amusement) were elicited
based on movie clips. Results showed that emotion recognition
by the KNN was 72.3% accurate, the DFA was 75.0% accurate,
and the MBP NN was 84.1% accurate.
Instead of learning from labeled data, unsupervised learning approaches automatically discover patterns in a data set.
Amershi et al. [15] applied an unsupervised clustering technique to affective expressions in educational games. They
identified several influential features from SC, HR, and EMG
for smile and frown muscles. Results showed that only a few
statistical features of the responses (e.g., mean and standard
deviation) were relevant for defining clusters. In addition, clustering was able to identify meaningful patterns of reactions
within the data.
Some studies have been conducted on recognizing human
emotional states when interacting with robots. Itoh et al. [21]
developed a bioinstrumentation system to measure human
stress states during HRI. However, stress responses do not
strictly covary with emotional responses. For example, different
levels of stress may occur with an emotional response of fear.
Moreover, this study only used HRV as an indicator of stress
based on a defined classification rule in which the ratio of
sympathetic/parasympathetic responses increases if a human
feels stress. Kulic and Croft [22] used physiological responses
to assess subject emotional states of valence and arousal by
using the motion of a robot manipulator arm as a stimulus. The
emotional state assessment was based on a rule-based classification model constructed from a literature review. Subjective
responses were used separately to measure participant discrete
emotion responses to the robot motion. A series of studies by
Liu and Sarkar et al. [23]–[25] developed a human emotional
state assessment model based on physiological responses in a
HCI scenario. The model was then used for real-time emotion
classification in the same scenario. However, the manner in
which a human interacts with a robot is similar but not identical
to interactions between a human and a computer [26].
D. Motivation and Objective
Real-world applications for interactive robots in hospital
environments are being developed. Physiological responses,
such as HR and GSR signals from patients can be monitored,
with minimal intrusiveness, in real-time in hospitals for determining patient status. Therefore, it is possible that physiological
measures can be extracted and emotional states classified in
real-time during patient interaction with robots. This situation
provides a basis for performing real-time robot expression
modification according to current human emotional states and
to ensure quality in robot-assisted healthcare operations.
Although there are several on-going research studies on
emotional state identification based on physiological data,
the literature reveals relatively few classification models for
recognizing human emotions when interacting with service
robots, particularly nursing robots in medicine delivery tasks.
Some methods of emotional classification have been adopted
and/or modified for testing in human interaction with humanoid
robots (e.g., [21]). However, the manner and circumstances
in which patients interact with a robot may be similar but
are not identical to interactions between a healthy human and
a robot in other situations. As nursing robots are expected
to become common in hospitals of the future, it is important to
develop accurate methods for assessing patient responses to
such robots providing medical services. It also appears that
the relationships between physiological responses (e.g., HR
and GSR) and emotional states (valence and arousal) are not
well-defined. Therefore, further assessment is needed through
sensitive physiological feature identification along with robust
emotional state classification modeling.
One major problem when analyzing physiological signals is
noise interference. Additional signal processing must be applied to attenuate noise without distorting signal characteristics.
Since physiological signals are nonstationary [27] and may
include random artifacts and other unpredictable phenomena,
they are problematic for several signal processing methods, such
as fast Fourier transform (FFT). A wavelet transform, which is
a tool for analysis of transient, nonstationary or time-varying
phenomena, is often useful for such processing.
Prior studies have represented stochastic physiological signals using statistical features (based on expert domain knowledge) to classify emotional states. Unfortunately, information
can be lost with such features as simplifying assumptions are
made, including knowledge of the probability density function
of the data. Furthermore, there may be signal features that
have not been identified by experts, but have the potential to
significantly improve emotion classification accuracy. It has
been suggested that signal processing features may be useful
for this purpose [27].
Considering these research issues, the objectives of the
present study were to:
1) Assess the relationships between physiological responses, including HR and GSR, and emotional states
in terms of valence and arousal during patient–robot
interaction (PRI)—Arousal states were expected to be
better explained by GSR features; while, valence states
were expected to be better explained by HR features.
2) Develop a machine learning algorithm for accurate patient emotional state classification with the potential to
classify states in real-time during PRI.
3) Examine the utility of advanced signal processing features for representing physiological signals in emotional
state identification—Wavelet coefficients can be used as
a compressed representation of amplitude, time, and frequency features of physiological signals.
4) Develop a wavelet-based de-noising algorithm by identifying the noise distribution and features of a reference
signal to eliminate those noise features overlapping with
the informative signal frequency.
5) Identify significant wavelet-based features for emotional
state classification—A statistical approach can be used to
identify physiological features with utility for classifying
emotional states.
II. EXPERIMENT
An experiment was conducted to develop an empirical data
set that could be used as a basis for addressing the above
objectives. Observations on human emotional states and physiological responses in interacting with an actual service robot in
the target context were needed to develop the emotional state
classification algorithm and to demonstrate the wavelet-based
signal processing methods.
A. Procedure
With the aging U.S. population, the Health Resources and Services Administration predicted that elderly persons will represent the primary future users of healthcare facilities for age-related healthcare needs. Therefore, the elderly will likely be the
largest user group of nursing robots in the future. For the present
study, we recruited 24 residents at senior centers (17 females
and 7 males) in Cary, North Carolina. They ranged in age
from 63 to 91 years with a mean of 80.5 years and a standard
deviation of 8.8 years.
At the beginning of the experiment, participants read and
signed an informed consent and completed a background survey. They were then provided with a brief introduction on
nursing robots and applications in healthcare tasks, including medicine delivery. This was followed by a familiarization
with the SAM form for emotion ratings and the physiological
measurement. A Polar HR monitor (Polar Electro Inc.) was
used, including a S810i wrist receiver and T31 transmitter
attached to an elastic strap around the chest area. The monitor
recorded heart activity in RR intervals, and the Polar Precision
Performance software was used for analysis.
An iWorx GSR-200 amplifier (iWorx Systems, Inc.) was
used to apply a voltage between electrodes placed on the surface
of a subject’s index and ring fingertips. Factory calibration of
the amplifier equated 1 volt to a SC of 5 μSiemens. The output
voltages of the GSR signal were transmitted to a DT9834 data
acquisition system (Data Translation, Inc.) and finally recorded
in a computer using quickDAQ software (Data Translation,
Inc.) with a sampling rate of 1024 Hz.
Fig. 3. PeopleBot platform.
Once the HR monitor and GSR electrodes were placed on a
subject, they were asked to sit and relax on a sofa located in a
simulated patient room. During this period, 1 min of HR and
GSR data was recorded and the mean responses were later used
as baselines [28] for normalizing test trial data.
There were six events identified during each experiment test
trial. At the beginning of a trial, a PeopleBot robot (see Fig. 3)
entered the simulated patient room (Event 1), with a container
of medicine in a gripper and stopped in front of the subject
(Event 2). The robot notified subjects of its arrival (Event 3)
and released the bottle of medicine from its gripper (Event 4).
The robot then waited for a short period of time before it turned
around (Event 5) and left the room (Event 6).
The design of the experiment was a randomized complete
block design with three independent variables representing
robot feature manipulations, including robot facial features
(see Fig. 4; abstract or android), speech capability (synthesized
or digitized voice), and different modes of user interaction
(i.e., visual messages or physical confirmation of receipt of
medication with a touch screen). Each subject was exposed to
all settings of each variable. No interactions of features were
studied in the experiment. The order of presentation of the
control condition (i.e., a robot without any features) among
stimulus trials was randomly assigned.

Fig. 4. Some robot feature manipulations.
The physiological data (HR and GSR) were collected
throughout trials. At the end of each trial, subjects completed
the SAM questionnaire, indicating their emotional response
to the specific robot configurations. After subjects completed
14 test trials (2 replications of 2 levels of the face, voice
and interactivity conditions, plus 1 control condition), a final
interview was conducted in which they provided comments on
their impressions of the robot configurations. On average, each
subject took ∼50 min. to complete the experiment.
B. Data Post-Processing and Overall Results
To address individual differences in internal scaling of emotions and physiological responses, all observations on the response measures were normalized for each participant. The
arousal and valence ratings from the SAM questionnaire were
converted to z-scores. Normalized ratings were then categorized as “low,” “medium,” or “high” levels of valence and
arousal. There were two subjects who did not follow experiment
instructions in the arousal and valence ratings, and for this
reason their data were considered invalid and were excluded
from analysis.
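A minimal sketch of this normalization and categorization step is given below (Python; the paper does not report its exact category cut points, so the thresholds here are purely illustrative assumptions):

```python
import numpy as np

def categorize_sam_ratings(ratings):
    """Z-score one participant's SAM ratings, then bin into three levels.

    The +/-0.5 z-score cut points are our assumption for illustration;
    the paper does not state how the low/medium/high bins were defined.
    """
    r = np.asarray(ratings, dtype=float)
    z = (r - r.mean()) / r.std(ddof=1)
    bins = np.digitize(z, [-0.5, 0.5])  # 0 below -0.5, 1 between, 2 above 0.5
    return np.array(["low", "medium", "high"])[bins]

print(categorize_sam_ratings([2, 5, 5, 8, 3, 7]))  # nine-point SAM ratings
```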
The physiological response measures, HR (in bpm) and GSR (in μSiemens), were also normalized by subtracting the mean baseline response ($\bar{Y}_{\mathrm{baseline}}$) from each test trial reading ($Y_i$) and dividing by the maximum absolute difference for every participant, as shown in the following equation:

$$Y_{\mathrm{normalized}} = \frac{Y_i - \bar{Y}_{\mathrm{baseline}}}{\max\left[\,\mathrm{abs}(Y_i - \bar{Y}_{\mathrm{baseline}})\,\right]} \qquad (1)$$
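In code, the normalization of (1) can be sketched as follows (hypothetical helper; array names are ours):

```python
import numpy as np

def normalize_response(trial_samples, baseline_mean):
    """Baseline normalization per Eq. (1); sketch under our reading of it.

    trial_samples: HR (bpm) or GSR (uS) readings from one test trial.
    baseline_mean: mean response over the participant's 1-min rest period.
    Returns values scaled into [-1, 1] for that participant.
    """
    diff = np.asarray(trial_samples, dtype=float) - baseline_mean
    return diff / np.max(np.abs(diff))

print(normalize_response([72.0, 75.0, 71.0, 78.0], baseline_mean=70.0))
```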
Results of ANOVAs on the two subjective measures of
emotion, arousal, and valence indicated significant differences
in emotional state depending on robot feature settings. However, no one feature appeared more powerful than any other
for facilitating positive emotional experiences. Regarding the
physiological measures, there were also significant differences
among robot feature types. The interactivity condition produced greater HR and GSR responses than the face and voice
conditions when interacting with the robot. However, when
making comparisons among the settings of each feature, only
the levels of interactivity appeared to differ significantly. Correlation analyses were also conducted between the physiological and subjective ratings of emotions. Results revealed a strong positive relation between valence and HR, but no significant correlation between arousal and GSR.

TABLE I
ANOVA Results on HR for Levels of Valence and Arousal at Specific Events (Note: F-statistics include numerator and denominator degrees of freedom. P-values reveal significance levels for each test. Bold values are significant.)

TABLE II
ANOVA Results on GSR for Levels of Valence and Arousal at Specific Events

Fig. 5. Post-hoc results on GSR for levels of arousal at significant stimulus events.
III. ANALYTICAL METHODOLOGY
A. Event Selection and Analytical Procedure
Analysis of physiological measures as a basis for emotional
state classification is generally conducted on an event basis.
Ekman [29] observed that human emotional responses typically
last between 0.5 and 4 s. Consequently, short recording periods
(< 0.5 s) might cause certain emotions to be missed or long
periods (> 4 s) might lead to observations of mixed emotions
[30], [31]. In addition, Dawson et al. [7] indicated that any SCR beginning between 1 and 3 s (or 1 and 4 s) after stimulus onset can be considered to be elicited by that stimulus. We
used a 4-s time window for physiological data recording after
an event in the experiment trials for data analysis purposes.
One-way ANOVA models were structured for each event and
used to identify that event providing the greatest degree of
discrimination of physiological responses (i.e., HR and GSR)
based on subject emotional states, which were categorized as
three levels of arousal and three levels of valence. Experiment
results, shown in Tables I and II, revealed significant differences
in HR for the levels of valence during Events 4 (opening
gripper), 5 (subject accepting medicine) and 6 (robot leaving
room); while there were significant differences in GSR for the
levels of arousal during Events 2 (robot moving in front of
patient), 4 and 6.
Across these analyses, only Events 4 and 6 supported HR
and GSR for discriminating among emotional states. Event
6 occurred after the robot moved from the patient room and
was not considered a potential stimulus. Therefore, Event 4,
the robot opening its gripper to release the medicine bottle
to a patient, was selected as the stimulus event for further
investigation.
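The per-event screening can be illustrated with a one-way ANOVA in Python (synthetic numbers only; these are not observations from the study):

```python
from scipy.stats import f_oneway

# Normalized GSR at one event, grouped by self-reported arousal level
# (fabricated, clearly-labeled illustration data).
low    = [0.10, 0.15, 0.12, 0.09]
medium = [0.18, 0.22, 0.20, 0.17]
high   = [0.35, 0.41, 0.38, 0.44]

f_stat, p_value = f_oneway(low, medium, high)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")  # small p: event discriminates arousal
```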
Fig. 6. Post-hoc results on HR for levels of valence at significant stimulus
events.
Further post-hoc analyses were conducted using Tukey’s
tests on HR and GSR responses recorded at influential stimulus
events to identify significant differences among the levels of
emotional state (arousal and valence). Results revealed high
arousal to be associated with higher GSR than low and medium
arousal (see Fig. 5); whereas medium valence was associated
with higher HR than low and high valence for significant
stimulus events (Fig. 6).
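A Tukey HSD comparison of this kind can be reproduced with statsmodels, again on stand-in data:

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Synthetic normalized GSR values at one stimulus event (illustration only).
gsr = np.array([0.10, 0.12, 0.09, 0.20, 0.18, 0.22, 0.38, 0.41, 0.35])
arousal = ["low"] * 3 + ["medium"] * 3 + ["high"] * 3

print(pairwise_tukeyhsd(gsr, arousal, alpha=0.05))  # pairwise level contrasts
```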
Fig. 7 presents the overall analytical procedure used in this
study. Initially, we sought to extract statistical and wavelet
features from the raw physiological signals. We then used a
statistical approach to select significant or relevant features
for use in the machine learning model for emotional state
classification. The following subsections describe each of these
steps in detail.

Fig. 7. Procedure for development of the emotional state classification algorithm.
B. Physiological Feature Extraction: HR Analysis
Based on the prior research (e.g., [32]), several statistical
features of the HR response were identified for investigation in
classifying emotional states. These included mean and median
HR as measures of centrality and SDHR and the range of the
response as measures of variation. (The 4-s time window for
data analysis was not sufficient for determining HRV or other
HR features in the frequency domain.) Plots of the normalized
HR response distributions for some data collected in the study
revealed symmetry with a small number of outliers in a few
trials. For such distributions, based on small sample sizes, the
sample mean is a more efficient estimator of the population
mean (i.e., it has a smaller variance) than the median [33].
Therefore, mean HR was used for analysis purposes. Based
on the data set, either SDHR or the range were considered
suitable for estimating the population variance. These variables
were selected for subsequent statistical analysis to identify
physiological signal features for classifying emotional states.
TABLE III
MSEs Calculated on the Differences Between the Reconstructed Signals for Various dbN Wavelets
C. Physiological Feature Extraction: GSR Analysis
1) Wavelet Selection: As suggested above, the recorded
GSR signals were nonstationary and noisy and required further
signal processing. A small wavelet, referred to as the “mother”
wavelet, was translated and scaled to obtain a time-frequency
localization of the GSR signal. This leads to most of the energy
of the signal being well represented by a few wavelet expansion
coefficients. The mother wavelet was chosen primarily based
on its similarity to the raw GSR signal [27]. Among several
families of wavelets, Daubechies (dbN), Symlets (symN), and
Coiflets (coifN) families have the properties of orthogonality
and compactness that can support fast computation algorithms.
Unlike the near symmetrical shape of symN and coifN, an
asymmetrically shaped dbN wavelet was found to closely match
the GSR waveform. Although previous studies have used dbN
wavelets to analyze GSR signals [34], [35], the choice of
wavelets was subjective and not primarily based on the shape of
a typical GSR signal. Therefore, these approaches may not have
captured all informative features of the signal or eliminated
noise. Gupta et al. [36] suggested that if the form of the wavelet
is matched to the form of the signal, such that the maximum
energy of the signal is accounted for in the initial scaling space,
then the energy in the next lower wavelet subspace should be very small. The mother wavelet that produces the minimum mean square error [MSE; see (2)] between two signals is the best match to the signal:

$$\mathrm{MSE} = \frac{\int \left[x(t) - \hat{x}(t)\right]^2\,dt}{n-1} \qquad (2)$$
The typical frequency of the GSR signal is 0.0167–0.25 Hz
[7]. Therefore, reconstruction of the signal in the wavelet scale,
including frequencies below 0.5 Hz, will represent the GSR
signal, x(t); whereas signal reconstruction on the next lower
wavelet scale (including higher frequencies) will capture both
the GSR signal and noise, x̂(t). (Note: As the wavelet scale
decreases, the represented frequencies increase.) Several prior studies have recommended that the cutoff frequency for high-frequency noise be at least double the highest signal frequency.
The MSEs of the differences between these two reconstructed signals for the common dbN wavelets are shown in Table III. Results
indicated that db3 was the most appropriate choice of mother
wavelet to represent the GSR signal.
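A sketch of this selection criterion using PyWavelets is given below. The decomposition level, chosen so that the 0.5-Hz boundary falls between retained bands at fs = 1024 Hz, is our assumption (PyWavelets will warn that this level is deep for a 4-s window, but it still computes the result):

```python
import numpy as np
import pywt

def wavelet_match_mse(gsr, wavelet, level=11):
    """Mother-wavelet fit score in the spirit of Gupta et al. [36].

    At fs = 1024 Hz, an 11-level decomposition places content below
    0.5 Hz in the approximation plus the coarsest detail band; x keeps
    only those bands, while xh additionally keeps the next (0.5-1 Hz)
    band. A small MSE between them suggests the wavelet concentrates
    the signal energy in the coarse scales. Sketch only, not the
    authors' code.
    """
    coeffs = pywt.wavedec(gsr, wavelet, level=level)

    def keep(n):  # zero all but the first n coefficient groups
        return [c if i < n else np.zeros_like(c) for i, c in enumerate(coeffs)]

    x = pywt.waverec(keep(2), wavelet)    # content <= 0.5 Hz
    xh = pywt.waverec(keep(3), wavelet)   # plus the 0.5-1 Hz band
    return np.sum((x - xh) ** 2) / (len(x) - 1)

# e.g., best = min(["db2", "db3", "db4", "db5"],
#                  key=lambda w: wavelet_match_mse(gsr_segment, w))
```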
2) Noise Elimination: In recording the GSR signal, the measurement device also generated noise, including: white noise
existing inherently in the amplifier (i.e., the power is spread
over the entire frequency spectrum), noise from poor electrode
contacts or variations in skin potential (i.e., a low-frequency
fluctuation), power line noise (60 Hz in the U.S.), motion
artifacts, etc. [37], [38]. For the purpose of frequency analysis,
signal noises were separated into: mid-band frequency noise,
which overlaps the GSR frequency; and high-frequency noise
(> 0.5 Hz).
a) High-frequency noise elimination: Based on decomposition of the GSR signal using the db3 wavelet, coefficients
representing high-frequency (> 0.5 Hz) details of the signal
(noise) were set to zero. Consequently, an entire 4-s GSR signal
(1024 × 4 = 4096 data points) was effectively represented by
only 24 wavelet coefficients. These coefficients were characterized by a 4-s time localization at four frequency ranges. (Note:
The number of coefficients for each frequency range is not
uniform. The higher the frequency, the greater the number of
coefficients.) The amplitude of the coefficients corresponded
to the amplitude of the GSR signal in specific frequencies at
specific times.
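The high-frequency elimination step might look as follows in PyWavelets (the level and band counts are our assumptions, chosen to leave roughly the two dozen coarse coefficients reported above):

```python
import numpy as np
import pywt

def highfreq_denoise(gsr, wavelet="db3", level=13, keep_bands=4):
    """Zero detail coefficients above ~0.5 Hz and return the survivors.

    With fs = 1024 Hz and a 13-level decomposition, the approximation and
    the three coarsest detail bands (four groups in total) sit below
    0.5 Hz; finer bands are treated as noise and zeroed. These settings
    are our assumptions, not necessarily the paper's exact configuration.
    """
    coeffs = pywt.wavedec(gsr, wavelet, level=level)
    kept = [c if i < keep_bands else np.zeros_like(c)
            for i, c in enumerate(coeffs)]
    features = np.concatenate(coeffs[:keep_bands])  # retained coefficients
    return pywt.waverec(kept, wavelet), features
```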
b) Mid-band frequency noise elimination: Wavelet
threshold shrinking algorithms have been widely used for
removal of noise from signals [39], [40]. The soft thresholding
shrinkage function was used in the present algorithm. The
concept is to set all frequency subband coefficients, which are
less than a particular threshold (λ), to zero since such details are
often associated with noise. Subsequently, shrinking is applied
to other nonzero coefficients based on the threshold value.
Fig. 8. Power spectrum of GSR occurring during subject rest period.

Fig. 9. Laplace distribution fit to wavelet detail coefficients from the GSR data collected during the subject rest period.
There are several wavelet threshold selection rules used for
denoising, for example: the Donoho and Johnstone universal threshold (DJ threshold), calculated by $\sigma\sqrt{2\log(N)}$, where σ
is noise standard deviation and N is the signal size; a confidence
interval threshold, such as 3σ or 4σ; the SureShrink threshold,
based on Stein’s unbiased estimate of risk (a quadratic loss
function); and the Minimax threshold developed to yield the
minimum of the maximum MSE over a given set of functions.
However, in practice, the noise level σ, which is needed in all
these thresholding procedures, is rarely known and is therefore
commonly estimated based on the median absolute deviation
of the estimated wavelet coefficients at the finest level (i.e., the
highest frequency) divided by 0.6745 [41]. It can be seen that
these selection rules are generated from signal characteristics
but have no relation to the nature of the noise.
A frequency analysis was conducted on the 1 min of GSR
signal data recorded during the subject rest period. A FFT was
used to reveal the power spectrum of the baseline response
to range from 0 to 0.5 Hz with peak values occurring for
frequencies less than 0.1 Hz (see Fig. 8). (This technique
was only applied to the GSR signal during the rest period to
identify noise in specific frequency ranges of the reference
signal. Wavelet analysis was used for all test signal denoising
and feature identification.)
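The rest-period spectrum check is a standard FFT computation, sketched here (names assumed):

```python
import numpy as np

def rest_power_spectrum(rest_gsr, fs=1024):
    """Power spectrum of the 1-min rest-period GSR record (illustrative)."""
    x = np.asarray(rest_gsr, dtype=float)
    x -= x.mean()                                  # remove the DC offset
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)    # frequency axis (Hz)
    power = np.abs(np.fft.rfft(x)) ** 2            # spectral power
    return freqs, power

# freqs, power = rest_power_spectrum(rest_signal)
# band = power[freqs <= 0.5]   # where the paper found the energy to lie
```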
The typical frequency of the GSR signal during the
rest period (without any stimuli) was between 0.0167 and
0.05 Hz. This frequency range is represented by approximation
coefficients after decomposition. Therefore, all detail coefficients (frequencies between 0.05 and 0.5 Hz) represent signal
noise and can be used as the basis for signal de-noising thresholds. The confidence interval threshold technique (mentioned above) determines a threshold from the standard deviation of noise. This
is based on the basic concept that by setting a threshold λ to 3σ,
the probability of noise coefficients out of the interval [−3σ, 3σ]
will be very small (0.27%). However, this technique is based
on the assumption that the noise coefficients are normally
distributed. In fact, the distribution of wavelet detail coefficients
is better represented by a zero-mean Laplace distribution [42],
which has heavier tails than a Gaussian distribution. Wavelet
detail coefficients obtained from the baseline data set were tested for fit to Normal and Laplace distributions using the Kolmogorov-Smirnov test. Results confirmed that the detail
coefficient distributions were not significantly different from
the Laplace distribution (p > 0.05), but there was a significant
difference from the Normal distribution. Fig. 9 illustrates the
histogram of detail coefficients from the signal recorded during
the rest period fitted by a Laplace distribution.
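This goodness-of-fit comparison can be sketched with SciPy as follows (our reconstruction of the test, with the Laplace scale estimated by maximum likelihood under a zero mean):

```python
import numpy as np
from scipy import stats

def laplace_vs_normal_fit(detail_coeffs):
    """KS-test p-values for Laplace and Normal fits to detail coefficients.

    A large Laplace p-value (> 0.05) with a small Normal p-value matches
    the pattern reported in the text. Sketch only.
    """
    c = np.asarray(detail_coeffs, dtype=float)
    b = np.mean(np.abs(c))  # MLE scale of a zero-mean Laplace distribution
    p_laplace = stats.kstest(c, "laplace", args=(0.0, b)).pvalue
    p_normal = stats.kstest(c, "norm", args=(0.0, c.std())).pvalue
    return p_laplace, p_normal
```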
The Laplace distribution has a mean and median of μ and variance $\sigma_L^2 = 2b^2$. To have 99.73% confidence that noise coefficients will be eliminated from the signal, the threshold of wavelet shrinkage is set to $\lambda = 4.18\sigma_L$. (For a zero-mean Laplace variable, $P(|X| > \lambda) = e^{-\lambda/b}$; setting this probability to 0.0027 gives $\lambda \approx 5.91b = 4.18\sigma_L$.) When compared with the normal distribution, noise coefficients modeled with the Laplace distribution will have a higher threshold value. The noise scale in the raw signal was estimated based on the rest period data; we expected no GSR frequencies between 0.05 and 0.5 Hz when the subject was not exposed to a stimulus. This $\sigma_L$ is the standard deviation of the Laplace distribution fitted to the wavelet detail coefficients of the rest-period signal, and it sets the de-noising threshold applied when the subject is exposed to a stimulus.
Fig. 10 illustrates an example GSR signal de-noised using the above methodology.

Fig. 10. Noisy GSR and de-noised GSR comparison.
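Putting the threshold rule together with soft shrinkage gives a de-noising routine along these lines (a sketch under our assumptions; here the Laplace threshold is applied to all detail bands of the decomposition for simplicity):

```python
import numpy as np
import pywt

def laplace_soft_denoise(trial_gsr, rest_detail_coeffs,
                         wavelet="db3", level=13):
    """Soft-threshold detail coefficients with lambda = 4.18 * sigma_L.

    rest_detail_coeffs: detail coefficients from the stimulus-free rest
    period, used to estimate the Laplace noise scale (sigma_L = sqrt(2)*b).
    Illustrative reconstruction of the method, not the authors' code.
    """
    b = np.mean(np.abs(rest_detail_coeffs))   # zero-mean Laplace MLE scale
    lam = 4.18 * np.sqrt(2.0) * b             # threshold 4.18 * sigma_L
    coeffs = pywt.wavedec(trial_gsr, wavelet, level=level)
    shrunk = [coeffs[0]] + [pywt.threshold(c, lam, mode="soft")
                            for c in coeffs[1:]]
    return pywt.waverec(shrunk, wavelet)
```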
The rectified signal with de-noising in all frequencies can be
compared with the raw GSR signal and the GSR signal with
only high-frequency noise elimination. The wavelet analysis
initially eliminated the high-frequency noise, for example, any
abrupt signals caused by motion artifacts. The transformation
then eliminated noise present in the frequencies overlapping
with the GSR signal frequency, resulting in a smoother GSR
signal. This methodology appears to be highly promising for
isolating informative GSR signal features and has not previously been demonstrated.
TABLE IV
Significant High-Frequency De-Noised GSR and HR Features for Classifying Arousal and Valence Based on Regression Analysis
D. Feature Selection
The feature selection step in the new algorithm was to reduce
the complexity of any data model by selecting the minimum set
of most relevant physiological signal features for classifying
emotional states. Principal component analysis (PCA) is the
most commonly used data reduction technique in pattern recognition and classification [43] and has been applied to data in the
wavelet domain (e.g., [44]). The basic idea is to project the data
on to a lower dimensional space, where most of the information
is retained. Unfortunately, transformed variables are not easy
to interpret (i.e., assign abstract meaning; [45]). Moreover,
PCA does not take class information into consideration; consequently, there is no guarantee that the classes in the transformed data are better separated than in the original form [43].
Multiple-hypothesis testing procedures are another form of
feature selection technique that has been used for wavelet
coefficient selection (e.g., [46], [47]). All coefficients are simultaneously tested for a significant departure from zero; therefore,
the selected set of wavelet coefficients will provide the best
contrast between classes [41]. However, the features identified
in this approach are based on their classification capabilities but
not on specific relationships with classes.
Prior research (e.g., [32]) has also used stepwise regression
procedures for selection of physiological features for use in
models to predict emotional states. This type of analysis can be
used to identify features that have a significant relationship with
emotional responses and to avoid overly complex classification
models. However, such analysis is typically conducted in the
original time domain.
On this basis, a stepwise backward-elimination regression
approach was applied to identify the HR and GSR features
that were statistically significant in classifying emotional states.
The SDHR and RangeHR responses were highly correlated and
were, therefore, examined in separate models. The analysis also
examined both high-frequency de-noised and total de-noised
GSR signals (i.e., signals with high and mid-band frequency
elimination). Although previous studies demonstrated some
relationships may exist between the statistical features of HR
and GSR and emotional states, no prior work has investigated
wavelet coefficients to predict emotional states. Therefore, the
entire range of wavelet coefficients, determined based on the
GSR signals, was considered potentially viable for classification of concurrent subject emotional states. Since the orthogonality property of the Daubechies family of wavelets ensures
there is no correlation between wavelet coefficients for the
signal being processed, it is possible to include multiple dbN
wavelet coefficients as predictors in a single regression model of emotional states without violating model assumptions.

TABLE V
Significant Total De-Noised GSR and HR Features for Classifying Arousal and Valence Based on Regression Analysis
Results revealed regression models of arousal producing the
highest R-squared values to include: 15 high-frequency de-noised GSR wavelet coefficient features and SDHR; and 14
total de-noised GSR wavelet coefficient features and SDHR
(see Tables IV and V). For valence models, the best predictive
models included: 14 high-frequency de-noised GSR wavelet
coefficient features; and 11 total de-noised GSR wavelet coefficient features, MeanHR and SDHR (see Tables IV and V).
(Note that low and high GSR subscripts correspond with low
and high signal frequencies from time 0 to 4 s.)
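A generic backward-elimination loop of this kind is easy to express with statsmodels; the p-value-based stopping rule below is a common variant and an assumption on our part, since the paper does not spell out its exact elimination criterion:

```python
import numpy as np
import statsmodels.api as sm

def backward_eliminate(X, y, names, alpha=0.05):
    """Drop the least significant predictor until all p-values <= alpha.

    X: (trials x features) matrix of HR statistics and GSR wavelet
    coefficients; y: arousal or valence level. Sketch only.
    """
    keep = list(range(X.shape[1]))
    while True:
        model = sm.OLS(y, sm.add_constant(X[:, keep])).fit()
        pvals = model.pvalues[1:]            # skip the intercept term
        worst = int(np.argmax(pvals))
        if pvals[worst] <= alpha or len(keep) == 1:
            return [names[i] for i in keep], model
        keep.pop(worst)                      # remove least significant feature
```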
E. Emotional State Classification
Among a large number of neural network structures, the MLP is the most commonly used and has been proven to be a universal approximator [43]. We implemented neural network models using the NeuroSolutions software (NeuroDimension, Inc.) with
an error-back propagation algorithm. A hyperbolic tangent
sigmoid function was used as the activation function for all
neurons in the hidden layer. A forward momentum of 0.9 [43]
was also used at the hidden nodes to prevent a network from
getting stuck at a local minimum during training. We determined weights of links among the nodes in the ANN structure
to minimize the classification error. We set MSE = 0.02 [48] as
a criterion error goal. For validation (testing) of the ANNs, the
data “hold-out” method was used, including separating 80% of
the samples for training the NN and the remaining 20% for
validation. The validation data was randomly selected. The
number of data points was balanced among the classes of
valence and arousal.
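Since NeuroSolutions is proprietary, a rough open-source analogue of this training setup (one hidden layer, tanh activation, 0.9 momentum, 80/20 hold-out) can be written with scikit-learn; the data here are random stand-ins, not study observations:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 15))            # stand-in feature matrix
y = rng.integers(0, 3, size=300)          # 0=low, 1=medium, 2=high (synthetic)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)  # 80/20 hold-out

mlp = MLPClassifier(hidden_layer_sizes=(7,), activation="tanh",
                    solver="sgd", momentum=0.9, max_iter=5000)
mlp.fit(X_train, y_train)
print(f"PCC = {100 * mlp.score(X_test, y_test):.1f}%")
```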
To create a parsimonious ANN for classifying subject emotional states based on physiological signal features, we used
a single hidden layer of processing elements. The minimum
number of hidden nodes (h) in the hidden layer can be defined
based on the number of inputs (n) and outputs (m) using
Masters' [49] equation:

$$h = \operatorname{Int}\left[(m \times n)^{1/2}\right] \qquad (3)$$
In this paper, the emotional states of subjects were classified
into three levels, including: low, medium, and high arousal
or valence. Based on the sets of physiological data features
selected from the stepwise regression procedure for both the
high-frequency de-noising data set (16 features for arousal
model and 14 features for valence model) and the total de-noising data set (15 features for arousal model and 13 features
for valence model), Masters’ equation indicated the minimum
number of hidden nodes for the ANNs to be approximately
six (for either emotional response). Holding fixed the set of
inputs selected from the stepwise regression and only using a
single hidden layer network structure, the number of hidden
layer nodes was optimized to achieve the highest percentage
of correct classifications (PCC) of subject emotional responses.
The PCCs for the best arousal and valence classification
networks from the high-frequency de-noised data set are shown
in Table VI. Results revealed the overall PCC in validation of
the ANN for classifying arousal to be 72%. This network was
constructed with eight hidden nodes and produced an R-squared value of 0.73. The PCC of the ANN for classifying valence was 67%. The network was constructed with six hidden nodes and produced an R-squared value of 0.6.

TABLE VI
Overall PCCs for Arousal and Valence Classification Networks Using High-Frequency De-Noised Wavelet Features Selected Based on Stepwise Regression Analysis
Results of arousal and valence state classification based on
the total de-noised data set, as shown in Table VII, revealed the
overall PCC in validation of the ANN for classifying arousal to
be 82%. This network was constructed with seven hidden nodes and produced an R-squared value of 0.78. The PCC of the ANN for classifying valence was 73%. The network was constructed with seven hidden nodes and produced an R-squared value of 0.73. After applying the proposed algorithm to eliminate signal noise overlapping with the informative signal frequency, the classification models thus produced higher PCCs for both arousal and valence states.

TABLE VII
Overall PCCs for Arousal and Valence Classification Networks Using Total De-Noised Wavelet Features Selected Based on Stepwise Regression Analysis
Sensitivity analyses were performed using the NeuroSolutions software to provide information on the significance of the various inputs to the arousal and valence classification networks (see Figs. 11 and 12). The HR signal appeared to have the least effect on the arousal model; however, it had a relatively high effect on the valence model. In contrast, most GSR features had significant effects on arousal, while fewer GSR features had a strong impact on valence states.

Fig. 11. Sensitivity analysis of arousal state.

Fig. 12. Sensitivity analysis of valence state.
To demonstrate the performance of the statistical feature
selection methodology, additional ANNs were constructed including all GSR wavelet coefficients (24 features) and HR
features (MeanHR and SDHR) for classifying arousal and
valence. For the high-frequency de-noised data set, results
(Table VIII) revealed an overall PCC in validation of the ANN
for classifying arousal to be 67%. The PCC of the ANN for
classifying valence was 63%. For the total de-noised data set,
results (Table IX) revealed the PCC of the ANN for classifying
arousal to be 78%. The PCC of the ANN for classifying valence
was 75%. Using all wavelet coefficient features for classifying
current emotional states not only produced lower PCCs but
involved more complex networks. Although the valence model (total de-noised) with all features resulted in a slightly higher
PCC (75% compared with 73% for the reduced model), the full
network was much more complex (26 input and 8 hidden nodes
versus 13 input and 7 hidden nodes for the reduced model).
TABLE VIII
Overall PCCs for Arousal and Valence Classification Networks Using All High-Frequency De-Noised Wavelet Features
TABLE IX
Overall PCCs for Arousal and Valence Classification Networks Using All Total De-Noised Wavelet Features
IV. DISCUSSION
This study demonstrated that wavelet analysis can be an effective
approach for significant feature extraction from physiological
data. The wavelet technology also serves as an efficient means
for feature reduction; the number of GSR signal features was
reduced by 99.4% for analysis. The stepwise regression also
proved to be an effective statistical method for further feature
reduction. The number of GSR and HR features for the model
of arousal was reduced by 38.5% after applying high-frequency
noise elimination and by 42.3% after further applying midband frequency noise elimination. For the model of valence, the
number of physiological signal features was reduced by 46.2%
after applying high-frequency noise elimination and by 50%
after further applying mid-band frequency noise elimination.
Therefore, these methods can be used to develop emotional
state classification models without high complexity but that include significant physiological signal features (i.e., amplitude,
time and frequency).
Comparison of the R-squared values from the regression
models (ranging from 0.0424 to 0.0534) with the neural network models (ranging from 0.6 to 0.78) indicated that the relationships between the physiological responses, including HR
and GSR, and emotional states, in terms of valence and arousal,
are likely nonlinear. This was in agreement with speculation
based on the regression analysis.
Prior research has established that GSR is an indicator of
arousal, with a linear correlation between the physiological response and emotional state. However, GSR has not been
shown to distinguish positive from negative valence stimuli.
In contrast, HR has been shown to have a strong relationship
with both arousal and valence states. The analyses on the
high-frequency de-noised data set confirmed the relationships
between GSR and HR and arousal; however, valence states
were not predicted by HR features. We suspected two possible
reasons for this. First, there may have been some bias in subject
self-reports of emotion during the study. Since the SAM questionnaire is a multi-dimensional subjective rating system, the
order of presentation of the subscales was randomized on forms
presented to subjects after each trial. This is a typical procedure
in human factors research used to promote subject attention to form content in repeated observations. In this paper, the elderly participants may have been unaware of the randomized order of questions and may have misinterpreted the scales after some trials. Another explanation for our findings is that the initial wavelet analysis only filtered out the high-frequency noise from the GSR signal. Frequency analysis of GSR signals recorded during the rest period revealed some noise in mid-band frequencies, which overlapped the typical GSR frequency. This mid-band noise could have accounted for some of the variability in the valence response; otherwise, HR features might have explained valence states. After further applying the new mid-band de-noising algorithm and conducting sensitivity analyses on the NN models, results confirmed that arousal states were better explained by GSR features than by HR, while valence states were better explained by HR and fewer GSR features.

V. CONCLUSION AND FUTURE WORK
The results of this study further support relationships between HR and GSR responses and emotional states, in terms
of arousal and valence. Hazlett [18] indicated in his review that
such physiological measures mostly reflect arousal and may
be limited for indicating emotional valence. We found GSR
and HR to be predictive of arousal states and for HR to be a
stronger indicator of valence. Hazlett noted that some studies
have validated facial EMG as a valence indicator. Our future
research will examine facial EMG signals as an additional
physiological measure for classifying emotional states.
The machine learning algorithm we developed has the potential to classify patient emotional states in real-time during
interaction with a robot. Percent correct classification accuracies for the NN models on arousal and valence ranged from 73 to 82%. Other emotion recognition methods have been developed using facial expressions via image processing (e.g., [50]).
These methods have achieved high PCCs (e.g., 88.2–92.6%);
however, this methodology is based on explicit emotion expressions. Interaction with service robots may not induce extreme
happiness or unhappiness in patients. The study demonstrated that a set of wavelet coefficients could be determined to effectively represent physiological signal features without additional post-processing steps and can therefore support fast emotional state classification when integrated as inputs in NN models. Wavelet technology also proved highly effective for eliminating noise overlapping with informative physiological signal frequencies and for increasing the accuracy of emotional state classification
with the neural networks.
In general, the wavelet transformation process supports fast
coefficient computation for simultaneous noise elimination and
physiological signal representation. This is important because
we are ultimately interested in real-time classification of patient
emotional states for service robot behavior adaptation. The
approach requires that GSR and HR data be captured on patients
in real time; signals must then be transformed and reconstructed
using wavelet analysis; wavelet expansion coefficients must be
computed; and the coefficients must be used as inputs to a
trained ANN for emotional state prediction. Classified patient
states can then be used as a basis for adapting robot behavior
or interface configurations to ensure positive patient emotional
experiences (e.g., high arousal and high valence) in healthcare.
Such a real-time emotional state classification system may be
a valuable tool as part of future service robot design to ensure
patient perceptions of quality in healthcare operations. We plan
to implement the new emotional state classification algorithm
on a service robot platform and to attempt to adapt various types
of behaviors, including positioning relative to users and speech
patterns. In the present study, the robot configuration generating
the highest HR and GSR responses was the design requiring
subjects to confirm their receipt of medicine. In a future study,
we will further explore such a configuration and use maximum
physiological responses for emotional state classification.
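To make the runtime loop above concrete, the sketch below strings the steps together in Python. It reuses denoise_gsr() from the earlier fragment; the feature layout, the state labels, the robot adaptation hook, and the use of any trained classifier with a scikit-learn-style predict() in place of the study's NN models are all hypothetical.

```python
# A minimal sketch of the real-time classification loop, assuming
# denoise_gsr() from the earlier fragment. Feature layout, state labels,
# `model` (a trained classifier with a scikit-learn-style predict()),
# and the robot.adapt() hook are hypothetical placeholders.
import numpy as np
import pywt

def extract_features(gsr_window, hr_window, wavelet="db4", level=4):
    # Wavelet expansion coefficients of the de-noised GSR window; a fixed
    # window length yields a fixed-length coefficient vector.
    clean = denoise_gsr(gsr_window, wavelet, level)
    coeffs = pywt.wavedec(clean, wavelet, level=level)
    # Append simple HR statistics (mean and variability) as extra inputs.
    hr_stats = np.array([np.mean(hr_window), np.std(hr_window)])
    return np.concatenate(coeffs + [hr_stats])

def classify_and_adapt(model, robot, gsr_window, hr_window):
    # One pass of the loop: signals -> features -> trained NN -> behavior.
    x = extract_features(gsr_window, hr_window).reshape(1, -1)
    state = model.predict(x)[0]  # e.g., an arousal/valence quadrant label
    if state == "low_valence":
        # Hypothetical adaptation hook; candidate behaviors include
        # positioning relative to the user and speech patterns.
        robot.adapt(speech="reassuring", approach_speed="slow")
    return state
```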
ACKNOWLEDGMENT
The authors thank Dr. Yuan-Shin Lee for comments on an
earlier unpublished technical report of this work as well as
Dr. T. Zhang, Dr. B. Zhu, L. Hodge, and Dr. P. Mosaly for
assistance in the experimental data collection and statistical
analysis.
REFERENCES
[1] D. I. Auerbach, P. I. Buerhaus, and D. O. Staiger, “Better late than never:
Workforce supply implications of later entry into nursing,” Health Affairs,
vol. 26, no. 1, pp. 178–185, Jan./Feb. 2007.
[2] J. A. Russell, “A circumplex model of affect,” J. Person. Social Psychol.,
vol. 39, no. 6, pp. 1161–1178, 1980.
[3] D. Watson and A. Tellegen, “Toward a consensual structure of mood,”
Psychol. Bull., vol. 98, no. 2, pp. 219–235, Sep. 1985.
[4] J. A. Russell and L. Feldman-Barrett, “Core affect, prototypical emotional episodes, and other things called emotion: Dissecting the elephant,”
J. Person. Social Psychol., vol. 76, no. 5, pp. 805–819, 1999.
[5] R. Reisenzein, “Pleasure-activation theory and the intensity of emotions,”
J. Person. Social Psychol., vol. 67, no. 3, pp. 525–539, 1994.
[6] M. Bradley and P. Lang, “Measuring emotion: The self-assessment
Manikin and the semantic differential,” J. Behav. Ther. Exp. Psychiatry,
vol. 25, no. 1, pp. 49–59, Mar. 1994.
[7] M. E. Dawson, A. M. Schell, and D. L. Filion, “The electrodermal system,” in Handbook of Psychophysiology, J. T. Cacioppo, L. G. Tassinary,
and G. G. Berntson, Eds., 3rd ed. New York: Cambridge Univ. Press,
2007, pp. 159–181.
[8] C. D. Wickens, Engineering Psychology and Human Performance,
2nd ed. New York: Harper Collins Pub. Inc., 1992.
[9] M. Malik, J. T. Bigger, A. J. Camm, R. E. Kleiger, A. Malliani, A. J. Moss,
and P. J. Schwartz, “Heart rate variability: Standards of measurement,
physiological interpretation, and clinical use,” Eur. Heart J., vol. 17, no. 3,
pp. 354–381, Mar. 1996.
[10] R. L. Mandryk and M. S. Atkins, “A fuzzy physiological approach for continuously modeling emotion during interaction with play technologies,”
Int. J. Hum.-Comput. Stud., vol. 65, no. 4, pp. 329–347, 2007.
[11] S. A. Neumann and S. R. Waldstein, “Similar patterns of cardiovascular
response during emotional activation as a function of affective valence
and arousal and gender,” J. Psychosom. Res., vol. 50, no. 5, pp. 245–253,
May 2001.
[12] T. Ritz and M. Thöns, “Airway response of healthy individuals to affective
picture series,” Int. J. Psychophysiol., vol. 46, no. 1, pp. 67–75, Oct. 2002.
[13] C. Peter and A. Herbon, “Emotion representation and physiology assignments in digital systems,” Interact. Comput., vol. 18, no. 2, pp. 139–170,
Mar. 2006.
[14] A. Nakasone, H. Prendinger, and M. Ishizuka, “Emotion recognition from
electromyography and skin conductance,” in Proc. 5th Int. Workshop BSI,
Tokyo, Japan, 2005, pp. 219–222.
[15] S. Amershi, C. Conati, and H. Maclaren, “Using feature selection and
unsupervised clustering to identify affective expressions in educational
games,” in Proc. Workshop Motivational Affect Issues ITS, Jhongli,
Taiwan, 2006, pp. 21–28.
[16] C. K. Lee, S. K. Yoo, Y. J. Park, N. H. Kim, K. S. Jeong, and B. C. Lee, “Using neural network to recognize human emotions from heart rate variability and skin resistance,” in Proc. 27th Annu. Conf. IEEE Eng. Med. Biol., Shanghai, China, 2005, pp. 5523–5525.
[17] A. Pecchinenda and C. A. Smith, “The affective significance of skin conductance activity during a difficult problem-solving task,” Cognit. Emotion, vol. 10, no. 5, pp. 481–503, Sep. 1996.
[18] R. L. Hazlett, “Measuring emotional valence during interactive experiences: Boys at video game play,” in Proc. CHI, Novel Methods: Emotions, Gesture, Events, Montreal, QC, Canada, 2006, pp. 1023–1026.
[19] H. Leng, Y. Lin, and L. A. Zanzi, “An experimental study on physiological parameters toward driver emotion recognition,” Lecture Notes in Computer Science, vol. 4566, pp. 237–246, 2007.
[20] C. L. Lisetti and F. Nasoz, “Using noninvasive wearable computers to recognize human emotions from physiological signals,” EURASIP J. Appl. Signal Process., vol. 2004, pp. 1672–1687, 2004.
[21] K. Itoh, H. Miwa, Y. Nukariya, M. Zecca, H. Takanobu, S. Roccella, M. C. Carrozza, P. Dario, and A. Takanishi, “Development of a bioinstrumentation system in the interaction between a human and a robot,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., Beijing, China, 2006, pp. 2620–2625.
[22] D. Kulic and E. Croft, “Physiological and subjective responses to articulated robot motion,” Robotica, vol. 25, no. 1, pp. 13–27, Jan. 2007.
[23] C. Liu, K. Conn, N. Sarkar, and W. Stone, “Online affect detection and robot behavior adaptation for intervention of children with autism,” IEEE Trans. Robot., vol. 24, no. 4, pp. 883–896, Aug. 2008.
[24] P. Agrawal, C. Liu, and N. Sarkar, “Interaction between human and robot: An affect-inspired approach,” Interact. Stud., vol. 9, no. 2, pp. 230–257, 2008.
[25] P. Rani, C. Liu, N. Sarkar, and E. Vanman, “An empirical study of machine learning techniques for affect recognition in human-robot interaction,” Pattern Anal. Appl., vol. 9, no. 1, pp. 58–69, May 2006.
[26] C. L. Breazeal, “Emotion and sociable humanoid robots,” Int. J. Hum.-Comput. Stud., vol. 59, no. 1/2, pp. 119–155, Jul. 2003.
[27] K. Najarian and R. Splinter, Biomedical Signal and Image Processing. Boca Raton, FL: CRC Press, 2006.
[28] M. W. Scerbo, F. G. Freeman, P. J. Mikulka, R. Parasuraman, F. Di Nocera, and L. J. Prinzel, “The efficacy of psychophysiological measures for implementing adaptive technology,” NASA Langley Res. Center, Hampton, VA, NASA TP-2001-211018, 2001.
[29] P. Ekman, “Expression and the nature of emotion,” in Approaches to Emotion, K. S. Scherer and P. Ekman, Eds. Hillsdale, NJ: Erlbaum, 1984.
[30] P. Ekman, “An argument for basic emotions,” Cognit. Emotion, vol. 6, no. 3/4, pp. 169–200, 1992.
[31] P. Ekman, “Basic emotions,” in Handbook of Cognition and Emotion, T. Dalgleish and M. Power, Eds. Sussex, U.K.: Wiley, 1999.
[32] G. Zhang, R. Xu, Q. Ji, P. Cowings, and W. Toscano, “Context, observation, and operator state (COS): Dynamic fatigue monitoring,” presented at the NASA Aviation Safety Tech. Conf., Denver, CO, 2008.
[33] J. F. Kenney and E. S. Keeping, Mathematics of Statistics, 3rd ed. Princeton, NJ: Van Nostrand, 1954, pt. 1.
[34] M. Slater, C. Guger, G. Edlinger, R. Leeb, G. Pfurtscheller, A. Antley, M. Garau, A. Brogni, and D. Friedman, “Analysis of physiological responses to a social situation in an immersive virtual environment,” Presence, Teleoper. Virtual Environ., vol. 15, no. 5, pp. 553–569, Oct. 2006.
[35] J. Laparra-Hernández, J. M. Belda-Lois, E. Medina, N. Campos, and R. Poveda, “EMG and GSR signals for evaluating user’s perception of different types of ceramic flooring,” Int. J. Ind. Ergonom., vol. 39, no. 2, pp. 326–332, 2009.
[36] A. Gupta, S. D. Joshi, and S. Prasad, “On a new approach for estimating wavelet matched to signal,” in Proc. 8th Nat. Conf. Commun., Bombay, India, 2002, pp. 180–184.
[37] C. J. Peek, “A primer of biofeedback instrumentation,” in Biofeedback: A Practitioner’s Guide, M. S. Schwartz and F. Andrasik, Eds., 3rd ed. New York: Guilford Press, 2003.
[38] R. Moghimi, “Understanding noise optimization in sensor signal-conditioning circuits,” EE Times, 2008. [Online]. Available: http://eetimes.com/design/automotive-design/4010307/Understanding-noise-optimization-in-sensor-signal-conditioning-circuits-Part-1a-of-4-parts
[39] J. Li, Y. Hou, P. Wei, and G. Chen, “A novel method for the determination of the wavelet denoising threshold,” in Proc. 1st ICBBE, Wuhan, China, 2007, pp. 713–716.
[40] S. Poornachandra and N. Kumaravel, “A novel method for the elimination of power line frequency in ECG signal using hyper shrinkage function,” Digit. Signal Process., vol. 18, no. 2, pp. 116–126, Mar. 2008.
[41] F. Abramovich, T. C. Bailey, and T. Sapatinas, “Wavelet analysis and its statistical applications,” Statistician, vol. 49, pt. 1, pp. 1–29, 2000.
[42] E. Y. Lam, “Statistical modelling of the wavelet coefficients with different bases and decomposition levels,” Proc. Inst. Elect. Eng.—Vis. Image
Signal Process., vol. 151, no. 3, pp. 203–206, Jun. 2004.
[43] R. Polikar, “Pattern recognition,” in Wiley Encyclopedia of Biomedical
Engineering, M. Akay, Ed. New York: Wiley, 2006.
[44] R. Yamada, J. Ushiba, Y. Tomita, and Y. Masakado, “Decomposition of
electromyographic signal by principal component analysis of wavelet coefficient,” in Proc. IEEE EMBS Asian-Pac. Conf. Biomed. Eng., Keihanna,
Japan, 2003, pp. 118–119.
[45] J. L. Semmlow, Biosignal and Biomedical Image Processing: MATLAB-Based Applications. New York: Marcel Dekker, 2004.
[46] F. Abramovich and Y. Benjamini, “Thresholding of wavelet coefficients
as multiple hypotheses testing procedure,” Lecture Notes in Statistics,
vol. 103, pp. 5–14, 1995.
[47] F. Abramovich and Y. Benjamini, “Adaptive thresholding of wavelet
coefficients,” Comput. Stat. Data Anal., vol. 22, no. 4, pp. 351–361,
Aug. 1996.
[48] L. J. Prinzel, “Research on hazardous states of awareness and physiological factors in aerospace operations,” NASA, Greenbelt, MD, NASA/
TM-2002-211444, L-18149, NAS 1.15:211444, 2002.
[49] T. Masters, Practical Neural Network Recipes in C++. San Diego, CA:
Academic, 1993.
[50] A. Chakraborty, A. Konar, U. K. Chakraborty, and A. Chatterjee, “Emotion recognition from facial expressions and its control using fuzzy
logic,” IEEE Trans. Syst., Man, Cybern. A, Syst., Humans, vol. 39, no. 4,
pp. 726–743, Jul. 2009.
Manida Swangnetr received the B.S. degree in industrial engineering from Chulalongkorn University,
Bangkok, Thailand, in 2001 and the M.S. degree in
industrial engineering and the Ph.D. degree in industrial and systems engineering with a focus in human
factors and ergonomics from North Carolina State
University, Raleigh, in 2006 and 2010, respectively.
Currently, she is a Lecturer in the Departments of
Production Technology and Industrial Engineering at
Khon Kaen University, Thailand. She is also a Research Team Member in the Back, Neck, and Other
Joint Pain research group in the Faculty of Associated Medical Sciences. Prior
to these appointments, she worked as a Research Assistant in the Department
of Industrial and Systems Engineering at North Carolina State University. She
has published several other papers on human emotional state classification
in interaction with robots through the International Ergonomics Association
Triennial Conference, the Annual Meeting of the Human Factors & Ergonomics
Society, and the AAAI Symposium on Dialog with Robots. Her current research
interests include: cognitive engineering; human functional state modeling in use
of automation; ergonomic interventions for occupational work; and ergonomics
approaches to training capabilities for disabled persons.
Dr. Swangnetr is a Member of the Human Factors and Ergonomics Society
and is a registered Associate Ergonomics Professional.
David B. Kaber (M’99) received B.S. and M.S. degrees in industrial engineering from the University of
Central Florida, Orlando, in 1991 and 1993, respectively, and the Ph.D. degree in industrial engineering
from Texas Tech University, Lubbock, in 1996.
Currently, he is a Professor of Industrial and Systems Engineering at North Carolina State University,
Raleigh and Associate Faculty in biomedical engineering and psychology. He is also the Director of
the Occupational Safety and Ergonomics Program,
which is supported by the National Institute for Occupational Safety and Health. Prior to this, he worked as an Associate Professor
at the same institution and as an Assistant Professor at Mississippi State University, Mississippi State. His current research interests include computational
modeling of human cognitive behavior in interacting with advanced automated
systems and optimizing design of automation interfaces based on tradeoffs in
information load, task performance, and cognitive workload.
Dr. Kaber is a recent Fellow of the Human Factors and Ergonomics Society
and is a Certified Human Factors Professional. He is also a Member of Alpha
Pi Mu, ASEE, IEHF, IIE, and Sigma Xi.