The Impact of Rapid Transitions in Mental
Workload in a Supervisory Control Setting
Mark W. Boyer
B.S. Aeronautical Engineering
United States Air Force Academy, Colorado Springs, CO, 2012
Submitted to the Department of Aeronautics and Astronautics in partial
fulfillment of the requirements for the degree of
Master of Science in Aeronautics and Astronautics
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
June 2014
© 2014 Mark W. Boyer. All rights reserved.
This work is sponsored by the Missile Defense Agency under Air Force Contract #PAS721--C-02.
Opinions, interpretations, conclusions, and recommendations are those of the authors and are not
necessarily endorsed by the United States Air Force, Department of Defense, or US Government.
Signature redacted
Department of Aeronautics and Astronautics
May 1, 2014
Certified by ..........................................................................
Signature redacted
Mary L. Cummings
Visiting Professor of Aeronautics and Astronautics
Thesis Supervisor
Signature redacted
Accepted by ..........................................................................
Paulo C. Lozano
Associate Professor of Aeronautics and Astronautics
Chair, Committee on Graduate Students
Approved for Public Release
14-MDA-7899 (30 June 14)
The Impact of Rapid Transitions in Mental
Workload in a Supervisory Control Setting
by
Mark W. Boyer
Submitted to the Department of Aeronautics and Astronautics on May 21,
2014 in partial fulfillment of the requirements for the degree of Master of
Science in Aeronautics and Astronautics
ABSTRACT
With improving and expanding automation in domains such as unmanned aerial vehicles, nuclear power plants, and
commercial aviation, operators will be expected to handle long periods of low task load while monitoring these
highly automated systems with only an occasional need to respond to critical and emergent situations. In most of
these situations operators must go from periods of low cognitive engagement to ones of high stress, high mental
workload, and limited time to respond. This research investigates the mental workload transition process from low
to high workload by measuring the hemodynamic response of subjects to a simulated missile defense task using
functional near-infrared spectroscopy, or fNIRS. fNIRS is a non-invasive neuroimaging technique that uses specific
wavelengths of near-infrared light to measure concentrations of oxygenated and deoxygenated hemoglobin in the
prefrontal cortex, the region of the brain commonly associated with "higher level" mental activities. This missile
defense simulation consisted of low task loading through system monitoring and two critical events. For these two
critical events, participants had to dynamically allocate assets to rapidly respond to incoming missiles in just 100
seconds. The two factors controlled were critical event difficulty (2 levels - easy and hard) and critical event onset
time (3 levels - 40, 100, 160 minutes). Thirty subjects participated in the 3-hour experiment.
Subjects who received their first event at the middle onset time (100 minutes) had a lower hemodynamic response
than those in the early or late groups. Subjects in the "100-minute, hard" condition performed worse than other
groups, indicating that the diminished hemodynamic response may be correlated with diminished performance.
There was no significant difference in hemodynamic response between the difficulty levels, despite one scenario
difficulty level being significantly harder, which was reflected in performance scores. The most important factors for
predicting performance in this supervisory control application were NEO Five Factor Index Agreeableness, video
game usage, pre-event attention state, and deoxygenated hemoglobin levels. These results indicate that performance
can be correlated to the physiological response, and that physiological responses can change over time during
periods of very low workload. Additionally, hemodynamic trends correlating with the 30-minute vigilance
decrement indicate that physiological changes may be occurring during a vigilance task and that fNIRS may be a
suitable method for tracking mental state and engagement over time.
Thesis Supervisor: Mary L. Cummings
Title: Visiting Professor of Aeronautics and Astronautics
Acknowledgments
First, I would like to thank my advisor, Missy Cummings, for your guidance and wisdom over the
past two years. Despite not being at MIT during my time here, you were always generous with your time
and support, both personally and professionally.
Second, I would like to thank my Lincoln Lab advisor, Lee Spence. You were an indispensable
resource helping me develop, execute, and communicate my research project. Thank you for your
encouragement in dealing with tough classes and tough advisors. And finally, thank you for great boat
trips around the North Shore.
Thank you to MIT Lincoln Laboratory and the Missile Defense Agency for sponsoring this
research. Thank you to Dan O'Connor, Sung-Hyun Son, Whangbong Choi, and all the members of
Group 36 who helped make this research project possible. I have learned volumes about missile defense
and about the people working tirelessly to develop technology to keep our country safe.
Thank you to John Kuconis for organizing the funding to make graduate school possible for this
Lt and many others. It has been a tremendous experience, and I feel very fortunate to have been selected
to work at Lincoln Lab and MIT.
I would also like to thank Rob Jacob, Dan Afergan, and all the other members of the Tufts HCI
Lab for allowing me to run my study at your lab and providing me with technical and academic support
throughout.
Thank you to Erin Solovey for all of your help in making this fNIRS project work. Despite a
busy schedule, you always found time to help me set up my project, analyze my data, and present my
results.
Thank you to Jason, Andrew, Alex, Kathleen, and all the other HAL members for your help
throughout my time here. From picking classes to navigating coursework to surviving MIT, I know I
would not have made it without all of your help and encouragement.
Thank you to Sally Chapman, Beth Marois, Marie Stuppard, and all of the support staff at MIT
for your help in smoothing out funding, scheduling, and advising. You are truly the backbone of the
Institute.
I would also like to thank all of the other grad students who I have worked with here at MIT. I
know I am forever in debt to those who carried me through tough classes and helped refine my research
goals and world views.
To my brother, Shane: you have always been my best bud, role model, and late night putting
companion. Even though we may stand eye-to-eye now, I'll always be looking up to you.
To my beautiful Catherine, thank you for your love and encouragement during the past two years.
You were always my biggest supporter and best friend, and I cherish how you've helped me grow during
all our time together.
To my parents: thank you for your unwavering support during my time here. You have always
given me wisdom, love, and encouragement, no matter the circumstance. I feel so lucky to have been
raised by such caring, honest, and supportive parents, and I hope to aspire to your example someday.
Finally, I want to thank God for the many blessings in my life. "I can do all things through him
who gives me strength." Philippians 4:13
Table of Contents
ABSTRACT
Acknowledgments
List of Figures
List of Tables
List of Acronyms
1. Introduction
    1.1 Research Goals
    1.2 Thesis Layout
2. Background
    2.1 Vigilance
        2.1.1 Definitions
        2.1.2 Previous Work
        2.1.3 Frameworks for Studying Vigilance
    2.2 Boredom
        2.2.1 Definitions
        2.2.2 Previous Work
        2.2.3 Measuring Boredom
    2.3 Mental Workload
        2.3.1 Definitions and Models
        2.3.2 Previous Work
        2.3.3 Workload Transition
        2.3.4 Workload Measurement
    2.4 Neurophysiology and functional Near-Infrared Spectroscopy (fNIRS)
        2.4.1 Neurophysiology
        2.4.2 fNIRS Background
        2.4.3 How fNIRS measures cognitive activity
        2.4.4 Using fNIRS to measure workload
    2.5 Summary
3. Experimental Methods
    3.1 Experimental Framework
    3.2 Experiment Conduct and Data Collection
    3.3 Experimental Design
    3.4 Experimental Matrix
    3.5 Summary
4. Results
    4.1 Data Processing
    4.2 Results
        4.2.1 Sample Summary
        4.2.2 Baseline Analysis
        4.2.3 First Event Analysis
        4.2.4 Time to the Maximum and Return to Baseline
        4.2.5 Performance
        4.2.6 Model Creation
        4.2.7 Chat Box Analysis
        4.2.8 Second Wave Analysis
        4.2.9 Lateralization
        4.2.10 Long-Term Effects
    4.3 Summary
5. Conclusions
    5.1 Experiment Conclusions
    5.2 Limitations
    5.3 Future Work
Appendix A: Experiment Timeline
Appendix B: Participant Consent Form
Appendix C: Demographic Survey
Appendix D: Boredom Proneness Survey
Appendix E: Post-Experiment Survey
Appendix F: Message Panel Alert Times
Appendix G: Summary of Variables
Appendix H: Results Tables
Appendix I: Return to baseline calculations
Appendix J: Time elapsed: Final chat message to Wave 1 start
Appendix K: fNIRS Data with Vigilance Features Plotted
References
List of Figures
Figure 1: Yerkes-Dodson Arousal vs. Performance Curve
Figure 2: Model of Human Information Processing (Wickens & Hollands, 1999)
Figure 3: fNIRS sensor diagram for prefrontal cortex measurement (Sassaroli, et al., 2008)
Figure 4: Operator with fNIRS sensors mounted on forehead
Figure 5: Scattering of Photons in Tissue (from ISS, Inc.)
Figure 6: Light Penetration in Brain Tissue using fNIRS (from ISS, Inc.)
Figure 7: Nominal hemodynamic response
Figure 8: Operator Display
Figure 9: fNIRS data collection method
Figure 10: fNIRS Probe Applied to Forehead (Scholkmann, Klein, Gerber, Wolf, & Wolf, 2014)
Figure 11: Performance vs. Workload
Figure 12: fNIRS Data Functional Diagram
Figure 13: Baseline Comparison
Figure 14: Subject Number vs. HbO % Change
Figure 15ab: Estimated Marginal Means for % Change HbO (left), HbR (right)
Figure 16: HbR % Change from baseline
Figure 17: HbO Time to the Maximum
Figure 18: Return to baseline time
Figure 19: Average Final Track Error by Difficulty
Figure 20: Average Final Track Error by Time and Difficulty
Figure 21: % Below Threshold by Difficulty
Figure 22: % Below Threshold by Time & Difficulty
Figure 23: Average Track Error vs Distraction Coding State
Figure 24: NEO-FFI Scores
Figure 25: Video Game Usage
Figure 26: Chat Response Time vs. Primary Variables
Figure 27: Time of Last Chat Message Before Event
Figure 28: Wave 2 Average Final Error vs. Time & Difficulty
Figure 29: Wave 1 vs. Wave 2 Final Track Error Performance
Figure 30: HbO Lateralization Effects (Left-Right)
Figure 31: HbR Lateralization Effects (Left-Right)
Figure 32: HbO Response for Subject with Vigilance Decrement Pattern
Figure 33: HbO Level-Off Time
Figure 34: Slope to Level-Off
Figure 35: HbO-HbR integral for 4 quarters
List of Tables
Table 1: UAV Component Descriptions
Table 2: Data Collection and Processing Parameters
Table 3: Video Coding Criteria
Table 4: Test Matrix
Table 5: Return To Baseline test results
Table 6: HbO Comparisons for Onset Time
Table 7: HbR Comparisons for Onset Time
Table 8: HbO/HbR Marginal Means Onset Time Comparison
Table 9: Performance Model Summary
Table 10: Chat Response Model
Table 11: Lateralization Effects
Table 12: Lateralization by Difficulty & Time
Table 13: Lateralization vs Performance Model
List of Acronyms
ANOVA       Analysis of Variance
ANS         Autonomic Nervous System
ATC         Air Traffic Control
BMDS        Ballistic Missile Defense System
BOLD        Blood Oxygen Level Dependent
BPI         Boredom Proneness Index
COUHES      Committee on the Use of Humans as Experimental Subjects
CT          Computed Tomography
DPF         Differential Path Length
EEG         Electroencephalograph
EKG         Electrocardiogram
FAA         Federal Aviation Administration
fMRI        Functional Magnetic Resonance Imaging
fNIRS       Functional Near Infrared Spectroscopy
HAL         Humans and Automation Lab
HbO         Oxygenated Hemoglobin
HbR         Deoxygenated Hemoglobin
HbT         Total Hemoglobin
HRF         Hemodynamic Response Function
MIT         Massachusetts Institute of Technology
NEO FFI 3   NEO Industries Five Factor Index 3
PET         Positron Emission Tomography
SA          Situational Awareness
SNS         Sympathetic Nervous System
TCD         Transcranial Doppler Sonography
TLX         Task Load Index
UAV         Unmanned Aerial Vehicle
1. Introduction
The US Ballistic Missile Defense System (BMDS) is an integrated global network of sensors and
assets put in place to defend against a myriad of threats to the United States and its allies. Because of the
high speeds of both the threats and the defense's interceptor missiles in a ballistic missile engagement
scenario, the timelines are short - on the order of minutes or tens of minutes. Furthermore, threats may
employ a variety of countermeasures to confuse the defense. Thus, operators must be able to receive,
store, and process information rapidly to make critical decisions like system performance evaluation,
information reliability, task prioritization, or action effectiveness determination.
Current systems are designed with highly autonomous subsystems for detecting, tracking,
engaging, and intercepting threats in order to meet the highly demanding timeline. However, while
algorithms are much faster at performing calculations and optimizing response, algorithms are also
notoriously brittle and susceptible to overloading. To counter this, a role for the human operator has
frequently been proposed for the system, which requires determining which roles can be
performed better by the computer alone, by the human alone, or by a human-computer team (Forest,
Kahn, Thomer, & Shapiro, 2007; Murphy & Shields, 2012). While a human-computer team may be more
efficient under optimal conditions (Rathje, Spence, & Cummings, 2013), human operators may face
extended periods of extremely low mental workload that can degrade their response in various ways
before being required to transition to an intense period of activity (Bainbridge, 1983).
Missile defense is only one instantiation of a larger class of human supervisory control domains
which require human supervision of highly automated systems, with only occasionally or rarely-occurring
events requiring attention. This research seeks to better understand the processes of attention
management in low workload, semi-autonomous supervisory control situations and provide tools for
measurement of these processes. In a growing number of fields such as unmanned aviation,
manufacturing, nuclear power generation, security systems, commercial aviation, and many more,
humans have transitioned from the role of system operator to system monitor. Instead of continuous input
and feedback, a system monitor's role is to intervene only in critical situations when the system reaches
an unknown state or error or when new information changes the operational context.
Monitors today are given an abundance of information to synthesize and often must make
challenging decisions in decreasing time windows due to increasing operational tempo. At the same time,
automation improvements have reduced the frequency of monitor interactions to levels of very low
workload, leaving many workers in states of boredom, frustration, and distraction. Extended periods of
low workload and boredom not only have negative consequences for the health and satisfaction of workers, but can
also lead to slower reaction times, increased errors, and reduced mistake detection (Brown & Carroll,
1984; Bruursema, Kessler, & Spector, 2011; Dyer-Smith & Wesson, 1995). When critical events occur,
monitors in these low workload supervisory control environments are fighting the cognitive momentum
of long periods of boredom in order to "spool up" to high mental functioning, a problem which will only
get more pronounced as anomalous events become more and more rare. Workload transition is, and will
continue to be, one of the most important factors in a successful implementation of a system operating
under a human supervisory control paradigm.
1.1 Research Goals
This investigation seeks to expand the understanding of how humans handle the abrupt transition
from extended periods of monitoring to intense periods of activity. This issue is faced
by operators in a wide range of domains, including military unmanned aerial vehicle operation, nuclear power plant
operation, robotic manufacturing supervision, and many more. The primary purpose of this research is to
gather psychophysiological data on low workload, high workload, and transitional workload periods for a
supervisory control task. Conclusions drawn from this research will help add to the understanding of how
humans handle supervisory control roles for long duration, low task loading scenarios, and could be used
to help develop novel methods for assisting humans during the different phases of a supervisory control
task.
The greatest difference between this study and previous research done by Hart (2010), Thornburg
(2011), and Mkrtchyan (2012) is that this work addresses the operator's psychophysiological state during
the low workload, transition, and high workload periods using functional near-infrared spectroscopy
(fNIRS). fNIRS is a non-invasive method for dynamically measuring the physiological changes
occurring in the brain, which can provide information about mental state over time. Other methods of
functional brain imaging such as functional magnetic resonance imaging (fMRI), positron emission
tomography (PET), electroencephalography (EEG), and transcranial Doppler (TCD) sonography all provide different
benefits and have different costs for measuring brain activity. fNIRS is unique for its relatively low cost,
high temporal and spatial resolution, and robust data collection, in addition to its ease of use in typical
applied laboratory settings. Previous studies using fNIRS have focused primarily on the high-workload
spectrum of activities such as multitasking and working memory (Durantin, Gagnon, Tremblay, &
Dehais, 2014; Harrison et al., 2013; Sassaroli et al., 2008; Solovey, 2009; Solovey et al., 2012;
Tsunashima & Yanagisawa, 2009). Others have studied the cerebral hemovelocity in operators during a
low-task load vigilance state (Shaw et al., 2013; Shaw et al., 2009; Warm, Matthews, & Parasuraman,
2009; Warm, Parasuraman, & Matthews, 2008). This study first and foremost focuses on objectively
determining whether it is possible to detect the transition from low to high workload using fNIRS with a
relatively large pool of participants. Four questions motivated the design and conduct of this experiment:
1. Is a change in the measured activity of the fNIRS data correlated to an actual change in
mental workload?
2. What is the impact of time in a low taskload environment on operator performance and
response?
3. Is the magnitude of a transition from low workload to high workload correlated to
performance?
4. Can the pre-transition state of the operator relative to a known baseline be used to predict the
post-transition response of the operator?
This work explores many aspects of the four questions described above and lays the groundwork
for future research in the domain. Designers and scientists can use the results presented here to improve
the design and analysis of operator consoles, training methods, and operational protocols to help mitigate
detrimental patterns in a number of fields. Additionally, this work expands the database of fNIRS
measurement into the low-workload domain, an area previously unstudied and critically important in the
future.
1.2 Thesis Layout
This thesis begins with a background examination of the topics surrounding vigilance, boredom,
and workload in supervisory control settings in Chapter 2. This chapter also explores the principles of
neurophysiology and the methods used to measure brain activity. Chapter 3 lays out the framework for
the missile defense simulation experiment and describes the various components used. Chapter 4 presents
the results from the experiment, and Chapter 5 discusses conclusions drawn from this research endeavor.
2. Background
This chapter discusses the research that relates to the problems faced by semi-autonomous system
supervisors in terms of attention management and the methods used to measure them. The beginning of
this chapter will explore some of the classic and recent research relating to vigilance, boredom, mental
workload, and mental workload transition. Later, this chapter will introduce functional near-infrared
spectroscopy (fNIRS) and provide evidence for why it is a novel, suitable, and effective method for
measuring mental workload.
2.1 Vigilance
This section will cover a brief introduction to vigilance, highlights of the previous work on the
topic, and a summary of the methods used to investigate vigilance.
2.1.1 Definitions
One of the core ideas of human interaction with semi-autonomous systems is the concept of
vigilance. Merriam-Webster defines vigilance as, "a state of being alertly watchful especially to avoid
danger" and Wendell Phillips famously stated in 1852 that, "eternal vigilance is the price of liberty"
(Sears, 1909). In the context of human factors, however, vigilance is a common problem in human
supervisory control of a semi-autonomous system, specifically the detection of stimuli that occur at low
frequency. Experimental psychologist D.E. Broadbent classified vigilance tasks as ones, "whose chief
feature is that a man responds only to very infrequent signals but may have to watch for them over long
periods" (Broadbent, 1958). More recently, vigilance has been defined as, "the ability of organisms to
maintain their focus of attention and to remain alert to stimuli over prolonged periods of time" (Davies &
Parasuraman, 1982; Parasuraman, 1986; Warm, 1984; Warm, Dember, & Hancock, 1996).
2.1.2 Previous Work
The first true vigilance experiments were conducted by Norman Mackworth in the 1940s to
understand why radar and sonar operators were missing signals, especially at the end of their watch. He
devised an experimental device known as the Mackworth Clock that operators would watch for hours to
detect a double jump in the second hand, the "signal" in the experiment. He found that vigilance waned
quickly, as operator performance decreased 10-15% after the first 30 minutes followed by a more gradual
decline for the rest of the experiment (Mackworth, 1948). This "vigilance decrement" has been studied
and replicated in various forms in the decades following the original experiment (Alves & Kelsey, 2010;
Caggiano & Parasuraman, 2004; Davies & Parasuraman, 1982; Parasuraman, 1986; Parasuraman &
Davies, 1976; Pigeau, Angus, O'Neill, & Mack, 1995; See, Howe, Warm, & Dember, 1995; Shaw, et al.,
2009; Warm & Dember, 1998; Warm, et al., 1996; Warm, et al., 2009; Warm, et al., 2008). A whole field
devoted to vigilance research has spawned a steady flow of experiments in fields ranging from ballistic
missile defense to medical analysis, particularly as many fields have made the transition from human as
direct controller to human as a system supervisor.
Warm, Parasuraman, and Matthews completed a review of vigilance research in their Human
Factors article "Vigilance Requires Hard Mental Work and Is Stressful" (2008). They list the fields
where vigilance is important and has been considered, including:
military surveillance, air traffic control, cockpit monitoring, seaboard navigation,
industrial process/quality control, long-distance driving, agricultural inspection
tasks... medical settings such as cytological screening, electrocardiogram monitoring,
anesthesia... airport baggage inspection, and illicit radioactive materials detection at
border crossings and ports.
This list has expanded in the years following to include domains like unmanned aerial systems
(Tvaryanas, Platte, Swigart, Colebank, & Miller, 2008), network security monitoring (Bejtlich, 2013;
Heberlein et al., 1990) and autonomous automobiles (Llaneras, Salinger, & Green, 2013), and it will
continue to grow as automation creeps into almost every domain.
One of their most crucial findings is that vigilance is demanding work, even though the operator
may have very few tasks to actually complete. Several studies (Caggiano & Parasuraman, 2004; Miller,
Szalma, Warm, Hitchcock, & Dember, 1999; Parasuraman, 1979; Parasuraman & Davies, 1976; See, et
al., 1995) show that vigilance performance is correlated closely with task type, not the actual task. They
define two types of tasks: a successive task, where the subject must make an absolute judgment based on
stored memory, or a simultaneous task, where the subject must make a comparative judgment based on
information within the signal, which places little demand on memory. It was found that subjects perform
worse on successive tasks, which points to the conclusion that vigilance tasks demand attentional
resources (Parasuraman, Warm, & Dember, 1987; Warm & Dember, 1998). The important factors that
determine performance include event rate, event irregularities, spatial uncertainty of stimuli, and
multitasking.
2.1.3 Frameworks for Studying Vigilance
Several different frameworks can be employed for understanding performance during vigilance
tasks. First, under attentional resource theories such as those proposed by Kahneman (1973) or Wickens
(1984), vigilance performance decrement reflects a depletion of attentional resources over time. Similar
to the attentional framework, Warm, Parasuraman, and Matthews also use the framework of mental
workload (the degree of information processing capacity that is expended during the task) to understand
vigilance performance. Subjective methods such as the NASA-Task Load Index (TLX) or Multiple
Resource Questionnaire (MRQ) provide a multi-dimensional measure of workload (Boles & Adair, 2001;
Hart & Staveland, 1988). Several studies show that these methods are reliable measures of
workload (Wickens & Hollands, 1999) and that vigilance tasks correlate to high levels of mental demand
and frustration (Warm, et al., 1996).
The newest method that Warm, Parasuraman, and Matthews cite to measure vigilance is a
neurological or physiological approach. Using various neuroimaging techniques such as positron
emission tomography (PET) or functional magnetic resonance imaging (fMRI), they find blood flow
increases to the prefrontal cortex, the region of the brain responsible for attention and cognition, during
vigilance tasks (Parasuraman & Caggiano, 2005; Parasuraman, Warm, & See, 1998). Finally, they cite
that stress is also an important component of vigilance. Subjects performing vigilance tasks have been
shown to have increased levels of hormones linked to stress such as catecholamines (Frankenhaeuser &
Patkai, 1965; Lundberg, 2005; Parasuraman & Davies, 1984) or epinephrine and norepinephrine
(Frankenhaeuser & Lundberg, 1982; Frankenhaeuser, Nordheden, Myrsten, & Post, 1971). Through
these different frameworks, Warm, Parasuraman, and Matthews provide a compelling case that vigilance
tasks are demanding and stressful.
2.2 Boredom
Boredom is an important concept that is closely related to vigilance and workload. This section
will present a variety of definitions and historical contexts, discuss some of the previous work, and survey
some of the measurement techniques available to researchers.
2.2.1 Definitions
Boredom is a phenomenon that stretches across many fields including psychology, philosophy,
physiology, anthropology, and more, so one of the greatest challenges when researching boredom is
finding a consistent definition. Although the lineage of the term boredom can be traced back to Roman
taedia, early Christian acedia, and French ennui, the first recorded usage of the actual word boredom
dates back to 1853 when Charles Dickens penned the words, "And I am bored to death with it. Bored to
death with this place. Bored to death with my life. Bored to death with myself." in his novel Bleak House
(Dickens, 1853).
Boredom has been described as "a common experience that affects people on multiple levels,
including their thoughts, feelings, motivations, and actions" (van Tilburg & Igou, 2012). Others have
described it as "an unpleasant, transient affective state in which the individual feels a pervasive lack of
interest in and difficulty concentrating on the current activity," (Fisher, 1993) or "an affective experience
associated with cognitive attentional processes" (Leary, Rogers, Canfield, & Coe, 1986). Current
boredom researcher Dr. John Eastwood states that, "through the synthesis of psychodynamic, existential,
arousal, and cognitive theories of boredom, we argue that boredom is universally conceptualized as 'the
aversive experience of wanting, but being unable, to engage in satisfying activity'" (Eastwood, Frischen,
Fenske, & Smilek, 2012). These myriad definitions highlight the broad and nebulous nature of
boredom itself, which is why there have been many studies on the effects of boredom.
2.2.2 Previous Work
Some of the earliest work on boredom research dates from the 1930s when researchers like
Joseph Barmack were first defining boredom as "a state which is...unpleasant principally because of
inadequate motivation resulting in inadequate physiological adjustments to it" (Barmack, 1939). Barmack found
signs of physiological changes over time, but the study was limited in time and controls. In the 1940s
and '50s, researchers began to more thoroughly study and quantify why human performance suffered
when operators were placed in boring roles. Extreme sensory deprivation and boredom experiments
actually led to hallucinations and vastly degraded cognitive abilities, similar to what was seen by aviators
on long-haul flights, radar operators in air defense, or sonar operators on submarines (Heron, 1957;
Vernon, McGill, Gulick, & Candland, 1959). Others have cited sensory deprivation as having possible
beneficial or therapeutic uses, but it remains an area that few researchers are investigating today
(Suedfeld, 1975).
Studies by Thackray (1977; 1979) and Wickens (1997) examined Air Traffic Control (ATC)
radar operators, finding that while some accidents and mistakes occur during busy times, more often they
occur during the low to moderate periods of traffic. Hopkin (1988) noted that the vast majority of
research on ATC focuses on the high-workload scenarios, leaving research into vigilance and boredom
woefully underdeveloped. Boredom has also been studied in various working environments (Bruursema,
et al., 2011; Drory, 1982; Fisher, 1993; Kroes, 2007), education (Larson & Richards, 1991; Vogel-Walcutt,
Fiorella, Carper, & Schatz, 2012), and other tasks (Leary, et al., 1986; Pattyn, Neyt, Henderickx,
& Soetens, 2008).
2.2.3 Measuring Boredom
There have been several approaches taken to predict and measure boredom. The most commonly
used tools for measuring and predicting boredom are surveys like the Boredom Proneness Index,
developed by Farmer and Sundberg (1986). This 28-item survey has been shown to be a reliable and
consistent measure of attention and interest in measuring classroom boredom. The Boredom
Susceptibility Scale (Zuckerman, 1979) is similar in nature but not as commonly used. In his 2003
review titled "Boredom: A Review", Vodanovich reviews many other scales, such as the job boredom
scales by Grubb (1975) and Lee (1986), a boredom coping measure by Hamilton, Haier, and Buchsbaum
(1984), two scales for leisure and free-time boredom (Iso-Ahola & Weissinger, 1987; Ragheb &
Merydith, 2001), the Sexual Boredom Scale (Watt & Ewing, 1996) and several more measures of
attention, sensation, and emotion (Vodanovich, 2003). Although the author cites a large number of
scales, he points out that there are many gaps in the literature due to the relatively little research done in
comparison to other emotions or states. Work continues to be done in this area, with new tests such as the
State Boredom Test (Fahlman, Mercer-Lynn, Flora, & Eastwood, 2013) being developed to address gaps
in the testing spectrum.
Other approaches to capturing boredom have focused on objective physiological measures.
Barmack's original study in 1937 found a strongly inverse relationship between boredom level and both
oxygen consumption and blood pressure. A study using an ATC operator task measured blood pressure, oral
temperature, skin conductance, body movement, heart rate and heart rate variability while 45 male
subjects performed a one-hour trial (Thackray, et al., 1977). They found that subjects who subjectively
reported greater boredom and monotony had significant differences in several of the physiologic
measures. The study concluded that the "nature of the pattern associated with boredom and monotony
suggests a pattern more closely related to attentional processes than to 'arousal'." One more recent study
used automatic posture tracking to measure students' affective state between boredom and high
engagement (D'Mello, Chipman, & Graesser, 2007). Weber et al. (1980) and Frankenhaeuser (1971)
both measured catecholamine levels during low activity tests but found somewhat conflicting results.
A similar study was conducted by Merrifield (2014) and showed that heart rate, skin conductance,
and cortisol levels were sensitive measures of boredom. Other studies using electrocortical
techniques like electroencephalography (EEG) to measure electrical activity emanating from the brain
or electrodermal measures (Davies & Krkovic, 1965) have proven to be weak indicators of boredom.
Overall, the research relating to pure boredom has been limited since much more attention has been
focused towards vigilance.
2.3 Mental Workload
This section will first discuss some of the most well-recognized models and definitions for mental
workload. Next, it will briefly sample some of the extensive literature of mental workload studies. It will
then focus on the case of workload transition, looking at how humans handle a change from low to high
workload. Finally, it will cover a variety of methods of workload measurement such as primary task
measures, secondary task measures, subjective workload assessments, neurophysiological measures, and
physiological measures outside the brain.
2.3.1 Definitions and Models
Vigilance and boredom are correlated with mental workload, which has been defined in many
different ways. Entire volumes have been written about the subject (Hancock, Mihaly, Rahimi, &
Meshkati, 1988; Moray, 1979; O'Donnell & Eggemeier, 1986) with Hancock's partial bibliography in
1988 listing over 500 papers. Clearly this review can only cover some of the interesting highlights that
are relevant to this research.
One of the first important distinctions to make is between taskload and workload. Taskload is an
objective measure of the number of tasks given to a person to complete. For example, an air traffic controller
may have ten planes to control in their sector. This type of measure is subject-independent since it is
generated only by the environment and factors external to the human. Workload, in contrast, is the perceived
difficulty of the task by the human. It is a relative term that is a combination of the objective measure of
difficulty (taskload) with factors internal to the human such as experience, skill, mood, or physiological
state. There are reliable tests to compare workload levels between humans, but ultimately there is still
some level of subjective criteria for workload internal to every human.
Some of the earliest research into the field of workload was conducted by Yerkes and Dodson in
the early 1900s. Their work evolved into the famous Yerkes-Dodson Curve, which shows that
performance is a function of arousal (Yerkes & Dodson, 1908).
Figure 1: Yerkes-Dodson Arousal vs. Performance Curve (horizontal axis: arousal, from low through
medium to high; vertical axis: performance)
They found that humans perform best at a moderate arousal level, with performance decreasing at
the low and high arousal extremes. This research has been shown to be conceptually accurate over the
course of the previous century and has been extrapolated beyond arousal to studies of low and high
workload. Hancock and Warm (1989) showed that a similar "inverted U" relationship exists between
attentional resource capacity and stress level. Studies by Hart (2010) and others (Mkrtchyan, et al., 2012;
Thornburg, et al., 2011) show a lesser performance drop in the low workload domain, although the larger
body of vigilance and boredom work presents evidence that there is at least some decrement below the
medium level of arousal.
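The qualitative shape in Figure 1 can be illustrated with a minimal sketch. The Gaussian form, the optimum at 0.5, the width parameter, and the 0-to-1 arousal scale are all assumptions chosen only to produce an inverted U; they are not taken from Yerkes and Dodson or from the studies cited above.

```python
import math

def inverted_u_performance(arousal, optimum=0.5, width=0.25):
    """Hypothetical inverted-U curve: performance peaks at `optimum` arousal.

    The Gaussian shape and the default parameters are illustrative
    assumptions, not values from the literature.
    """
    return math.exp(-((arousal - optimum) ** 2) / (2 * width ** 2))

# Assumed arousal values on an arbitrary 0-to-1 scale.
levels = {"low": 0.1, "medium": 0.5, "high": 0.9}
performance = {name: inverted_u_performance(a) for name, a in levels.items()}

for name, value in performance.items():
    print(f"{name:>6} arousal -> relative performance {value:.2f}")
```

Under these assumptions, relative performance is highest at the medium level and falls off symmetrically toward both extremes. An asymmetric curve, such as the lesser low-workload drop reported in some of the studies above, would need a different functional form.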
Studies of mental workload began to truly take root in the 1950s and 60s in domains like ATC
and aviation. The first seminal conference on workload was the 1977 North Atlantic Treaty Organization
Symposium on Mental Workload (Moray, 1979). At that conference experts from different fields such as
experimental psychology, control engineering, and physiological psychology presented models and
definitions of mental workload, laying the foundation in many fields for decades to come. As part of the
experimental psychology group, Moray himself gives a definition of mental workload as:
A load is something which imposes a burden on a structure, or makes it approach the
limit of its performance in some dimension. Go far enough along that dimension and the
system will fail in some way. In the case of mental workload, the central concept is the
rate at which information is processed by the human operator, and basically the rate at
which decisions are made and the difficulty of making the decisions. (Moray, 1979)
Many similar definitions have been provided in subsequent years, with the key emphasis that the
human operator is limited in capacity of mental resources and that increases in task difficulty will require
greater resources from the operator to adequately perform a task (O'Donnell & Eggemeier, 1986). From
this fundamental presupposition, there are two dominant models of how the human responds to a
mental load. Resource Theory was proposed by Kahneman in 1973 and modeled human mental abilities
as a single pool of resources that could be applied to an information processing problem (Kahneman,
1973). This model states that resources vary by individual and that, if demands exceed the
available resources, tasks are shed until equilibrium is reached at maximum capacity.
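The task-shedding idea in the single-pool model can be sketched as follows. The numeric demand units, the priority ordering, the rule of shedding the lowest-priority task first, and the task names are hypothetical choices made for illustration; the description above does not specify how tasks are selected for shedding.

```python
def shed_tasks(task_demands, capacity):
    """Drop tasks until total demand fits within a single resource pool.

    task_demands: list of (name, demand) pairs, ordered from highest to
    lowest priority (an assumed ordering, for illustration only).
    capacity: total resources available to this individual.
    Returns (kept, shed) as two lists of (name, demand) pairs.
    """
    kept = list(task_demands)
    shed = []
    # Remove the lowest-priority task until demand no longer exceeds capacity.
    while kept and sum(demand for _, demand in kept) > capacity:
        shed.append(kept.pop())
    return kept, shed

# Hypothetical tasks and demand values in arbitrary units.
tasks = [
    ("track threat", 5),
    ("monitor radar", 3),
    ("log status", 2),
    ("radio chatter", 2),
]
kept, shed = shed_tasks(tasks, capacity=9)
print("kept:", [name for name, _ in kept])
print("shed:", [name for name, _ in shed])
```

With a capacity of 9 units, the two lowest-priority tasks are shed and the remaining demand of 8 units fits within the pool.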
Wickens expanded upon this model with Multiple Resource Theory, which is widely cited as one
of the most comprehensive models of how humans handle cognitive activity (Wickens, 1984). Similar to
Resource Theory, Multiple Resource Theory posits that humans draw from a pool of mental resources in
order to address sensory and cognitive demands, which are then processed through an allocation policy
and then carried out by the body. Unlike Resource Theory, Multiple Resource Theory assumes that each
of the individual sensory and cognitive systems, such as visual, auditory, or tactile, has a separate pool of
resources that can be accessed independently. Each pool has a different limited capacity, so the
human may reach a maximum workload at different levels of taskload or engagement. The maximum
workload for any given person in any one modality may vary with task and experience, such as an
experienced pilot listening to a noisy radio or an air traffic controller visually inspecting a cluttered radar
display.
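The contrast with the single-pool model can be illustrated with a small sketch in which each modality has its own capacity. The modality names, the capacity of 1.0 per pool, and the demand values are assumptions for illustration, and the simple additive bookkeeping is a simplification rather than a statement of Wickens' model.

```python
from collections import defaultdict

def demand_by_modality(tasks):
    """Sum the demand that a set of (modality, demand) tasks places on each pool."""
    totals = defaultdict(float)
    for modality, demand in tasks:
        totals[modality] += demand
    return dict(totals)

def overloaded_modalities(tasks, capacities):
    """Return the modalities whose summed demand exceeds their own capacity."""
    totals = demand_by_modality(tasks)
    return sorted(m for m, d in totals.items() if d > capacities.get(m, 0.0))

# Hypothetical capacities and demands in arbitrary units.
capacities = {"visual": 1.0, "auditory": 1.0}
same_modality = [("visual", 0.7), ("visual", 0.7)]     # two visual tasks
split_modality = [("visual", 0.7), ("auditory", 0.7)]  # one visual, one auditory

print(overloaded_modalities(same_modality, capacities))
print(overloaded_modalities(split_modality, capacities))
```

Both task sets carry the same total demand, but under these assumptions only the set that draws twice on the visual pool exceeds a capacity limit.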
While Resource Theory and Multiple Resource Theory have emerged as some of the overarching
theories of mental workload, the process of receiving and interpreting information has also been explored
in many other ways. Wickens' model of information processing, shown in Figure 2, provides a good
system diagram for representing the transformation of data from the environment into responses (Wickens
& Hollands, 1999). Humans receive the raw signal through sensory organs (eyes, ears, nose, etc.),
process those signals into usable pieces of information (text, speech, etc.), draw upon their short- and
long-term memory to interpret the information, and finally use their decision-making centers to generate
and execute a response. This process has several potential bottlenecks where humans may be resource-limited,
such as the working memory, attentional resource, and response selection phases (Marois & Ivanoff,
2005). Humans may be able to process signals in multiple modalities, such as simultaneous visual and
auditory signals, but are generally constrained by the response selection stage (Pashler, 1994).
Understanding the neural correlates to the psychological phenomena is a growing trend and one of the
underpinnings of this research.
Figure 2: Model of Human Information Processing (Wickens & Hollands, 1999). Recoverable diagram
labels: stimuli, short-term sensory store, perception, decision and response selection, response
execution, working memory, long-term memory, feedback.
2.3.2 Previous Work
As mentioned previously, an extensive number of studies have sought to evaluate human
mental workload. In the aviation domain, there have been numerous studies that seek to examine the
mental workload of airline pilots (Battiste & Bortolussi, 1988; Roscoe, 1992; Sheridan & Simpson, 1979;
Wilson, 2002), military pilots (Alfredson, Holmberg, Andersson, & Wikforss, 2011; Sirevaag, Kramer,
Reisweber, Strayer, & Grenell, 1993; Svensson, Angelborg-Thanderz, Sjöberg, & Olsson, 1997;
Svensson, Angelborg-Thanderz, & Sjöberg, 1993), and air traffic controllers (Hopkin, 1988; Wickens, et
al., 1997). Outside of the classic aviation sphere, there have also been many studies examining the mental
workload of highly trained specialists like surgeons (Berguer, Smith, & Chung, 2001; Klein, Riley,
Warm, & Matthews, 2005; Zheng, Cassera, Martinec, Spaun, & Swanström, 2010), missile operators
(Berka et al., 2005; Hill, Zaklad, Bittner, Byers, & Christ, 1988) and astronauts (Manzey, 2000; Manzey,
Lorenz, & Poljakov, 1998), but also many studying more mundane tasks like driving (De Waard &
Studiecentrum, 1996; Recarte & Nunes, 2003). For a full listing of all experiments done to measure
mental workload, the reader could consult synopses done by Damos (1991), Hancock and Meshkati
(1988), Kantowitz and Campbell (1994), Lysaght et al. (1989), Moray (1988), O'Donnell and Eggemeier
(1986), Warm et al. (1996), Meshkati (2011), or Vidulich (2012).
As noted previously, the bulk of the work has focused on high mental workload since those are
the domains that are intuitively associated with critical situations. Endsley and Rodgers (1997) found that
there was a positive correlation between workload and operational errors in the high-workload domain.
However, there is evidence that many errors that lead to accidents or incidents stem from low or
moderate-workload situations. Several studies have shown that ATC operational errors have occurred
under low to moderate traffic complexity (Redding, 1992; Stager, Hameluck, & Jubis, 1989). Thus,
Hopkin (1995) argues, underload has been relatively under-studied despite being a real threat to safe
operations.
2.3.3 Workload Transition
Although many of the previous studies focus on steady state workload situations, the real world
rarely exhibits this trait. Often there are transitions from low to high task load that require an adaptation
by the operator to meet critical new demands. These transitions are a common feature in many domains
such as emergency medicine and military operations. For example, a missile defense officer may need to
rapidly transition from a very low task load monitoring state to a very demanding engagement state within
the course of a few seconds in order to successfully intercept an incoming missile. Many other military
and civilian career-fields face similar challenges, but missile defense provides an excellent case study for
task, and thus workload, transition because of its low frequency of engagement (almost zero), short
timelines, and high criticality. It is the epitome of what Hancock calls "hours of boredom, moments of
terror" (Hancock & Krueger, 2010).
While mental workload transition shares some of the same attributes of both vigilance and
boredom at one end and high workload at the other, there are some additional problems associated just
with the transition period. One of the defining attributes of workload transition is uncertainty. Often in
environments with high uncertainty, operators will use decision-making heuristics such as reduction of
information, assumption-based reasoning, weighing pros and cons, suppression of uncertainty, or hedging
with alternatives (Kahneman, Slovic, & Tversky, 1982; Lipshitz & Strauss, 1997). Surprise is also one of
the detrimental factors associated with workload transition since humans must overcome their "behavioral
momentum" during a workload transition (Nevin, Mandell, & Atak, 1983). Surprise is effectively an
overcompensation during that "momentum" exchange to adjust to a discontinuity in mental model and
can result in diminished mental performance (Meyer, Reisenzein, & Schützwohl, 1997). Automation
surprise has become one of the leading factors in automation-related accidents (Sarter, Woods, &
Billings, 1997; Woods & Sarter, 2000). Team coordination is especially critical during transition periods
since roles may change between a low- and high-workload environment.
Huey and Wickens (1993) propose five direct factors of workload transition, along with several
other considerations that also influence performance. The first factor is the task character. Included here
are factors such as task structure, performance criteria, task schedule, presentation rate, task complexity,
task variability, task duration, and task requirements and procedures. The task nature is highlighted by
considering examples such as a fighter pilot that suddenly encounters a surface-to-air missile versus a
nuclear power plant operator that receives a caution message. Both may be life-threatening situations, but
the task schedule, duration, and complexity are much more demanding for the aviator than the plant
operator.
When considering the human as the plant within a control system introduced earlier in section
2.3.1, the next three factors could be considered the input, processing, and output elements. The second
direct factor of workload transition, the "input", consists of information that is being transmitted from the
environment to the human. This is done through visual displays, visuals from the scene, focal vision,
peripheral vision, auditory, haptic, olfactory, or other sensory modalities. In addition to the raw data
collection from the various senses, humans also must do what Wickens calls "encoding" in his Multiple
Resource Theory model. Encoding consists of processing and translating the raw data into usable
information that can be acted upon by the processing centers of the brain.
In terms of the third factor of workload transition, the "processing" functions of the brain, there
are several good models of how processing works. Wickens defines processing as two stages
of perception and responding. Another commonly used framework for information processing is
Rasmussen's "Skill-Rule-Knowledge" hierarchy (Rasmussen, 1983). In it, humans make decisions based
on a combination of their experience, task complexity, uncertainty, and time pressure. Skill-based
behavior uses extensive experience to assign an automatic response with little use of cognitive resources.
This is often the kind of paradigm associated with routine behaviors, like walking or driving, or with
experts like emergency room doctors or fire chiefs who have well-developed heuristics for diagnosis in
what has been called naturalistic decision-making (Klein & Zsambok, 1997). Rule-based behavior is
found with a medium experience level, with the human generally following "if-then" types of procedures.
Finally, knowledge-based behavior occurs when the human has very little experience with the situation
and must take a systematic or novel approach. This type of decision-making has been well-explored by
various psychologists (Bekier, Molesworth, & Williamson, 2011; Janis & Mann, 1977; Kahneman, 2011;
Rovira, McGarry, & Parasuraman, 2007; Skitka, Mosier, & Burdick, 1999). A third paradigm for
understanding cognition is Endsley's Situational Awareness Model (1995). Situational Awareness is a
three-step process of perceiving the environment, understanding what is happening, and projecting
probable courses of action onto the environment. Cognition occurs at all stages, but particularly during
the comprehension and projection phases of SA.
The fourth factor of workload transition, the "output" variables, can also affect a mental workload
transition. Output variables are the avenue by which the human imparts their decision upon the system or
environment. These include control design, control gain or lag, and order of controls. If the controls are
poorly suited to the task, the human will have to apply additional resources to accomplish their goals.
Anyone who has used a mouse with the gain set extremely high will intuitively understand this challenge,
and many studies in aerospace controls work have demonstrated the importance of proper control laws
(Wickens, Vidulich, & Sandry-Garza, 1984). Display-control compatibility can have important
consequences on information processing abilities, and any disconnect between the two will place an
additional burden on the operator (Jagacinski, 1989).
The final primary factor that Huey and Wickens cite is computer aiding and automation. With
increasingly complex systems, computer aiding and automation are a necessity. Several studies of aviation
show that workload peaks and overall workload were reduced with implementation of automation
(Haworth, Atencio Jr, Bivens, Shively, & Delgado, 1987; Vienneau & Gozzo, 1987). However, they also
note that automation often only translates workload into another form, sometimes leaving the
operator with mismatched tasks or reduced system knowledge and actually increasing the cognitive
demands (Hart & Sheridan, 1984; Kessel & Wickens, 1982; Wickens & Kessel, 1979).
While the five factors listed above (task character, input variables, information processing, output
variables, and computer aiding and automation) are the primary drivers of additional workload during
workload transition, they are not the only factors that influence how it occurs in actual operations.
Insufficient sleep and fatigue are well-known to be detrimental to mental performance and can exacerbate
the issues found in workload transition (Kahol et al., 2008). Studies in aviation claim that fatigue
accounts for at least 4-8% of all mishaps (Caldwell, 2005; Tvaryanas, et al., 2008). Research in other
critical fields like medicine is even more alarming, such as studies that cite that residents on-call made
36% more serious medical errors and made 300% more fatigue-related medical errors that led to a
patient's death when compared to well-rested residents (Landrigan et al., 2004; Lockley et al., 2007).
Even though this has been known for decades, governing bodies like the FAA are still combating the
sleep and fatigue issues of commercial aviation in 2013 (Sachse, 2011).
Beyond physiological factors, there are other factors that can play important roles in mental
workload transition. Huey and Wickens note that spatial awareness is a crucial element to successful
operations and places a large demand on mental resources (Huey & Wickens, 1993). They also describe
several biases in geographic memory and spatial orientation that can be damaging to correct operations.
In addition to geography, they also cite cognitive tunneling and cognitive switching as two
important phenomena that influence mental workload transition. When switching between tasks, there is
a certain cost incurred on the mental processing resources. If this is done often, the cost of switching can
drain the resources available for either process and can lead to the detriment of both tasks (Wylie &
Allport, 2000). Cognitive tunneling is another phenomenon that influences the mental processes of
humans. Often, humans tend to focus on a single piece of information or hypothesis and fail to see
alternatives, even though their information may be wrong or other options may be better (Thomas &
Wickens, 2001). The Eastern Airlines crash into the Everglades is an excellent example
of pilots who were fixated on solving an issue (the landing gear light bulb had burned out), failed to detect
important information (the autopilot had disengaged and they were descending), and ended up crashing
the airplane (Reed, McAdams, Thayer, Burgess, & Haley, 1973).
Overall, workload transition draws upon elements of both the low and high workload domains.
Fatigue and stress are shown to have a significant impact on the functioning of operators during high
workload, and often these operators are subject to these conditions due to vigilance tasks. The models for
mental workload are primarily derived from high-workload scenarios, and many of the important factors
of the information processing model are also important in workload transition.
2.3.4 Workload Measurement
This section will give an overview of the important parameters of workload measurement and
highlight several different techniques used for the measurement of mental workload.
2.3.4.1 Overview
There are several important considerations that must be addressed when trying to measure mental
workload. O'Donnell and Eggemeier (1986) provide a good overview of the topic and highlight the
important characteristics of workload measurement. Wierwille and Eggemeier (1993) reinforce these
conclusions and also provide recommendations for choosing the appropriate type of measurement
technique for a workload evaluation. The three most important factors in a good workload measurement
tool are sensitivity, diagnosticity, and task intrusiveness, although also important are global sensitivity,
transferability, and implementation requirements (Wierwille & Eggemeier, 1993). Sensitivity refers to
the capability of a technique to detect changes in the levels of workload imposed by task performance. In
testing for "choke points", a relatively insensitive measure may be acceptable since it need only
discriminate the highest aberrations. When testing aspects like operational procedures, display designs, or
crew composition, finer discriminations in workload must be made. Diagnosticity is based upon the
multiple-resource approach and refers to the ability of a test to "discern the type or cause of workload, or
the ability to attribute it to an aspect or aspects of the operator's task" (Wierwille & Eggemeier, 1993). A
test with high diagnosticity measures only the resources being strained by the task in question and
provides some explanation of the workload-driving elements. For example, if trying to measure the
mental workload of a person while using a variety of visual displays, a visual test should be used to
determine the excess capacity. Finally, intrusiveness is highly important in measuring workload,
especially in experiments that attempt to mimic natural settings. A test that is overly intrusive could
induce its own extra level of workload or divert the subject's attention from the primary task, thereby
muddling the results of what is actually being measured.
2.3.4.2 Primary Task Method
The most obvious method for measuring workload is using the primary task. Since primary task
performance is often closely tracked, it is possible to try to extract workload from the speed and accuracy
of the task performance. This type of workload measurement operates under the assumption that "speed
and/or accuracy of performance will decrease as workload increases beyond a critical value or threshold
for unimpaired performance" (Wierwille & Eggemeier, 1993). Primary task measures can provide high
sensitivity and should be considered during any workload analysis. However, others (Hart & Wickens,
1990) have cautioned that subjects may be able to expend more resources to keep performance high,
making primary task workload measurement insensitive, especially at the low-to-moderate levels where
operators can easily compensate for changes in demand.
2.3.4.3 Subjective Workload Ratings
Beyond primary task performance, there are a number of other methods for measuring workload.
One of the most commonly-used techniques is the subjective workload analysis, which commonly comes
in the form of a workload survey. There are several different workload surveys such as the NASA Task
Load Index (TLX) (Hart & Staveland, 1988), the Cooper-Harper Rating Scale (Cooper & Harper Jr,
1969), or the Subjective Workload Assessment Technique (SWAT) (Reid & Nygren, 1988), each of which takes a similar
approach to measuring workload through a series of questions. The TLX is the most widely-used test
because it has been found to be a reliable measure of workload (Hart, 2006) and provides a
multidimensional analysis of which component of overall workload is most important. The test first asks
the subject to rate their perceived workload on six subscales of mental demands, physical demands,
temporal demands, own performance, effort, and frustration. Subjects are then asked to compare pairs of
categories and choose which was more influential; these choices weight each of the components in an overall workload score. This test
is generally administered at the end of the trial, but can also be administered during an experiment to
measure workload at different points in time (Thornburg, et al., 2011).
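
As an illustration of the weighting step described above, the following Python sketch computes an overall TLX score. It assumes the standard form of the procedure, which is not spelled out in the text above: each of the six subscales is rated on a 0-100 scale, and each subscale's weight is the number of times it was chosen across the 15 possible pairwise comparisons. The function names and the example ratings are hypothetical and are not data from this study.

```python
SUBSCALES = ("mental", "physical", "temporal",
             "performance", "effort", "frustration")
N_PAIRS = 15  # distinct pairs among six subscales: 6 * 5 / 2


def tlx_weights(pair_winners):
    """Count how often each subscale was chosen in the pairwise comparisons."""
    if len(pair_winners) != N_PAIRS:
        raise ValueError("expected one winner for each of the 15 pairs")
    weights = dict.fromkeys(SUBSCALES, 0)
    for winner in pair_winners:
        if winner not in weights:
            raise ValueError(f"unknown subscale: {winner}")
        weights[winner] += 1
    return weights


def tlx_overall(ratings, weights):
    """Weighted mean of the six subscale ratings (assumed 0-100 each)."""
    return sum(ratings[s] * weights[s] for s in SUBSCALES) / N_PAIRS


# Hypothetical subject: mental demand dominated every comparison it was in.
ratings = {"mental": 80, "physical": 10, "temporal": 70,
           "performance": 40, "effort": 60, "frustration": 50}
winners = (["mental"] * 5 + ["temporal"] * 4 + ["effort"] * 3
           + ["frustration"] * 2 + ["performance"] * 1)
print(tlx_overall(ratings, tlx_weights(winners)))
```

Because a subscale that is never chosen receives a weight of zero, its rating drops out of the overall score entirely, which is one reason the individual subscale ratings are usually reported alongside the weighted total.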
While subjective tests are shown to be a fairly reliable measure of workload, they lack a certain
purely objective element that is important for measuring the effectiveness of a display or interface.
Although they may be able to give good feedback about global levels of workload, these tests are not able
to generate workload measurements at pinpointed times throughout the entire experiment. The workload
of an operator during the critical phase of a test may be much more important than the overall workload to
a human factors engineer, so global workload measurement may not be particularly valuable.
Additionally, Murdock shows that humans are particularly vulnerable to the primacy and recency biases
in what he calls the serial position effect, where humans tend to better remember events at the beginning
and end but fail to remember the middle events (Murdock Jr, 1962). Hence, if a critical event occurred in
the middle of the experiment, the perceived workload might be attenuated by the time it comes to
complete a survey like the TLX.
2.3.4.4 Secondary Task Measures
Secondary task measures are a derivative of resource theories of mental workload and function
under the premise that task performance on the secondary task will degrade when there are fewer
resources to allocate to the secondary task (Knowles, 1963). They attempt to measure the reserve
capacity of a resource by placing extra demands in addition to the primary task. If using the general
Resource Theory, the secondary task type does not matter much as long as it provides a high level of
sensitivity. Using a Multiple Resource Theory presumption, the secondary task should be paired closely
with the primary task so that they are demanding the same resources (Hart & Wickens, 1990). As
discussed previously, a highly diagnostic secondary task will be closely paired with the primary resource
being used in order to accurately record the workload of the subject using that specific resource.
There are two primary categorizations of secondary task measures: unrelated and embedded
secondary tasks. Unrelated tasks are often arbitrary tasks that have been well-developed by psychologists
to measure workload. Examples of these kinds of tasks include time estimation, tracking tasks, memory
tasks, tapping tasks, arithmetic, and reaction tasks (O'Donnell & Eggemeier, 1986; Schlegel, Gilliland, &
Schlegel, 1986; Wierwille & Eggemeier, 1993). As workload increases, performance decreases in a
predictable manner. However, these tasks can be intrusive, unnatural, and difficult to standardize if
subjects are coming from a diverse background. A different approach to secondary task measurement is
the embedded secondary task. These use a more natural task that is discreetly embedded within the
interface or system, minimizing suspicion and evoking a more natural response from subjects. Cummings
(2004) showed that workload for Tomahawk missile operators could be measured using a chat box
interface, something highly realistic to actual operations, natural to the overall task, and minimally
intrusive.
2.3.4.5 Neurophysiological Workload Measures
A third approach to measuring workload throughout an experiment or task is physiological
tracking. Physiological tracking allows for continuous monitoring of subject state, whereas many primary
and secondary task measures can only measure the subject's state at discrete event times. Although
physical workload can be a very important factor in environments like high-G military flying (Burton,
1980), scientists interested only in mental workload have several options for workload measurement that
all begin with the brain. Cognitive activity is driven by the firing of neurons in different regions of the
brain, which consume oxygen and glucose, produce electrical signals, and give off carbon dioxide
byproducts (Koechlin, Basso, Pietrini, Panzer, & Grafman, 1999; Miller & Cohen, 2001; Raichle, 2011;
Raichle & Mintun, 2006; Ridderinkhof, van den Wildenberg, Segalowitz, & Carter, 2004; Roy &
Sherrington, 1890; Speert, 2012; Whyte, 2011). Therefore, studying any one of these elements can be a
suitable method for measuring mental activity. Since the brain is always active at some level, each of
these elements of neural "combustion" must be compared to a baseline level during a control or resting
period. Additionally, each person's individual physiology is different, making absolute comparisons
between subjects or even within subjects on different days a challenging task (Rypma & D'Esposito,
1999).
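
The baseline comparison described above can be sketched in a few lines of Python. This is a minimal illustration only: it assumes that the recording begins with a resting period of known length and that subtracting the mean of that period is an adequate correction. The function name and sample values are hypothetical, and real pipelines typically add filtering and artifact rejection.

```python
from statistics import mean


def baseline_correct(samples, n_baseline):
    """Express each sample as a change from the mean of the resting period.

    samples: sequence of measurements from one subject in one session
    n_baseline: number of leading samples recorded during rest (assumed known)
    """
    if not 0 < n_baseline <= len(samples):
        raise ValueError("n_baseline must be between 1 and len(samples)")
    baseline = mean(samples[:n_baseline])
    return [x - baseline for x in samples]


# Hypothetical signal: two resting samples followed by two task samples.
print(baseline_correct([10.0, 10.0, 12.0, 13.0], n_baseline=2))
```

Because the correction is computed per subject and per session, the resulting values are relative changes, which is consistent with the caution above about absolute comparisons between subjects or across days.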
This section will give a very brief overview of the noninvasive techniques available. Cabeza
(2000), Coyle (2003), and Huppert (2006) provide good overviews of the literature surrounding noninvasive neurocognitive measurement techniques and some guidelines to researchers looking to study
mental workload. Invasive techniques such as brain implants and surgery pose higher risk to the subject,
higher costs to experimenters, and are often incompatible with studies of conscious, cognitive behavior,
which is the focus of most psychophysiological research. Non-invasive methods are much preferred for
healthy subjects participating in simple experiments because of the lower costs and risks coupled with
relatively high reliability of modern techniques (Bennett & Miller, 2010).
As mentioned previously, there are three main categories of brain activity indicators: supply,
products, and byproducts. The supply category can be broken into glucose and oxygen, the two main
ingredients required for neural firing. Glucose uptake can be measured using Positron Emission
Tomography, or PET scan, which uses a radioactive tracer that is analogous to glucose (Buckner &
Logan, 2001). Cabeza's review of the many measures of workload that have been correlated with glucose
uptake shows it to be a viable and accurate method for neurophysiological measurement (Cabeza &
Nyberg, 2000). PET scans do come with a cost, though, since they require subjects to receive a
radioactive treatment which exposes patients to potentially high levels of radiation. Additionally, the
radioactive isotopes to perform PET scans must be produced on-site and have relatively short half-lives,
which limits the length of a possible test (Raichle & Mintun, 2006).
The second "supply" element is oxygen. Oxygen is carried to the brain via hemoglobin in the
blood stream, so there are several different methods for determining how much oxygen is being supplied
to the brain. Transcranial Doppler sonography, or TCD, measures the velocity of the blood flow to the
brain, or cerebral hemovelocity. Several studies show that increases in cerebral hemovelocity are
correlated to increases in mental workload (Droste, Harders, & Rastogi, 1989; Warm, et al., 2009),
although this technique is generally limited to entire brain workload analysis rather than measurement in
specific regions. Another method for measuring the oxygen flowing to the brain is through functional
Near-Infrared Spectroscopy, or fNIRS. Section 2.4 will cover this topic in depth, but simply put this
technology allows for the measurement of the concentration of both oxygenated and deoxygenated
hemoglobin through the measurement of the absorption of near-infrared light at specific wavelengths.
While still a relatively nascent technology, it has been shown to be a viable tool for measurement of
neural activity in several regions of the brain (Izzetoglu et al., 2005; Sassaroli, et al., 2008; Wolf, Ferrari,
& Quaresima, 2007) and a promising method for populations who are unable to lie motionless for
extended periods like infants and the mentally impaired (Lloyd-Fox, Blasi, & Elwell, 2010; Meek et al.,
1998; Sakatani, Chen, Lichty, Zuo, & Wang, 1999).
With an adequate supply of oxygen and glucose, the brain performs cognitive activities through
networks of neuron firing (Raichle, 2011). Using ions separated by membranes, the neuron is able to
generate electrical potentials that are transmitted throughout the brain. This process is controlled by the
nucleus of the neuron, which uses glucose and oxygen to regulate the electrical activity. When the
activity is greater, such as during a period of elevated mental activity, the nucleus uses greater amounts of
glucose and oxygen to fire more rapidly and produce more signals. The electrical output, therefore, is
actually the most direct method for measuring what is truly going on in the brain, while the supply chain
is merely a support mechanism. Electroencephalography, or EEG, attempts to measure these electrical
signals through probes placed at various locations around the skull. This technique has been in use since
the 1940s and has been shown to be a viable workload measurement tool (Berka et al., 2007; Berka, et al.,
2005; Dussault, Jouanin, Philippe, & Guezennec, 2005), brain-computer interface mechanism (Coyle, et
al., 2003; Hu et al., 2011; Tan & Nijholt, 2010) and assistance device for the physically or mentally
impaired (Luld et al., 2012). It can provide high temporal resolution but has low spatial resolution and is
susceptible to motion artifacts such as blinking and head movement (Hu, et al., 2011). It is a relatively
well-developed technique, with algorithms that can filter out many types of noise and artifacts
(Manyakov, Chumerin, Combaz, & Van Hulle, 2011), yet it still has a relatively low signal-to-noise ratio,
low spatial resolution, and can be intrusive to subjects (Coyle, et al., 2003; Nijholt, Bos, & Reuderink,
2009; Tan & Nijholt, 2010).
The third phase of the neural firing process is the removal of waste, which generally takes the
form of deoxygenated hemoglobin that is carried away in the bloodstream. Deoxygenated hemoglobin
can be measured by fNIRS because of its specific light absorption properties, but it also can be measured
using its magnetic properties. Functional Magnetic Resonance Imaging, or fMRI, takes advantage of
these magnetic properties to measure the concentration of deoxygenated hemoglobin with excellent
spatial resolution (Buckner & Logan, 2001; Carr, Rissman, & Wagner, 2010; Logothetis, Pauls, Augath,
Trinath, & Oeltermann, 2001). Numerous studies have measured brain activity during various task types
and have shown activation in different regions, showing the utility of fMRI and reinforcing that different
parts of the brain are responsible for different kinds of mental activity (Cabeza & Nyberg, 2000; Causse et
al., 2013; Cohen et al., 1993; Curtis & D'Esposito, 2003; Jaeggi et al., 2003; Manoach et al., 1997;
McCarthy et al., 1994; Ochsner, Bunge, Gross, & Gabrieli, 2002; Price, 2010; Tootell, Hadjikhani,
Mendola, Marrett, & Dale, 1998). One of the key advantages of fMRI is the ability to map the entire
brain during an activity, allowing researchers to examine which regions work in conjunction during a
certain task (Monchi, Petrides, Petre, Worsley, & Dagher, 2001).
fMRI is an excellent tool because of its spatial resolution (≤10 mm³) but it does have several
drawbacks (Carr, et al., 2010). First, it is limited in its temporal resolution, with a typical system taking
2-4 seconds per slice for 5-16 slices. It is expensive to own and operate, with average costs running from
$200/hour up to $1,100/hour at the Massachusetts General Hospital Martinos Center for Biomedical
Imaging. Since subjects must lie in a tube surrounded by a large and acoustically noisy magnet, it is
a very unnatural environment and can reduce cognitive abilities or divert attention from the primary task
(Haller et al., 2005). Subjects must also lie very still, which precludes using this technique on many
populations such as the young or the handicapped. Finally, fMRI only measures the deoxygenated
hemoglobin signature, so it may be difficult to get a full picture of the entire hemodynamics, since
oxygenated and total hemoglobin levels are closely coupled to deoxygenated hemoglobin. Still, it
remains a very effective and standard tool because of the high spatial resolution, entire brain-mapping
capability, and extensive history (Bennett & Miller, 2010; Cabeza & Kingstone, 2001).
In contrast to fMRI, fNIRS is less capable in measurement of the entire brain but can be used in
more natural settings (Ayaz et al., 2012; Girourd et al., 2009; Helton et al., 2010; Hirshfield et al., 2009;
Izzetoglu et al., 2011; Sassaroli, et al., 2008; Shimizu et al., 2009; Solovey, 2009; Solovey, et al., 2012;
Son, Guhe, Gray, Yazici, & Schoelles, 2005; Tsujimoto, Yamamoto, Kawaguchi, Koizumi, & Sawaguchi,
2004; Tsunashima & Yanagisawa, 2009). fNIRS sensors generally are mounted on a headband or cap
that is quickly applied, is comfortable for extended wear (hours), and can be used in realistic settings like
desktop computer work (Solovey, 2009), simulators (Shimizu, et al., 2009; Tsunashima & Yanagisawa,
2009), or actual on-road driving (Takahashi et al., 2011). This type of sensor setup can be seen below in
Figure 3 and Figure 4. The use of fNIRS in natural settings is an important distinction because laboratory
tests do not always translate into the complexities and nuances of real-world activities.
The lack of total brain mapping capability with fNIRS may result in less information to draw
correlations between different brain regions and map neural networks, but researchers looking to study
mental workload, working memory, and other executive functions are primarily focused on the prefrontal
region, so this may be an acceptable tradeoff. If a researcher was interested in another type of activity,
such as a motor task, fNIRS sensors could be applied to that region of the brain to measure related
activity. This has been done in populations that cannot lie still, such as mentally handicapped or infants
where fMRI is difficult or impossible (Meek, et al., 1998).
Figure 3: fNIRS sensor diagram for prefrontal cortex measurement (Sassaroli, et al., 2008)
Figure 4: Operator with fNIRS sensors mounted on forehead
2.3.4.6 Other Physiological Workload Measures
While measuring the brain may provide the most direct measures of mental activity, there are
many other secondary measures that have been shown to reliably predict mental workload (Kramer,
1991). Physiological measures have several pros and cons. The intrusiveness of physiological measures
ranges from minimal impact for items like heart rate trackers to higher impact with eye trackers or
respiratory monitors. However, they are generally non-intrusive to the primary task, they can measure
activity even when the subject is not physically interacting with the system, they provide a
multidimensional measurement of the subject, and they can record continuous data. On the other hand, they
require specialized equipment, the data are often relatively noisy, and physiological signals can come from multiple bodily
sources which may have little to do with the experiment, so experimenters must always weigh the pros
and cons when considering using a physiological measurement (Kramer, 1991). Some of these measures,
such as heart rate or blood pressure, can be directly traced to an increased demand by the brain, but many
are secondary responses that are merely correlated with increased mental workload. Physiological
measures can be divided roughly into three categories: cardiovascular measures, ocular measures, and
sympathetic nervous system measures. While other types of measures such as posture (D'Mello, et al.,
2007) or muscle tension (Wierwille, 1979) have been suggested, the vast majority of non-neurological
physiological measures fall into one of these categories.
Cardiovascular measures are some of the most commonly-used methods for tracking workload
over time. Kramer lists the most common measures of cardiac activity, including electrocardiogram
(ECG or EKG) measures like heart rate and heart rate variability, blood pressure measures, and blood
volume measures. Generally, heart rate, heart rate variability, and blood pressure all increase during
periods of high mental workload (Durantin, et al., 2014; Hjortskov et al., 2004; Pattyn, et al., 2008;
Roscoe, 1992; Sirevaag, et al., 1993). These measures are relatively easy to obtain with a simple heart
rate and blood pressure monitor, both of which are minimally intrusive to the subject. Respiratory rate
can also serve as an indicator of higher mental workload, although it can be a more intrusive measurement
technique since the subject is required to don a mask of some kind (Roscoe, 1992; Sirevaag, et al., 1993).
There are several different measures of eye activity that are associated with mental workload.
Pupil dilation has been found to be a good measure of workload, with increased dilation occurring during
periods of high workload (Beatty, 1982). In addition to the pupillary response, some eye motion
components such as dwell times and scan pattern variability are shown to correlate well with mental
workload, while others such as eye speed and large-amplitude movement frequency are more weakly
correlated (May, Kennedy, Williams, Dunlap, & Brannan, 1990). Finally, several eyelid measures such
as blink rate, blink pattern, and closure duration have been proposed as possible workload measures with
varying degrees of confidence (Wilson, 2002). Overall, it appears that changes in pattern are more telling
than changes in raw characteristics when measuring movement factors (Kramer, 1991). Much progress
has been made in the past few years in minimizing the intrusiveness and improving the usability of
eye-tracking devices, with various types of trackers available that can be head-mounted or desktop-mounted.
The third type of physiological workload measurement is through symptoms of the sympathetic
nervous system, or SNS, which is part of the autonomic nervous system, or ANS. The SNS is commonly
associated with the "fight-or-flight" response and stimulates many systems in the body when activated.
One of the most commonly-used measures is galvanic skin response, which measures sweat produced in
certain regions of the skin. Galvanic skin response has been associated with mental workload in several
different environments (Berguer, et al., 2001; Davies & Krkovic, 1965; Wierwille, 1979) and is a
relatively low-intrusive technique. Another SNS response to workload is the release of hormones such as
catecholamines (Frankenhaeuser & Lundberg, 1982; Lundberg, 2005), cortisol (Dickerson & Kemeny,
2004), and norepinephrine, which is also called noradrenaline (Frankenhaeuser, et al., 1971;
Frankenhaeuser & Patkai, 1965). These indicators can be measured in real-time although they are generally
measured through blood drawn immediately following an experiment, which can make this a more
intrusive method. The great concern with all of these SNS responses is that they are often criticized as
measuring stress rather than mental workload, two different phenomena that are often but not necessarily
linked (Hancock & Desmond, 2001). Therefore, it is prudent to use these methods in conjunction with
other physiological measures in order to reduce the type I error.
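
A short Python sketch can show why combining measures reduces the type I error. It rests on a strong assumption that is not made in the text above: that the false-alarm events of the individual measures are statistically independent and that a workload change is declared only when every measure flags it. Physiological signals are often correlated, so the real reduction is likely smaller. The function name and the rates used are hypothetical.

```python
from math import prod


def joint_false_alarm_rate(rates):
    """False-alarm rate when ALL measures must flag an event.

    rates: per-measure type I error rates, each between 0 and 1.
    Assumes the measures raise false alarms independently of one another.
    """
    if not rates:
        raise ValueError("at least one rate is required")
    if any(not 0.0 <= r <= 1.0 for r in rates):
        raise ValueError("rates must lie between 0 and 1")
    return prod(rates)


# Hypothetical example: two measures, each with a 5% false-alarm rate.
print(joint_false_alarm_rate([0.05, 0.05]))
```

The tradeoff is that requiring agreement between measures also makes it easier to miss a true workload change, so the decision rule should be chosen with both error types in mind.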
Overall, the use of physiological measures of workload is promising but should always be done
with a caution of confounding variables. Many factors beyond just mental workload can influence
physiological responses such as physical activity, environmental conditions, sleep and fatigue,
metabolism, physical conditioning and more, so it is incumbent upon the researcher to control as many
variables as possible. Additionally, it is sometimes difficult to link a behavioral event with certainty to a
physiological measure because these are all secondary functions to actual mental activity in the brain, so
there will always be a degree of systematic error involved. With that in mind, these measures are also
relatively cheap, well-studied, and minimally-intrusive methods for collecting information about the state
of the operator at different times throughout a task.
2.4 Neurophysiology and functional Near-Infrared Spectroscopy (fNIRS)
Since fNIRS will be central to the experiment described in this thesis, this section will expand on
the neurophysiological phenomena being measured by fNIRS and how those phenomena correlate to
mental workload. This section will also discuss some of the limitations of using fNIRS.
2.4.1 Neurophysiology
At the core of neural hemodynamic studies such as fNIRS and fMRI is the blood oxygen level-dependent
(BOLD) signal. Quoting from the 2011 Encyclopedia of Clinical Neuropsychology regarding
the BOLD signal in fMRI:
BOLD imaging is a version of magnetic resonance imaging that depends on the different
magnetic properties of oxygenated versus deoxygenated hemoglobin and, thus, indirectly,
on variations in local tissue perfusion. The utility of BOLD imaging for fMRI also
depends on the physiological phenomenon by which metabolically active cerebral tissue
"demands" more perfusion than less-active tissue. Thus, populations of neurons that are
particularly active during a cognitive or motor task actually elicit a surplus of perfusion
which, in turn, results in an increase in the ratio of oxygenated to deoxygenated
hemoglobin, detectable as a change in the BOLD signal. (Whyte, 2011)
This phenomenon can be traced back to 1890, when Roy and Sherrington noticed that regional
blood flow increased in areas of neural activity (Roy & Sherrington, 1890). Today it is generally
accepted that an increase in neural activity in a certain region will demand greater blood flow to supply
more oxygenated hemoglobin and to remove deoxygenated hemoglobin. While very useful for its
noninvasive qualities, the BOLD signal is an indirect measure of what scientists are truly trying to
measure, neural activity. The hemodynamic response is a byproduct of neural activity and is often
delayed by several seconds or is confounded by other tissue or hemodynamic phenomena. Additionally,
BOLD measures the relationship between oxygen delivery and oxygen extraction rather than actual
oxygen consumption like PET. Finally, fMRI techniques generally extract useful information by
comparing two conditions of activity rather than against an absolute reference, which makes it an
excellent tool for classic "stimulus-response"-type experiments but less useful for naturalistic
experiments, where there is still some debate about what a "resting" state truly looks like or if one even
exists (Whyte, 2011).
2.4.2 fNIRS Background
fNIRS originated in 1977 with Jöbsis's discovery that light at certain wavelengths in the
near-infrared spectrum passes through bone and tissue but is absorbed and scattered by hemoglobin
(Jöbsis, 1977). Subsequent development refined the technique to be able to accurately measure
oxygenated and deoxygenated hemoglobin in the brain and other regions of the body (Chance et al.,
1998). fNIRS functions by injecting near-infrared light from light-emitting diodes or lasers at certain
wavelengths into the region of interest. This light passes through the skin and bone and is absorbed and
scattered by the hemoglobin circulating through the region of interest, which can be seen in Figure 5 and
Figure 6. The amount of light that is returned to the sensor is measured by photomultipliers, whose
output is converted into the digital signal used for post-processing. Finally, this optical signal is processed
through the Modified Beer-Lambert Law to convert optical intensity into hemoglobin concentration. This
general process is applicable to all fNIRS devices, although the exact methods for measuring hemoglobin
vary slightly between machines.
Figure 5: Scattering of Photons in Tissue (from ISS, Inc.)
Figure 6: Light Penetration in Brain Tissue using NIRS (from ISS, Inc.)
fNIRS devices for fingertip blood oxygenation are nearly ubiquitous in the medical community,
but their usage as a cognitive research instrument is still modest, though growing (Wolf, et al., 2007). fNIRS
could be considered a cousin of fMRI because they both measure the same BOLD signal but in different
ways. While fMRI measures hemoglobin levels using the different spin characteristics of oxygenated and
deoxygenated hemoglobin, fNIRS observes the same physical characteristics by measuring the absorption
of light at 690 nm and 830 nm. Due to the optical properties of hemoglobin, the concentration of
oxygenated and deoxygenated hemoglobin can be determined when combined with models of the optical
properties of the brain tissue.
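
The conversion from light absorption at two wavelengths to hemoglobin concentration can be sketched with the Modified Beer-Lambert Law. The Python example below is a minimal illustration under stated assumptions, not the processing used by any particular fNIRS device. It assumes that the change in optical density at each wavelength equals the sum of each chromophore's extinction coefficient times its concentration change, multiplied by the source-detector separation and a differential pathlength factor (DPF). The extinction coefficients, separation, and DPF values shown are illustrative placeholders, not values from this thesis; tabulated coefficients and a tissue model should be used in practice.

```python
import math


def mbll_delta_concentrations(i_rest, i_task, extinction, distance_cm, dpf):
    """Estimate changes in (oxygenated, deoxygenated) hemoglobin in mol/L.

    i_rest, i_task: detected intensities at (wavelength 1, wavelength 2)
    extinction: ((e_HbO_1, e_HbR_1), (e_HbO_2, e_HbR_2)) in 1/(M*cm)
    distance_cm: source-detector separation in cm
    dpf: differential pathlength factor at each wavelength
    """
    # Change in optical density at each wavelength relative to rest.
    d_od = [-math.log10(task / rest) for rest, task in zip(i_rest, i_task)]
    # Scale each row of coefficients by the effective optical path length.
    a = [[e * distance_cm * f for e in row] for row, f in zip(extinction, dpf)]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if det == 0:
        raise ValueError("wavelengths do not separate the two chromophores")
    # Solve the 2x2 linear system with Cramer's rule.
    d_hbo = (d_od[0] * a[1][1] - a[0][1] * d_od[1]) / det
    d_hbr = (a[0][0] * d_od[1] - a[1][0] * d_od[0]) / det
    return d_hbo, d_hbr


# Placeholder inputs for 690 nm and 830 nm (illustrative, not measured data).
EXTINCTION = ((276.0, 2052.0), (974.0, 693.0))
d_hbo, d_hbr = mbll_delta_concentrations(
    i_rest=(1.00, 1.00), i_task=(1.05, 0.97),
    extinction=EXTINCTION, distance_cm=3.0, dpf=(6.0, 6.0))
print(d_hbo, d_hbr)
```

Two wavelengths are needed because there are two unknowns. The pair works best when deoxygenated hemoglobin dominates absorption at one wavelength and oxygenated hemoglobin at the other, which keeps the system well conditioned.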
Since fNIRS and fMRI measure the same physiological response (hemoglobin concentration),
several researchers have tried to correlate results from the more-established fMRI to the relatively nascent
fNIRS in order to bring greater reputability to independent fNIRS studies. This comparison between
fMRI and fNIRS has proven valid in a number of comparison studies, most notably those by Strangman
(2002) and Steinbrink (2006). Strangman shows that the response recorded by fNIRS during a motor task
was similar to the response simultaneously recorded by fMRI, and Steinbrink reviews nineteen different
fNIRS-fMRI simultaneous studies which show general overlap in conclusions. Cui captured
simultaneous fNIRS and fMRI data in the frontal and parietal lobes during a battery of cognitive tasks and
found a lower signal-to-noise ratio for fNIRS but a highly correlated response (Cui, Bray, Bryant, Glover, &
Reiss, 2011).
fNIRS can also be used in complementary roles with EEG, as shown by Hirshfield (2009). EEG
is also commonly used in brain-computer interfaces. EEG is a well-studied method for measuring brain
activity, but has several limitations. Although "dry" electrode systems exist, many systems still require a
gel application to the scalp to attach the electrodes, which is inconvenient for usage outside the lab.
Additionally, EEG is susceptible to several types of artifacts, including physiological differences,
environment changes, body movements, and especially ocular movements (blinking, eye movement, etc)
(Hu, et al., 2011). Finally, all data must be processed through feature extraction, which is highly context
dependent and relies on algorithms to select the best features for real-time analysis. Using fNIRS
simultaneously with EEG may help to show the overlap in capabilities so that researchers already
well-versed in EEG can confidently use methods or results from fNIRS experiments. To summarize, using
multiple brain recording devices allows researchers to record a response in multiple channels, improving
the certainty that a response truly occurred while diminishing the noise that a single channel may have.
2.4.3 How fNIRS measures cognitive activity
Neural activity in a local region generally results in an increase in oxygenated hemoglobin and a
decrease in deoxygenated hemoglobin, although this simplification does not capture the full complexity of
the brain activation response (Raichle & Mintun, 2006). While a simplification, it does reflect some of
the overall mechanisms of the brain as a muscle, with neurons consuming glucose and oxygen to produce
electrical signals through a process called neurovascular coupling (León-Carrión & León-Domínguez,
2012). As Raichle and Mintun point out though, while the brain consumes nearly 20% of the total energy
of the body it only accounts for roughly 3% of its mass, with much of the activity still occurring at resting
state. Therefore, the relative changes due to functional activation are not nearly as large, for example, as
might be expected in the bicep while doing a weighted curl compared to resting.
The tissue of the brain does not store oxygen well, so supplies must be constantly replenished,
which means that studying the blood flow is roughly equivalent to studying the overall content of the
blood volume. fNIRS measures the concentration of oxygenated and deoxygenated hemoglobin in a
localized region. Since these resources are constantly being used, even during resting, there must be a
constant resupply of oxygenated hemoglobin to the region. The high ratio between blood flow and blood
volume indicates that while few oxygen resources are stored in the brain, much oxygen is still required.
This can easily be confirmed by the fact that most humans lose consciousness in less than 15 seconds
after blood flow is cut off to the brain. However, the brain only uses about 40% of the oxygen that passes
through it during a normal state, indicating that there is generally some excess capacity to handle
instantaneous jumps in activity.
The "aerobic" mechanism to supply more oxygen to regions with sustained activity does not
begin to engage until 4-6 seconds after the initial response. Some have postulated that there may be an
"initial dip" in the first second of activity, but once the brain senses sustained activity, it will begin to
provide greater overall blood flow to provide the region with more oxygenated hemoglobin (Marxen,
Cassidy, Dawson, Ross, & Graham, 2012; Obata et al., 2004; Steinbrink, et al., 2006). There are several
hypotheses for the "initial dip", including local changes in oxygenation concentration due to metabolic
demand and an increase in total blood volume. The signal-to-noise ratio for these systems makes
answering this question very difficult, which is why transient phenomena less than one second, both
immediately following the stimulus presentation and during the return to baseline, are still an issue of
some controversy and under investigation (Marxen, et al., 2012; Schroeter, Kupka, Mildner, Uludag, &
von Cramon, 2006). Additionally, there is often a tradeoff between spatial and temporal resolution, so
machines such as fMRI generally are too slow while systems such as fNIRS do not have enough spatial
resolution. This challenge is especially apparent in real-time systems, where both spatial and temporal
resolutions are obstacles to quick, precise measurements.
Even though the research into the transient phenomena is still nascent, a much broader body of
work has found that deoxygenated hemoglobin levels generally decline during the first 4-6
seconds of the response to increased mental workload and then trend back towards steady state. Overall,
sustained mental workload is correlated to a sustained increase in oxygenated hemoglobin and an initial
drop followed by recovery towards equilibrium for deoxygenated hemoglobin (Steinbrink, et al., 2006).
The nominal response is shown below in Figure 7, with the hypothesized "early dip", the "aerobic"
response corresponding to sustained perfusion and neuronal activity, and finally a return to baseline after
the response-generating event has concluded. According to researchers at Tufts University's Biomedical
Engineering Department, the magnitude of the response is generally 3-5% of the baseline flow, but can
range up to 10% for total blood flow during intense activity (Mandeville et al., 1999).
Figure 7: Nominal hemodynamic response (notional plot of oxygenated and deoxygenated hemoglobin against time from stimulus onset in seconds, with the event and "steady state" periods marked)
2.4.4 Using fNIRS to measure workload
Many see brain-computer interfaces as a way to measure mental workload to ultimately ease the
cognitive burden on humans, or help the mentally disabled (Tan & Nijholt, 2010). As a workload-measurement
device, fNIRS is also compatible with other methods traditionally used by cognitive
psychologists and human factors engineers such as heart rate measures, eye-tracking measures, and
skin-response measures. Several studies have combined fNIRS with techniques like heart rate variability (Durantin,
et al., 2014), or with combinations of systemic signals like heart rate, blood pressure, galvanic skin response,
respiration, and scalp blood flow (Jelzow et al., 2011) to show the correlation between secondary
physiological responses and mental workload. Jelzow found that galvanic skin response and mean blood
pressure have the highest correlation with hemoglobin concentration changes, with r values up to 0.5.
Additionally, fMRI studies such as Gianaros (2004) and Napadow (2008) show significant correlations
between specific brain regions and cardiac measures like heart rate variability, with correlation t-values
ranging from 4 to 6 and p<0.05. These studies show that fNIRS is a reliable workload measurement tool
that correlates well with other physiological signals and can capture additional information about cortex
activity. Psychophysiological measures are especially useful because
they have the potential to discriminate mental activity from just stress. Stress often accompanies mental
workload but is not a prerequisite since it is possible to have high levels of stress but low levels of mental
activity (Wierwille, 1979). Using secondary physiological responses can help reinforce the brain data or
help to remove potential artifacts that might arise such as overall changes in blood pressure or heart rate.
Similar to the miniaturization and commercialization of other biometric measuring devices, as
fNIRS is miniaturized and mobilized there will likely be an expanding pool of researchers using this
device to measure blood flow in the brain. With miniature or wireless devices, fNIRS could be used
outside of tightly controlled research environments to capture mental activity in entirely natural
environments like the cockpit, where stress and danger are far more realistic. A wireless system would
allow for even greater comfort, improved mobility, and possible introduction into more extreme
environments where a tethered system would either be impossible, impractical or unsafe. Additionally,
smaller and cheaper devices could make brain-computer interfaces more ubiquitous as a viable
commercial product for adaptable automation systems.
While there is great promise for fNIRS, it is still a developing technology with many problems
still to be solved. Challenges using fNIRS were found in the literature (León-Carrión & León-Domínguez,
2012; Wolf, et al., 2007) and through discussions with experts in the domain such as Dr.
Angelo Sassaroli of Tufts University's Department of Biomedical Engineering. The sensors are very
sensitive to light, so currently measures must be taken to minimize exterior light, whether through
shielding the sensors or through dimming the testing environment, dampening the realism of some
situations. If the probes are applied incorrectly, fNIRS is also susceptible to light channeling, which can
flood the sensor with ambient light and wash out any actual signals (Wolf, et al., 2007). While fNIRS is
more robust to motion artifacts like eye blinks or minor head movement, it is still susceptible to gross
head movements so it could not currently be counted on to be reliable if applied to some critical tasks.
There are several different filtering methods to eliminate artifacts, but these methods are still susceptible
to failure and have only been tested in tightly controlled laboratory experiments (Solovey, 2009).
At a physiological level, fNIRS also has several limitations that are summarized by León-Carrión
and León-Domínguez (2012). fNIRS is limited in the depth it can measure into the brain, due to
excessive scattering and absorption that occurs as light travels farther into the tissue and the optical
window of the light wavelengths used. This limits the ability to study activity deep in the brain,
restricting studies to only the outer structures. Dark skin or hair can also absorb some wavelengths and
attenuate signals. Additionally, brain tissue composition and geometry vary slightly between humans so
it is difficult to get absolute measures of hemoglobin levels without knowing the individual's differential
pathlength factor (DPF). DPF is a constant that is used in the calculation of hemoglobin concentration
and can be obtained through measurement of the absorptivity of light through the tissue. Without the
DPF, it is only possible to draw conclusions about relative changes, not absolute ones. The actual DPF
can be determined through additional measurements and processing, but it is generally considered a
constant under the assumptions of homogeneous tissue makeup and constant brain geometry from one
measurement to another (Wolf, et al., 2007). The other main assumption of fNIRS is the diffusion
approximation. In order to use the Boltzmann transport equation to convert light intensity into
hemoglobin concentration, it must be assumed that the tissue is homogeneous, scattering is much larger
than absorption, and the tissue has a specific geometry (Wolf, et al., 2007).
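For reference, the role of the DPF can be made explicit with the commonly used form of the modified Beer-Lambert law. The notation here is an assumption introduced for illustration and is not taken from the experiment's software: \Delta A(\lambda) is the change in attenuation at wavelength \lambda, I_0 and I are the baseline and current detected intensities, \varepsilon are molar extinction coefficients, \Delta c are concentration changes, and d is the source-detector separation.

```latex
\Delta A(\lambda)
  = -\log_{10}\frac{I(\lambda)}{I_0(\lambda)}
  = \left[ \varepsilon_{\mathrm{HbO_2}}(\lambda)\,\Delta c_{\mathrm{HbO_2}}
         + \varepsilon_{\mathrm{HbR}}(\lambda)\,\Delta c_{\mathrm{HbR}} \right]
    d \cdot \mathrm{DPF}(\lambda)
```

Measurements at two wavelengths give two such equations in the two unknown concentration changes. If \mathrm{DPF}(\lambda) is not known for the individual, the concentration changes are recoverable only up to that scale factor, which is why conclusions are then limited to relative changes.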
fNIRS is also limited in its ability to detect highly localized changes in activity. When localized
activity occurs and oxygen is consumed, the brain reacts by oversupplying the entire region with oxygen,
which is the response that is detected by fNIRS. This physiological limitation allows researchers to
determine the general region of activity but precludes knowing the precise location of activation.
Measuring the initial response in the first second is a difficult challenge because of limited spatial
resolution and low signal-to-noise, and it is likely only possible to definitively quantify these transient
phenomena with improved equipment and processing techniques (Marxen, et al., 2012). Finally, fNIRS
relies on the hemodynamics of the brain, so it is limited in precision and accuracy by the biological
mechanisms that control the brain and the individual differences between every human. While there are
still many problems to solve with fNIRS, it remains a promising technology that could help to better
understand human cognition and augment human performance.
2.5 Summary
This chapter describes the important phenomena surrounding jobs that are primarily low intensity
but have critical periods requiring operator engagement and performance, such as BMD, ATC, or UAVs.
Operators of these systems must manage extended periods of vigilance and boredom, which are shown to
be detrimental to performance and operator well-being. Mental workload is one of the critical elements
of performance and should be moderated from becoming too high or too low. When operators transition
from one level of mental workload to another, there are several additional stressors that can impact
cognitive abilities, and designers should take care to provide adequate decision aids to facilitate these
transitions.
Mental workload can be measured in many ways, such as task performance, subjective ratings, or
physiological measures. The physiological response of the brain has been measured through devices such
as EEG, fMRI and fNIRS to track different physiological signs of increased activity. fNIRS is a noninvasive method for tracking mental workload by measuring blood oxygenation in specific regions of the
brain, and is a promising tool for psychophysiological study in natural settings in and outside of the lab.
The next chapter introduces a study conducted using fNIRS to measure the neurophysiological response
of military-age subjects during a simulated BMD mission, looking at how varying periods of low taskload
can impact the response during a cognitively challenging task.
3. Experimental Methods
3.1 Experimental Framework
The experiment employed a simulation designed to mimic aspects of the job of an Unmanned
Aerial Vehicle (UAV) sensor operator. This operator's job is to track threatening objects in a ballistic
missile defense environment until their positions are known with sufficient accuracy that they can be
engaged by defense missiles. The engagement timeline is very short, so many actions and decisions must
be made very quickly, often with uncertain information and potentially without warning. The simulation
was a Java-based environment where subjects were required to allocate radar tracking assets to reduce
track error on simulated ballistic missile threats to a specified threshold in an unclassified test scenario.
The primary task of the operator was to allocate assets to track any threats to reduce track error, and the
secondary task was to monitor text messages, known as the chat box, for any messages or alerts and the
map for situational awareness during the test.
The scenario was the launch of an unknown number of threats from a mid-Pacific location
towards other locations in the Pacific. The system simulated having satellite coverage to alert the
operator when the threats were launched. The operator was then required to track the threats via UAV
sensor to achieve a predetermined accuracy. In the simulation, all threats followed one of 3
predetermined trajectories unknown to the operator. The operator achieved the required accuracy by
controlling the tracking sensors on 3 UAVs.
In this simulation, the operator allocated the UAVs with the sensors that track the threats. The
tracks are subsequently fed to the Fire Control System, which uses the track data to engage targets with
interceptor missiles, but this aspect is not part of the study. The subjects were informed that their mission
was to achieve a certain level of tracking accuracy on all targets. If the threshold was not met, the
interceptor missiles could not be fired since they cannot acquire the targets independently. The operator
had the following displays, which can be seen together in Figure 8 and in detail in Table 1:
- 3 windows showing the field of regard of the tracking sensor for each UAV
- 1 window showing track accuracy for a selected target
- 1 message panel recording events and system messages
- A map showing a 2D representation of the UAVs and targets
- A timer and clock
- A chat box
The primary goal of the operator was to assign UAV sensors to track the threats. Each UAV can
only track one threat at a time, and the objective was to track each threat long enough to achieve a track
error below the specified threshold. If necessary, the operator could use multiple UAVs together for
"stereo viewing" to achieve a lower track error much more rapidly than when only using one UAV per
target. Thirty to sixty seconds after launch, the targets became visible in the UAV sensor tracking
windows, which can be seen on the left side of the display in Figure 8. The participant did not know
when the event would occur and received no direct alert that an event was about to begin. The targets
suddenly appeared in the UAV Sensor Tracking Windows and in the Tracking Error Display at the start of
the event. The operator received a message in the "System Message Display" that the system was on alert
before the event, but also received at least one false alarm message before the event, as seen in Appendix
F. These messages helped reinforce the supervisory control task and did not directly signal the start of an
event, but may have provided some priming for subjects. The missiles remained in the UAV sensor
tracking window until they were out of the field of regard of the UAV. As the threats passed through the
fields of regard, the operator had to re-task the UAVs to achieve the required error on all the threats. The
operator's other tasks were monitoring the chat box for new messages and monitoring the map to
maintain situational awareness of where the threats were at all times.
Figure 8: Operator Display (labeled components: System Message Display, Track Error Display, 2-D Map Display, Sensor Tracker Displays, Simulation Clock, Simulation Timer, Chat Message Display)
COMPONENT NAME (component pictures omitted)

UAV Sensor Tracker Display
- Shows targets available to be tracked by UAV sensor (ovals) and target currently being tracked (outlined with dashed line)
- Operator clicks on an available target to direct the sensor to focus on target
- Small square represents where sensor is actually pointing
    - If sensor is not pointing at the threat, it is not receiving any tracking data
    - The sensor can only slew at a finite speed, so sometimes there is a small lag time when the sensor is directed to a new object
- Shows elevation (y-axis) vs. azimuth (x-axis)

Message Panel
- Displays messages from the system when launches are detected
- Is not interactive but functions to alert when event may be starting

Track Error Display
- Shows the track error of system (y-axis) on the selected target vs. time (x-axis)
- Track error must go below a pre-calculated threshold of the interceptors in order to engage the target
- Operator can toggle between targets by clicking on target boxes on the left side of window

2-D Map Display
- Shows where UAVs are located, which UAV is tracking which threat, where each threat is currently located, and the calculated impact point for each threat
- Used as a situational awareness tool for the operator
- Solid lines show which UAV is currently locked on to which missile
- Dotted lines show track of missile every 10 seconds

Chat Box
- Displays messages from "command"
    - A simulated commander that is physically separated from the operator
- Used to measure subject workload and situational awareness
- Does not display UAV system information

Table 1: UAV Component Descriptions
3.2 Experiment Conduct and Data Collection
The test was conducted in the Human-Computer Interaction Lab at Tufts University. All
procedures were reviewed and approved by MIT's Committee on the Use of Humans as Experimental
Subjects (COUHES). All subjects were asked a series of eligibility questions and then were asked to read
and sign a consent form. Following the training period, the subjects were seated in front of the two
monitors used to interact with the system. The participants were knowingly video recorded, and all
computer interactions were collected using Camtasia® recording software. The primary data collection
was through the simulation computer logs, although subjects were informed that video recordings would be
used as a backup in case of data logging failure. The simulation logged all interactions, such as chat box
message responses, final performance measures, and number of clicks on the various objects. The video
recording was combined with the interaction recording in order to create a log of user activity for the
periods surrounding the critical events. The encodings follow a scheme similar to those of Mkrtchyan
(2012), Thornburg (2011), and Hart (2010), classifying the subjects' attention states as directed, distracted, or
asleep/completely unaware. These encoding states are further discussed in Table 3. The subjects
stayed seated in the testing room for the entire 3 hour experiment.
In addition to the video recording and computer recording, data were also collected using a
functional near-infrared spectroscopy (fNIRS) measurement device. The device employed in this research
was the Imagent Functional Brain Imaging System Using Infrared Photons, developed and manufactured
by ISS, Incorporated. This device is a "non-invasive tissue oximeter for the absolute determination of
oxygenated and deoxygenated hemoglobin concentration, oxygen saturation and total hemoglobin content
in tissues". The overall process for data collection can be seen below in Figure 9. The data were
collected using the Boxy software package created by ISS Inc. and went through several steps in the
processing stream. The raw data returned from the Imagent are simply the light readings from the
sensors. These data are first transmitted to a computer, which has software that decodes the raw data
from the Imagent and converts it into a standard format of optical intensity with associated time markers.
The data are then sent across a local network to a second computer which runs a MATLAB script which
writes the data into a formatted file. These files are then processed using the Homer2 user interface
developed by the Massachusetts General Hospital Martinos Center for Biomedical Imaging. This software
uses the Modified Beer-Lambert Law to convert the light intensities into hemoglobin concentration levels
and allows the user to apply certain filters such as bandpass filters. Homer2 also allows the user to extract
the hemodynamic response function (HRF) from the overall data. The HRF is a plot of the concentrations
for a given amount of time surrounding the event.
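The conversion step can be sketched as follows. This is a minimal illustration of the two-wavelength modified Beer-Lambert calculation, not Homer2's implementation: the function name, argument layout, and the use of Cramer's rule are assumptions made for the example, and the caller must supply extinction coefficients and DPF values from a trusted source.

```python
import math


def mbll_delta_conc(i_base, i_now, eps, distance_cm, dpf):
    """Estimate (delta_HbO2, delta_HbR) from intensities at two wavelengths.

    Hypothetical helper for illustration only.
    i_base, i_now: dict mapping wavelength -> baseline / current intensity
    eps:           dict mapping wavelength -> (eps_HbO2, eps_HbR)
    distance_cm:   source-detector separation
    dpf:           dict mapping wavelength -> differential pathlength factor
    """
    w1, w2 = sorted(eps)  # exactly two wavelengths are expected

    # Change in optical density at each wavelength.
    dod = {w: -math.log10(i_now[w] / i_base[w]) for w in (w1, w2)}

    # Effective optical path length at each wavelength.
    path = {w: distance_cm * dpf[w] for w in (w1, w2)}

    # 2x2 linear system: dod[w] = (eps_HbO2 * x + eps_HbR * y) * path[w]
    a11, a12 = eps[w1][0] * path[w1], eps[w1][1] * path[w1]
    a21, a22 = eps[w2][0] * path[w2], eps[w2][1] * path[w2]
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("extinction coefficients give a singular system")

    # Cramer's rule for the two unknown concentration changes.
    d_hbo2 = (dod[w1] * a22 - a12 * dod[w2]) / det
    d_hbr = (a11 * dod[w2] - a21 * dod[w1]) / det
    return d_hbo2, d_hbr
```

If the DPF values passed in are nominal rather than measured for the individual, the outputs should be read as relative changes only.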
HRF data were collected using the 60 seconds prior to a missile wave and the 100 seconds
following the arrival of the first missile, which was the length of the critical event period. This period
was chosen since it was short enough to simulate the time pressure of a real-world engagement, but long
enough to capture a variety of responses by participants. Subjects were not explicitly informed how long
the event would last, but the training tutorial provided a guide for the approximate length of a normal
engagement. They were also instructed that it was important to act quickly since threats may only appear
in the viewing area for a short period of time. Some of the key sampling and processing parameters can
be seen in Table 2 below.
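The windowing described above (60 seconds before and 100 seconds after the arrival of the first missile) can be sketched as below. This is an illustrative sketch, not the Homer2 procedure: the function name and the choice to subtract the mean of the pre-event window as a baseline are assumptions, and the 12 Hz default is the sampling frequency reported for this study.

```python
def extract_epoch(samples, event_index, fs_hz=12, pre_s=60, post_s=100):
    """Return a baseline-corrected window around one event.

    Hypothetical helper for illustration only.
    samples:     list of concentration values sampled at fs_hz
    event_index: sample index at which the first missile arrives
    """
    pre_n = int(pre_s * fs_hz)
    post_n = int(post_s * fs_hz)
    start = event_index - pre_n
    stop = event_index + post_n
    if start < 0 or stop > len(samples):
        raise ValueError("recording does not cover the requested window")

    window = samples[start:stop]
    # Assumed baseline: mean of the pre-event portion of the window.
    baseline = sum(window[:pre_n]) / pre_n
    return [x - baseline for x in window]
```

At 12 Hz this yields 720 pre-event samples and 1200 post-event samples per channel.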
Figure 9: fNIRS data collection method. Flowchart steps:
1) Near-infrared light pulses are sent through the prefrontal cortex
2) Light is absorbed differently by different hemoglobin components
3) Near-infrared light sensors receive transmitted light pulses
4) Signal processing software (Boxy) turns the detection signal into a brain activity pattern
5) Brain activity patterns are compared between varying conditions and to benchmarked data using Homer2
Parameter             Description
Sampling frequency    12 Hz
Source spacing        9 total (4 left, 5 right). Spaced linearly from sensor. Distances in cm:
                      Left: 2.04, 2.52, 3, 3.45
                      Right: 1.48, 1.95, 2.46, 3.0, 3.45
                      *Only 8 sensors used at one time (4 left, 4 right)
Source laser          Fiber-coupled laser diodes
                      Wavelengths: 690 nm, 830 nm
                      Avg power: 10 mW
Light detectors       Photomultiplier tubes
Sensors               Selected side-on photomultiplier tubes
Low-pass filter       0.5 Hz

Table 2: Data Collection and Processing Parameters
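To give a sense of what a 0.5 Hz low-pass filter does to a 12 Hz signal, the sketch below applies a first-order recursive filter. The filter actually applied in Homer2 is not described here, so this simple filter is swapped in purely for illustration; the function name and defaults are assumptions.

```python
import math


def low_pass(samples, fs_hz=12.0, cutoff_hz=0.5):
    """First-order (single-pole) low-pass filter, for illustration only.

    This is NOT the filter used in the study's processing stream; it is a
    simple stand-in with the same nominal cutoff frequency.
    """
    if not samples:
        return []
    dt = 1.0 / fs_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)  # smoothing factor between 0 and 1

    out = []
    y = samples[0]  # start at the first sample to avoid a start-up step
    for x in samples:
        y = y + alpha * (x - y)
        out.append(y)
    return out
```

Slow hemodynamic changes pass through largely unchanged, while faster fluctuations such as sample-to-sample noise are attenuated.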
Operators must be able to quickly switch attention between incoming information from multiple
sources (multi-tasking) while storing and synthesizing that information to create a unified mental model
of the battlespace (working memory). Both of these functions are associated with the prefrontal cortex, so the
fNIRS sensors were applied to the forehead directly over the region of interest, as seen in Figure 10.
Figure 10: fNIRS Probe Applied to Forehead (Scholkmann, Klein, Gerber, Wolf, & Wolf, 2014)
The subject donned the fNIRS measurement device at the beginning of the calibration period and
wore the device throughout the entire session. The subject was asked to try to avoid moving the sensors
in any way and to refrain from furrowing their brow since that has been shown to inhibit good data
collection from the device (Solovey, 2009). While the system is resistant to minor movement, it was
imperative to closely monitor the data and the subject during the experiment to determine if the device
had moved significantly from its original position. Other factors such as blinking, minor movements, and
heartbeat either have a low impact on good data collection or can be mitigated through filtering (Solovey,
2009). The fNIRS data were recorded throughout the entire experiment.
In addition to the fNIRS data, there were several other sources of data collected. Before the
experiment began, the subject filled out three surveys. The first survey was a demographic survey which
recorded factors such as age, gender, occupation, military experience, and sleep, as seen in Appendix C.
The second survey was the Boredom Proneness Survey, a standard for measuring propensity to boredom
(Farmer, 1986). The third survey was the NEO Five Factor Inventory, a standard for measuring the "big
five" personality traits (McCrae & Costa, 2010). Previous studies have shown that conscientiousness
may play a significant role in performance in supervisory control situations (Thornburg, et al., 2011).
Subjects also filled out a NASA TLX workload survey at the end of the experiment, as well as a
customized workload survey to record responses which can be seen in Appendix E.
The final form of data collection was done through the use of the Camtasia® software and video
recording. A two-person panel used the encodings in Table 3 to classify each subject's attention state
during the two minutes preceding each critical event.
Table 3: Video Coding Criteria

Attention State       Criteria
Directed (1)          The participant appears focused, is scanning both displays, is only
                      monitoring or interacting with the interface, and is not doing any other task.
Distracted (2)        The participant is awake but may be drowsy (rapid blinking, rubbing eyes,
                      head on hands without moving, extended eye closures, etc.). The participant
                      is looking outside the screen for extended periods, is playing with an object
                      besides the display (cell phone, hair tie, etc.), or is staring blankly at the
                      screen for long periods without activity.
Asleep/Unaware (3)    The participant is not paying attention to the interface at all and is
                      completely asleep or unaware of the interface.
3.3 Experimental Design
While there was only one data set, the overall study can be divided into a primary and a
secondary experiment. The primary experiment was a between-subjects test consisting of only the data
relevant to the first wave of missiles. The analysis was restricted to the first wave to maintain a high level of
congruence with real-world operations, since repeated waves of missiles would be confounded by learning and
priming effects. Of critical importance in this investigation is not how operators may handle the fourth or
fifth wave, but rather how
they handle the first (and unexpected) wave. The effects of surprise and novelty place special demands
on operator cognition that are theoretically and practically difficult to replicate with repeated trials,
especially when limited by time and resources. The primary experiment gathers data for assessing
operator performance in the transition period from a low workload environment to a high workload
environment using performance metrics from the simulation, behavioral coding, and psychophysiological
analysis. The primary analysis deals only with the data leading up to and during the first wave of
missiles. After the first wave is complete, it is invalid to assume participants will return to their pre-event
state, so any data collected after the first wave cannot be used to inform our research questions about the
effects of low to high workload transition on performance and hemodynamic response.
The secondary experiment aims at determining whether learning or fatigue effects are present by
presenting a second wave of threats at a later point in the experiment. Since subjects were recruited for a
4-hour block, the experiment could last up to 3 hours in total no matter what time the first wave was
presented, with one hour for filling out surveys and completing training. This presented a second data
collection opportunity at the end of the experiment to perform a repeat of the between-subjects testing
done in the first wave as well as a within-subjects test of changes between the waves. Data collection
during this wave was important, but was only done as a collection of opportunity and was kept as a
strictly secondary task to the primary experiment.
The independent variables for the primary experiment were:
1) Time from beginning of simulation until the first appearance of targets. This builds on
research already conducted that suggests that operators reach a "boredom" state after about
25 minutes of low-engagement activity (Mkrtchyan, et al., 2012; Thornburg, et al., 2011).
Targets began to appear at either 40, 100, or 160 minutes. These times allow for comparison
to previous studies as well as ample time for prolonged boredom to develop.
2) Number of targets presented to the operator. At a level of three targets, the subject can
usually assign one of the three sensors to each of the visible threats. However, at six targets
the subject must allocate limited assets to the task, adding a much higher cognitive workload
to the process of assigning assets. This condition demands more resources from the operator
and was hypothesized to expose variations in performance.
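The two independent variables above can be enumerated and assigned as in the sketch below. It assumes a fully crossed 3 x 2 design with balanced random assignment; the actual group structure and assignment procedure used in the study may differ, and all names here are hypothetical.

```python
import itertools
import random

ONSET_MINUTES = (40, 100, 160)  # time until the first appearance of targets
TARGET_COUNTS = (3, 6)          # number of targets presented


def conditions():
    """All combinations of onset time and target count (assumed fully crossed)."""
    return list(itertools.product(ONSET_MINUTES, TARGET_COUNTS))


def assign(subject_ids, seed=0):
    """Shuffle subjects, then deal them round-robin across the conditions.

    Assumed procedure for illustration; keeps group sizes as equal as possible.
    """
    rng = random.Random(seed)
    ids = list(subject_ids)
    rng.shuffle(ids)
    conds = conditions()
    return {sid: conds[i % len(conds)] for i, sid in enumerate(ids)}
```

With 12 subjects, for example, each of the six conditions receives exactly two subjects.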
The primary dependent variable was the fNIRS data, but other dependent variables included
assessments of performance, such as scenario performance and behavior coding, as well as assessments of
demographics, such as NEO-FFI 3 score, age, or video game experience. The primary performance task
of the missile defense simulation was using the system to minimize the tracking error. When the system
initially detects a missile, it only has a rough estimate of the missile's position and velocity. Using the
UAV tracking system helps to create a more refined picture of where the missile is and where it is going
by reducing the track error. Once it goes below a threshold level of error, other missile operators can
engage the threat with intercept missiles. Track error achieved for each target, the number of targets that
meet threshold track error, and response time to chat messages were collected for each subject.
The dependent variables in this study are summarized as follows:
1) Subject performance during tracking tasks:
   a. Percentage of threats tracked to the predetermined track error threshold. The
      threshold was determined through pilot studies to a level at which most participants
      found the task challenging but possible to complete successfully.
   b. Average final track error for all threats
2) Subject response time to chat box messages (secondary workload measurement)
   a. 300-500 second intervals during low workload periods
   b. 15-20 second intervals during high workload and transition periods
3) Subject response to subjective workload assessment questions
   a. Experiment-specific post-event questionnaire (see Appendix E: Post-Experiment
      Survey)
   b. NASA Task Load Index (TLX)
4) fNIRS data to assess prefrontal cortex activity
   a. 60-second baseline before event
   b. 100-second period during event
   c. Period following event corresponding to a return to baseline levels
5) Behavioral codings (see Table 3: Video Coding Criteria)
   a. In 2 minutes before events
   b. Periods of sleep during entire experiment
   c. Case studies of unusual behavior
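The two tracking performance measures in item 1 can be computed as in this sketch. The function name and the input format (one final track error per threat) are assumptions for illustration; the simulation's actual log format is not described here.

```python
def tracking_performance(final_errors, threshold):
    """Summarize one subject's tracking performance for one wave.

    Hypothetical helper for illustration only.
    final_errors: final track error for each threat in the wave
    threshold:    track error that must be beaten to engage a target
    """
    if not final_errors:
        raise ValueError("at least one threat is required")

    # A threat counts as tracked when its final error is below the threshold.
    tracked = sum(1 for err in final_errors if err < threshold)
    return {
        "percent_tracked": 100.0 * tracked / len(final_errors),
        "mean_final_error": sum(final_errors) / len(final_errors),
    }
```

For a six-target wave, a subject who brings three targets below the threshold would score 50 percent on the first measure regardless of how far the remaining targets were from the threshold, which is why the average final track error is reported alongside it.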
In order to measure the mental workload of the subject, the chat box was used as a secondary task
measure. Previous work on cruise missile controllers has found that the chat box can suitably measure
performance across a range of visual workloads (Cummings, 2004). Subjects received chat messages at
pseudo-random intervals and with degrees of interaction ranging from a personal question to a
simple system status message. Precautions were taken in the implementation to prevent the secondary
task from interfering with the primary task. To keep the secondary task measure from becoming
an influential force on the study, the chat box was located outside the central area of the display. During the
low task load periods, chat box questions or statements were presented pseudo-randomly every 300-500
seconds. During the high task load period, questions were presented only every 15-20 seconds, and
questions asked were very simple to minimize time away from the primary interface. Although
impossible to fully eliminate any possibility of interruption of the primary task, the location, frequency,
and salience of the chat box were tested and adjusted during pilot testing to ensure minimal distraction.
Furthermore, the subjects were clearly instructed on the hierarchy of tasks at the beginning of the
experiment and told to prioritize the mission over responding to chat messages.
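
The message timing described above can be illustrated with a minimal sketch. It assumes gaps drawn uniformly from the stated ranges and a single event window; the function name, the uniform distribution, and the clamping of a long gap to the event start are illustrative assumptions, not the experiment's actual chat script.

```python
import random

def chat_schedule(total_s, event_start_s, event_len_s, seed=0):
    """Return chat message times (seconds) for one hypothetical session.

    Gaps are drawn uniformly from 300-500 s outside the event window and
    15-20 s inside it, matching the intervals given in the text.
    """
    rng = random.Random(seed)
    event_end_s = event_start_s + event_len_s
    times, t = [], 0.0
    while True:
        in_event = event_start_s <= t < event_end_s
        low, high = (15, 20) if in_event else (300, 500)
        nxt = t + rng.uniform(low, high)
        if t < event_start_s < nxt:
            nxt = event_start_s  # assumption: clamp so the event is not skipped
        if nxt >= total_s:
            break
        times.append(nxt)
        t = nxt
    return times

# Example: a 3-hour session with a 100-second event at 40 minutes.
schedule = chat_schedule(total_s=3 * 3600, event_start_s=40 * 60, event_len_s=100)
```

A seeded generator keeps the schedule reproducible, which mirrors the use of one pre-determined script for all subjects.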
The use of a chat box as a secondary workload measure provides some insights into the subject's
workload by measuring the response time to a chat message. It is expected that as workload moves to the
extreme low end, the individual will have lower engagement and motivation which will result in a slower
response to a chat box message, shown by the blue solid line in Figure 11 below (Wickens & Hollands,
1999). More recent studies have shown that operator response is not necessarily diminished in
low-workload environments, shown by the dashed black line (Hart, 2010).
Conversely, if a subject is overloaded, he or she will also have lower performance on a
secondary workload measure because few resources remain available to respond. If the subject does
not acknowledge the message at all, it can reasonably be assumed that they are at either end of the workload
spectrum, since they are either completely unaware of the message or completely overwhelmed with
primary tasks. While this experiment focuses primarily on the two ends of the workload spectrum, at a
medium workload level subjects will likely have high performance on both the primary and secondary
tasks.
randomly placed into one of the experimental groups from Table 4 above. Participants were paid $75
for participation and informed of a $150 gift card prize for the participant who achieved the
best performance. The experiment timeline is detailed in Appendix A.
3.5 Summary
This chapter describes the experiment conducted to measure the effect of time in low task load
and situation difficulty on workload transition. Thirty participants were recruited to take part in a missile
defense simulation that follows a supervisory control structure. The hemodynamic response was recorded
throughout the entire 3-hour experiment using fNIRS. Subject tasks during low task load included
monitoring the system and responding to chat messages. During events, subjects had to solve a dynamic asset
allocation problem to track 3 or 6 targets to better than a certain performance threshold. The event onset
time and difficulty level were unknown to the participant. Each participant received one event at either 40,
100, or 160 minutes, and all subjects received a second event at 180 minutes. Subjects filled out several
surveys including a demographic survey, the Boredom Proneness Index, the NEO Five Factor Index, the
NASA TLX, and a debriefing survey. Video recording was also performed for each subject.
4 Results
This chapter first introduces the methods for data analysis for the experiment described in Chapter
3. It begins with a description of the methods for reducing the data from full time series to single data
points suitable for statistical study. It then describes the various statistical tests and modeling
methods applied and summarizes the most important results. Finally, it explores long-term
trends in the data relating to boredom and fatigue.
4.1 Data Processing
As described in Chapter 3, the raw light intensity data produced by the fNIRS device was
recorded into a file containing a measurement from each sensor distance for each point in time (for a total
of 24 measurements at a 12 Hz recording rate). A representation of the data recording can be seen in Figure
12. This dataset was then converted into oxygenated and deoxygenated hemoglobin concentrations using the Homer2
software. The dataset was then filtered using the parameters listed in Table 2 from Section 3.2 to remove
the majority of artifacts and noise. Using Homer2, the Hemodynamic Response Function (HRF) was
extracted from the entire 3 hours of data. Of particular interest in the 3 hours of data is the interval
surrounding the missile tracking event. The HRF contains the 60 seconds prior to the arrival of the first
missile, referred to as the baseline, and the 100 seconds following the arrival of the first missile. This
baseline length was chosen because it includes enough data to cancel out a majority of artifacts. The
missile event lasted 100 seconds, so that is all that was included in the event dataset.
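
As a rough illustration of the windowing described above, the sketch below slices a per-sample signal into the 60-second baseline and 100-second event segments at the stated 12 Hz rate. The function and constant names are hypothetical, and the actual extraction was performed in Homer2, not with this code.

```python
SAMPLE_RATE_HZ = 12   # recording rate stated in the text
BASELINE_S = 60       # seconds before the first missile arrives
EVENT_S = 100         # seconds after the first missile arrives

def split_windows(signal, event_onset_s):
    """Return (baseline, event) slices of a per-sample signal list."""
    onset = int(round(event_onset_s * SAMPLE_RATE_HZ))
    start = onset - BASELINE_S * SAMPLE_RATE_HZ
    end = onset + EVENT_S * SAMPLE_RATE_HZ
    if start < 0 or end > len(signal):
        raise ValueError("event window falls outside the recording")
    return signal[start:onset], signal[onset:end]

# Example: a 3-hour recording with the first missile arriving at 40 minutes.
recording = [0.0] * (3 * 3600 * SAMPLE_RATE_HZ)
baseline, event = split_windows(recording, event_onset_s=40 * 60)
```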
Once the baseline and event segments of the data were extracted from the overall time series, the
data for analysis was extracted using the "average of max" technique. The magnitude of the maximum
was found for each signal within the event period, and those four maxima were then averaged
together to get an overall average maximum for the subject's oxygenated hemoglobin concentration
(HbO). This procedure was also used to find the average minimum for deoxygenated hemoglobin
concentration (HbR), since HbR is found to generally decrease during increased brain activity. The sum
of oxygenated and deoxygenated hemoglobin, total hemoglobin concentration (HbT), was also calculated.
The maximum magnitude was chosen over an average magnitude for several reasons. First, using the
average of several average responses tends to minimize any significant responses, whereas using the average of
maximum focuses on the strongest part of the signal. Second, according to Resource Theory as described in
Chapter 2, cognitive errors are most likely to propagate when cognitive demands meet or exceed
cognitive resources. At these critical moments, the brain is working the hardest, so measuring the
maximum response provides a mechanism for comparing the critical response of subjects. This same
method was applied to the baseline period to achieve a consistent data analysis procedure.
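
The "average of max" reduction described above can be sketched in a few lines. The helper names and the toy channel values are illustrative assumptions; only the idea of taking one extreme per signal and then averaging comes from the text.

```python
def average_of_max(channels):
    """Mean of the per-channel maxima (the reduction described for HbO)."""
    return sum(max(ch) for ch in channels) / len(channels)

def average_of_min(channels):
    """Mean of the per-channel minima (the reduction described for HbR)."""
    return sum(min(ch) for ch in channels) / len(channels)

# Four toy event-period signals standing in for the four HbO signals.
hbo_event = [[0, 1, 3], [2, 5, 1], [1, 1, 1], [0, 7, 2]]
hbo_avg_max = average_of_max(hbo_event)   # (3 + 5 + 1 + 7) / 4 = 4.0
```

Compared with averaging every sample, this keeps a brief peak from being diluted by the rest of the window, which is the motivation given in the paragraph above.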
Figure 12: fNIRS Data Functional Diagram
4.2 Results
4.2.1 Sample Summary
The sample of 30 participants was drawn from a Boston-area university using an online student
message board. All communications were conducted according to COUHES-approved protocol. The
mean age was 21.3 years (s.d. 2.51), and ages ranged from 18 to 31 years old. The sample included 12
males and 18 females. Twenty-one participants identified as undergraduate students, seven identified as
Master's students, and two identified as other status. A full summary of all variables collected can be
seen in Appendix G.
4.2.2 Baseline Analysis
Once the average of maximum magnitude had been determined for the HbO and HbR event and
baseline periods, a series of statistical tests was performed. First, a comparison of
total hemoglobin (HbT) values in the baseline period across the six experimental conditions
using a 2-factor ANOVA (difficulty x onset time) found no significant differences between any of the
conditions (F(2,24)=0.453, p=0.807). Similar trends were seen for HbO and HbR. These results provide
confidence that there were no inherent differences confounding the control variables. The results from
this test can be seen below in Figure 13.
Figure 13: Baseline Comparison (x-axis: difficulty; legend: onset)
4.2.3 First Event Analysis
Next, a simple examination of the effect of scenario difficulty and first wave onset time was
performed to measure the impact of the experiment's independent variables. The raw HbO and HbR data
were converted into percent changes (relative to the baseline values) in order to determine the relative
changes occurring in each subject. The mean HbO percent change was 60.5% (s.d. 124.1%), with a
minimum of -123.4%, maximum of 337.6%, and median of 25.64%. The mean HbR percent change was
81.5% (s.d. 150.4%), with a minimum of -107.2%, maximum of 594.2%, and median of 39.1%. Two-factor
ANOVAs for percent change in HbO and HbR found that difficulty was not a significant factor in
HbO (F(1,24)=1.471, p=0.237) or HbR (F(1,24)=0.298, p=0.71) response in terms of percent changes,
but that onset time was a significant factor for HbO (F(2,24)=7.641, p=0.003) and a marginally
significant factor for HbR (F(2,24)=3.304, p=0.054), which can be seen in Figure 14, Figure 15a,b, and
Figure 16 and in Table 6 and Table 7 in Appendix H. The most striking result was that the 100-minute onset
time had a lower marginal mean response than the 40- and 160-minute cases, indicating a
diminished response for subjects during the middle of the experiment, the time when others have found
attentional inefficiencies to be highest (Hart, 2010).
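
A minimal sketch of the percent-change conversion follows. The text does not give the exact formula, so dividing by the absolute baseline value (which keeps the sign of the change meaningful when the baseline is negative) is an assumption, as are the function name and the toy values.

```python
import statistics

def percent_change(event_value, baseline_value):
    """Percent change of the event value relative to the baseline value.

    Assumption: the change is scaled by |baseline| so that an increase is
    always positive, even for a negative baseline.
    """
    if baseline_value == 0:
        raise ValueError("percent change is undefined for a zero baseline")
    return 100.0 * (event_value - baseline_value) / abs(baseline_value)

# Toy (event, baseline) pairs for three hypothetical subjects.
pairs = [(1.5, 1.0), (0.2, 1.0), (3.0, 2.0)]
changes = [percent_change(e, b) for e, b in pairs]
summary = {
    "mean": statistics.mean(changes),
    "sd": statistics.stdev(changes),
    "median": statistics.median(changes),
}
```

Because the ratio can exceed 100% in either direction when the baseline is close to zero, wide ranges such as those reported above are not surprising.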
Figure 14: Subject Number vs. HbO % Change (panels by onset: 160, 100, 40; x-axis: subject)
Figure 15a,b: Estimated Marginal Means for % Change HbO (left), HbR (right) (x-axis: difficulty; legend: onset)
In addition to computing the results for the percent changes in HbO and HbR individually, the
ratio of HbO divided by HbR was also calculated. This may provide a measure of how oxygenated and
deoxygenated hemoglobin vary together and give greater insights into the overall hemodynamics (Gagnon
et al., 2012). Using this ratio, onset time was found to be a significant factor in the hemodynamic
response (F(2,24)=4.163,p=0.028), with the 100-minute onset time having a higher hemodynamic
response ratio than the 160-minute cases, as calculated using a Tukey HSD test for all pairs (p=0.026).
The summary of the pairwise comparisons for onset time using the HbO/HbR ratio can be seen in Table 8.
Figure 16: HbR % Change from baseline (x-axis: difficulty; legend: onset)
4.2.4 Time to the Maximum and Return to Baseline
Following the analysis of the relationship of the primary dependent variables (HbO and HbR) to the
independent variables (number of missiles and onset time), several auxiliary analyses were performed to
fully capture the response of subjects. The first auxiliary analysis was a measurement of the time to achieve
the maximum HbO and the time to return from the maximum back to the baseline. To perform the time
to the maximum analysis, the HbO signals were averaged at each time point in order to generate a single
HbO signal, and then this average signal was used to find the time from the beginning of the event to the
occurrence of the monotonically increasing maximum. Because a lag of about 6 seconds can occur between
activity and the hemodynamic response observed in the data, the window for the time to
maximum was extended to 120 seconds from the start of the missile event. The average time to the
maximum was 69.1 seconds (s.d. 50.4 sec). The time to the maximum did not differ significantly by
onset time or difficulty, as seen in Figure 17.
Figure 17: HbO Time to the Maximum (estimated marginal means; x-axis: difficulty; legend: onset)
The second time-dependent measure was the time to return to the baseline. First, the mean and
standard deviation of the 60-second baseline period were calculated. These values can be seen in
Appendix I. Next, the return to baseline calculation was done by measuring when a 10-second sliding
average window of the HbO signal returned to within one standard deviation of the baseline mean
following the maximum calculated above. The average total time to achieve the maximum and then
return to baseline was 153.4 seconds (s.d. 127.9 sec). Table 5 shows the Kruskal-Wallis test applied to
the data, which found that the return to baseline time was significantly different for missile wave onset time
(χ²(2)=8.788, p=0.012) but not for scenario difficulty.
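
The return-to-baseline rule above can be sketched as follows. The function name, the use of the sample standard deviation, and reporting the time at the start of the qualifying window are assumptions; the text specifies only the 10-second sliding average and the one-standard-deviation band.

```python
import statistics

def return_to_baseline_s(signal, baseline, peak_idx, rate_hz=12, window_s=10):
    """Seconds from the peak until a sliding mean re-enters baseline +/- 1 s.d.

    Returns None if the signal never returns within the recording.
    """
    mu = statistics.mean(baseline)
    sd = statistics.stdev(baseline)
    w = window_s * rate_hz
    for start in range(peak_idx, len(signal) - w + 1):
        window_mean = sum(signal[start:start + w]) / w
        if abs(window_mean - mu) <= sd:
            return (start - peak_idx) / rate_hz
    return None

# Toy data at 1 Hz with a 2-second window, so the arithmetic is easy to follow.
toy_baseline = [0, 1, 0, 1, 0, 1]
toy_signal = [10, 8, 6, 4, 2, 0.5, 0.5, 0.5, 0.5]
t_return = return_to_baseline_s(toy_signal, toy_baseline, peak_idx=0,
                                rate_hz=1, window_s=2)
```

Adding the time to the maximum to this value gives the total time reported in the paragraph above.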
Figure 18 shows a clear difference in time to return to baseline across
the three onset times. The 160-minute condition was the driving factor of the variation,
indicating that the physiological return to baseline was slower when participants received
the event at 160 minutes. This result adds evidence that fatigue may be related to
fNIRS signals.
Table 5: Return To Baseline test results (Kruskal-Wallis test; grouping variable: onset)
Statistic      Return to baseline
Chi-Square     8.788
df             2
Asymp. Sig.    .012

Figure 18: Return to baseline time (x-axis: onset)
4.2.5 Performance
The next set of calculations analyzes the performance of the subjects on the missile tracking
simulation. The average final track error and average percentage of missiles tracked to the threshold level
were compared between the 3- and 6-missile scenarios. Due to heteroscedasticity in the results, the
results were calculated using several non-parametric methods. The Mann-Whitney test (U=37.00) and
Wilcoxon test (W=157.00) were performed to compare final track error between onset time conditions
and difficulty conditions, finding difficulty to be significant (p=0.002) but onset time not significant
(p=0.311). The discrepancy in performance for difficulty can be readily seen in Figure 19. Onset time
was not found to be a significant factor in performance, but Figure 20 suggests that there may have been
differences in the 100-minute 6-missile condition compared to the other 6-missile conditions. The
Wilcoxon and Kruskal-Wallis tests were applied to compare all 6 conditions together and found
statistically significant variation between the groups (χ²=12.897, p=0.0244).
Video analysis of participants in this
condition showed no abnormal or exceptional characteristics or behavior, so the clear differences in the
100-minute condition suggest the difference may have come from the effect of time and difficulty, as
opposed to extraneous variables. Brown-Forsythe tests for unequal variance found no significant
differences (p=0.285) in variance for overall onset time effects or when comparing onset time within
just the 3- or 6-missile conditions. With only five subjects per experimental group, the statistical power
for these tests was low, so additional data may help to confirm or refute these results.
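
The U and W statistics quoted above are linked by a simple identity: for a group of size n with rank sum W, U = W - n(n+1)/2. With 15 subjects per difficulty level (six groups of five), 37 + 120 = 157, which matches the reported pair. The sketch below computes both statistics from raw scores; the function names are hypothetical, the choice to report the group with the smaller rank sum is an assumption, and the code makes no attempt to compute p-values.

```python
def midranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1   # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney(group_a, group_b):
    """Return (U, W) for the group with the smaller rank sum."""
    ranks = midranks(list(group_a) + list(group_b))
    n_a, n_b = len(group_a), len(group_b)
    r_a, r_b = sum(ranks[:n_a]), sum(ranks[n_a:])
    w, n = (r_a, n_a) if r_a <= r_b else (r_b, n_b)
    return w - n * (n + 1) / 2, w

# Toy track errors for two hypothetical groups (not the study's data).
u_stat, w_stat = mann_whitney([1, 2, 3], [4, 5, 6])
```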
Figure 19: Average Final Track Error by Difficulty
Figure 20: Average Final Track Error by Time and Difficulty
Similar calculations were performed using the percentage of missiles tracked below the specified
performance threshold. Again, difficulty was shown to be a significant factor but onset time was not, as
can be seen below in Figure 21 and Figure 22. The Mann-Whitney U was 66.5 and the Wilcoxon W was
186.5, leading to a 2-tailed significance of 0.040. This result confirms that the 6-missile scenario was
significantly harder than the 3-missile scenario, which was the expected result. However, the lack of
significance between the onset times suggests that regardless of how long subjects waited to interact
with the system, their performance was not statistically different.
Figure 21: % Below Threshold by Difficulty
Figure 22: % Below Threshold by Time & Difficulty
4.2.6 Model Creation
In order to analyze which of the many possible demographic, covariate, and independent
variable factors were most important in predicting performance, all possible factors were input into a
backward-elimination linear regression model and tested for significance. The predicted variable was the average
final track error, which was a primary measure of performance. Since this is an error term, a lower
average final track error corresponds to better performance. Since this was a model based on the
simulation performance, the average final track error is a relative term that is useful only for comparing
between subjects and is not anchored to any real-world parameters. The predictor factors included age,
gender, Boredom Proneness survey scores, NEO-Five Factor Index Scores, time to max HbO, time to
return to baseline, the hemodynamic features of percent change of HbO and HbR, video coding scores of
distraction, video gaming experience, and NASA TLX scores. The video coding scores come from a
2-person panel who rated each subject's behavior during the 2 minutes prior to the event as directed,
divided, or asleep/completely distracted, as described in Chapter 3.5. The rating panel reached consensus
on each subject. The performance of subjects categorized by distraction coding can be seen in Figure 23.
Figure 23: Average Track Error vs. Distraction Coding State
The model, presented below in Equation 4.1, shows that there are four significant factors that
influenced performance. The NEO-FFI component Agreeableness was found to negatively correlate with
performance (p=0.007). NEO-FFI responses for all thirty subjects can be seen in Figure 24. Distraction
levels generated by the video coding were also a significant predictor of performance, with increased
distraction corresponding to decreased performance (p=0.050). Video game usage, identified during the
demographic survey and summarized in Figure 25, was a significant predictor as well, with "gamers"
performing better than non-gamers (p=0.022). Finally, HbR was shown as a significant predictor, with a
greater magnitude response in deoxygenated hemoglobin corresponding to an increase in performance
(p=0.019). The modeling parameters and significance levels can be seen in Table 9 in Appendix H. The
model parameters are A (NEO-FFI Agreeableness), distraction (video coding of distraction), AvgMinR
(HbR), and videogame (video game usage). The R² for this model was 0.397, so the model explains
roughly 40% of the variance in average final track error.
\[
\mathrm{FTE} = \beta_0 - 6.896\,A - 42.061\,\mathrm{HbR} + 37.54\,D - 17.637\,\mathrm{VG} + \varepsilon \tag{4.1}
\]
Where
FTE - Average Final Track Error (lower = better performance)
$\beta_0$ - Model intercept constant
A - NEO FFI-3 Agreeableness rating
HbR - Minimum of deoxygenated hemoglobin during event minus baseline
D - Distraction coding from section 3.2
VG - Videogame usage
$\varepsilon$ - Model error
Figure 24: NEO-FFI Scores (Neuroticism, Extraversion, Open to Experience, Agreeableness, Conscientiousness)
Figure 25: Video Game Usage (histogram)
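
To show how Equation 4.1 is read, the sketch below compares two hypothetical subjects. The intercept is not reported in this chapter, so the code computes only the difference in predicted track error, in which the intercept cancels. The coefficient values are copied from the equation; the input values, the numeric coding of D and VG, and the function name are assumptions for illustration.

```python
# Coefficients from Equation 4.1; the intercept is omitted because it cancels.
COEFFS = {"A": -6.896, "HbR": -42.061, "D": 37.54, "VG": -17.637}

def predicted_fte_difference(subject_1, subject_2):
    """Predicted FTE of subject_1 minus that of subject_2."""
    return sum(c * (subject_1[k] - subject_2[k]) for k, c in COEFFS.items())

# Two hypothetical subjects who differ only by one level of distraction coding.
attentive = {"A": 30, "HbR": -0.5, "D": 0, "VG": 1}
distracted = {"A": 30, "HbR": -0.5, "D": 1, "VG": 1}
delta = predicted_fte_difference(distracted, attentive)   # 37.54
```

Under these assumptions, one additional level of distraction raises the predicted average final track error by the D coefficient, all else being equal.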
4.2.7 Chat Box Analysis
Participants' responses to the chat box were also analyzed against many of the
previously mentioned variables. Since the chat box was used as a secondary workload measure, it
provided an additional measure of workload over time. The average chat question response time was
calculated for the period from the start of the experiment to the start of the first wave of missiles. The
average chat question response time was 13.7 seconds (s.d. 7.96). The number of chat questions missed
entirely in that period was also computed. The mean number of missed questions was 0.4 (s.d. 0.67).
Neither average response time nor total questions missed was significantly correlated with the
hemodynamic measures of HbO, HbR, or HbT during the event. Subjects in the "100-minute, hard"
condition had a longer response time, which lends more evidence to the conclusion that they were less
engaged than subjects in the other conditions. This difference can be seen in Figure 26. When the NEO Five Factor
Index of Conscientiousness was included as a covariate, chat response time was significantly
different for difficulty level (p=0.044) and weakly related to onset time (p=0.080) using a 2-factor
ANOVA. These results are summarized in Table 10 in Appendix H.
Figure 26: Chat Response Time vs. Primary Variables (x-axis: onset; legend: difficulty)
It is also important to note here that the chat box may have unintentionally cued subjects to look
at the screen right before a missile event. A pre-determined script of messages was randomly spaced
between 300 and 500 seconds to mimic the pace of actual operations. All subjects received 70 chat
messages in total, with 4 messages presented during the event period. All subjects received the same
script in order to provide continuity across conditions. The chat script used the computer clock, while the
simulation used a server located at Lincoln Lab, which would occasionally lag for a second or two due to
system processing. Since the simulation clock was sometimes slower, chat messages intended to arrive
immediately after the start of the event would arrive slightly before the appearance of any missiles.
Consequently, 25 of 30 subjects received a message within the 60 seconds before the start of the first
wave and 15 received a message within the 20 seconds before the event. The table of times for all
subjects can be seen in Appendix J, and a histogram of the times can be seen in Figure 27. The times
correlated weakly with subject performance (r = 0.232), and a linear regression did not reveal a significant slope
(F(1,28)=1.58, p=0.218). Since all subjects received a message within the 140 seconds prior to the event in a
relatively random manner, it is reasonable to accept that the cueing did not confound the experiment significantly.
Overall, the usage of chat box response times lends further evidence to previous supervisory control
experiments and contributes to the conclusion that participants in the "100-minute, hard" condition were
less engaged than other groups which led to worse performance.
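
The correlation and the regression F statistic reported for the chat message timing are two views of the same test: for a simple linear regression on n observations, F(1, n-2) = r²(n-2)/(1-r²). The short check below applies that identity to the reported r and n; the function name is hypothetical.

```python
def f_from_r(r, n):
    """F statistic on (1, n - 2) degrees of freedom for a simple regression slope."""
    return (r * r) * (n - 2) / (1.0 - r * r)

f_value = f_from_r(0.232, 30)   # about 1.59
```

The result is close to the reported F(1,28)=1.58; the small gap is consistent with r having been rounded to three decimals.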
Figure 27: Time of Last Chat Message Before Event (histogram; x-axis: time before event)
4.2.8 Second Wave Analysis
A similar set of analyses was performed for the second wave of missiles. As described by Table
3 in section 3.2, the first wave analysis is the only legitimate analysis of the effects of a mental workload
transition from low task load to high task load for a novel event, since repeated waves are confounded by
situational priming and learning effects. However, the second wave analysis is helpful in understanding
the effects of fatigue and learning, and is also useful for making broad conclusions about mental
workload. The second wave repeated the difficulty of the first wave, and all subjects received the
second wave at 180 minutes.
Comparison tests from Wave 1 for the difference in hemodynamic response were repeated for
Wave 2 and showed that there were no significant differences in HbO, HbR, or the ratio HbO/HbR for
either difficulty or Wave 1 onset time. Difficulty was found to have a significant impact on average final
track error when measured by the Mann-Whitney test (U=42.00) and the Wilcoxon test (W=147.00,
p=0.006). Wave 1 onset time did not have a significant effect on Wave 2 average final track error
(F(2,28)=0.81, p=0.455). HbR and Agreeableness were found to be weak predictors of performance,
with HbR correlating with average final track error (p=0.067) and Agreeableness correlating
with percentage below threshold (p=0.069). The data in Figure 28 show that there was large variation in
Wave 2 performance for subjects who received the 100-minute, 3-missile condition in Wave 1. This is
because one subject fell asleep entirely and another subject nearly missed the entire second
wave event. These lapses occurred 40-80 minutes after Wave 1 was complete, and both subjects performed
adequately in answering chat questions before the first wave and in tracking the first wave missiles below the
threshold, so these findings do not invalidate the conclusions made about Wave 1 above. Additionally,
those who received the Wave 1 160-minute, 6-missile condition appeared to struggle considerably in their
Wave 2 performance when compared to their Wave 1 performance, which can be seen in Figure 29.
Figure 28: Wave 2 Average Final Error vs. Time & Difficulty
Figure 29: Wave 1 vs. Wave 2 Final Track Error Performance
4.2.9 Lateralization
In addition to looking at the overall physiological response of the subjects, the data was split
hemispherically in order to examine whether there were lateral effects between left and right brain hemispheres.
Psychology literature indicates that each hemisphere may have some task type specialization (Helton, et
al., 2010; Shimoda, Takeda, Imai, Kaneko, & Kato, 2008; Tucker, 1981). Paired t-tests found no
evidence of lateralization of the hemodynamic response when measured globally or broken down by
difficulty or onset time in an ANOVA (Table 11 and Table 12 in Appendix H). The distribution of HbO
and HbR lateralization can be seen in Figure 30 and Figure 31. A linear regression model fit for Wave 1
and Wave 2 performance using HbO, HbR, and HbO/HbR lateralization predictor variables found that no
measures of lateralization were significant predictors of performance (Table 13 in Appendix H).
Figure 30: HbO Lateralization Effects (Left-Right)
Figure 31: HbR Lateralization Effects (Left-Right)
4.2.10 Long-Term Effects
In addition to the analysis of the response directly before, during, and after events, the
hemodynamic data was also analyzed to look at trends over the course of minutes or hours. Visual
inspection revealed that 25 of 30 subjects had a fairly strong, consistent pattern of steadily increasing
HbO levels during the first 30-60 minutes of the experiment, with steady levels of HbR. Conceptually,
the time frame associated with this phenomenon suggests that it may be related to the vigilance
decrement. In this well-studied phenomenon, performance declines over the first 30 minutes of a
vigilance task and then stabilizes or continues to decline at a slower rate. Figure 32 shows the response
of a subject for the full 180 minutes, with the first 30 minutes highlighted. In order to compare with the
classic vigilance research that shows vigilance performance declines linearly for the first 30 minutes, the
time to level-off for HbO was calculated by fitting a linear slope through a 20-minute moving window
and then declaring a "level-off" when the slope either becomes negative or is less than 0.1% of the initial
slope. These empirically derived constants for window size and level-off criteria aligned well with visual
inspection and captured the main effects of the phenomenon. If the subject was determined to have no
slope or a negative slope for the initial 20-minute window, they were excluded from calculation. Twenty-six
of the 30 subjects exhibited this HbO vigilance pattern but only 15 subjects were classified as having
HbR vigilance patterns, indicating HbO may be the primary driver of vigilance response. Figure 33
shows a histogram of the level-off time for HbO, and the figures for each subject with a mean slope can
be seen in Appendix K.
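
The level-off rule described above can be sketched as a moving-window slope test. The sketch assumes a signal resampled to one value per minute and reports the start minute of the first window that meets the criterion; both choices, along with the function names, are assumptions, since the text does not say which point of the window defines the level-off time.

```python
def ls_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

def level_off_minute(hbo_per_min, window_min=20, frac=0.001):
    """Start minute of the first window whose slope has levelled off.

    Returns None if the initial window has no positive slope (the subject
    would be excluded) or if no later window meets the criterion.
    """
    xs = list(range(window_min))
    n_windows = len(hbo_per_min) - window_min + 1
    if n_windows < 1:
        return None
    initial = ls_slope(xs, hbo_per_min[:window_min])
    if initial <= 0:
        return None
    for start in range(1, n_windows):
        slope = ls_slope(xs, hbo_per_min[start:start + window_min])
        if slope < 0 or slope < frac * initial:   # 0.1% of the initial slope
            return start
    return None

# Toy signal: rises for 30 minutes, then stays flat.
toy_hbo = [min(t, 30) for t in range(90)]
minute = level_off_minute(toy_hbo)
```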
Figure 32: HbO Response for Subject with Vigilance Decrement Pattern (x-axis: time (s); event start marked)
The mean HbO level-off time was 32.6 minutes (s.d. 17.1) with a maximum of 78, minimum of
12, and median of 29 minutes, while the mean HbR level-off time was 23.1 minutes (s.d. 11.7) with a
maximum of 46, minimum of 11, and median of 18 minutes. This is consistent with the "30-minute decrement"
that Mackworth found nearly 70 years ago and shows that these behavioral phenomena may correlate with
physiological measures. The mean slope of the period from t=0 to the level-off point for HbO was 0.241
micromolar/min (s.d. 0.169) and for HbR was -0.04 micromolar/min (s.d. 0.080), as seen in Figure 34.
Several studies (Berka, et al., 2007; Helton, et al., 2010; Warm, et al., 2009; Warm, et al., 2008) have
measured a similar trend using other neuroimaging techniques, but to the author's knowledge this is one
of the first uses of fNIRS to measure the vigilance response.
Figure 33: HbO Level-Off Time
Figure 34: Slope to Level-Off (HbR, HbO)
There were several other long-term trends of interest. The first test looked at the variance of the
HbO and HbR signals over the course of the experiment by dividing the 180-minute test into 4 periods.
The overall test was divided into the first 30-minute period and then three 50-minute periods, since it is
hypothesized that the first 30 minutes may have a different physiological response due to vigilance
effects. ANOVAs for HbO and HbR variance and range were both inconclusive (i.e., no statistical
significance). The final test looked at the area between the HbO and HbR curves. This integral
between HbO and HbR could potentially be used as a cumulative measure for work, similar to what is
described by the Fick Principle that oxygen consumption is the oxygenated blood minus the
deoxygenated blood, normalized by the total blood flow (Robertson et al., 1989). Comparing the
integrals for the four periods using ANOVA found no significant differences (F(3,26)=1.35, p=0.26), as
seen in Figure 35, but this metric should be re-examined in future studies.
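
A minimal sketch of the area-between-curves measure follows, using trapezoidal integration over the four periods named above (0-30, 30-80, 80-130, and 130-180 minutes). The function names and the constant toy signals are assumptions. The text does not say whether the integrals were normalized by period length, so the sketch returns raw areas.

```python
def trapezoid(ys, dt):
    """Trapezoidal integral of evenly spaced samples."""
    return dt * (sum(ys) - 0.5 * (ys[0] + ys[-1]))

def period_integrals(hbo, hbr, rate_hz, boundaries_min=(0, 30, 80, 130, 180)):
    """Integrate (HbO - HbR) over each period; one value per period."""
    diff = [o - r for o, r in zip(hbo, hbr)]
    areas = []
    for lo, hi in zip(boundaries_min, boundaries_min[1:]):
        i0, i1 = int(lo * 60 * rate_hz), int(hi * 60 * rate_hz)
        areas.append(trapezoid(diff[i0:i1 + 1], 1.0 / rate_hz))
    return areas

# Toy constant signals sampled at 1 Hz for 180 minutes.
n_samples = 180 * 60 + 1
areas = period_integrals([2.0] * n_samples, [0.5] * n_samples, rate_hz=1)
```

Because the first period is shorter than the other three, raw areas are not directly comparable across periods unless they are divided by period duration.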
Figure 35: HbO-HbR integral for 4 quarters
4.3 Summary
Primary data analysis shows that hemodynamic response was affected by onset time but not by
difficulty. Separate from hemodynamic response, the 6-missile scenario proved to be a significantly more
difficult scenario than the 3-missile scenario, as expected, but first wave onset time was not found to be a
statistically significant factor in either first or second wave performance despite large differences in the
100-minute onset time condition that appear upon visual analysis of the data. A model using all possible
input parameters found that Agreeableness, Video Game experience, pre-event behavioral state, and HbR
response were significant performance predictors. Lateral analysis showed no major distinction between
left and right activity during mission response. Long-term trend analysis provides some evidence that
there is a physiological correlate to the vigilance decrement.
5. Conclusions
This chapter addresses conclusions regarding the experiment described in this thesis and
examines how the results fit into the broader context of low-workload, human supervisory control tasks.
It also discusses possible confounding variables and limitations of this experiment. Finally, it provides
recommendations for future work using functional brain imaging in low-workload environments.
5.1 Experiment Conclusions
There are several conclusions that can be drawn from the primary experiment. First, the two
independent variables, difficulty and onset time, affected performance and physiological response differently. As
expected, participants who received the 3-missile scenario had performance scores significantly better
than those who received the 6-missile scenario. However, it was expected that this increase in objective
difficulty would correspond to a physiological difference between the difficulty levels, which was not
found. This result indicates that the physiological measures of mental workload did not differ
significantly between the two difficulty levels in this experiment. Unlike other studies
(Girouard, et al., 2009; Kramer & Parasuraman, 2007; Sassaroli, et al., 2008; Wilson & Russell, 2003;
Wilson & Russell, 2007), this study did not discriminate levels of mental workload. There are many
possible reasons why this experiment did not detect any significant differences with respect to degree
of difficulty.
First, the number of trials was very limited for this fNIRS experiment. Although there were 30
subjects studied (which is significantly more than most previous NIRS studies), each subject only
received one event for the primary experiment and two events overall, which stands in contrast to other
fNIRS studies in which each subject receives several repeated trials in order to obtain an average
response. However, one could argue that this test scenario was more realistic in terms of work
environments. A repeated-measures structure would change the character of the task and the subject
response, and so could not truly capture the very low workload and rare occurrence of events in
something like a ballistic missile defense environment. This limitation could therefore only be offset by
running additional trials.
Second, the subjects in this experiment were novice operators. While provided with an
instruction tutorial, practice scenario, experimental guidance, and knowledge check, subjects were still
dealing with some uncertainty in using the interface, which could have contributed to the lack of
statistical difference in physiologic response. Third, determining subject response with the
average of the maximum method could bias the result away from sustained
workload and towards instantaneous workload. While subjects may have had to sustain their mental
workload for greater periods of time in the 6-missile scenario, subjects could have reached instantaneous
peaks for the 3-missile scenario that were equivalent to 6-missile levels but not sustained. Finally, while
several different filtering methods were applied to reduce noise and artifacts from the data, it is still
possible that some of the true responses may have been attenuated, magnified, or otherwise altered by
phenomena other than increased blood flow due to prefrontal cortex activity.
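The third point can be illustrated with a minimal sketch. It assumes, for illustration only, that the average of the maximum method takes each channel's peak value in the response window and averages those peaks across channels; the function names and the synthetic traces below are hypothetical and are not taken from the experimental data. Under that assumption, a brief peak and a sustained plateau of equal height produce the same peak-based score, while a time-averaged score separates them.

```python
from statistics import mean


def average_of_maximum(channels):
    """Mean across channels of each channel's peak value (assumed definition)."""
    return mean(max(ch) for ch in channels)


def average_of_mean(channels):
    """Mean across channels of each channel's time-averaged value."""
    return mean(mean(ch) for ch in channels)


# Hypothetical HbO traces (arbitrary units), two channels, ten samples each.
brief_peak = [
    [0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0],
]
sustained = [
    [0.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 0.0],
    [0.0, 0.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0],
]

# Peak-based scores are identical for the two response shapes.
peak_brief = average_of_maximum(brief_peak)
peak_sustained = average_of_maximum(sustained)

# Time-averaged scores differ, reflecting how long the response was held.
mean_brief = average_of_mean(brief_peak)
mean_sustained = average_of_mean(sustained)

print(peak_brief, peak_sustained, mean_brief, mean_sustained)
```

A sustained-workload measure of this kind would be one way to check whether the 6-missile scenario differed from the 3-missile scenario in duration of response rather than in peak magnitude.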
In contrast to the analysis regarding degree of difficulty, analysis of the impact of onset time
showed some remarkable results. The most significant result was that the 100-minute condition was
significantly different for both HbO and HbR in the first wave response, with the greatest difference
between the 100-minute group and 160-minute group. The best explanation for this is that subjects were
at their least engaged and least primed at the middle point of the experiment, as opposed to at the relative
beginning and end. Video analysis and experimenter notes suggest that subjects were more primed at the
40-minute mark than the 100-minute mark, and began losing concentration after an hour. Most subjects
returned to a more alert state in the last half hour since they knew the experiment was coming to a close.
This result is very relevant to understanding mental workload over the course of long-duration, low-workload tasks. Over the course of a shift, an operator's attention and arousal can follow a pattern of
high at the beginning, low in the middle, and a return to higher at the end of a shift, so a corresponding
physiological difference at the middle onset time suggests that the brain may be less apt to handle difficult
situations during the middle of a boring shift. This response has been seen in a similar study (Hart, 2010).
Ultimately, the result that onset time was a significant factor in hemodynamic response but not task
difficulty lends evidence to the argument that fNIRS is a better tool for measuring engagement, rather
than simply mental workload. The finding that the 100-minute onset time was linked to a diminished
psychophysiological response lends evidence to the hypothesis that mental workload transition is not
simply an accumulation of work or boredom, but rather a process dependent upon state of engagement or
attention at the start of the workload transition. When combined with other results such as the vigilance
decrement and the return to baseline, the power of fNIRS may truly be greatest in tracking mental state
rather than just mental work, a potentially more powerful measure.
Arguably the most important relationship is between hemodynamic response and performance,
and whether performance can be reliably predicted from a physiologic state. The backwards regression
model discussed in the previous chapter generated striking results. The first significant prediction factor
was the NEO Five Factor Index level of Agreeableness. In short, subjects who scored lower on
Agreeableness tended to perform better than those who scored higher. One of the primary traits of
Agreeableness is trust, so it could be inferred that subjects less willing to trust the system to function were
more alert, allowing them to perform better when forced to rapidly shift their paradigm of operation
(Costa Jr, McCrae, & Dye, 1991; McCrae & Costa Jr, 1999). People with a low Agreeableness rating are
also more likely to be competitive than those higher on the Agreeableness spectrum, which aligns well
with the fact that there was a large prize for the top performer.
The second factor that proved to be significant was video game experience. Several studies
(Boot, Kramer, Simons, Fabiani, & Gratton, 2008; Green & Bavelier, 2003) have linked video game
experience to performance in various kinds of simulation environments, especially military-related ones (Chen
& Barnes, 2012; Clare, Cummings, How, Whitten, & Toupet, 2012; Cummings, Clare, & Hart, 2010).
Clare (2012) found that video game experience was linked to automation trust, which adds evidence to
the hypothesis that trust, video game experience, and supervisory control task performance are
interrelated. Subjects with game experience are likely to be more comfortable operating with a computer
interface and dealing with complex scenarios, so it is a very telling result that game experience was found
to be a predictor of performance.
The third major factor in predicting performance was the state of behavior in the two minutes
preceding the event, which was determined from the video analysis. Subjects who were coded as divided
(N = 5) or asleep/unaware (N = 1) before Wave 1 were found to perform significantly worse than those
coded as directed. This result is consistent with other similar experiments, which found behavior state to
be linked to performance (Hart, 2010; Mkrtchyan, et al., 2012). Additionally, this result solidifies the
importance of attention in priming a subject for a cognitively demanding challenge. Even with chat
messages within the two minutes before the major event, many subjects still struggled to stay in a
properly primed state, especially those who received the event at the 100-minute onset time. Subjects who
were not engaged had to command more attentional resources to first focus on the task and then perform
at a high level, as opposed to subjects who were already cognitively prepared to tackle a challenge. This
result reinforces the importance of being task-focused in operational settings, especially settings where
the potential for distraction is high.
The final performance predictor was HbR. This is an important result because it indicates that
performance can be related to hemodynamics. HbR is often viewed as a more reliable indicator of
workload since it is a measurement of the resources being consumed, so the result that subjects who
performed better consumed more resources fits well with the theoretical underpinnings of cognition and
psychophysiology. Furthermore, this result gives credence to the notion that this technology could be
used by adaptive automation to help modulate mental workload in order to maximize overall system
performance based upon the measurement of hemodynamic signals during high workload periods.
Workload is more directly related to task volume or rate, while engagement also incorporates task type
and operator interest, so simply manipulating task volume may not be sufficient to maintain sustained
attention. This result should be interpreted with caution, however, since HbR cannot predict performance
alone and has to be considered in light of the other variables controlling for other sources of variability.
None of the four predictors were significant on their own.
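The general idea of backward elimination can be sketched as follows. This is a minimal illustration on synthetic data, not the model fitted in this thesis: it assumes ordinary least squares and, for simplicity, uses adjusted R^2 as the removal criterion instead of the significance tests a statistics package would typically apply. The predictor names, the sample data, and the helper functions are all hypothetical.

```python
import numpy as np


def adjusted_r2(X, y):
    """Adjusted R^2 of an ordinary least squares fit of y on X plus an intercept."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def backward_eliminate(X, y, names):
    """Start from the full model and drop one predictor at a time while the score improves."""
    kept = list(range(X.shape[1]))
    best = adjusted_r2(X[:, kept], y)
    while len(kept) > 1:
        # Score every model that omits exactly one of the remaining predictors.
        trials = [
            (adjusted_r2(X[:, [j for j in kept if j != i]], y), i) for i in kept
        ]
        score, drop = max(trials)
        if score <= best:
            break  # no single removal improves the model
        best = score
        kept.remove(drop)
    return [names[i] for i in kept], best


# Synthetic example: 30 observations, four candidate predictors,
# of which only x0 and x1 actually drive the outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=30)
names = ["x0", "x1", "x2", "x3"]

kept, best = backward_eliminate(X, y, names)
print(kept, round(best, 3))
```

Because the retained predictors are evaluated jointly, a variable can contribute to the final model even when it would not be a significant predictor in isolation.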
These results are significant for a number of reasons. First and foremost, the correlation of
physiological trends with vigilance improves the external validity of this experiment. It also provides
evidence to the overarching notion that physiological measures can be used in the low workload periods
to make inferences about psychological state. fNIRS is just one of many methods that could be applied
towards monitoring operator arousal, stress, and mental workload during boredom and low workload, and
when used in combination, these techniques could provide reliable indications of when subject
engagement and focus begin to deteriorate. Second, these results provide a physiological perspective of
Hart's findings (2010) that attentional inefficiencies were highest in the middle of a low task load
experiment. Finally, these results lay the foundation for future work that can look at longer periods
before and after an event to determine how a subject's pre-event state affects transition and, ultimately,
performance. While this study looked specifically at the time directly surrounding the events in
question, analysis of broader periods of time surrounding events could provide even more information
into what factors influence workload transition and event performance.
Looking back at the questions stated as goals in Section 1.1, there are several questions that were
answered and several more that remain open paths for investigation. The four research questions are
listed again below:
1. Is a change in the measured activity of the fNIRS data correlated to an actual change in real
mental workload?
2. What is the impact of time in low taskload environment on operator performance and
response?
3. Is the magnitude of a transition from low workload to high workload correlated to
performance?
4. Can the pre-transition state of the operator relative to a known baseline be used to predict the
post-transition response of the operator?
This study showed first that a transition from low to high workload was related to a significant
hemodynamic response. It then showed that critical event difficulty did not have a significant impact on
hemodynamic transition magnitude, but that time in the low taskload environment before an event was
significant. The decrease in HbO for the 100-minute condition along with the degraded performance for
subjects that received the 100-minute condition point to the fact that those subjects were the most
disengaged and that fNIRS may be a suitable technology for measuring engagement, although that
question requires a follow-up study. The model developed shows several factors which were important to
predicting performance, including HbR, Agreeableness, video game experience, and pre-transition
distraction state. The final questions looking at distinguishing engagement during low workload and
comparing pre- and post-transition states remain open areas for future research, although the detection of
a possible physiological correlate with the vigilance decrement is a promising result that should be
studied further.
Overall, this study accomplished many of the goals laid out at the outset, in addition to being one
of the first, if not the first, applications of fNIRS in a long-duration, more realistic environment. The
demonstration that fNIRS sensors were suitable for long-duration experimentation is an important
contribution to the fNIRS field as a whole, and the dataset collected will provide a rich area for future
analyses. This is one of the first applications of fNIRS into the low workload domain, and the
development of analysis tools and methods in this study has helped lay the foundation for how future
fNIRS studies can be analyzed.
5.2 Limitations
While every effort was made to control confounding variables and make valid conclusions, there
are several limitations that should be discussed for this work. First, the experiment was conducted under
only a single-blind condition, with the experimenter having knowledge of the experimental condition at
all times. This was done in order to properly monitor the subject and ensure the simulation was working
correctly, but may have introduced a bias into the experiment. The experimenter tried to avoid entering
the room in the 30 minutes before the event and avoid interpersonal interaction, but this was not always
possible and may have resulted in attenuating boredom before an event. Second, increased experience in
running the experiment may have slightly modified the experimenter's conduct over time, especially with
regard to addressing questions from the subject about how best to use the interface. Third, the chat box
may have unintentionally cued subjects immediately before an event. The chat box timer did not always
sync perfectly with the simulation, so several subjects received a message directly before an event that
may have raised their arousal level unintentionally. These occurrences were noted in the video analysis,
but it is still worth mentioning as a potential confounding factor since performance may have been
significantly worse in several cases without the inadvertent chat alert. Finally, the presence of video
recording may have influenced the actions of subjects.
There were also several limitations to the data processing and analysis. The Homer2 software
provides a user-friendly interface for analyzing fNIRS data, but may produce misleading plots when
analyzing fNIRS data after a probe has faulted. All the data were checked to remove erroneous points
after sensor faults had occurred, but this can still be problematic when using the Homer2 software for
visualization. It also automatically centers the data on the average response, which can cause misleading
interpretations of long-term trends. This was corrected by setting the "time = 0" point for each HbO,
HbR, and HbT signal to a magnitude of zero at the beginning. For the HRF plots, the baseline period of
60 seconds before the event was used to offset the effects of Homer2's data averaging process. The other
main limitation of the analysis was the use of a single differential path-length factor (DPF) for the
computation of hemoglobin concentration. While a good approximation for most humans, the true DPF
varies slightly for each individual based on a number of factors, so the use of a single common DPF limits
the analysis. The average of the maximum method may also introduce some error because it may be
more susceptible to false readings due to motion artifacts than an averaging method. This was offset by
filtering to remove high frequency artifacts, as well as manual exclusion of signals with obvious large
excursions.
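The two offset corrections described above can be sketched as follows. This is an illustration only and is independent of Homer2: the function names, the 1 Hz sampling rate, and the sample values are assumptions made for the example.

```python
from statistics import mean


def zero_at_start(samples):
    """Shift a long-duration signal so that its first sample (time = 0) is zero."""
    first = samples[0]
    return [s - first for s in samples]


def baseline_correct(samples, event_index, sample_rate_hz, baseline_s=60.0):
    """Subtract the mean of the window immediately preceding the event."""
    n = int(round(baseline_s * sample_rate_hz))
    start = max(0, event_index - n)
    window = samples[start:event_index]
    if not window:
        raise ValueError("no samples available before the event")
    offset = mean(window)
    return [s - offset for s in samples]


# Hypothetical trace sampled at 1 Hz: 60 s of pre-event baseline at 5.0,
# followed by 10 s of response at 7.0 (arbitrary units).
signal = [5.0] * 60 + [7.0] * 10
corrected = baseline_correct(signal, event_index=60, sample_rate_hz=1.0)

# Hypothetical long-term trend re-referenced to its starting value.
trend = zero_at_start([3.0, 4.0, 2.5])

print(corrected[0], corrected[-1], trend)
```

Referencing each response to its own pre-event window keeps the comparison between subjects independent of any slow drift accumulated earlier in the session.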
5.3 Future Work
Since low task load research will continue to be an important context for human-automation
collaboration, there are many avenues to explore that are suggested by this work. The first area for
further research would involve similar studies that collect more responses for each subject. While it
would be more difficult to prevent learning effects and capture true boredom, this may help provide
greater reliability and insulation against outliers due to artifacts in the fNIRS data. Other permutations of
this experiment could modify the overall environment, the amount of training time, the amount of low
taskload time, or the complexity of the task. Adding in a dedicated vigilance task such as monitoring a
process or video feed could help to elucidate the vigilance findings and increase the similarity to many
real-world environments.
The second area to explore is using fNIRS to focus specifically on measuring
engagement during very low taskload operations, rather than the transition and high taskload periods.
The long-duration measures are good first steps towards answering this question, but possible future
experiments could use more traditional vigilance tests as a method for determining what an "engaged"
brain signal looks like as compared to a "disengaged" brain signal. A cursory investigation to link chat
box response time with hemodynamic response found no significant correlation, but an experiment
focused exclusively on measuring mental state during low taskload may draw different conclusions. A
more extensive machine learning application may uncover trends in the data that could lead to stronger
conclusions about engagement state over time. While a difficult problem, tracking brain state over time
would provide a very powerful input to any future system and could significantly alter the relationship
between a human and a machine. A system that can discern brain engagement during low workload
would have major implications in the development of robust adaptive automation techniques for low
taskload work.
The third area for investigation would be the development of an adaptive automation system for
low taskload environments. A primary feature of this system would be to monitor brain activity and then
combine it with system inputs or other physiological measures to create a multimodal inference of mental
state. Even a simple measure such as drowsiness could have major impacts on critical fields such as
transportation, where fatigue and drowsiness are routinely cited as a cause of scores of accidents every
year. If a reliable mental state tracking system could be developed, it could be used in a variety of ways,
such as finding factors that can mitigate true boredom and mental disengagement or measuring the
differences between various interfaces.
Appendix A: Experiment Timeline
Time (min)    Action
0
-Subject is brought into test room and read the following script:
"As part of a study to analyze the way that humans interact with the highly autonomous
systems by MIT's Human and Automation Lab and Lincoln Laboratory, we request your
participation in this simulation. Participation is strictly voluntary and you can choose to
withdraw at any time. All data will be kept confidential and encoded to ensure participant
anonymity. Please feel free to raise questions at any time throughout the experiment.
Thank you for your participation."
-Subject is first asked the following questions: (Qualifying response in parentheses)
"Are you a native English speaker? (Yes) Are you right handed? (Yes) Are you colorblind?
(No) Do you have any history of head trauma, neurological disorder, or epilepsy? (No)"
If the subject answers any questions with a disqualifying response, they will be thanked
for their time and removed from the study. Otherwise, they will be presented with a consent
form, which they will review with the experimenter (Appendix B). Subjects will then take
on-line versions of the demographic survey, boredom proneness survey, and NEO Five
Factor Index III forms (Appendices C, D, E). Subject will be assigned a number to
complete their forms.
10
Once the subject completes the previous forms, they are presented with a PowerPoint tutorial on
how to operate the software. They will be allowed as much time as necessary to go
through the tutorial. The experimenter will be present to answer any questions that may
come up regarding functions of the interface, although the experimenter will answer in
the simplest form possible to minimize experimental bias.
25
The test environment will be opened for a tutorial version of the test. This tutorial will be
a simple case of two missiles launched together 90 seconds after the start of the tutorial
period. The subject will be allowed to experiment with all the different features of the
displays and ask the experimenter questions. The practice session gives the subject
approximately 4 minutes to experiment with the system while also showing them the
importance of quick action. After the tutorial, the subject will be asked to fill out a 5-question test to show proficient understanding of how to operate the system.
30
The subject will be given an opportunity to take a break (water, bathroom, stretch, etc)
before beginning the test. The experimenter will read the following:
"If you would like, you can use this time to use the restroom, get a drink, stretch or use
your phone. This will be your last chance to leave the room once the experiment begins,
unless it is an emergency."
35
The experimenter places the sensor onto the subject's forehead. After ensuring proper
placement and adequate subject comfort with the device, the experimenter will run the
system calibration program. Once the system is calibrated, the experimenter will advise
the subject to remain still and calm for a 1-minute baseline period.
40
Experimenter loads the scenario. Once it is configured, experimenter reads the following
script:
"The experiment is now about to begin. Your primary mission is to respond to the missile
threats in the most effective way possible. You should read and respond to chat messages
as quickly as possible without compromising the primary mission. During the test please
do not work on any other tasks or change or block the screens or modify the test computer
in any other way. If the clock is running, the system is operating. Please let the
experimenter know of any problems or concerns during the test. Are you ready to start?"
At this time, the experimenter answers any last minute questions and then begins the trial.
The experimenter will remain close by outside the room to monitor the subject, record any
observations, and address any issues that may come up. The experimenter will enter the
room every 20-30 minutes to monitor the system function and ensure proper data
collection. They will refrain from any interaction with the subject to ensure a sterile
environment. The experimenter will specifically avoid entering the room within the 30
minutes prior to any event.
80
Test groups A and B will receive first "test" wave.
140
Test groups C and D will receive first "test" wave.
200
Test groups E and F will receive first "test" wave.
220
All subjects receive second wave of missiles. Second wave is exact repeat of first wave.
After second wave, experimenter will enter the room and stop the data recording and
simulation.
225
Subjects will be asked to fill out the post-experiment survey and NASA TLX (Appendices
F, G). Subjects will also conduct a short debrief interview with the experimenter to
discuss interface design, perceived boredom/workload during test, and any other
comments on the experiment.
230
Subjects are thanked for their participation and paid, and the experiment is over.
Appendix B: Participant Consent Form
CONSENT TO PARTICIPATE IN
NON-BIOMEDICAL RESEARCH
Human Performance in Ballistic Missile Response Scenarios
You are asked to participate in a research study conducted by Lee Spence, Ph.D. from the MIT Lincoln
Laboratory Advanced Concepts and Technology Group and Mark Boyer from the MIT Humans and
Automation Laboratory. You were selected as a possible participant in this study because of your interest
in improving human performance in ballistic missile defense scenarios. You should read the information
below, and ask questions about anything you do not understand, before deciding whether or not to
participate.
PARTICIPATION AND WITHDRAWAL
Your participation in this study is completely voluntary and you are free to choose whether to be in it or
not. If you choose to be in this study, you may subsequently withdraw from it at any time without penalty
or consequences of any kind. The investigator may withdraw you from this research if circumstances
arise which warrant doing so.
PURPOSE OF THE STUDY
Ballistic Missile Decision Support involves a number of very broad and complex issues. The system is
very large, it has many interconnected elements, and it is physically spread over an area that is a
significant fraction of the Earth. During a ballistic missile response, operators will have very little time to
coordinate defensive actions and may face overwhelming amounts of information. The general purpose
of this study is to analyze the human response in a ballistic missile defense scenario.
PROCEDURES
If you volunteer to participate in this study, we would ask you to do the following things:
Participate in a 15-minute training session to familiarize yourself with the display and test conditions.
Participate in a long duration scenario with long periods of low activity and several threats to address.
All of these steps will occur in the Tufts University Human-Computer Interface Lab, 196 Boston Avenue,
Medford, MA 02155.
POTENTIAL RISKS AND DISCOMFORTS
There are no foreseeable risks in participating in this experiment. Since the fNIRS sensors require a snug
fit for optimal data collection, the sensors are attached to the forehead using elastic headbands. The
headband and sensors may be mildly uncomfortable for extended experiments. Subjects may ask the
experimenter to remove the sensors at any time if they feel they are too uncomfortable.
POTENTIAL BENEFITS
Your participation in this study will help increase understanding of how humans react to rapid changes in
workload in predominantly low workload environments. Although this study focuses on ballistic missile
defense, other applications include unmanned vehicle operators, nuclear power plant operators and
manufacturing supervisors.
PAYMENT FOR PARTICIPATION
Participation in this experiment is strictly voluntary with payment of 125 dollars. Top performing
participants can also win a prize gift card of 150 dollars.
CONFIDENTIALITY
Any information that is obtained in connection with this study and that can be identified with you will
remain confidential and will be disclosed only with your permission or as required by law. Your
performance in this study will only be coded by your subject number, which will not be linked to your
name so your participation in this research is essentially anonymous.
IDENTIFICATION OF INVESTIGATORS
If you have any questions or concerns about the research, please feel free to contact Lee Spence at Group
36 - Ballistic Missile Defense System Integration, MIT Lincoln Laboratory, 3 Forbes Road, Lexington,
MA 02421 (781) 981-5043 or Professor Missy Cummings at 77 Massachusetts Ave., 33-305, Cambridge,
MA 02139 (617) 252-1512.
EMERGENCY CARE AND COMPENSATION FOR INJURY
If you feel you have suffered an injury, which may include emotional trauma, as a result of participating
in this study, please contact the person in charge of the study as soon as possible.
In the event you suffer such an injury, M.I.T. may provide itself, or arrange for the provision of,
emergency transport or medical treatment, including emergency treatment and follow-up care, as needed,
or reimbursement for such medical services. M.I.T. does not provide any other form of compensation for
injury. In any case, neither the offer to provide medical assistance, nor the actual provision of medical
services shall be considered an admission of fault or acceptance of liability. Questions regarding this
policy may be directed to MIT's Insurance Office, (617) 253-2823. Your insurance carrier may be billed
for the cost of emergency transport or medical treatment, if such services are determined not to be directly
related to your participation in this study.
RIGHTS OF RESEARCH SUBJECTS
You are not waiving any legal claims, rights or remedies because of your participation in this research
study. If you feel you have been treated unfairly, or you have questions regarding your rights as a
research subject, you may contact the Chairman of the Committee on the Use of Humans as Experimental
Subjects, M.I.T., Room E25-143B, 77 Massachusetts Ave, Cambridge, MA 02139, phone 1-617-253-6787.
SIGNATURE OF RESEARCH SUBJECT OR LEGAL REPRESENTATIVE
I understand the procedures described above. My questions have been answered to my satisfaction, and I
agree to participate in this study. I have been given a copy of this form.
Name of Subject
Name of Legal Representative (if applicable)
Date
Signature of Subject or Legal Representative
SIGNATURE OF INVESTIGATOR
In my judgment the subject is voluntarily and knowingly giving informed consent and possesses the legal
capacity to give informed consent to participate in this research study.
Date
Signature of Investigator
Appendix C: Demographic Survey
Demographic Survey
1. Subject number:
2. Age:
3. Gender: M F
4. Occupation:
If student (circle one): Undergrad   Masters   PhD
Expected year of graduation:
5. Military experience (circle one): No   Yes
If yes, which branch:
Years of service:
6. How much sleep did you get for the past two nights?
Last night:
Night before last:
7. How often do you play computer games?
Rarely   Monthly   Weekly   A few times a week   Daily
Types of games played:
Appendix D: Boredom Proneness Survey
1. It is easy for me to concentrate on my activities.   T  F
2. Frequently when I am working I find myself worrying about other things.   T  F
3. Time always seems to be passing slowly.   T  F
4. I often find myself at "loose ends," not knowing what to do.   T  F
5. I am often trapped in situations where I have to do meaningless things.   T  F
6. Having to look at someone's home movies or travel slides bores me tremendously.   T  F
7. I have projects in mind all the time, things to do.   T  F
8. I find it easy to entertain myself.   T  F
9. Many things I have to do are repetitive and monotonous.   T  F
10. It takes more stimulation to get me going than most people.   T  F
11. I get a kick out of most things I do.   T  F
12. I am seldom excited about my work.   T  F
13. In any situation I can usually find something to do or see to keep me interested.   T  F
14. Much of the time I just sit around doing nothing.   T  F
15. I am good at waiting patiently.   T  F
16. I often find myself with nothing to do, time on my hands.   T  F
17. In situations where I have to wait, such as a line or queue, I get very restless.   T  F
18. I often wake up with a new idea.   T  F
19. It would be very hard for me to find a job that is exciting enough.   T  F
20. I would like more challenging things to do in life.   T  F
21. I feel that I am working below my abilities most of the time.   T  F
22. Many people would say that I am a creative or imaginative person.   T  F
23. I have so many interests, I don't have time to do everything.   T  F
24. Among my friends, I am the one who keeps doing something the longest.   T  F
Appendix E: Post-Experiment Survey
Post-experiment Survey
1. How confident were you about the actions you took?
Not Confident
Somewhat Confident
Confident
Very Confident
Extremely Confident
2. How did you feel you performed?
Very Poor
Poor
Satisfactory
Good
Excellent
3. Overall, how busy did you feel during the mission?
Idle
Not Busy
Busy
Very Busy
Extremely Busy
4. When did you feel the busiest during the experiment?
5. When did you feel the least busy during the experiment?
3. Overall, how frustrated did you feel during the mission?
Very
Somewhat
Mildly
Not very
Not at all
4. When did you feel the most frustrated during the experiment?
5. When did you feel the least frustrated during the experiment?
4. Did you feel distracted at any point in the mission?
Yes
No
If so, please list some of the items or activities that distracted you from the mission:
5. How quickly did you feel you detected threats?
Very slow
Slow
Fast
Very Fast
6. How clear was the alert that incoming missiles had been launched?
Very Poor
Poor
Satisfactory
Good
Excellent
7. How comfortable did you feel with the interface?
Very Uncomfortable
Slightly Uncomfortable
Neutral
Comfortable
Very Comfortable
8. What changes to the interface would help you improve your situational awareness?
9. Other comments:
Appendix F: Message Panel Alert Times
Onset time: 40 Minutes
Time (sec)   Message
1400         "REGIONAL BMDS ON ALERT" (False Alarm)
2400         "REGIONAL BMDS ON ALERT" (Event @ t=2500)
5000         "REGIONAL BMDS ON ALERT" (False Alarm)
8200         "REGIONAL BMDS ON ALERT" (False Alarm)
10800        "REGIONAL BMDS ON ALERT" (Event @ t=10900)

Onset time: 100 Minutes
Time (sec)   Message
1400         "REGIONAL BMDS ON ALERT" (False Alarm)
2400         "REGIONAL BMDS ON ALERT" (False Alarm)
5000         "REGIONAL BMDS ON ALERT" (Event @ t=6100)
8200         "REGIONAL BMDS ON ALERT" (False Alarm)
10800        "REGIONAL BMDS ON ALERT" (Event @ t=10900)

Onset time: 160 Minutes
Time (sec)   Message
1400         "REGIONAL BMDS ON ALERT" (False Alarm)
2400         "REGIONAL BMDS ON ALERT" (False Alarm)
5000         "REGIONAL BMDS ON ALERT" (False Alarm)
9000         "REGIONAL BMDS ON ALERT" (Event @ t=9600)
10800        "REGIONAL BMDS ON ALERT" (Event @ t=10900)
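The alert schedule in Appendix F can be encoded directly, which makes it easy to see how much warning each true alert gave and how many false alarms each condition contained. The following is only an illustrative sketch: the dictionary layout and the function names `lead_times` and `false_alarm_count` are assumptions introduced here, not part of the experiment software.

```python
# Alert schedule transcribed from Appendix F.
# Key: onset condition in minutes.
# Value: list of (message time in sec, event time in sec or None for a false alarm).
ALERTS = {
    40: [(1400, None), (2400, 2500), (5000, None), (8200, None), (10800, 10900)],
    100: [(1400, None), (2400, None), (5000, 6100), (8200, None), (10800, 10900)],
    160: [(1400, None), (2400, None), (5000, None), (9000, 9600), (10800, 10900)],
}


def lead_times(schedule):
    """Seconds between each true alert and the event it announces."""
    return [event - sent for sent, event in schedule if event is not None]


def false_alarm_count(schedule):
    """Number of alerts in the schedule that were not followed by an event."""
    return sum(1 for _, event in schedule if event is None)


if __name__ == "__main__":
    for onset, schedule in ALERTS.items():
        print(onset, lead_times(schedule), false_alarm_count(schedule))
```

Under this encoding every condition has three false alarms and two true alerts, and the final alert always precedes its event by 100 seconds.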
Appendix G: Summary of Variables
Variable                                        Mean      Std. Dev.  Min.     Median   Max.

Demographics
Age (years)                                     21.3      2.51       18       21       31
Gender: 12 male, 18 female                      n/a       n/a        n/a      n/a      n/a
Occupation: 21 undergraduate, 7 Masters,
  2 other                                       n/a       n/a        n/a      n/a      n/a
Sleep (1 night previous, hours)                 7.45      1.56       5        7        12
Sleep (2 nights previous, hours)                7.28      1.39       4        7.75     9
Video gaming ((Rank) Category - Frequency)      1.93      1.38       1        1        5
  (1) Less than once per month - 18
  (2) Monthly - 4
  (3) Weekly - 3
  (4) A few times per week - 2
  (5) Daily - 3

Personality Indexes
Boredom Proneness                               5.76      3.56       1        5        17
Five Factor Index:
  Neuroticism                                   21        8.44       8        20       38
  Extraversion                                  32.1      5.86       22       32.5     47
  Openness to Experience                        27.7      3.62       20       28.5     32
  Agreeableness                                 32.2      4.27       23       32.5     40
  Conscientiousness                             32.2      6.59       21       31       48

Workload Assessment
NASA TLX                                        4.94      1.61       2.47     4.97     9.07

Video Coding
Distraction Coding (1-3)                        1.23      0.504      1        1        3

fNIRS Metrics (µmol/L)
HbO baseline                                    2.08      1.26       0.57     1.86     6.09
HbR baseline                                    -0.72     0.62       -2.78    -0.568   -0.177
HbT baseline                                    1.92      1.03       0.56     1.69     4.97
HbO Avg. of Max.                                2.73      2.24       -0.51    1.86     8.01
HbR Avg. of Min.                                -0.89     0.57       -2.86    -0.84    0.09
HbT Avg. of Max.                                2.42      1.93       -2.96    -0.73    1.57

Physio. Response Metrics
HbO Time to Max (seconds)                       106.4     64.0       0.5      108.75   199.5
HbO Return from Max to Baseline (seconds)       221.7     217.3      9.5      194.5    1174.5

Long-Term Trends
HbO Level-Off Time (sec)                        32.6      17.1       12       29       78
HbR Level-Off Time (sec)                        23.1      11.7       11       17       46
HbO Start to Level-Off Slope (µmol/L/min)       0.259     0.17       0.0285   0.201    0.778
HbR Start to Level-Off Slope (µmol/L/min)       -0.070    0.081      -0.342   -0.047   0.003

Chat Box
Average Response Time (seconds)                 13.7      7.96       6.13     10.28    37.68
Average Missed Questions                        0.4       0.67       0        0        2

Performance
Wave 1
  Average Final Track Error                     27.04     60.39      0.607    5.08     257.6
  % Below Threshold                             81.1%     25.0%      16.7%    91.6%    100%
Wave 2
  Average Final Track Error                     52.77     103.14     0.751    5.25     500
  % Below Threshold                             75.9%     31.4%      0%       100%     100%
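The video gaming row of Appendix G reports a mean of 1.93 over ranked frequency categories. A short sketch shows how that figure follows from the category counts; the variable names are illustrative assumptions, and only the mean, median, and participant count are checked here.

```python
from statistics import mean, median

# Rank -> number of participants, from the video gaming row of Appendix G.
counts = {1: 18, 2: 4, 3: 3, 4: 2, 5: 3}

# Expand the counts into one rank per participant.
ranks = [rank for rank, n in counts.items() for _ in range(n)]

n_participants = len(ranks)   # matches the 12 male + 18 female participants
mean_rank = mean(ranks)       # 58 / 30
median_rank = median(ranks)   # the 15th and 16th sorted values are both rank 1

if __name__ == "__main__":
    print(n_participants, round(mean_rank, 2), median_rank)
```

Treating an ordinal rank as a number in this way is a simplification; the mean is best read as a rough indicator that most participants played video games less than once per month.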
Appendix H Results Tables
Multiple Comparisons
Dependent Variable: PercentAvgMaxO
Tukey HSD
                                                                   95% Confidence Interval
(I) onset  (J) onset  Mean Difference (I-J)  Std. Error  Sig.   Lower Bound  Upper Bound
40         100        1.1033                 .46644      .066   -.0616       2.2681
40         160        -.7056                 .46644      .303   -1.8705      .4592
100        40         -1.1033                .46644      .066   -2.2681      .0616
100        160        -1.8089*               .46644      .002   -2.9738      -.6441
160        40         .7056                  .46644      .303   -.4592       1.8705
160        100        1.8089*                .46644      .002   .6441        2.9738
Based on observed means.
The error term is Mean Square(Error) = 1.0868.
*. The mean difference is significant at the .05 level.
Table 6: HbO Comparisons for Onset Time
Multiple Comparisons
Dependent Variable: percentAvgMinR
Tukey HSD
                                                                   95% Confidence Interval
(I) onset  (J) onset  Mean Difference (I-J)  Std. Error  Sig.   Lower Bound  Upper Bound
40         100        .4812                  .63930      .735   -1.1153      2.0777
40         160        -1.1204                .63930      .207   -2.7169      .4762
100        40         -.4812                 .63930      .735   -2.0777      1.1153
100        160        -1.6015*               .63930      .049   -3.1981      -.0050
160        40         1.1204                 .63930      .207   -.4762       2.7169
160        100        1.6015*                .63930      .049   .0050        3.1981
The error term is Mean Square(Error) = 2.044.
*. The mean difference is significant at the .05 level.
Table 7: HbR Comparisons for Onset Time
Multiple Comparisons
Dependent Variable: ratioAvgMax
Tukey HSD
                                                                   95% Confidence Interval
(I) onset  (J) onset  Mean Difference (I-J)  Std. Error  Sig.   Lower Bound  Upper Bound
40         100        -1.6260                .81248      .134   -3.6550      .4030
40         160        .6498                  .81248      .707   -1.3792      2.6788
100        40         1.6260                 .81248      .134   -.4030       3.6550
100        160        2.2758*                .81248      .026   .2468        4.3048
160        40         -.6498                 .81248      .707   -2.6788      1.3792
160        100        -2.2758*               .81248      .026   -4.3048      -.2468
Based on observed means.
The error term is Mean Square(Error) = 3.301.
*. The mean difference is significant at the .05 level.
Table 8: HbO/HbR Marginal Means Onset Time Comparison
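The standard errors and confidence intervals in the Tukey HSD tables are tied together by simple arithmetic, which can be used to sanity-check transcribed values. The sketch below assumes a balanced design with 10 participants per onset condition (30 participants over three onset times); that group size is an assumption made here and is not stated in the tables. The function names are illustrative.

```python
import math

N_PER_GROUP = 10  # assumption: 30 participants split evenly over 3 onset times


def se_of_difference(mse, n_per_group):
    """Standard error of a difference between two group means of equal size."""
    return math.sqrt(2.0 * mse / n_per_group)


def critical_multiplier(lower, upper, std_error):
    """Half-width of the confidence interval expressed in standard errors."""
    return (upper - lower) / 2.0 / std_error


# Values for the 40 vs 100 row of Tables 7 and 8.
se_hbr = se_of_difference(2.044, N_PER_GROUP)    # reported as .63930
se_ratio = se_of_difference(3.301, N_PER_GROUP)  # reported as .81248
mult_hbr = critical_multiplier(-1.1153, 2.0777, 0.63930)
mult_ratio = critical_multiplier(-3.6550, 0.4030, 0.81248)

if __name__ == "__main__":
    print(round(se_hbr, 4), round(se_ratio, 4))
    print(round(mult_hbr, 3), round(mult_ratio, 3))
```

Both tables yield the same multiplier (about 2.5 standard errors), as expected when the same Tukey critical value and error degrees of freedom apply to each dependent variable.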
Model Summary
Model   R        R Square   Adjusted R Square   Std. Error of the Estimate
1       .630(a)  .397       .301                50.50335
a. Predictors: (Constant), A, distraction, AvgMinR, videogame

ANOVA(a)
Model 1      Sum of Squares   df   Mean Square   F       Sig.
Regression   42007.961        4    10501.990     4.117   .011(b)
Residual     63764.705        25   2550.588
Total        105772.667       29
a. Dependent Variable: FinalTrackError1
b. Predictors: (Constant), A, distraction, AvgMinR, videogame

Coefficients(a)
              Unstandardized Coefficients   Standardized Coefficients
Model 1       B          Std. Error         Beta                        t        Sig.
(Constant)    195.560    81.393                                         2.403    .024
videogame     -17.637    7.208              -.405                       -2.447   .022
distraction   37.540     18.240             .333                        2.058    .050
AvgMinR       -42.061    16.790             -.401                       -2.505   .019
A             -6.896     2.349              -.486                       -2.937   .007
a. Dependent Variable: FinalTrackError1
Table 9: Performance Model Summary
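The summary statistics reported for the performance model in Table 9 can be recovered from the regression and total sums of squares alone. The sketch below is a consistency check rather than a re-analysis; the variable names are illustrative.

```python
import math

# Sums of squares and degrees of freedom from the ANOVA block of Table 9.
ss_regression, df_regression = 42007.961, 4
ss_total, df_total = 105772.667, 29

ss_residual = ss_total - ss_regression
df_residual = df_total - df_regression  # 25

r_squared = ss_regression / ss_total
adj_r_squared = 1 - (1 - r_squared) * df_total / df_residual
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual
std_error_estimate = math.sqrt(ms_residual)

if __name__ == "__main__":
    print(round(r_squared, 3), round(adj_r_squared, 3))
    print(round(f_stat, 3), round(std_error_estimate, 3))
```

The results agree with the tabulated R Square (.397), Adjusted R Square (.301), F (4.117), and Std. Error of the Estimate (50.50335).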
Tests of Between-Subjects Effects
Dependent Variable: chatresponsetime
Source               Type III Sum of Squares   df   Mean Square   F        Sig.
Corrected Model      856.358(a)                6    142.726       3.338    .016
Intercept            1002.973                  1    1002.973      23.455   .000
C                    301.409                   1    301.409       7.049    .014
difficulty           193.449                   1    193.449       4.524    .044
onset                241.149                   2    120.575       2.820    .080
difficulty * onset   179.983                   2    89.992        2.104    .145
Error                983.521                   23   42.762
Total                7488.793                  30
Corrected Total      1839.879                  29
a. R Squared = .465 (Adjusted R Squared = .326)
Table 10: Chat Response Model

Lateralization (Left-Right)
Measure           Wave 1 (p-value)   Wave 2 (p-value)
HbO (HbO L - R)   0.609              0.123
HbR (HbR L - R)   0.887              0.860
HbO/HbR           0.449              0.181
Table 11: Lateralization Effects
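Each F ratio in Table 10 is the effect's mean square divided by the error mean square, and the R Squared footnote follows from the error and corrected total sums of squares. A brief sketch of that arithmetic, with illustrative names, is given below.

```python
# (sum of squares, df) for each source in Table 10.
effects = {
    "Corrected Model": (856.358, 6),
    "difficulty": (193.449, 1),
    "onset": (241.149, 2),
    "difficulty * onset": (179.983, 2),
}
ss_error, df_error = 983.521, 23
ss_corrected_total, df_corrected_total = 1839.879, 29

ms_error = ss_error / df_error


def f_ratio(sum_of_squares, df):
    """F statistic for one source: its mean square over the error mean square."""
    return (sum_of_squares / df) / ms_error


r_squared = 1 - ss_error / ss_corrected_total
adj_r_squared = 1 - ms_error / (ss_corrected_total / df_corrected_total)

if __name__ == "__main__":
    for name, (ss, df) in effects.items():
        print(name, round(f_ratio(ss, df), 3))
    print(round(r_squared, 3), round(adj_r_squared, 3))
```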
Lateralization (Left-Right)
          Wave 1 (p-value)    Wave 2 (p-value)
Measure   Diff.     Time      Diff.     Time
HbO       0.73      0.42      0.35      0.81
HbR       0.79      0.38      0.73      0.11
HbO/HbR   0.30      0.54      0.40      0.86
Table 12: Lateralization by Difficulty & Time

Lateralization (Left-Right) Performance Model for Average Final Error
Predictor              Wave 1 (p-value)   Wave 2 (p-value)
HbO (HbO Left-Right)   0.804              0.319
HbR (HbR Left-Right)   0.537              0.997
HbO/HbR                0.899              0.879
Table 13: Lateralization vs Performance Model
Appendix I: Return to baseline calculations
SubjNo   Base+1SD   Max Val   Time2Max   Rtn2Base
3        1.382      2.7047    91         N/A
4        7.4951     12.072    0.5        N/A
5        84.999     175.16    95         N/A
6        6.4328     26.155    68.5       119
7        0.93893    2.3355    0.5        2
8        3.3831     8.9803    95         110.5
9        8.9337     17.021    93         N/A
10       8.5228     14.633    4          5.5
11       74.576     164.56    57         59.5
12       -2.2255    5.742     3.5        8.5
13       4.8555     8.8314    28         28
14       7.4287     11.548    95         154.5
15       6.2103     9.127     88.5       N/A
16       3.8036     7.5472    100.5      101
17       3.6467     5.5168    89         89
18       73.069     150.34    60.5       111
19       1.3145     4.2627    13         N/A
20       5.2572     5.6383    0.5        N/A
21       4.2215     5.321     3.5        N/A
22       3.9536     5.5636    99.5       99.5
23       -2.1781    4.2626    82         202.5
24       0.02712    3.5231    98         131.5
25       -3.4383    1.0611    36         N/A
26       7.8521     13.743    85.5       85.5
27       12.153     15.106    32         N/A
28       9.1268     13.199    54         N/A
29       0.3825     6.9281    43.5       111.5
30       3.3024     12.122    99.5       152.5
31       8.8032     11.109    0.5        95.5
32       11.868     17.579    100.5      107.5
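The columns of Appendix I suggest a threshold-based calculation: find the maximum of the signal, record when it occurs, and then record when the signal falls back to the baseline-plus-one-standard-deviation level. The sketch below is one plausible reading of that procedure and not the thesis's actual analysis code. The function name, the choice to measure both times from the start of the analysis window, and the use of "at or below threshold" as the return criterion are all assumptions.

```python
def time_to_max_and_return(times, values, threshold):
    """Return (max value, time to max, return-to-baseline time or None).

    times     -- sample times in seconds, increasing
    values    -- signal samples (for example HbO) at those times
    threshold -- baseline mean plus one standard deviation (Base+1SD)

    Both times are measured from times[0] (an assumption). The return time
    is the first sample after the peak at or below the threshold; None
    corresponds to an N/A entry (no return within the window).
    """
    peak_index = max(range(len(values)), key=lambda i: values[i])
    start = times[0]
    time_to_max = times[peak_index] - start
    for i in range(peak_index + 1, len(values)):
        if values[i] <= threshold:
            return values[peak_index], time_to_max, times[i] - start
    return values[peak_index], time_to_max, None


if __name__ == "__main__":
    # Hypothetical samples at 0.5 s spacing, for illustration only.
    t = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
    v = [1.0, 2.0, 3.0, 2.5, 1.2, 0.9]
    print(time_to_max_and_return(t, v, threshold=1.3))
```

A signal that never drops back to the threshold yields `None`, which mirrors the N/A entries in the Rtn2Base column.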
Appendix J: Time elapsed: Final chat message to Wave 1 start
Subj No   Time chat before event (sec)
3         15.508
4         7.206
5         48.239
6         9.122
7         15.008
8         6.101
9         8.749
10        18.105
11        43.187
12        44.807
13        4.529
14        3.925
15        24.389
16        23.512
17        5.462
18        39.067
19        119.76
20        114.129
21        79.519
22        140.496
23        31.592
24        43.265
25        10.117
26        2.072
27        18.764
28        18.783
29        36.653
30        36.204
31        117.6
32        1.922
Appendix K: fNIRS Data with Vigilance Features Plotted
[Figures: one fNIRS plot per subject, showing HbO and HbR against time in seconds (0 to 12,000), each annotated with the HbO slope, HbR slope, and HbO level-off time, with the event onsets marked.]
References
Alfredson, J., Holmberg, J., Andersson, R., & Wikforss, M. (2011). Applied cognitive
ergonomics design principles for fighter aircraft. Engineering Psychology and Cognitive
Ergonomics (pp. 473-483): Springer.
Alves, E. E., & Kelsey, C. M. (2010). Combating Vigilance Decrement in a Single-Operator
Radar Platform. Ergonomics in Design: The Quarterly of Human Factors Applications,
18(2), 6-9. doi: 10.1518/106480410x12737888532688
Ayaz, H., Shewokis, P. A., Bunce, S., Izzetoglu, K., Willems, B., & Onaral, B. (2012). Optical
brain monitoring for operator training and mental workload assessment. NeuroImage,
59(1), 36-47.
Bainbridge, L. (1983). Ironies of Automation. Automatica, 19(6), 775-779.
Barmack, J. E. (1939). A Definition of Boredom: A Reply to Mr. Berman. The American Journal
of Psychology, 52(3), 467-471. doi: 10.2307/1416759
Battiste, V., & Bortolussi, M. (1988). Transport pilot workload: A comparison of two subjective
techniques. Paper presented at the Proceedings of the Human Factors and Ergonomics
Society Annual Meeting, Anaheim, CA.
Beatty, J. (1982). Task-evoked pupillary responses, processing load, and the structure of
processing resources. Psychological Bulletin, 91(2), 276.
Bejtlich, R. (2013). The Practice of Network Security Monitoring: Understanding Incident
Detection and Response: No Starch Press.
Bekier, M., Molesworth, B. R., & Williamson, A. (2011). Defining the drivers for accepting
decision making automation in air traffic management. Ergonomics, 54(4), 347-356.
Bennett, C. M., & Miller, M. B. (2010). How reliable are the results from functional magnetic
resonance imaging? Annals of the New York Academy of Sciences, 1191(1), 133-155.
Berguer, R., Smith, W., & Chung, Y. (2001). Performing laparoscopic surgery is significantly
more stressful for the surgeon than open surgery. Surgical endoscopy, 15(10), 1204-1207.
Berka, C., Levendowski, D. J., Lumicao, M. N., Yau, A., Davis, G., Zivkovic, V. T., . . . Craven,
P. L. (2007). EEG correlates of task engagement and mental workload in vigilance,
learning, and memory tasks. Aviation, space, and environmental medicine,
78(Supplement 1), B231-B244.
Berka, C., Levendowski, D. J., Ramsey, C. K., Davis, G., Lumicao, M. N., Stanney, K., . . .
Stibler, K. (2005). Evaluation of an EEG workload model in an Aegis simulation
environment. Paper presented at the Defense and Security.
Boles, D. B., & Adair, L. P. (2001). The multiple resources questionnaire (MRQ). Paper
presented at the Proceedings of the Human Factors and Ergonomics Society Annual
Meeting, Minneapolis, MN.
Boot, W. R., Kramer, A. F., Simons, D. J., Fabiani, M., & Gratton, G. (2008). The effects of
video game playing on attention, memory, and executive control. Acta Psychologica,
129(3), 387-398.
Broadbent, D. E. (1958). The general nature of vigilance. Perception and Communication, 108-139.
Brown, G. H., & Carroll, C. D. (1984). The Effect of Anxiety and Boredom on Cognitive Test
Performance.
Bruursema, K., Kessler, S. R., & Spector, P. E. (2011). Bored employees misbehaving: The
relationship between boredom and counterproductive work behaviour. Work & Stress,
25(2), 93-107.
Buckner, R. L., & Logan, J. M. (2001). Functional neuroimaging methods: PET and fMRI.
Handbook of functional neuroimaging of cognition, 27-48.
Burton, R. R. (1980). Human responses to repeated high G stimulated aerial combat maneuvers.
Aviation, space, and environmental medicine, 51(11), 1185.
Cabeza, R., & Kingstone, A. (2001). Handbook of functional neuroimaging of cognition: MIT
Press.
Cabeza, R., & Nyberg, L. (2000). Imaging cognition II: An empirical review of 275 PET and
fMRI studies. Journal of cognitive neuroscience, 12(1), 1-47.
Caggiano, D. M., & Parasuraman, R. (2004). The role of memory representation in the vigilance
decrement. Psychonomic bulletin & review, 11(5), 932-937.
Caldwell, J. A. (2005). Fatigue in aviation. Travel Medicine and Infectious Disease, 3(2), 85-96.
Carr, V. A., Rissman, J., & Wagner, A. D. (2010). Imaging the Human Medial Temporal Lobe
with High-Resolution fMRI. Neuron, 65(3), 298-308. doi:
http://dx.doi.org/10.1016/j.neuron.2009.12.022
Causse, M., Peran, P., Dehais, F., Caravasso, C. F., Zeffiro, T., Sabatini, U., & Pastor, J. (2013).
Affective decision making under uncertainty during a plausible aviation task: An fMRI
study. NeuroImage, 71(0), 19-29. doi:
http://dx.doi.org/10.1016/j.neuroimage.2012.12.060
Chance, B., Anday, E., Nioka, S., Zhou, S., Hong, L., Worden, K., ... Thomas, R. (1998). A
novel method for fast imaging of brain function, non-invasively, with light. Opt. Express,
2(10), 411-423.
Chen, J. Y., & Barnes, M. J. (2012). Supervisory Control of Multiple Robots: Effects of
Imperfect Automation and Individual Differences. Human Factors: The Journal of the
Human Factors and Ergonomics Society, 54(2), 157-174.
Clare, A. S., Cummings, M. L., How, J. P., Whitten, A. K., & Toupet, O. (2012). Operator
Object Function Guidance for a Real-Time Unmanned Vehicle Scheduling Algorithm.
Journal of Aerospace Computing, Information, and Communication, 9(4), 161-173.
Cohen, J. D., Forman, S. D., Braver, T. S., Casey, B. J., Servan-Schreiber, D., & Noll, D. C.
(1993). Activation of the prefrontal cortex in a nonspatial working memory task with
functional MRI. Human Brain Mapping, 1(4), 293-304. doi: 10.1002/hbm.460010407
Cooper, G. E., & Harper Jr, R. P. (1969). The use of pilot rating in the evaluation of aircraft
handling qualities. Neuilly-sur-Seine, France: NATO Advisory Group for Aerospace
Research and Development.
Costa Jr, P. T., McCrae, R. R., & Dye, D. A. (1991). Facet scales for agreeableness and
conscientiousness: A revision of the NEO personality inventory. Personality and
individual differences, 12(9), 887-898. doi: http://dx.doi.org/10.1016/0191-8869(91)90177-D
Coyle, S., Ward, T., & Markham, C. (2003). Brain-Computer Interfaces: A Review.
Interdisciplinary Science Reviews, 28(2), 112-118. doi: 10.1179/030801803225005102
Cui, X., Bray, S., Bryant, D. M., Glover, G. H., & Reiss, A. L. (2011). A quantitative
comparison of NIRS and fMRI across multiple cognitive tasks. NeuroImage, 54(4), 2808-2821. doi: http://dx.doi.org/10.1016/j.neuroimage.2010.10.069
Cummings, M. L. (2004). The Need for Command and Control Instant Message Adaptive
Interfaces: Lessons Learned from Tactical Tomahawk Human-in-the-Loop Simulations.
CyberPsychology & Behavior, 7(6).
Cummings, M. L., Clare, A., & Hart, C. (2010). The role of human-automation consensus in
multiple unmanned vehicle scheduling. Human Factors: The Journal of the Human
Factors and Ergonomics Society, 52(1), 17-27.
Curtis, C. E., & D'Esposito, M. (2003). Persistent activity in the prefrontal cortex during working
memory. Trends in cognitive sciences, 7(9), 415-423.
D'Mello, S., Chipman, P., & Graesser, A. (2007). Posture as a predictor of learner's affective
engagement. Paper presented at the Proceedings of the 29th Annual Cognitive Science
Society, Nashville, TN.
Damos, D. L. (1991). Multiple task performance: CRC Press.
Davies, D., & Krkovic, A. (1965). Skin-conductance, alpha-activity, and vigilance. The
American Journal of Psychology, 78(2), 304-306.
Davies, D., & Parasuraman, R. (1982). The psychology of vigilance: Academic Press London.
De Waard, D., & Studiecentrum, V. (1996). The measurement of drivers' mental workload:
Groningen University, Traffic Research Center.
Dickens, C. (1853). Bleak House. England: Bradbury & Evans.
Dickerson, S. S., & Kemeny, M. E. (2004). Acute stressors and cortisol responses: a theoretical
integration and synthesis of laboratory research. Psychological Bulletin, 130(3), 355.
Drory, A. (1982). Individual Differences in Boredom Proneness and Task Effectiveness at Work.
[Article]. Personnel Psychology, 35(1), 141-151.
Droste, D. W., Harders, A. G., & Rastogi, E. (1989). Two transcranial doppler studies on blood
flow velocity in both middle cerebral arteries during rest and the performance of
cognitive tasks. Neuropsychologia, 27(10), 1221-1230. doi:
http://dx.doi.org/10.1016/0028-3932(89)90034-1
Durantin, G., Gagnon, J.-F., Tremblay, S., & Dehais, F. (2014). Using near infrared spectroscopy
and heart rate variability to detect mental overload. Behavioural brain research, 259, 16-23.
Dussault, C., Jouanin, J.-C., Philippe, M., & Guezennec, C.-Y. (2005). EEG and ECG changes
during simulator operation reflect mental workload and vigilance. Aviation, space, and
environmental medicine, 76(4), 344-351.
Dyer-Smith, M. B., & Wesson, D. A. (1995). Boredom and expert error. Contemporary
Ergonomics, 56-56.
Eastwood, J. D., Frischen, A., Fenske, M. J., & Smilek, D. (2012). The Unengaged Mind:
Defining Boredom in Terms of Attention. Perspectives on Psychological Science, 7(5),
482-495.
Endsley, M. R. (1995). Toward a theory of situation awareness in dynamic systems. Human
Factors: The Journal of the Human Factors and Ergonomics Society, 37(1), 32-64.
Endsley, M. R., & Rodgers, M. D. (1997). Distribution of attention, situation awareness, and
workload in a passive air traffic control task: Implications for operational errors and
automation. Air Traffic Control Quarterly, 6(1), 21-44.
Fahlman, S. A., Mercer-Lynn, K. B., Flora, D. B., & Eastwood, J. D. (2013). Development and
validation of the multidimensional state boredom scale. Assessment, 20(1), 68-85.
Farmer, R., & Sundberg, N. D. (1986). Boredom Proneness: The Development and Correlates of a New
Scale. Journal of Personality Assessment, 50, 4-17.
Fisher, C. D. (1993). Boredom at Work: A Neglected Concept. Human Relations, 46(3), 395-417. doi: 10.1177/001872679304600305
Forest, L. M., Kahn, A., Thomer, J., & Shapiro, M. (2007). The Design and Evaluation of
Human-Guided Algorithms for Mission Planning. Paper presented at the Human Systems
Integration Symposium, Annapolis, MD.
Frankenhaeuser, M., & Lundberg, U. (1982). Psychoneuroendocrine aspects of effort and
distress as modified by personal control. Mental load and stress in activity: European
approaches, 97-103.
Frankenhaeuser, M., Nordheden, B., Myrsten, A.-L., & Post, B. (1971). Psychophysiological
reactions to understimulation and overstimulation. Acta Psychologica, 35(4), 298-308.
Frankenhaeuser, M., & Patkai, P. (1965). Interindividual differences in catecholamine excretion
during stress. Scandinavian Journal of Psychology, 6(4), 117-123.
Gagnon, L., Yücel, M. A., Dehaes, M., Cooper, R. J., Perdue, K. L., Selb, J., . . . Boas, D. A.
(2012). Quantification of the cortical contribution to the NIRS signal over the motor
cortex using concurrent NIRS-fMRI measurements. NeuroImage, 59(4), 3933-3940.
Gianaros, P. J., Van der Veen, F. M., & Jennings, J. R. (2004). Regional cerebral blood flow
correlates with heart period and high-frequency heart period variability during
working-memory tasks: Implications for the cortical and subcortical regulation of
cardiac autonomic activity. Psychophysiology, 41(4), 521-530.
Girouard, A., Solovey, E., Hirshfield, L., Chauncey, K., Sassaroli, A., Fantini, S., & Jacob, R.
(2009). Distinguishing Difficulty Levels with Non-invasive Brain Activity
Measurements. INTERACT 2009, Part I, 440-452.
Green, C. S., & Bavelier, D. (2003). Action video game modifies visual selective attention.
Nature, 423(6939), 534-537.
Grubb, E. A. (1975). Assembly Line Boredom and Individual Differences in Recreation
Participation. Journal of Leisure Research.
Haller, S., Bartsch, A., Radue, E., Klarhöfer, M., Seifritz, E., & Scheffler, K. (2005). Effect of
fMRI acoustic noise on non-auditory working memory task: comparison between
continuous and pulsed sound emitting EPI. Magnetic Resonance Materials in Physics,
Biology and Medicine, 18(5), 263-271. doi: 10.1007/s10334-005-0010-2
Hamilton, J. A., Haier, R. J., & Buchsbaum, M. S. (1984). Intrinsic enjoyment and boredom
coping scales: Validation with personality, evoked potential and attention measures.
Personality and individual differences, 5(2), 183-193.
Hancock, P., Mihaly, T., Rahimi, M., & Meshkati, N. (1988). A bibliographic listing of mental
workload research. Advances in Psychology, 52, 329-333.
Hancock, P. A., & Desmond, P. A. (2001). Stress, workload, and fatigue: Psychology Press.
Hancock, P. A., & Krueger, G. P. (2010). Hours of Boredom, Moments of Terror: Temporal
Desynchrony in Military and Security Force Operations. Washington, DC: Center for
Technology and National Security Policy, National Defense University.
Hancock, P. A., & Warm, J. S. (1989). A Dynamic Model of Stress and Sustained Attention.
Human Factors: The Journal of the Human Factors and Ergonomics Society, 31(5), 519-537. doi: 10.1177/001872088903100503
Harrison, J., Izzetoglu, K., Ayaz, H., Willems, B., Hah, S., Woo, H., . . . Onaral, B. (2013).
Human Performance Assessment Study in Aviation Using Functional Near Infrared
Spectroscopy. Foundations of Augmented Cognition (pp. 433-442): Springer.
Hart, C. (2010). Assessing the Impact of Low Workload in Supervisory Control of Networked
Unmanned Vehicles.
Hart, S. G. (2006). NASA-task load index (NASA-TLX); 20 years later. Paper presented at the
Proceedings of the Human Factors and Ergonomics Society Annual Meeting, Baltimore,
MD.
Hart, S. G., & Sheridan, T. B. (1984). Pilot workload, performance, and aircraft control
automation. Moffett Field, CA: National Aeronautics and Space Administration Ames
Research Center.
Hart, S. G., & Staveland, L. E. (1988). Development of NASA-TLX (Task Load Index): Results
of empirical and theoretical research. Human mental workload, 1(3), 139-183.
Hart, S. G., & Wickens, C. D. (1990). Workload assessment and prediction. Manprint (pp. 257-296): Springer.
Haworth, L. A., Atencio Jr, A., Bivens, C., Shively, R., & Delgado, D. (1987). Advanced
helicopter cockpit and control configurations for helicopter combat mission tasks. The
Man-Machine Interface in Tactical Aircraft Design and Combat Automation, 39.
Heberlein, L. T., Dias, G. V., Levitt, K. N., Mukherjee, B., Wood, J., & Wolber, D. (1990). A
network security monitor. Paper presented at the Research in Security and Privacy, 1990.
Proceedings., 1990 IEEE Computer Society Symposium on.
Helton, W. S., Warm, J. S., Tripp, L. D., Matthews, G., Parasuraman, R., & Hancock, P. A.
(2010). Cerebral lateralization of vigilance: A function of task difficulty.
Neuropsychologia, 48(6), 1683-1688. doi:
http://dx.doi.org/10.1016/j.neuropsychologia.2010.02.014
Heron, W. (1957). The pathology of boredom. Scientific American.
Hill, S. G., Zaklad, A. L., Bittner, A. C., Byers, J. C., & Christ, R. E. (1988). Workload
assessment of a mobile air defense missile system. Paper presented at the Proceedings of
the Human Factors and Ergonomics Society Annual Meeting, Anaheim, CA.
Hirshfield, L. M., Chauncey, K., Gulotta, R., Girouard, A., Solovey, E. T., Jacob, R. J., . . .
Fantini, S. (2009). Combining Electroencephalograph and Functional Near Infrared
Spectroscopy to Explore Users' Mental Workload. Paper presented at the Proceedings of
the 5th International Conference on Foundations of Augmented Cognition.
Neuroergonomics and Operational Neuroscience: Held as Part of HCI International 2009,
San Diego, CA.
Hjortskov, N., Rissén, D., Blangsted, A., Fallentin, N., Lundberg, U., & Søgaard, K. (2004). The
effect of mental stress on heart rate variability and blood pressure during computer work.
European Journal of Applied Physiology, 92(1-2), 84-89. doi: 10.1007/s00421-004-1055-z
Hopkin, V. D. (1988). Air traffic control. In E. L. Wiener & D. C. Nagel (Eds.), Human factors in
aviation (pp. 639-663). San Diego, CA, US: Academic Press.
Hopkin, V. D. (1995). Human factors in air traffic control: CRC Press.
Hu, B., Majoe, D., Ratcliffe, M., Qi, Y., Zhao, Q., Peng, H., . . . Moore, P. (2011). EEG-Based
Cognitive Interfaces for Ubiquitous Applications: Developments and Challenges.
Intelligent Systems, IEEE, 26(5), 46-53.
Huey, B. M., & Wickens, C. D. (1993). Workload Transition: Implications for Individual and
Team Performance: The National Academies Press.
Huppert, T. J., Hoge, R. D., Diamond, S. G., Franceschini, M. A., & Boas, D. A. (2006). A
temporal comparison of BOLD, ASL, and NIRS hemodynamic responses to motor
stimuli in adult humans. NeuroImage, 29(2), 368-382. doi:
http://dx.doi.org/10.1016/j.neuroimage.2005.08.065
Iso-Ahola, S. E., & Weissinger, E. (1987). Leisure and boredom. Journal of social and clinical
psychology, 5(3), 356-364.
Izzetoglu, K., Ayaz, H., Merzagora, A., Izzetoglu, M., Shewokis, P. A., Bunce, S., . . . Onaral, B.
(2011). The Evolution of Field Deployable fNIR Spectroscopy from Bench to Clinical
Settings. Journal of Innovative Optical Health Sciences, 04(03), 239-250. doi:
10.1142/S1793545811001587
Izzetoglu, M., Izzetoglu, K., Bunce, S., Ayaz, H., Devaraj, A., Onaral, B., & Pourrezaei, K.
(2005). Functional near-infrared neuroimaging. Neural Systems and Rehabilitation
Engineering, IEEE Transactions on, 13(2), 153-159.
Jaeggi, S. M., Seewer, R., Nirkko, A. C., Eckstein, D., Schroth, G., Groner, R., & Gutbrod, K.
(2003). Does excessive memory load attenuate activation in the prefrontal cortex? Load-dependent processing in single and dual tasks: functional magnetic resonance imaging
study. NeuroImage, 19(2 Pt 1), 210-225.
Jagacinski, R. J. (1989). Target acquisition: Performance measures, process models, and design
implications. Applications of Human Performance Models to System Design (pp. 135-149): Springer.
Janis, I. L., & Mann, L. (1977). Decision making: A psychological analysis of conflict, choice,
and commitment. New York, NY, US: Free Press.
Jelzow, A., Tachtsidis, I., Kirilina, E., Niessing, M., Brühl, R., Wabnitz, H., . . . Macdonald, R.
(2011). Simultaneous measurement of time-domain fNIRS and physiological signals
during a cognitive task. Paper presented at the European Conferences on Biomedical
Optics.
Jöbsis, F. F. (1977). Noninvasive, infrared monitoring of cerebral and myocardial oxygen
sufficiency and circulatory parameters. Science (New York, N.Y.), 198(4323), 1264-1267.
Kahneman, D. (1973). Attention and effort: Prentice Hall.
Kahneman, D. (2011). Thinking, fast and slow: Macmillan.
Kahneman, D., Slovic, P., & Tversky, A. (1982). Judgment under uncertainty: Heuristics and
biases: Cambridge University Press.
Kahol, K., Leyba, M. J., Deka, M., Deka, V., Mayes, S., Smith, M., . . . Panchanathan, S. (2008).
Effect of fatigue on psychomotor and cognitive skills. The American Journal of Surgery,
195(2), 195-204. doi: http://dx.doi.org/10.1016/j.amjsurg.2007.10.004
Kantowitz, B., & Campbell, J. (1994). Pilot workload and flightdeck automation. Automation
and Human Performance: Theory and Applications, 117-136.
Kessel, C. J., & Wickens, C. D. (1982). The transfer of failure-detection skills between
monitoring and controlling dynamic systems. Human Factors: The Journal of the Human
Factors and Ergonomics Society, 24(1), 49-60.
Klein, G., & Zsambok, C. E. (1997). Naturalistic decision making: Erlbaum, Lawrence,
Associates.
Klein, M. I., Riley, M. A., Warm, J. S., & Matthews, G. (2005). Perceived mental workload in
an endoscopic surgery simulator. Paper presented at the Proceedings of the Human
Factors and Ergonomics Society Annual Meeting, Orlando, FL.
Knowles, W. (1963). Operator loading tasks. Human Factors: The Journal of the Human
Factors and Ergonomics Society, 5(2), 155-161.
Koechlin, E., Basso, G., Pietrini, P., Panzer, S., & Grafman, J. (1999). The role of the anterior
prefrontal cortex in human cognition. Nature, 399(6732).
Kramer, A., & Parasuraman, R. (2007). Neuroergonomics-application of neuroscience to human
factors. Handbook of psychophysiology, 2, 704-722.
Kramer, A. F. (1991). Physiological metrics of mental workload: A review of recent progress.
Multiple-task performance, 279-328.
Kroes, S. (2007). Detecting Boredom in Meetings. Enschede, Netherlands, University of Twente,
1-5.
Landrigan, C. P., Rothschild, J. M., Cronin, J. W., Kaushal, R., Burdick, E., Katz, J. T., . . .
Czeisler, C. A. (2004). Effect of reducing interns' work hours on serious medical errors in
intensive care units. New England Journal of Medicine, 351(18), 1838-1848.
Larson, R. W., & Richards, M. H. (1991). Boredom in the middle school years: Blaming schools
versus blaming students. American Journal of Education, 418-443.
Leary, M. R., Rogers, P. A., Canfield, R. W., & Coe, C. (1986). Boredom in interpersonal
encounters: Antecedents and social implications. Journal of Personality and Social
Psychology, 51(5), 968.
Lee, T. (1986). Toward the development and validation of a measure of job boredom. Manhattan
College Journal of Business, 15(1), 22-28.
León-Carrión, J., & León-Domínguez, U. (2012). Functional Near-Infrared Spectroscopy
(fNIRS): Principles and Neuroscientific Applications. Neuroimaging-Methods (InTech).
Lipshitz, R., & Strauss, O. (1997). Coping with Uncertainty: A Naturalistic Decision-Making
Analysis. Organizational Behavior and Human Decision Processes, 69(2), 149-163. doi:
http://dx.doi.org/10.1006/obhd.1997.2679
Llaneras, R. E., Salinger, J., & Green, C. A. (2013). Human Factors Issues Associated with
Limited Ability Autonomous Driving Systems: Drivers' Allocation of Visual Attention to
the Forward Roadway. Paper presented at the Proceedings of the Seventh International
Driving Symposium on Human Factors in Driver Assessment, Training, and Vehicle
Design.
Lloyd-Fox, S., Blasi, A., & Elwell, C. (2010). Illuminating the developing brain: the past, present
and future of functional near infrared spectroscopy. Neuroscience & Biobehavioral
Reviews, 34(3), 269-284.
Lockley, S. W., Barger, L. K., Ayas, N. T., Rothschild, J. M., Czeisler, C. A., Landrigan, C. P.,
. . . Safety, G. (2007). Effects of Health Care Provider Work Hours and Sleep Deprivation
on Safety and Performance. Joint Commission Journal on Quality and Patient Safety,
33(11), 7-18.
Logothetis, N. K., Pauls, J., Augath, M., Trinath, T., & Oeltermann, A. (2001).
Neurophysiological investigation of the basis of the fMRI signal. Nature, 412(6843),
150-157.
Lulé, D., Noirhomme, Q., Kleih, S. C., Chatelle, C., Halder, S., Demertzi, A., . . . Schnakers, C.
(2012). Probing command following in patients with disorders of consciousness using a
brain-computer interface. Clinical Neurophysiology.
Lundberg, U. (2005). Stress hormones in health and illness: the roles of work and gender.
Psychoneuroendocrinology, 30(10), 1017-1021.
Lysaght, R. J., Hill, S. G., Dick, A., Plamondon, B. D., & Linton, P. M. (1989). Operator
workload: Comprehensive review and evaluation of operator workload methodologies:
DTIC Document.
Mackworth, N. H. (1948). The breakdown of vigilance during prolonged visual search.
Quarterly Journal of Experimental Psychology, 1(1), 6-21. doi:
10.1080/17470214808416738
Mandeville, J. B., Marota, J. J., Ayata, C., Moskowitz, M. A., Weisskoff, R. M., & Rosen, B. R.
(1999). MRI measurement of the temporal evolution of relative CMRO2 during rat
forepaw stimulation. Magnetic Resonance in Medicine, 42(5), 944-951.
Manoach, D. S., Schlaug, G., Siewert, B., Darby, D. G., Bly, B. M., Benfield, A., . . . Warach, S.
(1997). Prefrontal cortex fMRI signal changes are correlated with working memory load.
Neuroreport, 8(2), 545-549.
Manyakov, N. V., Chumerin, N., Combaz, A., & Van Hulle, M. M. (2011). Comparison of
classification methods for P300 brain-computer interface on disabled subjects.
Computational intelligence and neuroscience, 2011, 2.
Manzey, D. (2000). Monitoring of mental performance during spaceflight. Aviation, space, and
environmental medicine, 71(9 Suppl), A69.
Manzey, D., Lorenz, B., & Poljakov, V. (1998). Mental performance in extreme environments:
results from a performance monitoring study during a 438-day spaceflight. Ergonomics,
41(4), 537-559.
Marois, R., & Ivanoff, J. (2005). Capacity limits of information processing in the brain. Trends
in cognitive sciences, 9(6), 296-305. doi: http://dx.doi.org/10.1016/j.tics.2005.04.010
Marxen, M., Cassidy, R. J., Dawson, T. L., Ross, B., & Graham, S. J. (2012). Transient and
sustained components of the sensorimotor BOLD response in fMRI. Magnetic Resonance
Imaging, 30(6), 837-847. doi: http://dx.doi.org/10.1016/j.mri.2012.02.007
May, J. G., Kennedy, R. S., Williams, M. C., Dunlap, W. P., & Brannan, J. R. (1990). Eye
movement indices of mental workload. Acta Psychologica, 75(1), 75-89. doi:
http://dx.doi.org/10.1016/0001-6918(90)90067-P
McCarthy, G., Blamire, A. M., Puce, A., Nobre, A. C., Bloch, G., Hyder, F., . . . Shulman, R. G.
(1994). Functional magnetic resonance imaging of human prefrontal cortex activation
during a spatial working memory task. Proceedings of the National Academy of Sciences,
91(18), 8690-8694.
McCrae, R., & Costa Jr, P. (1999). A five-factor theory of personality. Handbook of personality:
Theory and research, 2, 139-153.
McCrae, R. R., & Costa, P. T. (2010). NEO Inventories. Lutz, FL: PAR Publishing.
Meek, J. H., Firbank, M., Elwell, C. E., Atkinson, J., Braddick, O., & Wyatt, J. S. (1998).
Regional hemodynamic responses to visual stimulation in awake infants. Pediatric
Research, 43(6), 840-843.
Merrifield, C., & Danckert, J. (2014). Characterizing the Psychophysiological Signature of
Boredom. Experimental Brain Research, 232(2), 481-491.
Meshkati, N., & Hancock, P. (2011). Human mental workload. Amsterdam: North-Holland.
Meyer, W.-U., Reisenzein, R., & Schützwohl, A. (1997). Toward a process analysis of emotions:
The case of surprise. Motivation and Emotion, 21(3), 251-274.
Miller, E. K., & Cohen, J. D. (2001). An integrative theory of prefrontal cortex function. Annual
review of neuroscience, 24(1), 167-202.
Miller, J. L., Szalma, L. C., Warm, E. M., Hitchcock, J. S., & Dember, W. N. (1999). Intraclass
and Interclass Transfer of Training for Vigilance. Paper presented at the Automation
Technology and Human Performance: Current Research and Trends; [proceedings of the
Third Conference on Automation Technology and Human Performance Held in Norfolk,
VA, March 25-28, 1998].
Mkrtchyan, A., Macbeth, J., Solovey, E., Ryan, J., & Cummings, M. (2012, October 2012).
Using Variable-Rate Alerting to Counter Boredom in Human Supervisory Control. Paper
presented at the 56th Annual Meeting of the Human Factors and Ergonomics Society,
Boston, MA.
Monchi, O., Petrides, M., Petre, V., Worsley, K., & Dagher, A. (2001). Wisconsin Card Sorting
revisited: distinct neural circuits participating in different stages of the task identified by
event-related functional magnetic resonance imaging. The Journal of Neuroscience,
21(19), 7733-7741.
Moray, N. (1988). Mental workload since 1979. International Reviews of Ergonomics, 2(2), 123-150.
Moray, N. E. (1979). Mental workload: Its theory and measurement. New York: Plenum Press.
Murdock Jr, B. B. (1962). The serial position effect of free recall. Journal of experimental
psychology, 64(5), 482.
Murphy, R., & Shields, J. (2012). Task Force Report: The Role of Autonomy in DoD Systems.
Washington, D.C.: Defense Science Board.
Napadow, V., Dhond, R., Conti, G., Makris, N., Brown, E. N., & Barbieri, R. (2008). Brain
correlates of autonomic modulation: combining heart rate variability with fMRI.
NeuroImage, 42(1), 169-177.
Nevin, J. A., Mandell, C., & Atak, J. R. (1983). The analysis of behavioral momentum. Journal
of the Experimental analysis of behavior, 39(1), 49-59.
Nijholt, A., Bos, D. P.-O., & Reuderink, B. (2009). Turning shortcomings into challenges:
Brain-computer interfaces for games. Entertainment Computing, 1(2), 85-94.
O'Donnell, R., & Eggemeier, F. T. (1986). Workload assessment methodology. Measurement
Technique, 42, 5.
Obata, T., Liu, T. T., Miller, K. L., Luh, W.-M., Wong, E. C., Frank, L. R., & Buxton, R. B.
(2004). Discrepancies between BOLD and flow dynamics in primary and supplementary
motor areas: application of the balloon model to the interpretation of BOLD transients.
NeuroImage, 21(1), 144-153.
Ochsner, K. N., Bunge, S. A., Gross, J. J., & Gabrieli, J. D. (2002). Rethinking feelings: An
fMRI study of the cognitive regulation of emotion. Journal of cognitive neuroscience,
14(8), 1215-1229.
Parasuraman, R. (1979). Memory load and event rate control sensitivity decrements in sustained
attention. Science, 205(4409), 924-927.
Parasuraman, R. (1986). Vigilance, monitoring, and search.
Parasuraman, R., & Caggiano, D. (2005). Neural and genetic assays of human mental workload.
Quantifying human information processing, 123-149.
Parasuraman, R., & Davies, D. R. (1976). Decision theory analysis of response latencies in
vigilance. Journal of Experimental Psychology: Human Perception and Performance,
2(4), 578.
Parasuraman, R., & Davies, D. R. (1984). Varieties of attention (Vol. 40): Academic Press New
York.
Parasuraman, R., Warm, J. S., & Dember, W. N. (1987). Vigilance: Taxonomy and utility.
Ergonomics and human factors (pp. 11-32): Springer.
Parasuraman, R., Warm, J. S., & See, J. E. (1998). Brain systems of vigilance.
Pashler, H. (1994). Dual-task interference in simple tasks: data and theory. Psychological
Bulletin, 116(2), 220.
Pattyn, N., Neyt, X., Henderickx, D., & Soetens, E. (2008). Psychophysiological investigation of
vigilance decrement: Boredom or cognitive fatigue? Physiology & Behavior, 93(1-2),
369-378. doi: http://dx.doi.org/10.1016/j.physbeh.2007.09.016
Pigeau, R. A., Angus, R., O'Neill, P., & Mack, I. (1995). Vigilance latencies to aircraft detection
among NORAD surveillance operators. Human Factors: The Journal of the Human
Factors and Ergonomics Society, 37(3), 622-634.
Price, C. J. (2010). The anatomy of language: a review of 100 fMRI studies published in 2009.
Annals of the New York Academy of Sciences, 1191(1), 62-88.
Ragheb, M. G., & Merydith, S. P. (2001). Development and validation of a multidimensional
scale measuring free time boredom. Leisure Studies, 20(1), 41-59.
Raichle, M. E. (2011). Circulatory and Metabolic Correlates of Brain Function in Normal
Humans. Comprehensive Physiology: John Wiley & Sons, Inc.
Raichle, M. E., & Mintun, M. A. (2006). Brain Work and Brain Imaging. Annual review of
neuroscience, 29(1), 449-476. doi: 10.1146/annurev.neuro.29.051605.112819
Rasmussen, J. (1983). Skills, rules, and knowledge; signals, signs, and symbols, and other
distinctions in human performance models. Systems, Man and Cybernetics, IEEE
Transactions on (3), 257-266.
Rathje, J. M., Spence, L. B., & Cummings, M. L. (2013). Human-Automation Collaboration in
Occluded Trajectory Smoothing. Human-Machine Systems, IEEE Transactions on, 43(2),
137-148.
Recarte, M. A., & Nunes, L. M. (2003). Mental workload while driving: effects on visual search,
discrimination, and decision making. Journal of experimental psychology: Applied, 9(2),
119.
Redding, R. E. (1992). Analysis of operational errors and workload in air traffic control. Paper
presented at the Proceedings of the Human Factors and Ergonomics Society Annual
Meeting, Atlanta, GA.
Reed, J. H., McAdams, F. H., Thayer, L. M., Burgess, I. A., & Haley, W. R. (1973). Aircraft
Accident Report: Eastern Air Lines, Inc. L-1011, N310EA. Washington, D.C.: National
Transportation Safety Board.
Reid, G. B., & Nygren, T. (1988). The subjective workload assessment technique: A scaling
procedure for measuring mental workload. Human mental workload, 185, 218.
Ridderinkhof, K. R., van den Wildenberg, W. P., Segalowitz, S. J., & Carter, C. S. (2004).
Neurocognitive mechanisms of cognitive control: the role of prefrontal cortex in action
selection, response inhibition, performance monitoring, and reward-based learning. Brain
and cognition, 56(2), 129-140.
Robertson, C. S., Narayan, R. K., Gokaslan, Z. L., Pahwa, R., Grossman, R. G., Caram Jr, P.,
Allen, E. (1989). Cerebral arteriovenous oxygen difference as an estimate of cerebral
blood flow in comatose patients. Journal of neurosurgery, 70(2), 222-230.
Roscoe, A. H. (1992). Assessing pilot workload. Why measure heart rate, HRV and respiration?
Biological psychology, 34(2), 259-287.
Rovira, E., McGarry, K., & Parasuraman, R. (2007). Effects of imperfect automation on decision
making in a simulated command and control task. Human Factors: The Journal of the
Human Factors and Ergonomics Society, 49(1), 76-87.
Roy, C. S., & Sherrington, C. (1890). On the regulation of the blood-supply of the brain. The
Journal of physiology, 11(1-2), 85.
Rypma, B., & D'Esposito, M. (1999). The roles of prefrontal brain regions in components of
working memory: effects of memory load and individual differences. Proceedings of the
National Academy of Sciences, 96(11), 6558-6563.
Sachse, B. (2011). FAA Issues Final Rule on Pilot Fatigue, 2014, from
http://www.faa.gov/news/press_releases/news_story.cfm?newsId=13272
Sakatani, K., Chen, S., Lichty, W., Zuo, H., & Wang, Y.-p. (1999). Cerebral blood oxygenation
changes induced by auditory stimulation in newborn infants measured by near infrared
spectroscopy. Early Human Development, 55(3), 229-236. doi:
http://dx.doi.org/10.1016/S0378-3782(99)00019-5
Sarter, N. B., Woods, D. D., & Billings, C. E. (1997). Automation surprises. Handbook of
Human Factors and Ergonomics, 2, 1926-1943.
Sassaroli, A., Zheng, F., Hirshfield, L., Girouard, A., Solovey, E. T., Jacob, R., & Fantini, S.
(2008). Discrimination of Mental Workload Levels in Human Subjects with Functional
Near-Infrared Spectroscopy. Journal of Innovative Optical Health Sciences, 01(02), 227-237. doi: 10.1142/S1793545808000224
Schlegel, R. E., Gilliland, K., & Schlegel, B. (1986). Development of the Criterion Task Set
performance data base. Paper presented at the Proceedings of the Human Factors and
Ergonomics Society Annual Meeting, Dayton, OH.
Scholkmann, F., Klein, S. D., Gerber, U., Wolf, M., & Wolf, U. (2014). Cerebral hemodynamic
and oxygenation changes induced by inner and heard speech: a study combining
functional near-infrared spectroscopy and capnography. Journal of Biomedical Optics,
19(1), 017002-017002. doi: 10.1117/1.jbo.19.1.017002
Schroeter, M. L., Kupka, T., Mildner, T., Uludag, K., & von Cramon, D. Y. (2006). Investigating
the post-stimulus undershoot of the BOLD signal-a simultaneous fMRI and fNIRS
study. Neuroimage, 30(2), 349-358. doi:
http://dx.doi.org/10.1016/j.neuroimage.2005.09.048
Sears, L. (1909). Wendell Phillips, orator and agitator: Doubleday, Page.
See, J. E., Howe, S. R., Warm, J. S., & Dember, W. N. (1995). Meta-analysis of the sensitivity
decrement in vigilance. Psychological Bulletin, 117(2), 230.
Shaw, T. H., Funke, M. E., Dillard, M., Funke, G. J., Warm, J. S., & Parasuraman, R. (2013).
Event-related cerebral hemodynamics reveal target-specific resource allocation for both
"go" and "no-go" response-based vigilance tasks. Brain and cognition, 82(3), 265-273.
Shaw, T. H., Warm, J. S., Finomore, V., Tripp, L., Matthews, G., Weiler, E., & Parasuraman, R.
(2009). Effects of sensory modality on cerebral blood flow velocity during vigilance.
Neuroscience Letters, 461(3), 207-211.
Sheridan, T. B., & Simpson, R. (1979). Toward the definition and measurement of the mental
workload of transport pilots: Cambridge, Mass.: Massachusetts Institute of Technology,
Dept. of Aeronautics and Astronautics, Flight Transportation Laboratory, [1979].
Shimizu, T., Hirose, S., Obara, H., Yanagisawa, K., Tsunashima, H., Marumo, Y., . . . Taira, M.
(2009). Measurement of frontal cortex brain activity attributable to the driving workload
and increased attention. SAE International Journal of Passenger Cars-Mechanical
Systems, 2(1), 736-744.
Shimoda, N., Takeda, K., Imai, I., Kaneko, J., & Kato, H. (2008). Cerebral laterality differences
in handedness: A mental rotation study with NIRS. Neuroscience Letters, 430(1), 43-47.
Sirevaag, E. J., Kramer, A. F., Reisweber, C. D. W. M., Strayer, D. L., & Grenell, J. F. (1993).
Assessment of pilot performance and mental workload in rotary wing aircraft.
Ergonomics, 36(9), 1121-1140. doi: 10.1080/00140139308967983
Skitka, L. J., Mosier, K. L., & Burdick, M. (1999). Does automation bias decision-making?
International Journal of Human-Computer Studies, 51(5), 991-1006.
Solovey, E. (2009). Using FNIRS Brain Sensing in Realistic HCI Settings: Experiments and
Guidelines. Paper presented at the Computer-Human Interface Conference, Boston, MA.
Solovey, E., Schermerhorn, P., Scheutz, M., Sassaroli, A., Fantini, S., & Jacob, R. (2012).
Brainput: Enhancing Interactive Systems with Streaming FNIRS Brain Input. Paper
presented at the Computer-Human Interaction, Austin, TX.
Son, I.-Y., Guhe, M., Gray, W., Yazici, B., & Schoelles, M. (2005). Human performance
assessment using fNIR. Paper presented at the Society of Photo-Optical Instrumentation
Engineers (SPIE) Conference Series.
Speert, D. (2012). Brainfacts: a primer on the brain and nervous system (7th ed.). Washington,
DC: Society for Neuroscience.
Stager, P., Hameluck, D., & Jubis, R. (1989). Underlying factors in air traffic control incidents.
Paper presented at the Proceedings of the Human Factors and Ergonomics Society
Annual Meeting, Denver, CO.
Steinbrink, J., Villringer, A., Kempf, F., Haux, D., Boden, S., & Obrig, H. (2006). Illuminating
the BOLD signal: combined fMRI-fNIRS studies. Magnetic Resonance Imaging, 24(4),
495-505. doi: http://dx.doi.org/10.1016/j.mri.2005.12.034
Strangman, G., Culver, J. P., Thompson, J. H., & Boas, D. A. (2002). A quantitative comparison
of simultaneous BOLD fMRI and NIRS recordings during functional brain activation.
NeuroImage, 17(2), 719-731.
Suedfeld, P. (1975). The Benefits of Boredom: Sensory Deprivation Reconsidered: The effects
of a monotonous environment are not always negative; sometimes sensory deprivation
has high utility. American Scientist, 63(1), 60-69.
Svensson, E., Angelborg-Thanderz, M., Sjöberg, L., & Olsson, S. (1997). Information
complexity-mental workload and performance in combat aircraft. Ergonomics, 40(3),
362-380.
Svensson, E., Angelborg-Thanderz, M., & Sjöberg, L. (1993). Mission challenge, mental
workload and performance in military aviation. Aviation, space, and environmental
medicine.
Takahashi, N., Shimizu, S., Hirata, Y., Nara, H., Inoue, H., Hirai, N., . . . Kato, S. (2011). Basic
study of analysis of human brain activities during car driving. Human Interface and the
Management of Information. Interacting with Information (pp. 627-635): Springer.
Tan, D., & Nijholt, A. (2010). Brain-Computer Interfaces and Human-Computer Interaction. In
D. S. Tan & A. Nijholt (Eds.), Brain-Computer Interfaces (pp. 3-19): Springer London.
Thackray, R., Bailey, J. P., & Touchstone, R. M. (1977). Physiological, Subjective, and
Performance Correlates of Reported Boredom and Monotony While Performing a
Simulated Radar Control Task. In R. Mackie (Ed.), Vigilance (Vol. 3, pp. 203-215):
Springer US.
Thackray, R. I., Powell Bailey, J., & Mark Touchstone, R. (1979). The Effect of Increased
Monitoring Load on Vigilance Performance using a Simulated Radar Display.
Ergonomics, 22(5), 529-539. doi: 10.1080/00140137908924637
Thomas, L. C., & Wickens, C. D. (2001). Visual displays and cognitive tunneling: Frames of
reference effects on spatial judgments and change detection. Paper presented at the
Proceedings of the Human Factors and Ergonomics Society Annual Meeting,
Minneapolis, MN.
Thornburg, K., Peterse, H. P. M., Liu, A. M., & Oman, C. (2011). Operator Performance in Long
Duration, Low Task Load Control Operations. In D. A. D'Agostino (Ed.), Prepared for
the Nuclear Regulatory Commission, Office of Nuclear Regulatory Research. Cambridge,
MA: Massachusetts Institute of Technology.
Tootell, R. B., Hadjikhani, N. K., Mendola, J. D., Marrett, S., & Dale, A. M. (1998). From
retinotopy to recognition: fMRI in human visual cortex. Trends in cognitive sciences,
2(5), 174-183.
Tsujimoto, S., Yamamoto, T., Kawaguchi, H., Koizumi, H., & Sawaguchi, T. (2004). Prefrontal
cortical activation associated with working memory in adults and preschool children: an
event-related optical topography study. Cerebral cortex, 14(7), 703-712.
Tsunashima, H., & Yanagisawa, K. (2009). Measurement of brain function of car driver using
functional near-infrared spectroscopy (fNIRS). Computational intelligence and
neuroscience, 2009.
Tucker, D. M. (1981). Lateral brain function, emotion, and conceptualization. Psychological
Bulletin, 89(1), 19.
Tvaryanas, A. P., Platte, W., Swigart, C., Colebank, J., & Miller, N. L. (2008). A resurvey of
shift work-related fatigue in MQ-1 Predator unmanned aircraft system crewmembers.
van Tilburg, W. A., & Igou, E. R. (2012). On boredom: Lack of challenge and meaning as
distinct boredom experiences. Motivation and Emotion, 36(2), 181-194.
Vernon, J. A., Mc Gill, T. E., Gulick, W. L., & Candland, D. K. (1959). Effect of sensory
deprivation on some perceptual and motor skills. Perceptual and Motor Skills, 9(3), 91-97.
Vidulich, M. A., & Tsang, P. S. (2012). Mental workload and situation awareness. Handbook of
Human Factors and Ergonomics, 243.
Vienneau, R., & Gozzo, F. (1987). Estimating pilot workload and its impact on system
performance. Paper presented at the 43rd American Helicopter Society Annual Forum,
St. Louis, MO.
Vodanovich, S. J. (2003). Psychometric measures of boredom: a review of the literature. The
Journal of Psychology, 137(6), 569-595.
Vogel-Walcutt, J. J., Fiorella, L., Carper, T., & Schatz, S. (2012). The Definition, Assessment,
and Mitigation of State Boredom Within Educational Settings: A Comprehensive
Review. Educational Psychology Review, 24(1), 89-111.
Warm, J. S. (1984). Sustained attention in human performance: Wiley New York.
Warm, J. S., & Dember, W. N. (1998). Tests of vigilance taxonomy. In R. R. Hoffman, M. F.
Sherrick & J. S. Warm (Eds.), Viewing Psychology as a Whole: The Integrative Science
of William N. Dember (pp. 704): American Psychological Association.
Warm, J. S., Dember, W. N., & Hancock, P. A. (1996). Vigilance and workload in automated
systems.
Warm, J. S., Matthews, G., & Parasuraman, R. (2009). Cerebral hemodynamics and vigilance
performance. Military psychology, 21, S75-S100.
Warm, J. S., Parasuraman, R., & Matthews, G. (2008). Vigilance requires hard mental work and
is stressful. Human Factors: The Journal of the Human Factors and Ergonomics Society,
50(3), 433-441.
Watt, J. D., & Ewing, J. E. (1996). Toward the development and validation of a measure of
sexual boredom. Journal of Sex Research, 33(1), 57-66.
Weber, A., Fussler, C., O'Hanlon, J., Gierer, R., & Grandjean, E. (1980). Psychophysiological
effects of repetitive tasks. Ergonomics, 23(11), 1033-1046.
Whyte, J. (2011). Blood Oxygen Level-Dependent. Encyclopedia of Clinical Neuropsychology
(pp. 423-426): Springer.
Wickens, C., & Hollands, J. (1999). Engineering Psychology and Human Performance (3rd
Edition): Prentice Hall.
Wickens, C. D. (1984). Processing resources in attention. In R. Parasuraman & D. R. Davies
(Eds.), Varieties of Attention (pp. 63-102). Orlando, FL: Academic Press.
Wickens, C. D., & Kessel, C. (1979). The effects of participatory mode and task workload on the
detection of dynamic system failures. Systems, Man and Cybernetics, IEEE Transactions
on, 9(1), 24-34.
Wickens, C. D., Mayor, A. S., & McGee, J. P. (1997). Flight to the future: Human factors in air
traffic control: National Academies Press.
Wickens, C. D., Vidulich, M., & Sandry-Garza, D. (1984). Principles of SCR compatibility with
spatial and verbal tasks: The role of display-control location and voice-interactive
display-control interfacing. Human Factors: The Journal of the Human Factors and
Ergonomics Society, 26(5), 533-543.
Wierwille, W. W. (1979). Physiological measures of aircrew mental workload. Human Factors:
The Journal of the Human Factors and Ergonomics Society, 21(5), 575-593.
Wierwille, W. W., & Eggemeier, F. T. (1993). Recommendations for mental workload
measurement in a test and evaluation environment. Human Factors: The Journal of the
Human Factors and Ergonomics Society, 35(2), 263-281.
Wilson, G. (2002). An Analysis of Mental Workload in Pilots During Flight Using Multiple
Psychophysiological Measures. The International Journal of Aviation Psychology, 12(1),
3-18. doi: 10.1207/s15327108ijap201_2
Wilson, G., & Russell, C. A. (2003). Real-time assessment of mental workload using
psychophysiological measures and artificial neural networks. Human Factors: The
Journal of the Human Factors and Ergonomics Society, 45(4), 635-644.
Wilson, G. F., & Russell, C. A. (2007). Performance enhancement in an uninhabited air vehicle
task using psychophysiologically determined adaptive aiding. Human Factors: The
Journal of the Human Factors and Ergonomics Society, 49(6), 1005-1018.
Wolf, M., Ferrari, M., & Quaresima, V. (2007). Progress of near-infrared spectroscopy and
topography for brain and muscle clinical applications. Journal of Biomedical Optics,
12(6), 062104-062104-062114.
Woods, D. D., & Sarter, N. B. (2000). Learning from automation surprises and "going sour"
accidents. Cognitive engineering in the aviation domain, 327-353.
Wylie, G., & Allport, A. (2000). Task switching and the measurement of "switch costs".
Psychological research, 63(3-4), 212-233.
Yerkes, R. M., & Dodson, J. D. (1908). The relation of strength of stimulus to rapidity of habit
formation. Journal of comparative neurology and psychology, 18(5), 459-482.
Zheng, B., Cassera, M. A., Martinec, D. V., Spaun, G. O., & Swanström, L. L. (2010).
Measuring mental workload during the performance of advanced laparoscopic tasks.
Surgical endoscopy, 24(1), 45-50.
Zuckerman, M. (1979). Sensation seeking: Beyond the optimal level of arousal: Erlbaum
Hillsdale, NJ.