The Impact of Rapid Transitions in Mental Workload in a Supervisory Control Setting

by

Mark W. Boyer

B.S. Aeronautical Engineering
United States Air Force Academy, Colorado Springs, CO, 2012

Submitted to the Department of Aeronautics and Astronautics in partial fulfillment of the requirements for the degree of Master of Science in Aeronautics and Astronautics at the Massachusetts Institute of Technology

June 2014

© 2014 Mark W. Boyer. All rights reserved.

This work is sponsored by the Missile Defense Agency under Air Force Contract #FA8721-05-C-0002. Opinions, interpretations, conclusions, and recommendations are those of the authors and are not necessarily endorsed by the United States Air Force, Department of Defense, or US Government.

Signature redacted
Department of Aeronautics and Astronautics
May 1, 2014

Certified by: Signature redacted
Mary L. Cummings
Visiting Professor of Aeronautics and Astronautics
Thesis Supervisor

Accepted by: Signature redacted
Paulo C. Lozano
Associate Professor of Aeronautics and Astronautics
Chair, Committee on Graduate Students

Approved for Public Release 14-MDA-7899 (30 June 14)

The Impact of Rapid Transitions in Mental Workload in a Supervisory Control Setting

by

Mark W. Boyer

Submitted to the Department of Aeronautics and Astronautics on May 21, 2014 in partial fulfillment of the requirements for the degree of Master of Science in Aeronautics and Astronautics

ABSTRACT

With improving and expanding automation in domains such as unmanned aerial vehicles, nuclear power plants, and commercial aviation, operators will be expected to handle long periods of low task load while monitoring these highly automated systems with only an occasional need to respond to critical and emergent situations.
In most of these situations, operators must go from periods of low cognitive engagement to ones of high stress, high mental workload, and limited time to respond. This research investigates the mental workload transition process from low to high workload by measuring the hemodynamic response of subjects to a simulated missile defense task using functional near-infrared spectroscopy, or fNIRS. fNIRS is a non-invasive neuroimaging technique that uses specific wavelengths of near-infrared light to measure concentrations of oxygenated and deoxygenated hemoglobin in the prefrontal cortex, the region of the brain commonly associated with "higher level" mental activities. The missile defense simulation consisted of low task loading, in the form of system monitoring, punctuated by two critical events. For these two critical events, participants had to dynamically allocate assets to rapidly respond to incoming missiles in just 100 seconds. The two factors controlled were critical event difficulty (2 levels: easy and hard) and critical event onset time (3 levels: 40, 100, and 160 minutes). Thirty subjects participated in the 3-hour experiment. Subjects who received their first event at the middle onset time (100 minutes) had a lower hemodynamic response than those in the early or late groups. Subjects in the "100-minute, hard" condition performed worse than other groups, suggesting that the diminished hemodynamic response may be correlated with diminished performance. There was no significant difference in hemodynamic response between the difficulty levels, even though the hard scenario was significantly more difficult, as reflected in performance scores. The most important factors for predicting performance in this supervisory control application were NEO Five Factor Index Agreeableness, video game usage, pre-event attention state, and deoxygenated hemoglobin levels.
These results indicate that performance can be correlated with the physiological response, and that physiological responses can change over time during periods of very low workload. Additionally, hemodynamic trends correlating with the 30-minute vigilance decrement indicate that physiological changes may be occurring during a vigilance task and that fNIRS may be a suitable method for tracking mental state and engagement over time.

Thesis Supervisor: Mary L. Cummings
Title: Visiting Professor of Aeronautics and Astronautics

Acknowledgments

First, I would like to thank my advisor, Missy Cummings, for your guidance and wisdom over the past two years. Despite not being at MIT during my time here, you were always generous with your time and support, both personally and professionally.

Second, I would like to thank my Lincoln Lab advisor, Lee Spence. You were an indispensable resource in helping me develop, execute, and communicate my research project. Thank you for your encouragement in dealing with tough classes and tough advisors. And finally, thank you for great boat trips around the North Shore.

Thank you to MIT Lincoln Laboratory and the Missile Defense Agency for sponsoring this research. Thank you to Dan O'Connor, Sung-Hyun Son, Whangbong Choi, and all the members of Group 36 who helped make this research project possible. I have learned volumes about missile defense and about the people working tirelessly to develop technology to keep our country safe. Thank you to John Kuconis for organizing the funding to make graduate school possible for this Lt and many others. It has been a tremendous experience, and I feel very fortunate to have been selected to work at Lincoln Lab and MIT.

I would also like to thank Rob Jacob, Dan Afergan, and all the other members of the Tufts HCI Lab for allowing me to run my study at your lab and providing me with technical and academic support throughout. Thank you to Erin Solovey for all of your help in making this fNIRS project work.
Despite a busy schedule, you always found time to help me set up my project, analyze my data, and present my results.

Thank you to Jason, Andrew, Alex, Kathleen, and all the other HAL members for your help throughout my time here. From picking classes to navigating coursework and surviving MIT, I know I would not have made it without all of your help and encouragement. Thank you to Sally Chapman, Beth Marois, Marie Stuppard, and all of the support staff at MIT for your help in smoothing out funding, scheduling, and advising. You are truly the backbone of the Institute. I would also like to thank all of the other grad students with whom I have worked here at MIT. I know I am forever in debt to those who carried me through tough classes and helped refine my research goals and world views.

To my brother, Shane, you have always been my best bud, role model, and late-night putting companion. Even though we may stand eye-to-eye now, I'll always be looking up to you.

To my beautiful Catherine, thank you for your love and encouragement during the past two years. You were always my biggest supporter and best friend, and I cherish how you've helped me grow during all our time together.

To my parents, thank you for your unwavering support during my time here. You have always given me wisdom, love, and encouragement, no matter the circumstance. I feel so lucky to have been raised by such caring, honest, and supportive parents, and I aspire to follow your example someday.

Finally, I want to thank God for the many blessings in my life. "I can do all things through him who gives me strength." Philippians 4:13

Table of Contents
ABSTRACT
Acknowledgments
List of Figures
List of Tables
List of Acronyms
1. Introduction
1.1 Research Goals
1.2 Thesis Layout
2. Background
2.1 Vigilance
2.1.1 Definitions
2.1.2 Previous Work
2.1.3 Frameworks for Studying Vigilance
2.2 Boredom
2.2.1 Definitions
2.2.2 Previous Work
2.2.3 Measuring Boredom
2.3 Mental Workload
2.3.1 Definitions and Models
2.3.2 Previous Work
2.3.3 Workload Transition
2.3.4 Workload Measurement
2.4 Neurophysiology and functional Near-Infrared Spectroscopy (fNIRS)
2.4.1 Neurophysiology
2.4.2 fNIRS Background
2.4.3 How fNIRS measures cognitive activity
2.4.4 Using fNIRS to measure workload
2.5 Summary
3. Experimental Methods
3.1 Experimental Framework
3.2 Experiment Conduct and Data Collection
3.3 Experimental Design
3.4 Experimental Matrix
3.5 Summary
4. Results
4.1 Data Processing
4.2 Results
4.2.1 Sample Summary
4.2.2 Baseline Analysis
4.2.3 First Event Analysis
4.2.4 Time to the Maximum and Return to Baseline
4.2.5 Performance
4.2.6 Model Creation
4.2.7 Chat Box Analysis
4.2.8 Second Wave Analysis
4.2.9 Lateralization
4.2.10 Long-Term Effects
4.3 Summary
5. Conclusions
5.1 Experiment Conclusions
5.2 Limitations
5.3 Future Work
Appendix A: Experiment Timeline
Appendix B: Participant Consent Form
Appendix C: Demographic Survey
Appendix D: Boredom Proneness Survey
Appendix E: Post-Experiment Survey
Appendix F: Message Panel Alert Times
Appendix G: Summary of Variables
Appendix H: Results Tables
Appendix I: Return to baseline calculations
Appendix J: Time elapsed: Final chat message to Wave 1 start
Appendix K: fNIRS Data with Vigilance Features Plotted
References

List of Figures
Figure 1: Yerkes-Dodson Arousal vs. Performance Curve
Figure 2: Model of Human Information Processing (Wickens & Hollands, 1999)
Figure 3: fNIRS sensor diagram for prefrontal cortex measurement (Sassaroli, et al., 2008)
Figure 4: Operator with fNIRS sensors mounted on forehead
Figure 5: Scattering of Photons in Tissue (from ISS, Inc.)
Figure 6: Light Penetration in Brain Tissue using fNIRS (from ISS, Inc.)
Figure 7: Nominal hemodynamic response
Figure 8: Operator Display
Figure 9: fNIRS data collection method
Figure 10: fNIRS Probe Applied to Forehead (Scholkmann, Klein, Gerber, Wolf, & Wolf, 2014)
Figure 11: Performance vs. Workload
Figure 12: fNIRS Data Functional Diagram
Figure 13: Baseline Comparison
Figure 14: Subject Number vs. HbO % Change
Figure 15ab: Estimated Marginal Means for % Change HbO (left), HbR (right)
Figure 16: HbR % Change from baseline
Figure 17: HbO Time to the Maximum
Figure 18: Return to baseline time
Figure 19: Average Final Track Error by Difficulty
Figure 20: Average Final Track Error by Time and Difficulty
Figure 21: % Below Threshold by Difficulty
Figure 22: % Below Threshold by Time & Difficulty
Figure 23: Average Track Error vs Distraction Coding State
Figure 24: NEO-FFI Scores
Figure 25: Video Game Usage
Figure 26: Chat Response Time vs. Primary Variables
Figure 27: Time of Last Chat Message Before Event
Figure 28: Wave 2 Average Final Error vs. Time & Difficulty
Figure 29: Wave 1 vs. Wave 2 Final Track Error Performance
Figure 30: HbO Lateralization Effects (Left-Right)
Figure 31: HbR Lateralization Effects (Left-Right)
Figure 32: HbO Response for Subject with Vigilance Decrement Pattern
Figure 33: HbO Level-Off Time
Figure 34: Slope to Level-Off
Figure 35: HbO-HbR integral for 4 quarters

List of Tables
Table 1: UAV Component Descriptions
Table 2: Data Collection and Processing Parameters
Table 3: Video Coding Criteria
Table 4: Test Matrix
Table 5: Return To Baseline test results
Table 6: HbO Comparisons for Onset Time
Table 7: HbR Comparisons for Onset Time
Table 8: HbO/HbR Marginal Means Onset Time Comparison
Table 9: Performance Model Summary
Table 10: Chat Response Model
Table 11: Lateralization Effects
Table 12: Lateralization by Difficulty & Time
Table 13: Lateralization vs Performance Model
List of Acronyms
ANOVA   Analysis of Variance
ANS   Autonomic Nervous System
ATC   Air Traffic Control
BMDS   Ballistic Missile Defense System
BOLD   Blood Oxygen Level Dependent
BPI   Boredom Proneness Index
COUHES   Committee on the Use of Humans as Experimental Subjects
CT   Computed Tomography
DPF   Differential Pathlength Factor
EEG   Electroencephalograph
EKG   Electrocardiogram
FAA   Federal Aviation Administration
fMRI   Functional Magnetic Resonance Imaging
fNIRS   Functional Near Infrared Spectroscopy
HAL   Humans and Automation Lab
HbO   Oxygenated Hemoglobin
HbR   Deoxygenated Hemoglobin
HbT   Total Hemoglobin
HRF   Hemodynamic Response Function
MIT   Massachusetts Institute of Technology
NEO FFI 3   NEO Industries Five Factor Index 3
PET   Positron Emission Tomography
SA   Situational Awareness
SNS   Sympathetic Nervous System
TCD   Transcranial Doppler Sonography
TLX   Task Load Index
UAV   Unmanned Aerial Vehicle

1. Introduction

The US Ballistic Missile Defense System (BMDS) is an integrated global network of sensors and assets put in place to defend against a myriad of threats to the United States and its allies. Because of the high speeds of both the threats and the defense's interceptor missiles in a ballistic missile engagement scenario, the timelines are short, on the order of minutes or tens of minutes. Furthermore, threats may employ a variety of countermeasures to confuse the defense. Thus, operators must be able to receive, store, and process information rapidly to make critical judgments about system performance, information reliability, task priority, and action effectiveness. Current systems are designed with highly autonomous subsystems for detecting, tracking, engaging, and intercepting threats in order to meet these demanding timelines. However, while algorithms are much faster at performing calculations and optimizing responses, they are also notoriously brittle and susceptible to overloading.
To counter this brittleness, a role for the human operator has frequently been proposed, which requires determining which functions are best performed by the computer alone, by the human alone, or by a human-computer team (Forest, Kahn, Thomer, & Shapiro, 2007; Murphy & Shields, 2012). While a human-computer team may be more efficient under optimal conditions (Rathje, Spence, & Cummings, 2013), human operators may face extended periods of extremely low mental workload, which can degrade their response in various ways, before being required to transition to an intense period of activity (Bainbridge, 1983). Missile defense is only one instance of a larger class of human supervisory control domains that require human supervision of highly automated systems, with only occasional or rare events requiring attention. This research seeks to better understand the processes of attention management in low-workload, semi-autonomous supervisory control situations and to provide tools for measuring these processes.

In a growing number of fields such as unmanned aviation, manufacturing, nuclear power generation, security systems, and commercial aviation, humans have transitioned from the role of system operator to that of system monitor. Instead of providing continuous input and receiving continuous feedback, a system monitor intervenes only in critical situations: when the system reaches an unknown or erroneous state, or when new information changes the operational context. Monitors today are given an abundance of information to synthesize and often must make challenging decisions in shrinking time windows due to increasing operational tempo. At the same time, automation improvements have reduced the frequency of monitor interactions to levels of very low workload, leaving many workers in states of boredom, frustration, and distraction.
Extended periods of low workload and boredom harm the health and satisfaction of workers and can also lead to slower reaction times, increased errors, and reduced mistake detection (Brown & Carroll, 1984; Bruursema, Kessler, & Spector, 2011; Dyer-Smith & Wesson, 1995). When critical events occur, monitors in these low-workload supervisory control environments must fight the cognitive momentum of long periods of boredom in order to "spool up" to high mental functioning, a problem that will only become more pronounced as anomalous events grow rarer. Workload transition is, and will continue to be, one of the most important factors in successfully implementing a system under a human supervisory control paradigm.

1.1 Research Goals

This investigation seeks to expand the understanding of how humans handle the abrupt transition from very long periods of monitoring to intense periods of activity. This issue arises in a wide range of domains, such as military unmanned aerial vehicle operation, nuclear power plant operation, and robotic manufacturing supervision. The primary purpose of this research is to gather psychophysiological data on low-workload, high-workload, and transitional-workload periods of a supervisory control task. Conclusions drawn from this research will add to the understanding of how humans handle supervisory control roles in long-duration, low-task-load scenarios, and could be used to help develop novel methods for assisting humans during the different phases of a supervisory control task. The greatest difference between this study and previous research by Hart (2010), Thornburg (2011), and Mkrtchyan (2012) is that this work addresses the operator's psychophysiological state during the low-workload, transition, and high-workload periods using functional near-infrared spectroscopy (fNIRS).
fNIRS is a non-invasive method for dynamically measuring the physiological changes occurring in the brain, which can provide information about mental state over time. Other functional brain imaging methods, such as functional magnetic resonance imaging (fMRI), positron emission tomography (PET), electroencephalography (EEG), and transcranial Doppler sonography (TCD), each offer different benefits and carry different costs for measuring brain activity. fNIRS is unique for its relatively low cost, high temporal and spatial resolution, and robust data collection, in addition to its ease of use in typical applied laboratory settings. Previous studies using fNIRS have focused primarily on the high-workload spectrum of activities, such as multitasking and working memory (Durantin, Gagnon, Tremblay, & Dehais, 2014; Harrison et al., 2013; Sassaroli et al., 2008; Solovey, 2009; Solovey et al., 2012; Tsunashima & Yanagisawa, 2009). Others have studied cerebral hemovelocity in operators during a low-task-load vigilance state (Shaw et al., 2013; Shaw et al., 2009; Warm, Matthews, & Parasuraman, 2009; Warm, Parasuraman, & Matthews, 2008). This study focuses first and foremost on objectively determining whether it is possible to detect the transition from low to high workload using fNIRS with a relatively large pool of participants. Four questions motivated the design and conduct of this experiment:
1. Is a change in the measured fNIRS activity correlated with an actual change in mental workload?
2. What is the impact of time in a low-taskload environment on operator performance and response?
3. Is the magnitude of a transition from low workload to high workload correlated with performance?
4. Can the pre-transition state of the operator, relative to a known baseline, be used to predict the operator's post-transition response?
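fNIRS does not observe hemoglobin directly; it records how much near-infrared light returns to a detector and converts changes in that light into concentration changes. A widely used conversion is the modified Beer-Lambert law, in which the change in optical density at each wavelength is ΔOD(λ) = [ε_HbO(λ)·ΔHbO + ε_HbR(λ)·ΔHbR]·d·DPF, where d is the source-detector separation and DPF is the differential pathlength factor. Measuring at two wavelengths gives two equations in the two unknowns. The minimal Python sketch below solves that system. It is a generic illustration, not the processing pipeline used in this study: the wavelengths (690 and 830 nm), the approximate extinction coefficients, the 3 cm separation, the DPF of 6, and the function names are all assumptions chosen for the example.

```python
import numpy as np

# ASSUMED wavelengths and APPROXIMATE molar extinction coefficients,
# in 1/(cm*M) on a decimal-log basis, for [HbO, HbR]. Illustrative only;
# substitute coefficients from a published table for real analysis.
WAVELENGTHS_NM = (690, 830)
EXTINCTION = np.array([
    [276.0, 2051.96],  # 690 nm: HbR absorbs far more strongly than HbO
    [974.0, 693.04],   # 830 nm: HbO absorbs more strongly than HbR
])


def delta_od(intensity, baseline_intensity):
    """Change in optical density relative to baseline: -log10(I / I0)."""
    return -np.log10(np.asarray(intensity, dtype=float)
                     / np.asarray(baseline_intensity, dtype=float))


def mbll(d_od, separation_cm, dpf):
    """Solve the modified Beer-Lambert law for [dHbO, dHbR] in mol/L.

    d_od: optical-density changes, one per wavelength, in the same
          order as WAVELENGTHS_NM.
    separation_cm: source-detector distance in cm (hypothetical value).
    dpf: differential pathlength factor, assumed equal at both wavelengths.
    """
    path_cm = separation_cm * dpf
    # d_od[k] = (eps_HbO[k] * dHbO + eps_HbR[k] * dHbR) * path_cm
    return np.linalg.solve(EXTINCTION * path_cm, np.asarray(d_od, dtype=float))


if __name__ == "__main__":
    # Hypothetical intensities: detected light drops 1% at 690 nm
    # and 2% at 830 nm relative to baseline.
    d_od = delta_od([0.99, 0.98], [1.0, 1.0])
    d_hbo, d_hbr = mbll(d_od, separation_cm=3.0, dpf=6.0)
    print(f"dHbO = {d_hbo:.3e} M, dHbR = {d_hbr:.3e} M")
```

The two wavelengths are placed on either side of the roughly 800 nm isosbestic point, where HbO and HbR absorb differently; that difference is what keeps the 2x2 system well conditioned and lets the two concentrations be separated.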
Designers and scientists can use the results presented here to improve the design and analysis of operator consoles, training methods, and operational protocols to help mitigate detrimental patterns in a number of fields. Additionally, this work extends fNIRS measurement into the low-workload domain, an area previously unstudied with fNIRS and likely to be critically important in the future.

1.2 Thesis Layout

This thesis begins in Chapter 2 with background on vigilance, boredom, and workload in supervisory control settings. That chapter also explores the principles of neurophysiology and the methods used to measure brain activity. Chapter 3 lays out the framework for the missile defense simulation experiment and describes its components. Chapter 4 presents the results of the experiment, and Chapter 5 discusses the conclusions drawn from this research.

2. Background

This chapter discusses research related to the attention management problems faced by supervisors of semi-autonomous systems and the methods used to measure them. The chapter first explores classic and recent research on vigilance, boredom, mental workload, and mental workload transition. It then introduces functional near-infrared spectroscopy (fNIRS) and provides evidence for why it is a novel, suitable, and effective method for measuring mental workload.

2.1 Vigilance

This section briefly introduces vigilance, highlights previous work on the topic, and summarizes the methods used to investigate vigilance.

2.1.1 Definitions

One of the core ideas of human interaction with semi-autonomous systems is the concept of vigilance. Merriam-Webster defines vigilance as "a state of being alertly watchful especially to avoid danger," and Wendell Phillips famously stated in 1852 that "eternal vigilance is the price of liberty" (Sears, 1909).
In the context of human factors, however, vigilance refers to a common problem in human supervisory control of a semi-autonomous system: the detection of stimuli that occur at low frequency. Experimental psychologist D.E. Broadbent classified vigilance tasks as ones "whose chief feature is that a man responds only to very infrequent signals but may have to watch for them over long periods" (Broadbent, 1958). More recently, vigilance has been defined as "the ability of organisms to maintain their focus of attention and to remain alert to stimuli over prolonged periods of time" (Davies & Parasuraman, 1982; Parasuraman, 1986; Warm, 1984; Warm, Dember, & Hancock, 1996).

2.1.2 Previous Work

The first true vigilance experiments were conducted by Norman Mackworth in the 1940s to understand why radar and sonar operators were missing signals, especially near the end of their watch. He devised an apparatus known as the Mackworth Clock, which operators watched for hours to detect a double jump of the second hand, the "signal" in the experiment. He found that vigilance waned quickly: operator performance decreased by 10-15% after the first 30 minutes, followed by a more gradual decline for the rest of the experiment (Mackworth, 1948). This "vigilance decrement" has been studied and replicated in various forms in the decades since the original experiment (Alves & Kelsey, 2010; Caggiano & Parasuraman, 2004; Davies & Parasuraman, 1982; Parasuraman, 1986; Parasuraman & Davies, 1976; Pigeau, Angus, O'Neill, & Mack, 1995; See, Howe, Warm, & Dember, 1995; Shaw et al., 2009; Warm & Dember, 1998; Warm et al., 1996; Warm et al., 2009; Warm et al., 2008). An entire field of vigilance research has produced a steady flow of experiments in domains ranging from ballistic missile defense to medical analysis, particularly as many fields have shifted from the human as direct controller to the human as system supervisor.
Warm, Parasuraman, and Matthews reviewed vigilance research in their Human Factors article "Vigilance Requires Hard Mental Work and Is Stressful" (2008). They list the fields in which vigilance is important and has been studied as including military surveillance, air traffic control, cockpit monitoring, seaboard navigation, industrial process/quality control, long-distance driving, agricultural inspection tasks... medical settings such as cytological screening, electrocardiogram monitoring, anesthesia... airport baggage inspection, and illicit radioactive materials detection at border crossings and ports. This list has since expanded to include domains such as unmanned aerial systems (Tvaryanas, Platte, Swigart, Colebank, & Miller, 2008), network security monitoring (Bejtlich, 2013; Heberlein et al., 1990), and autonomous automobiles (Llaneras, Salinger, & Green, 2013), and it will continue to grow as automation spreads into almost every domain.

One of their most crucial findings is that vigilance is demanding work, even though the operator may have very few tasks to actually complete. Several studies (Caggiano & Parasuraman, 2004; Miller, Szalma, Warm, Hitchcock, & Dember, 1999; Parasuraman, 1979; Parasuraman & Davies, 1976; See et al., 1995) show that vigilance performance depends closely on the type of task rather than on the specific task itself. They define two types of tasks: a successive task, in which the subject must make an absolute judgment based on stored memory, and a simultaneous task, in which the subject makes a comparative judgment based on information within the signal and places few demands on memory. Subjects perform worse on successive tasks, which points to the conclusion that vigilance tasks demand attentional resources (Parasuraman, Warm, & Dember, 1987; Warm & Dember, 1998). Important factors that determine performance include event rate, event irregularities, spatial uncertainty of stimuli, and multitasking.
2.1.3 Frameworks for Studying Vigilance

Several frameworks can be employed to understand performance during vigilance tasks. First, under attentional resource theories such as those proposed by Kahneman (1973) or Wickens (1984), the vigilance performance decrement reflects a depletion of attentional resources over time. Similar to the attentional framework, Warm, Parasuraman, and Matthews also use the framework of mental workload (the degree of information processing capacity that is expended during the task) to understand vigilance performance. Subjective methods such as the NASA Task Load Index (TLX) or the Multiple Resource Questionnaire (MRQ) provide a multidimensional measure of workload (Boles & Adair, 2001; Hart & Staveland, 1988). Several studies show that these surveys are reliable measures of workload (Wickens & Hollands, 1999) and that vigilance tasks are associated with high levels of mental demand and frustration (Warm et al., 1996).

The newest method that Warm, Parasuraman, and Matthews cite for measuring vigilance is a neurological or physiological approach. Studies using neuroimaging techniques such as positron emission tomography (PET) or functional magnetic resonance imaging (fMRI) have found that blood flow to the prefrontal cortex, the region of the brain associated with attention and cognition, increases during vigilance tasks (Parasuraman & Caggiano, 2005; Parasuraman, Warm, & See, 1998). Finally, they note that stress is also an important component of vigilance. Subjects performing vigilance tasks have been shown to have increased levels of stress-linked hormones such as catecholamines (Frankenhaeuser & Patkai, 1965; Lundberg, 2005; Parasuraman & Davies, 1984) and, specifically, epinephrine and norepinephrine (Frankenhaeuser & Lundberg, 1982; Frankenhaeuser, Nordheden, Myrsten, & Post, 1971). Through these different frameworks, Warm, Parasuraman, and Matthews make a compelling case that vigilance tasks are demanding and stressful.
2.2 Boredom

Boredom is an important concept closely related to vigilance and workload. This section presents a variety of definitions and historical contexts, discusses previous work, and surveys some of the measurement techniques available to researchers.

2.2.1 Definitions

Boredom is a phenomenon that spans many fields, including psychology, philosophy, physiology, anthropology, and others, so one of the greatest challenges in researching boredom is finding a consistent definition. Although the lineage of the term can be traced back to the Roman taedia, the early Christian acedia, and the French ennui, the first recorded usage of the word boredom dates to 1853, when Charles Dickens penned the words "And I am bored to death with it. Bored to death with this place. Bored to death with my life. Bored to death with myself." in his novel Bleak House (Dickens, 1853). Boredom has been described as "a common experience that affects people on multiple levels, including their thoughts, feelings, motivations, and actions" (van Tilburg & Igou, 2012). Others have described it as "an unpleasant, transient affective state in which the individual feels a pervasive lack of interest in and difficulty concentrating on the current activity" (Fisher, 1993) or "an affective experience associated with cognitive attentional processes" (Leary, Rogers, Canfield, & Coe, 1986). Current boredom researcher Dr. John Eastwood and colleagues state that "through the synthesis of psychodynamic, existential, arousal, and cognitive theories of boredom, we argue that boredom is universally conceptualized as 'the aversive experience of wanting, but being unable, to engage in satisfying activity'" (Eastwood, Frischen, Fenske, & Smilek, 2012). This myriad of definitions highlights the broad and nebulous nature of boredom itself, which has in turn prompted many studies of its effects.
2.2.2 Previous Work

Some of the earliest boredom research dates from the 1930s, when researchers such as Joseph Barmack first defined boredom as "a state which is...unpleasant principally because of inadequate motivation resulting in inadequate physiological adjustments to it" (1939). Barmack found signs of physiological changes over time, but the study was limited in duration and controls. In the 1940s and '50s, researchers began to study and quantify more thoroughly why human performance suffered when operators were placed in boring roles. Extreme sensory deprivation and boredom experiments led to hallucinations and vastly degraded cognitive abilities, similar to what was observed in aviators on long-haul flights, radar operators in air defense, and sonar operators on submarines (Heron, 1957; Vernon, McGill, Gulick, & Candland, 1959). Others have suggested that sensory deprivation may have beneficial or therapeutic uses, but few researchers are investigating it today (Suedfeld, 1975). Studies by Thackray (1977; 1979) and Wickens (1997) examined Air Traffic Control (ATC) radar operators and found that while some accidents and mistakes occur during busy times, more often they occur during periods of low to moderate traffic. Hopkin (1988) noted that the vast majority of ATC research focuses on high-workload scenarios, leaving research into vigilance and boredom woefully underdeveloped. Boredom has also been studied in various working environments (Bruursema et al., 2011; Drory, 1982; Fisher, 1993; Kroes, 2007), in education (Larson & Richards, 1991; Vogel-Walcutt, Fiorella, Carper, & Schatz, 2012), and in other tasks (Leary et al., 1986; Pattyn, Neyt, Henderickx, & Soetens, 2008).

2.2.3 Measuring Boredom

Several approaches have been taken to predict and measure boredom.
The most commonly used instruments for measuring and predicting boredom are surveys such as the Boredom Proneness Scale, developed by Farmer and Sundberg (1986). This 28-item survey has been shown to be a reliable and consistent measure of attention and interest when measuring classroom boredom. The Boredom Susceptibility Scale (Zuckerman, 1979) is similar in nature but less commonly used. In his 2003 article "Boredom: A Review," Vodanovich surveys many other scales, such as the job boredom scales by Grubb (1975) and Lee (1986), a boredom coping measure by Hamilton, Haier, and Buchsbaum (1984), two scales for leisure and free-time boredom (Iso-Ahola & Weissinger, 1987; Ragheb & Merydith, 2001), the Sexual Boredom Scale (Watt & Ewing, 1996), and several other measures of attention, sensation, and emotion (Vodanovich, 2003). Although he cites a large number of scales, he points out many gaps in the literature, owing to the relatively small amount of research on boredom compared with other emotions or states. Work continues in this area, with new instruments such as the Multidimensional State Boredom Scale (Fahlman, Mercer-Lynn, Flora, & Eastwood, 2013) being developed to address gaps in the testing spectrum. Other approaches to capturing boredom have focused on objective physiological measures. Barmack's original study in 1937 found a strong inverse relationship between boredom level and both oxygen consumption and blood pressure. A study of ATC operators measured blood pressure, oral temperature, skin conductance, body movement, heart rate, and heart rate variability while 45 male subjects performed a one-hour trial (Thackray et al., 1977). Subjects who reported greater boredom and monotony showed significant differences in several of the physiological measures. The study concluded that the "nature of the pattern associated with boredom and monotony suggests a pattern more closely related to attentional processes than to 'arousal'."
One more recent study used automatic posture tracking to distinguish students' affective states between boredom and high engagement (D'Mello, Chipman, & Graesser, 2007). Weber et al. (1980) and Frankenhaeuser (1971) both measured catecholamine levels during low-activity tests but found somewhat conflicting results. A similar study by Merrifield (2014) showed that heart rate, skin conductance, and cortisol levels were sensitive measures of boredom. Other studies using electrocortical techniques such as electroencephalography (EEG), which measures electrical activity emanating from the brain, or electrodermal measures (Davies & Krkovic, 1965) have found these to be weak indicators of boredom. Overall, research on boredom itself has been limited, since much more attention has been directed toward vigilance.

2.3 Mental Workload

This section first discusses some of the most widely recognized models and definitions of mental workload. Next, it briefly samples the extensive literature of mental workload studies. It then focuses on workload transition, examining how humans handle a change from low to high workload. Finally, it covers a variety of workload measurement methods, such as primary task measures, secondary task measures, subjective workload assessments, neurophysiological measures, and physiological measures outside the brain.

2.3.1 Definitions and Models

Vigilance and boredom are correlated with mental workload, which has been defined in many different ways. Entire volumes have been written about the subject (Hancock, Mihaly, Rahimi, & Meshkati, 1988; Moray, 1979; O'Donnell & Eggemeier, 1986), with Hancock's partial bibliography in 1988 listing over 500 papers. This review can therefore cover only the highlights most relevant to this research. One of the first important distinctions to make is between taskload and workload.
Taskload is an objective measure of the number of tasks given to a person to complete. For example, an air traffic controller may have ten planes to control in their sector. This type of measure is subject-independent, since it is generated only by the environment and by factors external to the human. Workload, in contrast, is the difficulty of the task as perceived by the human. It is a relative term that combines the objective measure of difficulty (taskload) with factors internal to the human, such as experience, skill, mood, or physiological state. There are reliable tests to compare workload levels between humans, but ultimately workload retains a subjective component internal to every individual.

Some of the earliest research in the field of workload was conducted by Yerkes and Dodson in the early 1900s. Their work evolved into the famous Yerkes-Dodson curve, which shows performance as a function of arousal (1908).

[Figure 1: Yerkes-Dodson Arousal vs. Performance Curve (performance plotted against low, medium, and high arousal)]

They found that humans perform best at a moderate arousal level, with performance decreasing at the low and high extremes of arousal. This finding has proven conceptually accurate over the past century and has been extrapolated beyond arousal to studies of low and high workload. Hancock and Warm (1989) showed that a similar "inverted U" relationship exists between attentional resource capacity and stress level. Studies by Hart (2010) and others (Mkrtchyan et al., 2012; Thornburg et al., 2011) show a smaller performance drop in the low-workload domain, although the larger body of vigilance and boredom work presents evidence of at least some decrement below the medium level of arousal. Studies of mental workload began to take root in the 1950s and '60s in domains such as ATC and aviation.
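The inverted-U shape of the Yerkes-Dodson curve can be illustrated numerically. The sketch below is only an illustration: Yerkes and Dodson did not specify a formula, so the quadratic form, the normalization of arousal to the range 0 to 1, and the function name `illustrative_performance` are assumptions chosen to reproduce the qualitative curve (best performance at moderate arousal, lower performance at both extremes).

```python
def illustrative_performance(arousal: float, optimum: float = 0.5, peak: float = 1.0) -> float:
    """Hypothetical inverted-U curve (not a fitted model).

    Arousal is assumed to be normalized to [0, 1]. Performance equals `peak`
    at `optimum` and falls off quadratically, reaching 0 at whichever end of
    the range lies farther from the optimum.
    """
    if not 0.0 <= arousal <= 1.0:
        raise ValueError("arousal must be normalized to [0, 1]")
    span = max(optimum, 1.0 - optimum)  # distance from the optimum to the farther extreme
    return peak * (1.0 - ((arousal - optimum) / span) ** 2)


if __name__ == "__main__":
    for label, level in [("low", 0.1), ("medium", 0.5), ("high", 0.9)]:
        print(f"{label:>6} arousal -> performance {illustrative_performance(level):.2f}")
```

With the default optimum of 0.5, low (0.1) and high (0.9) arousal both yield 0.36, while moderate arousal yields the peak value of 1.00. A real dataset would likely be asymmetric, which could be approximated by moving `optimum` away from the center.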
The first seminal conference on workload was the 1977 North Atlantic Treaty Organization (NATO) Symposium on Mental Workload (Moray, 1979). At that conference, experts from fields such as experimental psychology, control engineering, and physiological psychology presented models and definitions of mental workload, laying the foundation in many fields for decades to come. As part of the experimental psychology group, Moray himself defined mental workload as follows:

A load is something which imposes a burden on a structure, or makes it approach the limit of its performance in some dimension. Go far enough along that dimension and the system will fail in some way. In the case of mental workload, the central concept is the rate at which information is processed by the human operator, and basically the rate at which decisions are made and the difficulty of making the decisions. (Moray 1979)

Many similar definitions have been offered in subsequent years, with the key emphasis that the human operator has limited mental resources and that increases in task difficulty require greater resources from the operator to perform a task adequately (O'Donnell & Eggemeier, 1986). From this fundamental presupposition, two dominant models describe how the human responds to a mental load. Resource Theory, proposed by Kahneman in 1973, models human mental abilities as a single pool of resources that can be applied to an information processing problem (Kahneman, 1973). In this model, resources vary by individual, and if demands exceed the available resources, tasks are shed until equilibrium is reached at maximum capacity. Wickens expanded upon this model with Multiple Resource Theory, which is widely cited as one of the most comprehensive models of how humans handle cognitive activity (Wickens, 1984).
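As a rough illustration of the single-pool idea in Resource Theory, the sketch below admits tasks in priority order until a shared capacity is exhausted and sheds the rest. The numeric demands, the priority ordering, the greedy shedding rule, and the names `Task` and `allocate_single_pool` are all hypothetical; Kahneman's theory does not prescribe a specific allocation algorithm.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    demand: float  # hypothetical units of attentional resource
    priority: int  # assumption: lower number = more important


def allocate_single_pool(tasks: list[Task], capacity: float) -> tuple[list[Task], list[Task]]:
    """Single-pool sketch: admit tasks in priority order while the shared pool
    can still cover their demand; any task that does not fit is shed."""
    kept: list[Task] = []
    shed: list[Task] = []
    used = 0.0
    for task in sorted(tasks, key=lambda t: t.priority):
        if used + task.demand <= capacity:
            kept.append(task)
            used += task.demand
        else:
            shed.append(task)
    return kept, shed


if __name__ == "__main__":
    tasks = [
        Task("monitor radar", demand=4.0, priority=1),
        Task("answer radio", demand=3.0, priority=2),
        Task("update log", demand=5.0, priority=3),
    ]
    kept, shed = allocate_single_pool(tasks, capacity=8.0)
    print("kept:", [t.name for t in kept])
    print("shed:", [t.name for t in shed])
```

Here the first two tasks consume 7 of the 8 available units, so the third task is shed. Because every task draws on the same pool, it makes no difference in this model which sensory channel each task uses.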
Like Resource Theory, Multiple Resource Theory posits that humans draw from a pool of mental resources to address sensory and cognitive demands, which are then processed through an allocation policy and carried out by the body. Unlike Resource Theory, however, Multiple Resource Theory assumes that individual sensory and cognitive systems, such as visual, auditory, or tactile, have separate pools of resources that can be accessed independently. Each resource has its own limited capacity, so a person may reach maximum workload at different levels of taskload or engagement depending on the modality. The maximum workload for any given person in any one modality may also vary with task and experience, as with an experienced pilot listening to a noisy radio or an air traffic controller visually inspecting a cluttered radar display.

While Resource Theory and Multiple Resource Theory have emerged as overarching theories of mental workload, the process of receiving and interpreting information has also been explored in many other ways. Wickens' model of information processing, shown in Figure 2, provides a useful system diagram for representing the transformation of data from the environment into responses (Wickens & Hollands, 1999). Humans receive the raw signal through sensory organs (eyes, ears, nose, etc.), process those signals into usable pieces of information (text, speech, etc.), draw upon short- and long-term memory to interpret the information, and finally use their decision-making centers to generate and execute a response. This process has several potential bottlenecks where humans may be resource-limited, such as the working memory, attentional resource, and response selection phases (Marois & Ivanoff, 2005). Humans may be able to process signals in multiple modalities, such as simultaneous visual and auditory signals, but are generally constrained by the response selection stage (Pashler, 1994).
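The contrast between one shared pool and separate modality-specific pools can be sketched as follows. The modality names, capacity values, and per-task demands are hypothetical numbers chosen for illustration, and `overloaded_pools` is not drawn from Wickens' work; it simply checks each modality's summed demand against that modality's own limit.

```python
from collections import Counter


def overloaded_pools(task_demands: list[dict[str, float]],
                     capacities: dict[str, float]) -> list[str]:
    """Multiple-resource sketch: sum each task's demand per modality and return
    the modalities whose total exceeds that modality's own capacity.
    A modality with no listed capacity is treated as having zero capacity."""
    totals = Counter()
    for demands in task_demands:
        totals.update(demands)  # Counter.update adds values for matching keys
    return sorted(m for m, load in totals.items() if load > capacities.get(m, 0.0))


if __name__ == "__main__":
    capacities = {"visual": 1.0, "auditory": 1.0}
    tasks = [
        {"visual": 0.7},    # scanning a radar display
        {"auditory": 0.6},  # listening to radio traffic
        {"visual": 0.5},    # reading a checklist
    ]
    print(overloaded_pools(tasks, capacities))
```

The combined demand (1.8) would fit within a single pool of capacity 2.0, yet the visual channel alone carries 1.2 against a limit of 1.0, so the separate-pool view flags it as overloaded. This is the kind of modality-specific bottleneck, such as a controller inspecting a cluttered radar display, that a single-pool model cannot express.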
Understanding the neural correlates of these psychological phenomena is a growing trend and one of the underpinnings of this research.

[Figure 2: Model of Human Information Processing (Wickens & Hollands, 1999); stages include sensory stimuli, short-term sensory store, perception, decision and response selection, response execution, working memory, long-term memory, and feedback]

2.3.2 Previous Work

As mentioned previously, an extensive number of studies have evaluated human mental workload. In the aviation domain, numerous studies have examined the mental workload of airline pilots (Battiste & Bortolussi, 1988; Roscoe, 1992; Sheridan & Simpson, 1979; Wilson, 2002), military pilots (Alfredson, Holmberg, Andersson, & Wikforss, 2011; Sirevaag, Kramer, Reisweber, Strayer, & Grenell, 1993; Svensson, Angelborg-Thanderz, Sjöberg, & Olsson, 1997; Svensson, Angelborg-Thanderz, & Sjöberg, 1993), and air traffic controllers (Hopkin, 1988; Wickens et al., 1997). Outside the classic aviation sphere, many studies have examined the mental workload of highly trained specialists such as surgeons (Berguer, Smith, & Chung, 2001; Klein, Riley, Warm, & Matthews, 2005; Zheng, Cassera, Martinec, Spaun, & Swanström, 2010), missile operators (Berka et al., 2005; Hill, Zaklad, Bittner, Byers, & Christ, 1988), and astronauts (Manzey, 2000; Manzey, Lorenz, & Poljakov, 1998), while many others have studied more mundane tasks such as driving (De Waard Studiecentrum, 1996; Recarte & Nunes, 2003). For a more complete listing of experiments measuring mental workload, the reader may consult the syntheses by Damos (1991), Hancock and Meshkati (1988), Kantowitz and Campbell (1994), Lysaght et al. (1989), Moray (1988), O'Donnell and Eggemeier (1986), Warm et al. (1996), Meshkati (2011), or Vidulich (2012). As noted previously, the bulk of this work has focused on high mental workload, since those are the domains intuitively associated with critical situations.
Endsley and Rodgers (1997) found a positive correlation between workload and operational errors in the high-workload domain. However, there is evidence that many errors leading to accidents or incidents stem from low- or moderate-workload situations. Several studies have shown that ATC operational errors have occurred under low to moderate traffic complexity (Redding, 1992; Stager, Hameluck, & Jubis, 1989). Thus, Hopkin (1995) argues, underload has been relatively understudied despite being a real threat to safe operations.

2.3.3 Workload Transition

Although many previous studies focus on steady-state workload, real-world workload rarely remains steady. Often there are transitions from low to high taskload that require the operator to adapt to critical new demands. These transitions are a common feature of many domains, such as emergency medicine and military operations. For example, a missile defense officer may need to transition within a few seconds from a very low-taskload monitoring state to a very demanding engagement state in order to successfully intercept an incoming missile. Many other military and civilian career fields face similar challenges, but missile defense provides an excellent case study for task transitions, and thus workload transitions, because of its very low frequency of engagement (almost zero), short timelines, and high criticality. It is the epitome of what Hancock calls "hours of boredom, moments of terror" (Hancock & Krueger, 2010). While mental workload transition shares some attributes of vigilance and boredom at one end and of high workload at the other, some additional problems are associated specifically with the transition period. One of the defining attributes of workload transition is uncertainty.
In environments with high uncertainty, operators often use decision-making heuristics such as reduction of information, assumption-based reasoning, weighing pros and cons, suppression of uncertainty, or hedging with alternatives (Kahneman, Slovic, & Tversky, 1982; Lipshitz & Strauss, 1997). Surprise is another detrimental factor associated with workload transition, since humans must overcome their "behavioral momentum" during a workload transition (Nevin, Mandell, & Atak, 1983). Surprise is effectively an overcompensation during that "momentum" exchange as the operator adjusts to a discontinuity in their mental model, and it can result in diminished mental performance (Meyer, Reisenzein, & Schützwohl, 1997). Automation surprise has become one of the leading factors in automation-related accidents (Sarter, Woods, & Billings, 1997; Woods & Sarter, 2000). Team coordination is especially critical during transition periods, since roles may change between low- and high-workload environments.

Huey and Wickens (1993) propose five direct factors of workload transition, along with several other considerations that also influence performance. The first factor is task character, which includes task structure, performance criteria, task schedule, presentation rate, task complexity, task variability, task duration, and task requirements and procedures. The importance of task character is highlighted by comparing a fighter pilot who suddenly encounters a surface-to-air missile with a nuclear power plant operator who receives a caution message. Both may face life-threatening situations, but the task schedule, duration, and complexity are much more demanding for the aviator than for the plant operator. If the human is considered as the plant within a control system, as introduced earlier in section 2.3.1, the next three factors can be considered the input, processing, and output elements.
The second direct factor of workload transition, the "input," consists of information transmitted from the environment to the human. This transmission occurs through visual displays, visuals from the scene, focal vision, peripheral vision, and auditory, haptic, olfactory, or other sensory modalities. In addition to collecting raw data through the various senses, humans must also perform what Wickens calls "encoding" in his Multiple Resource Theory model. Encoding consists of processing and translating the raw data into usable information that can be acted upon by the processing centers of the brain. For the third factor of workload transition, the "processing" functions of the brain, there are several good models of how processing works. Wickens defines processing as two stages: perception and responding. Another commonly used framework for information processing is Rasmussen's "Skill-Rule-Knowledge" hierarchy (Rasmussen, 1983), in which humans make decisions based on a combination of their experience, task complexity, uncertainty, and time pressure. Skill-based behavior uses extensive experience to produce an automatic response with little use of cognitive resources. This paradigm is often associated with routine behaviors, such as walking or driving, or with experts such as emergency room doctors or fire chiefs, who have well-developed heuristics for diagnosis in what has been called naturalistic decision-making (Klein & Zsambok, 1997). Rule-based behavior is found at a medium experience level, with the human generally following "if-then" procedures. Finally, knowledge-based behavior occurs when the human has very little experience with the situation and must take a systematic or novel approach. This type of decision-making has been well explored by various psychologists (Bekier, Molesworth, & Williamson, 2011; Janis & Mann, 1977; Kahneman, 2011; Rovira, McGarry, & Parasuraman, 2007; Skitka, Mosier, & Burdick, 1999).
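The three levels of Rasmussen's hierarchy can be caricatured as a simple dispatch: a highly familiar situation triggers an automatic response, a situation covered by a stored "if-then" procedure triggers that procedure, and anything else falls through to knowledge-based reasoning. This is a toy sketch only; the 0-1 `familiarity` score, the 0.8 threshold, the example procedures, and the function name `select_behavior` are hypothetical and are not part of Rasmussen's framework.

```python
from enum import Enum


class Mode(Enum):
    SKILL = "skill-based"
    RULE = "rule-based"
    KNOWLEDGE = "knowledge-based"


SKILL_THRESHOLD = 0.8  # hypothetical familiarity needed for an automatic response


def select_behavior(situation: str, familiarity: float,
                    procedures: dict[str, str]) -> tuple[Mode, str]:
    """Toy mapping onto Rasmussen's hierarchy (illustrative only).

    familiarity: assumed 0-1 score of the operator's experience with the situation.
    procedures:  stored "if this situation, then this action" rules.
    """
    if familiarity >= SKILL_THRESHOLD:
        return Mode.SKILL, "automatic, well-practiced response"
    if situation in procedures:
        return Mode.RULE, procedures[situation]
    return Mode.KNOWLEDGE, "reason from first principles"


if __name__ == "__main__":
    rules = {"caution message": "run the caution checklist"}
    print(select_behavior("routine scan", 0.95, rules))
    print(select_behavior("caution message", 0.5, rules))
    print(select_behavior("novel failure", 0.1, rules))
```

The fixed ordering (skill, then rule, then knowledge) mirrors the text's progression from high to low experience; in practice the boundaries between levels are graded rather than sharp.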
A third paradigm for understanding cognition is Endsley's Situational Awareness (SA) Model (1995). Situational awareness is a three-step process of perceiving the environment, understanding what is happening, and projecting the probable future state of the environment. Cognition occurs at all stages, but particularly during the comprehension and projection phases of SA.

The fourth factor in workload transition, "output" variables, can also affect a mental workload transition. Output variables are the avenue by which the human imparts decisions upon the system or environment. These include control design, control gain or lag, and order of controls. If the controls are poorly suited to the task, the human must apply additional resources to accomplish their goals. Anyone who has used a mouse with the gain set extremely high will intuitively understand this challenge, and many studies of aerospace controls have demonstrated the importance of proper control laws (Wickens, Vidulich, & Sandry-Garza, 1984). Display-control compatibility can have important consequences for information processing, and any disconnect between the two places an additional burden on the operator (Jagacinski, 1989).

The final primary factor that Huey and Wickens cite is computer aiding and automation. With increasingly complex systems, computer aiding and automation are a necessity. Several aviation studies show that workload peaks and overall workload were reduced by implementing automation (Haworth, Atencio Jr, Bivens, Shively, & Delgado, 1987; Vienneau & Gozzo, 1987). However, Huey and Wickens also note that automation often merely translates workload into another form, sometimes leaving the operator with mismatched tasks or reduced system knowledge and actually increasing cognitive demands (Hart & Sheridan, 1984; Kessel & Wickens, 1982; Wickens & Kessel, 1979).
While the five factors listed above (task character, input variables, information processing, output variables, and computer aiding and automation) are the primary drivers of additional workload during workload transition, they are not the only factors that influence how it occurs in actual operations. Insufficient sleep and fatigue are well known to be detrimental to mental performance and can exacerbate the issues found in workload transition (Kahol et al., 2008). Studies in aviation claim that fatigue accounts for at least 4-8% of all mishaps (Caldwell, 2005; Tvaryanas et al., 2008). Research in other critical fields, such as medicine, is even more alarming: studies report that on-call residents made 36% more serious medical errors, and 300% more fatigue-related medical errors that led to a patient's death, than well-rested residents (Landrigan et al., 2004; Lockley et al., 2007). Even though this has been known for decades, governing bodies such as the FAA were still combating sleep and fatigue issues in commercial aviation in 2013 (Sachse, 2011). Beyond physiological factors, other factors can play important roles in mental workload transition. Huey and Wickens note that spatial awareness is a crucial element of successful operations and places a large demand on mental resources (Huey & Wickens, 1993). They also describe several biases in geographic memory and spatial orientation that can undermine correct operations. In addition to geography, they cite cognitive tunneling and cognitive switching as two important phenomena that influence mental workload transition. Switching between tasks incurs a cost on mental processing resources. If switching occurs often, the accumulated cost can drain the resources available for either process and degrade performance on both tasks (Wylie & Allport, 2000).
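The arithmetic of that drain can be sketched with a deliberately simple model in which every switch costs a fixed amount of time. The 100-second window, the 0.5-second per-switch cost, and the function name `time_left_for_tasks` are hypothetical values chosen for illustration, not measurements reported by Wylie and Allport (2000).

```python
def time_left_for_tasks(window_s: float, switches: int, switch_cost_s: float) -> float:
    """Processing time remaining after paying a fixed cost for every task switch.

    The fixed per-switch cost is a simplifying assumption; measured switch
    costs vary with the tasks involved and with practice.
    """
    if window_s < 0 or switches < 0 or switch_cost_s < 0:
        raise ValueError("inputs must be non-negative")
    return max(0.0, window_s - switches * switch_cost_s)


if __name__ == "__main__":
    for n in (10, 60, 120, 250):
        print(n, "switches ->", time_left_for_tasks(100.0, n, 0.5), "s left")
```

Under these assumptions, 10 switches leave 95 seconds for actual processing, 120 switches leave only 40 seconds to share between both tasks, and 250 switches would consume the entire window.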
Cognitive tunneling is another phenomenon that influences human mental processes. Humans often tend to focus on a single piece of information or hypothesis and fail to see alternatives, even when that information may be wrong or other options may be better (Thomas & Wickens, 2001). The Eastern Airlines crash into the Everglades is an excellent example: the pilots were fixated on solving one issue (the landing gear light bulb had burned out), failed to detect important information (the autopilot had disengaged and they were descending), and ultimately crashed the airplane (Reed, McAdams, Thayer, Burgess, & Haley, 1973).

Overall, workload transition draws upon elements of both the low- and high-workload domains. Fatigue and stress have been shown to significantly affect operator functioning during high workload, and operators are often subject to these conditions because of vigilance tasks. The models of mental workload are primarily derived from high-workload scenarios, and many of the important factors in the information processing model also apply to workload transition.

2.3.4 Workload Measurement

This section gives an overview of the important parameters of workload measurement and highlights several techniques used to measure mental workload.

2.3.4.1 Overview

Several important considerations must be addressed when trying to measure mental workload. O'Donnell and Eggemeier (1986) provide a good overview of the topic and highlight the important characteristics of workload measurement. Wierwille and Eggemeier (1993) reinforce these conclusions and also provide recommendations for choosing the appropriate type of measurement technique for a workload evaluation.
The three most important factors in a good workload measurement tool are sensitivity, diagnosticity, and task intrusiveness, although global sensitivity, transferability, and implementation requirements are also important (Wierwille & Eggemeier, 1993). Sensitivity refers to the capability of a technique to detect changes in the level of workload imposed by task performance. When testing for "choke points", a relatively insensitive measure may be acceptable, since it need only discriminate the largest aberrations. When testing aspects like operational procedures, display designs, or crew composition, finer discriminations in workload must be made. Diagnosticity is based on the multiple-resource approach and refers to the ability of a test to "discern the type or cause of workload, or the ability to attribute it to an aspect or aspects of the operator's task" (Wierwille & Eggemeier, 1993). A test with high diagnosticity measures only the resources being strained by the task in question and provides some explanation of the workload-driving elements. For example, when measuring the mental workload of a person using a variety of visual displays, a visual test should be used to determine the excess visual capacity. Finally, intrusiveness is highly important in measuring workload, especially in experiments that attempt to mimic natural settings. An overly intrusive test could induce its own extra workload or divert the subject's attention from the primary task, thereby muddling what is actually being measured.

2.3.4.2 Primary Task Method

The most obvious method for measuring workload is to use the primary task. Since primary task performance is often closely tracked, workload can potentially be inferred from the speed and accuracy of task performance.
This type of workload measurement operates under the assumption that "speed and/or accuracy of performance will decrease as workload increases beyond a critical value or threshold for unimpaired performance" (Wierwille & Eggemeier, 1993). Primary task measures can provide high sensitivity and should be considered in any workload analysis. However, others (Hart & Wickens, 1990) have cautioned that subjects may expend more resources to keep performance high, making primary task workload measurement insensitive, especially at low-to-moderate levels where operators can easily compensate for changes in demand.

2.3.4.3 Subjective Workload Ratings

Beyond primary task performance, there are a number of other methods for measuring workload. One of the most commonly used techniques is subjective workload analysis, which usually takes the form of a workload survey. Surveys such as the NASA Task Load Index (TLX) (Hart & Staveland, 1988), the Cooper-Harper Rating Scale (Cooper & Harper Jr, 1969), and the Subjective Workload Assessment Technique (SWAT) (Reid & Nygren, 1988) each take a similar approach, measuring workload through a series of questions. The TLX is the most widely used of these; it has been found to be a reliable measure of workload (Hart, 2006) and provides a multidimensional analysis of which components contribute most to overall workload. The test first asks subjects to rate their perceived workload on six subscales: mental demand, physical demand, temporal demand, own performance, effort, and frustration. Subjects then compare the subscales in pairs, choosing which of each pair was more influential, and these choices are used to weight each component into an overall workload score. The test is generally administered at the end of a trial, but it can also be administered during an experiment to measure workload at different points in time (Thornburg, et al., 2011).
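The weighting step described above can be sketched as follows. This is a minimal illustration assuming 0-100 subscale ratings and the 15 pairwise comparisons possible among the six subscales, with each subscale weighted by the number of comparisons it wins; the function name, data layout, and example ratings are hypothetical and are not taken from the TLX materials or from this thesis.

```python
from itertools import combinations

SUBSCALES = ("mental", "physical", "temporal", "performance", "effort", "frustration")


def weighted_tlx(ratings, choices):
    """Weighted NASA-TLX score from subscale ratings and pairwise choices.

    ratings: dict mapping each subscale to a 0-100 rating.
    choices: dict mapping each of the 15 unordered subscale pairs (a frozenset)
        to the subscale the subject judged more important to workload.
    Each subscale's weight is the number of comparisons it won (0-5). The
    weights sum to 15, so the overall score is sum(rating * weight) / 15.
    """
    pairs = [frozenset(p) for p in combinations(SUBSCALES, 2)]
    if set(choices) != set(pairs):
        raise ValueError("expected one choice for each of the 15 subscale pairs")
    weights = dict.fromkeys(SUBSCALES, 0)
    for pair in pairs:
        winner = choices[pair]
        if winner not in pair:
            raise ValueError(f"{winner!r} is not a member of {sorted(pair)}")
        weights[winner] += 1
    score = sum(ratings[s] * weights[s] for s in SUBSCALES) / len(pairs)
    return score, weights


# Hypothetical subject who always prefers the higher-ranked subscale in each pair.
rank = ["temporal", "mental", "effort", "frustration", "performance", "physical"]
choices = {frozenset(p): min(p, key=rank.index) for p in combinations(SUBSCALES, 2)}
ratings = {"mental": 70, "physical": 10, "temporal": 80,
           "performance": 40, "effort": 60, "frustration": 30}
score, weights = weighted_tlx(ratings, choices)
print(score)  # 64.0
```

Returning the weights alongside the score preserves the multidimensional view: here temporal demand (weight 5) dominates, which a single overall number would hide.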
While subjective tests have been shown to be fairly reliable measures of workload, they lack the purely objective element that is important for measuring the effectiveness of a display or interface. Although they can give good feedback about global levels of workload, these tests cannot generate workload measurements at pinpointed times throughout an experiment. To a human factors engineer, the workload of an operator during the critical phase of a test may be much more important than overall workload, so a global workload measurement may be of limited value. Additionally, Murdock shows that humans are particularly vulnerable to primacy and recency biases in what he calls the serial position effect: people tend to remember events at the beginning and end of a sequence better than events in the middle (Murdock Jr, 1962). Hence, if a critical event occurred in the middle of the experiment, its perceived workload might be attenuated by the time the subject completes a survey like the TLX.

2.3.4.4 Secondary Task Measures

Secondary task measures derive from resource theories of mental workload and operate on the premise that performance on the secondary task will degrade when fewer resources are available to allocate to it (Knowles, 1963). They attempt to measure the reserve capacity of a resource by placing extra demands on the subject in addition to the primary task. Under general Resource Theory, the secondary task type does not matter much as long as it provides a high level of sensitivity. Under Multiple Resource Theory, the secondary task should be paired closely with the primary task so that both demand the same resources (Hart & Wickens, 1990). As discussed previously, a highly diagnostic secondary task will be closely paired with the primary resource being used in order to accurately record the subject's workload on that specific resource.
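Under the resource-theory premise above, reserve capacity is commonly inferred from how much secondary-task performance drops when the primary task is added. A minimal sketch of that comparison follows, assuming a higher-is-better secondary-task score measured once alone and once alongside the primary task; the function name and the chat-task numbers are hypothetical.

```python
def secondary_task_decrement(alone_score, dual_score):
    """Percent drop in secondary-task performance when the primary task is added.

    Assumes higher scores mean better performance (e.g., correct responses per
    block) and that `alone_score` was measured with the secondary task by itself.
    A larger decrement suggests less spare capacity left over by the primary task.
    """
    if alone_score <= 0:
        raise ValueError("single-task baseline must be positive")
    return 100.0 * (alone_score - dual_score) / alone_score


# Hypothetical chat-task data: correct responses per block.
baseline = 40
for label, dual in [("low primary load", 36), ("high primary load", 22)]:
    print(label, secondary_task_decrement(baseline, dual))
# low primary load 10.0
# high primary load 45.0
```

Normalizing by each subject's own single-task score is a design choice that helps with the between-subject variability noted for unrelated secondary tasks.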
There are two primary categories of secondary task measures: unrelated and embedded secondary tasks. Unrelated tasks are often arbitrary tasks that psychologists have developed to measure workload. Examples include time estimation, tracking, memory, tapping, arithmetic, and reaction tasks (O'Donnell & Eggemeier, 1986; Schlegel, Gilliland, & Schlegel, 1986; Wierwille & Eggemeier, 1993). As workload increases, performance on these tasks decreases in a predictable manner. However, these tasks can be intrusive, unnatural, and difficult to standardize if subjects come from diverse backgrounds. A different approach to secondary task measurement is the embedded secondary task, which uses a more natural task discreetly embedded within the interface or system, minimizing suspicion and evoking a more natural response from subjects. Cummings (2004) showed that workload for Tomahawk missile operators could be measured using a chat box interface, which is highly realistic to actual operations, natural to the overall task, and minimally intrusive.

2.3.4.5 Neurophysiological Workload Measures

A third approach to measuring workload throughout an experiment or task is physiological tracking. Physiological tracking allows continuous monitoring of subject state, whereas many primary and secondary task measures can only measure the subject's state at discrete event times. Although physical workload can be a very important factor in environments like high-G military flying (Burton, 1980), scientists interested only in mental workload have several options for workload measurement, all of which begin with the brain.
Cognitive activity is driven by the firing of neurons in different regions of the brain, which consume oxygen and glucose, produce electrical signals, and give off carbon dioxide byproducts (Koechlin, Basso, Pietrini, Panzer, & Grafman, 1999; Miller & Cohen, 2001; Raichle, 2011; Raichle & Mintun, 2006; Ridderinkhof, van den Wildenberg, Segalowitz, & Carter, 2004; Roy & Sherrington, 1890; Speert, 2012; Whyte, 2011). Therefore, studying any one of these elements can be a suitable method for measuring mental activity. Since the brain is always active at some level, each of these elements of neural "combustion" must be compared to a baseline level measured during a control or resting period. Additionally, each person's physiology is different, making absolute comparisons between subjects, or even within subjects on different days, challenging (Rypma & D'Esposito, 1999).

This section gives a brief overview of the noninvasive techniques available. Cabeza (2000), Coyle (2003), and Huppert (2006) provide good overviews of the literature on noninvasive neurocognitive measurement techniques and some guidelines for researchers looking to study mental workload. Invasive techniques such as brain implants and surgery pose higher risk to the subject and higher costs to experimenters, and they are often incompatible with studies of conscious, cognitive behavior, which is the focus of most psychophysiological research. Non-invasive methods are much preferred for healthy subjects participating in simple experiments because of their lower costs and risks, coupled with the relatively high reliability of modern techniques (Bennett & Miller, 2010).

As mentioned previously, there are three main categories of brain activity indicators: supply, products, and byproducts. The supply category can be broken into glucose and oxygen, the two main ingredients required for neural firing.
Glucose uptake can be measured using Positron Emission Tomography, or PET, which uses a radioactive tracer that is analogous to glucose (Buckner & Logan, 2001). Cabeza's review of the many workload measures that have been correlated with glucose uptake shows it to be a viable and accurate method for neurophysiological measurement (Cabeza & Nyberg, 2000). PET scans come at a cost, though, since they require subjects to receive a radioactive tracer, which exposes them to potentially high levels of radiation. Additionally, the radioactive isotopes used for PET scans must be produced on-site and have relatively short half-lives, which limits the possible length of a test (Raichle & Mintun, 2006). The second "supply" element is oxygen. Oxygen is carried to the brain by hemoglobin in the bloodstream, so there are several methods for determining how much oxygen is being supplied to the brain. Transcranial Doppler sonography, or TCD, measures the velocity of blood flow to the brain, or cerebral hemovelocity. Several studies show that increases in cerebral hemovelocity are correlated with increases in mental workload (Droste, Harders, & Rastogi, 1989; Warm, et al., 2009), although this technique is generally limited to whole-brain workload analysis rather than measurement in specific regions. Another method for measuring the oxygen flowing to the brain is functional Near-Infrared Spectroscopy, or fNIRS. Section 2.4 covers this topic in depth, but in short, this technology measures the concentrations of both oxygenated and deoxygenated hemoglobin from the absorption of near-infrared light at specific wavelengths.
While still a relatively nascent technology, fNIRS has been shown to be a viable tool for measuring neural activity in several regions of the brain (Izzetoglu et al., 2005; Sassaroli, et al., 2008; Wolf, Ferrari, & Quaresima, 2007) and a promising method for populations who are unable to lie motionless for extended periods, such as infants and the mentally impaired (Lloyd-Fox, Blasi, & Elwell, 2010; Meek et al., 1998; Sakatani, Chen, Lichty, Zuo, & Wang, 1999).

With an adequate supply of oxygen and glucose, the brain performs cognitive activities through networks of firing neurons (Raichle, 2011). Using ions separated by membranes, the neuron generates electrical potentials that are transmitted throughout the brain. This process is metabolically driven: the neuron uses glucose and oxygen to sustain its electrical activity. When activity is greater, such as during a period of elevated mental activity, neurons consume more glucose and oxygen to fire more rapidly and produce more signals. The electrical output, therefore, is the most direct measure of what is actually occurring in the brain, while the supply chain is merely a support mechanism. Electroencephalography, or EEG, attempts to measure these electrical signals through electrodes placed at various locations on the scalp. This technique has been in use since the 1940s and has been shown to be a viable workload measurement tool (Berka et al., 2007; Berka, et al., 2005; Dussault, Jouanin, Philippe, & Guezennec, 2005), brain-computer interface mechanism (Coyle, et al., 2003; Hu et al., 2011; Tan & Nijholt, 2010), and assistive device for the physically or mentally impaired (Luld et al., 2012). It provides high temporal resolution but has low spatial resolution and is susceptible to motion artifacts such as blinking and head movement (Hu, et al., 2011).
It is a relatively well-developed technique, with algorithms that can filter out many types of noise and artifacts (Manyakov, Chumerin, Combaz, & Van Hulle, 2011), yet it still has a relatively low signal-to-noise ratio and low spatial resolution, and it can be intrusive to subjects (Coyle, et al., 2003; Nijholt, Bos, & Reuderink, 2009; Tan & Nijholt, 2010).

The third phase of the neural firing process is the removal of waste, which generally takes the form of deoxygenated hemoglobin carried away in the bloodstream. Deoxygenated hemoglobin can be measured by fNIRS because of its specific light absorption properties, but it can also be measured through its magnetic properties. Functional Magnetic Resonance Imaging, or fMRI, takes advantage of these magnetic properties to measure the concentration of deoxygenated hemoglobin with excellent spatial resolution (Buckner & Logan, 2001; Carr, Rissman, & Wagner, 2010; Logothetis, Pauls, Augath, Trinath, & Oeltermann, 2001). Numerous studies have measured brain activity during various task types and found activation in different regions, demonstrating the utility of fMRI and reinforcing that different parts of the brain are responsible for different kinds of mental activity (Cabeza & Nyberg, 2000; Causse et al., 2013; Cohen et al., 1993; Curtis & D'Esposito, 2003; Jaeggi et al., 2003; Manoach et al., 1997; McCarthy et al., 1994; Ochsner, Bunge, Gross, & Gabrieli, 2002; Price, 2010; Tootell, Hadjikhani, Mendola, Marrett, & Dale, 1998). One of the key advantages of fMRI is the ability to map the entire brain during an activity, allowing researchers to examine which regions work in conjunction during a given task (Monchi, Petrides, Petre, Worsley, & Dagher, 2001). fMRI is an excellent tool because of its spatial resolution (≤10 mm³), but it has several drawbacks (Carr, et al., 2010). First, its temporal resolution is limited, with a typical system taking 2-4 seconds per slice for 5-16 slices.
It is also expensive to own and operate, with average costs at the Massachusetts General Hospital Martinos Center for Biomedical Imaging running from $200/hour up to $1,100/hour. Since subjects must lie in a tube surrounded by a large and acoustically noisy magnet, it is a very unnatural environment that can reduce cognitive abilities or divert attention from the primary task (Haller et al., 2005). Subjects must also lie very still, which precludes using this technique with many populations, such as the young or the handicapped. Finally, fMRI measures only the deoxygenated hemoglobin signature, so it may be difficult to get a full picture of the hemodynamics, since oxygenated and total hemoglobin levels are closely coupled to deoxygenated hemoglobin. Still, it remains a very effective and standard tool because of its high spatial resolution, whole-brain mapping capability, and extensive history (Bennett & Miller, 2010; Cabeza & Kingstone, 2001).

In contrast to fMRI, fNIRS is less capable of measuring the entire brain but can be used in more natural settings (Ayaz et al., 2012; Girouard et al., 2009; Helton et al., 2010; Hirshfield et al., 2009; Izzetoglu et al., 2011; Sassaroli, et al., 2008; Shimizu et al., 2009; Solovey, 2009; Solovey, et al., 2012; Son, Guhe, Gray, Yazici, & Schoelles, 2005; Tsujimoto, Yamamoto, Kawaguchi, Koizumi, & Sawaguchi, 2004; Tsunashima & Yanagisawa, 2009). fNIRS sensors are generally mounted on a headband or cap that is quickly applied and comfortable for extended wear (hours), and they can be used in realistic settings like desktop computer work (Solovey, 2009), simulators (Shimizu, et al., 2009; Tsunashima & Yanagisawa, 2009), or actual on-road driving (Takahashi et al., 2011). This type of sensor setup can be seen below in Figure 3 and Figure 4. The use of fNIRS in natural settings is an important distinction because laboratory tests do not always translate into the complexities and nuances of real-world activities.
The lack of whole-brain mapping capability with fNIRS may provide less information for drawing correlations between brain regions and mapping neural networks, but researchers studying mental workload, working memory, and other executive functions are primarily focused on the prefrontal region, so this may be an acceptable tradeoff. If a researcher were interested in another type of activity, such as a motor task, fNIRS sensors could be applied to that region of the brain to measure the related activity. This has been done in populations that cannot lie still, such as the mentally handicapped or infants, for whom fMRI is difficult or impossible (Meek, et al., 1998).

Figure 3: fNIRS sensor diagram for prefrontal cortex measurement (Sassaroli, et al., 2008)

Figure 4: Operator with fNIRS sensors mounted on forehead

2.3.4.6 Other Physiological Workload Measures

While measuring the brain may provide the most direct measures of mental activity, many other secondary measures have been shown to reliably predict mental workload (Kramer, 1991). Physiological measures have several pros and cons. Their intrusiveness ranges from minimal, for items like heart rate monitors, to higher, for eye trackers or respiratory monitors. However, they are generally non-intrusive to the primary task, can measure activity even when the subject is not physically interacting with the system, provide a multidimensional measurement of the subject, and can record continuous data. On the other hand, they require specialized equipment, their data are often relatively noisy, and physiological signals can come from multiple bodily sources that may have little to do with the experiment, so experimenters must always weigh the pros and cons when considering a physiological measurement (Kramer, 1991).
Some of these measures, such as heart rate or blood pressure, can be directly traced to increased demand by the brain, but many are secondary responses that are merely correlated with increased mental workload. Physiological measures can be divided roughly into three categories: cardiovascular measures, ocular measures, and sympathetic nervous system measures. While other types of measures such as posture (D'Mello, et al., 2007) or muscle tension (Wierwille, 1979) have been suggested, the vast majority of non-neurological physiological measures fall into one of these categories.

Cardiovascular measures are some of the most commonly used methods for tracking workload over time. Kramer lists the most common measures of cardiac activity, including electrocardiogram (ECG or EKG) measures like heart rate and heart rate variability, blood pressure measures, and blood volume measures. Generally, heart rate and blood pressure increase during periods of high mental workload, while heart rate variability decreases (Durantin, et al., 2014; Hjortskov et al., 2004; Pattyn, et al., 2008; Roscoe, 1992; Sirevaag, et al., 1993). These measures are relatively easy to obtain with a simple heart rate and blood pressure monitor, both of which are minimally intrusive to the subject. Respiratory rate can also serve as an indicator of higher mental workload, although it can be a more intrusive measurement technique since the subject must wear a mask of some kind (Roscoe, 1992; Sirevaag, et al., 1993).

There are several measures of eye activity associated with mental workload. Pupil dilation has been found to be a good measure of workload, with increased dilation occurring during periods of high workload (Beatty, 1982).
In addition to the pupillary response, some eye movement components, such as dwell times and scan pattern variability, have been shown to correlate well with mental workload, while others, such as eye speed and large-amplitude movement frequency, are more weakly correlated (May, Kennedy, Williams, Dunlap, & Brannan, 1990). Finally, several eyelid measures such as blink rate, blink pattern, and closure duration have been proposed as possible workload measures with varying degrees of confidence (Wilson, 2002). Overall, changes in pattern appear to be more telling than changes in raw characteristics when measuring movement factors (Kramer, 1991). Much progress has been made in the past few years in minimizing the intrusiveness and improving the usability of eye-tracking devices, with various head-mounted and desktop-mounted trackers now available.

The third type of physiological workload measurement relies on responses of the sympathetic nervous system, or SNS, which is part of the autonomic nervous system, or ANS. The SNS is commonly associated with the "fight-or-flight" response and stimulates many systems in the body when activated. One of the most commonly used measures is galvanic skin response, which measures sweat produced in certain regions of the skin. Galvanic skin response has been associated with mental workload in several different environments (Berguer, et al., 2001; Davies & Krkovic, 1965; Wierwille, 1979) and is a relatively non-intrusive technique. Another SNS response to workload is the release of hormones such as catecholamines (Frankenhaeuser & Lundberg, 1982; Lundberg, 2005), cortisol (Dickerson & Kemeny, 2004), and norepinephrine, also called noradrenaline (Frankenhaeuser, et al., 1971; Frankenhaeuser & Patkai, 1965). These indicators can be measured in real time but are generally measured from blood drawn immediately after an experiment, which can make this a more intrusive method.
The main concern with all of these SNS responses is that they are often criticized as measuring stress rather than mental workload, two different phenomena that are often, but not necessarily, linked (Hancock & Desmond, 2001). Therefore, it is prudent to use these methods in conjunction with other physiological measures in order to reduce type I error. Overall, the use of physiological measures of workload is promising but should always be approached with caution regarding confounding variables. Many factors beyond mental workload can influence physiological responses, such as physical activity, environmental conditions, sleep and fatigue, metabolism, and physical conditioning, so it is incumbent upon the researcher to control as many variables as possible. Additionally, it is sometimes difficult to link a behavioral event with certainty to a physiological measure, because these are all secondary functions of the actual mental activity in the brain, so there will always be a degree of systematic error involved. With that in mind, these measures are also relatively cheap, well-studied, and minimally intrusive methods for collecting information about the state of the operator at different times throughout a task.

2.4 Neurophysiology and functional Near-Infrared Spectroscopy (fNIRS)

Since fNIRS is central to the experiment described in this thesis, this section expands on the neurophysiological phenomena measured by fNIRS and how those phenomena correlate with mental workload. This section also discusses some of the limitations of using fNIRS.

2.4.1 Neurophysiology

At the core of neural hemodynamic studies such as fNIRS and fMRI is the blood oxygen level-dependent (BOLD) signal.
Quoting from the 2011 Encyclopedia of Clinical Neuropsychology regarding the BOLD signal in fMRI:

BOLD imaging is a version of magnetic resonance imaging that depends on the different magnetic properties of oxygenated versus deoxygenated hemoglobin and, thus, indirectly, on variations in local tissue perfusion. The utility of BOLD imaging for fMRI also depends on the physiological phenomenon by which metabolically active cerebral tissue "demands" more perfusion than less-active tissue. Thus, populations of neurons that are particularly active during a cognitive or motor task actually elicit a surplus of perfusion which, in turn, results in an increase in the ratio of oxygenated to deoxygenated hemoglobin, detectable as a change in the BOLD signal. (Whyte, 2011)

This phenomenon can be traced back to 1890, when Roy and Sherrington noticed that regional blood flow increased in areas of neural activity (Roy & Sherrington, 1890). Today it is generally accepted that increased neural activity in a region demands greater blood flow to supply more oxygenated hemoglobin and to remove deoxygenated hemoglobin. While very useful for its noninvasive qualities, the BOLD signal is an indirect measure of what scientists are truly trying to measure: neural activity. The hemodynamic response is a byproduct of neural activity and is often delayed by several seconds or confounded by other tissue or hemodynamic phenomena. Additionally, BOLD measures the relationship between oxygen delivery and oxygen extraction rather than actual oxygen consumption, as PET does. Finally, fMRI techniques generally extract useful information by comparing two conditions of activity rather than against an absolute reference, which makes fMRI an excellent tool for classic "stimulus-response"-type experiments but less useful for naturalistic experiments, where there is still some debate about what a "resting" state truly looks like or whether one even exists (Whyte, 2011).
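Because hemodynamic measures are interpreted relative to a reference condition rather than on an absolute scale, a common first step is to express a task-period signal relative to a resting baseline. The sketch below shows that comparison; it assumes a signal with a positive resting level (e.g., a blood-flow index), and the function name, window boundaries, and sample values are hypothetical rather than the processing pipeline used in this thesis.

```python
from statistics import mean


def percent_change_from_baseline(signal, times, baseline_window, task_window):
    """Mean task-window signal as percent change from the mean baseline-window signal.

    signal, times: equal-length sequences of samples and their timestamps (s).
    baseline_window, task_window: (start, end) tuples; start inclusive, end exclusive.
    Assumes the signal has a positive resting level (e.g., a blood-flow index).
    """
    def window_mean(start, end):
        values = [s for s, t in zip(signal, times) if start <= t < end]
        if not values:
            raise ValueError(f"no samples in window [{start}, {end})")
        return mean(values)

    base = window_mean(*baseline_window)
    if base <= 0:
        raise ValueError("baseline mean must be positive")
    return 100.0 * (window_mean(*task_window) - base) / base


# Hypothetical 1 Hz recording: 10 s rest followed by 10 s of task.
times = list(range(20))
signal = [100] * 10 + [104] * 10
print(percent_change_from_baseline(signal, times, (0, 10), (10, 20)))  # 4.0
```

The choice of baseline window matters: if no true "resting" state exists, the result is only as meaningful as the reference period chosen.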
2.4.2 fNIRS Background

fNIRS originated with Jöbsis's 1977 discovery that light at certain wavelengths in the near-infrared spectrum passes through bone and tissue but is absorbed and scattered by hemoglobin (Jöbsis, 1977). Subsequent development refined the technique to accurately measure oxygenated and deoxygenated hemoglobin in the brain and other regions of the body (Chance et al., 1998). fNIRS works by injecting near-infrared light at specific wavelengths from light-emitting diodes or lasers into the region of interest. This light passes through the skin and bone and is absorbed and scattered by the hemoglobin circulating through the region, as shown in Figure 5 and Figure 6. The amount of light returned to the sensor is measured by photomultipliers, and these measurements are converted into the digital signal used for post-processing. Finally, this optical signal is processed through the Modified Beer-Lambert Law to convert optical intensity into hemoglobin concentration. This general process applies to all fNIRS devices, although the exact methods for measuring hemoglobin vary slightly between machines.

Figure 5: Scattering of Photons in Tissue (from ISS, Inc.)

Figure 6: Light Penetration in Brain Tissue using NIRS (from ISS, Inc.)

NIRS devices for fingertip blood oxygenation are nearly ubiquitous in the medical community, but the use of fNIRS as a cognitive research instrument is still modest, though growing (Wolf, et al., 2007). fNIRS could be considered a cousin of fMRI because both measure the same BOLD signal, but in different ways. While fMRI measures hemoglobin levels using the different magnetic properties of oxygenated and deoxygenated hemoglobin, fNIRS observes the same physiological changes by measuring the absorption of light at 690 nm and 830 nm.
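The conversion step named above, from detected light intensity at two wavelengths to changes in oxygenated and deoxygenated hemoglobin, can be sketched with one common form of the Modified Beer-Lambert Law: ΔOD(λ) = −log10(I/I0) = [ε_HbO(λ)·Δ[HbO] + ε_HbR(λ)·Δ[HbR]]·d·DPF(λ), where d is the source-detector separation and DPF is the differential pathlength factor. With two wavelengths this is a 2×2 linear system. This is a minimal sketch only: the function name, the base-10 optical-density convention, and every numeric coefficient below are illustrative assumptions, and individual devices may implement the conversion differently.

```python
import math


def mbll_two_wavelength(intensity, baseline, extinction, distance_cm, dpf):
    """Changes in oxy- and deoxy-hemoglobin (mM) from two-wavelength intensities.

    intensity, baseline: {wavelength_nm: detected light}, exactly two wavelengths.
    extinction: {wavelength_nm: (eps_HbO, eps_HbR)} in 1/(mM*cm), base-10 convention.
    dpf: {wavelength_nm: differential pathlength factor}.
    For each wavelength: dOD = -log10(I/I0) = (eps_HbO*dHbO + eps_HbR*dHbR) * d * DPF.
    """
    w1, w2 = sorted(intensity)  # raises ValueError unless exactly two wavelengths
    rhs = []
    for w in (w1, w2):
        d_od = -math.log10(intensity[w] / baseline[w])
        rhs.append(d_od / (distance_cm * dpf[w]))
    (eo1, er1), (eo2, er2) = extinction[w1], extinction[w2]
    det = eo1 * er2 - er1 * eo2
    if abs(det) < 1e-12:
        raise ValueError("extinction coefficients do not separate HbO and HbR")
    d_hbo = (rhs[0] * er2 - er1 * rhs[1]) / det  # Cramer's rule
    d_hbr = (eo1 * rhs[1] - eo2 * rhs[0]) / det
    return d_hbo, d_hbr


# Round-trip check with placeholder coefficients (NOT tabulated values).
EXT = {690: (0.35, 2.1), 830: (1.0, 0.78)}
DPF = {690: 6.0, 830: 6.0}
I0 = {690: 1.0, 830: 1.0}
true_hbo, true_hbr, d = 0.002, -0.001, 3.0
I = {w: I0[w] * 10 ** (-(EXT[w][0] * true_hbo + EXT[w][1] * true_hbr) * d * DPF[w])
     for w in EXT}
print(mbll_two_wavelength(I, I0, EXT, d, DPF))  # approximately (0.002, -0.001)
```

Two wavelengths on opposite sides of hemoglobin's isosbestic point are what make the system solvable: deoxygenated hemoglobin absorbs more strongly at the shorter wavelength and oxygenated hemoglobin at the longer one, so the determinant stays well away from zero.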
Because the optical properties of hemoglobin differ between its oxygenated and deoxygenated forms, the concentration of each can be determined when these properties are combined with models of the optical properties of brain tissue. Since fNIRS and fMRI measure the same physiological response (hemoglobin concentration), several researchers have tried to correlate results from the more established fMRI with the relatively nascent fNIRS in order to lend greater credibility to independent fNIRS studies. This comparison has proven valid in a number of studies, most notably those by Strangman (2002) and Steinbrink (2006). Strangman shows that the response recorded by fNIRS during a motor task was similar to the response simultaneously recorded by fMRI, and Steinbrink reviews nineteen simultaneous fNIRS-fMRI studies that show general overlap in conclusions. Cui captured simultaneous fNIRS and fMRI data in the frontal and parietal lobes during a battery of cognitive tasks and found a lower signal-to-noise ratio for fNIRS but a highly correlated response (Cui, Bray, Bryant, Glover, & Reiss, 2011).

fNIRS can also be used in complementary roles with EEG, as shown by Hirshfield (2009). EEG is also commonly used in brain-computer interfaces. EEG is a well-studied method for measuring brain activity, but it has several limitations. Although "dry" electrode systems exist, many systems still require applying gel to the scalp to attach the electrodes, which is inconvenient for use outside the lab. Additionally, EEG is susceptible to several types of artifacts, including physiological differences, environmental changes, body movements, and especially ocular movements (blinking, eye movement, etc.) (Hu, et al., 2011). Finally, all data must be processed through feature extraction, which is highly context-dependent and relies on algorithms to select the best features for real-time analysis.
Using fNIRS simultaneously with EEG may help to show the overlap in capabilities, so that researchers already well versed in EEG can confidently use methods or results from fNIRS experiments. To summarize, using multiple brain recording devices allows researchers to record a response in multiple channels, increasing confidence that a response truly occurred while reducing the influence of noise in any single channel.

2.4.3 How fNIRS measures cognitive activity

Neural activity in a local region generally results in an increase in oxygenated hemoglobin and a decrease in deoxygenated hemoglobin, although this simplification does not capture the full complexity of the brain activation response (Raichle & Mintun, 2006). While a simplification, it does reflect some of the overall mechanisms of the brain as a muscle: neurons consume glucose and oxygen to produce electrical signals, and local blood flow adjusts to that demand through a process called neurovascular coupling (León-Carrión & León-Domínguez, 2012). As Raichle and Mintun point out, though, while the brain consumes nearly 20% of the body's total energy, it accounts for only roughly 3% of the body's mass, and much of that activity occurs even in the resting state. Therefore, the relative changes due to functional activation are not nearly as large as might be expected, for example, in the biceps during a weighted curl compared to rest. Brain tissue does not store oxygen well, so supplies must be constantly replenished, which means that studying the blood flow is roughly equivalent to studying the overall content of the blood volume. fNIRS measures the concentration of oxygenated and deoxygenated hemoglobin in a localized region. Since these resources are constantly being used, even during rest, there must be a constant resupply of oxygenated hemoglobin to the region. The high ratio between blood flow and blood volume indicates that while little oxygen is stored in the brain, much oxygen is still required.
This is confirmed by the fact that most humans lose consciousness in less than 15 seconds after blood flow to the brain is cut off. However, the brain uses only about 40% of the oxygen that passes through it under normal conditions, indicating that there is generally some excess capacity to handle instantaneous jumps in activity. The "aerobic" mechanism that supplies more oxygen to regions with sustained activity does not begin to engage until 4-6 seconds after the initial response. Some have postulated that there may be an "initial dip" in the first second of activity, but once the brain senses sustained activity, it increases overall blood flow to provide the region with more oxygenated hemoglobin (Marxen, Cassidy, Dawson, Ross, & Graham, 2012; Obata et al., 2004; Steinbrink, et al., 2006). There are several hypotheses for the "initial dip", including local changes in oxygenation due to metabolic demand and an increase in total blood volume. The signal-to-noise ratio of these systems makes this question very difficult to answer, which is why transient phenomena shorter than one second, both immediately following stimulus presentation and during the return to baseline, remain somewhat controversial and under investigation (Marxen, et al., 2012; Schroeter, Kupka, Mildner, Uludag, & von Cramon, 2006). Additionally, there is often a tradeoff between spatial and temporal resolution: fMRI is generally too slow, while fNIRS lacks sufficient spatial resolution. This challenge is especially apparent in real-time systems, where both spatial and temporal resolution are obstacles to quick, precise measurements.
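The delayed, sustained response described above is often modeled in fMRI analysis with a canonical double-gamma hemodynamic response function: a positive lobe peaking roughly 5 seconds after a brief stimulus, followed by a smaller, later undershoot. The sketch below uses commonly cited shape parameters (6 and 16, unit scale, undershoot ratio 1/6) as assumptions; it does not model the hypothesized initial dip, and it is not the analysis method of this thesis.

```python
import math


def gamma_pdf(t, shape, scale=1.0):
    """Gamma probability density; zero for t <= 0."""
    if t <= 0:
        return 0.0
    return t ** (shape - 1) * math.exp(-t / scale) / (math.gamma(shape) * scale ** shape)


def double_gamma_hrf(t, peak_shape=6.0, undershoot_shape=16.0, ratio=1 / 6):
    """Double-gamma response to a brief stimulus at t = 0 (t in seconds).

    With unit scale, the positive lobe peaks near t = peak_shape - 1 (about 5 s)
    and the undershoot is centered near t = undershoot_shape - 1 (about 15 s).
    """
    return gamma_pdf(t, peak_shape) - ratio * gamma_pdf(t, undershoot_shape)


def block_response(duration_s, total_s, dt=0.1):
    """Predicted response to a sustained stimulus lasting `duration_s` seconds,
    computed as a discrete convolution of a boxcar with the HRF (step dt)."""
    n = int(round(total_s / dt))
    kernel = [double_gamma_hrf(i * dt) * dt for i in range(n)]
    boxcar = [1.0 if i * dt < duration_s else 0.0 for i in range(n)]
    return [sum(boxcar[j] * kernel[i - j] for j in range(i + 1)) for i in range(n)]
```

For example, `block_response(20, 60)` gives the predicted time course (sampled every 0.1 s) for a 20-second task block: a rise over the first several seconds, a plateau while the task continues, and a return toward baseline with a slight undershoot after it ends, qualitatively matching the "aerobic" phase and return to baseline described in the text.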
Even though research into these transient phenomena is still nascent, a much broader body of work has found that deoxygenated hemoglobin levels generally decline during the first 4-6 seconds of the response to increased mental workload and then trend back towards steady state. Overall, sustained mental workload is correlated with a sustained increase in oxygenated hemoglobin and, for deoxygenated hemoglobin, an initial drop followed by recovery towards equilibrium (Steinbrink, et al., 2006). The nominal response is shown below in Figure 7, with the hypothesized "initial dip", the "aerobic" response corresponding to sustained perfusion and neuronal activity, and finally a return to baseline after the response-generating event has concluded. According to researchers at Tufts University's Biomedical Engineering Department, the magnitude of the response is generally 3-5% of the baseline flow, but can range up to 10% for total blood flow during intense activity (Mandeville et al., 1999).

[Figure 7: Nominal hemodynamic response. Notional oxygenated (Oxy) and deoxygenated (Deoxy) hemoglobin traces plotted against time from stimulus onset (sec), showing the event and the "steady state" periods before and after it.]

2.4.4 Using fNIRS to measure workload

Many see brain-computer interfaces as a way to measure mental workload to ultimately ease the cognitive burden on humans, or to assist people with disabilities (Tan & Nijholt, 2010). As a workload-measurement device, fNIRS is also compatible with other methods traditionally used by cognitive psychologists and human factors engineers, such as heart rate, eye-tracking, and skin-response measures. Several studies have used fNIRS together with techniques like heart rate variability (Durantin, et al., 2014), or with combinations of systemic signals like heart rate, blood pressure, galvanic skin response, respiration, and scalp blood flow (Jelzow et al., 2011), showing the correlation between mental workload and these secondary physiological responses.
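The shape sketched in Figure 7 can be approximated numerically. The code below is a notional model only, not the method used in this thesis: it borrows the double-gamma impulse response commonly used as a canonical hemodynamic response in fMRI analysis (shape parameters 6 and 16 with an undershoot ratio of 1/6, values similar to common fMRI-software defaults and treated here as assumptions), then convolves it with a boxcar "event" to produce a unitless, oxygenated-hemoglobin-like curve. The impulse response peaks about 5 s after onset, the block response plateaus while the event is sustained, and it returns toward baseline afterwards. The hypothesized initial dip is not modeled, and all function names are illustrative.

```python
import math


def gamma_pdf(t, shape, scale=1.0):
    """Gamma probability density evaluated at time t (seconds); zero for t <= 0."""
    if t <= 0:
        return 0.0
    return t ** (shape - 1) * math.exp(-t / scale) / (scale ** shape * math.gamma(shape))


def double_gamma_hrf(t, peak_shape=6.0, undershoot_shape=16.0, ratio=1.0 / 6.0):
    """Notional impulse response: a positive lobe peaking near (peak_shape - 1) s
    minus a smaller, later undershoot lobe. Parameter values are assumptions."""
    return gamma_pdf(t, peak_shape) - ratio * gamma_pdf(t, undershoot_shape)


def block_response(event_s, total_s, dt=0.1):
    """Discretely convolve the impulse response with a boxcar stimulus that is
    'on' for the first event_s seconds; returns one unitless value per step."""
    n = int(round(total_s / dt))
    kernel = [double_gamma_hrf(i * dt) * dt for i in range(n)]
    stimulus = [1.0 if i * dt < event_s else 0.0 for i in range(n)]
    return [
        sum(stimulus[j] * kernel[i - j] for j in range(i + 1))
        for i in range(n)
    ]


if __name__ == "__main__":
    dt = 0.1
    impulse = [double_gamma_hrf(i * dt) for i in range(300)]
    peak_time = impulse.index(max(impulse)) * dt
    print(f"impulse response peaks at about {peak_time:.1f} s")
    curve = block_response(event_s=30.0, total_s=80.0, dt=dt)
    print(f"near end of a 30 s event: {curve[299]:.2f}; at 80 s: {curve[-1]:.3f}")
```

Modeling HbR as a scaled, sign-flipped copy of this curve would be a further simplification; the measured responses discussed above need not follow either shape exactly.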
Jelzow found that galvanic skin response and mean blood pressure have the highest correlation with hemoglobin concentration changes, with r values up to 0.5. Additionally, fMRI studies such as Gianaros (2004) and Napadow (2008) show significant correlations between specific brain regions and cardiac measures like heart rate variability, with t-values ranging from 4 to 6 (p<0.05). Together, these studies suggest that fNIRS correlates well with other physiological signals and can capture additional information about cortical activity. Psychophysiological measures are especially useful because they have the potential to discriminate mental activity from stress alone. Stress often accompanies mental workload but is not a prerequisite, since it is possible to have high levels of stress but low levels of mental activity (Wierwille, 1979). Secondary physiological responses can help reinforce the brain data or help remove potential artifacts, such as overall changes in blood pressure or heart rate. Similar to the miniaturization and commercialization of other biometric measuring devices, as fNIRS is miniaturized and made mobile, there will likely be an expanding pool of researchers using this device to measure blood flow in the brain. With miniature or wireless devices, fNIRS could be used outside of tightly controlled research environments to capture mental activity in entirely natural environments like the cockpit, where stress and danger are far more realistic. A wireless system would allow for even greater comfort, improved mobility, and possible introduction into more extreme environments where a tethered system would be impossible, impractical, or unsafe.
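The correlation and artifact-removal ideas above can be sketched in a few lines. This is an illustration under stated assumptions, not the procedure of Jelzow et al. or of this thesis: it assumes an HbO trace and a systemic signal (hypothetically, mean blood pressure) that have already been resampled to a common rate and aligned in time. `pearson_r` computes the kind of correlation coefficient reported above, and `regress_out` subtracts the least-squares linear contribution of the systemic signal from the hemodynamic trace; both names are illustrative.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = _mean(x), _mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def regress_out(signal, regressor):
    """Subtract the least-squares linear fit of `regressor` (e.g. a systemic
    signal such as mean blood pressure) from `signal`, keeping the signal's mean."""
    ms, mr = _mean(signal), _mean(regressor)
    beta = sum((r - mr) * (s - ms) for s, r in zip(signal, regressor)) / sum(
        (r - mr) ** 2 for r in regressor
    )
    return [s - beta * (r - mr) for s, r in zip(signal, regressor)]
```

One design caveat: if the systemic signal itself rises with workload, regressing it out also removes part of the task-related hemodynamic response, so this step is a trade-off rather than a free cleanup.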
Additionally, smaller and cheaper devices could make brain-computer interfaces more ubiquitous as a viable commercial product for adaptable automation systems. While fNIRS holds great promise, it is still a developing technology with many problems yet to be solved. Challenges in using fNIRS were identified in the literature (León-Carrión & León-Domínguez, 2012; Wolf, et al., 2007) and through discussions with experts in the domain, such as Dr. Angelo Sassaroli of Tufts University's Department of Biomedical Engineering. The sensors are very sensitive to light, so measures must currently be taken to minimize exterior light, whether by shielding the sensors or by dimming the testing environment, which dampens the realism of some situations. If the probes are applied incorrectly, fNIRS is also susceptible to light channeling, which can flood the sensor with ambient light and wash out any actual signal (Wolf, et al., 2007). While fNIRS is relatively robust to motion artifacts like eye blinks or minor head movement, it is still susceptible to gross head movements, so it cannot yet be counted on to be reliable in some critical tasks. There are several filtering methods to eliminate artifacts, but these methods can still fail and have only been tested in tightly controlled laboratory experiments (Solovey, 2009). At a physiological level, fNIRS also has several limitations, summarized by León-Carrión and León-Domínguez (2012). fNIRS is limited in the depth to which it can measure into the brain, due both to the scattering and absorption that increase as light travels farther into the tissue and to the optical window of the wavelengths used. This limits the ability to study activity deep in the brain, restricting studies to the outer structures. Dark skin or hair can also absorb some wavelengths and attenuate signals.
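As an illustration of the simplest kind of artifact screening, the hypothetical rule below flags any sample-to-sample jump in a raw intensity trace that exceeds a threshold, plus a padding window around it. It is not Homer2's motion-artifact routine or any published method; the threshold (`max_jump`) and padding (`pad_s`) are assumptions that would need tuning per device and subject, and the default sampling rate of 12 Hz is taken from Table 2 in Chapter 3.

```python
def flag_motion_artifacts(signal, fs_hz=12.0, max_jump=1.0, pad_s=0.5):
    """Return a list of booleans marking samples near abrupt jumps.

    A sample is flagged if it lies within pad_s seconds of any sample-to-sample
    change larger than max_jump (same units as the signal). The default
    threshold and padding are hypothetical values, not recommendations.
    """
    pad = int(round(pad_s * fs_hz))
    flagged = [False] * len(signal)
    for i in range(1, len(signal)):
        if abs(signal[i] - signal[i - 1]) > max_jump:
            # Mark a window of +/- pad samples around the jump.
            for k in range(max(0, i - pad), min(len(signal), i + pad + 1)):
                flagged[k] = True
    return flagged
```

Flagged samples could then be excluded or interpolated. A jump rule like this would miss slow drifts from a gradually shifting probe, which illustrates why such methods remain fallible outside controlled settings.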
Additionally, brain tissue composition and geometry vary slightly between individuals, so it is difficult to obtain absolute measures of hemoglobin levels without knowing the individual's differential pathlength factor (DPF). The DPF is a constant used in the calculation of hemoglobin concentration and can be obtained by measuring the absorption of light through the tissue. Without the DPF, it is only possible to draw conclusions about relative changes, not absolute ones. The actual DPF can be determined through additional measurements and processing, but it is generally treated as a constant under the assumptions of homogeneous tissue makeup and constant brain geometry from one measurement to another (Wolf, et al., 2007). The other main assumption of fNIRS is the diffusion approximation. In order to use the Boltzmann transport equation to convert light intensity into hemoglobin concentration, it must be assumed that the tissue is homogeneous, that scattering is much larger than absorption, and that the tissue has a specific geometry (Wolf, et al., 2007). fNIRS is also limited in its ability to detect highly localized changes in activity. When localized activity occurs and oxygen is consumed, the brain reacts by oversupplying the entire region with oxygen, and it is this regional response that fNIRS detects. This physiological limitation allows researchers to determine the general region of activity but precludes knowing the precise location of activation. Measuring the initial response in the first second is difficult because of limited spatial resolution and a low signal-to-noise ratio, and these transient phenomena can likely only be definitively quantified with improved equipment and processing techniques (Marxen, et al., 2012). Finally, because fNIRS relies on the hemodynamics of the brain, its precision and accuracy are limited by the biological mechanisms that control the brain and by the individual differences between people.
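To make the role of the DPF concrete, the sketch below applies the Modified Beer-Lambert Law in its common two-wavelength form: the change in optical density at each wavelength equals the extinction-weighted sum of the changes in HbO and HbR, multiplied by the source-detector separation and the DPF, so two wavelengths give a 2x2 linear system. The 690 nm and 830 nm wavelengths and the 3.0 cm separation appear in Table 2 in Chapter 3; the extinction coefficients below are approximate values of the kind found in standard tabulations and should be treated as placeholders, and the DPF of 6 in the usage comment is a hypothetical value. This is a sketch of the textbook relation, not Homer2's implementation.

```python
import math

# Approximate molar extinction coefficients (cm^-1 per mol/L, base-10
# absorbance) as (HbO, HbR). Placeholder values: check them against a
# published tabulation before any real use.
EXTINCTION = {
    690: (276.0, 2051.96),
    830: (974.0, 693.04),
}


def delta_od(intensity, baseline_intensity):
    """Change in (base-10) optical density relative to a baseline intensity."""
    return -math.log10(intensity / baseline_intensity)


def mbll(dod_690, dod_830, separation_cm, dpf_690, dpf_830):
    """Solve the two-wavelength MBLL system for (delta HbO, delta HbR) in mol/L.

    Model: dOD(wl) = (eps_HbO(wl) * dHbO + eps_HbR(wl) * dHbR) * d * DPF(wl)
    """
    eo1, er1 = EXTINCTION[690]
    eo2, er2 = EXTINCTION[830]
    # Divide out the effective optical path length d * DPF at each wavelength.
    b1 = dod_690 / (separation_cm * dpf_690)
    b2 = dod_830 / (separation_cm * dpf_830)
    # Cramer's rule for [[eo1, er1], [eo2, er2]] @ [dHbO, dHbR] = [b1, b2].
    det = eo1 * er2 - er1 * eo2
    d_hbo = (b1 * er2 - er1 * b2) / det
    d_hbr = (eo1 * b2 - b1 * eo2) / det
    return d_hbo, d_hbr


# Hypothetical usage: intensities relative to a resting baseline, a 3.0 cm
# separation, and an assumed DPF of 6 at both wavelengths.
# d_hbo, d_hbr = mbll(delta_od(0.97, 1.0), delta_od(0.99, 1.0), 3.0, 6.0, 6.0)
```

If the same DPF is assumed at both wavelengths, an error in it rescales both recovered concentration changes by the same factor. Relative comparisons therefore survive, while absolute values do not, which matches the limitation described above.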
While there are still many problems to solve with fNIRS, it remains a promising technology that could help researchers better understand human cognition and augment human performance.

2.5. Summary

This chapter described the important phenomena surrounding jobs that are primarily low intensity but have critical periods requiring operator engagement and performance, such as BMD, ATC, or UAV operations. Operators of these systems must manage extended periods of vigilance and boredom, which have been shown to be detrimental to performance and operator well-being. Mental workload is one of the critical elements of performance and should be kept from becoming either too high or too low. When operators transition from one level of mental workload to another, several additional stressors can impact cognitive abilities, and designers should take care to provide adequate decision aids to facilitate these transitions. Mental workload can be measured in many ways, such as task performance, subjective ratings, or physiological measures. The physiological response of the brain has been measured through devices such as EEG, fMRI, and fNIRS to track different physiological signs of increased activity. fNIRS is a noninvasive method for tracking mental workload by measuring blood oxygenation in specific regions of the brain, and is a promising tool for psychophysiological study both in the lab and in natural settings outside it. The next chapter introduces a study conducted using fNIRS to measure the neurophysiological response of military-age subjects during a simulated BMD mission, examining how varying periods of low task load can impact the response during a cognitively challenging task.

3. Experimental Methods

3.1 Experimental Framework

The experiment employed a simulation designed to mimic aspects of the job of an Unmanned Aerial Vehicle (UAV) sensor operator.
This operator's job is to track threatening objects in a ballistic missile defense environment until their positions are known with sufficient accuracy that they can be engaged by defense missiles. The engagement timeline is very short, so many actions and decisions must be made very quickly, often with uncertain information and potentially without warning. The simulation was a Java-based environment in which subjects were required to allocate radar tracking assets to reduce the track error on simulated ballistic missile threats to a specified threshold in an unclassified test scenario. The primary task of the operator was to allocate assets to track threats and reduce track error; the secondary task was to monitor the text message window, known as the chat box, for messages or alerts, and to monitor the map for situational awareness during the test. The scenario was the launch of an unknown number of threats from a mid-Pacific location towards other locations in the Pacific. The system simulated satellite coverage that alerted the operator when the threats were launched. The operator was then required to track the threats via UAV sensors to achieve a predetermined accuracy. In the simulation, all threats followed one of 3 predetermined trajectories unknown to the operator, and the operator achieved the required accuracy by allocating the tracking sensors on 3 UAVs to the threats. The tracks are subsequently fed to the Fire Control System, which uses the track data to engage targets with interceptor missiles, but this aspect is not part of the study. The subjects were informed that their mission was to achieve a certain level of tracking accuracy on all targets. If the threshold was not met, the interceptor missiles could not be fired, since they cannot acquire the targets independently.
The operator had the following displays, which can be seen together in Figure 8 and in detail in Table 1:

* 3 windows showing the field of regard of the tracking sensor for each UAV
* 1 window showing track accuracy for a selected target
* 1 message panel recording events and system messages
* A map showing a 2D representation of the UAVs and targets
* A timer and clock
* A chat box

The primary goal of the operator was to assign UAV sensors to track the threats. Each UAV can track only one threat at a time, and the objective was to track each threat long enough to achieve a track error below the specified threshold. If necessary, the operator could use multiple UAVs together for "stereo viewing" to achieve a lower track error much more rapidly than when using only one UAV per target. Thirty to sixty seconds after launch, the targets became visible in the UAV sensor tracking windows, which can be seen on the left side of the display in Figure 8. The participant did not know when the event would occur and received no direct alert that an event was about to begin; the targets suddenly appeared in the UAV Sensor Tracking Windows and in the Track Error Display at the start of the event. Before the event, the operator received a message in the System Message Display that the system was on alert, but also received at least one false alarm message, as seen in Appendix F. These messages helped reinforce the supervisory control task and did not directly signal the start of an event, but may have provided some priming for subjects. The missiles remained in the UAV sensor tracking window until they left the field of regard of the UAV. As the threats passed through the fields of regard, the operator had to re-task the UAVs to achieve the required error on all the threats. Other tasks included monitoring the chat box for new messages and monitoring the map to maintain situational awareness of where the threats were at all times.
[Figure 8: Operator Display. Labeled components: System Message Display, Track Error Display, 2-D Map Display, Sensor Tracker Displays, Simulation Clock, Simulation Timer, Chat Message Display.]

UAV Sensor Tracker Display
-Shows targets available to be tracked by UAV sensor (ovals) and target currently being tracked (outlined with dashed line)
-Operator clicks on an available target to direct the sensor to focus on target
-Small square represents where sensor is actually pointing
-If sensor is not pointing at the threat, it is not receiving any tracking data
-The sensor can only slew at a finite speed, so sometimes there is a small lag time when the sensor is directed to a new object
-Shows elevation (y-axis) vs. azimuth (x-axis)

Message Panel
-Displays messages from the system when launches are detected
-Is not interactive but functions to alert when event may be starting

Track Error Display
-Shows the track error of system (y-axis) on the selected target vs. time (x-axis)
-Track error must go below a pre-calculated threshold of the interceptors in order to engage the target
-Operator can toggle between targets by clicking on target boxes on the left side of window

2-D Map Display
-Shows where UAVs are located, which UAV is tracking which threat, where each threat is currently located, and the calculated impact point for each threat
-Used as a situational awareness tool for the operator
-Solid lines show which UAV is currently locked on to which missile
-Dotted lines show track of missile every 10 seconds

Chat Box
-Displays messages from "command"
-A simulated commander that is physically separated from the operator
-Used to measure subject workload and situational awareness
-Does not display UAV system information

Table 1: UAV Component Descriptions

3.2 Experiment Conduct and Data Collection

The test was conducted in the Human-Computer Interaction Lab at Tufts University.
All procedures were reviewed and approved by MIT's Committee on the Use of Humans as Experimental Subjects (COUHES). All subjects were asked a series of eligibility questions and then asked to read and sign a consent form. Following the training period, the subjects were seated in front of the two monitors used to interact with the system. The participants were knowingly video recorded, and all computer interactions were captured using Camtasia® recording software. The primary data collection was through the simulation computer logs, although subjects were informed that video recordings were used as a backup in case of data logging failure. The simulation logged all interactions, such as chat box message responses, final performance measures, and number of clicks on the various objects. The video recording was combined with the interaction recording to create a log of user activity for the periods surrounding the critical events. The encodings follow a scheme similar to those of Mkrtchyan (2012), Thornburg (2011), and Hart (2010), classifying subjects into directed, distracted, or asleep/completely unaware attention states. These encoding states are further described in Table 3. The subjects stayed seated in the testing room for the entire 3-hour experiment. In addition to the video and computer recordings, data were also collected using a functional near-infrared spectroscopy (fNIRS) measurement device. The device employed in this research was the Imagent Functional Brain Imaging System Using Infrared Photons, developed and manufactured by ISS, Incorporated. This device is a "non-invasive tissue oximeter for the absolute determination of oxygenated and deoxygenated hemoglobin concentration, oxygen saturation and total hemoglobin content in tissues". The overall process for data collection can be seen below in Figure 9. The data were collected using the Boxy software package created by ISS Inc.
and went through several steps in the processing stream. The raw data returned from the Imagent are simply the light readings from the sensors. These data are first transmitted to a computer running software that decodes the raw Imagent data and converts them into a standard format of optical intensity with associated time markers. The data are then sent across a local network to a second computer, which runs a MATLAB script that writes the data into a formatted file. These files are then processed using the Homer2 user interface developed by the Massachusetts General Hospital Martinos Center for Biomedical Imaging. This software uses the Modified Beer-Lambert Law to convert the light intensities into hemoglobin concentration levels and allows the user to apply filters such as bandpass filters. Homer2 also allows the user to extract the hemodynamic response function (HRF) from the overall data. The HRF is a plot of the concentrations over a given window of time surrounding the event. HRF data were collected using the 60 seconds prior to a missile wave and the 100 seconds following the arrival of the first missile, which was the length of the critical event period. This period was chosen because it was short enough to simulate the time pressure of a real-world engagement, but long enough to capture a variety of responses by participants. Subjects were not explicitly informed how long the event would last, but the training tutorial provided a guide to the approximate length of a normal engagement. They were also instructed that it was important to act quickly, since threats might appear in the viewing area for only a short period of time. Some of the key sampling and processing parameters can be seen in Table 2 below.
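Table 2 lists a 12 Hz sampling rate and a 0.5 Hz low-pass cutoff but does not specify the filter design. As a minimal sketch under that gap, the code below uses a single-pole (RC-style) low-pass filter, a swapped-in stand-in rather than the filter Homer2 actually applies, to show how the sampling rate and cutoff set the smoothing: slow hemodynamic changes lasting several seconds pass almost unchanged, while faster fluctuations, such as cardiac pulsation near 1 Hz, are attenuated. The function names are illustrative.

```python
import math


def lowpass_alpha(fs_hz, cutoff_hz):
    """Smoothing coefficient of a discrete single-pole (RC) low-pass filter."""
    dt = 1.0 / fs_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    return dt / (rc + dt)


def lowpass_single_pole(samples, fs_hz=12.0, cutoff_hz=0.5):
    """Filter a list of samples; the output is initialized to the first input."""
    alpha = lowpass_alpha(fs_hz, cutoff_hz)
    out = []
    y = samples[0]
    for x in samples:
        # Move a fraction alpha of the way toward each new sample.
        y += alpha * (x - y)
        out.append(y)
    return out
```

A single pole rolls off gently (about 6 dB per octave), so a practical pipeline would typically use a steeper design, for example a zero-phase Butterworth filter; the point here is only the arithmetic linking the 12 Hz sampling rate and the 0.5 Hz cutoff.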
[Figure 9: fNIRS data collection method. Flowchart steps: near-infrared light pulses are sent through the prefrontal cortex; light is absorbed differently by different hemoglobin components; near-infrared light sensors receive the transmitted light pulses; signal processing software (Boxy) turns the detection signal into a brain activity pattern; brain activity patterns are compared between varying conditions and to benchmarked data using Homer2.]

Parameter: Description
Sampling frequency: 12 Hz
Source spacing: 9 total (4 left, 5 right), spaced linearly from sensor. Distances (cm): Left: 2.04, 2.52, 3.0, 3.45; Right: 1.48, 1.95, 2.46, 3.0, 3.45. *Only 8 sensors used at one time (4 left, 4 right)
Source laser: Fiber-coupled laser diodes. Wavelengths: 690 nm, 830 nm. Average power: 10 mW
Light detectors: Photomultiplier tubes
Sensors: Selected side-on photomultiplier tubes
Low-pass filter: 0.5 Hz
Table 2: Data Collection and Processing Parameters

Operators must be able to quickly switch attention between incoming information from multiple sources (multi-tasking) while storing and synthesizing that information to create a unified mental model of the battlespace (working memory). Both of these functions are associated with the prefrontal cortex, so the fNIRS sensors were applied to the forehead directly over the region of interest, as seen in Figure 10.

[Figure 10: fNIRS Probe (NIRS optode) Applied to Forehead (Scholkmann, Klein, Gerber, Wolf, & Wolf, 2014)]

The subject donned the fNIRS measurement device at the beginning of the calibration period and wore it throughout the entire session. The subject was asked to avoid moving the sensors in any way and to refrain from furrowing their brow, since that has been shown to inhibit good data collection from the device (Solovey, 2009). While the system is resistant to minor movement, it was imperative to closely monitor the data and the subject during the experiment to determine whether the device had significantly moved from its original position.
Other factors such as blinking, minor movements, and heartbeat either have a low impact on data quality or can be mitigated through filtering (Solovey, 2009). The fNIRS data were recorded throughout the entire experiment.

In addition to the fNIRS data, several other sources of data were collected. Before the experiment began, the subject filled out three surveys. The first was a demographic survey, which recorded factors such as age, gender, occupation, military experience, and sleep, as seen in Appendix C. The second was the Boredom Proneness Survey, a standard for measuring propensity to boredom (Farmer, 1986). The third was the NEO Five Factor Inventory, a standard for measuring the "big five" personality traits (McCrae & Costa, 2010). Previous studies have shown that conscientiousness may play a significant role in performance in supervisory control situations (Thornburg, et al., 2011). Subjects also filled out a NASA TLX workload survey at the end of the experiment, as well as a customized workload survey, which can be seen in Appendix E. The final form of data collection was the Camtasia® software and video recording. Participant behavior in the two minutes preceding each critical event was coded by a two-person panel, using the encodings in Table 3 to track the subject's attention state.

Table 3: Video Coding Criteria
Attention State: Criteria
Directed (1): The participant appears focused, is scanning both displays, is only monitoring or interacting with the interface, and is not doing any other task.
Distracted (2): The participant is awake but may be drowsy (rapid blinking, rubbing eyes, head on hands without moving, extended eye closures, etc.). The participant is looking outside the screen for extended periods, is playing with an object besides the display (cell phone, hair tie, etc.), or is staring blankly at the screen for long periods without activity.
Asleep/Unaware (3): The participant is not paying attention to the interface at all and is completely asleep or unaware of the interface.

3.3 Experimental Design

While there was only one data set, the overall study can be divided into a primary and a secondary experiment. The primary experiment was a between-subjects test consisting only of the data relevant to the first wave of missiles. Repeated waves of missiles would be confounded by learning and priming effects, so restricting the primary experiment to the first wave maintains a high level of congruence with real-world operations. Of critical importance in this investigation is not how operators may handle the fourth or fifth wave, but rather how they handle the first (and unexpected) wave. The effects of surprise and novelty place special demands on operator cognition that are theoretically and practically difficult to replicate with repeated trials, especially when limited by time and resources. The primary experiment gathers data for assessing operator performance in the transition period from a low workload environment to a high workload environment, using performance metrics from the simulation, behavioral coding, and psychophysiological analysis. The primary analysis deals only with the data leading up to and during the first wave of missiles. After the first wave is complete, it is invalid to assume that participants will return to their pre-event state, so any data collected after the first wave cannot be used to inform our research questions about the effects of the low-to-high workload transition on performance and hemodynamic response. The secondary experiment aimed to determine whether learning or fatigue effects were present by presenting a second wave of threats at a later point in the experiment.
Since subjects were recruited for a 4-hour block, the experiment could last up to 3 hours in total no matter when the first wave was presented, with one hour reserved for filling out surveys and completing training. This presented a second data collection opportunity at the end of the experiment, allowing a repeat of the between-subjects testing done in the first wave as well as a within-subjects test of changes between the waves. Data collection during this wave was important, but it was done opportunistically and kept strictly secondary to the primary experiment.

The independent variables for the primary experiment were:
1) Time from the beginning of the simulation until the first appearance of targets. This builds on previous research suggesting that operators reach a "boredom" state after about 25 minutes of low-engagement activity (Mkrtchyan, et al., 2012; Thornburg, et al., 2011). Targets began to appear at either 40, 100, or 160 minutes. These times allow for comparison to previous studies as well as ample time for prolonged boredom to develop.
2) Number of targets presented to the operator. With three targets, the subject can usually assign one of the three sensors to each of the visible threats. With six targets, however, the subject must allocate limited assets to the task, adding a much higher cognitive workload to the process of assigning assets. This condition demands more resources from the operator and was hypothesized to expose variations in performance.

The primary dependent variable was the fNIRS data, but other dependent variables included assessments of performance, such as scenario performance and behavior coding, as well as demographic assessments, such as NEO-FFI-3 scores, age, or video game experience. The primary performance task of the missile defense simulation was using the system to minimize the tracking error.
When the system initially detects a missile, it has only a rough estimate of the missile's position and velocity. Using the UAV tracking system helps to create a more refined picture of where the missile is and where it is going by reducing the track error. Once the track error falls below a threshold level, other missile operators can engage the threat with interceptor missiles. The track error achieved for each target, the number of targets that met the threshold track error, and the response time to chat messages were collected for each subject.

The dependent variables in this study are summarized as follows:
1) Subject performance during tracking tasks:
 a. Percentage of threats tracked to the predetermined track error threshold. The threshold was set through pilot studies at a level at which most participants found the task challenging but possible to complete successfully.
 b. Average final track error for all threats
2) Subject response time to chat box messages (secondary workload measurement)
 a. 300-500 second intervals during low workload periods
 b. 15-20 second intervals during high workload and transition periods
3) Subject response to subjective workload assessment questions
 a. Experiment-specific post-event questionnaire (see Appendix E: Post-Experiment Survey)
 b. NASA Task Load Index (TLX)
4) fNIRS data to assess prefrontal cortex activity
 a. 60-second baseline before event
 b. 100-second period during event
 c. Period following event corresponding to a return to baseline levels
5) Behavioral codings (see Table 3: Video Coding Criteria)
 a. In 2 minutes before events
 b. Periods of sleep during entire experiment
 c. Case studies of unusual behavior

In order to measure the mental workload of the subject, the chat box was used as a secondary task measure. Previous work on cruise missile controllers has found that the chat box can suitably measure performance across a range of visual workloads (Cummings, 2004).
Subjects received chat messages at pseudo-random intervals, with degrees of interaction ranging from a personal question to a simple system status message. Precautions were taken in the implementation to prevent the secondary task from interfering with the primary task. To keep the secondary task measure from unduly influencing the study, the chat box was located outside the central area of the display. During the low task load periods, chat box questions or statements were presented pseudo-randomly every 300-500 seconds. During the high task load period, questions were presented every 15-20 seconds, and the questions asked were very simple to minimize time away from the primary interface. Although it is impossible to fully eliminate interruption of the primary task, the location, frequency, and salience of the chat box were tested and adjusted during pilot testing to ensure minimal distraction. Furthermore, the subjects were clearly instructed on the hierarchy of tasks at the beginning of the experiment and told to prioritize the mission over responding to chat messages.

The use of a chat box as a secondary workload measure provides some insight into the subject's workload through the response time to each chat message. It is expected that as workload moves to the extreme low end, the individual will have lower engagement and motivation, which will result in a slower response to a chat box message, shown by the blue solid line in Figure 11 below (Wickens & Hollands, 1999). More recent studies have shown that operator response is not necessarily diminished in low workload environments, shown by the dashed black line (Hart, 2010). Conversely, if a subject is overloaded, he or she will also perform worse on the secondary task because of the limited resources available to respond appropriately.
If the subject does not acknowledge the message at all, it can reasonably be assumed that they are at one of the two ends of the workload spectrum, since they are either completely unaware of the message or completely overwhelmed with primary tasks. While this experiment focuses primarily on the two ends of the workload spectrum, at a medium workload level subjects will likely perform well on both the primary and secondary tasks.

randomly placed into one of the experimental groups from Table 4 above. Participants were paid $75 for participation and informed of a $150 gift card prize for the participant who achieved the best performance. The experiment timeline is detailed in Appendix A.

3.5 Summary

This chapter described the experiment conducted to measure the effect of time in low task load and situation difficulty on workload transition. Thirty participants were recruited to take part in a missile defense simulation that follows a supervisory control structure. The hemodynamic response was recorded throughout the entire 3-hour experiment using fNIRS. Subject tasks during low task load included monitoring the system and responding to chat messages. During each event, subjects had to solve a dynamic asset allocation problem to bring the track error on 3 or 6 targets below a performance threshold. The event onset time and difficulty level were unknown to the participant. Each participant received one event at either 40, 100, or 160 minutes, and all subjects received a second event at 180 minutes. Subjects filled out several surveys, including a demographic survey, the Boredom Proneness Index, the NEO Five Factor Inventory, the NASA TLX, and a debriefing survey. Video recording was also performed for each subject.

4 Results

This chapter first introduces the methods for data analysis for the experiment described in Chapter 3. It begins with a description of the methods for reducing the data from full time series to single data points suitable for statistical study.
This chapter then describes the various statistical tests and modeling methods applied and summarizes the most important results. Finally, it explores long-term trends in the data relating to boredom and fatigue.

4.1 Data Processing
As described in Chapter 3, the raw light intensity data produced by the fNIRS device were recorded into a file containing a measurement from each sensor distance at each point in time (24 measurements per time point, recorded at 12 Hz). A representation of the data recording can be seen in Figure 12. This dataset was then converted into oxygenated and deoxygenated hemoglobin concentrations using the Homer2 software and filtered using the parameters listed in Table 2 in Section 3.2 to remove the majority of artifacts and noise. Using Homer2, the Hemodynamic Response Function (HRF) was extracted from the entire 3 hours of data.

Of particular interest in the 3 hours of data is the interval surrounding the missile tracking event. The HRF window contains the 60 seconds prior to the arrival of the first missile, referred to as the baseline, and the 100 seconds following the arrival of the first missile. This baseline length was chosen because it includes enough data to cancel out a majority of artifacts. The missile event lasted 100 seconds, so only those 100 seconds were included in the event dataset.

Once the baseline and event segments were extracted from the overall time series, the data for analysis were obtained using an "average of max" technique. The maximum magnitude was found for each of the four signals within the event period, and those four maximums were then averaged to obtain an overall average maximum for the subject's oxygenated hemoglobin concentration (HbO). The same procedure was used to find the average minimum for deoxygenated hemoglobin concentration (HbR), since HbR generally decreases during increased brain activity.
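A minimal sketch of the average-of-max reduction follows, assuming each channel is a plain list of concentration samples at the 12 Hz rate, laid out as 60 s of baseline followed by 100 s of event. The channel layout, the function names, and the percent-change definition (event feature relative to baseline feature, as used for the comparisons in Section 4.2.3) are illustrative assumptions, not the Homer2 pipeline used in the thesis.

```python
from statistics import mean
from typing import List, Sequence

FS_HZ = 12                 # recording rate reported for the fNIRS device
BASELINE_S, EVENT_S = 60, 100
BASELINE, EVENT = 0, 1     # segment selectors (illustrative convention)


def split_window(channel: Sequence[float]):
    """Return (baseline, event) slices of one channel's 160 s window."""
    n_base = BASELINE_S * FS_HZ
    return channel[:n_base], channel[n_base:n_base + EVENT_S * FS_HZ]


def average_of_max(channels: List[Sequence[float]], segment: int) -> float:
    """HbO feature: take each channel's peak, then average the peaks."""
    return mean(max(split_window(ch)[segment]) for ch in channels)


def average_of_min_magnitude(channels: List[Sequence[float]], segment: int) -> float:
    """HbR feature: HbR falls with activity, so use |minimum| per channel."""
    return mean(abs(min(split_window(ch)[segment])) for ch in channels)


def percent_change(event_value: float, baseline_value: float) -> float:
    """Event feature relative to the baseline feature, in percent."""
    return 100.0 * (event_value - baseline_value) / baseline_value


if __name__ == "__main__":
    n_base, n_event = BASELINE_S * FS_HZ, EVENT_S * FS_HZ
    # Four synthetic HbO channels: flat baseline, then a step during the event.
    hbo = [[0.5] * n_base + [0.5 + k] * n_event for k in (0.5, 1.0, 1.5, 2.0)]
    base, event = average_of_max(hbo, BASELINE), average_of_max(hbo, EVENT)
    print(base, event, percent_change(event, base))  # 0.5 1.75 250.0
```

Taking each channel's peak before averaging, rather than averaging whole traces, is what keeps the strongest part of the response from being diluted, which is the rationale the chapter gives for the method.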
The sum of oxygenated and deoxygenated hemoglobin, the total hemoglobin concentration (HbT), was also calculated. The maximum magnitude was chosen over an average magnitude for two reasons. First, averaging several average responses tends to wash out significant responses, whereas the average of maximums focuses on the strongest part of the signal. Second, according to Resource Theory as described in Chapter 2, cognitive errors are most likely to propagate when cognitive demands meet or exceed cognitive resources. At these critical moments the brain is working hardest, so measuring the maximum response provides a mechanism for comparing the critical response of subjects. The same method was applied to the baseline period to keep the data analysis procedure consistent.

[Figure 12: fNIRS Data Functional Diagram]

4.2 Results
4.2.1 Sample Summary
The sample of 30 participants was drawn from a Boston-area university using an online student message board. All communications were conducted according to a COUHES-approved protocol. The mean age was 21.3 years (s.d. 2.51), and ages ranged from 18 to 31 years. The sample included 12 males and 18 females. Twenty-one participants identified as undergraduate students, seven as Master's students, and two as other status. A full summary of all variables collected can be seen in Appendix G.

4.2.2 Baseline Analysis
Once the average-of-maximum magnitudes were determined for the HbO and HbR event and baseline periods, a range of statistical tests were performed. First, a 2-factor ANOVA (difficulty x onset time) comparing baseline total hemoglobin (HbT) values across the six experimental conditions found no significant differences between any of the conditions (F(2,24) = 0.453, p = 0.807). Similar trends were seen for HbO and HbR.
These results provide confidence that there were no inherent differences confounding the control variables. The results from this test can be seen in Figure 13.

[Figure 13: Baseline Comparison]

4.2.3 First Event Analysis
Next, the effects of scenario difficulty and first wave onset time, the experiment's independent variables, were examined. The raw HbO and HbR data were converted into percent changes relative to the baseline values in order to determine the relative changes occurring in each subject. The mean HbO percent change was 60.5% (s.d. 124.1%), with a minimum of -123.4%, a maximum of 337.6%, and a median of 25.64%. The mean HbR percent change was 81.5% (s.d. 150.4%), with a minimum of -107.2%, a maximum of 594.2%, and a median of 39.1%. Two-factor ANOVAs for percent change in HbO and HbR found that difficulty was not a significant factor in either the HbO (F(1,24) = 1.471, p = 0.237) or the HbR (F(1,24) = 0.298, p = 0.71) response, but that onset time was a significant factor for HbO (F(2,24) = 7.641, p = 0.003) and approached significance for HbR (F(2,24) = 3.304, p = 0.054). These results can be seen in Figure 14, Figure 15a,b, and Figure 16, and in Table 6 and Table 7 in Appendix H. The most striking result was that the 100-minute onset time had a lower marginal mean response than the 40- and 160-minute cases, indicating a diminished response for subjects during the middle of the experiment, the time when others have found attentional inefficiencies to be highest (Hart, 2010).

[Figure 14: Subject Number vs. HbO % Change]

[Figure 15a,b: Estimated Marginal Means for % Change HbO (left), HbR (right)]

In addition to computing the percent changes in HbO and HbR individually, the ratio of HbO to HbR was also calculated. This ratio may capture how oxygenated and deoxygenated hemoglobin vary together and give greater insight into the overall hemodynamics (Gagnon et al., 2012). Using this ratio, onset time was found to be a significant factor in the hemodynamic response (F(2,24) = 4.163, p = 0.028), with the 100-minute onset time having a higher hemodynamic response ratio than the 160-minute case, as calculated using a Tukey HSD test for all pairs (p = 0.026). The pairwise comparisons for onset time using the HbO/HbR ratio are summarized in Table 8.

[Figure 16: HbR % Change from baseline]

4.2.4 Time to the Maximum and Return to Baseline
Following the analysis of the relationship between the primary dependent variables (HbO and HbR) and the independent variables (number of missiles and onset time), several auxiliary analyses were performed to fully capture the response of subjects. The first auxiliary analysis measured the time to reach the maximum HbO and the time to return from the maximum back to the baseline. For the time to the maximum, the HbO signals were averaged at each time point to generate a single HbO signal, and this average signal was used to find the time from the beginning of the event to the point at which the signal reached its maximum. Because of the roughly 6-second lag between neural activity and the observed hemodynamic response, the window for the time to maximum was extended to 120 seconds from the start of the missile event.
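The two time-dependent measures of Section 4.2.4, the time to the HbO maximum and the return-to-baseline criterion (a 10-second sliding average that, after the maximum, returns to within one standard deviation of the 60-second baseline mean), can be sketched as follows. The sketch assumes the HbO signals have already been averaged into one signal sampled at 12 Hz; the function names and the choice to report the end of the first qualifying averaging window are assumptions rather than details from the thesis.

```python
from statistics import mean, stdev
from typing import Optional, Sequence

FS_HZ = 12  # fNIRS sampling rate used in the experiment


def _peak_index(event: Sequence[float], window_s: float) -> int:
    """Index of the largest sample within the first `window_s` seconds."""
    n = min(len(event), int(window_s * FS_HZ))
    return max(range(n), key=lambda i: event[i])


def time_to_max(event: Sequence[float], window_s: float = 120.0) -> float:
    """Seconds from event onset to the HbO peak (120 s window allows for lag)."""
    return _peak_index(event, window_s) / FS_HZ


def return_to_baseline(base: Sequence[float], event: Sequence[float],
                       smooth_s: float = 10.0,
                       window_s: float = 120.0) -> Optional[float]:
    """Seconds from onset until a 10 s moving average, starting at or after
    the peak, lies within one baseline SD of the baseline mean.

    Reports the end of the first qualifying averaging window (an assumption);
    returns None if the signal never returns.
    """
    mu, sd = mean(base), stdev(base)
    w = int(smooth_s * FS_HZ)
    for s in range(_peak_index(event, window_s), len(event) - w + 1):
        if abs(mean(event[s:s + w]) - mu) <= sd:
            return (s + w) / FS_HZ
    return None


if __name__ == "__main__":
    base = [0.0, 1.0] * 360  # 60 s baseline: mean 0.5, SD about 0.5
    event = [1.0] * 120 + [5.0] + [4.0] * 600 + [0.5] * 1800
    print(time_to_max(event), return_to_baseline(base, event))
```

Searching only from the peak onward prevents pre-peak samples that happen to sit near baseline from being counted as a return.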
The average time to the maximum was 69.1 seconds (s.d. 50.4 s). The time to the maximum was not significantly affected by onset time or difficulty, as seen in Figure 17.

[Figure 17: HbO Time to the Maximum (estimated marginal means by onset time and difficulty)]

The second time-dependent measure was the time to return to baseline. First, the mean and standard deviation of the 60-second baseline period were calculated; these values can be seen in Appendix I. Next, the return to baseline was determined as the point at which a 10-second sliding average window of the HbO signal, following the maximum calculated above, returned to within one standard deviation of the baseline mean. The average total time to reach the maximum and then return to baseline was 153.4 seconds (s.d. 127.9 s). As Table 5 shows, a Kruskal-Wallis test found that the return to baseline time differed significantly by missile wave onset time (χ²(2) = 8.788, p = 0.012), but not by scenario difficulty.

Table 5: Return To Baseline test results (Kruskal-Wallis test; grouping variable: onset)
Chi-Square    8.788
df            2
Asymp. Sig.   .012

Figure 18 shows the difference in time to return to baseline across the three onset times. The 160-minute condition was the driving factor of the variation, indicating that the physiological return to baseline was slower when participants received the event at 160 minutes. This result adds evidence that fatigue may have an important relationship with fNIRS signals.

[Figure 18: Return to baseline time]

4.2.5 Performance
The next set of calculations analyzes the performance of the subjects on the missile tracking simulation.
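Both the return-to-baseline comparison above and the performance comparisons that follow rely on rank-based tests, so a minimal standard-library sketch of the statistics themselves may help when reading the reported values. It computes average ranks with ties, the Mann-Whitney U and Wilcoxon rank-sum W for two groups, and the tie-corrected Kruskal-Wallis H for k groups; p-values are not computed. Reporting U as the smaller of the two U statistics and W as the rank sum of the corresponding group is an assumption about the statistics package's convention, but it matches the relation W = U + n(n+1)/2 in the reported values (157 = 37 + 15·16/2, with 15 subjects per difficulty level).

```python
from typing import Dict, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def mann_whitney_u_w(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (U, W): the smaller U statistic and the rank sum of its group."""
    ranks = average_ranks(list(a) + list(b))
    r_a, r_b = sum(ranks[:len(a)]), sum(ranks[len(a):])
    u_a = r_a - len(a) * (len(a) + 1) / 2
    u_b = r_b - len(b) * (len(b) + 1) / 2
    return (u_a, r_a) if u_a <= u_b else (u_b, r_b)


def kruskal_wallis_h(groups: Sequence[Sequence[float]]) -> float:
    """Tie-corrected Kruskal-Wallis H; compare to chi-square with k - 1 df."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        r = sum(ranks[start:start + len(g)])
        total += r * r / len(g)
        start += len(g)
    h = 12.0 * total / (n * (n + 1)) - 3.0 * (n + 1)
    counts: Dict[float, int] = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    correction = 1.0 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    return h / correction if correction > 0 else 0.0
```

Because H is referred to a chi-square distribution with k - 1 degrees of freedom, a comparison across three onset times has 2 degrees of freedom and one across all six conditions has 5.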
The average final track error and the average percentage of missiles tracked to the threshold level were compared between the 3- and 6-missile scenarios. Because of heteroscedasticity in the results, several non-parametric methods were used. Mann-Whitney (U = 37.00) and Wilcoxon (W = 157.00) tests were performed to compare final track error across difficulty and onset time conditions, finding difficulty to be significant (p = 0.002) but onset time not significant (p = 0.311). The discrepancy in performance between difficulty levels can be readily seen in Figure 19. Although onset time was not a significant factor in performance, Figure 20 suggests that there may have been differences in the 100-minute, 6-missile condition compared to the other 6-missile conditions. Wilcoxon and Kruskal-Wallis tests comparing all six conditions together found statistically significant variation between the groups (χ²(5) = 12.897, p = 0.0244). Video analysis of participants in this condition showed no abnormal or exceptional characteristics or behavior, so the clear differences in the 100-minute condition suggest that the difference may have come from the combined effect of time and difficulty rather than from extraneous variables. Brown-Forsythe tests for unequal variance found no significant differences (p = 0.285) in variance for overall onset time effects, nor when comparing onset times within just the 3- or 6-missile conditions. With only five subjects per experimental group, the statistical power of these tests was low, so additional data may help to confirm or refute these results.

[Figure 19: Average Final Track Error by Difficulty]

[Figure 20: Average Final Track Error by Time and Difficulty]

Similar calculations were performed using the percentage of missiles tracked below the specified performance threshold.
Again, difficulty was shown to be a significant factor but onset time was not, as can be seen in Figure 21 and Figure 22. The Mann-Whitney U was 66.5 and the Wilcoxon W was 186.5, giving a 2-tailed significance of 0.040. This result confirms that the 6-missile scenario was significantly harder than the 3-missile scenario, as expected. However, the lack of significance between onset times indicates that, regardless of how long subjects waited to interact with the system, their performance was not statistically different.

[Figure 21: % Below Threshold by Difficulty]

[Figure 22: % Below Threshold by Time & Difficulty]

4.2.6 Model Creation
To determine which of the many possible demographic, covariate, and independent variable factors were most important in predicting performance, all possible factors were entered into a backward linear regression model and tested for significance. The predicted variable was the average final track error, the primary measure of performance. Since this is an error term, a lower average final track error corresponds to better performance. Because it is derived from the simulation, the average final track error is a relative measure that is useful only for comparing between subjects and is not anchored to any real-world parameters. The predictor factors included age, gender, Boredom Proneness survey scores, NEO Five Factor Index scores, time to maximum HbO, time to return to baseline, the hemodynamic features of percent change of HbO and HbR, video coding scores of distraction, video gaming experience, and NASA TLX scores. The video coding scores come from a 2-person panel who rated each subject's behavior during the 2 minutes prior to the event as directed, divided, or asleep/completely distracted, as described in Section 3.5.
The rating panel reached consensus on each subject. The performance of subjects categorized by distraction coding can be seen in Figure 23.

[Figure 23: Average Track Error vs. Distraction Coding State]

The model, presented below in Equation 4.1, shows that four factors significantly influenced performance. The NEO-FFI component Agreeableness was found to negatively correlate with performance (p = 0.007). NEO-FFI responses for all thirty subjects can be seen in Figure 24. Distraction levels generated by the video coding were also a significant predictor of performance, with increased distraction corresponding to decreased performance (p = 0.050). Video game usage, identified in the demographic survey and summarized in Figure 25, was a significant predictor as well, with "gamers" performing better than non-gamers (p = 0.022). Finally, HbR was a significant predictor, with a greater magnitude of deoxygenated hemoglobin response corresponding to better performance (p = 0.019). The modeling parameters and significance levels can be seen in Table 9 in Appendix H. The model parameters are A (NEO-FFI Agreeableness), distraction (video coding of distraction), AvgMinR (HbR), and videogame (video game usage). The R² for this model was 0.397, indicating a moderate fit.
$$\mathrm{FTE} = \beta_0 - 6.896\,A - 42.061\,\mathrm{HbR} + 37.54\,D - 17.637\,\mathrm{VG} + \varepsilon \qquad (4.1)$$

where:
FTE = average final track error (lower = better performance)
β0 = model intercept constant
A = NEO-FFI-3 Agreeableness rating
HbR = minimum of deoxygenated hemoglobin during the event minus baseline
D = distraction coding from Section 3.2
VG = video game usage
ε = model error

[Figure 24: NEO-FFI Scores (Neuroticism, Extraversion, Openness to Experience, Agreeableness, Conscientiousness)]

[Figure 25: Video Game Usage Histogram]

4.2.7 Chat Box Analysis
Participants' responses to the chat box were also analyzed against many of the previously mentioned variables. Since the chat box was used as a secondary workload measure, it provided an additional measure of workload over time. The average chat response time was calculated for the period from the start of the experiment to the start of the first wave of missiles; it was 13.7 seconds (s.d. 7.96). The number of chat questions missed entirely in that period was also computed; the mean was 0.4 (s.d. 0.67). Neither average response time nor total questions missed was significantly correlated with the hemodynamic measures of HbO, HbR, or HbT during the event. Subjects in the "100-minute, hard" condition had a longer response time, which lends further evidence to the conclusion that they were less engaged than the other conditions. This difference can be seen in Figure 26. When the NEO Five Factor Index score for Conscientiousness was included as a covariate, a 2-factor ANOVA found chat response time to be significantly different for difficulty level (p = 0.044) and weakly related to onset time (p = 0.080). These results are summarized in Table 10 in Appendix H.

[Figure 26: Chat Response Time vs. Primary Variables]

It is also important to note that the chat box may have unintentionally cued subjects to look at the screen right before a missile event. A pre-determined script of messages was randomly spaced between 300 and 500 seconds apart to mimic the pace of actual operations. All subjects received 70 chat messages in total, with 4 messages presented during the event period. All subjects received the same script in order to provide continuity across conditions. The chat script used the computer clock, while the simulation used a server located at Lincoln Lab, which would occasionally lag by a second or two due to system processing. Since the simulation clock was sometimes slower, chat messages intended to arrive immediately after the start of the event would arrive slightly before the appearance of any missiles. Consequently, 25 of 30 subjects received a message within the 60 seconds before the start of the first wave, and 15 received a message within the 20 seconds before the event. The table of times for all subjects can be seen in Appendix J, and a histogram of the times can be seen in Figure 27. The times correlated weakly with subject performance (r = 0.232), and a linear regression did not reveal a significant slope (F(1,28) = 1.58, p = 0.218). Since all subjects received a message within the 140 seconds before the event, in a relatively random manner, it is reasonable to conclude that the cue did not significantly confound the experiment.

Overall, the chat box response times lend further evidence to previous supervisory control experiments and support the conclusion that participants in the "100-minute, hard" condition were less engaged than other groups, which led to worse performance.

[Figure 27: Time of Last Chat Message Before Event (histogram of time before event)]

4.2.8 Second Wave Analysis
A similar set of analyses was performed for the second wave of missiles.
As described by Table 3 in Section 3.2, the first wave analysis is the only legitimate analysis of the effects of a mental workload transition from low to high task load for a novel event, since repeated waves are confounded by situational priming and learning effects. However, the second wave analysis is helpful in understanding the effects of fatigue and learning, and is also useful for drawing broad conclusions about mental workload. The second wave repeated the difficulty level of the first wave, and all subjects received the second wave at 180 minutes. The first-wave comparison tests for hemodynamic response were repeated for Wave 2 and showed no significant differences in HbO, HbR, or the HbO/HbR ratio for either difficulty or Wave 1 onset time. Difficulty was found to have a significant impact on average final track error as measured by the Mann-Whitney test (U = 42.00) and the Wilcoxon test (W = 147.00, p = 0.006). Wave 1 onset time did not have a significant effect on Wave 2 average final track error (F(2,28) = 0.81, p = 0.455). HbR and Agreeableness were found to be weak predictors of performance, with HbR correlating with average final track error (p = 0.067) and Agreeableness correlating with percentage below threshold (p = 0.069). The data in Figure 28 show that there was large variation in Wave 2 performance for subjects who received the 100-minute, 3-missile condition in Wave 1. This is because one subject fell asleep entirely and another nearly missed the entire second wave event. These lapses occurred 40-80 minutes after Wave 1 was complete, and both subjects had performed adequately before and during the first wave, answering chat questions and tracking the first wave missiles below the threshold, so these findings do not invalidate the conclusions made about Wave 1 above.
Additionally, those who received the 160-minute, 6-missile condition in Wave 1 appeared to struggle considerably in Wave 2 compared with their Wave 1 performance, which can be seen in Figure 29.

[Figure 28: Wave 2 Average Final Error vs. Time & Difficulty]

[Figure 29: Wave 1 vs. Wave 2 Final Track Error Performance]

4.2.9 Lateralization
In addition to looking at the overall physiological response of the subjects, the data were split by hemisphere in order to examine whether there were lateral effects between the left and right brain hemispheres. Psychology literature indicates that each hemisphere may have some task type specialization (Helton, et al., 2010; Shimoda, Takeda, Imai, Kaneko, & Kato, 2008; Tucker, 1981). Paired t-tests found no evidence of lateralization of the hemodynamic response when measured globally, nor when broken down by difficulty or onset time in an ANOVA (Table 11 and Table 12 in Appendix H). The distribution of HbO and HbR lateralization can be seen in Figure 30 and Figure 31. A linear regression model of Wave 1 and Wave 2 performance using HbO, HbR, and HbO/HbR lateralization as predictor variables found that no measure of lateralization was a significant predictor of performance (Table 13 in Appendix H).

[Figure 30: HbO Lateralization Effects (Left-Right)]

[Figure 31: HbR Lateralization Effects (Left-Right)]

4.2.10 Long-Term Effects
In addition to the analysis of the response directly before, during, and after events, the hemodynamic data were also analyzed for trends over the course of minutes or hours.
Visual inspection revealed that 25 of 30 subjects had a fairly strong, consistent pattern of steadily increasing HbO levels during the first 30-60 minutes of the experiment, with steady levels of HbR. The time frame associated with this phenomenon suggests that it may be related to the vigilance decrement, a well-studied phenomenon in which performance declines over the first 30 minutes of a vigilance task and then stabilizes or continues to decline at a slower rate. Figure 32 shows the response of a subject for the full 180 minutes, with the first 30 minutes highlighted. To compare with the classic vigilance research showing that vigilance performance declines linearly for the first 30 minutes, the time to level-off for HbO was calculated by fitting a linear slope through a 20-minute moving window and declaring a "level-off" when the slope either becomes negative or falls below 0.1% of the initial slope. These empirically derived constants for window size and level-off criteria aligned well with visual inspection and captured the main effects of the phenomenon. Subjects with zero or negative slope in the initial 20-minute window were excluded from the calculation. Twenty-six of the 30 subjects exhibited this HbO vigilance pattern, but only 15 subjects were classified as having HbR vigilance patterns, indicating that HbO may be the primary driver of the vigilance response. Figure 33 shows a histogram of the level-off time for HbO, and the figures for each subject with a mean slope can be seen in Appendix K.

[Figure 32: HbO Response for Subject with Vigilance Decrement Pattern]

The mean HbO level-off time was 32.6 minutes (s.d. 17.1), with a maximum of 78, a minimum of 12, and a median of 29 minutes, while the mean HbR level-off time was 23.1 minutes (s.d. 11.7), with a maximum of 46, a minimum of 11, and a median of 18 minutes.
This reinforces the "30-minute decrement" that Mackworth found nearly 70 years ago and suggests that this behavioral phenomenon may correlate with physiological measures. The mean slope from t = 0 to the level-off point was 0.241 μM/min (s.d. 0.169) for HbO and -0.04 μM/min (s.d. 0.080) for HbR, as seen in Figure 34. Several studies (Berka, et al., 2007; Helton, et al., 2010; Warm, et al., 2009; Warm, et al., 2008) have measured a similar trend using other neuroimaging techniques, but to the author's knowledge this is one of the first uses of fNIRS to measure the vigilance response.

[Figure 33: HbO Level-Off Time]

[Figure 34: Slope to Level-Off (HbO, HbR)]

There were several other long-term trends of interest. The first test examined the variance of the HbO and HbR signals over the course of the experiment by dividing the 180-minute test into four periods: the first 30 minutes, followed by three 50-minute periods, since it is hypothesized that the first 30 minutes may have a different physiological response due to vigilance effects. ANOVAs for HbO and HbR variance and range were both inconclusive (i.e., not statistically significant). The final test examined the area between the HbO and HbR curves. This integral between HbO and HbR could potentially be used as a cumulative measure of work, similar to the Fick Principle, under which oxygen consumption is the oxygenated blood minus the deoxygenated blood, normalized by the total blood flow (Robertson et al., 1989). Comparing the integrals for the four periods using ANOVA found no significant differences (F(3,26) = 1.35, p = 0.26), as seen in Figure 35, but this metric should be re-examined in future studies.
[Figure 35: HbO-HbR integral for 4 quarters]

4.3 Summary
Primary data analysis shows that the hemodynamic response was affected by onset time but not by difficulty. Separately, the 6-missile scenario proved to be significantly more difficult than the 3-missile scenario, as expected, but first wave onset time was not found to be a statistically significant factor in either first or second wave performance, despite large differences in the 100-minute onset time condition that appear upon visual analysis of the data. A model using all possible input parameters found that Agreeableness, video game experience, pre-event behavioral state, and HbR response were significant performance predictors. Lateral analysis showed no major distinction between left and right hemisphere activity during mission response. Long-term trend analysis provides some evidence of a physiological correlate to the vigilance decrement.

5. Conclusions
This chapter addresses conclusions regarding the experiment described in this thesis and examines how the results fit into the broader context of low-workload human supervisory control tasks. It also discusses possible confounding variables and limitations of this experiment. Finally, it provides recommendations for future work using functional brain imaging in low-workload environments.

5.1 Experiment Conclusions
Several conclusions can be drawn from the primary experiment. First, the two independent variables showed interesting relationships between difficulty and onset time. As expected, participants who received the 3-missile scenario had significantly better performance scores than those who received the 6-missile scenario. However, it was expected that this increase in objective difficulty would correspond to a physiological difference between the difficulty levels, which was not found.
This result indicates that physiological measures of mental workload did not differ significantly between the two difficulty levels in this experiment. Unlike other studies (Girouard, et al., 2009; Kramer & Parasuraman, 2007; Sassaroli, et al., 2008; Wilson & Russell, 2003; Wilson & Russell, 2007), this study did not discriminate levels of mental workload. There are many possible reasons why this experiment did not detect significant differences with respect to difficulty. First, the number of trials was very limited for an fNIRS experiment. Although 30 subjects were studied (significantly more than in most previous NIRS studies), each subject received only one event for the primary experiment and two events overall, in contrast to other fNIRS studies in which each subject receives several repeated trials in order to obtain an average response. However, one could argue that this test scenario was more realistic in terms of work environments. To truly capture the very low workload and rare occurrence of events in something like a ballistic missile defense environment, a repeated-measures structure would change the character of the task and the subject response, so this limitation could only be offset by running additional trials. Second, the subjects in this experiment were novice operators. Although they were provided with an instruction tutorial, practice scenario, experimental guidance, and knowledge check, subjects were still dealing with some uncertainty in using the interface, which could have contributed to the lack of a statistical difference in physiological response. Third, determining each subject's response with the average-of-maximum method could bias the result away from sustained workload and toward instantaneous workload.
While subjects may have had to sustain their mental workload for longer in the 6-missile scenario, subjects could have reached instantaneous peaks in the 3-missile scenario that were equivalent to 6-missile levels but not sustained. Finally, while several filtering methods were applied to reduce noise and artifacts in the data, it is still possible that some true responses were attenuated, magnified, or otherwise altered by phenomena other than increased blood flow due to prefrontal cortex activity.

In contrast to the analysis of difficulty, analysis of onset time showed some remarkable results. The most significant result was that the 100-minute condition differed significantly for HbO, and marginally for HbR, in the first wave response, with the greatest difference between the 100-minute and 160-minute groups. The most likely explanation is that subjects were at their least engaged and least primed at the middle point of the experiment, as opposed to near the beginning and end. Video analysis and experimenter notes suggest that subjects were more primed at the 40-minute mark than at the 100-minute mark and began losing concentration after an hour. Most subjects returned to a more alert state in the last half hour, since they knew the experiment was coming to a close. This result is very relevant to understanding mental workload over the course of long-duration, low-workload tasks. Over the course of a shift, an operator's attention and arousal can follow a pattern of high at the beginning, low in the middle, and higher again at the end, so a corresponding physiological difference at the middle onset time suggests that the brain may be less able to handle difficult situations during the middle of a boring shift. This response has been seen in a similar study (Hart, 2010).
Ultimately, the finding that onset time, but not task difficulty, was a significant factor in hemodynamic response supports the argument that fNIRS is a better tool for measuring engagement than for measuring mental workload alone. The 100-minute onset time was linked to a diminished psychophysiological response. This supports the hypothesis that mental workload transition is not simply an accumulation of work or boredom. Rather, it is a process that depends on the state of engagement or attention at the start of the transition. Combined with other results such as the vigilance decrement and the return to baseline, this suggests that the greatest strength of fNIRS may lie in tracking mental state rather than mental work alone, a potentially more powerful measure.

Arguably the most important relationship is between hemodynamic response and performance, and whether performance can be reliably predicted from a physiological state. The backward regression model discussed in the previous chapter generated striking results.

The first significant predictor was the NEO Five Factor Index level of Agreeableness. In short, subjects who scored lower on Agreeableness tended to perform better than those who scored higher. One of the primary traits of Agreeableness is trust. It could therefore be inferred that subjects less willing to trust the system to function were more alert, which allowed them to perform better when forced to rapidly shift their paradigm of operation (Costa Jr, McCrae, & Dye, 1991; McCrae & Costa Jr, 1999). People with a low Agreeableness rating are also more likely to be competitive than those higher on the Agreeableness spectrum. This aligns well with the large prize offered to the top performer.

The second significant predictor was video game experience.
Several studies (Boot, Kramer, Simons, Fabiani, & Gratton, 2008; Green & Bavelier, 2003) have linked video game experience to performance in various kinds of simulation environments, especially military-related ones (Chen & Barnes, 2012; Clare, Cummings, How, Whitten, & Toupet, 2012; Cummings, Clare, & Hart, 2010). Clare (2012) found that video game experience was linked to automation trust. This adds evidence to the hypothesis that trust, video game experience, and supervisory control task performance are interrelated. Subjects with game experience are likely to be more comfortable operating a computer interface and dealing with complex scenarios, so it is telling that game experience was a predictor of performance.

The third major predictor of performance was the subject's behavioral state in the two minutes preceding the event, as coded from the video analysis. Subjects coded as divided (N = 5) or asleep/unaware (N = 1) before Wave 1 performed significantly worse than those coded as directed. This result is consistent with other similar experiments that linked behavioral state to performance (Hart, 2010; Mkrtchyan, et al., 2012). It also underscores the importance of attention in priming a subject for a cognitively demanding challenge. Even with chat messages in the two minutes before the major event, many subjects struggled to stay properly primed. This was especially true of those who received the event at the 100-minute onset time. Subjects who were not engaged had to command more attentional resources than subjects who were already cognitively prepared, first to focus on the task and then to perform at a high level. This result reinforces the importance of being task-focused in operational settings, especially where the potential for distraction is high.

The final performance predictor was HbR.
This is an important result because it indicates that performance can be related to hemodynamics. HbR is often viewed as a more reliable indicator of workload because it measures the resources being consumed. The finding that subjects who performed better consumed more resources therefore fits well with the theoretical underpinnings of cognition and psychophysiology. Furthermore, this result lends credence to the idea that adaptive automation could use this technology to help modulate mental workload. Hemodynamic signals measured during high-workload periods could then be used to maximize overall system performance. Workload is more directly related to task volume or rate, while engagement also incorporates task type and operator interest. Simply manipulating task volume may therefore not be sufficient to maintain sustained attention. This result should nonetheless be interpreted with caution, since HbR cannot predict performance alone. It must be considered alongside the other variables, which control for other sources of variability. None of the four predictors was significant on its own.

These results are important for a number of reasons. First and foremost, the correlation of physiological trends with vigilance improves the external validity of this experiment. It also supports the broader idea that physiological measures can be used during low-workload periods to make inferences about psychological state. fNIRS is just one of many methods that could be applied to monitoring operator arousal, stress, and mental workload during boredom and low workload. Used in combination, these techniques could reliably indicate when subject engagement and focus begin to deteriorate. Second, these results provide a physiological perspective on Hart's (2010) finding that attentional inefficiencies were highest in the middle of a low-task-load experiment.
Finally, these results lay the foundation for future work that examines longer periods before and after an event, to determine how a subject's pre-event state affects the transition and, ultimately, performance. This study looked specifically at the time directly surrounding the events in question. Analyzing broader periods around events could provide more insight into which factors influence workload transition and event performance.

Looking back at the goals stated in Section 1.1, several questions were answered and several remain open paths for investigation. The four research questions are listed again below:
1. Is a change in the measured activity of the fNIRS data correlated to an actual change in real mental workload?
2. What is the impact of time in a low taskload environment on operator performance and response?
3. Is the magnitude of a transition from low workload to high workload correlated to performance?
4. Can the pre-transition state of the operator relative to a known baseline be used to predict the post-transition response of the operator?

This study first showed that a transition from low to high workload was related to a significant hemodynamic response. It then showed that critical event difficulty did not significantly affect the magnitude of the hemodynamic transition, whereas time in the low-taskload environment before an event did. The 100-minute condition showed both a decrease in HbO and degraded performance. Together these suggest that those subjects were the most disengaged and that fNIRS may be a suitable technology for measuring engagement, although that question requires a follow-up study. The model developed identifies several factors that were important for predicting performance: HbR, Agreeableness, video game experience, and pre-transition distraction state.
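The performance model summarized above is reported in Table 9 (Appendix H). The following is a hedged arithmetic sketch, not a refit of the model. It uses standard ordinary-least-squares relations to recompute several statistics from the sums of squares and coefficients in that table: R², adjusted R², the overall F statistic, the standard error of the estimate, and each predictor's t statistic (B divided by its standard error). The constant names are illustrative, and the dictionary keys simply reuse the table's predictor labels.

```python
import math

# Values transcribed from Table 9 (Appendix H).
SS_REGRESSION = 42007.961
SS_RESIDUAL = 63764.705
N_OBS = 30          # subjects; Table 9's total df is N_OBS - 1 = 29
N_PREDICTORS = 4    # videogame, distraction, AvgMinR, A

COEFFICIENTS = {    # predictor: (unstandardized B, standard error)
    "videogame": (-17.637, 7.208),
    "distraction": (37.540, 18.240),
    "AvgMinR": (-42.061, 16.790),
    "A": (-6.896, 2.349),
}


def fit_statistics(ss_reg, ss_res, n, k):
    """Standard OLS summary statistics from regression and residual sums of squares."""
    df_res = n - k - 1
    r_squared = ss_reg / (ss_reg + ss_res)
    adj_r_squared = 1 - (1 - r_squared) * (n - 1) / df_res
    f_stat = (ss_reg / k) / (ss_res / df_res)     # mean square ratio
    std_error_estimate = math.sqrt(ss_res / df_res)
    return r_squared, adj_r_squared, f_stat, std_error_estimate


r2, adj_r2, f_stat, see = fit_statistics(SS_REGRESSION, SS_RESIDUAL, N_OBS, N_PREDICTORS)
print(f"R = {math.sqrt(r2):.3f}, R^2 = {r2:.3f}, adj. R^2 = {adj_r2:.3f}, "
      f"F({N_PREDICTORS}, {N_OBS - N_PREDICTORS - 1}) = {f_stat:.3f}, SEE = {see:.3f}")

for name, (b, se) in COEFFICIENTS.items():
    print(f"{name}: t = {b / se:.3f}")   # t statistic = B / SE
```

These values agree with Table 9 to the reported precision, with one exception. The t statistic for A comes out as -2.936 rather than -2.937 because B and its standard error are themselves rounded in the table.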
The final questions, on distinguishing engagement during low workload and comparing pre- and post-transition states, remain open areas for future research. Even so, the detection of a possible physiological correlate of the vigilance decrement is a promising result that should be studied further. Overall, this study accomplished many of the goals laid out at the outset. It is also one of the first, if not the first, applications of fNIRS in a long-duration, more realistic environment. The demonstration that fNIRS sensors are suitable for long-duration experimentation is an important contribution to the fNIRS field as a whole, and the collected dataset will be a rich source for future analyses. This is also one of the first applications of fNIRS to the low-workload domain, and the analysis tools and methods developed in this study help lay the foundation for analyzing future fNIRS studies.

5.2 Limitations

Every effort was made to control confounding variables and draw valid conclusions, but several limitations of this work should be discussed. First, the experiment was conducted under only a single-blind condition, with the experimenter aware of the experimental condition at all times. This was done to monitor the subject properly and ensure the simulation was working correctly, but it may have introduced bias into the experiment. The experimenter tried to avoid entering the room in the 30 minutes before an event and to avoid interpersonal interaction. This was not always possible, however, and may have attenuated boredom before an event. Second, the experimenter's growing experience in running the experiment may have slightly changed their conduct over time. This applies especially to how they answered subjects' questions about how best to use the interface. Third, the chat box may have unintentionally cued subjects immediately before an event.
The chat box timer did not always sync perfectly with the simulation. As a result, several subjects received a message directly before an event that may have unintentionally raised their arousal level. These occurrences were noted in the video analysis. They are still worth mentioning as a potential confounding factor, since performance may have been significantly worse in several cases without the inadvertent chat alert. Finally, the presence of video recording may have influenced subjects' actions.

There were also several limitations to the data processing and analysis. The Homer2 software provides a user-friendly interface for analyzing fNIRS data. However, it may produce misleading plots for data recorded after a probe has faulted. All the data were checked to remove erroneous points after sensor faults had occurred, but such data can still be problematic when Homer2 is used for visualization. Homer2 also automatically centers the data on the average response, which can lead to misleading interpretations of long-term trends. This was corrected by setting the "time = 0" point of each HbO, HbR, and HbT signal to zero magnitude. For the HRF plots, the 60 seconds before the event served as the baseline period, to offset the effects of Homer2's data averaging.

The other main limitation of the analysis was the use of a single differential path-length factor (DPF) to compute hemoglobin concentration. This is a good approximation for most people, but the true DPF varies slightly from individual to individual depending on a number of factors, so using a single common DPF limits the analysis. The average of the maximum method may also introduce some error, because it may be more susceptible than an averaging method to false readings caused by motion artifacts. This was offset by filtering to remove high-frequency artifacts and by manually excluding signals with obvious large excursions.
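The role of the DPF can be made concrete with the modified Beer-Lambert law. This is the standard relation for converting optical-density changes into hemoglobin concentration changes. At each wavelength, the change in optical density is modeled as (eps_HbO x dHbO + eps_HbR x dHbR) x d x DPF. Here eps are the extinction coefficients and d is the source-detector separation. Measuring at two wavelengths, the minimum needed to separate HbO from HbR, gives a 2x2 linear system. The minimal sketch below is not the Homer2 implementation. It solves that system using hypothetical placeholder extinction coefficients and a placeholder separation, neither of which are the values used in this study. It shows that when one common DPF is applied, every concentration estimate scales with 1/DPF. An individual whose true DPF differs from the assumed value therefore has both HbO and HbR rescaled by the same factor.

```python
# Minimal two-wavelength modified Beer-Lambert sketch (not the Homer2 code).
# All numeric constants are hypothetical placeholders, not study values.
EXTINCTION = {  # placeholder extinction coefficients, arbitrary units
    "lambda_1": {"HbO": 0.3, "HbR": 2.0},
    "lambda_2": {"HbO": 1.0, "HbR": 0.7},
}
SEPARATION_CM = 3.0  # placeholder source-detector separation


def delta_od(d_hbo, d_hbr, dpf, sep=SEPARATION_CM):
    """Forward model: concentration changes -> optical-density change per wavelength."""
    return {
        wl: (eps["HbO"] * d_hbo + eps["HbR"] * d_hbr) * sep * dpf
        for wl, eps in EXTINCTION.items()
    }


def concentrations(od, dpf, sep=SEPARATION_CM):
    """Inverse model: solve the 2x2 system for (dHbO, dHbR) by Cramer's rule."""
    a, b = EXTINCTION["lambda_1"]["HbO"], EXTINCTION["lambda_1"]["HbR"]
    c, d = EXTINCTION["lambda_2"]["HbO"], EXTINCTION["lambda_2"]["HbR"]
    y1 = od["lambda_1"] / (sep * dpf)  # path-length-normalized OD changes
    y2 = od["lambda_2"] / (sep * dpf)
    det = a * d - b * c  # nonzero for these placeholder coefficients
    return (y1 * d - b * y2) / det, (a * y2 - c * y1) / det


# Hypothetical subject whose true DPF (6.0) differs from the common DPF (5.0).
true_dpf, assumed_dpf = 6.0, 5.0
od = delta_od(0.5, -0.2, true_dpf)
hbo_est, hbr_est = concentrations(od, assumed_dpf)
# Both estimates are scaled by true_dpf / assumed_dpf = 1.2: about 0.6 and -0.24.
print(hbo_est, hbr_est)
```

When the assumed and true DPF match, the round trip recovers the original changes exactly. This is why the limitation is one of scale rather than of direction.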
5.3 Future Work

Low-task-load research will continue to be an important context for human-automation collaboration, and this work suggests many avenues to explore. The first area for further research would be similar studies that collect more responses from each subject. This would make it more difficult to prevent learning effects and capture true boredom, but it may provide greater reliability and better insulation against outliers caused by artifacts in the fNIRS data. Other variants of this experiment could change the overall environment, the amount of training time, the amount of low-taskload time, or the complexity of the task. Adding a dedicated vigilance task, such as monitoring a process or video feed, could help clarify the vigilance findings and make the setting more similar to many real-world environments.

The second area to explore is using fNIRS specifically to measure engagement during very low-taskload operations, rather than during the transition and high-taskload periods. The long-duration measures are good first steps toward answering this question. Future experiments could use more traditional vigilance tests to determine what an "engaged" brain signal looks like compared with a "disengaged" one. A cursory investigation linking chat box response time with hemodynamic response found no significant correlation. An experiment focused exclusively on measuring mental state during low taskload may, however, reach different conclusions. A more extensive machine learning application may uncover trends in the data that lead to stronger conclusions about engagement state over time. Tracking brain state over time is a difficult problem, but it would provide a very powerful input to any future system and could significantly alter the relationship between a human and a machine.
A system that can discern brain engagement during low workload would have major implications for the development of robust adaptive automation techniques for low-taskload work.

The third area for investigation is the development of an adaptive automation system for low-taskload environments. A primary feature of this system would be to monitor brain activity and combine it with system inputs or other physiological measures to create a multimodal inference of mental state. Even a simple measure such as drowsiness could have a major impact in critical fields such as transportation, where fatigue and drowsiness are routinely cited as causes of many accidents every year. If a reliable mental state tracking system could be developed, it could be used in a variety of ways. Examples include identifying factors that mitigate true boredom and mental disengagement, or measuring the differences between interfaces.

Appendix A: Experiment Timeline

0 min: Subject is brought into the test room and read the following script: "As part of a study to analyze the way that humans interact with the highly autonomous systems by MIT's Human and Automation Lab and Lincoln Laboratory, we request your participation in this simulation. Participation is strictly voluntary and you can choose to withdraw at any time. All data will be kept confidential and encoded to ensure participant anonymity. Please feel free to raise questions at any time throughout the experiment. Thank you for your participation."
Subject is first asked the following questions (qualifying response in parentheses): "Are you a native English speaker? (Yes) Are you right handed? (Yes) Are you colorblind? (No) Do you have any history of head trauma, neurological disorder, or epilepsy? (No)" If the subject answers any question with a disqualifying response, they will be thanked for their time and removed from the study.
Otherwise, they will be presented with the consent form, which they will review with the experimenter (Appendix B). Subjects will then complete online versions of the demographic survey, boredom proneness survey, and NEO Five Factor Index III forms (Appendices C, D, E). Each subject will be assigned a number to use when completing their forms.
10 min: Once the subject completes the previous forms, they are presented with a PowerPoint tutorial on how to operate the software. They will be allowed as much time as necessary to go through the tutorial. The experimenter will be present to answer any questions about functions of the interface, but will answer in the simplest form possible to minimize experimental bias.
25 min: The test environment will be opened for a tutorial version of the test. This tutorial will be a simple case of two missiles launched together 90 seconds after the start of the tutorial period. The subject will be allowed to experiment with all the different features of the displays and ask the experimenter questions. The practice session gives the subject approximately 4 minutes to experiment with the system while also showing them the importance of quick action. After the tutorial, the subject will be asked to fill out a 5-question test to show proficient understanding of how to operate the system.
30 min: The subject will be given an opportunity to take a break (water, bathroom, stretch, etc.) before beginning the test. The experimenter will read the following: "If you would like, you can use this time to use the restroom, get a drink, stretch or use your phone. This will be your last chance to leave the room once the experiment begins, unless it is an emergency."
35 min: The experimenter places the sensor onto the subject's forehead. After ensuring proper placement and adequate subject comfort with the device, the experimenter will run the system calibration program.
Once the system is calibrated, the experimenter will advise the subject to remain still and calm for a 1-minute baseline period.
40 min: Experimenter loads the scenario. Once it is configured, experimenter reads the following script: "The experiment is now about to begin. Your primary mission is to respond to the missile threats in the most effective way possible. You should read and respond to chat messages as quickly as possible without compromising the primary mission. During the test please do not work on any other tasks or change or block the screens or modify the test computer in any other way. If the clock is running, the system is operating. Please let the experimenter know of any problems or concerns during the test. Are you ready to start?" At this time, the experimenter answers any last-minute questions and then begins the trial. The experimenter will remain close by outside the room to monitor the subject, record any observations, and address any issues that may come up. The experimenter will enter the room every 20-30 minutes to monitor system function and ensure proper data collection. They will refrain from any interaction with the subject to ensure a sterile environment. The experimenter will specifically avoid entering the room within the 30 minutes prior to any event.
80 min: Test groups A and B will receive the first "test" wave.
140 min: Test groups C and D will receive the first "test" wave.
200 min: Test groups E and F will receive the first "test" wave.
220 min: All subjects receive the second wave of missiles. The second wave is an exact repeat of the first wave. After the second wave, the experimenter will enter the room and stop the data recording and simulation.
225 min: Subjects will be asked to fill out the post-experiment survey and NASA TLX (Appendices F, G). Subjects will also conduct a short debrief interview with the experimenter to discuss interface design, perceived boredom/workload during the test, and any other comments on the experiment.
230 min: Subjects are thanked for their participation and paid, and the experiment is over.

Appendix B: Participant Consent Form

CONSENT TO PARTICIPATE IN NON-BIOMEDICAL RESEARCH
Human Performance in Ballistic Missile Response Scenarios

You are asked to participate in a research study conducted by Lee Spence, Ph.D., from the MIT Lincoln Laboratory Advanced Concepts and Technology Group and Mark Boyer from the MIT Humans and Automation Laboratory. You were selected as a possible participant in this study because of your interest in improving human performance in ballistic missile defense scenarios. You should read the information below, and ask questions about anything you do not understand, before deciding whether or not to participate.

PARTICIPATION AND WITHDRAWAL
Your participation in this study is completely voluntary and you are free to choose whether to be in it or not. If you choose to be in this study, you may subsequently withdraw from it at any time without penalty or consequences of any kind. The investigator may withdraw you from this research if circumstances arise which warrant doing so.

PURPOSE OF THE STUDY
Ballistic Missile Decision Support involves a number of very broad and complex issues. The system is very large, it has many interconnected elements, and it is physically spread over an area that is a significant fraction of the Earth. During a ballistic missile response, operators will have very little time to coordinate defensive actions and may face overwhelming amounts of information. The general purpose of this study is to analyze the human response in a ballistic missile defense scenario.

PROCEDURES
If you volunteer to participate in this study, we would ask you to do the following things:
Participate in a 15-minute training session to familiarize yourself with the display and test conditions.
Participate in a long-duration scenario with long periods of low activity and several threats to address.
All of these steps will occur in the Tufts University Human-Computer Interface Lab, 196 Boston Avenue, Medford, MA 02155.

POTENTIAL RISKS AND DISCOMFORTS
There are no foreseeable risks in participating in this experiment. Since the fNIRS sensors require a snug fit for optimal data collection, the sensors are attached to the forehead using elastic headbands. The headband and sensors may be mildly uncomfortable for extended experiments. Subjects may ask the experimenter to remove the sensors at any time if they feel they are too uncomfortable.

POTENTIAL BENEFITS
Your participation in this study will help increase understanding of how humans react to rapid changes in workload in predominantly low workload environments. Although this study focuses on ballistic missile defense, other applications include unmanned vehicle operators, nuclear power plant operators and manufacturing supervisors.

PAYMENT FOR PARTICIPATION
Participation in this experiment is strictly voluntary with payment of 125 dollars. Top performing participants can also win a prize gift card of 150 dollars.

CONFIDENTIALITY
Any information that is obtained in connection with this study and that can be identified with you will remain confidential and will be disclosed only with your permission or as required by law. Your performance in this study will only be coded by your subject number, which will not be linked to your name, so your participation in this research is essentially anonymous.

IDENTIFICATION OF INVESTIGATORS
If you have any questions or concerns about the research, please feel free to contact Lee Spence at Group 36 - Ballistic Missile Defense System Integration, MIT Lincoln Laboratory, 3 Forbes Road, Lexington, MA 02421, (781) 981-5043, or Professor Missy Cummings at 77 Massachusetts Ave., 33-305, Cambridge, MA 02139, (617) 252-1512.
EMERGENCY CARE AND COMPENSATION FOR INJURY
If you feel you have suffered an injury, which may include emotional trauma, as a result of participating in this study, please contact the person in charge of the study as soon as possible. In the event you suffer such an injury, M.I.T. may provide itself, or arrange for the provision of, emergency transport or medical treatment, including emergency treatment and follow-up care, as needed, or reimbursement for such medical services. M.I.T. does not provide any other form of compensation for injury. In any case, neither the offer to provide medical assistance, nor the actual provision of medical services shall be considered an admission of fault or acceptance of liability. Questions regarding this policy may be directed to MIT's Insurance Office, (617) 253-2823. Your insurance carrier may be billed for the cost of emergency transport or medical treatment, if such services are determined not to be directly related to your participation in this study.

RIGHTS OF RESEARCH SUBJECTS
You are not waiving any legal claims, rights or remedies because of your participation in this research study. If you feel you have been treated unfairly, or you have questions regarding your rights as a research subject, you may contact the Chairman of the Committee on the Use of Humans as Experimental Subjects, M.I.T., Room E25-143B, 77 Massachusetts Ave, Cambridge, MA 02139, phone 1-617-253 6787.

SIGNATURE OF RESEARCH SUBJECT OR LEGAL REPRESENTATIVE
I understand the procedures described above. My questions have been answered to my satisfaction, and I agree to participate in this study. I have been given a copy of this form.
Name of Subject
Name of Legal Representative (if applicable)
Signature of Subject or Legal Representative / Date

SIGNATURE OF INVESTIGATOR
In my judgment the subject is voluntarily and knowingly giving informed consent and possesses the legal capacity to give informed consent to participate in this research study.
Signature of Investigator / Date

Appendix C: Demographic Survey

1. Subject number:
2. Age:
3. Gender: M / F
4. Occupation:
   If student (circle one): Undergrad / Masters / PhD. Expected year of graduation:
5. Military experience (circle one): No / Yes. If yes, which branch: Years of service:
6. How much sleep did you get for the past two nights? Last night: Night before last:
7. How often do you play computer games? Rarely / Monthly / Weekly / A few times a week / Daily
   Types of games played:

Appendix D: Boredom Proneness Survey

1. It is easy for me to concentrate on my activities. T / F
2. Frequently when I am working I find myself worrying about other things. T / F
3. Time always seems to be passing slowly. T / F
4. I often find myself at "loose ends," not knowing what to do. T / F
5. I am often trapped in situations where I have to do meaningless things. T / F
6. Having to look at someone's home movies or travel slides bores me tremendously. T / F
7. I have projects in mind all the time, things to do. T / F
8. I find it easy to entertain myself. T / F
9. Many things I have to do are repetitive and monotonous. T / F
10. It takes more stimulation to get me going than most people. T / F
11. I get a kick out of most things I do. T / F
12. I am seldom excited about my work. T / F
13. In any situation I can usually find something to do or see to keep me interested. T / F
14. Much of the time I just sit around doing nothing. T / F
15. I am good at waiting patiently. T / F
16. I often find myself with nothing to do, time on my hands. T / F
17. In situations where I have to wait, such as a line or queue, I get very restless. T / F
18. I often wake up with a new idea. T / F
19. It would be very hard for me to find a job that is exciting enough. T / F
20. I would like more challenging things to do in life. T / F
21. I feel that I am working below my abilities most of the time. T / F
22. Many people would say that I am a creative or imaginative person. T / F
23. I have so many interests, I don't have time to do everything. T / F
24. Among my friends, I am the one who keeps doing something the longest. T / F

Appendix E: Post-Experiment Survey

1. How confident were you about the actions you took? Not Confident / Somewhat Confident / Confident / Very Confident / Extremely Confident
2. How did you feel you performed? Very Poor / Poor / Satisfactory / Good / Excellent
3. Overall, how busy did you feel during the mission? Idle / Not Busy / Busy / Very Busy / Extremely Busy
4. When did you feel the busiest during the experiment?
5. When did you feel the least busy during the experiment?
6. Overall, how frustrated did you feel during the mission? Very / Somewhat / Mildly / Not very / Not at all
7. When did you feel the most frustrated during the experiment?
8. When did you feel the least frustrated during the experiment?
9. Did you feel distracted at any point in the mission? Yes / No. If so, please list some of the items or activities that distracted you from the mission:
10. How quickly did you feel you detected threats? Very Slow / Slow / Fast / Very Fast
11. How clear was the alert that incoming missiles had been launched? Very Poor / Poor / Satisfactory / Good / Excellent
12. How comfortable did you feel with the interface? Very Uncomfortable / Slightly Uncomfortable / Neutral / Comfortable / Very Comfortable
13. What changes to the interface would help you improve your situational awareness?
14. Other comments:

Appendix F: Message Panel Alert Times

Onset time: 40 minutes
Time (sec) | Message
1400 | "REGIONAL BMDS ON ALERT" (False Alarm)
2400 | "REGIONAL BMDS ON ALERT" (Event @ t=2500)
5000 | "REGIONAL BMDS ON ALERT" (False Alarm)
8200 | "REGIONAL BMDS ON ALERT" (False Alarm)
10800 | "REGIONAL BMDS ON ALERT" (Event @ t=10900)

Onset time: 100 minutes
Time (sec) | Message
1400 | "REGIONAL BMDS ON ALERT" (False Alarm)
2400 | "REGIONAL BMDS ON ALERT" (False Alarm)
5000 | "REGIONAL BMDS ON ALERT" (Event @ t=6100)
8200 | "REGIONAL BMDS ON ALERT" (False Alarm)
10800 | "REGIONAL BMDS ON ALERT" (Event @ t=10900)

Onset time: 160 minutes
Time (sec) | Message
1400 | "REGIONAL BMDS ON ALERT" (False Alarm)
2400 | "REGIONAL BMDS ON ALERT" (False Alarm)
5000 | "REGIONAL BMDS ON ALERT" (False Alarm)
9000 | "REGIONAL BMDS ON ALERT" (Event @ t=9600)
10800 | "REGIONAL BMDS ON ALERT" (Event @ t=10900)

Appendix G: Summary of Variables

Variable | Mean | Std. Dev. | Min. | Median | Max.
Demographics
Age (years) | 21.3 | 2.51 | 18 | 21 | 31
Gender | 12 male, 18 female | n/a | n/a | n/a | n/a
Occupation | 21 undergraduate, 7 Masters, 2 other | n/a | n/a | n/a | n/a
Sleep, 1 night previous (hours) | 7.45 | 1.56 | 5 | 7 | 12
Sleep, 2 nights previous (hours) | 7.28 | 1.39 | 4 | 7.75 | 9
Video gaming frequency (rank) | 1.93 | 1.38 | 1 | 1 | 5
  Category frequency: (1) Less than once per month: 18; (2) Monthly: 4; (3) Weekly: 3; (4) A few times per week: 2; (5) Daily: 3
Personality Indexes
Boredom Proneness | 5.76 | 3.56 | 1 | 5 | 17
Five Factor Index: Neuroticism | 21 | 8.44 | 8 | 20 | 38
Five Factor Index: Extraversion | 32.1 | 5.86 | 22 | 32.5 | 47
Five Factor Index: Openness to Experience | 27.7 | 3.62 | 20 | 28.5 | 32
Five Factor Index: Agreeableness | 32.2 | 4.27 | 23 | 32.5 | 40
Five Factor Index: Conscientiousness | 32.2 | 6.59 | 21 | 31 | 48
Workload Assessment
NASA TLX | 4.94 | 1.61 | 2.47 | 4.97 | 9.07
Video Coding
Distraction Coding (1-3) | 1.23 | 0.504 | 1 | 1 | 3
fNIRS Metrics (µmol/L)
HbO baseline | 2.08 | 1.26 | 0.57 | 1.86 | 6.09
HbR baseline | -0.72 | 0.62 | -2.78 | -0.51 | -0.177
HbT baseline | 1.92 | 1.03 | 0.56 | 1.69 | 4.97
HbO Avg. of Max. | 2.73 | 2.24 | -0.568 | 1.86 | 8.01
HbR Avg. of Min. | -0.89 | 0.57 | -2.86 | -0.84 | 0.09
HbT Avg. of Max. | values garbled in source: 2.42, 1.93, -2.96, -0.73, 1.57
Physio. Response Metrics
HbO Time to Max (sec) | 106.4 | 64.0 | 0.5 | 108.75 | 199.5
HbO Return from Max to Baseline (sec) | 221.7 | 217.3 | 9.5 | 194.5 | 1174.5
Long-Term Trends
HbO Level-Off Time (sec) | 32.6 | 17.1 | 12 | 29 | 78
HbR Level-Off Time (sec) | 23.1 | 11.7 | 11 | 17 | 46
HbO Start to Level-Off Slope (µmol/L/min) | 0.259 | 0.17 | 0.0285 | 0.201 | 0.778
HbR Start to Level-Off Slope (µmol/L/min) | -0.070 | 0.081 | -0.342 | -0.047 | 0.003
Chat Box Performance
Average Response Time (sec) | 13.7 | 7.96 | 6.13 | 10.28 | 37.68
Average Missed Questions | 0.4 | 0.67 | 0 | 0 | 2
Wave 1
Average Final Track Error | 27.04 | 60.39 | 0.607 | 5.08 | 257.6
% Below Threshold | 81.1% | 25.0% | 16.7% | 91.6% | 100%
Wave 2
Average Final Track Error | 52.77 | 103.14 | 0.751 | 5.25 | 500
% Below Threshold | 75.9% | 31.4% | 0% | 100% | 100%

Appendix H: Results Tables

Multiple Comparisons. Dependent Variable: PercentAvgMax0. Tukey HSD.
(I) onset | (J) onset | Mean Difference (I-J) | Std. Error | Sig. | 95% CI Lower Bound | 95% CI Upper Bound
40 | 100 | 1.1033 | .46644 | .066 | -.0616 | 2.2681
40 | 160 | -.7056 | .46644 | .303 | -1.8705 | .4592
100 | 40 | -1.1033 | .46644 | .066 | -2.2681 | .0616
100 | 160 | -1.8089* | .46644 | .002 | -2.9738 | -.6441
160 | 40 | .7056 | .46644 | .303 | -.4592 | 1.8705
160 | 100 | 1.8089* | .46644 | .002 | .6441 | 2.9738
Based on observed means. The error term is Mean Square(Error) = 1.0868.
* The mean difference is significant at the .05 level.
Table 6: HbO Comparisons for Onset Time

Multiple Comparisons. Dependent Variable: percentAvgMinR. Tukey HSD.
(I) onset | (J) onset | Mean Difference (I-J) | Std. Error | Sig. | 95% CI Lower Bound | 95% CI Upper Bound
40 | 100 | .4812 | .63930 | .735 | -1.1153 | 2.0777
40 | 160 | -1.1204 | .63930 | .207 | -2.7169 | .4762
100 | 40 | -.4812 | .63930 | .735 | -2.0777 | 1.1153
100 | 160 | -1.6015* | .63930 | .049 | -3.1981 | -.0050
160 | 40 | 1.1204 | .63930 | .207 | -.4762 | 2.7169
160 | 100 | 1.6015* | .63930 | .049 | .0050 | 3.1981
The error term is Mean Square(Error) = 2.044.
* The mean difference is significant at the .05 level.
Table 7: HbR Comparisons for Onset Time

Multiple Comparisons. Dependent Variable: ratioAvgMax. Tukey HSD.
(I) onset | (J) onset | Mean Difference (I-J) | Std. Error | Sig. | 95% CI Lower Bound | 95% CI Upper Bound
40 | 100 | -1.6260 | .81248 | .134 | -3.6550 | .4030
40 | 160 | .6498 | .81248 | .707 | -1.3792 | 2.6788
100 | 40 | 1.6260 | .81248 | .134 | -.4030 | 3.6550
100 | 160 | 2.2758* | .81248 | .026 | .2468 | 4.3048
160 | 40 | -.6498 | .81248 | .707 | -2.6788 | 1.3792
160 | 100 | -2.2758* | .81248 | .026 | -4.3048 | -.2468
Based on observed means. The error term is Mean Square(Error) = 3.301.
* The mean difference is significant at the .05 level.
Table 8: HbO/HbR Marginal Means Onset Time Comparison

Model Summary
Model | R | R Square | Adjusted R Square | Std. Error of the Estimate
1 | .630 | .397 | .301 | 50.50335
a. Predictors: (Constant), A, distraction, AvgMinR, videogame

ANOVA. Dependent Variable: FinalTrackError1.
Model 1 | Sum of Squares | df | Mean Square | F
Regression | 42007.961 | 4 | 10501.990 | 4.117
Residual | 63764.705 | 25 | 2550.588 |
Total | 105772.667 | 29 | |

Coefficients. Dependent Variable: FinalTrackError1.
Model 1 | B | Std. Error | Standardized Beta | t | Sig.
(Constant) | 195.560 | 51.393 | | illegible in source | illegible in source
videogame | -17.637 | 7.208 | -.405 | -2.447 | .022
distraction | 37.540 | 18.240 | .333 | 2.058 | .050
AvgMinR | -42.061 | 16.790 | -.401 | -2.505 | .019
A | -6.896 | 2.349 | -.486 | -2.937 | .007
Table 9: Performance Model Summary

Tests of Between-Subjects Effects. Dependent Variable: chatresponsetime.
Source | Sum of Squares | df | Mean Square | F | Sig.
Corrected Model | 856.358a | 6 | 142.726 | 3.338 | .016
Intercept | 1002.973 | 1 | 1002.973 | 23.455 | .000
C | 301.409 | 1 | 301.409 | 7.049 | .014
difficulty | 193.449 | 1 | 193.449 | 4.524 | .044
onset | 241.149 | 2 | 120.575 | 2.820 | .080
difficulty * onset | 179.983 | 2 | 89.992 | 2.104 | .145
Error | 983.521 | 23 | 42.762 | |
Total | 7488.793 | 30 | | |
Corrected Total | 1839.879 | 29 | | |
a. R Squared = .465 (Adjusted R Squared = .326)
Table 10: Chat Response Model

Lateralization (Left-Right) Measure | Wave 1 (p-value) | Wave 2 (p-value)
HbO (HbO L - R) | 0.609 | 0.123
HbR (HbR L - R) | 0.887 | 0.860
HbO/HbR | 0.449 | 0.181
Table 11: Lateralization Effects

Lateralization (Left-Right) Measure | Wave 1 Diff. (p-value) | Wave 1 Time (p-value) | Wave 2 Diff. (p-value) | Wave 2 Time (p-value)
HbO | 0.73 | 0.42 | 0.35 | 0.81
HbR | 0.79 | 0.38 | 0.73 | 0.11
HbO/HbR | 0.30 | 0.54 | 0.40 | 0.86
Table 12: Lateralization by Difficulty & Time

Performance Model for Average Final Error
Lateralization (Left-Right) Predictor | Wave 1 (p-value) | Wave 2 (p-value)
HbO (HbO Left-Right) | 0.804 | 0.319
HbR (HbR Left-Right) | 0.537 | 0.997
HbO/HbR | 0.899 | 0.879
Table 13: Lateralization vs Performance Model

Appendix I: Return to Baseline Calculations

SubjNo | Base+1SD | Max Val | Time2Max | Rtn2Base
3 | 1.382 | 2.7047 | 91 | N/A
4 | 7.4951 | 12.072 | 0.5 | N/A
5 | 84.999 | 175.16 | 95 | N/A
6 | 6.4328 | 26.155 | 68.5 | 119
7 | 0.93893 | 2.3355 | 0.5 | 2
8 | 3.3831 | 8.9803 | 95 | 110.5
9 | 8.9337 | 17.021 | 93 | N/A
10 | 8.5228 | 14.633 | 4 | 5.5
11 | 74.576 | 164.56 | 57 | 59.5
12 | -2.2255 | 5.742 | 3.5 | 8.5
13 | 4.8555 | 8.8314 | 28 | 28
14 | 7.4287 | 11.548 | 95 | 154.5
15 | 6.2103 | 9.127 | 88.5 | N/A
16 | 3.8036 | 7.5472 | 100.5 | 101
17 | 3.6467 | 5.5168 | 89 | 89
18 | 73.069 | 150.34 | 60.5 | 111
19 | 1.3145 | 4.2627 | 13 | N/A
20 | 5.2572 | 5.6383 | 0.5 | N/A
21 | 4.2215 | 5.321 | 3.5 | N/A
22 | 3.9536 | 5.5636 | 99.5 | 99.5
23 | -2.1781 | 4.2626 | 82 | 202.5
24 | 0.02712 | 3.5231 | 98 | 131.5
25 | -3.4383 | 1.0611 | 36 | N/A
26 | 7.8521 | 13.743 | 85.5 | 85.5
27 | 12.153 | 15.106 | 32 | N/A
28 | 9.1268 | 13.199 | 54 | N/A
29 | 0.3825 | 6.9281 | 43.5 | 111.5
30 | 3.3024 | 12.122 | 99.5 | 152.5
31 | 8.8032 | 11.109 | 0.5 | 95.5
32 | 11.868 | 17.579 | 100.5 | 107.5

Appendix J: Time Elapsed from Final Chat Message to Wave 1 Start

Subj No | Time of chat before event (sec)
3 | 15.508
4 | 7.206
5 | 48.239
6 | 9.122
7 | 15.008
8 | 6.101
9 | 8.749
10 | 18.105
11 | 43.187
12 | 44.807
13 | 4.529
14 | 3.925
15 | 24.389
16 | 23.512
17 | 5.462
18 | 39.067
19 | 119.76
20 | 114.129
21 | 79.519
22 | 140.496
23 | 31.592
24 | 43.265
25 | 10.117
26 | 2.072
27 | 18.764
28 | 18.783
29 | 36.653
30 | 36.204
31 | 117.6
32 | 1.922

Appendix K: fNIRS Data with Vigilance Features Plotted

[Figure: HbO time series with vigilance features (level-off and slope) marked; image not reproduced in this text.]
0.264041ismsbuole I II i iI I I Is i 111 11 10 0 Mba se: 0 miwSomolobAl Mbo LeO4M. 4s ""An aol- 12000 10000 w0o i11v 1V NIf Ii iuuuI 1 tI U'I F r!if 6 0 4 0 200 10000 400 121 1000 30 I bO Sosp, 0,02M 3wam4sr~a 25 NbO tav*OR: -- It M HbO HbOft Mbaft 20 15 020 4000 r000 0000 30 ~ Su ap. 00WNoww8 O 25 Sspr 4000710 leuwm 14110 -. eou pHmIM MW LOvOM 43S -bA FM, s0 10 -10--c - --- --- -'- -200 400 two0 am0 122 I000 00 swo1 19 -HW HbOSSe: 0.IflSI 10mN "b slp: 0 .U NbO L*Velv0: 25 tftoft iumiu** 8*s44 20 Is 10 5 0 4 2000 4000 . 000 - *0 10000 12000 Outiec 10 30 r 0 Slope: 0213,t wsoab*iAn I* Sap: 4,036124 NIWMMoIKhi~n bO Lowe4*: 29 mies 25 MO it 20 Is to 5 0 .5 0 2000 4000 000 80 Taes 123 10000 ad* 30 MWOgsp.' 02#07 I -t*0 1*0 ft MHbR aktouab* IbR ShW 0 QWinrshArbm 25 Mb0 LIowi-. 3 ao 20 "Ut 02000 sm0 *hJ"" 12 30 H OS MW. 9 .00 fe o~ i 034umosA Hb0 -Nb 25 20- Eto 0 2m0 wo0 4w0 124 10000 H ft"No 20 13 iRnIOMN a,2 NwoOop 0 Mbf lSp' ft 07w 2 .Wwwuiw i"R 25 MbO Levek*: 2 aboAf -"oft ~~~ ftp 2s Is 10i 5 0 4 Ewed 4 1 2000 000 SM 30 14 ---- HbomsOAe;O WA 76wasmftm ibs sop: e i.O Lsv*M:3. 4? .ANu isi lit I0 S 0 .5 Ivent 0 20oo 4000 12000 WOO r0w 125 .. .1 114o~ 0""e Is 30 bO9p:. 420.14M s ukumoi HbW oko -06066 44isrm a usohim ssa w. Nbo LV*.. /t MbRF to 0 -0 -5 -O oi 0 000 000 10000 30 HbO Ope: 0.10061 *iNemoftin !*A A : -0 0737 mIwwom#*wa 25 Hb Fi k ~HRR HbO Lev*04. 34 uaftfif 20 Is 10 5 0 -101 2000 4000 8000 Ti"- 8000 126 10000 12000 9*4"e 1? 30 26 16 - "bo Ismp 0.064319 WM..W& MbA SI: 40$774 um,*w*rn N*0 1bo Uwe-ON 784in F to 5 0 .5 Ev.Wt '*01 0 20W0 4M00 loom0 TWO 0 ,2000 &*I...* 30 MeO 0"p: O.SWI4 wftvmoktrksi !bn asUp: 47610 uiemmolw*'* 1*0 wN - wufts .evsI":21 Is 10 5 0 Evwnt I * .101. 0 10000 4w00 127 1ism0 oIpe: 0 .313 mk#*wokrd* - IM0 H0 w LeveOk: 13 nut.s 20 15 k toi 6 0 .5 4 0 r fEwt 000 jSW* 30 MbOSp: 0.1021Oaiwosw bR ROa. -0045 bO Lvwe I2000 m0 .(.. 20 Mb bgHOft mmalNiA*' : 14 r6aoa 20 to I0 5 0 2000 400 .am a000 7e1("0 128 ,0000 ,.O. 
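The three alert schedules in Appendix F can be compared by counting false alarms and measuring the lead time between each true alert and the event it announces. The sketch below is a minimal illustration, assuming each schedule is stored as (alert time, event time or None) pairs in seconds; the dictionary layout and the `summarize` helper are hypothetical, and only the times are copied from the tables.

```python
# Alert schedules from Appendix F, keyed by first-event onset condition (minutes).
# Each entry is (alert time in s, event time in s, or None for a false alarm).
# The layout and names are illustrative assumptions, not the experiment's software.
SCHEDULES = {
    40: [(1400, None), (2400, 2500), (5000, None), (8200, None), (10800, 10900)],
    100: [(1400, None), (2400, None), (5000, 6100), (8200, None), (10800, 10900)],
    160: [(1400, None), (2400, None), (5000, None), (9000, 9600), (10800, 10900)],
}


def summarize(schedule):
    """Return (number of false alarms, alert-to-event lead times in seconds)."""
    false_alarms = sum(1 for _, event in schedule if event is None)
    lead_times = [event - alert for alert, event in schedule if event is not None]
    return false_alarms, lead_times


for onset_min, schedule in SCHEDULES.items():
    fa, leads = summarize(schedule)
    print(f"{onset_min}-minute onset: {fa} false alarms, lead times {leads} s")
```

Each schedule contains three false alarms and two true alerts. The second event always follows its alert by 100 s, while the lead before the first event is 100 s, 1100 s, and 600 s for the 40-, 100-, and 160-minute conditions respectively.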
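The video gaming row of Appendix G reports summary statistics for an ordinal rank (1 to 5). The mean and median can be recovered from the category frequencies, which offers a quick consistency check on the reconstructed table. This is a sketch; the `expand` helper is hypothetical.

```python
import statistics

# Video gaming frequency table from Appendix G: rank -> number of participants.
FREQUENCY = {1: 18, 2: 4, 3: 3, 4: 2, 5: 3}


def expand(freq):
    """Turn a {rank: count} table into one rank per participant, in ascending order."""
    return [rank for rank in sorted(freq) for _ in range(freq[rank])]


ranks = expand(FREQUENCY)
print(len(ranks), round(statistics.mean(ranks), 2), statistics.median(ranks))
```

This prints `30 1.93 1.0`, matching the 30 participants, the reported mean of 1.93, and the median of 1.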
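In Tables 6 to 8 every pairwise comparison shares a single standard error. With a pooled error term and equal group sizes, that standard error is sqrt(MSE x (1/n_i + 1/n_j)). The sketch below assumes 10 participants per onset group (the 30 participants split evenly across three onset times; the tables themselves do not state this). It reads Table 6's error term as 1.088 and also backs out the confidence-interval multiplier implied by each table's 40-vs-100 row. The names are hypothetical.

```python
import math

# Per table: (error mean square, reported SE, 40-vs-100 mean difference, its upper 95% bound).
# Table 6's error term is read as 1.088 from its footnote.
TABLES = {
    "Table 6 (HbO)": (1.088, 0.46644, 1.1033, 2.2681),
    "Table 7 (HbR)": (2.044, 0.63930, 0.4812, 2.0777),
    "Table 8 (HbO/HbR)": (3.301, 0.81248, -1.6260, 0.4030),
}


def pairwise_se(mse, n_i, n_j):
    """Standard error of the difference between two group means with a pooled error term."""
    return math.sqrt(mse * (1.0 / n_i + 1.0 / n_j))


for name, (mse, se_reported, diff, upper) in TABLES.items():
    se = pairwise_se(mse, 10, 10)  # assumes 10 participants per onset group
    multiplier = (upper - diff) / se_reported  # CI half-width expressed in standard errors
    print(f"{name}: SE {se:.4f} (reported {se_reported}), CI multiplier {multiplier:.3f}")
```

The recomputed standard errors agree with the reported ones to within rounding. The implied multiplier is about 2.50 in all three tables, which is what one would expect if all three share the same Tukey critical value (the studentized range value for three groups divided by sqrt(2)).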
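The Model Summary in Table 9 follows directly from the ANOVA sums of squares. This holds for an ordinary least-squares fit with n = 30 participants and p = 4 predictors (A, distraction, AvgMinR, videogame). The sketch below recomputes R, R Square, adjusted R Square, the standard error of the estimate, and F from those two sums. It also checks that each t value equals the coefficient divided by its standard error. The helper names are hypothetical.

```python
import math

# Table 9 inputs: ANOVA sums of squares and the coefficient table (B, Std. Error, reported t).
SS_REGRESSION, SS_RESIDUAL = 42007.961, 63764.705
N, P = 30, 4
COEFFICIENTS = {
    "videogame": (-17.637, 7.208, -2.447),
    "distraction": (37.540, 18.240, 2.058),
    "AvgMinR": (-42.061, 16.790, -2.505),
    "A": (-6.896, 2.349, -2.937),
}


def model_summary(ss_reg, ss_res, n, p):
    """Return R, R^2, adjusted R^2, standard error of the estimate, and F for an OLS fit."""
    r2 = ss_reg / (ss_reg + ss_res)
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    ms_res = ss_res / (n - p - 1)  # residual mean square, df = n - p - 1
    f_stat = (ss_reg / p) / ms_res
    return math.sqrt(r2), r2, adj_r2, math.sqrt(ms_res), f_stat


r, r2, adj_r2, see, f_stat = model_summary(SS_REGRESSION, SS_RESIDUAL, N, P)
print(f"R={r:.3f} R2={r2:.3f} adjR2={adj_r2:.3f} SE={see:.3f} F={f_stat:.3f}")

for name, (b, se_b, t_reported) in COEFFICIENTS.items():
    # t is the unstandardized coefficient divided by its standard error.
    print(f"{name}: t = {b / se_b:.3f} (reported {t_reported})")
```

The first line reproduces the tabled R = .630, R Square = .397, adjusted R Square = .301, standard error 50.503, and F = 4.117. The t values agree to within rounding of the printed coefficients.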
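Every F in Table 10 is an effect's mean square (sum of squares over df) divided by the error mean square, 983.521 / 23. R Squared is the corrected-model sum of squares over the corrected total. The sketch below reproduces those columns from the sums of squares and degrees of freedom. "C" is kept as the covariate label printed in the table, and the helper names are hypothetical.

```python
# Table 10 rows: source -> (Type III sum of squares, df).
EFFECTS = {
    "Corrected Model": (856.358, 6),
    "C": (301.409, 1),
    "difficulty": (193.449, 1),
    "onset": (241.149, 2),
    "difficulty * onset": (179.983, 2),
}
SS_ERROR, DF_ERROR = 983.521, 23
SS_CORRECTED_TOTAL, DF_CORRECTED_TOTAL = 1839.879, 29


def f_ratio(ss, df, ss_error=SS_ERROR, df_error=DF_ERROR):
    """F = (SS_effect / df_effect) / (SS_error / df_error)."""
    return (ss / df) / (ss_error / df_error)


for source, (ss, df) in EFFECTS.items():
    print(f"{source}: MS = {ss / df:.3f}, F = {f_ratio(ss, df):.3f}")

r2 = EFFECTS["Corrected Model"][0] / SS_CORRECTED_TOTAL
adj_r2 = 1 - (1 - r2) * DF_CORRECTED_TOTAL / DF_ERROR
print(f"R Squared = {r2:.3f}, Adjusted R Squared = {adj_r2:.3f}")
```

The output matches the tabled F values (3.338, 7.049, 4.524, 2.820, 2.104) and R Squared = .465 with adjusted R Squared = .326.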
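Appendix I lists, per subject, a threshold (Base+1SD), the peak value, the time to that peak, and a return-to-baseline time, with N/A where the signal did not come back. The appendix does not spell out the procedure, so the sketch below shows only one plausible reading, under several assumptions. It assumes a fixed 0.5 s sampling interval, since the Time2Max values are multiples of 0.5 s. It also measures both times in seconds from the start of the analysis window, since Rtn2Base is never smaller than Time2Max in the table. Finally, it treats return-to-baseline as the first sample after the peak at or below Base+1SD. The function name and its inputs are hypothetical.

```python
def peak_and_return(samples, dt, threshold):
    """Locate the peak of a signal and the first later time it falls back to a threshold.

    samples   -- concentration values at a fixed sampling interval (hypothetical input)
    dt        -- sampling interval in seconds (assumed 0.5 s for this data)
    threshold -- baseline mean plus one standard deviation ("Base+1SD")
    Returns (max value, time to max, return time), with both times in seconds from the
    window start, and return time None (reported as N/A) if the signal never comes back.
    """
    peak_index = max(range(len(samples)), key=lambda i: samples[i])  # first maximum
    time_to_max = peak_index * dt
    for i in range(peak_index + 1, len(samples)):
        if samples[i] <= threshold:
            return samples[peak_index], time_to_max, i * dt
    return samples[peak_index], time_to_max, None


# Toy signal, not study data: peaks at 1.0 s and drops below 1.382 at 2.5 s.
print(peak_and_return([1.0, 2.0, 4.0, 3.0, 1.5, 0.8], 0.5, 1.382))
```

If the thesis instead measured the return from the peak, the last return value would become `(i - peak_index) * dt`. The table alone cannot settle which convention was used.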
Rathje, J. M., Spence, L. B., & Cummings, M. L. (2013). Human-Automation Collaboration in Occluded Trajectory Smoothing. Human-MachineSystems, IEEE Transactionson, 43(2), 137-148. 144 & Recarte, M. A., & Nunes, L. M. (2003). Mental workload while driving: effects on visual search, discrimination, and decision making. Journalof experimentalpsychology: Applied, 9(2), 119. Redding, R. E. (1992). Analysis of operationalerrors and workload in airtraffic control. Paper presented at the Proceedings of the Human Factors and Ergonomics Society Annual Meeting, Atlanta, GA. Reed, J. H., McAdams, F. H., Thayer, L. M., Burgess, I. A., & Haley, W. R. (1973). Aircraft Accident Report: Eastern Air Lines, Inc. L-10 11, N3 1OEA. Washington, D.C.: National Transportation Safety Board. Reid, G. B., & Nygren, T. (1988). The subjective workload assessment technique: A scaling procedure for measuring mental workload. Human mental workload, 185, 218. Ridderinkhof, K. R., van den Wildenberg, W. P., Segalowitz, S. J., & Carter, C. S. (2004). Neurocognitive mechanisms of cognitive control: the role of prefrontal cortex in action selection, response inhibition, performance monitoring, and reward-based learning. Brain and cognition, 56(2), 129-140. Robertson, C. S., Narayan, R. K., Gokaslan, Z. L., Pahwa, R., Grossman, R. G., Caram Jr, P., Allen, E. (1989). Cerebral arteriovenous oxygen difference as an estimate of cerebral blood flow in comatose patients. Journal ofneurosurgery, 70(2), 222-230. Roscoe, A. H. (1992). Assessing pilot workload. Why measure heart rate, HRV and respiration? Biologicalpsychology, 34(2), 259-287. Rovira, E., McGarry, K., & Parasuraman, R. (2007). Effects of imperfect automation on decision making in a simulated command and control task. Human Factors: The Journalof the Human Factors andErgonomics Society, 49(1), 76-87. Roy, C. S., & Sherrington, C. (1890). On the regulation of the blood-supply of the brain. The Journalofphysiology, 11(1-2), 85. 
Rypma, B., & D'Esposito, M. (1999). The roles of prefrontal brain regions in components of working memory: effects of memory load and individual differences. Proceedings ofthe NationalAcademy of Sciences, 96(11), 6558-6563. Sachse, B. (2011). FAA Issues Final Rule on Pilot Fatigue, 2014, from http://www.faa.gov/news/press releases/news story.cfn?newsld=13272 Sakatani, K., Chen, S., Lichty, W., Zuo, H., & Wang, Y.-p. (1999). Cerebral blood oxygenation changes induced by auditory stimulation in newborn infants measured by near infrared spectroscopy. Early Human Development, 55(3), 229-236. doi: http://dx.doi.org/l0.1016/SO378-3782(99)00019-5 Sarter, N. B., Woods, D. D., & Billings, C. E. (1997). Automation surprises. Handbook of Human FactorsandErgonomics, 2, 1926-1943. Sassaroli, A., Zheng, F., Hirshfield, L., Girourd, A., Solovey, E. T., Jacob, R., & Fantini, S. (2008). Discrimination of Mental Workload Levels in Human Subjects with Functional Near-Infrared Spectroscopy. JournalofInnovative OpticalHealth Sciences, 01(02), 227237. doi: doi: 10.1142/S 1793545808000224 Schlegel, R. E., Gilliland, K., & Schlegel, B. (1986). Development ofthe CriterionTask Set performance data base. Paper presented at the Proceedings of the Human Factors and Ergonomics Society Annual Meeting, Dayton, OH. 145 Scholkmann, F., Klein, S. D., Gerber, U., Wolf, M., & Wolf, U. (2014). Cerebral hemodynamic and oxygenation changes induced by inner and heard speech: a study combining functional near-infrared spectroscopy and capnography. JournalofBiomedical Optics, 19(1), 017002-017002. doi: 10.1117/1.jbo.19.1.017002 Schroeter, M. L., Kupka, T., Mildner, T., Uludag, K., & von Cramon, D. Y. (2006). Investigating the post-stimulus undershoot of the BOLD signal-a simultaneous tMRI and fNIRS study. Neuroimage, 30(2), 349-358. doi: http://dx.doi.org/10.1016/i.neuroimage.2005.09.048 Sears, L. (1909). Wendell Phillips, orator and agitator:Doubleday, Page. See, J. E., Howe, S. R., Warm, J. 
S., & Dember, W. N. (1995). Meta-analysis of the sensitivity decrement in vigilance. PsychologicalBulletin, 117(2), 230. Shaw, T. H., Funke, M. E., Dillard, M., Funke, G. J., Warm, J. S., & Parasuraman, R. (2013). Event-related cerebral hemodynamics reveal target-specific resource allocation for both "go" and "no-go" response-based vigilance tasks. Brain and cognition, 82(3), 265-273. Shaw, T. H., Warm, J. S., Finomore, V., Tripp, L., Matthews, G., Weiler, E., & Parasuraman, R. (2009). Effects of sensory modality on cerebral blood flow velocity during vigilance. NeuroscienceLetters, 461(3), 207-211. Sheridan, T. B., & Simpson, R. (1979). Toward the definition and measurement of the mental workload of transport pilots: Cambridge, Mass.: Massachusetts Institute of Technology, Dept. of Aeronautics and Astronautics, Flight Transportation Laboratory,[ 1979]. Shimizu, T., Hirose, S., Obara, H., Yanagisawa, K., Tsunashima, H., Marumo, Y., . . . Taira, M. (2009). Measurement of frontal cortex brain activity attributable to the driving workload and increased attention. SAE InternationalJournalof PassengerCars-Mechanical Systems, 2(1), 736-744. Shimoda, N., Takeda, K., Imai, I., Kaneko, J., & Kato, H. (2008). Cerebral laterality differences in handedness: A mental rotation study with NIRS. NeuroscienceLetters, 430(1), 43-47. Sirevaag, E. J., Kramer, A. F., Reisweber, C. D. W. M., Strayer, D. L., & Grenell, J. F. (1993). Assessment of pilot performance and mental workload in rotary wing aircraft. Ergonomics, 36(9), 1121-1140. doi: 10.1080/00140139308967983 Skitka, L. J., Mosier, K. L., & Burdick, M. (1999). Does automation bias decision-making? InternationalJournalof Human-ComputerStudies, 51(5), 991-1006. Solovey, E. (2009). Using FNIRS Brain Sensing in RealisticHCI Settings: Experiments and Guidelines. Paper presented at the Computer-Human Interface Conference, Boston, MA. Solovey, E., Schermerhorn, P., Scheutz, M., Sassaroli, A., Fantini, S., & Jacob, R. (2012). 
Brainput: EnhancingInteractive Systems with Streaming FNIRS Brain Input. Paper presented at the Computer-Human Interaction, Austin, TX. Son, I.-Y., Guhe, M., Gray, W., Yazici, B., & Schoelles, M. (2005). Human performance assessmentusingfNVIR. Paper presented at the Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series. Speert, D. (2012). Brainfacts: a primer on the brain and nervous system (7th ed.). Washington, DC: Society for Neuroscience. Stager, P., Hameluck, D., & Jubis, R. (1989). Underlyingfactorsin air traffic control incidents. Paper presented at the Proceedings of the Human Factors and Ergonomics Society Annual Meeting, Denver, CO. 146 Steinbrink, J., Villringer, A., Kempf, F., Haux, D., Boden, S., & Obrig, H. (2006). Illuminating the BOLD signal: combined fMRI-fNIRS studies. MagneticResonance Imaging, 24(4), 495-505. doi: http://dx.doi.org/10.1016/.mri.2005.12.034 Strangman, G., Culver, J. P., Thompson, J. H., & Boas, D. A. (2002). A quantitative comparison of simultaneous BOLD fMRI and NIRS recordings during functional brain activation. Neurolmage, 17(2), 719-731. Suedfeld, P. (1975). The Benefits of Boredom: Sensory Deprivation Reconsidered: The effects of a monotonous environment are not always negative; sometimes sensory deprivation has high utility. American Scientist, 63(1), 60-69. Svensson, E., Angelborg-Thanderez, M., Sj6berg, L., & Olsson, S. (1997). Information complexity-mental workload and performance in combat aircraft. Ergonomics, 40(3), 362-380. Svensson, E., Angelborg-Thanderz, M., & Sj6berg, L. (1993). Mission challenge, mental workload and performance in military aviation. Aviation, space, and environmental medicine. Takahashi, N., Shimizu, S., Hirata, Y., Nara, H., Inoue, H., Hirai, N., ... Kato, S. (2011). Basic study of analysis of human brain activities during car driving Human Interface and the Management ofInformation. Interactingwith Information (pp. 627-635): Springer. Tan, D., & Nijholt, A. (2010). 
Brain-Computer Interfaces and Human-Computer Interaction. In D. S. Tan & A. Nijholt (Eds.), Brain-ComputerInterfaces (pp. 3-19): Springer London. Thackray, R., Bailey, J. P., & Touchstone, R. M. (1977). Physiological, Subjective, and Performance Correlates of Reported Boredom and Monotony While Performing a Simulated Radar Control Task. In R. Mackie (Ed.), Vigilance (Vol. 3, pp. 203-215): Springer US. Thackray, R. I., Powell Bailey, J., & Mark Touchstone, R. (1979). The Effect of Increased Monitoring Load on Vigilance Performance using a Simulated Radar Display. Ergonomics, 22(5), 529-539. doi: 10.1080/00140137908924637 Thomas, L. C., & Wickens, C. D. (2001). Visual displays and cognitive tunneling: Frames of reference effects on spatialjudgmentsand change detection. Paper presented at the Proceedings of the Human Factors and Ergonomics Society Annual Meeting, Minneapolis, MN. Thornburg, K., Peterse, H. P. M., Liu, A. M., & Oman, C. (2011). Operator Performance in Long Duration, Low Task Load Control Operations. In D. A. D'Agostino (Ed.), Preparedfor the Nuclear Regulatory Commission, Office ofNuclear Regulatory Research. Cambridge, MA: Massachusetts Institute of Technology. Tootell, R. B., Hadjikhani, N. K., Mendola, J. D., Marrett, S., & Dale, A. M. (1998). From retinotopy to recognition: fMRI in human visual cortex. Trends in cognitive sciences, 2(5), 174-183. Tsujimoto, S., Yamamoto, T., Kawaguchi, H., Koizumi, H., & Sawaguchi, T. (2004). Prefrontal cortical activation associated with working memory in adults and preschool children: an event-related optical topography study. Cerebralcortex, 14(7), 703-712. Tsunashima, H., & Yanagisawa, K. (2009). Measurement of brain function of car driver using functional near-Infrared spectroscopy (fNIRS). Computationalintelligence and neuroscience, 2009. 147 Tucker, D. M. (1981). Lateral brain function, emotion, and conceptualization. Psychological Bulletin, 89(1), 19. Tvaryanas, A. 
P., Platte, W., Swigart, C., Colebank, J., & Miller, N. L. (2008). A resurvey of shift work-related fatigue in MQ-1 Predator unmanned aircraft system crewmembers. van Tilburg, W. A., & Igou, E. R. (2012). On boredom: Lack of challenge and meaning as distinct boredom experiences. Motivation and Emotion, 36(2), 181-194. Vernon, J. A., Mc Gill, T. E., Gulick, W. L., & Candland, D. K. (1959). Effect of sensory deprivation on some perceptual and motor skills. PerceptualandMotor Skills, 9(3), 9197. Vidulich, M. A., & Tsang, P. S. (2012). Mental workload and situation awareness. Handbook of Human Factors andErgonomics, 243. Vienneau, R., & Gozzo, F. (1987). Estimatingpilot workload and its impact on system performance. Paper presented at the 43rd American Helicopter Society Annual Forum, St. Louis, MO. Vodanovich, S. J. (2003). Psychometric measures of boredom: a review of the literature. The JournalOfPsychology, 137(6), 569-595. Vogel-Walcutt, J. J., Fiorella, L., Carper, T., & Schatz, S. (2012). The Definition, Assessment, and Mitigation of State Boredom Within Educational Settings: A Comprehensive Review. EducationalPsychology Review, 24(1), 89-111. Warm, J. S. (1984). Sustained attention in humanperformance: Wiley New York. Warm, J. S., & Dember, W. N. (1998). Tests of vigilance taxonomy. In R. R. Hoffman, M. F. Sherrick & J. S. Warm (Eds.), Viewing Psychology as a Whole: The IntegrativeScience of William N Dember (pp. 704): American Psychological Association. Warm, J. S., Dember, W. N., & Hancock, P. A. (1996). Vigilance and workload in automated systems. Warm, J. S., Matthews, G., & Parasuraman, R. (2009). Cerebral hemodynamics and vigilance performance. Militarypsychology, 21, S75-S100. Warm, J. S., Parasuraman, R., & Matthews, G. (2008). Vigilance requires hard mental work and is stressful. Human Factors: The Journalofthe Human Factorsand Ergonomics Society, 50(3), 433-441. Watt, J. D., & Ewing, J. E. (1996). 
Toward the development and validation of a measure of sexual boredom. JournalofSex Research, 33(1), 57-66. Weber, A., Fussler, C., O'hanlon, J., Gierer, R., & Grandjean, E. (1980). Psychophysiological effects of repetitive tasks. Ergonomics, 23(11), 1033-1046. Whyte, J. (2011). Blood Oxygen Level-Dependent Encyclopedia of ClinicalNeuropsychology (pp. 423-426): Springer. Wickens, C., & Hollands, J. (1999). EngineeringPsychology andHuman Performance(3rd Edition): Prentice Hall. Wickens, C. D. (1984). Processing resources in attention. In R. Parasuraman & D. R. Davies (Eds.), Varieties ofAttention (pp. 63-102). Orlando, FL: Academic Press. Wickens, C. D., & Kessel, C. (1979). The effects of participatory mode and task workload on the detection of dynamic system failures. Systems, Man and Cybernetics, IEEE Transactions on, 9(1), 24-34. 148 - Wickens, C. D., Mayor, A. S., & McGee, J. P. (1997). Flight to thefuture: Humanfactors in air traffic control: National Academies Press. Wickens, C. D., Vidulich, M., & Sandry-Garza, D. (1984). Principles of SCR compatibility with spatial and verbal tasks: The role of display-control location and voice-interactive display-control interfacing. Human Factors: The Journalof the Human Factors and Ergonomics Society, 26(5), 533-543. Wierwille, W. W. (1979). Physiological measures of aircrew mental workload. Human Factors: The Journal ofthe Human Factors and Ergonomics Society, 21(5), 575-593. Wierwille, W. W., & Eggemeier, F. T. (1993). Recommendations for mental workload measurement in a test and evaluation environment. Human Factors: The Journalof the Human Factorsand Ergonomics Society, 35(2), 263-281. Wilson, G. (2002). An Analysis of Mental Workload in Pilots During Flight Using Multiple Psychophysiological Measures. The InternationalJournalofAviation Psychology, 12(1), 3-18. doi: 10.1207/s15327108ijap201_2 Wilson, G., & Russell, C. A. (2003). 
Real-time assessment of mental workload using psychophysiological measures and artificial neural networks. Human Factors:The Journalof the Human Factorsand ErgonomicsSociety, 45(4), 635-644. Wilson, G. F., & Russell, C. A. (2007). Performance enhancement in an uninhabited air vehicle task using psychophysiologically determined adaptive aiding. Human Factors:The Journalof the Human Factorsand ErgonomicsSociety, 49(6), 1005-1018. Wolf, M., Ferrari, M., & Quaresima, V. (2007). Progress of near-infrared spectroscopy and topography for brain and muscle clinical applications. JournalofBiomedical Optics, 12(6), 062104-062104-062114. Woods, D. D., & Sarter, N. B. (2000). Learning from automation surprises and" going sour" accidents. Cognitive engineering in the aviation domain, 327-353. Wylie, G., & Allport, A. (2000). Task switching and the measurement of "switch costs". Psychologicalresearch, 63(3-4), 212-233. Yerkes, R. M., & Dodson, J. D. (1908). The relation of strength of stimulus to rapidity of habit formation. Journalof comparative neurology andpsychology, 18(5), 459-482. Zheng, B., Cassera, M. A., Martinec, D. V., Spaun, G. 0., & Swanstr5m, L. L. (2010). Measuring mental workload during the performance of advanced laparoscopic tasks. Surgical endoscopy, 24(1), 45-50. Zuckerman, M. (1979). Sensationseeking: Beyond the optimal level of arousal:Erlbaum Hillsdale, NJ. 149