18th IAA Humans in Space Symposium (2011)
Development of Optical Computer Recognition (OCR) for Monitoring Stress and Emotions in Space
N. Michael,1 F. Yang,1 D. Metaxas,1 and D.F. Dinges2
1Center for Computational Biomedicine Imaging and Modeling, Rutgers University, New Brunswick, NJ, USA; 2Unit for Experimental Psychiatry, Department of Psychiatry, University of Pennsylvania School of Medicine, Philadelphia, PA, USA
INTRODUCTION. While in space, astronauts are required to perform mission-critical tasks on very expensive equipment at a high level of functional capability. Stressors can compromise their ability to do so; it is therefore important to have a system that can unobtrusively and objectively detect neurobehavioral problems involving elevated levels of behavioral stress and negative emotions. Computerized approaches involving inexpensive cameras offer an unobtrusive way to detect distress and to monitor observable emotions of astronauts during critical operations in space, by tracking and analyzing facial expressions and body gestures in video streams. Such systems can have applications beyond space flight, e.g., in surveillance, law enforcement, and human-computer interaction.
TECHNOLOGY DEVELOPMENT. We developed a framework [1-9] capable of real-time tracking of faces and of skin blobs corresponding to heads and hands. Face tracking uses a group of deformable statistical models of facial shape variation and local texture distribution to robustly track facial landmarks (e.g., eyes, eyebrows, nose, mouth). The model tolerates partial occlusions, automatically detects and recovers from lost track, and handles head rotations up to full profile view. The skin blob tracker is initialized with a generic skin color model and dynamically learns the subject-specific color distribution online for adaptive tracking.
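A minimal sketch of how such online color adaptation could work, assuming a quantized hue/saturation histogram that is blended with per-frame statistics of the tracked blob; the bin count, learning rate, and flat generic prior below are illustrative assumptions rather than the values used in the actual tracker:

```python
import numpy as np

class AdaptiveSkinModel:
    """Skin-blob color model: a hue/saturation histogram initialized from a
    generic prior and blended online with statistics of the tracked blob."""

    def __init__(self, bins=32, alpha=0.05):
        self.bins = bins            # histogram resolution per channel (assumed)
        self.alpha = alpha          # online learning rate (assumed)
        self.hist = np.ones((bins, bins))      # start from a flat generic prior
        self.hist /= self.hist.sum()

    def _idx(self, hs):
        # Map hue/saturation values in [0, 1) to histogram bin indices.
        return np.clip((np.asarray(hs) * self.bins).astype(int), 0, self.bins - 1)

    def skin_probability(self, hs_image):
        """Back-project the model onto an H x W x 2 hue/saturation image."""
        h, s = self._idx(hs_image[..., 0]), self._idx(hs_image[..., 1])
        return self.hist[h, s]

    def update(self, hs_image, blob_mask):
        """Re-estimate the color distribution from pixels inside the currently
        tracked blob and blend it into the model (exponential forgetting)."""
        pix = hs_image[blob_mask]
        if pix.size == 0:
            return
        h, s = self._idx(pix[:, 0]), self._idx(pix[:, 1])
        fresh = np.zeros_like(self.hist)
        np.add.at(fresh, (h, s), 1.0)
        self.hist = (1 - self.alpha) * self.hist + self.alpha * fresh / fresh.sum()
```

In use, pixels whose skin probability exceeds a threshold in frame t would define the blob mask that updates the model for frame t+1, so the distribution can track lighting changes and the individual subject's skin tone.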
Detected facial landmarks and blobs are filtered online, in both shape and motion, using eigenspace analysis and temporal dynamical models to prune false detections.
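The temporal side of that pruning can be illustrated with a simple constant-velocity predictor and a distance gate; this is a stand-in sketch for the dynamical models described in [1-9], and the gate size is an assumed parameter:

```python
import numpy as np

def prune_detections(prev_pts, prev_vel, detections, gate=10.0):
    """Reject per-frame landmark/blob detections that are inconsistent with a
    constant-velocity motion model, coasting on the prediction when a
    detection is missing or implausible.

    prev_pts, prev_vel: (N, 2) positions and velocities at frame t-1.
    detections:         (N, 2) candidate positions at frame t (NaN if missing).
    gate:               maximum allowed prediction error in pixels (assumed).
    """
    predicted = prev_pts + prev_vel                   # expected positions at t
    error = np.linalg.norm(detections - predicted, axis=1)
    accepted = np.isfinite(error) & (error < gate)    # gate out large jumps
    new_pts = np.where(accepted[:, None], detections, predicted)
    new_vel = new_pts - prev_pts
    return new_pts, new_vel, accepted
```

The shape side would analogously project each candidate landmark configuration onto a low-dimensional eigenspace of training shapes and reject candidates with a large reconstruction residual.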
We then extract geometric and appearance features to learn models that detect relevant gestures and facial expressions. In particular, our method uses the relative intensity ordering of facial expression phases (neutral, onset, apex, offset) found in the training set to learn a ranking model (RankBoost) for expression recognition and intensity estimation, which improves our average recognition rate to ~87.5% on the CMU benchmark database [4, 10].
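To make the ranking idea concrete, the sketch below learns a linear scoring function from pairwise ordering constraints between frames at different intensity stages. It is a simplified perceptron-style surrogate, not the regularized RankBoost of [4]; the feature layout, learning rate, and epoch count are illustrative assumptions:

```python
import random
import numpy as np

def train_intensity_ranker(frames, stages, epochs=50, lr=0.01, seed=0):
    """Learn a linear score w . x that respects the ordinal intensity ordering
    neutral < onset < apex observed in training sequences.

    frames: (N, D) geometric + appearance feature vectors, one per frame.
    stages: (N,) integer stage labels; higher means a more intense expression.
    """
    n, d = frames.shape
    w = np.zeros(d)
    # Every ordered pair (i, j) where frame i should outrank frame j.
    pairs = [(i, j) for i in range(n) for j in range(n) if stages[i] > stages[j]]
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for i, j in pairs:
            if w @ frames[i] <= w @ frames[j]:     # ordering constraint violated
                w += lr * (frames[i] - frames[j])  # push the two scores apart
    return w

def expression_intensity(w, frame):
    """Score a new frame; larger values indicate a more intense expression."""
    return float(w @ frame)
```

In the actual system the ranking is learned by boosting weak rankers with regularization [4]; the resulting score then serves directly as the intensity estimate for a recognized expression.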
In relation to stress detection, we piloted an experiment to learn subject-specific models of deception detection, using behavioral cues to discriminate stressed from relaxed behaviors. We video-recorded 147 subjects in 12-question interviews after a mock-crime scenario, tracking their facial expressions and body gestures with our algorithm. Using leave-one-out cross-validation, we trained separate Nearest Neighbor models per subject, discriminating deceptive from truthful responses with an average accuracy of 81.6% [7, 9].
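As a sketch of that evaluation, a per-subject 1-nearest-neighbor classifier scored with leave-one-out cross-validation might look as follows; the Euclidean metric and feature layout are assumptions for illustration, not the exact configuration of [7, 9]:

```python
import numpy as np

def loocv_accuracy(features, labels):
    """Leave-one-out accuracy of a 1-nearest-neighbor model for one subject:
    each interview response is classified against that subject's remaining
    responses.

    features: (R, D) behavioral-cue features, one row per interview response.
    labels:   (R,) response labels, e.g., 0 = truthful, 1 = deceptive.
    """
    correct = 0
    for i in range(len(labels)):
        # Distance from the held-out response to every other response.
        dist = np.linalg.norm(features - features[i], axis=1)
        dist[i] = np.inf                    # never match the held-out sample
        correct += int(labels[np.argmin(dist)] == labels[i])
    return correct / len(labels)
```

Averaging this per-subject accuracy over all 147 subjects would correspond to the 81.6% figure reported above.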
We are currently experimenting with structured sparsity [14] and super-resolution [11-13] techniques to obtain better-quality image features and further improve tracking and recognition accuracy. Validation of the detection of emotional states is underway (see the abstract by Moreta et al.).
REFERENCES. [1] Yang P, et al. Exploring Facial Expressions with Compositional Features. IEEE Computer Soc. Conf. on Computer Vision & Pattern Recognition, 2010; [2] Yang P, et al. Ranking Model for Facial Age Estimation. 20th Internat. Conf. on Pattern Recognition, 2010; [3] Yang P, et al. Boosting Encoded Dynamic Features for Facial Expression Recognition. Pattern Recognition Letters, 2009; [4] Yang P, et al. RankBoost with
regularization for Facial Expression Recognition & Intensity Estimation. In Proc. Int. Conf. on Computer Vision
2009; [5] Neidle C, et al. A Method for Recognition of Grammatically Significant Head Movements & Facial
Expressions, Developed through use of a Linguistically Annotated Video Corpus. Proc. Workshop on Formal
Approaches to Sign Languages, 21st European Summer School in Logic, Language & Information, Bordeaux,
France, July 20-31, 2009; [6] Michael N, et al. Spatial & Temporal Pyramids for Grammatical Expression
Recognition of ASL. Proc. 11th Inter. ACM SIGACCESS Conf. on Computers & Accessibility, Phil., PA, Oct 26-28, 2009; [7] Burgoon JK, et al. Automated Kinesic Analysis for Deception Detection. 43rd Hawaii Internat. Conf. on System Sciences, 2010; [8] Michael N, et al. Computer-based recognition of facial expressions in ASL: From face
tracking to linguistic interpretation. Proc. 4th Workshop on the Representation & Processing of Sign Languages:
Corpora & Sign Language Tech., Malta, May 22-23, 2010; [9] Michael N, et al. Motion Profiles for Deception
Detection using Visual Cues. Proc. 11th European Conf. on Computer Vision, September 2010; [10] Kanade T, et al.
Comprehensive database for facial expression analysis. Proc. 4th IEEE Internat. Conf. on Automatic Face & Gesture
Recognition (FG'00), Grenoble, France, 46-53, 2000; [11] Borman S, Stevenson R. Super-resolution from image
sequences—a review, Proc. Midwest Symposium on Circuits & Systems, 1998; [12] Park SC, et al. Super-resolution
image reconstruction: a technical overview, IEEE Signal Processing Mag, 20(3):21–36, May 2003; [13] Schuon S,
et al. LidarBoost: Depth Superresolution for ToF 3D Shape Scanning, Proc. IEEE CVPR 2009; [14] Huang J, et al.
Learning with Structured Sparsity, 26th Inter. Conf. on Machine Learning, ICML’09, Montreal, June, 2009.
FUNDING. Supported by the National Space Biomedical Research Institute through NASA NCC 9-58.