Proposal for a PhD thesis INRIA Sophia Antipolis, STARS group 2004, route des Lucioles, P93 06902 Sophia Antipolis Cedex-France I. Title Detecting critical human activities using RGB/RGBD cameras in home environment II. General objective It is a well-known phenomenon that life expectancy is steadily increasing and that various issues specific to the growing population of older people need to be addressed. Many user studies have shown that staying in their home environment is of great importance for the wellbeing of senior citizens. However the physical and mental decline that usually comes with age often raises concerns about their safety and health when living alone. Technical solutions to monitor the state of an older person and raising alarms when outside support is required could help reducing the amount of supervision from relatives and carers and thus increase the selfreliance of the elderly. A lot of research has been carried out to model activities of daily living (ADLs). Most of the systems that have been developed use either simple sensor data (wearable sensors, touch sensors, RFID tags) or camera information to detect ADLs in a home environment. There have also been a few attempts to include audio information into the recognition process. In this project we would like to focus our attention not so much on the recognition of activities but rather on the detection of critical situations. We believe that a system that is able to detect potentially dangerous situations will give peace of mind to frail older people as well as to their care givers. This will require not only recognition of ADLs but also an evaluation of the way and timing in which they are being carried out. The users we are targeting should be in a relatively good health condition. We do not want to address serious dementia problems but intend to give support to people who suffer from agerelated forgetfulness (in many cases the start of an Alzheimer condition). The users should be living in their own house by themselves and generally be able to carry out their day-to-day activities. The system we want to develop is intended to help them and their relatives to feel more comfortable because they know potentially dangerous situations will be detected and reported to caregivers if necessary. Typical situations that we would like to monitor are 1. Eating and drinking (how much? how often?) 2. Cooking (detect behavior that might lead to dangerous situations or non completion of the task) 3. Taking medication (detect if correct medication is taken timely) 4. The fall There are 3 categories of critical human activities: o Activities which can be well described or modeled by users o Activities which can be specified by users and that can be illustrated by positive/negative samples representative of the targeted activities o Rare activities which are unknown to the users and which can be defined only with respect to frequent activities requiring large datasets III. Scientific context STARS group works on automatic sequence video interpretation. The “SUP” (“Scene Understanding Platform”) Platform developed in STARS, detects mobile objects, tracks their trajectory and recognizes related behaviors predefined by experts. This platform contains several techniques for the detection of people and for the recognition of human postures and activities of one or several persons using 2D or 3D video cameras. In particular, there are 3 categories to recognize human activities: o Recognition engine using hand-crafted ontologies based on a priori knowledge (e.g. rules) predefined by users. This activity recognition engine is easily extendable and allows later integration of additional sensor information when needed. o Supervised learning methods based on positive/negative samples representative of the targeted activities which have to be specified by users. These methods are usually based on Bag-of-Words computing a large variety of spatio-temporal descriptors. o Unsupervised (fully automated) learned methods based on clustering of frequent activity patterns on large datasets which can generate/discover new activity models However there are many scientific challenges in recognizing human activities when dealing with real word scenes with dementia patients: cluttered scenes, handling wrong and incomplete person segmentation, handling static and dynamic occlusions, low contrasted objects, moving contextual objects (e.g. chairs) ... IV. PhD objective We would like to design a supervised learning algorithm that detects automatically critical human activities using RGB/RGBD cameras. This algorithm should help with the monitoring of the activities of an older person throughout the day. Several issues need to be tackled: 1. Robustness of the algorithm and accuracy of the recognized actions/activities. a. The algorithm should be able to distinguish between similar activities. b. More fine-grained activity recognition might also require improvement of the existing algorithms. c. The algorithm should provide a confidence and a utility measure depending on the seriousness of the detected activities 2. Genericity of the proposed algorithm. Investigation of 3D spatio-temporal descriptors that characterize abnormal behaviors and which are generic for most people (i.e. person independent). 3. Real-time algorithm to be integrated in an operational system. Actual state-of-the-Art algorithms still got some limitations when person performing action is not in front of the camera. In this setup some serious occlusions can occur or skeleton detection can perform poorly (for methods which rely on skeleton detection). What is more current state-of-the-Art algorithms focus on some specific actions (with low intra class variation) like for instance “chopping”. On the other hand some more generic action like “cooking” can mean either “chopping” or “mixing”. Already proposed state-of-the-Art methods do not perform well on recognizing similar looking actions and do not perform kind of anomaly detection procedures to distinguish for instance laying down vs. falling down. The evaluation of algorithm should be performed on datasets which contains every day activities like MSR Daily Activity 3D1 (RGBD), MSR Action 3D1 (only depth), ADL2 (RGB), CHU (Nice Hospital – RGBD). V. Prerequisites Strong background in C++ Programming, machine learning, recognition of activities, computer vision. VI. Calendar 1st year: Study the limitations of existing solutions. Proposing an original algorithm for human activity recognition using RGB/RGBD cameras. 2nd year: Evaluate proposed algorithm on Benchmarking databases to compare with the State-ofthe-Art. Depending on the targeted activities data collection might be required. Evaluation in situations that are close enough to reality will be investigated. 3rd year: 1 2 http://research.microsoft.com/en-us/um/people/zliu/actionrecorsrc/ http://ailab.wsu.edu/casas/datasets/adlnormal.zip Optimise proposed algorithm. Writing papers and PhD manuscript. VII. Bibliography: o P.H. Robert, A. Konig, S. Andrieu, F. Bremond, I. Chemin, P.C. Chung, J.F. Dartigues , B. Dubois, G. Feutren, R. Guillemaud, P.A. Kenisberg, S. Nave, B. Vellas, F. Verhey, J. Yesavage and P. Mallea. Recommendations for ICT use in Alzheimer's Disease assessment: Monaco CTAD expert meeting, JNHA - The Journal of Nutrition, Health and Aging Ref. No.: JNHA-D-13-00016R1, 2013. o G. Sacco, V. Joumier, N. Darmon, A. Dechamps, A. Derreumeaux, L. Lee, J. Piano, N. Bordone, A. Konig, B. Teboul, R. David, O. Guerin, F. Bremond and P.H. Robert, Detection of activities of daily living impairment in Alzheimer's disease and mild cognitive impairment using information and communication technology, Clinical Interventions in Aging Volume 2012:7 Pages 539 - 549 DOI: http://dx.doi.org/10.2147/CIA.S36297, Link to PubMed, December 2012. o P. Bilinski and F. Bremond. Contextual Statistics of Space-Time Ordered Features for Human Action Recognition. The 9th IEEE International Conference On Advanced Video and Signal Based Surveillance (AVSS 12), Beijing on 18-21 September 2012. o P. Bilinski, E. Corvee, S. Bak and F. Bremond. Relative Dense Tracklets for Human Action Recognition . The 10th IEEE International Conference on Automatic Face and Gesture Recognition , FG 2013, Shanghai, China, 22-26 April, 2013. o P. Bilinski and F. Bremond. Statistics of Pairwise Cooccurring Local SpatioTemporal Features for Human Action Recognition. The 4th International Workshop on Video Event Categorization, Tagging and Retrieval (VECTaR 2012) , ECCV 2012 Workshop, Firenze, Italy, October 13, 2012. o Wanqing Li, Zhengyou Zhang, Zicheng Liu. Action Recognition Based on A Bag of 3D Points. IEEE International Workshop on CVPR for Human Communicative Behavior Analysis (in conjunction with CVPR2010), San Francisco, CA, June, 2010. o Jiang Wang, Zicheng Liu, Ying Wu, Junsong Yuan. Mining Actionlet Ensemble for Action Recognition with Depth Cameras. IEEE Conference on Computer Vision and Pattern Recognition (CVPR2012), Providence, Rhode Island, June 16-21, 2012. o Robust 3D Action Recognition with Random Occupancy Patterns. Jiang Wang, Zicheng Liu, Jan Chorowski, Zhuoyuan Chen, Ying Wu. ECCV 2012, Firenze, Italy, October 13, 2012. o Ivan Laptev, Marcin MarszaĆek, Cordelia Schmid, Benjamin Rozenfeld. Learning Realistic Human Actions from Movies. CVPR 2008, 24-26 June 2008, Anchorage, Alaska, USA.