
Proposal for a PhD thesis
INRIA Sophia Antipolis, STARS group
2004, route des Lucioles, BP 93
06902 Sophia Antipolis Cedex-France
I. Title
Detecting critical human activities using RGB/RGBD cameras in home environment
II. General objective
It is a well-known phenomenon that life expectancy is steadily increasing and that various
issues specific to the growing population of older people need to be addressed. Many user
studies have shown that staying in their home environment is of great importance for the
well-being of senior citizens. However, the physical and mental decline that usually comes
with age often raises concerns about their safety and health when living alone. Technical
solutions that monitor the state of an older person and raise alarms when outside support is
required could help reduce the amount of supervision required from relatives and carers, and
thus increase the self-reliance of the elderly.
A lot of research has been carried out to model activities of daily living (ADLs). Most of the
systems that have been developed use either simple sensor data (wearable sensors, touch
sensors, RFID tags) or camera information to detect ADLs in a home environment. There
have also been a few attempts to include audio information into the recognition process.
In this project we would like to focus our attention not so much on the recognition of
activities but rather on the detection of critical situations. We believe that a system that is able
to detect potentially dangerous situations will give peace of mind to frail older people as well
as to their care givers.
This will require not only recognition of ADLs but also an evaluation of the way and timing
in which they are being carried out.
The users we are targeting should be in relatively good health. We do not want to address
serious dementia problems, but intend to support people who suffer from age-related
forgetfulness (in many cases the onset of an Alzheimer condition). The users should be
living in their own house by themselves and generally be able to carry out their day-to-day
activities. The system we want to develop is intended to help them and their relatives feel
more comfortable, knowing that potentially dangerous situations will be detected and
reported to caregivers if necessary.
Typical situations that we would like to monitor are:
1. Eating and drinking (how much? how often?)
2. Cooking (detect behavior that might lead to dangerous situations, or non-completion of
the task)
3. Taking medication (detect whether the correct medication is taken on time)
4. Falls
There are 3 categories of critical human activities:
o Activities which can be well described or modeled by users
o Activities which can be specified by users and that can be illustrated by
positive/negative samples representative of the targeted activities
o Rare activities which are unknown to the users and which can be defined only
relative to frequent activities; modeling them requires large datasets
III. Scientific context
The STARS group works on automatic video sequence interpretation. The SUP ("Scene
Understanding Platform") developed in STARS detects mobile objects, tracks their
trajectories and recognizes related behaviors predefined by experts. This platform contains
several techniques for the detection of people and for the recognition of human postures and
activities of one or several persons using 2D or 3D video cameras. In particular, there are 3
categories of techniques to recognize human activities:
o Recognition engine using hand-crafted ontologies based on a priori knowledge (e.g.
rules) predefined by users. This activity recognition engine is easily extendable and
allows later integration of additional sensor information when needed.
o Supervised learning methods based on positive/negative samples representative of the
targeted activities, which have to be specified by users. These methods are usually
based on Bag-of-Words models that compute a large variety of spatio-temporal
descriptors.
o Unsupervised (fully automated) learning methods based on clustering of frequent
activity patterns in large datasets, which can generate/discover new activity models.
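As a rough illustration of the second category, a Bag-of-Words representation over local spatio-temporal descriptors can be sketched as follows. The simulated descriptors, the plain k-means vocabulary and the histogram dimensions below are illustrative assumptions, not the actual SUP implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def build_vocabulary(descriptors, k=8, iters=10):
    """Cluster local spatio-temporal descriptors into k visual words (plain k-means)."""
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)]
    for _ in range(iters):
        # assign each descriptor to its nearest center, then recompute the centers
        dists = np.linalg.norm(descriptors[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers

def bow_histogram(video_descriptors, vocabulary):
    """Quantize one video's descriptors against the vocabulary; L1-normalized histogram."""
    dists = np.linalg.norm(video_descriptors[:, None] - vocabulary[None], axis=2)
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()

# Simulated 32-D descriptors (standing in for e.g. HOG/HOF around interest points).
train = rng.normal(size=(500, 32))
vocab = build_vocabulary(train, k=8)
video = rng.normal(size=(60, 32))   # one video's local descriptors
h = bow_histogram(video, vocab)
```

The resulting fixed-length histogram `h` is what a supervised classifier would be trained on, given the positive/negative samples specified by users.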
However, there are many scientific challenges in recognizing human activities when dealing
with real-world scenes with dementia patients: cluttered scenes, handling wrong and
incomplete person segmentation, handling static and dynamic occlusions, low-contrast
objects, moving contextual objects (e.g. chairs), etc.
IV. PhD objective
We would like to design a supervised learning algorithm that automatically detects
critical human activities using RGB/RGBD cameras. This algorithm should help with the
monitoring of the activities of an older person throughout the day.
Several issues need to be tackled:
1. Robustness of the algorithm and accuracy of the recognized actions/activities.
a. The algorithm should be able to distinguish between similar activities.
b. More fine-grained activity recognition might also require improvement of the
existing algorithms.
c. The algorithm should provide a confidence and a utility measure depending on
the seriousness of the detected activities.
2. Genericity of the proposed algorithm. Investigation of 3D spatio-temporal descriptors
that characterize abnormal behaviors and which are generic for most people (i.e.
person independent).
3. Real-time algorithm to be integrated in an operational system.
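Issue 1c above can be made concrete with a small sketch. The hypothetical alert policy below weights the recognizer's confidence by the seriousness of the detected activity; the activity names, weights and threshold are illustrative assumptions, not part of the proposal:

```python
# Illustrative seriousness weights per detected activity (assumed values).
SERIOUSNESS = {
    "fall": 1.0,
    "stove_left_on": 0.9,
    "missed_medication": 0.7,
    "skipped_meal": 0.4,
}

def alert_utility(activity, confidence):
    """Utility of raising an alert: recognition confidence scaled by seriousness."""
    return SERIOUSNESS.get(activity, 0.0) * confidence

def should_alert(activity, confidence, threshold=0.5):
    """Raise an alert only when the weighted utility clears the threshold."""
    return alert_utility(activity, confidence) >= threshold
```

Under such a policy, a moderately confident fall detection triggers an alert, while an equally confident detection of a less serious event does not, which limits alarm fatigue for caregivers.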
Current state-of-the-art algorithms still have limitations when the person performing the
action is not facing the camera. In this setup serious occlusions can occur, or skeleton
detection can perform poorly (for methods which rely on skeleton detection). Moreover,
current state-of-the-art algorithms focus on specific actions with low intra-class variation,
for instance "chopping". On the other hand, a more generic action like "cooking" can mean
either "chopping" or "mixing". Existing state-of-the-art methods do not perform well at
recognizing similar-looking actions, and do not apply anomaly-detection procedures to
distinguish, for instance, lying down from falling down.
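One simple way to approach the lying-down vs. falling-down distinction is the downward speed of the tracked body centroid: both end near the floor, but a fall is far faster. The sketch below is a hypothetical rule-based illustration (the speed threshold and the sample tracks are invented), not the method to be developed:

```python
def classify_descent(centroid_heights, fps=30.0, speed_threshold=1.0):
    """Classify a descent from per-frame centroid heights (metres).

    Returns 'fall' if the peak downward speed exceeds the threshold (m/s),
    'lying_down' otherwise. The 1 m/s threshold is an illustrative assumption.
    """
    speeds = [
        (centroid_heights[i] - centroid_heights[i + 1]) * fps
        for i in range(len(centroid_heights) - 1)
    ]
    return "fall" if max(speeds) > speed_threshold else "lying_down"

# A fall: the centroid drops ~0.7 m within a few frames.
fall_track = [0.9, 0.85, 0.6, 0.3, 0.2]
# Lying down deliberately: a similar drop spread over many frames.
lie_track = [0.9 - 0.02 * i for i in range(36)]
```

In practice the centroid (or head joint) would come from the RGBD person detection and tracking, and such a temporal cue would complement, rather than replace, the learned activity models.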
The evaluation of the algorithm should be performed on datasets which contain everyday
activities, such as MSR Daily Activity 3D (RGBD) and MSR Action 3D (depth only)
(http://research.microsoft.com/en-us/um/people/zliu/actionrecorsrc/), ADL (RGB,
http://ailab.wsu.edu/casas/datasets/adlnormal.zip), and CHU (Nice Hospital, RGBD).
V. Prerequisites
Strong background in C++ Programming, machine learning, recognition of activities,
computer vision.
VI. Calendar
1st year:
Study the limitations of existing solutions.
Propose an original algorithm for human activity recognition using RGB/RGBD
cameras.
2nd year:
Evaluate the proposed algorithm on benchmark databases to compare with the state-of-the-art.
Depending on the targeted activities, data collection might be required.
Evaluation in situations that are close enough to reality will be investigated.
3rd year:
Optimise the proposed algorithm.
Write papers and the PhD manuscript.
VII. Bibliography:
o P.H. Robert, A. Konig, S. Andrieu, F. Bremond, I. Chemin, P.C. Chung, J.F. Dartigues,
B. Dubois, G. Feutren, R. Guillemaud, P.A. Kenisberg, S. Nave, B. Vellas, F. Verhey, J.
Yesavage and P. Mallea. Recommendations for ICT use in Alzheimer's Disease
assessment: Monaco CTAD expert meeting, JNHA - The Journal of Nutrition, Health
and Aging Ref. No.: JNHA-D-13-00016R1, 2013.
o G. Sacco, V. Joumier, N. Darmon, A. Dechamps, A. Derreumeaux, L. Lee, J. Piano, N.
Bordone, A. Konig, B. Teboul, R. David, O. Guerin, F. Bremond and P.H. Robert,
Detection of activities of daily living impairment in Alzheimer's disease and mild
cognitive impairment using information and communication technology, Clinical
Interventions in Aging Volume 2012:7 Pages 539 - 549 DOI:
http://dx.doi.org/10.2147/CIA.S36297, Link to PubMed, December 2012.
o P. Bilinski and F. Bremond. Contextual Statistics of Space-Time Ordered Features for
Human Action Recognition. The 9th IEEE International Conference On Advanced
Video and Signal Based Surveillance (AVSS 12), Beijing on 18-21 September 2012.
o P. Bilinski, E. Corvee, S. Bak and F. Bremond. Relative Dense Tracklets for Human
Action Recognition. The 10th IEEE International Conference on Automatic Face and
Gesture Recognition, FG 2013, Shanghai, China, 22-26 April, 2013.
o P. Bilinski and F. Bremond. Statistics of Pairwise Co-occurring Local Spatio-Temporal
Features for Human Action Recognition. The 4th International Workshop on Video
Event Categorization, Tagging and Retrieval (VECTaR 2012), ECCV 2012 Workshop,
Firenze, Italy, October 13, 2012.
o Wanqing Li, Zhengyou Zhang, Zicheng Liu. Action Recognition Based on A Bag of 3D
Points. IEEE International Workshop on CVPR for Human Communicative Behavior
Analysis (in conjunction with CVPR2010), San Francisco, CA, June, 2010.
o Jiang Wang, Zicheng Liu, Ying Wu, Junsong Yuan. Mining Actionlet Ensemble for
Action Recognition with Depth Cameras. IEEE Conference on Computer Vision and
Pattern Recognition (CVPR2012), Providence, Rhode Island, June 16-21, 2012.
o Jiang Wang, Zicheng Liu, Jan Chorowski, Zhuoyuan Chen, Ying Wu. Robust 3D
Action Recognition with Random Occupancy Patterns. ECCV 2012, Firenze, Italy,
October 13, 2012.
o Ivan Laptev, Marcin MarszaƂek, Cordelia Schmid, Benjamin Rozenfeld. Learning
Realistic Human Actions from Movies. CVPR 2008, 24-26 June 2008, Anchorage,
Alaska, USA.