
PhD Proposal
Farhood NEGIN
INRIA Sophia Antipolis, STARS group
2004, route des Lucioles, BP93
06902 Sophia Antipolis Cedex – France
http://www-sop.inria.fr/members/Francois.Bremond/
1. Title
People detection for activity recognition using RGB-Depth sensors
2. Scientific context
The STARS group works on the automatic interpretation of video sequences. The SUP ("Scene Understanding Platform") developed in STARS detects mobile objects, tracks their trajectories and recognises related behaviours predefined by experts. The platform contains several techniques for detecting people and for recognising the postures and activities of one or several persons using conventional cameras. However, people detection still raises scientific challenges when dealing with real-world scenes involving apathetic patients: cluttered scenes, wrong or incomplete person segmentation, static and dynamic occlusions, low-contrast objects, moving contextual objects (e.g. chairs), etc.
Moreover, new sensors have been released that improve people detection. For instance, thanks to Microsoft and its Kinect sensor, depth cameras have become popular and affordable. The basic idea of a depth camera is to combine an IR camera with an IR structured-light projector to determine the depth of each image pixel. This kind of sensor is well adapted to people-monitoring applications (e.g. monitoring Alzheimer patients in hospital), because the people stay within a predefined area near the camera. Depth cameras have two main advantages: first, the output images contain depth information, and second, the sensor is largely independent of lighting changes (IR sensing). In our work, we propose to use a Kinect or Asus sensor to acquire 3D images, detect people and recognise activities of interest. The Kinect SDK library is used to manage the Kinect sensor. This library is based on a framework similar to OpenNI (an open-source driver) to acquire the images. Moreover, the library can perform some processing (e.g. people detection) and provide a true 3D map of the scene in the reference frame of the RGB-Depth sensor.
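As a minimal illustration of what this 3D map provides, the sketch below back-projects a depth image into sensor-frame 3D points using a simple pinhole model. The intrinsic parameters are illustrative placeholders, not calibrated Kinect values, and real drivers (Kinect SDK, OpenNI) deliver this map directly:

```python
import numpy as np

# Hypothetical intrinsics for a Kinect-class RGB-D sensor; real values
# come from the device calibration (these numbers are illustrative only).
FX, FY = 525.0, 525.0   # focal lengths in pixels
CX, CY = 319.5, 239.5   # principal point

def depth_to_points(depth_mm):
    """Back-project a depth image (millimetres, HxW) into a 3D point map
    expressed in the sensor's reference frame (metres, HxWx3)."""
    h, w = depth_mm.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_mm.astype(np.float64) / 1000.0   # mm -> m
    x = (u - CX) * z / FX
    y = (v - CY) * z / FY
    return np.dstack((x, y, z))

# A synthetic flat depth image two metres from the sensor:
pts = depth_to_points(np.full((480, 640), 2000, dtype=np.uint16))
```

Each pixel of the resulting map is a metric 3D point, which is what makes depth-based people detection insensitive to appearance and illumination.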
3. General objectives of the PhD
This work consists in designing novel algorithms for people detection using RGB-Depth sensors (e.g. Kinect) and for activity recognition, in order to help apathetic patients improve their quality of life.
Many techniques have already been proposed for detecting people in specific environments (e.g. a low-density laboratory) using the cooperation of several sensors (e.g. camera networks, individuals equipped with markers, accelerometers). Despite these studies, people detection remains brittle with conventional cameras, where it often depends on the position of the individual relative to the camera, and is limited in range (about 6-7 metres) with RGB-Depth sensors.
This work aims at relaxing these assumptions in order to conceive a general algorithm enabling the detection of an individual living in an unconstrained environment and observed through a limited number of cameras, including RGB-D sensors. The goal is to review the literature, evaluate existing libraries, and propose and assess new algorithms.
The main objective of the research is to address the limitations of current approaches. We will examine the open problems in the field, and the target contribution is to confront those problems by defining new methods. Investigating tools from other fields could be inspiring in this regard: for instance, applying algorithms originally developed for natural language processing (NLP) to computer vision problems (finding meaningful patterns in video) could open a new way to tackle them. This will involve data mining methods as well as semantic analysis of the videos. Moreover, investigating deep learning, which has produced state-of-the-art results on various tasks, could be an interesting research direction for activity recognition.
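As a toy illustration of the NLP analogy, one could treat a stream of per-frame activity labels as "words" and mine recurring n-grams, much as collocations are mined from text. The labels and counts below are invented for illustration only:

```python
from collections import Counter

def frequent_ngrams(labels, n=2, min_count=2):
    """Treat per-frame activity labels like words and mine recurring
    n-grams, the way NLP finds collocations in text."""
    grams = Counter(tuple(labels[i:i + n]) for i in range(len(labels) - n + 1))
    return {g: c for g, c in grams.items() if c >= min_count}

# Toy label stream from a hypothetical tracker:
seq = ["sit", "stand", "walk", "sit", "stand", "walk", "stand"]
patterns = frequent_ngrams(seq, n=2)
# ("sit", "stand") and ("stand", "walk") each recur, so they survive
```

Recurring n-grams of primitive actions are one simple way meaningful composite patterns could surface from unlabelled video.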
To learn new people detectors, we could explore techniques based on people's appearance and silhouette, using for instance local descriptors such as SURF, Hu moments, skin-colour histograms, MSER, LBP, HOG, Haar features, covariance matrices or the Omega descriptor, computed in the 3D depth map.
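As one example from this list, a minimal Local Binary Pattern (LBP) descriptor can be sketched as follows. This is a textbook 3x3 LBP over a grey-level patch, not the exact variant used in SUP:

```python
import numpy as np

def lbp_image(gray):
    """Basic 3x3 Local Binary Pattern: each interior pixel is encoded by
    which of its 8 neighbours are >= the centre, giving a code in [0, 255]."""
    g = gray.astype(np.int32)
    c = g[1:-1, 1:-1]
    # neighbour offsets, clockwise from top-left
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c)
    for bit, (dy, dx) in enumerate(offs):
        nb = g[1 + dy:g.shape[0] - 1 + dy, 1 + dx:g.shape[1] - 1 + dx]
        code |= (nb >= c).astype(np.int32) << bit
    return code

def lbp_histogram(gray):
    """256-bin normalised histogram of LBP codes, usable as a texture
    descriptor for a person-candidate region."""
    h = np.bincount(lbp_image(gray).ravel(), minlength=256).astype(float)
    return h / h.sum()
```

In practice such a histogram would be computed per candidate region (or per cell of a grid) and fed to a classifier.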
In order to achieve this aim, we will extend different parts of the SUP platform to broaden the reach of its algorithms. The main focus will be on the "people detection" and "activity recognition" components of the platform, which are tightly interlinked: improving the former affects the quality of the latter and vice versa.
[Figure 1: block diagram — people detection feeds activity recognition, whose recognition results return to the detector as on-line feedback or off-line evaluation.]
Figure 1. Relation between detection and recognition: evaluation of activity recognition can help improve people detection.
The people detection algorithm currently in use applies a classifier to recognise people among the set of objects detected in the scene, and on that basis estimates the head and shoulders with a predefined model. It relies on background subtraction to eliminate noise, which improves detection. While the background subtraction step works quite efficiently, noise removal and the merging of different objects are inconsistent and reduce the performance of the people detection algorithm. As mentioned, combining appearance-based techniques such as deformable part models (DPM) with depth images may enhance the performance of the people detection component of the platform.
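For illustration, a very simple background subtraction baseline (an exponential running average with a fixed threshold) can be sketched as below; the platform's actual method is more elaborate, and the data here is synthetic:

```python
import numpy as np

def update_background(bg, frame, alpha=0.05):
    """Exponential running-average background model (one possible
    baseline; SUP's actual method is more sophisticated)."""
    return (1 - alpha) * bg + alpha * frame

def foreground_mask(bg, frame, thresh=25):
    """Pixels that differ from the background by more than `thresh`
    are candidate mobile-object pixels."""
    return np.abs(frame.astype(float) - bg) > thresh

bg = np.zeros((4, 4))
frame = np.zeros((4, 4))
frame[1:3, 1:3] = 200.0        # a bright synthetic "object"
mask = foreground_mask(bg, frame)
bg = update_background(bg, frame)  # slowly absorb scene changes
```

The `alpha` parameter controls how fast moved contextual objects (e.g. chairs) are absorbed into the background, which is exactly where such simple models become inconsistent.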
So far, activity recognition has been addressed with two families of approaches, conventionally categorised as supervised and unsupervised. We propose a semi-supervised activity recognition method that combines both. In the unsupervised stage, trajectory information and clustering techniques are used to find regions of interest in the scene, i.e. the regions where activities are most likely to happen. We then define a hierarchical activity model for each activity occurring inside a region. This model gathers varied information about a particular activity and mostly relies on its time and duration properties. We will try to improve the hierarchical activity model by taking into account more intricate constraints (such as local motion) beyond basic time-related attributes.
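The unsupervised stage can be illustrated with a toy k-means over accumulated trajectory points, whose cluster centres stand in for scene regions of interest. The deterministic initialisation and the synthetic two-cloud data are assumptions made purely for this sketch:

```python
import numpy as np

def kmeans_regions(points, k=2, iters=20):
    """Toy k-means over accumulated trajectory points (N x 2): the
    resulting centres stand in for regions where activities recur."""
    # deterministic init: evenly spaced samples from the point list
    centres = points[np.linspace(0, len(points) - 1, k).astype(int)].copy()
    for _ in range(iters):
        d = np.linalg.norm(points[:, None] - centres[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centres[j] = points[labels == j].mean(axis=0)
    return centres, labels

# Two well-separated synthetic clouds of trajectory points:
pts = np.vstack([np.random.default_rng(1).normal(0.0, 0.1, (20, 2)),
                 np.random.default_rng(2).normal(5.0, 0.1, (20, 2))])
centres, labels = kmeans_regions(pts, k=2)
```

Real trajectories would of course need k selected from the data (or a density-based method) rather than fixed in advance.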
Evaluating the people detection algorithm in isolation is tedious, but evaluating it through activity recognition is much easier: from the results of the recognition component, we can assess how much the detection algorithm's performance has improved. As Figure 1 shows, the recognition block's feedback (on-line or off-line) guides how to improve the detection model; conversely, any improvement in detection will lead to better recognition.
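One concrete form such off-line feedback could take is matching recognised activity intervals against ground truth by temporal overlap and counting true/false positives. The greedy matcher below is an illustrative sketch, not the platform's evaluation code, and the 0.5 overlap threshold is an assumption:

```python
def interval_iou(a, b):
    """Temporal IoU between two (start, end) activity intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union else 0.0

def match_events(pred, truth, thresh=0.5):
    """Greedily match recognised intervals to ground truth; the
    resulting (TP, FP, FN) counts are the off-line feedback signal
    used to compare detector variants."""
    used, tp = set(), 0
    for p in pred:
        best, best_iou = None, thresh
        for i, t in enumerate(truth):
            if i not in used and interval_iou(p, t) >= best_iou:
                best, best_iou = i, interval_iou(p, t)
        if best is not None:
            used.add(best)
            tp += 1
    return tp, len(pred) - tp, len(truth) - tp
```

A detector variant that raises TP while lowering FP/FN on the recognised activities is the one the feedback loop would keep.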
To validate the PhD, we will assess the proposed approach on homecare 3D videos from Nice Hospital, evaluating algorithms that help older adults keep functioning at higher levels and living independently.
This PhD will be conducted in the PAL framework (https://pal.inria.fr/ ).
4. Pre-requisites:
Computer vision, a strong background in C++ programming, Linux, artificial intelligence, cognitive vision, 3D geometry and machine learning.
5. Schedule – 2014-2017
1st year:
- Study the limitations of existing algorithms.
- Propose an original algorithm for people detection.
2nd year:
- Propose an original algorithm for activity recognition.
- Evaluate and optimise the proposed algorithms.
3rd year:
- Write papers and the PhD manuscript.
6. Bibliography:

- Y. Yang and D. Ramanan, "Articulated Human Detection with Flexible Mixtures of Parts", IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), to appear, 2013.
- A. Avanzi, F. Bremond, C. Tornieri and M. Thonnat, "Design and Assessment of an Intelligent Activity Monitoring Platform", EURASIP Journal on Applied Signal Processing, special issue on "Advances in Intelligent Vision Systems: Methods and Applications", 2005.
- J. Joumier, R. Romdhane, F. Bremond, M. Thonnat, E. Mulin, P.H. Robert, A. Derreumeaux, J. Piano and L. Lee, "Video Activity Recognition Framework for Assessing Motor Behavioural Disorders in Alzheimer Disease Patients", International Workshop on Behaviour Analysis (Behave 2011), Sophia Antipolis, France, 23 September 2011.
- E. Corvee and F. Bremond, "Haar-like and LBP Based Features for Face, Head and People Detection in Video Sequences", International Workshop on Behaviour Analysis (Behave 2011), Sophia Antipolis, France, 23 September 2011.
- B. Leibe and B. Schiele, "Interleaved Object Categorization and Segmentation", British Machine Vision Conference (BMVC'03), Norwich, UK, 2003.
- Wang, Peihua and Zhang, "Histogram Feature-Based Fisher Linear Discriminant for Face Detection", Neural Computing and Applications, 17(1), pp. 49-58, November 2007.
- E. Corvee and F. Bremond, "Combining Face Detection and People Tracking in Video Surveillance", 3rd International Conference on Imaging for Crime Detection and Prevention (ICDP 09), Kingston University, London, UK, 3 December 2009.
- D. G. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints", International Journal of Computer Vision, 60(2), pp. 91-110, 2004.
- S. Cosar and F. Bremond, "Unsupervised Activity Recognition".
- A. T. Nghiem, E. Auvinet and J. Meunier, "Head Detection Using Kinect Camera and Its Application to Fall Detection", 11th International Conference on Information Science, Signal Processing and their Applications (ISSPA 2012), pp. 164-169, 2012.
7. Contact:
Francois.Bremond@inria.fr