PhD Proposal
Farhood NEGIN
INRIA Sophia Antipolis, STARS group
2004, route des Lucioles, BP93
06902 Sophia Antipolis Cedex – France
http://www-sop.inria.fr/members/Francois.Bremond/

1. Title

People detection for activity recognition using RGB-Depth sensors

2. Scientific context

The STARS group works on automatic video sequence interpretation. The "SUP" platform ("Scene Understanding Platform") developed in STARS detects mobile objects, tracks their trajectories and recognizes related behaviours predefined by experts. The platform contains several techniques for detecting people and for recognizing the postures and activities of one or several persons using conventional cameras. However, people detection still raises scientific challenges when dealing with real-world scenes with apathetic patients: cluttered scenes, wrong or incomplete person segmentation, static and dynamic occlusions, low-contrast objects, moving contextual objects (e.g. chairs), and so on.

Moreover, new sensors have been released that improve people detection. For instance, thanks to Microsoft and its Kinect sensor, depth cameras have become popular and affordable. The basic idea of a depth camera is to combine an IR camera with an IR structured-light projector to determine the depth of each image pixel. This kind of sensor is well adapted to applications that monitor people (e.g. monitoring Alzheimer patients in hospital), because the people stay in a predefined area near the camera. Depth cameras have two main advantages: first, the output images contain depth information, and second, the sensor is largely insensitive to visible-light changes (it operates in the IR band).

In our work, we propose to use the Kinect or Asus sensor to acquire 3D images, detect people and recognize interesting activities. The Kinect SDK library is used to manage the Kinect sensor. This library is based on a framework similar to OpenNI (an open-source driver) to acquire the image.
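The principle behind such a 3D map can be sketched with the standard pinhole back-projection: each depth pixel maps to a 3D point in the camera frame. The sketch below is illustrative only; the intrinsic parameters are placeholder values, not the calibrated parameters of a Kinect or Asus sensor.

```python
import numpy as np

def depth_to_pointcloud(depth, fx, fy, cx, cy):
    """Back-project a depth map (in metres) to an H x W x 3 array of 3D
    points in the camera frame, using the pinhole model:
        X = (u - cx) * Z / fx,   Y = (v - cy) * Z / fy,   Z = depth."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.dstack((x, y, depth))

# Placeholder intrinsics (illustrative, not calibrated sensor values).
fx = fy = 525.0
cx, cy = 320.0, 240.0
depth = np.full((480, 640), 2.0)   # synthetic scene: a flat wall 2 m away
cloud = depth_to_pointcloud(depth, fx, fy, cx, cy)
print(cloud[240, 320])             # point at the principal-point pixel
```

The pixel under the principal point back-projects to a point straight ahead of the camera at the measured depth, which gives a quick sanity check on the intrinsics.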
Moreover, the library can perform some processing (e.g. people detection) and provide a true 3D map of the scene in the reference frame of the RGB-Depth sensor.

3. General objectives of the PhD

This work consists of designing novel algorithms for people detection using RGB-Depth sensors (e.g. Kinect) and for activity recognition, in order to help apathetic patients improve their quality of life. Many techniques have already been proposed for detecting people in specific environments (e.g. a low-density laboratory) using the cooperation of several sensors (e.g. a camera network, individuals equipped with markers, accelerometers). Despite these studies, people detection is still brittle with conventional cameras, often depends on the position of the individual relative to the cameras, and is limited in range (about 6-7 metres) with RGB-Depth sensors. This work aims at relaxing these assumptions in order to design a general algorithm enabling the detection of an individual living in an unconstrained environment and observed through a limited number of cameras, including RGB-D sensors.

The goal is to review the literature, evaluate existing libraries, and propose and assess new algorithms. The main objective of the research is to identify the limitations of current approaches and the open problems in the field; the target contribution is to address those problems by defining new methods. Investigating available tools could be inspiring in this regard. For instance, applying algorithms that have been used for natural language processing (NLP) to computer vision problems (finding meaningful patterns in video) could open a new window onto such problems. This will involve data mining methods as well as semantic analysis of the videos. Moreover, investigating deep learning, which has produced state-of-the-art results on various tasks, could be an interesting research direction for the activity recognition field.
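One common way to transfer NLP machinery to video, in the spirit suggested above, is a bag-of-visual-words: cluster local descriptors into a "vocabulary", then describe each clip by a histogram of word occurrences, on which text-mining tools apply. The sketch below uses synthetic descriptors and a plain k-means with deterministic initialisation; all data and parameter values are illustrative, not the method this proposal will develop.

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(X, k, iters=20):
    """Plain k-means; initialises from evenly spaced rows of X for
    determinism. Returns (centroids, labels)."""
    centroids = X[np.linspace(0, len(X) - 1, k).astype(int)].copy()
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels

# Synthetic "local descriptors" from two video clips with distinct patterns.
clip_a = rng.normal(0.0, 0.1, size=(50, 8))
clip_b = rng.normal(1.0, 0.1, size=(50, 8))
vocab, _ = kmeans(np.vstack([clip_a, clip_b]), k=2)

def bag_of_words(descriptors, vocab):
    """Histogram of nearest visual words -- the analogue of an NLP
    word-count vector for a document."""
    d = np.linalg.norm(descriptors[:, None] - vocab[None], axis=2)
    return np.bincount(d.argmin(axis=1), minlength=len(vocab))

print(bag_of_words(clip_a, vocab), bag_of_words(clip_b, vocab))
```

The two clips end up with clearly different word histograms, which is exactly what downstream data-mining or classification methods would consume.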
To learn new people detectors, we could explore techniques based on people's appearance and silhouette, using for instance local descriptors such as SURF, Hu moments, skin-colour histograms, MSER, LBP, HOG, Haar features, covariance matrices or the Omega descriptor, computed in the 3D depth map. To achieve this aim, we will extend different parts of the SUP platform to broaden the reach of its algorithms. The main focus will be on the "people detection" and "activity recognition" components of the platform; the two are highly interlaced, so improving the former affects the quality of the latter and vice versa.

Figure 1. Relation between detection and recognition: on-line feedback or off-line evaluation of activity recognition can help improve people detection.

The people detection algorithm currently in use applies classification to recognize people among the objects detected in the scene, and on that basis estimates the head and shoulders with a predefined model. It relies on background subtraction to eliminate noise, which improves detection. While the background subtraction part works quite efficiently, noise removal and the merging of different objects are inconsistent and reduce the performance of the people detection algorithm. As mentioned, combining appearance-based techniques such as deformable part models (DPM) with depth images may enhance the performance of the people detection component of the platform.

So far, activity recognition has followed two approaches, conventionally categorized as supervised and unsupervised. We propose a semi-supervised activity recognition method that combines the two. In the unsupervised approach, we use trajectory information and clustering techniques to find regions of interest in the scene, i.e. the regions where activities are most likely to happen. We define a hierarchical activity model for each activity occurring inside these regions.
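The unsupervised region-finding step can be sketched as follows: accumulate the points where tracked trajectories stop, histogram them on a coarse floor grid, and propose frequently visited cells as regions of interest. Everything below is a synthetic illustration under assumed parameters (cell size, visit threshold, invented "armchair" and "table" zones), not the platform's actual clustering algorithm.

```python
import numpy as np

# Synthetic trajectory stop points in floor coordinates (metres):
# people repeatedly stop near two hypothetical zones, plus stray detections.
rng = np.random.default_rng(1)
stops = np.vstack([
    rng.normal([1.5, 1.5], 0.2, size=(40, 2)),  # hypothetical "armchair" zone
    rng.normal([3.5, 2.5], 0.2, size=(40, 2)),  # hypothetical "table" zone
    rng.uniform(0.0, 5.0, size=(10, 2)),        # stray / noisy detections
])

def regions_of_interest(points, cell=1.0, min_count=10):
    """Histogram stop points on a coarse grid; cells visited at least
    `min_count` times are proposed as activity regions. Returns the
    origin (lower-left corner) of each qualifying cell, in metres."""
    idx = np.floor(points / cell).astype(int)
    cells, counts = np.unique(idx, axis=0, return_counts=True)
    return cells[counts >= min_count] * cell

print(regions_of_interest(stops))
```

The stray detections never reach the visit threshold, so only the two dense zones survive; in the real system these surviving cells would be the regions in which hierarchical activity models are instantiated.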
The hierarchical activity model gathers varied information about a particular activity and mostly relies on the time and duration properties of activities. We will try to improve it by taking into account more intricate constraints (such as local motion) beyond basic time-related attributes. Evaluating the people detection algorithm in isolation is tedious, but evaluating it through activity recognition is much easier. That is, with the results obtained from the recognition component, we can assess how much the detection algorithm's performance has improved. As Figure 1 shows, the recognition block's feedback (on-line or off-line) will guide how to improve the detection model; conversely, any improvement in detection will lead to better recognition.

To validate the PhD, we will assess the proposed approach on home-care 3D videos from Nice Hospital, evaluating algorithms designed to keep older adults functioning at higher levels and living independently. This PhD will be conducted in the PAL framework (https://pal.inria.fr/).

4. Pre-requisites:

Computer vision, strong background in C++ programming, Linux, artificial intelligence, cognitive vision, 3D geometry and machine learning.

5. Schedule – 2014-2017

1st year: Study the limitations of existing algorithms; propose an original algorithm for people detection.
2nd year: Propose an original algorithm for activity recognition; evaluate and optimise the proposed algorithms.
3rd year: Write papers and the PhD manuscript.

6. Bibliography:

Y. Yang and D. Ramanan. "Articulated Human Detection with Flexible Mixtures of Parts." IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), to appear, 2013.
A. Avanzi, F. Bremond, C. Tornieri and M. Thonnat. "Design and Assessment of an Intelligent Activity Monitoring Platform." EURASIP Journal on Applied Signal Processing, special issue on "Advances in Intelligent Vision Systems: Methods and Applications", 2005.
J. Joumier, R. Romdhane, F. Bremond, M. Thonnat, E. Mulin, P.H. Robert, A. Derreumeaux, J. Piano and L. Lee. "Video Activity Recognition Framework for Assessing Motor Behavioural Disorders in Alzheimer Disease Patients." International Workshop on Behaviour Analysis (Behave 2011), Sophia Antipolis, France, 23 September 2011.
E. Corvee and F. Bremond. "Haar-like and LBP-based Features for Face, Head and People Detection in Video Sequences." International Workshop on Behaviour Analysis (Behave 2011), Sophia Antipolis, France, 23 September 2011.
B. Leibe and B. Schiele. "Interleaved Object Categorization and Segmentation." British Machine Vision Conference (BMVC'03), Norwich, UK, 2003.
Peihua Wang and Zhang. "Histogram Feature-Based Fisher Linear Discriminant for Face Detection." Neural Computing and Applications, vol. 17, no. 1, pp. 49-58, November 2007.
E. Corvee and F. Bremond. "Combining Face Detection and People Tracking in Video Surveillance." 3rd International Conference on Imaging for Crime Detection and Prevention (ICDP 09), Kingston University, London, UK, 3 December 2009.
D. G. Lowe. "Distinctive Image Features from Scale-Invariant Keypoints." International Journal of Computer Vision, 60(2), pp. 91-110, 2004.
S. Cosar and F. Bremond. "Unsupervised Activity Recognition."
A. T. Nghiem, E. Auvinet and J. Meunier. "Head Detection Using Kinect Camera and Its Application to Fall Detection." 11th International Conference on Information Science, Signal Processing and their Applications (ISSPA 2012), pp. 164-169, 2012.

7. Contact:

Francois.Bremond@inria.fr