Statistical Methods for Human Behaviour Recognition

Geoff West & Svetha Venkatesh

This paper describes a number of projects that are broadly defined as statistical methods for human behaviour recognition. This has been a major research effort and has applications in human gait analysis, surveillance, video indexing and smart homes. The statistical methods are those based on hidden Markov models (HMMs) because, in all cases, we are dealing with sequences of patterns and signals that vary both in duration and in the features used. This variability is inherent in human activity.

Much of the work is based on a laboratory environment that has a number of ceiling-mounted cameras attached to networked PCs so that many streams of video can be captured simultaneously. Use is made of background subtraction (Stauffer et al.), bounding box and blob tracking, and bounding box and blob statistics and features to describe the motion of people throughout the environment. The cameras are calibrated so that positions on the floor are mapped between the various cameras, giving a reasonably complete picture of movement. Much use is made of Kalman filtering to obtain good measurements and to deal with occlusion, both behind objects and where people cross each other.

Smart House – Scene Labelling

In this research, which is part of the smart house project, human activity is used to build up a description of an indoor scene, i.e. the work is concerned with scene understanding and object recognition. The traditional methods investigated have problems because of the variability of object shape and the difficulty of deciding what a particular object is, e.g. what is a chair? This approach uses human interaction and behaviour and is influenced by the work of Stark et al. in the early 1990s, in which the function of different objects was investigated, mainly from CAD descriptions. Consider the following scenario.
Assume that all pixels in an image are classified as background, i.e. not labelled as objects. A person walks around a room and occasionally sits on a chair. By using an HMM to track the height of the bounding box, walking/standing, sitting down, seated and standing up can be identified as human activities. Each pixel in the image representing the scene can then be updated depending on the activity. Pixels near the bottom of the bounding box of a person detected to be walking or standing can be labelled as floor, and the more frames for which this occurs, the higher the confidence that they are floor pixels. If a person is detected as sitting down, then pixels inside the bounding box are reinforced as chair pixels. Again, the more frames for which this occurs, the higher the confidence that the pixels are chair pixels. Given enough time and significant human movement in the scene, a picture is eventually built up of the different objects in the scene.

The advantage of this method is that it is the human activity that determines the object class. If people repeatedly sit on a coffee table, it becomes increasingly likely to be regarded as a chair rather than a table, which is as expected. Current work is concerned with finer scale human activity determination such as eating and carrying, as well as other coarse scale activities such as lying down. The fine scale activity will be used to identify objects such as tables, cupboards etc.

Smart House – Describing Normal Behaviour

An important aspect of the smart house project is the need to describe the activities that occur, such as cooking and watching television. Analysis of many typical activities reveals a rich hierarchical structure, with simple tasks at the bottom of the hierarchy and, at the higher levels, more complex tasks made up of sequences of lower level tasks. For example, preparing a TV dinner could consist of various actions in the kitchen: going to the fridge, going to the oven etc.
as well as turning on the TV and sitting on the sofa. Hierarchical HMMs (HHMMs) have been investigated to learn and describe such hierarchies of activity. Different approaches have been tried. One allows the HHMM to discover the low level behaviours, with the number of layers and the number of hidden states at the lower levels specified. Another learns the lower levels first, then fixes them and learns the higher levels. Modifications of the standard techniques for HMM training are used. Current research is concerned with adding duration models to the HHMM structure to enable the identification of abnormal behaviour, given that normal behaviour has been used for training.

Surveillance – Detecting Normal Behaviour

In the area of surveillance, there is a need to describe the various normal activities that may occur in a room, on a floor of a building, in the whole building and so on. At any point in time during human activity, it is necessary to have probabilistic measures of the most likely activities that are occurring. We have been exploring the use of Hierarchical Dynamic Bayesian Networks for this. The advantage of a hierarchical model over a flat model is that it explicitly represents the hierarchical structure of the activities. Currently, we are using the EM algorithm to estimate the parameters of the hierarchical probabilistic model, allowing complex activities to be generated from simple activities.

Video Indexing – Using Accelerometers

The final project does not rely on the processing of video data but uses accelerometers to measure human activity directly from limb and body movement. These movements can be quite complex and, again, we use HHMMs to describe the sequences at different levels. We have been concentrating on the movements of sports officials in sports such as Australian Rules Football and cricket. The objective is to be able to link the detected movements to the sport video so that we can extract various types of event e.g.
“show me all the goals scored” or “show me all the cricketers given out by LBW”. Obviously the accelerometers (which are now available in small unobtrusive packages) need to be worn by the sports officials, but this is not a problem, and much video footage and accelerometer data has been acquired from various sports. So far we have been able to learn and recognise various cricket and martial arts gestures.
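
As an illustration of the scene labelling scheme described under Smart House – Scene Labelling, the per-frame pixel update can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the activity label for each frame is assumed to be supplied already (for example by the HMM on bounding box height), and the names `update_evidence`, `label_scene`, `floor_band` and `min_count`, the activity strings, and the simple frame-count confidence are all hypothetical choices made for the example.

```python
from collections import defaultdict

# Hypothetical activity names; the per-frame label is assumed to come
# from an upstream recogniser (e.g. an HMM on bounding-box height).
UPRIGHT = {"walking", "standing"}
SITTING = {"sitting_down", "seated"}


def new_evidence():
    """Per-pixel frame counts; pixels never touched stay background."""
    return defaultdict(lambda: {"floor": 0, "chair": 0})


def update_evidence(evidence, box, activity, floor_band=2):
    """Add one frame of evidence.

    box is (left, top, right, bottom) in inclusive pixel coordinates,
    with y increasing downwards. floor_band (an assumed parameter) is
    the number of rows at the bottom of the box treated as floor.
    """
    left, top, right, bottom = box
    if activity in UPRIGHT:
        # Only rows near the bottom edge of the box support "floor".
        first_row = max(top, bottom - floor_band + 1)
        for y in range(first_row, bottom + 1):
            for x in range(left, right + 1):
                evidence[(x, y)]["floor"] += 1
    elif activity in SITTING:
        # Every pixel inside the box supports "chair".
        for y in range(top, bottom + 1):
            for x in range(left, right + 1):
                evidence[(x, y)]["chair"] += 1


def label_scene(evidence, min_count=3):
    """Label a pixel once one class has been seen in min_count frames."""
    labels = {}
    for pixel, counts in evidence.items():
        best = max(counts, key=counts.get)
        if counts[best] >= min_count:
            labels[pixel] = best
    return labels


# Usage: four frames of walking, then five frames seated elsewhere.
evidence = new_evidence()
frames = [((2, 0, 4, 9), "walking")] * 4 + [((6, 3, 8, 7), "seated")] * 5
for box, activity in frames:
    update_evidence(evidence, box, activity)
labels = label_scene(evidence)
```

Counting frames is only one way to express "the more frames, the higher the confidence"; a probabilistic update per pixel would serve the same purpose. Because the label follows the accumulated activity, a coffee table that is repeatedly sat on would end up labelled as a chair under this sketch, matching the behaviour described in the text.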