Statistical Methods for Human Behaviour Recognition

Geoff West & Svetha Venkatesh
This paper will describe a number of projects that are broadly defined as statistical methods for
human behaviour recognition. This has been a major research effort and has applications in
human gait analysis, surveillance, video indexing and smart homes. The statistical methods
are those based on hidden Markov models (HMMs) as, in all cases, we are dealing with
sequences of patterns and signals that have variability in terms of duration, and in terms of the
features used. This is because we are dealing with humans and there is inherent variability in
activity.
Much of the work is based on a laboratory environment that has a number of ceiling mounted
cameras attached to networked PCs so that many streams of video can be captured
simultaneously. Background subtraction (Stauffer et al.), bounding box and blob tracking, and
bounding box and blob statistics and features are used to describe the motion of people
throughout the environment. The cameras are calibrated so that positions on the floor are mapped
between the various cameras, giving a reasonably complete picture of movement. Kalman filtering
is used extensively to obtain good measurements and to deal with occlusion, both behind
objects and where people cross each other.
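To illustrate how Kalman filtering can carry a track through an occlusion, the following is a minimal sketch and not the implementation used in the laboratory. It assumes a constant-velocity model for a single floor coordinate, and the class name, function name and all noise variances are hypothetical. When a measurement is missing (`None`), only the prediction step runs, so the estimate keeps moving at the last estimated velocity.

```python
class ConstantVelocityKalman1D:
    """Kalman filter for one floor coordinate with state [position, velocity]."""

    def __init__(self, pos, vel=0.0, pos_var=1.0, vel_var=1.0,
                 process_var=0.01, meas_var=0.25):
        self.x = [pos, vel]                         # state estimate
        self.P = [[pos_var, 0.0], [0.0, vel_var]]   # estimate covariance
        self.q = process_var                        # acceleration noise variance (assumed)
        self.r = meas_var                           # measurement noise variance (assumed)

    def predict(self, dt=1.0):
        p, v = self.x
        (p00, p01), (p10, p11) = self.P
        q = self.q
        self.x = [p + dt * v, v]
        # P <- F P F^T + Q with F = [[1, dt], [0, 1]]
        self.P = [
            [p00 + dt * (p01 + p10) + dt * dt * p11 + q * dt ** 4 / 4,
             p01 + dt * p11 + q * dt ** 3 / 2],
            [p10 + dt * p11 + q * dt ** 3 / 2,
             p11 + q * dt ** 2],
        ]

    def update(self, z):
        (p00, p01), (p10, p11) = self.P
        innovation = z - self.x[0]
        s = p00 + self.r                 # innovation variance
        k0, k1 = p00 / s, p10 / s        # Kalman gain
        self.x = [self.x[0] + k0 * innovation, self.x[1] + k1 * innovation]
        # P <- (I - K H) P with H = [1, 0]
        self.P = [
            [(1 - k0) * p00, (1 - k0) * p01],
            [p10 - k1 * p00, p11 - k1 * p01],
        ]

    def step(self, z, dt=1.0):
        """Advance one frame; z is None while the person is occluded."""
        self.predict(dt)
        if z is not None:
            self.update(z)
        return self.x[0]


def track(measurements, **kwargs):
    """Filter a sequence of positions in which None marks an occluded frame."""
    first = next(m for m in measurements if m is not None)
    kf = ConstantVelocityKalman1D(first, **kwargs)
    return [kf.step(z) for z in measurements]
```

For example, `track([0.0, 1.0, 2.0, 3.0, 4.0, None, None, 7.0, 8.0])` returns one estimate per frame, including the two occluded frames. A second, independent filter would be needed for the other floor coordinate.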
Smart House – Scene Labelling
In this research, which is part of the smart house project, human activity is used to build
up a description of an indoor scene, i.e. the work is concerned with scene understanding and
object recognition. Traditional methods have problems because of the variability of
object shape and the difficulty of deciding what a particular object is, e.g. what is a chair? This
approach uses human interaction and behaviour and is influenced by the work of Stark et al. in
the early 1990s, in which the function of different objects was investigated, mainly from CAD
descriptions. Consider the following scenario. Assume that all pixels in an image are classified
as background i.e. not labelled as objects. A person walks around a room and occasionally sits
on a chair. By using an HMM to track the height of the bounding box, walking/standing, sitting
down, seated and standing up can be identified as human activities. Then, each pixel in the
image representing the scene can be updated depending on the activity. Pixels near the bottom
of the bounding box of a person detected to be walking or standing can be labelled as floor and
the more frames that this occurs for, the higher the confidence in the label that they are floor
pixels. If a person is detected as sitting down, then pixels inside the bounding box are
reinforced to be chair pixels. Again, the more frames for which this occurs, the higher the
confidence that the pixels are chair pixels. Given enough time and significant human movement
in the scene, a picture is eventually built up of the different objects in the scene. The advantage
of this method is that it is the human activity that determines the object class. If people
repeatedly sit on a coffee table, it becomes increasingly likely to be regarded as a chair rather
than a table, which is as expected. Current work is concerned with more fine scale human activity
determination such as eating and carrying, as well as other coarse scale activity such as lying
down. The fine scale activity will be used to identify such objects as tables, cupboards etc.
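The scenario above can be sketched in two steps: decode the activity in each frame from the bounding box height, then let each decoded frame vote for pixel labels. This is a minimal illustration under stated assumptions, not the system described here. The state names, the Gaussian emission parameters (height normalised by standing height), the transition probabilities, the use of only the bottom row of the box for floor votes, and the vote threshold are all hypothetical. Decoding uses the standard Viterbi algorithm in the log domain.

```python
import math

STATES = ["upright", "sitting_down", "seated", "standing_up"]
# Assumed (mean, std) of bounding-box height, normalised by standing height.
EMISSION = {"upright": (1.0, 0.05), "sitting_down": (0.8, 0.08),
            "seated": (0.6, 0.05), "standing_up": (0.8, 0.08)}
# Assumed cyclic topology: upright -> sitting_down -> seated -> standing_up -> upright.
TRANS = {"upright": {"upright": 0.9, "sitting_down": 0.1},
         "sitting_down": {"sitting_down": 0.5, "seated": 0.5},
         "seated": {"seated": 0.9, "standing_up": 0.1},
         "standing_up": {"standing_up": 0.5, "upright": 0.5}}
START = {"upright": 1.0}


def safe_log(p):
    return math.log(p) if p > 0 else float("-inf")


def log_gauss(x, mean, std):
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2 * math.pi))


def viterbi(heights):
    """Return the most likely state sequence for a sequence of box heights."""
    score = {s: safe_log(START.get(s, 0.0)) + log_gauss(heights[0], *EMISSION[s])
             for s in STATES}
    back = []
    for h in heights[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda r: score[r] + safe_log(TRANS[r].get(s, 0.0)))
            new_score[s] = (score[prev] + safe_log(TRANS[prev].get(s, 0.0))
                            + log_gauss(h, *EMISSION[s]))
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def accumulate_votes(frames, width, height):
    """frames: (state, (x0, y0, x1, y1)) per frame, inclusive pixels, y grows downwards."""
    floor = [[0] * width for _ in range(height)]
    chair = [[0] * width for _ in range(height)]
    for state, (x0, y0, x1, y1) in frames:
        if state == "upright":            # bottom row of the box votes for floor
            for x in range(x0, x1 + 1):
                floor[y1][x] += 1
        elif state == "seated":           # whole box votes for chair
            for y in range(y0, y1 + 1):
                for x in range(x0, x1 + 1):
                    chair[y][x] += 1
    return floor, chair


def label_pixel(floor_votes, chair_votes, min_votes=2):
    """Label a pixel once either class has enough votes; otherwise background."""
    if max(floor_votes, chair_votes) < min_votes:
        return "background"
    return "floor" if floor_votes >= chair_votes else "chair"
```

The vote counts play the role of the confidence described above: a pixel keeps its background label until enough frames support another class.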
Smart House – Describing Normal Behaviour
An important aspect of the smart house project is the need to describe the activities that occur
such as cooking and watching television. Analysis of many typical activities reveals a rich
hierarchical structure with simple tasks at the bottom of the hierarchy and more complex tasks
made up of sequences of lower level tasks at the higher levels. For example, preparing a TV
dinner could consist of various actions in the kitchen: going to the fridge, going to the oven etc.
as well as turning on the TV and sitting on the sofa. Hierarchical HMMs (HHMMs) have been
investigated to learn and describe such hierarchies of activity. Different approaches have been
tried, including allowing the HHMM to discover the low level behaviours (with the number of layers
and the number of hidden states at the lower levels defined), and learning the lower levels first,
then fixing them and learning the higher levels. Modified versions of the standard
techniques for HMM training are used. Current research is concerned with adding duration
models to the HHMM structure to enable the identification of abnormal behaviour given that
normal behaviour has been used for training.
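The idea of training on normal behaviour and then flagging departures from it can be illustrated with a much simpler stand-in than an HHMM with duration models. The sketch below assumes the lower level has already been fixed and emits one behaviour label per step. It reduces the upper level to a visible first-order Markov chain over those labels, and scores a new sequence by its mean log-likelihood per transition. The function names, the smoothing constant and the behaviour labels are hypothetical.

```python
import math


def fit_transitions(sequences, symbols, alpha=1.0):
    """Estimate transition probabilities from normal sequences with add-alpha smoothing."""
    counts = {a: {b: alpha for b in symbols} for a in symbols}
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return {a: {b: counts[a][b] / sum(counts[a].values()) for b in symbols}
            for a in symbols}


def mean_log_likelihood(seq, trans):
    """Average log-probability per transition; lower means less like the training data."""
    steps = list(zip(seq, seq[1:]))
    return sum(math.log(trans[a][b]) for a, b in steps) / len(steps)


def is_abnormal(seq, trans, threshold):
    """Flag a sequence whose score falls below a threshold chosen on normal data."""
    return mean_log_likelihood(seq, trans) < threshold
```

A threshold could be set, for instance, slightly below the lowest score observed on held-out normal sequences. Unlike the duration models mentioned above, this stand-in only reacts to unusual orderings of behaviours, not to behaviours that last unusually long.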
Surveillance – Detecting Normal Behaviour
In the area of surveillance, there is a need to describe various normal activities that may occur
in a room, a floor of a building, the whole building and so on. At any point in time during human
activity, it is necessary to have probabilistic measures of the most likely activities that are
occurring. We have been exploring the use of Hierarchical Dynamic Bayesian Networks for
this. The advantage of a hierarchical model over a flat model is that it represents the
hierarchical structure of activity explicitly. Currently, we are using the EM algorithm to
estimate the parameters of the hierarchical probabilistic model, allowing complex activities to be
generated from simple activities.
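Maintaining a probabilistic measure of the most likely activity at each point in time is a filtering problem. As a minimal sketch, the code below runs the standard forward recursion on a flat, discrete HMM; a hierarchical dynamic Bayesian network generalises this by adding further layers of state, which are not shown. The function name and all states, observations and probabilities in the usage note are hypothetical.

```python
def forward_filter(observations, states, start, trans, emit):
    """Return, for each time step, P(activity | observations so far)."""
    belief = dict(start)
    history = []
    for t, obs in enumerate(observations):
        if t > 0:
            # Prediction: push the previous belief through the transition model.
            predicted = {s: sum(belief[r] * trans[r][s] for r in states)
                         for s in states}
        else:
            predicted = belief
        # Correction: weight by the likelihood of the new observation, then normalise.
        unnorm = {s: predicted[s] * emit[s][obs] for s in states}
        total = sum(unnorm.values())
        belief = {s: unnorm[s] / total for s in states}
        history.append(belief)
    return history
```

With two assumed activities, "meeting" and "cleaning", and observations giving the zone in which a person is tracked ("table" or "corridor"), each entry of the returned list is a distribution over the two activities that can be reported or thresholded online.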
Video Indexing – Using Accelerometers
The final project does not rely on the processing of video data but uses accelerometers to
measure human activity from limb and body movement directly. These movements can be
quite complex and, again, we use HHMMs to describe the sequences at different levels. We
have been concentrating on the movements of sports officials in sports such as Australian Rules
Football and cricket. The objective is to link the detected movements to the sport video so that
we can extract various types of event, e.g. “show me all the goals scored” or “show me all the
cricketers given out by LBW”. Obviously accelerometers (which are now available in small
unobtrusive packages) need to be worn by the sports officials, but this is not a problem, and much
video and accelerometer data has been acquired from various sports. So far we have been able to
learn and recognise various cricket and martial arts gestures.
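Once gestures have been recognised and time-stamped, linking them to the video amounts to building an index from gesture label to clip intervals. The following is a minimal sketch of such a lookup, assuming the accelerometer and video clocks are already synchronised. The `Gesture` record, the `clips_for` function and the pre-roll and post-roll margins are hypothetical; the pre-roll reflects the assumption that an official signals after the event itself has happened.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gesture:
    label: str       # recognised gesture, e.g. "goal" or "out"
    start_s: float   # start time in seconds on the video timeline
    end_s: float     # end time in seconds on the video timeline


def clips_for(gestures, label, pre_roll_s=10.0, post_roll_s=2.0):
    """Return merged (start, end) video intervals for every gesture with this label."""
    clips = []
    for g in sorted(gestures, key=lambda g: g.start_s):
        if g.label != label:
            continue
        start = max(0.0, g.start_s - pre_roll_s)
        end = g.end_s + post_roll_s
        if clips and start <= clips[-1][1]:
            # Overlaps the previous clip: extend it instead of adding a new one.
            clips[-1] = (clips[-1][0], max(clips[-1][1], end))
        else:
            clips.append((start, end))
    return clips
```

A query such as “show me all the goals scored” then becomes `clips_for(gestures, "goal")`, and the returned intervals can be passed to whatever video player or editor is in use.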