Project BeSafe – SfP 982480 – 3rd Progress Report – November 2008
NATO PROGRAMME FOR SECURITY THROUGH SCIENCE
SCIENCE FOR PEACE
NATO Public Diplomacy Division, Bd. Leopold III, B-1110 Brussels, Belgium; fax +32 2 707 4232; e-mail sfp.applications@hq.nato.int
Project Director (PPD): PROF. NAFTALI TISHBY, ISRAEL
Project Director (NPD): PROF. RITA CUCCHIARA, ITALY
People involved in the report’s preparation:
Prof. Rita Cucchiara, Dr. Andrea Prati
Prof. Naftali Tishby
Date of completion: 18 October 2008
OUTLINE OF THE SfP SUMMARY REPORT
Abstract of Research
Major Objectives
Overview of Achievements since the Start of the Project until 31 March 2008
Milestones for the Next Six Months
Implementation of Results
Other Collaborating Institutions
HUJI – The Hebrew University
UNIMORE – Università degli Studi di Modena e Reggio Emilia
MSS – Magal Security Systems, Ltd.
PPD – Partner Project Director
NPD – NATO Project Director
CV – Computer Vision
ML – Machine Learning
FOV – Field of View
OGM – Oscillatory Gait Model
HMM – Hidden Markov Model
SVM – Support Vector Machine
PTZ – Pan-Tilt-Zoom
(a) Project Director (PPD) (Consult “Definitions”)
Surname/First name/Title: TISHBY / NAFTALI / PROF.
Job Title, Institute and Address: Professor, School of Engineering and Computer Science, The Hebrew University, Ross Building, Givat Ram Campus, 91904 Jerusalem, Israel
Country: ISRAEL
Tel: +972-2-658-4167; Fax: +972-2-658-6440; Email: tishby@cs.huji.ac.il

(b) End-user(s) (Consult “Definitions”)
Surname/First name/Title: DANK / ZVI
Job Title, Company/Organisation and Address: V.P. Engineering, Magal Security Systems, Ltd., P.O. Box 70, Industrial Zone, 56000 Yahud
Country: ISRAEL
Tel: +972-3-5391444; Fax: +972-3-5366245; Email: mglzvi@trendline.co.il

(c) Project Director (NPD) (Consult “Definitions”)
Surname/First name/Title: CUCCHIARA / RITA / PROF.
Job Title, Institute and Address: Full Professor, Dipartimento di Ingegneria dell’Informazione, University of Modena and Reggio Emilia, Via Vignolese 905, 41100 Modena, Italy
Country: ITALY
Tel: +39 059 2056136; Fax: +39 059 2056129; Email: rita.cucchiara@unimore.it
This project is unique in that it combines two main areas of research, Computer Vision (CV) and Machine Learning (ML), in an application of automatic surveillance for people detection and tracking and abnormal behavior recognition. CV and ML have been used jointly in many different applications, but typically either by using ML as a tool for computer vision applications or by using CV as a case study to prove theoretical advances in ML.
The project aims at exploring how visual features can be automatically extracted from video using computer vision techniques and exploited by a classifier (generated by machine learning) to detect and identify suspicious people behavior in public places in real time. In this sense, CV and ML are jointly developed and studied to provide a better mix of innovative techniques.
Justification of the proposed project is based on two issues of major concern to the State of Israel:
(1) the need for intelligent surveillance in public and commercial areas that are susceptible to terrorist attacks, and (2) the lack of automatic and intelligent decision support in existing surveillance systems.
More specifically, the objectives of the project are: (1) to achieve a better understanding of which visual features can be used for (1.a) analyzing people activity and (1.b) characterizing people shape; (2) to suitably adapt ML techniques such as HMM, SVM or methods for “novelty detection” in order to infer people's behavior from the extracted visual features, possibly classifying it as normal or abnormal; (3) to develop a first simple prototype in a specific scenario that can be considered a threat to security.
The machine learning research is carried out at the Hebrew University’s machine learning lab utilizing its long experience in temporal pattern recognition and computational learning methods.
Following the meeting in June 2007 in Jerusalem, we decided to focus, in the time available for the project, on one particular behavior which is both well defined and threatening: people who leave objects behind them (such as luggage in airports). The machine learning component is based on the following phases: (1) constructing a generative statistical model of human gait on the basis of the features provided by the CV group. Such a model is an adaptation of an oscillatory dynamic model we developed in the past (Singer and Tishby 1994), where different points on the walking person are assumed to have a drifted oscillatory motion with a characteristic frequency and relative phases; (2) this basic Oscillatory Gait Model (OGM) is then plugged in as the output of a state of an HMM, yielding a complete statistical model of regular gait; (3) detecting deviations (irregularities) in the relative phases and amplitudes of the OGM to capture irregular behavior, e.g. halting, bending, leaving objects, etc. The output of such a statistical model can be classified using likelihood ratio tests or standard classifiers such as SVM to improve confidence; (4) we also carried out work on detecting statistical irregularities in multivariate correlated data, as another component of the project.
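As a toy illustration of the drifted-oscillation assumption in phase (1) and the irregularity detection in phase (3), the following sketch simulates one tracked body point and flags a halt via the residual variance of a sinusoid-plus-drift fit. The synthetic trajectory, the fixed frequency and the variance-ratio test are illustrative assumptions only; they stand in for, and are much simpler than, the actual OGM/HMM formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def synth_gait(T=200, drift=0.02, amp=1.0, freq=0.1, noise=0.05):
    """Drifted oscillatory motion of one body point (illustrative)."""
    t = np.arange(T)
    return drift * t + amp * np.sin(2 * np.pi * freq * t) + rng.normal(0, noise, T)

def fit_oscillation(y, freq):
    """Least-squares fit of drift + sinusoid at a known frequency; returns
    the coefficients and the residual variance (goodness of the gait model)."""
    t = np.arange(len(y))
    X = np.column_stack([t, np.sin(2 * np.pi * freq * t),
                         np.cos(2 * np.pi * freq * t), np.ones_like(t, dtype=float)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return coef, resid.var()

# Regular gait: residual variance stays near the observation-noise level.
y_reg = synth_gait()
_, var_reg = fit_oscillation(y_reg, freq=0.1)

# "Irregular" gait: the person halts halfway through (oscillation stops).
y_irr = y_reg.copy()
y_irr[100:] = y_irr[99]
_, var_irr = fit_oscillation(y_irr, freq=0.1)

# Likelihood-ratio-style score: much larger residual variance => irregular.
print(var_irr > 4 * var_reg)   # True
```

In the project's actual model the oscillation is the output density of an HMM state, so the irregularity score would come from the HMM likelihood rather than from a single least-squares residual.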
Workplan for the 1st and 2nd year (scheduled by quarters: months 1-3, 4-6, 7-9 and 10-12 of each year; status legend: As planned / Completed / Delayed):

1. Hybrid and distributed multi-camera people detection and tracking
S1.1 People detection and tracking in multi-camera systems
S1.2 Camera coordination primitives for static, hybrid and PTZ cameras
2. Feature extraction for people surveillance
S2.1 Feature extraction for people activity detection
S2.2 Feature extraction for people shape detection
3. Data preparation and symbolic coding
S3.1 Data preparation and understanding, per-sensor symbolic coding and state modeling for people activity features
S3.2 Data selection, cleaning, formatting, and case generation for people activity features
S3.3 Data preparation and understanding, per-sensor symbolic coding and state modeling for people shape features
S3.4 Data selection, cleaning, formatting, and case generation for people shape features
4. Designing a dynamic gait model based on coupled oscillatory motion
S4.1 Using the people activity features to design a statistical classifier of regular gait
S4.2 Designing a state model/kernel for the people shape features
S4.3 Plugging in the Gait-Oscillatory Model (GOM) as a state in an HMM for a complete regular gait statistical model
S4.4 Using the likelihood of the model for robust classification of regular motion/behaviour
5. Framework for Abnormal Behavior monitoring
S5.1 Analysis of requirements and constraints
S5.2 Video data collection and annotation
S5.3 Testing and refinement of the integrated framework
The main objective of the UNIMORE unit in the project is to study which visual features can be used for inferring abnormal people behaviors. These features come from two types of analysis: the analysis of people activity and the analysis of people shape.
During the first year of the project, UNIMORE concentrated on developing methods for extracting useful features for both people activity and people shape.
People activity analysis in terms of trajectory shape (a first descriptive feature for people activity monitoring) has been completed, with a fully working system for logging, analyzing and classifying people trajectories. In this third semester, UNIMORE has deepened its research on people shape analysis (see step S2.2), both with a newly developed approach for markerless tracking of body parts (to model people shape) and by collecting data from a Motion Capture (MoCap) system in order to provide useful data to HUJI.
Some activities planned in the second semester regarding feature extraction from moving cameras have been concluded in this semester. Finally, a first effort toward building a complete multi-sensor system for behavior analysis has been carried out.
During the second semester, a newly developed system for people detection and tracking from moving cameras (both constrained PTZ and freely moving cameras), jointly studied with the University of Venice, was produced and tested. In this semester, UNIMORE further completed and enhanced the system by proposing a new method for exploiting graph matching theory (developed by the University of Venice) in the context of body part tracking. By extracting meaningful body/object parts and imposing spatial constraints among them (both to account for human body deformability and to preserve relative positions and orientations), we were able to create a system for tracking from completely moving cameras that is robust to occlusions, rotations, scaling, illumination variation, etc. (see the previous report or (Gualdi, Albarelli, Prati, Torsello, Pelillo, & Cucchiara, 2008) for a complete description).
The results of this research have been collected in a paper accepted for oral presentation at the International Workshop on Visual Surveillance (VS), held in conjunction with the European Conference on Computer Vision (ECCV) 2008 in Marseille (France) in October 2008 (Gualdi, Albarelli, Prati, Torsello, Pelillo, & Cucchiara, 2008).
With the aim of integrating the acquired knowledge and experience in people detection and tracking from single and moving cameras, UNIMORE also “learned its lessons” by fully understanding the limits of systems based on a single (either static or moving) camera. Thus, during this semester UNIMORE also investigated the enabling technologies, mostly from computer vision and distributed sensor networks, that have been used as building blocks of an automatic system for the attendance of indoor environments, one which can successfully replace the tedious (and costly) human-based activity that is often limited to very low-level people monitoring (people positioning, counting, evident abuses of devices, thefts, etc.).
The field of applicability is wide: consider, for instance, the attendance of shops, libraries, labs, data centers, etc. Such environments are often left unattended, since the cost of human attendance would be far greater than the cost of the problems deriving from its lack. In our application scenario, multi-modal sensor networks may be used to collect useful features describing people activities.
Figure 1. Scheme of the overall architecture
The general system architecture, depicted in Fig. 1, is divided into three main layers, namely: perimeter, environment and reasoning.
The perimeter layer deals with the surveillance of the surroundings of the environment to be attended. Usually (but not necessarily) it is based on nodes which are positioned outdoors. Each node is self-standing and consists of a micro-controller, a radio-communication device, a camera and a power supply. In order to realize a fully self-standing device, it is even feasible to power the nodes through solar panels.
Figure 2. Scheme of the Perimeter Vision layer
The layer built upon such nodes forms a distributed sensor network that performs video surveillance tasks: object detection (with particular focus on people and on stationary, possibly abandoned, objects), tracking (with consistent labeling) and people entrance/exit logging. This last task is particularly important, since this information is handed off to the reasoning layer, which makes inferences about the attended area and the people interacting around it. Unfortunately, in distributed video surveillance systems, extracting meaningful information from remote cameras to detect abnormal situations is a challenging task.
As shown in Fig. 2, one node is elected as master, with the tasks of aggregating the data provided by the processing nodes, handling the communication with the reasoning layer and managing the insertion of new nodes into the layer. Further details on this layer can be found in (Gualdi, et al., 2008).
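The master's aggregation role can be sketched as follows. The `Node`/`Master` classes, the event tuples and the method names are illustrative assumptions for this report, not the layer's actual implementation:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A perimeter processing node (illustrative model)."""
    node_id: int
    events: list = field(default_factory=list)   # e.g. ("enter", person_id, timestamp)

@dataclass
class Master:
    """The elected master aggregates per-node events for the reasoning layer."""
    nodes: dict = field(default_factory=dict)

    def register(self, node):
        """Handle the insertion of a new node into the layer."""
        self.nodes[node.node_id] = node

    def aggregate(self):
        """Merge all entrance/exit events, ordered by timestamp."""
        merged = [e for n in self.nodes.values() for e in n.events]
        return sorted(merged, key=lambda e: e[2])

m = Master()
a, b = Node(1), Node(2)
a.events.append(("enter", "p7", 10.0))
b.events.append(("exit", "p7", 12.5))
m.register(a)
m.register(b)
print(m.aggregate())   # [('enter', 'p7', 10.0), ('exit', 'p7', 12.5)]
```

In the real layer the events would of course arrive over the radio links rather than through in-process method calls.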
The environment layer deals with the surveillance of the environment area to be attended.
Unlike the perimeter layer, it is based on cameras which are not supposed to be repositioned. Therefore they are wire-connected to a central processing unit, and the requirement of easy and quick deployment is loosened here. In our implementation, the employed cameras are of two different kinds playing complementary roles: fixed and PTZ cameras (see Fig. 3); but nothing prevents the architecture from exploiting other kinds of cameras (e.g. omni-directional) or even different sensors.
Figure 3. Scheme of the environment layer
The fixed cameras are positioned inside the environment so that, primarily, all the entrance/exit points are kept under observation and, secondarily, the widest possible area is covered by their fields of view. Since the cameras are fixed, object segmentation and single-view tracking can be performed using background suppression.
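A minimal sketch of the background-suppression idea for fixed cameras follows: a running-average background model and an absolute-difference threshold. The array sizes, threshold and update rate are illustrative, and this is a far simpler stand-in for the detector actually used in the project:

```python
import numpy as np

def update_background(bg, frame, alpha=0.05):
    """Running-average background model (selective update omitted for brevity)."""
    return (1 - alpha) * bg + alpha * frame

def foreground_mask(bg, frame, thresh=25):
    """Pixels far from the background model are flagged as moving objects."""
    return np.abs(frame.astype(float) - bg) > thresh

# Toy example: a static 64x64 scene with a bright 8x8 "person" appearing.
rng = np.random.default_rng(1)
scene = rng.integers(0, 50, size=(64, 64)).astype(float)
bg = scene.copy()
frame = scene.copy()
frame[20:28, 30:38] += 100            # the moving object

mask = foreground_mask(bg, frame)
print(int(mask.sum()))                # 64 foreground pixels
bg = update_background(bg, frame)     # slowly absorb slow scene changes
```

A production system would add shadow removal and selective update (not absorbing detected objects into the background), as in the techniques cited later in this report.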
From a high-level view, this layer extends the tasks that were requested in the perimeter layer.
Beyond people tracking and entrance/exit monitoring, more advanced tasks are also performed; for instance, active tracking of moving objects through PTZ cameras, which is particularly helpful when an object is leaving the field of view of the fixed cameras. The environment layer, similarly to the perimeter one, forwards the homography, with camera positions and fields of view, to the reasoning layer. Since this layer has a wired connection to the network, the data regarding moving objects are forwarded by default at the maximum frequency (i.e. equal to the frame rate) and with the maximum degree of detail (trajectories, object descriptors, images, clips, etc.).
Figure 4. Scheme of the reasoning layer and the interactions with the other layers
The reasoning layer, represented in its functionalities in Fig. 4, infers behaviors in the attended environment by using data provided by perimeter and environment layers. In particular, it infers knowledge about what happened, happens and is likely to happen in the environment.
Specifically, regarding the present (what happens), the reasoning layer provides an on-line people counter (restricted to the borders of the indoor environment), a people tracker and an analysis of the status of the environment infrastructure (missing/misplaced objects). Regarding the past (what happened), the layer offers a log of all the people that came in contact with the environment, with trajectories, inferred information (e.g. interactions with other people or with infrastructures of the environment) and recorded visual data. Regarding the future (what will likely happen), the layer infers, by merging the geometrical data and the perimeter observations, who is probably approaching or leaving the environment. Of course, these three pieces of information can be interrelated in order to deepen the knowledge of the environment (e.g. in the case of a misplaced object, it is possible to understand who interacted with it using trajectory analysis and, via the visual data, understand what really happened).
The results of this research, partially carried out in collaboration with the University of Palermo, have been presented in a paper accepted for publication at the First International ACM Workshop on Vision Networks for Behavior Analysis (VNBA 2008), held in conjunction with the ACM Multimedia conference and organized by UNIMORE (Gualdi, et al., 2008). VNBA 2008 will be held on October 31, 2008 in Vancouver (Canada).
In order to provide HUJI with a large set of exemplar data for testing their statistical models, UNIMORE followed two directions during this semester. The first direction, more interesting and innovative from the computer vision perspective, is the development of a completely automatic system capable of segmenting and tracking meaningful body parts (describing the people shape) with a markerless approach. To overcome the obvious inaccuracies of the markerless approach, the second direction followed by UNIMORE is the use of a MoCap (Motion Capture) system, which robustly tracks body parts through the use of several artificial markers.
Markerless approach
Basic approaches for recognizing human actions are based either on the analysis of body shape (in 2D or 3D) or on the analysis of the dynamics of prominent points or parts of the human body. More specifically, action recognition approaches can be divided into two main groups (Gavrila D. M., 1999) depending on whether the analysis is performed directly in the image plane (2D approaches) or using a three-dimensional reconstruction of the action itself (3D approaches). The latter have been widely adopted where building and fitting a 3D model of the body parts performing the action is relatively simple, thanks to controlled environmental conditions and a high-resolution view of the object (Rehg & Kanade, 1995) (Goncalves, Di Bernardo, Ursella, & Perona, 1995) (Gavrila & Davis, 1996). These methods are often unfeasible in real-time surveillance applications. Regardless of the sophistication of the approach, these methods can be applied only if a more or less sophisticated model of the target exists.
On the contrary, 2D approaches analyze the action in the image plane, relaxing all the environmental constraints of 3D approaches but lowering the discriminative power of the action-classification task. People action classification can be performed in the image plane either by explicitly observing and tracking feature points (local feature approaches (Laptev & Lindeberg, 2003)) or by considering the whole shape-motion as a feature in itself (holistic approaches (Cucchiara, Grana, Prati, & Vezzani, 2005) (Ke, Sukthankar, & Hebert, 2007)). (Yilmaz & Shah, 2005) exploited the tracking of people's contour points to build a 3D volume describing the action; their work is an example of a local feature approach. (Niebles, Wang, & Fei-Fei, 2006) proposed a feature-based approach that searches for “spatio-temporal words” as a time-collection of interest points and classifies them into actions using a pLSA (probabilistic latent semantic analysis) graphical model. Holistic approaches, instead, directly map low-level image features to actions, preserving spatial and temporal relations. Feature choice is a crucial aspect in obtaining a discriminative representation. An interesting holistic approach that detects human actions in videos without performing motion segmentation was proposed by (Shechtman & Irani, 2007). They analyzed spatio-temporal video patches to detect discontinuities in the motion-field directions. Despite the general applicability of this method, its high computational cost makes it unusable for real-time surveillance applications.
The system developed by UNIMORE in this task is meant to avoid the use of artificial markers by automatically segmenting the human silhouette into a number of relevant areas describing the motion evolution. The tracking of the areas' centroids produces a set of 3D trajectories describing the person's action at a fine grain. Then, in order to compare two actions, we define a novel approach for comparing two sets of trajectories based on global sequence alignment and dynamic programming, similar to the approach applied in the second semester for people trajectory shape analysis.
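The global-alignment comparison of two trajectories can be sketched with a Needleman-Wunsch-style dynamic program over point-to-point Euclidean costs. The gap penalty and the toy trajectories below are illustrative assumptions; the actual cost scheme is the one described in the second report:

```python
import numpy as np

def align_cost(a, b, gap=1.0):
    """Global alignment (Needleman-Wunsch style) between two trajectories.
    Match cost = Euclidean distance between points; lower total = more similar."""
    n, m = len(a), len(b)
    D = np.zeros((n + 1, m + 1))
    D[:, 0] = np.arange(n + 1) * gap          # aligning against an empty prefix
    D[0, :] = np.arange(m + 1) * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = D[i - 1, j - 1] + np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = min(match, D[i - 1, j] + gap, D[i, j - 1] + gap)
    return D[n, m]

# A trajectory is closer to a slightly shifted copy of itself than to a different shape.
t = np.linspace(0, 1, 20)
walk   = np.stack([t, 0.1 * np.sin(6 * t)], axis=1)
walk2  = walk + 0.01                                      # nearly identical walk
zigzag = np.stack([t, 0.5 * np.sign(np.sin(20 * t))], axis=1)

print(align_cost(walk, walk2) < align_cost(walk, zigzag))   # True
```

The same dynamic program applies unchanged to the 3D centroid trajectories produced by the segmentation step.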
An important addition of this method is the fusion of information coming from different views of the same action. In this way, consistent action recognition is exploited to correct errors due to occlusions, view-dependent action representations, etc. Preliminary results showed excellent discriminative power for this approach.
Figure 5. Scheme of the proposed system
The proposed system is based on four main steps (Fig. 5):
• object detection and tracking: for each camera view C_i, the moving people are segmented and tracked from the image I_i(t) at time instant t; this step produces a probability map PM_i(t) for each moving person; the techniques used have been reported in (Cucchiara, Grana, Piccardi, & Prati, 2003) (Cucchiara, Grana, Tardini, & Vezzani, 2004), and examples of PM_i(t) for different actions are shown in Fig. 6;
• iterative space-time trajectory extraction: the K main components of PM_i(t), corresponding to the K main body parts, are automatically extracted and tracked, and they are used to model the action; the EM algorithm is used to infer the parameter set A_i(t) of a 3-variate mixture of Gaussians (MoG) on PM_i(t); the means of the K components are tracked frame-by-frame using a minimum-distance approach in the pdf domain based on the Bhattacharyya distance; finally, the iterative tracking yields K space-time trajectories STT_i = {T_1, ..., T_K} for each view; some examples of the segmentation achieved by this process with K = 3 components are reported in Fig. 7, where a person leaving a pack is shown;
• consistent action recognition: the list of tracked people from different (overlapping) cameras can be used to consistently assign the same label to different instances of the same person in different views (the consistent labeling problem, as addressed in (Calderara, Cucchiara, & Prati, 2008)); this assignment is used in the consistent action recognition step to fuse together the STTs of the same person coming from different views;
• action recognition: a new action, modeled as the set STT of the consistent STT_i from the different views, is compared using global alignment to compute a measure of distance/similarity from all the existing actions; the classification is performed with a minimum-distance classifier.
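The frame-by-frame association of MoG components via the minimum Bhattacharyya distance can be sketched as follows. The closed-form Bhattacharyya distance between Gaussians is standard, while the greedy nearest-component association and the toy 2D components are illustrative simplifications of the 3-variate case used in the system:

```python
import numpy as np

def bhattacharyya(mu1, S1, mu2, S2):
    """Closed-form Bhattacharyya distance between two Gaussian components."""
    S = 0.5 * (S1 + S2)
    d = mu1 - mu2
    term1 = 0.125 * d @ np.linalg.solve(S, d)
    term2 = 0.5 * np.log(np.linalg.det(S) /
                         np.sqrt(np.linalg.det(S1) * np.linalg.det(S2)))
    return term1 + term2

def associate(prev, curr):
    """Greedy frame-to-frame association of MoG components by minimum distance."""
    return [min(range(len(curr)),
                key=lambda j: bhattacharyya(*prev[i], *curr[j]))
            for i in range(len(prev))]

# Two components (e.g. head and torso blobs) moving slightly between frames.
I = np.eye(2)
prev = [(np.array([0.0, 0.0]), I), (np.array([5.0, 5.0]), I)]
curr = [(np.array([5.2, 5.1]), I), (np.array([0.1, 0.2]), I)]
print(associate(prev, curr))   # [1, 0]
```

Chaining these per-frame associations over time yields the K space-time trajectories (the component means) that form each STT.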
Figure 6. Examples of PM computed for different actions. Brighter pixels correspond to higher probability.
Figure 7. Examples of the segmentation with the MoG
The global alignment used for action recognition is the same as that used for people trajectory analysis, described in the second report.
Since the STTs are composed of sets of trajectories, a global similarity measure that accounts for all the trajectories in the sets should be adopted to compare the actions. A K × K distance matrix Δ is built by comparing all the trajectories of the STT modeling action a with those of the STT modeling action b using the alignment technique:

Δ = [δ_{i,j}], with δ_{i,j} = d(T_i, T_j), where T_i ∈ STT_a, T_j ∈ STT_b, and d(·, ·) is the alignment distance.

The distance matrix is evaluated by computing its maximum eigenvalue. Since the distance values are real and non-negative, the matrix is guaranteed to have at least one positive eigenvalue. The maximum eigenvalue λ of the matrix expresses the variance of the distances along the main direction of distance diffusion. This is equivalent to comparing how different the STT trajectories are from those belonging to another action, performing the comparison globally and without considering any association among the trajectories themselves.
When multiple cameras are present, a set of STTs, one for each camera observing the action, is built by exploiting the consistent labeling module previously described. The multi-camera STT set embodies the information coming from multiple views, allowing the system to distinguish among actions that may appear similar from a specific point of view. This enriched discriminative property suggests exploiting a method that jointly compares each STT coming from a specific camera. The matrix is consequently extended to account for the STTs coming from the same camera views. More specifically, if STT_a^i is the action descriptor of action a observed on camera C_i and STT_b^j is that of action b observed on camera C_j, the matrix Δ comparing two actions observed from both C_i and C_j becomes the block-diagonal matrix:

Δ = [ Δ_i  0
      0    Δ_j ]

As in the single-camera case, the eigenvalues are computed to obtain a scalar similarity measure from the distance matrices. In particular, for every sub-matrix Δ_i its largest eigenvalue λ_i is computed, and the sum of the λ_i is used to identify whether the actions are jointly similar in all views.
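Both the single-camera score (largest eigenvalue of the K × K distance matrix) and the multi-camera extension (sum of the largest eigenvalues of the per-camera sub-matrices, which is what the block-diagonal form reduces to) can be sketched as follows; the 3 × 3 distance values are hypothetical:

```python
import numpy as np

def stt_score(dist):
    """Largest real eigenvalue of a K x K trajectory-distance matrix.
    For a non-negative matrix the dominant eigenvalue is real (Perron-Frobenius)."""
    return float(np.max(np.linalg.eigvals(dist).real))

def multi_view_score(dist_blocks):
    """Multi-camera case: the block-diagonal matrix diag(D_1, ..., D_n)
    collapses to the sum of the per-camera largest eigenvalues."""
    return sum(stt_score(D) for D in dist_blocks)

# Toy 3x3 alignment-distance matrices between the trajectories of two STT sets:
near = np.full((3, 3), 0.1) + 0.05 * np.eye(3)   # action a vs. a similar action
far  = np.full((3, 3), 2.0) + 0.5 * np.eye(3)    # action a vs. a different action

print(stt_score(near) < stt_score(far))                                # True
print(multi_view_score([near, near]) < multi_view_score([near, far]))  # True
```

A small score means the two actions' trajectory sets are globally close; the multi-view sum exposes a difference even when only one camera sees it.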
The proposed approach is conceived for a multi-camera setup. Nevertheless, we first evaluated its performance on a single-camera setup. The nine actions we considered are summarized in Fig. 8, with an example frame for each action.
Drinking Taking off jacket Sitting
Tying shoes Walking Abandoning object
Oscillating Jumping Raising up arm
Figure 8. Examples of the used actions
Videos are taken from static cameras with a side view, but the action may also take place in a not-completely-lateral way. The only strong assumption is that the moving people are visible at a sufficient resolution. In fact, if this assumption does not hold, the EM algorithm on the MoG would have too few samples, making it strictly dependent on the initialization seeds, and it may not converge.
Several examples of each action have been collected to form a set of 29 videos. The resulting confusion matrix with respect to a manually labeled ground truth is reported in Table 1. In total, the system makes only 4 errors on the 29 videos, resulting in an average accuracy of 86.21%.
Table 1. Confusion matrix in the single camera setup.
In addition to the single-camera experiments, we also conducted a preliminary experimentation on a multi-camera setup. Fig. 9 shows two examples of a “tying shoe laces” action from two completely different viewpoints. We collected 10 videos of two actions (“walking” and “tying shoe laces”). Though the results are very preliminary, only two actions were misclassified (an average accuracy of 80%), basically due to the poor resolution of one of the cameras.
Figure 9. An example from the multi-camera setup
This approach has been presented in a paper accepted for the Proceedings of the ACM/IEEE International Conference on Distributed Smart Cameras (ACM/IEEE ICDSC 2008), to be held in Vancouver, BC (Canada) on October 31, 2008 (Calderara, Prati, & Cucchiara, “A Markerless Approach for Consistent Action Recognition in a Multi-camera System”, 2008).
Marker-based approach
The method described in the previous section has the clear advantage of being usable without any artificial marker on the moving people, and even on pre-recorded videos. On the negative side, however, it is far from accurate in tracking single body parts/points, especially when the image resolution is limited. As an alternative, UNIMORE also collected data with a Motion Capture (MoCap) system available at the “Motion Analysis Laboratory” (headed by Prof. Adriano Ferrari) at the Faculty of Medicine of UNIMORE. These colleagues provided us with free use of the system, composed of 8 VICON near-infrared cameras and 2 standard cameras. The VICON cameras, together with specific software and a calibration procedure, are able to track in 3D space all the markers (made of a special reflective material) attached to the human body. For our preliminary tests, we used a classical biological human model with 31 markers attached to the most significant kinematic points of the human body (see Fig. 10).

Figure 10. Human model used and the corresponding markers
With this system we were able to collect an initial set of videos of several actions (walking, running, abandoning an object, etc.). Fig. 11 shows some examples of the videos obtained. The resulting (very accurate) 3D trajectories for all 31 points have been given to HUJI for further analysis and for testing their machine learning models.
Figure 11. Some examples of the results obtained (from left to right: walking, running, tying shoes)
D. M. Gavrila, “The visual analysis of human movement: A survey,” Computer Vision and Image Understanding, vol. 73, no. 1, pp. 82–98, Jan. 1999.
J. Rehg and T. Kanade, “Model-based tracking of self-occluding articulated objects,” in Proc. of IEEE Int'l Conference on Computer Vision, 1995, pp. 612–617.
L. Goncalves, E. Di Bernardo, E. Ursella, and P. Perona, “Monocular tracking of the human arm in 3D,” in Proc. of IEEE Int'l Conference on Computer Vision, 1995, pp. 764–770.
D. M. Gavrila and L. S. Davis, “3D model-based tracking of humans in action: A multi-view approach,” in Proc. of IEEE Int'l Conference on Computer Vision and Pattern Recognition, 1996, pp. 73–80.
I. Laptev and T. Lindeberg, “Space-time interest points,” in Proc. of IEEE Int'l Conference on Computer Vision, 2003, vol. 1, pp. 432–439.
R. Cucchiara, C. Grana, A. Prati, and R. Vezzani, “Probabilistic posture classification for human-behavior analysis,” IEEE Trans. on Systems, Man, and Cybernetics – Part A, vol. 35, no. 1, pp. 42–54, Jan. 2005.
Y. Ke, R. Sukthankar, and M. Hebert, “Spatiotemporal shape and flow correlation for action recognition,” in Proc. of IEEE Int'l Conference on Computer Vision and Pattern Recognition, 2007, pp. 1–8.
A. Yilmaz and M. Shah, “Action sketch: A novel action representation,” in Proc. of IEEE Int'l Conference on Computer Vision and Pattern Recognition, 2005, vol. 1, pp. 984–989.
J. C. Niebles, H. Wang, and L. Fei-Fei, “Unsupervised learning of human action categories using spatial temporal words,” in Proc. of British Machine Vision Conference, 2006, vol. 3, pp. 1249–1259.
E. Shechtman and M. Irani, “Space-time behavior correlation – or – how to tell if two underlying motion fields are similar without computing them?,” IEEE Trans. on Pattern Analysis and Machine Intelligence, 2007.
The development of the ViSOR (Video Surveillance Online Repository) described in the previous reports has continued in this semester. Specifically, the current video corpus set includes about
200 videos grouped into 14 categories, and allows four types of annotation:
• Base Annotation: ground truth with concepts referring to the whole video.
• Online Annotation: annotations created by users through the new online annotation tool.
• GT Annotation: ground truth with frame-level annotation; concepts can refer to the whole video, to a frame interval, or to a single frame.
• Automatic Annotation: output of automatic systems shared by ViSOR users.
Registered users can exploit the online annotations in ViSOR for their own purposes, for querying the ViSOR database, or for running the Online Performance Evaluation tool, which measures the accuracy of an annotation (for instance, one produced by their own automatic tool) against the ground truth.
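To make the frame-level annotation and its evaluation concrete, the following is a minimal sketch in Python. The tuple-based annotation layout, the concept name, and the accuracy function are illustrative assumptions only, not the actual ViSOR data model or API.

```python
# Illustrative sketch (NOT the actual ViSOR API): a frame-level ground-truth
# annotation compared against an automatic one to obtain an accuracy measure,
# in the spirit of the Online Performance Evaluation tool.

def frames_with_concept(annotations, concept):
    """Expand (concept, first_frame, last_frame) triples into a set of frames."""
    frames = set()
    for name, first, last in annotations:
        if name == concept:
            frames.update(range(first, last + 1))
    return frames

def frame_accuracy(gt, auto, concept, num_frames):
    """Fraction of frames on which the two annotations agree for a concept."""
    gt_frames = frames_with_concept(gt, concept)
    auto_frames = frames_with_concept(auto, concept)
    agree = sum(1 for f in range(num_frames)
                if (f in gt_frames) == (f in auto_frames))
    return agree / num_frames

# A concept can span the whole video, a frame interval, or a single frame.
gt = [("person_walking", 0, 99)]      # whole (100-frame) video
auto = [("person_walking", 10, 89)]   # interval found by an automatic system
print(frame_accuracy(gt, auto, "person_walking", 100))  # → 0.8
```

In this toy example the automatic annotation misses 20 of the 100 ground-truth frames, giving a frame-level agreement of 0.8.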
Figure 12. Some screenshots of the ViSOR web interface
The development activity on ViSOR in this semester has also resulted in a publication (Vezzani & Cucchiara, 2008), recently presented at the 5th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS 2008), Santa Fe, New Mexico, 1-3 September 2008.
Here is a comprehensive list of the accomplishments achieved so far, compared against the Project Plan, in the first 18 months of the project (months 1-18):
• Development of a new approach for modeling human gait, the Oscillatory Gait Model (OGM), and its statistical modeling through autoregressive processes (partly done);
• Use of the OGM as the state output model of an HMM, for a complete statistical model of human motion (in progress);
• Application of the graph Laplacian formulation, which has proved very successful for detecting irregularities in multivariate data, to the OGM motion parameters (in progress);
• Development of a complete tool for extracting visual features (people detection and tracking, with the corresponding features) from a system of multiple cameras with partially overlapped FOVs (concluded);
• Further enhancement of the solutions for analyzing people trajectories, accounting for multimodal and sequential trajectories in order to infer behaviors (final development of a prototype under way);
• Study of a system for people shape analysis based on action signatures (concluded);
• Creation of a video repository of annotated surveillance videos (under further development);
• Development of a system for people tracking with freely moving cameras (under further study);
• Development of a system for markerless modeling of human actions from multiple cameras (under further study);
• Acquisition of videos from a MoCap system (first set of videos acquired);
• Organization of the first ACM International Workshop on Vision Networks for Behaviour Analysis (ACM VNBA 2008) – http://imagelab.ing.unimore.it/vnba08 – Vancouver, BC (Canada), October 31, 2008.
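The graph Laplacian approach to detecting irregularities, mentioned in the list above, can be illustrated with a toy sketch: build a Gaussian-similarity graph over motion-parameter vectors and use the degree matrix D (the diagonal part of the Laplacian L = D - W) to flag samples that are weakly connected to the rest of the data. This is a deliberately simplified, degree-based proxy for intuition only, not the project's actual formulation; the data points and the sigma value are invented.

```python
import math

def gaussian_similarity(x, y, sigma=1.0):
    """Gaussian kernel similarity between two feature vectors."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2 * sigma ** 2))

def irregularity_scores(points, sigma=1.0):
    """Score each point by its (low) degree in the similarity graph.

    The weight matrix W and degree matrix D define the graph Laplacian
    L = D - W; a point whose degree D[i][i] is small is weakly tied to
    the rest of the data and can be flagged as irregular.
    """
    n = len(points)
    W = [[gaussian_similarity(points[i], points[j], sigma) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    degrees = [sum(row) for row in W]
    max_deg = max(degrees)
    # 0 = well connected (typical); close to 1 = isolated (irregular).
    return [1.0 - d / max_deg for d in degrees]

# Toy "gait parameter" vectors: four similar samples and one outlier.
data = [(1.0, 2.0), (1.1, 2.1), (0.9, 1.9), (1.0, 2.1), (5.0, 5.0)]
scores = irregularity_scores(data)
print(scores.index(max(scores)))  # → 4 (the outlier)
```

In practice the project applies the full Laplacian machinery to the OGM motion parameters; the sketch only shows why isolation in the similarity graph signals an irregular sample.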
UNIMORE has moved forward with the development of real prototypes, both for people detection and tracking from fixed multi-camera systems and for trajectory analysis.
In the next six months UNIMORE (NPD) will focus mainly on the following aspects:
1. Final development of the complete system for the analysis of people behaviors (in terms of trajectories and shape);
2. Collection of further data for the HUJI tests.
At UNIMORE, five young scientists have been involved in the project:
• Simone Calderara (3rd-year PhD student at UNIMORE): involved in the study of people trajectories and in the research on people shape detection and markerless segmentation of human body parts; he has been sent to international schools and conferences on these topics to acquire the knowledge and experience necessary for the project;
• Roberto Vezzani (experienced post-doc at UNIMORE, recently appointed assistant professor at UNIMORE): involved in the development and maintenance of the ViSOR system; he also participated in a meeting in Italy to disseminate the ViSOR system and the BESAFE project;
• Giovanni Gualdi (2nd-year PhD student at UNIMORE): involved in the study of methods for object tracking with freely moving cameras;
• Daniele Borghesani (1st-year PhD student at UNIMORE): involved in the study of biometric features that can be applied to model people shape; he has been sent to international schools on biometrics to acquire the knowledge and experience necessary for the project;
• Paolo Piccinini (1st-year PhD student at UNIMORE): involved in the development of the people trajectory analysis system; he has attended international schools on the fundamentals of computer vision and pattern recognition useful for the BESAFE project.
• 5th Summer School on Advanced Studies in Biometrics for Secure Authentication (Biometrics 2008), 9-13 June 2008, Alghero, Italy: Daniele Borghesani attended this school to acquire specific knowledge of biometric techniques and algorithms (school website: http://biometrics.uniss.it/index.php);
• International Computer Vision Summer School (ICVSS 2008), 14-19 July 2008: Paolo Piccinini attended this summer school to acquire the fundamentals of computer vision and pattern recognition (school website: http://svg.dmi.unict.it/icvss2008/);
• Meeting, 17-20 March 2008, Amsterdam, The Netherlands: Rita Cucchiara and Roberto Vezzani attended a meeting to evaluate possible collaborations within the BESAFE project.
[1] G. Gualdi, A. Prati, R. Cucchiara, E. Ardizzone, M. La Cascia, L. Lo Presti, and M. Morana, "Enabling Technologies on Hybrid Camera Networks for Behavioral Analysis of Unattended Indoor Environments and Their Surroundings," in Proceedings of the 1st ACM International Workshop on Vision Networks for Behaviour Analysis (ACM VNBA 2008).
[2] S. Calderara, A. Prati, and R. Cucchiara, "A Markerless Approach for Consistent Action Recognition in a Multi-camera System," in Proceedings of the ACM/IEEE International Conference on Distributed Smart Cameras (ACM/IEEE ICDSC 2008).
[3] S. Calderara, R. Cucchiara, and A. Prati, "Bayesian-Competitive Consistent Labeling for People Surveillance," IEEE Trans. on PAMI, vol. 30, no. 2, pp. 354-360, 2008.
[4] G. Gualdi, A. Albarelli, A. Prati, A. Torsello, M. Pelillo, and R. Cucchiara, "Using Dominant Sets for Object Tracking with Freely Moving Camera," in Proceedings of the Workshop on Visual Surveillance (VS 2008).
[5] R. Vezzani and R. Cucchiara, "Annotation Collection and Online Performance Evaluation for Video Surveillance: the ViSOR Project," in press, 5th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS 2008), Santa Fe, New Mexico, 1-3 Sep. 2008.
• Organization of the First Workshop on VIdeo Surveillance projects in Italy (VISIT 2008) – http://imagelab.ing.unimore.it/visit2008/ – Modena (Italy), 22 May 2008; the workshop was presented as sponsored by BESAFE, and the PPD, Prof. Naftali Tishby, was its main invited speaker (see the program at http://imagelab.ing.unimore.it/visit2008/program.asp);
• Organization of the First ACM Workshop on Vision Networks for Behaviour Analysis (ACM VNBA 2008) – http://imagelab.ing.unimore.it/vnba08/ – Vancouver, BC (Canada), 31 October 2008; this workshop will bring together researchers from the fields of vision networks, computer vision, and behaviour analysis.
None by UNIMORE.
UNIMORE included in the project staff Daniele Borghesani and Paolo Piccinini.
Science for Peace - Project Management Handbook
Please provide one sheet per Project Co-Director
ATTENTION: Project Co-Directors from NATO countries (except Bulgaria and Romania) are only eligible for NATO funding for items f-g-h !
Project number: SfP - 982480
Report date: 20/10/2008
Project Co-Director: Prof. Naftali Tishby
Project short title: SfP - BE SAFE
Duration of the Project 1 : 04/07-03/09
Annex 4a
Detailed Budget Breakdown (in EUR)
Columns: (1) actual expenditures from the start until 30.09.08; (2) forecast expenditures for the following six months; (3) forecast expenditures for the following period until the project's end. All amounts in column (3) are 0. Comments refer to changes, if any, in the financial planning compared to the approved Project Plan.

(a) Equipment: 2 Samsung SHC 740 D/N cameras (upgrade in the brand of the cameras); 1 Samsung SPD 3300 (PTZ); 1 Samsung SHC 750 1" D/N camera; 1 Samsung SVR 950E recorder for the cameras; miscellaneous equipment (professional rack for DVD recording, ink-jet printer for the rack); Thermal-Eye 250D with 150 mm lens.
    (1): 1.892 + 10.358 + 4.565; Subtotal "Equipment" (1): 16.815
    (2): 2.903 + 12.922 + 5.400 + 2.760 + 19.750; Subtotal "Equipment" (2): 43.735
    Comment: 2 PTZ cameras changed into 1 PTZ plus one high-quality D/N camera; some equipment moved to the following period.

(b) Computers - Software: Sun Fire X2200 M2 x64 server, DS14 shelf with 7 TB SATA, 6 iMacs plus upgrades; laptop, PC, other equipment; accessories, external storage, printers, peripherals; software (productivity applications, data storage and statistics).
    (1): 46.764 + 14.492 + 510; Subtotal "Computers - Software" (1): 61.766
    (2): 5.744 + 4.000 + 5.290; Subtotal "Computers - Software" (2): 15.034
    Comment: some equipment moved to the following period.

(c) Training: international, for meetings in Italy.
    Subtotal "Training" (2): 10.000

(d1) Books and Journals (global figure): (1) 30; (2) 7.110
(d2) Publications (global figure): (2) 800
    Subtotal "Books - Publications": (1) 30; (2) 7.910

(e) Experts - Advisors: security consultant, anti-terror experts.
    Subtotal "Experts - Advisors" (2): 2.500

(f) Travel: for meetings and setup of scenarios.
    Subtotal "Travel": (1) 5.113; (2) 4.887

(g) Consumables - Spare parts: network, servers.
    Subtotal "Consumables - Spare parts": (1) 1.094; (2) 8.906

(h) Other costs and (i) stipends: telecommunication, printing, desk-top; miscellaneous; a graduate student (to be identified); two master's students (to be identified).
    (1): 159 + 1.281; Subtotal "Other costs" (1): 1.440
    (2): 2.841 + 4.000 + 519 + 1.800 + 1.800; Subtotal "Other costs" (2): 10.960

TOTAL (1), (2), (3): (1) 86.258; (2) 103.932; (3) 0
CURRENT COST OUTLOOK = (1)+(2)+(3): 190.190
Science for Peace - Project Management Handbook
Please provide one sheet per Project Co-Director
ATTENTION: Project Co-Directors from NATO countries (except Bulgaria and Romania) are only eligible for NATO funding for items f-g-h !
Project number: SfP - 982480
Report date: 20/10/2008
Project Co-Director: Prof. Rita Cucchiara
Project short title: SfP - BE SAFE
Duration of the Project 1 : 04/07-03/09
Annex 4a
Detailed Budget Breakdown (to be completed in EUR 3)
Columns: (1) actual expenditures from the start until 30.09.08; (2) forecast expenditures for the following six months; (3) forecast expenditures for the following period until the project's end. All amounts in column (3) are 0. Comments refer to changes, if any, in the financial planning compared to the approved Project Plan.

(a) Equipment: no expenditures (Subtotal "Equipment": 0).

(b) Computers - Software: no expenditures (Subtotal "Computers - Software": 0).

(c) Training: no expenditures (Subtotal "Training": 0).

(d1) Books and Journals (global figure): (1) 1.688
    Subtotal "Books - Publications" (1): 1.688
    Comment: the books' quota has been increased slightly (approx. 188 EUR).

(e) Experts - Advisors: no expenditures (Subtotal "Experts - Advisors": 0).

(f) Travel: Subtotal "Travel" (1): 13.306
    Comment: travel for the PhD students involved in the project (increased by approx. 1.300 EUR).

(g) Consumables - Spare parts: Subtotal "Consumables - Spare parts": (1) 2.059; (2) 2.947

(h) Other costs and (i) stipends: other costs (1) 1.913; stipends (1) 200
    Subtotal "Other costs": (1) 2.113; (2) 6.887
    Comment: reduced to compensate for the increases in books and travel.

TOTAL (1), (2), (3): (1) 19.166; (2) 9.834; (3) 0
CURRENT COST OUTLOOK = (1)+(2)+(3): 29.000
Project number: SfP - 982480
Report date: 20/10/2008
The Project is in the year 2
Project short title: SfP - Be Safe
Duration of the Project: 04/07-03/09
Summary per Project Co-Director (in EUR). Columns: APPROVED BUDGET (total, years 1-5); CURRENT COST OUTLOOK (total, years 1-5); ACTUAL EXPENDITURES until 30.09. of current year 2; FORECAST EXPENDITURES for the following months; forecast for the following period until the project's end.

Project Co-Director (name, city, country):
    Naftali Tishby, Israel: 190.190; 190.190; 86.258; 103.932; 0
    Rita Cucchiara, Modena, Italia: 29.000; 29.000; 19.166; 9.834; 0
TOTAL (must be identical with the TOTALs given in "Breakdown per item"): 219.190; 219.190; 105.424; 113.766; 0

Breakdown per item (to be completed in EUR 3), same columns, with comments on changes, if any, in the financial planning compared to the approved Project Plan:
    (a) Equipment: 60.550; 60.550; 16.815; 43.735
    (b) Computers - Software: 76.800; 76.800; 61.766; 15.034
    (c) Training: 10.000; 10.000; 0; 10.000
    (d) Books - Publications: 7.940; 9.628; 1.718; 7.910 (comment: books are necessary for the added staff member)
    (e) Experts - Advisors: 2.500; 2.500; 0; 2.500
    (f) Travel: 20.000; 23.306; 18.419; 4.887 (comment: travel for the added PhD students to participate in schools)
    (g) Consumables - Spare parts: 19.000; 15.006; 3.153; 11.853 (comment: reduced to compensate for the increased costs of books and travel)
    (h) Other costs and (i) stipends: 22.400; 21.400; 3.553; 17.847 (comment: reduced to compensate for the increased costs of books and travel)
TOTAL: 219.190; 219.190; 105.424; 113.766; 0

1 Give the month/year when the Project started and the expected ending date. 2 Choose the appropriate date and complete the year. 3 As of January 2002, grants are made in Euro (EUR) and all figures should be given in EUR.
The completion of the equipment inventory records has been delayed since we never received the inventory labels.
Item | Manufacturer | Model Number | Serial Number | Date of Purchase | Cost (EUR 1) | Inventory Label No. | Property | Location
Project number: SfP - 982480
Report date: 20/10/2008
The Project is in the year: 2
Project short title: SfP - BESAFE
Duration of the Project 1: 04/07-03/09

CRITERIA FOR SUCCESS TABLE
Criteria for Success as approved with the first Grant Letter on 24/10/2006 (with their weights), and Achievements as at 30.09.08 (changes should be reflected here):

1) Abnormal behavior: defined, scenarios of motion capture video are collected, data is acquired and annotated (weight: 25%).
   Achieved (25%): partial definition; defined scenario of abandoned baggage; acquired several annotated videos; acquired additional video with MoCAP.
2) People detection and tracking: techniques for multiple cameras and PTZ defined; detection and tracking evaluated (weight: 20%).
   Achieved (20%): techniques for overlapped multiple cameras defined and deeply tested; preliminary techniques for PTZ studied; detection and tracking evaluated; preliminary studies on freely moving cameras; moving toward an integrated system.
3) People activity: features extracted, symbolic coding for trajectories defined, data prepared, per-sensor classification is evaluated (weight: 15%).
   Achieved (10%): features partially extracted; symbolic coding for trajectories defined; data prepared.
4) People shape: features extracted, symbolic coding defined, data prepared, per-sensor classification is evaluated (weight: 15%).
   Achieved (10%): initial study on feature extraction and representation through action signatures; markerless system for human body part tracking.
5) Kernel design and SVM learning: kernels are mathematically defined, their evaluation algorithm is implemented, experimental tests and accuracy evaluated (weight: 25%).
   Achieved (10%): statistical framework designed.

TOTAL weight: 100%. TOTAL 4 achieved: 75%.