Project BeSafe – SfP 982480 – 3rd Progress Report – November 2008
NATO PROGRAMME FOR SECURITY THROUGH SCIENCE
SCIENCE FOR PEACE
NATO Public Diplomacy Division, Bd. Leopold III, B-1110 Brussels, Belgium; fax +32 2 707 4232; e-mail sfp.applications@hq.nato.int
Project Director (PPD): PROF. NAFTALI TISHBY, ISRAEL
Project Director (NPD): PROF. RITA CUCCHIARA, ITALY
People involved in the report’s preparation:
Prof. Rita Cucchiara, Dr. Andrea Prati
Prof. Naftali Tishby
Date of completion: 18 October 2008
OUTLINE OF THE SfP SUMMARY REPORT
Abstract of Research
Major Objectives
Overview of Achievements since the Start of the Project until 31 March 2008
Milestones for the Next Six Months
Implementation of Results
Other Collaborating Institutions
HUJI – The Hebrew University
UNIMORE – Università degli Studi di Modena e Reggio Emilia
MSS – Magal Security Systems, Ltd.
PPD – Partner Project Director
NPD – NATO Project Director
CV – Computer Vision
ML – Machine Learning
FOV – Field of View
OGM – Oscillatory Gait Model
HMM – Hidden Markov Model
SVM – Support Vector Machine
PTZ – Pan-Tilt-Zoom
(a) Project Director (PPD) (Consult “Definitions”)
Surname/First name/Title: TISHBY / NAFTALI / PROF.
Job Title, Institute and Address: Professor, School of Engineering and Computer Science, The Hebrew University, Ross Building, Givat Ram Campus, 91904 Jerusalem, Israel
Country: ISRAEL
Tel: +972-2-658-4167; Fax: +972-2-658-6440; Email: tishby@cs.huji.ac.il

(b) End-user(s) (Consult “Definitions”)
Surname/First name/Title: DANK / ZVI
Job Title, Company/Organisation and Address: V.P. Engineering, Magal Security Systems, Ltd., P.O. Box 70, Industrial Zone, 56000 Yahud
Country: ISRAEL
Tel: +972-3-5391444; Fax: +972-3-5366245; Email: mglzvi@trendline.co.il

(c) Project Director (NPD) (Consult “Definitions”)
Surname/First name/Title: CUCCHIARA / RITA / PROF.
Job Title, Institute and Address: Full Professor, Dipartimento di Ingegneria dell’Informazione, University of Modena and Reggio Emilia, Via Vignolese 905, 41100 Modena, Italy
Country: ITALY
Tel: +39 059 2056136; Fax: +39 059 2056129; Email: rita.cucchiara@unimore.it
This project is unique in that it combines two main areas of research, Computer Vision (CV) and Machine Learning (ML), in an application of automatic surveillance for people detection and tracking and abnormal behavior recognition. CV and ML have been used jointly in many different applications, but typically either by using ML as a tool for computer vision applications or by using CV as a case study to prove theoretical advances in ML.
The project aims at exploring how visual features can be automatically extracted from video using computer vision techniques and exploited by a classifier (generated by machine learning) to detect and identify suspicious people behavior in public places in real time. In this sense, CV and ML are jointly developed and studied to provide a better mix of innovative techniques.
Justification of the proposed project is based on two issues of major concern to the State of Israel:
(1) the need for intelligent surveillance in public and commercial areas that are susceptible to terrorist attacks, and (2) the lack of automatic and intelligent decision support in existing surveillance systems.
More specifically, the objectives of the project are: (1) to achieve a better understanding of which visual features can be used for (1.a) analyzing people activity and (1.b) characterizing people shape; (2) to suitably adapt ML techniques such as HMM, SVM or methods for “novelty detection” in order to infer people's behavior from the extracted visual features, possibly classifying it as normal or abnormal; (3) to develop a first simple prototype in a specific scenario that can be considered a threat to security.
The machine learning research is carried out at the Hebrew University’s machine learning lab utilizing its long experience in temporal pattern recognition and computational learning methods.
Following the meeting in June 2007 in Jerusalem, we decided to focus, in the time available for the project, on one particular behavior which is both well defined and threatening: people who leave objects behind them (such as luggage in airports). The machine learning component is based on the following phases: (1) constructing a generative statistical model of human gait on the basis of the features provided by the CV group. Such a model is an adaptation of an oscillatory dynamic model we developed in the past (Singer and Tishby 1994), where different points on the walking person are assumed to have a drifted oscillatory motion with a characteristic frequency and relative phases; (2) this basic Oscillatory Gait Model (OGM) is then plugged in as the output of a state of an HMM, yielding a complete statistical model of regular gait; (3) detecting deviations (irregularities) in the relative phases and amplitudes of the OGM to capture irregular behavior, e.g. halting, bending, leaving objects, etc. The output of such a statistical model can be classified using likelihood ratio tests or standard classifiers such as SVM to improve confidence; (4) we also carried out work on detecting statistical irregularities in multivariate correlated data, as another component of the project.
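As a toy illustration of the drifted-oscillation assumption in phase (1) and the irregularity detection in phase (3), the following sketch simulates one tracked body point and flags a halt via the residual variance of a sinusoid-plus-drift fit. The synthetic trajectory, the fixed frequency and the variance-ratio test are illustrative assumptions only; they stand in for, and are much simpler than, the actual OGM/HMM formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def synth_gait(T=200, drift=0.02, amp=1.0, freq=0.1, noise=0.05):
    """Drifted oscillatory motion of one body point (illustrative)."""
    t = np.arange(T)
    return drift * t + amp * np.sin(2 * np.pi * freq * t) + rng.normal(0, noise, T)

def fit_oscillation(y, freq):
    """Least-squares fit of drift + sinusoid at a known frequency; returns
    the coefficients and the residual variance (goodness of the gait model)."""
    t = np.arange(len(y))
    X = np.column_stack([t, np.sin(2 * np.pi * freq * t),
                         np.cos(2 * np.pi * freq * t), np.ones_like(t, dtype=float)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return coef, resid.var()

# Regular gait: residual variance stays near the observation-noise level.
y_reg = synth_gait()
_, var_reg = fit_oscillation(y_reg, freq=0.1)

# "Irregular" gait: the person halts halfway through (oscillation stops).
y_irr = y_reg.copy()
y_irr[100:] = y_irr[99]
_, var_irr = fit_oscillation(y_irr, freq=0.1)

# Likelihood-ratio-style score: much larger residual variance => irregular.
print(var_irr > 4 * var_reg)   # True
```

In the project's actual model the oscillation is the output density of an HMM state, so the irregularity score would come from the HMM likelihood rather than from a single least-squares residual.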
Workplan for the 1st and 2nd year (scheduled by quarters: months 1-3, 4-6, 7-9 and 10-12 of each year; status legend: As planned / Completed / Delayed):

1. Hybrid and distributed multi-camera people detection and tracking
S1.1 People detection and tracking in multi-camera systems
S1.2 Camera coordination primitives for static, hybrid and PTZ cameras
2. Feature extraction for people surveillance
S2.1 Feature extraction for people activity detection
S2.2 Feature extraction for people shape detection
3. Data preparation and symbolic coding
S3.1 Data preparation and understanding, per-sensor symbolic coding and state modeling for people activity features
S3.2 Data selection, cleaning, formatting, and case generation for people activity features
S3.3 Data preparation and understanding, per-sensor symbolic coding and state modeling for people shape features
S3.4 Data selection, cleaning, formatting, and case generation for people shape features
4. Designing a dynamic gait model based on coupled oscillatory motion
S4.1 Using the people activity features to design a statistical classifier of regular gait
S4.2 Designing a state model/kernel for the people shape features
S4.3 Plugging in the Gait-Oscillatory Model (GOM) as a state in an HMM for a complete regular gait statistical model
S4.4 Using the likelihood of the model for robust classification of regular motion/behaviour
5. Framework for Abnormal Behavior monitoring
S5.1 Analysis of requirements and constraints
S5.2 Video data collection and annotation
S5.3 Testing and refinement of the integrated framework
The main objective of the UNIMORE unit in the project is to study which visual features can be used for inferring abnormal people behaviors. These features come from two types of analysis: the analysis of people activity and the analysis of people shape.
During the first year of the project, UNIMORE concentrated on developing methods for extracting useful features for both people activity and people shape.
People activity analysis in terms of trajectory shape (a first descriptive feature for people activity monitoring) has been completed, with a fully working system for logging, analyzing and classifying people trajectories. In this third semester, UNIMORE has deepened its research on people shape analysis (see step S2.2), both with a newly developed approach for markerless tracking of body parts (to model people shape) and by collecting data from a Motion Capture (MoCap) system in order to provide useful data to HUJI.
Some activities planned in the second semester regarding feature extraction from moving cameras have been concluded in this semester. Finally, a first effort toward building a complete multi-sensor system for behavior analysis has been carried out.
During the second semester, a newly developed system for people detection and tracking from moving cameras (both constrained PTZ and freely moving cameras), jointly studied with the University of Venice, was produced and tested. In this semester, UNIMORE further completed and enhanced the system by proposing a new method for exploiting graph matching theory (developed by the University of Venice) in the context of body part tracking. By extracting meaningful body/object parts and imposing spatial constraints among them (both to account for human body deformability and to preserve relative positions and orientations), we were able to create a system for tracking from completely moving cameras that is robust to occlusions, rotations, scaling, illumination variation, etc. (see the previous report or (Gualdi, Albarelli, Prati, Torsello, Pelillo, & Cucchiara, 2008) for a complete description).
The results of this research have been collected in a paper accepted for oral presentation at the International Workshop on Visual Surveillance (VS), held in conjunction with the European Conference on Computer Vision (ECCV) 2008 in Marseille (France) in October 2008 (Gualdi, Albarelli, Prati, Torsello, Pelillo, & Cucchiara, 2008).
With the aim of integrating the acquired knowledge and experience in people detection and tracking from single and moving cameras, UNIMORE also “learned its lessons” by fully understanding the limits of systems based on a single (either static or moving) camera. Thus, during this semester UNIMORE also investigated the enabling technologies, mostly from computer vision and distributed sensor networks, that have been used as building blocks of an automatic system for the attendance of indoor environments, one which can successfully replace the tedious (and costly) human-based activity that is often limited to very low-level people monitoring (people positioning, counting, evident abuses of devices, thefts, etc.).
The field of applicability is wide: consider, for instance, the attendance of shops, libraries, labs, data centers, etc. Such environments are often left unattended, since the cost of human attendance would be far greater than the cost of the problems deriving from its lack. In our application scenario, multi-modal sensor networks may be used to collect useful features describing people activities.
Figure 1. Scheme of the overall architecture
The general system architecture, depicted in Fig. 1, is divided into three main layers, namely: perimeter, environment and reasoning.
The perimeter layer deals with the surveillance of the surroundings of the environment to be attended. Usually (but not necessarily) it is based on nodes which are positioned outdoors. Each node is self-standing and consists of a micro-controller, a radio-communication device, a camera and a power supply. In order to realize a fully self-standing device, it is even feasible to power the nodes through solar panels.
Figure 2. Scheme of the Perimeter Vision layer
The layer built upon such nodes forms a distributed sensor network that performs video surveillance tasks: object detection (with particular focus on people and on stationary, possibly abandoned, objects), tracking (with consistent labeling) and people entrance/exit logging. This last task is particularly important, since this information is handed off to the reasoning layer, which makes inferences about the attended area and the people interacting around it. Unfortunately, in distributed video surveillance systems, extracting meaningful information from remote cameras to detect abnormal situations is a challenging task.
As shown in Fig. 2, one node is elected as master, with the tasks of aggregating the data provided by the processing nodes, handling the communication with the reasoning layer and managing the insertion of new nodes into the layer. Further details on this layer can be found in (Gualdi, et al., 2008).
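The master's aggregation role can be sketched as follows. The `Node`/`Master` classes, the event tuples and the method names are illustrative assumptions for this report, not the layer's actual implementation:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A perimeter processing node (illustrative model)."""
    node_id: int
    events: list = field(default_factory=list)   # e.g. ("enter", person_id, timestamp)

@dataclass
class Master:
    """The elected master aggregates per-node events for the reasoning layer."""
    nodes: dict = field(default_factory=dict)

    def register(self, node):
        """Handle the insertion of a new node into the layer."""
        self.nodes[node.node_id] = node

    def aggregate(self):
        """Merge all entrance/exit events, ordered by timestamp."""
        merged = [e for n in self.nodes.values() for e in n.events]
        return sorted(merged, key=lambda e: e[2])

m = Master()
a, b = Node(1), Node(2)
a.events.append(("enter", "p7", 10.0))
b.events.append(("exit", "p7", 12.5))
m.register(a)
m.register(b)
print(m.aggregate())   # [('enter', 'p7', 10.0), ('exit', 'p7', 12.5)]
```

In the real layer the events would of course arrive over the radio links rather than through in-process method calls.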
The environment layer deals with the surveillance of the environment area to be attended.
Unlike the perimeter layer, it is based on cameras which are not supposed to be repositioned. Therefore they are wire-connected to a central processing unit, and the requirement of easy and quick deployment is loosened here. In our implementation, the employed cameras are of two different kinds playing complementary roles: fixed and PTZ cameras (see Fig. 3); but nothing prevents the architecture from exploiting other kinds of cameras (e.g. omni-directional) or even different sensors.
Figure 3. Scheme of the environment layer
The fixed cameras are positioned inside the environment so that, primarily, all the entrance/exit points are kept under observation and, secondarily, the widest possible area is covered by their fields of view. Since the cameras are fixed, object segmentation and single-view tracking can be performed using background suppression.
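A minimal sketch of the background-suppression idea for fixed cameras follows: a running-average background model and an absolute-difference threshold. The array sizes, threshold and update rate are illustrative, and this is a far simpler stand-in for the detector actually used in the project:

```python
import numpy as np

def update_background(bg, frame, alpha=0.05):
    """Running-average background model (selective update omitted for brevity)."""
    return (1 - alpha) * bg + alpha * frame

def foreground_mask(bg, frame, thresh=25):
    """Pixels far from the background model are flagged as moving objects."""
    return np.abs(frame.astype(float) - bg) > thresh

# Toy example: a static 64x64 scene with a bright 8x8 "person" appearing.
rng = np.random.default_rng(1)
scene = rng.integers(0, 50, size=(64, 64)).astype(float)
bg = scene.copy()
frame = scene.copy()
frame[20:28, 30:38] += 100            # the moving object

mask = foreground_mask(bg, frame)
print(int(mask.sum()))                # 64 foreground pixels
bg = update_background(bg, frame)     # slowly absorb slow scene changes
```

A production system would add shadow removal and selective update (not absorbing detected objects into the background), as in the techniques cited later in this report.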
From a high-level view, this layer extends the tasks that were requested in the perimeter layer.
Beyond people tracking and entrance/exit monitoring, more advanced tasks are also performed; for instance, active tracking of moving objects through PTZ cameras, which is particularly helpful when an object is leaving the field of view of the fixed cameras. The environment layer, similarly to the perimeter one, forwards the homography, with camera positions and fields of view, to the reasoning layer. Since this layer has a wired connection to the network, the data regarding moving objects are forwarded by default at the maximum frequency (i.e. equal to the frame rate) and with the maximum degree of detail (trajectories, object descriptors, images, clips, etc.).
Figure 4. Scheme of the reasoning layer and the interactions with the other layers
The reasoning layer, represented in its functionalities in Fig. 4, infers behaviors in the attended environment by using data provided by perimeter and environment layers. In particular, it infers knowledge about what happened, happens and is likely to happen in the environment.
Specifically, regarding the present (what happens), the reasoning layer provides an on-line people counter (restricted to the borders of the indoor environment), a people tracker and an analysis of the status of the environment infrastructure (missing/misplaced objects). Regarding the past (what happened), the layer offers a log of all the people that came in contact with the environment, with trajectories, inferred information (e.g. interactions with other people or with infrastructures of the environment) and recorded visual data. Regarding the future (what will likely happen), the layer infers, by merging the geometrical data and the perimeter observations, who is probably approaching or leaving the environment. Of course, these three pieces of information can be interrelated in order to deepen the knowledge of the environment (e.g. in the case of a misplaced object, it is possible to understand who interacted with it using trajectory analysis and, via the visual data, understand what really happened).
The results of this research, partially carried out in collaboration with the University of Palermo, have been presented in a paper accepted for publication at the First International ACM Workshop on Vision Networks for Behavior Analysis (VNBA 2008), held in conjunction with the ACM Multimedia conference and organized by UNIMORE (Gualdi, et al., 2008). VNBA 2008 will be held on October 31, 2008 in Vancouver (Canada).
In order to provide HUJI with a large set of exemplar data for testing their statistical models, UNIMORE followed two directions during this semester. The first direction, more interesting and innovative from the computer vision perspective, is the development of a completely automatic system capable of segmenting and tracking meaningful body parts (describing the people shape) with a markerless approach. To overcome the obvious inaccuracies of the markerless approach, the second direction followed by UNIMORE is the use of a MoCap (Motion Capture) system, which robustly tracks body parts through the use of several artificial markers.
Markerless approach
Basic approaches for recognizing human actions are based either on the analysis of body shape (in 2D or 3D) or on the analysis of the dynamics of prominent points or parts of the human body. More specifically, action recognition approaches can be divided into two main groups (Gavrila D. M., 1999) depending on whether the analysis is performed directly in the image plane (2D approaches) or using a three-dimensional reconstruction of the action itself (3D approaches). The latter have been widely adopted where building and fitting a 3D model of the body parts performing the action is relatively simple, thanks to controlled environmental conditions and a high-resolution view of the object (Rehg & Kanade, 1995) (Goncalves, Di Bernardo, Ursella, & Perona, 1995) (Gavrila & Davis, 1996). These methods are often unfeasible in real-time surveillance applications. Regardless of the sophistication of the approach, these methods can be applied only if a more or less sophisticated model of the target exists.
On the contrary, 2D approaches analyze the action in the image plane, relaxing all the environmental constraints of 3D approaches but lowering the discriminative power of the action-classification task. People action classification can be performed in the image plane either by explicitly observing and tracking feature points (local feature approaches (Laptev & Lindeberg, 2003)) or by considering the whole shape-motion as a feature in itself (holistic approaches (Cucchiara, Grana, Prati, & Vezzani, 2005) (Ke, Sukthankar, & Hebert, 2007)). (Yilmaz & Shah, 2005) exploited the tracking of people's contour points to build a 3D volume describing the action; their work is an example of a local feature approach. (Niebles, Wang, & Fei-Fei, 2006) proposed a feature-based approach that searches for “spatio-temporal words” as a time-collection of interest points and classifies them into actions using a pLSA (probabilistic latent semantic analysis) graphical model. Holistic approaches, instead, directly map low-level image features to actions, preserving spatial and temporal relations. Feature choice is a crucial aspect in obtaining a discriminative representation. An interesting holistic approach that detects human actions in videos without performing motion segmentation was proposed by (Shechtman & Irani, 2007). They analyzed spatio-temporal video patches to detect discontinuities in the motion-field directions. Despite the general applicability of this method, its high computational cost makes it unusable for real-time surveillance applications.
The system developed by UNIMORE in this task is meant to avoid the use of artificial markers by automatically segmenting the human silhouette into a number of relevant areas describing the motion evolution. The tracking of the areas' centroids produces a set of 3D trajectories describing the person's action at a fine grain. Then, in order to compare two actions, we define a novel approach for comparing two sets of trajectories based on global sequence alignment and dynamic programming, similar to the approach applied in the second semester for people trajectory shape analysis.
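The global-alignment comparison of two trajectories can be sketched with a Needleman-Wunsch-style dynamic program over point-to-point Euclidean costs. The gap penalty and the toy trajectories below are illustrative assumptions; the actual cost scheme is the one described in the second report:

```python
import numpy as np

def align_cost(a, b, gap=1.0):
    """Global alignment (Needleman-Wunsch style) between two trajectories.
    Match cost = Euclidean distance between points; lower total = more similar."""
    n, m = len(a), len(b)
    D = np.zeros((n + 1, m + 1))
    D[:, 0] = np.arange(n + 1) * gap          # aligning against an empty prefix
    D[0, :] = np.arange(m + 1) * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = D[i - 1, j - 1] + np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = min(match, D[i - 1, j] + gap, D[i, j - 1] + gap)
    return D[n, m]

# A trajectory is closer to a slightly shifted copy of itself than to a different shape.
t = np.linspace(0, 1, 20)
walk   = np.stack([t, 0.1 * np.sin(6 * t)], axis=1)
walk2  = walk + 0.01                                      # nearly identical walk
zigzag = np.stack([t, 0.5 * np.sign(np.sin(20 * t))], axis=1)

print(align_cost(walk, walk2) < align_cost(walk, zigzag))   # True
```

The same dynamic program applies unchanged to the 3D centroid trajectories produced by the segmentation step.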
An important addition of this method is the fusion of information coming from different views of the same action. In this way, consistent action recognition is exploited to correct errors due to occlusions, view-dependent action representations, etc. Preliminary results showed excellent discriminative power for this approach.
Figure 5. Scheme of the proposed system
The proposed system is based on four main steps (Fig. 5):
• object detection and tracking: for each camera view C_i, the moving people are segmented and tracked from the image I_i(t) at time instant t; this step produces a probability map PM_i(t) for each moving person; the techniques used have been reported in (Cucchiara, Grana, Piccardi, & Prati, 2003) (Cucchiara, Grana, Tardini, & Vezzani, 2004), and examples of PM_i(t) for different actions are shown in Fig. 6;
• iterative space-time trajectory extraction: the K main components of PM_i(t), corresponding to the K main body parts, are automatically extracted and tracked, and they are used to model the action; the EM algorithm is used to infer the parameter set A_i(t) of a 3-variate mixture of Gaussians (MoG) on PM_i(t); the means of the K components are tracked frame-by-frame using a minimum-distance approach in the pdf domain based on the Bhattacharyya distance; finally, the iterative tracking yields K space-time trajectories STT_i = {T_1, ..., T_K} for each view; some examples of the segmentation achieved by this process with K = 3 components are reported in Fig. 7, where a person leaving a pack is shown;
• consistent action recognition: the list of tracked people from different (overlapping) cameras can be used to consistently assign the same label to different instances of the same person in different views (the consistent labeling problem, as addressed in (Calderara, Cucchiara, & Prati, 2008)); this assignment is used in the consistent action recognition step to fuse together the STTs of the same person coming from different views;
• action recognition: a new action, modeled as the set STT of the consistent STT_i from the different views, is compared using global alignment to compute a measure of distance/similarity from all the existing actions; the classification is performed with a minimum-distance classifier.
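The frame-by-frame association of MoG components via the minimum Bhattacharyya distance can be sketched as follows. The closed-form Bhattacharyya distance between Gaussians is standard, while the greedy nearest-component association and the toy 2D components are illustrative simplifications of the 3-variate case used in the system:

```python
import numpy as np

def bhattacharyya(mu1, S1, mu2, S2):
    """Closed-form Bhattacharyya distance between two Gaussian components."""
    S = 0.5 * (S1 + S2)
    d = mu1 - mu2
    term1 = 0.125 * d @ np.linalg.solve(S, d)
    term2 = 0.5 * np.log(np.linalg.det(S) /
                         np.sqrt(np.linalg.det(S1) * np.linalg.det(S2)))
    return term1 + term2

def associate(prev, curr):
    """Greedy frame-to-frame association of MoG components by minimum distance."""
    return [min(range(len(curr)),
                key=lambda j: bhattacharyya(*prev[i], *curr[j]))
            for i in range(len(prev))]

# Two components (e.g. head and torso blobs) moving slightly between frames.
I = np.eye(2)
prev = [(np.array([0.0, 0.0]), I), (np.array([5.0, 5.0]), I)]
curr = [(np.array([5.2, 5.1]), I), (np.array([0.1, 0.2]), I)]
print(associate(prev, curr))   # [1, 0]
```

Chaining these per-frame associations over time yields the K space-time trajectories (the component means) that form each STT.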
Figure 6. Examples of PM computed for different actions. Brighter pixels correspond to higher probability.
Figure 7. Examples of the segmentation with the MoG
The global alignment used for action recognition is the same as that used for people trajectory analysis, described in the second report.
Since the STTs are composed of sets of trajectories, a global similarity measure that accounts for all the trajectories in the sets should be adopted to compare the actions. A K × K distance matrix Δ is built by comparing all the trajectories of the STT modeling action a with those of the STT modeling action b using the alignment technique:

Δ = [δ_{i,j}], with δ_{i,j} = d(T_i, T_j), where T_i ∈ STT_a, T_j ∈ STT_b, and d(·, ·) is the alignment distance.

The distance matrix is evaluated by computing its maximum eigenvalue. Since the distance values are real and non-negative, the matrix is guaranteed to have at least one positive eigenvalue. The maximum eigenvalue λ of the matrix expresses the variance of the distances along the main direction of distance diffusion. This is equivalent to comparing how different the STT trajectories are from those belonging to another action, performing the comparison globally and without considering any association among the trajectories themselves.
When multiple cameras are present, a set of STTs, one for each camera observing the action, is built by exploiting the consistent labeling module previously described. The multi-camera STT set embodies the information coming from multiple views, allowing the system to distinguish among actions that may appear similar from a specific point of view. This enriched discriminative property suggests exploiting a method that jointly compares each STT coming from a specific camera. The matrix is consequently extended to account for the STTs coming from the same camera views. More specifically, if STT_a^i is the action descriptor of action a observed on camera C_i and STT_b^j is that of action b observed on camera C_j, the matrix Δ comparing two actions observed from both C_i and C_j becomes the block-diagonal matrix:

Δ = [ Δ_i  0
      0    Δ_j ]

As in the single-camera case, the eigenvalues are computed to obtain a scalar similarity measure from the distance matrices. In particular, for every sub-matrix Δ_i its largest eigenvalue λ_i is computed, and the sum of the λ_i is used to identify whether the actions are jointly similar in all views.
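Both the single-camera score (largest eigenvalue of the K × K distance matrix) and the multi-camera extension (sum of the largest eigenvalues of the per-camera sub-matrices, which is what the block-diagonal form reduces to) can be sketched as follows; the 3 × 3 distance values are hypothetical:

```python
import numpy as np

def stt_score(dist):
    """Largest real eigenvalue of a K x K trajectory-distance matrix.
    For a non-negative matrix the dominant eigenvalue is real (Perron-Frobenius)."""
    return float(np.max(np.linalg.eigvals(dist).real))

def multi_view_score(dist_blocks):
    """Multi-camera case: the block-diagonal matrix diag(D_1, ..., D_n)
    collapses to the sum of the per-camera largest eigenvalues."""
    return sum(stt_score(D) for D in dist_blocks)

# Toy 3x3 alignment-distance matrices between the trajectories of two STT sets:
near = np.full((3, 3), 0.1) + 0.05 * np.eye(3)   # action a vs. a similar action
far  = np.full((3, 3), 2.0) + 0.5 * np.eye(3)    # action a vs. a different action

print(stt_score(near) < stt_score(far))                                # True
print(multi_view_score([near, near]) < multi_view_score([near, far]))  # True
```

A small score means the two actions' trajectory sets are globally close; the multi-view sum exposes a difference even when only one camera sees it.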
The proposed approach is conceived for a multi-camera setup. Nevertheless, we first evaluated its performance on a single-camera setup. The nine actions we considered are summarized in Fig. 8, with an example frame for each action.
Drinking Taking off jacket Sitting
Tying shoes Walking Abandoning object
Oscillating Jumping Raising up arm
Figure 8. Examples of the used actions
Videos are taken from static cameras with a side view, but the action may also take place in a not-completely-lateral way. The only strong assumption is that the moving people are visible at a sufficient resolution. In fact, if this assumption does not hold, the EM algorithm on the MoG would have too few samples, making it strictly dependent on the initialization seeds, and it may not converge.
Several examples of each action have been collected to form a set of 29 videos. The resulting confusion matrix with respect to a manually labeled ground truth is reported in Table 1. In total, the system makes only 4 errors on the 29 videos, resulting in an average accuracy of 86.21%.
Table 1. Confusion matrix in the single camera setup.
In addition to the single-camera experiments, we also conducted a preliminary experimentation on a multi-camera setup. Fig. 9 shows two examples of a “tying shoe laces” action from two completely different viewpoints. We collected 10 videos of two actions (“walking” and “tying shoe laces”). Though the results are very preliminary, only two actions were misclassified (an average accuracy of 80%), basically due to the poor resolution of one of the cameras.
Figure 9. An example from the multi-camera setup
This approach has been presented in a paper accepted for the Proceedings of the ACM/IEEE International Conference on Distributed Smart Cameras (ACM/IEEE ICDSC 2008), to be held in Vancouver, BC (Canada) on October 31, 2008 (Calderara, Prati, & Cucchiara, “A Markerless Approach for Consistent Action Recognition in a Multi-camera System”, 2008).
Marker-based approach
The method described in the previous section has the clear advantage of being usable without any artificial marker on the moving people, and even on pre-recorded videos. On the negative side, however, it is far from accurate in tracking single body parts/points, especially when the image resolution is limited. As an alternative, UNIMORE also collected data with a Motion Capture (MoCap) system available at the “Motion Analysis Laboratory” (headed by Prof. Adriano Ferrari) at the Faculty of Medicine of UNIMORE. These colleagues provided us with free use of the system, composed of 8 VICON near-infrared cameras and 2 standard cameras. The VICON cameras, together with specific software and a calibration procedure, are able to track in 3D space all the markers (made of a special reflective material) attached to the human body. For our preliminary tests, we used a classical biological human model with 31 markers attached to the most significant kinematic points of the human body (see Fig. 10).

Figure 10. Human model used and the corresponding markers
With this system we were able to collect an initial set of videos of several actions (walking, running, abandoning an object, etc.). Fig. 11 shows some examples of the videos obtained. The resulting (very accurate) 3D trajectories for all 31 points have been given to HUJI for further analysis and for testing their machine learning models.
Figure 11. Some examples of the results obtained (from left to right: walking, running, tying shoes)
D. M. Gavrila, “The visual analysis of human movement: A survey,” Computer Vision and Image Understanding, vol. 73, no. 1, pp. 82–98, Jan. 1999.
J. Rehg and T. Kanade, “Model-based tracking of self-occluding articulated objects,” in Proc. of IEEE Int'l Conference on Computer Vision, 1995, pp. 612–617.
L. Goncalves, E. Di Bernardo, E. Ursella, and P. Perona, “Monocular tracking of the human arm in 3D,” in Proc. of IEEE Int'l Conference on Computer Vision, 1995, pp. 764–770.
D. M. Gavrila and L. S. Davis, “3D model-based tracking of humans in action: A multi-view approach,” in Proc. of IEEE Int'l Conference on Computer Vision and Pattern Recognition, 1996, pp. 73–80.
I. Laptev and T. Lindeberg, “Space-time interest points,” in Proc. of IEEE Int'l Conference on Computer Vision, 2003, vol. 1, pp. 432–439.
R. Cucchiara, C. Grana, A. Prati, and R. Vezzani, “Probabilistic posture classification for human-behavior analysis,” IEEE Trans. on Systems, Man, and Cybernetics – Part A, vol. 35, no. 1, pp. 42–54, Jan. 2005.
Y. Ke, R. Sukthankar, and M. Hebert, “Spatiotemporal shape and flow correlation for action recognition,” in Proc. of IEEE Int'l Conference on Computer Vision and Pattern Recognition, 2007, pp. 1–8.
A. Yilmaz and M. Shah, “Action sketch: A novel action representation,” in Proc. of IEEE Int'l Conference on Computer Vision and Pattern Recognition, 2005, vol. 1, pp. 984–989.
J. C. Niebles, H. Wang, and L. Fei-Fei, “Unsupervised learning of human action categories using spatial temporal words,” in Proc. of British Machine Vision Conference, 2006, vol. 3, pp. 1249–1259.
E. Shechtman and M. Irani, “Space-time behavior correlation – or – how to tell if two underlying motion fields are similar without computing them?,” IEEE Trans. on Pattern Analysis and Machine Intelligence, 2007.
The development of the ViSOR (Video Surveillance Online Repository) described in the previous reports has continued in this semester. Specifically, the current video corpus set includes about
200 videos grouped into 14 categories, and allows four types of annotation:
• Base Annotation: ground truth with concepts referring to the whole video.
• Online Annotation: annotations created by users through the new online annotation tool.
• GT Annotation: ground truth with frame-level annotation; concepts can refer to the whole video, to a frame interval, or to a single frame.
• Automatic Annotation: output of automatic systems shared by ViSOR users.
Registered users can exploit the online annotations in ViSOR for their own purposes, for querying the ViSOR database, or for running the Online Performance Evaluation tool, which measures the accuracy of an annotation (for instance, one produced by their own automatic tool) against the ground truth.
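To make the frame-level annotation and its evaluation concrete, the following is a minimal sketch in Python. The tuple-based annotation layout, the concept name, and the accuracy function are illustrative assumptions only, not the actual ViSOR data model or API.

```python
# Illustrative sketch (NOT the actual ViSOR API): a frame-level ground-truth
# annotation compared against an automatic one to obtain an accuracy measure,
# in the spirit of the Online Performance Evaluation tool.

def frames_with_concept(annotations, concept):
    """Expand (concept, first_frame, last_frame) triples into a set of frames."""
    frames = set()
    for name, first, last in annotations:
        if name == concept:
            frames.update(range(first, last + 1))
    return frames

def frame_accuracy(gt, auto, concept, num_frames):
    """Fraction of frames on which the two annotations agree for a concept."""
    gt_frames = frames_with_concept(gt, concept)
    auto_frames = frames_with_concept(auto, concept)
    agree = sum(1 for f in range(num_frames)
                if (f in gt_frames) == (f in auto_frames))
    return agree / num_frames

# A concept can span the whole video, a frame interval, or a single frame.
gt = [("person_walking", 0, 99)]      # whole (100-frame) video
auto = [("person_walking", 10, 89)]   # interval found by an automatic system
print(frame_accuracy(gt, auto, "person_walking", 100))  # → 0.8
```

In this toy example the automatic annotation misses 20 of the 100 ground-truth frames, giving a frame-level agreement of 0.8.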
Figure 12. Some screenshots of the ViSOR web interface
The development activity on ViSOR in this semester has also resulted in a publication (Vezzani & Cucchiara, 2008), recently presented at the 5th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS 2008), Santa Fe, New Mexico, 1-3 September 2008.
Here is a comprehensive list of the accomplishments achieved so far, compared against the Project Plan, in the first 18 months of the project (months 1-18):
• Development of a new approach for modeling human gait, the Oscillatory Gait Model (OGM), and its statistical modeling through autoregressive processes (partly done);
• Use of the OGM as the state output model of an HMM, for a complete statistical model of human motion (in progress);
• Application of the graph Laplacian formulation, which has proved very successful for detecting irregularities in multivariate data, to the OGM motion parameters (in progress);
• Development of a complete tool for extracting visual features (people detection and tracking, with the corresponding features) from a system of multiple cameras with partially overlapped FOVs (concluded);
• Further enhancement of the solutions for analyzing people trajectories, accounting for multimodal and sequential trajectories in order to infer behaviors (final development of a prototype under way);
• Study of a system for people shape analysis based on action signatures (concluded);
• Creation of a video repository of annotated surveillance videos (under further development);
• Development of a system for people tracking with freely moving cameras (under further study);
• Development of a system for markerless modeling of human actions from multiple cameras (under further study);
• Acquisition of videos from a MoCap system (first set of videos acquired);
• Organization of the first ACM International Workshop on Vision Networks for Behaviour Analysis (ACM VNBA 2008) – http://imagelab.ing.unimore.it/vnba08 – Vancouver, BC (Canada), October 31, 2008.
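The graph Laplacian approach to detecting irregularities, mentioned in the list above, can be illustrated with a toy sketch: build a Gaussian-similarity graph over motion-parameter vectors and use the degree matrix D (the diagonal part of the Laplacian L = D - W) to flag samples that are weakly connected to the rest of the data. This is a deliberately simplified, degree-based proxy for intuition only, not the project's actual formulation; the data points and the sigma value are invented.

```python
import math

def gaussian_similarity(x, y, sigma=1.0):
    """Gaussian kernel similarity between two feature vectors."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2 * sigma ** 2))

def irregularity_scores(points, sigma=1.0):
    """Score each point by its (low) degree in the similarity graph.

    The weight matrix W and degree matrix D define the graph Laplacian
    L = D - W; a point whose degree D[i][i] is small is weakly tied to
    the rest of the data and can be flagged as irregular.
    """
    n = len(points)
    W = [[gaussian_similarity(points[i], points[j], sigma) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    degrees = [sum(row) for row in W]
    max_deg = max(degrees)
    # 0 = well connected (typical); close to 1 = isolated (irregular).
    return [1.0 - d / max_deg for d in degrees]

# Toy "gait parameter" vectors: four similar samples and one outlier.
data = [(1.0, 2.0), (1.1, 2.1), (0.9, 1.9), (1.0, 2.1), (5.0, 5.0)]
scores = irregularity_scores(data)
print(scores.index(max(scores)))  # → 4 (the outlier)
```

In practice the project applies the full Laplacian machinery to the OGM motion parameters; the sketch only shows why isolation in the similarity graph signals an irregular sample.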
UNIMORE has moved forward with the development of real prototypes, both for people detection and tracking from fixed multi-camera systems and for trajectory analysis.
In the next six months UNIMORE (NPD) will focus mainly on the following aspects:
1. Final development of the complete system for the analysis of people behaviors (in terms of trajectories and shape);
2. Collection of further data for the HUJI tests.
At UNIMORE, five young scientists have been involved in the project:
• Simone Calderara (3rd-year PhD student at UNIMORE): involved in the study of people trajectories and in the research on people shape detection and markerless segmentation of human body parts; he has been sent to international schools and conferences on these topics to acquire the knowledge and experience necessary for the project;
• Roberto Vezzani (experienced post-doc at UNIMORE, recently appointed assistant professor at UNIMORE): involved in the development and maintenance of the ViSOR system; he also participated in a meeting in Italy to disseminate the ViSOR system and the BESAFE project;
• Giovanni Gualdi (2nd-year PhD student at UNIMORE): involved in the study of methods for object tracking with freely moving cameras;
• Daniele Borghesani (1st-year PhD student at UNIMORE): involved in the study of biometric features that can be applied to model people shape; he has been sent to international schools on biometrics to acquire the knowledge and experience necessary for the project;
• Paolo Piccinini (1st-year PhD student at UNIMORE): involved in the development of the people trajectory analysis system; he has attended international schools on the fundamentals of computer vision and pattern recognition useful for the BESAFE project.
• 5th Summer School on Advanced Studies in Biometrics for Secure Authentication (Biometrics 2008), 9-13 June 2008, Alghero, Italy: Daniele Borghesani attended this school to acquire specific knowledge of biometric techniques and algorithms (school website: http://biometrics.uniss.it/index.php);
• International Computer Vision Summer School (ICVSS 2008), 14-19 July 2008: Paolo Piccinini attended this summer school to acquire the fundamentals of computer vision and pattern recognition (school website: http://svg.dmi.unict.it/icvss2008/);
• Meeting, 17-20 March 2008, Amsterdam, The Netherlands: Rita Cucchiara and Roberto Vezzani attended a meeting to evaluate possible collaborations within the BESAFE project.
[1] G. Gualdi, A. Prati, R. Cucchiara, E. Ardizzone, M. La Cascia, L. Lo Presti, and M. Morana, "Enabling Technologies on Hybrid Camera Networks for Behavioral Analysis of Unattended Indoor Environments and Their Surroundings," in Proceedings of the 1st ACM International Workshop on Vision Networks for Behaviour Analysis (ACM VNBA 2008).
[2] S. Calderara, A. Prati, and R. Cucchiara, "A Markerless Approach for Consistent Action Recognition in a Multi-camera System," in Proceedings of the ACM/IEEE International Conference on Distributed Smart Cameras (ACM/IEEE ICDSC 2008).
[3] S. Calderara, R. Cucchiara, and A. Prati, "Bayesian-Competitive Consistent Labeling for People Surveillance," IEEE Trans. on PAMI, vol. 30, no. 2, pp. 354-360, 2008.
[4] G. Gualdi, A. Albarelli, A. Prati, A. Torsello, M. Pelillo, and R. Cucchiara, "Using Dominant Sets for Object Tracking with Freely Moving Camera," in Proceedings of the Workshop on Visual Surveillance (VS 2008).
[5] R. Vezzani and R. Cucchiara, "Annotation Collection and Online Performance Evaluation for Video Surveillance: the ViSOR Project," in press, 5th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS 2008), Santa Fe, New Mexico, 1-3 Sep. 2008.
• Organization of the First Workshop on VIdeo Surveillance projects in Italy (VISIT 2008) – http://imagelab.ing.unimore.it/visit2008/ – Modena (Italy), 22 May 2008; the workshop was presented as sponsored by BESAFE, and the PPD, Prof. Naftali Tishby, was its main invited speaker (see the program at http://imagelab.ing.unimore.it/visit2008/program.asp);
• Organization of the First ACM Workshop on Vision Networks for Behaviour Analysis (ACM VNBA 2008) – http://imagelab.ing.unimore.it/vnba08/ – Vancouver, BC (Canada), 31 October 2008; this workshop will bring together researchers from the fields of vision networks, computer vision, and behaviour analysis.
None by UNIMORE.
UNIMORE included in the project staff Daniele Borghesani and Paolo Piccinini.
Science for Peace - Project Management Handbook
Please provide one sheet per Project Co-Director
ATTENTION: Project Co-Directors from NATO countries (except Bulgaria and Romania) are only eligible for NATO funding for items f-g-h !
Project number: SfP - 982480
Report date: 20/10/2008
Project Co-Director: Prof. Naftali Tishby
Project short title: SfP - BE SAFE
Duration of the Project 1 : 04/07-03/09
Annex 4a
Detailed Budget Breakdown (in EUR)
Columns: (1) actual expenditures from the start until 30.09.08; (2) forecast expenditures for the following six months; (3) forecast expenditures for the following period until the project's end. All amounts in column (3) are 0. Comments refer to changes, if any, in the financial planning compared to the approved Project Plan.

(a) Equipment: 2 Samsung SHC 740 D/N cameras (upgrade in the brand of the cameras); 1 Samsung SPD 3300 (PTZ); 1 Samsung SHC 750 1" D/N camera; 1 Samsung SVR 950E recorder for the cameras; miscellaneous equipment (professional rack for DVD recording, ink-jet printer for the rack); Thermal-Eye 250D with 150 mm lens.
    (1): 1.892 + 10.358 + 4.565; Subtotal "Equipment" (1): 16.815
    (2): 2.903 + 12.922 + 5.400 + 2.760 + 19.750; Subtotal "Equipment" (2): 43.735
    Comment: 2 PTZ cameras changed into 1 PTZ plus one high-quality D/N camera; some equipment moved to the following period.

(b) Computers - Software: Sun Fire X2200 M2 x64 server, DS14 shelf with 7 TB SATA, 6 iMacs plus upgrades; laptop, PC, other equipment; accessories, external storage, printers, peripherals; software (productivity applications, data storage and statistics).
    (1): 46.764 + 14.492 + 510; Subtotal "Computers - Software" (1): 61.766
    (2): 5.744 + 4.000 + 5.290; Subtotal "Computers - Software" (2): 15.034
    Comment: some equipment moved to the following period.

(c) Training: international, for meetings in Italy.
    Subtotal "Training" (2): 10.000

(d1) Books and Journals (global figure): (1) 30; (2) 7.110
(d2) Publications (global figure): (2) 800
    Subtotal "Books - Publications": (1) 30; (2) 7.910

(e) Experts - Advisors: security consultant, anti-terror experts.
    Subtotal "Experts - Advisors" (2): 2.500

(f) Travel: for meetings and setup of scenarios.
    Subtotal "Travel": (1) 5.113; (2) 4.887

(g) Consumables - Spare parts: network, servers.
    Subtotal "Consumables - Spare parts": (1) 1.094; (2) 8.906

(h) Other costs and (i) stipends: telecommunication, printing, desk-top; miscellaneous; a graduate student (to be identified); two master's students (to be identified).
    (1): 159 + 1.281; Subtotal "Other costs" (1): 1.440
    (2): 2.841 + 4.000 + 519 + 1.800 + 1.800; Subtotal "Other costs" (2): 10.960

TOTAL (1), (2), (3): (1) 86.258; (2) 103.932; (3) 0
CURRENT COST OUTLOOK = (1)+(2)+(3): 190.190
Science for Peace - Project Management Handbook
Please provide one sheet per Project Co-Director
ATTENTION: Project Co-Directors from NATO countries (except Bulgaria and Romania) are only eligible for NATO funding for items f-g-h !
Project number: SfP - 982480
Report date: 20/10/2008
Project Co-Director: Prof. Rita Cucchiara
Project short title: SfP - BE SAFE
Duration of the Project 1 : 04/07-03/09
Annex 4a
Detailed Budget Breakdown (to be completed in EUR 3)
Columns: (1) actual expenditures from the start until 30.09.08; (2) forecast expenditures for the following six months; (3) forecast expenditures for the following period until the project's end. All amounts in column (3) are 0. Comments refer to changes, if any, in the financial planning compared to the approved Project Plan.

(a) Equipment: no expenditures (Subtotal "Equipment": 0).

(b) Computers - Software: no expenditures (Subtotal "Computers - Software": 0).

(c) Training: no expenditures (Subtotal "Training": 0).

(d1) Books and Journals (global figure): (1) 1.688
    Subtotal "Books - Publications" (1): 1.688
    Comment: the books' quota has been increased slightly (approx. 188 EUR).

(e) Experts - Advisors: no expenditures (Subtotal "Experts - Advisors": 0).

(f) Travel: Subtotal "Travel" (1): 13.306
    Comment: travel for the PhD students involved in the project (increased by approx. 1.300 EUR).

(g) Consumables - Spare parts: Subtotal "Consumables - Spare parts": (1) 2.059; (2) 2.947

(h) Other costs and (i) stipends: other costs (1) 1.913; stipends (1) 200
    Subtotal "Other costs": (1) 2.113; (2) 6.887
    Comment: reduced to compensate for the increases in books and travel.

TOTAL (1), (2), (3): (1) 19.166; (2) 9.834; (3) 0
CURRENT COST OUTLOOK = (1)+(2)+(3): 29.000
Project number: SfP - 982480
Report date: 20/10/2008
The Project is in the year 2
Project short title: SfP - Be Safe
Duration of the Project: 04/07-03/09
Summary per Project Co-Director (in EUR). Columns: APPROVED BUDGET (total, years 1-5); CURRENT COST OUTLOOK (total, years 1-5); ACTUAL EXPENDITURES until 30.09. of current year 2; FORECAST EXPENDITURES for the following months; forecast for the following period until the project's end.

Project Co-Director (name, city, country):
    Naftali Tishby, Israel: 190.190; 190.190; 86.258; 103.932; 0
    Rita Cucchiara, Modena, Italia: 29.000; 29.000; 19.166; 9.834; 0
TOTAL (must be identical with the TOTALs given in "Breakdown per item"): 219.190; 219.190; 105.424; 113.766; 0

Breakdown per item (to be completed in EUR 3), same columns, with comments on changes, if any, in the financial planning compared to the approved Project Plan:
    (a) Equipment: 60.550; 60.550; 16.815; 43.735
    (b) Computers - Software: 76.800; 76.800; 61.766; 15.034
    (c) Training: 10.000; 10.000; 0; 10.000
    (d) Books - Publications: 7.940; 9.628; 1.718; 7.910 (comment: books are necessary for the added staff member)
    (e) Experts - Advisors: 2.500; 2.500; 0; 2.500
    (f) Travel: 20.000; 23.306; 18.419; 4.887 (comment: travel for the added PhD students to participate in schools)
    (g) Consumables - Spare parts: 19.000; 15.006; 3.153; 11.853 (comment: reduced to compensate for the increased costs of books and travel)
    (h) Other costs and (i) stipends: 22.400; 21.400; 3.553; 17.847 (comment: reduced to compensate for the increased costs of books and travel)
TOTAL: 219.190; 219.190; 105.424; 113.766; 0

1 Give the month/year when the Project started and the expected ending date. 2 Choose the appropriate date and complete the year. 3 As of January 2002, grants are made in Euro (EUR) and all figures should be given in EUR.
The completion of the equipment inventory records has been delayed since we never received the inventory labels.
Item | Manufacturer | Model Number | Serial Number | Date of Purchase | Cost (EUR 1) | Inventory Label No. | Property | Location
Project number: SfP - 982480
Report date: 20/10/2008
The Project is in the year: 2
Project short title: SfP - BESAFE
Duration of the Project 1: 04/07-03/09

CRITERIA FOR SUCCESS TABLE
Criteria for Success as approved with the first Grant Letter on 24/10/2006 (with their weights), and Achievements as at 30.09.08 (changes should be reflected here):

1) Abnormal behavior: defined, scenarios of motion capture video are collected, data is acquired and annotated (weight: 25%).
   Achieved (25%): partial definition; defined scenario of abandoned baggage; acquired several annotated videos; acquired additional video with MoCAP.
2) People detection and tracking: techniques for multiple cameras and PTZ defined; detection and tracking evaluated (weight: 20%).
   Achieved (20%): techniques for overlapped multiple cameras defined and deeply tested; preliminary techniques for PTZ studied; detection and tracking evaluated; preliminary studies on freely moving cameras; moving toward an integrated system.
3) People activity: features extracted, symbolic coding for trajectories defined, data prepared, per-sensor classification is evaluated (weight: 15%).
   Achieved (10%): features partially extracted; symbolic coding for trajectories defined; data prepared.
4) People shape: features extracted, symbolic coding defined, data prepared, per-sensor classification is evaluated (weight: 15%).
   Achieved (10%): initial study on feature extraction and representation through action signatures; markerless system for human body part tracking.
5) Kernel design and SVM learning: kernels are mathematically defined, their evaluation algorithm is implemented, experimental tests and accuracy evaluated (weight: 25%).
   Achieved (10%): statistical framework designed.

TOTAL weight: 100%. TOTAL 4 achieved: 75%.