bmva-CANTATA

JL-1
CANTATA
Content Aware Networked systems Towards Advanced and Tailored
Assistance
BMVA 2007
December 12, 2007
Francois Bremond – INRIA sophia
BMVA CANTATA – INRIA, December 12, 2007
JL-2
CANTATA Introduction
Problem statement
• 2-year ITEA project, ending in December 2008
• Large amounts of data for transfer and interpretation
• 3 MCA challenges
  ▪ Surveillance
  ▪ Consumer applications
  ▪ Medical
• Solution
  ▪ High-level descriptions by means of content analysis
  ▪ Retrieval by intelligent indexing
JL-3
CANTATA Introduction
Long term vision
• Develop systems
  ▪ That are aware of the content and understand it
  ▪ That apply this knowledge to establish an action or autonomously
    control the environment
  ▪ That will be a “virtual specialist” as it will apply the knowledge to
    assist the decision-making security officer
• Challenges
  ▪ Video content models for robust analysis and reasoning
  ▪ Self-learning, context awareness for faithful system performance
  ▪ Performance quantification of MCA for objective evaluation
    (standard dataset, well-defined metrics)
JL-4
CANTATA the scope
[Figure: The CANTATA platform. Labels recoverable from the diagram: Framework 1, Framework 2 and Framework 3 (services-based network architecture, streaming architecture, component-based); client devices (Desktop PC, Portable PC, Tablet PC, PDA, Mobile, SetTopBox, Standard TV, HighDef TV); networks (IP network, WIFI, GPRS/UMTS); media (video, audio) and metadata flows; content databases; user adaptation; analysis modules (VCA, ACA, MDA); evaluation (metrics, ground truth, performance reporting, Tp/Fp).]
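The evaluation part of the platform diagram names metrics, ground truth and Tp/Fp counts. As a rough illustration of how such counts could feed a performance report, here is a minimal sketch that derives precision and recall. The class and function names are hypothetical, and the false-negative count is an assumption: the slide only mentions Tp and Fp.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectionCounts:
    """Counts obtained by matching system output against ground truth."""
    true_positives: int   # Tp: detections that match a ground-truth item
    false_positives: int  # Fp: detections with no ground-truth match
    false_negatives: int  # missed ground-truth items (assumed to be available)


def precision(counts: DetectionCounts) -> float:
    """Fraction of reported detections that are correct: Tp / (Tp + Fp)."""
    reported = counts.true_positives + counts.false_positives
    return counts.true_positives / reported if reported else 0.0


def recall(counts: DetectionCounts) -> float:
    """Fraction of ground-truth items that were found: Tp / (Tp + Fn)."""
    expected = counts.true_positives + counts.false_negatives
    return counts.true_positives / expected if expected else 0.0


# Made-up counts, for illustration only.
counts = DetectionCounts(true_positives=8, false_positives=2, false_negatives=4)
print(f"precision={precision(counts):.2f} recall={recall(counts):.2f}")
```

Both functions return 0.0 when the denominator is empty, which is one possible convention; a real validation framework would have to define that case explicitly.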
JL-5
WP4 Validation and classification
Objective
• This work package aims at defining an overall objective
validation framework that covers the various aspects of
MCA systems
Validation chain
JL-6
WP4 Validation and classification
Organisation
• Work Package leader: Barco
Task 4.1 State-Of-The-Art : Inria
Task 4.2 Requirements : VDG Security
Task 4.3 Creation of Datasets : Kingston University
Task 4.4 Annotation tool : Traficon
Task 4.5 Ground truth : Multitel
Task 4.6 Validation metrics : Philips medical
Task 4.7 Publication of validation : Codasystem
• Other partners: Acic, UPF, IBBT, Philips research
JL-7
This work
gives an overview of projects in performance evaluation
and proposed datasets
JL-8
Performance evaluation
Creation of WEB PAGE with existing VIDEO DATASETS
• Topics:
  ▪ Surveillance
  ▪ Consumer applications
  ▪ Medical
• Content:
  ▪ Website: webpage link (if any)
  ▪ Description of Dataset: (content, size, etc.)
  ▪ Description of Ground Truth/Metadata: (if any)
  ▪ Contextual info: environment conditions (calibration, scene...)
  ▪ Results from metrics and ground truth:
  ▪ Comments:
  ▪ Information on Copyright: licence, cost, etc.
  ▪ Contact person from Cantata: contact person to get more info.
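The template above is effectively a record with one field per line. A minimal sketch of how one catalogue entry could be represented and checked for unfilled fields follows; the class, field and function names are hypothetical and are not taken from the CANTATA website. The example values are copied from the ETISEO entry in this presentation.

```python
from dataclasses import dataclass, fields
from typing import List, Optional

# The three topics listed on the slide.
TOPICS = ("Surveillance", "Consumer applications", "Medical")


@dataclass
class DatasetEntry:
    """One dataset description following the web page template."""
    name: str
    topic: str
    website: Optional[str] = None
    dataset_description: Optional[str] = None
    ground_truth_description: Optional[str] = None
    contextual_info: Optional[str] = None
    results: Optional[str] = None
    comments: Optional[str] = None
    copyright_info: Optional[str] = None
    cantata_contact: Optional[str] = None

    def __post_init__(self) -> None:
        if self.topic not in TOPICS:
            raise ValueError(f"unknown topic: {self.topic!r}")


def missing_fields(entry: DatasetEntry) -> List[str]:
    """Return the names of template fields that were left empty."""
    return [f.name for f in fields(entry) if getattr(entry, f.name) in (None, "")]


etiseo = DatasetEntry(
    name="ETISEO",
    topic="Surveillance",
    website="http://www-sop.inria.fr/orion/ETISEO/",
    dataset_description="86 video clips",
    ground_truth_description="Object detection, localization, tracking, classification",
    contextual_info="zone of interest, calibration matrix",
    results="bounding box, object class, events",
    copyright_info="Free download but registration and user agreement is required.",
    cantata_contact="francois.bremond@sophia.inria.fr",
)
print(missing_fields(etiseo))
```

Keeping every template field optional mirrors the slides, where several datasets leave fields such as "Comments" or "Contextual info" empty.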
JL-9
Surveillance
ETISEO
• Website: http://www-sop.inria.fr/orion/ETISEO/
• Description of Dataset: 86 video clips. These sequences constitute a
representative panel of different video surveillance areas.
They merge indoor and outdoor scenes: corridors, streets, building entries,
subway stations...
They also mix different types of sensors and complexity levels.
• Description of Ground Truth/Metadata: 5 different levels: Object Detection,
Object Localization, Object Tracking, Object Classification, Event Recognition.
• Contextual info: zone of interest, calibration matrix
• Results from metrics and ground truth: bounding box, object class, events
• Comments:
• Information on Copyright: Free download, but registration and a user agreement
are required.
• Contact person from Cantata: francois.bremond@sophia.inria.fr
JL-10
Surveillance
PETS 2001
• Website: http://www.cvg.cs.rdg.ac.uk/PETS2001/pets2001-dataset.html
• Description of Dataset: Outdoor people and vehicle tracking (two synchronised
views).
• Description of Ground Truth/Metadata: Tracking information on image
plane and ground plane can be found at:
http://www.cvg.cs.rdg.ac.uk/PETS2001/ANNOTATION/
• Contextual info: Camera Calibration provided
• Results from metrics and ground truth: Centroid and bounding box coordinates
on image plane, object class (person, vehicle, other), position on ground plane
and object orientation.
• Information on Copyright: Free download from website
• Contact person from Cantata: Dimitrios Makris, d.makris@kingston.ac.uk
JL-11
Surveillance
PETS 2002- VISOR BASE: Moving People
• Website: http://www.cvg.cs.rdg.ac.uk/PETS2002/pets2002-db.html
• Description of Dataset: Indoor people tracking (and counting). Two training and
four testing sequences consist of people moving in front of a shop window.
Sequences are provided both in MPEG movie format and as individual JPEG
images.
• Description of Ground Truth/Metadata: People tracking, counting and activity
recognition.
• Contextual info: No calibration
• Results from metrics and ground truth: How many people are passing in front
of the shop window, how many people stop and look into the window, how
many people are looking into the window at each instant (frame) in time, the
trajectories of people passing in front of the store, the time spent per frame
(processing time): a histogram of the microseconds spent processing each
frame.
• Information on Copyright: Free download from website
• Contact person from Cantata: Dimitrios Makris, d.makris@kingston.ac.uk
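One of the requested results above is a histogram of the microseconds spent processing each frame. A minimal sketch of how such a histogram could be built from measured per-frame times is shown below. The function name, the 1000-microsecond bin width and the sample timings are assumptions made for illustration; the PETS 2002 page may prescribe a different binning.

```python
from collections import Counter
from typing import Dict, Iterable


def processing_time_histogram(times_us: Iterable[float],
                              bin_width_us: int = 1000) -> Dict[int, int]:
    """Count frames per processing-time bin.

    Keys are the lower edge of each bin in microseconds; values are the
    number of frames whose processing time fell into that bin.
    """
    if bin_width_us <= 0:
        raise ValueError("bin_width_us must be positive")
    counts = Counter(int(t // bin_width_us) * bin_width_us for t in times_us)
    return dict(sorted(counts.items()))


# Made-up per-frame processing times in microseconds, for illustration only.
frame_times_us = [850, 1200, 1900, 2100, 950, 1500]
histogram = processing_time_histogram(frame_times_us)
print(histogram)
```

In practice the per-frame times would come from a timer wrapped around the tracking code, for example `time.perf_counter_ns()` from the Python standard library.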
JL-12
Surveillance
PETS-ICVS'2003 - FGnet
• Website: http://www.cvg.cs.rdg.ac.uk/PETS-ICVS/pets-icvs-db.html
• Description of Dataset: Smart meeting, that includes facial expressions, gaze
and gesture/action. The environment consists of three cameras: one mounted
on each of two opposing walls, and an omnidirectional camera positioned at
the centre of the room. The dataset consists of four scenarios.
• Description of Ground Truth/Metadata: a) Eye positions of people in Scenarios
A, B and D (every 10th frame is annotated). b) Facial expression and gaze
estimation for Scenarios A and D, Cameras 1-2. c) Gesture/action annotations
for Scenarios B and D, Cameras 1-2.
• Contextual info: Camera Calibration provided.
• Results from metrics and ground truth: For each frame, the requirement is to
perform: face localisation (centre location of eyes), recognition of facial
expression, recognition of face/hand gesture, estimation of face/head direction
(gaze), recognition of actions.
• Information on Copyright: Free download
• Contact person from Cantata: Dimitrios Makris, d.makris@kingston.ac.uk
JL-13
Surveillance
PETS-ECCV'2004 - CAVIAR
• Website: http://groups.inf.ed.ac.uk/vision/CAVIAR/CAVIARDATA1/
or http://www-prima.inrialpes.fr/PETS04/caviar_data.html
• Description of Dataset: People walking alone, meeting with others, window
shopping, fighting and passing out and leaving a package in a public place. All
video clips were filmed with a wide angle camera lens. The resolution is
half-resolution PAL standard (384 x 288 pixels, 25 frames per second) and
compressed using MPEG2. The file sizes are about 10 MB.
• Description of Ground Truth/Metadata: Person/Group Tracking, Person/Group
Activity Recognition, Scenario/Situation Recognition
• Contextual info: 3D coordinates of points for calibration purposes provided.
• Results from metrics and ground truth: For each frame and object/group:
bounding box and behaviour label. Also, for each frame, labels for
situations/scenarios for the whole image.
• Information on Copyright: Free download from website. If you publish results
using the data, please acknowledge the data as coming from the EC Funded
CAVIAR project/IST 2001 37540, found at URL:
http://www.dai.ed.ac.uk/homes/rbf/CAVIAR/
• Contact person from Cantata: Dimitrios Makris, d.makris@kingston.ac.uk
JL-14
Surveillance
PETS'2006 - ISCAPS
• Website: http://pets2006.net/
• Description of Dataset: Surveillance of public spaces, detection of left luggage
events. Scenarios of increasing complexity, captured using multiple sensors.
• Description of Ground Truth/Metadata: XML files with configuration and
ground-truth information. Calibration parameters are given in the
sub-directory 'calibration'.
• Contextual info: Calibration provided.
• Results from metrics and ground truth: The radii distances, luggage location,
warning/alarm triggers, etc.
• Information on Copyright: Free download from website. The UK Information
Commissioner has agreed that the PETS 2006 datasets described here may
be made publicly available for the purposes of academic research. The video
sequences are copyright ISCAPS consortium and permission is hereby
granted for free download for the purposes of the PETS 2006 workshop.
• Contact person from Cantata: Dimitrios Makris, d.makris@kingston.ac.uk
JL-15
Surveillance
PETS'2007 - REASON
• Website: http://pets2007.net/
• Description of Dataset: The datasets are multisensor sequences containing the
following 3 scenarios, with increasing scene complexity: 1. loitering, 2.
attended luggage removal (theft), 3. unattended luggage.
• Description of Ground Truth/Metadata: Event Detection
• Contextual info: Calibration provided
• Results from metrics and ground truth: Event Details (type, location, time)
• Information on Copyright: Free download from website. The UK Information
Commissioner has agreed that the PETS 2007 datasets described here may be
made publicly available for the purposes of academic research. The video
sequences are copyright UK EPSRC REASON Project consortium and
permission is hereby granted for free download for the purposes of the PETS
2007 workshop.
• Contact person from Cantata: Dimitrios Makris, d.makris@kingston.ac.uk
JL-16
Surveillance
Level Crossing
• Website: http://www.multitel.be/~va/selcat/
• Description of Dataset: These datasets are composed of 24 hours of real
sequences showing a level crossing (LC) where some vehicles stop because of its
particular configuration: on the right side of the LC there is an avenue parallel
to it, so a traffic light is located just after the LC. Consequently, vehicles
sometimes stop on the LC because of this traffic light. The total amount of data
is about 7 gigabytes.
• Description of Ground Truth/Metadata: For each video file, there is a
corresponding ground truth file in XML that gives the timestamps of the
"stopped vehicles" events.
• Contextual info: environment conditions (calibration, scene...)
• Contact person from Cantata: Caroline Machy, machy@multitel.be
JL-17
Surveillance
SPEVI: Single face dataset
• Website: www.spevi.org
• Description of Dataset: This is a dataset for single person/face visual detection
and tracking. The dataset is composed of five sequences with different
illumination conditions and resolutions.
• Description of Ground Truth/Metadata: The ground truth data is available in the
.zip files for the sequences motinas_toni and motinas_emilio_webcam. In the
ground truth files each line of text describes the objects' position and size in a
frame. The syntax of a line is the following:
frame number_of_objects obj_1_name x y half_width half_height angle
obj_2_name x y half_width half_height angle ...
• Information on Copyright: Requested citation acknowledgment: E. Maggio, A.
Cavallaro, "Hybrid particle filter and mean shift tracker with adaptive transition
model", in Proc. of IEEE Int. Conference on Acoustics, Speech and Signal
Processing (ICASSP 2005), Philadelphia, 19-23 March 2005, pp. 221 - 224.
• Contact person from Cantata: Xavier Desurmont, desurmont@multitel.be
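The ground truth line syntax quoted above (frame, number of objects, then six values per object) lends itself to a small parser. The sketch below assumes that fields are separated by whitespace, that object names contain no spaces, and that all numeric fields can be read as floats; none of this is confirmed by the slide. The class and function names and the sample line are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ObjectState:
    """Position and size of one annotated object in one frame."""
    name: str
    x: float
    y: float
    half_width: float
    half_height: float
    angle: float


def parse_ground_truth_line(line: str) -> Tuple[int, List[ObjectState]]:
    """Parse 'frame number_of_objects obj_1_name x y half_width half_height angle ...'."""
    tokens = line.split()
    if len(tokens) < 2:
        raise ValueError("line must start with frame and number_of_objects")
    frame, count = int(tokens[0]), int(tokens[1])
    expected = 2 + 6 * count  # six tokens per object after the two header fields
    if len(tokens) != expected:
        raise ValueError(f"expected {expected} tokens, got {len(tokens)}")
    objects = []
    for i in range(count):
        base = 2 + 6 * i
        x, y, half_w, half_h, angle = (float(t) for t in tokens[base + 1:base + 6])
        objects.append(ObjectState(tokens[base], x, y, half_w, half_h, angle))
    return frame, objects


# Made-up sample line that follows the documented syntax.
frame, objects = parse_ground_truth_line(
    "12 2 face_a 100 80 20 25 0.0 face_b 300 90 18 22 0.1"
)
print(frame, [obj.name for obj in objects])
```

Since the files store half-widths and half-heights, the full bounding box of an object would span `x - half_width` to `x + half_width` horizontally, assuming `x` and `y` denote the object centre.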
JL-18
Surveillance
SPEVI: Multiple faces dataset
• Website: www.spevi.org
• Description of Dataset: This is a dataset for multiple people/faces visual detection and
tracking. The dataset is composed of 3 sequences (same scenario); 4 targets repeatedly
occlude each other while appearing and disappearing from the field of view of the
camera. The sequence motinas_multi_face_frontal shows frontal faces only; in
motinas_multi_face_turning the faces are frontal and rotated; in motinas_multi_face_fast
the targets move faster than in the previous two sequences. Total number of images:
2769, DivX 6 compression, 640 x 480 pixels, 25 Hz.
• Description of Ground Truth/Metadata: No
• Contextual info: No
• Results from metrics and ground truth: No
• Comments: No
• Information on Copyright: Requested citation acknowledgment: E. Maggio, E. Piccardo,
C. Regazzoni, A. Cavallaro, "Particle PHD filter for multi-target visual tracking", in Proc.
of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP
2007), Honolulu (USA), April 15-20, 2007.
• Contact person from Cantata: Xavier Desurmont, desurmont@multitel.be
JL-19
Surveillance
OVVV
• Website: http://development.objectvideo.com/
• Description of Dataset: The ObjectVideo Virtual Video (OVVV) tool provides the ability to
generate virtual video sequences. These sequences can then be used to test VCA algorithms.
• Description of Ground Truth/Metadata: The ground truth is generated automatically in a
proprietary binary format. The format is open, and a conversion program
can be created to convert metadata to any format.
• Contextual info: Virtual environment, the user can make his own environment from the
internet. Several camera settings can be changed to simulate real-world cameras.
• Results from metrics and ground truth: not applicable for OVVV.
• Comments: This is not a dataset as such, but with these tools very powerful and
tailored test videos can be created.
• Information on Copyright: The ObjectVideo Virtual Video Tool is provided free for
non-commercial use, for your own research and development purposes. If you publish or
distribute images, videos or derivative results based on this software, you must
acknowledge ObjectVideo by including "ObjectVideo Virtual Video Tool".
To use the ObjectVideo Virtual Video tool, a licence for the commercial game Half-Life 2
is needed (www.steampowered.com).
• Contact person from Cantata: Rick Koeleman, VDG-Security bv, rick@vdg-security.com
JL-20
Surveillance
CANDELA
• Website: http://www.multitel.be/~va/candela/
• Description of Dataset: "Indoor abandoned object" and "road intersection"
• Description of Ground Truth/Metadata: no
• Contextual info: no
• Results from metrics and ground truth: Criteria for verification:
  - Is the alarm generated (yes/no)?
  - How correct is the timing of the alarm (start delay, overall time overlap)?
  - Position correctness
• Information on Copyright: public domain
• Contact person from Cantata: Xavier Desurmont, desurmont@multitel.be
  ▪ Scenario 1: The detection of abandoned objects
  ▪ Scenario 2: Street at zebra crossings.
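The timing criteria listed for CANDELA (start delay and overall time overlap of an alarm) can be computed from two time intervals. The following is a minimal sketch under stated assumptions: alarms are represented as (start, end) pairs in seconds, and "overall time overlap" is interpreted as the intersection of the two intervals divided by their union. The slide does not define the overlap measure, so this interpretation and all names are hypothetical.

```python
from typing import Tuple

Interval = Tuple[float, float]  # (start, end) in seconds, with start <= end


def start_delay(ground_truth: Interval, detected: Interval) -> float:
    """Seconds between the true alarm start and the detected alarm start.

    Positive values mean the system raised the alarm late.
    """
    return detected[0] - ground_truth[0]


def time_overlap(ground_truth: Interval, detected: Interval) -> float:
    """Intersection duration divided by union duration (assumed definition)."""
    intersection = max(
        0.0,
        min(ground_truth[1], detected[1]) - max(ground_truth[0], detected[0]),
    )
    union = (
        (ground_truth[1] - ground_truth[0])
        + (detected[1] - detected[0])
        - intersection
    )
    return intersection / union if union > 0 else 0.0


# Made-up alarm intervals, for illustration only.
gt_alarm = (10.0, 30.0)
detected_alarm = (14.0, 32.0)
print(start_delay(gt_alarm, detected_alarm), time_overlap(gt_alarm, detected_alarm))
```

The first criterion on the slide, whether the alarm is generated at all, corresponds to checking that a detected interval exists for each ground-truth alarm before these two measures are computed.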
JL-21
Surveillance
Traffic datasets (Institut für Algorithmen
und Kognitive Systeme)
• Website: http://i21www.ira.uka.de/image_sequences/
• Description of Dataset: Traffic databases
• Description of Ground Truth/Metadata: No
• Contextual info: Different conditions (snow, fog, etc.)
• Information on Copyright: no licence, free of charge
• Contact person from Cantata: Sabri Boughorbel
(sabri.boughorbel@philips.com)
JL-22
Surveillance
VISOR
• Website: http://imagelab.ing.unimore.it/visor/
• Description of Dataset: 4 types of video clips. These sequences constitute a
representative panel of different video surveillance areas.
They merge indoor and outdoor scenes, such as Indoor Domotic Unimore D.I.I.
setup.
• Description of Ground Truth/Metadata: Object Detection and Tracking.
• Results from metrics and ground truth: (Viper-GT) bounding box
• Comments: mostly simple videos
• Information on Copyright: Free download
• Contact person: vezzani.roberto@unimore.it
JL-23
Surveillance
BEHAVE
• Website: http://groups.inf.ed.ac.uk/vision/BEHAVEDATA/
• Description of Dataset: crowd, people acting out various interactions.
• Description of Ground Truth/Metadata: Object Detection and Tracking.
• Contextual info: calibration info
• Results from metrics and ground truth: (Viper-GT) bounding box, object class
• Comments: some complex videos
• Information on Copyright: Free download
• Contact person: Bob Fisher, rbf@inf.ed.ac.uk
JL-24
Surveillance
BEHAVE 2
• Website: http://groups.inf.ed.ac.uk/vision/BEHAVEDATA/INTERACTIONS/
• Description of Dataset: The dataset comprises two views of various
scenarios of people acting out various interactions. Ten basic scenarios were
acted out: InGroup, Approach, WalkTogether, Split, Ignore, Following, Chase,
Fight, RunTogether, and Meet. The data is captured at 25 frames per second.
The resolution is 640x480. The videos are available either as AVIs or as a
numbered set of JPEG single image files.
• Description of Ground Truth/Metadata: Tracking, Event detection.
• Contextual info: 3D coordinates of points for calibration purposes provided.
• Results from metrics and ground truth: Bounding boxes (VIPER XML format).
Event labels for persons and frame span
• Comments: The site will be updated when more of the ground truth becomes
available.
• Information on Copyright: Free download from website.
• Contact person from Cantata: Dimitrios Makris, d.makris@kingston.ac.uk
JL-25
Consumer applications
VS-PETS'2003 - INMOVE
• Website: http://www.cvg.cs.rdg.ac.uk/VSPETS/vspets-db.html
• Description of Dataset: Outdoor people tracking - football data (three
synchronised views). The dataset consists of football players moving around
a pitch.
• Description of Ground Truth/Metadata: Tracking information on image plane for
camera 3 can be found at:
http://www.cvg.cs.rdg.ac.uk/VSPETS/Camera3Xml.zip. An AVI file of the
ground truth for camera view 3 is also available at
http://www.cvg.cs.rdg.ac.uk/VSPETS/Cam3_Gt.avi
• Results from metrics and ground truth: The location of each player on the
pitch, for each frame of the sequence. For each player, the bounding box (with
origin bottom left) in pixels should be determined. The position of the player is
defined as the middle bottom of the bounding box (in pixels).
• Information on Copyright: Free download from website
• Contact person from Cantata: Dimitrios Makris, d.makris@kingston.ac.uk
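The rule above defines a player's position as the middle bottom of the bounding box. A minimal sketch of that computation follows. It assumes that the box is stored as its bottom-left corner plus width and height, in a pixel coordinate system whose origin is at the bottom left with y increasing upwards; the slide does not spell out the storage layout, and the type and function names are hypothetical.

```python
from typing import NamedTuple, Tuple


class BoundingBox(NamedTuple):
    x_min: float   # left edge, in pixels
    y_min: float   # bottom edge, in pixels (y grows upwards from the origin)
    width: float   # horizontal extent, in pixels
    height: float  # vertical extent, in pixels


def player_position(box: BoundingBox) -> Tuple[float, float]:
    """Return the middle of the bottom edge of the bounding box."""
    return (box.x_min + box.width / 2.0, box.y_min)


# Made-up bounding box, for illustration only.
box = BoundingBox(x_min=100, y_min=40, width=20, height=60)
print(player_position(box))
```

With a top-left image origin, which many image libraries use, the bottom edge would instead be at `y_min + height`, so the convention has to be checked against the ground truth files before comparing positions.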
JL-26
Consumer Applications
TRICTRAC
• Website: http://www.multitel.be/trictrac/
• Description of Dataset: Multicamera HD progressive images in JPEG for a synthetic
video sequence of soccer.
• Description of Ground Truth/Metadata: XML (2D and 3D positions of objects and
camera)
• Contextual info: No
• Results from metrics and ground truth: no
• Comments: the dataset is fully described in "TRICTRAC Video Dataset:
Public HDTV Synthetic Soccer Video Sequences With Ground Truth", X.
Desurmont, J-B. Hayet, J-F. Delaigle, J. Piater, B. Macq, Workshop on
Computer Vision Based Analysis in Sport Environments (CVBASE), 2006.
• Information on Copyright: Access / licence: All data is publicly available and
downloadable. If you publish results using the data, please acknowledge the
data as coming from the TRICTRAC project, found at URL:
http://www.multitel.be/trictrac. THE DATASET IS PROVIDED WITHOUT
WARRANTY OF ANY KIND.
• Contact person from Cantata: Xavier Desurmont, desurmont@multitel.be
JL-27
Medical Dataset
Example of one dataset
JL-28
Example with 2 signals:
a mass and a micro calcification
JL-29
Conclusion
WEB SITE
• Many application domains (d.makris@kingston.ac.uk)
  ▪ 25 datasets for Surveillance
  ▪ 6 datasets for Consumer applications
  ▪ 3 datasets for Medical
http://www.tudor.lu/cantata
http://www.tudor.lu/QuickPlace/cantata/PageLibraryC125725E002AB722.nsf/h_AABC75AA0B05E5DFC125725E002B5E46/ED93066DB0E340C7C12573A20056D789/?OpenDocument
User Name : Francois.Bremond@sophia.inria.fr
Password :