CANTATA: Content Aware Networked systems Towards Advanced and Tailored Assistance
BMVA 2007, December 12, 2007
Francois Bremond – INRIA Sophia
BMVA CANTATA – INRIA, December 12, 2007 page 1

CANTATA Introduction
Problem statement
• 2-year ITEA project, ending in December 2008
• Large amounts of data for transfer and interpretation
• 3 MCA challenges:
  - Surveillance
  - Consumer applications
  - Medical
Solution
• High-level descriptions by means of content analysis
• Retrieval by intelligent indexing

CANTATA Introduction
Long-term vision
• Develop systems:
  - that are aware of the content and understand it
  - that apply this knowledge to establish an action or autonomously control the environment
  - that act as a “virtual specialist”, applying this knowledge to assist the security officer in decision-making
Challenges
• Video content models for robust analysis and reasoning
• Self-learning and context awareness for faithful system performance
• Performance quantification of MCA for objective evaluation (standard datasets, well-defined metrics)

CANTATA: the scope
[Figure: The CANTATA platform. Three frameworks (services-based network architecture, streaming architecture, component-based architecture) connecting media sources (video, audio), analysis modules (VCA, ACA, MDA), content databases with metadata, and user adaptation for terminals from desktop, portable and tablet PCs to PDAs, set-top boxes and standard/HD TVs, over IP, WiFi and GPRS/UMTS networks. An evaluation block combines ground truth, evaluation metrics and performance reporting (Tp, Fp).]

WP4 Validation and classification
Objective
• This work package aims at defining an overall objective validation framework that covers the various aspects of MCA systems.
[Figure: Validation chain]

WP4 Validation and classification
Organisation
• Work package leader: Barco
  - Task 4.1 State of the art: Inria
  - Task 4.2 Requirements: VDG Security
  - Task 4.3 Creation of datasets: Kingston University
  - Task 4.4 Annotation tool: Traficon
  - Task 4.5 Ground truth: Multitel
  - Task 4.6 Validation metrics: Philips Medical
  - Task 4.7 Publication of validation: Codasystem
• Other partners: Acic, UPF, IBBT, Philips Research

This work gives an overview of projects in performance evaluation and of the proposed datasets.

Performance evaluation
Creation of a web page listing existing video datasets
• Topics: surveillance, consumer applications, medical
• Content:
  - Website: web page link (if any)
  - Description of Dataset: content, size, etc.
  - Description of Ground Truth/Metadata: (if any)
  - Contextual info: environment conditions (calibration, scene, ...)
  - Results from metrics and ground truth
  - Comments
  - Information on Copyright: licence, cost, etc.
  - Contact person from Cantata: contact person for more information

Surveillance: ETISEO
• Website: http://www-sop.inria.fr/orion/ETISEO/
• Description of Dataset: 86 video clips. These sequences constitute a representative panel of different video surveillance areas. They combine indoor and outdoor scenes (corridors, streets, building entries, subway stations, ...) and mix different types of sensors and complexity levels.
• Description of Ground Truth/Metadata: 5 different levels: object detection, object localization, object tracking, object classification, event recognition.
• Contextual info: zone of interest, calibration matrix
• Results from metrics and ground truth: bounding box, object class, events
• Comments:
• Information on Copyright: free download, but registration and a user agreement are required.
• Contact person from Cantata: francois.bremond@sophia.inria.fr

Surveillance: PETS 2001
• Website: http://www.cvg.cs.rdg.ac.uk/PETS2001/pets2001-dataset.html
• Description of Dataset: outdoor people and vehicle tracking (two synchronised views).
• Description of Ground Truth/Metadata: tracking information on the image plane and the ground plane, available at http://www.cvg.cs.rdg.ac.uk/PETS2001/ANNOTATION/
• Contextual info: camera calibration provided
• Results from metrics and ground truth: centroid and bounding-box coordinates on the image plane, object class (person, vehicle, other), position on the ground plane and object orientation.
• Information on Copyright: free download from website
• Contact person from Cantata: Dimitrios Makris, d.makris@kingston.ac.uk

Surveillance: PETS 2002 - VISOR BASE: Moving People
• Website: http://www.cvg.cs.rdg.ac.uk/PETS2002/pets2002-db.html
• Description of Dataset: indoor people tracking (and counting). Two training and four testing sequences show people moving in front of a shop window. Sequences are provided both in MPEG movie format and as individual JPEG images.
• Description of Ground Truth/Metadata: people tracking, counting and activity recognition.
• Contextual info: no calibration
• Results from metrics and ground truth: how many people pass in front of the shop window; how many people stop and look into the window; how many people are looking into the window at each instant (frame); the trajectories of people passing in front of the store; the processing time per frame, as a histogram of the microseconds spent processing each frame.
• Information on Copyright: free download from website
• Contact person from Cantata: Dimitrios Makris, d.makris@kingston.ac.uk

Surveillance: PETS-ICVS'2003 - FGnet
• Website: http://www.cvg.cs.rdg.ac.uk/PETS-ICVS/pets-icvs-db.html
• Description of Dataset: smart meeting, including facial expressions, gaze and gestures/actions. The environment contains three cameras: one mounted on each of two opposing walls, and an omnidirectional camera at the centre of the room. The dataset consists of four scenarios.
• Description of Ground Truth/Metadata:
  a) eye positions of people in Scenarios A, B and D (every 10th frame is annotated);
  b) facial expression and gaze estimation for Scenarios A and D, Cameras 1-2;
  c) gesture/action annotations for Scenarios B and D, Cameras 1-2.
• Contextual info: camera calibration provided
• Results from metrics and ground truth: for each frame, the task is to perform face localisation (centre location of the eyes), recognition of facial expression, recognition of face/hand gestures, estimation of face/head direction (gaze) and recognition of actions.
• Information on Copyright: free download
• Contact person from Cantata: Dimitrios Makris, d.makris@kingston.ac.uk

Surveillance: PETS-ECCV'2004 - CAVIAR
• Website: http://groups.inf.ed.ac.uk/vision/CAVIAR/CAVIARDATA1/ or http://www-prima.inrialpes.fr/PETS04/caviar_data.html
• Description of Dataset: people walking alone, meeting with others, window shopping, fighting, passing out, and leaving a package in a public place. All video clips were filmed with a wide-angle camera lens. The resolution is half-resolution PAL (384 x 288 pixels, 25 frames per second), compressed with MPEG-2. File sizes are about 10 MB.
• Description of Ground Truth/Metadata: person/group tracking, person/group activity recognition, scenario/situation recognition
• Contextual info: 3D coordinates of points for calibration purposes provided.
• Results from metrics and ground truth: for each frame and object/group, a bounding box and a behaviour label; also, for each frame, situation/scenario labels for the whole image.
• Information on Copyright: free download from website. If you publish results using the data, please acknowledge the data as coming from the EC-funded CAVIAR project/IST 2001 37540, found at URL: http://www.dai.ed.ac.uk/homes/rbf/CAVIAR/
• Contact person from Cantata: Dimitrios Makris, d.makris@kingston.ac.uk

Surveillance: PETS'2006 - ISCAPS
• Website: http://pets2006.net/
• Description of Dataset: surveillance of public spaces, detection of left-luggage events. Scenarios of increasing complexity, captured using multiple sensors.
• Description of Ground Truth/Metadata: XML files containing the calibration parameters (in the sub-directory 'calibration') and configuration and ground-truth information.
• Contextual info: calibration provided.
• Results from metrics and ground truth: radii distances, luggage location, warning/alarm triggers, etc.
• Information on Copyright: free download from website. The UK Information Commissioner has agreed that the PETS 2006 datasets described here may be made publicly available for the purposes of academic research. The video sequences are copyright of the ISCAPS consortium, and permission is hereby granted for free download for the purposes of the PETS 2006 workshop.
• Contact person from Cantata: Dimitrios Makris, d.makris@kingston.ac.uk

Surveillance: PETS'2007 - REASON
• Website: http://pets2007.net/
• Description of Dataset: multisensor sequences containing the following 3 scenarios, with increasing scene complexity: 1. loitering; 2. attended luggage removal (theft); 3. unattended luggage.
• Description of Ground Truth/Metadata: event detection
• Contextual info: calibration provided
• Results from metrics and ground truth: event details (type, location, time)
• Information on Copyright: free download from website. The UK Information Commissioner has agreed that the PETS 2007 datasets described here may be made publicly available for the purposes of academic research. The video sequences are copyright of the UK EPSRC REASON Project consortium, and permission is hereby granted for free download for the purposes of the PETS 2007 workshop.
• Contact person from Cantata: Dimitrios Makris, d.makris@kingston.ac.uk

Surveillance: Level Crossing
• Website: http://www.multitel.be/~va/selcat/
• Description of Dataset: 24 hours of real sequences showing a level crossing (LC) where some vehicles stop because of its particular configuration: on the right side of the LC, an avenue runs parallel to it, so a traffic light is located just after the LC. Consequently, vehicles sometimes stop on the LC because of this traffic light. The total amount of data is about 7 GB.
• Description of Ground Truth/Metadata: each video file has a corresponding XML ground-truth file giving the timestamps of "stopped vehicle" events.
• Contextual info: environment conditions (calibration, scene, ...)
• Contact person from Cantata: Caroline Machy, machy@multitel.be

Surveillance: SPEVI: Single face dataset
• Website: www.spevi.org
• Description of Dataset: a dataset for single person/face visual detection and tracking, composed of five sequences with different illumination conditions and resolutions.
• Description of Ground Truth/Metadata: the ground-truth data is available in the .zip files for the sequences motinas_toni and motinas_emilio_webcam.
  In the ground-truth files, each line of text describes the objects' positions and sizes in one frame. The syntax of a line is:
  frame number_of_objects obj_1_name x y half_width half_height angle obj_2_name x y half_width half_height angle ...
• Information on Copyright: requested citation acknowledgment: E. Maggio, A. Cavallaro, "Hybrid particle filter and mean shift tracker with adaptive transition model", in Proc. of IEEE Int. Conference on Acoustics, Speech and Signal Processing (ICASSP 2005), Philadelphia, 19-23 March 2005, pp. 221-224.
• Contact person from Cantata: Xavier Desurmont, desurmont@multitel.be

Surveillance: SPEVI: Multiple faces dataset
• Website: www.spevi.org
• Description of Dataset: a dataset for multiple people/faces visual detection and tracking, composed of 3 sequences (same scenario); 4 targets repeatedly occlude each other while appearing in and disappearing from the camera's field of view. The sequence motinas_multi_face_frontal shows frontal faces only; in motinas_multi_face_turning the faces are both frontal and rotated; in motinas_multi_face_fast the targets move faster than in the previous two sequences. Total number of images: 2769; DivX 6 compression; 640 x 480 pixels; 25 Hz.
• Description of Ground Truth/Metadata: no
• Contextual info: no
• Results from metrics and ground truth: no
• Comments: no
• Information on Copyright: requested citation acknowledgment: E. Maggio, E. Piccardo, C. Regazzoni, A. Cavallaro, "Particle PHD filter for multi-target visual tracking", in Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2007), Honolulu (USA), April 15-20, 2007.
• Contact person from Cantata: Xavier Desurmont, desurmont\a\multitel.be

Surveillance: OVVV
• Website: http://development.objectvideo.com/
• Description of Dataset: the ObjectVideo Virtual Video (OVVV) tool can generate virtual video sequences, which can then be used to test VCA algorithms.
• Description of Ground Truth/Metadata: ground truth is generated automatically in a proprietary binary format. The format is open, so a conversion program can be written to convert the metadata to any format.
• Contextual info: virtual environment; users can create their own environments or obtain them from the Internet. Several camera settings can be changed to simulate real-world cameras.
• Results from metrics and ground truth: not applicable for OVVV.
• Comments: this is not a dataset as such, but with these tools very powerful, tailored test videos can be created.
• Information on Copyright: the ObjectVideo Virtual Video Tool is provided free for noncommercial use, for your own research and development purposes. If you publish or distribute images, videos or derivative results based on this software, you must acknowledge ObjectVideo by including "ObjectVideo Virtual Video Tool". Using the tool requires a licence for the commercial game Half-Life 2 (www.steampowered.com).
• Contact person from Cantata: Rick Koeleman, VDG-Security bv, rick@vdg-security.com

Surveillance: CANDELA
• Website: http://www.multitel.be/~va/candela/
• Description of Dataset: "indoor abandoned object" and "road intersection"
• Description of Ground Truth/Metadata: no
• Contextual info: no
• Results from metrics and ground truth: criteria for verification:
  - Is the alarm generated (yes/no)?
  - How correct is the timing of the alarm (start delay, overall time overlap)?
  - Position correctness
• Information on Copyright: public domain
• Contact person from Cantata: Xavier Desurmont, desurmont\a\multitel.be
[Figure: Scenario 1: detection of abandoned objects. Scenario 2: street at zebra crossings.]

Surveillance: Traffic datasets (Institut für Algorithmen und Kognitive Systeme)
• Website: http://i21www.ira.uka.de/image_sequences/
• Description of Dataset: traffic databases
• Description of Ground Truth/Metadata: no
• Contextual info: different conditions (snow, fog, etc.)
• Information on Copyright: no licence required; free of charge
• Contact person from Cantata: Sabri Boughorbel (sabri.boughorbel@philips.com)

Surveillance: VISOR
• Website: http://imagelab.ing.unimore.it/visor/
• Description of Dataset: 4 types of video clips. These sequences constitute a representative panel of different video surveillance areas, combining indoor and outdoor scenes, such as the indoor domotic Unimore D.I.I. setup.
• Description of Ground Truth/Metadata: object detection and tracking
• Results from metrics and ground truth: bounding box (ViPER-GT)
• Comments: mostly simple videos
• Information on Copyright: free download
• Contact person: vezzani.roberto@unimore.it

Surveillance: BEHAVE
• Website: http://groups.inf.ed.ac.uk/vision/BEHAVEDATA/
• Description of Dataset: crowds; people acting out various interactions.
• Description of Ground Truth/Metadata: object detection and tracking.
• Contextual info: calibration info
• Results from metrics and ground truth: bounding box, object class (ViPER-GT)
• Comments: some complex videos
• Information on Copyright: free download
• Contact person: Bob Fisher, rbf@inf.ed.ac.uk

Surveillance: BEHAVE 2
• Website: http://groups.inf.ed.ac.uk/vision/BEHAVEDATA/INTERACTIONS/
• Description of Dataset: two views of various scenarios of people acting out interactions. Ten basic scenarios were acted out: InGroup, Approach, WalkTogether, Split, Ignore, Following, Chase, Fight, RunTogether and Meet. The data is captured at 25 frames per second with a resolution of 640x480. The videos are available either as AVIs or as numbered sets of single JPEG image files.
• Description of Ground Truth/Metadata: tracking, event detection
• Contextual info: 3D coordinates of points for calibration purposes provided.
• Results from metrics and ground truth: bounding boxes (ViPER XML format); event labels for persons and frame spans.
• Comments: the site will be updated when more of the ground truth becomes available.
• Information on Copyright: free download from website.
• Contact person from Cantata: Dimitrios Makris, d.makris@kingston.ac.uk

Consumer applications: VS-PETS'2003 - INMOVE
• Website: http://www.cvg.cs.rdg.ac.uk/VSPETS/vspets-db.html
• Description of Dataset: outdoor people tracking on football data (three synchronised views). The dataset consists of football players moving around a pitch.
• Description of Ground Truth/Metadata: tracking information on the image plane for camera 3 is available at http://www.cvg.cs.rdg.ac.uk/VSPETS/Camera3Xml.zip. An AVI file of the ground truth for camera view 3 is also available at http://www.cvg.cs.rdg.ac.uk/VSPETS/Cam3_Gt.avi
• Results from metrics and ground truth: the location of each player on the pitch, for each frame of the sequence.
  For each player, the bounding box (with origin at the bottom left) should be determined in pixels. The position of the player is defined as the middle of the bottom edge of the bounding box (in pixels).
• Information on Copyright: free download from website
• Contact person from Cantata: Dimitrios Makris, d.makris@kingston.ac.uk

Consumer applications: TRICTRAC
• Website: http://www.multitel.be/trictrac/
• Description of Dataset: multicamera HD progressive JPEG images forming synthetic soccer video sequences.
• Description of Ground Truth/Metadata: XML (2D and 3D positions of objects and camera)
• Contextual info: no
• Results from metrics and ground truth: no
• Comments: the dataset is fully described in "TRICTRAC Video Dataset: Public HDTV Synthetic Soccer Video Sequences With Ground Truth", X. Desurmont, J-B. Hayet, J-F. Delaigle, J. Piater, B. Macq, Workshop on Computer Vision Based Analysis in Sport Environments (CVBASE), 2006.
• Information on Copyright: access/licence: all data is publicly available and downloadable. If you publish results using the data, please acknowledge the data as coming from the TRICTRAC project, found at URL: http://www.multitel.be/trictrac. THE DATASET IS PROVIDED WITHOUT WARRANTY OF ANY KIND.
• Contact person from Cantata: Xavier Desurmont, desurmont\a\multitel.be

Medical Dataset
[Figure: example of one dataset]
[Figure: example with 2 signals: a mass and a microcalcification]

Conclusion
Web site
• Many application domains (d.makris@kingston.ac.uk):
  - 25 datasets for surveillance
  - 6 datasets for consumer applications
  - 3 datasets for medical
• http://www.tudor.lu/cantata
• http://www.tudor.lu/QuickPlace/cantata/PageLibraryC125725E002AB722.nsf/h_AABC75AA0B05E5DFC125725E002B5E46/ED93066DB0E340C7C12573A20056D789/?OpenDocument
  User name: Francois.Bremond@sophia.inria.fr
  Password:
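The SPEVI single-face ground-truth files listed above use one line per frame: the frame number and the number of objects, followed by six fields per object (obj_name x y half_width half_height angle). The Python sketch below parses that line syntax into per-frame records. It assumes whitespace-separated tokens and numeric fields that parse as floats; the names FaceBox, parse_gt_line and parse_gt_text are hypothetical, reading x y as the object centre is an assumption (the dataset description does not say), and the unit of angle is left unconverted because it is not stated.

```python
from __future__ import annotations

from dataclasses import dataclass

# name, x, y, half_width, half_height, angle
FIELDS_PER_OBJECT = 6


@dataclass
class FaceBox:
    """One annotated object from a SPEVI-style ground-truth line (hypothetical type)."""
    name: str
    x: float            # assumed: centre x coordinate in pixels
    y: float            # assumed: centre y coordinate in pixels
    half_width: float
    half_height: float
    angle: float        # unit not stated in the dataset description; kept as-is


def parse_gt_line(line: str) -> tuple[int, list[FaceBox]]:
    """Parse 'frame number_of_objects obj_1_name x y half_width half_height angle ...'."""
    tokens = line.split()
    if len(tokens) < 2:
        raise ValueError(f"line too short: {line!r}")
    frame = int(tokens[0])
    count = int(tokens[1])
    fields = tokens[2:]
    # Each object contributes exactly six tokens; reject malformed lines early.
    if count < 0 or len(fields) != FIELDS_PER_OBJECT * count:
        raise ValueError(
            f"frame {frame}: expected {count} objects, got {len(fields)} fields"
        )
    objects = []
    for i in range(count):
        start = i * FIELDS_PER_OBJECT
        name, x, y, hw, hh, angle = fields[start:start + FIELDS_PER_OBJECT]
        objects.append(FaceBox(name, float(x), float(y), float(hw), float(hh), float(angle)))
    return frame, objects


def parse_gt_text(text: str) -> dict[int, list[FaceBox]]:
    """Parse the contents of a whole ground-truth file, skipping blank lines."""
    frames: dict[int, list[FaceBox]] = {}
    for line in text.splitlines():
        if line.strip():
            frame, objects = parse_gt_line(line)
            frames[frame] = objects
    return frames
```

For example, parse_gt_line("12 1 face 160.5 120.0 20 25 0") returns (12, [FaceBox(name='face', x=160.5, y=120.0, half_width=20.0, half_height=25.0, angle=0.0)]). Checking the token count on every line catches truncated or misaligned annotations before the ground truth is compared frame by frame against tracker output.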