DEBS’02

Real-time Processing of Media Streams: A Case for Event-based Interaction

Viktor S. Wold Eide, Frank Eliassen, Olav Lysne, and Ole-Christoffer Granmo

Department of Informatics, Oslo, Norway
{viktore,olegr}@ifi.uio.no

Simula Research Laboratory, Lysaker, Norway
{viktore,frank,olegr,olavly}@simula.no

http://www.ifi.uio.no/~dmj/

International Workshop on Distributed Event-based Systems (DEBS’02)
July 2nd and 3rd, 2002, Vienna, Austria

Overall Project Goal

Address and devise solutions for an extensible framework for real-time content analysis of media streams transported over a network.

- The purpose of the content analysis is to index and annotate media streams.
- The purpose of the framework is to simplify development for this application domain, by handling general issues.

Typical Environment for the Application Domain

[Figure: diagram with the labels Sensors, Network, and Processing. Legend: Streaming, Control.]

Case: Real-time Tracking of Object in Video

[Figure: diagram with the labels Sensors, Network, and Processing. Legend: Streaming, Control.]

Challenges

- Massive amounts of data, even for a single compressed media stream, e.g. video at 1 Mbps to 10 Mbps
- Concurrent analysis of a number of different media streams increases resource requirements even further
- Some application sub domains have an inherent distributed nature, e.g. office/traffic surveillance
- Today's “best effort” environments (OS, network) cannot give any guarantees with respect to the availability of resources
- Real time requirements limit the time available for processing each media sample, e.g. 25 frames/second video leaves 40 ms/frame

Challenges: Feature Extraction

- Calculation of quantitative features, such as motion vectors, color histograms, and texture coarseness, may in general be arbitrarily complex and computationally demanding.
- Object tracking case: Motion vector estimation, block based.

Challenges: Classification

- Interpretation of extracted features is, in general, a very hard problem.
- Feasibility and reliability necessitate restricting interpretation to an application specific context.
- Object tracking case: Determine when an object enters the camera view, and then track the center position of the moving object.

Framework Goals

- Support distributed processing
- Support parallel processing of compute intensive algorithms
- Support concurrent processing of a number of media streams
- Support adaptation, reconfiguration, and migration
- Simplify integration of new sub-technologies, e.g. a new video analysis algorithm should be pluggable into the framework when available
- Develop domain specific trainable classifiers to bridge the gap between low-level features (e.g. motion vectors) and high-level interpretation/annotations (e.g. moving object)

Content Analysis Hierarchy: Example

[Figure: hierarchy with the layers Classification (C), feature Extraction (E), Filtering (F), and Streaming (S). Legend: Extracted Features, Filtered media stream, Media stream.]

Content Analysis Hierarchy: Tracking of Object

[Figure: hierarchy with the layers Classification (OT), feature Extraction (ME), Filtering (CF), and Streaming (VS), drawn over a grid numbered 1 to 4 on each axis. Output: Tracked Position=(3,3).]

Legend: OT: Object Tracking; ME: Motion Estimation; CF: Color Filtering; VS: Video Streaming. Arrows: Event Notification, Filtered media stream, Media stream.

Content Analysis Hierarchy: Tracking of Object

Parallel processing at different levels

[Figure: hierarchy with the layers Classification (CO, with two PF components), feature Extraction (two ME components), Filtering (two CF components), and Streaming (VS). Output: Tracked Position=(3,3).]

Legend: CO: Coordination; PF: Particle Filtering; ME: Motion Estimation; CF: Color Filtering; VS: Video Streaming. Arrows: Event Notification, Filtered media stream, Video Stream.

Interaction Model Requirements

The interaction model should:

- allow one-one, one-many, many-one, and many-many communication
- provide a level of indirection to simplify parallelization
- provide a level of indirection to simplify adaptation, reconfiguration, and migration

A Case for Event-based Interaction

These requirements fit the publish/subscribe interaction paradigm very well, leading to an event based model.

The event notification service should:

- take advantage of network level multicast to improve scalability
- balance requirements for real time communication and low event propagation delay against ordering and reliability guarantees associated with event delivery
- be realized as a distributed service to improve scalability and reliability
- support different degrees of distribution: intra process, intra host, and inter host (both LAN and WAN)

We handle ordering and synchronization at our framework level, by assuming global time in hosts (Network Time Protocol, IETF).

Event-based Interaction: Object Tracking

[Figure: the components VS, CF, ME, PF, and CO interact through the event notification service (ENS); the interactions are numbered 1 to 5.]

Legend: CO: Coordination; PF: Particle Filter; ME: Motion Estimation; CF: Color Filtering; VS: Video Streaming. Arrows: Event Notification, Filtered media stream, Video Stream.

Deployment Example: Object Tracking

[Figure: diagram with the labels Sensors, Network, and Processing, showing the components CO, VS, and two each of PF, ME, and CF. Legend: Streaming, Control.]

Prototype, Event Notification Service

Based on Mbus, work in progress in the IETF. Mbus has the following characteristics:

- Easy to integrate, few lines of code, text based protocol
- A thin layer on top of UDP and IP multicast
- Filtering in the Mbus layer of all components (“multicast and filter”)
- Filtering based on address, a sequence of (name, value) tuples
- Provides a layer of indirection
- Inherits many characteristics from IP multicast (scalability, delay, reliability, and ordering)
- Implemented as a distributed service, in network and end nodes

Empirical Results: Scalability Test, Object Tracking

- Standard PCs connected by a 100 Mbps switched Ethernet LAN
- The protocol stack for media streaming was MJPEG/RTP/UDP/IP
- Video size of 352 x 288 pixels, block size of 16 x 16 pixels, 320 blocks

                        1 CPU   2 CPUs   4 CPUs   8 CPUs   10 CPUs
Streaming (f/s)         2.5     5        10       20       25
Object Tracking (f/s)   2.5     5        8.5      13.5     16
Efficiency              100%    100%     85%      67.5%    64%

The number of frames/second processed by different configurations of the object tracking application, compared to the streamed (ideal) frame rate. Efficiency is the object tracking rate divided by the streaming rate, e.g. 8.5/10 = 85% with 4 CPUs.

Observation: When streaming at 25 f/s, depacketization and JPEG to RGB transformation consume roughly 30% of the processing power of a single CPU. The entire video frame is processed, not only the necessary blocks.

Conclusion

- A distributed approach is required to handle the real-time requirements, the massive amounts of data, and the computational complexity
- A distributed solution is more appropriate for problem domains having an inherent distributed nature
- An event notification service is the glue that binds components together and provides a level of indirection
- The event notification service simplifies parallelization, adaptation, reconfiguration, and migration
- The scalability of a real-time motion vector based object tracking application, implemented in the framework, has been demonstrated experimentally.
- Mbus performs well in our experiments, both as an intra host and an inter host event notification service on a LAN.

Further Work

- Will an event notification service capable of video streaming improve scalability?
- Object tracking: Each motion estimation component may then subscribe to only a region, some blocks, of each video frame.
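The region-based subscription idea can be sketched as follows. This is a minimal in-process illustration under stated assumptions, not the Mbus API: the `RegionBus` class, the `row_bands` helper, and the name/value keys `stream`, `row`, and `col` are hypothetical names introduced here. It assumes the 352 x 288 frame and 16 x 16 block size from the scalability test and uses the full 22 x 18 grid of 396 blocks (the slides report 320 blocks). Block rows are split among four motion estimation components, and each component receives only the blocks whose address matches its filter.

```python
from collections import defaultdict

BLOCK = 16
FRAME_W, FRAME_H = 352, 288
COLS, ROWS = FRAME_W // BLOCK, FRAME_H // BLOCK  # 22 x 18 block grid


def row_bands(rows, parts):
    """Split block rows 0..rows-1 into `parts` contiguous, near-equal bands."""
    base, extra = divmod(rows, parts)
    bands, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        bands.append(range(start, start + size))
        start += size
    return bands


class RegionBus:
    """Hypothetical stand-in for an event notification service."""

    def __init__(self):
        self._subs = []  # list of (filter, callback) pairs

    def subscribe(self, filt, callback):
        # filt maps an attribute name to the collection of accepted values
        self._subs.append((filt, callback))

    def publish(self, address, payload):
        # address is a dict of name/value pairs describing the event;
        # deliver only to subscribers whose filter accepts every named value
        for filt, callback in self._subs:
            if all(address.get(k) in ok for k, ok in filt.items()):
                callback(address, payload)


bus = RegionBus()
received = defaultdict(list)  # component id -> list of (row, col) received

# Four motion estimation components, each subscribing to one band of rows.
for me_id, band in enumerate(row_bands(ROWS, 4)):
    bus.subscribe(
        {"stream": {"cam0"}, "row": band},
        lambda addr, data, me_id=me_id: received[me_id].append(
            (addr["row"], addr["col"])
        ),
    )

# The streaming side publishes every block of one frame (payload omitted).
for row in range(ROWS):
    for col in range(COLS):
        bus.publish({"stream": "cam0", "row": row, "col": col}, b"")
```

With four components, the 18 block rows split into bands of 5, 5, 4, and 4 rows, so no component handles more than 110 of the 396 blocks. Whether filtering happens at the receiver or inside the service is exactly the design question the slide raises; this sketch does not model network cost.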
[Figure: the components VS, CF, ME, PF, and CO interact through the Event Notification Service; the interactions are numbered 1 to 5.]

Legend: CO: Coordination; PF: Particle Filter; ME: Motion Estimation; CF: Color Filtering; VS: Video Streaming. Arrows: Event Notification.

Further Work

- Add parallel block based color and texture feature extractors
- Map identity to objects during classification, based on color and texture
- Add a number of video streams and relate classified content, e.g. track objects across media streams and time, based on assigned identity
- Demand driven execution, e.g. process edge blocks of video for object detection, and then all blocks for tracking
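The demand driven item could be sketched as follows. All names here (`edge_blocks`, `blocks_to_process`, `run`) are hypothetical, and reading "edge blocks" as the outermost ring of the block grid is an assumption. The sketch uses the 352 x 288 frame and 16 x 16 block size from the scalability test, which gives a 22 x 18 grid of 396 blocks. The slides report 320 blocks, which happens to equal the grid without its outermost ring (20 x 16), though the slides do not say how that number was obtained.

```python
BLOCK = 16
COLS, ROWS = 352 // BLOCK, 288 // BLOCK  # 22 x 18 block grid


def all_blocks(cols=COLS, rows=ROWS):
    """Every (row, col) block position in the frame."""
    return [(r, c) for r in range(rows) for c in range(cols)]


def edge_blocks(cols=COLS, rows=ROWS):
    """Only the outermost ring of blocks (assumed meaning of 'edge blocks')."""
    return [
        (r, c)
        for (r, c) in all_blocks(cols, rows)
        if r in (0, rows - 1) or c in (0, cols - 1)
    ]


def blocks_to_process(tracking):
    """Cheap border pass while detecting, full frame while tracking."""
    return all_blocks() if tracking else edge_blocks()


def run(frames_with_motion):
    """Simulate the mode switch over a sequence of frames.

    frames_with_motion: one set of (row, col) blocks containing motion
    per frame. Returns the number of blocks examined for each frame.
    """
    tracking = False
    work = []
    for moving in frames_with_motion:
        blocks = blocks_to_process(tracking)
        work.append(len(blocks))
        # Stay in (or enter) tracking mode only if motion was seen
        # in the blocks that were actually examined.
        tracking = bool(moving.intersection(blocks))
    return work


# An object enters at the top border, moves inward, then disappears.
workload = run([set(), {(0, 5)}, {(3, 5)}, set(), set()])
```

In this run the per-frame workload is 76, 76, 396, 396, 76 blocks: the border pass costs 76 blocks until motion is seen on the border, and the full 396 blocks are examined only while an object is being tracked.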