IDMS-PROMS 2002

Scalable Independent Multi-level Distribution in Multimedia Content Analysis

Viktor S. Wold Eide, Frank Eliassen, Ole-Christoffer Granmo, and Olav Lysne

Department of Informatics, Oslo, Norway
{viktore,olegr}@ifi.uio.no

Simula Research Laboratory, Lysaker, Norway
{viktore,frank,olegr,olavly}@simula.no

http://www.ifi.uio.no/~dmj/

Joint International Workshop on Interactive Distributed Multimedia Systems / Protocols for Multimedia Systems, IDMS-PROMS 2002
November 26-29, 2002, Coimbra, Portugal

Authors are listed alphabetically


Outline

- Introduction
- Content analysis in general, and an object tracking application
- Scalability challenges
- Component interaction and communication
- Feature extraction
- Classification
- Empirical results: scalability test of the object tracking application
- Conclusion and further work


Introduction

- Our application domain is automatic real-time content analysis
- The purpose of the content analysis is to index and annotate media streams
- Some examples of applications in this domain are:
  - traffic surveillance,
  - indexing of TV broadcast news, and
  - object tracking, the application case in this paper
- Applications in this domain share many issues, which may be handled generally
- Our overall project goal is to: "Address and devise solutions for an extensible framework for real-time content analysis of media streams transported over a network"


Content Analysis Hierarchy

- In general, content analysis applications consist of several levels

[Figure: the content analysis hierarchy. Levels, top to bottom: Classification, Feature Extraction, Filtering, Streaming.]


Content Analysis Hierarchy: Object Tracking

- The functional decomposition of the object tracking application

[Figure: one component per level on a 4 x 4 block grid. Streaming: VS; Filtering: CF; Feature Extraction: ME; Classification: OT, which outputs "Tracked Position=(3,3)". Legend: OT: Object Tracking; ME: Motion Estimation; CF: Color Filtering; VS: Video Streaming. Arrow types: event notification, filtered media stream, media stream.]


Challenges

- The processing resource requirements for multimedia content analysis are very challenging, and will most likely remain so in the near future
- A scalable solution requires parallel and distributed processing on multiple CPUs
- In multimedia content analysis applications, parallelization and distribution are difficult tasks
- The relative computational complexity of streaming, filtering/transformation, feature extraction, and classification may vary
- A processing bottleneck at any level may render the application useless, unless the bottleneck can be resolved


Content Analysis Hierarchy: Object Tracking

- A configuration where only the classification level is parallelized

[Figure: Streaming: VS; Filtering: CF; Feature Extraction: ME; Classification: two PF components and one CO component, which outputs "Tracked Position=(3,3)". Legend: CO: Coordination; PF: Particle Filtering; ME: Motion Estimation; CF: Color Filtering; VS: Video Streaming. Arrow types: event notification, filtered media stream, video stream.]


Content Analysis Hierarchy: Object Tracking

- A configuration where several levels are parallelized

[Figure: Streaming: VS; Filtering: two CF components; Feature Extraction: two ME components, each covering part of the 4 x 4 block grid; Classification: two PF components and one CO component, which outputs "Tracked Position=(3,3)". Legend: CO: Coordination; PF: Particle Filtering; ME: Motion Estimation; CF: Color Filtering; VS: Video Streaming. Arrow types: event notification, filtered media stream, video stream.]


Event-based Interaction: Object Tracking

- PF components subscribe to events from the event notification service (ENS):

    src=vs1 func=me

- ME components publish motion vectors for blocks as event notifications:

    src=vs1 func=me time=[t, t] block=[1,1] vector=[ 0, 0]
    ...
    src=vs1 func=me time=[t, t] block=[3,2] vector=[-1, 0]
    ...
    src=vs1 func=me time=[t, t] block=[4,4] vector=[ 0, 0]

[Figure: VS, two CF, two ME, two PF, and one CO component interacting through the ENS, with the interactions numbered 1 to 5. Legend: CO: Coordination; PF: Particle Filter; ME: Motion Estimation; CF: Color Filtering; VS: Video Streaming. Arrow types: event notification, filtered media stream, video stream.]


Feature Extraction

- Even simple feature extraction algorithms are costly when applied to a real-time, high-quality video stream. Additionally, feature extraction algorithms may be arbitrarily complex
- A scalable solution requires parallelization
- Parallelization requires partitioning of the media data
- Our framework allows spatial partitioning by using a block-based approach. Each feature extraction component processes only some of the blocks


Classification

- The classification level may become a processing bottleneck due to:
  - the complexity of the content analysis task
  - the required classification rate
  - the required classification accuracy
- Accordingly, a scalable solution requires parallel and distributed classification

[Figure: a classifier with Texture, Motion Vector, and Color features as input. Labels: Image: n−2, Image: n−1, Image: n.]


Classification: The Particle Filter

- Our PF maintains histories of high-level concepts (e.g. object positions)
  - a history is an assignment of high-level concepts to past video frames
  - each history has an associated likelihood, given the extracted features
- Alternative histories are maintained to handle noise and uncertainty

[Figure labels: Image: n−2, Image: n−1, Image: n.]


Classification: A Parallel Particle Filter

- We propose a parallel PF for resolving classification processing bottlenecks
- Our parallel PF consists of multiple parallel PF components and a single lightweight coordinator component
- Each PF component maintains local histories of high-level concepts
- The PF components cooperate by exchanging event notifications to synchronize histories
- The coordinator makes globally consistent classifications based on the local histories of the PF components

[Figure labels: Image: n−2, Image: n−1, Image: n.]


Empirical Results: Scalability Test, Object Tracking

- The tests used standard PCs connected by a 100 Mbps switched Ethernet LAN
- The protocol stack for media streaming was MJPEG/RTP/UDP/IP multicast
- Video size of 352 x 288 pixels, block size of 16 x 16 pixels, 320 blocks
- The number of frames per second processed by different configurations of the object tracking application, compared to the ideal frame rate:

                                      1 CPU  2 CPUs  4 CPUs  8 CPUs  10 CPUs
    Ideal Frame Rate                    2.5     5      10      20       25
    Streaming                           2.5     5      10      20       25
    Filtering and Feature Extraction    2.5     5       8.5    13.5     16
    Classification                      2.5     5      10      20       25

- Observation: When streaming at 25 frames per second, depacketization and JPEG-to-RGB transformation consume roughly 30% of the processing power of a single CPU. The entire video frame is processed, not only the necessary blocks


Conclusion

- Event-based interaction simplifies parallelization
  - Provides a level of indirection, location transparency, etc.
- Each level of the content analysis task may be independently parallelized
  - Allows for focusing the processing resources on the processing bottlenecks
- The parallel particle filter is well suited for real-time classification
  - Allows distributed processing at the classification level
  - Is robust, i.e. able to suppress noise in extracted features
- The scalability of a real-time, motion-vector-based object tracking application, implemented in the framework, has been demonstrated experimentally


Further Work

- Assign identity to objects during classification, based on color and texture
- Add parallel block-based color and texture feature extractors
- Add a number of video streams and relate classified content, e.g. track objects across media streams and time, based on assigned identity
- Add demand-driven feature extraction: the features are ranked online according to their ability to contribute to the current stage of the content analysis task
  - E.g. the edge blocks are processed for object detection, and the blocks surrounding objects are processed for tracking purposes
- We are currently working on an event notification service for high data rates
- Object tracking case: Each motion estimation component may subscribe to only some blocks of each video frame

[Figure: VS, two CF, two ME, two PF, and one CO component connected through the ENS, with the interactions numbered 1 to 5. Legend: CO: Coordination; PF: Particle Filter; ME: Motion Estimation; CF: Color Filtering; VS: Video Streaming. Arrow type: event notification.]
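
Example sketch: event-based interaction with spatial partitioning

The slides describe ME components that each process only some blocks and publish one motion vector per block as an attribute/value notification, and PF components that subscribe with "src=vs1 func=me". The following is a minimal in-process sketch of that interaction, not the authors' implementation. The names EventNotificationService, partition_blocks, and motion_estimate are hypothetical, the exact-match filter and the round-robin block assignment are assumptions made for illustration, and the motion vectors are placeholder values copied from the example notifications on the slide. The real ENS is a distributed service.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Notification = Dict[str, object]
Handler = Callable[[Notification], None]


@dataclass
class EventNotificationService:
    """Toy single-process stand-in for the ENS (hypothetical API)."""
    _subs: List[Tuple[Notification, Handler]] = field(default_factory=list)

    def subscribe(self, pattern: Notification, handler: Handler) -> None:
        # A pattern matches a notification when every attribute in the
        # pattern is present in the notification with an equal value.
        self._subs.append((pattern, handler))

    def publish(self, notification: Notification) -> None:
        for pattern, handler in self._subs:
            if all(notification.get(k) == v for k, v in pattern.items()):
                handler(notification)


def partition_blocks(cols: int, rows: int, n_parts: int) -> List[List[Tuple[int, int]]]:
    """Assign 1-based (x, y) block coordinates to n_parts components.

    Round-robin assignment is an assumption; the slides only say that
    each feature extraction component processes some of the blocks.
    """
    blocks = [(x, y) for y in range(1, rows + 1) for x in range(1, cols + 1)]
    return [blocks[i::n_parts] for i in range(n_parts)]


def motion_estimate(block: Tuple[int, int]) -> Tuple[int, int]:
    # Placeholder for real motion estimation: block (3,2) moves left,
    # all other blocks are still, as in the slide's example notifications.
    return (-1, 0) if block == (3, 2) else (0, 0)


ens = EventNotificationService()

# A PF component subscribes to motion vectors for video stream vs1.
received: List[Notification] = []
ens.subscribe({"src": "vs1", "func": "me"}, received.append)

# Two ME components each publish vectors for their share of a 4 x 4 grid.
parts = partition_blocks(cols=4, rows=4, n_parts=2)
t = 0  # frame timestamp (illustrative)
for me_blocks in parts:
    for block in me_blocks:
        ens.publish({"src": "vs1", "func": "me", "time": (t, t),
                     "block": block, "vector": motion_estimate(block)})

# A notification from another function is not delivered to the PF.
ens.publish({"src": "vs1", "func": "cf", "time": (t, t)})

moving = [n["block"] for n in received if n["vector"] != (0, 0)]
print(len(received), moving)  # prints: 16 [(3, 2)]
```

Because publishers and subscribers only share attribute names, the number of ME components can be changed (n_parts above) without touching the PF side, which is the level of indirection the conclusion refers to. Adding a "block" attribute to a subscription pattern would correspond to the further-work item in which a component subscribes to only some blocks of each frame.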