ACM Multimedia 2003

Supporting Timeliness and Accuracy in Distributed Real-time Content-based Video Analysis

Viktor S. Wold Eide (1,2), Frank Eliassen (2), Ole-Christoffer Granmo (1,2,3), and Olav Lysne (2) †

(1) University of Oslo, Norway, {viktore,olegr}@ifi.uio.no
(2) Simula Research Laboratory, Lysaker, Norway, {viktore,frank,olegr,olavly}@simula.no
(3) Agder University College, Norway, {ole.granmo}@hia.no

http://www.ifi.uio.no/~dmj/

ACM Multimedia 2003, November 2-8, 2003, Berkeley, CA, USA

† Authors are listed alphabetically

Introduction

[Figure labels: Sensors, Streaming, Network, Processing, Control]

• Due to the increasing availability of inexpensive cameras and deployment of high-speed computer networks, it has become economically and technically feasible to build complex distributed real-time video content analysis applications

Introduction

• The purpose of real-time content analysis applications is to index and annotate captured media streams on-line, as events happen
  - E.g., to detect and track a running person on-line
• Some examples of applications in this domain are:
  - traffic surveillance
  - indexing of TV broadcast news
• This application domain has many common issues which may be handled generally

Challenges

[Figure labels: QoS Requirements, Application, Physical Resources]
• The video data must be analyzed:
  - at least as fast as the data is made available to the application
  - with an acceptable error rate
• Such Quality of Service (QoS) requirements are typically mutually dependent
• The tasks of the application must be mapped to the physical resources so that the QoS requirements are satisfied during execution

Contribution

• An architecture for distributed real-time content-based video analysis that supports
  - an explicit QoS model for this class of applications
  - balancing of QoS properties against the available processing resources
  - scalability at multiple logical levels of distribution

Content Analysis QoS Model

• Accuracy
  - maximum ratio of misclassifications to number of classifications
• Temporal Resolution
  - minimum temporal length of detectable events
• Latency
  - maximum application response time

Content Analysis Application Model

[Figure: task graph with four logical levels. Nodes: C: Classification, E: feature Extraction, F: Filtering, S: Streaming. Edges: Extracted Features, Filtered media stream, Media stream]

• A typical content analysis application can be seen as a graph, where nodes represent tasks and edges represent directed flows of data
• Different classes of functionality can be found at the four logical levels of the task graph: streaming, filtering, feature extraction, and classification

Architecture Requirements

• In order to provide some level of control over the QoS provided, the video content analysis application must be scalable and resource aware
• A scalable architecture can generally only be obtained by adopting distribution as its basic principle
  - e.g., by parallelizing and distributing application algorithms
• In video content analysis, the relative complexity of streaming, filtering, feature extraction, and classification depends on the application
⇓
• The architecture should support parallelization and focusing of processing resources on any given logical level, independently of the other logical levels
• Such parallelization and distribution require a scalable interaction mechanism
• Algorithms are needed that can decide whether a QoS requirement can be satisfied in a given processing environment

Overall Architecture

[Figure labels: Application (task graph of S, F, E, and C tasks); QoS Requirements (accuracy, temporal resolution, latency); Resource model (CPU, links); ARCAMIDE; Candidate Configurations (Config 1, Config 2); Physical Resources]

Example Application Task Graph

[Figure: "Parallel processing at different levels". Levels: Streaming, Filtering, feature Extraction, Classification. Nodes: CO: Coordination, PF: Particle Filtering, ME: Motion Estimation, CF: Color Filtering, VS: Video Streaming. Edges: Event Notification, Filtered media stream, Video Stream. Example output: Tracked Position=(3,3)]

Generating Candidate Configurations

[Figure: two candidate configurations of the task graph, with time annotations in ms on tasks and edges. First configuration: error rate 0.03, latency 74 ms, resolution 43 ms. Second configuration: error rate 0.05, latency 64 ms, resolution 33 ms]

• ARCAMIDE iteratively prunes tasks from the "brute force" task graph until either
  - accuracy falls below the required level → failure
  - latency/resolution requirements are met → success
• Pruning is guided by task efficiency: accuracy loss/processing cost

Deployment and Execution

[Figure: the components CO, PF, ME, CF, and VS deployed on several hosts, each with an ENS / OS / HW stack, on top of the Physical Resources]
Legend: CO: Coordination, PF: Particle Filter, ME: Motion Estimation, CF: Color Filtering, VS: Video Streaming, ENS: Event Notification Service, OS: Operating System, HW: Hardware

• The deployed components communicate through a high-performance distributed event notification service
  - Simplifies configuration and reconfiguration
  - Supports independent parallelization at different logical levels

Empirical Results: Balancing Accuracy Against Timeliness

[Figure: Processing Time (axis 0 to 60) and Error Rate (axis 0 to 0.6) plotted over horizontal axis values 1 to 37]

• The efficiency-based ARCAMIDE pruning strategy produces a fine-grained range of error rate/processing time tradeoffs

Empirical Results: Scalability

The number of frames per second processed by different deployments of a fixed object tracking task graph, compared to the ideal frame rate:

                                   1 CPU  2 CPUs  4 CPUs  8 CPUs  10 CPUs
Ideal Frame Rate                     2.5     5      10      20      25
Streaming                            2.5     5      10      20      25
Filtering and Feature Extraction     2.5     5       8.5    13.5    16
Classification                       2.5     5      10      20      25

Conclusion

We have presented a general architecture for distributed real-time video content analysis applications, which, given:
• the application graph (components and data flow),
• the application QoS requirements (accuracy and timeliness), and
• the available physical resources (expressed in the resource model)
supports QoS-aware mapping of the application onto physical resources

Salient features of the architecture include:
• independent scalability at multiple logical levels of distribution
  - handle harder QoS requirements by utilizing additional resources
  - decouple application development from QoS mapping and deployment
  - realized by using an event notification service

Further Work

• Development of a more complete QoS management architecture for real-time video content analysis applications
• The work presented here represents steps towards that goal
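
The pruning loop described under Generating Candidate Configurations can be sketched in Python as follows. This is only a minimal illustration, not the ARCAMIDE implementation: the Task fields, the additive error model, and the latency estimate (a plain sum of task costs, as if all tasks ran in sequence) are assumptions introduced here. The task with the smallest accuracy loss per unit of processing cost is pruned first.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Task:
    """Hypothetical task description; field names are illustrative."""
    name: str
    cost_ms: float     # assumed processing cost, must be > 0
    error_gain: float  # assumed increase in error rate if the task is pruned


def prune(tasks: List[Task], base_error: float, max_error: float,
          max_latency_ms: float) -> Optional[Tuple[List[Task], float, float]]:
    """Prune tasks until the latency bound holds or accuracy is violated.

    Returns (remaining tasks, error rate, latency) on success, None on failure.
    """
    remaining = list(tasks)
    error = base_error
    while True:
        # Assumption: latency is the sum of the remaining task costs.
        latency = sum(t.cost_ms for t in remaining)
        if latency <= max_latency_ms:
            return remaining, error, latency  # timeliness met: success
        if not remaining:
            return None  # nothing left to prune
        # Least accuracy loss per unit of processing cost goes first.
        victim = min(remaining, key=lambda t: t.error_gain / t.cost_ms)
        if error + victim.error_gain > max_error:
            return None  # accuracy would fall below the required level
        remaining.remove(victim)
        error += victim.error_gain
```

Calling prune repeatedly with different latency bounds yields a range of error rate/processing time tradeoffs, which is the kind of curve reported under Empirical Results: Balancing Accuracy Against Timeliness.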
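
As a reading aid for the table under Empirical Results: Scalability, the following Python sketch computes how close each deployment of the filtering and feature extraction level comes to ideal linear scaling. The frame rates are copied from that table; the function name and the efficiency definition (measured rate divided by the 1 CPU rate scaled linearly with the CPU count) are assumptions made for illustration.

```python
from typing import List, Sequence

# Values from the scalability table.
CPUS = [1, 2, 4, 8, 10]
FILTERING_AND_FEATURE_EXTRACTION = [2.5, 5, 8.5, 13.5, 16]


def parallel_efficiency(cpus: Sequence[int],
                        frame_rates: Sequence[float]) -> List[float]:
    """Return measured frame rate divided by ideal linear scaling.

    The first entry is used as the per-CPU baseline (assumption).
    """
    per_cpu_baseline = frame_rates[0] / cpus[0]
    return [rate / (per_cpu_baseline * n)
            for n, rate in zip(cpus, frame_rates)]


if __name__ == "__main__":
    for n, eff in zip(CPUS, parallel_efficiency(
            CPUS, FILTERING_AND_FEATURE_EXTRACTION)):
        print(f"{n:>2} CPUs: {eff:.0%} of the ideal frame rate")
```

Under this definition, the filtering and feature extraction level is the only one in the table that drops below 100% as CPUs are added; streaming and classification match the ideal frame rate in every deployment.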