VideoForge – Video Processing System

*** add references

Abstract

This document describes in detail a system for processing video sequences and live camera streams. Many graphics and modeling systems offer rich functionality, but the situation in the field of Computer Vision is not as good. The purpose of this system is to unify work within Computer Vision and to facilitate the creation of Computer Vision applications. We want a system in which computer vision tasks can be solved (almost) without low-level programming. We also want a standard base for implementing various algorithms, so that anyone can easily test and use the implemented algorithms and easily extend the capabilities of the system.

Introduction

A number of video processing systems exist. Over the last years we have been collecting the requirements for a universal video processing system. We have analyzed them and have now implemented the first version of the image processing system VideoForge. The system aims to simplify the development of software solutions in the Computer Vision area, and it mainly targets real-time video processing. The system is built on plugins: the user assembles a processing pipeline out of them, and a plugin can implement virtually any kind of image processing functionality.

Comparison to existing systems

We have chosen some systems to demonstrate the diversity of approaches:

FFTW
- FFTW is a library for optimized computation of the discrete Fourier transform using the fast Fourier transform (FFT)
- The library is free for use
- This library will be used as a platform-independent implementation of the FFT

AMD Core Math Library (ACML)
- ACML is a free library for fast matrix operations, highly optimized for AMD processors (platform independent)
- It is based on a FORTRAN interface ported to C
- This library will be used on AMD processors for some basic image processing
- This library is an example of a poorly designed interface (see Figure 2)

    int ldh, complex *w, complex *z, int ldz, int *info);
    extern void _stdcall cpbcon(char uplo, int n, int ndiag, complex *a, int lda, float anorm, float *rcond, int *info);
    extern void _stdcall cpbequ(char uplo, int n, int ndiag, complex *a, int lda, float *scale, float *scond, float *amax, int *info);
    extern void _stdcall cpbrfs(char uplo, int n, int ndiag, int nrhs, complex *a, int lda, complex *af, int ldaf, complex *b, int ldb, complex *x, int ldx, float *ferr, float *berr, int *info);
    extern void _stdcall cpbstf(char uplo, int n, int kd, complex *ab, int ldab, int *info);
    extern void _stdcall cpbsv(char uplo, int n, int ndiag, int nrhs, complex *a, int lda, complex *b, int ldb, int *info);
    extern void _stdcall cpbsvx(char fact, char uplo, int n, int ndiag, int nrhs, complex *a, int lda, complex *af, int ldaf, char equed, float *scale, complex *b, int ldb, complex *x, int ldx, float *rcond, float *ferr, float *berr, int *info);
    extern void _stdcall cpbtrf(char uplo, int n, int ndiag, complex *a, int lda, int *info);
    extern void _stdcall cpbtrs(char uplo, int n, int ndiag, int nrhs, complex *a, int lda, complex *b, int ldb, int *info);
    extern void _stdcall cpocon(char uplo, int n, complex *a, int lda, float anorm, float *rcond, int *info);
    extern void _stdcall cpoequ(int n, complex *a, int lda, float *scale, float *scond, float *amax, int *info);

Figure 2 - ACML library interface (excerpt, the library contains about 300 such functions)

NAG Libraries (NAG)
- This commercial library contains implementations of various numerical algorithms written in FORTRAN
- The library also offers a C interface to these algorithms

Matrox Image Library (MIL)
- This library supports many professional graphics devices made by the Matrox company
- It is the essential software component for managing these devices

Open Computer Vision Library (OpenCV, now in beta 5)
- This library contains implementations of higher-level computer vision algorithms, including fundamental matrix computation, RANSAC, etc.
- It also contains algorithms for stereo vision
- This library is free

Intel Image Processing Library
- This library contains basic image processing operations, such as morphological and logical operations, optimized for the majority of current processors
- The library has become commercial. We are using version 25, which is free for non-commercial use
- We will use this library to implement basic image processing operations

Microsoft Vision SDK
- The library contains only very basic operations on raster images
- This library is free for use
- The library has a nice type-safe implementation written in ANSI C++

Microsoft DirectX
- The library ships with all versions of the Windows operating system
- It will be the backbone of the system implementation on the Windows™ platform

IJG Libraries
- This library is free for use. It was created by the Independent JPEG Group
- It contains an implementation of a fast, optimized JPEG codec for static images

IBM Mpeg Toolkit
- This library is a commercial one
- It offers a Java interface for MPEG4 video playback and recording

Matrix Template Library (MTL)
- This free library contains implementations of fast matrix operations
- It has a nice type-safe C++ interface
- It will be the part of the system that computes the statistical moments

ImageMagick
- This library contains conversion algorithms from and to static image formats
- It converts from/to 96 different image formats, including the new JPEG 2000 image format (JASPER library)

ImageJ
- This application contains implementations of basic image processing operations
- It is written completely in Java and it supports scripting and plugins
- It is mentioned here as a means of writing platform-independent plug-ins for image processing, although interpreted code (Java) is not as fast as compiled code
- Speed is quite essential in video processing

EIKONA *** Cantata ImageForge, Photoshop ?

Interactive vs. non-interactive system mode

The system can work in two modes: an interactive and a non-interactive mode. The interactive mode is useful when the user wants to design a new algorithm. In this mode the system works as a full-fledged application, and the user can design a new algorithm in the graphical user interface and save it for later use. In the non-interactive mode the system loads a previously saved graph and executes it on the input data.

The user interface of the system can be implemented in various ways. It can use native or platform-independent window management libraries such as wxWindows, MFC, Qt, etc. It is also very important to make the final user interface as comfortable as possible for the end user. This is achieved through easy customization of the user interface and easy storing of the state of the user interface elements (shortcuts, shown/hidden toolbars, menu items, etc.).

Layered system architecture

The system itself consists of 4 layers:
- Core wrapper
- VFCore layer
- Core
- Base

The core wrapper layer is usually the application used for development, but it can also be a COM object or a web service. These can use the information that the developer previously created in the interactive mode of the system.
The VFCore layer is responsible for enumerating the existing algorithms and devices in the system. It is also responsible for creating the graph of connected algorithms and devices. The Core layer is responsible for loading and unloading modules and for enumerating plugins. The Base layer is responsible for logging support, persistence and allocation. The lower two layers (Core and Base) are designed generically, so that they can also be used in systems other than VideoForge that want to use plugins.

- Core wrapper: user interface layer
- VFCore: the core system object, main functionality
- Core: plugin enumeration, loading and unloading of statically and dynamically loaded modules
- Base: logging, allocation, reflection and persistence support

Figure 1 – Layered system architecture

Modular architecture

The system is implemented in a modular way, so that the user can choose which modules (containing plugins) are loaded at startup. The modules are implemented as dynamically loaded libraries. They are designed to be as small as possible and to make the development of plugins and modules as easy as possible.

Plugin development

All algorithms in the system are implemented as plugins. An empty plugin takes just 5 lines of code, and a module template takes just 18 copied lines. An in-place algorithm (the simplest type of algorithm) takes 9 lines and one overridden method. Besides device enumerators and algorithms, other types of plugins can exist. A plugin can automatically connect the boxes in the graph, obtain information from the graph (number of boxes, memory consumption, etc.), or show some image information such as a histogram.
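The last kind of plugin mentioned above, one that shows image information such as a histogram, might look roughly like the following sketch. It is only an illustration under stated assumptions: the `Plugin` base class, the `Name` method and the `HistogramPlugin::Compute` signature are hypothetical and are not taken from the VideoForge sources. The code is kept within C++ 98, like the core of the system.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical plugin base class; the real VideoForge interface may differ.
class Plugin {
public:
    virtual ~Plugin() {}
    virtual std::string Name() const = 0;
};

// An "empty" plugin: it only identifies itself.
class EmptyPlugin : public Plugin {
public:
    virtual std::string Name() const { return "Empty"; }
};

// An information plugin that computes the histogram of an
// 8-bit grayscale frame (assumed pixel format).
class HistogramPlugin : public Plugin {
public:
    virtual std::string Name() const { return "Histogram"; }

    // Returns 256 bins; bin i counts the pixels with intensity i.
    std::vector<unsigned long> Compute(const unsigned char* pixels,
                                       std::size_t count) const {
        assert(pixels != NULL || count == 0);
        std::vector<unsigned long> bins(256, 0UL);
        for (std::size_t i = 0; i < count; ++i) {
            ++bins[pixels[i]];
        }
        return bins;
    }
};
```

A caller would pass the raw pixel data of a frame to `Compute` and draw the returned bins, for example in a vector window placed over the bitmap window.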
System Windows

The windows of the system (each showing some kind of data) are of 2 kinds:
- Bitmap windows
- Vector windows

The bitmap windows are a means of showing image frames quickly. The vector windows are usually drawn on top of a bitmap window and show some kind of graphical information.

Logging

The system has built-in support for informing the user when some event occurs in the system. Logging is also available to plugin developers. There are various levels of logging, so that the user can easily filter out unwanted messages. Logging works seamlessly in the core and in the modules, and it is thread safe.

Persistence

The system has built-in support for storing and retrieving information about the state of the algorithms and devices. The information is stored in chunks of typed information, and there are various implementations of persistence. Binary persistence is the fastest way of storing information, but the result is not easily readable by a human. Text persistence is human readable and can even be edited by hand. We also want to implement XML persistence. The operators << and >> are supported for easy loading and storing. Reflected variables are stored automatically, but this behavior can be overridden. A plugin can store its information elsewhere and store just a link, or it can simply not implement the Load and Save methods.

Allocator

The system offers a built-in allocator for easy allocation across modules. The only restriction is that a class must have a default constructor in order to be allocated by it. This could be expressed more easily in the new C++0x standard, where we could require the concept DefaultConstructible for the class. The system allocator is thread safe, and an object can be allocated in one module and freed in another.

License

The system is distributed as freeware.
The license for using particular libraries is determined by their respective owners.

Macros

During development of the system I tried to avoid C++ macros completely. Macros are a source of errors, because they do not carry any type information. With one exception (the REFLECTION macro map) we do not use any important macros in the code. According to the C++ 98 standard, there is no way to implement this exception without a macro.

Language of implementation

When we started to think about the implementation, we considered various languages:

C++
- compatible with C
- native code (almost as fast as C code)
- support for extended instruction sets (MMX, 3DNow, SSE, etc.)
- relatively easy implementation of plugins
- precisely standardized

Java
- easy implementation of plugins, built-in reflection
- not native code, slow for video processing
- the performance of calling objects is reasonable outside the image processing cycle, but not inside it ***

C
- fast, faster than C++
- lack of important language features
- many existing libraries and algorithms (IPL, OpenCV)

others
- not so widely used as C++

We chose the C++ language, as performance was the primary goal.

Scripting support

All reflected classes are automatically transformable for use in different scripting languages. The user of the system can use a script to quickly implement some kind of script algorithm.
- working on the scripting wrapper for ActiveX scripting objects

Support for GPU algorithms
- working on the implementation of a wrapper for GPU programs in GLSL, HLSL, etc.

Platform independence

The core of the system is written strictly in ANSI C++ 98. This ensures maximal portability among various platforms. The only thing that does not conform to the standard is the need for some kind of class identifier. This identifier is obtained by parsing the class name returned from typeid(T).name() for a particular type T.
According to the standard this method may return an empty string. However, on most existing platforms it returns a reasonably encoded type name. We then use the function AuxUtils::ParseClassName to parse the class name into a form that is the same on all platforms. So the only parts of the core that need to be rewritten when compiling for a different platform are the aforementioned function and the EnumerateModules method, because the latter has to iterate through the file system to find and load the dynamically loaded libraries, which is platform dependent.

Pedagogic and didactic aspects

The system is well suited to learning the basic image processing and computer vision algorithms. It enables the user to “see” what particular algorithms do. Batch processing and the implementation of rather complicated image processing tasks become simpler.

Case studies

In this section we briefly mention some simple scenarios of system usage, to give a better understanding of the cooperation between the system and the plug-ins and of their interaction with the user.

Facial feature tracking in live-stream

The system is in non-interactive mode. The facial feature tracking plug-in (consisting of particular feature detectors) tracks the features in live stream data.

Training the classifier on a group of images

The system is in non-interactive mode. The input file contains the path of the directory where the training files are located. The training plug-in feeds each file to the classifier.

Developing a simple driving license code recognition module

The developer implements a class derived from CAlgorithm. The only significant method to override is Apply, which applies the algorithm to the input buffer. The result is shown in a simple message box. Various algorithm properties can be published by implementing the IRefObj interface. The algorithm can load and store its values by implementing the IPersistence interface.
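A minimal sketch of such a class follows. Only the names CAlgorithm and Apply come from the description above; the signature of Apply, the CThreshold class and its binarization step are assumptions made for illustration (thresholding is a plausible first step before recognizing characters, not the actual module). The IRefObj and IPersistence interfaces are left out because their members are not described here.

```cpp
#include <cassert>
#include <cstddef>

// Assumed shape of the base class: only the names CAlgorithm and
// Apply are taken from the text, the signature is hypothetical.
class CAlgorithm {
public:
    virtual ~CAlgorithm() {}
    // Processes `size` bytes of an 8-bit grayscale frame in place.
    virtual void Apply(unsigned char* buffer, std::size_t size) = 0;
};

// Hypothetical preprocessing step of a recognition module:
// pixels at or above the level become white, the rest black.
class CThreshold : public CAlgorithm {
public:
    explicit CThreshold(unsigned char level) : m_level(level) {}

    virtual void Apply(unsigned char* buffer, std::size_t size) {
        assert(buffer != NULL || size == 0);
        for (std::size_t i = 0; i < size; ++i) {
            buffer[i] = (buffer[i] >= m_level) ? 255 : 0;
        }
    }

private:
    unsigned char m_level;  // threshold in the range 0..255
};
```

Because the system would call the algorithm through the base class, a box in the graph only needs a `CAlgorithm` reference or pointer; the concrete class stays inside the module.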
Face recognition system as ISAPI web application

The only thing to develop is the ISAPI web application template; the system is included in the application as a COM object.

Console image processing application

The system is in interactive mode and the actions are offered to the user via a console window. The result of the recognition is stored in a file and/or shown in a stand-alone window of the operating system.

Cooperation with avatar system

The system works as a subsystem of the empathic avatar system (see Figure 2). The avatar is an entity existing in a virtual reality (VR); it is a reflection of the user in the VR. We have proposed an avatar and are now trying to communicate non-verbally in VR. For this we use a pair of webcams. They capture the facial expression of the user, which is then decoded and sent to a VRML script, so that the avatar reacts in accordance with the user's facial expression.

The system handles the input part of the avatar system. It works in non-interactive mode and uses plugins for feature tracking and detection. While the user communicates with the avatar, the image processing system monitors the user through a camera device and detects his facial features. The facial expression is calculated from the facial features, and the avatar then reacts to it. The system feeds the information about the detected facial feature points either to the Avatar toolkit, which creates the avatar face, or to the avatar itself, which detects the user's expression. The part that will be implemented in the image processing system is shown in Figure 2 in the dashed blue rectangle.
Figure 2 – Avatar system

Results

Figure 3 – VideoForge, a screenshot of the existing system

The reader can find more information on the VideoForge webpage: http://www.sccg.sk/~kubini/videoforge

Future work

The development of the system is not yet finished. We are also awaiting the new C++ standard, C++0x, which should bring new C++ features such as concepts and automatic type generation. More information can be found in the ISO C++ standardization committee white papers. There will probably also be a platform-independent plugin interface and/or implementation.

Features:
- avi <-> frames and back seamlessly
- do it for many images
- store previous frames, if needed
- static or dynamic module