VideoForge – Video Processing system

*** TODO: add references
Abstract
This document describes in detail a system for processing video sequences and live camera
streams. Many graphics and modeling systems offer rich functionality, but comparable tools
are lacking in the field of Computer Vision.
The purpose of this system is to unify work within the area of Computer Vision and to
facilitate the creation of Computer Vision applications. We would like to create a system in
which computer vision tasks can be solved (almost) without low-level programming. We also
want to create a standard base for the implementation of various algorithms, so that anyone
can easily test and use the implemented algorithms and easily extend the capabilities of the
system.
Introduction
A number of video processing systems exist. Over the last years we have been collecting
the requirements for a universal video processing system. We have analyzed them and have
now implemented the first version of the image processing system VideoForge. The system
aims to simplify and facilitate the development of software solutions in the Computer Vision
area, and it is mainly intended to support real-time video processing. The system is built
around plugins: the user assembles the processing pipeline out of them, and a plugin can
perform virtually any kind of image processing.
Comparison to existing systems
We have chosen several systems to demonstrate the diversity of approaches:
• FFTW
FFTW is a free library for optimized computation of the discrete Fourier transform using
fast Fourier transform (FFT) algorithms. It will be used as the platform-independent
implementation of the FFT.
• AMD Core Math Library (ACML)
ACML is a free library of fast matrix operations, highly optimized for AMD processors
(platform independent). Its interface is based on FORTRAN and ported to C. It will be used
on AMD processors for some basic image processing. This library is an example of a terrible
interface (see Figure 2).
    int ldh, complex *w, complex *z, int ldz, int *info);
extern void _stdcall cpbcon(char uplo, int n, int ndiag, complex *a, int lda,
    float anorm, float *rcond, int *info);
extern void _stdcall cpbequ(char uplo, int n, int ndiag, complex *a, int lda,
    float *scale, float *scond, float *amax, int *info);
extern void _stdcall cpbrfs(char uplo, int n, int ndiag, int nrhs, complex *a,
    int lda, complex *af, int ldaf, complex *b, int ldb, complex *x, int ldx,
    float *ferr, float *berr, int *info);
extern void _stdcall cpbstf(char uplo, int n, int kd, complex *ab, int ldab,
    int *info);
extern void _stdcall cpbsv(char uplo, int n, int ndiag, int nrhs, complex *a,
    int lda, complex *b, int ldb, int *info);
extern void _stdcall cpbsvx(char fact, char uplo, int n, int ndiag, int nrhs,
    complex *a, int lda, complex *af, int ldaf, char equed, float *scale,
    complex *b, int ldb, complex *x, int ldx, float *rcond, float *ferr,
    float *berr, int *info);
extern void _stdcall cpbtrf(char uplo, int n, int ndiag, complex *a, int lda,
    int *info);
extern void _stdcall cpbtrs(char uplo, int n, int ndiag, int nrhs, complex *a,
    int lda, complex *b, int ldb, int *info);
extern void _stdcall cpocon(char uplo, int n, complex *a, int lda, float anorm,
    float *rcond, int *info);
extern void _stdcall cpoequ(int n, complex *a, int lda, float *scale,
    float *scond, float *amax, int *info);
Figure 2 - ACML library interface (excerpt, the library contains about 300 such functions)
• NAG Libraries (NAG)
This commercial library contains implementations of various numerical algorithms written
in FORTRAN. It also offers a C interface to these algorithms.
• Matrox Image Library (MIL)
This library supports many professional graphics devices made by the Matrox company. It is
an essential software component for managing these devices.
• Open Computer Vision Library (OpenCV, now in beta 5)
This free library contains implementations of higher-level computer vision algorithms,
including fundamental matrix computation, RANSAC, etc. It also contains algorithms for
stereo vision.
• Intel Image Processing Library (IPL)
This library contains basic image processing operations, such as morphological and logical
operations, optimized for the majority of current processors. The library has since become
commercial. We are using version 2.5, which is free for non-commercial use, and we will use
it to implement basic image processing operations.
• Microsoft Vision SDK
This free library contains basic (really basic) operations on raster images. It has a nice
type-safe implementation written in ANSI C++.
• Microsoft DirectX
This library ships with all versions of the Windows operating system and will be the
backbone of the system implementation on the Windows™ platform.
• IJG Libraries
This free library was created by the Independent JPEG Group. It contains an implementation
of a fast, optimized JPEG codec for still images.
• IBM MPEG Toolkit
This commercial library offers a Java interface for MPEG-4 video playback and recording.
• Matrix Template Library (MTL)
This free library contains implementations of fast matrix operations and has a nice
type-safe C++ interface. It will be the part of the system that computes statistical moments.
• ImageMagick
This library contains conversion algorithms from and to 96 different still image formats,
including the new JPEG 2000 image format (JasPer library).
• ImageJ
This application implements basic image processing operations. It is written completely in
Java and supports scripting and plugins. It is mentioned here as a means of writing
platform-independent plug-ins for image processing, although Java code does not run as fast
as natively compiled code, and speed is quite essential in video processing.
• EIKONA
• Cantata
• ImageForge, Photoshop
Interactive vs. non-interactive system mode
The system can work in two modes: interactive and non-interactive. The interactive mode is
useful when the user wants to design a new algorithm. In this mode the system works as a
full-fledged application: the user designs a new algorithm in the graphical user interface
and saves it for later use. In the non-interactive mode the system loads a previously saved
graph and executes it on the input data.
The user interface of the system can be implemented in various ways. It can use native or
platform-independent window management libraries such as wxWindows, MFC, Qt, etc. It is
also very important to make the final user interface as comfortable as possible for the end
user. This is achieved through easy customization of the user interface and easy storing of
the state of its elements (shortcuts, shown/hidden toolbars, menu items, etc.).
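To make the non-interactive mode more concrete, the following is a minimal sketch of loading a
previously saved graph and executing it on an input frame. It assumes a purely linear graph that
was saved as a list of stage names. The names Frame, Stage, KnownStages, LoadGraph and RunGraph,
as well as the two example stages, are hypothetical and are not part of the VideoForge API.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical frame type: a flat buffer of 8-bit gray pixels.
typedef std::vector<std::uint8_t> Frame;
// Hypothetical processing stage: modifies the frame in place.
typedef std::function<void(Frame&)> Stage;

// Stages known to this sketch, keyed by the name used in the saved graph.
inline std::map<std::string, Stage> KnownStages() {
    std::map<std::string, Stage> stages;
    stages["invert"] = [](Frame& f) {
        for (std::uint8_t& p : f) p = static_cast<std::uint8_t>(255 - p);
    };
    stages["threshold"] = [](Frame& f) {
        for (std::uint8_t& p : f) p = static_cast<std::uint8_t>(p >= 128 ? 255 : 0);
    };
    return stages;
}

// "Loads" a saved graph: here simply a whitespace-separated list of stage names.
inline std::vector<Stage> LoadGraph(const std::string& saved) {
    const std::map<std::string, Stage> known = KnownStages();
    std::vector<Stage> graph;
    std::istringstream in(saved);
    std::string name;
    while (in >> name) {
        std::map<std::string, Stage>::const_iterator it = known.find(name);
        if (it == known.end()) throw std::runtime_error("unknown stage: " + name);
        graph.push_back(it->second);
    }
    return graph;
}

// Non-interactive execution: every stage is applied to the frame in order.
inline void RunGraph(const std::vector<Stage>& graph, Frame& frame) {
    for (const Stage& stage : graph) stage(frame);
}
```

A batch run would then call RunGraph(LoadGraph("invert threshold"), frame) for every input
frame. A real graph can branch and merge, which this linear sketch deliberately leaves out.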
Layered system architecture
The system itself consists of 4 layers:
• Core wrapper
• VFCore layer
• Core
• Base
The core wrapper layer is usually the application used for development, but it can also be
a COM object or a web service. These can use the information that the developer previously
created in the interactive mode of the system.
The VFCore layer is responsible for enumerating the algorithms and devices available in the
system. It is also responsible for creating the graph of connected algorithms and devices.
The Core layer is responsible for loading and unloading modules and for enumerating the
plugins. The Base layer is responsible for logging support, persistence and allocation.
The lower two layers (Core and Base) are designed generically, so they can also be used in
other plugin-based systems, not only in VideoForge.
Core wrapper = User Interface layer
VFCore = The core system object – main functionality
Core – plugin enumeration, loading and unloading of statically and dynamically loaded modules
Base – logging, allocation, reflection and persistence support
Figure 1 – Layered system architecture
Modular architecture
The system is implemented in a modular way, so the user can choose which modules
(containing plugins) are loaded at system startup. The modules are implemented as
dynamically loaded libraries. They are designed to be as small as possible and to make the
development of plugins and modules as easy as possible.
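The relation between modules and plugins can be sketched as a registry of plugin factories that
each module fills when it is loaded. This is only an illustration under stated assumptions: the
names IPlugin, PluginRegistry, Make and RegisterFiltersModule are hypothetical, and the module is
represented by a plain function because the actual loading of a dynamic library is platform
dependent and is omitted here.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical minimal plugin interface.
struct IPlugin {
    virtual ~IPlugin() {}
    virtual std::string Name() const = 0;
};

// Registry of plugin factories, filled by the modules that are loaded.
class PluginRegistry {
public:
    typedef std::unique_ptr<IPlugin> (*Factory)();

    void Register(const std::string& name, Factory factory) { factories_[name] = factory; }

    // Enumerates the names of all registered plugins in sorted order.
    std::vector<std::string> Enumerate() const {
        std::vector<std::string> names;
        for (const auto& entry : factories_) names.push_back(entry.first);
        return names;
    }

    // Creates a plugin instance, or returns null if the name is unknown.
    std::unique_ptr<IPlugin> Create(const std::string& name) const {
        auto it = factories_.find(name);
        return it == factories_.end() ? std::unique_ptr<IPlugin>() : it->second();
    }

private:
    std::map<std::string, Factory> factories_;
};

// Two example plugins living in one hypothetical "filters" module.
struct BlurPlugin : IPlugin { std::string Name() const override { return "Blur"; } };
struct EdgePlugin : IPlugin { std::string Name() const override { return "Edge"; } };

// Generic factory used for registration.
template <typename T>
std::unique_ptr<IPlugin> Make() { return std::unique_ptr<IPlugin>(new T()); }

// Entry point a module would export; called once when the module is loaded.
inline void RegisterFiltersModule(PluginRegistry& registry) {
    registry.Register("Blur", &Make<BlurPlugin>);
    registry.Register("Edge", &Make<EdgePlugin>);
}
```

Choosing which modules to load at startup then amounts to choosing which registration functions
are called, and the rest of the system only ever talks to the registry.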
Plugin development
All algorithms in the system are implemented as plugins. We need just 5 lines of code to
implement an empty plugin, and we need to copy just 18 lines to implement a module template.
We need 9 lines and one method redefinition for an in-place algorithm (the simplest type of
algorithm).
Plugins are not limited to device enumerators and algorithms; other types can exist as well.
A plugin can automatically connect the boxes in the graph, obtain information from the graph
(number of boxes, memory consumption, etc.), or show some image information, such as a
histogram.
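As an illustration of how small an in-place algorithm can be, here is a sketch of a thresholding
plugin. Only the names CAlgorithm and Apply are taken from this document; the signature of Apply,
the class CThreshold and its parameter are assumptions made for this example and may differ from
the real VideoForge interface.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed shape of the base class; only the names CAlgorithm and Apply come
// from the text, the signature is a guess made for this sketch.
class CAlgorithm {
public:
    virtual ~CAlgorithm() {}
    // Applies the algorithm to the buffer in place.
    virtual void Apply(std::uint8_t* buffer, std::size_t size) = 0;
};

// Hypothetical in-place plugin: binarizes an 8-bit gray image.
class CThreshold : public CAlgorithm {
public:
    explicit CThreshold(std::uint8_t level = 128) : level_(level) {}

    void Apply(std::uint8_t* buffer, std::size_t size) override {
        for (std::size_t i = 0; i < size; ++i)
            buffer[i] = static_cast<std::uint8_t>(buffer[i] >= level_ ? 255 : 0);
    }

private:
    std::uint8_t level_;  // pixels at or above this value become white
};
```

With the pixels held in a std::vector<std::uint8_t>, a caller would run the plugin as
CThreshold(100).Apply(pixels.data(), pixels.size()).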
System Windows
The windows of the system (each showing some kind of data) are of 2 kinds:
• Bitmap windows
• Vector windows
The bitmap windows are a means of showing image frames quickly. The vector windows are
usually drawn on top of a bitmap window and show some kind of graphical information.
Logging
The system has built-in support for informing the user when some event occurs in the
system. Logging is also available to plugin developers. There are various levels of logging,
so the user can easily filter out unwanted messages.
Logging works seamlessly in the core and in the modules, and it is thread-safe.
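A minimal sketch of leveled, thread-safe logging could look as follows. The class Logger, the
enumeration LogLevel and the message prefixes are hypothetical; the sketch collects messages in
memory instead of showing them to the user, and it uses the C++11 std::mutex rather than whatever
synchronization the real system uses.

```cpp
#include <mutex>
#include <string>
#include <vector>

// Hypothetical severity levels, ordered from least to most important.
enum class LogLevel { Debug = 0, Info = 1, Warning = 2, Error = 3 };

// Minimal thread-safe logger that drops messages below a chosen level.
class Logger {
public:
    explicit Logger(LogLevel minLevel) : minLevel_(minLevel) {}

    void SetMinLevel(LogLevel level) {
        std::lock_guard<std::mutex> lock(mutex_);
        minLevel_ = level;
    }

    // Stores the message if its level passes the filter.
    void Log(LogLevel level, const std::string& message) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (level < minLevel_) return;
        messages_.push_back(Prefix(level) + message);
    }

    // Returns a copy, so callers never touch the shared container directly.
    std::vector<std::string> Messages() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return messages_;
    }

private:
    static std::string Prefix(LogLevel level) {
        switch (level) {
            case LogLevel::Debug:   return "[debug] ";
            case LogLevel::Info:    return "[info] ";
            case LogLevel::Warning: return "[warning] ";
            case LogLevel::Error:   return "[error] ";
        }
        return "[unknown] ";
    }

    mutable std::mutex mutex_;
    LogLevel minLevel_;
    std::vector<std::string> messages_;
};
```

Because every access to the shared state goes through the same mutex, the core and the plugins
can log from different threads without corrupting the message list.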
Persistence
The system has built-in support for storing and retrieving the state of the algorithms and
devices. The information is stored in chunks of typed information, and several
implementations of persistence exist. Binary persistence is the fastest way of storing
information, but the result is not easily readable by a human. Text persistence is
human-readable and can even be edited by hand. We also want to implement XML persistence.
We also support the operators << and >> for easy storing and loading. The storing of
reflected variables is automatic, but overridable. Every plugin can store its information
elsewhere and store just a link, or simply not implement the Load and Save methods.
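The idea of text persistence with the operators << and >> can be sketched as follows. The classes
TextWriter and TextReader, the "type value" line format and the example state ThresholdState are
assumptions made for this illustration; only the method names Load and Save come from this
document, and the real chunk format is not shown here.

```cpp
#include <sstream>
#include <string>

// Hypothetical text archive: every value is stored as one "type value" line,
// so the result stays human-readable and editable by hand.
class TextWriter {
public:
    TextWriter& operator<<(int v) { out_ << "int " << v << '\n'; return *this; }
    TextWriter& operator<<(double v) { out_ << "double " << v << '\n'; return *this; }
    TextWriter& operator<<(const std::string& v) { out_ << "string " << v << '\n'; return *this; }
    std::string Text() const { return out_.str(); }
private:
    std::ostringstream out_;
};

// Reads the values back and checks that each stored type tag matches.
class TextReader {
public:
    explicit TextReader(const std::string& text) : in_(text), ok_(true) {}
    TextReader& operator>>(int& v) { return Read("int", v); }
    TextReader& operator>>(double& v) { return Read("double", v); }
    TextReader& operator>>(std::string& v) {
        std::string tag;
        if (!(in_ >> tag) || tag != "string") { ok_ = false; return *this; }
        in_.get();  // skip the single space after the tag
        if (!std::getline(in_, v)) ok_ = false;
        return *this;
    }
    bool Ok() const { return ok_; }
private:
    template <typename T>
    TextReader& Read(const char* expected, T& v) {
        std::string tag;
        if (!(in_ >> tag) || tag != expected || !(in_ >> v)) ok_ = false;
        return *this;
    }
    std::istringstream in_;
    bool ok_;
};

// Hypothetical plugin state with Save and Load built on the operators.
struct ThresholdState {
    int level;
    double gain;
    std::string label;
    void Save(TextWriter& w) const { w << level << gain << label; }
    void Load(TextReader& r) { r >> level >> gain >> label; }
};
```

The type tags make hand edits safer, because a value of the wrong type is detected on loading
instead of being silently misread. Strings containing a line break are not handled by this sketch.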
Allocator
The system offers a built-in allocator for easy allocation across modules. The only
restriction is that a class must have a default constructor in order to be allocated by it.
This could be enforced more easily in the new C++0x standard, where we could require the
concept DefaultConstructible for the class. The system allocator is thread-safe, and an
object can be allocated in one module and freed in another.
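The following sketch illustrates one way such an allocator could be organized. It is not the
VideoForge allocator: the class SystemAllocator, its methods and the example type Histogram are
hypothetical, and the C++11 trait std::is_default_constructible stands in for the
DefaultConstructible requirement mentioned above. Types with extended alignment requirements are
not supported by this sketch.

```cpp
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <new>
#include <type_traits>

// Hypothetical system allocator shared by all modules.
class SystemAllocator {
public:
    SystemAllocator() : live_(0) {}

    // In a real system these two functions would be compiled into one shared
    // library, so that a single heap serves every module.
    void* AllocateRaw(std::size_t bytes) {
        void* p = std::malloc(bytes);
        if (!p) throw std::bad_alloc();
        std::lock_guard<std::mutex> lock(mutex_);
        ++live_;
        return p;
    }

    void FreeRaw(void* p) {
        std::free(p);
        std::lock_guard<std::mutex> lock(mutex_);
        --live_;
    }

    // Allocates and default-constructs an object of type T.
    template <typename T>
    T* Create() {
        static_assert(std::is_default_constructible<T>::value,
                      "T must have a default constructor");
        void* raw = AllocateRaw(sizeof(T));
        try {
            return new (raw) T();
        } catch (...) {
            FreeRaw(raw);
            throw;
        }
    }

    // Destroys the object and returns its memory to the shared heap.
    template <typename T>
    void Destroy(T* object) {
        if (!object) return;
        object->~T();
        FreeRaw(object);
    }

    // Number of objects currently allocated through this allocator.
    std::size_t LiveObjects() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return live_;
    }

private:
    mutable std::mutex mutex_;
    std::size_t live_;
};

// Hypothetical example type with a default constructor.
struct Histogram {
    int bins[256];
    Histogram() : bins() {}
};
```

Routing the raw allocation through non-template functions is the design choice that matters here:
the templates may be instantiated in any module, but the memory always comes from and returns to
the same place.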
License
The system is licensed as freeware. The use of particular libraries is governed by the
licenses of their respective owners.
Macros
During development of the system I tried to avoid C++ macros completely. Macros are a
source of errors, because they do not carry any type information. With one exception (the
REFLECTION macro map), we do not use any important macros in the code. According to the
C++98 standard, there is no way to implement this map without a macro.
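To illustrate why a reflection map relies on a macro, the sketch below builds a small table of
member names, types and offsets. The macro REFLECT_FIELD, the structure FieldInfo and the example
class ThresholdSettings are hypothetical and do not reproduce the actual REFLECTION macro map. The
point is that only the preprocessor can turn an identifier into a string, so without a macro each
member name would have to be repeated by hand as a string literal.

```cpp
#include <cstddef>
#include <cstring>

// One entry of the hypothetical reflection map.
struct FieldInfo {
    const char* name;    // member name as written in the source
    const char* type;    // member type as written in the source
    std::size_t offset;  // byte offset of the member inside the object
};

// The stringizing operator (#) exists only in the preprocessor, which is why
// the name of a member cannot be obtained by ordinary C++98 code.
#define REFLECT_FIELD(Class, Type, member) \
    { #member, #Type, offsetof(Class, member) }

// Hypothetical plugin settings (standard layout, so offsetof is well defined).
struct ThresholdSettings {
    int level;
    double gain;
};

// The reflection map of ThresholdSettings.
static const FieldInfo kThresholdFields[] = {
    REFLECT_FIELD(ThresholdSettings, int, level),
    REFLECT_FIELD(ThresholdSettings, double, gain),
};

// Looks a field up by name; returns null if the class has no such field.
inline const FieldInfo* FindField(const FieldInfo* fields, std::size_t count,
                                  const char* name) {
    for (std::size_t i = 0; i < count; ++i)
        if (std::strcmp(fields[i].name, name) == 0) return &fields[i];
    return 0;
}
```

A table like this is enough for generic code, such as persistence or scripting wrappers, to find
a member by name and access it through its offset.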
Language of implementation
When we started to think about the implementation, we considered various languages:
• C++
  ▪ compatible with C
  ▪ native code (almost as fast as C code)
  ▪ support for extended instruction sets (MMX, 3DNow, SSE, etc.)
  ▪ relatively easy implementation of plugins
  ▪ precisely standardized
• Java
  ▪ easy implementation of plugins thanks to built-in reflection
  ▪ not native code, so slow for video processing
  ▪ the performance of calling objects is reasonable outside the image processing cycle,
    but not inside it
• C
  ▪ fast, faster than C++
  ▪ lacks important language features
  ▪ many existing libraries and algorithms (IPL, OpenCV)
• others
  ▪ not so widely used as C++
We chose the C++ language, as performance was the primary goal.
Scripting support
All reflected classes can be automatically exposed for use in different scripting
languages. The user of the system can use a script to quickly implement a script-based
algorithm. We are working on a scripting wrapper for ActiveX scripting objects.
Support for GPU algorithms
We are working on a wrapper for GPU algorithms written in GLSL, HLSL, etc.
Platform independence
The core of the system is written strictly in ANSI C++98, which ensures maximal portability
among platforms. The only thing that goes beyond the guarantees of the standard is the need
for some kind of class identifier. This identifier is obtained by parsing the class name
returned by typeid(T).name() for a particular type T. According to the standard, this method
may return an empty string, but on most existing platforms it returns a reasonably mangled
type name. We then use the function AuxUtils::ParseClassName to parse the class name into a
form that is the same on all platforms. So, the only things we need to reimplement in the
core when compiling for a different platform are the aforementioned function and the
EnumerateModules method, because the latter needs to iterate through the file system to find
and load the dynamically loaded libraries, and this is platform dependent.
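The normalization of type names can be sketched as follows. This is a hypothetical
reimplementation placed in its own namespace, not the code of AuxUtils::ParseClassName. It
assumes two families of compilers: GCC and Clang, where the mangled name is decoded with
abi::__cxa_demangle, and compilers that prefix the name with the class key, as MSVC does. Class
keys nested inside template arguments are not stripped by this sketch.

```cpp
#include <cstdlib>
#include <string>
#include <typeinfo>
#if defined(__GNUG__)
#include <cxxabi.h>
#endif

namespace sketch {

// Hypothetical normalization of the implementation-defined typeid name.
inline std::string ParseClassName(const char* raw) {
    std::string name(raw);
#if defined(__GNUG__)
    // GCC and Clang return a mangled name; decode it into readable form.
    int status = 0;
    char* demangled = abi::__cxa_demangle(raw, nullptr, nullptr, &status);
    if (status == 0 && demangled) name = demangled;
    std::free(demangled);
#endif
    // Some compilers prefix the name with the class key, e.g. "class Foo".
    const char* const keys[] = { "class ", "struct ", "enum ", "union " };
    for (int i = 0; i < 4; ++i) {
        const std::string key(keys[i]);
        if (name.compare(0, key.size(), key) == 0) {
            name.erase(0, key.size());
            break;
        }
    }
    return name;
}

// Returns the normalized identifier of type T.
template <typename T>
std::string ClassId() { return ParseClassName(typeid(T).name()); }

}  // namespace sketch

// Hypothetical plugin class used to try the function out.
namespace demo { struct Threshold {}; }
```

With this in place, sketch::ClassId<demo::Threshold>() is intended to yield the same string on
the supported compilers, which makes it usable as a key in saved graphs.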
Pedagogic and didactic aspects
The system is well suited for learning basic image processing and computer vision
algorithms, because it enables the user to “see” what particular algorithms do. It also
simplifies batch processing and the implementation of rather complicated image processing
tasks.
Case studies
In this section we briefly mention some simple scenarios of system usage to illustrate the
cooperation between the system and its plug-ins and their interaction with the user.
• Facial feature tracking in a live stream
The system is in non-interactive mode. The facial feature tracking plug-in (consisting of
particular feature detectors) tracks the features in the live stream data.
• Training a classifier on a group of images
The system is in non-interactive mode. The input file contains the path of the directory
where the training files are located. The training plug-in feeds each file to the classifier.
• Developing a simple driving license code recognition module
The developer implements a class derived from CAlgorithm. The only significant method to
override is Apply, which applies the algorithm to the input buffer. The result is shown in a
simple message box. Various algorithm properties can be published by implementing the
IRefObj interface. The algorithm can load and store its values by implementing the
IPersistence interface.
• Face recognition system as an ISAPI web application
The only thing to develop is the ISAPI web application template; the system is included in
the application as a COM object.
• Console image processing application
The system is in interactive mode and the actions are offered to the user via a console
window. The result of the recognition is stored in a file and/or shown in a stand-alone
window of the OS.
• Cooperation with an avatar system
The system works as a subsystem of the empathic avatar system (see Figure 2).
The avatar is an entity existing in a virtual reality (VR); it is a reflection of the user
in the VR. We proposed an avatar and are now trying to communicate extraverbally in VR. For
this we use a pair of webcams. They capture the facial expression of the user, which is then
decoded and sent to a VRML script, so that the avatar reacts in accordance with the user's
facial expression.
The system handles the input part of the avatar system. It works in non-interactive mode
and uses plugins for feature tracking and detection. While the user communicates with the
avatar, the image processing system monitors the user through a camera device and detects
his or her facial features. The facial expression is calculated from these features, and the
avatar then reacts accordingly.
The system feeds the information about the detected facial feature points either to the
Avatar toolkit, for creating the avatar face, or to the avatar itself, to detect the user's
expression. The part that will be implemented in the image processing system is shown in
Figure 2 in the dashed blue rectangle.
Figure 2 – Avatar system
Results
Figure 3 – A screenshot of the existing VideoForge system
The reader can find more information on the VideoForge webpage –
http://www.sccg.sk/~kubini/videoforge
Future work
The development of the system is not yet finished.
We are also awaiting the new C++ standard, C++0x, which should bring new C++ features such
as concepts and automatic type generation. The reader can find more information in the
references and in the white papers of the ISO C++ standardization committee. There will
probably also be a platform-independent plugin interface and/or implementation.
Features:
- seamless conversion from AVI to frames and back
- batch processing of many images
- storing of previous frames, if needed
- static or dynamic module
References