The RIM Framework for Image Processing
Øyvind Ryan*
Department of Informatics, Group for Digital Signal Processing and Image Analysis,
University of Oslo, P.O Box 1080 Blindern, NO-0316 Oslo, Norway
oyvindry@ifi.uio.no
Abstract. A new design for image processing frameworks is proposed.
The new design addresses high-level abstractions suited for component-based image processing applications, in particular real-time image processing with high performance demands. The RIM framework, an implementation of this design, is reviewed. It is explained how RIM can
be adapted in applications and integrated with other image libraries. It
is also shown how it can be used to confirm some properties of widely
used image formats.
1 Introduction
This paper studies a recently developed image processing framework. The framework is called the Raster Imaging Framework, or RIM. Focus will be on what
may be called dynamic image processing frameworks, since RIM can be placed
in this category. A dynamic image processing framework doesn’t concern itself
with storing results persistently. Rather it is concerned with delivering ephemeral
images, which may be based on an image composition description. The result
is kept in memory until some other software has made use of it. Other software
may be a web server transmitting the result to a client, a program storing the result to file or a graphical user interface displaying the result on screen. Frequent
requests are typical, so memory usage and performance are important factors.
The paper [1] discusses in detail performance of some parts of RIM. To
study RIM’s dynamic image processing capabilities closer, the concept of lazy
evaluation was introduced. Lazy evaluation means to process ephemeral images
piece by piece, such as scanline by scanline, keeping only small parts of the
image in memory at a time. This reduces the working set [2], [3] of the image
processing.
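The idea of lazy evaluation can be sketched in a few lines of C++. This is only a minimal illustration, not RIM code: the names LazyImage, invert and render are hypothetical, and a scanline is assumed to be a plain vector of 8-bit pixels. Each row is computed when the consumer asks for it, so only one scanline is held in memory at a time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical types for illustration; none of these names come from RIM.
using Scanline = std::vector<std::uint8_t>;

// A lazy image knows its dimensions and produces row y only when asked.
struct LazyImage {
    std::size_t width;
    std::size_t height;
    std::function<Scanline(std::size_t)> row;
};

// Wraps a source so that each row is inverted at the moment it is requested.
inline LazyImage invert(LazyImage src) {
    auto inner = src.row;
    src.row = [inner](std::size_t y) {
        Scanline line = inner(y);
        for (auto& px : line) px = static_cast<std::uint8_t>(255 - px);
        return line;
    };
    return src;
}

// Streams the image to a consumer one scanline at a time.
inline void render(const LazyImage& img,
                   const std::function<void(const Scanline&)>& sink) {
    for (std::size_t y = 0; y < img.height; ++y) {
        Scanline line = img.row(y);        // the only row alive at this point
        assert(line.size() == img.width);  // the source must honour its width
        sink(line);
    }
}
```

The sink could write to a file, a socket or a screen buffer; the working set stays at one scanline regardless of image height.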
There is an application-driven need for dynamic image processing libraries.
A typical application is to extract a small section of a large image, and convert
it to another image format. Such applications often come in the form of requests
to a server, in particular a map server. The OpenGIS consortium has established
a standard for map servers, called WMS, or Web Map Service [4]. WMS specifies
the behaviour of a service which produces georeferenced maps.
* This project has been sponsored by the Norwegian Research Council, project nr. 160130/V30.
One attempt to categorize image processing frameworks may be the following:
– Some address issues like software reusability through emphasis on image processing algorithm generics (templates). These contain independent building
blocks, and the user can restrict the use of blocks to only the ones he needs.
An example of this kind is the Vigra Computer Vision Library [5].
– Some libraries, like Java’s image processing library [6], attempt to be as
general as possible. The user has access to rich functionality, even if he may
only be interested in a small part of it.
RIM stands somewhere between these types. It does not attempt to be a set
of loosely coupled general purpose algorithms, although parts of it may be extracted as a template library. It does not attempt to be a fully featured image
processing framework either. It is a small set of high-level interfaces targeting
component-oriented usage. The interfaces offer general image operations, particularly transcoding between widely used formats. These operations are presented
to the user in the form of an Image Algebra, a set of functions that compute new
images from other images. The image operations are polymorphic
with respect to the concrete image formats.
2 The RIM core
The core of the RIM framework is implemented in C++, see the public header file [7].
It is at an experimental stage, so not all parts of it have been optimised
or tested. RIM does not link with other image processing libraries, and support
for some image standards has been implemented from scratch. This was done
in order to support optimisations for lazy evaluation and runlength-based image
processing.
The interface to the RIM framework is inspired by Microsoft’s Component
Object Model, or COM [8], in order to be programming language independent
and target distributed applications. COM provides a standardized API in the
form of the IUnknown interface, and all RIM functionality is based on COM
interfaces offering this interface. Although the interfaces were implemented in
C++, the COM interop system of .NET [9] can use these interfaces with its
garbage collector to achieve a smooth integration with languages based on the CLI
(Common Language Infrastructure), such as C#. C++ and Java are the languages
currently supported by RIM. The Java interfaces are given in [10]. Interface
naming conventions in this paper follow those in [10], with the exception that a
class prefix is dropped.
One advantage of an interface-based API is that one can hide implementation strategies. Different image formats can for instance utilize data representations in different domains during image processing, with the details of the domains
and when they are chosen being completely hidden from the application developer.
The most widely used image processing frameworks assume that a raster representation is used. RIM takes this further by using both runlength-based and
raster-based internal representations [1], the choice depending on the image format. Some image formats and operations may be most efficiently processed when
a runlength-based internal representation is used, and [1] exploits this in terms of
image transcoding. It was shown that more efficient processing is obtained when
the input and output formats can efficiently convert between runlength representations and compressed data. GIF and bi-level TIFF were used as examples
for such formats. One can utilize other internal representations also. Operating
directly in the wavelet domain is for instance known to be more efficient for
certain operations [11].
The high-level abstractions of RIM make it suitable for use as a dynamic
image library in a web application setting. RIM has been integrated with an XML
interpreter, where different XML elements correspond to different RIM interface
methods. Example XML files can be found in [12]. The XML interpreter has
been integrated with an Image Server component [1] for prolonging the life span
of ephemeral images likely to be used in the future. The Image Server is designed
to host image requests for a map server, so it can be seen as an analogy to WMS.
It is reviewed in section 4.1.
2.1 RIM main interfaces
Certain interfaces are of particular importance in RIM. The most fundamental interface is Image, which is the interface abstraction of an image format’s
read-only view to the image data. The Image interface contains methods for
retrieving common image characteristics, like dimensions. An Image can offer
other interfaces also, reflecting different aspects of the underlying image data.
It may for instance be that the underlying image data is actually vector data.
The VectorSource interface is then also offered. This offers vector-based methods, like functionality for processing objects like text, circles and lines. The
ColoredImage interface is offered if a colour image is used. Interface inheritance
relationships are summed up in figure 1.
[Figure 1: diagram relating the interfaces Image, Layer, ColoredImage and PaletteImage to Handle, LayerHandle, ColoredImageHandle and CompositeLayerHandle.]
Fig. 1. Image and Handle hierarchies. Solid lines represent inheritance. Dashed
lines represent ways of producing objects of the given types.
The Image interface provides a method for producing references to its image data. The references represent the context of image traversal, and are used
when iterating over a (potentially) compressed image. Many references can be created. Each can traverse the image data independently, thereby supporting concurrent image processing. The references offer the interface Handle (figure 1),
which supports functionality like rendering image data to an output buffer. In
Java, there is a similar duality between the classes Graphics and Image. In the
RIM implementation, classes implement these interfaces on a per-format basis.
The inheritance hierarchy in figure 1, together with internal processing domains
shared by many image formats, offers possibilities for code reuse. This is reflected
in the relatively small code footprint achieved by RIM: the entire RIM DLL is
only about 450 kB when compiled on Win32 platforms.
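The separation between read-only image data and traversal context can be illustrated as follows. This is a hedged sketch with hypothetical classes that merely borrow the names Image and Handle; the real RIM interfaces are COM-style and are documented in [7] and [10]. Here the image is an uncompressed in-memory raster, and each Handle keeps its own row cursor so that several traversals can proceed independently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the Image/Handle duality; not the RIM API.
class Handle;

// Read-only view of the pixel data.
class Image {
public:
    Image(std::size_t width, std::size_t height, std::vector<std::uint8_t> pixels)
        : width_(width), height_(height), pixels_(std::move(pixels)) {
        assert(pixels_.size() == width_ * height_);
    }
    std::size_t width() const { return width_; }
    std::size_t height() const { return height_; }
    Handle open() const;  // creates a new, independent traversal context
private:
    friend class Handle;
    std::size_t width_, height_;
    std::vector<std::uint8_t> pixels_;
};

// Traversal context: holds a cursor, never a copy of the image data.
class Handle {
public:
    explicit Handle(const Image& image) : image_(&image) {}
    // Copies the next scanline into out; returns false once all rows are read.
    bool next_row(std::vector<std::uint8_t>& out) {
        if (row_ == image_->height_) return false;
        auto first = image_->pixels_.begin() +
                     static_cast<std::ptrdiff_t>(row_ * image_->width_);
        out.assign(first, first + static_cast<std::ptrdiff_t>(image_->width_));
        ++row_;
        return true;
    }
private:
    const Image* image_;
    std::size_t row_ = 0;
};

inline Handle Image::open() const { return Handle(*this); }
```

Because a Handle only reads from the shared Image, two handles can be advanced from different threads without synchronisation in this simplified setting.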
Handle objects need not arise as references to image data. They can also
arise from Font objects, which represent textual data. Handle objects can also
be constructed from Image Algebra operations.
2.2 Image Algebra
Map images typically consist of a number of bi-level layers placed together. RIM
supports bi-level images in the following way: If an Image originates from a
bi-level file, it will offer the Layer interface. Layer objects support boolean operations. A particularly important boolean operation is image difference. Boolean
operations are part of an important category, called Image Algebra operations.
Image Algebra operations produce Handle objects from existing Handle objects.
Other examples of Image Algebra operations supported in RIM are
– scaling, which produces a scaled Handle object,
– rotation,
– clipping,
– duplication,
– combining a set of Handle objects in a given Z-order,
– separating a colour-indexed image into LayerHandle objects,
– inversion (switching foreground and background in a bi-level image).
Image Algebra operations can be combined recursively to form a tree, for instance
by taking image difference of scaled or rotated Handles. In such a tree, leaf nodes
would correspond to what one may call atomic Handles. These include Handles
which are references to image data. An Image Algebra tree using some of the
listed operations is shown in figure 2. Note that Image Algebra operands can
refer to either image data or vector data, opening up for applications to hybrid
formats like SVG [13].
The common factor for Image Algebra operations is that new Handle objects
are created. How this is done is up to the implementation, but it is recommended
that this be done without creating new image data segments. RIM is implemented with this
in mind, for instance by performing operations like scaling and image differences
with only small parts of the image loaded at any time.
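A tree of Image Algebra operations that is evaluated without materialising intermediate images might look like the sketch below. It is an illustration under simplifying assumptions, not the RIM implementation: the class names Node, Leaf, Difference and Upscale are hypothetical, images are bi-level, leaf data is held as uncompressed rows, and scaling is restricted to nearest-neighbour enlargement by an integer factor.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

using Row = std::vector<bool>;  // one bi-level scanline, true = foreground

// Every node of the tree can produce any of its rows on demand.
struct Node {
    virtual ~Node() = default;
    virtual std::size_t width() const = 0;
    virtual std::size_t height() const = 0;
    virtual Row row(std::size_t y) const = 0;
};
using NodePtr = std::shared_ptr<const Node>;

// Atomic node: refers to (here: owns) rectangular image data.
class Leaf : public Node {
public:
    explicit Leaf(std::vector<Row> rows) : rows_(std::move(rows)) {}
    std::size_t width() const override { return rows_.empty() ? 0 : rows_.front().size(); }
    std::size_t height() const override { return rows_.size(); }
    Row row(std::size_t y) const override { return rows_.at(y); }
private:
    std::vector<Row> rows_;
};

// Image difference: foreground in a but not in b, on the common area.
class Difference : public Node {
public:
    Difference(NodePtr a, NodePtr b) : a_(std::move(a)), b_(std::move(b)) {}
    std::size_t width() const override { return std::min(a_->width(), b_->width()); }
    std::size_t height() const override { return std::min(a_->height(), b_->height()); }
    Row row(std::size_t y) const override {
        Row ra = a_->row(y), rb = b_->row(y), out(width());
        for (std::size_t x = 0; x < out.size(); ++x) out[x] = ra[x] && !rb[x];
        return out;
    }
private:
    NodePtr a_, b_;
};

// Nearest-neighbour enlargement by an integer factor.
class Upscale : public Node {
public:
    Upscale(NodePtr src, std::size_t factor) : src_(std::move(src)), factor_(factor) {
        assert(factor_ >= 1);
    }
    std::size_t width() const override { return src_->width() * factor_; }
    std::size_t height() const override { return src_->height() * factor_; }
    Row row(std::size_t y) const override {
        Row in = src_->row(y / factor_), out(width());
        for (std::size_t x = 0; x < out.size(); ++x) out[x] = in[x / factor_];
        return out;
    }
private:
    NodePtr src_;
    std::size_t factor_;
};
```

Requesting one output row touches one row of each operand on the path to the leaves, so no new image data segment is created for the intermediate difference.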
If an intermediate Image Algebra result is reused more than once, it may be
desirable to precalculate the Image Algebra to avoid performing repeated Image
Algebra. A method in the Handle interface offers this functionality, and creates
a compressed in-memory representation of the Image Algebra tree. The format
used for this representation is at the discretion of RIM, and different formats
are used for different image content: TIFF G4 is used for bi-level images, a
proprietary format is used for vector data. JPEG2000 is a natural candidate for
colour images.
[Figure 2: tree with root CompositeLayerHandle (combining images) over Handle nodes and textual data; lower nodes are labelled scaling, rotation, Handle and Image Reference.]
Fig. 2. A typical Image Algebra tree, in which one node represents image difference. Text data
is placed on top of the image layers.
The functionality for compressing to an in-memory representation, along with
the other Image Algebra operations, constitutes a rather complete set of image
operations. Performing Image Algebra raises a string of performance issues, like
how Image Algebra trees can be transformed into equivalent trees more suitable
for processing. RIM implements several such optimisations.
2.3 I/O support in the RIM framework
RIM supports GIF, BMP and TIFF input. TIFF input is analysed in [1], where
the focus is on TIFF G4 [14]. An API method exists which creates an Image
object from file name and file type identifiers. Depending on the image type, this
object may offer any of the interfaces already discussed.
RIM supports GIF, TIFF, lossless JPEG2000, JPEG and PNG output. The
PNG implementation is based on the libpng reference library [15]. The RIM
framework supports different types of output through a method taking an output format identifier as parameter. This method creates an object offering the
Renderer interface. The Renderer interface has a method which, for a selected
image region, incrementally renders compressed output to a buffer. The method
signature is similar to the read methods of java.io.InputStream classes in
Java: one parameter indicates the size of the buffer to fill, and another
parameter indicates the number of bytes actually written to it. Such a method signature frees us from the underlying file system: the output buffer can for instance
be drained onto a network connection, enabling integration with web servers.
Another advantage is natural support for splitting output into
logical units, since the method can produce output in parts. Logical units for different image formats could be blocks (used by GIF), chunks (PNG) or packets
(JPEG2000). Java also uses InputStreams for image processing purposes, for
instance for the deflate compression algorithm.
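The shape of such a read-style rendering method can be sketched as follows. The class RowRenderer and the function drain are hypothetical, and the "compression" step is the identity (rows are copied unchanged) so that the buffering logic stays visible; a real renderer would encode each row or block before handing out bytes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical renderer; the name and signature are assumptions, not RIM's.
class RowRenderer {
public:
    explicit RowRenderer(std::vector<std::vector<std::uint8_t>> rows)
        : rows_(std::move(rows)) {}

    // Writes at most `capacity` bytes to `buffer` and stores the number of
    // bytes written in `produced`. Returns false when nothing was written,
    // which happens once the output is exhausted (or when capacity is zero).
    bool read(std::uint8_t* buffer, std::size_t capacity, std::size_t& produced) {
        assert(buffer != nullptr || capacity == 0);
        produced = 0;
        while (produced < capacity) {
            if (pending_pos_ == pending_.size()) {
                if (next_row_ == rows_.size()) break;  // no more input rows
                pending_ = rows_[next_row_++];         // "encode" the next row
                pending_pos_ = 0;
                continue;                              // also skips empty rows
            }
            std::size_t n = std::min(capacity - produced,
                                     pending_.size() - pending_pos_);
            std::copy_n(pending_.begin() + static_cast<std::ptrdiff_t>(pending_pos_),
                        n, buffer + produced);
            pending_pos_ += n;
            produced += n;
        }
        return produced > 0;
    }

private:
    std::vector<std::vector<std::uint8_t>> rows_;  // stand-in for a lazy source
    std::size_t next_row_ = 0;
    std::vector<std::uint8_t> pending_;            // encoded bytes not yet handed out
    std::size_t pending_pos_ = 0;
};

// Drains the renderer through a small fixed-size buffer, as a server might
// do when copying output onto a network connection.
inline std::vector<std::uint8_t> drain(RowRenderer& renderer, std::size_t chunk) {
    std::vector<std::uint8_t> out, buf(chunk);
    std::size_t n = 0;
    while (renderer.read(buf.data(), buf.size(), n))
        out.insert(out.end(), buf.begin(), buf.begin() + static_cast<std::ptrdiff_t>(n));
    return out;
}
```

The caller chooses the buffer size, so output can be forwarded in whatever unit suits the transport, independently of how the encoder groups its data internally.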
Prior to rendering compressed output, one must restrict compression to a
concrete region, and the Handles to render must be added. Handles which are
results of Image Algebra expressions are typically added, and the order they are
added dictates the Z-order. A typical application can have a colour image or a
set of bi-level images as background, and have text fragments or small bitmap
images anchored at designated positions. Bitmaps may be used to represent
some kind of user interaction (like zoom or pan), so this could constitute a user
interface. Example XML is listed below:
<?xml version="1.0" encoding="UTF-8" ?>
<visalg>
  <coloredsection color="beffe9">
    <file x0="0" y0="0" laysf="1" name="l1.tif" format="3"/>
  </coloredsection>
  <coloredsection color="ffd1bf">
    <file x0="0" y0="0" laysf="1" name="l11.tif" format="3"/>
  </coloredsection>
  <coloredsection color="000000" static="true">
    <text height="16" width="8" text="Test" x0="10" y0="70"/>
  </coloredsection>
  <coloredsection color="00ff00" static="true">
    <file x0="10" y0="40" laysf="1" name="rimtool.bmp" format="2"/>
  </coloredsection>
</visalg>
When RIM’s XML interpreter processes this, two TIFF layers overlaid with a
black text segment and a green bitmap will be produced (figure 3).
3 Applications of the RIM framework
A useful and simple application of RIM is layer separation. One of the dashed arrows in figure 1 represents layer separation, so that occurrences of a single colour
in a colour-indexed image may be obtained as a dedicated object. This object
can be compressed to an in-memory representation, which may be desirable to
avoid repeated colour separation.
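Layer separation itself is conceptually simple, as the following sketch shows. The function names are hypothetical and the input is assumed to be an uncompressed array of 8-bit palette indices; RIM would typically operate on scanlines or runs rather than on a whole raster.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns a bi-level layer that is true wherever the palette index equals
// `colour`. Hypothetical helper, not part of the RIM API.
inline std::vector<bool> separate_layer(const std::vector<std::uint8_t>& indices,
                                        std::uint8_t colour) {
    std::vector<bool> layer(indices.size());
    for (std::size_t i = 0; i < indices.size(); ++i)
        layer[i] = (indices[i] == colour);
    return layer;
}

// Separates all palette entries in a single pass over the image.
inline std::vector<std::vector<bool>> separate_all(
        const std::vector<std::uint8_t>& indices, std::size_t palette_size) {
    std::vector<std::vector<bool>> layers(
        palette_size, std::vector<bool>(indices.size(), false));
    for (std::size_t i = 0; i < indices.size(); ++i) {
        assert(indices[i] < palette_size);  // every index must be in the palette
        layers[indices[i]][i] = true;
    }
    return layers;
}
```

Caching the result of such a separation corresponds to the in-memory compression step described above: the scan over the colour-indexed data is done once, and later requests reuse the bi-level layers.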
Performance results are here obtained for different image formats using RIM.
The image formats used are GIF, PNG and lossless JPEG2000. GIF
and PNG are perhaps the most widely used formats for exchange of losslessly
compressed images on the World Wide Web, while JPEG2000 is the emerging
standard for both lossy and lossless compression. Measurements use the same
test images as in [1], i.e. two images of different parts of Norway consisting of
19 TIFF G4 bi-level layers. One of these is 7469 × 8886 pixels in size (figure 3).
The test images have tile dimensions of 512 × 512, and tests are performed on
the tiles separately to obtain a high number of tests. XML files written for the
tests are listed in [12].
Fig. 3. Images used in this paper. (a) Output for the XML example listed. (b) Layered image of Lyngen, one of the
test images used in this paper.
3.1 Comparison of performance for different output formats
Fewer clock cycles should be needed when little detail is present
in the image. For RIM, this is verified in the first plot in figure 4, where accumulated runs per line are plotted against clock cycles. Accumulated runs per line [1]
measures the level of image detail by counting the number of runs
per line for all layers. The connection between performance and image detail
is best seen for GIF and PNG. GIF comes out best in terms of performance,
as it has the least complex algorithm. For JPEG2000, two main components
have impact: the embedded block coder (EBCOT) and the Discrete Wavelet
Transform (DWT). The DWT has not been applied in the plot, so the poor
performance of JPEG2000 compared to GIF is due to the complexity
of the embedded block coder. The most expensive part of a PNG compressor is
the matching algorithm part of deflate. If much time is spent matching previous combinations of pixels, compression is improved. The PNG compressor used
here favours compression over performance, which is reflected
in poor performance numbers when compared to GIF.
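The detail measure can be made concrete with a short sketch. The precise definition is given in [1]; the code below assumes that a run is a maximal sequence of equal pixels in a bi-level scanline (background runs included) and that the accumulated value for a line is the sum over all layers. The function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

using Row = std::vector<bool>;  // one bi-level scanline

// Number of maximal runs of equal pixels in a scanline (0 for an empty line).
// Assumption: both foreground and background runs are counted.
inline std::size_t count_runs(const Row& row) {
    if (row.empty()) return 0;
    std::size_t runs = 1;
    for (std::size_t x = 1; x < row.size(); ++x)
        if (row[x] != row[x - 1]) ++runs;  // a colour change starts a new run
    return runs;
}

// Sum of the run counts of the same scanline taken from every layer.
inline std::size_t accumulated_runs(const std::vector<Row>& line_in_each_layer) {
    std::size_t total = 0;
    for (const Row& row : line_in_each_layer) total += count_runs(row);
    return total;
}
```

A line with few colour changes yields a small value, which matches the intuition that such lines are cheap to process in a runlength-based representation.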
It may be that compression of bi-level images is of interest. According to [16]
chapter 16.3, JPEG2000 outperforms GIF when it comes to compression at low
bit-depths, and is comparable to JPEG-LS and TIFF G4 (for bi-level images).
The second plot in figure 4, generated by using just one layer (rich in content)
in the test image, supports this statement.
Fig. 4. Comparison of widely used image formats using RIM. (a) Performance in megacycles (= 10^6
clock cycles). (b) Compression of bi-level file.
3.2 JPEG2000 compression strategies
JPEG2000 is flexible when it comes to techniques which can improve compression. Palette mode can be used for images with a limited number of colours.
Palette-based JPEG2000 can improve compression considerably for two reasons:
First of all, bit-depth and the number of components are reduced. Secondly,
palette-indices can be reorganized. This can be exploited by the JPEG2000 compression algorithm, since the JPEG2000 block coder is bit-plane oriented and
gives higher compression in areas with low bit-plane complexity. Reorganizing
palette indices for some image formats has been exploited in [17]. The figures in
this paper have used a simple palette reorganization, in which the background is
assigned palette index 0, and the next colours are assigned indices in alternating
and increasing order around 0. Both PNG and JPEG2000 support palette mode,
and so does RIM for both these formats. Comparison with and without palette
mode is done in figure 5 for these two file formats.
Both JBIG and JPEG2000 can apply multi-resolution transforms. [16] notes
that the reversible wavelet transform JPEG2000 uses is primarily designed for
continuous-tone imagery. One would therefore expect that compression would
suffer somewhat for our type of images when different resolutions are used. This
is verified in figure 6, where compressed file sizes for zero and one DWT levels are
compared. RIM uses a Config interface for image format specific configuration.
For JPEG2000, this supports setting tile sizes, block sizes, progression order [16]
and the number of DWT levels. The JPEG2000 Config interface is here used to
set the number of DWT levels.
(a) PNG
(b) JPEG2000
Fig. 5. Comparison of compressed file sizes for palette-based and RGB-based
compression. PNG and JPEG2000 are used.
Fig. 6. Comparison of compressed file sizes for JPEG2000 with no DWT and
one level DWT
4 Integration of RIM with other component libraries
RIM can easily be integrated with components like web servers and GUI libraries.
Qt [18] is a C++ class library for writing GUI applications. It has been used
to build the popular open source KDE desktop environment for Unix. Making a
scrollable component with Qt boils down to subclassing the class QScrollView,
and implementing the method drawContents to draw the image contents of
the current part of the image. An example file in [19] sketches how this can be
done using RIM. The RIM framework can also be integrated with Java Swing
components or Java servlets in a similar way. An example file in [19] sketching
this is also listed.
4.1 Integration with the Image Server
The Image Server acts as a cache for frequently accessed files, and as a front-end
to RIM. A typical use of the Image Server is to extract a small part of a large
image on request from a web server. The Image Server ensures that frequently
requested parts are readily available in shared memory. The architecture used
by the image server is shown in figure 7.
[Figure 7: diagram with the components Client, Web server, Image Server, XML interpreter, RIM core and Image Algebra, and the inputs XML files, vector files and image files; the links between Client, Web server and Image Server are labelled HTTP and image data.]
Fig. 7. The Image Server architecture
The Image Server could also be used as a cache for compressed representations of the most commonly used Image Algebra requests. Another possible use
could be to serve as information holder for occurrences of colours within different
parts of the image. Such information can be used in the process of improving
compression.
JPIP [20] is one of the more recent extensions to JPEG2000. It defines a protocol for scalable delivery of JPEG2000 data in client-server systems. Supporting
the JPIP protocol is another possible use of the Image Server.
5 Other work
The Image Server generates images from image description files using XML.
Other dynamic image libraries have also been developed for use in web development settings similarly to RIM. An example is the gd library [21], which has
been integrated with the fly gd command interpreter. Separate gd commands
exist for different drawing primitives, so that image processing can be embedded
in scripting languages. This is similar to the way XML is used by the Image
Server. The RIM API supports the most common drawing primitives, like circles, lines and text, so the RIM XML interpreter supports similar functionality
to gd.
6 Conclusion
A small dynamic image processing library has been demonstrated. It was argued
that the library meets the low-memory demands imposed by dynamic image processing. It was also shown how the library can be used to demonstrate properties
of widely used image standards, and how it can easily be integrated with other
component libraries. It was also demonstrated that RIM can handle different
image formats in a completely transparent manner, and how RIM’s support for
Image Algebra makes it a very general tool.
Results in this paper were obtained with an Intel Pentium M processor with
1600MHz clock speed, L2 cache size of 1MB and 512 MB RAM. All tests were
run under Windows XP, and all programs were compiled with Microsoft Visual
C++.NET 7.1.
Acknowledgement
I give my sincere thanks to Stein Jørgen Ryan for helpful discussions on different
topics presented in this paper.
The work in this paper is partially based on the RIM library from Raster
Imaging AS (www.rasterimaging.com), which provides high performance imaging technologies. The postdoctoral project carried out by Dr. Øyvind Ryan at the
University of Oslo has enhanced this implementation, and added algorithms for
improved performance and scalability with regard to server applications and
memory consumption.
References
1. Ryan, Ø.: Efficient implementations of operations on runlength-represented images. Submitted to the 14th European Signal Processing Conference, Eusipco 2006 (2006)
2. Denning, P.J.: The working set model for program behavior. Communications of the ACM 11 (1968) 323–333
3. Denning, P.J., Schwartz, S.C.: Properties of the working-set model. Communications of the ACM 15 (1972) 191–198
4. Open Geospatial Consortium Inc.: WMS specification. www.opengis.org. (2006)
5. Köthe, U.: The Vigra computer vision library. kogs-www.informatik.uni-hamburg.de/~koethe/vigra/. (2005)
6. Sun Microsystems: Java Image I/O API. java.sun.com/j2se/1.4.2/docs/guide/imageio/. (2002)
7. Raster Imaging AS: RIM framework C++ header file. www.ifi.uio.no/~oyvindry/rim/rim.h. (2006)
8. Microsoft: COM. www.microsoft.com/com/default.mspx. (2006)
9. Löwy, J.: Programming .NET Components, 2nd Edition. O’Reilly Media (2005)
10. Raster Imaging: javadoc for RIM. www.ifi.uio.no/~oyvindry/rim/javadoc/. (2006)
11. Drori, I., Lischinski, D.: Fast multiresolution image operations in the wavelet domain. IEEE Transactions on Visualization and Computer Graphics 9 (2003) 395–412
12. Raster Imaging AS: Example xml files. www.ifi.uio.no/~oyvindry/rim/. (2006)
13. W3C Consortium: SVG specification. www.w3.org/Graphics/SVG/. (2006)
14. CCITT: Recommendation T.6. Facsimile Coding Schemes and Coding Control Functions for Group 4 Facsimile Apparatus. (1985)
15. libpng.org: libpng, reference library for reading and writing PNG. www.libpng.org. (2001)
16. Taubman, D.S., Marcellin, M.W.: JPEG2000. Image compression. Fundamentals, standards and practice. Kluwer Academic Publishers (2002)
17. Zeng, W., Li, J., Lei, S.: An efficient color re-indexing scheme for palette-based compression. Proc. IEEE Int. Conf. Image Proc. 3 (2000) 476–479
18. Dalheimer, M.: Programming with Qt (2nd Edition). O’Reilly Media (2002)
19. Raster Imaging AS: RIM framework example files. www.ifi.uio.no/~oyvindry/rim/. (2006)
20. The JPEG Committee: ISO/IEC 15444-9:2005, Information technology - JPEG 2000 image coding system: Interactivity tools, APIs and protocols. (2005)
21. boutell.com: The GD graphics library. www.boutell.com/gd/. (2006)