The RIM Framework for Image Processing

Øyvind Ryan *

Department of Informatics, Group for Digital Signal Processing and Image Analysis, University of Oslo, P.O. Box 1080 Blindern, NO-0316 Oslo, Norway
oyvindry@ifi.uio.no

Abstract. A new design for image processing frameworks is proposed. The design provides high-level abstractions suited for component-based image processing applications, in particular real-time image processing with high performance demands. The RIM framework, an implementation of this design, is described. It is explained how RIM can be adapted in applications and integrated with other image libraries. It is also shown how it can be used to confirm some properties of widely used image formats.

1 Introduction

This paper studies a recently developed image processing framework called the Raster Imaging Framework, or RIM. The focus is on what may be called dynamic image processing frameworks, since RIM can be placed in this category. A dynamic image processing framework does not concern itself with storing results persistently. Rather, it is concerned with delivering ephemeral images, which may be based on an image composition description. The result is kept in memory until some other software has made use of it. This other software may be a web server transmitting the result to a client, a program storing the result to file, or a graphical user interface displaying the result on screen. Frequent requests are typical, so memory usage and performance are important factors. The paper [1] discusses the performance of some parts of RIM in detail.

To study RIM’s dynamic image processing capabilities more closely, the concept of lazy evaluation is introduced. Lazy evaluation means processing ephemeral images piece by piece, such as scanline by scanline, keeping only small parts of the image in memory at a time. This reduces the working set [2], [3] of the image processing.

There is an application-driven need for dynamic image processing libraries.
A typical application is to extract a small section of a large image and convert it to another image format. Such applications often come in the form of requests to a server, in particular a map server. The OpenGIS consortium has established a standard for map servers, called WMS, or Web Map Server [4]. WMS specifies the behaviour of a service which produces georeferenced maps.

* This project has been sponsored by the Norwegian Research Council, project nr. 160130/V30.

One attempt to categorize image processing frameworks may be the following:

– Some address issues like software reusability through emphasis on image processing algorithm generics (templates). These contain independent building blocks, and the user can restrict the use of blocks to only the ones he needs. An example of this kind is the Vigra Computer Vision Library [5].
– Some libraries, like Java’s image processing library [6], attempt to be as general as possible. The user has access to rich functionality, even if he may only be interested in a small part of it.

RIM stands somewhere between these types. It does not attempt to be a set of loosely coupled general purpose algorithms, although parts of it may be extracted as a template library. It does not attempt to be a fully featured image processing framework either. It is a small set of high-level interfaces targeting component-oriented usage. The interfaces offer general image operations, particularly transcoding between widely used formats. These operations are presented to the user in abstract form as an Image Algebra, a set of functions which compute new images from other images. The image operations are polymorphic with respect to the concrete image formats.

2 The RIM core

The core of the RIM framework is implemented in C++, see the public header file [7]. It is at an experimental stage, so not all parts of it have been optimised or tested.
RIM does not link with other image processing libraries, and support for some image standards has been implemented from scratch. This was done in order to support optimisations for lazy evaluation and runlength-based image processing.

The interface to the RIM framework is inspired by Microsoft’s Component Object Model, or COM [8], in order to be programming language independent and to target distributed applications. COM provides a standardized API in the form of the IUnknown interface, and all RIM functionality is based on COM interfaces offering this interface. Although the interfaces were implemented in C++, the COM interop system of .NET [9] can use these interfaces with its garbage collector to achieve a smooth integration with languages based on CLI (Common Language Infrastructure), such as C#. C++ and Java are the languages currently supported by RIM. The Java interfaces are given in [10]. Interface naming conventions in this paper follow those in [10], with the exception that a class prefix is dropped.

One advantage of an interface-based API is that one can hide implementation strategies. Different image formats can for instance utilize data representations in different domains during image processing, with the details of the domains and when they are chosen being completely hidden from the application developer. The most widely used image processing frameworks assume that a raster representation is used. RIM takes this further by using both runlength-based and raster-based internal representations [1], the choice depending on the image format. Some image formats and operations may be most efficiently processed when a runlength-based internal representation is used, and [1] exploits this in terms of image transcoding. It was shown that more efficient processing is obtained when the input and output formats can efficiently convert between runlength representations and compressed data. GIF and bi-level TIFF were used as examples of such formats.
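As an illustration of the runlength-based representation discussed above, the following minimal Python sketch converts a single bi-level scanline to a list of foreground runs and back. The names `encode_runs` and `decode_runs` and the (start, length) layout are assumptions made for this example only; they are not taken from RIM or from [1].

```python
from typing import List, Tuple

Run = Tuple[int, int]  # (start column, length) of one foreground run


def encode_runs(scanline: List[int]) -> List[Run]:
    """Return the foreground runs of a bi-level scanline (1 = foreground)."""
    runs: List[Run] = []
    start = None  # column where the current run began, if any
    for x, pixel in enumerate(scanline):
        if pixel and start is None:
            start = x                        # a new run begins
        elif not pixel and start is not None:
            runs.append((start, x - start))  # the current run ends
            start = None
    if start is not None:                    # run reaching the right edge
        runs.append((start, len(scanline) - start))
    return runs


def decode_runs(runs: List[Run], width: int) -> List[int]:
    """Expand a run list back into a raster scanline of the given width."""
    line = [0] * width
    for start, length in runs:
        line[start:start + length] = [1] * length
    return line


scanline = [0, 1, 1, 1, 0, 0, 1, 1]
runs = encode_runs(scanline)
print(runs)
```

A scanline with little detail yields few runs, which is why operations on run lists can be cheaper than operations on individual pixels for map-like bi-level layers.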
Other internal representations can also be used. Operating directly in the wavelet domain is for instance known to be more efficient for certain operations [11].

The high-level abstractions of RIM make it suitable for use as a dynamic image library in a web application setting. RIM has been integrated with an XML interpreter, where different XML elements correspond to different RIM interface methods. Example XML files can be found in [12]. The XML interpreter has been integrated with an Image Server component [1] for prolonging the life span of ephemeral images likely to be used in the future. The Image Server is designed to host image requests for a map server, so it can be seen as an analogy to WMS. It is reviewed in section 4.1.

2.1 RIM main interfaces

Certain interfaces are of particular importance in RIM. The most fundamental interface is Image, which is the interface abstraction of an image format’s read-only view of the image data. The Image interface contains methods for retrieving common image characteristics, like dimensions. An Image can offer other interfaces also, reflecting different aspects of the underlying image data. The underlying image data may for instance actually be vector data. The VectorSource interface is then also offered. This offers vector-based methods, like functionality for processing objects such as text, circles and lines. The ColoredImage interface is offered if a colour image is used. Interface inheritance relationships are summed up in figure 1.

[Figure 1: diagram with the nodes Image, Handle, Layer, LayerHandle, ColoredImage, ColoredImageHandle, PaletteImage and CompositeLayerHandle.]

Fig. 1. Image and Handle hierarchies. Solid line represents inheritance. Dashed line represents ways of producing objects of the given types.
The Image interface provides a method for producing references to its image data. The references represent the context of image traversal, and are used when iterating over a (potentially) compressed image. Many references can be created. Each can traverse the image data independently, thereby supporting concurrent image processing. The references offer the interface Handle (figure 1), which supports functionality like rendering image data to an output buffer. In Java, there is a similar duality between the classes Graphics and Image.

In the RIM implementation, classes implement these interfaces on a per-format basis. The inheritance hierarchy in figure 1, together with internal processing domains shared by many image formats, offers possibilities for code reuse. This is reflected in the relatively small code footprint achieved by RIM: The entire RIM DLL is only about 450 kB when compiled on Win32 platforms.

Handle objects need not arise as references to image data. They can also arise from Font objects, which represent textual data. Handle objects can also be constructed from Image Algebra operations.

2.2 Image Algebra

Map images typically consist of a number of bi-level layers placed together. RIM supports bi-level images in the following way: If an Image originates from a bi-level file, it will offer the Layer interface. Layer objects support boolean operations. A particularly important boolean operation is image difference. Boolean operations are part of an important category, called Image Algebra operations. Image Algebra operations produce Handle objects from existing Handle objects. Other examples of Image Algebra operations supported in RIM are

– scaling, which produces a scaled Handle object,
– rotation,
– clipping,
– duplication,
– combining a set of Handle objects in a given Z-order,
– separating a colour-indexed image into LayerHandle objects,
– inversion (switching foreground and background in a bi-level image).
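Two of the boolean operations above, image difference and inversion, can be sketched for one scanline of a bi-level layer held as a sorted list of non-overlapping (start, length) runs. This is a minimal Python sketch under that assumption; the function names are hypothetical and the code is not RIM’s implementation.

```python
from typing import List, Tuple

Run = Tuple[int, int]  # (start column, length), sorted and non-overlapping


def difference(a: List[Run], b: List[Run]) -> List[Run]:
    """Runs of pixels that are foreground in a but not in b."""
    result: List[Run] = []
    j = 0
    for start, length in a:
        pos, end = start, start + length
        # Skip runs of b that end before this run of a begins.
        while j < len(b) and b[j][0] + b[j][1] <= pos:
            j += 1
        k = j
        # Cut away every run of b that overlaps [pos, end).
        while k < len(b) and b[k][0] < end:
            b_start, b_end = b[k][0], b[k][0] + b[k][1]
            if b_start > pos:
                result.append((pos, b_start - pos))
            pos = max(pos, b_end)
            k += 1
        if pos < end:
            result.append((pos, end - pos))
    return result


def invert(runs: List[Run], width: int) -> List[Run]:
    """Swap foreground and background on a scanline of the given width."""
    result: List[Run] = []
    pos = 0
    for start, length in runs:
        if start > pos:
            result.append((pos, start - pos))
        pos = start + length
    if pos < width:
        result.append((pos, width - pos))
    return result


layer_a = [(0, 10), (20, 5)]
layer_b = [(2, 3), (8, 14)]
print(difference(layer_a, layer_b))
print(invert([(1, 3), (6, 2)], 8))
```

Both functions touch each run once, so the cost per scanline grows with the number of runs rather than with the image width.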
Image Algebra operations can be combined recursively to form a tree, for instance by taking the image difference of scaled or rotated Handles. In such a tree, leaf nodes would correspond to what one may call atomic Handles. These include Handles which are references to image data. An Image Algebra tree using some of the listed operations is shown in figure 2. Note that Image Algebra operands can refer to either image data or vector data, which makes applications to hybrid formats like SVG [13] possible.

[Figure 2: tree diagram with a CompositeLayerHandle root (combining images), Handle nodes produced by scaling and rotation, textual data, and Image Reference leaves.]

Fig. 2. A typical Image Algebra tree. [symbol] represents image difference. Text data is placed on top of the image layers.

The common factor for Image Algebra operations is that new Handle objects are created. How this is done is up to the implementation, but it is recommended that it be done without creating new image data segments. RIM is implemented with this in mind, for instance by performing operations like scaling and image differences with only small parts of the image loaded at any time.

If an intermediate Image Algebra result is reused more than once, it may be desirable to precalculate the Image Algebra to avoid performing repeated Image Algebra. A method in the Handle interface offers this functionality, and creates a compressed in-memory representation of the Image Algebra tree. The format used for this representation is at the discretion of RIM, and different formats are used for different image content: TIFF G4 is used for bi-level images, and a proprietary format is used for vector data. JPEG2000 is a natural candidate for colour images. The functionality for compressing to an in-memory representation, along with the other Image Algebra operations, constitutes a rather complete set of image operations.
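The idea of an Image Algebra tree evaluated with only small parts of the image loaded at a time can be sketched as follows. The classes `Source`, `Scale` and `Difference` and the function `render` are hypothetical names chosen for this Python illustration; they stand in for Handles but do not reproduce the RIM interfaces. Each node computes one scanline on demand from its children, so no intermediate image is ever materialised.

```python
from typing import Iterator, List


class Node:
    """Base of the tree: every node can produce any one of its scanlines."""
    width: int
    height: int

    def scanline(self, y: int) -> List[int]:
        raise NotImplementedError


class Source(Node):
    """Leaf node; stands in for a reference to stored image data."""

    def __init__(self, rows: List[List[int]]):
        self.rows = rows
        self.height = len(rows)
        self.width = len(rows[0])

    def scanline(self, y: int) -> List[int]:
        return self.rows[y]


class Scale(Node):
    """Nearest-neighbour upscaling by an integer factor."""

    def __init__(self, child: Node, factor: int):
        self.child, self.factor = child, factor
        self.width = child.width * factor
        self.height = child.height * factor

    def scanline(self, y: int) -> List[int]:
        src = self.child.scanline(y // self.factor)
        return [p for p in src for _ in range(self.factor)]


class Difference(Node):
    """Bi-level image difference: foreground in left but not in right."""

    def __init__(self, left: Node, right: Node):
        self.left, self.right = left, right
        self.width = min(left.width, right.width)
        self.height = min(left.height, right.height)

    def scanline(self, y: int) -> List[int]:
        a, b = self.left.scanline(y), self.right.scanline(y)
        return [pa & (1 - pb) for pa, pb in zip(a, b)]


def render(node: Node) -> Iterator[List[int]]:
    """Yield the result one scanline at a time (lazy evaluation)."""
    for y in range(node.height):
        yield node.scanline(y)


small = Source([[1, 0], [1, 1]])
mask = Source([[0, 0, 1, 1], [0, 0, 0, 0], [1, 1, 1, 1], [0, 1, 0, 1]])
tree = Difference(Scale(small, 2), mask)
rows = list(render(tree))
```

Precalculating a subtree, as described above, would correspond to replacing that subtree by a single leaf holding its stored result.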
Performing Image Algebra raises a string of performance issues, like how Image Algebra trees can be transformed into equivalent trees more suitable for processing. RIM implements several such optimisations.

2.3 I/O support in the RIM framework

RIM supports GIF, BMP and TIFF input. TIFF input is analysed in [1], where the focus is on TIFF G4 [14]. An API method exists which creates an Image object from file name and file type identifiers. Depending on the image type, this object may offer any of the interfaces already discussed.

RIM supports GIF, TIFF, lossless JPEG2000, JPEG and PNG output. The PNG implementation is based on the libpng reference library [15]. The RIM framework supports different types of output through a method taking an output format identifier as parameter. This method creates an object offering the Renderer interface. The Renderer interface has a method which, for a selected image region, incrementally renders compressed output to a buffer. The method signature is similar to the read-methods of java.io.InputStream classes in Java: One parameter gives the size of the buffer to read into, and another returns the number of bytes actually read. Such a method signature frees us from the underlying file system: The output buffer can for instance be drained onto a network connection, enabling integration with web servers. Another advantage is that one gets natural support for splitting output into logical units, since the method can produce output in parts. Logical units for different image formats could be blocks (used by GIF), chunks (PNG) or packets (JPEG2000). Java also uses InputStreams for image processing purposes, for instance for the deflate compression algorithm.

Prior to rendering compressed output, one must restrict compression to a concrete region, and the Handles to render must be added. Handles which are results of Image Algebra expressions are typically added, and the order in which they are added dictates the Z-order.
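The incremental rendering pattern described above can be sketched in Python. The class below is a hypothetical stand-in for the Renderer interface, not the RIM API: it pulls scanlines lazily, compresses them with deflate from Python’s standard zlib module (used here only as a convenient stand-in for a real image format), and fills a caller-supplied buffer, returning the number of bytes written and 0 when the output is exhausted.

```python
import zlib
from typing import Iterable


class Renderer:
    """Incrementally renders compressed output into a caller's buffer."""

    def __init__(self, rows: Iterable[bytes]):
        self._rows = iter(rows)               # scanlines are pulled lazily
        self._compressor = zlib.compressobj()
        self._pending = bytearray()           # compressed bytes not yet handed out
        self._finished = False

    def read(self, buffer: bytearray) -> int:
        """Fill buffer with compressed data; return bytes written (0 at end)."""
        # Compress more scanlines only while the buffer cannot yet be filled.
        while len(self._pending) < len(buffer) and not self._finished:
            row = next(self._rows, None)
            if row is None:
                self._pending += self._compressor.flush()
                self._finished = True
            else:
                self._pending += self._compressor.compress(row)
        n = min(len(buffer), len(self._pending))
        buffer[:n] = self._pending[:n]
        del self._pending[:n]
        return n


# Example data: 32 scanlines of 64 pixels each, alternating 0 and 1.
raw_rows = [bytes([y % 2] * 64) for y in range(32)]
renderer = Renderer(iter(raw_rows))

chunk = bytearray(16)   # small buffer, as when draining to a socket
output = bytearray()
while True:
    n = renderer.read(chunk)
    if n == 0:
        break
    output += chunk[:n]  # a server would write these bytes to the connection
```

Because the caller chooses the buffer size, the same loop could write to a file, a network connection or an in-memory cache without the renderer knowing which.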
A typical application can have a colour image or a set of bi-level images as background, and have text fragments or small bitmap images anchored at designated positions. Bitmaps may be used to represent some kind of user interaction (like zoom or pan), so this could constitute a user interface. Example XML is listed below:

    <?xml version="1.0" encoding="UTF-8" ?>
    <visalg>
      <coloredsection color="beffe9">
        <file x0="0" y0="0" laysf="1" name="l1.tif" format="3"/>
      </coloredsection>
      <coloredsection color="ffd1bf">
        <file x0="0" y0="0" laysf="1" name="l11.tif" format="3"/>
      </coloredsection>
      <coloredsection color="000000" static="true">
        <text height="16" width="8" text="Test" x0="10" y0="70"/>
      </coloredsection>
      <coloredsection color="00ff00" static="true">
        <file x0="10" y0="40" laysf="1" name="rimtool.bmp" format="2"/>
      </coloredsection>
    </visalg>

When RIM’s XML interpreter processes this, two TIFF layers overlaid with a black text segment and a green bitmap will be produced (figure 3).

3 Applications of the RIM framework

A useful and simple application of RIM is layer separation. One of the dashed arrows in figure 1 represents layer separation, so that occurrences of a single colour in a colour-indexed image may be obtained as a dedicated object. This object can be compressed to an in-memory representation, which may be desirable to avoid repeated colour separation.

Performance results are obtained here for different image formats using RIM. The image formats used are GIF, PNG and lossless JPEG2000. GIF and PNG are perhaps the most widely used formats for exchange of losslessly compressed images on the world wide web, while JPEG2000 is the emerging standard for both lossy and lossless compression. Measurements use the same test images as in [1], i.e. two images of different parts of Norway comprising 19 TIFF G4 bi-level layers. One of these is 7469 × 8886 pixels in size (figure 3).
Fig. 3. Images used in this paper. (a) Output for the XML example listed. (b) Layered image of Lyngen, one of the test images used in this paper.

The test images have tile dimensions of 512 × 512, and tests are performed on the tiles separately to obtain a high number of tests. XML files written for the tests are listed in [12].

3.1 Comparison of performance for different output formats

Performance should be better, that is, fewer clock cycles should be needed, when little detail is present in the image. For RIM, this is verified in the first plot in figure 4, where accumulated runs per line is plotted against clock cycles. Accumulated runs per line [1] measures the level of image detail by counting the number of runs per line for all layers. The connection between performance and image detail is best seen for GIF and PNG. GIF comes out best in terms of performance, as it has the least complex algorithm. For JPEG2000, two main components have impact: the embedded block coder (EBCOT), and the Discrete Wavelet Transform (DWT). The DWT has not been applied in the plot, so the poor performance of JPEG2000 as compared to GIF has to do with the complexity of the embedded block coder. The most expensive part of a PNG compressor is the matching algorithm part of deflate. If much time is spent matching previous combinations of pixels, compression is improved. The PNG compressor used here is more concerned with compression than with performance, which is reflected in poor performance numbers when compared to GIF.

Fig. 4. Comparison of widely used image formats using RIM. (a) Performance in megacycles (= 10^6 clock cycles). (b) Compression of bi-level file.

It may be that compression of bi-level images is of interest. According to [16] chapter 16.3, JPEG2000 outperforms GIF when it comes to compression at low bit-depths, and is comparable to JPEG-LS and TIFF G4 (for bi-level images).
The second plot in figure 4, generated by using just one layer (rich in content) in the test image, supports this statement.

3.2 JPEG2000 compression strategies

JPEG2000 is flexible when it comes to techniques which can improve compression. Palette mode can be used for images with a limited number of colours. Palette-based JPEG2000 can improve compression considerably for two reasons: First of all, bit-depth and the number of components are reduced. Secondly, palette indices can be reorganized. This can be exploited by the JPEG2000 compression algorithm, since the JPEG2000 block coder is bit-plane oriented and gives higher compression in areas with low bit-plane complexity. Reorganizing palette indices for some image formats has been exploited in [17]. The figures in this paper have used a simple palette reorganization, in which the background is assigned palette index 0, and the next colours are assigned indices in alternating and increasing order around 0. Both PNG and JPEG2000 support palette mode, and so does RIM for both these formats. Comparison with and without palette mode is done in figure 5 for these two file formats.

Both JBIG and JPEG2000 can apply multi-resolution transforms. [16] notes that the reversible wavelet transform JPEG2000 uses is primarily designed for continuous-tone imagery. One would therefore expect that compression would suffer somewhat for our type of images when different resolutions are used. This is verified in figure 6, where compressed file sizes for zero and one DWT level are compared. RIM uses a Config interface for image format specific configuration. For JPEG2000, this supports setting tile sizes, block sizes, progression order [16] and the number of DWT levels. The JPEG2000 Config interface is here used to set the number of DWT levels.

Fig. 5. Comparison of compressed file sizes for palette-based and RGB-based compression. PNG and JPEG2000 are used. (a) PNG. (b) JPEG2000.

Fig. 6. Comparison of compressed file sizes for JPEG2000 with no DWT and one level DWT.

4 Integration of RIM with other component libraries

RIM can easily be integrated with components like web servers and GUI libraries. Qt [18] is a C++ class library for writing GUI applications. It has been used to build the popular open source KDE desktop environment for Unix. Making a scrollable component with Qt boils down to subclassing the class QScrollView, and implementing the method drawContents to draw the image contents of the current part of the image. An example file in [19] sketches how this can be done using RIM. The RIM framework can also be integrated with Java Swing components or Java servlets in a similar way. An example file sketching this is also listed in [19].

4.1 Integration with the Image Server

The Image Server acts as a cache for frequently accessed files, and as a front-end to RIM. A typical use of the Image Server is to extract a small part of a large image on request from a web server. The Image Server ensures that frequently requested parts are readily available in shared memory. The architecture used by the Image Server is shown in figure 7.

[Figure 7: architecture diagram with the nodes Client, Web server, Image Server, XML interpreter, RIM core (Image Algebra), XML files, Vector files and Image files, with connections labelled HTTP and Image data.]

Fig. 7. The Image Server architecture.

The Image Server could also be used as a cache for compressed representations of the most commonly used Image Algebra requests. Another possible use could be to serve as information holder for occurrences of colours within different parts of the image. Such information can be used in the process of improving compression. JPIP [20] is one of the more recent extensions to JPEG2000. It defines a protocol for scalable delivery of JPEG2000 data in client-server systems. Supporting the JPIP protocol is another possible use of the Image Server.
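The caching role of the Image Server can be illustrated with a small least-recently-used cache keyed by the image request. This Python sketch is an assumption-laden illustration: the class `ImageCache`, the request tuple layout and the `render` callback are hypothetical, the eviction policy is our choice, and the cache lives in process memory rather than in shared memory as in the Image Server.

```python
from collections import OrderedDict
from typing import Callable, Hashable


class ImageCache:
    """Keeps the most recently requested rendered regions in memory."""

    def __init__(self, capacity: int, render: Callable[[Hashable], bytes]):
        self.capacity = capacity
        self.render = render            # called only on a cache miss
        self.entries: "OrderedDict[Hashable, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, request: Hashable) -> bytes:
        if request in self.entries:
            self.entries.move_to_end(request)  # mark as most recently used
            self.hits += 1
            return self.entries[request]
        self.misses += 1
        data = self.render(request)
        self.entries[request] = data
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict least recently used
        return data


def fake_render(request: Hashable) -> bytes:
    """Placeholder for the expensive step: decode, crop and recompress."""
    return repr(request).encode()


cache = ImageCache(capacity=2, render=fake_render)
tile_a = ("map.tif", 0, 0, 512, 512, "gif")      # (file, x, y, w, h, format)
tile_b = ("map.tif", 512, 0, 512, 512, "gif")
tile_c = ("map.tif", 0, 512, 512, 512, "png")

cache.get(tile_a)   # miss: rendered and stored
cache.get(tile_a)   # hit: served from memory
cache.get(tile_b)   # miss
cache.get(tile_c)   # miss: tile_a is evicted as least recently used
```

Including the output format in the key means the same region requested as GIF and as PNG is cached separately, which matches a server that stores compressed results.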
5 Other work

The Image Server generates images from image description files using XML. Other dynamic image libraries have also been developed for use in web development settings similarly to RIM. An example is the gd library [21], which has been integrated with the fly gd command interpreter. Separate gd commands exist for different drawing primitives, so that image processing can be embedded in scripting languages. This is similar to the way XML is used by the Image Server. The RIM API supports the most common drawing primitives, like circles, lines and text, so the RIM XML interpreter supports functionality similar to that of gd.

6 Conclusion

A small dynamic image processing library has been demonstrated. It was argued that the library meets the low-memory demands imposed in dynamic image processing. It was also shown how the library can be used to demonstrate properties of widely used image standards, and how it can easily be integrated with GUI component libraries. It was also demonstrated that RIM can handle different image formats in a completely transparent manner, and how RIM’s support for Image Algebra makes it a very general tool.

Results in this paper were obtained with an Intel Pentium M processor with 1600 MHz clock speed, L2 cache size of 1 MB and 512 MB RAM. All tests were run under Windows XP, and all programs were compiled with Microsoft Visual C++ .NET 7.1.

Acknowledgement

I give my sincere thanks to Stein Jørgen Ryan for helpful discussions on different topics presented in this paper. The work in this paper is partially based on the RIM library from Raster Imaging AS (www.rasterimaging.com), which provides high performance imaging technologies. The postdoctoral project carried out by Dr. Øyvind Ryan at the University of Oslo has enhanced this implementation, and added algorithms for improved performance and scalability with regard to server applications and memory consumption.

References

1. Ryan, Ø.: Efficient implementations of operations on runlength-represented images. Submitted to the 14th European Signal Processing Conference, Eusipco 2006 (2006)
2. Denning, P.J.: The working set model for program behavior. Communications of the ACM 11 (1968) 323–333
3. Denning, P.J., Schwartz, S.C.: Properties of the working-set model. Communications of the ACM 15 (1972) 191–198
4. Open Geospatial Consortium Inc.: WMS specification. www.opengis.org. (2006)
5. Köthe, U.: The Vigra computer vision library. kogs-www.informatik.uni-hamburg.de/~koethe/vigra/. (2005)
6. Sun Microsystems: Java Image I/O API. java.sun.com/j2se/1.4.2/docs/guide/imageio/. (2002)
7. Raster Imaging AS: RIM framework C++ header file. www.ifi.uio.no/~oyvindry/rim/rim.h. (2006)
8. Microsoft: COM. www.microsoft.com/com/default.mspx. (2006)
9. Löwy, J.: Programming .NET Components, 2nd Edition. O’Reilly Media (2005)
10. Raster Imaging: javadoc for RIM. www.ifi.uio.no/~oyvindry/rim/javadoc/. (2006)
11. Drori, I., Lischinski, D.: Fast multiresolution image operations in the wavelet domain. IEEE Transactions on Visualization and Computer Graphics 9 (2003) 395–412
12. Raster Imaging AS: Example xml files. www.ifi.uio.no/~oyvindry/rim/. (2006)
13. W3C Consortium: SVG specification. www.w3.org/Graphics/SVG/. (2006)
14. CCITT: Recommendation T.6. Facsimile Coding Schemes and Coding Control Functions for Group 4 Facsimile Apparatus. (1985)
15. libpng.org: libpng, reference library for reading and writing PNG. www.libpng.org. (2001)
16. Taubman, D.S., Marcellin, M.W.: JPEG2000. Image compression. Fundamentals, standards and practice. Kluwer Academic Publishers (2002)
17. W. Seng, J.L., Lei, S.: An efficient color re-indexing scheme for palette-based compression. Proc. IEEE Int. Conf. Image Proc. 3 (2000) 476–479
18. Dalheimer, M.: Programming with Qt (2nd Edition). O’Reilly Media (2002)
19. Raster Imaging AS: RIM framework example files. www.ifi.uio.no/~oyvindry/rim/. (2006)
20. The JPEG Committee: ISO/IEC 15444-9:2005, Information technology - JPEG 2000 image coding system: Interactivity tools, APIs and protocols. (2005)
21. boutell.com: The GD graphics library. www.boutell.com/gd/. (2006)