The Perfect Image Format

Graduation Thesis of Zhong Qing
University of Applied Sciences Northwestern Switzerland, School of Engineering
qing.zhong@students.fhnw.ch
Professor: Prof. Dr. Christoph Stamm
November 30, 2006

ABSTRACT. Progressive Graphics File (PGF) is a very efficient image format for natural pictures, based on the discrete wavelet transform and various coding techniques. Portable Network Graphics (PNG) is a well-known image format and a successor of GIF; it is well suited to artificial or computer-generated images (CGIs). This project focuses on designing a predictor function which decides whether to choose PGF or PNG as the underlying encoding engine for any given image of any type. The aim of the project is to develop such a predictor together with a complete codec. The result is a new image codec which not only demonstrates high efficiency but also predicts with sound accuracy: almost invariably correct for photographic images, and about eighty percent correct for CGIs.

Keywords: Progressive Graphics File, Portable Network Graphics, PGF, PNG, Predictor

METHODOLOGY. A test set of 100 images was chosen to examine the compression ratio and the encoding and decoding efficiency of both the PGF and the PNG codec. The results showed that PGF encodes photographic images, photo-realistic CGIs, and any other photograph-like images better, whereas PNG is more appropriate for ordinary CGIs as well as compound images such as screen shots. It is easy for a human being to distinguish among these kinds of images, but not for a computer. For this reason, a quadruple feature space which mathematically represents an image was carefully chosen, so that with the help of the K-nearest-neighbor algorithm and a set of training images, any given image can be classified as either a photographic(-like) image or an ordinary CGI. The predictor thus simulates human visual perception and effectively predicts the most suitable image format.
The following illustration (figure 0.1) depicts the entire procedure of this project. In step 1, SImage reads an input image of file format JPG/GIF/BMP/PNG; the sample image coffee.bmp is used. In step 2, the predictor determines whether the input image should be encoded as PGF or PNG: it calculates the feature space of coffee.bmp = {0.206179, 0.00394969, 33.00000, 0.275221} (step 2.1), calculates the Euclidean distance between coffee and all images in the training set (step 2.2), and uses the K-nearest-neighbor algorithm with K = 3 to choose the closest samples and make a binary decision (step 2.3). As pre-processing, the quadruple feature space was defined — an image is mathematically represented by the four features Number of Unique Colors, Spatial Variation of Color, Pixel Saturation, and Prevalent Color — and the quadruples of all training images were precalculated, e.g. {0.665056, 0.00892906, 1.09302, 0.00241607, 1} for the PGF class and {0.0392666, 0.00333464, 13.5, 0.56109, 0} for the PNG class. In step 3, the three closest neighbors of coffee.bmp are 2 from the PGF set and 1 from the PNG set, so PGF is chosen as the encoding codec and coffee.pgf is written.

Figure 0.1: Procedure

Contents

1 Introduction 6
  1.1 Progressive Graphics File (PGF) 7
  1.2 Portable Network Graphics (PNG) 7
  1.3 Infrastructure 8
2 PNG 9
  2.1 PNG: Filters 9
  2.2 PNG: zlib 10
3 Testset for PGF & PNG 12
  3.1 Test Set 12
  3.2 Software 13
    3.2.1 PGF Console Application 13
    3.2.2 BMP2PNG 13
    3.2.3 Microsoft CImage 14
  3.3 Test Result 14
4 Predictor 17
  4.1 Strategy 17
  4.2 Complete Procedure 17
  4.3 Feature Space 18
    4.3.1 Number of Unique Colors 22
    4.3.2 Spatial Variation of Color 22
    4.3.3 Pixel Saturation 22
    4.3.4 Prevalent Color 23
  4.4 Training Set 23
  4.5 Predictor Function 24
    4.5.1 Euclidean Distance 25
    4.5.2 KNN Algorithm 25
    4.5.3 Utilities 26
  4.6 NUnit Test 27
  4.7 Extensibility vs. Efficiency 27
5 Complete Codec 28
  5.1 SImage 28
  5.2 PGFPNG Console Application 29
  5.3 Result 29
  5.4 Future Improvement 30
6 Problems and Experience 31
  6.1 CImage & libpng & NUnit 31
  6.2 Problems 32
  6.3 Experience 32
7 Conclusion 34

QZ. November 30, 2006. 3 of 35

List of Tables

2.1 Filter Types 10
3.1 Test Set Images 12
3.2 Test Result 14
5.1 Test Set Images 29

List of Figures

0.1 Procedure 2
1.1 Blackbox 6
3.1 Ratio 15
3.2 Encoding time 15
4.1 Procedure 18
4.2 F1 vs. F2 19
4.3 F1 vs. F3 19
4.4 F1 vs. F4 20
4.5 F2 vs. F3 20
4.6 F2 vs. F4 21
4.7 F3 vs. F4 21
4.8 K-nearest Neighbor Algorithm 24
5.1 Comparison among PGF, PNG and SImage 30

1 Introduction

My graduation project at the FHNW is to develop an image predictor which, after reading any input image of any type, decides on an output image file format of either Progressive Graphics File (PGF) or Portable Network Graphics (PNG). Ordinary users often have difficulty finding the proper image file format to use. The predictor keeps users from choosing an unsuitable format and assists them in selecting the best format for various types of imagery. With the help of the predictor, a complete new codec can be created which first analyzes the input image and then saves it in either PGF or PNG format according to the predictor's decision. The new codec and predictor can be regarded as a black-box function, which is illustrated in figure 1.1.
Three terms used throughout the paper are defined below:

• Encoding Time: the duration of encoding (conversion) and writing
• Decoding Time: the duration of reading and decoding (conversion)
• Compression Ratio: the ratio between the input file size and the output file size, input/output. For example, if the input image.bmp is 800 KB and the output image.pgf is 400 KB, then the ratio is 2.0.

In order to program the predictor, understanding PGF and PNG is essential. In chapters 1.1 and 1.2, I describe the PGF and PNG technologies briefly. Detailed information about PNG, such as filtering, zlib, and libpng, is discussed in chapter 2. Chapter 3 introduces the test set, and the algorithm and implementation of the predictor are explained in chapter 4. The complete codec SImage is presented in chapter 5. The problems encountered and the experience gained are covered in chapter 6, and a conclusion is drawn in chapter 7.

[Figure 1.1 shows the black box: an input image, e.g. coffee.bmp, is processed and output as either coffee.pgf or coffee.png.]

Figure 1.1: Blackbox

1.1 Progressive Graphics File (PGF)

Progressive Graphics File (PGF)1 was developed by Dr. Christoph Stamm at the Swiss Federal Institute of Technology Zurich. It is based on a color transform (RGB to YUV) and a discrete wavelet transform with progressive coding features. It supports both lossy and lossless compression. In its lossy compression mode, PGF outperforms JPEG for natural and aerial ortho photos and achieves better compression efficiency. (cf. [PGF01]) It supports the following types of images:

• 1-bit bitmap
• 8-, 16-, and 31-bit gray scale
• 12-, 16-, 24-, and 48-bit RGB
• 32-bit RGBA
• 32- and 64-bit CMYK
• 24- and 48-bit L*a*b*

1.2 Portable Network Graphics (PNG)

Portable Network Graphics (PNG) is an image file format for storing bitmapped (raster) images on computers, designed to be the successor to the Graphics Interchange Format (GIF).
It supports the following types of images:

• 1-, 2-, 4-, and 8-bit palette (like GIF)
• 1-, 2-, 4-, 8-, and 16-bit gray scale
• 8- and 16-bit-per-sample (that is, 24- and 48-bit) true color
• 32-bit ARGB

In addition, PNG offers full alpha transparency with up to 16 bits, compared to the simple on-off (1-bit) transparency of GIF. As far as compression is concerned, the PNG specification defines a single compression method, the deflate algorithm, for all image types. Deflate belongs to the LZ77 class of compression algorithms and was defined by PKWARE in 1991 as part of the 1.93a beta version of their PKZIP archiver (cf. [PNG99]). Unlike the LZW compression method used in GIF, which was covered by Unisys' patent, deflate is royalty free. Moreover, PNG performs only lossless compression and is the only lossless true-color format for web applications. Furthermore, the PNG reference library libpng uses zlib as its underlying compression and decompression engine. Detailed information about libpng, in particular filtering and zlib, is given in chapter 2.

1 cf. http://de.wikipedia.org/wiki/Progressive_Graphics_File

1.3 Infrastructure

Microsoft Visual Studio 2005 was chosen as the default software development platform. The open source software BMP2PNG by Miyasaka Masaru2, libpng with zlib, the Microsoft CImage3 demo program, and the PGF console application by Xeraina4 were compiled in release versions from source code; all of them were used to evaluate the test set, develop the predictor function, and program the complete new codec SImage. The hardware used in the project was an IBM ThinkPad T43p with Windows XP Professional, a 1.86 GHz Pentium M with 2 MB L2 cache, and 1 GB RAM.

2 cf. http://cetus.sakura.ne.jp/softlab/b2p-home/
3 cf. CImage is from Microsoft, URL: http://msdn2.microsoft.com/en-us/library/bwea7by5(vs.80).aspx
4 cf. http://www.xeraina.ch/pgf/

2 PNG

The PNG reference library libpng used was version 1.2.12 of June 27, 2006. As far as this project is concerned, only filtering and deflate() (of the underlying compression engine zlib) are extensively discussed, in chapter 2.1 and chapter 2.2 respectively. A brief introduction to libpng follows.

The granddaddy of all PNG libraries is libpng, the free reference library available as Standard (ANSI) C source code and used by many, if not most, PNG-supporting applications. It uses the similarly free zlib library (portable C source code) for compression and decompression. ([PNG99], p.219)

It offers such ordinary functionalities as reading, reading progressively, and writing PNG files. It also has extensions which are marked as "inactive preprocessor blocks" in the source code. For example, the auto-filtering method

////////////////////////////////////////////////////////////////////
png_set_filter_heuristics(png_structp png_ptr, int heuristic_method,
                          int num_weights, png_doublep filter_weights,
                          png_doublep filter_costs)
////////////////////////////////////////////////////////////////////

is deactivated because PNG_WRITE_WEIGHTED_FILTER_SUPPORTED is not defined by default.

2.1 PNG: Filters

The first assumption made about why PNG encodes certain image types better than others concerned PNG's filtering mechanism. PNG supports a precompression step called filtering. Filtering is a method of reversibly transforming the image data so that the main compression engine can operate more efficiently. In addition, filtering improves compression of gray-scale and true-color images. As a simple example, consider a sequence of bytes increasing uniformly from 1 to 255. Since there is no repetition in the sequence, it compresses either very poorly or not at all.
But a trivial modification of the sequence — namely, leaving the first byte alone but replacing each subsequent byte by the difference between it and its predecessor — transforms the sequence into an extremely compressible set of 255 identical bytes, each having the value 1. ([PNG99], p.147)

PNG supports five types of filters, and the encoder may choose a different filter for each row of pixels in the image. Table 2.1 lists the five filter types. ([PNG99], p.149)

Name     Description
None     Each byte is unchanged.
Sub      Each byte is replaced with the difference between it and the "corresponding byte" to its left.
Up       Each byte is replaced with the difference between it and the byte above it (in the previous row, as it was before filtering).
Average  Each byte is replaced with the difference between it and the average of the corresponding bytes to its left and above it, truncating any fractional part.
Paeth    Each byte is replaced with the difference between it and the Paeth predictor of the corresponding bytes to its left, above it, and to its upper left.

Table 2.1: Filter Types

As mentioned before, PNG_WRITE_WEIGHTED_FILTER_SUPPORTED is deactivated by default, therefore the corresponding function png_set_filter_heuristics() in libpng is also inactive. Without weighted filter support, PNG encodes each row with a given fixed filter rather than automatically choosing the proper filter for each row. However, the PNG development group has come up with a few rules of thumb (heuristics) for choosing filters wisely. When PNG_WRITE_WEIGHTED_FILTER_SUPPORTED is turned on, a heuristic method called the weighted sum of absolute differences can be used. libpng contains code to enable this heuristic, but a considerable amount of experimentation is yet to be done to determine the best combination of weighting factors, compression levels (zlib), and image types.
One can also imagine heuristics involving higher-order distance metrics (e.g., root-mean-square sums), sliding averages, and other statistical methods, but to date there has been little research in this area. Lossless compression is a necessity for many applications, but cutting-edge research in image compression tends to focus almost exclusively on lossy methods, since the payoff there is so much greater. Even within the lossless domain, preconditioning the data stream is likely to have less effect than changing the back-end compression algorithm itself. ([PNG99], p.150)

It is worth mentioning that the results obtained by experimenting with the heuristic filter method further confirmed that filtering is not the cause of PNG's different encoding behavior on different image types: using the heuristic filter method on the images in the test set did not achieve better compression ratios than encoding without it (using a given filter or no filter at all).

2.2 PNG: zlib

Only through numerous tests on the test set and an understanding of PNG, particularly PNG's filtering mechanism, could I find out that the key component behind PNG's encoding behavior is zlib's deflate algorithm. The PNG specification defines a single compression method, the deflate algorithm, for all image types. Part of the LZ77 class of compression algorithms, deflate was defined by PKWARE in 1991 as part of the 1.93a beta version of their PKZIP archiver. As an LZ77-derived algorithm, deflate is fundamentally based on the concept of a "sliding window". One begins with the premise that many types of interesting data, from binary computer instructions to source code to ordinary text to images, are repetitious to varying degrees.
The basic idea of a sliding window is to imagine a window of some width immediately preceding the current position in the data stream (and therefore sliding along as the current position is updated), which one can use as a kind of dictionary to encode subsequent data.

The deflate compressor is given a great deal of flexibility as to how to compress the data. The programmer must deal with the problem of designing smart algorithms to make the right choices, but the compressor does have choices about how to compress data. Three modes of compression are available:

1. No compression at all. This is an intelligent choice for, say, data that has already been compressed. Data stored in this mode expands slightly, but by less than it would if one of the other compression methods were applied to already-compressed data.

2. Compression, first with LZ77 and then with Huffman coding. The trees used to compress in this mode are defined by the deflate specification itself, so no extra space is needed to store them.

3. Compression, first with LZ77 and then with Huffman coding, with trees that the compressor creates and stores along with the data.

The data is broken up into "blocks," and each block uses a single mode of compression. If the compressor wants to switch from non-compressed storage to compression with the trees defined by the specification, or to compression with specified Huffman trees, or to compression with a different pair of Huffman trees, the current block must be ended and a new one begun.1

In the previous chapters, I showed that in the test-set results, PNG encodes CGIs with a high compression ratio. Considering how deflate's "sliding window" works, we can see that CGIs are more likely than photographic images to have the characteristic that at any given point in the data, there are characters identical to ones found earlier within the sliding window.
Therefore, deflate is the reason why PNG encodes CGIs better.

1 cf. http://www.zlib.net

3 Testset for PGF & PNG

In order to develop a predictor function, it is important to analyze a set of chosen images of various types and evaluate the encoding time, decoding time, and compression ratio for both the PGF and the PNG format. Three applications were used for the test. BMP2PNG and CImage were used for encoding given image file formats (JPEG, GIF, BMP) to PNG (BMP2PNG only does BMP-to-PNG conversion), whereas the PGF console application measures the encoding time, decoding time, and compression ratio for given image formats and converts them to PGF and back.

3.1 Test Set

The set1 of chosen images, called the test set, is listed in Table 3.1:

Name             Description  Format  Type          Color&BPP    Size(KB)  Dimension(pixel)
1  Windows       Kodak Set    BMP     Natural       RGB-24       1153      768 x 512
2  Door          Kodak Set    BMP     Natural       RGB-24       1153      768 x 512
3  Hats          Kodak Set    BMP     Natural       RGB-24       1153      768 x 512
4  Woman         Kodak Set    BMP     Natural       RGB-24       1153      512 x 768
5  Racing        Kodak Set    BMP     Natural       RGB-24       1153      768 x 512
6  Boat          Kodak Set    BMP     Natural       RGB-24       1153      768 x 512
7  Hibiscus      Kodak Set    BMP     Natural       RGB-24       1153      768 x 512
8  Houses        Kodak Set    BMP     Natural       RGB-24       1153      768 x 512
9  Aerial        PGF Set      BMP     Aerial Ortho  RGB-24       1153      768 x 512
10 Compound      PGF Set      BMP     Screen shot   RGB-24       1153      768 x 512
11 Logo          PGF Set      BMP     CGI (Text)    RGB-24       407       615 x 225
12 Redbrush      PGF Set      BMP     Natural       RGB-24       1153      768 x 512
13 Paris Hilton  Internet     BMP     Natural       Gray-8       162       362 x 450
14 ETH           Internet     BMP     Natural       RGB-24       1153      512 x 768
15 ARGB          Internet     BMP     CGI (Text)    ARGB-32      973       576 x 432
16 2WO           Selfmade     BMP     CGI (Text)    RGB-24       76        160 x 160
17 2WO8          Selfmade     BMP     CGI (Text)    RGB-Index-8  27        160 x 160

Table 3.1: Test Set Images

The first 8 images were taken from the Kodak test set; they share the same image file format (BMP), type (natural picture), color space (RGB), bits per pixel (24), size (1153 KB), and dimensions (768 x 512, or 512 x 768 for image 4).
Images 9 to 12 were taken from the PGF test set, which contains special images for testing purposes. Image 9 is an aerial ortho photo, and image 10 is a screen shot consisting of text, charts, and computer graphics. Image 11 is a CGI with text; image 12, however, is a normal natural picture. These four images are in the RGB color space with 24 bits per pixel. Except for image 11 (Logo), all have the same size and dimensions. Images 13 to 17 were either gathered from the Internet or self-made.1 Image 13 is a gray-scale picture with 8 bits per pixel, image 15 is a 32-bit ARGB image, and image 17 is an 8-bit RGB palette image.

1 In fact, 100 images were tested, including photo-realistic CGIs and screen shots. For documentation purposes, only 17 images are presented.

3.2 Software

For the test set, the open source software BMP2PNG, the Microsoft CImage demo program, and the PGF console application were used. These programs were compiled in release versions from source code with slight modifications. Details about the modifications can be found in the following chapters.

3.2.1 PGF Console Application

The PGF console application was developed by the company Xeraina. It encodes various image formats to PGF and decodes from PGF back to them. Besides measuring the compression ratio by calling the method static double Ratio(char *source, char *dest), it also measures both encoding and decoding time. The test-set results obtained with the PGF console application are listed in table 3.2 and illustrated in figures 3.1 and 3.2. The PGF application makes use of the C++ CImage class, which provides enhanced bitmap support, including the ability to load and save images in the JPEG, GIF, BMP, and PNG formats.
The encoding procedure is accomplished by first reading the input image using CImage's image->Load(source) method, whereas the decoding procedure calls CImage's image->Save(dest) to save the decoded image in a CImage-compatible image file format. The PGF application could have been used without any modification, had the 8-bit gray image not been included in the test set; some code was added so that the application accepts 8-bit images. The commands used were:

• pgf input.bmp output.pgf
• pgf input.pgf output.bmp

The complete codec (cf. chapter 5) is then developed on the basis of this PGF console application together with the predictor function described in chapter 4.

3.2.2 BMP2PNG

The open source software BMP2PNG was written by Miyasaka Masaru; it uses the official libpng as its default PNG library. As its name suggests, it encodes BMP images to the PNG format. In order to measure encoding time, decoding time, and compression ratio, some code was added to calculate these values. Similar to the PGF console's measurement, time.h's clock() function was called around the effective encoding procedure, namely png_write_image(png_ptr, img->rowptr). The test-set results obtained with BMP2PNG are listed in table 3.2 and illustrated in figures 3.1 and 3.2. The command used was: bmp2png -9 input.bmp

3.2.3 Microsoft CImage

The Microsoft CImage demo program was also used for PNG encoding and decoding, to test the PNG implementation of the Microsoft C++ class CImage. Unlike the PGF console application and BMP2PNG, the demo application has a GUI. The backend code in CChildView.cpp was modified so that the GUI outputs encoding time and compression ratio via ::AfxMessageBox() (see the following code snippet for details). The test-set results obtained with CImage are listed in table 3.2 and illustrated in figures 3.1 and 3.2.

//////////////////////////////////////////////
#include <time.h>
...
clock_t start, end;
double elapsed;
...
start = clock();
// Encoding ...
end = clock();
elapsed = (double)(end - start) / CLOCKS_PER_SEC;
CString fmt;
fmt.Format("Encoding PNG: %f\n", elapsed);
::AfxMessageBox(fmt);
//////////////////////////////////////////////

3.3 Test Result

Name             Ratio*                Encode(s)**        Decode PGF(s)  Predictor
1  Windows       2.228/1.486/1.588     0.188/0.141/0.094  0.141          PGF
2  Door          2.533/1.778/1.817     0.187/0.188/0.109  0.141          PGF
3  Hats          2.795/2.071/2.071     0.172/0.156/0.094  0.110          PGF
4  Woman         2.406/1.577/1.648     0.187/0.141/0.110  0.140          PGF
5  Racing        2.147/1.288/1.352     0.203/0.141/0.110  0.141          PGF
6  Boat          2.401/1.642/1.717     0.188/0.171/0.110  0.125          PGF
7  Hibiscus      2.663/1.815/1.809     0.171/0.171/0.094  0.125          PGF
8  Houses        2.075/1.239/1.324     0.203/0.141/0.094  0.156          PGF
9  Aerial        2.323/1.362/1.400     0.188/0.141/0.093  0.141          PGF
10 Compound      4.940/7.712/7.067     0.125/0.188/0.032  0.094          PNG
11 Logo          10.259/46.889/31.871  0.047/0.062/0.015  0.031          PNG
12 Redbrush      2.971/1.679/1.665     0.157/0.156/0.078  0.125          PGF
13 Paris Hilton  2.318/1.471/1.463     0.031/0.046/0.015  0.031          PGF
14 ETH           3.635/2.248/2.236     0.110/0.125/0.078  0.688          PGF
15 ARGB          5.029/3.460/3.330     0.094/0.171/0.062  0.063          PGF
16 2WO           14.746/178.73/60.136  0.000/0.000/0.000  0.000          PNG
17 2WO8          N.A./26.572/19.152    N.A./0.000/0.000   N.A.           PNG

Table 3.2: Test Result
*  Ratio in the order PGF, PNG (BMP2PNG), PNG (CImage)
** Encoding time in the order PGF, PNG (BMP2PNG), PNG (CImage)

[Figure 3.1: Ratio]
[Figure 3.2: Encoding time]

Since the project is about predicting whether a given image should be encoded as PGF or PNG, the values of encoding time and compression ratio are sufficient; for this reason, decoding time was measured only for PGF images. It is obvious that for certain images, PNG outperforms PGF in both encoding time and compression ratio. These images are 10 Compound, 11 Logo, 16 2WO, and 17 2WO8, which are all CGIs.
This result confirms PNG's promised strength in compressing artificial images and reinforces its role as the successor of GIF. For photographic or natural images, PGF excels in compression ratio while demonstrating almost the same encoding time as PNG. Therefore, an initial conclusion can be drawn that PNG is appropriate for CGIs, whereas PGF is appropriate for photographic or natural images.

The controversial point, however, arose when photo-realistic CGIs were tested. Photo-realistic CGIs also belong to the family of CGIs and so ought to be encoded better with PNG, but the astonishing test result showed that PGF achieved a better compression ratio than PNG without any penalty in encoding time. The initial conclusion should therefore be modified: PNG is appropriate for ordinary CGIs, whereas PGF is appropriate for photographic images, photo-realistic CGIs, and any other images which can be visually perceived as photographic. This conclusion led to the central design decision of the predictor, covered in the next chapter.

4 Predictor

4.1 Strategy

The test results obtained from the test set served as the basis for designing the predictor. They implied that photographic images, photo-realistic CGIs, and any other images which can be visually perceived as photographic achieve a better compression ratio when encoded with PGF, and that ordinary CGIs do better with PNG. Since the predictor's decision is binary — either PGF or PNG — there are two ways to design it. One is to understand both the PGF and the PNG technology well (because PGF and PNG use different compression mechanisms) and derive the decision from that understanding; the other is to reduce or transform the given problem to another one: the answer to the question of how to distinguish photographic(-like) images from ordinary CGIs is also the solution to the problem of whether an input image should be encoded as PGF or PNG.
Although it is trivial for a human being to differentiate between photographic(-like) images and ordinary CGIs, it is not easy for a computer to do so. For this reason, the following chapters explain how a computational program simulates human visual perception and makes this binary decision, and they pinpoint the core design of the predictor function.

4.2 Complete Procedure

Before giving insight into each aspect of the predictor, I would like to present the complete procedure first, so that the big picture can be seen and a better understanding gained. Figure 4.1 depicts the whole processing chain, steps 1 to 3. For the predictor, only step 2 with its sub-steps, step 3, and the pre-processing need to be observed. The prediction procedure is described in words as follows:

1. Determine the quadruple feature space. (cf. chapter 4.3)

2. Set up a training set of 42 images of different types — photographic images, photo-realistic CGIs, screenshots, and ordinary CGIs — and calculate the quadruple features for each training image. The images are classified as PGF or PNG by hand, based upon the compression ratios achieved by each compression engine. (cf. chapter 4.4)

3. Predictor function (cf. chapter 4.5):
   a) For any given image, first calculate its quadruple features.
   b) Calculate the Euclidean distance between the input image and all images in the training set.
   c) Use the K-nearest-neighbor algorithm to choose the K = 3 closest samples and make a binary decision.
   d) The method predictor() returns true (PGF) or false (PNG).

[Figure 4.1 repeats the complete procedure from figure 0.1: SImage loads coffee.bmp; the predictor calculates its feature space {0.206179, 0.00394969, 33.00000, 0.275221} (step 2.1), calculates the Euclidean distances to the precalculated training quadruples (step 2.2), and applies the K-nearest-neighbor algorithm with K = 3 (step 2.3); since 2 of the 3 closest neighbors come from the PGF set, coffee.bmp is encoded as coffee.pgf (step 3).]

Figure 4.1: Procedure

4.3 Feature Space

Several papers describe how to distinguish photographic images from CGIs. The paper [DPG97] focuses on the detection between photographs and normal CGIs, whereas both [DPC05] and [HRP05] deal with distinguishing between photographs and photo-realistic CGIs.
Although the solution to the predictor design is not exactly the same as the solutions found in the above papers, I could draw inspirational ideas from them. As far as the predictor's efficiency is concerned, only visual features, but no texture features, were considered (texture features require an implementation of Gabor filters, which is time consuming). Besides, not all visual features described in [DPG97] and [DPC05] were selected for the predictor, for the same reason (edge detection is also very time consuming). Four features were therefore carefully chosen: Number of Unique Colors Ratio, Spatial Variation of Color, Pixel Saturation, and Prevalent Color.

In pattern recognition, a feature space is an abstract space where each pattern sample is represented as a point in an n-dimensional space whose dimension is determined by the number of features used to describe the patterns (c.f. http://en.wikipedia.org/wiki/Feature_space, 2006). An image can then be thought of as a point in the 4-dimensional space, which to some extent characterizes the image in another form: numbers. All photographic images (or alike) form a convex hull different from the one formed by ordinary CGIs. Though these two hulls expand differently in the feature space, they may overlap. These four features are explained in detail in chapters 4.3.1 to 4.3.4.

Saying that these four features demonstrate good decorrelation among them is only partly true: feature 1 and feature 4 have a higher correlation coefficient, which implies that either of them could be omitted. However, in order to make a strong mathematical/statistical statement, more data would be needed. Only 42 training images were used in this project, so no accurate assertion about orthogonality can be made. Figures 4.2 to 4.7 illustrate this fact.

Figure 4.2: F1 vs. F2; Figure 4.3: F1 vs. F3; Figure 4.4: F1 vs. F4; Figure 4.5: F2 vs. F3; Figure 4.6: F2 vs. F4; Figure 4.7: F3 vs. F4 (pairwise scatter plots of the four features over the training set)

4.3.1 Number of Unique Colors

CGIs tend to have fewer unique colors than photographs. In the study of J. Wu et al. [DPC05], the average number of colors of photographs was observed to be about 25% higher than that of ordinary CGIs. Given an image with an <R,G,B> triplet for each pixel, two pixels have the identical color if their CImage <R,G,B> COLORREF values are the same. The ratio U of the number of unique colors is calculated by normalizing the total number of unique colors by the total number of pixels:

U = N_different_triplets / N_pixels

Images suitable for PGF have more unique colors; images suitable for PNG have fewer. For efficiency, every 100th pixel of the original image is sampled. The method below returns the ratio U of the number of unique colors:

///////////////////////////////////
double NumberOfUniqueColorsRatio();
///////////////////////////////////

4.3.2 Spatial Variation of Color

[DPC05] observed that color changes from pixel to pixel occur to a lesser extent in ordinary CGIs than in photographs. To quantify the spatial variation of color, for each pixel of interest a 5x5 neighborhood centered on that pixel is considered. After normalizing the triplets <R,G,B> to nR, nG, and nB, a variation α is obtained:

α = ( Σ_{i=1..24} sqrt( (nR_i − nR_0)² + (nG_i − nG_0)² + (nB_i − nB_0)² ) ) / (5 × 5)

The euclidean distance of each pixel in the 5x5 neighborhood to the center pixel is summed up and normalized by the total number 25. Images suitable for PGF have a higher variation; images suitable for PNG have a lower variation. Again, every 100th pixel of the original image is sampled.
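The variation α described above can be sketched for a single pixel of interest as follows. The flat RGB buffer, the struct, and the function name are illustrative assumptions; the project itself reads pixels through CImage and samples every 100th pixel of interest:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One RGB triplet, already normalised to [0, 1].
struct RGB { double r, g, b; };

// Variation for one pixel of interest: euclidean distance of each of
// the 24 neighbours in a 5x5 window to the centre pixel, summed up and
// divided by 25. Neighbours outside the image are skipped.
double VariationAt(const std::vector<RGB>& img, int width, int height,
                   int x, int y) {
    const RGB& c = img[y * width + x];
    double sum = 0.0;
    for (int dy = -2; dy <= 2; ++dy) {
        for (int dx = -2; dx <= 2; ++dx) {
            if (dx == 0 && dy == 0) continue;        // skip the centre pixel
            int nx = x + dx, ny = y + dy;
            if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
            const RGB& n = img[ny * width + nx];
            sum += std::sqrt((n.r - c.r) * (n.r - c.r) +
                             (n.g - c.g) * (n.g - c.g) +
                             (n.b - c.b) * (n.b - c.b));
        }
    }
    return sum / 25.0;
}
```

A uniformly coloured window yields α = 0; the more the neighbours deviate from the centre, the larger α becomes, which matches the observation that photographs score higher than ordinary CGIs.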
The method below returns the variation α of the given image:

/////////////////////////////////
double SpatialVariationOfColor();
/////////////////////////////////

4.3.3 Pixel Saturation

Another observation of [DPC05] is that the mean and variance of pixel saturation of ordinary CGIs are higher than those of photographs; photographs, in other words, contain more unsaturated pixels than CGIs. Pixel saturation is quantified as the ratio between the numbers of saturated and unsaturated pixels in a saturation histogram. These numbers are obtained by counting the highest bin and the lowest bin of the histogram. The saturation value S is:

S = N_saturated / N_unsaturated

Images suitable for PGF have more unsaturated pixels (S small); images suitable for PNG have more saturated pixels (S large). Every 100th pixel of the original image is sampled. The method below returns the ratio S:

/////////////////////////
double PixelSaturation();
/////////////////////////

4.3.4 Prevalent Color

The prevalent color metric is obtained by finding the most frequently occurring color in the image. The ratio P is calculated by dividing the number of pixels that have the prevalent color by the total number of pixels:

P = N_prevalent_color / N_pixels

Every 100th pixel of the original image is sampled. The method below returns the ratio P:

////////////////////////
double PrevalentColor();
////////////////////////

4.4 Training Set

In order to use the K-nearest neighbor algorithm to classify an input image, a training set of 42 images of different types was assembled: photographic images, photo realistic CGIs, screen shots, and ordinary CGIs. Then the four feature values of each image in the training set were calculated. Last but not least, the PGF console application and BMP2PNG were once more used to determine the compression ratio of each training image, so that these images could be classified manually.
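The saturation ratio S of 4.3.3 can be sketched as follows. The pre-quantised histogram input and the function name are illustrative assumptions; the project derives the per-pixel saturation values through getSaturation() and CImage:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Ratio S of 4.3.3: count of the highest histogram bin (saturated)
// over the count of the lowest bin (unsaturated). Input: per-pixel
// saturation values already quantised to bins 0..255.
double PixelSaturationRatio(const std::vector<int>& saturationBins) {
    std::array<std::size_t, 256> histogram{};
    for (int bin : saturationBins) histogram[bin]++;
    std::size_t saturated = histogram[255];   // highest bin
    std::size_t unsaturated = histogram[0];   // lowest bin
    if (unsaturated == 0) return 0.0;         // guard; the thesis leaves this case open
    return static_cast<double>(saturated) / static_cast<double>(unsaturated);
}
```

A photograph-like input (many bin-0 pixels) yields a small S, an ordinary CGI (many bin-255 pixels) a large S, matching the PGF/PNG rule of thumb above.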
The data of the training images are represented as a two-dimensional array. The first dimension, size, is 42, the number of images in the training set; the second dimension, 5, holds the four features (the first four elements) plus a flag indicating whether the image is better encoded in PGF (1) or in PNG (0):

//////////////////////////////////////////////////////////////////////////////////
double trainingSet[size][5] = {
    // the last number 1 means that this image has a
    // better compression ratio when encoded as PGF
    {0.665056,0.00892906,1.09302,0.00241607,1},
    {0.655052,0.0732959,7.775,0.00487805,1},
    {0.750381,0.0463105,1.30814,0.00890132,1},
    {0.507407,0.0119276,0.333333,0.0488889,1},
    ...
    {0.167755,0.0258481,5.44595,0.608707,0},
    {0.0281633,0.0112492,42.3333,0.689524,0},
    {0.229212,0.0302594,15.6061,0.335853,0},
    {0.216314,0.0134404,25.8,0.149423,0},
};
//////////////////////////////////////////////////////////////////////////////////

4.5 Predictor Function

Though the complete procedure has already been outlined in chapter 4.2, I would like to emphasize the part concerning the predictor once more; figure 4.8 depicts this procedure.

Figure 4.8: K-nearest Neighbor Algorithm (the input image's quadruplet, e.g. coffee.bmp with Number of Unique Colors 0.206179, Spatial Variation of Color 0.00394969, Pixel Saturation 33.0, and Prevalent Color 0.275221, is compared against the pre-calculated quadruplets of the training set, PGF samples on one side and PNG samples on the other; with K = 3, two of the three nearest neighbors come from the PGF set, so PGF is chosen. The depicted distances and the choice are for illustration purposes only.)

1. For any given image, the predictor first calculates its quadruplet features.

2. Calculate the euclidean distance between the input image and all images in the training set.

3. Use the K-nearest neighbor algorithm to choose the K = 3 closest samples and make a binary decision.

4. The method predictor() returns true (PGF) or false (PNG).
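The four steps above can be sketched as a small, self-contained classifier. The names (Sample, EuclideanDistance, PredictPGF) and the plain-array layout are assumptions for illustration; the actual project wraps this logic in SImage's Predictor() and KNNAlgorithm():

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

// A training row in the thesis layout: {feature1..feature4, label},
// where label 1 = better as PGF, 0 = better as PNG.
using Sample = std::array<double, 5>;

// Euclidean distance between a training sample and a query quadruplet.
double EuclideanDistance(const Sample& a, const std::array<double, 4>& q) {
    double sum = 0.0;
    for (int i = 0; i < 4; ++i) {
        double d = a[i] - q[i];
        sum += d * d;
    }
    return std::sqrt(sum);
}

// Returns true for PGF, false for PNG: sort the training set by
// distance to the query and take a simple majority of the K nearest.
bool PredictPGF(const std::vector<Sample>& trainingSet,
                const std::array<double, 4>& query, std::size_t k = 3) {
    std::vector<Sample> sorted = trainingSet;
    std::sort(sorted.begin(), sorted.end(),
              [&](const Sample& x, const Sample& y) {
                  return EuclideanDistance(x, query) < EuclideanDistance(y, query);
              });
    std::size_t pgfVotes = 0;
    for (std::size_t i = 0; i < k && i < sorted.size(); ++i)
        if (sorted[i][4] == 1.0) ++pgfVotes;
    return 2 * pgfVotes > std::min(k, sorted.size());  // simple majority
}
```

Recomputing the distance inside the comparator is wasteful but keeps the sketch short; with 42 training images, the cost is negligible either way.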
The function signature is:

/////////////////
bool Predictor();
/////////////////

4.5.1 Euclidean Distance

If the input image is coffee.bmp, it has the following feature space values:

• p1 = 0.206179
• p2 = 0.00394969
• p3 = 33.00000
• p4 = 0.275221

The euclidean distance between this image and another image q from the training set (with q1, q2, q3, and q4 as its quadruplet values in the feature space) is:

d(p, q) = sqrt( (p1 − q1)² + (p2 − q2)² + (p3 − q3)² + (p4 − q4)² )

or, in general,

d(p, q) = sqrt( Σ_{i=1..n} (p_i − q_i)² ), where n = 4.

The function signature is:

/////////////////////////////////////////////////////////////////
double getEuclideanDistance (double a[], double b[], int length);
/////////////////////////////////////////////////////////////////

4.5.2 KNN Algorithm

In pattern recognition, the K-nearest neighbor algorithm (KNN) is a method for classifying objects based on the closest training examples in the feature space. The training examples are mapped into a multidimensional feature space, which is partitioned into regions by the class labels of the training samples. A point in the space is assigned to the class c if c is the most frequent class label among the K nearest training samples; usually the euclidean distance is used. The training phase of the algorithm consists only of storing the feature vectors and class labels of the training samples. In the actual classification phase, the same features as before are computed for the test sample (whose class is not known). The distances from the new vector to all stored vectors are computed, and the K closest samples are selected. The new point is predicted to belong to the most numerous class within this set. The best choice of K depends upon the data; generally, larger values of K reduce the effect of noise on the classification, but make the boundaries between classes less distinct (c.f. http://en.wikipedia.org/wiki/Nearest_neighbor_(pattern_recognition), 2006). The KNN implementation in this project is listed below:
1. Determine the parameter K, the number of nearest neighbors; here K = 3.

2. Calculate the distance between the query instance (the input image's quadruplet) and all images in the training set using the euclidean distance.

3. Sort the distances and determine the nearest neighbors based on the K minimum distances.

4. Use the simple majority of the categories of the nearest neighbors as the prediction value for the query instance.

5. The predictor outputs true for PGF, otherwise false for PNG.

The function signature is:

////////////////////
bool KNNAlgorithm();
////////////////////

4.5.3 Utilities

The method double getSaturation(int R, int G, int B) converts an <R,G,B> triplet to its corresponding saturation value; the algorithm used is from [DBV05], p. 261. With the channel values normalized to [0, 1] and

C_high = max(R, G, B), C_low = min(R, G, B), C_rng = C_high − C_low,

the HLS lightness is

L_HLS ← (C_high + C_low) / 2

and the HLS saturation is

S_HLS ← 0 for L_HLS = 0,
S_HLS ← 0.5 · C_rng / L_HLS for 0 < L_HLS ≤ 0.5,
S_HLS ← 0.5 · C_rng / (1 − L_HLS) for 0.5 < L_HLS < 1,
S_HLS ← 0 for L_HLS = 1.

4.6 NUnit Test

NUnit (c.f. http://www.nunit.org/) was used as the unit-testing framework for the predictor. The sample code is listed below:

////////////////////////////////////////////////////////////////////////////////
namespace NUnitPP {
    SImage *image = 0;

    void ppunit::Init() {
        image = new SImage();
        source = "test.bmp";
        image->Load(source);
    }

    // test.bmp: 0.794167, 0.0149229, 0.428571, 0.0158333
    void ppunit::NumberOfUniqueColoursRatio_NUnit() {
        double result = image->NumberOfUniqueColorsRatio() - 0.7941670;
        Assert::IsTrue(result < 0.05, "Test for NumberOfUniqueColoursRatio() failed");
    }
}
////////////////////////////////////////////////////////////////////////////////

Currently only one image is tested; more images could be tested if source were redefined as an array of char*. Because the feature-calculating methods sample pixels randomly, the results differ slightly from run to run; a pre-determined tolerance (here 0.05) is used to allow a tiny bias while still setting limits.
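The HLS conversion of 4.5.3 can be sketched in portable C++ as follows. The 8-bit channel inputs follow the project's signature; treating the undefined black and white cases as 0 is an assumption of this sketch:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// HLS saturation after [DBV05]: inputs are 8-bit R,G,B channels,
// output is the saturation in [0, 1].
double getSaturation(int R, int G, int B) {
    double r = R / 255.0, g = G / 255.0, b = B / 255.0;
    double cHigh = std::max({r, g, b});
    double cLow = std::min({r, g, b});
    double cRng = cHigh - cLow;
    double L = (cHigh + cLow) / 2.0;          // HLS lightness
    if (L <= 0.0 || L >= 1.0) return 0.0;     // black or white: defined as 0 here
    if (L <= 0.5) return 0.5 * cRng / L;
    return 0.5 * cRng / (1.0 - L);
}
```

For pure red (255, 0, 0) the lightness is 0.5 and the saturation 1; for any gray pixel, C_rng is 0 and so is the saturation.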
The NUnit tests showed that the predictor's accuracy is nearly perfect for photographic images or alike, and about eighty percent for ordinary CGIs.

4.7 Extensibility vs. Efficiency

The accuracy could be further enhanced if other metrics, such as a farthest neighbor metric, a color histogram, the dimension ratio, or a texture feature metric using Gabor filters (wavelets), were computed. However, for the sake of the predictor's efficiency and the scope of this project, these metrics are regarded as extensible features. Besides, the correct handling of gray scale images needs to be examined further, and PGF's lossy compression could also be included in the project.

Another issue related to efficiency is the use of void* GetBits(). By using this pointer, along with the value returned by int GetPitch() const, one can locate and change individual pixels in an image, which achieves the highest possible efficiency. In this project, however, the method void* CImage::GetPixelAddress(int x, int y) was used, which is faster than GetPixel() but slower than the combination of GetBits() and GetPitch().

Some optimizations were also included in the predictor. If an input image is indexed, no prediction is needed, because PGF does not support indexed images at the moment. It was also observed that any image type converted to a color depth below 256 is better encoded with PNG; the highest number of bits per pixel for which this holds is unknown.

5 Complete Codec

With the help of the predictor, a complete codec can now be constructed which encodes any CImage-compatible type to either PGF or PNG, and decodes any CImage-compatible type as well as PGF to any CImage-compatible type. The new codec is called SImage, where "S" refers to "smart".
SImage not only demonstrates high efficiency, but also achieves a sound prediction accuracy, which is nearly perfect for photographic images or alike (photo realistic CGIs, certain compound images, and any images which can be visually perceived as photographic) and about eighty percent for normal CGIs. In chapter 5.1, SImage is introduced and explained. In chapter 5.2, a sample console application is presented to show how to use the SImage class and its methods. In chapter 5.4, some suggestions for possible improvements are outlined.

5.1 SImage

The new codec SImage takes Microsoft's CImage as a private member, which empowers SImage to utilize all the functionality CImage provides; it makes use of the predictor function and builds upon libpgf's existing encoding and decoding mechanisms. As a result, SImage is the codec required by this project. The class SImage is a traditional C++ class with private member variables, private methods, and public methods, among which the core functions are listed below. Because CImage is frequently used and acts as a private member of SImage, the syntax of SImage was kept similar to CImage's.

///////////////////////////////////////
/// Load a PGF image or a CImage image
/// @param source An input file
void Load(char* source);

///////////////////////////////////////
/// Save as a CImage image
void CSave(char* dest);

///////////////////////////////////////
/// Load a CImage image
void CLoad(char* source);

////////////////////////////////////////////////////////////
/// Save as a PGF or a CImage image according to predictor()
void Save(char* dest);

The reason why there are two types of Save() is that the project description requires the encoding to be either PGF or PNG.
By separating the normal CImage Save() (CSave() in SImage) from SImage's Save(), which works with the predictor, the codec guarantees that encoding only outputs PGF or PNG file types. It is worth mentioning that when void Save(char* dest) is called, the method first calls predictor(); if predictor() returns true, Save() calls SavingPGF(), otherwise SavingImage().

5.2 PGFPNG Console Application

The PGFPNG console application is based upon the PGF console application by the company Xeraina. The purpose of this application is to show how to utilize the SImage class as a codec library. The main method expects four arguments from the command line. Two kinds of command are available:

pgfpng -e input.bmp output
pgfpng -d input.pgf output.bmp

In encoding mode (-e), the input image can be of a CImage format such as BMP, JPG, GIF, or PNG. Note that no file suffix is needed for the output file, because the Save() method automatically decides the correct file format. In decoding mode (-d), any image of a CImage format or of PGF format can serve as the input image, and the output image is of a CImage format. No prediction is involved in decoding mode.

5.3 Result

The results for compression ratio and encoding time are listed in table 5.1 and depicted in figure 5.1. They show that SImage achieved the best compression ratio, although it cost about double the time PGF needs to encode an image. It is worth mentioning, however, that the prediction time is almost constant for any image, which implies that the difference in encoding time between the file formats shrinks as the input image grows. Besides, as mentioned in chapter 4.7, the efficiency can be further improved by using direct bit manipulation.
          Encoding Time   Compression Ratio
PGF       2.251           65.469
PNG       1.188           122.494
SImage    4.86            134.598

Table 5.1: Test Set Images (for SImage, the encoding time comprises the prediction)

5.4 Future Improvement

With SImage, the boundary between encoding and decoding in fact becomes blurred. Were it not for the project requirements, SImage could have Save() and Load() as its only two methods for encoding and decoding, which would enable the codec to convert between any CImage and PGF formats in both directions. However, this mechanism only makes sense if the predictor is enhanced with multi-way decision making: no longer a binary decision between PGF and PNG, but a choice among PGF, PNG, JPG, GIF, and BMP.

Figure 5.1: Comparison among PGF, PNG and SImage

6 Problems and Experience

6.1 CImage & libpng & NUnit

CImage provides enhanced bitmap support, including the ability to load and save images in JPEG, GIF, BMP, and PNG formats. It is quite new, and therefore there is almost no example code available apart from the demo program on Microsoft's MSDN website, which made the initial study of CImage a difficult and time-consuming task. I was very glad to find the method void* CImage::GetPixelAddress(int x, int y), which is said to be more efficient (c.f. http://easai.00freehost.com/Blur/index.html) than its counterpart COLORREF CImage::GetPixel(int x, int y). The sad reality, however, was that I should have used void* GetBits(): by using this pointer, along with the value returned by int GetPitch() const, I could have located and changed individual pixels even faster. Moreover, CImage does not give the user any information about the filter and zlib settings when encoding PNG; in other words, there is no way to influence the filtering and compression behavior if CImage is used for encoding. Since filtering and compression are minor issues in PNG encoding, this shortcoming of CImage is insignificant.
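The GetBits()/GetPitch() access pattern mentioned above can be illustrated with a portable sketch. The helper name and the fixed 32-bit, top-down layout are assumptions; a real CImage buffer may have row padding and a negative pitch for bottom-up bitmaps:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Pitch-based pixel addressing: pixel (x, y) in a 32-bit DIB lives at
// base + y * pitch + x * 4. The pitch is the byte distance between two
// rows and may exceed width * 4 because of padding. The four bytes are
// assembled little-endian into one packed value.
std::uint32_t ReadPixel(const std::uint8_t* base, int pitch, int x, int y) {
    const std::uint8_t* p = base + y * pitch + x * 4;
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}
```

With CImage, base would come from GetBits() and pitch from GetPitch(); iterating rows this way avoids the per-call overhead of GetPixel() and GetPixelAddress().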
The positive side is that CImage's PNG encoding is much faster than calling libpng's routines. The PNG reference library libpng, on the other hand, offers extensive documentation, both online and in the source code. Although the learning curve is steep, the final outcome compensates for the hardship. However, it seems that the development of PNG filtering has been stagnating: the old heuristic method used in filtering is still marked as experimental in the source code. If, as [PNG99] says, research has moved away from filtering toward the back-end zlib, why still include the heuristic method in the source code? It only makes newcomers spend time on deprecated things.

NUnit is an easy-to-use unit-testing framework, especially through its GUI, and it offers plenty of documentation for newcomers to consult. The only problem is the following statement in its ReadMe.txt:

Due to an issue that has not been adequately addressed in the installation
procedure you will have to indicate the directory where the
nunit.framework.dll is located on your disk. This problem presents itself
by having this program failing to compile and link.
Steps:
1.) Right-click on the ''cpp-sample'' element. Select ''Properties'' on the
    context menu.
2.) Select the ''C/C++'' element in the tree.
3.) The field that needs to be updated is ''Resolve #using references''.
    Update this field to the following directory:
    ''C:\Program Files\NUnit V2.0\bin''
    Note: This directory is the default installation directory if you have
    chosen a different directory then navigate to it.
5.) Recompile.
This issue is being worked on and will be fixed in the release.

The suggested procedure did not work in Visual Studio 2005. The trick is to manually adjust the value under Properties -> Common Properties -> References.
6.2 Problems

This is an interesting project; the problem, however, is time. As far as I know, a graduation project or thesis normally has a duration of six months. The reason why the Fachhochschule (FH) permits its students merely six weeks might lie in the character of FH graduation projects, which are in general product-driven, practical projects. The project I had, however, was more of a research project: quality driven, and expected to be mathematically justifiable. In other words, accuracy, efficiency, and professionalism are essential, and these attributes can only be attained if sufficient time is granted.

Additionally, I have to admit that I had absolutely no idea what I was going to do when I first read the project description. Or rather, I had to understand things like PNG, PGF, CImage, and general image processing first in order to get started at all. It took me more than two weeks to get an idea of what exactly I should do and how I would do it. Even after I had found the real beauty of this project and seen how to improve both the code/design and the documentation, time was again a constraint with which I had to compromise. For an ordinary project this might be acceptable, since the end product can serve as a prototype rather than a commercial product. For this project, however, if the result were not mathematically justifiable (the correlation within the feature space, too few images in the training set, etc.) or not efficient (the GetPixel problem), the result would be meaningless.

Last but not least, I was working alone. The good side is that I had total control of the project's development and could schedule my time freely; on the other side of the coin, six weeks was too short to work alone, because I could only work serially.

6.3 Experience

Turning from an absolute layman into something of an insider who has been working with various image codecs and image processing for the last six weeks felt exciting.
At the beginning, I had to read plenty of documentation (PNG, PGF, source code, online documentation, scripts) in order to start programming at all. A lot of time was invested in understanding the relevant technologies and mathematical theories before coding could begin, which hindered productivity and made some of my efforts on this project invisible. I must admit that what I experienced was hard: libpng alone was fairly tough, and additional mathematical theory was required for programming the predictor; the quadruplet feature space, the variation measure, and the k-nearest neighbor algorithm are just a few examples.

The Microsoft CImage class was frequently used in this project. It was interesting to experience the simplicity of CImage's various functions and its surprisingly efficient encoding and decoding behavior. Nevertheless, I wish more tutorials on CImage were available online; perhaps I could write one sometime in the future. Not until I had drawn the correlation figures and calculated the correlation values between pairs of features did I really understand what correlation is and why people need it. It was a kind of self-enlightenment, which I appreciated very much.

The most important experience was learning how to cope with projects of a research character. I may eventually "forget" the fine details of CImage, PNG, PGF, the predictor, KNN, and SImage, but what has permanently remained in my mind are the methodologies used in the project, a strong will to learn new things, the ability to work under time pressure, and a consistently positive attitude toward whatever difficulties arise.

7 Conclusion

The new codec SImage, making use of CImage, libpgf, and the predictor, is a perfect image format. It determines for any input image whether it should be encoded as PGF or PNG.
It not only boasts high efficiency, but also achieves the best compression ratio among PGF and PNG in lossless encoding mode. The accuracy could be improved if all features were close to orthogonal, and the efficiency can be enhanced if direct bit manipulation is used. Additionally, the more images in the training set, the more accurate the prediction; the choice of K can also be optimized. All in all, SImage is a perfect image format, once more tests are conducted on it and further optimizations are implemented.

Bibliography

[DBV05] Burger, W., Burge, M. J.: Digitale Bildverarbeitung. Springer, 2005.

[DPC05] Wu, J., Kamath, M. V., Poehlman, S.: Detecting Differences between Photographs and Computer Generated Images. IEEE, 2005.

[DPG97] Athitsos, V., Swain, M. J., Frankel, C.: Distinguishing Photographs and Graphics on the World Wide Web. Proceedings of the Workshop on Content-Based Access of Image and Video Libraries (CBAIVL), Puerto Rico, 1997.

[HRP05] Lyu, S., Farid, H.: How Realistic is Photorealistic? IEEE Transactions on Signal Processing, 53(2):845-850, 2005.

[PGF01] Stamm, C.: PGF: A New Progressive File Format for Lossy and Lossless Image Compression, 2001.

[PNG99] Roelofs, G.: PNG: The Definitive Guide. O'Reilly, 1999.