The Perfect Image Format

Graduation Thesis of
Zhong Qing
University of Applied Sciences Northwestern Switzerland
School of Engineering
qing.zhong@students.fhnw.ch
Professor: Prof. Dr. Christoph Stamm
November 30, 2006
ABSTRACT. Progressive Graphics File (PGF) is a highly efficient image format for natural pictures,
based upon the discrete wavelet transform and various coding techniques. Portable Network
Graphics (PNG) is a well-known image format and a successor of GIF; it is well suited to artificial
or computer-generated images (CGIs). This project focuses on designing a predictor function which
decides whether to choose PGF or PNG as the underlying encoding engine for any given image of
any type. The aim of the project is to develop such a predictor and a complete codec. The result is a
new image codec which not only demonstrates high efficiency but also predicts accurately: the
prediction is almost invariably correct for photographic images, and eighty percent correct for CGIs.
Keywords: Progressive Graphics File, Portable Network Graphics, PGF, PNG, Predictor
METHODOLOGY. A test set of 100 images was chosen to examine the compression ratio and
encoding/decoding efficiency of both the PGF and PNG codecs. The results showed that PGF encodes
photographic images, photo-realistic CGIs, and any other photograph-like images better, whereas PNG
is appropriate for encoding ordinary CGIs as well as compound images like screen shots. It is easy
for human beings to distinguish among these kinds of images, but not for a computer. For this reason,
a quadruple feature space which mathematically represents an image was carefully chosen, so that
with the help of the K-nearest neighbor algorithm and a set of training images, any given image can be
classified as either a photographic(-like) image or an ordinary CGI. Thus the predictor simulates a
human being's visual perception and effectively predicts the best-suited image format. The
following illustration (figure 0.1) depicts the entire procedure of this project.
[Figure 0.1 shows the complete processing chain:]

PRE-PROCESSING (DEFINING FEATURE SPACE): an image can be mathematically represented as a
quadruple feature space with the following items: Number of Unique Colors, Spatial Variation of
Color, Pixel Saturation, Prevalent Color.

INPUT: coffee.bmp

STEP 1: SImage.Load(coffee.bmp). Reads an input image of file format JPG/GIF/BMP/PNG using the
SImage codec. A sample image coffee.bmp is used.

STEP 2: Predictor. Determines whether an input image should be encoded as PGF or PNG.
STEP 2.1: Calculate the feature space of coffee.bmp = {0.206179, 0.00394969, 33.00000, 0.275221}.
STEP 2.2: Calculate the euclidean distance between coffee.bmp and all images in the training set.
STEP 2.3: Use the K-nearest neighbor algorithm to choose the K = 3 closest samples and make a
binary decision.

STEP 3: DECISION & ENCODING. The three closest neighbors are 2 from the PGF set and 1 from the
PNG set; with K = 3, PGF is chosen as the encoding codec.

OUTPUT: coffee.pgf (the PNG output coffee.png is rejected)

TRAINING SET: precalculated quadruples of the training images; the fifth element is the class flag
(1 = PGF, 0 = PNG):
{0.665056,0.00892906,1.09302,0.00241607,1},
{0.655052,0.0732959,7.775,0.00487805,1},
{0.750381,0.0463105,1.30814,0.00890132,1},
{0.507407,0.0119276,0.333333,0.0488889,1},
{0.869764,0.0345812,1.88542,0.00198367,1},
{0.95963,0.0301363,1.875,0.00148148,1},
{0.927528,0.0254809,1.47222,0.00730337,1},
......
{0.0392666,0.00333464,13.5,0.56109,0},
{0.165956,0.0305393,21.5,0.606954,0},
{0.214155,0.0152545,3.17874,0.483503,0},
{0.229212,0.0302594,15.6061,0.335853,0},
{0.216314,0.0134404,25.8,0.149423,0},
{0.167755,0.0258481,5.44595,0.608707,0},
{0.0281633,0.0112492,42.3333,0.689524,0},
......

Figure 0.1: Procedure
Contents

1 Introduction 6
1.1 Progressive Graphics File (PGF) 7
1.2 Portable Network Graphics (PNG) 7
1.3 Infrastructure 8

2 PNG 9
2.1 PNG: Filters 9
2.2 PNG: zlib 10

3 Testset for PGF & PNG 12
3.1 Test Set 12
3.2 Software 13
3.2.1 PGF Console Application 13
3.2.2 BMP2PNG 13
3.2.3 Microsoft CImage 14
3.3 Test Result 14

4 Predictor 17
4.1 Strategy 17
4.2 Complete Procedure 17
4.3 Feature Space 18
4.3.1 Number of Unique Colors 22
4.3.2 Spatial Variation of Color 22
4.3.3 Pixel Saturation 22
4.3.4 Prevalent Color 23
4.4 Training Set 23
4.5 Predictor Function 24
4.5.1 Euclidean Distance 25
4.5.2 KNN Algorithm 25
4.5.3 Utilities 26
4.6 NUnit Test 27
4.7 Extensibility vs. Efficiency 27

5 Complete Codec 28
5.1 SImage 28
5.2 PGFPNG Console Application 29
5.3 Result 29
5.4 Future Improvement 30

6 Problems and Experience 31
6.1 CImage & libpng & NUnit 31
6.2 Problems 32
6.3 Experience 32

7 Conclusion 34

List of Tables

2.1 Filter Types 10
3.1 Test Set Images 12
3.2 Test Result 14
5.1 Test Set Images 29

List of Figures

0.1 Procedure 2
1.1 Blackbox 6
3.1 Ratio 15
3.2 Encoding time 15
4.1 Procedure 18
4.2 F1 vs. F2 19
4.3 F1 vs. F3 19
4.4 F1 vs. F4 20
4.5 F2 vs. F3 20
4.6 F2 vs. F4 21
4.7 F3 vs. F4 21
4.8 K-nearest Neighbor Algorithm 24
5.1 Comparison among PGF, PNG and SImage 30
1 Introduction
My graduation project at the FHNW is to develop an image predictor which decides on an output
image file format of either Progressive Graphics File (PGF) or Portable Network Graphics (PNG) after
reading any input image of any type.
Ordinary users often have difficulty finding the proper image file format to use. The predictor
keeps users from choosing an unsuitable format and assists them in picking the best format for various
types of imagery. With the help of the predictor, a complete new codec can be created which first
analyzes the input image and then saves it in either PGF or PNG format based on the predictor's
decision. The new codec and predictor can be regarded as a black-box function, illustrated in figure 1.1.
Three terms used throughout the paper are defined below:
• Encoding Time: the time duration of encoding (conversion) and writing
• Decoding Time: the time duration of reading and decoding (conversion)
• Compression ratio: the ratio between input file size and output file size, i.e. input/output;
e.g. input: image.bmp = 800 KB, output: image.pgf = 400 KB gives a ratio of 2.0.
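In code, this definition amounts to a one-line helper (an illustrative sketch, not the console application's actual measurement code):

```cpp
// Compression ratio as defined above: input file size divided by
// output file size; a value of 2.0 means the output is half as large.
double compressionRatio(double inputBytes, double outputBytes) {
    return inputBytes / outputBytes;
}
```

For the example above, compressionRatio(800, 400) yields 2.0.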
In order to program the predictor, understanding PGF and PNG is essential. In chapters 1.1 and 1.2, I
briefly describe the PGF and PNG technologies. Detailed information about PNG, such as filtering,
zlib, and libpng, is discussed in chapter 2. Chapter 3 introduces the test set, and chapter 4 explains
the algorithm and implementation of the predictor. The complete codec SImage is presented
in chapter 5. The problems encountered and the experience gathered are covered in chapter 6. A
conclusion is drawn in chapter 7.
[Figure: INPUT coffee.bmp enters the BLACK BOX for PROCESSING; the OUTPUT is either coffee.pgf or
coffee.png.]
Figure 1.1: Blackbox
1.1 Progressive Graphics File (PGF)
Progressive Graphics File (PGF)1 was developed by Dr. Christoph Stamm at the Swiss Federal Institute
of Technology Zurich. It is based on a color transform (RGB to YUV) and a discrete wavelet
transform with progressive coding features. It supports both lossy and lossless compression. In its
lossy compression mode, PGF outperforms JPEG for natural and aerial ortho photos and achieves
better compression efficiency (cf. [PGF01]). It supports the following types of images, namely
• 1-bit Bitmap
• 8-, 16-, and 31-bit gray scale support
• 12-, 16-, 24-, and 48-bit RGB support
• 32-bit RGBA support
• 32- and 64-bit CMYK support
• 24- and 48-bit L*a*b* support
1.2 Portable Network Graphics (PNG)
Portable Network Graphics (PNG) is an image file format for storing bitmapped (raster) images on
computers, designed to be the successor to the Graphics Interchange Format (GIF). It supports
the following types of images, namely
• 1-, 2-, 4- and 8-bit palette support (like GIF)
• 1-, 2-, 4-, 8- and 16-bit gray scale support
• 8- and 16-bit-per-sample (that is, 24- and 48-bit) true color support
• 32-bit ARGB
Besides, PNG boasts full alpha transparency with up to 16 bits per sample, compared to the simple
on-off (1-bit) transparency of GIF. As far as compression is concerned, the PNG specification defines
a single compression method, the deflate algorithm, for all image types. Deflate is part of the LZ77
class of compression algorithms and was defined by PKWARE in 1991 as part of the 1.93a beta version
of their PKZIP archiver (cf. [PNG99]). Unlike the LZW compression method used in GIF, which was
covered by a Unisys patent, deflate is royalty free.
Moreover, PNG conducts only lossless compression and is the only lossless true color format for web
applications.
Furthermore, the PNG reference library libpng uses zlib as its underlying compression and decompression
engine. Detailed information about libpng, with respect to filtering and zlib, is given in Chapter 2.
1 c.f. http://de.wikipedia.org/wiki/Progressive_Graphics_File
1.3 Infrastructure
Microsoft Visual Studio 2005 was chosen as the software development platform. The open source
software BMP2PNG by Miyasaka Masaru2, libpng with zlib, the Microsoft CImage3 demo program, and the
PGF console application by Xeraina4 were compiled in release versions from source code; all of them
were used to evaluate the test set, develop the predictor function, and program the complete new
codec SImage. The hardware used in the project was an IBM ThinkPad T43p notebook with Windows XP
Professional, a 1.86 GHz Pentium M, 2 MB L2 cache, and 1 GB RAM.
2 c.f. http://cetus.sakura.ne.jp/softlab/b2p-home/
3 c.f. CImage is from Microsoft, URL: http://msdn2.microsoft.com/en-us/library/bwea7by5(vs.80).aspx
4 c.f. http://www.xeraina.ch/pgf/
2 PNG
The PNG reference library libpng used was version 1.2.12 of June 27, 2006. As far as this project
is concerned, only filter() and deflate() (of its underlying compression engine zlib) will be
extensively discussed, in chapter 2.1 and chapter 2.2 respectively. A brief introduction to libpng
follows.
The granddaddy of all PNG libraries is libpng, the free reference library available as Standard (ANSI)
C source code and used by many, if not most, PNG-supporting applications. It uses the similarly free
zlib library (portable C source code) for compression and decompression. ([PNG99], p.219)
It offers ordinary functionality such as reading, progressive reading, and writing of PNG files. It
also has extensions which are marked as “inactive preprocessor blocks” in the source code. For
example, the auto-filtering method
////////////////////////////////////////////////////////////////////
png_set_filter_heuristics(png_structp png_ptr, int heuristic_method,
int num_weights, png_doublep filter_weights,
png_doublep filter_costs)
////////////////////////////////////////////////////////////////////
is deactivated because PNG_WRITE_WEIGHTED_FILTER_SUPPORTED is by default not defined.
2.1 PNG: Filters
The first assumption about why PNG encodes certain image types better than others concerned
PNG's filtering mechanism.
PNG supports a precompression step called filtering. Filtering is a method of reversibly transforming
the image data so that the main compression engine can operate more efficiently. Besides, filtering
improves compression in gray scale and true color images. As a simple example, consider a sequence
of bytes increasing uniformly from 1 to 255. Since there is no repetition in the sequence, it compresses
either very poorly or not at all. But a trivial modification of the sequence – namely, leaving the
first byte alone but replacing each subsequent byte by the difference between it and its predecessor
– transforms the sequence into an extremely compressible set of 255 identical bytes, each having the
value 1. ([PNG99], p.147)
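This byte-differencing idea can be sketched directly in C++ (a simplified illustration assuming one byte per pixel, not the libpng implementation):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Forward Sub-style filter (one byte per pixel): keep the first byte,
// replace every other byte by the difference to its left neighbor.
// Arithmetic is modulo 256, so the transform loses no information.
std::vector<std::uint8_t> subFilter(const std::vector<std::uint8_t>& row) {
    std::vector<std::uint8_t> out(row.size());
    for (std::size_t i = 0; i < row.size(); ++i)
        out[i] = (i == 0) ? row[i]
                          : static_cast<std::uint8_t>(row[i] - row[i - 1]);
    return out;
}

// Inverse filter: a running sum restores the original bytes exactly,
// demonstrating that filtering is reversible.
std::vector<std::uint8_t> unSubFilter(const std::vector<std::uint8_t>& filtered) {
    std::vector<std::uint8_t> out(filtered.size());
    for (std::size_t i = 0; i < filtered.size(); ++i)
        out[i] = (i == 0) ? filtered[i]
                          : static_cast<std::uint8_t>(filtered[i] + out[i - 1]);
    return out;
}
```

Applied to the sequence 1, 2, ..., 255 from the example above, subFilter yields 255 identical bytes of value 1, which deflate then compresses extremely well.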
It supports five types of filters, and the encoder may automatically choose a different filter for
each row of pixels in the image if PNG_WRITE_WEIGHTED_FILTER_SUPPORTED is defined. Table 2.1 lists
the five filter types. ([PNG99], p.149)

Name      Description
None      Each byte is unchanged.
Sub       Each byte is replaced with the difference between it and the
          “corresponding byte” to its left.
Up        Each byte is replaced with the difference between it and the
          byte above it (in the previous row, as it was before filtering).
Average   Each byte is replaced with the difference between it and the
          average of the corresponding bytes to its left and above it,
          truncating any fractional part.
Paeth     Each byte is replaced with the difference between it and the
          Paeth predictor of the corresponding bytes to its left, above
          it, and to its upper left.
Table 2.1: Filter Types

As mentioned before, PNG_WRITE_WEIGHTED_FILTER_SUPPORTED is deactivated by default, so the
corresponding function png_set_filter_heuristics() in libpng is also inactive. Without weighted
filter support, PNG applies the given filter to every row of pixels in the image rather than
automatically choosing the most suitable filter per row. However, the PNG development group has come
up with a few rules of thumb (or heuristics) for choosing filters wisely. When
PNG_WRITE_WEIGHTED_FILTER_SUPPORTED is turned on, a heuristic method called the weighted sum of
absolute differences can be used. libpng contains code to enable this heuristic, but a considerable
amount of experimentation is yet to be done to determine the best combination of weighting factors,
compression levels (zlib), and image types.
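Of the five filters in table 2.1, only the Paeth filter is non-obvious. Its predictor, defined precisely in the PNG specification, selects whichever of the left, above, and upper-left bytes lies closest to the linear estimate left + above - upper-left. A direct transcription of the specification's pseudocode:

```cpp
#include <cstdlib>

// Paeth predictor as defined in the PNG specification: p = a + b - c is
// the linear estimate from the left (a), above (b), and upper-left (c)
// neighbors; return whichever neighbor lies closest to p, breaking ties
// in the order a, b, c.
int paethPredictor(int a, int b, int c) {
    int p  = a + b - c;
    int pa = std::abs(p - a);
    int pb = std::abs(p - b);
    int pc = std::abs(p - c);
    if (pa <= pb && pa <= pc) return a;
    if (pb <= pc) return b;
    return c;
}
```

The tie-breaking order matters: encoder and decoder must implement it identically, or decoded rows diverge from the original.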
One can also imagine heuristics involving higher-order distance metrics (e.g., root-mean-square sums),
sliding averages, and other statistical methods, but to date there has been little research in this
area. Lossless compression is a necessity for many applications, but cutting-edge research in image
compression tends to focus almost exclusively on lossy methods, since the payoff there is so much
greater. Even within the lossless domain, preconditioning the data stream is likely to have less effect
than changing the back-end compression algorithm itself. ([PNG99], p.150)
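The simplest such rule of thumb, picking per row the filter whose output has the smallest total magnitude, can be sketched as follows (an illustrative version of the minimum-sum-of-absolute-differences idea, not libpng's weighted variant):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Minimum-sum-of-absolute-differences rule of thumb for per-row filter
// selection: given the candidate filtered versions of one row, return
// the index of the one whose bytes, read as signed values, have the
// smallest total magnitude; small residuals tend to compress best.
std::size_t pickFilter(const std::vector<std::vector<std::uint8_t>>& candidates) {
    std::size_t best = 0;
    long bestScore = -1;
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        long score = 0;
        for (std::uint8_t byte : candidates[i])
            score += std::abs(static_cast<int>(static_cast<std::int8_t>(byte)));
        if (bestScore < 0 || score < bestScore) {
            bestScore = score;
            best = i;
        }
    }
    return best;
}
```

The signed interpretation is deliberate: a filtered byte of 255 really means a difference of -1, which is a small residual, not a large one.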
Besides, it is worth mentioning that the results obtained by experimenting with the heuristic filter
method further confirmed that filtering is not the key factor behind PNG's different encoding
behavior on different image types: using the heuristic filter method on the images in the test set
did not achieve better compression ratios than encoding without it (using a given filter or no
filter at all).
2.2 PNG: zlib
Only through numerous tests on the test set and an understanding of PNG, particularly PNG's filtering
mechanism, could I find out that the key component of PNG's encoding behavior is zlib's deflate
algorithm. The PNG specification defines a single compression method, the deflate algorithm, for all
image types.
Part of the LZ77 class of compression algorithms, deflate was defined by PKWARE in 1991 as part of
the 1.93a beta version of their PKZIP archiver. As an LZ77-derived algorithm, deflate is fundamentally
based on the concept of a “sliding window”. One begins with the premise that many types of interesting
data, from binary computer instructions to source code to ordinary text to images, are repetitious to
varying degrees. The basic idea of a sliding window is to imagine a window of some width immediately
preceding the current position in the data stream (and therefore sliding along as the current position
is updated), which one can use as a kind of dictionary to encode subsequent data.
The deflate compressor is given a great deal of flexibility as to how to compress the data. The
programmer must deal with the problem of designing smart algorithms to make the right choices, but
the compressor does have choices about how to compress data.
There are three modes of compression that the compressor has available:
1. Not compressed at all. This is an intelligent choice for, say, data that’s already been compressed.
Data stored in this mode will expand slightly, but not by as much as it would if it were already
compressed and one of the other compression methods was tried upon it.
2. Compression, first with LZ77 and then with Huffman coding. The trees that are used to compress
in this mode are defined by the Deflate specification itself, and so no extra space needs to be
taken to store those trees.
3. Compression, first with LZ77 and then with Huffman coding with trees that the compressor
creates and stores along with the data.
The data is broken up in “blocks,” and each block uses a single mode of compression. If the compressor
wants to switch from non-compressed storage to compression with the trees defined by the specification,
or to compression with specified Huffman trees, or to compression with a different pair of Huffman
trees, the current block must be ended and a new one begun.1
In the previous chapters, I showed from the test-set results that PNG encodes CGIs with a high
compression ratio. Considering how deflate's “sliding window” works, we can see that CGIs, much more
than photographic images, have the characteristic that at any given point in the data there are byte
sequences identical to ones found earlier within the sliding window. Therefore, deflate is the
reason why PNG encodes CGIs better.
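The sliding-window idea behind deflate can be illustrated with a toy longest-match search (a deliberately naive sketch; zlib's real implementation uses hash chains and also allows matches that overlap the current position):

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Toy LZ77 longest-match search: within the last `window` bytes before
// position `pos`, find the longest earlier run matching the bytes that
// start at `pos`. Returns {offset, length}; {0, 0} means no match.
std::pair<std::size_t, std::size_t> longestMatch(const std::string& data,
                                                 std::size_t pos,
                                                 std::size_t window) {
    std::size_t bestOff = 0, bestLen = 0;
    std::size_t start = (pos > window) ? pos - window : 0;
    for (std::size_t cand = start; cand < pos; ++cand) {
        std::size_t len = 0;
        while (pos + len < data.size() &&
               cand + len < pos &&            // keep the match behind pos
               data[cand + len] == data[pos + len])
            ++len;
        if (len > bestLen) {
            bestLen = len;
            bestOff = pos - cand;
        }
    }
    return {bestOff, bestLen};
}
```

In "abcabc" the three bytes starting at position 3 can be replaced by a back-reference (offset 3, length 3). Repetitious CGI data is full of such matches; the noise in photographic data is not.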
1 c.f. http://www.zlib.net
3 Testset for PGF & PNG
In order to develop a predictor function, it is important to analyze a set of chosen images of
various types and evaluate the encoding time, decoding time, and compression ratio for both the PGF
and PNG formats. Three applications were used for the test. BMP2PNG and CImage were used for
encoding given image file formats (JPEG, GIF, BMP) to PNG (BMP2PNG only does BMP-to-PNG conversion),
whereas the PGF console application measures the encoding time, decoding time, and compression ratio
for any given image format and converts to PGF and back.
3.1 Test Set
The set1 of chosen images, called the test set, is listed in Table 3.1:
Name             Description  Format  Type          Color&BPP    Size(KB)  Dimension(pixel)
1  Windows       Kodak Set    BMP     Natural       RGB-24       1153      768 X 512
2  Door          Kodak Set    BMP     Natural       RGB-24       1153      768 X 512
3  Hats          Kodak Set    BMP     Natural       RGB-24       1153      768 X 512
4  Woman         Kodak Set    BMP     Natural       RGB-24       1153      512 X 768
5  Racing        Kodak Set    BMP     Natural       RGB-24       1153      768 X 512
6  Boat          Kodak Set    BMP     Natural       RGB-24       1153      768 X 512
7  Hibiscus      Kodak Set    BMP     Natural       RGB-24       1153      768 X 512
8  Houses        Kodak Set    BMP     Natural       RGB-24       1153      768 X 512
9  Aerial        PGF Set      BMP     Aerial Ortho  RGB-24       1153      768 X 512
10 Compound      PGF Set      BMP     Screen shot   RGB-24       1153      768 X 512
11 Logo          PGF Set      BMP     CGI (Text)    RGB-24       407       615 X 225
12 Redbrush      PGF Set      BMP     Natural       RGB-24       1153      768 X 512
13 Paris Hilton  Internet     BMP     Natural       Gray-8       162       362 X 450
14 ETH           Internet     BMP     Natural       RGB-24       1153      512 X 768
15 ARGB          Internet     BMP     CGI (Text)    ARGB-32      973       576 X 432
16 2WO           Selfmade     BMP     CGI (Text)    RGB-24       76        160 X 160
17 2WO8          Selfmade     BMP     CGI (Text)    RGB-Index-8  27        160 X 160
Table 3.1: Test Set Images
The first 8 images were taken from the Kodak test set; they share the same image file format (BMP),
type (natural picture), color space (RGB), bits per pixel (24), size (1153 KB), and dimension
(768 X 512, or 512 X 768 for image 4). Images 9 to 12 were taken from the PGF test set and are
special images for testing purposes. Image 9 is an aerial ortho photo and image 10 is a screen shot
consisting of text, charts, and computer graphics. Image 11 is a CGI with text; image 12, however, is
a normal natural picture. These four images are in RGB color space with 24 bits per pixel. Except for
image 11 (logo), all of them have the same size and dimensions. Images 13 to 17 were either gathered
from the Internet or
1 In fact, 100 images were tested, including photo-realistic CGIs and screen shots. For
documentation purposes, only 17 images are presented.
self-made. Image 13 is a gray-scale picture with 8 bits per pixel, image 15 a 32-bit ARGB image, and
image 17 an 8-bit RGB palette image.
3.2 Software
For the test set, the open source software BMP2PNG, the Microsoft CImage demo program, and the PGF
console application were used. These programs were compiled in release versions from source code
with slight modifications. Details about the modifications can be found in the following chapters.
3.2.1 PGF Console Application
The PGF console application was developed by the company Xeraina. It encodes various image formats
to PGF and decodes from PGF back to them. It not only measures the compression ratio by calling the
method static double Ratio(char *source, char *dest), but also calculates both encoding and decoding
time. The results for the test set obtained with the PGF console application are listed in table 3.2
and illustrated in figures 3.1 and 3.2.
The PGF application makes use of the C++ class CImage, which provides enhanced bitmap support,
including the ability to load and save images in JPEG, GIF, BMP, and PNG formats. The encoding
procedure is accomplished by first reading input images using CImage's image->Load(source) method,
whereas the decoding procedure is conducted by calling CImage's image->Save(dest) to save decoded
images in CImage-compatible image file formats.
The PGF application could have been used entirely without modification had the 8-bit gray image not
been included in the test set. Some code was added so that the application accepts 8-bit images.
The commands used were:
• pgf input.bmp output.pgf
• pgf input.pgf output.bmp
The complete codec (c.f. chapter 5) will then be developed on the basis of this PGF console
application, with the predictor function depicted in chapter 4.
3.2.2 BMP2PNG
The open source software BMP2PNG, written by Miyasaka Masaru, uses the official libpng as its default
PNG library. As its name suggests, it encodes BMP images to the PNG format. In order to measure
encoding time, decoding time, and compression ratio, some code was added to calculate these values.
Similar to the PGF console's measurement, time.h's clock() function was called around the effective
encoding procedure, namely png_write_image(png_ptr, img->rowptr). The results for the test set
obtained with BMP2PNG are listed in table 3.2 and illustrated in figures 3.1 and 3.2.
The command used was: bmp2png -9 input.bmp
3.2.3 Microsoft CImage
The Microsoft CImage demo program was also used for PNG encoding and decoding, to test the PNG
implementation of Microsoft's C++ class CImage. Unlike the PGF console application and BMP2PNG, the
demo application has a GUI. The backend code in CChildView.cpp was modified so that the GUI outputs
encoding time and compression ratio via ::AfxMessageBox() (see the following code snippet for
details). The results for the test set obtained with CImage are listed in table 3.2 and illustrated
in figures 3.1 and 3.2.
//////////////////////////////////////////////
#include <time.h>
...
clock_t start, end;
double elapsed;
...
start = clock();
// Encoding ...
end = clock();
elapsed = (double) (end-start)/CLOCKS_PER_SEC;
CString fmt;
fmt.Format("Encoding PNG: %f\n", elapsed);
::AfxMessageBox(fmt);
//////////////////////////////////////////////
3.3 Test Result
Name             Ratio*                 Encode (s)**        Decode PGF (s)  Predictor
1  Windows       2.228/1.486/1.588      0.188/0.141/0.094   0.141           PGF
2  Door          2.533/1.778/1.817      0.187/0.188/0.109   0.141           PGF
3  Hats          2.795/2.071/2.071      0.172/0.156/0.094   0.110           PGF
4  Woman         2.406/1.577/1.648      0.187/0.141/0.110   0.140           PGF
5  Racing        2.147/1.288/1.352      0.203/0.141/0.11    0.141           PGF
6  Boat          2.401/1.642/1.717      0.188/0.171/0.11    0.125           PGF
7  Hibiscus      2.663/1.815/1.809      0.171/0.171/0.094   0.125           PGF
8  Houses        2.075/1.239/1.324      0.203/0.141/0.094   0.156           PGF
9  Aerial        2.323/1.362/1.400      0.188/0.141/0.093   0.141           PGF
10 Compound      4.940/7.712/7.067      0.125/0.188/0.032   0.094           PNG
11 Logo          10.259/46.889/31.871   0.047/0.062/0.015   0.031           PNG
12 Redbrush      2.971/1.679/1.665      0.157/0.156/0.078   0.125           PGF
13 Paris Hilton  2.318/1.471/1.463      0.031/0.046/0.015   0.031           PGF
14 ETH           3.635/2.248/2.236      0.11/0.125/0.078    0.688           PGF
15 ARGB          5.029/3.460/3.330      0.094/0.171/0.062   0.063           PGF
16 2WO           14.746/178.73/60.136   0.000/0.000/0.000   0.000           PNG
17 2WO8          N.A./26.572/19.152     N.A./0.000/0.000    N.A.            PNG
Table 3.2: Test Result
*: Ratio, in the order PGF, PNG (BMP2PNG), PNG (CImage)
**: Encoding time, in the order PGF, PNG (BMP2PNG), PNG (CImage)
Figure 3.1: Ratio
Figure 3.2: Encoding time
Since the project is about predicting whether a given image should be encoded as PGF or PNG, the
values of encoding time and compression ratio are sufficient. For this reason, decoding time was
measured only for PGF images.
It is obvious that for certain images PNG outperforms in both encoding time and compression ratio.
These images are 10 Compound, 11 Logo, 16 2WO, and 17 2WO8, which are all CGIs. This result confirms
PNG's promised strength in compressing artificial images and reinforces its role as the successor of
GIF. For photographic or natural images, PGF excels in compression ratio and demonstrates almost the
same encoding time as PNG. Therefore an initial conclusion can be drawn: PNG is appropriate for
CGIs, whereas PGF is appropriate for photographic or natural images.
A controversial point arose, however, when photo-realistic CGIs were tested. Photo-realistic CGIs
also belong to the family of CGIs and so should be better encoded with PNG, but the astonishing test
result showed that PGF achieved a better compression ratio than PNG without any penalty in encoding
time. Therefore the initial conclusion must be modified: PNG is appropriate for ordinary CGIs,
whereas PGF is appropriate for photographic images, photo-realistic CGIs, and any other images which
can be visually perceived as photographic. This conclusion leads to the central design decision of
the predictor, covered in the next chapter.
4 Predictor
4.1 Strategy
The test results obtained from the test set served as the basis for designing the predictor. They
implied that photographic images, photo-realistic CGIs, and any other images which can be visually
perceived as photographic achieve a better compression ratio when encoded with PGF, and that
ordinary CGIs are better served by PNG.
Since the predictor's decision is binary, either PGF or PNG, there are two ways to design it. One is
to understand both the PGF and PNG technologies thoroughly (because PGF and PNG use different
compression mechanisms) and derive the decision from them. The other is to reduce or transform the
given problem into another one: the answer to the question of how to distinguish photographic(-like)
images from ordinary CGIs is also the solution to the problem of whether an input image should be
encoded as PGF or PNG. While it seems trivial for a human being to differentiate between
photographic(-like) images and ordinary CGIs, it is not easy for a computer to do so. For this
reason, the following chapters explain how a computational program simulates a human being's visual
perception and makes this binary decision, and they pinpoint the core design of the predictor
function.
4.2 Complete Procedure
Before giving insights into each aspect of the predictor, I would like to present the complete
procedure first, so that a big picture can be obtained and a better understanding gained. In
figure 4.1, the whole processing chain and procedure, steps 1 to 3, are depicted. For the predictor,
only step 2 with its sub-steps, step 3, and the pre-processing need to be observed. The prediction
procedure is described in words as follows:
1. Determine the quadruple feature space (c.f. chapter 4.3).
2. Set up a training set of 42 images of different types (photographic images, photo-realistic CGIs,
screen shots, ordinary CGIs) and calculate the quadruple features for each training image. The
images are classified as PGF or PNG by hand, based upon the compression ratios achieved by each
compression engine respectively (c.f. chapter 4.4).
3. Predictor Function (c.f. chapter 4.5)
a) For any given image, the predictor first calculates its quadruple features
b) Calculate the euclidean distance between the input image and all images in the training set
QZ.November 30, 2006
17 of 35
[Figure 4.1 shows the complete processing chain as a flow diagram. INPUT: a sample image coffee.bmp of a CImage-supported file format (JPG/GIF/BMP/PNG) is read in step 1 via SImage.Load(coffee.bmp). PRE-PROCESSING: the feature space is defined; an image is mathematically represented as a quadruplet consisting of the number of unique colors, the spatial variation of color, the pixel saturation, and the prevalent color. PROCESSING: in step 2 the predictor determines whether the input image should be encoded as PGF or PNG. Step 2.1 calculates the feature quadruplet of coffee.bmp = {0.206179, 0.00394969, 33.00000, 0.275221}; step 2.2 calculates the Euclidean distance between coffee.bmp and all images in the training set of precalculated quadruplets (left-hand side entries labeled 1 for PGF, right-hand side labeled 0 for PNG); step 2.3 uses the K-nearest neighbor algorithm with K = 3 to choose the three closest samples and make a binary decision. OUTPUT: in step 3 the decision is applied: two of the three closest neighbors are from the PGF set and one from the PNG set, so PGF is chosen and coffee.pgf is written.]
Figure 4.1: Procedure
c) Use the K-nearest neighbor algorithm to choose the K = 3 closest samples and make a binary decision.
d) The method predictor() returns true (PGF) or false (PNG).
4.3 Feature Space
There are several papers describing how to distinguish photographic images from CGIs. The paper [DPG97] focuses on the detection between photographs and normal CGIs, whereas both [DPC05] and [HRP05] deal with distinguishing photographs from photo-realistic CGIs. Although the solution to the predictor design is not exactly the same as the solutions found in the above papers, I could draw inspirational ideas from them. As far as the predictor’s efficiency is concerned, only visual features, but no texture features1, were considered. Likewise, not all visual features described in [DPG97] and [DPC05] were selected for the predictor, for the same reason2. Four features were therefore carefully chosen: Number of Unique Colors Ratio, Spatial Variation of Color, Pixel Saturation, and Prevalent Color.
In pattern recognition a feature space is an abstract space where each pattern sample is represented
as a point in n-dimensional space whose dimension is determined by the number of features used to
1 Texture features require an implementation of Gabor filters, which is time-consuming.
2 Edge detection is also very time-consuming.
describe the patterns.3 An image can therefore be thought of as a point in 4-dimensional space, which to some extent characterizes the image in another form: numbers. All photographic images (or alike) form a convex hull different from the one formed by ordinary CGIs. Though these two hulls expand differently in the feature space, they may overlap. The four features are explained in detail in chapters 4.3.1 to 4.3.4. Saying that the four features also demonstrate low correlation among each other is only partly true, since feature 1 and feature 4 have a higher correlation coefficient, which implies that either of them could be omitted. However, in order to make a strong mathematical/statistical statement, more data is needed; as only 42 training images were used in this project, no accurate assertion about the orthogonality can be made. Figures 4.2 to 4.7 illustrate this fact.
Figure 4.2: F1 vs. F2
Figure 4.3: F1 vs. F3
3 http://en.wikipedia.org/wiki/Feature_space, 2006
Figure 4.4: F1 vs. F4
Figure 4.5: F2 vs. F3
Figure 4.6: F2 vs. F4
Figure 4.7: F3 vs. F4
4.3.1 Number of Unique Colors
CGIs tend to have fewer unique colors than photographs. In the study of J. Wu et al. [DPC05], the average number of colors of photographs was observed to be about 25% higher than that of ordinary CGIs.
Given an image with an <R,G,B> triplet for each pixel, two pixels have the identical color if their CImage <R,G,B> COLORREF is the same. The ratio U of the number of unique colors is calculated by normalizing the total number of unique colors by the total number of pixels:

U = N_different_triplets / N_pixels

Images suitable for PGF have more unique colors; images suitable for PNG have fewer. For efficiency, pixels are sampled randomly, one in every 100 of the original image. The method below returns the ratio U of the number of unique colors:
///////////////////////////////////
double NumberOfUniqueColorsRatio();
///////////////////////////////////
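The method itself operates on the private CImage member; as an illustration only, the following standalone sketch computes the same ratio on a flat buffer of packed 0x00RRGGBB values. The function name, the buffer representation, and the stride parameter are assumptions, not the project’s code:

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical standalone sketch of the unique-colors ratio
// U = N_different_triplets / N_sampled_pixels. Following the text, only one
// pixel in every 100 is examined for efficiency (stride = 100).
double UniqueColorsRatio(const std::vector<uint32_t>& pixels, std::size_t stride = 100) {
    std::unordered_set<uint32_t> unique;   // distinct <R,G,B> triplets seen so far
    std::size_t sampled = 0;
    for (std::size_t i = 0; i < pixels.size(); i += stride) {
        unique.insert(pixels[i]);
        ++sampled;
    }
    return sampled ? static_cast<double>(unique.size()) / sampled : 0.0;
}
```

A photograph tends to drive the ratio toward 1, whereas a flat-colored CGI yields a value near 0.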
4.3.2 Spatial Variation of Color
It has been observed [DPC05] that color changes from pixel to pixel occur to a lesser extent in ordinary CGIs than in photographs. To quantify the spatial variation of color, for each pixel of the image the orientation of the plane that best fits the 5×5 neighborhood centered on the pixel of interest is determined. After normalizing the <R,G,B> triplets to n_R, n_G, and n_B, a variation α can be obtained:

α = ( Σ_{i=1}^{24} √( (n_Ri − n_R0)² + (n_Gi − n_G0)² + (n_Bi − n_B0)² ) ) / (5 × 5)

The Euclidean distance of each pixel in the 5×5 neighborhood to the center pixel is summed up and normalized by the total number of 25 pixels.
Images suitable for PGF have a higher value of variation; images suitable for PNG have a lower one. Pixels are sampled randomly, one in every 100 of the original image. The method below returns the variation α of the given image:
/////////////////////////////////
double SpatialVariationOfColor();
/////////////////////////////////
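As an illustration of the formula, the sketch below evaluates α for a single 5×5 window; the RGB struct and the function name are assumptions, and the real method additionally samples windows across the image:

```cpp
#include <cmath>

struct RGB { double r, g, b; };  // channel values normalized to [0, 1]

// Hypothetical sketch of the per-window variation: the Euclidean distance in
// normalized RGB space between each of the 24 neighbors and the center pixel
// window[2][2], summed and divided by 5 x 5 = 25.
double NeighborhoodVariation(const RGB window[5][5]) {
    const RGB& c = window[2][2];
    double sum = 0.0;
    for (int y = 0; y < 5; ++y) {
        for (int x = 0; x < 5; ++x) {
            if (x == 2 && y == 2) continue;  // skip the center pixel itself
            const double dr = window[y][x].r - c.r;
            const double dg = window[y][x].g - c.g;
            const double db = window[y][x].b - c.b;
            sum += std::sqrt(dr * dr + dg * dg + db * db);
        }
    }
    return sum / (5.0 * 5.0);
}
```

For a perfectly uniform window the variation is 0, as expected for flat CGI regions.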
4.3.3 Pixel Saturation
Another observation [DPC05] is that the mean and variance of pixel saturation of ordinary CGIs are higher than those of photographs. Photographs, on the other hand, contain more unsaturated pixels than CGIs. Pixel saturation can be quantified as the ratio between the number of saturated and unsaturated pixels in a saturation histogram. The numbers of saturated and unsaturated pixels are obtained by counting the highest bin and the lowest bin of the histogram. The saturation value S can be described as:

S = N_saturated / N_unsaturated

Images suitable for PGF have more unsaturated pixels (S small); images suitable for PNG have more saturated pixels (S large). Pixels are sampled randomly, one in every 100 of the original image. The method below returns the ratio S:
method below returns the ratio S:
/////////////////////////
double PixelSaturation();
/////////////////////////
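A sketch of how S could be derived from a saturation histogram is given below; the 16-bin layout, the input representation, and the function name are illustrative assumptions:

```cpp
#include <vector>

// Hypothetical sketch of S = N_saturated / N_unsaturated: build a histogram
// of per-pixel saturation values (each in [0, 1]) and divide the count of the
// highest bin by the count of the lowest bin.
double SaturationRatio(const std::vector<double>& saturations) {
    const int kBins = 16;                       // illustrative bin count
    long histogram[kBins] = {0};
    for (double s : saturations) {
        int bin = static_cast<int>(s * kBins);
        if (bin >= kBins) bin = kBins - 1;      // clamp s == 1.0 into the top bin
        ++histogram[bin];
    }
    if (histogram[0] == 0) return 0.0;          // guard: no unsaturated pixels
    return static_cast<double>(histogram[kBins - 1]) / histogram[0];
}
```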
4.3.4 Prevalent Color
The prevalent color metric is obtained by finding the most frequently occurring color in the image. The ratio P is calculated by dividing the number of pixels that have the prevalent color by the total number of pixels:

P = N_prevalent_color / N_pixels

Pixels are sampled randomly, one in every 100 of the original image. The method below returns the ratio P:
////////////////////////
double PrevalentColor();
////////////////////////
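As an illustration, the ratio P could be computed with a simple color-count map; the packed-pixel representation and the function name are assumptions:

```cpp
#include <algorithm>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical sketch of P = N_prevalent_color / N_pixels for packed
// 0x00RRGGBB pixels: count each color and divide the largest count by the
// number of pixels examined.
double PrevalentColorRatio(const std::vector<uint32_t>& pixels) {
    if (pixels.empty()) return 0.0;
    std::unordered_map<uint32_t, std::size_t> counts;
    std::size_t best = 0;
    for (uint32_t p : pixels) best = std::max(best, ++counts[p]);
    return static_cast<double>(best) / pixels.size();
}
```

Screenshots with large flat backgrounds drive P up, which is one reason this feature separates ordinary CGIs from photographs.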
4.4 Training Set
In order to use the K-nearest neighbor algorithm to classify an input image, a training set of 42 images of different types was assembled: photographic images, photo-realistic CGIs, screenshots, ordinary CGIs. Then the four feature values of each image in the training set were calculated. Last but not least, the PGF console application and BMP2PNG were used once more to determine the compression ratio of each training image, so that a manual classification of these images could be made. The training data are represented as a two-dimensional array. The first dimension is defined by size, which is 42, the number of images in the training set, whereas the number 5 indicates the four features (the first four elements) plus a class label stating whether the image is better encoded in PGF (1) or in PNG (0):
//////////////////////////////////////////////////////////////////////////////////
double trainingSet[size][5] = { // the last number 1 means that this image has a
// better compression ratio when encoded as PGF
{0.665056,0.00892906,1.09302,0.00241607,1},
{0.655052,0.0732959,7.775,0.00487805,1},
{0.750381,0.0463105,1.30814,0.00890132,1},
{0.507407,0.0119276,0.333333,0.0488889,1},
...
{0.167755,0.0258481,5.44595,0.608707,0},
{0.0281633,0.0112492,42.3333,0.689524,0},
{0.229212,0.0302594,15.6061,0.335853,0},
{0.216314,0.0134404,25.8,0.149423,0},
};
//////////////////////////////////////////////////////////////////////////////////
4.5 Predictor Function
Though the complete procedure has already been outlined in chapter 4.2, I would like to emphasize the steps concerning the predictor once more; figure 4.8 depicts this procedure.
[Figure 4.8 illustrates the K-nearest neighbor classification. The query image coffee.bmp, with Number of Unique Colors 0.206179, Spatial Variation of Color 0.00394969, Pixel Saturation 33.00000, and Prevalent Color 0.275221, is compared against the precalculated quadruplets of the training set (left-hand side entries labeled 1 for PGF, e.g. 00_windows, 01_door, 02_hats, 03_woman; right-hand side entries labeled 0 for PNG, e.g. 29_compound, 30_screenshot, 31_babytux, 32_screenshot2). Using the Euclidean distance, the K = 3 nearest neighbor points are chosen; two belong to the PGF class and one to the PNG class, so PGF is used. The graph is for illustration purposes only; the depicted distances and the choice are not exact.]
Figure 4.8: K-nearest Neighbor Algorithm
1. For any given image, the predictor first calculates its quadruplet features.
2. Calculate the Euclidean distance between the input image and all images in the training set.
3. Use the K-nearest neighbor algorithm to choose the K = 3 closest samples and make a binary decision.
4. The method predictor() returns true (PGF) or false (PNG).
The function signature is:
/////////////////
bool Predictor();
/////////////////
4.5.1 Euclidean Distance
If an input image is coffee.bmp, it has the following values for the feature space
• p1 = 0.206179
• p2 = 0.00394969
• p3 = 33.00000
• p4 = 0.275221
The Euclidean distance between this image and another image (let’s say q, with q1, q2, q3, and q4 as quadruplet values in the feature space) from the training set is:

√( (p1 − q1)² + (p2 − q2)² + (p3 − q3)² + (p4 − q4)² )

or, in general,

√( Σ_{i=1}^{n} (pi − qi)² ), where n = 4
The function signature is:
/////////////////////////////////////////////////////////////////
double getEuclideanDistance (double a[], double b[], int length);
/////////////////////////////////////////////////////////////////
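One possible implementation of the declared signature (a sketch, not necessarily the project’s exact code):

```cpp
#include <cmath>

// Euclidean distance between two length-dimensional feature vectors a and b:
// the square root of the summed squared component differences.
double getEuclideanDistance(double a[], double b[], int length) {
    double sum = 0.0;
    for (int i = 0; i < length; ++i) {
        const double d = a[i] - b[i];
        sum += d * d;
    }
    return std::sqrt(sum);
}
```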
4.5.2 KNN Algorithm
In pattern recognition, the K-nearest neighbor algorithm (KNN) is a method for classifying objects
based on closest training examples in the feature space. The training examples are mapped into
multidimensional feature space. The space is partitioned into regions by class labels of the training
samples. A point in the space is assigned to the class c if it is the most frequent class label among the
K nearest training samples. Usually the Euclidean distance is used.
The training phase of the algorithm consists only of storing the feature vectors and class labels of
the training samples. In the actual classification phase, the same features as before are computed
for the test sample (whose class is not known). Distances from the new vector to all stored vectors
are computed and K closest samples are selected. The new point is predicted to belong to the most
numerous class within the set. The best choice of K depends upon the data; generally, larger values of K reduce the effect of noise on the classification, but make boundaries between classes less distinct.4
The KNN implementation in this project is listed below:
1. Determine parameter K, the number of nearest neighbors, choosing K = 3.
2. Calculate the distance between the query instance (the input image’s quadruplet) and all images in the training set using the Euclidean distance.
3. Sort the distances and determine the nearest neighbors based on the K-th minimum distance.
4. Use a simple majority of the categories of the nearest neighbors as the prediction value of the query instance.
5. The predictor outputs true for PGF, otherwise false for PNG.
The function signature is:
////////////////////
bool KNNAlgorithm();
////////////////////
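The five steps above can be sketched as follows; the free-standing function, its parameter types, and the row layout (four features followed by the 1/0 class label, as in trainingSet[][5]) are assumptions made for the sake of a self-contained example:

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <utility>
#include <vector>

// Hypothetical sketch of the K = 3 nearest-neighbor vote. Each training row
// holds four feature values plus the class label (1 = PGF, 0 = PNG); returns
// true when the majority of the K closest samples is labeled PGF.
bool KnnPredict(const std::vector<std::array<double, 5>>& training,
                const std::array<double, 4>& query, std::size_t k = 3) {
    // Step 2: pair each training sample's Euclidean distance with its label.
    std::vector<std::pair<double, int>> neighbors;
    for (const auto& row : training) {
        double sum = 0.0;
        for (int i = 0; i < 4; ++i) {
            const double d = query[i] - row[i];
            sum += d * d;
        }
        neighbors.push_back({std::sqrt(sum), static_cast<int>(row[4])});
    }
    if (k > neighbors.size()) k = neighbors.size();
    // Step 3: move the k closest samples to the front.
    std::partial_sort(neighbors.begin(), neighbors.begin() + k, neighbors.end());
    // Steps 4-5: simple majority vote over the k nearest labels.
    std::size_t pgfVotes = 0;
    for (std::size_t i = 0; i < k; ++i) pgfVotes += neighbors[i].second;
    return 2 * pgfVotes > k;  // true -> PGF, false -> PNG
}
```

An odd K such as 3 avoids ties in the two-class vote, which is one reason for the choice K = 3.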
4.5.3 Utilities
The method double getSaturation(int R, int G, int B) converts an <R,G,B> triplet to its corresponding saturation value; the algorithm used is from [DBV05], p. 261. With the channel values normalized to [0, 1]:

C_high = max(R, G, B),  C_low = min(R, G, B),  C_rng = C_high − C_low

L_HLS = (C_high + C_low) / 2

S_HLS = 0                               for L_HLS = 0
S_HLS = 0.5 · C_rng / L_HLS             for 0 < L_HLS ≤ 0.5
S_HLS = 0.5 · C_rng / (1 − L_HLS)       for 0.5 < L_HLS < 1
S_HLS = 0                               for L_HLS = 1
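A sketch of getSaturation() following this piecewise definition, assuming 8-bit channel values that are normalized to [0, 1] first:

```cpp
#include <algorithm>

// Hypothetical sketch of getSaturation(): normalize the 8-bit channels, then
// apply the piecewise HLS saturation formula from [DBV05].
double getSaturation(int R, int G, int B) {
    const double r = R / 255.0, g = G / 255.0, b = B / 255.0;
    const double chigh = std::max(r, std::max(g, b));
    const double clow  = std::min(r, std::min(g, b));
    const double crng  = chigh - clow;
    const double l     = (chigh + clow) / 2.0;   // L_HLS
    if (l <= 0.0 || l >= 1.0) return 0.0;        // L_HLS = 0 or 1: no saturation
    if (l <= 0.5) return 0.5 * crng / l;         // 0 < L_HLS <= 0.5
    return 0.5 * crng / (1.0 - l);               // 0.5 < L_HLS < 1
}
```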
4 c.f. http://en.wikipedia.org/wiki/Nearest_neighbor_(pattern_recognition), 2006
4.6 NUnit Test
NUnit5 was used as the unit-testing framework for the predictor.
The sample code is listed below.
////////////////////////////////////////////////////////////////////////////////
namespace NUnitPP {
    SImage *image = 0;
    char* source = 0;

    void ppunit::Init() {
        image = new SImage();
        source = "test.bmp";
        image->Load(source);
    }

    // test.bmp: 0.794167, 0.0149229, 0.428571, 0.0158333
    void ppunit::NumberOfUniqueColoursRatio_NUnit() {
        double result = image->NumberOfUniqueColorsRatio() - 0.794167;
        Assert::IsTrue(fabs(result) < 0.05, "Test for NumberOfUniqueColoursRatio() failed");
    }
}
////////////////////////////////////////////////////////////////////////////////
For now only one image is tested. More images could be tested if source were redefined as an array of char*. Because the feature-calculating methods have a random nature, the results differ from test to test; a pre-determined tolerance (here 0.05) was used to allow a tiny bias while still setting a limit. The NUnit tests showed that the predictor’s accuracy is higher for photographic images or alike, and eighty percent for ordinary CGIs.
4.7 Extensibility vs. Efficiency
The accuracy could be further enhanced if other metrics such as a farthest-neighbor metric, a color histogram, the dimension ratio, or a texture feature metric using Gabor filters (wavelets) were computed. However, for the sake of the predictor’s efficiency and the scope of this project, these metrics are regarded as extensible features. Besides, correct handling of grayscale images needs to be examined further. Moreover, PGF’s lossy compression could also be included in the project.
Another issue relating to efficiency is the use of void* GetBits(). Using this pointer, along with the value returned by int GetPitch() const, one can locate and change individual pixels in an image, which achieves the highest possible efficiency. In this project, however, the method void* CImage::GetPixelAddress(int x, int y) was used, which is faster than GetPixel() but slower than the combination of GetBits() and GetPitch().
Also, some optimizations were included in the predictor. If an input image is indexed, no prediction is needed, because PGF doesn’t support indexed images at the moment. It was also observed that any image converted to fewer than 256 colors is better encoded with PNG; nevertheless, the highest bits-per-pixel value for which this holds is unknown.
5 c.f. http://www.nunit.org/
5 Complete Codec
With the help of the predictor, a complete codec can now be constructed which encodes any CImage-compatible type to either PGF or PNG and decodes any CImage-compatible type, as well as PGF, to any CImage-compatible type. The new codec is called SImage; the “S” refers to “smart”. SImage not only demonstrates high efficiency, it also achieves a sound prediction accuracy: almost invariably correct for photographic images or alike (photo-realistic CGIs, certain compound images, and any image which can be visually perceived as photographic), and eighty percent for normal CGIs.
In chapter 5.1, SImage is introduced and explained. In chapter 5.2, a sample console application is presented to show how to use the SImage class and its methods. In chapter 5.4, some suggestions for possible improvements are outlined.
5.1 SImage
The new codec SImage takes Microsoft’s CImage as a private member, which empowers SImage to utilize all the functionality CImage provides; it makes use of the predictor function and builds on libpgf’s existing encoding and decoding mechanisms. As a result, SImage is the codec required by this project.
The class SImage is a traditional C++ class with private member variables, private methods, and public methods, among which the core functions are listed below. Because CImage is frequently used and acts as a private member of SImage, I made the syntax of SImage similar to CImage’s.
///////////////////////////////////////
/// Load a PGF image or a CImage image
/// @param source, An input file
void Load(char* source);
///////////////////////////////////////
/// Save as a CImage image
void CSave(char* dest);
///////////////////////////////////////
/// Load a CImage image
void CLoad(char* source);
////////////////////////////////////////////////////////////
/// Save as a PGF or a CImage image according to predictor()
void Save(char* dest);
The reason why there are two types of Save() is that the project description requires the encoding to be either PGF or PNG. By separating CImage’s normal Save() (called CSave() in SImage) from SImage’s Save(), which works with the predictor, the codec guarantees that encoding only outputs PGF or PNG file types.
It is worth mentioning that when void Save(char* dest) is called, the method first calls predictor(). If predictor() returns true, Save() calls SavingPGF(), otherwise SavingImage().
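This dispatch can be made concrete with a small sketch in which the members are stubbed to return the chosen file name instead of writing a file; the class name and the stub bodies are illustrative, not SImage’s code:

```cpp
#include <string>

// Hypothetical sketch of the Save() dispatch: the predictor is consulted
// first, then the call is routed to the PGF or the CImage/PNG encoder.
class SImageSketch {
public:
    std::string Save(const std::string& dest) {
        return predictor() ? SavingPGF(dest) : SavingImage(dest);
    }
private:
    bool predictor() { return true; }  // stand-in: pretend KNN chose PGF
    std::string SavingPGF(const std::string& dest)   { return dest + ".pgf"; }
    std::string SavingImage(const std::string& dest) { return dest + ".png"; }
};
```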
5.2 PGFPNG Console Application
The PGFPNG console application is based upon the PGF console application by the company Xeraina. Its purpose is to show how to utilize the SImage class as a codec library.
The main method expects four arguments to be entered by the user on the command line in a console. Two kinds of commands can be issued:
pgfpng -e input.bmp output
pgfpng -d input.pgf output.bmp
In encoding mode (-e), the input image can be of CImage formats like BMP, JPG, GIF, and PNG. Note that no file suffix is needed for the output file, because the Save() method automatically decides on the correct file format.
In decoding mode (-d), any image of a CImage format or of the PGF format can be the input, and the output image is of a CImage format. No prediction is involved in decoding mode.
5.3 Result
The results for compression ratio and encoding time1 are listed in table 5.1 and depicted in figure 5.1. They show that SImage achieves the best compression ratio, despite costing about double the time PGF needs to encode an image. It is worth mentioning, however, that the prediction time is almost constant for any image, which implies that the difference in encoding time between the file formats becomes smaller the bigger the input image is.
Besides, as mentioned in chapter 4.7, the efficiency can be further improved when direct bit manipulation is used.
                    PGF       PNG       SImage
Encoding Time       2.251     1.188     4.86
Compression Ratio   65.469    122.494   134.598

Table 5.1: Test Set Images
1 For SImage, the encoding time comprises the prediction time.
5.4 Future Improvement
As a matter of fact, with SImage the boundary between encoding and decoding becomes blurred. If not for the project requirements, SImage could have Save() and Load() as its only two methods for encoding and decoding, which would enable the codec to encode any CImage and PGF format to any CImage and PGF format and to decode accordingly. However, this mechanism only makes sense if the predictor is enhanced with multi-way decision making: no longer a binary decision between PGF and PNG, but rather PGF, PNG, JPG, GIF, or BMP.
Figure 5.1: Comparison among PGF, PNG and SImage
6 Problems and Experience
6.1 CImage & libpng & NUnit
CImage provides enhanced bitmap support, including the ability to load and save images in JPEG, GIF, BMP, and PNG formats. It is quite new, and therefore almost no example code exists apart from the demo program on Microsoft’s MSDN website, which made the initial study of CImage a difficult and time-consuming task. I was very glad to find the method void* CImage::GetPixelAddress(int x, int y), which is said to be more efficient1 than its counterpart COLORREF CImage::GetPixel(int x, int y). The sad reality, however, was that I should have used void* GetBits(): using this pointer, along with the value returned by int GetPitch() const, I could have located and changed individual pixels in an image.
Moreover, CImage doesn’t give the user any control over the filter and zlib settings when encoding PNG. In other words, there is no way to manipulate the filtering and compression behavior if CImage is used for encoding images. The fact that filtering and compression are minor issues in PNG encoding makes this shortcoming of CImage insignificant. On the positive side, CImage’s PNG encoding is much faster than calling libpng’s routines.
The PNG reference library libpng, on the other hand, offers extensive documentation, both online and in the source code. Although the learning curve is steep, the final outcome compensates for the learning hardship. However, it seems that the development of PNG has been stagnating, given that the old heuristic filtering methods are still marked as experimental in the source code. If, as [PNG99] says, there is no more research on filtering but rather on the back-end zlib, why still include the heuristic methods in the source code? It only makes newbies spend more time on deprecated things.
NUnit is an easy-to-use unit-testing framework, especially its GUI interface. Besides, NUnit has plenty of documentation for newbies to consult. The only problem is the following statement in its ReadMe.txt:
Due to an issue that has not been adequately addressed in the installation procedure
you will have to indicate the directory where the nunit.framework.dll is located on
your disk.
This problem presents itself by having this program failing to
compile and link.
Steps:
1.) Right-click on the ‘‘cpp-sample’’ element. Select ‘‘Properties’’ on the context
menu.
2.) Select the ‘‘C/C++’’ element in the tree.
3.) The field that needs to be updated is ‘‘Resolve #using references’’. Update this
1 c.f. http://easai.00freehost.com/Blur/index.html
field to the following directory: ‘‘C:\Program Files\NUnit V2.0\bin’’ Note: This
directory is the default installation directory if you have chosen a different
directory then navigate to it.
5.) Recompile.
This issue is being worked on and will be fixed in the release.
The suggested procedure didn’t work in Visual Studio 2005. The trick is to manually adjust the value under Properties -> Common Properties -> References.
6.2 Problems
This is an interesting project; the problem, however, is time. As far as I know, a graduation project or thesis normally has a duration of 6 months. The reason why the Fachhochschule (FH) permits its students merely 6 weeks might be the characteristic of FH graduation projects, which are in general product-driven or practical projects. Nevertheless, the project I had was more like a research project, which is quality-driven and ought to be mathematically provable. In other words, accuracy, efficiency, and professionalism are essential, and these attributes can only be attained if sufficient time is granted.
Additionally, I have to admit that I had absolutely no idea what I was going to do with the project when I first read the project description. Or rather, I had to understand things like PNG, PGF, CImage, and general image processing first in order to get started at all! It took me more than two weeks to get an idea of what exactly I should do and how I would do it. By the time I had found the real beauty of this project and how to improve both code/design and documentation, time was again a constraint with which I had to compromise. For an ordinary project this might be acceptable, since the end product can be used as a prototype rather than a commercial product. For this project, however, if the result were not mathematically provable (problems with the correlation in the feature space, too few images in the training set, etc.) or not efficient (the problem with GetPixel), the result would be meaningless.
Last but not least, I was working alone. The good side is that I had total control of the project’s development and could better schedule my time. On the other side of the coin, 6 weeks was too short to work alone, because I could only work serially.
6.3 Experience
Having gone from an absolute layman to something of an insider who has been working with various image codecs and image processing for the last 6 weeks, I felt excited. At the beginning, I had to read plenty of documentation (PNG, PGF, source code, online documentation, scripts) in order to start programming. Not to mention that a lot of time was invested in understanding the relevant technologies and mathematical theories before coding could start, which hindered productivity and made some of my efforts on this project invisible.
I must admit that what I experienced was hard. The libpng library alone was fairly tough. Besides, additional mathematical theory was required when programming the predictor; the quadruplet, the feature space, the variation, and the K-nearest neighbor algorithm are just four examples.
The Microsoft CImage class was frequently used in this project. It was interesting to experience the simplicity of CImage’s various functions and its surprisingly efficient encoding and decoding behavior. Nevertheless, I wish more tutorials on CImage were available online; perhaps I could write one sometime in the future.
Not until I had drawn the correlation figures and calculated the correlation values between pairs of features did I really understand what correlation is and why people need it. It was a kind of self-enlightenment, which I very much appreciated.
The most important thing I experienced was how to cope with projects of a research character. I may now “forget” the fine details of CImage, PNG, PGF, the predictor, KNN, and SImage, but what has permanently remained in my mind are the methodologies used in the project, the strong will to learn new things, the ability to work under time pressure, and an always positive attitude toward whatever difficulties arise!
7 Conclusion
The new codec SImage, making use of CImage, libpgf, and the predictor, is a perfect image format. It determines in advance for any input image whether it should be encoded as PGF or PNG. It not only boasts high efficiency, but also achieves the best compression ratio among PGF and PNG in lossless encoding mode.
The accuracy can be improved if all features are nearly orthogonal, and the efficiency can be enhanced if direct bit manipulation is used. Additionally, the more images in the training set, the more accurate the prediction. Besides, the choice of K can also be optimized.
All in all, SImage is a perfect image format, provided that more tests on it are conducted and further optimizations are implemented.
Bibliography
[DBV05] Burger, W., Burge, M. J.: Digitale Bildverarbeitung. Springer, 2005.
[DPC05] Wu, J., Kamath, M. V., Poehlman, S.: Detecting Differences between Photographs and Computer Generated Images. IEEE, 2005.
[DPG97] Athitsos, V., Swain, M. J., Frankel, C.: Distinguishing Photographs and Graphics on the World Wide Web. Proceedings of the Workshop on Content-Based Access of Image and Video Libraries (CBAIVL), Puerto Rico, 1997.
[HRP05] Lyu, S., Farid, H.: How Realistic is Photorealistic? IEEE Transactions on Signal Processing, 53(2):845–850, 2005.
[PGF01] Stamm, C.: PGF: A New Progressive File Format for Lossy and Lossless Image Compression, 2001.
[PNG99] Roelofs, G.: PNG: The Definitive Guide. O’Reilly, 1999.