project report

Abstract
Image Compression addresses the problem of reducing the amount of data required to represent a
digital image. The underlying basis of the reduction process is the removal of redundant data.
From a mathematical viewpoint, this amounts to transforming a 2-D pixel array into a statistically
uncorrelated data set. The transformation is applied prior to storage or transmission of the image.
At some later time, the compressed image is decompressed to reconstruct the original image or an
approximation of it.
Currently image compression is recognized as an "enabling technology". Image
compression is the natural technology for handling the increased spatial resolution of today's
imaging sensors and evolving broadcast television standards. Image compression plays a major
role in important and diverse applications, including video teleconferencing, remote sensing,
document and medical imaging, facsimile transmission, and the control of remotely piloted vehicles in
military, space, and hazardous-waste-management applications.
The JPEG 2000 image compression standard makes use of the DWT (Discrete Wavelet
Transform). The DWT can be used to reduce the image size without losing much of the resolution:
the wavelet coefficients are computed, and values less than a pre-specified threshold are discarded.
This reduces the amount of memory required to represent a given image.
Wavelet-based coding provides substantial improvement in picture quality at high
compression ratios, mainly due to the better energy-compaction property of wavelet transforms.
The wavelet transform partitions a signal into a set of functions called wavelets. Wavelets are
obtained from a single prototype wavelet, called the mother wavelet, by dilations and shifts. The
wavelet transform is computed separately for different segments of the time-domain signal at
different frequencies.
Contents:
Chapter 1: Introduction
Chapter 2: Image Compression
2.1 What is the need of image compression?
2.2 Fundamentals of image compression
2.3 Different Classes of compression techniques
2.4 Image compression and reconstruction
2.5 A Typical Image Coder
Chapter 3: Image compression using DWT
3.1 What is wavelet transform?
3.2 Subband coding
3.3 Compression steps
Chapter 4: Implementation
4.1 Choosing the images
4.2 Collecting Results
4.3 Choosing threshold values
4.4 Collecting result for all images
4.5 Using the database toolbox
4.6 Methods of analysis
4.7 Defining the Best result
Chapter 5: Discrete Wavelet Transform
5.1 DWT Results
5.2 Results for DWT based on various performance parameters
5.3 Conclusion
Chapter 1
Introduction:
Image compression is very important for efficient transmission and storage of images.
Demand for communication of multimedia data through telecommunication networks and for
accessing multimedia data through the internet is growing explosively [2]. With the use of digital
cameras, the requirement for storage, manipulation, and transfer of digital images has grown
explosively. These files can be very large and can occupy a lot of memory. A gray-scale image
that is 256×256 pixels has 65,536 elements to store. Downloading such files from the internet can
be a very time-consuming task. Image data comprise a significant portion of multimedia data,
and they occupy the major portion of the communication bandwidth for multimedia
communication. Therefore, the development of efficient techniques for image compression has
become quite necessary [3]. A common characteristic of most images is that neighboring pixels
are highly correlated and therefore contain highly redundant information. The basic objective of
image compression is to find an image representation in which the pixels are less correlated. The two
fundamental principles used in image compression are redundancy reduction and irrelevancy
reduction. Redundancy reduction removes redundancy from the signal source, and irrelevancy
reduction omits pixel values that are not noticed by the human eye. JPEG and JPEG 2000 are two
important techniques used for image compression.
Image compression brings many benefits, such as (1) easier exchange of image
files between different devices and applications; (2) reuse of existing hardware and software for a
wider array of products; and (3) the existence of benchmarks and reference data sets for new and
alternative developments.
Chapter 2
Image Compression:
2.1 What is the need of image compression?
The need for image compression becomes apparent when the number of bits per image resulting
from typical sampling rates and quantization methods is computed. For example: (1) a low-resolution,
TV-quality color video image of 512×512 pixels/color, 8 bits/pixel, and 3 colors consists of
approximately 6×10^6 bits; (2) a 24×36 mm negative photograph scanned at 12×10^-6 m, giving
3000×2000 pixels/color, 8 bits/pixel, and 3 colors, contains nearly 144×10^6 bits; (3) a 14×17 inch
radiograph scanned at 70×10^-6 m, giving 5000×6000 pixels and 12 bits/pixel, contains nearly
360×10^6 bits. Thus the storage of even a few images can be a problem. As another example of the
need for image compression, consider the transmission of the low-resolution 512×512×8 bits/pixel×3-color
video image over telephone lines. Using a 9600 baud (bits/sec) modem, the transmission would take
approximately 11 minutes for just a single image, which is unacceptable for most applications.
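The arithmetic behind these figures can be checked with a short script (a Python sketch for illustration; the report's own code is in Matlab, and the numbers assume the 512×512, 8 bits/pixel, 3-color image and a 9600 bit/s modem described above):

```python
# Storage and transmission-time estimate for the low-resolution color image.
pixels_per_color = 512 * 512
bits_per_pixel = 8
colors = 3

total_bits = pixels_per_color * bits_per_pixel * colors  # about 6.3 x 10^6 bits
seconds = total_bits / 9600                              # 9600 bit/s modem
minutes = seconds / 60

print(total_bits)          # 6291456
print(round(minutes, 1))   # 10.9, i.e. roughly 11 minutes per image
```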
2.2 Fundamentals of image compression
The term data compression refers to the process of reducing the amount of data required to represent a given
quantity of information. There are three types of redundancies: (1) spatial redundancy, which is
due to the correlation or dependence between neighboring pixel values; (2) spectral redundancy,
which is due to the correlation between different color planes or spectral bands; (3) temporal
redundancy, which is present because of correlation between different frames in image sequences. Image
compression research aims to reduce the number of bits required to represent an image by
removing the spatial and spectral redundancies as much as possible.
Data redundancy is a central issue in digital image compression. If n1 and n2 denote
the number of information-carrying units in two data sets that represent the same information, the
relative data redundancy RD of the first data set can be defined as
RD = 1 − 1/CR
where CR, commonly called the compression ratio, is
CR = n1/n2
For the case n2 = n1, CR = 1 and RD = 0, indicating that the first representation of the information contains
no redundant data. When n2 << n1, CR → ∞ and RD → 1, implying significant compression and highly
redundant data. When n2 >> n1, CR → 0 and RD → −∞, indicating that the second data set contains much
more data than the original representation. Generally CR = 10 (10:1) means that the first data set
has 10 information-carrying units for every 1 unit in the second, or compressed, data set. The
corresponding redundancy of 0.9 means that 90 percent of the data in the first data set is redundant
with respect to the second.
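These definitions translate directly into code (a Python sketch for illustration; the bit counts n1 and n2 are hypothetical):

```python
def compression_ratio(n1, n2):
    # CR = n1 / n2: information-carrying units before vs. after compression
    return n1 / n2

def relative_redundancy(n1, n2):
    # RD = 1 - 1/CR
    return 1 - 1 / compression_ratio(n1, n2)

# A 10:1 compression means 90% of the first data set is redundant.
print(compression_ratio(1000, 100))    # 10.0
print(relative_redundancy(1000, 100))  # 0.9
```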
2.3 Different Classes of Compression Techniques:
Lossless and lossy compression: In lossless compression schemes, the reconstructed image,
after compression, is numerically identical to the original image. However, lossless compression
can only achieve a modest amount of compression. Lossless compression is preferred for
archival purposes and often for medical imaging, technical drawings, clip art, or comics. This is
because lossy compression methods, especially when used at low bit rates, introduce
compression artifacts. An image reconstructed following lossy compression contains degradation
relative to the original, because the lossy scheme discards some of the information. However,
lossy schemes are capable of achieving much higher compression. Lossy methods are especially
suitable for natural images such as photos in applications where a minor loss of fidelity is
acceptable to achieve a substantial reduction in bit rate. Lossy compression that produces
imperceptible differences can be called visually lossless [1].
Predictive and Transform coding: In predictive coding, information already sent or available is
used to predict future values, and the difference is coded. Since this is done in the image or
spatial domain, it is relatively simple to implement and is readily adapted to local image
characteristics. Differential Pulse Code Modulation (DPCM) is one particular example of
predictive coding. Transform Coding, on the other hand, first transforms the image from its
spatial domain representation to a different type of representation using some well-known
transform and then codes the transformed values (coefficients). This method provides greater
data compression compared to predictive methods, although at the expense of greater
computational requirements.
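The idea of predictive coding can be sketched in a few lines (a Python sketch for illustration; this uses a trivial previous-sample predictor, not the full DPCM of any particular codec, and the pixel row is made up):

```python
def dpcm_encode(samples):
    # Code each sample as its difference from the previous (predicted) one.
    prev = 0
    residuals = []
    for s in samples:
        residuals.append(s - prev)
        prev = s
    return residuals

def dpcm_decode(residuals):
    # Rebuild each sample by adding the residual to the running prediction.
    prev = 0
    samples = []
    for r in residuals:
        prev += r
        samples.append(prev)
    return samples

row = [100, 102, 104, 104, 103]   # neighboring pixels are highly correlated
res = dpcm_encode(row)            # [100, 2, 2, 0, -1]: small values, cheap to code
assert dpcm_decode(res) == row    # lossless round trip
print(res)
```

Because neighboring pixels are correlated, the residuals cluster near zero and can be coded with far fewer bits than the raw values.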
2.4 Image Compression and Reconstruction:
Three basic data redundancies can be categorized in the image compression standard.
1. Spatial redundancy due to correlation between neighboring pixels.
2. Spectral redundancy due to correlation between the color components.
3. Psycho-visual redundancy due to properties of the human visual system.
The spatial and spectral redundancies are present because certain spatial and spectral
patterns between the pixels and the color components are common to each other, whereas
the psycho-visual redundancy originates from the fact that the human eye is insensitive to
certain spatial frequencies. The principle of image compression algorithm is (1) reducing
the redundancy in the image data and (2) producing a reconstructed image from the
original image with the introduction of error that is insignificant to the intended
applications. The aim here is to obtain an acceptable representation of digital image while
preserving the essential information contained in that particular data set [4].
Fig1. Image Compression System: Original image → Transform → Quantization → Lossless coding → Compressed image
The problem faced by image compression is easy to define, as shown in Figure 1. First,
the original digital image is usually transformed into another domain, where it is highly
de-correlated by some transform. This de-correlation concentrates the important image
information into a more compact form. The compressor then removes the redundancy in the
transformed image and stores it in a compressed file or data stream. In the second stage, the
quantization block reduces the accuracy of the transformed output in accordance with some
pre-established fidelity criterion. This stage also reduces the psycho-visual redundancy of the input
image. Quantization is an irreversible operation and thus must be omitted when there is a
need for error-free or lossless compression. In the final stage of the data compression model, the
symbol coder creates a fixed- or variable-length code to represent the quantizer output and maps
the output in accordance with the code. Generally a variable-length code is used to represent the
mapped and quantized data set. It assigns the shortest code words to the most frequently occurring
output values and thus reduces coding redundancy. This operation, in contrast to quantization, is reversible.
The decompression process reverses the compression process to produce the recovered image, as
shown in Figure 2. The recovered image may have lost some information due to the compression,
and may have error or distortion compared to the original image.
Fig2. Image Decompression System: Compressed image → Lossless decoding → De-quantization → Inverse transform → Reconstructed image
2.5 A Typical Image Coder:
Image Compression model consists of Transformer, Quantizer and encoder.
Transformer: It transforms the input data into a format that reduces inter-pixel redundancies in the
input image. Transform coding techniques use a reversible, linear mathematical transform to map
the pixel values onto a set of coefficients, which are then quantized and encoded. The key factor
behind the success of transform-based coding schemes is that many of the resulting coefficients
for most natural images have small magnitudes and can be quantized without causing significant
distortion in the decoded image.
Transform coding algorithms usually start by partitioning the original image into sub-images
of small size (usually 8×8). For each block the transform coefficients are calculated,
effectively converting the original 8×8 array of pixel values into an array of coefficients in
which the coefficients closer to the top-left corner usually contain most of the information
needed to quantize and encode the image with little perceptual distortion. The resulting
coefficients are then quantized, and the output of the quantizer is used by symbol-encoding
techniques to produce the output bit stream representing the encoded image. In the image
decompression model at the decoder's side, the reverse process takes place, with the obvious
difference that the dequantization stage will only generate an approximated version of the original
coefficient values; i.e., whatever loss was introduced by the quantizer in the encoder stage is not
reversible.
Quantizer: It reduces the accuracy of the transformer's output in accordance with some
pre-established fidelity criterion, and thereby reduces the psycho-visual redundancies of the input
image. This operation is not reversible and must be omitted if lossless compression is desired. The
quantization stage is at the core of any lossy image encoding algorithm. Quantization at the encoder
side means partitioning the input data range into a smaller set of values. There are two main types
of quantizers: scalar quantizers and vector quantizers. A scalar quantizer partitions the domain of
input values into a smaller number of intervals. If the output intervals are equally spaced, which is
the simplest way to do it, the process is called uniform scalar quantization; otherwise, for reasons
usually related to minimization of total distortion, it is called non-uniform scalar quantization.
One of the most popular non-uniform quantizers is the Lloyd-Max quantizer. Vector quantization
techniques extend the basic principles of scalar quantization to multiple dimensions.
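A uniform scalar quantizer is simple to sketch (a Python illustration; the step size and input value are made up, not taken from any standard):

```python
def quantize(x, step):
    # Map a value to the index of its interval; in-interval detail is lost.
    return round(x / step)

def dequantize(q, step):
    # Reconstruct a representative value; the quantization error is permanent.
    return q * step

step = 10
x = 37.4
q = quantize(x, step)        # 4
x_hat = dequantize(q, step)  # 40: close to x, but the difference cannot be recovered
print(q, x_hat)
```

The irreversibility is visible directly: every x in the same interval maps to the same q, so dequantization cannot tell them apart.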
Entropy Encoder: It creates a fixed- or variable-length code to represent the quantizer's output
and maps the output in accordance with the code. In most cases, a variable-length code is used.
An entropy encoder further compresses the values obtained by the quantizer to provide
more efficient compression. The most important types of encoders used in lossy image compression
techniques are the arithmetic encoder, the Huffman encoder, and the run-length encoder.
Chapter 3:
Image Compression Using Discrete Wavelet Transform:
The wavelet transform has gained widespread acceptance in signal processing and image
compression. Because of their inherent multi-resolution nature, wavelet-coding schemes are
especially suitable for applications where scalability and tolerable degradation are important.
Recently the JPEG committee released its new image coding standard, JPEG 2000, which
is based upon the DWT [5].
3.1 What is wavelet transform?
Wavelets are functions defined over a finite interval and having an average value of zero.
The basic idea of the wavelet transform is to represent any arbitrary function f(t) as a
superposition of a set of such wavelets or basis functions. These basis functions, or baby
wavelets, are obtained from a single prototype wavelet called the mother wavelet, by dilations
or contractions (scaling) and translations (shifts). The discrete wavelet transform of a
finite-length signal x(n) having N components, for example, is expressed by an N×N matrix.
3.2 Subband coding:
A signal is passed through a series of filters to calculate the DWT. The procedure starts by passing
the signal through a half-band digital low-pass filter with impulse response h(n).
Filtering a signal is numerically equal to convolving the signal with the impulse
response of the filter:
x[n] * h[n] = Σ_k x[k] · h[n − k]
A half-band low-pass filter removes all frequencies that are above half of the highest
frequency in the signal. The signal is then passed through a high-pass filter. The two
filters are related to each other as
h[L − 1 − n] = (−1)^n g[n]
Filters satisfying this condition are known as quadrature mirror filters. After filtering, half
of the samples can be eliminated, since the highest frequency of the signal is now half of
the original. The signal can therefore be subsampled by 2, simply by
discarding every other sample. This constitutes one level of decomposition and can
mathematically be expressed as
y1[n] = Σ_k x[k] · h[2n − k]
y2[n] = Σ_k x[k] · g[2n + 1 − k]
where y1[n] and y2[n] are the outputs of the low-pass and high-pass filters, respectively, after
subsampling by 2.
This decomposition halves the time resolution, since only half the number of samples now
characterizes the whole signal. The frequency resolution has doubled, because each output has
half the frequency band of the input. This process is called subband coding. It can be
repeated further to increase the frequency resolution, as shown by the filter bank.
[Figure: Filter bank]
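One level of this decomposition can be sketched with the simple Haar filter pair (a Python illustration; h and g below are the averaging/differencing Haar filters, chosen only because they are short, and the input signal is made up — the report's experiments use Matlab wavelets):

```python
def convolve(x, f):
    # Full convolution: y[n] = sum_k x[k] * f[n - k]
    y = [0.0] * (len(x) + len(f) - 1)
    for n in range(len(y)):
        for k in range(len(x)):
            if 0 <= n - k < len(f):
                y[n] += x[k] * f[n - k]
    return y

h = [0.5, 0.5]    # Haar low-pass (averaging) filter
g = [0.5, -0.5]   # Haar high-pass; satisfies h[L-1-n] = (-1)^n g[n]

x = [4, 6, 10, 12, 8, 6, 5, 5]
low = convolve(x, h)[1::2]    # filter, then discard every other sample
high = convolve(x, g)[1::2]

print(low)    # [5.0, 11.0, 7.0, 5.0]: averages of sample pairs
print(high)   # [1.0, 1.0, -1.0, 0.0]: half-differences of sample pairs
```

The low-pass output is a half-length approximation of the signal; the high-pass output holds the detail that would be thresholded in the compression step.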
3.3 Compression steps:
1. Digitize the source image into a signal s, which is a string of numbers.
2. Decompose the signal into a sequence of wavelet coefficients w.
3. Use thresholding to modify the wavelet coefficients from w to w'.
4. Use quantization to convert w’ to a sequence q.
5. Entropy encoding is applied to convert q into a sequence e.
Digitization:
The image is digitized first. The digitized image can be characterized by its intensity levels, or
scales of gray, which range from 0 (black) to 255 (white), and its resolution, or how many pixels
per square inch [6].
Thresholding:
In certain signals, many of the wavelet coefficients are close or equal to zero. Through
thresholding these coefficients are modified so that the sequence of wavelet coefficients contains
long strings of zeros.
In hard thresholding, a threshold is selected. Any wavelet coefficient whose absolute value falls below the
tolerance is set to zero, with the goal of introducing many zeros without losing a great amount of
detail.
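Hard thresholding as described can be sketched directly (a Python illustration; the coefficient values and tolerance are made up):

```python
def hard_threshold(coeffs, tol):
    # Zero every coefficient whose magnitude falls below the tolerance.
    return [c if abs(c) >= tol else 0 for c in coeffs]

w = [9.1, 0.2, -0.4, 7.5, 0.1, -6.8, 0.3, 0.0]
w_prime = hard_threshold(w, 1.0)
print(w_prime)   # [9.1, 0, 0, 7.5, 0, -6.8, 0, 0]: long strings of zeros appear
```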
Quantization:
Quantization converts the sequence of floating-point numbers w' to a sequence of integers q. The
simplest form is to round to the nearest integer. Another method is to multiply each number in w'
by a constant k and then round to the nearest integer. Quantization is called lossy because it
introduces error into the process, since the conversion of w' to q is not a one-to-one function [6].
Entropy Encoding:
With this method, the integer sequence q is changed into a shorter sequence e, with the numbers
in e being 8-bit integers. The conversion is made by an entropy encoding table. Strings of zeros
are coded by the numbers 1 through 100, 105, and 106, while the non-zero integers in q are coded by
101 through 104 and 107 through 254.
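The zero-run idea behind this table can be illustrated with a toy run-length encoder (a Python sketch; this is a generic illustration, not the specific 8-bit code table described above):

```python
def rle_zeros(q):
    # Replace each run of zeros with a (0, run_length) pair;
    # pass non-zero integers through unchanged.
    out = []
    i = 0
    while i < len(q):
        if q[i] == 0:
            j = i
            while j < len(q) and q[j] == 0:
                j += 1
            out.append((0, j - i))   # encode the run length, not each zero
            i = j
        else:
            out.append(q[i])
            i += 1
    return out

q = [5, 0, 0, 0, 0, -2, 0, 0, 7]
print(rle_zeros(q))   # [5, (0, 4), -2, (0, 2), 7]
```

The longer the zero runs produced by thresholding, the shorter the encoded sequence becomes.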
Chapter 4:
Implementation:
4.1 Choosing the Images:
The images used were selected from a range of over 40 images provided in Matlab. A script was
programmed to calculate the entropy He of an image using equations:
P(k) = h(k) / (MN)
He = − Σ_k P(k) log2[P(k)]
The script read through the image matrix noting the frequency, h, of each intensity level, k.
Images that were not already in matrix format had to be converted using the imread function in
Matlab. For example to convert an image file flower.tif to a Matlab matrix, the following code
could be use:
X = double(imread('flower.tif'));
The entropy could then be calculated using:
He = imageEntropy(X);
The entropy results showed a range of entropies from 0.323 to 7.7804, and a sample of 9 images
was selected to represent the full range with intervals as even as possible. Two further images
were added to act as controls and help confirm or disprove the effect of image entropy on
compression. Since the image entropy of control 1 ('Saturn') was similar to that of the image 'Spine',
similar results would tend to confirm the significance of image entropy. This also applied to control
2 ('Cameraman') and 'Tire'.
Image                   Image Entropy
Text                    0.323
Circbw                  0.9996
Wifs                    1.9754
Julia                   3.1812
Spine                   4.0935
Saturn (Control 1)      4.114
Woman                   5.003
Cameraman (Control 2)   7.0097
4.2 Collecting Results:
The Results Collected
The results that were collected were values for percentage energy retained and percentage
number of zeros. These values were calculated for a range of threshold values on all the images,
decomposition levels and wavelets used in the investigation.
The energy retained describes the amount of image detail that has been kept; it is a measure of
the quality of the image after compression. The number of zeros is a measure of compression. A
greater percentage of zeros implies that higher compression rates can be obtained.
How the results were collected
Results were collected using the wdencmp function from the Matlab Wavelet Toolbox, which returns
the L2 recovery (energy retained) and percentage of zeros when given the following inputs:
1. 'gbl' or 'lvd' for global or level-dependent thresholding
2. An image matrix
3. A wavelet name
4. Level of decomposition
5. Threshold value(s)
6. 's' or 'h' for soft or hard thresholding
7. Whether the approximation values should be thresholded
An automated script was written which took as input an image, a wavelet and a level. It
calculated 10 appropriate threshold levels to use and then collected the energy retained and
percentage of zeros using wdencmp with the given image, wavelet, decomposition level for
each of the 10 threshold values.
4.3 Choosing the threshold values:
There are a number of different options for thresholding. These include:
1. The approximation signal is thresholded or not thresholded.
2. Level-dependent or global threshold values.
3. Thresholding different areas of an image with different threshold values.
Thresholding for Results set 1
The first set of results used the simplest combination, which was to use global thresholding
and not to threshold the approximation signals. To get an even spread of energy values, the
function 'calcres' first calculated the decomposition coefficients up to the level given as input.
The script then calculated 10 values evenly spread along the range 0 to max_c, where max_c
is the maximum value in the coefficients matrix.
[Figure: the spread of coefficients; 10 points are sampled]
However, it was observed that the results for energy retained and percentage of zeros only changed
significantly between the first 4 or 5 thresholds. The most dramatic compression and energy loss
occurred before reaching a threshold of 0.25·max_c. The reason for this must be that a
large percentage of coefficients and energy occurs at the lower values; setting these
values to zero therefore changes a lot of the coefficients and increases the compression rate. Setting
the higher values to zero had little effect on compression because there weren't many
coefficients to affect.
Therefore the threshold calculations were changed to zoom in on the first quarter of the possible
thresholds. The idea of this was to get a better view of how the energy was lost with
thresholding; the optimal compromise between energy loss and compression, if any were
possible, was more likely to be contained within this section.
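The two threshold spreads described (the even spread over 0 to max_c, then the zoom onto the first quarter) can be sketched as follows (a Python illustration; max_c = 880 is a hypothetical maximum coefficient value, borrowed from the level-3 horizontal example below):

```python
def threshold_spread(max_c, n=10, zoom=1.0):
    # n threshold values evenly spread over [0, zoom * max_c].
    top = zoom * max_c
    return [i * top / (n - 1) for i in range(n)]

max_c = 880.0
full = threshold_spread(max_c)                 # spread over the whole range
quarter = threshold_spread(max_c, zoom=0.25)   # zoomed onto the first quarter

print(full[0], full[-1])   # 0.0 880.0
print(quarter[-1])         # 220.0: the 0.25 * max_c cut-off
```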
Thresholding for Results set 2
This used local thresholding. The function 'calcres2' was written to calculate the coefficient
matrices for each level and direction (horizontal, vertical and diagonal); the maximum value for
each level was noted. Again, 10 sample threshold values were taken from the range 0 to max_c
for each local value. The first thresholds were all zero; the 10th set contained the maximum
values for each.
Each level and direction sub-signal can be thresholded differently. The level-3 horizontal
sub-signal would be tested with thresholds in the range 0 to 880, but the level-1 diagonal only requires a
range of thresholds from 0 to 120.
4.4 Collecting results for all images
The function ‘calcres’ was used to get one set of results, for a particular combination of image,
wavelet and decomposition level. However results were required for many images with
many wavelets at many decomposition levels. Therefore an automated script was written
(getwaveletresults.m), which follows this algorithm:
for each image
    for each wavelet
        for level = 1 to 5
            RES = calcres(image, wavelet, level)
            save RES
        end
    end
end
The list of images and wavelets to use are loaded from a database, using the Database
Toolbox. The results are then saved in the database.
4.5 Using the Database Toolbox
The Database Toolbox was used to provide a quick and relatively simple solution to the
problems involved with saving the results. Originally the results were saved to a large matrix, but
in Matlab the matrix couldn't include strings, so the names of images and wavelets could not be
included in the matrix. The initial solution was to give each image and wavelet a numeric
identifier that could be put into the matrix. The image names and wavelet names could then be
written in separate files, with the ID values linked to their positions in these files. However, this led
to concurrency problems: if the image file was changed, then the results pointed to the wrong
images and thus were wrong. The second idea was to save a matrix of strings in the same file
as the results matrix; however, if the strings are of different lengths they cannot go
into the same matrix. The Database Toolbox allows Matlab to communicate with a database, e.g. an
MS Access database, in order to import and export data. An advantage of saving results into a
database of this sort is that SQL statements can be used to retrieve certain sets of data, which
makes the analysis of results a lot easier.
The database contained 3 tables: Images, Results and Wavelets
4.6 Method of Analysis
In order to analyze the results, graphs were plotted of energy retained against percentage of
zeros. In an ideal situation an image would be compressed by 100% whilst retaining 100% of the
energy, and the graph would be a straight horizontal line. The best threshold to use would be one
that produced 100% zeros whilst retaining 100% of the energy, corresponding to the idealized
point E. However, since energy retained decreases as the number of zeros increases, it is
impossible for the image to have the same energy if the values have been changed by
thresholding. Thus all the graphs produced had the same general form.
Important things to analyze about the curves were:
1. The starting point: the x co-ordinate gives the percentage of zeros with no thresholding, and
therefore shows how much the image could be compressed without thresholding.
2. The end point: this shows how many zeros could be achieved with the investigated thresholds,
and how much energy is lost by doing so.
3. The changing gradient of the curve: this shows how rapidly energy is lost when
trying to compress the image. It is the ability of wavelets to retain energy during compression
that is important, so energy being lost quickly suggests that a wavelet is not good to use.
4.7 Defining the Best Result
In the figure, the starting point indicates the best energy retention and the end point indicates the best
compression. In practice, compressing an image requires a trade-off between compression and
energy loss. It may not be better to compress by an extra small amount if this would cause a
dramatic loss in energy.
The Distance, D
For the purposes of this study it was decided to define the best trade-off to be the
point T on the curve which was closest to the idealized point E. This was calculated using the
Pythagorean theorem with the ten known points on each graph. For the rest of the study the
distance between each point T and point E will be known simply as D.
By calculating D for each result we can take the minimum to be the best; in other words,
the most efficient at retaining energy while compressing the signal. This will not necessarily be
the best to use for all situations, but it is the most efficient found in the results.
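The distance D can be computed directly (a Python sketch; the sample points are made-up (percentage zeros, percentage energy retained) pairs, not measured results):

```python
import math

def distance_to_ideal(zeros, energy):
    # D = Euclidean distance from (zeros, energy) to the ideal point E = (100, 100).
    return math.hypot(100 - zeros, 100 - energy)

# Hypothetical (%zeros, %energy retained) points from one curve.
points = [(40.0, 99.0), (70.0, 95.0), (90.0, 60.0)]
best = min(points, key=lambda p: distance_to_ideal(*p))
print(best)   # (70.0, 95.0): the best trade-off among these three points
```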
Energy Loss per percentage Zero Ratio
A second possible measure of the best result is the ratio of energy loss per percentage of zeros.
This can be calculated for each result by the following equation:
Energy loss per percentage zero = (100% − %Energy Retained) / %Zeros
This gives a measure of how much energy is lost by compressing the image, so a smaller value
for the ratio means the compression is less lossy, and therefore better.
Chapter 5
5.1 DWT Results:
Results obtained with the Matlab code [7] are shown below. Figure 5.1 shows the original Lena
image. Figures 5.2 to 5.4 show compressed images for various threshold values. As the threshold
value increases, blurring of the image continues to increase.
Figure 5.1: Original Lena image
Figure 5.2: Compressed image for threshold value 1
Figure 5.3: Compressed image for threshold value 2
Figure 5.4: Compressed image for threshold value 5
5.2 Results for DWT based on various performance parameters:
The Mean Square Error (MSE) is defined as the mean of the squared differences between the
corresponding pixel values of the two images. For DWT-based image compression, Fig. 5.2.1 shows
that the MSE first decreases with increasing window size and then starts to increase slowly,
finally attaining a constant value. Compression decreases with increasing window size for DWT.
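The MSE between an original and a reconstructed image can be sketched as follows (a Python illustration; the two 4-pixel arrays are made up, not taken from the Lena results):

```python
def mse(a, b):
    # Mean of the squared pixel-value differences between two equal-size images.
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

original      = [52, 55, 61, 66]
reconstructed = [54, 55, 59, 70]
print(mse(original, reconstructed))   # (4 + 0 + 4 + 16) / 4 = 6.0
```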
Figure 5.2.1: Mean Squared Error vs. window size for DWT
Figure 5.2.2: Compression vs. window size for DWT
5.3 Conclusion:
The DWT is used as the basis for transformation in the JPEG 2000 standard. The DWT provides
high-quality compression at low bit rates. The use of larger DWT basis functions or wavelet filters
produces blurring near edges in images.
The DWT performs better than the DCT in that it avoids the blocking artifacts which degrade
reconstructed images. The DWT provides lower quality than JPEG at low compression rates, and
requires longer compression time.
References:
1. http://en.wikipedia.org/Image_Compression
2. http://www.3.interscience.wiley.com
3. Greg Ames, "Image Compression," Dec. 07, 2002.
4. David Salomon, Data Compression: The Complete Reference, 2nd Edition, Springer-Verlag, 1998.
5. R. C. Gonzalez and R. E. Woods, Digital Image Processing, Reading, MA: Addison-Wesley, 2004.
6. Greg Ames, "Image Compression," Dec. 07, 2002.
7. http://www.spelman.edu/%7Ecolm/wav.html