1. Image compression - Part 1 Date: 08/05/04
Image compression addresses the problem of reducing the amount of data required to represent a digital image with no significant loss of information.
Interest in image compression dates back more than 25 years.
The field is now poised for significant growth through the practical application of the theoretical work that began in the 1940s, when C. E. Shannon and others first formulated the probabilistic view of information and its representation, transmission and compression.
Images take a lot of storage space:
A 1024 x 1024 x 32-bit image requires 4 MB.
Video at 640 x 480 x 24 bits and 30 frames per second requires 1.54 GB per minute.
That many bytes take a long time to transfer over slow connections. At 56,000 bps:
4 MB takes almost 10 minutes;
1.54 GB takes almost 66 hours.
Storage problems, plus the desire to exchange images over the Internet, have led to a large interest in image compression algorithms.
The same information can be represented in many ways. We must make a clear distinction between data and information. Data are the means by which information is conveyed. Various amounts of data can be used to convey the same amount of information. Example: four different representations of the same information (the number five): 1) a picture (1,001,632 bits); 2) the word “five” spelled in English using the ASCII character set (32 bits); 3) a single ASCII digit (8 bits); 4) a binary integer (3 bits).
If more data are used than is strictly necessary, then we say that there is redundancy in the dataset.
Data redundancy is not an abstract concept but a mathematically quantifiable entity. If n1 and nc denote the number of information-carrying units in two data sets that represent the same information, the relative data redundancy R_D of the first data set (n1) can be defined as

    R_D = 1 - 1/C_R    (1)

where C_R is the compression ratio, defined as

    C_R = n1 / nc    (2)

where n1 is the number of information-carrying units used in the uncompressed data set and nc is the number of units in the compressed data set. The same units should be used for n1 and nc; bits or bytes are typically used.
When nc << n1, C_R takes on a large value and R_D approaches 1. Larger values of C_R indicate better compression.
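Equations (1) and (2) are simple enough to check directly; a short sketch in Python (the sizes below are illustrative, not from the text):

```python
def compression_ratio(n1, nc):
    """Compression ratio C_R = n1 / nc, eq. (2)."""
    return n1 / nc

def relative_redundancy(n1, nc):
    """Relative data redundancy R_D = 1 - 1/C_R, eq. (1)."""
    return 1.0 - 1.0 / compression_ratio(n1, nc)

# Example: a 1 MB data set compressed to 256 KB.
cr = compression_ratio(1_048_576, 262_144)    # -> 4.0
rd = relative_redundancy(1_048_576, 262_144)  # -> 0.75
```

So a 4:1 compression ratio means 75 percent of the original data was redundant.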
In digital image compression three basic data redundancies can be identified and exploited: coding redundancy, interpixel redundancy, and psychovisual redundancy (for video, interframe redundancy can also be exploited).
A general algorithm for data compression and image reconstruction is shown in a block diagram in Figure 1.
An input image f(x,y) is fed into the encoder, which creates a set of symbols from the input data. After transmission over the channel, the encoded representation is fed to the decoder, where a reconstructed output image f'(x,y) is generated. In general, f'(x,y) may or may not be an exact replica of f(x,y). If it is, the system is error free or information preserving; if not, some level of distortion is present in the reconstructed image.
Both the encoder and decoder shown in Fig. 1 consist of two relatively independent functions or subblocks. The encoder is made up of a source encoder , which removes input redundancies, and a channel encoder , which increases the noise immunity of the source encoder’s output.
1.1 Data compression algorithms can be divided into two groups :
1. Lossless algorithms remove only redundancy present in the data. The reconstructed image is identical to the original, i.e., all of the information present in the input image has been preserved by compression.
2. Higher compression is possible using lossy algorithms, which create their own redundancy (by discarding some information) and then remove it.
1.2 Fidelity criteria
When lossy compression techniques are employed, the decompressed image will not be identical to the original image. In such cases, we can define fidelity criteria that measure the difference between these two images. Two general classes of criteria are used: (1) objective fidelity criteria and (2) subjective fidelity criteria.
A good example of an objective fidelity criterion is the root-mean-square (rms) error between an input and output image. For any value of x and y, the error e(x,y) can be defined as: e(x,y) = f'(x,y) - f(x,y)
The total error between the two images is:

    e_total = Σ_{x=0}^{M-1} Σ_{y=0}^{N-1} [f'(x,y) - f(x,y)]

where the images are of size M x N.
The root-mean-square error e_rms is:

    e_rms = [ (1/MN) Σ_{x=0}^{M-1} Σ_{y=0}^{N-1} [f'(x,y) - f(x,y)]^2 ]^(1/2)
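The rms error formula translates directly into code; a minimal sketch using plain nested lists as images:

```python
def rms_error(f, g):
    """Root-mean-square error between an M x N input image f and its
    reconstruction g, both given as lists of rows of equal size."""
    M, N = len(f), len(f[0])
    total = sum((g[x][y] - f[x][y]) ** 2
                for x in range(M) for y in range(N))
    return (total / (M * N)) ** 0.5

# Identical images give zero error; a uniform offset of 3 gives e_rms = 3.
rms_error([[10, 20], [30, 40]], [[10, 20], [30, 40]])  # -> 0.0
rms_error([[0, 0], [0, 0]], [[3, 3], [3, 3]])          # -> 3.0
```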
1.3 Types of redundancy
1. Coding redundancy
Our quantized data is represented using codewords .
The value at a pixel in a typical grayscale image relates to the intensity of light detected by the camera that acquired the image. We quantise the data and represent intensities by a discrete set of what, in the language of information theory, is called codewords. Image quantization uses a standard binary coding scheme in which the codewords are the set of values that can be represented with a fixed number of bits (eight). The codewords are ordered in the same way as the intensities that they represent; thus the bit pattern 00000000, corresponding to the value 0, represents the darkest points in an image and the bit pattern 11111111, corresponding to the value 255, represents the brightest points. An 8-bit coding scheme has the capacity to represent 256 distinct levels of intensity in an image. But if there are only 16 different grey levels in an image, the image exhibits coding redundancy because it could be represented using a 4-bit coding scheme. Coding redundancy can also arise due to the use of fixed-length codewords.
We showed that a great deal of information about the appearance of an image can be obtained from a histogram of its gray levels. The grey-level histogram of an image can also provide a great deal of insight into the construction of codes to reduce the amount of data used to represent it.
Let us assume that a discrete random variable r_k in the interval (0,1) represents the grey levels of an image and that each r_k occurs with probability P_r(r_k). The probability can be estimated from the histogram of the image using

    P_r(r_k) = h_k / n    for k = 0, 1, ..., L-1    (3)

where L is the number of grey levels, h_k is the frequency of occurrence of grey level k (the number of times that the k-th grey level appears in the image), and n is the total number of pixels in the image. If the number of bits used to represent each value of r_k is l(r_k), the average number of bits required to represent each pixel is:

    L_avg = Σ_{k=0}^{L-1} l(r_k) P_r(r_k)    (4)
The average length of the codewords assigned to the various gray-level values is found by summing the product of the number of bits used to represent each gray level and the probability that the gray level occurs. Thus the total number of bits required to code an M x N image is M*N*L_avg.
Example
An 8-level image has the gray-level distribution shown in Table 1. If the natural 3-bit binary code (see code 1 and l_1(r_k) in Table 1) is used to represent the 8 possible grey levels, L_avg is 3 bits, because l_1(r_k) = 3 bits for all r_k. If code 2 in Table 1 is used, however, the average number of bits required to code the image is reduced to

    L_avg = Σ_{k=0}^{7} l_2(r_k) P_r(r_k)    (5)
          = 2(0.19) + 2(0.25) + 2(0.21) + 3(0.16) + 4(0.08) + 5(0.06) + 6(0.03) + 6(0.02) = 2.7 bits
    r_k        P_r(r_k)   Code 1   l_1(r_k)   Code 2   l_2(r_k)
    r0 = 0     0.19       000      3          11       2
    r1 = 1/7   0.25       001      3          01       2
    r2 = 2/7   0.21       010      3          10       2
    r3 = 3/7   0.16       011      3          001      3
    r4 = 4/7   0.08       100      3          0001     4
    r5 = 5/7   0.06       101      3          00001    5
    r6 = 6/7   0.03       110      3          000001   6
    r7 = 1     0.02       111      3          000000   6

Table 1. Gray-level distribution of the 8-level image and the two coding schemes
Using eq. (2), the resulting compression ratio C_R is 3/2.7, or 1.11. Thus approximately 10 percent of the data resulting from the use of code 1 is redundant. The exact level of redundancy is

    R_D = 1 - 1/1.11 = 0.099
This process is referred to as variable-length coding.
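The numbers in the example above can be reproduced with a few lines of Python, using the probabilities and code lengths from Table 1:

```python
# Probabilities and code lengths taken from Table 1 (8-level image).
probs = [0.19, 0.25, 0.21, 0.16, 0.08, 0.06, 0.03, 0.02]
l1 = [3] * 8                   # natural 3-bit code (code 1)
l2 = [2, 2, 2, 3, 4, 5, 6, 6]  # variable-length code 2

def l_avg(lengths, probabilities):
    """Average codeword length, eq. (4)."""
    return sum(l * p for l, p in zip(lengths, probabilities))

# l_avg(l1, probs) -> 3.0 bits, l_avg(l2, probs) -> 2.7 bits,
# so C_R = 3.0/2.7 ~ 1.11 and R_D = 1 - 1/C_R ~ 0.099, as computed above.
```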
2. Interpixel redundancy
The intensity at a pixel may correlate strongly with the intensity value of its neighbors.
Because the value of any given pixel can be reasonably predicted from the values of its neighbors, much of the visual contribution of a single pixel to an image is redundant; it could have been guessed on the basis of its neighbors' values.
We can remove this redundancy by representing changes in intensity rather than absolute intensity values. For example, the differences between adjacent pixels can be used to represent an image. Transformations of this type are referred to as mappings. They are called reversible if the original image elements can be reconstructed from the transformed data set. For example, if the image contains many regions of constant intensity, a more efficient representation can be constructed by mapping the pixels along each scan line f(x,0), f(x,1), ..., f(x,N-1) into a sequence of pairs (g1, r1), (g2, r2), ... in which g_i denotes the i-th gray level encountered along the line and r_i the run length of the i-th run.
For example, the sequence (50, 50, 50, 50) becomes (50, 4).
3. Psychovisual redundancy
This arises because not all information present in an image is of equal importance to the human visual system.
For example, there are upper limits on the number of quantisation levels that can be easily distinguished by eye, and on the spatial frequencies that can be detected by the eye. In general, an observer searches for distinguishing features such as edges or textural regions and mentally combines them into recognizable groupings. The brain then correlates these groupings with prior knowledge in order to complete the image interpretation process.
Unlike coding and interpixel redundancy, psychovisual redundancy is associated with real or quantifiable visual information. Since the elimination of psychovisually redundant data results in a loss of quantitative information, it is commonly referred to as quantization: the mapping of a broad range of input values to a limited number of output values. As it is an irreversible operation (visual information is lost), quantization results in lossy data compression.
Example: First we have an image with 256 possible gray levels. We can apply uniform quantization to four bits, or 16 possible levels. The resulting compression ratio is 2:1. Note that false contouring is present in the previously smooth regions of the original image.
Significant improvements are possible with quantization that takes advantage of the peculiarities of the human visual system. One such method is known as improved gray-scale (IGS) quantization. It recognizes the eye's inherent sensitivity to edges and breaks them up by adding to each pixel a pseudo-random number, which is generated from the low-order bits of neighboring pixels, before quantizing the result.
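The uniform quantization step from the example (256 levels down to 16) can be sketched as follows; each pixel keeps only its 4 high-order bits, so the codeword shrinks from 8 bits to 4 and the compression ratio is 2:1:

```python
def uniform_quantize(pixels, levels=16):
    """Uniformly quantize 8-bit values (0-255) down to `levels` bins,
    mapping each pixel to the lower bound of its bin."""
    step = 256 // levels
    return [(p // step) * step for p in pixels]

uniform_quantize([0, 15, 16, 130, 255])  # -> [0, 0, 16, 128, 240]
```

The false contouring mentioned above appears exactly where neighboring pixels that differed by less than one step (here, 16) collapse onto the same output value.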
2. Image compression model
A typical image compression/decompression sequence consists of data redundancy reduction, coding, transmission, decoding, and reconstruction.
As Figure 1 shows, a compression system consists of two distinct structural blocks: an encoder and a decoder.
The source encoder is responsible for reducing or eliminating any coding, interpixel, or psychovisual redundancies in the input image .
The mapper transforms the input data into a (usually nonvisual) format designed to reduce interpixel redundancies in the input image. Run-length coding is an example of a mapping that directly results in data compression in this initial stage of the source encoding process. The representation of an image by a set of transform coefficients is an example of the opposite case.
The second stage, or quantizer block, reduces the accuracy of the mapper's output in accordance with some preestablished fidelity criterion. This stage reduces the psychovisual redundancies in the input image. It must be omitted when error-free compression is desired.
In the third stage of the source encoding process, the symbol coder creates a fixed- or variable-length code to represent the quantized and mapped data set. It assigns the shortest code words to the most frequently occurring output values and thus reduces coding redundancy.
The source decoder contains only two components: a symbol decoder and an inverse mapper.
3. Error free compression
3.1 Delta compression
Delta compression (differential coding) is a very simple, lossless technique in which we recode an image in terms of the difference in gray level between each pixel and the previous pixel in the row. The first pixel must be represented as an absolute value, but subsequent values can be represented as differences, or "deltas".
Takes advantage of interpixel redundancy in a scan line
Assumes most intensity changes happen gradually
Use codewords to represent the change in pixel intensity along a scan line
Use 4-bit codewords to represent intensity changes from -7 to +7, instead of 8-bit codewords. Four bits give us 2^4 = 16 codewords, but there are only 15 values in the range. The remaining codeword can be used to flag pixels for which the gray-level difference exceeds the range.
FIGURE : Example of delta encoding. Figure (a) is an audio signal digitized to 8 bits. Figure (b) shows the delta encoded version of this signal. Delta encoding is useful for data compression if the signal being encoded varies slowly from sample-to-sample.
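The basic encode/decode pair for delta compression can be sketched in a few lines (this sketch stores full differences; packing them into 4-bit codewords with the escape flag described above is left out):

```python
def delta_encode(samples):
    """Store the first value absolutely, then each difference ("delta")
    from the previous value."""
    out = [samples[0]]
    for prev, cur in zip(samples, samples[1:]):
        out.append(cur - prev)
    return out

def delta_decode(deltas):
    """Invert delta_encode by accumulating the differences."""
    out = [deltas[0]]
    for d in deltas[1:]:
        out.append(out[-1] + d)
    return out

signal = [100, 101, 103, 103, 100]
delta_encode(signal)  # -> [100, 1, 2, 0, -3]
```

Because the mapping is reversible, `delta_decode(delta_encode(signal))` returns the original signal exactly, so this is lossless.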
3.2 Run length encoding
This also takes advantage of interpixel redundancy.
A "run" of consecutive pixels whose gray levels are identical is replaced with two values: the length of the run and the gray level of all pixels in the run. Example: (50, 50, 50, 50) becomes (4, 50).
It is especially suited for synthetic images containing large homogeneous regions. The encoding process is effective only if there are sequences of 4 or more repeating characters.
Applications – compression of binary images to be faxed.
CTRL COUNT CHAR
FIGURE: Format of the three-byte code word
CTRL - control character which is used to indicate compression
COUNT- number of counted characters in stream of the same characters
CHAR - repeating characters
FIGURE: RLE flow chart
The RLE process starts with initialization of the character counter, the repetition counter, and a variable (Ch) which holds the current character. If all characters in the file have been processed, encoding ends. Otherwise, if ChCount equals 1, the Ch variable is stored in a temporary variable; else the actual character is compared to the previous character, and the result of that comparison leads either to an increment of the repetition counter or to another test of whether the number of consecutive characters is greater than four; in other words, whether the stream is simply copied or coded according to the code word shown in Fig 1.
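The flow chart above can be sketched as a small encoder. The CTRL byte value here is an assumption for illustration; a real implementation must also escape literal occurrences of CTRL in the input data:

```python
CTRL = 0x90  # assumed control byte; any value rare in the data works

def rle_encode(data: bytes) -> bytes:
    """Encode runs of 4 or more identical bytes as CTRL, COUNT, CHAR;
    shorter runs are copied through unchanged."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        if run >= 4:
            out += bytes([CTRL, run, data[i]])  # CTRL COUNT CHAR code word
        else:
            out += data[i:i + run]              # short run: copy literally
        i += run
    return bytes(out)

rle_encode(b"AAAAAABC")  # -> b'\x90\x06ABC'
```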
Examples of RLE implementations
RLE algorithms are part of various image file formats such as BMP, PCX, and TIFF, and RLE is also used in the PDF file format; it also exists as a separate compression technique and file format.
MS Windows standard for the RLE file format (RLE is a special case of BMP)
The MS Windows RLE standard has the same file format as the well-known BMP file format, but its RLE compression is defined only for 4-bit and 8-bit color images.
Two types of RLE compression are used, 4-bit RLE and 8-bit RLE; as expected, the first type is used for 4-bit images and the second for 8-bit images.
4bit RLE
A compression sequence consists of two bytes. The first byte (if not zero) determines the number of pixels which will be drawn. The second byte specifies two colors: the high-order 4 bits (upper 4 bits) specify the first color, the low-order 4 bits the second color. This means that after expansion the 1st, 3rd and other odd pixels will be in the color specified by the high-order bits, while the 2nd, 4th and other even pixels will be in the color specified by the low-order bits. If the first byte is zero, then the second byte specifies an escape code (see the table below).
    Second byte   Definition
    0             End-of-line
    1             End-of-RLE (bitmap)
    2             The following two bytes define an offset in the x and y
                  direction (x is right, y is up); the skipped pixels get
                  color zero
    >=3           When expanding, the following >=3 nibbles (4 bits each)
                  are copied from the compressed file; the file/memory
                  pointer must end on a 16-bit boundary, so an adequate
                  number of zeros follows

Table 1. Definition of escape codes (the first byte of the compression sequence is 0)
Examples for 4-bit RLE:

    Compressed data   Expanded data
    06 52             5 2 5 2 5 2
    08 1B             1 B 1 B 1 B 1 B
    00 06 83 14 34    8 3 1 4 3 4
    00 02 09 06       Move 9 positions right and 6 up
    00 00             End-of-line
    04 22             2 2 2 2
    00 01             End-of-RLE (bitmap)
8bit RLE
The compression sequence is also formed from 2 bytes: the first byte (if not zero) is the number of consecutive pixels which are in the color specified by the second byte.
As with 4-bit RLE, if the first byte is zero the second byte defines an escape code. Escape codes 0, 1, and 2 have the same meanings as described in Table 1, while if the escape code is >=3 then, when expanding, the following >=3 bytes are copied from the compressed file; if the escape code is 3 or a greater odd number, a zero follows to ensure a 16-bit boundary.
Examples for 8-bit RLE:

    Compressed data   Expanded data
    06 52             52 52 52 52 52 52
    08 1B             1B 1B 1B 1B 1B 1B 1B 1B
    00 03 83 14 34    83 14 34
    00 02 09 06       Move 9 positions right and 6 up
    00 00             End-of-line
    04 2A             2A 2A 2A 2A
    00 01             End-of-RLE (bitmap)
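A minimal decoder for this 8-bit scheme can be sketched as follows. The delta escape (00 02) is omitted for brevity, since handling it properly requires a full pixel canvas:

```python
def rle8_decode(data: bytes):
    """Decode the 8-bit BMP RLE scheme described above into a list of
    scan-line rows (delta escape 00 02 not handled in this sketch)."""
    rows, row = [], []
    i = 0
    while i < len(data):
        first, second = data[i], data[i + 1]
        i += 2
        if first > 0:                       # encoded run: COUNT, COLOR
            row += [second] * first
        elif second == 0:                   # escape 0: end of line
            rows.append(row)
            row = []
        elif second == 1:                   # escape 1: end of bitmap
            if row:
                rows.append(row)
            break
        elif second >= 3:                   # absolute mode: copy literally
            row += list(data[i:i + second])
            i += second + (second & 1)      # pad to 16-bit boundary if odd
    return rows

rle8_decode(bytes([0x06, 0x52, 0x00, 0x00, 0x04, 0x2A, 0x00, 0x01]))
# -> [[0x52]*6, [0x2A]*4]
```

Run against the example rows above, `06 52` expands to six bytes of 52 and `04 2A` to four bytes of 2A, matching the table.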
3.3 Statistical coding
Statistical coding techniques remove the coding redundancy in an image. Information theory tells us that the amount of information conveyed by a codeword relates to its probability of occurrence. Codewords that occur rarely convey more information than codewords that occur frequently in the data.
A random event i that occurs with probability P(i) is said to contain
I(i) = -log P(i) units of information (self-information).
If P(i) = 1 (that is, the event always occurs), then I(i) = 0 and no information is attributed to it.
Let us assume that an information source generates a random sequence of symbols (grey levels), and that the probability of occurrence of grey level i is P(i). If we have 2^b grey levels (symbols), the average self-information obtained from the source outputs is called the entropy.
Entropy in information theory is the measure of the information content of a message.
Entropy gives the average number of bits per pixel required to encode an image.
    H = - Σ_{i=0}^{2^b - 1} P(i) log2 P(i)
Probabilities are computed by normalizing the histogram of the image: P(i) = h_i / n, where h_i is the frequency of occurrence of grey level i and n is the total number of pixels in the image.
If b is the smallest number of bits needed to represent the number of quantisation levels observed in an image, then the information redundancy of that image is defined as

    R = b - H

and the maximum achievable compression ratio is C_max = b/H.
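The entropy formula is easy to evaluate from a pixel list; a short sketch that estimates P(i) from the histogram, as described above:

```python
import math
from collections import Counter

def entropy(pixels):
    """Entropy H = -sum P(i) log2 P(i), with P(i) = h_i / n estimated
    from the image histogram."""
    n = len(pixels)
    hist = Counter(pixels)  # h_i for each grey level i that occurs
    return -sum((h / n) * math.log2(h / n) for h in hist.values())

# Two equally likely levels carry 1 bit/pixel, so with b = 8 the
# redundancy is R = 8 - 1 = 7 bits and C_max = 8/1 = 8.
entropy([0, 255, 0, 255])  # -> 1.0
```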
After computing and normalizing the histogram, the task is to construct a set of codewords to represent each pixel value. These codewords must have the following properties:
1. Different codewords may have different lengths (numbers of bits).
2. Values that occur infrequently (low probability) should use more bits; values that occur frequently (high probability) should use fewer bits.
3. It must not be possible to mistake a particular sequence of concatenated codewords for any other sequence.
The average bit length of the codewords is

    L_avg = Σ_{i=0}^{2^b - 1} l(i) P(i)

where l(i) is the length of the codeword used to represent the grey level i. From Shannon's first coding theorem, the upper limit for L_avg is b and the lower limit for L_avg is the entropy H.
Huffman coding
1. Rank pixel values in decreasing order of their probability.
2. Pair the two values with the lowest probabilities, labeling one of them with 0 and the other with 1.
3. Link the two symbols with the lowest probabilities into a single symbol whose probability is their sum.
4. Go to step 2 until you generate a single symbol whose probability is 1.
5. Trace the coding tree from the root to read off the codewords.
http://www.compressconsult.com/huffman/
http://www.cs.duke.edu/csed/poop/huff/info/
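The steps above can be sketched with a priority queue; the symbol probabilities below are illustrative:

```python
import heapq

def huffman_codes(probabilities):
    """Build Huffman codewords for {symbol: probability} by repeatedly
    merging the two least probable nodes (steps 2-4 above)."""
    # Each heap entry: (probability, tiebreaker, {symbol: partial codeword}).
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probabilities.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)   # two lowest probabilities
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}   # label one branch 0
        merged.update({s: "1" + w for s, w in c2.items()})  # the other 1
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]

codes = huffman_codes({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125})
# codeword lengths: a -> 1 bit, b -> 2 bits, c and d -> 3 bits
```

For these probabilities L_avg = 1(0.5) + 2(0.25) + 3(0.125) + 3(0.125) = 1.75 bits, which equals the entropy, so the code is optimal here.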
3.4 Dictionary-based coding
These compression methods use the property of many data types to contain repeating code sequences. Good examples of such data are text files (code words represent characters) and raster images (code words represent pixels). All the dictionary methods can be subdivided into two main groups .
The methods of the first group try to find if the character sequence currently being compressed has already occurred earlier in the input data and then, instead of repeating it, output only a pointer to the earlier occurrence.
This is illustrated in the following diagram:
The dictionary here is implicit: it is represented by the previously processed data. All the methods of this group are based on the algorithm developed and published in 1977 by Abraham Lempel and Jacob Ziv, LZ77 (http://www.rasip.fer.hr/research/compress/algorithms/fund/lz/lz77.html). A refinement of this algorithm, which is the basis for practically all the later methods in this group, is the LZSS algorithm developed in 1982 by Storer and Szymanski.
The algorithms of the second group create a dictionary of the phrases that occur in the input data. When they encounter a phrase already present in the dictionary, they just output the index number of the phrase in the dictionary. This is explained in the diagram below:
These methods are based on the algorithm developed and published by Lempel and Ziv in 1978, LZ78. The refinement which is the basis for the later methods is called LZW; it was developed by Terry Welch in 1984 for hardware implementation in high-performance disk controllers.
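The second-group idea can be sketched with a minimal LZW-style encoder (simplified: the dictionary grows without bound and starts from the 256 single-character strings):

```python
def lzw_encode(text: str):
    """LZW sketch: grow a dictionary of phrases seen in the input and
    emit the index of the longest phrase already in the dictionary."""
    dictionary = {chr(i): i for i in range(256)}  # initial single chars
    current, out = "", []
    for ch in text:
        if current + ch in dictionary:
            current += ch                  # extend the current phrase
        else:
            out.append(dictionary[current])       # emit known phrase index
            dictionary[current + ch] = len(dictionary)  # add new phrase
            current = ch
    if current:
        out.append(dictionary[current])
    return out

lzw_encode("ababab")  # -> [97, 98, 256, 256]
```

Note how the repeated phrase "ab" is emitted as the single index 256 once it has entered the dictionary.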
More Data Compression Links: http://www.ddj.com/topics/compression/algorithms/