Image compression

Date: 08/06/04

o images take a lot of storage space
 a 1024 x 1024 x 32-bit image requires 4 MB
 suppose you have video that is 640 x 480 x 24 bits x 30 frames per second; 1 minute of video would require 1.54 GB
o that many bytes take a long time to transfer over slow connections - suppose you have a 56,000 bps connection
 4 MB will take almost 10 minutes
 1.54 GB will take almost 66 hours (checked in the sketch below)
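These transfer times follow directly from the bit counts; a quick arithmetic check (plain Java, using the sizes and line speed from the bullets above):

    public class StorageMath {
        public static void main(String[] args) {
            // still image: 1024 x 1024 pixels at 32 bits (4 bytes) per pixel
            long imageBytes = 1024L * 1024L * 4L;            // 4,194,304 bytes = 4 MB
            // video: 640 x 480 pixels, 24 bits (3 bytes) per pixel, 30 fps, 60 seconds
            long videoBytes = 640L * 480L * 3L * 30L * 60L;  // 1,658,880,000 bytes ~ 1.54 GB
            double bitsPerSecond = 56000.0;
            System.out.printf("image: %.1f minutes%n", imageBytes * 8 / bitsPerSecond / 60);  // ~10.0
            System.out.printf("video: %.1f hours%n", videoBytes * 8 / bitsPerSecond / 3600);  // ~65.8
        }
    }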
o storage problems, plus the desire to exchange images over the Internet, have led to a large interest in image compression algorithms
o the same information can be represented many ways
o a representation contains redundancy if more data than necessary is used to represent the information
o compression algorithms remove redundancy
 lossless algorithms remove only redundancy present in the data
 lossy algorithms create redundancy (by discarding some information) and then remove
it
o types of redundancy
 coding redundancy
 our quantized data is represented using codewords
 if the size of the codeword is larger than is necessary to represent all
quantization levels, then we have coding redundancy
 interpixel redundancy
 the intensity at a pixel may correlate strongly with the intensity value of its
neighbors
 may remove redundancy by representing changes in intensity rather than
absolute intensity values
 psychovisual redundancy
 may have information present that is of lesser importance to human perception
 e.g. high spatial frequencies or many quantization levels
 interframe redundancy
 temporal equivalent of interpixel redundancy
o measuring compression algorithm performance
 compression ratio
 C = n / nc
 n is the size of the uncompressed data, nc is the size of the compressed data
 larger values of C indicate better compression
 symmetry
 how does the time required for compression compare with the time required for
decompression
 requirements for an algorithm may depend on the application
 e.g. images on a CD-ROM or served over the Web versus images being stored
from a security camera
 fidelity criteria
 when a lossy algorithm is used, how does the decompressed image compare to
the original image
 one objective measure is the root-mean-square (RMS) error (see the sketch after this list)
 smaller values of RMS error indicate the decompressed image is closer to the original
 small RMS error does not always correlate well with subjective perception
 subjective fidelity measures are also important
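A sketch of the RMS error computation (assuming 8-bit grayscale images stored as int arrays; the method name is illustrative):

    /** Root-mean-square error between an original and a decompressed image.
     *  RMS = sqrt( (1/N) * sum of (decompressed - original)^2 over all N pixels )
     *  Smaller values mean the decompressed image is closer to the original. */
    public static double rmsError(int[] original, int[] decompressed) {
        double sumSquaredError = 0.0;
        for (int i = 0; i < original.length; i++) {
            double diff = decompressed[i] - original[i];
            sumSquaredError += diff * diff;
        }
        return Math.sqrt(sumSquaredError / original.length);
    }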
Lossless compression algorithms
 delta compression
 takes advantage of interpixel redundancy in a scan line
 assumes most intensity changes happen gradually
 use codewords to represent the change in pixel intensity along a scan line
 e.g. use 4-bit codewords to represent intensity changes of -7 to +7, plus a flag marking an escape to an 8-bit codeword
 DeltaEncoder.java
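A minimal sketch of the scheme (not the DeltaEncoder.java referenced above; bit packing is omitted and the flag value is an assumption):

    /** Delta-encode one scan line: the first pixel is stored absolutely, then
     *  signed differences. Differences in [-7, +7] fit a 4-bit codeword; the
     *  flag value -8 marks an escape to a full 8-bit absolute value. */
    public static int[] deltaEncode(int[] scanLine) {
        int[] out = new int[scanLine.length * 2];  // worst case: every delta escapes
        int n = 0;
        out[n++] = scanLine[0];                    // first pixel stored as-is
        for (int i = 1; i < scanLine.length; i++) {
            int delta = scanLine[i] - scanLine[i - 1];
            if (delta >= -7 && delta <= 7) {
                out[n++] = delta;                  // fits a 4-bit codeword
            } else {
                out[n++] = -8;                     // flag: next value is 8 bits
                out[n++] = scanLine[i];
            }
        }
        return java.util.Arrays.copyOf(out, n);
    }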
 run length encoding
 also takes advantage of interpixel redundancy
 especially suited for synthetic images containing large homogeneous regions
 encode runs (a sequence of pixels of equal intensity) with a (length, intensity)
pair
 runs can be constrained to a scan line, or allowed to extend to multiple scan
lines
 application - compression of binary images to be faxed
 if applied to unsuitable images (with few long runs), it can actually increase the size of the data
 RunLengthEncoder.java
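A minimal sketch of encoding runs within one scan line (not the RunLengthEncoder.java referenced above):

    import java.util.ArrayList;
    import java.util.List;

    public class RunLength {
        /** Encode a scan line as (length, intensity) pairs. */
        public static List<int[]> encode(int[] scanLine) {
            List<int[]> runs = new ArrayList<>();
            int start = 0;  // index where the current run began
            for (int i = 1; i <= scanLine.length; i++) {
                // close the run at a change of intensity or at the end of the line
                if (i == scanLine.length || scanLine[i] != scanLine[start]) {
                    runs.add(new int[] { i - start, scanLine[start] });
                    start = i;
                }
            }
            return runs;
        }
    }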
 statistical coding
 want to remove coding redundancy
 to measure the effectiveness of a coding scheme, we compute the entropy
 entropy - in information theory, it is the measure of the information
content of a message
 entropy gives the average bits per pixel required to encode an image, H
 based on the probability of occurrence of each gray level, pi:
H = -Σ pi log2 pi (summed over all gray levels; sketched in code below)
 probabilities are computed by normalizing the histogram of the image
 we can then state the amount of redundancy - r = b - H, where b is the
number of bits used per codeword
 we can also state the maximum compression ratio achievable by removing coding redundancy - Cmax = b / H
 entropy encoding is a method for efficiently representing the information content of a message
 we are interested in encoding the information content only, gaining efficiency by discarding the redundant part of the representation
 we will use variable-length codewords
 codewords that occur infrequently (low probability) should use more
bits
 codewords that occur frequently (high probability) should use fewer
bits
 no codeword of length n should be identical to the first n bits of any
other codeword
 Entropy.java
 measuring entropy encoding performance on an image
 compute the average bit length of its codewords
 lower limit is H, the entropy
 upper limit is b, number of bits used in fixed-length codewords
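A sketch of the entropy computation defined above (not the Entropy.java referenced earlier; assumes a 256-bin grayscale histogram):

    /** Entropy H = -sum(pi * log2(pi)) over gray levels with nonzero counts;
     *  H is the lower bound on average bits per pixel for any entropy code. */
    public static double entropy(int[] histogram) {
        long totalPixels = 0;
        for (int count : histogram) totalPixels += count;
        double h = 0.0;
        for (int count : histogram) {
            if (count == 0) continue;                 // p log p -> 0 as p -> 0
            double p = (double) count / totalPixels;  // normalize the histogram
            h -= p * (Math.log(p) / Math.log(2.0));   // log base 2
        }
        return h;
    }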
 Huffman coding
 one algorithm to perform entropy encoding
 used as a step or phase in many other compression algorithms
 rank gray levels in decreasing order of probability
 build a Huffman tree
 pair the least frequent gray levels
 assign 0 to one and 1 to the other
 replace the pair with the sum of their probabilities
 continue until no more pairs
 read the codeword for each gray level by starting at the root and
following the nodes out to the leaf
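A minimal sketch of the tree construction with a priority queue (class and method names are illustrative):

    import java.util.HashMap;
    import java.util.Map;
    import java.util.PriorityQueue;

    public class Huffman {
        static class Node {
            final int grayLevel;       // meaningful only at leaves
            final double probability;
            final Node left, right;
            Node(int g, double p, Node l, Node r) {
                grayLevel = g; probability = p; left = l; right = r;
            }
            boolean isLeaf() { return left == null; }
        }

        /** Build codewords for gray levels 0..n-1 from their probabilities. */
        public static Map<Integer, String> buildCodes(double[] probabilities) {
            PriorityQueue<Node> queue =
                new PriorityQueue<>((a, b) -> Double.compare(a.probability, b.probability));
            for (int g = 0; g < probabilities.length; g++)
                queue.add(new Node(g, probabilities[g], null, null));
            while (queue.size() > 1) {
                Node a = queue.poll(), b = queue.poll();  // pair the two least probable
                queue.add(new Node(-1, a.probability + b.probability, a, b));
            }
            Map<Integer, String> codes = new HashMap<>();
            assign(queue.poll(), "", codes);              // walk from root out to the leaves
            return codes;
        }

        private static void assign(Node node, String code, Map<Integer, String> codes) {
            if (node.isLeaf()) { codes.put(node.grayLevel, code); return; }
            assign(node.left, code + "0", codes);   // assign 0 to one branch
            assign(node.right, code + "1", codes);  // and 1 to the other
        }
    }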


In-class exercise on Huffman coding
o Given a 3-bit grayscale image with the following gray-level probabilities, give the Huffman tree
for the encoding and list the codewords for all gray levels.
 0 - 0.12
 1 - 0.26
 2 - 0.30
 3 - 0.15
 4 - 0.10
 5 - 0.03
 6 - 0.02
 7 - 0.02
Dictionary-based coding
o a type of universal coding (versus entropy coding)
 does not require a priori knowledge of data source characteristics
 thus can compress data in one pass
o based upon the realization that any stream of data with some measure of redundancy consists of
repetitions
o encode variable-length strings (or sequences of bits) with a single codeword
o a universal code will realize optimum compression in the limit for large datasets (doesn't work
as well on small datasets)
o Lempel and Ziv first published their work in 1977
 LZ77
 zip, gzip
 PNG
 implemented in java.util.zip
 LZ78
 GIF is based on a (patented) variant of this - LZW (Lempel-Ziv-Welch)
o a simple example of LZ77
 we can achieve compression of an arbitrary sequence of bits by always coding a series
of 0's and 1's as some previous such string (the prefix string) plus one new bit
 the new string formed by adding the new bit to the previously used prefix string
becomes a potential prefix string for future strings
 we wish to code the string: 101011011010101011
 the first bit, a 1, has no predecessors, so it has a "null" prefix string and the one extra bit is itself: 1,01011011010101011
 the same goes for the 0 that follows since it can't be expressed in terms of the
only existing prefix: 1,0,1011011010101011
 the following 10 is obviously a combination of the 1 prefix and a 0:
1,0,10,11011010101011
 we eventually parse the whole string as follows: 1,0,10,11,01,101,010,1011
 since we found 8 phrases, we will use a three bit code to label the null phrase
and the first seven phrases for a total of 8 numbered phrases
 write the string in terms of the number of the prefix phrase plus the new bit
needed to create the new phrase
 the eight phrases can be described by: (000,1),(000,0),(001,0),(001,1),(010,1),(011,1),(101,0),(110,1)
 the coded version of the above string is:
00010000001000110101011110101101
 the larger the initial string, the more savings we get as we move along, because
prefixes that are quite large become representable as small numerical indices
 Ziv proved that for long documents the compression of the file approaches the
optimum obtainable as determined by the information content of the document
 another way to think of it (sketched in code after this list)
 the algorithm searches the window for the longest match with the beginning of the look-ahead buffer and outputs a pointer to that match

 since it is possible that not even a one-character match can be found, the output cannot contain just pointers
 LZ77 solves this problem this way:
 after each pointer it outputs the first character in the look-ahead buffer after the match
 if there is no match, it outputs a null-pointer and the character at the coding position
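A minimal sketch of the window search and the null-pointer case described above (fixed-size window; names are illustrative and bit-level output is omitted):

    import java.util.ArrayList;
    import java.util.List;

    public class Lz77 {
        /** Output token: distance back to the match, match length, next literal byte.
         *  distance == 0 and length == 0 is the null-pointer-plus-literal case. */
        public static class Token {
            public final int distance, length;
            public final byte next;
            public Token(int d, int l, byte n) { distance = d; length = l; next = n; }
        }

        public static List<Token> encode(byte[] data, int windowSize) {
            List<Token> out = new ArrayList<>();
            int pos = 0;
            while (pos < data.length) {
                int bestLen = 0, bestDist = 0;
                // search the window for the longest match with the look-ahead buffer
                for (int start = Math.max(0, pos - windowSize); start < pos; start++) {
                    int len = 0;
                    while (pos + len + 1 < data.length          // reserve one literal byte
                            && data[start + len] == data[pos + len]) len++;
                    if (len > bestLen) { bestLen = len; bestDist = pos - start; }
                }
                // emit the pointer (possibly null) plus the first unmatched character
                out.add(new Token(bestDist, bestLen, data[pos + bestLen]));
                pos += bestLen + 1;
            }
            return out;
        }
    }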
o deflation in gzip and PNG uses a variant of LZ77
 sliding window compression technique
 window consists of two parts
 previously seen data (dictionary)
 look-ahead buffer
 a "string" is an arbitrary sequence of bytes (they don't have to be printable characters)
 try to match strings of bytes in the look-ahead buffer with strings of bytes in the
dictionary
 the second occurrence of a string is replaced with a pair containing
 distance to the match (maximum is 32K bytes)
 length of the match
 if no match, string is not replaced
 unreplaced bytes and match lengths are Huffman coded using one tree
 match distances are Huffman coded using another tree
 coding is done in blocks - different blocks may use different trees
 trees are stored with the blocks
 hash tables are used for string matching
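Since deflate is implemented in java.util.zip, a minimal round trip looks roughly like this (buffer sizing is simplified for the small example input):

    import java.util.zip.Deflater;
    import java.util.zip.Inflater;

    public class DeflateRoundTrip {
        public static void main(String[] args) throws Exception {
            byte[] original = "these bytes repeat, these bytes repeat, these bytes repeat"
                    .getBytes("US-ASCII");

            // compress: deflate maintains the sliding window internally
            Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
            deflater.setInput(original);
            deflater.finish();
            byte[] buffer = new byte[1024];
            int compressedLength = deflater.deflate(buffer);
            deflater.end();

            // decompress and verify the round trip
            Inflater inflater = new Inflater();
            inflater.setInput(buffer, 0, compressedLength);
            byte[] restored = new byte[original.length];
            inflater.inflate(restored);
            inflater.end();

            System.out.println(new String(restored, "US-ASCII"));
            System.out.println("C = " + (double) original.length / compressedLength);
        }
    }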
o compression in TIFF and GIF uses LZW
 based on a code table that maps frequently encountered bit strings to codewords
 output bit string and code table are produced simultaneously (one pass)
 code table has 4096 entries - codewords can be up to 12 bits
 entries 0 - 255: contain 0 - 255
 entry 256: contains clear code
 entry 257: contains end of information code
 entries 258 - 4095: contain frequently encountered bit strings - built on the fly
 codewords from 0 - 511 are 9 bits long
 codewords from 512 - 1023 are 10 bits long
 codewords from 1024 - 2047 are 11 bits long
 codewords from 2048 to 4095 are 12 bits long
 encoding algorithm
 initialize first 258 entries of code table
 set prefix string to null
 output clear code
 iterate
 read new image byte
 create current string by concatenating prefix string with new byte
 if current string in code table
 update prefix and repeat
 else
 output codeword of prefix
 put current string in code table
 set prefix to new byte and repeat
 a tree structure may be imposed on the table to facilitate searching
 end of information is output at end of image (or image strip)
 when code table is full, output clear code and reinitialize table
 decoding
 code table is built during decompression
 must be careful to read the correct number of bits
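A minimal sketch of the LZW encoding loop for byte data (a string-keyed table for clarity; the clear/end codes and variable-width codeword packing described above are omitted):

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class LzwEncoder {
        public static List<Integer> encode(byte[] image) {
            Map<String, Integer> table = new HashMap<>();
            for (int i = 0; i < 256; i++)       // entries 0-255 contain 0-255
                table.put("" + (char) i, i);
            int nextCode = 258;                 // 256 = clear, 257 = end of information
            List<Integer> out = new ArrayList<>();
            String prefix = "";                 // start with the null prefix string
            for (byte b : image) {
                String current = prefix + (char) (b & 0xFF);
                if (table.containsKey(current)) {
                    prefix = current;           // keep growing the match
                } else {
                    out.add(table.get(prefix)); // output codeword of the prefix
                    if (nextCode < 4096)        // table holds at most 4096 entries
                        table.put(current, nextCode++);
                    prefix = "" + (char) (b & 0xFF);
                }
            }
            if (!prefix.isEmpty()) out.add(table.get(prefix));
            return out;
        }
    }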



 DeflateTest.java
 HuffmanTest.java
Comparison of compression ratios for lossless techniques

Lossy compression
o take advantage of psychovisual redundancy to create more coding redundancy
 example - convert 100, 101, 100, 99, 101 to 100, 100, 100, 100, 100
o JPEG- Joint Photographics Experts Group - compression algorithm based on transform coding
http://www.jpeg.org/public/jpeghomepage.htm
o transform coding
 map the image into a set of transform coefficients using a reversible, linear transform
 a significant number of the coefficients will have small magnitudes
 these can be coarsely quantized or discarded
 compression is achieved during the quantization step, not during the transform
o block coding
 subdivide the image into small, non-overlapping blocks
 apply the transform to each block separately
 allows the compression to adapt to local image characteristics
 reduces the computational requirements
o choice of transform
 there are many that would work, including the discrete Fourier transform (DFT)
 discrete cosine transform (DCT) has several advantages
 real coefficients, rather than complex
 packs more information into fewer coefficients than DFT
 periodicity is 2N rather than N, reducing artifacts from discontinuities at block
edges
 discrete wavelet transform (DWT) is even better - JPEG 2000 will use this - http://www.jpeg.org/JPEG2000.htm
o choice of block size
 affects root-mean-square (RMS) error due to transform coding
 affects computational complexity
 want an n x n subimage, where n is an integer power of 2
 as block size increases
 compression level and computational requirements increase
 RMS error decreases, then levels off
 adaptability decreases
o quantization
 DCT coefficients are real numbers, which require more bits to represent
 use an 8 x 8 quantization table that gives greater precision to lower frequency
components
 higher frequency components may be discarded because their coefficients get set to
zero
 table may be scaled, giving us a compression versus quality parameter
 low quality parameter will zero more coefficients and give a high compression
ratio
 high quality parameter will zero fewer coefficients and give a lower
compression ratio
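A sketch of the quantization step for one 8 x 8 block (the quantization table is passed in, and the quality control is reduced to a single scale factor, which is a simplification):

    public class JpegQuantize {
        /** Quantize an 8 x 8 block of DCT coefficients: divide each coefficient
         *  by its (scaled) table entry and round. Small high-frequency
         *  coefficients round to zero and are effectively discarded. */
        public static int[][] quantize(double[][] dct, int[][] table, double scale) {
            int[][] quantized = new int[8][8];
            for (int u = 0; u < 8; u++)
                for (int v = 0; v < 8; v++) {
                    // a larger scale (lower quality) zeroes more coefficients
                    double step = Math.max(1.0, table[u][v] * scale);
                    quantized[u][v] = (int) Math.round(dct[u][v] / step);
                }
            return quantized;
        }

        /** Approximate inverse used during decompression. */
        public static double[][] dequantize(int[][] q, int[][] table, double scale) {
            double[][] dct = new double[8][8];
            for (int u = 0; u < 8; u++)
                for (int v = 0; v < 8; v++)
                    dct[u][v] = q[u][v] * Math.max(1.0, table[u][v] * scale);
            return dct;
        }
    }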
o JPEG standard
 actually defines three different coding systems
 lossy baseline coding system
 extended coding system
 lossless independent coding system
 to be JPEG compatible only the baseline coding system must be implemented

o algorithm to compress 8-bit grayscale
 subdivide into 8 x 8 blocks
 for each block
 shift all pixels by -128
 perform DCT
 quantize DCT coefficients
 reorder coefficients using a zigzag pattern
 delta encode the zero frequency coefficient (DC)
 run length encode zero-valued (AC) coefficients
 Huffman code AC coefficients
 JPEG classes provided in package com.sun.image.codec.jpeg
 factory class JPEGCodec creates instances of encoders and decoders - JPEGEncoder.java
 access to control parameters is provided - JPEGQuantTable.java
 Comparison of quality versus compression ratio using JPEGTool application
 JPEG compression is not as effective as the lossless methods on synthetic images
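As an alternative to the com.sun classes (which are not part of the core API), the standard javax.imageio package can write JPEG with an explicit quality parameter; a minimal sketch:

    import java.awt.image.BufferedImage;
    import java.io.File;
    import java.io.IOException;
    import javax.imageio.IIOImage;
    import javax.imageio.ImageIO;
    import javax.imageio.ImageWriteParam;
    import javax.imageio.ImageWriter;
    import javax.imageio.stream.ImageOutputStream;

    public class JpegWrite {
        /** Write image as JPEG with quality in [0.0, 1.0]; lower quality
         *  quantizes more coarsely and gives a higher compression ratio. */
        public static void writeJpeg(BufferedImage image, File file, float quality)
                throws IOException {
            ImageWriter writer = ImageIO.getImageWritersByFormatName("jpeg").next();
            ImageWriteParam param = writer.getDefaultWriteParam();
            param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
            param.setCompressionQuality(quality);
            try (ImageOutputStream out = ImageIO.createImageOutputStream(file)) {
                writer.setOutput(out);
                writer.write(null, new IIOImage(image, null, null), param);
            } finally {
                writer.dispose();
            }
        }
    }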
o fractal compression
 fractals
 repeated evaluation of a simple equation at varying scales
 generates infinite level of detail
 has the property of self-similarity - parts of the shape look like transformed
copies of other parts
 apply this concept to images - one group of pixels is a transformed copy of another
group of pixels
 the transformation is not just geometric, but also gray level
 the transformation can be represented more compactly than the original group of pixels
 compression algorithm
 subdivide the image into non-overlapping ranges
 search for a domain twice the size of the range which is most similar to the
range
 select the best set of translation, orientation, and gray-level mapping parameters
 store these in a compact format
 can subdivide a range further if an appropriate domain can't be found
 specify an RMS error tolerance parameter
 if the result for a range exceeds this, we subdivide further
 gives us a quality versus computational requirements control parameter
 algorithm is not symmetric - compression takes much longer than decompression
 decompression can be done to any image resolution
o compressing video
 could do JPEG compression on each frame - e.g. QuickTime, Video for Windows
 would like to reduce interframe redundancy
 MPEG - Moving Picture Experts Group - develops standards - http://mpeg.telecomitalialab.com/
 MPEG-1 - video CD, MP3
 MPEG-2 - digital television set-top boxes, DVD
 MPEG-4 - multimedia for the web
 working on MPEG-7 (Multimedia Content Description Interface) and MPEG-21 (Multimedia Framework)
 compress a reference frame at regular intervals
 perform motion estimation for subsequent frames
 try to encode blocks in subsequent frames in terms of blocks in the reference frame
 highly asymmetrical - compression (often hardware assisted) takes much longer than
decompression