Image compression

Date: 08/06/04

o images take a lot of storage space
  - a 1024 x 1024 x 32-bit image requires 4 MB
  - suppose you have video that is 640 x 480 x 24 bits x 30 frames per second; 1 minute of video would require 1.54 GB
o that many bytes take a long time to transfer over slow connections - suppose you have a 56,000 bps connection
  - 4 MB will take almost 10 minutes
  - 1.54 GB will take almost 66 hours
o storage problems, plus the desire to exchange images over the Internet, have led to a large interest in image compression algorithms
o the same information can be represented many ways
  - a representation contains redundancy if more data than necessary is used to represent the information
  - compression algorithms remove redundancy
    - lossless algorithms remove only redundancy present in the data
    - lossy algorithms create redundancy (by discarding some information) and then remove it
o types of redundancy
  - coding redundancy
    - our quantized data is represented using codewords
    - if the size of the codeword is larger than is necessary to represent all quantization levels, then we have coding redundancy
  - interpixel redundancy
    - the intensity at a pixel may correlate strongly with the intensity values of its neighbors
    - we may remove this redundancy by representing changes in intensity rather than absolute intensity values
  - psychovisual redundancy
    - information may be present that is of lesser importance to human perception, e.g. high spatial frequencies or many quantization levels
  - interframe redundancy
    - the temporal equivalent of interpixel redundancy
o measuring compression algorithm performance
  - compression ratio C = n / nc
    - n is the size of the uncompressed data, nc is the size of the compressed data
    - larger values of C indicate better compression
  - symmetry
    - how does the time required for compression compare with the time required for decompression?
    - requirements for an algorithm may depend on the application, e.g. images on a CD-ROM or served over the Web versus images being stored from a security camera
  - fidelity criteria
    - when a lossy algorithm is used, how does the decompressed image compare to the original image?
    - one objective measure is the root-mean-square (RMS) error; smaller values of RMS error indicate the decompressed image is closer to the original
    - small RMS error does not always correlate well with subjective perception, so subjective fidelity measures are also important
    - a small sketch of the two objective measures follows
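o the compression ratio and RMS error above can be computed directly; the following is a minimal sketch (the class and method names are my own, not from the course code), assuming grayscale images stored as 2-D int arrays:

      // FidelityMeasures.java - illustrative sketch, not part of the course code
      public class FidelityMeasures {

          // compression ratio C = n / nc
          public static double compressionRatio(long uncompressedBytes, long compressedBytes) {
              return (double) uncompressedBytes / (double) compressedBytes;
          }

          // root-mean-square error between an original and a decompressed grayscale image
          public static double rmsError(int[][] original, int[][] decompressed) {
              int rows = original.length, cols = original[0].length;
              double sum = 0.0;
              for (int r = 0; r < rows; r++) {
                  for (int c = 0; c < cols; c++) {
                      double diff = decompressed[r][c] - original[r][c];
                      sum += diff * diff;
                  }
              }
              return Math.sqrt(sum / (rows * cols));
          }
      }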
Lossless compression algorithms

o delta compression
  - takes advantage of interpixel redundancy in a scan line
  - assumes most intensity changes happen gradually
  - uses codewords to represent the change in pixel intensity along a scan line
  - e.g. use 4-bit codewords to represent intensity changes from -7 to +7, plus a flag for an 8-bit codeword
  - DeltaEncoder.java
o run length encoding
  - also takes advantage of interpixel redundancy
  - especially suited for synthetic images containing large homogeneous regions
  - encode runs (sequences of pixels of equal intensity) with a (length, intensity) pair
  - runs can be constrained to a scan line, or allowed to extend over multiple scan lines
  - application - compression of binary images to be faxed
  - it is possible to increase the size of the dataset if applied to unsuitable images
  - RunLengthEncoder.java
o statistical coding
  - we want to remove coding redundancy
  - to measure the effectiveness of a coding scheme, we compute the entropy
    - entropy - in information theory, the measure of the information content of a message
    - the entropy H gives the average bits per pixel required to encode an image: H = -(sum of pi * log2(pi) over all gray levels i)
    - it is based on the probability of occurrence of each gray level, pi
    - probabilities are computed by normalizing the histogram of the image
  - we can then state the amount of redundancy: r = b - H, where b is the number of bits used per codeword
  - we can also state the maximum compression ratio achievable by removing coding redundancy: Cmax = b / H
  - entropy encoding is a method for efficiently representing the information content of a message
    - we are interested in encoding the information content only, gaining efficiency by discarding the noisy part of the message
    - we will use variable-length codewords
    - gray levels that occur infrequently (low probability) should use longer codewords
    - gray levels that occur frequently (high probability) should use shorter codewords
    - no codeword of length n should be identical to the first n bits of any other codeword
  - Entropy.java
o measuring entropy encoding performance on an image
  - compute the average bit length of its codewords
  - the lower limit is H, the entropy
  - the upper limit is b, the number of bits used in fixed-length codewords
o Huffman coding
  - one algorithm to perform entropy encoding
  - used as a step or phase in many other compression algorithms
  - rank gray levels in decreasing order of probability
  - build a Huffman tree
    - pair the two least probable gray levels
    - assign 0 to one and 1 to the other
    - replace the pair with the sum of their probabilities
    - continue until no pairs remain
  - read the codeword for each gray level by starting at the root and following the nodes out to the leaf
  - a minimal sketch of this construction follows
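o a minimal sketch of Huffman tree construction, assuming gray-level probabilities as input (the class and member names are my own, not taken from the course's HuffmanTest.java); it repeatedly pairs the two least probable nodes using a priority queue and reads the codewords off the root-to-leaf paths:

      // HuffmanSketch.java - illustrative sketch of Huffman coding
      import java.util.Map;
      import java.util.PriorityQueue;
      import java.util.TreeMap;

      public class HuffmanSketch {

          static class Node implements Comparable<Node> {
              int grayLevel;        // meaningful only for leaves
              double probability;
              Node zero, one;       // children labeled 0 and 1

              Node(int grayLevel, double probability) {
                  this.grayLevel = grayLevel;
                  this.probability = probability;
              }

              Node(Node zero, Node one) {
                  this.zero = zero;
                  this.one = one;
                  this.probability = zero.probability + one.probability;
                  this.grayLevel = -1;
              }

              boolean isLeaf() { return zero == null && one == null; }

              public int compareTo(Node other) {
                  return Double.compare(this.probability, other.probability);
              }
          }

          // build the tree by repeatedly pairing the two least probable nodes
          static Node buildTree(double[] probabilities) {
              PriorityQueue<Node> queue = new PriorityQueue<>();
              for (int g = 0; g < probabilities.length; g++) {
                  if (probabilities[g] > 0) queue.add(new Node(g, probabilities[g]));
              }
              while (queue.size() > 1) {
                  Node a = queue.poll();
                  Node b = queue.poll();
                  queue.add(new Node(a, b));
              }
              return queue.poll();
          }

          // read each codeword by following the path from the root to the leaf
          static void collectCodes(Node node, String prefix, Map<Integer, String> codes) {
              if (node.isLeaf()) {
                  codes.put(node.grayLevel, prefix.isEmpty() ? "0" : prefix);
                  return;
              }
              collectCodes(node.zero, prefix + "0", codes);
              collectCodes(node.one, prefix + "1", codes);
          }

          public static void main(String[] args) {
              // probabilities from the in-class exercise below
              double[] p = {0.12, 0.26, 0.30, 0.15, 0.10, 0.03, 0.02, 0.02};
              Map<Integer, String> codes = new TreeMap<>();
              collectCodes(buildTree(p), "", codes);
              codes.forEach((g, code) -> System.out.println(g + " -> " + code));
          }
      }

o note that different tie-breaking choices when probabilities are equal give different but equally valid code trees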
In-class exercise on Huffman coding

o Given a 3-bit grayscale image with the following gray-level probabilities, give the Huffman tree for the encoding and list the codewords for all gray levels.

    gray level:   0     1     2     3     4     5     6     7
    probability:  0.12  0.26  0.30  0.15  0.10  0.03  0.02  0.02

Dictionary-based coding

o a type of universal coding (versus entropy coding)
  - does not require a priori knowledge of the characteristics of the data source, and thus can compress data in one pass
o based upon the realization that any stream of data with some measure of redundancy consists of repetitions
o encode variable-length strings (or sequences of bits) with a single codeword
o a universal code will realize optimum compression in the limit for large datasets (it doesn't work as well on small datasets)
o Lempel and Ziv first published their work in 1977
  - LZ77
    - zip, gzip
    - PNG
    - implemented in java.util.zip
  - LZ78
    - GIF is based on a (patented) variant of this - LZW (Lempel-Ziv-Welch)
o a simple example of LZ77
  - we can achieve compression of an arbitrary sequence of bits by always coding a series of 0's and 1's as some previous such string (the prefix string) plus one new bit
  - the new string formed by adding the new bit to the previously used prefix string becomes a potential prefix string for future strings
  - we wish to code the string: 101011011010101011
  - the first bit, a 1, has no predecessors, so it has a "null" prefix string and the one extra bit is itself: 1,01011011010101011
  - the same goes for the 0 that follows, since it can't be expressed in terms of the only existing prefix: 1,0,1011011010101011
  - the following 10 is a combination of the 1 prefix and a 0: 1,0,10,11011010101011
  - we eventually parse the whole string as follows: 1,0,10,11,01,101,010,1011
  - since we found 8 phrases, we will use a three-bit code to label the null phrase and the first seven phrases, for a total of 8 numbered phrases
  - write the string in terms of the number of the prefix phrase plus the new bit needed to create the new phrase
  - the eight phrases can be described by: (000,1),(000,0),(001,0),(001,1),(010,1),(011,1),(101,0),(110,1)
  - the coded version of the above string is: 00010000001000110101011110101101
  - the larger the initial string, the more savings we get as we move along, because prefixes that are quite large become representable as small numerical indices
  - Ziv proved that for long documents the compression of the file approaches the optimum obtainable as determined by the information content of the document
o another way to think of it
  - the algorithm searches the window for the longest match with the beginning of the look-ahead buffer and outputs a pointer to that match
  - since it is possible that not even a one-character match can be found, the output cannot contain just pointers
  - LZ77 solves this problem as follows: after each pointer it outputs the first character in the look-ahead buffer after the match; if there is no match, it outputs a null pointer and the character at the coding position
o deflation in gzip and PNG
  - uses a variant of LZ77 (a sliding window compression technique)
  - the window consists of two parts: previously seen data (the dictionary) and a look-ahead buffer
  - a "string" is an arbitrary sequence of bytes (they don't have to be printable characters)
  - try to match strings of bytes in the look-ahead buffer with strings of bytes in the dictionary
    - the second occurrence of a string is replaced with a pair containing the distance to the match (maximum is 32K bytes) and the length of the match
    - if there is no match, the string is not replaced
  - unreplaced bytes and match lengths are Huffman coded using one tree; match distances are Huffman coded using another tree
  - coding is done in blocks - different blocks may use different trees, and the trees are stored with the blocks
  - hash tables are used for string matching
  - a round-trip example using the java.util.zip implementation follows
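o since this deflate variant is implemented in java.util.zip, here is a small round-trip sketch using Deflater and Inflater (the buffer size and the synthetic input are arbitrary choices of mine):

      // DeflateSketch.java - round trip through java.util.zip's deflate implementation
      import java.io.ByteArrayOutputStream;
      import java.util.Arrays;
      import java.util.zip.Deflater;
      import java.util.zip.Inflater;

      public class DeflateSketch {
          public static void main(String[] args) throws Exception {
              // synthetic, highly redundant input compresses well
              byte[] original = new byte[1000];
              Arrays.fill(original, (byte) 'A');

              // compress
              Deflater deflater = new Deflater();
              deflater.setInput(original);
              deflater.finish();
              ByteArrayOutputStream compressed = new ByteArrayOutputStream();
              byte[] buffer = new byte[1024];
              while (!deflater.finished()) {
                  compressed.write(buffer, 0, deflater.deflate(buffer));
              }
              deflater.end();

              // decompress
              Inflater inflater = new Inflater();
              inflater.setInput(compressed.toByteArray());
              ByteArrayOutputStream restored = new ByteArrayOutputStream();
              while (!inflater.finished()) {
                  restored.write(buffer, 0, inflater.inflate(buffer));
              }
              inflater.end();

              System.out.println("original size:   " + original.length);
              System.out.println("compressed size: " + compressed.size());
              System.out.println("round trip OK:   " + Arrays.equals(original, restored.toByteArray()));
          }
      }

o on repetitive input like this the compressed size is a small fraction of the original, which illustrates why dictionary coding works so well on redundant data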
o compression in TIFF and GIF uses LZW
  - based on a code table that maps frequently encountered bit strings to codewords
  - the output bit string and the code table are produced simultaneously (one pass)
  - the code table has 4096 entries, so codewords can be up to 12 bits long
    - entries 0 - 255: contain the values 0 - 255
    - entry 256: contains the clear code
    - entry 257: contains the end-of-information code
    - entries 258 - 4095: contain frequently encountered bit strings, built on the fly
  - codewords from 0 - 511 are 9 bits long
  - codewords from 512 - 1023 are 10 bits long
  - codewords from 1024 - 2047 are 11 bits long
  - codewords from 2048 - 4095 are 12 bits long
  - encoding algorithm
    - initialize the first 258 entries of the code table
    - set the prefix string to null
    - output the clear code
    - iterate
      - read a new image byte
      - create the current string by concatenating the prefix string with the new byte
      - if the current string is in the code table, update the prefix and repeat
      - else output the codeword of the prefix, put the current string in the code table, set the prefix to the new byte, and repeat
    - a tree structure may be imposed on the table to facilitate searching
    - the end-of-information code is output at the end of the image (or image strip)
    - when the code table is full, output the clear code and reinitialize the table
  - decoding
    - the code table is rebuilt during decompression
    - must be careful to read the correct number of bits
o DeflateTest.java
o HuffmanTest.java

Comparison of compression ratios for lossless techniques

Lossy compression

o take advantage of psychovisual redundancy to create more coding redundancy
  - example - convert 100, 101, 100, 99, 101 to 100, 100, 100, 100, 100
o JPEG - Joint Photographic Experts Group - compression algorithm based on transform coding
  - http://www.jpeg.org/public/jpeghomepage.htm
o transform coding
  - map the image into a set of transform coefficients using a reversible, linear transform
  - a significant number of the coefficients will have small magnitudes; these can be coarsely quantized or discarded
  - compression is achieved during the quantization step, not during the transform
o block coding
  - subdivide the image into small, non-overlapping blocks and apply the transform to each block separately
  - allows the compression to adapt to local image characteristics
  - reduces the computational requirements
o choice of transform
  - there are many that would work, including the discrete Fourier transform (DFT)
  - the discrete cosine transform (DCT) has several advantages
    - real coefficients, rather than complex
    - packs more information into fewer coefficients than the DFT
    - its periodicity is 2N rather than N, reducing artifacts from discontinuities at block edges
  - the discrete wavelet transform (DWT) is even better - JPEG 2000 will use this
    - http://www.jpeg.org/JPEG2000.htm
o choice of block size
  - affects the root-mean-square (RMS) error due to transform coding
  - affects computational complexity
  - want an n x n subimage, where n is an integer power of 2
  - as block size increases
    - compression level and computational requirements increase
    - RMS error decreases, then levels off
    - adaptability decreases
o quantization
  - DCT coefficients are real numbers, which require more bits to represent
  - use an 8 x 8 quantization table that gives greater precision to lower-frequency components
  - higher-frequency components may be discarded because their coefficients get set to zero
  - the table may be scaled, giving us a compression-versus-quality parameter
    - a low quality setting will zero more coefficients and give a high compression ratio
    - a high quality setting will zero fewer coefficients and give a lower compression ratio
  - a sketch of the DCT and quantization steps for one block follows
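o to make the transform and quantization steps concrete, here is a small sketch in Java of the 2-D DCT of one 8 x 8 block followed by quantization (the class and method names are my own, and the quantization table is passed in as a parameter rather than reproduced from the JPEG standard):

      // DctQuantizeSketch.java - 2-D DCT and quantization of one 8 x 8 block
      public class DctQuantizeSketch {

          static final int N = 8;

          // forward 2-D DCT of one 8 x 8 block (pixel values already shifted by -128)
          static double[][] dct(double[][] block) {
              double[][] coeff = new double[N][N];
              for (int u = 0; u < N; u++) {
                  for (int v = 0; v < N; v++) {
                      double sum = 0.0;
                      for (int x = 0; x < N; x++) {
                          for (int y = 0; y < N; y++) {
                              sum += block[x][y]
                                   * Math.cos((2 * x + 1) * u * Math.PI / (2.0 * N))
                                   * Math.cos((2 * y + 1) * v * Math.PI / (2.0 * N));
                          }
                      }
                      double cu = (u == 0) ? 1.0 / Math.sqrt(2) : 1.0;
                      double cv = (v == 0) ? 1.0 / Math.sqrt(2) : 1.0;
                      coeff[u][v] = 0.25 * cu * cv * sum;
                  }
              }
              return coeff;
          }

          // divide each coefficient by the corresponding table entry and round;
          // small high-frequency coefficients become zero
          static int[][] quantize(double[][] coeff, int[][] table) {
              int[][] q = new int[N][N];
              for (int u = 0; u < N; u++)
                  for (int v = 0; v < N; v++)
                      q[u][v] = (int) Math.round(coeff[u][v] / table[u][v]);
              return q;
          }
      }

o after quantization, most of the high-frequency entries of the returned block are zero, which is exactly what the zigzag reordering and run length coding in the baseline algorithm below exploit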
o the JPEG standard actually defines three different coding systems
  - a lossy baseline coding system
  - an extended coding system
  - a lossless independent coding system
  - to be JPEG compatible, only the baseline coding system must be implemented
o algorithm to compress 8-bit grayscale
  - subdivide the image into 8 x 8 blocks
  - for each block
    - shift all pixels by -128
    - perform the DCT
    - quantize the DCT coefficients
    - reorder the coefficients using a zigzag pattern
    - delta encode the zero-frequency (DC) coefficient
    - run length encode the zero-valued (AC) coefficients
    - Huffman code the AC coefficients
o JPEG classes
  - provided in package com.sun.image.codec.jpeg
  - the factory class JPEGCodec creates instances of encoders and decoders
  - JPEGEncoder.java
  - access to control parameters is provided - JPEGQuantTable.java
o Comparison of quality versus compression ratio using the JPEGTool application
o JPEG compression is not as effective as the lossless methods on synthetic images
o fractal compression
  - fractals
    - repeated evaluation of a simple equation at varying scales generates an infinite level of detail
    - fractals have the property of self-similarity - parts of the shape look like transformed copies of other parts
  - apply this concept to images - one group of pixels is a transformed copy of another group of pixels
    - the transformation is not just geometric, but also includes gray level
    - the transformation can be represented more compactly than the original group of pixels
  - compression algorithm
    - subdivide the image into non-overlapping ranges
    - search for a domain twice the size of the range which is most similar to the range
    - select the best set of translation, orientation, and gray-level mapping parameters and store these in a compact format
    - a range can be subdivided further if an appropriate domain can't be found
      - specify an RMS error tolerance parameter; if the result for a range exceeds it, we subdivide further
      - this gives us a quality-versus-computational-requirements control parameter
  - the algorithm is not symmetric - compression takes much longer than decompression
  - decompression can be done at any image resolution
o compressing video
  - could do JPEG compression on each frame - e.g. QuickTime, Video for Windows
  - but we would also like to reduce interframe redundancy
  - MPEG - Moving Picture Experts Group - develops standards
    - http://mpeg.telecomitalialab.com/
    - MPEG-1 - video CD, MP3
    - MPEG-2 - digital television set-top boxes, DVD
    - MPEG-4 - multimedia for the web
    - working on MPEG-7 (Multimedia Content Description Interface) and MPEG-21 (Multimedia Framework)
  - compress a reference frame at regular intervals
  - perform motion estimation for subsequent frames - try to encode blocks in subsequent frames in terms of blocks in the reference frame (a block-matching sketch follows)
  - highly asymmetrical - compression (often hardware assisted) takes much longer than decompression
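o a minimal sketch of the block-matching idea behind motion estimation: for one block of the current frame, exhaustively search a window of the reference frame for the displacement with the smallest sum of absolute differences (the block size, search range, and names are my own choices, not taken from any MPEG reference implementation):

      // BlockMatchSketch.java - exhaustive-search motion estimation for one block
      public class BlockMatchSketch {

          // returns {dy, dx}, the best displacement of the block at (row, col)
          // of 'current' within +/- searchRange pixels in 'reference'
          static int[] bestMatch(int[][] reference, int[][] current,
                                 int row, int col, int blockSize, int searchRange) {
              int bestDy = 0, bestDx = 0;
              long bestSad = Long.MAX_VALUE;
              for (int dy = -searchRange; dy <= searchRange; dy++) {
                  for (int dx = -searchRange; dx <= searchRange; dx++) {
                      int r0 = row + dy, c0 = col + dx;
                      // skip candidate blocks that fall outside the reference frame
                      if (r0 < 0 || c0 < 0 || r0 + blockSize > reference.length
                              || c0 + blockSize > reference[0].length) continue;
                      long sad = 0;                    // sum of absolute differences
                      for (int r = 0; r < blockSize; r++)
                          for (int c = 0; c < blockSize; c++)
                              sad += Math.abs(current[row + r][col + c] - reference[r0 + r][c0 + c]);
                      if (sad < bestSad) {
                          bestSad = sad;
                          bestDy = dy;
                          bestDx = dx;
                      }
                  }
              }
              return new int[] { bestDy, bestDx };
          }
      }

o the exhaustive search over every candidate displacement is what makes compression so much slower than decompression; real encoders use faster search strategies or hardware assistance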