Zonal coding

• Information theory says that coefficients with maximum variance carry the most information
• The 15 of 64 transform coefficients with the largest variance are kept; the rest are discarded

Quantization

• Retained coefficients are quantized and coded
• Coefficients are either normalized by their standard deviations and uniformly quantized, or put through an optimum Lloyd-Max quantizer

Bit allocation

• How many bits should be assigned to each coefficient?
• Information theory tells us that a Gaussian random variable with variance $\sigma^2$, subject to distortion $D$, cannot be represented by fewer than $\frac{1}{2}\log_2\frac{\sigma^2}{D}$ bits
• Therefore, large-variance coefficients need more bits

Bit allocation table

• The number of bits per coefficient tapers from the low-frequency corner toward zero at high frequencies; the recoverable rows of the slide's 8x8 table read

  8 7 6 4 3 2 1 0
  7 6 5 4 3 2 1 0
  6 5 4 3 2 1 1 0
  4 3 2 1 0 ...

Problem with zonal coding

• If we rope off a fixed area, we run the risk of cutting off occasional transform coefficients that are large enough to deserve inclusion

Threshold coding

• One solution is to threshold the coefficients, possibly using different thresholds for different subimages
• This way we end up keeping the coefficients that exceed the threshold, wherever they might be

Picking the N largest transform coefficients

• In a 2D array, how do we pick the N largest coefficients?
• Answer: a zigzag scan. Coefficients are rearranged into a 1D sequence; the top-left corner of the 8x8 scan order is

  0  1  5  6 14
  2  4  7 13 16
  3  8 12 17 25
  9 11 18 24 31

Threshold details

• How do we threshold the transform coefficients? There are 3 ways:
  – A single global threshold for all subimages
  – Different thresholds for different subimages
  – Different thresholds for different locations within a subimage

Issue of code rate

• If we always keep the same number of coefficients and the same bit allocation table, we operate at a constant code rate that is also known in advance
• If we keep a different number of coefficients for each subimage, the code rate is variable

Combining thresholding and quantization

• Let $T(u,v)$ be the transform coefficient matrix and $Z(u,v)$ the normalization array. Then define

$$\hat{T}(u,v) = \mathrm{round}\left[\frac{T(u,v)}{Z(u,v)}\right]$$

• A standard choice of $Z(u,v)$:

  16  11  10  16  24  40  51  61
  12  12  14  19  26  58  60  55
  14  13  16  24  40  57  69  56
  14  17  22  29  51  87  80  62
  18  22  37  56  68 109 103  77
  24  35  55  64  81 104 113  92
  49  64  78  87 103 121 120 101
  72  92  95  98 112 100 103  99

Quantization curve

• The quantizer is a staircase with steps at multiples of c: for $Z(u,v) = c$, $\hat{T}(u,v)$ assumes the integer value $k$ if and only if

$$kc - \frac{c}{2} \le T(u,v) < kc + \frac{c}{2}$$

• If $Z(u,v) > 2T(u,v)$, then $\hat{T}(u,v) = 0$

Bit rate question

• As $k$ increases, $\hat{T}(u,v)$ is represented by a longer variable-length code
• The number of bits used to represent $T$ is controlled by $c$
• Thus the elements of $Z$ can be scaled to achieve a variety of compression levels

VECTOR QUANTIZATION

DEFINING THE "VECTOR" IN VQ

• An $n = l \times m$ block of pixels is converted into an $n$-dimensional vector $X$ (see the sketch below)
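To make the vector definition concrete, here is a minimal NumPy sketch of the block-to-vector step; the function name and the 4x4 default block size are illustrative choices, not part of the original slides.

```python
import numpy as np

def image_to_vectors(img, l=4, m=4):
    """Split a grayscale image into non-overlapping l x m blocks and
    flatten each block into an n = l*m dimensional vector X."""
    h, w = img.shape
    h -= h % l                        # drop partial blocks at the edges
    w -= w % m
    return (img[:h, :w]
            .reshape(h // l, l, w // m, m)
            .swapaxes(1, 2)           # -> (block_row, block_col, l, m)
            .reshape(-1, l * m))      # one n-dimensional vector per block
```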
THEORETICAL BASIS

• There is an information-theoretic basis for vector quantization
• An extended source can be encoded at a rate arbitrarily close to the source entropy for large vector length $n$:

$$H(X) \le R \le H(X) + \frac{1}{n}$$

Achieving the limit

• Let each pixel be represented by $k$ bits. Then there are a total of $2^{nk}$ possible code vectors
• To transmit or store all such vectors requires $\log_2(2^{nk})/n = k$ bits per pixel, which puts us right back at $k$ bits per pixel
• Solution: a codebook

CODEBOOK CONCEPT

• The codebook approach is a many-to-one transformation of image vectors

MAPPING RULE

• $X$ is compared to a codebook $\{\hat{X}_i,\; i = 1,\ldots,N_c\}$
• $N_c$ is far smaller than the number of all possible image vectors
• The best-match codevector is chosen using a minimum distortion rule, i.e.

$$d(X, \hat{X}_k) \le d(X, \hat{X}_j), \quad j = 1,\ldots,N_c$$

DISTANCE MEASURE

• The distance measure is the Euclidean distance between two vectors:

$$d(X, \hat{X}) = \left[\frac{1}{n}\sum_{i=1}^{n}(x_i - \hat{x}_i)^2\right]^{1/2}$$
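The mapping rule and distance measure fit in a few lines. Here is a minimal sketch, with illustrative names; since the $1/n$ factor and the square root do not change which entry wins, it compares squared distances directly.

```python
import numpy as np

def nearest_codevector(x, codebook):
    """Minimum-distortion mapping rule: return the index k of the
    codevector closest to x in Euclidean distance.

    x        : (n,) image vector
    codebook : (Nc, n) array, one codevector per row
    """
    d2 = np.sum((codebook - x) ** 2, axis=1)  # squared distance to each entry
    return int(np.argmin(d2))                 # only this index is transmitted
```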
IMPACT ON BIT RATE

• Only the codebook index $K$ is transmitted, at a cost of $\log_2 N_c$ bits per vector
• The resulting bit rate of VQ is $R = \frac{\log_2 N_c}{n}$ bits/pixel
• With no VQ we would need $k$ bits/pixel

EXAMPLE: BINARY IMAGE

• An image vector $X$ is formed by grouping a 3x3 window: $X = [1\;0\;1\;0\;1\;1\;1\;0\;0]$
• There are a total of $2^9$ possible image vectors
• An example codebook:

  [1 1 0 1 0 1 1 0 1]
  [0 0 0 1 1 0 0 1 0]
  [1 0 1 0 1 0 1 1 1]
  [0 0 0 0 0 0 0 0 0]

• Which entry is closest to $X$? (The search sketch above applies directly)

COMPRESSION

• Straight encoding needs 1 bit/pixel
• Here we transmit only an index into the codebook, one of 4
• The effective rate is then $\log_2(N_c)/n = \log_2(4)/9 = 0.222$ bits/pixel

A NUMERICAL EXAMPLE

• We want to encode an 8-bit image using VQ at a desired rate of 1 bit/pixel (8:1 compression)
• If the vectors are formed as 2x2 blocks ($n = 4$) of image pixels and the codebook contains $N_c = 16$ codevectors,
• we will then have $R = \log_2(16)/4 = 1$ bit/pixel

INCREASING VECTOR SIZE

• Only 16 codevectors are available to represent all $256^4$ possible image vectors
• If the block size is increased to 4x4 ($n = 16$), the codebook has to grow to $2^{16}$ codevectors to keep the bit rate at 1.0 bit/pixel
• The number of possible image vectors for a 4x4 block is now $256^{16}$

CONCLUSION

• It appears that the ratio of codebook size to possible image vectors decreases, pointing to possibly large distortion
• It can be shown, however, that the mean square distortion actually goes down as the vector dimension increases. VQ at an effective rate of 1 bit/pixel outperforms straight 1 bit/pixel encoding

A PRACTICAL EXAMPLE

• The most commonly used test image is that of "Lena"

VQ PARAMETERS

• Original image: 8 bits/pixel
• Block size: 4x4 (16 pixels)
• Possible number of vectors: $256^{16} = 2^{128}$
• Codebook size: $2^{15}$

COMPRESSION

• The mean of each block is quantized to 8 bits, for a rate of 8/16 = 0.5 bits/pixel
• Each of the $2^{15}$ codevectors is represented by 15 bits
• This amounts to 15 bits per 16 pixels = 15/16 bits/pixel
• Total bit rate: 0.5 + 15/16 ≈ 1.438 bits/pixel

Predictive Compression

Using the past to predict the future

What is the idea?

• The familiar PCM encoding does A/D conversion, but very inefficiently. Voice, for example, is coded at 64 kb/s
• The reason is that samples are treated without regard to past sample values
• Therefore, huge redundancy is built into PCM

Differential PCM (DPCM)

• The concept of differential encoding is of great importance in communications
• The underlying idea is not to look at samples individually but to look at past values as well
• Often, samples change very little, so substantial compression can be achieved

Why differential?

• Let's say we have a DC signal and blindly go about PCM-encoding it. Is that smart?
• Clearly not. What we have failed to realize is that the samples don't change. We can send the first sample and tell the receiver that the rest are the same

Definition of differential encoding

• We can therefore say that in differential encoding, what is recorded and ultimately transmitted is the change in sample amplitudes, not their absolute values
• We should send only what is NEW

Where is the saving?

• Consider the following two situations. Left: raw samples 2, 2, 1.6, 1.6, 2, 2, 1.6, 2. Right: adjacent-sample differences 0.4, −0.8, −0.4, 0.8, 0, −0.4, 0.4, 0.8
• The differences on the right have a much smaller dynamic range, requiring fewer quantization levels

Implementation of DPCM: prediction

• At the heart of DPCM is the idea of prediction
• Based on the previous n−1 samples, the encoder generates an estimate of the nth sample. Since the nth sample is known, the prediction error can be found. This error is then transmitted

Illustrating prediction

• At the transmitter: the past samples (already sent) produce a prediction of the current sample; what remains is the prediction error, and only the prediction error is sent

What does the receiver do?

• The receiver has the identical prediction algorithm available to it. It has also received all previous samples, so it can make a prediction of its own
• The transmitter helps out by supplying the prediction error, which the receiver then uses to update the predicted value

Interesting speculation

• What if our power of prediction were perfect? In other words, what if we could predict the next sample with no error? What kind of communication system would we be looking at?

Prediction error

• Let $m(t)$ be the message and $T_s$ the sample interval. The prediction error is

$$e(nT_s) = m(nT_s) - \hat{m}(nT_s)$$

Prediction filter

• Prediction is normally done using a weighted sum of $N$ previous samples:

$$\hat{m}(nT_s) = \sum_{i=1}^{N} w_i\, m((n-i)T_s)$$

• The quality of prediction depends on a good choice of the weights $w_i$

Finding the optimum filter

• How do you find the "best" weights?
• Obviously, we need to minimize the prediction error. This is done statistically:

$$\min_{w}\; E\!\left[e^2(nT_s)\right]$$

• Choose the set of weights that gives the lowest prediction error on average

Prediction gain

• Prediction provides an SNR improvement by a factor called the prediction gain:

$$G_p = \frac{\sigma_M^2}{\sigma_e^2} = \frac{\text{message power}}{\text{prediction error power}}$$

How much gain?

• On average, this gain is about 4-11 dB
• Recall that 6 dB of SNR gain can be exchanged for 1 bit per sample
• At 8000 samples/s (for speech) we can save 1 to 2 bits per sample, saving 8-16 kb/s

DPCM encoder

• The input sample feeds a prediction-error quantizer followed by a prediction-error encoder; an N-tap predictor in the feedback loop forms the prediction, and the quantized error is added back to update it
• The prediction error is used to correct the estimate in time for the next round of prediction (see the sketch below)
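Below is a minimal sketch of this closed loop, assuming a one-tap predictor (developed in the next slides) and a uniform quantizer; the coefficient a = 0.8 and the step size are illustrative. The key point it demonstrates: the encoder predicts from its own reconstructed samples, exactly as the decoder will.

```python
import numpy as np

def dpcm_encode(x, a=0.8, step=0.1):
    """One-tap DPCM encoder with the quantizer inside the loop.
    Returns the quantized prediction errors the decoder needs."""
    x_rec = 0.0                          # decoder-matched reconstruction
    errors = []
    for sample in x:
        pred = a * x_rec                 # predict from reconstruction
        e = sample - pred                # prediction error
        e_q = step * np.round(e / step)  # quantize only the error
        errors.append(e_q)
        x_rec = pred + e_q               # update just as the decoder will
    return np.array(errors)

def dpcm_decode(errors, a=0.8):
    """Mirror of the encoder's feedback loop at the receiver."""
    x_rec, out = 0.0, []
    for e_q in errors:
        x_rec = a * x_rec + e_q          # prediction + received error
        out.append(x_rec)
    return np.array(out)
```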
Closer look at the prediction engine

• Two questions need to be addressed: 1) how many past samples to use, and 2) what are the best weights?
• Notation:
  – $x(n)$: sample to be estimated
  – $x(n-1), x(n-2), \ldots$: previous samples
  – $\hat{x}(n)$: estimated sample
  – $d(n)$: prediction error $= x(n) - \hat{x}(n)$

Encoder diagram

• The input $x(n)$ produces the error $d(n)$, which is quantized to $\tilde{d}(n)$; an N-tap predictor driven by the reconstructed samples $\tilde{x}(n)$ supplies the estimate $\hat{x}(n)$

DPCM decoder

• Using the same predictor, the decoder produces an estimate of $x(n)$, then uses the received prediction error $\hat{d}(n)$ to update its estimate to $\tilde{x}(n)$

One-tap prediction

• This is the simplest predictor. We use the current sample to predict the next one:

$$\hat{x}(n \mid n-1) = a\, x(n-1 \mid n-1)$$

• Notation: $\hat{x}(n \mid m)$ is the estimate of the nth sample given samples collected up through time $m$
• What is $a$?

Minimizing prediction error

• Pick $a$ so that the prediction error is minimized:

$$d(n) = x(n) - \hat{x}(n) = x(n) - a\, x(n-1 \mid n-1)$$

• We must minimize the power of the prediction error by choosing the right multiplier:

$$E[d^2(n)] = E\!\left[\left(x(n) - a\, x(n-1 \mid n-1)\right)^2\right] = R_x(0) - 2a R_x(1) + a^2 R_x(0)$$

• where $R_x(0)$ is the signal power and $R_x(1)$ the one-sample correlation

Finding a

• Take the derivative of the error with respect to $a$ and set it to zero. The optimum is

$$a = \frac{R_x(1)}{R_x(0)} = \frac{\text{one-sample correlation}}{\text{signal power}}$$

• The resulting minimum error is

$$R_x(0)\left[1 - \left(\frac{R_x(1)}{R_x(0)}\right)^2\right]$$

Example

• Clearly we need source statistics to perform DPCM. Suppose there is 80% sample correlation: $R_x(1)/R_x(0) = 0.8$
• Then $a = 0.8$ and $\hat{x}(n \mid n-1) = 0.8\, x(n-1 \mid n-1)$

N-tap prediction

• N consecutive samples, spaced $T$ seconds apart, are delayed, weighted by $a_1, \ldots, a_N$, and summed:

$$\hat{x}(n) = \sum_{j=1}^{N} a_j\, x(n-j) = a^T X$$

Finding a

• Take N derivatives of the error and set them to zero. The result is the matrix equation $R_x a = r_x$:

$$\begin{bmatrix} R_x(0) & R_x(1) & R_x(2) & \cdots & R_x(N-1) \\ R_x(1) & R_x(0) & R_x(1) & \cdots & R_x(N-2) \\ R_x(2) & R_x(1) & R_x(0) & \cdots & R_x(N-3) \\ \vdots & & & \ddots & \vdots \\ R_x(N-1) & R_x(N-2) & R_x(N-3) & \cdots & R_x(0) \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_N \end{bmatrix} = \begin{bmatrix} R_x(1) \\ R_x(2) \\ R_x(3) \\ \vdots \\ R_x(N) \end{bmatrix}$$

• so that $a = R_x^{-1} r_x$, where $r_x = [R_x(1), \ldots, R_x(N)]^T$ (see the sketch below)
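A small sketch of solving these normal equations numerically, given autocorrelation estimates; the function name is illustrative.

```python
import numpy as np

def optimal_weights(r):
    """Solve R_x a = r_x for the N-tap predictor weights.

    r : [R_x(0), R_x(1), ..., R_x(N)] autocorrelation estimates
    """
    N = len(r) - 1
    # N x N Toeplitz autocorrelation matrix: R[i, j] = R_x(|i - j|)
    R = np.array([[r[abs(i - j)] for j in range(N)] for i in range(N)])
    rhs = np.array(r[1:N + 1])           # r_x = [R_x(1), ..., R_x(N)]
    return np.linalg.solve(R, rhs)

# One-tap sanity check: a = R_x(1)/R_x(0) = 0.8, as in the example above
print(optimal_weights([1.0, 0.8]))       # -> [0.8]
```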
DPCM for images

• For images we have to do 2D prediction. We also need to know the 2D correlation function. A bit more involved!
• We want to predict the center pixel from its neighboring values

Special case

• For a 2D Markov source, the autocorrelation function at shift $(i,j)$ is

$$R(i,j) = E[f(x,y)\, f(x-i, y-j)] = \sigma^2 \rho_x^{|i|} \rho_y^{|j|}$$

Prediction for a Markov source

• It can be shown that the optimal 2D linear predictor is

$$\hat{f}(x,y) = \alpha_1 f(x, y-1) + \alpha_2 f(x-1, y-1) + \alpha_3 f(x-1, y) + \alpha_4 f(x-1, y+1)$$

• where $\alpha_1 = \rho_x$ (the horizontal correlation), $\alpha_2 = -\rho_x \rho_y$, $\alpha_3 = \rho_y$, and $\alpha_4 = 0$

JPEG

JPEG standard

• The concept behind JPEG had been known for at least 30 years before it became a widely adopted standard in the early '90s
• JPEG is a block transform compression method that uses the DCT
• Of all practical transforms, the DCT has the best energy compaction property

Elements of JPEG

• JPEG closely follows DCT transform coding with a few caveats:
  – 8x8 DCT block transform
  – Perceptual masking
  – Zigzag ordering
  – Transform coefficient quantization
  – Run-length encoding
  – Variable-length encoding

8x8 block DCT

• The image is subdivided into non-overlapping 8x8 blocks
• Pixels are level shifted by subtracting $2^{n-1}$, where $2^n$ is the maximum number of gray levels

Quantization

• The standard quantization table is applied to each 8x8 DCT block:

$$\hat{T}(u,v) = \mathrm{round}\left[\frac{T(u,v)}{Z(u,v)}\right]$$

• $Z(u,v)$ is the same luminance normalization array shown earlier:

  16  11  10  16  24  40  51  61
  12  12  14  19  26  58  60  55
  14  13  16  24  40  57  69  56
  14  17  22  29  51  87  80  62
  18  22  37  56  68 109 103  77
  24  35  55  64  81 104 113  92
  49  64  78  87 103 121 120 101
  72  92  95  98 112 100 103  99

Re-ordering

• DCT coefficients are reordered along the zigzag pattern shown earlier (0, 1, 5, 6, 14, ... reading across the top row)

Coding

• Zigzag scanning after quantization ensures long runs of zero-valued AC coefficients (indicating lack of high-frequency content)
• The DC coefficient is differentially encoded
• Non-zero AC coefficients are variable-length (Huffman) encoded

JPEG example

• Consider the following hypothetical 8x8 block:

  52  55  61  66  70  61  64  73
  63  59  66  90 109  85  69  72
  62  59  68 113 144 104  66  73
  63  58  71 122 154 106  70  69
  67  61  68 104 126  88  68  70
  79  65  60  70  77  68  58  75
  85  71  64  59  55  61  65  83
  87  79  69  68  65  76  78  94

Level shifting

• The coding process begins by level shifting the pixels by −128 gray levels:

  -76 -73 -67 -62 -58 -67 -64 -55
  -65 -69 -62 -38 -19 -43 -59 -56
  -66 -69 -60 -15  16 -24 -62 -55
  -65 -70 -57  -6  26 -22 -58 -59
  -61 -67 -60 -24  -2 -40 -60 -58
  -49 -63 -68 -58 -51 -60 -70 -53
  -43 -57 -64 -69 -73 -67 -63 -45
  -41 -49 -59 -60 -63 -52 -50 -34

Do the DCT

• Here are the transform coefficients (rounded):

  -415  -29  -62   25   55  -20   -1    3
     7  -21  -62    9   11   -7   -6    6
   -46    8   77  -25  -30   10    7   -5
   -50   13   35  -15   -9    6    0    3
    11   -8  -13   -2   -1    1   -4    1
   -10    1    3   -3   -1    0    2   -1
    -4   -1    2   -1    2   -3    1   -2
    -1   -1   -1   -2   -1   -1    0   -1

Quantization, scaling and truncation

• Dividing by the normalization array and rounding gives the truncated coefficients:

  -26  -3  -6   2   2   0   0   0
    1  -2  -4   0   0   0   0   0
   -3   1   5  -1  -1   0   0   0
   -4   1   2  -1   0   0   0   0
    1   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0

Zigzag ordering

• We then form a 1D sequence according to the zigzag pattern:

  [-26 -3 1 -3 -2 -6 2 -4 1 -4 1 1 5 0 2 0 0 -1 2 0 0 0 0 0 -1 -1 EOB]

• Anything after EOB is taken as zero
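The steps so far (level shift, DCT, quantization, zigzag scan) fit in a short sketch. This assumes the orthonormal DCT-II; the helper and variable names are illustrative. Run on the example block above, it should reproduce essentially the zigzag sequence just shown, up to the rounding convention at exact half-integers.

```python
import numpy as np

# Orthonormal 8x8 DCT-II matrix: the 2D DCT of a block B is C @ B @ C.T
k = np.arange(8)
C = np.sqrt(2.0 / 8.0) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / 16)
C[0, :] = 1.0 / np.sqrt(8.0)

# JPEG luminance normalization array Z(u,v), as on the slides
Z = np.array([[16, 11, 10, 16, 24, 40, 51, 61],
              [12, 12, 14, 19, 26, 58, 60, 55],
              [14, 13, 16, 24, 40, 57, 69, 56],
              [14, 17, 22, 29, 51, 87, 80, 62],
              [18, 22, 37, 56, 68, 109, 103, 77],
              [24, 35, 55, 64, 81, 104, 113, 92],
              [49, 64, 78, 87, 103, 121, 120, 101],
              [72, 92, 95, 98, 112, 100, 103, 99]])

# Zigzag scan order: walk the anti-diagonals, alternating direction
ZIGZAG = sorted(((u, v) for u in range(8) for v in range(8)),
                key=lambda p: (p[0] + p[1],
                               p[0] if (p[0] + p[1]) % 2 else p[1]))

def jpeg_forward(block):
    """Level shift, 2D DCT, quantize, and zigzag-scan one 8x8 block."""
    T = C @ (block.astype(float) - 128.0) @ C.T   # level shift + DCT
    T_hat = np.round(T / Z).astype(int)           # quantization
    return [T_hat[u, v] for u, v in ZIGZAG]       # 1D zigzag sequence
```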
Coding the DC coefficient

• The DC coefficient is differentially encoded relative to the DC coefficient of the previous block
• Let the block to the immediate left have a DC coefficient of −17
• The differential is [−26 − (−17)] = −9
• What do we do with this now?

JPEG coefficient coding table

  Range                      DC difference category   AC category
  0                          0                        N/A
  -1, 1                      1                        1
  -3, -2, 2, 3               2                        2
  -7,..., -4, 4,..., 7       3                        3
  -15,..., -8, 8,..., 15     4                        4
  -31,..., -16, 16,..., 31   5                        5
  -63,..., -32, 32,..., 63   6                        6

Huffman coding of categories

• The DC block differential was −9. This lies in difference category 4. But where do we go from here?
• Difference categories are variable-length Huffman coded as shown in the next table

Huffman coding of categories

  Category   Base code   Length      Category   Base code    Length
  0          010         3           6          1110         10
  1          011         4           7          11110        12
  2          100         5           8          111110       14
  3          00          5           9          1111110      16
  4          101         7           A          11111110     18
  5          110         8           B          111111110    20

• Length is the final length in bits of each coded category (base code plus appended bits)

Huffman code of the DC block differential

• We had −9. This value belongs to category 4, giving rise to the base code 101
• The remaining 4 bits come from the least significant bits (LSBs) of the difference value
• For category k, an additional k bits are appended; for a negative difference they are the complement of the magnitude bits. In this case 9 = 1001, complemented to 0110
• The full DPCM-coded DC block is 1010110

AC coefficients code table

• Each AC coefficient's code depends on the number of zero-valued coefficients preceding the non-zero one (the run):

  Run/Category   Base code            Length
  0/0 (EOB)      1010                 4
  0/1            00                   3
  0/2            01                   4
  0/3            100                  6
  0/4            1011                 8
  0/5            11010                10
  ...            ...                  ...
  0/A            1111111110000011     26
  1/1            1100                 5
  1/2            111001               8
  1/3            1111001              10
  ...            ...                  ...
  1/A            1111111110001000     26

• Example: the first AC coefficient, −3, has no zero-valued coefficients preceding it and lies in category 2, so it is coded as 0100; the first 2 bits (01) are the run/category base code, and the last two are found the same way as for the DC differential

Final JPEG bitstream

• For the 8x8 block, the following represents its JPEG bitstream:

  [1010110 0100 001 0100 0101 100001 0110 100011 001 100011 001 001 100101 11100110 110110 0110 11110100 000 1010]

• Total number of bits: 92
• Original block: 512 bits
• Compression ratio: 512/92 ≈ 5.6:1

Decoding JPEG

• Via a table-lookup operation, the quantized coefficient matrix is quickly recreated
• Denormalization is done by simply multiplying the quantized matrix by the normalization matrix:

$$\tilde{T}(u,v) = \hat{T}(u,v)\, Z(u,v)$$

• Note that $\tilde{T}(u,v)$ is not the same as the original $T(u,v)$

Recovered coefficients

• Original (left) vs. recovered (right):

  -415  -29  -62   25   55  -20   -1    3        -416  -33  -60   32   48    0    0    0
     7  -21  -62    9   11   -7   -6    6          12  -24  -56    0    0    0    0    0
   -46    8   77  -25  -30   10    7   -5         -42   13   80  -24  -40    0    0    0
   -50   13   35  -15   -9    6    0    3         -56   17   44  -29    0    0    0    0
    11   -8  -13   -2   -1    1   -4    1          18    0    0    0    0    0    0    0
   -10    1    3   -3   -1    0    2   -1           0    0    0    0    0    0    0    0
    -4   -1    2   -1    2   -3    1   -2           0    0    0    0    0    0    0    0
    -1   -1   -1   -2   -1   -1    0   -1           0    0    0    0    0    0    0    0

Recovering the block

• Inverse transforming and shifting back by +128 gives the decoded block. Original (left) vs. JPEG (right):

  52  55  61  66  70  61  64  73        58  64  67  64  59  62  70  78
  63  59  66  90 109  85  69  72        56  55  67  89  98  88  74  69
  62  59  68 113 144 104  66  73        60  50  70 119 141 116  80  64
  63  58  71 122 154 106  70  69        69  51  71 128 149 115  77  68
  67  61  68 104 126  88  68  70        74  53  64 105 115  84  65  72
  79  65  60  70  77  68  58  75        76  57  56  74  75  57  57  74
  85  71  64  59  55  61  65  83        83  69  59  60  61  61  67  78
  87  79  69  68  65  76  78  94        93  81  67  62  69  80  84  84

Color JPEG

• A color image can be represented in a variety of spaces, including RGB and YUV
  – An RGB image consists of 3 primary color channels: red, green, blue
  – The YUV color model (closely related to YIQ and YCbCr) consists of a grayscale intensity component Y and two chrominance channels

RGB planes

• [Figure: the red, green, and blue planes of a color image]

Another look

• Recording the color image in gray using red, green, and blue filters

YUV model

• [Figure: the Y, U, and V planes of the same image]
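As a small illustration of the RGB-to-YCbCr conversion behind the slides' YUV model, here is a sketch using the BT.601/JFIF constants; the function name is illustrative and this is not from the original slides.

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Convert an (h, w, 3) RGB array to Y, Cb, Cr planes
    using the ITU-R BT.601 / JFIF constants."""
    r = rgb[..., 0].astype(float)
    g = rgb[..., 1].astype(float)
    b = rgb[..., 2].astype(float)
    y  = 0.299 * r + 0.587 * g + 0.114 * b          # luminance (grayscale)
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b   # blue chrominance
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b   # red chrominance
    return y, cb, cr
```

JPEG then compresses the two chrominance planes far more aggressively than Y, as the next quantization table shows.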
Color quantization table

• Color bands are much more aggressively compressed than luminance. Here are the quantization tables:

  Luminance table:                      Chrominance table:
  16  11  10  16  24  40  51  61        17  18  24  47  99  99  99  99
  12  12  14  19  26  58  60  55        18  21  26  66  99  99  99  99
  14  13  16  24  40  57  69  56        24  26  56  99  99  99  99  99
  14  17  22  29  51  87  80  62        47  66  99  99  99  99  99  99
  18  22  37  56  68 109 103  77        99  99  99  99  99  99  99  99
  24  35  55  64  81 104 113  92        99  99  99  99  99  99  99  99
  49  64  78  87 103 121 120 101        99  99  99  99  99  99  99  99
  72  92  95  98 112 100 103  99        99  99  99  99  99  99  99  99

Examples of JPEG

• Original: 84,197 bytes
• JPEG at 75% quality: 36,401 bytes
• JPEG at 50% quality: 22,112 bytes
• JPEG at 25% quality: 13,907 bytes
• JPEG at 15% quality: 9,973 bytes
• JPEG at 5% quality: 5,662 bytes

Foundation for JPEG2000

Compression at multiples of scale