Why do we need compression? Why do we need image compression?
Example: a 4-Mpixel digital camera. Raw data at 24 bits/pixel for roughly 5.3M pixels is about 16 MB per image, so a 4 GB memory card ($10-30) holds about 250 raw pictures. Passing each raw image (16 MB) through a JPEG encoder yields a compressed JPEG file of about 1 MB (compression ratio = 16), so the same card holds about 4000 pictures.

Roadmap to Image Coding
• Introduction to data compression
– A modeling perspective
– Shannon's entropy and rate-distortion theory* (skipped)
– Arithmetic coding and context modeling
• Lossless image compression (covered in EE465)
– Spatially adaptive prediction algorithms
• Lossy image compression
– Before the EZW era: first-generation wavelet coders
– After the EZW era: second-generation wavelet coders
– A quick tour of JPEG2000
• New directions in image coding

Modeler's View on Image Coding
Spatial-domain models:
– Stationary process → conventional predictors (MED, GAP)
– Non-stationary process → least-square-based edge-directed prediction
– Nonparametric (patch-based) → intra coding in H.264
Transform-domain models:
– Stationary GGD → first-generation wavelet coders
– Non-stationary GGD → second-generation wavelet coders
– Patch-based transform models → next-generation coders

Two Regimes
• Lossless coding
– No distortion is tolerable
– The decoded signal is mathematically identical to the encoded one
• Lossy coding
– Distortion is allowed for the purpose of achieving a higher compression ratio
– The decoded signal should be perceptually similar to the encoded one

Data Compression Basics
Discrete source: X is a discrete random variable taking values x ∈ {1, 2, ..., N}, with p_i = Prob(x = i), i = 1, 2, ..., N, and \sum_{i=1}^{N} p_i = 1.
Shannon's source entropy formula:
H(X) = -\sum_{i=1}^{N} p_i \log_2 p_i (bits/sample, or bps),
where the probabilities p_i act as weighting coefficients.

Code Redundancy
Redundancy is the gap between practical performance and the theoretical bound:
r = \bar{l} - H(X) \ge 0,
where the average code length is \bar{l} = \sum_{i=1}^{N} p_i l_i, the entropy is H(X) = -\sum_{i=1}^{N} p_i \log_2 p_i, and l_i is the length of the codeword assigned to the i-th symbol.
Note: if we represent each symbol by q bits (fixed-length codes), then the redundancy is simply q - H(X) bps.

How to achieve the source entropy?
discrete source X → entropy coding (driven by the probability model P(X)) → binary bit stream
Note: the entropy-coding problem above rests on the simplifying assumptions that the discrete source X is memoryless and that P(X) is completely known. These assumptions often do not hold for real-world data such as images; we will return to them later.

Two Goals of VLC Design
• Achieve optimal code length (i.e., minimal redundancy)
– For an event x with probability p(x), the optimal code length is \lceil -\log_2 p(x) \rceil bits, where \lceil x \rceil denotes the smallest integer not less than x (e.g., \lceil 3.4 \rceil = 4)
– Code redundancy: r = \bar{l} - H(X) \ge 0; unless the probabilities of all events are powers of 2, we usually have r > 0
• Satisfy the prefix condition
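To make the entropy, optimal code length, and redundancy definitions concrete, here is a minimal sketch (not from the slides); the probabilities and helper names are illustrative assumptions only.

```python
import math

def entropy(probs):
    """Shannon entropy H(X) = -sum_i p_i * log2(p_i), in bits per sample."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def redundancy(probs, lengths):
    """Code redundancy r = (average code length) - H(X) >= 0."""
    avg_len = sum(p * l for p, l in zip(probs, lengths))
    return avg_len - entropy(probs)

# Illustrative source (probabilities chosen arbitrarily, not taken from the slides)
probs = [0.5, 0.25, 0.125, 0.125]                     # all powers of 2, so r = 0 is reachable
lengths = [math.ceil(-math.log2(p)) for p in probs]   # optimal lengths ceil(-log2 p(x))
print(entropy(probs))                                 # 1.75 bits/sample
print(lengths, redundancy(probs, lengths))            # [1, 2, 3, 3], redundancy 0.0

# Fixed-length (q-bit) code: redundancy is simply q - H(X)
q = math.ceil(math.log2(len(probs)))
print(q - entropy(probs))                             # 0.25 bps
```

When the probabilities are not powers of 2, the same computation returns a strictly positive redundancy, which is the motivation for arithmetic coding later in the section.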
Prefix Condition
No codeword is allowed to be the prefix of any other codeword.
(Figure: two candidate binary codes, labeled codeword set 1 and codeword set 2, drawn on binary code trees to illustrate whether the prefix condition holds.)

Huffman Codes (Huffman, 1952)
• Coding procedure for an N-symbol source
– Source reduction
• List all probabilities in descending order
• Merge the two symbols with the smallest probabilities into a new compound symbol
• Repeat the above two steps N - 2 times
– Codeword assignment
• Start from the smallest (fully reduced) source and work back to the original source
• Each merging point corresponds to a node in the binary codeword tree

A Toy Example
Symbols and probabilities: e (0.4), a (0.2), i (0.2), o (0.1), u (0.1).
Source reduction (forming compound symbols): merge o and u into (ou) with probability 0.2; merge i and (ou) into (iou) with probability 0.4; merge a and (iou) into (aiou) with probability 0.6; finally merge e and (aiou).
Codeword assignment (working back from the last merge): e = 1, (aiou) = 0; a = 01, (iou) = 00; i = 000, (ou) = 001; o = 0010, u = 0011.

Arithmetic Coding
• One of the major milestones in data compression (just like the Lempel-Ziv coding used in WinZip)
• The building block of almost all existing compression algorithms, including text, audio, image, and video
• A remarkably simple idea that is easy to implement (especially computationally efficient in the special case of binary arithmetic coding)

Basic Idea
• The input sequence is mapped to a unique real number in [0, 1]
– The more symbols are coded, the smaller this interval becomes (and therefore the more bits it takes to represent the interval)
– The size of the interval is proportional to the probability of the whole sequence
• Note that we still assume the source X is memoryless; sources with memory will be handled by the context-modeling techniques discussed next

Example
Alphabet: {E, Q, S, U, Z}. Input sequence: SQUEEZ...
P(E) = 0.429, P(Q) = 0.142, P(S) = 0.143, P(U) = 0.143, P(Z) = 0.143
P(SQUEEZ) = P(S) P(Q) P(U) P(E)^2 P(Z)

Example (cont'd)
The symbol sequence SQUEEZ... is mapped to the interval [0.64769, 0.64777]. Mapping a real number to a binary bit stream is easy: the first bit splits [0, 1] in half, the second bit splits the chosen half again, and so on. Any number between 0.64769 and 0.64777 will decode to a sequence starting with SQUEEZ.
How do we know when the sequence stops? That is, how can the decoder distinguish between SQUEEZ and SQUEEZE?

Another Example
Solution: use a special symbol to denote the end of block (EOB). For example, if we use "!" as the EOB symbol, "eaii" becomes "eaii!". In other words, we assign a nonzero probability to the EOB symbol.

Implementation Issues
Witten, I. H., Neal, R. M., and Cleary, J. G., "Arithmetic coding for data compression," Communications of the ACM, 30(6), June 1987, pp. 520-540.
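The interval-narrowing idea can be sketched in a few lines of code. This is a minimal floating-point illustration, not the integer-arithmetic implementation of Witten, Neal, and Cleary; it assumes the symbols partition [0, 1) in alphabetical order and treats the slide's rounded probabilities as the exact fractions 3/7 and 1/7, under which the computed interval lands on the one quoted above.

```python
# Minimal sketch of arithmetic encoding by interval narrowing (assumptions:
# alphabetical ordering of sub-intervals; exact probabilities 3/7 and 1/7).
# Practical coders use integer arithmetic with renormalization (Witten et al., 1987).
from fractions import Fraction

probs = {'E': Fraction(3, 7), 'Q': Fraction(1, 7), 'S': Fraction(1, 7),
         'U': Fraction(1, 7), 'Z': Fraction(1, 7)}

# Cumulative lower bound of each symbol's sub-interval of [0, 1)
cum, c = {}, Fraction(0)
for s in sorted(probs):
    cum[s] = c
    c += probs[s]

def encode_interval(sequence):
    """Return the interval [low, low + width) that represents the whole sequence."""
    low, width = Fraction(0), Fraction(1)
    for s in sequence:
        low += width * cum[s]      # shift into the symbol's sub-interval
        width *= probs[s]          # shrink by the symbol's probability
    return low, width

low, width = encode_interval("SQUEEZ")
print(float(low), float(low + width))   # ~0.64769 and ~0.64777, matching the slide
```

Any number inside the final interval identifies the sequence, and the interval width equals P(SQUEEZ), which is why longer (less probable) sequences need more bits.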
Arithmetic Coding Summary
• Based on the given probability model P(X), arithmetic coding maps the symbol sequence to a unique number between 0 and 1, which can then be conveniently represented by binary bits
• You will compare Huffman coding and arithmetic coding in your homework and learn how to use arithmetic coding in the computer assignment

Context Modeling
• Arithmetic coding (entropy coding) solves the problem under the assumption that P(X) is known
• In practice, we do not know P(X) and have to estimate it from the data
• More importantly, the memoryless assumption on the source X does not hold for real-world data

Probability Estimation Problem
Given a sequence of symbols, how do we estimate the probability of each individual symbol?
Forward solution: the encoder counts the frequency of each symbol over the whole sequence and transmits the frequency table to the decoder as overhead.
Backward solution (more popular in practice): both encoder and decoder count the frequency of each symbol on the fly from the causal past only, so no overhead is needed.

Examples
For simplicity, we consider a binary symbol sequence (the M-ary case is conceptually similar):
S = {0,0,0,0,0,0,1,0,0,0,0,0,0,1,1,1}
Forward approach: count 4 "1"s and 12 "0"s, so P(0) = 3/4, P(1) = 1/4.
Backward approach (counts N(0), N(1) initialized to 1 and updated after each symbol):
step      P(0)  P(1)  N(0)  N(1)
start     1/2   1/2   1     1
after 0   2/3   1/3   2     1
after 0   3/4   1/4   3     1
after 0   4/5   1/5   4     1
...

Backward Adaptive Estimation
The probability estimate is based on the causal past within a specified window of length T (i.e., the source is assumed to be Markovian). Such adaptive estimation is particularly effective for handling sequences with dynamically varying statistics.
Example: 0,0,0,0,0,0,1,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,1,1,0,0,0,0; counting over a sliding window of length T gives estimates such as P(0) = 0.6, P(1) = 0.4.

Now Comes Context
• Importance of context
– Context is a fundamental concept that helps us resolve ambiguity
– The best-known example: by quoting Darwin "out of context," creationists attempt to convince their followers that Darwin did not believe the eye could evolve by natural selection
• Why do we need context?
– To handle the memory in the source
– Context-based modeling often leads to better estimation of probability models

Order of Context
"q u o t e": first-order context; note that P(u) << P(u|q).
"s h o c k w a v e": second-order context.
Context dilution problem: if source X has N different symbols, K-th-order context modeling defines N^K different contexts (e.g., consider N = 256 for images).

Context-Adaptive Probability Estimation
Rule of thumb: when estimating probabilities, only symbols sharing the same context are used in counting frequencies.
1D example: 0,1,0,1,0,1,0,1, 0,1,0,1,0,1,0,1, 0,1,0,1,0,1,0,1
Zero-order (no) context: P(0) = P(1) ≈ 1/2
First-order context: P(1|0) = P(0|1) = 1, P(0|0) = P(1|1) = 0
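As a concrete illustration of backward-adaptive, context-conditioned counting, here is a minimal sketch; the function name and the choice of a previous-symbol context are assumptions for illustration. Counts for each context start at 1, matching the table on the Examples slide, so encoder and decoder derive identical estimates from the causal past without any transmitted side information.

```python
from fractions import Fraction

def adaptive_estimates(sequence, order=1):
    """Backward-adaptive probability estimation for a binary sequence.

    Counts per context start at (1, 1), as in the slide's table; the estimate
    used for each symbol depends only on the causal past, so both encoder and
    decoder can reproduce it with no overhead.
    """
    counts = {}                       # context -> [N(0), N(1)]
    estimates = []
    for i, s in enumerate(sequence):
        ctx = tuple(sequence[max(0, i - order):i])   # previous `order` symbols
        n0, n1 = counts.get(ctx, [1, 1])
        estimates.append((ctx, Fraction(n0, n0 + n1), Fraction(n1, n0 + n1)))
        counts[ctx] = [n0 + (s == 0), n1 + (s == 1)]
    return estimates

# Zero-order context on the slide's sequence reproduces the 1/2, 2/3, 3/4, 4/5 table
seq = [0,0,0,0,0,0,1,0,0,0,0,0,0,1,1,1]
for ctx, p0, p1 in adaptive_estimates(seq, order=0)[:4]:
    print(p0, p1)          # P(0): 1/2, 2/3, 3/4, 4/5

# First-order context on the alternating 1D example drives P(1|0) toward 1
alt = [0, 1] * 12
print(adaptive_estimates(alt, order=1)[-1])   # context (0,): P(1|0) close to 1
```

The same scheme extends to 2D images by using causal neighbors (W, N, NW, NE) as the context, as in the next example.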
2D Example (Binary Image)
Consider the 6×6 binary image
000000
011110
011110
011110
011110
000000
Zero-order context: P(0) = 5/9, P(1) = 4/9.
First-order context (W denotes the west neighbor of the current pixel X), P(X|W):
W = 0 (20 occurrences): P(0|0) = 4/5, P(1|0) = 1/5
W = 1 (16 occurrences): P(0|1) = 1/4, P(1|1) = 3/4
Fourth-order context (NW, N, NE, W neighbors of X): P(1|1111) = 1.

Data Compression Summary
• Entropy coding is solved by arithmetic coding techniques
• Context plays an important role in the statistical modeling of sources with memory (there is a context-dilution problem, which can be handled by quantizing the context information)
• Quantization of a memoryless source is solved by the Lloyd-Max algorithm

Quantization Theory (Rate-Distortion Theory)
A quantizer Q maps x to \hat{x}; the quantization noise is e = x - \hat{x}.
For a continuous random variable with probability density function f(x), the distortion is defined as
D = \int f(x) (x - \hat{x})^2 dx.
For a discrete random variable with probabilities p_i, the distortion is
D = \sum_{i=1}^{N} p_i (x_i - \hat{x}_i)^2.

Recall: Quantization Noise of UQ
For a uniform quantizer (UQ) with step size Δ applied to a uniform source on [-A, A], the quantization noise is also uniformly distributed: e ~ U[-Δ/2, Δ/2] with density f(e) = 1/Δ. Recall that the variance of U[-Δ/2, Δ/2] is σ² = Δ²/12.

6 dB/bit Rule of UQ
Signal: X ~ U[-A, A], so σ_s² = (2A)²/12 = A²/3.
Noise: e ~ U[-Δ/2, Δ/2], so σ_e² = Δ²/12.
Choosing N = 2^n codewords (an n-bit quantizer) for X gives the quantization step size Δ = 2A/N, so
SNR = 10 log_{10}(σ_s²/σ_e²) = 10 log_{10}((A²/3)/(Δ²/12)) = 20 log_{10} N ≈ 6.02 n dB.

Shannon's R-D Function
The rate-distortion function of a source X determines the optimal tradeoff between rate R and distortion D:
R(D) = min R subject to E[(X - \hat{X})²] ≤ D.

A Few Cautious Notes Regarding Distortion
• Unlike rate (how many bits are used?), the definition of distortion is far from trivial
• Mean squared error (MSE) is widely used and will be our focus in this class
• However, for image signals MSE correlates poorly with subjective quality (the design of perceptual image coders is a very interesting research problem that remains largely open)

Gaussian Random Variable
Let X be a random variable with Gaussian distribution N(0, σ²). Its rate-distortion function is known in closed form:
R(D) = (1/2) log_2(σ²/D) for 0 ≤ D ≤ σ², and R(D) = 0 for D > σ²,
or equivalently D(R) = 2^{-2R} σ².

Quantizer Design Problem
For a memoryless source X with pdf P(X), how do we design a quantizer (i.e., where do we place the L = 2^K codewords) to minimize the distortion?
Solution: the Lloyd-Max algorithm, which minimizes the MSE (we will study it in detail on the blackboard).

Rate Allocation Problem*
(Figure: a one-level wavelet decomposition into the LL, HL, LH, and HH subbands.) Given a budget of R bits, how should we allocate them to the subbands to minimize the overall MSE distortion?
Solution: the Lagrangian multiplier technique (we will study it in detail on the blackboard).
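As a rough illustration of what the Lagrangian condition yields, here is a sketch of the classical closed-form allocation under a high-rate Gaussian subband model with D_i = σ_i² 2^(-2R_i); the subband variances are made-up numbers, and this simple formula ignores the non-negativity constraint on the rates.

```python
import math

def allocate_rates(variances, avg_rate):
    """High-rate Gaussian rate allocation via the Lagrangian (equal-slope) condition.

    Model: D_i = sigma_i^2 * 2^(-2 R_i); minimize average distortion subject to
    (1/M) * sum(R_i) = avg_rate.  The optimum equalizes distortion across bands:
        R_i = avg_rate + 0.5 * log2(sigma_i^2 / geometric_mean_of_variances).
    This closed form can return negative rates for very weak bands; a full solution
    would clip those to zero and re-solve (reverse water-filling).
    """
    geo_mean = math.exp(sum(math.log(v) for v in variances) / len(variances))
    return [avg_rate + 0.5 * math.log2(v / geo_mean) for v in variances]

# Hypothetical subband variances (LL, HL, LH, HH) and a budget of 2 bits/sample
variances = [100.0, 16.0, 16.0, 4.0]
rates = allocate_rates(variances, avg_rate=2.0)
print([round(r, 2) for r in rates])   # more bits go to the high-variance (LL) band
print(sum(rates) / len(rates))        # average rate equals the budget: 2.0
```

The next slide explains why this textbook solution is of limited use in practice: real image subbands are neither Gaussian nor stationary, so their statistics are not known in this convenient form.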
Gap Between Theory and Practice
• Information-theoretic results offer little direct help in the practice of data compression
• What is the entropy of English text, audio, speech, or images?
– Curse of dimensionality
• Without exact knowledge of the subband statistics, how can we solve the rate allocation problem?
– Image subbands are nonstationary and non-Gaussian
• What class of image data do we want to model in the first place?
– Importance of understanding the physical origin of the data and its implications for compression

EE565 Advanced Image Processing, Copyright Xin Li 2009-2012