Context-based Data Compression
Xiaolin Wu
Polytechnic University, Brooklyn, NY

Part 3. Context modeling

Context model: estimated symbol probability
- Variable-length coding schemes need an estimate of the probability of each symbol, i.e., a model.
- The model can be:
  - Static: a fixed global model for all inputs (e.g., English text).
  - Semi-adaptive: computed for the specific data being coded and transmitted as side information (e.g., C programs).
  - Adaptive: constructed on the fly (any source!).

Adaptive vs. Semi-adaptive
- Advantage of semi-adaptive: a simple decoder.
- Disadvantages of semi-adaptive: the overhead of specifying the model can be high, and two passes over the data are required.
- Advantages of adaptive: one-pass, universal, and as good if not better.
- Disadvantages of adaptive: the decoder is as complex as the encoder, and errors propagate.

Adaptation with Arithmetic and Huffman Coding
- Huffman coding: manipulate the Huffman tree on the fly. Efficient algorithms are known, but they remain complex.
- Arithmetic coding: update the cumulative probability distribution table. An efficient data structure and algorithm are known; the rest stays essentially the same.
- The main advantage of arithmetic coding over Huffman coding is the ease with which the former can be used in conjunction with adaptive modeling techniques.

Context models
- If the source is not i.i.d., there are complex dependences between the symbols in the sequence.
- In most practical situations the pdf of a symbol depends on neighboring symbol values, i.e., its context. Hence we condition the encoding of the current symbol on its context.
- How to select contexts? A rigorous answer is beyond our scope; practical schemes use a fixed neighborhood.

Context dilution problem
- The minimum code length of a sequence $x_1 x_2 \ldots x_n$ achievable by arithmetic coding, if $P(x_i \mid x_{i-1}, x_{i-2}, \ldots, x_1)$ is known, is $-\sum_{i=1}^{n} \log_2 P(x_i \mid x_{i-1}, x_{i-2}, \ldots, x_1)$ bits.
- The difficulty of estimating $P(x_i \mid x_{i-1}, x_{i-2}, \ldots, x_1)$ due to insufficient sample statistics prevents the use of high-order Markov models: the higher the order, the fewer samples fall into each context.

Estimating probabilities in different contexts
- Two approaches:
  - Maintain symbol occurrence counts within each context. The number of contexts needs to be modest to avoid context dilution.
  - Assume the pdf within each context has the same shape (e.g., Laplacian) and only its parameters (e.g., mean and variance) differ. Estimation may not be as accurate, but a much larger number of contexts can be used.

Entropy (Shannon 1948)
- Consider a random variable $X$ with source alphabet $\{x_0, x_1, \ldots, x_{N-1}\}$ and probability mass function $P(x_i) = \operatorname{Prob}(X = x_i)$.
- The self-information of $x_i$ is $i(x_i) = -\log P(x_i)$, measured in bits if the log is base 2. An event of lower probability carries more information.
- Self-entropy is the weighted average of the self-information:
  $H(X) = -\sum_i P(x_i) \log P(x_i)$

Conditional Entropy
- Consider two random variables $X$ and $C$. Alphabet of $X$: $\{x_0, x_1, \ldots, x_{N-1}\}$; alphabet of $C$: $\{c_0, c_1, \ldots, c_{M-1}\}$.
- The conditional self-information of $x_i$ given $c_m$ is $i(x_i \mid c_m) = -\log P(x_i \mid c_m)$.
- Conditional entropy is the average value of the conditional self-information:
  $H(X \mid C) = -\sum_{i=0}^{N-1} \sum_{m=0}^{M-1} P(x_i \mid c_m) P(c_m) \log_2 P(x_i \mid c_m)$

Entropy and Conditional Entropy
- The conditional entropy $H(X \mid C)$ can be interpreted as the amount of uncertainty remaining about $X$, given that we know the random variable $C$.
- The additional knowledge of $C$ should reduce the uncertainty about $X$:
  $H(X \mid C) \le H(X)$
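The inequality $H(X \mid C) \le H(X)$ can be checked empirically. Below is a minimal Python sketch (not from the slides; the toy source and all function names are illustrative assumptions) that forms contexts from the three previous symbols of a binary sequence, counts symbol occurrences within each context, and compares the unconditioned entropy with the context-conditioned rate $\sum_m P(c_m)\, H(X \mid C = c_m)$.

```python
from collections import Counter, defaultdict
from math import log2
import random

def entropy(counts):
    """Empirical entropy -sum p*log2(p) of a symbol-count table, in bits."""
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())

def conditional_rate(seq, order=3):
    """Estimate sum_m P(c_m) * H(X | C = c_m), the context c_m being the
    previous `order` symbols (eight contexts 000..111 for binary data)."""
    ctx_counts = defaultdict(Counter)
    for i in range(order, len(seq)):
        ctx = tuple(seq[i - order:i])
        ctx_counts[ctx][seq[i]] += 1
    n = len(seq) - order
    return sum(sum(c.values()) / n * entropy(c) for c in ctx_counts.values())

# Toy source: each bit repeats its predecessor with probability 0.9, so
# knowing the recent past should reduce the uncertainty about the next bit.
random.seed(0)
seq = [0]
for _ in range(99_999):
    seq.append(seq[-1] if random.random() < 0.9 else 1 - seq[-1])

print(f"H(X)   ~ {entropy(Counter(seq)):.3f} bits/symbol")
print(f"H(X|C) ~ {conditional_rate(seq):.3f} bits/symbol")
```

On this correlated source the conditioned rate comes out far below the marginal entropy; on an i.i.d. source the two estimates would coincide up to sampling noise.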
Context Based Entropy Coders
- Consider a sequence of symbols; the symbols preceding the one being coded form the context space $C$:
  0 0 1 0 0 1 1 0 1 1 1 0 1 0 0 1 0 0 0 0 1 0 0 1 1 1 0 1 0 1 1 0 1 1 0 1 1 0 ...
- With the three previous bits as the context there are eight contexts, $C_0 = 000$, $C_1 = 001$, ..., $C_7 = 111$, and each context drives its own entropy coder (EC) at rate $H(C_0), H(C_1), \ldots, H(C_7)$.
- The overall rate is $R = \sum_i P(C_i) H(C_i)$.

Decorrelation techniques to exploit sample smoothness
- Transforms: DCT, FFT, wavelets.
- Differential Pulse Code Modulation (DPCM): predict the current symbol from past observations,
  $\hat{x}_i = \sum_{t=1}^{d} a_t x_{i-t}$,
  and code the prediction residual rather than the symbol itself: $e_i = x_i - \hat{x}_i$, with reconstruction $x_i = \hat{x}_i + e_i$.

Benefits of prediction and transform
- A priori knowledge is exploited to reduce the self-entropy of the source symbols.
- Higher coding efficiency due to:
  - fewer parameters to be estimated adaptively;
  - faster convergence of the adaptation.

Further Reading
- Text Compression. T. Bell, J. Cleary, and I. Witten. Prentice Hall. Good coverage of statistical context modeling, though the focus is on text.
- Articles in IEEE Transactions on Information Theory by Rissanen and Langdon.
- Digital Coding of Waveforms: Principles and Applications to Speech and Video. N. Jayant and P. Noll. Good coverage of predictive coding.
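Finally, to make the DPCM step above concrete, here is a minimal Python sketch (an illustration, not taken from the slides or the references; the signal model, predictor coefficients, and function names are assumptions). It predicts each sample from its predecessor ($d = 1$, $a_1 = 1$), forms the residual $e_i = x_i - \hat{x}_i$, and compares the empirical entropy of the raw samples with that of the residuals.

```python
from collections import Counter
from math import log2
import random

def empirical_entropy(samples):
    """Order-0 empirical entropy of a sample list, in bits/sample."""
    counts, n = Counter(samples), len(samples)
    return -sum(c / n * log2(c / n) for c in counts.values())

def dpcm_residuals(x, a=(1,)):
    """Residuals e_i = x_i - x_hat_i for the linear predictor
    x_hat_i = sum_t a[t] * x[i-1-t]; the first d samples pass through
    unpredicted so the decoder can start up."""
    d = len(a)
    e = list(x[:d])
    for i in range(d, len(x)):
        x_hat = sum(a[t] * x[i - 1 - t] for t in range(d))
        e.append(x[i] - x_hat)
    return e

# Smooth toy signal: a random walk with steps in {-1, 0, +1}. The raw
# samples spread over many values, but the residuals of the
# previous-sample predictor concentrate on the three step values.
random.seed(0)
x = [0]
for _ in range(99_999):
    x.append(x[-1] + random.choice((-1, 0, 0, 1)))

print(f"H(raw samples) ~ {empirical_entropy(x):.3f} bits")
print(f"H(residuals)   ~ {empirical_entropy(dpcm_residuals(x)):.3f} bits")
```

The decoder inverts the process with $x_i = \hat{x}_i + e_i$, so nothing is lost; what changes is the distribution handed to the entropy coder.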