Compression
Course INFMKT — Compression
Ifi, UiO / Norsk Regnesentral
Spring semester 2003
Wolfgang Leister

This part of the course ...
• ... is held at Ifi, UiO ... (Wolfgang Leister)
• ... and at University College Karlsruhe (Peter Oel, Clemens Knoerzer)

Overview
• Information Theory
• Data Compression

Preview
• Image formats
  – JPEG / JFIF
  – Wavelet compression
  – Fractal compression
• Video formats
  – MJPEG
  – MPEG

Multimedia data types
• Images
• Graphics
• Video / image sequences
• Audio
• 3D data
• Text / documents
• others

Information Theory
• Symbols: A = a_0, a_1, ..., a_{N−1}
• Coding: C = c_0, c_1, ..., c_{N−1}
  – trivial coding (constant code length):
    • c_i = b_{i0}, b_{i1}, ..., b_{i,M−1}
    • i ∈ [0, N−1], b_{ij} ∈ {0, 1}
    • M = ⌈log₂ N⌉ (code length, number of bits)
• Distribution: P = p_0, p_1, ..., p_{N−1}
  – probabilities of the symbols

Information Theory
Information content (entropy):
  H₀ = − ∑_{i=0}^{N−1} p_i · log p_i
(base 2 ⇒ bit per code word)
Example: uniform distribution, p_i = 1/N:
  H₀ = − ∑_{i=0}^{N−1} (1/N) · log(1/N) = − log(1/N) = log N

Information Theory
• Coding: C = c_0, c_1, ..., c_{N−1}
  – c_i = b_{i0}, b_{i1}, ..., b_{i,l_i−1}
    • i ∈ [0, N−1], b_{ij} ∈ {0, 1}
    • l_i = length of code word c_i
• average code length:
  L = ∑_{i=0}^{N−1} p_i · l_i ≥ H₀

Information Theory
• When is L = H₀?
  – when l_i = − log₂ p_i, i.e. every p_i = 1/2^k
• What if p_i ≠ 1/2^k?
  – Group several events into one code word: (a_{i1}, ..., a_{in}) → c_i
  – Every (a_i)-combination must be available: N^n code words
  – Better: arithmetic coding

Intermezzo ... Why data compression?
• An image says more than a thousand words …
• An image needs more space than a thousand words …
• Data contain redundancy ...
• Humans love redundancy …

Compression techniques
• lossless
  – run length encoding
  – optimal codes
  – adaptive codes
• with loss
  – quantising
  – clustering
  – descriptive transformation

Compression techniques — run length encoding (lossless)
  000011100000000110000101111110000000000
  → (4×0)(3×1)(8×0)(2×1)(4×0)(1×1)(1×0)(6×1)(10×0)
  → 4,3,8,2,4,1,1,6,10
Examples: PCX, Fax, JPEG

Compression techniques — optimal codes (lossless)
• Code words have different lengths
• The length of a code word depends on the probability of its symbol:
  – high probability → short code word
  – low probability → long code word
• Either a global, fixed code word table is used, or the code word table is transmitted as part of the coded data

Compression techniques — optimal codes (lossless)
Example: Huffman coding
1. The events to be coded are sorted by probability (ascending order).
2. The two events with the lowest probabilities are removed from the list, united into one event, and re-inserted into the sorted list with the sum of both probabilities.
3. Repeat step 2 until only one element remains in the list.
4. Code words are generated by marking the edges of the resulting binary tree with 0 and 1; each code word is read from top to bottom (root to leaf).
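The four steps above can be sketched in a few lines of Python. This is a minimal sketch, not part of the slides: the function name huffman and the use of the heapq module are implementation choices, and the probability table is the one used in the worked example on the following slides.

```python
import heapq
from itertools import count

def huffman(probs):
    """Build a Huffman code table {symbol: bit string} from {symbol: probability}."""
    tie = count()  # tie-breaker so the heap never compares nodes directly
    # A node is either a plain symbol or a pair (left subtree, right subtree).
    heap = [(p, next(tie), sym) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Step 2: remove the two least probable events and unite them;
        # the heap keeps the list sorted by probability (steps 1 and 3).
        p1, _, left = heapq.heappop(heap)
        p2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, next(tie), (left, right)))
    codes = {}
    def walk(node, prefix):
        # Step 4: mark the edges with 0 and 1 and read root-to-leaf.
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"
    walk(heap[0][2], "")
    return codes

# Probabilities from the worked example on the following slides:
print(huffman({"a": 0.2, "b": 0.15, "c": 0.4, "d": 0.05, "e": 0.2}))
# -> {'a': '00', 'e': '01', 'd': '100', 'b': '101', 'c': '11'}
# Ties are broken differently than on the slides, but the average
# code length is the same (2.2 bit per symbol).
```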
Compression techniques — optimal codes (lossless)
Example: Huffman coding with the probabilities
  a = 0.2, b = 0.15, c = 0.4, d = 0.05, e = 0.2
Building the tree (the sorted list after each step):
• Start:            d 0.05, b 0.15, a 0.2, e 0.2, c 0.4
• Unite d and b:    a 0.2, e 0.2, bd 0.2, c 0.4
• Unite bd and e:   a 0.2, bde 0.4, c 0.4
• Unite bde and a:  c 0.4, bdea 0.6
• Unite bdea and c: bdeac 1.0
Marking the edges of the tree with 0 and 1 gives the code words (one possible labelling):
  c = 1, a = 01, e = 001, b = 0000, d = 0001
Average code length: L = 0.4·1 + 0.2·2 + 0.2·3 + 0.15·4 + 0.05·4 = 2.2 bit ≥ H₀ ≈ 2.08 bit

Compression techniques — adaptive codes (lossless)
• Method by Ziv and Lempel
• The code table is generated while coding
• No need to transfer the code table

Compression techniques — adaptive codes (lossless)
Example (input: a b c a b a b a c ...):
Dictionary: 1 = a, 2 = b, 3 = c (initial entries), then 4 = ab, 5 = bc, 6 = ca, 7 = aba, 8 = abac, ...
• w = a, next symbol b:       output 1, new entry 4 = ab
• w = b, next symbol c:       output 1 2, new entry 5 = bc
• w = c, next symbol a:       output 1 2 3, new entry 6 = ca
• w = a b, next symbol a:     output 1 2 3 4, new entry 7 = aba
• w = a b a, next symbol c:   output 1 2 3 4 7, new entry 8 = abac

Arithmetic Coding
• Symbols are represented by probability intervals
• Symbol chains are represented by a concatenation (nesting) of their probability intervals

Arithmetic Coding
Alphabet A = a, b, c, d, e, # with probabilities P = 0.2, 0.15, 0.35, 0.05, 0.2, 0.05
Intervals:
  a = [0.0, 0.2)
  b = [0.2, 0.35)
  c = [0.35, 0.7)
  d = [0.7, 0.75)
  e = [0.75, 0.95)
  # = [0.95, 1.0)
(Figure: the interval bounds written as binary fractions, e.g. 0.96… ≈ 0.11111₂, 0.75 = 0.11₂, 0.71… ≈ 0.10111₂, 0.5 = 0.1₂, 0.25 = 0.01₂, 0.125 = 0.001₂)
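This interval nesting can be sketched in Python. A minimal sketch, not from the slides: the function name encode_interval is an illustrative choice, and the interval table is the one from the slide above. Running it on the chain bca# reproduces the interval that is derived step by step on the next slides.

```python
def encode_interval(message, intervals):
    """Narrow [low, high) by nesting the probability interval of every symbol."""
    low, high = 0.0, 1.0
    for sym in message:
        sym_low, sym_high = intervals[sym]
        width = high - low
        # Scale the symbol's interval into the current interval.
        low, high = low + width * sym_low, low + width * sym_high
    return low, high

# Interval table from the slide above (alphabet a..e plus the end marker '#'):
intervals = {"a": (0.0, 0.2), "b": (0.2, 0.35), "c": (0.35, 0.7),
             "d": (0.7, 0.75), "e": (0.75, 0.95), "#": (0.95, 1.0)}

low, high = encode_interval("bca#", intervals)
print(low, high)   # ~0.262475 0.263 -- any number in this interval encodes "bca#"
```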
Arithmetic Coding
Nesting the intervals: the first symbol selects a sub-interval of [0, 1), the second symbol a sub-interval of that interval, and so on.
• First symbol a → [0.0, 0.2), subdivided into aa, ab, ac, ad, ae, a#
• First symbol b → [0.2, 0.35), subdivided into ba, bb, bc, bd, be, b#

Arithmetic Coding
Example: coding the chain bca#
  b    → [0.2, 0.35)
  bc   → [0.2525, 0.305)
  bca  → [0.2525, 0.263)
  bca# → [0.262475, 0.263)

Arithmetic Coding
The interval bounds as binary fractions:
  0.263    = 0.01000011010100111…
  0.262475 = 0.01000011001100011…
A binary fraction inside the interval identifies the chain:
  0.0100001101 → transmitted code word 0100001101 (10 bits)
For comparison, Huffman coding of the same chain: 0110110(0000)

Compression techniques — quantising (lossy)
Quantising:
• analogue → digital
• 8-bit RGB → 4-bit quantisation
• → colour palette (clustering)
(a small code sketch of this step follows after the last slide)

Compression techniques
• lossless: run length encoding, optimal codes, adaptive codes
• with loss: quantising, clustering, descriptive transformation

Image data formats
• lossless: PBM+, GIF, PNG, (JPEG)
• with loss: JPEG, Wavelet compression, Fractal compression

The End of Lecture
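As announced on the quantising slide, here is a minimal sketch of the 8-bit → 4-bit step as uniform scalar quantisation of a single colour channel. It is not part of the slides: the function name and the choice to reconstruct to the centre of each quantisation bin are assumptions; a real colour palette would instead be obtained by clustering the image colours.

```python
def quantise_channel(value, bits=4):
    """Uniformly quantise one 8-bit channel value to `bits` bits and back."""
    levels = 1 << bits                 # e.g. 16 levels for 4 bits
    index = value * levels // 256      # 8-bit value -> quantisation index (the lossy step)
    # Reconstruction: map the index back to the centre of its 8-bit bin.
    return index, (index * 256 + 128) // levels

for v in (0, 37, 128, 255):
    idx, approx = quantise_channel(v)
    print(f"{v:3d} -> index {idx:2d} -> reconstructed {approx:3d}")
```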