
Lec05, Entropy Coding, v1.06

Course Presentation
Multimedia Systems
Entropy Coding
Mahdi Amiri
October 2015
Sharif University of Technology
Source and Channel Coding
Shannon's Separation Principle
Assumptions:
Single source and user
Unlimited complexity and delay
Claude E. Shannon, 1916-2001

[Block diagram: Information Source → Source Coding → Channel Coding]
Information Source: generates the information we want to transmit or store.
Source Coding: reduces the number of bits needed to store or transmit the relevant information.
Channel Coding: increases the number of bits, or changes them, to protect against channel errors.
Coding-related elements in a communication system.
What about joint source and channel coding?
Ref.: en.wikipedia.org/wiki/Information_theory
Information source: en.wikipedia.org/wiki/Information_source
Source coding: en.wikipedia.org/wiki/Data_compression
Channel coding: en.wikipedia.org/wiki/Forward_error_correction
Source Coding
Motivation
Data storage and transmission cost money.
Use the fewest possible bits to represent the information source.
Pros:
Less memory, less transmission time.
Cons:
Extra processing required.
Distortion (if using lossy compression).
Data has to be decompressed before it can be presented, which may cause delay.
Source Coding
Principles
Example: The source coder shall represent the video signal by the minimum number of
(binary) symbols without exceeding an acceptable level of distortion.
Two principles are utilized:
1. Properties of the information source that are known a priori result in
redundant information that need not be transmitted ("redundancy reduction").
2. The human observer does not perceive certain deviations of the
received signal from the original ("irrelevancy reduction").
Approaches:
Lossless coding: completely reversible; exploits principle 1 only.
Lossy coding: not reversible; exploits principles 1 and 2.
Data Compression
Lossless and Lossy
Lossless
Exact reconstruction is possible.
Applied to general data.
Lower compression rates.
Examples: Run-length, Huffman, Lempel-Ziv.
Lossy
Higher compression rates.
Applied to audio, image and video.
Examples: CELP, JPEG, MPEG-2.
Data Compression
Codec (Encoder and Decoder)
[General structure of a codec:
Encoder: Original signal → Transform / prediction (T) → Quantization (Q) → Entropy encoder (E) → Compressed bit-stream
Decoder: Compressed bit-stream → Entropy decoder (E⁻¹) → Dequantization (Q⁻¹) → Inverse transform (T⁻¹) → Reconstructed signal]
In information theory an entropy encoding is a
lossless data compression scheme that is independent
of the specific characteristics of the medium.
Ref.: en.wikipedia.org/wiki/Entropy_(information_theory)
en.wikipedia.org/wiki/Entropy_encoding
Entropy Coding
Selected Topics and Algorithms
Run-length encoding
Fixed Length Coding (FLC)
Variable Length Coding (VLC)
Huffman Coding Algorithm
Entropy, Definition
Lempel-Ziv (LZ77)
Lempel-Ziv-Welch (LZW)
Arithmetic Coding
Lossless Compression
Run-Length Encoding (RLE)
Example: the string BBBBHHDDXXXXKKKKWWZZZZ is encoded as 4B2H2D4X4K2W4Z.

Example: image of a rectangle, encoded row by row as (value, run-length) pairs:
0,40
0,40
0,10 1,20 0,10
0,10 1,1 0,18 1,1 0,10
0,10 1,1 0,18 1,1 0,10
0,10 1,1 0,18 1,1 0,10
0,10 1,20 0,10
0,40

RLE is used in fax machines.
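A minimal run-length encoder along these lines (an illustrative sketch, not taken from the slides):

```python
def rle_encode(text):
    """Run-length encode a string of symbols: 'BBBBHH...' -> '4B2H...'."""
    if not text:
        return ""
    out, run_char, run_len = [], text[0], 1
    for ch in text[1:]:
        if ch == run_char:
            run_len += 1                          # extend the current run
        else:
            out.append(f"{run_len}{run_char}")    # emit (count, symbol)
            run_char, run_len = ch, 1
    out.append(f"{run_len}{run_char}")
    return "".join(out)

print(rle_encode("BBBBHHDDXXXXKKKKWWZZZZ"))  # -> 4B2H2D4X4K2W4Z
```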
Lossless Compression
Fixed Length Coding (FLC)
A simple example
The message to code:
►♣♣♠☻►♣☼►☻
Message length: 10 symbols
5 different symbols ⇒ at least 3 bits per symbol
[Codeword table: each of the 5 symbols is assigned a fixed 3-bit codeword]
Total bits required to code: 10*3 = 30 bits
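A quick check of this bit count (an illustrative snippet; the message string is the one shown above):

```python
import math

message = "►♣♣♠☻►♣☼►☻"                              # the 10-symbol message above
n_symbols = len(set(message))                        # 5 distinct symbols
bits_per_symbol = math.ceil(math.log2(n_symbols))    # ceil(log2(5)) = 3
print(bits_per_symbol * len(message))                # 3 bits x 10 symbols = 30 bits
```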
Lossless Compression
Variable Length Coding (VLC)
Intuition: symbols that are more frequent should get shorter codes; but since the codeword
lengths are then not all the same, there must be a way of distinguishing each codeword.
The message to code:
►♣♣♠☻►♣☼►☻
[Codeword table: more frequent symbols are given shorter codewords]
To identify the end of a codeword as soon as it arrives, no codeword can
be a prefix of another codeword.
How to find the optimal codeword table?
Total bits required to code: 3*2 + 3*2 + 2*2 + 3 + 3 = 22 bits
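A small check of the prefix property and the bit count (the codeword assignments below are assumed for illustration, since the slide's codeword table is not reproduced here):

```python
# Hypothetical prefix-free codeword table (lengths 2, 2, 2, 3, 3 for the 5 symbols).
codes = {"►": "00", "♣": "01", "☻": "10", "♠": "110", "☼": "111"}

# Prefix property: no codeword may be a prefix of another codeword.
words = list(codes.values())
print(all(not b.startswith(a) for a in words for b in words if a != b))  # True

message = "►♣♣♠☻►♣☼►☻"
print(sum(len(codes[s]) for s in message))   # 3*2 + 3*2 + 2*2 + 3 + 3 = 22 bits
```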
Lossless Compression
VLC, Example Application
Morse code: a non-prefix code.
Needs a separator symbol (a pause) for unique decodability.
Lossless Compression
Huffman Coding Algorithm
Step 1: Take the two least probable symbols in the alphabet
(longest codewords, equal length, differing in last digit)
Step 2: Combine these two symbols into a single symbol, and repeat.
P(n): probability of symbol number n.
Here there are 9 symbols; e.g. the symbols can be the alphabet letters
'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i'.
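A compact sketch of the two steps above using Python's heapq (the probabilities assigned to the 9 example symbols are made-up values for illustration):

```python
import heapq
from itertools import count

def huffman_codes(probs):
    """Build a Huffman codeword table from a {symbol: probability} mapping."""
    tie = count()   # tie-breaker so the heap never has to compare dicts
    heap = [(p, next(tie), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)   # Step 1: take the two least probable entries
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}          # prepend a distinguishing bit
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, next(tie), merged))    # Step 2: combine and repeat
    return heap[0][2]

# Assumed probabilities for the 9 example symbols 'a'..'i' (illustration only).
probs = {"a": 0.25, "b": 0.20, "c": 0.15, "d": 0.12, "e": 0.10,
         "f": 0.08, "g": 0.05, "h": 0.03, "i": 0.02}
for sym, code in sorted(huffman_codes(probs).items()):
    print(sym, code)
```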
Lossless Compression
Huffman Coding Algorithm
Paper: "A Method for the Construction of
Minimum-Redundancy Codes“, 1952
Results in "prefix-free codes“
Most efficient
No other mapping will produce a smaller average output size,
If the actual symbol frequencies agree with those used to create the code.
Cons:
David A. Huffman
Have to run through the entire data in advance to find frequencies. 1925-1999
‘Minimum-Redundancy’ is not favorable for error correction techniques (bits
are not predictable if e.g. one is missing).
Does not support block of symbols: Huffman is designed to code single
characters only. Therefore at least one bit is required per character, e.g. a word of
8 characters requires at least an 8 bit code.
Entropy Coding
Entropy, Definition
The entropy, H, of a discrete random variable X is a measure of the
amount of uncertainty associated with the value of X.
From the information theory point of view, X is the information source,
and P(x) is the probability that symbol x in X will occur.

H(X) = \sum_{x \in X} P(x) \cdot \log_2 \frac{1}{P(x)}

A measure of information content (in bits).
A quantitative measure of the disorder of a system.
It is impossible to compress the data such that the average
number of bits per symbol is less than the Shannon entropy
of the source (in a noiseless channel).

The Intuition Behind the Formula
P(x) ↑ ⇒ amount of uncertainty ↓ ⇒ H ∼ 1/P(x)
Bringing it to the world of bits ⇒ H ∼ \log_2 \frac{1}{P(x)} = I(x), the information content of x
Weighted average number of bits required to encode each possible value ⇒ multiply by P(x) and sum over x ∈ X

Claude E. Shannon, 1916-2001
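As a quick illustration (not part of the slides), the entropy of the 10-symbol message from the VLC example can be computed directly from this formula:

```python
import math
from collections import Counter

def entropy(message):
    """H(X) = sum over x of P(x) * log2(1/P(x)), in bits per symbol."""
    counts = Counter(message)
    n = len(message)
    return sum((c / n) * math.log2(n / c) for c in counts.values())

msg = "►♣♣♠☻►♣☼►☻"                   # the 10-symbol message used earlier
print(entropy(msg))                   # ≈ 2.17 bits per symbol
print(entropy(msg) * len(msg))        # ≈ 21.7 bits: lower bound for lossless coding of this source
```

For comparison, the 22-bit variable-length code found earlier is just above this bound.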
Lossless Compression
Lempel-Ziv (LZ77)
Algorithm for compression of character sequences
Assumption: Sequences of characters are repeated
Idea: Replace a character sequence by a reference to an earlier occurrence
1. Define:
a search buffer = (a portion of) the recently encoded data,
a look-ahead buffer = the data not yet encoded.
2. Find the longest match between
the first characters of the look-ahead buffer
and an arbitrary character sequence in the search buffer.
3. Produce the output <offset, length, next_character>:
offset + length = reference to the earlier occurrence,
next_character = the first character following the match in the look-ahead buffer.
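A rough sketch of this procedure (the buffer sizes are arbitrary illustrative defaults, not values from the slides):

```python
def lz77_encode(data, search_size=255, lookahead_size=15):
    """Encode data into (offset, length, next_character) triples as described above."""
    i, out = 0, []
    while i < len(data):
        best_off, best_len = 0, 0
        start = max(0, i - search_size)                   # left edge of the search buffer
        max_len = min(lookahead_size, len(data) - i - 1)  # always keep one next_character
        for j in range(start, i):                         # candidate match starts
            length = 0
            while length < max_len and data[j + length] == data[i + length]:
                length += 1                               # match may overlap the look-ahead
            if length > best_len:
                best_off, best_len = i - j, length
        out.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return out

print(lz77_encode("abracadabra"))
# -> [(0, 0, 'a'), (0, 0, 'b'), (0, 0, 'r'), (3, 1, 'c'), (5, 1, 'd'), (7, 3, 'a')]
```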
Lossless Compression
Lempel-Ziv-Welch (LZW)
Drops the search buffer and keeps an explicit dictionary.
Produces only the output <index>.
Used by Unix "compress", GIF, V.42bis, TIFF.
Example: wabba␣wabba␣wabba␣wabba␣woo␣woo␣woo (␣ denotes the blank symbol)
[Dictionary build-up shown up to the 12th entry]
Encoder output sequence so far: 5 2 3 3 2 1
Lossless Compression
Lempel-Ziv-Welch (LZW)
Example: wabba␣wabba␣wabba␣wabba␣woo␣woo␣woo (␣ denotes the blank symbol)
[Dictionary build-up shown at the end of the above example]
Encoder output sequence: 5 2 3 3 2 1 6 8 10 12 9 11 7 16 5 4 4 11 21 23 4
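A minimal LZW encoder sketch along these lines; the initial dictionary order (␣, a, b, o, w) is taken from the example above, with an ordinary space standing in for ␣:

```python
def lzw_encode(data, alphabet):
    """LZW: keep an explicit dictionary, output only indices."""
    dictionary = {ch: i for i, ch in enumerate(alphabet, start=1)}
    out, current = [], ""
    for ch in data:
        if current + ch in dictionary:
            current += ch                                    # keep extending the match
        else:
            out.append(dictionary[current])                  # emit index of longest match
            dictionary[current + ch] = len(dictionary) + 1   # add a new dictionary entry
            current = ch
    out.append(dictionary[current])
    return out

msg = "wabba wabba wabba wabba woo woo woo"
print(lzw_encode(msg, [" ", "a", "b", "o", "w"]))
# -> [5, 2, 3, 3, 2, 1, 6, 8, 10, 12, 9, 11, 7, 16, 5, 4, 4, 11, 21, 23, 4]
```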
Lossless Compression
Arithmetic Coding
Encodes a block of symbols into a single number, a fraction n where 0.0 ≤ n < 1.0.
Step 1: Divide the interval [0, 1) into subintervals based on the probabilities of
the symbols in the current context (the "dividing model").
Step 2: Divide the subinterval corresponding to the current symbol into subintervals
according to the dividing model of Step 1.
Step 3: Repeat Step 2 for all symbols in the block of symbols.
Step 4: Encode the block of symbols with a single number in the final resulting
range; use the binary number in this range with the smallest number of bits.
See the encoding and decoding examples in the following slides
Lossless Compression
Arithmetic Coding, Encoding
Example: SQUEEZE
Using FLC: 3 bits per symbol ⇒ 7*3 = 21 bits
P(‘E’) = 3/7
P(‘S’) = P(‘Q’) = P(‘U’) = P(‘Z’) = 1/7
[Dividing model: the interval [0, 1) divided according to these probabilities]
We can encode the word SQUEEZE with a
single number in the range [0.64769, 0.64772).
The binary number in this range with the
smallest number of bits is 0.101001011101,
which corresponds to 0.647705 decimal. The '0.'
prefix does not have to be transmitted because
every arithmetic coded message starts with this
prefix. So we only need to transmit the sequence
101001011101, which is only 12 bits.
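A short sketch that reproduces this interval; the alphabetical ordering E, Q, S, U, Z of the dividing model is an assumption (the slide's model is not reproduced here), chosen because it yields the range stated above:

```python
from fractions import Fraction as F

# Dividing model for SQUEEZE; symbol order E, Q, S, U, Z is assumed (alphabetical).
model = {"E": (F(0, 7), F(3, 7)), "Q": (F(3, 7), F(4, 7)), "S": (F(4, 7), F(5, 7)),
         "U": (F(5, 7), F(6, 7)), "Z": (F(6, 7), F(7, 7))}

low, width = F(0), F(1)
for sym in "SQUEEZE":
    lo, hi = model[sym]
    low, width = low + width * lo, width * (hi - lo)   # narrow the interval (Steps 2-3)

print(float(low), float(low + width))   # ≈ 0.647690  0.647722
```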
Lossless Compression
Arithmetic Coding, Decoding
Input Probabilities: P(‘A’)=60%, P(‘B’)=20%, P(‘C’)=10%, P(‘<space>’)=10%
Decoding the input value 0.538:
[Dividing model from the input probabilities: 'A' → [0, 0.6), 'B' → [0.6, 0.8), 'C' → [0.8, 0.9), '<space>' → [0.9, 1.0)]
The fraction 0.538 (the circular point in the figure) falls into the
sub-interval [0, 0.6), so the first decoded symbol is 'A'.
The subregion containing the point is successively
subdivided in the same way as the dividing model.
Since 0.538 is within the interval [0.48, 0.54), the
second symbol of the message must have been 'C'.
Since 0.538 falls within the interval [0.534, 0.54), the
third symbol of the message must have been '<space>'.
The protocol in this example uses '<space>' as the
termination symbol, so this is the end of the decoding process.
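A sketch of this decoding loop (the sub-interval boundaries follow directly from the given probabilities):

```python
# Sub-intervals implied by P(A)=0.6, P(B)=0.2, P(C)=0.1, P(<space>)=0.1.
model = [("A", 0.0, 0.6), ("B", 0.6, 0.8), ("C", 0.8, 0.9), ("<space>", 0.9, 1.0)]

value, decoded = 0.538, []
while True:
    sym, lo, hi = next(m for m in model if m[1] <= value < m[2])
    decoded.append(sym)
    if sym == "<space>":                 # termination symbol
        break
    value = (value - lo) / (hi - lo)     # rescale the point into [0, 1) and repeat

print(decoded)   # ['A', 'C', '<space>']
```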
Lossless Compression
Arithmetic Coding
Pros
Typically has a better compression ratio than
Huffman coding.
Cons
High computational complexity.
The patent situation had a crucial influence on decisions
about implementing arithmetic coding
(many of the patents have now expired).
Supplementary Materials
Lossless Compression
CAVLC and CABAC
CAVLC: Context-based adaptive variable-length coding.
CABAC: Context-based adaptive binary arithmetic
coding.
Both are forms of entropy coding and lossless
compression used in H.264/MPEG-4 AVC and H.265.
They have multiple look-up tables (CAVLC) and multiple
probability models (CABAC). The different tables or
probability models are selected based on the context (the input
bit stream).
Ref.: en.wikipedia.org/wiki/Context-adaptive_variable-length_coding
en.wikipedia.org/wiki/Context-adaptive_binary_arithmetic_coding
Supplementary Materials
Lossless Compression
CAVLC vs. CABAC
Though results are source-dependent, CABAC
is generally regarded as being between 5–15%
more efficient than CAVLC. This means that
CABAC should deliver equivalent quality at a 5–
15% lower data rate, or better quality at the same
data rate.
CAVLC requires considerably less processing to
decode than CABAC.
Multimedia Systems
Entropy Coding
Thank You
Next Session: Color Space
FIND OUT MORE AT...
1. http://ce.sharif.edu/~m_amiri/
2. http://www.aictc.ir/