Lecture 4

Source Coding and Compression
Dr.-Ing. Khaled Shawky Hassan
Room: C3-222, ext: 1204,
Email: khaled.shawky@guc.edu.eg
Static vs. Adaptive Coding
Static (Two-Pass Model)
Encoder
1. Initialize the data model based on a first pass over the data
(i.e., perform the probability analysis).
2. Transmit the data model.
3. While there is more data to send:
-- Encode the next symbol using the existing data model and send it.
Decoder
1. Receive the data model.
2. While there is more data to receive:
-- Decode the next symbol using the data model and output it.
Summary of the two-pass procedure:
1. Collect statistics and generate the codewords
(1st pass)
2. Perform the actual encoding/compression
(2nd pass)
3. Drawback: not practical in many situations (e.g., compressing network transmissions, where the whole message is not available before encoding starts)
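The two-pass procedure can be sketched in Python. This is a minimal illustration under our own assumptions, not the lecture's code: the helper names (`build_huffman_code`, `two_pass_encode`, `two_pass_decode`) are hypothetical, and the symbol counts `freqs` play the role of the data model that must be transmitted before the data.

```python
import heapq
from collections import Counter


def build_huffman_code(freqs):
    """Build a Huffman code {symbol: bitstring} from {symbol: count}."""
    # Heap entries: (weight, unique tiebreak, {symbol: partial codeword})
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    if len(heap) == 1:                      # degenerate case: one distinct symbol
        return {s: "0" for s in freqs}
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # the two least-weight subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]


def two_pass_encode(message):
    freqs = Counter(message)                # pass 1: collect statistics
    code = build_huffman_code(freqs)
    bits = "".join(code[s] for s in message)  # pass 2: encode with the fixed code
    return freqs, bits                      # freqs must also be sent to the decoder


def two_pass_decode(freqs, bits):
    code = build_huffman_code(freqs)        # decoder rebuilds the same code
    inverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:              # prefix property: first match is a codeword
            out.append(inverse[current])
            current = ""
    return "".join(out)


freqs, bits = two_pass_encode("aardvark")
print(len(bits), two_pass_decode(freqs, bits))
```

For "aardvark" the message itself needs 18 bits, but the decoder can only use it after also receiving `freqs`; that extra transmission, and the need to see all the data first, is what the one-pass model avoids.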
Static vs. Adaptive Coding
Adaptive (One-Pass Model)
Encoder
1. Initialize the data model with fixed probabilities and a fixed code length.
2. While there is more data to send:
a. Encode the next symbol using the current data model and
send it.
b. Update the data model based on the last symbol.
Decoder
1. Initialize the data model exactly as the encoder does (as agreed in advance).
2. While there is more data to receive:
a. Decode the next symbol using the data model and output it.
b. Modify the data model based on the decoded symbol.
What do we find?
No code map (data model) has to be sent!
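To see why no code map has to be sent, here is a deliberately naive one-pass sketch (our own simplification, not the NYT-based algorithm of the following slides): encoder and decoder both start from count 1 for every letter of an assumed alphabet `a..z`, and both rebuild the whole Huffman code after every symbol. All names are hypothetical; the decoder is given the message length because the sketch has no end-of-message symbol.

```python
import heapq
import string

ALPHABET = string.ascii_lowercase  # assumption: both sides agree on a..z in advance


def huffman_code(counts):
    """Deterministic Huffman code {symbol: bits} built from {symbol: count}."""
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)     # two lightest subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]


def adaptive_encode(message):
    counts = {s: 1 for s in ALPHABET}         # 1. fixed initial model
    bits = []
    for s in message:
        bits.append(huffman_code(counts)[s])  # 2a. encode with the current model
        counts[s] += 1                        # 2b. update the model
    return "".join(bits)


def adaptive_decode(bits, length):
    counts = {s: 1 for s in ALPHABET}         # same initial model as the encoder
    out, pos = [], 0
    for _ in range(length):
        inverse = {c: s for s, c in huffman_code(counts).items()}
        current = ""
        while current not in inverse:         # read bits until a codeword matches
            current += bits[pos]
            pos += 1
        symbol = inverse[current]
        out.append(symbol)
        counts[symbol] += 1                   # identical update keeps both sides in sync
    return "".join(out)


encoded = adaptive_encode("aardvark")
print(adaptive_decode(encoded, 8))
```

Rebuilding the whole code after every symbol is expensive; the adaptive Huffman algorithm in the following slides instead updates the tree incrementally while keeping the sibling property.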
Huffman Coding (e.g., Lossless JPEG)
Properties:
I. Huffman codes are built from the bottom up, starting with the leaves of the tree and
working progressively closer to the root.
II. Huffman coding is always at least as efficient as Shannon-Fano coding,
so it has become the predominant entropy-coding method.
III. It has been shown that no other code that assigns an integral number of bits to each symbol can improve on Huffman coding.
Sibling Property: Defined by Gallager [Gallager 1978]:
“A binary code tree has the sibling property if each node (except the root) has a sibling and
if the nodes can be listed in order of nonincreasing weight with each node
adjacent to its sibling.”
Thus:
1. If A is the parent of B, and C is a child of B, then
W(A) ≥ W(B) ≥ W(C)
2. If A is the parent of B (left) and C (right), then
W(B) ≤ W(C)
Huffman Coding Properties
A binary tree is a Huffman tree if and only if it obeys the sibling
property, i.e., its nodes can be numbered so that
W(#1) ≤ W(#2) ≤ W(#3) ≤ … ≤ W(#7) ≤ W(#8) ≤ W(#9)
(non-decreasing order)
Figure: example Huffman tree with nodes numbered in non-decreasing weight order:
#1 A(1), #2 B(2), #3 C(2), #4 D(2), #5 (3), #6 (4), #7 (7), #8 E(10), #9 (17, root)
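The sibling property can be checked mechanically. The sketch below encodes the example tree as a table `node number -> (label, weight, parent)`; the parent links are our reconstruction from the weights (the figure itself did not survive), and `has_sibling_property` is a hypothetical helper that tests the non-decreasing order, sibling adjacency, and that each internal weight equals the sum of its children.

```python
# Example tree: node number -> (label, weight, parent number).
# Assumption: parent links are reconstructed from the weights (the figure is lost).
TREE = {
    1: ("A", 1, 5), 2: ("B", 2, 5),
    3: ("C", 2, 6), 4: ("D", 2, 6),
    5: (None, 3, 7), 6: (None, 4, 7),
    7: (None, 7, 9), 8: ("E", 10, 9),
    9: (None, 17, None),  # root
}


def has_sibling_property(tree):
    numbers = sorted(tree)
    weights = [tree[n][1] for n in numbers]
    # (1) weights are non-decreasing in node-number order
    if any(a > b for a, b in zip(weights, weights[1:])):
        return False
    # (2) siblings are adjacent: nodes (#1, #2), (#3, #4), ... share a parent
    non_root = [n for n in numbers if tree[n][2] is not None]
    if len(non_root) % 2:
        return False
    parents = []
    for left, right in zip(non_root[0::2], non_root[1::2]):
        parent = tree[left][2]
        if tree[right][2] != parent:
            return False
        # (3) an internal node's weight is the sum of its two children
        if tree[parent][1] != tree[left][1] + tree[right][1]:
            return False
        parents.append(parent)
    return len(set(parents)) == len(parents)  # each parent has exactly two children


print(has_sibling_property(TREE))  # True
```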
Adaptive Huffman Coding
Algorithm:
o Given: alphabet S = {s1, …, sn} (NO probabilities!!!)
o Pick a fixed default binary code for all symbols (a fixed-length block code)
o Start with an empty “Huffman” tree (I said it and I mean it: empty, only the NYT node)
o Read symbol s from the source
If s is NYT (Not Yet Transmitted):
Send the NYT code, then default(s) (for the very first symbol, send only default(s))
Update the tree (and keep it Huffman)
Else:
Send the codeword for s
Update the tree
o Repeat until all symbols in the source are done
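The algorithm above can be sketched in Python. This is a minimal sketch under stated assumptions: the update rule follows the textbook-style procedure (before incrementing a node, swap it with the highest-numbered node of the same weight unless that node is its parent), the default code is the plain 5-bit block code used in the example on the next slides, and numbering starts at 51 as in the book. The class and method names are hypothetical.

```python
import math


class Node:
    """Tree node; leaves carry a symbol (or are the NYT node)."""

    def __init__(self, number, weight=0, symbol=None, parent=None):
        self.number = number
        self.weight = weight
        self.symbol = symbol
        self.parent = parent
        self.left = None
        self.right = None


class AdaptiveHuffmanEncoder:
    def __init__(self, alphabet, max_number):
        self.alphabet = alphabet
        self.bits = math.ceil(math.log2(len(alphabet)))  # default block-code length
        self.root = self.nyt = Node(max_number)          # tree starts as a lone NYT node
        self.leaves = {}                                 # symbol -> leaf node
        self.nodes = [self.root]

    def code_of(self, node):
        """Current codeword of a node: path from the root (0 = left, 1 = right)."""
        path = []
        while node.parent is not None:
            path.append("0" if node.parent.left is node else "1")
            node = node.parent
        return "".join(reversed(path))

    def swap(self, a, b):
        """Exchange tree positions (and node numbers) of a and b with their subtrees."""
        pa, pb = a.parent, b.parent
        a_left, b_left = pa.left is a, pb.left is b
        if a_left:
            pa.left = b
        else:
            pa.right = b
        if b_left:
            pb.left = a
        else:
            pb.right = a
        a.parent, b.parent = pb, pa
        a.number, b.number = b.number, a.number

    def update(self, symbol):
        if symbol in self.leaves:
            node = self.leaves[symbol]
        else:
            # The NYT node gives birth to a new NYT node and a leaf for the symbol
            old = self.nyt
            self.nyt = Node(old.number - 2, parent=old)
            leaf = Node(old.number - 1, symbol=symbol, parent=old)
            old.left, old.right = self.nyt, leaf
            self.leaves[symbol] = leaf
            self.nodes += [self.nyt, leaf]
            leaf.weight += 1
            old.weight += 1
            node = old.parent                            # continue from the old NYT's parent
        while node is not None:
            # Highest-numbered node in this node's weight block
            leader = max((n for n in self.nodes if n.weight == node.weight),
                         key=lambda n: n.number)
            if leader is not node and leader is not node.parent:
                self.swap(node, leader)
            node.weight += 1
            node = node.parent

    def encode(self, message):
        out = []
        for s in message:
            if s in self.leaves:
                out.append(self.code_of(self.leaves[s]))
            else:  # NYT code (empty for the very first symbol) + default code
                index = self.alphabet.index(s)
                out.append(self.code_of(self.nyt) + format(index, f"0{self.bits}b"))
            self.update(s)
        return "".join(out)


enc = AdaptiveHuffmanEncoder("abcdefghijklmnopqrstuvwxyz", max_number=51)
print(enc.encode("aardvark"))  # 000001010001000001100010101010110001010
```

After the last symbol the sketch's tree gives NYT = 11100, a = 0, r = 10, d = 110, v = 1111, k = 11101, the same final table as in the walk-through.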
Example (Adaptive Huffman)
• Assume we are encoding the message [a a r d v a r k]
• The total number of nodes in this tree will be (at most) 2*n - 1 + 2 = 2*26 - 1 + 2 = 53,
where n is the number of usable symbols in the alphabet and +2 accounts for the NYT leaf and the extra internal node it needs
• The first letter to be transmitted is “a”
• As a does not yet exist in the tree, we send its default binary code 00000 and then add a
to the tree
• The NYT node gives birth to a new NYT node and a terminal node corresponding to “a”
• In this example, we number only 51 nodes (as the book does) instead of 53; strictly, 53 is correct. The weight of the terminal node will be higher than that of the NYT node, so
we assign number 49 to the NYT node and 50 to the terminal node “a”
• The next symbol is a, which is now in the tree, so the transmitted code is its current codeword, 1
• Let us walk through the example (we first start with a fixed code!)
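The numbers on this slide can be reproduced with a few lines of arithmetic, assuming the 26 usable symbols are the lowercase letters `a..z` and the default code is the 5-bit index of the letter (the hypothetical `default_code` helper below).

```python
import math
import string

ALPHABET = string.ascii_lowercase      # assumption: the 26 usable symbols are a..z
n = len(ALPHABET)

# Fixed-length default (block) code: ceil(log2 n) bits per symbol
BITS = math.ceil(math.log2(n))         # 5 bits for n = 26


def default_code(symbol):
    """Hypothetical helper: the symbol's 0-based alphabet index written with BITS bits."""
    return format(ALPHABET.index(symbol), f"0{BITS}b")


# At most n symbol leaves + 1 NYT leaf, i.e. 2(n + 1) - 1 = 2n - 1 + 2 nodes
max_nodes = 2 * n - 1 + 2

print(BITS, max_nodes)                       # 5 53
print({s: default_code(s) for s in "ardvk"})
# {'a': '00000', 'r': '10001', 'd': '00011', 'v': '10101', 'k': '01010'}
```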
Example: Adaptive Huffman Coding
Input: aardvark
Note: to keep the rest of the slides as they are, we start with 51, as the book does; however, the correct starting number is 53!
Output:
Symbol | Code
NYT | 0
a |
r |
d |
v |
k |
Example: Adaptive Huffman Coding
Input: aardvark
Output: 00000
Symbol | Code
NYT | 0
a |
r |
d |
v |
k |
Example: Adaptive Huffman Coding
Input: aardvark
Output: 00000
Symbol | Code
NYT | 0
a | 1
r |
d |
v |
k |
Example: Adaptive Huffman Coding
Input: aardvark
Output: 000001
Symbol | Code
NYT | 0
a | 1
r |
d |
v |
k |
Example: Adaptive Huffman Coding
Input: aardvark
Output: 000001010001
Symbol | Code
NYT | 00
a | 1
r | 01
d |
v |
k |
Example: Adaptive Huffman Coding
Input: aardvark
Output: 0000010100010000011
Symbol | Code
NYT | 000
a | 1
r | 01
d | 001
v |
k |
Example: Adaptive Huffman Coding
Input: aardvark
Output: 0000010100010000011000
Symbol | Code
NYT | 0000
a | 1
r | 01
d | 001
v | 0001??
k |
Example: Adaptive Huffman Coding
Input: aardvark
Output: 0000010100010000011000
Symbol | Code
NYT | 0000
a | 1
r | 01
d | 001
v | 0101 ??
k |
Example: Adaptive Huffman Coding
Input: aardvark
Output: 0000010100010000011000
Symbol | Code
NYT | 000
a | 1
r | 01
d | 001
v | ??
k |
Example: Adaptive Huffman Coding
Input: aardvark
Output: 000001010001000001100010101
Symbol | Code
NYT | 1100
a | 0
r | 10
d | 111
v | 1101
k |
Example: Adaptive Huffman Coding
Input: aardvark
Output: 0000010100010000011000101010
Symbol | Code
NYT | 1100
a | 0
r | 10
d | 111
v | 1101
k |
Example: Adaptive Huffman Coding
Input: aardvark
Output: 000001010001000001100010101010
Symbol | Code
NYT | 1100
a | 0
r | 10
d | 111
v | 1101
k |
Example: Adaptive Huffman Coding
Input: aardvark
Output: 000001010001000001100010101010110001010
Symbol | Code
NYT | 11000?
a | 0
r | 10
d | 111
v | 1101
k | 11001??
Example: Adaptive Huffman Coding
Input: aardvark
Output: 000001010001000001100010101010110001010
Symbol | Code
NYT | 11100
a | 0
r | 10
d | 110
v | 1111
k | 11101
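As a check on the walk-through, the final bitstream can be split into the codeword sent for each symbol. This sketch simply hard-codes the codes read off the slides (the symbol's tree code, or the NYT code followed by the 5-bit default code); it does not run the algorithm.

```python
# (symbol, tree code sent, default code sent) for each step of "aardvark"
steps = [
    ("a", "", "00000"),        # first symbol: the tree is only NYT, so no NYT code
    ("a", "1", ""),
    ("r", "0", "10001"),       # NYT code + default code of r
    ("d", "00", "00011"),
    ("v", "000", "10101"),
    ("a", "0", ""),
    ("r", "10", ""),
    ("k", "1100", "01010"),
]
bitstream = "".join(code + default for _, code, default in steps)

print(bitstream)       # 000001010001000001100010101010110001010
print(len(bitstream))  # 39
```

Even for this short message, the adaptive code uses 39 bits, one fewer than the 8 x 5 = 40 bits of the fixed 5-bit code.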
Adaptive Huffman Decoding
Input: 000001010001000001100010101010110001010
Output: a
Symbol | Code
NYT | 0
a | 1
r |
d |
v |
k |
Adaptive Huffman Decoding
Input: 000001010001000001100010101010110001010
Output: aa
Symbol | Code
NYT | 0
a | 1
r |
d |
v |
k |
Adaptive Huffman Decoding
Input: 000001010001000001100010101010110001010
Output: aar
Symbol | Code
NYT | 00
a | 1
r | 01
d |
v |
k |
Adaptive Huffman Decoding
Input: 000001010001000001100010101010110001010
Output: aard
Symbol | Code
NYT | 000
a | 1
r | 01
d | 001
v |
k |
Adaptive Huffman Decoding
Input: 000001010001000001100010101010110001010
Output: aardv
Symbol | Code
NYT | 0000 ?
a | 1
r | 01
d | 001
v | 0001??
k |
Adaptive Huffman Decoding
Input: 000001010001000001100010101010110001010
Output: aardv
Symbol | Code
NYT | 0000
a | 1
r | 01
d | 001
v | 0101 ??
k |
Adaptive Huffman Decoding
Input: 000001010001000001100010101010110001010
Output: aardv
Symbol | Code
NYT | 000
a | 1
r | 01
d | 001
v | ??
k |
Adaptive Huffman Decoding
Input: 000001010001000001100010101010110001010
Output: aardv
Symbol | Code
NYT | 1100
a | 0
r | 10
d | 111
v | 1101
k |
Adaptive Huffman Decoding
Input: 000001010001000001100010101010110001010
Output: aardva
Symbol | Code
NYT | 1100
a | 0
r | 10
d | 111
v | 1101
k |
Adaptive Huffman Decoding
Input: 000001010001000001100010101010110001010
Output: aardvar
Symbol | Code
NYT | 1100
a | 0
r | 10
d | 111
v | 1101
k |
Adaptive Huffman Decoding
Input: 000001010001000001100010101010110001010
Output: aardvark
Symbol | Code
NYT | 11000
a | 0
r | 10
d | 111
v | 1101
k | ??
Adaptive Huffman Decoding
Input: 000001010001000001100010101010110001010
Output: aardvark
Symbol | Code
NYT | 11100
a | 0
r | 10
d | 110
v | 1111
k | 11101
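A matching decoder can be sketched the same way. It is a minimal, self-contained sketch with hypothetical names: it walks the current tree bit by bit, reads a 5-bit default code whenever it reaches the NYT node, and then applies the same update rule as the encoder (an assumed textbook-style procedure, numbering from 51), which is what keeps both trees identical without any code map being sent.

```python
import math


class Node:
    def __init__(self, number, weight=0, symbol=None, parent=None):
        self.number, self.weight = number, weight
        self.symbol, self.parent = symbol, parent
        self.left = self.right = None


class AdaptiveHuffmanDecoder:
    def __init__(self, alphabet, max_number):
        self.alphabet = alphabet
        self.bits = math.ceil(math.log2(len(alphabet)))  # default block-code length
        self.root = self.nyt = Node(max_number)          # lone NYT node, as in the encoder
        self.leaves, self.nodes = {}, [self.root]

    def swap(self, a, b):
        """Exchange tree positions (and node numbers) of a and b with their subtrees."""
        pa, pb = a.parent, b.parent
        a_left, b_left = pa.left is a, pb.left is b
        if a_left:
            pa.left = b
        else:
            pa.right = b
        if b_left:
            pb.left = a
        else:
            pb.right = a
        a.parent, b.parent = pb, pa
        a.number, b.number = b.number, a.number

    def update(self, symbol):
        """Same tree update as the encoder."""
        if symbol in self.leaves:
            node = self.leaves[symbol]
        else:
            old = self.nyt
            self.nyt = Node(old.number - 2, parent=old)
            leaf = Node(old.number - 1, symbol=symbol, parent=old)
            old.left, old.right = self.nyt, leaf
            self.leaves[symbol] = leaf
            self.nodes += [self.nyt, leaf]
            leaf.weight += 1
            old.weight += 1
            node = old.parent
        while node is not None:
            leader = max((n for n in self.nodes if n.weight == node.weight),
                         key=lambda n: n.number)
            if leader is not node and leader is not node.parent:
                self.swap(node, leader)
            node.weight += 1
            node = node.parent

    def decode(self, bitstream):
        out, pos = [], 0
        while pos < len(bitstream):
            node = self.root
            while node.left is not None:           # internal node: follow one bit
                node = node.left if bitstream[pos] == "0" else node.right
                pos += 1
            if node is self.nyt:                   # NYT reached: a default code follows
                index = int(bitstream[pos:pos + self.bits], 2)
                pos += self.bits
                symbol = self.alphabet[index]
            else:
                symbol = node.symbol
            out.append(symbol)
            self.update(symbol)
        return "".join(out)


dec = AdaptiveHuffmanDecoder("abcdefghijklmnopqrstuvwxyz", max_number=51)
print(dec.decode("000001010001000001100010101010110001010"))  # aardvark
```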
Adaptive Huffman Exercise
Try to solve the following!
Find the adaptive Huffman encoder output (the compressed bitstream) for the following text:
raaaabcbaacvkl
Assume a 26-letter alphabet!
Adaptive Huffman Notes
To follow the textbook example:
If the source has an alphabet $\{a_1, a_2, \ldots, a_m\}$ of size $m$, then pick $e$ and $r$ such that
$m = 2^e + r$ and $0 \le r < 2^e$. The letter $a_k$ is encoded as the $(e+1)$-bit binary representation of $k-1$
if $1 \le k \le 2r$; otherwise, $a_k$ is encoded as the $e$-bit binary representation of $k-r-1$.
Example:
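Suppose $m = 26$; then $e = 4$ and $r = 10$ (since $26 = 2^4 + 10$).
Then symbol $a_1$ is encoded as 00000 (“a” in English),
the symbol $a_2$ is encoded as 00001 (“b” in English),
and the symbol $a_{22}$ is encoded as 1011 (“v” in English).
(The walk-through above used a plain 5-bit block code instead, so v was sent there as 10101.)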
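The textbook's fixed code can be sketched as follows; `er_params` and `fixed_code` are hypothetical helper names, and `k` is the 1-based index of the letter.

```python
def er_params(m):
    """Pick e, r with m = 2**e + r and 0 <= r < 2**e."""
    e = m.bit_length() - 1          # largest e with 2**e <= m
    return e, m - 2 ** e


def fixed_code(k, m):
    """Fixed code of letter a_k (1-based k) in an m-letter alphabet."""
    e, r = er_params(m)
    if 1 <= k <= 2 * r:
        return format(k - 1, f"0{e + 1}b")   # (e+1)-bit representation of k-1
    return format(k - r - 1, f"0{e}b")       # e-bit representation of k-r-1


print(er_params(26))                                              # (4, 10)
print(fixed_code(1, 26), fixed_code(2, 26), fixed_code(22, 26))   # 00000 00001 1011
```

The first 2r = 20 letters get 5-bit codes (00000 to 10011) and the last 6 letters get 4-bit codes (1010 to 1111), so no codeword is a prefix of another.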
Adaptive Huffman Applications
Lossless Image Compression
Steps for lossless image compression:
1. Generate a Huffman code for each uncompressed
image (whose pixel values are already quantized, possibly by lossy methods)
2. Encode the image using the Huffman code
3. Save it to a file again!
The original (uncompressed) image representation uses 8 bits/pixel. The image
consists of 256 rows of 256 pixels, so the uncompressed representation uses
65,536 bytes.
Compression ratio = (number of bytes uncompressed) / (number of bytes compressed)
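The ratio is simple arithmetic; this sketch recomputes the compression ratios of the pixel-value table from its compressed sizes, with 65,536 bytes uncompressed.

```python
UNCOMPRESSED_BYTES = 256 * 256   # 256 x 256 pixels at 8 bits/pixel

# Compressed sizes in bytes, copied from the pixel-value table
compressed = {"Sena": 57_504, "Sensin": 61_430, "Earth": 40_534, "Omaha": 58_374}

for name, size in compressed.items():
    ratio = UNCOMPRESSED_BYTES / size   # bytes uncompressed / bytes compressed
    print(f"{name}: {ratio:.2f}")
# prints 1.14, 1.07, 1.62 and 1.12 for Sena, Sensin, Earth and Omaha
```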
Adaptive Huffman Applications
Lossless Image Compression
Image Name | Bits/Pixel | Total Size (B) | Compression Ratio
Sena | 7.01 | 57,504 | 1.14
Sensin | 7.49 | 61,430 | 1.07
Earth | 4.94 | 40,534 | 1.62
Omaha | 7.12 | 58,374 | 1.12
Huffman (Lossless JPEG) Compression Based on Pixel Values
Adaptive Huffman Applications
Lossless Image Compression
Image Name | Bits/Pixel | Total Size (B) | Compression Ratio
Sena | 4.02 | 32,968 | 1.99
Sensin | 4.70 | 38,541 | 1.70
Earth | 4.13 | 33,880 | 1.93
Omaha | 6.42 | 52,643 | 1.24
Huffman Compression Based on Pixel Difference Values and Two-Pass Model
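These tables code pixel differences instead of pixel values. The lecture does not spell out the predictor, so the sketch below assumes the simplest choice, the difference from the left neighbour in the same row, on a hypothetical row of 8-bit pixels; the helper names are ours.

```python
def row_differences(row):
    """Keep the first pixel; replace every other pixel by its difference from the left neighbour."""
    return [row[0]] + [b - a for a, b in zip(row, row[1:])]


def undo_differences(diffs):
    """Invert row_differences with a running sum (lossless)."""
    row = [diffs[0]]
    for d in diffs[1:]:
        row.append(row[-1] + d)
    return row


row = [120, 121, 121, 123, 122, 122, 124, 125]   # hypothetical 8-bit pixel row
diffs = row_differences(row)
print(diffs)                                      # [120, 1, 0, 2, -1, 0, 2, 1]
print(undo_differences(diffs) == row)             # True
```

The differences cluster around zero, so a Huffman code built on them can be shorter on average, and the running sum restores the row exactly, so nothing is lost.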
Adaptive Huffman Applications
Lossless Image Compression
Image Name | Bits/Pixel | Total Size (B) | Compression Ratio
Sena | 3.93 | 32,261 | 2.03
Sensin | 4.63 | 37,896 | 1.73
Earth | 4.82 | 39,504 | 1.66
Omaha | 6.39 | 52,321 | 1.25
Huffman Compression Based on Pixel Difference Values and One-Pass Adaptive Model
Optimality of Huffman Codes!
The necessary conditions for an optimal
variable-length binary code:
Condition 1: Given any two letters $a_j$ and $a_k$, if $P(a_j) > P(a_k)$, then $l_j \le l_k$, where $l_j$
is the number of bits in the codeword for $a_j$.
Condition 2: The two least probable letters have codewords with the same
maximum length $l_m$.
Condition 3: In the tree corresponding to the optimum code, there must be two
branches stemming from each intermediate node.
Condition 4: Suppose we change an intermediate node into a leaf node by
combining all the leaves descending from it into a composite word of a
reduced alphabet. Then, if the original tree was optimal for the original
alphabet, the reduced tree is optimal for the reduced alphabet.
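The first three conditions can be checked on any Huffman code. This sketch (hypothetical helper `huffman_lengths`, hypothetical symbol counts) computes the codeword lengths, checks Conditions 1 and 2 directly, and checks Condition 3 through the Kraft sum, which equals exactly 1 when every intermediate node has two branches.

```python
import heapq
from fractions import Fraction


def huffman_lengths(counts):
    """Codeword length of each symbol in a Huffman code built from {symbol: count}."""
    heap = [(w, i, [s]) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    lengths = {s: 0 for s in counts}
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, group1 = heapq.heappop(heap)
        w2, _, group2 = heapq.heappop(heap)
        for s in group1 + group2:   # all leaves below the new node move one level down
            lengths[s] += 1
        heapq.heappush(heap, (w1 + w2, tiebreak, group1 + group2))
        tiebreak += 1
    return lengths


counts = {"a": 5, "b": 3, "c": 1, "d": 1}   # hypothetical symbol counts
L = huffman_lengths(counts)

# Condition 1: if P(a_j) > P(a_k) then l_j <= l_k
cond1 = all(L[j] <= L[k] for j in counts for k in counts if counts[j] > counts[k])
# Condition 2: the two least probable letters share the maximum length
two_least = sorted(counts, key=counts.get)[:2]
cond2 = all(L[s] == max(L.values()) for s in two_least)
# Condition 3: a full code tree has Kraft sum exactly 1
cond3 = sum(Fraction(1, 2 ** l) for l in L.values()) == 1

print(L, cond1, cond2, cond3)   # {'a': 1, 'b': 2, 'c': 3, 'd': 3} True True True
```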
Minimum Variance Huffman Codes
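By performing the sorting procedure in a slightly different
manner (placing a combined letter as high as possible in the sorted list when there are ties), we can find a different Huffman code with the same average length but a smaller variance of the codeword lengths.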
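The effect of the sorting order can be reproduced by changing only the tie-break used when merging nodes. This sketch assumes the five-letter example P = (0.2, 0.4, 0.2, 0.1, 0.1); the helper names and the `composite_high` flag are hypothetical. Both variants give the same average length (2.2 bits/symbol), but placing combined letters as high as possible gives a much smaller variance (0.16 instead of 1.36).

```python
import heapq
from fractions import Fraction


def huffman_lengths(probs, composite_high):
    """Huffman codeword lengths; the tie-break decides where merged nodes sit among equals.

    composite_high=True  -> merged node placed as high as possible (picked last among ties)
    composite_high=False -> merged node placed as low as possible (picked first among ties)
    """
    heap = [(p, i, [s]) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    lengths = {s: 0 for s in probs}
    step = 1
    while len(heap) > 1:
        p1, _, g1 = heapq.heappop(heap)
        p2, _, g2 = heapq.heappop(heap)
        for s in g1 + g2:
            lengths[s] += 1
        key = len(probs) + step if composite_high else -step
        heapq.heappush(heap, (p1 + p2, key, g1 + g2))
        step += 1
    return lengths


def stats(probs, lengths):
    """Average codeword length and its variance (exact fractions)."""
    avg = sum(probs[s] * lengths[s] for s in probs)
    var = sum(probs[s] * (lengths[s] - avg) ** 2 for s in probs)
    return avg, var


P = {"a1": Fraction(2, 10), "a2": Fraction(4, 10), "a3": Fraction(2, 10),
     "a4": Fraction(1, 10), "a5": Fraction(1, 10)}   # assumed example probabilities

for high in (False, True):
    L = huffman_lengths(P, composite_high=high)
    avg, var = stats(P, L)
    print(high, L, float(avg), float(var))
```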
Huffman Coding: Self Study!
3.2.1 Minimum Variance Huffman Codes (pp. 46-47; redo the examples)
3.2.3 Length of Huffman Codes (pp. 49-51 and Example 3.2.2)
3.2.2 Optimality of Huffman Codes (the optimality conditions)!!