Huffman Coding

A simple example
• Suppose we have a message of 10 symbols drawn from 5 distinct symbols, e.g.
[►♣♣♠☻►♣☼►☻]
• How can we encode this message in 0s and 1s so that the encoded message has minimum length (for transmission or storage)?
• 5 distinct symbols → at least 3 bits per symbol with a fixed-length code (2^2 = 4 < 5 ≤ 8 = 2^3).
• With this simple fixed-length encoding, the encoded message is 10 × 3 = 30 bits long.
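The 3-bit figure is the smallest b with 2^b ≥ 5. A minimal Java sketch of that arithmetic (the class and method names are illustrative, not from these notes):

```java
final class FixedLengthCode {
    // Smallest b such that 2^b >= k: the bits per symbol a fixed-length
    // code needs for k distinct symbols (k = 1 gives 0 under this formula).
    static int bitsPerSymbol(int k) {
        if (k < 1) throw new IllegalArgumentException("need at least one symbol");
        int b = 0;
        while ((1 << b) < k) b++;
        return b;
    }

    // Length of a message of n symbols drawn from k distinct symbols.
    static int totalBits(int n, int k) {
        return n * bitsPerSymbol(k);
    }
}
```

For the message above, totalBits(10, 5) evaluates to 30.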
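A simple example – cont.
• Intuition: symbols that occur more often should get shorter codewords. Since the codewords then have different lengths, there must be a way to tell where one codeword ends and the next begins.
• With a Huffman code, ► and ♣ (3 occurrences each) and ☻ (2 occurrences) get 2-bit codewords, while ♠ and ☼ (1 occurrence each) get 3-bit codewords, so the encoded message ►♣♣♠☻►♣☼►☻ has length
3×2 + 3×2 + 2×2 + 3 + 3 = 22 bits.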
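The 22-bit total can be checked without drawing the tree. The sketch below anticipates the greedy merging procedure HUFFMAN(C) given later in these notes: it repeatedly combines the two smallest counts, and because each merge adds one bit to every symbol underneath it, the sum of all merged weights equals the encoded length. HuffmanCost and cost are hypothetical names used only for this illustration.

```java
import java.util.PriorityQueue;

final class HuffmanCost {
    // Minimum total number of bits for a message with the given symbol counts.
    // Each merge of two subtrees gives every leaf below the new node one more
    // bit, so the optimal total is the sum of all merged weights.
    static long cost(int... counts) {
        PriorityQueue<Long> queue = new PriorityQueue<>();
        for (int c : counts) queue.add((long) c);
        long total = 0;
        while (queue.size() > 1) {
            long merged = queue.poll() + queue.poll(); // the two smallest weights
            total += merged;
            queue.add(merged);
        }
        return total;
    }
}
```

For the counts of ►, ♣, ☻, ♠, ☼, cost(3, 3, 2, 1, 1) returns 22.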
Another Example
• A = 0, B = 100, C = 1010, D = 1011, R = 11
• ABRACADABRA = 01001101010010110100110
• This is eleven letters in 23 bits.
• A fixed-width encoding would require 3 bits for five different letters, or 33 bits for 11 letters.
• Notice that the encoded bit string can still be decoded unambiguously, because no codeword is a prefix of another.
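A minimal Java sketch of encoding by concatenation and of decoding by reading bits until they match a codeword, using the code table above. PrefixCodec and its methods are hypothetical names; the decoder relies on the code being prefix-free and rejects input that ends in the middle of a codeword.

```java
import java.util.HashMap;
import java.util.Map;

final class PrefixCodec {
    // Encode by concatenating the codeword of each character.
    static String encode(String text, Map<Character, String> code) {
        StringBuilder bits = new StringBuilder();
        for (char ch : text.toCharArray()) {
            String word = code.get(ch);
            if (word == null) throw new IllegalArgumentException("no codeword for '" + ch + "'");
            bits.append(word);
        }
        return bits.toString();
    }

    // Decode by reading bits until the buffer equals a codeword. The first
    // match is the right one only because no codeword is a prefix of another.
    static String decode(String bits, Map<Character, String> code) {
        Map<String, Character> byWord = new HashMap<>();
        for (Map.Entry<Character, String> e : code.entrySet()) byWord.put(e.getValue(), e.getKey());
        StringBuilder text = new StringBuilder();
        StringBuilder buffer = new StringBuilder();
        for (char b : bits.toCharArray()) {
            buffer.append(b);
            Character ch = byWord.get(buffer.toString());
            if (ch != null) {
                text.append(ch);
                buffer.setLength(0);
            }
        }
        if (buffer.length() > 0) throw new IllegalArgumentException("leftover bits: " + buffer);
        return text.toString();
    }
}
```

With the table A=0, B=100, C=1010, D=1011, R=11 passed as a Map<Character, String> named code, encode("ABRACADABRA", code) yields the 23-bit string shown above, and decode turns it back into ABRACADABRA.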
Huffman codes
• Binary character code: each character is represented by a unique binary string (its codeword).
• A data file can be coded in two ways:

                        a     b     c     d     e     f
  frequency (%)         45    13    12    16    9     5
  fixed-length code     000   001   010   011   100   101
  variable-length code  0     101   100   111   1101  1100

The first way needs 100 × 3 = 300 bits (for a file of 100 characters). The second way needs
45×1 + 13×3 + 12×3 + 16×3 + 9×4 + 5×4 = 224 bits.
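Both totals are sums of (frequency × codeword length). A small Java sketch of that sum, treating the percentages as counts in a 100-character file (an assumption consistent with the 300-bit figure); CodeCost and totalBits are illustrative names.

```java
final class CodeCost {
    // Total bits = sum over characters of (occurrences x codeword length).
    static int totalBits(int[] counts, String[] codewords) {
        if (counts.length != codewords.length) throw new IllegalArgumentException("length mismatch");
        int total = 0;
        for (int i = 0; i < counts.length; i++) total += counts[i] * codewords[i].length();
        return total;
    }
}
```

With the counts {45, 13, 12, 16, 9, 5} in the order a to f, the fixed-length codewords give 300 and the variable-length codewords give 224.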
2016/5/29
Variable-length code
• Variable-length codes must be read carefully.
  001011101 (codewords: a=0, b=00, c=01, d=11)
  Where to cut? 00 can be read either as aa or as b.
• The prefixes of 0011 are 0, 00, 001, and 0011.
• Prefix codes: no codeword is a prefix of any other codeword (prefix-free).
• Prefix codes are simple to encode and decode.
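Whether a set of codewords is prefix-free can be checked mechanically. In this Java sketch (the names are illustrative), the codewords are sorted lexicographically: if one codeword is a prefix of another, every codeword sorted between them also starts with it, so comparing neighbours suffices.

```java
import java.util.Arrays;

final class PrefixFree {
    // True if no codeword is a prefix of another (duplicates count as prefixes).
    static boolean isPrefixFree(String... codewords) {
        String[] sorted = codewords.clone();
        Arrays.sort(sorted); // lexicographic: a prefix sorts right before its extensions
        for (int i = 0; i + 1 < sorted.length; i++) {
            if (sorted[i + 1].startsWith(sorted[i])) return false;
        }
        return true;
    }
}
```

isPrefixFree("0", "00", "01", "11") returns false for the ambiguous code above, while the variable-length code from the table passes.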
Using the codewords in the table to encode and decode
• Encode: abc = 0.101.100 = 0101100
  (just concatenate the codewords.)
• Decode: 001011101 = 0.0.101.1101 = aabe
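(To decode, use the tree for the variable-length codeword: start at the root, follow edge 0 or 1 for each bit, output the character when a leaf is reached, then restart at the root.)

Tree for the fixed-length codeword (edge 0 = left child, edge 1 = right child):
100
  0: 86
    0: 58   (0: a:45, 1: b:13)
    1: 28   (0: c:12, 1: d:16)
  1: 14
    0: 14   (0: e:9, 1: f:5)

Tree for the variable-length codeword:
100
  0: a:45
  1: 55
    0: 25   (0: c:12, 1: b:13)
    1: 30
      0: 14   (0: f:5, 1: e:9)
      1: d:16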
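Decoding with the tree can be sketched in Java by first building the tree from the codewords (each codeword becomes a root-to-leaf path) and then walking it bit by bit. CodeTree is a hypothetical class; it assumes the code is prefix-free, as the table's variable-length code is.

```java
import java.util.Map;

final class CodeTree {
    private static final class Node {
        Node zero, one;    // children along edge 0 (left) and edge 1 (right)
        Character symbol;  // non-null only at leaves
    }

    private final Node root = new Node();

    // Insert every codeword as a path from the root; its character sits at the leaf.
    CodeTree(Map<Character, String> code) {
        for (Map.Entry<Character, String> e : code.entrySet()) {
            Node n = root;
            for (char b : e.getValue().toCharArray()) {
                if (b == '0') {
                    if (n.zero == null) n.zero = new Node();
                    n = n.zero;
                } else {
                    if (n.one == null) n.one = new Node();
                    n = n.one;
                }
            }
            n.symbol = e.getKey();
        }
    }

    // Follow one edge per bit; at a leaf, output its character and restart at the root.
    String decode(String bits) {
        StringBuilder out = new StringBuilder();
        Node n = root;
        for (char b : bits.toCharArray()) {
            n = (b == '0') ? n.zero : n.one;
            if (n == null) throw new IllegalArgumentException("invalid bit sequence");
            if (n.symbol != null) {
                out.append(n.symbol);
                n = root;
            }
        }
        if (n != root) throw new IllegalArgumentException("input ends mid-codeword");
        return out.toString();
    }
}
```

Built from the variable-length codewords, decode("001011101") returns "aabe" and decode("0101100") returns "abc".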
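Binary tree
• In an optimal code, every non-leaf node of the tree has two children.
  Why? A node with only one child could be replaced by that child, which shortens the codeword of every leaf below it.
• So the fixed-length code in our example is not optimal: in its tree, the node 14 has only one child.
• The total number of bits required to encode a file is
  B(T) = \sum_{c \in C} f(c)\, d_T(c)
  where
  f(c): the frequency (number of occurrences) of c in the file
  d_T(c): the depth of c's leaf in the tree (equal to the length of c's codeword)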
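The cost B(T) can be computed directly from a tree with a depth-first traversal that multiplies each leaf's frequency by its depth. A Java sketch with a hypothetical CostTree class; an internal node may have a missing child, so that the fixed-length tree (whose node 14 has one child) can be represented.

```java
final class CostTree {
    final char symbol;          // meaningful only at a leaf
    final int freq;             // f(c) at a leaf; sum of the children at an internal node
    final CostTree left, right; // both null at a leaf

    // Leaf for a character with frequency f(c).
    CostTree(char symbol, int freq) {
        this.symbol = symbol;
        this.freq = freq;
        this.left = null;
        this.right = null;
    }

    // Internal node; a missing child may be passed as null.
    CostTree(CostTree left, CostTree right) {
        this.symbol = '\0';
        this.freq = (left == null ? 0 : left.freq) + (right == null ? 0 : right.freq);
        this.left = left;
        this.right = right;
    }

    // B(T) = sum over leaves c of f(c) * d_T(c); call with depth 0 at the root.
    static long cost(CostTree t, int depth) {
        if (t == null) return 0;
        if (t.left == null && t.right == null) return (long) t.freq * depth;
        return cost(t.left, depth + 1) + cost(t.right, depth + 1);
    }
}
```

Building the two trees for the a to f example with this class, cost(root, 0) gives 300 for the fixed-length tree and 224 for the variable-length tree.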
Constructing an optimal coding scheme
• Formal definition of the problem:
• Input: a set of characters C = {c1, c2, …, cn}, where each c ∈ C has frequency f[c].
• Output: a binary tree representing the codewords, such that the total number of bits required for the file is minimized.
• Huffman proposed a greedy algorithm to solve the problem.
Huffman's algorithm on the example, step by step (queue contents in increasing order of frequency; a new internal node is written as sum [left child, right child], with edge 0 to the left and edge 1 to the right):
(a) f:5  e:9  c:12  b:13  d:16  a:45
(b) c:12  b:13  14 [f:5, e:9]  d:16  a:45
(c) 14 [f:5, e:9]  d:16  25 [c:12, b:13]  a:45
(d) 25 [c:12, b:13]  30 [14, d:16]  a:45
(e) a:45  55 [25, 30]
(f) 100 [a:45, 55]: the finished tree, which is the tree for the variable-length codeword
HUFFMAN(C)
1  n := |C|
2  Q := C
3  for i := 1 to n-1 do
4      z := ALLOCATE_NODE()
5      x := left[z] := EXTRACT_MIN(Q)
6      y := right[z] := EXTRACT_MIN(Q)
7      f[z] := f[x] + f[y]
8      INSERT(Q, z)
9  return EXTRACT_MIN(Q)
The Huffman Algorithm
• This algorithm builds the tree T corresponding to the optimal code in a bottom-up manner.
• C is a set of n characters; each character c ∈ C has a given frequency f[c].
• Q is a min-priority queue, keyed on f, used to identify the two least-frequent objects to merge together.
• The result of each merger is a new object (an internal node) whose frequency is the sum of the frequencies of the two merged objects.
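Time complexity
• The loop body (lines 4-8) is executed n-1 times.
• With Q implemented as a binary min-heap, each heap operation in lines 4-8 takes O(lg n) time.
• Building Q in line 2 takes O(n) time with a bottom-up heap construction.
• Total time required is O(n lg n).
Note: The details of heap operations will not be tested. The time complexity O(n lg n) should be remembered.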
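HUFFMAN(C) translates fairly directly into Java, with java.util.PriorityQueue playing the role of Q (its documentation gives O(log n) time for add and poll). This is a sketch, not the course's reference code: the Node class, the build and codes method names, and the convention that the first node extracted becomes the left (0) child are choices made here. When frequencies tie, PriorityQueue may break ties differently from the slides, which can change individual codewords but not the total cost.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

final class Huffman {
    static final class Node {
        final int freq;
        final Character symbol; // null for internal nodes
        final Node left, right;
        Node(Character symbol, int freq, Node left, Node right) {
            this.symbol = symbol; this.freq = freq; this.left = left; this.right = right;
        }
    }

    // HUFFMAN(C): symbols[i] has frequency freqs[i]; requires at least one symbol.
    static Node build(char[] symbols, int[] freqs) {
        PriorityQueue<Node> q = new PriorityQueue<>((u, v) -> Integer.compare(u.freq, v.freq));
        for (int i = 0; i < symbols.length; i++) q.add(new Node(symbols[i], freqs[i], null, null)); // Q := C
        int n = q.size();
        for (int i = 1; i <= n - 1; i++) {
            Node x = q.poll();                              // x := EXTRACT_MIN(Q)
            Node y = q.poll();                              // y := EXTRACT_MIN(Q)
            q.add(new Node(null, x.freq + y.freq, x, y));   // z with left x, right y; INSERT(Q, z)
        }
        return q.poll();                                    // the root
    }

    // Read the codewords off the tree: left edge = 0, right edge = 1.
    static Map<Character, String> codes(Node root) {
        Map<Character, String> out = new HashMap<>();
        collect(root, "", out);
        return out;
    }

    private static void collect(Node t, String path, Map<Character, String> out) {
        if (t.symbol != null) {
            out.put(t.symbol, path.isEmpty() ? "0" : path); // a lone symbol still gets one bit
            return;
        }
        collect(t.left, path + "0", out);
        collect(t.right, path + "1", out);
    }
}
```

For the a to f table (which has no ties), build followed by codes reproduces the variable-length codewords a=0, b=101, c=100, d=111, e=1101, f=1100.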
A Complete Example
Scan the original text:
Eerie eyes seen near lake.
• What characters are present?
E  e  r  i  space  y  s  n  a  l  k  .

Building a Tree
• What is the frequency of each character in the text?

  Char    Freq.     Char   Freq.     Char   Freq.
  E       1         y      1         k      1
  e       8         s      2         .      1
  r       2         n      2
  i       1         a      2
  space   4         l      1

(26 characters in total.)
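Counting the frequencies is a single pass over the text. A Java sketch (CharFrequency is an illustrative name); a LinkedHashMap keeps the characters in order of first appearance, which matches the list of characters above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CharFrequency {
    // Count occurrences of each character, keeping first-appearance order.
    static Map<Character, Integer> count(String text) {
        Map<Character, Integer> freq = new LinkedHashMap<>();
        for (char ch : text.toCharArray()) freq.merge(ch, 1, Integer::sum);
        return freq;
    }
}
```

count("Eerie eyes seen near lake.") yields 12 distinct characters, with e occurring 8 times and space 4 times.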
Building a Tree
• The queue after inserting all nodes, in increasing order of frequency (sp = space):
  E:1  i:1  y:1  l:1  k:1  .:1  r:2  s:2  n:2  a:2  sp:4  e:8
• Repeatedly remove the two nodes with the lowest frequencies, make them the children of a new node whose frequency is their sum, and insert the new node back into the queue:
  1. E:1 + i:1 → 2
  2. y:1 + l:1 → 2
  3. k:1 + .:1 → 2
  4. r:2 + s:2 → 4
  5. n:2 + a:2 → 4
  6. (E, i):2 + (y, l):2 → 4
  7. (k, .):2 + sp:4 → 6
  8. (r, s):4 + (n, a):4 → 8
  9. (E, i, y, l):4 + (k, ., sp):6 → 10
  10. e:8 + (r, s, n, a):8 → 16
  11. 10 + 16 → 26
• What is happening to the characters with a low number of occurrences? They are merged first, so they end up deepest in the tree and receive the longest codewords.
• After enqueueing the node of frequency 26, there is only one node left in the priority queue: it is the root of the finished tree.
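The whole example can be reproduced end to end in a short Java sketch (hypothetical class HuffmanText). Ties between equal frequencies are broken by creation order, oldest node first; that rule is an assumption made here, so the individual codewords may differ from a tree drawn with other tie-breaking, but the encoded length does not.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.PriorityQueue;

final class HuffmanText {
    private static final class Node {
        final int freq, order;   // order: creation number, used to break ties
        final Character symbol;  // null for internal nodes
        final Node left, right;
        Node(int freq, int order, Character symbol, Node left, Node right) {
            this.freq = freq; this.order = order; this.symbol = symbol; this.left = left; this.right = right;
        }
    }

    // Build a Huffman code for the characters of a non-empty text.
    static Map<Character, String> codeFor(String text) {
        Map<Character, Integer> freq = new LinkedHashMap<>();
        for (char ch : text.toCharArray()) freq.merge(ch, 1, Integer::sum);
        PriorityQueue<Node> q = new PriorityQueue<>((u, v) ->
                u.freq != v.freq ? Integer.compare(u.freq, v.freq) : Integer.compare(u.order, v.order));
        int order = 0;
        for (Map.Entry<Character, Integer> e : freq.entrySet())
            q.add(new Node(e.getValue(), order++, e.getKey(), null, null));
        while (q.size() > 1) {
            Node x = q.poll(), y = q.poll();                     // two lowest frequencies
            q.add(new Node(x.freq + y.freq, order++, null, x, y));
        }
        Map<Character, String> code = new HashMap<>();
        assign(q.poll(), "", code);
        return code;
    }

    // Left edge = 0, right edge = 1.
    private static void assign(Node t, String path, Map<Character, String> code) {
        if (t.symbol != null) {
            code.put(t.symbol, path.isEmpty() ? "0" : path);
            return;
        }
        assign(t.left, path + "0", code);
        assign(t.right, path + "1", code);
    }

    static String encode(String text, Map<Character, String> code) {
        StringBuilder sb = new StringBuilder();
        for (char ch : text.toCharArray()) sb.append(code.get(ch));
        return sb.toString();
    }
}
```

For "Eerie eyes seen near lake." the encoded message is 84 bits long, which equals the sum of the merged frequencies 2 + 2 + 2 + 4 + 4 + 4 + 6 + 8 + 10 + 16 + 26.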
Using heap:
• Each node stores its frequency and character together with three links P, L, R (parent, left child, right child), which are filled in as nodes are merged.
• The heap array initially holds f:5, e:9, c:12, b:13, d:16, a:45 (a min-heap keyed on frequency).
CS3335 Design and Analysis of
Algorithms/WANG Lusheng
• Each EXTRACT_MIN removes the root, moves the last array element to the root, and sifts it down; each INSERT appends the new node and sifts it up.
• The merges, with the new internal nodes named g, h, i, j, k:
  g = f:5 + e:9 → 14   (L = f, R = e)
  h = c:12 + b:13 → 25   (L = c, R = b)
  i = g:14 + d:16 → 30   (L = g, R = d)
  j = h:25 + i:30 → 55   (L = h, R = i)
  k = a:45 + j:55 → 100   (L = a, R = j)
• After the last merge, k is the only node left in the heap: it is the root of the Huffman tree, and its P link is empty.
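The heap operations shown above can be sketched as an array-backed min-heap in Java. This is not MyHeap.java from the tutorial folder: HeapNode only mirrors the five fields listed in the exercise (key, letter, parent, left, right), and MinHeap, insert, extractMin and mergeTwoSmallest are hypothetical names. mergeTwoSmallest performs one iteration of lines 4-8 of HUFFMAN(C).

```java
import java.util.ArrayList;
import java.util.List;

final class MinHeap {
    static final class HeapNode {
        final int key;                  // frequency
        final char letter;              // '\0' marks an internal node in this sketch
        HeapNode parent, left, right;   // tree links, set when nodes are merged
        HeapNode(int key, char letter) { this.key = key; this.letter = letter; }
    }

    private final List<HeapNode> a = new ArrayList<>();

    int size() { return a.size(); }

    // INSERT: append at the end, then sift up while smaller than the parent.
    void insert(HeapNode x) {
        a.add(x);
        int i = a.size() - 1;
        while (i > 0 && a.get((i - 1) / 2).key > a.get(i).key) {
            swap(i, (i - 1) / 2);
            i = (i - 1) / 2;
        }
    }

    // EXTRACT_MIN (heap must be non-empty): take the root, move the last
    // element to the root, then sift it down below its smaller child.
    HeapNode extractMin() {
        HeapNode min = a.get(0);
        HeapNode last = a.remove(a.size() - 1);
        if (!a.isEmpty()) {
            a.set(0, last);
            int i = 0;
            while (true) {
                int l = 2 * i + 1, r = l + 1, s = i;
                if (l < a.size() && a.get(l).key < a.get(s).key) s = l;
                if (r < a.size() && a.get(r).key < a.get(s).key) s = r;
                if (s == i) break;
                swap(i, s);
                i = s;
            }
        }
        return min;
    }

    private void swap(int i, int j) {
        HeapNode t = a.get(i);
        a.set(i, a.get(j));
        a.set(j, t);
    }

    // One iteration of HUFFMAN lines 4-8: merge the two smallest nodes under a new node z.
    static HeapNode mergeTwoSmallest(MinHeap q) {
        HeapNode x = q.extractMin(), y = q.extractMin();
        HeapNode z = new HeapNode(x.key + y.key, '\0');
        z.left = x;
        z.right = y;
        x.parent = z;
        y.parent = z;
        q.insert(z);
        return z;
    }
}
```

Inserting a:45, b:13, c:12, d:16, e:9, f:5 and merging until one node remains yields g = 14 with children f and e first, and finally a root of key 100 whose left child is a and whose right child has key 55.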
Exercise
Modify MyHeap.java in Tutorial 6’s folder so that the class ArrayNode has five
data fields:
int key;
char letter;
ArrayNode parent;
ArrayNode left;
ArrayNode right;
and use the modified MyHeap to construct the Huffman code tree. The program should read n pairs (a_i, b_i) from the keyboard, where a_i is the number of times that character/letter b_i appears, and construct the Huffman code tree for the n pairs.