Lecture Notes: Review and Huffman code

Assignment 2:
(Due at 10:30 a.m. on Friday of Week 10)
Question 1 (given in Tutorial 5)
Question 2 (given in Tutorial 7)
• If you do Question 1 only, you get 60 points.
• If you do Question 2 only, you get 90 points.
• If you correctly do both Question 1 and Question 2, you get 100 points.
• Bonus: 5 points will be given to those who write a Java program for the
Huffman code algorithm.
Review of Lecture 1 to Lecture 6
Lecture 1: Some concepts: pseudocode, abstract data type (ADT).
(Page 60 of the textbook.)
Stack: give the ADT of a stack (slide 11 of Lecture 1).
The interface is on slide 19. (Q: Is the interface
equivalent to the ADT? Not really. We also need the rule
for insertion and deletion, i.e., first in, last out.)
Applications: parentheses matching
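Parentheses matching is a natural use of a stack: each opener is pushed, and each closer must match the most recently pushed opener. A minimal sketch in Java follows. The class name ParenMatcher and the method isBalanced are illustrative assumptions, not code from the lecture; java.util.ArrayDeque stands in for the course's own stack class.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper class; the names are illustrative, not from the lecture.
class ParenMatcher {
    // Returns true if every opener is closed by a matching closer in the right order.
    static boolean isBalanced(String s) {
        Deque<Character> stack = new ArrayDeque<>();
        for (char ch : s.toCharArray()) {
            if (ch == '(' || ch == '[' || ch == '{') {
                stack.push(ch);                        // remember the opener
            } else if (ch == ')' || ch == ']' || ch == '}') {
                if (stack.isEmpty()) return false;     // closer with no opener
                char open = stack.pop();               // most recent opener
                if ((ch == ')' && open != '(')
                        || (ch == ']' && open != '[')
                        || (ch == '}' && open != '{')) {
                    return false;                      // wrong kind of bracket
                }
            }
            // any other character is ignored
        }
        return stack.isEmpty();                        // leftover openers are unmatched
    }
}
```

For example, isBalanced("(a[b]{c})") is true, while isBalanced("(]") and isBalanced("((") are false.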
Lecture 2:
Linked list
Singly linked list
Doubly linked list
Just know how to set up a list. (Assignment 1)
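As a reminder of what setting up a list involves, here is a minimal singly linked list sketch in Java. The class and method names are assumptions chosen for illustration and may differ from those used in Assignment 1.

```java
// Hypothetical minimal singly linked list; names are illustrative only.
class SinglyLinkedList {
    private static class Node {
        int element;
        Node next;   // null for the last node
        Node(int element, Node next) {
            this.element = element;
            this.next = next;
        }
    }

    private Node head;   // first node, or null when the list is empty
    private int size;

    // Insert at the front in O(1): the new node points to the old head.
    void addFirst(int e) {
        head = new Node(e, head);
        size++;
    }

    // Remove and return the first element in O(1).
    int removeFirst() {
        if (head == null) throw new IllegalStateException("list is empty");
        int e = head.element;
        head = head.next;
        size--;
        return e;
    }

    int size() { return size; }
}
```

A doubly linked list adds a prev reference to each node, which makes removal at the tail O(1) as well.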
Lecture 3: Analysis of Algorithms (important)
Primitive operations
Count number of primitive operations for an algorithm
Big-O notation: 2n is O(n); 5n^2 + 10n + 11 is O(n^2).
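To justify the second bound from the definition of big-O, it is enough to exhibit constants c and n_0. One possible choice (an illustration; many other constants work) is c = 26 and n_0 = 1:

```latex
5n^2 + 10n + 11 \;\le\; 5n^2 + 10n^2 + 11n^2 \;=\; 26n^2 \qquad \text{for all } n \ge 1 .
```

The step uses n ≤ n^2 and 1 ≤ n^2 for n ≥ 1.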
Lecture 4: Tree
Definition of tree (slide 7)
Tree terminology: root, internal node, external node
(leaf), depth of a node, height of a node.
Inorder traversal of a binary tree
Tree ADT, slide 11, Binary tree ADT, slide 17
In terms of programming, understand
TreeInExample1.java. (If this is tested in the exam, the Java code
will be given. I do not want to give long code.)
Lecture 5: More on Trees
Linked Structure for Binary Tree.
Just understand the node:
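A node in the linked structure typically stores an element plus references to its parent and its two children. The sketch below is a hedged illustration in Java; the class name BTNode and its field names are assumptions, and the course's own node class may differ.

```java
// Hypothetical node for a linked binary tree; names are illustrative only.
class BTNode<E> {
    E element;          // the value stored at this node
    BTNode<E> parent;   // null for the root
    BTNode<E> left;     // null if there is no left child
    BTNode<E> right;    // null if there is no right child

    BTNode(E element, BTNode<E> parent) {
        this.element = element;
        this.parent = parent;
    }

    boolean isRoot() { return parent == null; }

    // An external node (leaf) has no children.
    boolean isExternal() { return left == null && right == null; }
}
```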

Preorder traversal for any tree
Postorder traversal for any tree
Array-Based representation of binary tree (slide 9)
Algorithms for Depth() and Height() (slides 12-15).
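The three traversals and the height computation share the same recursive shape. The following Java sketch restricts itself to binary trees for brevity (the lecture states preorder and postorder for any tree); the class TreeWalks and its nested Node type are assumptions made for illustration.

```java
import java.util.List;

// Hypothetical sketch of traversals and height on a binary tree.
class TreeWalks {
    static class Node {
        char label;
        Node left, right;
        Node(char label, Node left, Node right) {
            this.label = label;
            this.left = left;
            this.right = right;
        }
    }

    // Preorder: visit the node, then its left subtree, then its right subtree.
    static void preorder(Node v, List<Character> out) {
        if (v == null) return;
        out.add(v.label);
        preorder(v.left, out);
        preorder(v.right, out);
    }

    // Inorder: left subtree, then the node, then the right subtree.
    static void inorder(Node v, List<Character> out) {
        if (v == null) return;
        inorder(v.left, out);
        out.add(v.label);
        inorder(v.right, out);
    }

    // Postorder: both subtrees first, then the node.
    static void postorder(Node v, List<Character> out) {
        if (v == null) return;
        postorder(v.left, out);
        postorder(v.right, out);
        out.add(v.label);
    }

    // Height: number of edges on the longest downward path; -1 for an empty tree.
    static int height(Node v) {
        if (v == null) return -1;
        return 1 + Math.max(height(v.left), height(v.right));
    }
}
```

For the tree with root A, children B and C, and B's children D and E, preorder gives A B D E C, inorder gives D B E A C, postorder gives D E B C A, and the height is 2.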
Lecture 6: Priority Queue (Heaps)
Priority Queue ADT (slide 2)
Heap:
1. Definition of a heap.
2. What does “heap-order” mean?
3. Complete binary tree (what is a complete binary tree?).
4. The height of a complete binary tree with n nodes is O(log n).
5. Inserting a node into a heap: running time O(log n).
6. removeMin: remove a node with the minimum key. Running
time O(log n).
Array-based complete binary tree representation.
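The points above can be sketched as an array-based min-heap in Java. This is an illustration under stated assumptions: the class name MinHeap is hypothetical, keys are plain integers, and the array is 0-based (children of index i at 2i+1 and 2i+2), whereas the slides may number positions from 1.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical array-based min-heap; 0-based indexing is an assumption.
class MinHeap {
    private final List<Integer> a = new ArrayList<>();

    // Insert: place the key at the last position, then bubble it up. O(log n).
    void insert(int key) {
        a.add(key);
        int i = a.size() - 1;
        while (i > 0 && a.get(i) < a.get((i - 1) / 2)) {
            swap(i, (i - 1) / 2);
            i = (i - 1) / 2;
        }
    }

    // removeMin: take the root, move the last key to the root, bubble it down. O(log n).
    int removeMin() {
        if (a.isEmpty()) throw new IllegalStateException("heap is empty");
        int min = a.get(0);
        int last = a.remove(a.size() - 1);   // removes by index
        if (!a.isEmpty()) {
            a.set(0, last);
            int i = 0;
            while (true) {
                int l = 2 * i + 1, r = l + 1, smallest = i;
                if (l < a.size() && a.get(l) < a.get(smallest)) smallest = l;
                if (r < a.size() && a.get(r) < a.get(smallest)) smallest = r;
                if (smallest == i) break;    // heap-order restored
                swap(i, smallest);
                i = smallest;
            }
        }
        return min;
    }

    int size() { return a.size(); }

    private void swap(int i, int j) {
        int t = a.get(i);
        a.set(i, a.get(j));
        a.set(j, t);
    }
}
```

Both loops move along a single root-to-leaf path, so their cost is bounded by the height of the complete binary tree.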
Show a sample exam paper.
Exercise:
Give some trees and ask students to give the inorder, postorder,
and preorder traversals.
Question 2 of Tutorial 6: using preorder.
Given a complete binary tree, write the array representation.
Given an array, draw the complete binary tree.
Given a heap, show the steps of removeMin.
Given a heap, show the steps to insert a node with key 3. (Do it
for the tree version and for the array version.)
Linear-time construction of a heap.
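For the last exercise, one standard linear-time method is bottom-up construction: sift down every internal node, starting from the last one. The Java sketch below is a hedged illustration; the class name BottomUpHeap is hypothetical, the array is 0-based, and the lecture's presentation of the construction may differ.

```java
// Hypothetical sketch of bottom-up (linear-time) min-heap construction.
class BottomUpHeap {
    // Rearranges a into min-heap order. Total work is O(n).
    static void buildHeap(int[] a) {
        // Indices a.length/2 .. a.length-1 are leaves, so start just before them.
        for (int i = a.length / 2 - 1; i >= 0; i--) {
            siftDown(a, i, a.length);
        }
    }

    // Moves a[i] down until both children (if any) are no smaller than it.
    static void siftDown(int[] a, int i, int n) {
        while (true) {
            int l = 2 * i + 1, r = l + 1, smallest = i;
            if (l < n && a[l] < a[smallest]) smallest = l;
            if (r < n && a[r] < a[smallest]) smallest = r;
            if (smallest == i) return;
            int t = a[i];
            a[i] = a[smallest];
            a[smallest] = t;
            i = smallest;
        }
    }

    // Checks heap-order: every parent key is at most its child's key.
    static boolean isMinHeap(int[] a) {
        for (int i = 1; i < a.length; i++) {
            if (a[(i - 1) / 2] > a[i]) return false;
        }
        return true;
    }
}
```

The bound is linear because most nodes are near the bottom of the tree and sift down only a few levels.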
Huffman codes (Page 565, Chapter 12.4)
Binary character code: each character is
represented by a unique binary string.
A data file can be coded in two ways:

                        a     b     c     d     e      f
frequency (%)           45    13    12    16    9      5
fixed-length code       000   001   010   011   100    101
variable-length code    0     101   100   111   1101   1100

For a file of 100 characters, the first way needs 100 × 3 = 300 bits.
The second way needs
45×1 + 13×3 + 12×3 + 16×3 + 9×4 + 5×4 = 224 bits.
Hash Tables
Variable-length code
Some care is needed when reading a variable-length code.
• 001011101 (codewords: a=0, b=00, c=01, d=11)
• Where to cut? 00 can be read as either aa or b.
Prefixes of 0011: 0, 00, 001, and 0011.
Prefix codes: no codeword is a prefix of any other
codeword (prefix-free).
Prefix codes are simple to encode and decode.
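Whether a set of codewords is prefix-free can be checked directly from the definition. The Java sketch below compares every pair of codewords; the class name PrefixCheck is a hypothetical helper, and the quadratic pairwise check is chosen for clarity rather than speed.

```java
import java.util.List;

// Hypothetical helper: tests the prefix-free property by pairwise comparison.
class PrefixCheck {
    static boolean isPrefixFree(List<String> codewords) {
        for (int i = 0; i < codewords.size(); i++) {
            for (int j = 0; j < codewords.size(); j++) {
                // A codeword that starts with a different codeword breaks the property.
                if (i != j && codewords.get(j).startsWith(codewords.get(i))) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

The ambiguous code a=0, b=00, c=01, d=11 fails the check because 0 is a prefix of 00 and 01, while the variable-length code 0, 101, 100, 111, 1101, 1100 passes.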
Using the codewords in the table to encode and decode
Encode: abc = 0.101.100 = 0101100
(just concatenate the codewords)
Decode: 001011101 = 0.0.101.1101 = aabe
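The encode and decode steps can be sketched in Java using the variable-length codewords from the table. The class name TableCodec is hypothetical. Note that decoding here uses a lookup on a growing bit buffer instead of walking the code tree; this works only because the code is prefix-free, so the first match is always the correct codeword.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical codec for the variable-length code in the table.
class TableCodec {
    static final Map<Character, String> CODE = new HashMap<>();
    static final Map<String, Character> INVERSE = new HashMap<>();
    static {
        CODE.put('a', "0");
        CODE.put('b', "101");
        CODE.put('c', "100");
        CODE.put('d', "111");
        CODE.put('e', "1101");
        CODE.put('f', "1100");
        for (Map.Entry<Character, String> e : CODE.entrySet()) {
            INVERSE.put(e.getValue(), e.getKey());
        }
    }

    // Encoding: concatenate the codeword of each character.
    static String encode(String text) {
        StringBuilder bits = new StringBuilder();
        for (char ch : text.toCharArray()) {
            String w = CODE.get(ch);
            if (w == null) throw new IllegalArgumentException("no codeword for " + ch);
            bits.append(w);
        }
        return bits.toString();
    }

    // Decoding: read bits until the buffer equals a codeword, emit it, and restart.
    static String decode(String bits) {
        StringBuilder out = new StringBuilder();
        StringBuilder buf = new StringBuilder();
        for (char b : bits.toCharArray()) {
            buf.append(b);
            Character ch = INVERSE.get(buf.toString());
            if (ch != null) {
                out.append(ch);
                buf.setLength(0);
            }
        }
        if (buf.length() > 0) throw new IllegalArgumentException("incomplete codeword at end");
        return out.toString();
    }
}
```

With this table, encode("abc") returns "0101100" and decode("001011101") returns "aabe".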
(Use the right-hand binary tree below.)
[Figure: two binary trees with edges labeled 0 and 1 and leaves labeled character:frequency. Left: tree for the fixed-length codewords. Right: tree for the variable-length codewords.]
Binary tree
An optimal code is always represented by a full binary tree:
every nonleaf node has two children.
The fixed-length code in our example is not optimal, because its
tree is not full (no codeword begins with 11).
The total number of bits required to encode a file is
$$B(T) = \sum_{c \in C} f(c)\, d_T(c)$$
f(c): the frequency (number of occurrences) of
c in the file
d_T(c): the depth of c’s leaf in the tree T
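Since the depth of a character's leaf equals the length of its codeword, the cost B(T) can be computed straight from the code table. The Java sketch below evaluates it for both codes in the example; the class name CodeCost is hypothetical, and the percentages are treated as counts in a file of 100 characters.

```java
// Hypothetical helper: evaluates B(T) as the sum of f(c) * d_T(c).
class CodeCost {
    // The depth of c's leaf is the length of c's codeword.
    static int cost(int[] freq, String[] codewords) {
        int total = 0;
        for (int i = 0; i < freq.length; i++) {
            total += freq[i] * codewords[i].length();
        }
        return total;
    }

    public static void main(String[] args) {
        int[] freq = {45, 13, 12, 16, 9, 5};   // a, b, c, d, e, f
        String[] fixed = {"000", "001", "010", "011", "100", "101"};
        String[] variable = {"0", "101", "100", "111", "1101", "1100"};
        System.out.println(cost(freq, fixed));      // prints 300
        System.out.println(cost(freq, variable));   // prints 224
    }
}
```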
Constructing an optimal code
Formal definition of the problem:
Input: a set of characters C = {c1, c2, …,
cn}; each c ∈ C has frequency f[c].
Output: a binary tree representing
codewords so that the total number of
bits required for the file is minimized.
Huffman proposed a greedy algorithm
to solve the problem.
[Figure: steps of Huffman's algorithm on the example frequencies. Each panel shows the contents of the priority queue in increasing order of frequency; a merged node is written as its frequency followed by its two children.]
(a) f:5, e:9, c:12, b:13, d:16, a:45
(b) c:12, b:13, 14 (f:5, e:9), d:16, a:45
(c) 14 (f:5, e:9), d:16, 25 (c:12, b:13), a:45
(d) 25 (c:12, b:13), 30 (14, d:16), a:45
(e) a:45, 55 (25, 30)
(f) 100 (a:45, 55)
In each merged node, the first child is on the edge labeled 0 and the second child is on the edge labeled 1.
HUFFMAN(C)
1  n := |C|
2  Q := C
3  for i := 1 to n-1 do
4      z := ALLOCATE_NODE()
5      x := left[z] := EXTRACT_MIN(Q)
6      y := right[z] := EXTRACT_MIN(Q)
7      f[z] := f[x] + f[y]
8      INSERT(Q, z)
9  return EXTRACT_MIN(Q)
The Huffman Algorithm
This algorithm builds the tree T corresponding to the
optimal code in a bottom-up manner.
C is a set of n characters, and each character c in C has
a defined frequency f[c].
Q is a priority queue, keyed on f, used to identify the
two least-frequent objects to merge together.
The result of the merger is a new object (internal
node) whose frequency is the sum of the frequencies of the two
merged objects.
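The description above can be sketched in Java with java.util.PriorityQueue playing the role of Q: poll() corresponds to EXTRACT_MIN and add() to INSERT. This is an illustrative sketch, not the course's reference solution; the class name HuffmanSketch, its nested Node type, and the helper assign are all assumptions.

```java
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch of Huffman's algorithm using the library priority queue.
class HuffmanSketch {
    static class Node {
        final int freq;
        final Character symbol;   // null for internal nodes
        final Node left, right;

        Node(char symbol, int freq) {          // leaf
            this.symbol = symbol;
            this.freq = freq;
            this.left = null;
            this.right = null;
        }

        Node(Node left, Node right) {          // internal node: frequencies add up
            this.symbol = null;
            this.freq = left.freq + right.freq;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() { return left == null && right == null; }
    }

    // Builds the code tree bottom-up; returns null for an empty frequency table.
    static Node build(Map<Character, Integer> f) {
        PriorityQueue<Node> q = new PriorityQueue<>((x, y) -> Integer.compare(x.freq, y.freq));
        for (Map.Entry<Character, Integer> e : f.entrySet()) {
            q.add(new Node(e.getKey(), e.getValue()));
        }
        int n = q.size();
        for (int i = 1; i <= n - 1; i++) {
            Node x = q.poll();        // least frequent object
            Node y = q.poll();        // second least frequent object
            q.add(new Node(x, y));    // merged node goes back into the queue
        }
        return q.poll();              // the root of the finished tree
    }

    // Reads codewords off the tree: 0 for a left edge, 1 for a right edge.
    static void assign(Node v, String prefix, Map<Character, String> codes) {
        if (v.isLeaf()) {
            codes.put(v.symbol, prefix.isEmpty() ? "0" : prefix);   // single-character alphabet
            return;
        }
        assign(v.left, prefix + "0", codes);
        assign(v.right, prefix + "1", codes);
    }
}
```

On the frequencies a:45, b:13, c:12, d:16, e:9, f:5 there are no ties, so the sketch yields a=0, b=101, c=100, d=111, e=1101, f=1100. When frequencies tie, the order in which equal keys leave the queue is not specified, so the codewords may differ while the total cost stays optimal.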
Time complexity
Lines 4-8 are executed n-1 times.
Each heap operation in Lines 4-8 takes
O(lg n) time.
Total time required is O(n lg n).
Note: The details of the heap operations will
not be tested. The time complexity O(n lg n)
should be remembered.
Another example:
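[Figure: steps of Huffman's algorithm on the frequencies e:4, c:6, a:6, b:9, d:11.]
Step 1: merge e:4 and a:6 into a node of frequency 10 (e on edge 0, a on edge 1).
Step 2: merge c:6 and b:9 into a node of frequency 15 (c on edge 0, b on edge 1).
Step 3: merge 10 and d:11 into a node of frequency 21 (10 on edge 0, d on edge 1).
Step 4: merge 15 and 21 into the root of frequency 36 (15 on edge 0, 21 on edge 1).
Resulting codewords: c=00, b=01, e=100, a=101, d=11.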
Summary of Huffman codes: Given a set of characters and their frequencies, you
should be able to construct the binary tree for the Huffman code. Proofs of
why this algorithm gives an optimal solution are not required.