Huffman Codes

Drozdek Chapter 11
Objectives

You will be able to

Construct an optimal variable-length code
for an alphabet with known probabilities for
the letters occurring in a message.

Construct a tree for decoding messages
encoded in a Huffman code.

Construct a tree for encoding messages
encoded in a Huffman code.
Huffman Codes

Common character codes such as ASCII
and EBCDIC use the same size data structure
for all characters.

Eight bits per character.

Contrast Morse code, which uses
variable-length sequences.

Variable-length codes can produce shorter
messages than fixed-length codes, on average,
when applied to many messages with
given character probabilities.
Variable-Length Codes

Each character in such a code
has a weight (probability) and a length

The expected message length per character
is the sum of the products of the code
lengths and the probabilities for all the
characters:
(0.2*2) + (0.1*4) + (0.1*4) + (0.15*3) + (0.45*1) = 2.1
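As a quick check of this arithmetic, here is a small stand-alone C++ sketch; the lengths and probabilities are simply the ones appearing in the sum above:

#include <iostream>

int main()
{
    // Code lengths and probabilities from the example sum above.
    int    length[] = { 2,   4,   4,   3,    1    };
    double prob[]   = { 0.2, 0.1, 0.1, 0.15, 0.45 };

    double expected = 0.0;
    for (int i = 0; i < 5; ++i)
        expected += prob[i] * length[i];   // length times probability

    std::cout << expected << std::endl;    // prints 2.1
    return 0;
}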
Immediate Decodability

A code is immediately decodable when no sequence
of bits that represents a character is a prefix of a
longer sequence for another character.

Such a code can be decoded without waiting for
the remaining bits of a message.

Note that the previous scheme is not
immediately decodable, while this one is.
Immediate Decodability



Codes that are immediately decodable are
called prefix codes.

No valid code symbol is a prefix of
another valid code symbol.
For example, {0, 10, 11} is prefix free,
but {0, 01, 11} is not, since 0 is a prefix of 01.

Perhaps better called prefix-free codes.
Optimal Codes

We seek codes that are:

Immediately decodable.

Such that the average message length, over a large
number of messages, is minimal.

For a set of n characters { C1 .. Cn }
with weights { w1 .. wn },
we need an algorithm that generates
variable-length bit strings representing the
characters.
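Stated as a formula (this restates the expected-length calculation from the earlier slide; the slide itself does not write it out):

    Expected length per character  =  w1*l1 + w2*l2 + ... + wn*ln

where li is the number of bits in the code for Ci. An optimal code minimizes this sum while remaining immediately decodable.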
Huffman Codes


An optimal code scheme developed by David A.
Huffman while a PhD student at MIT.
“A Method for the Construction of Minimum-Redundancy Codes”

Proceedings of the I.R.E., Sept. 1952

http://en.wikipedia.org/wiki/David_A._Huffman

http://www.huffmancoding.com/david-huffman/scientific-american
Huffman's Algorithm

How to determine an optimal code for a
set of N characters given their relative
frequencies (or weights).
Huffman's Algorithm

Initialize a list of one-node binary trees


One node for each character containing the
character and its weight.
While there is more than one tree in the list:





Find two trees in the list having minimal weights.
Remove those trees from the list and make them
the left and right subtrees of a new node having the
sum of their weights as its weight.
Label the arc to the left subtree with 0.
Label the arc to the right subtree with 1.
Add the new tree to the list.
Huffman's Algorithm

The code for character Ci is the bit string along
the path from the root to Ci in the final binary
tree.
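As a compact end-to-end illustration, here is a minimal stand-alone sketch of the algorithm using a std::priority_queue instead of the sorted std::list used by the program developed later in these slides; all names below are illustrative only:

#include <iostream>
#include <map>
#include <queue>
#include <string>
#include <vector>

struct Node
{
    char ch;              // character; '\0' marks a merged (internal) node
    double weight;        // frequency / probability
    Node* left;
    Node* right;
    Node(char c, double w, Node* l = 0, Node* r = 0)
        : ch(c), weight(w), left(l), right(r) {}
};

// Order nodes so the priority queue always yields the smallest weight first.
struct By_Weight
{
    bool operator()(const Node* a, const Node* b) const
    { return a->weight > b->weight; }
};

// The code for a character is the 0/1 path from the root down to its leaf.
void Collect_Codes(const Node* n, const std::string& path,
                   std::map<char, std::string>& codes)
{
    if (n->left == 0 && n->right == 0) { codes[n->ch] = path; return; }
    Collect_Codes(n->left,  path + "0", codes);
    Collect_Codes(n->right, path + "1", codes);
}

int main()
{
    std::map<char, double> freq;
    freq['a'] = 0.2;  freq['b'] = 0.1;  freq['c'] = 0.1;
    freq['d'] = 0.15; freq['e'] = 0.45;

    std::priority_queue<Node*, std::vector<Node*>, By_Weight> trees;
    for (std::map<char, double>::iterator it = freq.begin(); it != freq.end(); ++it)
        trees.push(new Node(it->first, it->second));

    while (trees.size() > 1)                  // repeat until one tree remains
    {
        Node* a = trees.top(); trees.pop();   // the two smallest weights
        Node* b = trees.top(); trees.pop();
        trees.push(new Node('\0', a->weight + b->weight, a, b));
    }

    std::map<char, std::string> codes;        // nodes are not freed in this sketch
    Collect_Codes(trees.top(), "", codes);
    for (std::map<char, std::string>::iterator it = codes.begin(); it != codes.end(); ++it)
        std::cout << it->first << "  " << it->second << std::endl;
    return 0;
}

With these frequencies the sketch prints one valid optimal code; ties between equal weights mean the exact bit strings can differ from the table on the next slide, but the expected message length is the same.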
Example
Given characters and probabilities, the end result is:

Character    Huffman Code
A            011
B            000
C            001
D            010
E            1

Note the arbitrary choice for the sibling of D.
Alternate Result
Average message length is the same.
Huffman Decoding Algorithm
Given a message as a string of 0's and 1's:
Initialize pointer p to the root of the Huffman tree.
While end of message string not reached:
    Let x be the next bit of the message string.
    If x is 0
        move p to the left child
    else
        move p to the right child.
    If p points to a leaf
        Display the character at that leaf.
        Reset p to the root of the Huffman tree.
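A minimal C++ sketch of this loop, assuming a simple node type with left and right child pointers and a character at each leaf (the names here are illustrative; this is not the Char_Freq class built later in these slides):

#include <string>

struct Node
{
    char ch;
    Node* left;
    Node* right;
    Node(char c, Node* l = 0, Node* r = 0) : ch(c), left(l), right(r) {}
};

// Decode a string of '0' and '1' characters using the tree rooted at root.
std::string Decode(const Node* root, const std::string& bits)
{
    std::string result;
    const Node* p = root;                            // start at the root
    for (std::string::size_type i = 0; i < bits.size(); ++i)
    {
        p = (bits[i] == '0') ? p->left : p->right;   // follow the labeled arc
        if (p->left == 0 && p->right == 0)           // reached a leaf
        {
            result += p->ch;                         // emit its character
            p = root;                                // and start over
        }
    }
    return result;
}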
Huffman Decoding Algorithm

For message string 0001011010,
using the Huffman tree and the decoding algorithm:

000  ->  B
1    ->  E
011  ->  A
010  ->  D

The decoded message is BEAD.
Implementing a Huffman Code Program

Let’s implement a program to build a
Huffman code tree.


Encode and decode text messages using the
resulting Huffman code.
Limit input to letters and spaces.

Convert letters to lower case.
Implementing a Huffman Code Program

In order to create a Huffman code for English
text, we need weighting factors for the letters.


Frequency tables are readily available.
To simplify testing and debugging, start with
a small example:

Just the letters A, B, C, D, and E
Getting Started

Create a new empty C++ project in
Visual Studio, Huffman_Code


or a directory in Unix.
Add a C++ code file main.cpp
main.cpp
#include <iostream>
using namespace std;

int main(void)
{
    cout << "This is the Huffman Code program" << endl;
    cin.get();
    cin.get();
    return 0;
}
Build and test
Program Running
Class Char_Freq

We need a class to hold the elements of
a Huffman tree.

Data
    Character
    Frequency (probability of occurrence)

Pointers
    Left child
    Right child

Add class Char_Freq
Char_Freq.h
#pragma once
#include <iostream>
using std::ostream;

class Char_Freq
{
private:
    char ch;
    double freq;
    Char_Freq* left;
    Char_Freq* right;
public:
    Char_Freq(void);
    Char_Freq(char c, double f);
    Char_Freq(char c, double f, Char_Freq* Left, Char_Freq* Right);
    char Ch() const { return ch; }
    double Freq() const { return freq; }
    bool operator<(const Char_Freq& rhs) const;
    friend ostream& operator<< (ostream& os, const Char_Freq& cf);
};
Char_Freq.cpp
#include "Char_Freq.h"
Char_Freq::Char_Freq(void)
{}
Char_Freq::Char_Freq(char c, double f) :
ch(c), freq(f), left(0), right(0)
{}
Char_Freq::Char_Freq(char c, double f, Char_Freq* Left, Char_Freq* Right) :
ch(c), freq(f), left(Left), right(Right)
{}
bool Char_Freq::operator<(const Char_Freq& rhs) const
{
return this->freq < rhs.freq;
}
ostream& operator<< (ostream& os, const Char_Freq& cf)
{
os << cf.ch << " " << cf.freq;
return os;
}
The Huffman Tree


Add class Huffman_Tree
Will hold code to build and access the
Huffman code for a specific set of
characters and frequencies.
Starting the Huffman Tree


We will build multiple trees of Char_Freq
elements.

Keep the roots in a list.
    Use the Standard Template Library list class.

Initially one tree per character to be coded.
    Each tree consists of a root only.

Method Add() will be used to add character-frequency
pairs to the list.
Huffman_Tree.h
#pragma once
#include <list>
#include "Char_Freq.h"

class Huffman_Tree
{
public:
    Huffman_Tree(void);
    ~Huffman_Tree(void) {};

    // Add a single node tree to the list.
    void Add(char c, double frequency);
    void Display_List(void);
private:
    std::list<Char_Freq> node_list;
};
Huffman_Tree.cpp
#include <iostream>
#include <string>
#include "Huffman_Tree.h"
using namespace std;

Huffman_Tree::Huffman_Tree(void)
{}

void Huffman_Tree::Add(char c, double frequency)
{
    Char_Freq cf(c, frequency);
    node_list.push_back(cf);
}
Huffman_Tree.cpp
void Huffman_Tree::Display_List(void)
{
    cout << "Character frequency list:" << endl;
    list<Char_Freq>::iterator itr;
    for (itr = node_list.begin(); itr != node_list.end(); ++itr)
    {
        cout << *itr << endl;
    }
}
main.cpp
#include <iostream>
#include <string>
#include "Huffman_Tree.h"
using namespace std;

Huffman_Tree huffman_tree;

int main(void)
{
    cout << "This is the Huffman code program.\n\n";

    huffman_tree.Add('a', 0.2);
    huffman_tree.Add('b', 0.1);
    huffman_tree.Add('c', 0.1);
    huffman_tree.Add('d', 0.15);
    huffman_tree.Add('e', 0.45);

    huffman_tree.Display_List();

    cin.get();
    cin.get();
    return 0;
}
Program in Action
Implementing Huffman’s Algorithm




Huffman’s algorithm requires us to identify the two
trees with minimal frequencies.
To do this we can sort the list.
The < operator for the Char_Freq class
compares the frequency values.
So the sort method of the list template class
will sort the trees into increasing order by
frequency.
Implementing Huffman’s Algorithm


Add function Make_Decode_Tree to class
Huffman_Tree.
Repeatedly:

    Sort the list of trees by frequency.
    Remove the first two trees.
    Create a new node with these trees as subtrees.
        Its frequency is the sum of their frequencies.
    Add the new node to the list.

Continue until there is only one node on the list.
Huffman_Tree.h

Add new public method:
void Make_Decode_Tree(void);
Huffman_Tree.cpp

Start by sorting the list.

Display the sorted list.
void Huffman_Tree::Make_Decode_Tree(void)
{
    node_list.sort();
    cout << "\nSorted list:\n";
    Display_List();
}
main.cpp

Add a call to Make_Decode_Tree.
int main(void)
{
    cout << "This is the Huffman code program.\n";

    huffman_tree.Add('a', 0.2);
    huffman_tree.Add('b', 0.1);
    huffman_tree.Add('c', 0.1);
    huffman_tree.Add('d', 0.15);
    huffman_tree.Add('e', 0.45);

    huffman_tree.Display_List();
    huffman_tree.Make_Decode_Tree();

    cin.get();
    cin.get();
    return 0;
}
Program in Action
Huffman_Tree.cpp
Add to function Make_Decode_Tree()
while (node_list.size() > 1)
{
    // Take the two trees with the smallest frequencies.
    Char_Freq* cf1 = new Char_Freq(node_list.front());
    node_list.pop_front();
    Char_Freq* cf2 = new Char_Freq(node_list.front());
    node_list.pop_front();

    // Merge them under a new node whose frequency is the sum of theirs.
    Char_Freq cf3(0, cf1->Freq() + cf2->Freq(), cf1, cf2);
    node_list.push_back(cf3);
    node_list.sort();
}
This is the essence of Huffman’s algorithm!
Huffman_Tree.h

Add a new private member variable to
class Huffman_Tree to hold the root of
the tree.
private:
    std::list<Char_Freq> node_list;
    Char_Freq decode_tree_root;
};
Huffman_Tree.cpp

In order to check our results we need to
be able to display the tree.


Also show the code as a list.
Add public functions to Huffman_Tree.h:
void Display_Decode_Tree(Char_Freq* cf, int indent) const;
void Display_Code(Char_Freq* cf, std::string prefix) const;

Add at top of Huffman_Tree.cpp:
#include <iomanip>
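The body of Display_Code is not shown in the slides reproduced here. A minimal sketch of what it could look like, relying on the friend declaration added on the following slides (the deck's actual version may differ):

void Huffman_Tree::Display_Code(Char_Freq* cf, std::string prefix) const
{
    // A leaf holds a real character: print it with the accumulated bit string.
    if (cf->left == 0 && cf->right == 0)
    {
        cout << cf->Ch() << "  " << prefix << endl;
        return;
    }
    if (cf->left != 0)
    {
        Display_Code(cf->left, prefix + "0");
    }
    if (cf->right != 0)
    {
        Display_Code(cf->right, prefix + "1");
    }
}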
Display_Decode_Tree()
void Huffman_Tree::Display_Decode_Tree(Char_Freq* cf, int indent) const
{
    if (cf->left != 0)
    {
        Display_Decode_Tree(cf->left, indent + 8);
    }
    cout << setw(indent) << " " << *cf << endl;
    if (cf->right != 0)
    {
        Display_Decode_Tree(cf->right, indent + 8);
    }
}


Note access of private members of cf.
Make class Huffman_Tree a friend of class Char_Freq.
Char_Freq.h

Add at the end of Char_Freq.h:
    bool operator<(const Char_Freq& rhs) const;
    friend ostream& operator<< (ostream& os, const Char_Freq& cf);
    friend class Huffman_Tree;
};
Char_Freq.cpp

Update << to handle merged nodes

ch will be 0
ostream& operator<< (ostream& os, const Char_Freq& cf)
{
    if (cf.ch > 0)
    {
        os << cf.ch << " " << cf.freq;
    }
    else
    {
        os << '*' << " " << cf.freq;
    }
    return os;
}
Huffman_Tree.cpp

Add at the end of function Make_Decode_Tree()
decode_tree_root = node_list.front();
cout << endl << "The Huffman Tree" << endl;
Display_Decode_Tree(&decode_tree_root, 0);
Program in Action
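The program's stated goals also include encoding messages with the resulting code, which the slides shown here do not reach. As a hedged sketch of one way that step could look, here are hypothetical additions to class Huffman_Tree (these member functions, their declarations, and the extra include are illustrative, not from the original deck):

// Hypothetical additions to Huffman_Tree; declare them in Huffman_Tree.h
// and add #include <map> at the top of Huffman_Tree.cpp.

// Build a map from each character to its bit string by walking the tree,
// accumulating 0 for a left branch and 1 for a right branch.
void Huffman_Tree::Build_Code_Map(Char_Freq* cf, std::string prefix,
                                  std::map<char, std::string>& codes) const
{
    if (cf->left == 0 && cf->right == 0)
    {
        codes[cf->Ch()] = prefix;          // leaf: record this character's code
        return;
    }
    if (cf->left != 0)  Build_Code_Map(cf->left,  prefix + "0", codes);
    if (cf->right != 0) Build_Code_Map(cf->right, prefix + "1", codes);
}

// Encode a message by concatenating the code for each character.
std::string Huffman_Tree::Encode(const std::string& message)
{
    std::map<char, std::string> codes;
    Build_Code_Map(&decode_tree_root, "", codes);

    std::string bits;
    for (std::string::size_type i = 0; i < message.size(); ++i)
    {
        bits += codes[message[i]];         // characters without a code add nothing
    }
    return bits;
}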