Huffman Compression
Steps for the Huffman Compression
1) Write down all of the letters appearing in the text in a list, from most frequent to
least frequent, ignoring spaces. For each letter, put the frequency as a subscript (it will
look like a Scrabble tile). For our example (which was DANIEL LEWIS DREIBELBIS), you would get:
E4  I4  L3  D2  S2  B2  A1  N1  R1  W1
2) Combine the two letters with the smallest frequencies. If there is more than one way
to pick your two letters, then make a choice. This will give you a token of two letters,
and you will put the sum of the frequencies as its subscript. For our example, we would
combine R1 and W1 to get RW2.
3) Repeat step 2), always combining the tokens with the smallest frequencies. While
combining these tokens, we will draw a chart to keep track of how we did it (see the
Huffman chart).
4) Continue until there is only one token left which contains all of the letters. See the
Huffman chart to see what you should end up with.
5) The result is a tree that describes how all of the letters were combined. Redraw this
tree, this time with the combined token on top and all the branches going down:
                EILDSBANRW
               /          \
           EILD            SBANRW
          /    \          /      \
        EI      LD      SB        ANRW
       /  \    /  \    /  \      /    \
      E    I  L    D  S    B   AN      RW
                              /  \    /  \
                             A    N  R    W
6) Now, for each letter, assign the code that tells how you would traverse the tree to reach
that letter. Use a “0” to represent a left branch and a “1” to represent a right branch. For
instance, to get to the letter “B”, you must first go right, then left, then right again, and so
the code for “B” will be “101”.
The complete code for our example is:
Letter  Frequency  Code
E       4          000
I       4          001
L       3          010
D       2          011
S       2          100
B       2          101
A       1          1100
N       1          1101
R       1          1110
W       1          1111
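Steps 1) through 6) can also be sketched in a few lines of Python. This is only a minimal sketch, not the way the handout's chart was drawn: the function name huffman_codes is made up for illustration, and it uses Python's heapq module to pick the two smallest tokens each time. Whenever there is a tie it simply takes whichever token entered the heap first, so the individual codes it produces may differ from the table above. The total number of bits for our example comes out the same either way.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(text):
    """Return {letter: bit string} for the letters in text (spaces are skipped)."""
    # Step 1: count how often each letter appears.
    freq = Counter(ch for ch in text if ch != " ")
    if not freq:
        return {}
    # Tie-breaker (an assumption of this sketch): equal frequencies are
    # ordered by when the token was created, so trees are never compared.
    order = count()
    # A tree is either a single letter or a (left, right) pair of trees.
    heap = [(f, next(order), letter) for letter, f in freq.items()]
    heapq.heapify(heap)
    # Steps 2-4: keep combining the two smallest tokens until one is left.
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(order), (left, right)))
    codes = {}

    # Step 6: 0 for a left branch, 1 for a right branch.
    def walk(tree, prefix):
        if isinstance(tree, str):
            codes[tree] = prefix or "0"  # a one-letter text still needs a bit
        else:
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")

    walk(heap[0][2], "")
    return codes


text = "DANIEL LEWIS DREIBELBIS"
codes = huffman_codes(text)
total_bits = sum(len(codes[ch]) for ch in text if ch != " ")
print(total_bits)  # 67
```

Because no letter sits above another letter in the tree, no code is the beginning of another code, which is what lets the bits be read back without separators between letters.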
The final result for DANIEL LEWIS DREIBELBIS:
01111001101001000010 0100001111001100 0111110000001101000010101001100
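To check a result like this one, the encoding can be reproduced mechanically. The sketch below copies the code table from the handout into a dictionary and encodes one word at a time. The names CODES, encode_word and encode are made up for illustration, and keeping a space between the encoded words is just how the result above is displayed, not part of the code itself.

```python
# Code table copied from the handout.
CODES = {
    "E": "000", "I": "001", "L": "010", "D": "011", "S": "100",
    "B": "101", "A": "1100", "N": "1101", "R": "1110", "W": "1111",
}


def encode_word(word):
    """Concatenate the code of each letter in the word."""
    return "".join(CODES[letter] for letter in word)


def encode(text):
    """Encode each word separately, keeping the spaces between words."""
    return " ".join(encode_word(word) for word in text.split(" "))


encoded = encode("DANIEL LEWIS DREIBELBIS")
print(encoded)
bits = len(encoded.replace(" ", ""))
print(bits)  # 67
```

For comparison, a fixed-length code for ten different letters needs 4 bits per letter, which would be 84 bits for these 21 letters.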
To read the code, take out your tree and use the sequence of zeros and ones as a set of
directions. So if you saw 0111110, the first three digits will give you directions to the
letter “D”, and the final four digits will give you directions to the letter “R”.
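The same reading procedure can be sketched in Python. Here the handout's tree is written as nested pairs, (left branch, right branch), and each digit picks one branch. The names TREE and decode are made up for illustration, and the sketch assumes the input contains only the digits 0 and 1 for a single word.

```python
# The handout's tree as nested pairs: (left branch, right branch).
TREE = (
    (("E", "I"), ("L", "D")),
    (("S", "B"), (("A", "N"), ("R", "W"))),
)


def decode(bits):
    """Follow 0 = left, 1 = right from the top; emit a letter at each leaf."""
    letters = []
    node = TREE
    for bit in bits:
        node = node[int(bit)]
        if isinstance(node, str):  # reached a letter
            letters.append(node)
            node = TREE            # start again from the top of the tree
    if node is not TREE:
        raise ValueError("bit string ends in the middle of a code")
    return "".join(letters)


print(decode("0111110"))  # DR
```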