Example: Shannon-Fano Coding

To create a code tree according to Shannon and Fano, an ordered table listing the frequency of every symbol is required. The table is divided into two segments, and the algorithm has to ensure that the upper and the lower part of each segment have nearly the same sum of frequencies. This procedure is repeated until only single symbols are left.

Symbol   Frequency   Code Length   Code   Total Length
-------------------------------------------------------
A           24            2         00         48
B           12            2         01         24
C           10            2         10         20
D            8            3         110        24
E            8            3         111        24
-------------------------------------------------------
total: 62 symbols
SF coded: 140 bit
linear (3 bit/symbol): 186 bit

The original data can be coded with an average length of 2.26 bit, whereas linear coding of 5 symbols would require 3 bit per symbol. But before a Shannon-Fano code tree can be generated, the frequency table must be known or must be derived from preceding data.

Algorithm

Encoding:

1. Create a table listing the frequency of every symbol.
2. Sort the symbols by frequency in descending order.
3. Start with the entire table as the first segment.
4. Division:
   - Set pointers to the first and the last symbol of the segment.
   - Divide the segment into two parts, both nearly equal in their sum of frequencies.
   - Add a binary 0 to the code words of the upper part and a 1 to those of the lower part.
   - Search for the next segment containing more than one symbol and repeat the division.
5. Code the original data according to the code words in the table.

A code sketch of this procedure is given after the step-by-step construction below. The decoding process follows the general algorithm for the interpretation of binary code trees.

Interpretation of Code Trees

A particular code word is created by running from the root node to the symbol's leaf node. Any left-sided branch adds a 0 to the code word; every right-sided branch adds a binary 1. The required number of steps, i.e. the depth of this part of the code tree, is equal to the code length. The example mentioned above results in the following code representing the three symbols "a", "b" and "c":

0  -- a
10 -- b
11 -- c

Characteristics of Code Trees

Any node in a binary code tree has only one predecessor and two successors. If symbols are assigned only to leaf nodes, while interior nodes merely construct the tree, it is guaranteed that no code word can be the prefix of another code word. Such a tree satisfies the prefix property. Code trees can be constructed in such a way that the probability of occurrence of the symbols is represented by the tree structure. A widespread procedure is the coding scheme developed by David Huffman. Binary trees only allow code lengths graduated in whole bits; more precise solutions are offered by arithmetic coding.

Step-by-Step Construction

                 1. Step     2. Step     3. Step
Symbol   Freq.   Sum  Code   Sum  Code   Sum  Code
----------------------------------------------------
A         24      24   0      24   00
B         12      36   0      12   01
C         10      26   1      10   10
D          8      16   1      16   11     16   110
E          8       8   1       8   11      8   111
----------------------------------------------------

In each step the sums accumulate the frequencies from both ends of the current segment towards the prospective division point.

[Figures: the code trees resulting from steps 1, 2 and 3.]
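The division procedure lends itself to a short recursive implementation. The following Python sketch is illustrative rather than canonical: the function name shannon_fano and the tie-breaking rule (on equal differences, prefer the larger upper part, which reproduces the 21/14 split in the example of a later section) are assumptions, not part of the original text. It reproduces the code words of the step-by-step construction:

def shannon_fano(symbols):
    """Build a Shannon-Fano code table.

    symbols: list of (symbol, frequency) pairs, already sorted by
    frequency in descending order.  Returns a dict mapping each symbol
    to its code word as a string of '0'/'1' characters.
    """
    codes = {sym: "" for sym, _ in symbols}

    def divide(segment):
        if len(segment) < 2:               # single symbols are finished leaves
            return
        total = sum(freq for _, freq in segment)
        # Find the division that makes both parts nearly equal in their
        # sum of frequencies; ties go to the larger upper part (assumed).
        running, best_split, best_diff = 0, 1, None
        for i in range(len(segment) - 1):
            running += segment[i][1]
            diff = abs(2 * running - total)
            if best_diff is None or diff <= best_diff:
                best_split, best_diff = i + 1, diff
        upper, lower = segment[:best_split], segment[best_split:]
        for sym, _ in upper:               # binary 0 for the upper part
            codes[sym] += "0"
        for sym, _ in lower:               # binary 1 for the lower part
            codes[sym] += "1"
        divide(upper)                      # repeat until only single
        divide(lower)                      # symbols are left

    divide(symbols)
    return codes

print(shannon_fano([("A", 24), ("B", 12), ("C", 10), ("D", 8), ("E", 8)]))
# -> {'A': '00', 'B': '01', 'C': '10', 'D': '110', 'E': '111'}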
Shannon-Fano versus Huffman

The question is whether another method would provide better code efficiency. According to information theory, a perfect code for this distribution would offer an average code length of 2.176 bit, or 134.882 bit in total. For comparison, the former example is also encoded with the Huffman algorithm:

                 Shannon-Fano         Huffman
Sym.   Freq.   code  len.  tot.    code  len.  tot.
-----------------------------------------------------
A        24     00    2     48      0     1     24
B        12     01    2     24      100   3     36
C        10     10    2     20      101   3     30
D         8     110   3     24      110   3     24
E         8     111   3     24      111   3     24
-----------------------------------------------------
total                       140                 138
linear (3 bit code): 186

The Shannon-Fano code does not offer the best code efficiency for this exemplary data structure, though this is not necessarily the case for every frequency distribution. At best, Shannon-Fano coding provides a result equal to Huffman coding; it will never surpass it. The optimum of 134.882 bit is reached by neither.

Another Example

Applying the Shannon-Fano algorithm to the file with variable symbol frequencies cited earlier, we get the result below. The first dividing line is placed between the ‘B’ and the ‘C’, assigning a count of 21 to the upper group and 14 to the lower group, which is the closest to half. This means that ‘A’ and ‘B’ will have a binary code starting with 0, while ‘C’, ‘D’, and ‘E’ will have binary codes starting with 1, as shown. Applying the recursive definition of the algorithm, the next division occurs between ‘A’ and ‘B’, putting ‘A’ on a leaf with code 00 and ‘B’ on another leaf with code 01. After four divisions have been performed, the three most frequent symbols have a 2-bit code, while the remaining, rarer symbols have 3-bit codes.

Symbol   Count   1st Division   2nd Division   3rd Division   4th Division
---------------------------------------------------------------------------
A          14         0              0
B           7         0              1
C           5         1                              0
D           5         1                              1              0
E           4         1                              1              1
---------------------------------------------------------------------------

The binary tree resulting from applying the Shannon-Fano algorithm to the aforementioned text file differs slightly from the one obtained by applying Huffman coding to the same character set.

[Figure: the Shannon-Fano code tree for this character set.]

Note that, as with the Huffman tree, since all the characters are situated at the leaves of the tree, the possibility that one code is a prefix of another code is altogether avoided. The table below shows the resulting bit codes for each character:

Symbol   Bit Code
------------------
A           00
B           01
C           10
D           110
E           111
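Because no code word is the prefix of another, a coded bit stream can be read bit by bit: each bit descends one branch of the code tree, and reaching a leaf yields the next symbol unambiguously. The following Python sketch illustrates this; the function name decode and the sample bit string are assumed for illustration and are not part of the original text:

def decode(bits, codes):
    """Decode a bit string with a prefix-free code table.

    codes maps each symbol to its code word.  Because of the prefix
    property, the first code word that matches while reading bit by
    bit is always the correct one.
    """
    leaf = {code: sym for sym, code in codes.items()}
    symbols, word = [], ""
    for bit in bits:
        word += bit           # descend one branch of the code tree
        if word in leaf:      # a leaf is reached: emit its symbol
            symbols.append(leaf[word])
            word = ""         # continue at the root
    return "".join(symbols)

codes = {"A": "00", "B": "01", "C": "10", "D": "110", "E": "111"}
print(decode("0001110111", codes))   # -> "ABDE"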