Binary trees Abstract This technical note describes a method to store a binary tree within a bit pattern. This can be used in compression systems to store huffman codes etc. Introduction To store compressed data in many cases binary trees are common, also in higher methods, where Huffman’s coding is used only as an supplementary method. We show here an alternate method to store the information required to build a Huffman – or more general a binary – tree. This is done by use of a bit pattern, where each inner node is coded by a binary 1, and each leave by a zero. We show that the original tree can be retrieved in a unique way. Foundations A binary tree consists of nodes, where each node specifies no or two successors. A node is called inner node (or usually only node), if it has successors. A node with no successors is called a leave. A tree is defined recursively by: 1. A binary tree has a unique node, called a root. A root can be an inner node or a leave. 2. If a binary tree is given and a leave is replaced by a node with two leaves, where the two leaves are no members of the binary tree, then we get another binary tree. The second condition states, that the connection between a node and its successors can never lead to a node of the binary tree. We state the following simply observation: Observation 1: In a binary tree the number of leaves and inner nodes differ by one. This holds obviously for a binary tree with one node, which is the root and a leave. Let it hold for any binary tree with less than n nodes. To get a binary tree with n+2 nodes, we can replace a leave by a node with two leaves as successors, where the number of nodes is increased by one, where the number of leaves is decreased by one and increased by two, i.e. increased by one leave. Thus the difference between the number of nodes and leaves remains unchanged. The proof also shows that the number of nodes in a binary tree is always uneven. We define the left subtree of a node as the tree consisting of the left successor of this tree (as root) and all its successors. In the same way we define the right subtree. Left depth search We define now the left depth search as a sequence of all nodes in a binary tree. Left depth search: The left depth search of a binary tree is the following sequence: 1. The root is the first element of the left depth search. 2. The left depth search of the left subtree of the root are the next elements of the sequence. 3. The left depth search of the right subtree of the root are the final elements of the sequence. This sequence defines the structure of the binary completely, if inner nodes and leaves are distinguished. To generate this sequence, a simple recursive function can be used. The following function leftDepthSearch prints the Names (which are here numbers) of the left depth search sequence of a binary tree. struct node {struct node *left, *right; int Name}* start; // start points to the first root of the binary tree. leftDepthSearch(struct node * start) { printf(“%d,”,start->Name); if(start->left!=NULL) leftDepthSearch(start->left) if(start->right!=NULL) leftDepthSearch(start->right) } Since the left depth search sequence defines the structure of the tree, we simply have to distinguish between inner nodes and leaves by using bits 1 for inner nodes and bits 0 for leaves. Thus the following function produces a corresponding sequence. leftDepthSearchNodeLeave(struct node * start) { if(start->left==NULL && start->right==NULL) // this is a leave printf(“0”); else // this is a node printf(“1”); if(start->left!=NULL) leftDepthSearch(start->left) if(start->right!=NULL) leftDepthSearch(start->right) } Construct a tree Now let us assume we read a sequence of bits (0,1) and we want to construct a binary tree from them. This can be done recursively as well. Let nextBit return 0 or 1, we use the following recursive function Int countNodes=0, countLeaves=0; struct node * makeTree() { int next = nextBit(); struct node * nextNode = NewNode(); if(next==0) { countLeaves++; nextNode ->left = NULL; nextNode ->right = NULL; return nextNode; } // next must be 1, i.e. generate an inner node with two successors countNodes++; nextNode->left = makeTree(); nextNode->right = makeTree(); return nextNode; } struct node * start; int main(){ start = makeTree(); } If the first number read by nextBit() is 0, then this program enters the first if-condition, sets the left, right variables to NULL and exits. This is okay, since the binary tree consists of a leave and nothing else. There cannot be an empty binary tree, since the number of nodes is always odd, i.e. never zero. If the first number read by nextBit() is 1, then this program assigns the (by makeTree()) newly created subtrees at first to the left, then to the right variable. The first call to makeTree() consumes exactly the left depth search sequence of the original subtree. By induction, its structure must be identical of that left subtree. The same holds for the right subtree. It should be clear, that the program stops automatically when the left depth search sequence is completely consumed. Example: The tree might be: 1 2 3 7 4 5 8 9 6 The green root is coded by 1, the left yellow node by 2, the most left blue leave by 3 etc.; so we get the left depth search sequence as: 123456789. The corresponding bit sequence for inner nodes(=1) and leaves (=0) yields: 110100100. The first call to makeTree() calls nextBit(),reads at first 1 and generates a node, the left successor to which a newly generated subtree is assigned by another call to makeTree(). This reads the second bit, a 1, and generates a new subtree by a call to makeTree(). Now, the third call to nextBit() reads 0, so a leave is constructed and assigned to the left variable of node 2. Then node 2 calls makeTree() again, which reads the fourth bit, a 1 (for node 4). Thus this calls makeTree() again, which reads the fifth bit, a 0. Thus a leave (5) is returned to node 4, which is assigned to its left variable. Then node 4 calls makeTree() again to find the next bit to be a 0. A leave (6) is returned and assigned to its right variable. Node 4 returns its reference, which is assigned in node 2 to its right variable. This node returns its reference, which is assigned by node 1 to its left variable. The next call in node 1 to makeTree() finds a bit 1, which generates two further calls to makeTree(), which yield references to two leaves and assignes them to the left and right variable of node 7. Node 7 returns its reference, which is assigned by node 1 to its right variable. Finally node 1 returns its reference to the main program, which assigns this to the start variable. That’s all, folks. The left depth search sequence has the property that the number of inner nodes (or 1) is always less than or equal to the number of leaves (or 0), if we sum from left to right. This means, that it is easy to find the number of nodes in a tree, if only the left depth search sequence is given. Scan the bits, add +1 for a 1 and –1 for a 0, and if the sum becomes –1, then this is the last node of the tree. int CountNodes() { int NodesInTree=0, countsNodes=0; while(countsNodes>=0) { NodesInTree++; if(nextBit()==0) countsNodes--; else countsNodes++; } return NodesInTree; } To show that the latter assumptions holds, we have to prove that this holds for each inner node. We know that a subtree of any node has the property that the number of leaves is one more than the number of nodes. We start with the root, for which countsNodes yields 1 if it is an inner node. The left subtree counts –1, so that after its scan countsNodes yields 0. The right subtree yields –1, so that countsNodes becomes –1. We show that after reading the bit for any inner node countsNodes is 1 or greater. This holds for the root. If the first node of the left subtree is a node, then countsNodes becomes 2. After reading this left subtree, countsNodes is 0. If the first node of the right subtree is an inner node, then countsNodes yields 1. Thus the assumption holds for the successor of the root, and it holds also for those successors. Application to Huffman Coding To apply this to Huffman coding, we have to assign to each leave the value of a symbol which is coded with that bit sequence. The easiest way is to use the recursive function to build a sequence of the symbols in left depth search (where only for the leaves the corresponding value is stored). Since generating of the binary tree yields the same sequence of leaves, we can use it directly to generate the corresponding symbols in the newly created tree. Storage size needed to store this information is the number of nodes (in bits) plus the number of symbols, which might be coded to save some space. In classical Huffman coding we have to store either only the frequency of the values (where their index gives the symbol value). However, here unused symbols must be stored by the frequency 0. In many ASCII-texts less than 80 symbols are used. If 256 bytes are required to store the frequencies, we can do this with 80 + 161/8=101 bytes. This saves some bytes, which is the goal of compression anyhow.