Binary trees

Abstract
This technical note describes a method for storing a binary tree as a bit pattern. It can be used in
compression systems, for example to store Huffman codes.
Introduction
Binary trees are commonly used to store compressed data, also in more advanced methods where
Huffman coding serves only as a supplementary step. We show here an alternative method to
store the information required to build a Huffman tree or, more generally, any binary tree. This is
done with a bit pattern in which each inner node is coded by a binary 1 and each leaf by a 0. We
show that the original tree can be retrieved in a unique way.
Foundations
A binary tree consists of nodes, where each node has either no successors or exactly two. A node is
called an inner node (or often simply a node) if it has successors. A node with no successors is called
a leaf. A binary tree is defined recursively by:
1. A binary tree has a unique node called the root. The root can be an inner node or a leaf.
2. If a binary tree is given and a leaf is replaced by a node with two leaves, where the two new
leaves are not already members of the binary tree, then we get another binary tree.
The second condition states that the connection between a node and its successors can never lead
back to a node that is already part of the binary tree.
We state the following simple observation:
Observation 1: In a binary tree the number of leaves exceeds the number of inner nodes by exactly one.
This obviously holds for a binary tree with one node, which is both the root and a leaf. Assume
it holds for any binary tree with n nodes. To get a binary tree with n+2 nodes, we replace
a leaf by a node with two leaves as successors. The number of inner nodes increases by
one, and the number of leaves decreases by one and increases by two, i.e. it increases by
one as well. Thus the difference between the number of leaves and inner nodes remains unchanged.
The proof also shows that the number of nodes in a binary tree is always odd.
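Observation 1 can be checked on a concrete tree with a short recursive count. The following is only a sketch: the helper name countKinds is hypothetical, and it assumes the node layout used in this note (left and right pointers plus an integer Name).

```c
#include <assert.h>
#include <stddef.h>

struct node { struct node *left, *right; int Name; };

/* Hypothetical helper: adds the number of inner nodes and leaves of the
   tree rooted at n to *inner and *leaves. n must not be NULL. */
void countKinds(const struct node *n, int *inner, int *leaves)
{
    assert(n != NULL);
    if (n->left == NULL && n->right == NULL) {
        (*leaves)++;                 /* no successors: a leaf */
        return;
    }
    /* By definition an inner node has exactly two successors. */
    assert(n->left != NULL && n->right != NULL);
    (*inner)++;
    countKinds(n->left, inner, leaves);
    countKinds(n->right, inner, leaves);
}
```

After a call with both counters initialised to zero, leaves should equal inner + 1 for every well-formed binary tree, and their sum is therefore odd.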
We define the left subtree of a node as the tree consisting of the left successor of this node (as root)
and all its successors. The right subtree is defined in the same way.
Left depth search
We now define the left depth search as a sequence of all nodes in a binary tree (this order is
commonly known as pre-order traversal).
Left depth search: The left depth search of a binary tree is the following sequence:
1. The root is the first element of the left depth search.
2. The left depth search of the left subtree of the root forms the next elements of the sequence.
3. The left depth search of the right subtree of the root forms the final elements of the sequence.
This sequence defines the structure of the binary tree completely, provided that inner nodes and
leaves are distinguished.
To generate this sequence, a simple recursive function can be used. The following function
leftDepthSearch prints the Names (which are here numbers) of the left depth search sequence of
a binary tree.
#include <stdio.h>
struct node { struct node *left, *right; int Name; } *start;
// start points to the root of the binary tree.
void leftDepthSearch(struct node *start)
{
    printf("%d,", start->Name);
    if (start->left != NULL) leftDepthSearch(start->left);
    if (start->right != NULL) leftDepthSearch(start->right);
}
Since the left depth search sequence defines the structure of the tree, we simply have to distinguish
between inner nodes and leaves by using bits 1 for inner nodes and bits 0 for leaves. Thus the
following function produces a corresponding sequence.
void leftDepthSearchNodeLeave(struct node *start)
{
    if (start->left == NULL && start->right == NULL) // this is a leaf
        printf("0");
    else // this is an inner node
        printf("1");
    // recurse into this function itself, so that the subtrees are printed as bits too
    if (start->left != NULL) leftDepthSearchNodeLeave(start->left);
    if (start->right != NULL) leftDepthSearchNodeLeave(start->right);
}
Construct a tree
Now let us assume we read a sequence of bits (0, 1) and want to construct a binary tree from them.
This can be done recursively as well. Let nextBit() return the next bit (0 or 1) and let NewNode()
return a freshly allocated node; then we can use the following recursive function:
int countNodes = 0, countLeaves = 0;
struct node *makeTree()
{
    int next = nextBit();
    struct node *nextNode = NewNode();
    if (next == 0)
    {   // a leaf has no successors
        countLeaves++;
        nextNode->left = NULL;
        nextNode->right = NULL;
        return nextNode;
    }
    // next must be 1, i.e. generate an inner node with two successors
    countNodes++;
    nextNode->left = makeTree();
    nextNode->right = makeTree();
    return nextNode;
}
struct node *start;
int main()
{
    start = makeTree();
}
If the first bit read by nextBit() is 0, the program enters the if branch, sets the
left and right variables to NULL and returns. This is correct, since the binary tree then consists
of a single leaf and nothing else. There cannot be an empty binary tree, since the number of nodes is
always odd, i.e. never zero.
If the first bit read by nextBit() is 1, the program assigns the subtrees newly created by
makeTree() first to the left and then to the right variable. The first call to makeTree()
consumes exactly the left depth search sequence of the original left subtree. By induction, its
structure must be identical to that of the original left subtree. The same holds for the right subtree.
It should be clear that the program stops automatically when the left depth search sequence is
completely consumed.
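The note leaves nextBit() and NewNode() undefined. As a minimal sketch of one way to fill them in, the variant below reads the bits from a C string of '0' and '1' characters and allocates nodes with malloc. The names bitReader, readBit, freeTree and makeTreeFrom are hypothetical and not part of the original program. Unlike makeTree(), this version reports truncated input by returning NULL.

```c
#include <assert.h>
#include <stdlib.h>

struct node { struct node *left, *right; int Name; };

/* Hypothetical bit source: a string of '0'/'1' characters and a read position. */
struct bitReader { const char *bits; size_t pos; };

/* Returns the next bit (0 or 1), or -1 if the input is exhausted or invalid. */
int readBit(struct bitReader *r)
{
    assert(r != NULL && r->bits != NULL);
    char c = r->bits[r->pos];
    if (c != '0' && c != '1') return -1;
    r->pos++;
    return c - '0';
}

/* Releases a tree built by makeTreeFrom; accepts NULL. */
void freeTree(struct node *n)
{
    if (n == NULL) return;
    freeTree(n->left);
    freeTree(n->right);
    free(n);
}

/* Rebuilds a tree from its bit pattern. Returns NULL if the input ends
   before the tree is complete or if an allocation fails. */
struct node *makeTreeFrom(struct bitReader *r)
{
    int bit = readBit(r);
    if (bit < 0) return NULL;
    struct node *n = malloc(sizeof *n);
    if (n == NULL) return NULL;
    n->left = NULL;
    n->right = NULL;
    n->Name = 0;
    if (bit == 0) return n;              /* 0: a leaf, nothing more to read */
    n->left = makeTreeFrom(r);           /* 1: read the left subtree first */
    n->right = (n->left != NULL) ? makeTreeFrom(r) : NULL;
    if (n->left == NULL || n->right == NULL) {
        freeTree(n);                     /* incomplete subtree: give up cleanly */
        return NULL;
    }
    return n;
}
```

After a successful call, r->pos equals the number of nodes in the tree, which illustrates that the recursion consumes exactly one bit per node and then stops.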
Example:
The tree might be:
            1
          /   \
        2       7
       / \     / \
      3   4   8   9
         / \
        5   6
The root (green in the original figure) is labeled 1, the inner node to its left (yellow) 2, the leftmost
leaf (blue) 3, etc., so the left depth search sequence is 123456789. The corresponding bit sequence,
with 1 for inner nodes and 0 for leaves, is 110100100.
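The bit sequence of this example can be reproduced by writing the bits into a buffer instead of printing them. This is a sketch under assumptions: encodeTree and encodeExample are hypothetical names, and the tree shape is the one described in this example (node 2 with successors 3 and 4, node 4 with 5 and 6, node 7 with 8 and 9).

```c
#include <assert.h>
#include <stddef.h>

struct node { struct node *left, *right; int Name; };

/* Writes the bit pattern of the tree rooted at n into out, starting at
   index len, keeps out NUL-terminated and returns the new length.
   The caller must provide room for one character per node plus one. */
size_t encodeTree(const struct node *n, char *out, size_t len)
{
    assert(n != NULL);
    if (n->left == NULL && n->right == NULL) {
        out[len++] = '0';                         /* leaf */
    } else {
        out[len++] = '1';                         /* inner node */
        len = encodeTree(n->left, out, len);      /* left subtree first */
        len = encodeTree(n->right, out, len);     /* then the right subtree */
    }
    out[len] = '\0';
    return len;
}

/* Builds the nine-node example tree and encodes it; out needs 10 chars. */
size_t encodeExample(char *out)
{
    struct node n3 = {NULL, NULL, 3}, n5 = {NULL, NULL, 5}, n6 = {NULL, NULL, 6};
    struct node n8 = {NULL, NULL, 8}, n9 = {NULL, NULL, 9};
    struct node n4 = {&n5, &n6, 4}, n7 = {&n8, &n9, 7};
    struct node n2 = {&n3, &n4, 2}, n1 = {&n2, &n7, 1};
    return encodeTree(&n1, out, 0);
}
```

Calling encodeExample with a buffer of at least 10 characters fills it with 110100100 and returns 9, one character per node.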
The first call to makeTree() calls nextBit(), reads a 1 and generates a node (node 1), whose left
successor is assigned a newly generated subtree by another call to makeTree(). This call reads the
second bit, a 1, generates node 2 and calls makeTree() again for its left subtree. This third call reads
a 0, so a leaf (3) is constructed and assigned to the left variable of node 2. Then node 2 calls
makeTree() again, which reads the fourth bit, a 1 (for node 4). Node 4 calls makeTree() again,
which reads the fifth bit, a 0. Thus a leaf (5) is returned to node 4 and assigned to its left
variable. Then node 4 calls makeTree() again, which finds the next bit to be a 0. A leaf (6) is returned
and assigned to its right variable. Node 4 returns its reference, which node 2 assigns to its
right variable. Node 2 returns its reference, which node 1 assigns to its left variable.
The next call to makeTree() in node 1 finds a 1 bit (node 7), which triggers two further calls to
makeTree(). These return references to two leaves (8 and 9), which are assigned to the left and
right variables of node 7. Node 7 returns its reference, which node 1 assigns to its right variable.
Finally node 1 returns its reference to the main program, which assigns it to the start variable.
That’s all, folks.
The bit sequence of the left depth search has the property that, summing from left to right, the
number of leaves (0 bits) never exceeds the number of inner nodes (1 bits) until the very last bit,
after which there is exactly one more leaf than inner nodes. This means that it is easy to find the
number of nodes in a tree if only the bit sequence is given. Scan the bits, add +1 for a 1 and -1 for
a 0; as soon as the sum becomes -1, the bit just read belongs to the last node of the tree.
int CountNodes()
{
    // NodesInTree: nodes read so far; countsNodes: running sum (+1 for a 1, -1 for a 0)
    int NodesInTree = 0, countsNodes = 0;
    while (countsNodes >= 0)
    {
        NodesInTree++;
        if (nextBit() == 0) countsNodes--; else countsNodes++;
    }
    return NodesInTree;
}
To show that this assumption holds, we have to prove that the sum cannot become -1 before the last
node has been read. We know that any subtree has one more leaf than inner nodes, so scanning a
complete subtree changes countsNodes by -1. We start with the root, after which countsNodes is 1
if the root is an inner node. The left subtree contributes -1, so that after its scan countsNodes is 0.
The right subtree contributes -1 as well, so that countsNodes becomes -1 at the very end. It remains
to show that after reading the bit of any inner node countsNodes is 1 or greater. This holds for the
root. If the first node of the left subtree is an inner node, then countsNodes becomes 2. After this
left subtree has been read completely, countsNodes is 0. If the first node of the right subtree is
an inner node, then countsNodes becomes 1. Thus the assumption holds for the successors of the root,
and by the same argument it holds for their successors as well.
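The running sum can also be used to check that a bit string really contains a complete tree. The sketch below assumes the bits are given as a C string of '0' and '1' characters; the function name treePatternLength is hypothetical. In contrast to CountNodes(), it stops with an error value when the input ends before the sum reaches -1.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the number of nodes of the tree encoded at the start of bits,
   or 0 if the string ends (or contains another character) before the
   running sum reaches -1. Bits after the end of the tree are ignored. */
size_t treePatternLength(const char *bits)
{
    assert(bits != NULL);
    int sum = 0;          /* +1 for every inner node, -1 for every leaf */
    size_t i = 0;
    while (sum >= 0) {
        if (bits[i] == '1') sum++;
        else if (bits[i] == '0') sum--;
        else return 0;    /* input exhausted or invalid: incomplete tree */
        i++;
    }
    return i;
}
```

Because the sum stays at 0 or above until the last leaf, the first position at which it drops to -1 marks the end of the tree, so any data stored after the tree in the same bit stream can be read from there.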
Application to Huffman Coding
To apply this to Huffman coding, we have to assign to each leaf the value of the symbol that is
coded by the bit sequence leading to that leaf. The easiest way is to use the recursive function to
build a sequence of the symbols in left depth search order (where a value is stored only for the
leaves). Since rebuilding the binary tree visits the leaves in the same order, we can use this sequence
directly to assign the corresponding symbols in the newly created tree.
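A possible sketch of this symbol handling is shown below. It assumes that the Name field of a leaf holds the symbol value (a byte), which the note does not state explicitly; collectLeafSymbols and assignLeafSymbols are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

struct node { struct node *left, *right; int Name; };

/* Stores the symbols of all leaves in left depth search order in syms,
   starting at index count, and returns the new count. */
size_t collectLeafSymbols(const struct node *n, unsigned char *syms, size_t count)
{
    assert(n != NULL);
    if (n->left == NULL && n->right == NULL) {
        syms[count++] = (unsigned char)n->Name;   /* assumption: Name is the symbol */
        return count;
    }
    count = collectLeafSymbols(n->left, syms, count);
    return collectLeafSymbols(n->right, syms, count);
}

/* Writes the stored symbols back into the leaves of a rebuilt tree,
   visiting the leaves in the same order; returns the new count. */
size_t assignLeafSymbols(struct node *n, const unsigned char *syms, size_t count)
{
    assert(n != NULL);
    if (n->left == NULL && n->right == NULL) {
        n->Name = syms[count++];
        return count;
    }
    count = assignLeafSymbols(n->left, syms, count);
    return assignLeafSymbols(n->right, syms, count);
}
```

Both functions are called with count set to 0. The encoder stores the bit pattern followed by the collected symbols, and the decoder rebuilds the tree from the bit pattern and then calls assignLeafSymbols on it.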
The storage needed for this information is the number of nodes (in bits) plus the list of
symbols, which might itself be coded to save some space. In classical Huffman coding we have to store
the frequency of every value (where the index gives the symbol value). However, unused symbols
must then be stored with frequency 0. In many ASCII texts fewer than 80 symbols are
used. If 256 bytes are required to store the frequencies, we can do this with 80 + 159/8, i.e. about
100 bytes, since by Observation 1 a tree with 80 leaves has 79 inner nodes and therefore 159 nodes.
This saves some bytes, which is the goal of compression anyhow.
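The size estimate can be written as a small function. This is a sketch that assumes one byte per stored symbol and one bit per node, rounded up to whole bytes; the name treeHeaderBytes is hypothetical. By Observation 1 a tree with s leaves has s - 1 inner nodes, i.e. 2s - 1 nodes in total.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed to store the tree structure and the symbols of its leaves,
   assuming one byte per symbol and one bit per node. */
size_t treeHeaderBytes(size_t symbols)
{
    assert(symbols >= 1);                      /* there is no empty binary tree */
    size_t nodes = 2 * symbols - 1;            /* leaves plus inner nodes */
    size_t structureBytes = (nodes + 7) / 8;   /* bits rounded up to bytes */
    return symbols + structureBytes;
}
```

The function grows linearly with the number of symbols actually used, whereas a plain frequency table always needs one entry for each of the 256 possible byte values.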