9 Trees Another very important data structure is that of a tree. By definition: A tree may be empty or it may consist of a root and 0 or more subtrees. Each subtree is also a tree. A B D E C F H I 9.1 G J Definitions: For examples refer to above diagram Node: data stored by the tree (All the circles) Root Node the top most node from which all other nodes come from. Note that all nodes can be considered root node of their subtree. Ex. A is the root of the entire tree. B is the root of the subtree containing B,D,E and F Leaf Node A node with no subtrees Ex. D,E,F,I,J,and G are all leaf nodes Child: Root node of a subtree of a node is the child node Ex. B is the child of A. I is the child of H Parent : Opposite of a child node Ex. A is parent of B. H is the parent of I Siblings: All nodes that have the same parent node are siblings Ex. E and F are siblings of D but H is not Page 60 Ancestor: All nodes that can be reached by moving only in an upward direction in the tree. Ex. C, A and H are all ancestors of I but G and B are not. Descendants of a node are nodes that can be reached by only going down in the tree. Ex. Descendents of C are G,H,I and J Levels: Nodes in Level 0 of our tree is A. Nodes in Level 1 are B and C, Nodes in Level 2 are D,E,F,G and H. Nodes in Level 3 are I and J Height: number of levels in the tree (in our case 4) Path: Set of branches taken to connect an ancestor of a node to the node. Usually described by the set of nodes encountered along the path. Binary tree: A binary tree is a tree where every node has 2 subtrees that are also binary trees. The subtrees may be empty. Each node has a left child and a right child. The following are NOT trees. B D A A C B C E D 9.2 Tree Implementations By its nature, trees are typically implemented using a Node/link data structure like that of a linked list. In a linked list, each node has data and it has a next (and sometimes also a previous) pointer. In a tree each node has data and a number of node pointers that point to the node's children. However, you could also use an array based implementation for a tree. Each element would be an object containing data and several integers indicating the index of where the root of all its subtrees are. This is how we implemented the Heap. For BINARY TREES ONLY, you can use the array indexes as indication of a nodes level and parents. element 0 is root. element 1 and 2 are its children. 3 and 4 are children of element 1. 5 and 6 are children of 2. In other words, for node k, it left child is node 2k+1 and right child is 2k+2. Parent node k is node (k-1)/2. Array implementations are good only when the trees are complete or much space will be wasted. Page 61 9.3 Binary Trees A binary tree is a tree where every node has 2 subtrees that are also binary trees. The subtrees may be empty. The following is a binary tree: A B D C F H I G J The following isn't (B has 3 subtrees): A B D E C F H I G J Page 62 A binary tree that is ordered is called a binary search tree. This is not the same as a tree with the heap order property. A binary search tree is a binary tree where all values in left subtree < value in current node < values in right subtree. The following is NOT a binary search tree A B D C F H I G J The following IS a binary search tree: D B A H C F E I G A binary search tree structure can allow us to do searches quickly even using a linking structure (under certain conditions). To find a value, we simply start at the root and look at the value. If our key is less than the value, search left subtree. If key is greater than value search right subtree. It provides a way for us to do a "binary search" on a linked structure. TNode<TYPE>* BST::Search(TYPE key){ TNode* retval=NULL; TNode* curr=root_; while(retval && curr){ if(curr->data==key) retval=curr; else if(key < curr->data) curr=curr->left_; else curr=curr->right_; } return retval; } Page 63 9.3.1 Insertion To insert into a binary search tree, we must maintain the nodes in sorted order. There is only one place an item can go. Example Insert the letters D,B,C,E,A, and F in to an initially empty Binary search tree in the order listed. Insert D D Insert B. Must go on left as B < D D B Insert C. C < D so look at left subtree. C > B so go on right branch from B D B C Insert E. E > D. D's right subtree is empty so put E there. D E B C Page 64 Insert A. A < D. Look in left subtree. A < B so make it B's left subtree D E B A C Insert F. F > D and F > E so put it under E's right subtree. D E B A C F 9.3.2 Traversal Suppose that I want to print all values in the tree from smallest to biggest. How would I do this? void PrintNode(TNode<TYPE>* node){ if(node){ PrintNode(node->left_); node->data_.print(); PrintNode(node->right_); } } void Print(BST& tree){ PrintNode(tree.root_); } A tree is defined recursively. Many algorithms are most easily written recursively. This is especially true for functions involving tree traversals. A tree traversal is an algorithm that is done by visiting every node just once and applying some function to that node. In above we visit each node and print it just once. The above code is what is called InOrder traversal. This means all values less than the current node are first dealt with, then Page 65 the current node then the nodes with values greater than the current node. Aside from InOrder tree traversal there is also PreOrder traversal and PostOrder traversal. In PreOrder traversal, the current node is visited first then the children are visited (left first then right). In Post Order traversal, the children are visited first and then the current node is visited. If we did a PreOrder Printing of the tree the code would look like: void PrePrintNode(<TNode<TYPE>* node){ if(node){ node->data_.print(); PrePrintNode(node->left_); PrePrintNode(node->right_); } } void PreOrderPrint(BST& tree){ PrePrintNode(tree.root_); } If we did a PostOrder Printing of the tree the code would look like: void PostPrintNode(<TNode<TYPE>* node){ if(node){ PostPrintNode(node->left_); PostPrintNode(node->right_); node->data_.print(); } } void PostOrderPrint(BST& tree){ PostPrintNode(tree.root_); } 9.3.3 Deletion In order to delete a node, we must be sure to link up the subtree(s) of the node properly. Let us consider the following situations: D B A H C F E L G J I K Page 66 Delete Node with G. G has no children. Simply delete node and make the pointer from the parent node point to NULL. D B A H C L F J E I K Page 67 Now, Lets delete a node like F which has only a left child but no right child. This is also Easy. all we need to do is make the pointer from the parent node point to the left child. D B A H C F L E J I K Thus our tree is D B A H C E L J J I K Page 68 Delete node E (easy. E is a node with no children) so that we can have a node with just a right subtree in H. D B A H C L J I K Now lets delete Node H. Node H only has a single right subtree. The process is the same as that of deleting a node with just a left subtree, make the parent node point to the right subtree instead of itself. D B A L C J I K Delete node B. B has two children which means we can't just make D point to its successors as there are two of them only one available link from D. To delete B, we must: 1) find its inorder successor (next biggest value). done by going right then go left as far as possible 2) promote inorder successor to replace node to be deleted (think of it as deleting inorder successor and then replace value in current node with value of inorder successor. Note that inorder sucessor will always either be a leaf or at most have one child Page 69 In order successor of B is C. Delete node containing C and replace data in node B with data in node C D C L A J I K What would tree look like if we were to delete node D? In order successor of D is I. Remove and promote I. I L L C A J K This promotion can be handled two ways: 1) copy data into node and do a delete on inorder successor. 2) change the links involved (there are at most three of them) method 2 is better as original address of the nodes are preserved and thus any other data structures that had stored these address can remain valid. So far this seems pretty good. Inserts and deletes are not too hard to do AND search is very fast. Only problem is the following is ALSO a binary search tree: Page 70 B C A D E F G H I If we were to insert in the following order: B,A,C,D,E,F,G,H,I we would get the above tree. Problem with our algorithms so far is that the order the data is presented determines how efficient the structure will be. This is not a good. Page 71