EXPERIMENTAL STUDY OF NODE INSERTION IN BINARY SEARCH TREES

Geiby George (geibygeorge2004@yahoo.com.sg) & Arun Mahendra (arun_m_1210@hotmail.com)
Tarleton State University
Stephenville, Texas

ABSTRACT

Binary Search Trees (BSTs) are one of the most heavily used data structures in Computer Science, with applications both in theory (algorithms, recursive functions, compilers) and practice. The worst-case complexity for many operations on BSTs is proportional to the depth of the respective tree, so depth is a crucial performance parameter for BSTs. We study the dynamics of the depth under insertion of random numbers into the BST. It is a well-known theoretical result that in this case the average depth is "big-Oh" of the logarithm of the number of nodes. Our study confirms this, and also gives insight into the multiplicative constants behind the "big-Oh" and the second-order behavior.

SECTION I - INTRODUCTION

In this experiment we study the dynamics of the depth under insertion of random numbers into a Binary Search Tree. The paper covers several topics related to Binary Search Trees. We take a quick look at trees in general and their related terminology, their applications, and the different fields where they are used. Later on, we explain each function used in our program, its syntax, and its code. We then discuss the results and plot a graph of the results obtained.

A tree is an abstract data type that stores elements hierarchically. Applications of binary trees include:
o Heap Sort (theoretical).
o Text searching (suffix trees).
o The medical field, for example DNA analysis.
o Data compression with Huffman's algorithm.
o Storing a set of names and looking them up by a prefix of the name (used in internet routers).
o Storing a path in a graph and reversing any subsection of the path in O(log n) time (useful in traveling salesman problems).
TREES AND RELATED TERMINOLOGIES

Each element in a tree, except the top element, has a parent element, and every element has zero or more children. A tree is a non-empty finite collection of vertices and edges that obeys certain requirements. In a tree there is one specially designated vertex called the root, and the remaining vertices are partitioned into a collection of sub-trees, each of which is also a tree. A vertex (node) is a simple object that can have a name and carry other associated information. The connection between two vertices is known as an edge.

Children of the same parent are called siblings. A set of trees is called a forest. A node with no children is external, whereas a node with one or more children is internal. External nodes are also known as leaves. An ancestor of a node is either the node itself or an ancestor of the node's parent. Conversely, a descendant of a node is either the node itself or a descendant of one of its children.

A tree is called ordered if there is a linear ordering defined for the children of each node of the tree. Ordered trees usually indicate the linear order existing between siblings by listing them in a sequence. A path is a sequence of vertices in which each successive pair is connected by an edge. The depth of a node is the length of the path from the root to that node, that is, the number of edges on it. The height of a tree is equal to the maximum depth of an external node of the tree.

SECTION II - BINARY TREES

An ordered tree is said to be a binary tree if every node of the tree has at most two children. In practice, each node of a binary tree holds data and pointers to at most two children, called the left child and the right child. In a binary search tree, the left child is used for values that are less than or equal to the value in the parent node, and the right child for values that are greater than it.
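The description above can be made concrete with a small C++ sketch. The struct below is an assumption: its field names (value, depth, left, right) are the ones used by the listings in Section III, but the declaration itself does not appear in this paper. The helper isSearchTree and the function selfCheck are hypothetical and only illustrate the ordering convention just described (left values less than or equal to the parent, right values greater).

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Assumed node layout; the paper does not show the actual declaration.
struct node {
    int   value;   // key stored in the node
    int   depth;   // number of edges between the root and this node
    node *left;    // subtree holding values <= value
    node *right;   // subtree holding values >  value
};

// Hypothetical helper: returns true if every value in the subtree rooted
// at t lies in the range (lo, hi] and the ordering convention holds at
// every node below t as well.
static bool isSearchTree(const node *t, long long lo, long long hi)
{
    if (t == NULL) return true;                  // an empty tree is valid
    if (t->value <= lo || t->value > hi) return false;
    return isSearchTree(t->left,  lo,       t->value)   // left:  <= value
        && isSearchTree(t->right, t->value, hi);        // right: >  value
}

// Convenience overload that starts with the widest possible range.
static bool isSearchTree(const node *t)
{
    return isSearchTree(t, LLONG_MIN, LLONG_MAX);
}

// Usage example: the top three nodes of the tree in Figure 1.
static void selfCheck()
{
    node l    = {20, 1, NULL, NULL};
    node r    = {30, 1, NULL, NULL};
    node root = {25, 0, &l, &r};
    assert(isSearchTree(&root));

    // Swapping the children violates the ordering convention.
    node swapped = {25, 0, &r, &l};
    assert(!isSearchTree(&swapped));
}
```

Passing the allowed range down the recursion, rather than comparing each node only with its own children, is what lets the check catch a misplaced value several levels below the node it conflicts with.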
A full binary tree is a binary tree in which each vertex has either two children or zero children. No node in a binary tree may have more than two children, whereas there is no limit on the number of children of a node in a general tree. A binary tree may be empty, whereas a general tree cannot be empty.

Figure 1: A binary tree of height 3 with 5 leaves and 5 internal nodes. The root, 25, is at depth 0; nodes 20 and 30 are at depth 1; nodes 10, 22, 28 and 35 are at depth 2; and nodes 5, 11 and 21 are at depth 3.

Traversing of Binary Tree

Traversing a tree means visiting each node in the tree in a particular order. There are four ways to traverse a binary tree:
o pre-order - the root node is visited first, and then the subtrees rooted at its children are traversed recursively.
o in-order - traverse the left subtree, then the root node, then the right subtree.
o post-order - recursively traverse the subtrees rooted at the children of the root first, and then visit the root node.
o backward in-order - traverse the right subtree, then the root node, then the left subtree.

SECTION III - FUNCTIONS

insertNode

The insertNode function adds a new node to the binary tree, and so changes the structure of the tree.

Syntax: static void insertNode ( node*& parent, int num );

Explanation: insertNode begins at the root of the tree and traces a path downward. The function is passed a reference to a node pointer, parent, and the integer num to be inserted (in our program, an element of the integer array perm). At each node, num is compared with the value stored in that node. If num is less than that value, the search continues in the left subtree; otherwise (num is greater than or equal to the value) it continues in the right subtree. When an empty position is reached, a new node holding num is created there and its depth is recorded. Figure 2 shows how insertNode works. insertNode runs in O(h) time on a tree of height h.
___________________________________________________________________________
// d is a variable declared outside this function (its declaration is not
// shown); it tracks the depth of the position currently being examined.
static void insertNode ( node*& parent, int num )
{
    if ( parent == NULL ) {
        // Empty position found: create the new node here.
        parent = new node;
        parent -> left  = NULL;
        parent -> right = NULL;
        parent -> value = num;
        parent -> depth = d;
    }
    else if ( num < parent -> value ) {
        // Smaller values go into the left subtree, one level deeper.
        d++;
        insertNode ( parent -> left, num );
        d--;
    }
    else {
        // Equal or larger values go into the right subtree.
        d++;
        insertNode ( parent -> right, num );
        d--;
    }
}
___________________________________________________________________________

Figure 2: Inserting an item with the new value 23 (taken from the array perm) into the binary search tree. The shaded nodes indicate the path from the root to the position where the item is inserted. The new value is inserted into the faded box in the binary tree. The dashed line indicates the new link in the tree that is added to insert the item. The dashed arrow indicates the position where the new node is inserted in the array.

Sample Input Data:

Index:   0   1   2   3   4   5   6   7   8   9  10
Value:  25  20  30  10  22  28  35   5  11  21  23

maxDepth

The maxDepth function computes the maximum depth of a tree.

Syntax: void maxDepth(node *tree);

Explanation: maxDepth visits every node of the tree in order and leaves in the variable d the largest depth value stored in any node, that is, the depth of the leaf farthest from the root. With the root at depth 0, this is the number of edges on the longest path from the root down to a leaf. Because maxDepth overwrites d, which insertNode also uses, d should be 0 before maxDepth is called and must be reset to 0 before any further insertion.
___________________________________________________________________________
void maxDepth(node *tree)
{
    if (tree) {
        maxDepth(tree->left);
        // Keep the larger of the running maximum and this node's depth.
        d = (d > tree->depth) ? d : tree->depth;
        maxDepth(tree->right);
    }
}
___________________________________________________________________________

Figure 3: The shaded nodes indicate the path used by the maxDepth function from the root to the farthest leaf node.

destroytree

Syntax: void destroytree(node *tree);

Explanation: The destroytree function releases the memory used by the tree, thus avoiding memory leaks. A memory leak occurs when a program does not deallocate memory that it has allocated and no longer needs.
A small memory leak is often not a problem, but a large memory leak can cause a program to stop running because it has run out of memory. In this function, leaves are always destroyed first: the function destroys the left subtree, then the right subtree, and then the parent node, which is a post-order traversal. destroytree runs in O(n) time on a tree with n nodes.
___________________________________________________________________________
void destroytree ( node *tree )
{
    if ( tree != NULL ) {
        destroytree ( tree -> left );    // free the left subtree
        destroytree ( tree -> right );   // free the right subtree
        delete tree;                     // then free this node
    }
}
___________________________________________________________________________

SECTION V - RESULT

The height h of the tree is O(log n):

    h = C1 log(n) + C2

where C1 is the multiplicative constant behind the Big-Oh and C2 is an additive constant, which the Big-Oh ignores.

The number of distinct binary tree shapes with n nodes is the n-th Catalan number, which increases exponentially with n.

C1 is the slope of the line above. It can be calculated from two points (x1, y1) and (x2, y2) on the line:

    C1 = (y2 - y1) / (x2 - x1)

Putting in the values, we get

    C1 = (49.28 - 12.33) / (6 - 2) = 36.95 / 4

so the approximate value of the slope is C1 = 9.24.

SECTION VI - CONCLUSION

In this research we studied the dynamics of the depth under insertion of random numbers into the BST. The research confirms the theoretical result that the average depth is "big-Oh" of the logarithm of the number of nodes. It also gives insight into the multiplicative constants behind the "big-Oh" and the second-order behavior.

SECTION VII - REFERENCES

Goodrich, Michael T. and Tamassia, Roberto. Algorithm Design: Foundations, Analysis, and Internet Examples. Wiley, 2001.
Cormen, Thomas H., Leiserson, Charles E. and Rivest, Ronald L. Introduction to Algorithms. MIT Press, 1990.
Internet 1 <http://www.laynetworks.com/cs05_SAD_2a.htm>
Internet 2 <http://www.rrcc-online.com/~julies/csc161/csc161.htm#ch10>
Internet 3 <http://www.cs.jhu.edu/~goodrich/dsa/trees/btree.html>