EXPERIMENTAL STUDY OF NODE INSERTION IN BINARY SEARCH TREES
Geiby George
geibygeorge2004@yahoo.com.sg
&
Arun Mahendra
arun_m_1210@hotmail.com
Tarleton State University
Stephenville, Texas
ABSTRACT
Binary Search Trees (BSTs) are one of the most heavily used data structures in
Computer Science, with applications both in theory (algorithms, recursive functions,
compilers) and in practice. The worst-case complexity of many BST operations is
proportional to the height of the tree, that is, the largest depth of any node, so depth is a
crucial performance parameter for BSTs.
We study the dynamics of the depth under insertion of random numbers into the
BST. It is a well-known theoretical result that in this case the average depth is "big-Oh"
of the logarithm of the number of nodes. Our study confirms this, and also gives insight
into the multiplicative constants behind the "big-Oh" and the second-order behavior.
SECTION I - INTRODUCTION
In this experiment we study the dynamics of the depth under insertion of random
numbers into a Binary Search Tree (BST).
This paper covers several topics related to Binary Search Trees. We first take a
quick look at trees in general, their terminology, their applications and the fields where
they are used. We then explain each function used in our program, with its syntax and code.
Finally, we discuss the results and plot a graph of them.
A tree is an abstract data type that stores elements hierarchically.
Applications of trees, many of them binary trees, include:
o Heap sort (theoretical) and text searching (suffix trees).
o The medical field, for example DNA analysis.
o Data compression with Huffman's algorithm.
o Storing a set of names and looking them up by a prefix of the name (used in
Internet routers).
o Storing a path in a graph and being able to reverse any subsection of the path in
O(log n) time (useful in traveling salesman problems).
TREES AND RELATED TERMINOLOGIES
Each element in a tree, except the top element (the root), has a parent element, and
each element has zero or more children.
A tree is a non-empty, finite collection of vertices and edges that satisfies certain
requirements. In a tree there is one specially designated vertex called the root, and the
remaining vertices are partitioned into a collection of subtrees, each of which is also a tree. A
vertex (or node) is a simple object that can have a name and carry other associated
information. The connection between two vertices is known as an edge. A node with no
children is known as a leaf. Children of the same parent are called siblings. A set of trees is
called a forest.
If a node has no children it is external; if it has one or more children it is internal.
External nodes are also known as leaves. An ancestor of a node is either the node itself or an
ancestor of the node's parent. A node v is a descendant of a node u if u is an ancestor of v.
A tree is called ordered if there is a linear ordering defined for the children of each
node of the tree. Ordered trees usually indicate the linear order among siblings by listing
them in sequence.
A path is a sequence of vertices in which successive vertices are connected by edges.
The depth of a node is the length (number of edges) of the path from the root to that node.
The height of a tree is equal to the maximum depth of an external node of the tree.
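The depth and height definitions above can be checked with a short recursive sketch. It assumes a hypothetical node type Node holding only a value and two child pointers, and it adopts the convention (not stated in this paper) that an empty tree has height -1.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical minimal node type: a value and links to at most two children.
struct Node {
    int value;
    Node* left = nullptr;
    Node* right = nullptr;
};

// Follows the definitions above: `depth` is the number of edges from the root
// to t, and the height is the largest depth of an external node (leaf).
// An empty tree gets height -1 (a convention chosen for this sketch).
int heightFrom(const Node* t, int depth) {
    if (t == nullptr) return -1;
    if (t->left == nullptr && t->right == nullptr) return depth;  // leaf reached
    return std::max(heightFrom(t->left, depth + 1),
                    heightFrom(t->right, depth + 1));
}

int height(const Node* root) { return heightFrom(root, 0); }
```

A single node is a leaf at depth 0, so its height is 0; a chain of three nodes has height 2.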
SECTION II - BINARY TREES
An ordered tree is said to be a binary tree if each node has at most two children.
Concretely, each node of a binary tree holds data and pointers to at most two children, called
the left child and the right child. In a binary search tree, the left child is used for values
smaller than the parent's value and the right child for values greater than it; which side equal
values go to is a convention. The insertNode function in Section III sends smaller values to the
left and all other values, including equal ones, to the right.
A full binary tree is a binary tree in which each vertex has either two children or zero
children. No node in a binary tree may have more than two children, whereas a node of a
general tree may have any number of children. A binary tree may be empty, whereas a tree (as
defined above) cannot be empty.
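A minimal sketch of the full-binary-tree test, again assuming a hypothetical Node type; treating the empty tree as full is a convention chosen here, not something the text defines.

```cpp
#include <cassert>

// Hypothetical node type for this sketch.
struct Node {
    int value;
    Node* left = nullptr;
    Node* right = nullptr;
};

// A binary tree is full when every node has either zero or two children.
bool isFull(const Node* t) {
    if (t == nullptr) return true;            // empty tree: full by convention
    bool hasLeft = t->left != nullptr;
    bool hasRight = t->right != nullptr;
    if (hasLeft != hasRight) return false;    // exactly one child: not full
    return isFull(t->left) && isFull(t->right);
}
```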
[Figure 1: root 25 (depth 0); 20 and 30 (depth 1); 10, 22, 28 and 35 (depth 2); 5, 11 and 21 (depth 3). Labels mark the root, a parent, a child and a leaf.]
Figure 1: A binary tree of height 3 with 5 leaves and 5 internal nodes.
Traversing a Binary Tree
Traversing a tree means visiting each node of the tree exactly once, in a specified order.
Four common ways to traverse a binary tree are:
o pre-order - the root node is visited first and then the subtrees rooted at its children are
traversed recursively.
o in-order - traverse the left subtree, then the root node, then the right subtree.
o post-order - the subtrees rooted at the children of the root are traversed recursively
first, and then the root node is visited.
o backward in-order - traverse the right subtree, then the root node, then the left subtree.
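The four orders differ only in where the root is visited relative to its subtrees. The sketch below records the visiting order in a std::vector; the Node type and the function names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical node type for this sketch.
struct Node {
    int value;
    Node* left = nullptr;
    Node* right = nullptr;
};

// Each function appends node values to `out` in the order they are visited.
void preOrder(const Node* t, std::vector<int>& out) {
    if (!t) return;
    out.push_back(t->value);          // root first
    preOrder(t->left, out);
    preOrder(t->right, out);
}

void inOrder(const Node* t, std::vector<int>& out) {
    if (!t) return;
    inOrder(t->left, out);
    out.push_back(t->value);          // root between the two subtrees
    inOrder(t->right, out);
}

void postOrder(const Node* t, std::vector<int>& out) {
    if (!t) return;
    postOrder(t->left, out);
    postOrder(t->right, out);
    out.push_back(t->value);          // root last
}

void backwardInOrder(const Node* t, std::vector<int>& out) {
    if (!t) return;
    backwardInOrder(t->right, out);   // right subtree first
    out.push_back(t->value);
    backwardInOrder(t->left, out);
}
```

On a binary search tree such as the one in Figure 1, in-order yields the values in ascending order and backward in-order yields them in descending order.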
SECTION III - FUNCTIONS
insertNode
The insertNode function adds a new node to the binary search tree and thereby changes
the structure of the tree.
Syntax: static void insertNode ( node*& parent, int num );
Explanation:
insertNode begins at the root of the tree and traces a path downward. It is passed a
reference to the pointer parent (initially the root) and the integer num to be inserted; in the
experiment the values are taken in turn from an integer array perm[x].
At each node, num is compared with parent->value. If num is less than that value, the
function continues in the left subtree; otherwise (num greater than or equal to the value) it
continues in the right subtree. When it reaches an empty (NULL) pointer, it allocates a new
node there, stores num in it and records the node's depth, which the global counter d tracks by
being incremented before each recursive call and decremented after it. Figure 2 shows how
insertNode works.
insertNode runs in O(h) time on a tree of height h.
___________________________________________________________________________
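// Assumed declarations (not shown in the original listing): the node type
// and the global depth counter d used by insertNode and maxDepth.
struct node {
    int value;
    int depth;      // depth of the node when it was inserted (root = 0)
    node* left;
    node* right;
};
static int d = 0;   // depth of the position currently being visited

static void insertNode ( node*& parent, int num )
{
    if ( parent == nullptr ){            // empty position: create the node here
        parent = new node;
        parent -> left = nullptr;
        parent -> right = nullptr;
        parent -> value = num;
        parent -> depth = d;
    }
    else if ( num < parent -> value ){   // smaller values go left
        d++;
        insertNode ( parent -> left, num );
        d--;                             // restore d on the way back up
    }
    else{                                // greater or equal values go right
        d++;
        insertNode ( parent -> right, num );
        d--;
    }
}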
__________________________________________________________________________________
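The global counter d makes insertNode depend on hidden shared state. One possible alternative, sketched below under the assumption that nodes carry the same four fields (value, depth, left, right) used by insertNode, walks down iteratively with a pointer to the link that will receive the new node and returns the new node's depth. The name insertIterative is hypothetical.

```cpp
#include <cassert>

// Assumed node layout, matching the fields insertNode uses.
struct node {
    int value;
    int depth;
    node* left;
    node* right;
};

// Inserts num below root (smaller values left, others right, as in
// insertNode) and returns the depth of the new node.
int insertIterative(node*& root, int num) {
    node** link = &root;   // the pointer that will receive the new node
    int depth = 0;
    while (*link != nullptr) {
        link = (num < (*link)->value) ? &(*link)->left : &(*link)->right;
        ++depth;
    }
    *link = new node{num, depth, nullptr, nullptr};
    return depth;
}
```

The depth is a local variable, so no counter has to be restored after the call.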
Figure 2: Inserting an item with the new value 23 (from the array perm) into the binary search
tree. The shaded nodes indicate the path from the root to the position where the item is
inserted. The new value is inserted into the faded box in the binary tree. The dashed line
indicates the new link added to the tree to insert the item. The dashed arrow indicates the
position where the new node is inserted in the array.
[Figure 2: the tree of Figure 1 with 23 added as the right child of 22, reached along the path 25, 20, 22.]
Sample Input Data:

index x   0   1   2   3   4   5   6   7   8   9  10
perm[x]  25  20  30  10  22  28  35   5  11  21  23
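To check the sample data, the sketch below inserts perm[0] through perm[10] in order using the same ordering rule as insertNode, but passes the depth as a parameter instead of using the global d. The names insertAt, buildTree and depthOf are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Assumed node layout, matching the fields insertNode uses.
struct node {
    int value;
    int depth;
    node* left;
    node* right;
};

// Recursive insertion in the style of insertNode; depth is a parameter here.
void insertAt(node*& parent, int num, int depth) {
    if (parent == nullptr) {
        parent = new node{num, depth, nullptr, nullptr};
    } else if (num < parent->value) {
        insertAt(parent->left, num, depth + 1);
    } else {
        insertAt(parent->right, num, depth + 1);
    }
}

// Inserts perm[0], perm[1], ... in order and returns the root.
node* buildTree(const std::vector<int>& perm) {
    node* root = nullptr;
    for (int x : perm) insertAt(root, x, 0);
    return root;
}

// Returns the stored depth of `key`, or -1 if it is not in the tree.
int depthOf(const node* t, int key) {
    while (t != nullptr) {
        if (key == t->value) return t->depth;
        t = (key < t->value) ? t->left : t->right;
    }
    return -1;
}
```

With this input, 23 ends up at depth 3 as the right child of 22, matching Figure 2.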
maxDepth
The maxDepth function computes the maximum depth of a tree.
Syntax: void maxDepth(node *tree);
Explanation:
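maxDepth visits every node of the tree (in in-order) and keeps the largest value of the
nodes' depth field in the global variable d. When it returns, d holds the maximum depth, that
is, the number of edges on the longest path from the root down to the farthest leaf. Because d
is also the counter used by insertNode, it should be 0 before maxDepth is called (insertNode
restores d to the value it had before the call). maxDepth runs in O(n) time on a tree with n
nodes.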
___________________________________________________________________________
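// Uses the global depth counter d declared with insertNode; reset d to 0 first.
void maxDepth(node *tree){
    if (tree){
        maxDepth(tree->left);                     // left subtree first
        d = (d > tree->depth) ? d : tree->depth;  // keep the larger depth in d
        maxDepth(tree->right);                    // then the right subtree
    }
}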
__________________________________________________________________________________
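Figure 3: The shaded nodes indicate a longest path from the root to a farthest leaf; maxDepth
reports the length of this path.
[Figure 3: the tree built from the sample input (values 25, 20, 30, 10, 22, 28, 35, 5, 11, 21, 23), whose maximum depth is 3.]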
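Because maxDepth accumulates its answer in the global d, the caller must reset d first. A hypothetical alternative, maxStoredDepth, returns the result directly while still reading the depth field stored at insertion time; the node layout is assumed to match insertNode's.

```cpp
#include <algorithm>
#include <cassert>

// Assumed node layout, matching the fields insertNode uses.
struct node {
    int value;
    int depth;
    node* left;
    node* right;
};

// Returns the largest stored depth in the tree, or -1 for an empty tree.
// Like maxDepth it visits every node, so it runs in O(n) time.
int maxStoredDepth(const node* tree) {
    if (tree == nullptr) return -1;
    return std::max({tree->depth,
                     maxStoredDepth(tree->left),
                     maxStoredDepth(tree->right)});
}
```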
destroytree
Syntax: void destroytree(node *tree);
Explanation:
The destroytree function returns the memory used by the tree's nodes to the free store,
thus avoiding memory leaks. A memory leak occurs when a program does not deallocate
memory it no longer needs. A small memory leak is often not a problem, but a large one can
make a program fail because it runs out of memory.
destroytree deletes the nodes in post-order: for each node it first destroys the left subtree,
then the right subtree, and only then deletes the node itself, so leaves are always deleted
before their parents.
destroytree runs in O(n) time on a tree with n nodes, since each node is deleted exactly once.
___________________________________________________________________________
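void destroytree ( node *tree ) {
    if ( tree != nullptr ) {
        destroytree ( tree -> left );    // free the left subtree
        destroytree ( tree -> right );   // then the right subtree
        delete tree;                     // then the node itself (post-order)
    }
}
___________________________________________________________________________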
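After destroytree returns, the caller's root pointer still holds the address of the deleted root. A possible variant, with the hypothetical name destroyTreeAndReset, takes the pointer by reference (as insertNode does) and resets it to nullptr; the node layout is assumed to match insertNode's.

```cpp
#include <cassert>

// Assumed node layout, matching the fields insertNode uses.
struct node {
    int value;
    int depth;
    node* left;
    node* right;
};

// Post-order deletion, then the caller's pointer is cleared so it
// does not dangle.
void destroyTreeAndReset(node*& tree) {
    if (tree != nullptr) {
        destroyTreeAndReset(tree->left);   // free the left subtree
        destroyTreeAndReset(tree->right);  // then the right subtree
        delete tree;                       // then the node itself
        tree = nullptr;
    }
}
```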
SECTION V - RESULT
The height h is O(log n) and is modeled as

$$h = C_1 \log n + C_2$$

where C1 is the multiplicative constant behind the big-Oh and C2 is an additive constant,
which the big-Oh notation ignores. The Catalan numbers, which count the distinct binary tree
shapes on n nodes, increase exponentially.
C1 is the slope of the fitted straight line, computed from two points (x1, y1) and (x2, y2) on it:

$$C_1 = \frac{y_2 - y_1}{x_2 - x_1} = \frac{49.28 - 12.33}{6 - 2} = \frac{36.95}{4} \approx 9.24$$
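A minimal sketch of how data for such a fit could be generated: insert a random permutation of 0..n-1 into an empty BST with insertNode's ordering rule and record the height and the average node depth. This is not the authors' program; randomBstStats, DepthStats and the use of std::mt19937 with a fixed seed are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical node type for this sketch.
struct Node {
    int value;
    Node* left = nullptr;
    Node* right = nullptr;
};

struct DepthStats {
    int height;           // largest node depth (-1 for an empty tree)
    double averageDepth;  // mean depth over all nodes
};

DepthStats randomBstStats(int n, unsigned seed) {
    std::vector<int> perm(n);
    std::iota(perm.begin(), perm.end(), 0);        // 0, 1, ..., n-1
    std::mt19937 gen(seed);
    std::shuffle(perm.begin(), perm.end(), gen);   // random insertion order

    std::vector<Node> pool(n);                     // owns every node; no manual delete
    Node* root = nullptr;
    int height = -1;
    long long depthSum = 0;
    for (int i = 0; i < n; ++i) {
        Node** link = &root;
        int depth = 0;
        while (*link != nullptr) {                 // smaller left, others right
            link = (perm[i] < (*link)->value) ? &(*link)->left : &(*link)->right;
            ++depth;
        }
        pool[i].value = perm[i];
        *link = &pool[i];
        height = std::max(height, depth);
        depthSum += depth;
    }
    double avg = n > 0 ? static_cast<double>(depthSum) / n : 0.0;
    return {height, avg};
}
```

Running it for several values of n and several seeds, and plotting the averages against log n, gives points from which C1 and C2 can be estimated as above.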
SECTION VI - CONCLUSION
In this research we studied the dynamics of the depth under insertion of random
numbers into a BST. The results confirm the theoretical result that the average depth is
"big-Oh" of the logarithm of the number of nodes, and they give insight into the multiplicative
constant behind the "big-Oh" and the second-order behavior.