BST Data Structure

A BST node contains:
– A key (used to search)
– The data associated with that key
– Pointers to children, parent
   • Leaf nodes have NULL pointers for children
A BST contains:
– A pointer to the root of the tree.

BST Operations: Insert

The BST property must be maintained.
Algorithm sketch:
– To insert data with key k
– Compare k to root.key
– If k < root.key, go left
– If k > root.key, go right
– Repeat until you reach a NULL child pointer. That's where the new node should be inserted, as a leaf.
   • Note: keep track of the prospective parent along the way.

BST Operations: Insert

Running time:
– The new node is inserted at a leaf position, so this depends on the height of the tree.
Worst case:
– Inserting keys 1, 2, 3, ... in this order will result in a tree that looks like a chain:
   [Figure: chain 1 -> 2 -> 3, each node the right child of the previous one]
   • Tree has degenerated to a list
   • Height: linear
   • Note also that such a tree is worse than a linked list since it takes up more space (more pointers)

BST Operations: Insert

Running time:
– The new node is inserted at a leaf position, so this depends on the height of the tree.
Best case:
– The top levels of the tree are filled up completely
– The height is then log n, where n is the number of nodes in the tree.
   [Figure: a balanced BST containing the keys 2, 4, 8, 12, 14, 16]

BST Operations: Insert

The height of a complete (i.e. all levels filled up) BST with n nodes is logarithmic. Why?
– Level i has 2^i nodes, for i = 0 (top level) through h (= height)
– The total number of nodes, n, is then:
   n = 2^0 + 2^1 + ... + 2^h = (2^(h+1) - 1)/(2 - 1) = 2^(h+1) - 1
Solving for h gives us h = log2(n + 1) - 1, i.e. h ≈ log n.

BST Operations: Insert

Analysis conclusion
– An insert operation consists of two parts:
   • Search for the position
      – best case (balanced tree): logarithmic
      – worst case (degenerate tree): linear
   • Physically insert the node
      – constant

BST Operations: Insert

What if we allow duplicate keys?
– Idea #1: Always insert in the right subtree
   • Results in very unbalanced tree
– Idea #2: Insert in alternate subtrees
   • Makes it difficult to search for all occurrences
– Idea #3: All elements with the same key are inserted in a single node
   • Good idea!
      – Easy to search, does not affect balance any more than non-duplicate insertion.

BST Operations: Insert

What if we allow a variable number of children? (n-ary tree)
– Idea: Use a vector/list of pointers to children.

BST Operations: Search

Take advantage of the BST property.
Algorithm sketch:
– Compare target to root
– If equal, return success
– If target < root, search left
– If target > root, search right
Running time:
– Similar to insert

BST Operations: Delete

The Delete operation consists of two parts:
– Search for the node to be deleted
   • best case constant (deleting the root)
   • worst case linear
– Delete the node
   • best case?
   • worst case?

BST Operations: Delete

CASE #1
– The node to be deleted is a leaf node.
– Easy!
   • Physically remove the node.
   • Constant time
      – We are just resetting its parent's child pointer and deallocating memory

BST Operations: Delete

CASE #2
– The node to be deleted has exactly one child
– Easy!
   • Physically remove the node.
   • Constant time
      – We are just resetting its parent's child pointer, its child's parent pointer and deallocating memory

BST Operations: Delete

CASE #3
– The node to be deleted has two children
– Not so easy
   • If we physically delete the node, we'll have to place its two children somewhere. This seems to require too much tree restructuring.
   • But we know it's easy to delete a node that has at most one child. What if we find such a node whose contents can be copied over without violating the BST property and then physically delete that node?

BST Operations: Delete

CASE #3, continued
– The node to be deleted, x, has two children
– Idea:
   • Find x's immediate successor, y.
     It is guaranteed to have at most one child.
   • Copy y's contents over to x.
   • Physically delete y.

BST Operations: Delete

Finding the immediate successor:
– We know that the node has two children. Due to the BST property, the immediate successor will be in the right subtree.
– In particular, the immediate successor will be the smallest element in the right subtree.
– The smallest element in a BST is always the leftmost node: follow left pointers until there are none. It has no left child (though it may have a right child), which is why it has at most one child.

BST Operations: Delete

Finding the immediate successor:
– Since it requires traveling down the tree from the current node, it may take up to linear time in the worst case.
– In a balanced tree it takes at most logarithmic time.
– The time to perform the copy and delete the successor is constant.

Binary Search Trees

Traversing a tree = visiting its nodes
Three major ways to traverse a binary tree:
• preorder
   • visit root
   • visit left subtree
   • visit right subtree
• inorder
   • visit left subtree
   • visit root
   • visit right subtree
• postorder
   • visit left subtree
   • visit right subtree
   • visit root
When applied to a BST, an inorder traversal visits the nodes in order from smallest to largest key.

Binary Search Trees

    void print_inorder(Node *subroot) {
        if (subroot != NULL) {
            print_inorder(subroot->left);   // everything smaller than subroot
            cout << subroot->data;
            print_inorder(subroot->right);  // everything larger than subroot
        }
    }

How long does this take?
There is exactly one call to print_inorder() for each node of the tree (plus a constant number of calls on NULL children per node). There are n nodes, so the running time of this operation is Θ(n).

Binary Search Trees

A tree may also be traversed one "level" at a time (top to bottom, left to right). This is usually called a level-order traversal.
– It requires the use of a temporary queue:

    enqueue root
    while (queue is not empty) {
        get the front element, f
        print f
        enqueue f's children
        dequeue
    }

Binary Search Trees

[Figure: BST with root 12; 12's children are 4 and 14; 4's children are 2 and 8; 8's children are 6 and 10; 14's right child is 16]

in-order:    2 - 4 - 6 - 8 - 10 - 12 - 14 - 16
pre-order:   12 - 4 - 2 - 8 - 6 - 10 - 14 - 16
post-order:  2 - 6 - 10 - 8 - 4 - 16 - 14 - 12
level-order: 12 - 4 - 14 - 2 - 8 - 16 - 6 - 10

Binary Search Trees

Idea for sorting algorithm:
– Given a sequence of integers, insert each one in a BST
– Perform an inorder traversal. The elements will be accessed in sorted order.
Running time:
– In the worst case, the tree will degenerate to a list. Creation will take quadratic time and traversal will be linear. Total: O(n^2)
– On average, the tree will be mostly balanced. Creation will take O(n log n) and traversal will again be linear. Total: O(n log n)

BSTs vs. Lists

Time
– In the worst case, all dictionary operations are linear.
– On average, BSTs are expected to do better.
Space
– BSTs store an additional pointer per node.
The BST seemed like a good idea, but in the end it doesn't offer much improvement over a list in the worst case.
– We must find a way to keep the tree balanced and guarantee logarithmic height.

Balanced Trees

There are several ways to define balance.
Examples:
– Force the subtrees of each node to have almost equal heights
– Place upper and lower bounds on the heights of the subtrees of each node.
– Force the subtrees of each node to have similar sizes (= number of nodes)
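
Example: a minimal C++ sketch

The insert, search, delete and traversal operations described in these slides, and the height measure that the balance definitions refer to, can be sketched roughly as follows. This is an illustration under stated assumptions, not a reference implementation: the function names (bst_insert, bst_contains, bst_remove, inorder, level_order, height) are hypothetical, a node stores only an int key (no associated data, no parent pointer), duplicate keys are ignored, and freeing the whole tree is omitted for brevity.

```cpp
#include <algorithm>
#include <queue>
#include <vector>

// Illustrative node: an int key and two child pointers (assumption: no data, no parent pointer).
struct Node {
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
    explicit Node(int k) : key(k) {}
};

// Insert k iteratively, tracking the prospective parent. Duplicates are ignored.
Node* bst_insert(Node* root, int k) {
    Node* parent = nullptr;
    Node* cur = root;
    while (cur != nullptr) {
        if (k == cur->key) return root;
        parent = cur;
        cur = (k < cur->key) ? cur->left : cur->right;
    }
    Node* fresh = new Node(k);
    if (parent == nullptr) return fresh;  // the tree was empty
    if (k < parent->key) parent->left = fresh;
    else parent->right = fresh;
    return root;
}

// Search: go left or right at each node until the key is found or we fall off the tree.
bool bst_contains(const Node* root, int k) {
    while (root != nullptr) {
        if (k == root->key) return true;
        root = (k < root->key) ? root->left : root->right;
    }
    return false;
}

// Delete k from the subtree and return the new subtree root.
Node* bst_remove(Node* root, int k) {
    if (root == nullptr) return nullptr;
    if (k < root->key) { root->left = bst_remove(root->left, k); return root; }
    if (k > root->key) { root->right = bst_remove(root->right, k); return root; }
    if (root->left != nullptr && root->right != nullptr) {
        // Case 3: copy the successor (leftmost node of the right subtree), then delete it.
        Node* succ = root->right;
        while (succ->left != nullptr) succ = succ->left;
        root->key = succ->key;
        root->right = bst_remove(root->right, succ->key);
        return root;
    }
    // Cases 1 and 2: splice in the only child (or nullptr for a leaf).
    Node* child = (root->left != nullptr) ? root->left : root->right;
    delete root;
    return child;
}

// Inorder traversal: appends the keys in sorted order.
void inorder(const Node* root, std::vector<int>& out) {
    if (root == nullptr) return;
    inorder(root->left, out);
    out.push_back(root->key);
    inorder(root->right, out);
}

// Level-order traversal using a temporary queue.
std::vector<int> level_order(const Node* root) {
    std::vector<int> out;
    std::queue<const Node*> q;
    if (root != nullptr) q.push(root);
    while (!q.empty()) {
        const Node* f = q.front();
        q.pop();
        out.push_back(f->key);
        if (f->left != nullptr) q.push(f->left);
        if (f->right != nullptr) q.push(f->right);
    }
    return out;
}

// Height counted in edges; an empty tree has height -1.
int height(const Node* root) {
    if (root == nullptr) return -1;
    return 1 + std::max(height(root->left), height(root->right));
}
```

Inserting the keys 12, 4, 14, 2, 8, 16, 6, 10 in that order reproduces the example tree from the traversal slide. Inserting 1, 2, 3, ... in increasing order produces the chain from the worst-case slide, which can be checked by comparing height() for the two insertion orders.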