CPSC 221: Data Structures Lecture #5 Branching Out Steve Wolfman 2014W1 1 Today’s Outline • • • • • Binary Trees Dictionary ADT Binary Search Trees Deletion Some troubling questions 2 Binary Trees • Binary tree is either – empty (NULL for us), or – a datum, a left subtree, and a right subtree A B • Properties – max # of leaves: – max # of nodes: C D E F G H • Representation: Data left right pointer pointer I J 3 Representation struct Node { KTYPE key; DTYPE data; Node * left; Node * right; }; A left right pointerpointer B C left right pointerpointer left right pointerpointer D E F left right pointerpointer left right pointerpointer left right pointerpointer A B D C E F 4 Today’s Outline • • • • • Binary Trees Dictionary ADT Binary Search Trees Deletion Some troubling questions 5 What We Can Do So Far • Stack • List – Push – Pop – Insert – Remove – Find • Queue – Enqueue – Dequeue • Priority Queue (skipped!) – Insert – DeleteMin What’s wrong with Lists? 6 Dictionary ADT • • Dictionary operations – – – – – create destroy insert find delete – would be tastier with brownies insert • brownies - tasty midterm • prog-project – so painful… who designed this language? • find(wolf) • wolf - the perfect mix of oomph wolf – the perfect mix of oomph and Scrabble value and Scrabble value • Stores values associated with user-specified keys – values may be any (homogenous) type – keys may be any (homogenous) comparable type 7 Search/Set ADT • Dictionary operations – – – – – create destroy insert find delete insert • Min Pin find(Wolf) NOT FOUND • • • • • • • • Berner Whippet Alsatian Sarplaninac Beardie Sarloos Malamute Poodle • Stores keys – keys may be any (homogenous) comparable – quickly tests for membership 8 A Modest Few Uses • • • • • • • Arrays and “Associative” Arrays Sets Dictionaries Router tables Page tables Symbol tables C++ Structures 9 Desiderata • Fast insertion – runtime: • Fast searching – runtime: • Fast deletion – runtime: 10 Naïve Implementations insert find delete • Linked list • Unsorted array • Sorted array worst one… yet so close! 11 Today’s Outline • • • • • Binary Trees Dictionary ADT Binary Search Trees Deletion Some troubling questions 12 Binary Search Tree Dictionary Data Structure • Binary tree property 8 – each node has 2 children – result: • storage is small • operations are simple • average depth is small* • Search tree property – all keys in left subtree smaller than root’s key – all keys in right subtree larger than root’s key – result: • easy to find any given key 5 2 11 6 4 10 7 9 12 14 13 13 *Technically: a result of both properties. Example and Counter-Example 5 8 4 1 8 7 5 11 3 BINARY SEARCH TREE 2 7 4 11 6 10 15 NOT A BINARY SEARCH TREE 18 20 21 14 In Order Listing struct Node { KTYPE key; DTYPE data; Node * left; Node * right; }; 10 5 15 2 9 7 20 17 30 In order listing: 25791015172030 15 Finding a Node 10 5 15 2 9 7 a. b. c. runtime: d. e. 20 17 30 O(1) O(lg n) O(n) O(n lg n) None of these Node *& find(Comparable key, Node *& root) { if (root == NULL) return root; else if (key < root->key) return find(key, root->left); else if (key > root->key) return find(key, root->right); else return root; } 16 Finding a Node 10 5 15 2 9 7 20 17 30 WARNING: Much fancy footwork with refs (&) coming. You can do all of this without refs... just watch out for special cases. Node *& find(Comparable key, Node *& root) { if (root == NULL) return root; else if (key < root->key) return find(key, root->left); else if (key > root->key) return find(key, root->right); else return root; } 17 Iterative Find Node * find(Comparable key, Node * root) { while (root != NULL && root->key != key) { if (key < root->key) root = root->left; else root = root->right; } 10 5 15 2 9 7 20 17 30 return root; } Look familiar? (It’s trickier to get the ref return to work here.) 18 Insert 10 5 15 2 9 7 20 17 30 // Precondition: key is not // already in the tree! void insert(Comparable key, Node * root) { Node *& target(find(key, root)); assert(target == NULL); target = new Node(key); } runtime: 19 Funky game we can play with the *& version. Digression: Value vs. Reference Parameters • Value parameters (Object foo) – copies parameter – no side effects • Reference parameters (Object & foo) – shares parameter – can affect actual value – use when the value needs to be changed • Const reference parameters (Object const & foo) – shares parameter – cannot affect actual value 20 – use when the value is too big for copying in pass-by-value BuildTree for BSTs • Suppose the data 1, 2, 3, 4, 5, 6, 7, 8, 9 is inserted into an initially empty BST: – in order – in reverse order – median first, then left median, right median, etc. so: 5, 3, 8, 2, 4, 7, 9, 1, 6 21 Analysis of BuildTree • Worst case: O(n2) as we’ve seen • Average case assuming all orderings equally likely turns out to be O(n lg n). 22 Bonus: FindMin/FindMax • Find minimum 10 5 • Find maximum 15 2 9 7 20 17 30 23 Double Bonus: Successor Find the next larger node in this node’s subtree. Node *& succ(Node *& root) { if (root->right == NULL) return root->right; else return min(root->right); } 10 5 15 2 Node *& min(Node *& root) { if (root->left == NULL) return root; else return min(root->left); } 9 7 20 17 30 24 More Double Bonus: Predecessor Find the next smaller node in this node’s subtree. Node *& pred(Node *& root) { if (root->left == NULL) return root->left; else return max(root->left); } Node *& max(Node *& root) { if (root->right == NULL) return root; else return max(root->right); } 10 5 15 2 9 7 20 17 30 25 Today’s Outline • Some Tree Review (here for reference, not discussed) • Binary Trees • Dictionary ADT • Binary Search Trees • Deletion • Some troubling questions 26 Deletion 10 5 15 2 9 7 20 17 30 Why might deletion be harder than insertion? 27 Lazy Deletion • Instead of physically deleting nodes, just mark them as deleted (with a “tombstone”) + + + – simpler physical deletions done in batches some adds just flip deleted flag 2 small amount of extra memory for deleted flag – many tombstones slow finds – some operations may have to be modified (e.g., min and max) 10 5 15 9 7 20 17 30 28 Lazy Deletion Delete(17) 10 Delete(15) 5 Delete(5) Find(9) Find(16) 15 2 9 7 20 17 30 Insert(5) Find(17) 29 Deletion - Leaf Case 10 Delete(17) 5 15 2 9 7 20 17 30 30 Deletion - One Child Case 10 Delete(15) 5 15 2 9 7 20 30 31 Deletion - Two Child Case 10 Delete(5) 5 20 2 9 30 7 32 Finally… 10 7 2 20 9 30 33 Delete Code void delete(Comparable key, Node *& root) { Node *& handle(find(key, root)); Node * toDelete = handle; if (handle != NULL) { if (handle->left == NULL) { // Leaf or one child handle = handle->right; } else if (handle->right == NULL) { // One child handle = handle->left; } else { // Two child case Node *& successor(succ(handle)); handle->data = successor->data; toDelete = successor; successor = successor->right; // Succ has <= 1 child } delete toDelete; } Refs make this short and “elegant”… } 34 but could be done without them with a bit more work. Today’s Outline • Some Tree Review (here for reference, not discussed) • Binary Trees • Dictionary ADT • Binary Search Trees • Deletion • Some troubling questions 35 Thinking about Binary Search Trees • Observations – Each operation views two new elements at a time – Elements (even siblings) may be scattered in memory – Binary search trees are fast if they’re shallow • Realities – For large data sets, disk accesses dominate runtime – Some deep and some shallow BSTs exist for any data 36 Solutions? • Reduce disk accesses? • Keep BSTs shallow? 37 To Do • Continue readings on website! 38 Coming Up Spawns parallel task. Since we have only one classroom, one of these goes first! • • • • cilk_spawn Parallelism and Concurrency cilk_spawn Self-balancing Binary Search Trees cilk_spawn Priority Queues cilk_spawn Sorting (most likely!) • • • • • Huge Search Tree Data Structure cilk_join cilk_join cilk_join cilk_join 39