Joe Meehean

Important and common problem
Given a collection, determine whether value v is a member

Common variation
• given a collection of unique keys, each associated with a value
• find the value v associated with the key k
• i.e., find the mapping

Dictionary
• key => word
• value => definition
Phonebook
• key => name
• value => phone number
Webpage
• key => address
• value => HTML files and pictures

Problem: given an array of N values, determine if v is one of them
Two approaches
• sequential search
• binary search

Sequential search
Look at each value in turn (iterate)
• e.g., a[0], a[1], …
• quit when v is found or the end of the array is reached
• worst-case time: O(N)
What if a is sorted?
• look at each value in turn
• quit when v is found or the end of the array is reached
• OR when the current value is > v
• worst-case time is still O(N)

Binary search
Array must be sorted
• (a[0] <= a[1] <= a[2] … <= a[N-1])
Algorithm
• like the Clock Game on The Price Is Right
• look at the middle value x in the array
• if x == v, v is found; stop
• else eliminate ½ the array
    if v < x, eliminate the right half
    if v > x, eliminate the left half
• repeat until v is found or no values remain

[Figure: binary search traced step by step over the sorted array ant, bar, bat, cat, dog, elk, fox, owl, rat. Throws away half the entries at every compare.]

Array or vector must be sorted
Data type must provide the < operator
• if !(a < b) && !(b < a) then b == a
Or a Comparator

Worst-case number of compares: the number of times N can be divided by 2
In Big O it is O(logN)
• the difference between log2 N and log N is a constant factor
Scales better than O(N)
• for large N, an O(logN) algorithm does far fewer steps than an O(N) one
If N = 1024, log2(N) is 10

What if we made a special data structure that represents a binary search?
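Before moving on to that data structure, the binary search steps on a sorted array can be sketched in C++ as follows. This is a minimal sketch, not code from the lecture: the name binarySearch and the half-open index range [lo, hi) are assumptions made for illustration. It relies only on the < operator, treating two values as equal when neither is less than the other.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the slides): returns true if v is in the
// sorted vector a. Only operator< is used on the elements.
template <class T>
bool binarySearch(const std::vector<T>& a, const T& v) {
    std::size_t lo = 0;          // first index still in play
    std::size_t hi = a.size();   // one past the last index still in play
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;  // middle value x is a[mid]
        if (v < a[mid]) {
            hi = mid;            // v < x: eliminate the right half
        } else if (a[mid] < v) {
            lo = mid + 1;        // v > x: eliminate the left half
        } else {
            return true;         // neither is less, so x == v
        }
    }
    return false;                // no remaining values
}
```

With the word array from the figure (ant, bar, bat, cat, dog, elk, fox, owl, rat), searching for "fox" should succeed after three compares (dog, owl, fox), while a word that is absent, such as "emu", should fail once the range is empty.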
Binary search tree (BST): a special kind of binary tree
Each node stores a key
• sometimes an associated value
For each node n
• all keys in n’s left subtree are < the key at n
• all keys in n’s right subtree are > the key at n
• if duplicate keys are allowed, keys equal to the key at n can go left XOR right (not both)

Insert a key (and associated data)
Lookup a key (and associated data)
Remove a key (…)
Print all keys in sorted order
• using an inorder traversal

[Figure: example trees checked against the BST rules. Yes: one is a BST, and its inorder traversal produces 2 4 5 6 9. No: another is not a BST, because 7 is not < 6.]

    // private inner class of BST<K>
    class BinaryNode{
    public:
        K key_;
        BinaryNode* left_;
        BinaryNode* right_;
        // constructors
    };

    #include <functional>   // std::less

    template <class K, class Compare = std::less<K> >
    class BST{
    private:
        BinaryNode* root_;
        Compare isLessThan_;
    public:
        BST() { root_ = NULL; }
        bool insert(const K& key);
        bool lookup(const K& key);
        // named remove because delete is a reserved word in C++
        void remove(const K& key);
    };

Key is in the BST if it is in
• the root
• the left subtree
• or the right subtree
Don’t need to look in both subtrees
• just like binary search in an array

    // public driver method
    // method of BST
    bool lookup(const K& key){
        // private recursive helper method (defined below)
        return lookup(root_, key);
    }

    // private method of BST
    bool lookup(BinaryNode* n, const K& k){
        if( n == NULL )
            return false;                    // empty (null) subtree
        else if( isLessThan_(k, n->key_) )
            return lookup(n->left_, k);      // next look left
        else if( isLessThan_(n->key_, k) )
            return lookup(n->right_, k);     // next look right
        else
            return true;                     // value found
    }

Class Activity
Cases
• empty (null) subtree
• value found
• next look left
• next look right
Shout it out
• lookup(4)
• lookup(5)
• lookup(3)
[Figure: BST containing the keys 6, 4, 9, 2, 5]

Lookup always follows a path from the root down
Worst-case
• goes to a leaf along the longest path
• proportional to tree height
Height is related to size
• given size, how can we know height?
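To see that lookup cost tracks the length of the path followed, the class activity can be replayed in code. This is a sketch under assumptions, not the lecture's code: it assumes the activity tree has 6 at the root, 4 and 9 as its children, and 2 and 5 as the children of 4; it uses a plain struct Node with int keys instead of the BST<K> template; it replaces the recursion with a loop; and the visited counter is a hypothetical addition.

```cpp
#include <cstddef>

// Minimal node type for illustration only (the slides use a private
// inner class of BST<K>).
struct Node {
    int key;
    Node* left;
    Node* right;
};

// Same decisions as the recursive lookup, written as a loop.
// The out-parameter visited reports how many nodes were examined.
bool lookup(const Node* n, int k, std::size_t& visited) {
    visited = 0;
    while (n != nullptr) {
        ++visited;
        if (k < n->key)       n = n->left;    // next look left
        else if (n->key < k)  n = n->right;   // next look right
        else                  return true;    // value found
    }
    return false;                             // empty (null) subtree
}

// Builds the assumed activity tree. The nodes have static storage,
// so there is nothing to free.
const Node* activityTree() {
    static Node n2{2, nullptr, nullptr};
    static Node n5{5, nullptr, nullptr};
    static Node n9{9, nullptr, nullptr};
    static Node n4{4, &n2, &n5};
    static Node n6{6, &n4, &n9};
    return &n6;
}
```

Under that assumed shape, lookup(4) examines 2 nodes, lookup(5) examines 3, and lookup(3) examines 3 before running into an empty subtree. In every case the count is no more than the number of nodes on the longest root-to-leaf path.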
Best case: tree is balanced
• all non-leaf nodes have 2 children
• all leaves are at the same depth
• height is about log2N
Worst case: tree is linear
• all non-leaf nodes have a single child
• height is N

[Figure: lookup(2) traced step by step in a BST containing the keys 6, 4, 9, 3, 5, 7, 15, 2. Eliminates half the nodes at every compare.]

Worst-case
• O(height of tree)
Worst of worst
• height is N
• lookup is O(N)
Best worst-case
• height is log2N
• lookup is O(logN)
O(logN) is waaaay better than O(N)

New values are inserted as leaves
Must choose the position to respect BST ordering
• and to ensure we can find it with a lookup
Duplicate keys are not allowed

Traverse the tree
• like a lookup
If we find a duplicate
• return an error
If we end up at a null
• make a new node with the key
• make it the child of the node with the null pointer
Note: these two were our base cases for lookup too

    // members of BST
    // returns false if key is a duplicate
    bool insert(const K& key){
        return insert(root_, key);
    }

    // n is a reference to the parent's pointer, so assigning to n
    // links the new node into the tree
    bool insert(BinaryNode*& n, const K& key){
        if( n == NULL ){
            n = new BinaryNode(key);
            return true;
        }else if( isLessThan_(key, n->key_) ){
            return insert(n->left_, key);
        }else if( isLessThan_(n->key_, key) ){
            return insert(n->right_, key);
        }else{
            // duplicate: report the error, tree unchanged
            return false;
        }
    }

CLASS ACTIVITY
First names BST
• You add your names

Insert time is similar to lookup
• worst case: follow a path from root to leaf
• O(logN) for a balanced tree
• O(N) for a completely unbalanced tree

Find the node n w/ the key to be deleted
Different actions depending on n’s # of kids
• Case 1: n has 0 kids (it’s a leaf)
    set parent’s n-pointer (left or right) to null
• Case 2: n has 1 kid
    set parent’s n-pointer to point to n’s only kid
• Case 3: n has 2 kids
    replace n’s key with a key further down in the tree
    delete that node

What node key can replace n’s key?
• new key of n must be:
    > all keys in the left subtree
    < all keys in the right subtree
Largest key from the left subtree
Smallest key from the right subtree
• let’s choose this one (arbitrarily)
• use findMin on the root of the right subtree

[Figure: delete(17) in a tree containing the keys 8, 15, 20, 18, 16, 17; other subtrees are elided as …]
[Figure: delete(16) in the same tree]
[Figure: delete(15) in the same tree. 16 is the smallest key in the right subtree, so 15’s key is replaced with 16. The old 16 node is then removed as Case 2 (1 kid): replace 16 with its only child, 17.]

Case 1 (n is a leaf) and case 2 (n has 1 kid) both need to update the parent’s pointer
How? Pass a reference to that pointer

    // publicly visible method
    // (named remove because delete is a reserved word in C++)
    void BST<K>::remove(const K& key){
        remove(root_, key);
    }

    // private helper method
    void BST<K>::remove(BinaryNode*& n, const K& k){
        // base case 1 (key not in tree)
        if( n == NULL ){
            return;
        }
        ...
    }

    // private helper method
    void BST<K>::remove(BinaryNode*& n, const K& k){
        ...
        if( isLessThan_(k, n->key_) ){
            remove(n->left_, k);
        }else if( isLessThan_(n->key_, k) ){
            remove(n->right_, k);
        }
        ...
    }

    // private helper method
    void BST<K>::remove(BinaryNode*& n, const K& k){
        ...
        // case 3 (has two children)
        else if( n->left_ != NULL && n->right_ != NULL ){
            // findMin is assumed to return the address of the pointer
            // that refers to the smallest node in the right subtree
            BinaryNode** tmp = findMin(&n->right_);
            n->key_ = (*tmp)->key_;
            // handles cases 1 & 2 for tmp
            removeNodeSimple(*tmp);
        }
        ...
    }

    // private helper method
    void BST<K>::remove(BinaryNode*& n, const K& k){
        ...
        else{
            // handles cases 1 & 2
            removeNodeSimple(n);
        }
        ...
    }

    // private helper method
    // should only be called on nodes with
    // 0 or 1 children
    void removeNodeSimple(BinaryNode*& n){
        // left as an exercise
    }

Find the node to be deleted
• follow the path from the root to the desired node
Delete the node
• worst case: 2 kids
• find the smallest key in the right subtree

Worst-case (delete deepest leaf)
• root to leaf
• H = height of tree
• O(H)
Balanced tree
• O(logN), where N is the number of keys in the tree
O(N) for a completely unbalanced tree

Some BSTs have two data values
• a key: used to lookup
• a value: some data associated with that key
• used to implement maps

In this case:
• two generic types
• binary node has four member variables
• insert has two parameters
• lookup returns the value for the given key
• delete might return the value too

    class BST<K,V,Compare>
        K key_;
        V value_;
        BinaryNode* left_;
        BinaryNode* right_;

        insert(const K& key, const V& value);
        V& lookup(const K& key);
        V remove(const K& key);   // delete is a reserved word in C++

BSTs store comparable keys
• and associated values if desired
Lookup, insert, delete are easy to implement (sort of)
All are worst-case O(height of tree)
• O(N) for unbalanced
• O(logN) for balanced
• O(logN) is waaaaay better than O(N)
If only we could guarantee a tree was balanced…
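To make the three delete cases concrete, here is one possible way the pieces could fit together, including one possible answer to the removeNodeSimple exercise. It is a sketch under assumptions, not the lecture's solution: it uses int keys and free functions instead of the BST<K> template, names the operation removeKey because delete is a reserved word in C++, and adds a hypothetical inorder helper for checking the result. Freeing the nodes that remain in the tree is omitted.

```cpp
#include <vector>

struct Node {
    int key;
    Node* left;
    Node* right;
    explicit Node(int k) : key(k), left(nullptr), right(nullptr) {}
};

// New keys become leaves; duplicates are rejected.
bool insert(Node*& n, int k) {
    if (n == nullptr) { n = new Node(k); return true; }
    if (k < n->key) return insert(n->left, k);
    if (n->key < k) return insert(n->right, k);
    return false;
}

bool lookup(const Node* n, int k) {
    if (n == nullptr) return false;
    if (k < n->key) return lookup(n->left, k);
    if (n->key < k) return lookup(n->right, k);
    return true;
}

// Returns the address of the pointer that refers to the smallest node
// in the subtree *slot: keep going left until there is no left child.
Node** findMin(Node** slot) {
    while ((*slot)->left != nullptr) slot = &(*slot)->left;
    return slot;
}

// Cases 1 and 2 (0 or 1 kids). n is a reference to the parent's pointer,
// so assigning to it unlinks the node from the tree.
void removeNodeSimple(Node*& n) {
    Node* doomed = n;
    n = (doomed->left != nullptr) ? doomed->left : doomed->right;
    delete doomed;
}

void removeKey(Node*& n, int k) {
    if (n == nullptr) return;                       // key not in tree
    if (k < n->key) {
        removeKey(n->left, k);
    } else if (n->key < k) {
        removeKey(n->right, k);
    } else if (n->left != nullptr && n->right != nullptr) {
        Node** tmp = findMin(&n->right);            // case 3: two kids
        n->key = (*tmp)->key;                       // copy the smallest key up
        removeNodeSimple(*tmp);                     // that node has 0 or 1 kids
    } else {
        removeNodeSimple(n);                        // cases 1 and 2
    }
}

// Inorder traversal: appends the keys to out in sorted order.
void inorder(const Node* n, std::vector<int>& out) {
    if (n == nullptr) return;
    inorder(n->left, out);
    out.push_back(n->key);
    inorder(n->right, out);
}
```

For instance, after inserting 8, 15, 10, 20, 18, 16, 17 in that order (10 is a made-up key that gives 15 a left child), removeKey(root, 15) copies 16 into the node that held 15 and then unlinks the old 16 node by pointing its parent at 17, matching the delete(15) walkthrough.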