CS 221 Analysis of Algorithms
Ordered Dictionaries and Search Trees

Portions of these slides come from Michael Goodrich and Roberto Tamassia, Algorithm Design: Foundations, Analysis and Internet Examples, 2002, John Wiley and Sons, and its authors, Michael Goodrich and Roberto Tamassia, the book's publisher John Wiley & Sons and… www.wikipedia.org

Reading material
Goodrich and Tamassia, 2002
Chapter 2, section 2.5, pages 114-137 (see also section 2.6)
Chapter 3, section 3.1, pages 141-151
Wikipedia: http://en.wikipedia.org/wiki/AVL_trees

In the previous episode…
…we defined a data structure which we called a dictionary. It was…
a container to hold multiple objects, or in Goodrich and Tamassia’s terminology, “items”
each item = a (key, element) pair
element = a “piece” of data (think: name, address, phone number)
key = a value we associate with the element to help us find, retrieve, delete, etc. an element (think: RDBMS autoincrement key, student ID#)

Dictionaries
Up till now we have looked at unordered dictionaries
a container for (k,e) pairs, but… in no particular order
Logfiles
Hash Tables

Dictionaries
A terminology note for purposes of our discussion:
A linear unordered dictionary = logfile
A linear ordered dictionary = lookup table

Game Time: Twenty Questions
One person thinks of an object that can be any person, place or thing… and does not disclose the selected object until it is specifically identified by the other players…
All other players take turns asking Yes/No questions in an attempt to identify the mystery object

Game Time: Twenty Questions
An efficient problem-solving strategy is to ask questions whose answers optimally narrow the size of the problem space (the set of possible solutions)
for example, Q: Is it a person? A: Yes
…we just eliminated all places and non-human objects from the solution set

Game Time: Twenty Questions
Size of problem? N = ??? (large, ~∞)
A Yes/No attack makes this a binary search problem…
So, what size of problem space can we effectively search?
2^20 (= 1,048,576)

Game Time: Twenty Questions
Something to think about…
N is conceivably much larger than 2^20
So, how is it that we can usually solve this problem in 20 steps or fewer, i.e. correctly identify the mystery object?

Dictionaries: Ordered Dictionaries
Suppose the items in a dictionary are ordered (sorted), e.g. low to high
Would that make a difference in terms of
size()
isEmpty()
findElement()
insertItem()
removeItem()?

Dictionaries: Ordered Dictionaries
Suppose we implement an ordered dictionary as a linear data structure, or more specifically a vector
items are stored in the vector in key order
we gain considerable efficiency because we can visit D[x], where x is a rank, in O(1) time
Could we achieve the same findElement() time if the ordered dictionary were implemented as a linked list?

Binary Search
Binary search performs operation findElement(k) on a dictionary implemented by means of an array-based sequence, sorted by key
similar to the high-low game
at each step, the number of candidate items is halved
terminates after O(log n) steps
Example: findElement(7)
[Figure: binary search for key 7 in the sorted sequence 1, 3, 4, 5, 7, 8, 9, 11, 14, 16, 18, 19, showing the low (l), middle (m) and high (h) positions at each step until l = m = h]

Binary Search
Method             Logfile    Lookup Table
findElement        O(n)       O(log n)
insertItem         O(1)       O(n)
removeElement      O(n)       O(n)
closestKeyBefore   O(n)       O(log n)
Lookup tables are not very efficient for dynamic data (lots of insertItem and removeElement operations)
Lookup tables are efficient for dictionaries where the predominant access is findElement, with relatively few inserts or removes
e.g. credit card authorizations, code translation tables,…

Binary Search Tree
Binary tree for holding (k,e) items, such that…
each internal node v stores an element e with key k
keys stored in the left subtree of v are <= the key of v
keys stored in the right subtree of v are >= the key of v
external nodes store no elements… only a placeholder (NULL_NODE)

Binary Search Tree
Keys in each left subtree are less than their parent's key
Keys in each right subtree are greater than their parent's key
All leaf nodes hold
no items
[Figure: example binary search tree with keys 58, 31, 90, 25, 12, 42, 36, 62, 75]

Search
Algorithm findElement(k, v)
    if T.isExternal(v)
        return NO_SUCH_KEY
    if k < key(v)
        return findElement(k, T.leftChild(v))
    else if k = key(v)
        return element(v)
    else { k > key(v) }
        return findElement(k, T.rightChild(v))
[Figure: search path in a tree with keys 6, 2, 9, 1, 4, 8, annotated with the comparisons <, >, =]

removeElement(k) – simple case
To perform operation removeElement(k), we search for key k
Assume key k is in the tree, and let v be the node storing k
If node v has a leaf child w, we remove v and w from the tree with operation removeAboveExternal(w)
Example: remove 4
[Figure: tree with keys 6, 2, 9, 1, 4, 8, 5, where v is the node storing 4 and w is its leaf child; after the removal the tree holds keys 6, 2, 9, 1, 5, 8]

removeElement(k) – more complicated case
We consider the case where the key k to be removed is stored at a node v whose children are both internal
we find the internal node w that follows v in an inorder traversal
we copy key(w) into node v
we remove node w and its left child z (which must be a leaf) by means of operation removeAboveExternal(z)
Example: remove 3
[Figure: tree with keys 1, 3, 2, 8, 6, 9, 5, where v is the node storing 3, w is the node storing 5 and z is the left (leaf) child of w; after the removal v stores 5 and the tree holds keys 1, 5, 2, 8, 6, 9]

Binary Search Tree Performance
Consider a dictionary with n items implemented by means of a binary search tree of height h
the space used is O(n)
methods findElement, insertItem and removeElement take O(h) time
The height h is O(n) in the worst case and O(log n) in the best case

Balanced Trees
When a path in a tree gets very long relative to other paths in the tree… the tree is unbalanced
In fact, in its extreme form an unbalanced tree is a linear list.
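
The O(h) bound and the "linear list" extreme can be seen in a small Python sketch. It is only an illustration under stated assumptions: the names Node, find_element, insert_item, height and build are hypothetical, None stands in for an external (placeholder) node, and no rebalancing is attempted. find_element follows the recursive findElement pseudocode from the Search slide.

```python
class Node:
    """Internal node of a binary search tree; None plays the role of an external node."""
    def __init__(self, key, element):
        self.key = key
        self.element = element
        self.left = None
        self.right = None

NO_SUCH_KEY = object()  # sentinel standing in for the slides' NO_SUCH_KEY

def find_element(k, v):
    """Recursive search: go left if k is smaller, right if larger."""
    if v is None:                      # reached an external node
        return NO_SUCH_KEY
    if k < v.key:
        return find_element(k, v.left)
    if k == v.key:
        return v.element
    return find_element(k, v.right)

def insert_item(k, e, v):
    """Insert (k, e) below v without rebalancing; returns the subtree root."""
    if v is None:
        return Node(k, e)
    if k < v.key:
        v.left = insert_item(k, e, v.left)
    else:
        v.right = insert_item(k, e, v.right)
    return v

def height(v):
    """Length of the longest path from v down to an external node."""
    return 0 if v is None else 1 + max(height(v.left), height(v.right))

def build(keys):
    """Insert the keys one by one, in the order given."""
    root = None
    for k in keys:
        root = insert_item(k, str(k), root)
    return root

sorted_tree = build([1, 2, 3, 4, 5, 6, 7])   # keys arrive already sorted
mixed_tree = build([4, 2, 6, 1, 3, 5, 7])    # same keys, "middle first"
print(height(sorted_tree), height(mixed_tree))  # 7 3
```

With the same seven keys, sorted insertion order yields a chain of height 7 (h = n), while the second order yields height 3 (about log n), which is why the insertion order matters when nothing rebalances the tree.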
So, to achieve optimal performance… you need to keep the tree balanced

AVL Trees
we want to maintain a balanced tree
recall: height of a node v = length of the longest path from v to an external node
We want to maintain the principle that, for every node v, the heights of its children differ by no more than 1
This is the Height-Balance Property

AVL Trees
h(right_subtree) - h(left_subtree) = Balance Factor
Balanced: |h(right_subtree) - h(left_subtree)| ∈ {0, 1}
A tree with Balance Factor ∉ {-1, 0, 1} is an unbalanced tree and must be rebalanced
A Balance Factor exists for every node v except (trivially) external nodes

AVL Trees
If Balance Factor = -1, 0 or 1: the tree is balanced and does not need restructuring
If Balance Factor = -2 or 2: the tree is unbalanced and needs restructuring
restructuring is done by a process called rotation

AVL Trees: Rotation
Four types – but two are symmetrical
Left Single Rotation
Right Single Rotation
Left Double Rotation
Right Double Rotation
Since two are symmetrical – we only consider single and double rotation

AVL Trees: Rotation
if BF = 2

AVL Trees
Binary trees that maintain the Height-Balance Property are called AVL trees
the name comes from the inventors G.M. Adelson-Velsky and E.M.
Landis, in a paper entitled “An Algorithm for Information Organization”

AVL Trees
[Figure: an unbalanced tree and a balanced tree]
from: http://en.wikipedia.org/wiki/AVL_trees

AVL Trees
h(right_subtree) - h(left_subtree) = Balance Factor (BF)
If BF ∈ {-1, 0, 1} then the tree is balanced (do nothing)
If BF ∉ {-1, 0, 1} then the tree is unbalanced (must be restructured)
Restructuring is done by rotation

AVL Trees: Rotation
four cases – but pairs are symmetrical
left single rotation
right single rotation
left double rotation
right double rotation
since they are symmetric – we only examine single and double

AVL Trees - Insertion Rotation
If BF = 2, the imbalance occurred in the right subtree
If BF = -2, the imbalance occurred in the left subtree
If the imbalance occurred further down, recursively walk down the subtree until reaching the node where |BF| = 2

AVL Trees - Insertion Rotation
If BF = 2, the imbalance occurred in the right subtree: step down into that subtree to find where the insertion occurred
If BF = -2, the imbalance occurred in the left subtree: step down into that subtree to find where the insertion occurred

AVL Trees - Insertion Rotation
(for the BF = 2 case; the BF = -2 case is symmetric)
If BF at the subtree = 1, the insertion occurred on the right leaf node: single rotation required
If BF at the subtree = -1, the insertion occurred on the left leaf node: double rotation required

AVL Trees - Insertion Rotation
See http://en.wikipedia.org/wiki/AVL_trees

AVL Trees - Insertion Performance
rotations – O(1)
Recall h(T) is maintained at O(log n)
insertItem – O(log n)
balanced tree - priceless

Bounded-depth Search Trees
Search efficiency in a tree is related to the depth of the tree
We can use depth-bounded trees to create ordered dictionaries with O(log n) search and update run-time

Multi-way Search Trees
Remember Binary Search Trees: any node v can have at most 2 children
What if we get rid of that rule?
Suppose a node could have multiple children (>2)
Terminology – if v has d children – v is a d-node

Multi-way Search Trees
Multi-way Search Tree T
Each internal node must have at least two children, i.e. each internal node is a d-node with d ≥ 2
Internal nodes store collections of items (k,e)
Each d-node stores d-1 items
Special keys k0 = -∞ and kd = ∞
External nodes are only placeholders
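
One way to picture how a search might proceed through d-nodes is the Python sketch below. It is a rough illustration, not the textbook's implementation: the names MultiWayNode and find_element are hypothetical, None stands in for an external placeholder node, a failed search returns None, and the sample keys are made up. Each d-node keeps its d-1 keys sorted, and the sentinels k0 = -∞ and kd = ∞ are implicit: child i covers the keys strictly between key i-1 and key i.

```python
from bisect import bisect_left

class MultiWayNode:
    """A d-node: d-1 sorted keys with their elements, and d children."""
    def __init__(self, keys, elements, children=None):
        self.keys = keys              # d-1 keys in increasing order
        self.elements = elements      # elements[i] belongs to keys[i]
        # d children; None stands for an external (placeholder) node
        self.children = children if children is not None else [None] * (len(keys) + 1)

def find_element(k, v):
    """Descend from v; at each d-node choose the child whose key range contains k."""
    while v is not None:
        i = bisect_left(v.keys, k)    # first index with keys[i] >= k
        if i < len(v.keys) and v.keys[i] == k:
            return v.elements[i]
        # here keys[i-1] < k < keys[i], treating k0 = -inf and kd = +inf
        v = v.children[i]
    return None                       # reached an external node: no such key

# Hypothetical example: a 3-node root over a 3-node, a 4-node and a 2-node.
low = MultiWayNode([3, 7], ["e3", "e7"])
mid = MultiWayNode([12, 15, 18], ["e12", "e15", "e18"])
high = MultiWayNode([25], ["e25"])
root = MultiWayNode([10, 20], ["e10", "e20"], [low, mid, high])

print(find_element(15, root))  # e15
print(find_element(8, root))   # None
```

Using binary search inside each node keeps the per-node work at O(log d), so the total search cost depends on the tree's depth, which is the motivation for bounding it.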