Algorithms R OBERT S EDGEWICK | K EVIN W AYNE 3.2 B INARY S EARCH T REES ‣ BSTs ‣ ordered operations Algorithms F O U R T H E D I T I O N R OBERT S EDGEWICK | K EVIN W AYNE http://algs4.cs.princeton.edu ‣ deletion 3.2 B INARY S EARCH T REES ‣ BSTs ‣ ordered operations Algorithms R OBERT S EDGEWICK | K EVIN W AYNE http://algs4.cs.princeton.edu ‣ deletion Binary search trees Definition. A BST is a binary tree in symmetric order. root a left link A binary tree is either: a subtree ・Empty. ・Two disjoint binary trees (left and right). right child of root null links Anatomy of a binary tree Symmetric order. Each node has a key, and every node’s key is: ・ ・Smaller than all keys in its right subtree. Larger than all keys in its left subtree. parent of A and R left link of E S E X A R C keys smaller than E key H 9 value associated with R keys larger than E Anatomy of a binary search tree 3 Binary search tree demo Search. If less, go left; if greater, go right; if equal, search hit. successful search for H S E X A R C H M 4 Binary search tree demo Insert. If less, go left; if greater, go right; if null, insert. insert G S E X A R C H G M 5 BST representation in Java Java definition. A BST is a reference to a root Node. A Node is composed of four fields: ・A Key and a Value. ・A reference to the left and right subtree. smaller keys larger keys private class Node { private Key key; private Value val; private Node left, right; public Node(Key key, Value val) { this.key = key; this.val = val; } } BST Node key left BST with smaller keys val right BST with larger keys Binary search tree Key and Value are generic types; Key is Comparable 6 BST implementation (skeleton) public class BST<Key extends Comparable<Key>, Value> { private Node root; private class Node { /* see previous slide */ root of BST } public void put(Key key, Value val) { /* see next slides */ } public Value get(Key key) { /* see next slides */ } public void delete(Key key) { /* see next slides */ } public Iterable<Key> iterator() { /* see next slides */ } } 7 BST search: Java implementation Get. Return value corresponding to given key, or null if no such key. public Value get(Key key) { Node x = root; while (x != null) { int cmp = key.compareTo(x.key); if (cmp < 0) x = x.left; else if (cmp > 0) x = x.right; else if (cmp == 0) return x.val; } return null; } Cost. Number of compares is equal to 1 + depth of node. 8 BST insert Put. Associate value with key. inserting L S E X A H C Search for key, then two cases: ・Key in tree ⇒ reset value. ・Key not in tree ⇒ add new node. R M P search for L ends at this null link S E X A R H C M create new node P L S E X A R C H M reset links on the way up L P Insertion into a BST 9 BST insert: Java implementation Put. Associate value with key. public void put(Key key, Value val) { root = put(root, key, val); } concise, but tricky, recursive code; read carefully! private Node put(Node x, Key key, Value val) { if (x == null) return new Node(key, val); int cmp = key.compareTo(x.key); if (cmp < 0) x.left = put(x.left, key, val); else if (cmp > 0) x.right = put(x.right, key, val); else if (cmp == 0) x.val = val; return x; } Cost. Number of compares is equal to 1 + depth of node. 10 S C Tree shape R E A X ・Many BSTs correspond to same set of keys. typical case S best case is equal to 1 + depth of node. ・Number of compares for search/insert H E X S C best case R X typical case X A R H C E H R H S X S E worst case C R C H C X A R A S E S C E X typical case H A R E A A A worst case C BST possibilities E H R S BottomAline. Tree depends on order of X insertion. worstshape case C E BST possibilities H 11 BST insertion: random order visualization Ex. Insert keys in random order. 12 Sorting with a binary heap Q. What is this sorting algorithm? 0. Shuffle the array of keys. 1. Insert all keys into a BST. 2. Do an inorder traversal of BST. A. It's not a sorting algorithm (if there are duplicate keys)! Q. OK, so what if there are no duplicate keys? Q. What are its properties? 13 Correspondence between BSTs and quicksort partitioning P H T D O E A I S U Y C M L Remark. Correspondence is 1–1 if array has no duplicate keys. 14 BSTs: mathematical analysis Proposition. If N distinct keys are inserted into a BST in random order, the expected number of compares for a search/insert is ~ 2 ln N. Pf. 1–1 correspondence with quicksort partitioning. Proposition. [Reed, 2003] If N distinct keys are inserted in random order, expected height of tree is ~ 4.311 ln N. How Tall is a Tree? How Tall is a Tree? Bruce Reed Bruce Reed CNRS, Paris, France CNRS, Paris, France reed@moka.ccr.jussieu.fr reed@moka.ccr.jussieu.fr ABSTRACT purpose of this note ABSTRACT have: purpose this note that for /3 -- ½ + ~ 3, Let H~ be the height of a random binaryofsearch tree toonprove n nodes. Wesearch show that there constants a = 4.31107... Let H~ be the height of a random binary tree on n existshave: THEOREM 1. E(H a n d / 3 = 1.95... such that E(H~) = c~logn - / 3 1 o g l o g n + nodes. We show that there exists constants a = 4.31107... Var(Hn) . O(1), We also show that Var(H~) = O(1). THEOREM 1. E(H~) = ~ l o g n - / 3 1 o g l=o gO(1) n + O(1 a n d / 3 = 1.95... such that E(H~) = c~logn - / 3 1 o g l o g n + Var(Hn) = O(1) . O(1), We also show that Var(H~) = O(1). R e m a r k By the def Categories and Subject Descriptors nition given is more R e m a r k By the definition of a, /3 = 7"g~" 3~ The firs as we will see. Categories and Subject Descriptors E.2 [ D a t a S t r u c t u r e s ] : Trees nition given is more suggestive of why this value is co For more informatio as we will see. E.2 [ D a t a S t r u c t u r e s ] : Trees consult [6],[7], For more information on randommay binary search trees[ 1. THE RESULTS R eand m a r[8]. k After I ann may consult [6],[7], [1], [2], [9], [4], A binary search tree is a binary tree to each node of which 1. THE RESULTS developed an alterna Rkeys e m aaxe r k drawn After I from announced these results, Drmota(unp we have associated a key; these some O(1) using complete A binary search tree is a binary tree to each node of which developed anbealternative proof of the fact that Var(H totally ordered set and the key at v cannot larger than proofs illuminate we have associated a key; these keys axe drawn from some O(1) than using different techniques. As ou 15dif at its child thecompletely key at its left decided to submit th totally ordered set and the key atthev key cannot beright larger thannor smaller proofs illuminate different aspects of the problem, we But… Worst-case height is N. [ exponentially small chance when keys are inserted in random order ] ST implementations: summary guarantee average case operations on keys implementation search insert search hit insert sequential search (unordered list) N N ½N N equals() binary search (ordered array) lg N N lg N ½N compareTo() BST N N 1.39 lg N 1.39 lg N compareTo() Why not shuffle to ensure a (probabilistic) guarantee of 4.311 ln N? 16 3.2 B INARY S EARCH T REES ‣ BSTs ‣ ordered operations Algorithms R OBERT S EDGEWICK | K EVIN W AYNE http://algs4.cs.princeton.edu ‣ deletion Minimum and maximum Minimum. Smallest key in table. Maximum. Largest key in table. max()max S min() E min X A R C H M Examples of BST order queries Q. How to find the min / max? 18 Floor and ceiling Floor. Largest key ≤ a given key. Ceiling. Smallest key ≥ a given key. floor(G) max() S min() E X A R C H ceiling(Q) M floor(D) Examples of BST order queries Q. How to find the floor / ceiling? 19 Computing the floor Case 1. [k equals the key in the node] finding floor(G) The floor of k is k. S E X A R H C Case 2. [k is less than the key in the node] M G is less than S so floor(G) must be on the left The floor of k is in the left subtree. S E Case 3. [k is greater than the key in the node] The floor of k is in the right subtree (if there is any key ≤ k in right subtree); X A R H C G is greater than E so floor(G) could be M on the right S E otherwise it is the key in the node. X A R H C M floor(G)in left subtree is null S E A C result X R H M 20 Computing the floor finding floor(G) public Key floor(Key key) { Node x = floor(root, key); if (x == null) return null; return x.key; } private Node floor(Node x, Key key) { if (x == null) return null; int cmp = key.compareTo(x.key); if (cmp == 0) return x; if (cmp < 0) return floor(x.left, key); Node t = floor(x.right, key); if (t != null) return t; else return x; S E X A R H C M G is less than S so floor(G) must be on the left S E X A R H C G is greater than E so floor(G) could be M on the right S E X A R H C M floor(G)in left subtree is null S } E A C result X R H M 21 Rank and select Q. How to implement rank() and select() efficiently? A. In each node, we store the number of nodes in the subtree rooted at that node; to implement size(), return the count at the root. node count 8 6 S E 2 A 3 1 2 C H 1 X R 1 M 22 BST implementation: subtree counts private class Node { private Key key; private Value val; private Node left; private Node right; private int count; } public int size() { return size(root); } private int size(Node x) { if (x == null) return 0; ok to call return x.count; when x is null } number of nodes in subtree initialize subtree private Node put(Node x, Key key, Value val) count to 1 { if (x == null) return new Node(key, val, 1); int cmp = key.compareTo(x.key); if (cmp < 0) x.left = put(x.left, key, val); else if (cmp > 0) x.right = put(x.right, key, val); else if (cmp == 0) x.val = val; x.count = 1 + size(x.left) + size(x.right); return x; } 23 Rank Rank. How many keys < k ? Easy recursive algorithm (3 cases!) node count 8 6 S E 2 A 3 1 2 C H 1 X R 1 M public int rank(Key key) { return rank(key, root); } private int rank(Key key, Node x) { if (x == null) return 0; int cmp = key.compareTo(x.key); if (cmp < 0) return rank(key, x.left); else if (cmp > 0) return 1 + size(x.left) + rank(key, x.right); else if (cmp == 0) return size(x.left); } 24 Inorder traversal ・Traverse left subtree. ・Enqueue key. ・Traverse right subtree. public Iterable<Key> keys() { Queue<Key> q = new Queue<Key>(); inorder(root, q); return q; } private void inorder(Node x, Queue<Key> q) { if (x == null) return; inorder(x.left, q); q.enqueue(x.key); inorder(x.right, q); } BST key left val right BST with larger keys BST with smaller keys smaller keys, in order key larger keys, in order all keys, in order Property. Inorder traversal of a BST yields keys in ascending order. 25 BST: ordered symbol table operations summary sequential search binary search BST search N lg N h insert N N h min / max N 1 h floor / ceiling N lg N h rank N lg N h select N 1 h ordered iteration N log N N N h = height of BST (proportional to log N if keys inserted in random order) order of growth of running time of ordered symbol table operations 26 3.2 B INARY S EARCH T REES ‣ BSTs ‣ ordered operations Algorithms R OBERT S EDGEWICK | K EVIN W AYNE http://algs4.cs.princeton.edu ‣ deletion ST implementations: summary guarantee average case implementation ordered ops? operations on keys search insert delete search hit insert delete sequential search (linked list) N N N ½N N ½N binary search (ordered array) lg N N N lg N ½N ½N ✔ compareTo() BST N N N 1.39 lg N 1.39 lg N ??? ✔ compareTo() equals() Next. Deletion in BSTs. 28 BST deletion: lazy approach To remove a node with a given key: ・Set its value to null. ・Leave key in tree to guide search (but don't consider it equal in search). E E A S A S delete I C H ☠ C I R N H tombstone R N Cost. ~ 2 ln N' per insert, search, and delete (if keys in random order), where N' is the number of key-value pairs ever inserted in the BST. Unsatisfactory solution. Tombstone (memory) overload. 29 Deleting the minimum To delete the minimum key: ・Go left until finding a node with a null left link. ・Replace that node by its right link. ・Update subtree counts. go left until reaching null left link A S X E R H C M return that node’s right link S X E R A public void deleteMin() { root = deleteMin(root); H C } private Node deleteMin(Node x) { if (x.left == null) return x.right; x.left = deleteMin(x.left); x.count = 1 + size(x.left) + size(x.right); return x; } M available for garbage collection update links and node counts after recursive calls 7 S 5 X E R C H M Deleting the minimum in a BST 30 Hibbard deletion To delete a node with key k: search for node t containing key k. Case 0. [0 children] Delete t by setting parent link to null. deleting C update counts after recursive calls S S E X A R C E R M node to delete replace with null link 1 E X A R H H H S 5 X A 7 M M available for garbage collection C 31 Hibbard deletion To delete a node with key k: search for node t containing key k. Case 1. [1 child] Delete t by replacing parent link. deleting R update counts after recursive calls S S E X A R C H M node to delete E C X E M X H A replace with child link S 5 H A 7 C M available for garbage collection R 32 Hibbard deletion deleting E node to delete S To delete a node with key k: search for node tEcontainingX key k. A R H C M Case 2. [2 children] ・ ・Delete the minimum in t's right subtree. A C ・Put x in t's spot. Find successor x of t. t S E node to delete S E X H C M x search for key E successor M go right, then go left until reaching null left link min(t.right) X deleteMin(t.right) X M 7 S 5 H A X R C S E E R R H C min(t.right) S H X x M C S E x t.left still a BST successor H A t A R go right, then go left until reaching null left link R x has no left child but X don't garbage collect x x deleting E A search for key E M update links and node counts after recursive calls Deletion in a BST 33 Hibbard deletion: Java implementation public void delete(Key key) { root = delete(root, key); } private Node delete(Node x, Key key) { if (x == null) return null; int cmp = key.compareTo(x.key); if (cmp < 0) x.left = delete(x.left, key); else if (cmp > 0) x.right = delete(x.right, key); else { if (x.right == null) return x.left; if (x.left == null) return x.right; Node t = x; x = min(t.right); x.right = deleteMin(t.right); x.left = t.left; } x.count = size(x.left) + size(x.right) + 1; return x; search for key no right child no left child replace with successor update subtree counts } 34 Hibbard deletion: analysis Unsatisfactory solution. Not symmetric. Surprising consequence. Trees not random (!) ⇒ √ N per op. Longstanding open problem. Simple and efficient delete for BSTs. 35 ST implementations: summary guarantee average case implementation ordered ops? operations on keys search insert delete search hit insert delete sequential search (linked list) N N N ½N N ½N binary search (ordered array) lg N N N lg N ½N ½N ✔ compareTo() BST N N N 1.39 lg N 1.39 lg N √N ✔ compareTo() equals() other operations also become √N if deletions allowed Next lecture. Guarantee logarithmic performance for all operations. 36