241-423 Advanced Data Structures and Algorithms
Semester 2, 2012-2013

12. Balanced Search Trees

Objectives
– discuss various kinds of balanced search trees: AVL trees, 2-3-4 trees, Red-Black trees

ADSA: Balanced Trees/12

Contents
1. What is a Balanced Binary Search Tree?
2. AVL Trees
3. 2-3-4 Trees
4. Red-Black Trees

1. What is a Balanced Binary Search Tree?
• A balanced search tree is one where all the branches from the root have almost the same height.
[Figure: a balanced tree and an unbalanced tree]
• As a tree becomes more unbalanced, the search running time grows from O(log n) to O(n)
  – because the tree's shape degenerates into a list
• We want to keep the binary search tree balanced as nodes are added and removed, so that searching and insertion remain fast.

1.1. Balanced BSTs: AVL Trees
• An AVL tree maintains height balance
  – for each node, the difference in height of its two subtrees is in the range -1 to 1

1.2. 2-3-4 Trees
• A multiway tree where each node has at most 4 children, and a node can hold up to 3 values.
• A 2-3-4 tree can be perfectly balanced
  – no difference in height between branches
  – but it requires complex nodes and links

1.3. Red-Black Trees
• A red-black tree is a binary version of a 2-3-4 tree
  – the nodes have a 'color' attribute: BLACK or RED
  – drawn in Ford and Topp (and here) in white and gray!!
  – the tree maintains a balance measure called the BLACK height
[Figure: a BLACK node and a RED node]

1.4. B-Trees
• A multiway tree where each node has at most m children, and a node can hold up to m-1 values
  – a more general version of a 2-3-4 tree
• B-Trees are most commonly used in databases and filesystems
  – most nodes are stored in secondary storage such as hard drives
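To make the O(n) claim concrete, the following sketch inserts the same seven keys into an ordinary, non-balancing BST in two orders (sorted, and middle-first) and reports the resulting heights. The class and method names (BstSkewDemo, insert, heightAfterInserts) are hypothetical and are not part of Ford and Topp's library; the height convention (an empty tree has height -1, a leaf 0) is the same one the AVL code uses later.

```java
// Sketch: a plain (non-balancing) BST degenerates into a list on sorted input
class BstSkewDemo {
    // Minimal BST node; illustrative only, not Ford and Topp's node class
    static final class Node {
        int value;
        Node left, right;
        Node(int value) { this.value = value; }
    }

    // Standard BST insertion with no rebalancing; duplicates are ignored
    static Node insert(Node t, int value) {
        if (t == null) return new Node(value);
        if (value < t.value) t.left = insert(t.left, value);
        else if (value > t.value) t.right = insert(t.right, value);
        return t;
    }

    // height(empty) = -1, so a single node has height 0
    static int height(Node t) {
        return t == null ? -1 : 1 + Math.max(height(t.left), height(t.right));
    }

    static int heightAfterInserts(int[] keys) {
        Node root = null;
        for (int k : keys) root = insert(root, k);
        return height(root);
    }

    public static void main(String[] args) {
        int[] sorted = {1, 2, 3, 4, 5, 6, 7};
        int[] middleFirst = {4, 2, 6, 1, 3, 5, 7};
        System.out.println("sorted order height = " + heightAfterInserts(sorted));       // 6
        System.out.println("middle-first height = " + heightAfterInserts(middleFirst));  // 2
    }
}
```

Sorted input gives height n - 1, so every search walks a list; the balanced trees in this chapter exist to prevent exactly this shape.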
2. AVL Trees
• For each AVL tree node, the difference between the heights of its left and right subtrees is either -1, 0 or +1
  – this is called the balance factor of a node
• balanceFactor = height(left subtree) - height(right subtree), i.e. L - R
  – if balanceFactor > 1 or < -1, the tree is too unbalanced and needs 'rearranging' to make it more balanced
• Heaviness
  – if the balanceFactor is positive, the node is "heavy on the left": the height of the left subtree is greater than the height of the right subtree
  – a negative balanceFactor means the node is "heavy on the right"
[Figure: three trees. L - R = 0 - 1: the root is heavy on the right, but still balanced. L - R = 2 - 1: the root is heavy on the left, but still balanced. L - R = 1 - 2: the root is heavy on the right, but still balanced.]

2.1. The AVLTree Class

Using AVLTree
    String[] stateList = {"NV", "NY", "MA", "CA", "GA"};
    AVLTree<String> avltreeA = new AVLTree<String>();
    for (int i = 0; i < stateList.length; i++)
        avltreeA.add(stateList[i]);
    System.out.println("States: " + avltreeA);

    int[] arr = {50, 95, 60, 90, 70, 80, 75, 78};
    AVLTree<Integer> avltreeB = new AVLTree<Integer>();
    for (int i = 0; i < arr.length; i++)
        avltreeB.add(arr[i]);

    // display the tree
    System.out.println(avltreeB.displayTree(2));
    avltreeB.drawTree(2);

Execution
    States: [CA, GA, MA, NV, NY]
    70(-1) 60(1) 50(0) 90(1) 78(0) 75(0) 95(0) 80(0)
• Each value is followed by its balance factor in parentheses.
[Figure: avltreeB as drawn by drawTree(2). The root 70 (balance factor -1) is heavy on the right, but still balanced.]

The AVLTree Node
• An AVLNode contains the node's value, references to the node's two subtrees, and the node height.
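The balance factors in the execution output can be recomputed directly from subtree heights. This sketch assumes the final shape of avltreeB is 70(60(50, -), 90(78(75, 80), 95)), which is what tracing the eight insertions by hand gives; BalanceFactorDemo and its methods are illustrative names, and heights are recomputed recursively here rather than stored in the node as AVLNode does.

```java
// Sketch: computing balance factors and checking the AVL condition
class BalanceFactorDemo {
    // Minimal node without a stored height (AVLNode stores it instead)
    static final class Node {
        final int value;
        final Node left, right;
        Node(int value, Node left, Node right) {
            this.value = value;
            this.left = left;
            this.right = right;
        }
        Node(int value) { this(value, null, null); }
    }

    // height(empty) = -1, so a leaf has height 0
    static int height(Node t) {
        return t == null ? -1 : Math.max(height(t.left), height(t.right)) + 1;
    }

    // balanceFactor = height(left subtree) - height(right subtree)
    static int balanceFactor(Node t) {
        return height(t.left) - height(t.right);
    }

    // AVL condition: every node has a balance factor of -1, 0 or +1
    static boolean isAvl(Node t) {
        if (t == null) return true;
        int bf = balanceFactor(t);
        return bf >= -1 && bf <= 1 && isAvl(t.left) && isAvl(t.right);
    }

    // Assumed final shape of avltreeB: 70(60(50, -), 90(78(75, 80), 95))
    static Node exampleTree() {
        return new Node(70,
                new Node(60, new Node(50), null),
                new Node(90, new Node(78, new Node(75), new Node(80)), new Node(95)));
    }

    public static void main(String[] args) {
        Node root = exampleTree();
        System.out.println(balanceFactor(root));        // -1: heavy on the right
        System.out.println(balanceFactor(root.left));   // 1
        System.out.println(balanceFactor(root.right));  // 1
        System.out.println(isAvl(root));                // true

        // a three-node chain is too heavy on the right
        Node chain = new Node(1, null, new Node(2, null, new Node(3)));
        System.out.println(balanceFactor(chain) + " " + isAvl(chain));  // -2 false
    }
}
```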
    height(node) = max(height(node.left), height(node.right)) + 1;
[Figure: an AVLNode object with fields nodeValue, height, left and right]

    private static class AVLNode<T>
    {
        public T nodeValue;             // node data
        public int height;              // height of the subtree rooted here
        public AVLNode<T> left, right;  // child links

        public AVLNode(T item)
        {
            nodeValue = item;
            height = 0;
            left = null;
            right = null;
        }
    }
• Use of public is bad; the coding style is due to Ford & Topp.

2.2. Adding a Node to the Tree
• The addition of a node may cause the tree to go out of balance.
  – addNode() adds a node and may reorder nodes as it returns from the addition back to the root
    • reordering is the new idea in AVL trees
  – the reordering is done using single and double rotations

2.2.1. Too Heavy on the Left
[Figure: node P = 25 has left child 12 (children 5 and 20) and right child 30; L - R = 3 - 1, so P has balance factor 2. Two cases: 11 is added below the outside grandchild 5, or 22 is added below the inside grandchild 20 (whether it goes in the left or right branch doesn't matter).]
• Inserting a node in the left subtree of P (e.g. adding 11 or 22) may cause P to become "too heavy on the left"
  – balance factor == 2
• The new node can be either in the outside or the inside grandchild subtree:
  – outside grandchild = left-left
  – inside grandchild = left-right

2.2.2. Too Heavy on the Right
[Figure: node P = 25 has left child 12 and right child 30 (children 27 and 45); L - R = 1 - 3, so P has balance factor -2. Two cases: 40 is added below the outside grandchild 45, or 29 is added below the inside grandchild 27 (the branch doesn't matter).]
• Inserting a node in the right subtree of P (e.g.
adding 29 or 40) may cause P to become "too heavy on the right"
  – balance factor == -2
• The new node can be either in the outside or the inside grandchild subtree:
  – outside grandchild = right-right
  – inside grandchild = right-left

2.3. Single Rotations
• When a new item is added to the subtree of an outside grandchild, the imbalance is fixed with a single right or left rotation
• Two cases:
  – left outside grandchild (left-left) --> single right rotation
  – right outside grandchild (right-right) --> single left rotation

2.3.1. Single Right Rotation
• A single right rotation occurs when a new element is added to the subtree of the left outside grandchild (left-left)
[Figure: single right rotation for a left outside grandchild (left-left), showing which links are cut and which are added]
• A single right rotation rotates the left child (LC) up to replace the parent (P)
  – the parent becomes LC's new right child
• The right subtree of LC (RGC) is attached as the left child of P
  – this is OK since the nodes in RGC are greater than LC but less than P

singleRotateRight()
    // single right rotation on p
    private static <T> AVLNode<T> singleRotateRight(AVLNode<T> p)
    {
        AVLNode<T> lc = p.left;

        p.left = lc.right;   // RGC becomes the left subtree of p
        lc.right = p;        // p becomes the right child of lc

        // update p first, since it is now below lc
        p.height = Math.max(height(p.left), height(p.right)) + 1;
        lc.height = Math.max(height(lc.left), height(lc.right)) + 1;

        return lc;           // lc is the new root of this subtree
    }

    private static <T> int height(AVLNode<T> t)
    {
        if (t == null)
            return -1;
        else
            return t.height;
    }

2.3.2. Single Left Rotation
• A single left rotation occurs when a new element is added to the subtree of the right outside grandchild (right-right).
• The rotation exchanges the parent (P) and right child (RC) nodes, and attaches the subtree LGC as the right subtree of P.
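The sketch below exercises singleRotateRight() and its mirror image, singleRotateLeft() (whose slide code follows), on the smallest possible outside-grandchild cases: a three-node left-left chain and a three-node right-right chain. It assumes a cut-down AVLNode holding int values; the helpers fixHeight(), chain() and show() are hypothetical additions for the demonstration.

```java
// Sketch: single right and left rotations on three-node chains
class SingleRotationDemo {
    // Cut-down AVLNode with int values (an assumption for brevity)
    static final class AVLNode {
        int nodeValue;
        int height;              // a new node is a leaf: height 0
        AVLNode left, right;
        AVLNode(int item) { nodeValue = item; }
    }

    static int height(AVLNode t) { return t == null ? -1 : t.height; }

    static void fixHeight(AVLNode t) {
        t.height = Math.max(height(t.left), height(t.right)) + 1;
    }

    // LC replaces P; LC's right subtree (RGC) becomes P's left subtree
    static AVLNode singleRotateRight(AVLNode p) {
        AVLNode lc = p.left;
        p.left = lc.right;
        lc.right = p;
        fixHeight(p);            // p is now below lc, so update it first
        fixHeight(lc);
        return lc;
    }

    // Mirror image: RC replaces P; RC's left subtree (LGC) becomes P's right subtree
    static AVLNode singleRotateLeft(AVLNode p) {
        AVLNode rc = p.right;
        p.right = rc.left;
        rc.left = p;
        fixHeight(p);
        fixHeight(rc);
        return rc;
    }

    // Builds top -> mid -> low, linked on the left (left-left) or on the right (right-right)
    static AVLNode chain(int top, int mid, int low, boolean leftLinks) {
        AVLNode a = new AVLNode(top), b = new AVLNode(mid), c = new AVLNode(low);
        if (leftLinks) { a.left = b; b.left = c; }
        else { a.right = b; b.right = c; }
        fixHeight(c);
        fixHeight(b);
        fixHeight(a);
        return a;
    }

    // Compact preorder view: value(left,right), with "-" for an empty subtree
    static String show(AVLNode t) {
        if (t == null) return "-";
        if (t.left == null && t.right == null) return String.valueOf(t.nodeValue);
        return t.nodeValue + "(" + show(t.left) + "," + show(t.right) + ")";
    }

    public static void main(String[] args) {
        // left-left: 10 is the left outside grandchild of 30
        AVLNode r = singleRotateRight(chain(30, 20, 10, true));
        System.out.println(show(r) + " height " + r.height);   // 20(10,30) height 1

        // right-right: 30 is the right outside grandchild of 10
        AVLNode l = singleRotateLeft(chain(10, 20, 30, false));
        System.out.println(show(l) + " height " + l.height);   // 20(10,30) height 1
    }
}
```

In both cases the middle value rises to the top and the subtree height drops from 2 to 1, which is exactly the height lost by the insertion that caused the imbalance.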
[Figure: single left rotation for a right outside grandchild (right-right), showing which links are cut and which are added]

singleRotateLeft()
    // single left rotation on p
    private static <T> AVLNode<T> singleRotateLeft(AVLNode<T> p)
    {
        AVLNode<T> rc = p.right;

        p.right = rc.left;   // LGC becomes the right subtree of p
        rc.left = p;         // p becomes the left child of rc

        // update p first, since it is now below rc
        p.height = Math.max(height(p.left), height(p.right)) + 1;
        rc.height = Math.max(height(rc.left), height(rc.right)) + 1;

        return rc;           // rc is the new root of this subtree
    }

2.4. Double Rotations
• When a new item is added to the subtree of an inside grandchild, the imbalance is fixed with a double right or left rotation
  – a double rotation is two single rotations
• Two cases:
  – left inside grandchild (left-right) --> double right rotation
  – right inside grandchild (right-left) --> double left rotation

2.4.1. A Double Right Rotation
[Figure: for a left inside grandchild (left-right), a single left rotation about LC is followed by a single right rotation about P. Watch RGC rise to the top; the result is balanced.]

doubleRotateRight()
    // double right rotation on p: a left rotation, then a right rotation
    private static <T> AVLNode<T> doubleRotateRight(AVLNode<T> p)
    {
        p.left = singleRotateLeft(p.left);
        return singleRotateRight(p);
    }

2.4.2. A Double Left Rotation
[Figure: for a right inside grandchild (right-left), a single right rotation about RC is followed by a single left rotation about P. Watch LGC rise to the top; the result is balanced.]

doubleRotateLeft()
    // double left rotation on p: a right rotation, then a left rotation
    private static <T> AVLNode<T> doubleRotateLeft(AVLNode<T> p)
    {
        p.right = singleRotateRight(p.right);
        return singleRotateLeft(p);
    }

2.5. addNode()
• addNode() recurses down to the insertion point and inserts the node.
• As it returns, it visits the nodes in reverse order, fixing any imbalances using rotations.
• It must handle four cases:
  – balance factor == 2: left-left, left-right
  – balance factor == -2: right-left, right-right

Basic addNode()
(t was called P in the earlier slides; no AVL rotation code has been added yet)
    private Node<T> addNode(Node<T> t, T item)
    {
        if (t == null)                              // found insertion point
            t = new Node<T>(item);
        else if (((Comparable<T>)item).compareTo(t.nodeValue) < 0)
            t.left = addNode(t.left, item);         // visit left subtree
        else if (((Comparable<T>)item).compareTo(t.nodeValue) > 0)
            t.right = addNode(t.right, item);       // visit right subtree
        else
            throw new IllegalStateException();      // duplicate error

        return t;
    }  // end of addNode()

AVL rotation code added
    private AVLNode<T> addNode(AVLNode<T> t, T item)
    {
        if (t == null)                              // found insertion point
            t = new AVLNode<T>(item);
        else if (((Comparable<T>)item).compareTo(t.nodeValue) < 0) {
            // visit left subtree: add node, then maybe rotate
            t.left = addNode(t.left, item);
            if (height(t.left) - height(t.right) == 2) {     // too heavy on left
                if (((Comparable<T>)item).compareTo(t.left.nodeValue) < 0)
                    t = singleRotateRight(t);   // problem on left-left
                else
                    t = doubleRotateRight(t);   // problem on left-right: left then right rotation
            }
        }
        else if (((Comparable<T>)item).compareTo(t.nodeValue) > 0) {
            // visit right subtree: add node, then maybe rotate
            t.right = addNode(t.right, item);
            if (height(t.left) - height(t.right) == -2) {    // too heavy on right
                if (((Comparable<T>)item).compareTo(t.right.nodeValue) > 0)
                    t = singleRotateLeft(t);    // problem on right-right
                else
                    t = doubleRotateLeft(t);    // problem on right-left: right then left rotation
            }
        }
        else   // duplicate
            throw new IllegalStateException();

        // calculate the new height of t
        t.height = Math.max(height(t.left), height(t.right)) + 1;
        return t;
    }  // end of addNode()

add()
• The public interface for inserting an item:
    public boolean add(T item)
    {
        try {
            root = addNode(root, item);   // start from root
        }
        catch (IllegalStateException e) {
            return false;                 // item is a duplicate
        }

        // increment the tree size and modCount
        treeSize++;
        modCount++;
        return true;                      // node was added ok
    }

2.6. Building an AVL Tree
[Figure: building an AVL tree step by step; a gray node is too heavy. The steps show, in turn: a left outside grandchild (left-left); a right outside grandchild (right-right); a right inside grandchild (right-left), fixed by a double left rotation (right then left rotation); and a left inside grandchild (left-right), fixed by a double right rotation (left then right rotation).]

2.7. Efficiency of AVL Tree Insertion
• Detailed analysis shows that the height h of an AVL tree with n nodes satisfies
    $\lfloor \log_2 n \rfloor \le h < 1.4405 \log_2(n+2) - 1.3277$
• So the worst-case running time for insertion is O(log2 n).
• The worst case for deletion is also O(log2 n).

2.8. Deletion in an AVL Tree
• Deletion can easily cause an imbalance
  – e.g. delete 32
[Figure: before the deletion of 32, the root 44 has left child 17 (whose right child is 32) and right child 62 (children 50 and 78; 50 has children 48 and 54, and 78 has child 88). After the deletion, the left subtree of 44 is just 17 while its right subtree has height 2, so 44 is out of balance.]
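Putting the pieces of this section together, here is a compact, self-contained sketch of AVL insertion. It assumes T extends Comparable<T> instead of casting item to Comparable<T>, drops modCount, and adds a hypothetical shape() helper to inspect the result; it illustrates the technique and is not Ford and Topp's AVLTree class.

```java
// Self-contained sketch of AVL insertion, modelled on the slides' addNode()
class AvlSketch<T extends Comparable<T>> {
    private static final class AVLNode<E> {
        E nodeValue;
        int height;                  // a new node is a leaf: height 0
        AVLNode<E> left, right;
        AVLNode(E item) { nodeValue = item; }
    }

    private AVLNode<T> root;
    private int treeSize;

    private static <E> int height(AVLNode<E> t) { return t == null ? -1 : t.height; }

    private static <E> void fixHeight(AVLNode<E> t) {
        t.height = Math.max(height(t.left), height(t.right)) + 1;
    }

    private static <E> AVLNode<E> singleRotateRight(AVLNode<E> p) {
        AVLNode<E> lc = p.left;
        p.left = lc.right;
        lc.right = p;
        fixHeight(p);                // p is now below lc, so update it first
        fixHeight(lc);
        return lc;
    }

    private static <E> AVLNode<E> singleRotateLeft(AVLNode<E> p) {
        AVLNode<E> rc = p.right;
        p.right = rc.left;
        rc.left = p;
        fixHeight(p);
        fixHeight(rc);
        return rc;
    }

    // left rotation about p.left, then right rotation about p
    private static <E> AVLNode<E> doubleRotateRight(AVLNode<E> p) {
        p.left = singleRotateLeft(p.left);
        return singleRotateRight(p);
    }

    // right rotation about p.right, then left rotation about p
    private static <E> AVLNode<E> doubleRotateLeft(AVLNode<E> p) {
        p.right = singleRotateRight(p.right);
        return singleRotateLeft(p);
    }

    private AVLNode<T> addNode(AVLNode<T> t, T item) {
        if (t == null) return new AVLNode<>(item);       // found the insertion point
        int cmp = item.compareTo(t.nodeValue);
        if (cmp < 0) {
            t.left = addNode(t.left, item);
            if (height(t.left) - height(t.right) == 2)    // too heavy on the left
                t = item.compareTo(t.left.nodeValue) < 0 ? singleRotateRight(t) : doubleRotateRight(t);
        } else if (cmp > 0) {
            t.right = addNode(t.right, item);
            if (height(t.left) - height(t.right) == -2)   // too heavy on the right
                t = item.compareTo(t.right.nodeValue) > 0 ? singleRotateLeft(t) : doubleRotateLeft(t);
        } else {
            throw new IllegalStateException("duplicate");
        }
        fixHeight(t);                                     // recalculate the height of t
        return t;
    }

    public boolean add(T item) {
        try {
            root = addNode(root, item);
        } catch (IllegalStateException e) {
            return false;                                 // item is a duplicate
        }
        treeSize++;
        return true;
    }

    public int size() { return treeSize; }

    // Hypothetical helper: compact preorder view value(left,right), "-" = empty subtree
    public String shape() { return shape(root); }

    private static <E> String shape(AVLNode<E> t) {
        if (t == null) return "-";
        if (t.left == null && t.right == null) return String.valueOf(t.nodeValue);
        return t.nodeValue + "(" + shape(t.left) + "," + shape(t.right) + ")";
    }

    public static void main(String[] args) {
        AvlSketch<Integer> b = new AvlSketch<>();
        for (int v : new int[]{50, 95, 60, 90, 70, 80, 75, 78}) b.add(v);
        System.out.println(b.shape());   // 70(60(50,-),90(78(75,80),95))
        System.out.println(b.add(70));   // false: duplicate
        AvlSketch<String> a = new AvlSketch<>();
        for (String s : new String[]{"NV", "NY", "MA", "CA", "GA"}) a.add(s);
        System.out.println(a.shape());   // NV(GA(CA,MA),NY)
    }
}
```

Replaying the avltreeB insertions exercises all four cases: adding 60 and 80 triggers right-left double rotations, 70 a left-left single rotation, and 78 a left-right double rotation. The final shape agrees with the balance factors in the execution output of section 2.1.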
3. 2-3-4 Trees
• In a 2-3-4 tree, the numbers refer to the maximum number of branches that can leave a node:
  – a 2-node has 1 value and a max of 2 children (the same as a binary tree node)
  – a 3-node has 2 values and a max of 3 children
  – a 4-node has 3 values and a max of 4 children

3.1. Searching a 2-3-4 Tree
• To find an item:
  – start at the root and compare the item with all the values in the node;
  – if there's no match, move down to the appropriate subtree;
  – repeat until you find a match or reach an empty subtree

Search Example
[Figure: an example 2-3-4 tree. Try finding 9 and 30.]

3.2. Inserting into a 2-3-4 Tree
• Search to the bottom for an insertion node
  – 2-node at bottom: convert it to a 3-node
  – 3-node at bottom: convert it to a 4-node
  – 4-node at bottom: ??

Splitting 4-nodes
• Transform the tree on the way down:
  – this ensures that the last node is not a 4-node
  – a local transformation splits each 4-node
• Insertion at the bottom is now easy, since the node there is not a 4-node

Example
• To split a 4-node, move its middle value up into the parent.
[Figure: splitting a 4-node]

3.3. Building a 2-3-4 Tree
[Figure: insert 4. Each frame notes which 4-node will be split during the next insertion.]
[Figure: insert 10. Insertions happen at the bottom. This 4-node will be split during the next insertion.]
[Figure: insert 55. The insertion point is at level 1, so the new 4-node at level 0 is not split during this insertion.]
[Figure: insert 11. First the 4-node (4, 12, 25) at the root is split, giving root 12 with children 4 (whose children are 2 and 8, 10) and 25 (whose children are 15 and 35, 55); then 11 is inserted, turning (8, 10) into the 4-node (8, 10, 11). This 4-node will be split during the next insertion.]

Another Example
[Figure: an insertion whose search misses the 4-nodes on the left, so they are not changed]
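The top-down strategy of sections 3.2 and 3.3 (split every 4-node met on the way down, then insert at a leaf) can be sketched in Java as follows. The node layout, the class name Tree234Sketch and the helpers slot() and show() are assumptions. The insertion order 2, 15, 12, 4, 8, 10, 25, 35, 55, 11 used in main() is a guess that reproduces the trees described for "insert 55" and "insert 11" in section 3.3.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a 2-3-4 tree with top-down (split-on-the-way-down) insertion
class Tree234Sketch {
    static final class Node {
        final List<Integer> keys = new ArrayList<>(); // 1 to 3 values, sorted
        final List<Node> kids = new ArrayList<>();    // keys.size() + 1 children, or none for a leaf
        boolean isLeaf() { return kids.isEmpty(); }
        boolean isFourNode() { return keys.size() == 3; }
    }

    Node root;

    // Index of the first value >= x, i.e. which subtree to follow
    private static int slot(Node t, int x) {
        int i = 0;
        while (i < t.keys.size() && x > t.keys.get(i)) i++;
        return i;
    }

    // Compare x with all values in the node; if no match, move down to the matching subtree
    boolean contains(int x) {
        for (Node t = root; t != null; ) {
            int i = slot(t, x);
            if (i < t.keys.size() && t.keys.get(i) == x) return true;
            t = t.isLeaf() ? null : t.kids.get(i);
        }
        return false;
    }

    // Split the 4-node parent.kids[i]: its middle value moves up into parent
    private static void splitChild(Node parent, int i) {
        Node full = parent.kids.get(i);
        Node right = new Node();
        int middle = full.keys.get(1);
        right.keys.add(full.keys.get(2));
        full.keys.subList(1, 3).clear();              // full keeps only its smallest value
        if (!full.isLeaf()) {                         // the two rightmost children move too
            right.kids.addAll(full.kids.subList(2, 4));
            full.kids.subList(2, 4).clear();
        }
        parent.keys.add(i, middle);
        parent.kids.add(i + 1, right);
    }

    void insert(int x) {
        if (root == null) { root = new Node(); root.keys.add(x); return; }
        if (root.isFourNode()) {                      // splitting the root adds a level
            Node newRoot = new Node();
            newRoot.kids.add(root);
            splitChild(newRoot, 0);
            root = newRoot;
        }
        Node t = root;
        while (true) {
            int i = slot(t, x);
            if (i < t.keys.size() && t.keys.get(i) == x) return;  // duplicate: ignore
            if (t.isLeaf()) { t.keys.add(i, x); return; }           // never a 4-node here
            if (t.kids.get(i).isFourNode()) {
                splitChild(t, i);
                if (x == t.keys.get(i)) return;                     // x was the middle value
                if (x > t.keys.get(i)) i++;
            }
            t = t.kids.get(i);
        }
    }

    // Compact view: [values](child child ...)
    static String show(Node t) {
        StringBuilder sb = new StringBuilder(t.keys.toString());
        if (!t.isLeaf()) {
            sb.append('(');
            for (int i = 0; i < t.kids.size(); i++)
                sb.append(i == 0 ? "" : " ").append(show(t.kids.get(i)));
            sb.append(')');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Tree234Sketch tree = new Tree234Sketch();
        for (int v : new int[]{2, 15, 12, 4, 8, 10, 25, 35, 55}) tree.insert(v);
        System.out.println(show(tree.root));  // [4, 12, 25]([2] [8, 10] [15] [35, 55])
        tree.insert(11);                      // the root 4-node is split first
        System.out.println(show(tree.root));  // [12]([4]([2] [8, 10, 11]) [25]([15] [35, 55]))
        System.out.println(tree.contains(11) + " " + tree.contains(9));  // true false
    }
}
```

Because a 4-node is always split before the search enters it, the leaf reached at the end holds at most two values, so the final insertion never overflows.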
3.4. Efficiency of 2-3-4 Trees: fast!
• Searching for an item in a 2-3-4 tree with n elements:
  – the max number of nodes visited during the search is int(log2 n) + 1
• Inserting an element into a 2-3-4 tree:
  – requires splitting no more than int(log2 n) + 1 4-nodes
    • normally it requires far fewer splits

3.5. Drawbacks of 2-3-4 Trees
• Since any node may become a 4-node, all nodes must have space for 3 values and 4 links
  – but most nodes are not 4-nodes
  – this wastes a lot of memory, unless the implementation is fancier
• Complex nodes and links
  – slower to process than binary search trees

4. Red-Black Trees
• A red-black tree is a binary search tree where each node has a 'color'
  – BLACK or RED
• A red-black tree is a binary version of a 2-3-4 tree, using different color combinations to represent 3-nodes and 4-nodes.
  – a 2-node is already a binary node
[Figure: a BLACK node and a RED node. BLACK and RED are drawn in Ford and Topp (and here) in white and gray!!]

4.1. From 2-3-4 Tree Nodes to Red-Black Nodes

2-node Conversion
• A 2-node is already a binary node, so it doesn't need to change its shape.
• The color of a 2-node is always BLACK (drawn as white in these slides).

4-node Conversion
• A 4-node has its middle value become a BLACK (white) parent, and the other values become RED (gray) children.

3-node Conversion
• Represent a 3-node as either:
  – a BLACK parent and a smaller RED left child, or
  – a BLACK parent and a larger RED right child
[Figure: a 3-node (A, B) with subtrees S, T and U in a 2-3-4 tree. (a) Red-black representation where A is a BLACK parent and B is a RED right child. (b) Red-black representation where B is a BLACK parent and A is a RED left child.]
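The conversion rules of section 4.1 are purely local, so a whole 2-3-4 tree can be converted bottom-up, one node at a time. This sketch assumes int values and uses option (b) for 3-nodes (B as the BLACK parent, A as a RED left child); the class and method names are hypothetical. exampleTree() converts the 2-3-4 tree reached after "insert 11" in section 3.3.

```java
// Sketch: converting 2-3-4 nodes into red-black nodes, bottom-up
class Convert234ToRedBlack {
    enum Color { BLACK, RED }

    static final class RBNode {
        final int value;
        final Color color;
        RBNode left, right;
        RBNode(int value, Color color, RBNode left, RBNode right) {
            this.value = value; this.color = color; this.left = left; this.right = right;
        }
    }

    // keys: the 1 to 3 sorted values of one 2-3-4 node
    // kids: its keys.length + 1 subtrees, already converted (null = empty subtree)
    static RBNode convert(int[] keys, RBNode[] kids) {
        if (kids.length != keys.length + 1)
            throw new IllegalArgumentException("need keys.length + 1 subtrees");
        switch (keys.length) {
            case 1:   // 2-node: already binary, always BLACK
                return new RBNode(keys[0], Color.BLACK, kids[0], kids[1]);
            case 2:   // 3-node (A, B): B is the BLACK parent, A its smaller RED left child
                return new RBNode(keys[1], Color.BLACK,
                        new RBNode(keys[0], Color.RED, kids[0], kids[1]), kids[2]);
            case 3:   // 4-node: the middle value is a BLACK parent of two RED children
                return new RBNode(keys[1], Color.BLACK,
                        new RBNode(keys[0], Color.RED, kids[0], kids[1]),
                        new RBNode(keys[2], Color.RED, kids[2], kids[3]));
            default:
                throw new IllegalArgumentException("a 2-3-4 node holds 1 to 3 values");
        }
    }

    // A 2-3-4 leaf: all of its subtrees are empty
    static RBNode leaf(int... keys) { return convert(keys, new RBNode[keys.length + 1]); }

    // The 2-3-4 tree after "insert 11": root 12; children 4 (2 | 8, 10, 11) and 25 (15 | 35, 55)
    static RBNode exampleTree() {
        RBNode four = convert(new int[]{4}, new RBNode[]{leaf(2), leaf(8, 10, 11)});
        RBNode twentyFive = convert(new int[]{25}, new RBNode[]{leaf(15), leaf(35, 55)});
        return convert(new int[]{12}, new RBNode[]{four, twentyFive});
    }

    // Compact preorder view: value + B/R, "-" for an empty subtree
    static String show(RBNode t) {
        if (t == null) return "-";
        String self = t.value + (t.color == Color.RED ? "R" : "B");
        if (t.left == null && t.right == null) return self;
        return self + "(" + show(t.left) + "," + show(t.right) + ")";
    }

    public static void main(String[] args) {
        System.out.println(show(leaf(10, 20, 30)));  // 20B(10R,30R)
        System.out.println(show(leaf(10, 20)));      // 20B(10R,-)
        System.out.println(show(exampleTree()));     // 12B(4B(2B,10B(8R,11R)),25B(15B,55B(35R,-)))
    }
}
```

Every path from the root to an empty subtree in the converted example passes through three BLACK nodes, one for each level of the original 2-3-4 tree; this is where the black height comes from.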
4.2. Changing a 2-3-4 Tree into a Red-Black Tree
[Figure: a 2-3-4 tree converted into a red-black tree in three steps: change this node, change this node, change these nodes]

4.3. Three Properties of a Red-Black Tree
These must always be true for the tree to be red-black:
• 1. The root must always be BLACK (white in our pictures).
• 2. A RED parent never has a RED child
  – in other words, there are never two successive RED nodes in a path
• 3. Every path from the root to an empty subtree contains the same number of BLACK nodes
  – this number is called the black height
• We can use the black height to measure the balance of a red-black tree.
[Figure: check the three properties on an example tree]

4.4. Inserting a Node
There are three things to do:
• 1. Search down the tree to find the insertion point, splitting any 4-node (a BLACK parent with two RED children) by coloring the children BLACK and the parent RED
  – this is called a color flip
  – splitting a 4-node may involve additional rotations and color changes; there are 4 cases to consider (section 4.4.1)
• 2. Once the insertion point is found, add the new item as a RED leaf node (section 4.4.2)
  – this may create two successive RED nodes
  – again, use rotations and recoloring to reorder/rebalance the tree
• 3. Keep the root a BLACK node.

4.4.1. Four Cases for Splitting a 4-Node
[Figure: the four cases, where X is the 4-node being split, P is its parent and G its grandparent:
 1. LL / BLACK parent (also the mirror case, RR)
 2. LR / BLACK parent (also the mirror case, RL)
 3. LL / RED parent (also the mirror case, RR)
 4. LR / RED parent (also the mirror case, RL)]

Case 1 (LL / BLACK P): An Example
• If the parent is BLACK, only a color flip is needed.
[Figure: case 1, a color flip of the 4-node below a BLACK parent]

Case 2 (LR / BLACK P): Insert 55
[Figure: case 2. Only a color flip is required, for the same reason as in case 1.]

Case 3 (LL / RED parent)
[Figure: case 3. The color flip creates two successive RED nodes; this breaks property 2, so it must be fixed.]
• To fix the red color conflict, carry out a single right or left rotation of node P:
  – LL: right rotation of P
  – RR (mirror case): left rotation of P
• Also change the colors of nodes P and G.
[Figure: LL below G: right rotation of P; RR below G: left rotation of P; in both cases P and G change color]
[Figure: case 3 (LL / RED P) viewed as a 2-3-4 tree, with nodes X, P, G and subtrees A, B, C, D]

Case 4 (LR / RED parent)
[Figure: case 4. The color flip creates two successive RED nodes; this breaks property 2, so it must be fixed.]
• To fix the red color conflict, carry out a double right or left rotation of node X:
  – LR: double right rotation of X (left then right rotations)
  – RL (mirror case): double left rotation of X (right then left rotations)
• Also change the colors of nodes X and G.

LR Example
[Figure: case 4 LR example: a left rotation of X, then a right rotation of X, then X and G are recolored]
[Figure: the same example, with 2-3-4 tree views]
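The splitting cases above, together with inserting the new item as a RED leaf (section 4.4.2), can be combined into one top-down insertion routine. The sketch below swaps in the top-down formulation from Mark Allen Weiss's data-structures textbooks, which uses a header node and a shared BLACK sentinel for empty subtrees; it is not Ford and Topp's RBTree code, and its names (TopDownRedBlack, handleReorient, rotate, isValid, isRed, rootValue) are assumptions. The fields current, parent, grand and great play the roles of X, P, G and G's parent.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of top-down red-black insertion (after Weiss); not Ford and Topp's RBTree
class TopDownRedBlack {
    private static final boolean RED = true, BLACK = false;

    private static final class Node {
        int value;
        Node left, right;
        boolean color = BLACK;
        Node(int value, Node left, Node right) { this.value = value; this.left = left; this.right = right; }
    }

    private final Node nullNode = new Node(0, null, null); // shared BLACK empty subtree
    private final Node header;                             // pseudo-root; real root is header.right
    private Node current, parent, grand, great;             // X, P, G and G's parent

    TopDownRedBlack() {
        nullNode.left = nullNode.right = nullNode;
        header = new Node(0, nullNode, nullNode);
    }

    // The header sorts below every key
    private int compare(int x, Node t) {
        return t == header ? 1 : Integer.compare(x, t.value);
    }

    public void insert(int x) {
        current = parent = grand = header;
        nullNode.value = x;                        // guarantees the search loop stops
        while (compare(x, current) != 0) {
            great = grand; grand = parent; parent = current;
            current = compare(x, current) < 0 ? current.left : current.right;
            if (current.left.color == RED && current.right.color == RED)
                handleReorient(x);                 // split a 4-node on the way down
        }
        if (current != nullNode) return;           // duplicate: ignore it
        current = new Node(x, nullNode, nullNode);
        if (compare(x, parent) < 0) parent.left = current; else parent.right = current;
        handleReorient(x);                         // make the new leaf RED, fix conflicts
    }

    private void handleReorient(int x) {
        current.color = RED;                       // color flip
        current.left.color = BLACK;
        current.right.color = BLACK;
        if (parent.color == RED) {                 // two successive RED nodes
            grand.color = RED;
            if ((compare(x, grand) < 0) != (compare(x, parent) < 0))
                parent = rotate(x, grand);         // LR/RL: first half of a double rotation
            current = rotate(x, great);            // single rotation lifts the middle node
            current.color = BLACK;
        }
        header.right.color = BLACK;                // property 1: the root is BLACK
    }

    // Rotates the child of 'above' on x's search path with that child's own child on the path
    private Node rotate(int x, Node above) {
        if (compare(x, above) < 0)
            return above.left = compare(x, above.left) < 0 ? rotateRight(above.left) : rotateLeft(above.left);
        return above.right = compare(x, above.right) < 0 ? rotateRight(above.right) : rotateLeft(above.right);
    }

    private static Node rotateRight(Node p) { Node lc = p.left; p.left = lc.right; lc.right = p; return lc; }
    private static Node rotateLeft(Node p) { Node rc = p.right; p.right = rc.left; rc.left = p; return rc; }

    // Black height below t, or -1 if property 2 or 3 is broken somewhere in t
    private int blackHeight(Node t) {
        if (t == nullNode) return 0;
        if (t.color == RED && (t.left.color == RED || t.right.color == RED)) return -1;
        int lh = blackHeight(t.left), rh = blackHeight(t.right);
        if (lh < 0 || lh != rh) return -1;
        return lh + (t.color == BLACK ? 1 : 0);
    }

    public boolean isValid() { return header.right.color == BLACK && blackHeight(header.right) >= 0; }
    public int rootValue() { return header.right.value; }

    public boolean isRed(int x) {
        Node t = header.right;
        while (t != nullNode && t.value != x) t = x < t.value ? t.left : t.right;
        return t != nullNode && t.color == RED;
    }

    public List<Integer> inorder() {
        List<Integer> out = new ArrayList<>();
        collect(header.right, out);
        return out;
    }

    private void collect(Node t, List<Integer> out) {
        if (t == nullNode) return;
        collect(t.left, out);
        out.add(t.value);
        collect(t.right, out);
    }

    public static void main(String[] args) {
        TopDownRedBlack tree = new TopDownRedBlack();
        for (int v : new int[]{10, 25, 40, 15, 50, 45, 30, 65, 70, 55}) {
            tree.insert(v);
            System.out.println("after " + v + ": valid = " + tree.isValid());
        }
        System.out.println(tree.inorder());   // [10, 15, 25, 30, 40, 45, 50, 55, 65, 70]
    }
}
```

Run on the values used by UseRBTree (section 4.8), this sketch keeps all three properties after every insertion and finishes with 45 as the BLACK root and 25 as a RED node, which agrees with the comments in UseRBTree about removing red node 25 and black root 45.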
4.4.2. Inserting a New Item
• Always add a new item to the tree as a RED leaf node
  – this may create two successive RED nodes, which breaks property 2
• Fix this using the same single/double rotations and color flips as for splitting 4-nodes:
  – LL/RR: single right/left rotation of the parent (P)
  – LR/RL: double right/left rotation of the new node (X)

Example: Insert 14
[Figure: inserting 14 (an RR case) is fixed by a single left rotation of 12 and a color flip]

Insert 10 instead of 14
[Figure: inserting 10 (an RL case) is fixed by a double left rotation of node 10 (right then left): a single right rotation of 10, then a single left rotation and a color flip]

4.5. Building a Red-Black Tree
[Figure: building a red-black tree step by step. An LL insertion is fixed by a single right rotation of 20 and a color flip; 35 and 25 are then inserted (the numbers 1, 2, 3 in the figure refer to the cases in section 4.4.1); finally, inserting 30 is an LR case, fixed by a double right rotation of node 30 (left then right).]

4.6. Search Running Time
• The worst-case running time to search a red-black tree or insert an item is O(log2 n)
  – the maximum length of a path in a red-black tree with black height B is 2*B-1, but this worst case cannot happen if the insertion rules are followed

4.7. Deleting a Node
• Deletion is more difficult than insertion!
  – the deleted node must usually be replaced
  – but no further action is necessary when the replacement node is RED
[Figure: delete 75 (the root). Its replacement node, 78 (the next biggest value), is RED, so 78 simply takes the place of 75.]
• But deletion requires recoloring and rotations when the replacement node is BLACK.
[Figure: delete 90, whose replacement node 100 is BLACK. After 90 is replaced by 100, a right rotation with pivot 80 and recoloring restore the red-black properties.]

4.8. The RBTree Class
• RBTree implements the Collection interface and uses RBNode objects to create a red-black tree.
[Figure: an RBNode object with fields nodeValue, color, parent, left and right]

class RBTree<T> implements Collection<T>          ds.util
Constructor
    RBTree()
        Creates an empty red-black tree.
Methods
    String displayTree(int maxCharacters)
        Returns a string that gives a hierarchical view of the tree. An asterisk (*) marks red nodes.
    void drawTree(int maxCharacters)
        Creates a single frame that gives a graphical display of the tree. Nodes are colored.
    String drawTrees(int maxCharacters)
        Creates a graphical display of the tree; successive calls show how the tree changes.
    String toString()
        Returns a string that describes the elements in a comma-separated list enclosed in brackets.

• Ford and Topp's tutorial, "RBTree Class.pdf", provides more explanation of the RBTree class
  – local copy at http://fivedots.coe.psu.ac.th/Software.coe/ADSA/Ford%20and%20Topp/
• It includes:
  – a discussion of the private data
  – an explanation of the algorithms for splitting a 4-node and performing a top-down insertion

Using the RBTree Class
    import ds.util.RBTree;

    public class UseRBTree
    {
        public static void main(String[] args)
        {
            // 10 values for the red-black tree
            int[] intArr = {10, 25, 40, 15, 50, 45, 30, 65, 70, 55};
            RBTree<Integer> rbtree = new RBTree<Integer>();

            // load the tree with values
            for (int i = 0; i < intArr.length; i++) {
                rbtree.add(intArr[i]);
                rbtree.drawTrees(4);   // display the on-going tree in a JFrame
            }

            // display the final tree to stdout
            System.out.println(rbtree.displayTree(2));

            /*
            // remove red node 25
            rbtree.remove(25);
            rbtree.drawTrees(4);   // tree shown in JFrame

            // remove black root node 45
            rbtree.remove(45);
            rbtree.drawTree(3);    // tree shown in JFrame
            */
        }  // end of main()
    }  // end of UseRBTree class

Execution
• Tree changes are shown graphically by drawTrees() and drawTree().

JFrame Output: Add 10 Values
[Figure: JFrame frames as the values are added. Annotations: single left rotation of 25; 25 started RED, then was flipped to BLACK.]
[Figure, continued: later JFrame frames. Annotations: split 25, with a color change of 25; double right rotation of 45; split 45; single left rotation of 65; split 65, then a left rotation of 45.]
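For comparison, the Java standard library already ships a red-black tree: java.util.TreeMap is documented as a red-black tree based NavigableMap, and java.util.TreeSet is built on a TreeMap, so both guarantee O(log n) add, remove and contains. The sketch below loads the same ten values as UseRBTree into a TreeSet; unlike ds.util.RBTree it offers no drawing methods, and the class name TreeSetDemo is hypothetical.

```java
import java.util.TreeSet;

// Sketch: java.util.TreeSet (backed by a red-black TreeMap) as an off-the-shelf alternative
class TreeSetDemo {
    static TreeSet<Integer> build() {
        int[] intArr = {10, 25, 40, 15, 50, 45, 30, 65, 70, 55};  // same values as UseRBTree
        TreeSet<Integer> set = new TreeSet<>();
        for (int v : intArr) set.add(v);
        return set;
    }

    public static void main(String[] args) {
        TreeSet<Integer> set = build();
        System.out.println(set);              // [10, 15, 25, 30, 40, 45, 50, 55, 65, 70]
        System.out.println(set.add(25));      // false: duplicates are rejected
        set.remove(25);
        set.remove(45);
        System.out.println(set.first() + " " + set.last() + " " + set.size());  // 10 70 8
        System.out.println(set.ceiling(42));  // 50: the smallest element >= 42
    }
}
```

The library class hides the colors and rotations entirely, which is usually what production code wants; RBTree's value is that it lets you watch them happen.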