Ch 16. Balanced Search Trees
© copyright 2006 SNU IDB Lab.

Bird's-Eye View (0)
- Chapter 15: Binary Search Tree
  - BST and Indexed BST
- Chapter 16: Balanced Search Tree
  - AVL tree: BST + Balance
  - B-tree: generalized AVL tree
- Chapter 17: Graph

Bird's-Eye View
- Balanced tree structures: height is O(log n)
  - AVL trees: binary search trees with balance
  - Red-black trees
- Splay trees
  - An individual dictionary operation can take O(n) time
  - A sequence of u operations takes less time overall: O(u log u)
- B-trees (balanced trees)
  - Suitable for external memory

Table of Contents
- AVL TREES
  - Definition
  - Searching an AVL Search Tree
  - Inserting into an AVL Search Tree
  - Deletion from an AVL Search Tree
- RED-BLACK TREES
- SPLAY TREES
- B-TREES

The History of Balanced Trees
- Adel'son-Vel'skiĭ and Landis introduced the AVL tree in 1962
  - Ensures balance by restricting the heights of every node's two subtrees to differ by at most 1
- Bayer and McCreight introduced the B-tree in 1972
  - Kept balanced by requiring that all leaf nodes are at the same depth
  - Nodes are joined or split instead of being rebalanced
- Bayer, Guibas and Sedgewick introduced the red-black tree in 1978
  - Ensures balance by restricting the occurrence of red nodes in the tree
- Sleator and Tarjan introduced the splay tree in 1983
  - Maintains balance without any explicit balance condition such as color
  - Splay operations are performed within the tree every time an access is made

AVL TREES
- Balanced tree: a tree with a worst-case height of O(log n)
- AVL search tree
  - A balanced binary search tree
  - Can be generalized to a B-tree
- Height-balanced k tree (HB(k) tree)
  - The allowable height difference between the two subtrees of any node is k
- AVL tree: HB(1) tree
  - Named after G. M. Adel'son-Vel'skii and E. M. Landis
- Performance
  - Given N keys, the worst-case search cost is about 1.44 log2(N+2) comparisons
  - cf. completely balanced tree: worst-case search log2(N+1)
Height of an AVL Tree
- n: number of nodes in the AVL tree
- $N_h$: minimum number of nodes in an AVL tree of height h
  - $N_h = N_{h-1} + N_{h-2} + 1$, with $N_0 = 0$ and $N_1 = 1$
- Similar in definition to the Fibonacci numbers
  - $F_n = F_{n-1} + F_{n-2}$, with $F_0 = 0$ and $F_1 = 1$
- It can be shown that $N_h = F_{h+2} - 1$ for $h \ge 0$
- Fibonacci theory: $F_h \approx \phi^h / \sqrt{5}$, where $\phi = (1 + \sqrt{5})/2$
  - Therefore $N_h \approx \phi^{h+2} / \sqrt{5} - 1$
- If there are n nodes, then $n \ge N_h$, so the height satisfies
  $h \le \log_\phi(\sqrt{5}(n+1)) - 2 \approx 1.44 \log_2(n+2)$
- h = O(log n)

AVL Tree Definition
- An empty binary tree is an AVL tree
- If T is a nonempty binary tree with TL and TR as its left and right subtrees, then T is an AVL tree iff
  (1) TL and TR are AVL trees, and
  (2) |hL - hR| ≤ 1, where hL and hR are the heights of TL and TR, respectively
- For every node T of an AVL tree, BF(T) must be one of -1, 0, 1
- If BF(T) becomes -2 or 2, a suitable rotation is performed to restore balance
- Conceptually: AVL search tree = AVL tree + binary search tree

AVL Tree Examples
[Figure: (a) AVL trees; (b) non-AVL trees, with the offending nodes marked X]

Intuition: AVL Search Tree
- AVL search tree = binary search tree + AVL tree = balanced binary search tree
[Figure: three example trees (a), (b), (c)]

                  (a)   (b)   (c)
  BST              X     O     O
  AVL              O     O     X
  AVL search tree  X     O     X

Indexed AVL Search Tree
- Indexed AVL search tree = AVL tree + LeftSize variable
  = (balanced + binary search tree) + LeftSize variable
[Figure: indexed AVL search tree on the keys APR, AUG, MAR, MAY, NOV, each node labeled with its LeftSize value]

Representation of an AVL Tree
- Balance factor bf(x) of a node x = height of left subtree - height of right subtree
- Permissible balance factors: -1, 0, 1
[Figure: two AVL search trees with the balance factor written beside each node]
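The height bound above can be checked numerically. The following is a minimal sketch, not part of the original slides: the function names `min_nodes`, `fib`, and `max_height` are illustrative assumptions. It evaluates the recurrence for N_h, compares it with the Fibonacci identity, and finds the tallest AVL tree that n nodes allow.

```python
import math

def min_nodes(h):
    """Minimum number of nodes N_h in an AVL tree of height h (N_0 = 0, N_1 = 1)."""
    a, b = 0, 1                      # N_0, N_1
    for _ in range(h):
        a, b = b, a + b + 1          # N_h = N_{h-1} + N_{h-2} + 1
    return a

def fib(k):
    """Fibonacci number F_k with F_0 = 0 and F_1 = 1."""
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a

def max_height(n):
    """Largest height h whose minimum node count N_h does not exceed n."""
    h = 0
    while min_nodes(h + 1) <= n:
        h += 1
    return h

if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        bound = 1.44 * math.log2(n + 2)
        print(n, max_height(n), round(bound, 2))
```

The loop in `max_height` recomputes `min_nodes` each time, which is wasteful but keeps the sketch close to the definitions on the slide.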
AVL Search Tree Example (1) to (9)
- Identifiers are inserted one at a time into an initially empty AVL search tree
- The original figures show the tree after each insertion and after each rebalancing, with the balance factor beside every node

  Slide  New identifier  Unbalanced node (bf)  Rebalancing
  (1)    MARCH           none                  no rebalancing needed
  (1)    MAY             none                  no rebalancing needed
  (1)    NOVEMBER        MAR (-2)              RR
  (2)    AUGUST          none                  no rebalancing needed
  (3)    APRIL           MAR (+2)              LL
  (4)    JANUARY         MAY (+2)              LR
  (5)    DECEMBER        none                  no rebalancing needed
  (6)    JULY            none                  no rebalancing needed
  (7)    FEBRUARY        AUG (-2)              RL
  (8)    JUNE            MAR (+2)              LR
  (9)    OCTOBER         MAY (-2)              RR
AVL Search Tree Example (10)
- New identifier: SEPTEMBER
- No rebalancing needed after insertion
- Resulting tree (balance factor in parentheses)
  - JAN (-1): left DEC, right MAR
  - DEC (+1): left AUG, right FEB
  - AUG (+1): left APR
  - MAR (-1): left JUL, right NOV
  - JUL (-1): right JUN
  - NOV (-1): left MAY, right OCT
  - OCT (-1): right SEP
  - APR, FEB, JUN, MAY, SEP (0): leaves

Searching in an AVL Search Tree
- Same as searching a binary search tree: search for thekey from the root toward a leaf

    if (root == null)
        search is unsuccessful;
    else if (thekey < key in root)
        only the left subtree is to be searched;
    else if (thekey > key in root)
        only the right subtree is to be searched;
    else   // thekey == key in root
        search terminates successfully;

- Subtrees are searched similarly, in a recursive manner
- Time complexity = O(height)
- The height of an AVL tree with n elements is O(log n), so the search time is O(log n)

Unbalance due to Inserting
- When an element is inserted into an AVL tree using the strategy of Program 15.5 (insert in BST), the resulting tree may be unbalanced
[Figure: an AVL search tree on the keys 5, 30, 35, 40 that becomes unbalanced when a new element is inserted]

Observations on Imbalance due to Insertion
- O1: In the unbalanced tree the BFs are limited to -2, -1, 0, 1, 2
- O2: A node with BF 2 had BF 1 before the insertion (similarly, a node with BF -2 had BF -1)
- O3: Only the BFs of nodes on the path from the root to the newly inserted node can change as a result of the insertion
- O4: Let A denote the nearest ancestor of the newly inserted node whose BF is either -2 or 2. The BF of every node on the path between A and the newly inserted node was 0 prior to the insertion
- O5: Imbalance can occur only at the last node encountered that had a balance factor of 1 or -1 prior to the insertion
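The search procedure and observations O1 to O3 can be illustrated with a small experiment. This is a minimal sketch under stated assumptions: `Node`, `search`, `bst_insert`, and `balance_factors` are illustrative names, the insertion is the plain BST strategy without any rebalancing, and heights are recomputed from scratch for clarity rather than stored.

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def search(root, the_key):
    """Iterative BST search; returns the node holding the_key, or None."""
    node = root
    while node is not None:
        if the_key < node.key:
            node = node.left
        elif the_key > node.key:
            node = node.right
        else:
            return node
    return None

def bst_insert(root, key):
    """Plain BST insertion (no rebalancing). Returns (root, keys on the search path)."""
    if root is None:
        return Node(key), []
    node, path = root, []
    while True:
        path.append(node.key)
        if key == node.key:
            break                                  # duplicate: nothing inserted
        side = "left" if key < node.key else "right"
        child = getattr(node, side)
        if child is None:
            setattr(node, side, Node(key))
            break
        node = child
    return root, path

def height(node):
    return 0 if node is None else 1 + max(height(node.left), height(node.right))

def balance_factors(root):
    """Map each key to bf = height(left subtree) - height(right subtree)."""
    result, stack = {}, [root]
    while stack:
        node = stack.pop()
        if node is not None:
            result[node.key] = height(node.left) - height(node.right)
            stack += [node.left, node.right]
    return result

if __name__ == "__main__":
    root = None
    for k in (30, 20, 40, 10):
        root, _ = bst_insert(root, k)
    before = balance_factors(root)
    root, path = bst_insert(root, 5)
    after = balance_factors(root)
    print(path, before, after)
```

Comparing `before` and `after` shows that only keys on the search path change their balance factor, and that a node ending at 2 started at 1.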
Node X with Potential Imbalance (1)
- Let X denote the last node encountered that had a balance factor of 1 or -1 prior to the insertion
- If the tree is unbalanced following the insertion, X exists
- If bf(X) = 0 after the insertion, then the height of the subtree with root X is the same before and after the insertion
[Figure: three example AVL search trees; the first has no node X, the other two have X marked on the insertion path]

Node X with Potential Imbalance (2)
[Figure: the subtree rooted at X in three situations (a), (b), (c)]

            (a)        (b)        (c)
  height    h          h          h+1
  bf(X)     1          0          2
  status    balanced   balanced   imbalanced

- The only way the tree can become unbalanced is when the insertion causes bf(X) to change from -1 to -2 or from 1 to 2

Imbalance Patterns due to Insertion
- The imbalance at A is one of the following types
  - LL: the new node is in the left subtree of the left subtree of A
  - LR: the new node is in the right subtree of the left subtree of A
  - RR: the new node is in the right subtree of the right subtree of A
  - RL: the new node is in the left subtree of the right subtree of A
- LL and RR imbalances require a single rotation
- LR and RL imbalances require a double rotation
[Figure: node A with the four insertion positions of the new node Y labeled LL, LR, RL, RR]

LL Rebalancing after Insertion
- Balanced subtree before insertion: bf(A) = +1, bf(B) = 0, where B is the left child of A; subtree height h+2
- Unbalanced following insertion: the height of BL increases to h+1, so bf(A) = +2
- After the LL rotation: B is the subtree root with left subtree BL and right child A; A has subtrees BR and AR; bf(B) = 0, bf(A) = 0; subtree height h+2
- Key order: BL < B < BR < A < AR

RR Rebalancing after Insertion
- Balanced subtree before insertion: bf(A) = -1, bf(B) = 0, where B is the right child of A; subtree height h+2
- Unbalanced following insertion: the height of BR increases to h+1, so bf(A) = -2
- After the RR rotation: B is the subtree root with left child A and right subtree BR; A has subtrees AL and BL; bf(B) = 0, bf(A) = 0; subtree height h+2
- Key order: AL < A < BL < B < BR

LR-a Rebalancing after Insertion
- Balanced subtree before insertion: bf(A) = +1, bf(B) = 0, where B is the left child of A
- Unbalanced following insertion of C as the right child of B: bf(A) = +2, bf(B) = -1, bf(C) = 0
- After the LR(a) rotation: C is the subtree root with children B and A; bf(C) = 0, bf(B) = 0, bf(A) = 0
- Key order: B < C < A
LR-b Rebalancing after Insertion
- Balanced subtree before insertion: bf(A) = +1, bf(B) = 0, bf(C) = 0, where B is the left child of A and C is the right child of B; BL and AR have height h, CL and CR have height h-1; subtree height h+2
- Unbalanced following insertion into CL: bf(A) = +2, bf(B) = -1, bf(C) = +1
- After the LR(b) rotation: C is the subtree root with children B and A; B has subtrees BL and CL; A has subtrees CR and AR; bf(C) = 0, bf(B) = 0, bf(A) = -1; subtree height h+2
- Key order: BL < B < CL < C < CR < A < AR

LR-c Rebalancing after Insertion
- Balanced subtree before insertion: as in LR-b
- Unbalanced following insertion into CR: bf(A) = +2, bf(B) = -1, bf(C) = -1
- After the LR(c) rotation: C is the subtree root with children B and A; bf(C) = 0, bf(B) = +1, bf(A) = 0; subtree height h+2
- RL a, b and c are symmetric to LR a, b and c

Deletion from an AVL Tree
- Let q be the parent of the node that was physically deleted
- If the deletion took place from
  - the left subtree of q, bf(q) decreases by 1
  - the right subtree of q, bf(q) increases by 1
- Observations
  - D1: If the new BF of q is 0, its height has decreased by 1. We need to change the BF of its parent (if any) and possibly those of its other ancestors
  - D2: If the new BF of q is either -1 or 1, its height is the same as before the deletion and the BFs of its ancestors are unchanged
  - D3: If the new BF of q is either -2 or 2, the tree is unbalanced at q

Imbalance Patterns due to Deletion
- Let A be the unbalanced node
- Type L
  - The deletion took place from A's left subtree; B is the root of A's other (right) subtree
  - Subclassified as L-1, L0 and L1 depending on bf(B)
- Type R
  - The deletion took place from A's right subtree; B is the root of A's other (left) subtree
  - Subclassified as R-1, R0 and R1 depending on bf(B)

R0 rotation after Deletion
- The height of the subtree is h+2 before the deletion and h+2 after the rotation
- A single rotation is sufficient
- Key order: BL < B < BR < A < AR

R1 rotation after Deletion
- The height of the subtree is h+2 before the deletion and h+1 after the rotation
- A single rotation is sufficient
- Key order: BL < B < BR < A < AR
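The four insertion rebalancing cases (LL, RR, LR, RL) can be sketched as a recursive insert that classifies the imbalance at the nearest unbalanced ancestor. This is an illustrative sketch, not the textbook's program: it stores subtree heights instead of balance factors (a simplifying assumption), and the names `insert`, `rotate_left`, `rotate_right`, and the `log` list are hypothetical.

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right, self.height = key, None, None, 1

def h(n):
    return n.height if n else 0

def bf(n):
    return h(n.left) - h(n.right)

def update(n):
    n.height = 1 + max(h(n.left), h(n.right))

def rotate_right(a):
    """Single rotation used for LL: the left child B becomes the subtree root."""
    b = a.left
    a.left, b.right = b.right, a
    update(a)
    update(b)
    return b

def rotate_left(a):
    """Single rotation used for RR: the right child B becomes the subtree root."""
    b = a.right
    a.right, b.left = b.left, a
    update(a)
    update(b)
    return b

def insert(node, key, log):
    """Insert key below node; append the rotation type to log when one is needed."""
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key, log)
    elif key > node.key:
        node.right = insert(node.right, key, log)
    else:
        return node                               # duplicate key: unchanged
    update(node)
    if bf(node) == 2:                             # new node is in the left subtree
        if key < node.left.key:
            log.append("LL")
            return rotate_right(node)
        log.append("LR")                          # double rotation
        node.left = rotate_left(node.left)
        return rotate_right(node)
    if bf(node) == -2:                            # new node is in the right subtree
        if key > node.right.key:
            log.append("RR")
            return rotate_left(node)
        log.append("RL")                          # double rotation
        node.right = rotate_right(node.right)
        return rotate_left(node)
    return node

MONTHS = ["MARCH", "MAY", "NOVEMBER", "AUGUST", "APRIL", "JANUARY", "DECEMBER",
          "JULY", "FEBRUARY", "JUNE", "OCTOBER", "SEPTEMBER"]

def build(keys):
    root, log = None, []
    for k in keys:
        root = insert(root, k, log)
    return root, log
```

Calling `build(MONTHS)` replays the month-name example with the full month names as keys; the recorded rotation types can be compared against the sequence of slides.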
R-1 rotation after Deletion
- The height of the subtree is h+2 before the deletion and h+1 after the rotation
- A double rotation is required
- Key order: BL < B < CL < C < CR < A < AR

Rotation Taxonomy in AVL
- Rotation types due to insertion
  - LL type
  - RR type
  - LR type: LR-a, LR-b, LR-c
  - RL type: RL-a, RL-b, RL-c
- Rotation types due to deletion
  - R type: R-1, R0, R1
  - L type: L-1, L0, L1
- The LL rotation in insertion and the R1 rotation in deletion are identical
- The LR rotation in insertion and the R-1 rotation in deletion are identical
- The LL rotation in insertion and the R0 rotation in deletion differ only in the final BFs of A and B

Table of Contents
- AVL TREES
- RED-BLACK TREES
  - Definition
  - Searching a Red-Black Tree
  - Inserting into a Red-Black Tree
  - Deletion from a Red-Black Tree
  - Implementation Considerations and Complexity
- SPLAY TREES
- B-TREES

Red-Black Tree vs. AVL Tree (1)

              Red-black tree     AVL tree
  Balance     less balanced      more balanced
  Lookup      O(log n)           O(log n)
  Insertion   O(log n)           O(log n)
  Deletion    O(log n)           O(log n)

Red-Black Tree vs. AVL Tree (2)
[Figure: the same node x is inserted into a red-black tree and into an AVL tree; the red-black tree does not need rebalancing, the AVL tree does]

Red-Black Tree: Definition
- A red-black tree is a binary search tree in which every node is colored red or black
  - RB1. The root and all external nodes are black
  - RB2. No root-to-external-node path has two consecutive red nodes
  - RB3. All root-to-external-node paths have the same number of black nodes
- Equivalent definition in terms of pointer colors
  - RB1'. Pointers from an internal node to an external node are black
  - RB2'. No root-to-external-node path has two consecutive red pointers
  - RB3'. All root-to-external-node paths have the same number of black pointers
Red-Black Tree: Example
[Figure: a red-black tree on the keys 5, 10, 50, 60, 62, 65, 70, 80]
- Every path from the root to an external node has exactly 2 black pointers and 3 black nodes
- No such path has two consecutive red nodes or pointers
- The small black boxes are external nodes; they ensure that every node has two children
- A newly inserted node is colored red

RBT: Glossary
- Rank: the number of black pointers on any path from the node to any external node in the red-black tree
- Length (of a root-to-external-node path): the number of pointers on the path
[Figure: example tree with rank = 1 and height = length = 2]

RBT: Lemma 1
- Lemma 1: If P and Q are two root-to-external-node paths in a red-black tree, then length(P) ≤ 2 * length(Q)
- Proof
  - Suppose that the rank of the root is r
  - From RB1' and RB2', each root-to-external-node path has between r and 2r pointers
  - So length(P) ≤ 2 * length(Q)
[Figure: example tree with length(Q) = 2 and length(P) = 4]

RBT: Lemma 2
- h: height of a red-black tree, n: number of internal nodes, r: rank of the root
- (a) $h \le 2r$
  - From Lemma 1 (Lemma 16.1), no root-to-external-node path has length > 2r
- (b) $n \ge 2^r - 1$
  - There are no external nodes at levels 1 through r, so there are $2^r - 1$ internal nodes at these levels
- (c) $h \le 2\log_2(n+1)$
  - $2^r \le n + 1$ from (b), so $r \le \log_2(n+1)$
  - $h \le 2r \le 2\log_2(n+1)$
[Figure: example tree with h = 4, n = 5, r = 2]

RBT: Representation
- Null pointers represent external nodes
- Pointer colors and node colors are closely related
- For each node we need to store only
  - its color (one additional bit per node), or
  - the colors of the two pointers to its children (two additional bits per node)
[Figure: a node storing R/B for itself, or {R/B, R/B} for its two child pointers; null pointers are the external nodes]
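Properties RB1 to RB3 and the quantities in Lemma 2 can be checked mechanically. The sketch below assumes the node-color representation with `None` standing for external nodes; the helper names (`black_count`, `is_red_black`, `root_rank`) and the sample tree are illustrative assumptions, not taken from the slides' figures.

```python
import math

RED, BLACK = "red", "black"

class Node:
    def __init__(self, key, color, left=None, right=None):
        self.key, self.color, self.left, self.right = key, color, left, right

def black_count(node):
    """Black nodes on every path from node to an external node (both ends counted).
    Raises ValueError when RB2 or RB3 is violated below node."""
    if node is None:
        return 1                                   # external nodes are black
    if node.color == RED:
        for child in (node.left, node.right):
            if child is not None and child.color == RED:
                raise ValueError("RB2: two consecutive red nodes")
    left, right = black_count(node.left), black_count(node.right)
    if left != right:
        raise ValueError("RB3: unequal number of black nodes")
    return left + (1 if node.color == BLACK else 0)

def is_red_black(root):
    if root is not None and root.color != BLACK:   # RB1: the root is black
        return False
    try:
        black_count(root)
    except ValueError:
        return False
    return True

def root_rank(root):
    """Black pointers on any root-to-external path = black nodes minus the root."""
    return black_count(root) - 1 if root is not None else 0

def height(node):
    """Length, in pointers, of the longest path from node to an external node."""
    return 0 if node is None else 1 + max(height(node.left), height(node.right))

def size(node):
    return 0 if node is None else 1 + size(node.left) + size(node.right)

# Hypothetical sample tree used only for illustration.
sample = Node(65, BLACK,
              Node(50, RED, Node(10, BLACK), Node(60, BLACK, None, Node(62, RED))),
              Node(80, RED, Node(70, BLACK), Node(90, BLACK)))

if __name__ == "__main__":
    r, hgt, n = root_rank(sample), height(sample), size(sample)
    print(is_red_black(sample), r, hgt, n, 2 * math.log2(n + 1))
```

For the sample tree the three inequalities of Lemma 2 can be read off directly from the printed values.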
Searching a Red-Black Tree
- The same code searches an ordinary binary search tree (Program 15.4), an AVL tree, and a red-black tree

    if (root == null)
        search is unsuccessful
    else if (thekey < key in root)
        only the left subtree is to be searched
    else if (thekey > key in root)
        only the right subtree is to be searched
    else   // thekey == key in root
        search terminates successfully

Violations due to Insertion (1)
- An RBT must have the same number of black nodes on all root-to-external-node paths
- If the new node is colored black
  - The updated tree always violates RB3 (same number of black nodes), unless the tree was empty
[Figure: inserting 1 as a black node into a red-black tree on the keys 2, 3, 4]

Violations due to Insertion (2)
- If the new node is colored red
  - If the parent of the inserted node is black, there is no violation
  - If the parent of the inserted node is also red, RB2 (no two consecutive reds) is violated
[Figure: inserting 1 as a red node below the red node 2 in a tree with root 3]

L Type Imbalances due to Insertion (1)
- u: the inserted node (red), with children uL and uR
- pu: the parent of u (red), with children puL and puR
- gu: the grandparent of u, with children guL and guR
- LLr and LRr: the color of guR is red

L Type Imbalances due to Insertion (2)
- u: the inserted node (red), with children uL and uR
- pu: the parent of u (red), with children puL and puR
- gu: the grandparent of u, with children guL and guR
- LLb and LRb: the color of guR is black

Fixing LLr and LRr Imbalance

    Begin
        change the color of pu and guR: red → black
        if (gu != root)
            change the color of gu: black → red
        else
            the color of gu is not changed;
            the number of black nodes on all root-to-external-node paths increases by 1
        if (gu != root && the color change of gu causes an imbalance)
            gu becomes the new u node; continue to rebalance
    End

Fixing LLr Imbalance
- If a red node u is the left child of its parent (also red), its parent is the left child of its grandparent, and its uncle is red, then change the grandparent's color to red and change the parent's and the uncle's colors to black
[Figure: LLr imbalance, and the tree after the LLr color change]

Fixing LRr Imbalance
- If a red node u is the right child of its parent (also red), its parent is the left child of its grandparent, and its uncle is red, then change the grandparent's color to red and change the parent's and the uncle's colors to black
[Figure: LRr imbalance, and the tree after the LRr color change]

Fixing LLb and LRb Imbalance
- Rotate first, then change colors
- The root of the involved subtree is black following the rotation
- The number of black nodes on all root-to-external-node paths is unchanged
- The LLb rotation in an RB tree is similar to the LL rotation in an AVL tree
- The LRb rotation in an RB tree is similar to the LR rotation in an AVL tree

Fixing LLb Imbalance
- If a red node u is the left child of its parent (also red), its parent is the left child of its grandparent, and its uncle is black, then rotate and change colors as shown
[Figure: LLb imbalance, and the tree after the LLb rotation]

Fixing LRb Imbalance
- If a red node u is the right child of its parent (also red), its parent is the left child of its grandparent, and its uncle is black, then rotate and change colors as shown
[Figure: LRb imbalance, and the tree after the LRb rotation]
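The color-change and rotation fixes described above can be sketched as a bottom-up repair loop. The code below is an illustration under assumptions, not the textbook's program: it uses parent pointers, names each case from the positions of pu and u and the uncle's color, and, instead of testing whether gu is the root before recoloring it, recolors gu and forces the root black at the end, which has the same effect. `RedBlackTree`, `_fix`, and `log` are hypothetical names.

```python
RED, BLACK = "red", "black"

class Node:
    def __init__(self, key):
        self.key, self.color = key, RED            # new nodes are inserted red
        self.left = self.right = self.parent = None

class RedBlackTree:
    def __init__(self):
        self.root, self.log = None, []

    def _rotate(self, x, to_left):
        """Move x one level down; its right child (to_left) or left child moves up."""
        y = x.right if to_left else x.left
        inner = y.left if to_left else y.right
        if to_left:
            x.right, y.left = inner, x
        else:
            x.left, y.right = inner, x
        if inner is not None:
            inner.parent = x
        y.parent = x.parent
        if x.parent is None:
            self.root = y
        elif x.parent.left is x:
            x.parent.left = y
        else:
            x.parent.right = y
        x.parent = y

    def insert(self, key):
        parent, node = None, self.root
        while node is not None:
            if key == node.key:
                return                              # duplicates are ignored
            parent, node = node, (node.left if key < node.key else node.right)
        u = Node(key)
        u.parent = parent
        if parent is None:
            self.root = u
        elif key < parent.key:
            parent.left = u
        else:
            parent.right = u
        self._fix(u)

    def _fix(self, u):
        while u.parent is not None and u.parent.color == RED:
            pu, gu = u.parent, u.parent.parent      # gu exists: a red pu is not the root
            pu_left, u_left = gu.left is pu, pu.left is u
            uncle = gu.right if pu_left else gu.left
            red_uncle = uncle is not None and uncle.color == RED
            self.log.append(("L" if pu_left else "R") + ("L" if u_left else "R")
                            + ("r" if red_uncle else "b"))
            if red_uncle:                           # XYr: color change, then move up
                pu.color = uncle.color = BLACK
                gu.color = RED
                u = gu
            else:                                   # XYb: rotation plus color change
                if pu_left != u_left:               # LRb / RLb: lift u above pu first
                    self._rotate(pu, to_left=pu_left)
                    pu = u
                self._rotate(gu, to_left=not pu_left)
                pu.color, gu.color = BLACK, RED
                break
        self.root.color = BLACK                     # RB1: the root stays black
```

A short usage example: after inserting 50, 10, 80, 90 and clearing `log`, inserting 70, 60, 65, 62 records one case name per repair step, which can be compared with a hand trace.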
Insertion Example in RBT (1)
- (a) Initial state: a tree on the keys 10, 50, 80, 90; all root-to-external-node paths have 3 black nodes and 2 black pointers
- (b) Insert 70 as a red node: no RBT property is violated, so no remedial action is necessary

Insertion Example in RBT (2)
- (c) Insert 60 as a red node: LLr imbalance (u = 60, pu = 70, gu = 80)
- (d) LLr color change on nodes 70, 80 and 90; the new u (node 80) has no grandparent, so there is no further RB2 imbalance

Insertion Example in RBT (3)
- (e) Insert 65 as a red node: LRb imbalance (u = 65, pu = 60, gu = 70)
- (f) Perform the LRb rotation

Insertion Example in RBT (4)
- (g) Insert 62 as a red node: LRr imbalance (u = 62, pu = 60, gu = 65)
- (h) LRr color change on nodes 65, 60 and 70; this causes an RLb imbalance (u = 65, pu = 80, gu = 50)

Insertion Example in RBT (5)
- (i) Perform the RLb rotation: 65 becomes the root, with 50 (children 10 and 60; 62 below 60) and 80 (children 70 and 90) as its children

Violations due to Deletion (1)
- If the parent of the deleted node is red, an RB2 violation can occur
[Figure: deleting 2 from a tree that contains 1, 2, 3, 4]

Violations due to Deletion (2)
- If the deleted node is black, an RB3 violation occurs
[Figure: deleting 2 from a tree that contains 2, 3, 4]

Deletion & Imbalance in RBT (1)
- (a) A red-black tree on the keys 10, 50, 60, 62, 65, 70, 90
- (b) Delete 70
  - The deleted node was red
  - The number of black nodes on every path is the same before and after the deletion
  - No rebalancing is needed
Deletion & Imbalance in RBT (2)
- (a) A red-black tree on the keys 10, 50, 60, 62, 65, 70, 90
- (c) Delete 90
  - The red node 70 (y) takes the place of the deleted node, which was black
  - The number of black nodes on root-to-external-node paths through y is then 1 less than before
  - RB3 violation occurs = imbalance
  - Fix: change the color of y to black

Deletion & Imbalance in RBT (3)
- (a) A red-black tree on the keys 10, 50, 60, 62, 65, 70, 90
- (d) Delete 65
  - The deleted node was black and node 62 was red, so node 62 is changed to black
- An RB3 violation occurs only when the deleted node was black and y is not the root of the resulting tree

Rb Imbalance due to Deletion
- Naming: R b n
  - R: y is the right child of its parent (y is the node that takes the place of the removed node)
  - b: y's sibling is black
  - n: the number of red children of y's sibling (y's red nephews)
- Rb0: handled by a color change
- Rb1: handled by a rotation
- Rb2: handled by a rotation

Deletion Imbalances: Rb family
- y: the node that takes the place of the removed node
- py: parent of y
- v: sibling of y
- vL and vR: children of v

Fixing Rb0 Imbalance
- If a black node y is the right child of its parent, its sibling is black, and its sibling has no red child, then change the sibling's color to red
- If the parent was red, it is also changed to black and the repair is complete; if the parent was black, the imbalance moves up to the parent
[Figure: Rb0 imbalance, and the tree after the Rb0 color change]

Fixing Rb1 Imbalance
- If a black node y is the right child of its parent, its sibling is black, and its sibling has 1 red child, then rotate and change colors as shown
[Figure: the two Rb1 imbalance cases and the tree after the Rb1 rotation; the new subtree root takes the former color (red or black) of py]

Fixing Rb2 Imbalance
- If a black node y is the right child of its parent, its sibling is black, and its sibling has 2 red children, then rotate and change colors as shown
[Figure: Rb2 imbalance, and the tree after the Rb2 rotation]
Rr Imbalance due to Deletion
- Naming: R r n
  - R: y is the right child of its parent (y is the node that takes the place of the removed node)
  - r: y's sibling v is red
  - n: the number of red children of v's right child
- Rr0, Rr1 and Rr2 are all handled by rotation

Deletion Imbalances: Rr family
- y: the node that takes the place of the removed node
- py: parent of y
- v: sibling of y
- vL and vR: children of v

Fixing Rr0 Imbalance
- If a black node y is the right child of its parent, its sibling is red, and its sibling's right child has no red child, then rotate and change colors as shown
[Figure: Rr0 imbalance, and the tree after the Rr0 rotation]

Fixing Rr1 Imbalance
- If a black node y is the right child of its parent, its sibling is red, and its sibling's right child has 1 red child, then rotate and change colors as shown
[Figure: the two Rr1 imbalance cases and the tree after the Rr1 rotation]

Fixing Rr2 Imbalance
- If a black node y is the right child of its parent, its sibling is red, and its sibling's right child has 2 red children, then rotate and change colors as shown
[Figure: Rr2 imbalance, and the tree after the Rr2 rotation]

Deletion Example (1)
- Starting tree: keys 10, 50, 60, 62, 65, 70, 80, 90
- (a) Delete 90
  - The deleted node was black and was not the root, so the tree is imbalanced
  - The imbalance is of type Rb0
[Figure: the tree after deleting 90, with y, py, v and vR marked]

Deletion Example (2)
- (b) Rb0 color change
  - py was red before the deletion
  - The Rb0 color change is applied to nodes 70 and 80, and we are done
- (c) Delete 80
  - The black node 80 was deleted
  - The tree remains balanced

Deletion Example (3)
- (d) Delete 70
  - A non-root black node was deleted
  - The tree is imbalanced: type Rr1(ii)
- (e) After the Rr1(ii) rotation the tree is balanced again
[Figure: the tree before and after the Rr1(ii) rotation, with py, v, w and x marked]
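The naming scheme for deletion imbalances can be captured in a small classifier. This is a hedged sketch: `classify_imbalance` and its arguments are hypothetical names, external nodes are represented by `None`, the L families are assumed to mirror the R families, and the (i)/(ii) subcases of the one-red-child situations are not distinguished.

```python
RED, BLACK = "red", "black"

class Node:
    def __init__(self, key, color, left=None, right=None):
        self.key, self.color, self.left, self.right = key, color, left, right

def red_children(node):
    """Number of red children of node (external children count as black)."""
    return sum(1 for c in (node.left, node.right) if c is not None and c.color == RED)

def classify_imbalance(py, y_is_right):
    """Name the imbalance at py when the deficient node y is its right (or left) child.

    First letter: side of y. Second letter: color of y's sibling v.
    Digit: red children of v (b families) or of v's child nearer to y (r families).
    """
    side = "R" if y_is_right else "L"
    v = py.left if y_is_right else py.right          # sibling of y, never external here
    if v.color == BLACK:
        return f"{side}b{red_children(v)}"
    w = v.right if y_is_right else v.left            # for R families: v's right child
    return f"{side}r{red_children(w)}"

if __name__ == "__main__":
    # Hypothetical situations: y is an external node on the right of py.
    py1 = Node(80, RED, left=Node(70, BLACK))
    py2 = Node(65, BLACK, left=Node(50, RED, Node(10, BLACK),
                                    Node(60, BLACK, None, Node(62, RED))))
    print(classify_imbalance(py1, True), classify_imbalance(py2, True))
```

The classifier only names the case; the corresponding color change or rotation would still have to be applied afterwards.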
Rotation Taxonomy in RBT
- Rotation types due to insertion
  - L family: LLr, LRr, LLb, LRb
  - R family: RRr, RLr, RRb, RLb
- Rotation types due to deletion
  - Rb family: Rb0, Rb1(i), Rb1(ii), Rb2
  - Rr family: Rr0, Rr1(i), Rr1(ii), Rr2
  - Lb family: Lb0, Lb1(i), Lb1(ii), Lb2
  - Lr family: Lr0, Lr1(i), Lr1(ii), Lr2

Implementation Considerations
- Insertion and deletion require backward movement toward the root
  - If the red-black-tree nodes carry a parent pointer, backward movement is easy
  - Otherwise backward movement is more complex: keep the nodes on the search path on a stack instead of storing parent fields
- Complexity, for an n-element red-black tree
  - The parent-pointer scheme runs slightly faster than the stack scheme
  - Color changes: O(log n) of them, because they can propagate back toward the root
  - Rotations: O(1) of them
  - Each color change or rotation takes Θ(1) time
  - Total time for insert or delete: O(log n)

Table of Contents
- AVL TREES
- RED-BLACK TREES
- SPLAY TREES
- B-TREES

Splay Tree
- A splay tree is a binary search tree whose nodes are rearranged by the splay operation whenever a search, insertion, or deletion occurs
  - The recently accessed node is moved to the top
  - Self-balancing by the splay operation
- Properties of splay trees
  - Recently accessed elements are quick to access again
  - Basic operations run in O(log n) amortized time
  - Splay trees are simpler to implement than red-black trees or AVL trees
  - Splay trees do not need to store any extra data in nodes
The Splay Operation
- The recently accessed (searched, inserted, or deleted) node is called the splay node
- The splay operation is performed on the splay node to move it to the root
- Successive accesses become faster because the recently accessed node is moved to the top of the tree
[Figure: splay node x below p and g, with subtrees A, B, C, D, before and after splaying]
- The splay operation comprises a sequence of splay steps
  - If the splay node is the root, the sequence of steps is empty
  - Otherwise each splay step moves the splay node either 1 level or 2 levels up the tree

Splay Node
- Search(x) makes node x the splay node
- Insert(x) makes node x the splay node
- Delete(x) makes the parent of node x the splay node
[Figure: a tree on the keys 1, 2, 4, 5, 6 with the splay node marked after Search(4), Insert(3), and Delete(4)]

One Level Splay Step
- Applies only when the level of the splay node is 2
  - L splay step: the splay node is the left child of its parent
  - R splay step: the splay node is the right child of its parent
- L splay step
  - If the splay node q is the left child of its parent, rotate as shown
  - Following the splay step, the splay node becomes the root of the binary search tree
[Figure: L splay step; the parent is the root before, q is the root after]

Two Level Splay Step
- Applies when the level of the splay node is greater than 2
- Types (q: splay node, p: parent of q, gp: grandparent of q)
  - LL: p is the left child of gp, q is the left child of p
  - LR: p is the left child of gp, q is the right child of p
  - RR: p is the right child of gp, q is the right child of p
  - RL: p is the right child of gp, q is the left child of p
[Figure: the four configurations LL, LR, RL, RR]

LL Splay Step
- If the splay node q is the left child of its parent and its parent is the left child of its grandparent, rotate as shown
- The splay node is moved 2 levels up
[Figure: LL splay step]

LR Splay Step
- If the splay node q is the right child of its parent and its parent is the left child of its grandparent, rotate as shown
- The splay node is moved 2 levels up
[Figure: LR splay step]
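The one-level and two-level splay steps can be sketched with parent pointers and a single helper that moves a node one level up. This is an illustrative sketch, not the textbook's code: `SplayTree`, `_rotate_up`, and the `steps` and `rotations` counters are hypothetical names, an unsuccessful search does no splaying here (a simplifying assumption), and the sample tree is made up for the demonstration.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right, self.parent = key, left, right, None
        for child in (left, right):
            if child is not None:
                child.parent = self

class SplayTree:
    def __init__(self, root=None):
        self.root, self.rotations, self.steps = root, 0, []

    def _rotate_up(self, q):
        """Single rotation that moves q one level up, above its parent p."""
        p, g = q.parent, q.parent.parent
        if p.left is q:
            p.left, q.right = q.right, p
            if p.left is not None:
                p.left.parent = p
        else:
            p.right, q.left = q.left, p
            if p.right is not None:
                p.right.parent = p
        p.parent, q.parent = q, g
        if g is None:
            self.root = q
        elif g.left is p:
            g.left = q
        else:
            g.right = q
        self.rotations += 1

    def splay(self, q):
        """Apply splay steps until q is the root; record each step type."""
        while q.parent is not None:
            p, g = q.parent, q.parent.parent
            side_q = "L" if p.left is q else "R"
            if g is None:                          # one-level step: L or R
                self.steps.append(side_q)
                self._rotate_up(q)
                continue
            side_p = "L" if g.left is p else "R"
            self.steps.append(side_p + side_q)     # two-level step: LL, LR, RL, RR
            if side_p == side_q:
                self._rotate_up(p)                 # LL / RR: lift p first, then q
            else:
                self._rotate_up(q)                 # LR / RL: lift q twice
            self._rotate_up(q)

    def search(self, key):
        """Return (node or None, number of key comparisons); splay a found node."""
        node, comparisons = self.root, 0
        while node is not None:
            comparisons += 1
            if key == node.key:
                self.splay(node)
                return node, comparisons
            node = node.left if key < node.key else node.right
        return None, comparisons

def sample_tree():
    """Hypothetical 7-node tree: a left chain 7-6-5-4-1 with 2 and 3 hanging right."""
    return SplayTree(Node(7, Node(6, Node(5, Node(4, Node(1, None, Node(2, None, Node(3))))))))
```

Searching for 2 in `sample_tree()` and then inspecting `steps` and `rotations` shows how a deep node reaches the root in a few two-level steps.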
Sample Splay Operation
[Figure: the sequence of splay steps performed by Search "2"]

Rotation Taxonomy of Splay Tree
- 1-level splay step
  - L type
  - R type
- 2-level splay step
  - LL type
  - LR type
  - RL type
  - RR type

Concept of Amortized
- Rule: spend less than $100 per month
  - Normal spending: spend less than $100 in every month
  - Amortized spending: spend less than $(100 * 12) per year
- Remember array expansion
  - Regular complexity of one expansion
    - Allocate (and initialize) an array of double the size: O(n)
    - Copy the old array to the new array: O(n)
  - Amortized complexity
    - The next doubling happens only after n more insertions
    - Each insertion is responsible for the expansion of one slot: O(1)

Amortized Complexity (1)
- In an amortized analysis, the time required to perform a sequence of data-structure operations is averaged over all the operations performed
- Amortized analysis differs from average-case analysis
  - Amortized analysis guarantees the average performance of each operation in the worst case
- $\sum_{i=1}^{n} \text{amortized}(i) \ge \sum_{i=1}^{n} \text{actual}(i)$
- Theorem 16.1: The amortized complexity of a get, put or remove operation performed on a splay tree with n elements is O(log n)
- The actual complexity of any sequence of g get, p put and r remove operations is O((g+p+r) log n)

Amortized Complexity (2)
- Example (1): search(2) in a 7-node tree on the keys 1 to 7
  - The search is followed by three splay steps on node 2: LR, LL, L
  - T1 = (search time) + (splay time) = 6 comparisons + 5 rotations
[Figure: the tree before the search and after each splay step]

Amortized Complexity (3)
- Example (2): search(1) in the resulting tree
  - The search is followed by one splay step: L
  - T2 = (search time) + (splay time) = 2 comparisons + 1 rotation
[Figure: the tree before the search and after the splay step]

Amortized Complexity (4)
- Example (3): search(2) in the resulting tree
  - The search is followed by one splay step: R
  - T3 = (search time) + (splay time) = 2 comparisons + 1 rotation
[Figure: the tree before the search and after the splay step]
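The array-expansion argument can be made concrete by counting the cost of each append. The following is a minimal sketch under a simple cost model that is an assumption of this example: writing an element costs 1 unit, and an expansion additionally costs one unit per copied element. The function name `append_costs` is illustrative.

```python
def append_costs(n):
    """Cost of each of n appends to an array that doubles its capacity when full."""
    capacity, size, costs = 1, 0, []
    for _ in range(n):
        cost = 1                      # writing the new element
        if size == capacity:          # array is full: double it and copy everything
            cost += size
            capacity *= 2
        size += 1
        costs.append(cost)
    return costs

if __name__ == "__main__":
    costs = append_costs(1025)
    print("worst single append:", max(costs))
    print("average per append:", sum(costs) / len(costs))
```

A single append can cost as much as the current array size, yet the total over n appends stays below 3n, so the amortized cost per append is O(1).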
Amortized Complexity (5)
- Example (4)
  - In the previous examples, the total time taken is 10 comparisons + 7 rotations
  - If there were no splay operation, the total time taken would be 18 comparisons
  - In general, it is known that $(t_1 + t_2 + \dots + t_k)/k \le 3\log_2 n$, where n is the number of nodes, if k is large enough
[Figure: the same three searches, search(2), search(1), search(2), performed on the original tree without splaying]

Table of Contents
- AVL TREES
- RED-BLACK TREES
- SPLAY TREES
- B-TREES
  - Indexed Sequential Files (ISAM)
  - m-WAY Search Trees
  - B-Trees of Order m
  - Height of a B-Tree
  - Searching a B-Tree
  - Inserting into a B-Tree
  - Deletion from a B-Tree

Indexed Sequential Access Method (ISAM)
- A small dictionary may reside in internal memory
- A large dictionary must reside on a disk
  - A disk consists of many blocks
  - Elements (records) are packed into a block in ascending order
- ISAM file (= indexed sequential file)
  - A disk-based file structure for large dictionaries
  - Provides good sequential and random access
- Primary concern: reducing the number of disk I/Os during a search

Overview: ISAM File
[Figure: example of an indexed sequential structure using an overflow chain; index blocks point to data blocks of part description records, whose primary key is PART No and whose other field is PART-Type]

File Structure Evolution
- Sequential file
  - Records can be accessed sequentially
  - Not good for accessing, inserting, or deleting records in random order
- Indexed-sequential file = Indexed Sequential Access Method (ISAM)
  - Sequential file + index
- B+ tree file
  - Indexed-sequential file + balance
- Here we study the B-tree data structure; the m-way search tree is similar to an ISAM file
m-Way Search Tree
- A binary search tree can be generalized to an m-way search tree
- In the figure, a white box is an internal node and a solid square is an external node
- Each internal node in the example can have up to six keys and seven pointers
- A certain input sequence would build the following example
[Figure: example 7-way search tree]

Properties of m-Way Search Tree
- An m-way search tree has the following properties
  - In the corresponding extended search tree, each internal node has up to m children and between 1 and m-1 elements
  - Every node with p elements has exactly p+1 children
  - Let $k_1, \dots, k_p$ be the keys of these ordered elements ($k_1 < k_2 < \dots < k_p$)
  - Let $c_0, c_1, \dots, c_p$ be the p+1 children of the node
- Key ranges
  - The elements in the subtree with root $c_0$ have keys smaller than $k_1$
  - The elements in the subtree with root $c_p$ have keys larger than $k_p$
  - The elements in the subtree with root $c_i$ have keys larger than $k_i$ but smaller than $k_{i+1}$, $1 \le i < p$

Searching an m-Way Search Tree
- Search for the element with key 31
  - At the root, 10 < 31 < 80: move to the middle subtree
  - $k_2 < 31 < k_3$: move to the third subtree
  - $31 < k_1$: move to the first subtree and fall off the tree; there is no such element
[Figure: the search path for key 31]

Inserting into an m-Way Search Tree
- Insert the new key 31
  - (a) Search for 31 and fall off the tree at the node [32,26]
  - (b) Insert 31 as the first element in that node
[Figure: the tree after inserting 31]

Inserting into an m-Way Search Tree
- Insert the new key 65
  - (a) Search for 65 and fall off the tree at the sixth subtree of the node [20,30,40,50,60,70]
  - (b) That node is full, so a new node is obtained for 65; the new node becomes the sixth child of [20,30,40,50,60,70]
[Figure: the tree after inserting 65]
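The search walk described above follows directly from the key-range properties. The sketch below is illustrative only: `MNode`, `search`, and the sample tree are assumptions (the original figure did not survive extraction), and `None` stands for an external node.

```python
from bisect import bisect_left

class MNode:
    def __init__(self, keys, children=None):
        self.keys = keys                                   # k_1 < k_2 < ... < k_p
        self.children = children or [None] * (len(keys) + 1)   # c_0 ... c_p

def search(root, key):
    """Return (found, number of nodes visited) for key in an m-way search tree."""
    node, visited = root, 0
    while node is not None:
        visited += 1                       # in a disk-based tree: one block read
        i = bisect_left(node.keys, key)    # number of keys in the node smaller than key
        if i < len(node.keys) and node.keys[i] == key:
            return True, visited
        node = node.children[i]            # c_i holds keys between k_i and k_{i+1}
    return False, visited                  # fell off the tree at an external node

# Hypothetical 7-way search tree (up to six keys and seven children per node).
third = MNode([32, 36])
middle = MNode([20, 30, 40, 50, 60, 70],
               [None, None, third, None, None, None, None])
root = MNode([10, 80], [MNode([4, 5]), middle, MNode([82, 84, 86])])

if __name__ == "__main__":
    print(search(root, 31), search(root, 36), search(root, 84))
```

The number of nodes visited is what matters for a disk-resident tree, since each visited node corresponds to one disk access.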
Deleting from an m-Way Search Tree
Delete the key 20
    Search for 20: $k_1 = 20$ and $c_0 = c_1 = 0$, so simply delete 20

Deleting from an m-Way Search Tree
Delete the key 84
    Search for 84: $k_2 = 84$ and $c_1 = c_2 = 0$, so simply delete 84

Deleting from an m-Way Search Tree
Delete the key 5
    (a) There is only one key in the node, so a replacement is needed
    (b) From $c_0$, move up the element with the largest key: move the key 4 to the key 5's position

Deleting from an m-Way Search Tree
Delete the key 10
    Replace this element with either the largest element in $c_0$ or the smallest element in $c_1$
    So the element with key 5 is moved to the top, and the element with key 4 is moved up to the key 5's position

Height of an m-Way Search Tree
h: height, n: number of elements, m: order of the tree (m-way)
    Number of elements: $h \le n \le m^h - 1$
    Number of nodes: at most $\sum_{i=0}^{h-1} m^i = (m^h - 1)/(m - 1)$
    Range of height: $\log_m(n+1) \le h \le n$
    Number of disk accesses: O(h)
We want to ensure that the height h is close to $\log_m(n+1)$
    This is accomplished by the B-tree!

Definition: B-tree of Order m
A B-tree of order m is an m-way search tree satisfying the following properties
    1. The root has at least two children
    2. All internal nodes other than the root have at least $\lceil m/2 \rceil$ children (pointers to the child nodes)
    3. All external nodes are at the same level
An internal node holds several pairs of a key and a pointer to a disk block

B-Trees of Order m
B-tree of order 2: full binary tree
B-tree of order 3 (= 2-3 tree): each internal node has 2 or 3 children
B-tree of order 4 (= 2-3-4 tree): each internal node has 2, 3, or 4 children
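The height bounds and the minimum-children rule can be checked numerically. The helper names below are hypothetical; the sketch uses integer arithmetic only, to avoid rounding problems with floating-point logarithms.

```python
def max_elements(m, h):
    """Most elements an m-way search tree of height h can hold: m^h - 1."""
    return m ** h - 1


def max_nodes(m, h):
    """Most nodes in an m-way search tree of height h: (m^h - 1) / (m - 1)."""
    return (m ** h - 1) // (m - 1)


def min_height(m, n):
    """Smallest height h with m^h - 1 >= n, i.e. the ceiling of log_m(n + 1)."""
    h = 0
    while max_elements(m, h) < n:
        h += 1
    return h


def min_children(m):
    """Fewest children of a non-root internal node in a B-tree of order m: ceil(m/2)."""
    return (m + 1) // 2


# A 200-way search tree holding one million elements needs at least this height,
# but without a balance condition its height can be as large as n itself.
best_case = min_height(200, 10 ** 6)
worst_case = 10 ** 6
```

For example, `min_children(3)` and `min_children(4)` both give 2, which matches the 2-3 tree and 2-3-4 tree named above.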
Height of a B-Tree of Order m
Remember: all internal nodes other than the root have at least $\lceil m/2 \rceil$ children (pointers to the child nodes)
Lemma 16.3
    Let T be a B-tree of order m
    Let h be the height of T
    Let $d = \lceil m/2 \rceil$ be the degree of T
    Let n be the number of elements in T
    (a) $2d^{h-1} - 1 \le n \le m^h - 1$
    (b) $\log_m(n+1) \le h \le \log_d((n+1)/2) + 1$

Searching a B-Tree
Uses the same algorithm as for an m-way search tree
    First visit the root with the given key K
    Compare K with the keys in the root
    Follow the corresponding pointer
    Search the child node recursively until the leaf node is reached
    On arriving at the leaf node, search the external node

Inserting into a B-Tree
First search with the key of the new element
    Found: insertion fails (if duplicates are not permitted)
    Not found: insert the new element into the last encountered internal node
        If there is no overflow, return ok
        Else (overflow), split the last internal node into 2 new nodes and go 1 level up to update the parent node (recursively)
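The insertion steps above can be sketched as follows. This is an in-memory illustration under stated assumptions, not the textbook's code: `BNode` and `insert` are hypothetical names, external nodes are represented by `None`, and an overfull node is assumed to be split around its d-th element with d = ceil(m/2), which moves up into the parent.

```python
from bisect import bisect_left


class BNode:
    """B-tree node: sorted keys and len(keys)+1 children (None = external node)."""

    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or [None] * (len(keys) + 1)


def insert(root, key, m):
    """Insert key into a B-tree of order m; return the root (new if the old root split)."""
    path = []                                # (node, child index) along the search path
    node = root
    while node is not None:
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return root                      # found: insertion fails (no duplicates)
        path.append((node, i))
        node = node.children[i]
    if not path:                             # empty tree
        return BNode([key])

    d = (m + 1) // 2                         # d = ceil(m / 2)
    up_key, up_child = key, None             # pair to insert into the current node
    while path:
        node, i = path.pop()
        node.keys.insert(i, up_key)
        node.children.insert(i + 1, up_child)
        if len(node.keys) < m:               # no overflow: done
            return root
        # Overfull node with m elements: split it around the d-th element.
        up_key = node.keys[d - 1]                             # element that moves up
        up_child = BNode(node.keys[d:], node.children[d:])    # right part (new node)
        node.keys = node.keys[:d - 1]                         # left part stays in place
        node.children = node.children[:d]
    # The old root was split: create a new root above the two halves.
    return BNode([up_key], [root, up_child])


# Small usage example with a hypothetical B-tree of order 3.
tree = None
for k in [10, 20, 30]:
    tree = insert(tree, k, 3)
```

In the usage example the third insertion makes the single node overfull, so it is split and a new root holding 20 is created above the nodes holding 10 and 30. On disk, each node touched in the loop would be one block read or written; the sketch ignores I/O entirely.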
Notations in B-tree
e : element
$e_i$ : element pointers
c : children
$c_i$ : children pointers
p : parent node
d : degree of a node, at least $\lceil m/2 \rceil$
Overfull node (m elements & m+1 children): $m,\ c_0,\ (e_1, c_1),\ \dots,\ (e_m, c_m)$
P (left remainder): $d-1,\ c_0,\ (e_1, c_1),\ \dots,\ (e_{d-1}, c_{d-1})$
Q (right remainder): $m-d,\ c_d,\ (e_{d+1}, c_{d+1}),\ \dots,\ (e_m, c_m)$
The pair $(e_d, Q)$ is inserted into the parent of P

Insert the key 3 in B-tree
[Figure: inserting the key 3]

Insert the key 25 in B-tree
d = 4, and the target node holds 6 elements: (20, 30, 40, 50, 60, 70)
    P : 3, 0, (20,0), (25,0), (30,0)
    Q : 3, 0, (50,0), (60,0), (70,0)
    (40, Q) is inserted into the parent of P
[Figure: the nodes P and Q after the split]

Growing B-tree by Insertion (1)
[Figure 16.25: B-tree of order 3 (each internal node has at least 2 pointers)]
Node format: $m,\ c_0,\ (e_1, c_1),\ (e_2, c_2),\ \dots,\ (e_m, c_m)$, where m here is the number of elements in the node, $e_i$ are the elements, and $c_i$ are the children

Growing B-tree by Insertion (2)
Insert 44
[Figure: the tree after 44 is placed in the leaf holding 35 and 40]
    d = 2, and the target node was (2, c5, (35,c6), (40,c7))
    Overfull node: 3, c5, (35,c6), (40,c7), (44,cn)

Growing B-tree by Insertion (3)
$d = \lceil 3/2 \rceil = 2$; split the overfull node into P & Q
[Figure: the leaf split into P (35) and Q (44)]
    P : 1, 0, (35,0)
    Q : 1, 0, (44,0)
    (40,Q) is inserted into the parent A of P
    The parent A is now an overfull node

Growing B-tree by Insertion (4)
[Figure: node A holding 40, 50, 60 with children P, Q, C, D]
    Node A is now the overfull node
    A : 3, P, (40,Q), (50,C), (60,D)

Growing B-tree by Insertion (5)
$d = \lceil 3/2 \rceil = 2$; split the node A into A & B
[Figure: node A (40) and node B (60) under the root R holding 30, 50, 80]
    A : 1, P, (40,Q)
    B : 1, C, (60,D)
    Move (50,B) into the parent of A
    Again, the parent of A is an overfull node
Growing B-tree by Insertion (6)
[Figure: the root R holding 30, 50, 80 with children S, A, B, T]
    The root node R is now the overfull node
    R : 3, S, (30,A), (50,B), (80,T)

Growing B-tree by Insertion (7)
$d = \lceil 3/2 \rceil = 2$; split the root node R into R & U
[Figure: a new root holding 50 above R (30) and U (80)]
    R : 1, S, (30,A)
    U : 1, B, (80,T)
    Move the new index (50,U) into the parent of R
    R has no parent, so we create a new root for the new index

Disk Accesses in B-tree
Worst case: an insertion may cause s nodes to split, all the way up to the root
Number of disk accesses in the worst case
    h (to read in the nodes on the search path)
    + 2s (to write out the two split parts of each node)
    + 1 (to write the new root, or the node into which an insertion that does not result in a split is made)
h + 2s + 1 is at most 3h + 1 because s is at most h
The worst scenario is 3h + 1 disk I/Os caused by splitting

Deletion from a B-Tree
Deletion cases
    Case 1: key k is in a leaf node
    Case 2: key k is in an internal node
Case 2 is handled by replacing the deleted element with either
    The largest element in its left-neighboring subtree, or
    The smallest element in its right-neighboring subtree
The replacing element is in a leaf, so we can then apply case 1

Case 1: Leaf Node Deletion
If key k is in a leaf node x, then remove k from x
If an underfull node results, care must be exercised (addressed shortly)

Case 2: Internal Node Deletion
If the key k is in the internal node x
One of 3 subcases:
    a. The left child y preceding k in x has ≥ t keys
    b. The right child z following k in x has ≥ t keys
    c. Both the left and right children y and z have t-1 keys
t : m/2 - 1 (half of the keys)

Case 2a: Internal Node Deletion (1)
If the left child y preceding k in x has ≥ t keys
    Find the predecessor k' of k in the subtree rooted at y
    Replace k by k' in x

Case 2a: Internal Node Deletion (2)
If an underfull node results, care must be exercised (addressed shortly)

Case 2b: Internal Node Deletion
If the right child z following k in x has ≥ t keys
    (a) Find the successor k' of k in the subtree rooted at z
    (b) Replace k by k' in x
If an underfull node results, care must be exercised (addressed shortly)

Case 2c: Internal Node Deletion
If both the left and right children y and z have t-1 keys
    Select the replacement as shown in case 2a or case 2b
If an underfull node results, care must be exercised as shown below

Shrinking B-Tree by Deletion (1)
Try to delete "44"
[Figure: the B-tree before and after deleting "44"]
After deleting "44", "35" & "40" are merged

Shrinking B-Tree by Deletion (2)
"20" & "40" also need to be merged
"50" and "80" also need to be merged

Shrinking B-Tree by Deletion (3)
"50" & "80" are merged, and now the old root becomes empty
Free the old root and make the new root

Technique for Reducing Node Merging: B-tree Deletion with Redistribution (1)
Try to delete "25"
    Underflow happens, so redistribute with a neighbor node
    Move down 10 & move up 6
    This saves a node merge

Technique for Reducing Node Merging: B-tree Deletion with Redistribution (2)
Try to delete "10"
    Merging is unavoidable
Technique for Reducing Node Merging: B-tree Deletion with Redistribution (3)
Consider redistributing some nodes: move down "30" & move up "50"
    This saves the propagation of node merging

Summary (0)
Chapter 15: Binary Search Tree
    BST and Indexed BST
Chapter 16: Balanced Search Tree
    AVL tree: BST + Balance
    B-tree: generalized AVL tree
Chapter 17: Graph

Summary (1)
Balanced tree structures - Height is O(log n)
    AVL and Red-black trees
Splay trees
    Suitable for internal memory applications
    Individual dictionary operation: O(n)
    Takes less time to perform a sequence of u operations: O(u log u)
B-trees
    Suitable for external memory