Foundations of Data Structures
Practical Session #7: AVL Trees

AVL Tree properties

Height-Balance Property
For every internal node v of a tree T, the heights of the children of v differ by at most 1.

AVL Tree
Any binary search tree that satisfies the Height-Balance Property. Thus it has a height of O(log n), which implies O(log n) worst-case search and insertion times.

AVL Interface
Supports the following operations in O(log n) time: insert, search, delete, maximum, minimum, predecessor and successor.

AVL Height
Lemma: The height of an AVL tree storing n keys is O(log n).

AVL Tree example
(Trees in this session are written as root(left, right); "-" marks a missing child.)
14(11(7(4, 8), 12(-, 13)), 17(-, 53))

Question 1
Insert the following sequence of integers into an empty AVL tree: 14, 17, 11, 7, 53, 4, 13

After inserting 14, 17, 11, 7, 53 and 4, the sub-tree of 11 is unbalanced:
14(11(7(4, -), -), 17(-, 53))

A single right rotation of '11' is executed to rebalance the tree:
14(7(4, 11), 17(-, 53))

Insert 13:
14(7(4, 11(-, 13)), 17(-, 53))

Now insert 12:
14(7(4, 11(-, 13(12, -))), 17(-, 53))
The sub-tree of 11 is unbalanced. Double rotation: right and then left.

After right rotation of '13':
14(7(4, 11(-, 12(-, 13))), 17(-, 53))
Now left rotate '11'.

After left rotation of '11':
14(7(4, 12(11, 13)), 17(-, 53))
Now balanced!

Now insert 8:
14(7(4, 12(11(8, -), 13)), 17(-, 53))
The sub-tree of 7 is unbalanced. Required double rotation: right and then left.

After right rotation of '12':
14(7(4, 11(8, 12(-, 13))), 17(-, 53))
Now left rotate '7'.

After left rotation of '7':
14(11(7(4, 8), 12(-, 13)), 17(-, 53))
Now balanced!

Remove 53:
14(11(7(4, 8), 12(-, 13)), 17)
Unbalanced! Right rotate '14'.

After right rotation of '14':
11(7(4, 8), 14(12(-, 13), 17))
Balanced!

Remove 11: replace it with the maximum in its left branch (8).
8(7(4, -), 14(12(-, 13), 17))

Remove 8: replace it with the maximum in its left branch (7).
7(4, 14(12(-, 13), 17))
Unbalanced! Required double rotation: right and then left.

After right rotation of '14':
7(4, 12(-, 14(13, 17)))

After left rotation of '7':
12(7(4, -), 14(13, 17))

Question 2
In class we've seen an implementation of an AVL tree where each node v has an extra field h, the height of the sub-tree rooted at v. The height can be used to balance the tree.
- How many bits are required to store the height in a node?
Answer: For an AVL tree with n nodes, h = O(log n), so storing h requires O(log log n) extra bits per node.
1. How can we reduce the number of extra bits necessary for balancing the AVL tree?
2. Suggest an algorithm for computing the height of an AVL tree given in the representation you suggested in 1.

Question 2 solution
1. Instead of a height field, which is redundant, each node stores 2 balance bits, holding the difference between the heights of its right and left sub-trees: balance = height(right) - height(left).
• Two bits suffice because the difference can take only three values: -1, 0, 1. (The leftmost bit represents the sign.)
• The balance field should be updated on insert and delete operations, along the path to the root.
2. To compute the height of a tree, follow the path from the root to the deepest leaf by reading the balance field. If a sub-tree leans to one side, the deepest leaf resides on that side; if it is perfectly balanced, either side leads to a deepest leaf. The path has length O(log n), so this takes O(log n) time.

CalcHeight(T)
    if T == null
        return -1
    if T.balance == -1 or T.balance == 0
        return 1 + CalcHeight(T.left)
    else
        return 1 + CalcHeight(T.right)

Question 3
Suggest two ways for an AVL tree to support a query for retrieving all the keys in the range [k1, k2] in O(log n + k) time, where k is the number of keys in the range.

Question 3 solution
1. Store in each node pointers to its predecessor and successor.
• Requires updating on insert and delete operations.
• Finding the successor/predecessor takes O(height) time, which is O(log n) in an AVL tree, so the time of the insert and delete operations is unchanged.
• To answer a query, find the smallest key that is at least k1 in O(log n) time, then follow successor pointers until a key exceeds k2, spending O(1) time per reported key.
2. Use the following claim: "Starting at any node in a height-h BST, k successive calls to TREE-SUCCESSOR take O(k + h) time."
• Doesn't require extra pointers.
• Doesn't require modifications to the insert and delete operations.
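The second approach can be sketched in Python as follows. This is a minimal illustration, not the course's implementation: it assumes every node keeps a parent pointer, and the names Node, bst_insert, first_at_least and range_query are hypothetical. bst_insert is a plain, unbalanced BST insert standing in for AVL insert, so the O(log n + k) bound holds only when the tree happens to be balanced, as in the example tree used here.

```python
class Node:
    def __init__(self, key, parent=None):
        self.key = key
        self.left = None
        self.right = None
        self.parent = parent


def bst_insert(root, key):
    """Plain BST insert (no rebalancing); a stand-in for AVL insert."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key, node)
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key, node)
                return root
            node = node.right


def tree_successor(x):
    """Node with the next larger key, or None if x holds the maximum."""
    if x.right is not None:
        x = x.right
        while x.left is not None:   # minimum of the right sub-tree
            x = x.left
        return x
    y = x.parent
    while y is not None and x is y.right:
        x, y = y, y.parent          # climb while we come from a right child
    return y


def first_at_least(root, k1):
    """Node with the smallest key >= k1, or None; O(height)."""
    best, node = None, root
    while node is not None:
        if node.key >= k1:
            best, node = node, node.left
        else:
            node = node.right
    return best


def range_query(root, k1, k2):
    """All keys in [k1, k2], in increasing order."""
    out = []
    node = first_at_least(root, k1)
    while node is not None and node.key <= k2:
        out.append(node.key)
        node = tree_successor(node)
    return out


# Level-order insertion reproduces the shape of the AVL example tree.
root = None
for key in [14, 11, 17, 7, 12, 53, 4, 8, 13]:
    root = bst_insert(root, key)
print(range_query(root, 8, 13))  # [8, 11, 12, 13]
```

The query makes one O(height) descent to locate the first key, then k calls to tree_successor, which is where the O(k + h) claim is applied.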
Reminder:
TREE-SUCCESSOR(x)
    if x.right != NULL
        then return TREE-MINIMUM(x.right)
    y ← x.parent
    while y != NULL and x == y.right
        do x ← y
           y ← y.parent
    return y

Claim: "Starting at any node in a height-h BST, k successive calls to TREE-SUCCESSOR take O(k + h) time."
Proof outline
• Let x be the starting node and z be the ending node after k successive calls to TREE-SUCCESSOR.
• Let P be the simple path between x and z, inclusive.
• Let y be the common ancestor of x and z that P visits (their lowest common ancestor).
• The length of P is at most 2h, which is O(h).
• Let output be the elements with keys between x.key and z.key, inclusive.
• The size of the output is O(k).
• In the execution of k successive calls to TREE-SUCCESSOR, each node in P is visited at most 3 times (on the way to its left child, to its right child, and up to its parent).
• Besides the nodes x, y and z, if a sub-tree of a node in P is visited then all its elements are in output.
• Hence, the total running time is O(h + k).

Question 4
Suggest an efficient algorithm for sorting an array of numbers. Analyze its running time and required space.

Question 4 solution
1. Define a new empty AVL tree, T.
2. Traverse the input array and insert each item into T.
3. Traverse the tree T in-order, copying the items back to the array.
Time: step 2 requires O(n log n), step 3 requires O(n). In total O(n log n).
Extra space: an AVL tree of size n requires O(n) extra space.

Question 5
Suggest a data structure for storing integers that supports the following operations.
Init(): Initialize the data structure. O(1)
Insert(x): Insert x, if it is not present yet. O(log n)
Delete(x): Delete x if it exists. O(log n)
DeletePlace(i): Delete the element in the ith place (as determined by the order of insertion).
GetPlace(x): Return the place (which is determined by the order of insertion) of x. If x does not exist, return -1.
DeletePlace(i) and GetPlace(x) must each run in O(log n) time.

Question 5 solution
For example, for the following sequence of actions:
Insert(3), Insert(5), Insert(11), Insert(4), Insert(7), Delete(5)
GetPlace(7) returns 4, and DeletePlace(2) will delete 11.

The solution
We will use two AVL trees:
• T1 stores the elements by their key.
• T2 stores the elements by the order of insertion (using a running counter). Each node of T2 also stores the size of its sub-tree, so that the rank of a node and the node of a given rank can be found in O(log n) time. This is needed because deletions leave gaps in the counter values: in the example above, 7 has counter value 5 but place 4.
• There are pointers between the two trees connecting the two nodes of the same element.

Init(): initialize two empty trees.
Insert(x): if x is not already in T1, insert the element by its key into T1, insert it by its order of insertion into T2 and update the pointers.
Delete(x): find the element in T1 and delete it from both trees, following the pointer.
DeletePlace(i): find the ith smallest node in T2 (using the sub-tree sizes) and delete it from both trees, following the pointer.
GetPlace(x): find the node with key x in T1; if it does not exist, return -1. Otherwise follow the pointer to its copy in T2 and return the rank of that node in T2.
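The behaviour of the five operations can be checked against the example with a short Python sketch. This is not the two-AVL-tree solution: a dict stands in for T1 (expected O(1) lookups rather than worst-case O(log n)), and a Fenwick tree (binary indexed tree) over the insertion counters stands in for T2, giving the place of an element as the number of still-present elements inserted no later than it. The class name InsertionOrderSet and the capacity parameter (an upper bound on the total number of insertions) are assumptions of this sketch.

```python
class InsertionOrderSet:
    """Sketch only: dict in place of T1, Fenwick tree in place of T2."""

    def __init__(self, capacity):
        self.capacity = capacity            # assumed bound on total inserts
        self.counter = 0                    # running insertion counter
        self.counter_of = {}                # key -> counter (role of T1)
        self.key_of = {}                    # counter -> key (pointer back)
        self.live = [0] * (capacity + 1)    # Fenwick tree over live counters

    def _add(self, pos, delta):
        while pos <= self.capacity:
            self.live[pos] += delta
            pos += pos & -pos

    def _rank(self, pos):
        """Number of live counters <= pos."""
        total = 0
        while pos > 0:
            total += self.live[pos]
            pos -= pos & -pos
        return total

    def _select(self, i):
        """Counter of the i-th live element (1 <= i <= number of elements)."""
        pos = 0
        step = 1 << self.capacity.bit_length()
        while step:
            nxt = pos + step
            if nxt <= self.capacity and self.live[nxt] < i:
                pos = nxt
                i -= self.live[nxt]
            step >>= 1
        return pos + 1

    def insert(self, x):
        if x in self.counter_of:
            return
        if self.counter == self.capacity:
            raise OverflowError("capacity of the sketch exceeded")
        self.counter += 1
        self.counter_of[x] = self.counter
        self.key_of[self.counter] = x
        self._add(self.counter, 1)

    def delete(self, x):
        c = self.counter_of.pop(x, None)
        if c is None:
            return
        del self.key_of[c]
        self._add(c, -1)

    def get_place(self, x):
        c = self.counter_of.get(x)
        return -1 if c is None else self._rank(c)

    def delete_place(self, i):
        if 1 <= i <= len(self.counter_of):
            self.delete(self.key_of[self._select(i)])


s = InsertionOrderSet(capacity=8)
for x in [3, 5, 11, 4, 7]:
    s.insert(x)
s.delete(5)
print(s.get_place(7))   # 4
s.delete_place(2)       # deletes 11
print(s.get_place(11))  # -1
```

The rank and select steps take O(log capacity) time here; in the AVL-based solution the same two queries are answered on T2 in O(log n) time.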