Data Structures and Algorithms for Information Processing
Lecture 6: Heaps, B-Trees, and B+Trees

90-723: Data Structures and Algorithms for Information Processing Lecture 6: Heaps & B-Trees Copyright © 1999, Carnegie Mellon. All Rights Reserved.

Homework Policy
• Late homework will normally be penalized 10% per day late
• Each student may turn in one late homework with no penalty (up to one week late)

Grading
• Homeworks (4-5): 50%
• Midterm Exam: 25%
• Final Exam: 25%

Today’s Topics
• Ways to Balance Trees
  – Heaps & Priority Queues
  – B-Trees
• Time Analysis of Trees
  – Binary trees
  – Heaps
  – B-Trees
• See Chapter 10 in Main
• B+ trees

Binary Trees: Worst Case
• Inserting nodes that are already sorted leads to worst-case behavior: d = (n - 1) = 5
• How can we use the idea of balanced trees to avoid this kind of situation?
[Figure: degenerate binary tree holding the nodes 1 through 6 in a single chain]

Balanced Trees
• Trees are “no deeper than they have to be”
• Complete binary trees minimize depth by forcing each row to be full before d is increased
• Heaps are complete binary trees, which limit the depth to a minimum for any given n nodes, independently of the order of insertion.
• Heaps are not search trees.
• Main’s slides on Heaps
[Figure: examples of balanced binary trees]
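The worst case on the "Binary Trees: Worst Case" slide can be reproduced with a few lines of Java. The sketch below is illustrative only: the class name BstDepthDemo and its methods are made up for this example and do not come from Main's text. It inserts the same six values into a plain (unbalanced) binary search tree in two different orders and reports the resulting depth d.

```java
// Minimal sketch of an unbalanced binary search tree of ints.
// All names here are hypothetical; they are not taken from the lecture or textbook.
class BstDepthDemo {
    private static final class Node {
        final int value;
        Node left, right;
        Node(int value) { this.value = value; }
    }

    private Node root;

    // Standard BST insertion with no rebalancing; duplicates are ignored.
    void add(int value) {
        if (root == null) { root = new Node(value); return; }
        Node cur = root;
        while (true) {
            if (value < cur.value) {
                if (cur.left == null) { cur.left = new Node(value); return; }
                cur = cur.left;
            } else if (value > cur.value) {
                if (cur.right == null) { cur.right = new Node(value); return; }
                cur = cur.right;
            } else {
                return;
            }
        }
    }

    // Depth = number of edges on the longest root-to-leaf path (-1 for an empty tree).
    int depth() { return depth(root); }

    private static int depth(Node n) {
        return n == null ? -1 : 1 + Math.max(depth(n.left), depth(n.right));
    }

    // Builds a fresh tree from the given insertion order and returns its depth.
    static int depthAfterInserting(int... values) {
        BstDepthDemo tree = new BstDepthDemo();
        for (int v : values) tree.add(v);
        return tree.depth();
    }

    public static void main(String[] args) {
        System.out.println(depthAfterInserting(1, 2, 3, 4, 5, 6)); // sorted order: prints 5
        System.out.println(depthAfterInserting(4, 2, 6, 1, 3, 5)); // mixed order: prints 2
    }
}
```

With sorted input every new node becomes the right child of the previous one, so d = n - 1 = 5. The second insertion order happens to give the minimum depth for six nodes; a heap or B-Tree guarantees a small depth regardless of insertion order.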
B-Trees
• B-Trees are a type of search tree
• Further reduction in depth for a given tree of n nodes
• Two adjustments:
  – nodes may have more than two children
  – nodes may hold more than a single element
• Can be implemented as a set (no duplicate elements) or as a bag (duplicate elements allowed)
• This example focuses on the set implementation
• Every B-Tree depends on a positive constant, MINIMUM, which determines how many elements are held in a single node
• Rule 1: The root may have as few as one element (or no elements at all if it has no children); all other nodes have at least MINIMUM elements
• Rule 2: The maximum number of elements in a node is twice the value of MINIMUM
• Rule 3: Elements in a node are stored in a partially-filled array, sorted from smallest (element 0) to largest (final position used)
• Rule 4: The number of subtrees below a non-leaf node is always one more than the number of elements in the node
B-Trees
• Rule 5: For any non-leaf node:
  – The element at index I is greater than all the elements in subtree number I of the node
  – The element at index I is less than all the elements in subtree (I + 1) of the node
[Figure: a node holding 93 and 107, with subtree numbers 0, 1, and 2 below it]
  – Each element in subtree 0 is less than 93.
  – Each element in subtree 1 is between 93 and 107.
  – Each element in subtree 2 is greater than 107.
• Rule 6: Every leaf in a B-Tree has the same depth
• The implication is that B-Trees are always balanced.

B-Tree Example
[Figure: B-Tree with MINIMUM = 1. Root: 6. Children of the root: (2 and 4) and (9). Leaves below (2 and 4): 1, 3, 5. Leaves below (9): (7 and 8), 10.]
NOTE: Every child of the root node is also a B-Tree!

Set ADT with B-Trees
public class IntBalancedSet
{
    // constants
    private static final int MINIMUM = 200;
    private static final int MAXIMUM = 2 * MINIMUM;

    // info about root node
    int dataCount;
    int[] data = new int[MAXIMUM + 1];
    int childCount;

    // info about children
    IntBalancedSet[] subset = new IntBalancedSet[MAXIMUM + 2];
    …
}

[Figure: the example B-Tree again, with MINIMUM = 1 and MAXIMUM = 2, shown next to the instance variables of its root node]
dataCount: 1
data: 6, ?, ?
childCount: 2
subset:
[References to IntBalancedSet instances], null, null

Invariant for Set B-Tree
• The elements of the set are stored in a B-Tree, satisfying the 6 rules
• The number of elements in the root is stored in the instance variable dataCount, and the number of subtrees is stored in the instance variable childCount.
• The root’s elements are stored in data[0] through data[dataCount - 1].
• If the root has subtrees, then subset[0] through subset[childCount - 1] are references to those subtrees.

Searching a B-Tree
• Sets use the method contains to find out whether an element is in the set:
  – Set I equal to the first index where data[I] >= target; if there is no such index, set I = dataCount
  – If I < dataCount and data[I] == target, return true; else if (no children) return false; else return subset[I].contains(target);

Sample Search
[Figure: the example B-Tree with root 6, children (2 and 4) and (9), and leaves 1, 3, 5, (7 and 8), 10]
• contains(7); at the root, 7 > 6, so I = dataCount = 1
• subset[1].contains(7); 9 >= 7, so I = 0; data[I] != 7
• subset[0].contains(7); 7 >= 7, so I = 0; data[I] == 7!

Add/Remove from B-Tree
• Complex two-pass operations
• pp. 500-512
• Covered on next slide set for 2-3 trees
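The search rule from the "Searching a B-Tree" slide can be sketched in Java as follows. This is a simplified illustration, not the textbook's IntBalancedSet: the class name BTreeSearchSketch, its constructor, and the helpers leaf and exampleTree are assumptions made for this example, and the arrays are sized exactly to their contents rather than to MAXIMUM + 1 and MAXIMUM + 2. The tree built by exampleTree is the one from the "B-Tree Example" slide.

```java
// Sketch of B-Tree search on a simplified, hand-built node type.
// BTreeSearchSketch, leaf, and exampleTree are hypothetical names for illustration.
class BTreeSearchSketch {
    int dataCount;               // number of elements in this node
    int[] data;                  // elements, sorted from smallest to largest
    int childCount;              // number of subtrees (0 for a leaf)
    BTreeSearchSketch[] subset;  // references to the subtrees

    // Builds a node from its sorted elements and zero or more subtrees.
    BTreeSearchSketch(int[] elements, BTreeSearchSketch... children) {
        data = elements.clone();
        dataCount = elements.length;
        subset = children.clone();
        childCount = children.length;
    }

    static BTreeSearchSketch leaf(int... elements) {
        return new BTreeSearchSketch(elements);
    }

    boolean contains(int target) {
        // Find the first index i with data[i] >= target, or dataCount if there is none.
        int i = 0;
        while (i < dataCount && data[i] < target) {
            i++;
        }
        if (i < dataCount && data[i] == target) {
            return true;                      // found in this node
        }
        if (childCount == 0) {
            return false;                     // leaf: nowhere else to look
        }
        return subset[i].contains(target);    // Rule 5 says the target can only be in subtree i
    }

    // The example tree from the slides: root 6, children (2 and 4) and (9).
    static BTreeSearchSketch exampleTree() {
        BTreeSearchSketch left =
            new BTreeSearchSketch(new int[] {2, 4}, leaf(1), leaf(3), leaf(5));
        BTreeSearchSketch right =
            new BTreeSearchSketch(new int[] {9}, leaf(7, 8), leaf(10));
        return new BTreeSearchSketch(new int[] {6}, left, right);
    }

    public static void main(String[] args) {
        BTreeSearchSketch root = exampleTree();
        System.out.println(root.contains(7));   // prints true
        System.out.println(root.contains(11));  // prints false
    }
}
```

The call contains(7) follows the same path as the sample search: the root (I = 1), then the node holding 9 (I = 0), then the leaf holding 7 and 8. The explicit check i < dataCount keeps the comparison from reading past the elements actually in use.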
Trees, Logs, Time Analysis
• Heaps and B-Trees are efficient because d is kept small
• How can we relate the depth of a tree and the worst-case time required to search, add, and remove an element?
• The worst-case time for each of the following operations is O(d):
  – Adding an element to a binary search tree, heap, or B-Tree
  – Removing an element from a binary search tree, heap, or B-Tree
  – Searching for a specified element in a binary search tree or B-Tree
• How can we relate the depth d to the number of elements n?
• Example: binary trees
  – d is no more than n - 1
  – O(d) is therefore O(n - 1) = O(n) (remember, we can ignore constants)

Time Analysis for Heaps
• Heaps

  Level    Nodes to Fill
  0        1
  1        2
  2        4
  3        8
  …        …
  d        2^d

• Minimum nodes to reach depth d in a heap: $(1 + 2 + 4 + \cdots + 2^{d-1}) + 1 = 2^d$
• The number of nodes in a heap is at least $2^d$

Review Base-2 Logarithms
• For any positive number x, the base 2 logarithm of x is an exponent r such that: $2^r = x$
Review Base-2 Logarithms
$2^0 = 1 \quad\Rightarrow\quad \log_2 1 = 0$
$2^1 = 2 \quad\Rightarrow\quad \log_2 2 = 1$
$2^2 = 4 \quad\Rightarrow\quad \log_2 4 = 2$
...
$2^d = 2^d \quad\Rightarrow\quad \log_2 2^d = d$

Worst-Case For Heaps
• In a heap the number of elements n is at least 2^d, so:
$\log_2 n \ge \log_2 2^d$
$\log_2 2^d = d$
$\log_2 n \ge d$
• Adding or removing an element in a heap with n elements is O(d), where d is the depth of the tree. Because d is no more than log2(n), the operations are O(log2(n)), which is O(log(n)).
• (see discussion pp. 516-520)

Many Databases use B+ Trees
[Figure: B+ tree diagram, from Wikipedia]
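The bound d <= log2(n) from the "Worst-Case For Heaps" slide can be checked numerically. The sketch below is a small illustration under stated assumptions: the class name HeapDepthCheck and both method names are invented here. It computes the depth of a complete binary tree with n nodes by filling levels of size 1, 2, 4, ..., and compares it with the integer part of log2(n).

```java
// Sketch: depth of a complete binary tree (the shape of a heap) versus log2(n).
// HeapDepthCheck and its methods are hypothetical names for illustration.
class HeapDepthCheck {
    // Depth of a complete binary tree with n >= 1 nodes.
    // Level k holds up to 2^k nodes, and a level is filled before the next one starts.
    static int depthOfCompleteTree(long n) {
        if (n < 1) throw new IllegalArgumentException("n must be at least 1");
        int depth = 0;
        long levelSize = 1;  // capacity of the deepest level so far (2^depth)
        long capacity = 1;   // total capacity of levels 0..depth
        while (capacity < n) {
            levelSize *= 2;
            capacity += levelSize;
            depth++;
        }
        return depth;
    }

    // Integer part of log2(n) for n >= 1, using integer arithmetic only.
    static int floorLog2(long n) {
        return 63 - Long.numberOfLeadingZeros(n);
    }

    public static void main(String[] args) {
        for (long n : new long[] {1, 2, 3, 4, 7, 8, 1000, 1_000_000}) {
            System.out.println("n = " + n
                + ", d = " + depthOfCompleteTree(n)
                + ", floor(log2 n) = " + floorLog2(n));
        }
    }
}
```

For every n the two values agree, which is the inequality chain from the slide in both directions: a tree of depth d has at least 2^d nodes, so d <= log2(n), and it cannot reach depth d + 1 until n is at least 2^(d+1).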