Lecture 2
Search Trees: Splay Trees, "Multi-Way" Search Trees, B-Trees

TDDC32 Lecture notes in Design and Implementation of a Software Module in Java
16 January 2013
Tommy Färnqvist, IDA, Linköping University

Lecture Topics

Contents
1 Splay Trees
2 (2, 4) Trees
3 B-Trees

1 Splay Trees

Splay Tree - basic idea

Recall the basic BST:
• Simple insert and delete when balanced, but...
• The "balance" is determined by the order of inserts and deletes...

Combine with the "keep recent objects first" heuristic for lists?
• Often-used elements should be near the root!

[Figure: the two BSTs produced by "insert: 1,2,4,5,8" (a degenerate chain) and "insert: 5,2,1,4,8" (a much flatter tree).]

The splay(k) operation

• Perform a normal search for k, remembering all nodes we pass...
• Label the last node we inspect P
  – If k is in T, then k is in node P,
  – otherwise P is the parent of an empty tree
• Return back to the root; at each node do a rotation to move P up the tree... (3 cases)

• zig: parent(P) is the root: rotate around P:

[Figure: zig. Before: Q is the root, with child P and subtree c; P has subtrees a and b. After: P is the root, with subtree a and child Q; Q has subtrees b and c.]

• zig-zig: P and parent(P) are both left children (or both right children): perform two rotations to shift P up:

[Figure: zig-zig in three stages. First R on top, with child Q, whose child on the same side is P. Then Q on top, with children P and R. Finally P on top, with child Q, whose child on the same side is R. The subtrees a, b, c, d keep their left-to-right order.]

• zig-zag: One of P and parent(P) is a left child and the other is a right child: perform two rotations in different directions:

[Figure: zig-zag. Before: R on top, with child Q, whose child on the opposite side is P. After: P on top, with Q and R as its children. The subtrees a, b, c, d keep their left-to-right order.]

Note: These rotations may increase the height of the tree!
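The three cases above can be sketched in Java roughly as follows. This is only an illustration, not the course's reference implementation: the class name SplayTree, the helpers rotateLeft/rotateRight and rootKey, and the restriction to int keys without values are all assumptions made here. It also uses recursion, so the "remembered" search path lives on the call stack instead of in an explicit list.

```java
// Sketch only: a splay tree over int keys (values omitted for brevity).
// All names here are illustrative assumptions, not taken from the lecture.
final class SplayTree {
    private static final class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    private Node root;

    // Rotate q's left child up over q and return the new subtree root.
    private static Node rotateRight(Node q) {
        Node p = q.left;
        q.left = p.right;
        p.right = q;
        return p;
    }

    // Mirror image: rotate q's right child up over q.
    private static Node rotateLeft(Node q) {
        Node p = q.right;
        q.right = p.left;
        p.left = q;
        return p;
    }

    // Moves the node holding key (or the last node on the search path) to the top of h.
    private static Node splay(Node h, int key) {
        if (h == null || key == h.key) return h;
        if (key < h.key) {
            if (h.left == null) return h;              // search ends here
            if (key < h.left.key) {                    // zig-zig (left-left)
                h.left.left = splay(h.left.left, key);
                h = rotateRight(h);
            } else if (key > h.left.key) {             // zig-zag (left-right)
                h.left.right = splay(h.left.right, key);
                if (h.left.right != null) h.left = rotateLeft(h.left);
            }
            return (h.left == null) ? h : rotateRight(h);   // zig, or second rotation
        } else {
            if (h.right == null) return h;             // search ends here
            if (key > h.right.key) {                   // zig-zig (right-right)
                h.right.right = splay(h.right.right, key);
                h = rotateLeft(h);
            } else if (key < h.right.key) {            // zig-zag (right-left)
                h.right.left = splay(h.right.left, key);
                if (h.right.left != null) h.right = rotateRight(h.right);
            }
            return (h.right == null) ? h : rotateLeft(h);   // zig, or second rotation
        }
    }

    // Splay, then report whether the key ended up at the root.
    boolean contains(int key) {
        root = splay(root, key);
        return root != null && root.key == key;
    }

    // Splay, then make the key the new root if it was not already present.
    void insert(int key) {
        root = splay(root, key);
        if (root != null && root.key == key) return;
        Node n = new Node(key);
        if (root != null) {
            if (key < root.key) { n.left = root.left; n.right = root; root.left = null; }
            else                { n.right = root.right; n.left = root; root.right = null; }
        }
        root = n;
    }

    // Key currently stored at the root; the tree must be non-empty.
    int rootKey() { return root.key; }

    public static void main(String[] args) {
        SplayTree t = new SplayTree();
        for (int k : new int[] {1, 2, 4, 5, 8}) t.insert(k);
        System.out.println(t.contains(1) + " " + t.rootKey()); // the found key is now the root
        System.out.println(t.contains(3));                     // 3 was never inserted
    }
}
```

Inserting the keys in sorted order builds a chain; the subsequent search for 1 pulls that key to the root and roughly halves the depth of the nodes on the search path, which is the effect the zig-zig case is designed for.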

find and insert

function FIND(k, T)
    SPLAY(k, T)
    if KEY(ROOT(T)) = k then
        return (k, v)
    else
        return null

function INSERT(k, v, T)
    SPLAY(k, T)
    if KEY(ROOT(T)) = k then
        update its value with v
    else
        make (k, v) a new root in T

Example: inserting 14

[Figures: six slides showing the insertion of 14 step by step.]

delete

function DELETE(k, T)
    if k is at a leaf then
        remove the leaf and perform SPLAY on its parent
    else if k is at an internal node then
        replace the node with its inorder predecessor
        perform SPLAY on the parent of the predecessor

Of course, using the inorder successor also works.

Example: deleting 8

[Figures: three slides showing the deletion of 8 step by step.]

Performance

• Each operation may face a totally unbalanced tree
  – not guaranteed to operate in O(log n) time in the worst case
• Amortized time is logarithmic:
  – Any sequence of m of these operations, starting with an empty tree, will use a total of O(m log m) time
  – Hence, the amortized cost/time of an operation is O(log n), where n ≤ m is the number of elements in the tree, although individual operations may be significantly worse

2 (2, 4) Trees

New approach: Relax some condition

• AVL Tree: binary tree, accepts a small imbalance...
• Recall:
  – Full binary tree: non-empty; degree is either 0 or 2 for each node
  – Perfect binary tree: full, all leaves have the same depth
• Can we build and maintain a perfect tree (if we skip "binary")? Then we would always know the worst search time exactly!

(2, 4) Tree

Previously:
• A single "pivot element"
• If larger we search to the right
• If smaller we search to the left

Now:
• Allow multiple (namely 1–3) pivot elements
• No. of children of an internal node = no. of pivot elements + 1 (i.e.
2–4)

[Figure: example search trees illustrating a single pivot element per node and multiple pivot elements per node.]

More generally: (a, b)-trees

• Each node is either a leaf, or has c children, where a ≤ c ≤ b
  – Each node has a − 1 to b − 1 pivot elements
• 2 ≤ a ≤ (b + 1)/2 (but the root may have only two children (or none) even when a > 2)
• find works approximately as before
• insert must check that a node does not overflow (then we split the node)
• remove must check that a node does not underflow (in a (2, 4)-tree: become empty); then we merge nodes or transfer values between nodes

Proposition 1 (14.1 in course book). The height of an (a, b)-tree storing n entries is Ω(log n / log b) and O(log n / log a).

⇒ Shorter/flatter trees, but more work to be done in nodes.

Insertion in an (a, b)-tree with a = 2 and b = 4

• As long as there is room in the child we find, add the element to that child...
• If full, split and push the median (rounded up) pivot element upwards...
  ...this may happen repeatedly

[Figure: Insert(15) into the node (4 6 12) gives the overfull node (4 6 12 15); it is split, 12 is pushed upwards, and the two new children are (4 6) and (15).]

Delete in a (2, 4)-tree

Three cases
• No constraints are violated by removal
• A leaf is removed (becomes empty)
  – Then transfer some other key to that leaf
  – ...ok if one of the two nearest siblings has 2+ elements

[Figure: root (30 50) with children (25), (35 40), (60). Delete(25) empties the first leaf; transfer of 30 and 35 (30 moves down, 35 moves up) gives root (35 50) with children (30), (40), (60).]

• If a leaf is removed (becomes empty)
  – Then transfer some other key to that leaf, or
  – Merge (fuse) it with a neighbour

[Figure: root (35 50) with children (30), (40), (60). Delete(60) empties the last leaf; no sibling can spare a key, so 50 moves down and is merged with (40), giving root (35) with children (30) and (40 50).]

• An internal node becomes empty
  – Root: replace with inorder predecessor or successor
  – Then repair inconsistencies with suitable merge and transfer operations...

3 B-Trees

B-Tree

• Used to keep an index over external data (e.g. content of a disc)
• It is just an (a, b)-tree where a = ⌈b/2⌉, i.e.
b = 2a − 1 or b = 2a
• We may now choose b so that b references to children (and b − 1 keys) fit into a single disc block
• By defining a = ⌈b/2⌉ we will always fill up a disc block when two blocks are merged!

B-Trees are widely used for file systems and databases
• Windows: HPFS
• Mac: HFS, HFS+
• Linux: ReiserFS, XFS, Ext3FS, JFS
• Databases: ORACLE, DB2, INGRES, PostgreSQL

Voluntary Homework Problem

Show what happens step-by-step when you remove the values 39, 18, 50 from the following (2, 4)-tree:

                  (20)
        (10)               (30 40 50)
     (3)    (18)      (23) (39) (44) (59)
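
Returning to the choice of b for a B-tree: the rule "b references to children and b − 1 keys fit into a single disc block" is simple arithmetic, sketched below in Java. The class and method names, the byte sizes in the example, and the simplification that a node stores nothing but keys and child references (no header, no values) are all assumptions made for illustration.

```java
// Sketch only: picking the B-tree parameters a and b from a block size.
// Assumes a node holds exactly b child references and b - 1 keys, nothing else.
final class BTreeOrder {
    // Largest b such that b * refBytes + (b - 1) * keyBytes <= blockBytes.
    // Rearranged: b <= (blockBytes + keyBytes) / (keyBytes + refBytes).
    static int maxChildren(int blockBytes, int keyBytes, int refBytes) {
        return (blockBytes + keyBytes) / (keyBytes + refBytes);
    }

    // a = ceil(b / 2), computed with integer arithmetic.
    static int minChildren(int b) {
        return (b + 1) / 2;
    }

    public static void main(String[] args) {
        // Hypothetical sizes: 4096-byte block, 8-byte keys, 8-byte references.
        int b = maxChildren(4096, 8, 8);
        int a = minChildren(b);
        System.out.println("b = " + b + ", a = " + a); // prints "b = 256, a = 128"
    }
}
```

With these hypothetical sizes, 256 references and 255 keys occupy 4088 bytes, while 257 references and 256 keys would need 4104 bytes, so b = 256 is the largest value that fits.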