CPS216: Advanced Database Systems Notes 06: Operators for Data Access (contd.) Shivnath Babu 1 Insertion in a B-Tree n=2 49 15 36 49 Insert: 62 2 Insertion in a B-Tree n=2 49 15 36 49 62 Insert: 62 3 Insertion in a B-Tree n=2 49 15 36 49 62 Insert: 50 4 Insertion in a B-Tree 49 15 36 n=2 62 49 50 62 Insert: 50 5 Insertion in a B-Tree 49 15 36 n=2 62 49 50 62 Insert: 75 6 Insertion in a B-Tree 49 15 36 n=2 62 49 50 62 75 Insert: 75 7 Insertion 8 Insertion 9 Insertion 10 Insertion 11 Insertion 12 Insertion 13 Insertion 14 Insertion 15 Insertion 16 Insertion 17 Insertion 18 Insertion: Primitives Inserting into a leaf node Splitting a leaf node Splitting an internal node Splitting root node 19 Inserting into a Leaf Node 58 54 57 60 62 20 Inserting into a Leaf Node 58 54 57 60 62 21 Inserting into a Leaf Node 58 54 57 58 60 62 22 Splitting a Leaf Node 61 54 54 57 58 60 66 62 23 Splitting a Leaf Node 61 54 54 57 58 60 66 62 24 Splitting a Leaf Node 61 54 54 57 58 66 60 61 62 25 Splitting a Leaf Node 59 61 54 54 57 58 66 60 61 62 26 Splitting a Leaf Node 61 54 54 57 58 59 66 60 61 62 27 Splitting an Internal Node … 21 99 … 59 40 54 [54, 59) 66 74 [ 59, 66) 84 [66,74) Splitting an Internal Node … 21 99 … 59 40 54 [54, 59) 66 74 [ 59, 66) 84 [66,74) Splitting an Internal Node 66 … 40 54 [54, 59) 21 99 … [21,66) [66, 99) 59 74 [ 59, 66) 84 [66,74) Splitting the Root 59 40 54 [54, 59) 66 74 [ 59, 66) 84 [66,74) Splitting the Root 59 40 54 [54, 59) 66 74 [ 59, 66) 84 [66,74) Splitting the Root 66 40 54 [54, 59) 59 74 [ 59, 66) 84 [66,74) Deletion 34 Deletion redistribute 35 Deletion 36 Deletion - II 37 Deletion - II merge Deletion - II 39 Deletion - II 40 Deletion - II 41 Deletion - II Not needed merge 42 Deletion - II 43 Deletion: Primitives Delete key from a leaf Redistribute keys between sibling leaves Merge a leaf into its sibling Redistribute keys between two sibling internal nodes Merge an internal node into its sibling 44 Merge Leaf into Sibling 72 … 54 58 64 67 85 68 72 75 45 Merge Leaf into Sibling 72 … 54 58 64 67 85 68 75 46 Merge Leaf into Sibling 72 … 54 58 64 68 67 85 75 47 Merge Leaf into Sibling 72 … 54 58 64 68 85 75 48 Merge Internal Node into Sibling … 41 48 59 … 63 52 [52, 59) 74 [59,63) 49 Merge Internal Node into Sibling … 41 48 52 59 59 … 63 [52, 59) [59,63) 50 B-Tree Roadmap B-Tree Recap Insertion (recap) Deletion Construction Efficiency B-Tree variants Hash-based Indexes 51 Question How does insertion-based construction perform? 52 B-Tree Construction Sort 48 57 41 15 75 21 62 34 81 11 97 13 53 B-Tree Construction 11 13 15 21 34 41 48 57 11 13 15 21 34 41 48 57 Scan 62 62 75 81 75 81 97 97 B-Tree Construction 21 11 13 15 21 34 48 41 Scan 75 48 57 62 75 81 97 B-Tree Construction Why is sort-based construction better than insertion-based one? 56 Cost of B-Tree Operations Height of B-Tree: H Assume no duplicates Question: what is the random I/O cost of: Insertion: Deletion: Equality search: Range Search: 57 Height of B-Tree Number of keys: N B-Tree parameter: n log N Height ≈ log n N = log n In practice: 2-3 levels 58 Question: How do you pick parameter n? 1. Ignore inserts and deletes 2. Optimize for equality searches 3. Assume no duplicates 59 Roadmap B-Tree B-Tree variants Sparse Index Duplicate Keys Hash-based Indexes 60 Roadmap B-Tree B-Tree variants Hash-based Indexes Static Hash Table Extensible Hash Table Linear Hash Table 61 Hash-Based Indexes Adaptations of main memory hash tables Support equality searches No range searches 62 Indexing Problem (recap) Index Keys record pointers a1 a2 A = val ai an Main Memory Hash Table buckets key h (key) 0 1 32 48 2 10 (null) 27 75 3 4 h (key) = key % 8 5 21 6 7 55 (null) (null) (null) (null) 64 Adapting to disk 1 Hash Bucket = 1 Block All keys that hash to bucket stored in the block Intuition: keys in a bucket usually accessed together No need for linked lists of keys … 65 Adapting to Disk How do we handle this? 66 Adapting to disk 1 Hash Bucket = 1 Block All keys that hash to bucket stored in the block Intuition: keys in a bucket usually accessed together No need for linked lists of keys … … but need linked list of blocks (overflow blocks) 67 Adapting to Disk 68 Adapting to disk Bucket Id Disk Address mapping Contiguous blocks Store mapping in main memory Too large? Dynamic Linear and Extensible hash tables 69 Beware of claims that assume 1 I/O for hash tables and 3 I/Os for B-Tree!! 70