Data-intensive Computing Systems Operators for Data Access (contd.) Shivnath Babu

Insertion in a B-Tree (n = 2)

  Start:      root [49]; leaves [15 36] [49]
  Insert 62:  root [49]; leaves [15 36] [49 62]
  Insert 50:  root [49 62]; leaves [15 36] [49 50] [62]
  Insert 75:  root [49 62]; leaves [15 36] [49 50] [62 75]

Insertion (continued)

  [Figures only: further steps of the insertion example.]

Insertion: Primitives

  - Inserting into a leaf node
  - Splitting a leaf node
  - Splitting an internal node
  - Splitting the root node

Inserting into a Leaf Node

  Insert 58 into leaf [54 57 60 62].
  Result: [54 57 58 60 62].

Splitting a Leaf Node

  Insert 61 into the full leaf [54 57 58 60 62]; parent keys are 54 and 66.
  The leaf splits into [54 57 58] and [60 61 62].
  Key 59 is added to the parent between 54 and 66, so the two leaves cover [54, 59) and [59, 66).

Splitting an Internal Node

  Before: parent keys … 21 99 …; internal node keys 40 54 66 74 84; key 59 arrives from a leaf split below.
          Child ranges shown: [54, 59), [59, 66), [66, 74).
  After:  key 66 moves up into the parent (… 21 66 99 …), which now separates [21, 66) and [66, 99).
          The node splits into [40 54 59] and [74 84].

Splitting the Root

  Before: root keys 40 54 66 74 84; key 59 arrives from a leaf split below.
  After:  new root [66]; children [40 54 59] and [74 84].

Deletion

  [Figures only; one step is labeled "redistribute".]

Deletion - II

  [Figures only; steps are labeled "merge", and one step is labeled "Not needed".]

Deletion: Primitives

  - Delete key from a leaf
  - Redistribute keys between sibling leaves
  - Merge a leaf into its sibling
  - Redistribute keys between two sibling internal nodes
  - Merge an internal node into its sibling

Merge Leaf into Sibling

  Delete 72 from leaf [68 72 75]; sibling leaf is [54 58 64]; parent keys … 67 85.
  After the delete, [68 75] is merged into the sibling: [54 58 64 68 75].
  Key 67 is removed from the parent, leaving … 85.

Merge Internal Node into Sibling

  Before: parent keys … 41 59 …; sibling internal nodes [48 52] and [63].
  After:  key 59 moves down from the parent; the merged node holds 48 52 59 63,
          with child ranges [52, 59) and [59, 63).

B-Tree Roadmap

  - B-Tree
      - Recap
      - Insertion (recap)
      - Deletion
      - Construction
      - Efficiency
  - B-Tree variants
  - Hash-based Indexes

Question

  How does insertion-based construction perform?

B-Tree Construction

  Sort:  48 57 41 15 75 21 62 34 81 11 97 13
         → 11 13 15 21 34 41 48 57 62 75 81 97
  Scan:  leaves [11 13 15] [21 34 41] [48 57 62] [75 81 97]; parent keys 21 48 75

  Why is sort-based construction better than insertion-based construction?

Cost of B-Tree Operations

  - Height of B-Tree: H
  - Assume no duplicates
  - Question: what is the random I/O cost of:
      - Insertion:
      - Deletion:
      - Equality search:
      - Range search:

Height of B-Tree

  - Number of keys: N
  - B-Tree parameter: n
  - Height ≈ log_n N = (log N) / (log n)
  - In practice: 2-3 levels

Question: How do you pick parameter n?

  1. Ignore inserts and deletes
  2. Optimize for equality searches
  3. Assume no duplicates

Roadmap

  - B-Tree
  - B-Tree variants
      - Sparse Index
      - Duplicate Keys
  - Hash-based Indexes

Roadmap

  - B-Tree
  - B-Tree variants
  - Hash-based Indexes
      - Static Hash Table
      - Extensible Hash Table
      - Linear Hash Table

Hash-Based Indexes

  - Adaptations of main memory hash tables
  - Support equality searches
  - No range searches

Indexing Problem (recap)

  Query: A = val
  Index: keys a1, a2, …, ai, …, an, each with record pointers

Main Memory Hash Table

  h(key) = key % 8

  Bucket   Keys
  0        32, 48
  1        (null)
  2        10
  3        27, 75
  4        (null)
  5        21
  6        (null)
  7        55

Adapting to Disk

  - 1 hash bucket = 1 block
  - All keys that hash to a bucket are stored in the block
      - Intuition: keys in a bucket are usually accessed together
      - No need for linked lists of keys …
      - … but need a linked list of blocks (overflow blocks)

  How do we handle this? [Figure only.]

Adapting to Disk

  - Bucket Id → Disk Address mapping
      - Contiguous blocks
      - Store mapping in main memory
          - Too large?
          - Dynamic → Linear and Extensible hash tables

Beware of claims that assume 1 I/O for hash tables and 3 I/Os for B-Tree!!
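The point about overflow blocks and the closing warning about "1 I/O" claims can be illustrated with a small simulation. The sketch below is not from the slides: it models a static hash table where each bucket is one block, uses h(key) = key % 8 as in the main memory example, and counts how many blocks an equality search reads. The class names, the block capacity of 2 keys, and the extra keys 16 and 64 are assumptions chosen only to force an overflow block.

```python
BLOCK_CAPACITY = 2   # assumed: keys per disk block (not from the slides)
NUM_BUCKETS = 8      # matches h(key) = key % 8 in the slides


class Block:
    """One simulated disk block: a few keys plus a pointer to an overflow block."""

    def __init__(self):
        self.keys = []
        self.overflow = None


class StaticHashIndex:
    """Hypothetical static hash table: 1 bucket = 1 block, with overflow chains."""

    def __init__(self, num_buckets=NUM_BUCKETS, capacity=BLOCK_CAPACITY):
        self.capacity = capacity
        self.buckets = [Block() for _ in range(num_buckets)]

    def _h(self, key):
        return key % len(self.buckets)

    def insert(self, key):
        # Walk the chain until a block with free space is found,
        # allocating a new overflow block at the end if needed.
        block = self.buckets[self._h(key)]
        while len(block.keys) >= self.capacity:
            if block.overflow is None:
                block.overflow = Block()
            block = block.overflow
        block.keys.append(key)

    def lookup(self, key):
        """Return (found, block_reads) for an equality search."""
        block = self.buckets[self._h(key)]
        reads = 0
        while block is not None:
            reads += 1                  # each block visited costs one I/O
            if key in block.keys:
                return True, reads
            block = block.overflow
        return False, reads


index = StaticHashIndex()
# 32, 48, 10, 27, 75, 21, 55 come from the slides; 16 and 64 are added
# here (an assumption) so that bucket 0 needs an overflow block.
for k in [32, 48, 10, 27, 75, 21, 55, 16, 64]:
    index.insert(k)

print(index.lookup(32))   # key in the primary block of bucket 0
print(index.lookup(64))   # key in the overflow block of bucket 0
print(index.lookup(99))   # absent key, bucket 3 has no overflow block
```

Under these assumptions a lookup costs one block read only when the key sits in the primary block; every overflow block on the chain adds another read, which is why a flat "1 I/O" figure for hash indexes can be optimistic.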
