CS 245: Database System Principles Notes 4: Indexing Chapter 4

Chapter 4 CS 245: Database System Principles Indexing & Hashing record value Notes 4: Indexing  value Hector Garcia-Molina CS 245 Notes 4 1 CS 245 Notes 4 2 Sequential File Topics 10 20 • Conventional indexes • B-trees • Hashing schemes 30 40 50 60 70 80 90 100 CS 245 Notes 4 Dense Index CS 245 Sequential File 70 80 90 100 110 120 Notes 4 50 60 70 80 170 190 210 230 90 100 5 CS 245 Sequential File 30 40 90 110 130 150 50 60 4 10 20 10 30 50 70 30 40 50 60 70 80 Notes 4 Sparse Index 10 20 10 20 30 40 CS 245 3 90 100 Notes 4 6 1 10 90 170 250 330 410 490 570 10 20 10 30 50 70 30 40 90 110 130 150 50 60 70 80 170 190 210 230 CS 245 • Comment: {FILE,INDEX} may be contiguous or not (blocks chained) Sequential File Sparse 2nd level 90 100 Notes 4 7 CS 245 Notes 4 8 Question: Notes on pointers: • Can we build a dense, 2nd level index for a dense index? (1) Block pointer (sparse index) can be smaller than record pointer BP RP CS 245 Notes 4 9 CS 245 Notes 4 Notes on pointers: 10 R1 (2) If file is contiguous, then we can omit pointers (i.e., compute them) K1 R2 K2 K3 R3 K4 R4 CS 245 Notes 4 11 CS 245 Notes 4 12 2 Sparse vs. Dense Tradeoff R1 K1 R2 say: 1024 B per block K2 K3 R3 K4 R4 (Later: • if we want K3 block: get it at offset (3-1)1024 = 2048 bytes CS 245 – sparse better for insertions – dense needed for secondary indexes) Notes 4 13 Terms • • • • • • • CS 245 14 Notes 4 • Duplicate keys • Deletion/Insertion • Secondary indexes 15 CS 245 Notes 4 16 Duplicate keys Duplicate keys Dense index, one way to implement? 10 10 20 30 10 20 20 30 20 30 30 30 30 30 40 45 Notes 4 10 10 10 10 10 20 10 20 CS 245 Notes 4 Next: Index sequential file Search key (  primary key) Primary index (on Sequencing field) Secondary index Dense index (all Search Key values in) Sparse index Multi-level index CS 245 • Sparse: Less index space per record can keep more of index in memory • Dense: Can tell if any record exists without accessing file 17 CS 245 30 30 40 45 Notes 4 18 3 Duplicate keys Duplicate keys Dense index, better way? Sparse index, one way? 10 10 10 20 30 40 CS 245 10 10 10 10 20 30 10 20 10 20 20 30 20 30 30 30 40 45 30 30 40 45 Notes 4 19 CS 245 Notes 4 20 Duplicate keys Sparse index, one way? Sparse index, another way? careful if looking for 20 or 30! Duplicate keys 10 10 20 30 CS 245 – place first new key from block 10 10 10 20 20 30 30 30 40 45 30 30 40 45 21 Duplicate keys 10 20 30 30 CS 245 10 20 Duplicate values, primary index a a 30 30 40 45 Notes 4 22 • Index may point to first instance of each value only File Index a 10 10 20 30 CS 245 Notes 4 Summary Sparse index, another way? should this be 40? 10 20 20 30 Notes 4 – place first new key from block 10 10 10 20 30 30   b 23 CS 245 Notes 4 24 4 Deletion from sparse index Deletion from sparse index – delete record 40 10 20 10 30 50 70 30 40 50 60 90 110 130 150 CS 245 25 Deletion from sparse index CS 245 50 60 Notes 4 27 30 40 50 60 CS 245 70 80 Notes 4 28 Deletion from sparse index – delete record 30 – delete records 30 & 40 10 10 20 50 70 30 40 40 10 20 10 30 50 70 50 60 30 40 50 60 90 110 130 150 70 80 Notes 4 26 10 20 90 110 130 150 70 80 90 110 130 150 Notes 4 10 30 50 70 30 40 40 30 70 80 – delete record 30 Deletion from sparse index CS 245 CS 245 10 20 90 110 130 150 50 60 Deletion from sparse index – delete record 40 10 30 50 70 30 40 90 110 130 150 70 80 Notes 4 10 20 10 30 50 70 29 CS 245 70 80 Notes 4 30 5 Deletion from sparse index Deletion from sparse index – delete records 30 & 40 – delete records 30 & 40 10 20 10 30 50 70 30 40 50 60 90 110 130 150 CS 245 30 40 50 60 90 110 130 150 70 80 Notes 4 10 20 10 50 30 70 50 70 31 Deletion from dense index CS 245 70 80 Notes 4 32 Deletion from dense index – delete record 30 10 20 10 20 30 40 30 40 50 60 50 60 70 80 CS 245 33 Deletion from dense index CS 245 CS 245 Notes 4 34 10 20 10 20 40 30 40 30 40 40 50 60 30 40 40 50 60 50 60 70 80 70 80 Notes 4 70 80 – delete record 30 10 20 50 60 70 80 50 60 Deletion from dense index – delete record 30 10 20 30 40 30 40 50 60 70 80 70 80 Notes 4 10 20 10 20 30 40 35 CS 245 70 80 Notes 4 36 6 Insertion, sparse index case Insertion, sparse index case – insert record 34 10 20 10 30 40 60 CS 245 30 40 50 60 60 37 Insertion, sparse index case CS 245 CS 245 38 – insert record 15 10 20 10 20 10 30 40 60 30 34 30 40 50 40 50 60 60 Notes 4 39 Insertion, sparse index case CS 245 Notes 4 40 Insertion, sparse index case – insert record 15 – insert record 15 10 10 20 15 40 60 30 20 30 20 30 Notes 4 Insertion, sparse index case – insert record 34 • our lucky day! we have free space where we need it! 30 40 50 Notes 4 10 30 40 60 10 20 10 30 40 60 10 10 20 15 40 60 30 20 30 20 30 40 50 40 50 60 • Illustrated: Immediate reorganization 60 • Variation: – insert new block (chained file) – update index CS 245 Notes 4 41 CS 245 Notes 4 42 7 Insertion, sparse index case Insertion, sparse index case – insert record 25 10 30 40 60 – insert record 25 10 20 10 30 40 60 30 10 20 30 40 50 40 50 60 60 CS 245 Notes 4 43 25 CS 245 overflow blocks (reorganize later...) Notes 4 44 Secondary indexes Insertion, dense index case Sequence field • Similar 30 50 • Often more expensive . . . 20 70 80 40 100 10 90 60 CS 245 Notes 4 45 Secondary indexes 90 ... Notes 4 Sequence field 30 50 20 70 30 20 80 100 80 40 90 ... 80 40 does not make sense! 100 10 90 60 100 10 90 60 CS 245 46 • Sparse index 30 50 30 20 80 100 Notes 4 Secondary indexes Sequence field • Sparse index CS 245 47 CS 245 20 70 Notes 4 48 8 Secondary indexes Secondary indexes Sequence field • Dense index 30 50 100 10 90 60 Notes 4 49 10 50 90 ... sparse high level CS 245 30 50 10 20 30 40 80 40 51 CS 245 Notes 4 52 Duplicate values & secondary indexes one option... 20 10 20 10 10 10 10 20 20 40 20 40 20 30 40 40 10 40 10 40 10 40 10 40 40 40 ... 30 40 Notes 4 50 (not block pointers; not computed) Duplicate values & secondary indexes CS 245 Notes 4 Also: Pointers are record pointers 100 10 90 60 Notes 4 100 10 90 60 • Lowest level is dense • Other levels are sparse 20 70 50 60 70 ... CS 245 80 40 With secondary indexes: Sequence field • Dense index 20 70 50 60 70 ... 80 40 Secondary indexes 30 50 10 20 30 40 20 70 CS 245 Sequence field • Dense index 53 CS 245 30 40 Notes 4 54 9 Duplicate values & secondary indexes one option... Problem: excess overhead! • disk space • search time another option... 20 10 10 10 10 20 20 40 20 30 40 40 10 40 55 another option... 20 40 10 40 10 40 10 20 10 20 20 40 CS 245 Notes 4 57 Duplicate values & secondary indexes 20 10 CS 245 10 40 Another idea (suggested in class): Chain records with same key? 30 40    58 20 40 10 40 50 60 ... 10 40 30 40  Problems: • Need to add fields to records • Need to follow chain to know records Notes 4 30 40 20 10 10 20 30 40   Notes 4  10 40 10 40 Duplicate values & secondary indexes 20 40 50 60 ... 10 40 Another idea (suggested in class): Chain records with same key? Notes 4  20 40 50 60 ... 10 40 10 20 30 40 20 10 10 20 30 40 10 40 30 40 56 Duplicate values & secondary indexes 30 40 CS 245 20 30 40 Duplicate values & secondary indexes CS 245 20 10 30 40 Notes 4 Problem: variable size records in index! 10 30 40 10 40 40 40 ... CS 245 Duplicate values & secondary indexes buckets 59 CS 245 Notes 4 60 10 Query: Get employees in (Toy Dept) ^ (2nd floor) Why “bucket” idea is useful Indexes Name: primary Dept: secondary Floor: secondary CS 245 Dept. index Records EMP (name,dept,floor,...) Notes 4 61 EMP Floor index Toy CS 245 2nd Notes 4 62 This idea used in text information retrieval Query: Get employees in (Toy Dept) ^ (2nd floor) Dept. index EMP Floor index Toy Documents ...the cat is fat ... 2nd ...was raining cats and dogs... ...Fido the dog ...  Intersect toy bucket and 2nd Floor bucket to get set of matching EMP’s CS 245 Notes 4 63 This idea used in text information retrieval 64 • Find articles with “cat” and “dog” • Find articles with “cat” or “dog” • Find articles with “cat” and not “dog” ...the cat is fat ... dog Notes 4 IR QUERIES Documents cat CS 245 ...was raining cats and dogs... ...Fido the dog ... Inverted lists CS 245 Notes 4 65 CS 245 Notes 4 66 11 Common technique: more info in inverted list IR QUERIES • Find articles with “cat” and “dog” • Find articles with “cat” or “dog” • Find articles with “cat” and not “dog” cat Title d1 5 Author 10 Abstract 57 • Find articles with “cat” in title • Find articles with “cat” and “dog” within 5 words CS 245 dog Notes 4 67 Posting: an entry in inverted list. Represents occurrence of term in article Size of a list: 1 Rare words or miss-spellings 106 Common words (in postings) Size of a posting: 10-15 bits CS 245 Title 100 Title 12 CS 245 d2 d3 Notes 4 68 IR DISCUSSION • • • • • Stop words Truncation Thesaurus Full text vs. Abstracts Vector model (compressed) Notes 4 69 CS 245 Notes 4 70 Vector space model Vector space model w1 w2 w3 w4 w5 w6 w7 … DOC = <1 0 0 1 1 0 0 …> w1 w2 w3 w4 w5 w6 w7 … DOC = <1 0 0 1 1 0 0 …> Query= <0 Query= <0 0 1 1 0 0 0 …> 0 1 PRODUCT = CS 245 Notes 4 71 CS 245 1 0 0 0 …> 1 + ……. = score Notes 4 72 12 • How to process V.S. Queries? • Tricks to weigh scores + normalize w1 w2 w3 w4 w5 w6 Q=<0 0 0 1 1 0 e.g.: Match on common word not as useful as match on rare words... CS 245 Notes 4 73 • Try Stanford Libraries • Try Google, Yahoo, ... CS 245 … …> Notes 4 74 Summary so far • Conventional index – Basic Ideas: sparse, dense, multi-level… – Duplicate Keys – Deletion/Insertion – Secondary indexes – Buckets of Postings List CS 245 Notes 4 75 CS 245 Example Conventional indexes continuous Disadvantage: - Inserts expensive, and/or - Lose sequentiality & balance Notes 4 Index 76 (sequential) 10 20 30 Advantage: - Simple - Index is sequential file good for scans CS 245 Notes 4 40 50 60 free space 70 80 90 77 CS 245 Notes 4 78 13 Example Index (sequential) 10 20 30 33 40 50 60 continuous • Conventional indexes • B-Trees  NEXT • Hashing schemes 32 38 34 free space 70 80 90 CS 245 Outline: 39 31 35 36 overflow area (not sequential) Notes 4 79 CS 245 Notes 4 80 B+Tree Example • NEXT: Another type of index – Give up on sequentiality of index – Try to get “balance” n=3 CS 245 Notes 4 81 CS 245 180 200 150 156 179 120 130 100 101 110 3 5 11 Note: This index is called B+ tree, but Gradiance homeworks just call it B-tree. 30 35 30 120 150 180 100 Root Notes 4 82 Sample leaf node: Sample non-leaf CS 245 Notes 4 to keys 95 83 CS 245 95 81k<95 81 to keys 57 k<81 To record with key 85 to keys < 57 To record with key 81 to keys To record with key 57 57 95 81 57 From non-leaf node Notes 4 to next leaf in sequence 84 14 In textbook’s notation n=3 Leaf: Size of nodes: n+1 pointers (fixed) n keys 30 35 30 35 Non-leaf: 30 30 CS 245 Notes 4 85 CS 245 Notes 4 86 n=3 Don’t want nodes to be too empty Full node min. node CS 245 B+tree rules Notes 4 87 tree of order n Notes 4 CS 245 Notes 4 88 (3) Number of pointers/keys for B+tree (1) All leaves at same lowest level (balanced tree) (2) Pointers in leaves point to records except for “sequence pointer” CS 245 Leaf counts even if null (n+1)/2 pointers to data 30 Leaf: Non-leaf 30 35 (n+1)/2 pointers 120 150 180 Non-leaf: 3 5 11 • Use at least Max Max Min ptrs keys ptrsdata Non-leaf (non-root) Leaf (non-root) Root 89 CS 245 Min keys n+1 n (n+1)/2 (n+1)/2- 1 n+1 n (n+1)/2 (n+1)/2 n+1 n 1 1 Notes 4 90 15 (a) Insert key = 32 n=3 100 Insert into B+tree (a) simple case – space available in leaf CS 245 Notes 4 91 (a) Insert key = 32 30 31 3 5 11 30 (b) leaf overflow (c) non-leaf overflow (d) new root CS 245 Notes 4 (a) Insert key = 7 n=3 30 100 100 n=3 30 31 32 3 5 11 30 31 30 Notes 4 93 (a) Insert key = 7 CS 245 Notes 4 (a) Insert key = 7 n=3 CS 245 30 7 95 30 31 3 57 11 3 5 30 31 3 57 11 3 5 30 100 n=3 Notes 4 94 100 3 5 11 CS 245 CS 245 92 Notes 4 96 16 (c) Insert key = 160 (c) Insert key = 160 150 156 179 99 CS 245 180 Notes 4 100 (d) New root, insert 45 n=3 Notes 4 101 CS 245 Notes 4 40 45 30 32 40 20 25 10 12 1 2 3 30 32 40 20 25 10 20 30 n=3 10 20 30 10 12 n=3 120 150 180 100 180 200 160 179 180 120 150 180 150 156 179 Notes 4 (d) New root, insert 45 98 (c) Insert key = 160 n=3 100 CS 245 1 2 3 Notes 4 160 179 (c) Insert key = 160 CS 245 160 179 150 156 179 97 180 200 Notes 4 180 200 120 150 180 100 CS 245 CS 245 n=3 160 150 156 179 180 200 120 150 180 100 n=3 102 17 (d) New root, insert 45 CS 245 103 30 CS 245 Notes 4 104 (b) Coalesce with sibling Deletion from B+tree 105 (b) Coalesce with sibling CS 245 n=4 10 40 100 10 20 30 35 40 50 Notes 4 106 – Delete 50 10 40 100 10 20 30 40 CS 245 Notes 4 (c) Redistribute keys n=4 – Delete 50 40 50 10 40 100 10 20 30 Notes 4 107 CS 245 40 50 CS 245 n=4 – Delete 50 (a) Simple case - no example (b) Coalesce with neighbor (sibling) (c) Re-distribute keys (d) Cases (b) or (c) at non-leaf 40 45 20 25 10 12 10 20 30 Notes 4 1 2 3 40 45 40 30 32 40 20 25 10 12 1 2 3 10 20 30 new root n=3 40 n=3 30 32 40 (d) New root, insert 45 Notes 4 108 18 (c) Redistribute keys (d) Non-leaf coalesce n=4 n=4 – Delete 37 Notes 4 109 (d) Non-leaf coalesce 30 40 40 45 30 37 40 45 30 40 30 37 20 22 40 Notes 4 112 B+tree deletions in practice 25 – Often, coalescing is not implemented new root Notes 4 40 45 30 37 25 26 30 30 40 40 20 22 10 20 25 – Too hard and not worth it! 10 14 1 3 n=4 25 n=4 – Delete 37 CS 245 10 14 10 20 111 1 3 40 45 30 37 30 40 Notes 4 (d) Non-leaf coalesce CS 245 110 – Delete 37 25 20 22 25 26 30 10 20 10 14 1 3 CS 245 25 26 20 22 Notes 4 (d) Non-leaf coalesce n=4 – Delete 37 CS 245 25 26 30 CS 245 10 14 1 3 10 20 30 35 35 40 50 10 20 10 40 35 100 25 – Delete 50 113 CS 245 Notes 4 114 19 Comparison: B-trees vs. static indexed sequential file Ref # 1 claims: - Concurrency control harder in B-Trees - B-tree consumes more space Ref #1: Held & Stonebraker “B-Trees Re-examined” CACM, Feb. 1978 CS 245 For their comparison: block = 512 bytes key = pointer = 4 bytes 4 data records per block Notes 4 115 Example: 1 block static index 127 keys k1 1 data block 116 k1 1 data block k2 k2 k2 63 keys k3 k3 Notes 4 Example: 1 block B-tree k1 k1 CS 245 k2 ... k63 k3 next (127+1)4 = 512 Bytes -> pointers in index implicit! CS 245 Notes 4 Size comparison CS 245 2 3 4 117 Ref. #1 Static Index # data blocks height 2 -> 127 128 -> 16,129 16,130 -> 2,048,383 63x(4+4)+8 = 512 Bytes -> pointers needed in B-tree blocks because index is not contiguous up to 127 blocks Notes 4 Notes 4 118 Ref. #1 analysis claims B-tree # data blocks height 2 -> 63 64 -> 3968 3969 -> 250,047 250,048 -> 15,752,961 CS 245 up to 63 blocks 2 3 4 5 119 • For an 8,000 block file, after 32,000 inserts after 16,000 lookups  Static index saves enough accesses to allow for reorganization CS 245 Notes 4 120 20 Ref. #1 analysis claims Ref #2: M. Stonebraker, “Retrospection on a database system,” TODS, June 1980 • For an 8,000 block file, after 32,000 inserts after 16,000 lookups  Static index saves enough accesses to allow for reorganization Ref. #1 conclusion CS 245 Static index better!! Notes 4 121 B-trees better!! Ref. #2 conclusion Notes 4 CS 245 Notes 4 123 • Speaking of buffering… Is LRU a good policy for B+tree buffers? 122 B-trees better!! Ref. #2 conclusion • DBA does not know when to reorganize • DBA does not know how full to load pages of new index CS 245 B-trees better!! Ref. #2 conclusion • Buffering – B-tree: has fixed buffer requirements – Static index: must read several overflow blocks to be efficient (large & variable size buffers needed for this) CS 245 Notes 4 124 • Speaking of buffering… Is LRU a good policy for B+tree buffers?  Of course not!  Should try to keep root in memory at all times (and perhaps some nodes from second level) CS 245 Notes 4 125 CS 245 Notes 4 126 21 Sample assumptions: Interesting problem: (1) Time to read node from disk is (S+Tn) msec. For B+tree, how large should n be? … n is number of keys / node CS 245 Notes 4 127 Sample assumptions: CS 245 Notes 4 128 Sample assumptions: (1) Time to read node from disk is (S+Tn) msec. (2) Once block in memory, use binary search to locate key: (a + b LOG2 n) msec. (1) Time to read node from disk is (S+Tn) msec. (2) Once block in memory, use binary search to locate key: (a + b LOG2 n) msec. For some constants a,b; Assume a << S For some constants a,b; Assume a << S (3) Assume B+tree is full, i.e., # nodes to examine is LOGn N where N = # records CS 245 Notes 4 129 Can get: Notes 4 130  FIND nopt by f’(n) = 0 f(n) = time to find a record Answer is nopt = “few hundred” (see homework for details) f(n) nopt CS 245 CS 245 Notes 4 n 131 CS 245 Notes 4 132 22 Exercise  FIND nopt by f’(n) = 0 Answer is nopt = “few hundred” (see homework for details) S=  What happens to nopt as 14000 T= 0.2 b= 0.002 a= 0 N= 10000000 • f(n)= lognN*[S+T*n+a+b*log2(n)] • Disk gets faster? • CPU get faster? CS 245 Notes 4 133 S= T= b= a= N= N=10 million records CS 245 14000 0.2 0.002 0 10000000 Notes 4 N=100 million records times in microseconds Notes 4 14000 0.2 0.002 0 10000000 times in microseconds 135 N=100 million records S= T= b= a= N= n n CS 245 134 S= T= b= a= N= CS 245 varies 0.2 0.002 0 10000000 Notes 4 136 Variation on B+tree: B-tree (no +) • Idea: – Avoid duplicate keys – Have record pointers in non-leaf nodes S=20000 S=14000 S=10000 n times in microseconds CS 245 Notes 4 137 CS 245 Notes 4 138 23 139 n=2 • sequence pointers not useful now! 170 180 150 160 130 140 110 120 90 100 170 180 150 160 130 140 110 120 145 165 CS 245 Notes 4 142 So, for B-trees: • Say we insert record with key = 25 MAX 10 20 30 n=3 Non-leaf non-root Leaf non-root Root non-leaf Root Leaf 25 30 – 20 – 10 CS 245 n=3 10 20 30 65 125 141 Note on inserts • Afterwards: 140 Note on inserts Notes 4 leaf 70 80 Notes 4 leaf 85 105 90 100 70 80 25 45 50 60 30 40 10 20 CS 245 • Say we insert record with key = 25 (but keep space for simplicity) CS 245 50 60 to keys >k3 Notes 4 B-tree example 145 165 25 45 to record to record to record with K1 with K2 with K3 to keys to keys K1<x<K2 K2<x<k3 CS 245 85 105 K3 P3 10 20 to keys < K1 K2 P2 30 40 K1 P1 n=2 65 125 B-tree example Notes 4 143 CS 245 MIN Tree Ptrs Rec Keys Ptrs Tree Ptrs Rec Ptrs Keys n+1 n n 1 n n 1 n/2 n+1 n n 2 1 1 1 n n 1 1 1 (n+1)/2 (n+1)/2-1 (n+1)/2-1 Notes 4 n/2 144 24 Tradeoffs: Tradeoffs:  B-trees have faster lookup than B+trees  B-trees have faster lookup than B+trees  in B-tree, non-leaf & leaf different sizes  in B-tree, deletion more complicated  in B-tree, non-leaf & leaf different sizes  in B-tree, deletion more complicated  B+trees preferred! CS 245 Notes 4 145 But note: Example: - Pointers - Keys - Blocks - Look at full • If blocks are fixed size (due to disk and buffering restrictions) Then lookup for B+tree is actually better!! CS 245 Notes 4 CS 245 147 CS 245 Notes 4 146 4 bytes 4 bytes 100 bytes (just example) 2 level tree Notes 4 148 B-tree: B-tree: Root has 8 keys + 8 record pointers + 9 son pointers = 8x4 + 8x4 + 9x4 = 100 bytes Root has 8 keys + 8 record pointers + 9 son pointers = 8x4 + 8x4 + 9x4 = 100 bytes Each of 9 sons: 12 rec. pointers (+12 keys) = 12x(4+4) + 4 = 100 bytes CS 245 Notes 4 149 CS 245 Notes 4 150 25 B-tree: B+tree: Root has 8 keys + 8 record pointers + 9 son pointers = 8x4 + 8x4 + 9x4 = 100 bytes Root has 12 keys + 13 son pointers = 12x4 + 13x4 = 100 bytes Each of 9 sons: 12 rec. pointers (+12 keys) = 12x(4+4) + 4 = 100 bytes 2-level B-tree, Max # records = 12x9 + 8 = 116 CS 245 Notes 4 151 CS 245 Notes 4 152 B+tree: B+tree: Root has 12 keys + 13 son pointers = 12x4 + 13x4 = 100 bytes Root has 12 keys + 13 son pointers = 12x4 + 13x4 = 100 bytes Each of 13 sons: 12 rec. ptrs (+12 keys) = 12x(4 +4) + 4 = 100 bytes Each of 13 sons: 12 rec. ptrs (+12 keys) = 12x(4 +4) + 4 = 100 bytes 2-level B+tree, Max # records = 13x12 = 156 CS 245 Notes 4 153 So... 8 records CS 245 Notes 4 154 So... 8 records B+ B B+ B ooooooooooooo ooooooooo ooooooooooooo ooooooooo 156 records 108 records Total = 116 156 records 108 records Total = 116 • Conclusion: – For fixed block size, – B+ tree is better because it is bushier CS 245 Notes 4 155 CS 245 Notes 4 156 26 An Interesting Problem... One Solution: Multiple Indexes • What is a good index structure when: • Example: I1, I2 – records tend to be inserted with keys that are larger than existing values? day (e.g., banking records with growing data/time) 10 11 12 13 – we want to remove older data days indexed I1 1,2,3,4,5 11,2,3,4,5 11,12,3,4,5 11,12,13,4,5 days indexed I2 6,7,8,9,10 6,7,8,9,10 6,7,8,9,10 6,7,8,9,10 •advantage: deletions/insertions from smaller index •disadvantage: query multiple indexes CS 245 Notes 4 157 CS 245 Another Solution (Wave Indexes) day 10 11 12 13 14 15 16 I1 1,2,3 1,2,3 1,2,3 13 13,14 13,14,15 13,14,15 I2 4,5,6 4,5,6 4,5,6 4,5,6 4,5,6 4,5,6 16 I3 7,8,9 7,8,9 7,8,9 7,8,9 7,8,9 7,8,9 7,8,9 I4 10 10,11 10,11, 10,11, 10,11, 10,11, 10,11, Notes 4 159 158 Outline/summary • Conventional Indexes 12 12 12 12 12 • Sparse vs. dense • Primary vs. secondary • B trees • B+trees vs. B-trees • B+trees vs. indexed sequential • Hashing schemes •advantage: no deletions •disadvantage: approximate windows CS 245 Notes 4 CS 245 --> Next Notes 4 160 27

CS 245: Database System Principles Notes 4: Indexing Chapter 4

Related documents

Products

Support

CS 245: Database System Principles Notes 4: Indexing Chapter 4

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib