CS 245: Database System Principles Notes 4: Indexing Hector Garcia-Molina CS 245 Notes 4 1 Chapter 4 Indexing & Hashing value CS 245 record value ? Notes 4 2 Topics • Conventional indexes • B-trees • Hashing schemes CS 245 Notes 4 3 Sequential File 10 20 30 40 50 60 70 80 90 100 CS 245 Notes 4 4 Dense Index 10 20 10 20 30 40 30 40 50 60 70 80 50 60 70 80 90 100 90 100 110 120 CS 245 Sequential File Notes 4 5 Sparse Index 10 20 10 30 50 70 30 40 90 110 130 150 50 60 70 80 90 100 170 190 210 230 CS 245 Sequential File Notes 4 6 Sequential File Sparse 2nd level 10 90 170 250 330 410 490 570 10 20 10 30 50 70 30 40 90 110 130 150 50 60 70 80 90 100 170 190 210 230 CS 245 Notes 4 7 • Comment: {FILE,INDEX} may be contiguous or not (blocks chained) CS 245 Notes 4 8 Question: • Can we build a dense, 2nd level index for a dense index? CS 245 Notes 4 9 Notes on pointers: (1) Block pointer (sparse index) can be smaller than record pointer BP RP CS 245 Notes 4 10 Notes on pointers: (2) If file is contiguous, then we can omit pointers (i.e., compute them) CS 245 Notes 4 11 R1 K1 R2 K2 K3 R3 K4 R4 CS 245 Notes 4 12 R1 K1 R2 K2 K3 R3 say: 1024 B per block K4 R4 • if we want K3 block: get it at offset (3-1)1024 = 2048 bytes CS 245 Notes 4 13 Sparse vs. Dense Tradeoff • Sparse: Less index space per record can keep more of index in memory • Dense: Can tell if any record exists without accessing file (Later: – sparse better for insertions – dense needed for secondary indexes) CS 245 Notes 4 14 Terms • • • • • • • Index sequential file Search key ( primary key) Primary index (on Sequencing field) Secondary index Dense index (all Search Key values in) Sparse index Multi-level index CS 245 Notes 4 15 Next: • Duplicate keys • Deletion/Insertion • Secondary indexes CS 245 Notes 4 16 Duplicate keys 10 10 10 20 20 30 30 30 40 45 CS 245 Notes 4 17 Duplicate keys Dense index, one way to implement? 10 10 10 10 10 20 10 20 20 30 20 30 30 30 CS 245 30 30 40 45 Notes 4 18 Duplicate keys Dense index, better way? 10 10 10 20 30 40 10 20 20 30 30 30 40 45 CS 245 Notes 4 19 Duplicate keys Sparse index, one way? 10 10 10 10 20 30 10 20 20 30 30 30 40 45 CS 245 Notes 4 20 Duplicate keys careful if looking for 20 or 30! Sparse index, one way? CS 245 10 10 10 10 20 30 10 20 20 30 30 30 40 45 Notes 4 21 Duplicate keys Sparse index, another way? – place first new key from block 10 20 30 30 10 10 10 20 20 30 30 30 40 45 CS 245 Notes 4 22 Duplicate keys Sparse index, another way? – place first new key from block should this be 40? 10 20 30 30 10 10 10 20 20 30 30 30 40 45 CS 245 Notes 4 23 Summary Duplicate values, primary index • Index may point to first instance of each value only File Index a a a . . b CS 245 Notes 4 24 Deletion from sparse index 10 20 10 30 50 70 30 40 50 60 90 110 130 150 CS 245 70 80 Notes 4 25 Deletion from sparse index – delete record 40 10 20 10 30 50 70 30 40 50 60 90 110 130 150 CS 245 70 80 Notes 4 26 Deletion from sparse index – delete record 40 10 20 10 30 50 70 30 40 50 60 90 110 130 150 CS 245 70 80 Notes 4 27 Deletion from sparse index – delete record 30 10 20 10 30 50 70 30 40 50 60 90 110 130 150 CS 245 70 80 Notes 4 28 Deletion from sparse index – delete record 30 10 20 10 40 30 50 70 30 40 40 50 60 90 110 130 150 CS 245 70 80 Notes 4 29 Deletion from sparse index – delete records 30 & 40 10 20 10 30 50 70 30 40 50 60 90 110 130 150 CS 245 70 80 Notes 4 30 Deletion from sparse index – delete records 30 & 40 10 20 10 30 50 70 30 40 50 60 90 110 130 150 CS 245 70 80 Notes 4 31 Deletion from sparse index – delete records 30 & 40 10 20 10 50 30 70 50 70 30 40 50 60 90 110 130 150 CS 245 70 80 Notes 4 32 Deletion from dense index 10 20 10 20 30 40 30 40 50 60 50 60 70 80 CS 245 70 80 Notes 4 33 Deletion from dense index – delete record 30 10 20 10 20 30 40 30 40 50 60 50 60 70 80 CS 245 70 80 Notes 4 34 Deletion from dense index – delete record 30 10 20 10 20 30 40 30 40 40 50 60 50 60 70 80 CS 245 70 80 Notes 4 35 Deletion from dense index – delete record 30 10 20 10 20 40 30 40 30 40 40 50 60 50 60 70 80 CS 245 70 80 Notes 4 36 Insertion, sparse index case 10 20 10 30 40 60 30 40 50 60 CS 245 Notes 4 37 Insertion, sparse index case – insert record 34 10 20 10 30 40 60 30 40 50 60 CS 245 Notes 4 38 Insertion, sparse index case – insert record 34 10 20 10 30 40 60 30 34 40 50 • our lucky day! we have free space where we need it! CS 245 Notes 4 60 39 Insertion, sparse index case – insert record 15 10 20 10 30 40 60 30 40 50 60 CS 245 Notes 4 40 Insertion, sparse index case – insert record 15 10 20 15 10 20 30 40 60 30 20 30 40 50 60 CS 245 Notes 4 41 Insertion, sparse index case – insert record 15 10 20 15 10 20 30 40 60 30 20 30 40 50 • Illustrated: Immediate reorganization 60 • Variation: – insert new block (chained file) – update index CS 245 Notes 4 42 Insertion, sparse index case – insert record 25 10 30 40 60 10 20 30 40 50 60 CS 245 Notes 4 43 Insertion, sparse index case – insert record 25 10 30 40 60 10 20 25 30 overflow blocks (reorganize later...) 40 50 60 CS 245 Notes 4 44 Insertion, dense index case • Similar • Often more expensive . . . CS 245 Notes 4 45 Secondary indexes Sequence field 30 50 20 70 80 40 100 10 90 60 CS 245 Notes 4 46 Secondary indexes Sequence field • Sparse index 30 50 30 20 80 100 20 70 80 40 90 ... 100 10 90 60 CS 245 Notes 4 47 Secondary indexes Sequence field • Sparse index 30 50 30 20 80 100 20 70 80 40 90 ... 100 10 does not make sense! CS 245 Notes 4 90 60 48 Secondary indexes Sequence field • Dense index 30 50 20 70 80 40 100 10 90 60 CS 245 Notes 4 49 Secondary indexes Sequence field • Dense index 10 20 30 40 30 50 50 60 70 ... 80 40 20 70 100 10 90 60 CS 245 Notes 4 50 Secondary indexes Sequence field • Dense index 10 50 90 ... sparse high level CS 245 10 20 30 40 30 50 50 60 70 ... 80 40 20 70 100 10 90 60 Notes 4 51 With secondary indexes: • Lowest level is dense • Other levels are sparse Also: Pointers are record pointers (not block pointers; not computed) CS 245 Notes 4 52 Duplicate values & secondary indexes 20 10 20 40 10 40 10 40 30 40 CS 245 Notes 4 53 Duplicate values & secondary indexes one option... 20 10 10 10 10 20 20 40 20 30 40 40 10 40 10 40 30 40 40 40 ... CS 245 Notes 4 54 Duplicate values & secondary indexes one option... Problem: excess overhead! • disk space • search time CS 245 20 10 10 10 10 20 20 40 20 30 40 40 10 40 10 40 30 40 40 40 ... Notes 4 55 Duplicate values & secondary indexes another option... 10 20 10 20 20 40 10 40 30 40 CS 245 10 40 30 40 Notes 4 56 Duplicate values & secondary indexes another option... Problem: variable size records in index! CS 245 10 20 10 20 20 40 10 40 30 40 10 40 30 40 Notes 4 57 Duplicate values & secondary indexes 20 10 10 20 30 40 20 40 10 40 50 60 ... Another idea (suggested in class): Chain records with same key? CS 245 Notes 4 10 40 30 40 58 Duplicate values & secondary indexes 20 10 10 20 30 40 20 40 10 40 50 60 ... Another idea (suggested in class): Chain records with same key? 10 40 30 40 Problems: • Need to add fields to records • Need to follow chain to know records CS 245 Notes 4 59 Duplicate values & secondary indexes 20 10 10 20 30 40 20 40 10 40 50 60 ... 10 40 30 40 buckets CS 245 Notes 4 60 Why “bucket” idea is useful Indexes Name: primary Dept: secondary Floor: secondary CS 245 Records EMP (name,dept,floor,...) Notes 4 61 Query: Get employees in (Toy Dept) ^ (2nd floor) Dept. index EMP Toy CS 245 Floor index 2nd Notes 4 62 Query: Get employees in (Toy Dept) ^ (2nd floor) Dept. index EMP Floor index Toy 2nd Intersect toy bucket and 2nd Floor bucket to get set of matching EMP’s CS 245 Notes 4 63 This idea used in text information retrieval Documents ...the cat is fat ... ...was raining cats and dogs... ...Fido the dog ... CS 245 Notes 4 64 This idea used in text information retrieval Documents cat ...the cat is fat ... dog ...was raining cats and dogs... ...Fido the dog ... Inverted lists CS 245 Notes 4 65 IR QUERIES • Find articles with “cat” and “dog” • Find articles with “cat” or “dog” • Find articles with “cat” and not “dog” CS 245 Notes 4 66 IR QUERIES • Find articles with “cat” and “dog” • Find articles with “cat” or “dog” • Find articles with “cat” and not “dog” • Find articles with “cat” in title • Find articles with “cat” and “dog” within 5 words CS 245 Notes 4 67 Common technique: more info in inverted list cat d1 Title 5 Author 10 Abstract 57 dog CS 245 Title Title d3 d2 100 12 Notes 4 68 Posting: an entry in inverted list. Represents occurrence of term in article Size of a list: 1 Rare words or miss-spellings 106 Common words (in postings) Size of a posting: 10-15 bits CS 245 Notes 4 (compressed) 69 IR DISCUSSION • • • • • Stop words Truncation Thesaurus Full text vs. Abstracts Vector model CS 245 Notes 4 70 Vector space model w1 w2 w3 w4 w5 w6 w7 … DOC = <1 0 0 1 1 0 0 …> Query= <0 CS 245 0 1 Notes 4 1 0 0 0 …> 71 Vector space model w1 w2 w3 w4 w5 w6 w7 … DOC = <1 0 0 1 1 0 0 …> Query= <0 0 1 PRODUCT = CS 245 1 0 0 0 …> 1 + ……. = score Notes 4 72 • Tricks to weigh scores + normalize e.g.: Match on common word not as useful as match on rare words... CS 245 Notes 4 73 • How to process V.S. Queries? w1 w2 w3 w4 w5 w6 Q=<0 0 0 1 1 0 CS 245 Notes 4 … …> 74 • Try Stanford Libraries • Try Google, Yahoo, ... CS 245 Notes 4 75 Summary so far • Conventional index – Basic Ideas: sparse, dense, multi-level… – Duplicate Keys – Deletion/Insertion – Secondary indexes – Buckets of Postings List CS 245 Notes 4 76 Conventional indexes Advantage: - Simple - Index is sequential file good for scans Disadvantage: - Inserts expensive, and/or - Lose sequentiality & balance CS 245 Notes 4 77 Example Index (sequential) 10 20 30 continuous 40 50 60 free space 70 80 90 CS 245 Notes 4 78 Example continuous Index (sequential) 10 20 30 33 40 50 60 39 31 35 36 32 38 34 free space 70 80 90 CS 245 overflow area (not sequential) Notes 4 79 Outline: • Conventional indexes • B-Trees NEXT • Hashing schemes CS 245 Notes 4 80 • NEXT: Another type of index – Give up on sequentiality of index – Try to get “balance” CS 245 Notes 4 81 CS 245 Notes 4 180 200 150 156 179 120 130 100 101 110 30 35 3 5 11 120 150 180 30 100 B+Tree Example n=3 Root 82 95 81 57 Sample non-leaf to keys to keys to keys < 57 57 k<81 81k<95 CS 245 Notes 4 to keys 95 83 Sample leaf node: CS 245 81 95 To record with key 81 To record with key 85 To record with key 57 57 From non-leaf node Notes 4 to next leaf in sequence 84 In textbook’s notation n=3 Leaf: 30 35 30 35 Non-leaf: 30 30 CS 245 Notes 4 85 Size of nodes: CS 245 n+1 pointers (fixed) n keys Notes 4 86 Don’t want nodes to be too empty • Use at least Non-leaf: (n+1)/2 pointers Leaf: (n+1)/2 pointers to data CS 245 Notes 4 87 n=3 30 Leaf 30 35 CS 245 Notes 4 counts even if null Non-leaf 120 150 180 min. node 3 5 11 Full node 88 B+tree rules tree of order n (1) All leaves at same lowest level (balanced tree) (2) Pointers in leaves point to records except for “sequence pointer” CS 245 Notes 4 89 (3) Number of pointers/keys for B+tree Max Max Min ptrs keys ptrsdata Non-leaf (non-root) Leaf (non-root) Root CS 245 Min keys n+1 n (n+1)/2 (n+1)/2- 1 n+1 n (n+1)/2 (n+1)/2 n+1 n 1 1 Notes 4 90 Insert into B+tree (a) simple case – space available in leaf (b) leaf overflow (c) non-leaf overflow (d) new root CS 245 Notes 4 91 (a) Insert key = 32 CS 245 30 31 3 5 11 30 100 n=3 Notes 4 92 (a) Insert key = 32 CS 245 30 31 32 3 5 11 30 100 n=3 Notes 4 93 (a) Insert key = 7 CS 245 30 31 3 5 11 30 100 n=3 Notes 4 94 (a) Insert key = 7 CS 245 30 31 3 57 11 3 5 30 100 n=3 Notes 4 95 (a) Insert key = 7 CS 245 30 30 31 3 57 11 3 5 7 100 n=3 Notes 4 96 (c) Insert key = 160 150 156 179 180 200 120 150 180 100 n=3 CS 245 Notes 4 97 (c) Insert key = 160 CS 245 Notes 4 180 200 160 179 150 156 179 120 150 180 100 n=3 98 (c) Insert key = 160 CS 245 Notes 4 180 200 180 160 179 150 156 179 120 150 180 100 n=3 99 n=3 CS 245 Notes 4 180 200 180 160 179 150 156 179 120 150 180 160 100 (c) Insert key = 160 100 (d) New root, insert 45 CS 245 Notes 4 30 32 40 20 25 10 12 1 2 3 10 20 30 n=3 101 (d) New root, insert 45 CS 245 Notes 4 40 45 30 32 40 20 25 10 12 1 2 3 10 20 30 n=3 102 (d) New root, insert 45 40 45 40 Notes 4 30 32 40 20 25 10 20 30 10 12 1 2 3 CS 245 n=3 103 (d) New root, insert 45 CS 245 Notes 4 40 45 30 32 40 20 25 10 12 1 2 3 10 20 30 40 30 new root n=3 104 Deletion from B+tree (a) Simple case - no example (b) Coalesce with neighbor (sibling) (c) Re-distribute keys (d) Cases (b) or (c) at non-leaf CS 245 Notes 4 105 (b) Coalesce with sibling n=4 CS 245 40 50 10 20 30 10 40 100 – Delete 50 Notes 4 106 (b) Coalesce with sibling n=4 CS 245 40 50 10 20 30 40 10 40 100 – Delete 50 Notes 4 107 (c) Redistribute keys n=4 CS 245 40 50 10 20 30 35 10 40 100 – Delete 50 Notes 4 108 (c) Redistribute keys n=4 CS 245 35 40 50 10 20 30 35 10 40 35 100 – Delete 50 Notes 4 109 (d) Non-leaf coalese n=4 CS 245 Notes 4 40 45 30 37 30 40 25 26 20 22 10 14 1 3 10 20 25 – Delete 37 110 (d) Non-leaf coalese n=4 CS 245 Notes 4 40 45 30 37 30 40 25 26 30 20 22 10 14 1 3 10 20 25 – Delete 37 111 (d) Non-leaf coalese n=4 CS 245 Notes 4 40 45 30 37 25 26 30 30 40 40 20 22 10 14 1 3 10 20 25 – Delete 37 112 (d) Non-leaf coalese n=4 25 – Delete 37 CS 245 Notes 4 40 45 30 37 30 40 25 26 30 20 22 10 14 1 3 10 20 25 40 new root 113 B+tree deletions in practice – Often, coalescing is not implemented – Too hard and not worth it! CS 245 Notes 4 114 Comparison: B-trees vs. static indexed sequential file Ref #1: Held & Stonebraker “B-Trees Re-examined” CACM, Feb. 1978 CS 245 Notes 4 115 Ref # 1 claims: - Concurrency control harder in B-Trees - B-tree consumes more space For their comparison: block = 512 bytes key = pointer = 4 bytes 4 data records per block CS 245 Notes 4 116 Example: 1 block static index k1 k1 127 keys k2 k2 k3 k3 (127+1)4 = 512 Bytes -> pointers in index implicit! CS 245 1 data block Notes 4 up to 127 blocks 117 Example: 1 block B-tree k1 k1 1 data block k2 63 keys k2 ... k63 k3 next 63x(4+4)+8 = 512 Bytes -> pointers needed in B-tree blocks because index is not contiguous CS 245 Notes 4 up to 63 blocks 118 Size comparison Ref. #1 Static Index # data blocks height 2 -> 127 128 -> 16,129 16,130 -> 2,048,383 CS 245 2 3 4 B-tree # data blocks height 2 -> 63 64 -> 3968 3969 -> 250,047 250,048 -> 15,752,961 Notes 4 2 3 4 5 119 Ref. #1 analysis claims • For an 8,000 block file, after 32,000 inserts after 16,000 lookups Static index saves enough accesses to allow for reorganization CS 245 Notes 4 120 Ref. #1 analysis claims • For an 8,000 block file, after 32,000 inserts after 16,000 lookups Static index saves enough accesses to allow for reorganization Ref. #1 conclusion CS 245 Static index better!! Notes 4 121 Ref #2: M. Stonebraker, “Retrospective on a database system,” TODS, June 1980 B-trees better!! Ref. #2 conclusion CS 245 Notes 4 122 B-trees better!! Ref. #2 conclusion • DBA does not know when to reorganize • DBA does not know how full to load pages of new index CS 245 Notes 4 123 B-trees better!! Ref. #2 conclusion • Buffering – B-tree: has fixed buffer requirements – Static index: must read several overflow blocks to be efficient (large & variable size buffers needed for this) CS 245 Notes 4 124 • Speaking of buffering… Is LRU a good policy for B+tree buffers? CS 245 Notes 4 125 • Speaking of buffering… Is LRU a good policy for B+tree buffers? Of course not! Should try to keep root in memory at all times (and perhaps some nodes from second level) CS 245 Notes 4 126 Interesting problem: For B+tree, how large should n be? … n is number of keys / node CS 245 Notes 4 127 Sample assumptions: (1) Time to read node from disk is (S+Tn) msec. CS 245 Notes 4 128 Sample assumptions: (1) Time to read node from disk is (S+Tn) msec. (2) Once block in memory, use binary search to locate key: (a + b LOG2 n) msec. For some constants a,b; Assume a << S CS 245 Notes 4 129 Sample assumptions: (1) Time to read node from disk is (S+Tn) msec. (2) Once block in memory, use binary search to locate key: (a + b LOG2 n) msec. For some constants a,b; Assume a << S (3) Assume B+tree is full, i.e., # nodes to examine is LOGn N where N = # records CS 245 Notes 4 130 Can get: f(n) = time to find a record f(n) nopt CS 245 Notes 4 n 131 FIND nopt by f’(n) = 0 Answer is nopt = “few hundred” (see homework for details) CS 245 Notes 4 132 FIND nopt by f’(n) = 0 Answer is nopt = “few hundred” (see homework for details) What happens to nopt as • Disk gets faster? • CPU get faster? CS 245 Notes 4 133 Exercise S= 14000 T= 0.2 b= 0.002 a= 0 N= 10000000 • f(n)= lognN*[S+T*n+a+b*log2(n)] CS 245 Notes 4 134 N=10 million records S= T= b= a= N= 14000 0.2 0.002 0 10000000 times in microseconds CS 245 Notes 4 135 N=100 million records S= T= b= a= N= 14000 0.2 0.002 0 10000000 times in microseconds CS 245 Notes 4 136 Variation on B+tree: B-tree (no +) • Idea: – Avoid duplicate keys – Have record pointers in non-leaf nodes CS 245 Notes 4 137 K1 P1 to keys < K1 CS 245 K2 P2 K3 P3 to record to record to record with K1 with K2 with K3 to keys to keys K1<x<K2 K2<x<k3 Notes 4 to keys >k3 138 CS 245 Notes 4 170 180 150 160 130 140 110 120 90 100 70 80 50 60 30 40 10 20 145 165 85 105 25 45 65 125 B-tree example n=2 139 CS 245 Notes 4 170 180 150 160 145 165 110 120 90 100 70 80 50 60 30 40 10 20 25 45 85 105 (but keep space for simplicity) 130 140 • sequence pointers not useful now! n=2 65 125 B-tree example 140 Note on inserts leaf CS 245 10 20 30 • Say we insert record with key = 25 Notes 4 n=3 141 Note on inserts • Say we insert record with key = 25 CS 245 25 30 10 – 20 – • Afterwards: 10 20 30 leaf n=3 Notes 4 142 So, for B-trees: MAX Non-leaf non-root Leaf non-root Root non-leaf Root Leaf CS 245 MIN Tree Ptrs Rec Keys Ptrs Tree Ptrs Rec Ptrs Keys n+1 n n 1 n n 1 n/2 n+1 n n 2 1 1 1 n n 1 1 1 (n+1)/2 (n+1)/2-1 (n+1)/2-1 Notes 4 n/2 143 Tradeoffs: B-trees have faster lookup than B+trees in B-tree, non-leaf & leaf different sizes in B-tree, deletion more complicated CS 245 Notes 4 144 Tradeoffs: B-trees have faster lookup than B+trees in B-tree, non-leaf & leaf different sizes in B-tree, deletion more complicated B+trees preferred! CS 245 Notes 4 145 But note: • If blocks are fixed size (due to disk and buffering restrictions) Then lookup for B+tree is actually better!! CS 245 Notes 4 146 Example: - Pointers - Keys - Blocks - Look at full CS 245 4 bytes 4 bytes 100 bytes (just example) 2 level tree Notes 4 147 B-tree: Root has 8 keys + 8 record pointers + 9 son pointers = 8x4 + 8x4 + 9x4 = 100 bytes CS 245 Notes 4 148 B-tree: Root has 8 keys + 8 record pointers + 9 son pointers = 8x4 + 8x4 + 9x4 = 100 bytes Each of 9 sons: 12 rec. pointers (+12 keys) = 12x(4+4) + 4 = 100 bytes CS 245 Notes 4 149 B-tree: Root has 8 keys + 8 record pointers + 9 son pointers = 8x4 + 8x4 + 9x4 = 100 bytes Each of 9 sons: 12 rec. pointers (+12 keys) = 12x(4+4) + 4 = 100 bytes 2-level B-tree, Max # records = 12x9 + 8 = 116 CS 245 Notes 4 150 B+tree: Root has 12 keys + 13 son pointers = 12x4 + 13x4 = 100 bytes CS 245 Notes 4 151 B+tree: Root has 12 keys + 13 son pointers = 12x4 + 13x4 = 100 bytes Each of 13 sons: 12 rec. ptrs (+12 keys) = 12x(4 +4) + 4 = 100 bytes CS 245 Notes 4 152 B+tree: Root has 12 keys + 13 son pointers = 12x4 + 13x4 = 100 bytes Each of 13 sons: 12 rec. ptrs (+12 keys) = 12x(4 +4) + 4 = 100 bytes 2-level B+tree, Max # records = 13x12 = 156 CS 245 Notes 4 153 So... 8 records B+ B ooooooooooooo ooooooooo 156 records 108 records Total = 116 CS 245 Notes 4 154 So... 8 records B+ B ooooooooooooo ooooooooo 156 records 108 records Total = 116 • Conclusion: – For fixed block size, – B+ tree is better because it is bushier CS 245 Notes 4 155 An Interesting Problem... • What is a good index structure when: – records tend to be inserted with keys that are larger than existing values? (e.g., banking records with growing data/time) – we want to remove older data CS 245 Notes 4 156 One Solution: Multiple Indexes • Example: I1, I2 day 10 11 12 13 days indexed I1 1,2,3,4,5 11,2,3,4,5 11,12,3,4,5 11,12,13,4,5 days indexed I2 6,7,8,9,10 6,7,8,9,10 6,7,8,9,10 6,7,8,9,10 •advantage: deletions/insertions from smaller index •disadvantage: query multiple indexes CS 245 Notes 4 157 Another Solution (Wave Indexes) day 10 11 12 13 14 15 16 I1 1,2,3 1,2,3 1,2,3 13 13,14 13,14,15 13,14,15 I2 4,5,6 4,5,6 4,5,6 4,5,6 4,5,6 4,5,6 16 I3 7,8,9 7,8,9 7,8,9 7,8,9 7,8,9 7,8,9 7,8,9 I4 10 10,11 10,11, 10,11, 10,11, 10,11, 10,11, •advantage: no deletions •disadvantage: approximate windows CS 245 Notes 4 158 12 12 12 12 12 Outline/summary • Conventional Indexes • Sparse vs. dense • Primary vs. secondary • B trees • B+trees vs. B-trees • B+trees vs. indexed sequential • Hashing schemes CS 245 --> Next Notes 4 159