CS 245: Database System Principles Notes 4: Indexing Chapter 4

Chapter 4: Indexing & Hashing
Hector Garcia-Molina
Topics
• Conventional indexes
• B-trees
• Hashing schemes

Sequential File
[Figure: sequential file with records stored in key order: 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
Dense Index
[Figure: Dense Index over the Sequential File; one index entry per record (10, 20, 30, ..., 120)]

Sparse Index
[Figure: Sparse Index over the Sequential File; one index entry per block (10, 30, 50, 70, 90, 110, 130, 150, 170, 190, 210, 230)]

Sparse 2nd level
[Figure: second-level sparse index (10, 90, 170, 250, 330, 410, 490, 570) pointing to the blocks of the first-level sparse index (10, 30, 50, 70, ...), which points to the Sequential File]

• Comment: {FILE,INDEX} may be contiguous or not (blocks chained)
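A sparse index lookup can be sketched as follows. This is only an illustration under assumptions not in the slides: the file is a Python list of two-record blocks, the index is the list of first keys, and the names `blocks`, `sparse_index` and `sparse_lookup` are made up for the example.

```python
import bisect

# File from the figures: sorted records, two per block.
blocks = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]
# Sparse index: first key of each block; entry i refers to blocks[i].
sparse_index = [block[0] for block in blocks]

def sparse_lookup(key):
    """Return (block_number, found) for key using the sparse index."""
    # Find the last index entry whose key is <= the search key.
    i = bisect.bisect_right(sparse_index, key) - 1
    if i < 0:
        return None, False          # key is smaller than every indexed key
    # The block itself must be read to learn whether the key exists.
    return i, key in blocks[i]
```

Note that a miss (for example key 45) still costs one block access, because the sparse index alone cannot tell whether the record exists.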
Question:
• Can we build a dense, 2nd level index for a dense index?

Notes on pointers:
(1) Block pointer (sparse index) can be smaller than record pointer
[Figure: BP (block pointer) next to RP (record pointer)]
(2) If file is contiguous, then we can omit pointers (i.e., compute them)
[Figure: index blocks K1, K2, K3, K4 next to file blocks R1, R2, R3, R4]
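Computing a pointer instead of storing it, as in note (2), amounts to one multiplication. A minimal sketch, assuming 1-based block numbers and the 1024-byte blocks used in the slides' example; the function name `block_offset` is hypothetical.

```python
BLOCK_SIZE = 1024  # bytes per block (the value used in the slides' example)

def block_offset(i, block_size=BLOCK_SIZE):
    """Byte offset of the i-th block (1-based) in a contiguous file."""
    return (i - 1) * block_size
```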
Example for (2), with contiguous index blocks K1, K2, K3, K4 for file blocks R1, R2, R3, R4:
say: 1024 B per block
• if we want K3 block: get it at offset (3-1) × 1024 = 2048 bytes

Sparse vs. Dense Tradeoff
• Sparse: Less index space per record, can keep more of index in memory
• Dense: Can tell if any record exists without accessing file
(Later:
– sparse better for insertions
– dense needed for secondary indexes)

Terms
• Index sequential file
• Search key (≠ primary key)
• Primary index (on Sequencing field)
• Secondary index
• Dense index (all Search Key values in)
• Sparse index
• Multi-level index

Next:
• Duplicate keys
• Deletion/Insertion
• Secondary indexes

Duplicate keys
Dense index, one way to implement?
[Figure: file with records 10, 10, 10, 20, 20, 30, 30, 30, 40, 45; dense index with one entry per record, duplicates included]
Duplicate keys
Dense index, better way?
[Figure: same file; dense index with one entry per distinct key (10, 20, 30, 40), each pointing to the first record with that key]

Duplicate keys
Sparse index, one way?
[Figure: same file; sparse index holding the first key of each block (10, 10, 20, 30)]
careful if looking for 20 or 30!

Duplicate keys
Sparse index, another way?
– place first new key from block
[Figure: same file; sparse index entries 10, 20, 30, 30, with the annotation "should this be 40?" on the last entry]

Summary
Duplicate values, primary index
• Index may point to first instance of each value only
[Figure: Index entry "a" pointing to the first of several File records with value a, followed by records with value b]
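The "first instance only" idea can be sketched like this. It is an illustration under assumptions: the file is a sorted Python list, a record pointer is just a list position, and `first_instance` / `find_all` are names invented for the example.

```python
records = [10, 10, 10, 20, 20, 30, 30, 30, 40, 45]  # sorted file with duplicates

# One index entry per distinct key: key -> position of its first record.
first_instance = {}
for pos, key in enumerate(records):
    first_instance.setdefault(key, pos)

def find_all(key):
    """Return the positions of all records with this key."""
    if key not in first_instance:
        return []                    # dense index: no file access needed
    pos = first_instance[key]
    found = []
    # Duplicates are adjacent in a sequential file, so scan forward.
    while pos < len(records) and records[pos] == key:
        found.append(pos)
        pos += 1
    return found
```

The index stays at one entry per distinct value, and the forward scan only works because the file is sorted on the search key.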
Deletion from sparse index
– delete record 40
[Figure: file blocks (10, 20), (30, 40), (50, 60), (70, 80); sparse index 10, 30, 50, 70, 90, 110, 130, 150. Record 40 is removed from its block; the index does not change]

Deletion from sparse index
– delete record 30
[Figure: same file and index. Record 30 is removed, 40 moves to the front of its block, and index entry 30 becomes 40]

Deletion from sparse index
– delete records 30 & 40
[Figure: same file and index. The block that held 30 and 40 becomes empty; index entry 30 is removed and entries 50 and 70 move up]
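The three deletion cases for a sparse index can be sketched in a few lines. This is a simplified model, not the slides' implementation: blocks are Python lists, `index[i]` is assumed to hold the first key of `blocks[i]`, and an emptied block is simply dropped.

```python
def delete(blocks, index, key):
    """Delete key from a blocked sequential file and fix its sparse index.

    blocks: list of sorted lists (one per block); index[i] == blocks[i][0].
    Returns True if the key was found.
    """
    for i, block in enumerate(blocks):
        if key in block:
            block.remove(key)
            if not block:
                # Block emptied: drop the block and its index entry,
                # so later entries move up.
                del blocks[i]
                del index[i]
            elif index[i] != block[0]:
                # First key of the block changed: update the entry.
                index[i] = block[0]
            # Otherwise the index is untouched (e.g. deleting record 40).
            return True
    return False
```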
Deletion from dense index
– delete record 30
[Figure: file blocks (10, 20), (30, 40), (50, 60), (70, 80) with dense index 10, 20, 30, 40, 50, 60, 70, 80. Record 30 is removed and 40 moves up in its block; index entry 30 is removed and entry 40 moves up]
Insertion, sparse index case
– insert record 34
[Figure: file blocks (10, 20), (30, free), (40, 50), (60, free); sparse index 10, 30, 40, 60. Record 34 goes into the free slot after 30]
• our lucky day! we have free space where we need it!

Insertion, sparse index case
– insert record 15
[Figure: same file and index. Record 15 is placed after 10, 20 moves into the next block ahead of 30, and index entry 30 becomes 20]
• Illustrated: Immediate reorganization
• Variation:
– insert new block (chained file)
– update index

Insertion, sparse index case
– insert record 25
[Figure: same file and index. Record 25 is placed in an overflow block chained to the block where it belongs]
overflow blocks (reorganize later...)
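The "free space if lucky, overflow block otherwise" policy could look roughly like this. The sketch follows the overflow-block variation rather than immediate reorganization, and it assumes two records per block and one Python list per overflow chain; all names are hypothetical.

```python
import bisect

BLOCK_CAPACITY = 2  # records per block, as in the figures

def insert(blocks, overflow, index, key):
    """Insert key; use free space if available, else an overflow block.

    blocks[i] is a sorted list, index[i] is the first key of blocks[i],
    overflow[i] holds keys that did not fit into blocks[i].
    """
    # Block where the key belongs: last index entry <= key (or the first block).
    i = max(bisect.bisect_right(index, key) - 1, 0)
    if len(blocks[i]) < BLOCK_CAPACITY:
        bisect.insort(blocks[i], key)     # the lucky case: free space
        index[i] = blocks[i][0]           # first key may have changed
    else:
        overflow[i].append(key)           # reorganize later...
```

A lookup then has to check the overflow chain of the block as well, which is the price paid for postponing the reorganization.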
Insertion, dense index case
• Similar
• Often more expensive . . .

Secondary indexes
[Figure: file ordered on a different Sequence field; the indexed field appears in the order (30, 50), (20, 70), (80, 40), (100, 10), (90, 60)]

Secondary indexes
• Sparse index
[Figure: sparse index 30, 20, 80, 100, 90, ... with one entry per block of the unordered file]
does not make sense!

Secondary indexes
• Dense index
[Figure: dense index 10, 20, 30, 40, 50, 60, 70, ... with one entry per record, and a sparse high level 10, 50, 90, ... on top of it]

With secondary indexes:
• Lowest level is dense
• Other levels are sparse
Also: Pointers are record pointers (not block pointers; not computed)

Duplicate values & secondary indexes
[Figure: file with duplicate values in the indexed field: (20, 10), (20, 40), (10, 40), (10, 40), (30, 40)]
one option...
[Figure: dense index with one entry per record: 10, 10, 10, 20, 20, 30, 40, 40, 40, 40, ...]
Duplicate values & secondary indexes
one option...
Problem: excess overhead!
• disk space
• search time

Duplicate values & secondary indexes
another option...
[Figure: one index entry per distinct value (10, 20, 30, 40), each followed by a variable-length list of record pointers]
Problem: variable size records in index!

Duplicate values & secondary indexes
Another idea (suggested in class): Chain records with same key?
[Figure: index 10, 20, 30, 40, 50, 60, ... where each entry points to the first record with that value and records with the same value are chained together]
Problems:
• Need to add fields to records
• Need to follow chain to know records

Duplicate values & secondary indexes
[Figure: index 10, 20, 30, 40, 50, 60, ... where each entry points to a bucket of record pointers]
buckets
Why “bucket” idea is useful
Indexes: Name: primary; Dept: secondary; Floor: secondary
Records: EMP (name,dept,floor,...)

Query: Get employees in (Toy Dept) ^ (2nd floor)
[Figure: Dept. index entry "Toy" and Floor index entry "2nd", each pointing to a bucket of pointers into EMP]
⇒ Intersect toy bucket and 2nd Floor bucket to get set of matching EMP’s

This idea used in text information retrieval
[Figure: inverted lists for "cat" and "dog" pointing into Documents: "...the cat is fat ...", "...was raining cats and dogs...", "...Fido the dog ..."]

IR QUERIES
• Find articles with “cat” and “dog”
• Find articles with “cat” or “dog”
• Find articles with “cat” and not “dog”
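Going back to the (Toy Dept) ^ (2nd floor) query, the bucket intersection can be sketched as below. The EMP rows, the record ids standing in for record pointers, and the helper `build_buckets` are all made up for illustration.

```python
# Hypothetical EMP records, keyed by record id (stand-in for a record pointer).
emp = {
    1: ("Ann", "Toy", "2nd"),
    2: ("Bob", "Toy", "1st"),
    3: ("Cid", "Shoe", "2nd"),
    4: ("Dee", "Toy", "2nd"),
}

def build_buckets(field):
    """Secondary index on one field: value -> bucket (set) of record ids."""
    buckets = {}
    for rid, rec in emp.items():
        buckets.setdefault(rec[field], set()).add(rid)
    return buckets

dept_index = build_buckets(1)    # field 1 = dept
floor_index = build_buckets(2)   # field 2 = floor

# (Toy Dept) ^ (2nd floor): intersect the buckets before touching any record.
matches = dept_index.get("Toy", set()) & floor_index.get("2nd", set())
names = sorted(emp[rid][0] for rid in matches)
```

Only the records in the intersection are fetched; the IR "and" query works the same way, with inverted lists playing the role of the buckets.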
IR QUERIES
• Find articles with “cat” in title
• Find articles with “cat” and “dog” within 5 words

Common technique: more info in inverted list
[Figure: inverted lists for "cat" and "dog" whose postings carry extra fields: document (d1, d2, d3), type (Title, Author, Abstract) and position (5, 10, 57, 100, 12)]

Posting: an entry in inverted list. Represents occurrence of term in article
Size of a list: from 1 (rare words or misspellings) to 10^6 (common words), in postings
Size of a posting: 10-15 bits (compressed)

IR DISCUSSION
• Stop words
• Truncation
• Thesaurus
• Full text vs. Abstracts
• Vector model
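An inverted list whose postings carry word positions is enough to answer the "and", "or", "and not" and "within 5 words" queries. The sketch below is an assumption-laden toy: three tiny documents adapted from the figure, whitespace tokenization, no stop words and no truncation (so "cats" and "cat" are different terms), and invented function names.

```python
import re
from collections import defaultdict

docs = {
    "d1": "the cat is fat",
    "d2": "it was raining cats and dogs",
    "d3": "Fido the dog",
}

# Inverted lists: term -> {doc id -> [word positions]}; each position is a posting.
inverted = defaultdict(lambda: defaultdict(list))
for doc_id, text in docs.items():
    for pos, word in enumerate(re.findall(r"[a-z]+", text.lower())):
        inverted[word][doc_id].append(pos)

def docs_with(term):
    """Set of documents whose inverted list for term is non-empty."""
    return set(inverted.get(term, {}))

def within(term_a, term_b, k):
    """Documents where term_a and term_b occur within k words of each other."""
    hits = set()
    for d in docs_with(term_a) & docs_with(term_b):
        positions_a = inverted[term_a][d]
        positions_b = inverted[term_b][d]
        if any(abs(i - j) <= k for i in positions_a for j in positions_b):
            hits.add(d)
    return hits

cat_and_dog = docs_with("cat") & docs_with("dog")
cat_or_dog = docs_with("cat") | docs_with("dog")
cat_not_dog = docs_with("cat") - docs_with("dog")
```

Adding a field tag (Title, Author, Abstract) to each posting would support the "cat in title" query in the same way.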
Vector space model
          w1 w2 w3 w4 w5 w6 w7 …
DOC     = <1  0  0  1  1  0  0 …>
Query   = <0  0  1  1  0  0  0 …>
PRODUCT =  1 + ……. = score

• Tricks to weigh scores + normalize
e.g.: Match on common word not as useful as match on rare words...

• How to process V.S. Queries?
      w1 w2 w3 w4 w5 w6 …
Q = <0  0  0  1  1  0 …>

• Try Stanford Libraries
• Try Google, Yahoo, ...
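The score in the vector space model is a dot product of the document and query vectors. A minimal sketch using the DOC and Query vectors from the slide, truncated to seven terms; the optional `weights` argument is an assumption added to illustrate the "weigh scores" remark, not something the slides define.

```python
doc   = [1, 0, 0, 1, 1, 0, 0]   # DOC   = <1 0 0 1 1 0 0 ...>
query = [0, 0, 1, 1, 0, 0, 0]   # Query = <0 0 1 1 0 0 0 ...>

def score(doc_vec, query_vec, weights=None):
    """Dot product of two term vectors, optionally weighted per term."""
    if weights is None:
        weights = [1] * len(doc_vec)
    return sum(w * d * q for w, d, q in zip(weights, doc_vec, query_vec))
```

With unit weights only w4 matches, so the score is 1; giving rare terms a larger weight makes a match on them count for more.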
Summary so far
• Conventional index
– Basic Ideas: sparse, dense, multi-level…
– Duplicate Keys
– Deletion/Insertion
– Secondary indexes
– Buckets of Postings List

Conventional indexes
Advantage:
- Simple
- Index is sequential file, good for scans
Disadvantage:
- Inserts expensive, and/or
- Lose sequentiality & balance

Example
[Figure: Index (sequential), continuous blocks (10, 20, 30), (40, 50, 60), free space, (70, 80, 90)]

Example
[Figure: same Index (sequential) after inserts; the new keys 31, 32, 33, 34, 35, 36, 38, 39 end up in the free space or in an overflow area (not sequential)]

Outline:
• Conventional indexes
• B-Trees ⇒ NEXT
• Hashing schemes
• NEXT: Another type of index
– Give up on sequentiality of index
– Try to get “balance”

B+Tree Example    n=3
[Figure: Root (100); non-leaf nodes (30) and (120, 150, 180); leaves (3, 5, 11), (30, 35), (100, 101, 110), (120, 130), (150, 156, 179), (180, 200)]
Note: This index is called B+ tree, but Gradiance homeworks just call it B-tree.

Sample non-leaf
[Figure: node with keys 57, 81, 95 and four pointers: to keys k < 57, to keys 57 ≤ k < 81, to keys 81 ≤ k < 95, to keys k ≥ 95]

Sample leaf node:
[Figure: node with keys 57, 81, 95, reached from non-leaf node; pointers to record with key 57, to record with key 81, to record with key 95, and to next leaf in sequence]
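The pointer rule of the sample non-leaf node translates directly into a search routine. This is a sketch under assumptions: nodes are plain tuples `(kind, keys, pointers)`, record pointers are strings, and the sequence pointer between leaves is left out; none of these names come from the slides.

```python
import bisect

def child_index(keys, k):
    """Which pointer of a non-leaf node to follow for search key k.

    Pointer i leads to keys with keys[i-1] <= k < keys[i].
    """
    return bisect.bisect_right(keys, k)

def search(node, k):
    """node is ("leaf", keys, records) or ("inner", keys, children)."""
    kind, keys, ptrs = node
    while kind == "inner":
        kind, keys, ptrs = ptrs[child_index(keys, k)]
    # In the leaf, pointer i goes to the record with key keys[i].
    i = bisect.bisect_left(keys, k)
    if i < len(keys) and keys[i] == k:
        return ptrs[i]
    return None

# Left part of the example tree: non-leaf (30) over leaves (3, 5, 11) and (30, 35).
leaf1 = ("leaf", [3, 5, 11], ["r3", "r5", "r11"])
leaf2 = ("leaf", [30, 35], ["r30", "r35"])
inner = ("inner", [30], [leaf1, leaf2])
```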
In textbook’s notation    n=3
[Figure: the Leaf with keys 30, 35 and the Non-leaf with key 30, each redrawn in the textbook's node layout]

Size of nodes:
n+1 pointers
n keys
(fixed)

Don’t want nodes to be too empty
• Use at least
Non-leaf: ⌈(n+1)/2⌉ pointers
Leaf: ⌊(n+1)/2⌋ pointers to data

n=3
            Full node          min. node
Non-leaf    120, 150, 180      30
Leaf        3, 5, 11           30, 35
(a pointer counts even if null)

B+tree rules    tree of order n
(1) All leaves at same lowest level (balanced tree)
(2) Pointers in leaves point to records except for “sequence pointer”
(3) Number of pointers/keys for B+tree

                       Max ptrs   Max keys   Min ptrs→data   Min keys
Non-leaf (non-root)    n+1        n          ⌈(n+1)/2⌉       ⌈(n+1)/2⌉ - 1
Leaf (non-root)        n+1        n          ⌊(n+1)/2⌋       ⌊(n+1)/2⌋
Root                   n+1        n          1               1
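The occupancy table can be checked for a given n with a few lines of integer arithmetic. The sketch assumes the textbook's rounding (ceiling for non-leaf nodes, floor for leaves), which matters only when n is even; the function name and dictionary layout are invented for the example.

```python
def bplus_bounds(n):
    """Pointer/key limits for a B+tree whose nodes hold at most n keys."""
    ceil_half = (n + 2) // 2    # ceiling of (n+1)/2
    floor_half = (n + 1) // 2   # floor of (n+1)/2
    return {
        "non-leaf": {"max_ptrs": n + 1, "max_keys": n,
                     "min_ptrs": ceil_half, "min_keys": ceil_half - 1},
        "leaf":     {"max_ptrs": n + 1, "max_keys": n,
                     "min_ptrs_to_data": floor_half, "min_keys": floor_half},
        "root":     {"max_ptrs": n + 1, "max_keys": n,
                     "min_ptrs": 1, "min_keys": 1},
    }
```

For n=3 this gives a minimum non-leaf of 2 pointers and 1 key and a minimum leaf of 2 keys, matching the "min. node" drawings (30) and (30, 35).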
Insert into B+tree
(a) simple case
– space available in leaf
(b) leaf overflow
(c) non-leaf overflow
(d) new root

(a) Insert key = 32    n=3
[Figure: root (100); non-leaf (30); leaves (3, 5, 11) and (30, 31). Key 32 goes into the free slot of leaf (30, 31), giving (30, 31, 32)]

(b) Insert key = 7    n=3
[Figure: same tree. Leaf (3, 5, 11) is full, so it splits into (3, 5) and (7, 11), and key 7 is added to the parent, which becomes (7, 30)]
(c) Insert key = 160    n=3
[Figure: root (100); non-leaf (120, 150, 180); leaves ..., (150, 156, 179), (180, 200). Leaf (150, 156, 179) splits into (150, 156) and (160, 179); the full non-leaf then splits into (120, 150) and (180), and key 160 moves up into the root, which becomes (100, 160)]

(d) New root, insert 45    n=3
[Figure: root (10, 20, 30); leaves (1, 2, 3), (10, 12), (20, 25), (30, 32, 40). Leaf (30, 32, 40) splits into (30, 32) and (40, 45); the full root splits into (10, 20) and (40), and a new root (30) is created above them]
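The four insertion cases can be sketched with one recursive routine. This is a minimal illustration, not the textbook's algorithm verbatim: it ignores sequence pointers and duplicate keys, stores only keys in the leaves, and picks split points so that they reproduce the n=3 examples (c) and (d); `Node`, `insert` and `_insert` are made-up names.

```python
import bisect

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys            # sorted list of keys
        self.children = children    # None for a leaf, else list of child Nodes

def insert(root, key, n=3):
    """Insert key and return the (possibly new) root; n = max keys per node."""
    split = _insert(root, key, n)
    if split is None:
        return root
    separator, right = split
    return Node([separator], [root, right])      # case (d): new root

def _insert(node, key, n):
    """Insert below node; return (separator, new_right_node) if node split."""
    if node.children is None:
        bisect.insort(node.keys, key)
        if len(node.keys) <= n:
            return None                          # case (a): space in leaf
        mid = (len(node.keys) + 1) // 2          # left leaf keeps the first half
        right = Node(node.keys[mid:])
        node.keys = node.keys[:mid]
        return right.keys[0], right              # case (b): copy key up
    i = bisect.bisect_right(node.keys, key)
    split = _insert(node.children[i], key, n)
    if split is None:
        return None
    separator, right = split
    node.keys.insert(i, separator)
    node.children.insert(i + 1, right)
    if len(node.keys) <= n:
        return None
    mid = len(node.keys) // 2
    up = node.keys[mid]                          # case (c): middle key moves up
    new_right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys = node.keys[:mid]
    node.children = node.children[:mid + 1]
    return up, new_right

# Example (d): insert 45 into the tree with root (10, 20, 30).
leaves = [Node([1, 2, 3]), Node([10, 12]), Node([20, 25]), Node([30, 32, 40])]
new_root = insert(Node([10, 20, 30], leaves), 45)
```

Note the asymmetry: a leaf split copies its separator key up (the key stays in the leaf), while a non-leaf split moves the middle key up.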
Deletion from B+tree
(a) Simple case - no example
(b) Coalesce with neighbor (sibling)
(c) Re-distribute keys
(d) Cases (b) or (c) at non-leaf

(b) Coalesce with sibling    n=4
– Delete 50
[Figure: non-leaf (10, 40, 100); leaves (10, 20, 30) and (40, 50). After 50 is deleted, key 40 moves into the left sibling, giving (10, 20, 30, 40), and key 40 is removed from the parent]
(c) Redistribute keys    n=4
– Delete 50
[Figure: non-leaf (10, 40, 100); leaves (10, 20, 30, 35) and (40, 50). After 50 is deleted, key 35 moves to the right leaf, giving (10, 20, 30) and (35, 40), and the parent key 40 becomes 35]

(d) Non-leaf coalesce    n=4
– Delete 37
[Figure: root (25); non-leaf nodes (10, 20) and (30, 40); leaves (1, 3), (10, 14), (20, 22), (25, 26), (30, 37), (40, 45). After 37 is deleted, the leaf left with only 30 is coalesced with a sibling; its parent is then too empty and is coalesced with the non-leaf (10, 20), pulling key 25 down from the old root to form the new root]

B+tree deletions in practice
– Often, coalescing is not implemented
– Too hard and not worth it!
Comparison: B-trees vs. static indexed sequential file
Ref #1: Held & Stonebraker, “B-Trees Re-examined”, CACM, Feb. 1978

Ref # 1 claims:
- Concurrency control harder in B-Trees
- B-tree consumes more space

For their comparison:
block = 512 bytes
key = pointer = 4 bytes
4 data records per block

Example: 1 block static index
[Figure: index block with 127 keys k1, k2, k3, ..., each key standing for 1 data block]
(127+1) × 4 = 512 Bytes
-> pointers in index implicit!
up to 127 blocks

Example: 1 block B-tree
[Figure: index block with 63 keys k1, k2, ..., k63, each with a pointer to 1 data block, plus a "next" pointer]
63 × (4+4) + 8 = 512 Bytes
-> pointers needed in B-tree blocks because index is not contiguous
up to 63 blocks

Size comparison (Ref. #1)
Static Index                        B-tree
# data blocks           height      # data blocks              height
2 -> 127                2           2 -> 63                    2
128 -> 16,129           3           64 -> 3,969                3
16,130 -> 2,048,383     4           3,970 -> 250,047           4
                                    250,048 -> 15,752,961      5

Ref. #1 analysis claims
• For an 8,000 block file, after 32,000 inserts, after 16,000 lookups
⇒ Static index saves enough accesses to allow for reorganization
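The fanouts and the upper ends of the size ranges follow from the block layout. A small sketch of that arithmetic, assuming (as the byte counts on the slides suggest) one spare 4-byte slot in the static index block and 8 bytes of overhead in the B-tree block; the variable and function names are hypothetical.

```python
BLOCK = 512      # bytes per block
KEY = PTR = 4    # bytes per key and per pointer

# Static index: keys only, pointers implicit: (127+1) * 4 = 512.
static_fanout = BLOCK // KEY - 1
# B-tree block: key + pointer per entry, 8 bytes extra: 63 * (4+4) + 8 = 512.
btree_fanout = (BLOCK - 2 * PTR) // (KEY + PTR)

def max_data_blocks(fanout, height):
    """Largest number of data blocks reachable with the given height."""
    return fanout ** (height - 1)
```

Each extra level multiplies the reachable data blocks by the fanout, which is why the static index (fanout 127) needs one level fewer than the B-tree (fanout 63) for the same file size in the table.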
Ref. #1 conclusion
Static index better!!

Ref #2: M. Stonebraker, “Retrospection on a database system,” TODS, June 1980

Ref. #2 conclusion
B-trees better!!
• DBA does not know when to reorganize
• DBA does not know how full to load pages of new index
• Buffering
– B-tree: has fixed buffer requirements
– Static index: must read several overflow blocks to be efficient (large & variable size buffers needed for this)

• Speaking of buffering…
Is LRU a good policy for B+tree buffers?
⇒ Of course not!
⇒ Should try to keep root in memory at all times (and perhaps some nodes from second level)
Interesting problem:
For B+tree, how large should n be?
n is number of keys / node

Sample assumptions:
(1) Time to read node from disk is (S+Tn) msec.
(2) Once block in memory, use binary search to locate key: (a + b LOG2 n) msec.
For some constants a,b; Assume a << S
(3) Assume B+tree is full, i.e., # nodes to examine is LOGn N where N = # records

Can get:
f(n) = time to find a record
[Figure: plot of f(n) against n, with its minimum at nopt]

⇒ FIND nopt by f’(n) = 0
Answer is nopt = “few hundred” (see homework for details)
⇒ What happens to nopt as
• Disk gets faster?
• CPU gets faster?

Exercise
• f(n) = lognN*[S+T*n+a+b*log2(n)]
S=14000, T=0.2, b=0.002, a=0, N=10000000 (times in microseconds)
[Figure: plot of f(n) against n, N=10 million records]
[Figure: plot of f(n) against n, N=100 million records]
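The exercise can also be explored numerically instead of solving f’(n) = 0 by hand. The sketch below evaluates the exercise's f(n) with the sample constants and scans integer n for the minimum; the brute-force scan, the upper bound of 100,000 and the function names are choices made for illustration only.

```python
import math

def lookup_time(n, S=14000.0, T=0.2, a=0.0, b=0.002, N=10_000_000):
    """f(n) = log_n(N) * [S + T*n + a + b*log2(n)], times in microseconds."""
    levels = math.log(N) / math.log(n)
    return levels * (S + T * n + a + b * math.log2(n))

def best_n(n_max=100_000, **params):
    """Integer n in [2, n_max] that minimizes lookup_time."""
    return min(range(2, n_max + 1), key=lambda n: lookup_time(n, **params))

# How the optimum moves when the fixed disk cost S changes.
n_fast_disk = best_n(S=10000.0)
n_default = best_n()
n_slow_disk = best_n(S=20000.0)
```

Under these sample constants a smaller S (faster disk positioning) gives a smaller optimal n, which is one way to approach the "disk gets faster?" question; other cost models would give different numbers.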
[Figure: plots of f(n) against n for S=20000, S=14000 and S=10000; times in microseconds]

Variation on B+tree: B-tree (no +)
• Idea:
– Avoid duplicate keys
– Have record pointers in non-leaf nodes

[Figure: B-tree non-leaf node "K1 P1 | K2 P2 | K3 P3": P1, P2, P3 go to record with K1, with K2, with K3; the tree pointers go to keys < K1, to keys K1<x<K2, to keys K2<x<K3, to keys > K3]

B-tree example    n=2
[Figure: root (65, 125); non-leaf nodes (25, 45), (85, 105), (145, 165); leaves (10, 20), (30, 40), (50, 60), (70, 80), (90, 100), (110, 120), (130, 140), (150, 160), (170, 180)]
• sequence pointers not useful now! (but keep space for simplicity)

Note on inserts
• Say we insert record with key = 25    n=3
[Figure: leaf (10, 20, 30)]
• Afterwards:
[Figure: key 25 moves up to the parent; the leaf is split into (10, 20) and (30)]

So, for B-trees:
                        MAX                              MIN
                        Tree Ptrs  Rec Ptrs  Keys        Tree Ptrs    Rec Ptrs       Keys
Non-leaf non-root       n+1        n         n           ⌈(n+1)/2⌉    ⌈(n+1)/2⌉-1    ⌈(n+1)/2⌉-1
Leaf non-root           1          n         n           1            ⌊n/2⌋          ⌊n/2⌋
Root non-leaf           n+1        n         n           2            1              1
Root Leaf               1          n         n           1            1              1
Tradeoffs:
• B-trees have faster lookup than B+trees
• in B-tree, non-leaf & leaf different sizes
• in B-tree, deletion more complicated
⇒ B+trees preferred!

But note:
• If blocks are fixed size (due to disk and buffering restrictions)
Then lookup for B+tree is actually better!!

Example:
- Pointers: 4 bytes
- Keys: 4 bytes
- Blocks: 100 bytes (just example)
- Look at full 2 level tree

B-tree:
Root has 8 keys + 8 record pointers + 9 son pointers
= 8x4 + 8x4 + 9x4 = 100 bytes
Each of 9 sons: 12 rec. pointers (+12 keys)
= 12x(4+4) + 4 = 100 bytes
2-level B-tree, Max # records = 12x9 + 8 = 116

B+tree:
Root has 12 keys + 13 son pointers
= 12x4 + 13x4 = 100 bytes
Each of 13 sons: 12 rec. ptrs (+12 keys)
= 12x(4+4) + 4 = 100 bytes
2-level B+tree, Max # records = 13x12 = 156

So...
[Figure: B+ tree with 13 leaves holding 156 records; B tree with 8 records in the root and 9 leaves holding 108 records, Total = 116]
• Conclusion:
– For fixed block size,
– B+ tree is better because it is bushier
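The 116 versus 156 comparison is plain block arithmetic, sketched here with the example's sizes. The variable names are invented, and the "one extra pointer per node" accounting is an assumption read off the byte counts on the slides.

```python
BLOCK, KEY, PTR = 100, 4, 4   # bytes (example sizes from the slides)

# B-tree non-leaf: k keys + k record pointers + (k+1) son pointers.
b_root_keys = (BLOCK - PTR) // (KEY + 2 * PTR)
# Leaf of either tree: k keys + k record pointers + one extra pointer.
leaf_keys = (BLOCK - PTR) // (KEY + PTR)
# B+tree non-leaf: k keys + (k+1) son pointers.
bplus_root_keys = (BLOCK - PTR) // (KEY + PTR)

# Full 2-level trees: the B-tree also stores records at the root.
b_tree_records = b_root_keys + (b_root_keys + 1) * leaf_keys
bplus_records = (bplus_root_keys + 1) * leaf_keys
```

The record pointers in the B-tree root cost fanout (9 sons instead of 13), and with fixed-size blocks that loss outweighs the 8 records stored at the root.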
An Interesting Problem...
• What is a good index structure when:
– records tend to be inserted with keys that are larger than existing values? (e.g., banking records with growing date/time)
– we want to remove older data

One Solution: Multiple Indexes
• Example: I1, I2

day    days indexed I1    days indexed I2
10     1,2,3,4,5          6,7,8,9,10
11     11,2,3,4,5         6,7,8,9,10
12     11,12,3,4,5        6,7,8,9,10
13     11,12,13,4,5       6,7,8,9,10

• advantage: deletions/insertions from smaller index
• disadvantage: query multiple indexes

Another Solution (Wave Indexes)

day    I1          I2       I3       I4
10     1,2,3       4,5,6    7,8,9    10
11     1,2,3       4,5,6    7,8,9    10,11
12     1,2,3       4,5,6    7,8,9    10,11,12
13     13          4,5,6    7,8,9    10,11,12
14     13,14       4,5,6    7,8,9    10,11,12
15     13,14,15    4,5,6    7,8,9    10,11,12
16     13,14,15    16       7,8,9    10,11,12

• advantage: no deletions
• disadvantage: approximate windows

Outline/summary
• Conventional Indexes
  • Sparse vs. dense
  • Primary vs. secondary
• B trees
  • B+trees vs. B-trees
  • B+trees vs. indexed sequential
• Hashing schemes --> Next