Data-intensive Computing Systems Operators for Data Access (contd.) Shivnath Babu

Insertion in a B-Tree (n=2)
[Figure sequence, keys as recoverable from the slides:
 Start: root (49); leaves (15, 36) and (49)
 Insert 62: goes into the leaf (49), giving (49, 62)
 Insert 50: the leaf (49, 50, 62) overflows and splits into (49, 50) and (62); the root becomes (49, 62)
 Insert 75: goes into the leaf (62), giving (62, 75)]

Insertion
Insertion: Primitives
• Inserting into a leaf node
• Splitting a leaf node
• Splitting an internal node
• Splitting the root node

Inserting into a Leaf Node
[Figure sequence, keys as recoverable from the slides: key 58 is inserted into the leaf (54, 57, 60, 62), giving (54, 57, 58, 60, 62)]

Splitting a Leaf Node
[Figure sequence, keys as recoverable from the slides:
 The leaf between parent keys 54 and 66 holds (54, 57, 58, 60, 62)
 Insert 61: the leaf overflows and splits into (54, 57, 58) and (60, 61, 62)
 Insert 59: goes into the left leaf, giving (54, 57, 58, 59)]

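The two leaf primitives can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a leaf is modelled as a sorted list of keys with no record pointers, `insert_into_leaf` is a hypothetical helper name, and the split rule (the left leaf keeps the first ceil((n+1)/2) keys) is an assumption that happens to agree with the keys shown in the slides.

```python
from bisect import insort
from math import ceil


def insert_into_leaf(keys, key, n):
    """Insert key into a sorted leaf that may hold at most n keys.

    Returns (left, right, separator). When the leaf does not overflow,
    right and separator are None and left is the updated leaf.
    """
    keys = list(keys)      # work on a copy of the leaf
    insort(keys, key)      # keep the keys in sorted order
    if len(keys) <= n:
        return keys, None, None
    # Assumed split rule: the left leaf keeps the first ceil((n+1)/2) keys.
    cut = ceil((n + 1) / 2)
    left, right = keys[:cut], keys[cut:]
    # The smallest key of the new right leaf is copied up to the parent.
    return left, right, right[0]
```

For example, with n=5, `insert_into_leaf([54, 57, 58, 60, 62], 61, 5)` splits the leaf into (54, 57, 58) and (60, 61, 62) and hands 60 to the parent as the new separator.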
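Splitting an Internal Node
[Figure sequence, keys as recoverable from the slides:
 Parent (…, 21, 99, …); the internal node below it holds (40, 54, 66, 74, 84)
 A child split into the ranges [54, 59) and [59, 66) sends key 59 up into the node, which overflows
 The node splits into (40, 54, 59) and (74, 84); key 66 moves up to the parent
 Parent becomes (…, 21, 66, 99, …) with child ranges [21, 66) and [66, 99)]
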
Splitting the Root
[Figure sequence, keys as recoverable from the slides:
 The root holds (40, 54, 66, 74, 84); key 59 arrives from a child split into [54, 59) and [59, 66)
 The root splits into (40, 54, 59) and (74, 84); a new root holding only 66 is created above them]

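The internal-node and root splits differ from the leaf split in that the middle key moves up instead of being copied. A minimal sketch, assuming nodes are plain lists of keys and child labels, and assuming the split rule "left keeps the first ceil(n/2) keys, the next key moves up" (both helper names are hypothetical):

```python
from math import ceil


def split_internal(keys, children, n):
    """Split an overfull internal node with n+1 keys and n+2 children.

    Returns (left_keys, left_children, up_key, right_keys, right_children).
    """
    assert len(keys) == n + 1 and len(children) == n + 2
    cut = ceil(n / 2)          # assumed: left node keeps ceil(n/2) keys
    up_key = keys[cut]         # middle key moves up, it is not kept below
    return (keys[:cut], children[:cut + 1],
            up_key,
            keys[cut + 1:], children[cut + 1:])


def split_root(keys, children, n):
    """Split an overfull root; the tree grows by one level.

    Returns (new_root_keys, new_root_children), where each child is a
    (keys, children) pair.
    """
    lk, lc, up_key, rk, rc = split_internal(keys, children, n)
    return [up_key], [(lk, lc), (rk, rc)]
```

With n=5 and the keys from the slides, `split_internal([40, 54, 59, 66, 74, 84], ["c0", "c1", "c2", "c3", "c4", "c5", "c6"], 5)` leaves (40, 54, 59) on the left, (74, 84) on the right, and moves 66 up.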
Deletion
[Figure sequence not recoverable; surviving label: redistribute]

Deletion - II
[Figure sequence not recoverable; surviving labels: merge, not needed]

Deletion: Primitives
• Delete key from a leaf
• Redistribute keys between sibling leaves
• Merge a leaf into its sibling
• Redistribute keys between two sibling internal nodes
• Merge an internal node into its sibling

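Merge Leaf into Sibling
[Figure sequence, keys as recoverable from the slides:
 Parent (…, 67, 85); sibling leaves (54, 58, 64) and (68, 72, 75)
 Delete 72: the right leaf is left with (68, 75)
 The remaining keys move into the left sibling, giving (54, 58, 64, 68, 75)
 The separator 67 is removed from the parent, leaving (…, 85)]
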
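The leaf-level deletion primitives (delete, redistribute, merge) can be sketched as one decision procedure. This is an illustration under assumptions, not the lecture's algorithm: leaves are sorted Python lists, only the left sibling is considered, the minimum leaf occupancy is taken to be floor((n+1)/2) keys, and `delete_from_leaf` is a hypothetical name.

```python
def delete_from_leaf(leaf, left_sibling, key, n):
    """Delete key from leaf and repair an underflow using the left sibling.

    Returns (left_sibling, leaf, action) where action is "none",
    "redistribute", or "merge". After a merge the leaf is None.
    """
    min_keys = (n + 1) // 2               # assumed minimum keys per leaf
    leaf = [k for k in leaf if k != key]  # delete the key
    left = list(left_sibling)
    if len(leaf) >= min_keys:
        return left, leaf, "none"         # leaf is still full enough
    if len(left) > min_keys:
        # The sibling can spare a key: borrow its largest one.
        # The parent's separator would then have to become leaf[0].
        leaf.insert(0, left.pop())
        return left, leaf, "redistribute"
    # The sibling cannot spare a key: move everything into it.
    # The parent would then drop the separator between the two leaves.
    return left + leaf, None, "merge"
```

With n=5, deleting 72 from (68, 72, 75) next to the sibling (54, 58, 64) cannot redistribute, because the sibling is already at the assumed minimum of 3 keys, so the leaves merge into (54, 58, 64, 68, 75).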
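Merge Internal Node into Sibling
[Figure sequence, keys as recoverable from the slides:
 Parent (…, 59, …); sibling internal nodes (41, 48, 52) and (63, 74), with child ranges [52, 59) and [59, 63)
 After 74 is removed, the right node holds only 63 and is merged into its sibling
 The separator 59 moves down from the parent, giving the merged node (41, 48, 52, 59, 63)]
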
B-Tree Roadmap
• B-Tree
  - Recap
  - Insertion (recap)
  - Deletion
  - Construction
  - Efficiency
• B-Tree variants
• Hash-based Indexes

Question
How does insertion-based construction perform?

B-Tree Construction
[Figure sequence, keys as recoverable from the slides:
 Sort: 48, 57, 41, 15, 75, 21, 62, 34, 81, 11, 97, 13 becomes 11, 13, 15, 21, 34, 41, 48, 57, 62, 75, 81, 97
 Scan: the sorted keys fill the leaves (11, 13, 15), (21, 34, 41), (48, 57, 62), (75, 81, 97)
 Scan: the parent level receives the keys 21, 48, 75]

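The sort-then-scan construction above can be sketched as follows. The sketch is a simplification under assumptions: it builds only the leaf level and one level of separators, it packs a fixed number of keys per leaf (3, as the figure suggests), and `bulk_build_level` is a hypothetical name.

```python
def bulk_build_level(keys, keys_per_leaf):
    """Sort the keys, pack them into leaves, and derive the parent keys.

    Returns (leaves, separators); each separator is the first key of
    every leaf except the leftmost one.
    """
    ordered = sorted(keys)                       # the "Sort" step
    leaves = [ordered[i:i + keys_per_leaf]       # the "Scan" step
              for i in range(0, len(ordered), keys_per_leaf)]
    separators = [leaf[0] for leaf in leaves[1:]]
    return leaves, separators


leaves, separators = bulk_build_level(
    [48, 57, 41, 15, 75, 21, 62, 34, 81, 11, 97, 13], keys_per_leaf=3)
```

Each leaf is written once, in order, during a single sequential scan; no leaf is revisited or split.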
B-Tree Construction
Why is sort-based construction better than insertion-based construction?

Cost of B-Tree Operations
• Height of B-Tree: H
• Assume no duplicates
• Question: what is the random I/O cost of:
  - Insertion:
  - Deletion:
  - Equality search:
  - Range search:

Height of B-Tree
• Number of keys: N
• B-Tree parameter: n
Height $\approx \log_n N = \frac{\log N}{\log n}$
In practice: 2-3 levels

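To make the height estimate concrete, the approximation Height ≈ log_n N can be evaluated for sample values. The values of N and n below are illustrative assumptions, not figures from the lecture, and `btree_levels` is a hypothetical helper; it computes ceil(log_n N) with integer arithmetic to avoid floating-point rounding.

```python
def btree_levels(num_keys, n):
    """Smallest number of levels h such that n**h >= num_keys."""
    levels, capacity = 0, 1
    while capacity < num_keys:
        capacity *= n      # each extra level multiplies reach by about n
        levels += 1
    return levels


# Assumed sample values: a fan-out of about 100 keys per node.
levels_for_10k = btree_levels(10_000, 100)       # 2 levels
levels_for_1m = btree_levels(1_000_000, 100)     # 3 levels
```

Under these assumed values the estimate lands in the 2-3 level range quoted above.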
Question: How do you pick parameter n?
1. Ignore inserts and deletes
2. Optimize for equality searches
3. Assume no duplicates
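One common way to approach this question, offered here only as a sketch and not necessarily the answer intended in the lecture, is to choose the largest n for which a node with n keys and n+1 pointers still fits in one disk block, so that each level of the tree costs one block read. The block, key, and pointer sizes below are assumptions chosen for illustration, and `max_keys_per_block` is a hypothetical name.

```python
def max_keys_per_block(block_bytes, key_bytes, pointer_bytes):
    """Largest n with n*key_bytes + (n+1)*pointer_bytes <= block_bytes."""
    return (block_bytes - pointer_bytes) // (key_bytes + pointer_bytes)


# Assumed sizes: 4096-byte blocks, 4-byte keys, 8-byte pointers.
n = max_keys_per_block(block_bytes=4096, key_bytes=4, pointer_bytes=8)
```

With these assumed sizes, n comes out to 340: 340 keys and 341 pointers occupy 4088 bytes, while 341 keys and 342 pointers would need 4100 bytes.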
Roadmap
• B-Tree
• B-Tree variants
  - Sparse Index
  - Duplicate Keys
• Hash-based Indexes

Roadmap
• B-Tree
• B-Tree variants
• Hash-based Indexes
  - Static Hash Table
  - Extensible Hash Table
  - Linear Hash Table

Hash-Based Indexes
• Adaptations of main memory hash tables
• Support equality searches
• No range searches

Indexing Problem (recap)
[Figure: index keys a1, a2, …, ai, …, an, each with record pointers; the lookup condition is A = val]

Main Memory Hash Table
h(key) = key % 8
[Figure: key → h(key) → buckets 0-7, each bucket a linked list of keys ending in (null):
 0: 32, 48
 1: (null)
 2: 10
 3: 27, 75
 4: (null)
 5: 21
 6: (null)
 7: 55]

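The table in the figure can be reproduced with a short sketch. It uses the slide's hash function h(key) = key % 8 and the keys shown; Python lists stand in for the linked lists, and the helper names `build_table` and `contains` are hypothetical.

```python
NUM_BUCKETS = 8


def h(key):
    """Hash function from the slide: h(key) = key % 8."""
    return key % NUM_BUCKETS


def build_table(keys):
    """Place each key in the chain of the bucket it hashes to."""
    buckets = [[] for _ in range(NUM_BUCKETS)]   # empty chain = (null)
    for key in keys:
        buckets[h(key)].append(key)
    return buckets


def contains(buckets, key):
    """Equality search: only the one bucket h(key) is examined."""
    return key in buckets[h(key)]


table = build_table([32, 48, 10, 27, 75, 21, 55])
```

An equality search touches exactly one chain. A range search gets no help from the structure, because consecutive keys hash to unrelated buckets.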
Adapting to disk
• 1 Hash Bucket = 1 Block
  - All keys that hash to a bucket are stored in the block
  - Intuition: keys in a bucket are usually accessed together
  - No need for linked lists of keys …

Adapting to Disk
How do we handle this?
Adapting to disk
• 1 Hash Bucket = 1 Block
  - … but need linked list of blocks (overflow blocks)

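A small simulation shows how overflow blocks affect lookup cost. It is a sketch under assumptions: blocks are modelled as Python lists, each block holds at most 2 keys (an arbitrary illustrative capacity), every block visited counts as one I/O, and the class and method names are hypothetical.

```python
class DiskHashTable:
    """Static hash table where each bucket is a chain of fixed-size blocks."""

    def __init__(self, num_buckets, block_capacity=2):
        self.block_capacity = block_capacity     # assumed keys per block
        # Each bucket starts with one empty primary block.
        self.buckets = [[[]] for _ in range(num_buckets)]

    def insert(self, key):
        chain = self.buckets[key % len(self.buckets)]
        if len(chain[-1]) == self.block_capacity:
            chain.append([])                     # allocate an overflow block
        chain[-1].append(key)

    def lookup(self, key):
        """Return (found, blocks_read) for an equality search."""
        chain = self.buckets[key % len(self.buckets)]
        for blocks_read, block in enumerate(chain, start=1):
            if key in block:
                return True, blocks_read
        return False, len(chain)                 # whole chain was scanned


t = DiskHashTable(num_buckets=8)
for k in (3, 11, 19, 27):                        # all hash to bucket 3
    t.insert(k)
```

Here bucket 3 needs one overflow block, so finding 27 costs two block reads, and an unsuccessful search in that bucket also reads both blocks.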
Adapting to disk
• Bucket Id → Disk Address mapping
  - Contiguous blocks
  - Store mapping in main memory
  - Too large?
  - Dynamic → Linear and Extensible hash tables

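For the contiguous-blocks option, the mapping from bucket id to disk address reduces to arithmetic, so no table has to be kept in memory. A minimal sketch, assuming byte addresses and a fixed block size (the function name and the sample numbers are hypothetical):

```python
def bucket_address(bucket_id, base_address, block_size):
    """Disk address of a bucket when buckets occupy contiguous blocks."""
    return base_address + bucket_id * block_size


# Assumed layout: buckets start at byte 40960, blocks are 4096 bytes.
first = bucket_address(0, base_address=40960, block_size=4096)
fourth = bucket_address(3, base_address=40960, block_size=4096)
```

The formula only works while the number of buckets is fixed and the region is allocated up front, which is the limitation that the dynamic schemes (linear and extensible hash tables) address.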
Beware of claims that assume 1 I/O for hash tables and 3 I/Os for B-Trees!