Ch4.6 B+ Tree

§6 B+ Trees
【Definition】A B+ tree of order M is a tree with the following
structural properties:
(1) The root is either a leaf or has between 2 and M children.
(2) All nonleaf nodes (except the root) have between ⌈M/2⌉ and
M children.
(3) All leaves are at the same depth.
Assume each nonroot leaf also has between ⌈M/2⌉ and M children (i.e., keys).
All the actual data are stored at the leaves.
Each interior node contains M pointers to the children, and M − 1 key values:
the smallest key in each of its subtrees except the 1st one.

A B+ tree of order 4 (2-3-4 tree):
  Root:      [21 48 72]
  Interior:  [12 15]  [25 31 41]  [59]  [84 91]
  Leaves:    [1,4,8,11] [12,13] [15,18,19] | [21,24] [25,26] [31,38] [41,43,46] | [48,49,50] [59,68] | [72,78] [84,88] [91,92,99]
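A B+ tree of order 3 (2-3 tree):
  Root:      [22:-]
  Interior:  [16:-]  [41:58]
  Leaves:    [8,11,12] [16,17] | [22,23,31] [41,52] [58,59,61]

 Find: 52   (22 ≤ 52, so go to [41:58]; 41 ≤ 52 < 58, so take its middle child and find 52 in leaf [41,52].)
 Insert: 18 (leaf [16,17] becomes [16,17,18]; no split is needed.)
 Insert: 1  (leaf [1,8,11,12] overflows and splits into [1,8] and [11,12]; the parent [16:-] becomes [11:16].)
 Insert: 19 (leaf [16,17,18,19] splits into [16,17] and [18,19]; the parent would become 11:16:18, so it splits into [11:-] and [18:-] and passes 16 up; the root becomes [16:22].)
 Insert: 28 (leaf [22,23,28,31] splits into [22,23] and [28,31]; the parent 28:41:58 splits into [28:-] and [58:-] and passes 41 up; the root 16:22:41 splits into [16:-] and [41:-], and a new root [22:-] is created.)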
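To make the descent in Find: 52 concrete, here is a minimal C sketch of the step repeated at every interior node. It assumes integer keys, children numbered from 0, and the convention from the definition that key i is the smallest key in subtree i + 1; the name choose_child is hypothetical.

```c
#include <assert.h>

/* Given the keys of an interior node (keys[i] = smallest key in subtree i + 1),
   return the index of the child subtree that may contain x. */
int choose_child(const int keys[], int nkeys, int x)
{
    assert(nkeys >= 0);
    int i = 0;
    while (i < nkeys && x >= keys[i])  /* skip past every separator <= x */
        i++;
    return i;
}
```

With the root keys {22} it returns 1 (the right child), and with {41, 58} it returns 1 again (the middle child), so the search ends in the leaf [41,52].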
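 Insert: 70
Tree before inserting 70 (the result of Insert: 28):
  Root:                [22:-]
  Interior (depth 1):  [16:-]  [41:-]
  Interior (depth 2):  [11:-]  [18:-] | [28:-]  [58:-]
  Leaves:              [1,8] [11,12] | [16,17] [18,19] | [22,23] [28,31] | [41,52] [58,59,61]
First find a sibling with 2 keys and adjust. Keep more nodes full.
Here the leaf [58,59,61,70] overflows, but its sibling [41,52] has only 2 keys, so 58 moves into it: the leaves become [41,52,58] and [59,61,70], and the key in their parent changes from 58 to 59. No split is needed.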
 Deletion is similar to insertion except that the root
is removed when it loses two children.
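The sibling adjustment used for Insert: 70 can be sketched in C as follows. This assumes integer keys, a leaf capacity of cap keys (cap = M), and that left and right are adjacent leaves under the same parent; the name shift_to_left_sibling is hypothetical, and the mirror-image move into a right sibling is omitted.

```c
#include <assert.h>

/* Before splitting an overflowing leaf, try to hand its smallest key to the
   left sibling. cap is the most keys a leaf may hold (M for order M).
   On success, returns 1 and stores in *sep the new separator key the parent
   must use between the two leaves; returns 0 if the sibling is full. */
int shift_to_left_sibling(int left[], int *nleft, int right[], int *nright,
                          int cap, int *sep)
{
    assert(*nright > 1);               /* the overflowing leaf keeps at least one key */
    if (*nleft >= cap)
        return 0;                      /* sibling is full: fall back to splitting */
    left[(*nleft)++] = right[0];       /* smallest key of the right leaf moves left */
    for (int i = 1; i < *nright; i++)  /* close the gap in the right leaf */
        right[i - 1] = right[i];
    (*nright)--;
    *sep = right[0];                   /* new smallest key of the right leaf */
    return 1;
}
```

For Insert: 70, left = {41, 52} and right = {58, 59, 61, 70} with cap = 3 give {41, 52, 58}, {59, 61, 70}, and the separator 59.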
For a general B+ tree of order M

Btree  Insert ( ElementType X,  Btree T )
{
    Search from root to leaf for X and find the proper leaf node;
    Insert X;
    while ( this node has M+1 keys (leaf) or M+1 children (interior node) ) {
        split it into 2 nodes with ⌈(M+1)/2⌉ and ⌊(M+1)/2⌋ keys (children), respectively;
        if ( this node is the root )
            create a new root with two children;
        check its parent;
    }
}

Home work: p.138 4.36 (Access a 2-3 tree)
Discussion 7: Depth(M, N) = ? Tinsert = ? Tfind = ?
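The Insert pseudocode above can be turned into a compact C sketch. It assumes integer keys with no duplicates, a compile-time order M ≥ 3, leaves that hold at most M keys, and parent pointers for the "check its parent" step. Node, new_node, bp_insert, bp_find, and bp_free are hypothetical names, and the linked list between leaves that many B+ tree implementations keep is left out.

```c
#include <assert.h>
#include <stdlib.h>

#define M 3   /* order of the tree (assumed >= 3); 3 gives the 2-3 tree of the examples */

typedef struct Node {
    int leaf;                   /* 1 for a leaf, 0 for an interior node */
    int n;                      /* number of keys stored */
    int key[M + 1];             /* spare slot holds an overflow key until the split */
    struct Node *child[M + 1];  /* interior nodes: n + 1 children */
    struct Node *parent;
} Node;

static Node *new_node(int leaf)
{
    Node *p = calloc(1, sizeof *p);
    assert(p != NULL);
    p->leaf = leaf;
    return p;
}

/* key[i] is the smallest key under child[i + 1]: follow child[#keys <= x]. */
static Node *find_leaf(Node *t, int x)
{
    while (!t->leaf) {
        int i = 0;
        while (i < t->n && x >= t->key[i]) i++;
        t = t->child[i];
    }
    return t;
}

int bp_find(Node *root, int x)
{
    Node *leaf = find_leaf(root, x);
    for (int i = 0; i < leaf->n; i++)
        if (leaf->key[i] == x) return 1;
    return 0;
}

/* Inserts x (assumed absent) and returns the root, which may be new. */
Node *bp_insert(Node *root, int x)
{
    Node *t = find_leaf(root, x);
    int i = t->n;
    for (; i > 0 && t->key[i - 1] > x; i--) t->key[i] = t->key[i - 1];
    t->key[i] = x;
    t->n++;
    /* Overflow: a leaf with M + 1 keys, or an interior node with M + 1 children. */
    while (t->n == (t->leaf ? M + 1 : M)) {
        Node *s = new_node(t->leaf);
        int keep = (M + 2) / 2;            /* ceil((M+1)/2) keys or children stay in t */
        int up;                            /* separator key passed to the parent */
        if (t->leaf) {
            s->n = t->n - keep;
            for (int j = 0; j < s->n; j++) s->key[j] = t->key[keep + j];
            t->n = keep;
            up = s->key[0];                /* copied up: it also stays in the leaf */
        } else {
            up = t->key[keep - 1];         /* moved up: it leaves this node */
            s->n = t->n - keep;
            for (int j = 0; j < s->n; j++) s->key[j] = t->key[keep + j];
            for (int j = 0; j <= s->n; j++) {
                s->child[j] = t->child[keep + j];
                s->child[j]->parent = s;
            }
            t->n = keep - 1;
        }
        Node *p = t->parent;
        if (p == NULL) {                   /* t is the root: create a new root */
            p = new_node(0);
            p->child[0] = t;
            t->parent = p;
            root = p;
        }
        int k = 0;
        while (p->child[k] != t) k++;
        for (int j = p->n; j > k; j--) {   /* open a slot right after t */
            p->key[j] = p->key[j - 1];
            p->child[j + 1] = p->child[j];
        }
        p->key[k] = up;
        p->child[k + 1] = s;
        s->parent = p;
        p->n++;
        t = p;                             /* check its parent */
    }
    return root;
}

void bp_free(Node *t)
{
    if (!t->leaf)
        for (int i = 0; i <= t->n; i++) bp_free(t->child[i]);
    free(t);
}
```

Starting from an empty leaf (new_node(1)) and inserting 1, 2, 3, 4 with M = 3, the fourth key overflows the leaf, which splits into [1,2] and [3,4] under a new root [3:-].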
Research Project 3
Family of B Trees (23)
In computer science, there is a family of B trees:
B- trees, B+ trees, B* trees, B# trees, and Bx-trees. They
are tree data structures that keep data sorted and allow
searches, insertions, and deletions in logarithmic
(amortized) time.
In this project, you are supposed to introduce the Btrees and compare them with B+ trees.
Detailed requirements can be downloaded from
http://acm.zju.edu.cn/dsaa/
Research Project 4
Tries (23)
A trie is an index structure that is particularly useful
when the keys vary in length. It is also called a prefix tree,
and is used to store an associative array where the keys are
usually strings.
In this project, you are supposed to introduce the tries
and compare them with ordinary binary search trees.
Detailed requirements can be downloaded from
http://acm.zju.edu.cn/dsaa/
Inverted File Index
How can I find which retrieved web pages include
"Computer Science"?
 Solution 1: Scan each page for the string "Computer
Science".
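Solution 1 amounts to a substring search over every page. A minimal C sketch using the standard strstr, which is case-sensitive; the function name scan_pages is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Brute force: scan every page for the phrase. Writes the indices of the
   matching pages into hits[] and returns how many pages matched. */
size_t scan_pages(const char *pages[], size_t npages, const char *phrase,
                  size_t hits[])
{
    assert(phrase != NULL);
    size_t nhits = 0;
    for (size_t i = 0; i < npages; i++)
        if (strstr(pages[i], phrase) != NULL)
            hits[nhits++] = i;
    return nhits;
}
```

Every query rescans all of the text, so the cost grows with the total size of the collection.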
How does Google do it?
 Solution 2: Inverted File Index
【Definition】 An index is a mechanism for locating a given
term in a text.
【Definition】 An inverted file contains, for each term, a list of pointers (e.g.
the number of a page) to all occurrences of that term in the
text.
〖Example〗 Document sets (sample query: silver truck)

Doc  Text
1    Gold silver truck
2    Shipment of gold damaged in a fire
3    Delivery of silver arrived in a silver truck
4    Shipment of gold arrived in a truck

Inverted File Index
No.  Term       Times; Documents
1    a          <3; 2,3,4>
2    arrived    <2; 3,4>
3    damaged    <1; 2>
4    delivery   <1; 3>
5    fire       <1; 2>
6    gold       <3; 1,2,4>
7    of         <3; 2,3,4>
8    in         <3; 2,3,4>
9    shipment   <2; 2,4>
10   silver     <2; 1,3>
11   truck      <3; 1,3,4>
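A minimal C sketch of how the table above could be built and queried. Judging from the table, Times counts documents rather than occurrences (silver appears three times but has Times 2), so each document is recorded once per term. The sketch assumes ASCII text, lower-cased words, and small fixed limits; Posting, get_posting, index_document, and intersect are hypothetical names, and the linear term lookup is only a stand-in for a real access method.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MAX_TERMS 64
#define MAX_DOCS  16
#define MAX_WORD  32

typedef struct {
    char term[MAX_WORD];
    int times;              /* number of documents that contain the term */
    int docs[MAX_DOCS];     /* those document numbers, in increasing order */
} Posting;

static Posting postings[MAX_TERMS];
static int nterms = 0;

/* Linear search over the term list; creates an empty entry if asked to. */
Posting *get_posting(const char *w, int create)
{
    for (int i = 0; i < nterms; i++)
        if (strcmp(postings[i].term, w) == 0) return &postings[i];
    if (!create) return NULL;
    assert(nterms < MAX_TERMS && strlen(w) < MAX_WORD);
    Posting *p = &postings[nterms++];
    strcpy(p->term, w);
    p->times = 0;
    return p;
}

/* Adds every word of document `doc` (lower-cased, truncated to MAX_WORD - 1
   letters). Documents must be added in increasing order of doc. */
void index_document(int doc, const char *text)
{
    char w[MAX_WORD];
    int len = 0;
    for (const char *c = text; ; c++) {
        if (isalpha((unsigned char)*c)) {
            if (len < MAX_WORD - 1) w[len++] = (char)tolower((unsigned char)*c);
            continue;
        }
        if (len > 0) {                      /* a word just ended */
            w[len] = '\0';
            len = 0;
            Posting *p = get_posting(w, 1);
            if (p->times == 0 || p->docs[p->times - 1] != doc) {
                assert(p->times < MAX_DOCS);
                p->docs[p->times++] = doc;  /* record each document once */
            }
        }
        if (*c == '\0') break;
    }
}

/* Documents containing both terms: merge the two increasing lists. */
int intersect(const Posting *a, const Posting *b, int out[])
{
    int i = 0, j = 0, n = 0;
    while (i < a->times && j < b->times) {
        if (a->docs[i] < b->docs[j]) i++;
        else if (a->docs[i] > b->docs[j]) j++;
        else { out[n++] = a->docs[i]; i++; j++; }
    }
    return n;
}
```

For the query silver truck, intersect merges <2; 1,3> with <3; 1,3,4> and returns documents 1 and 3.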
Discussion 8: How can we easily print the sentences
that contain the query words, with those words highlighted?
 Word Stemming
Process a word so that only its stem or root form is left.
〖Example〗 processing, processes, processed → process
            says, said, saying → say
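A toy C sketch of suffix stripping, just enough to reproduce the example above. The suffix list, the one-entry table of irregular forms, and the name toy_stem are assumptions for illustration; real stemmers (for example the Porter stemmer) use many more rules, and this sketch mangles words such as "arrives".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Irregular forms that suffix stripping cannot handle (tiny, illustrative). */
static const char *irregular[][2] = { { "said", "say" } };
static const char *suffixes[] = { "ing", "ed", "es", "s" };

/* Writes a crude stem of the lower-case word into out (outsize bytes). */
void toy_stem(const char *word, char *out, size_t outsize)
{
    size_t n = strlen(word);
    assert(n < outsize);
    for (size_t i = 0; i < sizeof irregular / sizeof irregular[0]; i++)
        if (strcmp(word, irregular[i][0]) == 0) {
            strcpy(out, irregular[i][1]);  /* these stems are no longer than the word */
            return;
        }
    strcpy(out, word);
    for (size_t i = 0; i < sizeof suffixes / sizeof suffixes[0]; i++) {
        size_t k = strlen(suffixes[i]);
        if (n > k + 2 && strcmp(word + n - k, suffixes[i]) == 0) {
            if (k == 1 && word[n - 2] == 's')
                break;                     /* keep "ss" endings such as "process" */
            out[n - k] = '\0';             /* strip the suffix, keeping at least 3 letters */
            return;
        }
    }
}
```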
 Stop Words
Some words are so common that almost every document
contains them, such as “a”, “the”, and “it”. Indexing them
is useless. They are called stop words, and we can eliminate
them from the original documents.
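A minimal C sketch of stop-word elimination, assuming the stop list is exactly the three words named above and that the text has already been split into lower-case words; is_stop_word and remove_stop_words are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const char *stop_words[] = { "a", "the", "it" };

int is_stop_word(const char *w)
{
    assert(w != NULL);
    for (size_t i = 0; i < sizeof stop_words / sizeof stop_words[0]; i++)
        if (strcmp(w, stop_words[i]) == 0)
            return 1;
    return 0;
}

/* Compacts words[] in place, dropping stop words; returns the new count. */
size_t remove_stop_words(const char *words[], size_t n)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++)
        if (!is_stop_word(words[i]))
            words[kept++] = words[i];
    return kept;
}
```

Applied to the words of document 4 ("shipment of gold arrived in a truck"), it drops "a" and keeps the other six words in order.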
 Access Methods
 Solution 1: Search trees ( B- trees, B+ trees, Tries, ... )
 Solution 2: Hashing
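For Solution 2, a minimal C sketch of a hashed term dictionary with separate chaining. The bucket count, the multiplicative string hash, and the names Entry, find_term, and insert_term are assumptions; doc_count stands in for a real posting list.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 101   /* prime number of buckets, chosen arbitrarily */

typedef struct Entry {
    char *term;
    int doc_count;          /* placeholder for the term's posting list */
    struct Entry *next;     /* next entry in the same bucket (separate chaining) */
} Entry;

static Entry *buckets[NBUCKETS];

static unsigned term_hash(const char *s)
{
    unsigned h = 0;
    while (*s != '\0')
        h = h * 31u + (unsigned char)*s++;
    return h % NBUCKETS;
}

Entry *find_term(const char *term)
{
    for (Entry *e = buckets[term_hash(term)]; e != NULL; e = e->next)
        if (strcmp(e->term, term) == 0)
            return e;
    return NULL;
}

/* Returns the entry for term, creating it (with doc_count 0) if absent. */
Entry *insert_term(const char *term)
{
    Entry *e = find_term(term);
    if (e != NULL)
        return e;
    e = malloc(sizeof *e);
    assert(e != NULL);
    e->term = malloc(strlen(term) + 1);
    assert(e->term != NULL);
    strcpy(e->term, term);
    e->doc_count = 0;
    unsigned h = term_hash(term);
    e->next = buckets[h];   /* push onto the front of the chain */
    buckets[h] = e;
    return e;
}
```

A lookup costs one hash computation plus a walk along a single chain.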
Discussion 9:
What are the pros and cons of using hashing?
Discussion 10:
How to improve the quality of search results?
Research Project 5
Roll Your Own
Mini Search Engine
In this project, you are supposed to create your own
mini search engine which can handle 1 million inquiries
over 100 files in 1 second.
You may download the functions for handling stop
words and stemming from the Internet.
Detailed requirements can be downloaded from
http://acm.zju.edu.cn/dsaa/