SQL Unit 19: Data Management:
Databases and Organizations
Richard Watson
Summary of Selections from Chapter 11, prepared by Kirk Scott
1
Outline of Topics
• Relationship between O/S and dbms
• Indexing
• Hashing
• File organization and access
• Joining
• B+ trees
2
Relationship between O/S and dbms
• Most performance concerns in dbms internals
eventually hinge on secondary storage
• In other words, by definition, a db is
persistent, stored on disk
• The dbms is responsible for managing and
accessing this data on disk
• As such, dbms internals are either closely
related to or integrated with O/S functions
3
• In general, db records may be longer or
shorter than O/S pages
• It’s convenient to think of them as shorter—
many records per page
• The performance goal can be stated
succinctly:
• Keep paging to a minimum
5
• If the dbms and O/S are integrated, the db
administrator may have the ability to specify
physical storage characteristics of tables:
• Clustering in sectors, tracks, cylinders, etc.
6
• Recall that in relational db’s, everything is value-based
• There is no such thing as following linked data
structures in order to find related data
• One of the fundamental problems of
implementing dbms internals is mapping from
values to locations (in secondary storage)
7
Indexing
• Indexes were introduced when considering
SQL
• In simple terms, they provide key-based access to the contents of tables
• It turns out that devising a special kind of
index was one of the critical parts of making
relational dbms’s practical to implement
8
• Indexes are one of the fundamental structures
used to provide access to data
• They also can be used to implement
operations like joining
9
• In simplest terms, the index can be visualized
as a two column look-up table
• This applies to an index on a primary key, for
example
• The index is sorted by the look-up key, the
primary key, for example
10
• Simple look-up could be O(n)—linear search through the index
• A better scheme would be O(log₂ n)—binary search, for example
• The value obtained is the relative record
number (RRN) or the address of the
corresponding record
11
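• As a minimal sketch (the key values and RRNs here are hypothetical, not from the book), binary search over an index held as a sorted list of (key, RRN) pairs might look like this:

    # Sketch of binary search over an index: a sorted list of (key, rrn)
    # pairs.  Returns the RRN (record address) for the given key.
    def index_lookup(index, key):
        low, high = 0, len(index) - 1
        while low <= high:
            mid = (low + high) // 2
            mid_key, rrn = index[mid]
            if mid_key == key:
                return rrn              # found: hand back the record address
            elif mid_key < key:
                low = mid + 1
            else:
                high = mid - 1
        return None                     # key not present

    index = [(101, 0), (205, 1), (317, 2), (460, 3)]
    print(index_lookup(index, 317))     # 2, found in O(log2 n) probes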
• Indexes on non-primary key fields are also OK
• In this case, the index can be visualized as
something slightly more complex than a 2
column look-up table
• There may be duplicate values of non-primary
key fields
12
• Therefore, for any look-up key, there may be
more than one corresponding record/address
• These multiple record addresses could be
managed as linked lists extending from the
look-up key
13
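• As a minimal sketch of this idea (the city values are hypothetical), each non-key look-up value maps to a list of record addresses, standing in for the linked lists:

    # Sketch: an index on a non-key field maps each value to a list
    # of record addresses (RRNs), standing in for linked lists.
    from collections import defaultdict

    city_index = defaultdict(list)
    records = [(0, "Anchorage"), (1, "Juneau"), (2, "Anchorage"), (3, "Nome")]
    for rrn, city in records:
        city_index[city].append(rrn)

    print(city_index["Anchorage"])      # [0, 2]: two records share this value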
• In general, it is possible to have more than
one index on a table, on different fields
• It is also possible to specify a single index on
more than one field at a time (city + state for
example)
14
• In reality, an index is not typically
implemented as a simple look-up table
• The full scale details of one kind of indexing
scheme are given in the section on B+ trees
• In the meantime, it is worth considering one
nuance that results from the interaction
between dbms records and O/S pages
15
Sparse Indexes
• A given file may be stored in order, sorted by a
given field of interest
• Superficially, this might suggest that an index
on that field is not needed
• However, the size of database tables means
that you don’t want to have to do linear
search through secondary storage in order to
find a desired record
16
• The reality is that what you want is not a RRN
or an address—what you want is the page that
the desired record would be on
• This is because the O/S returns data in pages
anyway
• An RRN request would be translated into a page request, and the complete set of records on the page would be returned as a block
17
• For a sorted file, an index may be sparse
• The index can again be envisioned as a simple
look-up table
• The look-up key values would correspond only to
the first records on each page
• If a desired value fell between two entries in the
index, it would be on the page of the first of
those two entries
• Note again that this only works if the table is
stored in sorted order on the look-up key
18
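• A minimal sketch of sparse-index look-up, assuming the table is stored sorted on the key (the key values are hypothetical):

    # Sketch of a sparse index: one entry per page, holding the first
    # key value on that page.  Only works on a file sorted by this key.
    import bisect

    first_keys = [100, 210, 350, 480]   # first key on pages 0..3

    def page_for(key):
        # rightmost page whose first key is <= the search key
        return bisect.bisect_right(first_keys, key) - 1

    print(page_for(275))                # 1: 210 <= 275 < 350, so page 1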
Clustering Tables
• The issue of whether a table is stored in some
sorted order is significant and will be treated
in general in a later section
• In the meantime, note that some SQL implementations (DB2, for example) support this with the keyword CLUSTER
19
• This is an example of its use:
• CREATE INDEX indexname ON tablename(fieldname) CLUSTER
• This means that as records are entered, the table is organized in sorted order in secondary storage
20
• The term inter-file clustering refers to storing
records from related tables in order
• For example, you could have mothers followed
by their children
• This violates every precept of relational
databases
• However, in rare circumstances this may be
within the db administrator’s power for
performance reasons
21
Index Access
• Indexing supports two kinds of access into a
file:
• Random access: Given a single look-up key
value, it’s possible to find the one (or more)
corresponding record(s)
• Sequential access: Reading through the index
from beginning to end produces all of the
records in a table sorted in the order of the
index key field
22
Index Support for Queries
• Not only do keys support the simple access
schemes given above, they can also support
various aspects of SQL queries
• Take a query with a WHERE clause for example
• Let the table be indexed on the field in the
WHERE clause
• Then a query optimizer could use the index in
order to restrict the results of the query without
having to search through the whole table looking
for matches
23
Hashing
• Hashing has many uses in computer science
• It turns out to have particularly useful
applications in dbms internals
• In a perfect world, a primary key field might
be an unbroken set of integers
• The identifiers for records would map directly
into a linear address space
24
• In reality, typically the set of values for a key is
sparse
• You have a few records with widely varying
key values
• Suppose you have 100 records
• It would be helpful to have a scheme that
would map those records into a linear address
space from 0 to 99
25
• The reason for this is the following:
• In general, you have alternatives on how to
store the records of a table
• They can be stored in arrival sequence
• You could cluster them
• If you hash them, they can be saved at a
particular address (offset), without wasting
space due to the sparseness of the key values
26
• The utility of hashing comes from the
following:
• The location of a record is computed based on
the key on its way in
• That means that, given the key value, the
location of the corresponding record can be
computed again for easy access upon retrieval
27
• Indexing supports both direct access and
sequential access
• Hashing doesn’t support sequential access,
but it does support direct access
• As a matter of fact, no better scheme for
implementing direct access exists
• It is quicker to hash than it is to search an
index
28
• The classic hashing algorithm, which is relatively easy to illustrate, is division-remainder hashing
• The look-up or hashing key of interest may be
of any type
• If it’s not actually an integer field, let it be
converted into a unique integer value
• Let the number of expected records, the size
of the desired address space, be n
29
• Then choose p to be the smallest prime
number larger than n
• A prime number is desirable because it will
tend to minimize problems like collisions (see
below)
• Why this is the case will not be explained
• It is rooted in the mysteries of abstract algebra
30
• The idea is that for key values of integer form
which are larger (or smaller) than p, you can
do integer division by p
• What you are interested in is the remainder—in other words, the modulus
• The range of possible modulus values when
dividing by p is 0 through p – 1
• This is the new, limited address space defined
by the hashing scheme
31
• A simple example will illustrate the idea
• Let the key of interest be 9 digit social security
numbers
• Let the desired address space be 20
• 23 is the smallest prime number larger than
20
32
• The table on the next overhead shows the
results of hashing for a given set of values
• It also illustrates what a collision is
• Many different 9 digit numbers, mod 23,
might give the same remainder
• A collision is an occurrence of such a situation
33
SSN          Hash value, ssn % 23    Collision?
472604787    5
472543952    5                       Collision, yes
472865123    4
472776747    17
472875522    7
472987531    6
472101352    17                      Collision, yes
472256749    3
472616853    19
472753864    19                      Collision, yes
34
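• A minimal sketch reproducing the table above: hash each SSN with % 23 and flag a collision whenever a hash value repeats:

    # Sketch of division-remainder hashing for the SSNs above,
    # flagging collisions (repeated hash values).
    ssns = [472604787, 472543952, 472865123, 472776747, 472875522,
            472987531, 472101352, 472256749, 472616853, 472753864]

    seen = set()
    for ssn in ssns:
        h = ssn % 23
        print(ssn, h, "collision" if h in seen else "")
        seen.add(h)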
• Collisions are a natural result of this scheme
• They are not an insuperable problem
• The thing to keep in mind is that when doing
look-up, you repeat what you did at hashing
time—you hash again
35
• In other words, you get the same hash value
back
• This means that the collision occurs again, but
this is not a problem
• The only thing you have to worry about is
where you store two different things that hash
to the same location in the 0-22 address
space
• There are basically two approaches
36
• The first approach is to maintain an overflow
area at the end of a page
• Suppose you hash something on look-up
• You go to the hash address obtained
• When you get there, you do not find the
desired key value
• Then go to the overflow area at the end of the
page and do linear search in it
37
• Alternatively, if records collide, they can
simply be displaced
• In other words, let there be a collision upon
data entry
• Simply search forward in the address space
until the next empty slot is found and place
the new record there
• The same strategy is used for finding the right
record when accessing data later
38
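• A minimal sketch of the displacement approach (linear probing); for simplicity the probe wraps around the whole address space rather than stopping at a page boundary:

    # Sketch of collision handling by displacement (linear probing).
    P = 23
    slots = [None] * P                  # hash address space 0..22

    def insert(key):
        h = key % P
        for i in range(P):              # search forward for an empty slot
            slot = (h + i) % P
            if slots[slot] is None:
                slots[slot] = key
                return slot
        raise RuntimeError("space exhausted; the file must be reorganized")

    def lookup(key):
        h = key % P                     # hash again, probe the same way
        for i in range(P):
            slot = (h + i) % P
            if slots[slot] is None:
                return None             # an empty slot means key absent
            if slots[slot] == key:
                return slot
        return None

    insert(472604787)                   # hashes to 5, stored at 5
    insert(472543952)                   # also hashes to 5, displaced to 6
    print(lookup(472543952))            # 6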
• The address space and the placement of records after the first six hashes in the table above are illustrated in the table on the following overhead
39
Address   Key value     Present/in place?
0
1
2
3
4         472865123     3rd hash, in place
5         472604787     1st hash, in place
6         472543952     2nd hash, out of place (5)
7         472875522     5th hash, in place
8         472987531     6th hash, out of place (6)
…
17        472776747     4th hash, in place
…
40
• Note that in order to be practical, there have
to be limitations on collision handling in this
way
• If an overflow area is used, it should be of
fixed size
• If you just move to the next available space,
you might be limited to spaces on the same
page
41
• It is desirable to be able to do existence
queries
• It is also desirable to be able to determine
that an incorrect (non-existent) key has been
entered
• If the overflow space is unlimited, potentially
you have to do an exhaustive search to
establish these things
42
• Thus, overflow space has to be limited
• For this reason it is possible to run out of
space when inserting records into a hashed
file
• It is also possible to simply exhaust the hash
space
• When either of these things happen, a hashed
file has to be reorganized
43
File Organization and Access
• The previous discussions of indexing and hashing
may have seemed somewhat disjointed
• Now that both topics have been covered, it’s
possible to summarize some of the choices for
maintaining tables in secondary storage and their
advantages and disadvantages
• Choices like indexing are available to users
• Other choices, like clustering and hashing, would
only be available to database administrators
44
Arrival Order Files
• File organization: Arrival order—this is
standard
• Indexed: Yes, possibly on >1 field
• Access: Random and sequential by index
• Performance: Good for both
• Maintenance and cost: None on base file;
update and deletion maintenance costs on
index(es)
45
Clustered Files
• File organization: Sequential—in other words,
maintained in sorted order by some field
• Indexed: Not necessarily—possibly desirable on
non-key fields, sparse if on key field
• Access: Sequential on key. No other unless
indexed
• Performance: Perfect sequential on key
• Maintenance and cost: Initial sorting, overflow
and reorganization of base table—cost is
horrendous
46
Hashed Files
• File organization: Hashed (on one field only)
• Indexed: Typically, no—hashing implies that the
principal goal is random access on a single field;
you don’t need sequential access and don’t want
the cost of index maintenance
• Access: Direct/random (only)
• Performance: The best possible direct access
• Maintenance and cost: Reorganization if the
address space fills or there are too many
collisions
47
• Notice that choosing hashed file organization is a
specialized option that would be available to a
database administrator
• It is not necessarily part of a standard database
application
• It is used when direct access is critical to
performance
• Historically, things like airline ticketing databases
have driven the need for extremely quick direct
access
48
Joining
• Historically, the performance costs of joining
were one of the things that made the
relational model impractical
• As noted in the chapter on the relational
model, implementing the theoretical
definition of a join is out of the question:
• Form the Cartesian product of two tables and
then perform selection and projection on the
results…
49
Merge Join
• In theory, the cost of joining would be
manageable under these conditions:
• The two tables were each maintained in
sequential order on their respective joining
fields
• This would make merge-join possible
• However, the cost of maintaining sequential
files is prohibitive
50
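• A minimal sketch of merge-join, assuming both inputs are already sorted on the join field and the keys are unique in each table:

    # Sketch of merge join: one linear pass over each sorted input.
    def merge_join(a, b):               # a, b: sorted (key, row) lists
        out, i, j = [], 0, 0
        while i < len(a) and j < len(b):
            if a[i][0] == b[j][0]:
                out.append((a[i][0], a[i][1], b[j][1]))
                i += 1
                j += 1
            elif a[i][0] < b[j][0]:
                i += 1
            else:
                j += 1
        return out

    print(merge_join([(1, "x"), (3, "y")], [(1, "p"), (2, "q"), (3, "r")]))
    # [(1, 'x', 'p'), (3, 'y', 'r')]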
Interfile Clustering
• There is also a concept known as interfile
clustering
• This means that the records for two different
tables are stored intermixed
• In other words, the physical data, for example,
follows a pattern such as this:
51
• Mother 1, child a, child b, mother 2, child c,
mother 3, child d, child e, child f, …
• In rare cases, a database administrator may
specify this as the only way to get acceptable
performance
• However, it is very costly to maintain tables in
this way, and very rare
52
Nested Loop Join
• Nested loop would be the naïve option for
files in arrival order with no indexes on the
joining fields
• This kind of algorithm would be O(n²)
• It is not out of the question, but it is important
to remember an underlying reality
• The costs of these algorithms are in secondary
storage access, not main memory access
• Anything above linear is painfully expensive
53
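• A minimal sketch of the naive nested loop join; the point is its O(n²) shape, and in a dbms every probe of the inner table costs secondary storage access:

    # Sketch of nested loop join: compare every pair of rows.
    def nested_loop_join(a, b):
        return [(ka, va, vb) for (ka, va) in a
                             for (kb, vb) in b if ka == kb]

    print(nested_loop_join([(1, "x"), (3, "y")], [(3, "r"), (4, "s")]))
    # [(3, 'y', 'r')]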
Index Join
• If both tables are indexed on their joining
fields, then merge join becomes possible on
the indexes
• This is certainly better than having nothing to
work with at all, but there is still a warning:
54
• Even though progression through indexes is
linear, the retrieval of records from the tables
may not follow this pattern
• In other words, if the records themselves are
not clustered, you may read the same page
more than once at different times in order to
access the various records on it
55
Hash Join
• It turns out that hashing was the basis for a
joining algorithm that finally made the
relational model practical
• The fundamental principle of hashing is the
same as explained above, but it’s applied in a
different way
56
• A preliminary picture of main memory and
secondary storage is given on the next
overhead
• The assumptions about and relationships
between the tables, records, buckets, and
pages will be explained following it
57
[Overhead 58: picture of main memory, buckets, and pages for hash join; not reproduced in this text version.]
• This scheme only makes sense if the size of
the tables is such that there would be
significant I/O overhead
• This is a realistic assumption
• In order for it to work, there needs to be a
significant amount of main memory available
• This is just a fact of life
59
• Only a general description of the scheme will
be given
• In order for this to work various parameters
would have to be tuned
• This can be accomplished based on experience
but the details are of no interest here
60
What’s a Bucket?
• There is a significant difference in the use of
hashing here compared to the earlier
explanation
• Earlier, collisions were a necessary evil
• Here, collisions are desirable and the hashing
address space is so defined
61
• The term “bucket” refers to one collection of
things (table records) that hash to the same
value
• As seen in the picture given earlier, the
expectation is that multiple pages worth of
records will hash into a single bucket
62
• Hashing now is being used to group together
things that have something in common, not to
map individual items into an address space
• Actually, depending on the values involved,
bucket hashing may place more than one set
of items that have something in common into
the same bucket.
• In that case, all the two sets have in common
is that they both hash to the same place
63
• Under the earlier explanation, if you had m
items, ideally they would hash to m
contiguous values
• Under bucketing, you have m different items and you would like to hash them into n buckets, where n is significantly smaller than m
• Hashing now becomes a grouping scheme
rather than an addressing scheme
64
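• A minimal sketch of this grouping use of hashing (toy values, not from the book): m items fall into n buckets, and everything that hashes alike lands together:

    # Sketch: hashing as a grouping scheme -- m items, n buckets.
    n = 3
    buckets = [[] for _ in range(n)]
    for item in (10, 14, 22, 31, 45, 57, 63):
        buckets[item % n].append(item)
    print(buckets)                      # [[45, 57, 63], [10, 22, 31], [14]]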
Assumptions for the Example
• What follows is a numbered list of
assumptions needed in order to explain how
hash join works:
• 1. More than one table record fits into a page
of memory
• 2. When hashing, more than one page’s
worth of records will hash to the same value
• Let the collection of things that hash to the
same value be referred to as a bucket
65
• 3. Note that you’re not worried about
collisions
• If there is more than one record with the
same key value that hashes to the same
bucket, that’s OK
• It’s also OK if there are genuine collisions,
where different key values hash to the same
bucket
66
• 4. The parameters that have to be tuned in
order for this to work involve the size of the
hash space (i.e., the number of different
buckets) relative to the sizes of the tables to
be joined.
67
The Hash Join Algorithm Phases
• Hash join proceeds in two phases
• 1. a reading/hashing/writing phase
• 2. a reading/joining/writing phase
68
Hash Join Demands on Memory
• In order for the scheme to work:
• During phase 1, at least one page for each
bucket has to fit in memory at the same time
• This means that the number of buckets total
can’t exceed the size of memory in pages
allocated to the process
69
• During phase 2, all of the pages for
corresponding buckets of tables A and B have
to fit in memory at the same time
• This mathematical expression makes specific
the limitation on buckets/pages vs. the
memory allocated to the process:
• max over buckets i: size(A bucket i) + size(B bucket i) <= size(main memory)
70
Implementability
• Ultimately, the implementability of the algorithm
depends on:
• 1. The sizes of tables A and B
• 2. The distribution of the values of the joining
fields in the two tables
• It is the joining field values that they’ll be hashed
on
• 3. The amount of main memory available to the
joining process
71
• Notice how you’re caught in a vise:
• Limited memory implies a maximum number
of buckets
• If you have large tables, each bucket may
consist of many pages
• If a bucket consists of too many pages, it
won’t fit in memory
72
• This is a simplistic way of seeing it:
• Let there be n pages allocated and let there be
n buckets
• Let the size of table A be m, for example
• Hashing table A gives n groups of records
73
• If you added up the number of records in each
group you would get m
• If table A is too large or the distribution of
values is not good, one of those n groups
could be larger than the memory space
• This is not a problem in the first phase, but it
can be a problem in the second phase
74
Tunability
• Available memory might be a tunable
parameter, with some leeway towards
granting larger amounts of memory to a
joining process
• The hashing algorithm itself is probably pre-selected, so that wouldn’t be tunable directly
• However, if the minimum memory needs are
met, picking how many buckets to use is
tunable
75
• This is a classic balancing act
• You can try to optimize join performance
• That would demand more memory
• You can try to conserve memory
• That would affect join performance
76
The Phases in Detail
• Phase 1:
• If all of the necessary conditions are met, then
phase 1 of the hash join algorithm can be
described as follows:
• Read A in arrival order
• Hash on the joining field
• Write out the bucket pages as they fill
77
• Do the same for B
• The end result is two new files in secondary
storage
• These files contain the records of both A and
B, organized in hash order
• The point is that A and B can now both be
read back in in hash order
78
• Phase 2:
• Phase 2 of the hash join algorithm can be described as follows:
• Read tables A and B back in from secondary storage, bucket by matching bucket
• Note that collisions don’t matter
• You may have a mixture of different key values, but the same key values from A and B will be present
79
• Use a memory resident algorithm to form the
join of the bucket contents
• The memory resident algorithm can sort out
which records actually match with which
records
• The point is that all records that would match
would be in memory at the same time
• You then write the join results back out, page
by page for each bucket
80
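• Pulling the two phases together, here is a compact sketch of hash join; in-memory lists stand in for the bucket files written to secondary storage, and the bucket count is the tunable parameter:

    # Sketch of hash join.  Lists stand in for bucket files on disk.
    N_BUCKETS = 4                       # tunable

    def partition(table):               # phase 1: read, hash, write out
        buckets = [[] for _ in range(N_BUCKETS)]
        for key, row in table:
            buckets[key % N_BUCKETS].append((key, row))
        return buckets

    def hash_join(a, b):
        a_buckets, b_buckets = partition(a), partition(b)
        out = []
        # phase 2: read matching buckets back, join them in memory
        for ab, bb in zip(a_buckets, b_buckets):
            for ka, va in ab:           # memory-resident matching; any
                for kb, vb in bb:       # collisions are sorted out here
                    if ka == kb:
                        out.append((ka, va, vb))
        return out

    A = [(1, "a1"), (5, "a5"), (9, "a9")]
    B = [(5, "b5"), (9, "b9"), (2, "b2")]
    print(hash_join(A, B))              # [(5, 'a5', 'b5'), (9, 'a9', 'b9')]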
Tunability, Again
• Note that the memory required by the overall
algorithm can vary over time
• If buckets are small, less memory will be
required
• If buckets are large, more memory will be
required
• It’s actually an operating system function to
determine how much memory a process can
have at a given time
81
Efficiency of the Algorithm
• The reality is that the memory-resident algorithm is probably O(n²)
• Essentially, you scan the corresponding
buckets for A and B looking for matches
• It is the in-memory equivalent of nested loop
join
82
• The critical point is the following:
• Access to secondary storage, paging, has been
optimized
• In total, each of the records of A and B, that is,
each page containing records of A and B, is
read exactly twice
• A and B are each read once during phase 1
and again during phase 2
83
• Hash join is O(n) in I/O costs
• Access to secondary storage is potentially
around 3 orders of magnitude slower than
memory access
• I/O costs will dominate any algorithm that
involves access to secondary storage
84
Memory Allocation, Again
• The importance of the memory allocation
comes up again here
• If the allocation falls below that needed in
order to hold the required buckets, then the
performance changes
• It may become necessary to read various
pages more than one time each
85
• Suppose bucket x of Table A consisted of p pages
and bucket x of Table B consisted of q pages
• In the worst case you would have to read p x q
pages to form the join
• In other words, the algorithm is O(n²) in secondary storage access
• The lack of memory has defeated your purpose
and given you a performance nightmare
86
Reality
• Devising an algorithm that’s linear in I/O costs
means that whatever you have to do in memory
is of virtually no performance consequence
• In theory, there can be memory resident
databases where the foregoing discussion doesn’t
apply
• However, the real world of relational databases,
by definition, consists of substantial amounts of
data stored in tables in secondary storage
87
• To reiterate the point made at the beginning:
• Implementing databases in relational form
only truly became practical when hash join
was developed
• This topic was covered in these overheads
because it is of such fundamental importance
88
B+ Trees
• As noted above, the application of hashing
was critical to making relational database
systems practical
• The development of indexes was equally
important
• As stated before, real indexes are not, in fact,
simple look-up tables
89
• In reality, indexes take a tree-like form
• Also, they are not simply indexes
• The records in a table can be stored in a tree-like structure that is simultaneously index-like
• B+ trees can serve as indexes or combined indexes and tables
90
VSAM
• The classic, original development of B+ tree
like structures was known at IBM as VSAM
• This stood for virtual storage access method
• The tree structure is known as a B tree, or
depending on the implementation, a B+ tree
• The characteristics of VSAM will be compared
with the other file organization and access
options listed earlier
• Then the details of B+ trees will be given
91
• File organization: VSAM
• Indexed: This is a B tree (index) with data
records stored in the index nodes
• It is inherently indexed
92
• Access: Both random and sequential are
supported
• For access on the organizing field, this doesn’t
involve a reference to a separate index, since
the data and index are unified
• Because records are clustered on tree node
pages, genuine effective sequential access to
records in secondary storage is supported by
tree traversal, page by page
93
• Performance: Access to any record (page) is bounded by logₙ(number of records in file), where n = the number of records per tree (index) node
• Maintenance and cost: The insertion and
deletion algorithms automatically maintain
the data and indexing simultaneously
94
B+ Tree Background
• 1. You can think of B+ trees as being the hard-coded equivalent of binary (or in general, base-n) search.
• 2. The B in the name means balanced.
• The nodes in the tree may vary in how many
entries they contain, but balanced means that
all of the leaves are the same distance from
the root.
95
• 3. The balance of the tree is desirable
because it places an upper bound on the
number of pages that have to be read in order
to get any value.
• The bound is O(logₙ(number of records in file)), where n = the number of records per index node
96
• 4. If + is included in the name of the data
structure, this signifies that the index tree and
the nodes containing data are separate
• You traverse the tree on key value
• At the bottom-most nodes in the tree, there
are pointers to the nodes containing data
• The bottom-most nodes can also be traversed
in order, providing sequential order access
directly without traversing the index tree.
97
• In the discussion that follows, the examples
will consist of a tree that only contains index
values, not records
• They will be B+ trees, with links to the records
at the bottom of the tree
98
Example
• An example of a B+ tree at a certain stage of
development is shown on the next overhead.
• It is taken from page 4 of part 1 of the
assignment keys.
• The question of how insertions and deletions
are made will be addressed later.
• At this point it is simply desirable to see a tree
and explain its contents.
99
[Overhead 100: example B+ tree diagram, taken from page 4 of part 1 of the assignment keys; not reproduced in this text version.]
• The tree structure represents an index on a
field in a table.
• The tree consists of nodes which each fit on a
single page of memory.
• In this diagram, the pairs of parentheses and
their contents represent the nodes in the tree.
• The integers are values of the field that is
being indexed.
101
• This field may not be a key field in the table,
but in general, when indexing, the field that is
being indexed on can be referred to as the key.
• The nodes also contain pointers.
• In this diagram the pointers are represented
by arrows.
• In reality, the pointers would be stored in the
nodes as addresses referring to other nodes.
102
• This illustration is set up as a pure index.
• The idea behind VSAM is that each node
(page) is large enough to hold not just a key
value, but the complete record containing it.
• This would mean that the contents of a file were conceptually stored as a tree—
• And that the file contents would be essentially self-indexing.
103
• In the tree as given, which is pure index, the
top two rows form the index set.
• The bottom row forms the sequence set.
• The pointers in the index nodes point to
internal or leaf nodes of the tree.
104
• From the sequence set it is possible to point
to the pages containing the actual table
records containing those key values.
• This is indicated by the vertical arrows
pointing down from the leaf nodes.
• The horizontal arrows between the leaf nodes
represent the linkage that makes it possible to
access the key values in sequential order using
this index.
105
• Observe that in this example, each index node
can contain up to n = 4 pointers, and it can
contain up to n – 1 = 3 key values.
• If every node were completely full, there
would be 4 pointers in each.
106
• That means that the total number of key
values possible in the sequence set would be
4 * 4 = 16.
• All sequence set nodes are exactly 2 levels
down from the root.
• The bound on the number of page reads to get through the index tree is log₄ 16 = 2.
107
• There are additional rules governing the
formation of trees of this sort.
• Counting by pointers, internal and leaf nodes
are not allowed to fall below half full.
• If n is even, that means that you are allowed
to have no fewer than n / 2 pointers in a node.
108
• If n is odd, you round up, and the minimum is (n / 2) + 1 (using integer division).
• Some books use the notation of the ceiling function, ⌈n/2⌉, which means the same thing.
• Because fullness is measured by the number of
pointers, it is possible for it to appear less than
half full when looking at the number of key values
present in a node.
• Finally, it is permissible in general for the root
node to fall below half full.
109
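• A one-line check of this rule, sketched for a few node sizes n:

    # Sketch: the minimum pointer count for a non-root node is
    # ceil(n/2), matching the round-up rule described above.
    import math
    for n in (3, 4, 5, 6):
        print(n, math.ceil(n / 2))      # 3->2, 4->2, 5->3, 6->3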
• Another thing becomes apparent about B+ trees
from looking at the example.
• In each node the key values are in order.
• There is also a relationship between the order of
the key values in one node, the pointers coming
from it, and the values in the nodes pointed to by
these pointers.
• This relationship is intrinsic to the meaning of the
contents of the tree and will be explained further
below when covering the rules for inserting and
deleting entries.
110
• It is also apparent that the index set is sparse
while the sequence set is dense.
• In other words, the leaves contain all key
values occurring in the table being indexed.
• Some of these key values occur in the index
set, but the majority do not.
• If a key value does occur in the index set, it
can only occur there once.
111
• It will become evident when looking at the
rules for inserting values how this situation
comes about.
• When the tree is growing, a value in a
sequence set node can be copied into the
index set node above it.
• However, when values are promoted from one
index set node to another they are not copied;
they are moved.
112
• A final remark can be made in this vein.
• The example shows creating a B+ tree on the
primary key of a table, in other words, a field
that is unique.
• All of the example problems on this topic will
do the same.
113
• If the index were on a non-unique field, the
difference would show up only in the
sequence set.
• It would be necessary at the leaf level to
arrange for multiple pointers from a single key
value, pointing to the multiple records that
contained that key value.
114
Creating and Updating B+ Trees
• Some authors present the rules for creating and
maintaining B+ trees as a set of mathematical
algorithms.
• Others give pseudo-code or code for
implementations.
• There is also a certain degree of choice in both
the algorithm and its implementation.
• What will be given here are sets of rules of thumb
that closely parallel Korth and Silberschatz.
115
• The kinds of test questions you should be able
to answer about B+ trees would be like the
assignment questions.
• In other words, given the number of key
values and pointers that a node can contain,
and given a sequence of unique key values to
insert and delete, you need to be able to
create and update the corresponding B+ tree
index.
116
Summary of the Characteristics of a
Correctly Formed Tree
• Some general rules of thumb that explain the
contents of a tree are given beginning on the
next overhead.
• More specific rules for insertion and deletion
are given in following lists.
• At the outset, however, it’s helpful to have a
few overall observations.
117
General Rules of Thumb, 1—Sequence
Set
• At the very beginning the whole tree structure
would consist of only one node, which would be
both the index set and the sequence set at the
same time.
• After the first node is split there is a distinction.
• The meaning of pointers coming from and
between sequence set nodes has already been
given above and no further explanation is
needed.
• The remaining remarks below address the
considerations of index set nodes specifically.
118
General Rules of Thumb, 2—Index Set
• If a key value appears in a node, it has to have
pointers on each side of it.
• In other words, the existence of a value in a node
fundamentally signals “branch left” or “branch
right”.
• In the algorithm for the insertion of values it will
become apparent that as the tree grows, a new
value in an index set node is promoted from a
lower node to indicate branching to the left or
right.
119
General Rules of Thumb 3—Index Set
• The pointer to the left of a key value points to the
subtree where all of the entries are strictly less
than that key value.
• The pointer to the right of a key value points to
the subtree where all of the entries are greater
than or equal to that key value.
• The “greater than or equal to” is part of the logic
of the tree that allows sequence set values to
appear in the index set, thereby creating the
index.
120
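• A minimal sketch of descending the index set by these rules (the node layout, dicts with "keys" and "children", is hypothetical):

    # Sketch of B+ tree descent.  Index nodes: {"keys": [...],
    # "children": [...]}; leaves lack "children".
    import bisect

    def find_leaf(node, key):
        while "children" in node:
            # follow the pointer to the right of every key <= the
            # search key: left subtree is strictly less, right is >=
            i = bisect.bisect_right(node["keys"], key)
            node = node["children"][i]
        return node

    leaf1 = {"keys": [2, 3]}
    leaf2 = {"keys": [5, 7]}
    root = {"keys": [5], "children": [leaf1, leaf2]}
    print(find_leaf(root, 5)["keys"])   # [5, 7]: equal keys branch right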
General Rules of Thumb, 4—Index Set
• As insertions are made, it is possible for a
node to become full.
• If it is necessary to insert another value into a
full node, that node has to be split in two.
• The detailed rules for splitting are given
below.
121
General Rules of Thumb, 5—Index Set
• Deletions can reduce a node to less than half
full.
• If this happens, sibling nodes have to be
merged.
• The detailed rules for merging are given
below.
122
Inserting and Deleting
• There is an important conceptual difference
between balanced trees and other tree
structures you might be familiar with.
• In other trees you work from the root down
when inserting and deleting.
• This leads to the characteristic that different
branches of the tree may be of different
lengths.
123
• In order to maintain balance in a tree, it’s
necessary to work from the leaves up.
• You use the tree to search downward to the
leaf (sequence set) node where a value either
would fall, or is.
• You then either insert or delete accordingly,
and adjust the index set above to correspond
to the new situation in the leaves.
124
• Enforcing the requirements on the fullness of
nodes leads to either splitting or merging.
• As a consequence of the adjustment to the
index set, the depth of the whole tree might
grow or shrink depending on whether the
inserting/splitting or deleting/merging
propagate all the way back up to the current
root node of the tree.
125
Rules of Thumb for Inserting
• Here is a list of the rules of thumb involved in
inserting a new value into the tree.
• 1. Search through the tree as it exists until
you find the sequence set node where the key
value belongs.
• 2. If there is room in the node, simply insert
the key value in order.
• Such an insertion has no effect upwards in the
index set.
126
• 3. If the destination leaf node is full, split it into 2
nodes and divide the key values evenly between
them.
• 4. Notice that in all of the examples the nodes
hold an odd number of values.
• This makes it easy to split the values evenly when the (n + 1)st value is to be added.
• A real implementation would have to deal with
the possibility of uneven splits, but you do not.
127
• 5. When a node is split, the two resulting
nodes remain at the same level in the tree and
become siblings.
• 6. The critical outcome of a split is that the
new siblings’ parent node, its values, and its
pointers have to be updated to correctly refer
to the two new children.
128
• 7. In general, when a node is split, the leftmost
value in the new right sibling is promoted to the
parent.
• The fact that it is always the leftmost value that is
promoted is explained by the fact that after
promotion the parent’s right pointer points to a
subtree containing values greater than or equal
to that value.
• Promoting itself takes on two different meanings.
129
• When a value is inserted into a sequence set
node and is promoted from there into the index
set, what is promoted is a copy of that value.
• This explains how sequence set values appear in
the index set.
• However, if further up a value is promoted from
one index set node into another, it is moved, not
copied.
• This explains why a value can appear at most
twice in the tree, once in the sequence set and
only once in the index set.
130
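• A minimal sketch of rules 3 through 7 at the leaf level: split a full sequence set node evenly and promote a copy of the new right sibling’s leftmost value:

    # Sketch of a leaf (sequence set) split on insertion.
    import bisect

    def split_leaf(keys, new_key):      # keys: a full, sorted leaf
        bisect.insort(keys, new_key)    # insert the (n + 1)st value
        mid = len(keys) // 2
        left, right = keys[:mid], keys[mid:]
        promoted = right[0]             # a COPY of this goes to the parent
        return left, right, promoted

    print(split_leaf([2, 3, 5], 7))     # ([2, 3], [5, 7], 5)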
• 8. The splitting and promoting process is
recursive.
• If the parent is already full and a value is to be
added to it, the parent is split into two siblings
and its parent is adjusted accordingly.
131
• 9. When you split a child and promote, if the
promotion causes a split in the parent, you
end up with the following situation:
• The leftmost pointer in the new right parent
appears to be able to point to the same child
as the rightmost pointer of the new left
parent.
132
• In other words, when the parent is split, 2 new
pointers arise when the number of children
only rises by one.
• However, the problem is resolved because the
split in the parent requires that the leftmost
pointer in the new right parent also be
promoted, and this promotion is a move, not
a copy.
133
• 10. If the splitting and promoting process
trickles all of the way back up to the root and
the root is split, then a new root node is
created.
• The last value to promote is put into this new
root.
• This is how the tree grows in depth.
134
• This growth at the root explains why balance
is maintained in the tree and no branches
become longer than any others.
• It also explains why it is necessary to allow the
root to be less than half full:
• A brand new root node will only contain the
single value that is promoted to it.
135
Deleting
• As described above, the splitting of nodes is
binary, resulting in two new sibling nodes.
• This is a simple and unremarkable result of the
insertion algorithm.
• Deletion and merging introduce a slight
complication.
• If a deletion causes a node to fall below half full,
it needs to be merged with another node.
• The question is, which one, the left sibling or the
right sibling?
136
• Except for the root, every node will have at least
one sibling.
• In general, it may have zero or more on each side.
• Should it be merged only with an immediate
neighbor, and if so, should it be the one on the
left or the right?
• The rules of thumb below embody the arbitrary
decision to merge with the sibling on the
immediate right, if there is one, and otherwise
take the one on the immediate left.
137
• In developing rules of thumb for this there is
another consideration with deletion that leads
to more complication than with insertion.
• It may be that the sibling that you merge with
has the minimum permissible number of
values in it.
• If this is the case, the total number of values
would fit into one node and you would truly
merge.
138
• If, however, the sibling to be merged with is over
half full, merging alone would not result in the
loss of a node.
• The values would simply have to be redistributed
between the nodes.
• The situation where the two nodes would
actually merge into one would be rare in practice.
• However, it is quite possible with examples where
the nodes can only contain a small number of
values and pointers.
139
• Just as with splitting, merging can trickle all of the
way back up to the root.
• If it reaches the point where the immediate
children of the root are merged into a single
node, then the original root is no longer needed.
• This is how the tree shrinks in a balanced way.
• Situations where nodes are merged and the
values are redistributed between them will still
require that the values and pointers in their
parent be adjusted.
140
• Finally, a simple deletion from the sequence
set which does not even cause a merge can
have an effect on the index set.
• This is because values in the index set have to
be values that exist in the sequence set.
• If the value disappears from the sequence set,
then it also has to be replaced in the index set.
• This is as true for the root node as for any
other.
141
• Here is one final note of explanation that is
directly related to the examples given.
• In order to make the examples more
interesting, the following assumption has
been made:
• You measure the fullness of a sequence set
node strictly according to the same standard
as an index node.
142
• Take the case where an index set node
contains 3 key values and 4 pointers for
example
• In the sequence set a node would contain 3
key values and 3 pointers
• An index set node might have only one key
value in it, but is considered half full because
it has two pointers in it.
143
• If a sequence set node falls to one key value,
then it only has one pointer in it, the pointer
to the record.
• It has fallen to less than half full.
• Thus, this sequence set node has to be
merged with a sibling.
144
Rules of Thumb for Deleting
• Here is a list of the rules of thumb involved in
deleting a value from the tree.
• 1. Search through the tree as it exists until
you find the sequence set node where the key
value exists.
145
• 2. Delete the value.
• If the value can be deleted without having the
node drop below half full, no merging is needed.
• However, if the deleted value was the leftmost in
a sequence set node (other than the leftmost
sequence set node), that value appears in the
index set and has to be replaced there.
• Its replacement will end up being the new
leftmost value in the sequence set node from
which the value was deleted.
146
• 3. If the deletion causes the node to drop
below half full, merge it with a sibling, taking
the sibling immediately on the right if there is
one.
• Otherwise take the one on the left.
147
• 4. If the total number of values merged
together can fit into a single node, then leave
them in a single node and adjust the values
and the pointers in the parent accordingly.
148
• 5. If the total number of values merged
together still have to be put into two nodes,
then redistribute the values evenly between
the two nodes and adjust the values and the
pointers in the parent accordingly.
149
• 6. Now check the parent to see whether due to
the adjustments it has fallen below half full.
• Recall that the measure of fullness has to do with
whether the number of pointers has fallen below
half.
• In most of the small scale examples given, the
sure sign of trouble is when a parent has only one
child.
• A tree which doesn’t branch at each level is by
definition not balanced.
150
• 7. If the parent is no longer half full, repeat
the process described above, and merge at
the parent level.
• This is the recursive part of the process.
151
• 8. Deletions can be roughly grouped into four
categories with corresponding concerns.
• 8.1. A deletion of a value that doesn’t appear in
the index set and which doesn’t cause a merge:
• This requires no further action.
• 8.2. A deletion of a value that appears in the
index set and which doesn’t cause a merge:
• Promote another value into its spot in the index
set.
152
• 8.3. A deletion which causes a redistribution of
values between nodes:
• This will affect the immediate parent; this may
also be a value that appeared higher in the index
set, requiring the promotion of a replacement.
• 8.4. A deletion which causes the merging of two
nodes:
• Work back up the tree, recursively merging as
necessary; also promote a value if necessary to
replace the deleted one in the index set.
153
• 9. If the merging process trickles all of the
way back up to the root and the children of
the current root are merged into one node,
then the current root is replaced with this new
node.
• This illustrates how balance is maintained
when deleting, because the length of all
branches of the tree is decreased at the same
time when the root is replaced in this way.
154
B+-Tree Examples
• The first three example exercises were taken
from a previous edition of Korth and
Silberschatz.
• The same problems live on in a more recent
edition with different numbering.
• They're given in the fifth edition as shown on
the following overheads.
155
• 12.3 Construct a B+-tree for the following set of
key values:
• (2, 3, 5, 7, 11, 17, 19, 23, 29, 31)
• Assume that the tree is initially empty and values
are added in ascending order. Construct B+-trees
for the cases where the number of pointers that
will fit in one node is as follows:
• a. Four
• b. Six
• c. Eight
156
• 12.4 For each B+-tree of Exercise 12.3, show
the form of the tree after each of the
following series of operations:
• a. Insert 9.
• b. Insert 10.
• c. Insert 8.
• d. Delete 23.
• e. Delete 19.
157
• The example exercises are worked out on the
following overheads.
• As usual, the idea is that this may provide a
helpful illustration.
• If you decide to work the exercises yourself, it
is unlikely that you would be able to memorize
the given solutions.
• Instead, they are available for you to check
your own work if you want to.
158
B+-Trees, Example 1
• Let the index set nodes of the tree contain 4
pointers.
• Construct a B+-tree for the following set of key
values:
• (2, 3, 5, 7, 11, 17, 19, 23, 29, 31)
159
[Overheads 160-169: diagrams showing the tree after each successive insertion; not reproduced in this text version.]
• Now take these additional actions
• a. Insert 9.
• b. Insert 10.
• c. Insert 8.
• d. Delete 23.
• e. Delete 19.
170
[Overheads 171-179: diagrams showing the tree after each of these operations; not reproduced in this text version.]
B+-Trees, Example 2
• Let the index set nodes of the tree contain 6
pointers.
• Construct a B+-tree for the following set of key
values:
• (2, 3, 5, 7, 11, 17, 19, 23, 29, 31)
180
[Overheads 181-182: construction diagrams; not reproduced in this text version.]
• Now take these additional actions
• a. Insert 9.
• b. Insert 10.
• c. Insert 8.
• d. Delete 23.
• e. Delete 19.
183
[Overheads 184-186: diagrams for these operations; not reproduced in this text version.]
B+-Trees, Example 3
• Let the index set nodes of the tree contain 8
pointers.
• Construct a B+-tree for the following set of key
values:
• (2, 3, 5, 7, 11, 17, 19, 23, 29, 31)
187
[Overhead 188: construction diagram; not reproduced in this text version.]
• Now take these additional actions
• a. Insert 9.
• b. Insert 10.
• c. Insert 8.
• d. Delete 23.
• e. Delete 19.
189
[Overhead 190: diagrams for these operations; not reproduced in this text version.]
B+-Trees, Example 4
• This example is not taken from Korth and
Silberschatz.
• Let the index set nodes of the tree contain 4
pointers.
• Construct a B+-tree for the following set of key
values:
• (3, 8, 6, 9, 15, 20, 4, 25, 30, 13, 11, 7)
• Then delete 20 and 7.
191
[Overheads 192-197: construction diagrams; not reproduced in this text version.]
• Now delete 20 and 7.
198
[Overheads 199-203: deletion diagrams; not reproduced in this text version.]
The End
204