B+ Tree Insertion Example

File Storage and Indexing


File Organizations
Indices
 Types of index
 Tree based indexing
 Hash based indexing

Accessing a table
 Consider SELECT * FROM Customer

To make access to a table efficient
 Store the table on adjacent blocks
 On the same cylinder, or
 Adjacent cylinders

But many queries include where clauses
 File organizations should support the efficient
retrieval of records within a file

Data in a database consists of collections of records,
or data files
 Each file consists of one or more blocks on a disk

A file organization is a method of arranging records
in a file
 File organizations make some operations more efficient

In addition, files can be indexed, to provide multiple
ways to access records efficiently
 Index files contain search key values and references to
records


Database data must be persistent, so must be stored on
secondary memory, such as a hard disk
Disk access is relatively inefficient, on the order of 10 to
15 milliseconds to access a single page
 Hundreds of thousands of times more than an equivalent access
to a main memory location
 The cost of disk I/O dominates the cost of database operations

The unit that data is read from or written to disk is a
block, typically 8 kilobytes
 Reading several pages in sequence from a disk takes much less
time than reading several random pages

The minimum set of file operations is:
 Create and destroy files
 Insert and delete records
 Scan an entire file
▪ A scan brings all of the records in the file into main memory

A single DB table is usually stored as a single file
 Every record in a file has a unique record ID, or rid
▪ A rid consists of a block address, and a slot number

The simplest file structure is an unordered file, or heap
file

Heap files support insertion and deletion of records and
file scans
 Because the entire file can be scanned, individual records or
collections of records can be found

New records are inserted where there is room
 Either in slots that contained previously deleted records, or
 At the end of the file

When a record is deleted, no other records are affected
 That is, there is no need to reorganize remaining records

The records in a sorted file, or sequential file, are stored
in order
 Based on the sort key of the file
▪ The attribute of the record that the file is sorted on


The basic organization assumes that file pages are filled
to conserve space
Pages should be maintained in sequence
 To allow for more efficient disk access


Insertions result in records being shuffled up
Deletions result in records being shuffled down

To avoid inefficiencies involved in inserting and deleting
records from sorted files
 Pages have only partial occupancy
▪ i.e. space for future insertions is left in each page
 Overflow pages can be attached (by pointers) to pages that
become full
 Records can be locally reorganized in adjacent pages

Sorted files may need to be periodically re-ordered

An index is a data structure that organizes data to
optimize the retrieval of records on some criteria
 An index supports efficient retrieval of records based on the
search key of the index
 An index can be created for a file to speed up searches that are
not efficiently supported by the file's organization
 A file can have more than one index

An index is a collection of data entries which must
contain:
 A search key value, k, and
 Information to find data records with that search key value

An index on a sequential file
 The search key of the index is the same as the sort
key of the file

Primary indexes can be either dense or sparse
There are different kinds of keys
Primary key
Candidate key
Sort key
Search key
Superkey

A dense index is a sequence of blocks containing
search key : rid pairs
 The rid contains addresses to records with the search key
 The blocks of the index are in the same order as the file

Searching an index is faster than searching the file
 The index is smaller
 The index is sorted so binary search can be used
 The index may be small enough to fit in main memory
▪ If so, once the index has been read, records can be found with one
disk I/O
(Figure: a dense index on a file with keys 10, 20, 30, 41, 53, 60, 77, 91; each index entry holds a key and a rid, in the same order as the file. Two data records or four index records fit on one block.)

A sparse index usually contains one data entry for
each block of records in a data file
 It is only possible to use a sparse index if the data file is
sorted by the search key of the index
 Sparse indexes are smaller than dense indexes

Sparse indexes are searched in much the same way
as dense indexes
 Except that the index is searched for the largest key less
than or equal to the target value
 The rid is then followed to a block of the data file
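The "largest key less than or equal to the target" lookup can be sketched with Python's bisect module; the index keys and block numbers below are illustrative:

```python
import bisect

# Sparse index: one entry per data block, holding the first search key
# on that block. Keys are sorted, mirroring the sort order of the file.
index_keys = [10, 30, 53, 77]   # first key on each data block (illustrative)
index_blocks = [0, 1, 2, 3]     # corresponding block addresses (illustrative)

def find_block(target):
    """Return the data block that could hold `target`: the block whose
    first key is the largest index key <= target, or None if target is
    smaller than every key in the file."""
    pos = bisect.bisect_right(index_keys, target) - 1
    return index_blocks[pos] if pos >= 0 else None
```

For example, a search for 41 lands in block 1, since 30 is the largest index key not exceeding 41.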
(Figure: a sparse index on the same file; the index entries 10, 30, 53, 77, ... hold the first key of each data block. Two data records or four index records fit on one block.)

An index on a large data file can cover many blocks
 Even using binary search, multiple disk I/Os may be
needed to find a record
 An alternative is to build a multiple level index

The first level of the index may be dense or sparse
 Subsequent levels of the index are sparse indexes on the
preceding level of the index
(Figure: a multiple level index; a sparse second level (10, 53, ...) indexes the first-level index, which in turn indexes the file with keys 10, 20, 30, 41, 53, 60, 77, 91.)

We define a primary index as an index whose search key
is the same as the sort key of a sequential file
 There can only be one primary index for a file
▪ Terminology – a primary index sometimes refers to an index whose
search key includes the primary key; we do not use this definition
▪ More terminology – primary indices are also referred to as clustered

A secondary index is an index whose search key is not
the sort key of the file
 Secondary indexes must be dense
▪ Why?
 Secondary indices are also referred to as unclustered
(Figure: a secondary index on name; the index entries (ann, ann, bob, dave, dave, dave, kim, lee) are sorted by name, while the file remains sorted on the keys 10 to 91, so the index pointers lead to scattered blocks.)

The pointers in a block of a secondary index
may point to many blocks of the data file
 Making secondary indexes less efficient than
primary indexes for retrieval of a range of records

Heap (data) files are not ordered so require
secondary indexes

A clustered file contains the records of two tables
 Consider two tables in a one to many relationship
 Where queries containing a join between the two
tables are made frequently
 Note the unfortunate re-use of the word clustered ...
SELECT pet_name, species
FROM Owner, Pet
WHERE Owner.sin = 111 AND Owner.sin = Pet.sin
(Figure: a clustered file; each owner record (owner 1, owner 2, ...) is followed by the records of that owner's pets.)

Consider a primary index on Owner
 The file is sorted by owner
 Individual owners can be retrieved rapidly
 Pets of owners can also be retrieved rapidly


Efficiency is reduced when retrieving a range
of owner data with no pet data
Any index on attributes of Pets would be a
secondary index

A secondary index may waste space if search key
values are repeated
 Since each record ID is paired with its own search key

Create a bucket for each set of rids associated with a
search key
 Follow a pointer to the bucket, then
 Follow the rids in the bucket to the records

Saves space if search key values are larger in bytes
than record IDs
 And each key appears at least twice on average
(Figure: a secondary index with indirection; the sorted index (ann, bob, dave, kim, lee, ...) points to buckets of rids, and the rids in each bucket point to the matching records in the file.)

Multiple secondary indexes using indirection can
improve efficiency on queries with complex criteria
 Collect all the rids from the buckets that meet each of the
criteria
 Then intersect them
 And only retrieve records using the result

This avoids retrieving records that match some, but
not all, of the criteria

There are issues related to the storage and efficient
retrieval of documents
 Keywords are used to identify documents
 More and more documents are maintained on the web

A document can be considered as a record in a table
 The record can be thought of as having Boolean attributes
for each possible word in the document
 An attribute is true if the word is in the document and false
otherwise

Consider a secondary index on each attribute (word)
of a document
 Only those records where an attribute is true are contained
in the index
▪ So the index leads to documents with particular words

Indices for all attributes (words) are combined into a
single index
 Known as an inverted index
 The index uses indirect buckets for space efficiency

Inverted indexes of words in multiple
documents can be used to satisfy queries
 Using intersection, documents that contain
multiple target words can be found
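A toy sketch of this idea in Python (the documents and words here are made up); each word maps to the set of documents containing it, and a multi-word query intersects those sets:

```python
# Hypothetical document collection: doc id -> text.
documents = {
    1: "the cat sat on the mat",
    2: "the dog sat",
    3: "cat and dog",
}

# Build the inverted index: word -> set of doc ids containing the word.
inverted = {}
for doc_id, text in documents.items():
    for word in text.split():
        inverted.setdefault(word, set()).add(doc_id)

def find_docs(*words):
    """Ids of documents containing every target word (set intersection)."""
    result = None
    for w in words:
        ids = inverted.get(w, set())
        result = ids if result is None else result & ids
    return result if result else set()
```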

Additional information can be maintained to
match words in sections of documents
 Titles, headers, and other sections
 Number of occurrences of words
 ...

An index’s search key can contain several fields
 Such search keys are referred to as composite search keys or
concatenated keys
▪ e.g. {fName, lName}

For searches on equality the values for each field in the
search key must match the values in record
 e.g. 'Joe Smith' does not match 'Joe Jones' or 'Fred Smith'

For range queries, ranges may be specified for any fields
in the search key.
 If no values are specified for a field it implies that any value is
acceptable for that field

Multiple level indexes can be very useful in speeding
up queries
 There is a general data structure that is used in
commercial DBMSs
 Known as B trees
▪ We will look at B+ trees, a commonly used variant

B trees have two desirable properties
 They keep as many levels as are required for the file being
indexed
 Space on tree blocks is managed so that each block is at
least ½ full

B trees are balanced structures
 All paths from the root to a leaf have the same length
 Most B trees have three levels
 But any number of levels is possible

B trees are similar to binary search trees
 Except that B tree nodes contain more than two children
 That is, they have greater fan-out
 B tree node size is chosen to be the same as a disk block

The number of data entries in a node is determined
by the size of the search key
 Up to n search key values and n + 1 pointers
 The value n is chosen to be as large as possible and still
allow n search keys and n + 1 pointers to fit on a block

Example
 If block size is 4,096, and the keys are 4 byte integers and
pointers are 8 byte addresses
 Find the largest value n such that 4n + 8(n + 1) ≤ 4,096
 n = 340
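The same calculation in code; the block, key, and pointer sizes are the ones assumed in the example:

```python
def max_fanout(block_size, key_size, ptr_size):
    """Largest n such that n keys and n + 1 pointers fit on one block:
    solve key_size*n + ptr_size*(n + 1) <= block_size for n."""
    return (block_size - ptr_size) // (key_size + ptr_size)
```

With a 4,096-byte block, 4-byte keys, and 8-byte pointers this gives n = 340, matching the example.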

Search keys in leaf nodes are copies of the keys in
the data file
 The leaf nodes contain the keys in order
 The left most n pointers point to records in the data file
▪ A leaf node must use at least (n + 1)/2 of these pointers
 The right most pointer points to the next leaf
(Figure: a leaf node with keys 12, 24, 29; each key's pointer leads to the record with that key, and the rightmost pointer leads to the next leaf in the tree.)

In interior nodes, pointers point to next level nodes
 Label the search keys K1 to Kn, and pointers p0 to pn
▪ Pointer p0 points to nodes whose search key values are less than K1
▪ Other pointers, pi, point to nodes with search keys greater than or
equal to Ki and less than Ki+1
 An interior node must use at least (n + 1)/2 pointers
(Figure: an interior node with keys 12, 24, 29; its pointers lead to subtrees with K < 12, 12 ≤ K < 24, 24 ≤ K < 29, and K ≥ 29. A second figure shows a complete tree with n = 3, so that (n + 1)/2 = 2: root (17), interior nodes (10) and (27 87), and leaves (2 5), (10 13 15), (17 22), (27 35), (87 91).)

B+ trees can be used to build many different indices
 A B+ tree could be a sparse index on a sorted data file, or
 A dense index on a data file

We will assume for now that there are no duplicate
values for search keys
 That is the search key is a candidate key for the relation
 The meaning of interior nodes changes slightly if there are
duplicate search key values

The B+ tree search algorithm is similar to a BST
 To search for a value K start at the root and end at a leaf
 If the node is a leaf and the ith key has the value K then
follow the ith pointer to the record
 If the node is an interior node follow the appropriate
pointer to the next (interior or leaf) node
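The search walk can be sketched in Python; the node classes here are simplified stand-ins, with interior pointer i leading to keys greater than or equal to key i (and pointer 0 to smaller keys):

```python
import bisect

class Interior:
    def __init__(self, keys, children):
        self.keys, self.children = keys, children   # n keys, n + 1 children

class Leaf:
    def __init__(self, keys, rids):
        self.keys, self.rids = keys, rids           # parallel sorted lists

def bptree_search(node, k):
    """Walk from the root to a leaf; return the rid for key k, or None."""
    while isinstance(node, Interior):
        # follow pointer p_i where K_i <= k < K_(i+1); p_0 if k < K_1
        node = node.children[bisect.bisect_right(node.keys, k)]
    i = bisect.bisect_left(node.keys, k)
    return node.rids[i] if i < len(node.keys) and node.keys[i] == k else None

# A small illustrative tree: root (17) over two leaves.
example = Interior([17], [Leaf([2, 5, 10], ["r2", "r5", "r10"]),
                          Leaf([17, 22], ["r17", "r22"])])
```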

Searching a B+ tree index requires a number of disk I/O
operations equal to the height of the tree
 Plus one I/O to retrieve the record
▪ If there are multiple searches on the same table the root of the tree
is probably in main memory
(Figure: in the example tree, a search for 22 visits the root (17), the interior node (27 87), and the leaf (17 22); a search for 16 visits the root, the interior node (10), and the leaf (10 13 15), where 16 is not found.)

B+ trees are useful for processing range queries
 A range query typically has a WHERE clause that
specifies a range of values

Assume a query specifies values from x to y
 Search the tree for the leaf that should contain value x
 Follow the leaf pointers until a key greater than y is
found
 The tree can also be used to satisfy queries that have
no lower bound or no upper bound
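A sketch of the leaf-chain scan, using dicts as stand-in leaf nodes; a real implementation would first locate the starting leaf with the tree search:

```python
def range_scan(leaf, x, y):
    """All keys k with x <= k <= y, starting from the leaf that should
    contain x and following next-leaf pointers until a key exceeds y."""
    out = []
    while leaf is not None:
        for k in leaf["keys"]:
            if k > y:
                return out      # keys are sorted, so we can stop here
            if k >= x:
                out.append(k)
        leaf = leaf["next"]     # follow the chain to the next leaf
    return out

# Illustrative chained leaves (keys sorted within and across leaves).
last = {"keys": [27, 35], "next": None}
mid = {"keys": [17, 22], "next": last}
first = {"keys": [10, 13, 15], "next": mid}
```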

Insert the record in the data file
 Retaining the rid and the search key value, K

Insert the entry in the appropriate place in a leaf
 Use the search algorithm to find the leaf node
 Insert a data entry, if it fits, the process is complete

If the target leaf node is full then split it
 The first (n + 1) / 2 entries stay in the original node
 Create a new node with the remaining (n + 1) / 2 entries to the
right of the original node
 Insert an entry with the first search key value from the new leaf
in its parent node that points to the new leaf
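A sketch of the leaf split in Python, treating a leaf as a sorted list of (key, rid) pairs; the caller would link the new node into the leaf chain and insert the separator key in the parent:

```python
import bisect
import math

def split_leaf(entries, new_entry):
    """entries: n sorted (key, rid) pairs filling a leaf; new_entry is the
    overflowing entry. Returns (left, right, separator): the first
    ceil((n + 1)/2) entries, the rest, and the first key of the new node."""
    keys = [k for k, _ in entries]
    pos = bisect.bisect_left(keys, new_entry[0])
    merged = entries[:pos] + [new_entry] + entries[pos:]
    mid = math.ceil(len(merged) / 2)
    left, right = merged[:mid], merged[mid:]
    return left, right, right[0][0]
```

Inserting key 8 into a full leaf holding 2, 11, 21 (n = 3) leaves (2, 8) in the original node and moves (11, 21) to the new one, with 11 going to the parent.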

Adding an entry to an interior node may cause it to split
 After inserting a new entry there would be n + 1 keys (and n + 2 pointers)
 The first (n + 2) / 2 pointers stay in the original node
 Create a new node with the remaining (n + 2) / 2 pointers, to the right of the original node
 Leave the first n / 2 keys in the original node and move the last n / 2 keys to the new node
 The remaining key's value falls between the values in the original and new node
 This left over key is inserted into the parent of the node, along with a pointer to the new interior node
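The interior split can be sketched the same way; ceilings make the counts explicit (for n = 3, three pointers and two keys stay, and the middle key is promoted):

```python
import math

def split_interior(keys, ptrs):
    """keys: n + 1 sorted keys; ptrs: n + 2 child pointers (one overflow).
    Returns (left_keys, left_ptrs, middle_key, right_keys, right_ptrs);
    the middle key moves to the parent and appears in neither child."""
    n = len(keys) - 1
    kmid = math.ceil(n / 2)            # keys staying in the original node
    pmid = math.ceil((n + 2) / 2)      # pointers staying in the original node
    return (keys[:kmid], ptrs[:pmid],
            keys[kmid],                # promoted to the parent
            keys[kmid + 1:], ptrs[pmid:])
```

For n = 3, inserting key 8 into a full node with keys 6, 11, 23 promotes 11, leaving keys (6, 8) on the left and (23) on the right; each child keeps one more pointer than it has keys.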

Moving a value to a higher, interior level of the tree,
may again cause a split
 The same process is repeated until no further splits are
required, or until a new root node has been created

If a new root is created it will initially have just one
key and two children
 So will be less than half full
 This is permitted for the root (only)
(Worked example figures: building a B+ tree with n = 3)
 Insert 2, 21 and 11: the values are maintained in order in a single leaf (2 11 21)
 Insert 8: the leaf is full, so create a new node with the last half of the values (11 21), chain it to the original node (2 8), and create a new root (11) from the first value of the new leaf
 Insert 64, then 5, then 23: both leaf nodes are now full, and inserting 23 splits (11 21 64) into (11 21) and (23 64), adding 23 to the root (11 23)
 Insert 97 and 6: inserting 6 splits the leaf (2 5 8) into (2 5) and (6 8), giving root keys (6 11 23); then insert 19 and 9
 Insert 7: the leaf (6 8 9) splits into (6 7) and (8 9), and 8 must be inserted in the parent (6 11 23), which is full; it splits in turn, keeping half the pointers and the first n/2 keys (6 8), moving the last n/2 keys (23) to a new node, and making the middle value (11) the new root
 After more insertions, insert 77: the leaf (60 64 93) splits into (60 64) and (77 93), and 60 must be inserted in the full interior node (23 45 60); it splits, and 60 is promoted into the root, which becomes (11 60)

Find the entry in the leaf node and delete it
 This may result in there being too few entries in the node
 If so select an adjacent sibling of the node and

Redistribute values between the two nodes
 So that both nodes have enough entries
 If this is not possible

Coalesce the two nodes
 Delete the appropriate value and pointer in the parent
node

A value and pointer are removed from an adjacent sibling
and inserted in the node with insufficient entries
 The sibling can be the left or the right sibling, although it makes
a slight difference to the process
 The chosen node must be a sibling to ensure that only a single
parent node is affected

After redistribution, one of the two nodes will have a
different first search key value
 The corresponding value in the parent node must be changed to
this value

If the node's sibling(s) have insufficient entries redistribution
may not be possible

When redistribution is not possible, two nodes can be
combined (or coalesced if you prefer)
 Keep track of the value in the parent between the pointers to
the two nodes to be combined
 Insert all of the values (and pointers) from one node into the
other
 Re-connect links between leaves (if the nodes are leaves)

Make a recursive call to the deletion process, deleting
the identified value in the parent node
 This, in turn, may require non-leaf nodes to be coalesced

The deletion algorithm requires a choice to be made
between siblings
 Such a choice has to be implemented in the algorithm

Coalescing nodes requires more work
 It may result in making changes up the tree, but
 The tree height may be reduced

Redistributing nodes requires less work, but does
not impact the height of the tree
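A sketch of redistribution between leaf siblings, with leaves reduced to key lists and rids omitted; the index of the parent key separating the two leaves is passed in:

```python
def redistribute_from_left(left, node, parent_keys, sep_index):
    """Move the last key of `left` to the front of `node`, then update the
    parent's separator so it equals the node's new first key."""
    node.insert(0, left.pop())
    parent_keys[sep_index] = node[0]

# Example mirroring the deletion discussion: leaf (51) borrows 39 from its
# left sibling (23 31 39); the separator 45 in the parent becomes 39.
sibling, leaf, parent = [23, 31, 39], [51], [23, 45]
redistribute_from_left(sibling, leaf, parent, 1)
```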
(Worked example figures: deletion, with n = 3 and root (11 60))
 Delete 19: use search to find the value, and delete the record in the data file first; the leaf (11 19 21) still has enough pointers afterwards
 Delete 45: the leaf (45 51) is left with fewer than (n + 1)/2 = 2 pointers to records, so take a value from the left sibling (23 31 39) and change the corresponding value in the parent node from 45 to 39
 Delete 9: the leaf (8 9) is left with one pointer; (11 19 21) is not a sibling, so coalesce the node with its left sibling (6 7) and delete the entry and pointer from the parent, which still has enough (2) pointers
 Delete 6: the leaf becomes (7 8) and the parent entry doesn't need to change
 Delete 8: the leaf (7 8) becomes (7) and is coalesced with its sibling (2 5); deleting the entry and pointer from the parent leaves it with just one pointer, so a pointer is taken from its sibling (23 45): key 11 moves down from the root into the parent, 23 moves up from the sibling, and the root becomes (23 60)
 Exercise: what happens when 23 and 31 are deleted?

Splitting and merging of index blocks is rare
 Typically the value of n will be much greater than 3!
 Most splits or merges are limited to two leaves and one
parent

The number of disk I/Os is based on the tree height
 It is a reasonable assumption that the majority of B trees
have a height of 3
 And one level is the root (i.e. one block) which can reside
in main memory

Assume a block size of 4,096 and rid and data
entry size of 8 bytes
 Each tree node can contain 340 key values and
pointers
 If each node is 3/4 full that is 255 pointers

How many records can be accessed by such a tree?
 With 3 levels, 255^3 = 16,581,375 records
 With 4 levels, 255^4 = 4,228,250,625 records

Some B+ tree implementations don't fix
interior nodes for deletions
 If a leaf has too few keys and pointers it is allowed
to remain unchanged
 It is assumed that most DB files tend to grow not
shrink

It also allows efficient access to records
replaced by tombstones in the data file

The search algorithm assumes that all search key entries
with a given key are in the same node
 If duplicate values are allowed this may not be the case

Three methods for dealing with duplicate values are
 Maintain overflow pages for duplicates where necessary
 Include the rid as part of the search key
▪ Which ensures that there will not be duplicates
 Modify the tree to change the meaning of interior nodes
▪ Keys in interior nodes represent new keys, that is keys where the
same value does not appear to the left
▪ In some cases this requires null keys
(Figure: finding 11, 19, and 23 in a tree with duplicate keys; an interior node holds the keys (−, 31, 57), where the null first key means no new key appears in its second child, so that pointer is never followed.)

A B+ tree can be created by repeated inserts of
data using the insertion algorithm
 This process is likely to be inefficient as the same disk
pages may be accessed more than once

An alternative is to initially sort the data and fill
the leaf nodes
 Non-leaf nodes can be created as necessary to create
the index
 Space should be left in all nodes to accommodate
future insertions

The height of a tree is determined by the fan-out
 The number of children of each node
 The fan-out is determined by the number of search key
values and pointers that can fit in one page
 A smaller search key leads to a greater fan-out and a tree
with fewer levels

It may be possible to compress search keys
 For example, a search key on last name can be truncated
to the extent that it is still sufficient to guide the search

In a hash table a hash function maps search key
values to array elements
 The array can either contain the data objects, or
 Linked lists containing data objects, call these buckets

Hash functions generate a value between 0 and B-1
 Where B is the number of buckets
 A record with search key K is stored in bucket h(K)

Buckets should consist of single blocks
 Full buckets can be chained to overflow blocks

The bucket locations need to be recorded
 An array of pointers to buckets, or
 The first block of each bucket is stored in consecutive disk
locations

The number of buckets should be greater than
 (Number of data entries) ÷ (entries per page)
 A hash function would compute the remainder of K/B
▪ Where K is the key value and B is the number of buckets
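A minimal sketch of such a hash function for integer keys:

```python
def hash_bucket(key, num_buckets):
    """Static hashing: a record with search key K is stored in bucket
    h(K) = K mod B, where B is the number of buckets."""
    return key % num_buckets
```

For example, with B = 8 the key 41 maps to bucket 1.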

A good hash function should evenly distribute
values over the buckets
 A hash function should be both uniform and random
▪ Buckets should be assigned the same number of values from the set
of all possible values, and
▪ On average each bucket should contain the same number of
entries, i.e. evenly distribute the actual values
 Note that buckets are expected to contain more than one
search key value

A typical hash function is a bit representation of
the search key value modulo the number of buckets

Compute h(K) when a new record is to be inserted
 Insert the record in the bucket (or its overflow blocks)
▪ If necessary add an overflow block

Deletion is similar to insertion
 Use the hash function to find bucket h(K)
 Delete any records with search key K
 Consider combining blocks if possible

There should be enough buckets so that most
fit on one block
 i.e. there are few overflow blocks
 Most lookups require only two disk I/Os

The number of buckets is fixed
 Such indexes are referred to as static hash tables
 If the file grows most buckets will have overflow
chains, reducing efficiency

Hash indexes do not support range lookup

Two versions of dynamic hashing
 Extensible hashing, and
 Linear hashing

Both systems use the concept of a family of hash
functions
 As the size of the index grows larger, more disk pages
are required to store the data entries
 Rather than creating overflow pages, additional
buckets are created, and
 The range of the hash function is increased

Consider a hash function that returns some bit value
 e.g. 0100 1101 0110 0101
 The bucket can be derived by calculating the bit value
modulo n (the number of buckets)
 Note that if n is a power of 2 the bucket can be determined
by looking at the last k bits (where n = 2^k)
▪ e.g. if n = 8 the bit value shown above maps to bucket 5

The range of a hash function can be doubled by
increasing the number of relevant bits by one
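In code, taking the last k bits is a bit mask; widening the mask by one bit doubles the range:

```python
def last_bits(hash_value, k):
    """Bucket number from the last k bits of the hash value;
    equivalent to hash_value mod 2**k."""
    return hash_value & ((1 << k) - 1)
```

For the bit value 0100 1101 0110 0101 above, the last 3 bits give bucket 5 (101), matching the n = 8 example.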

In extensible hashing there is a directory to buckets
 The directory is an array of pointers
 The array's size is always a power of 2
▪ If the directory needs to increase in size, it doubles
 New buckets are only created as necessary

The hash function computes a sequence of bits
 The directory (and the associated buckets) uses a smaller
number of i bits
 The directory will have 2^i entries
▪ When the directory grows, i + 1 bits of the hash value are used

The directory consists of an array of pointers to buckets
 As the array only contains pointers it is relatively small, so
 Can usually fit on one page

The array index is calculated with the hash function
 And is determined by the relevant bits of the hash value
 The array size is determined by how many bits of the hash
function are being used

Even though the directory may become large, new
buckets are only created when overflow occurs
i – the number of bits of the hash value used by the directory (the global depth)
j – the number of bits used to determine membership in a bucket; it appears in the block header (the local depth)
(Figure: a directory with i = 2; the array index is the last two bits of the hash value. Assume three data entries fit in one bucket, i.e. one disk page. The values shown are the decimal equivalents of hash values, not search key values; only four index pages are currently required.)
(Worked example figures: insertion into an extensible hash table)
 Insert a record where h(K) = 1100 1000 with i = 1: follow the pointer at directory index 0, as only the last bit of the hash value is currently used
 Insert a record where h(K) = 1101 1010: the block is full; compare j to i, and since j = i, double the directory size, distribute the new pointers based on the previous index, split the block that was full using the last two bits (i = 2), increment j in the split blocks, and adjust the pointers to the block that has split
 Insert a record where h(K) = 1111 1101: the block at index 01 has room
 Insert a record where h(K) = 1001 1001: the block is full, but j < i, so split the block without doubling the directory, increment j in the new blocks, and adjust the pointers to the new block
 Insert a record where h(K) = 0110 1101: the block is full and j = i, so the directory doubles (i = 3), the block is split, and its values are distributed
 Exercise: where would a record with h(K) = 1110 1101 be inserted?

If a deletion empties a bucket it can be
merged with its counterpart
 The pointer entries in the directory are reset and
 The existing bucket's local depth is decremented

In practice this is not usually performed

If the directory fits in main memory, performance is
identical to static hashing
 One disk read to use index and one to retrieve record
 If the directory does not fit in memory, another read is required
▪ In practice, the directory usually fits in memory

Many collisions in a bucket creates a large directory
 Reducing the chance that it will fit in memory
 More likely if the number of records per block is small

Increasing the directory size is relatively time
consuming and interrupts access to the data file

If many entries have the same hash value across the
entire range of bits overflow pages are created
 The overflow pages are chained to the primary pages
 This can occur if the hash function is poor, or
 If there are many insertions with the same search key value
(i.e. a skewed distribution)

Otherwise repeated insertions of entries with the
same overall hash value would lead to
 The same bucket being repeatedly split, and
 The directory repeatedly doubling


Linear hashing is another dynamic hashing system
The number of buckets (n) is selected to maintain an
average occupancy in buckets
 e.g. 80%

Buckets are not always split when full
 So overflow blocks are allowed
 The average number of overflow blocks per bucket is less
than 1

The number of bits used to number the entries in
the bucket array is ⌈log₂ n⌉


Linear hashing does not use a directory
The hash function determines which bucket a record
is mapped to
 The primary blocks of the buckets are stored sequentially
 So that bucket m can be found by adding m to the address
of the first bucket

Like extensible hashing only the right most i bits of
h(K) are used to determine a bucket

At any time i bits of the hash value are used to map
records to n buckets
 Values for the i bits range from 0 to 2^i - 1
 The value of n may be less than 2^i

Computing h(K) on a search key value K results in a value m
for the last i bits of h(K)
 If m < n place the record in bucket m
 If n ≤ m < 2^i then place the record in bucket m - 2^(i-1)
▪ i.e. change the left most bit of m to 0
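The bucket computation can be sketched directly from these two rules:

```python
def linear_bucket(hash_value, i, n):
    """Linear hashing: m is the last i bits of h(K); if bucket m does not
    exist yet (m >= n), clear the leading bit, i.e. use m - 2**(i - 1)."""
    m = hash_value & ((1 << i) - 1)
    if m >= n:
        m -= 1 << (i - 1)
    return m
```

With i = 3 and n = 6, h(K) = 1001 0100 maps to bucket 100 (4), while h(K) = 0101 1110 maps first to 110 (6), which does not exist, and then to 010 (2).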
(Figure: linear hashing with i = 3 and n = 6)
 A record where h(K) = 1001 0100 is placed in bucket 100, as 4 (100) < n
 A record where h(K) = 0101 1110 maps to bucket 110, which does not exist, so just use i - 1 bits (or subtract 2^(i-1) from 110) and place it in bucket 010

Periodically buckets are added to the index
 When the ratio r / n (r records, n buckets) exceeds a
threshold value a new bucket is created
 The bucket that is added to the index may not have any
relationship to the bucket that was just inserted into

When a bucket is added values in its related bucket
are distributed
 That is, the bucket whose index is the index of the new
bucket - 2^(i-1)

When n > 2^i, i is incremented by one
insert record where h(K) = 1100 1011
insert in bucket 11
which does not
exist so insert in 01
00 0100 0000 0100 0100
01 0000 1101 0100 0011 1001 0101
10 0100 0010
r / n < 2.4 so no new
bucket is created
i=2
n=3
r = 67
max occupancy = 0.8
so max r / n = 2.4
1100 1011
add an overflow
block to the bucket
insert record where h(K) = 1000 0110
insert in bucket 10
r / n = 2.667, so
make a new bucket
00 0100 0000 0100 0100
01 0000 1101 0100
1001 0101
0011 1001 0101
10 0100 0010 1000 0110
11 0100 0011 1100 1011
i=2
n = 43
r = 87
max occupancy = 0.8
so max r / n = 2.4
1100 1011
the next bucket is 11, so
distribute the values in 01
insert record where h(K) = 1011 1110
00 0100 0000 0100 0100
01 0000 1101 1001 0101
10 0100 0010 1000 0110 1011 1110
11 0100 0011 1100 1011
i=2
n=4
r=9
8
max occupancy = 0.8
so max r / n = 2.4
insert in bucket 10
r / n = 2.25, so don't
make a new bucket
insert record where h(K) = 0101 0110 (i = 2 → 3, n = 4 → 5, r = 9 → 10, max r / n = 2.4)
 insert in bucket 10, which is full, so add an overflow block
 r / n = 10 / 4 = 2.5 > 2.4, so make a new bucket
 n = 2^i, so increase i to 3
 the next bucket is 100, so distribute the values in bucket 00

000: 0100 0000
001: 0000 1101, 1001 0101
010: 0100 0010, 1000 0110, 1011 1110 | overflow: 0101 0110
011: 0100 0011, 1100 1011
100: 0100 0100
insert record where h(K) = 1101 1101 (i = 3, n = 5, r = 10 → 11, max r / n = 2.4)
 the record maps to bucket 101, but n − 1 (100) < 101, so insert in bucket 001
 r / n = 11 / 5 = 2.2, so don't make a new bucket

000: 0100 0000
001: 0000 1101, 1001 0101, 1101 1101
010: 0100 0010, 1000 0110, 1011 1110 | overflow: 0101 0110
011: 0100 0011, 1100 1011
100: 0100 0100
insert record where h(K) = 1001 1100 (i = 3, n = 5, r = 11 → 12, max r / n = 2.4)
 insert in bucket 100
 r / n = 12 / 5 = 2.4, so don't make a new bucket

000: 0100 0000
001: 0000 1101, 1001 0101, 1101 1101
010: 0100 0010, 1000 0110, 1011 1110 | overflow: 0101 0110
011: 0100 0011, 1100 1011
100: 0100 0100, 1001 1100
insert record where h(K) = 1000 1000 (i = 3, n = 5 → 6, r = 12 → 13, max r / n = 2.4)
 insert in bucket 000
 r / n = 13 / 5 = 2.6 > 2.4, so make a new bucket
 the next bucket is 101, so distribute the values in bucket 001 (all three of its records end in 101, so all move to the new bucket)
 the next time a new bucket is created the values in 010 will be distributed into it

000: 0100 0000, 1000 1000
001: (empty)
010: 0100 0010, 1000 0110, 1011 1110 | overflow: 0101 0110
011: 0100 0011, 1100 1011
100: 0100 0100, 1001 1100
101: 0000 1101, 1001 0101, 1101 1101
Linear hashing does not require a directory
Linear hashing may result in less space efficiency
because buckets are split before they overflow
 Multiple collisions in one bucket in extensible
hashing will result in a large directory


 Such a directory may not fit on one disk page

Collisions in linear hashing lead to long overflow
chains for the bucket with the collisions
 Requiring multiple disk reads for that bucket
 But no increase in the cost of accessing other buckets

Queries may contain complex conditions
 ... where name = 'bob' and age > 50

Indexes on the queried attributes can be used
to answer the query
 By retrieving the matching rids from each index
and taking the intersection of the results
 Note that this general strategy works poorly for
disjunctions (or clauses)
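The rid-intersection strategy for a conjunction can be shown in a small sketch (the index contents below are invented for illustration).

```python
# Hypothetical value -> rid-set indexes used to answer
# "... where name = 'bob' and age > 50".
name_index = {"ann": {1, 4}, "bob": {2, 5, 7}}
age_index = {30: {1}, 45: {4}, 55: {5}, 60: {2, 9}}

# rids satisfying each condition separately
rids_name = name_index.get("bob", set())
rids_age = set().union(*(rids for age, rids in age_index.items()
                         if age > 50))

# the intersection holds exactly the rows satisfying the conjunction
matching = rids_name & rids_age
```

Only the records in `matching` need to be fetched from disk, which is why intersecting rid sets is cheaper than fetching the rows for either condition alone.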

If two attributes are frequently queried
together they can be combined in an index
 Called a composite search key
 A single index where the search key is a
concatenation of two (or more) attributes

The search keys in both B+ trees and hash
indices can be composite
 A B+ tree composite index allows queries on just
the first attribute of the search key to be satisfied
 Whereas a hash index does not
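The reason for this difference can be seen with sorted composite keys; the sketch below (keys invented) shows why sorted order supports lookups on just the first attribute, while hashing the whole key does not.

```python
import bisect

# Composite (name, age) keys kept in sorted order, as the leaves of a
# B+ tree would be. All keys sharing a name prefix are contiguous.
keys = sorted([("ann", 30), ("ann", 40), ("bob", 35), ("dave", 75)])

def lookup_by_name(name):
    # binary search for the contiguous run of keys with this prefix
    lo = bisect.bisect_left(keys, (name,))
    hi = bisect.bisect_right(keys, (name, float("inf")))
    return keys[lo:hi]

# A hash index on the full (name, age) pair scatters these entries
# across buckets, so a query giving only the name has no bucket to
# probe and must scan everything.
```

`lookup_by_name("ann")` returns both ann entries with two binary searches, exactly as a B+ tree range scan would.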

There are a number of motivations for
specialized indexes
 Usually applications that require some variant
of complex range queries

Geographic information systems
 Partial match queries
 Range queries
 Nearest neighbour queries

OLAP databases
 Queries on multidimensional data


An index on indexes
An index on one attribute is built above an
index on a second
 The first index refers to index pages on the second
attribute
▪ Search key values in the lower index may be repeated
for different search key values from the first index

This can be generalized to more than two
attributes
[Figure: a multiple-key index. A name index (ann, bob, dave, kim, lee, …) points to per-name age indexes (e.g. ann → 30, 40, 50; bob → 35, 50; dave → 40, 75), whose entries point to records in the file, such as (bob, 50), (kate, 33), (ann, 50), (dave, 40), (ann, 40), (ann, 30), (bob, 35), (dave, 75)]

Multiple key indices work well for range
queries
 If the individual indices support range queries

They do not support queries where data for
the first attribute is missing
 Similarly to composite search keys

A k-dimensional search tree that generalizes
a binary search tree
 For multidimensional data


An in-memory data structure that can be
adapted to block storage on disk
kd tree nodes contain an attribute name and
an associated value
 Such as {salary, 40000}

A kd tree is structurally similar to a binary search
tree
 Values less than the node’s value are in its left subtree
 Values greater than the node’s value are in its right
subtree

The attributes at different levels of the tree are
different
 The levels rotate through the attributes of the tree
▪ With two attributes, the levels alternate between the
attributes
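A minimal in-memory insert routine that rotates the discriminating attribute per level, as described above, might look like this (an illustrative sketch; the record contents are invented).

```python
# Two-attribute kd tree: the attribute compared at depth d is
# ATTRS[d % 2], so levels alternate between name and age.
ATTRS = ("name", "age")

class Node:
    def __init__(self, record):
        self.record = record          # e.g. {"name": "bob", "age": 30}
        self.left = None              # values < this node's value
        self.right = None             # values >= this node's value

def insert(root, record, depth=0):
    if root is None:
        return Node(record)
    attr = ATTRS[depth % len(ATTRS)]  # discriminator for this level
    if record[attr] < root.record[attr]:
        root.left = insert(root.left, record, depth + 1)
    else:
        root.right = insert(root.right, record, depth + 1)
    return root
```

Search follows the same alternation: a fully specified query descends one path, while a query missing the current level's attribute must explore both subtrees.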
[Figure: an example kd tree on attributes name and age. The root discriminates on name (kate); the next level on age (60, 30); deeper levels alternate again (name bob, name sue, age 47). Leaf records include (ann, 25), (ben, 37), (art, 22), (ada, 40), (joe, 65), (sue, 30), (kat, 60), (kim, 40), (joe, 30), (ned, 49), (sue, 47), (hil, 40), (ren, 52), (zak, 60)]


Leaf nodes of kd trees should be blocks
The interior nodes can be adapted to be more
like B tree nodes
 With multiple key-pointer pairs
 Where each interior node is a block

A bitmap index consists of multiple vectors of
bits
 With one vector for each possible value of the
attribute
▪ A bitmap to record if a patient was a smoker would
require two bit vectors
▪ A bitmap on age might require 100 bit vectors
 The ith bit of a vector is set to 1 if the ith row of
the table has the vector’s value for the attribute
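Building the vectors is a single pass over the column; a short sketch (the smoker column below is invented for illustration):

```python
# Build a bitmap index on one column: one bit vector per distinct
# value, with bit i set when row i holds that value.
def build_bitmap(column):
    vectors = {}
    for i, value in enumerate(column):
        vectors.setdefault(value, [0] * len(column))
        vectors[value][i] = 1
    return vectors

smoker = ["Y", "N", "N", "Y", "N"]
index = build_bitmap(smoker)
# index["Y"] == [1, 0, 0, 1, 0]
# index["N"] == [0, 1, 1, 0, 1]
```

The two-value smoker column needs two vectors, matching the slide's observation that an age column could need on the order of 100.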

A bitmap index can speed up queries on sparse columns,
which have few possible values
 One bit vector is allocated for each possible value

The indexes can be used to answer some queries
 How many male customers have a rating of 3?
 AND the M and 3 columns and count the 1s
gender                                     rating
index                                      index
 M  F     id   name  sex  rating     1  2  3  4  5
 1  0     112  Sam   M    5          0  0  0  0  1
 0  1     113  Sue   F    3          0  0  1  0  0
 0  1     121  Ann   F    2          0  1  0  0  0
 1  0     131  Bob   M    3          0  0  1  0  0

Bitmap indexes can satisfy conjunctions
 By taking the logical AND of the appropriate
vectors

They are also useful for OR and NOT conditions
 By taking the appropriate Boolean combination of
the bit vectors
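The male-customers-with-rating-3 count from the example above reduces to an AND followed by a population count; a sketch using the four rows of that table:

```python
# Bit vectors for the four example rows
# (112 Sam M 5, 113 Sue F 3, 121 Ann F 2, 131 Bob M 3).
M = [1, 0, 0, 1]         # gender index, column M
rating3 = [0, 1, 0, 1]   # rating index, column 3

# AND the vectors, then count the 1s to answer the query
anded = [a & b for a, b in zip(M, rating3)]
count = sum(anded)       # male customers with rating 3
```

Only row 131 (Bob) has a 1 in both vectors, so the count is 1; a real system would store the vectors as machine words and use bitwise instructions rather than Python lists.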

Bitmap indices are often used in databases
for data mining and OLAP
 Which often have low cardinality attributes
 And change relatively infrequently


Joins are often expensive operations
Join indexes can be built to speed up specific join
queries
 A join index contains record IDs of matching records from
different tables
 e.g. Sales, products and locations of all sales in B.C.
 The index would contain the sales rids and their matching
product and location rids
▪ Only locations where province = "BC" are included

The number of such indices can be a problem where
there are many similar queries

To reduce the number of join indices separate
indexes can be created on selected columns
 Each index contains rids of dimension table records that
meet the condition, and rids of matching fact table records
 The separate join indices have to be combined, using rid
intersection, to compute a join query

The intersection can be performed more efficiently if
the new indices are bitmap indices
 Particularly if the selection columns are sparse
 The result is a bitmapped join index