Indexed File Organization

• Indexing allows access to records based on a key, on
which the file is stored and accessed. The address of a
record is some function of the key. Student ID, social
security number, citizen ID, etc. are good candidate keys
for an indexed file organization.
• Simple indexing maintains a separate index file in
addition to the data file.
• In this case, the index file is generally small enough
to fit in the available memory. The index file is handled
as a sorted file (or array) of fixed-length records.
• The data file is kept unsorted, possibly with
variable-length records.
• This method becomes cumbersome when the index
file grows too big to fit in memory, or when many
updates are needed, because deleting records from and
inserting records into the sorted index file becomes
costly.
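A minimal sketch of simple indexing, assuming the whole index fits in memory. The sample records, the names `data_file`, `index`, and `fetch`, and the use of a list position as the record address are illustrative assumptions, not part of the slides.

```python
import bisect

# Unsorted data "file": records are (key, payload) pairs in arrival order.
# (Hypothetical sample data; a list position stands in for a disk address.)
data_file = [(42, "Dana"), (7, "Ali"), (19, "Mia"), (3, "Omar")]

# Index "file": fixed-length (key, record_address) entries, sorted by key.
index = sorted((key, pos) for pos, (key, _) in enumerate(data_file))
index_keys = [key for key, _ in index]


def fetch(key):
    """Binary-search the in-memory index, then read one data record."""
    i = bisect.bisect_left(index_keys, key)
    if i == len(index_keys) or index_keys[i] != key:
        return None  # key is not in the index
    _, pos = index[i]
    return data_file[pos]
```

Inserting a record means appending to `data_file` and inserting into the sorted `index`, which is the costly part once the index no longer fits in memory.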
Mbozyigit
1
• The problem is to find a search method that does
better than binary search, and thus better than simple
indexing.
• Techniques such as binary search trees or AVL trees
can also be used, but a search still takes on the order of
log2 N node visits. When each visit may cost a disk
access, this may be considered unaffordable for a large
file.
• As an alternative to binary or AVL tree based
processing, multilevel indexing or hashing is suggested.
o In this category, three methods can be mentioned as
potential file organizations for very large files:
ISAM, B Trees, and Hashing.
Historically, an indexing method known as ISAM
(Indexed Sequential Access Method) was used by
well-known vendors of database management systems,
such as IBM.
ISAM
• ISAM (Indexed Sequential Access Method) is generally
based on a cylinder index and a block index.
o The cylinder index contains the highest record key
in each cylinder.
o The block index contains the highest record key
in each block.
o To access a record:
▪ First, the cylinder index is accessed, generally
once for each disk, to find the cylinder on
which the record resides.
▪ Second, the index block on that cylinder
(containing pointers to the data blocks) is
accessed to find the address of the block in
which the target record is located.
▪ Then, that block is accessed.
• Accessing a single record requires one seek to the
target cylinder containing the data, one r+btt for the
index block, and one r+btt for the data block
(s: seek time, r: rotational latency, btt: block transfer
time):
Tf = s + r + btt + r + btt = s + 2r + 2btt
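The two-level ISAM lookup and its fetch time can be sketched as follows. The in-memory lists stand in for the cylinder index, the per-cylinder block indexes, and the data blocks; all names and sample keys are hypothetical, and overflow chains are ignored.

```python
import bisect

# cylinder_index[c] is the highest key stored on cylinder c (assumed data).
cylinder_index = [30, 60, 90]
# block_index[c][j] is the highest key stored in block j of cylinder c.
block_index = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
# blocks[c][j] maps key -> record for block j of cylinder c.
blocks = [
    [{5: "a", 10: "b"}, {15: "c", 20: "d"}, {25: "e", 30: "f"}],
    [{35: "g", 40: "h"}, {45: "i", 50: "j"}, {55: "k", 60: "l"}],
    [{65: "m", 70: "n"}, {75: "o", 80: "p"}, {85: "q", 90: "r"}],
]


def isam_fetch(key):
    """Cylinder index -> block index -> data block (no overflow area)."""
    c = bisect.bisect_left(cylinder_index, key)   # first cylinder with high key >= key
    if c == len(cylinder_index):
        return None                               # key is beyond the last cylinder
    j = bisect.bisect_left(block_index[c], key)   # first block with high key >= key
    return blocks[c][j].get(key)


def isam_fetch_time(s, r, btt):
    """Tf = s + 2r + 2btt: one seek, then index block and data block reads."""
    return s + 2 * r + 2 * btt
```

Because each index stores the highest key per unit, `bisect_left` finds the first unit whose highest key is not smaller than the search key.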
• An overflow area is allocated at the end of each
cylinder, to be used when needed. When new records are
added, the old records are shifted to open up space.
• The record with the largest key in that block is
moved to the overflow area, and a pointer to the moved
record is placed in the block where the insertion took
place.
• If there is no space in the primary area, new records
cause the number of overflow records to grow.
• After some time, with frequent insertions, there will
be a long list of linked records in the overflow area.
• Thus, ISAM performance degrades as new records are
added to the overflow area, until the ISAM file has to
be reorganized, which is a costly process.
• This inefficiency is why the once important ISAM
fell out of favor with database users.
B+Trees as an Indexed File Organization Method
(Due to Bayer and McCreight)
• A B+ Tree is a multilevel indexing type of organization.
• In a B+ Tree:
o Each node may have a variable number of children,
within fixed bounds.
o All leaves are on the same level.
• Trees of this family are referred to as B Trees, B*
Trees, or B+ Trees, with some differences among them.
The variant with data in the leaves and index entries in
the internal nodes, the B+ Tree, seems to be the most
common.
• Thus, B+ Trees are probably the most common file
organization method.
• Properties of B+ Trees
o The order of a B+ Tree (v) is the minimum number of
keys an internal node has. (Note that different
authors may define order differently!)
o The root is an exception: it needs only 2 children
at minimum, unless it is a leaf.
o No internal node can have more than 2v keys.
o All the leaves are on the same level.
o Leaves contain data records (or the addresses of the
data records, in the case of a secondary index).
Also note that a separate B+ Tree is maintained for
each secondary key.
o Leaves may also contain the address of the next
leaf for fast sequential access.
o An internal node with k keys has k+1 children.
o The keys in a node are sorted, such that a given key
is actually the largest or the smallest key in the
corresponding child node.
• Searching a B+ Tree for a record, given its key value.
Note that if there are k keys in a node (c1, …, ck), there
are k+1 pointers (p0, p1, …, pk) in the same node, one
for each descendant node.
o Given a key x, start from the root and repeat the
following until the corresponding record is
reached at a leaf:
▪ take pointer p(i-1) for the smallest i
(i = 1, …, k) such that x < ci;
▪ if x >= ck, take pk.
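The search rule above can be sketched in a few lines. The `Node` class, its field names, and the sample tree are assumptions made for illustration; each separator key is taken to be the smallest key of the child to its right.

```python
import bisect


class Node:
    """Hypothetical in-memory node: internal nodes have children, leaves have records."""

    def __init__(self, keys, children=None, records=None):
        self.keys = keys          # sorted keys c1..ck
        self.children = children  # pointers p0..pk (internal nodes only)
        self.records = records    # records aligned with keys (leaves only)


def search(root, x):
    """Follow p(i-1) for the smallest i with x < ci, or pk if x >= ck."""
    node = root
    while node.children is not None:
        # bisect_right counts the keys <= x, which is exactly the pointer index.
        node = node.children[bisect.bisect_right(node.keys, x)]
    i = bisect.bisect_left(node.keys, x)
    if i < len(node.keys) and node.keys[i] == x:
        return node.records[i]
    return None


# Small example tree: one root over three leaves.
leaves = [
    Node([1, 4], records=["r1", "r4"]),
    Node([7, 9], records=["r7", "r9"]),
    Node([12, 15], records=["r12", "r15"]),
]
root = Node([7, 12], children=leaves)
```

With this tree, `search(root, 9)` descends through `p1` because 7 <= 9 < 12.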
• Timing computations
o Tf = index access time + data access time
o If index access time = s+r+btt and data access
time = s+r+dtt, then Tf = 2s+2r+btt+dtt
o Note that dtt (data transfer time) implies a
cluster (or bucket) to hold the data, which is
generally several blocks.
• This computation assumes that the rest of the B+ Tree
is kept in memory, except for the level above the leaf
nodes and the leaf nodes themselves.
• For most files, a B+ Tree based fetch takes at most two
disk accesses. For small files, only one access per
fetch is needed; for very small files no disk access is
required, as the leaves as well as the index nodes can
all stay resident in memory during the lifetime of the
application.
• Generally, the B+ Tree nodes (internal and leaf) are
assumed to be ln 2 (about 0.69, rounded to 0.7) full.
For example, if the maximum number of keys is 200, the
average occupancy would be 140 (= 0.7 x 200).
• Given the size of the available memory, it is possible
to compute the number of buckets that can be supported
with only two disk accesses.
• An example of sizing a B+ Tree:
o Assuming that a block holds 140 keys on average and
there are k data clusters (or blocks or buckets), the
total number of internal nodes (blocks), excluding the
bottom-most two levels, is
$\sum_{i=2}^{\log_p k} k/p^i$, where $p = 140$.
o Suppose the memory size is limited to b blocks and the
target is only two disk accesses. If there are three
levels above the bottom-most two levels, then
$b = k/140^3 + k/140^2 + 1$
and there will be k/140 blocks in the level above the
leaves.
• One can solve for k if b is given, for b if k is given,
or for the blocking factor m if both b and k are given.
Note that m in the above example is 140/0.7 = 200. The
memory size is approximated as $k/p^2$.
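As a rough numeric check of the sizing example, the sketch below evaluates the three-level formula for b and inverts it for k. The function names are made up for this illustration, fractional blocks are not rounded up, and the default p = 140 is the example's average fan-out.

```python
import math


def resident_blocks(k, p=140):
    """b = k/p^3 + k/p^2 + 1: blocks held in memory above the bottom two levels."""
    return k / p**3 + k / p**2 + 1


def max_clusters(b, p=140):
    """Solve the same formula for k, given a memory budget of b blocks."""
    return (b - 1) / (1 / p**3 + 1 / p**2)


def max_blocking_factor(avg_keys=140, fill=0.7):
    """m = average occupancy / fill factor, as in the note m = 140/0.7."""
    return avg_keys / fill
```

For k = 140^3 data clusters, `resident_blocks` gives 142 blocks: 1 + 140 blocks for the two upper levels, plus the root.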
B+ Trees and secondary key
• B+ Trees are also appropriate for implementing
secondary keys.
• In this case, a separate B+ Tree is formed for each
secondary key. The difference is that the bottom-most
level contains pointers to the data records, rather
than the clusters themselves.
Time considerations
• The time to read the whole file (exhaustive read) in
the order of the primary key, ignoring in-memory
processing:
Txp = b*(s+r+dtt)
where b = (1/ln 2)*(n/m) is the number of data clusters
and n is the number of records. This assumes that leaf
nodes have links to the next node, but that the next
node is not located contiguously.
• The time to read the whole file in the order of the
secondary key:
Txs = n*(s+r+dtt) + b*(s+r+btt)
where the first term is reading the file record by
record, since a record may be in any cluster, and the
second term is reading the secondary key's B+ Tree,
which has b blocks.
This is too slow!
• Accessing the next record is fast in the primary key
case:
TN = [(1/ln 2)*(1/m)]*(s+r+dtt)
where the first factor is the probability that the
record is not in the current cluster.
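To get a feel for the gap between the two exhaustive reads, the formulas above can be evaluated directly. This is only a sketch: the function and parameter names are assumptions, and the number of blocks in the secondary index is passed in separately as `b_index`.

```python
import math

LN2 = math.log(2)


def exhaustive_read_primary(n, m, s, r, dtt):
    """Txp = b*(s+r+dtt), with b = (1/ln 2)*(n/m) clusters that are ln 2 full."""
    b = (1 / LN2) * (n / m)
    return b * (s + r + dtt)


def exhaustive_read_secondary(n, b_index, s, r, btt, dtt):
    """Txs = n*(s+r+dtt) + b*(s+r+btt): one cluster read per record plus the index."""
    return n * (s + r + dtt) + b_index * (s + r + btt)


def next_record_primary(m, s, r, dtt):
    """TN = (1/ln 2)*(1/m)*(s+r+dtt): expected cost of stepping to the next record."""
    return (1 / LN2) * (1 / m) * (s + r + dtt)
```

Note that n * TN equals Txp, so reading the file through successive next-record steps costs the same as the exhaustive primary-key read.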
B+ Tree Insertion Algorithm
• Search top-down to find the leaf node in which the new
record belongs.
• If there is room in the leaf node, insert the record and
terminate.
• If there is no room in the leaf, allocate a new leaf
node and split the records in the middle: place the
first half (rounded up) in the original leaf and the
rest in the new leaf.
• Place the smallest key value of the second (new) leaf
in the immediate parent internal node.
o If the parent internal node is already full, split it
into two internal nodes, each with half of the
keys.
o Carry the middle key value to the next level up
(the parent).
o If no parent exists while this bottom-up process
continues, create a new node (the new root in this
case).
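The two split rules can be sketched on plain sorted lists of keys. This is a simplified illustration, not a full insertion routine: it assumes the new key takes part in the leaf split, and the function names are hypothetical.

```python
import math


def insert_into_leaf(leaf, key, capacity):
    """Return (left, right, separator); right and separator are None without a split."""
    entries = sorted(leaf + [key])
    if len(entries) <= capacity:
        return entries, None, None           # room in the leaf: no split
    mid = math.ceil(len(entries) / 2)        # first half, rounded up, stays in place
    left, right = entries[:mid], entries[mid:]
    return left, right, right[0]             # smallest key of the new leaf goes up


def split_internal(keys):
    """Split an overfull internal node; the middle key moves up to the parent."""
    mid = len(keys) // 2
    return keys[:mid], keys[mid], keys[mid + 1:]
```

The leaf split copies the separator up (it stays in the new leaf), while the internal split moves the middle key up, so it appears in neither half.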
Primary key case: Insertion
• If the new record fits into the data block, the
insertion time is the sum of the fetch and update
times:
TIp = TF + 2r
• If the record does not fit into the data block, a
data block split is required. Weighting the split cost
by how often it is expected to happen, the insertion
time is formulated as follows:
TIp = TF + 2r + (2/m)[(s+r+dtt) + (s+r+btt) + (2/2v)(s+r+btt)]
where 2/m (= 1/(m/2)) is the probability that the data
block is full, as a block has to be at least half full
anyway, and 2/2v (= 1/(2v/2) = 1/v) is the probability
that the parent of the leaf is full.
Meaning of each term:
▪ TF+2r: fetch and rewrite the original data
cluster,
▪ (s+r+dtt): write the new cluster created by the
split,
▪ (s+r+btt): write the parent internal node
block,
▪ (2/2v)(s+r+btt): write the new internal node if
the parent is split.
• Notes:
(1) The minimum block occupancy is 50%, i.e.,
m/2. So the insertion will be at one of the
positions m/2+1 to m in the data block; this is
the reason for 2/m.
(2) m is assumed to be the maximum blocking
factor for the leaves, and 2v the maximum
blocking factor for the internal nodes. You may
choose m and 2v to be the same.
• Whenever a data record is inserted into the primary
B+ Tree, the secondary key B+ Tree needs to be updated
as well. Note that this time the maximum blocking
factors for leaves and internal nodes are the same,
say m.
• Assuming that all the internal nodes, including the
parents of the leaves, are in memory, the time to
insert a secondary key is:
TIs = (s+r+btt) + 2r + (2/m)(s+r+btt)
The first term is the time to read the leaf, the second
term is the time to write it back after modification,
and the third term is the expected time for a split.
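The primary and secondary insertion times can be evaluated with a small helper each. This is a sketch under the stated formulas; the function names and the sample device parameters used when calling them are assumptions.

```python
def insert_time_primary(tf, s, r, btt, dtt, m, v):
    """TIp = TF + 2r + (2/m)[(s+r+dtt) + (s+r+btt) + (2/2v)(s+r+btt)]."""
    split_cost = (
        (s + r + dtt)                        # write the new cluster
        + (s + r + btt)                      # write the parent block
        + (2 / (2 * v)) * (s + r + btt)      # parent split, with probability 1/v
    )
    return tf + 2 * r + (2 / m) * split_cost


def insert_time_secondary(s, r, btt, m):
    """TIs = (s+r+btt) + 2r + (2/m)(s+r+btt)."""
    return (s + r + btt) + 2 * r + (2 / m) * (s + r + btt)
```

For large m the split terms become negligible, and the results approach TF + 2r and (s+r+btt) + 2r respectively.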
• If a data node in a primary B+ Tree is split, all the
secondary indexes of the file may have to be
updated.
• To lessen this update problem, the secondary keys
may be associated with (pinned to) the primary keys
rather than the record addresses. In this case, the
secondary key B+ Tree does not have to change
when the record addresses change.
Deletion of Records
• As long as the minimum occupancy criterion is still
met, deletion is no problem: just remove the related
entry from the node, for both the primary and the
secondary key cases.
• It poses no problem if a record key in a parent
node no longer exists in any leaf.
• If the leaf in question is at its minimum occupancy,
then a deletion will cause consolidation with a
sibling node.
• Consolidation may mean coalescence of two sibling
nodes, if the sum of their entries is less than the
maximum, or redistribution of the entries between
adjacent siblings, if the sum is more than the
maximum. If the two siblings have an equal number of
entries, choose the left one.
• Algorithm
1. Find the block containing the record, say X.
2. Delete the record and terminate if the block
occupancy limits are still satisfied.
3. Otherwise, if a sibling block has more than the
minimum, take the sibling that exceeds the minimum
the most and redistribute the entries between it
and X; change the parent accordingly to record
the correct key.
4. If neither sibling has more than the minimum,
coalesce (combine) X with one of them and modify
the parent to reflect the change.
5. If an internal node has less than the minimum
after the modifications:
▪ redistribute its content with a sibling and
modify the parent, if the total is more than
the maximum;
▪ coalesce its content with a sibling and
modify the parent, if the total is less than
the maximum;
▪ after the modification, if the parent is too
sparse, repeat this step (5).
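Steps 2 to 4 at the leaf level can be sketched as a single decision function on sorted key lists. The function name and return convention are assumptions; updating the parent and step 5 are left out, and redistribution simply splits the combined entries evenly.

```python
def delete_from_leaf(x, key, left, right, min_entries):
    """Return (action, resulting_leaves) after deleting key from leaf x.

    left and right are the adjacent sibling leaves (either may be None).
    """
    x = [k for k in x if k != key]
    siblings = [s for s in (left, right) if s is not None]
    if len(x) >= min_entries or not siblings:
        return "delete", [x]                    # step 2: limits still satisfied

    # max() returns the first maximal item, so the left sibling wins ties.
    donor = max(siblings, key=len)
    if len(donor) > min_entries:                # step 3: redistribute
        merged = sorted(donor + x)
        half = (len(merged) + 1) // 2
        return "redistribute", [merged[:half], merged[half:]]

    partner = siblings[0]                       # step 4: coalesce, left preferred
    return "coalesce", [sorted(partner + x)]
```

After a redistribution the parent must record the first key of the second resulting leaf; after a coalescence the parent loses one entry, which is what may trigger step 5.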
Timing considerations for the record deletion
• In the most usual case, deletion does not cause a
change in the tree:
TDp = TF + 2r
• If the deletion requires redistribution of values with
the siblings, because the leaf falls below the limit,
then the two adjacent siblings need to be read. Of the
two siblings and the original leaf, two leaves are
written back to the disk. The parent of the siblings
(which is already in memory) needs to be modified and
rewritten. Thus,
TDp = TF + 4(s+r+dtt) + s+r+btt
• If only one sibling is involved, then
TDp = TF + 2(s+r+dtt) + s+r+btt
which is approximated as
TDp = TF + 2TF + 2r
• If we weight this by the probability 1/(m/2) = 2/m
that siblings have to be read and a parent and a
sibling written,
TDp = TF + 2(2/m)TF + 2r
• For large m, TDp = TF + 2r
• For secondary key deletion, if the probability
term is ignored for a large blocking factor, we have
TDs = TF + 2r
Construction of a B+ Tree for an existing file
The most reasonable method is bottom-up B+ Tree
construction:
1. Sort the file on the disk, with clusters ln 2 full.
2. Read in the sorted file cluster by cluster, and
enter the addresses in the parent node until it
is ln 2 full.
3. When the index node is ln 2 full, create a new entry
in its parent node if that is not full; otherwise
go one level up. Note that a new root may be
created if all lower level nodes are full.
4. Each time a new entry is created in a node, all
the lower level nodes need to be created as well.
5. The process continues until all the sorted leaves
are consumed.
6. There may be sparse nodes on the rightmost
side of the tree, which need to be fixed.
• Note that a B+ Tree can also be constructed by
successive insertions, but this would be very
inefficient… Why?
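A level-by-level variant of the bottom-up construction can be sketched as follows. It is a simplification of the steps above, not the slides' exact procedure: it builds one complete level at a time instead of filling parents incrementally, each index node stores the smallest key of every child, and the function name and parameters are hypothetical.

```python
import math


def bulk_load(sorted_keys, capacity, fill=math.log(2)):
    """Build the tree levels bottom-up from already sorted keys.

    Returns a list of levels, leaves first; each level is a list of nodes,
    and each node is a list of keys filled to about fill * capacity.
    """
    per_node = max(2, int(capacity * fill))
    level = [sorted_keys[i:i + per_node]
             for i in range(0, len(sorted_keys), per_node)]
    levels = [level]
    while len(level) > 1:
        # One entry per child: the smallest key found in that child.
        firsts = [node[0] for node in level]
        level = [firsts[i:i + per_node]
                 for i in range(0, len(firsts), per_node)]
        levels.append(level)
    return levels
```

With `capacity=6` and keys 1 to 20, the leaves hold 4 keys each and the level above the leaves ends with a node holding a single entry, which is the kind of sparse rightmost node that step 6 says needs fixing.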