Access Paths 

advertisement
Access Paths
9:00
11:00
13:30
15:30
18:00
Aug. 2
Intro &
terminology
Reliability
Fault
tolerance
Transaction
models
Reception
Aug. 3
Aug. 4
Aug. 5
Aug. 6
TP mons
Logging &
Files &
Structured
& ORBs
res. Mgr.
Buffer Mgr.
files
Locking Res. Mgr. &
COM+
Access paths
theory
Trans. Mgr.
Locking
CICS & TP
CORBA/
Groupware
techniques & Internet
EJB + TP
Queueing
Advanced
Replication Performance
Trans. Mgr.
& TPC
Workflow Cyberbricks
Party
FREE
Chapter 20

Types of Associative Access Paths




Primary index: Given the primary key value,
find the tuple.
Secondary index: Given the value of a nonunique attribute, find all qualified tuples.
Range index: Given the value range of some
attribute, find all tuples within that range.
Structure index: Given some tuple, find all
structurally related tuples (CODASYL sets,
object hierarchies, etc.)
Jim Gray, Andreas Reuter
Transaction Processing - Concepts and Techniques
WICS August 2 - 6, 1999

2
Two Important Techniques
Two basic techniques dominate in modern DBMSs:
 Hashing: Use a fixed transformation algorithm to
convert the attribute value into a database address.
 Tree search:A dynamical search structure is built that
guides the search for a given attribute value to the
proper database location.

Hashing supports primary indices only. Tree search is more
versatile.
Jim Gray, Andreas Reuter
Transaction Processing - Concepts and Techniques
WICS August 2 - 6, 1999
3
The Access Gap



Accessing a tuple in buffer costs ca. 2000 instr.
Accessing it on disk takes 25 ms for I/O-related
activities. On a 20 MIPS machine, this translates 
into 500,000 instructions.
Therefore, one can spend many instructions on
an algorithm that saves one I/O on the average.
For access paths, the dominant cost measure is
the number of different pages that need to be
accessed during the search.
Jim Gray, Andreas Reuter
Transaction Processing - Concepts and Techniques
WICS August 2 - 6, 1999
4
Hashing
the magic of folding
(the shaded
areas denote
used key
values)
Jim Gray, Andreas Reuter
range of positive integers
range of
potential
key values
the magic of hashing

tupleaddress
space
Tuples should use up the
availabe address spaceas evenly
as possible.The white portions
indicate that space utilization
will be < 100%.
Transaction Processing - Concepts and Techniques
WICS August 2 - 6, 1999
5
Folding vs. Hashing


Folding is used to turn an attribute value of
arbitrary length and arbitrary data type in to an
unsigned integer the maximum length of which
is determined by the instruction set.
Hashing is used to transform the result of
folding into the address of a page that (probably)
holds the tuple with the specified key value.
Jim Gray, Andreas Reuter
Transaction Processing - Concepts and Techniques
WICS August 2 - 6, 1999

6
Requirement
A good hash function H has to map primary key
values, which are very unevenly distributed over
a large value range, into a tuple address space 
that is much smaller (proportional to the number
of existing tuples), such that the resulting
addresses are evenly distributed over the
address range.
Jim Gray, Andreas Reuter
Transaction Processing - Concepts and Techniques
WICS August 2 - 6, 1999
7
Parameters of the Hash Function
K: This is the (folded) key value. It varies
between 0 and 2**32 - 1.
 B: Number of pages to be allocated for the file.
Depends on the number of tuples expected.
 H: Hash function that performs a mapping of:
(0, 2**32-1) -> (0, B-1).

Jim Gray, Andreas Reuter
Transaction Processing - Concepts and Techniques
WICS August 2 - 6, 1999

8
Consequences of the Approach


Contiguous allocation: All B pages must
allocated in physical contiguity, because the
relative addresses vary between 0 and B-1.
Fixed size: The file size must be determined
upon creation time, because changing the size
(i.e. changing B) means changing the hash
function. This in turn requires a complete
reorganization.
Jim Gray, Andreas Reuter
Transaction Processing - Concepts and Techniques
WICS August 2 - 6, 1999

9
Requirements for a Hash Function





Hash-based allocation assumes that it is possible to estimate the number of
tuples T that the relation will have, and that this estimate is not drastically
exceeded.
If a block has length B, and a tuple has an average length of L bytes, then
we need at least S =  T / (B/L)  blocks to store the T tuples.
The required number of blocks (S) is allocated before the first tuple is
stored. It is a good idea to allocate some more blocks (S’ > S) to allow for
unexpected growth.
Then a hash function H is defined, which takes in the value of the primary
key k of the relation and converts it into a number between 1 and S’; this is
the block number where the tuple is to be stored. If K is the set of possible
values for the primary key we have:
H: K {1, 2, …, S’}
The set of potential values for the primary key attribute will be much larger
than the number of blocks allocated (think, e.g., of ISBNs for the books
relation). So the hash function is a compacting function. For each primary
key value, there is exactly one block it is mapped to. Many different primary
key values are mapped to the same block (n:1 relationship).
Jim Gray, Andreas Reuter
Transaction Processing - Concepts and Techniques
WICS August 2 - 6, 1999

10
Properties of Hash Functions





A hash function must be easy to compute and must not require access to
any blocks in the database.
It must be able to map a generally very large set of potential primary key
values (remember: primary keys can be constructed by concatenating
several attributes of a relation) into a comparatively small set of block
numbers in which the tuples will be stored.
It must be able to take in primary key values of different data types (integer,
binary, decimal, character, etc.) and map them to the set of integers
between 1 and S’ with equal efficiency.
The formula for estimating S based on the number of tuples, average tuple
length and block length implicitly assumes that all blocks are equally filled,
i.e. that the same number of tuples is mapped to each block. This is the
most difficult requirement, because the primary key values in general are
not equally distributed over their value range: Some parts of the value
range are used, others are not used at all, keys are generated by some
regular mechanism, etc.
To achieve this “hashing” property, different methods exist: table look-up,
base conversion, folding, encryption, division by prime numbers, etc.
Jim Gray, Andreas Reuter
Transaction Processing - Concepts and Techniques
WICS August 2 - 6, 1999

11
A Popular Hash Function




A hash function that most database systems use as a default (if the user
does not specify one) is defined as:
H(k) := k mod d + 1
This requires that k is a positive integer. If the primary key attribute of the
relation does not have the data type integer - it could, for example, be a
name - then it has to be turned into an integer first. The usual way of doing
this is to “fold” the binary representation of the key value such that its length
does not exceed 32 bits. Then these 32 bits are interpreted as an integer; if
it is negative, it is multiplied by -1 one. Details of folding are omitted here.
For H to be a good hash function, d must be a large enough prime number;
this is explained by detailed number theoretic analyses. We also must make
sure that the number of blocks allocated is about 25% larger than the
minimum requirement.
Summing it up: We first compute S. Then we compute S’ = 1.25*S. Then we
compute d = next_higher_prime (S’). H(k) will then determine the block for
each tuple based on the primary key value.
Jim Gray, Andreas Reuter
Transaction Processing - Concepts and Techniques
WICS August 2 - 6, 1999

12
Average Number of Overflow Pages
maximum number of tuples per page (bucket)
2
3
5
10
20
U
1
.50
0.500
0.177
0.087
0.031
0.005
0.000
0.000
.60
0.750
0.293
0.158
0.066
0.015
0.002
0.000
.70
1.167
0.494
0.286
0.136
0.042
0.010
0.001
.80
2.000
0.903
0.554
0.289
0.110
0.036
0.009
.85
2.832
1.316
0.827
0.449
0.185
0.069
0.022
.90
4.495
2.146
1.377
0.777
0.345
0.144
0.055
.92
5.740
2.768
1.792
1.024
0.467
0.203
0.083
.95
9.444
4.631
3.035
1.769
0.837
0.386
0.171
Jim Gray, Andreas Reuter
Transaction Processing - Concepts and Techniques
40
WICS August 2 - 6, 1999

13
Hashing for Non-Unique Attributes
Let V denote the number of different attribute
values. Then we can distinguish three cases:
 V ~ T: The attribute is “almost” unique; a good

hash function should work in that case.
 V > B: There are more values than buckets. Can
be made work, but some buckets may get much
higher utilization than others.
 V < B: This is the case where hashing cannot be
used.
Jim Gray, Andreas Reuter
Transaction Processing - Concepts and Techniques
WICS August 2 - 6, 1999
14
Overflow Handling
(Implicit) pointers
established by the

overflow function.In
this example the
overflow bucket for
bucket b is b+3 mod
B.
buckets
overflow
pages
buckets
a) external overflow handling
b) internal overflow handling
(Completely filled buckets and pages are shaded.)
Jim Gray, Andreas Reuter
Transaction Processing - Concepts and Techniques
WICS August 2 - 6, 1999
15
Hashing: Summary





For unique attributes, hashing can yield oneaccess retrieval.
It is critical to find a good hash function to
reduce collisions.
If the original estimate of the file size is wrong,
reorganization is inevitable.
Synchronization at the page level is done using
standard crabbing techniques.
Hashing does not support range queries.
Jim Gray, Andreas Reuter
Transaction Processing - Concepts and Techniques
WICS August 2 - 6, 1999

16
B-Trees
B-Trees consist of two types of nodes:
Leaf nodes: The contain the data, i.e. the tuples
or pointers to the tuples (TIDs).

Index nodes: Index nodes contain reference keys
to direct the search towards the leaves. The data
structure looks like this:
struct
{ char *
KeyValue;
PAGEID PointerToNextNode;
} index_node_structure[ ];
Jim Gray, Andreas Reuter
Transaction Processing - Concepts and Techniques
WICS August 2 - 6, 1999
17
Rules for Index Nodes




Key values are in sorted order: K0  K1  ...  Ki
 ...  Kf (f is max. capacity of a node).
For any two adjacent key values Ki, Ki+1 the
pointer Pi points to a node covering all values in
the interval (Ki, Ki+1].
If a search for value v arrives at an index node,
the next node to be visited is pointed to by Pi
such that Ki  v < Ki+1.
K0 is an arbitrary “low value” (smaller than
anything else in that node).
Jim Gray, Andreas Reuter
Transaction Processing - Concepts and Techniques
WICS August 2 - 6, 1999

18
Properties of a B-Tree





Parameter f is called the fan-out of the tree.
The number of nodes visited from the root to a
leaf is called the height of the tree.
A B-tree is always perfectly balanced, i.e. the
height is the same for all leaves.
Storage utilization is at least 50% for all nodes
except for the root.
Average storage utilization is close to 70%.
Jim Gray, Andreas Reuter
Transaction Processing - Concepts and Techniques
WICS August 2 - 6, 1999

19
A Simple B-Tree
root node
220
index node I
140 168
220 255 256
120 136

271 312
312 318
leaf node L1
140 151
leaf node L3
271 296 299 303
leaf node L2
168 170 190
The shaded parts at the beginning of the index
nodes represent the low value key (K0).
Jim Gray, Andreas Reuter
Transaction Processing - Concepts and Techniques
WICS August 2 - 6, 1999
20
Some Observations



B-trees also work for non-unique attributes;
implementational optimizations will be discussed
later on.
The reference keys in the index nodes can be
different from all “real” key values in the leaves;
they only have to guide the search correctly.
The key values at the leaf level are sorted in
ascending order; this supports range queries.
Jim Gray, Andreas Reuter
Transaction Processing - Concepts and Techniques
WICS August 2 - 6, 1999

21
Inserting Into a B-Tree
before split of index node I
after split of index node I
index node I
index node I
271 312
255 256
leaf node L1

271 299 312
312 318
leaf node L3
271 296 299 303
leaf node L2
L1
L3
271 296
leaf node L2
299 303
insert
key value 280
Jim Gray, Andreas Reuter
insert
leaf node Ln
key value 280
Transaction Processing - Concepts and Techniques
WICS August 2 - 6, 1999
22
Growing a B-Tree



If the insert leaf is full, allocate a new node,
distribute the values (the new one sorted in
place) evenly across the old leaf and the new
node, move the lowest key value of the new
node up to the index node.
If that index node is full, split it in the same way.
If the root has to be split: Allocate two new
nodes, distribute the key values evenly over
them, put the reference key in the root.
Jim Gray, Andreas Reuter
Transaction Processing - Concepts and Techniques
WICS August 2 - 6, 1999

23
Deleting Tuples From a B-Tree




To maintain the space utilization guarantees, a
leaf that becomes under-utilized (< 50%) would
have to be merged with its neighbours.
This is a very costly operation; in particular,
synchronization at the page level is very
complicated.
Therefore, most systems let nodes become
empty and discard them when that happens.
Analyses show that this does not deteriorate the
overall B-tree performance.
Jim Gray, Andreas Reuter
Transaction Processing - Concepts and Techniques
WICS August 2 - 6, 1999

24
Non-Unique Attributes
key value
pointer
23
P1
key value
23
3
23
P2
23
P3
28
P4
29
P5
29
P6

pointer list
P1
P2 P3
28
1 P4
29
R3
29
2 P5
P6
key value
reference
23
R1
28
R2
P1
P2 P3
P4
P5
Jim Gray, Andreas Reuter
Transaction Processing - Concepts and Techniques
P6
WICS August 2 - 6, 1999
25
The Basic Formula of B-Tree-Performance
With the N: number of tuples, C: average
number of entries in a leaf, F: average number
of entries in an index node, the height H of a Btree is

H = 1 + logF ( N/C)
Jim Gray, Andreas Reuter
Transaction Processing - Concepts and Techniques
WICS August 2 - 6, 1999
26
Some Performance Figures
H
Key-sequenced file (C* = 43)
N(max)
increase
Secondary index (C* = 300)
N(max)
increase
2
12,900
3
3,870,000
3,857,100
27,000,000
26,910,000
4
1,161,000,000
1,157,130,000
8,100,000,000
8,073,000,000
5
90,000

-
-
348,300,000,000 347,139,000,000 2,430,000,000,000 2,421,900,000,000
Jim Gray, Andreas Reuter
Transaction Processing - Concepts and Techniques
WICS August 2 - 6, 1999
27
Tuples in the Leaves?
Assuming a tuple is x times longer than a TID, we
get the following estimate:

1 + logF (N/(x•C)) + 1.1 £ 1 + logF (N/C).
This transforms into
1.1 £ logF x
When this holds, moving the tuples out of the
leaves improves performance.
Jim Gray, Andreas Reuter
Transaction Processing - Concepts and Techniques
WICS August 2 - 6, 1999
28
Key Compression: Suffix Compression
Bertolucci Copelletti Gambogi
...
Bert Cop . .Gam
.

. Cooperativa
..
Copelletti
...
a) Full key values stored in all
nodes of the B*-tree
Jim Gray, Andreas Reuter
Cooperativa
Copelletti
...
b) Suffix compression for key
values in high-level nodes
Transaction Processing - Concepts and Techniques
WICS August 2 - 6, 1999
29
Synchronization on B-Trees: What Is the
Problem?



B-Trees are fully redundant structures, which
can be reconstructed from the tuples; therefore,
no synchronization should be required at all.
However, some queries operate on the index
only. This requires all operations on B-trees to
be serializable with the operations on the tuples.
Standard two-phase locking with the nodes as
the objects is not feasible for performance
reasons.
Jim Gray, Andreas Reuter
Transaction Processing - Concepts and Techniques
WICS August 2 - 6, 1999

30
Protecting Tree Traversal
1. semaphore on Q
Node Q
at level i
search
path
2. follow search path

3. semaphore on R
Node R
at level i+1
This is an example
of the crabbing
technique
Jim Gray, Andreas Reuter
4. release sem. on Q
Transaction Processing - Concepts and Techniques
WICS August 2 - 6, 1999
31
B-Trees and Value Locks

B*-tree
14 P1 P2 16 P3 17 P4 P5 P6 20 P7 22 P8 P9 P10
Jim Gray, Andreas Reuter
Transaction Processing - Concepts and Techniques
WICS August 2 - 6, 1999
32
Making Lock Names
To implement value locking, we need to build
lock names according to the following rule:

LockN := {TableName, IndexName, KeyValue}.
KeyValue in turn is a composite:
KeyValue := {AttributeValue, TupleIdentifier}.
Jim Gray, Andreas Reuter
Transaction Processing - Concepts and Techniques
WICS August 2 - 6, 1999
33
Key Range Locking an B-Trees
Simple retrieval: k = c
Get a semaphore on leaf page; get S-lock on key 
range defined by largest existing value c1 with
c1  c; hold lock until commit.
Jim Gray, Andreas Reuter
Transaction Processing - Concepts and Techniques
WICS August 2 - 6, 1999
34
Key Range Locking an B-Trees
Range retrieval: c1 < k < c2
Get s semaphore on first leaf page; get S-lock on
key defined by largest existing value c3 with
c3 < c1; proceed sequentially along leaf level;
request key range S-lock for each new attribute
value up to and including c2; do careful crabbing
across leaf pages; hold S-lock until commit.
Jim Gray, Andreas Reuter
Transaction Processing - Concepts and Techniques
WICS August 2 - 6, 1999

35
Key Range Locking an B-Trees
Insert: [c, ki]
Get X-semaphore on leaf page; find largest
existing value c1 with
c1 < c; request instant IX-lock on c1 ; request
long X-lock on c.
Jim Gray, Andreas Reuter
Transaction Processing - Concepts and Techniques
WICS August 2 - 6, 1999

36
Key Range Locking an B-Trees
Delete: [c, kd]
Get X-semaphore on leaf page; find largest
existing value c1 with
c1 < c; request long IX-lock on c; else request
long X-lock on c and c1.
Jim Gray, Andreas Reuter
Transaction Processing - Concepts and Techniques
WICS August 2 - 6, 1999

37
B-Tree Recovery

Pi
Pi
Pa
Pa
Pa'
T1
Transaction T1 inserts into
page pa, causing a split
Jim Gray, Andreas Reuter
Transaction T2 inserts into
page pa'
Transaction Processing - Concepts and Techniques
WICS August 2 - 6, 1999
38
B-Tree Recovery Based on Physiological
Logging




Cover all B-tree operations with semaphores on
all affected pages.
For each logical update a log record with the
logical UNDO operation must be moved to the
log
While the update operation is being performed,
physical REDO log records are written.
After all REDO records are safely in the log, the
exclusive semaphores can be released.
Jim Gray, Andreas Reuter
Transaction Processing - Concepts and Techniques
WICS August 2 - 6, 1999

39
The Two Phases of B-Tree-Recovery
Phase1: Go forward through the log up to its
current end, applying all REDO records to the
tree.
Phase2: Go backward to the Begin of transaction
record of the oldest incomplete transaction,
executing the UNDO operations on the tree for
all losers along the way.
Jim Gray, Andreas Reuter
Transaction Processing - Concepts and Techniques
WICS August 2 - 6, 1999

40
Other Access Path Methods:
Extendible Hashing
pointer
table
(directory)
key value
buckets
(pages)

¥
the magic
of hashing
Jim Gray, Andreas Reuter
Transaction Processing - Concepts and Techniques
WICS August 2 - 6, 1999
41
New Techniques
Grid files: Symmetric multi-dimensional point
access. Can become very unbalanced
depending on correlation in the data.
R-Trees: Symmetric multi-dimensional access.
Can deteriorate depending on insertion strategy.
hb-Tree: Symmetric multi-dimensional access.
Can turn into a DAG depending on deletion
order.
Jim Gray, Andreas Reuter
Transaction Processing - Concepts and Techniques
WICS August 2 - 6, 1999

42
Download