Concurrent R-trees

Mehdi Kargar
Department of Computer Science and Engineering


Handling spatial and multidimensional data is
very important in modern database systems.
Applications:
◦ CAD (Computer Aided Design)
◦ GIS (Geographical Information Systems)
◦ Cartography and …


Classical indexing structures such as the B-tree are
not suitable for handling multidimensional data.
They are one-dimensional indexing structures.



The R-tree is one of the best structures for
indexing multidimensional data.
Unlike other multidimensional structures,
the R-tree directly stores multidimensional
spatial objects.
Spatial objects are represented by their
minimal bounding boxes.

An R-tree is a depth-balanced tree with a
dynamic index structure:
◦ Leaf nodes point to actual keys.
◦ The number of entries in a node is between m and
N (1 < m ≤ N).
◦ The root may have between 1 and N entries.
◦ All leaf nodes are at the same level.
◦ The key for each internal node is the minimum
bounding rectangle of its child nodes.
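The node layout above can be sketched as follows. This is only an illustration for the two-dimensional case; the names Rect, Node and mbr are made up for this example and do not come from any particular R-tree implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle (hypothetical helper type)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def union(self, other: "Rect") -> "Rect":
        # Smallest rectangle that covers both rectangles.
        return Rect(min(self.xmin, other.xmin), min(self.ymin, other.ymin),
                    max(self.xmax, other.xmax), max(self.ymax, other.ymax))


@dataclass
class Node:
    is_leaf: bool
    rects: List[Rect] = field(default_factory=list)       # one key per entry
    children: List["Node"] = field(default_factory=list)  # empty for leaves


def mbr(node: Node) -> Rect:
    """Minimum bounding rectangle of all entries stored in a node."""
    result = node.rects[0]
    for r in node.rects[1:]:
        result = result.union(r)
    return result


leaf = Node(is_leaf=True, rects=[Rect(0, 0, 2, 2), Rect(1, 1, 4, 3)])
print(mbr(leaf))  # the key the parent would store for this leaf
```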
Search Query



Keys at all levels may overlap
each other.
During the search for a key, it might be
necessary to descend multiple sub-trees.
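A minimal, single-threaded sketch of such a search is given here. Rectangles are plain (xmin, ymin, xmax, ymax) tuples and nodes are plain dictionaries; this representation and the names intersects and search are assumptions made for the example only.

```python
def intersects(a, b):
    """True if rectangles a and b, given as (xmin, ymin, xmax, ymax), overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(node, query, hits):
    """Collect every leaf key whose rectangle overlaps the query rectangle."""
    for rect, child in node["entries"]:
        if intersects(rect, query):
            if node["leaf"]:
                hits.append(child)          # child is the stored key
            else:
                search(child, query, hits)  # may happen for several entries
    return hits


leaf_a = {"leaf": True, "entries": [((0, 0, 2, 2), "R1"), ((3, 3, 5, 5), "R2")]}
leaf_b = {"leaf": True, "entries": [((4, 4, 6, 6), "R3"), ((7, 7, 9, 9), "R4")]}
root = {"leaf": False, "entries": [((0, 0, 5, 5), leaf_a), ((4, 4, 9, 9), leaf_b)]}

# The query overlaps both bounding rectangles in the root,
# so both sub-trees have to be descended.
result = search(root, (4, 4, 5, 5), [])
print(result)
```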
Insertion is more complex than search:
◦ After inserting a new key, the new bounding
rectangle should be propagated up the tree.
◦ If a node overflows, it should be split. The split
should also be propagated up the tree.

Deletion is a combination of methods used in
search and insertion algorithms.

The naïve approach to concurrent operations
on R-trees is not correct; the R-link tree solves
the problem. (Example: search for R5 and insertion of R2.)

An R-link tree is like a normal R-tree with
two basic modifications:
◦ All of the nodes in any level of the tree are
connected together in a linked list via right links (first
applied to B-trees).
◦ An LSN (Logical Sequence Number), unique within
the tree, is added to each node and each parent
entry. It is used to produce a linear
ordering of the spatial keys.



Unfinished splits can be detected by comparing the
LSN of a parent entry with the LSN of its child node.
c1, c4, c5 → normal situation
c2, c3 → unfinished split situation
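The comparison can be sketched in a few lines. The LSN values below are invented for illustration and are not taken from the figure; the sketch assumes that a split node receives a new, higher LSN while its new right sibling inherits the old one, so that c2 is detected through its parent entry and c3 is the sibling that has no parent entry yet.

```python
def split_not_installed(expected_lsn, node_lsn):
    """A node whose LSN is higher than the LSN in its parent entry has been
    split, and the split has not yet been installed in the parent."""
    return node_lsn > expected_lsn


# Hypothetical values: LSN recorded in each parent entry ...
parent_entries = {"c1": 1, "c2": 2, "c4": 3, "c5": 4}
# ... and LSN currently stored in each node (c3 has no parent entry yet).
node_lsns = {"c1": 1, "c2": 5, "c3": 2, "c4": 3, "c5": 4}

pending = [name for name, expected in parent_entries.items()
           if split_not_installed(expected, node_lsns[name])]
print(pending)
```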





Since keys might overlap, multiple sub-trees
might be navigated for a single search.
A stack is used to remember which nodes are
waiting to be visited.
The LSN of each entry is also pushed onto the
stack.
If the LSN of the node is higher than the one on
the stack, then the node has been split in the
meantime.
All of the nodes to the right of it, up to and
including the node with the LSN equal to the
expected LSN, are pushed onto the stack.
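The steps above can be sketched as a single-threaded traversal. The sketch leaves out all locking and only shows the stack and right-link logic; the Node class, the entry layout and the function names are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    lsn: int
    is_leaf: bool
    # leaf entry: (rect, key); internal entry: (rect, child, expected_lsn)
    entries: list = field(default_factory=list)
    right: Optional["Node"] = None


def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def rlink_search(root, query):
    hits = []
    stack = [(root, root.lsn)]
    while stack:
        node, expected_lsn = stack.pop()
        if node.lsn > expected_lsn:
            # The node was split after its LSN was recorded: also visit the
            # right siblings, up to and including the one with the expected LSN.
            sibling = node.right
            while sibling is not None:
                stack.append((sibling, sibling.lsn))
                if sibling.lsn == expected_lsn:
                    break
                sibling = sibling.right
        for entry in node.entries:
            if not intersects(entry[0], query):
                continue
            if node.is_leaf:
                hits.append(entry[1])
            else:
                stack.append((entry[1], entry[2]))
    return hits


# Leaf a (old LSN 2) has split: a received LSN 7, the new sibling a2 kept LSN 2
# and holds R5. The parent entry still points only to a with LSN 2.
b = Node(lsn=3, is_leaf=True, entries=[((8, 8, 9, 9), "R9")])
a2 = Node(lsn=2, is_leaf=True, entries=[((5, 5, 6, 6), "R5")], right=b)
a = Node(lsn=7, is_leaf=True, entries=[((0, 0, 1, 1), "R1")], right=a2)
root = Node(lsn=1, is_leaf=False,
            entries=[((0, 0, 6, 6), a, 2), ((8, 8, 9, 9), b, 3)])

found = rlink_search(root, (5, 5, 6, 6))
print(found)
```

Without the LSN check the search would look only at node a and miss R5, which has moved to the right sibling.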

The insertion algorithm consists of three
phases:
1. Finding the optimal leaf node for inserting the
new key.
2. If the leaf node overflows and splits, the split
should be propagated to upper levels.
3. If the bounding rectangle of the leaf node
changes, the new bounding rectangle should be
propagated to upper levels.
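The slides do not say how the optimal leaf is chosen in the first phase. One common choice, assumed here, is the classic R-tree heuristic of descending into the entry whose bounding rectangle needs the least area enlargement, with ties broken by the smaller area. The function names below are made up for this sketch.

```python
def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a, b):
    """Smallest rectangle covering a and b, each (xmin, ymin, xmax, ymax)."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def enlargement(mbr, new_rect):
    """Extra area the bounding rectangle needs in order to cover new_rect."""
    return area(union(mbr, new_rect)) - area(mbr)


def choose_entry(mbrs, new_rect):
    """Index of the entry to descend into (assumed least-enlargement rule)."""
    return min(range(len(mbrs)),
               key=lambda i: (enlargement(mbrs[i], new_rect), area(mbrs[i])))


mbrs = [(0, 0, 4, 4), (5, 5, 10, 10)]
chosen = choose_entry(mbrs, (3, 3, 5, 5))
print(chosen, enlargement(mbrs[0], (3, 3, 5, 5)))
```

Applying choose_entry at every level from the root downwards yields the leaf for the new key; if the chosen bounding rectangle had to grow, the third phase propagates the enlarged rectangle upwards.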



The path from the root to the leaf node
should be stored in a stack.
The algorithm backs up along this path to install the
changes (split and bounding rectangle) in the tree.
A lock coupling strategy is used:
◦ For manipulating the parent node, the child node
remains write-locked until a write lock is obtained
on the parent.
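The locking order of this strategy can be sketched as follows. A plain threading.Lock stands in for a write lock, and the Node class, propagate_up and the install_change callback are hypothetical names used only for this illustration; a full implementation would also stop as soon as a level is left unchanged.

```python
import threading


class Node:
    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()  # stand-in for a write lock


def propagate_up(path, install_change):
    """Walk from the leaf back to the root along the saved path.

    path lists the nodes from root to leaf. The child stays locked until
    the lock on its parent has been obtained (lock coupling).
    """
    events = []
    stack = list(path)
    child = stack.pop()
    child.lock.acquire()
    events.append(("lock", child.name))
    while True:
        install_change(child)
        if not stack:
            break
        parent = stack.pop()
        parent.lock.acquire()            # obtain the parent lock first ...
        events.append(("lock", parent.name))
        child.lock.release()             # ... and only then release the child
        events.append(("unlock", child.name))
        child = parent
    child.lock.release()
    events.append(("unlock", child.name))
    return events


root, inner, leaf = Node("root"), Node("inner"), Node("leaf")
changed = []
events = propagate_up([root, inner, leaf], lambda n: changed.append(n.name))
print(events)
```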


Deletion uses a combination of the methods in the
search and insertion algorithms.
Three phases:
1. Finding the leaf node containing the key.
2. Removing the entry from the node.
3. If the bounding rectangle of the leaf node
changes, it should be propagated up the tree.

To improve the performance of the tree
operations, empty nodes can be removed
from the tree.

1 - M. Kornacker and D. Banks. High-concurrency
locking in R-trees. In Proceedings of the 21st
International Conference on Very Large Data Bases,
pages 134-145. ACM, 1995.

2 - P. L. Lehman and S. B. Yao. Efficient
locking for concurrent operations on B-trees.
ACM Transactions on Database Systems,
6(4):650-670, December 1981.