Spatial Join

SPATIAL JOIN
Biplob Kumar Debnath
Department of Electrical and Computer Engineering,
University of Minnesota
SYNONYMS
Intersect Join
DEFINITION
The spatial join operation combines two or more datasets with respect to a spatial
predicate. The predicate can be a combination of directional, distance, and topological
spatial relations. In a nonspatial join, the joining attributes must be of the same type,
but in a spatial join they can be of different types. Each spatial attribute is usually
represented by its minimum bounding rectangle (MBR).
A typical example of a spatial join is “Find all pairs of rivers and cities that intersect”.
For example, in Figure 1, the result of the join between the set of rivers {R1, R2} and the
set of cities {C1, C2, C3, C4, C5} is {(R1, C1), (R2, C5)}.
Figure 1: Example of spatial join
HISTORICAL BACKGROUND
In 1986, Orenstein used a grid-based technique to perform spatial join. It is the first
known technique for the spatial join operation. A grid divides the multidimensional space
into smaller blocks, known as pixels, and a z-ordering is used to order the pixels. Each
object is approximated by the pixels that intersect its MBR. Because the pixels are
ordered by z-ordering, each object is represented by a set of z-values, which are
one-dimensional. Any one-dimensional index (e.g., B+-tree) can then be used to sort them,
and the spatial join is performed with a sort-merge. The performance of this technique
depends on the granularity of the grid: the finer the grid, the more accurate the results,
but the more memory it consumes. To remedy this problem, multidimensional indices (e.g.,
R-tree) that can handle spatial data directly were later devised. Various new spatial join
algorithms (e.g., R-tree join, sort and match, spatial hash join, slot index spatial join)
based on multidimensional indices then appeared.
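The z-value representation described above can be illustrated with a small sketch. It assumes a two-dimensional grid with square cells and non-negative coordinates, and it computes z-values by bit interleaving (the usual definition of z-ordering). The function names, the `cell_size` parameter, and the 16-bit default are illustrative assumptions, not details taken from Orenstein's paper.

```python
def z_value(col: int, row: int, bits: int = 16) -> int:
    """Interleave the bits of (col, row) into one z-order value."""
    z = 0
    for i in range(bits):
        z |= ((col >> i) & 1) << (2 * i)       # column bits go to even positions
        z |= ((row >> i) & 1) << (2 * i + 1)   # row bits go to odd positions
    return z


def z_values_for_mbr(mbr, cell_size: float, bits: int = 16) -> list[int]:
    """Return the sorted z-values of all grid cells that intersect the MBR.

    mbr is (xmin, ymin, xmax, ymax); coordinates are assumed non-negative.
    """
    xmin, ymin, xmax, ymax = mbr
    c0, c1 = int(xmin // cell_size), int(xmax // cell_size)
    r0, r1 = int(ymin // cell_size), int(ymax // cell_size)
    return sorted(
        z_value(c, r, bits)
        for c in range(c0, c1 + 1)
        for r in range(r0, r1 + 1)
    )
```

Two objects can only intersect if their z-value sets share at least one value, which is what makes a one-dimensional sort-merge applicable. A smaller `cell_size` yields a tighter approximation but longer z-value lists per object.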
KEY CONCEPTS
Spatial join is done in two steps: a filter step and a refinement step. In the filter step,
the pairs of tuples whose MBRs overlap are determined. This step is not computationally
expensive, as at most four comparisons are required to determine whether two rectangles
intersect. The tuples that pass the filter step are fed to the refinement step, where the
exact spatial representations are used and the spatial predicate is checked on them. The
refinement step is computationally expensive, but the filter step reduces the number of
tuples it has to process.
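A minimal sketch of the two steps follows. It assumes that each geometry is given as a list of (x, y) points and that the caller supplies the exact predicate; `Rect`, `mbr_of`, and `filter_and_refine` are hypothetical names introduced only for this illustration.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def mbr_intersect(a: Rect, b: Rect) -> bool:
    """Filter test: four comparisons. Rectangles that only touch count as intersecting."""
    return (a.xmin <= b.xmax and b.xmin <= a.xmax and
            a.ymin <= b.ymax and b.ymin <= a.ymax)


def mbr_of(points) -> Rect:
    """Minimum bounding rectangle of a list of (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return Rect(min(xs), min(ys), max(xs), max(ys))


def filter_and_refine(r1, r2, exact_predicate):
    """Return the pairs (a, b) that pass the MBR filter and then the exact test."""
    # Filter step: cheap MBR comparison, may keep false positives.
    candidates = [(a, b) for a in r1 for b in r2
                  if mbr_intersect(mbr_of(a), mbr_of(b))]
    # Refinement step: the expensive exact test, run on the candidates only.
    return [(a, b) for a, b in candidates if exact_predicate(a, b)]
```

The filter step never discards a true match, because two geometries that intersect must have intersecting MBRs. It can only let through false positives, which the refinement step removes.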
Spatial join algorithms can be classified into three categories. For the discussion below,
we assume that we want to spatially join relations R1 and R2, and we focus only on the
intersection join. The same techniques can be extended to other join variants
(e.g., distance join).
Nested Loop
In this algorithm, the entire relation R2 is scanned for each tuple of R1; any pair of
tuples from R1 and R2 that satisfies the spatial join predicate is added to the result.
The basic algorithm follows:
for all tuples r1 ∈ R1
    for all tuples r2 ∈ R2
        if pair (r1, r2) satisfies the spatial join predicate
            add <r1, r2> to result
Here, R1 is the outer relation and R2 is the inner relation. If an index is available on
one of the relations, we can make that relation the inner one. In this case, we need not
scan the entire inner relation.
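The nested loop join and its indexed variant can be sketched as follows. The sketch works on MBRs given as (xmin, ymin, xmax, ymax) tuples. The `GridIndex` class is a hypothetical stand-in for whatever spatial index is available on the inner relation, chosen only because it is short. It is not an index prescribed by the text.

```python
from collections import defaultdict


def intersects(a, b):
    """MBR intersection test on (xmin, ymin, xmax, ymax) tuples."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def nested_loop_join(r1, r2, predicate=intersects):
    """Scan the whole inner relation r2 for every tuple of the outer relation r1."""
    result = []
    for t1 in r1:
        for t2 in r2:
            if predicate(t1, t2):
                result.append((t1, t2))
    return result


class GridIndex:
    """Hypothetical uniform-grid index over the inner relation."""

    def __init__(self, rects, cell=10.0):
        self.rects = list(rects)
        self.cell = cell
        self.cells = defaultdict(list)
        for pos, r in enumerate(self.rects):
            for key in self._keys(r):
                self.cells[key].append(pos)

    def _keys(self, r):
        c = self.cell
        for i in range(int(r[0] // c), int(r[2] // c) + 1):
            for j in range(int(r[1] // c), int(r[3] // c) + 1):
                yield (i, j)

    def candidates(self, r):
        """Yield each inner tuple that shares a grid cell with r exactly once."""
        seen = set()
        for key in self._keys(r):
            for pos in self.cells.get(key, ()):
                if pos not in seen:
                    seen.add(pos)
                    yield self.rects[pos]


def index_nested_loop_join(r1, index, predicate=intersects):
    """Probe the index on the inner relation instead of scanning it fully."""
    return [(t1, t2) for t1 in r1
            for t2 in index.candidates(t1) if predicate(t1, t2)]
```

Both functions return the same pairs. The indexed variant only evaluates the predicate on inner tuples that fall in the grid cells touched by the outer tuple.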
Tree Matching
The tree matching algorithm can be applied when indices are available on both relations.
For this discussion, we assume that an R-tree index is available. In an R-tree, every node
entry has the form <ref, rect>, where ref is a pointer to a child node and rect is the MBR
of the child node or the MBR of a spatial object. The pages that contain leaf nodes are
called data pages, and the pages that contain non-leaf nodes are called directory pages.
Because a directory entry contains the MBR of its child node's entries, if the MBRs of two
directory entries Er1 and Er2 are disjoint, then there can be no match between the entries
below them. If they are not disjoint, there may be matches between the entries, so we
have to traverse deeper into the trees to find the matching tuples. The basic algorithm
follows:
Spatial_Join (R1, R2) // R1 and R2 are R-tree nodes; Er.ptr is the ref field of entry Er
for all Er1 ∈ R1
    for all Er2 ∈ R2
        if (Not_Disjoint (Er1.rect, Er2.rect))
            if (R1 and R2 are leaf pages)
                if pair (Er1, Er2) satisfies the spatial join predicate
                    add <Er1, Er2> to result
            else if (R1 is a leaf page)
                Read_Page (Er2.ptr)
                Spatial_Join ({Er1}, Er2.ptr) // descend in R2 only
            else if (R2 is a leaf page)
                Read_Page (Er1.ptr)
                Spatial_Join (Er1.ptr, {Er2}) // descend in R1 only
            else
                Read_Page (Er1.ptr)
                Read_Page (Er2.ptr)
                Spatial_Join (Er1.ptr, Er2.ptr)
When an index exists for only one relation, an index on the other relation is built on the
fly and the tree matching technique is applied.
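The synchronized traversal can be sketched in memory as follows. This is a simplified illustration, not a full R-tree. The `Entry` and `Node` classes are hypothetical, page reads are omitted, and the function performs only the filter step by returning candidate pairs of object ids. When the trees have different heights, the sketch assumes that the leaf-level entry can be wrapped in a one-entry leaf node so that only the deeper tree is descended.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    rect: tuple    # MBR as (xmin, ymin, xmax, ymax)
    ref: object    # child Node in a directory entry, object id in a leaf entry


@dataclass
class Node:
    is_leaf: bool
    entries: list = field(default_factory=list)


def not_disjoint(a, b):
    """True if the two MBRs share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def spatial_join(n1, n2, result):
    """Collect (id1, id2) pairs whose MBRs intersect, descending both trees."""
    for e1 in n1.entries:
        for e2 in n2.entries:
            if not not_disjoint(e1.rect, e2.rect):
                continue                       # prune this pair of subtrees
            if n1.is_leaf and n2.is_leaf:
                result.append((e1.ref, e2.ref))
            elif n1.is_leaf:
                # Only the second tree has levels left: keep e1, descend e2.
                spatial_join(Node(True, [e1]), e2.ref, result)
            elif n2.is_leaf:
                # Only the first tree has levels left: keep e2, descend e1.
                spatial_join(e1.ref, Node(True, [e2]), result)
            else:
                spatial_join(e1.ref, e2.ref, result)
    return result
```

Wrapping the single leaf entry avoids reporting the same pair more than once, which would happen if the whole leaf node were passed down again for every intersecting entry pair.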
Partition-Based Spatial Merge Join
In this algorithm, both relations are first divided into p partitions if they do not both
fit in main memory. After that, partition i of R1, where 1 ≤ i ≤ p, is compared with the
corresponding partition i of R2. We briefly go through the filter step of this algorithm:
1. For each tuple in R1 and R2, form new relations R1’ and R2’ in which each tuple
consists of the unique object id of the original tuple and the MBR of its joining attribute.
2. If both R1’ and R2’ fit in main memory, process the join with a plane-sweep
algorithm.
3. If R1’ and R2’ do not both fit in main memory, partition both relations into p parts
(R1’1, …, R1’p and R2’1, …, R2’p) such that every partition pair (R1’i, R2’i) fits in
main memory. In addition, make sure that, for each R1’i, all tuples of R2’ that overlap
it reside in partition R2’i. The plane-sweep algorithm can then be applied to each
partition pair.
This strategy works well when neither relation has an index.
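The filter step above can be sketched as follows. Each tuple is an (object id, MBR) pair, as in step 1. The partitioning function is an assumption made for brevity: it cuts the x-range [x_lo, x_hi] into p vertical strips and copies an MBR into every strip it overlaps, so overlapping tuples always meet in at least one partition. Pairs found in several partitions are removed with a set. The function names are hypothetical.

```python
def plane_sweep_join(r1, r2):
    """Return (oid1, oid2) pairs whose MBRs intersect, sweeping along x."""
    a = sorted(r1, key=lambda t: t[1][0])   # sort both inputs by xmin
    b = sorted(r2, key=lambda t: t[1][0])
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i][1][0] <= b[j][1][0]:
            _scan(a[i], b, j, out, swap=False)
            i += 1
        else:
            _scan(b[j], a, i, out, swap=True)
            j += 1
    return out


def _scan(t, others, start, out, swap):
    """Compare t with the tuples of the other input that start before t ends in x."""
    oid, (_, ymin, xmax, ymax) = t
    k = start
    while k < len(others) and others[k][1][0] <= xmax:
        o_oid, (_, o_ymin, _, o_ymax) = others[k]
        if ymin <= o_ymax and o_ymin <= ymax:        # y-intervals overlap too
            out.append((o_oid, oid) if swap else (oid, o_oid))
        k += 1


def partition(rel, p, x_lo, x_hi):
    """Assign each tuple to every vertical strip that its MBR overlaps."""
    width = (x_hi - x_lo) / p
    parts = [[] for _ in range(p)]
    for oid, mbr in rel:
        first = min(p - 1, max(0, int((mbr[0] - x_lo) // width)))
        last = min(p - 1, max(0, int((mbr[2] - x_lo) // width)))
        for k in range(first, last + 1):
            parts[k].append((oid, mbr))
    return parts


def pbsm_filter(r1, r2, p, x_lo, x_hi):
    """Join partition i of r1 with partition i of r2 and merge the results."""
    pairs = set()                                    # the set removes duplicate pairs
    for part1, part2 in zip(partition(r1, p, x_lo, x_hi),
                            partition(r2, p, x_lo, x_hi)):
        pairs.update(plane_sweep_join(part1, part2))
    return pairs
```

In a real system, p would be chosen so that each partition pair fits in main memory. Here it is simply a parameter.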
KEY APPLICATIONS
One application of spatial join is to find all the objects that either intersect or
overlap with each other. Some variants of spatial join (e.g., distance join) are used in
data mining for data analysis and clustering. Spatial join can also be used to process
closest-pairs queries, k-nearest-neighbor queries, and ε-distance queries.
FUTURE DIRECTIONS
Some issues in spatial join require further attention from the research community. To
process spatial join queries, we usually perform the filter and refinement steps in order.
In some cases, variants of this scheme (e.g., interleaving the two steps) may be more
beneficial. We can explore where such variants are beneficial and what information needs
to be collected to decide this. Although intersection join algorithms (e.g., R-tree join)
can be directly extended to other join types (e.g., distance join), the result is often
inefficient. Various optimization techniques can be applied to remedy this. Extending
existing intersection join algorithms with various optimization criteria to other domains
will be an interesting area for research.
CROSS REFERENCES
1. Intersection join
2. Distance join
3. Similarity join
4. Spatial access method
5. R-Tree
RECOMMENDED READING
1. Shekhar S., Chawla S. (2003). Spatial Databases: A Tour, First Edition,
Prentice Hall.
2. Patel J. M., DeWitt D. J. (1996). Partition Based Spatial-Merge Join.
Proceedings of the ACM SIGMOD Conference, pages 259-270.
3. Brinkhoff T., Kriegel H., Seeger B. (1993). Efficient processing of spatial
joins using R-trees. Proceedings of the ACM SIGMOD Conference, pages 237-246.
4. Brinkhoff T., Kriegel H., Seeger B. (1996). Parallel processing of spatial joins
using R-trees. Proceedings of the ICDE Conference, pages 258-265.
5. Manolopoulos Y., Papadopoulos A., Vassilakopoulos M. Gr.
(2005). Spatial Databases: Technologies, Techniques and Trends, Idea Group
Publishing.
6. Böhm C., Krebs F. (2002). High Performance Data Mining Using the Nearest
Neighbor Join. Proceedings of the IEEE International Conference on Data
Mining, pages 43-55.
7. Shou Y., Mamoulis N., Cao H., Papadias D., Cheung D. W. (2003). Evaluation of
Iceberg Distance Joins. Proceedings of the Eighth International Symposium on
Spatial and Temporal Databases, pages 270-288.
8. Corral A., Manolopoulos Y., Theodoridis Y., Vassilakopoulos M. (2000).
Closest pair queries in spatial databases. Proceedings of the ACM SIGMOD
Conference, pages 189-200.
9. Guttman A. (1984). R-trees: A dynamic index structure for spatial searching.
Proceedings of the ACM SIGMOD Conference, pages 47-57.
10. Koudas N., Sevcik K. (2000). High Dimensional Similarity Join. Proceedings of
the ACM SIGMOD Conference, pages 324-335.
11. Mamoulis N., Papadias D. (2001). Multi-way Spatial Joins. ACM Transactions on
Database Systems (TODS), 26(4), pages 424-475.
12. An N., Yang Z.-Y., Sivasubramaniam A. (2001). Selectivity estimation for Spatial
Joins. Proceedings of the IEEE ICDE Conference, pages 368-375.
13. Faloutsos C., Seeger B., Traina A., Traina C. (2000). Spatial Join Selectivity
Using Power Laws. Proceedings of the ACM SIGMOD Conference, pages 177-188.
14. Mamoulis N., Papadias D. (2003). Slot Index Spatial Join. IEEE Transactions
on Knowledge and Data Engineering (TKDE), 15(1), pages 211-231.
15. Orenstein J. (1986). Spatial Query Processing in an Object-Oriented Database
System. Proceedings of the ACM SIGMOD Conference, pages 326-336.