Spatial Access Methods

Temple University – CIS Dept.
CIS616 – Principles of Data Management
V. Megalooikonomou
Spatial Access Methods (SAMs)
(based on notes by Silberschatz, Korth, and Sudarshan and notes by C. Faloutsos at CMU)
General Overview

Multimedia Indexing

Spatial Access Methods (SAMs)





k-d trees
Point Quadtrees
MX-Quadtree
z-ordering
R-trees
SAMs - Detailed outline

spatial access methods






problem dfn
k-d trees
point quadtrees
MX-quadtrees
z-ordering
R-trees
Spatial Access Methods - problem


Given a collection of geometric objects
(points, lines, polygons, ...)
organize them on disk, to answer
spatial queries (like??)
Spatial Access Methods - problem


Given a collection of geometric objects
(points, lines, polygons, ...)
organize them on disk, to answer




point queries
range queries
k-nn queries
spatial joins (‘all pairs’ within ε)
SAMs - motivation

Q: applications?
SAMs - motivation
traditional DB
age
salary
GIS
SAMs - motivation
CAD/CAM: find elements too close to each other
SAMs - motivation
[Figure: sequences S1 ... Sn, each over days 1-365, mapped to feature points F(S1) ... F(Sn); example features: avg, std]
SAMs: solutions






K-d trees
point quadtrees
MX-quadtrees
z-ordering
R-trees
(grid files)
Q: how would you organize,
e.g., n-dim points, on disk?
(C points per disk page)
k-d trees



Used to store k-dimensional point data
It is not used to store region data
A 2-d tree (i.e., for k=2) stores 2-dimensional point data, while a 3-d tree stores 3-dimensional point data, etc.
2-d trees – node structure





Binary trees
Info: information field
Xval,Yval: coordinates of a point associated with the node
Llink, Rlink: pointers to children
Properties (N: node):
If level(N) is even:
    for all nodes M in the subtree rooted at N.Llink: M.Xval < N.Xval
    for all nodes P in the subtree rooted at N.Rlink: P.Xval >= N.Xval
If level(N) is odd:
    the same two conditions hold, with Yval in place of Xval
2-d trees – Example
2-d trees: Insertion/Search

To insert a node N into the tree pointed to by T:
If N and T agree on Xval, Yval then overwrite T
Else, at even levels, branch left if N.Xval < T.Xval, right otherwise
Similarly for odd levels (branching on Yvals)
2-d trees – Example of Insertion
City          (Xval, Yval)
Banja Luka    (19, 45)
Derventa      (40, 50)
Toslic        (38, 38)
Tuzla         (54, 35)
Sinj          (4, 4)
Splitting of region by Banja Luka
Splitting of region by Toslic
Splitting of region by Derventa
Splitting of region by Sinj
2-d trees: Deletion

Deletion of point (x,y) from T: let N be the node storing (x,y)
If N is a leaf node: easy
Otherwise either Tl (left subtree) or Tr (right subtree) is non-empty:
Find a “candidate replacement” node R in Tl or Tr
Replace all of N’s non-link fields by those of R
Recursively delete R from Ti (the subtree, Tl or Tr, in which R was found)
Recursion guaranteed to terminate - Why? (Hint: R lies strictly below N, so each recursive call works on a smaller subtree)
2-d trees: Deletion

Finding candidate replacement nodes for
deletion

Replacement node R must bear same spatial
relation to all nodes in Tl and Tr as node N
2-d trees: Range Queries


Q: Given a point (xc, yc) and a distance r, find all points in the 2-d tree that lie within the circle of radius r centered at (xc, yc)
A: Each node N in a 2-d tree implicitly represents a region RN. If the circle (specified by the query) has no intersection with RN, then there is no point in searching the subtree rooted at node N
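The pruning rule above can be sketched as follows. This is an illustrative sketch under stated assumptions: the names `Node`, `insert`, `circle_meets_region`, and `range_query` are hypothetical, regions are kept as (xlo, xhi, ylo, yhi) tuples passed down the recursion, and the root region is the whole plane.

```python
import math

class Node:
    def __init__(self, x, y):
        self.x, self.y = x, y
        self.left = None
        self.right = None

def insert(node, x, y, level=0):
    """Insert (x, y); even levels split on x, odd levels on y."""
    if node is None:
        return Node(x, y)
    if (node.x, node.y) != (x, y):
        key, split = (x, node.x) if level % 2 == 0 else (y, node.y)
        if key < split:
            node.left = insert(node.left, x, y, level + 1)
        else:
            node.right = insert(node.right, x, y, level + 1)
    return node

def circle_meets_region(xc, yc, r, region):
    """True if the query circle intersects the rectangle `region`."""
    xlo, xhi, ylo, yhi = region
    dx = max(xlo - xc, 0.0, xc - xhi)   # horizontal gap to the region (0 if none)
    dy = max(ylo - yc, 0.0, yc - yhi)   # vertical gap to the region (0 if none)
    return dx * dx + dy * dy <= r * r

WHOLE_PLANE = (-math.inf, math.inf, -math.inf, math.inf)

def range_query(node, xc, yc, r, level=0, region=WHOLE_PLANE, out=None):
    """Collect all points within distance r of (xc, yc)."""
    if out is None:
        out = []
    # Prune: skip the whole subtree if its region misses the circle.
    if node is None or not circle_meets_region(xc, yc, r, region):
        return out
    if (node.x - xc) ** 2 + (node.y - yc) ** 2 <= r * r:
        out.append((node.x, node.y))
    xlo, xhi, ylo, yhi = region
    if level % 2 == 0:   # node splits its region with a vertical line
        left, right = (xlo, node.x, ylo, yhi), (node.x, xhi, ylo, yhi)
    else:                # node splits its region with a horizontal line
        left, right = (xlo, xhi, ylo, node.y), (xlo, xhi, node.y, yhi)
    range_query(node.left, xc, yc, r, level + 1, left, out)
    range_query(node.right, xc, yc, r, level + 1, right, out)
    return out

points = [(19, 45), (40, 50), (38, 38), (54, 35), (4, 4)]
root = None
for px, py in points:
    root = insert(root, px, py)
hits = range_query(root, 40, 40, 13)
```

The child regions are treated as closed rectangles, which is slightly conservative: it can only cause an extra subtree visit, never a missed point.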
Point Quadtrees






Represent point data
Always split regions into 4 parts
2-d tree: a node N splits a region into two by
drawing one line through the point (N.xval,
N.yval)
Point quadtree: a node N splits a region by
drawing a horizontal and a vertical line
through the point (N.xval, N.yval)
Four parts: NW, SW, NE, and SE quadrants
Q: Quadtree nodes have 4 children?
Point Quadtrees

Nodes in point quadtrees represent
regions
Point quadtrees - Insertion
City          (Xval, Yval)
Banja Luka    (19, 45)
Derventa      (40, 50)
Toslic        (38, 38)
Tuzla         (54, 35)
Sinj          (4, 4)
Splitting of region by Toslic
Splitting of region by Banja Luka
Splitting of region by Tuzla
Splitting of region by Derventa
Splitting of region by Sinj
Point Quadtrees - Insertion
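Insertion into a point quadtree can be sketched as below. This is a minimal sketch, not the slides' own algorithm text: the names `QNode`, `quadrant`, and `insert` are hypothetical, and the tie-breaking rule (a point on a splitting line goes to the north or east side) is an assumption made for the example.

```python
class QNode:
    """A point quadtree node: a point, an info field, and four child links."""
    def __init__(self, x, y, info=None):
        self.x, self.y, self.info = x, y, info
        self.children = {"NW": None, "SW": None, "NE": None, "SE": None}

def quadrant(node, x, y):
    """Quadrant of (x, y) relative to the lines through (node.x, node.y)."""
    ns = "N" if y >= node.y else "S"   # assumed tie rule: on the line goes north
    ew = "E" if x >= node.x else "W"   # assumed tie rule: on the line goes east
    return ns + ew

def insert(root, x, y, info=None):
    """Insert a point and return the root."""
    if root is None:
        return QNode(x, y, info)
    node = root
    while True:
        if (node.x, node.y) == (x, y):
            node.info = info           # same coordinates: overwrite
            return root
        q = quadrant(node, x, y)
        if node.children[q] is None:
            node.children[q] = QNode(x, y, info)
            return root
        node = node.children[q]

cities = [("Banja Luka", 19, 45), ("Derventa", 40, 50), ("Toslic", 38, 38),
          ("Tuzla", 54, 35), ("Sinj", 4, 4)]
root = None
for name, cx, cy in cities:
    root = insert(root, cx, cy, name)
```

Inserting the cities in table order puts Derventa in the NE quadrant of Banja Luka, Toslic in its SE quadrant, Tuzla below Toslic, and Sinj in the SW quadrant.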
Point quadtrees: Deletion

Deletion of point (x,y) from T: let N be the node storing (x,y)
If N is a leaf node: easy
Otherwise a subtree (N.NW, N.SW, N.NE, N.SE) is non-empty
Find a “candidate replacement” node R in one of the subtrees such that:
Every other node R1 in N.NW is to the NW of R
Every other node R2 in N.SW is to the SW of R
etc…
Replace all of N’s non-link fields by those of R
Recursively delete R from Ti (the subtree in which R was found)
In general, it may not always be possible to find such a replacement node
Q: What happens in the worst case? A: May require all nodes to be reinserted
Point quadtrees: Range Searches


Each node in a point quadtree represents a
region
Do not search regions that do not intersect
the circle defined by the query
MX-Quadtrees

Drawbacks of 2-d trees, point quadtrees:



shape of tree depends upon the order in which
objects are inserted into the tree
splits may be uneven depending upon where the
point (N.xval, N.yval) is located inside the region
(represented by N)
MX-quadtrees: shape (and height) of tree
independent of number of nodes and order of
insertion
MX-Quadtrees


Assumption: the map is represented as a grid of size (2^k x 2^k) for some k
When a region gets “split” it splits down the middle
MX-Quadtrees - Insertion
After insertion of A, B, C, and D respectively
MX-Quadtrees - Deletion



Fairly easy – why?
All points are represented at the leaf level
Total time for deletion: O(k)
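A small sketch of why MX-quadtree operations take O(k) steps on a 2^k x 2^k grid: every point sits at depth k, and splitting "down the middle" means the quadrant at each level is given by one bit of x and one bit of y. The class and method names (`MXNode`, `MXQuadtree`, `insert`, `find`, `delete`) and the child numbering are assumptions made for this illustration.

```python
class MXNode:
    def __init__(self):
        self.children = [None, None, None, None]  # index = 2 * ybit + xbit
        self.info = None                          # set only at leaf level

class MXQuadtree:
    def __init__(self, k):
        self.k = k                  # grid is 2^k x 2^k
        self.root = MXNode()

    def _path(self, x, y):
        """Quadrant index at each of the k levels, most significant bit first."""
        for level in range(self.k - 1, -1, -1):
            yield (((y >> level) & 1) << 1) | ((x >> level) & 1)

    def insert(self, x, y, info=True):
        node = self.root
        for q in self._path(x, y):
            if node.children[q] is None:
                node.children[q] = MXNode()
            node = node.children[q]
        node.info = info

    def find(self, x, y):
        node = self.root
        for q in self._path(x, y):
            node = node.children[q]
            if node is None:
                return None
        return node.info

    def delete(self, x, y):
        """Remove the leaf for (x, y), then prune ancestors left empty: O(k)."""
        trail, node = [], self.root
        for q in self._path(x, y):
            if node.children[q] is None:
                return False                      # point is not stored
            trail.append((node, q))
            node = node.children[q]
        for parent, q in reversed(trail):
            parent.children[q] = None
            if any(c is not None for c in parent.children):
                break                             # parent still has other points
        return True

tree = MXQuadtree(k=2)                            # a 4 x 4 grid
for name, gx, gy in [("A", 1, 3), ("B", 3, 3), ("C", 3, 1), ("D", 3, 0)]:
    tree.insert(gx, gy, name)
```

The grid positions chosen for A, B, C, and D are made up for the example; they are not read off the slide's figure.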
MX-Quadtrees –Range Queries


Same as in point quadtrees
One difference:

Checking to see if a point is in the circle
defined by the range query needs to be
performed at the leaf level (points are
stored at the leaf level)
z-ordering
Q: how would you organize, e.g., n-dim
points, on disk? (C points per disk page)
Hint: reduce the problem to 1-d points (!!)
Q1: why?
A: B-trees!
Q2: how?
z-ordering
Q2: how?
A: assume finite granularity; z-ordering =
bit-shuffling = N-trees = Morton keys =
geo-coding = ...
z-ordering
Q2: how?
A: assume finite granularity (e.g., 2^32 x 2^32; 4x4 here)
Q2.1: how to map n-d cells to 1-d cells?
z-ordering
Q2.1: how to map n-d cells to 1-d cells?
A: row-wise
Q: is it good?
z-ordering
Q: is it good?
A: great for ‘x’ axis; bad for ‘y’ axis
z-ordering
Q: How about the ‘snake’ curve?
A: still problems:
[figure: ‘snake’ curve on a 2^32 x 2^32 grid]
z-ordering
Q: Why are those curves ‘bad’?
A: no distance preservation (~ clustering)
Q: solution?
z-ordering
Q: solution? (w/ good clustering, and
easy to compute, for 2-d and n-d?)
A: z-ordering/bit-shuffling/linear-quadtrees
‘looks’ better:
• few long jumps;
• scoops out the whole quadrant before leaving it
• a.k.a. space filling curves
z-ordering
z-ordering/bit-shuffling/linear-quadtrees
Q: How to generate this curve (z = f(x,y) )?
A: 3 (equivalent) answers!
z-ordering
z-ordering/bit-shuffling/linear-quadtrees
Q: How to generate this curve (z =
f(x,y))?
A1: ‘z’ (or ‘N’) shapes, RECURSIVELY
order-1 order-2
... order (n+1)
z-ordering
Notice:
• self similar (we’ll see about fractals, soon)
• method is hard to use: z =? f(x,y)
[figure: order-1, order-2, ... order (n+1) curves]
z-ordering
bit-shuffling: interleave the bits of x and y
x = (0 0)2, y = (1 1)2
z = ( 0 1 0 1 )2 = 5
[figure: 4x4 grid, with the x and y axes labeled 00, 01, 10, 11]
How about the reverse: (x,y) = g(z) ?
How about n-d spaces?
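Bit-shuffling, its inverse, and the n-d generalization can be sketched in a few lines. The function names `z_value` and `z_inverse` are made up for this example; the bit order (the x bit comes before the y bit within each pair) is chosen to match the slide's example, where x = 00 and y = 11 give z = 0101 = 5.

```python
def z_value(coords, bits):
    """Interleave the bits of n coordinates, most significant bits first.

    coords: tuple such as (x, y) or (x, y, t); bits: bits per coordinate.
    """
    z = 0
    for b in range(bits - 1, -1, -1):       # from the top bit down
        for c in coords:                    # first coordinate's bit goes first
            z = (z << 1) | ((c >> b) & 1)
    return z

def z_inverse(z, ndims, bits):
    """Un-shuffle a z-value back into its n coordinates."""
    coords = [0] * ndims
    for b in range(bits - 1, -1, -1):
        for d in range(ndims):
            shift = b * ndims + (ndims - 1 - d)   # position of this bit inside z
            coords[d] = (coords[d] << 1) | ((z >> shift) & 1)
    return tuple(coords)

z_slide = z_value((0b00, 0b11), bits=2)     # the slide's example cell
z_drill = z_value((0b11, 0b10), bits=2)     # shuffle(11;10)
```

Both directions cost time linear in the total number of bits, and nothing in the code is specific to two dimensions.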
z-ordering
linear-quadtrees: assign N->1, S->0, E->1, W->0
quadrant codes: WN = 01..., EN = 11..., WS = 00..., ES = 10...
z-ordering
... and repeat recursively. E.g.: z of the gray cell = WN;WN = (0101)2 = 5
[figure: 4x4 grid split into W/E columns and N/S rows, gray cell in the WN corner of the WN quadrant]
z-ordering
Drill: z-value of grey cell, with the three methods?
[figure: 4x4 grid with W/E columns (0/1) and N/S rows (1/0), grey cell highlighted]
method#1: 14
method#2: shuffle(11;10) = (1110)2 = 14
method#3: EN;ES = ... = 14
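Method #3 (linear quadtrees) can be checked with a short sketch: each quadrant name contributes two bits, using the slide's codes W->0, E->1, S->0, N->1. The names `QUADRANT_BITS`, `z_from_quadrants`, and `quadrants_from_xy` are hypothetical helpers for this illustration.

```python
# Two bits per quadrant: first the W/E bit, then the S/N bit.
QUADRANT_BITS = {"WS": 0b00, "WN": 0b01, "ES": 0b10, "EN": 0b11}

def z_from_quadrants(path):
    """z-value of a cell from its quadrant path, coarsest quadrant first."""
    z = 0
    for q in path:
        z = (z << 2) | QUADRANT_BITS[q]
    return z

def quadrants_from_xy(x, y, order):
    """Quadrant path of grid cell (x, y) on a 2^order x 2^order grid."""
    path = []
    for b in range(order - 1, -1, -1):
        ew = "E" if (x >> b) & 1 else "W"
        ns = "N" if (y >> b) & 1 else "S"
        path.append(ew + ns)
    return path

z_gray = z_from_quadrants(["WN", "WN"])      # the earlier gray-cell example
z_drill = z_from_quadrants(["EN", "ES"])     # the drill cell
```

Since the quadrant path is just the (x bit, y bit) pairs read from the top, this gives the same value as bit-shuffling.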
z-ordering - Detailed outline

spatial access methods

z-ordering





main idea - 3 methods
use w/ B-trees; algorithms (range, knn queries
...)
non-point (eg., region) data
analysis; variations
R-trees
z-ordering - usage & algo’s
Q1: How to store on disk?
A: treat z-value as primary key; feed to B-tree
z     cname
5     SF
12    PGH
etc
Q2: How to answer range queries etc.?
z-ordering - usage & algo’s
MAJOR ADVANTAGES w/ B-tree:
• already inside commercial systems (no coding/debugging!)
• concurrency & recovery is ready
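The idea of using the z-value as the primary key can be sketched with a sorted list standing in for the B-tree (Python's `bisect` module plays the role of the B-tree search). The class `ZIndex`, the helper `z_value`, and the grid coordinates given to SF, PGH, and the extra point "X" are assumptions for this illustration; the coordinates were picked only so that SF and PGH get the z-values 5 and 12 shown in the table. The range query relies on the fact that every cell inside a rectangle has a z-value between those of its lower-left and upper-right corners, so one key-range scan plus a filter is correct, though not the most efficient option.

```python
import bisect

def z_value(x, y, bits):
    """Interleave the bits of x and y (x bit first), most significant first."""
    z = 0
    for b in range(bits - 1, -1, -1):
        z = (z << 2) | (((x >> b) & 1) << 1) | ((y >> b) & 1)
    return z

class ZIndex:
    """Sorted list of z-values; a stand-in for a B-tree keyed on z."""
    def __init__(self, bits):
        self.bits = bits
        self.keys = []      # sorted z-values
        self.rows = []      # rows[i] = (x, y, name) for keys[i]

    def insert(self, x, y, name):
        z = z_value(x, y, self.bits)
        i = bisect.bisect_left(self.keys, z)
        self.keys.insert(i, z)
        self.rows.insert(i, (x, y, name))

    def range_query(self, xlo, ylo, xhi, yhi):
        """All rows with xlo <= x <= xhi and ylo <= y <= yhi."""
        zlo = z_value(xlo, ylo, self.bits)      # smallest z inside the rectangle
        zhi = z_value(xhi, yhi, self.bits)      # largest z inside the rectangle
        lo = bisect.bisect_left(self.keys, zlo)
        hi = bisect.bisect_right(self.keys, zhi)
        # The scanned key range may contain cells outside the rectangle: filter.
        return [r for r in self.rows[lo:hi]
                if xlo <= r[0] <= xhi and ylo <= r[1] <= yhi]

index = ZIndex(bits=2)
index.insert(2, 2, "PGH")    # z = 12
index.insert(0, 3, "SF")     # z = 5
index.insert(3, 0, "X")      # z = 10, hypothetical extra point
```

A query such as `index.range_query(1, 1, 2, 2)` scans the keys 5, 10, and 12 but returns only PGH, which shows why the filtering step is needed.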
z-ordering - variations
Q: is z-ordering the best we can do?
A: probably not - occasional long ‘jumps’
Q: then? A1: Gray codes
z-ordering - variations
A2: Hilbert curve! (a.k.a. Hilbert-Peano
curve)
z-ordering - variations
‘Looks’ better (never long jumps). How to
derive it?
[figure: Hilbert curves of order-1, order-2, ... order (n+1)]
z-ordering - variations
Q: function for the Hilbert curve ( h = f(x,y) )?
A: bit-shuffling, followed by post-processing, to account for rotations. Cost is linear in the number of bits.
See textbook, for pointers to code/algorithms (eg., [Jagadish, 90])
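One widely used iterative formulation of h = f(x,y) is sketched below: at each bit position it reads one bit of x and one of y, adds that quadrant's contribution, and then rotates or flips the coordinates. This is a common textbook-style version and not necessarily the algorithm of [Jagadish, 90]; the function name `hilbert_index` and the particular orientation of the curve (starting at cell (0,0)) are assumptions of this sketch.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) along a Hilbert curve on an n x n grid.

    n must be a power of two. One loop iteration per bit, so the cost is
    linear in the number of bits.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)        # which quadrant, in curve order
        # Post-processing: rotate/flip so the sub-curve has standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d

# Visit order of an 8 x 8 grid: cell_at[h] is the cell with Hilbert value h.
n = 8
cell_at = {hilbert_index(n, x, y): (x, y) for x in range(n) for y in range(n)}
```

Sorting cells by `hilbert_index` visits each cell once, and consecutive cells are always grid neighbors, which is the "never long jumps" property.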
z-ordering - variations
Q: how about Hilbert curve in 3-d? n-d?
A: Exists (and is not unique!). Eg., 3-d,
order-1 Hilbert curves (Hamiltonian
paths on cube)
[figure: two different order-1 curves on the cube, #1 and #2]
z-ordering - analysis
Q: How many pieces (‘quad-tree blocks’)
per region?
A: proportional to perimeter (surface etc)
z-ordering - analysis
(How long is the coastline, say, of England?
Paradox: The answer changes with the yardstick -> fractals ...)
z-ordering - analysis
Q: Should we decompose a region to full
detail (and store in B-tree)?
A: NO! approximation with 1-3 pieces/z-values is best [Orenstein90]
z-ordering - analysis
Q: how to measure the ‘goodness’ of a curve?
A: e.g., avg. # of runs, for range queries
(#runs ~ #disk accesses on B-tree)
[figure: example range queries that break into 4 runs and 3 runs]
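The "number of runs" measure can be made concrete with a short sketch: map every cell of a query rectangle to its position on the curve and count the maximal stretches of consecutive positions. The helper names `z_order`, `row_wise`, `count_runs`, and `runs_for_query`, and the 4x4 grid, are assumptions chosen for the example.

```python
def z_order(x, y, bits=2):
    """z-value by bit-shuffling (x bit first)."""
    z = 0
    for b in range(bits - 1, -1, -1):
        z = (z << 2) | (((x >> b) & 1) << 1) | ((y >> b) & 1)
    return z

def row_wise(x, y, width=4):
    """Row-by-row ordering of a grid that is `width` cells wide."""
    return y * width + x

def count_runs(keys):
    """Number of maximal runs of consecutive integers in `keys`."""
    keys = sorted(keys)
    return sum(1 for i, k in enumerate(keys) if i == 0 or k != keys[i - 1] + 1)

def runs_for_query(curve, xlo, ylo, xhi, yhi):
    """Runs needed to cover the rectangle [xlo..xhi] x [ylo..yhi] on `curve`."""
    keys = [curve(x, y) for x in range(xlo, xhi + 1) for y in range(ylo, yhi + 1)]
    return count_runs(keys)

aligned_z = runs_for_query(z_order, 0, 0, 1, 1)      # query equal to a quadrant
aligned_rows = runs_for_query(row_wise, 0, 0, 1, 1)
centered_z = runs_for_query(z_order, 1, 1, 2, 2)     # query straddling all quadrants
centered_rows = runs_for_query(row_wise, 1, 1, 2, 2)
```

On this tiny grid a single query can favor either curve, which is why the slides speak of the average number of runs over many range queries.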
z-ordering - analysis
Q: So, is Hilbert really better?
A: 27% fewer runs, for 2-d (similar for 3-d)
Q: are there formulas for #runs, #of
quadtree blocks etc?
A: Yes ([Jagadish; Moon+ etc] see
textbook)
z-ordering - fun observations
Hilbert and z-ordering curves: “space filling
curves”: eventually, they visit every point
in n-d space - therefore:
[figure: curves of order-1, order-2, ... order (n+1)]
z-ordering - fun observations
... they show that the plane has as many
points as a line (-> headaches for 1900’s
mathematics/topology). (fractals, again!)
z-ordering - fun observations
Observation #2: Hilbert (like) curve for video
encoding [Y. Matias+, CRYPTO ‘87]:
Given a frame, visit its pixels in randomized
hilbert order; compress; and transmit
z-ordering - fun observations
In general, Hilbert curve is great for
preserving distances, clustering, vector
quantization etc
Conclusions



z-ordering is a great idea (n-d points ->
1-d points; feed to B-trees)
used by TIGER system and (most
probably) by other GIS products
works great with low-dim points
SAMs - more detailed outline

R-trees





main idea; file structure
(algorithms: insertion/split)
(deletion)
(search: range, nn, spatial joins)
variations (packed; hilbert;...)
R-trees



z-ordering: cuts regions to pieces ->
dup. elim.
how could we avoid that?
Idea: Minimum Bounding Rectangles
R-trees

[Guttman 84] Main idea: allow parents
to overlap!



=> guaranteed 50% utilization
=> easier insertion/split algorithms.
(only deal with Minimum Bounding
Rectangles - MBRs)
R-trees

eg., w/ fanout 4: group nearby rectangles to parent MBRs; each group -> disk page
[figure: data rectangles A-J]
R-trees

eg., w/ fanout 4:
[figure: rectangles A-J grouped into parent MBRs P1-P4; the root node (P1 P2 P3 P4) points to the leaf pages holding A-J]
R-trees - format of nodes

{(MBR; obj-ptr)} for leaf nodes
entry layout: x-low; x-high; y-low; y-high; obj ptr ...
[figure: root (P1 P2 P3 P4) above leaf page (A B C)]
R-trees - format of nodes

{(MBR; node-ptr)} for non-leaf nodes
entry layout: x-low; x-high; y-low; y-high; node ptr ...
[figure: root (P1 P2 P3 P4) above leaf page (A B C)]
R-trees - range search?
[figure: R-tree example with parent MBRs P1-P4, data rectangles A-J, and root node (P1 P2 P3 P4)]
R-trees - range search
Observations:
• every parent node completely covers its ‘children’
• a child MBR may be covered by more than one parent - it is stored under ONLY ONE of them. (i.e., no need for dup. elim.)
• a point query may follow multiple branches.
• everything works for any dimensionality
R-trees - insertion

eg., rectangle ‘X’: leaf page (D E) becomes (D E X)
[figure: X drawn among rectangles A-J and parent MBRs P1-P4]
R-trees - insertion

eg., rectangle ‘Y’: extend suitable parent; leaf page (D E) becomes (D E Y)
[figure: Y drawn among rectangles A-J and parent MBRs P1-P4]
R-trees - insertion




eg., rectangle ‘Y’: extend suitable
parent.
Q: how to measure ‘suitability’?
A: by increase in area (volume) (more
details: later, under ‘performance
analysis’)
Q: what if there is no room? how to
split?
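The "least increase in area" rule can be sketched as follows. The names `area`, `union`, `enlargement`, and `choose_parent`, the (xlo, ylo, xhi, yhi) rectangle layout, and the sample coordinates are assumptions made for this illustration; breaking ties by the smaller current area follows Guttman's ChooseLeaf step.

```python
def area(r):
    """Area of a rectangle given as (xlo, ylo, xhi, yhi)."""
    return (r[2] - r[0]) * (r[3] - r[1])

def union(a, b):
    """Minimum bounding rectangle of rectangles a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def enlargement(mbr, rect):
    """How much the MBR's area must grow to cover rect."""
    return area(union(mbr, rect)) - area(mbr)

def choose_parent(parents, rect):
    """Pick the parent whose MBR needs the least enlargement.

    parents: dict mapping a parent name to its MBR. Ties go to the
    parent with the smaller current area.
    """
    return min(parents, key=lambda p: (enlargement(parents[p], rect), area(parents[p])))

# Hypothetical parent MBRs and new rectangles.
parents = {"P1": (0, 0, 4, 4), "P2": (6, 0, 10, 4)}
inside = choose_parent(parents, (1, 1, 2, 2))     # already covered by P1
between = choose_parent(parents, (5, 1, 6, 2))    # cheaper to extend P2
```

For the second rectangle, P1 would have to grow by 8 area units and P2 only by 4, so P2 is the more suitable parent.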
R-trees - insertion

eg., rectangle ‘W’
[figure: W falls near P1, whose leaf page (A B C K) already holds 4 entries; root node (P1 P2 P3 P4)]
R-trees - insertion

eg., rectangle ‘W’ - focus on ‘P1’ - how to split?
[figure: P1 containing K, A, C, B and the new rectangle W]
• (A1: plane sweep, until 50% of rectangles)
• A2: ‘linear’ split
• A3: quadratic split
• A4: exponential split
R-trees - insertion & split


pick two rectangles as ‘seeds’;
assign each rectangle ‘R’ to the ‘closest’ ‘seed’
[figure: rectangle R between seed1 and seed2]
Q: how to measure ‘closeness’?
A: by increase of area (volume)
smart idea: pre-sort rectangles according to delta of closeness (ie., schedule easiest choices first!)
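The seed-based split can be sketched in the style of Guttman's quadratic split: the seeds are the pair that would waste the most area if kept together, and the next rectangle to assign is the one with the largest difference in area increase between the two groups (the "easiest choice first" idea). The function name `quadratic_split`, the `min_fill` parameter, and the sample rectangles are assumptions of this sketch, not code from the slides.

```python
from itertools import combinations

def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def quadratic_split(rects, min_fill):
    """Split rects (xlo, ylo, xhi, yhi) into two groups; return (groups, mbrs)."""
    # Seeds: the pair whose joint MBR wastes the most area.
    def waste(pair):
        a, b = rects[pair[0]], rects[pair[1]]
        return area(union(a, b)) - area(a) - area(b)
    i, j = max(combinations(range(len(rects)), 2), key=waste)
    groups, mbrs = [[rects[i]], [rects[j]]], [rects[i], rects[j]]
    rest = [r for k, r in enumerate(rects) if k not in (i, j)]

    def growth(r):
        """Area increase of each group's MBR if r were added to it."""
        return [area(union(m, r)) - area(m) for m in mbrs]

    while rest:
        # If a group needs every remaining rectangle to reach min_fill, give it all.
        needy = [g for g in (0, 1) if len(groups[g]) + len(rest) == min_fill]
        if needy:
            g = needy[0]
            for r in rest:
                groups[g].append(r)
                mbrs[g] = union(mbrs[g], r)
            break
        # Easiest choice first: largest difference between the two increases.
        r = max(rest, key=lambda c: abs(growth(c)[0] - growth(c)[1]))
        rest.remove(r)
        d0, d1 = growth(r)
        # Closest seed = least area increase; ties: smaller area, then fewer entries.
        g = 0 if (d0, area(mbrs[0]), len(groups[0])) <= (d1, area(mbrs[1]), len(groups[1])) else 1
        groups[g].append(r)
        mbrs[g] = union(mbrs[g], r)
    return groups, mbrs

# Hypothetical overflowing node: three rectangles on the left, two on the right.
left = [(0, 0, 1, 1), (1, 0, 2, 1), (0, 1, 1, 2)]
right = [(10, 10, 11, 11), (11, 10, 12, 11)]
groups, mbrs = quadratic_split(left + right, min_fill=2)
```

Picking the seeds examines all pairs, which is where the quadratic cost in the node size comes from.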
R-trees - insertion - pseudocode


decide which parent to put new
rectangle into (‘closest’ parent)
if overflow, split to two, using (say,) the
quadratic split algorithm


propagate the split upwards, if necessary
update the MBRs of the affected
parents.
R-trees - insertion - observations

many more split algorithms exist
(next!)
R-trees - deletion


delete rectangle
if underflow



temporarily delete all siblings (!);
delete the parent node and
re-insert them
R-trees - range search
pseudocode:
check the root
for each branch,
    if its MBR intersects the query rectangle
        apply range-search (or print out, if this is a leaf)
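The pseudocode above translates almost line by line into Python. The class `RNode`, the functions `intersects` and `range_search`, and the small two-level tree with made-up coordinates are assumptions for this illustration; rectangles are (xlo, ylo, xhi, yhi) tuples and touching rectangles count as intersecting.

```python
class RNode:
    """R-tree node: a list of (MBR, child) pairs.

    In a leaf the child is an object (here just a name); in a non-leaf
    node it is another RNode.
    """
    def __init__(self, entries, leaf):
        self.entries = entries
        self.leaf = leaf

def intersects(a, b):
    """True if rectangles a and b overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def range_search(node, query, out=None):
    """Collect every object whose MBR intersects the query rectangle."""
    if out is None:
        out = []
    for mbr, child in node.entries:
        if intersects(mbr, query):
            if node.leaf:
                out.append(child)                 # report the object
            else:
                range_search(child, query, out)   # descend into this branch
    return out

# Hypothetical two-level tree.
leaf1 = RNode([((0, 0, 1, 1), "A"), ((2, 2, 3, 3), "B")], leaf=True)
leaf2 = RNode([((8, 8, 9, 9), "D"), ((6, 7, 7, 9), "E")], leaf=True)
root = RNode([((0, 0, 3, 3), leaf1), ((6, 7, 9, 9), leaf2)], leaf=False)
found = range_search(root, (2.5, 2.5, 6.5, 7.5))
```

Here the query overlaps both parent MBRs, so both branches are followed, but only B and E are reported.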
R-trees - nn search

Q: How? (find near neighbor; refine...)
[figure: query point q among parent MBRs P1-P4 and data rectangles A-J]
R-trees - nn search

A1: depth-first search; then, range query
[figure: query point q among parent MBRs P1-P4 and data rectangles A-J]
R-trees - nn search

A2: [Roussopoulos+, sigmod95]:


priority queue, with promising MBRs, and
their best and worst-case distance
main idea:
R-trees - nn search
consider only P2 and P4, for illustration
[figure: query point q with parent MBRs P1-P4 and data rectangles A-J]
R-trees - nn search
if the best-case distance of P4 exceeds the worst-case distance of P2 => P4 is useless for 1-nn
[figure: q with segments marking ‘best of P4’ and ‘worst of P2’]
R-trees - nn search


what is really the worst of, say, P2?
A: the smallest of the two red segments!
[figure: query point q, MBR P2, and two red segments from q to P2]
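The best-case and worst-case distances correspond to the MINDIST and MINMAXDIST bounds of [Roussopoulos+, sigmod95]. A sketch with squared distances follows; the function names, the (low corner, high corner) layout of an MBR, and the sample coordinates are assumptions of this illustration. MINMAXDIST uses the fact that every face of an MBR touches at least one object: for each dimension, take the nearer face in that dimension and the farthest point on it, then keep the smallest such distance.

```python
def mindist_sq(q, mbr):
    """Squared best-case distance from point q to anything inside the MBR."""
    lo, hi = mbr
    return sum(max(l - p, 0, p - h) ** 2 for p, l, h in zip(q, lo, hi))

def minmaxdist_sq(q, mbr):
    """Squared worst-case distance to the nearest object inside the MBR."""
    lo, hi = mbr
    # Per dimension: coordinate of the nearer face and of the farther face.
    near = [l if p <= (l + h) / 2 else h for p, l, h in zip(q, lo, hi)]
    far = [l if p >= (l + h) / 2 else h for p, l, h in zip(q, lo, hi)]
    far_sq = [(p - f) ** 2 for p, f in zip(q, far)]
    total_far = sum(far_sq)
    # Nearer face in dimension k, farthest corner in all other dimensions.
    return min(total_far - far_sq[k] + (q[k] - near[k]) ** 2
               for k in range(len(q)))

q = (0, 0)
p2 = ((1, 1), (2, 3))      # hypothetical MBR close to q
p4 = ((5, 0), (6, 1))      # hypothetical MBR farther away
p4_is_useless = mindist_sq(q, p4) > minmaxdist_sq(q, p2)
```

For these coordinates the two candidate segments for `p2` have squared lengths 10 and 5, and the smaller one (5) is the worst case; since the best case of `p4` is 25, `p4` can be discarded for a 1-nn query.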
R-trees - nn search

variations: [Hjaltason & Samet] incremental nn:




build a priority queue
scan enough of the tree, to make sure you have the
k nn
to find the (k+1)-th, check the queue, and scan
some more of the tree
‘optimal’ (but, may need too much memory)
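The incremental idea can be sketched with one priority queue that holds both nodes and objects, ordered by their best-case distance to q; whenever an object reaches the front of the queue, nothing still in the queue can be closer. The names `RNode`, `mindist_sq`, and `incremental_nn`, the sample tree, and the use of an object's MBR distance as its distance are assumptions of this sketch, not the authors' code.

```python
import heapq
import itertools

class RNode:
    def __init__(self, entries, leaf):
        self.entries = entries      # list of (mbr, child_or_object)
        self.leaf = leaf

def mindist_sq(q, mbr):
    """Squared distance from point q to rectangle (xlo, ylo, xhi, yhi)."""
    dx = max(mbr[0] - q[0], 0, q[0] - mbr[2])
    dy = max(mbr[1] - q[1], 0, q[1] - mbr[3])
    return dx * dx + dy * dy

def incremental_nn(root, q):
    """Yield (squared distance, object) pairs in increasing distance from q."""
    tie = itertools.count()         # tie-breaker so the heap never compares nodes
    heap = [(0, next(tie), False, root)]
    while heap:
        dist, _, is_object, item = heapq.heappop(heap)
        if is_object:
            yield dist, item        # next nearest neighbor
            continue
        for mbr, child in item.entries:
            heapq.heappush(heap, (mindist_sq(q, mbr), next(tie), item.leaf, child))

# Hypothetical two-level tree.
leaf1 = RNode([((0, 0, 1, 1), "A"), ((2, 2, 3, 3), "B")], leaf=True)
leaf2 = RNode([((8, 8, 9, 9), "D"), ((6, 7, 7, 9), "E")], leaf=True)
root = RNode([((0, 0, 3, 3), leaf1), ((6, 7, 9, 9), leaf2)], leaf=False)

two_nearest = list(itertools.islice(incremental_nn(root, (5, 5)), 2))
```

Because the function is a generator, asking for the (k+1)-th neighbor simply resumes the scan; the queue itself is the memory cost mentioned above.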
R-trees - spatial joins
Spatial joins: find (quickly) all
counties
intersecting
lakes
R-trees - spatial joins
Assume that they are both organized in R-trees:
R-trees - spatial joins
for each parent P1 of tree T1
    for each parent P2 of tree T2
        if their MBRs intersect,
            process them recursively (ie., check their children)
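The nested-loop join above can be sketched as a synchronized descent of the two trees. For brevity the sketch assumes both trees have the same height; the names `RNode`, `intersects`, and `spatial_join` and the county and lake rectangles are made up for the example.

```python
class RNode:
    def __init__(self, entries, leaf):
        self.entries = entries      # list of (mbr, child_or_object)
        self.leaf = leaf

def intersects(a, b):
    """True if rectangles (xlo, ylo, xhi, yhi) a and b overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def spatial_join(n1, n2, out=None):
    """All (object1, object2) pairs whose MBRs intersect.

    Assumes the two trees have equal height, so both nodes are leaves
    at the same time.
    """
    if out is None:
        out = []
    for mbr1, c1 in n1.entries:
        for mbr2, c2 in n2.entries:
            if intersects(mbr1, mbr2):
                if n1.leaf and n2.leaf:
                    out.append((c1, c2))          # candidate pair
                else:
                    spatial_join(c1, c2, out)     # check their children
    return out

county_rects = {"county1": (0, 0, 4, 4), "county2": (4, 0, 8, 4), "county3": (0, 4, 4, 8)}
lake_rects = {"lakeA": (3, 3, 5, 5), "lakeB": (7, 7, 9, 9)}

counties = RNode([
    ((0, 0, 8, 4), RNode([(county_rects["county1"], "county1"),
                          (county_rects["county2"], "county2")], leaf=True)),
    ((0, 4, 4, 8), RNode([(county_rects["county3"], "county3")], leaf=True)),
], leaf=False)
lakes = RNode([
    ((3, 3, 5, 5), RNode([(lake_rects["lakeA"], "lakeA")], leaf=True)),
    ((7, 7, 9, 9), RNode([(lake_rects["lakeB"], "lakeB")], leaf=True)),
], leaf=False)

pairs = spatial_join(counties, lakes)
```

The output pairs are candidates based on MBRs only; an exact geometry test on the actual county and lake shapes would follow as a refinement step.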
R-trees - spatial joins
Improvements - variations:
- [Seeger+, sigmod 92]: do some pre-filtering; do
plane-sweeping to avoid N1 * N2 tests for
intersection
- [Lo & Ravishankar, sigmod 94]: ‘seeded’ R-trees
(FYI, many more papers on spatial joins, without R-trees: [Koudas+ Sevcik], etc.)
R-trees - variations
Guttman’s R-trees sparked much follow-up work
• can we do better splits?
• what about static datasets (no ins/del/upd)?
• what about other bounding shapes?
R-trees - variations
Guttman’s R-trees sparked much follow-up work
• can we do better splits? i.e., defer splits?
R-trees - variations
A: R*-trees [Kriegel+, SIGMOD90]
• defer splits, by forced-reinsert, i.e.: instead of splitting, temporarily delete some entries, shrink overflowing MBR, and re-insert those entries
• Which ones to re-insert?
• How many? A: 30%
R-trees - variations
Q: Other ways to defer splits?
A: Push a few keys to the closest sibling
node
(closest = ??)
R-trees - variations
R*-trees: Also try to minimize area AND
perimeter, in their split.
Performance: higher space utilization;
faster than plain R-trees. One of the
most successful R-tree variants.
R-trees - variations
Guttman’s R-trees sparked much follow-up work
• can we do better splits?
• what about static datasets (no ins/del/upd)? Hilbert R-trees
• what about other bounding shapes?
R-trees - variations




what about static datasets (no
ins/del/upd)?
Q: Best way to pack points?
A1: plane-sweep
great for queries on ‘x’;
terrible for ‘y’
Q: how to improve?
R-trees - variations



A: plane-sweep on HILBERT curve!
In fact, it can be made dynamic (how?),
as well as to handle regions (how?)
A: [Kamel+, VLDB94]
R-trees - variations



what about other bounding shapes? (and why?)
A1: arbitrary-orientation lines (cell-tree, [Guenther])
A2: P-trees (polygon trees) (MB polygon: 0, 90, 45, 135 degree lines)
R-trees - variations



A3: L-shapes; holes (hB-tree)
A4: TV-trees [Lin+, VLDB-Journal 1994]
A5: SR-trees [Katayama+, SIGMOD97] (used in
Informedia)
R-trees - conclusions





Popular method; like multi-d B-trees
guaranteed utilization
good search times (for low-dim. at least)
R*-, Hilbert- and SR-trees: still used
IBM (Informix) ships DataBlade with R-trees
References



Guttman, A. (June 1984). R-Trees: A Dynamic Index
Structure for Spatial Searching. Proc. ACM SIGMOD,
Boston, Mass.
Jagadish, H. V. (May 23-25, 1990). Linear Clustering
of Objects with Multiple Attributes. ACM SIGMOD
Conf., Atlantic City, NJ.
Lin, K.-I., H. V. Jagadish, et al. (Oct. 1994). “The TV-tree - An Index Structure for High-dimensional Data.” VLDB Journal 3: 517-542.
References, cont’d



Pagel, B., H. Six, et al. (May 1993). Towards an
Analysis of Range Query Performance. Proc. of ACM
SIGACT-SIGMOD-SIGART Symposium on Principles of
Database Systems (PODS), Washington, D.C.
Robinson, J. T. (1981). The k-D-B-Tree: A Search
Structure for Large Multidimensional Dynamic
Indexes. Proc. ACM SIGMOD.
Roussopoulos, N., S. Kelley, et al. (May 1995).
Nearest Neighbor Queries. Proc. of ACM-SIGMOD,
San Jose, CA.