Spatial data structures

SPATIAL DATA STRUCTURES – KD-TREES
Jianping Fan
Department of Computer Science
UNC-Charlotte
SUMMARY

This lecture introduces multi-dimensional queries in
databases, as well as addresses how we can query
and represent multi-dimensional data
“A reasonable man adapts himself to his environment.
An unreasonable man persists in attempting to adapt
his environment to suit himself … Therefore, all
progress depends on the unreasonable man.”
 George Bernard Shaw

CONTENTS
Definitions
 Basic operations and construction
 Range queries on multi-attributes
 Variants
 Applications

USAGE
Rendering
 Surface reconstruction
 Collision detection
 Vision and machine learning
 Intel Interactive technology

KD TREE DEFINITION
A recursive space-partitioning tree.
 – Partitions along the x and y axes in alternating
fashion.
 – Each internal node stores the splitting value
along x (or y).

K-D TREE
Used for point location and multiple database
queries; k is the number of attributes used in
the search.
 Geometric interpretation – to search in
2D space, use a 2-d tree.
 The search coordinates (x, y) alternate between levels!

K-D TREE EXAMPLE
[Figure: a 2-d tree over points a–f, shown both as a partition of the plane and as the corresponding binary tree.]
KD TREE EXAMPLE
3D KD TREE
CONSTRUCTION



The canonical method of kd-tree construction is the
following:
As one moves down the tree, one cycles through the
axes used to select the splitting planes. (For
example, the root would have an x-aligned plane, the
root's children would both have y-aligned planes, the
root's grandchildren would all have z-aligned planes,
the next level would have an x-aligned plane, and so
on.)
Points are inserted by selecting the median of the
points being put into the subtree, with respect to
their coordinates in the axis being used to create the
splitting plane. (Note the assumption that we feed
the entire set of points into the algorithm up-front.)
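This canonical construction can be sketched in a few lines of Python (an illustrative sketch; `Node` and `build_kdtree` are names chosen here, not from the slides):

```python
from collections import namedtuple

# A node stores the splitting point plus its left and right subtrees.
Node = namedtuple("Node", "point left right")

def build_kdtree(points, depth=0):
    """Build a balanced kd-tree by splitting at the median of the
    axis that cycles with depth (x, y, z, x, ...)."""
    if not points:
        return None
    axis = depth % len(points[0])          # axis cycles with depth
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                 # median index
    return Node(
        point=points[mid],
        left=build_kdtree(points[:mid], depth + 1),
        right=build_kdtree(points[mid + 1:], depth + 1),
    )
```

Sorting at every level gives the O(n log² n) bound discussed later; a linear-time median selection would reduce this to O(n log n).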
CONSTRUCTION


This method leads to a balanced kd-tree, in which
each leaf node is about the same distance
from the root. However, balanced trees are not
necessarily optimal for all applications.
Note also that it is not required to select the median
point. In that case, the result is simply that there is
no guarantee that the tree will be balanced. A
simple heuristic that avoids coding a complex linear-time
median-finding algorithm, or using an O(n log n) sort,
is to sort a fixed number of randomly selected points
and use their median as the cut line.
KD TREE – MEAN VS MEDIAN
kd-tree partitions of a uniform set of data points, using the mean
(left image) and the median (right image) thresholding options.
Median: the middle value of a set of values. Mean: the arithmetic
average.
(Andrea Vedaldi and Brian Fulkerson)
http://www.vlfeat.org/overview/kdtree.html
EXAMPLE OF USING MEDIAN
ADDITIONS




One adds a new point to a kd-tree in the same way as
one adds an element to any other search tree.
First, traverse the tree, starting from the root and
moving to either the left or the right child depending
on whether the point to be inserted is on the "left" or
"right" side of the splitting plane.
Once you get to the node under which the child
should be located, add the new point as either the left
or right child of the leaf node, again depending on
which side of the node's splitting plane contains the
new node.
Adding points in this manner can cause the tree to
become unbalanced, leading to decreased tree
performance.
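The insertion walk above can be sketched as follows (a self-contained illustration; `Node` and `insert` are our names, and the sketch rebuilds the path functionally rather than mutating nodes):

```python
from collections import namedtuple

Node = namedtuple("Node", "point left right")

def insert(node, point, depth=0):
    """Insert `point` as in any binary search tree, comparing on
    the axis that cycles with depth.  Returns the new subtree root;
    repeated inserts can leave the tree unbalanced."""
    if node is None:
        return Node(point, None, None)      # became a leaf's child
    axis = depth % len(point)
    if point[axis] < node.point[axis]:      # "left" of splitting plane
        return Node(node.point, insert(node.left, point, depth + 1), node.right)
    else:                                   # "right" of splitting plane
        return Node(node.point, node.left, insert(node.right, point, depth + 1))
```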
DELETIONS


To remove a point from an existing kd-tree,
without breaking the invariant, the easiest way
is to form the set of all nodes and leaves from the
children of the target node, and recreate that
part of the tree.
Another approach is to find a replacement for
the point removed. First, find the node R that
contains the point to be removed. For the base
case where R is a leaf node, no replacement is
required. For the general case, find a
replacement point, say p, from the sub-tree
rooted at R. Replace the point stored at R with p.
Then, recursively remove p.
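The first, "collect and recreate" approach can be sketched like this (illustrative names; simple to code, but more expensive than finding a replacement point):

```python
from collections import namedtuple

Node = namedtuple("Node", "point left right")

def collect(node):
    """Gather every point in a subtree."""
    if node is None:
        return []
    return [node.point] + collect(node.left) + collect(node.right)

def build_kdtree(points, depth=0):
    """Median-split construction, as in the construction slides."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return Node(points[mid], build_kdtree(points[:mid], depth + 1),
                build_kdtree(points[mid + 1:], depth + 1))

def delete(node, point, depth=0):
    """Remove `point` by rebuilding the subtree rooted where it is
    found; `depth` is passed on so the rebuilt part keeps the same
    axis cycle and the invariant is preserved."""
    if node is None:
        return None
    if node.point == point:
        return build_kdtree(collect(node.left) + collect(node.right), depth)
    axis = depth % len(point)
    if point[axis] < node.point[axis]:
        return Node(node.point, delete(node.left, point, depth + 1), node.right)
    return Node(node.point, node.left, delete(node.right, point, depth + 1))
```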
BALANCING
Balancing a kd-tree requires care. Because kd-trees
are sorted in multiple dimensions, the tree
rotation technique cannot be used to balance
them; doing so may break the invariant.
 Several variants of balanced kd-trees exist,
including the divided kd-tree, pseudo kd-tree,
K-D-B-tree, hB-tree and Bkd-tree. Many of these
variants are adaptive k-d trees.

QUERYING

A kd-tree query uses a best-bin-first search
heuristic. This is a branch-and-bound technique
that maintains an estimate of the smallest
distance from the query point to any of the data
points down each of the open paths.

A kd-tree query supports two important operations:
nearest-neighbor search and k-nearest-neighbor
search. The first returns the nearest neighbor to a
query point; the latter returns the k
nearest neighbors to a given query point Q.
NEAREST-NEIGHBOR SEARCH
Starting with the root node, the algorithm moves down
the tree recursively (i.e. it goes right or left depending
on whether the point is greater or less than the current
node in the split dimension).
 Once the algorithm reaches a leaf node, it saves that
node point as the "current best".
 The algorithm unwinds the recursion of the tree,
performing the following steps at each node:

RECURSION STEP
If the current node is closer than the current best, then it
becomes the current best.
◦ The algorithm checks whether there could be any points on
the other side of the splitting plane that are closer to the
search point than the current best. In concept, this is done
by intersecting the splitting hyperplane with a hypersphere
around the search point that has a radius equal to the
current nearest distance.
◦ If the hypersphere crosses the plane, there could be nearer
points on the other side of the plane, so the algorithm must
move down the other branch of the tree from the current
node looking for closer points, following the same recursive
process as the entire search.
◦ If the hypersphere doesn't intersect the splitting
plane, then the algorithm continues walking up the
tree, and the entire branch on the other side of that
node is eliminated.
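The descent-and-unwind procedure above might look like this in Python (an illustrative sketch; the names are ours):

```python
from collections import namedtuple
import math

Node = namedtuple("Node", "point left right")

def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(node, query, depth=0, best=None):
    """Descend toward the query point, then unwind, visiting the far
    side only when the hypersphere of radius dist(query, best)
    crosses the splitting plane."""
    if node is None:
        return best
    if best is None or dist(query, node.point) < dist(query, best):
        best = node.point                   # current node is closer
    axis = depth % len(query)
    diff = query[axis] - node.point[axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, query, depth + 1, best)
    if abs(diff) < dist(query, best):       # sphere crosses the plane?
        best = nearest(far, query, depth + 1, best)
    return best
```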
NEAREST-NEIGHBOR SEARCH



kd-trees are not suitable for efficiently finding
the nearest neighbor in high-dimensional
spaces.
In very high-dimensional spaces, the curse of
dimensionality causes the algorithm to visit
many more branches than in lower-dimensional
spaces. In particular, when the number of points is
only slightly higher than the number of dimensions,
the algorithm is only slightly better than a linear
search of all of the points.
The algorithm can be extended to provide the k
nearest neighbors to a point by maintaining k
current bests instead of just one. Branches are only
eliminated when they can't have points closer than
any of the k current bests.
RANGE SEARCH
The kd-tree provides a convenient tool for range-search
queries in databases with more than one key. The
search may go down from the root in both directions
(left and right), but can be limited by a strict
inequality on the key value at each tree level.
 The kd-tree is one of the few data structures that
allow easy multi-key search.
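A range query over a 2-d tree can be sketched as follows (illustrative; `range_search` reports all points inside the axis-parallel box between `lo` and `hi`, pruning a branch whenever the splitting value rules it out):

```python
from collections import namedtuple

Node = namedtuple("Node", "point left right")

def range_search(node, lo, hi, depth=0):
    """Return every point p with lo[i] <= p[i] <= hi[i] for all i."""
    if node is None:
        return []
    axis = depth % len(lo)
    out = []
    if all(l <= c <= h for l, c, h in zip(lo, node.point, hi)):
        out.append(node.point)              # point lies in the box
    if lo[axis] <= node.point[axis]:        # box overlaps left side
        out += range_search(node.left, lo, hi, depth + 1)
    if node.point[axis] <= hi[axis]:        # box overlaps right side
        out += range_search(node.right, lo, hi, depth + 1)
    return out
```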

KD TREE
http://upload.wikimedia.org/wikipedia/en/9/9c/KDTree-animation.gif
COMPLEXITY





Building a static kd-tree from n points takes O(n log² n)
time if an O(n log n) sort is used to compute the
median at each level.
The complexity is O(n log n) if a linear-time median-finding
algorithm such as the one described in
Cormen et al. is used.
Inserting a new point into a balanced kd-tree takes
O(log n) time.
Removing a point from a balanced kd-tree takes
O(log n) time.
Querying an axis-parallel range in a balanced kd-tree
takes O(n^(1−1/k) + m) time, where m is the number
of reported points, and k the dimension of the kd-tree.
KD TREE OF RECTANGLES





Instead of points, a kd-tree can also contain rectangles.
A 2D rectangle is considered a 4D object (x_low, x_high, y_low,
y_high).
Thus range search becomes the problem of returning all
rectangles intersecting the search rectangle.
The tree is constructed the usual way with all the
rectangles at the leaves. In an orthogonal range search,
the opposite coordinate is used when comparing against
the median. For example, if the current level is split along
x_high, we check the x_low coordinate of the search rectangle.
If the median is less than the x_low coordinate of the search
rectangle, then no rectangle in the left branch can ever
intersect the search rectangle, and so that branch can be pruned.
Otherwise both branches should be traversed.
Note that the interval tree is a 1-dimensional special case.
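The pruning rule can be captured in two small predicates (an illustrative sketch; the function names are chosen here):

```python
def intersects(a, b):
    """Axis-aligned rectangles, each (x_low, x_high, y_low, y_high),
    intersect iff they overlap on both axes."""
    return a[0] <= b[1] and b[0] <= a[1] and a[2] <= b[3] and b[2] <= a[3]

def prune_left_on_xhigh(median, query):
    """At a level split on x_high: every rectangle in the left branch
    has x_high <= median, so if the median is below the query
    rectangle's x_low, nothing on the left can intersect it."""
    return median < query[0]
```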
APPLICATIONS
Query processing in sensor networks
 Nearest-neighbor searchers
 Optimization
 Ray tracing
 Database search by multiple keys

EXAMPLES OF APPLICATIONS
[Figure: Population Distribution in Alberta, 1996 census; scale bar 0–100 km.]
PROGRESSIVE MESHES
Developed by Hugues Hoppe, Microsoft Research. First published
at SIGGRAPH 1996.
TERRAIN VISUALIZATION APPLICATIONS
GEOMETRIC SUBDIVISION
Problems with Geometric Subdivisions
ROAM PRINCIPLE
The basic operating principle of ROAM
REVIEW QUESTIONS
Define a kd-tree.
 What is the difference from a B-tree? R-tree? Quadtree?
Grid file? Interval tree?
 Define the complexity of the basic operations.
 What is the difference between mean and median
kd-trees?
 List typical queries – nearest neighbor, k nearest
neighbors.
 Provide examples of kd-tree applications.

SOURCES
 In-line references to current research in
the area and a variety of research papers,
web sources and applications.
DECISION TREE
 A database indexing structure is built for decision
making and tries to make the decision as fast as
possible!
[Decision-tree figure, flattened here; reconstructed:]

Color = Green?
  yes: Size = Big?
    yes: watermelon
    no:  Size = Medium?  (yes: apple, no: grape)
  no:  Color = Yellow?
    yes: Shape = Round?
      yes: Size = Big?  (yes: grapefruit, no: lemon)
      no:  banana
    no:  Size = small?
      yes: Taste = sweet?  (yes: cherry, no: grape)
      no:  apple
DECISION TREE

How to obtain a decision tree for a database?
a. Obtain a set of labeled training data from the database.
b. Calculate the entropy impurity:
   i(n) = − Σ_j p(j) log₂ p(j)
c. Build the classifier by choosing, at each node, the split
that maximizes the impurity drop, max Δi(n).
KD-TREE
 By treating a query as a decision-making
procedure, we can use decision trees to build more
effective database indexing!

[Figure: a kd-tree over the database, flattened here. The root
node splits on Salary > $75000; each child then splits on
Age > 60; the leaves point to data tables.]
KD-TREE
 At each internal node, only one attribute is used!
 It is not balanced! Searches starting from different nodes
may have different I/O cost!
 It can support multiple-attribute database
indexing, like the R-tree!
 It integrates decision making and
database query!
KD-TREE
Tree levels: N; leaf nodes: M; number of data entries per leaf node: K
The internal nodes of the kd-tree at the same level are stored on the same page.
a. Equality query: N + M
b. Range query: N + M
c. Insert: N + M + 1
d. Delete: N + M + 1
STORAGE MANAGEMENT FOR HIGH-DIMENSIONAL INDEXING STRUCTURES
We want to put similar data in the same page or in neighboring pages!
[Figure: clustered vs. unclustered indexes. In both, index entries
direct the search for data entries; in a clustered index the order of
data records in the data file matches the order of entries in the
index file, while in an unclustered index it does not.]
STORAGE MANAGEMENT FOR HIGH-DIMENSIONAL INDEXING STRUCTURES
It is very hard to do multi-dimensional data sorting.
Hilbert curve: maps multi-dimensional data into one dimension.

[Figure: first-order Hilbert curve over four cells labeled 00, 01, 10, 11.]
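The mapping can be sketched with the standard iterative Hilbert-curve encoding (added here for illustration; `xy2d` and `rot` follow the well-known algorithm, and the grid side n must be a power of two):

```python
def rot(n, x, y, rx, ry):
    """Rotate/flip a quadrant so the curve's orientation is preserved."""
    if ry == 0:
        if rx == 1:
            x = n - 1 - x
            y = n - 1 - y
        x, y = y, x
    return x, y

def xy2d(n, x, y):
    """Map cell (x, y) of an n-by-n grid to its position d along
    the Hilbert curve, 0 <= d < n*n."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)   # which quadrant, in curve order
        x, y = rot(n, x, y, rx, ry)
        s //= 2
    return d
```

Sorting records by `xy2d` of their keys places cells that are close in 2D close together on disk, which is exactly the page-locality goal above.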
STORAGE MANAGEMENT FOR HIGH-DIMENSIONAL INDEXING STRUCTURES

[Figure: second-order Hilbert curve visiting the 16 grid cells in order 0–15.]

From multi-dimensional indexing to one-dimensional storage on disk!