Recent spatial work by Jim Gray and Alex Szalay Bob Mann

advertisement
Recent spatial work by Jim
Gray and Alex Szalay
Bob Mann
Recent work on spatial querying
by Jim Gray & Alex Szalay
SkyServer HTM approach works, but precomputing
neighbours table take long time
– Partially solved by new version of HTM library (soon):
>10 times faster than current version
– C# interface to next version of SQL Server
Want a purely relational approach
– vendor-neutral
– no “impedance mismatch” with external calls
So, Gray and Szalay working on more geometrical
approach
Relational method for computing
neighbours on sphere
Problem:
– fGetNearbyObjEq() function can compute neighbours of
~70 objects/sec (GHz PC)
– Scaling to full SDSS: 2 months of computation!
One solution:
– Don’t use fGetNearbyObjEq/HTM index for computing
neighbours
– Subdivide sphere into Dec zones, then cut by RA
[Following slides stolen/adapted from Jim Gray]
Zone Based Spatial Join
Divide space into zones
Key points by Zone (by Dec), offset (RA)
(on the sphere this need wrap-around margin.)
Point search
look in a few zones
at a limited offset: ra ± r
a bounding box that has
1-π/4 false positives
All inside the relational engine
Avoids “impedance mismatch”
r
Can “batch” all-all comparisons
32x faster and parallel
√(r +(ra-zoneMax) )
2
ra-zoneMax
x
2
zoneMax
cos(radians(zoneMax))
Ra ± x
In SQL: points near point
create table zone (zone int, objID bigint, ra float, dec float,
x float, y float, z float, primary key (zone, ra, objID))
insert into zone
select floor((dec+90) /zoneHeight), ra, dec, cx, cy, cz
from PhotoObj
select o1.objID
-- find objects
from zone o1
-- in the zoned table
where o1.zoneID between
-- where zone #
floor((@dec-@r)/@zoneHeight) and -- overlaps the circle
ceiling((@dec+@r)/@zoneHeight)
and o1.ra between @ra - @r and @ra + @r -- quick filter on ra
and o1.dec between @dec-@r and @dec+@r -- quick filter on dec
and ( (sqrt( power(o1.cx-@cx,2)+power(o1.cy-@cy,2)+power(o1.cz-@cz,2))))
< @r
-- careful filter on distance
Some details to add
Correct for “compression” of RA by cos(dec) and
avoid division by zero at poles
– o.ra between (ra-R)/ (cos(dec)+) and (ra+R)/(cos(dec)+)
Include wrap around in RA (RA=0 & 360º)
– Insert margins into zone table (duplicate entries)
Result: 32x speed up over use of fGetNearbyObjEq
– More by parallelisation
Neighbour precomputation looks OK for WSA/VSA!
“Relational approach to testing
point-in-polygon containment”
Think geometrically
– Point within circular radius on sphere
translated to half-space cut in 3D Cartesian
space
Combine spatial cuts as
half-space constraints
Store geom. constraint
info in DB & query
[Graphic from Alex Szalay]
The Idea:
Equations Define Subspaces
y
For (x,y) above the line
ax+by > c
Reverse the space by
-ax + -by > -c
Intersect 3 half-spaces:
a1x + b1y > c1
a2x + b2y > c2
a3x + b3y > c3
x=c/a
x
y=c/b
y
x
Some terminology
A half-space, H, of the N-dimensional
space S can be expressed as
H = {x in S | f(x) > 0}
The intersection of a set of half-spaces
{Hi} is a convex hull of points, C
C = {x in S | x in Hi for all i}
A domain, D, is the union of a set of
convex hulls {Ci}
D = {x in Ci for all i}
The Idea:
y
Equations Define Subspaces
a1x + b1y > c1
a2x + b2y > c2
a3x + b3y > c3
x
2
1
3
select count(*)
from convex
where a*@x + b*@y < c
select count(*)
from convex
where a*@x + b*@y > c
1
2
2
y
1
x
2
1
2
0
1
1
2
Domain is Union of Convex Hulls
Not a
convex hull
+
Simple volumes are
unions of convex hulls.
Higher order curves
also work
Complex volumes have
holes and their holes
have holes.
(that is harder).
Now in Relational Terms
create table HalfSpace (
domainID
int not null -- domain name
foreign key references Domain(domainID),
convexID int not null, -- grouping a set of ½ spaces
halfSpaceID int identity(),
-- a particular ½ space
a
float not null,
-- the (a,b,..) parameters
b
float not null,
-- defining the ½ space
c
float not null,
-- the constraint (“c” above)
primary key (domainID, convexID, halfSpaceID)
(x,y) inside a convex if it is inside all lines of the convex
(x,y) inside a convex if it is NOT OUTSIDE ANY line of the convex
Convexes containing point (@x,@y):
select convexID
-from HalfSpace
-where (@x * a + @y * b) < c
group by all convexID
-having count(*) = 0
-- is
return the convex hulls
from the constraints
-- point outside the line?
insist no line of convex
outside (count outside == 0)
All Domains Containing this Point
The group by
is supported by the domain/convex index,
so it’s a sequential scan (pre-sorted!).
select distinct domainID -- return domains
from
HalfSpace
-- from constraints
where (@x * a + @y * b) < c
-- point outside
group by all domainID, convexID -– never happens
having count(*) = 0
-- count outside == 0
The Algebra is Simple (Boolean)
@domainID = spDomainNew (@type varchar(16), @comment varchar(8000))
@convexID = spDomainNewConvex (@domainID int)
@halfSpaceID = spDomainNewConvexConstraint (@domainID int, @convexID int,
@a float, @b float, @c float)
@returnCode = spDomainDrop(@domainID)
select * from fDomainsContainPoint(@x float, @y float)
Once constructed they can be manipulated with the Boolean operations.
@domainID = spDomainOr (@domainID1 int, @domainID2 int,
@type varchar(16), @comment varchar(8000))
@domainID = spDomainAnd (@domainID1 int, @domainID2 int,
@type varchar(16), @comment varchar(8000))
@domainID = spDomainNot (@domainID1 int,
@type varchar(16), @comment varchar(8000))
Very different approach to spatial querying – “constraint DB”
Reference: Representing Polygon Areas and Testing Point-inPolygon Containment in a Relational Database
http://research.microsoft.com/~Gray/papers/Polygon.doc
Download