Database Server Extension for managing and querying 4D gridded spatiotemporal data Ian Barrodale

Database Server Extension for
managing and querying 4D gridded
spatiotemporal data
Presented at the Edinburgh e-Science Institute Nov 1-2, 2005
conference on “Spatiotemporal Databases”
by
Ian Barrodale
Barrodale Computing Services Ltd. (BCS)
http://www.barrodale.com
Barrodale Computing Services Ltd. (BCS)
“At BCS we let the actual tasks
that our clients are trying to
accomplish guide our solutions,
rather than producing software
that dictates how clients can
perform their work.”
Provides customized software
and R&D services to technical
clients
Successfully completed 450+
software development projects
since incorporation in 1978
Long-term professional staff
IBM Business Partner
Major clients include:
Canada - Province of BC
(Elections BC, Ministry of
Forests), DND,...
USA - US Navy, NOAA, IBM,
SPAWAR, Univ. of Mississippi,...
Barrodale Computing Services Ltd. (BCS)
Some Skill Sets:
• Mathematical Analysis
• Algorithm Development
• Signal & Image Processing
• Modeling & Simulation
• Software Engineering
• Spatial Data Analysis
• Spatial Database Design
• Database Server Extensions
• Large Dataset Management
• Graphical User Interfaces
• Data Visualization
• Web Map Services
Some Application Areas:
• Defense Sciences (ASW, MCM, METOC)
• Elections/Census (Geo-Spatial Database)
• Forestry (Spatial Timber Supply Models)
• Terrain Modeling (Watershed Delineation)
• Seabed Monitoring (Gas Hydrates)
(Simplistic) Database Classification Matrix
            Simple Data        Complex Data
Query       Relational DBMS    Object Relational DBMS
No Query    File System        Object Oriented DBMS
File Server vs. RDBMS + File Server
Files for both metadata and data
vs. RDBMS for metadata & files for data.
File Server alone:
+ Simpler.
+ Less expensive.
± Metadata stored in data file name/directory or inside gridded data file.
RDBMS + File Server:
+ Integrity checking of metadata - can be performed by built-in RDBMS
features (check constraints, triggers, etc.).
+ Efficient access to metadata - e.g., indices can be used.
+ Easier to locate gridded data of interest - e.g., complicated queries on
metadata can be performed.
− Metadata separated from gridded data - data inconsistencies
possible.
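The "RDBMS + File Server" arrangement can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table name, column names, and file paths are hypothetical, and SQLite stands in for whatever RDBMS is actually used. Metadata lives in a table guarded by check constraints and an index, while the gridded data stays in files referenced by path.

```python
import sqlite3

# Hypothetical metadata catalog: the RDBMS holds metadata, the grids stay in files.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE grid_files (
        path       TEXT PRIMARY KEY,   -- location of the gridded data file
        variable   TEXT NOT NULL,
        valid_time TEXT NOT NULL,      -- ISO 8601 timestamp
        min_lat    REAL NOT NULL CHECK (min_lat BETWEEN -90 AND 90),
        max_lat    REAL NOT NULL CHECK (max_lat BETWEEN -90 AND 90),
        CHECK (min_lat <= max_lat)     -- integrity rule enforced by the RDBMS
    )
""")
# An index gives efficient access to the metadata.
conn.execute("CREATE INDEX idx_var_time ON grid_files (variable, valid_time)")

conn.execute(
    "INSERT INTO grid_files VALUES (?, ?, ?, ?, ?)",
    ("/data/temp_20051101.nc", "air_temperature", "2005-11-01T00:00:00", 20.0, 40.0),
)

def try_insert_bad_row():
    """Return True if the RDBMS rejects metadata that violates a check constraint."""
    try:
        conn.execute(
            "INSERT INTO grid_files VALUES (?, ?, ?, ?, ?)",
            ("/data/bad.nc", "air_temperature", "2005-11-02T00:00:00", 95.0, 99.0),
        )
    except sqlite3.IntegrityError:
        return True
    return False

rejected = try_insert_bad_row()

# Locate gridded data of interest with a query on the metadata alone.
paths = [row[0] for row in conn.execute(
    "SELECT path FROM grid_files WHERE variable = ? AND valid_time >= ? "
    "ORDER BY valid_time",
    ("air_temperature", "2005-11-01"),
)]
```

Nothing in this sketch stops a file from being moved or overwritten behind the database's back, which is the inconsistency risk noted in the last point above.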
Object Relational DBMS
RDBMS for metadata & File Server for data
vs. ORDBMS (metadata & data integrated).
ORDBMS:
+ Improved concurrency - concurrent users can safely query the same
gridded data.
+ Composite data types - gridded data bundled with their metadata.
+ Improved integrity - ability to reject bad gridded data before it is
stored in ORDBMS.
+ Database extensibility - easy addition of data types and operations.
+ Uniform treatment of data items - SQL interface can perform complex
queries based on any of these data items, e.g., metadata as well as
gridded data; less need for custom 3GL programming.
+ Custom data access methods - e.g., R-tree indexes.
+ Point-in-time recovery of gridded data possible.
+ Built-in complex SQL functions for gridded data operations - e.g.,
aggregating, slicing, subsetting, reprojecting, subsampling, ...
BCS specializes in ORDBMS Applications
The current main platforms for BCS database applications are IBM
Informix Dynamic Server and PostgreSQL. Object Relational Database
Management Systems (ORDBMSs) have four features that set
them apart from traditional DBMSs:
• User-defined abstract data types (ADTs). ADTs allow new data types with structures suited to particular applications to be defined.
• User-defined routines (UDRs). UDRs provide the means for writing customized server functions that have much of the power and functionality expressible in C.
• “SmartBLOBs”. These are disk-based objects that have the functionality of random access files. ADTs use them to store any data that does not fit into a table row.
• Flexible spatial indexing. R-tree indexing for multi-dimensional data enables fast searching of particular ADTs in a table.
Example: Sum the “area” (UDR) of all “lakes” (ADT) contained
(R-tree) in “British Columbia” (ADT)
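The idea of calling user-defined routines from SQL can be illustrated, loosely, with Python's sqlite3 module, which lets a Python function be registered as an SQL function. This is an analogy only: SQLite is not an ORDBMS, the text encoding of shapes below is a stand-in for a real ADT, no R-tree index is involved, and all table and function names are hypothetical.

```python
import sqlite3

def parse_box(text):
    """Decode the hypothetical text encoding 'xmin,ymin,xmax,ymax' into four floats."""
    xmin, ymin, xmax, ymax = (float(v) for v in text.split(","))
    return xmin, ymin, xmax, ymax

def area(shape):
    """Stand-in for an Area UDR: area of a rectangular shape."""
    xmin, ymin, xmax, ymax = parse_box(shape)
    return (xmax - xmin) * (ymax - ymin)

def contained(inner, outer):
    """Stand-in for a containment UDR: 1 if inner lies entirely within outer, else 0."""
    ixmin, iymin, ixmax, iymax = parse_box(inner)
    oxmin, oymin, oxmax, oymax = parse_box(outer)
    return int(oxmin <= ixmin and oymin <= iymin and ixmax <= oxmax and iymax <= oymax)

conn = sqlite3.connect(":memory:")
conn.create_function("Area", 1, area)            # callable from SQL as Area(shape)
conn.create_function("Contained", 2, contained)  # callable from SQL as Contained(a, b)

conn.execute("CREATE TABLE lakes (name TEXT, shape TEXT)")
conn.execute("CREATE TABLE regions (name TEXT, shape TEXT)")
conn.executemany("INSERT INTO lakes VALUES (?, ?)", [
    ("lake_a", "1,1,3,2"),      # area 2, inside the region
    ("lake_b", "4,4,5,7"),      # area 3, inside the region
    ("lake_c", "9,9,12,12"),    # extends outside the region
])
conn.execute("INSERT INTO regions VALUES ('region_x', '0,0,10,10')")

# Sum the area of all lakes contained in the region.
total = conn.execute("""
    SELECT SUM(Area(l.shape))
    FROM lakes l, regions r
    WHERE r.name = 'region_x' AND Contained(l.shape, r.shape)
""").fetchone()[0]
```

In a real ORDBMS the shapes would be values of a user-defined type and the containment test could be served by a spatial index rather than evaluated row by row.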
Query: From a given point on a stream, what is the
entire area from which drainage is received?
“SQL” Example 1: Find the area of the watershed that
is upstream from where a given road crosses a given
stream.
SELECT Area(Watershed(streamElement,
(Intersection(streamElement, roadElement))))
FROM streamNetwork, roadNetwork
WHERE Overlap(Intersection(Box(streamElement),
Box(roadElement)), userDefinedArea);
Note:
userDefinedArea is, say, a string provided by the user.
UDRs:
BOX - rectangle enclosing object
INTERSECTION - common area
OVERLAP - T or F
WATERSHED - calculates watershed upstream from a point
AREA - calculates area
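A minimal Python sketch of what routines like BOX, INTERSECTION, and OVERLAP might compute, assuming axis-aligned rectangles and objects given as lists of (x, y) points. The real UDRs operate on database types inside the server; the names and sample coordinates below are purely illustrative.

```python
from typing import NamedTuple, Optional, Sequence, Tuple

class Box(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float

def box(points: Sequence[Tuple[float, float]]) -> Box:
    """Smallest axis-aligned rectangle enclosing a list of (x, y) points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return Box(min(xs), min(ys), max(xs), max(ys))

def intersection(a: Box, b: Box) -> Optional[Box]:
    """Common area of two rectangles, or None when they do not meet."""
    xmin, ymin = max(a.xmin, b.xmin), max(a.ymin, b.ymin)
    xmax, ymax = min(a.xmax, b.xmax), min(a.ymax, b.ymax)
    if xmin > xmax or ymin > ymax:
        return None
    return Box(xmin, ymin, xmax, ymax)

def overlap(a: Optional[Box], b: Box) -> bool:
    """True when the rectangles share at least a point (touching edges count)."""
    return a is not None and intersection(a, b) is not None

# Hypothetical stream and road geometries and a user-defined area.
stream = [(0.0, 0.0), (2.0, 3.0), (5.0, 4.0)]
road = [(4.0, 0.0), (4.0, 6.0)]
user_defined_area = Box(3.0, 2.0, 6.0, 5.0)

# Mirrors the WHERE clause: Overlap(Intersection(Box(stream), Box(road)), area)
crossing_zone = intersection(box(stream), box(road))
selected = overlap(crossing_zone, user_defined_area)
```

Bounding boxes make this a cheap first-pass filter; the exact geometric work (here, the watershed calculation) would then run only on the rows that pass it.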
“SQL” Example 2: Find all side-scan sonar images,
that are in a user-defined area, with a heading within
one degree of 128.3 degrees and with an average
slant range of less than 50 m.
SELECT image
FROM sonarImageArchive
WHERE Overlap(Box(image), userDefinedArea)
AND ABS(Heading(image) - 128.3) < 1.0
AND Average(SlantRange(image)) < 50.0;
Note:
userDefinedArea is, say, a string provided by the user.
UDRs:
SLANTRANGE - calculates slant range
AVERAGE - calculates average
HEADING - supplies heading of object
ABS - absolute value
BOX - rectangle enclosing object
OVERLAP - T or F
“SQL” Example 3: In a user-defined area, overlay on a
sea floor map all “West”-looking side scan sonar
images of “sandy” sea floor bottom type.
SELECT Overlay(image, map)
FROM sonarImageArchive, seaFloorMapping
WHERE Overlap(Box(image), userDefinedArea)
AND Overlap(Box(map), Box(image))
AND SlantDirection(image) = 'West'
AND surfaceType = 'sandy';
Note:
userDefinedArea is, say, a string provided by the user,
and surfaceType is a column in seaFloorMapping.
UDRs:
SLANTDIRECTION - calculates slant direction
BOX - rectangle enclosing object
OVERLAP - T or F
OVERLAY - overlays one image on another
Gridded Data in Databases
• Gridded data occurs in meteorology, oceanography,
the life sciences, non-destructive testing, exploration
for oil, natural gas, coal & diamonds,…
• These datasets range from simple, uniformly spaced
grid points along a single dimension (e.g., time series)
to multidimensional grids containing several types of
values (e.g., 4D cubes of meteorological attributes).
• Grids have typically been stored in simple files and
then manipulated by programs that operated on these
files. Nowadays there is increasing justification for
storing and manipulating gridded data in DBMSs: the
principal advantages are their ability to (i) ensure data
integrity and consistency, and (ii) provide diverse users
with independent and effective query-based access to
these data across multiple applications and systems.
Gridded Data in Databases
• However, implementing an efficient gridded DBMS can
be very challenging, particularly when it involves
Binary Large Objects (BLOBs), user-defined abstract
datatypes (ADTs) that encapsulate grid data structures
and attributes, and user-defined routines (UDRs) with
which applications can create, manipulate and access
the gridded data stored in these new datatypes.
• BCS has developed an efficient technology that
supports database storage, update, and fast retrieval
of gridded data; it uses BLOBs, ADTs, and UDRs.
• Our first implementation of this technology was a Grid
DataBlade for IBM Informix, and then a Grid Extension
for PostgreSQL; we are currently developing an
analogous Grid Cartridge for use with Oracle.
The BCS Grid DataBlade/Extension
• is designed to handle 1D, 2D, 3D, 4D (and “5D”) grids.
• stores grids using SmartBLOBs and a (user-controlled) tiling scheme that together permit very efficient generation of products (e.g., oblique slices or 1D sticks from 4D grids).
• sometimes provides more than 50-fold increases in speed of data product generation compared to the conventional approach that does not involve tiling or SmartBLOBs.
• can store the data in, and convert it between, hundreds of mapping projections.
• can handle irregularly spaced grids in any/all grid dimensions.
• can handle the presence of multiple vector and/or scalar values.
• provides several interpolation options.
• provides for convenient database loading and extraction of grid files via one form of the commonly used NetCDF format.
• provides C, Java, and SQL application programming interfaces.
• is supplied with full user/programmer documentation.
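Why a tiling scheme can help with products such as 1D sticks can be seen from a simple count of how many tiles a request touches. The sketch below is a back-of-the-envelope model, not the DataBlade's storage layout: it assumes whole tiles are always read, and the grid dimensions and tile shapes are made up for illustration.

```python
def tiles_touched(request, tile_shape):
    """Count the tiles intersected by a request.

    request    -- one (start, stop) index range per dimension, stop exclusive
    tile_shape -- tile edge length per dimension, in grid cells
    """
    count = 1
    for (start, stop), edge in zip(request, tile_shape):
        first_tile = start // edge
        last_tile = (stop - 1) // edge
        count *= last_tile - first_tile + 1
    return count

def cells_read(request, tile_shape):
    """Cells fetched from storage if whole tiles are always read."""
    cells_per_tile = 1
    for edge in tile_shape:
        cells_per_tile *= edge
    return tiles_touched(request, tile_shape) * cells_per_tile

# A 1D "stick" along time at one fixed level, row, and column of a
# hypothetical 64 x 32 x 512 x 512 grid (time, level, row, column).
stick = [(0, 64), (5, 6), (100, 101), (200, 201)]

slab_tiles = (1, 1, 512, 512)   # one full horizontal field per tile
cube_tiles = (16, 8, 16, 16)    # compact blocks spanning all four dimensions

slab_cost = cells_read(stick, slab_tiles)
cube_cost = cells_read(stick, cube_tiles)
```

Under these assumed shapes the stick needs only 64 cells, yet the slab layout touches 64 tiles while the compact layout touches 4. The numbers say nothing about measured DataBlade performance; they only show why the tile shape matters for a given access pattern.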
U.S. Navy Solution
[Diagram: a user query (SQL) asks the ORDBMS, which holds a worldwide weather grid, for a grid sample of interest. The BCS Grid DataBlade supplies the grid types and grid functions; an API was used to develop the grid types, functions & indexes.]
U.S. Navy: Tactical Environmental Data Services
[Figure: environmental factors shown: Air Temperature; Dust; Aerosols; Humidity; Refractive Effects; Terrain; Land Cover; Fog; Wind Speed / Direction; Rain Rate; Shelf / Internal Waves; Surf; Wind-Driven Circulation; Sensible and Latent Heat; Island Flow; Wrecks; Trafficability; Soil Moisture; Beach Profile; Ice; Waves; Reefs, Bars, Channels; Tidal Pulse; Sediment Transport; Coastal Configuration; Turbidity; Slope (Sea Floor); Swell / Wave Refraction; Straits; Biologics; Hydrography - Fine Scales.]
Medical Application Demo
http://www.barrodale.com/grid_Demo/GridBladeApplet.html
Grid Fusing: Visualized through IDV
Grid Fusing: SQL for this example
SELECT GRDFuse(
GRDFuseCollect(GRDPriorityGrid(image,1.0)),
'((grdspec
(translation -90.4 29.57 0 0)
(affine_transformation 0 0 0 .001 0 0 .001 0
0 1 0 0 1 0 0 0)
(dim_sizes 1 1 800 800))
(rules(weight)) )')
FROM images i, places_of_interest p
WHERE i.imageType = 'aerialPhoto' AND
overlap(grdbox(i.image),grdbox(p.loc)) AND
p.name = 'New Orleans';
SQL driving the Grid Fusion
The parts of the query above do the following:
• GRDFuse: UDR to resample a set of grids into a single grid.
• GRDFuseCollect and GRDPriorityGrid: two UDRs to build a set of transient grids, associating a floating-point value with each of these grids. This floating-point value is later used to establish the relative weight of each grid’s elements in producing the fused grid. We’ve chosen each grid to have equal weight (1.0).
• grdspec: each source grid is resampled at the same locations, using the source images’ spatial reference system, which is a Lat-Lon grid. The fused grid’s horizontal resolution is 0.001 degrees.
• FROM images: the source of the grids is a table called “images”.
• i.imageType = 'aerialPhoto': we use metadata stored in another column to pick only those images derived from aerial photographs.
• places_of_interest: a second table is used to include only source grids that overlap a region called “New Orleans”.
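The role of the per-grid weight can be sketched in a few lines of Python. This is an assumption about what a weight rule might do, not the documented behavior of GRDFuse: it takes grids already resampled to the same locations, treats None as "no data", and forms a weighted average of whatever values are available at each cell.

```python
def fuse(grids, weights):
    """Weighted average of co-located 2D grids; None marks cells without data."""
    rows, cols = len(grids[0]), len(grids[0][0])
    fused = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0.0
            weight_sum = 0.0
            for grid, w in zip(grids, weights):
                value = grid[r][c]
                if value is not None:
                    total += w * value
                    weight_sum += w
            if weight_sum > 0:
                fused[r][c] = total / weight_sum
    return fused

# Two small hypothetical source grids, already resampled to a common 2 x 2 grid.
a = [[10.0, 20.0], [None, 40.0]]
b = [[30.0, None], [None, 80.0]]

# Equal weights, as in the query (every grid is given 1.0).
result = fuse([a, b], [1.0, 1.0])
```

With equal weights each covered cell is the plain mean of the grids that have data there; a cell covered by no grid stays empty.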
Barrodale Grid DataBlade for IBM Informix
Resample and reproject a 386 MB raster image of the World
www.barrodale.com/projectionDemo/ProjectionApplet.html
select GRDExtract(grid, "((dim_sizes 1 1 600 600)(dim_names time
level row column)(translation -1489986.000000 6574741.000000 0
0)(affine_transformation 0 0 0 3338.898164 0 0 3338.898164 0 0 1 0 0
1 0 0 0)(srtext
'PROJCS[@[email protected],GEOGCS[@[email protected],
DATUM[@[email protected],SPHEROID[@[email protected],6378137.0,29
8.257223563]],PRIMEM[@[email protected],0.0],UNIT[@[email protected],0.01745
32925199433]],PROJECTION[@[email protected]],PARAMETER[@Fal
[email protected],0.0],PARAMETER[@[email protected],0.0],PARAMETER
[@[email protected],-65.0],UNIT[@[email protected],1.0]]'))"::GRDSpec)
from grdImages where g_keytext = "world2k";
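The translation and affine_transformation entries of the grid specification can be read as a mapping from grid indices to coordinates. The Python sketch below rests on assumptions that the slides do not state outright: that the 16 numbers form a row-major 4x4 matrix, that it is applied to the index vector (time, level, row, column) in dim_names order, and that the result, offset by the translation, is (x, y, z, t). Under that reading the two non-unit entries act as the horizontal cell size.

```python
def index_to_coords(index, translation, affine):
    """Map a grid index to coordinates: coords = translation + affine * index.

    index       -- (time, level, row, column), following the dim_names order
    translation -- 4 numbers, assumed to be offsets for (x, y, z, t)
    affine      -- 16 numbers, assumed to be a row-major 4x4 matrix
    """
    coords = []
    for i in range(4):
        matrix_row = affine[4 * i: 4 * i + 4]
        coords.append(translation[i] + sum(m * k for m, k in zip(matrix_row, index)))
    return tuple(coords)

# Values copied from the GRDExtract call above.
translation = (-1489986.0, 6574741.0, 0.0, 0.0)
affine = (0, 0, 0, 3338.898164,
          0, 0, 3338.898164, 0,
          0, 1, 0, 0,
          1, 0, 0, 0)

origin = index_to_coords((0, 0, 0, 0), translation, affine)
far_corner = index_to_coords((0, 0, 599, 599), translation, affine)
```

With dim_sizes of 600 x 600, the last row and column sit 599 cells from the origin, which under these assumptions is almost exactly 2,000,000 units away in both x and y.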
Barrodale Grid DataBlade for IBM Informix
Sample a 4D grid along a flight path
[Figure: Head Wind, Humidity, and Cross Wind sampled along the flight path.]
Ocean Applications?
BCS Grid DataBlade/Extension: Grids of Data
[Figure: a 3D grid on EASTING, NORTHING, and DEPTH axes. Grid points are labeled with their coordinates (from 10m,10m,-100m to 40m,30m,-50m), each point holds a tuple of values such as (t1,s1,p1), and the spacing between neighboring points varies (5m, 10m, 20m, 40m).]
• Grids can have 1, 2, 3, or 4 dimensions.
• Each grid point can store several variables.
• Some grid point values can be NULL.
• Grid spacing along axes can be non-uniform.
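The four properties above suggest a simple mental model, sketched here in Python. It is an in-memory stand-in for illustration only, not the DataBlade's storage format: each axis carries its own list of coordinates (so spacing may be non-uniform), each grid point maps variable names to values, and None plays the role of NULL. The axis coordinates are taken from the figure labels; the stored values are invented.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class Grid:
    """Illustrative grid: named axes plus several variables per grid point."""
    axes: Dict[str, List[float]]   # axis name -> increasing coordinates
    values: Dict[Tuple[int, ...], Dict[str, Optional[float]]] = field(default_factory=dict)

    def put(self, index, **variables):
        """Store one or more variables at a grid index."""
        self.values[index] = dict(variables)

    def get(self, index, name):
        """Value of one variable at a grid index; None means NULL or never stored."""
        return self.values.get(index, {}).get(name)

    def bracket(self, axis, coord):
        """Indices of the two neighboring grid coordinates that enclose coord."""
        ticks = self.axes[axis]
        if not ticks[0] <= coord <= ticks[-1]:
            raise ValueError("coordinate outside grid")
        upper = min(bisect_right(ticks, coord), len(ticks) - 1)
        return upper - 1, upper

grid = Grid(axes={
    "easting": [10.0, 20.0, 40.0, 45.0],   # spacing 10, 20, 5
    "northing": [10.0, 20.0, 30.0],
    "depth": [-100.0, -90.0, -50.0],       # spacing 10, 40
})
grid.put((0, 0, 0), t=4.0, s=35.0, p=None)   # p is NULL at this point
```

Because spacing is irregular, finding the cell that contains a coordinate is a search over the axis values (a binary search here) rather than a division by a fixed step.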
BCS Grid DataBlade/Extension: Types of Extraction
• Orthogonal
• Oblique
• Radial
[Figures: three panels on EASTING, NORTHING, and DEPTH axes, one per extraction type. The oblique and radial panels distinguish original grid positions from interpolated grid positions.]
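An oblique extraction generally asks for values at positions that fall between stored grid points, so some interpolation is needed. The Python sketch below shows one common choice, bilinear interpolation, in 2D with unit grid spacing; it is an illustration of the idea and not the interpolation the Grid DataBlade actually performs. Sample positions are assumed to lie inside the grid.

```python
def bilinear(grid, x, y):
    """Bilinear interpolation with unit spacing; grid[j][i] is the value at (x=i, y=j)."""
    i = min(int(x), len(grid[0]) - 2)   # clamp so that i + 1 stays inside the grid
    j = min(int(y), len(grid) - 2)
    fx, fy = x - i, y - j
    bottom = grid[j][i] * (1 - fx) + grid[j][i + 1] * fx
    top = grid[j + 1][i] * (1 - fx) + grid[j + 1][i + 1] * fx
    return bottom * (1 - fy) + top * fy

def sample_line(grid, start, end, n):
    """Interpolate n evenly spaced points on the segment from start to end (n >= 2)."""
    (x0, y0), (x1, y1) = start, end
    samples = []
    for k in range(n):
        t = k / (n - 1)
        samples.append(bilinear(grid, x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return samples

# Test field value = x + 10*y, so the expected value at any position is easy to check.
grid = [[x + 10 * y for x in range(4)] for y in range(3)]

# An oblique cut from one corner of the grid to the opposite corner.
profile = sample_line(grid, (0.0, 0.0), (3.0, 2.0), 5)
```

The same pattern extends to 3D or 4D grids (trilinear interpolation and beyond) and to non-uniform axes, where the fractional offsets are computed from the bracketing coordinates instead of from integer indices.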
BCS Grid DataBlade/Extension: Orthogonal Extraction
[Figure: an extraction aligned with the EASTING, NORTHING, and DEPTH axes, taken at original grid positions.]
BCS Grid DataBlade/Extension: Oblique Extraction
[Figure: an extraction cutting obliquely across the EASTING, NORTHING, and DEPTH axes; the legend distinguishes original grid positions from interpolated grid positions.]
BCS Grid DataBlade/Extension: Radial Extraction
[Figure: a radial extraction on EASTING, NORTHING, and DEPTH axes; the legend distinguishes original grid positions from interpolated grid positions.]
BCS Grid DataBlade/Extension: Types of Updates
• Individual Points
• Appending
• Replacing Slices
[Figures: (1) a grid on Latitude, Longitude, and Elevation axes in which one value, t221, is updated to t'221; (2) a grid on Longitude, Elevation, and Time axes extended along Time (positions 0 to 4); (3) an original grid and a new grid piece combined into an updated grid, with original and new values distinguished.]
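The three kinds of update can be mimicked on a small nested-list grid. This Python sketch is only a model of the operations' effects, with hypothetical function names; it says nothing about how the Grid DataBlade updates tiles on disk. Each function returns a new grid and leaves its input untouched.

```python
import copy

def update_point(grid, index, value):
    """Return a copy of a 3D grid with the element at index = (a, b, c) replaced."""
    a, b, c = index
    out = copy.deepcopy(grid)
    out[a][b][c] = value
    return out

def append_along_first_axis(grid, extra):
    """Return grid followed by the slices of extra along the first axis (e.g., time)."""
    return copy.deepcopy(grid) + copy.deepcopy(extra)

def replace_slice(grid, position, new_slice):
    """Return a copy with the whole slice at one first-axis position replaced."""
    out = copy.deepcopy(grid)
    out[position] = copy.deepcopy(new_slice)
    return out

# A 2 x 2 x 2 grid; the first axis could be time.
grid = [[[0, 1], [2, 3]],
        [[4, 5], [6, 7]]]

patched = update_point(grid, (1, 0, 1), 50)
longer = append_along_first_axis(grid, [[[8, 9], [10, 11]]])
swapped = replace_slice(grid, 0, [[9, 9], [9, 9]])
```

In a tiled store, a point update would touch one tile, an append would add tiles at the end of one dimension, and a slice replacement would rewrite every tile the slice passes through.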
BCS Grid DataBlade/Extension: Updating Points
[Figure: a grid on Latitude, Longitude, and Elevation axes, shown before and after a single value, t221, is updated to t'221.]
BCS Grid DataBlade/Extension: Appending a Grid
[Figure: a grid on Longitude, Elevation, and Time axes, extended along the Time axis (positions 0 to 4).]
BCS Grid DataBlade/Extension: Replacing a Slice
[Figure: on Latitude, Longitude, and Elevation axes, an original grid and a new grid piece are combined into an updated grid; the legend distinguishes original values from new values.]
BCS Grid DataBlade/Extension: Types of Aggregation
• JoinNew
• JoinExisting
• Union
[Figures: (1) JoinNew: separate X-Y grids for Time = 0 to Time = 4 are stacked along a new Time dimension; (2) JoinExisting: grids that already have X, Y, and Time dimensions are joined along an existing dimension (X in the figure) into one wider grid; (3) Union: a grid of s values and a grid of t values over the same depth and time points are combined so that each point holds both values (s, t).]
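The three aggregation types can be modeled on nested lists in a few lines of Python. The function names and the exact semantics are assumptions made for illustration (for example, how the real JoinExisting treats grids that share a boundary coordinate is not stated in the slides); the sketch only captures the shape of each result.

```python
def join_new(grids):
    """Stack same-shaped grids along a new leading dimension (e.g., 2D grids -> 3D with time)."""
    return [grid for grid in grids]

def join_existing(grids):
    """Concatenate grids along their existing leading dimension."""
    joined = []
    for grid in grids:
        joined.extend(grid)
    return joined

def union(a, b):
    """Combine two same-shaped 2D grids so each point holds a tuple of both variables."""
    return [[(va, vb) for va, vb in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]

# Two 2 x 2 grids, e.g., the same field at two times.
g0 = [[1, 2], [3, 4]]
g1 = [[5, 6], [7, 8]]

stacked = join_new([g0, g1])          # new leading axis of length 2
widened = join_existing([g0, g1])     # leading axis grows from 2 to 4
paired = union(g0, g1)                # each cell holds (value from g0, value from g1)
```

JoinNew raises the number of dimensions by one, JoinExisting keeps the number of dimensions and lengthens one of them, and Union keeps the shape but increases the number of variables per grid point.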
BCS Grid DataBlade/Extension: JoinNew
[Figure: separate X-Y grids for Time = 0, 1, 2, 3, and 4 are stacked along a new Time dimension to form a single grid.]
BCS Grid DataBlade/Extension: JoinExisting
[Figure: grids that already have X, Y, and Time dimensions are joined along an existing dimension (X in the figure) to form one grid spanning X = 0 to 6.]
BCS Grid DataBlade/Extension: Union
[Figure: a grid of values s00 to s23 and a grid of values t00 to t23, both defined over the same depth and time points, are combined into one grid whose points each hold a pair such as (s00, t00).]
How much memory does the server need when
extracting a large gridded derived product using the
BCS Grid DataBlade (Informix) or the BCS Grid
Extension (PostgreSQL)?
[Chart: Grid Extraction Size Limit as a Function of Available Physical Memory. Vertical axis: Recommended Maximum Extraction Size (MB), 0 to 350. Horizontal axis: Physical Memory On Server (MB), at 256, 512, 1024, and 1536. Series: Informix and PostgreSQL.]
BCS Grid DataBlade: The Effect of Tile Size
• A good choice of tile size allows larger grids to
be extracted.
[Chart: Extraction Time vs. Extraction Size, Default and Optimized Tile Size. Vertical axis: Extraction time (seconds), 0 to 200. Horizontal axis: Size of Extract (MB), 10 to 560. Series: Default Tile Size and Optimized Tile Size.]
SUMMARY
The BCS Grid DataBlade/Extension
is designed for applications where:
1. The data volumes are such that they can’t be
kept in memory.
2. The amount of data extracted, in a particular
query, is small relative to the amount stored.
3. The data needs some form of resampling.
CONTACT INFORMATION:
Dr. Ian Barrodale, President
Barrodale Computing Services Ltd. (BCS)
P.O. Box 3075 STN CSC
Victoria BC V8W 3W2 Canada
(250) 472 4332 voice, (250) 472 4373 fax
e-mail: [email protected]
For more information about BCS projects, experience, and
capabilities, please visit:
http://www.barrodale.com