DAP for ESIP dwf-Jul-2012

OPeNDAP-Unidata Development of DAP4
(a Data Access Protocol)
Describing Progress and Seeking Input
at the ESIP Summer Meeting 2012
by Dave Fulker (OPeNDAP President)
Overarching Concept of OPeNDAP’s Data Access Protocol (DAP):
Clients Get Only Needed Data, When They Need Them
Accessing data through web services (i.e., URL ≈ dataset)
Appending query strings to invoke server functions, esp. subsetting
Getting responses of 2 major types:
Metadata - dataset descriptions & catalogs (textual)
Content - values and metadata (binary or textual)
Using responses in diverse client contexts, e.g.,
MATLAB maps DAP responses directly to its internal math types
DAP libraries (netCDF, e.g.) simplify the programming
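A minimal Python sketch of the URL-plus-query-string pattern above. The dataset URL is hypothetical, and the response suffixes (`.dds` and `.das` for textual metadata, `.dods` for binary content) follow DAP2 conventions, assumed here for illustration; DAP4 may name its responses and query parameters differently.

```python
from urllib.parse import quote

# Hypothetical dataset: a URL with no query string names a dataset.
DATASET = "http://example.org/opendap/sst.nc"

# DAP2-style response suffixes, assumed for illustration.
METADATA_SUFFIXES = (".dds", ".das")  # textual dataset descriptions
CONTENT_SUFFIX = ".dods"              # binary values plus metadata


def dap_url(dataset: str, suffix: str, constraint: str = "") -> str:
    """Return dataset + response suffix, plus '?constraint' when one is given."""
    url = dataset + suffix
    if constraint:
        # Leave DAP punctuation readable; percent-encode anything else (e.g. spaces).
        url += "?" + quote(constraint, safe="[]:,.&=<>")
    return url


if __name__ == "__main__":
    for s in METADATA_SUFFIXES:
        print(dap_url(DATASET, s))
    # Ask for only a subset of one variable (a projection on index ranges).
    print(dap_url(DATASET, CONTENT_SUFFIX, "sst[0:1:9][0:2:20]"))
```

Running it prints two metadata URLs and one content URL whose query string requests every 2nd column of the first 10 rows of `sst` (a hypothetical variable), so the client receives only that subset.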
Some of DAP Users’ Distinguishing Needs
Data often depict (scientific) phenomena, where
Geospatial maps are among the useful views
But other views are important as well
Coordinates often are 2-, 3-, 4- & even 5-dimensional
These may include (time-dependent) coordinate proxies
Users often wish to use data whose source files
Are in a variety of inconvenient formats
With insufficient or obsolete metadata
Present State of DAP (after nearly 2 decades!)
The DAP2 specification has multiple contemporary realizations on servers and clients
Clients include: MATLAB, GrADS, IDL, IDV...
Python apps that employ the PyDAP library
Fortran, C, C++ & Java apps that employ the netCDF library
Servers include: PyDAP, ERDDAP... (often with augmented services)
Most widely deployed: TDS (Unidata) & Hyrax (OPeNDAP)
Widely used by data providers and users, including cases where DAP servers provide translations of inconveniently formatted source files
Branching: Hyrax & THREDDS
Multiple implementations of a protocol are often considered a good thing (per IETF, e.g.)
This can be a problem, however, if the implementations embody excessive redundancy or confuse users
Our view: the co-existence of TDS (Unidata) & Hyrax (OPeNDAP) reflects some redundancy & creates some inconsistencies for users
Need #1: achieve conformance ⇒ consistency for users
Need #2: more software reuse ⇒ more advancement
NOAA/BAA grant for OPeNDAP-Unidata Linked Servers (OPULS)
Goal 1: OPeNDAP/Unidata conformance & linkage
New data-model/protocol specs (DAP4), with conformance tests & extensibility demos:
Modes of asynchronous access (to near-line data, e.g.)
Server-side subsetting of data on irregular meshes
Goal 2: common software for OPeNDAP & Unidata servers
Work yet to begin...
OPeNDAP Data-Type Philosophy (reflected in DAP2 & now DAP4)
The data model has few data types
For simplified programming & lowered risk of errors
Data types are deliberately domain-neutral
For better trans-domain utility & programmer uptake
But they allow both syntactic & semantic structures/metadata
These types do in fact support domain needs
NetCDF-like (can represent functions on 4-D domains, e.g.)
Sequences & selections match DBMS sensibilities
DAP4 Data Model (simplified)
dataset ≈ unique URL (with no query string)
a dataset holds a hierarchy of groups, each a namespace/container for variables, dimensions & attributes
each variable comprises
a name (unique in the group)
a type (which applies to all values)
value(s) (organized as dimensioned arrays)
attributes* (optional)
*Attributes are like variables but with a semantic purpose,
making a variable or a group more meaningful. E.g., variables
often have an attribute (of type string) named “units.”
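The model above can be sketched with Python dataclasses. The class names `Dataset`, `Group` and `Variable` are hypothetical, not a DAP4 API; values are stored flat in row-major order, and, as a simplification, a variable's dimensions must be declared in its own group, whereas a real group hierarchy may let variables use dimensions declared elsewhere.

```python
from dataclasses import dataclass, field
from math import prod
from typing import Any


@dataclass
class Variable:
    name: str                      # unique within its group
    dtype: str                     # one type for all values, e.g. "float32"
    dims: tuple[str, ...]          # dimension names; () means a scalar
    values: list[Any]              # values flattened in row-major order
    attributes: dict[str, Any] = field(default_factory=dict)  # optional


@dataclass
class Group:
    name: str
    dimensions: dict[str, int] = field(default_factory=dict)
    variables: dict[str, Variable] = field(default_factory=dict)
    attributes: dict[str, Any] = field(default_factory=dict)
    groups: dict[str, "Group"] = field(default_factory=dict)  # child groups

    def add_variable(self, var: Variable) -> None:
        """Add a variable, enforcing a unique name and a consistent shape."""
        if var.name in self.variables:
            raise ValueError(f"{var.name!r} already exists in group {self.name!r}")
        # Simplification: dimensions come from this group only
        # (a KeyError signals an undeclared dimension).
        shape = [self.dimensions[d] for d in var.dims]
        if prod(shape) != len(var.values):
            raise ValueError(f"{var.name!r}: shape {shape} needs {prod(shape)} values")
        self.variables[var.name] = var


@dataclass
class Dataset:
    url: str      # ≈ a unique URL with no query string
    root: Group   # top of the group hierarchy
```

For example, `Group("/", dimensions={"time": 2, "lat": 3})` accepts a six-value `temp` variable carrying `attributes={"units": "K"}`, but rejects a second variable named `temp` in the same group.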
DAP4 Data Types & Relations
as in C or Java, e.g., a variable’s type may be structured or atomic: integer, float, byte, string...
DAP variables may be (semantically) related to one another via two key grouping constructs:
relations link 1-D variables as columns in a table;
sampled functions link coordinate-map variables (domain) to function-value variables (ranges) having common indexes;
in turn, relations can be linked via variables that serve as foreign keys
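To make the two grouping constructs concrete, here is a sketch using plain Python lists and dicts rather than any DAP4 encoding; the relations `stations` and `observations`, the grid `sst`, and the helper functions are hypothetical.

```python
from typing import Any, Iterator

Relation = dict[str, list[Any]]

# A relation: 1-D variables of equal length acting as the columns of a table.
stations: Relation = {
    "station_id": [1, 2, 3],
    "name": ["Alpha", "Bravo", "Charlie"],
}
observations: Relation = {
    "station_id": [2, 1, 2],          # foreign key into stations["station_id"]
    "temp_c": [14.5, 11.0, 15.2],
}


def rows(relation: Relation) -> Iterator[dict[str, Any]]:
    """Yield each row (tuple) of a relation as a dict keyed by column name."""
    names = list(relation)
    for values in zip(*(relation[n] for n in names)):
        yield dict(zip(names, values))


def join(left: Relation, right: Relation, key: str) -> list[dict[str, Any]]:
    """Link two relations through a shared key column (a foreign-key join)."""
    index = {row[key]: row for row in rows(right)}
    return [{**row, **index[row[key]]} for row in rows(left) if row[key] in index]


# A sampled function: coordinate maps (domain) and values (range) share indexes,
# so sst[i][j] is the function sampled at (lat[i], lon[j]).
lat = [10.0, 20.0]
lon = [100.0, 110.0, 120.0]
sst = [[26.1, 26.4, 26.9],
       [24.0, 24.3, 24.8]]


def sample_at(i: int, j: int) -> tuple[tuple[float, float], float]:
    """Return the (domain point, range value) pair at common index (i, j)."""
    return (lat[i], lon[j]), sst[i][j]
```

Here `join(observations, stations, "station_id")` attaches each observation to its station's name, and `sample_at(1, 2)` returns `((20.0, 120.0), 24.8)`.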
DAP4 Operations (invoked as query strings)
3 kinds of constraint expressions (i.e., query strings) yield subsets or invoke (server-side) processing
projection (returns a subset): specify included variables (by name) as well as indices of included array elements
selection (returns a subset): limit tuples (rows) of a relation to those with variable values satisfying a DBMS-style predicate
function (today’s town hall!): invoke server functions to calculate a return [we intend to target critical needs]
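A selection can be sketched as a filter over the rows of an in-memory relation. The clause syntax here (`name op value` clauses joined by `&`) is loosely modeled on DAP2 selection clauses and is an assumption; a real server parses the full constraint grammar and evaluates it against its own data.

```python
import operator
import re
from typing import Any

# Comparison operators this sketch accepts in a selection clause.
OPS = {
    "<=": operator.le, ">=": operator.ge, "!=": operator.ne,
    "=": operator.eq, "<": operator.lt, ">": operator.gt,
}
# Two-character operators are listed first so ">=" is not read as ">".
CLAUSE = re.compile(r"^\s*(\w+)\s*(<=|>=|!=|=|<|>)\s*(.+?)\s*$")


def parse_value(text: str) -> Any:
    """Numeric literal -> float; otherwise a string with double quotes removed."""
    try:
        return float(text)
    except ValueError:
        return text.strip('"')


def select(relation: dict[str, list[Any]], expression: str) -> dict[str, list[Any]]:
    """Keep the rows (tuples) whose values satisfy every '&'-joined clause."""
    clauses = []
    for part in expression.strip("&").split("&"):
        m = CLAUSE.match(part)
        if not m:
            raise ValueError(f"cannot parse clause {part!r}")
        name, op, literal = m.groups()
        clauses.append((name, OPS[op], parse_value(literal)))
    n = len(next(iter(relation.values())))  # all columns share one length
    keep = [i for i in range(n)
            if all(fn(relation[name][i], value) for name, fn, value in clauses)]
    return {col: [vals[i] for i in keep] for col, vals in relation.items()}
```

For example, with `obs = {"station_id": [2, 1, 2], "temp_c": [14.5, 11.0, 15.2]}`, `select(obs, "temp_c>12")` keeps rows 0 and 2, and the whole relation (every column) is subset consistently.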
Projection Operators
Like netCDF, but as a Web service, users may
Skip indices
Limit index ranges
Reduce dimensionality
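The three projection operators can be sketched on nested Python lists. Each dimension takes either a `(start, stride, stop)` triple, mirroring DAP2's `[start:stride:stop]` hyperslab notation with an inclusive stop, or a single index. Dropping the dimension for a single index is this sketch's assumption about "reduce dimensionality"; a server may instead return a dimension of length 1.

```python
from typing import Any, Sequence, Union

# One entry per dimension: an int picks a single index (and, in this sketch,
# drops that dimension); a (start, stride, stop) triple keeps a strided range
# whose stop index is inclusive, as in DAP hyperslabs.
Index = Union[int, tuple[int, int, int]]


def project(array: Any, spec: Sequence[Index]) -> Any:
    """Subset a nested-list array, one spec entry per leading dimension."""
    if not spec:
        return array
    head, rest = spec[0], spec[1:]
    if isinstance(head, int):
        return project(array[head], rest)          # reduce dimensionality
    start, stride, stop = head
    if stride < 1:
        raise ValueError("stride must be a positive integer")
    return [project(array[i], rest)               # skip indices, limit range
            for i in range(start, stop + 1, stride)]
```

With `grid = [[10*r + c for c in range(5)] for r in range(4)]`, the spec `[(0, 2, 3), (1, 1, 3)]` (like `grid[0:2:3][1:1:3]`) returns `[[1, 2, 3], [21, 22, 23]]`, and `[2, (0, 2, 4)]` returns the 1-D row `[20, 22, 24]`.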
Other DAP-Related Services
Many DAP-based servers (from Unidata & OPeNDAP, e.g.)
Accept multiple types of data as inputs
Offer several views of them over the web
Native DAP web services: for DAP-enabled clients
Source format (lossless): netCDF-to-netCDF or HDF4-to-HDF4, e.g.
Alternative web services: HTML (browser views), XML, WCS, etc.
Note: these were not part of the DAP2 specification...
Town-Hall: what other services should be
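One way a server might offer several views of the same content is a table that maps a request suffix to a renderer. The suffixes, the sample variable, and the renderers are hypothetical and cover only plain text plus the alternative HTML and XML views; native DAP responses and WCS are out of scope for this sketch.

```python
import html
import xml.etree.ElementTree as ET

# A tiny in-memory variable standing in for one dataset's content.
VARIABLE = {"name": "temp_c", "units": "degC", "values": [14.5, 11.0, 15.2]}


def as_ascii(var):
    """Plain-text view: a header line, then comma-separated values."""
    return f"{var['name']} ({var['units']})\n" + ", ".join(str(v) for v in var["values"])


def as_html(var):
    """Browser view: an HTML table with one row per value."""
    cells = "".join(f"<tr><td>{v}</td></tr>" for v in var["values"])
    title = html.escape(f"{var['name']} ({var['units']})")
    return f"<table><caption>{title}</caption>{cells}</table>"


def as_xml(var):
    """XML view built with the standard library's ElementTree."""
    root = ET.Element("variable", name=var["name"], units=var["units"])
    for v in var["values"]:
        ET.SubElement(root, "value").text = str(v)
    return ET.tostring(root, encoding="unicode")


# Hypothetical suffix-to-view table; real servers choose their own names.
VIEWS = {".ascii": as_ascii, ".html": as_html, ".xml": as_xml}


def render(suffix, var=VARIABLE):
    """Serve the same content in whichever view the request suffix names."""
    try:
        return VIEWS[suffix](var)
    except KeyError:
        raise ValueError(f"no view registered for {suffix!r}") from None
```

Adding a view then means registering one more renderer, while the underlying data access stays shared.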
Other OPULS Accomplishments
Irregular mesh subsetting
Progress with U WA (Bill Howe)
To be released soon...
Asynchronous access
Preliminary trials...
Cloud-based service provision (with parallelism)
MODIS reprojection (related, but not OPULS-funded)
OPULS Process Transparency
Public documentation updated weekly (just Google OPULS!)
Advisory committee
Jeff de La Beaujardiere, James Frew, Mike Folk, Steve Hankin, Eric Kihn, Rich Signell
Welcoming input (per this town hall)
Town-Hall Questions
What server functions ought to be specified in the DAP4 protocol?
Simple point-wise mathematics
Mathematics on sampled functions
Truly domain-specific functions (involving the datum, e.g.)
Which (other) web-service protocols should be leveraged by DAP servers, & what are the pertinent use cases?
To facilitate open search (exploiting Atom), e.g.
To facilitate semantic analysis (providing RDF output), e.g.
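As input to the first question, here is a sketch of how a server might expose simple point-wise mathematics as callable functions. The function names `linear` and `clip`, the `name(variable,arg,...)` call syntax, and the variable `sst_raw` are all hypothetical; they do not describe any function the DAP4 specification defines.

```python
import re

# In-memory stand-in for the dataset variables the server can see.
VARIABLES = {"sst_raw": [1200, 1250, 1310]}

SERVER_FUNCTIONS = {}


def server_function(fn):
    """Register fn under its own name so a query string can invoke it."""
    SERVER_FUNCTIONS[fn.__name__] = fn
    return fn


@server_function
def linear(values, scale, offset):
    """Point-wise math: scale * x + offset for every element."""
    return [scale * x + offset for x in values]


@server_function
def clip(values, low, high):
    """Point-wise math: limit every element to the range [low, high]."""
    return [min(max(x, low), high) for x in values]


# name(variable) optionally followed by ,arg pairs, e.g. linear(sst_raw,0.5,1)
CALL = re.compile(r"^(\w+)\((\w+)((?:,[^,()]+)*)\)$")


def invoke(expression):
    """Evaluate 'name(variable,arg,...)' where every arg is a numeric literal."""
    m = CALL.match(expression.replace(" ", ""))
    if not m:
        raise ValueError(f"cannot parse {expression!r}")
    name, var, rest = m.groups()
    args = [float(a) for a in rest.split(",")[1:]]
    return SERVER_FUNCTIONS[name](VARIABLES[var], *args)
```

For example, `invoke("linear(sst_raw, 0.5, 1)")` returns `[601.0, 626.0, 656.0]`, computed on the server so the client receives only the result.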
Thank you
OPeNDAP, Inc.
http://opendap.org
increasing data’s visibility