OPeNDAP-Unidata Development of DAP4 (a Data Access Protocol)
Describing Progress and Seeking Input at the ESIP Summer Meeting 2012
by Dave Fulker (OPeNDAP President)

Overarching Concept of OPeNDAP’s Data Access Protocol (DAP): Clients Get Only Needed Data, When They Need Them
  Accessing data through web services (i.e., URL ≈ dataset)
  Appending query strings to invoke server functions, esp. subsetting
  Getting responses of 2 major types:
    Metadata - dataset descriptions & catalogs (textual)
    Content - values and metadata (binary or textual)
  Using responses in diverse client contexts, e.g.,
    MATLAB maps DAP responses directly to its internal math types
    DAP libraries (netCDF, e.g.) simplify the programming

Some of DAP Users’ Distinguishing Needs
  Data often depict (scientific) phenomena where
    Geospatial maps are among the useful views
    But other views are important as well
  Coordinates often are 2-, 3-, 4- & even 5-dimensional
    These may include (time-dependent) coordinate-proxies
  Users often wish to use data whose source files
    Are in a variety of inconvenient formats
    With insufficient or obsolete metadata

Present State of DAP (after nearly 2 decades!)
  The DAP2 specification has multiple contemporary realizations on servers and clients
  Clients include:
    MATLAB, GrADS, IDL, IDV...
    Python apps that employ the PyDAP library
    Fortran, C, C++ & Java apps that employ the netCDF library
  Servers include:
    PyDAP, ERDDAP... (often with augmented services)
    Most widely deployed: TDS (Unidata) & Hyrax (OPeNDAP)
  Widely used by data providers and users, including cases where DAP servers provide translations of inconveniently formatted source

Branching: Hyrax & THREDDS
  Multiple implementations of a protocol are often considered a good thing (per IETF, e.g.)
  This can be a problem, however, if the implementations embody excessive redundancy or confuse users
  Our view: co-existence of TDS (Unidata) & Hyrax (OPeNDAP) reflects some redundancy & creates some inconsistencies for users
    Need #1: achieve conformance ⇒ consistency for users
    Need #2: more software reuse ⇒ more advancement

NOAA/BAA grant for OPeNDAP-Unidata Linked Servers (OPULS)
  Goal 1: OPeNDAP/Unidata conformance & linkage
    New data-model/protocol specs (DAP4), with conformance tests & extensibility demos:
      Modes of asynchronous access (to near-line data, e.g.)
      Server-side subsetting of data on irregular meshes
  Goal 2: common software for OPeNDAP & Unidata servers
    Work yet to begin...

OPeNDAP Data-Type Philosophy (reflected in DAP2 & now DAP4)
  Data model has few data types
    For simplified programming & lowered risk of errors
  Data types are deliberately domain-neutral
    For better trans-domain utility & programmer uptake
    But they allow both syntactic & semantic structures/metadata
  These types do in fact support domain needs
    NetCDF-like (can represent functions on 4-D domains, e.g.)
    Sequences & selections match DBMS sensibilities

DAP4 Data Model (simplified)
  dataset ≈ unique URL (with no query string)
  a dataset holds a hierarchy of groups, each a namespace/container for variables, dimensions & attributes
  each variable comprises
    a name (unique in the group)
    a type (which applies to all values)
    value(s) (organized as dimensioned arrays)
    attributes* (optional)
  *Attributes are like variables but with a semantic purpose, making a variable or a group more meaningful. E.g., variables often have an attribute (of type string) named “units.”

DAP4 Data Types & Relations
  as in C or Java, e.g., a variable’s type may be structured or atomic: integer, float, byte, string...
  DAP variables may be (semantically) related to one another via two key grouping constructs
    relations link 1-D variables as columns in a table;
    sampled functions link coordinate-map variables (domain) to function-value variables (ranges) having common indexes
  in turn, relations can be linked via variables that serve as foreign keys

DAP4 Operations (invoked as query strings)
  3 kinds of constraint expressions (i.e. query strings) yield subsets or invoke (server-side) processing
    projection (returns a subset): specify included variables (by name) as well as indices of included array elements
    selection (returns a subset): limit tuples (rows) of a relation to those with variable values satisfying a DBMS-style predicate
    function (today’s town hall!): invoke server functions to calculate a return [we intend to target critical needs]

Projection Operators
  Like netCDF, but as a Web service, users may
    Skip indices
    Limit index ranges
    Reduce dimensionality

Other DAP-Related Services
  Note: these were not part of the DAP2 specification...
  Many DAP-based servers (from Unidata & OPeNDAP, e.g.)
    Accept multiple types of data as inputs
    Offer several views of them over the web
      Native DAP web services: for DAP-enabled clients
      Source format (lossless): netCDF-to-netCDF or HDF4-to-HDF4, e.g.
      Alternative web services: html (browser views), XML, WCS, etc.
  Town-Hall: what other services should be

Other OPULS Accomplishments
  Irregular mesh subsetting
    Progress with U WA (Bill Howe)
    To be released soon...
  Asynchronous access
    Preliminary trials...
  Cloud-based service provision (with parallelism)
    MODIS reprojection (related, but not OPULS funding)

OPULS Process Transparency
  Public documentation updated weekly (just Google OPULS!)
  Advisory committee
    Jeff de La Beaujardiere, James Frew, Mike Folk, Steve Hankin, Eric Kihn, Rich Signell
  Welcoming input (per this town hall)

Town-Hall Questions
  What server functions ought to be specified in the DAP4 protocol?
    Simple point-wise mathematics
    Mathematics on sampled functions
    Truly domain-specific functions (involving the datum, e.g.)
  Which (other) web-service protocols should be leveraged by DAP servers, & what are the pertinent use cases?
    To facilitate open search (exploiting ATOM), e.g.
    To facilitate semantic analysis (providing RDF output, e.g.)

Thank you
  OPeNDAP, Inc • http://opendap.org • increasing data’s visibility
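
For readers who want to see what the projection and selection operations look like as query strings, here is a minimal sketch. It uses the established DAP2 constraint-expression syntax (bracketed index ranges written as [start:stride:stop], selection clauses joined with "&"), which is an assumption here because the slides do not spell out the DAP4 syntax. The dataset URL, the variable names, and the helper functions are hypothetical and chosen only for illustration.

```python
# Hypothetical helpers that assemble DAP2-style constraint expressions.
# Nothing here contacts a server; the functions only build URL strings.

def hyperslab(start, stop, stride=1):
    """Return a DAP2-style index range; the stop index is inclusive."""
    if stride == 1:
        return f"[{start}:{stop}]"          # limit the index range
    return f"[{start}:{stride}:{stop}]"     # skip indices with a stride


def projection(name, *ranges):
    """Name a variable and, optionally, one (start, stop[, stride]) per dimension."""
    return name + "".join(hyperslab(*r) for r in ranges)


def constraint_url(dataset_url, projections, selections=(), suffix=".dods"):
    """Append a response suffix and a query string to a dataset URL.

    suffix: a DAP2 response type such as ".dds" (structure),
            ".das" (attributes), ".dods" (binary data) or ".ascii".
    """
    ce = ",".join(projections)
    for clause in selections:
        ce += "&" + clause                  # DBMS-style predicate on a relation
    return f"{dataset_url}{suffix}?{ce}"


# Assumed example dataset and variable names (not from the slides).
DATASET = "http://example.org/opendap/sst.nc"

# Projection: fix the first index (reducing dimensionality),
# take every 2nd index from 10 to 50, and limit the last range to 20..60.
grid_url = constraint_url(
    DATASET, [projection("sst", (0, 0), (10, 50, 2), (20, 60))]
)

# Selection: keep only the rows of a relation whose temp exceeds 20.
rows_url = constraint_url(
    DATASET,
    ["station.time", "station.temp"],
    selections=["station.temp>20"],
    suffix=".ascii",
)

print(grid_url)
print(rows_url)
```

A client library such as PyDAP or netCDF would normally build and send such requests on the user's behalf; the sketch only makes the "URL ≈ dataset, query string ≈ subset" idea concrete.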