OPeNDAP Present and Future
An Overview Encompassing Current Projects & Potential New Directions
Dave Fulker and James Gallagher
OPeNDAP, Inc.

Rough Outline
• Background
• OPULS (an OPeNDAP-Unidata collaboration)
  – DAP4 (to supersede DAP2)
  – Experimental extensions (asynchronous access, UGRID subsets)
• Hyrax over Amazon/S3
• Elaboration on server functions
  – Perhaps binning, masking, a functional language?
  – Relationship to WPS & other Web services
• Hyrax (& WCS) in OWS-9

Origins
• Scientists studying ocean fluxes & temperatures envisaged using HTTP for remote data access (1993)
• Collaboration with the designer of the JGOFS data system…
• Led to the Distributed Ocean Data System (DODS)
• DODS was later renamed OPeNDAP (to be explained momentarily…)

OPeNDAP Now Is:
• An acronym
  – “Open-source Project for a Network Data Access Protocol”
  – Often a synonym for “DAP”
• A not-for-profit corporation developing/supporting
  – “DAPx”, a web-services protocol for data access
• Deployed by hundreds of data providers internationally
• Employed in many analysis packages (e.g., MATLAB)
• Designated a “Community Standard” by NASA
  – Server & client implementations* of DAP
*Note: there are other implementations

Available Software
• Free end-user applications that include DAP support: Panoply, IDV, NCO, …
• Commercial: IDL, MATLAB, ArcGIS
• SDKs: the netCDF C and Java libraries; OC; libdap; Java OPeNDAP; PyDAP
  – Each provides its own API; together they span C, C++, Java and Python
• Data servers: PyDAP, Hyrax, TDS, …

Concept: Clients Get Just the Data They Need, as They Need Them
• Accessing data via URLs (i.e., URL = dataset)
  – Appending query strings to subset or run server functions
• Getting responses of two (general) types:
  – Metadata: dataset descriptions & catalogs (textual)
  – Content: values and metadata (binary or textual)
• Using responses in diverse ways, e.g.:
  – MATLAB maps responses to its internal math types
  – The netCDF library allows apps to work as though reading a local file

NOAA Grant for OPeNDAP-Unidata Linked Servers (OPULS)
• Goal 1: conformance & linkage between OPeNDAP & Unidata DAP servers, with short-term outcomes:
  – New data-model & protocol specs: DAP4
    • Consistent behaviors of OPeNDAP & Unidata servers
    • Data-type richness (netCDF-4, HDF5, RDBs)
  – Extensions (i.e., new server behaviors):
    • Irregular-mesh subsetting
    • Asynchronous access
• Goal 2: a common framework for OPeNDAP & Unidata servers, aiming for an architecture that
  – Underpins the unique strengths of both
  – Reduces the likelihood of redundant effort

OPULS Progress So Far
• Draft of DAP4 data model & protocol specs
  – Sufficient for the full richness of netCDF-4 and HDF5 files (e.g., including “Groups”)
• Progress on rigorous conformance testing
• Successful extensibility experiments
  – Irregular-mesh (i.e., UGRID) subsetting
  – Asynchronous access (as may be useful for near-line data storage)
  – Amazon cloud deployment (more later…)

Other Technologies OPULS Considered
• JSON responses as an alternative to XML
  – Decided they added too much bulk to the specification and too many requirements for implementers
  – Could be added in a future version
  – Can be built using XSLT from DAP4 XML
• OpenSearch
  – Not incorporated into DAP4, for many of the same reasons
• The DAP4 metadata response specifically includes support for these

OPULS and Feedback
• OPULS is ready for community feedback
• Design documents are online
  – Web site: http://docs.opendap.org/
  – The current draft specification is there as well
• Many features are already available in C++ and C implementations
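The URL-based access described under “Concept” (a URL names a dataset; a query string subsets it) can be illustrated with a short client-side sketch. This is a minimal example using only Python's standard library, not code from any OPeNDAP SDK. It assumes DAP2 conventions: a response suffix (.dds or .das for metadata, .dods for binary content) appended to the dataset URL, and a constraint expression whose hyperslabs are written [start:stride:stop] with an inclusive stop. The dataset URL and the helper names hyperslab and dap2_url are hypothetical.

```python
from urllib.parse import quote


def hyperslab(start, stop, stride=1):
    """One DAP2 index range, written [start:stride:stop] (stop is inclusive)."""
    return f"[{start}:{stride}:{stop}]"


def dap2_url(dataset_url, suffix, projection, selections=()):
    """Build a DAP2 request URL (hypothetical helper, for illustration only).

    suffix:     'dds' or 'das' (metadata) or 'dods' (binary content)
    projection: comma-separated variables, each optionally with hyperslabs
    selections: relational clauses such as 'lat>10', each joined with '&'
    """
    constraint = projection + "".join("&" + s for s in selections)
    # Percent-encode brackets and relational operators so strict HTTP
    # servers accept the URL; commas, '&' and ':' stay readable.
    return f"{dataset_url}.{suffix}?{quote(constraint, safe=',&:')}"


# Hypothetical dataset: request a 12 x 11 x 11 block of a 3-D 'sst' array.
url = dap2_url("http://example.org/opendap/sst.nc", "dods",
               "sst" + hyperslab(0, 11) + hyperslab(10, 20) + hyperslab(30, 40))
print(url)
```

Only the constraint expression is encoded here; the server decodes it before evaluating the subset, so the client transfers only the requested slab rather than the whole file.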
Hyrax over Amazon/S3
• Exploits a natural fit between DAP-based services and cloud services
• Initial progress already achieved under the OPULS grant
• Bears interesting similarities to the challenge of asynchronous data access
• May yield a new community of OPeNDAP users

More about Clouds…
• Hyrax is trivial to run on the Amazon cloud
• We are looking at ways to work with data held in S3
• S3 characteristics:
  – Flat (no directory hierarchy)
  – Modest response times
  – Simple GET/PUT-style API

Using S3
• Tried S3 file systems and found them wanting
  – Not interoperable (hardly surprising, but limiting)
  – They add an extra layer to the software stack
• Now working with XML ‘catalogs’
  – XML documents create a faux hierarchy
  – XML + XSLT → HTML (i.e., a ‘free’ web interface)
  – XML + Hyrax + caching → DAP access
  – The XML is very similar to THREDDS catalogs

Elaboration on Server Functions
• Proposition: the future of OPeNDAP may lie in providing data-proximate (i.e., server-side) functions that:
  – Deliver precisely defined subsets
  – Reduce the number of off-target retrievals
    • I.e., enable querying of complex dataset properties
  – Remap/transform data to simplify its use, especially for multi-source data integration
• Effective caching will be required

Server Functions, DAP4
• DAP2 supports functions and functional composition
• Currently, DAP4 treats ‘functions’ and a ‘functional language’ as an extension
• DAP4 provides more complete support for functions, including metadata responses for them (DAP2 lacks these, a gap in its specification)
• Support for HTTP POST

Server Functions, Experimentation
• UGRID: Unstructured Grid (irregular mesh) subsetting
• We have implemented a clone of the GDS server’s syntax for functions
• This enables current netCDF-based DAP clients (e.g., ECMF) to use the UGRID function
• Other projects: multi-instrument intercalibration
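To make “irregular-mesh subsetting” concrete, here is a minimal sketch of what a server-side UGRID subset by bounding box might do. It is an illustration under stated assumptions, not the OPULS/Hyrax implementation: it assumes a 2-D mesh given as node coordinate lists plus a face-to-node connectivity table with 0-based indices (in the spirit of the UGRID convention's face_node_connectivity), and it uses one simple rule (keep a face only if all its nodes fall inside the box). The function name subset_mesh is hypothetical.

```python
from typing import List, Sequence, Tuple

Face = Tuple[int, ...]


def subset_mesh(node_x: Sequence[float], node_y: Sequence[float],
                faces: Sequence[Face],
                bbox: Tuple[float, float, float, float]
                ) -> Tuple[List[float], List[float], List[Face]]:
    """Return the part of a mesh inside bbox = (xmin, ymin, xmax, ymax).

    Hypothetical helper. Assumed rule: a face is kept only if every one of
    its nodes lies inside the box. Kept nodes are renumbered compactly so
    the returned connectivity indexes into the returned coordinate lists.
    """
    xmin, ymin, xmax, ymax = bbox
    inside = [xmin <= x <= xmax and ymin <= y <= ymax
              for x, y in zip(node_x, node_y)]
    kept = [f for f in faces if all(inside[n] for n in f)]

    # Keep only nodes referenced by kept faces, in their original order.
    used = sorted({n for f in kept for n in f})
    new_index = {old: new for new, old in enumerate(used)}

    return ([node_x[n] for n in used],
            [node_y[n] for n in used],
            [tuple(new_index[n] for n in f) for f in kept])


# A unit square split into two triangles, plus one triangle reaching (5, 5).
xs, ys = [0, 1, 0, 1, 5], [0, 0, 1, 1, 5]
tris = [(0, 1, 2), (1, 3, 2), (3, 4, 1)]
print(subset_mesh(xs, ys, tris, (0, 0, 2, 2)))
```

The renumbering step is the part that distinguishes mesh subsetting from ordinary array hyperslabs: the returned connectivity must refer to the subset's own node list. A real service would subset any node- or face-centered data variables with the same index maps.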
Some Server-Function Ideas
• Binning: returns a distribution (as a raster of boolean values on a user-specified grid) of data values satisfying some criteria
• Masking: accepts a raster of zero/nonzero values as a query argument, e.g., as a geospatial selection criterion
• Perhaps some (limited?) form of functional language for very rich capabilities
• WPS, et al.

Summary
• DAP is based on a domain-neutral data model and an expression-based constraint language
• While not ‘RESTful’ in the strictest sense, it is a REST design in spirit (DAP predates the term by several years)
• OPULS is a collaborative project between OPeNDAP and Unidata that intends to update DAP
• We are also running several experimental mini-projects within its context:
  – Asynchronous access, unstructured-grid access, cloud computing, and an expanded, function-based, server-side processing system
• DAP servers provide a good platform on which to build OGC web services, as described in the following presentation.
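The binning and masking ideas listed under “Some Server-Function Ideas” can be sketched in a few lines to show how they might compose. This is a hypothetical illustration, not a proposed OPeNDAP API. It assumes point observations given as (x, y, value) tuples, a regular grid described by an origin, a cell size and a cell count per axis, and rasters stored as lists of rows with row 0 at the lowest y. The names bin_points and apply_mask are made up for this example.

```python
from typing import Callable, Iterable, List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, value)


def bin_points(points: Iterable[Point],
               x0: float, dx: float, nx: int,
               y0: float, dy: float, ny: int,
               criterion: Callable[[float], bool]) -> List[List[bool]]:
    """Boolean raster[ny][nx]: True where some point in the cell meets criterion.

    Hypothetical 'binning' function; row 0 holds the lowest y values.
    Points falling outside the grid are ignored.
    """
    raster = [[False] * nx for _ in range(ny)]
    for x, y, v in points:
        i = int((x - x0) // dx)  # column index
        j = int((y - y0) // dy)  # row index
        if 0 <= i < nx and 0 <= j < ny and criterion(v):
            raster[j][i] = True
    return raster


def apply_mask(data: Sequence[Sequence[float]],
               mask: Sequence[Sequence[float]],
               fill: Optional[float] = None) -> List[List[Optional[float]]]:
    """Hypothetical 'masking' function: keep data where mask is nonzero."""
    if len(data) != len(mask) or any(len(d) != len(m)
                                     for d, m in zip(data, mask)):
        raise ValueError("data and mask must have the same shape")
    return [[d if m != 0 else fill for d, m in zip(drow, mrow)]
            for drow, mrow in zip(data, mask)]


# Composition: mask a 2 x 2 field with the cells where observations exceed 15.
obs = [(0.5, 0.5, 20.0), (1.5, 0.5, 10.0), (1.5, 1.5, 30.0)]
hot = bin_points(obs, 0.0, 1.0, 2, 0.0, 1.0, 2, lambda v: v > 15)
print(apply_mask([[1.0, 2.0], [3.0, 4.0]], hot))
```

Because a boolean raster is itself a zero/nonzero raster, the output of the binning function can be passed straight to the masking function. That is the kind of functional composition a server-side function language would let a client express in a single request.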