NOAA’s National Ocean Service • Office of Response and Restoration

Serving unstructured grids using OPeNDAP: Using server-side operations to subset and subsample data
Christopher Barker, NOAA Office of Response & Restoration, Emergency Response Division
James Gallagher, OPeNDAP, Inc.

NOAA Emergency Response Division
• National Contingency Plan specifies NOAA’s role in supporting the Coast Guard: “Provide scientific expertise to support an incident response for Oil and Chemical Spills”

Key Role: Trajectory Modeling
• Where is the oil (or chemical) going?

Primary Tool: GNOME (General NOAA Operational Modeling Environment)
• Lagrangian element (particle) model
• Forcing from external sources:
  – Winds
  – Currents
• Currents:
  – In house model
  – External operational models

GOODS
GNOME Online Operational Data Server

Example: Deepwater Horizon
• Ocean models utilized:
  – NOAA CSDL: NGOM
  – Navy models: NCOM, HYCOM, IASNFS
  – USF: West Florida Shelf ROMS
  – TGLO/TAMU: TX shelf ROMS
  – NC State: SABGOM
  – All structured grid models

Unstructured Grid Models?
• Unstructured Grids:
  – Allow resolution to vary spatially
  – Conform to boundaries
• Nice for oil spills and particle tracking
• Many more UGRID models coming online
  – Many papers at this conference

Some Models of Interest
• FVCOM:
  – nGOMOFS (NOAA CSDL)
  – Gulf of Maine/Mass Bay (UMASS)
  – Salish Sea (PNNL)
• SELFE:
  – Columbia River (OHSU)
  – Texas Estuaries models (UT)
• ADCIRC:
  – Gulf of Mexico / Southern LA and Texas grid: 9,108,128 nodes, 18,061,765 elements

nGOMOFS (NOAA CSDL) V6
90,310 Nodes, 174,550 Elements
What if I just need Mobile Bay?

Mobile Bay, AL detail grid.
About 300 m grid resolution along a 13 m deep navigation channel

FVCOM-GoM/GB for Mass Bay and Nantucket Sounds/Shoals
Boston Inner Harbor

ADCIRC: Gulf of Mexico / Southern LA and Texas grid (SL18TX)
• Gulf of Mexico / Southern LA and Texas grid: 9,108,128 nodes, 18,061,765 elements
• Just surface currents:
  – 275 MB per time step (plus the grid specs)

Obstacles to using UGRID models:
• No standard for data/results on UGRIDS:
  – Informal working group for (quite!) a few years
  – Recent draft standard (netCDF 3)
  – Work on the netCDF-Java library to support it (SURA modeling test bed project)
• Big Grids:
  – Need server-side subsetting

How to get it done?
• NOAA/ORR post-DWH funding:
  – Better able to respond to large spills
• We started talking to folks about server-side subsetting options
• But we’re clients:
  – We’re not going to run a server
• We needed something that would become an accepted standard/tool.

How to get it done?
• NOAA/NESDIS noted assorted issues:
  – netCDF/OPeNDAP development funding limited
  – Multiple diverging implementations: “Unfunded Mandate”
• NESDIS coordinated funding from:
  – Technology, Planning and Integration for Observations (TPIO) Program
  – OR&R
  – National Climatic Data Center (NCDC)

OPeNDAP-Unidata Linked Servers (OPULS)
• NOAA/BAA grant supports this important collaboration between Unidata & OPeNDAP
• First goal: conformance between OPeNDAP & Unidata servers, through which access is gained to growing amounts of NOAA & related data.
• Other short-term goals include:
  – Asynchronous modes, such as are needed for (delayed) access to nearline data, e.g., data stored on tape
  – Improved access (with server-side subsetting) to data organized on nonrectangular meshes, such as in coastal modeling
• Work began in Boulder during October & will be influenced by an advisory committee (yet to be appointed)

OPeNDAP: the Data Access Protocol
• DAP2 combines a simple data model with a general set of operators.
  – Data Model: Atomic types (e.g., ‘Integer’); Arrays; Structures; Grids; and Sequences.
  – Operators: These provide ways to subset all but the atomic types.
  – Domain neutral: By keeping the semantics of the model clean, we ensure that it can be applied to many different types of data.

But how is it used?
• DAP is generally used as a ‘web service’
• DAP requests are made using a URL
• DAP responses are ‘documents’:
  – Text that contains metadata
  – Combination of text/metadata and binary data.
• Applications read these responses and use them in whatever ways they see fit:
  – the netCDF client library makes legacy applications believe they are reading from a local file

About Array and Grid Selection
• In addition to requesting a Grid or Array, the Selection can be used to subset in index space.

About Functions
• Constraint Expressions can contain functions
• These functions can perform any operation that can be programmed.
• Thus they provide a good way to extend a data server to perform new operations
• These include operations that are not domain neutral
• In Hyrax they are written in C++

Example URLs
• The base URL: http://test.opendap.org/opendap/data/nc/fnoc1.nc
• To get metadata:
  – Dataset variables: http://test.opendap.org/opendap/data/nc/fnoc1.nc.dds
  – … attributes: http://test.opendap.org/opendap/data/nc/fnoc1.nc.das
  – Or less readable in XML: http://test.opendap.org/opendap/data/nc/fnoc1.nc.ddx
• To get data:
  – Just the variables u and v: http://test.opendap.org/opendap/data/nc/fnoc1.nc.dods?u,v
  – … in ASCII so it’s easy to read: http://…/opendap/data/nc/fnoc1.nc.asc?u,v
• With subsetting:
  – http://test.opendap.org/opendap/data/nc/fnoc1.nc.asc?u[0][3:6][5:8]
• Here’s a function:
  – http://…/nc/coads_climatology.nc.ascii?geogrid(SST,45,-80,20,60,"1000<TIME<3000")
  – This is an example of how functions can enable domain-specific behavior; this function will return an error if the Grid is not ‘geospatial’

Challenges
• Unstructured Grids are not a specific type
  in DAP
• We must choose a way, or set of ways, to represent these data
• Datasets are often too large to download, so subsetting must be done server-side.
• Because the subsetting operations are complex, we will need to use server-side functions to implement them

Requirements
• Must enable subsetting by polygonal regions
• The result must be an unstructured grid itself
• A subset must preserve the topological and geometric relationships present in the whole:
  – we can’t just regrid everything to a more convenient form.

Proposed Solution
• Server-side function to add subsetting
• Adopt the proposed unstructured grid encoding using netCDF3
• Result of the function will be a DAP2 response
  – Input is netCDF3 with some additional ‘conventions’: it can be represented in DAP2
  – There are existing clients that can read DAP2
• If they understand netCDF in the new convention, they will understand the results

The server-side function
• ugrid(Mesh,<polygon>)
  – <polygon> is a comma-separated list of latitude and longitude points
  – However, there is an arbitrary limit to the number of characters in a URL, so
• We will also support POST when OPULS makes the transition to DAP4
  – It will likely take more than a year for all of DAP4 to be realized, but POST for constraint expressions will be set in the first year.

Example ugrid() calls
• http://…/model.nc?ugrid(SST,45,-80,20,-60)
  – When ugrid() is called with two points, it will assume the polygon is a box.
• http://…/model.nc?ugrid(SST,45,-80, 45,-60, 20,-60, 20,-80)
  – Here the polygon is the same box as above.
  – There’s an understood edge connecting the first and last points
  – Point order is important: self-intersecting polygons will raise an error.
• http://…/model.nc?ugrid(SST, -71.03, 42.38, -71.06, 42.37, -71.06, 42.36, -71.06, 42.35, -71.04, 42.33, -71.01, 42.34, -71.01, 42.35, -71.03, 42.38)

Implementation
• We will use the Gridfields library [Howe 05]
• The library will be extended to work with the new netCDF3 file format: “Deltares CF proposal for Unstructured Grid data model”
• And to work with DAP
[Howe 05] Bill Howe, David Maier, “Algebraic Manipulation of Scientific Datasets,” VLDB Journal, 14(4), 2005

Progress so far
• Gridfields has already been used to build a simpler server-side demonstration function
• The Gridfields code has adopted GNU’s autotools to streamline its build.
• We will factor out the C++ code into its own project, separate from the Python layer
• This will simplify moving Gridfields into the Linux community builds

Summary
• UGrid models are seeing wide deployment
• Subsetting UGrids on the server is critical to the wide use of model results
• UGrids will be encoded in netCDF3
• We will use a widely available open-source library to perform the actual operations
• The results will be valid UGrids, in DAP
• The work has begun

Use for Curvilinear grids, too?
• Capture arbitrary polygon subset.
• Rectangle in geo-coordinates is not a rectangle in grid coordinates
  – We generally oversample.
  – But that’s not always a good solution for highly deformed grids.
• What would the result look like?
  – A new structured grid?
  – An unstructured grid?

Further Discussion, etc.
• Meet here at ECM:
  – Lunch Wed?
• Discussion on UGRID Google group: https://groups.google.com/group/ugrid-interoperability
• OPeNDAP Wiki: http://docs.opendap.org/index.php/Projects
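
Appendix sketch: building the request URLs
The "Example URLs" and "Example ugrid() calls" slides describe requests as a dataset URL, a response suffix, and a constraint expression. The Python sketch below only assembles such strings; it does not contact a server. The helper names (hyperslab, box_to_polygon, ugrid_call, dap_url) and the MAX_URL_CHARS value are hypothetical, chosen for illustration, and ugrid() itself is the proposed function from the slides, not a released API. Points are assumed to be (lat, lon) pairs, as in the box example.

```python
# Sketch only: builds request URLs as strings; nothing is sent to a server.

MAX_URL_CHARS = 2000  # assumed limit for illustration; real limits vary


def hyperslab(var, *ranges):
    """Index-space selection: an int is one index, a (start, stop) pair is a range."""
    parts = []
    for r in ranges:
        if isinstance(r, int):
            parts.append(f"[{r}]")
        else:
            start, stop = r
            parts.append(f"[{start}:{stop}]")
    return var + "".join(parts)


def box_to_polygon(corner_a, corner_b):
    """Expand two (lat, lon) corners into the four-point box shown on the slide."""
    (lat_a, lon_a), (lat_b, lon_b) = corner_a, corner_b
    return [(lat_a, lon_a), (lat_a, lon_b), (lat_b, lon_b), (lat_b, lon_a)]


def ugrid_call(var, points):
    """Format the proposed ugrid() call from a list of (lat, lon) points."""
    coords = ",".join(f"{lat:g},{lon:g}" for lat, lon in points)
    return f"ugrid({var},{coords})"


def dap_url(base, suffix, constraint):
    """Join dataset URL, response suffix (dds, das, asc, dods) and constraint."""
    url = f"{base}.{suffix}?{constraint}"
    if len(url) > MAX_URL_CHARS:
        # The slides note that long polygons may not fit in a URL.
        raise ValueError("constraint too long for a GET URL; POST would be needed")
    return url


base = "http://test.opendap.org/opendap/data/nc/fnoc1.nc"
print(dap_url(base, "asc", hyperslab("u", 0, (3, 6), (5, 8))))
print(ugrid_call("SST", box_to_polygon((45, -80), (20, -60))))
```

The first print reproduces the subsetting URL from the "Example URLs" slide, and the second reproduces the four-point form of the box from the "Example ugrid() calls" slide. The length check stands in for the URL character limit that motivates POST support in DAP4.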