Workplan for the COTS Metadata Working Group
DRAFT - November 23, 2004

Background and purpose:

At the November 16-17, 2004 COTS meeting, a small working group was designated to develop a workplan and milestones over the next 6-12 months to enhance metadata development and implementation for the Interoperability II demonstration. The Interoperability II demonstration will expand upon the Interoperability I demonstration, which included sea surface temperature and winds from various COTS programs, by adding surface currents and chlorophyll. In particular, the group is to address outstanding issues such as a standard vocabulary and metadata content (e.g., fields). The following task list, with deliverables and timeline, fulfills the charge to the group.

Tasks:

Task 1. Establish subgroups of domain experts to develop a data dictionary, based on existing efforts, for the data sets that will be used in Interoperability II. The data dictionary should provide the standard name, abbreviation, definition, units, and scope (the domain and range, more here…) of the metadata elements necessary to describe the data sets.

Data sets: May need more detail here…
Sea surface temperature: satellite, models, in situ (e.g., CTD casts, moored CTD)
Winds: satellite, models, in situ
Surface currents: HF radar, models, satellite, in situ (e.g., ADCP)
Chlorophyll: satellite, models, in situ

The method of data collection for a specific variable may affect the metadata fields necessary to describe the data set. It may therefore be necessary to have different metadata fields to describe, for example, a sea surface temperature data set, depending on the data collection method used. Or do we address this through a separate metadata record using SensorML?

For current SEACOOS efforts, the SEACOOS netCDF convention documentation provides example netCDF representations for several of the ‘usual’ collection subtypes (Table [XX]).
SEACOOS CDL Format Categories

fixed-point (buoy, tower)
  Independent variables, Cartesian: t = t1,t2,t3; x0, constant; y0, constant; z0, constant(s)
  Independent variables, netCDF:    time(time) lon(lon) lat(lat) z(z)
  Dependent variables, Cartesian:   T(x0, y0, z0, t) u(x0, y0, z0, t) v(x0, y0, z0, t)
  Dependent variables, netCDF:      T(time) u(time) v(time)

fixed-profiler (wind profiler, ADCP)
  Independent variables, Cartesian: t = t1,t2,t3; x0, constant; y0, constant; z = z1, z2, z3
  Independent variables, netCDF:    time(time) lon(lon) lat(lat) z(z)
  Dependent variables, Cartesian:   T(x0, y0, z, t) u(x0, y0, z, t) v(x0, y0, z, t)
  Dependent variables, netCDF:      T(time,z) u(time,z) v(time,z)

fixed-map (HF Radar, Satellite Imagery)
  Independent variables, Cartesian: t = t1,t2,t3; x = x1,x2,x3; y = y1,y2,y3; z0, constant
  Independent variables, netCDF:    time(time) lon(lon) lat(lat) z(z)
  Dependent variables, Cartesian:   T(x, y, z0, t) u(x, y, z0, t) v(x, y, z0, t)
  Dependent variables, netCDF:      T(time,y,x) u(time,y,x) v(time,y,x)

moving-point-2D (ship, floating drifter)
  Independent variables, Cartesian: t = t1,t2,t3; x = x(t); y = y(t); z0, constant(s)
  Independent variables, netCDF:    time(time) lon(time) lat(time) z(z)
  Dependent variables, Cartesian:   T(x(t), y(t), z0, t) u(x(t), y(t), z0, t) v(x(t), y(t), z0, t)
  Dependent variables, netCDF:      T(time) u(time) v(time)

moving-point-3D (aircraft, towed undulating vehicle, sea glider, lagrangian drifter)
  Independent variables, Cartesian: t = t1,t2,t3; x = x(t); y = y(t); z = z(t)
  Independent variables, netCDF:    time(time) lon(time) lat(time) z(time)
  Dependent variables, Cartesian:   T(x(t), y(t), z(t), t) u(x(t), y(t), z(t), t) v(x(t), y(t), z(t), t)
  Dependent variables, netCDF:      T(time) u(time) v(time)

moving-profiler (ship-mounted ADCP, CTD surveys)
  Independent variables, Cartesian: t = t1,t2,t3; x = x(t); y = y(t); z = z1, z2, z3
  Independent variables, netCDF:    time(time) lon(time) lat(time) z(z)
  Dependent variables, Cartesian:   T(x(t), y(t), z, t) u(x(t), y(t), z, t) v(x(t), y(t), z, t)
  Dependent variables, netCDF:      T(time,z) u(time,z) v(time,z)

These representation types have met our needs thus far and could be extended with additional types and examples if needed. Charlton has Perl scripts (listed at http://nautilus.baruch.sc.edu/seacoos_data/CSV/data_scout/) which, using the expected convention attributes, are able to properly aggregate this data into the centralized relational database.

I think there are a handful of design directions we would like to move towards.
These are:
1) continuing support and documentation of what we have developed
2) movement towards OGC (Open Geospatial Consortium) XML standards and service protocols (WMS/WFS, Catalog Service, Sensor Web specifications (SensorML, Observations & Measurements (O&M)) and Services)
3) movement towards industry XML standards and service protocols (Web services using SOAP, UDDI, WSDL)

We want to continue to support the existing SEACOOS netCDF convention and data aggregation process, as it is attractive to providers who want to supply data in this format and it works with other tools that use the netCDF format.

Moving towards the use of XML representations and validation, it should be possible to use tools provided by Unidata (either NcML (netCDF Markup Language) or the netCDF Java API) to convert the netCDF metadata header into its corresponding XML representation. SEACOOS could provide an XML representation of its data dictionary and an XML Schema Definition (XSD), which would allow more standard representation and programmatic validation of netCDF metadata and conventions. netCDF metadata headers could be placed within an additional FGDC record field in their ASCII representation (using the ncdump -h command), their XML representation, or both. One stipulation about representing netCDF metadata within other record formats such as FGDC is that, for near real-time data sets, this would probably be more of a template record than an active one: the current near real-time netCDF data sets for SEACOOS are composed of many small hourly files instead of one long archival one.

For OGC compatibility, the SensorML specification (platforms and sensors) has been paired with the Observations & Measurements recommendation (the observed data itself), so the two should be considered in conjunction regarding platform and data representation.
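To make the idea of programmatic validation of an XML header more concrete, here is a minimal Python sketch. It assumes the netCDF header has already been converted to NcML text, and it uses only the standard library rather than an XSD. The sample header, the dictionary excerpt and its unit strings, and the function names are all hypothetical illustrations, not part of the SEACOOS convention. The format category is inferred from variable dimensions in the way the format category table lays them out, assuming every data variable in a file shares one shape.

```python
import xml.etree.ElementTree as ET

NCML_NS = {"nc": "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"}
COORDS = ("time", "lon", "lat", "z")

# Hypothetical data dictionary excerpt; the unit strings are placeholders.
DATA_DICTIONARY = {
    "T": {"units": "celsius"},
    "u": {"units": "m s-1"},
    "v": {"units": "m s-1"},
}

# Hypothetical NcML header for a fixed profiler (time and depth bins).
SAMPLE_NCML = """
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <dimension name="time" length="24"/>
  <dimension name="z" length="10"/>
  <dimension name="lon" length="1"/>
  <dimension name="lat" length="1"/>
  <variable name="time" shape="time" type="int"/>
  <variable name="lon" shape="lon" type="float"/>
  <variable name="lat" shape="lat" type="float"/>
  <variable name="z" shape="z" type="float"/>
  <variable name="T" shape="time z" type="float">
    <attribute name="units" value="celsius"/>
  </variable>
  <variable name="u" shape="time z" type="float"/>
</netcdf>
"""


def read_header(ncml_text):
    """Return {variable name: (dimension tuple, attribute dict)}."""
    root = ET.fromstring(ncml_text)
    header = {}
    for var in root.findall("nc:variable", NCML_NS):
        shape = tuple(var.get("shape", "").split())
        attrs = {a.get("name"): a.get("value")
                 for a in var.findall("nc:attribute", NCML_NS)}
        header[var.get("name")] = (shape, attrs)
    return header


def format_category(header):
    """Infer the format category from coordinate and data dimensions."""
    moving = header["lon"][0] == ("time",)
    data_shapes = [shape for name, (shape, _) in header.items()
                   if name not in COORDS]
    shape = data_shapes[0]  # assumption: all data variables share one shape
    if moving and header["z"][0] == ("time",):
        return "moving-point-3D"
    if shape == ("time",):
        kind = "point-2D" if moving else "point"
    elif shape == ("time", "z"):
        kind = "profiler"
    elif len(shape) == 3 and not moving:
        kind = "map"
    else:
        raise ValueError(f"unrecognized layout: {shape}")
    return ("moving-" if moving else "fixed-") + kind


def check_header(header, dictionary):
    """List data variables that are unknown or whose units do not match."""
    problems = []
    for name, (_, attrs) in header.items():
        if name in COORDS:
            continue
        entry = dictionary.get(name)
        if entry is None:
            problems.append(f"{name}: not in data dictionary")
        elif attrs.get("units") != entry["units"]:
            problems.append(
                f"{name}: units {attrs.get('units')!r}, "
                f"expected {entry['units']!r}")
    return problems


if __name__ == "__main__":
    header = read_header(SAMPLE_NCML)
    print(format_category(header))
    print(check_header(header, DATA_DICTIONARY))
```

In practice an XSD, as suggested above, would express the same checks declaratively; the sketch only shows that the header information needed for them is available once it is in XML form.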
We would also like to map the observations to MarineXML, since this may be useful as an additional community XML standard (apparently there is a US Navy MarineXML and an Australian MarineXML; the differences between them need clarification).

Our past focus has been on using the SEACOOS netCDF convention to collect observations, with minimal metadata on the surrounding platforms and sensors. The current MetaDoor tool is being developed to gather this surrounding platform and sensor metadata and to export it to various record formats such as FGDC, SensorML, O&M, and MarineXML. The SEACOOS netCDF convention and the MetaDoor tool will probably converge to some degree, either with MetaDoor automatically pulling and populating data from the netCDF files using the methods mentioned above, or possibly with MetaDoor generating the correct netCDF template for given entries.

Another bridge we would like to have in place is a service that translates between the current netCDF convention and an OGC WFS-type URL request for XML data records. Currently we aggregate SEACOOS netCDF files into a centralized database that supports these WFS-style requests, but an outside service or piece of software that supports WFS-style queries run directly against the data provider would be desirable. DM Solutions is currently testing a similar solution called OGC Publisher, which is designed to support WFS requests against CSV (comma-separated value) files or relational databases (but not netCDF files as of yet). It would also be ideal if DODS/OPeNDAP could better support WMS/WFS-style requests on their middleware servers to answer this need.
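For readers unfamiliar with WFS-style URL requests, the following Python sketch shows what one looks like and how a translating service might unpack it. It covers only the request-handling half; reading the matching records out of netCDF files is left out. The endpoint URL, the feature type name `sst_obs`, and the function names are hypothetical. The parameter names (SERVICE, VERSION, REQUEST, TYPENAME, BBOX) are the standard WFS key-value parameters.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_getfeature_url(base_url, type_name, bbox, version="1.0.0"):
    """Build a WFS GetFeature request limited to a bounding box.

    bbox is (west, south, east, north) in decimal degrees.
    """
    params = {
        "SERVICE": "WFS",
        "VERSION": version,
        "REQUEST": "GetFeature",
        "TYPENAME": type_name,
        "BBOX": ",".join(str(value) for value in bbox),
    }
    # Keep commas readable in the BBOX value instead of percent-encoding them.
    return f"{base_url}?{urlencode(params, safe=',')}"


def parse_getfeature_url(url):
    """Extract the feature type and bounding box from a GetFeature URL."""
    query = parse_qs(urlsplit(url).query)
    # Parameter names are not case sensitive, so normalize the keys.
    params = {key.upper(): values[0] for key, values in query.items()}
    if params.get("REQUEST") != "GetFeature":
        raise ValueError("not a GetFeature request")
    west, south, east, north = (float(v) for v in params["BBOX"].split(","))
    return {"type_name": params["TYPENAME"],
            "bbox": (west, south, east, north)}


if __name__ == "__main__":
    # Hypothetical provider endpoint and feature type name.
    url = build_getfeature_url(
        "http://example.org/cgi-bin/wfs", "sst_obs", (-82.0, 24.0, -75.0, 36.0))
    print(url)
    print(parse_getfeature_url(url))
```

A translating service would then use the parsed feature type and bounding box to select the relevant netCDF files and variables and return the matching records as XML.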
An additional note regarding the choice and use of a data dictionary: while initial choices are important, the ability to cross-walk between current and future dictionaries (ontological frameworks, XML/RDF/OWL) is more critical to the long-term ability to continuously map data between old and new efforts.

The identification of metadata elements can build on other resources such as:
The SEACOOS data dictionary http://www.carocoops.org/seacoos_dd
also http://nautilus.baruch.sc.edu/twiki_dmcc/pub/Main/WebHome/seacoos_data_dictionary_ver1.3_Upd.xls
the Marine metadata workshop http://mmug.calfish.org/MM_report.pdf
others?

Deliverable: data dictionary for each data set (and perhaps for each collection platform/sensor)
Deadline: February 15, 2005

Task 2. Incorporate the dictionary in a NetCDF standard (e.g., extend SEACOOS CDL if appropriate)… what to do about data sets not well-represented in NetCDF??

Discussed above with regard to the collection models and the ability to add more model types and examples where needed.

the SEACOOS CDL
http://nautilus.baruch.sc.edu/twiki_dmcc/bin/view/Main/WebHome
http://nautilus.baruch.sc.edu/twiki_dmcc/pub/Main/WebHome/SEACOOSNetCDFStandardv2.0.doc

Task 3. Develop FGDC-compliant metadata template for these data sets, using existing tools like MERMAID and MetaDoor…more…

Questions/comments:

Meta Door can create FGDC, Marine XML and Sensor ML compliant files. Are these files always separate? Do we need to find a way to extend the FGDC standard, just informally (for our purposes), to aggregate the information in these files? It may be best to keep the sensor metadata separate and reference it from the FGDC metadata record (though I always worry about referencing things in other documents…it seems the information can get separated permanently).

MetaDoor can currently create FGDC files, and we are developing support for SensorML and MarineXML files.
The FGDC standard allows extended fields for additional metadata, and this is where we could provide both a link to the active data (which may be updated by the minute) and a copy of the templates for this data (data which for the most part would not change in layout or values).

Is there a way to automatically dump the metadata from the NetCDF header into MetaDoor? There are Java tools available to extract metadata from NetCDF files…not sure how good they are. Would like to see as much automatic generation of metadata records as possible.

From the discussion above, more automation should be possible, keeping in mind that some metadata (such as number of observations) may be specified as ‘ongoing’ where data is being actively collected or changed.

Is the FGDC catalog submission a separate step using MetaDoor?

MetaDoor submits metadata to the ISite server when a user publishes a metadata record, but this can be automated if the metadata is generated automatically. Publishing in MetaDoor is just dropping the metadata document in the ISite index directory.

A quick description and diagram of OGC services is referenced here: http://nautilus.baruch.sc.edu/twiki_dmcc/bin/view/Main/OGCInfo