Workplan for the COTS Metadata Working Group
DRAFT - November 23, 2004
Background and purpose:
At the November 16-17, 2004 COTS meeting, a small working group was designated to
develop a workplan and milestones over the next 6-12 months to enhance metadata
development and implementation for the Interoperability II demonstration. The
Interoperability II demonstration will expand upon the Interoperability I demonstration,
which included sea surface temperature and winds from various COTS programs, by
adding surface currents and chlorophyll. In particular, the group is to address outstanding
issues such as a standard vocabulary and metadata content (e.g., fields).
The following task list, with deliverables and timeline, fulfills the charge to the group.
Tasks:
Task 1. Establish subgroups of domain experts to develop a data dictionary, based on
existing efforts, for the data sets that will be used in Interoperability II. The data
dictionary should provide the standard name, abbreviation, definition, units, and scope
(the domain and range, more here…) of the metadata elements necessary to describe the
data sets.
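As a minimal sketch of what one dictionary entry might hold, the Python below models the fields listed in Task 1 and uses the range part of "scope" to check an observed value. The class name, the field names `domain`, `valid_min` and `valid_max`, and the sample SST values are illustrative assumptions, not SEACOOS dictionary content.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DictionaryEntry:
    """One data dictionary element with the fields Task 1 asks for."""
    standard_name: str
    abbreviation: str
    definition: str
    units: str
    domain: str        # where the element applies, e.g. "sea surface"
    valid_min: float   # lower bound of the allowed range
    valid_max: float   # upper bound of the allowed range

    def check(self, value, units):
        """Return a list of problems with one observed value (empty if none)."""
        problems = []
        if units != self.units:
            problems.append(f"{self.abbreviation}: units {units!r}, expected {self.units!r}")
        if not self.valid_min <= value <= self.valid_max:
            problems.append(
                f"{self.abbreviation}: {value} outside [{self.valid_min}, {self.valid_max}]"
            )
        return problems


# Illustrative entry only: these values are assumptions, not SEACOOS content.
SST = DictionaryEntry(
    standard_name="sea_surface_temperature",
    abbreviation="sst",
    definition="Temperature of sea water near the surface",
    units="degree_Celsius",
    domain="sea surface",
    valid_min=-2.0,
    valid_max=40.0,
)

print(SST.check(18.5, "degree_Celsius"))  # []
print(SST.check(55.0, "K"))               # two problems: units and range
```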
Data sets:
May need more detail here…
Sea surface temperature – satellite, models, in situ – e.g., CTD casts, moored CTD
Winds – satellite, models, in situ
Surface currents - HF radar, models, satellite, in situ (e.g., ADCP)
Chlorophyll - satellite, models, in situ
The method of data collection for a specific variable may affect the metadata fields
necessary to describe the data set. So a sea surface temperature dataset, for example,
may need different metadata fields depending on the data collection method used. Or do
we address this through a separate metadata record using SensorML?
For current SEACOOS efforts, the SEACOOS netCDF convention documentation includes
example netCDF representations for several of the ‘usual’ collection subtypes:
Table [XX]. SEACOOS CDL Format Categories

Format category (examples) | Independent variables, Cartesian | Independent variables, netCDF | Dependent variables, Cartesian | Dependent variables, netCDF
fixed-point (buoy, tower) | t = t1,t2,t3; x0, constant; y0, constant; z0, constant(s) | time(time), lon(lon), lat(lat), z(z) | T(x0, y0, z0, t), u(x0, y0, z0, t), v(x0, y0, z0, t) | T(time), u(time), v(time)
fixed-profiler (wind profiler, ADCP) | t = t1,t2,t3; x0, constant; y0, constant; z = z1,z2,z3 | time(time), lon(lon), lat(lat), z(z) | T(x0, y0, z, t), u(x0, y0, z, t), v(x0, y0, z, t) | T(time,z), u(time,z), v(time,z)
fixed-map (HF Radar, Satellite Imagery) | t = t1,t2,t3; x = x1,x2,x3; y = y1,y2,y3; z0, constant | time(time), lon(lon), lat(lat), z(z) | T(x, y, z0, t), u(x, y, z0, t), v(x, y, z0, t) | T(time,y,x), u(time,y,x), v(time,y,x)
moving-point-2D (ship, floating drifter) | t = t1,t2,t3; x = x(t); y = y(t); z0, constant(s) | time(time), lon(time), lat(time), z(z) | T(x(t), y(t), z0, t), u(x(t), y(t), z0, t), v(x(t), y(t), z0, t) | T(time), u(time), v(time)
moving-point-3D (aircraft, towed undulating vehicle, sea glider, lagrangian drifter) | t = t1,t2,t3; x = x(t); y = y(t); z = z(t) | time(time), lon(time), lat(time), z(time) | T(x(t), y(t), z(t), t), u(x(t), y(t), z(t), t), v(x(t), y(t), z(t), t) | T(time), u(time), v(time)
moving-profiler (ship-mounted ADCP, CTD surveys) | t = t1,t2,t3; x = x(t); y = y(t); z = z1,z2,z3 | time(time), lon(time), lat(time), z(z) | T(x(t), y(t), z, t), u(x(t), y(t), z, t), v(x(t), y(t), z, t) | T(time,z), u(time,z), v(time,z)
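As a sketch of how software could tell these categories apart from a file header, the function below infers the format category from the dimensions of a dependent variable and of the lon and z coordinates, following the netCDF columns of Table [XX]. The dimension names (time, z, y, x, lon) are taken from the table; real files may name them differently, and this is an illustration rather than how the SEACOOS scripts work.

```python
def format_category(data_dims, lon_dims, z_dims):
    """Infer the SEACOOS format category from netCDF variable dimensions.

    data_dims: dimensions of a dependent variable such as T, e.g. ("time", "z")
    lon_dims:  dimensions of the lon coordinate: ("lon",) if fixed, ("time",) if moving
    z_dims:    dimensions of the z coordinate: ("z",) or ("time",)
    """
    moving = tuple(lon_dims) == ("time",)
    data_dims = tuple(data_dims)
    if data_dims == ("time",):
        if not moving:
            return "fixed-point"
        # z varying with time distinguishes 3D from 2D moving platforms
        return "moving-point-3D" if tuple(z_dims) == ("time",) else "moving-point-2D"
    if data_dims == ("time", "z"):
        return "moving-profiler" if moving else "fixed-profiler"
    if len(data_dims) == 3 and data_dims[0] == "time" and not moving:
        return "fixed-map"  # e.g. T(time, y, x)
    raise ValueError(f"no SEACOOS format category matches {data_dims}")


print(format_category(("time", "z"), ("time",), ("z",)))  # moving-profiler
```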
These representation types have met our needs thus far and could be extended with
additional types and examples if needed. Charlton has Perl scripts (listed at
http://nautilus.baruch.sc.edu/seacoos_data/CSV/data_scout/) that use the expected
convention attributes to aggregate these files into the centralized relational
database.
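The aggregation itself is done by Charlton's Perl scripts; purely to illustrate the idea, the Python sketch below flattens a fixed-point record and a moving-point-2D record into one row per observation in a relational table (here an in-memory SQLite database). The table layout, column names, and platform names are hypothetical.

```python
import sqlite3


def fixed_point_rows(platform, lon, lat, times, values):
    """fixed-point: lon(lon) and lat(lat) hold a single position for all times."""
    return [(platform, t, lon, lat, v) for t, v in zip(times, values)]


def moving_point_rows(platform, times, lons, lats, values):
    """moving-point-2D: lon(time) and lat(time) change with every observation."""
    return [(platform, t, x, y, v) for t, x, y, v in zip(times, lons, lats, values)]


# Hypothetical table layout: one row per observation.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (platform TEXT, time REAL, lon REAL, lat REAL, sst REAL)")
conn.executemany(
    "INSERT INTO obs VALUES (?, ?, ?, ?, ?)",
    fixed_point_rows("buoy_a", -79.1, 32.5, [0.0, 3600.0], [21.4, 21.6])
    + moving_point_rows("ship_b", [0.0, 60.0], [-80.0, -79.9], [32.0, 32.1], [22.0, 22.1]),
)
count = conn.execute("SELECT COUNT(*) FROM obs").fetchone()[0]
print(count)  # 4
```

Once every category is reduced to the same row shape, the database can answer queries without knowing which netCDF layout the data arrived in.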
I think there are a handful of design directions we would like to move towards.
These are:
1) continuing support and documentation of what we have developed
2) movement towards OGC (Open Geospatial Consortium) XML standards and service
protocols (WMS/WFS, Catalog Service, Sensor Web specs (SensorML,
Observations & Measurements (O&M)) and Services)
3) movement towards industry XML standards and service protocols (Web services using
SOAP, UDDI, WSDL)
We want to continue to support the existing SEACOOS netCDF convention and data
aggregation process, because it is attractive to providers who want to supply data in this
format for use with other tools that work with netCDF.
Moving towards XML representations and validation, it should be possible to
use tools provided by Unidata (either NcML (netCDF Markup Language) or the netCDF
Java API) to convert the netCDF metadata header into its corresponding XML
representation. SEACOOS could provide an XML representation of its data dictionary
and an XML Schema Definition (XSD), which would allow a more standard representation
and programmatic validation of netCDF metadata and conventions. netCDF metadata
headers could be placed within an additional FGDC record field in their ASCII form
(using the ncdump -h command) and/or their XML representation.
One caveat about representing netCDF metadata within other record formats such as
FGDC, though, is that for near real-time datasets this would probably be more of a
template record than an active one. The current near real-time netCDF datasets for
SEACOOS are composed of many small hourly files instead of one long archival one.
For OGC compatibility, the SensorML spec (platforms and sensors) has been paired with
the Observations & Measurements recommendation (the observed data itself), so the two
should be considered together for platform and data representation. We would also
like to map the observations to MarineXML, since it may be useful as an additional
community XML standard (apparently there is both a US Navy MarineXML and an
Australian MarineXML; we need better clarification of how they differ).
Our past focus has been on using the seacoos netCDF convention to collect observations
with minimal metadata on the surrounding platforms and sensors. The development of
the current MetaDoor tool is meant to gather additional metadata about the platforms
and sensors and to export it to various output record formats such as FGDC, SensorML,
O&M, and MarineXML. The SEACOOS netCDF
convention and MetaDoor tool will probably converge to some degree, either in
MetaDoor being able to automatically pull and populate data from the netCDF using the
methods mentioned above, or possibly MetaDoor generating the correct netCDF template
for given entries.
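One way the netCDF/MetaDoor convergence could work in both directions is a single attribute-to-field mapping table, used forward to prefill a MetaDoor form from netCDF global attributes and backward to produce a netCDF attribute template from form entries. This is a sketch only: the attribute names and the MetaDoor field names in `ATTR_TO_FIELD` are hypothetical, not taken from the SEACOOS convention or from MetaDoor.

```python
# Hypothetical mapping: neither side is taken from the SEACOOS convention
# or from MetaDoor; both would come from the real documentation.
ATTR_TO_FIELD = {
    "title": "title",
    "institution": "originator",
    "contact": "point_of_contact",
    "platform_type": "platform_type",
}


def prefill_form(global_attrs, mapping=ATTR_TO_FIELD):
    """netCDF -> MetaDoor: prefill form fields, report fields still to be typed in."""
    filled = {field: global_attrs[attr]
              for attr, field in mapping.items() if attr in global_attrs}
    missing = sorted(field for attr, field in mapping.items()
                     if attr not in global_attrs)
    return filled, missing


def netcdf_template(form_fields, mapping=ATTR_TO_FIELD):
    """MetaDoor -> netCDF: turn form entries back into global attributes."""
    field_to_attr = {field: attr for attr, field in mapping.items()}
    return {field_to_attr[f]: v for f, v in form_fields.items() if f in field_to_attr}


filled, missing = prefill_form({"title": "Example buoy", "institution": "Example Univ."})
print(filled)   # {'title': 'Example buoy', 'originator': 'Example Univ.'}
print(missing)  # ['platform_type', 'point_of_contact']
```

Keeping one table for both directions means the two representations cannot drift apart as fields are added.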
Another bridge we would like to have in place is a service to translate between the
current netCDF convention and an OGC WFS-type URL request for XML data records.
Currently we aggregate SEACOOS netCDF files into a centralized database which supports
these WFS-style requests, but an outside service or piece of software that runs
WFS-style queries directly against the data provider would be desirable. DM Solutions is
currently testing a similar solution called OGC Publisher, which is designed to
support WFS requests from CSV (comma-separated value) files or relational
databases (but not yet from netCDF files). It would also be ideal if DODS/OPeNDAP
could better support WMS/WFS-style requests on their middleware servers to meet this
need.
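To make the translation idea concrete, the sketch below parses a WFS GetFeature request given as key-value URL parameters and answers its bounding-box filter from observation records that have already been read out of netCDF files. It assumes WFS 1.0.0 conventions (BBOX as minx,miny,maxx,maxy in lon/lat order); the endpoint URL, the `sst_obs` type name, and the records are hypothetical, and a real service would also have to encode the response as GML.

```python
from urllib.parse import parse_qs, urlsplit


def parse_getfeature(url):
    """Return (typename, bbox) from a WFS 1.0.0-style GetFeature KVP URL."""
    # OGC KVP parameter names are case-insensitive, so normalise them.
    params = {k.upper(): v[0] for k, v in parse_qs(urlsplit(url).query).items()}
    is_wfs = params.get("SERVICE", "").upper() == "WFS"
    if not is_wfs or params.get("REQUEST", "").lower() != "getfeature":
        raise ValueError("not a WFS GetFeature request")
    bbox = None
    if "BBOX" in params:
        # minx,miny,maxx,maxy, i.e. lon/lat order for geographic coordinates
        bbox = tuple(float(c) for c in params["BBOX"].split(",")[:4])
    return params.get("TYPENAME"), bbox


def answer(records, bbox):
    """Filter observation records (dicts with lon/lat) by the requested box."""
    if bbox is None:
        return list(records)
    minx, miny, maxx, maxy = bbox
    return [r for r in records
            if minx <= r["lon"] <= maxx and miny <= r["lat"] <= maxy]


url = ("http://example.org/wfs?SERVICE=WFS&VERSION=1.0.0&REQUEST=GetFeature"
       "&TYPENAME=sst_obs&BBOX=-81,31,-78,34")
records = [{"lon": -79.1, "lat": 32.5, "sst": 21.4},
           {"lon": -75.0, "lat": 35.0, "sst": 19.0}]
typename, bbox = parse_getfeature(url)
print(typename, answer(records, bbox))  # only the first record is inside the box
```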
An additional note regarding the choice and use of a data dictionary: while initial
choices are important, the ability to cross-walk between current and future dictionaries
(ontological frameworks, XML/RDF/OWL) is more critical to the long-term ability to
continuously map data between old and new efforts.
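A minimal sketch of why cross-walks matter more than the initial choice: if each dictionary is linked to its neighbor by a term-to-term mapping, an old vocabulary can be carried forward by composing mappings, and a mapping can only be reversed when it is one-to-one. The local names and the `ocean:` terms below are hypothetical placeholders; the middle vocabulary uses CF standard names. Richer RDF/OWL frameworks would replace these plain dictionaries.

```python
# Illustrative crosswalks. The local names on the left are hypothetical;
# the targets are CF standard names.
LOCAL_TO_CF = {
    "sst": "sea_surface_temperature",
    "u_current": "eastward_sea_water_velocity",
    "v_current": "northward_sea_water_velocity",
    "chl": "mass_concentration_of_chlorophyll_in_sea_water",
}
# A second, equally hypothetical, crosswalk to some future vocabulary.
CF_TO_FUTURE = {
    "sea_surface_temperature": "ocean:SeaSurfaceTemperature",
    "mass_concentration_of_chlorophyll_in_sea_water": "ocean:Chlorophyll",
}


def compose(first, second):
    """Chain two crosswalks; terms with no match in `second` are dropped."""
    return {a: second[b] for a, b in first.items() if b in second}


def invert(crosswalk):
    """Reverse a crosswalk, refusing many-to-one mappings that cannot be undone."""
    inverse = {}
    for src, dst in crosswalk.items():
        if dst in inverse:
            raise ValueError(f"{dst!r} is mapped from both {inverse[dst]!r} and {src!r}")
        inverse[dst] = src
    return inverse


local_to_future = compose(LOCAL_TO_CF, CF_TO_FUTURE)
print(local_to_future)
# {'sst': 'ocean:SeaSurfaceTemperature', 'chl': 'ocean:Chlorophyll'}
unmapped = sorted(set(LOCAL_TO_CF) - set(local_to_future))
print(unmapped)  # ['u_current', 'v_current']
```

The `unmapped` list shows exactly which terms still need a mapping in the new vocabulary before old data can be carried over.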
The identification of metadata elements can build from other resources such as:
- The SEACOOS data dictionary http://www.carocoops.org/seacoos_dd, also
http://nautilus.baruch.sc.edu/twiki_dmcc/pub/Main/WebHome/seacoos_data_dictionary_ver1.3_Upd.xls
- The Marine Metadata workshop report http://mmug.calfish.org/MM_report.pdf
- others?
Deliverable: data dictionary for each data set (and perhaps for each collection
platform/sensor)
Deadline: February 15, 2005
Task 2. Incorporate the dictionary into a netCDF standard (e.g., extend the SEACOOS CDL
if appropriate)…
What to do about data sets not well represented in netCDF?
This is discussed above with regard to the format categories and the ability to add more
types and examples where needed.
- The SEACOOS CDL: http://nautilus.baruch.sc.edu/twiki_dmcc/bin/view/Main/WebHome
- http://nautilus.baruch.sc.edu/twiki_dmcc/pub/Main/WebHome/SEACOOSNetCDFStandardv2.0.doc
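As a sketch of how dictionary entries might be carried into a netCDF standard, the function below emits a CDL header for the fixed-point category of Table [XX], attaching each entry's standard name and units as variable attributes. The attribute names (`standard_name`, `units`), the time units string, and the example entry are CF-style assumptions, not necessarily what the SEACOOS netCDF Standard v2.0 prescribes; the output is plain CDL that a tool such as ncgen could turn into a netCDF file.

```python
def fixed_point_cdl(name, entries):
    """Emit a CDL header for a fixed-point dataset.

    entries: (abbreviation, standard_name, units) rows from the data dictionary.
    """
    lines = [
        f"netcdf {name} {{",
        "dimensions:",
        "\ttime = UNLIMITED ;",
        "\tlon = 1 ;",
        "\tlat = 1 ;",
        "\tz = 1 ;",
        "variables:",
        "\tdouble time(time) ;",
        '\t\ttime:units = "seconds since 1970-01-01 00:00:00" ;',
        "\tfloat lon(lon) ;",
        '\t\tlon:units = "degrees_east" ;',
        "\tfloat lat(lat) ;",
        '\t\tlat:units = "degrees_north" ;',
        "\tfloat z(z) ;",
        '\t\tz:units = "m" ;',
    ]
    # One data variable per dictionary entry, dimensioned by time only.
    for abbrev, std_name, units in entries:
        lines += [
            f"\tfloat {abbrev}(time) ;",
            f'\t\t{abbrev}:standard_name = "{std_name}" ;',
            f'\t\t{abbrev}:units = "{units}" ;',
        ]
    lines.append("}")
    return "\n".join(lines)


cdl = fixed_point_cdl("example_buoy", [("sst", "sea_surface_temperature", "degree_Celsius")])
print(cdl)
```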
Task 3. Develop an FGDC-compliant metadata template for these data sets, using existing
tools like MERMAID and MetaDoor…more…
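For near real-time datasets an FGDC template cannot state a fixed end date. A minimal sketch, assuming the FGDC CSDGM short element names (`timeperd`, `rngdates`, `begdate`, `enddate`, `current`, `status`, `progress`, `update`), builds the time-period and status elements so that an ongoing collection is recorded as ending at "Present" with progress "In work". In a complete record both elements sit under `idinfo`; the begin date is an example value.

```python
import xml.etree.ElementTree as ET
from datetime import date


def time_period(begin, end=None):
    """Build CSDGM <timeperd> and <status> elements.

    end=None marks an ongoing near real-time collection.
    """
    ongoing = end is None
    timeperd = ET.Element("timeperd")
    rngdates = ET.SubElement(ET.SubElement(timeperd, "timeinfo"), "rngdates")
    ET.SubElement(rngdates, "begdate").text = begin.strftime("%Y%m%d")
    ET.SubElement(rngdates, "enddate").text = "Present" if ongoing else end.strftime("%Y%m%d")
    ET.SubElement(timeperd, "current").text = "ground condition"

    status = ET.Element("status")
    ET.SubElement(status, "progress").text = "In work" if ongoing else "Complete"
    ET.SubElement(status, "update").text = "Continually" if ongoing else "None planned"
    return timeperd, status


timeperd, status = time_period(date(2004, 6, 1))
print(ET.tostring(timeperd, encoding="unicode"))
print(ET.tostring(status, encoding="unicode"))
```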
Questions/comments:
- MetaDoor can create FGDC, MarineXML and SensorML compliant files.
Are these files always separate? Do we need to find a way to extend the
FGDC standard, just informally (for our purposes), to aggregate the
information in these files? It may be best to keep the sensor metadata
separate and reference it from the FGDC metadata record (though I always
worry about referencing things to other documents… it seems the
information can get separated permanently).
- MetaDoor can currently create FGDC files, and we are developing support
for SensorML and MarineXML files. The FGDC standard allows extended fields for
additional metadata; this is where we could provide both a link to the
active data (which may be updated by the minute) and a copy of the
templates for this data (which for the most part would not change in
layout or values).
Is there a way to automatically dump the metadata from the netCDF header
into MetaDoor? There are Java tools available to extract metadata from
netCDF files… not sure how good they are. Would like to see as much
automatic generation of metadata records as possible.
From the discussion above, more automation should be possible, keeping in
mind that some metadata (such as the number of observations) may be specified as
‘ongoing’ where data is being actively collected or changed.
Is the FGDC catalog submission a separate step using MetaDoor?
MetaDoor submits metadata to the ISite server when a user publishes a
metadata record, but this step can be automated if the metadata is generated
automatically.
Publishing in MetaDoor is just dropping the metadata document in the
ISite index directory.
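Because publishing amounts to placing a file in the ISite index directory, automated metadata generation could publish by writing there directly. The sketch below writes to a temporary file in the same directory and renames it into place, so a partially written record is never visible under its final name (the rename is atomic on POSIX filesystems). The directory path, the `record_id` naming, and the assumption that the indexer ignores `.tmp` files are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def publish(xml_text, record_id, index_dir):
    """Drop a metadata record into the index directory without partial files.

    The record is written to a temporary file in the same directory and then
    renamed, so a process scanning for *.xml never sees half-written output.
    """
    index_dir = Path(index_dir)
    index_dir.mkdir(parents=True, exist_ok=True)
    target = index_dir / f"{record_id}.xml"
    fd, tmp_path = tempfile.mkstemp(dir=index_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(xml_text)
        os.replace(tmp_path, target)  # rename into place
    except BaseException:
        os.unlink(tmp_path)  # do not leave the temporary file behind
        raise
    return target


# Usage with a throwaway directory standing in for the ISite index directory.
with tempfile.TemporaryDirectory() as demo_dir:
    path = publish("<metadata/>", "buoy_a", demo_dir)
    print(path.name, path.read_text(encoding="utf-8"))  # buoy_a.xml <metadata/>
```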
A quick description and diagram of OGC services is referenced here
http://nautilus.baruch.sc.edu/twiki_dmcc/bin/view/Main/OGCInfo