Describing data through metadata and XML: the CSML

advertisement
Describing data through
metadata and XML: the CSML
experience.
Bryan Lawrence1
Andrew Woolf2, Roy Lowry3, Kerstin Kleese van Dam2, Ray Cramer3,
Marta Gutierrez1, Siva Kondapalli3, Susan Latham1, Dominic Lowe1,
Kevin O’Neill2, Ag Stephens1
… and Michael Burek4
1British
Atmospheric Data Centre
2CCLRC e-Science Centre
3British Oceanographic Data Centre
4National Center for Atmospheric Research
+
+
+
+
+[
BADC, BODC, CCLRC, PML and SOC
]=
Outline
• Motivation
– The NERC Datagrid
– Metadata Taxonomy
• Standards
– OGC Vision
– Feature Types
• CSML
– Description
– Prototyping in MarineXML
– Round-Tripping
• The importance of CF
• Feature Type Issues
• Next Steps
• NDG Discovery Demo
WMO Metadata, Sep, 2005
Complexity + Volume + Remote Access = Grid Challenge
British Atmospheric Data Centre
NCAR
British Oceanographic Data Centre
WMO Metadata, Sep, 2005
http://ndg.nerc.ac.uk
Internet Link
tape
robot
Online
Data
XML database
BADC NDG Wrapper
Online
Data
Online
Data
XML database
XML database
BODC NDG
Wrapper
Group NDG
Wrapper
Wider Internet
NERC Grid
Software Agent
Grid User
ESG (&other)
Applications
Satellite
Supercomputer
Research Group Data
Sources
Wider Internet
NDG
Web
Portal
Internet User
Internet Link
XML database
WMO Metadata, Sep, 2005
NDG Metadata Taxonomy
CSML
NCML+CF
MOLES
THREDDS
CLADDIER
DIF ->
ISO19115
… not one
schema,
not one
solution!
WMO Metadata, Sep, 2005
NDG
Things I’m not going to talk about:
• NDG PKI based secure access to datasets
– Supporting authentication/authorisation across
differing virtual organisations without centralised
user or role databases.
– Built on bilateral trust relationships
• NDG Discovery
– Portal already provides access to hundreds of
datasets with international scope
– Open Archives Initiative Harvesting
– Web Service Interfaces
• NDG Browse
– New method of “walking” around datasets.
WMO Metadata, Sep, 2005
OGC Web Feature Service
public  private boundary
HTML
WFS
Client
GML
WFS
Server
Data-source organised for
custodian’s requirements
Community-specific GML application language
– TigerGML, LandGML, O&M, XMML, CGI-GML, ADX,
GPML, CSML, MarineXML etc
Slide adapted from Simon Cox (AUKEGGS, 2005)
WMO Metadata, Sep, 2005
Standard transfer format allows multiple data
sources
WFS
Server
B
WFS
Client
WFS
Server
WFS
Server
C
Slide adapted from Simon Cox (AUKEGGS, 2005)
WMO Metadata, Sep, 2005
NetCDF + WMS
e.g.: ERA40 re-analysis surface air temperature, 2001-04-27
– deegree open-source WMS modified with netCDF connector
Overlaid with rainfall from
globe.digitalearth.gov WMS server
WMO Metadata, Sep, 2005
NB: Now using Mapserver for
Interoperability experiments
Standards
• ISO 19101:
Geographic
information –
Reference
model
…in a defined
logical
structure…
…delivered
through
services…
…and described by
metadata.
A geospatial dataset…
…consists of
features and
related objects…
WMO Metadata, Sep, 2005
Standards
• Geographic ‘features’
– “abstraction of real world
phenomena” [ISO 19101]
– Type or instance
– Encapsulate important
semantics in universe of
discourse
• Application schema
– Defines semantic content
and logical structure of
datasets
– ISO standards provide
toolkit:
• spatial/temporal
referencing
• geometry (1-, 2-, 3-D)
• topology
• dictionaries (phenomena,
units, etc.)
– GML – canonical encoding
[from ISO 19109 “Geographic information –
Rules for Application Schema”]
WMO Metadata, Sep, 2005
Standards
• Emerging ISO standards
– TC211 – around 40 standards for geographic
information
– Cover activity spectrum: discovery  access  use
– Provide a framework for data integration
Feature types
Application schema
...
CTDProfile
startTime[1]
bottomTime[1]
salinity[*]
temperature[*]
depth[*]
...
ISO 19103
...
ISO 19109
ISO 19110
Universe of discourse
FTC
WMO Metadata, Sep, 2005
<?xml version="1.0" encoding="UTF-8"?>
<schema targetNamespace="http://ndg.nerc.ac.uk/csml" xmlns="http://www.w3.org/2001/XMLSchema" xmlns:csml="http://ndg.nerc.ac.uk/csml"
xmlns:om="http://www.opengis.net/om" xmlns:gml="http://www.opengis.net/gml" elementFormDefault="qualified" attributeFormDefault="unqualified" version="0.1">
<annotation>
<documentation>CSML application schema</documentation>
</annotation>
<!--====================================================================== -->
<import namespace="http://www.opengis.net/gml" schemaLocation="GML-3.1.0/base/gml.xsd"/>
<import namespace="http://www.opengis.net/om" schemaLocation="phenomenon.xsd"/>
<!--====================================================================== -->
<!--===== Root element for CSML dataset =====-->
<!--====================================================================== -->
<complexType name="DatasetType">
<complexContent>
<extension base="gml:AbstractGMLType">
<sequence>
<element ref="csml:UnitDefinitions" minOccurs="0" maxOccurs="unbounded"/>
<element ref="csml:ReferenceSystemDefinitions" minOccurs="0" maxOccurs="unbounded"/>
<element ref="csml:PhenomenonDefinitions" minOccurs="0"/>
<element ref="csml:_ArrayDescriptor" minOccurs="0" maxOccurs="unbounded"/>
<element ref="gml:FeatureCollection" minOccurs="0" maxOccurs="unbounded"/>
</sequence>
</extension>
</complexContent>
</complexType>
<element name="Dataset" type="csml:DatasetType"/>
<!--====================================================================== -->
<!--===== Dictionary/definition elements =====-->
<!--====================================================================== -->
<complexType name="ReferenceSystemDefinitionsType">
<complexContent>
<extension base="gml:DictionaryType"/>
</complexContent>
</complexType>
<element name="ReferenceSystemDefinitions" type="csml:ReferenceSystemDefinitionsType"/>
<complexType name="ReferenceSystemDefinitionsPropertyType">
<sequence>
<element ref="csml:ReferenceSystemDefinitions" minOccurs="0"/>
</sequence>
<attributeGroup ref="gml:AssociationAttributeGroup"/>
</complexType>
Standards
• The importance of governance
– Information community defined by shared semantics
– Need community process to manage those semantics
(definitions, models, vocabularies, taxonomies, etc.)
– e.g. CF conventions for netCDF files
– Role of Feature Type Catalogues [ISO 19110] and registers [ISO
19135]
• Governance as driver for granularity
– Remit / interest determines appropriate granularity
– ref. IOC, IHO, WMO
<measurement type=“Radiosonde”
measurand=“temperature”/>
abstract
<Sonde parameter=“temperature”/>
generic
feature types spectrum
WMO Metadata, Sep, 2005
<temperatureProfile/>
highly specialised
NDG-A: Climate Science Modelling Language
• Aims:
– provide semantic integration mechanism for NDG data
– explore new standards-based interoperability framework
– emphasise content, not container
• Design principles:
– offload semantics onto parameter type (‘phenomenon’,
observable, measurand)
• e.g. wind-profiler, balloon temperature sounding
– offload semantics onto CRS
• e.g. scanning radar, sounding radar
– ‘sensible plotting’ as discriminant
• ‘in-principle’ unsupervised portrayal
– explicitly aim for small number of weakly-typed features
(in accordance with governance principle and NDG remit)
WMO Metadata, Sep, 2005
Climate Science
Modelling Language
• CSML feature types
– defined on basis of geometric and topologic
structure
CSML feature
type
TrajectoryFeatu
re
Description
Examples
Discrete path in time and space of a platform or
instrument.
PointFeature
Single point measurement.
ProfileFeature
Single ‘profile’ of some parameter along a
directed line in space.
Single time-snapshot of a gridded field.
ship’s cruise track,
aircraft’s flight path
raingauge
measurement
wind sounding, XBT,
CTD, radiosonde
gridded analysis field
tidegauge, rainfall
timeseries
vertical or scanning
radar, shipborne ADCP,
thermistor chain
timeseries
numerical weather
prediction model, ocean
general circulation
model
GridFeature
PointSeriesFeat
ure
Series of single datum measurements.
ProfileSeriesFe
ature
Series of profile-type measurements.
GridSeriesFeat
ure
Timeseries of gridded parameter fields.
WMO Metadata, Sep, 2005
Climate Science Modelling Language
• CSML feature types
– examples...
ProfileFeature
GridFeature
WMO Metadata, Sep, 2005
ProfileSeriesFeature
Climate Science Modelling Language
• Application schema
– logical structure and semantic content of NDG
‘Dataset’
– Based on GML 3.1
«Type»
GML::AbstractGMLType
«Type»
GML::FeatureCollection
*
«Type»
AbstractArrayDescriptor
*
*
*
*
«Type»
Dataset
*
*
*
«Type»
UnitDefinitions
*
«Type»
ReferenceSystemDefinitions
WMO Metadata, Sep, 2005
*
«Type»
PhenomenonDefinitions
Climate Science Modelling Language
• Numerical array
descriptors
«Type»
GML::AbstractGMLType
+id
+metaDataProperty
+description
+name
– provides ‘wrapper’
architecture for legacy
data files
– ‘Connected’ to data
model numerical content
through ‘xlink:href’
«Type»
AbstractArrayDescriptor
+arraySize[1]
+uom[0..1]
+numericType[0..1]
+numericTransform[0..1]
+regExpTransform[0..1]
*
+component
1
• Three subtypes:
«Type»
AggregatedArray
+aggType[1]
+aggIndex[1]
– InlineArray
– ArrayGenerator
– FileExtract (NASAAmes,
NetCDF, GRIB)
«Type»
InlineArray
+values[*]
• Composite design pattern
for aggregation
«Type»
ArrayGenerator
+expression[1]
«Type»
NASAAmesExtract
+variableName[1]
+index[0..1]
WMO Metadata, Sep, 2005
«Type»
AbstractFileExtract
+fileName[1]
«Type»
NetCDFExtract
+variableName[1]
«Type»
GRIBExtract
+parameterCode[1]
+recordNumber[0..1]
+fileOffset[0..1]
Climate Science Modelling Language
• Inline array
<NDGInlineArray>
<arraySize>5 2</arraySize>
<uom>udunits.xml#degreeC</uom>
<numericType>float</numericType>
<regExpTransform>s/10/9/ge</regExpTransform>
<numericTransform>+5</numericTransform>
<values>1 2 3 4 5 6 7 8 9 10</values>
</NDGInlineArray>
• Array generator
<NDGArrayGenerator>
<arraySize>10001</arraySize>
<uom>udunits.xml#minute</uom>
<numericType>float</numericType>
<expression>0:5:50000</expression>
</NDGArrayGenerator>
WMO Metadata, Sep, 2005
Climate Science Modelling Language
File extract
<NDGNASAAmesExtract>
<arraySize>526</arraySize>
<numericType>double</numericType>
<fileName>/data/BADC/macehead/mh960606.cf1</fileName>
<variableName>CFC-12</variableName>
</NDGNASAAmesExtract>
<NDGNetCDFExtract gml:id="feat04azimuth">
<arraySize>10000</arraySize>
<fileName>radar_data.nc</fileName>
<variableName>az</variableName>
</NDGNetCDFExtract>
<NDGGRIBExtract>
<arraySize>320 160</arraySize>
<numericType>double</numericType>
<fileName>/e40/ggas1992010100rsn.grb</fileName>
<parameterCode>203</parameterCode>
<recordNumber>5</ recordNumber>
<fileOffset>289412</fileOffset>
</NDGGRIBExtract>
WMO Metadata, Sep, 2005
CSML Observations
<gml:dictionaryEntry>
<om:Phenomenon
gml:id="atmosphere_cloud_condensed_water_content">
<gml:description>"condensed_water" means liquid and ice. "Content"
indicates a quantity per unit area. The "atmosphere content" of a quantity
refers to the vertical integral from the surface to the top of the
atmosphere. For the content between specified levels in the atmosphere,
standard names including content_of_atmosphere_layer are
used.</gml:description>
<gml:name codeSpace="http://www.cgd.ucar.edu/cms/eaton/cfmetadata/">atmosphere_cloud_condensed_water_content</gml:name>
<gml:name codeSpace="GRIB">76</gml:name>
<gml:name codeSpace="PCMDI">clwvi</gml:name>
</om:Phenomenon>
</gml:dictionaryEntry>
WMO Metadata, Sep, 2005
MarineXML Testbed
For each XSD (for the
source data) there is an
XSLT to translate the
data to the Feature
Types (FT) defined by
CSML. The FT’s and
XSLT are maintained in a
‘MarineXML registry’
Data from different
parts of the marine
community conforming
to a variety of schema
(XSD)
XSD
XML
Biological
Species
Phenomena in the
XSD must have
an associated
portrayal
The FTs can then
be translated to
equivalent FTs for
display in the
ECDIS system
S52 Portrayal
Library
XSD
XML
Chl-a from
Satellite
XSLT
XML
Parser
XSD
Measured
Hydrodynamics
XML
XSLT
XSLT
XSLT
Marine
XML
GML
(NDG)
XML
Feature
Types
XSD
XML
Modelled
Hydrodynamics
XSD
XML
S-57v3 GML
The result of the
translation is an
encoding that
contains the
Feature described using marine data in
S-57v3.1Application weakly typed (i.e.
Schema can be imported
generic) Features
and are equivalent to the
same features in CSML’
Slide adapted from Kieran Millard (AUKEGGS, 2005)
XSLT
SENC
SeeMyDENC
XSLT
ECDIS acts as an
example client for
the data.
Data Dictionary
Features in the source
XSD must be present in
the data dictionary.
WMO Metadata, Sep, 2005
MarineXML Testbed
Biological sampling station
of Chl-a
with Grid
attributes
for from
the the
MERIS
instrument
on
species
sampled
at each
ENVISAT
Predicted and measured
wave climate timeseries
(height, direction and
period)
Vectors of currents from
instruments
Slide adapted from Kieran Millard (AUKEGGS, 2005)
WMO Metadata, Sep, 2005
The Concept of re-using Features
Here structured XML is
converted
plain ascii
HTMLto
warning
service
Here
the
same
XML is
text in
the form
required
pages
are generated
‘on
converted
to the
XML
can also
be SENC
for athe
numerical
model
fly’
format used
a to
converted
to in
SVG
proprietary
for viewing
display
datatool
graphically
electronic navigation
charts.
All this requires agreement
on standards
Slide adapted from Kieran Millard (AUKEGGS, 2005)
WMO Metadata, Sep, 2005
Climate Science Modelling Language
• Status:
– Initial feature types defined
– First draft application schema complete
– Trial software tooling being coded (parser, netCDF
instantiation)
– Initial deployment trial across BODC, BADC datasets
• Future:
– Separate out wrapper implementation (array
descriptors)
– Disallow ‘internal’ dictionaries
– More strongly-typed features?
• Complex feature (vector measurements)
• Implicit Ensemble Support
• Swathes
– Follow (and pursue!) GML evolution, enhance
compliance
– Expand tooling
WMO Metadata, Sep, 2005
CSML Round Tripping - 1
Managing semantics
conceptual model
-End1
Class1
-End2
New
Dataset
«datatype»
DataType1
1
Conforms to
*
Class2
101
010
UGAS
produces
Under
Development
Application
parser
<gml:featureMember>
<NDGPointFeature
gml:id="ICES_100">
<NDGPointDomain>
<domainReference>
<NDGPosition
srsName="urn:EPSG:geographicCRS:497
9" axisLabels="Lat Long"
uomLabels="degree degree">
<location>55.25 6.5</location>
</NDGPosition>
</domainReference>
</NDGPointDomain>
<gml:rangeSet>
<gml:DataBlock>
<gml:rangeParameters>
<gml:CompositeValue>
<gml:valueComponents>
<gml:measure uom="#tn"/>
<gml:measure uom="#amount"/>
<gml:measure uom="#gsm"/>
</gml:valueComponents>
</gml:CompositeValue>
</gml:rangeParameters>
<gml:tupleList>
GML dataset
WMO Metadata, Sep, 2005
XML
GML app schema
instance
CSML Round Tripping - 2
Managing data - 1
Under
Development
CF
Dataset
GML app schema
scanner
101
010
produces
XML
CF
Under
Development
Application
parser
<gml:featureMember>
<NDGPointFeature
gml:id="ICES_100">
<NDGPointDomain>
<domainReference>
<NDGPosition
srsName="urn:EPSG:geographicCRS:497
9" axisLabels="Lat Long"
uomLabels="degree degree">
<location>55.25 6.5</location>
</NDGPosition>
</domainReference>
</NDGPointDomain>
<gml:rangeSet>
<gml:DataBlock>
<gml:rangeParameters>
<gml:CompositeValue>
<gml:valueComponents>
<gml:measure uom="#tn"/>
<gml:measure uom="#amount"/>
<gml:measure uom="#gsm"/>
</gml:valueComponents>
</gml:CompositeValue>
</gml:rangeParameters>
<gml:tupleList>
GML dataset
WMO Metadata, Sep, 2005
instance
Climate Forecasting Conventions
For using NetCDF to store CF data:
1.
2.
3.
4.
Data should be self-describing. No external tables are
needed to interpret the file. For instance, CF encodings
do not depend upon numeric codes (by contrast with
GRIB).
The convention should be easy to use, both for datawriters and users of data.
The metadata and the semantic meaning encoding
though the metadata should be readable by humans
as well as easily utilized by programs.
Redundancy should be minimised as much as possible
(because it reduces the chance of errors of
inconsistency when writing data)
WMO Metadata, Sep, 2005
CF
CF consists of:
• Vocabulary management
• Semantic concepts (axes, cells etc),
• and format specific conventions
(NetCDF now)
CF is at the heart of
• IPCC data comparison
• Academic earth system science data
exploitation (and archival).
WMO Metadata, Sep, 2005
CF
CF
• Exploits 100’s of man years of effort on
NetCDF evolution and tools
• Is one of the means by which we can take
NetCDF data and make meaningful feature
types.
• Application schema will depend on CF (a la the
registry relationships).
CF
• Needs WMO blessing (to validate the effort of
those who work on it as a standards activity).
WMO Metadata, Sep, 2005
Managing Data 2
CF
Dataset
CF
Dataset
101
010
101
010
scanner
<gml:featureMember>
<NDGPointFeature
gml:id="ICES_100">
<NDGPointDomain>
<domainReference>
<NDGPosition
srsName="urn:EPSG:geographicCRS:497
9" axisLabels="Lat Long"
uomLabels="degree degree">
<location>55.25 6.5</location>
</NDGPosition>
</domainReference>
</NDGPointDomain>
<gml:rangeSet>
<gml:DataBlock>
<gml:rangeParameters>
<gml:CompositeValue>
<gml:valueComponents>
<gml:measure uom="#tn"/>
<gml:measure uom="#amount"/>
<gml:measure uom="#gsm"/>
</gml:valueComponents>
</gml:CompositeValue>
</gml:rangeParameters>
<gml:tupleList>
GML dataset
Define
Dataset
XSLT
DECISION
PROCESSES
Add Information
PUBLISH
XML
ISO19115
WMO Metadata, Sep, 2005
Metadata extensions and profiles
ISO19115 Concept
Should apply to all “metadata”
WMO Metadata, Sep, 2005
D – ISO19115
• Most important decision to make is:
– “What is a dataset”
• Governance Issue
• Granularity Issue
• Not to be confused with application
schema
• Can and should exploit external
vocabularies
• Should be extended!
WMO Metadata, Sep, 2005
NumSim.xsd – extension to D
WMO Metadata, Sep, 2005
Registry Relationships
Feature
Types, need
• relationships
• operations
GML
Managed Registries Crucial to
Interoperabilty
WMO Metadata, Sep, 2005
Catalogues or Registries?
The terms catalogue and registry are often used
interchangeably, but in OGC speak:
“a registry is a catalogue that exemplifies a formal
registration process (e.g., such as those described in
ISO 19135 or ISO 11179-6).”
A registry is typically maintained by an authorized
registration authority who assumes responsibility for
complying with a set of policies and procedures for
accessing and managing metadata items within some
application domain.
WMO Metadata, Sep, 2005
Exploit Theory of Measurement
Work with Simon Cox on relationship between features,
coverages, complex (multiple) measurements and CSML.
Get “metocean” concepts in future version of GML!
WMO Metadata, Sep, 2005
Where next with feature types?
• Feature Types (ISO 19110) are the primary language
component for a spatial community to define real world
phenomena.
• Although ISO19136 provides a route for the encoding of
Feature Types in XML, it does not provide a route-map
for defining the features themselves.
• The level of granularity for Feature Types has to be
based on the governance model. If no-one is prepared
to put in place a process to define semantics,
interoperability has reached its limits.
• … Marine: MOTIIVE, WMO: ????
WMO Metadata, Sep, 2005
Summary Points
• Need clear distinction in mind between inter- and intraoperability
– When to use GML etc, and when not to!
• Need clear governance principles at a number of levels
for
– Dataset definitions (data provider, e.g. BADC)
– Feature-type definitions (community, e.g. WMO)
– Interoperability (frameworks, e.g. OGC)
• CSML
– Provides some geometric features based on portrayal
– Can and will be extended
– Provides generalisation of specific examples, e.g. METARS
• WMO should:
– “stand-up” a reusable feature-type catalogue
(but how? Work underway in MOTIIVE etc)
– Support CF development
– Work on dictionaries and thesauri and publish them.
– Exploit work in the academic community
WMO Metadata, Sep, 2005
WMO Metadata, Sep, 2005
Choose to return
either data or
“B-”Metadata
Can order
responses by Title,
Data Centre or
Temporal coverage
(default random)
Look at DIFs in
either HTML or
XML
WMO Metadata, Sep, 2005
WMO Metadata, Sep, 2005
WMO Metadata, Sep, 2005
WMO Metadata, Sep, 2005
WMO Metadata, Sep, 2005
WMO Metadata, Sep, 2005
WMO Metadata, Sep, 2005
WMO Metadata, Sep, 2005
WMO Metadata, Sep, 2005
WMO Metadata, Sep, 2005
WMO Metadata, Sep, 2005
Background activity being
parallelised with
GODIVA/CCLRC e-science
collaboration (spectral ->
gridpoint + CDMS +
visualisation tools)
Download either plot or the
data that went into the plot.
WMO Metadata, Sep, 2005
ERA40:
•All driven from
one CDML file, 9
TB online
spherical
harmonics,
looking like 40
TB “virtual”
gridded!
WMO Metadata, Sep, 2005
Download