Describing data through metadata and XML: the CSML experience.

Bryan Lawrence (1), Andrew Woolf (2), Roy Lowry (3), Kerstin Kleese van Dam (2), Ray Cramer (3), Marta Gutierrez (1), Siva Kondapalli (3), Susan Latham (1), Dominic Lowe (1), Kevin O’Neill (2), Ag Stephens (1) … and Michael Burek (4)

(1) British Atmospheric Data Centre
(2) CCLRC e-Science Centre
(3) British Oceanographic Data Centre
(4) National Center for Atmospheric Research

[Figure: BADC, BODC, CCLRC, PML and SOC]

Outline
• Motivation
  – The NERC Datagrid
  – Metadata Taxonomy
• Standards
  – OGC Vision
  – Feature Types
• CSML
  – Description
  – Prototyping in MarineXML
  – Round-Tripping
• The importance of CF
• Feature Type Issues
• Next Steps
• NDG Discovery Demo

WMO Metadata, Sep, 2005

Complexity + Volume + Remote Access = Grid Challenge
British Atmospheric Data Centre, NCAR, British Oceanographic Data Centre

http://ndg.nerc.ac.uk
[Figure: diagram of NDG components. Labels: Internet link; tape robot; online data; XML database; BADC, BODC and group NDG wrappers; wider Internet; NERC Grid; software agent; grid user; ESG (& other) applications; data sources (satellite, supercomputer, research group); NDG web portal; Internet user]

NDG Metadata Taxonomy
CSML, NCML+CF, MOLES, THREDDS, CLADDIER, DIF -> ISO19115
… not one schema, not one solution!

NDG Things I’m not going to talk about:
• NDG PKI based secure access to datasets
  – Supporting authentication/authorisation across differing virtual organisations without centralised user or role databases.
  – Built on bilateral trust relationships
• NDG Discovery
  – Portal already provides access to hundreds of datasets with international scope
  – Open Archives Initiative Harvesting
  – Web Service Interfaces
• NDG Browse
  – New method of “walking” around datasets.

OGC Web Feature Service
[Figure labels: public/private boundary; HTML; WFS Client; GML; WFS Server; data source organised for custodian’s requirements]
Community-specific GML application language – TigerGML, LandGML, O&M, XMML, CGI-GML, ADX, GPML, CSML, MarineXML etc
Slide adapted from Simon Cox (AUKEGGS, 2005)

Standard transfer format allows multiple data sources
[Figure labels: WFS Client; WFS Server; WFS Server B; WFS Server C]
Slide adapted from Simon Cox (AUKEGGS, 2005)

NetCDF + WMS
e.g.: ERA40 re-analysis surface air temperature, 2001-04-27
  – deegree open-source WMS modified with netCDF connector
Overlaid with rainfall from globe.digitalearth.gov WMS server
NB: Now using Mapserver for Interoperability experiments

Standards
• ISO 19101: Geographic information – Reference model
A geospatial dataset…
…consists of features and related objects…
…in a defined logical structure…
…delivered through services…
…and described by metadata.

Standards
• Geographic ‘features’
  – “abstraction of real world phenomena” [ISO 19101]
  – Type or instance
  – Encapsulate important semantics in universe of discourse
• Application schema
  – Defines semantic content and logical structure of datasets
  – ISO standards provide toolkit:
    • spatial/temporal referencing
    • geometry (1-, 2-, 3-D)
    • topology
    • dictionaries (phenomena, units, etc.)
  – GML – canonical encoding
[from ISO 19109 “Geographic information – Rules for Application Schema”]

Standards
• Emerging ISO standards
  – TC211 – around 40 standards for geographic information
  – Cover activity spectrum: discovery, access, use
  – Provide a framework for data integration
[Figure labels: Universe of discourse; Feature types; Application schema; FTC; ISO 19103; ISO 19109; ISO 19110; example feature type CTDProfile with startTime[1], bottomTime[1], salinity[*], temperature[*], depth[*]]

<?xml version="1.0" encoding="UTF-8"?>
<schema targetNamespace="http://ndg.nerc.ac.uk/csml"
        xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:csml="http://ndg.nerc.ac.uk/csml"
        xmlns:om="http://www.opengis.net/om"
        xmlns:gml="http://www.opengis.net/gml"
        elementFormDefault="qualified"
        attributeFormDefault="unqualified"
        version="0.1">
  <annotation>
    <documentation>CSML application schema</documentation>
  </annotation>
  <!--====================================================================== -->
  <import namespace="http://www.opengis.net/gml" schemaLocation="GML-3.1.0/base/gml.xsd"/>
  <import namespace="http://www.opengis.net/om" schemaLocation="phenomenon.xsd"/>
  <!--====================================================================== -->
  <!--===== Root element for CSML dataset =====-->
  <!--====================================================================== -->
  <complexType name="DatasetType">
    <complexContent>
      <extension base="gml:AbstractGMLType">
        <sequence>
          <element ref="csml:UnitDefinitions" minOccurs="0" maxOccurs="unbounded"/>
          <element ref="csml:ReferenceSystemDefinitions" minOccurs="0" maxOccurs="unbounded"/>
          <element ref="csml:PhenomenonDefinitions" minOccurs="0"/>
          <element ref="csml:_ArrayDescriptor" minOccurs="0" maxOccurs="unbounded"/>
          <element ref="gml:FeatureCollection" minOccurs="0" maxOccurs="unbounded"/>
        </sequence>
      </extension>
    </complexContent>
  </complexType>
  <element name="Dataset" type="csml:DatasetType"/>
  <!--====================================================================== -->
  <!--===== Dictionary/definition elements =====-->
  <!--====================================================================== -->
  <complexType name="ReferenceSystemDefinitionsType">
    <complexContent>
      <extension base="gml:DictionaryType"/>
    </complexContent>
  </complexType>
  <element name="ReferenceSystemDefinitions" type="csml:ReferenceSystemDefinitionsType"/>
  <complexType name="ReferenceSystemDefinitionsPropertyType">
    <sequence>
      <element ref="csml:ReferenceSystemDefinitions" minOccurs="0"/>
    </sequence>
    <attributeGroup ref="gml:AssociationAttributeGroup"/>
  </complexType>

Standards
• The importance of governance
  – Information community defined by shared semantics
  – Need community process to manage those semantics (definitions, models, vocabularies, taxonomies, etc.)
  – e.g. CF conventions for netCDF files
  – Role of Feature Type Catalogues [ISO 19110] and registers [ISO 19135]
• Governance as driver for granularity
  – Remit / interest determines appropriate granularity
  – ref. IOC, IHO, WMO
[Figure: feature types spectrum, from abstract/generic to highly specialised]
  <measurement type="Radiosonde" measurand="temperature"/>
  <Sonde parameter="temperature"/>
  <temperatureProfile/>

NDG-A: Climate Science Modelling Language
• Aims:
  – provide semantic integration mechanism for NDG data
  – explore new standards-based interoperability framework
  – emphasise content, not container
• Design principles:
  – offload semantics onto parameter type (‘phenomenon’, observable, measurand)
    • e.g. wind-profiler, balloon temperature sounding
  – offload semantics onto CRS
    • e.g. scanning radar, sounding radar
  – ‘sensible plotting’ as discriminant
    • ‘in-principle’ unsupervised portrayal
  – explicitly aim for small number of weakly-typed features (in accordance with governance principle and NDG remit)

Climate Science Modelling Language
• CSML feature types
  – defined on basis of geometric and topologic structure

CSML feature type | Description | Examples
TrajectoryFeature | Discrete path in time and space of a platform or instrument. | ship’s cruise track, aircraft’s flight path
PointFeature | Single point measurement. | raingauge measurement
ProfileFeature | Single ‘profile’ of some parameter along a directed line in space. | wind sounding, XBT, CTD, radiosonde
GridFeature | Single time-snapshot of a gridded field. | gridded analysis field
PointSeriesFeature | Series of single datum measurements. | tidegauge, rainfall timeseries
ProfileSeriesFeature | Series of profile-type measurements. | vertical or scanning radar, shipborne ADCP, thermistor chain timeseries
GridSeriesFeature | Timeseries of gridded parameter fields. | numerical weather prediction model, ocean general circulation model

Climate Science Modelling Language
• CSML feature types
  – examples...
[Figure panels: ProfileFeature; GridFeature; ProfileSeriesFeature]

Climate Science Modelling Language
• Application schema
  – logical structure and semantic content of NDG ‘Dataset’
  – Based on GML 3.1
[UML diagram: «Type» Dataset, derived from «Type» GML::AbstractGMLType, associated (multiplicity *) with «Type» GML::FeatureCollection, «Type» AbstractArrayDescriptor, «Type» UnitDefinitions, «Type» ReferenceSystemDefinitions and «Type» PhenomenonDefinitions]

Climate Science Modelling Language
• Numerical array descriptors
  – provides ‘wrapper’ architecture for legacy data files
  – ‘Connected’ to data model numerical content through ‘xlink:href’
• Three subtypes:
  – InlineArray
  – ArrayGenerator
  – FileExtract (NASAAmes, NetCDF, GRIB)
• Composite design pattern for aggregation
[UML diagram, types and attributes:]
  «Type» GML::AbstractGMLType: +id, +metaDataProperty, +description, +name
  «Type» AbstractArrayDescriptor: +arraySize[1], +uom[0..1], +numericType[0..1], +numericTransform[0..1], +regExpTransform[0..1]
  «Type» AggregatedArray: +aggType[1], +aggIndex[1] (+component: 1 to *)
  «Type» InlineArray: +values[*]
  «Type» ArrayGenerator: +expression[1]
  «Type» AbstractFileExtract: +fileName[1]
  «Type» NASAAmesExtract: +variableName[1], +index[0..1]
  «Type» NetCDFExtract: +variableName[1]
  «Type» GRIBExtract: +parameterCode[1], +recordNumber[0..1], +fileOffset[0..1]

Climate Science Modelling Language
• Inline array
  <NDGInlineArray>
    <arraySize>5 2</arraySize>
    <uom>udunits.xml#degreeC</uom>
    <numericType>float</numericType>
    <regExpTransform>s/10/9/ge</regExpTransform>
    <numericTransform>+5</numericTransform>
    <values>1 2 3 4 5 6 7 8 9 10</values>
  </NDGInlineArray>
• Array generator
  <NDGArrayGenerator>
    <arraySize>10001</arraySize>
    <uom>udunits.xml#minute</uom>
    <numericType>float</numericType>
    <expression>0:5:50000</expression>
  </NDGArrayGenerator>

Climate Science Modelling Language
• File extract
  <NDGNASAAmesExtract>
    <arraySize>526</arraySize>
    <numericType>double</numericType>
    <fileName>/data/BADC/macehead/mh960606.cf1</fileName>
    <variableName>CFC-12</variableName>
  </NDGNASAAmesExtract>
  <NDGNetCDFExtract gml:id="feat04azimuth">
    <arraySize>10000</arraySize>
    <fileName>radar_data.nc</fileName>
    <variableName>az</variableName>
  </NDGNetCDFExtract>
  <NDGGRIBExtract>
    <arraySize>320 160</arraySize>
    <numericType>double</numericType>
    <fileName>/e40/ggas1992010100rsn.grb</fileName>
    <parameterCode>203</parameterCode>
    <recordNumber>5</recordNumber>
    <fileOffset>289412</fileOffset>
  </NDGGRIBExtract>

CSML Observations
  <gml:dictionaryEntry>
    <om:Phenomenon gml:id="atmosphere_cloud_condensed_water_content">
      <gml:description>"condensed_water" means liquid and ice. "Content" indicates a quantity per unit area. The "atmosphere content" of a quantity refers to the vertical integral from the surface to the top of the atmosphere. For the content between specified levels in the atmosphere, standard names including content_of_atmosphere_layer are used.</gml:description>
      <gml:name codeSpace="http://www.cgd.ucar.edu/cms/eaton/cfmetadata/">atmosphere_cloud_condensed_water_content</gml:name>
      <gml:name codeSpace="GRIB">76</gml:name>
      <gml:name codeSpace="PCMDI">clwvi</gml:name>
    </om:Phenomenon>
  </gml:dictionaryEntry>

MarineXML Testbed
For each XSD (for the source data) there is an XSLT to translate the data to the Feature Types (FT) defined by CSML.
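The per-schema translation step can be illustrated with a short sketch. It is written in Python rather than XSLT purely so that it is self-contained, and it is not the testbed's actual stylesheet. The source record layout (`station` with `id`, `lat`, `lon`) and the function name `to_point_feature` are hypothetical; the target element names (`NDGPointFeature`, `NDGPointDomain`, `domainReference`, `NDGPosition`, `location`) are taken from the GML instance fragment shown on the round-tripping slides. Namespaces are omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical source record from one community schema (not a real XSD).
SOURCE_XML = '<station id="ICES_100" lat="55.25" lon="6.5"/>'


def to_point_feature(source_xml):
    """Map one source record onto a weakly typed point feature.

    Stands in for the XSLT that the testbed associates with each source XSD.
    """
    src = ET.fromstring(source_xml)
    # Plain "id" is used here; the slides use gml:id in a GML namespace.
    feature = ET.Element("NDGPointFeature", {"id": src.get("id")})
    domain = ET.SubElement(feature, "NDGPointDomain")
    reference = ET.SubElement(domain, "domainReference")
    position = ET.SubElement(
        reference,
        "NDGPosition",
        {"axisLabels": "Lat Long", "uomLabels": "degree degree"},
    )
    location = ET.SubElement(position, "location")
    location.text = "{} {}".format(src.get("lat"), src.get("lon"))
    return feature


feature = to_point_feature(SOURCE_XML)
print(ET.tostring(feature, encoding="unicode"))
```

One such mapping per source schema keeps the source-specific knowledge in the translation, so that downstream clients only need to understand the generic feature types.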
The FTs and XSLT are maintained in a ‘MarineXML registry’.
[Figure: MarineXML testbed data flow. Callouts and labels:]
• Data from different parts of the marine community conforming to a variety of schema (XSD): Biological Species, Chl-a from Satellite, Measured Hydrodynamics, Modelled Hydrodynamics (each an XSD with XML data and its own XSLT)
• XML Parser; MarineXML GML (NDG) Feature Types; S-57v3 GML
• Phenomena in the XSD must have an associated portrayal (S52 Portrayal Library)
• The FTs can then be translated to equivalent FTs for display in the ECDIS system
• The result of the translation is an encoding that contains the marine data in weakly typed (i.e. generic) Features
• Features described using the S-57v3.1 Application Schema can be imported and are equivalent to the same features in ‘CSML’
• ECDIS acts as an example client for the data (SENC, SeeMyDENC)
• Data Dictionary: Features in the source XSD must be present in the data dictionary.
Slide adapted from Kieran Millard (AUKEGGS, 2005)

MarineXML Testbed
• Biological sampling station with attributes for the species sampled at each
• Grid of Chl-a from the MERIS instrument on ENVISAT
• Predicted and measured wave climate timeseries (height, direction and period)
• Vectors of currents from instruments
Slide adapted from Kieran Millard (AUKEGGS, 2005)

The Concept of re-using Features
• Here structured XML is converted to plain ascii text in the form required for a numerical model
• HTML warning service pages are generated ‘on the fly’
• XML can also be converted to SVG to display data graphically
• Here the same XML is converted to the SENC format used in a proprietary tool for viewing electronic navigation charts.
All this requires agreement on standards
Slide adapted from Kieran Millard (AUKEGGS, 2005)

Climate Science Modelling Language
• Status:
  – Initial feature types defined
  – First draft application schema complete
  – Trial software tooling being coded (parser, netCDF instantiation)
  – Initial deployment trial across BODC, BADC datasets
• Future:
  – Separate out wrapper implementation (array descriptors)
  – Disallow ‘internal’ dictionaries
  – More strongly-typed features?
    • Complex feature (vector measurements)
    • Implicit Ensemble Support
    • Swathes
  – Follow (and pursue!) GML evolution, enhance compliance
  – Expand tooling

CSML Round Tripping - 1
Managing semantics
[Figure labels: conceptual model (UML classes); UGAS produces GML app schema (XML); New Dataset; Conforms to; Application; parser (Under Development); GML dataset instance]
  <gml:featureMember>
    <NDGPointFeature gml:id="ICES_100">
      <NDGPointDomain>
        <domainReference>
          <NDGPosition srsName="urn:EPSG:geographicCRS:4979" axisLabels="Lat Long" uomLabels="degree degree">
            <location>55.25 6.5</location>
          </NDGPosition>
        </domainReference>
      </NDGPointDomain>
      <gml:rangeSet>
        <gml:DataBlock>
          <gml:rangeParameters>
            <gml:CompositeValue>
              <gml:valueComponents>
                <gml:measure uom="#tn"/>
                <gml:measure uom="#amount"/>
                <gml:measure uom="#gsm"/>
              </gml:valueComponents>
            </gml:CompositeValue>
          </gml:rangeParameters>
          <gml:tupleList>

CSML Round Tripping - 2
Managing data - 1
[Figure labels: CF Dataset; scanner (Under Development) produces XML; GML app schema; CF; Application; parser (Under Development); GML dataset instance, showing the same GML fragment as on the previous slide]

Climate and Forecast (CF) Conventions
For using NetCDF to store CF data:
1. Data should be self-describing. No external tables are needed to interpret the file. For instance, CF encodings do not depend upon numeric codes (by contrast with GRIB).
2. The convention should be easy to use, both for data-writers and users of data.
3. The metadata and the semantic meaning encoded through the metadata should be readable by humans as well as easily utilized by programs.
4. Redundancy should be minimised as much as possible (because it reduces the chance of errors of inconsistency when writing data)

CF
CF consists of:
• Vocabulary management
• Semantic concepts (axes, cells etc),
• and format specific conventions (NetCDF now)
CF is at the heart of
• IPCC data comparison
• Academic earth system science data exploitation (and archival).

CF
CF
• Exploits 100s of man-years of effort on NetCDF evolution and tools
• Is one of the means by which we can take NetCDF data and make meaningful feature types.
• Application schema will depend on CF (a la the registry relationships).
CF
• Needs WMO blessing (to validate the effort of those who work on it as a standards activity).
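The contrast between self-describing CF names and numeric codes can be seen in the phenomenon dictionary entry on the CSML Observations slide, where one phenomenon carries a CF standard name, a GRIB code and a PCMDI name. The following is a minimal Python sketch, assuming the entry is available as a string and using the GML and O&M namespace URIs from the CSML schema slide. The helper name `names_by_code_space` is hypothetical, and the description element is left out for brevity.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as declared in the CSML application schema excerpt.
NS = {
    "gml": "http://www.opengis.net/gml",
    "om": "http://www.opengis.net/om",
}

# Dictionary entry from the CSML Observations slide (description omitted).
ENTRY = """
<gml:dictionaryEntry xmlns:gml="http://www.opengis.net/gml"
                     xmlns:om="http://www.opengis.net/om">
  <om:Phenomenon gml:id="atmosphere_cloud_condensed_water_content">
    <gml:name codeSpace="http://www.cgd.ucar.edu/cms/eaton/cfmetadata/">atmosphere_cloud_condensed_water_content</gml:name>
    <gml:name codeSpace="GRIB">76</gml:name>
    <gml:name codeSpace="PCMDI">clwvi</gml:name>
  </om:Phenomenon>
</gml:dictionaryEntry>
"""


def names_by_code_space(entry_xml):
    """Return {codeSpace: name} for the phenomenon in one dictionary entry."""
    root = ET.fromstring(entry_xml.strip())
    phenomenon = root.find("om:Phenomenon", NS)
    return {
        name.get("codeSpace"): name.text
        for name in phenomenon.findall("gml:name", NS)
    }


names = names_by_code_space(ENTRY)
for code_space, name in names.items():
    print(code_space, "->", name)
```

The GRIB entry is only meaningful with the GRIB parameter table to hand, whereas the CF entry is readable on its own, which is the point of the first design principle above.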

Managing Data 2
[Figure labels: CF Dataset (two); scanner; GML dataset (the same GML instance fragment as on the “CSML Round Tripping - 1” slide); Define Dataset; XSLT; DECISION PROCESSES; Add Information; PUBLISH; XML ISO19115]

Metadata extensions and profiles
[Figure labels: ISO19115; Concept]
Should apply to all “metadata”

D – ISO19115
• Most important decision to make is:
  – “What is a dataset”
    • Governance Issue
    • Granularity Issue
    • Not to be confused with application schema
• Can and should exploit external vocabularies
• Should be extended!

NumSim.xsd – extension to D

Registry Relationships
Feature Types, need
• relationships
• operations
GML
Managed Registries Crucial to Interoperability

Catalogues or Registries?
The terms catalogue and registry are often used interchangeably, but in OGC speak: “a registry is a catalogue that exemplifies a formal registration process (e.g., such as those described in ISO 19135 or ISO 11179-6).”
A registry is typically maintained by an authorized registration authority who assumes responsibility for complying with a set of policies and procedures for accessing and managing metadata items within some application domain.

Exploit Theory of Measurement
Work with Simon Cox on relationship between features, coverages, complex (multiple) measurements and CSML.
Get “metocean” concepts in future version of GML!

Where next with feature types?
• Feature Types (ISO 19110) are the primary language component for a spatial community to define real world phenomena.
• Although ISO19136 provides a route for the encoding of Feature Types in XML, it does not provide a route-map for defining the features themselves.
• The level of granularity for Feature Types has to be based on the governance model. If no-one is prepared to put in place a process to define semantics, interoperability has reached its limits.
• … Marine: MOTIIVE, WMO: ????

Summary Points
• Need clear distinction in mind between inter- and intraoperability
  – When to use GML etc, and when not to!
• Need clear governance principles at a number of levels for
  – Dataset definitions (data provider, e.g. BADC)
  – Feature-type definitions (community, e.g. WMO)
  – Interoperability (frameworks, e.g. OGC)
• CSML
  – Provides some geometric features based on portrayal
  – Can and will be extended
  – Provides generalisation of specific examples, e.g. METARS
• WMO should:
  – “stand-up” a reusable feature-type catalogue (but how? Work underway in MOTIIVE etc)
  – Support CF development
  – Work on dictionaries and thesauri and publish them.
  – Exploit work in the academic community

[NDG Discovery demo screenshots, with callouts:]
• Choose to return either data or “B-”Metadata
• Can order responses by Title, Data Centre or Temporal coverage (default random)
• Look at DIFs in either HTML or XML
• Background activity being parallelised with GODIVA/CCLRC e-science collaboration (spectral -> gridpoint + CDMS + visualisation tools)
• Download either plot or the data that went into the plot.

ERA40:
• All driven from one CDML file, 9 TB online spherical harmonics, looking like 40 TB “virtual” gridded!
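
As a footnote to the CSML array descriptors: the array generator example pairs the expression 0:5:50000 with an arraySize of 10001. The slides do not define the expression syntax, so the Python sketch below assumes it means start:step:stop with an inclusive stop and integer parts. The helper name `expand_expression` is hypothetical. Under that assumption the expression and the declared size are consistent.

```python
def expand_expression(expression):
    """Expand an assumed 'start:step:stop' expression (inclusive stop)."""
    start, step, stop = (int(part) for part in expression.split(":"))
    if step <= 0:
        raise ValueError("step must be positive")
    count = (stop - start) // step + 1
    # numericType in the slide example is float, so emit floats.
    return [float(start + i * step) for i in range(count)]


values = expand_expression("0:5:50000")
declared_size = 10001  # <arraySize> from the NDGArrayGenerator example

print(len(values) == declared_size)
print(values[:3], values[-1])
```

A descriptor of this kind lets a coordinate axis be described in a few lines of metadata instead of being stored value by value.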