NESC workshop Grid and Geospatial Standards 7-Sep-2005 Data integration with the Climate Science Modelling Language Andrew Woolf1, Bryan Lawrence2, Roy Lowry3, Kerstin Kleese van Dam1, Ray Cramer3, Marta Gutierrez2, Siva Kondapalli3, Susan Latham2, Dominic Lowe2, Kevin O’Neill1, Ag Stephens2 1CCLRC e-Science Centre 2British Atmospheric Data Centre 3British Oceanographic Data Centre 1 Outline NESC workshop Grid and Geospatial Standards 7-Sep-2005 Background Standards – a framework for interoperability Climate Science Modelling Language (CSML) Using CSML 2 Background NESC workshop Grid and Geospatial Standards 7-Sep-2005 Data integration requirements: • scalability across providers • warehousing not an option • enhance access and use, ‘outwards-facing’ (e.g. impacts community, policymakers) • storage heterogeneity Semantics as integration ‘key’ • common language across providers (and users) • supports wrapper/mediator architecture 3 Standards NESC workshop Grid and Geospatial Standards 7-Sep-2005 Emerging ISO standards • TC211 – around 40 standards for geographic information • Cover activity spectrum: discovery access use …in a defined logical structure… …and described by metadata. …delivered through services… ISO 19101 Domain Reference Model A geospatial dataset… …consists of features and related objects… 4 Standards NESC workshop Grid and Geospatial Standards 7-Sep-2005 Geographic ‘features’ • • • “abstraction of real world phenomena” [ISO 19101] Type or instance Encapsulate important semantics in universe of discourse Application schema • • • Defines semantic content and logical structure of datasets ISO standards provide toolkit: • spatial/temporal referencing • geometry (1-, 2-, 3-D) • topology • dictionaries (phenomena, units, etc.) GML – canonical encoding [from ISO 19109 “Geographic information – Rules for Application Schema”] 5 Standards NESC workshop Grid and Geospatial Standards 7-Sep-2005 The importance of governance • Information community defined by shared semantics • Need community process to manage those semantics (definitions, models, vocabularies, taxonomies, etc.) • e.g. CF conventions for netCDF files • Role of Feature Type Catalogues [ISO 19110] and registers [ISO 19135] Governance as driver for granularity • Remit / interest determines appropriate granularity • e.g. IOC, IHO, WMO <measurement type=“Radiosonde” measurand=“temperature”/> abstract <Sonde parameter=“temperature”/> generic <temperatureProfile/> highly specialised feature types spectrum 6 Aims: Climate Science Modelling Language NESC workshop Grid and Geospatial Standards 7-Sep-2005 • provide semantic integration mechanism for NDG data • explore new standards-based interoperability framework • emphasise content, not container Design principles: • offload semantics onto parameter type (‘phenomenon’, observable, measurand) • e.g. wind-profiler, balloon temperature sounding • offload semantics onto CRS • e.g. scanning radar, sounding radar • ‘sensible plotting’ as discriminant • ‘in-principle’ unsupervised portrayal • explicitly aim for small number of weakly-typed features (in accordance with governance principle and NDG remit) 7 Climate Science Modelling Language NESC workshop Grid and Geospatial Standards 7-Sep-2005 CSML feature types • defined on basis of geometric and topologic structure CSML feature type Description Examples TrajectoryFeature Discrete path in time and space of a platform or instrument. PointFeature Single point measurement. ProfileFeature Single ‘profile’ of some parameter along a directed line in space. ship’s cruise track, aircraft’s flight path raingauge measurement wind sounding, XBT, CTD, radiosonde GridFeature Single time-snapshot of a gridded field. gridded analysis field PointSeriesFeature Series of single datum measurements. tidegauge, rainfall timeseries ProfileSeriesFeature Series of profile-type measurements. vertical or scanning radar, shipborne ADCP, thermistor chain timeseries GridSeriesFeature Timeseries of gridded parameter fields. numerical weather prediction model, ocean general circulation model 8 Climate Science Modelling Language NESC workshop Grid and Geospatial Standards 7-Sep-2005 ProfileSeriesFeature CSML feature types • examples... ProfileFeature GridFeature 9 Climate Science Modelling Language NESC workshop Grid and Geospatial Standards 7-Sep-2005 Application schema • • logical structure and semantic content of NDG ‘Dataset’ Based on GML 3.1 «Type» GML::AbstractGMLType «Type» GML::FeatureCollection * «Type» AbstractArrayDescriptor * * * * «Type» Dataset * * * «Type» UnitDefinitions * «Type» ReferenceSystemDefinitions * «Type» PhenomenonDefinitions 10 Climate Science Modelling Language NESC workshop Grid and Geospatial Standards 7-Sep-2005 Integration approaches: wrapper/mediator dataInterface dataInterface mediatorService mediatorService wrapper wrapper BODC BADC 11 Climate Science Modelling Language Numerical array descriptors • • «Type» GML::AbstractGMLType +id +metaDataProperty +description +name provides ‘wrapper’ architecture for legacy data files ‘Connected’ to data model numerical content through ‘xlink:href’ «Type» AbstractArrayDescriptor +arraySize[1] +uom[0..1] +numericType[0..1] +numericTransform[0..1] +regExpTransform[0..1] Three subtypes: • • • InlineArray ArrayGenerator FileExtract (NASAAmes, NetCDF, GRIB) Composite design pattern for aggregation NESC workshop Grid and Geospatial Standards 7-Sep-2005 * +component 1 «Type» AggregatedArray +aggType[1] +aggIndex[1] «Type» InlineArray +values[*] «Type» ArrayGenerator +expression[1] «Type» NASAAmesExtract +variableName[1] +index[0..1] «Type» AbstractFileExtract +fileName[1] «Type» NetCDFExtract +variableName[1] «Type» GRIBExtract +parameterCode[1] +recordNumber[0..1] +fileOffset[0..1] 12 Climate Science Modelling Language NESC workshop Grid and Geospatial Standards 7-Sep-2005 Inline array <NDGInlineArray> <arraySize>5 2</arraySize> <uom>udunits.xml#degreeC</uom> <numericType>float</numericType> <regExpTransform>s/10/9/ge</regExpTransform> <numericTransform>+5</numericTransform> <values>1 2 3 4 5 6 7 8 9 10</values> </NDGInlineArray> Array generator <NDGArrayGenerator> <arraySize>10001</arraySize> <uom>udunits.xml#minute</uom> <numericType>float</numericType> <expression>0:5:50000</expression> </NDGArrayGenerator> 13 Climate Science Modelling Language NESC workshop Grid and Geospatial Standards 7-Sep-2005 File extract <NDGNASAAmesExtract> <arraySize>526</arraySize> <numericType>double</numericType> <fileName>/data/BADC/macehead/mh960606.cf1</fileName> <variableName>CFC-12</variableName> </NDGNASAAmesExtract> <NDGNetCDFExtract gml:id="feat04azimuth"> <arraySize>10000</arraySize> <fileName>radar_data.nc</fileName> <variableName>az</variableName> </NDGNetCDFExtract> <NDGGRIBExtract> <arraySize>320 160</arraySize> <numericType>double</numericType> <fileName>/e40/ggas1992010100rsn.grb</fileName> <parameterCode>203</parameterCode> <recordNumber>5</ recordNumber> <fileOffset>289412</fileOffset> </NDGGRIBExtract> 14 Climate Science Modelling Language NESC workshop Grid and Geospatial Standards 7-Sep-2005 Aggregated array • arrays may be aggregated along an ‘existing’ or ‘new’ dimension <NDGAggregatedArray gml:id="feat05cruisetrack"> <arraySize>2 50</arraySize> <aggType>new</aggType> <aggIndex>1</aggIndex> <component> <NDGNetCDFExtract> <arraySize>50</arraySize> <fileName>cruisetrack.nc</fileName> <variableName>alat</variableName> </NDGNetCDFExtract> </component> <component> <NDGNetCDFExtract> <arraySize>50</arraySize> <fileName>cruisetrack.nc</fileName> <variableName>alon</variableName> </NDGNetCDFExtract> </component> </NDGAggregatedArray> 15 NESC workshop Grid and Geospatial Standards 7-Sep-2005 Climate Science Modelling Language NetCDF WCS WFS OPeNDAP .... instantiateNetCDF( DatasetID, FeatureID) <CSML> <CSML> <CSML> Provides semantic abstraction layer <CSML> (SAX) demarshalling CSMLAbstractFeature +writeNetCDF() AbstractFileExtract +read() filestore 16 Status: • • • • Climate Science Modelling Language NESC workshop Grid and Geospatial Standards 7-Sep-2005 Initial feature types defined First draft application schema complete Trial software tooling being coded (parser, netCDF instantiation) Initial deployment trial across BODC, BADC datasets Future: • • • • • Separate out wrapper implementation (array descriptors) Disallow ‘internal’ dictionaries More strongly-typed features? Follow (and pursue!) GML evolution, enhance compliance Expand tooling Related work • WMO, IOC, IHO • MarineXML • MOTIIVE (INSPIRE) 17 Using CSML NESC workshop Grid and Geospatial Standards 7-Sep-2005 EU project – MarineXML <gml:definitionMember> <om:Phenomenon gml:id="taxon"> <gml:description>The taxon name</gml:description> <gml:name codeSpace="http://www.vliz.be">taxon</gml:name> </om:Phenomenon> </gml:definitionMember> </NDGPhenomenonDefinitions> <!--===================================================================--> <gml:FeatureCollection> <!-- ============================================================== --> <gml:featureMember> <NDGPointFeature gml:id="ICES_100"> <NDGPointDomain> <domainReference> <NDGPosition srsName="urn:EPSG:geographicCRS:4979" axisLabels="Lat Long" uomLabels="degree degree"> <location>55.25 6.5</location> </NDGPosition> </domainReference> </NDGPointDomain> <gml:rangeSet> <gml:DataBlock> <gml:rangeParameters> <gml:CompositeValue> <gml:valueComponents> <gml:measure uom="#tn"/> <gml:measure uom="#amount"/> <gml:measure uom="#gsm"/> </gml:valueComponents> </gml:CompositeValue> </gml:rangeParameters> <gml:tupleList> 'ANTHOZOA',63.1,missing 'Scoloplos armiger',66.1,missing 'Spio filicornis',10,missing 'Spiophanes bombyx',60.3,missing 'Capitellidae',131.8,missing 'Pholoe',10,missing 'Owenia fusiformis',23.4,missing 'Hypereteone lactea',6.8,missing 'Anaitides groenlandica',13.2,missing 'Anaitides mucosa',6.8,missing “MarineXML is an initiative of the IOC/IODE of “... there is a momentum such UNESCO to improve marinefrom dataorganisations exchange within as“The IHOcommunity. and WMO adopt consistent the marine The European Commission NDG formatto proved a robust recipient for approaches for the vocabulary oftotheir data along has provided funding contribution initiative the dataa from each community. Itthis produced the implementation of ISOtoStandards as part ofreference its 5th Framework Programme economical files with few redundant elements, prescribed by thethe [Open undertake a ‘pre-standardisation’ task of striking about right Geospatial balance between weak Consortium]...” identifying the approaches and strong typing.” the marine community should adopt regarding XML technology to achieve improved data exchange.” 18 NESC workshop Grid and Geospatial Standards 7-Sep-2005 Using CSML Managing semantics conceptual model Class1 -End1 -End2 «datatype» DataType1 1 * Class2 UGAS GML dataset parser <gml:featureMember> <NDGPointFeature gml:id="ICES_100"> <NDGPointDomain> <domainReference> <NDGPosition srsName="urn:EPSG:geographicCRS:4979" axisLabels="Lat Long" uomLabels="degree degree"> <location>55.25 6.5</location> </NDGPosition> </domainReference> </NDGPointDomain> <gml:rangeSet> <gml:DataBlock> <gml:rangeParameters> <gml:CompositeValue> <gml:valueComponents> <gml:measure uom="#tn"/> <gml:measure uom="#amount"/> <gml:measure uom="#gsm"/> </gml:valueComponents> </gml:CompositeValue> </gml:rangeParameters> <gml:tupleList> 'ANTHOZOA',63.1,missing 'Scoloplos armiger',66.1,missing 'Spio filicornis',10,missing 'Spiophanes bombyx',60.3,missing 'Capitellidae',131.8,missing GML app schema 19 NESC workshop Grid and Geospatial Standards 7-Sep-2005 Using CSML «interface» SAX2::ContentHandler +setDocumentLocator() +startDocument() +endDocument() +startElement() +endElement() +characters() +processingInstruction() +ignorableWhitespace() +startPrefixMapping() +endPrefixMapping() +skippedEntity() «interface» SAX2::DTDHandler +notationDecl() +unparsedEntity() «interface» SAX2::ErrorHandler +error() +fatalError() +warning() SAX2::DefaultHandler «interface» SAX2::EntityResolver +resolveEntity() Stack of Builders (for UML meta-model) • • current class, object, attribute specialised for particular UMLXML mapping Builder receives: • • filtered SAX events built object Builder returns: • • • ObjectParser -process() builderStack * ObjectBuilder -currentClass -currentObject -currentAttribute +xmlElement() : BuilderUnion +xmlCharacters() : BuilderUnion +xmlAttribute() : BuilderUnion +processObject() : BuilderUnion ObjectPropertyBuilder «union» BuilderUnion AbstractClass AbstractObject AbstractBuilder ISO19118Builder built object new object class new Builder (for inheritance through substitutionGroups) 20