NERC DataGrid data model and its application

Spatiotemporal Databases
e-Science Institute, Edinburgh
01-Nov-2005
Andrew Woolf (1) (A.Woolf@rl.ac.uk), Ray Cramer (2), Marta Gutierrez (3), Kerstin Kleese van Dam (1), Siva Kondapalli (2), Susan Latham (3), Bryan Lawrence (3), Roy Lowry (2), Kevin O’Neill (1), Ag Stephens (3)
(1) CCLRC e-Science Centre
(2) British Oceanographic Data Centre
(3) British Atmospheric Data Centre
Outline
NERC DataGrid – data integration problem
Semantics as integration key
CSML
Wrapper/mediator architecture
Use and future
NERC DataGrid
[Figure: NERC DataGrid overview. Recoverable labels: British Atmospheric Data Centre, British Oceanographic Data Centre, Simulations, Assimilation]
NDG data integration
Most (but not all) NDG data is file-based…
On the Grid, no-one should know if you’re a file or
relational table… (one service to bind them all)
The file problem
• multiple formats
• focus usually on container, not content
Scientific file format examples (earth sciences):
• netCDF
• HDF4
• HDF5
• GRIB
• NASA Ames
• ...
NDG data integration
Typically, API is fundamental point of reference
• binary format details not always exposed (or guaranteed)
• public API often the only supported access mechanism
• API typically implemented as optimised native library
• why reinvent a well-known working interface?
Data Format Description Language (DFDL)
• XML ‘facade’ to file formats
• earth science files often giga-scale ⇒ XML query interface not likely to be efficient
• encapsulating format not the issue for NDG...
• ...integrating domain-specific semantics efficiently across files and formats is!
NDG data integration
Information and file contents
• same information in different file formats – want to expose
information, not format (seen earlier)
• in addition, semantic information structures may be composed
across files
Integration – semantics
Want semantic access to information, not abstract data
• getData(potential temperature from ERA-40 dataset in North Atlantic from 1990 to 2000)
• not: getData("era40.nc", 'PTMP', 20:50, 300:340, 190:200)
• or even worse:
    for j=1990:2000
      getData("era40_"+j+".nc", 'PTMP', 20:50, 300:340)
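A minimal sketch of the contrast above, assuming a small catalogue that maps semantic terms to file-level details. The names `FileSlice`, `PARAMETER_CODES`, `REGIONS` and `resolve` are hypothetical, and the index ranges are simply the slide's `20:50` and `300:340` read as inclusive bounds. The file naming follows the slide's `"era40_"+j+".nc"` pattern.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FileSlice:
    """One low-level read: a file, a variable code and index ranges."""
    file_name: str
    variable: str
    lat_idx: range
    lon_idx: range


# Hypothetical catalogue entries (assumptions, not NDG metadata).
PARAMETER_CODES = {"potential temperature": "PTMP"}
REGIONS = {"North Atlantic": (range(20, 51), range(300, 341))}


def resolve(parameter: str, dataset: str, region: str,
            first_year: int, last_year: int) -> List[FileSlice]:
    """Translate a semantic request into one file-level read per year."""
    variable = PARAMETER_CODES[parameter]
    lat_idx, lon_idx = REGIONS[region]
    return [
        FileSlice(f"{dataset}_{year}.nc", variable, lat_idx, lon_idx)
        for year in range(first_year, last_year + 1)
    ]


slices = resolve("potential temperature", "era40", "North Atlantic", 1990, 2000)
```

The caller names the phenomenon, dataset, region and period; file names, variable codes and array indices stay behind the interface.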
Lossy is OK!
• Care less about completeness of representation than semantic
unification
NDG data integration
Integration approaches: warehousing
Integration approaches: wrapper/mediator
[Figure: architecture diagrams for the two approaches. Recoverable labels: dataInterface, NDG mediatorService, wrapper, format conversion, BODC, BADC]
Integration – semantics
Summary:
What we require is
• semantic access to information (within and across files);
• and to use native (well-known) efficient APIs under the covers
also:
• scalability across providers
• warehousing not an option (tera-scale!)
• enhance access and use, ‘outwards-facing’ (e.g. impacts
community, policymakers)
• storage heterogeneity
Integration – semantics
Database data modelling
• Relational model (Codd, 1970)
• Entity-relationship model (Chen, 1976)
• Semantic data models
• Object-oriented data models (inheritance, aggregation, behaviour)
File-based data modelling
• Far less advanced
• Abstract models (‘variables’, ‘arrays’, etc.; no ‘object’ file formats in
widespread use for earth science data)
• API-driven
Integration – semantics
Fundamentally, an information community is defined by
shared semantics
• semantics often (but not always) implicit
• use information semantics for data integration
⇒ Semantics as integration ‘key’
• common language across providers (and users)
• supports wrapper/mediator architecture
NDG Solution components:
• semantic data model (Climate Science Modelling Language)
• storage descriptor (wrapper)
• data services (mediator)
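A minimal sketch of the wrapper/mediator idea, assuming each provider exposes its holdings through a wrapper that answers questions phrased in shared semantic terms. The class names, the `find` method and the feature identifiers are hypothetical illustrations, not NDG interfaces.

```python
from typing import Dict, Iterable, List, Protocol


class Wrapper(Protocol):
    """What the mediator expects from any provider wrapper."""
    def find(self, phenomenon: str) -> List[str]: ...


class InMemoryWrapper:
    """Stand-in wrapper; a real one would consult the provider's storage."""

    def __init__(self, provider: str, holdings: Dict[str, List[str]]):
        self.provider = provider
        self.holdings = holdings

    def find(self, phenomenon: str) -> List[str]:
        return [f"{self.provider}:{fid}" for fid in self.holdings.get(phenomenon, [])]


class Mediator:
    """Fans a semantic request out to every wrapper and merges the answers."""

    def __init__(self, wrappers: Iterable[Wrapper]):
        self.wrappers = list(wrappers)

    def find(self, phenomenon: str) -> List[str]:
        results: List[str] = []
        for wrapper in self.wrappers:
            results.extend(wrapper.find(phenomenon))
        return results


mediator = Mediator([
    InMemoryWrapper("BADC", {"air_temperature": ["f1"]}),
    InMemoryWrapper("BODC", {"sea_temperature": ["f2"], "air_temperature": ["f3"]}),
])
```

The shared vocabulary (here the phenomenon name) is the only thing the mediator and the wrappers need to agree on.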
CSML
Geographic ‘features’
• “abstraction of real world phenomena” [ISO 19101]
• Object models for data types – type or instance
• Encapsulate important semantics in universe of discourse
Application schema
• Defines semantic content and logical structure of datasets
• ISO standards provide conceptual toolkit:
  • spatial/temporal referencing
  • geometry (1-, 2-, 3-D)
  • topology
  • dictionaries (phenomena, units, etc.)
• GML – canonical encoding
[Figure from ISO 19109 “Geographic information – Rules for Application Schema”]
CSML
CSML aims:
• provide semantic integration mechanism for NDG data
• explore new standards-based interoperability framework
• emphasise content, not container
Design principles:
• offload semantics onto parameter type (‘phenomenon’, observable,
measurand)
• e.g. wind-profiler, balloon temperature sounding
• offload semantics onto CRS
• e.g. scanning radar, sounding radar
• ‘sensible plotting’ as discriminant
• ‘in-principle’ unsupervised portrayal
• explicitly aim for small number of weakly-typed features (in accordance
with governance principle and NDG remit)
CSML
Semantic data model
• Climate Science Modelling Language (CSML),
http://ndg.nerc.ac.uk/csml
• Weakly-typed conceptual models for range of information types
• Independent of storage concerns
• Based on ISO ‘geographic feature types’ framework
• Defined on basis of geometric and topologic structure
CSML feature type | Description | Examples
TrajectoryFeature | Discrete path in time and space of a platform or instrument. | ship’s cruise track, aircraft’s flight path
PointFeature | Single point measurement. | raingauge measurement
ProfileFeature | Single ‘profile’ of some parameter along a directed line in space. | wind sounding, XBT, CTD, radiosonde
GridFeature | Single time-snapshot of a gridded field. | gridded analysis field
PointSeriesFeature | Series of single datum measurements. | tidegauge, rainfall timeseries
ProfileSeriesFeature | Series of profile-type measurements. | vertical or scanning radar, shipborne ADCP, thermistor chain timeseries
GridSeriesFeature | Timeseries of gridded parameter fields. | numerical weather prediction model, ocean general circulation model
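A minimal sketch of how six of the feature types could be told apart by structure alone, assuming two discriminants: the spatial domain of one sample and whether the data form a series. The `Domain` enum and `feature_type` function are hypothetical. TrajectoryFeature is left out because the sketch has no discriminant for a moving platform.

```python
from enum import Enum


class Domain(Enum):
    """Spatial structure of a single sample (an assumed discriminant)."""
    POINT = "point"
    PROFILE = "profile"
    GRID = "grid"


FEATURE_TYPES = {
    (Domain.POINT, False): "PointFeature",
    (Domain.PROFILE, False): "ProfileFeature",
    (Domain.GRID, False): "GridFeature",
    (Domain.POINT, True): "PointSeriesFeature",
    (Domain.PROFILE, True): "ProfileSeriesFeature",
    (Domain.GRID, True): "GridSeriesFeature",
}


def feature_type(domain: Domain, is_series: bool) -> str:
    """Return the CSML feature type name for a structure description."""
    return FEATURE_TYPES[(domain, is_series)]


tide_gauge = feature_type(Domain.POINT, True)
model_output = feature_type(Domain.GRID, True)
```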
CSML
CSML feature type examples:
ProfileSeriesFeature
ProfileFeature
GridFeature
Wrapper
Numerical array descriptors
• provides ‘wrapper’ architecture for legacy data files
• proxy for numerical content within feature instances
• ‘Connected’ to data model numerical content through ‘xlink:href’
Three subtypes:
• InlineArray
• ArrayGenerator
• FileExtract (NASAAmes, NetCDF, GRIB)
Composite design pattern for aggregation
[UML class diagram; classes and attributes as labelled]
«Type» GML::AbstractGMLType: +id, +metaDataProperty, +description, +name
«Type» AbstractArrayDescriptor: +arraySize[1], +uom[0..1], +numericType[0..1], +numericTransform[0..1], +regExpTransform[0..1]
«Type» AggregatedArray: +aggType[1], +aggIndex[1], +component[*]
«Type» InlineArray: +values[*]
«Type» ArrayGenerator: +expression[1]
«Type» AbstractFileExtract: +fileName[1]
«Type» NASAAmesExtract: +variableName[1], +index[0..1]
«Type» NetCDFExtract: +variableName[1]
«Type» GRIBExtract: +parameterCode[1], +recordNumber[0..1], +fileOffset[0..1]
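A minimal sketch of the composite pattern named above, using Python dataclasses. Attribute names mirror the diagram labels in snake case. The inheritance relations are assumed from the class names, since the diagram's arrows are not recoverable from the text. The `leaves` method is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ArrayDescriptor:
    """Common part of every descriptor (AbstractArrayDescriptor in the diagram)."""
    array_size: List[int]
    uom: Optional[str] = None
    numeric_type: Optional[str] = None

    def leaves(self) -> List["ArrayDescriptor"]:
        # A plain descriptor is its own single leaf.
        return [self]


@dataclass
class NetCDFExtract(ArrayDescriptor):
    file_name: str = ""
    variable_name: str = ""


@dataclass
class AggregatedArray(ArrayDescriptor):
    """Composite: holds other descriptors, which may be aggregates themselves."""
    agg_type: str = "existing"
    agg_index: int = 1
    components: List[ArrayDescriptor] = field(default_factory=list)

    def leaves(self) -> List[ArrayDescriptor]:
        found: List[ArrayDescriptor] = []
        for component in self.components:
            found.extend(component.leaves())
        return found


whole = AggregatedArray(
    [180, 360],
    components=[
        NetCDFExtract([90, 360], file_name="northern_hemisphere.nc", variable_name="TMP"),
        NetCDFExtract([90, 360], file_name="southern_hemisphere.nc", variable_name="TMP"),
    ],
)
```

Because an aggregate is itself a descriptor, client code can treat a single file extract and a multi-file aggregation uniformly.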
Wrapper
File extract examples:
<NDGNASAAmesExtract>
<arraySize>526</arraySize>
<numericType>double</numericType>
<fileName>/data/BADC/macehead/mh960606.cf1</fileName>
<variableName>CFC-12</variableName>
</NDGNASAAmesExtract>
<NDGNetCDFExtract gml:id="feat04azimuth">
<arraySize>10000</arraySize>
<fileName>radar_data.nc</fileName>
<variableName>az</variableName>
</NDGNetCDFExtract>
<NDGGRIBExtract>
<arraySize>320 160</arraySize>
<numericType>double</numericType>
<fileName>/e40/ggas1992010100rsn.grb</fileName>
<parameterCode>203</parameterCode>
<recordNumber>5</recordNumber>
<fileOffset>289412</fileOffset>
</NDGGRIBExtract>
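A minimal sketch of reading one of these descriptors with Python's standard `xml.etree.ElementTree`, assuming the fragment is available as a standalone string without namespace prefixes. The `parse_extract` function is hypothetical. The GRIB fragment is copied from the example above.

```python
import xml.etree.ElementTree as ET

GRIB_EXTRACT = """<NDGGRIBExtract>
  <arraySize>320 160</arraySize>
  <numericType>double</numericType>
  <fileName>/e40/ggas1992010100rsn.grb</fileName>
  <parameterCode>203</parameterCode>
  <recordNumber>5</recordNumber>
  <fileOffset>289412</fileOffset>
</NDGGRIBExtract>"""


def parse_extract(xml_text: str):
    """Return the extract type and a dict of its child element values."""
    root = ET.fromstring(xml_text)
    fields = {child.tag: (child.text or "").strip() for child in root}
    # arraySize is a space-separated list of dimension lengths.
    fields["arraySize"] = [int(n) for n in fields["arraySize"].split()]
    return root.tag, fields


kind, fields = parse_extract(GRIB_EXTRACT)
```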
Wrapper
Aggregated array
• arrays may be aggregated along an ‘existing’ or ‘new’ dimension
<AggregatedArray gml:id="globaltemperature">
<arraySize>180 360</arraySize>
<aggType>existing</aggType>
<aggIndex>1</aggIndex>
<component>
<NetCDFExtract>
<arraySize>90 360</arraySize>
<fileName>northern_hemisphere.nc</fileName>
<variableName>TMP</variableName>
</NetCDFExtract>
</component>
<component>
<NetCDFExtract>
<arraySize>90 360</arraySize>
<fileName>southern_hemisphere.nc</fileName>
<variableName>TMP</variableName>
</NetCDFExtract>
</component>
</AggregatedArray>
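A minimal sketch of the two aggregation modes on nested Python lists. It assumes `aggIndex` is 1-based, which is consistent with the example above (two 90 x 360 arrays joined at index 1 give 180 x 360). The `aggregate` function and its helpers are hypothetical.

```python
def _concat(arrays, axis):
    """Join arrays end to end along an existing axis."""
    if axis == 0:
        return [item for array in arrays for item in array]
    return [_concat(list(rows), axis - 1) for rows in zip(*arrays)]


def _stack(arrays, axis):
    """Place arrays side by side along a newly created axis."""
    if axis == 0:
        return list(arrays)
    return [_stack(list(rows), axis - 1) for rows in zip(*arrays)]


def aggregate(components, agg_type, agg_index):
    """Combine equally shaped nested lists as an AggregatedArray would."""
    axis = agg_index - 1  # assumption: aggIndex counts dimensions from 1
    if agg_type == "existing":
        return _concat(components, axis)
    if agg_type == "new":
        return _stack(components, axis)
    raise ValueError(f"unknown aggType: {agg_type}")


north = [[1, 2, 3], [4, 5, 6]]        # 2 x 3
south = [[7, 8, 9], [10, 11, 12]]     # 2 x 3
joined = aggregate([north, south], "existing", 1)   # 4 x 3
stacked = aggregate([north, south], "new", 1)       # 2 x 2 x 3
```

With `existing` the result keeps its rank and one dimension grows; with `new` the rank increases by one.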
Mediator
Data services (mediator)
• Data services expose semantic model:
• Mappings to third-party data models (e.g. file formats,
OPeNDAP)
• Canonical serialisation (e.g. ISO 19118 UML → XML mapping) – Geography Markup Language
• Example services:
• netCDF file instantiation
• OPeNDAP delivery
• Open Geospatial Consortium (OGC) web services, e.g. Web
Feature Service, Web Coverage Service
• Pushed down to the file level, data access request should use
optimised native file format-specific I/O
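A minimal sketch of pushing a request down to format-specific I/O, assuming a registry keyed by extract type. The reader functions here are stubs that only report what they would read; a real reader would call the native library for its format. All names (`READERS`, `reader_for`, `read`) are hypothetical.

```python
from typing import Callable, Dict

READERS: Dict[str, Callable[[dict], str]] = {}


def reader_for(extract_type: str):
    """Decorator that registers a reader for one extract type."""
    def register(func):
        READERS[extract_type] = func
        return func
    return register


@reader_for("NetCDFExtract")
def read_netcdf(extract: dict) -> str:
    # Stub: a real reader would call the netCDF library here.
    return f"netCDF read: variable {extract['variableName']} in {extract['fileName']}"


@reader_for("GRIBExtract")
def read_grib(extract: dict) -> str:
    # Stub: a real reader would call a GRIB decoding library here.
    return f"GRIB read: parameter {extract['parameterCode']} in {extract['fileName']}"


def read(extract_type: str, extract: dict) -> str:
    """Dispatch to the reader registered for this extract type."""
    if extract_type not in READERS:
        raise ValueError(f"no reader registered for {extract_type}")
    return READERS[extract_type](extract)


message = read("NetCDFExtract", {"fileName": "radar_data.nc", "variableName": "az"})
```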
Mediator
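[Figure: mediator architecture. Services (NetCDF, WCS, WFS, OPeNDAP, ...) sit above a set of <CSML> documents, which provide the semantic abstraction layer. Example request: instantiateNetCDF(DatasetID, FeatureID). (SAX) demarshalling of CSML yields CSMLAbstractFeature (+writeNetCDF()) and AbstractFileExtract (+read()) objects, which read from the filestore.]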
Using CSML
Example of CSML use – MarineXML
[Figure: MarineXML data flow, with thanks to Keiran Millard, HR Wallingford. Annotations:]
• Data from different parts of the marine community conforming to a variety of schema (XSD). Sources shown: Biological Species, Chl-a from Satellite, Measured Hydrodynamics, Modelled Hydrodynamics.
• For each XSD (for the source data) there is an XSLT to translate the data to the Feature Types (FT) defined by CSML. The FTs and XSLT are maintained in a ‘MarineXML registry’.
• The result of the translation is an encoding that contains the marine data in weakly typed (i.e. generic) Features.
• The FTs can then be translated to equivalent FTs for display in the ECDIS system.
• Phenomena in the XSD must have an associated portrayal.
• Features in the source XSD must be present in the data dictionary.
• ECDIS acts as an example client for the data.
[Other diagram labels: XML Parser, MarineXML, GML (NDG), Feature Types, S52 Portrayal Library, Data Dictionary, SENC, SeeMyDENC]
Using CSML
EU project – MarineXML
<gml:definitionMember>
<om:Phenomenon gml:id="taxon">
<gml:description>The taxon name</gml:description>
<gml:name codeSpace="http://www.vliz.be">taxon</gml:name>
</om:Phenomenon>
</gml:definitionMember>
</NDGPhenomenonDefinitions>
<!--===================================================================-->
<gml:FeatureCollection>
<!-- ============================================================== -->
<gml:featureMember>
<NDGPointFeature gml:id="ICES_100">
<NDGPointDomain>
<domainReference>
<NDGPosition srsName="urn:EPSG:geographicCRS:4979" axisLabels="Lat Long" uomLabels="degree degree">
<location>55.25 6.5</location>
</NDGPosition>
</domainReference>
</NDGPointDomain>
<gml:rangeSet>
<gml:DataBlock>
<gml:rangeParameters>
<gml:CompositeValue>
<gml:valueComponents>
<gml:measure uom="#tn"/>
<gml:measure uom="#amount"/>
<gml:measure uom="#gsm"/>
</gml:valueComponents>
</gml:CompositeValue>
</gml:rangeParameters>
<gml:tupleList>
'ANTHOZOA',63.1,missing
'Scoloplos armiger',66.1,missing
'Spio filicornis',10,missing
'Spiophanes bombyx',60.3,missing
'Capitellidae',131.8,missing
'Pholoe',10,missing
'Owenia fusiformis',23.4,missing
'Hypereteone lactea',6.8,missing
'Anaitides groenlandica',13.2,missing
'Anaitides mucosa',6.8,missing
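A minimal sketch of turning the `gml:tupleList` rows above into records, using Python's standard `csv` module with a single quote as the quote character. The field names are taken from the `uom` references (`#tn`, `#amount`, `#gsm`). Mapping the token `missing` to `None` is an assumption, and `parse_tuple_list` is a hypothetical helper.

```python
import csv
import io

TUPLES = """'ANTHOZOA',63.1,missing
'Scoloplos armiger',66.1,missing
'Spio filicornis',10,missing"""


def _convert(raw: str):
    """Map 'missing' to None, numbers to float, anything else to str."""
    if raw == "missing":
        return None
    try:
        return float(raw)
    except ValueError:
        return raw


def parse_tuple_list(text: str, names):
    """Parse comma-separated tuples into a list of dicts keyed by names."""
    rows = csv.reader(io.StringIO(text.strip()), quotechar="'", skipinitialspace=True)
    return [
        {name: _convert(raw) for name, raw in zip(names, row)}
        for row in rows
        if row
    ]


records = parse_tuple_list(TUPLES, ["tn", "amount", "gsm"])
```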
“MarineXML is an initiative of the IOC/IODE of UNESCO to improve marine data exchange within the marine community. The European Commission has provided funding to this initiative as part of its 5th Framework Programme to undertake a ‘pre-standardisation’ task of identifying the approaches the marine community should adopt regarding XML technology to achieve improved data exchange.”
“... there is a momentum from organisations such as IHO and WMO to adopt consistent approaches for the vocabulary of their data along [...] the implementation of ISO Standards prescribed by the [Open Geospatial Consortium]...”
“The NDG format proved a robust recipient for the data from each community. It produced economical files with few redundant elements, striking about the right balance between weak and strong typing.”
Conclusions/future
Conclusions
• Mechanism is lossy, in general
• ⇒ semantic integration is far more important than completeness of representation
• Emphasis on content, not container
• Mediator services can expose data model
• Well-known community formats – use efficient legacy APIs
• Initial semantic decoration can add context to entire workflow chain
• Loose relationship between legacy file data model and semantic
(feature) instance to which it is mapped
Conclusions/future
Current and future work (NDG)
• Implement tooling:
• CSML parsing/processing
• Automated ‘scanner’: {files} → CSML
• Implement NDG data delivery (mediator) services layered over data
model
Further perspectives
• Integrate with broader interoperability frameworks (e.g. ‘semantics
repositories’ Feature Type Catalogues – WMO, IOC, INSPIRE)
• Generalise approach:
• meta-model for data modelling
• ‘data storage description language’ for file mappings (DFDL role?)
• canonicalised serialisation for workflows
Conclusions/future
Managing semantics
[Figure: managing semantics workflow. Recoverable labels: conceptual model (UML classes Class1, Class2 and «datatype» DataType1); define data models; auto-generate XSD; GML app schema; populate dataset instances; GML dataset (the NDGPointFeature "ICES_100" excerpt shown on the MarineXML slide); auto-generated parser]
Conclusions/future
Parser:
Stack of Builders (for UML meta-model)
• current class, object, attribute
• specialised for particular UML → XML mapping
Builder receives:
• filtered SAX events
• built object
Builder returns:
• built object
• new object class
• new Builder (for inheritance through substitutionGroups)
[UML class diagram; classes and members as labelled]
«interface» SAX2::ContentHandler: +setDocumentLocator(), +startDocument(), +endDocument(), +startElement(), +endElement(), +characters(), +processingInstruction(), +ignorableWhitespace(), +startPrefixMapping(), +endPrefixMapping(), +skippedEntity()
«interface» SAX2::DTDHandler: +notationDecl(), +unparsedEntity()
«interface» SAX2::ErrorHandler: +error(), +fatalError(), +warning()
«interface» SAX2::EntityResolver: +resolveEntity()
SAX2::DefaultHandler
ObjectParser: -process(), builderStack [*] of ObjectBuilder
ObjectBuilder: -currentClass, -currentObject, -currentAttribute, +xmlElement() : BuilderUnion, +xmlCharacters() : BuilderUnion, +xmlAttribute() : BuilderUnion, +processObject() : BuilderUnion
ObjectPropertyBuilder
«union» BuilderUnion: AbstractClass, AbstractObject, AbstractBuilder
ISO19118Builder
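A minimal sketch of a builder stack driven by SAX events, using Python's standard `xml.sax`. It is a simplification of the design above: each builder only collects a tag, attributes, text and child objects, and there is no UML meta-model or substitutionGroup handling. The `Builder` class and `parse` function are hypothetical; `ObjectParser` borrows the diagram's name only.

```python
import xml.sax


class Builder:
    """Collects one element: tag, attributes, text and built children."""

    def __init__(self, tag, attrs):
        self.tag = tag
        self.attrs = dict(attrs.items())
        self.text = []
        self.children = []

    def build(self):
        return {
            "tag": self.tag,
            "attrs": self.attrs,
            "text": "".join(self.text).strip(),
            "children": self.children,
        }


class ObjectParser(xml.sax.ContentHandler):
    """Keeps a stack of builders; the top of the stack is the open element."""

    def __init__(self):
        super().__init__()
        self.builder_stack = []
        self.result = None

    def startElement(self, name, attrs):
        self.builder_stack.append(Builder(name, attrs))

    def characters(self, content):
        if self.builder_stack:
            self.builder_stack[-1].text.append(content)

    def endElement(self, name):
        built = self.builder_stack.pop().build()
        if self.builder_stack:
            # Hand the finished object to the enclosing builder.
            self.builder_stack[-1].children.append(built)
        else:
            self.result = built


def parse(xml_text: str):
    handler = ObjectParser()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.result


obj = parse(
    "<NetCDFExtract><fileName>radar_data.nc</fileName>"
    "<variableName>az</variableName></NetCDFExtract>"
)
```

Each closing tag pops a builder and passes its built object to the builder beneath it, which corresponds to the "receives: built object" step on the slide.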