British Atmospheric Data Centre (BADC)
Sam Pepler
CSML slides stolen from Andrew Woolf
Outline
• What is the BADC and how does it work?
• Geospatial data at the BADC
• CSML
• Redefining datasets
Edinburgh, Oct 2006
Maintaining Long-term Access to Geospatial Data
What is the BADC?
• NERC’s designated data centre for atmospheric science.
• "The role of the British Atmospheric Data Centre (BADC) is to assist UK atmospheric researchers to locate, access and interpret atmospheric data and to ensure the long-term integrity of atmospheric data produced by Natural Environment Research Council (NERC) projects."
• Curation and Facilitation.
• http://badc.nerc.ac.uk/
• Part of NCAS
Primarily driven by Facilitation
Datasets
• A bunch of files sharing a common administration.
• ~60TB
• 130 datasets
• From NERC programmes, Met Office, ECMWF, NASA
Users
• Researchers
• 1730 active users in last 12 months:
• Less than half atmospheric science
• 30% from overseas
• A & E influenza cases.
• Pollution chemistry
• Discomfort indices.
• Ocean productivity
• Castle mortar decay.
• Atmospheric chemistry models.
• Wind power research
• Bird feeding habits.
Data Sets
“A collection of files with a common theme and administration”
• Ground based observation networks: Met Office surface stations
• Model output: NWP, ECMWF reanalyses & climate models
• Satellite data: TOMS, Envisat & MSG
• NERC programmes data: UTLS, CWVC & URGENT
User workflow
• Find data from web.
• Look at the file naming convention and work out what to get.
• Use web or FTP to get the data files.
• Simple tools available to subset and plot some data.
• Go away and do research.
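The second step of this workflow, working out which files to fetch from the naming convention, could be sketched in Python as follows. The naming pattern (`<station>_<YYYYMMDD>.dat`) and the file names are invented for illustration; real BADC conventions vary from dataset to dataset and are not given in these slides.

```python
import re
from datetime import date

# Hypothetical naming convention: <station>_<YYYYMMDD>.dat
NAME_PATTERN = re.compile(r"^(?P<station>[a-z]+)_(?P<day>\d{8})\.dat$")


def select_files(names, station, start, end):
    """Return the file names for one station dated within [start, end]."""
    selected = []
    for name in names:
        match = NAME_PATTERN.match(name)
        if match is None:
            continue  # file does not follow the assumed convention
        day = match.group("day")
        file_date = date(int(day[:4]), int(day[4:6]), int(day[6:8]))
        if match.group("station") == station and start <= file_date <= end:
            selected.append(name)
    return sorted(selected)


# Made-up directory listing, standing in for a web or FTP listing.
listing = [
    "camborne_20060101.dat",
    "camborne_20060215.dat",
    "lerwick_20060101.dat",
    "README.txt",
]
wanted = select_files(listing, "camborne", date(2006, 1, 1), date(2006, 1, 31))
print(wanted)
```

The point of the sketch is that the user has to encode the convention by hand for every dataset, which is the burden the later slides try to remove.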
Archive Example
Geospatial Data
• Nearly all the data at the BADC has geospatial information
• But it is not represented in a standard way
• Lots of types of geospatial and temporal things with no clear categorisation
Moving forward
• The current way of doing things makes it hard to integrate data from other data repositories…
• …, or other datasets…
• …, or even data from within the same dataset sometimes!
NERC DataGrid (NDG)
[Diagram labels: British Atmospheric Data Centre, British Oceanographic Data Centre, Simulations, Assimilation]
Climate Science Modelling Language (CSML)
• Data integration requirements:
– scalability across providers
– enhance access and use, ‘outwards-facing’ (e.g. impacts community, policymakers)
– storage heterogeneity, many data providers, many formats
• Semantics as integration ‘key’
– common language across providers (and users)
– supports wrapper/mediator architecture
Standards
• Emerging ISO standards
– TC211 – around 40 standards for geographic information
• Geographic ‘features’
– “abstraction of real world phenomena” [ISO 19101]
– Type or instance
– Encapsulate important semantics in universe of discourse
• Application schema
– Defines semantic content and logical structure of datasets
– ISO standards provide toolkit:
    • spatial/temporal referencing
    • geometry (1-, 2-, 3-D)
    • topology
    • dictionaries (phenomena, units, etc.)
– GML – canonical encoding
[from ISO 19109 “Geographic information – Rules for Application Schema”]
Climate Science Modelling Language
• Feature type design principles:
– explicitly aim for small number of weakly-typed features (in accordance with governance principle and NDG remit)
– ‘sensible plotting’ as discriminant
• ‘in-principle’ unsupervised portrayal
Feature types spectrum:
<measurement type="Radiosonde" measurand="temperature"/> (abstract)
<Sonde parameter="temperature"/> (generic)
<temperatureProfile/> (highly specialised)
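One way to read the spectrum is to look at where the meaning sits in each element. In the abstract form it is carried by attribute values, so a handful of element names can cover many instruments; in the highly specialised form it is carried by the element name itself. The sketch below parses the three example elements with Python's standard `xml.etree.ElementTree`. The helper `describe` is a hypothetical name introduced only for this illustration.

```python
import xml.etree.ElementTree as ET

SNIPPETS = [
    '<measurement type="Radiosonde" measurand="temperature"/>',  # abstract
    '<Sonde parameter="temperature"/>',                          # generic
    '<temperatureProfile/>',                                     # highly specialised
]


def describe(xml_text):
    """Return (element name, attributes) to show where the semantics sit."""
    element = ET.fromstring(xml_text)
    return element.tag, dict(element.attrib)


descriptions = [describe(snippet) for snippet in SNIPPETS]
for tag, attributes in descriptions:
    print(tag, attributes)
```

A schema built from the abstract end needs few element definitions but relies on agreed attribute vocabularies; one built from the specialised end needs a new element for every kind of measurement.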
CSML Feature types
• defined on basis of geometric and topologic structure
CSML feature type | Description | Examples
TrajectoryFeature | Discrete path in time and space of a platform or instrument. | ship’s cruise track, aircraft’s flight path
PointFeature | Single point measurement. | raingauge measurement
ProfileFeature | Single ‘profile’ of some parameter along a directed line in space. | wind sounding, XBT, CTD, radiosonde
GridFeature | Single time-snapshot of a gridded field. | gridded analysis field
PointSeriesFeature | Series of single datum measurements. | tidegauge, rainfall timeseries
ProfileSeriesFeature | Series of profile-type measurements. | vertical or scanning radar, shipborne ADCP, thermistor chain timeseries
GridSeriesFeature | Timeseries of gridded parameter fields. | numerical weather prediction model, ocean general circulation model
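Because the seven feature types are distinguished by structure rather than by instrument, software can in principle pick a behaviour from the type name alone. The following Python sketch lists the types as an enumeration and maps each to a default portrayal. The portrayal strings are assumptions made for illustration; the slides do not say how each type is plotted.

```python
from enum import Enum


class FeatureType(Enum):
    """The seven CSML feature types named in the table."""
    TRAJECTORY = "TrajectoryFeature"
    POINT = "PointFeature"
    PROFILE = "ProfileFeature"
    GRID = "GridFeature"
    POINT_SERIES = "PointSeriesFeature"
    PROFILE_SERIES = "ProfileSeriesFeature"
    GRID_SERIES = "GridSeriesFeature"


# Hypothetical default portrayals, not taken from the CSML specification.
DEFAULT_PLOT = {
    FeatureType.TRAJECTORY: "track on a map",
    FeatureType.POINT: "single marker on a map",
    FeatureType.PROFILE: "value against height or depth",
    FeatureType.GRID: "filled contour map",
    FeatureType.POINT_SERIES: "value against time",
    FeatureType.PROFILE_SERIES: "time-height section",
    FeatureType.GRID_SERIES: "animated sequence of maps",
}


def default_plot(feature_type_name):
    """Look up the assumed default portrayal for a CSML feature type name."""
    return DEFAULT_PLOT[FeatureType(feature_type_name)]


print(default_plot("ProfileFeature"))
```

An unknown type name raises `ValueError` from the `FeatureType(...)` lookup, which keeps the small, closed set of types explicit.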
Climate Science Modelling Language
• CSML feature types
– examples...
[Figure panels: ProfileSeriesFeature, ProfileFeature, GridFeature]
Climate Science Modelling Language
• Provides semantic abstraction layer
• Provides ‘wrapper’ architecture for legacy data files
• Composite design pattern for aggregation
[Diagram labels: NetCDF, WCS, WFS, OPeNDAP, ....; instantiateNetCDF(DatasetID, FeatureID); <CSML> documents; (SAX) demarshalling; CSMLAbstractFeature +writeNetCDF(); AbstractFileExtract +read(); filestore]
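The wrapper and composite ideas on this slide can be illustrated with a small Python sketch. Only the name `AbstractFileExtract` and its `read()` method come from the slide; the two concrete classes, their constructors and the list-of-floats return type are hypothetical, and no real file format is read here.

```python
from abc import ABC, abstractmethod


class AbstractFileExtract(ABC):
    """Wrapper around part of a legacy data file (name from the slide)."""

    @abstractmethod
    def read(self):
        """Return the wrapped values as a list of floats (assumed type)."""


class InMemoryExtract(AbstractFileExtract):
    """Hypothetical leaf: stands in for a reader of one real file."""

    def __init__(self, values):
        self._values = list(values)

    def read(self):
        return list(self._values)


class AggregatedExtract(AbstractFileExtract):
    """Hypothetical composite: presents several extracts as one."""

    def __init__(self, parts):
        self._parts = list(parts)

    def read(self):
        values = []
        for part in self._parts:
            values.extend(part.read())  # parts may themselves be composites
        return values


january = InMemoryExtract([1.0, 2.0])
february = InMemoryExtract([3.0])
both = AggregatedExtract([january, AggregatedExtract([february])])
print(both.read())
```

Callers only ever see `read()`, so a feature backed by one file and a feature aggregated from many files are handled identically.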
Datasets Redefined
• “A collection of files with a common theme and administration”
• + Features are much better for data integration.
• + Features are a more natural thing to reference in papers and other research communication.
• + Features don’t depend on format or physical storage methods, and so are potentially more migratable.
• + Features provide a clear definition of a dataset’s scope.
• - Making features from files is lossy for metadata.
• - Making CSML files is not trivial. We are working on a CSML scanner.
• ? How do I preserve features rather than files?
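The claims about scope and storage independence can be made concrete with a short Python sketch: the dataset is defined by a set of feature identifiers, while a separate catalogue records where each feature currently lives. All identifiers, paths and names (`Dataset`, `migrate`) are hypothetical and chosen for illustration; this is not how the BADC archive is implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """A dataset scoped by the features it contains, not by files."""
    name: str
    feature_ids: frozenset


def migrate(locations, feature_id, new_location):
    """Return a new location catalogue with one feature moved."""
    if feature_id not in locations:
        raise KeyError(feature_id)
    updated = dict(locations)
    updated[feature_id] = new_location
    return updated


# Hypothetical identifiers and paths, for illustration only.
dataset = Dataset("example-soundings", frozenset({"profile-001", "profile-002"}))
locations = {
    "profile-001": "/old_store/soundings_a.dat",
    "profile-002": "/old_store/soundings_a.dat",
}
moved = migrate(locations, "profile-001", "/new_store/profile-001.nc")

print(sorted(dataset.feature_ids))
print(moved["profile-001"])
```

Migration changes the catalogue but leaves the dataset definition, and therefore any citation of its features, untouched.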
Summary
• We are going to use features to define BADC datasets
• This should give us clarity for referencing datasets and easier integration
• This is not going to happen overnight. We have just started producing CSML for some of the easy datasets
• Questions?