British Atmospheric Data Centre (BADC)
Sam Pepler
CSML slides stolen from Andrew Woolf
Edinburgh, Oct 2006 Maintaining Long-term Access to Geospatial Data

Outline
• What is the BADC and how does it work?
• Geospatial data at the BADC
• CSML
• Redefining datasets

What is the BADC
• NERC’s designated data centre for atmospheric science.
• “The role of the British Atmospheric Data Centre (BADC) is to assist UK atmospheric researchers to locate, access and interpret atmospheric data and to ensure the long-term integrity of atmospheric data produced by Natural Environment Research Council (NERC) projects.”
• Curation and Facilitation.
• http://badc.nerc.ac.uk/
• Part of NCAS

Primarily driven by Facilitation
Datasets
• A bunch of files sharing a common administration.
• ~60TB
• 130 datasets
• From NERC programmes, Met Office, ECMWF, NASA
Users
• Researchers
• 1730 active users in last 12 months:
• Less than half atmospheric science
• 30% from overseas

[Slide graphic: examples of research using BADC data]
• A & E influenza cases
• Pollution chemistry
• Discomfort indices
• Ocean productivity
• Castle mortar decay
• Atmospheric chemistry models
• Wind power
• Bird feeding habits

Data Sets
“A collection of files with a common theme and administration”
• Ground based observation networks
  – Met Office surface stations
• Model output
  – NWP, ECMWF reanalyses & Climate models
• Satellite data
  – TOMS, Envisat & MSG
• NERC programmes data
  – UTLS, CWVC & URGENT

User workflow
• Find data from web.
• Look at file naming convention and work out what to get.
• Use web or FTP to get the data files.
• Simple tools available to subset and plot some data.
• Go away and do research.

Archive Example

Geospatial Data
• Nearly all the data at the BADC has geospatial information
• But it is not represented in a standard way
• Lots of types of geospatial and temporal things with no clear categorisation

Moving forward
• The current way of doing things makes it hard to integrate data from other data repositories…
• …, or other datasets…
• …, or even data from within the same dataset sometimes!

NERC DataGrid (NDG)
[Diagram labels: British Atmospheric Data Centre, British Oceanographic Data Centre, Simulations, Assimilation]

Climate Science Modelling Language (CSML)
• Data integration requirements:
  – scalability across providers
  – enhance access and use, ‘outwards-facing’ (e.g. impacts community, policymakers)
  – storage heterogeneity, many data providers, many formats
• Semantics as integration ‘key’
  – common language across providers (and users)
  – supports wrapper/mediator architecture

Standards
• Emerging ISO standards
  – TC211 – around 40 standards for geographic information
• Geographic ‘features’
  – “abstraction of real world phenomena” [ISO 19101]
  – Type or instance
  – Encapsulate important semantics in universe of discourse
• Application schema
  – Defines semantic content and logical structure of datasets
  – ISO standards provide toolkit:
    • spatial/temporal referencing
    • geometry (1-, 2-, 3-D)
    • topology
    • dictionaries (phenomena, units, etc.)
  – GML – canonical encoding
[from ISO 19109 “Geographic information – Rules for Application Schema”]

Climate Science Modelling Language
• Feature type design principles:
  – explicitly aim for small number of weakly-typed features (in accordance with governance principle and NDG remit)
  – ‘sensible plotting’ as discriminant
    • ‘in-principle’ unsupervised portrayal
Feature types spectrum, from abstract and generic to highly specialised:
  <measurement type="Radiosonde" measurand="temperature"/>
  <Sonde parameter="temperature"/>
  <temperatureProfile/>

CSML Feature types
• defined on basis of geometric and topologic structure
CSML feature type, with description and examples:
• TrajectoryFeature
  – Description: Discrete path in time and space of a platform or instrument.
  – Examples: ship’s cruise track, aircraft’s flight path
• PointFeature
  – Description: Single point measurement.
  – Examples: raingauge measurement
• ProfileFeature
  – Description: Single ‘profile’ of some parameter along a directed line in space.
  – Examples: wind sounding, XBT, CTD, radiosonde
• GridFeature
  – Description: Single time-snapshot of a gridded field.
  – Examples: gridded analysis field
• PointSeriesFeature
  – Description: Series of single datum measurements.
  – Examples: tidegauge, rainfall timeseries
• ProfileSeriesFeature
  – Description: Series of profile-type measurements.
  – Examples: vertical or scanning radar, shipborne ADCP, thermistor chain timeseries
• GridSeriesFeature
  – Description: Timeseries of gridded parameter fields.
  – Examples: numerical weather prediction model, ocean general circulation model

Climate Science Modelling Language
• CSML feature types – examples...
[Figure panels: ProfileSeriesFeature, ProfileFeature, GridFeature]

Climate Science Modelling Language
• Provides semantic abstraction layer
• Provides ‘wrapper’ architecture for legacy data files
• Composite design pattern for aggregation
[Diagram labels: NetCDF, WCS, WFS, OPeNDAP, ....]
[Diagram labels: instantiateNetCDF(DatasetID, FeatureID); <CSML> (four times); (SAX) demarshalling; CSMLAbstractFeature +writeNetCDF(); AbstractFileExtract +read(); filestore]

Datasets Redefined
• “A collection of files with a common theme and administration”
• + Features are much better for data integration.
• + Features are a more natural thing to reference in papers and other research communication.
• + Features don’t depend on format or physical storage methods, potentially more migratable.
• + Features provide a clear definition of a dataset’s scope.
• - Making features from files is lossy for metadata.
• - Making CSML files is not trivial. We are working on a CSML scanner.
• ? How do I preserve features rather than files?

Summary
• We are going to use features to define BADC datasets
• This should give us clarity for referencing datasets and easier integration
• This is not going to happen overnight. We have just started producing CSML for some of the easy datasets
• Questions?
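
Illustrative sketch (not from the slides): the CSML slides describe features typed by geometric structure, a ‘wrapper’ over legacy files, and a composite pattern for aggregation. The Python below is a rough guess at how those three ideas fit together. Only the names AbstractFileExtract and read() come from the architecture slide; InMemoryExtract, Feature, FeatureCollection, classify and the dimension labels are hypothetical, and none of this is the actual CSML API.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Sequence


class AbstractFileExtract:
    """Wrapper over part of a legacy data file (name taken from the slide)."""

    def read(self) -> List[float]:
        raise NotImplementedError


@dataclass
class InMemoryExtract(AbstractFileExtract):
    """Hypothetical stand-in for a reader of NetCDF or another file format."""
    values: List[float]

    def read(self) -> List[float]:
        return list(self.values)


def classify(dims: Sequence[str]) -> str:
    """Choose a CSML feature type name from the dimensions a parameter varies over.

    Assumption: dimensions are labelled 'time', 'z', 'y' and 'x'.
    TrajectoryFeature is left out because a moving platform cannot be
    recognised from dimension names alone.
    """
    d = set(dims)
    spatial = d - {"time"}
    if {"x", "y"} <= spatial:
        base = "Grid"
    elif spatial == {"z"}:
        base = "Profile"
    elif not spatial:
        base = "Point"
    else:
        raise ValueError(f"no feature type for dimensions {sorted(d)}")
    return f"{base}{'Series' if 'time' in d else ''}Feature"


@dataclass
class Feature:
    """A feature refers to its data only through the wrapper, not the file format."""
    feature_id: str
    feature_type: str
    extract: AbstractFileExtract

    def values(self) -> List[float]:
        return self.extract.read()


@dataclass
class FeatureCollection:
    """Composite: a dataset holds features and, possibly, nested collections."""
    dataset_id: str
    members: list = field(default_factory=list)

    def features(self) -> Iterator[Feature]:
        for member in self.members:
            if isinstance(member, FeatureCollection):
                yield from member.features()
            else:
                yield member


# Usage: a radiosonde profile and a model field aggregated into one dataset.
sonde = Feature("sonde-1", classify(["z"]), InMemoryExtract([288.1, 281.5, 270.2]))
model = Feature("t2m", classify(["time", "y", "x"]), InMemoryExtract([280.0, 281.0]))
dataset = FeatureCollection("demo", [sonde, FeatureCollection("models", [model])])
for f in dataset.features():
    print(f.feature_id, f.feature_type, len(f.values()))
```

Because a Feature only ever calls read() on its extract, swapping the storage format means swapping the extract class, which is the sense in which features are "potentially more migratable" than files.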