a centre of expertise in data curation and preservation

Scarp
Investigating Our Digital Landscape
1. The curation of earth observation data: an OAIS-based approach to preservation analysis
2. Curating digital support materials for atmospheric science data

Funded by:

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 UK: Scotland License. To view a copy of this license, (a) visit http://creativecommons.org/licenses/by-ncsa/2.5/scotland/ ; or (b) send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.

High level Data Survey
• World Data Centre
• EISCAT
• British Atmospheric Data Centre
• ISIS
• Diamond Light Source
• Central Laser Facility
• Epubs
• Tier 1

Analysed with high level data maps

Data set specific
- Ionosonde
- MST
- EISCAT

CASPAR Questionnaire
• What information/performance/behaviour does your current user extract from this data, and what needs preserving?
• What information do you provide to a new data user, and what support do you give them during use of the data?
• A clear definition of the information contained in the dataset
• How is the digitally encoded information ingested into the repository?
• How is the required data currently located and accessed?
• Are there any access restrictions?
• Identify common "domain objects" currently used. Are these objects special cases of simpler objects?
• What information is required to reconstruct the information objects, reproduce the performance or duplicate the required behaviour?
• Structure Representation Information
• Semantic Representation Information
• How is the data physically stored?
• Are there any additional preservation requirements?
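The questionnaire above could be captured per dataset as a small record, so that gaps in Representation Information are easy to spot. The following is only an illustrative sketch: the class names, field names and sample values are hypothetical and are not taken from the CASPAR questionnaire or from any archive's actual tooling.

```python
from dataclasses import dataclass, field


@dataclass
class RepresentationInfo:
    """One piece of Representation Information recorded for a dataset."""
    kind: str          # "structure" or "semantic" (assumed labels)
    description: str   # e.g. a file format description or a data dictionary


@dataclass
class DatasetSurveyRecord:
    """Questionnaire answers for a single dataset (hypothetical layout)."""
    dataset: str
    information_to_preserve: str
    storage: str = ""
    access_restrictions: list[str] = field(default_factory=list)
    representation_info: list[RepresentationInfo] = field(default_factory=list)

    def missing_kinds(self) -> set[str]:
        """Return the kinds of Representation Information not yet recorded."""
        required = {"structure", "semantic"}
        recorded = {item.kind for item in self.representation_info}
        return required - recorded


# Placeholder values, for illustration only.
record = DatasetSurveyRecord(
    dataset="example dataset",
    information_to_preserve="placeholder description",
)
record.representation_info.append(
    RepresentationInfo(kind="structure", description="file format description")
)
print(record.missing_kinds())
```

A record like this makes it possible to list, for each surveyed dataset, which questionnaire items still lack an answer.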
Stakeholder analysis
• Funding Bodies
• Scientific Organisations
• Data Producers
• Scientists in the Community
• Data Archivist

Impact of Archive evolution and management

Preservation Data Flows and strategies

MST simple scenario – as a simple record of wind speed and trajectory above Aberystwyth

MST complex – support atmospheric study and climate modelling on a global scale, permitting study of the following:
• Precipitation
• Convection
• Gravity Waves
• Rossby Waves
• Mesoscale and Microscale Structures
• Fallstreak Clouds
• Ozone Layering

Ionosonde simple scenario

Ionosonde complex scenario - requiring raw data, instrument provenance, data provenance related to scaling of parameters, software, technical manuals, bibliographies and journal articles

EISCAT Simple – standard program rslt files with a basic description of integration and analysis

EISCAT Complex – special program reanalysis scenario: raw data capturing the ability to reprocess, operational provenance, and scientific intent and outcome within scientific experimental proposals and output

Wide ranging discipline specific information
Survey
- 10 data sets inspected
- Over 1000 files manually read
- Over 3000 OAIS relationships classified

Atmospheric Datasets

Significant Properties of software
The BADC has substantial data holdings of its own and also provides information and links to data held by other data centres.
The data held at the BADC are of two types:
• Datasets produced by NERC-funded projects; these datasets are of high priority since the BADC may be the only long-term archive of the data.
• Third party datasets that are required by a large section of the UK atmospheric research community and are most efficiently made available through one location (e.g. Met Office and ECMWF datasets).

The BADC therefore develops, supports, supplies and provides access to a variety of software necessary to locate, access and interpret this atmospheric data. The BADC would categorise the types of software it interacts with in the following ways:
• Software which it utilises to facilitate direct discovery of, and permit remote or local access to, data
• Software which processes archived data for the "on-the-fly" provision of processed data products
• Generic analysis tools
• Large scale modelling, specifically the Met Office Unified Model
• Data set specific software tools and scripts which are informally archived
• Community based models and analysis tools

Software examples inspected
1. The BADC website www.badc.rl.ac.uk
2. SSH clients and localised processing of data
3. Trajectories
4. Data Extractor
5. Geosplat
6. Xconvsh/convsh
7. GrADS
8. CDAT
9. Met Office Ported Unified Model
10. Data Set Specific software tools and scripts
11. MST data plotting software
12. Collected scripts instinctive in organic collection
13. Community based models and analysis tools

Repository Solutions?
• What are the functional requirements?
• Should we collaborate or build our own?
• What are the legal and copyright issues?
Repository scope: desired and required research deposit types
The core content intended for capture by an E-prints repository can be characterised by the following deposit types:
• Theses or Dissertations
• Research Papers
• Pre-Prints
• Reports
• Working Papers
• Conference Papers

It was felt that this type of traditional research output should reasonably be in scope for capture within an EPrints repository. We have noted that NCAS produces other types of digital materials which could contribute to the understanding of atmospheric science. Some examples of this type of information we have identified are:
• Software, including code, documentation, descriptions of algorithms and support materials for use of software
• File format descriptions
• Data dictionaries, thesauri and informal semantic descriptions
• Data provenance information, including technical manuals, calibration and operational information
• Web pages, including support materials, educational materials, non-technical documents for consumption by a general audience, information packs and background documents
• Subject specific bibliographies and texts

Advantages of collaborating with the NERC Open Research Archive (NORA)
This repository currently permits deposit by the following NERC research centres:
• The Proudman Oceanographic Institute
• British Geological Survey
• Centre for Ecology and Hydrology
• British Antarctic Survey

NORA is now in a position where it could allow a wider range of scientists, including NCAS, to use this repository, with NCAS as an additional depositing centre.
Software Selection Options
There are a number of options open to an organisation. Those we looked at included Eprints, DSpace, CDSWare, Fedora, I-ToR, MyCoRe, MPGeDoc, ARNO and Epubs, and there is of course the possibility of writing our own bespoke solution. We thought it essential that the NCAS institutional repository software be:
• OAI compliant
• Open source
• Based on established technology
• Well supported and easy to maintain
• Easily configurable to the needs of NCAS
• Highly accepted by the target user community

It was felt that the E-prints software most closely met these requirements. It also has the advantage of strong advocacy and support services surrounding E-prints, which is currently endorsed by organisations such as JISC, E-prints support services, DCC and NERC.

Questions?