SCARP: Investigating Our Digital Landscape

a centre of expertise in data curation and preservation
1. The curation of earth observation data: an OAIS-based approach to preservation analysis
2. Curating digital support materials for atmospheric science data
Funded by:
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 UK: Scotland License. To view a copy of this license, (a) visit http://creativecommons.org/licenses/by-nc-sa/2.5/scotland/; or (b) send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.
High level Data Survey
• World Data Centre
• EISCAT
• British Atmospheric Data Centre
• ISIS
• Diamond Light Source
• Central Laser Facility
• Epubs
• Tier 1
Analysed with high-level data maps
Data set specific
- Ionosonde
- MST
- EISCAT
CASPAR Questionnaire
• What information, performance or behaviour do your current users extract from this data, and what needs preserving?
• What information do you provide to a new data user, and what support do you give them during use of the data?
• A clear definition of the information contained in the dataset
• How is the digitally encoded information ingested into the repository?
• How is the required data currently located and accessed?
• Are there any access restrictions?
• Identify common "domain objects" currently used. Are these objects special cases of simpler objects?
• What information is required to reconstruct the information objects, reproduce the performance or duplicate the required behaviour?
• Structure Representation Information
• Semantic Representation Information
• How is the data physically stored?
• Are there any additional preservation requirements?
Stakeholder analysis
• Funding Bodies
• Scientific Organisations
• Data Producers
• Scientists in the Community
• Data Archivist
Impact of Archive evolution and management
Preservation Data Flows and Strategies
MST simple scenario – As a simple record of wind speed and trajectory above Aberystwyth
MST complex scenario – Supporting atmospheric study and climate modelling on a global scale, permitting study of the following:
• Precipitation
• Convection
• Gravity Waves
• Rossby Waves
• Mesoscale and Microscale Structures
• Fallstreak Clouds
• Ozone Layering
Ionosonde simple scenario
Ionosonde complex scenario - requiring raw data, instrument provenance, data provenance related to the scaling of parameters, software technical manuals, bibliographies and journal articles
EISCAT Simple – Standard program rslt files with a basic description of integration and analysis
EISCAT Complex – Special program reanalysis scenario: raw data preserving the ability to reprocess, plus operational provenance and the scientific intent and outcome recorded in experimental proposals and outputs.
Wide-ranging discipline-specific information survey
- 10 data sets inspected
- Over 1000 files manually read
- Over 3000 OAIS relationships classified
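A classification exercise of this kind could be recorded and tallied along the following lines. This is only a minimal sketch, assuming each relationship found in a file is tagged with one of the information categories named in the OAIS reference model; the `tally` helper, the file names and the example records are hypothetical illustrations and are not survey data.

```python
from collections import Counter
from enum import Enum


class OAISCategory(Enum):
    """A subset of the information categories named in the OAIS reference model."""
    STRUCTURE_RI = "Structure Representation Information"
    SEMANTIC_RI = "Semantic Representation Information"
    OTHER_RI = "Other Representation Information"
    REFERENCE = "Reference Information"
    PROVENANCE = "Provenance Information"
    CONTEXT = "Context Information"
    FIXITY = "Fixity Information"


def tally(relationships):
    """Count classified relationships per OAIS category."""
    return Counter(category for _file, category in relationships)


# Hypothetical records: (file inspected, category assigned by the reader).
example = [
    ("format_description.txt", OAISCategory.STRUCTURE_RI),
    ("parameter_glossary.html", OAISCategory.SEMANTIC_RI),
    ("instrument_log.pdf", OAISCategory.PROVENANCE),
    ("calibration_notes.pdf", OAISCategory.PROVENANCE),
]

counts = tally(example)
for category, n in counts.most_common():
    print(f"{category.value}: {n}")
```

Keeping the category as an enumeration rather than free text means a misspelt label fails immediately instead of silently creating a new category in the counts.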
Atmospheric Datasets
Significant Properties of Software
The BADC has substantial data holdings of its own and also provides information
and links to data held by other data centres. The data held at the BADC are
of two types:
• Datasets produced by NERC-funded projects; these datasets are of high priority since the BADC may be the only long-term archive of the data.
• Third party datasets that are required by a large section of the UK atmospheric research community and are most efficiently made available through one location (e.g. Met Office and ECMWF datasets).
The BADC therefore develops, supports, supplies and provides access to a variety of software necessary to locate, access and interpret this atmospheric data.
The BADC would categorise the types of software it interacts with in the following ways:
• Software which it utilises to facilitate direct discovery of data and to permit remote or local access to it
• Software which processes archived data for the "on-the-fly" provision of processed data products
• Generic analysis tools
• Large-scale modelling, specifically the Met Office Unified Model
• Data set specific software tools and scripts which are informally archived
• Community-based models and analysis tools
Software examples inspected
1. The BADC website www.badc.rl.ac.uk
2. SSH clients and localised processing of data
3. Trajectories
4. Data Extractor
5. Geosplat
6. Xconvsh/convsh
7. GrADS
8. CDAT
9. Met Office Ported Unified Model
10. Data Set Specific software tools and scripts
11. MST data plotting software
12. Collected scripts instinctive in organic collection
13. Community based models and analysis tools
Repository Solutions?
• What are the functional requirements?
• Should we collaborate or build our own?
• What are the legal and copyright issues?
Repository scope: desired and required research deposit types
The core content intended for capture by an EPrints repository can be characterised by the following deposit types:
• Theses or dissertations
• Research papers
• Pre-prints
• Reports
• Working papers
• Conference papers
It was felt that this type of traditional research output should reasonably be in scope for capture within an EPrints repository. We have noted that NCAS produces other types of digital material which could contribute to the understanding of atmospheric science. Some examples of this type of information that we have identified are:
• Software, including code, documentation, descriptions of algorithms and support materials for use of the software
• File format descriptions
• Data dictionaries, thesauri and informal semantic descriptions
• Data provenance information, including technical manuals, calibration and operational information
• Web pages, including support materials, educational materials, non-technical documents for consumption by a general audience, information packs and background documents
• Subject-specific bibliographies and texts
Advantages of collaborating with the NERC Open Research Archive (NORA)
This repository currently permits deposit by the following NERC research centres:
• Proudman Oceanographic Laboratory
• British Geological Survey
• Centre for Ecology and Hydrology
• British Antarctic Survey
NORA is now in a position where it could allow a wider range of scientists, including NCAS, to use this repository, with NCAS as an additional depositing centre.
Software Selection Options
There are a number of options open to an organisation. Those we looked at included EPrints, DSpace, CDSWare, Fedora, I-ToR, MyCoRe, MPGeDoc, ARNO and Epubs, and there is of course the possibility of writing our own bespoke solution.
We thought it essential that the NCAS institutional repository software be:
• OAI compliant
• Open source
• Based on established technology
• Well supported and easy to maintain
• Easily configurable to the needs of NCAS
• Highly accepted by the target user community
It was felt that the EPrints software most closely met these requirements. It also has the advantage of strong advocacy and support services, and is currently endorsed by organisations such as JISC, E-prints support services, DCC and NERC.
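To illustrate what the OAI compliance requirement buys in practice, the sketch below builds a harvesting request. It assumes that "OAI compliant" here means support for the OAI-PMH protocol, in which a harvester sends an HTTP request carrying a `verb` and its arguments. The verb names and the `oai_dc` metadata prefix come from the protocol; the `oai_pmh_request` helper and the endpoint URL are hypothetical and do not describe the NCAS or NORA installations.

```python
from urllib.parse import urlencode

# The six request verbs defined by OAI-PMH version 2.0.
OAI_PMH_VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
}


def oai_pmh_request(base_url, verb, **arguments):
    """Build an OAI-PMH request URL from a base URL, a verb and its arguments."""
    if verb not in OAI_PMH_VERBS:
        raise ValueError(f"not an OAI-PMH verb: {verb}")
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


# Hypothetical endpoint; a real repository publishes its own base URL.
BASE_URL = "https://repository.example.org/cgi/oai2"

# Ask for all records as unqualified Dublin Core, the format every
# OAI-PMH repository is required to support.
url = oai_pmh_request(BASE_URL, "ListRecords", metadataPrefix="oai_dc")
print(url)
```

Because any compliant repository answers the same requests, metadata deposited by NCAS could be harvested by external services without custom integration work.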
Questions?