Data Management Strategy - Soil Biodiversity Programme

SB/DMP/V2/6-00
NERC Thematic Programme
SOIL BIODIVERSITY THEMATIC PROGRAMME
DATA MANAGEMENT PLAN
"It is NERC Policy that all Scientific Programmes must plan adequately for the management of the data they
will collect. The planning must cover not only the practical arrangements while the programme is running,
but also longer-term stewardship of the data thereafter. Properly managed, the data will contribute to a key
NERC resource that will continue to be exploited scientifically and commercially long after the formal end
of the programme." (Natural Environment Research Council (NERC), 1999a).
1. Programme Datasets
The Soil Biodiversity Thematic Programme has a 5-year life-span, during which time it will generate data
sets through baseline studies and individual project awards. These data sets will need to be accessible by
scientists during the Programme, and as data products after the Programme. A strategy is required to
resource and manage these activities. The three principal information components to be considered are:

Baseline data from the main field site at Sourhope and from the Ecotron at the Centre for
Population Biology, Imperial College.

Project data collected/generated by award-holders

Meta-information about experimental plots, baseline data, project data and documentation, and
administrative information (who is doing what, where and when).
The Programme does not envisage much demand for access to external databases. However, data collected
at Sourhope under the Environmental Change Network (ECN) Programme will be useful baseline
information for the site and should be integrated as far as possible. Links with other major data sources
should be considered.
2. Soil Biodiversity Programme Data Policy
The Data and Modelling Sub-committee has developed a Data Policy for the Soil Biodiversity Programme
that takes into account the requirements and principles of the NERC Data Policy (NERC, 1999b). The
main points of the Policy are as follows:

Data ownership will follow the NERC guidelines which state: "When NERC pays a grant to an HEI
(or other eligible body) to do research, the default position is that the resulting Intellectual Property
(IP) will belong to the HEI, and that any data collected, being part of that IP, will therefore likewise
be owned by the HEI."

The Environmental Information Centre (EIC) is the most appropriate NERC Designated Data
Centre for the central database and data co-ordination.

Baseline data (soils, vegetation, meteorological data) from the field site and Ecotron are to be held
in a central database for access by all award-holders. During the Programme, baseline data in
summary form will be made freely available via the Web; high resolution ('raw') data will be
available to Soil Biodiversity Programme members only.

Meta-information on all data is to be held centrally, and the structure and content is to be based on
standard protocols.

To comply with NERC data policy on long-term stewardship and data availability, PIs must offer to
deposit with the EIC (via the Programme's central database) a copy of the data and meta-data
resulting from their projects when completed, but without prejudice to the intellectual property
rights (IPR) on those data. However, PIs are strongly encouraged to provide project data for the
central database as soon as possible during the period of the project in order to facilitate
cross-project integration and to provide for the long-term stewardship of Soil Biodiversity data.

PI teams will be permitted a period of 2 years after termination of their award to work exclusively
on, and publish the results of, the data collected during the project. After this time, project datasets
will be advertised through NERC directories, and may be made available for bona-fide research
purposes.

Raw project data for bona-fide research purposes will be released under licence; requests for the
raw data for other purposes will be referred to the HEI holding the IPR for that data. Summary data
outputs from the Programme may be made more openly available as 'data products' by agreement
through the Soil Biodiversity Programme Steering Committee.

Data model development, protocols for data and meta-data handling, quality assurance (QA) of
information and maintenance of databases should be carried out by a database specialist in
accordance with sampling protocols and through liaison with award-holders and specialists.

The Programme will provide a unique database of considerable interest for research and education.
This will be made available subject to the access controls described above. Ideally access to these
data and to derived data products should be made available via user-friendly front-ends, via the
Web or on CD-ROM. Opportunities for resourcing these kinds of developments will be explored.
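
The access rules in the policy points above can be sketched in code. This is only an illustrative reading, not part of the Policy: the function and parameter names are hypothetical, the treatment of the exact boundary date and of 29 February is an assumption, and the sketch only covers the cases the Policy states explicitly (baseline summary versus raw data, and release of raw project data after the 2-year exclusive period).

```python
from datetime import date

EXCLUSIVE_YEARS = 2  # from the Policy: 2 years after termination of the award


def exclusive_period_end(award_end: date) -> date:
    """Return the date on which the PI team's exclusive period ends."""
    try:
        return award_end.replace(year=award_end.year + EXCLUSIVE_YEARS)
    except ValueError:
        # award_end is 29 February and the target year is not a leap year;
        # falling back to 28 February is an assumption of this sketch.
        return award_end.replace(year=award_end.year + EXCLUSIVE_YEARS, day=28)


def baseline_access_allowed(is_programme_member: bool, raw: bool) -> bool:
    """Baseline data during the Programme: summaries are open, raw data is members-only."""
    return is_programme_member or not raw


def project_raw_access_allowed(
    award_end: date,
    today: date,
    is_pi_team: bool,
    bona_fide_research: bool,
    licence_agreed: bool,
) -> bool:
    """Raw project data: exclusive to the PI team for 2 years, then released under licence."""
    if is_pi_team:
        return True
    if today < exclusive_period_end(award_end):
        # Still inside the exclusive period (boundary handling is an assumption).
        return False
    # Requests for other purposes are referred to the HEI holding the IPR,
    # so this sketch simply does not grant them.
    return bona_fide_research and licence_agreed
```

For example, under these assumptions a licensed bona-fide research request against an award that ended on 30 June 2002 would be refused on 1 January 2003 and accepted from 30 June 2004.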
3. Proposed Work Programme
Appendix I describes the main activities required for the management of Soil Biodiversity data and
provision of access.
4. Resources for Data Management
The Soil Biodiversity Steering Committee have allocated resources for data management sufficient to
support a half-time data manager for 5 years, to carry out the tasks outlined in Appendix I. The development
of database access systems and other data products will depend on the availability of additional resources.
5. Organisational and Technical Infrastructure
The work will be carried out through the Environmental Information Centre (EIC), which is centred at the
Centre for Ecology and Hydrology (CEH), Monks Wood, and which co-ordinates the CEH Data Network.
EIC is one of NERC’s Designated Data Centres with responsibility for the stewardship of NERC’s
terrestrial and freshwater ecological data. EIC is currently supporting the NERC Environmental Diagnostics
and URGENT Programmes.
The EIC at Monks Wood will be responsible for ensuring that the project is carried out in compliance with
NERC Policy requirements of Designated Data Centres, in relation to:
- The physical custody, validation, dissemination and review of data during and beyond the life-span
  of the Programme
- Standards of data stewardship (including ownership and IPR; documentation and meta-data; and
  data storage, management and access)
- Cataloguing the data, promoting its use and availability
- Drawing up licensing arrangements to control the release of data sets where appropriate
- Advising the Programme Steering Committee (and its sub-committees) on data policy matters
The project management and the day-to-day running of the work programme will be the responsibility of the
Environmental Change Network (ECN) Central Co-ordination Unit (CCU), a node of the CEH Data
Network, located at CEH Merlewood. Merlewood has well-founded infrastructure and support for the
management of large databases, with a dedicated Oracle database server linked to Unix and PC LANs, a
Web server, high-speed links to the Internet for data access, and a firewall for security. Sourhope is already
one of the ECN terrestrial sites for which the ECN CCU manages the data, and the CCU has extensive
experience in data management (including soils data) and in the development of Web-based interfaces. The
ECN CCU will be responsible for the following aspects of the work programme:
- Project management
- Design and implementation of data management model and central database
- Co-ordination of data management activities at Sourhope and across the Programme Institutes
- Management of central database and data transfer from the Institutes, including provision for data
  exchange and cross-project integration
- Development and maintenance of software systems
- Implementation of data quality standards
- Making data available
- Providing assistance and advice to users of Soil Biodiversity Programme data
- Producing progress reports, presenting papers and advising on data management at Programme
  Steering Committee (and sub-committee) and award-holders' meetings
- Liaising with the EIC at Monks Wood over data policy, licensing arrangements and data access
6. Exploitation of Data
The Soil Biodiversity Exploitation Sub-committee will identify potential users of the data generated by the
Programme and possible 'added-value' products which could be published.
7. Long-term stewardship and continuation of data services
The datasets generated by the Programme are likely to be of long-term importance and will therefore form
part of NERC's environmental data holdings. The Environmental Change Network (under the aegis of EIC)
will undertake to provide this long-term stewardship by maintaining the data in a central database,
upgrading the database in line with software upgrades, and ensuring adequate off-line back-up on renewable
media. Simple requests for data after the end of the Programme will be handled as part of ECN's long-term
commitment to managing environmental data. A handling charge will be levied for the time taken to service
more complex requests for user-specific data products. It should be noted that the maintenance and
development of more generic data products (e.g. CD-ROMs, as in the LOIS Programme) and data access
systems (e.g. Web-to-database access, as in ECN) will require significant additional resources.
Mandy Lane
ECN Database Manager
27/7/1999
References:
NERC (1999a) Data Management Plans for Scientific Programmes. NERC Data Policy Guidance
Note 4. Swindon: NERC.
NERC (1999b) Data Policy Handbook, Version 2.1. Swindon: NERC.
WWW: http://www.nerc.ac.uk/environmental-data
APPENDIX I: Soil Biodiversity Data Management: Proposed Work Programme
The implementation of the Soil Biodiversity Data Policy and the provision of data management and data
access facilities will require the following key activities:

User Requirement:
Define requirements of the Programme and its award-holders for i) use of baseline data, ii) linking
baseline data with project data, iii) inter-project data exchange and integration, iv) data analysis and
modelling, v) exploitation of information and results through data products.

Data Management Strategy:
Design overall data management model for Programme based on requirements defined above. This
should consider all aspects of data handling, including:
- Data and meta-data standards for baseline and project data
- Central database design and implementation strategy
- Data transfer protocols & mechanisms for project data and meta-data
- Quality Assurance
- Data access/release agreements & licensing
- Data access systems

Data and Meta-data Standards:
These should be developed in liaison with Programme advisors, award-holders and topic specialists.
Define common coding systems, data dictionaries, formats and structures for baseline and project
databases including sample unit referencing and relationships, soil characteristics, soil biota and
vegetation. Draw up standard field recording system for data capture at Sourhope. Devise methods for
handling missing data and uncertainty, e.g. concerning limits of detection. Draw up content standards
and structures for meta-data, including documentation of experimental and analytical methods, space
and time dimensions of study components, definition of any additional data items, quality information,
bibliographic information and administrative details.
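
One possible way to handle missing data and limits of detection is sketched below. It is an assumption for illustration, not the Programme's agreed standard: the `Measurement` record, its field names, the status codes and the example identifiers are all hypothetical. The idea is to store an explicit status alongside each value, so that a result below the detection limit is never confused with a true zero or with a missing sample.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Measurement:
    sample_unit: str                  # reference to the plot or sample unit
    variable: str                     # name from the shared data dictionary
    value: Optional[float]            # None when missing or below the detection limit
    status: str                       # "ok", "missing" or "below_lod" (hypothetical codes)
    detection_limit: Optional[float] = None


def make_measurement(
    sample_unit: str,
    variable: str,
    raw_value: Optional[float],
    detection_limit: Optional[float] = None,
) -> Measurement:
    """Build a record with an explicit status instead of a placeholder number."""
    if raw_value is None:
        return Measurement(sample_unit, variable, None, "missing", detection_limit)
    if detection_limit is not None and raw_value < detection_limit:
        # Keep the limit itself so later analyses can choose how to treat the value.
        return Measurement(sample_unit, variable, None, "below_lod", detection_limit)
    return Measurement(sample_unit, variable, raw_value, "ok", detection_limit)
```

Keeping the detection limit on the record, rather than substituting zero or half the limit at data entry, leaves that analytical choice to the user of the data.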

Central Database:
Design data model for and build database incorporating: i) Baseline data, ii) Project data,
iii) Meta-information for both, iv) Links with external databases where these can be defined. The design will
need to be as generic as possible to enable the incorporation of project data at a later stage, whilst
remaining straightforward and easy to use. The implementation strategy will need to consider access
requirements (e.g. Internet links), functionality, security (access controls and long-term storage) and
performance. Process, validate and input incoming meta-data and project data, and maintain database.
Advise on strategies for project databases held at host institutions.

Data transfer protocols:
Devise procedures and formats for the transfer of meta-information about individual projects to the
central database, and the transfer of project data as appropriate. The aim should be towards generic, but
easy-to-use systems for data transfer over the Internet.

Quality Assurance (QA) of data:
The data manager will be primarily concerned with:
- Refining data specifications, quality criteria and targets in sampling protocols, and developing
  validation procedures consistent with those criteria
- Ensuring and maintaining the validity and integrity of the data and meta-data in the central
  database and within data access systems
- Advising on data quality issues for host institute databases
- Developing quality standards for meta-information
- Managing the results of any quality assessment exercises
- Incorporating data quality information in the meta-data system
Procedures will be developed in conjunction with Programme advisors, award-holders and specialists.
Meta-information on data quality should include documentation of quality targets and quality control
procedures defined in the sampling protocols and project methodologies, conformance of analytical
laboratories to QC standards, details of known problems which may affect the quality of the data and
quality ‘flags’ for suspect data.
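
As a minimal sketch of how quality 'flags' for suspect data might work, the example below checks each value against an acceptable range and attaches a flag rather than deleting the record. The variable names, ranges and flag values are hypothetical; in practice the quality criteria would come from the sampling protocols.

```python
from typing import Dict, List, Tuple

# Hypothetical quality criteria: variable name -> (minimum, maximum) plausible value.
RANGES: Dict[str, Tuple[float, float]] = {
    "soil_ph": (0.0, 14.0),
    "soil_temperature_c": (-30.0, 50.0),
}


def flag_records(records: List[dict], ranges: Dict[str, Tuple[float, float]]) -> List[dict]:
    """Return copies of the records with a 'quality_flag' field added.

    Values outside the range are flagged 'suspect' but kept, so the original
    observation stays available for review. Variables with no defined range
    are flagged 'unchecked'.
    """
    flagged = []
    for rec in records:
        limits = ranges.get(rec["variable"])
        if limits is None:
            flag = "unchecked"
        else:
            low, high = limits
            flag = "ok" if low <= rec["value"] <= high else "suspect"
        flagged.append({**rec, "quality_flag": flag})
    return flagged
```

Flagging instead of removing suspect values keeps the validation step reversible and lets the flag travel with the data into the meta-data system.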

Data Access systems:
- Make data available, subject to data access agreements
- *Provide data access systems (e.g. Web, CD-ROM) if further resources become available

*The development of user-friendly, tailored database access systems, which allow user-defined query
and analysis, requires significant programming effort. The degree to which such systems can be
developed will depend on the availability of additional resources to the Programme. The types of data
products and systems desirable (e.g. WWW, CD-ROM) should be defined in liaison with the Data and
Modelling Sub-committee and the Exploitation Sub-committee. The data management model will take
account of these requirements as far as possible, in preparation for data access developments should
resources become available.

Co-ordination, Documentation and Reporting:
- Co-ordinate data management across the Programme Institutes, through developing,
  implementing and advising on protocols, data management and data compatibility issues, and
  through holding occasional workshops
- Document central database structure and data handling protocols
- Provide progress reports for Steering Committee and Programme Managers