Soil Biodiversity SB/DMP/V2/6-00 NERC Thematic Programme

SOIL BIODIVERSITY THEMATIC PROGRAMME DATA MANAGEMENT PLAN

"It is NERC Policy that all Scientific Programmes must plan adequately for the management of the data they will collect. The planning must cover not only the practical arrangements while the programme is running, but also longer-term stewardship of the data thereafter. Properly managed, the data will contribute to a key NERC resource that will continue to be exploited scientifically and commercially long after the formal end of the programme." (Natural Environment Research Council (NERC), 1999a).

1. Programme Datasets

The Soil Biodiversity Thematic Programme has a 5-year life-span, during which time it will generate data sets through baseline studies and individual project awards. These data sets will need to be accessible by scientists during the Programme, and as data products after the Programme. A strategy is required to resource and manage these activities.

The three principal information components to be considered are:

- Baseline data from the main field site at Sourhope and from the Ecotron at the Centre for Population Biology, Imperial College.
- Project data collected/generated by award-holders
- Meta-information about experimental plots, baseline data, project data and documentation, and administrative information (who’s doing what where and when).

The Programme does not envisage much demand for access to external databases. However, data collected at Sourhope under the Environmental Change Network (ECN) Programme will be useful baseline information for the site and should be integrated as far as possible. Links with other major data sources should be considered.

2. Soil Biodiversity Programme Data Policy

A Data Policy for the Soil Biodiversity Programme has been developed by the Data and Modelling Subcommittee which takes into account the requirements and principles of the NERC Data Policy (NERC, 1999b).
The main points of the Policy are as follows:

- Data ownership will follow the NERC guidelines which state: "When NERC pays a grant to an HEI (or other eligible body) to do research, the default position is that the resulting Intellectual Property (IP) will belong to the HEI, and that any data collected, being part of that IP, will therefore likewise be owned by the HEI."
- The Environmental Information Centre (EIC) is the most appropriate NERC Designated Data Centre for the central database and data co-ordination.
- Baseline data (soils, vegetation, meteorological data) from the field site and Ecotron are to be held in a central database for access by all award-holders. During the Programme, baseline data in summary form will be made freely available via the Web; high resolution ('raw') data will be available to Soil Biodiversity Programme members only.
- Meta-information on all data is to be held centrally, and the structure and content is to be based on standard protocols.
- To comply with NERC data policy on long-term stewardship and data availability, PIs must offer to deposit with the EIC (via the Programme's central database) a copy of the data and meta-data resulting from their projects when completed, but without prejudice to the intellectual property rights (IPR) on those data. However, PIs are strongly encouraged to provide project data for the central database as soon as possible during the period of the project in order to facilitate cross-project integration and to provide for the long-term stewardship of Soil Biodiversity data.
- PI teams will be permitted a period of 2 years after termination of their award to work exclusively on, and publish the results of, the data collected during the project. After this time, project datasets will be advertised through NERC directories, and may be made available for bona-fide research purposes.
- Raw project data for bona-fide research purposes will be released under licence; requests for the raw data for other purposes will be referred to the HEI holding the IPR for that data.
- Summary data outputs from the Programme may be made more openly available as 'data products' by agreement through the Soil Biodiversity Programme Steering Committee.
- Data model development, protocols for data and meta-data handling, quality assurance (QA) of information and maintenance of databases should be carried out by a database specialist in accordance with sampling protocols and through liaison with award-holders and specialists.

The Programme will provide a unique database of considerable interest for research and education. This will be made available subject to the access controls described above. Ideally, access to these data and to derived data products should be provided through user-friendly front-ends, via the Web or on CD-ROM. Opportunities for resourcing these kinds of developments will be explored.

3. Proposed Work Programme

Appendix I describes the main activities required for the management of Soil Biodiversity data and provision of access.

4. Resources for Data Management

The Soil Biodiversity Steering Committee have allocated resources for data management sufficient to support a half-time data manager for 5 years, to carry out the tasks outlined in Appendix I. The development of database access systems and other data products will depend on the availability of additional resources.

5. Organisational and Technical Infrastructure

The work will be carried out through the Environmental Information Centre (EIC), which is centred at the Centre for Ecology and Hydrology (CEH), Monks Wood, and which co-ordinates the CEH Data Network. EIC is one of NERC’s Designated Data Centres with responsibility for the stewardship of NERC’s terrestrial and freshwater ecological data. EIC is currently supporting the NERC Environmental Diagnostics and URGENT Programmes.
The EIC at Monks Wood will be responsible for ensuring that the project is carried out in compliance with NERC Policy requirements of Designated Data Centres, in relation to:

- The physical custody, validation, dissemination and review of data during and beyond the life-span of the Programme
- Standards of data stewardship (including ownership and IPR; documentation and meta-data; and data storage, management and access)
- Cataloguing the data, promoting its use and availability
- Drawing up licensing arrangements to control the release of data sets where appropriate
- Advising the Programme Steering Committee (and its sub-committees) on data policy matters

The project management and the day-to-day running of the work programme will be the responsibility of the Environmental Change Network (ECN) Central Co-ordination Unit (CCU), a node of the CEH Data Network, located at CEH Merlewood. Merlewood has well-founded infrastructure and support for the management of large databases, with a dedicated Oracle database server linked to Unix and PC LANs, a Web server, high-speed links to the Internet for data access, and a firewall for security. Sourhope is already one of the ECN terrestrial sites for which the ECN CCU manages the data, and the CCU has extensive experience in data management (including soils data) and in the development of Web-based interfaces.
The ECN CCU will be responsible for the following aspects of the work programme:

- Project management
- Design and implementation of data management model and central database
- Co-ordination of data management activities at Sourhope and across the Programme Institutes
- Management of central database and data transfer from the Institutes, including provision for data exchange and cross-project integration
- Development and maintenance of software systems
- Implementation of data quality standards
- Making data available
- Providing assistance and advice to users of Soil Biodiversity Programme data
- Producing progress reports, presenting papers and advising on data management at Programme Steering Committee (and sub-committee) and award-holders' meetings
- Liaising with the EIC at Monks Wood over data policy, licensing arrangements and data access

6. Exploitation of Data

The Soil Biodiversity Exploitation Sub-committee will identify potential users of the data generated by the Programme and possible 'added-value' products which could be published.

7. Long-term stewardship and continuation of data services

The datasets generated by the Programme are likely to be of long-term importance and will therefore form part of NERC's environmental data holdings. The Environmental Change Network (under the aegis of EIC) will undertake to provide this long-term stewardship by maintaining the data in a central database, upgrading the database in line with software upgrades, and ensuring adequate off-line back-up on renewable media.

Simple requests for data after the end of the Programme will be handled as part of ECN's long-term commitment to managing environmental data. A handling charge will be levied for the time taken to service more complex requests for user-specific data products. It should be noted that the maintenance and development of more generic data products (e.g. CD-ROMs, as in the LOIS Programme) and data access systems (e.g.
Web-to-database access, as in ECN) will require significant additional resources.

Mandy Lane
ECN Database Manager
27/7/1999

References:

NERC (1999a) Data Management Plans for Scientific Programmes. NERC Data Policy Guidance Note 4. Swindon: NERC.

NERC (1999b) Data Policy Handbook, Version 2.1. Swindon: NERC. WWW: http://www.nerc.ac.uk/environmental-data

APPENDIX I: Soil Biodiversity Data Management: Proposed Work Programme

The implementation of the Soil Biodiversity Data Policy and the provision of data management and data access facilities will require the following key activities:

User Requirement:

Define requirements of the Programme and its award-holders for i) use of baseline data, ii) linking baseline data with project data, iii) inter-project data exchange and integration, iv) data analysis & modelling, v) exploitation of information and results through data products.

Data Management Strategy:

Design overall data management model for Programme based on requirements defined above. This should consider all aspects of data handling, including:

- Data and meta-data standards for baseline and project data
- Central database design and implementation strategy
- Data transfer protocols & mechanisms for project data and meta-data
- Quality Assurance
- Data access/release agreements & licensing
- Data access systems

Data and Meta-data Standards:

These should be developed in liaison with Programme advisors, award-holders and topic specialists.

- Define common coding systems, data dictionaries, formats and structures for baseline and project databases including sample unit referencing and relationships, soil characteristics, soil biota and vegetation.
- Draw up standard field recording system for data capture at Sourhope.
- Devise methods for handling missing data and uncertainty, e.g. concerning limits of detection.
- Draw up content standards and structures for meta-data, including documentation of experimental and analytical methods, space and time dimensions of study components, definition of any additional data items, quality information, bibliographic information and administrative details.

Central Database:

- Design data model for and build database incorporating: i) Baseline data, ii) Project data, iii) Meta-information for both, iv) Links with external databases where these can be defined. The design will need to be as generic as possible to enable the incorporation of project data at a later stage, whilst remaining straightforward and easy to use. The implementation strategy will need to consider access requirements (e.g. Internet links), functionality, security (access controls and long-term storage) and performance.
- Process, validate and input incoming meta-data and project data, and maintain database.
- Advise on strategies for project databases held at host institutions.

Data transfer protocols:

Devise procedures and formats for the transfer of meta-information about individual projects to the central database, and the transfer of project data as appropriate. The aim should be towards generic, but easy-to-use systems for data transfer over the Internet.

Quality Assurance (QA) of data:

The data manager will be primarily concerned with:

- Refining data specifications, quality criteria and targets in sampling protocols, and developing validation procedures consistent with those criteria
- Ensuring and maintaining the validity and integrity of the data and meta-data in the central database and within data access systems
- Advising on data quality issues for host institute databases
- Developing quality standards for meta-information
- Managing the results of any quality assessment exercises
- Incorporating data quality information in the meta-data system

Procedures will be developed in conjunction with Programme advisors, award-holders and specialists.
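
As a rough illustration only, a validation procedure of the kind referred to in this appendix (missing data, limits of detection, flagging of suspect values) might be sketched in Python as follows. The flag codes, the Reading record, its field names and the example range are all hypothetical; the real codes and quality criteria would be defined in the data dictionaries and sampling protocols agreed with award-holders and specialists.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical flag codes; the actual codes would be set in the data dictionary.
FLAG_OK = "OK"
FLAG_MISSING = "M"        # no value recorded
FLAG_BELOW_LOD = "L"      # value below the analytical limit of detection
FLAG_SUSPECT = "S"        # value outside the expected range


@dataclass
class Reading:
    """One measurement as it might arrive for loading (illustrative fields only)."""
    variable: str
    value: Optional[float]                  # None means the value is missing
    detection_limit: Optional[float] = None


def quality_flag(reading: Reading, valid_min: float, valid_max: float) -> str:
    """Return a quality flag for one reading, checking the cases in a fixed order."""
    if reading.value is None:
        return FLAG_MISSING
    if reading.detection_limit is not None and reading.value < reading.detection_limit:
        return FLAG_BELOW_LOD
    if not (valid_min <= reading.value <= valid_max):
        return FLAG_SUSPECT
    return FLAG_OK


# Example: flag a small batch of soil pH readings against an assumed range of 3 to 9.
batch = [Reading("soil_pH", 5.2), Reading("soil_pH", None), Reading("soil_pH", 12.4)]
flags = [quality_flag(r, valid_min=3.0, valid_max=9.0) for r in batch]
```

In this sketch the flag is kept alongside the value rather than replacing it, so that suspect or below-detection readings remain in the database and users can decide how to treat them.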
Meta-information on data quality should include documentation of quality targets and quality control procedures defined in the sampling protocols and project methodologies, conformance of analytical laboratories to QC standards, details of known problems which may affect the quality of the data and quality ‘flags’ for suspect data.

Data Access systems:

- Make data available, subject to data access agreements
- *Provide data access systems (e.g. Web, CD-ROM) if further resources become available

*The development of user-friendly, tailored database access systems, which allow user-defined query and analysis, requires significant programming effort. The degree to which such systems can be developed will depend on the availability of additional resources to the Programme. The types of data products and systems desirable (e.g. WWW, CD-ROM) should be defined in liaison with the Data and Modelling Sub-committee and the Exploitation Sub-committee. The data management model will take account of these requirements as far as possible, in preparation for data access developments should resources become available.

Co-ordination, Documentation and Reporting:

- Co-ordinate data management across the Programme Institutes, through developing, implementing and advising on protocols, data management and data compatibility issues, and through holding occasional workshops.
- Document central database structure, and data handling protocols
- Provide progress reports for Steering Committee and Programme Managers