A federated bio data infrastructure (BDI) for New Zealand
Why do we need it? What do we need to make it happen?

Jochen Schmidt

Background

Over the last few years a range of projects, initiatives, and experts from NIWA, the Department of Conservation, Landcare Research, Te Papa, and various Regional Councils have worked towards the case for a “New Zealand Bio Data Infrastructure (BDI)”. During various meetings in early 2013, general agreement was reached to work towards a bio data infrastructure, and the very general needs and requirements were outlined.

A joint TFBIS proposal was put forward in 2013 by NIWA, Landcare, DOC, Regional Councils and Te Papa, which eventually led to the BSS project: https://teamwork.niwa.co.nz/display/NZBSS

A workshop was held at Te Papa in July 2013 with key stakeholders to further scope the case for a New Zealand BDI. The Dataversity workshop in late 2013 socialised the BDI idea further with practitioners and defined the use cases for the BSS project. During 2014 the BSS project worked with practitioners in a series of workshops to identify gaps and ways forward in how organisational bio data management could be improved to contribute to a national bio data infrastructure.

Based on the experiences of the BSS project, this paper argues the case for implementing a New Zealand Bio Data Infrastructure (BDI).

Why a NZ Bio Data Infrastructure?

Many organisations in New Zealand collect and hold bio data. Data exchange and data access do happen, but they are generally based on ad hoc person-to-person exchange that requires much communication and negotiation to agree on formats, semantics, vocabularies etc., and that includes ad hoc, manual conversion and adaptation steps. The people resource required for finding, understanding, and formatting data is generally estimated to exceed the resource required to analyse and use that data, often by an order of magnitude.
This is a waste of resources for both the data provider and the data user. We argue that establishing and maintaining a New Zealand bio data infrastructure that supports machine-to-machine data exchange is greatly desirable, and that the cost of maintaining it will be far lower than the cost of the current manual data transactions [1]. In the most general terms, a New Zealand Bio Data Infrastructure is based on national standards and systems that are nationally maintained and adopted by agencies within their internal management systems. Hence, the cost of a BDI is made up of the cost of maintaining national standards and national systems, and the cost to agencies of changing their internal data management systems to comply with the national systems.

What is a BDI and how do we make it happen?

Required BDI Components

A national bio data infrastructure should include the following components:

1. A national entity / facility responsible for the development and ongoing maintenance of bio data collection standards, and for supporting agencies in their adoption. This ensures data is collected consistently. This also requires (a mandate for) agencies to adopt the standards into their systems and workflows.
2. A national entity / facility responsible for the development and ongoing maintenance of bio data exchange standards (as demonstrated by the BSS project). This ensures that data providers publish their data to standards and consumers consume data to standards, so that data can be seamlessly exchanged. This also requires (a mandate for) agencies to adopt the standards into their systems and workflows.
3. A national facility for maintaining taxonomic concepts, as championed by the New Zealand Organisms Register (NZOR). This enables organisations to maintain and publish their species data to a common standard. This also requires (a mandate for) agencies to adopt the national system into their systems and workflows.
4. A national vocabulary service / register for bio data methods [2]. This enables organisations to annotate their bio data with a common methodology. This also requires (a mandate for) agencies to adopt the national system into their systems and workflows.
5. A national vocabulary for observation names and units. This enables organisations to publish their observations to a common standard. This also requires (a mandate for) agencies to adopt the national vocabulary into their systems and workflows.
6. A national registry service for bio data sources / service endpoints [3]. This ensures that end users can find / discover data sources.
7. A national framework including a linked data URI policy to support national vocabulary services [4]. This ensures national vocabularies can be operated consistently.

We note that this infrastructure needs to include governance with clear mandates (even legislation?), clear responsibilities, and resources [5]. One ‘lever’ to achieve this is to (re-)establish a New Zealand GBIF node.

Many of the mentioned functions / components are generic in nature (in particular (2) and (6)) and can be re-used in the context of a wider “Environmental Data Infrastructure (EDI)”, and combined with the national Spatial Data Infrastructure (SDI) work that is going on through NZGO. The BSS project has started conversations and hosted a workshop with NZGO on that matter.

[1] I believe we need more examples and underpinning information here [Jochen]
[2] A demonstration vocabulary service has been set up as part of the BSS project under: http://test.data.scinfo.org.nz/x/def/bss/protocol/invertebrate/C1-P2-QC2
[3] A draft paper on principles for setting up a national “Linked Data Registry” service has been prepared by the BSS project: https://confluence.landcareresearch.co.nz/display/SBINTEROP/Linked+Data+Registries
[4] A draft paper on principles for setting up a national “Linked Data URI Policy” has been prepared by the BSS project: https://confluence.landcareresearch.co.nz/display/SBINTEROP/Linked+Data+URI+Policy
[5] More fleshing out of governance required [Jochen]

Required national systems

| BDI Component | Tasks | Mandates | Estimated Resources | Potential owner and funder |
|---|---|---|---|---|
| Bio data collection standards | Develop new standards; Maintain existing standards; Provide training and support for agencies to adopt standards; Monitor and audit organisations on standard adoption | Mandate required for organisation to adopt standards; Mandate required for entity to audit organisations | 2.0 FTE | NEMS (on behalf of various organisations?) |
| Bio data exchange standards | Develop new standards; Maintain existing standards; Provide training and support for agencies to adopt standards; Monitor and audit organisations on standard adoption | Mandate required for organisation to adopt standards; Mandate required for entity to audit organisations | 0.5 FTE? | NEMS |
| National Service endpoint register | Develop and maintain registry and metadata standards and register; Training and support for organisations; Monitoring and audit | Organisations required to publish their service end points | 0.2 FTE? | Alliance? |
| National Organisms Register | Maintain register; Update content; Training and support for organisations; Monitoring and audit | Organisations required to match taxa to NZOR | 1.0 FTE? | DOC |
| National Methods Register | Maintain register; Update content; Training and support for organisations; Monitoring and audit | Organisations required to publish methods according to this service | 0.2 FTE | LINZ |
| National Observation names and units register | Maintain register; Update content; Training and support for organisations; Monitoring and audit | Organisations required to publish parameter / variable names and units according to this service | 0.2 FTE | LINZ |
| National URI framework | | Use URIs for semantic web apps | 0.2 FTE? | LINZ |

Required Change for organisations managing and publishing their data

| Required change | One-off costs |
|---|---|
| Adoption of collection standards | 0.5 FTE (software changes and policies) |
| Adoption of national organisms vocabulary | 0.5 FTE (software changes and policies) |
| Adoption of national methods vocabulary | 0.5 FTE (software changes and policies) |
| Adoption of observation vocabulary | 0.5 FTE (software changes and policies) |
| Data publication to standards and service end point maintenance | 1 FTE (software setup and policies) |

Ongoing costs: 0.3 FTE
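
To make the overall scale of the estimates easier to see, the resource figures from the two tables above can be added up. The short Python sketch below does only that arithmetic. It assumes that the figures marked with "?" in the source can be treated as firm numbers, and that the one-off costs apply per adopting organisation; neither assumption is stated in the tables themselves. The dictionary and function names are illustrative only.

```python
# Estimated resources for the national systems, in FTE, as listed in the
# "Required national systems" table. Several figures carry a "?" in the
# source; treating them as firm values is an assumption of this sketch.
NATIONAL_SYSTEMS_FTE = {
    "Bio data collection standards": 2.0,
    "Bio data exchange standards": 0.5,
    "National Service endpoint register": 0.2,
    "National Organisms Register": 1.0,
    "National Methods Register": 0.2,
    "National Observation names and units register": 0.2,
    "National URI framework": 0.2,
}

# One-off costs for an organisation, in FTE, as listed in the
# "Required Change for organisations" table (assumed to be per organisation).
ORGANISATION_ONE_OFF_FTE = {
    "Adoption of collection standards": 0.5,
    "Adoption of national organisms vocabulary": 0.5,
    "Adoption of national methods vocabulary": 0.5,
    "Adoption of observation vocabulary": 0.5,
    "Data publication to standards and service end point maintenance": 1.0,
}


def total_fte(estimates):
    """Sum FTE estimates, rounded to one decimal to hide float noise."""
    return round(sum(estimates.values()), 1)


national_total = total_fte(NATIONAL_SYSTEMS_FTE)
one_off_total = total_fte(ORGANISATION_ONE_OFF_FTE)

print(f"National systems (total): {national_total} FTE")
print(f"Organisation one-off costs (total): {one_off_total} FTE")
```

Under these assumptions the national systems add up to 4.3 FTE and the one-off changes to 3.0 FTE. The ongoing cost of 0.3 FTE is left out of the sums because the table does not tie it to a specific change.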