Why a NZ Bio Data Infrastructure? - Teamwork

A federated bio data infrastructure
(BDI) for New Zealand
Why do we need it? What do we need to make it happen?
Jochen Schmidt
Background

- Over the last few years, a range of projects, initiatives, and experts from NIWA, the
  Department of Conservation (DOC), Landcare Research, Te Papa, and various Regional Councils
  have worked towards the case for a “New Zealand Bio Data Infrastructure (BDI)”.
- During various meetings in early 2013, general agreement was reached to work towards a
  bio data infrastructure, and the very general needs and requirements were outlined.
- A joint TFBIS proposal was put forward in 2013 by NIWA, Landcare, DOC, Regional Councils
  and Te Papa, which was eventually realized as the BSS project:
  https://teamwork.niwa.co.nz/display/NZBSS
- A workshop was held at Te Papa in July 2013 with key stakeholders to further scope the case
  for a New Zealand BDI.
- The Dataversity workshop in late 2013 socialized the BDI idea further with practitioners and
  defined the use cases for the BSS project.
- During 2014/2014 the BSS project worked with practitioners in a series of workshops to
  identify gaps and ways forward in how organisational bio data management could be
  improved to contribute to a national bio data infrastructure.
- Based on the experiences of the BSS project, this paper argues the case for implementing a
  New Zealand Bio Data Infrastructure (BDI).

Why a NZ Bio Data Infrastructure?
Many organisations in New Zealand collect and hold bio data. Data exchange and data access do
happen, but they are generally based on ad hoc person-to-person exchange that relies on extensive
communication and negotiation to agree on formats, semantics, vocabularies etc., and that includes
ad hoc, manual conversion and adaptation steps. The staff time required to find, understand and
format data is estimated to exceed the time required to analyse and use that data, often by an
order of magnitude. This is a waste of resources for both the data provider and the data user. We
argue that establishing and maintaining a New Zealand Bio Data Infrastructure supporting
machine-to-machine data exchange is highly desirable, and that the cost of maintaining it will be
far lower than the cost of the current manual data transactions [1]. In the most general terms, a
New Zealand Bio Data Infrastructure is based on national standards and systems that are
nationally maintained and adopted by agencies within their internal management systems. Hence,
the cost of maintaining a BDI is made up of the cost of maintaining national standards and national
systems and the cost to agencies of changing their internal data management systems to comply
with the national systems.

[1] I believe we need more examples and underpinning information here [Jochen]

What is a BDI and how do we make it happen?
Required BDI Components
A national bio data infrastructure should include the following components:

1. A national entity / facility responsible for the development and ongoing maintenance of bio
   data collection standards, and for supporting agencies in their adoption. This ensures data is
   collected consistently.
   This also requires (a mandate for) agencies to adopt the standards into their systems and
   workflows.
2. A national entity / facility responsible for the development and ongoing maintenance of bio data
   exchange standards (as demonstrated by the BSS project). This ensures that data providers
   publish their data to standards and consumers consume data to standards, so that data
   can be seamlessly exchanged.
   This also requires (a mandate for) agencies to adopt the standards into their systems and
   workflows.
3. A national facility for maintaining taxonomic concepts, as championed by the New Zealand
   Organisms Register (NZOR). This enables organisations to maintain and publish their species
   data to a common standard.
   This also requires (a mandate for) agencies to adopt the national system into their systems and
   workflows.
4. A national vocabulary service / register for bio data methods [2]. This enables organisations to
   annotate their bio data with a common methodology.
   This also requires (a mandate for) agencies to adopt the national system into their systems and
   workflows.
5. A national vocabulary for observation names and units. This enables organisations to publish
   their observations to a common standard.
   This also requires (a mandate for) agencies to adopt the national vocabulary into their systems
   and workflows.
6. A national registry service for bio data sources / service endpoints [3]. This ensures that
   end users can find / discover data sources.
7. A national framework including a linked data URI policy to support national vocabulary
   services [4]. This ensures national vocabularies can be operated consistently.

[2] A demonstration vocabulary service has been set up as part of the BSS project under:
http://test.data.scinfo.org.nz/x/def/bss/protocol/invertebrate/C1-P2-QC2
[3] A draft paper on principles for setting up a national “Linked Data Registry” service has been
prepared by the BSS project:
https://confluence.landcareresearch.co.nz/display/SBINTEROP/Linked+Data+Registries

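To illustrate how the taxon, methods and observation vocabularies might work together in practice, the following is a minimal sketch in Python of an agency translating its local terms into shared URIs before publishing a record. It is an illustration only: the lookup tables, the `AnnotatedObservation` type and the `annotate` function are hypothetical, the `example.org` URIs are placeholders rather than real register identifiers, and only the method URI is taken from the BSS demonstration vocabulary service.

```python
from dataclasses import dataclass

# Hypothetical lookup tables standing in for the national vocabulary services.
# Only the method URI comes from the BSS demonstration service; the
# example.org URIs are placeholders, not real register identifiers.
TAXON_URIS = {"Deleatidium": "https://example.org/taxon/deleatidium"}
METHOD_URIS = {
    "C1-P2-QC2": "http://test.data.scinfo.org.nz/x/def/bss/protocol/invertebrate/C1-P2-QC2",
}
# Observation name -> (URI, unit)
OBSERVATION_URIS = {"abundance": ("https://example.org/observation/abundance", "count")}


@dataclass
class AnnotatedObservation:
    taxon_uri: str
    method_uri: str
    observation_uri: str
    unit: str
    value: float


def annotate(taxon: str, method: str, name: str, value: float) -> AnnotatedObservation:
    """Translate an agency's local terms into shared URIs.

    Raises KeyError for any term that has no match in the lookup tables.
    """
    observation_uri, unit = OBSERVATION_URIS[name]
    return AnnotatedObservation(
        taxon_uri=TAXON_URIS[taxon],
        method_uri=METHOD_URIS[method],
        observation_uri=observation_uri,
        unit=unit,
        value=value,
    )


record = annotate("Deleatidium", "C1-P2-QC2", "abundance", 42.0)
print(record.method_uri)
```

A term without a national match raises a `KeyError` rather than being published unannotated, which mirrors the requirement that agencies adopt the national vocabularies into their own systems and workflows.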
We note that:

- This infrastructure needs to include governance with clear mandates (even legislation?), clear
  responsibilities, and resources [5]. One ‘lever’ to achieve this is to (re-)establish a New Zealand
  GBIF node.
- Many of the components mentioned are generic in nature (in particular (2) and (6)) and can be
  re-used in the context of a wider “Environmental Data Infrastructure (EDI)”, and combined with
  the national Spatial Data Infrastructure (SDI) work that is going on through NZGO. The BSS
  project has started conversations and hosted a workshop with NZGO on that matter.

Required national systems

| BDI Component | Tasks | Mandates | Estimated Resources | Potential owner and funder |
|---|---|---|---|---|
| Bio data collection standards | Develop new standards; maintain existing standards; provide training and support for agencies to adopt standards; monitor and audit organisations on standard adoption | Mandate required for organisations to adopt standards; mandate required for entity to audit organisations | 2.0 FTE | NEMS (on behalf of various organisations?) |
| Bio data exchange standards | Develop new standards; maintain existing standards; provide training and support for agencies to adopt standards; monitor and audit organisations on standard adoption | Mandate required for organisations to adopt standards; mandate required for entity to audit organisations | 0.5 FTE? | NEMS |
| National Service endpoint register | Develop and maintain registry and metadata standards and register; training and support for organisations; monitoring and audit | Organisations required to publish their service end points | 0.2 FTE? | Alliance? |
| National Organisms Register | Maintain register; update content; training and support for organisations; monitoring and audit | Organisations required to match taxa to NZOR | 1.0 FTE? | DOC |
| National Methods Register | Maintain register; update content; training and support for organisations; monitoring and audit | Organisations required to publish methods according to this service | 0.2 FTE | LINZ |
| National Observation names and units register | Maintain register; update content; training and support for organisations; monitoring and audit | Organisations required to publish parameter / variable names and units according to this service | 0.2 FTE | LINZ |
| National URI framework | | Use URIs for semantic web apps | 0.2 FTE? | LINZ |

[4] A draft paper on principles for setting up a national “Linked Data URI Policy” has been prepared
by the BSS project: https://confluence.landcareresearch.co.nz/display/SBINTEROP/Linked+Data+URI+Policy
[5] More fleshing out of governance required [Jochen]

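As a quick check on the overall scale, the estimated resources listed for the national systems can be added up. The sketch below is in Python and uses only the figures from the table. The dictionary name is illustrative, and several of the figures are marked with "?" in the table, so the total is indicative at best.

```python
# Estimated resources from the table above, in tenths of an FTE so the sum
# stays exact. Entries the table marks with "?" are tentative.
NATIONAL_FTE_TENTHS = {
    "Bio data collection standards": 20,
    "Bio data exchange standards": 5,          # 0.5 FTE?
    "National Service endpoint register": 2,   # 0.2 FTE?
    "National Organisms Register": 10,         # 1.0 FTE?
    "National Methods Register": 2,
    "National Observation names and units register": 2,
    "National URI framework": 2,               # 0.2 FTE?
}

total_fte = sum(NATIONAL_FTE_TENTHS.values()) / 10
print(f"Total national resources: {total_fte} FTE")  # prints 4.3 FTE
```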
Required change for organisations managing and publishing their data

| Required change | One-off costs |
|---|---|
| Adoption of collection standards | 0.5 FTE (software changes and policies) |
| Adoption of national organisms vocabulary | 0.5 FTE (software changes and policies) |
| Adoption of national methods vocabulary | 0.5 FTE (software changes and policies) |
| Adoption of observation vocabulary | 0.5 FTE (software changes and policies) |
| Data publication to standards and service end point maintenance | 1 FTE (software setup and policies) |

Ongoing costs
0.3 FTE
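
The one-off costs listed for an organisation can be summed in the same way. This is a minimal Python sketch using only the one-off figures given above; the dictionary name is illustrative, and ongoing costs are left out because they are not broken down by change.

```python
# One-off cost per required change, in FTE, as listed for a single organisation.
ONE_OFF_FTE = {
    "Adoption of collection standards": 0.5,
    "Adoption of national organisms vocabulary": 0.5,
    "Adoption of national methods vocabulary": 0.5,
    "Adoption of observation vocabulary": 0.5,
    "Data publication to standards and service end point maintenance": 1.0,
}

total_one_off = sum(ONE_OFF_FTE.values())
print(f"One-off cost per organisation: {total_one_off} FTE")  # prints 3.0 FTE
```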