Organising data flows and modelling for the Essential Biodiversity Variables
Hannu Saarenmaa – University of Eastern Finland
• GEO BON, WG8 – Data Integration and Interoperability
• EU BON, WP2 – Data Integration and Interoperability
• BioVeL, WP2 – Workflows for Scientific Research
GEO-X Plenary, Geneva, 14 January 2014

Essential Biodiversity Variables
• Conceived by GEO BON collaborators (Pereira et al. (2013), “Essential Biodiversity Variables”, Science, Vol. 339, 18 January 2013).
• EBVs facilitate data integration by providing an intermediate abstraction layer between primary observations and indicators.
• EBVs are computed from a large number of inputs (monitoring and incidental data).
• EBVs aim to help observation communities harmonise monitoring by identifying how variables should be sampled and measured.
• EBVs standardise an ontology for biodiversity and harmonise measurements, observations, and protocols.
• EBVs are endorsed by the Convention on Biological Diversity (CBD) and are in line with the 2020 Aichi Targets.
• EBVs provide a focus for GEO BON, and hence for the interoperability thrust within GEO BON.
• EBVs are a use case that GEO BON, EU BON and BioVeL all focus on.

Where does the data come from?
• In Europe there are about 2,000 biodiversity observation networks (only 643 are listed by EUMON).
• GBIF has 10,000 datasets, openly accessible and conforming to the GEOSS Data Sharing Principles.
• LTER/DataONE holds thousands of biodiversity datasets.
• EU BON is carrying out a gap analysis:
  – There is massive duplication of effort in data management, and a lack of data sharing.
  – Very few datasets have had their “quality” (coverage, accuracy, etc.) documented and guaranteed.
  – A so-called “data core” for biodiversity has not yet been defined.

Biodiversity Virtual e-Laboratory (BioVeL): processing services and workflows
• “Workflows” (series of data analysis steps) make it possible to process vast amounts of data.
• Build your own workflow: select and apply successive “services” (data processing techniques).
• Import data from your own research and/or from existing libraries (e.g. GBIF, Catalogue of Life).
• Access a library of workflows and re-use existing ones.
• Cut down research time and overhead expenses.
[Figure: part of a workflow to study the ecological niche of the horseshoe crab]

Aim: predictive modelling of biodiversity change
Available tools from a growing family of ENM workflows, released to the public at www.biovel.eu:
• Data Refinement Workflow (DRW) for pre-processing
  – Taxonomic name resolution / occurrence retrieval
  – Geo-temporal data selection using ‘BioSTIF’
  – Data quality checks / filtering using ‘Google Refine’
• Ecological Niche Modelling Workflow (ENM)
  – Classic ENM with 15 algorithms
  – Separate BioClim workflow (requires special inputs)
• ENM Statistical Workflow (ESW) for post-processing
  – DIFF: extent and intensity of change
  – STACK: extent, intensity, and an accumulated potential
  – SHIFT: shift of the centre of gravity (direction, and distance in kilometres)
[Diagram stages: 1. Data assembly, cleaning, and refinement; 2. The analytical cycle; 3. Data discovery; panels labelled “Ecological Niche Modelling” and “Statistical analysis”]

Seamless exchange of data layers (http://openmodeller.cria.org.br/)

Use case: disturbance of forest ecosystems by the spruce bark beetle, Ips typographus
[Maps: pre-2002, year 2050, difference]
• Statistical processing of the difference for Finland indicates that the susceptibility of spruce forests to Ips typographus damage will increase five-fold by 2050.
• Policy advice: stricter forest hygiene through tougher legislation, so that Ips populations are kept to a minimum given the increased risk.
• Papers for Silva Fennica and for the INTECOL session proceedings in the Journal of Ecology.

Outline of the use case
• Running the Ecological Niche Modelling (ENM) workflow for a large number of species:
  – Process data points for hundreds of species (e.g. plants, butterflies, …)
  – Use data mostly from GBIF, but also from elsewhere
  – Each individual species may have on the order of 10⁵ data points
  – Run openModeller-based ENM for all the data points
  – Choose predictive layers from WorldClim and GEOSS sources
• Generate summary statistics that can answer questions such as the following (potential EBVs?):
  – How many species are increasing? How many are decreasing?
  – Does the flora/fauna move in any direction?
  – Is the distribution fragmenting? Is it shrinking?
  – How many populations are becoming marginalised?
• Prototype automated data processing for computing the Essential Biodiversity Variables (EBVs)

Status of the current BioVeL ENM workflow
• Current openModeller-based ENM workflows work at a small scale, focusing on one or a few selected species.
• The current workflow requires frequent interaction with the user (many clicks if we simply multiply the runs).
• We need a scalable, automated system to run ENM for hundreds of species.
• We need a system that can perform a summary analysis across all species, based on the individual ENM runs.
• The second-generation BioVeL portal will provide the required capabilities.
• To be released publicly in January 2014 (currently in beta).

Envisaged application structure
[Diagram: Selected species and ENM parameter sets for species → data queries (GBIF query, LTER query, EUMON query, ...) → parallel ENM workflows → ENM output files → Summary analysis]
• Multiple species may use the same ENM parameter set (e.g. Mediterranean dryland plants).
• Parameter sets are generated and tested with another workflow (see next slide).
• Some species may need other offline data, or private data uploaded by the user.
• One ENM workflow predicts the impact of environmental changes on the distribution of one species.
• The portal offers the output files for download.
• The summary analysis is performed with a custom R-based tool outside the portal.
• EBVs are produced by combining data from different models.

ENM parameter optimisation workflow
[Diagram: Selected species and Parameter matrix → parallel parameter test and selection jobs → ENM parameter sets for species]
• Parameter matrix: the possible parameter combinations.
• ENM parameter sets for species: the optimal parameter input for the large ENM workflow (see previous slide).

[Figure: initialising the data sweep on the portal]
[Figure: results of the data sweep, ready to be mapped and statistically analysed]

Example product: accumulated invasive potential for ecological groups
• 20 blacklisted species divided into 4 ecological regimes: zoobenthos, phytobenthos, zoopelagial, phytopelagial.
[Figure: example stack of combined macrozoobenthic invasion heatmaps]
Slide by Matthias Obst, BioVeL

QUESTIONS?
www.earthobservations.org/geobon.shtml
www.eubon.eu
www.biovel.eu
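The DIFF, STACK and SHIFT steps of the ENM Statistical Workflow (ESW), and the use-case question of how many species are increasing or decreasing, can be illustrated on simple presence/absence maps. The Python sketch below is a minimal illustration under stated assumptions, not BioVeL's or openModeller's implementation: it assumes each ENM output has already been thresholded into a 0/1 grid of equal-area cells with a known cell size in kilometres, and all function names are hypothetical. STACK is approximated here as a cell-wise sum, and SHIFT as the movement of the centroid of occupied cells.

```python
from math import atan2, degrees, hypot
from typing import Dict, List, Tuple

# Assumption: each ENM output is a 0/1 presence grid on an equal-area raster;
# rows run north to south, columns west to east. All names are hypothetical.
Grid = List[List[int]]


def diff(before: Grid, after: Grid) -> Tuple[Grid, int, int]:
    """DIFF: per-cell change (+1 gained, -1 lost, 0 unchanged) plus counts."""
    change = [[a - b for b, a in zip(row_b, row_a)] for row_b, row_a in zip(before, after)]
    gained = sum(v == 1 for row in change for v in row)
    lost = sum(v == -1 for row in change for v in row)
    return change, gained, lost


def stack(grids: List[Grid]) -> Grid:
    """STACK: cell-wise sum, e.g. how many species are predicted in each cell."""
    rows, cols = len(grids[0]), len(grids[0][0])
    return [[sum(g[r][c] for g in grids) for c in range(cols)] for r in range(rows)]


def centroid(grid: Grid, cell_km: float) -> Tuple[float, float]:
    """Centre of gravity of occupied cells, in km east and south of the top-left corner."""
    cells = [(r, c) for r, row in enumerate(grid) for c, v in enumerate(row) if v]
    if not cells:
        raise ValueError("grid has no occupied cells")
    x = sum((c + 0.5) * cell_km for _, c in cells) / len(cells)
    y = sum((r + 0.5) * cell_km for r, _ in cells) / len(cells)
    return x, y


def shift(before: Grid, after: Grid, cell_km: float) -> Tuple[float, float]:
    """SHIFT: distance (km) and bearing (degrees clockwise from north) of the centroid."""
    x0, y0 = centroid(before, cell_km)
    x1, y1 = centroid(after, cell_km)
    east, north = x1 - x0, y0 - y1  # row index grows southwards
    return hypot(east, north), degrees(atan2(east, north)) % 360


def summarise(species: Dict[str, Tuple[Grid, Grid]]) -> Dict[str, int]:
    """Count species whose predicted range increases, decreases or stays the same."""
    counts = {"increasing": 0, "decreasing": 0, "stable": 0}
    for before, after in species.values():
        _, gained, lost = diff(before, after)
        if gained > lost:
            counts["increasing"] += 1
        elif gained < lost:
            counts["decreasing"] += 1
        else:
            counts["stable"] += 1
    return counts


if __name__ == "__main__":
    before = [[1, 1, 0], [1, 1, 0], [0, 0, 0]]
    after = [[0, 1, 1], [0, 1, 1], [0, 0, 0]]
    print(diff(before, after)[1:])     # (2, 2): two cells gained, two lost
    print(stack([before, after]))      # [[1, 2, 1], [1, 2, 1], [0, 0, 0]]
    print(shift(before, after, 10.0))  # distance 10 km, bearing 90 degrees (due east)
```

In a real sweep, the per-species grids would come from the downloaded ENM output files, and a geographic (rather than grid-based) centroid would likely be needed for large extents.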
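The ENM parameter optimisation workflow expands a parameter matrix of possible combinations, runs a parameter test and selection job per species, and passes the optimal set to the large ENM workflow, where several species may share one set. A minimal Python sketch of that sweep follows. It assumes a caller-supplied scoring function (for example a cross-validated accuracy measure, which the slides do not specify), and the parameter names and values are hypothetical placeholders, not openModeller options.

```python
from itertools import product
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical sketch: parameter names and the scoring function are assumptions,
# not openModeller's actual algorithm parameters or BioVeL's selection criterion.
Params = Dict[str, object]


def parameter_matrix(grid: Dict[str, Iterable]) -> List[Params]:
    """Expand candidate values into every possible parameter combination."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


def select_parameters(
    species: Iterable[str],
    grid: Dict[str, Iterable],
    score: Callable[[str, Params], float],
) -> Dict[str, Tuple[Params, float]]:
    """One 'parameter test and selection job' per species: keep the best-scoring set."""
    combos = parameter_matrix(grid)
    best: Dict[str, Tuple[Params, float]] = {}
    for sp in species:
        # Test every combination; max() returns the first maximal item on ties.
        top_score, top_params = max(((score(sp, p), p) for p in combos), key=lambda t: t[0])
        best[sp] = (top_params, top_score)
    return best


def group_by_parameters(best: Dict[str, Tuple[Params, float]]) -> Dict[tuple, List[str]]:
    """Group species that share the same optimal parameter set."""
    groups: Dict[tuple, List[str]] = {}
    for sp, (params, _) in best.items():
        groups.setdefault(tuple(sorted(params.items())), []).append(sp)
    return groups


if __name__ == "__main__":
    grid = {"algorithm": ["A", "B"], "threshold": [0.3, 0.5]}  # hypothetical matrix

    def toy_score(sp: str, p: Params) -> float:
        # Stand-in for a real model evaluation; favours algorithm "B" and threshold 0.5.
        return (1.0 if p["algorithm"] == "B" else 0.5) - abs(p["threshold"] - 0.5)

    best = select_parameters(["species_1", "species_2"], grid, toy_score)
    print(group_by_parameters(best))
```

Passing the scoring function in as an argument keeps the sweep independent of how each model run is evaluated, and the final grouping step mirrors the point that several species (e.g. Mediterranean dryland plants) may share one parameter set.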