Linking data, services and communities using Virtual Research Environments (VRE)
The Scratchpads example

Dimitris Koureas, Vince Smith & Simon Rycroft
Natural History Museum London
Jönköping, Sweden, October 27-31, 2014

The problem
Capturing and integrating biodiversity data

The challenges

Challenge 1: Capturing and mobilising data at all scales

Challenge 2: Linking & aggregating data at different scales
• Communities: c.50k (e.g. Scratchpads)
• Global Efforts: c.500M (e.g. GBIF Data Portal)
• National Efforts: c.5M (e.g. NHM Data Portal)

Challenge 3: Synthesising data, e.g. modelling human pressures on biodiversity
• Ecosystems
• 2M records, 19k sites, 34k spp.
• Small aggregated datasets
• Agro-systems
• Management Practices

[Figure: Species richness in different ecosystems]

[Figure: Models to predict how biodiversity responds to human pressures]
Figure legend (biomes):
A Boreal Forests/Taiga
B Deserts & Xeric Shrublands
C Flooded Grasslands & Savannas
D Mangroves
E Mediterranean Forests, Woodlands & Scrub
F Montane Grasslands & Shrublands
G Temperate Broadleaf & Mixed Forests
H Temperate Conifer Forests
I Temperate Grasslands, Savannas & Shrublands
J Tropical & Subtropical Coniferous Forests
K Tropical & Subtropical Dry Broadleaf Forests
L Tropical & Subtropical Grasslands, Savannas & Shrublands
M Tropical & Subtropical Moist Broadleaf Forests
N Tundra

• Land-use change
• Pollution
• Invasive species
• Infrastructure

The vision
What is the long-term vision?
Broad consensus in the Biodiversity Informatics community
White paper; Nature 2013, doi:10.1038/493295a; GBIO report; BIH 2013

The long tail of Biodiversity data
Dark or Grey shaded data
• Inaccessible | native format/private silos
• Disconnected | not aggregated or discoverable
• Redundant | overlapping efforts, no coordination
• Cluttered | small and dispersed datasets
Typically produced by small communities
[Figure labels: 20%, 80%]

Empower long-tail researchers to make use of the available e-infrastructures and services
We need to change the modus operandi of doing Science

Positioning of VREs
• VREs derived from VLEs
• Sit on the top of e-infrastructures
• Abstract from available services
• Lower the entry barrier

Scratchpads Virtual Research Environments
• 580 Communities
• 150,000 taxa
• 6,400 active users
• More than 2,500,000 visitors

Data
• Data mobilisation & generation
• Data publishing
• Curation and linking
• Data analysis

• Empower users to engage
• Simplify access to tools & services
• Incentivise use through data publication

A Scratchpad is a gateway to big data
[Diagram labels: xls, csv, SDD, DwC-A, XML, TCS; contributed data; External data & services]
• Taxa (Classifications, taxon profiles, specimens, literature, images, maps, phenotypic, genotypic & morphometric datasets, keys, phylogenies)
• Conservation
• Projects
• Regions
• Societies
Leverage effort and data impact

A highly dynamic but fragmented landscape
Specific configurations for particular domains or systems serving very generic functions

What we want to do?
Develop the highly responsive digital framework required to enable high-throughput research and support science of scale towards the long-term vision of modelling Life on Earth
• Inspired by: Roadmap publications (GBIO & White paper)
• Mandated by: European & global Societal challenges
• Supported by: Maturity of available e-infrastructures

Horizon 2020 VRE Proposal
Topic: EINFRA-9-2015 Virtual Research Environments
Estimated Budget: c. €8 m
Consortium: c. 25 partners

LinkD
Linking data, services and communities for predictive modelling of the biosphere

What we want to do?
LinkD
Science of Scale for Life on Earth

How are we going to do it?
• Build on top of existing e-infrastructures
• Learn from previous experiences
• Prioritise operation over demonstration and proof-of-concept
• Engage with multi-disciplinary research & citizen science communities
• Develop and execute a comprehensive training programme

Proposed project structure

Improving uptake of cyberinfrastructures
• Confidence
• Agility
• Marketing
• Commitment
• Longevity
• Adaptability
• User monitoring
• Visibility
• Intuitive interface

Thank you
@dimitriskoureas
d.koureas@nhm.ac.uk