Koureas_VRE-Scratchpads

Linking data, services and communities using Virtual Research Environments (VRE)
The Scratchpads example
Dimitris Koureas, Vince Smith & Simon Rycroft
Natural History Museum London
Jönköping, Sweden
October 27-31, 2014
The problem
Capturing and integrating biodiversity data
The challenges
Challenge 1: Capturing and mobilising data at all scales
Challenge 2: Linking & aggregating data at different scales
Communities: c. 50k (e.g. Scratchpads)
Global efforts: c. 500M (e.g. GBIF Data Portal)
National efforts: c. 5M (e.g. NHM Data Portal)
Challenge 3: Synthesising data, e.g. modelling human pressures on biodiversity
Ecosystems
2M records, 19k sites, 34k spp.
Small aggregated datasets
Agro-systems
Management practices
Species richness in different ecosystems
Models to predict how biodiversity responds to human pressures
A Boreal Forests/Taiga
B Deserts & Xeric Shrublands
C Flooded Grasslands & Savannas
D Mangroves
E Mediterranean Forests, Woodlands & Scrub
F Montane Grasslands & Shrublands
G Temperate Broadleaf & Mixed Forests
H Temperate Conifer Forests
I Temperate Grasslands, Savannas & Shrublands
J Tropical & Subtropical Coniferous Forests
K Tropical & Subtropical Dry Broadleaf Forests
L Tropical & Subtropical Grasslands, Savannas & Shrublands
M Tropical & Subtropical Moist Broadleaf Forests
N Tundra
• Land-use change
• Pollution
• Invasive species
• Infrastructure
The vision
What is the long-term vision?
Broad consensus in the Biodiversity Informatics community
White paper
Nature 2013, doi:10.1038/493295a
GBIO report
BIH 2013
The long tail of Biodiversity data
Dark or Grey shaded data
Inaccessible | native format/private silos
Disconnected | not aggregated or discoverable
Redundant | overlapping efforts, no coordination
Cluttered | small and dispersed datasets
Typically produced by small communities
20%
80%
Empower long-tail researchers to make use of the available e-infrastructures and services
We need to change the modus operandi of doing Science
Positioning of VREs
VREs derived from VLEs
Sit on top of e-infrastructures
Abstract from available services
Lower the entry barrier
Scratchpads
Virtual Research Environments
580 Communities
150,000 taxa
6,400 active users
More than 2,500,000 visitors
Data mobilisation & generation
Data
Data publishing
Curation and linking
Data analysis
Empower users to engage
Simplify access to tools & services
Incentivise use through data publication
A Scratchpad is a gateway to big data
xls, csv, XML, SDD, DwC-A, TCS
contributed data
External data & services
Taxa
(Classifications, taxon profiles, specimens, literature, images, maps, phenotypic, genotypic & morphometric datasets, keys, phylogenies)
Conservation
Projects
Regions
Societies
Leverage effort and data impact
A highly dynamic but fragmented landscape
Specific configurations for particular domains or systems serving very generic functions
What do we want to do?
Develop the highly responsive digital framework required to enable high-throughput research and support science of scale, towards the long-term vision of modelling Life on Earth
Inspired by: Roadmap publications (GBIO & White paper)
Mandated by: European & global societal challenges
Supported by: Maturity of available e-infrastructures
Horizon 2020 VRE Proposal
Topic: EINFRA-9-2015, Virtual Research Environments
Estimated budget: c. €8 m
Consortium: c. 25 partners
LinkD
Linking data, services and communities for predictive modelling of the biosphere
What do we want to do?
LinkD
Science of Scale for Life on Earth
How are we going to do it?
Build on top of existing e-infrastructures
Learn from previous experiences
Prioritise operation over demonstration and proof-of-concept
Engage with multi-disciplinary research & citizen science communities
Develop and execute a comprehensive training programme
Proposed project structure
Improving uptake of cyberinfrastructures
Confidence
Agility
Marketing
Commitment
Longevity
Adaptability
User monitoring
Visibility
Intuitive interface
Thank you
@dimitriskoureas
d.koureas@nhm.ac.uk