Scratchpads

Scratchpads
Virtual Research Environments
for taxonomic and biodiversity related data
Reading, 27-02-2013
Our current taxonomic data production
• 15-20k new spp. described annually (2M total)¹
• 30k nomenclatural acts (12M total)¹
• 20k phylogenies (750k total)²
• 31k taxa sequenced (360k taxa total)³
• 800k BioMed papers (40M total pp. of taxonomy)⁴
• Countless specimens, images, maps, keys and datasets
Typically generated by small communities for “local” research projects
Figures from 1) Zhang, Zootaxa 2011 4, 1-4; 2) Web-of-Science; 3) Genbank and 4) PubMed.
On the other hand:
Estimates of 7.5 million species still undescribed¹
¹ Mora C et al. How Many Species Are There on Earth and in the Ocean? doi:10.1371/journal.pbio.1001127
Expected volume of taxonomic and biodiversity data
Need for extracting, aggregating and linking data on a global level
The four nodes of data workflow
1. We collect and generate data
2. We curate, link and structure data
3. We analyse data
4. We publish data
What are the bottlenecks in the workflow?
[Diagram: cycle of data collection & generation, data curation, data analysis and data publishing]
What we need is… a seamless workflow
[Diagram: cycle of data collection & generation, data curation, data analysis and data publishing]
To achieve this…
“Link together evolutionary data… by developing analytical tools and proper documentation and then use this framework to conduct comparative analyses, studies of evolutionary process and biodiversity analyses”
Cyndy Parr, Rob Guralnick, Nico Cellinese and Rod Page. TREE. doi:10.1016/j.tree.2011.11.001
This requires data, information & knowledge to be…
• Digital: not printed paper
• Openly accessible: not behind barriers (e.g. paywalls)
• Linked-up: not in silos
Scratchpads
Virtual Research Environments
Making taxonomy digital, open & linked
so… what are the Scratchpads?
What are Scratchpads?
• Hosted websites for biodiversity data
• Virtual research & publication platform
• Completely open access & open source
• Modular & flexible
What are Scratchpads?
Scratchpads facilitate the development of online research communities through a standardized environment for entering and curating data that allows sharing, interlinking and dissemination of research products
The Scratchpads concept
A Scratchpad is a website that holds data for you and your community
Your data
External data & services
The Scratchpads concept
Examples of use:
Taxa (Classifications, taxon profiles, specimens, literature, images, maps, phenotypic, genotypic & morphometric datasets, keys, phylogenies)
Conservation
Projects
Regions
Societies
Examples of use:
• Red List conservation assessments
• Bulbous monocot genera listed in CITES
• Global Invasive Alien Species Information Partnership
Major integrated projects
• Online resource for monocot plants
• Collaboration between Kew, Oxford University and NHM
• Data to be open and usable by other scientists
Major integrated projects
• 21+ open community sites and growing
• Over 45 internationally collaborating scientists
• Site data feeds into a “Portal”
Site List: http://about.e-monocot.org/list-emonocot-scratchpads
Major integrated projects
• Retrieve information on any Monocot plant
• Rich downloadable data
• Identification keys
• Model example of linked attributed data
eMonocot Portal: http://e-monocot.org/
Are Scratchpads sustainable?
464 Scratchpads communities by 6,407 active registered users, covering 52,661 taxa in 559,488 pages.
In total more than 1,200,000 visitors.
[Chart: unique visitors per month to Scratchpads sites; 65000 unique visitors/month]
Are Scratchpads sustainable?
[Funding timeline: 2007, 2011, 2014]
ViBRANT (Virtual Biodiversity Research)
Other grants in the pipeline
Proposals?
The main features
Classification term oriented system
Biological classifications
Taxonomies
Non-biological classifications
Hierarchical controlled vocabularies
The main features
Dynamic Biological Classifications
Manually entered or imported
Auto generated
The main features
Taxon pages
Overview of data related to taxon
Generated from tagged content
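As a rough illustration of how a taxon page could be "generated from tagged content", the sketch below groups content items by the taxon names they are tagged with. It is not the Scratchpads implementation: the field names (`type`, `title`, `taxa`), the function `build_taxon_pages` and the placeholder taxa "Aus bus" and "Aus cus" are all assumptions made for this example.

```python
# Hypothetical content items; field names and values are invented for illustration.
content = [
    {"type": "image", "title": "Habitus photo", "taxa": ["Aus bus"]},
    {"type": "biblio", "title": "Revision of the genus Aus", "taxa": ["Aus bus", "Aus cus"]},
    {"type": "specimen", "title": "Example specimen record", "taxa": ["Aus cus"]},
]


def build_taxon_pages(items):
    """Group item titles by taxon tag, then by content type."""
    pages = {}
    for item in items:
        for taxon in item["taxa"]:
            # One page per taxon, one section per content type.
            pages.setdefault(taxon, {}).setdefault(item["type"], []).append(item["title"])
    return pages


pages = build_taxon_pages(content)
for taxon in sorted(pages):
    print(taxon)
    for kind in sorted(pages[taxon]):
        print(f"  {kind}: {', '.join(pages[taxon][kind])}")
```

An item tagged with several taxa appears on each of their pages, which is the behaviour the "overview of data related to taxon" bullet suggests.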
The main features
Bibliography management
An inbuilt Bibliography manager
Faceted browsing
Taxon tagging and free keywords
Import from and export to all major formats
The main features
Specimen/Observation data
Annotated full specimen/observation records
Linked to images and georeferenced
The main features
Distribution maps
Google maps based
Data layers
Occurrence data
Distribution data
TDWG regions
GBIF data
The main features
Example regional distribution
The main features
Character matrices – Key construction
Quantitative or qualitative characters
Auto generation of keys
Taxon based matrices
[Specimens based character matrices]
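To make the link between a taxon-based character matrix and a key more concrete, here is a minimal sketch of a multi-access lookup: given some observed character states, it returns the taxa whose matrix rows agree. The matrix contents, the taxon names and the function `matching_taxa` are hypothetical, and the actual key generation in Scratchpads may work quite differently.

```python
# Hypothetical taxon-based matrix of qualitative characters (invented values).
matrix = {
    "Aus bus": {"wings": "present", "antennae": "short"},
    "Aus cus": {"wings": "present", "antennae": "long"},
    "Dus eus": {"wings": "absent", "antennae": "short"},
}


def matching_taxa(matrix, observed):
    """Return the taxa whose recorded states match every observed state."""
    return sorted(
        taxon
        for taxon, states in matrix.items()
        if all(states.get(character) == state for character, state in observed.items())
    )


# Each additional observation narrows the list of candidate taxa.
print(matching_taxa(matrix, {"wings": "present"}))
print(matching_taxa(matrix, {"wings": "present", "antennae": "long"}))
```

Quantitative characters would need range comparisons rather than equality, which this sketch leaves out.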
The main features
Media handling
Bulk upload
Metadata (incl. EXIF)
Media galleries
The main features
Generation of custom pages
Tagged or not
External RSS
Twitter feeds
Media files
The main features
Enhanced communication tools
Working groups
Forums
Blog entries
Webforms
Newsletters
RSS syndication
Inbuilt comments
The main features
External services integration
Analytical tools: OBOE service, i.a. ecological informatics, phylogenetics, sequence alignment
Data mobilisation: IUCN data integration, GBIF data integration, BRAHMS data migration; more on the way…
The main features
The Publication module
Open-access journal
What will the Biodiversity Data Journal (BDJ) publish?
• Single taxon treatments and nomenclatural acts
• Local or regional checklists
• Sampling reports and occasional inventories
• Habitat-based checklists and inventories
• Ecological and biological observations of species and communities?
• Single identification keys
• Biodiversity-related databases, including genomic, ecological and environmental data (data papers)
• Biodiversity-related software tools
How do Scratchpads and BDJ interact?
Working in a single environment
Allows submission of datasets for publication without reformatting and restructuring, based on a standardised XML schema
The publication module
Author names and affiliations
Taxon descriptions
Specimen data
Figures and Tables
Keys
References
Texts
[Diagram labels: XML, Community]
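The idea of packaging these manuscript components as XML can be sketched with Python's standard `xml.etree.ElementTree` module. The element and attribute names below (`manuscript`, `authors`, `treatment`, `taxon` and so on) are invented for illustration and do not reproduce the actual schema used between Scratchpads and the Pensoft system; the author and taxon values are placeholders.

```python
import xml.etree.ElementTree as ET


def build_manuscript(title, authors, treatments):
    """Assemble a manuscript tree from structured records (hypothetical layout)."""
    root = ET.Element("manuscript")
    ET.SubElement(root, "title").text = title

    authors_el = ET.SubElement(root, "authors")
    for author in authors:
        author_el = ET.SubElement(authors_el, "author")
        ET.SubElement(author_el, "name").text = author["name"]
        ET.SubElement(author_el, "affiliation").text = author["affiliation"]

    treatments_el = ET.SubElement(root, "treatments")
    for treatment in treatments:
        # The taxon name is stored as an attribute, the description as child text.
        treatment_el = ET.SubElement(treatments_el, "treatment", taxon=treatment["taxon"])
        ET.SubElement(treatment_el, "description").text = treatment["description"]
    return root


root = build_manuscript(
    title="Example taxon treatment",
    authors=[{"name": "A. Author", "affiliation": "Example Museum"}],
    treatments=[{"taxon": "Aus bus", "description": "Placeholder description."}],
)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Because the records are already structured in the site, serialising them this way avoids manual reformatting before submission, which is the point the slide makes.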
The data workflow
SCRATCHPADS → XML submission → PENSOFT JOURNAL SYSTEM (PJS 2.0) → MANUSCRIPT PUBLISHED (XML, PDF)
[Diagram labels: Archive datasets, Occurrence data, Taxon treatments, Plazi, Taxon names, Wiki]
Scratchpads are an integrated system to enter, curate, mark up, link and publish data: a taxonomic workflow in a single virtual environment
Acknowledgements
Scratchpads technical development
- Simon Rycroft, Ben Scott, Ed Baker, Alice Heaton & Katherine Bouton
Scratchpads outreach
- Laurence Livermore, Isa van de Velde & Dimitris Koureas
e-Monocot
- Paul Wilkin & the Kew team, Charles Godfray & the Oxford team
ViBRANT
- Vince Smith, Dave Roberts & Lucy Reeve
Pensoft
- Lyubomir Penev and the Pensoft team
Our 7000 users
Help & Support
• In-site Support
• Wiki
• Training Courses (12 in 2012)
• Ambassadors Programme
• Embedded Issues Queue
• Sandbox Site
http://help.scratchpad.eu
Thank you
[Diagram: cycle of data collection & generation, data curation, data analysis and data publishing]
Authors and Contributors
[Diagram: the lead author creates a template-based manuscript (taxon treatment, interactive key, checklist or data paper), invites co-authors for authoring and invites contributors (mentor, linguistic editor, copy editor, potential reviewer, colleague/friend) for contributing, until the manuscript is ready to submit.]