Scratchpads
Virtual Research Environments for taxonomic and biodiversity related data
Reading, 27-02-2013

Our current taxonomic data production
• 15-20k new spp. described annually (2M total) [1]
• 30k nomenclatural acts (12M total) [1]
• 20k phylogenies (750k total) [2]
• 31k taxa sequenced (360k taxa total) [3]
• 800k BioMed papers (40M total pp. of taxonomy) [4]
• Countless specimens, images, maps, keys and datasets
Typically generated by small communities for “local” research projects
Figures from 1) Zhang, Zootaxa 2011 4, 1-4; 2) Web-of-Science; 3) Genbank and 4) PubMed.

On the other hand:
Estimates of 7.5 million species still undescribed [1]
[1] How Many Species Are There on Earth and in the Ocean? Mora C et al. doi:10.1371/journal.pbio.1001127
• Expected volume of taxonomic and biodiversity data
• Need of extracting, aggregating and linking data on a global level

The four nodes of data workflow
1. We collect and generate data
2. We curate, link and structure data
3. We analyse data
4. We publish data

The four nodes of data workflow
What are the bottlenecks in the workflow?
[Cycle diagram: Data collection & generation, Data curation, Data analysis, Data publishing]

What we need is… a seamless workflow
[Cycle diagram: Data collection & generation, Data curation, Data analysis, Data publishing]

To achieve this…
“Link together evolutionary data… by developing analytical tools and proper documentation and then use this framework to conduct comparative analyses, studies of evolutionary process and biodiversity analyses”
Cyndy Parr, Rob Guralnick, Nico Cellinese and Rod Page. TREE. doi:10.1016/j.tree.2011.11.001

This requires data, information & knowledge to be…
• Digital: not printed paper
• Openly accessible: not behind barriers (e.g. paywalls)
• Linked-up: not in silos

Scratchpads
Virtual Research Environments
Making taxonomy digital, open & linked

so… what are the Scratchpads?

What are Scratchpads?
• Hosted websites for biodiversity data
• Virtual research & publication platform
• Completely open access & open source
• Modular & flexible

What are Scratchpads?
Scratchpads facilitate the development of online research communities through a standardized environment for entering and curating data that allows sharing, interlinking and dissemination of research products.

The Scratchpads concept
A Scratchpad is a website that holds data for you and your community
[Diagram labels: Your data; External data & services]

The Scratchpads concept
Examples of use:
• Taxa (classifications, taxon profiles, specimens, literature, images, maps, phenotypic, genotypic & morphometric datasets, keys, phylogenies)
• Conservation
• Projects
• Regions
• Societies

Examples of use: Red List conservation assessments

Examples of use: Bulbous monocot genera listed in CITES

Examples of use: Global Invasive Alien Species Information Partnership

Major integrated projects
• Online resource for monocot plants
• Collaboration between Kew, Oxford University and NHM
• Data to be open and usable by other scientists

Major integrated projects
• 21+ open community sites and growing
• Over 45 internationally collaborating scientists
• Site data feeds into a “Portal”
Site List: http://about.e-monocot.org/list-emonocot-scratchpads

Major integrated projects
• Retrieve information on any Monocot plant
• Rich downloadable data
• Identification keys
• Model example of linked attributed data
eMonocot Portal: http://e-monocot.org/

Are Scratchpads sustainable?
• 464 Scratchpads communities
• 6,407 active registered users
• 52,661 taxa covered in 559,488 pages
• More than 1,200,000 visitors in total
• 65,000 unique visitors per month to Scratchpads sites
[Chart: unique visitors per month to Scratchpads sites]

Are Scratchpads sustainable?
[Funding timeline: 2007, 2011, 2014; ViBRANT (Virtual Biodiversity Research & ...); other grants in the pipeline; proposals?]
The main features

The main features: Classification
• Term-oriented system
• Biological classifications: taxonomies
• Non-biological classifications: hierarchical controlled vocabularies

The main features: Dynamic biological classifications
• Manually entered or imported
• Auto generated

The main features: Taxon pages
• Overview of data related to taxon
• Generated from tagged content

The main features: Bibliography management
• An inbuilt bibliography manager
• Faceted browsing
• Taxon tagging and free keywords
• Import from and export to all major formats

The main features: Specimen/Observation data
• Annotated full specimen/observation records
• Linked to images and georeferenced

The main features: Distribution maps
• Google Maps based
• Data layers
• Occurrence data
• Distribution data
• TDWG regions
• GBIF data

The main features: Example regional distribution

The main features: Character matrices – Key construction
• Quantitative or qualitative characters
• Auto generation of keys
• Taxon based matrices
• [Specimen based character matrices]

The main features: Media handling
• Bulk upload
• Metadata (incl. EXIF)
• Media galleries

The main features: Generation of custom pages
• Tagged or not
• External RSS
• Twitter feeds
• Media files

The main features: Enhanced communication tools
• Working groups
• Forums
• Blog entries
• Webforms
• Newsletters
• RSS syndication
• Inbuilt comments

The main features: Analytical tools
• OBOE service, i.a. ecological informatics, phylogenetics, sequence alignment
• External services integration

The main features: Data mobilisation (more on the way…)
• IUCN data integration
• GBIF data integration
• BRAHMS data migration

The main features: The Publication module
• Open-access journal

What will BDJ publish?
• Single taxon treatments and nomenclatural acts
• Local or regional checklists
• Sampling reports and occasional inventories
• Habitat-based checklists and inventories
• Ecological and biological observations of species and communities?
• Single identification keys
• Biodiversity-related databases, including genomic, ecological and environmental data (data papers)
• Biodiversity-related software tools

How do Scratchpads and BDJ interact?
• Working in a single environment
• Allow submission of datasets for publication without reformatting and restructuring, based on a standardised XML schema

The publication module
[Diagram labels: Author names and affiliations; Taxon descriptions; Specimen data; Figures and Tables; Keys; References; Texts; XML; Community]

The data workflow
[Diagram labels: SCRATCHPADS; XML submission; PENSOFT JOURNAL SYSTEM (PJS 2.0); MANUSCRIPT PUBLISHED (XML, PDF); Archive datasets; Occurrence data; Taxon treatments; Plazi; Taxon names; Wiki]

Scratchpads are an integrated system to enter, curate, mark up, link and publish taxonomic data, with the whole workflow in a single virtual environment.

Acknowledgements
• Scratchpads technical development - Simon Rycroft, Ben Scott, Ed Baker, Alice Heaton & Katherine Bouton
• Scratchpads outreach - Laurence Livermore, Isa van de Velde & Dimitris Koureas
• e-Monocot - Paul Wilkin & the Kew team, Charles Godfray & the Oxford team
• ViBRANT - Vince Smith, Dave Roberts & Lucy Reeve
• Pensoft - Lyubomir Penev and the Pensoft team
• Our 7000 users

Help & Support
• In-site Support
• Wiki
• Training Courses (12 in 2012)
• Ambassadors Programme
• Embedded Issues Queue
• Sandbox Site
http://help.scratchpad.eu

Thank you
[Cycle diagram: Data collection & generation, Data curation, Data analysis, Data publishing]

Authors and Contributors
[Diagram labels: Lead author; Template-based manuscript creation; Invite; Co-authors (Authoring); Contributors (Contributing): mentor, linguistic editor, copy editor, potential reviewer, colleague/friend; Manuscript ready to submit. Manuscript types: Taxon treatment, Interactive key, Checklist, Data paper]