Semantics of Biodiversity Workshop
Scoping Document
May (16 optional) 17 - 18, 2012
University of Kansas Biodiversity Institute
Lawrence, Kansas

This document describes the use cases, needs, and environment for a workshop on Semantics of Biodiversity. Based on discussions at the recent RCN4GSC meeting at JGI in Walnut Creek (Jan. 2012), we anticipate an invitation list of 15 people, with attendance open to a maximum of 40. This proposal is for a 1½-day workshop by Barry Smith and a possible extra day to discuss related issues.

The goals of this workshop (still being refined):
1) Clarification of terms used in the biodiversity, genomics, and ecological communities, and
2) Steps to take in building a Biocollections Ontology.

1. Background/ Use cases

The following points were discussed during the May 2011 RCN4GSC Meeting at UC San Diego:
● Extending traditional biodiversity data by adding specimen sequence data to the data about the specimen.
● Extending traditional biodiversity data by adding metagenomic data taken from associated microbial communities (gut, surface, various cavities and orifices) to the description of the specimen.
● Extending traditional biodiversity data by adding metagenomic data taken from the surrounding environment (soil, water, air) to the description of the environment from which the specimen was collected (particularly important for plants and sessile animals).
● Extending metagenomic data by adding a full collections-oriented (e.g., Darwin Core) description of the host from which a commensal microbial metagenomics sample was collected. For example, instead of merely noting that a metagenomics sample was taken from the gut of a particular species of beetle, also collecting enough information about the individual beetle that it could be accessioned into a good entomological collection. (expand on this goal?)
● Extending environmental metagenomics data to include documentation about the ecosystem (both gross and micro-habitat) and about the geospatial environment from which the sample was collected.
● Extending genomic data by adding a full collections-oriented (e.g., Darwin Core) description of the individual from which the DNA was taken.
● Integrating occurrence and genomic information with field ecology data systems, including GIS, so that geospatial queries could be made that range across genomic, organismal, ecological, environmental, and temporal variables.

Additional use cases to consider (from TDWG):
● http://code.google.com/p/tdwg-rdf/wiki/CompetencyQuestions
● http://code.google.com/p/tdwg-rdf/wiki/UseCases

2. What do we have as a starting point

GSC
● MIxS Checklists
● MIBBI - Minimum Information for Biological and Biomedical Investigations (list of checklists, up to cross-talking between lists). Do we create Minimal Information for Collections and Archives (MICA)? Minimal Information for Scientific Collections (MISC)?

TDWG
● Darwin Core Terms (http://code.google.com/p/darwincore/source/browse/trunk/rdf/dwcterms.rdf)
● DSW (http://code.google.com/p/darwin-sw/)
● TDWG Ontology (http://wiki.tdwg.org/twiki/bin/view/TAG/TDWGOntology)

3. What are the Gaps

● Resolve the issue of Individual/Group/Population/Metapopulation/Community for both the DwC and GSC sides.
● Resolve the issue of Sample vs. Lot / Basis of Record. It is important to develop a clear list of available terms. Also, is there any clear definition of a “Sample”?
● Issues with the term Occurrence:
  ○ An occurrence can be an observation or a specimen. Does it equal a collection object?
  ○ Occurrence vs. a sample
  ○ In some circles, an occurrence refers to a process
● Sample (referring to a single individual) vs. Sample (referring to multiple individuals)
  ○ Can a sample be equated with an occurrence?
● Habitat vs. Environment
  ○ In molecular work, the environmental material (e.g. biofilm) perhaps has a different meaning than it does for larger organisms (e.g. a bird exists in the material “air”, but is this meaningful?). How do we negotiate this?
● What relationships do we express between Measurement/Fact and other Objects? Comment from Barry: “If you don’t get this clear from the start, you are doomed”
● What is a taxon?
● Issues of provenance: who asserts relationships?
● Is there any case for using the DwC class of terms under ResourceRelationship? What are community best practices for describing relationships? How do the ResourceRelationship terms fit with Relations in Biomedical Ontologies (http://genomebiology.com/2005/6/5/R46)?

4. Scope - Conceptual breadth

Abbreviations:
DwC = Darwin Core
PC = PhyloCode
DSW = Darwin Semantic Web
DC = Dublin Core
AudCore = Audubon Core
FP = Filtered Push
OpenAnn = Open Annotation Collaboration
ENV = ENVO
SONET = Scientific Observations Network (https://sonet.ecoinformatics.org/)

Elements in scope, with the vocabulary that currently covers each:
[DwC] Observations / Specimens (DwC, Ontology for Biomedical Investigations)
[DSW] Individuals
[DwC] Population or “Measure of Abundance”. Needs to be sorted out.
[DwC-part] Sample or Lot or BasisOfRecord (distinguish?). Samples can be derived from an environment, another sample, or an individual. This needs to be discussed further. Nikos: standardized naming for samples. An “Environmental Microbiology” 2010 paper discusses this (will send reference).
[ENV] Environments
[DwC] Measurement or Fact. Need to define the measurement type, the measurement, and the unit.
[DwC] Event, Temporal Component (when collected, and also when data is modified or updated)
[DwC] Geospatial Location
[SONET] Ecological Processes?
[FP, OpenAnn] Annotations of any other Element
[DwC] Collectors / agents / RecordedBy
[AudCore, DC] Digital Representations / Media (are these just observations?)
[PC] Phylo-References - Reference to a node in a tree
[ ? ] Taxonomic Concept - A hypothesis about a reference to a taxonomic name

5. Execution/ Timeframe for RDF/Ontology discussions

● February 2012: Oxford Hackathon
● March 2012: iDigBio Session: lots of overlap in names.
  Define the MISC (minimal information for Scientific Collections).
● May 16-18 2012: Semantics of Biodiversity Workshop
● 17-21 September 2012: GSC14 - RDF at Oxford
● October 2012: TDWG at Beijing

6. Core Members of the Semantics of Biodiversity working sessions

● John Deck - UCB/BiSciCol
● Reed Beaman - UFl/iDigBio/BiSciCol
● Norman Morrison - UManchester/RCN4GSC
● Andrea Thomer - UIUC/LIS/BiSciCol
● Dawn Field - Oxford/RCN4GSC
● John Wieczorek - UCB/VertNet/DwC
● Rob Guralnick - CU Boulder/VertNet/BiSciCol
● Inigo San Gil - UNM/LTER/GSC
● Stan Blum - CalAcad/DwC
● Bob Robbins - RCN4GSC
● Éamonn Ó Tuama - GBIF
● Steve Baskauf - Darwin-SW
● Kris Krishtalka
● Jim Beach
● Peter Dawyndt
● Renzo Kottmann
● David Vieglais

7. Stakeholders (institutions and people). Folks with possible interest

JPL/NASA - Semantic Web for Earth and Environmental Terminology (SWEET) - Rob Raskin
LTER - John Porter, Corinna Gries
Ocean Observatories Initiative - Karen Stocks
GSC - Dawn Field
NCEAS - Mark Schildhauer
OBI - Ontology for Biomedical Investigations
ISA - Philippe Rocca-Serra
La Jolla Institute for Allergy and Immunology - Bjorn Peters
TDWG (interest groups: RDF Group, Biodiversity Genomics Group)
OGC - Peter Fox
Darwin-SW: Cam Webb, Steve Baskauf, Joel Sachs
FP: Bob Morris, Paul Morris
Field Notebook Registry: Carolyn Sheffield
Freebase/Google: Jamie Taylor
BOLD: Sujeevan Ratnasingham
Dave Dubin: Illinois UIUC
NESCent: Hilmar Lapp, Todd Vision
UNC Chapel Hill Library Science: Jane Greenberg (on Dublin Core working group)
EOL - Cindy Parr
GBIF - Éamonn Ó Tuama, Dag Endresen, Markus Döring
USGS - Peter N. Schweitzer
OBOFoundry (http://obofoundry.org/), GO: Chris Mungall
BiSciCol
SONET - Shawn Bowers
Ramona Walls - New York Botanical Garden (Population and Community Ontology, http://www.nybg.org/science/scientist_profile.php?id_scientist=119)
Maria Alejandra Gandolfo Nixon - http://plantbio.cornell.edu/cals/plbio/directory/faculty.cfm?netId=mag4
Plant Ontology (http://www.plantontology.org/)
ENVO/GSC - Pier Luigi Buttigieg - pbuttigi@mpi-bremen.de