Scoping Document

Semantics of Biodiversity Workshop Scoping Document
May (16 optional) 17 - 18, 2012
University of Kansas Biodiversity Institute
Lawrence, Kansas
This document describes the scope (use cases, needs, and environment) for a workshop on
Semantics of Biodiversity. Based on discussions at the recent RCN4GSC meeting at JGI in
Walnut Creek (Jan. 2012), we anticipate an invitation list of 15 people, with attendance open to a
maximum of 40 people. This proposal is for a 1 ½ day workshop by Barry Smith, with a possible
extra day to discuss related issues.
The goals of this workshop (still being refined): 1) Clarification of terms used in the biodiversity,
genomics, and ecological communities, and 2) Steps to take in building a Biocollections
1. Background/ Use cases
The following points were discussed during the May 2011 RCN4GSC Meeting at UC San
● Extending traditional biodiversity data by adding specimen sequence data to the data
about the specimen.
● Extending traditional biodiversity data by adding metagenomic data taken from
associated microbial communities (gut, surface, various cavities and orifices) to the
description of the specimen.
● Extending traditional biodiversity data by adding metagenomic data taken from the
surrounding environment (soil, water, air) to the description of the environment from
which the specimen was collected (particularly important for plants and sessile animals).
● Extending metagenomic data by adding a full collections-oriented (e.g., Darwin Core)
description of the host from which a commensal microbial metagenomics sample was
collected. For example, instead of merely noting that a metagenomics sample was taken
from the gut of a particular species of beetle, also collecting enough information about
the individual beetle so that it could be accessioned into a good entomological collection.
● Extending environmental metagenomics data to include documentation about the
ecosystem (both gross and micro-habitat) and about the geospatial environment from
which the sample was collected.
● Extending genomic data by adding a full collections-oriented (e.g., Darwin Core)
description of the individual from which the DNA was taken.
● Integrating occurrence and genomic information with field ecology data systems,
including GIS, so that geospatial queries could be made that range across genomic,
organismal, ecological, environmental, and temporal variables.
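As a rough illustration of the last use case, the sketch below attaches metagenomic samples to a specimen record and runs one query across geospatial and genomic variables. It is a minimal sketch, not a proposed schema: the attribute names on Occurrence are borrowed from Darwin Core terms, while the Metagenome class, the env_material field name, the find_occurrences helper, and all identifiers and values are hypothetical and invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Metagenome:
    """A metagenomic sample attached to a specimen (hypothetical structure)."""
    sample_id: str      # hypothetical local identifier
    env_material: str   # assumed field name: material the sample was taken from


@dataclass
class Occurrence:
    """A specimen record using Darwin Core term names as attributes."""
    occurrenceID: str
    basisOfRecord: str
    scientificName: str
    decimalLatitude: float
    decimalLongitude: float
    eventDate: str      # ISO 8601 date string
    metagenomes: List[Metagenome] = field(default_factory=list)


# Bounding box as (south, west, north, east) in decimal degrees.
BBox = Tuple[float, float, float, float]


def find_occurrences(records: List[Occurrence], bbox: BBox, env_material: str) -> List[str]:
    """Return occurrenceIDs inside bbox that have a metagenome of the given material."""
    south, west, north, east = bbox
    hits = []
    for rec in records:
        in_box = (south <= rec.decimalLatitude <= north
                  and west <= rec.decimalLongitude <= east)
        has_sample = any(m.env_material == env_material for m in rec.metagenomes)
        if in_box and has_sample:
            hits.append(rec.occurrenceID)
    return hits


# Invented example records; none of these refer to real specimens.
records = [
    Occurrence("occ-1", "PreservedSpecimen", "Coleoptera", 38.95, -95.25,
               "2011-06-01", [Metagenome("mg-1", "gut")]),
    Occurrence("occ-2", "PreservedSpecimen", "Coleoptera", 38.96, -95.24,
               "2011-06-02"),
    Occurrence("occ-3", "PreservedSpecimen", "Coleoptera", 10.0, 20.0,
               "2011-06-03", [Metagenome("mg-2", "gut")]),
]
search_box = (38.0, -96.0, 40.0, -94.0)
print(find_occurrences(records, search_box, "gut"))  # prints ['occ-1']
```

A real system would replace the in-memory list with a spatial index or a GIS-backed store; the point here is only that the query needs the organismal and the genomic description to live on linked records.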
Additional Use cases to consider (from TDWG):
2. What do we have as a starting point?
● MIxS Checklists
● MIBBI - Minimum Information for Biological and Biomedical Investigations (List of checklists up to
cross-talking between lists) -- do we create Minimal Information for Collections and
Archives (MICA)? Minimal Information for Scientific Collections (MISC)?
● Darwin Core Terms
● DSW (
● TDWG Ontology (
3. What are the Gaps?
● Resolve the issue of Individual/Group/Population/Metapopulation/Community for both
the DwC and GSC sides.
● Resolve the issue of Sample vs. Lot / Basis of Record. It is important to develop a clear list of
available terms. Also, is there any clear definition of a “Sample”?
● Term Occurrence issues:
○ Can be an observation or a specimen -- does it equal a collection object?
○ Occurrence vs. a sample
○ In some circles, an occurrence refers to a process
● Sample (referring to a single individual) vs. Sample (referring to multiple individuals)
○ Can a sample be equated with an occurrence?
● Habitat vs. Environment
○ In molecular work, the environmental material (e.g. biofilm) perhaps has a different
meaning than it does for bigger things (e.g. a bird exists in the material “air”, but is this
meaningful?). How do we negotiate this?
● What relationships do we express between Measurement/Fact and other Objects?
Comment from Barry: “If you don’t get this clear from the start, you are doomed”
● What is a taxon?
● Issues of provenance: who asserts relationships?
● Is there any case for using the DwC class of terms under ResourceRelationship? What
are community best practices for describing relationships? How do the
ResourceRelationship terms fit with Relations in Biomedical Ontologies?
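To make the ResourceRelationship and provenance questions concrete, here is a minimal sketch of how relationship assertions could be recorded together with who asserted them. The four attribute names are Darwin Core ResourceRelationship terms; everything else is an assumption made for illustration: the "derived from" predicate is a hypothetical vocabulary value (agreeing on such values is exactly the open question above), and the trace_origin helper, the identifiers, and the asserting parties are invented.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ResourceRelationship:
    """One assertion, using Darwin Core ResourceRelationship term names."""
    resourceID: str               # subject of the relationship
    relationshipOfResource: str   # predicate (hypothetical vocabulary here)
    relatedResourceID: str        # object of the relationship
    relationshipAccordingTo: str  # who asserts it (provenance)


def trace_origin(relationships: List[ResourceRelationship], start: str,
                 predicate: str = "derived from") -> List[str]:
    """Follow `predicate` links from `start` and return the identifiers visited.

    Assumes each resource has at most one outgoing link with this predicate;
    if there are several, the last one listed wins.
    """
    by_subject = {r.resourceID: r for r in relationships
                  if r.relationshipOfResource == predicate}
    path = [start]
    seen = {start}
    current = start
    while current in by_subject:
        nxt = by_subject[current].relatedResourceID
        if nxt in seen:  # guard against cyclic assertions
            break
        path.append(nxt)
        seen.add(nxt)
        current = nxt
    return path


# Invented assertions: a sequence derived from a sample derived from a specimen.
assertions = [
    ResourceRelationship("seq-3", "derived from", "sample-7", "Sequencing lab (hypothetical)"),
    ResourceRelationship("sample-7", "derived from", "occ-1", "Collector (hypothetical)"),
]
print(trace_origin(assertions, "seq-3"))  # prints ['seq-3', 'sample-7', 'occ-1']
```

Keeping relationshipAccordingTo on every assertion means that two parties can disagree about a link without either record being overwritten, which is one possible way to approach the provenance issue.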
4. Scope - Conceptual breadth
DwC = Darwin Core
PC = PhyloCode
DSW = Darwin Semantic Web
DC = Dublin Core
AudCore = Audubon Core
FP = Filtered Push
OpenAnn = Open Annotation Collaboration
SONET = Scientific Observations Network (
[DwC] Observations / Specimens (DwC, Ontology of Biomedical Investigations)
[DSW] Individuals
[DwC] Population or “Measure of Abundance”. Needs to be sorted out.
[DwC-part] Sample or Lot or BasisOfRecord (distinguish?) -- samples can be derived from the
environment, another sample, or an individual. This needs to be discussed further. Nikos:
standardized naming for samples. “Environmental Microbiology”, 2010 paper discusses this (will
send reference).
[ENV] Environments
[DwC] Measurement or Fact. Need to define measurement type, the measurement, and the
[DwC] Event, Temporal Component (when collected, also when data is modified or updated)
[DwC] Geospatial Location
[SONET] Ecological Processes?
[FP, OpenAnn] Annotations of any other Element
[DwC] Collectors / agents / RecordedBy
[AudCore, DC] Digital Representations / Media (are these just observations?)
[PC] Phylo-References - Reference to a node in a tree
[ ? ] Taxonomic Concept - A hypothesis about a reference to a taxonomic name
5. Execution/ Timeframe for RDF/Ontology discussions
February 2012: Oxford Hackathon
March 2012: iDigBio Session: lots of overlap in names. Define the MISC (Minimal
Information for Scientific Collections)
May 16-18 2012: Semantics of Biodiversity Workshop
17-21 September 2012: GSC14 - RDF at Oxford
October 2012: TDWG at Beijing
6. Core Members of the Semantics of Biodiversity working sessions
John Deck - UCB/BiSciCol
Reed Beaman - UFl/iDigBio/BiSciCol
Norman Morrison - UManchester/RCN4GSC
Andrea Thomer - UIUC/LIS/BiSciCol
Dawn Field - Oxford/RCN4GSC
John Wieczorek - UCB/VertNet/DwC
Rob Guralnick - CU Boulder/VertNet/BiSciCol
Inigo San Gil - UNM/LTER/GSC
Stan Blum - CalAcad/DwC
Bob Robbins - RCN4GSC
Éamonn Ó Tuama - GBIF
Steve Baskauf - Darwin-sw
Kris Krishtalka
Jim Beach
Peter Dawyndt
Renzo Kottmann
David Vieglais
7. Stakeholders (institutions and people). Folks with possible interest
JPL/NASA - Semantic Web for Earth and Environmental Terminology (SWEET) - Rob Raskin
LTER - John Porter, Corinna Gries
Oceans Observatories Initiative - Karen Stocks
GSC - Dawn Field
NCEAS - Mark Schildhauer
OBI - Ontology of Biomedical Investigations
ISA - Philippe Rocca-Serra
La Jolla Institute for Allergy and Immunology - Bjorn Peters
TDWG (int. groups: RDF Group, Biodiversity Genomics Group)
OGC - Peter Fox
Darwin-SW: Cam Webb, Steve Baskauf, Joel Sachs
FP: Bob Morris, Paul Morris
Field Notebook Registry: Carolyn Sheffield
Freebase/Google: Jamie Taylor
BOLD: Sujeevan Ratnasingham
Dave Dubin: Illinois UIUC
NESCent: Hilmar Lapp, Todd Vision
UNC Chapel Hill Library Science: Jane Greenberg (on Dublin Core working group)
EOL - Cindy Parr
GBIF - Éamonn Ó Tuama, Dag Endresen, Markus Döring
USGS - Peter N. Schweitzer
OBOFoundry (, GO: Chris Mungall
BiSciCol SONET - Shawn Bowers
Ramona Walls - New York Botanical Garden (Population and Community Ontology,
Maria Alejandra Gandolfo Nixon
Plant Ontology (
ENVO/GSC - Pier Luigi Buttigieg