BIS TDWG Conference
29 October 2014, Jönköping, Sweden

Publishing sample-based data using Darwin Core Archives
Éamonn Ó Tuama, Markus Döring, Kyle Braak, Tim Robertson, Olaf Bánki
Global Biodiversity Information Facility (GBIF)

Why do this?
• Long-perceived need by GBIF to enable publishing of abundance (sample) data;
• Requirement of the EU project EU BON (http://eubon.eu);
• Meeting the needs of the GEO Biodiversity Observation Network (GEO BON).

Sample-based data
• Output of monitoring programmes;
• Quantitative, calibrated;
• Using standard protocols;
• Repeatable, comparable.
Detect changes and trends in populations

Constraints
• Be available for testing in 2015
• Build on existing widely used standards: Darwin Core
• Work within the existing tools ecosystem: IPT
• … while acknowledging the promise of ontologies (BCO, OBOE …)

Caveat
Aim: demonstrate one way data can be exposed to maximize discoverability and reuse.
Not in scope: establishing how data should be captured or modelled.

A use case
Enabling the flow of sample-based data in support of GEO BON Essential Biodiversity Variables (EBVs).

Essential Biodiversity Variables
• An intermediate layer between raw data and indicators
• GEO BON has identified six EBV classes
• An EBV is a measurement required for study, reporting and management of biodiversity change
EBV Class: Species populations

Building on the Darwin Core vocabulary
Darwin Core – a glossary of terms
Example terms: higherClassification, coordinatePosition, specificEpithet, geodeticDatum, collectionCode, taxonConceptID, taxonRank
collectionCode: The name, acronym, coden, or initialism identifying the collection or data set from which the record was derived. Examples: "Mammals", "Hildebrandt", "eBird".

7 essential terms for encoding sample data
1. eventID
2. projectID (new)
3. samplingProtocol
4. sampleSize (new)
5. sampleSizeUnit (new)
6. quantity (new)
7. quantityType (new)

New terms required
eventID: an identifier for the set of information associated with an Event; may be a global unique identifier or an identifier specific to the data set.
projectID: an identifier for a project with which the data is associated; use to link related data sets, e.g., a monitoring series; may be a global unique identifier or an identifier specific to the series.

New terms required
sampleSize: a numeric value for the time duration, length, area or volume involved in the sampling.
sampleSizeUnit: the unit of measurement used for sampling, e.g., minute, hour, day, metre, metre^2, metre^3.
Examples (sampleSize sampleSizeUnit):
• 2 hour
• 3 m2
• 17 km
• 1 litre

Unit of measurement vocabulary
Used in IPT as controlled list for sampleSizeUnit
http://rs.gbif.org/sandbox/vocabulary/gbif/unit_of_measurement.xml

New terms required
quantity: the number or enumeration value of the entity or category being quantified in the sample. As such it is paired with quantityType.
quantityType: the entity being referred to by quantity, e.g., individuals, a percentage (e.g., species, biomass, biovolume), a scale type.
Examples (quantity quantityType):
• 14 Individuals
• r BraunBlanquetScale
• 0.4 %Species
• 31 %Biomass

Publishing sample data using the IPT
http://www.gbif.org/ipt

Event Core
• An event core is the logical way of organising a sampling event;
• Related environmental measurements can be included in an extension;
• Vegetation plot data (coverages) can be included separately from “occurrences”.
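
As a rough illustration of how the proposed terms fit together, the sketch below pairs an event record (sampleSize, sampleSizeUnit) with occurrence records (quantity, quantityType) through a shared eventID and derives a simple density. The class names, the `density` helper, and all values are hypothetical and are not part of Darwin Core or the IPT; the rule of dividing only counts of individuals by sampleSize is likewise an assumption made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    """One row of a hypothetical event core."""
    event_id: str           # eventID
    sampling_protocol: str  # samplingProtocol
    sample_size: float      # sampleSize (proposed term)
    sample_size_unit: str   # sampleSizeUnit (proposed term)


@dataclass
class Occurrence:
    """One row of a hypothetical occurrence extension."""
    event_id: str         # eventID, links the record to its event
    scientific_name: str
    quantity: str         # kept as text: a number or a scale code such as "r"
    quantity_type: str    # quantityType (proposed term)


def density(event: Event, occ: Occurrence) -> Optional[float]:
    """Return quantity per unit of sample size for counts of individuals.

    Returns None for other quantity types (scales, percentages), which
    cannot be meaningfully divided by the sample size in this sketch.
    """
    if occ.event_id != event.event_id:
        raise ValueError("occurrence does not belong to this event")
    if occ.quantity_type.lower() != "individuals":
        return None
    return float(occ.quantity) / event.sample_size


# Illustrative values only (not taken from a real data set).
event = Event("EV_1", "example protocol", 3.0, "m2")
counted = Occurrence("EV_1", "Taxon A", "14", "Individuals")
scaled = Occurrence("EV_1", "Taxon B", "r", "BraunBlanquetScale")

print(density(event, counted))  # individuals per m2
print(density(event, scaled))   # no density for a scale value
```

Keeping quantity as text in this sketch reflects the definition above, which allows either a number or an enumeration value; any numeric interpretation is deferred until quantityType is known.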

Darwin Core Archive components
[Diagram: a DwC Archive made up of an Event core, its extensions (Occurrence ext, Relevé ext, Measurement-or-fact ext, …), meta.xml and EML.xml]
http://rs.tdwg.org/dwc/terms/guides/text/index.htm

Placing the terms in a Darwin Core Archive
Event Core (Event, Location, Geological Context): eventID, projectID (n), samplingProtocol, sampleSize (n), sampleSizeUnit (n)
Occurrence Extension (Occurrence, Taxon, Identification): eventID, quantity (n), quantityType (n)
(n) = proposed new term
For term definitions, see http://links.gbif.org/ipt-sample-data-primer
A sampling event uses a particular samplingProtocol with sampleSize and sampleSizeUnit, etc., and can record one or more taxa, each of which has a measurement (quantity and quantityType) associated with it.

Occurrence extension
eventID  scientificName      quantity  quantityType  …
C_1428   Baetis rhodani      14        individuals   …
C_1428   Ephemera danica     15        individuals   …
C_1428   Gyraulus albus      2         individuals   …
C_1538   Serratella ignita   318       individuals   …
http://rs.gbif.org/sandbox/extension/event_occurrence.xml

Event core
eventID  projectID  samplingProtocol  sampleSize  sampleSizeUnit  eventDate   location                decimalLatitude  decimalLongitude  …
C_1428   RM065      AQEM              1.25        m2              1963-03-01  Kinzig O3 Rothenbergen  48.1333          11.5667           …
C_1538   RM065      AQEM              1.25        m2              1975-01-21  Kinzig W1 Bulau         -34.6033         -58.3817          …
http://rs.gbif.org/sandbox/core/dwc_event.xml

Adapting the IPT
Now with Event Core

Acknowledgement
EU BON and GEO BON partners, TDWG mailing list contributors and GBIF sample data workshop participants informed this work and are gratefully acknowledged.
This project has received funding from the European Union’s Seventh Programme for research, technological development and demonstration under grant agreement No 308454.

Thank you
GBIF Secretariat
Universitetsparken 15
DK-2100 Copenhagen Ø
Denmark
www.gbif.org
E-mail: info@gbif.org
Phone: +45 3532 1470
Fax: +45 3532 1480