
BIS TDWG Conference
29 October 2014, Jönköping, Sweden
Publishing sample-based data using
Darwin Core Archives
Éamonn Ó Tuama, Markus Döring, Kyle Braak,
Tim Robertson, Olaf Bánki
Global Biodiversity Information Facility (GBIF)
Why do this?
• Long perceived need by GBIF to
enable publishing of abundance
(sample) data;
• Requirement of the EU project EU BON
(http://eubon.eu);
• Meeting the needs of the GEO
Biodiversity Observation Network
(GEO BON).
Sample-based data
• Output of monitoring programmes;
• Quantitative, calibrated;
• Using standard protocols;
• Repeatable, comparable.
Detect changes and trends
in populations
Constraints
• Be available for testing in 2015
• Build on existing widely used
standards: Darwin Core
• Work within the existing tools
ecosystem: IPT
• … while acknowledging the promise of
ontologies (BCO, OBOE …)
Caveat
Aim: demonstrate one way data can
be exposed to maximize
discoverability and reuse.
Not in scope: establishing how data
should be captured or modelled.
A use case
Enabling the flow of sample-based data
in support of GEO BON Essential
Biodiversity Variables (EBVs).
Essential Biodiversity Variables
• A measurement required for study, reporting and management of biodiversity change;
• Intermediate layer between raw data and indicators;
• GEO BON has identified six EBV classes.
EBV Class: Species populations
Building on the Darwin Core vocabulary
Darwin Core – a glossary of terms
higherClassification
coordinatePosition
specificEpithet
geodeticDatum
collectionCode
taxonConceptID
taxonRank
collectionCode: The name, acronym, coden, or initialism identifying the collection
or data set from which the record was derived. Examples: "Mammals",
"Hildebrandt", "eBird".
7 essential terms for encoding sample data
1. eventID
2. projectID (new)
3. samplingProtocol
4. sampleSize (new)
5. sampleSizeUnit (new)
6. quantity (new)
7. quantityType (new)
New terms required
eventID: an identifier for the set of information associated with an Event;
may be a globally unique identifier or an identifier specific to the data set.
projectID: an identifier for a project with which the data is associated;
use to link related data sets, e.g., a monitoring series; may be a globally
unique identifier or an identifier specific to the series.
New terms required
sampleSize: a numeric
value for the time
duration, length, area or
volume involved in the
sampling.
sampleSizeUnit: the unit
of measurement used for
sampling, e.g., minute,
hour, day, metre, metre^2,
metre^3.
Examples (sampleSize, sampleSizeUnit):
2     hour
3     m2
17    km
1     litre
Unit of measurement vocabulary
Used in the IPT as a controlled list for sampleSizeUnit
http://rs.gbif.org/sandbox/vocabulary/gbif/unit_of_measurement.xml
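As a rough illustration of how a controlled list constrains sampleSizeUnit, the following minimal sketch checks a sampleSize / sampleSizeUnit pair before publishing. The `ALLOWED_UNITS` set is an assumed subset built from the units mentioned in these slides, not the contents of the GBIF vocabulary file, and `validate_sample_size` is a hypothetical helper, not part of the IPT.

```python
# Assumption: an illustrative subset of units taken from the slides.
# The authoritative list is the GBIF unit_of_measurement vocabulary.
ALLOWED_UNITS = {"minute", "hour", "day", "metre", "m2", "m3", "km", "litre"}


def validate_sample_size(sample_size, sample_size_unit):
    """Return a list of problems found; an empty list means the pair looks valid."""
    problems = []
    try:
        value = float(sample_size)
    except (TypeError, ValueError):
        problems.append("sampleSize is not numeric: %r" % (sample_size,))
    else:
        # 'not value > 0' also rejects NaN, which compares false to everything.
        if not value > 0:
            problems.append("sampleSize must be positive: %r" % (sample_size,))
    if sample_size_unit not in ALLOWED_UNITS:
        problems.append("sampleSizeUnit not in controlled list: %r" % (sample_size_unit,))
    return problems


if __name__ == "__main__":
    print(validate_sample_size("1.25", "m2"))
    print(validate_sample_size("abc", "furlong"))
```

Returning a list of problems rather than raising on the first one lets a publisher report every issue in a row at once.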
New terms required
quantity: the number or
enumeration value of the
entity or category being
quantified in the sample.
As such it is paired with
quantityType.
quantityType: the entity being referred to by quantity, e.g., individuals,
a percentage (e.g., species, biomass, biovolume), a scale type.
Examples (quantity, quantityType):
14     Individuals
r      BraunBlanquetScale
0.4    %Species
31     %Biomass
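Because quantity only has meaning alongside quantityType, a consumer has to read the two together. The sketch below shows one possible way to do that for the example pairs on this slide. The rules (counts as integers, "%"-prefixed types as percentages, anything else kept as a scale code) are assumptions made for illustration, and `interpret_quantity` is a hypothetical helper.

```python
def interpret_quantity(quantity, quantity_type):
    """Interpret a quantity string according to its quantityType.

    Assumed rules, for illustration only:
    - 'individuals' -> whole-number count
    - types starting with '%' -> percentage between 0 and 100
    - anything else -> treated as a scale code and returned unchanged
    """
    qtype = quantity_type.strip()
    if qtype.lower() == "individuals":
        return int(quantity)
    if qtype.startswith("%"):
        value = float(quantity)
        if not 0 <= value <= 100:
            raise ValueError("percentage out of range: %r" % (quantity,))
        return value
    # e.g. 'r' on a Braun-Blanquet cover scale: not a number, keep the code
    return quantity


if __name__ == "__main__":
    for q, t in [("14", "Individuals"), ("r", "BraunBlanquetScale"),
                 ("0.4", "%Species"), ("31", "%Biomass")]:
        print(t, "->", repr(interpret_quantity(q, t)))
```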
Publishing sample data using the IPT
http://www.gbif.org/ipt
Event Core
• An event core is the logical way of
organising a sampling event;
• Related environmental measurements
can be included in an extension;
• Vegetation plot data (coverages) can
be included separately from
“occurrences”.
Darwin Core Archive components
DwC Archive = Event core + extensions (Occurrence ext, Relevé ext, Measurement-or-fact ext, …) + meta.xml + EML.xml
http://rs.tdwg.org/dwc/terms/guides/text/index.htm
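To make the role of meta.xml concrete, here is a minimal sketch that generates a descriptor for an Event core with one Occurrence extension, following the general layout of the Darwin Core text guide (a core with an `id` column, an extension with a `coreid` column). The file names `event.txt` and `occurrence.txt` are assumptions, and only long-established Darwin Core terms are mapped; the proposed new terms are left out because their URIs are not given here.

```python
import xml.etree.ElementTree as ET

DWC_TEXT_NS = "http://rs.tdwg.org/dwc/text/"
DWC_TERMS = "http://rs.tdwg.org/dwc/terms/"


def _data_file(parent, tag, row_type, filename, id_tag, terms):
    """Add a <core> or <extension> element describing one tab-delimited file."""
    node = ET.SubElement(parent, tag, {
        "rowType": DWC_TERMS + row_type,
        "encoding": "UTF-8",
        "fieldsTerminatedBy": "\\t",   # literal backslash-t, as written in meta.xml
        "linesTerminatedBy": "\\n",
        "ignoreHeaderLines": "1",
    })
    files = ET.SubElement(node, "files")
    ET.SubElement(files, "location").text = filename
    # Column 0 holds the eventID that links extension rows to core rows.
    ET.SubElement(node, id_tag, {"index": "0"})
    for index, term in enumerate(terms, start=1):
        ET.SubElement(node, "field", {"index": str(index), "term": DWC_TERMS + term})
    return node


def build_meta_xml():
    """Return a meta.xml string; file names are hypothetical."""
    archive = ET.Element("archive", {"xmlns": DWC_TEXT_NS})
    _data_file(archive, "core", "Event", "event.txt", "id",
               ["samplingProtocol", "eventDate"])
    _data_file(archive, "extension", "Occurrence", "occurrence.txt", "coreid",
               ["scientificName"])
    return ET.tostring(archive, encoding="unicode")


if __name__ == "__main__":
    print(build_meta_xml())
```

In practice the IPT writes this descriptor itself; the sketch only shows what links the core file to its extensions.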
Placing the terms in a Darwin Core Archive
Event Core (Event, Location, Geological Context):
eventID, projectID (n), samplingProtocol, sampleSize (n), sampleSizeUnit (n)
Occurrence Extension (Occurrence, Taxon, Identification):
eventID, quantity (n), quantityType (n)
For term definitions, see
http://links.gbif.org/ipt-sample-data-primer
(n) = proposed new term
A sampling event uses a particular samplingProtocol with sampleSize and
sampleSizeUnit, etc., and can record one or more taxa, each of which has a
measurement (quantity and quantityType) associated with it.
Occurrence extension
eventID   scientificName      quantity   quantityType   …
C_1428    Baetis rhodani      14         individuals    …
C_1428    Ephemera danica     15         individuals    …
C_1428    Gyraulus albus      2          individuals    …
C_1538    Serratella ignita   318        individuals    …
http://rs.gbif.org/sandbox/extension/event_occurrence.xml
Event core
eventID   projectID   samplingProtocol   sampleSize   sampleSizeUnit   eventDate    location                 decimalLatitude   decimalLongitude   …
C_1428    RM065       AQEM               1.25         m2               1963-03-01   Kinzig O3 Rothenbergen   48.1333           11.5667            …
C_1538    RM065       AQEM               1.25         m2               1975-01-21   Kinzig W1 Bulau          -34.6033          -58.3817           …
http://rs.gbif.org/sandbox/core/dwc_event.xml
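One thing a consumer can do with the event core and occurrence extension is join them on eventID and normalise each quantity by the sampling effort. The sketch below does this for the example rows from the two tables in this talk. The `densities` function and its output format are hypothetical, and dividing quantity by sampleSize is an assumed use of the data, not something the proposal prescribes.

```python
# Rows copied from the example tables; only the columns needed here.
EVENTS = [
    {"eventID": "C_1428", "sampleSize": "1.25", "sampleSizeUnit": "m2"},
    {"eventID": "C_1538", "sampleSize": "1.25", "sampleSizeUnit": "m2"},
]
OCCURRENCES = [
    {"eventID": "C_1428", "scientificName": "Baetis rhodani",
     "quantity": "14", "quantityType": "individuals"},
    {"eventID": "C_1428", "scientificName": "Ephemera danica",
     "quantity": "15", "quantityType": "individuals"},
    {"eventID": "C_1428", "scientificName": "Gyraulus albus",
     "quantity": "2", "quantityType": "individuals"},
    {"eventID": "C_1538", "scientificName": "Serratella ignita",
     "quantity": "318", "quantityType": "individuals"},
]


def densities(events, occurrences):
    """Join occurrences to events on eventID and divide quantity by sampleSize."""
    events_by_id = {event["eventID"]: event for event in events}
    results = []
    for occ in occurrences:
        event = events_by_id.get(occ["eventID"])
        if event is None:
            continue  # occurrence without a matching event row: skip it
        value = float(occ["quantity"]) / float(event["sampleSize"])
        unit = "%s per %s" % (occ["quantityType"], event["sampleSizeUnit"])
        results.append((occ["eventID"], occ["scientificName"], value, unit))
    return results


if __name__ == "__main__":
    for event_id, name, value, unit in densities(EVENTS, OCCURRENCES):
        print(event_id, name, round(value, 2), unit)
```

Because both events share the same protocol and sample size, the resulting densities are directly comparable across the two dates, which is the point of publishing the sampling effort alongside the counts.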
Adapting the IPT
Now with
Event Core
Acknowledgement
EU BON and GEO BON partners, TDWG mailing
list contributors and GBIF sample data
workshop participants informed this work and
are gratefully acknowledged.
This project has received funding from the European
Union’s Seventh Programme for research, technological
development and demonstration under grant agreement
No 308454.
Thank you
GBIF Secretariat
Universitetsparken 15
DK-2100 Copenhagen Ø
Denmark
www.gbif.org
E-mail: info@gbif.org
Phone: +45 3532 1470
Fax: +45 3532 1480