GLOBAL BIODIVERSITY INFORMATION FACILITY

DarwinCore Germplasm Extension and deployment in the GBIF infrastructure
TDWG 2009, Montpellier, November 12, 2009
Dag Endresen (NordGen) & Samy Gaiji (GBIF)
WWW.GBIF.ORG

Topics for this session
• Darwin Core (2009)
• DwC Germplasm Extension (DRAFT 0.1)
• Germplasm Extension Terms
• Mapping to the Multi-Crop Passport Descriptors
• Integrated Publishing Toolkit (IPT)
• IPT Germplasm Extension

Darwin Core (2009)
• The Darwin Core should be viewed as an extension of the Dublin Core for biodiversity information.
• The purpose of these terms is to facilitate data sharing through:
  • a well-defined standard core vocabulary
  • a flexible framework to maximize re-usability
• The Darwin Core can be extended by adding new terms to share additional information.
http://rs.tdwg.org/dwc/

DwC star schema
• Star schema model
• Can relate elements one-to-many

DwC Germplasm Extension (DRAFT 0.1), August 26, 2009
• The DarwinCore Germplasm Extension provides additional terms to describe germplasm samples maintained by genebanks worldwide.
http://rs.nordgen.org/dwc/
http://www.nordgen.org/epgris3/wiki/index.php/DwC_Germplasm

DwC Germplasm Extension (DRAFT 0.1)
• Modelled starting from the Multi-Crop Passport standard (MCPD, 2001).
• Includes the new terms for crop trait experiments developed as part of the European EPGRIS3 project.
• Includes a few additional terms for new international crop treaty regulations.

DwC Germplasm (1) to (6)

GermplasmDistribution
• Perhaps add new terms to facilitate the reporting of germplasm distribution for the ITPGRFA (International Treaty on Plant Genetic Resources for Food and Agriculture).

GermplasmManagement
The Millennium Seed Bank (Kew) has contributed feedback to the DwC-G modeling and proposed to include a number of seed management descriptors:
• Seed processing terms
• Seed cleaning
• Seed germination testing

Mapping of DwC-G terms to the MCPD descriptors
Mapping of DwC-G terms to the MCPD descriptors (continued)

MCPD -> ABCD 2.06 (2004)
• National Inventory Code
• Institute Code
• Accession Number
• Collecting Number
• Collecting Institute Code
• Genus
• Species
• Species Authority
• „Subtaxa“
• „Subtaxa“ Authority
• Common Crop Name
• Accession Name
• Acquisition Date
• Country of Origin
• Location of Collection Site
• Latitude of CS
• Longitude of CS
• Elevation of CS
• Collecting Date of Sample
• Breeding Institute Code
• Biological Status of Accession
• Ancestral Data
• Collecting/Acquisition Source
• Donor Institute Code
• Donor Accession Number
• Other Identification (Number) associated with the accession
• Location of Safety Duplicates
• Type of Germplasm Storage
• Remarks
• Decoded Collecting Institute
• Decoded Breeding Institute
• Decoded Donor Institute
• Decoded Safety Duplication Location
• Accession URL
Helmut Knüpffer, IPK Gatersleben
Walter Berendsohn, BGBM
Descriptors marked red did not match the earlier versions of ABCD.
ABCD was extended by a PGR section [W. Berendsohn, H. Knüpffer].
http://www.ecpgr.cgiar.org/epgris/Tech_papers/EURISCO_Descriptors.pdf

GBIF Informatics Suite
• GBIF Decentralization Strategy (WP 2009-2010)
• Customized biodiversity data networks
• Tools to empower decentralized thematic or regional networks
• IPT project site: http://code.google.com/p/gbif-providertoolkit/
• IPT demo: http://ipt.gbif.org/
• IPT Lite demo: http://ipt-lite.gbif.org/index.html
• IPT mailing list: http://lists.gbif.org/mailman/listinfo/ipt/
• GBIF HIT: http://code.google.com/p/gbif-indexingtoolkit/
• GBRDS: http://code.google.com/p/gbif-registry/

• The GBIF IPT is an open source, Java (TM) based web application that connects and serves primary biodiversity data.
• The data registered in the IPT is connected to the GBIF distributed network and made available for public access.
• Designed to decentralize and speed up the process of indexing (large) biodiversity occurrence datasets.
• IPT also provides data publishers with a local tool for data quality assessment.

GBIF Integrated Publishing Toolkit (IPT)
• Java 1.5 or higher is required
• Apache Tomcat is recommended (1 GB RAM+)
• GBIF IPT is provided as a WAR archive (for easy deployment)
• GeoServer is included for web mapping (OGC compliant, WFS, WMS, etc.)
• H2 embedded Java database (with JDBC interface and web console)
• Hibernate (object-relational mapping)
http://ipt.nordgen.org/ipt/

IPT Interfaces
• REST XML
• TAPIR
• DwC Archive
• OGC (WFS, WMS, Web Mapping)
• EML (Ecological Metadata Language)

Darwin Core Archive (DwC-A)
• DwC-A publishes DwC records, including extensions
• Simple text-based format
• Zipped single-file archive
• Germplasm.txt
http://code.google.com/p/gbif-ecat/wiki/DwCArchive

The GBIF IPT service has a graphical interface to the datasets, including a map, pie charts, and the right-side context menu (taxonomy and geography).

The IPT user interface includes the extensions
XML interface includes the extensions

GBIF IPT implements the Darwin Core Standard and provides an interface to easily build extensions to the core Darwin Core terms. The draft germplasm extension is one example of how to extend the Darwin Core terms for the GBIF IPT.

Using GBIF technology (and contributing to its development), the PGR community can easily establish specific PGR networks without duplicating GBIF's work.

The compatibility of data standards between PGR and biodiversity collections made it possible to integrate the worldwide germplasm collections into the biodiversity community (GBIF, TDWG).

GBIF PGR Network 2
http://data.gbif.org/datasets/network/2

Special thanks to:
• GBIF, Global Biodiversity Information Facility
http://www.gbif.org
• TDWG, Biodiversity Information Standards
http://www.tdwg.org
• BioCASE, The Biological Collection Access Service for Europe
http://www.biocase.org
• Bioversity International
http://www.bioversityinternational.org

Things can happen in a band, or any type of collaboration, that would not otherwise happen. (Jim Coleman, Musician)
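
Supplementary sketch: the star schema behind a Darwin Core Archive (one core record related one-to-many to rows in an extension file such as Germplasm.txt) can be illustrated with a short Python example. This is a minimal sketch under stated assumptions, not the IPT implementation: the file name `occurrence.txt`, the `id`/`coreid` column names, the extension columns `biologicalStatus` and `germplasmStorage`, and all data values are hypothetical placeholders rather than actual DwC-G terms. It also assumes tab-delimited files with a header row, and skips the `meta.xml` descriptor that a real archive uses to map columns to term URIs.

```python
import csv
import io
import zipfile
from collections import defaultdict


def build_example_archive() -> bytes:
    """Create a tiny zipped archive in memory (illustrative data only)."""
    core = (
        "id\tinstitutionCode\tcatalogNumber\tscientificName\n"
        "1\tEXAMPLE\tACC-001\tHordeum vulgare\n"
        "2\tEXAMPLE\tACC-002\tPisum sativum\n"
    )
    # Hypothetical extension columns; not the actual DwC-G term names.
    extension = (
        "coreid\tbiologicalStatus\tgermplasmStorage\n"
        "1\tlandrace\tseed collection\n"
        "1\tlandrace\tsafety duplicate\n"
        "2\tcultivar\tseed collection\n"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("occurrence.txt", core)
        archive.writestr("Germplasm.txt", extension)
    return buffer.getvalue()


def read_rows(archive: zipfile.ZipFile, name: str) -> list[dict[str, str]]:
    """Read one tab-delimited text file from the archive as a list of dicts."""
    with archive.open(name) as raw:
        text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
        return list(csv.DictReader(text, delimiter="\t"))


def join_star(data: bytes) -> dict[str, dict]:
    """Attach extension rows to their core record (one-to-many)."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        core_rows = read_rows(archive, "occurrence.txt")
        ext_rows = read_rows(archive, "Germplasm.txt")
    by_core = defaultdict(list)
    for row in ext_rows:
        by_core[row["coreid"]].append(row)
    return {
        row["id"]: {"core": row, "germplasm": by_core.get(row["id"], [])}
        for row in core_rows
    }


records = join_star(build_example_archive())
for core_id, record in records.items():
    print(core_id, record["core"]["scientificName"], len(record["germplasm"]))
```

Keeping the extension rows in a separate file keyed by the core identifier is what lets one accession carry several germplasm rows without repeating the core fields.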