GLOBAL BIODIVERSITY INFORMATION FACILITY

Developing taxonomic names services to enhance findability

David Remsen

ECAT Programme Officer

October 22, 2007

WWW.GBIF.ORG

Presentation outline

Overview of GBIF and data portal

Informatics challenges relating to taxon data

What we are doing about it

Wider implications of our efforts

GBIF mission

…to make the world’s biodiversity data freely and universally available via the Internet

What is biodiversity?

GBIF follows the Convention on Biological Diversity (CBD) in broadly recognizing three levels of biological diversity:

• Molecules / genes

• Species

• Ecosystems / ecology

New GBIF data portal

http://data.gbif.org/

GBIF Data Types

Core data types on the GBIF network

Taxon names

Taxon occurrence information

 specimen records from natural history collections

 observational records

Fields used in indexing records

Mandatory

Scientific name

Institution code

Collection code

Catalogue number

Highly desirable

Geospatial location

Collection date

Higher taxon info

Date last modified
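The field list above can be applied as a simple record check. A minimal sketch, assuming records arrive as Python dicts: the field names (e.g. `institution_code`) are hypothetical stand-ins rather than Darwin Core term names, "geospatial location" is assumed to mean a latitude/longitude pair, and using the institution/collection/catalogue-number triple as the record key is an illustrative assumption.

```python
# Hypothetical field names standing in for the indexing fields listed above.
MANDATORY = ("scientific_name", "institution_code", "collection_code", "catalogue_number")
# "Geospatial location" is assumed here to be a latitude/longitude pair.
DESIRABLE = ("latitude", "longitude", "collection_date", "higher_taxon", "date_last_modified")


def _blank(value) -> bool:
    """True if a field is missing, None, or only whitespace."""
    return value is None or not str(value).strip()


def missing_mandatory(record: dict) -> list:
    """Return the mandatory fields that are absent or blank, in list order."""
    return [f for f in MANDATORY if _blank(record.get(f))]


def index_key(record: dict) -> tuple:
    """Build an (institution, collection, catalogue number) key for a record."""
    missing = missing_mandatory(record)
    if missing:
        raise ValueError(f"record lacks mandatory fields: {missing}")
    return tuple(str(record[f]).strip() for f in MANDATORY[1:])


def completeness(record: dict) -> float:
    """Fraction of the highly desirable fields that are filled in."""
    filled = sum(1 for f in DESIRABLE if not _blank(record.get(f)))
    return filled / len(DESIRABLE)


record = {
    "scientific_name": "Papilio machaon",
    "institution_code": "MUSEUM-X",   # hypothetical codes
    "collection_code": "LEP",
    "catalogue_number": 1234,
    "latitude": 55.7,
}
print(index_key(record))      # ('MUSEUM-X', 'LEP', '1234')
print(completeness(record))   # 0.2
```

A record missing any mandatory field is rejected outright, while the desirable fields only lower a completeness score, mirroring the two tiers on the slide.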

GBIF network today

Users

List species recorded in Costa Rica

Find all occurrences of Papilio machaon

Find type specimen for Coffea odorata

Find occurrences of Primates from Madagascar

Find occurrences from Antananarivo Province

[Figure: network diagram linking users to the Portal, Registry, Index and Mirror sites, which draw on provider Databases through the DiGIR and TAPIR protocols]

http://data.gbif.org/

Species: Achillea millefolium

Kingdom: Animalia

Country: Madagascar

Dataset: Continuous Plankton Recorder Database

Maps - occurrence density

http://data.gbif.org/
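The slide does not say how occurrence density was computed. A minimal sketch, assuming georeferenced records are counted per one-degree latitude/longitude cell; the cell size, the row/column indexing scheme and the sample coordinates (roughly Antananarivo and Copenhagen) are illustrative assumptions.

```python
import math
from collections import Counter


def cell_id(lat: float, lon: float, size: float = 1.0) -> tuple:
    """Row/column index of the size-degree grid cell containing (lat, lon).

    Row 0 starts at -90 latitude and column 0 at -180 longitude; points on
    the north pole or the antimeridian are clamped into the last cell.
    """
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError(f"coordinates out of range: {lat}, {lon}")
    rows, cols = round(180 / size), round(360 / size)
    row = min(math.floor((lat + 90.0) / size), rows - 1)
    col = min(math.floor((lon + 180.0) / size), cols - 1)
    return row, col


def density(points, size: float = 1.0) -> Counter:
    """Count occurrences per grid cell; cells with no records are absent."""
    return Counter(cell_id(lat, lon, size) for lat, lon in points)


points = [(-18.9, 47.5), (-18.1, 47.9), (55.7, 12.6)]
print(density(points))  # Counter({(71, 227): 2, (145, 192): 1})
```

Counting into sparse cells keeps the result small even for millions of records, since only cells that hold data are stored.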

Actions

Occurrence download

Names and type specimens

Images

GBIF Data Portal Web Services

Taxon data: http://data.gbif.org/ws/rest/taxon

Occurrence record data: http://data.gbif.org/ws/rest/occurrence

Occurrence density data: http://data.gbif.org/ws/rest/density

Dataset metadata: http://data.gbif.org/ws/rest/resource

Data provider metadata: http://data.gbif.org/ws/rest/provider

Data network metadata: http://data.gbif.org/ws/rest/network

Web services

http://data.gbif.org/ws/rest/occurrence/list/?taxonConceptKey=14724348&format=darwin
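The example request can be assembled programmatically. A minimal sketch that only builds and parses URLs and makes no network call, since the 2007-era endpoint may no longer respond; the `occurrence/list` action and its two parameters come from the slide, while `service_url` is a hypothetical helper and the actions of the other services are not documented here.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Base path and service names as listed on the Web Services slide.
BASE = "http://data.gbif.org/ws/rest"
SERVICES = {"taxon", "occurrence", "density", "resource", "provider", "network"}


def service_url(service: str, action: str, **params) -> str:
    """Build a request URL; only occurrence/list is confirmed by the slide."""
    if service not in SERVICES:
        raise ValueError(f"unknown service: {service}")
    query = urlencode(params)  # keeps keyword order and escapes values
    return f"{BASE}/{service}/{action}/?{query}"


url = service_url("occurrence", "list", taxonConceptKey=14724348, format="darwin")
print(url)
# http://data.gbif.org/ws/rest/occurrence/list/?taxonConceptKey=14724348&format=darwin

# Reading the parameters back out of a URL:
print(parse_qs(urlsplit(url).query))
# {'taxonConceptKey': ['14724348'], 'format': ['darwin']}
```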

Embedding in other sites

 iSpecies

Portal Summary

Hundreds of Institutional providers

Thousands of Resources

Millions of Records

Collective & Integrated Access

Wide Taxonomic, Temporal and Geographic Scope

Free and Open Access to all

Go forth and integrate!

Parallel: GenBank

Informatics challenges relating to taxon data

The Makings of a problem

Everything I Just Said

Meets

The names problem in biology

All accumulated information of a species is tied to a scientific name, a name that serves as a link between what has been learned in the past and what we today add to the body of knowledge.

- Grimaldi & Engel, 2005, Evolution of the Insects

Nature of the problem?

Access to data is via limited points of entry

Biology has a “names problem.”

This names problem impacts these data access entry points.

Exacerbated by:

Wide taxonomic and temporal scope

Federated origins of data

Limited points of entry: search

Limited points of entry: browse

Breakdown of the names problem

Synonymy

A single concept may reference multiple names

Equivalent

Inclusive

Homography (Homonymy)

A single name may refer to multiple concepts

Definition

A single name may refer to multiple KINDS of concepts

“Name” refers to:

A lexical “concept”

A set of character strings

A nomenclatural concept

A Code-regulated fact

A taxonomic concept

A Hypothesis or opinion

Synonyms

Homonyms

All of these are important to distinguish
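One way to keep the three kinds of "name" apart is to give each its own type. A minimal sketch; the class and field names are hypothetical, authorship is left empty rather than guessed, and the two Gorilla gorilla concepts are the ones cited on the Taxon Concept slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NameString:
    """Lexical concept: a character string exactly as a source wrote it."""
    text: str


@dataclass(frozen=True)
class NomenclaturalName:
    """Nomenclatural concept: a Code-regulated name plus its authorship."""
    canonical: str
    authorship: Optional[str] = None


@dataclass(frozen=True)
class TaxonConcept:
    """Taxonomic concept: a name as used according to one source's opinion."""
    name: NomenclaturalName
    according_to: str


# The same string, read at three levels.
raw = NameString("Gorilla gorilla")
name = NomenclaturalName(canonical=raw.text)
wr = TaxonConcept(name, according_to="Wilson and Reeder 1992")
gr = TaxonConcept(name, according_to="Groves 2003")

print(wr.name == gr.name)  # True: one nomenclatural name
print(wr == gr)            # False: two taxonomic concepts
```

Because the types are distinct, code cannot silently treat a string match as a concept match.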

Synonyms: Equivalence

Lexical/Orthographic

Informed by: Nomenclators, Taxonomies, Algorithm

Nomenclatural

Informed by: Nomenclators, Monographs (with interpretation)

Taxonomic

Informed by: Monographs, Floras, Faunas, derived checklists

Different classes of equivalence are addressed by different resources

Lexical synonym: A single concept may reference multiple names

ILPIN

IPNI

MOBOT

Gerardia paupercula (Gray) Britt. var borealis (Pennell) Deam

Gerardia paupercula var borealis (Pennell) Deam

Gerardia paupercula Britt. var borealis Deam

Informed by: Nomenclators, Taxonomies, Algorithm

Identifies the preferred lexical form of the name

Automates the grouping of lexical variation
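The slide does not specify the grouping algorithm. A minimal rule-based sketch, assuming authors are either capitalised or in parentheses: each string is reduced to genus, epithets and a normalised rank marker, and strings sharing that key are grouped. Real name parsers handle far more cases (lowercase author particles such as "de", hybrid signs, and so on).

```python
import re
from collections import defaultdict

# Rank markers mapped to one spelling; the list is deliberately short.
RANK_MARKERS = {"var": "var.", "var.": "var.", "subsp": "subsp.", "subsp.": "subsp.",
                "ssp.": "subsp.", "f": "f.", "f.": "f."}
# Lowercase words that join author names rather than being epithets.
CONNECTORS = {"ex", "et", "in"}
EPITHET = re.compile(r"^[a-z][a-z-]*$")


def lexical_key(name: str) -> str:
    """Reduce a name string to genus + epithets + rank markers.

    Assumes authors are capitalised or in parentheses, which fails for
    particles such as "de" or "van".
    """
    text = re.sub(r"\([^)]*\)", " ", name)  # drop parenthetical authors
    tokens = text.split()
    if not tokens:
        return ""
    kept = [tokens[0].capitalize()]          # genus
    for tok in tokens[1:]:
        if tok in RANK_MARKERS:
            kept.append(RANK_MARKERS[tok])
        elif EPITHET.match(tok) and tok not in CONNECTORS:
            kept.append(tok)
        # anything else (capitalised authors, years, "Britt.") is dropped
    return " ".join(kept)


def group_variants(names):
    """Group raw strings that share a lexical key."""
    groups = defaultdict(list)
    for n in names:
        groups[lexical_key(n)].append(n)
    return dict(groups)


variants = [
    "Gerardia paupercula (Gray) Britt. var borealis (Pennell) Deam",
    "Gerardia paupercula var borealis (Pennell) Deam",
    "Gerardia paupercula Britt. var borealis Deam",
]
print(list(group_variants(variants)))  # ['Gerardia paupercula var. borealis']
```

The Agalinis strings shown under nomenclatural synonymy reduce to a different key ("Agalinis paupercula var. borealis"): linking them to Gerardia needs a nomenclator, not string rules.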

Orthographic synonym: A single concept may reference multiple names

Loligo pealeii Loligo pealii

Loligo pealei Loligo plei

Informed by: Nomenclators, Taxonomies, Algorithm
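Edit distance is one common way to flag candidate orthographic variants; the slide does not name an algorithm, so Levenshtein distance, the same-genus restriction and the threshold below are assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def orthographic_candidates(target: str, names, max_distance: int = 1):
    """Same-genus names within max_distance edits of target, closest first."""
    genus = target.split()[0]
    scored = [(levenshtein(target, n), n) for n in names
              if n != target and n.split()[0] == genus]
    return [n for d, n in sorted(scored) if d <= max_distance]


names = ["Loligo pealeii", "Loligo pealii", "Loligo pealei", "Loligo plei"]
print(orthographic_candidates("Loligo pealeii", names))
# ['Loligo pealei', 'Loligo pealii']
print(orthographic_candidates("Loligo pealeii", names, max_distance=3))
# ['Loligo pealei', 'Loligo pealii', 'Loligo plei']
```

"Loligo plei" only joins at the looser threshold. That is where an automatic match needs confirming against a nomenclator, since a short distance alone does not show that two strings spell the same name.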

Vernacular synonym: A single concept may reference multiple names

Nomenclatural synonym: A single concept may reference multiple names

ILPIN Gerardia paupercula (Gray) Britt. var borealis (Pennell) Deam

IPNI

MOBOT

Gerardia paupercula var borealis (Pennell) Deam

Gerardia paupercula Britt. var borealis Deam

MOBOT

IPNI

Agalinis paupercula (Gray) Britton var. borealis Pennell (Zenkert 1934)

OHIO DNR Agalinis paupercula (Gray) Britt. var. borealis Pennell

Agalinis paupercula Britton var. borealis Pennell

ITIS Agalinis paupercula var. borealis Pennell

Informed by: Nomenclators and generally NOT by taxonomy

Taxonomic synonym: A single concept may reference multiple names (or it may not)

Informed by: Taxonomic Sources

Synthesized synonymy: A bit of everything

Informed by: Algorithm, Nomenclators, Taxonomic Sources

Another example

Aedes calopus | Stegomyia Aegypti | Culex aegypti
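The slide lists algorithms, nomenclators and taxonomic sources as inputs but not how they are combined. One common technique, assumed here, treats each asserted synonym pair as an edge and merges connected names with a union-find structure; the two pairs below are illustrative, built from the Aedes example, and their sources are hypothetical.

```python
class UnionFind:
    """Disjoint sets of names, merged one asserted synonym pair at a time."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:          # path compression
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def synthesize(pairs):
    """Merge (name, name) synonym assertions into sorted clusters."""
    uf = UnionFind()
    for a, b in pairs:
        uf.union(a, b)
    clusters = {}
    for name in list(uf.parent):
        clusters.setdefault(uf.find(name), set()).add(name)
    return sorted(sorted(c) for c in clusters.values())


assertions = [
    ("Aedes calopus", "Culex aegypti"),      # from a hypothetical source A
    ("Stegomyia Aegypti", "Culex aegypti"),  # from a hypothetical source B
]
print(synthesize(assertions))
# [['Aedes calopus', 'Culex aegypti', 'Stegomyia Aegypti']]
```

Merging is transitive, so one loose assertion can lump distinct names. Keeping each pair's source and synonym class alongside it (omitted here for brevity) is what lets such merges be audited.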

Synonymy: Inclusive

Classifications

Catalogue of Life Integrated Classification

Annotated Checklist of the Neuroptera - Mansell 2006

NCBI Taxonomy

Cladograms, Phylograms

Phylogenetic representations

Regional lists

Cetacea of the Hebrides

Flora of China

Thematic Lists

2006 IUCN Red List of Threatened Species

WoRMS/OBIS Marine Taxa

100 of the World’s Worst Invasive Alien Species (in GISIN)
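Inclusive synonymy is what lets a query such as "Primates from Madagascar" match records filed under species names. A minimal sketch, assuming a classification stored as child-to-parent links; the tiny tree and the records are illustrative, not taken from any of the sources listed above.

```python
# Illustrative child -> parent links; not taken from any source listed above.
PARENT = {
    "Lemur catta": "Lemur", "Lemur": "Lemuridae", "Lemuridae": "Primates",
    "Gorilla gorilla": "Gorilla", "Gorilla": "Hominidae", "Hominidae": "Primates",
    "Papilio machaon": "Papilio", "Papilio": "Papilionidae", "Papilionidae": "Lepidoptera",
}


def lineage(name: str) -> list:
    """Ancestors of name from nearest to farthest; stops on a cycle."""
    chain, seen = [], {name}
    while name in PARENT and PARENT[name] not in seen:
        name = PARENT[name]
        seen.add(name)
        chain.append(name)
    return chain


def within(name: str, higher_taxon: str) -> bool:
    """True if name is the higher taxon itself or nested under it."""
    return name == higher_taxon or higher_taxon in lineage(name)


def filter_records(records, higher_taxon, country=None):
    """Records whose name falls inside higher_taxon, optionally by country."""
    return [r for r in records
            if within(r["scientific_name"], higher_taxon)
            and (country is None or r.get("country") == country)]


records = [
    {"scientific_name": "Lemur catta", "country": "MG"},
    {"scientific_name": "Papilio machaon", "country": "MG"},
    {"scientific_name": "Gorilla gorilla", "country": "CM"},
]
print([r["scientific_name"] for r in filter_records(records, "Primates", "MG")])
# ['Lemur catta']
```

The answer depends on which classification supplies `PARENT`; swapping in a different checklist can change what a higher-taxon query returns.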

Implications for data retrieval

Frost 2005 AMNH

• Notopthalmus viridescens

• Triturus viridescens

• Notopthalmus viridescens

• Notophthalmus viridescens

• Notophthalma viridescens

• Diemyctylus viridescens

• Triton viridescens

• Molge viridescens

• Diemyctylus minatus viridescens

• Triturus viridescens dorsalis

• Diemyctylus viridescens dorsalis

• Notophthalmus viridescens dorsalis

• … 24 others

Dolbe 2004

• Notopthalmus viridescens viridescens

• Triturus viridescens

• Notopthalmus viridescens

• Notophthalmus viridescens

• Notophthalma viridescens

• Diemyctylus viridescens

• Triton viridescens

• Molge viridescens

• Notophthalmus viridescens dorsalis

• Triturus viridescens dorsalis

• Diemyctylus viridescens dorsalis

• Notophthalmus viridescens louisianensis
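The two lists show why retrieval depends on the synonymy used: a search for one string finds only records filed under that string unless the query is expanded. A minimal sketch, assuming each synonym list is available as a set of strings; the names are a subset of the Dolbe 2004 list above, and the record names are hypothetical.

```python
def normalise(name: str) -> str:
    """Collapse whitespace and case so trivial formatting differences match."""
    return " ".join(name.split()).lower()


def expanded_search(query: str, synonym_lists, records):
    """Return records matching query or any name listed alongside it."""
    q = normalise(query)
    wanted = {q}
    for names in synonym_lists:
        normalised = {normalise(n) for n in names}
        if q in normalised:
            wanted |= normalised
    return [r for r in records if normalise(r) in wanted]


# A subset of the Dolbe 2004 names shown above.
DOLBE_SUBSET = {
    "Notophthalmus viridescens", "Triturus viridescens",
    "Diemyctylus viridescens", "Notophthalmus viridescens louisianensis",
}
# Hypothetical record names, as a provider might have filed them.
records = ["Triturus viridescens", "Diemyctylus  viridescens",
           "Notophthalmus viridescens louisianensis", "Lemur catta"]

print(expanded_search("Notophthalmus viridescens", [], records))
# []
print(expanded_search("Notophthalmus viridescens", [DOLBE_SUBSET], records))
# ['Triturus viridescens', 'Diemyctylus  viridescens', 'Notophthalmus viridescens louisianensis']
```

Passing a different list, such as Frost 2005, would return a different set of records for the same query.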

Breakdown of the names problem

Homography (Homonymy)

A single name may refer to multiple concepts

Homonyms & Disambiguation

Homographs

Virginia (the state) & Virginia Baird & Girard 1853 (the genus)

Tumor (cancer) & Tumor Huang in Huang Dawei 1990

Informed by: Algorithms/Lexicons (word sense disambiguation)

Homonym

Agathis montana (the conifer) & Agathis montana (the wasp)

Wagneria Meladze 1967 & Wagneria Heilprin 1887 & 12 other Wagneria

Informed by: Nomenclators and Taxonomists

Nomenclators establish the factual basis of homonyms and provide a partial disambiguation method

Taxonomy provides a disambiguation method
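Homonyms can be separated once a name string is paired with extra context. A minimal sketch, assuming a nomenclator-style table keyed on name plus authorship or kingdom; the entries are the homonym examples from this slide, kingdoms are filled in only for the conifer and the wasp, and the identifiers are hypothetical.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    name: str
    authorship: Optional[str]  # None = not recorded here
    kingdom: Optional[str]     # None = not recorded here
    ident: str                 # hypothetical identifier


ENTRIES = [
    Entry("Agathis montana", None, "Plantae", "id-1"),   # the conifer
    Entry("Agathis montana", None, "Animalia", "id-2"),  # the wasp
    Entry("Wagneria", "Meladze 1967", None, "id-3"),
    Entry("Wagneria", "Heilprin 1887", None, "id-4"),
]


def resolve(name, authorship=None, kingdom=None):
    """Identifiers consistent with the context given; one result means resolved."""
    def fits(value, wanted):
        # Unknown on either side never rules an entry out.
        return wanted is None or value is None or value == wanted
    return [e.ident for e in ENTRIES
            if e.name == name and fits(e.authorship, authorship) and fits(e.kingdom, kingdom)]


print(resolve("Agathis montana"))                       # ['id-1', 'id-2']: ambiguous
print(resolve("Agathis montana", kingdom="Animalia"))   # ['id-2']
print(resolve("Wagneria", authorship="Heilprin 1887"))  # ['id-4']
```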

Taxon Concept (Polysemes)

Gorilla gorilla Wilson and Reeder 1992 vs Gorilla gorilla Groves 2003

Informed by: Taxonomic opinion via monographs, floras, faunas, derived lists

Take home message

The names problem is inherent to all taxon data

We need a Global Taxonomic Resource

Needs to treat all names

Supports multiple taxonomic opinions

Depends on many different data sources

The informatics whole is more than the sum of its content parts

Can only work in a federated environment

Requires communal data exchange standards and communications protocols

What GBIF is doing about the names problem

Current GBIF Taxonomic Infrastructure (ECAT)

Catalogue of Life

International Plant Names Index (IPNI)

Index Fungorum

Is not enough

EXPAND to Global Taxonomic Infrastructure

Mobilize wide array of “checklist resources”

Promote the use of nomenclatural GUIDs in all taxonomic checklists

Enable synthesis of resources

Enable informatics web services
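The slides do not say which GUID technology is meant. Life Science Identifiers (LSIDs, URNs of the form urn:lsid:authority:namespace:object[:revision]) were one scheme used in biodiversity informatics at the time, and this sketch assumes that format. It shows how two checklists that spell a name differently can still be joined on a shared identifier; the authority `example.org` and the object ids are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


_LSID = re.compile(r"^urn:lsid:([^:]+):([^:]+):([^:]+)(?::([^:]+))?$", re.IGNORECASE)


def parse_lsid(text: str) -> LSID:
    """Split an LSID into its parts; the authority (a domain) is lowercased."""
    m = _LSID.match(text.strip())
    if not m:
        raise ValueError(f"not an LSID: {text!r}")
    authority, namespace, object_id, revision = m.groups()
    return LSID(authority.lower(), namespace, object_id, revision)


def join_on_guid(a: dict, b: dict) -> dict:
    """Pair names from two checklists (LSID -> name) that share an identifier.

    Revisions are ignored, so two revisions of one object still match.
    """
    index_b = {parse_lsid(g)[:3]: name for g, name in b.items()}
    joined = {}
    for g, name in a.items():
        key = parse_lsid(g)[:3]
        if key in index_b:
            joined[key] = (name, index_b[key])
    return joined


checklist_a = {"urn:lsid:example.org:names:12345":
               "Gerardia paupercula var borealis (Pennell) Deam"}
checklist_b = {"urn:lsid:example.org:names:12345:1":
               "Gerardia paupercula Britt. var borealis Deam",
               "urn:lsid:example.org:names:99999": "Loligo pealeii"}

print(parse_lsid("URN:LSID:Example.org:names:12345:2"))
# LSID(authority='example.org', namespace='names', object_id='12345', revision='2')
print(join_on_guid(checklist_a, checklist_b))
```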

Address Synonymy

Wider access to, and explicit classing of, synonyms

Access to multiple lexical grouping algorithms

Access to, and support of, development of nomenclators

Promote the use of nomenclatural GUIDs in all taxonomic checklists

More Taxonomic, Regional, Thematic checklists

Comprehensive Vernacular Names catalogue

Address Homography, Polysemy

Rapid cataloguing of homography

Access to multiple lexical grouping algorithms

Catalogue and classify all genera

Development of multiple disambiguation methods

Standardized representation of taxon concepts

Development of taxon concept comparators

Explicit assertions of concept relations
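The slide names taxon concept comparators and explicit concept relations without specifying them. One established scheme, assumed here, is the five RCC-5 set relations (congruent, includes, included in, overlaps, disjoint), evaluated on each concept's members, for example the specimens or lower taxa a source includes. The member sets below are hypothetical and make no claim about how any cited author circumscribes a species.

```python
from enum import Enum


class Relation(Enum):
    CONGRUENT = "=="
    INCLUDES = ">"
    INCLUDED_IN = "<"
    OVERLAPS = "><"
    DISJOINT = "|"


def compare(a: frozenset, b: frozenset) -> Relation:
    """RCC-5 relation between two concepts given by their member sets."""
    if not a or not b:
        raise ValueError("concepts need at least one member")
    if a == b:
        return Relation.CONGRUENT
    if a > b:
        return Relation.INCLUDES
    if a < b:
        return Relation.INCLUDED_IN
    return Relation.OVERLAPS if a & b else Relation.DISJOINT


# Hypothetical circumscriptions: the same name, two sources, different members.
sec_a = frozenset({"population-1", "population-2", "population-3"})
sec_b = frozenset({"population-1", "population-2"})
print(compare(sec_a, sec_b))  # Relation.INCLUDES
```

Stored as explicit assertions (concept A, relation, concept B), these results let a portal decide whether records filed under one concept belong in a query for another.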

All Genus Index
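An index of all genera makes homonymy cheap to catalogue: group by name and flag any genus recorded with more than one authorship. A minimal sketch; the (genus, authorship) tuple layout is an assumption, and the entries are the genus examples from the homonym slide.

```python
from collections import defaultdict


def find_homonyms(entries):
    """Genus names recorded with more than one distinct authorship.

    entries: iterable of (genus, authorship) pairs from any source.
    Whitespace in authorship is normalised; spelling variants of the same
    author are NOT, so results are candidates for review, not facts.
    """
    by_name = defaultdict(set)
    for genus, authorship in entries:
        by_name[genus.strip()].add(" ".join(authorship.split()))
    return {g: sorted(a) for g, a in by_name.items() if len(a) > 1}


entries = [
    ("Wagneria", "Meladze 1967"),
    ("Wagneria", "Heilprin 1887"),
    ("Wagneria", "Heilprin  1887"),   # same authorship, extra space
    ("Virginia", "Baird & Girard 1853"),
]
print(find_homonyms(entries))
# {'Wagneria': ['Heilprin 1887', 'Meladze 1967']}
```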

Wider Implications of our efforts

GBIF and Phyloinformatics

As a consumer of taxonomic data resources

As a consumer of name services

As a provider of taxonomic metadata

Increased interoperability

How to contact GBIF:

Web site: www.gbif.org

Data portal: www.gbif.net

GBIF Secretariat

Universitetsparken 15

2100 Copenhagen

Denmark

E-mail: dremsen@gbif.org

Phone: +45 3532 1470

Fax: +45 3532 1480

GBIF Secretariat building, supported by a grant from the Aage V. Jensens Fonde
