PPT - CHAH

Biodiversity Informatics
Biodiversity informatics and the
manipulation of biological information
Jim Croft
jrc@anbg.gov.au
Outline
• ‘Biodiversity Informatics’
• Australia’s Virtual Herbarium as a model of use
and management of biodiversity knowledge
• New ways of managing biological knowledge
• Information management issues
• Current trends and future directions in biodiversity
knowledge management
Biodiversity Informatics
Management of our knowledge of
biodiversity using modern techniques
of data and information management
Taxonomy of Database Interoperability
Multi-database systems
Non-federated
[Autonomous]
Federated
Loosely coupled
Tightly coupled
Multiple schemas
Unified schema
Sheth & Larson (1990)
Tightly Coupled
• Central administration
• Semantic consistency
– Schemas
– Authority files
• Common technology
• Difficult to implement
• Proprietary solutions tolerated
• Expensive
Loosely Coupled
• Closer to Reality
• Independent management
• Suited to scientific systems
• Common publication syntax
– Export schema
• Less functionality … Doable
• Need open standards
Intermediate Coupling
• Scientific Independence
• Common syntax & semantics for the
exchange of information.
– Import/export
– HISPID, Darwin Core, TDWG/CODATA ABCD
• Leverage Existing Open Standards
– Participation in wider, more loosely coupled
federations
– Simplicity
– Distribution of effort
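The import/export idea above can be sketched as a simple field mapping. This is only an illustration: the local column names are hypothetical, and the shared names are assumed, Darwin Core-style terms rather than a definitive rendering of any version of that standard.

```python
# Hypothetical local column names (left) mapped to assumed,
# Darwin Core-style exchange terms (right).
FIELD_MAP = {
    "taxon": "scientificName",
    "coll_date": "eventDate",
    "collector": "recordedBy",
    "coll_no": "recordNumber",
    "loc": "locality",
    "hab": "habitat",
}


def export_record(local_record: dict) -> dict:
    """Translate one local record into the shared exchange vocabulary.

    Fields without an agreed mapping, and empty fields, are left out,
    so each institution keeps its internal schema to itself.
    """
    return {
        shared: local_record[local]
        for local, shared in FIELD_MAP.items()
        if local_record.get(local) not in (None, "")
    }


# Made-up record for illustration only.
local = {
    "taxon": "Acacia salicina",
    "collector": "A. Collector",
    "coll_no": "123",
    "loc": "",
    "internal_shelf_code": "X-17",
}
exported = export_record(local)
print(exported)
```

Each database stays independent; only the export layer has to agree on syntax and semantics.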
Data Refinement
Policy & strategy
Environmental decision making
• conservation
• restoration biology
• resource management
• utilization
Increasing refinement & utility of data:
the real world → observations → data → information → knowledge → action
• government
• corporate
• individual
Herbarium Specimens
Specimen Data Capture
Specimen Data
• The core information is from herbarium specimens
• Beyond taxonomy & names
• Collections data:
– Scientific name
– Collection date
– Collector name & number
– Location
– Soils
– Habitat (incl. topography)
– Vegetation community
– Associated species
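As a rough sketch, the collections data listed above could be held in a record type like the one below. The class name, field names, and the ISO date convention are assumptions for illustration, not the structure of any actual herbarium database.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class SpecimenRecord:
    """One herbarium specimen; field names are illustrative only."""
    scientific_name: str
    collection_date: Optional[str] = None  # assumed ISO 8601 text
    collector: Optional[str] = None
    collector_number: Optional[str] = None
    location: Optional[str] = None
    soils: Optional[str] = None
    habitat: Optional[str] = None  # incl. topography
    vegetation_community: Optional[str] = None
    associated_species: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Names of fields still to be captured from the specimen label."""
        return [k for k, v in asdict(self).items() if v in (None, "", [])]


rec = SpecimenRecord("Platyzoma microphyllum", habitat="sandy or swampy soils")
print(rec.missing_fields())
```

A completeness check of this kind is one simple way to track progress during specimen data capture.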
A Herbarium Database Structure
What do we want to know?
• What species does a plant belong to?
• What is its name?
• What other species is it related to?
• What does it look like?
• Where does it grow?
• Where might it grow?
• What other species grow with it?
• What species grow in a defined area?
• How did they get there?
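One of these questions, "What species grow in a defined area?", reduces to a spatial filter over georeferenced specimen records. A minimal sketch, assuming a simple latitude/longitude bounding box that does not cross the 180° meridian, and using made-up records:

```python
from typing import Iterable, NamedTuple, Set


class Occurrence(NamedTuple):
    species: str
    lat: float  # decimal degrees, south negative
    lon: float  # decimal degrees, east positive


def species_in_box(records: Iterable[Occurrence],
                   south: float, north: float,
                   west: float, east: float) -> Set[str]:
    """Return the set of species with at least one record inside the box."""
    return {
        r.species
        for r in records
        if south <= r.lat <= north and west <= r.lon <= east
    }


# Hypothetical records for illustration only.
records = [
    Occurrence("Species A", -35.3, 149.1),
    Occurrence("Species B", -12.5, 130.8),
    Occurrence("Species A", -35.0, 149.5),
    Occurrence("Species C", -35.2, 149.0),
]
found = species_in_box(records, south=-36.0, north=-34.5, west=148.5, east=150.0)
print(sorted(found))
```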
What is a Virtual Herbarium?
An on-line digital representation of a
scientific collection of preserved plant
specimens and botanical information
What is the AVH?
• Spread across
Australian herbaria
• Data distributed;
resides with custodians
• Each herbarium has
a portal to receive
requests and to
deliver data
• A common single
query AVH interface
in each herbarium
polls all herbaria
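The "single query polls all herbaria" idea can be sketched as follows. This is an assumption-laden toy, not the AVH implementation: each portal is modelled as an in-memory function, and the herbarium codes and specimen records are hypothetical.

```python
from typing import Callable, Dict, List

Portal = Callable[[str], List[dict]]


def make_portal(code: str, holdings: List[dict]) -> Portal:
    """Build a stand-in portal that answers name queries from local data."""
    def portal(name: str) -> List[dict]:
        return [dict(rec, herbarium=code) for rec in holdings if rec["name"] == name]
    return portal


def broken_portal(name: str) -> List[dict]:
    """Simulates a custodian whose portal is offline."""
    raise ConnectionError("portal offline")


def query_all(portals: Dict[str, Portal], name: str) -> List[dict]:
    """Send the same query to every portal and merge whatever comes back."""
    results: List[dict] = []
    for code, portal in portals.items():
        try:
            results.extend(portal(name))
        except Exception:
            # One unreachable herbarium should not fail the whole query.
            continue
    return results


portals = {
    "A": make_portal("A", [{"name": "Acacia salicina", "sheet": "A-1"}]),
    "B": make_portal("B", [{"name": "Acacia salicina", "sheet": "B-7"},
                           {"name": "Other species", "sheet": "B-8"}]),
    "C": broken_portal,
}
hits = query_all(portals, "Acacia salicina")
print(hits)
```

The data never leaves its custodian until a query asks for it, which matches the distributed model described above.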
Major Australian Herbaria
AVH Partners
• State Herbarium of South Australia
• National Herbarium of Victoria
• Queensland Herbarium
• National Herbarium of New South Wales
• Australian National Herbarium
• Northern Territory Herbarium
• Tasmanian Herbarium
• Western Australian Herbarium
• Australian Biological Resources Study
• Industry Partner: KE Software
Why is there an AVH?
• Pressure on Herbaria to work more efficiently
• Demand for access to larger amounts of data
• Demand to access data more quickly
• Demand to view data in different ways
• Pressure on herbaria to appear and to be more
responsive to community needs
What is the AVH task?
• > 18,000 species of higher plants
• > 64,000 available names
• Extensive synonymy (4 names per plant)
• 8 major government-funded herbaria
• Similar number of university herbaria
• > 6,500,000 specimens in Aust. herbaria
• 50-100 data elements per specimen
• Several Kb per specimen (excl. images)
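A back-of-envelope size for the whole task follows from the figures above. The slide only says "several" per specimen, so the 2 to 5 kilobyte range below is an assumption, as is the use of decimal units; images are excluded.

```python
SPECIMENS = 6_500_000  # from the slide: > 6,500,000 specimens


def total_gb(kb_per_specimen: float) -> float:
    """Total text-data volume in gigabytes (decimal: 1 GB = 1,000,000 KB)."""
    return SPECIMENS * kb_per_specimen / 1_000_000


# Assumed range for "several" kilobytes per specimen.
low, high = total_gb(2), total_gb(5)
print(f"roughly {low} to {high} GB of specimen text data")
```

On these assumptions the text data alone is in the tens of gigabytes, before any images are added.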
Herbarium database status
The AVH Agreement
• $10M over 5 years to database all major Australian
herbarium collections
• $10 million:
– $4 million Commonwealth
– $4 million State/Territory
– $2 million private
• Initial focus on capture of herbarium specimen data
• Ultimate aim a complete flora information system
Australia’s Virtual Herbarium
On-line access to herbarium specimen
information and botanical knowledge
Australian Plant Name
Index (APNI)
www.anbg.gov.au/apni
www.anbg.gov.au/win
http://www.chah.gov.au/avh.html
Acacia salicina
Research Potential:
Plant distribution analysis
[Figure: Pultenaea distribution classes in eastern Australia (classes labelled Incurved and Recurved)]
Flora Information Systems
• On-line systems
• Often regionally based
• Integrating:
– Plant names and synonyms
– Descriptive Flora treatments
– Illustrations
– Distributions
– etc.
Flora Information Systems
Botanical illustrations
National Plant Photograph Index
Search all records on-line
Digital images available
(‘best of class’)
35,000 images of
Australian plants and
vegetation
www.anbg.gov.au/anbg/photo-collection/
Type Images on demand
High resolution image of
type specimen of Austrobaileya
downloaded over the Internet
from the Herbarium of the
New York Botanical Garden
Flora & Revision Databases
New ways of managing and delivering
botanical information
A Flora in XML
Example in HTML
<p><b>Platyzoma microphyllum</b> R.Br., <i>Prodr.</i> 160 (1810)</p>
<p><i>Gleichenia platyzoma</i> F.Muell., <i>Veg. Chatham.-Isl.</i> 63 (1864). T: Facing Island, Qld, <i>R.Brown Iter Austral. 102</i>; lecto: BM.</p>
<p>Illus.: S.B.Andrews…</p>
<p>Rhizome short-creeping… Sporangia in zones in distal half of frond. Fig. 55</p>
<p>Widespread across northern Australia… Grows in sandy or swampy soils.... Map 135.</p>
<p>W.A.: 14.4 km NW of Mt…</p>
Example in XML
<taxon><name>Platyzoma microphyllum</name>
<author>R.Br.</author>,
<publication><title>Prodr.</title>
<page>160</page><date>1810</date>
</publication>
<synonym><name>Gleichenia platyzoma</name>
<author>F.Muell.</author> <publication>Veg. Chatham.-Isl.</publication>
<page>63</page> <date>1864</date>
<type>T: Facing Island, Qld, …</type></synonym>
<illustration>Illus.: S.B.Andrews…</illustration>
<description>Rhizome short-creeping… Sporangia in zones in distal half of frond.</description>
<figure>Fig. 55</figure>
<locality>Widespread across northern Australia…</locality>
<habitat>Grows in sandy or swampy soils...</habitat> <map>Map 135.</map>
<specimens>W.A.: 14.4 km NW of Mt…</specimens></taxon>
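The point of the XML form is that software can pick out the parts of a treatment by name, which the HTML form does not allow. A minimal sketch using Python's standard library, with a shortened fragment that reuses the element names from the example; the fragment itself is an abridged assumption, not the actual Flora markup:

```python
import xml.etree.ElementTree as ET

# Abridged fragment modelled on the slide's example.
FLORA_XML = """
<taxon>
  <name>Platyzoma microphyllum</name>
  <author>R.Br.</author>
  <publication><title>Prodr.</title><page>160</page><date>1810</date></publication>
  <synonym>
    <name>Gleichenia platyzoma</name>
    <author>F.Muell.</author>
    <date>1864</date>
  </synonym>
  <habitat>Grows in sandy or swampy soils...</habitat>
</taxon>
"""

taxon = ET.fromstring(FLORA_XML)

# Direct children are addressed by tag name; nested ones by a simple path.
accepted = taxon.findtext("name")
synonyms = [s.findtext("name") for s in taxon.findall("synonym")]
year = int(taxon.findtext("publication/date"))

print(accepted, synonyms, year)
```

The same document can then feed a name index, a distribution map, or a printed page without being retyped.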
A Flora XML Schema fragment
A Flora database structure
A Flora database report
An old process of publication
Botanist
W-P file
Editors
W-P file
Publisher
C-R Copy
Book, etc.
A new process of publication
Botanist
W-P file
Editors
W-P file
Outputs
XML file
Publisher
C-R Copy
Outputs
Database
XML file
Book, etc.
A future process of publication
Botanist
Database
Editors
Outputs
XML file
Outputs
Database
Publisher
C-R Copy
Book, etc.
Interactive Identification
Using computers to identify and name
plant species and display information
about them
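The core of interactive identification is elimination: each observed character state removes the taxa that do not match. A minimal sketch, with hypothetical taxa, characters, and states chosen purely for illustration:

```python
from typing import Dict, List

# Hypothetical character matrix: taxon -> {character: state}.
TAXA: Dict[str, Dict[str, str]] = {
    "Taxon A": {"leaf_margin": "incurved", "flower_colour": "yellow"},
    "Taxon B": {"leaf_margin": "recurved", "flower_colour": "yellow"},
    "Taxon C": {"leaf_margin": "recurved", "flower_colour": "red"},
}


def remaining(taxa: Dict[str, Dict[str, str]],
              observations: Dict[str, str]) -> List[str]:
    """Taxa consistent with every observed character state.

    A character that is not scored for a taxon does not eliminate it.
    """
    return sorted(
        name
        for name, states in taxa.items()
        if all(states.get(ch, val) == val for ch, val in observations.items())
    )


# The user may answer characters in any order.
step1 = remaining(TAXA, {"flower_colour": "yellow"})
step2 = remaining(TAXA, {"flower_colour": "yellow", "leaf_margin": "recurved"})
print(step1, step2)
```

Unlike a printed dichotomous key, the characters can be used in whatever order the specimen at hand allows.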
Interactive Plant Identification
Current trends, future directions
?
Trends in Biodiversity
Information Management
Nomenclatural → Taxonomic
Regional → Global
Text-based → Image-based
Taxon-based → Spatially-based
Individual effort → Partnerships
Single user → Multiuser
Standalone → Networked
Centralized → Distributed
Proprietary System → Open System
Idiosyncratic Design → Standard Architecture
Nonstandard data content → Standard data content
Conventional → Innovative
Developmental → Stable
Access charges → Freely available
Global Organization
• Several parallel and complementary initiatives:
– Global Biodiversity Information Facility (GBIF)
– Taxonomic Databases Working Group (TDWG)
– Global Taxonomy Initiative (GTI)
– International Organization for Plant Information (IOPI)
– Species 2000
– All Species Foundation (ALL)
www.gbif.org
Data Flow within GBIF Network
[Diagram. Components: User Browser, GBIF Portal, Participant Nodes, Collection Nodes. Flows labelled: HTML Data, Aggregated Data, Service Metadata, Specimen Index Data, Detailed Specimen Data]
www.all-species.org
[Chart: vertical axis 0 to 20,000,000; horizontal axis "Year"]
[Same chart, annotated "What needs to happen here?"]
Requirements for Interoperability
Standards…
Standards for Interoperability of Biodiversity Databases
URL, CGI, XPath, SVG, ABCD, XSLT, RDF, Z39.50, ITF, UML, URI, UDDI, XHTML, SOAP, Dublin Core, Z39.19, RDFS, BNF, HTTP, DOM, WSDL, SAX, HISPID, Darwin Core, CSS, XML Schema, RMI, ASN.1, PNG, WAIS