Biodiversity Informatics

Biodiversity informatics and the manipulation of biological information
Jim Croft
jrc@anbg.gov.au

Outline
• ‘Biodiversity Informatics’
• Australia’s Virtual Herbarium as a model of use and management of biodiversity knowledge
• New ways of managing biological knowledge
• Information management issues
• Current trends and future directions in biodiversity knowledge management

Biodiversity Informatics
Management of our knowledge of biodiversity using modern techniques of data and information management

Taxonomy of Database Interoperability
Diagram labels: Multi-database systems; Non-federated [Autonomous]; Federated; Loosely coupled; Tightly coupled; Multiple schemas; Unified schema
Source: Sheth & Larson (1990)

Tightly Coupled
• Central administration
• Semantic consistency
– Schemas
– Authority files
• Common technology
• Difficult to implement
• Proprietary solutions tolerated
• Expensive

Loosely Coupled
• Closer to reality
• Independent management
• Suited to scientific systems
• Common publication syntax
– Export schema
• Less functionality … Doable
• Need open standards

Intermediate Coupling
• Scientific independence
• Common syntax & semantics for the exchange of information
– Import/export
– HISPID, Darwin Core, TDWG/CODATA ABCD
• Leverage existing open standards
– Participation in wider, more loosely coupled federations
– Simplicity
– Distribution of effort

Data Refinement
Policy & strategy; Envir. decision making
• conservation
• restoration biology
• resource mgmt
• utilization
Increasing refinement & utility of data: the real world → observations → data → information → knowledge → action
• government
• corporate
• individual

Herbarium Specimens

Specimen Data Capture

Specimen Data
• The core information is from herbarium specimens
• Beyond taxonomy & names
• Collections data:
– Scientific name
– Collection date
– Collector name & number
– Location
– Soils
– Habitat (incl.
topography)
– Vegetation community
– Associated species

A Herbarium Database Structure

What do we want to know?
• What species does a plant belong to?
• What is its name?
• What other species is it related to?
• What does it look like?
• Where does it grow?
• Where might it grow?
• What other species grow with it?
• What species grow in a defined area?
• How did they get there?

What is a Virtual Herbarium?
An on-line digital representation of a scientific collection of preserved plant specimens and botanical information

What is the AVH?
• Spread across Australian herbaria
• Data distributed; resides with custodians
• Each herbarium has a portal to receive requests and to deliver data
• A common single-query AVH interface in each herbarium polls all herbaria

Major Australian Herbaria

AVH Partners
• State Herbarium of South Australia
• National Herbarium of Victoria
• Queensland Herbarium
• National Herbarium of New South Wales
• Australian National Herbarium
• Northern Territory Herbarium
• Tasmanian Herbarium
• Industry Partner: KE Software
• Western Australian Herbarium
• Australian Biological Resources Study

Why is there an AVH?
• Pressure on herbaria to work more efficiently
• Demand for access to larger amounts of data
• Demand to access data more quickly
• Demand to view data in different ways
• Pressure on herbaria to appear and to be more responsive to community needs

What is the AVH task?
• > 18,000 species of higher plants
• > 64,000 available names
• Extensive synonymy (4 names per plant)
• 8 major government-funded herbaria
• Similar number of university herbaria
• > 6,500,000 specimens in Aust. herbaria
• 50–100 data elements per specimen
• Several KB per specimen (excl.
images)

Herbarium database status

The AVH Agreement
• $10M over 5 years to database all major Australian herbarium collections
• $10 million:
– $4 million Commonwealth
– $4 million State/Territory
– $2 million private
• Initial focus on capture of herbarium specimen data
• Ultimate aim: a complete flora information system

Australia’s Virtual Herbarium
On-line access to herbarium specimen information and botanical knowledge

Australian Plant Name Index (APNI)
www.anbg.gov.au/apni
www.anbg.gov.au/win
http://www.chah.gov.au/avh.html

Acacia salicina

Research Potential: Plant distribution analysis
[Map: Pultenaea distribution classes in eastern Australia; class labels: Incurved, Recurved, ?]

Flora Information Systems
• On-line systems
• Often regionally based
• Integrating:
– Plant names and synonyms
– Descriptive Flora treatments
– Illustrations
– Distributions
– etc.

Flora Information Systems
Botanical illustrations

National Plant Photograph Index
Search all records on-line
Digital images available (‘best of class’)
35,000 images of Australian plants and vegetation
www.anbg.gov.au/anbg/photo-collection/

Type Images on demand
High resolution image of type specimen of Austrobaileya downloaded over the Internet from the Herbarium of the New York Botanical Garden

Flora & Revision Databases
New ways of managing and delivering botanical information

A Flora in XML
Example in HTML, followed by example in XML:

<p><b>Platyzoma microphyllum</b> R.Br., <i>Prodr.</i> 160 (1810)</p>
<p><i>Gleichenia platyzoma</i> F.Muell., <i>Veg. Chatham.-Isl.</i> 63 (1864). T: Facing Island, Qld, <i>R.Brown Iter Austral. 102</i>; lecto: BM.</p>
<p>Illus.: S.B.Andrews…</p>
<p>Rhizome short-creeping… Sporangia in zones in distal half of frond. Fig. 55</p>
<p>Widespread across northern Australia… Grows in sandy or swampy soils....
Map 135.</p>
<p>W.A.: 14.4 km NW of Mt…</p>

<taxon>
<name>Platyzoma microphyllum</name> <author>R.Br.</author>, <publication><title>Prodr.</title> <page>160</page><date>1810</date></publication>
<synonym><name>Gleichenia platyzoma</name> <author>F.Muell.</author> <publication>Veg. Chatham.-Isl.</publication> <page>63</page> <date>1864</date> <type>T: Facing Island, Qld, …</type></synonym>
<illustration>Illus.: S.B.Andrews…</illustration>
<description>Rhizome short-creeping… Sporangia in zones in distal half of frond.</description>
<figure>Fig. 55</figure>
<locality>Widespread across northern Australia…</locality>
<habitat>Grows in sandy or swampy soils...</habitat>
<map>Map 135.</map>
<specimens>W.A.: 14.4 km NW of Mt…</specimens>
</taxon>

A Flora XML Schema fragment

A Flora database structure

A Flora database report

An old process of publication
Flow diagram labels: Botanist; W-P file; Editors; W-P file; Publisher; C-R Copy; Book, etc.

A new process of publication
Flow diagram labels: Botanist; W-P file; Editors; W-P file; Outputs; XML file; Publisher; C-R Copy; Outputs; Database; XML file; Book, etc.

A future process of publication
Flow diagram labels: Botanist; Database; Editors; Outputs; XML file; Outputs; Database; Publisher; C-R Copy; Book, etc.

Interactive Identification
Using computers to identify and name plant species and display information about them

Interactive Plant Identification

Current trends, future directions?
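One practical consequence of marking up a Flora treatment in XML, as in the Platyzoma microphyllum example, is that a program can pull out names, synonyms and habitat without guessing from typography. The sketch below is only an illustration: it uses an abbreviated, well-formed fragment built from the element names on the slide, and `summarise_taxon` is a hypothetical helper, not part of any Flora system described here.

```python
import xml.etree.ElementTree as ET

# Abbreviated fragment using the element names from the slide's XML example.
# Elided text ("...") is kept exactly as it appears in the source.
FLORA_XML = """
<taxon>
  <name>Platyzoma microphyllum</name>
  <author>R.Br.</author>
  <publication><title>Prodr.</title><page>160</page><date>1810</date></publication>
  <synonym>
    <name>Gleichenia platyzoma</name>
    <author>F.Muell.</author>
    <date>1864</date>
  </synonym>
  <habitat>Grows in sandy or swampy soils...</habitat>
</taxon>
"""


def summarise_taxon(xml_text):
    """Return the main fields of one <taxon> element as a dictionary."""
    taxon = ET.fromstring(xml_text.strip())
    return {
        "accepted_name": taxon.findtext("name").strip(),
        "author": taxon.findtext("author").strip(),
        # Year of the publication attached directly to the accepted name.
        "year": int(taxon.findtext("publication/date")),
        # Every <synonym> child contributes one name.
        "synonyms": [s.findtext("name").strip() for s in taxon.findall("synonym")],
        "habitat": taxon.findtext("habitat").strip(),
    }


summary = summarise_taxon(FLORA_XML)
print(summary["accepted_name"], summary["year"], summary["synonyms"])
```

The same extraction from the HTML version would have to rely on which words happen to be bold or italic, which is the contrast the two examples are meant to show.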
Trends in Biodiversity Information Management
Nomenclatural → Taxonomic
Regional → Global
Text-based → Image-based
Taxon-based → Spatially-based
Individual effort → Partnerships
Single user → Multiuser
Standalone → Networked
Centralized → Distributed
Proprietary System → Open System
Idiosyncratic Design → Standard Architecture
Nonstandard data content → Standard data content
Conventional → Innovative
Developmental → Stable
Access charges → Freely available

Global Organization
• Several parallel and complementary initiatives:
– Global Biodiversity Information Facility (GBIF)
– Taxonomic Databases Working Group (TDWG)
– Global Taxonomy Initiative (GTI)
– International Organization for Plant Information (IOPI)
– Species 2000
– All Species Foundation (ALL)

www.gbif.org

Data Flow within GBIF Network
[Diagram: User Browser, GBIF Portal, Participant Nodes and Collection Nodes; flows labelled HTML Data, Aggregated Data, Service Metadata, Specimen Index Data and Detailed Specimen Data]

www.all-species.org
[Chart: vertical axis 0 to 20,000,000; horizontal axis Year; annotation: What needs to happen here?]

Requirements for Interoperability
Standards…

Standards for Interoperability of Biodiversity Databases
URL, CGI, XPath, SVG, ABCD, XSLT, RDF, Z39.50, ITF, UML, URI, UDDI, XHTML, SOAP, Dublin Core, Z39.19, RDFS, BNF, HTTP, DOM, WSDL, SAX, HISPID, Darwin Core, CSS, XML Schema, RMI, ASN.1, PNG, WAIS
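The loosely coupled model behind the AVH (data stay with the custodians, each herbarium answers requests through its own portal, and one query interface polls them all) can be sketched in a few lines. Everything below is an assumption made for illustration: the node names, the local field names and the records are invented, and the shared field names `scientificName` and `recordedBy` are only loosely modelled on Darwin Core terms. It is not the AVH or GBIF implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, str]


@dataclass
class HerbariumNode:
    """One custodian: keeps its own schema and publishes an export mapping."""
    name: str
    records: List[Record]               # stored in the node's local schema
    export: Callable[[Record], Record]  # local record -> shared export schema

    def query(self, scientific_name: str) -> List[Record]:
        # Translate to the shared schema first, then filter on the shared field.
        exported = [self.export(r) for r in self.records]
        return [r for r in exported if r["scientificName"] == scientific_name]


def poll_all(nodes: List[HerbariumNode], scientific_name: str) -> List[Record]:
    """Single query interface: ask every node and tag each hit with its source."""
    results = []
    for node in nodes:
        for rec in node.query(scientific_name):
            results.append({**rec, "source": node.name})
    return results


# Two hypothetical nodes with different local schemas (illustrative data only).
node_a = HerbariumNode(
    name="node_a",
    records=[
        {"taxon": "Acacia salicina", "coll": "collector 1"},
        {"taxon": "Platyzoma microphyllum", "coll": "collector 2"},
    ],
    export=lambda r: {"scientificName": r["taxon"], "recordedBy": r["coll"]},
)
node_b = HerbariumNode(
    name="node_b",
    records=[{"species_name": "Acacia salicina", "collector": "collector 3"}],
    export=lambda r: {"scientificName": r["species_name"], "recordedBy": r["collector"]},
)

hits = poll_all([node_a, node_b], "Acacia salicina")
for hit in hits:
    print(hit)
```

Each node remains free to change its internal structure; only the export mapping has to follow the agreed standard, which is why open exchange standards matter more here than common technology.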