On-line biological data concepts at CSIRO Marine Research, Australia

Tony Rees & Kim Finney
Divisional Data Centre
CSIRO Marine Research, Hobart, Australia
http://www.marine.csiro.au/datacentre/
Pre-existing situation at CMR (before 1997)
• Data in a variety of databases and flat files
• No metadata or digital documentation
• No web access to any data or metadata
• CAAB (taxon coding system) in existence, but coverage patchy and compliance variable
Our implementation path
Stage 1 (1997-2000) ...
• Construct a searchable, web-accessible metadata system and start populating it with information - MarLIN v1
• Upgrade CAAB to form a comprehensive taxon dictionary for MarLIN (also accessible by SQuID)
• Build a pilot data store and visualisation system with a web-driven GUI (Java applet) - SQuID v1
Stage 2 (2000-) ...
• Build SQuID v2 (onwards) to become a comprehensive data store, with upgraded links to MarLIN and CAAB
• Implement linkage between MarLIN and Australia-wide, distributed metadata search system
Stage 3… ???
Our system overview
[Diagram: system components and the links between them]
• Master data storage (includes index layer) - holds info at the atomic data level
• Data directory (metadatabase) - holds info at “dataset” level (e.g. survey, species range)
• Taxon dictionary
• Link labels: “Entry point to data”; “Display relevant metadata”; “Subsets of information shared with other metadata directory systems”
Digression #1: Taxon matching
• Simplistic view:
  – text match on one field (“scientific name”) or two (genus + species)
• More comprehensive approach:
  – 10 or more fields required, e.g. in CAAB we define the following:
    Genus, Subgenus, Species, Qualifier, Subspecies, Variety, Original Author/s, Original Date, Revising Author/s, Revision Date, Authority Addendum
  – also need to flag:
    - Is botanical or zoological code applicable?
    - Species name Latin or informal (“sp. A”, etc.)?
    - Has name changed from original? (even if no revising author/date stored)
Examples from our database:
• Chlamys (Belchlamys) aktinos (Petterd, 1886) … a scallop
• Ophiaster hydroideus (Lohmann) Lohmann, 1913 emend. Manton &
Oates, 1983 … a coccolithophorid
• Heteroclinus sp. 1 [in Gomon et al, 1994] .. Kuiter's weedfish
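A minimal sketch of how the fields and flags listed above might be held in one record, and how a display name could be assembled from them. The class, field, and function names are illustrative assumptions, not CAAB's actual schema, and the formatter covers only the simple zoological case (where parentheses around the author and date indicate that the name has changed from the original combination).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaxonName:
    """Name components and flags from the slide; field names are hypothetical."""
    genus: str
    species: str
    subgenus: Optional[str] = None
    qualifier: Optional[str] = None
    subspecies: Optional[str] = None
    variety: Optional[str] = None
    original_author: Optional[str] = None
    original_date: Optional[str] = None
    revising_author: Optional[str] = None
    revision_date: Optional[str] = None
    authority_addendum: Optional[str] = None
    # Flags
    zoological_code: bool = True         # False would mean the botanical code
    informal_species: bool = False       # e.g. "sp. A", "sp. 1"
    changed_from_original: bool = False  # name changed since first description


def format_zoological(t: TaxonName) -> str:
    """Assemble a display name; handles the simple zoological case only."""
    parts = [t.genus]
    if t.subgenus:
        parts.append(f"({t.subgenus})")
    parts.append(t.species)
    if t.subspecies:
        parts.append(t.subspecies)
    if t.original_author:
        authority = t.original_author
        if t.original_date:
            authority += f", {t.original_date}"
        if t.changed_from_original:
            authority = f"({authority})"
        parts.append(authority)
    if t.authority_addendum:
        parts.append(t.authority_addendum)
    return " ".join(parts)


scallop = TaxonName(
    genus="Chlamys", subgenus="Belchlamys", species="aktinos",
    original_author="Petterd", original_date="1886",
    changed_from_original=True,
)
print(format_zoological(scallop))  # Chlamys (Belchlamys) aktinos (Petterd, 1886)
```

Keeping the components separate means two records can be compared field by field, which a text match on a single "scientific name" string cannot do reliably.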
Taxon matching … continued
• We have standardised on taxon codes, rather than names, for data storage and matching … names are stored as an attribute of the code (and can be updated in the future as needed)
• Our “CAAB” coding system has evolved over 20+ years - earlier generations of codes are maintained on the system
• New web-based access facility for retrieving latest name for a code, searching for a taxon, etc.
• Same CAAB codes are also used by other marine science/fisheries agencies around Australia
• Facility newly implemented in CAAB to hold ITIS codes, for cross-reference to international systems in the future
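The code-centred approach can be sketched as a registry keyed by taxon code, with the name held as an attribute and earlier-generation codes pointing forward to their replacement. This is an illustrative sketch only: `TaxonEntry`, `superseded_by`, and `resolve_current` are assumed names, and the code `OLD-0001` is invented for the example (the CAAB code and names for Saurida undosquamis are taken from the MarLIN listing later in this document).

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TaxonEntry:
    scientific_name: str
    common_name: Optional[str] = None
    superseded_by: Optional[str] = None  # set only on earlier-generation codes


def resolve_current(code: str, registry: Dict[str, TaxonEntry]) -> str:
    """Follow superseded_by links until a current code is reached."""
    seen = set()
    while registry[code].superseded_by is not None:
        if code in seen:
            raise ValueError(f"cycle in code history at {code}")
        seen.add(code)
        code = registry[code].superseded_by
    return code


registry = {
    "37 118001": TaxonEntry("Saurida undosquamis", "brushtooth lizardfish"),
    # Hypothetical earlier-generation code, invented for illustration.
    "OLD-0001": TaxonEntry("Saurida undosquamis", superseded_by="37 118001"),
}

current = resolve_current("OLD-0001", registry)
print(current, registry[current].scientific_name)
```

Because stored data reference the code rather than the name, a later name revision only touches the registry entry; records keyed by the code need no update.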
CAAB services available
[Diagram: two routes into CAAB]
Application-level requests
• Generate scientific name, common name, current code (if applicable) for a given taxon code
• Call a CAAB taxon report
• Translate an ITIS number to a CAAB code (or vice versa)
CAAB user interface
• User searches by scientific name, common name or taxon code (or portion thereof)
• List taxa matching query
• Retrieve current sci. name, common name(s), taxon code, taxon report
• Initiate a MarLIN search, ITIS report, FishBase report
• List taxa by CAAB category or family
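The user-interface search by scientific name, common name, or taxon code "or portion thereof" amounts to a case-insensitive substring match across those three fields. The sketch below assumes an in-memory list and hypothetical names (`Taxon`, `search_taxa`); the real service runs against the CAAB database. The sample rows are copied from the MarLIN species listing later in this document.

```python
from typing import List, NamedTuple


class Taxon(NamedTuple):
    code: str
    scientific_name: str
    common_name: str


TAXA = [
    Taxon("23 636004", "Nototodarus gouldi", "Gould's squid"),
    Taxon("28 786002", "Metanephrops boschmai", "Boschma's scampi"),
    Taxon("28 786005", "Metanephrops velutinus", "velvet scampi"),
    Taxon("37 118001", "Saurida undosquamis", "brushtooth lizardfish"),
    Taxon("37 258002", "Beryx splendens", "alfonsino"),
]


def search_taxa(query: str, taxa: List[Taxon] = TAXA) -> List[Taxon]:
    """Return taxa whose code, scientific name or common name contains the query."""
    q = query.strip().lower()
    if not q:
        return []
    return [t for t in taxa if any(q in field.lower() for field in t)]


for hit in search_taxa("scampi"):
    print(hit.code, hit.scientific_name, "..", hit.common_name)
```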
CAAB web interface (current version)
Digression #2: taxonomy keywords
• CAAB uses “major categories” (mostly = phyla)
• MarLIN uses Australian “Blue Pages” keywords (c. 100 terms) - independent of CAAB codes (in current implementation)
• NASA GCMD keywords would be an OBIS option (maybe with additions to suit OBIS) - c. 50 currently relevant … could also cross-map to GEMET (EC) list (c. 200)
EARTH SCIENCE >> Biosphere >> Zoology >> Amphibians
EARTH SCIENCE >> Biosphere >> Zoology >> Anemones
EARTH SCIENCE >> Biosphere >> Zoology >> Arachnids
EARTH SCIENCE >> Biosphere >> Zoology >> Arthropods
EARTH SCIENCE >> Biosphere >> Zoology >> Birds
EARTH SCIENCE >> Biosphere >> Zoology >> Centipedes
EARTH SCIENCE >> Biosphere >> Zoology >> Corals
EARTH SCIENCE >> Biosphere >> Zoology >> Crustaceans
EARTH SCIENCE >> Biosphere >> Zoology >> Echinoderms
EARTH SCIENCE >> Biosphere >> Zoology >> Fish
EARTH SCIENCE >> Biosphere >> Zoology >> Flatworms
EARTH SCIENCE >> Biosphere >> Zoology >> Insects
EARTH SCIENCE >> Biosphere >> Zoology >> Invertebrates
EARTH SCIENCE >> Biosphere >> Zoology >> Jellyfish
EARTH SCIENCE >> Biosphere >> Zoology >> Mammals
EARTH SCIENCE >> Biosphere >> Zoology >> Millipedes
EARTH SCIENCE >> Biosphere >> Zoology >> Mollusks
EARTH SCIENCE >> Biosphere >> Zoology >> Reptiles
EARTH SCIENCE >> Biosphere >> Zoology >> Roundworms
EARTH SCIENCE >> Biosphere >> Zoology >> Segmented Worms
EARTH SCIENCE >> Biosphere >> Zoology >> Sponges
EARTH SCIENCE >> Biosphere >> Zoology >> Vertebrates
EARTH SCIENCE >> Biosphere >> Zoology >> Zooplankton
EARTH SCIENCE >> Biosphere >> Microbiota >> Amoebae
EARTH SCIENCE >> Biosphere >> Microbiota >> Bacteria
EARTH SCIENCE >> Biosphere >> Microbiota >> Blue-green Algae
EARTH SCIENCE >> Biosphere >> Microbiota >> Ciliates
EARTH SCIENCE >> Biosphere >> Microbiota >> Coccolithophore
EARTH SCIENCE >> Biosphere >> Microbiota >> Diatoms
EARTH SCIENCE >> Biosphere >> Microbiota >> Flagellates
EARTH SCIENCE >> Biosphere >> Microbiota >> Foraminifers
EARTH SCIENCE >> Biosphere >> Microbiota >> Microalgae
EARTH SCIENCE >> Biosphere >> Microbiota >> Microphyte
EARTH SCIENCE >> Biosphere >> Microbiota >> Phytoplankton
EARTH SCIENCE >> Biosphere >> Microbiota >> Plankton
EARTH SCIENCE >> Biosphere >> Microbiota >> Protist
EARTH SCIENCE >> Biosphere >> Microbiota >> Radiolarians
EARTH SCIENCE >> Biosphere >> Microbiota >> Zooplankton
EARTH SCIENCE >> Biosphere >> Vegetation >> Algae
EARTH SCIENCE >> Biosphere >> Vegetation >> Flowering Plants
EARTH SCIENCE >> Biosphere >> Vegetation >> Lichens
EARTH SCIENCE >> Biosphere >> Vegetation >> Macroalgae
EARTH SCIENCE >> Biosphere >> Vegetation >> Macrophyte
EARTH SCIENCE >> Biosphere >> Vegetation >> Phytoplankton
Taxonomy keyword cross-mapping (examples)
GCMD list
Invertebrates
Sponges
Jellyfish
Anemones
Corals
Flatworms
Roundworms
Segmented Worms
Mollusks
Arthropods
Insects
Arachnids
Echinoderms
Crustaceans
Vertebrates
Fish
Amphibians
Reptiles
Birds
Mammals
GEMET list
invertebrate … S709
poriferan … S744
coelenterate … S737
coral … S738
nematode … S743
annelid … S711 ++
mollusc … S740
cephalopod … S741
gastropod … S742
arthropod … S713
insect … S719 ++
chelicerate … S714 ++
echinoderm … S739
crustacean … S717
vertebrate … S649
fish … S754
amphibian … S650 ++
reptile … S691 ++
bird … S654 ++
mammal … S664 ++
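A cross-mapping of this kind can be held as a simple lookup from the last element of a GCMD keyword path to a GEMET identifier. The sketch below is an assumption-laden illustration: the pairings are one reading of the two lists above (only terms with an obvious one-to-one counterpart are included), entries marked "++" are left out because the marker is not defined here, and the function names are hypothetical.

```python
from typing import Dict, Optional

# Pairings inferred from the two lists above; not a published mapping.
GCMD_TO_GEMET: Dict[str, str] = {
    "Invertebrates": "S709",  # invertebrate
    "Sponges": "S744",        # poriferan
    "Corals": "S738",         # coral
    "Roundworms": "S743",     # nematode
    "Mollusks": "S740",       # mollusc
    "Arthropods": "S713",     # arthropod
    "Echinoderms": "S739",    # echinoderm
    "Crustaceans": "S717",    # crustacean
    "Vertebrates": "S649",    # vertebrate
    "Fish": "S754",           # fish
}


def gcmd_leaf(path: str) -> str:
    """Return the final term of a 'A >> B >> C' GCMD keyword path."""
    return path.split(">>")[-1].strip()


def to_gemet(path: str) -> Optional[str]:
    """Look up the GEMET identifier for a GCMD path, or None if unmapped."""
    return GCMD_TO_GEMET.get(gcmd_leaf(path))


print(to_gemet("EARTH SCIENCE >> Biosphere >> Zoology >> Fish"))
```

Terms without a counterpart (for example Flatworms, which has no matching entry in the GEMET list shown) return `None` and would need a manual decision.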
MarLIN - used for data discovery
• MarLIN - based on an Oracle database containing dataset, project, and survey descriptions, plus on-line links to data and web resources
• Holds metadata according to regional (ANZLIC and “Blue Pages”) standards, with additional agency-constructed fields (“extended ANZLIC”)
• Web interface for searching and metadata contribution/update, using HTML, Oracle Web Server and custom PL/SQL application
• Produces lists of datasets, or dataset reports, as requested
• Includes links to pre-formatted data “packets” (now) and to SQuID (in future), for access to the data
NB: no data visualising capability, apart from “thumbnails” showing data extent
MarLIN - behind the scenes
• Some 25+ tables, holding the following:
  – text-based fields (e.g. title, abstract, contributors, references, etc.)
  – keywords, handled as numeric IDs (including taxonomic keywords)
  – species/species groups, handled as CAAB codes
  – spatial extent, handled as bounding coordinates (max. and min. latitude and longitude)
  – time extent, handled as earliest and latest collection date for items in the dataset
  – originator organisation, present custodian, survey, contact person, etc., handled as numeric IDs
• Initial search set up by keyword/ID type, spatial coordinates, time period (if desired)
• Then search/browse by subject categories, keywords, taxon names, contributing project, vessel/voyage identifier, location of data, etc.
• Free text search also supported
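Because spatial and time extents are stored as bounding values, the initial search reduces to two overlap tests per dataset. The following is a minimal sketch under stated assumptions: the class and function names are hypothetical, the dataset extents are invented, years stand in for collection dates, and boxes crossing the 180° meridian are not handled. The region values are the North West Shelf coordinates quoted in the search examples in this document.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetExtent:
    title: str
    north: float
    south: float
    west: float
    east: float
    first_year: int
    last_year: int


def overlaps(d: DatasetExtent, north: float, south: float, west: float,
             east: float, start_year: int, end_year: int) -> bool:
    """True if the dataset's box and period intersect the search box and period."""
    spatial = (d.south <= north and d.north >= south
               and d.west <= east and d.east >= west)
    temporal = d.first_year <= end_year and d.last_year >= start_year
    return spatial and temporal


# Invented extents, for illustration only.
datasets: List[DatasetExtent] = [
    DatasetExtent("Hypothetical survey A", -18, -20, 115, 118, 1990, 1990),
    DatasetExtent("Hypothetical survey B", -40, -44, 144, 149, 1991, 1991),
    DatasetExtent("Hypothetical survey C", -18, -20, 115, 118, 1985, 1988),
]

# Australian North West Shelf, 1990-1995.
hits = [d.title for d in datasets
        if overlaps(d, north=-17, south=-24, west=114, east=122,
                    start_year=1990, end_year=1995)]
print(hits)
```

Only survey A satisfies both tests here: B lies outside the region and C predates the period.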
MarLIN search interface
Example MarLIN search result - by taxonomic group
subject categories | custodian organisations | vessels | voyages | projects |
taxonomic groups | species | habitats | parameters | equipment
The following choices are presently available for MarLIN records in the selected region
and/or time period:
Start year: 1990
End year: 1995
Selected region: Australian North West Shelf (stored coordinates used: North=-17, West=114, South=-24,
East=122)
Click on any hyperlink to see the full listing for that item.
Invertebrates 4
. . . . Cephalopods 1
. . . . . . Squids 1
. . Crustaceans 2
. . . . Prawns & Shrimps 2
Fishes 4
. . Breams 1
. . Dories 1
. . Leatherjackets 1
. . Perches 3
. . Redfishes 1
. . Roughies 1
. . Snappers 4
. . Whales 1
Example MarLIN search result - by species
subject categories | custodian organisations | vessels | voyages | projects |
taxonomic groups | species | habitats | parameters | equipment
The following choices are presently available for MarLIN records in the selected region
and/or time period:
Start year: 1990
End year: 1995
Selected region: Australian North West Shelf (stored coordinates used: North=-17, West=114, South=-24,
East=122)
Click on any hyperlink to see the full listing for that item.
23 636004 Nototodarus gouldi .. Gould's squid 1
28 786002 Metanephrops boschmai .. Boschma's scampi 1
28 786005 Metanephrops velutinus .. velvet scampi 1
28 821001 Ibacus alticrenatus .. deepwater bug 1
28 821002 Ibacus pubescens .. [a shovel-nosed/slipper lobster] 1
37 118001 Saurida undosquamis .. brushtooth lizardfish 3
37 118016 Saurida sp. 2 [in Sainsbury et al, 1985] .. grey lizardfish 3
37 255004 Gephyroberyx darwinii .. Darwin's roughy 1
37 258002 Beryx splendens .. alfonsino 1
(etc.)
Example MarLIN search result - dataset titles
You searched on the following criteria:
Start year: 1990
End year: 1995
Selected region: Australian North West Shelf
CAAB Species: 37 118001 - Saurida undosquamis
There are 3 datasets matching your criteria in MarLIN at this time.
Click on the dataset title to view the metadata record for any dataset.
Southern Surveyor Voyage SS 02/90 - Biological Data Overview
Southern Surveyor Voyage SS 04/91 - Biological Data Overview
Southern Surveyor Voyage SS 08/95 - Biological Data Overview
------------------------------------------------------------------------
SQuID - data repository and visualisation tool
• Oracle relational database containing c. 45 tables (present version)
• Holds point, poly-line, and polygon based, geo-referenced data (also time and depth referenced)
• Client runs as Java applet, connects to Oracle data store by Remote Method Invocation (RMI) and JDBC
• Search by spatial coordinates, time period, data “stream” … can subset by survey if desired
• Retrieve atomic-level data for inspection or upload to user’s system
• Basic plotting routines provided, such as:
  – geographic distribution of data (sampling points, vessel tracks)
  – vertical plots (e.g. temperature, salinity, oxygen vs depth)
  – time-based plots (e.g. water temperature measurement through a voyage)
  – pie charts for catch composition by number or weight
  – length-frequency data, aggregated or by sex of individual
• Taxon handling using CAAB codes (system includes legacy data with obsolete codes)
• Links to MarLIN to display relevant metadata
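As an illustration of one of the plotting routines, the shares behind a catch-composition pie chart can be derived by summing atomic catch records per taxon code and dividing by the total. This is a sketch with invented weights and a hypothetical function name; SQuID itself performs this in its Java client, which is not shown here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def catch_composition(records: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Sum a quantity (number or weight) per taxon code and return each share."""
    totals: Dict[str, float] = defaultdict(float)
    for code, quantity in records:
        totals[code] += quantity
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {code: q / grand_total for code, q in totals.items()}


# (CAAB code, weight in kg) - weights are invented for illustration.
records = [
    ("37 118001", 6.0),
    ("37 258002", 2.0),
    ("37 118001", 2.0),
]

for code, share in catch_composition(records).items():
    print(f"{code}: {share:.0%}")
```

The same function serves for composition by number if counts are passed in place of weights.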
SQuID user interface - version 1.0
Example SQuID search result
SQuID atomic level data - example
Time series data in SQuID
SQuID vs MarLIN / CAAB - two different approaches
SQuID - a data-rich browser environment
• Large files uploaded to the browser to allow interactive functions (zoomable maps, on-demand display of sample details, cursor tracking, browser-generated plots)
• Disadvantages: more complex applet to load, longer waits for queries to be serviced, performance on user’s machine may be limiting
MarLIN & CAAB - a minimal browser environment
• No reliance on Java version control, browser plugins etc., no load time at startup
• All processing takes place on the server (can maximise performance there) - less stringent requirements for users in hardware terms
• Disadvantage: less real-time interactivity provided (although some workarounds possible)
… May look at a hybrid solution for SQuID v2 - prioritise what level of
interactivity/data upload is really needed, handle more at server level
Some considerations for OBIS ...
• For agency-specific reasons, we have arrived at separate metadata/data systems. OBIS might want to integrate these two aspects more fully
• Automated generation/maintenance of metadata might be possible (at least in part) and is certainly desirable
• Where would OBIS metadata reside? (centrally or replicated or fully distributed?) - Australian “ASDD” is an example of a fully distributed system, NASA “GCMD” is a centralised one
• Need to decide on taxon handling for OBIS (names or codes), plus standard(s) for higher level searching
• OBIS software should aim to tolerate a diversity of agency-level systems, while encouraging/facilitating “best practice” data management
The End
CAAB web search