The Paleobiology Database - Museum für Naturkunde: Museum intern

The Paleobiology Database
A Hands-on Tutorial on Estimating Fossil Diversity Patterns
Wolfgang Kiessling, 25 September 2012
Program
09:00 – 09:20  Computer-Hookup, Intro
09:20 – 10:00  Background and Rationale
10:00 – 10:45  Basic Features
10:45 – 11:00  Break
11:00 – 11:30  Advanced Features
11:30 – 12:30  Diversity Through Time
12:30 – 13:30  Lunch
13:30 – 14:00  Sampling-Standardized Diversity Curves with the PBDB
14:00 – 15:00  Data Entry Trial
Important Resources
- Course Materials
  - http://download.naturkundemuseum-berlin.de/wolfgang.kiessling/Workshop
- Database Servers
  - http://paleodb.org
  - http://paleodb.geology.wisc.edu/
Background and Rationale
- The Age of Biodiversity Informatics
  - Scope of modern biodiversity facilities
- A brief history of the PaleoDB
  - The scientific question it sought to answer
  - The evolution since then
The Age of Biodiversity Informatics
- Biodiversity Informatics: An emerging discipline in the broader field of Bioinformatics aiming at information capture, storage, retrieval, and analysis of biodiversity data
- The Age: Biodiversity research with increasing worldwide attention and funding, especially for large-scale approaches

Biodiversity Initiatives
- National biodiversity centers being established worldwide, usually highly interdisciplinary
  - Science driven
  - Discovery/outreach driven
  - Policy driven
- International consortia
  - Infrastructure: GBIF, OBIS
  - Policy: Intergovernmental Platform on Biodiversity & Ecosystem Services (http://www.ipbes.net)
- Where is Paleo?
GBIF and Allies
- The Global Biodiversity Information Facility (GBIF) was founded in 2001
- Mission: facilitate free and open access to biodiversity data worldwide, via the Internet, to underpin sustainable development
- Priorities:
  - Mobilising biodiversity data
  - Developing protocols and standards
  - Building an informatics architecture
- www.gbif.org
- 271 × 10^6 georeferenced records available
- GBIF promotes data-sharing with countries of origin.
Use of GBIF data
- Predict biotic effects of climate change
- Analyse and predict spread of pests and diseases of humans, crops, livestock, wildlife, etc.
- Predict best places to set up new protected areas
- Analyse invasive species and predict invasion pathways
- Provide policy-maker-relevant data of all kinds
- Be a resource for biodiversity science communities
Paleo to be Integrated at Multiple Scales
- Short time scales: Natural baselines, ecological consequences of climate change → Conservation Palaeobiology
- Long time scales: General principles of biodiversity regulation, response to extreme events → Analytical Palaeobiology

The Paleobiology Database: A Core Infrastructure for the Biogeosciences
- Founded in 2000, funded by NSF (2000-2005, 2010-) and other sources
- Driven by a scientific question
  - Was the rise of marine biodiversity in the last 200 myr as dramatic as suggested by compendia of stratigraphic ranges?
  - Collect occurrence data, apply sampling standardization and use fossil data only
- http://paleodb.org
Phanerozoic Marine Animal Diversity
[Figure: number of genera (0 to 4000) against age (Ma), Cambrian to Neogene. Data from Sepkoski (2002, Bull. Am. Pal.)]
Exponential post-Paleozoic rise?
What is wrong with Sepkoski?
- Data are just times of first and last appearances in the record (genera and families)
  - No way to standardize for sampling
  - Extreme effect of the Pull of the Recent
Marine Biodiversity Through Time
[Figure, two panels: number of genera against age (Ma), Cambrian to Neogene]
- Old: Exponential post-Paleozoic rise. Data from Sepkoski (2002, Bull. Am. Pal.)
- New: Logistic post-Triassic rise. Alroy et al. (2008, Science)
Structure of Compendia
Genus         Ori-P  Ori-Stg   Ext-P  Ext-Stg   Reference
Acropora      T      (Eo-l)    - R    *
Actinacis     K      (Ceno-l)  - T    (Mi-l-l)  (2)
Actinaraea    J      (Sine)    - K    (Apti)    (819,821,825,919)
Actinastraea  Tr     (Ladi-u)  - K    (Maes)    (817)
Ancliffia     J      (Bath)
Anodontia     T      (Than)    - R    *         (705)
Corals and bivalves from Sepkoski's compendium of marine animal genera (2002)
Evolution of the PaleoDB: New Horizons
- Biogeographic Questions
  - Implementation of Scotese's plate tectonic reconstructions
- Extending taxonomic/environmental scope
  - Vertebrate, paleobotany, and micropaleontology research groups
  - Link to Neptune Database (Ocean Drilling)
- Beyond Diversity
  - Communities over time
  - Environmental preferences
  - Geodisparity
  - Body-size distributions
  - Geological Drivers
Basic Features of the PBDB
- Organization
- Structure
- Finding data
- Drawing maps
- Downloading data

Organization
- Database Coordinator: John Alroy (Macquarie University)
- Informal core group running mirror servers (3 persons)
- Data Management Committee (10)
- Data Contributors: Professional scientists (usually with PhD) (132)
- Data Enterers: Contributors and students (310)
The Structure
- Basic information is the occurrence of a particular taxon (species, genus or higher) in a particular collection (i.e. sample or outcrop …)
- References linked to occurrences and collections
- Geographic and geologic context stored with each collection
- Taxa classified according to multiple opinions (synonymies, re-identifications)
Finding Data
- Generate data summary tables
  - Menu: Analyze
  - Task: Marine Invertebrate Collections by Geological Period
- Find collections
  - Menu: Full search – Fossil collection records
  - Task: Find all collections containing lithistid sponges (Lithistida) in Germany
- Find taxa
  - Menu: Full search – Fossil organisms
  - Task: Get the full synonymy list of Brachiosaurus brancai
Drawing Maps
- Draw fossil collections on a plate tectonic reconstruction of the appropriate age
  - Menu: Analyze
  - Tasks: 1. Get a map of Jurassic reefs in a Mollweide projection. 2. Identify the westernmost reef and get a list of fossils

Downloading Data
- The most important step for further analyses
- Virtually all data in the PaleoDB are open access
- Downloads in csv format can be read by almost any program
- Menu: Download
- Task: Download all occurrences of Triassic sponges with coordinates/paleocoordinates, stage-level resolution and full taxonomic information
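
As a minimal sketch of what "read by almost any program" can look like, the following Python snippet parses a small piece of CSV text and tallies occurrences per genus. The column names (`collection_no`, `genus`, `stage`, `paleolat`, `paleolng`) and the sample rows are hypothetical placeholders; an actual PaleoDB download uses its own headers, which would have to be substituted.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt standing in for a downloaded occurrence file.
SAMPLE_CSV = """collection_no,genus,stage,paleolat,paleolng
101,Genus_a,Carnian,12.5,30.1
101,Genus_b,Carnian,12.5,30.1
102,Genus_a,Norian,-5.0,44.2
103,Genus_c,Norian,8.3,51.7
"""

def read_occurrences(handle):
    """Return a list of dicts, one per occurrence row."""
    return list(csv.DictReader(handle))

def occurrences_per_genus(rows, genus_column="genus"):
    """Count how many occurrence rows each genus has."""
    return Counter(row[genus_column] for row in rows)

rows = read_occurrences(io.StringIO(SAMPLE_CSV))
counts = occurrences_per_genus(rows)
print(counts.most_common())
```

For a real file, `io.StringIO(SAMPLE_CSV)` would be replaced by `open(path, newline="")`.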
Playtime + Break
Advanced Features
- Ecological metrics of collections
  - Diversity and others
- Confidence intervals of stratigraphic ranges
  - Within sections and global
- Diversity curve generator
  - Raw and sampling standardized
Ecological Metrics
- Get alpha diversity and ecological data from a collection
  - Menu: Analyze abundance data
  - Task: Get the metrics of a Triassic community from China (e.g. collection #31618) and look at feeding modes

Background of Diversity Metrics
Which community is more diverse?
Measuring Alpha Diversity
- Shannon-Wiener Information Index (H)
  - $H = -\sum_{i=1}^{S} p_i \ln(p_i)$
  - $p_i$ = proportion of the $i$th species in the community; $S$ = number of species
  - Mixed signal of richness and evenness
- Evenness (J)
  - $J = H / H_{max}$
  - $H_{max} = \ln(S)$
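
The two indices above can be computed directly from a list of species abundances. The snippet below is a minimal sketch; the function names and the two example communities are made up for illustration, and returning 0 for evenness when fewer than two species are present is a convention chosen here, not one stated in the course.

```python
import math

def shannon_index(abundances):
    """Shannon-Wiener index H = -sum(p_i * ln(p_i)) over species with counts > 0."""
    counts = [n for n in abundances if n > 0]
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts)

def evenness(abundances):
    """Evenness J = H / Hmax with Hmax = ln(S); set to 0 here when S < 2."""
    richness = sum(1 for n in abundances if n > 0)
    if richness < 2:
        return 0.0
    return shannon_index(abundances) / math.log(richness)

# Two hypothetical communities with the same richness (4 species, 100 individuals).
community_a = [25, 25, 25, 25]  # perfectly even
community_b = [85, 5, 5, 5]     # dominated by one species
for name, community in (("A", community_a), ("B", community_b)):
    print(name, round(shannon_index(community), 3), round(evenness(community), 3))
```

Both communities have S = 4, so any difference in H comes from evenness alone, which illustrates why H is a mixed signal.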
Rarefaction
- Which species richness would I observe if my sample A were smaller than it is (e.g., as small as sample B)?
- Mathematical solution:
  $E(S_n) = \sum_{i=1}^{S} \left[ 1 - \binom{N - N_i}{n} \Big/ \binom{N}{n} \right]$
  where $N$ is the number of individuals in the sample, $N_i$ the number of individuals of species $i$, $S$ the number of species and $n$ the size of the subsample
- Empirical solution:
  - Let the computer draw specimens at random and get diversity for a given sample size
[Figure: rarefaction curves, species (0 to 60) against individuals (0 to 600), for a Neogene and a Jurassic sample]
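
Both routes to a rarefied richness can be sketched in a few lines of Python. The first function evaluates the expected number of species in a subsample of n individuals from binomial coefficients; the second draws specimens at random without replacement and averages the observed richness. Function names, the number of trials and the example abundances are assumptions made for illustration.

```python
import random
from math import comb

def rarefied_richness(abundances, n):
    """Expected species count E(S_n) in a random subsample of n individuals."""
    total = sum(abundances)
    if not 0 <= n <= total:
        raise ValueError("n must lie between 0 and the total number of individuals")
    all_draws = comb(total, n)
    # comb(total - count, n) is 0 when fewer than n individuals remain,
    # i.e. the species is certain to appear in the subsample.
    return sum(1 - comb(total - count, n) / all_draws for count in abundances)

def simulated_richness(abundances, n, trials=2000, seed=1):
    """Empirical counterpart: draw n individuals without replacement, many times."""
    rng = random.Random(seed)
    individuals = [species for species, count in enumerate(abundances)
                   for _ in range(count)]
    observed = [len(set(rng.sample(individuals, n))) for _ in range(trials)]
    return sum(observed) / trials

sample_a = [50, 30, 10, 5, 3, 1, 1]  # hypothetical abundances, 100 individuals
print(rarefied_richness(sample_a, 20), simulated_richness(sample_a, 20))
```

With enough trials the simulated value converges on the analytical one, so the choice between them is mainly one of convenience.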
Confidence Intervals of Stratigraphic Ranges
- The first and last observations of a taxon in the fossil record must be younger and older than its time of origination and extinction, respectively
  - By how much?
- Quantifying uncertainties within sections and globally
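
The slides do not say which method is used to answer "by how much". One classical option, shown here only as an illustration, is the range extension of Strauss & Sadler (1989), which assumes that fossil horizons are randomly distributed within the true range: for an observed range R known from H horizons, one end of the range is extended by R · [(1 − C)^(−1/(H−1)) − 1] at confidence level C. The function name and example values below are hypothetical.

```python
def range_extension(observed_range, horizons, confidence=0.95):
    """One-sided range extension after Strauss & Sadler (1989).

    Assumes randomly distributed fossil horizons; needs at least two horizons.
    """
    if horizons < 2:
        raise ValueError("at least two fossil horizons are required")
    if not 0 < confidence < 1:
        raise ValueError("confidence must lie strictly between 0 and 1")
    alpha = (1 - confidence) ** (-1 / (horizons - 1)) - 1
    return observed_range * alpha

# Hypothetical taxon: observed range of 10 myr, found in 10 horizons.
print(range_extension(10.0, 10, 0.95))
# The same range documented by more horizons needs a smaller extension.
print(range_extension(10.0, 50, 0.95))
```

The extension shrinks quickly as the number of horizons grows, which is why densely sampled ranges are preferred for calibration.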
Draw a Stratigraphic Section
- Menu: Analyze stratigraphic sections
- Task: Try the Bangtoupo section in China

Using the fossil record for molecular clocks
- Calibration: Estimate the branching points of two sister groups
- Menu: Analyze – Calculate a first appearance
- Task: Branching point between Acropora and Montipora

Diversity Through Time
- Theoretical Background
  - Counting methods
  - Sampling issues
  - Sampling standardization
- Hands on with R
Counting Diversity Through Time
[Figure: ranges of five taxa relative to one time interval]
- A: Through ranging
- B: Through ranging
- C: Extinct
- D: Originating
- E: Singleton

Measuring Diversity
[Figure: ranges of five taxa relative to one time interval]
- A: Through ranging
- B: Extinct
- C: Originating
- D: Singleton
- E: Through ranging
- Boundary crossers: 3
- Range through: 5
- Range through minus singletons: 4
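
The counts on this slide can be reproduced with a short Python sketch. Each taxon is reduced to its first and last time bin (bins numbered from oldest to youngest), classified relative to a focal bin, and then tallied. The bin numbers are hypothetical values chosen to match the categories above, and boundary crossers are counted here at the lower boundary of the focal bin, which is an assumption about how the figure was drawn.

```python
def classify(first_bin, last_bin, focal_bin):
    """Classify a taxon's range relative to one focal time bin."""
    if last_bin < focal_bin or first_bin > focal_bin:
        return "absent"
    if first_bin < focal_bin < last_bin:
        return "through ranging"
    if first_bin < focal_bin:
        return "extinct"       # entered from below, last seen in the focal bin
    if last_bin > focal_bin:
        return "originating"   # first seen in the focal bin, continues above
    return "singleton"         # confined to the focal bin

def diversity_counts(ranges, focal_bin):
    """Tally the three range-based diversity counts for one bin."""
    kinds = [classify(first, last, focal_bin) for first, last in ranges.values()]
    present = [kind for kind in kinds if kind != "absent"]
    return {
        "range through": len(present),
        "range through minus singletons": len(present) - present.count("singleton"),
        "boundary crossers": present.count("through ranging") + present.count("extinct"),
    }

# Hypothetical (first bin, last bin) pairs: A and E through ranging,
# B extinct, C originating, D singleton, all relative to bin 2.
ranges = {"A": (1, 3), "B": (1, 2), "C": (2, 3), "D": (2, 2), "E": (1, 3)}
print(diversity_counts(ranges, 2))
```

Counting at the upper boundary instead (through ranging plus originating) gives the same number of boundary crossers in this example.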
Measuring Diversity Through Time
[Table: occurrences (x) of taxa A to E in time bins 1 to 4]
Draw Diversity Curves: SIB, range through, range through minus singletons, boundary crossers
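
A minimal Python sketch of this exercise is given below. It derives all four curves from a taxon-by-bin occurrence table. The occurrence data are hypothetical and do not reproduce the table on the slide; singletons are taken to be taxa whose whole range is a single bin, and boundary crossers are counted at the lower boundary of each bin.

```python
def diversity_curves(occurrences, bins):
    """Return {metric: [value per bin]} from a taxon -> set-of-sampled-bins mapping."""
    ranges = {taxon: (min(found), max(found))
              for taxon, found in occurrences.items() if found}
    curves = {name: [] for name in (
        "SIB", "range through", "range through minus singletons", "boundary crossers")}
    for b in bins:
        ranging = [(first, last) for first, last in ranges.values() if first <= b <= last]
        singletons = sum(1 for first, last in ranging if first == last)
        curves["SIB"].append(sum(1 for found in occurrences.values() if b in found))
        curves["range through"].append(len(ranging))
        curves["range through minus singletons"].append(len(ranging) - singletons)
        curves["boundary crossers"].append(sum(1 for first, _ in ranging if first < b))
    return curves

# Hypothetical occurrences; bins are numbered from oldest (1) to youngest (4).
occurrences = {
    "A": {1, 2, 4},   # not sampled in bin 3, but ranges through it
    "B": {1, 2},
    "C": {2, 3, 4},
    "D": {3},         # singleton
    "E": {1, 4},
}
for metric, values in diversity_curves(occurrences, [1, 2, 3, 4]).items():
    print(metric, values)
```

Sampled-in-bin diversity drops in bin 3 because two taxa are simply not sampled there, while the range-through curve fills those gaps.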
Sampling Standardization of Time Series Data
[Figure: three samples with apparent richness "2 or 3", "Perhaps 2" and "Sure 2"; rarefaction to 3 specimens gives 2.23, 1.89 and 2]
This is sufficient for sampled-in-bin (SIB) diversity, but it is silent on extinctions
Diversity Over Time
Omit Singletons
[Figure values, three per row]
S = 2, Ext = 0 | S = 2, Ext = 0 | S = 2, Ext = 2
S = 1, Ext = 0 | S = 1, Ext = 0 | S = 1, Ext = 1
S = 2, Ext = 0 | S = 3, Ext = 1 | S = 2, Ext = 2
S = 1.67, Ext = 0 | S = 1.67, Ext = 0.33 | S = 1.67, Ext = 1.67
Subsampling Methods
- Classical Rarefaction
  - Pool all occurrence data
  - Randomly draw data until quota is reached
- Unweighted by-list subsampling (UW)
  - Pool collections
  - Randomly draw collections until quota of collections is reached
- Occurrences weighted by-list subsampling (OW)
  - Pool occurrences by collections
  - Randomly draw collections until quota of occurrences is reached
- Occurrences-exponentiated weighted by-list subsampling (OexpW)
  - Pool occurrences by collections
  - Randomly draw collections until weighted quota of occurrences is reached
- Shareholder Quorum Method
  - Sampling until a particular proportion (quorum) of the rank-abundance distribution has been sampled
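
The two simplest by-list methods can be sketched in Python as follows. Collections are drawn at random without replacement until either a quota of collections (UW) or a quota of occurrences (OW) is reached, and the richness of the drawn pool is averaged over many trials. This is a simplified illustration, not the PaleoDB implementation: the function name, the number of trials and the example collections are hypothetical, and the last collection drawn is always kept whole even if it overshoots the occurrence quota.

```python
import random

def subsample_by_list(collections, quota, weighted=True, trials=200, seed=0):
    """Mean richness after drawing whole collections at random without replacement.

    weighted=False (UW): stop once `quota` collections have been drawn.
    weighted=True  (OW): stop once the drawn collections hold at least `quota` occurrences.
    """
    rng = random.Random(seed)
    richness = []
    for _ in range(trials):
        order = rng.sample(collections, len(collections))  # random draw order
        drawn_taxa, used = set(), 0
        for taxa in order:
            if used >= quota:
                break
            drawn_taxa.update(taxa)
            used += len(taxa) if weighted else 1
        richness.append(len(drawn_taxa))
    return sum(richness) / trials

# Hypothetical collections (taxonomic lists) from one time bin.
collections = [["a", "b", "c"], ["a"], ["b", "d"], ["e"], ["a", "b", "c", "d", "e", "f"]]
print("UW, 2 collections:", subsample_by_list(collections, 2, weighted=False))
print("OW, 5 occurrences:", subsample_by_list(collections, 5, weighted=True))
```

Under UW one long list counts as much as one short list, whereas OW lets long lists use up the quota faster, which is the main difference between the two.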
Why so many?
- Rarefaction assumes that differences in diversity are due to sampling
- We might lose biological signal by attempting to sampling-standardize if we don't consider evenness
  - If evenness of communities is different, then rarefaction will mostly reflect these differences
- The best subsampling method has to consider several biases
Lunch
Hands-On with the PaleoDB
- Create a subsampled diversity curve with the online scripts
  - Download a dataset and use the function: Generate diversity curve data

Analyze Downloaded Data with R
- Open R
- Run the script PBDB_analyze.R

Occurrence Data Now and Then
Occurrence counts per 1° grid
We Want You!
- The Paleobiology Database is from the community for the community
- Data quantity and quality need to be improved to increase rigor and scope of analyses
- Many important questions are yet to be addressed
- http://paleodb.org
How to Enter Data
Give it a try
- testpaleodb.geology.wisc.edu
- Login as Contributor:
  - Authorizer: User60x, T.
  - Enterer: User60x, T.
  - Password: Berlin
The Paleobiology Database (PaleoDB, www.paleodb.org) has
been rapidly developing into a core infrastructure for
palaeontology. Thanks to the participation of 289 contributors
from 22 countries, the PaleoDB now holds taxonomic and
distributional information on 217,000 taxa and more than one
million fossil occurrences. With 150 official publications, the
scientific output is impressive, but it could grow further if more
colleagues learned how to use the database for their own
research.
The purpose of this course is thus to familiarize paleontologists
with the structure and scope of the PaleoDB and to introduce
them to its analytical tools that are available online. Examples
will be provided for paleo-community analysis, confidence
intervals on stratigraphic ranges, and global and regional
diversity patterns. Basic statistical concepts will be explained
briefly, but the focus is on practicing with the database.