The Paleobiology Database
A Hands-on Tutorial on Estimating Fossil Diversity Patterns
Wolfgang Kiessling, 25 September 2012

Program
  09:00 – 09:20  Computer Hookup, Intro
  09:20 – 10:00  Background and Rationale
  10:00 – 10:45  Basic Features
  10:45 – 11:00  Break
  11:00 – 11:30  Advanced Features
  11:30 – 12:30  Diversity Through Time
  12:30 – 13:30  Lunch
  13:30 – 14:00  Sampling-Standardized Diversity Curves with the PBDB
  14:00 – 15:00  Data Entry Trial

Important Resources
  Course Materials
    http://download.naturkundemuseum-berlin.de/wolfgang.kiessling/Workshop
  Database Servers
    http://paleodb.org
    http://paleodb.geology.wisc.edu/

Background and Rationale
  The Age of Biodiversity Informatics
    Scope of modern biodiversity facilities
  A brief history of the PaleoDB
    The scientific question it sought to answer
    The evolution since then

The Age of Biodiversity Informatics
  Biodiversity Informatics: An emerging discipline in the broader field of bioinformatics, aiming at the capture, storage, retrieval, and analysis of biodiversity data
  The Age: Biodiversity research is receiving increasing worldwide attention and funding, especially for large-scale approaches

Biodiversity Initiatives
  National biodiversity centers are being established worldwide, usually highly interdisciplinary
    Science driven
    Discovery/outreach driven
    Policy driven
  International consortia
    Infrastructure: GBIF, OBIS
    Policy: Intergovernmental Platform on Biodiversity & Ecosystem Services (http://www.ipbes.net)
  Where is Paleo?

GBIF and Allies
  The Global Biodiversity Information Facility (GBIF) was founded in 2001
  Mission: facilitate free and open access to biodiversity data worldwide, via the Internet, to underpin sustainable development
  Priorities:
    Mobilising biodiversity data
    Developing protocols and standards
    Building an informatics architecture
  www.gbif.org
  271 × 10^6 georeferenced records available
  GBIF promotes data-sharing with countries of origin.

Use of GBIF Data
  Predict biotic effects of climate change
  Analyse and predict the spread of pests and diseases of humans, crops, livestock, wildlife, etc.
  Predict the best places to set up new protected areas
  Analyse invasive species and predict invasion pathways
  Provide policy-maker-relevant data of all kinds
  Be a resource for biodiversity science communities

Paleo to be Integrated at Multiple Scales
  Short time scales: Natural baselines, ecological consequences of climate change
    Conservation Palaeobiology
  Long time scales: General principles of biodiversity regulation, response to extreme events
    Analytical Palaeobiology

The Paleobiology Database: A Core Infrastructure for the Biogeosciences
  Founded in 2000, funded by NSF (2000-2005, 2010-) and other sources
  Driven by a scientific question
    Was the rise of marine biodiversity in the last 200 myr as dramatic as suggested by compendia of stratigraphic ranges?
  Collect occurrence data, apply sampling standardization and use fossil data only
  http://paleodb.org

[Figure: Phanerozoic Marine Animal Diversity. Number of genera against age (Ma), Cambrian to Neogene. Exponential post-Paleozoic rise? Data from Sepkoski (2002, Bull. Am. Pal.)]

What is wrong with Sepkoski?
  Data are just times of first and last appearances in the record (genera and families)
  No way to standardize for sampling
  Extreme effect of the Pull of the Recent

[Figure: Marine Biodiversity Through Time. Number of genera against age (Ma), Cambrian to Neogene, two panels. Old: exponential post-Paleozoic rise, data from Sepkoski (2002, Bull. Am. Pal.). New: logistic post-Triassic rise, Alroy et al. (2008, Science).]

Structure of Compendia
  Genus         Ori-P  Ori-Stg   Ext-P  Ext-Stg   Reference
  Acropora      T      (Eo-l)    R
  Actinacis     K      (Ceno-l)  T      (Mi-l-l)  (2)
  Actinaraea    J      (Sine)    K      (Apti)    (819,821,825,919)
  Actinastraea  Tr     (Ladi-u)  K      (Maes)    (817)
  Ancliffia     J      (Bath)
  Anodontia     T      (Than)    R                * (705)
  Corals and bivalves from Sepkoski's compendium of marine animal genera (2002)

Evolution of the PaleoDB: New Horizons
  Biogeographic Questions
  Extending taxonomic/environmental scope
    Vertebrate, paleobotany, and micropaleontology research groups
    Link to Neptune Database (Ocean Drilling)
  Beyond Diversity
  Implementation of Scotese's plate tectonic reconstructions
  Communities over time
  Environmental preferences
  Geodisparity
  Body-size distributions
  Geological Drivers

Basic Features of the PBDB
  Organization
  Structure
  Finding data
  Drawing maps
  Downloading data

Organization
  Database Coordinator: John Alroy (Macquarie University)
  Informal core group running mirror servers (3 persons)
  Data Management Committee (10)
  Data Contributors: Professional scientists (usually with PhD) (132)
  Data Enterers: Contributors and students (310)

The Structure
  The basic unit of information is the occurrence of a particular taxon (species, genus or higher) in a particular collection (i.e. sample or outcrop …)
  References are linked to occurrences and collections
  Geographic and geologic context is stored with each collection
  Taxa are classified according to multiple opinions (synonymies, re-identifications)

Finding Data
  Generate data summary tables
    Menu: Analyze
    Task: Marine Invertebrate Collections by Geological Period
  Find collections
    Menu: Full search – Fossil collection records
    Task: Find all collections containing lithistid sponges (Lithistida) in Germany
  Find taxa
    Menu: Full search – Fossil organisms
    Task: Get the full synonymy list of Brachiosaurus brancai

Drawing Maps
  Draw fossil collections on a plate tectonic reconstruction of the appropriate age
  Menu: Analyze
  Tasks:
    1. Get a map of Jurassic reefs in a Mollweide projection.
    2. Identify the westernmost reef and get a list of its fossils

Downloading Data
  The most important step for further analyses
  Virtually all data in the PaleoDB are open access
  Downloads in csv format can be read by almost any program
  Menu: Download
  Task: Download all occurrences of Triassic sponges with coordinates/paleocoordinates, stage-level resolution and full taxonomic information

Playtime + Break

Advanced Features
  Ecological metrics of collections
    Diversity and others
  Confidence intervals of stratigraphic ranges
    Within sections and global
  Diversity curve generator
    Raw and sampling standardized

Ecological Metrics
  Get alpha diversity and ecological data from a collection
  Menu: Analyze abundance data
  Task: Get the metrics of a Triassic community from China (e.g. collection #31618) and look at the feeding modes

Background of Diversity Metrics
  Which community is more diverse?

Measuring Alpha Diversity
  Shannon-Wiener Information Index: $H = -\sum_{i=1}^{S} p_i \ln(p_i)$
    $p_i$ = proportion of the $i$th species in the community, $S$ = number of species
    Mixed signal of richness and evenness
  Evenness: $J = H / H_{max}$, with $H_{max} = \ln(S)$

Rarefaction
  Which species richness would I observe if my sample A were smaller than it is (e.g., as small as sample B)?
  Mathematical solution:
    $E(S_n) = \sum_{i=1}^{S} \left[ 1 - \binom{N - N_i}{n} \Big/ \binom{N}{n} \right]$
    $N$ = total number of individuals, $N_i$ = number of individuals of species $i$, $n$ = size of the smaller sample
  Empirical solution: Let the computer draw specimens at random and get diversity for a given sample size
  [Figure: Rarefaction curves, species against individuals, for a Neogene and a Jurassic sample]

Confidence Intervals of Stratigraphic Ranges
  The first and last observations of a taxon in the fossil record must be younger and older than its time of origination and extinction, respectively
  By how much?
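One classical way to put a number on "by how much" is the range-extension formula of Strauss and Sadler (1989) and Marshall (1990), which assumes that the fossil horizons of a taxon are scattered randomly and uniformly through its true range. The slides do not say which method the PBDB tool applies, so the Python sketch below is only an illustration under that assumption; the function names and the example ages are hypothetical.

```python
def range_extension_fraction(n_horizons, confidence=0.95):
    """Extension at ONE end of a range, as a fraction of the observed range.

    Assumes fossil horizons are randomly and uniformly distributed within
    the true range (an assumption of this sketch, not stated on the slides).
    """
    if n_horizons < 2:
        raise ValueError("need at least two fossil horizons")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    return (1.0 - confidence) ** (-1.0 / (n_horizons - 1)) - 1.0


def extinction_bound_ma(first_ma, last_ma, n_horizons, confidence=0.95):
    """Youngest plausible extinction age in Ma (ages count back from today)."""
    observed_range = first_ma - last_ma
    if observed_range <= 0:
        raise ValueError("first_ma must be older (larger) than last_ma")
    extension = range_extension_fraction(n_horizons, confidence) * observed_range
    # An age cannot be younger than the present day (0 Ma).
    return max(0.0, last_ma - extension)


if __name__ == "__main__":
    # Hypothetical taxon: observed from 200 Ma to 150 Ma at 10 horizons.
    print(range_extension_fraction(10))
    print(extinction_bound_ma(200.0, 150.0, 10))
```

The fraction shrinks quickly as more horizons are sampled, which is why densely sampled taxa have tight range end points. The bound on origination is symmetric: add the same extension to the first appearance.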
  Quantifying uncertainties within sections and globally

Draw a Stratigraphic Section
  Menu: Analyze stratigraphic sections
  Task: Try the Bangtoupo section in China

Using the Fossil Record for Molecular Clocks
  Calibration: Estimate the branching points of two sister groups
  Menu: Analyze – Calculate a first appearance
  Task: Branching point between Acropora and Montipora

Diversity Through Time
  Theoretical Background
    Counting methods
    Sampling issues
    Sampling standardization
  Hands on with R

Counting Diversity Through Time
  A  Through ranging
  B  Through ranging
  C  Extinct
  D  Originating
  E  Singleton

Measuring Diversity
  A  Through ranging
  B  Extinct
  C  Originating
  D  Singleton
  E  Through ranging
  Boundary crossers: 3
  Range through: 5
  Range through minus singletons: 4

Measuring Diversity Through Time
  [Table: occurrences (x) of taxa A to E in time bins 1 to 4]
  Draw diversity curves: SIB, range through, range through minus singletons, boundary crossers

Sampling Standardization of Time Series Data
  [Figure: three samples labelled "2 or 3", "Perhaps 2" and "Sure 2"; Rarefaction (3) values 2.23, 1.89 and 2]
  This is sufficient for sampled-in-bin (SIB) diversity, but silent on extinctions

Diversity Over Time
  Omit singletons
  S = 2, Ext = 0        S = 2, Ext = 0        S = 2, Ext = 2
  S = 1, Ext = 0        S = 1, Ext = 0        S = 1, Ext = 1
  S = 2, Ext = 0        S = 3, Ext = 1        S = 2, Ext = 2
  S = 1.67, Ext = 0     S = 1.67, Ext = 0.33  S = 1.67, Ext = 1.67

Subsampling Methods
  Classical Rarefaction
    Pool all occurrence data
    Randomly draw data until quota is reached
  Unweighted by-list subsampling (UW)
    Pool collections
    Randomly draw collections until quota of collections is reached
  Occurrences weighted by-list subsampling (OW)
    Pool occurrences by collections
    Randomly draw collections until quota of occurrences is reached
  Occurrences-exponentiated weighted by-list subsampling (OexpW)
    Pool occurrences by collections
    Randomly draw collections until weighted quota of occurrences is reached
  Shareholder Quorum Method
    Sampling until a particular proportion (quorum) of the rank-abundance distribution has been sampled

Why so many?
  Rarefaction assumes that differences in diversity are due to sampling
  We might lose biological signal by attempting to sampling-standardize if we don't consider evenness
  If the evenness of communities differs, then rarefaction will mostly reflect these differences
  The best subsampling method has to consider several biases

Lunch

Hands-On with the PaleoDB
  Create a subsampled diversity curve with the online scripts
    Download a dataset and use the function: Generate diversity curve data
  Analyze Downloaded Data with R
    Open R
    Run the script PBDB_analyze.R

Occurrence Data Now and Then
  [Figure: Occurrence counts per 1° grid]

We Want You!
  The Paleobiology Database is from the community for the community
  Data quantity and quality need to be improved to increase rigor and scope of analyses
  Many important questions are yet to be addressed
  http://paleodb.org

How to Enter Data
  Give it a try: testpaleodb.geology.wisc.edu
  Login as Contributor:
    Authorizer: User60x, T.
    Enterer: User60x, T.
    Password: Berlin

The Paleobiology Database (PaleoDB, www.paleodb.org) has been rapidly developing into a core infrastructure for palaeontology. Thanks to the participation of 289 contributors from 22 countries, the PaleoDB now holds taxonomic and distributional information on 217,000 taxa and more than one million fossil occurrences. With 150 official publications, the scientific output is impressive, but it could be improved if more colleagues learned how to use the database for their own research. The purpose of this course is thus to familiarize paleontologists with the structure and scope of the PaleoDB and to introduce them to its analytical tools that are available online. Examples will be provided for paleo-community analysis, confidence intervals on stratigraphic ranges, and global and regional diversity patterns. Basic statistical concepts will be explained briefly, but the focus is on practicing with the database.
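
As a worked companion to the Measuring Alpha Diversity and Rarefaction slides, the following Python sketch computes the Shannon-Wiener index H, the evenness J, and the expected richness E(S_n) from a list of specimen counts per species. It also includes the "empirical solution" of letting the computer draw specimens at random. The course itself uses R (PBDB_analyze.R, whose contents are not shown here), so Python serves purely as illustration; all function names and the two example communities are hypothetical.

```python
import random
from math import comb, log


def shannon(counts):
    """Shannon-Wiener index H = -sum(p_i * ln(p_i)) over species with count > 0."""
    total = sum(counts)
    return -sum((c / total) * log(c / total) for c in counts if c > 0)


def evenness(counts):
    """Evenness J = H / Hmax, with Hmax = ln(S) and S the number of species."""
    richness = sum(1 for c in counts if c > 0)
    if richness < 2:
        raise ValueError("evenness needs at least two species")
    return shannon(counts) / log(richness)


def rarefied_richness(counts, n):
    """Expected number of species in a random subsample of n specimens."""
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("n must lie between 0 and the number of specimens")
    all_draws = comb(total, n)
    # For each species: 1 minus the probability that it is missed entirely.
    return sum(1 - comb(total - c, n) / all_draws for c in counts if c > 0)


def simulated_richness(counts, n, trials=2000, seed=0):
    """Empirical rarefaction: mean richness over repeated random draws."""
    rng = random.Random(seed)
    specimens = [species for species, c in enumerate(counts) for _ in range(c)]
    drawn = sum(len(set(rng.sample(specimens, n))) for _ in range(trials))
    return drawn / trials


if __name__ == "__main__":
    # Two hypothetical communities with the same richness and sample size.
    communities = {"even": [25, 25, 25, 25], "uneven": [85, 5, 5, 5]}
    for name, counts in communities.items():
        print(name, shannon(counts), evenness(counts),
              rarefied_richness(counts, 10), simulated_richness(counts, 10))
```

Both example communities hold four species in 100 specimens, yet the uneven one is expected to yield fewer species in a subsample of 10. This is the effect behind the warning that rarefaction largely reflects evenness when communities differ in evenness.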
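
The counting methods from the Measuring Diversity slide can be sketched in a few lines of Python. The sketch assumes that each taxon is reduced to the set of time bins in which it occurs, that bins are numbered from oldest to youngest, and that boundary crossers are counted at the base of the focal bin. These conventions, the function name, and the toy ranges (chosen to mirror taxa A to E on the slide, with E deliberately unsampled in bin 2) are assumptions made for illustration.

```python
def diversity_counts(occurrences, bin_id):
    """Count diversity in one time bin with four common methods.

    occurrences maps each taxon to the set of bins (integers, oldest = lowest)
    in which it was actually found.
    """
    sib = range_through = singletons = crossers = 0
    for bins in occurrences.values():
        first, last = min(bins), max(bins)
        if bin_id in bins:
            sib += 1                      # sampled in bin
        if first <= bin_id <= last:
            range_through += 1            # known before/in and in/after the bin
            if first == last:
                singletons += 1           # confined to this single bin
        if first < bin_id <= last:
            crossers += 1                 # crosses the base of the bin
    return {
        "sib": sib,
        "range_through": range_through,
        "range_through_minus_singletons": range_through - singletons,
        "boundary_crossers": crossers,
    }


# Hypothetical ranges: A and E through ranging, B extinct in bin 2,
# C originating in bin 2, D a singleton in bin 2.
TOY_OCCURRENCES = {
    "A": {1, 2, 3},
    "B": {1, 2},
    "C": {2, 3},
    "D": {2},
    "E": {1, 3},
}

if __name__ == "__main__":
    for time_bin in (1, 2, 3):
        print(time_bin, diversity_counts(TOY_OCCURRENCES, time_bin))
```

For bin 2 the toy data give 3 boundary crossers, 5 range-through taxa and 4 range-through taxa minus singletons, matching the numbers on the slide. The sampled-in-bin count is 4 because taxon E is inferred to be present but was not found in that bin.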