Dataset explanations

Electronic supplementary material
Supplementary Text
This document contains two sections. The first, “Variable definitions”, defines the
variables in the dataset (see Dataset in the electronic supplementary material). The
second, “Frequently asked questions”, gives further justification and discussion of our
analyses to aid interpretation by researchers who wish to evaluate or repeat them.
Variable definitions
Column A “Time”: Time before present (Ma) according to Harland et al. (1990).
Column B “Red”: The red curve in figure 4 of Royer et al. (2004), which is the estimated
low-latitude sea surface temperature relative to the present (°C). Data originate from Veizer
et al. (2000) but have been adjusted for the effect of seawater pH, estimated from changes in
seawater Ca++ concentration and from CO2 levels given by GEOCARB III.
Column C “Bottom orange”: the lower bound of the orange band in figure 4 of Royer et
al. (2004), calculated as for Red, but holding Ca++ concentration constant.
Column D “Top orange”: the upper bound of the orange band in figure 4 of Royer et al.
(2004), calculated as for Red, but allowing the saturation state of CaCO3 in the ocean to
vary through time.
Column E “CO2”: RCO2 from GEOCARB III (Berner & Kothavala 2001). RCO2 = ratio of the
mass of atmospheric CO2 at time t to that at present, as used in Royer et al. (2004).
Column F “All-Fam-Max”. Total standing diversity of families from Benton (1993),
using maximum dating assumption (see Benton 1993, 1995).
Column G “All-Fam-Min”. Total standing diversity of families from Benton (1993),
using minimum dating assumption (see Benton 1993, 1995).
Column H “Marine-Fam-Max”. Standing diversity of marine families from Benton
(1993), using maximum dating assumption (see Benton 1993, 1995).
Column I “Marine-Fam-Min”. Standing diversity of marine families from Benton (1993),
using minimum dating assumption (see Benton 1993, 1995).
Column J “Terrestrial-Fam-Max”. Standing diversity of terrestrial families from Benton
(1993), using maximum dating assumption (see Benton 1993, 1995).
Column K “Terrestrial-Fam-Min”. Standing diversity of terrestrial families from Benton
(1993), using minimum dating assumption (see Benton 1993, 1995).
Column L “All-Fam-pcOrig-Max”. Per capita origination rate for all families from
Benton (1993) using the maximum dating assumption (see Benton 1993, 1995). Per
capita origination rate for an interval is calculated as (number of originations/standing
diversity)/duration of interval.
Column M “All-Fam-pcOrig-Min”. Per capita origination rate for all families from
Benton (1993) using the minimum dating assumption (see Benton 1993, 1995). Per capita
origination rate for an interval is calculated as (number of originations/standing
diversity)/duration of interval.
Column N “Marine-Fam-pcOrig-Max”. Per capita origination rate for marine families
from Benton (1993) using the maximum dating assumption (see Benton 1993, 1995). Per
capita origination rate for an interval is calculated as (number of originations/standing
diversity)/duration of interval. Standing diversity here refers to that of marine families.
Column O “Marine-Fam-pcOrig-Min”. Per capita origination rate for marine families
from Benton (1993) using the minimum dating assumption (see Benton 1993, 1995). Per
capita origination rate for an interval is calculated as (number of originations/standing
diversity)/duration of interval. Standing diversity here refers to that of marine families.
Column P “Terrestrial-Fam-pcOrig-Max”. Per capita origination rate for terrestrial
families from Benton (1993) using the maximum dating assumption (see Benton 1993,
1995). Per capita origination rate for an interval is calculated as (number of
originations/standing diversity)/duration of interval. Standing diversity here refers to that
of terrestrial families.
Column Q “Terrestrial-Fam-pcOrig-Min”. Per capita origination rate for terrestrial
families from Benton (1993) using the minimum dating assumption (see Benton 1993,
1995). Per capita origination rate for an interval is calculated as (number of
originations/standing diversity)/duration of interval. Standing diversity here refers to that
of terrestrial families.
Column R “All-Fam-pcExtin-Max”. Per capita extinction rate for all families from
Benton (1993) using the maximum dating assumption (see Benton 1993, 1995). Per
capita extinction rate for an interval is calculated as (number of extinctions/standing
diversity)/duration of interval.
Column S “All-Fam-pcExtin-Min”. Per capita extinction rate for all families from Benton
(1993) using the minimum dating assumption (see Benton 1993, 1995). Per capita
extinction rate for an interval is calculated as (number of extinctions/standing
diversity)/duration of interval.
Column T “Marine-Fam-pcExtin-Max”. Per capita extinction rate for marine families
from Benton (1993) using the maximum dating assumption (see Benton 1993, 1995). Per
capita extinction rate for an interval is calculated as (number of extinctions/standing
diversity)/duration of interval. Standing diversity here refers to that of marine families.
Column U “Marine-Fam-pcExtin-Min”. Per capita extinction rate for marine families
from Benton (1993) using the minimum dating assumption (see Benton 1993, 1995). Per
capita extinction rate for an interval is calculated as (number of extinctions/standing
diversity)/duration of interval. Standing diversity here refers to that of marine families.
Column V “Terrestrial-Fam-pcExtin-max”. Per capita extinction rate for terrestrial
families from Benton (1993) using the maximum dating assumption (see Benton 1993,
1995). Per capita extinction rate for an interval is calculated as (number of
extinctions/standing diversity)/duration of interval. Standing diversity here refers to that
of terrestrial families.
Column W “Terrestrial-Fam-pcExtin-Min”. Per capita extinction rate for terrestrial
families from Benton (1993) using the minimum dating assumption (see Benton 1993,
1995). Per capita extinction rate for an interval is calculated as (number of
extinctions/standing diversity)/duration of interval. Standing diversity here refers to that
of terrestrial families.
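The per capita rates in columns L to W, and in columns Y and Z with durations from Harland et al. (1990), all follow the formula above. A minimal Python sketch, assuming the counts and the interval duration in Myr are already in hand; the function name and the numbers in the example are hypothetical, not values from the dataset:

```python
def per_capita_rate(events: float, standing_diversity: float, duration_myr: float) -> float:
    """(number of events / standing diversity) / duration of interval.

    `events` is the number of originations or extinctions in the interval,
    `standing_diversity` the standing diversity of the same group (all,
    marine or terrestrial families), and `duration_myr` the interval length
    in Myr. The result is in events per taxon per Myr.
    """
    if standing_diversity <= 0 or duration_myr <= 0:
        raise ValueError("standing diversity and duration must be positive")
    return (events / standing_diversity) / duration_myr


# Hypothetical interval: 30 originations, 300 families, 5 Myr long.
rate = per_capita_rate(30, 300, 5.0)  # about 0.02 originations per family per Myr
```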
Column X “Sep_ngen”. Standing diversity of marine animal genera from Sepkoski
(2002). Data were extracted from http://strata.ummp.lsa.umich.edu/jack/, using stage
resolution.
Column Y “Sep-pcOrig”. Per capita origination rate of marine animal genera from
Sepkoski (2002). Data were extracted from http://strata.ummp.lsa.umich.edu/jack/, using
stage resolution. Per capita origination rate for an interval is calculated as (number of
originations/standing diversity)/duration of interval, durations taken from Harland et al.
(1990) for comparability with the variables above.
Column Z “Sep-pcExtin”. Per capita extinction rate of marine animal genera from
Sepkoski (2002). Data were extracted from http://strata.ummp.lsa.umich.edu/jack/, using
stage resolution. Per capita extinction rate for an interval is calculated as (number of
extinctions/standing diversity)/duration of interval, durations taken from Harland et al.
(1990) for comparability with the variables above.
Column AA “allmaxbc”. Standing diversity of boundary crossing families, taken from
Benton (1993), using maximum dating assumption. For definitions of boundary crossers
see Foote (2000).
Column AB “allmaxp”. Estimated per capita rate of origination, p, of all families in
Benton (1993), using maximum dating assumption. p is defined as in Foote (2000):
-ln(Nbt / Nt) / Δt, where Nbt = number of taxa that cross both the top and bottom boundary
of an interval, Nt = number of taxa crossing the top boundary of an interval and Δt is
interval duration.
Column AC “allmaxq”. Estimated per capita rate of extinction, q, of all families in
Benton (1993), using maximum dating assumption. q is defined as in Foote (2000):
-ln(Nbt / Nb) / Δt, where Nbt = number of taxa that cross both the top and bottom boundary
of an interval, Nb = number of taxa crossing the bottom boundary of an interval and Δt is
interval duration.
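The boundary-crosser counts and the p and q estimates of Foote (2000) used in columns AA to AU, AZ and BA can be sketched in Python as below. This is an illustration, not the code used to build the dataset. It assumes ranges are given as (first appearance, last appearance) ages in Ma, and that a taxon crosses a boundary only if its range extends strictly beyond it on both sides; how ranges ending exactly on a boundary were handled is not stated here. Which of the three counts enters the “bc” columns is also not specified, so the sketch returns all of them. Function and field names are hypothetical.

```python
import math
from typing import Iterable, NamedTuple


class CrosserCounts(NamedTuple):
    n_b: int   # Nb: taxa crossing the bottom (older) boundary
    n_t: int   # Nt: taxa crossing the top (younger) boundary
    n_bt: int  # Nbt: taxa crossing both boundaries


def count_crossers(ranges: Iterable[tuple[float, float]],
                   base_ma: float, top_ma: float) -> CrosserCounts:
    """Count boundary crossers for one interval.

    `ranges` holds (first appearance, last appearance) ages in Ma, so the
    first appearance is the larger number. A taxon is taken to cross a
    boundary only if its range extends strictly beyond it on both sides.
    """
    n_b = n_t = n_bt = 0
    for fad, lad in ranges:
        crosses_base = fad > base_ma > lad
        crosses_top = fad > top_ma > lad
        n_b += crosses_base
        n_t += crosses_top
        n_bt += crosses_base and crosses_top
    return CrosserCounts(n_b, n_t, n_bt)


def foote_rates(counts: CrosserCounts, duration_myr: float) -> tuple[float, float]:
    """Per capita origination p and extinction q as defined in Foote (2000)."""
    if counts.n_bt <= 0 or duration_myr <= 0:
        # -ln(Nbt / N) is undefined when no taxon crosses both boundaries.
        raise ValueError("Nbt and duration must be positive")
    p = -math.log(counts.n_bt / counts.n_t) / duration_myr
    q = -math.log(counts.n_bt / counts.n_b) / duration_myr
    return p, q


# Hypothetical interval from 250 to 245 Ma (5 Myr) and five made-up ranges.
ranges = [(260, 240), (260, 248), (247, 240), (249, 246), (300, 255)]
counts = count_crossers(ranges, base_ma=250, top_ma=245)
# counts == CrosserCounts(n_b=2, n_t=2, n_bt=1)
p, q = foote_rates(counts, duration_myr=5.0)  # both equal ln(2) / 5 here
```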
Column AD “allminbc”. Standing diversity of boundary crossing families, taken from
Benton (1993), using minimum dating assumption. For definitions of boundary crossers
see Foote (2000).
Column AE “allminp”. Estimated per capita rate of origination, p, of all families in
Benton (1993), using minimum dating assumption. p is defined as in Foote (2000):
-ln(Nbt / Nt) / Δt, where Nbt = number of taxa that cross both the top and bottom boundary
of an interval, Nt = number of taxa crossing the top boundary of an interval and Δt is
interval duration.
Column AF “allminq”. Estimated per capita rate of extinction, q, of all families in Benton
(1993), using minimum dating assumption. q is defined as in Foote (2000):
-ln(Nbt / Nb) / Δt, where Nbt = number of taxa that cross both the top and bottom boundary
of an interval, Nb = number of taxa crossing the bottom boundary of an interval and Δt is
interval duration.
Column AG “marmaxbc”. Standing diversity of marine boundary crossing families, taken
from Benton (1993), using maximum dating assumption. For definitions of boundary
crossers see Foote (2000).
Column AH “marmaxp”. Estimated per capita rate of origination, p, of marine families in
Benton (1993), using maximum dating assumption. p is defined as in Foote (2000):
-ln(Nbt / Nt) / Δt, where Nbt = number of taxa that cross both the top and bottom boundary
of an interval, Nt = number of taxa crossing the top boundary of an interval and Δt is
interval duration.
Column AI “marmaxq”. Estimated per capita rate of extinction, q, of marine families in
Benton (1993), using maximum dating assumption. q is defined as in Foote (2000):
-ln(Nbt / Nb) / Δt, where Nbt = number of taxa that cross both the top and bottom boundary
of an interval, Nb = number of taxa crossing the bottom boundary of an interval and Δt is
interval duration.
Column AJ “marminbc”. Standing diversity of marine boundary crossing families, taken
from Benton (1993), using minimum dating assumption. For definitions of boundary
crossers see Foote (2000).
Column AK “marminp”. Estimated per capita rate of origination, p, of marine families in
Benton (1993), using minimum dating assumption. p is defined as in Foote (2000):
-ln(Nbt / Nt) / Δt, where Nbt = number of taxa that cross both the top and bottom boundary
of an interval, Nt = number of taxa crossing the top boundary of an interval and Δt is
interval duration.
Column AL “marminq”. Estimated per capita rate of extinction, q, of marine families in
Benton (1993), using minimum dating assumption. q is defined as in Foote (2000):
-ln(Nbt / Nb) / Δt, where Nbt = number of taxa that cross both the top and bottom boundary
of an interval, Nb = number of taxa crossing the bottom boundary of an interval and Δt is
interval duration.
Column AM “terrmaxbc”. Standing diversity of terrestrial boundary crossing families,
taken from Benton (1993), using maximum dating assumption. For definitions of
boundary crossers see Foote (2000).
Column AN “terrmaxp”. Estimated per capita rate of origination, p, of terrestrial families
in Benton (1993), using maximum dating assumption. p is defined as in Foote (2000):
-ln(Nbt / Nt) / Δt, where Nbt = number of taxa that cross both the top and bottom boundary
of an interval, Nt = number of taxa crossing the top boundary of an interval and Δt is
interval duration.
Column AO “terrmaxq”. Estimated per capita rate of extinction, q, of terrestrial families
in Benton (1993), using maximum dating assumption. q is defined as in Foote (2000):
-ln(Nbt / Nb) / Δt, where Nbt = number of taxa that cross both the top and bottom boundary
of an interval, Nb = number of taxa crossing the bottom boundary of an interval and Δt is
interval duration.
Column AP “terrminbc”. Standing diversity of terrestrial boundary crossing families,
taken from Benton (1993), using minimum dating assumption. For definitions of
boundary crossers see Foote (2000).
Column AQ “terrminp”. Estimated per capita rate of origination, p, of terrestrial families
in Benton (1993), using minimum dating assumption. p is defined as in Foote (2000):
-ln(Nbt / Nt) / Δt, where Nbt = number of taxa that cross both the top and bottom boundary
of an interval, Nt = number of taxa crossing the top boundary of an interval and Δt is
interval duration.
Column AR “terrminq”. Estimated per capita rate of extinction, q, of terrestrial families
in Benton (1993), using minimum dating assumption. q is defined as in Foote (2000):
-ln(Nbt / Nb) / Δt, where Nbt = number of taxa that cross both the top and bottom boundary
of an interval, Nb = number of taxa crossing the bottom boundary of an interval and Δt is
interval duration.
Column AS “Sepbc”. Number of marine animal boundary crossing genera from
Sepkoski (2002). Data were extracted from http://strata.ummp.lsa.umich.edu/jack/, using
stage resolution.
Column AT “sep-p”. Estimated per capita rate of origination, p, of marine animal genera
in Sepkoski (2002). p is defined as in Foote (2000): -ln(Nbt / Nt) / Δt, where Nbt = number
of taxa that cross both the top and bottom boundary of an interval, Nt = number of taxa
crossing the top boundary of an interval and Δt is interval duration. Data were extracted
from http://strata.ummp.lsa.umich.edu/jack/, using stage resolution.
Column AU “sep-q”. Estimated per capita rate of extinction, q, of marine animal genera
in Sepkoski (2002). q is defined as in Foote (2000): -ln(Nbt / Nb) / Δt, where Nbt = number
of taxa that cross both the top and bottom boundary of an interval, Nb = number of taxa
crossing the bottom boundary of an interval and Δt is interval duration. Data were
extracted from http://strata.ummp.lsa.umich.edu/jack/, using stage resolution.
Column AV “pulsedR”. Mean per-stage sampling probability in marine animal genera
taken from the model of pulsed origination and pulsed extinction in Foote (2005). This
model fit the data better than alternative models in that paper. Data provided courtesy of
Michael Foote.
Column AW “p&f01terrf”. Standing diversity, per epoch, of terrestrial sedimentary
formations, taken from column 4 of Appendix 2 in Peters & Foote (2001). Data originate
from Keroher et al. (1967).
Column AX “p&f02f”. Standing diversity of marine sedimentary formations, per stage,
taken from the supplementary information of Peters & Foote (2002). Data originate from
Keroher et al. (1967).
Column AY “cosunaD”. Standing diversity of marine sedimentary packages, per stage,
used in Peters (2005). Data originate from Childs & Salvador (1985), provided courtesy
of Shanan Peters.
Column AZ “cosunaP”. Estimated per capita rate of origination, p, of marine sedimentary
packages, used in Peters (2005). p is defined as in Foote (2000): -ln(Nbt / Nt) / Δt, where
Nbt = number of packages that cross both the top and bottom boundary of an interval, Nt =
number of packages crossing the top boundary of an interval and Δt is interval duration.
Data originate from Childs & Salvador (1985), provided courtesy of Shanan Peters.
Column BA “cosunaQ”. Estimated per capita rate of extinction, q, of marine sedimentary
packages, used in Peters (2005). q is defined as in Foote (2000): -ln(Nbt / Nb) / Δt, where
Nbt = number of packages that cross both the top and bottom boundary of an interval,
Nb = number of packages crossing the bottom boundary of an interval and Δt is interval
duration. Data originate from Childs & Salvador (1985), provided courtesy of Shanan
Peters.
Frequently asked questions
1. “Why did you transform the data?”
Answer: The data are skewed, so we had either to transform them to meet parametric
assumptions or to use non-parametric statistics. Non-parametric ranking statistics throw away
information embodied in the data, and this may seriously limit our ability to detect what
turn out to be relatively weak associations (correlations <0.5) when sample size is small.
Furthermore, they would limit our ability to incorporate control variables and hence to test
the robustness of our results in that way: parametric statistics are very flexible in this
regard, non-parametric statistics less so.
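The text does not state which transformation was applied. As an assumed example only, a log transform is a common choice for right-skewed data; the Python sketch below, using made-up numbers, shows how it can remove the skew from a series that grows geometrically. The `skewness` helper is hypothetical and uses population moments.

```python
import math


def skewness(values: list[float]) -> float:
    """Sample skewness g1 = m3 / m2**1.5, using population central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# A made-up, right-skewed series that doubles at each step.
raw = [2.0 ** k for k in range(10)]
logged = [math.log(v) for v in raw]  # evenly spaced values, hence symmetric
raw_skew, logged_skew = skewness(raw), skewness(logged)
```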
2. “Why did you de-trend the data prior to analysis?”
Answer: When time series show serial trends, spurious associations can arise simply
because of that. Whilst the trends might be causally linked, one cannot tell: they might
just as well be caused by other variables. So the variation in a time series, that one can
actually use to detect associations, is the variation around the trend. To examine that, we
first have to find the trend and then remove it by taking residuals from it. De-trending
also has a biological motivation and makes our analysis more robust. It is widely
recognized that fossil diversity has generally increased enormously over geological time
(either logistically or exponentially, depending on whom you read), no doubt because of
the standard process of clade growth. But the standard process of clade growth is not of
interest here, so it is the variation around it that we must consider. Thus, it can be
argued that an increase should be regarded as biologically significant only if it stands
out as a residual from the trend. Another issue is that some of the long-term trend is an
artefact of changes in the availability of fossil-bearing rocks of different ages (sampling
effects), and de-trending also helps to remove such artefacts.
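The de-trending step can be sketched as a least-squares trend fit followed by taking residuals. The text does not state the form of the trend; a low-order polynomial in time is assumed here purely for illustration, and the series is made up. The `degree` parameter is a hypothetical knob: a higher degree lets the trend follow curvature more closely, at the cost of absorbing more of the variation of interest.

```python
import numpy as np


def detrend(time_ma, values, degree: int = 1) -> np.ndarray:
    """Residuals of `values` around a least-squares polynomial trend in time.

    `degree` sets the flexibility of the trend (1 = straight line). The
    trend form is an assumption; the text does not specify it.
    """
    t = np.asarray(time_ma, dtype=float)
    y = np.asarray(values, dtype=float)
    coeffs = np.polyfit(t, y, degree)
    return y - np.polyval(coeffs, t)


# Made-up series at 10 Myr spacing: a linear rise plus a 100 Myr cycle.
time_ma = np.arange(0, 500, 10)
cycle = np.sin(2 * np.pi * time_ma / 100)
series = 50 + 0.4 * time_ma + cycle
residuals = detrend(time_ma, series)  # close to the cycle, trend removed
```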
3. “Why did you mean-standardize the residuals prior to analysis?”
Answer: This simply allows us to plot the two variables on the same y-axis (number of
standard deviations from the mean), which helps for visualization (see figures 1-3 of the
main paper).
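Mean-standardizing amounts to converting each residual series to z-scores, i.e. numbers of standard deviations from the mean. A short Python sketch with made-up residuals; the use of the sample standard deviation (ddof=1) is an assumption, and at about 50 points the choice changes values by only about 1%.

```python
import numpy as np


def standardize(residuals) -> np.ndarray:
    """Express residuals as standard deviations from their mean (z-scores)."""
    r = np.asarray(residuals, dtype=float)
    return (r - r.mean()) / r.std(ddof=1)


# Two made-up residual series on very different scales.
diversity_resid = np.array([-120.0, 40.0, 15.0, 90.0, -25.0])
temperature_resid = np.array([-1.5, 0.2, 0.4, 1.1, -0.2])
z_div = standardize(diversity_resid)
z_temp = standardize(temperature_resid)  # both now share a common y-axis
```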
4. “Why did you assess significance using bootstrapping instead of in the normal way?”
Answer: The transformed, detrended, mean-standardized time series are still serially
autocorrelated, meaning that data points close to each other in the time series take similar
values and hence are not fully independent. This means that the standard p values from
tables will be too low, giving rise to Type I errors. Bootstrapping is a randomization
procedure that allows us to calculate the correct probabilities even in the presence of such
autocorrelation.
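The text does not describe the resampling scheme. One widely used option for serially autocorrelated series, assumed here purely for illustration, is a circular block bootstrap: blocks of consecutive values of one series are resampled so that short-range autocorrelation is preserved while any alignment with the other series is broken. The Python sketch below builds a null distribution for Pearson's r in that way; the function name, block length, number of replicates and example series are hypothetical.

```python
import numpy as np


def block_bootstrap_p(x, y, block_len: int = 5, n_boot: int = 999, seed: int = 0) -> float:
    """Two-sided p value for the correlation of x and y under a block-resampled null.

    Each replicate rebuilds x from randomly chosen blocks of `block_len`
    consecutive values (wrapping around the end of the series), keeping
    short-range autocorrelation within x but destroying its alignment with y.
    The p value is the share of replicates with |r| at least as large as the
    observed |r|, with the usual +1 correction.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    if len(y) != n or not 1 <= block_len <= n:
        raise ValueError("x and y must match in length and 1 <= block_len <= n")
    rng = np.random.default_rng(seed)
    r_obs = np.corrcoef(x, y)[0, 1]
    n_blocks = -(-n // block_len)  # ceiling division
    offsets = np.arange(block_len)
    exceed = 0
    for _ in range(n_boot):
        starts = rng.integers(0, n, size=n_blocks)
        idx = ((starts[:, None] + offsets) % n).ravel()[:n]
        r_star = np.corrcoef(x[idx], y)[0, 1]
        if abs(r_star) >= abs(r_obs):
            exceed += 1
    return (exceed + 1) / (n_boot + 1)


# Made-up, autocorrelated series sampled at 50 points.
t = np.arange(50)
temperature = np.sin(t / 3.0)
diversity = -temperature + 0.3 * np.cos(t / 7.0)
p_value = block_bootstrap_p(temperature, diversity)
```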
5. “How did you control for sampling effects and other confounding variables in your
analyses?”
Answer: We did this using multiple regression (general linear model, the term we use in
the main paper, is the broader name for this class of statistics). Diversity, origination
or extinction was the response variable, and temperature was one explanatory variable.
Measures of sampling, or of other possibly confounding variables, were included as
additional explanatory variables in a regression model alongside temperature, so that the
effect of temperature is estimated while the effects of these other variables are already
taken into account.
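The regression described above can be sketched with ordinary least squares in NumPy. The data are synthetic, built so that a sampling proxy is correlated with temperature, and the function name is hypothetical. Including the proxy recovers the temperature coefficient used to build the data, whereas omitting it lets temperature absorb part of the sampling effect, which is the confounding the control is meant to remove.

```python
import numpy as np


def temperature_coefficient(response, temperature, *controls) -> float:
    """Coefficient on temperature from an OLS multiple regression.

    Fits response ~ intercept + temperature + controls, so the temperature
    effect is estimated with the control variables (for example a sampling
    proxy) already in the model.
    """
    y = np.asarray(response, dtype=float)
    columns = [np.ones_like(y), np.asarray(temperature, dtype=float)]
    columns += [np.asarray(c, dtype=float) for c in controls]
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(coef[1])


# Synthetic data in which a sampling proxy is correlated with temperature.
rng = np.random.default_rng(1)
temperature = rng.normal(size=50)
sampling = 0.6 * temperature + rng.normal(size=50)
diversity = 1.0 + 2.0 * temperature + 3.0 * sampling
with_control = temperature_coefficient(diversity, temperature, sampling)  # recovers 2.0
without_control = temperature_coefficient(diversity, temperature)  # biased by omitted sampling
```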
6. “Raw data in some of the time series are at the stage level (about 80 data points), but
these were somehow converted to a coarser time series of about 50 points. How was this
done? What problems might it introduce?”
Answer: The temperature and CO2 data are point estimates at 10 Myr intervals. We
needed point estimates at 10 Myr intervals for diversity and origination/extinction. We
simply used the diversity/rates for whichever stage each 10 Myr point falls in, thus avoiding
the need for aggregation or interpolation. The main problem with this is the usual one of
stages being of different duration, but the use of boundary crossers etc. should help to
ameliorate that. One consequence is that the same values for a stage are sometimes used
in consecutive time intervals, but this should actually be conservative given that we are
creating extra error that our tests have to withstand (i.e. in reality diversity is probably not
static but we are sometimes artificially making it so).
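The stage lookup described above can be sketched as follows, with made-up stage boundaries and values; the function name and data layout are assumptions. A point falling exactly on an internal boundary takes the mean of the two adjacent stages, following the rule given in question 9.

```python
from bisect import bisect_right


def value_at(time_ma: float, boundaries: list[float], values: list[float]) -> float:
    """Value of the stage containing `time_ma`.

    `boundaries` are stage boundary ages in Ma in increasing order, so stage i
    spans boundaries[i] (its top) to boundaries[i + 1] (its base) and has
    value values[i]. A time exactly on an internal boundary gets the mean of
    the two stages either side.
    """
    if len(boundaries) != len(values) + 1:
        raise ValueError("need one more boundary than stage values")
    if not boundaries[0] <= time_ma <= boundaries[-1]:
        raise ValueError("time outside the range of the stages")
    i = bisect_right(boundaries, time_ma) - 1
    if boundaries[i] == time_ma and 0 < i < len(values):
        return (values[i - 1] + values[i]) / 2
    return values[min(i, len(values) - 1)]


boundaries = [0.0, 10.0, 25.0, 40.0]  # hypothetical stage boundaries in Ma
values = [5.0, 7.0, 9.0]              # one value per stage, youngest first
grid = [value_at(t, boundaries, values) for t in range(0, 41, 10)]
# grid == [5.0, 6.0, 7.0, 9.0, 9.0]; the same stage value repeats at 30 and 40 Ma
```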
7. “If these associations are real, why did nobody notice them before?”
Answer: Correlations between CO2 and origination, extinction and diversity have been
found previously at the same scales, and CO2 is associated with temperature in our data
(they both basically signify greenhouse and icehouse worlds, though temperature is
obviously more definitive). This suggests that the temperature associations we report are
real. The temperature data are quite new, so there was not previously much opportunity to
conduct these analyses. A final reason is that the data are not simple and the associations
are not very strong: the diversity trends are non-linear, whereas the temperature data are
cyclical around a more nearly linear trend. As a result, associations will not be obvious on
first inspection, and crude analyses will not necessarily pick them up.
8. “There are several intervals in Earth history when climate has been warm, but diversity
has risen, for example the Paleocene/Eocene and mid Cretaceous, both periods of rising
diversity. Aren’t the present results at odds with those? Furthermore, geographically the
places with highest mean temperatures have highest diversity. Isn’t that observation also
at odds with the present results?”
Answer: Not necessarily. After you de-trend, the mid and late Cretaceous show falling
diversity residuals as temperature residuals are high (figure 1 of the main paper). So
detrending leads to the opposite conclusion. The Paleocene/Eocene likewise is relatively
warm, but forms part of a falling long term trend in temperature residuals, whilst
diversity rises. So de-trending alters one’s view of the associations. The reasons for detrending are explained above. In contrast, previously published associations between CO2
and diversity, origination and extinction in the Phanerozoic would predict associations in
the same directions as those we find. The observation on the geographic distribution of
biodiversity is also not necessarily at odds with our results. The latitudinal gradient in
diversity has persisted for millions of years (Willig et al. 2003), obviously independently
of global climate and total changes in global diversity. It is the changes in global diversity
that concern us here. Current studies (Williams et al. 2007) suggest that areas of high
diversity, mostly tropical, are currently most at risk from climate change because they
will experience the greatest loss of current climates. So a warmer world could carry a
greater risk of taxonomic extinction. It would obviously be easier to interpret such
apparent conflicts if we knew what the underlying causes of our results are, but currently
we can only hypothesize.
9. “Why does standing diversity sometimes take non-integer values in your data?”
Answer: This is a result of the date falling at a geological boundary, in which case the
mean diversity of the two intervals either side of the boundary was taken. Means were
also taken for rates in such cases.
10. “What defines a marine or terrestrial family in your data?”
Answer: “Marine” families from Benton (1993) are defined here as any family occupying
marine habitats, either wholly or in combination with other habitats. Similarly,
“Terrestrial” families are defined as any family occupying terrestrial habitats, either
wholly or in combination with other habitats.
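Under this definition a family recorded in both habitats contributes to both the marine and the terrestrial columns, so the two subtotals need not sum to the all-family totals. A small Python sketch with hypothetical family names and habitat labels makes this explicit; how Benton (1993) codes habitats is not reproduced here.

```python
def realm_flags(habitats: set[str]) -> tuple[bool, bool]:
    """Return (is_marine, is_terrestrial) for one family.

    A family counts as marine if any of its habitats is marine, and as
    terrestrial if any is terrestrial, so a family living in both is
    counted in both subtotals.
    """
    return "marine" in habitats, "terrestrial" in habitats


# Hypothetical families and habitat labels.
families = {
    "family_a": {"marine"},
    "family_b": {"terrestrial"},
    "family_c": {"marine", "terrestrial"},
}
n_marine = sum(realm_flags(h)[0] for h in families.values())
n_terrestrial = sum(realm_flags(h)[1] for h in families.values())
# n_marine + n_terrestrial (4) exceeds len(families) (3) because family_c counts twice
```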
11. “Why are there no data at time zero in some of your time series?”
Answer: At time 0 (= Holocene), taxonomic rates take extraordinary values due to the
unusually short duration of this geological interval. Such values were omitted from the
analyses and are recorded here as “no data”.
References:
Benton, M. J., ed. 1993 The fossil record 2. London: Chapman & Hall.
Benton, M. J. 1995 Diversification and extinction in the history of life. Science 268, 52-58.
Berner, R. A. & Kothavala, Z. 2001 Geocarb III: a revised model of atmospheric CO2
over Phanerozoic time. Am. J. Sci. 301, 182-204.
Childs, O. E. & Salvador, A. 1985 Correlation of Stratigraphic Units of North America
(COSUNA). Amer. Assoc. Petr. Geol. Bull. 69, 173-180.
Foote, M. 2000 Origination and extinction components of taxonomic diversity: general
problems. Paleobiol. 26 (Supplement), 74-102.
Foote, M. 2005 Pulsed origination and extinction in the marine realm. Paleobiol. 31, 6-20.
Harland, W. B. et al. 1990 A geologic time scale. Cambridge: Cambridge University
Press.
Keroher, G. C. 1967 Lexicon of geologic names of the United States for 1936–1960. US
Geological Survey Bull. 1200. Reston, VA: US Geological Survey.
Peters, S. E. 2005 Geological constraints on the macroevolutionary history of marine
animals. Proc. Natl. Acad. Sci. USA 102, 12326-12331.
Peters, S. E. & Foote, M. 2001 Biodiversity in the Phanerozoic: a reinterpretation.
Paleobiol. 27, 583-601.
Peters, S. E. & Foote, M. 2002 Determinants of extinction in the fossil record. Nature
416, 420-424.
Royer, D. L., Berner, R. A., Montañez, I. P., Tibor, N. J. & Beerling, D. J. 2004 CO2 as a
primary driver of Phanerozoic climate. GSA Today 14, 4-10.
Sepkoski, J. J. Jr. 2002 A compendium of fossil marine animal genera. Bull. Amer.
Paleontol. 363, 1–560.
Veizer, J., Godderis, Y. & François, L. M. 2000 Evidence for decoupling of atmospheric
CO2 and global climate during the Phanerozoic eon. Nature 408, 698-701.
Williams, J. W., Jackson, S. T. & Kutzbach, J. E. 2007 Projected distributions of novel
and disappearing climates by 2100 AD. Proc. Natl. Acad. Sci. USA 104, 5738-5742.
Willig, M. R., Kaufman, D. M. & Stevens, R. D. 2003 Latitudinal gradients of
biodiversity: pattern, process, scale and synthesis. Ann. Rev. Ecol. Syst. 34, 273-309.