ele12485-sup-0002-Supinfo

Supplementary information for
Ecological Interactions on Macroevolutionary Time Scales: Clams and
Brachiopods are more than Ships that Pass in the Night
Liow L.H., Reitan, T. & Harnik P.G.
Materials and Methods
1. Data
We downloaded all available brachiopod and bivalve observed occurrence data
recorded from marine deposits from the Paleobiology Database (PaleoDB,
downloaded 16 Dec 2014). We altered the following download options: only
marine deposits were downloaded; genus names with the qualifiers “aff.”, “cf.”,
“?”, quotation marks, “ex gr.” or “sensu lato” were excluded; and genus names
were replaced by subgenus names where available. Note that taxa not identified
to species level, e.g. “Genus sp.”, are included in our data. This means that dynamics will be more muted
(Wagner et al. 2007). Subgenera and genera are often treated as equal in rank in
macroevolutionary studies of fossil marine invertebrates (Roy et al. 1996;
Sepkoski 2002; Jablonski et al. 2003; Simpson & Harnik 2009; Foote & Miller
2013) because the morphological distinction between these ranks can be
arbitrary (Roy et al. 1996). This protocol is standard in global-scale analyses of
fossil marine bivalves (Roy et al. 1996; Jablonski et al. 2003; Simpson & Harnik
2009). However, workers concerned about taxonomic over-splitting in certain
groups, such as fossil brachiopods (Cooper 1970; Carlson & Fitzgerald 2007),
may favor the use of the genus level as the operational taxonomic unit (Harnik et
al. 2014; Powell et al. 2015). All analyses reported here treat subgenera and
genera as equal in rank; a preliminary analysis where we substituted genus
names using available subgenus names for only bivalves but not brachiopods
gave qualitatively similar results. In the main text, we explain why we analyzed
data for (sub)genera rather than species. We used “ma_max” and “ma_min” as
reported in the PaleoDB as the age range of each observed occurrence where
“ma_max” is the maximum age estimate of the fossil occurrence based on
geochronology if available or the interval name, if not, and where “ma_min” is the
equivalent minimum age estimate, both in millions of years ago. Before
capture-recapture analyses (see next section), we removed data where reported age
ranges are greater than the largest age interval in the ICS (Cohen et al.
2013) (18.5 My, duration of the Carnian); we also removed occurrences assigned
a Cambrian age. For brachiopods, we have 135251 data points representing
observations of 3420 (sub)genera while for bivalves, we have 156011 data
points representing 2679 (sub)genera. These are the data that we use for
capture-recapture analyses (see below). M. Clapham (30.6%), W. Kiessling
(14.6%) and A. Miller (11.7%) were the top three authorizers for the brachiopod
data we used while for the bivalve data, the top three authorizers were W.
Kiessling (18.7%), A. Hendy (16.4%) and M. Clapham (13.7%); the numbers in
parentheses indicate the proportion of total hours authorized.
2. Capture-recapture models for paleontological data: Pradel seniority model
Capture-recapture and related approaches have their roots in population ecology
(King 2014) and we and others have previously used such approaches to infer
diversification parameters using fossil observation data that are analogous to
capture-recapture data in ecology (Nichols & Pollock 1983; Connolly & Miller
2001b, a, 2002; Kröger 2005; Liow et al. 2008; Liow & Finarelli 2014). Here, we
briefly outline the approach we use and refer readers to Pradel (1996), Connolly
and Miller (2001b) and Liow & Finarelli (2014) for details.
The data available to us are the recorded observations of genera in the fossil
record. Assuming that there are no errors in taxonomic identification or age
assignment, we can infer that a taxon was extant and sampled if there is an
observation of it in our database. For instance, in the table below, Taxon A was
sampled in both Time 1 (oldest interval) and Time 3 (younger interval), but not
in Time 2, so we can infer that it must have been extant during Time 2 but simply
not sampled. However, for Taxon B, we know it was extant in Time 1, but we do
not know if it was extant but simply not sampled in Times 2, 3 or 4.
           Time 1   Time 2   Time 3   Time 4
Taxon A      1        0        1        1
Taxon B      1        0        0        0
Given a dataset of such sampling histories in this classic Cormack-Jolly-Seber
(CJS) (Cormack 1964; Jolly 1965; Seber 1965) example where survival
probabilities are conditioned on first observation, the probability of observing
the sampling histories (sh) for Taxon A, can be written as
$$\Pr(sh = [1011]) = \phi_1 (1 - p_2)\, \phi_2\, p_3\, \phi_3\, p_4 \qquad (1)$$
For Taxon B, it is
$$\Pr(sh = [1000]) = (1 - \phi_1) + \phi_1 (1 - p_2)(1 - \phi_2) + \phi_1 (1 - p_2)\, \phi_2 (1 - p_3)(1 - \phi_3) + \phi_1 (1 - p_2)\, \phi_2 (1 - p_3)\, \phi_3 (1 - p_4) \qquad (2)$$
where $\phi_j$ is the survival probability between time interval $j$ and the next and
$p_j$ is the sampling probability within time interval $j$. Note that
multiple observations of the same taxon in a given time interval are treated simply
as “observation” as opposed to “non-observation” in this set-up. While it is
potentially fruitful to utilize multiple-observations, such an approach is beyond
the scope of this work but is part of our on-going work.
To simplify the modeling, we assume that all the taxa within our dataset have the
same survival probabilities and sampling probabilities within the same time
intervals (see later paragraph on assumptions). We could even assume that only
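sampling is time-varying but that survival is constant through time; the
likelihood is then
$$L(\phi, p_2, p_3, p_4) = \left\{ \phi^3 (1 - p_2)\, p_3\, p_4 \right\}^{N_{1011}} \left[ (1 - \phi) + \phi (1 - p_2) \left\{ (1 - \phi) + \phi (1 - p_3) \left( (1 - \phi) + \phi (1 - p_4) \right) \right\} \right]^{N_{1000}} \qquad (3)$$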
where N indicates how many cases in the dataset there are with the
corresponding sampling histories exemplified by taxa A and B. The maximum
likelihood estimates of the parameters can then be found.
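The sampling-history probabilities in Eqns (1) to (3) can be evaluated mechanically. The following is a minimal Python sketch, not the software used for the analyses; the function names `history_probability` and `log_likelihood` and the numerical values are hypothetical and chosen only for illustration. It builds the probability of never being sampled again after the last observation recursively, from the final interval backwards.

```python
import math
from itertools import product


def history_probability(history, phi, p):
    """Probability of a sampling history, conditioned on the first observation.

    history: sequence of 0/1 with history[0] == 1 (the first observation).
    phi[k]:  survival probability from interval k to interval k + 1 (0-based).
    p[k]:    sampling probability in interval k (p[0] is never used).
    """
    last = max(i for i, seen in enumerate(history) if seen == 1)
    prob = 1.0
    # Up to the last observation the taxon is known to have survived.
    for k in range(last):
        prob *= phi[k]
        prob *= p[k + 1] if history[k + 1] else (1.0 - p[k + 1])
    # chi: probability of not being sampled after the last observation,
    # either because of extinction or because of survival without sampling.
    chi = 1.0
    for k in range(len(history) - 2, last - 1, -1):
        chi = (1.0 - phi[k]) + phi[k] * (1.0 - p[k + 1]) * chi
    return prob * chi


def log_likelihood(counts, phi, p):
    """Log-likelihood for a dict mapping sampling histories to their counts."""
    return sum(n * math.log(history_probability(h, phi, p))
               for h, n in counts.items())


# Illustrative (made-up) parameter values for four time intervals.
phi = [0.9, 0.8, 0.7]
p = [1.0, 0.5, 0.4, 0.3]
pr_taxon_a = history_probability((1, 0, 1, 1), phi, p)
pr_taxon_b = history_probability((1, 0, 0, 0), phi, p)
```

Because the histories are conditioned on first observation, the probabilities of all histories starting with an observation sum to one, which is a convenient check on any implementation.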
The Pradel seniority model is a modification of the classic CJS model (described
above) where survival probabilities (the complement of extinction probability)
and “seniority” probabilities (the complement of origination probability if genus
observation data are used) can be estimated simultaneously with sampling
probabilities. Upon reparametrization of the Pradel model (Pradel 1996; Liow &
Finarelli 2014), net diversification (origination minus extinction) can also be
estimated.
The assumptions of capture-recapture approaches are well known and the
effects of violating these assumptions have been well studied (Pollock et al. 1990;
Williams et al. 2002; Liow & Nichols 2010). To summarize, using the CJS model
as an example, i) sampling and survival probabilities are assumed to be equal for
all taxa in the dataset, ii) the sampling intervals are short relative to the
timespan over which survival is estimated and iii) the fate of each taxon is
independent of other taxa in the dataset. By making parameters taxon or time
specific, we can relax the first assumption. The second assumption is always
violated for paleontological data but simulations have shown that the effects of
this violation are not substantial (see Pollock et al. 1990, Williams et al. 2002,
Liow and Nichols 2010 and references therein). Violation of the third assumption leads to
over-dispersion and overly narrow confidence intervals for the estimates but
this can be corrected using a variance inflation factor if desirable. In our study,
we estimated diversification and sampling parameters for brachiopods and
bivalves separately and assumed a fully time-varying model such that each of the
86 time intervals had a priori different estimates. We also removed time points
with large uncertainties (see Methods in main text and Table S1). We supply the
datasets we used as separate files in our supplementary information
(bivalves16dec2014Marine.csv and brachiopods16dec2014Marine.csv) and in
Table S1 we supply the untransformed estimates from the Pradel model.
3. Inference of macroevolutionary processes using SDE: background
Paleontological datasets often span a substantial amount of time and are
composed of observations of a given system at various time points. These
temporal observations, like other time series data, can exhibit considerable time
dependency, i.e. the state of the system at one time point can be correlated with
that at another time point. Paleontologists often wish to know whether a
paleobiological time series (e.g. phenotype, extinction rates) is correlated with or
driven by another such biotic time series and/or an environmental time series
(e.g. predator-density, climate, sea-level). Given the time dependency inherent in
many time series (including paleontological and environmental time series),
attempts to identify relationships among time series using ordinary regression
and related techniques will fail. Although the estimate of the effect might be
unbiased, the uncertainty of that effect will be seriously underestimated because
ordinary regression assumes independent noise terms. This can often lead to
rejection of a true null hypothesis (Type I error). If however the time dependency
is addressed using a time series model that incorporates the auto-correlation
present in the data, reliable tests for effects can be performed.
Common tools for time series analysis, predominantly based on ARIMA
models, tacitly assume that the given time series have been sampled regularly
(equidistant) in time. Time series in paleontology, like the ones we are interested
in here, however, are often not temporally equidistant. For instance, the ages of
observed fossils are haphazardly dispersed in time and geological stages are also
not regularly spaced temporally. Thus it is necessary to have statistical tools for
handling observations that have irregularly spaced time points. Stochastic
differential equations (SDEs) describe processes that are continuous in time,
such that analyses based on such models are able to deal with measurements
distributed irregularly in time. In addition, paleobiological processes are often
continuous even though our observations are discrete. SDEs can be linear or
non-linear and we use linear SDEs here because their likelihoods are analytically
tractable. For a detailed introduction to SDEs, see Øksendal (2010) and Evans (2013).
Linear SDEs, such as Brownian motion (Raup 1977) and the Ornstein-Uhlenbeck
process (Lande 1976), have been used in paleobiological studies.
However, linear SDEs also easily allow for studying the effect of one time series
on another (Hansen 1997) or even the effect of unmeasured time series on
measured time series (Hansen et al. 2008; Reitan et al. 2012). The general
expression for a linear SDE can be written as
$$dX(t) = \left( A(t) X(t) + m(t) \right) dt + \Sigma(t)\, dB(t) \qquad (4)$$
where the first part of Eqn (4) is deterministic and the second part, stochastic. In
the first part of the equation, $X(t)$ is the vector process of interest (potentially
having both measured and hidden components), $m(t)$ is a vector and $A(t)$ and
$\Sigma(t)$ are matrices with explicit time dependency and in the second part, $B(t)$
represents stochastic contributions to the system. The additive vector $m$, the
so-called pull-matrix $A$ and co-variance matrix $\Sigma$ are often assumed to be
time-independent, hence simplifying calculations.
Assuming a constant pull-matrix $A$, the SDE can formally be transformed into
a stochastic integral with the form
$$X(t) = e^{A(t - t_0)} X(t_0) + \int_{t_0}^{t} e^{A(t - u)} m(u)\, du + \int_{t_0}^{t} e^{A(t - u)} \Sigma(u)\, dB(u) \qquad (5)$$
which is a function of the previously known state at time $t_0$. If no previous state
is known, one can set $t_0 = -\infty$. Equation (5) retains the normality of the
stochastic contributions, thus only the expectation vector and the covariance
matrix of the vectorial process X (t ) (that is, all the possible combinations of time
points) are necessary for deriving a likelihood for different states of the process.
Assuming that one can find an eigenvector decomposition $VA = \Lambda V$ where $\Lambda$
is a diagonal matrix of eigenvalues, then the expectation vector will be
$$E[X(t) \mid X(t_0)] = V^{-1} e^{\Lambda (t - t_0)} V X(t_0) + \int_{t_0}^{t} V^{-1} e^{\Lambda (t - u)} V m(u)\, du \qquad (6)$$
and the covariance matrix of the state at two different time points $t$ and $v$ where
$t > v$ will be
$$\mathrm{Cov}[X(v), X(t) \mid X(t_0)] = V^{-1} \left( \int_{t_0}^{v} e^{\Lambda (v - u)} V \Sigma(u) \Sigma(u)' V' e^{\Lambda (t - u)}\, du \right) (V^{-1})' \qquad (7)$$
where $V'$ stands for the transpose of matrix $V$. When $m$ and $\Sigma$ are also
constant, algebraic expressions for the expectation (Eqn (8)) and covariance
(Eqn (9)) can be found:
$$E[X(t) \mid X(t_0)] = V^{-1} e^{\Lambda (t - t_0)} V X(t_0) - V^{-1} \Lambda^{-1} \left( 1 - e^{\Lambda (t - t_0)} \right) V m \qquad (8)$$
$$\mathrm{Cov}[X(v), X(t) \mid X(t_0)] = V^{-1}\, \Xi(t, v, t_0)\, (V^{-1})' \qquad (9)$$
where
$$\Xi(t, v, t_0)_{i,j} = -\frac{e^{\lambda_j (t - v)} - e^{\lambda_i (v - t_0) + \lambda_j (t - t_0)}}{\lambda_i + \lambda_j}\, \Omega_{i,j}$$
and $\Omega = V \Sigma \Sigma' V'$ (see Reitan et al. 2012, including supplementary material).
In addition to the stochasticity of one state conditioned on the previous
state, independent measurement errors also contribute to the overall
stochasticity of the observations. Assuming that these errors are also normally
distributed, one can either express the likelihood through a multi-normal
distribution of all measurement points or use a Kalman filter (see Kalman 1960
and Reitan et al. 2012 for use in the linear SDE setting), which incrementally
calculates likelihood contributions one measurement at a time. We use a Kalman
filter because of its computational efficiency.
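To illustrate how a Kalman filter accumulates the likelihood one measurement at a time, the sketch below treats the simplest case: a single Ornstein-Uhlenbeck process, $dX(t) = -\alpha (X(t) - \mu) dt + \sigma dB(t)$, observed with independent normal measurement errors at irregular time points. It is a minimal Python illustration under stated assumptions (a scalar process, known measurement standard deviations, and a stationary starting distribution, i.e. $t_0 = -\infty$), not the implementation used for the analyses; the function name `ou_kalman_loglik` is hypothetical.

```python
import math


def ou_kalman_loglik(times, obs, obs_sd, mu, alpha, sigma):
    """Log-likelihood of noisy, irregularly spaced observations of an OU process.

    times:  increasing measurement times.
    obs:    measured values.
    obs_sd: measurement standard deviation for each observation.
    """
    stationary_var = sigma ** 2 / (2.0 * alpha)
    # Before the first measurement the state follows the stationary distribution.
    mean, var = mu, stationary_var
    loglik = 0.0
    t_prev = None
    for t, y, sd in zip(times, obs, obs_sd):
        if t_prev is not None:
            # Prediction step: propagate the state over the (irregular) gap.
            decay = math.exp(-alpha * (t - t_prev))
            mean = mu + (mean - mu) * decay
            var = var * decay ** 2 + stationary_var * (1.0 - decay ** 2)
        # Predictive distribution of the measurement and its likelihood term.
        pred_var = var + sd ** 2
        resid = y - mean
        loglik += -0.5 * (math.log(2.0 * math.pi * pred_var)
                          + resid ** 2 / pred_var)
        # Update step: condition the state on the new measurement.
        gain = var / pred_var
        mean += gain * resid
        var *= (1.0 - gain)
        t_prev = t
    return loglik


# Illustrative call with made-up values.
example_loglik = ou_kalman_loglik([0.0, 2.5, 3.0], [3.0, 1.0, 1.4],
                                  [0.5, 0.2, 0.2], mu=2.0, alpha=0.3, sigma=1.5)
```

The result is identical to evaluating the multi-normal density of all measurements jointly, but the cost grows linearly rather than cubically with the number of measurements.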
General vector and matrix expressions allow total flexibility in linear SDE
modeling. However, using these general expressions directly as models
introduces unnecessary complexity, possibly making some parameters
unidentifiable and rendering results difficult to interpret. Imposing restrictions
and extra structure in these equations alleviates this problem and makes it
possible to answer questions, such as “is the process stationary” or “is there a
relationship between process 1 and process 2 and if so, is that a causal
relationship (one process affecting/driving the other) or one of simple
correlation?”
For example, stationarity can be studied by comparing a Brownian motion
(BM) process, which is non-stationary, to an Ornstein-Uhlenbeck (OU) process,
which is stationary. BM can be described using only one stochastic variable
(although it can be expanded to being multivariate) in the SDE, that is,
$dX(t) = \sigma\, dB(t)$. It is characterized by being normal, having a variance that
increases with time, $\mathrm{var}(X(t)) = \sigma^2 t$, and increments that are independent. BM
has been proposed as a null hypothesis for evolutionary processes (Raup 1977).
In contrast, the OU process, described by $dX(t) = -\alpha (X(t) - \mu)\, dt + \sigma\, dB(t)$, is
stationary with an expectation $\mu$, a stationary variance $\mathrm{var}(X(t)) = \sigma^2 / (2\alpha)$ and
a correlation between the state at time $t$ and time $u$ of $\mathrm{corr}(X(t), X(u)) = e^{-\alpha |t - u|}$. The OU
process has been used as a model for phenotypic evolution where the fitness
landscape is bell-shaped (see Lande 1976) and has an optimum at $\mu$. The
pull $\alpha$ then describes the strength of the selection. The stochastic part
represents genetic drift in this model. The OU process has also been proposed as
a model for the process of change in the optimum of the fitness landscape itself
(Hansen 1997). The pull $\alpha$ is often re-parameterized using either the characteristic
time $t_c = 1/\alpha$ or the half-life, $\Delta t_{1/2} = \log(2)/\alpha$. The half-life describes the time it
takes for the distance from the process to $\mu$ to be halved in expectation, as well as the
time it takes for the correlation to drop to 0.5. The stochastic contribution $\sigma$ can
also be re-parameterized as the stationary standard deviation $s = \sigma / \sqrt{2\alpha}$. The OU process has
also been described as a model for stabilizing selection (Estes & Arnold 2007)
but when the phenotype is far from the optimum, it can also be viewed as a
model for directional selection.
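Because the transition distribution of the OU process is known in closed form, the process can be simulated exactly at irregularly spaced time points, and the re-parameterization by half-life can be checked numerically. The following is a minimal Python sketch; the function names and the example values are hypothetical and serve only as an illustration of the properties described above.

```python
import math
import random


def half_life_to_pull(half_life):
    """Convert a half-life into the pull parameter alpha = log(2) / half-life."""
    return math.log(2.0) / half_life


def ou_transition(x_prev, dt, mu, alpha, sigma):
    """Mean and variance of X(t + dt) given X(t) = x_prev for an OU process."""
    decay = math.exp(-alpha * dt)
    mean = mu + (x_prev - mu) * decay
    var = sigma ** 2 / (2.0 * alpha) * (1.0 - decay ** 2)
    return mean, var


def simulate_ou(times, mu, alpha, sigma, rng):
    """Exact simulation of an OU process at arbitrary increasing time points."""
    stationary_sd = sigma / math.sqrt(2.0 * alpha)
    x = rng.gauss(mu, stationary_sd)  # start from the stationary distribution
    path = [x]
    for t_prev, t_next in zip(times, times[1:]):
        mean, var = ou_transition(x, t_next - t_prev, mu, alpha, sigma)
        x = rng.gauss(mean, math.sqrt(var))
        path.append(x)
    return path


# After one half-life, the expected distance to mu is halved: 10 -> 6 for mu = 2.
alpha = half_life_to_pull(5.0)
mean_after_half_life, _ = ou_transition(10.0, 5.0, mu=2.0, alpha=alpha, sigma=1.0)

# Irregularly spaced time points pose no difficulty.
path = simulate_ou([0.0, 0.7, 3.1, 3.2, 9.5], mu=2.0, alpha=alpha, sigma=1.0,
                   rng=random.Random(1))
```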
Although we can use linear SDEs to encode different ideas (e.g. a mean-reverting
process, a random process etc.), the same verbal idea can be modeled in
different ways. For instance, directional evolution can be described by a non-zero
value for the additive term for the Brownian motion such that
$dX(t) = m\, dt + \sigma\, dB(t)$ (Smaers & Vinicius 2009), or a linear expectation term in
the OU process, such that $\mu(t) = \mu_0 + \gamma t$ (Estes & Arnold 2007). Model
comparisons can help clarify model differences, even if precise interpretations of
each model may be wanting. One must also be aware that there might be multiple
interpretations of the favored model (see Reitan et al. 2012).
So far, only single variable processes have been considered. Frequently
however, the objective is to examine the relationship between two such
processes. With linear SDEs this is possible, even if the measurements were not
sampled at the same time points and with non-equidistant gaps in the data.
Overlapping measurement periods are of course necessary. Relationships can be
investigated with a correlated noise model:
$$dX_1(t) = -\alpha_1 (X_1(t) - \mu_1)\, dt + \sigma_1\, dB_1(t) \qquad (10)$$
$$dX_2(t) = -\alpha_2 (X_2(t) - \mu_2)\, dt + \sigma_2 \sqrt{1 - \rho^2}\, dB_2(t) + \rho\, \sigma_2\, dB_1(t) \qquad (11)$$
Eqn (10) is an OU process, whereas Eqn (11) has not only its independent
contribution, but also a term from Eqn (10). While the expression given here is
asymmetrical in Eqns (10) and (11), these can be expressed in a symmetric
fashion while having the same statistical properties. Each process by itself will
have the same properties as an OU process, but the two processes seen together
will be correlated. If the pull (and half-life) is the same for the two processes, the
correlation between the state of the two processes at the same time will simply
be $\rho$, while if the pulls are different, the correlation will be less, namely
$2 \rho \sqrt{\alpha_1 \alpha_2} / (\alpha_1 + \alpha_2)$. Such processes can be expected to have peaks and troughs
at approximately the same time points, with neither process preceding the
other systematically. Note that BM processes might be similarly linked.
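The stationary correlation for unequal pulls can be verified with a short derivation. The sketch below assumes that both processes in Eqns (10) and (11) are stationary with constant parameters and uses the standard moving-average representation of an OU process; the shorthand $W_k$ for the combined noise terms is introduced here for illustration only.

```latex
\begin{aligned}
X_k(t) - \mu_k &= \int_{-\infty}^{t} e^{-\alpha_k (t - u)}\, dW_k(u), \qquad k = 1, 2, \\
dW_1 &= \sigma_1\, dB_1, \qquad
dW_2 = \sigma_2 \sqrt{1 - \rho^2}\, dB_2 + \rho\, \sigma_2\, dB_1, \\
\mathrm{Var}(X_k(t)) &= \sigma_k^2 \int_{-\infty}^{t} e^{-2 \alpha_k (t - u)}\, du
  = \frac{\sigma_k^2}{2 \alpha_k}, \\
\mathrm{Cov}(X_1(t), X_2(t)) &= \rho\, \sigma_1 \sigma_2
  \int_{-\infty}^{t} e^{-(\alpha_1 + \alpha_2)(t - u)}\, du
  = \frac{\rho\, \sigma_1 \sigma_2}{\alpha_1 + \alpha_2}, \\
\mathrm{Corr}(X_1(t), X_2(t)) &=
  \frac{\rho\, \sigma_1 \sigma_2 / (\alpha_1 + \alpha_2)}
       {\sigma_1 \sigma_2 / \left( 2 \sqrt{\alpha_1 \alpha_2} \right)}
  = \frac{2 \rho \sqrt{\alpha_1 \alpha_2}}{\alpha_1 + \alpha_2}.
\end{aligned}
```

Since $2\sqrt{\alpha_1 \alpha_2} \le \alpha_1 + \alpha_2$, with equality only when $\alpha_1 = \alpha_2$, the magnitude of the correlation equals that of $\rho$ for equal pulls and is smaller otherwise.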
Another way of connecting two processes is through a causal relationship.
This can be unidirectional with one process driving the other, or bidirectional,
with one process driving the other and vice versa. A causal relationship between
two processes exists when the state of one process influences the outcome of the
other (Granger 1969; Schweder 1970). Such directional pulls are most easily
modeled in OU-like processes. If process $X_1(t)$ influences $X_2(t)$ in a linear fashion,
we can write
$$dX_2(t) = -\alpha_2 \left( X_2(t) - \mu_2 - \beta (X_1(t) - \mu_1) \right) dt + \sigma_2\, dB_2(t) \qquad (12)$$
If we compare Eqn (12) with Eqn (10), we can see that $X_2(t)$ is an OU-like
tracking process that has an additive term that is influenced by $X_1(t)$. The two
processes will be correlated, but the peaks and troughs are now expected to
occur first in $X_1(t)$ and then in $X_2(t)$. The relationship between the two
processes can be summarized by a regression-like parameter, $\beta$, instead of the
correlation term, $\rho$. It is also fairly straightforward to exchange the OU process
with BM for the driving process, as was done in Hansen et al. 2008. Note also that
if process X 1 (t ) is not measured, its influence can still be detectable on the
measured process, as the auto-correlation of the process will be different from
that of an OU process. This is the basis for multi-layered SDE analysis (Reitan et
al. 2012).
A given time series can be fitted to a single-layer SDE, which can be a BM
process, an OU process, an OU process with a trend, etc., but it can sometimes be
better described by multiple SDEs that are encapsulated in causal layers (Reitan
et al. 2012). In such a multi-layered SDE, variables describing the data are
ordered with a directed causal flow. We then number the multiple processes
associated with a single measured time series (Layer 1), such that for a
three-layered process, Layer 3 causally affects Layer 2, which in turn causally affects
Layer 1. Layer 1 would thus be an OU-like tracking of Layer 2, which in turn
would be an OU-like tracking of layer 3. The process of the lowest layer (layer 3
in our terminology) could be either an OU process, an OU process with a linear
time trend or a BM process. For identifiability, we set $\beta = 1$ (see SI of Reitan et al.
2012) and $\mu$ is set only for the lowest layer as the other processes will inherit this
expectation. So if the lowest layer (Layer 3) is an OU process, the whole system
looks like this (where subscripts denote layers):
$$dX_1(t) = -\alpha_1 (X_1(t) - X_2(t))\, dt + \sigma_1\, dB_1(t)$$
$$dX_2(t) = -\alpha_2 (X_2(t) - X_3(t))\, dt + \sigma_2\, dB_2(t)$$
$$dX_3(t) = -\alpha_3 (X_3(t) - \mu)\, dt + \sigma_3\, dB_3(t)$$
Data quantity and quality can set limits to what relationships are detectable
with SDE analyses. With too few data points or with too little overlap between
the time periods of the measurements, a correlative or causal relationship
between two truly correlated or causally related processes might not be found. If
the measurements are too sparsely sampled in time to detect the
auto-correlation in each process, causal relationships may be inferred as correlative
ones. If a hidden (layered) process, A, affects B and process B responds too
quickly compared to the sparseness of measurements, only the dynamics of
process A will be detected.
4. Bayesian framework and model comparison
Likelihood landscapes for evolutionary processes can be very complex and we
have previously encountered numerical problems in our maximum likelihood
approaches to analyzing linear SDEs (see Reitan et al. 2012). Hence we use
Bayesian Markov chain Monte Carlo (MCMC) analyses here to obtain samples
from the posterior distributions of the parameters, given the data and prior
distributions. For each parameter, we assigned an independent prior
distribution, which was set to be wide but informative: Expected value, $\mu$,
logged stochastic contribution, $\log(\sigma)$, logged half-life, $\log(\Delta t_{1/2})$, causal
connection, $\beta$, linear time trend, $\gamma$, and logit-transformed correlation,
$\log((1 + \rho) / (1 - \rho))$, were all assigned normal distributions adjusted to give a
target 95% credibility interval (CI) on the original scale. The expected value, $\mu$,
was given a 95% CI of (-20,20) for the logged biotic rates, (-1000,1000) for sea
level, (-1,1) for normalized abiotic data (see Section 6 and main text) and
(-10,10) for the other abiotic series. The stochastic contribution was given a vague
prior with a 95% CI of (0.01,100) for logged biotic rates and (0.00001,100) for
the remaining time series. Half-life, $\Delta t_{1/2}$, and correlation, $\rho$, were assigned
(0.001,1000) and (-0.96,0.96) respectively as 95% CI for all series. Linear time
trends, $\gamma$, and causal connections, $\beta$, were both given a CI of (-10,10).
Model comparison was performed using the Bayesian Model Likelihood (BML),
defined as the probability density of the data given the prior parameter
distribution of the model:
$$\mathrm{BML}(M) \equiv f(D \mid M) = \int f(D \mid \theta_M)\, f(\theta_M)\, d\theta_M$$
where $f(D \mid \theta_M)$ is the likelihood and $f(\theta_M)$ is the prior distribution. The BML can be
used for deriving the Bayes factor (which compares two models),
$B(M_1, M_2) = f(D \mid M_1) / f(D \mid M_2)$, or to calculate model probabilities in a
collection of models,
$$P(M_j \mid D) = \frac{f(D \mid M_j)\, P(M_j)}{\sum_{i=1}^{M} f(D \mid M_i)\, P(M_i)}.$$
For our purposes, a given
null model (of no relationship between two time series) is assigned a prior
probability of 50% with each alternative model sharing the remaining 50%
probability equally. Thus the Bayes factor for the null model versus the set of
alternatives is $B(M_0, M_A) = P(M_0 \mid D) / (1 - P(M_0 \mid D))$. This can also be equivalently
reported as the posterior probability of the null model, $P(M_0 \mid D)$.
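The step from Bayesian model likelihoods to posterior model probabilities and the null-versus-alternatives Bayes factor is simple arithmetic, sketched below in Python. The function names and the numerical BML values are hypothetical; the prior assignment (50% to the null model, the remainder shared equally among alternatives) follows the description above. Working on the log scale is a common precaution against numerical underflow and is our assumption here, not a statement about the original implementation.

```python
import math


def posterior_model_probs(log_bml_null, log_bml_alts, prior_null=0.5):
    """Posterior probabilities [null, alt_1, ..., alt_k] from log BML values."""
    prior_alt = (1.0 - prior_null) / len(log_bml_alts)
    log_terms = [log_bml_null + math.log(prior_null)]
    log_terms += [lb + math.log(prior_alt) for lb in log_bml_alts]
    shift = max(log_terms)  # subtract the maximum before exponentiating
    weights = [math.exp(term - shift) for term in log_terms]
    total = sum(weights)
    return [w / total for w in weights]


def bayes_factor_null_vs_alternatives(p_null):
    """Bayes factor B(M0, MA) implied by a 50% prior on the null model."""
    return p_null / (1.0 - p_null)


# Made-up example: BML of 1 for the null model, 2 and 4 for two alternatives.
probs = posterior_model_probs(math.log(1.0), [math.log(2.0), math.log(4.0)])
bf = bayes_factor_null_vs_alternatives(probs[0])
```

In this example the posterior probability of the null model is 0.25, and the Bayes factor of 1/3 equals the null BML divided by the average BML of the alternatives.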
Our numerical method for calculating the BML uses a multivariate normal
distribution adjusted to the MCMC samples as a proposal distribution in an
importance sampling scheme as described in Reitan and Petersen-Øverleir
(2009). In short, while the model probabilities do depend on the prior
distribution, they are robust. That is, the model probabilities are not very
sensitive to changes in the prior distribution, as long as these changes are not
dramatic (several orders of magnitude, see Reitan et al. 2012).
5. Comparing SDE with ordinary regression based approaches with
simulations
We simulated three scenarios and then applied SDE analysis, as well as regression
techniques, to recover the relationships between time series pairs. In the first
scenario, the two simulated time series are i) independent; in the second, they
are ii) correlated, but not causally related; and in the third, they are iii) causally
related such that one time series drove the other (see main text Eqns 2 to 4).
Each time series pair was simulated 100 times, each with 200 data points
drawn uniformly over a time period [0, 100] such that there were no time points
found in both series (i.e. zero overlap of temporal data). While this sounds
extreme, we note that it is exceedingly rare that an isotope measurement
(paleoenvironmental proxy) stems from the same shell that contributed to a
taxon observation in the PaleoDB. Even taxa observed in the same named stage
are unlikely to be from the same decade or century, all the more so if they are from
different locations. The OU parameters were $\mu = 2$, $s = 2$, $\Delta t_{1/2} = 5$ for the first series in
each pair and $\mu = 2$, $s = 1$, $\Delta t_{1/2} = 20$ for the second. To enable linear regression
analysis, the simulated data were binned into 2 My intervals. Note that there is a
mean of 48 time bins in the binned data as some of the 50 time intervals
contain no data points due to the randomized drawing of data. In addition to
performing regression directly on the binned simulated data, we also perform
the same analysis after shifting the data of one time series one time bin forward
relative to the other and vice versa. Bonferroni correction was used to deal with
the multiple testing. The code for generating and analyzing these simulated time
series can be found in simcausal.zip (attached here with our submission and
could also be deposited at Dryad or another online repository).
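The mean of about 48 usable time bins can be reproduced with a short calculation. The Python sketch below assumes that a bin is usable for regression only if it contains at least one data point from each of the two series, that each series has 200 points drawn uniformly over [0, 100], and that bins are 2 My wide; this reading of the binning step is our assumption, and the function names are hypothetical. The code in simcausal.zip remains the authoritative description of the procedure.

```python
import random


def expected_shared_bins(n_points=200, t_max=100.0, bin_width=2.0):
    """Expected number of bins containing data from both series."""
    n_bins = int(t_max / bin_width)
    # Probability that a given bin receives at least one of n_points draws.
    p_occupied = 1.0 - (1.0 - 1.0 / n_bins) ** n_points
    # The two series are drawn independently of each other.
    return n_bins * p_occupied ** 2


def simulated_shared_bins(rng, n_points=200, t_max=100.0, bin_width=2.0):
    """One random draw of the number of bins occupied by both series."""
    def occupied_bins():
        return {int(rng.random() * t_max // bin_width) for _ in range(n_points)}
    return len(occupied_bins() & occupied_bins())


analytic = expected_shared_bins()
rng = random.Random(2014)
simulated_mean = sum(simulated_shared_bins(rng) for _ in range(500)) / 500
```

Under these assumptions the analytic expectation is close to 48.3 out of 50 possible bins.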
Given that 95 out of 100 cases are classified as independent when they indeed
were independent, we can say that the linear regression on first-differenced time
series is well calibrated in this set of simulations, given 95% confidence (Table
S8). However, linear regression on the un-differenced time series gave a false
positive rate of 7+18+18=43% (Table S8). This invalidates the true positive rates
of the correlated and causally related cases. Note also that linear regression on
causally connected first-differenced time series performed poorly: 73 causal
cases were classified as independent. In contrast, SDE performed well in the
classification of relationships in all simulated scenarios (Table S8).
It must be stressed that linear regression on shifted time series is not the same
as inferring a true Granger causal relationship (Granger 1969, Schweder 1970,
see rest of section). We have nevertheless adopted such an interpretation in
Table S8 because time-lagged analyses have been used in the paleontological literature
as a test for causal relationships.
In the above simulations, the false positive and negative rates are both variable
in each approach. The Bayesian test uses BML based on vague prior knowledge
and is not constructed to yield 5% false positives, like a well-calibrated classic
test is supposed to (if a 95% confidence level is desirable). However, by varying
the target Bayes-factor for linear SDE analysis and p-value for classic regression,
one can create a ROC-curve. A ROC (Receiver Operating Characteristic) curve is a
plot that shows the relationship between false positives and true positives for
varying test sensitivities, such that comparisons between tests of different
philosophies are possible. An entirely random test will yield a diagonal line,
while a near perfect test would be a line squeezed towards the upper left corner
of a ROC plot (see Fig. below). The strength of a test calibrated to a 5%
significance level can be read from the true positive rate (TPR) when the false
positive rate (FPR) is 5%. Since we are now concerned only with inferring whether a
link exists or not, the nature of the relationship (correlated or causal) is not of
concern here. For correlated pairs of time series, linear regression on
first-difference data performed only slightly better than linear SDE for our example,
while linear regression on untransformed data performed very poorly (see panel
A in the figure below). For causally connected pairs, both regression methods
performed much worse than linear SDE analysis and in this particular case
almost as poorly as a random test (see panel B in the figure below). It must be
noted that the test strength given for the two regression tests at a 5%
significance level must be regarded as a best-case scenario since one must first
use time series models to calibrate these.
Figure: ROC-curves for A) correlated time series pairs, B) causally
connected time series pairs (x1->x2). The y-axes are true positive rates (TPR)
and the x-axes are false positive rates (FPR). In both A and B, blue lines indicate
linear regression on binned untransformed data, red lines indicate linear
regression on binned first-differenced data, green lines indicate linear SDE
analysis on original data. The black diagonal lines show the ROC curve of a
purely random test. The vertical line shows where to read the test strength for a
test calibrated to a 5% significance level.
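A ROC curve of this kind can be constructed from any test statistic for which larger values indicate stronger evidence for a link, for example a Bayes factor in favor of a link or one minus a p-value. The Python sketch below is a minimal illustration with made-up scores; the function names `roc_points` and `tpr_at_fpr` are hypothetical and the code is not the one used to produce the figure.

```python
def roc_points(null_scores, alt_scores):
    """ROC points (FPR, TPR) obtained by sweeping a threshold over all scores.

    null_scores: test statistics from simulations without a relationship.
    alt_scores:  test statistics from simulations with a relationship.
    """
    thresholds = sorted(set(null_scores) | set(alt_scores), reverse=True)
    points = [(0.0, 0.0)]
    for thr in thresholds:
        fpr = sum(s >= thr for s in null_scores) / len(null_scores)
        tpr = sum(s >= thr for s in alt_scores) / len(alt_scores)
        points.append((fpr, tpr))
    return points


def tpr_at_fpr(points, target_fpr=0.05):
    """Test strength: the highest TPR reachable without exceeding target_fpr."""
    return max(tpr for fpr, tpr in points if fpr <= target_fpr)


# Made-up scores for four null and four linked simulations.
points = roc_points([0.1, 0.2, 0.3, 0.4], [0.35, 0.5, 0.6, 0.7])
strength = tpr_at_fpr(points, target_fpr=0.05)
```

With only four null simulations the attainable false positive rates are multiples of 25%, so the strength at 5% is read at FPR = 0; with 100 simulations per scenario the resolution is 1%.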
The poor performance of linear regression on untransformed data is
unsurprising given the high number of false positives seen in our simulations
(Table S8). While linear regression on first differences yields better results than
for untransformed data, it is still far less efficient than linear SDE analysis. A
causal connection x1 → x2 means that changes in x1 have an accumulated effect
on x2, with a “memory” proportional to the half-life. This is not the same as a
correlation for a specific temporal shift between the two time series. For
temporally un-shifted data, both linear regression methods performed very
poorly on causally related time series. In our simulated example, the two tests
performed almost as poorly as an entirely random test. When we simulated x1
(the cause) as the slow series and x2 (the effect) as the fast series, linear
regression on untransformed data gave better results than linear regression on
first-difference data for lag=0. The reason might be that since x1 at one time
point is now well correlated with x1 at a later time point and x2 is correlated to
x1 at the former time point, x1 and x2 will also be correlated at the same time
point. Thus we cannot take for granted that first differenced data will yield better
regression results than untransformed data in all cases. Because of the reasons
laid out in this section, we did not perform linear regression on our empirical
data.
6. Data preparation for SDE
SDE analyses assume normality of data. While all the biotic time series, the δ13C,
δ34S and eustatic sea-level series are normally distributed as verified with a
Kolmogorov-Smirnov test, we had to transform δ18O and 87Sr/86Sr using 𝑥 =
Φ−1 (𝐹𝑦 (𝑦)) where 𝑦 is the original value, x is the transformed value, 𝐹𝑦 is the
estimated cumulative distribution function of the original values and Φ is the
cumulative distribution function of the standard normal distribution. A kernel
smoother was used in order to estimate 𝐹𝑦 using the “density” function in R
(R_Core_Team 2014), with adjustment 0.1 such that it is closer to the empirical
distribution function.
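The normal-score transformation described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the R code used for the analyses: the cumulative distribution function is taken to be that of a Gaussian kernel density estimate, and the bandwidth is computed with the rule of thumb that we understand to be the default of R's `density` function (`bw.nrd0`), multiplied by the adjustment 0.1. The function name `normal_score_transform` and the example data are hypothetical.

```python
import math
from statistics import NormalDist, quantiles, stdev


def normal_score_transform(values, adjust=0.1):
    """Map values to normal scores via a kernel-smoothed CDF.

    Implements x = Phi^{-1}(F_y(y)), where F_y is the CDF of a Gaussian
    kernel density estimate of the original values.
    """
    n = len(values)
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    spread = min(stdev(values), (q3 - q1) / 1.34) or stdev(values)
    # Rule-of-thumb bandwidth (assumed default), scaled by the adjustment.
    bandwidth = adjust * 0.9 * spread * n ** (-0.2)
    std_normal = NormalDist()

    def smoothed_cdf(y):
        # The CDF of a Gaussian KDE is the average of the kernel CDFs.
        return sum(std_normal.cdf((y - v) / bandwidth) for v in values) / n

    return [std_normal.inv_cdf(smoothed_cdf(y)) for y in values]


# Made-up, strongly right-skewed example data.
skewed = [math.exp(0.1 * i) for i in range(30)]
scores = normal_score_transform(skewed)
```

The transformation is monotone, so the rank order of the original values is preserved while the marginal distribution is brought close to standard normal.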
Results
Here, we list the results presented in the supplementary Tables file and follow these
with some short remarks.
Table S1. Pradel seniority estimates for origination, extinction and sampling
probabilities
Table S2. Stand-alone time series summary: multi-layered process inference
Table S3. Relationships among paleoenvironmental proxies and biotic time
series (single-layered process inference)
Table S4. Relationships among paleoenvironmental proxies and biotic time
series (multi-layered process inference)
Table S5. Relationships among paleoenvironmental proxies (single-layered
process inference)
Table S6. Relationships among paleoenvironmental proxies (multi-layered
process inference)
Table S7. Relationships among biotic time series of bivalves and brachiopods
(multi-layered process inference)
Table S8. Classification results from simulations
Relationships among abiotic time series
Of the five abiotic time series, only low latitude δ18O and δ13C have a significant
positive correlative relationship with one another (Tables S5-S6) after a
Bonferroni-like correction (see Methods). In contrast, a study using a different
statistical approach (Hannisdal 2011) found that δ13C and δ34S, low latitude δ18O
and δ13C, and δ34S and 87Sr/86Sr are the pairs of time series exhibiting the
strongest signals of information transfer (IT), with the first two relationships
being bidirectional while the last was unidirectional from δ34S to 87Sr/86Sr.
Hannisdal also found a unidirectional coupling from δ34S to δ18O that can be
explained by the other pairs of relationships. To summarize, we recovered only one of the
relationships found by Hannisdal (2011), who used a non-parametric
information transfer approach. The discrepancies between Hannisdal’s study
and ours may be due to some or all of the following causes: 1) Discrepancies
could simply be due to type I (Hannisdal 2011) and type II (our results) errors.
2) Hannisdal (2011) did not account for multiple testing in the IT analyses. 3)
Hannisdal conditioned pairwise relationships on a third time series, an approach
that we did not use. 4) Information transfer is better suited than linear SDEs
to inferring non-linear relationships. However, one would not expect non-linear
relationships between two time series that both are normally distributed, even
though it is technically possible. Three of the abiotic time series are normally
distributed while we transformed the other two.
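That a non-linear relationship between two normally distributed variables is technically possible can be seen in a small constructed example. The sketch below is purely illustrative and unrelated to the empirical series: y equals x with a randomly flipped sign, so both variables are standard normal and linearly uncorrelated, yet their squares are identical. The helper name `pearson` and all parameter values are assumptions.

```python
import random


def pearson(a, b):
    """Pearson correlation between two equally long sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5


rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(5000)]
# y keeps the magnitude of x but takes a random sign, so y is also standard normal
y = [xi if rng.random() < 0.5 else -xi for xi in x]

r_linear = pearson(x, y)  # linear association between x and y
r_squares = pearson([v * v for v in x], [v * v for v in y])  # association of magnitudes
print(f"linear: {r_linear:.3f}, squares: {r_squares:.3f}")
```

A linear method sees no relationship in such a pair, although the two variables are strongly dependent.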
References cited
1.
Carlson, S.J. & Fitzgerald, P.C. (2007). Sampling taxa, estimating phylogeny and
inferring macroevolution: an example from Devonian terebratulide
brachiopods. Earth and Environmental Science Transactions of the Royal
Society of Edinburgh, 98, 311-325.
2.
Cohen, K.M., Finney, S.C., Gibbard, P.L. & Fan, J.-X. (2013). The ICS International
Chronostratigraphic Chart. Episodes, 36, 199-204.
3.
Connolly, S.R. & Miller, A.I. (2001a). Global Ordovician faunal transitions in the
marine benthos: proximate causes. Paleobiology, 27, 779-795.
4.
Connolly, S.R. & Miller, A.I. (2001b). Joint estimation of sampling and turnover
rates from fossil databases: Capture-Mark-Recapture methods revisited.
Paleobiology, 27, 751-767.
5.
Connolly, S.R. & Miller, A.I. (2002). Global Ordovician faunal transitions in the
marine benthos: ultimate causes. Paleobiology, 28, 26-40.
6.
Cooper, G.A. (1970). Generic characters of brachiopods. In: North American
Paleontological Convention (ed. Yochelson, EL). Allen Press Field Museum
of Natural History, Chicago, pp. 194–263.
7.
Cormack, R.M. (1964). Estimates of survival from sightings of marked animals.
Biometrika, 51, 429-438.
8.
Estes, S. & Arnold, S.J. (2007). Resolving the paradox of stasis: models with
stabilizing selection explain evolutionary divergence on all time scales.
Am. Nat., 169, 227-244.
9.
Evans, L.C. (2013). An Introduction to Stochastic Differential Equations. American
Mathematical Society.
10.
Foote, M. & Miller, A.I. (2013). Determinants of early survival in marine animal
genera. Paleobiology, 39, 171-192.
11.
Granger, C.W.J. (1969). Investigating causal relations by econometric models and
cross-spectral methods. Econometrica, 37, 424-438.
12.
Hannisdal, B. (2011). Non-parametric inference of causal interactions from
geological records. American Journal of Science, 311, 315-334.
13.
Hansen, T.F. (1997). Stabilizing selection and the comparative analysis of
adaptation. Evolution, 51, 1341-1351.
14.
Hansen, T.F., Pienaar, J. & Orzack, S.H. (2008). A comparative method for
studying adaptation to a randomly evolving environment. Evolution, 62,
1965–1977.
15.
Harnik, P.G., Fitzgerald, P.C., Payne, J.L. & Carlson, S.J. (2014). Phylogenetic signal
in extinction selectivity in Devonian terebratulide brachiopods.
Paleobiology, 40, 675-692.
16.
Jablonski, D., Roy, K., Valentine, J.W., Price, R.M. & Anderson, P.S. (2003). The
impact of the pull of the recent on the history of marine diversity. Science,
300, 1133-1135.
17.
Jolly, G.M. (1965). Explicit estimates from capture-recapture data with both
death and immigration-stochastic model. Biometrika, 52, 225-247.
18.
Kalman, R.E. (1960). A new approach to linear filtering and prediction problems.
Journal of Basic Engineering, 82, 35-45.
19.
King, R. (2014). Statistical Ecology. In: Annual Review of Statistics and Its
Application, Vol 1 (ed. Fienberg, SE). Annual Reviews, Palo Alto, pp. 401-426.
20.
Kröger, B. (2005). Adaptive evolution in Paleozoic coiled cephalopods.
Paleobiology, 31, 253-268.
21.
Lande, R. (1976). Natural-selection and random genetic drift in phenotypic
evolution. Evolution, 30, 314-334.
22.
Liow, L.H. & Finarelli, J.A. (2014). A dynamic global equilibrium in carnivoran
diversification over 20 million years. Proceedings of the Royal Society B:
Biological Sciences, 281.
23.
Liow, L.H., Fortelius, M., Bingham, E., Lintulaakso, K., Mannila, H., Flynn, L. et al.
(2008). Higher origination and extinction rates in larger mammals.
Proceedings of the National Academy of Sciences of the United States of
America, 105, 6097-6102.
24.
Liow, L.H. & Nichols, J.D. (2010). Estimating rates and probabilities of origination
and extinction using taxonomic occurrence data: Capture-recapture
approaches. In: Short Courses in Paleontology: Quantitative Paleobiology
(eds. Hunt, G & Alroy, J). Paleontological Society, pp. 81-94.
25.
Nichols, J.D. & Pollock, K.H. (1983). Estimating taxonomic diversity, extinction
rates, and speciation rates from fossil data using capture-recapture
models. Paleobiology, 9, 150-163.
26.
Øksendal, B. (2010). Stochastic Differential Equations: An Introduction with
Applications. Sixth edn. Springer.
27.
Pollock, K.H., Nichols, J.D., Brownie, C. & Hines, J.E. (1990). Statistical inference
for capture-recapture experiments. Wildlife Monographs, 1-97.
28.
Powell, M.G., Moore, B.R. & Smith, T.J. (2015). Origination, extinction, invasion,
and extirpation components of the brachiopod latitudinal biodiversity
gradient through the Phanerozoic Eon. Paleobiology.
29.
Pradel, R. (1996). Utilization of capture-mark-recapture for the study of
recruitment and population growth rate. Biometrics, 52, 703-709.
30.
R Core Team (2014). R: A Language and Environment for Statistical Computing.
31.
Raup, D.M. (1977). Probabilistic models in evolutionary paleobiology. American
scientist, 65, 50-57.
32.
Reitan, T. & Petersen-Øverleir, A. (2009). Bayesian methods for estimating
multi-segment discharge rating curves. Stochastic Environmental Research and
Risk Assessment, 23, 627-642.
33.
Reitan, T., Schweder, T. & Henderiks, J. (2012). Phenotypic evolution studied by
layered stochastic differential equations. Annals of Applied Statistics, 6,
1531-1551.
34.
Roy, K., Jablonski, D. & Valentine, J.W. (1996). Higher taxa in biodiversity studies:
patterns from eastern Pacific marine molluscs. Philosophical
Transactions of the Royal Society B: Biological Sciences, 351, 1605-1613.
35.
Schweder, T. (1970). Decomposable markov processes. Journal of Applied
Probability, 7, 400-410.
36.
Seber, G.A.F. (1965). A note on multiple-recapture census. Biometrika, 52,
249-259.
37.
Sepkoski, J.J. (2002). A compendium of fossil marine animal genera. Bulletins of
American paleontology, 363, 1-560.
38.
Simpson, C. & Harnik, P.G. (2009). Assessing the role of abundance in marine
bivalve extinction over the post-Paleozoic. Paleobiology, 35, 631-647.
39.
Smaers, J.B. & Vinicius, L. (2009). Inferring macro-evolutionary patterns using an
adaptive peak model of evolution. Evolutionary Ecology Research, 11, 991-1015.
40.
Wagner, P.J., Aberhan, M., Hendy, A. & Kiessling, W. (2007). The effects of
taxonomic standardization on sampling-standardized estimates of
historical diversity. Proceedings of the Royal Society B-Biological Sciences,
274, 439-444.
41.
Williams, B.K., Nichols, J. & Conroy, M.J. (2002). Analysis and Management of
Animal Populations. Academic Press, San Diego, San Francisco, New York,
Boston, London, Sydney, Tokyo.