Marine Micropaleontology 36 (1999) 225–248
Quantitative paleo-estimation:
hypothetical experiments with extrapolation and the no-analog problem
Figen Mekik *, Paul Loubere
Department of Geology and Environmental Sciences, Northern Illinois University, De Kalb, IL 60115, USA
Received 4 June 1998; accepted 15 December 1998
Abstract
We experiment with artificial data to test the response of five numerical techniques in extrapolating paleo-environments
for no-analog conditions. No-analog conditions are those beyond the technique calibration (modern) data set and will be
encountered in applications to the geologic past, though they may not be easy to recognize. In the ideal, a numerical
technique will correctly extrapolate to no-analog conditions. Failing this, the technique will have a consistent, predictable
error response to increasing no-analog conditions, as these are measured by a reliable index. The no-analog conditions
that we used are a natural extension of the calibration conditions we created. Thus we test techniques for their response
to shifting environmental conditions rather than for factors unrelated to the ecology of the taxa (e.g. post-depositional
fossil preservation). The five numerical techniques we test with our hypothetical data are (1) multivariate regression of species
percents, (2) correlation-based principal components with linear regression, (3) covariance-based principal components
with linear regression, (4) correlation-based principal components with non-linear regression, and (5) the Imbrie and Kipp
technique. All the techniques show increasing estimation error as conditions depart from those of the calibration data set.
There are two main causes of error in our estimates: (1) the distorting effects of matrix closure on taxon abundances;
and (2) generation of ratio no-analogs among species abundances because of non-linear responses to conditions departing
progressively from the calibration range. With all the techniques, the distribution of error for no-analog conditions is
complex. Non-linear regression with factors shows the least predictable error response. We found that currently developed
no-analog indicators do not have a good correlation to estimation error. This means that better indicators, more closely
linked to the accuracy of estimates, need to be developed. © 1999 Elsevier Science B.V. All rights reserved.
Keywords: multivariate techniques; modeling; microfossils; paleo-environments; extrapolation
* Corresponding author. Tel.: +1-815-7531943; Fax: +1-815-7531945; E-mail: figen@geol.niu.edu
1. Introduction
Interpolation and extrapolation of modern environmental parameters from recent microfossil abundances and spatial distributions to down core samples has been an ongoing challenge for paleontologists since the 1930’s. Schott (1935) began this endeavor by using fossil plankton recovered from deep
sea cores to interpret Pleistocene climates. Ericson
et al. (1964) studied fluctuations in the abundance of
selected taxa of planktic foraminifera and their coiling patterns to interpret climatic temperature changes
through the Pleistocene. In addition to foraminifera,
tree rings, pollen, diatoms, coccoliths and Radiolaria
have also been used to create calibration data sets for
0377-8398/99/$ – see front matter © 1999 Elsevier Science B.V. All rights reserved.
PII: S0377-8398(99)00004-3
F. Mekik, P. Loubere / Marine Micropaleontology 36 (1999) 225–248
estimation of paleoclimatic parameters (e.g. McIntyre, 1967; Ericson and Wollin, 1968). The recognition of glacial–interglacial oceanographic changes
and paleocirculation patterns via benthic foraminiferal assemblage distribution (Streeter, 1973; Streeter
and Shackleton, 1979) and the identification of water masses by their benthic foraminiferal content
(Shnitker, 1974) are other examples of early studies
of paleo-environmental inferences drawn from benthic foraminiferal distribution patterns. These early
studies were qualitative or semi-quantitative.
In the early 1970’s the use of multivariate techniques was broached independently in three publications: the Imbrie and Kipp technique (1971) using
planktic foraminifera, analysis of tree-ring width
variation by Fritts et al. (1971) and a study on variations in pollen assemblages in lacustrine deposits by
Webb and Bryson (1972). The Imbrie and Kipp technique of using transfer functions to estimate paleoclimatic parameters from taxon abundance data became
the standard method for estimating sea surface temperatures in the Climate/Long-Range Investigation,
Mapping and Prediction (CLIMAP Project Members, 1976, 1981) project. Subsequently, reconstruction of paleo-environments from fossil data using
transfer functions became widespread (e.g. Moore
et al., 1980; Mix et al., 1986; Le, 1992; Loubere,
1994; Pisias and Mix, 1997). Multivariate numeric
analyses were performed on the faunal composition
and spatial distribution patterns of planktic foraminifera in the north Atlantic (Kipp, 1976; Dowsett
and Poore, 1990); the tropical Atlantic (Ravelo et
al., 1991); the northeast Atlantic (Ottens, 1992); the
Indian Ocean (Hutson, 1978); the equatorial Pacific
(Thompson, 1976) and the western north Pacific
(Thompson, 1981). Sachs et al. (1977) and Hutson
(1977) reviewed the accuracy of transfer functions
and the identification of no-analog conditions.
Le and Shackleton (1994) tested the Imbrie and
Kipp technique of estimating sea surface temperatures (SST) with simulated biological species abundance data in order to observe the effects of the number of factors in the calibration, regression types,
counting errors, calibration ranges and sub-surface
species. They demonstrated that if the number of
factors is too small, SST is over-estimated at low
temperature ranges and under-estimated at high temperature ranges. They have also shown that although
using non-linear equations amplifies the effect of
counting errors, these equations produce results with
higher accuracy when used within the calibration
range of the data set.
Loubere and Qian (1997) used artificial fossil data
in order to control species environmental responses,
environmental conditions and the sampling scheme.
They used Principal Components and Regression
Analysis for reconstructing specific environmental
parameters. They demonstrated that if the sampling
scheme is constructed in such a way that the controlling environmental parameters are orthogonal to
one another, the resulting factor patterns reflect these
variables most accurately. They also illustrated that
principal component structure matrices can be used
to interpret species responses and that regression
analysis can successfully draw independent environmental signals from the taxon compositional data.
The distortion of species abundances and spatial distribution produced by the confounding effects of matrix closure is also illustrated in their work. For the
successful application of transfer functions in recovering paleo-environmental parameters, a knowledge
of controlling environmental variables and their correlation to one another is necessary (Loubere and
Qian, 1997). Loubere and Qian (1997) did not explore methods for recognizing no-analog conditions
and their effect on multivariate numerical analyses.
We address these issues herein.
2. The no-analog problem
Quantitative reconstruction of environmental conditions for the geologic past depends on having a
modern calibration data set which encompasses the
past conditions, or on being able to extrapolate accurately from the calibration to the past conditions. In
the ideal, extrapolation is the less desirable approach,
but in reality, it is not always easy to recognize no-analog material, forcing extrapolation (e.g. Hutson,
1977); and sometimes extrapolation is necessary as
conditions in the past have no modern analogs.
Thus, it is important to determine how estimation
error develops for numerical techniques in response
to no-analog conditions. This is especially true for
no-analogs generated by shifting environmental conditions of interest, as opposed to no-analogs produced by factors separate from those we want to
estimate (e.g. fossil preservation as in Hutson, 1977).
No-analogs due to shifting environmental conditions
are likely to be the hardest no-analogs to recognize.
Paleo-estimates are generated by transfer functions which are empirically derived algebraic expressions that extract paleo-environmental variables
from paleontological abundance and spatial distribution data. Four important assumptions are inherent in
the application of transfer functions:
(1) Environmental conditions within the down
core data set fall within the range of variation of
these conditions in the calibration data set.
(2) The response of the taxa (percentage of abundance of fossil components within the data set) is
linearly or at least systematically controlled by the
environmental parameters under study.
(3) The ecological behavior of the taxa remains
constant in the past.
(4) Preservation of samples does not modify faunas in a way that is significantly correlated with the
environmental parameters of interest.
If any of these assumptions are not met, no-analog conditions can be created. Hutson (1977)
studied no-analog conditions, with planktic foraminifera, which are produced by high dissolution
rates in deep sea samples and tested the success of
employing transfer functions in estimation of sea
surface temperatures for recent material. He tested
species regression, principal component regression,
distance-index regression, diversity index regression
and a weighted average technique. He concluded that
among these, the first four provide variable estimates
under no-analog conditions whereas the weighted
average technique interpolates and provides the most
accurate estimates. According to Hutson (1977),
no-analog conditions can be recognized by comparing species
abundances and the ratio of abundances among the
species in the calibration data set with those in the
down core data set; by low communality values (Imbrie and Kipp, 1971) in the down core data set; or by
estimating significantly different paleo-environmental parameters via different numerical techniques.
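The communality criterion can be sketched as follows. This is a minimal illustration rather than the code of Hutson (1977) or Imbrie and Kipp (1971): it assumes the retained factors are supplied as orthonormal vectors in species space, and the function name and all numbers are hypothetical.

```python
from math import sqrt

def communality(sample, factors):
    """Sum of squared loadings of a row-normalized sample on the retained factors."""
    # Row-normalize the sample to unit length.
    length = sqrt(sum(x * x for x in sample))
    unit = [x / length for x in sample]
    # Loading on each factor = projection of the unit sample vector onto that factor.
    loadings = [sum(u * f for u, f in zip(unit, vec)) for vec in factors]
    return sum(l * l for l in loadings)

# Hypothetical model: two retained factors, each dominated by one of three taxa.
factors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
analog = communality([30.0, 40.0, 0.0], factors)      # only modeled taxa present
no_analog = communality([30.0, 40.0, 50.0], factors)  # third taxon is not modeled
print(analog, no_analog)
```

A communality near 1 means the factor model describes the sample well; a low value flags a candidate no-analog sample.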
Our objectives are:
(1) To analyze the success of numerical techniques in extrapolating environmental parameters
from an original calibration data set. We quantify and
examine the relationship between degree of assemblage no-analog and amount and type of estimation
error produced by several different paleo-estimation
approaches.
(2) To test five multivariate numerical techniques
capable of extrapolation in estimating environmental
parameters for samples outside the calibration range.
These five techniques are: species based multiple
regression; linear regression of factor loadings derived from the correlation matrix of the calibration
data set; linear regression of factor loadings derived
from the covariance matrix of the calibration data
set; non-linear regression of factor loadings derived
from the correlation matrix of the calibration data
set; and the Imbrie and Kipp (1971) technique where
samples are row normalized and the sums of squares
matrix is used instead of the correlation matrix in
calculating factor loadings.
(3) To examine the effects of matrix closure on
species apparent environmental responses and extrapolation of environmental conditions.
(4) To examine two methods for identifying no-analog samples and their relationship with estimation
error.
As outlined above, our analysis is based on no-analog conditions generated by changing the environmental conditions to which the organisms respond. This is different from no-analogs created by
factors independent of the organisms’ ecologies as
in the study by Hutson (1977; no-analogs produced
by bottom water driven dissolution of planktic foraminifera). Also, we do not examine techniques
like weighted-averaging and modern analog (Hutson, 1977; Prell, 1985; Ortiz and Mix, 1997) which
are incapable of extrapolating and therefore cannot
be used in studying no-analog conditions. Ideally,
we would like to identify no-analog samples and
obtain reasonable estimates of what they represent
by extrapolating from assemblage patterns in the
calibration data set.
Assessing the accuracy with which transfer functions estimate paleo-environmental parameters is difficult because an independent source for calculating
these parameters is generally lacking. Thus, we perform numerical experiments on artificial environmental parameters with artificial species responses
producing an artificial data set (after Loubere and
Qian, 1997). In this way, we have an independent
means of knowing the correct values for our paleo-environmental variables outside calibration conditions. Therefore we can determine the amount,
pattern and causes of error in paleo-environmental
estimates calculated by transfer functions.
3. Methods
The setting we created for this study is an artificial continental margin affected by an upwelling system bringing nutrient-rich deep waters to the surface (Fig. 1). Our hypothetical system is controlled by the two paleo-environmental variables: organic carbon flux and bottom water temperature. We base our experiment on benthic foraminifera as it has been shown in recent studies that these types of environmental signals are embedded in benthic foraminiferal assemblage data (e.g. Loubere, 1996). In our simulated setting (Fig. 1), temperature decreases with depth while the flux of organic carbon to the seabed decreases radially outward from the center of the upwelling region. The contours for these two variables are intentionally made orthogonal so that they are not correlated to one another.
Fig. 1. Change of bottom water temperature and organic carbon flux in artificial study area. Solid contour lines represent organic carbon flux and dashed contours represent temperature. ● = sample locations for the calibration data set. All other symbols show sample locations for the test data set. In this diagram and in all diagrams in this study, △ = high temperature–high carbon samples, ○ = test samples from the calibration area, [symbol missing] = low temperature–low carbon samples and ? = low carbon–variable temperature samples.
We assume our study area is inhabited by 12 species of benthic foraminifera as in Loubere and Qian’s (1997) work and that these twelve species
respond only to organic carbon flux and bottom water temperature. Response patterns of the 12 taxa
in arbitrary units, as devised in Loubere and Qian
(1997), are utilized in this study without any changes
(Fig. 2). These response patterns were originally designed to imitate realistic ecologic behavior as well
as to provide a range of response types (for examples of real ecologic behavior see Imbrie and Kipp,
1971; Kipp, 1976; Loubere, 1981, 1991; Miller and
Lohmann, 1982; Lutze and Colbourn, 1984; Mackenson et al., 1993). Species 1 and 3 respond only
to organic carbon flux and increase in abundance as
the flux of organic carbon increases. Although both
species 5 and 12 are affected solely by temperature, their responses are opposite. Species 2, 4 and
7 are controlled by both variables and respond positively to them whereas species 6, 9 and 10 respond
negatively. Species 8 has a non-linear response pattern, becoming most abundant at high organic carbon
values and intermediate temperatures. Species 11 increases with higher temperatures and lesser amounts
of organic carbon flux.
The arbitrary numbers from the taxon response diagrams (Fig. 2) were converted to shell accumulation
rate at the seabed for each species by multiplying
with an arbitrary production factor (Table 1). The
production factor combines the rates of shell production and destruction yielding a net accumulation
rate for the shells. For every sample we calculate
species percentage by converting the independently
computed species accumulation rates into relative
abundances. All of our analyses are based on taxon
percentage data in keeping with the form of data
most often used in paleo-environmental analysis.
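The conversion from response units to percentages can be sketched as follows. This is a minimal illustration under stated assumptions: the response values are hypothetical, only the production factors of species 1–3 (10, 5 and 3) are taken from Table 1, and the function name is introduced here for illustration.

```python
def to_percent(responses, production_factors):
    """Convert response units to percent abundances for one sample."""
    # Net shell accumulation rate = response (arbitrary units) x production factor.
    rates = [r * p for r, p in zip(responses, production_factors)]
    total = sum(rates)
    # Closing the data to 100% ties every percentage to all the other taxa.
    return [100.0 * rate / total for rate in rates]

factors = [10, 5, 3]  # production factors of species 1-3 (Table 1)

# Hypothetical responses at two sites; only the response of species 1 differs.
site_a = to_percent([2.0, 4.0, 4.0], factors)
site_b = to_percent([4.0, 4.0, 4.0], factors)
print(site_a)
print(site_b)
```

The percentages of species 2 and 3 drop from site A to site B although their own responses are unchanged, which is the closure effect examined in Section 4.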
To construct the calibration data set, 30 sample
locations were chosen on our continental margin between organic carbon flux values of 20–40 g C m⁻² yr⁻¹ and 5–15°C (Fig. 1). The percent abundance of
each species at each of these locations was calculated
and tabulated on a 30 × 12 matrix (Appendix A). This
matrix is our calibration data set for the 12 species at
30 locations.
To construct a test data set for taxon percents outside the calibration range, 40 sample locations were
chosen (Fig. 1). Ten of these new locations were
selected within the calibration range and 30 are outside this range. These samples tend to behave as four
separate groups when we apply our transfer functions (see Fig. 1). The first group is made of samples
which fall within the calibration range. Nine samples
form a second group from high temperature–high
organic carbon flux areas. The third group is made
up of low temperature and low organic carbon flux
samples. The fourth group is made of 12 samples
from low organic carbon flux but variable temperature areas. The percent abundance of each species
for each of these 40 samples was calculated and
tabulated on a 40 × 12 matrix (Appendixes B and C).
Regression coefficients obtained from the analysis of the calibration data set and the transfer equations were applied to the 40-sample test data set to make bottom water temperature and organic carbon flux estimates (Tables 2 and 3).

Table 1
Species percent abundance information

Species  Production  Calibration data  Test data      Mean in    Mean in      Standard deviation
         factor      variation range   variation      test data  calibration  in calibration
                     (%)               range (%)      (%)        data (%)     data (%)
1        10          12.4–19.8         0–20           11.83      15.67        2.21
2        5           1.8–13.9          0–19.1         5.83       6.76         3.51
3        3           0.0–1.1           0–4.3          0.82       0.26         0.44
4        5           4.8–21.8          0–31.1         15.49      15.95        3.88
5        2           0.9–5.2           0.3–9.3        2.99       2.47         1.13
6        3           0.2–8.3           0–19           5.55       3.07         2.46
7        10          15.9–28.3         4.5–30         19.03      20.43        2.79
8        5           7.5–13.4          0.3–13.5       5.57       10.47        1.75
9        3           0.7–10.5          0.1–17.3       6.17       4.84         2.70
10       3           0.0–3.9           0–19           4.32       1.04         0.97
11       3           0.0–3.6           0–13.3         2.52       0.70         0.97
12       5           10.0–28.0         1.5–44.8       19.92      18.52        4.62

Fig. 2. Ecologic response of 12 species of benthic foraminifera to temperature and organic carbon. ● = sample locations for the calibration data set. All other symbols show sample locations for the test data set.

The first method, multiple regression of environmental parameters on species percents, emphasizes the direct relationship between these parameters and not any relationship among the 12 species (Table 2). The species are treated unequally with emphasis on those that best reflect the environmental parameter in question.

Table 2
Results for multiple regression and principal component structure matrices for the techniques involving factor analysis

(a) Multiple regression of environmental parameters on species percents

Species    Temperature        Carbon
           B        β         B        β
1          0.52     0.46      0.92     0.38
2          0.04     0.06      0.01     0.01
3          0.31     0.05      1.32     0.11
4          0.03     0.05      0.26     0.19
5          1.66     0.74      2.32     0.50
6          0.11     0.10      0.15     0.07
7          0.01     0.01      0.26     0.14
8          0.03     0.02      0.19     0.06
9          0.23     0.24      0.52     0.27
10         0.04     0.01      0.88     0.16
11         0.68     0.26      0.75     0.14
12         0.18     0.34      0.81     0.71
Constant   18.97    –         48.52    –

(b) Principal component structure matrices

Species                  Correlation-based   Covariance-based    Sums of squares-based
                         PC analysis         PC analysis         PC analysis
                         PC1      PC2        PC1      PC2        PC1      PC2      PC3
1                        0.85     0.41       0.65     0.52       0.30     0.61     0.59
2                        0.90     0.12       0.92     0.06       0.06     0.92     0.07
3                        0.79     0.25       0.66     0.38       0.08     0.62     0.41
4                        0.74     0.01       0.81     0.31       0.39     0.77     0.30
5                        0.04     0.99       0.21     0.96       0.21     0.19     0.96
6                        0.85     0.29       0.83     0.11       0.15     0.82     0.08
7                        0.22     0.94       0.39     0.90       0.05     0.33     0.92
8                        0.72     0.47       0.49     0.54       0.54     0.43     0.58
9                        0.98     0.10       0.96     0.12       0.15     0.96     0.12
10                       0.92     0.23       0.92     0.02       0.36     0.92     0.01
11                       0.35     0.87       0.17     0.91       0.13     0.16     0.92
12                       0.85     0.44       0.97     0.21       0.02     0.98     0.19
% correlation structure  55.4     28.7       69.2     16.2       94.9     3.6      0.8

The second method is extrapolation of environmental variables based on principal components analysis using the correlation matrix. We generated a correlation matrix from the calibration data set and extracted its principal component structure matrix. The principal component structure matrix (Table 2) records the number of orthogonal patterns of species variation needed to account for the observed species correlations. We found two to three significant components (or factors) so the eigenvectors extracted from the calibration data set were used to calculate two or three factor loadings per sample. Then, a linear regression analysis was performed between each environmental variable and the factor loadings. The regression coefficients for the factors (Table 3) were then applied to the test group of 40 samples after these had been converted to factor loadings using the eigenvectors from the calibration data set. To use these coefficients, the test data were converted
to ‘pseudo-component’ loadings using the species
means and standard deviations along with the eigenvectors of the calibration data. The procedure is to
column standardize the test data using the means and
standard deviations. Then the standardized data is
cross-multiplied by the eigenvector matrix to compute principal component loadings for each sample
in the test data. These loadings multiplied by our regression coefficients yield estimates of temperature
and organic carbon flux.
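The procedure above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function name and every number (means, standard deviations, eigenvectors, regression coefficients and constant) are hypothetical placeholders for three taxa and two components.

```python
def estimate_from_percents(sample_pct, means, sds, eigenvectors, coefs, constant):
    """Estimate an environmental variable for one test sample."""
    # Column-standardize with the calibration means and standard deviations.
    z = [(x - m) / s for x, m, s in zip(sample_pct, means, sds)]
    # Cross-multiply by each calibration eigenvector to get pseudo-component loadings.
    loadings = [sum(zi * vi for zi, vi in zip(z, vec)) for vec in eigenvectors]
    # Apply the regression coefficients and the constant from the calibration.
    value = constant + sum(b * l for b, l in zip(coefs, loadings))
    return value, loadings

# Hypothetical calibration statistics for three taxa and two components.
means = [15.0, 7.0, 20.0]
sds = [2.0, 3.5, 4.0]
eigenvectors = [[0.6, 0.0, -0.8], [0.0, 1.0, 0.0]]
coefs = [1.5, -0.5]
constant = 30.0

at_mean, loadings_at_mean = estimate_from_percents(means, means, sds, eigenvectors, coefs, constant)
shifted, loadings_shifted = estimate_from_percents([17.0, 7.0, 16.0], means, sds, eigenvectors, coefs, constant)
print(at_mean, shifted)
```

A sample equal to the calibration means has zero loadings and returns the regression constant. Because the standardization uses calibration statistics, a sample far outside the calibration range produces loadings larger than any seen in the calibration, and the regression is then extrapolating.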
The third method is extrapolation of environmental variables based on extracting principal components using the covariance matrix. This test is
identical to the second one but the covariance matrix
was used instead of the correlation matrix of the calibration data set. Species were treated unequally in
this test with bias toward species having the largest
variance. This puts emphasis on the more common
species while still deriving orthogonal factors.
The fourth method is extrapolation of environmental variables based on principal components analysis using the correlation matrix and non-linear regression (Table 3) which uses cross-products and squares of factor loadings. Non-linear regression has usually been preferred over linear regression (e.g. Imbrie and Kipp, 1971; Moore, 1973; Sachs, 1973; Kipp, 1976; Lozano and Hayes, 1976; Geitzenauer et al., 1976; Le and Shackleton, 1994) because it has been empirically observed to increase the accuracy of estimates. Also, in downcore applications, non-linear equations produce a lower number of unreasonable estimates (Sachs et al., 1977).

Table 3
Regression coefficients of principal components for each technique

(a) Linear regression (cross-product and squared terms are not used: – in all columns)

           Correlation-based PC analysis        Covariance-based PC analysis
           carbon           temperature         carbon           temperature
           B       β        B       β           B       β        B       β
r²         0.97    –        0.96    –           0.97    –        0.97    –
PC1        1.89    0.93     0.34    0.34        0.56    0.79     0.20    0.59
PC2        0.96    0.34     1.21    0.89        0.73    0.50     0.53    0.76
PC3        0.12    0.02     0.65    0.24        0.61    0.31     0.22    0.23
Constant   29.30   –        9.85    –           29.30   –        9.85    –

(b) Non-linear regression

           Correlation-based PC analysis        Imbrie–Kipp technique
           carbon           temperature         carbon           temperature
           B       β        B       β           B       β        B       β
r²         0.996   –        0.994   –           0.98    –        0.99    –
PC1        1.89    0.93     0.41    0.41        –       –        –       –
PC2        1.23    0.43     1.35    0.99        22.14   0.79     8.03    0.60
PC3        0.19    0.03     0.70    0.26        45.18   0.78     27.10   0.97
PC1 × PC2  0.05    0.04     0.08    0.13        –       –        –       –
PC1 × PC3  0.05    0.02     0.05    0.04        –       –        –       –
PC2 × PC3  0.07    0.04     0.17    0.19        59.55   0.14     3.63    0.02
PC1²       0.09    0.12     0.01    0.02        –       –        –       –
PC2²       0.08    0.07     0.09    0.16        14.89   0.11     6.58    0.10
PC3²       0.42    0.12     0.06    0.04        115.5   0.33     47.47   0.29
Constant   28.05   –        10.06   –           27.85   –        9.99    –
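The cross-product and squared terms of the non-linear regression can be sketched as follows. This is a minimal illustration assuming three factor loadings per sample and the term order of the rows in Table 3; the function names and the numbers in the example are hypothetical.

```python
def nonlinear_terms(loadings):
    """Expand three factor loadings into linear, cross-product and squared terms."""
    f1, f2, f3 = loadings
    # Order: PC1, PC2, PC3, PC1xPC2, PC1xPC3, PC2xPC3, PC1^2, PC2^2, PC3^2.
    return [f1, f2, f3, f1 * f2, f1 * f3, f2 * f3, f1 ** 2, f2 ** 2, f3 ** 2]

def nonlinear_estimate(loadings, coefficients, constant):
    """Constant plus the sum of each coefficient times its term."""
    terms = nonlinear_terms(loadings)
    return constant + sum(b * t for b, t in zip(coefficients, terms))

# Hypothetical loadings and coefficients, for illustration only.
terms = nonlinear_terms([1.0, 2.0, 3.0])
baseline = nonlinear_estimate([1.0, 2.0, 3.0], [0.0] * 9, 10.0)
print(terms, baseline)
```

Because the squared and cross-product terms grow faster than the loadings themselves, loadings outside the calibration range can change an estimate much more than they would in a linear equation.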
The fifth method is the application of the Imbrie
and Kipp technique. Both the calibration and test
data sets were row normalized. Row normalization
is considered most appropriate for foraminiferal data
as their assemblages are usually characterized by
fewer species (Sachs et al., 1977). The procedure for
previous tests was repeated except that the sums of
squares matrix was used in extracting principal components and the regression was non-linear (Table 3).
This method is biased toward taxa with the largest
means, and the factors are oblique, that is, correlated
to one another so that they may contain overlapping
species response patterns.
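Row normalization and the sums of squares matrix can be sketched as follows. This is a minimal illustration that assumes "sums of squares matrix" means the species-by-species cross-product matrix of the row-normalized data, computed without subtracting species means; the function names and the data are hypothetical.

```python
from math import sqrt

def row_normalize(sample):
    """Scale one sample (row) to unit vector length."""
    length = sqrt(sum(x * x for x in sample))
    return [x / length for x in sample]

def sums_of_squares_matrix(data):
    """Species-by-species cross-products of the row-normalized samples (no centering)."""
    unit_rows = [row_normalize(row) for row in data]
    n_species = len(unit_rows[0])
    return [[sum(row[i] * row[j] for row in unit_rows) for j in range(n_species)]
            for i in range(n_species)]

# Hypothetical percent data: three samples (rows) by three taxa (columns).
data = [[30.0, 40.0, 0.0], [10.0, 20.0, 20.0], [0.0, 6.0, 8.0]]
ss = sums_of_squares_matrix(data)
trace = sum(ss[i][i] for i in range(3))
print(ss, trace)
```

Since the means are not removed, abundant taxa contribute the largest cross-products, which is consistent with the bias toward taxa with the largest means noted above.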
4. Distortion of data resulting from mathematical
analyses: effect of matrix closure
As demonstrated by Loubere and Qian (1997),
matrix closure is produced by the conversion of
species abundance data to percents which can yield
artificial correlations among species. Matrix closure
creates linear distortion in the ecologic response
patterns of the taxa within the calibration and test
data sets (Chayes, 1971; Krumbein and Watson,
1972; Butler, 1979). This distortion is most evident
among taxa that only respond to one environmental
parameter (compare Figs. 2 and 3 for species 1, 4, 5,
7, 11 and 12).
Overall, matrix closure has a somewhat homogenizing effect on taxon response by spreading environmental signals from species that carry a strong
environmental signal to those that do not respond
to that signal. In this way, matrix closure produces
spurious signals in the abundance patterns of certain taxa and becomes a potential source of error in
paleo-estimation.
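The spurious signal produced by closure can be sketched with a small numerical experiment. This is a minimal illustration with hypothetical taxa and values, not the response patterns of Fig. 2: taxon A tracks carbon only, taxon B tracks temperature only, taxon C is constant, and the sampling grid is orthogonal.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Orthogonal sampling grid (hypothetical values): every carbon with every temperature.
samples = [(c, t) for c in (20, 30, 40) for t in (5, 10, 15)]
carbon = [c for c, t in samples]

rate_a = [float(c) for c, t in samples]      # responds to carbon only
rate_b = [2.0 * t for c, t in samples]       # responds to temperature only
rate_c = [10.0 for _ in samples]             # constant

# Closure: convert accumulation rates to percentages within each sample.
pct_b = [100.0 * b / (a + b + c) for a, b, c in zip(rate_a, rate_b, rate_c)]

raw_r = pearson(rate_b, carbon)     # taxon B before closure
closed_r = pearson(pct_b, carbon)   # taxon B after closure
print(raw_r, closed_r)
```

Before closure taxon B is uncorrelated with carbon; after closure its percentage is negatively correlated with carbon, purely because taxon A takes up a larger share of the total where carbon is high.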
5. Results
In the sections below we examine the response
of our various techniques to no-analog conditions.
We found two primary sources of error in making
no-analog estimates: (1) distortion of species true
responses by matrix closure; and (2) non-linear shifts
in species abundances for no-analog conditions that
produced ratio no-analogs among the taxa.
5.1. Multiple regression directly on species percent
abundance
This test is a multiple regression of each environmental parameter directly with taxon abundance
data. The twelve species were entered simultaneously into the regression against each of the two
environmental parameters we used. This analysis
was done using SPSS v. 6.1 (SPSS, 1995). The regression coefficients (Table 2) from the calibration
were applied directly to the 12 species in our 40 test
samples and comparisons of actual versus estimated
environmental values were made (Fig. 4A and B).
Error in this test ranges between 0 and 5°C for
bottom water temperature (T) estimates and between
0 and 13.5 g C m⁻² yr⁻¹ for organic carbon flux
(C) estimates. Low T–low C samples produce the
largest errors for both T and C.
5.1.1. Temperature
In the calibration data set multiple regression
(r² = 0.997) of 12 species to temperature yields
species 5, 1 and 12 as most influential (Table 2). The
plot of true T vs. estimated T (Fig. 4A) illustrates
a bifurcating pattern at higher T. Samples from the
original calibration area are accurately estimated,
therefore interpolation is successful. Samples from
low C–high T regions are over-estimated. The range
of error for high T–low C and high T–high C
samples is 0–3°C.
On the lower T end of the graph in Fig. 4A, low T–low C samples are over-estimated. The error for these samples ranges between 3 and 4.5°C. Over-estimation in these samples is mainly caused by the distortion in the ecologic response pattern of species 12. Species 12 ideally responds only to T (Fig. 2) but matrix closure creates a strong artificial C-response for this species (Fig. 3). This distortion is compensated for in the regression equation by species 1 (β = 0.46, Table 2) which only responds to C. Use of species 1 in the regression algebraically corrects for the pseudo-response of species 12 to organic carbon. This correction fails at low T–low C because species 1 is absent; so T is over-estimated. This error is therefore a product of matrix closure.
Fig. 3. Distortional effects of matrix closure on the ecological response of 12 species of benthic foraminifera when abundances are calculated as percents. ● = sample locations for the calibration data set. All other symbols show sample locations for the test data set.
There is modest over-estimation for high T –low
C samples which is a common trend in not only the
results of this test but also for extrapolation using
the correlation and covariance matrices in factor
analysis and linear regression. The causes for this
over-estimation are discussed under the results for
those methods.
5.1.2. Carbon
In the calibration data set, multiple regression of
12 species with C (r² = 0.998) yields species 12 and
5 as having a strong negative influence (Table 2) and
species 1 as having a strong positive influence on the
calculations. The strongest effect on the regression is
produced by species 12 although it ideally responds
only to temperature (see Figs. 2 and 3) as discussed
above.
The plot of true C vs. estimated C for the 40 samples of the test data set (Fig. 4B) reveals that samples taken from areas within calibration conditions and samples from high C–high T areas are correctly estimated. However, a set of samples having low C values is significantly over-estimated. The largest error is in the estimate for sample 6 (error = 13.5 g C m⁻² yr⁻¹), which is located outside the upwelling zone (see Fig. 1). Samples on the outer fringes of the upwelling zone are over-estimated with an error range of 0–9 g C m⁻² yr⁻¹. The non-linear increase of abundance in the distorted ecologic response patterns of species 9, 10 and 11 creates this over-estimation.
Ideally species 10 (B = 0.88) should balance the effect of species 12 (B = 0.81) in the regression calculations (Table 2). However, the non-linear abundance change of these species beyond calibration conditions alters the interspecific ratio of abundance. The ratio of species 10/species 12 is plotted in C–T space (Fig. 5). Under calibration conditions the sp. 10/sp. 12 ratio changes between 0 and 0.15. Outside calibration conditions the ratio quickly grows to become 0.43 at the low T–low C corner of the graph (Fig. 5). This means that the positive effect of species 10 in the regression (B = 0.88) is greatly exaggerated when compared to the negative effect of species 12 (B = 0.81) for low T–low C samples. The result is that these samples are over-estimated due to the ratio no-analog.
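The arithmetic behind this ratio no-analog can be sketched as follows. This is a minimal illustration: the coefficient magnitudes (0.88 and 0.81), the calibration ratio range (0 to 0.15) and the extreme ratio (0.43) come from the text above, the signs follow the description of species 10 as a positive and species 12 as a negative influence, and the function names and the example percentages are hypothetical.

```python
B_SP10 = 0.88    # species 10: positive effect on the carbon estimate
B_SP12 = -0.81   # species 12: negative effect on the carbon estimate
CAL_RATIO_RANGE = (0.0, 0.15)   # sp. 10/sp. 12 ratio under calibration conditions

def pair_effect_per_unit_sp12(ratio):
    """Combined contribution of species 10 and 12 per percent of species 12.

    B10 * p10 + B12 * p12 = p12 * (B10 * ratio + B12), where ratio = p10 / p12.
    """
    return B_SP10 * ratio + B_SP12

def is_ratio_no_analog(pct_sp10, pct_sp12):
    """True when the sp. 10/sp. 12 ratio falls outside the calibration range."""
    ratio = pct_sp10 / pct_sp12
    low, high = CAL_RATIO_RANGE
    return not (low <= ratio <= high)

inside = pair_effect_per_unit_sp12(0.15)   # upper end of the calibration range
outside = pair_effect_per_unit_sp12(0.43)  # low T-low C corner
print(inside, outside, is_ratio_no_analog(4.3, 10.0))
```

Within the calibration range the pair pulls the estimate down by at least 0.678 per percent of species 12; at a ratio of 0.43 the pull weakens to about 0.43, so the carbon estimate rises even though true carbon flux is low.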
5.2. Extrapolation using correlation-based principal
components and linear regression
The correlation matrix generated from the calibration data is here used in the traditional method of
principal components analysis (Cooley and Lohnes,
1971; Morrison, 1976). This method treats all taxa
equally in an analysis of pattern regardless of taxon
abundance in the data set so that rare taxa are just as
important as common ones. The results of the regression are listed in Table 3 and the principal component
structure matrix of the principal components analysis
is listed in Table 2. The first factor is clearly related
to C which is positively reflected in species 2 and 1
and negatively reflected in 9, 10, 6 and 12 (see Figs. 2
and 4C and D). Principal Component 2 is inversely
correlated with T having high negative loadings from
species 5, 7 and 11 in the principal component structure matrix. The first two principal components extracted 84.1% of the data structure and accurately retrieved the two artificial environmental parameters we
used to construct the species abundance data matrix.
The principal component loadings for components 1
and 2 were used in multiple regression with the calibration data set in order to derive regression coefficients that could be used on the test data.
In T estimates, 83% of the test samples fall within
an error range of 0–2°C. Variable T–low C and very
low T–low C samples produce errors from 2 to
10°C. Among the C estimates only samples from the
calibration area had low error, 0–1.5 g C m⁻² yr⁻¹.
All samples from outside calibration conditions had
errors between 2 and 8.5 g C m⁻² yr⁻¹.
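The general procedure of this section (standardize the taxa, extract eigenvectors of the correlation matrix, regress an environmental parameter on the leading components) can be sketched as follows. This is an illustration under stated assumptions, not the calculation used for Tables 2 and 3: the data are synthetic, the four taxa and their responses are hypothetical, the regression is run on component scores, and NumPy is assumed to be available.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical calibration set: 30 samples, 4 taxa driven by two
# made-up environmental variables (stand-ins for T and C).
T = rng.uniform(0.0, 10.0, size=30)
C = rng.uniform(5.0, 25.0, size=30)
X = np.column_stack([
    2.0 * T + rng.normal(0, 0.5, 30),
    20.0 - 1.5 * T + rng.normal(0, 0.5, 30),
    0.8 * C + rng.normal(0, 0.5, 30),
    30.0 - 0.6 * C + rng.normal(0, 0.5, 30),
])

# Standardize each taxon so that rare and common taxa carry equal weight;
# this is what makes the analysis correlation-based.
mean, std = X.mean(axis=0), X.std(axis=0, ddof=1)
Z = (X - mean) / std

# Eigen-decomposition of the correlation matrix, largest eigenvalue first.
R = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Keep two components and compute the sample scores on them.
k = 2
scores = Z @ eigvecs[:, :k]
explained = eigvals[:k].sum() / eigvals.sum()

# Linear regression of T on the component scores (with intercept).
A = np.column_stack([np.ones(len(T)), scores])
coef_T, *_ = np.linalg.lstsq(A, T, rcond=None)


def estimate_T(samples):
    """Apply the calibration standardization, eigenvectors and regression."""
    z = (np.atleast_2d(samples) - mean) / std
    return coef_T[0] + (z @ eigvecs[:, :k]) @ coef_T[1:]


rmse = float(np.sqrt(np.mean((estimate_T(X) - T) ** 2)))
print(f"variance explained by {k} components: {explained:.2f}; calibration RMSE for T: {rmse:.2f}")
```

Test samples would be passed through `estimate_T` with the calibration means, standard deviations and eigenvectors held fixed, which is why assemblages far outside the calibration range can yield component scores, and hence estimates, that the regression was never fitted to.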
5.2.1. Temperature
Overall, T estimates are more accurate than C estimates in this method. Only samples from very high
T–low C areas contain large errors ranging from 4.5
to 10°C (Fig. 4C). Principal component (factor) 2
(β = −0.89) is the dominant factor controlling T
estimates. Temperature for high T–low C samples is
mostly over-estimated due to the high negative loadings for species 5, 7 and 11 in the principal component structure matrix for factor 2. All three of these
species become very abundant in the high T–low
C corner of the ecologic response graphs (Fig. 3).
The high species abundances yield high negative factor loadings which, when multiplied with a negative
regression coefficient, produce a large positive effect on the calculations and result in over-estimation.
Species 5 is most influential (structure matrix loading = −0.99). This is illustrated by the pattern of
factor 2 when plotted in carbon–temperature space
(Fig. 6B) which mimics the ecologic response pattern of species 5 (Fig. 3). Matrix closure creates
a C-response in species 5 causing it to increase
in abundance at the high T–low C corner of the
graph. Since this pseudo-C response is not clearly
developed in the calibration area (Fig. 3), it leads to
over-estimation for the no-analog samples.
5.2.2. Carbon
Factor 1 (β = 0.93) is the dominant factor in
calculation of C estimates (Table 3). Carbon values
for low C–low T or variable T–low C samples produce error that ranges from 2 to 8.5 g C m⁻² yr⁻¹
(Fig. 4D). These samples are under-estimated as a result of high negative values for species 6, 9, 10 and
12 in the principal component structure matrix (Table 2). In their distorted ecologic response patterns
(Fig. 3), these four species increase in abundance in
the low T–low C corner of the graphs. Species 12 especially has a strong pseudo-C response at this corner
of the graph due to matrix closure as discussed above.
This increase in taxon abundances beyond calibration
conditions, coupled with the negative signs in the principal component structure matrix, lowers the estimates,
which results in under-estimation.
Carbon values for high C–high T samples are
modestly under-estimated. Fig. 6A and B illustrate
the behavior of factor 1 and factor 2 in carbon–
temperature space. At the high T–high C corners of
the graphs, factor 1 has a positive whereas factor 2
has a negative effect on the calculation of estimates.
Factor 1 flattens in this corner of the graph (Fig. 6A)
so that loadings are lower than expected in comparison with the loadings in the calibration range. The
reason behind this ‘flattening’ is a shift in principal
component calculations from species like 6 or 12 to
species like 1. Also, the equation is highly dependent
on species 12 (structure matrix loading = −0.85,
Table 2), which principally corresponds to T at the
high T–high C corner of the graph. This results in loadings
higher than expected in the high T–high C corner
compared to the values in the calibration area. The
effect described is largely due to shifting taxon ecologic response across the C–T diagram (e.g. species
12) which in this case was caused by matrix closure.
Fig. 5. Plot of the abundance ratio of species 10 to species 12
(sp. 10/sp. 12) in organic carbon–temperature space. • = sample
locations for the calibration data set. All other symbols show
sample locations for the test data set.
5.3. Extrapolation using covariance-based principal
components and linear regression
Although this method is based on species with
the highest variance, the results are strikingly similar
to those from correlation-based factor analysis and
regression (Fig. 4E and F).
The error for T estimates ranges between 0 and
7°C and for carbon estimates between 0 and 10 g
C m⁻² yr⁻¹. Similar to results from extrapolation
using the correlation matrix, high T–high C samples are over-estimated with error margins ranging
from 0 to 1.3°C; and low T–low C samples are
under-estimated with error ranging from 0 to 1.5°C.
High T–low C estimates are the least accurate with
error ranging from 2.5 to 7°C. For C, all samples
not within calibration conditions are under-estimated
except for sample 6. Error for calibration samples
ranges from 0 to 1.5; for low C–low T samples from
1.5 to 4.5; for low C–variable T samples from 1.5
to 4; and for high C–high T samples from 6.5 to
10 g C m⁻² yr⁻¹. The causes for these errors are
identical to the causes described for error in extrapolation using the correlation matrix. The behavior
of the factors derived from the covariance matrix in
carbon–temperature space (Fig. 6C and D) is similar
to that of the factors from the correlation matrix.
Fig. 4. (A) T estimate vs. true T for multiple regression directly on species percent abundance. (B) C estimate vs. true C for multiple
regression directly on species percent abundance. (C) T estimate vs. true T for correlation-based principal components analysis and
linear regression. (D) C estimate vs. true C for correlation-based principal components analysis and linear regression. (E) T estimate vs.
true T for covariance-based principal components analysis and linear regression. (F) C estimate vs. true C for covariance-based principal
components analysis and linear regression.
Fig. 6. Behavior of principal components in organic carbon–
temperature space. (A) PC1 based on correlation matrix. (B)
PC2 based on correlation matrix. (C) PC1 based on covariance
matrix. (D) PC2 based on covariance matrix. In each diagram,
the shaded area represents the calibration range.
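Why a covariance-based analysis emphasizes the taxa with the highest variance, whereas a correlation-based analysis treats all taxa equally, can be illustrated with a small synthetic example. The sketch below uses hypothetical abundances (one abundant, highly variable taxon and two rare, co-varying taxa) and assumes NumPy; it is not derived from our calibration data, for which the two approaches gave similar results.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical percentages: one abundant, highly variable taxon and
# two rare taxa that co-vary (inversely) with each other.
n = 50
abundant = rng.normal(40.0, 8.0, n)
driver = rng.normal(0.0, 1.0, n)
rare_a = 2.0 + 0.3 * driver + rng.normal(0, 0.05, n)
rare_b = 1.5 - 0.3 * driver + rng.normal(0, 0.05, n)
X = np.column_stack([abundant, rare_a, rare_b])


def first_component(matrix):
    """Unit eigenvector belonging to the largest eigenvalue of a symmetric matrix."""
    eigvals, eigvecs = np.linalg.eigh(matrix)
    return eigvecs[:, np.argmax(eigvals)]


pc1_cov = first_component(np.cov(X, rowvar=False))
pc1_corr = first_component(np.corrcoef(X, rowvar=False))

print("covariance-based PC1 weights: ", np.round(np.abs(pc1_cov), 2))
print("correlation-based PC1 weights:", np.round(np.abs(pc1_corr), 2))
```

In this constructed case the first covariance-based component is almost entirely the abundant taxon, while the first correlation-based component is carried by the two rare taxa.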
5.4. Extrapolation using correlation-based principal
components and non-linear regression
Once again the correlation matrix derived from
the calibration data set was used in calculating T
and C estimates for the test data set; but instead
of linear regression, non-linear regression was performed. Non-linear regression, where squares and
cross-products of the factors extracted from the calibration data set are used as independent parameters,
will theoretically yield more accurate results (e.g.
Imbrie and Kipp, 1971; Lozano and Hayes, 1976; Le
and Shackleton, 1994), at least for interpolation.
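The extra predictors used in such a non-linear (second-degree) regression, and the reason they can be hazardous in extrapolation, can be sketched as follows. The factor values are hypothetical and the function name `quadratic_terms` is introduced here for illustration.

```python
def quadratic_terms(f1, f2):
    """Predictors for a second-degree regression on two factors."""
    return {"f1": f1, "f2": f2, "f1^2": f1 * f1, "f2^2": f2 * f2, "f1*f2": f1 * f2}


# Hypothetical factor values: one sample at the edge of the calibration
# range and one three times further from the origin.
inside = quadratic_terms(1.0, -1.0)
outside = quadratic_terms(3.0, -3.0)

for name in inside:
    growth = abs(outside[name]) / abs(inside[name])
    print(f"{name:6s} inside={inside[name]:5.1f} outside={outside[name]:5.1f} growth x{growth:.0f}")
```

A threefold departure in the factors produces a ninefold change in the squares and cross-products, so any regression coefficient attached to these terms is applied to values far larger than those seen during calibration.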
Fig. 7A and B plot estimated T
and C against their true values for the test data. The
results from non-linear regression are more random
and more widely scattered than those from linear
regression. For T estimates, the error for calibration
and high T–high C samples is 0–2°C. For low T–
low C samples it is 3–13°C and for variable T–low
C samples it is 2–16°C. For C estimates, the error
for calibration samples is 0–3 g C m⁻² yr⁻¹, for
high T–high C samples 0–6.5 g C m⁻² yr⁻¹, for
low T–low C samples 0.5–28 g C m⁻² yr⁻¹ and for
variable T–low C samples 0–18 g C m⁻² yr⁻¹.
5.4.1. Temperature
Unlike the success in T estimates for the first
three methods, this method yielded scattered results
(Fig. 7A). Low T–low C samples are highly over-estimated and high T–low C samples are strongly
under-estimated. The reasons behind these spurious
results are subtle and complex. Both under-estimation of low C–high T samples and over-estimation
of low T–low C samples are caused by high factor
loadings at extreme conditions beyond calibration
range (Fig. 6A and B). The cross-product, factor 1 ×
factor 2 (B = −0.08), produces high negative values
at the low C–low T corner of the C–T diagrams
and high positive values at the low C–high T corner
of these diagrams (see Fig. 6A and B). The multiplication of these large negative and positive values
with a negative regression coefficient (B = −0.08)
causes over-estimation for low C–low T samples
and under-estimation for low C–high T samples,
respectively.
5.4.2. Carbon
The C values are highly over-estimated for most
samples (Fig. 7B) and even for some samples from
the calibration area. Factor 1 has the largest β coefficient in the regression results (β = 0.93, Table 3)
followed by factor 2. The β coefficients for cross-products and squares are small (Table 3) except for
squares of factor 1 and factor 3. The causes for error
in C estimates are similar to those for T estimates.
However, the reason for over-estimation of both low
C–low T and low C–high T samples is that the
square of factor 1 obtains high positive values at
low C conditions. These values increase non-linearly with distance from the calibration range. The
accumulation of such high positive values in the
regression calculations results in over-estimation.
Overall, application of non-linear regression in
obtaining environmental parameters from our test
data set has produced less accurate results than linear
regression.
Fig. 7. (A) T estimate vs. true T for correlation-based principal components analysis and non-linear regression. (B) C estimate vs. true
C for correlation-based principal components analysis and non-linear regression. (C) T estimate vs. true T for the Imbrie and Kipp
technique. (D) C estimate vs. true C for the Imbrie and Kipp technique. (E) T estimate vs. true T for the Imbrie and Kipp technique
without using PC1 in calculations. (F) C estimate vs. true C for the Imbrie and Kipp technique without using PC1 in calculations.
5.5. The Imbrie and Kipp technique
In this technique, species percents in the samples
from both the calibration and test data sets are row
normalized. The sums of squares matrix is used in
calculating eigenvectors and the regression is non-linear. Two approaches for the estimation of T and
C have been tested. First, all three principal components and their squares and cross-products were
employed in the regression yielding the results in
Fig. 7C and D. Although T estimates (Fig. 7C) appear to be more accurate than C estimates (Fig. 7D),
the strong scatter in both plots results from including
factor 1 in the calculations. Factor 1 incorporates the
abundance means of species within it and therefore
error is amplified for species whose test data set
means are much different from their calibration data
set means. Second, Fig. 7E and F show T and C estimates
calculated by using only factor 2 and factor 3 and
their squares and cross-products. The extrapolation
for both C and T is improved as factor 2 and factor
3 are based on variations of abundance among taxa
rather than their means.
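That the first factor of an uncentred, row-normalized analysis mainly reflects the mean assemblage can be seen in a small numerical sketch. The percentages below are hypothetical, NumPy is assumed, and the steps (unit-length rows, eigenvectors of the sums-of-squares-and-cross-products matrix) follow the description above rather than any particular program.

```python
import numpy as np

# Hypothetical percent data: 4 samples x 3 taxa (each row sums to 100).
P = np.array([
    [60.0, 30.0, 10.0],
    [55.0, 30.0, 15.0],
    [50.0, 35.0, 15.0],
    [65.0, 25.0, 10.0],
])

# Row normalization: scale every sample vector to unit length.
U = P / np.linalg.norm(P, axis=1, keepdims=True)

# Eigenvectors of the uncentred sums-of-squares-and-cross-products matrix.
eigvals, eigvecs = np.linalg.eigh(U.T @ U)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Because the data are not centred, the first eigenvector points towards
# the average assemblage; compare it with the normalized mean composition.
factor1 = eigvecs[:, 0] * np.sign(eigvecs[:, 0].sum())
mean_direction = U.mean(axis=0) / np.linalg.norm(U.mean(axis=0))
cosine = float(factor1 @ mean_direction)
print(f"cosine between factor 1 and the mean assemblage: {cosine:.4f}")
```

A cosine close to 1 indicates that factor 1 is essentially the mean composition, so test samples whose mean abundances differ from the calibration means load on it in a way the regression has not seen.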
5.5.1. Temperature
The error for T estimates is the lowest of all tested
methods (0–2.15°C; the mean error for extrapolation is 1.13°C). Factor 3 (β = 0.97) (Table 3) is the
dominant factor for T estimates and species 5, 7 and
11 are most influential in the principal component
structure matrix for factor 3 (Table 2). All three of
these species have high abundances at high T areas
regardless of the amount of C.
5.5.2. Carbon
The error for calibration samples is 0–3 g C m⁻²
yr⁻¹ and for all other samples is 2–18 g C m⁻²
yr⁻¹. The β coefficients for factor 2 and factor 3
(0.79 and 0.78, respectively, Table 3) are close and
illustrate that both factors contribute equally to the
calculation of the estimates. Species 9, 12 and 5 have
the largest values in the factor structure matrices
(Table 2) of factor 2 and factor 3. Species 5 and
12, which were originally designed to respond only
to T (Fig. 2), also respond to C (Fig. 3) due to
distortion by matrix closure and row normalization.
Matrix closure and row normalization affect species
9 to a lesser degree. The C response introduced into
the behavior of species 5 and 12 is probably the
strongest reason behind more scatter among carbon
estimates (Fig. 7F).
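How matrix closure can introduce an apparent C response into a taxon that responds only to T can be shown with two hypothetical taxa. In the sketch below the raw abundances and the factor of 2.0 are invented for illustration; only the conversion to percentages is the point.

```python
def to_percent(counts):
    """Close a vector of raw abundances so that it sums to 100%."""
    total = sum(counts)
    return [100.0 * c / total for c in counts]


# Hypothetical raw abundances at a fixed temperature: taxon A responds only
# to T (so its raw abundance is constant here), taxon B responds only to C.
taxon_a = 20.0
for carbon_flux in (5.0, 20.0, 40.0):
    taxon_b = 2.0 * carbon_flux
    pct_a, pct_b = to_percent([taxon_a, taxon_b])
    print(f"C={carbon_flux:4.1f}  raw A={taxon_a:.0f}  percent A={pct_a:.1f}")
```

Although the raw abundance of taxon A never changes, its percentage falls from about 67% to 20% as C increases, which is the kind of pseudo-C response described above.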
The reasons for poor estimation of C values by
this method are generally the same as those noted for
extrapolation using the correlation matrix and non-linear regression. In both methods where non-linear
regression was applied, the magnification of non-linear species abundance trends and the confounding
effects of matrix closure and row normalization create error in calculation of estimates. Error is also
amplified by squaring and cross-multiplying factors.
6. Comparing methods under no-analog
conditions
It is clear that no-analog conditions can adversely
affect the quantitative estimators we tested. To deal
with this problem, we would ideally like to have no-analog indicators that have a consistent relationship
to estimation error. Then we could use the no-analog
indicators to determine likely estimation precision.
Two primary no-analog indicators have been considered by various workers. The first is some measure
of species percents that are beyond the range seen
in a calibration data set (out-of-range no-analog)
and the other is sample communality, a measure of
how well an assemblage can be recreated by linear
combinations of assemblages in a calibration data
set. Both of these no-analog indices could potentially
identify samples in which matrix closure and ratio
no-analogs, both troublemakers in our tests above,
could lead to estimation error. In this section we examine the relationship of an out-of-range index and
sample communality to estimation error in our test
data set.
The ‘Range No-Analog’ Index (RNA = the sum
for a sample of species departures from their calibration percent range) that we used is calculated for
each sample by finding the percentage by which each
species lies outside the calibration data set range for
that species (percentage = 0 if the species is in
range) and then summing the species out-of-range
values for each sample. The sample communality
that we used was defined by Imbrie and Kipp (1971).
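The RNA calculation described above can be sketched in a few lines. The sketch assumes that departures are measured in percentage points beyond the calibration minimum or maximum of each species (one possible reading of the definition); the function name and the example data are hypothetical.

```python
def range_no_analog_index(sample, calibration):
    """Sum, over species, of the percentage points by which a sample lies
    outside the calibration range of each species (0 if within range)."""
    total = 0.0
    for j, value in enumerate(sample):
        column = [row[j] for row in calibration]
        low, high = min(column), max(column)
        if value > high:
            total += value - high
        elif value < low:
            total += low - value
    return total


# Hypothetical calibration percentages for three species.
calibration = [
    [50.0, 30.0, 20.0],
    [40.0, 35.0, 25.0],
    [45.0, 40.0, 15.0],
]

in_range = [45.0, 35.0, 20.0]
out_of_range = [60.0, 30.0, 10.0]  # species 1 is 10 above, species 3 is 5 below

print(range_no_analog_index(in_range, calibration))
print(range_no_analog_index(out_of_range, calibration))
```

The first sample lies within every species range and scores 0; the second accumulates 10 + 5 = 15 percentage points of out-of-range abundance.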
6.1. Error vs. range no-analog (RNA) index
We would like to find a predictable relationship
between the amount of estimation error and increasing value of the no-analog index. We are seeking
a relationship where the error in the estimates increases gradually with the degree of no-analog. Ideally the error in the estimates should be small until
the degree of no-analog becomes very large.
Estimation error for our test samples (for T and C,
respectively) was plotted against our RNA index for
each numerical technique we tested in Figs. 8 and 9.
Error in T estimates from species-based regression is low up to about 20% RNA index values
(Fig. 8A). Beyond this index value there is considerable scatter in the index vs. error relationship. For C
estimates (Fig. 8B), once again there is considerable
scatter in the error to index plot. Error can increase
very rapidly even at low RNA index values.
Error in temperature estimates for correlation-based principal components analysis (Fig. 8C) progressively increases with increasing percentage of
the RNA index. Overall, error reaches 1.5°C with
20% RNA index values, ±2.5°C with 40% and ±4°C
with 60%. This pattern is not as clearly developed
in C estimates (Fig. 8D) calculated by the same
technique. There is considerable scatter beyond a 5%
RNA index. For example, high T–high C samples
have values up to 30% on the RNA index yet have
low errors (0–1 g C m⁻² yr⁻¹) whereas samples from
the calibration region have values close to 0 on the
RNA index but can have up to 1.5 g C m⁻² yr⁻¹ of
error.
Error for T estimates in covariance-based factor
analysis displays a similar pattern to that in correlation-based factor analysis (Fig. 8E). However, the
scatter for C is much different (Fig. 8F). Samples
fall into two groups. Most samples have an error
within ±0–4 g C m⁻² yr⁻¹. However, high C–high
T samples, which have only 0–30% RNA index
values, have errors of ±7–10 g C m⁻² yr⁻¹. Thus,
in this case the relationship between the estimation
error and the RNA index is complex and dependent
on environmental conditions, with high T–high C
samples producing the larger errors.
Errors in both T and C estimates from correlation-based
principal component analysis using non-linear regression are much higher than for all previous techniques.
There is also considerable scatter for RNA index
values higher than 20% on both plots for T and C
(Fig. 9A and B). This scatter reflects a high sensitivity to no-analog conditions and poor extrapolation.
The error vs. RNA index plots for the Imbrie and
Kipp technique (Fig. 9C and D) also show wide
scatter. Unlike the results of previous techniques,
the error for T estimates in this technique is very
low (error margin = 0–2.25°C; mean of estimation
error = 1.13°C) whereas the error for C estimates is
large (0–18 g C m⁻² yr⁻¹). The large error for C
is comparable to the results from correlation-based
principal components analysis and non-linear regression where sample 6 had an exceptionally high error
of ±28 g C m⁻² yr⁻¹ (Fig. 9A and B). Although
error is small in T estimates, there is wide scatter in error values with respect to the RNA index
(Fig. 9C). Similarly, for C estimates wide scatter
is observed for RNA index values higher than 5%
(Fig. 9D). Overall, this scatter in the error vs. RNA
index plots of both T and C estimates, regardless
of the size of error, denotes that the relationship
between estimation error and the range no-analog
index is complex.
6.2. Error vs. sample communality
The communality of a sample is defined as the
measure of how well the taxonomic components of
a sample may be accounted for by analysis with
the calibration components (Imbrie and Kipp, 1971).
Therefore, the lower the communality, the higher the
degree of no-analog. However, high communality
does not necessarily imply perfect analogy and/or
correct estimates.
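One common way to compute such a communality, assumed here rather than taken from the original program, is to scale the sample to unit length and sum its squared loadings on the orthonormal calibration factors that were retained. The sketch below uses hypothetical percentages and NumPy; the function name `communality` and the choice of two factors are illustrative.

```python
import numpy as np


def communality(sample, factors):
    """Sum of squared loadings of a unit-length sample on orthonormal factors."""
    unit = np.asarray(sample, dtype=float)
    unit = unit / np.linalg.norm(unit)
    loadings = unit @ factors
    return float(np.sum(loadings ** 2))


# Hypothetical calibration percentages (4 samples x 3 taxa).
P = np.array([
    [60.0, 30.0, 10.0],
    [55.0, 30.0, 15.0],
    [50.0, 35.0, 15.0],
    [65.0, 25.0, 10.0],
])
U = P / np.linalg.norm(P, axis=1, keepdims=True)
eigvals, eigvecs = np.linalg.eigh(U.T @ U)
factors = eigvecs[:, np.argsort(eigvals)[::-1][:2]]  # keep the two leading factors

analog = communality([58.0, 30.0, 12.0], factors)
no_analog = communality([10.0, 20.0, 70.0], factors)
print(f"analog sample: {analog:.3f}, no-analog sample: {no_analog:.3f}")
```

A sample resembling the calibration assemblages has a communality near 1, whereas an assemblage dominated by a taxon that is rare in the calibration set scores much lower. Under this definition the value cannot exceed 1.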
Estimation errors for T and C from the Imbrie and
Kipp technique are plotted against sample communality in Fig. 10A and B. On both graphs about
85% of the samples fall between 0.9 and 1 units of
communality. However, there is considerable scatter
on the T estimate error vs. sample communality plot
(Fig. 10A). For T, communality does not seem to be
related to estimation error at all and a wide range
of error is found at high communality values. On
the C estimation error vs. sample communality plot
(Fig. 10B), a somewhat scattered yet linear relationship between estimation error and communality is
discernible. For C, error increases with decreasing
communality. An inverse relationship between communality and our RNA index is shown in Fig. 10C;
however, there is considerable scatter. This means
that in some cases linear extrapolation of calibration factors to model taxon abundances is possible
for out-of-range samples. Apparently adequate modelling occurs up to out-of-range values of 50%. This
reflects the use of environmental conditions that are
an extension of calibration conditions to create our
no-analog test data set.
Fig. 8. (A) RNA index vs. amount of estimation error of T for multiple regression directly on species percent abundance. (B) RNA
index vs. amount of estimation error of C for multiple regression directly on species percent abundance. (C) RNA index vs. amount of
estimation error of T for correlation-based principal components analysis and linear regression. (D) RNA index vs. amount of estimation
error of C for correlation-based principal components analysis and linear regression. (E) RNA index vs. amount of estimation error of
T for covariance-based principal components analysis and linear regression. (F) RNA index vs. amount of estimation error of C for
covariance-based principal components analysis and linear regression.
Fig. 9. (A) RNA index vs. amount of estimation error of T for correlation-based principal components analysis and non-linear regression.
(B) RNA index vs. amount of estimation error of C for correlation-based principal components analysis and non-linear regression. (C) RNA
index vs. amount of estimation error of T for the Imbrie and Kipp technique. (D) RNA index vs. amount of estimation error of C for the
Imbrie and Kipp technique.
Fig. 10. (A) Communality vs. error in T estimates in the Imbrie
and Kipp technique. (B) Communality vs. error in C estimates
in the Imbrie and Kipp technique. (C) Communality vs. RNA
index.
7. Conclusions
We tested five numerical paleo-estimation techniques for their response to no-analog conditions.
The error associated with estimating T and C for
each test is summarized in Table 4. In the ideal, we
sought a technique which would show a consistent,
predictable error response to increasingly no-analog
conditions. However, all the techniques showed considerable scatter in the plots of error against our
no-analog index (RNA index in Figs. 8 and 9). We
found that multiple regression yielded the most consistent behavior when estimating orthogonal environmental parameters in no-analog space. Principal
component (factor) based linear regression yielded
error magnitudes that were significantly different
for the two parameters we estimated. Non-linear regression used with principal components or Imbrie–
Kipp factors yielded the most unstable extrapolations for no-analog samples (Table 4). The Imbrie
and Kipp technique estimated one controlling environmental variable accurately (temperature), but did
poorly with the other (organic carbon flux) despite
the fact that both variables contribute about evenly
to species abundance variations. The distribution of
data points may have some effect on these differing
results: data points for temperature have a roughly
Gaussian distribution, whereas they are skewed toward low values for organic carbon.
We found two principal sources of error in the
tests we performed. These were: (1) distortion of
species ecologic responses by matrix closure, so that
parameter estimation was based on taxa not truly
carrying the environmental signal; and (2) non-linear
changes in the ecologic responses of taxa beyond
calibration conditions causing no-analog ratio variations among species (ratio no-analog).
Table 4
Mean and maximum error generated by each tested multivariate technique
                                                         Bottom water temperature        Organic carbon flux
                                                         interpolation   extrapolation   interpolation   extrapolation
Method                                                   mean    max     mean    max     mean    max     mean    max
Multiple regression                                      0.56    1.86    2.20    9.95    0.78    1.46    5.21    8.35
Correlation-based PC analysis and regression             0.40    1.21    1.72    4.46    0.89    1.52    3.95   13.59
Covariance-based PC analysis and regression              0.47    1.70    1.60    6.99    1.05    1.58    4.53   10.00
Correlation-based PC analysis and non-linear regression  0.30    0.71    4.58   16.05    0.96    2.77    6.17   28.66
Imbrie and Kipp technique                                0.46    1.46    1.13    2.15    1.27    2.35    7.13   17.87
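A summary of the kind given in Table 4 can be produced from paired true and estimated values. The sketch below assumes that error is taken as the absolute difference between estimate and true value and that each test sample is flagged as lying inside or outside the calibration range; the temperatures are hypothetical and the function name is illustrative.

```python
def summarize_errors(true_values, estimates, in_calibration):
    """Mean and maximum absolute error, split into interpolation
    (samples inside the calibration range) and extrapolation (outside)."""
    groups = {"interpolation": [], "extrapolation": []}
    for true, est, inside in zip(true_values, estimates, in_calibration):
        key = "interpolation" if inside else "extrapolation"
        groups[key].append(abs(est - true))
    return {
        key: {"mean": sum(errs) / len(errs), "max": max(errs)}
        for key, errs in groups.items() if errs
    }


# Hypothetical temperatures (deg C) and estimates for five test samples.
true_T = [4.0, 6.0, 8.0, 14.0, 18.0]
est_T = [4.5, 5.5, 8.0, 16.0, 13.0]
inside = [True, True, True, False, False]

summary = summarize_errors(true_T, est_T, inside)
print(summary)
```

Separating the two groups keeps large extrapolation errors from being diluted by the many well-estimated calibration-range samples.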
Neither our RNA index nor Imbrie–Kipp communality provides a good basis for the estimation
of the error that is associated with no-analog samples (Fig. 10A and B). Further, communality does
not appear to be a sensitive index of the no-analog
condition (Fig. 10C).
We did not find a paleo-estimation technique
which behaved consistently when applied to no-analog samples. This would include the modern analog
technique since it is inherently unable to extrapolate.
Perhaps the most conservative technique was multiple regression since it is based on the least numerical
manipulation of the taxon data and does not employ
non-linear transformations.
It is important to note that our hypothetical experiments are essentially an exploration of a worst-case
scenario as the range of our estimates is double
that of our calibration conditions. Further, in some
cases, the range of species percent in our test samples is four times larger than that of our calibration
samples. This difference in range is extreme and is
much larger than is typical for most uses of paleo-estimation techniques. Also, under real conditions
there may be other variables that influence the fauna.
Some of these variables may be important yet non-orthogonal, inducing biases in the results which were
not considered in our experiments.
Acknowledgements
We would like to thank Mr. Mark Holland (Northern Illinois University) for kindly drafting our figures. This manuscript benefitted from the suggestions of an anonymous reviewer.
Appendix A
Species percents in calibration data set
Sample   SP1   SP2   SP3   SP4   SP5   SP6   SP7   SP8   SP9  SP10  SP11  SP12
     1  12.6   2.7     0  19.7   4.8   1.8  25.1   7.5   5.4     1   3.2  16.1
     2  12.4   4.1     0  15.9   3.7   3.1  21.9   8.5   6.5   1.6   2.7  19.8
     3  12.6   3.1     0  13.5   2.5   5.4  19.1   8.7   8.6   2.4   1.1  23.1
     4  12.7   1.8     0  10.9   1.7   7.5  17.4   7.8  10.2   3.3   0.4  26.3
     5  13.8   7.7     0  19.5   4.1   0.9  24.0   8.5   5.3   0.6   1.8  13.8
     6  13.6   6.4     0  16.5   3.1   7.1  20.9   9.1   4.8   0.7   0.9  16.9
     7  14.8   3.9     0  15.3   2.2   4.0  19.7  10.6   6.5   1.0   0.7  21.1
     8  14.8   3.9     0  13.2   1.5   5.8  17.4   9.9   7.5   2.0   0.1  23.9
     9  15.6  11.1     0  20.5   3.7   0.7  23.2   9.6   2.5   0.3   0.7  12.3
    10  15.9   8.9     0    18   2.9   1.1  21.8  11.2   3.5   0.5   0.6  15.6
    11  16.5   5.4     0  16.7   2.0   2.5  20.3  12.5   4.6   0.7   0.3  18.6
    12  16.8   5.5     0    15   1.4   3.5  18.3  12.1   5.3   0.8     0  21.1
    13  17.5  13.9   0.9  21.8   3.0   0.3  21.9  10.0   0.7     0   0.1  10.0
    14  18.0  12.5     1  19.2   2.4   0.3  20.0  12.5   1.3   0.1     0  12.7
    15  19.2   7.4     1  18.9   1.7   0.5  20.2  13.4   1.8   0.3     0  15.6
    16  19.0   9.0     1    18   1.3   0.6  19.3  12.7   2.0   0.2     0  16.9
    17  19.8  11.4   1.1  18.2   1.4   0.8  19.6  12.7   2.3   0.3     0  17.7
    18  16.9   4.7     0  14.5   0.9   4.0  17.2  11.1   5.4   1.6     0  23.7
    19  19.0   9.0   1.0  18.0   1.4   0.6  19.3  12.7   2.0   0.2     0  16.9
    20  16.8   5.1     0  14.7   1.2   4.4  17.6  11.6   5.3   0.9     0  22.4
    21  14.8   3.6     0  13.2   1.3   6.4  16.5   9.2   7.7   2.2     0  25.0
    22  12.9   1.8     0   9.2   1.5   8.3  15.9   7.7  10.5   3.9   0.2  28.0
    23  18.4  10.5     1  19.2   1.9   0.5  20.0  12.8   1.6   0.1     0  14.0
    24  16.5   5.4     0  16.7   2.0   2.5  20.3  12.5   4.6   0.7   0.3  18.6
    25  14.8   3.8     0  15.3   2.2   4.0  19.7  10.9   6.5   1.0   0.7  21.1
    26  12.6   3.1     0  13.5   2.5   5.4  19.1   8.7   8.7   2.4   1.1  23.1
    27  17.4  13.8   0.9  21.4   2.8   0.2  21.0  10.5   0.8     0   0.1  11.0
    28  14.6  10.1     0  18.6   3.3   6.2  21.3   9.6   2.4   0.4   0.6  12.8
    29  15.7   8.5     0   8.5   4.5   1.4  26.5  10.3   4.8   0.7   1.8  17.4
    30  14.1   4.8     0   4.8   5.2   2.4  28.3   9.1   6.2   1.2   3.6  20.2
Appendix B
Species percents in test data set samples 1–20
Sample   SP1   SP2   SP3   SP4   SP5   SP6   SP7   SP8   SP9  SP10  SP11  SP12
     1   4.2     0     0   2.6   0.7  15.3   8.9   2.1  13.2    16     0  37.1
     2   5.8     0     0   4.6   0.3  17.1   6.6   1.8  12.3  13.3     0  38.2
     3   7.7     0     0   5.4   0.3  16.1   7.7   1.9  12.1  11.6     0  37.2
     4   8.4     0     0   6.1   0.3  15.7   8.8   2.1  11.9  10.4     0  36.4
     5   6.6     0     0   4.8   0.3  16.8   7.0   2.0  12.2  12.5     0  37.9
     6     0     0     0     0   0.9  19.0   4.5   0.8  11.1  19.0     0  44.8
     7  19.2  19.1   3.2  27.5   4.3     0  24.7   0.3   0.1     0     0   1.5
     8  19.3  17.8   3.3  25.1   3.5     0  23.5   2.0   0.2     0     0   5.4
     9  19.3  15.1   3.0  23.6   2.4     0  21.5   6.0   0.3     0     0   8.7
    10  19.3  14.1   3.0  22.7   2.2     0  20.7   7.3   0.3     0     0  10.4
    11  20.0  13.0   3.9    22   2.0     0  21.2   6.5   0.2     0     0  11.0
    12  19.7  14.6   4.1  22.4   2.1     0  21.0   5.8   0.2     0     0  10.0
    13  19.6  15.9   4.3  24.0   2.3     0  21.5   3.8   0.2     0     0   8.4
    14    19  17.8   3.0  24.8   3.5     0  23.2   3.0   0.3     0     0   5.4
    15  19.2  19.1   3.2  27.5   4.3     0  24.7   0.3   0.1     0     0   1.5
    16  13.3   8.1     0  22.7   5.2   0.8  26.3   7.3   2.9   0.5   2.4  10.5
    17  16.9  13.7   0.4  23.6   4.0   0.3  23.6   8.1   1.0     0   0.5   8.1
    18  16.2   7.3     0  16.0   2.3   2.0  21.2  12.3   4.4   0.5   0.4  17.4
    19  13.3   2.6     0  12.2   1.8   5.9  17.5  10.5   8.2   3.1   0.5  24.4
    20  18.3   7.4   0.4  16.1   1.5   1.8  19.3  13.2   3.2   0.5     0  18.4
Appendix C
Species percents in test data set samples 21–40
Sample   SP1   SP2   SP3   SP4   SP5   SP6   SP7   SP8   SP9  SP10  SP11  SP12
    21  11.6   0.8     0   7.7   0.5  12.7  12.3   4.8  11.0   6.9     0  31.8
    22  14.5   1.4     0  11.2   0.6  10.3  13.8   8.3   8.3   4.1     0  27.6
    23  11.9   0.8     0   8.6   0.5  12.2  12.6   5.7   9.6   6.7     0  31.5
    24  18.0   8.4   0.5  16.8   1.4   1.5  19.1  13.5   2.7   0.4     0  17.7
    25  13.9   4.2     0  13.6   1.8   5.6  15.9  10.2   7.6   3.2   0.4  23.7
    26  16.3   7.3     0  16.0   2.3     2  21.2  12.3   4.4   0.5   0.4  17.4
    27  17.5  13.6   0.5  20.7   2.4   0.4  20.8  10.9   1.3   0.1   0.2  11.8
    28  13.2   5.8     0  18.2   3.6   2.5  22.8   9.1   5.1   0.8   2.0  17.0
    29   4.0     0     0  31.1   9.3   0.9  29.8   1.6   1.7   2.0  13.3   6.2
    30   7.5   1.0     0  27.7   8.2     1  30.0   2.9   2.9   1.1   9.4   8.3
    31   4.3     0     0  10.8   5.8   3.6  24.0   3.8   7.9   5.8  13.0  20.9
    32   7.4   0.8     0  11.3   4.3   5.3  21.4   6.6   8.6   4.9   7.4  22.0
    33   3.4     0     0   2.4   2.7  10.6  14.5   2.9  13.7  11.6   5.2  33.0
    34   7.7   0.4     0   0.9   1.7  10.7  14.9   6.4  17.3   7.5   0.9  31.6
    35   7.9   0.6     0   6.9   1.4  10.5  13.9   6.1  15.4   6.5   0.6  30.1
    36   4.2     0     0   4.7   2.6  10.2  15.6   4.2  12.7  10.6   2.8  32.3
    37   8.2   1.2     0  11.3   4.0   5.2  21.8   7.2   8.6   3.7   6.2  22.6
    38   4.4     0     0  10.2   5.8   3.9  24.2   3.6     8   5.8  13.1  21.1
    39   8.1   1.2     0  24.6   7.1   1.1  29.2   4.1   3.7   1.2   8.5  11.2
    40   4.0     0     0  31.1   9.3   0.9  29.8   1.6   1.7   2.0  13.3   6.2
References
Butler, J., 1979. The effects of closure on the moments of a
distribution. Math. Geol. 11, 75–84.
Chayes, F., 1971. Ratio Correlation. Univ. Chicago Press,
Chicago, IL, 99 pp.
CLIMAP Project Members, 1976. The surface of the Ice-Age
Earth. Science 191, 1131–1137.
CLIMAP Project Members, 1981. Seasonal reconstructions of
the Earth’s surface at the last glacial maximum. Geol. Soc.
Am. Map Chart Ser. MC-36, 1–18.
Cooley, W., Lohnes, P., 1971. Multivariate Data Analysis. Wiley,
New York, NY, 364 pp.
Dowsett, H.J., Poore, R.Z., 1990. A new planktonic foraminifera transfer function for estimating Pliocene–Holocene paleoceanographic conditions in the North Atlantic. Mar. Micropaleontol. 16, 1–23.
Ericson, D.B., Wollin, G., 1968. Pleistocene climates and
chronology in deep-sea sediments. Science 162, 1227.
Ericson, D.B., Ewing, M., Wollin, G., 1964. The Pleistocene
epoch in deep sea sediments. Science 146, 723.
Fritts, H.C., Blasing, T.J., Hayden, B.P., Kutzbach, J.E., 1971.
Multivariate techniques for specifying tree growth and climate
relationships and for reconstructing anomalies in paleoclimate.
J. Appl. Meteorol. 10, 845–864.
Geitzenauer, K.R., Roche, M., McIntyre, A., 1976. Modern Pacific coccolith assemblages: derivation and application of Late
Pleistocene paleotemperature analysis. In: Cline, R.M., Hayes,
J.D. (Eds.), Investigation of Late Quaternary Paleoceanography and Paleoclimatology. Geol. Soc. Am. Mem. 145, 464
pp.
Hutson, W.H., 1977. Transfer functions under no-analog conditions: experiments with Indian Ocean planktonic foraminifera.
Quat. Res. 8, 355–367.
Hutson, W.H., 1978. Application of transfer functions to Indian
Ocean planktonic foraminifera. Quat. Res. 9, 87–112.
Imbrie, J., Kipp, N.G., 1971. A new micropaleontological
method for quantitative paleoclimatology: application to a
Late Pleistocene Caribbean core. In: Tuketian, K.K. (Ed.), The
Late Cenozoic Glacial Ages. Yale Univ. Press, New Haven,
CT, pp. 71–181.
Kipp, N.G., 1976. New transfer function for estimating past
sea-surface conditions from sea-bed distribution of planktonic
foraminiferal assemblages in the North Atlantic. In: Cline,
R.M., Hayes, J.D. (Eds.), Investigation of Late Quaternary Paleoceanography and Paleoclimatology. Geol. Soc. Am. Mem.
145, 3–41.
Krumbein, W., Watson, G., 1972. Effects of trends on correlation
in open and closed three-component systems. Math. Geol. 4,
317–330.
Le, J., 1992. Paleotemperature estimation models: sensitivity test
on two western equatorial Pacific cores. Quat. Sci. Rev. 11,
801–820.
Le, J., Shackleton, N.J., 1994. Reconstructing paleoenvironment
by transfer function: model evaluation by simulated data. Mar.
Micropaleontol. 24, 187–199.
248
F. Mekik, P. Loubere / Marine Micropaleontology 36 (1999) 225–248
Loubere, P., 1981. Oceanographic parameters reflected in the
seabed distribution of planktic foraminifera from the North
Atlantic and the Mediterranean Sea. J. Foraminiferal Res. 11
(2), 137–158.
Loubere, P., 1991. Deep-Sea benthic foraminiferal assemblage
response to a surface ocean productivity gradient: A test.
Paleoceanography 6 (2), 193–204.
Loubere, P., 1994. Quantitative estimation of surface ocean productivity and bottom water oxygen concentration using benthic
foraminifera. Paleoceanography 9, 723–737.
Loubere, P., 1996. The surface ocean productivity and bottom
water oxygen signals in deep water benthic foraminiferal assemblages. Mar. Micropaleontol. 28, 247–261.
Loubere, P., Qian, H., 1997. Reconstructing paleoecology and
paleo-environmental variables using factor analysis and regression: some limitations. Mar. Micropaleontol. 31, 205–217.
Lozano, J., Hayes, J.D., 1976. Relationship of radiolarian assemblages to sediment types and physical oceanography in
the Atlantic and Western Indian Ocean sectors of the Antarctic Ocean. In: Cline, R.M., Hayes, J.D. (Eds.), Investigation
of Late Quaternary Paleoceanography and Paleoclimatology.
Geol. Soc. Am. Mem. 145, 464 pp.
Lutze, G.F., Colbourn, W.T., 1984. Recent benthic foraminifera
from the continental margin of northwest Africa: community
structure and distribution. Mar. Micropaleontol. 8, 361–401.
Mackenson, A., Futterer, D.K., Grobe, H., Schmiedl, G., 1993.
Benthic foraminiferal assemblages from the eastern South Atlantic Polar Front region between 35º and 57ºS: Distribution,
ecology and fossilization potential. Mar. Micropaleontol. 22,
33–69.
McIntyre, A., 1967. Coccoliths as paleoclimatic indicators of
Pleistocene glaciation. Science 158, 1314.
Miller, K.G., Lohmann, G.P., 1982. Environmental distribution
for Recent benthic foraminifera on the northeast United States
continental slope. Geol. Soc. Am. Bull. 93, 200–206.
Mix, A.C., Ruddiman, W.F., McIntyre, A., 1986. Late Quaternary
paleoceanography of the tropical Atlantic, 1: Spatial variability
of annual mean sea-surface temperatures, 0–20,000 years B.P.
Paleoceanography 1, 43–66.
Moore Jr., T.C., 1973. Late Pleistocene–Holocene oceanographic
changes in the northeastern Pacific. Quat. Res. 3, 99–109.
Moore, T.C., Burckle, L.H., Geitzenaur, K., Luz, B., MolinaCruz, A., Robertson, J.H., Sachs, H., Sancetta, C., Thiede, J.,
Thompson, P., Wenkam, C., 1980. The reconstruction of sea
surface temperature in the Pacific Ocean of 18,100 B.P. Mar.
Micropaleontol. 5, 215–247.
Morrison, D., 1976. Multivariate Statistical Methods, 2nd ed.
McGraw-Hill, New York, NY, 415 pp.
Ortiz, J.D., Mix, A.C., 1997. Comparison of Imbrie–Kipp transfer function and modern analog temperature estimates using
sediment trap and core top foraminiferal faunas. Paleoceanography 12 (2), 175–190.
Ottens, J.J., 1992. Planktic foraminifera as indicators of ocean
environments in the Northeast Atlantic. Ph.D. Thesis. Free
University, Amsterdam, 189 pp.
Pisias, N.G., Mix, A.C., 1997. Spatial and temporal oceanographic variability of the eastern equatorial Pacific during the
late Pleistocene: evidence from Radiolaria microfossils. Paleoceanography 12, 381–393.
Prell, W.L., 1985. The stability of low latitude sea-surface temperatures: An evaluation of the CLIMAP reconstruction with
emphasis on the positive SST anomalies. U.S. Dep. Energy,
Washington, DC, Rep. TR025, 60 pp.
Ravelo, A.C., Fairbanks, R.G., Philander, S.G.H., 1991. Reconstructing tropical Atlantic hydrography using planktonic
foraminifera and ocean model. Paleoceanography 5, 409–432.
Sachs, H.M., 1973. Late Pleistocene history of the north Pacific: evidence from a quantitative study of Radiolaria in core
V21-173. Quat. Res. 3, 89–98.
Sachs, H.M., Webb III, T., Clark, D.R., 1977. Paleoecological
transfer functions. Annu. Rev. Earth Planet. Sci. 5, 159–178.
Schott, W., 1935. Die foraminiferen in den Aquatorialen Teil des
Atlantischen Ozeans. Dtsch. Atl. Exped. 11 (6), 411.
Shnitker, D., 1974. West Atlantic abyssal circulation during the
past 120,000 years. Nature 248, 385–387.
SPSS, 1995. Professional Statistics 6.1. Marija L. Norusis=SPSS
Inc., Chicago, IL, 385 pp.
Streeter, S., 1973. Bottom water and benthonic foraminifera in
the North Atlantic-glacial–interglacial contrasts. Quat. Res. 3,
131–141.
Streeter, S., Shackleton, N., 1979. Paleocirculation of the Deep
North Atlantic: 150,000 year record of benthic foraminifera
and oxygen-18. Science 203, 168–171.
Thompson, P.R., 1976. Planktonic foraminiferal dissolution and
the progress towards a Pleistocene equatorial Pacific transfer
function. J. Foraminiferal Res. 6, 208–227.
Thompson, P.R., 1981. Planktonic foraminifera in the Western
North Pacific during the past 150,000 years: comparison of
modern and fossil assemblages. Palaeogeogr., Palaeoclimatol.,
Palaeoecol. 35, 241–279.
Webb III, T., Bryson, R.A., 1972. Late and post-glacial climatic
change in the northern Midwest, USA: Quantitative estimates
derived from fossil pollen spectra by multivariate statistical
analysis. Quat. Res. 2, 70–115.