INFERRING PAST ENVIRONMENTS FROM BIOLOGICAL DATA: PROGRESS, PROBLEMS, AND PITFALLS

H.J.B. Birks
University of Bergen & University College London

BASIC IDEA OF BIOINDICATION OR ENVIRONMENTAL RECONSTRUCTION

Fossil biological data ('proxy data'; e.g., pollen, chironomids): Y0, a matrix of t samples by m species (1, ..., m).
Environmental variable (e.g., temperature): X0, one value for each of the t samples. Unknown. To be estimated or reconstructed.

To solve for X0, need modern data or 'training data' or 'calibration set':
Modern biology (e.g., pollen, chironomids): Y, a matrix of n samples by m species (1, ..., m).
Modern environment (e.g., temperature): X, one value for each of the n samples.

BASIC BIOLOGICAL ASSUMPTIONS

Marine planktonic foraminifera - Imbrie & Kipp (1971)
Foraminifera are a function of sea-surface temperature.
Foraminifera can be used to reconstruct past sea-surface temperature.

Pollen
Pollen is a function of vegetation.
Vegetation is a function of climate.
Pollen is an indirect function of climate and can be used to reconstruct past climate.

Chironomids (aquatic non-biting midges)
Chironomids are a function of lake-water temperature.
Lake-water temperature is a function of climate.
Chironomids are an indirect function of climate and can be used to reconstruct past climate.

Freshwater diatoms (microscopic algae)
Diatoms are a function of lake-water chemistry.
Diatoms can be used to reconstruct past lake-water chemistry.
Lake-water chemistry may be some weak function of climate.
Diatoms may be a weak function of climate.

BIOLOGICAL 'PROXY' DATA PROPERTIES

May have 200-300 species, expressed as proportions or percentages, in 200-500 samples.
Multicollinearity.
Biological data contain many zero values (absences).
Species invariably show non-linear unimodal responses to their environment, not simple linear responses.

'PROXIES'

Pollen - good indicators of vegetation and hence indirect indicators of climate.
[Figure: modern pollen of Betula (birch), Pinus (pine), Alnus (alder), Quercus (oak), Empetrum nigrum (crowberry), and Agropyron repens (Gramineae, grass). Identical treatment, all at same magnification, all stained with safranin.]

Chironomids - good indicators of past lake-water temperatures and hence past climate.

[Figure: common late-glacial chironomid taxa. a: Tanytarsina; b: Sergentia; c: Heterotrissocladius; d: Hydrobaenus/Oliveridia; e: Chironomus; f: Dicrotendipes; g: Microtendipes; h: Polypedilum; i: Cladopelma. Scale bar represents 50 µm.]

Freshwater diatoms - excellent indicators of lake-water chemistry (e.g. pH, total P). Not reliable climate indicators.

BASIC NUMERICAL MODELS

CLASSICAL APPROACH
(1) $Y = f(X) + \text{error}$, where Y is the biology and X is the environment.
(2) Estimate f by some mathematical procedure and 'invert' the estimated function $\hat{f}$ to find the unknown past environment X0 from fossil data Y0: $X_0 \approx \hat{f}^{-1}(Y_0)$

INVERSE APPROACH
In practice, for various mathematical reasons, do an inverse regression or calibration.
(3) $X = g(Y) + \text{error}$
(4) $X_0 = \hat{g}(Y_0)$
Obtain 'plug-in' estimate of past environment X0 from fossil data Y0.
f or g are 'transfer functions'.

'INVERSE' PROCEDURES

1. Principal components regression (PCR). Imbrie & Kipp (1971)
[Diagram: Y is reduced to components PC1, PC2, PC3, which are then used to predict X.]
Multiple linear regression or polynomial regression of X on PC1, PC2, PC3, etc.
PCA components maximise variance within Y.
Selection of components done visually until very recently. Now cross-validation is used to select the model with the fewest components, low RMSEP, and low maximum bias.

2. Two-way weighted averaging (WA). ter Braak & van Dam (1989) and Birks et al. (1990)
(i) Estimate species optima ($\hat{u}$) by weighted averaging of the environmental variable (x) of the sites. Species abundant at a site will tend to have their ecological optima close to the environmental variable at that site. (WA regression.)
(ii) Estimate the environmental values ($\hat{x}$) at the sites by weighted averaging of the species optima ($\hat{u}$). (WA calibration.)
(iii) Because averages are taken twice, the range of estimated x-values is shrunken, and a simple 'inverse' or 'classical' deshrinking is required. Usually regress x on the preliminary estimates ($\hat{x}$) and take the fitted values as final estimates of x.
Can downweight species in step (ii) by their estimated WA tolerances (niche breadths) so that species with wide tolerances have less weight than species with narrow tolerances.

3. Weighted averaging partial least squares regression (WA-PLS). ter Braak & Juggins (1993) and ter Braak et al. (1993)
[Diagram: Y is reduced to components PLS1, PLS2, PLS3, which are then used to predict X.]
Components selected to maximise covariance between species weighted averages and environmental variable x.
Selection of number of PLS components to include is based on cross-validation. Model selected should have the fewest components possible, low RMSEP, and low maximum bias.

4. Modern analog technique (MAT) = k-nearest neighbours (kNN). Hutson (1980), Prell (1985), ter Braak (1995), et al.
(i) Compare fossil sample t with modern sample i: calculate the dissimilarity coefficient (DC) between t and i.
(ii) Repeat for all modern samples.
(iii) Select the k closest analogues for fossil sample t.
(iv) Estimate past environment for sample t as the (weighted) mean of the environment of the k analogues.
(v) Repeat for all fossil samples.
Value of k estimated by visual inspection, arbitrary rules (e.g., 10, 20, etc.), or cross-validation.

USE OF METHODS

Marine studies (foraminifera, diatoms) - PCR, some MAT plus variants, very few WA or WA-PLS uses.
Freshwater studies (diatoms, chironomids) - WA or WA-PLS, very few MAT uses.
Terrestrial studies (pollen) - MAT plus variants, some WA-PLS uses.

In comparisons using simulated and real data, WA and WA-PLS usually, but not always, outperform PCR and MAT.
Classical methods of Gaussian logit or multinomial logit regression and calibration are rarely used (freshwater, terrestrial).
Some applications of artificial neural networks and a few studies within a Bayesian framework. The Bayesian framework may be an important future research direction.
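The two-way weighted averaging steps (i)-(iii) under procedure 2 above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the implementation used in the studies cited. The function names `wa_fit` and `wa_predict`, the simulated Gaussian species responses, and all numerical settings are assumptions made for this example. It uses inverse deshrinking and omits the optional tolerance downweighting.

```python
import numpy as np


def wa_fit(Y, x):
    """Fit a two-way weighted averaging model with inverse deshrinking.

    Y : (n_samples, m_species) modern abundances (every species is
        assumed to occur in at least one sample).
    x : (n_samples,) modern environmental values.
    Returns the species optima and the deshrinking intercept and slope.
    """
    Y = np.asarray(Y, dtype=float)
    x = np.asarray(x, dtype=float)
    # (i) WA regression: optimum = abundance-weighted mean of x per species.
    optima = (Y * x[:, None]).sum(axis=0) / Y.sum(axis=0)
    # (ii) WA calibration: preliminary estimate = abundance-weighted mean
    # of the species optima per sample.
    x_init = (Y * optima).sum(axis=1) / Y.sum(axis=1)
    # (iii) Inverse deshrinking: regress observed x on the preliminary
    # estimates. np.polyfit returns coefficients highest power first.
    slope, intercept = np.polyfit(x_init, x, 1)
    return optima, intercept, slope


def wa_predict(Y0, optima, intercept, slope):
    """Reconstruct the environment for samples Y0 (rows) from a fitted model."""
    Y0 = np.asarray(Y0, dtype=float)
    x_init = (Y0 * optima).sum(axis=1) / Y0.sum(axis=1)
    return intercept + slope * x_init


# Hypothetical training set: 40 sites along a July temperature gradient and
# 15 species with symmetric Gaussian responses (illustrative values only).
x = np.linspace(8.0, 17.0, 40)
true_optima = np.linspace(6.0, 19.0, 15)
Y = np.exp(-0.5 * ((x[:, None] - true_optima[None, :]) / 2.0) ** 2)
Y = Y / Y.sum(axis=1, keepdims=True)  # express abundances as proportions

optima, b0, b1 = wa_fit(Y, x)
x_hat = wa_predict(Y, optima, b0, b1)
# Apparent (training-set) error, not a cross-validated RMSEP.
rmse = float(np.sqrt(np.mean((x_hat - x) ** 2)))
```

Each estimated optimum is an abundance-weighted mean of the observed x-values, so it cannot fall outside the sampled gradient; this is one source of the shrinkage that step (iii) corrects. The error computed here is an apparent error on the training set. An RMSEP in the sense used in these slides would require refitting with each sample left out in turn, or boot-strapping.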
HIDDEN BASIC ASSUMPTIONS

1. Species in training set (Y) are systematically related to the physical environment (X) in which they live.
2. Environmental variable (X0, e.g. summer temperature) to be reconstructed is, or is linearly related to, an ecologically important variable in the system.
3. Species in the training set (Y) are the same as in the fossil data (Y0) and their ecological responses (Gm) have not changed significantly over the timespan represented by the fossil assemblage.
4. Mathematical methods used in regression and calibration adequately model the non-linear biological responses (Gm) to the environmental variable (X).
5. Environmental variables other than, say, summer temperature have negligible influence, or their joint distribution with summer temperature in the past is the same as in the modern training set.

MODEL PERFORMANCE AND SELECTION

1. Root mean square error of prediction (RMSEP) as low as possible.
2. Maximum bias as low as possible.
3. Smallest number of components to avoid 'overfitting'.
Based on leave-one-out cross-validation, n-fold cross-validation, or boot-strapping. Very rare to have an independent test set.

MODEL VALIDATION

Compare reconstructed values with historical data. Rarely possible as few historical data exist. Renberg & Hultberg (1992)
But when done, sometimes the model that gives the closest correspondence is not the model with the lowest RMSEP or maximum bias!
Conflict between model performance and selection based on cross-validation, and validation results using independent historical test-sets.

AN EXAMPLE OF RECONSTRUCTING PAST CLIMATE FROM POLLEN DATA

304 modern pollen samples from Norway, northern Sweden, and Finland (Sylvia Peglar, Heikki Seppä, John Birks, Arvid Odland). Seppä & Birks (2001)

Performance statistics - WA-PLS - leave-one-out cross-validation

                                        RMSEP     R2      Max. bias
July temperature (7.7 - 17.1°C)         1.0°C     0.73    3.64°C
Annual precipitation (300 - 3234 mm)    341 mm    0.71    960 mm

Seppä & Birks (2001)

[Figure: summary pollen diagram from Tsuolbmajavri, northern Finland. The age scale in modelled calibrated years BP is shown along with four phases. The total pollen- and spore-accumulation rate (grains cm^-2 yr^-1) is also shown. The hollow silhouette curves denote the 10x exaggeration of the percentages. Seppä & Birks (2001)]

RECONSTRUCTIONS

[Figure: reconstructions. Seppä & Birks (2001)]

RECONSTRUCTION VALIDATION

[Figure: Tibetanus, Abisko Valley, Sweden. Panels: isotopes; inferred from pollen; theory; inferred from pollen. Hammarlund et al. (2002)]

BROAD-SCALE PATTERNS

Changes in July summer temperature relative to present-day reconstructed temperature on a south-north transect west of the Scandes mountains. 16 sites covering all or much of the Holocene.
[Figure: sites arranged from south to north. Anne Bjune et al.]

FINE-RESOLUTION CHANGES

[Figure: inferred mean July air temperature compared with oxygen isotope ratios in a Greenland ice-core. Brooks & Birks (2000)]

STATISTICAL AND BIOLOGICAL PROBLEMS AND PITFALLS

1. Sample specific errors of reconstruction for fossil samples. Estimate by boot-strapping.

Mean square error of prediction: $\mathrm{MSEP} = s_1^2 + s_2^2$

$s_1^2 = \frac{1}{n_{boot}} \sum_{boot} (\hat{x}_{i,boot} - \bar{x}_{i,boot})^2$
Error due to variability in estimates of species parameters in the training set (s.e. of boot-strap estimates).

$s_2^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x}_{i,boot})^2$
Error due to variation in species abundances at a given environmental value (actual prediction error between observed and mean boot-strap estimate).

where $\bar{x}_{i,boot}$ is the mean of $\hat{x}_{i,boot}$ for all cycles when i is in the test set.

$\mathrm{RMSEP} = (s_1^2 + s_2^2)^{1/2}$ (s1 usually ca. 25% of RMSEP, s2 ca. 75%)

For temperature, RMSEP usually 1-1.5°C (about 10% of the modern range sampled).
For pH, RMSEP usually 0.3-0.5 pH units (about 10%).

Components of RMSEP
(i) Within-lake variability - Heiri et al. (2002). Maximum of 15% of total RMSEP.
(ii) Variability in modern environmental data - Nilsson et al. (1996). Can be 30-40% (even 70%) of total RMSEP. Major problem.
Cannot take account of natural variability of environmental data.
(iii) Variance in the model (model error or lack of fit).

What to do with sample specific errors? There is a consistent temporal trend but also continuous overlap in RMSEP!

2. How do we identify signal from noise in reconstructions? LOESS smoothers are a help. Seppä & Birks (2002)
Trends or RMSEP? Brooks & Birks (2001)

3. Different methods, although they have similar modern model performances, can give very different reconstruction results. Birks (2003)

4. Some indication of consistent model bias when applied to fossil data.
MAT - low variability, insensitive
WA - some variability, overestimates at low values
WA-PLS - more variability
PCR - considerable variability
But in terms of modern model performance, all seem good in terms of RMSEP and maximum bias. Extensive experiments using simulated independent test data-sets, currently underway by Richard Telford, are showing important model differences and biases.

5. Biological data, when sampled over natural environmental gradients, show a mixture of symmetric unimodal responses (40%), monotonic responses (40%), some skewed unimodal responses (5%), and no statistically significant responses (ca. 15%), together with great variation in species tolerances or niche breadths and a compositional turnover gradient of 3-4 standard deviations.
Perhaps too many monotonic responses to feel comfortable with a unimodal-based model like WA or WA-PLS, but too many unimodal responses for linear-based models like PLS or PCR.
Classical approach based on Gaussian or multinomial logit regression and calibration (tried but dropped because of computational limitations in the 1990s) should be re-investigated, possibly within a Bayesian framework (e.g. Toivonen et al. (2001); Korhola et al. (2002)), but incorporating a priori ecological information about the species concerned (depth preferences, lake-chemical preferences, sediment preferences) as priors or conditionals.

6.
Incorporation of species tolerances (niche widths) into WA-PLS is needed so that species with narrow tolerances ('good' indicators) have greater weight in the model.

7. Use non-linear deshrinking equations (e.g. smoothing spline) in WA or WA-PLS because the pattern of initially estimated x in relation to observed x is often non-linear, especially at the gradient ends ('edge effects').

8. Some species may show great dominance and abundance in some ecological settings ('weeds') but then occur with lower abundance in other settings. Great dominance can bias estimates of species parameters, not only of the few dominants, but also of the other species because of the percentage compositional constraint.

9. Do we really need all 200-300 species in a calibration set? Would a model based on only those species that are necessary for the model to perform well be more robust, as it is not so 'overfitted' as a model based on 200-300 species?

ANN with a backward-elimination pruning algorithm, Racca et al. (2003)

SWAP diatom-pH data-set
167 samples
267 species
18.5% +ve data entries
pH 4.3-7.3
Species N2 1-120.9
Sample N2 5.1-57.2

Could eliminate 85% of species with little change in model performance:

                        RMSEP    Maximum bias
All 267 species         0.32     -0.44
37 species remaining    0.33     -0.46

Use difference between RMSE (apparent) and RMSEP (jack-knife) as a guide to possible model 'overfitting'. Racca et al. (2003)

In general we have many species and few lakes in our modern calibration sets. 'Curse of dimensionality' and hence model overfitting. Ideally the ratio of species number to lake number should be as close to 1 as possible to minimise the 'curse of dimensionality'. Racca et al. (2003)

How to find a minimal set of 'driving' species in WA or WA-PLS?

10. Covarying environmental variables, e.g.
temperature and lake trophic status (e.g. total N or P), or temperature and lake depth - Brodersen & Anderson (2002)
pH and climate - Anderson (2000)

11.
Use of different proxies - different proxies may give different reconstructions, e.g. mean July temperature at Bjørnfjell, northern Norway.
Validate using another proxy - macrofossils of tree birch.
Importance of independent validation.

12. One large modern calibration data-set or several regional data-sets?
Merging data-sets increases the floristic diversity and environmental range of the resulting transfer function but can introduce further noise due to secondary environmental gradients.
Dynamic or local calibration data-set: use MAT to find the 10-20 closest modern analogues for each fossil sample in a core, and use these selected samples as a local calibration data-set for that site.
Current evidence suggests only a modest improvement in RMSEP and maximum bias, of about 2-5%.

13. Hidden assumption number 5: 'Environmental variables other than, say, summer temperature have negligible influence, or their joint distribution with summer temperature in the past is the same as in the training set.'
Climate model and glaciological results suggest that the joint distribution between summer temperature and winter accumulation has not been the same in the past 11,000 years.
Good evidence to suggest that lake-water pH has decreased naturally (soil deterioration) whilst summer temperature rose and then fell in the last 11,000 years.
In Norway today, lake-water pH is negatively correlated with summer temperature because lakes of pH 6-7.5 are on basic rock, which in Norway happens to occur mainly at high altitudes and hence at low temperatures. In the past, after deglaciation, almost all lakes had a higher pH than today, so the pH-temperature relationship in the past was different from today's.

PROJECTS THAT HAVE STIMULATED OUR TRANSFER FUNCTION WORK

SWAP NORPAST Surface Water Acidification Project 1998-2002 1987-1990 NORPAST-2 1995-2000 NFR KILO 1993-1996 2000-2004 NFR SETESDAL 1996-1999 2003- EU CHILL 1998-2001

PERSON WHO HAS STIMULATED OUR TRANSFER FUNCTION WORK

Major attributes of Cajo:
1.
Wonderful person and loyal friend
2. Exceptional scientist with over 7400 citations
3. Revolutionised numerical ecology and quantitative palaeoecology with his creative ideas, remarkable powers of synthesis, and genius at working at the interface between practical ecology and statistical theory.

[Photograph: Cajo ter Braak, April 1992]

Thank you, Cajo, for all your contributions in the last 20 years.