Progress in Oceanography 58 (2003) 217–233
www.elsevier.com/locate/pocean

Summarising spatial and temporal information in CPR data

D.J. Beare a,∗, S.D. Batten b, M. Edwards b, E. McKenzie c, P.C. Reid b, D.G. Reid a

a FRS Marine Laboratory, Aberdeen, AB11 9DB, UK
b Sir Alister Hardy Foundation for Ocean Science, The Laboratory, Citadel Hill, Plymouth PL1 2PB, UK
c Department of Statistics and Modelling Science, Strathclyde University, Glasgow, G1 1XH, UK

∗ Corresponding author. Tel.: +44-1224-295314; fax: +44-1224-295511. E-mail address: d.beare@marlab.ac.uk (D.J. Beare).

Abstract

The Continuous Plankton Recorder survey provides pan-oceanic data on geographic distribution, species composition, seasonal cycles of abundance, and long-term change during the last 70 years. In this paper we compare and contrast some of the historic data-analytic protocols of the survey, focusing primarily on the various means by which spatiotemporal information in CPR data has been exposed. Relative strengths and limitations are assessed, followed by suggestions for future approaches to the visualisation and summarising of CPR data.

© 2003 Elsevier Ltd. All rights reserved.

Keywords: Continuous Plankton Recorder; Spatial; Long-term and seasonal summary; North Atlantic

0079-6611/$ - see front matter © 2003 Elsevier Ltd. All rights reserved. doi:10.1016/j.pocean.2003.08.005

Contents

1. Introduction
2. Spatial and temporal scales
3. Summarising spatial and temporal variability in CPR data
3.1. Interactions between spatial and temporal effects
3.1.1. Subsetting to expose interaction
3.1.2. Statistical modelling
4. Multivariate techniques
5. Problems explaining long-term, seasonal, spatial and compositional changes in plankton populations
5.1. General
5.2. CPR data are categorised
5.3. Data voids
5.4. CPR data are observational
6. Conclusion

1. Introduction

Continuous Plankton Recorder (CPR) data continue to be collected along multiple tow routes in the North Atlantic using ships of opportunity. The survey has been running since the 1930s and has led to the accumulation of a large and complex database reflecting the rich pageant of North Atlantic planktonic abundance and taxonomic composition (Hardy, 1939). The recent (1980 to present) development of statistical theories, together with an explosion in computer processing capability, has led to increasingly detailed syntheses and visualisations of the spatial and temporal variability of plankton at pan-oceanic (North Atlantic), seasonal, and decadal scales (e.g.
Colebrook & Robinson, 1961, 1965; Colebrook, 1969; Matthews, 1969; Colebrook, 1978a, 1978b, 1979a, 1979b; Colebrook, Robinson, Hunt, Roskell, John, Bottrell et al., 1984; Colebrook & Taylor, 1984; Colebrook, 1985; Colebrook, 1986; Colebrook, 1991; Aebischer, Coulson, & Colebrook, 1990; Planque & Fromentin, 1996; Beare & McKenzie, 1999a, 1999b, 1999c; Planque & Taylor, 1998; Edwards, John, Hunt, & Lindley, 1999; Madden, Beare, Heath, Fraser, & Gallego, 1999; Planque & Batten, 2000; Beaugrand, Reid, Ibañez, & Planque, 2000b). In this paper we tell the story of the gradual advancement of summary techniques used for CPR data, focusing in particular on the difficulties caused by the general non-linearity of the data (e.g. seasonal dependence), the interaction between spatial and temporal effects (e.g. seasonal patterns that vary with respect to location), and non-random sampling (e.g. only collecting night samples on a particular ferry route).

2. Spatial and temporal scales

Plankton abundance varies in spatial scale from millimetres, to metres, to entire ocean basins, while temporal variation may be measured and analysed in terms of minutes, years, decades or millennia. The amount of information realistically available for spatio-temporal analysis, however, is constrained by the design of the sampling survey itself. For the CPR survey, the finest spatial information available is at a 10 km scale, the distance separating each sample. In some areas where tow routes overlap, a more detailed spatial resolution may be obtained. In terms of temporal scales, CPR tow routes are only really sampled on a monthly basis, so the month is, in our opinion, the smallest temporal unit at which CPR data can realistically be summarised. The time of day is noted for each CPR sample, suggesting that CPR data can also be analysed at hourly resolutions.
Extraction of an hourly signal has been attempted (Hays, 1994; Hays, 1995; Hays, Warner, & Proctor, 1995; Hays, Warner, & Lefevre, 1996; Hirst & Batten, 1998), but considerable problems were encountered in interpretation because of interactions between the main temporal and spatial effects (Beare & McKenzie, 1999b). The CPR survey also provides data at decadal scales, since it has been running since the 1930s. To summarise, close scrutiny of exactly how relevant information is collected in space and time (seasonal and long-term) is an essential prerequisite for CPR data summary and visualisation, in order to assess whether or not the data can be used to answer particular questions (Table 1).

Table 1
Details of the category counting system employed by the CPR survey

Number of individuals counted    Recorded value    Accepted value
1                                1                 1
2                                2                 2
3                                3                 3
4–11                             4                 6
12–25                            5                 17
26–50                            6                 35
51–125                           7                 75
126–250                          8                 160
251–500                          9                 310
501–1000                         10                640
1001–2000                        11                1300
2001–4000                        12                2690

3. Summarising spatial and temporal variability in CPR data

3.1. Interactions between spatial and temporal effects

Plankton abundance depends on numerous factors such as sea temperature, degree of water column stratification, bottom depth, food availability and predator density. Proxies for these factors are usually obtained by using variables of location and time (e.g. longitude, latitude, year and month), which are useful for visualising dependence within the data. The functional forms (shapes) of such dependence are most likely to be non-linear, and further complications arise because the variables of location, long-term trend and seasonality often depend on each other as well. Such inter-relationships between variables are described in statistical terminology as ‘dependence’, ‘interaction’ or ‘covariation’, and ought, somehow, to be reflected in CPR data-analytic protocols.
At first this may seem to be a rather obvious point, but such interactions have often been ignored in the past, or at least only partly accommodated. Seasonal cycles of abundance were first described in CPR data in the early 1940s (e.g. Lucas, 1940; Rae & Fraser, 1941) and descriptions have now been extended to almost all of the common planktonic taxa, e.g. fish larvae (Bainbridge, Cooper, & Hart, 1974; Coombs, 1980; Coombs & Mitchell, 1981), Thaliacea (Hunt, 1968), decapod larvae (Lindley, 1987; Lindley, Williams, & Hunt, 1993), pteropods (Cooper & Forsyth, 1963), and gastropods (Vane & Colebrook, 1962). Since the early studies, analysts have noted that seasonal cycles of plankton abundance can have different timings and shapes each year (e.g. Fig. 1; Colebrook & Robinson, 1961, 1965; Reid, Surey-Gent, Hunt, & Durrant, 1992; Beare, McKenzie, & Speirs, 1998; Reid, Edwards, Hunt, & Warner, 1998a; Edwards, Reid, & Planque, 2001), while simultaneously varying with location (e.g. Robinson, 1970; Robinson, Aiken, & Hunt, 1986). Summarising spatial patterns of abundance is similarly problematic because they also vary for different month/year combinations (e.g. Colebrook, 1961; Robinson, 1965; Oceanographic Laboratory, Edinburgh, 1973; Planque & Fromentin, 1996; Planque, Hays, Ibañez, & Gamble, 1997; Planque & Ibañez, 1997; Madden, Beare, Heath, Fraser & Gallego, 1999; Beare & McKenzie, 1999a). Interactions between spatial and seasonal effects may also mask long-term trends, which are themselves influenced by location (e.g. Planque & Ibañez, 1997; Planque & Batten, 2000) and time of year (Beare & McKenzie, 1999a, 1999b, 1999c). Temporally and spatially constant patterns of abundance for organisms that live permanently in a dynamic environment are thus extremely unlikely to be observed. Complex hypervariate data such as those from the CPR therefore require dynamic data summary and data visualisation tools (Cleveland, 1993).

Fig. 1. Contour plots of mean monthly Phytoplankton Colour during 1948–95 for the central North Sea, the central north-east Atlantic and the northern north-east Atlantic (redrawn from Reid, Edwards, Hunt & Warner, 1998a).

3.1.1. Subsetting to expose interaction

Since the inception of the survey, CPR scientists have wrestled continuously with the basic quandary of how best to summarise plankton abundance data that depend simultaneously on a number of variables. A common approach has been to divide the data into separate compartments and analyse the time dependence within each. Reid, Edwards, Hunt & Warner (1998a) divided their data into three areas (CNE Atlantic, NNE Atlantic, and North Sea) and ‘modelled’ dependence on predictors of season (month) and long-term trend (year) using a contouring algorithm (Fig. 1). This approach allows ‘interaction’ between the two temporal predictors (month and year) to be visualised, so that year-to-year differences in the shape and level of the seasonal cycle can be seen. However, potentially important spatial variability or dependence within each of the three compartments is oversimplified. The framework basically assumes that the same long-term and seasonal patterns occur over very large areas of ocean. To a limited extent the differences in the three time series estimated within each compartment by Reid, Edwards, Hunt & Warner (1998a) are in fact a measure, albeit crude, of interaction between spatial and temporal factors, because each compartment is treated separately. Therefore, by subsetting the data and dealing with the subdivisions separately, such interactions can be further evaluated. An analogous protocol was used by Robinson et al. (1986) to describe spatial differences in the seasonal abundance of an assortment of planktonic taxa along a transect in the English Channel.
Instead of considering how seasonality changed with year in a sub-region (e.g. Reid, Edwards, Hunt & Warner, 1998a), the authors wanted to reveal how seasonality changed according to location or, in their case, the ‘distance along a transect’. To do this they contoured the plankton data in two dimensions, using covariates of month and distance along the transect to show how the shape of the seasonal cycle varied with, or interacted with, location. Unfortunately, Robinson et al. (1986) used a long-term aggregation of data (1974–1981) in their analysis. This may have caused an avoidable bias, because the shape of the seasonal cycle may have been completely different in each of the years between 1974 and 1981. Suppose, for the sake of argument, that single pronounced spring peaks in abundance occurred in 1974 and 1975, while single autumn peaks occurred in 1980 and 1981. For such data, adopting the protocol of Robinson, Aiken and Hunt (1986) would lead to the assumption of a bimodal seasonal cycle that did not happen in any of the years. If, instead, the authors had chosen to divide the data into subsets, one for each year, and repeated their contouring process, they would have been able to go some way towards gauging the effect of year (long-term trend) on their interpretation of how seasonality interacts with the locational dimension. Workers have often assumed that their estimates of seasonal dynamics were more reliable because they were using an aggregation of data collected over many years, and there were, as a consequence, more data points. This assumption is spurious because inter-annually constant seasonal cycles are seldom observed in CPR data, and aggregating them over multiple years may lead to erroneous estimates of seasonal (or spatial) structure.

Planque and Batten (2000) examined how the annual cycle of abundance of Calanus finmarchicus varied across the entire North Atlantic (Fig. 2).
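The hypothetical spring and autumn case described above can be checked numerically. The sketch below is a minimal illustration and not part of any published CPR analysis: the bell-shaped seasonal cycle, the `seasonal_cycle` and `count_peaks` helpers, the `width` parameter and the choice of April and October as peak months are all assumptions made for the example. It compares the number of peaks within each year with the number in the multi-year mean.

```python
import math

def seasonal_cycle(peak_month, width=1.0):
    """Idealised abundance for months 1..12 with one bell-shaped peak (an assumption)."""
    return [math.exp(-0.5 * ((m - peak_month) / width) ** 2) for m in range(1, 13)]

def count_peaks(cycle):
    """Count months that are strictly higher than both neighbouring months."""
    return sum(
        1 for i in range(1, len(cycle) - 1)
        if cycle[i] > cycle[i - 1] and cycle[i] > cycle[i + 1]
    )

# Hypothetical years: spring (April) peaks in 1974-75, autumn (October) peaks in 1980-81.
yearly = {
    1974: seasonal_cycle(4),
    1975: seasonal_cycle(4),
    1980: seasonal_cycle(10),
    1981: seasonal_cycle(10),
}

# Long-term aggregation: average each calendar month across all years.
aggregated = [
    sum(cycle[m] for cycle in yearly.values()) / len(yearly) for m in range(12)
]

peaks_per_year = {year: count_peaks(cycle) for year, cycle in yearly.items()}
peaks_aggregated = count_peaks(aggregated)
```

Under these assumptions every individual year has a single peak, whereas the aggregated cycle has two, which is the spurious bimodality the text warns about.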
Planque and Batten did this by constructing mean seasonal cycles at various points in the North Atlantic for the period 1958–1996. The conclusion from their protocol was that peaks in annual cycles of abundance could differ by as much as four months across the North Atlantic. This conclusion may be correct, but the ultimate interpretation of the analysis is ambiguous because of the presence in the data of a long-term trend, and the almost certain knowledge that it is not independent of the seasonal or spatial effects. Thus, to deal with ‘interaction’, data analysts can adopt one of two basic protocols. They can either divide their data into various spatio-temporal subsets (years, months, ICES squares etc.) and carefully examine the data within each, or they can try to ‘model’ the interactions directly using statistical fitting algorithms.

3.1.2. Statistical modelling

Statistical modelling techniques take ‘response variables’ (e.g. copepod abundance) and attempt to link them, via mathematical functions (plus random components), to one or more ‘predictor variables’ (e.g. temperature and food availability). In practice, statistical modelling also often involves prior subsetting, in order to reduce certain aspects of the variability. In an investigation into long-term changes in diel vertical migration (DVM) behaviour of the copepod C. finmarchicus, Beare & McKenzie (1999b) divided CPR data into five sub-regions to lessen the impact of spatial variability. Within each of the sub-regions, 12 stochastic models (Generalized Linear Models, GLMs), one for each month from January to December, were then built to describe the long-term trend (1958–1998) for each month individually. This was done to control the effect of seasonal variation on interpretation, because long-term trends in plankton taxa vary between months. Only by first accounting for the spatial, seasonal and long-term variations can the signal resulting from DVM be exposed.
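The idea of fitting one trend model per calendar month can be sketched as follows. This is a minimal illustration under stated assumptions, not a reconstruction of the models in Beare & McKenzie (1999b): the text does not give their error family, link function or covariates, so a Poisson GLM with a log link and a single linear year term is assumed here, fitted by iteratively reweighted least squares. The function names `fit_poisson_trend` and `monthly_trends` and the synthetic, noise-free data are hypothetical.

```python
import math

def fit_poisson_trend(years, counts, n_iter=50):
    """Fit log(mean) = b0 + b1 * (year - first year) by IRLS (assumed Poisson, log link)."""
    x = [y - years[0] for y in years]
    # Start from the data, nudged away from zero so that the log is defined.
    eta = [math.log(c + 0.5) for c in counts]
    b0 = b1 = 0.0
    for _ in range(n_iter):
        mu = [math.exp(e) for e in eta]
        # Working response for the log link; the IRLS weights equal the fitted means.
        z = [e + (c - m) / m for e, c, m in zip(eta, counts, mu)]
        sw = sum(mu)
        swx = sum(w * xi for w, xi in zip(mu, x))
        swxx = sum(w * xi * xi for w, xi in zip(mu, x))
        swz = sum(w * zi for w, zi in zip(mu, z))
        swxz = sum(w * xi * zi for w, xi, zi in zip(mu, x, z))
        # Closed-form weighted least squares for intercept and slope
        # (needs at least two distinct years, otherwise det is zero).
        det = sw * swxx - swx * swx
        b0 = (swxx * swz - swx * swxz) / det
        b1 = (sw * swxz - swx * swz) / det
        eta = [b0 + b1 * xi for xi in x]
    return b0, b1

def monthly_trends(records):
    """records: iterable of (year, month, count). Returns {month: (b0, b1)}."""
    by_month = {}
    for year, month, count in records:
        by_month.setdefault(month, []).append((year, count))
    trends = {}
    for month, rows in sorted(by_month.items()):
        rows.sort()
        trends[month] = fit_poisson_trend([r[0] for r in rows], [r[1] for r in rows])
    return trends

# Synthetic, noise-free example for one sub-region: abundance rises in May
# and falls in January over 1958-1998.
records = [(1958 + i, 5, math.exp(1.0 + 0.05 * i)) for i in range(41)]
records += [(1958 + i, 1, math.exp(3.0 - 0.03 * i)) for i in range(41)]
trends = monthly_trends(records)
```

In this toy case the January and May slopes have opposite signs, so a single trend fitted to data pooled across months would describe neither month well. In practice the same loop would be repeated within each sub-region.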
The combined subsetting and statistical modelling procedure adopted by Beare & McKenzie (1999b) enabled determination of the DVM effect, but the spatial dimension was poorly controlled because of the large size of their sub-regions.

Fig. 2. Spatial variations in the seasonal abundance of C. finmarchicus. (A) Mean time-series of monthly abundance, (B) cumulated time-series of monthly abundance and (C) spatial distribution of the seasonal index showing geographical variability in seasonality (redrawn from Planque & Batten, 2000). Grey intensity is proportional to the local value of the seasonal index. The seasonal index is the month at which 50% of the total annual abundance of C. finmarchicus has been collected by the CPR survey. Increasing seasonal indices correspond to later seasonality in the abundance of C. finmarchicus.

Statistical methodology appropriate to CPR data has traditionally been divided into three separate disciplines: (1) time series (e.g. Diggle, 1990); (2) spatial statistics (e.g. Cressie, 1991) and (3) multivariate statistics (e.g. Krzanowski, 1988). Time series analysis, concerned with decomposing serially dependent data into separate long-term, seasonal, cyclical and random components, has only rarely been applied to CPR data (e.g. Broekhuizen & McKenzie, 1995; Beare & McKenzie, 1999c). Spatial statistics attempt to model 2-dimensional spatial dependence using variography (Rossi, Mulla, Journel, & Franz, 1992), while multivariate techniques are concerned with datasets with more than one response variable for each observational or experimental unit (e.g. Krzanowski, 1988). Most researchers, however, are interested in describing data that combine time series, spatial and multivariate structure.
CPR data are neither bona fide time series because of the spatial dimension, nor bona fide spatial data because of the temporal dimension. Most past summaries of CPR data have been restricted to:

1. temporal analyses of spatial compartments (e.g. Taylor & Stephens, 1980; Taylor, Colebrook, Stephens, & Baker, 1992; Broekhuizen & McKenzie, 1995; Hirst & Batten, 1998; Reid, Edwards, Hunt & Warner, 1998a; Reid, Planque, & Edwards, 1998b);
2. spatial analyses of temporal compartments (e.g. Oceanographic Laboratory, Edinburgh, 1973; Planque & Fromentin, 1996; Planque & Batten, 2000); or
3. multivariate analyses of wide-ranging data aggregations in space and time (e.g. Colebrook, 1972, 1978a, 1978b, 1979a, 1979b, 1982; Ali, 1996; Beaugrand et al., 2000a), which may tend to oversimplify aspects of spatial and temporal variation.

Recently, statistical methods have been developed that attempt to model temporal and spatial dependence in ecological data simultaneously. Spatio-temporal patterns in mackerel and horse mackerel egg abundance were successfully described using Generalized Additive Models (GAMs) by Borchers, Buckland, Priede, & Ahmedi (1997a) and by Borchers, Richardson, & Motos (1997b). Non-linear dependence in the data was handled within the GAMs using spline functions, while interactions were modelled using smoothed products of the predictor variables. Similar methods have been applied to CPR abundance data for Calanus finmarchicus and C. helgolandicus (Beare & McKenzie, 1999a, 1999b, 1999c, 1999d). The outputs from such models allow changes in both seasonality and spatial distribution to be assessed over the long term (Figs. 3 and 4). In Fig. 3, C. finmarchicus abundances for the northwestern North Sea are plotted as a time series (Fig. 3(A)), and as a 2D surface (Fig. 3(B)). [Note: both datasets are identical and have been derived from the same stochastic model.] The plots show how C.
finmarchicus abundance collapsed in conjunction with changes in its seasonal structure. It is similarly possible to map changes in spatial distribution over the long term. In Fig. 4, output from 4D models is plotted to show how the geographic distributions of three zooplankton ‘indices’ (Boreal Atlantic, Temperate Atlantic and Neritic) have changed each May between 1958 and 1998. The output also enables seasonal change in spatial distribution to be examined for individual years (not shown). In future, similar modelling procedures might be used to summarise the spatio-temporal patterns in many other CPR taxa.

The important caveat is that, as the dimensionality of our data modelling and exploration increases, so problems caused by missing data mount. It is straightforward to fit plankton abundance to the covariates of time and location (see also Borchers, Buckland, Priede & Ahmedi, 1997a; Borchers, Richardson & Motos, 1997b) and use the model parameters to predict over an evenly spaced, temporally resolved grid, as was done here (e.g. Fig. 4). The fact that the data have been modelled in such a manner does not mean, however, that problems relating to non-randomness of the sampling (confounding) and the data voids have all been conveniently solved. The data used to create Fig. 4 have gaps in space and time, through which the model interpolates using available information that is nearby in space and time. Standard fit diagnostics (residuals, R-squared etc.) can be used to gauge the quality of the fit, which is acceptable in this case (Fig. 4), but where data are missing over large areas it is impossible to know what would have been recorded had those areas actually been sampled. Simple statistical functions (polynomials, smoothers) probably do reasonably well where sampling has been representative, but one cannot know for certain. Statistical models are only as good as the input data.

Fig. 3.
(A) Time series and (B) surface plot of average accepted numbers per month of C. finmarchicus in the northwestern North Sea between 1958 and 1998. (Note: The average accepted numbers were calculated using a multinomial logit model and the plots show how seasonality can change as a function of long-term trend. The datasets used in both A and B are identical.)

4. Multivariate techniques

A review of the application of multivariate methods used to construct ‘summary’ long-term trends for CPR data is provided by Ali (1996), who focused on Principal Component Analysis (PCA), Minimum/Maximum Autocorrelation Factor Analysis (MAFA; Solow, 1994) and Cluster Analysis. Ali commented that the long-term trends extracted using PCA and MAFA are often difficult to interpret because only statistical criteria (variance; lag-1 autocorrelation) are used in their construction. The fact that data must also be aggregated prior to constructing the necessary 2-dimensional matrices may also result in a loss of information: yearly averages for particular sub-regions, for example, will obscure how seasonality affects the interpretation of long-term trend. For these reasons Ali (1996) suggested that alternative index numbers could usefully be constructed from CPR data using ad hoc scientific criteria for the ‘weights’ instead. In other disciplines index numbers are essential tools for summarising large, multivariate systems (e.g. FTSE-100 Share Index, Retail Price Index, House Price Index). Analyses by Beaugrand, Ibañez & Reid (2000), using a range of diversity indices, have suggested that the North Atlantic can be divided into different regions based on assemblages of planktonic species. We have used these ideas to experiment with three possible index numbers (the Boreal Atlantic Index, the Temperate Atlantic Index and the Neritic Index) for the North Sea, based on aggregations of CPR zooplankton taxa with similar long-term, seasonal, and spatial behaviours.
These indices have then been modelled in space and time using GAMs, and have revealed long-term ecosystem and water mass changes (see also Beare, Gislason, Astthorsson, & McKenzie, 2000). The ‘Temperate Atlantic Index’, for example, is plotted for June 1958 and 1998 (Fig. 4). This plot shows there were pronounced long-term spatial changes in the abundances of temperate Atlantic species during that period, which are probably related to increasing sea temperatures and changing patterns of Atlantic inflow via the English Channel and Fair Isle Current. The extension of such an approach to cover the entire North Atlantic using aggregations like those suggested by Beaugrand is an exciting future prospect.

Fig. 4. Change in the spatial distribution of Boreal Atlantic (BA), Temperate Atlantic (TA) and Neritic (N) indices in the North Sea between June 1958 and June 1998. Indices determined from probability of recording in a CPR sample: (BA) C. finmarchicus, (TA) C. helgolandicus, Candacia armata or Centropages typicus, (N) Centropages hamatus or Temora longicornis. Grey scale corresponds to the probability of presence estimated using a Generalised Additive Model from the Binomial family.

5. Problems explaining long-term, seasonal, spatial and compositional changes in plankton populations

5.1. General

The CPR survey has supplied most of our long-term, seasonal and spatial information on zooplankton populations in the North Atlantic, but the abiotic and biotic mechanisms that cause these observations remain poorly understood. Successful scientific interpretation of CPR data is compromised by a number of considerations. CPR plankton data are typically examined using predictor variables of time (long-term trend, month) and location (latitude, longitude), which are obviously extremely useful for purposes of data summary, but cannot directly reflect causative mechanisms. Phytoplankton abundance does not soar in
Phytoplankton abundance does not soar in 226 D.J. Beare et al. / Progress in Oceanography 58 (2003) 217–233 spring because it is April, but because ambient temperatures, light levels and water column stability begin to become suitable for growth. Long-term trend, month and location etc. are certainly useful for descriptive purposes, but the incorporation of scientifically more meaningful covariates (e.g. salinity, temperature, light levels) directly into CPR data analyses and stochastic models might lead to more satisfying outcomes. 5.2. CPR data are categorised The CPR survey records plankton abundance data in the form of ordered categories (Colebrook, 1960; Oceanographic Laboratory, Edinburgh, 1973; Warner & Hays, 1994) known as ‘recorded values’ (Table 1). For such data, traditional statistical techniques based on the Normal Distribution (e.g. linear regression, analysis of variance) are inappropriate and any statistical conclusions based on them are almost certainly unreliable (Lindsey, 1995). For example, the mean and standard deviation of a sample of accepted numbers with at least one number greater than three cannot be interpreted in the usual way because such accepted numbers represent ‘bins’. During the last 20 years, advances in statistical methods mean that it is now possible to model categorised data, such as those from the CPR, directly (McCullagh & Nelder, 1983). One published report involved modelling the probability of getting any one of the recorded values in a CPR sample using a multinomial logit transformation (e.g. Beare & McKenzie, 1999a). An example of its output for data on C. finmarchicus is plotted in Fig. 5. The red bars, for example, correspond to the likelihood of recording a zero in May between 1958 and 1998 (top), or a zero between January and December in 1965 (bottom). Similarly the darker green, dark blue, and turquoise bands reflect the probabilities of recording ones, twos, and threes respectively. 
The non-parallelism, a result of the separate model fitted to each recorded value, is emphasised in the pictures, viz. the seasonal (February to December) narrowing of the zero recorded value (red) probability band, versus the widening of the band (deep pink) representing a recorded value of four. The abrupt widening of the deep pink band in both graphs (Fig. 5 top and bottom) happens because it is the first band to represent an aggregation of numbers on the recorded value scale (Warner & Hays, 1994). This fifth category could represent any number of animals between 4 and 11, and so the probability band widens because there is more chance of recording any of eight numbers (4–11) than of just one, which is the case for recorded values 0, 1, 2 and 3. Such types of model have an appeal for applications to analyses of CPR data, because they allow more confident interpretation of output statistics than the standard Gaussian-based techniques of the past.

5.3. Data voids

Data voids cause serious problems for analysis and summarising of CPR data because they reduce the ability to separate the seasonal, spatial and long-term components, especially where interactions occur (Hays, Carr, & Taylor, 1993). The fixed depth horizon of the CPR (all samples are taken at ca 10 m; Hays & Warner, 1993) causes difficulties, since seasonal (Heath, Backhaus, Richardson, Slagstad, Beare, Dunn et al., 1999a; Heath & Jonasdottir, 1999) and diel (Hays, Warner & Proctor, 1995) vertical patterns of migration are crucial factors in the life cycles of many North Atlantic zooplankton species. These behaviours may bias what we actually interpret as seasonal or spatial pattern. Consider the copepod Metridia lucens, which is only recorded in CPR samples during darkness (Hays, Warner & Proctor, 1995). Examination of M.
lucens data from the CPR for the northern North Sea shows that the animal is only recorded in that area during wintertime, but the pronounced diel vertical migration behaviour of the species means that we cannot know if this wintertime peak in M. lucens abundance is real, or a result of the longer hours of darkness in winter.

It is crucially important that the potential influence of non-random sampling in CPR data is considered when trying to assess spatio-temporal variability. Consider CPR observations between 1969 and 1980 in an arbitrary sub-region (56.5–57°N; 0–3°W) of the northern North Sea, split by year and month (Table 2). In 1969, sampling was evenly spread throughout the year, but later data voids begin to emerge and in 1979 there were no January, March, or May to November samples, so, strictly speaking, separating inter-annual and seasonal effects using this information is impossible. As a result even straightforward questions regarding the shape of the long-term trend, the seasonal pattern, and how they interact with each other are sometimes unanswerable, because the necessary information is not available in the data. In this case, the areal extent of the arbitrary sub-region could be increased until observations were spread evenly throughout the year, although this might introduce additional biases as a result of spatial variation. Data aggregation has been used extensively in the literature to overcome problems of sparsity and non-randomness. Achieving the right balance is, in truth, very difficult, and there are no entirely satisfactory solutions. We make this point here only to keep future CPR data analysts alert to the potential difficulties caused by sparse data, data voids and non-randomness, and to note that data aggregation is not necessarily a solution.

Fig. 5. Stages 5 and 6 Calanus finmarchicus in the northwest North Sea: bar-plots showing the proportion of the overall probability estimated for each recorded value in May between 1958 and 1998 (top) and between January and December during 1965 (bottom). Red reflects the probability of recording a zero, green a one, dark blue a two, and so on up the recorded value scale (Warner & Hays, 1994).

Table 2
Numbers of CPR samples recorded per month between 1969 and 1980 in an arbitrary sub-region (56.5–57°N; 0–3°W) in the northeast North Sea

Year   Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
1969     5    5    5    5    5   10    6    5    5    4    4    5
1970     0    5    4    5    4    5    7    5    5    5    5    5
1971     5    5    5    5    5    6    5    5    3    5    3    5
1972     5    5    5    5    5    5    5    4    4    0    4    4
1973     5    2    0    4    2    5    4    4    3    0    0    0
1974     3    0    4    0    5    4    0    4    5    3    0    5
1975     5    5    4    5    4    0    5    0    6    5    5    5
1976     5    0    1    5    4    5    0    2    2    2    5    4
1977     0    4    1    1   10    6    3    1    5    5    4    0
1978     0    5    5    5   10    6    5    5    5    5    5    0
1979     0    4    0    5    0    0    0    0    0    0    0    3
1980     0    3    0    3    3    3    0    4    0    3    4    1

5.4. CPR data are observational

Continuous Plankton Recorder data are ‘observational’ and not derived from ‘designed experiments’, which means that confirmatory statistics of the type promoted by Fisher (see Fisher, 1990) in the early decades of the last century have limited usefulness for deducing scientific mechanism. Traditional univariate, correlative approaches to linking CPR and environmental time series are useful first steps, but may lead to oversimplification and probable ultimate failure. Single predictor variables such as the Gulf Stream Index (Taylor & Stephens, 1980, 1998), the NAO Index (Fromentin & Planque, 1996; Planque & Fromentin, 1996), or sea temperature are unlikely to produce satisfactory predictive models when the effect of each is examined individually.
This situation arises because the actual underlying scientific mechanisms that force the observed changes are not incorporated directly into the overall data-analytic and conceptual frameworks. Temperature, salinity, stratification, food availability, advection and overwintering location might all simultaneously affect the abundance of a particular copepod, and the level of one might influence that of another and so on. Scientifically interpretable models with multiple predictors that can interact with each other are thus essential.

To illustrate the point further, consider the CPR data displayed in Fig. 4. Correlation coefficients were calculated between the Boreal Atlantic abundance index (almost totally dominated by C. finmarchicus) displayed in Fig. 4 and sea temperature (1958 to 1998), and then plotted on maps (Fig. 6). The blue areas represent correlations with large negative signs where C. finmarchicus abundance fell (1958 to 1998) and temperature increased (1958 to 1998), the red areas large positive correlations where both sea temperature (1958 to 1998) and C. finmarchicus abundance declined (1958 to 1998). Clearly, the long-term connection between the two variables (sea temperature and C. finmarchicus) varies with month and geographic position, and interactions between temperature and other factors (e.g. stratification, salinity, Atlantic inflow) may need to be considered if a satisfactory explanatory model for C. finmarchicus is to be found. Interestingly, the relationship is most pronounced when the abundances of C. finmarchicus in the North Sea are at seasonal minima (winter). Comparing data averaged over large areas (e.g. North Sea, English Channel, North East Atlantic) might miss such observations because long-term trends in environmental variables can, and do, vary from month to month, and from place to place.

Fig. 6. Spatial and seasonal patterns in correlation (1958 to 1998) between sea temperature (at 15 m) and the abundance of C. finmarchicus. Only correlation coefficients > +0.6 and < −0.6 are plotted. The blue areas represent negative correlations (C. finmarchicus abundance falls while sea temperatures rise), the red areas positive correlations (C. finmarchicus abundance and sea temperatures both fall). The blanks represent areas where no linear relationships between the two variables were identified.

6. Conclusion

It is worth remembering that all temperate ecological and meteorological time series data are highly seasonal, and usually have an important spatial dimension. These seasonal and spatial signals are usually far larger than those resulting from long-term trends. This means that environmental time series are usually highly correlated with each other. Unfortunately, auto-correlation between successive (time series) and nearby (spatial) data points means that the relationships are not ‘statistically significant’, and isolating those that are actually forcing the long-term changes in the plankton is difficult. One prospect is that coincidental changes in seasonal (or spatial) patterns may provide evidence for links between environmental time series data. It has been reported, for example, that the seasonal structure of the North Atlantic Oscillation Index and the C. finmarchicus time series both changed at around the same time in the late 1960s (Fig. 7; Beare & McKenzie, 1999b). Whether such relationships ultimately prove to be scientifically useful remains to be seen, but such links are certainly worth seeking in the analysis of ecological time series data.

Fig. 7. Long-term and seasonal changes in (A) C. finmarchicus abundance in the northwest North Sea and (B) the North Atlantic Oscillation index.
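The point made above, that a shared seasonal cycle can make two environmental series appear strongly correlated, can be illustrated with a minimal sketch. It uses synthetic monthly data, not CPR observations. The function names (`simulate`, `remove_seasonal_cycle`, `pearson`), the amplitudes and the noise levels are all illustrative assumptions. Subtracting monthly means is one simple way of removing seasonality and is not the method used in the studies cited here.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def remove_seasonal_cycle(series, period=12):
    """Subtract the long-term mean of each calendar month from a monthly series."""
    means = []
    for m in range(period):
        vals = series[m::period]
        means.append(sum(vals) / len(vals))
    return [v - means[i % period] for i, v in enumerate(series)]


def simulate(n_years=40, seed=1):
    """Two synthetic monthly series that share a seasonal cycle and nothing else."""
    rng = random.Random(seed)
    a, b = [], []
    for i in range(n_years * 12):
        season = math.sin(2 * math.pi * (i % 12) / 12)
        a.append(10 * season + rng.gauss(0, 1))  # hypothetical "plankton" series
        b.append(3 * season + rng.gauss(0, 1))   # hypothetical "environment" series
    return a, b


a, b = simulate()
r_raw = pearson(a, b)
r_anom = pearson(remove_seasonal_cycle(a), remove_seasonal_cycle(b))
print(f"raw r = {r_raw:.2f}, deseasonalised r = {r_anom:.2f}")
```

In this toy setting the raw correlation is driven entirely by the common seasonal signal. Once monthly means are removed, only independent noise remains and the correlation falls close to zero. Real series would also carry autocorrelation and long-term trends, which this sketch deliberately leaves out.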
The complexity of the long-term links between plankton and their environment is not in doubt, but the success of the next 70 years of CPR data collection will depend on our ability to build models capable of assessing multi-dimensional, dynamic interaction among the factors that initiate change in plankton populations.

Acknowledgements

We would like to thank all those CPR scientists and technicians, past and present, who have contributed to this priceless dataset. The Natural Environment Research Council are also owed a debt of thanks for providing funding support for this work as part of the thematic programme: Marine Productivity.

References

Aebischer, N. J., Coulson, J. C., & Colebrook, J. M. (1990). Parallel long-term trends across four marine trophic levels and weather. Nature, London, 347, 753–755.
Ali, S. (1996). New approaches to analysing time series from the Continuous Plankton Recorder Survey. M.Phil. Thesis, University of Strathclyde, Scotland, pp. 206 unpublished.
Bainbridge, V., Cooper, G. A., & Hart, P. J. B. (1974). Seasonal fluctuations in the abundance of the larvae of mackerel and herring in the north-eastern Atlantic and North Sea. In J. H. S. Blaxter (Ed.), The Early Life History of Fish (pp. 159–169). Berlin: Springer-Verlag.
Beare, D., & McKenzie, E. (1999a). The multinomial logit model: a new tool for exploring Continuous Plankton Recorder data. Fisheries Oceanography, 8(Suppl. 1), 25–39.
Beare, D., & McKenzie, E. (1999b). Continuous Plankton Recorder data and diel vertical migration in stage V and VI Calanus finmarchicus: a statistical analysis. Fisheries Oceanography, 8(Suppl. 1), 126–137.
Beare, D. J., & McKenzie, E. (1999c). Connecting ecological and physical time-series: the potential role of changing seasonality. Marine Ecology Progress Series, 178, 307–309.
Beare, D. J., & McKenzie, E. (1999d). Temporal patterns in the surface abundance of Calanus finmarchicus and Calanus helgolandicus in the northern North Sea. Marine Ecology Progress Series, 195, 253–262.
Beare, D. J., McKenzie, E., & Speirs, D. C. (1998). The unstable seasonality of C. finmarchicus in the Fair Isle Current. Journal of the Marine Biological Association of the United Kingdom, 78, 1377–1380.
Beare, D. J., Gislason, A., Ástthórsson, Ó., & McKenzie, E. (2000). Assessing long-term changes in the early summer zooplankton communities around Iceland. ICES Journal of Marine Science, 57, 1544–1561.
Beaugrand, G., Ibañez, F., & Reid, P. C. (2000). Long-term and seasonal fluctuations of plankton in relation to hydroclimatic features in the English Channel, Celtic Sea and Bay of Biscay. Marine Ecology Progress Series, 200, 93–102.
Beaugrand, G., Reid, P. C., Ibañez, F., & Planque, B. (2000b). Biodiversity of North Atlantic and North Sea copepods. Marine Ecology Progress Series, 204, 299–303.
Borchers, D. L., Buckland, S. T., Priede, I. G., & Ahmedi, S. (1997a). Improving the precision of the daily egg production method using generalized additive models. Canadian Journal of Fisheries and Aquatic Science, 54, 2727–2742.
Borchers, D. L., Richardson, A., & Motos, L. (1997b). Modelling the spatial distribution of fish eggs using generalized additive models. Oceanografica, 2, 103–120.
Broekhuizen, N., & McKenzie, E. (1995). Patterns of abundance for Calanus and smaller copepods in the North Sea: time-series decomposition of two CPR data sets. Marine Ecology Progress Series, 118, 103–120.
Cleveland, W. S. (1993). Visualizing Data. Summit, New Jersey: Hobart Press.
Colebrook, J. M. (1960). Continuous Plankton Records: methods of analysis 1950–1959. Bulletins of Marine Ecology, 5, 51–64.
Colebrook, J. M. (1961). Continuous Plankton Records: contributions towards a plankton atlas of the North-eastern Atlantic and the North Sea. Bulletins of Marine Ecology, 5, 65–111.
Colebrook, J. M. (1969). Variability in the plankton. Progress in Oceanography, 5, 115–125.
Colebrook, J. M. (1972). Variability in the distribution and abundance of the plankton. ICNAF Special Publication, 8, 167–184.
Colebrook, J. M. (1978a). Changes in the zooplankton of the North Sea, 1948–1973. Rapports et Procès-Verbaux des Réunions du Conseil International pour l’Exploration de la Mer, 191, 264–272.
Colebrook, J. M. (1978b). Continuous Plankton Records: zooplankton and environment, North-East Atlantic and North Sea, 1948–1975. Oceanologica Acta, 1, 9–23.
Colebrook, J. M. (1979a). Continuous Plankton Records: Seasonal cycles of phytoplankton and copepods in the North Atlantic Ocean and the North Sea. Marine Biology, 51, 23–32.
Colebrook, J. M. (1979b). Continuous Plankton Records: monitoring the plankton of the North Atlantic and the North Sea. In D. Nichols (Ed.), Monitoring the Marine Environment. Symposia of the Institute of Biology, 24, 87–102.
Colebrook, J. M. (1982). Continuous Plankton Records: Persistence in time-series and the population dynamics of Pseudocalanus elongatus and Acartia clausi. Marine Biology, 66, 289–294.
Colebrook, J. M. (1985). Continuous Plankton Records: overwintering and annual fluctuations in the abundance of zooplankton. Marine Biology, 84, 261–265.
Colebrook, J. M. (1986). Environmental influences on long-term variability in marine plankton. Hydrobiologia, 142, 309–325.
Colebrook, J. M. (1991). Continuous Plankton Records: from seasons to decades in the plankton of the North-east Atlantic. In T. Kawasaki, S. Tanaka, Y. Toba, & A. Taniguchi (Eds.), Long-term Variability of Pelagic Fish Populations and their environment (pp. 29–45). Oxford: Pergamon Press.
Colebrook, J. M., & Robinson, G. A. (1961). The seasonal cycle of the plankton in the North Sea and the north-eastern Atlantic. Journal du Conseil international pour l’Exploration de la Mer, 26, 156–165.
Colebrook, J. M., & Robinson, G. A. (1965). Continuous Plankton Records: seasonal cycles of phytoplankton and copepods in the North Eastern Atlantic and the North Sea. Bulletins of Marine Ecology, 6, 123–139.
Colebrook, J. M., Robinson, G. A., Hunt, H. G., Roskell, J., John, A. W. G., Bottrell, H. H., Lindley, J. A., Collins, N. R., & Halliday, N. C. (1984). Continuous Plankton Records: a possible reversal in the downward trend in the abundance of the plankton of the North Sea and the Northeast Atlantic. Journal du Conseil international pour l’Exploration de la Mer, 41, 304–306.
Colebrook, J. M., & Taylor, A. H. (1984). Significant time scales of long-term variability in the plankton and the environment. Rapports et Procès Verbaux du Conseil International pour l’Exploration de la Mer, 183, 20–26.
Coombs, S. H. (1980). Continuous Plankton Records: a plankton atlas of the North Atlantic and North Sea. Supplement 5—young fish 1948–1972. Bulletins of Marine Ecology, 8, 229–281.
Coombs, S. H., & Mitchell, C. E. (1981). Long-term trends in the distribution, abundance and seasonal occurrence of larvae of mackerel (Scomber scombrus L.) around the British Isles 1948–1978. Journal of the Marine Biological Association of the United Kingdom, 61, 343–358.
Cooper, G. A., & Forsyth, D. C. T. (1963). Continuous Plankton Records: contribution towards a plankton atlas of the North Atlantic and the North Sea. Part VII: The seasonal and annual distributions of the pteropod Pneumodermopsis Keferstein. Bulletins of Marine Ecology, 6, 31–38.
Cressie, N. A. C. (1991). Statistics for Spatial Data. New York: John Wiley and Sons.
Diggle, P. J. (1990). Time Series: A Biostatistical Introduction. Oxford: Oxford University Press.
Edwards, M., John, A. W. G., Hunt, H. G., & Lindley, J. A. (1999). Exceptional influx of oceanic species into the North Sea late 1997. Journal of the Marine Biological Association of the United Kingdom, 79, 737–739.
Edwards, M., Reid, P. C., & Planque, B. (2001). Long-term and regional variability of phytoplankton biomass in the Northeast Atlantic (1960–1995). ICES Journal of Marine Science, 58, 39–49.
Fisher, R. A. (1990). Statistical Methods. Experimental Design and Scientific Inference. Oxford: Oxford University Press.
Fromentin, J. M., & Planque, B. (1996). Calanus and environment in the eastern North Atlantic. II. Influences of the NAO on C. finmarchicus and C. helgolandicus. Marine Ecology Progress Series, 134, 111–118.
Hardy, A. C. (1939). Ecological investigations with the Continuous Plankton Recorder: object, plan and methods. Hull Bulletins of Marine Ecology, 1, 1–57.
Hays, G. C. (1994). Zooplankton avoidance activity. Nature, London, 376, 650.
Hays, G. C. (1995). Diel vertical migration behaviour of Calanus hyperboreus at temperate latitudes. Marine Ecology Progress Series, 127, 301–304.
Hays, G. C., Carr, M. R., & Taylor, A. H. (1993). The relationship between Gulf Stream position and copepod abundance derived from the Continuous Plankton Recorder Survey: separating biological signal from sampling noise. Journal of Plankton Research, 15, 1359–1373.
Hays, G. C., & Warner, A. J. (1993). Consistency of towing speed and sampling depth for the Continuous Plankton Recorder. Journal of the Marine Biological Association of the United Kingdom, 73, 967–970.
Hays, G. C., Warner, A. J., & Proctor, A. (1995). Spatio-temporal patterns in the diel vertical migration pattern of the copepod Metridia lucens in the northeast Atlantic derived from the Continuous Plankton Recorder survey. Limnology and Oceanography, 40, 469–475.
Hays, G. C., Warner, A. J., & Lefevre, D. (1996). Long-term changes in the diel vertical migration of zooplankton. Marine Ecology Progress Series, 141, 149–159.
Heath, M. R., & Jónasdóttir, S. H. (1999). Distribution and abundance of overwintering Calanus finmarchicus in the Faroe-Shetland Channel. Fisheries Oceanography, 8(Suppl. 1), 40–60.
Heath, M. R., Backhaus, J. O., Richardson, K., Slagstad, D., Beare, D. J., Dunn, J., Fraser, J. G., Gallego, A., Hainbucher, D., Hay, S., Jónasdóttir, S., Madden, H., Mardaljevic, J., & Schacht (1999a). Climate fluctuations and the spring invasion of the North Sea by Calanus finmarchicus. Fisheries Oceanography, 8(Suppl. 1), 163–176.
Hirst, A. G., & Batten, S. D. (1998). Long-term changes in the diel vertical migration behaviour of Calanus finmarchicus in the North Sea are unrelated to fish predation. Marine Ecology Progress Series, 171, 307–310.
Hunt, H. G. (1968). Continuous Plankton Records: contribution towards a plankton atlas of the north Atlantic and the North Sea. Part XI: The seasonal and annual distributions of Thaliacea. Bulletins of Marine Ecology, 6, 225–249.
Krzanowski, W. J. (1988). Principles of Multivariate Analysis. A User’s Perspective. Oxford: Oxford University Press.
Lindley, J. A. (1987). Continuous Plankton Records: the geographical distribution and seasonal cycles of decapod crustacean larvae and pelagic post-larvae in the northeastern Atlantic Ocean and the North Sea, 1981–3. Journal of the Marine Biological Association of the United Kingdom, 67, 145–167.
Lindley, J. A., Williams, R., & Hunt, H. G. (1993). Anomalous seasonal cycles of decapod crustacean larvae in the North Sea plankton in an abnormally warm year. Journal of Experimental Marine Biology and Ecology, 172, 47–65.
Lindsey, J. K. (1995). Modelling frequency and count data. Oxford: Oxford University Press.
Lucas, C. E. (1940). Ecological investigations with the Continuous Plankton Recorder: the phytoplankton in the Southern North Sea, 1932–37. Hull Bulletins of Marine Ecology, 1, 73–170.
Madden, H., Beare, D. J., Heath, M. R., Fraser, J. G., & Gallego, A. (1999). The spring/early summer distribution of Calanus sp. in the northern North Sea and adjacent areas. Fisheries Oceanography, 8(Suppl. 1), 138–152.
Matthews, J. B. L. (1969). Continuous Plankton Records: the geographical and seasonal distribution of Calanus finmarchicus s.l. in the North Atlantic. Bulletins of Marine Ecology, 6, 251–273.
McCullagh, P., & Nelder, J. A. (1983). Generalized Linear Models. London: Chapman & Hall.
Oceanographic Laboratory, Edinburgh (1973). Continuous Plankton Records: a plankton atlas of the North Atlantic and the North Sea. Bulletins of Marine Ecology, 7, 1–174.
Planque, B., & Batten, S. D. (2000). Calanus finmarchicus in the North Atlantic—The year of Calanus in the context of inter-decadal changes. ICES Journal of Marine Science, 57, 1528–1535.
Planque, B., & Fromentin, J.-M. (1996). Calanus and environment in the eastern North Atlantic. 1. Spatial and temporal patterns of C. finmarchicus and C. helgolandicus. Marine Ecology Progress Series, 134, 101–109.
Planque, B., Hays, G. C., Ibañez, F., & Gamble, J. C. (1997). Large scale spatial variations in the seasonal abundance of Calanus finmarchicus. Deep-Sea Research 1, 44, 315–326.
Planque, B., & Ibañez, F. (1997). Long-term time series in Calanus finmarchicus abundance—a question of space? Oceanologica Acta, 20, 159–164.
Planque, B., & Taylor, A. H. (1998). Long-term changes in zooplankton and the climate of the North Atlantic. ICES Journal of Marine Science, 55, 644–654.
Rae, K. M., & Fraser, J. H. (1941). The Copepoda of the southern North Sea, 1932–37. Hull Bulletins of Marine Ecology, 1, 171–238.
Reid, P. C., Edwards, M., Hunt, H. G., & Warner, A. J. (1998a). Phytoplankton changes in the North Atlantic. Nature, London, 391, 546.
Reid, P. C., Planque, B., & Edwards, M. (1998b). Is observed variability in the long-term results of the Continuous Plankton Recorder survey a response to climate change? Fisheries Oceanography, 7, 282–288.
Reid, P. C., Surey-Gent, S. C., Hunt, H. G., & Durrant, A. E. (1992). Thalassiothrix longissima, a possible oceanic indicator species in the North Sea. ICES Marine Science Symposium, 195, 268–277.
Robinson, G. A. (1965). Continuous Plankton Records: contribution towards a plankton atlas of the North Atlantic and the North Sea. Bulletins of Marine Ecology, 6, 104–122.
Robinson, G. A. (1970). Continuous Plankton Records: variation in the seasonal cycle of phytoplankton in the North Atlantic. Bulletins of Marine Ecology, 6, 333–345.
Robinson, G. A., Aiken, J., & Hunt, H. G. (1986). Synoptic surveys of the western English Channel. The relationship between plankton and hydrography. Journal of the Marine Biological Association of the United Kingdom, 60, 675–680.
Rossi, R. E., Mulla, D. J., Journel, A. G., & Franz, E. H. (1992). Geostatistical tools for modelling and interpreting ecological spatial dependence. Ecological Monographs, 62, 277–314.
Solow, A. R. (1994). Detecting change in the composition of a multispecies community. Biometrics, 50, 556–565.
Taylor, A. H., Colebrook, J. M., Stephens, J. A., & Baker, N. G. (1992). Latitudinal displacements of the Gulf Stream and the abundance of plankton in the north-east Atlantic. Journal of the Marine Biological Association of the United Kingdom, 72, 919–921.
Taylor, A. H., & Stephens, J. A. (1980). Latitudinal displacements of the Gulf Stream (1966 to 1977) and their relation to changes in temperature and zooplankton abundance in the NE Atlantic. Oceanologica Acta, 3, 145–149.
Taylor, A. H., & Stephens, J. A. (1998). The North Atlantic Oscillation and the latitude of the Gulf Stream. Tellus, 50A, 134–142.
Vane, F. R., & Colebrook, J. M. (1962). Continuous Plankton Records: contribution towards a plankton atlas of the north-eastern Atlantic and the North Sea. Part VI: The seasonal and annual distributions of the Gastropoda. Bulletins of Marine Ecology, 5, 247–253.
Warner, A. J., & Hays, G. C. (1994). Sampling by the Continuous Plankton Recorder survey. Progress in Oceanography, 34, 237–256.