Statistical Modeling of Environmental Data in Space and Time

Noel Cressie¹

EXTENDED ABSTRACT

As a concept, the environment is simply the surroundings of an organism or organisms. Space and time scales in environmental investigations can range from the very local to the very global. Some studies attempt to understand physical, chemical, and biological processes by performing controlled experiments in the laboratory. In this paper, we are concerned instead with studies made in the field. These are mostly observational in nature and, hence, even though a large amount of data may be collected and analyzed, one typically can only infer associations rather than causation.

Most environmental studies in the field involve variability over both space and time. The extension of traditional geostatistical methods, such as kriging, to the space-time domain is one possible approach to characterizing the variability of the processes (e.g., Bilonick, 1983; Eynon and Switzer, 1983; Stein, 1986; Le and Petkau, 1988; Sampson and Guttorp, 1992; Host et al., 1995). There are difficult modeling decisions to make in this approach, involving space, time, and space-time interaction components. The space-time variability is characterized by a variogram that often exhibits very different spatial behavior at different points in time, and the class of variogram models that can be fit in this situation is very small indeed.

In the atmospheric sciences, traditional methods for examining space-time processes have focused on Empirical Orthogonal Functions (EOF), Canonical Correlation Analysis (CCA), and Principal Oscillation Patterns (POP); see, for example, von Storch et al. (1995). Although these techniques are visually powerful, they were designed with summarization rather than prediction in mind.
Without the spatial component, there is a large class of time series models that could be used to model the temporal component (e.g., autoregressive error processes). These are dynamic in the sense that they exploit the unidirectional flow of time. Without the temporal component, geostatistical methods could be used to model the spatial component (e.g., intrinsically stationary error processes). These are descriptive in the sense that, although they model spatial correlation, there is no causative interpretation associated with them. When both temporal and spatial components are present, it seems sensible to use models that are a combination of both approaches, namely temporally dynamic and spatially descriptive. That is the new feature of our work and it allows a natural development of the space-time Kalman filter.

¹Noel Cressie is Professor of Statistics and Distinguished Professor in Liberal Arts and Sciences, Iowa State University, Ames, IA 50011.

To give some definiteness to the problem, consider Markov temporal models with spatial colored noise:

$$S_t(s) = a_1 S_{t-1}(s) + \cdots + a_p S_{t-p}(s) + \eta_t(s), \tag{1}$$

where $S_t(s)$ is an (unobserved) value of the state process at location $s$ and time $t$. The observations are actually

$$Z_t(s) = S_t(s) + \epsilon_t(s), \tag{2}$$

which expresses the data as a noisy version of the state process. The goal is to predict $S_{t_0}(s_0)$, where both $t_0$ and $s_0$ may or may not represent space-time coordinates at which data are available. In (1), $(a_1, \ldots, a_p)$ are autoregressive parameters (i.e., parameters of the "temporally dynamic" component) and, most importantly, $\eta_t(\cdot)$ is a spatially colored noise process (i.e., the "spatially descriptive" component). In (2), $\epsilon_t(\cdot)$ is a white-noise process representing measurement error. Data come in the form

$$Z_1, \ldots, Z_T,$$

where it is not essential that all observations are available at each time point and at each spatial location.
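To make the structure of the model concrete, the following is a minimal simulation sketch, not the author's code. It assumes the simplest case p = 1, so that the state evolves as S_t(s) = a S_{t-1}(s) + eta_t(s) and is observed as Z_t(s) = S_t(s) + eps_t(s). The names `exp_cov` and `simulate`, the one-dimensional locations, the exponential covariance for the colored noise (a second-order stationary choice, narrower than the intrinsically stationary class mentioned in the text), and all parameter values are illustrative assumptions.

```python
import numpy as np


def exp_cov(coords, sill=1.0, range_=2.0):
    """Exponential covariance between all pairs of 1-D locations (assumed form)."""
    dist = np.abs(coords[:, None] - coords[None, :])
    return sill * np.exp(-dist / range_)


def simulate(coords, a=0.8, n_times=50, noise_sd=0.3, p_missing=0.2, seed=0):
    """Simulate the state S and noisy, partly missing data Z for p = 1."""
    rng = np.random.default_rng(seed)
    n = len(coords)
    # Cholesky factor turns independent normals into spatially colored noise.
    chol = np.linalg.cholesky(exp_cov(coords) + 1e-10 * np.eye(n))
    S = np.zeros((n_times, n))
    for t in range(1, n_times):
        eta = chol @ rng.standard_normal(n)  # spatially descriptive component
        S[t] = a * S[t - 1] + eta            # temporally dynamic component
    # White-noise measurement error added to the state.
    Z = S + noise_sd * rng.standard_normal(S.shape)
    # Not every location is observed at every time: mark gaps with NaN.
    observed = rng.random(S.shape) > p_missing
    Z = np.where(observed, Z, np.nan)
    return S, Z


if __name__ == "__main__":
    coords = np.linspace(0.0, 10.0, 6)
    S, Z = simulate(coords)
    print(S.shape, int(np.isnan(Z).sum()), "missing observations")
```

Marking missing observations with NaN is only one convenient convention; it mirrors the remark that data need not be available at every time point and location.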
The optimal predictor of $S_{t_0}(s_0)$ is

$$E\bigl(S_{t_0}(s_0) \mid Z_1, \ldots, Z_T\bigr),$$

with mean-squared prediction error

$$E\Bigl(S_{t_0}(s_0) - E\bigl(S_{t_0}(s_0) \mid Z_1, \ldots, Z_T\bigr)\Bigr)^2.$$

Both these quantities can be calculated recursively, using what could be called a space-time Kalman filter (Huang and Cressie, 1996). The state-space model in (1) is very attractive because it features the dynamic aspect through an autoregressive structure but builds in space-time interaction through the error process $\eta_t$, which is, at any point in time, a spatially correlated (e.g., intrinsically stationary) process.

Notice that $S_t(s)$ is influenced directly by past values only at location $s$. In reality, spatio-temporal processes are likely to be more complicated and to depend also on past values at locations $u$ near $s$. Thus, I shall investigate the spatio-temporal climate model,

$$S_t(s) = a \int \omega_s(u)\, S_{t-1}(u)\, du + \eta_t(s), \tag{3}$$

where, for identifiability, the coefficients $\omega_s(u)$ satisfy $\int \omega_s(u)\, du = 1$. Development of the spatio-temporal Kalman filter allows prediction of $S_{t_0}(s_0)$ based on data $Z_1, \ldots, Z_T$. The two challenging aspects of this research are to derive the expressions for the optimal predictor and the optimal mean-squared prediction error, and to obtain efficient estimators of the parameters $a$, $\mathrm{var}(\epsilon_t(s))$, and $\mathrm{cov}(\eta_t(s), \eta_t(u))$. These are then substituted into the optimal prediction equations. (This research is in progress with Ph.D. student Christopher K. Wikle.)

An alternative to the estimation of parameters is to put (prior) distributions on them. In this case, there is a good physical reason to do this. The parameters almost certainly vary from year to year and from region to region; this extra variation can quite simply be handled by replacing $a$ in (3) with $\{a_1, \ldots, a_{t-1}, a_t, \ldots\}$ and assuming them to be distributed according to some prior distribution. The goal is still to obtain $E(S_{t_0}(s_0) \mid Z_1, \ldots, Z_T)$ and its mean-squared prediction error or, more generally, the posterior distribution of $S_{t_0}(s_0)$ given $Z_1, \ldots, Z_T$.
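The recursive calculation of the optimal predictor and its mean-squared prediction error can be illustrated with a standard Kalman filter on a finite set of locations. The sketch below is an assumption-laden illustration, not the algorithm of Huang and Cressie (1996): it takes p = 1, treats the autoregressive parameter `a`, the spatial noise covariance `Q`, and the measurement-error variance `sigma2` as known, and covers only the filtering case in which $t_0$ is the current time. The function name `space_time_kalman_filter` and its arguments are hypothetical. A prediction location without data is handled by including it in the state vector and marking its observations as NaN.

```python
import numpy as np


def space_time_kalman_filter(Z, a, Q, sigma2, m0=None, P0=None):
    """Filter Z (n_times x n_locations, NaN = missing) for the p = 1 model.

    Returns the filtered means E(S_t | Z_1..Z_t) and the diagonal of the
    filtered covariance, i.e. the mean-squared prediction errors.
    """
    n_times, n = Z.shape
    m = np.zeros(n) if m0 is None else np.asarray(m0, dtype=float).copy()
    # Assumed default prior covariance at time 0: the noise covariance Q.
    P = np.array(Q, dtype=float) if P0 is None else np.asarray(P0, dtype=float).copy()
    means = np.empty((n_times, n))
    mspe = np.empty((n_times, n))
    for t in range(n_times):
        # Prediction step: propagate through S_t = a * S_{t-1} + eta_t.
        m = a * m
        P = a**2 * P + Q
        obs = ~np.isnan(Z[t])
        if obs.any():
            # Update step using only the locations observed at time t.
            innov_cov = P[np.ix_(obs, obs)] + sigma2 * np.eye(int(obs.sum()))
            gain = np.linalg.solve(innov_cov, P[obs, :]).T  # n x n_obs
            m = m + gain @ (Z[t, obs] - m[obs])
            P = P - gain @ P[obs, :]
        means[t] = m
        mspe[t] = np.diag(P)
    return means, mspe


if __name__ == "__main__":
    coords = np.array([0.0, 1.0, 2.0, 3.0])
    Q = np.exp(-np.abs(coords[:, None] - coords[None, :]) / 2.0)
    rng = np.random.default_rng(0)
    Z = rng.standard_normal((10, 4))
    Z[:, 2] = np.nan  # location index 2 plays the role of s_0: never observed
    means, mspe = space_time_kalman_filter(Z, a=0.8, Q=Q, sigma2=0.1)
    print(means[-1, 2], mspe[-1, 2])
```

Because the noise is spatially correlated, the never-observed location still borrows strength from its neighbors through the off-diagonal entries of `Q`, which is the sense in which the model is spatially descriptive. In practice the parameters would have to be estimated, or given prior distributions, rather than assumed known.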
While such posterior calculations were daunting five years ago, Markov chain Monte Carlo (e.g., Bernardo and Smith, 1994, Section 5.5.5) can be invoked to handle the problem.

REFERENCES

Bernardo, J.M. and Smith, A.F.M. (1994). Bayesian Theory. Wiley, Chichester.

Bilonick, R.A. (1983). Risk qualified maps of hydrogen ion concentration for the New York state area for 1966-1978. Atmospheric Environment, 17, 2513-2524.

Eynon, B.P. and Switzer, P. (1983). The variability of rainfall acidity. Canadian Journal of Statistics, 11, 11-24.

Host, G., Omre, H., and Switzer, P. (1995). Spatial interpolation errors for monitoring data. Journal of the American Statistical Association, 90, 853-861.

Huang, H.C. and Cressie, N. (1996). Spatio-temporal prediction of snow water equivalent using the Kalman filter. Computational Statistics and Data Analysis, in press.

Le, D.N. and Petkau, A.J. (1988). The variability of rainfall acidity revisited. Canadian Journal of Statistics, 16, 15-38.

Sampson, P.D. and Guttorp, P. (1992). Nonparametric estimation of nonstationary spatial covariance structure. Journal of the American Statistical Association, 87, 108-119.

Stein, M.L. (1986). A simple model for spatial temporal processes. Water Resources Research, 22, 2107-2110.

von Storch, H., Burger, G., Schnur, R., and von Storch, J.S. (1995). Principal oscillation patterns: A review. Journal of Climate, 8, 377-400.