Analysis of Covariance with Spatially Correlated Secondary Variables

Revista Colombiana de Estadística Junio 2008, volumen 31, no. 1, pp. 95 a 109 Analysis of Covariance with Spatially Correlated Secondary Variables Análisis de covarianzas con variables secundarias correlacionadas espacialmente Tisha Hooks1,a , David Marx2,b , Stephen Kachman2,c , Jeffrey Pedersen3,d , Roger Eigenberg3,e 1 Department of Mathematics and Statistics, Winona State University, Winona, United States 2 Department 3 Department of Statistics, University of Nebraska, Lincoln, United States of Agronomy and Horticulture, USDA-ARS Research, University of Nebraska, Lincoln, United States 4 USDA-ARS U.S. Meat Animal Research Center, Clay Center, United States Abstract Advances in precision agriculture allow researchers to capture data more frequently and in more detail. For example, it is typical to collect “on-the-go” data such as soil electrical conductivity readings. This creates the opportunity to use these measurements as covariates for the primary response variable to possibly increase experimental precision. Moreover, these measurements are also spatially referenced to one another, creating the need for methods in which spatial locations play an explicit role in the analysis of the data. Data sets which contain measurements on a spatially referenced response and covariate are analyzed using either cokriging or spatial analysis of covariance. While cokriging accounts for the correlation structure of the covariate, it is purely a predictive tool. Alternatively, spatial analysis of covariance allows for parameter estimation yet disregards the correlation structure of the covariate. A method is proposed which both accounts for the correlation in and between the response and covariate and allows for the estimation of model parameters; also, this method allows for analysis of covariance when the response and covariate are not colocated. Key words: Covariance Analysis, Spatial Analysis, Cokriging, Covariate. a Assistant Professor. E-mail: THooks@winona.edu E-mail: DMarx1@unl.edu c Professors. E-mail: SKachman1@unl.edu d Geneticist and Professor. E-mail: JPedersen1@unl.edu e Researcher. E-mail: REigenberg2@unl.edu b Professors. 95 96 Tisha Hooks, et al. Resumen Los avances en agricultura de precisión permiten a los investigadores obtener datos con más frecuencia y en detalle. Por ejemplo, es común colectar “en el transcurso” datos como lecturas de electro-conductividad del suelo. Esto crea la oportunidad de usar estas medidas como covariables para incrementar la precisión experimental de la variable de respuesta. Aún más, estas medidas están espacialmente relacionadas entre sí, creando la necesidad de métodos en los cuales la ubicación espacial representa un papel explícito en el análisis de los datos. Se analizan conjuntos de datos que contienen variables de respuesta y covariables espacialmente relacionadas, usando el método cokriging o el análisis espacial de covarianza. Aunque el método cokriging usa la estructura de correlación de la covariable, es una herramienta puramente predictiva. Alternativamente, el análisis espacial de covarianza permite la estimación de parámetros pero sin tener en cuenta la estructura de correlación de la covariable. El presente artículo propone un método que tiene en cuenta la correlación en la covariable, así como la correlación entre la covariable y la variable de respuesta, permitiendo la estimación de los parámetros del modelo. De la misma manera, este método permite el análisis espacial de covarianza cuando la variable de respuesta y la covariable no están colocalizadas. Palabras clave: análisis de covarianzas, covarianza espacial, cokriging, covarianza. 1. Introduction With recent advances in precision agriculture, researchers are now able to capture data more frequently and in more detail. For example, it is typical to collect “on-the-go" data such as soil electrical conductivity readings. This creates the opportunity to use these measurements as covariates for the primary response variable to possibly increase experimental precision. Moreover, these measurements are also spatially referenced to one another, creating the need for methods in which spatial locations play an explicit role in the analysis of the data. These covariates are usually measured before the treatment is applied and hence the assumption that the covariate is not affected by the treatment is met. A standard framework for the analysis of spatial data considers a response variable, Y (s), which is in principle obtainable at any location, s, within a twodimensional spatial region R. Let these spatial locations be indexed by site si . Data values ysi = yi are obtained from locations si , i = 1, 2, . . . , n, and are assumed to follow the model Y (si ) = µ(si ) + e(si ), i = 1, 2, . . . , n where Y (si ) = Yi is the response variable, µ(si ) is the deterministic trend, and e(si ) is a stochastic error term with some spatial covariance structure (Dubin 1988). Applications of this model fall into two broad categories: spatial prediction problems and estimation problems. In spatial prediction problems, the objective is to predict the value of the response variable at some arbitrary location, s0 , Revista Colombiana de Estadística 31 (2008) 95–109 Analysis of Covariance with Spatially Correlated Secondary Variables 97 given the data y = (y1 , y2 , . . . , yn ). In estimation problems, interest centers on estimating model parameters such as treatment effects. A thorough discussion of methods applicable to these types of problems can be found in texts such as Journel & Huijbregts (1978), Isaaks & Srivastava (1989), Cressie (1993), and Goovaerts (1997). Often, data collected on the response variable are supplemented by additional information collected on the covariates. The model given above can be extended to include the covariates. In spatial prediction problems, the joint data are analyzed using the cokriging approach (Goovaerts 1997). In this case, the model can be expressed as Y1 (s1i ) = µ1 (s1i ) + e1 (s1i ), i = 1, 2, . . . , n1 Y2 (s2i ) = µ2 (s2i ) + e2 (s2i ), i = 1, 2, . . . , n2 where Y1 (s1i ) is the response variable, Y2 (s2i ) is the covariate, µ1 (s1i ) and µ2 (s2i ) represent deterministic trend, and e1 (s1i ) and e2 (s2i ) are random error terms with some spatial covariance structure. This approach is most advantageous when Y2 is more densely sampled than the response variable. Cokriging explicitly accounts for spatial cross-correlation between the response and secondary variable in that e1 (s1i ) and e2 (s2i ) have a spatial correlation structure and a cross-correlation between them. Incorporated into the procedure is the usual restriction that the covariance matrix associated with the cross-covariogram be positive definite (Isaaks & Srivastava 1989). Ordinary cokriging then predicts Y1 (s0 ) using information from the data Y1 (s1i ) and the covariate Y2 (s2i ). Universal cokriging allows for a trend or large scale structure in the prediction equations under the assumption of coregionalization. However, in order for the covariate to have any influence on the estimation of the large scale structure parameters, a restriction on the parameter space that jointly involve parameters in the response and covariate variables must be made (Helterbrand & Cressie 1994). Making such a restriction, such as assuming the mean of the response variable and covariate are equal, it is often too restrictive to make universal cokriging of any practical use for estimation. Universal cokriging can also be used as a regression procedure (Stein & Corsten 1991). While the cokriging approach is an extremely useful geostatistical tool, it is used only in prediction problems and does not easily allow for the estimation of treatment effects. In spatial estimation problems where interest centers on estimating model parameters or testing for treatment differences, data consisting of measurements on a response variable and a covariate are usually analyzed using a spatial analysis of covariance. Analysis of covariance methods use information about the response variable that is contained in another variable in order to improve estimation, and a detailed description of this tool can be found in Searle (1971) and Milliken & Johnson (2002). The basic analysis of covariance model is as follows: yij = αi + βi xij + eij where i = 1, 2, . . . , t, j = 1, 2, . . . , ni , the mean of yi for a given value of X is αi + βi X and eij ∼ i.i.d. N (0, σ 2 ). Note that this model has t intercepts and t slopes Revista Colombiana de Estadística 31 (2008) 95–109 98 Tisha Hooks, et al. and thus represents a collection of simple linear regression models with a different model for each level of the treatment. If equal slopes are assumed, the model used to describe the mean of y as a function of the covariate is yij = αi + βxij + eij where eij ∼ i.i.d. N (0, σ 2 ) (Milliken & Johnson 2002). If the assumption on the error term is relaxed and eij is alternatively assumed to have some spatial covariance structure, the model becomes a spatial analysis of covariance model (Cressie 1993, Zimmerman & Harville 1991). In this case, the analysis uses information from both the response variable and the covariate in order to obtain more accurate parameter estimates. Dubin (1988) proposed this approach for spatially autocorrelated error terms. However, this method accounts for only the spatial correlation that exists in the response variable. If the covariate has a spatial correlation structure and a cross-correlation with the response variable, the analysis does not take these spatial correlation structures into account. Consider the case where the primary attribute of interest (response variable) and a secondary variable (covariate) possess some spatial structure, and assume interest lies in estimating treatment effects. For example, consider the situation and an agronomic trial is conducted where the field fertility contains spatial structure. The experimental design of the study can be any appropriate classical design, but the appropriate analysis will include a spatial component (Marx & Stroup 1993). If soil testing of the area has been conducted recently, then the soil fertility can be more accurately estimated using these measurements. There will generally be fewer soil sample locations than plots and these will not be colocated with the centers of the plots. A hypothetical schematic is given in Figure 1, where there is a 6 × 6 arrangement of plots with 20 gridded soil sample locations throughout the field. In another example, a recent study conducted at the U.S. Meat Animal Research Center in Clay Center, Nebraska, compared a cover crop treatment and a no-cover crop treatment for values of ammonia nitrogen. Also, soil electrical conductivity readings were used as a covariate. In this situation, the response variable and a covariate possessed a spatial structure. These spatial correlations and the cross-correlation structure that exists between them need to be included in the analysis as it is in the cokriging approach, but parameter effects need to be estimated as in spatial analysis of covariance. The objective of this work is to develop a model that accounts for the correlation in and between the response and covariate and allows for the estimation of model parameters. In the following pages, the proposed model and methodology are described, the analysis of a simulated data set in which the response and covariate are colocated is presented (colocated data occur when the response variable and the covariate are measured at the same geographic coordinates), the methodology is applied to a simulated data set in which the data are not colocated; and the methodology is used to analyze a soils data set obtained from the U.S. Meat Animal Research Center. Revista Colombiana de Estadística 31 (2008) 95–109 Analysis of Covariance with Spatially Correlated Secondary Variables 99 Figure 1: Hypothetical agronomic field trial where additional information is available through soil test samples. 2. Model and Methods For colocated data, the model can be expressed as y = Xτ + βu + ey (1) u = 1µ + eu where y is an n × 1 vector containing the measurements of the observed y(si )’s at sites in Syu , u is an n × 1 vector containing the covariate observations u(si )’s at sites in Syu , X is an n × p design matrix for treatment effects, τ is a p × 1 vector of treatment effects, β is the regression coefficient for the covariate, 1 is an n × 1 vector of 1’s, and µ represents the mean of the covariate. Also, the model assumes ey ∼ N (0, σy2 R) and eu ∼ N (0, σu2 R) where R represents a spatial correlation structure (e.g., spherical, gaussian, or exponential)(Cressie 1993). It can be seen that E(y) = E(Xτ + βu + ey ) = Xτ + βE(u) = Xτ + β1µ E(u) = E(1µ + eu ) = E(1µ) = 1µ Also, since the term βu in equation (1) can contain any correlation between y and u, it can be assumed that Cov(ey , eu ) = 0. Then, V ar(y) = V ar(Xτ + βu + ey ) = β 2 V ar(u) + V ar(ey ) = β 2 V ar(eu ) + V ar(ey ) V ar(u) = V ar(1µ + eu ) = V ar(eu ) Cov(y, u) = Cov(Xτ + βu + ey , 1µ + eu ) = Cov(Xτ + β(1µ + eu ) + ey , 1µ + eu ) = Cov(βeu + ey , eu ) = Cov(βeu , eu ) + Cov(ey , eu ) = βV ar(eu ) Revista Colombiana de Estadística 31 (2008) 95–109 100 Tisha Hooks, et al. Finally, the model assumptions can be written as y ∼N u Σ Xτ + β1µ , yy Σuy 1µ Σyu Σuu ! where Σyy = β 2 Σuu + Σyy = (β 2 σu2 + σy2 )R Σuu = V ar(eu ) = σu2 R Σyu = βΣuu = βσu2 R thus, y ∼N u 2 2 (β σu + σy2 )R Xτ + β1µ , βσu2 R 1µ ! βσu2 R σu2 R (2) For the univariate case, the variance of the response variable is defined as κ2y = V ar(yi ) = β 2 σu2 + σy2 (3) and the variance for the covariate is defined as κ2u = V ar(ui ) = σu2 (4) κyu = Cov(yi , ui ) = βσu2 (5) Also, Let ρ represent the correlation between the covariate and response variable. The relationship between ρ and β can be described as follows: κyu κyu = 2 σu2 κu Cov(yi , ui ) κyu =⇒ κyu = ρκy κu ρ= p = κy κu V ar(yi )V ar(ui ) κyu = βσu2 =⇒ β = thus, ρκy κu ρκy = 2 κu κu ρ̂κˆy β̂ = κˆu β= (6) Note that κy is also the square root of the sill for the response and κu is the square root of the sill for the covariate. The covariance matrix in (2) can easily be reparameterized to have just one error term. Since, in our situation, we have two variables, we chose to have two error terms rather than just one and the other error term then being a proportionality constant times the first error term. This is in agreement with Oliver (2003) (equations (3), (4), and (5)). Additionally note that Revista Colombiana de Estadística 31 (2008) 95–109 Analysis of Covariance with Spatially Correlated Secondary Variables 101 Oliver constructs the cosimulation by using independent Gaussian random variables which again implies that the assumption of zero covariance is made without loss of generality. This model notation can be extended to non-colocated data (Banerjee et al. S 2004). Let y be the vector of observed y(si )’s at S the sites in Syu Sy and u be the vector of observed u(sj )’s at the sites in Syu Su . Let y ′ denote the vector of missing y observations in Su and u′ denote the vector of missing u observations in Sy . Then, the vectors for the response and covariate observations from the previous discussion can be replaced with the augmented vectors yaug = (y, y ′ ) and uaug = (u, u′ ). After permutation to line up the y’s and u’s, they can be collected y yaug of equation (2). which is analogous to the vector into a vector uaug u A program which models the covariance structure in equation (2) was written in R (R Development Core Team 2004). The program uses generalized least squares and yields restricted maximum likelihood (REML) estimates of the range, the sills for the response variable and for the covariate, treatment effects, and the correlation between the response and the covariate. Also, the results give the F value for testing for an overall difference in treatment effects, and an approximate z-test is constructed using the asymptotic variance of ρ to test whether or not ρ (and equivalently β) is significantly different from zero. The denominator degrees of freedom for the F -test can be found by subtracting the number of fixed effects from the number of observations on the response variable. Finally, an estimate for β can be found using the relationship between β and ρ given in equation (6). The program, simulated data sets, and soils data set may all be obtained from the first author. 3. Example Using Simulated Colocated Data The simulated data set consisted of 20 replications of five treatments. The treatments were laid out in a completely randomized design on a 10 × 10 arrangement of plots. For each of the 100 points, both a spatial floor (Y ) and a spatial covariate (X) were generated using the method of gaussian cosimulation (Oliver 2003). In this example, the spherical covariance function was used for the construction of both variables. The function is as follows: ( σ 2 {1 − 32 ( ha ) + 12 ( ha )3 }, if 0 ≤ h ≤ a; C(h) = 0, if h > a. where h is the distance between observations, a is the range of the corresponding spherical semivariogram, and σ 2 is the sill of the semivariogram. The response variable was simulated with a range of 8 and a sill of 10, and the covariate was simulated with a range of 8 and a sill of 1. Two different correlations of ρ = 0.3 and 0.8 were simulated when modeling the cross-covariance between the spatial floor and the covariate. Finally, the mean of both the response variable and the Revista Colombiana de Estadística 31 (2008) 95–109 102 Tisha Hooks, et al. covariate was simulated to be 10, and treatment effects were generated with the treatment vector τ = (−0.4, −0.2, 0, 0.2, 0.4). The data were analyzed in three ways: using a nonspatial analysis of covariance, a spatial analysis of covariance, and the proposed analysis which accounts for the spatial structure of both the response and the covariate. The results are summarized in Tables 1-3. The spatial analyses yield more accurate estimates than the analysis which disregards the location of the observations. As displayed in Tables 1 and 2, the estimate for β is much closer to the simulated value and the average standard error of treatment differences is much smaller for the spatial analyses. Also, the estimates for σy2 and for the regression coefficient for the covariate are very close for the proposed method and spatial analysis of covariance. While the proposed method provides the most accurate representation of the treatment effects as seen in Table 2, it is observed that the average standard error of treatment differences is very close to that obtained from a spatial analysis of covariance. Table 3 shows the results of hypothesis tests for treatment differences and significance of the covariate for all three analyses. Overall, the results from the proposed method are very close to the results obtained from the spatial analysis of covariance, and it appears that little is gained when accounting for the spatial structure of the covariate when the response and covariate are colocated. The same results occur for ρ = 0.3 as seen in Table 4. In addition to the proposed method giving a slightly smaller average standard error of treatment differences, the least square means were slightly closer to the simulated data means (not shown). Thus, one may choose to use a simple spatial analysis of covariance to test for treatment differences since the precision that is gained via use of the proposed method may not be worth the extra effort that is required. Table 1: Parameter estimates from the three analyses conducted in the example using simulated colocated data with ρ = 0.8. Parameter of Interest Range sill for response sill for covariate β Analysis of Covariance – – – 3.37 Spatial Analysis of Covariance 8.74 4.31∗ – 2.33 Proposed Method 8.61 9.73 1.02 2.33 Simulated Data 8.00 10.00 1.00 2.53 ∗ Note that 4.31 actually represents σ̂ 2 , whereas the sill for the response is κ2 . y y For the proposed method, σ̂y2 is calculated to be 4.20 using equation (3). The simulated σy2 is 3.60. However, this is not to imply that the proposed method is not extremely useful. The true strength of this analysis is more than simply accounting for the spatial structure of the covariate. Its power lies in the fact that this model allows for covariate measurements that are not colocated with measurements of the primary attribute of interest. An example which considers a data set in which the covariate is measured at locations different from the response variable is presented in the next section. Revista Colombiana de Estadística 31 (2008) 95–109 103 Analysis of Covariance with Spatially Correlated Secondary Variables Table 2: Least squares means for treatments and average standard errors of treatment differences from the three analyses conducted in the example using simulated colocated data with ρ = 0.8. LSMEAN for treatment 1 LSMEAN for treatment 2 LSMEAN for treatment 3 LSMEAN for treatment 4 LSMEAN for treatment 5 Average standard error of treatment differences Analysis of Covariance Spatial Analysis of Covariance Proposed Method Simulated Data 10.860 10.150 9.980 9.6 10.670 10.290 10.110 9.8 10.930 10.460 10.280 10.0 11.580 10.780 10.610 10.2 11.560 10.720 10.540 10.4 0.478 0.260 0.255 – Table 3: Results of hypothesis tests for treatment differences and significance of the covariate from the three analyses conducted in the example using simulated colocated data with ρ = 0.8. test for treatment differences test for significance of covariate Analysis of Covariance F = 1.54 (p -value = 0.1976) F = 327.90 (p -value < 0.0001) Spatial Analysis of Covariance F = 1.75 (p -value = 0.1448) F = 119.57 (p -value < 0.0001) Proposed Method F = 1.20 (p -value = 0.3174) z = 3.668 (p -value < 0.0001) Table 4: Parameter estimates from the three analyses conducted in the example using simulated colocated data with ρ = 0.3. Parameter of Interest Range sill for response sill for covariate β Average standard error of treatment differences Analysis of Covariance – – – 1.220 0.778 Spatial Analysis of Covariance 7.220 6.320∗ – 0.800 0.333 Proposed Method 10.05 9.40 0.90 0.83 0.328 Simulated Data 8.00 10.00 1.00 0.95 – ∗ Note that 6.32 actually represents σ by2 whereas the sill for the response is κ2y . 4. Example Using Simulated Non-Colocated Data The simulated experiment consisted of twenty replications of five treatments. The treatments were laid out in a completely randomized design on a 10 × 10 arrangement of plots. Within each plot, another 3×3 grid of points was constructed. For each of the 900 points, a spatial floor (Y ) and a spatial covariate (X) were generated using the method of gaussian cosimulation (Oliver 2003). Revista Colombiana de Estadística 31 (2008) 95–109 104 Tisha Hooks, et al. The spherical covariance function was used for the construction of both variables. The response variable was simulated with a range of 25 and a sill of 5, and the covariate was simulated with a range of 25 and a sill of 1. Correlations of ρ = 0.8 and 0.3 were simulated when modeling the cross-covariance between the spatial floor and the covariate. Finally, the mean of the response variable and the covariate was simulated to be 10, and treatment effects were generated with the treatment vector τ = (−0.5, −0.25, 0, 0.25, 0.5). These treatment effects correspond to treatments A, B, C, D, and E, respectively. The final data set was constructed as follows. First, the center observation from each of the 100 plots was used as the response variable. Then, 25 of the 100 remaining responses were randomly chosen and deleted from the data set. Finally, 90% of the 900 covariate observations were randomly selected and deleted. A representation of this simulated data set is given in Figure 2. For each plot, the treatment (A, B, C, D, or E) is identified, and the black squares represent the location of the sampled covariates. Clearly, the response and the covariate are not colocated in this example, and there are more covariate observations than measurements on the response. Figure 2: Simulated data set in which the response and covariate are not colocated. Treatments A-E for each plot and the locations of the sampled covariates are identified. The data were first analyzed using the proposed analysis which accounts for the spatial structure of both the response and the covariate. In order to compare this to a spatial analysis of covariance, a data set in which the response and covariate were colocated was constructed by using the covariate measurement which was closest to the central observation of each plot as the covariate for that plot. If two covariate measurements were equally distant from the center of the plot, the covariate was calculated as their average. A spatial analysis of covariance was then conducted using this data set. The results from both analyses are summarized in Tables 5-7. Revista Colombiana de Estadística 31 (2008) 95–109 Analysis of Covariance with Spatially Correlated Secondary Variables 105 Table 5: Parameter estimates from the three analyses conducted in the example using simulated colocated data with ρ = 0.8. Parameter of Interest Range sill for response sill for covariate β Spatial Analysis of Covariance 27.43 4.75∗ – 0.83 Proposed Method 27.35 5.16 1.02 2.02 Simulated Data 25.00 5.00 1.00 1.79 ∗ Note that 4.75 actually represents σ by2 , whereas the sill for the response is κ2y . The simulated σy2 is calculated to be 1.80 using equation (3). Table 6: Least squares means for treatments and average standard errors of treatment differences from the three analyses conducted in the example using simulated colocated data with ρ = 0.8. LSMEAN for treatment 1 LSMEAN for treatment 2 LSMEAN for treatment 3 LSMEAN for treatment 4 LSMEAN for treatment 5 Average standard error of treatment differences Spatial Analysis of Covariance Proposed Method Simulated Data 10.000 9.540 9.50 9.930 9.360 9.75 10.460 10.200 10.00 10.560 10.340 10.25 10.680 10.270 10.50 0.304 0.241 – Table 7: Results of hypothesis tests for treatment differences and significance of the covariate from the three analyses conducted in the example using simulated colocated data with ρ = 0.8. Test for treatment differences Test for significance of covariate Spatial Analysis of Covariance F = 2.10 (p -value = 0.0909) F = 10.21 (p -value = 0.0021) Proposed Method F = 6.06 (p -value = 0.0003) z = 1.9580 (p -value = 0.0502) As seen in Table 5, the proposed analysis yields a much more accurate estimate of β than the spatial analysis of covariance. Moreover, the estimate of the sill for the response variable (κ2y ) from the proposed analysis is very close to the simulated value, but the estimate of σy2 from the spatial analysis of covariance is considerably different from the simulated value. Also, as illustrated in Table 6, the proposed method provides more accurate representations of most of the treatment effects, and the average standard error of treatment differences provides a 20% improvement over the average standard error of treatment differences from the Revista Colombiana de Estadística 31 (2008) 95–109 106 Tisha Hooks, et al. Table 8: Parameter estimates from the three analyses conducted in the example using simulated colocated data with ρ = 0.3. Parameter of Interest Range sill for response sill for covariate β Average standard error of treatment differences Spatial Analysis of Covariance 20.520 4.200∗ – 0.100 Proposed Method 20.330 5.120 0.760 0.520 0.335 0.329 Simulated Data 25.00 5.00 1.00 0.67 – ∗ Note that 4.20 actually represents σ by2 whereas the sill for the response is κ2y . spatial analysis of covariance. The F -statistic is larger for the proposed analysis (Table 7), but there is no reason to expect that the F -statistics be comparable since the proposed method uses all covariate points and the spatial analysis of covariance uses a subset of the covariates. Moreover, some covariate points are used twice in the spatial analysis of covariance. A relatively large correlation between the response variable and covariate was chosen so that the analysis of covariance would be very effective. To see if the results held for a smaller correlation, a correlation of ρ = 0.3 was simulated using the same parameters (range, sill for response and covariate, treatment means) as were used for the ρ = 0.8 simulation. These results, shown in Table 8, indicate that the same trend as seen for ρ = 0.8 continue, but with a lesser effect. The average standard error of the treatment differences was reduced slightly using the proposed method compared to spatial analysis of covariance and the least square means were slightly closer to the simulated means using the proposed method (data not shown). Our conclusion is that the stronger the association between the response variable and covariate the greater the improvement by using the proposed method. The true strength of the proposed method is that even the spatial analysis of covariance should not be run except when the data are colocated. Our proposed method is the only procedure which allows for non-colocated data and tests of treatment effects. 5. Example Using Actual Field Data These data were obtained from a study site located at the U.S. Meat Animal Research Center located in Clay Center, Nebraska. The site was treated with four replications of a winter wheat cover crop versus four replications of a no-cover crop. The treatments were laid out in a randomized complete block design with subsampling. The response variable consisted of ammonia nitrogen levels obtained from soil cores. Also, soil electrical conductivity measurements taken prior to treatment application were used as a covariate, and the geographical coordinates (northing and easting) were recorded at each measured location. A representation of this data set is given in Figure 3. Revista Colombiana de Estadística 31 (2008) 95–109 Analysis of Covariance with Spatially Correlated Secondary Variables 107 Figure 3: Soils data set in which the response and covariate are not colocated. Locations where soil cores were taken for the response are marked by diamonds, and locations of the sampled covariates are also identified. The proposed method was used for the analysis since the response variable and covariate are not colocated. The data were analyzed as a randomized complete block design with subsampling and fixed block effects. A spherical covariance structure was assumed, and the results are summarized in Tables 8-10. All parameter estimates in Table 9 are reasonable. Table 10 shows the least squares means of the two treatments and the standard error of their difference. Finally, as shown in Table 11, there does not appear to be a difference in values of ammonia nitrogen for cover crop treatment versus a no-cover crop treatment. Moreover, the covariate (soil electrical conductivity) is not significant. Table 9: Parameter estimates from the soils data example. Parameter of Interest Range sill for response sill for covariate β Estimate from Proposed Method 12.0600 0.9800 43.5100 −0.0350 Revista Colombiana de Estadística 31 (2008) 95–109 108 Tisha Hooks, et al. Table 10: Least squares means for treatments and average standard errors of treatment differences from the analysis of the soils data. Estimate from Proposed Method LSMEAN for treatment 1 LSMEAN for treatment 2 Standard error of treatment difference 3.9900 3.8800 0.2032 Table 11: Results of hypothesis tests for treatment differences and significance of the covariate from the analysis conducted in the soils data example. Test for treatment differences Test for significance of covariate F = 0.55 (p -value = 0.5125) z = −0.75 (p -value = 0.4532) 6. Discussion and Conclusion First of all, it should be noted that the proposed method and analysis make the usual intrinsic coregionalization assumption of equal ranges for the response variable and the covariate. The model could be altered to allow for this assumption to be relaxed, but the current framework assumes the correlation matrices are proportional to each other. Also, the results from the proposed method and the spatial analysis of covariance are very similar when the data are colocated. It appears that little is gained by accounting for the correlation structure of the covariate, and if the response variable and the covariate are colocated, the authors recommend analyzing the data using a simple spatial analysis of covariance. In fact, when data are colocated, the results are a generalized form of the error-in-variables problem. However, the true power of this analysis lies in the fact that the model allows for analysis of covariance when data are not colocated. Using this method, it is not essential that the covariate and response be measured at the same location or to have the same number of observations for the response and covariate. Although a data set in which the response and covariate are not colocated can be manipulated so that a spatial analysis of covariance is appropriate, more than one choice for a covariate exists. For example, the covariate could have been constructed by averaging all covariate measurements within a plot, had the data allowed for this. The construction of the covariate is arbitrary and has not statistical validation associated with it; thus, using the proposed method is superior to the alternative. The spatial locations of the covariates play a role in the proposed analysis, and this leads to more precision in testing for treatment differences. Lastly, and perhaps most importantly, experiments where the covariate is available at fewer locations than Revista Colombiana de Estadística 31 (2008) 95–109 109 Analysis of Covariance with Spatially Correlated Secondary Variables the primary variable still allows this supplementary information to be incorporated into the analysis. Recibido: noviembre de 2007 — Aceptado: mayo de 2008 ˆ ˜ References Banerjee, S., Carlin, B. P. & Gelfand, A. E. (2004), Hierarchical Modeling and Analysis for Spatial Data, Chapman and Hall/CRC Press. Cressie, N. A. C. (1993), Statistics for Spatial Data, John Wiley and Sons, Inc., New York, United States. Dubin, R. (1988), ‘Spatial Autocorrelation’, Review of Economics and Statistics 70(466-474). Goovaerts, P. (1997), Geostatistics for Natural Resources Evaluation, Oxford University Press, New York, United States. Helterbrand, J. D. & Cressie, N. (1994), ‘Universal Cokriging Under Intrinsic Coregionalization’, Mathematical Geology 26(2), 205–226. Isaaks, E. H. & Srivastava, R. M. (1989), An Introduction to Applied Geostatistics, Oxford University Press, New York, United States. Journel, A. G. & Huijbregts, C. J. (1978), Mining Geostatistics, Academic Press, New York, United States. Marx, D. & Stroup, W. (1993), Analysis of Spatial Variability using PROC MIXED, in ‘Proceedings of the 1993 Kansas State University Conference of Applied Statistics in Agriculture’, pp. 40–59. Milliken, G. A. & Johnson, D. E. (2002), Analysis of Messy Data, Volume III: Analysis of Covariance, Chapman and Hall/CRC Press. Oliver, D. S. (2003), ‘Gaussian Cosimulation: Modeling of the Cross-Covariance’, Mathematical Geology 35, 681–698. R Development Core Team (2004), R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. *http://www.R-project.org Searle, S. R. (1971), Linear Models, John Wiley and Sons, Inc., New York, United States. Stein, A. & Corsten, I. C. A. (1991), ‘Universal Kriging and Cokriging as a Regression Procedure’, Biometrics 47, 575–587. Zimmerman, D. L. & Harville, D. A. (1991), ‘A Random Field Approach to the Analysis of Field-Plot Experiments and other Spatial Experiments’, Biometrics 47, 223–239. Revista Colombiana de Estadística 31 (2008) 95–109

Analysis of Covariance with Spatially Correlated Secondary Variables

Related documents

Products

Support

Analysis of Covariance with Spatially Correlated Secondary Variables

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib