Variograms
Brian Klinkenberg
Geography

To get there, we’ll consider
Correlation
Tobler’s first law of geography
Spatial autocorrelation
Three dimensions of scale
Units of observation
Regionalized variables
Variograms

Correlation
What is correlation? What does it measure?
[n] a statistical relation between two or more variables such that systematic changes in the value of one variable are accompanied by systematic changes in the other
[n] a statistic representing how closely two variables co-vary; it can vary from -1 (perfect negative correlation) through 0 (no correlation) to +1 (perfect positive correlation); "what is the correlation between those two variables?"
[n] a reciprocal relation between two or more things
[Source: http://www.hyperdictionary.com/dictionary/correlation]

Correlation
Suppose we have two variables X and Y, with means $\bar{X}$ and $\bar{Y}$ and standard deviations $S_X$ and $S_Y$, respectively. The correlation is computed as:
$$ r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{(n - 1)\, S_X S_Y} $$
The correlation coefficient, r, quantifies the direction and magnitude of correlation.

Correlation
What is variance? (Contrast with correlation)
The correlation coefficient is the covariance divided by the standard deviations of the two variables (normalized):
$$ r = \frac{\operatorname{cov}(X, Y)}{S_X S_Y} $$

Tobler’s first law of geography
“Everything is related to everything else, but near things are more related than distant things.”
Or, more poetically
"All things by immortal power,
Near or far,
Hiddenly
To each other linked are,
That thou canst not stir a flower
Without troubling of a star"
Francis Thompson, "The Mistress of Vision", 1897

Spatial autocorrelation
Extending the concept of correlation by incorporating space.
Note, however, that spatial autocorrelation is actually more closely related to variance than to correlation, since we are examining how a single variable varies with itself through space.

Spatial autocorrelation
“Usually autocorrelation means correlation among the data from different time periods.
Spatial autocorrelation means correlation among the data from locations. There could be many dimensions of spatial autocorrelation, unlike autocorrelation between periods.”
[http://economics.about.com/od/economicsglossary/g/spatial.htm]

Spatial autocorrelation
“Is a term referring to the degree of relationship that exists between two or more spatial variables, such that when one changes, the other(s) also change. This change can either be in the same direction, which is a positive autocorrelation, or in the opposite direction, which is a negative autocorrelation. For example, soil type and vegetation may be highly correlated, either positively or negatively depending upon the type of soil and vegetation under examination.”
[http://www.geo.ed.ac.uk/agidexe/term?66]

Spatial autocorrelation
How to determine paired observations when working in space?
The question of identifying pairs of observations becomes complex when we move from the world of traditional (i.e., aspatial) statistics to the real (i.e., spatial) world.
Should we consider only nearest neighbours (1st order), or also include 2nd-order neighbours (neighbours of neighbours) and higher-order neighbours?
Should we use simple binary adjacency coefficients / weights (1 if immediately adjacent / shared boundary; 0 if not) or distance-weighted measures of adjacency (distance could be simple geographic distance, or based on the length of the shared border, or friction-based geographic distance, or genetic distance, or …)?
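
The weighting choices above (first- versus second-order neighbours, binary versus distance-weighted adjacency) can be sketched in code. What follows is a minimal illustration, not a reference implementation: the 4 x 4 lattice, the function names, and the inverse-distance form of the weights are all assumptions made for the example, and it assumes that no two sites share the same location. Moran's I, a standard index of spatial dependence, is included only to show how the choice of weights feeds into a measure of spatial autocorrelation.

```python
import numpy as np

def pairwise_distances(coords):
    """Euclidean distance between every pair of sites (n x n matrix)."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def binary_weights(dist, threshold):
    """w_ij = 1 if sites i and j are distinct and within `threshold`, else 0."""
    w = (dist <= threshold).astype(float)
    np.fill_diagonal(w, 0.0)  # a site is not its own neighbour
    return w

def second_order_weights(w):
    """Neighbours of neighbours that are not already first-order neighbours."""
    reachable = (w @ w) > 0            # at least one two-step path exists
    np.fill_diagonal(reachable, False)
    return (reachable & (w == 0)).astype(float)

def inverse_distance_weights(dist, power=1.0):
    """w_ij = d_ij ** -power off the diagonal (assumes no coincident sites)."""
    w = np.zeros_like(dist)
    off_diagonal = ~np.eye(len(dist), dtype=bool)
    w[off_diagonal] = dist[off_diagonal] ** (-power)
    return w

def morans_i(x, w):
    """Moran's I for values x under the weights matrix w."""
    z = x - x.mean()
    return (len(x) / w.sum()) * (z @ w @ z) / (z @ z)

# Hypothetical 4 x 4 lattice of sites with unit spacing (row-major order).
coords = np.array([(i, j) for i in range(4) for j in range(4)], dtype=float)
dist = pairwise_distances(coords)

w_rook = binary_weights(dist, threshold=1.0)  # shared-edge neighbours only
w_second = second_order_weights(w_rook)       # neighbours of neighbours
w_idw = inverse_distance_weights(dist)        # every pair, weighted by 1/d

gradient = coords[:, 1].copy()                # values increase along one axis
checkerboard = coords.sum(axis=1) % 2         # every rook neighbour differs

print("gradient, rook weights:     ", morans_i(gradient, w_rook))      # positive
print("checkerboard, rook weights: ", morans_i(checkerboard, w_rook))  # negative
print("gradient, inverse distance: ", morans_i(gradient, w_idw))
```

The same values give different index values under different weights matrices, which is why the definition of "neighbour" should always be stated alongside the result.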

Space as a matrix
W, where $w_{ij}$ is some measure of interaction
  adjacency
  decreasing function of distance
  invariant under rotation, displacement
  readily obtained from a GIS

Applications of the W matrix
Spatial regression
  add spatially lagged terms weighted by W
  Anselin’s SPACESTAT
Moran and Geary indices of spatial dependence
$$ c = \frac{(n - 1) \sum_i \sum_j w_{ij} (x_i - x_j)^2}{2 \left( \sum_i \sum_j w_{ij} \right) \sum_i (x_i - a)^2} $$
where a is the mean of the $x_i$.

Three dimensions of scale
When we think of scale from a cartographic point of view, we are discussing only the output scale (a map with a large scale of 1:10,000 would cover just a small area). In ecology, scale is more often associated with the size of the study area (a large-scale study would encompass a large area). In landscape ecology, terms such as grain and extent are often used; resolution is used in remote sensing; ‘support’ is the term used in geostatistics.
Many disciplines use a wide variety of terms that share similar, but not identical, shades of meaning. Always be explicit in your use of a term.
However, when fully considered, we should recognize that there are multiple dimensions of scale:

Three dimensions of scale
The phenomenon (or process) itself: the size and phase spacing, and the range of action and extent of the effect. (Examples: the Appalachian Mountains have an obvious size and phase spacing; clear cutting has an obvious footprint, with an effect that can extend beyond that footprint.) It is also important to note that patches may have one set of characteristics (e.g., mean patch size and variance) and that within a class there will be another set of characteristics (e.g., mean distance between patches within a class).

Three dimensions of scale
The sampling units: the sample size and shape, the spacing of the samples, and the extent of the study area. (Relates to the fine-scale [high resolution] variation that can be detected, and to the large-scale [coarser resolution] variation that may be detected.)

Three dimensions of scale
The analysis: the size of the analytical units, their spacing, and the extent of the analysis can differ from the scales associated with the phenomenon being studied. (Examples: when computing a variogram, the rule of thumb is that statistics should not be calculated for lags greater than one-third to one-half of the extent of the study domain; if the data were a satellite image, the samples may be randomly selected throughout the image and the pixels may have 30 m or 1.2 km resolution. Also consider instantaneous field of view [IFOV] issues.)

Units of observation
When working in space, the choice of the units of observation will have a significant impact on the results (the modifiable areal unit problem, MAUP).
Is the unit of observation smaller than potential objects of interest, the same size as the objects of interest, or larger than the objects of interest? (Think of quadrats, and of what happens when the sampling point includes a tree; such studies often require two or more sampling frameworks.)
If the unit is larger than the objects of study, then aspects of the modifiable areal unit problem must be considered.

Regionalized variables
A variable that takes on values according to its spatial location is known as a regionalized variable.
Considering a variable z measured at location i, we can partition the total variability in z into three components:
$$ z(i) = f(i) + s(i) + \varepsilon $$
where f(i) is some coarse-scale forcing or trend in the data, s(i) is local spatial dependency, and ε is error variance (presumed normal).

Regionalized variables
[Figure: blue dots represent the data]

Regionalized variables
The structural component (e.g., a linear trend)
The spatially correlated component
The random noise component (non-fitted)

Regionalized variables
Regionalized variables are variables that fall between random variables and completely deterministic variables. Typical regionalized variables are functions describing variables that have geographic distributions (e.g., elevation of the ground surface).
Unlike random variables, regionalized variables exhibit spatial continuity; however, the change in the variable is so complex that it cannot be described by any deterministic function.
The variogram is used to describe regionalized variables.

Variograms
In mathematical terms, the semi-variogram:
$$ \gamma(h) = \frac{1}{2 N(h)} \sum_{i=1}^{N(h)} \left[ z(x_i) - z(x_i + h) \right]^2 $$
where h represents a distance vector and N(h) is the number of pairs of observations separated by h.
Recall that the variance is expressed as:
$$ s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (z_i - \bar{z})^2 $$

Variograms
The semi-variogram is based on modelling the (squared) differences in the z-values as a function of the distances between all of the known points.

Variograms
In graphical terms:
Why ‘semi’-variogram? For a stationary variable, as the lag distance approaches infinity the mean squared difference between paired values converges to twice the variance. Dividing by 2 therefore means that the sill approximates the variance.

Variograms
This is an example of a variogram produced using ArcGIS's Geostatistical Analyst.

Variograms
Statistical assumptions:
Stationarity: the mean and variance are not a function of location.
Second-order (weak) stationarity is required: the covariance between values is a function only of the separation distance.
Isotropy: no directional trends occur in the data (as contrasted with anisotropy). However, you can compute directional variograms in order to assess directional trends in the data.
Unbounded variograms (i.e., with no sill) are evidence of non-stationary variables.
Trend surface analysis can be used to remove global trends in the data (to transform a non-stationary variable [mean varies across space] into a stationary one).
Lag distances: typically we group the distance intervals into classes so that there are enough pairs of sample points within any one distance class (a minimum of 30 is typically suggested). As a result, small-scale (high resolution) variation, at the resolution implied by the original sampling scheme, may not be detectable.
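
The semi-variogram and the grouping of separation distances into lag classes, as described above, can be sketched as follows. This is a minimal illustration rather than the method used by any particular package: it assumes the classical estimator (half the mean squared difference of all pairs in a lag class), and the names `empirical_semivariogram`, `lag_width`, `n_lags` and `min_pairs` are hypothetical. The two test series, a pure trend and pure noise along a transect, are synthetic.

```python
import numpy as np

def empirical_semivariogram(coords, z, lag_width, n_lags, min_pairs=30):
    """Half the mean squared difference of z for pairs grouped into lag classes.

    coords is an (n, d) array of locations and z the n observed values.
    Returns the mean separation, the semivariance, and the pair count of each
    class, plus a flag marking classes with at least `min_pairs` pairs.
    """
    coords = np.asarray(coords, dtype=float)
    z = np.asarray(z, dtype=float)
    i, j = np.triu_indices(len(z), k=1)            # each pair counted once
    dist = np.sqrt(((coords[i] - coords[j]) ** 2).sum(axis=1))
    sq_diff = (z[i] - z[j]) ** 2
    # Lag class b covers separations in (b * lag_width, (b + 1) * lag_width];
    # coincident points (distance 0) fall in class -1 and are ignored.
    lag_class = np.ceil(dist / lag_width).astype(int) - 1

    lag = np.full(n_lags, np.nan)
    gamma = np.full(n_lags, np.nan)
    n_pairs = np.zeros(n_lags, dtype=int)
    for b in range(n_lags):
        in_class = lag_class == b
        n_pairs[b] = in_class.sum()
        if n_pairs[b] > 0:
            lag[b] = dist[in_class].mean()
            gamma[b] = 0.5 * sq_diff[in_class].mean()
    return lag, gamma, n_pairs, n_pairs >= min_pairs

# Synthetic transect: 1000 sites spaced one unit apart.
n = 1000
transect = np.column_stack([np.arange(n), np.zeros(n)])

# A pure linear trend (non-stationary mean): semivariance grows without a sill.
trend = transect[:, 0].copy()
lag_t, gamma_t, pairs_t, ok_t = empirical_semivariogram(transect, trend, 1.0, 5)
print("trend:", gamma_t)

# Uncorrelated noise: semivariance sits near the variance at every lag.
rng = np.random.default_rng(0)
noise = rng.normal(size=n)
lag_n, gamma_n, pairs_n, ok_n = empirical_semivariogram(transect, noise, 1.0, 5)
print("noise:", gamma_n, "variance:", noise.var(ddof=1))
```

The trend series illustrates why an unbounded variogram points to a non-stationary variable, and the noise series illustrates why dividing by 2 puts the sill near the variance. Only five lag classes are computed here, far below the one-third to one-half of the study extent usually taken as the upper limit.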

Variograms
The technique can quantify the scale of variability exhibited by natural patterns of resource distributions (although correlograms may be better for this, since you can conduct statistical tests on the results) and identify the spatial scale at which the sampled variable exhibits maximum variance.
At larger lag distances (beyond the natural ‘scale’ of the phenomenon), harmonic effects can be noted, in which the variogram peaks or dips at lag distances that are multiples of the natural scale.
Given the noise present in natural environmental data sets, it is unlikely that you will be able to identify multiple scales clearly. One approach might be to fit a semivariogram model to the data, and to examine the residuals for the presence of multiple patterns of scale.

Variograms
[Source: http://zappa.nku.edu/~longa/geomed/modules/geostats_lite/lec/illinois.html]

Kriging
Kriging is a spatial interpolation technique based on semi-variograms.
Unlike most other spatial interpolation techniques, kriging provides a map that shows you the uncertainty associated with the prediction.

Kriging
[Figure panels: TIN; prediction standard error map; prediction map]

Integrating GIS and spatial statistics
Role of space
  an organizing dimension for information
  a source of context and linkage
  an explanatory variable
  a problem
Terminology
  lattice, support, drift, topology, layer, coverage, region
Software as glue
  within what conceptual framework?