Semivariograms
Philip M. Dixon (Iowa State Univ.), Spatial Data Analysis - Part 4, Spring 2016

Semivariograms
- All forms of kriging assume you know Cov(Z(s_i), Z(s_j)), or equivalently Cor(Z(s_i), Z(s_j)).
- Usually this needs to be estimated.
- Non-spatial approach: a data-based estimate (scatterplot of paired observations).
- Spatial data: two problems
  1) there is only one observation at s_i and one at s_j, so a plot of Z(s_i) against Z(s_j) is not very useful
  2) we need the vector of Cov(Z(s_0), Z(s_i)) even though Z(s_0) has not been observed

Need a model!
- How does Cov(Z(s_i), Z(s_j)) depend on:
  - the distance between s_i and s_j
  - the direction from s_i to s_j
  - the location of s_i and s_j in the study area
- Use a model in the same way we use a regression model, Y_i = β X_i + ε_i:
  - the optimal prediction is E(Y | X), the mean of Y for that X
  - with many Y at each X (more like ANOVA), we can calculate a sample average; no model required
  - with one Y per X (regression), we use a model to predict Y at an observed X or a new X: global characteristics are used to predict for a specific X, which relies on the assumptions being appropriate
- Examine each characteristic of Cov in turn.

Does Cov depend on location?
- Two pairs of points, same direction, same distance, in different parts of the study area.
- Same covariance?
[Figure: two pairs of points in different parts of the study area]

Stationary spatial processes
- 2nd order stationary:
  - µ(s) is constant across the study area
  - Cov(Z(s), Z(s + h)) is the same across the study area; h specifies a particular distance and direction
  - so in the previous picture, the two pairs of points have the same Cov
  - the above implies Var Z(s) is constant across the study area
- Intrinsic stationarity:
  - Var(Z(s) − Z(s + h)) is the same everywhere
  - a slightly weaker assumption; some really care about the difference, I don't
- We'll assume 2nd order stationarity.

Does Cov depend on direction?
- Isotropic spatial process: Cov(Z(s), Z(s + h)) is the same in all directions; it depends only on the distance between the two points, i.e. ||h||.
- Anisotropic spatial process: Cov(Z(s), Z(s + h)) depends on direction; consider Cov as a function of distance, but only for certain directions.

Anisotropy
[Figure: covariance vs. distance for two different directions]

Geometric anisotropy
- Can redefine the X, Y coordinates so the rescaled process is isotropic: simple scaling and rotation → isotropy.
- Example: Swiss rain data, plots on the next 3 slides
  - original coordinates
  - rotated 45° counterclockwise (= −π/4 radians)
  - stretched 5-fold in the X direction
[Figure: Swiss rain locations, original coordinates]
[Figure: Swiss rain locations, rotated coordinates]
[Figure: Swiss rain locations, rotated and stretched coordinates]
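The rotate-and-stretch step can be written directly in R. This is a minimal sketch, not the course code: the data frame `coords` and its columns x, y are hypothetical names, and the angle and stretch factor simply echo the Swiss rain example above.

```r
## Sketch of the geometric-anisotropy coordinate transformation:
## rotate the points by -pi/4, then stretch the new X axis 5-fold.
rotate_stretch <- function(coords, theta = -pi / 4, stretch = 5) {
  R <- matrix(c(cos(theta), sin(theta),
                -sin(theta), cos(theta)), nrow = 2)  # 2x2 rotation matrix
  xy <- as.matrix(coords[, c("x", "y")]) %*% t(R)    # rotate each (x, y) point
  xy[, 1] <- xy[, 1] * stretch                       # stretch the rotated X axis
  data.frame(x = xy[, 1], y = xy[, 2])
}
## Usage (hypothetical): new_coords <- rotate_stretch(swiss_coords)
```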
Dealing with anisotropy
- Think about the data: can a covariate explain part of the pattern? (Swiss rain: elevation.) But then covariate values are needed to make predictions.
- Geometric anisotropy: can transform the coordinates to make the process isotropic.
- General anisotropy: can repeat what we're about to do in different directions; more complications, more details, no change in concept. Discussed later.

Estimating covariances
- We will assume isotropy and 2nd order stationarity.
- So Cov(Z(s_i), Z(s_j)) depends on ||s_i − s_j||, i.e. the distance between s_i and s_j.
- Smaller areas: Euclidean distance; larger areas: should use great-circle distance.

Estimating covariances
- Remember non-spatial data: all X observations have the same µ and σ², ditto all Y.
- To estimate the covariance / correlation: if an observation has an above-average X, does it also have an above-average Y? Ditto below-average.
- Calculate (X_i − µ_X)(Y_i − µ_Y) for each observation, then average.
- This works because each pair (X, Y) is an independent sample from the population.

Estimating spatial covariances
- Spatial data: remember N = 1, even with 1000 locations; the data are one realization of a correlated spatial random vector.
- Concept: when the process is 2nd order stationary, we can use "replicates" in space instead of replicate realizations. This is called ergodicity.
- What are spatial replicates? Depends on the assumptions:
  - pairs of points separated by the same distance
  - pairs of points separated by the same distance and the same direction
- We'll assume isotropy, so only distance matters.

Spatial covariance
- Remember Cov = E[(Z(s_i) − µ_i)(Z(s_j) − µ_j)]; 2nd order stationary, so µ is constant.
- Know h = the distance between s_i and s_j; find all pairs of locations separated by that same distance.
- Could plot Z(s_i) against Z(s_j); rarely done.
- Cov is a theoretical average (E). Estimate it by calculating (Z(s_i) − µ̂)(Z(s_j) − µ̂) for each of those pairs of observations and averaging those values.
- That average estimates Cov(h) for that distance h; repeat for other h to estimate the Cov(h) function.

Covariogram cloud
- Usually done for all distances simultaneously:
  - calculate (Z(s_i) − µ̂)(Z(s_j) − µ̂) for each pair of observations
  - calculate the distance between each pair of observations
  - plot the empirical cross-product vs. distance
- Example: covariogram cloud for the Swiss rain data (figure below).
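Before the figure, here is a minimal base-R sketch of that computation, assuming a hypothetical data frame `dat` with coordinates x, y and the observed variable z (illustrative only, not the course's code):

```r
## Covariogram cloud: cross-products of mean-centered values vs. pairwise distance.
cov_cloud <- function(dat) {
  d    <- as.matrix(dist(dat[, c("x", "y")]))  # all pairwise Euclidean distances
  zc   <- dat$z - mean(dat$z)                  # center using the estimated mean
  cp   <- outer(zc, zc)                        # (Z(s_i) - mu_hat)(Z(s_j) - mu_hat)
  keep <- upper.tri(d)                         # each pair once, no self-pairs
  data.frame(distance = d[keep], crossprod = cp[keep])
}
## Usage (hypothetical): plot(crossprod ~ distance, data = cov_cloud(dat))
```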
[Figure: covariogram cloud for the Swiss rain data; cross-product vs. distance]
Semivariance
- Need µ̂ to calculate the empirical covariance. Can we avoid calculating µ̂?
- Remember (a few weeks ago): Var X = E(X − µ)² = E(X_i − X_j)² / 2, so the average squared difference between pairs of observations estimates the variance.
- The same idea works for covariances.
- Use [Z(s_i) − Z(s_j)]² / 2 to evaluate the spatial dependence between a pair of locations. This is called the semivariance.
- Again, calculate it for each pair and plot it vs. distance.
[Figure: semivariance cloud for the Swiss rain data; semivariance vs. distance]
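In gstat the semivariance cloud is available directly; a sketch, again using the hypothetical `dat` from above:

```r
## Semivariance cloud: [Z(s_i) - Z(s_j)]^2 / 2 for every pair, plotted against distance.
library(sp)
library(gstat)
coordinates(dat) <- ~ x + y        # promote the hypothetical data frame to a Spatial object
sv_cloud <- variogram(z ~ 1, dat, cloud = TRUE)
plot(sv_cloud)                     # one point per pair of locations
```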
Semivariance
- Why use semivariances instead of covariances?
- When the process is 2nd order stationary we can convert between the two:
  E ½[Z(s_i) − Z(s_j)]² = σ² − E[(Z(s_i) − µ_i)(Z(s_j) − µ_j)]
- In rare situations the semivariance can be calculated but the covariance shouldn't be: the difference between intrinsic and 2nd order stationarity.
- Convenience: we don't have to estimate µ.
- Notation: γ(h) is the theoretical semivariance at distance h; γ̂(h) is the empirical semivariance at distance h.

Semivariogram
- Single semivariance values are very variable, so smooth the semivariogram cloud by averaging.
- "Classical" or Matheron estimator:
  γ̂(h) = [1 / (2 N(h))] Σ_{N(h)} [Z(s_i) − Z(s_j)]²
  where the sum is over pairs of points separated by distance h and N(h) is the number of pairs at that distance.
- When the locations are on a grid, it is easy to find many pairs of locations with the same distance.
- When the locations are irregular, we have to create "distance bins":
  - define a range of distances = a bin, e.g. 0-25000 m
  - calculate the mean distance and mean semivariance for the bin
  - repeat for the rest of the bins
  - plot X = mean distance vs. Y = mean semivariance
[Figure: binned empirical semivariogram for the Swiss rain data]

Choice of bin size/number
- How precise is the estimated semivariance? Var γ̂(h) ≈ 2 γ(h)² / N(h)
- How big should the bin be? Want at least 30, preferably 50 or more, pairs per bin.
- The choice of binning matters:
  - it is very common to make all bins equally wide
  - but N(h) is often small for large h, so γ̂(h) is not very precise and the semivariogram is erratic
  - solution: don't calculate γ̂(h) for large distances; one recommendation is to calculate γ̂(h) out to 1/2 the maximum distance
  - Swiss rain: the maximum distance is 291 km, so calculate to h = 150 km; notice that the default maximum distance in R is less
- Would like 10-15 bins, but the number of pairs per bin is more important.
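A sketch of controlling the binning in gstat, continuing with the hypothetical `dat`; `cutoff` and `width` are gstat's names for the maximum distance and the bin width, and the values below just mirror the Swiss rain discussion:

```r
## Binned empirical semivariogram (Matheron estimator) with explicit bin control.
library(sp)
library(gstat)
sv <- variogram(z ~ 1, dat,
                cutoff = 150000,  # only use pairs closer than 150 km (about 1/2 the max distance)
                width  = 15000)   # 15 km wide bins, about 10 bins in total
sv$np                             # number of pairs per bin; want at least 30-50
plot(sv)
```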
Choice of bin size/number
- When the locations are irregular, N(h) is often small at short distances. That is less of a problem because γ(h) is small there, so Var γ̂(h) is small.
- The alternative is an equal number of pairs per bin → wide bins at short distances and at very long distances. The concern is the loss of information about γ̂(h) at small h; that information is crucial for fitting models to semivariograms.
- I tend to use equi-distant bins without large lag distances, but I really want a good estimate of γ(h) for h close to 0.
[Figure: empirical semivariograms for four binning choices, with the number of pairs N(h) printed at each bin]

Cressie-Hawkins estimator
- The classical estimator is very sensitive to outliers: one unusual value Z(s) can really mess up γ̂(h), because it is used in N − 1 pairs.
- The Cressie-Hawkins estimator is more robust to outliers:
  γ̂(h)_CH = 0.5 [ (1/N(h)) Σ_{N(h)} |Z(s_i) − Z(s_j)|^(1/2) ]⁴ / [ 0.457 + 0.494/N(h) + 0.045/N(h)² ]
- Where does this come from? Σ |Z(s_i) − Z(s_j)|^(1/2) is not dominated by a single large squared difference; that makes C-H more robust to outliers.
- When Z(s) ~ N(0, 1),
  E [ (1/N(h)) Σ_{N(h)} |Z(s_i) − Z(s_j)|^(1/2) ]⁴ ≈ 2 γ(h) [ 0.457 + 0.494/N(h) + 0.045/N(h)² ]
  ⇒ the C-H estimator (also called the robust SV estimator). A gstat sketch appears after the parameter summary below.
[Figure: Matheron and Cressie-Hawkins empirical semivariograms for the Swiss rain data]

Semivariogram parameters
- With either estimator, the plot of X = h vs. Y = γ̂(h) describes the pattern of spatial correlation.
- Three important numbers describe that plot: the nugget, the sill, and the practical range (picture on the next slide).
- Nugget: γ(h) for distances very, very close to 0; it quantifies micro-scale variation. The term comes from mining geology: nuggets of metal in ore → micro-scale variation.
- Replicate observations at a single location are assumed to be identical (γ(0) = 0), i.e. γ(h) is not continuous - it has a jump at 0.
- Later: we will discuss measurement-error kriging, in which replicate observations are not identical.
[Figure: semivariogram showing the nugget, sill, and practical range]

Variogram parameters
- Sill: γ(h) for large h. If σ² is constant, sill = σ². The sill is an asymptote; technically, γ(h) never equals the sill.
- Partial sill: the magnitude of the "spatially associated" variability; partial sill = sill − nugget.
- Practical range: the distance at which 95% of the spatially associated variability is reached, i.e. γ(h) − nugget = 0.95 (sill − nugget). Observations separated by more than the practical range are essentially uncorrelated. 0.95 is traditional.
- All of the above assumes one spatial process; there can be multiple sills and practical ranges if there are multiple processes.
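As promised above, a sketch of comparing the classical and Cressie-Hawkins estimators in gstat (`cressie = TRUE` switches to the robust estimator); `dat` is still the hypothetical data set:

```r
## Matheron (classical) vs. Cressie-Hawkins (robust) empirical semivariograms.
library(sp)
library(gstat)
sv_classical <- variogram(z ~ 1, dat, cutoff = 150000, width = 15000)
sv_robust    <- variogram(z ~ 1, dat, cutoff = 150000, width = 15000, cressie = TRUE)
plot(gamma ~ dist, sv_classical, type = "b",
     xlab = "Distance", ylab = "Semivariance")
lines(gamma ~ dist, sv_robust, type = "b", lty = 2)  # robust estimate, for comparison
```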
Semivariogram models
- To use kriging, we need Cov(Z(s_i), Z(s_j)) for all locations with observed data and all prediction locations.
- If the field is second order stationary, γ(h) = σ² − Cov(h) = Cov(0) − Cov(h), so Cov(h) = Cov(0) − γ(h).
- So we need γ(h) for any h: fit a model to the variogram or variogram cloud.

Semivariogram models
- There are many different models. Three very commonly discussed models:
  - Spherical: γ(h) = σ² [ 3h/(2α) − (1/2)(h/α)³ ] for h ≤ α; Cov is exactly 0 for h > α
  - Exponential: γ(h) = σ² [ 1 − exp(−3h/α) ]
  - Gaussian: γ(h) = σ² [ 1 − exp(−3h²/α²) ]
- In all of these, α is the practical range and σ² is the sill.
[Figure: some variogram models - Matern (k = 0.25), exponential, Whittle, and Gaussian semivariances vs. distance (km)]

Semivariogram models
- The exponential and Gaussian are specific examples of models in the Matern class, a large family of semivariogram models:
  γ(h) = σ² [ 1 − (2/Γ(k)) (θh/2)^k K_k(θh) ]
- Γ(k) and K_k(θh) are special functions (the Gamma function and the modified Bessel function of the 2nd kind, order k).
- θ controls the range (θ = 1/α), σ² controls the sill, and k controls the shape of the variogram.
- The exponential is k = 0.5, k = 1 is the Whittle model, and the Gaussian is the limiting shape as k becomes large.
- I have found the Whittle model quite useful:
  γ(h) = σ² [ 1 − (h/α) K_1(h/α) ]
- The Gaussian is one of the historically traditional models, but modern opinion is to avoid it. Wackernagel (2003): the Gaussian model is "pathological".

More variogram models
- Linear: γ(h) = θh
  - no sill (unless you force one); not second order stationary, since C(0) = σ² doesn't exist unless you force a sill
  - but it is intrinsically stationary: Var(Z(s_i) − Z(s_j)) is well defined for any distance h, so the semivariogram is well defined
  - my experience is that a linear semivariogram usually indicates a trend across the study area; I suggest modeling the trend, then computing a semivariogram from the residuals from that trend
- Wave or hole-effect: γ(h) = σ² [ 1 − (α/h) sin(h/α) ]
  - models periodic spatial patterns; picture on the next slide
[Figure: more semivariogram models - spherical, linear with a sill, and hole-effect semivariances vs. distance (km)]

Nuggets
- A nugget can be added to any model, e.g. exponential with nugget:
  γ(h) = σ0² + σ² [ 1 − exp(−3h/α) ]
  where σ² is the partial sill (the spatially associated variation) and σ0² is the micro-scale variation.
- In practice very few pairs of observations are really close to each other (h ≈ 0), so the estimated nugget is an extrapolation down to h = 0 and can be very sensitive to the choice of semivariogram model.
- Even if there is a nugget, γ(0) is still defined as 0: the kriging prediction at an observed location is the observed value. But γ(ε) = nugget, where ε is a very small number, so the prediction of Z(s) can be quite different a very small distance away from that observed location.
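These models, with or without a nugget, can be written down in gstat with `vgm()`. A sketch with made-up parameter values (the numbers are not fitted to anything, and gstat's internal range parameterization differs from the practical range used above):

```r
## Specifying semivariogram models in gstat; all parameter values here are made up.
library(gstat)
vgm()                                  # list the model names gstat knows about
m_exp <- vgm(psill = 15000, model = "Exp", range = 50000, nugget = 2000)
m_sph <- vgm(psill = 15000, model = "Sph", range = 80000, nugget = 2000)
m_whi <- vgm(psill = 15000, model = "Mat", range = 50000, nugget = 2000,
             kappa = 1)                # Matern with k = 1, i.e. the Whittle model
show.vgms()                            # plot the shapes of the built-in models
```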
Estimating a SV model
- Given an empirical average SV or SV cloud and a model, how do we estimate the SV model parameters (e.g. σ², α, and perhaps σ0²)?
- All SV models are non-linear functions of their parameters (not linear, like β_1 X_1i + β_2 X_2i), so use non-linear least squares.
- Could fit to the cloud, but the tradition is to fit to the binned averages.
- Given γ̂(h_i) for a set of bins and a specific model, find θ = (σ², α, σ0²) for which Σ_i [γ(h_i, θ) − γ̂(h_i)]² is as small as possible. Same criterion as linear regression.
- No closed-form solution - an iterative algorithm is needed:
  - must provide starting values
  - bad start → big trouble; generally well behaved if the start is reasonable
  - I (and everyone else) use an eyeball guess as the starting values
[Figure: binned empirical semivariogram for the Swiss rain data]

Fitting SV models
- One issue: least squares assumes all observations are equally important. But if Var Y_2 > Var Y_1, we want to give more attention to fitting Y_1 closely.
- Remember Var γ̂(h) ≈ 2 γ(h)² / N(h): more pairs → lower variance; larger γ̂(h) → higher variance.
- Use weighted LS to put more emphasis on observations with lower variance: minimize Σ_i w_i [γ(h_i, θ) − γ̂(h_i)]².
- The statistically optimal weights are w_i = 1 / Var γ̂(h_i). Problem: that variance depends on γ(h), which is what we are trying to estimate!
- One solution (not the only one): use w_i = N(h_i)/h_i² as the weight, which assumes γ(h) ≈ h.

Which SV model to use?
- Two approaches.
- 1) Which model best fits the semivariogram? Use the weighted SS as the measure of fit. For the Swiss rainfall data:
  - Exponential: SS = 1.608
  - Spherical: SS = 1.184
  - Gaussian: SS = 1.258
  - Matern, k = 1: SS = 1.237
  - Matern, k = 5: SS = 1.153
  - Matern, k = 4: SS = 1.140
  - Suggests the spherical or the Matern with k = 4.
- 2) Which model gives the most accurate predictions?

Which SV model to use?
- One small problem with assessing predictions: the data are used twice, once to fit the SV model and again to assess the precision of the predictions.
- That is known to be overly optimistic, to favor models with more parameters, and to lead to overfitting the data at hand.
- Three solutions, all often used:
- 1) Training/test set
  - fit the model to a subset of the data (the training set); evaluate it on the remaining data (the test set)
  - requires an arbitrary division into training and test; both are subsets of the full data set

Cross-validation
- 2) Cross-validation
  - remove the 1st obs., use the N − 1 remaining obs. to fit the SV and predict Z(s_1)
  - calculate the squared error for the 1st obs.: [Z(s_1) − Ẑ(s_1)]²; Ẑ(s_1) is NOT based on Z(s_1), so it is a valid "out-of-sample" prediction
  - return the 1st obs. to the data set, remove the 2nd obs., and repeat the above
  - repeat for all obs. and average the squared errors
  - gives the mean squared error of prediction: (1/N) Σ_i [Z(s_i) − Ẑ(s_i)]²
  - exactly the same concept as the PRESS statistic in linear regression
- 3) k-fold CV
  - if there are many points, leave-one-out can be slow
  - divide the data into k "folds" = sets of obs.; k = 5 and k = 10 are common
  - remove an entire fold, fit the model on the remaining 90% (for k = 10) of the data, predict the omitted points, and calculate the rMSEP
  - repeat for all the other folds

Cross-validation
- For the Swiss rainfall data:
  - Exponential: MSEP = 3615
  - Spherical: MSEP = 3447
  - Gaussian: MSEP = 3725
  - Matern, k = 1: MSEP = 3561
  - Matern, k = 5: MSEP = 3610
  - Matern, k = 4: MSEP = 3591
- Suggests the spherical (one of the two with a good fit), but none is substantially better than any other.
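A sketch of both steps in gstat - weighted least-squares fitting and leave-one-out or k-fold cross-validation - still using the hypothetical `dat` and eyeballed starting values (the numbers are illustrative, not the Swiss rainfall results):

```r
## Fit semivariogram models by weighted least squares, then cross-validate them.
library(sp)
library(gstat)
sv <- variogram(z ~ 1, dat, cutoff = 150000, width = 15000)

# Starting values (psill, range, nugget) are eyeball guesses from the empirical SV.
# fit.variogram's default weights are N(h)/h^2, as discussed above.
fit_sph <- fit.variogram(sv, vgm(psill = 15000, "Sph", range = 80000, nugget = 2000))
fit_exp <- fit.variogram(sv, vgm(psill = 15000, "Exp", range = 50000, nugget = 2000))
attr(fit_sph, "SSErr")            # weighted sum of squares for the fitted model

# Leave-one-out CV by default; nfold = 10 gives 10-fold CV.
cv_sph <- krige.cv(z ~ 1, dat, model = fit_sph)
cv_exp <- krige.cv(z ~ 1, dat, model = fit_exp, nfold = 10)
mean(cv_sph$residual^2)           # MSEP for the spherical model
mean(cv_exp$residual^2)           # MSEP for the exponential model
```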
Cross-validation
- CV also provides ways to assess the model (plots on the next pages):
  - a plot of predicted values vs. residuals: should be a flat sausage, just like for linear regression
  - a spatial (or bubble) plot of the residuals: there should be no big clusters; if there are, we don't have a good spatial model
  - outliers become more apparent: very large residuals, usually surrounded by residuals with the opposite sign
[Figure: cross-validation residuals vs. predicted values]
[Figure: bubble plot of the cross-validation residuals]

Cross-validation
- Compare the range of the residuals to the range of the observed values: observed values 0 to 493, residuals −240 to 160.
- These residuals are pretty variable. The problem for this data set is (I think) the anisotropy: the process is more constant, hence smoother, in one direction than the other.

Other ways to fit SV models
- REML (REstricted/REsidual Maximum Likelihood): likelihood-based, after removing the fixed effects; avoids binning; most commonly used with linear models for trend / treatment effects; more esoteric.
- Composite likelihood and generalized estimating equations: not commonly used.
- There are also even more robust ways to estimate semivariances, e.g. Genton's estimator (instead of Cressie-Hawkins): available in geoR, not gstat.

Anisotropy
- Up to now we have assumed isotropy: only distance matters. Anisotropy: the spatial dependence depends on direction as well as distance.
- How to assess it?
  - What you know about the data: is isotropy reasonable?
  - Directional semivariograms: calculate the semivariogram using only the pairs of points in one direction; repeat for other directions; look for similar curves in all directions.
  - Variogram map: a plot of the average semivariance for distance/direction combinations; look for "bulls-eye" patterns.
- No single best tool.
[Figure: Swiss rain values by location]
[Figure: variogram map for the Swiss rain data]
[Figure: directional semivariograms, Gamma(h) vs. distance]
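A sketch of both diagnostics in gstat (`alpha` gives the directions in degrees clockwise from north; `map = TRUE` produces a variogram map), with the usual hypothetical `dat`:

```r
## Directional semivariograms and a variogram map, to look for anisotropy.
library(sp)
library(gstat)
sv_dir <- variogram(z ~ 1, dat, alpha = c(0, 45, 90, 135))  # four directions
plot(sv_dir)                      # similar curves in all panels suggest isotropy

sv_map <- variogram(z ~ 1, dat, map = TRUE, cutoff = 100000, width = 10000)
plot(sv_map)                      # elongated / non-circular patterns suggest anisotropy
```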
Anisotropy
- How to deal with the different types of anisotropy?
- Geometric anisotropy: the range is longer in one direction than another; the nugget and (partial) sill are the same in all directions.
  - Rotate the coordinate system so the longer direction is aligned with an axis, then rescale that axis. Do this by
    s* = ( 1 0 ; 0 λ ) ( cos θ  −sin θ ; sin θ  cos θ ) s
    where θ is the angle of the major axis and λ is the ratio of the length of the minor axis to the length of the major axis.
  - krige() will do this for you.

Zonal anisotropy
- The variogram sills vary with direction; the direction with the shorter range also has the shorter sill.
- Model it by combining an isotropic model and a model that depends "only on the lag-distance in the direction θ of the greater sill" (Schabenberger and Gotway, 2005, p. 152):
  γ(h) = γ1(||h||) + γ2(h_θ)
- Chiles and Delfiner (1999, p. 96) warn against axis-specific models, e.g. γ(h) = γ1(h_x) + γ2(h_y); under certain circumstances they can lead to Var Z(s) = 0, which is not good.

Kriging with anisotropy
- Geometric anisotropy: you do not have to transform and back-transform the coordinates; krige() can do this for you.
  - In the vgm() specification of the variogram model, add anis = c(θ, λ):
    - θ is the direction of the longest range (highest correlation), in degrees, measured clockwise from the positive vertical axis (N)
    - λ is the ratio of the minor range to the major range (a value from 0 to 1)
  - So anis = c(60, 0.2) specifies a major axis at 2 o'clock with a length 5 times the minor axis.
- Zonal anisotropy: "fake it" by setting up a geometrically anisotropic component with a major axis much longer than the minor axis (e.g. λ = 0.00001). Then only distances along the major axis contribute to γ(h).
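To close, a sketch of kriging with a geometrically anisotropic model in gstat, reusing the hypothetical `dat` and a hypothetical grid of prediction locations `newgrid`; the parameter values are illustrative only:

```r
## Ordinary kriging with a geometrically anisotropic semivariogram model.
library(sp)
library(gstat)
# Major axis at 60 degrees clockwise from north; minor range = 0.2 * major range.
m_aniso <- vgm(psill = 15000, model = "Sph", range = 80000, nugget = 2000,
               anis = c(60, 0.2))
# `newgrid` is a hypothetical SpatialPixels object holding the prediction locations.
pred <- krige(z ~ 1, dat, newdata = newgrid, model = m_aniso)
spplot(pred["var1.pred"])         # map the kriging predictions
```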