Geostatistical data

Notation:
- s: location, a vector. Usually s = (x, y) in some coordinate frame (e.g., long/lat or UTM). Written as a vector because the details of 1D (beach, transect), 2D (earth surface), or 3D (ocean, soil, atmosphere) are not important.
- Z(s): a characteristic of location s, e.g., number of nesting turtles, soil pH, O2 concentration.

Characteristics of geostatistical data:
- Z(s) exists everywhere within the boundary of the study area.
- Generally, no sharp changes (jumps) in Z(s): Z(s1) is probably different from Z(s2), but the transition is smooth.

(c) Philip M. Dixon (Iowa State Univ.), Spatial Data Analysis - Part 3, Spring 2016

Prediction

- Could be done to fill in a grid so you can draw a map or use an image/contour plot.
- Or, done because you need predictions at unmeasured points.

[Figure: five observations (values 3, 3, 1, 1, 2) surrounding a point marked X. What is Z at the location marked by X?]

One possibility: average Z over the entire region. This is very common in non-spatial contexts.

1st law of geography (Tobler): everything is related to everything else, but near things are more related than distant things. This principle is very important if there is a spatial trend or variation across the region.

What if we had a bit more data?

[Figure: eleven observations (values 3, 20, 9, 3, 1, 2, 10, 1, 15, 7, 10) surrounding the point X.]

The overall average for the region is clearly inappropriate; consider some form of local average. We will discuss 3 methods:
- Inverse distance weighting
- Spatial trend model
- Kriging
Inverse distance weighting

Notation:
- s_i is the location of the i'th observation
- d_ij is the distance between location i and location j
- Z(s_i) is the value of Z at location s_i
- Ẑ(s_i) is the prediction of Z at location s_i

The prediction is a weighted average:

  Ẑ(s_j) = Σ_i w_ij Z(s_i) / Σ_i w_ij ,  where  w_ij = 1 / d_ij^a

- a is an arbitrary parameter, commonly 1 or 2
- a = 0 gives you the region average
- larger values give a "more local" estimate, because they emphasize shorter distances
- if d_ij = 0, i.e., predicting at an observed location, use the observed value

Characteristics:
- w_ij is always ≥ 0, and w_ij / Σ_i w_ij is always ≤ 1
- sometimes small values of w_ij are set to 0
- Ẑ(s_j) is always within the range of the observed values. Some like this; others don't.

Problem: you have to choose a. Some approaches: ad hoc (you like the resulting picture), or tradition (your field always uses 2 or 1.5 or ...). The following maps demonstrate the role of a by comparing results for a = 2 and a = 1.

[Figure: Swiss rainfall, IDW, power = 2; scale 0 to 400]
[Figure: Swiss rainfall, IDW, power = 1; scale 50 to 350]
[Figure: Swiss rainfall, IDW, power = 0; scale roughly 198.2 to 199.0, essentially constant]
[Figure: Swiss rainfall, IDW, difference between maps; scale -150 to 100]
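The IDW formula above is short enough to sketch directly. This is a minimal numpy implementation; the example locations and values are invented, loosely echoing the small sketch earlier in the notes.

```python
import numpy as np

def idw_predict(s0, locs, z, a=2.0):
    """Inverse-distance-weighted prediction of Z at location s0.

    w_ij = 1 / d_ij^a; prediction = sum(w * z) / sum(w).
    If s0 coincides with an observed location, return the observed value.
    """
    d = np.linalg.norm(locs - s0, axis=1)
    if np.any(d == 0):           # predicting at an observed location
        return z[np.argmin(d)]
    if a == 0:                   # a = 0: all weights equal, regional average
        return z.mean()
    w = 1.0 / d**a
    return np.sum(w * z) / np.sum(w)

# invented toy data: three nearby low values and one distant high value
locs = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
z = np.array([3.0, 1.0, 1.0, 20.0])
print(idw_predict(np.array([0.5, 0.5]), locs, z, a=2))   # local average, near 3, 1, 1
print(idw_predict(np.array([0.5, 0.5]), locs, z, a=0))   # regional average, 6.25
```

Note how a = 2 pulls the prediction toward the three nearby values, while a = 0 ignores distance entirely.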
[Figure: Swiss rainfall, IDW comparison; scatterplot of predictions with power = 1 against power = 2, both axes 0 to 400]

Spatial trend surface

Assume some function of the X and Y coordinates fits the data, often a low-order polynomial (linear or quadratic):

  Z(s_i) = β0 + β1 X_i + β2 Y_i + ε_i ,  or
  Z(s_i) = β0 + β1 X_i + β2 Y_i + β3 X_i² + β4 Y_i² + β5 X_i Y_i + ε_i

- only used for predicting Ẑ(s_i)
- not doing any test or inference, so don't worry about correlation in the ε's
- accounting for correlation, i.e., using GLS, will give better estimates of the β's

One potential advantage over inverse distance weighting: can estimate Var ε, so can construct prediction intervals for Ẑ(s_i):

  Ẑ(s_i) ± t_{1-α/2} sqrt(s² + Var Ẑ(s_i))
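A minimal numpy sketch of fitting a linear trend surface by OLS and building the interval above. The data are simulated (not the Swiss rainfall data), and a normal quantile stands in for the t quantile.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
xy = rng.uniform(-1, 1, size=(n, 2))                 # simulated locations
z = 200 + 30 * xy[:, 0] - 10 * xy[:, 1] + rng.normal(0, 5, n)

X = np.column_stack([np.ones(n), xy])                # linear trend surface: 1, X, Y
beta, *_ = np.linalg.lstsq(X, z, rcond=None)
resid = z - X @ beta
s2 = resid @ resid / (n - 3)                         # MSE with n - p degrees of freedom

x0 = np.array([1.0, 0.2, -0.3])                      # prediction location (with intercept)
XtX_inv = np.linalg.inv(X.T @ X)
zhat = x0 @ beta
var_fit = s2 * (x0 @ XtX_inv @ x0)                   # variance of the fitted mean
# confidence interval: uncertainty in "the line" only
# prediction interval: adds s^2 for the new observation, as in the slide's formula
half_ci = 1.96 * np.sqrt(var_fit)                    # normal approximation to t
half_pi = 1.96 * np.sqrt(s2 + var_fit)
print(zhat, half_ci, half_pi)
```

The prediction interval is always wider than the confidence interval, because it includes the extra s² term for a new observation.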
Spatial trend surface: confidence and prediction intervals

Reminder: a fitted regression line can be interpreted two ways.

Predict the average Y at some new X:
- uncertainty only in the regression coefficients ("the line")
- if I collected a second set of n observations, how similar is Ŷ?
- decreases as n increases
- se of a mean; confidence interval for Ŷ

Predict a new observation at some new X:
- uncertainty in both the line and the observations around the line
- before sampling a new observation, how accurate is the prediction?
- usually very similar to s (the root MSE), never smaller
- sd of a predicted observation; prediction interval for Ŷ

[Figure: Confidence interval, Swiss rain; fitted line and 95% CI for rainfall vs. easting]
[Figure: Prediction interval, Swiss rain; fitted line and 95% PI for rainfall vs. easting]
[Figure: Swiss rainfall, linear trend surface; scale 120 to 300]
[Figure: Swiss rainfall, quadratic trend surface; scale 50 to 250]

Splines: more flexible regression functions

Concepts only; the details require a lot of intricate math and stat theory. Consider a response Y and one predictor X. Sometimes a simple model is great:

[Figure: scatterplot of Y vs. X well described by a straight line]
And sometimes not:

[Figure: scatterplot of Y vs. X with a strongly nonlinear pattern]

How to model the relationship between Y and X, especially to predict?
- if there is a subject-matter-based model, use it!
- fit a higher-order polynomial (quadratic, cubic)
- non-parametric regression: smooth the data

There are various NP regression methods; we focus on smoothing splines. Concept: put together many models for small pieces of the data. Need to choose the number of small pieces:
- fewer pieces: smoother curve, closer to linear
- more pieces: wigglier curve, closer to the data

[Figure: smoother and wigglier spline fits to the nonlinear scatterplot]

How to choose how smooth/wiggly? Not easy.
- too smooth is obviously bad
- extremely wiggly effectively connects the dots; also bad, because predictions of new observations are inaccurate: overfitting the observed data, treating "noise" as signal

One common solution: cross validation
- leave out an observation, fit the model, predict the left-out observation
- put it back, leave out the next observation, ...
- the right choice is the value that makes good predictions of all the left-out observations

Splines require more data than when you know the model.
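The leave-one-out recipe above can be sketched in a few lines. As a stand-in for spline degrees of freedom, this invented example uses polynomial degree as the flexibility knob; the data are simulated, and the idea (drop a point, fit, predict it, accumulate squared error) is the same.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 5, 60))
y = np.sin(2 * x) * x + rng.normal(0, 0.5, 60)   # wiggly truth plus noise

def loocv_score(deg):
    """Leave-one-out CV: drop each point, refit, predict it, average squared error."""
    errs = []
    for i in range(len(x)):
        keep = np.arange(len(x)) != i
        coef = np.polyfit(x[keep], y[keep], deg)
        errs.append((np.polyval(coef, x[i]) - y[i]) ** 2)
    return np.mean(errs)

scores = {deg: loocv_score(deg) for deg in (1, 3, 5, 9)}
best = min(scores, key=scores.get)
print(scores, best)
```

The too-smooth fit (degree 1) cannot track the wiggles and scores badly; cross validation picks a more flexible fit without simply connecting the dots.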
[Figure: spline fit with 8.87 df to the nonlinear scatterplot]

Two ways to extend splines to spatial data.

1) Additive splines:
- spline function of the X coordinate describes the pattern in X
- spline function of the Y coordinate describes the pattern in Y
- add them together
- depends on the axis directions; assumes the pattern runs along the axes

[Figure: Swiss rainfall data, observation locations coded into five classes: [0, 98.6], (98.6, 197.2], (197.2, 295.8], (295.8, 394.4], (394.4, 493]]
[Figure: Swiss rain, additive spline fit s(x) + s(y); scale 0 to 350]

2) Thin plate spline:
- think of a sheet of paper or a thin sheet of metal
- drape it over the data and allow it to wiggle
- models the pattern in all directions simultaneously
- not dependent on the axis directions
- requires much more data than additive splines

[Figure: Swiss rain, 2D thin plate spline fit s(x, y); scale -50 to 400]
Models for the data

- IDW: no model.
- Trend surface: Z(s) = β0 + f(s) + ε. All the spatial "action" is in f(s). Have to choose the form of f(s):
    β1 X + β2 Y
    β1 X + β2 Y + β3 X² + β4 Y² + β5 XY
    s(X) + s(Y)
    s(X, Y)
  Given the form of the model, can easily estimate the unknown parameters, e.g., β1, β2, or the parameters in s().
- Kriging (simple, ordinary): Z(s) = β0 + ε, where the ε are correlated, nearby observations more so. All the spatial "action" is in the correlations.
- Universal kriging: Z(s) = f(s) + ε, with ε correlated.

Kriging

Original motivation: underground gold mining.
- gold content is variable
- want to predict where the gold content is highest, and/or the average gold content in an area
- only a small fraction of the rock face is sampled
- prediction problem: predict Z(s0) at new locations, given the data

Danie Krige: treat Z(s) as a spatially correlated collection of random variables and derive the optimal predictor. Original paper: 1951, in a South African mining journal. The procedure is now known as kriging.

Simplest setup of the problem:

  Z(s) = β0 + ε ,  ε ~ N(0, Σ) ,  β0 and Σ known

In words: observations are spatially correlated random variables; the mean β0 is known; the covariances (or correlations) between all pairs of observations, Σ, are known.

Can derive: kriging is the optimal linear predictor. No other linear combination of the observations has a smaller variance.
- predictions are weighted averages of the observations
- the weights are functions of the spatial pattern
- when there is little spatial pattern, the prediction tends toward the regional average
- when there is strong spatial pattern, it tends toward a local average
- weights can be > 1 or < 0, so predictions can exceed the range of the observations
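The claim that kriging weights can be negative is easy to check on a tiny invented configuration: two collinear observations with a Gaussian covariance model (all numbers made up for illustration). The observation at 2 sits behind the observation at 1, seen from the prediction point at 0.

```python
import numpy as np

def gauss_cov(h):
    # assumed Gaussian covariance model, unit sill and range
    return np.exp(-h**2)

x = np.array([1.0, 2.0])        # two observations on a line
s0 = 0.0                        # predict at 0: the obs at 2 is "shadowed" by the obs at 1
Sigma = gauss_cov(np.abs(x[:, None] - x[None, :]))
sig0 = gauss_cov(np.abs(x - s0))
w = np.linalg.solve(Sigma, sig0)  # simple-kriging weights, w' = sigma' Sigma^{-1}
print(w)                          # the shadowed observation gets a negative weight
```

Because the weights need not lie in [0, 1], the kriging prediction can fall outside the range of the observed values, unlike IDW.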
Multivariate normal distributions

Usual "401" setup: Y_i ~ N(µ_i, σ²)
- mean µ_i for each observation: may be constant, dependent on treatment, or unique to each observation (e.g., β0 + β1 X_i)
- variance the same for each observation (no subscript on σ²)
- independent observations

Now we need to describe correlations among pairs of observations; Y_i ~ N(µ_i, σ²) is not sufficient. Towards that goal, collect all the observations in a vector:

  Y = [Y1 Y2 ... Yn]'

Collect the means into a vector: µ = [µ1 µ2 ... µn]'. The variance now becomes the variance-covariance (VC) matrix. Consider 3 random variables X, Y, and Z; the VC matrix is a 3 x 3 matrix:

        [ σ_X²   σ_XY   σ_XZ ]
  Σ  =  [ σ_XY   σ_Y²   σ_YZ ]
        [ σ_XZ   σ_YZ   σ_Z² ]

With n values of Y, the VC matrix is n x n.

Variances and covariances:
- diagonal values are variances: σ_X² = Var X = E(X - µ_X)²
- off-diagonal values are covariances: σ_XY = Cov(X, Y) = E(X - µ_X)(Y - µ_Y)
- Cov(X, Y) = Cov(Y, X), so the VC matrix is symmetric
- the variance of X is the covariance of X with itself: E(X - µ_X)² = E(X - µ_X)(X - µ_X)
- the correlation between X and Y is Cor(X, Y) = Cov(X, Y) / sqrt(Var X Var Y)
- Cov = 0 means Cor = 0
- in general, independence means Cov = 0; when the observations have a multivariate normal distribution, Cov = 0 also means independence

So we can write the VC matrix for "401" observations (independent, constant variance): all diagonal entries are σ², all off-diagonal entries are 0, i.e.,

  Σ = σ² I

where I is the identity matrix, the matrix equivalent of the scalar 1.
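The VC-matrix bookkeeping above is easy to verify numerically. A small sketch with invented covariances: check symmetry, convert covariances to correlations, and build the "401" matrix σ²I.

```python
import numpy as np

# a 3 x 3 variance-covariance matrix for (X, Y, Z), numbers invented
Sigma = np.array([[4.0, 1.2, 0.0],
                  [1.2, 9.0, 2.1],
                  [0.0, 2.1, 1.0]])
assert np.allclose(Sigma, Sigma.T)     # Cov(X,Y) = Cov(Y,X): symmetric

sd = np.sqrt(np.diag(Sigma))           # standard deviations from the diagonal
Cor = Sigma / np.outer(sd, sd)         # Cor(X,Y) = Cov(X,Y) / sqrt(VarX VarY)
print(Cor)

# the "401" case: independent, constant variance => Sigma = sigma^2 * I
sigma2 = 4.0
Sigma_iid = sigma2 * np.eye(5)
```

Here Cov(X, Y) = 1.2 with variances 4 and 9 gives a correlation of 1.2 / (2 x 3) = 0.2.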
Least squares regression, again

Model: Y_i = β0 + β1 X_i + ε_i. We can write down (but won't) equations to estimate β0 and β1; they do not generalize easily to more parameters, e.g., Z_i = β0 + β1 X_i + β2 Y_i + ε_i. Matrices simplify everything:

        [ 1  X1 ]
        [ 1  X2 ]          [ β0 ]
  X  =  [ 1  X3 ] ,  β  =  [ β1 ] ,  so the i'th row of Xβ is β0 + β1 X_i
        [ :  :  ]
        [ 1  Xn ]

Model: Y = Xβ + ε. X can have many columns, or just 1.

For any regression model:

  β̂_ols = (X'X)^{-1} X'Y

- ()^{-1} is the matrix inverse, the equivalent of the scalar reciprocal: (X'X)(X'X)^{-1} = I and (X'X)^{-1}(X'X) = I
- labeled OLS for ordinary least squares; we will see another type of LS soon (for correlated observations)

and

  Var β̂_ols = σ² (X'X)^{-1}

Example: constant mean, Y_i = µ + ε_i. Here X = [1 1 ... 1]' and β = [µ], so

  X'X = 1 + 1 + ... + 1 = n ,  (X'X)^{-1} = 1/n ,  X'Y = Y1 + Y2 + ... + Yn = ΣY

so β̂_ols = (X'X)^{-1} X'Y = ΣY / n, and Var µ̂ = σ² / n.

Kriging notation

Use vectors and matrices to describe the data Z(s), their means µ, and their variance-covariance matrix Σ:

                                                     [ σ1²  σ12  ...  σ1n ]
  Z(s) = [Z1 ... Zn]' ,  µ = [µ1 ... µn]' ,  Σ  =    [ σ12  σ2²  ...  σ2n ]
                                                     [  :    :          : ]
                                                     [ σ1n  σ2n  ...  σn² ]

P(s0) is a function that predicts Z(s0).

Kriging as making a good prediction

What should we choose for P(s0)? We want a "good" prediction, so we need to measure how good or how bad. In general, define a loss function, L(), that tells us how to measure good/bad.
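The matrix OLS formulas can be checked numerically. For the constant-mean example, β̂_ols = (X'X)^{-1} X'Y should reduce to the sample mean; the data here are simulated just to verify that.

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.normal(10.0, 2.0, size=25)

# constant-mean model Y_i = mu + eps_i: X is a single column of ones
X = np.ones((25, 1))
beta_ols = np.linalg.inv(X.T @ X) @ (X.T @ y)   # (X'X)^{-1} X'Y
# X'X = n and X'Y = sum(Y), so beta_ols is exactly the sample mean
print(beta_ols[0], y.mean())
```

The same two lines of linear algebra work unchanged when X gains more columns, which is the point of the matrix formulation.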
Kriging uses squared error loss:

  L(Z(s0), P(s0)) = (Z(s0) - P(s0))²

P(s0) depends on the data, so L() is a random variable; so define a good predictor as one that minimizes E L(). That predictor is E[Z(s0) | Z(s)].

Simple kriging

Kriging model: Z(s) = µ(s) + ε(s), where the ε(s) are correlated (spatial pattern) and µ(s) is known. Initially, assume Σ is known. Can derive:

  P(s0) = µ(s0) + σ' Σ^{-1} (Z(s) - µ(s))

- σ is the vector of covariances, Cov(Z(s0), Z(s)), between the prediction location and the observations
- Σ is the var-cov matrix of the observations

Connection to least squares regression: same loss function. We are used to writing Ŷ_i = β̂0 + β̂1 X_i; algebraically this is the same as Ŷ_i = Ȳ + σ_xy (σ_X²)^{-1} (X_i - X̄).

This is the best predictor if Z(s) is Gaussian, and the best linear predictor if Z(s) is not Gaussian. Can also estimate the prediction variance:

  σ²(s0) = σ² - σ' Σ^{-1} σ

This looks a bit unusual: we usually add variances. But σ' Σ^{-1} σ is large when the prediction is highly correlated with nearby values, which reduces the uncertainty in the prediction.

[Figure: a small example; observed values (9, 12, 10, 11, 11, 10, 2, 1, 0, 5, 7, 1, 6, 3) at scattered locations, with two prediction points marked]

Simple kriging: example

Understanding the prediction in a simple situation. Our population has a constant mean; our data are not all the same value, because of random variation.

[Figure: the same observed values plotted at their locations on the unit square]
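The simple-kriging predictor and its variance can be checked on a small made-up configuration; the locations, values, known mean, and exponential covariance model below are all invented for illustration.

```python
import numpy as np

def exp_cov(h, sigma2=1.0, rang=1.0):
    """Assumed exponential covariance model: sigma^2 * exp(-h / range)."""
    return sigma2 * np.exp(-h / rang)

locs = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [3.0, 3.0]])
z = np.array([9.0, 12.0, 10.0, 5.0])
mu = 9.0                                  # known mean (simple kriging)
s0 = np.array([0.4, 0.4])                 # prediction location

D = np.linalg.norm(locs[:, None, :] - locs[None, :, :], axis=2)
Sigma = exp_cov(D)                        # var-cov matrix of the observations
sig0 = exp_cov(np.linalg.norm(locs - s0, axis=1))  # covariances with s0

w = np.linalg.solve(Sigma, sig0)          # w' = sigma' Sigma^{-1}
pred = mu + w @ (z - mu)                  # P(s0) = mu + sigma' Sigma^{-1} (Z - mu)
pvar = exp_cov(0.0) - sig0 @ w            # sigma^2(s0) = sigma^2 - sigma' Sigma^{-1} sigma
print(pred, pvar)
```

One useful sanity check: predicting at an observed location returns the observed value exactly, with prediction variance 0; away from the data, the variance is between 0 and the sill σ².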
Simple kriging: example (continued)

The prediction is

  P(s0) = µ + σ' Σ^{-1} (Z(s) - µ)

This is a weighted average of the deviations from the mean µ:

  P(s0) = µ + w'(Z(s) - µ) ,  where the weights depend on the correlations: w' = σ' Σ^{-1}

It can be rewritten as a weighted average of the n Z(s) values and the mean µ:

  P(s0) = w' Z(s) + (1 - Σ_i w_i) µ

Look at these weights for predictions at two locations:
- blue location: close to measured locations
- red location: distant from all measured locations

[Figure: kriging weights at each observation for the two prediction locations. At the close-in (blue) location, a few nearby observations carry most of the weight (values such as 0.26, 0.27, 0.36, 0.15) and the mean gets almost no weight; at the distant (red) location, the observation weights are near 0 and the mean gets weight 0.85.]

Ordinary kriging

The problem with simple kriging is that µ(s0) is usually not known. Ordinary kriging estimates µ̂(s0). This changes the statistical properties slightly: O.K. is no longer the best linear predictor, but it is the best linear unbiased predictor.

  P(s0) = µ̂(s0) + σ' Σ^{-1} (Z(s) - µ̂)

where µ̂(s0) is estimated by GLS. For Y = Xβ + ε:

  OLS: β̂ = (X'X)^{-1} X'Y
  GLS: β̂ = (X' Σ^{-1} X)^{-1} X' Σ^{-1} Y

To estimate µ̂:

  µ̂ = (1' Σ^{-1} 1)^{-1} 1' Σ^{-1} Z(s)

A consequence of GLS is less weight on observations that are correlated with others.

[Figure: weights for the GLS estimate of the mean at 8 observation locations, e.g., 0.298, 0.121, -0.002, -0.029, 0.054, 0.089, 0.171, 0.298]

Useful insight: σ' Σ^{-1} is a row vector, so

  P(s0) = µ̂(s0) + λ'(Z(s) - µ̂)

- the values in λ depend on the covariances between the observed values and on the covariance between the prediction location and the observations
- values in λ are high for observations close to the prediction location
- values in λ may be negative, when observations are "shadowed" by closer ones

[Figure: ordinary kriging weights at 8 observation locations for one prediction point, e.g., 0.566 and 0.259 near the prediction location and small negative weights (-0.086, -0.061, -0.049) on shadowed observations]

Prediction variance:

  σ²(s0) = σ² - σ' Σ^{-1} σ + (1 - 1' Σ^{-1} σ)² / (1' Σ^{-1} 1)

i.e., the S.K. prediction variance plus additional variance because µ is estimated.

[Figure: Swiss rainfall, kriging predictions; scale 0 to 400]
[Figure: Swiss rainfall, kriging with shorter-range correlation; scale 0 to 500]
[Figure: Swiss rainfall, kriging with less spatial dependence; scale 100 to 300]
[Figure: Swiss rainfall, kriging with small spatial dependence; scale 160 to 240]
[Figure: Swiss rainfall, kriging with almost no spatial dependence; scale roughly 197.8 to 199.4, essentially constant]

Universal kriging

Generalize O.K. to any regression model for the local mean:

  Z(s) = X(s)β + ε(s) ,  i.e., trend + random variation

- there is no unique decomposition; generally, consider the trend to be the fixed, repeatable pattern and the random variation to be the non-repeatable pattern
- Q: Measure Z at 50 spatial locations. What is the sample size?
- A: ONE. You have one realization of that spatial pattern, which makes it very difficult to distinguish the fixed and random components.
- Operationally: the trend is the variability that can be predicted by X(s); the random variation is that which cannot. The choice of X(s) is really important!

Notice that spatial variation accounts for lack of fit to the trend model; these are two competing explanations. We defer that discussion until we talk about spatial linear models. You should be able to anticipate the predictor:

  P(s0) = X(s0) β̂_GLS + λ'(Z(s) - X(s) β̂_GLS)

and the prediction variance:

  σ²(s0) = σ² - σ' Σ^{-1} σ + a term for Var X(s)β̂

The term for Var X(s)β̂ is complicated and not very informative. Here

  β̂_GLS = (X(s)' Σ^{-1} X(s))^{-1} X(s)' Σ^{-1} Z(s)

Comparison of spatial prediction methods

Inverse distance weighting:
- more weight to nearby locations, weighted relative to the other nearby locations
- if there are no nearby locations, it will still average the more distant ones
- no easy way to estimate the uncertainty in a prediction

Trend surfaces:
- depend on the specified model form; the model is a global model
- splines are based on a global estimate of the smoothing parameter, although there are local extensions
- the estimate doesn't depend on the number of nearby locations

Kriging:
- based on correlations among observations, estimated from global properties
- big advantage: the estimate depends on the number of nearby locations
- nearby points: the prediction is more like the local average; no nearby points: more like the global average
- so the data determine how smooth the prediction surface is
- theory: the best predictor; my experience: that argument is not compelling, because the assumptions are never met
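As a closing sketch, the ordinary-kriging pieces discussed above (the GLS estimate of the mean, the weights λ, the predictor, and the prediction variance) fit in a few lines of numpy. The locations, values, and exponential covariance model are invented for illustration.

```python
import numpy as np

def exp_cov(h, sigma2=1.0, rang=1.0):
    """Assumed exponential covariance model: sigma^2 * exp(-h / range)."""
    return sigma2 * np.exp(-h / rang)

# made-up observations on the unit square
locs = np.array([[0.1, 0.2], [0.2, 0.25], [0.8, 0.7], [0.9, 0.1], [0.4, 0.9]])
z = np.array([9.0, 12.0, 10.0, 5.0, 7.0])
s0 = np.array([0.15, 0.22])                       # prediction location

D = np.linalg.norm(locs[:, None, :] - locs[None, :, :], axis=2)
Sigma = exp_cov(D)
sig0 = exp_cov(np.linalg.norm(locs - s0, axis=1))
ones = np.ones(len(z))

Si1 = np.linalg.solve(Sigma, ones)
mu_hat = (Si1 @ z) / (Si1 @ ones)                 # GLS mean: (1'S^-1 1)^-1 1'S^-1 Z
lam = np.linalg.solve(Sigma, sig0)                # lambda' = sigma' Sigma^{-1}
pred = mu_hat + lam @ (z - mu_hat)                # O.K. predictor

# O.K. prediction variance: the S.K. variance plus a penalty for estimating mu
pvar = exp_cov(0.0) - sig0 @ lam + (1 - ones @ lam) ** 2 / (Si1 @ ones)
print(mu_hat, pred, pvar)
```

As with simple kriging, predicting at an observed location reproduces the observed value with zero prediction variance; elsewhere, the extra term adds the uncertainty from estimating the mean.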