
Geostatistical data

Notation:
- s: location, a vector value. Usually s = (x, y) in some coordinate frame (e.g., long/lat or UTM). Written as a vector because the details of 1D (beach, line), 2D (earth surface), or 3D (ocean, soil, atmosphere) are not important.
- Z(s): a characteristic of location s, e.g., number of nesting turtles, soil pH, O2 concentration.

Characteristics of geostatistical data:
- Z(s) exists everywhere within the boundary of the study area.
- Generally, no sharp changes (jumps) in Z(s): Z(s_1) is probably different from Z(s_2), but the transition is smooth.
c Philip M. Dixon (Iowa State Univ.)
Spatial Data Analysis - Part 3
Spring 2016
1 / 63
Prediction

Could be done to fill in a grid so you can draw a map or use an image/contour plot, or because you need predictions at unmeasured points.

[Figure: map with observed values 3, 3, 1, 1, 2 and an unmeasured location marked X.]

What is Z at the location marked by X?
Prediction

One possibility: average Z over the entire region. Very common in non-spatial contexts.

But the 1st law of geography (Tobler): everything is related to everything else, but near things are more related than distant things. This principle is very important if there is a spatial trend or variation across the region.
Prediction

What if we had a bit more data?

[Figure: map with observed values 3, 20, 9, 3, 1, 2, 10, 1, 15, 7, 10 and the prediction location marked X.]

The overall average for the region is clearly inappropriate. Consider some form of local average.
We will discuss 3 methods:
Inverse distance weighting
Spatial trend model
Kriging
Inverse distance weighting
Notation: s_i is the location of the i'th observation, d_ij is the distance between locations i and j, Z(s_i) is the value of Z at s_i, and Ẑ(s_i) is the prediction of Z at s_i.

Ẑ(s_j) = Σ_i w_ij Z(s_i) / Σ_i w_ij, where w_ij = 1 / d_ij^a

a is an arbitrary parameter, commonly 1 or 2:
- a = 0 gives you the region average
- larger values give a "more local" estimate, because they emphasize shorter distances
If d_ij = 0, i.e., predicting at an observed location, use the observed value.
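A minimal sketch of this estimator in Python (hypothetical helper `idw_predict`; numpy assumed; the toy coordinates are invented for illustration):

```python
import numpy as np

def idw_predict(obs_xy, obs_z, pred_xy, a=2.0):
    """Inverse distance weighting: Z-hat(s_j) = sum_i w_ij Z(s_i) / sum_i w_ij,
    with w_ij = 1 / d_ij**a. Returns the observed value when d_ij = 0."""
    obs_xy, obs_z = np.asarray(obs_xy, float), np.asarray(obs_z, float)
    preds = []
    for p in np.atleast_2d(np.asarray(pred_xy, float)):
        d = np.sqrt(((obs_xy - p) ** 2).sum(axis=1))
        if np.any(d == 0):                     # predicting at an observed location
            preds.append(obs_z[d == 0][0])
            continue
        w = 1.0 / d ** a
        preds.append(np.sum(w * obs_z) / np.sum(w))
    return np.array(preds)

# toy example: values 3, 3, 1, 1, 2 at known locations, predict at X = (1, 1)
xy = [(0, 2), (2, 2), (0, 0), (1, 0), (2, 1)]
z = [3, 3, 1, 1, 2]
print(idw_predict(xy, z, [(1, 1)], a=2))   # a local weighted average
print(idw_predict(xy, z, [(1, 1)], a=0))   # a = 0 recovers the region mean, 2.0
```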
Inverse distance weighting
Characteristics:
- w_ij is always ≥ 0 and w_ij / Σ_i w_ij is always ≤ 1
- small values of w_ij are sometimes set to 0
- Ẑ(s_j) is always within the range of the observed values; some like this, others don't
Problem: you have to choose a. Some approaches: ad hoc (you like the resulting picture), or tradition (your field always uses 2 or 1.5 or ??). We demonstrate the role of a by comparing results for a = 2 and a = 1.
Swiss rainfall, IDW, power=2

[Figure: interpolated rainfall surface; color scale 0 to 400.]
Swiss rainfall, IDW, power=1

[Figure: interpolated rainfall surface; color scale 50 to 350.]
Swiss rainfall, IDW, power=0

[Figure: essentially constant surface near the regional mean; color scale 198.2 to 199.0.]
Swiss rainfall, IDW, difference

[Figure: difference between the power=2 and power=1 predictions; color scale −150 to 100.]
Swiss rain, IDW, comparison

[Figure: scatterplot of predictions, IDW power=1 (vertical axis, 0 to 400) vs. IDW power=2 (horizontal axis, 0 to 400).]
Spatial trend surface
Assume some function of the X and Y coordinates fits the data, often a low-order polynomial (linear or quadratic):

Z(s_i) = β0 + β1 X_i + β2 Y_i + ε_i, or
Z(s_i) = β0 + β1 X_i + β2 Y_i + β3 X_i² + β4 Y_i² + β5 X_i Y_i + ε_i

Only used for predicting Ẑ(s_i); we are not doing any test or inference, so don't worry about the correlation in the ε's. (Accounting for the correlation, i.e., using GLS, would give better estimates of the β̂'s.)

One potential advantage over inverse distance weighting: we can estimate Var ε, so we can construct prediction intervals for Ẑ(s_i):

Ẑ(s_i) ± t_{1−α/2} √(s² + Var Ẑ(s_i))
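A sketch of fitting the quadratic trend surface by least squares on synthetic data (numpy assumed; this is not the Swiss analysis itself, and the variable names are invented):

```python
import numpy as np

rng = np.random.default_rng(1)
x, y = rng.uniform(0, 1, 100), rng.uniform(0, 1, 100)
z = 2 + 3 * x - 1 * y + 0.5 * x * y + rng.normal(0, 0.1, 100)  # synthetic Z(s)

# design matrix for Z = b0 + b1 X + b2 Y + b3 X^2 + b4 Y^2 + b5 XY + eps
X = np.column_stack([np.ones_like(x), x, y, x**2, y**2, x * y])
beta, *_ = np.linalg.lstsq(X, z, rcond=None)

# predict on a grid, e.g. for an image/contour plot
gx, gy = np.meshgrid(np.linspace(0, 1, 50), np.linspace(0, 1, 50))
G = np.column_stack([np.ones(gx.size), gx.ravel(), gy.ravel(),
                     gx.ravel()**2, gy.ravel()**2, (gx * gy).ravel()])
zhat = (G @ beta).reshape(gx.shape)

# residual variance s^2, the ingredient for prediction intervals
resid = z - X @ beta
s2 = resid @ resid / (len(z) - X.shape[1])
```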
Spatial trend surface

Confidence and prediction intervals. Reminder: a fitted regression line can be interpreted two ways.

Predict the average Y at some new X:
- uncertainty only in the regression coefficients ("the line")
- if I collected a second set of n obs., how similar is Ŷ? Decreases as n increases
- se of a mean; confidence interval for Ŷ

Predict a new observation at some new X:
- uncertainty in both the line and the obs. around the line
- before sampling a new obs., how accurate is the prediction? Usually very similar to s (rMSE), never smaller
- sd of a predicted observation; prediction interval for Ŷ
Confidence interval: Swiss rain

[Figure: rainfall (0 to 600) vs. Easting (−150000 to 150000), fitted trend with 95% CI band.]
Prediction interval: Swiss rain

[Figure: rainfall (0 to 600) vs. Easting (−150000 to 150000), fitted trend with 95% PI band.]
Swiss rainfall, linear trend surface

[Figure: fitted linear trend surface; color scale 120 to 300.]
Swiss rainfall, quadratic trend surface

[Figure: fitted quadratic trend surface; color scale 50 to 250.]
Splines: more flexible regression functions
Concepts only. Details require a lot of intricate math and stat theory
Consider response Y and one predictor X
Sometimes a simple model is great
[Figure: scatterplot of y (2.0 to 4.5) vs. x (0 to 5) showing a roughly linear relationship.]
Splines: more flexible regression functions
And sometimes not
[Figure: scatterplot of y (0 to 12) vs. x (0 to 5) showing a strongly nonlinear relationship.]
Splines: more flexible regression functions
How to model the relationship between Y and X, especially to predict?
- If there is a subject-matter-based model, use it!
- Fit a higher-order polynomial (quadratic, cubic)
- Non-parametric regression: smooth the data
There are various NP regression methods; we focus on smoothing splines.
Concept: put together many models for small pieces of the data. You need to choose the number of small pieces:
- fewer pieces: smoother curve, closer to linear
- more pieces: wigglier curve, closer to the data
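The smoother/wigglier trade-off can be sketched with `scipy.interpolate.UnivariateSpline`, whose smoothing parameter `s` plays the role of the number of pieces (synthetic data; an illustration, not the slides' fitting code):

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 5, 80))
y = np.sin(2 * x) + rng.normal(0, 0.3, x.size)   # nonlinear signal + noise

# larger s allows a larger residual sum of squares: a smoother curve
smooth = UnivariateSpline(x, y, s=len(x) * 0.3**2)
# s = 0 interpolates: the wiggliest curve, it connects the dots
wiggly = UnivariateSpline(x, y, s=0)

# the wiggly fit tracks the noise; the smooth fit tracks the signal
print(np.mean((smooth(x) - y) ** 2), np.mean((wiggly(x) - y) ** 2))
```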
Spline fits

[Figure: the nonlinear data with two spline fits, one smoother and one wigglier.]
Spline fits
How do we choose how smooth/wiggly? Not easy.
- Too smooth is obviously bad.
- Extremely wiggly effectively connects the dots; also bad, because predictions of new obs. are inaccurate: we are overfitting the observed data, treating "noise" as signal.
One common solution: cross-validation. Leave out an obs., fit the model, predict the left-out obs.; put it back, leave out the next obs., and so on. The right choice of smoothness is the value that makes good predictions of all the left-out obs.
Splines require more data than when you know the model.
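The leave-one-out recipe above can be sketched directly (synthetic data; a minimal, assumed implementation using `UnivariateSpline`'s `s` as the smoothness knob; smoothing-spline software often uses generalized cross-validation instead of this literal loop):

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0, 5, 60))
y = np.sin(2 * x) + rng.normal(0, 0.3, x.size)

def loo_cv_score(s):
    """Mean squared error of predicting each left-out observation."""
    errs = []
    for i in range(len(x)):
        keep = np.arange(len(x)) != i
        fit = UnivariateSpline(x[keep], y[keep], s=s)  # fit without obs. i
        errs.append((fit(x[i]) - y[i]) ** 2)           # predict the left-out obs.
    return float(np.mean(errs))

candidates = [0.5, 2, 5, 10, 40]           # trial smoothing levels
scores = {s: loo_cv_score(s) for s in candidates}
best = min(scores, key=scores.get)         # best out-of-sample predictions
```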
Spline fits

[Figure: the nonlinear data with the chosen smoothing spline fit, 8.87 df.]
Spline fits
Two ways to extend to spatial data. First, additive splines:
- a spline function of the X coordinate describes the pattern in X
- a spline function of the Y coordinate describes the pattern in Y
- add them together
This depends on the axis directions: it assumes the pattern runs along the axes.
Swiss rainfall data

[Figure: map of measurement locations, with points classed by rainfall: [0,98.6], (98.6,197.2], (197.2,295.8], (295.8,394.4], (394.4,493].]
Swiss rain, spline fit: s(x) + s(y)

[Figure: additive spline surface; color scale 0 to 350.]
Spline fits
Second way to extend to spatial data: the thin plate spline.
- Think of a sheet of paper or a thin sheet of metal: drape it over the data and allow it to wiggle.
- It models the pattern in all directions simultaneously, so it does not depend on the axis directions.
- It requires much more data than additive splines.
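A thin plate spline surface can be fit with `scipy.interpolate.RBFInterpolator` (synthetic locations and values, not the Swiss data; the `smoothing` argument controls how closely the sheet follows the data):

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(3)
locs = rng.uniform(0, 1, (100, 2))                 # observation locations s_i
z = np.sin(3 * locs[:, 0]) + np.cos(3 * locs[:, 1]) + rng.normal(0, 0.1, 100)

# thin plate spline: drapes a surface over the data in all directions at once
tps = RBFInterpolator(locs, z, kernel='thin_plate_spline', smoothing=0.1)

# evaluate on a grid, e.g. for an image plot
gx, gy = np.meshgrid(np.linspace(0, 1, 40), np.linspace(0, 1, 40))
grid = np.column_stack([gx.ravel(), gy.ravel()])
surface = tps(grid).reshape(gx.shape)
```

Because the fit depends only on distances between locations, rotating the coordinate axes does not change the fitted surface.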
Swiss rain, spline fit: s(x,y)

[Figure: 2D thin plate spline surface; color scale −50 to 400.]
Models for data

IDW: no model.

Trend surface: Z(s) = β0 + f(s) + ε. All the spatial "action" is in f(s), and you have to choose the form of f(s):
- β1 X + β2 Y
- β1 X + β2 Y + β3 X² + β4 Y² + β5 XY
- s(X) + s(Y)
- s(X, Y)
Given the form of the model, the unknown parameters (e.g., β1, β2, or the parameters in s()) are easy to estimate.

Kriging (simple, ordinary): Z(s) = β0 + ε, where the ε are correlated, nearby observations more so. All the spatial "action" is in the correlations.

Universal kriging: Z(s) = f(s) + ε, with ε correlated.
Kriging
Original motivation: underground gold mining. Gold content is variable; you want to predict where the gold content is highest, and/or the average gold content in an area, from a sample of a small fraction of the rock face. The prediction problem: predict Z(s_0) at new locations given the data.

Danie Krige: treat Z(s) as a spatially correlated collection of random variables and derive the optimal predictor. Original paper: 1951, in a South African mining journal. The procedure is now known as kriging.
Kriging
Simplest setup of the problem:

Z(s) = β0 + ε, ε ~ N(0, Σ), with β0 and Σ known.

In words: the observations are spatially correlated random variables; the mean β0 is known; and the covariances (or correlations) between all pairs of obs., collected in Σ, are known.

Can derive: kriging is the optimal linear predictor. No other linear combination of the observations has a smaller variance.
- Predictions are weighted averages of the obs.
- The weights are functions of the spatial pattern: with little spatial pattern, the prediction → the regional average; with a strong spatial pattern, the prediction → a local average.
- Weights can be > 1 or < 0, so predictions can exceed the range of the observations.
Multivariate normal distributions

Usual "401" setup: Y_i ~ N(µ_i, σ²).
- The mean µ_i for each observation may be constant, dependent on treatment, or unique to each observation (e.g., β0 + β1 X_i).
- The variance is the same for each obs. (no subscript on σ²).
- Observations are independent.

Now we need to describe the correlations among pairs of observations; Y_i ~ N(µ_i, σ²) is not sufficient. Towards that goal, collect all the observations in a vector:

Y = [Y_1, Y_2, ..., Y_n]′
Multivariate normal distributions

Collect the means into a vector: µ = [µ_1, µ_2, ..., µ_n]′. The variance now becomes the variance-covariance (VC) matrix. For 3 random variables X, Y, and Z, the VC matrix is 3 x 3:

Σ = [ σ_X²   σ_XY   σ_XZ
      σ_XY   σ_Y²   σ_YZ
      σ_XZ   σ_YZ   σ_Z² ]

With n values of Y, the VC matrix is n x n.
Variances and covariances

- Diagonal values are variances: σ_X² = Var X = E(X − µ_X)².
- Off-diagonal values are covariances: σ_XY = Cov(X, Y) = E(X − µ_X)(Y − µ_Y).
- Cov(X, Y) = Cov(Y, X), so the VC matrix is symmetric.
- The variance of X is the covariance of X with itself: E(X − µ_X)² = E(X − µ_X)(X − µ_X).
- The correlation between X and Y is Cor(X, Y) = Cov(X, Y) / √(Var X · Var Y).
- Cov = 0 means Cor = 0. In general, independence implies Cov = 0; when the observations have a multivariate normal distribution, Cov = 0 also implies independence.
Multivariate normal distribution

So we can write the VC matrix for "401" observations (independent, constant variance):

Σ = [ σ²  0   ···  0             [ 1  0  ···  0
      0   σ²  ···  0        = σ²   0  1  ···  0     = σ² I
      ··· ···      ···             ···
      0   0   ···  σ² ]            0  0  ···  1 ]

I is the identity matrix, the matrix equivalent of the scalar 1.
Least squares regression, again

Model: Y_i = β0 + β1 X_i + ε_i.

We can write down (but won't) equations to estimate β0 and β1; they do not generalize easily to more parameters, e.g., Z_i = β0 + β1 X_i + β2 Y_i + ε_i. We can use matrices to simplify everything:

X = [ 1  X_1                           [ β0 + X_1 β1
      1  X_2           [ β0              β0 + X_2 β1
      1  X_3   ,  β =    β1 ]  ,  Xβ =   β0 + X_3 β1
      ···                                ···
      1  X_n ]                           β0 + X_n β1 ]

Model: Y = Xβ + ε. X can have many columns, or just one.
Least squares regression, again

For any regression model:

β̂_OLS = (X′X)⁻¹ X′Y

()⁻¹ is the matrix inverse, the equivalent of the scalar reciprocal: A A⁻¹ = I and A⁻¹ A = I.

This is labeled OLS for ordinary least squares; we will see another type of LS soon (for correlated obs.). And

Var β̂_OLS = σ² (X′X)⁻¹
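The matrix formulas can be checked numerically in a few lines (numpy; toy data; the last lines verify that with X a single column of ones the formula reduces to the sample mean Ȳ):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(0, 10, 30)
Y = 1.5 + 2.0 * x + rng.normal(0, 1, 30)

X = np.column_stack([np.ones_like(x), x])        # design matrix with intercept column
beta_ols = np.linalg.inv(X.T @ X) @ X.T @ Y      # (X'X)^-1 X'Y

# same answer from the library least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

sigma2_hat = np.sum((Y - X @ beta_ols) ** 2) / (len(Y) - 2)
var_beta = sigma2_hat * np.linalg.inv(X.T @ X)   # Var(beta-hat) = sigma^2 (X'X)^-1

# constant-mean model: X is a column of ones, so the formula reduces to Y-bar
ones = np.ones((len(Y), 1))
mu_hat = np.linalg.inv(ones.T @ ones) @ ones.T @ Y
```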
Least squares regression, again

Example: constant mean, Y_i = µ + ε_i. Here X = 1, a column of n ones, and β = [µ].

X′X = 1′1 = 1 + 1 + ··· = n, so (X′X)⁻¹ = 1/n
X′Y = 1′Y = Y_1 + Y_2 + ··· + Y_n = ΣY

So β̂_OLS = (X′X)⁻¹ X′Y = ΣY / n = Ȳ, and Var µ̂ = σ²/n.
Kriging notation

Use vectors and matrices to describe the data Z(s), their means µ, and their variance-covariance matrix Σ:

Z(s) = [Z_1, Z_2, ..., Z_n]′ ,  µ = [µ_1, µ_2, ..., µ_n]′

Σ = [ σ_1²   σ_12  ···  σ_1n
      σ_12   σ_2²  ···  σ_2n
      ···    ···        ···
      σ_1n   σ_2n  ···  σ_n² ]

P(s_0) is a function that predicts Z(s_0).
Kriging as making a good prediction

What should we choose for P(s_0)? We want a "good" prediction, so we need to measure how good or bad a prediction is. In general, define a loss function, L(), that tells us how to measure good/bad. Kriging uses squared error loss:

L(Z(s_0), P(s_0)) = (Z(s_0) − P(s_0))²

P(s_0) depends on the data, so L() is a random variable; so define a good predictor as one that minimizes E L(). That predictor is E[Z(s_0) | Z(s)].
Simple Kriging

Kriging model: Z(s) = µ(s) + ε(s), where the ε(s) are correlated (spatial pattern). µ(s) is known, and initially we assume Σ is known. Can derive:

P(s_0) = µ(s_0) + σ′ Σ⁻¹ (Z(s) − µ(s))

where σ is the vector of covariances Cov(Z(s_0), Z(s)) and Σ is the var-cov matrix of the observations.

Least squares regression uses the same loss function. We are used to writing Ŷ_i = β̂0 + β̂1 X_i; algebraically this is the same as Ŷ_i = Ȳ + σ_xy (σ_X²)⁻¹ (X_i − X̄).
Simple Kriging

This is the best predictor if Z(s) is Gaussian, and the best linear predictor if Z(s) is not Gaussian. Can also estimate the prediction variance:

σ²(s_0) = σ² − σ′ Σ⁻¹ σ

This looks a bit unusual: usually we add variances. σ′ Σ⁻¹ σ is large when the prediction location is highly correlated with nearby values, which reduces the uncertainty in the prediction.
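A numerical sketch of the simple kriging predictor and its variance (synthetic data; an assumed exponential covariance model stands in for the known Σ):

```python
import numpy as np

rng = np.random.default_rng(5)
locs = rng.uniform(0, 1, (25, 2))              # observation locations
mu, sill, corr_range = 10.0, 4.0, 0.3          # known mean; covariance parameters

def cov(a, b):
    """Exponential covariance sill * exp(-d / range) between two location sets."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    return sill * np.exp(-d / corr_range)

Sigma = cov(locs, locs)
L = np.linalg.cholesky(Sigma + 1e-10 * np.eye(25))
z = mu + L @ rng.normal(size=25)               # one realization of Z(s)

s0 = np.array([[0.5, 0.5]])                    # prediction location
sigma_vec = cov(locs, s0).ravel()              # Cov(Z(s0), Z(s))

w = np.linalg.solve(Sigma, sigma_vec)          # sigma' Sigma^-1 as weights
pred = mu + w @ (z - mu)                       # P(s0) = mu + sigma' Sigma^-1 (Z - mu)
pred_var = sill - sigma_vec @ w                # sigma^2 - sigma' Sigma^-1 sigma
```

At an observed location the weights collapse to 1 on that observation, so the prediction equals the observed value and the prediction variance is 0.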
Simple Kriging

[Figure: two copies of a map of 12 observed values (0 to 12) on the unit square, with a prediction location marked on each; one close to the data, one distant.]
Simple Kriging: example

Understanding the prediction in a simple situation. Our population has a constant mean; our data are not all the same value because of random variation.

[Figure: unit square with observed values 9, 12, 10, 11, 11, 10, 2, 1, 0, 5, 6, 3, 7, 1 at their locations.]
Simple Kriging: example

The prediction is:

P(s_0) = µ + σ′ Σ⁻¹ (Z(s) − µ)

This is a weighted average of the deviations from the mean µ:

P(s_0) = µ + w (Z(s) − µ)

where the weights w = σ′ Σ⁻¹ depend on the correlations. We can rewrite it as a weighted average of the n Z(s) values and the mean µ:

P(s_0) = w Z(s) + (1 − Σw) µ

Look at those weights for predictions at two locations: a blue location close to measured locations, and a red location distant from all measured locations.
Simple Kriging: example

[Figure: kriging weights at each observed location for the two prediction locations. Close location: weight concentrated on nearby obs. (0.36, 0.27, 0.26, 0.15), labeled Ave 0.85. Distant location: all weights near 0, labeled Ave −0.01.]
Ordinary Kriging

The problem with simple kriging is that µ(s_0) is usually not known. Ordinary kriging estimates µ̂(s_0). It has slightly different statistical properties: it is no longer the best linear predictor, but O.K. is the best linear unbiased predictor.

P(s_0) = µ̂(s_0) + σ′ Σ⁻¹ (Z(s) − µ̂)

where µ̂(s_0) is estimated by GLS. For Y = Xβ + ε, OLS gives β̂ = (X′X)⁻¹ X′Y; in general, GLS gives

β̂ = (X′ Σ⁻¹ X)⁻¹ X′ Σ⁻¹ Y

To estimate the mean: µ̂ = (1′ Σ⁻¹ 1)⁻¹ 1′ Σ⁻¹ Z(s).

A consequence of GLS is less weight on obs. that are correlated with other obs. Picture on next slide.
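The GLS estimate of the mean can be sketched the same way (synthetic data; an assumed exponential covariance model; the weights sum to 1, and clustered observations share their weight):

```python
import numpy as np

rng = np.random.default_rng(6)
locs = rng.uniform(0, 1, (30, 2))
d = np.linalg.norm(locs[:, None, :] - locs[None, :, :], axis=2)
Sigma = 2.0 * np.exp(-d / 0.4)                  # assumed covariance model

L = np.linalg.cholesky(Sigma + 1e-10 * np.eye(30))
z = 5.0 + L @ rng.normal(size=30)               # one realization with true mean 5

ones = np.ones(30)
Si = np.linalg.solve(Sigma, ones)               # Sigma^-1 1
mu_gls = (Si @ z) / (Si @ ones)                 # (1' Sigma^-1 1)^-1 1' Sigma^-1 Z
w_gls = Si / (Si @ ones)                        # per-observation weights; sum to 1
```

Individual weights can be negative, as in the picture on the next slide.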
Wts for GLS est of mean

[Figure: unit square showing the GLS weights at each observed location: 0.298, 0.121, −0.002, −0.029, 0.054, 0.089, 0.171, 0.298; clustered observations get less weight.]
Ordinary Kriging

Useful insight: σ′ Σ⁻¹ is a row vector of weights λ′, so

P(s_0) = µ̂(s_0) + λ′ (Z(s) − µ̂)

The values in λ depend on the covariances between the observed values and the covariances between the prediction location and the observed values: high for obs. close to the prediction location. Values in λ may be negative, when obs. are "shadowed". Picture on next slide.

Prediction variance:

σ²(s_0) = σ² − σ′ Σ⁻¹ σ + (1 − 1′ Σ⁻¹ σ)² / (1′ Σ⁻¹ 1)

i.e., the S.K. prediction variance plus additional variance because we estimate µ.
Ordinary Kriging wts

[Figure: unit square showing the ordinary kriging weights for one prediction location: 0.566 and 0.259 on the nearest obs.; small negative weights (−0.086, −0.061, −0.049) on shadowed obs.]
Swiss rainfall

[Figure: kriging predictions; color scale 0 to 400.]
Swiss rainfall

[Figure: kriging predictions with shorter-range correlation; color scale 0 to 500.]
Swiss rainfall

[Figure: kriging predictions with less spatial dependence; color scale 100 to 300.]
Swiss rainfall

[Figure: kriging predictions with small spatial dependence; color scale 160 to 240.]
Swiss rainfall

[Figure: kriging predictions with almost no spatial dependence; nearly constant surface, color scale 197.8 to 199.4.]
Universal Kriging

Generalize O.K. to any regression model for the local mean. Model: Z(s) = X(s)β + ε(s), i.e., trend + random variation.

There is no unique decomposition. Generally we consider the trend to be the fixed, repeatable pattern, and the random variation to be the non-repeatable pattern.

Measure Z at 50 spatial locations. Q: What is the sample size?
A: ONE. You have one realization of that spatial pattern.

This makes it very difficult to distinguish the fixed and random components. Operationally: the trend is the variability that can be predicted by X(s); the random variation is that which cannot. The choice of X(s) is really important!
Universal Kriging

Notice that spatial variation accounts for lack of fit to the trend model: two competing explanations. We defer that discussion until we talk about spatial linear models.

You should be able to anticipate the predictor:

P(s_0) = X(s_0) β̂_GLS + λ′ (Z(s) − X(s) β̂_GLS)

and the prediction variance:

σ²(s_0) = σ² − σ′ Σ⁻¹ σ + a term for Var X(s)β̂

The term for Var X(s)β̂ is complicated and not too informative. Here

β̂_GLS = (X(s)′ Σ⁻¹ X(s))⁻¹ X(s)′ Σ⁻¹ Z(s)
Comparison of spatial prediction methods

Inverse distance weighting:
- more weight to nearby locations
- weights are relative to the other nearby locations
- if there are no other nearby locations, it will still average the more distant ones
- no easy way to estimate the uncertainty of a prediction

Trend surfaces:
- depend on the specified model form
- the model is a global model
- splines are based on a global estimate of the smoothing parameter (although there are local extensions)
- the estimate doesn't depend on the number of nearby locations
Comparison of spatial prediction methods

Kriging:
- based on the correlations among observations, estimated from global properties
- big advantage: the estimate depends on the number of nearby locations
  - nearby points: the prediction is more like the local average
  - no nearby points: the prediction is more like the global average
  - and the data determine how smooth the surface is
- theory: the best predictor
- my experience: that claim is not compelling, because the assumptions are never met