Lecture #5: Small geographic area estimation, kriging, and kernel smoothing

Lecture #5: MAPS WITH GAPS -- Small geographic area estimation, kriging, and kernel smoothing
Spatial statistics in practice
Center for Tropical Ecology and Biodiversity,
Tunghai University & Fushan Botanical Garden
Topics for today's lecture
• The E-M algorithm
• The spatial E-M algorithm
• Kriging in ArcGIS
• Geographically weighted regression (GWR)
• Approaches to map smoothing
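THEOREM 1
When missing values occur only in a response variable, Y, the iterative solution to the EM algorithm produces the regression coefficients calculated with only the complete data.
PF: Let b denote the vector of regression coefficients that is converged upon. Then if $\hat{Y}_m = X_m b$,
$$ b = \left[ \begin{pmatrix} X_o \\ X_m \end{pmatrix}^T \begin{pmatrix} X_o \\ X_m \end{pmatrix} \right]^{-1} \begin{pmatrix} X_o \\ X_m \end{pmatrix}^T \begin{pmatrix} Y_o \\ X_m b \end{pmatrix} = (X_o^T X_o + X_m^T X_m)^{-1} (X_o^T Y_o + X_m^T X_m b) $$
Premultiplying both sides by $(X_o^T X_o + X_m^T X_m)$ and cancelling $X_m^T X_m b$ leaves $X_o^T X_o b = X_o^T Y_o$, so
$$ b = (X_o^T X_o)^{-1} X_o^T Y_o = b_o $$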
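Theorem 1 can be checked numerically. The following is a minimal sketch, assuming simulated data and NumPy; the function name `em_missing_response`, the sample size, and the positions of the missing values are all hypothetical choices made for illustration, not part of the lecture.

```python
import numpy as np

def em_missing_response(X, y, missing, n_iter=200, tol=1e-12):
    """Alternate between imputing missing y with X_m b and refitting OLS on all rows."""
    y_work = np.where(missing, 0.0, y)        # missing slots start at zero
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        b_new, *_ = np.linalg.lstsq(X, y_work, rcond=None)   # OLS on the filled-in data
        y_work[missing] = X[missing] @ b_new                 # re-impute with current fit
        converged = np.max(np.abs(b_new - b)) < tol
        b = b_new
        if converged:
            break
    return b

# Simulated data (hypothetical): intercept plus one covariate, three missing responses
rng = np.random.default_rng(0)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([2.0, -1.5]) + rng.normal(scale=0.5, size=n)
missing = np.zeros(n, dtype=bool)
missing[[3, 11, 20]] = True

b_em = em_missing_response(X, y, missing)
b_o, *_ = np.linalg.lstsq(X[~missing], y[~missing], rcond=None)   # complete-data OLS
print(b_em, b_o)
```

Under these assumptions the two coefficient vectors printed should agree to numerical precision, which is what the theorem states.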
THEOREM 2
When missing values occur only in a response variable, Y, then by replacing the missing values with zeroes and introducing a binary 0/-1 indicator variable covariate $-I_m$ for each missing value m, such that $I_m$ is 0 for all but missing value observation m and 1 for missing value observation m, the estimated regression coefficient $b_m$ is equivalent to the point estimate for a new observation, and hence furnishes EM algorithm imputations.
PF: Let $b_m$ denote the vector of regression coefficients for the missing values, and partition the data matrices such that
$$ \begin{pmatrix} b_o \\ b_m \end{pmatrix} = \left[ \begin{pmatrix} X_o & 0_{om} \\ X_m & -I_{mm} \end{pmatrix}^T \begin{pmatrix} X_o & 0_{om} \\ X_m & -I_{mm} \end{pmatrix} \right]^{-1} \begin{pmatrix} X_o & 0_{om} \\ X_m & -I_{mm} \end{pmatrix}^T \begin{pmatrix} Y_o \\ 0_m \end{pmatrix} $$
$$ = \begin{pmatrix} (X_o^T X_o)^{-1} & (X_o^T X_o)^{-1} X_m^T \\ X_m (X_o^T X_o)^{-1} & I_{mm} + X_m (X_o^T X_o)^{-1} X_m^T \end{pmatrix} \begin{pmatrix} X_o^T Y_o \\ 0_m \end{pmatrix} $$
$$ \Rightarrow\; b_o = (X_o^T X_o)^{-1} X_o^T Y_o, \quad \text{and} \quad b_m = X_m b_o $$
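A short numerical check of Theorem 2 follows. It is a sketch under stated assumptions: the data are simulated, and the variable names (`neg_I`, `Z`, `miss`) are hypothetical labels for the -I_m indicator columns, the augmented design matrix, and the missing-value positions.

```python
import numpy as np

# Simulated data (hypothetical): 12 observations, 2 of them treated as missing
rng = np.random.default_rng(1)
n, n_m = 12, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.8]) + rng.normal(scale=0.3, size=n)
miss = np.array([4, 9])
obs = np.setdiff1d(np.arange(n), miss)

# Replace the missing responses with zeroes
y0 = y.copy()
y0[miss] = 0.0

# One 0/-1 indicator column per missing value
neg_I = np.zeros((n, n_m))
neg_I[miss, np.arange(n_m)] = -1.0

# Single OLS fit on the augmented design [X, -I_m]
Z = np.column_stack([X, neg_I])
coef, *_ = np.linalg.lstsq(Z, y0, rcond=None)
b_o_aug, b_m = coef[:2], coef[2:]

# Complete-data OLS for comparison
b_o, *_ = np.linalg.lstsq(X[obs], y[obs], rcond=None)
print(b_m, X[miss] @ b_o)
```

The indicator coefficients `b_m` should match the complete-data predictions `X[miss] @ b_o`, so a single regression run yields the imputations without iterating.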
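The EM algorithm solution
$$ Y = X\beta - \sum_{m=1}^{M} y_m I_m + \varepsilon $$
$$ \begin{pmatrix} Y_o \\ 0_m \end{pmatrix} = \begin{pmatrix} 1_o & X_o \\ 1_m & X_m \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \end{pmatrix} + \begin{pmatrix} 0_{o,m} \\ -I_{m,m} \end{pmatrix} y_m + \begin{pmatrix} \varepsilon_o \\ 0_m \end{pmatrix} $$
where:
the missing values are replaced by 0 in Y, and
$I_m$ is an indicator variable for missing value m that contains n-1 0s and a single 1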
THEOREM 3
For imputations computed based upon Theorem 2, each standard error of the estimated regression coefficients $b_m$ is equivalent to the conventional standard deviation used to construct a prediction interval for a new observation, and as such furnishes the corresponding EM algorithm imputation standard error.
PF:
$$ \hat{\sigma}_\varepsilon^2 \left[ \begin{pmatrix} X_o & 0_{om} \\ X_m & -I_{mm} \end{pmatrix}^T \begin{pmatrix} X_o & 0_{om} \\ X_m & -I_{mm} \end{pmatrix} \right]^{-1} = \hat{\sigma}_\varepsilon^2 \begin{pmatrix} (X_o^T X_o)^{-1} & (X_o^T X_o)^{-1} X_m^T \\ X_m (X_o^T X_o)^{-1} & I_{mm} + X_m (X_o^T X_o)^{-1} X_m^T \end{pmatrix} $$
$$ \Rightarrow\; s_{b_m} = \sqrt{ \operatorname{diag}\left[ I_{mm} + X_m (X_o^T X_o)^{-1} X_m^T \right] \hat{\sigma}_\varepsilon^2 } $$
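Theorem 3 can be illustrated the same way. This sketch assumes simulated data and compares the standard errors of the indicator coefficients from the augmented regression with the textbook prediction standard deviation computed from the complete cases only; all names and data values are hypothetical.

```python
import numpy as np

# Simulated data (hypothetical): 15 observations, 3 treated as missing
rng = np.random.default_rng(2)
n, p = 15, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([0.5, 1.2]) + rng.normal(scale=0.4, size=n)
miss = np.array([2, 7, 13])
obs = np.setdiff1d(np.arange(n), miss)

# Augmented regression of Theorem 2: zeros in Y, one -1 indicator per missing value
y0 = y.copy()
y0[miss] = 0.0
neg_I = np.zeros((n, len(miss)))
neg_I[miss, np.arange(len(miss))] = -1.0
Z = np.column_stack([X, neg_I])
coef, *_ = np.linalg.lstsq(Z, y0, rcond=None)
resid = y0 - Z @ coef
sigma2 = resid @ resid / (n - Z.shape[1])           # error variance estimate
se_bm = np.sqrt(sigma2 * np.diag(np.linalg.inv(Z.T @ Z)))[p:]

# Conventional prediction standard deviation from the observed rows only
Xo, yo = X[obs], y[obs]
b_o, *_ = np.linalg.lstsq(Xo, yo, rcond=None)
resid_o = yo - Xo @ b_o
sigma2_o = resid_o @ resid_o / (len(obs) - p)
XtX_inv = np.linalg.inv(Xo.T @ Xo)
h = np.einsum("ij,jk,ik->i", X[miss], XtX_inv, X[miss])   # x_m (Xo'Xo)^-1 x_m'
se_pred = np.sqrt(sigma2_o * (1.0 + h))
print(se_bm, se_pred)
```

Both error variance estimates coincide here because the augmented fit reproduces the missing rows exactly and uses n - p - n_m degrees of freedom, which equals the complete-case count minus p.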
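What is the set of equations for the following case?
$$ Y = 1\alpha - \sum_{m=1}^{M} y_m I_m + \varepsilon $$
Data: 10, 7, 7, y4 = ?
$$ \begin{pmatrix} 10 \\ 7 \\ 7 \\ 0 \end{pmatrix} = \begin{pmatrix} 8 \\ 8 \\ 8 \\ 8 \end{pmatrix} - y_4 \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix} + \begin{pmatrix} 2 \\ -1 \\ -1 \\ 0 \end{pmatrix} $$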
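The four-observation case above can be solved directly as a tiny regression. This is a sketch assuming NumPy; the design matrix `Z` holds the constant and the -I_4 indicator, following the Theorem 2 setup.

```python
import numpy as np

# Observed 10, 7, 7; the missing fourth value is replaced by 0
y0 = np.array([10.0, 7.0, 7.0, 0.0])

# Columns: constant term and the 0/-1 indicator for observation 4
Z = np.column_stack([np.ones(4), [0.0, 0.0, 0.0, -1.0]])

(alpha_hat, y4_hat), *_ = np.linalg.lstsq(Z, y0, rcond=None)
residuals = y0 - Z @ np.array([alpha_hat, y4_hat])
print(alpha_hat, y4_hat, residuals)
```

The fit returns the observed mean, 8, for both the intercept and the imputed y4, with residuals 2, -1, -1 and 0.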
Some preliminary assessments
Calculations from ANCOVA regression and the EM algorithm
| Data source | Page | Quantity | Reported value | OLS/NLS estimate |
|---|---|---|---|---|
| McLachlan & Krishnan (1997) | p. 49 | μ̂_2 | 14.61523 | 14.61523 |
| | | σ̂_12 | 20.88516 | 208.85156/10 |
| | | σ̂_22 | 26.75405 | 230.875/8 + 0.51953^2 (402/10 - 384/8) = 26.75407 |
| | p. 53 | ŷ(1,1) | 429.6978 | 429.69767 |
| | | ŷ(0,1) | 324.0233 | 324.02326 |
| | p. 54 | ŷ_23 | 4.73 | 4.73030 |
| | | ŷ_51 | 3.598 | 3.59697 |
| Little & Rubin (1987) | p. 31 | u1 estimate | 7.8549 | 7.85492 |
| | | u2 estimate | 7.9206 | 7.92063 |
| | p. 101 | σ̂^2 | 49.3333 | 49.33333 |
| | p. 118 | μ̂(x_1) | 6.655 | 6.65518 |
| | | μ̂(x_2) | 49.965 | 49.96524 |
| | | μ̂(x_4) | 27.047 | 27.03739 |
Calculations from ANCOVA regression and the EM algorithm
| Data source | Page | Quantity | Reported value | OLS/NLS estimate |
|---|---|---|---|---|
| Schafer (1997) | p. 43 | [symbol illegible] | 48.1000 | 48.10000 |
| | | [symbol illegible] | 59.4260 | 59.42600 |
| | p. 195 (simulations) | ŷ_{3,2} average (n=5) | 226.2 | 228.0 (se = 32.86) |
| | | ŷ_{3,4} average (n=5) | 146.8 | 146.2 (se = 38.37) |
| | | ŷ_{3,5} average (n=5) | 190.8 | 192.5 (se = 34.11) |
| | | ŷ_{3,10} average (n=5) | 250.2 | 271.7 (se = 36.20) |
| | | ŷ_{3,13} average (n=5) | 234.2 | 241.3 (se = 35.18) |
| | | ŷ_{3,16} average (n=5) | 269.2 | 269.9 (se = 34.53) |
| | | ŷ_{3,18} average (n=5) | 192.4 | 201.9 (se = 32.91) |
| | | ŷ_{3,23} average (n=5) | 215.6 | 207.4 (se = 33.09) |
| | | ŷ_{3,25} average (n=5) | 250.0 | 255.7 (se = 33.39) |
Simulated imputations: $\hat{Y}_{reported} = 0.044 + 0.987\, Y_{imputed}$; $R^2 = 0.99$
EM algorithm solution for aggregated georeferenced data: vandalized turnips plots
MTB > regress c4 8 c7-c14
Regression Analysis: C4 versus C7, C8, C9, C10, C11, C12, C13, C14
The regression equation is
C4 = 28.9 - 6.32 C7 - 18.2 C8 - 1.10 C9 - 11.4 C10
     - 10.1 C11 + 28.9 C12 + 18.8 C13 + 27.8 C14

Predictor            Coef   SE Coef       T       P
Constant           28.900     2.404   12.02   0.000
C7 [I1-I6]         -6.317     3.254   -1.94   0.063
C8 [I2-I6]        -18.200     3.254   -5.59   0.000
C9 [I3-I6]         -1.100     3.399   -0.32   0.749
C10 [I4-I6]       -11.400     3.254   -3.50   0.002
C11 [I5-I6]       -10.100     3.399   -2.97   0.006
C12 [plot(6,5)]    28.900     5.887    4.91   0.000
C13 [plot(5,6)]    18.800     5.887    3.19   0.004
C14 [plot(6,6)]    27.800     5.887    4.72   0.000

Analysis of Variance for C4
Source    DF       SS      MS      F      P
C5         5   1289.0   257.8   8.92  0.000
Error     27    779.9    28.9
Total     32   2068.9

Level    N     Mean   StDev
1        5   28.900   4.407
2        6   22.583   6.391
3        6   10.700   2.585
4        5   27.800   5.082
5        6   17.500   6.648
6        5   18.800   5.922
Pooled StDev = 5.375

[Plot: individual 95% CIs for the six level means, based on pooled StDev; horizontal axis marked at 8.0, 16.0, 24.0, 32.0]
Residual spatial autocorrelation
What does this mean?
SAR-based missing data estimation
$$ Y = \rho W Y + (I - \rho W) X\beta - \sum_{m=1}^{M} y_m (I_m - \rho W^*_{om}) + \varepsilon $$
where $y_m$ is a missing value (replaced by 0 in Y),
$I_m$ is an indicator variable for $y_m$, and
$W^*_{om}$ is the mth column of geographic weights matrix W
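The Jacobian term
[Equation: the Jacobian term J, the determinant of the partitioned matrix with blocks $V_{oo}$, $V_{om}$, $V_{mo}$, $V_{mm}$ raised to a power, evaluated as a multiple $2/(n - n_m)$ of the sums $\sum_{i=1}^{n} LN(1 - \rho\lambda_i)$ and $\sum_{k=1}^{n_m} LN(1 - \rho\omega_k)$]
NOTE: denominator becomes (n - n_m)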
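What is the set of equations for the following case?
Three locations in a row, with observed values 7 and 10 and the middle value missing: Y2 = ?
$$ Y = \rho W Y + (I - \rho W) 1\mu - \sum_{m=1}^{M} y_m (I_m - \rho W^*_{om}) + \varepsilon $$
$$ 10 = \hat{\rho} \cdot 0 + \hat{\mu}(1 - \hat{\rho}) + \hat{\rho} y_2 + e_1 $$
$$ 0 = \hat{\rho}\, \frac{y_1 + y_3}{2} + \hat{\mu}(1 - \hat{\rho}) - y_2 + e_2 $$
$$ 7 = \hat{\rho} \cdot 0 + \hat{\mu}(1 - \hat{\rho}) + \hat{\rho} y_2 + e_3 $$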
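spatial autoregressive (AR)
$$ \begin{pmatrix} Y_o \\ 0_m \end{pmatrix} = \rho \begin{pmatrix} W_{oo} & W_{om} \\ W_{mo} & W_{mm} \end{pmatrix} \begin{pmatrix} Y_o \\ Y_m \end{pmatrix} - \begin{pmatrix} 0_o \\ I_m \end{pmatrix} Y_m + (1 - \rho)\alpha \begin{pmatrix} 1_o \\ 1_m \end{pmatrix} + \begin{pmatrix} X_o \\ X_m \end{pmatrix} \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_k \end{pmatrix} + \begin{pmatrix} \varepsilon \\ 0 \end{pmatrix} $$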
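kriging
$$ \hat{Y}_m = X_m \hat{\beta} + \hat{\Sigma}_{mo} \hat{\Sigma}_{oo}^{-1} (Y_o - X_o \hat{\beta}) $$
Slide annotations: the covariance terms $\hat{\Sigma}_{mo}$ and $\hat{\Sigma}_{oo}$ are estimated with the semivariogram model; the semivariogram model is fit with the residuals $Y_o - X_o \hat{\beta}$.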
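The kriging predictor can be sketched in a few lines. The following assumes an exponential covariance function with hypothetical sill and range values standing in for a fitted semivariogram model, a generalized least squares estimate for the regression coefficients, and simulated coordinates; none of these choices come from the lecture.

```python
import numpy as np

def exp_cov(d, sill=1.0, range_=2.0):
    """Exponential covariance (hypothetical parameters, no nugget)."""
    return sill * np.exp(-d / range_)

def krige(coords_o, y_o, X_o, coords_m, X_m, cov=exp_cov):
    """Y_m-hat = X_m b + S_mo S_oo^{-1} (Y_o - X_o b), with b estimated by GLS."""
    d_oo = np.linalg.norm(coords_o[:, None] - coords_o[None, :], axis=-1)
    d_mo = np.linalg.norm(coords_m[:, None] - coords_o[None, :], axis=-1)
    S_oo, S_mo = cov(d_oo), cov(d_mo)
    Si_X = np.linalg.solve(S_oo, X_o)
    Si_y = np.linalg.solve(S_oo, y_o)
    beta = np.linalg.solve(X_o.T @ Si_X, X_o.T @ Si_y)      # GLS coefficients
    resid = y_o - X_o @ beta
    return X_m @ beta + S_mo @ np.linalg.solve(S_oo, resid)

# Simulated example: 8 observed sites, constant mean, one unobserved site
rng = np.random.default_rng(3)
coords_o = rng.uniform(0, 10, size=(8, 2))
y_o = rng.normal(10, 2, size=8)
X_o = np.ones((8, 1))
coords_m = np.array([[5.0, 5.0]])
X_m = np.ones((1, 1))

y_m_hat = krige(coords_o, y_o, X_o, coords_m, X_m)
print(y_m_hat)
```

Because no nugget is included, predicting at an observed site returns the observed value exactly, which is a quick sanity check on the implementation.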
The pure spatial autocorrelation CAR model
$$ \hat{Y}_m = 1_m \hat{\beta}_0 + \hat{\rho} (I - \hat{\rho} C_{mm})^{-1} C_{mo} (Y_o - 1_o \hat{\beta}_0) $$
NOTE: exactly the same algebraic structure as the kriging equation
Dispersed missing values:
$$ \hat{Y}_m = 1_m \hat{\beta}_0 + \hat{\rho}\, C_{mo} (Y_o - 1_o \hat{\beta}_0) $$
Imputation = the observed mean plus a weighted average of the surrounding residuals
Employing rook's adjacency and a CAR model, what is the equation for the following imputation?
10    3         7
6     y5 = ?    4
9     5         5
$$ \hat{y}_5 = b_0 + \hat{\rho} \left[ (3 - b_0) + (6 - b_0) + (4 - b_0) + (5 - b_0) \right] $$
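The rook's adjacency imputation can be evaluated numerically once values for b_0 and the autocorrelation parameter are supplied. In this sketch both are assumptions: b_0 is set to the mean of the eight observed cells, and rho = 0.2 is a purely hypothetical value, since the slide gives neither estimate. The function name `car_impute` is also hypothetical.

```python
import numpy as np

grid = np.array([[10.0, 3.0, 7.0],
                 [6.0, np.nan, 4.0],
                 [9.0, 5.0, 5.0]])

def car_impute(grid, row, col, b0, rho):
    """b0 plus rho times the sum of rook-neighbor residuals (dispersed missing value)."""
    total = 0.0
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):      # rook neighbors only
        r, c = row + dr, col + dc
        inside = 0 <= r < grid.shape[0] and 0 <= c < grid.shape[1]
        if inside and not np.isnan(grid[r, c]):
            total += grid[r, c] - b0
    return b0 + rho * total

b0 = float(np.nanmean(grid))   # assumption: observed mean, 49/8 = 6.125
rho = 0.2                      # hypothetical autocorrelation estimate
y5_hat = car_impute(grid, 1, 1, b0, rho)
print(y5_hat)
```

With these assumed values the four neighbors 3, 6, 4 and 5 pull the imputation below the observed mean, because their residuals sum to a negative number.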
The spatial filter EM algorithm solution
$$ Y = X\beta_X - \sum_{m=1}^{M} y_m I_m + E_k \beta_{E_k} + \varepsilon $$
where:
the missing values are replaced by 0 in Y, and
$I_m$ is an indicator variable for missing value m that contains n-1 0s and a single 1
Imputation of turnip production in 3 vandalized field plots
| Field plot | Conventional EM estimate | Spatial SAR-EM estimate (ρ̂_SAR = 0.443) | Spatial filter: 3 selected eigenvectors |
|---|---|---|---|
| (6,5) | 28.9 | 29.99 | 24.31 |
| (5,6) | 18.8 | 17.66 | 13.62 |
| (6,6) | 27.8 | 28.26 | 23.93 |
Cressie's PA coal ash
Observed data: min 7.00, mean 9.78, max 17.61
| Model | Estimate |
|---|---|
| Cressie | 10.27% |
| Spherical | 10.62% |
| Gaussian | 10.18% |
| Exponential | 10.12% |
| SAR | 10.17% |
| Spatial filter | 10.71% |
Unconstrained and constrained missing value estimates for the Little and Rubin (1987, p. 118) example
| Variable & observation | Unconstrained | Non-negative constraint | Non-negative & totals constraints | Reported values |
|---|---|---|---|---|
| x1,10 | 12.9 | 12.8 | 15.9 | 21 |
| x1,11 | -0.5 | 0.0 | 1.2 | 1 |
| x1,12 | 10.0 | 10.0 | 13.0 | 11 |
| x1,13 | 10.1 | 10.1 | 12.9 | 10 |
| x2,10 | 65.8 | 66.0 | 59.0 | 47 |
| x2,11 | 48.2 | 46.9 | 44.4 | 40 |
| x2,12 | 68.1 | 68.1 | 61.4 | 66 |
| x2,13 | 62.4 | 62.5 | 56.2 | 68 |
| x4,7 | 0.8 | 0.8 | 6.7 | 6 |
| x4,8 | 37.9 | 37.9 | 44.0 | 44 |
| x4,9 | 20.0 | 20.0 | 24.4 | 22 |
| x4,10 | 14.5 | 14.4 | 17.9 | 26 |
| x4,11 | 20.8 | 21.6 | 29.4 | 34 |
| x4,12 | 8.2 | 8.2 | 13.6 | 12 |
| x4,13 | 14.5 | 15.4 | 20.0 | 12 |
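Missing 1992 georeferenced density of milk production in Puerto Rico: constrained (total = 1918)
| Predictions | First missing value | Second missing value | Third missing value |
|---|---|---|---|
| Predicted from 1991 DMILK | 235 | 1,339 | 344 |
| Predicted from spatial filter | 70 | 1,848 | 0 |
| Predicted from both | 385 | 1,065 | 468 |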
[Figure: Moran scatterplot]
USDA-NASS estimation of Pennsylvania crop production
[Figure labels: covariate; total constraints; map gaps]
USDA-NASS estimation of Michigan crop production
[Cartoon caption: "If this is 2% milk, how much am I paying for the other 98%?"]
[Figure labels: different response variable specifications; Michigan imputations]
USDA-NASS estimation of Tennessee crop production
[Figure label: Tennessee imputations]
An EM specification when some data for both Y and the Xs are missing
$$ \begin{pmatrix} Y_o \\ Y_{x,m} \\ 0_{y,m} \end{pmatrix} = \begin{pmatrix} 1_o & X_o \\ 1_{x,m} & 0_{x,m} \\ 1_{y,m} & X_{y,m} \end{pmatrix} \begin{pmatrix} \alpha_Y \\ \beta_Y \end{pmatrix} + \begin{pmatrix} 0_{o,x,m} \\ I_{x,m} \\ 0_{y,x,m} \end{pmatrix} X_{x,m} \beta_Y - \begin{pmatrix} 0_{o,y,m} \\ 0_{x,y,m} \\ I_{y,m} \end{pmatrix} Y_{y,m} + \varepsilon_Y $$
$$ \begin{pmatrix} X_o \\ 0_{x,m} \\ X_{y,m} \end{pmatrix} = \begin{pmatrix} 1_o & Y_o \\ 1_{x,m} & Y_{x,m} \\ 1_{y,m} & 0_{y,m} \end{pmatrix} \begin{pmatrix} \alpha_X \\ \beta_X \end{pmatrix} - \begin{pmatrix} 0_{o,x,m} \\ I_{x,m} \\ 0_{y,x,m} \end{pmatrix} X_{x,m} + \begin{pmatrix} 0_{o,y,m} \\ 0_{x,y,m} \\ I_{y,m} \end{pmatrix} Y_{y,m} \beta_X + \varepsilon_X $$
Concatenation results:
[Equation: the Y-equation and the X-equation stacked into a single system, with indicator variables eq1 and eq2 distinguishing the two sets of rows, the observed blocks $Y_{y,o}$ and $X_{x,o}$, zero blocks, and the missing values $X_{x,m}$ and $Y_{y,m}$ entering through $I_{x,m}$ and $I_{y,m}$]
Calculations from ANCOVA regression and the EM algorithm
| Data source | Page | Quantity | Reported value | OLS/NLS estimate |
|---|---|---|---|---|
| McLachlan & Krishnan (1997) | p. 91 | saddle point: σ_11, σ_22, ρ | 5/2, 5/2, 0 | 5/2, 5/2, 0 |
| | | maxima: σ_11, σ_22, ρ | 8/3, 8/3, ±0.5 | 2.87977, 2.87977, ±0.88817 |
| Schafer (1997) | p. 54 | σ̂_11 = σ̂_22 | 1.80 | 18/10 |
| | | ρ̂ | -1 | -1 |
| | | μ̂_1 = μ̂_2 | 0 | 0 |
The spatial model
[Equations: a system of three spatial autoregressive specifications, one each for yield, acres (area), and production. Each combines a spatial autocorrelation term (weights $w_{ij}$ applied to the observed values, j = 1, ..., n - n_m, and to the imputed values, j = n - n_m + 1, ..., n), a covariate term with its own spatial autocorrelation adjustment, a power transformation, and totals constraints. The stacked response vector contains yield, acres, and production, and the stacked error vector contains residual yield, residual acres, and residual production.]
Imputation of turnip production in 3 vandalized field plots
| Field plot | Spatial filter: 3 selected eigenvectors |
|---|---|
| (6,5) | 24.31 |
| (5,6) | 13.62 |
| (6,6) | 23.93 |
Cross-validation of spatial filter
for observed turnip data
Kriging: best linear unbiased spatial interpolator (i.e., predictor)
The accompanying table contains a test set of sixteen random samples (#17-32) used to evaluate three maps. The "Actual" column lists the measured values for the test locations identified by "Col, Row" coordinates. The differences between these values and those predicted by the three interpolation techniques form the residuals shown in parentheses. The "Average" column compares the whole-field arithmetic mean of 23 (guess 23 everywhere) with the actual value at each test location.
ArcGIS: Geostatistical Wizard
Example: density of German workers
• Anisotropy check
• Cross-validation check of kriged values. This is one use of the missing spatial data imputation methods.
• Unclipped kriged surface (exponential semivariogram model; extrapolation): kriged (mean response) surface and prediction error surface; values increase with darkness of brown
• Clipped kriged surface: kriged (mean response) surface and prediction error surface; values increase with darkness of brown
Example: detrended population density across China
• Anisotropy check
• Cross-validation check of kriged values. This is one use of the missing spatial data imputation methods.
• Unclipped kriged surface (exponential semivariogram model; extrapolation): kriged (mean response) surface and prediction error surface; values increase with darkness of brown
• Clipped kriged surface: kriged (mean response) surface and prediction error surface; values increase with darkness of brown
THEOREM 4
The maximum likelihood estimate for
missing georeferenced values described by
a spatial autoregressive model specification
is equivalent to the best linear unbiased
predictor kriging equation of geostatistics.
Geographically weighted regression: GWR
Spatial filtering enables easier implementation of GWR, as well as proper assessment of its dfs
• Step #1: compute the eigenvectors of a geographic connectivity matrix, say C
• Step #2: compute all of the interaction terms $X_j E_k$ for the P covariates times the K candidate eigenvectors (e.g., with MC > 0.25)
• Step #3: select from the total set, including the individual eigenvectors, with stepwise regression
• Step #4: the geographically varying intercept term is given by:
$$ a_i = a + \sum_{k=1}^{K} E_{i,k}\, b_{E_{i,k}} $$
• Step #5: the geographically varying covariate coefficient is given by factoring $X_j$ out of its appropriate selected interaction terms:
$$ b_{i,j} X_j = \left( b_j + \sum_{k=1}^{K} E_{i,k}\, b_{X_j E_{i,k}} \right) X_j $$
A Puerto Rico DEM example
Mean elevation (Y) is a function of: standard deviation of elevation (X), eigenvectors E1-E18, and 18 interaction terms (XE)
Results
intercept: 1, E2, E5-E7, E9, E11-E13, E15, E18
slope: 1, E4, E6, E9, E10
R2 increases from 0.576 (with X only) to 0.911 (with geographically varying coefficients)
P(S-W) = 0.52 for the final model
[Map: GWR spatial filter intercept (MC = 0.692)]
[Map: GWR spatial filter slope (MC = 0.721)]
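Steps #1 through #5 of the spatial filter approach to GWR can be sketched as follows. This is an illustration under several assumptions, not the lecture's implementation: the eigenvectors are extracted from the centered matrix (I - 11'/n) C (I - 11'/n) rather than from C directly, the first K eigenvectors are used in place of stepwise selection, a single covariate is used, and the data and the helper names `rook_connectivity` and `sf_gwr` are hypothetical.

```python
import numpy as np

def rook_connectivity(nrow, ncol):
    """Binary rook adjacency matrix C for a regular nrow x ncol grid."""
    n = nrow * ncol
    C = np.zeros((n, n))
    for r in range(nrow):
        for c in range(ncol):
            i = r * ncol + c
            if r + 1 < nrow:
                C[i, i + ncol] = C[i + ncol, i] = 1.0
            if c + 1 < ncol:
                C[i, i + 1] = C[i + 1, i] = 1.0
    return C

def sf_gwr(y, x, C, K=3):
    """Spatially varying intercept and slope from eigenvector interaction terms."""
    n = len(y)
    M = np.eye(n) - np.ones((n, n)) / n
    vals, vecs = np.linalg.eigh(M @ C @ M)                 # Step 1 (centered C, an assumption)
    E = vecs[:, np.argsort(vals)[::-1][:K]]                # K largest eigenvalues
    XE = x[:, None] * E                                    # Step 2: interaction terms
    D = np.column_stack([np.ones(n), x, E, XE])            # Step 3 skipped: no stepwise selection
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    a, b = coef[0], coef[1]
    b_E, b_XE = coef[2:2 + K], coef[2 + K:]
    a_i = a + E @ b_E                                      # Step 4: local intercepts
    b_i = b + E @ b_XE                                     # Step 5: local slopes
    return a_i, b_i, D @ coef

# Simulated data on a 5 x 5 grid (hypothetical)
rng = np.random.default_rng(4)
C = rook_connectivity(5, 5)
x = rng.normal(size=25)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=25)
a_i, b_i, fitted = sf_gwr(y, x, C, K=3)
print(a_i[:3], b_i[:3])
```

The fitted values equal a_i + b_i x at every location, which shows that the interaction terms are simply a reparameterization into location-specific coefficients.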
Spatial moving averages
Local smoothing of attribute values
$$ \hat{\mu}_i = \frac{\sum_{j=1}^{n} w_{ij}\, y_j}{\sum_{j=1}^{n} w_{ij}}, \quad i = 1, 2, ..., n $$
where:
• $w_{ij}$ is an element of the spatial weights matrix
• $y_j$ is the attribute value for areal unit j
• n is the number of areal units
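The moving average is a weighted mean of each unit's neighbors. The sketch below assumes a binary adjacency matrix for four areal units arranged in a line, with no self-weights, and every unit having at least one neighbor; the attribute values and the function name `spatial_moving_average` are hypothetical.

```python
import numpy as np

def spatial_moving_average(W, y):
    """Weighted average of y for each unit: (W y) divided by the row sums of W."""
    W = np.asarray(W, dtype=float)
    y = np.asarray(y, dtype=float)
    return (W @ y) / W.sum(axis=1)   # assumes no row of W sums to zero

# Four units in a line: 1-2, 2-3, 3-4 are neighbors
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
y = np.array([10.0, 7.0, 7.0, 4.0])

mu_hat = spatial_moving_average(W, y)
print(mu_hat)
```

Including a unit's own value in its average is a design choice: setting the diagonal of W to 1 would do so, and would smooth less aggressively.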
A summary: what have we learned during the 5 lectures?
Lecture #1
• The nature of data and its information content.
• What is spatial autocorrelation?
• Visualizing spatial autocorrelation: Moran scatterplots, semivariogram plots, and maps.
• Defining and articulating spatial structure: topology and distance perspectives; contagion and hierarchy concepts.
• Necessary concepts from multivariate statistics.
• An example of the elusive negative spatial autocorrelation.
• Some comments about spatial sampling.
• Implications about space-time data structure.
Lecture #2
• Multivariate grouping, and location-allocation modeling.
• Going from the global to the local: variability and heterogeneity.
• Impacts of spatial autocorrelation on histograms.
• The LISA and Getis-Ord statistics.
• Cluster analysis: multivariate analysis, cluster detection, and spider diagrams.
  – An overview of geographic and space-time clusters.
• Regression diagnostics and geographic clusters.
Lecture #3
• Autoregressive specifications and normal curve theory (PROC NLIN).
• Auto-binomial and auto-Poisson models: the need for MCMC.
• Relationships between spatial autoregressive and geostatistical models.
• Spatial filtering specifications and linear and generalized linear models (PROC GENMOD).
• Autoregressive specifications and linear mixed models (PROC MIXED).
• Implications for space-time datasets (PROC NLMIXED).
Lecture #4
• Frequentist versus Bayesian perspectives.
• Implementing random effects models in GeoBUGS.
• Spatially structured and unstructured random effects: the CAR, the ICAR, and the spatial filter specifications.
Lecture #5
• The E-M algorithm
• The spatial E-M algorithm
• Kriging in ArcGIS
• Approaches to map smoothing