SPATIAL-TEMPORAL DATA MINING WITH SHORT OBSERVATION HISTORY
Dragoljub Pokrajac, Zoran Obradovic, Temple University, Philadelphia
pokie@ist.temple.edu, zoran@ist.temple.edu
Goal
Spatial-temporal prediction of continuous
– attributes
– response variable

Applications
– Meteorology
– Geo-sciences
– Bio-medical

Model Stationarity
The stationarity criterion can be derived from the theory of 3-dimensional filters. For p = 1 (a process that depends on one step in the past), the model is stationary iff

$$\Big(\sum_{k=-L}^{L}\sum_{l=-L}^{L}\phi_1(k,l)\cos(k\omega_1+l\omega_2)\Big)^2+\Big(\sum_{k=-L}^{L}\sum_{l=-L}^{L}\phi_1(k,l)\sin(k\omega_1+l\omega_2)\Big)^2<1\qquad\forall\,\omega_1,\omega_2\in[0^\circ,360^\circ]$$

Stationarity can therefore be determined from a stationarity plot of the left-hand side over $(\omega_1,\omega_2)$.

[Figure: sampling grid at time instances t-1 and t]

2. Response Modeling
[Figure: prediction accuracy of optimal prediction, the proposed method, and non-spatial regression. The proposed technique outperforms non-spatial regression and approaches optimal accuracy.]
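As a rough numerical companion to the p = 1 criterion, the sketch below evaluates the left-hand side of the stationarity condition on a grid of $(\omega_1,\omega_2)$ values and reports its maximum, which is the quantity a stationarity plot lets one inspect visually. It is a minimal sketch assuming NumPy, angles in radians rather than degrees, and a coefficient array `phi1[k + L, l + L]` holding $\phi_1(k,l)$; the function names and test coefficients are hypothetical. Because only grid points are checked, a maximum very close to 1 should be re-checked on a finer grid.

```python
import numpy as np


def stationarity_surface(phi1, n_grid=180):
    """Evaluate S(w1, w2) = (sum phi1 cos)^2 + (sum phi1 sin)^2 on an n_grid x n_grid grid.

    phi1[k + L, l + L] holds phi_1(k, l) for k, l = -L..L. Angles are in radians,
    so [0, 2*pi) plays the role of [0, 360 degrees).
    """
    L = (phi1.shape[0] - 1) // 2
    w = np.linspace(0.0, 2.0 * np.pi, n_grid, endpoint=False)
    w1, w2 = np.meshgrid(w, w, indexing="ij")
    c = np.zeros_like(w1)
    s = np.zeros_like(w1)
    for k in range(-L, L + 1):
        for l in range(-L, L + 1):
            theta = k * w1 + l * w2
            c += phi1[k + L, l + L] * np.cos(theta)
            s += phi1[k + L, l + L] * np.sin(theta)
    return w1, w2, c ** 2 + s ** 2


def looks_stationary_p1(phi1, n_grid=180):
    """Grid check of the p = 1 criterion; returns (verdict, maximum of S on the grid)."""
    _, _, surface = stationarity_surface(phi1, n_grid)
    peak = float(surface.max())
    return peak < 1.0, peak


# Hypothetical 3x3 weight patterns (L = 1)
phi_ok = np.full((3, 3), 0.05)
phi_ok[1, 1] = 0.5                # weights sum to 0.9, so S <= 0.81 everywhere
phi_bad = np.full((3, 3), 0.2)    # weights sum to 1.8, so S(0, 0) = 3.24

print(looks_stationary_p1(phi_ok))
print(looks_stationary_p1(phi_bad))
```

With non-negative weights the maximum sits at $\omega_1=\omega_2=0$ and equals the squared sum of the weights, which is why both test patterns have easily predictable peaks.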
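Challenges
Existing techniques for spatial-temporal modeling

Two-Stage Modeling

Covariance Structure
For a stationary STUG process, let $c_{FF,k}(m,n)=\operatorname{Cov}\big(f_t(m_0+m,n_0+n),\,f_{t-k}(m_0,n_0)\big)$. These covariances satisfy the Yule-Walker equations

$$c_{FF,k}(m,n)=\sum_{j=1}^{k}\sum_{m'=-L}^{L}\sum_{n'=-L}^{L}c_{FF,k-j}(m-m',n-n')\,\phi_j(m',n')+\sum_{j=k+1}^{p}\sum_{m'=-L}^{L}\sum_{n'=-L}^{L}c_{FF,j-k}(m'-m,n'-n)\,\phi_j(m',n'),$$

$$k=1,\dots,p;\qquad m=-L,\dots,L;\qquad n=-L,\dots,L,$$

that is, $p(2L+1)^2$ linear equations for the $p(2L+1)^2$ coefficients $\phi_j(m',n')$.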
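For p = 1 the Yule-Walker equations reduce to $c_{FF,1}(m,n)=\sum_{m',n'}c_{FF,0}(m-m',n-n')\,\phi_1(m',n')$, a $(2L+1)^2\times(2L+1)^2$ linear system. The sketch below estimates the covariances from simulated data and solves that system. It is a minimal illustration assuming NumPy, white errors and a periodic (wrap-around) grid so that every site has a full neighbourhood; the helper names, grid size and coefficients are hypothetical choices for the example, not the authors' setup.

```python
import numpy as np


def simulate_p1(phi1, shape=(64, 64), n_layers=200, burn_in=60, seed=3):
    """Toy STUG data with p = 1 and white error a_t ~ N(0, 1) on a periodic grid (assumption)."""
    rng = np.random.default_rng(seed)
    L = (phi1.shape[0] - 1) // 2
    f = np.zeros(shape)
    layers = []
    for t in range(burn_in + n_layers):
        new = rng.normal(size=shape)
        for k in range(-L, L + 1):
            for l in range(-L, L + 1):
                # np.roll(f, (k, l)) puts f(m - k, n - l) at position (m, n)
                new += phi1[k + L, l + L] * np.roll(f, (k, l), axis=(0, 1))
        f = new
        if t >= burn_in:
            layers.append(f)
    return np.stack(layers)


def sample_cov(F, lag, a, b):
    """Sample c_FF,lag(a, b): mean of f_t(m + a, n + b) * f_{t - lag}(m, n)."""
    later = np.roll(F[lag:], (-a, -b), axis=(1, 2))
    earlier = F[:F.shape[0] - lag]
    return float(np.mean(later * earlier))


def yule_walker_p1(F, L):
    """Solve c_1(m, n) = sum_{m', n'} c_0(m - m', n - n') phi_1(m', n') for m, n = -L..L."""
    F = F - F.mean()
    offsets = [(m, n) for m in range(-L, L + 1) for n in range(-L, L + 1)]
    c0 = {(a, b): sample_cov(F, 0, a, b)
          for a in range(-2 * L, 2 * L + 1) for b in range(-2 * L, 2 * L + 1)}
    A = np.array([[c0[(m - mq, n - nq)] for (mq, nq) in offsets] for (m, n) in offsets])
    rhs = np.array([sample_cov(F, 1, m, n) for (m, n) in offsets])
    return np.linalg.solve(A, rhs).reshape(2 * L + 1, 2 * L + 1)


# Hypothetical true coefficients phi_1(k, l), indexed [k + 1, l + 1]; they sum to 0.8
phi_true = np.array([[0.0, 0.05, 0.0],
                     [0.1, 0.40, 0.1],
                     [0.0, 0.15, 0.0]])
F = simulate_p1(phi_true)
phi_yw = yule_walker_p1(F, L=1)
print(np.round(phi_yw, 3))
```

Only lag-0 covariances up to $\pm 2L$ and lag-1 covariances up to $\pm L$ are needed, so they are computed once and reused when filling the matrix.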
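Variance
The variance of a stationary STUG process follows from its 3-dimensional frequency response:

$$\sigma_f^2=\frac{\sigma_a^2}{8\pi^3}\int_0^{2\pi}\!\!\int_0^{2\pi}\!\!\int_0^{2\pi}\frac{\Big(\sum_{k=-L}^{L}\sum_{l=-L}^{L}\phi_0(k,l)\cos(k\omega_1+l\omega_2)\Big)^2+\Big(\sum_{k=-L}^{L}\sum_{l=-L}^{L}\phi_0(k,l)\sin(k\omega_1+l\omega_2)\Big)^2}{\Big(1-\sum_{j=1}^{p}\sum_{k=-L}^{L}\sum_{l=-L}^{L}\phi_j(k,l)\cos(k\omega_1+l\omega_2+j\omega_3)\Big)^2+\Big(\sum_{j=1}^{p}\sum_{k=-L}^{L}\sum_{l=-L}^{L}\phi_j(k,l)\sin(k\omega_1+l\omega_2+j\omega_3)\Big)^2}\,d\omega_1\,d\omega_2\,d\omega_3,$$

where $\phi_0(k,l)$ are the weights of the spatially correlated error term and $\sigma_a^2$ is the variance of its independent components.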
Model Estimation
The spatial-temporal autoregressive models of attributes are estimated by least squares, that is, by solving the regression system

$$f_t(m,n)=\sum_{j=1}^{p}\sum_{k=-L}^{L}\sum_{l=-L}^{L}f_{t-j}(m-k,n-l)\,\hat\phi_j(k,l)$$

for the coefficients $\hat\phi_j(k,l)$. For each lag j the coefficients form a (2L+1)×(2L+1) neighborhood matrix.

Attribute prediction on the spatial-temporal uniform grid
The future attribute value is predicted using

$$\hat f_{N_t+1}(m,n)=\sum_{j=1}^{p}\sum_{k=-L}^{L}\sum_{l=-L}^{L}f_{N_t+1-j}(m-k,n-l)\,\hat\phi_j(k,l).$$

Prediction accuracy is measured using the coefficient of determination R².
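The least-squares step and the forecast fit naturally into one small example: simulate data, stack the lagged neighbourhood values $f_{t-j}(m-k,n-l)$ as regression columns, solve with `numpy.linalg.lstsq`, then apply the estimated coefficients to the last p layers. The sketch assumes NumPy, p = 1, L = 1, white errors and a periodic simulation grid; regression rows are restricted to interior sites $m=L,\dots,N_x-L-1$, $n=L,\dots,N_y-L-1$ so that no neighbour falls outside the grid. Function names and coefficient values are hypothetical, and R² on a held-out layer is computed as 1 − SSE/SST.

```python
import numpy as np


def simulate_p1(phi1, shape=(96, 96), n_layers=40, burn_in=60, seed=2):
    """Toy STUG data with p = 1 and white error a_t ~ N(0, 1) on a periodic grid (assumption)."""
    rng = np.random.default_rng(seed)
    L = (phi1.shape[0] - 1) // 2
    f = np.zeros(shape)
    layers = []
    for t in range(burn_in + n_layers):
        new = rng.normal(size=shape)
        for k in range(-L, L + 1):
            for l in range(-L, L + 1):
                new += phi1[k + L, l + L] * np.roll(f, (k, l), axis=(0, 1))  # f(m-k, n-l)
        f = new
        if t >= burn_in:
            layers.append(f)
    return np.stack(layers)


def design_matrix(F, p, L):
    """Columns f_{t-j}(m-k, n-l) for t = p..T-1 and interior sites, ordered by (j, k, l)."""
    T, Nx, Ny = F.shape
    cols = [F[p - j:T - j, L - k:Nx - L - k, L - l:Ny - L - l].ravel()
            for j in range(1, p + 1)
            for k in range(-L, L + 1)
            for l in range(-L, L + 1)]
    return np.column_stack(cols)


def fit_least_squares(F, p, L):
    T, Nx, Ny = F.shape
    target = F[p:, L:Nx - L, L:Ny - L].ravel()
    coef, *_ = np.linalg.lstsq(design_matrix(F, p, L), target, rcond=None)
    return coef.reshape(p, 2 * L + 1, 2 * L + 1)   # phi_hat[j - 1, k + L, l + L]


def forecast_next(F, phi_hat):
    """f_hat_{N_t+1}(m, n) = sum_j sum_k sum_l f_{N_t+1-j}(m-k, n-l) phi_hat_j(k, l), interior sites."""
    p, size, _ = phi_hat.shape
    L = (size - 1) // 2
    T, Nx, Ny = F.shape
    pred = np.zeros((Nx - 2 * L, Ny - 2 * L))
    for j in range(1, p + 1):
        for k in range(-L, L + 1):
            for l in range(-L, L + 1):
                pred += phi_hat[j - 1, k + L, l + L] * F[T - j, L - k:Nx - L - k, L - l:Ny - L - l]
    return pred


phi_true = np.array([[0.0, 0.05, 0.0],
                     [0.1, 0.40, 0.1],
                     [0.0, 0.15, 0.0]])
F = simulate_p1(phi_true)
history, future = F[:-1], F[-1]            # hold out the last temporal layer
phi_hat = fit_least_squares(history, p=1, L=1)
pred = forecast_next(history, phi_hat)
actual = future[1:-1, 1:-1]                 # interior sites for L = 1
r2 = 1.0 - np.sum((actual - pred) ** 2) / np.sum((actual - actual.mean()) ** 2)
print(np.round(phi_hat[0], 3), round(float(r2), 3))
```

Because the errors are unpredictable, even the true coefficients leave an R² well below 1 here; the example is about recovering $\phi_1$, not about a high score.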
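Proposed model: Spatial-Temporal Auto-Regressive process on a Uniform Grid (STUG)
The attribute at each spatial location depends on past values at that location and in its spatial neighborhood:

$$f_t(m,n)=\sum_{j=1}^{p}\sum_{k=-L}^{L}\sum_{l=-L}^{L}f_{t-j}(m-k,n-l)\,\phi_j(k,l)+a_{STUG,t}(m,n),$$

where the error term is spatially correlated,

$$a_{STUG,t}(m,n)=\sum_{k=-L}^{L}\sum_{l=-L}^{L}a_t(m-k,n-l)\,\phi_0(k,l),\qquad a_t(m,n)\sim N(0,\sigma_a^2)\ \text{independent}.$$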
The technique was evaluated on synthetic data with known
–parameters of y(x_{t,i})
–neighborhood matrix W

[Figure: R² of attribute prediction for pH, Fe, Cu, B and other attributes, by season (Summer, Fall)]
[Figure, two panels: R² by time layer (3 to 5), with no missing attributes and with temporal attributes missing]
Open Problems
– Derivation of maximum-likelihood estimation techniques to improve model accuracy
– Determination of a closed-form expression of STUG model variance as a function of model parameters
– Development and comparison of techniques for model identification

Varied:
–Number of samples per spatial layer
–Variance of random components σ²

[Figure: R² of attribute prediction for S, K, Zn, Mn, P and N]

Evaluation: the estimation technique can provide useful results. Accuracy is measured using the coefficient of determination R².
At each sampling instant, the attribute at each spatial location depends on
– samples from the same location
– samples from its spatial neighborhood
taken in recent history.

Experiments on Real-Life Data
Data from a 4-year INEEL study: predict the 1998 attribute values based on measurements from 1995–1997, with attributes sampled through the year.
[Figure: R² of ordinary linear regression and of ordinary non-linear regression for y(x_{t,i}), each compared with the proposed method]
Using the proposed technique, the accuracy of ordinary linear and non-linear models improved significantly. The technique is particularly useful when temporal attributes (e.g. Nitrogen, Phosphorus,…) are missing.

Response Model Estimation
Less-demanding one-step algorithm:
–Ordinary regression at time layers t-1 and t
–Estimation of residuals u_t and u_{t-1}
–Estimation of W through regression of u_t on u_{t-1}
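The less-demanding one-step algorithm for the response model can be sketched on simulated data. This is a minimal illustration, not the authors' implementation: it assumes NumPy, a linear y(x) with two hypothetical attributes, a periodic grid, and a homogeneous W restricted to the first-order neighbourhood, so W is described by just two weights (one for the site itself, one shared by its four neighbours). That restriction is what makes regressing $u_t$ on $u_{t-1}$ feasible from only two time layers, since an unrestricted n×n matrix W would have n² unknowns but only n equations. The last lines compare a non-spatial forecast of the next layer with one that adds the predicted residual $W u_t$.

```python
import numpy as np


def neighbor_sum(u):
    """Sum of the four first-order neighbours of each site on a periodic grid."""
    return (np.roll(u, 1, axis=0) + np.roll(u, -1, axis=0)
            + np.roll(u, 1, axis=1) + np.roll(u, -1, axis=1))


def apply_w(u, w_self, w_nb):
    """Homogeneous W on the 1st-order neighbourhood: weights depend only on distance."""
    return w_self * u + w_nb * neighbor_sum(u)


def ols(X, y):
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def predict(beta, X):
    return beta[0] + X @ beta[1:]


def r2(y, y_hat):
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)


def simulate(shape=(80, 80), w_self=0.5, w_nb=0.1, beta=(1.0, 2.0, -1.0), burn_in=60, seed=1):
    """Three consecutive layers of y_{t,i} = y(x_{t,i}) + u_{t,i}, with u_t = W u_{t-1} + eps_t."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=shape)
    for _ in range(burn_in):
        u = apply_w(u, w_self, w_nb) + rng.normal(size=shape)
    layers = []
    for _ in range(3):
        u = apply_w(u, w_self, w_nb) + rng.normal(size=shape)
        X = rng.normal(size=(u.size, 2))                   # two hypothetical attributes
        y = beta[0] + X @ np.array(beta[1:]) + u.ravel()    # linear y(x) plus residual
        layers.append((X, y))
    return layers


shape = (80, 80)
(X0, y0), (X1, y1), (X2, y2) = simulate(shape)

# Step 1: ordinary regression at time layers t-1 and t
b0, b1 = ols(X0, y0), ols(X1, y1)
# Step 2: estimated residuals u_{t-1} and u_t
u0 = (y0 - predict(b0, X0)).reshape(shape)
u1 = (y1 - predict(b1, X1)).reshape(shape)
# Step 3: estimate W by regressing u_t on u_{t-1} (two unknowns thanks to homogeneity)
D = np.column_stack([u0.ravel(), neighbor_sum(u0).ravel()])
(w_self, w_nb), *_ = np.linalg.lstsq(D, u1.ravel(), rcond=None)

# Forecast layer t+1: non-spatial regression versus regression plus predicted residual W u_t
plain = predict(b1, X2)
spatial = plain + apply_w(u1, w_self, w_nb).ravel()
print(round(w_self, 3), round(w_nb, 3), round(r2(y2, plain), 3), round(r2(y2, spatial), 3))
```

The true weights in this toy setup are 0.5 and 0.1; the gain in R² comes entirely from the predictable part of the residual, which non-spatial regression ignores.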
Shortcomings of existing spatial-temporal techniques:
– Not suitable for short observation history
– Do not involve non-linear modeling
– Have difficulties with missing attributes

Solution
1. Attribute Modeling: spatial-temporal autoregressive (STUG) model with parameters
– Spatial sampling interval Δ
– Temporal sampling interval τ
– Spatial order L
– Temporal order p
[Figure: STUG neighborhood for p = 2 at time instances t-2, t-1 and t: observed sample, 1st-order neighborhood, 2nd-order neighborhood, other sampling locations]
The variance of the spatially correlated STUG error term $a_{STUG,t}(m,n)$ is

$$\sigma_n^2=\sigma_a^2\sum_{k=-L}^{L}\sum_{l=-L}^{L}\phi_0(k,l)^2.$$

2. Spatial-temporal response modeling with correlated residuals

$$y_{t,i}=y(\mathbf{x}_{t,i})+u_{t,i},\qquad i=1,\dots,n$$

$$\mathbf{u}_t=W\mathbf{u}_{t-1}+\boldsymbol{\varepsilon}_t,\qquad \mathbf{u}_t=[u_{t,1}\;u_{t,2}\;\cdots\;u_{t,n}]^T,\qquad \boldsymbol{\varepsilon}_t=[\varepsilon_{t,1}\;\varepsilon_{t,2}\;\cdots\;\varepsilon_{t,n}]^T,\quad \varepsilon_{t,i}\sim N(0,\sigma^2)$$

The residuals $\mathbf{u}_t$ are correlated, while the model errors $\boldsymbol{\varepsilon}_t$ are uncorrelated. For the neighborhood matrix $W=[w_{ij}]$:
– the influence of residuals is limited to the spatial neighborhood
– homogeneity is imposed: weights depend only on distance, not on position

Parameter Estimation
Attribute model:
– Least-squares: solving the regression system over t = p+1,…,N_t; m = L,…,N_x−L−1; n = L,…,N_y−L−1; computational complexity O(p³L⁶)
– Yule-Walker estimation: solving the Yule-Walker equations; computational complexity O(p²L⁶)
Response model:
– Iterative Gauss-Newton algorithm

Forecasting
The forecast $\hat f_{N_t+1}(m,n)$ is computed for m = L,…,N_x−L−1; n = L,…,N_y−L−1.

Experiments on Realistic Data
Simulated agriculture data:
– 5 attributes and the response on a 10×10 m² grid
– 5 temporal layers, each with 6561 samples
[Figure: data from one temporal layer: wheat yield (response) and the attributes slope, profile curvature, nitrogen, phosphorus and potassium]
[Figure: neighborhood matrices W1, W2, W3 used to simulate the correlated residuals]
[Figure: R² of the proposed method and of non-spatial regression]

Main Properties of Estimator
–Prediction accuracy R²
–Bias and variance of estimated parameters
The variance of the estimated coefficients decreases with the number of samples, but the estimation remains biased.

Application areas:
– Agriculture: crop yield prediction, treatment recommendation
– Bio-medical: tumor growth prediction
– Remote-sensing

Open problem: analysis of STUG model stationarity with temporal order p > 1.