SPATIAL-TEMPORAL DATA MINING WITH SHORT OBSERVATION HISTORY

Dragoljub Pokrajac, Zoran Obradovic
Temple University, Philadelphia
pokie@ist.temple.edu, zoran@ist.temple.edu

Goal
Spatial-temporal prediction of continuous
– Attributes
– Response Variable

Applications
Remote-sensing
– Meteorology
– Geo-sciences
Bio-medical applications
– Tumor growth prediction
Agriculture
– Crop yield prediction
– Treatment recommendation

Challenges
Existing techniques for spatial-temporal modeling
– Not suitable for short observation history
– Do not involve non-linear modeling
– Have difficulties with missing attributes

Two-Stage Modeling
1. Attribute Modeling: spatial-temporal autoregressive models of attributes
2. Response Modeling: spatial-temporal response modeling with correlated residuals

1. Attribute Modeling
Attribute prediction on spatial-temporal uniform grid
Prediction goal: future attribute value
The attribute at each spatial location depends on:
– Samples from the same location
– Samples from its spatial neighborhood taken in recent history

Spatial-Temporal auto-regressive process on a Uniform Grid (STUG)

$$ f_t(m,n) = \sum_{j=1}^{p} \sum_{k=-L}^{L} \sum_{l=-L}^{L} f_{t-j}(m-k,\,n-l)\,\phi_j(k,l) + a_{STUG,t}(m,n) $$

Error

$$ a_{STUG,t}(m,n) = \sum_{k=-L}^{L} \sum_{l=-L}^{L} a_t(m-k,\,n-l)\,\theta_0(k,l), \qquad a_t(m,n) \sim N(0, \sigma_a^2) $$

Parameters
– Spatial sampling interval
– Temporal sampling interval
– Spatial order L
– Temporal order p

Covariance Structure
Yule-Walker equations

$$ c_{FF,k}(m,n) = \sum_{j=1}^{k} \sum_{m'=-L}^{L} \sum_{n'=-L}^{L} c_{FF,k-j}(m-m',\,n-n')\,\phi_j(m',n') + \sum_{j=k+1}^{p} \sum_{m'=-L}^{L} \sum_{n'=-L}^{L} c_{FF,j-k}(m'-m,\,n'-n)\,\phi_j(m',n') $$

$$ k = 1,\dots,p; \quad m = -L,\dots,L; \quad n = -L,\dots,L $$

Variance of model error

$$ \sigma_n^2 = \sigma_a^2 \sum_{k=-L}^{L} \sum_{l=-L}^{L} \theta_0(k,l)^2 $$

Variance of STUG process

$$ \sigma_f^2 = \frac{\sigma_a^2}{8\pi^3} \int_0^{2\pi}\!\!\int_0^{2\pi}\!\!\int_0^{2\pi} \frac{\left[\sum_{k=-L}^{L}\sum_{l=-L}^{L} \cos(k\omega_1 + l\omega_2)\,\theta_0(k,l)\right]^2 + \left[\sum_{k=-L}^{L}\sum_{l=-L}^{L} \sin(k\omega_1 + l\omega_2)\,\theta_0(k,l)\right]^2}{\left[1 - \sum_{j=1}^{p}\sum_{k=-L}^{L}\sum_{l=-L}^{L} \cos(k\omega_1 + l\omega_2 + j\omega_3)\,\phi_j(k,l)\right]^2 + \left[\sum_{j=1}^{p}\sum_{k=-L}^{L}\sum_{l=-L}^{L} \sin(k\omega_1 + l\omega_2 + j\omega_3)\,\phi_j(k,l)\right]^2} \, d\omega_1\, d\omega_2\, d\omega_3 $$

Model Stationarity
Stationarity criterion can be derived from the theory of 3-dimensional filters.
For p=1 (process dependent on one step in the past), the model is stationary iff, for all $\omega_1, \omega_2 \in [0^\circ, 360^\circ]$,

$$ M(\omega_1,\omega_2) = \left[\sum_{k=-L}^{L}\sum_{l=-L}^{L} \cos(k\omega_1 + l\omega_2)\,\phi_1(k,l)\right]^2 + \left[\sum_{k=-L}^{L}\sum_{l=-L}^{L} \sin(k\omega_1 + l\omega_2)\,\phi_1(k,l)\right]^2 < 1 $$

Stationarity can be determined from a stationarity plot.

Parameter Estimation
Least-squares estimation
– Solving the regression system (in the least-squares sense)

$$ f_t(m,n) = \sum_{j=1}^{p} \sum_{k=-L}^{L} \sum_{l=-L}^{L} f_{t-j}(m-k,\,n-l)\,\hat{\phi}_j(k,l) $$

$$ t = p+1,\dots,N_t; \quad m = L,\dots,N_x-L-1; \quad n = L,\dots,N_y-L-1 $$

Yule-Walker estimation
– Solving the Yule-Walker equations
Computational complexity of the estimators: $O(p^3 L^6)$, $O(p^2 L^6)$

Forecasting

$$ \hat{f}_{N_t+1}(m,n) = \sum_{j=1}^{p} \sum_{k=-L}^{L} \sum_{l=-L}^{L} f_{N_t+1-j}(m-k,\,n-l)\,\hat{\phi}_j(k,l), \qquad m = L,\dots,N_x-L-1; \quad n = L,\dots,N_y-L-1 $$

Main Properties of Estimator
Technique evaluated on synthetic data
Varied:
– Neighborhood matrix of size $(2L+1) \times (2L+1)$
– Number of samples per spatial layer
– Variance of random components $\sigma^2$
Measured:
– Prediction accuracy $R^2$
– Bias and variance of estimated parameters
Results:
– Variance of estimated coefficients decreases with the number of samples, but the estimation remains biased
– The proposed technique outperforms non-spatial regression and approaches optimal accuracy
[Figure: prediction accuracy of optimal prediction, the proposed method and non-spatial regression]

Experiments on Real-Life Data
Agricultural data from 4 years INEEL study
– Attributes sampled through the year
– Predict 1998 attribute based on measurements from 1995-1997
– p=2
– Prediction accuracy measured using coefficient of determination $R^2$
[Figure: $R^2$ by attribute and season (Summer, Fall)]
The proposed technique can provide useful results.

2. Response Modeling
Spatial-temporal response modeling with correlated residuals

$$ y_{t,i} = y(x_{t,i}) + u_{t,i}, \qquad i = 1,\dots,n $$

– $y(x_{t,i})$: non-spatial regression model (linear or non-linear)

Correlated residuals

$$ u_t = W u_{t-1} + \varepsilon_t, \qquad u_t = [u_{t,1}\; u_{t,2}\; \dots\; u_{t,n}]^T $$

Uncorrelated errors

$$ \varepsilon_t = [\varepsilon_{t,1}\; \varepsilon_{t,2}\; \dots\; \varepsilon_{t,n}]^T, \qquad \varepsilon_{t,i} \sim N(0, \sigma^2) $$

Neighborhood matrix $W = [w_{i,j}]$
– Influence of residuals limited to spatial neighborhood
– Homogeneity imposed: weights depend only on distance, not on the position
[Figure: observed sample with its 1st order neighborhood, 2nd order neighborhood and other sampling locations, at time instances t-1 and t]

Model Estimation
Least-squares estimation
– Iterative Gauss-Newton algorithm
– Computational complexity
Solution: residual regression
– Less-demanding one-step algorithm
– Ordinary regression at time layers t-1 and t
– Estimation of residuals $u_t$ and $u_{t-1}$
– Estimation of W through regression of $u_t$ on $u_{t-1}$

Experiments on Realistic Data
Simulated spatial-temporal agriculture data
– 5 attributes and the response on a 10×10 m² grid
– 5 temporal layers, each with 6561 samples
– Attributes: Nitrogen, Phosphorus, Potassium, Slope, Profile Curvature
– Response: Wheat yield
Varied:
– Parameters of $y(x_{t,i})$
– Neighborhood matrix W ($W_1$, $W_2$, $W_3$)
Prediction accuracy measured using coefficient of determination $R^2$
[Figure: $R^2$ at time layers 4 and 5, with no missing attributes and with temporal attributes missing; ordinary linear regression versus the proposed method, and ordinary non-linear regression versus the proposed method]
– Using the proposed technique, accuracy of ordinary linear and non-linear models significantly improved
– The technique is particularly useful when temporal attributes (e.g. Nitrogen, Phosphorus,…) are missing

Open Problems
– Development and comparison of techniques for model identification
– Derivation of maximum-likelihood estimation techniques to improve model accuracy
– Determination of a closed-form expression of STUG model variance as a function of model parameters
– Analysis of STUG model stationarity with temporal order p>1

References
[1] D. Pokrajac, Z. Obradovic, "Spatial-Temporal Autoregressive Model on Uniform Sampling Grid," Technical Report CIS TR 2001-05, Temple University, 2001.
[2] D. Pokrajac, Z. Obradovic, "Improved Spatial-Temporal Forecasting through Modeling of Spatial Residuals in Recent History," First SIAM Int'l Conf. on Data Mining, 2001.
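
Illustration: checking STUG stationarity for p=1
The p=1 stationarity criterion can be explored numerically by evaluating the squared magnitude of the spatial frequency response of the coefficient mask on a grid of frequencies; the model passes the check when that quantity stays below 1 everywhere on the grid. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the nested-list layout `phi[k + L][l + L]` for the coefficient at spatial offset (k, l), the grid resolution `steps`, and both example masks are hypothetical. A sampled grid can miss a narrow peak between grid points, so a finer grid gives a more reliable answer.

```python
import math


def stationarity_surface(phi, steps=72):
    """Squared magnitude of sum_k sum_l phi(k, l) * exp(i * (k*w1 + l*w2)).

    phi is a (2L+1) x (2L+1) nested list; phi[k + L][l + L] holds the
    coefficient for spatial offset (k, l). This layout is an assumption
    made for the example. Returns a steps x steps grid of values that can
    be drawn as a stationarity plot.
    """
    L = (len(phi) - 1) // 2
    offsets = range(-L, L + 1)
    surface = []
    for a in range(steps):
        w1 = 2.0 * math.pi * a / steps
        row = []
        for b in range(steps):
            w2 = 2.0 * math.pi * b / steps
            re = sum(phi[k + L][l + L] * math.cos(k * w1 + l * w2)
                     for k in offsets for l in offsets)
            im = sum(phi[k + L][l + L] * math.sin(k * w1 + l * w2)
                     for k in offsets for l in offsets)
            row.append(re * re + im * im)
        surface.append(row)
    return surface


def is_stationary_p1(phi, steps=72):
    """Grid check for p = 1: every sampled value must be below 1."""
    return max(max(row) for row in stationarity_surface(phi, steps)) < 1.0


# Hypothetical 3x3 coefficient masks (L = 1), chosen only for illustration.
phi_stable = [[0.0, 0.1, 0.0],
              [0.1, 0.4, 0.1],
              [0.0, 0.1, 0.0]]
phi_unstable = [[0.0, 0.1, 0.0],
                [0.1, 0.8, 0.1],
                [0.0, 0.1, 0.0]]

print(is_stationary_p1(phi_stable))    # True: the peak value is 0.8**2 = 0.64
print(is_stationary_p1(phi_unstable))  # False: the value at zero frequency is 1.2**2 = 1.44
```

For masks with non-negative coefficients, as in both examples, the peak sits at zero frequency and equals the squared sum of the coefficients, which gives a quick sanity check on the grid result.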