Chapter 5 Part B: Spatial Autocorrelation and regression modelling
3rd edition
www.spatialanalysisonline.com

Autocorrelation
- Time series correlation model: $\{x_{t,1}\}$, $t = 1, 2, 3, \dots, n-1$ and $\{x_{t,2}\}$, $t = 2, 3, 4, \dots, n$

Spatial Autocorrelation
- Correlation coefficient $r$ for $\{x_i\}$, $i = 1, 2, 3, \dots, n$ and $\{y_i\}$, $i = 1, 2, 3, \dots, n$:
  $$ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} $$
- Time series correlation model: $\{x_{t,1}\}$, $t = 1, 2, 3, \dots, n-1$ and $\{x_{t,2}\}$, $t = 2, 3, 4, \dots, n$
- Mean values:
  $$ \bar{x}_{.1} = \frac{1}{n-1} \sum_{t=1}^{n-1} x_t, \qquad \bar{x}_{.2} = \frac{1}{n-1} \sum_{t=2}^{n} x_t, \qquad \text{large } n:\ \bar{x} = \frac{1}{n} \sum_{t=1}^{n} x_t $$
- Lag 1 autocorrelation:
  $$ r_1 = \frac{\sum_{t=1}^{n-1} (x_t - \bar{x})(x_{t+1} - \bar{x})}{\sum_{t=1}^{n} (x_t - \bar{x})^2} $$

Spatial Autocorrelation
- Classical statistical model assumptions
- Independence vs dependence in time and space
- Tobler’s first law: “All things are related, but nearby things are more related than distant things”
- Spatial dependence and autocorrelation
- Correlation and Correlograms

Spatial Autocorrelation
- Covariance and autocovariance
- Lags – fixed or variable interval
- Correlograms and range
- Stationary and non-stationary patterns
- Outliers
- Extending concept to spatial domain
  - Transects
  - Neighbourhoods and distance-based models

Spatial Autocorrelation
- Global spatial autocorrelation
  - Dataset issues: regular grids; irregular lattice (zonal) datasets; point samples
  - Simple binary coded regular grids – use of Joins counts
  - Irregular grids and lattices – extension to x,y,z data representation
  - Use of x,y,z model for point datasets
- Local spatial autocorrelation
  - Disaggregating global models

Spatial Autocorrelation: Joins counts (50% 1’s)
[Figure: three binary grids. A. Completely separated pattern (+ve); B. Evenly spaced pattern (-ve); C. Random pattern]

Spatial Autocorrelation: Joins count
- Binary coding
- Edge effects
- Double counting
- Free vs non-free sampling
- Expected values (free sampling): 1-1 = 15/60, 0-0 = 15/60, 0-1 or 1-0 = 30/60

Spatial Autocorrelation: Joins counts
[Figure: joins counts for the three grids. A. Completely separated (+ve); B. Evenly spaced (-ve); C. Random]

Spatial Autocorrelation: Joins count – some issues
- Multiple z-scores
- Binary or k-class data
- Rook’s move vs other moves
- First order lag vs higher orders
- Equal vs unequal weights
- Regular grids vs other datasets
- Global vs local statistics
- Sensitivity to model components

Spatial Autocorrelation: Irregular lattice – (x,y,z) and adjacency tables

Cell data, by cell coordinates (row/col); "." marks a position with no cell:

            col 1    col 2    col 3
   row 1      .      +4.55    +5.54
   row 2    +2.24    -5.15    +9.02
   row 3    +3.10    -4.39    -2.09
   row 4      .      +0.46    -3.06

Cell numbering:

            col 1    col 2    col 3
   row 1      .        3        7
   row 2      1        4        8
   row 3      2        5        9
   row 4      .        6       10

x,y,z view:

     x    y      z
     1    2     4.55
     1    3     5.54
     2    1     2.24
     2    2    -5.15
     2    3     9.02
     3    1     3.10
     3    2    -4.39
     3    3    -2.09
     4    2     0.46
     4    3    -3.06

[Figure: adjacency matrix for the 10 cells, total 1’s = 26]

Spatial Autocorrelation: “Spatial” (auto)correlation coefficient
- Coordinate (x,y,z) data representation for cells
- Spatial weights matrix (binary or other), $W = \{w_{ij}\}$
  - From last slide: $\sum\sum w_{ij} = 26$
- Coefficient formulation – desirable properties
  - Reflects co-variation patterns
  - Reflects adjacency patterns via weights matrix
  - Normalised for absolute cell values
  - Normalised for data variation
  - Adjusts for number of included cells in totals

Spatial Autocorrelation: Moran’s I
  $$ I = \frac{1}{p}\,\frac{\sum_i \sum_j w_{ij} (z_i - \bar{z})(z_j - \bar{z})}{\sum_i (z_i - \bar{z})^2}, \quad \text{where } p = \sum_i \sum_j w_{ij} / n $$
  hence $p = 26/10$ for our 10 cell example
- TSA model:
  $$ r_1 = \frac{\sum_t (x_t - \bar{x})(x_{t+1} - \bar{x})}{\sum_t (x_t - \bar{x})^2} $$

Spatial Autocorrelation: Moran I
- Moran I = 10*16.19/(26*196.68) = 0.0317 ≈ 0
[Figure: A. Computation of variance/covariance-like quantities, matrix C; B. C*W: Adjustment by multiplication of the weighting matrix, W]

Spatial Autocorrelation: Moran’s I
  $$ I = \frac{1}{p}\,\frac{\sum_i \sum_j w_{ij} (z_i - \bar{z})(z_j - \bar{z})}{\sum_i (z_i - \bar{z})^2}, \quad \text{where } p = \sum_i \sum_j w_{ij} / n $$
- Modification for point data
  - Replace weights matrix with distance bands, width h
  - Pre-normalise z values by subtracting means
  - Count number of other points in each band, N(h)
  $$ I(h) = N(h)\,\frac{\sum_i \sum_j z_i z_j}{\sum_i z_i^2} $$

Spatial Autocorrelation: Moran I Correlogram
[Figure: source data points; lag distance bands, h; correlogram]

Spatial Autocorrelation: Geary C
- Co-variation model uses squared differences rather than products
  $$ C = \frac{1}{p}\,\frac{\sum_i \sum_j w_{ij} (z_i - z_j)^2}{\sum_i (z_i - \bar{z})^2}, \quad \text{where } p = \frac{2 \sum_i \sum_j w_{ij}}{n - 1} $$
- Similar approach is used in geostatistics

Spatial Autocorrelation: Extending SA concepts
- Distance formula weights vs bands
- Lattice models with more complex neighbourhoods and lag models (see GeoDa)
- Disaggregation of SA index computations (row-wise) with/without row standardisation (LISA)
- Significance testing
  - Normal model
  - Randomisation models
  - Bonferroni/other corrections

Regression modelling: Simple regression – a statistical perspective
- One (or more) dependent (response) variables
- One or more independent (predictor) variables
- Linear regression is linear in coefficients:
  $$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots, \quad \text{or } y = x\beta $$
- Vector/matrix form often used
- Over-determined equations & least squares

Regression modelling: Ordinary Least Squares (OLS) model
  $$ y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \dots + \varepsilon_i, \quad \text{or } y = X\beta + \varepsilon $$
- Minimise sum of squared errors (or residuals)
- Solved for coefficients by matrix expression:
  $$ \hat{\beta} = (X^{T} X)^{-1} X^{T} y, \qquad \mathrm{var}(\hat{\beta}) = \sigma^2 (X^{T} X)^{-1} $$

Regression modelling: OLS – models and assumptions
- Model – simplicity and parsimony
- Model – over-determination, multi-collinearity and variance inflation
- Typical assumptions
  - Data are independent random samples from an underlying population
  - Model is valid and meaningful (in form and statistical)
  - Errors are iid
    - Independent; No heteroskedasticity; common distribution
  - Errors are distributed $N(0, \sigma^2)$

Regression modelling: Spatial modelling and OLS
- Positive spatial autocorrelation is the norm, hence dependence between samples exists
- Datasets often non-Normal >> transformations may be required (Log, Box-Cox, Logistic)
- Samples are often clustered >> spatial declustering may be required
- Heteroskedasticity is common
- Spatial coordinates (x,y) may form part of the modelling process

Regression modelling: OLS vs GLS
- OLS assumes no co-variation
  - Solution: $\hat{\beta} = (X^{T} X)^{-1} X^{T} y$
- GLS models co-variation:
  - $y \sim N(\mu, C)$ where C is a positive definite covariance matrix
  - $y = X\beta + u$ where u is a vector of random variables (errors) with mean 0 and variance-covariance matrix C
  - Solution: $\hat{\beta} = (X^{T} C^{-1} X)^{-1} X^{T} C^{-1} y$, with $\mathrm{var}(\hat{\beta}) = (X^{T} C^{-1} X)^{-1}$

Regression modelling: GLS and spatial modelling
- $y \sim N(\mu, C)$ where C is a positive definite covariance matrix (C must be invertible)
- C may be modelled by inverse distance weighting, contiguity (zone) based weighting, explicit covariance modelling…
- Other models
  - Binary data – Logistic models
  - Count data – Poisson models

Regression modelling: Choosing between models
- Information content perspective and AIC
  $$ AIC = -2\ln(L) + 2k, \qquad AIC_c = -2\ln(L) + 2k\left(\frac{n}{n - k - 1}\right) $$
  where n is the sample size, k is the number of parameters used in the model, and L is the likelihood function

Regression modelling: Some ‘regression’ terminology
- Simple linear
- Multiple
- Multivariate
- SAR
- CAR
- Logistic
- Poisson
- Ecological
- Hedonic
- Analysis of variance
- Analysis of covariance

Regression modelling: Spatial regression – trend surfaces and residuals (a form of ESDA)
- General model: $y = f(x_1, x_2, w)$
  - y - observations, f( , , ) - some function, $(x_1, x_2)$ - plane coordinates, w - attribute vector
- Linear trend surface plot
- Residuals plot
- 2nd and 3rd order polynomial regression
- Goodness of fit measures – coefficient of determination

Regression modelling: Regression & spatial autocorrelation (SA)
- Analyse the data for SA
- If SA ‘significant’ then
  - Proceed and ignore SA, or
  - Permit the coefficient, $\beta$, to vary spatially (GWR), or
  - Modify the regression model to incorporate the SA

Regression modelling: Geographically Weighted Regression (GWR)
- Coefficients, $\beta$, allowed to vary spatially, $\beta(t)$
- Model: $y = X\beta(t) + \varepsilon$
- Coefficients determined by examining neighbourhoods of points, t, using distance decay functions (fixed or adaptive bandwidths)
- Weighting matrix, W(t), defined for each point
- Solution: $\hat{\beta}(t) = (X^{T} W(t) X)^{-1} X^{T} W(t) y$
- GLS: $\hat{\beta} = (X^{T} C^{-1} X)^{-1} X^{T} C^{-1} y$

Regression modelling: Geographically Weighted Regression
- Sensitivity – model, decay function, bandwidth, point/centroid selection
- ESDA – mapping of surface, residuals, parameters and SEs
- Significance testing
  - Increased apparent explanation of variance
  - Effective number of parameters
  - AICc computations
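
The GWR solution $\hat{\beta}(t) = (X^{T} W(t) X)^{-1} X^{T} W(t) y$ is a weighted least squares fit repeated at each location t. The following NumPy sketch illustrates one such local fit. It is not the method of any particular GWR package: the function name `gwr_coefficients`, the Gaussian distance-decay kernel, the fixed bandwidth and the synthetic grid data are all assumptions made for this example.

```python
import numpy as np

def gwr_coefficients(coords, X, y, t, bandwidth):
    """Estimate local coefficients at location t by weighted least squares.

    Computes beta_hat(t) = (X^T W(t) X)^{-1} X^T W(t) y, where W(t) is a
    diagonal matrix of distance-decay weights centred on t.
    """
    d = np.linalg.norm(coords - t, axis=1)      # distance from t to every sample
    w = np.exp(-0.5 * (d / bandwidth) ** 2)     # Gaussian kernel (an assumed choice)
    XtW = X.T * w                               # equals X^T W(t) without building W(t)
    return np.linalg.solve(XtW @ X, XtW @ y)

# Synthetic 5 x 5 grid of sample points (illustrative data only)
coords = np.array([[i, j] for i in range(5) for j in range(5)], dtype=float)
x1 = coords[:, 0] + 0.5 * coords[:, 1] ** 2
X = np.column_stack([np.ones(len(x1)), x1])     # intercept plus one predictor
y = 1.0 + 2.0 * x1                              # exact linear relation, no noise

beta_local = gwr_coefficients(coords, X, y, t=np.array([2.0, 2.0]), bandwidth=1.5)
print(beta_local)
```

Because the synthetic response is exactly linear in the predictor, the local fit returns coefficients close to 1 and 2 at any location. With a very large bandwidth the weights all approach 1 and the estimate coincides with the global OLS solution, which is one way to see the sensitivity to bandwidth noted above.
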
Regression modelling: Geographically Weighted Regression
- Count data – GWPR
  - Use of offsets
  - Fitting by ILSR methods
- Presence/Absence data – GWLR
  - True binary data
  - Computed binary data - use of re-coding, e.g. thresholding
  - Fitting by ILSR methods

Regression modelling: Regression & spatial autocorrelation (SA)
- Modify the regression model to incorporate the SA, i.e. produce a Spatial Autoregressive model (SAR)
- Many approaches – including:
  - SAR – e.g. pure spatial lag model, mixed model, spatial error model etc.
  - CAR – a range of models that assume the expected value of the dependent variable is conditional on the (distance weighted) values of neighbouring points
  - Spatial filtering – e.g. OLS on spatially filtered data

Regression modelling: SAR models
- Pure spatial lag: $y = \rho W y + \varepsilon$
  - W: spatial weights matrix; $\rho$: autoregression parameter
- Re-arranging: $y = (I - \rho W)^{-1} \varepsilon$
- MRSA model: $y = X\beta + \rho W y + \varepsilon$
  - $X\beta$: linear regression added

Regression modelling: SAR models
- Linear regression + spatial error
- Spatial error model: $y = X\beta + \varepsilon$, where $\varepsilon = \lambda W \varepsilon + u$
  - $u$: iid error vector; $\lambda W \varepsilon$: spatial weighted error vector
- Substituting and re-arranging:
  $$ y = X\beta + \lambda W (y - X\beta) + u, \quad \text{or } y = X\beta + \lambda W y - \lambda W X \beta + u $$
  - $X\beta$: linear regression (global); $\lambda W y$: SAR lag; $\lambda W X \beta$: local trend; $u$: iid error vector

Regression modelling: CAR models
- Standard CAR model:
  $$ E(y_i \mid \text{all } y_{j \ne i}) = \mu_i + \rho \sum_{j \ne i} w_{ij} (y_j - \mu_j) $$
  - $\mu_i$: expected value at i; $\rho$: autoregression parameter; the sum is a weighted mean for the neighbourhood of i
  - $w_{ij}$: local weights matrix – distance or contiguity
- Variance: $\mathrm{var}(y) = (I - \rho W)^{-1} M$
- Different models for W and M provide a range of CAR models

Regression modelling: Spatial filtering
- Apply a spatial filter to the data to remove SA effects
- Model the filtered data
- Example: $y = X\beta + \varepsilon$
  $$ y - \rho W y = X\beta - \rho W X \beta + \varepsilon, \quad \text{or} $$
  $$ (I - \rho W)\, y = (I - \rho W)\, X\beta + \varepsilon, \quad \text{hence} $$
  $$ y = X\beta + (I - \rho W)^{-1} \varepsilon $$
  - $(I - \rho W)$: spatial filter
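
The spatial filtering idea, OLS applied to $(I - \rho W)y$ and $(I - \rho W)X$, can be sketched as follows. This is a minimal illustration under stated assumptions rather than a complete estimation procedure: $\rho$ is treated as known (estimating it is a separate step not shown), the weights are row-standardised contiguity weights along a simple transect, and the helper names `chain_weights` and `filtered_ols` and all data values are hypothetical.

```python
import numpy as np

def chain_weights(n):
    """Row-standardised contiguity weights for n cells along a transect."""
    W = np.zeros((n, n))
    idx = np.arange(n - 1)
    W[idx, idx + 1] = 1.0                        # each cell neighbours the next one
    W[idx + 1, idx] = 1.0                        # and the previous one
    return W / W.sum(axis=1, keepdims=True)      # every row sums to 1

def filtered_ols(W, X, y, rho):
    """OLS on spatially filtered data: regress (I - rho W) y on (I - rho W) X."""
    F = np.eye(W.shape[0]) - rho * W             # the spatial filter
    beta, *_ = np.linalg.lstsq(F @ X, F @ y, rcond=None)
    return beta

# Simulate y = X beta + (I - rho W)^{-1} u with iid errors u (illustrative values)
n = 25
W = chain_weights(n)
X = np.column_stack([np.ones(n), np.arange(n, dtype=float) ** 2])
beta_true = np.array([3.0, -0.5])
rho = 0.4                                        # assumed known for this sketch
u = np.random.default_rng(0).normal(scale=0.1, size=n)
y = X @ beta_true + np.linalg.solve(np.eye(n) - rho * W, u)

print(filtered_ols(W, X, y, rho))                # estimate from the filtered data
```

After filtering, the model has the form $(I - \rho W)y = (I - \rho W)X\beta + u$ with iid errors, so ordinary least squares applies to the transformed variables. Setting `rho = 0` reduces the sketch to plain OLS.
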