Simulating spatial data We’ve talked about simulating point patterns Inference was via simulation Now consider simulating geostatistical and areal data Given a set of locations s , a model, and parameter values want to generate a set of values for Z (s ) Focus on values from normal distributions, want N(µ, σ 2 ) If Z independent, easy: generate calculate: σ Z + µ Z ∼ N(0, 1) If spatially correlated: want N(µ, Σ) a brute-force algorithm: calculate µ or µ(s) for each location if trend determine Σ from geostat model or equ’s for CAR/SAR calculate C = Cholesky square-root decomposition of Σ. simulate vector of standard normals, Z ∼ N(0, I ) 0 return Z (s ) = µ + C Z c Philip M. Dixon (Iowa State Univ.) C C =Σ Spatial Data Analysis - Part 14 0 Spring 2016 1/7 Why does this work? Mean: E Z (s ) = µ + C 0 Z =µ E Variance: Var Z (s ) = C 0 Var ZC = C IC = C C = Σ 0 Distribution: linear compbinations of normals Example: 3 2 1 1.75 Σ = 2 3 2 , C ≈ 0 1 2 3 0 0 are normal 1.15 0.58 1.29 1.03 0 1.26 Zs1 = 1.73Z1 Zs2 = 1.15Z1 + 1.29Z2 Zs3 = 0.58Z1 + 1.03Z2 + 1.26Z3 c Philip M. Dixon (Iowa State Univ.) Spatial Data Analysis - Part 14 Spring 2016 2/7 Timing: k = 50, simulate 1000 sets separately: 7.18 sec simulate all 1000 simultaneously: 0.04 sec Difference is time req. to calculate the Cholesky Practical use: either calculate C once, do Z (s ) “by hand” or, simulate many sets, use as needed Cholesky algorithm fails if Σ is large, “turning bands” algorithm simulate a direction θk (will have many of these) simulate many X ’s in that direction (1D problem) for any Z (s), project Z (s) (in 2D) onto the line, record X at that projected location repeat for many (often 15) directions, average contributions from all directions The detail is relating the 2D covariance function for Z (s ) to the corresponding 1D covariance function for the line c Philip M. Dixon (Iowa State Univ.) Spatial Data Analysis - Part 14 Spring 2016 3/7 Conditional simulation Previous method generates a new data set mean and covariance functions given by the model same in original data and simulation but realization may look completely different from original data set original data may have a high region in top left corner simulation may have that in the middle, or lower right Conditional Simulation: Given values at obs. locations, simulate values at other points simulation of new values conditional on obs. values Why use this? visualize uncertainty calculate uncertainty in quantity calculated over entire area e.g. overall average or total, % area > some action level both assume exact measurements at obs. locations c Philip M. Dixon (Iowa State Univ.) Spatial Data Analysis - Part 14 Spring 2016 4/7 Conditional simulation “honors” observed values at {s c } want to simulate random values at {s n } the usual algorithm: calculated kriging predictions = Z ∗ (s n ) using values at Z (s c ) simulate unconditional random field at {s c } and {s n } = Z ◦ (s c ) and Z ◦ (s n ) calculate kriging predictions = Z † (s n ) using values at Z ◦ (s c ) return Z ∗ (s n ) + Z ◦ (s n ) − Z † (s n ) c Philip M. Dixon (Iowa State Univ.) Spatial Data Analysis - Part 14 Spring 2016 5/7 Conditional simulation properties If s is a conditioning point (obs. value), i.e. one of the locations in the {s c } set, 1st kriging prediction: Z ∗ (s c ) = obs. value, Z (s c ) 2nd kriging prediction: Z † (s c ) = obs. value, Z ◦ (s c ) so returned value is obs. value, Z (s c ) If s n is far from any obs. loc, s c : Z ∗ (s n ) = µ and Z † (s n ) = µ so return the unconditional predictions, Z ◦ (s n ) Both behaviours for extreme situations “make sense” Usually only used for geostat data. With areal data, have an obs. value for all regions in study area c Philip M. Dixon (Iowa State Univ.) Spatial Data Analysis - Part 14 Spring 2016 6/7 Simulating areal data Usually much easier than geostat data: many fewer regions Use brute force algorithm used for small geostat data sets Given connection matrix and dependence parameter, calculate VC matrix Calculate Cholesky decomposition Simulate appropriate number of standard normals, multiply by Cholesky to the a realizaiton from the mvNormal All has been for mvNormal distributions simulating correlated non-normal distributions much harder and sometimes the distribution doesn’t exist c Philip M. Dixon (Iowa State Univ.) Spatial Data Analysis - Part 14 Spring 2016 7/7