Spatial Statistics I
RESM 575, Spring 2010
Lecture 7

Today
  Part A: Spatial statistics
    What are spatial measurements and statistics?
  Part B:
    Measuring geographic distributions
    Testing statistical significance
    Identifying patterns

And if we have time…
  Spatial statistics (continued)
    Defining spatial neighborhoods and weights
    Identifying clusters
    Using statistics with geographic data
    Analyzing geographic relationships

Part A: Spatial Statistics
Spatial Measurements

GIS analysis
  In many cases the map is the analysis
  Or GIS tools and methods are used to create new data, which is then displayed so you can analyze it and draw conclusions

Issues with traditional GIS analysis
  Making a map may not be enough to get the answers you need
  Trying to draw conclusions from a map is not always easy

Mapping questions
  How to classify features?
  How to symbolize features?
  How to show relationships?

New spatial statistics tools
  To describe the distribution of a set of features
  To discern patterns
  To measure relationships between features

Interdisciplinary contributions
  Geographers
  Regional scientists
  Ecologists
  Geologists
  Economists
  Wildlife biologists
  Sociologists

New spatial stat tools
  Rely on statistics to cut through the map display
  Get right at the patterns and relationships in the data
  Space or location is a fundamental component of the stats

New questions from spatial stats
  How sure am I that the pattern I am seeing isn’t due to some random occurrence?
  To what extent does the value of a feature depend on the values of surrounding features?
  How well does the value of one attribute predict the value of another?
  What are the trends in the data?

What we will use spatial stats to find
  1. How features are distributed
  2. What pattern the features create
  3. Where the clusters are
  4. What the relationships are between sets of features or values

Spatial statistics
  Exploratory tools
  Help you measure processes, distributions, and relationships
  Spatial pattern and processes
  “Results are statistically significant at the .05 level”

Understanding your data
  In general, you can analyze features using location alone, or using location influenced by an attribute value
  The type of attribute value will influence the statistical method you use

Geographic features
  Are either:
    Discrete, or
    Spatially continuous
  Spatially continuous surface

Geographic features
  Spatially continuous categorical data

Attribute values
  Include:
    Nominal: also referred to as categorical data
    Ordinal: ranked data from high to low (soil suitability scores)
    Interval: quantities, tells us relative magnitude
    Ratio: relationship between quantities

Choosing a spatial stat method
  Depends on the type of data you have
  For example:
    For discrete data, you can analyze the distribution of the features themselves or the distribution of an attribute associated with the features
    For spatially continuous data, you are interested in the distribution of the values

Frame the question
  In inferential stats the analysis is stated as a hypothesis
    Landslides in this area tend to occur more frequently on slopes over 30%
  To ensure impartiality, we structure the analysis assuming the inverse of the hypothesis is true
    Landslides are equally likely to occur on any type of slope

Test the significance of the statistic
  The null hypothesis: there is no pattern or relationship
  Significance tests help us decide whether or not to reject the null hypothesis
  Must also decide how sure we need to be, expressed as a significance level between 0 and 1: 80% sure (0.20 significance level) or 95% sure (0.05 significance level) that the clusters did not occur by chance

Question the results!
  Even if statistically significant, question them
  Consider:
    The scale you’re working at
    Where the study boundaries fall
    The type of data you are using
    The quality of the data
    How you defined proximity between features (straight line vs. travel time on roads)

Part B: Spatial Statistics
  Measuring geographic distributions
  Testing statistical significance
  Identifying patterns

Measuring geographic distributions
  Identify spatial characteristics of a distribution
    Where is the center?
    What feature is most central?
    How are features dispersed around the center?

ArcTools in ArcGIS

Where is the center?
  Mean Center tool
    Computes the average X and Y coordinates of all features
    Creates a new point feature

Mean Center tool
  More common uses: to compare distributions of different types of features, or to find the center of features based on an attribute value

What is the most central feature?
  Central Feature tool
    Identifies the most centrally located feature
    The feature having the lowest total distance to all other features

What is the most central feature?
  More interesting is adding a weight to the analysis, such as population
  We are now finding not just which site is most central, but which is the most accessible to the greatest number of people
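The mean center and central feature ideas can be sketched in a few lines of Python. This is only an illustration of the definitions on the slides, not the ArcGIS implementation. The function names, the sample `sites`, and the `population` weights are made up for the example. The weighted central feature here assumes each distance is multiplied by the weight of the feature being travelled to.

```python
import math

def mean_center(points, weights=None):
    """Average X and Y of all points; a weighted average if weights are given."""
    if weights is None:
        weights = [1.0] * len(points)
    total = sum(weights)
    x = sum(w * px for (px, _), w in zip(points, weights)) / total
    y = sum(w * py for (_, py), w in zip(points, weights)) / total
    return (x, y)

def central_feature(points, weights=None):
    """Index of the point with the lowest total distance to all other points.

    Assumption for this sketch: with weights, each distance is multiplied
    by the weight of the destination point (e.g. its population).
    """
    if weights is None:
        weights = [1.0] * len(points)

    def total_distance(i):
        return sum(w * math.dist(points[i], q) for q, w in zip(points, weights))

    return min(range(len(points)), key=total_distance)

# Hypothetical sites along a road (x, y) and the population served at each one.
sites = [(0.0, 2.0), (1.0, 2.0), (3.0, 2.0), (6.0, 2.0), (10.0, 2.0)]
population = [1, 1, 1, 1, 10]

print(mean_center(sites))                  # unweighted center
print(mean_center(sites, population))      # pulled toward the populous site
print(central_feature(sites))              # most central site by distance alone
print(central_feature(sites, population))  # most accessible to the most people
```

With these made-up numbers the unweighted central feature is the middle site, while the population weight moves both the mean center and the central feature toward the last site, which is the shift from "most central" to "most accessible" described above.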
Measuring feature distribution
  Standard Distance tool
    Measures the dispersion of features around the mean center
    Result is a summary statistic representing distance, drawn as a circle around the mean center
    If the circle is large, incidents are widespread
    If small, incidents are more localized

Distributional trends
  Directional Distribution (standard deviational ellipse) tool
    Identifies spatial trends in the distribution of features
  Uses
    Compare distributions
    Examine different time periods
    Show compactness and orientation

Testing statistical significance
  The next section on identifying patterns, and the later one on spatial relationships, let us perform significance tests on the results before accepting them

Using significance tests with spatial data
  Spatial data contradicts some of the assumptions of inferential statistics
  You need to be aware of these limitations!

Assumptions
  Testing a random sample
    With GIS data in a database, you may not know whether the data was randomly sampled
    How large is the sample in relation to the population?
  Even with randomness assumed, spatial data often violates the independence of observations in a sample
  Spatial data is rarely evenly distributed across a region

For spatial pattern analysis…
  The null hypothesis is that features are randomly distributed across the study area (no spatial pattern)
  Hard to imagine this being true
  You have to make one of two common sampling assumptions: randomization or normalization

Identifying Patterns

Why study patterns?
  Patterns range from completely clustered to completely dispersed

Identifying patterns (applications)
  Forestry applications
    USFS may measure the pattern of clear cuts to ensure sufficient contiguous forest habitat remains
    An agency may allow a level of clustering of clear cuts and then make sure it is not exceeded
  Wildlife studies
    If a population is dispersed, the species can live in a wide range of habitats; if clustered, it has very specific habitat requirements

Goal for analyzing spatial patterns
  Are there underlying spatial processes influencing the locations of our features?
  Are our features randomly located throughout the study area, or do they display a clustered or dispersed pattern?

Two approaches for analyzing spatial patterns
  Global calculations
    Identify overall patterns or trends in the data
    Effective for complex, messy data
    Used when you are interested in broad overall results
    Work by comparing feature locations and/or attributes to a theoretical random distribution to determine whether you have statistically significant clustering or dispersion

2nd approach
  Local calculations
    Identify the extent and locations of clustering
    Answer: where do we have spatial clustering?
    Process every feature within the context of its neighboring features to determine whether it represents a spatial outlier or is part of a statistically significant spatial cluster

Are features clustered?
  Can be found with the Nearest Neighbor Index, which does not require specifying an attribute
  If based on an attribute, we can test for spatial autocorrelation using the Spatial Autocorrelation (Moran’s I) tool
    Things that are closer together are more alike than things that are farther apart
    Measures similarity of neighboring features
    Identifies whether features are clustered or dispersed

Output for this tool
  Output gives a graphical display of results in four different ways
    Statistical numbers
    Pictorial of the numbers
    Significance
    Sentence

Example
  Compare the Z scores for two different years of data; if the Z score increases, clustering is more intense

Locate the hot spots
  This is a local question that requires the Hot Spot Analysis (Getis-Ord Gi*) tool
  Indicates the extent to which each feature is surrounded by similarly high or low values
  Where do features with similar attribute values cluster together spatially?

Getis-Ord Gi* tool
  Identifies where clustering occurs in both high and low values
  Calculates a Z score for each feature
    High Z = hot spot (a feature with a high value surrounded by other features with high values)
    Low (large negative) Z = cold spot (a feature with a low value surrounded by other features with low values)

Notes on the Z score
  A Z score is measured in standard deviations
  It is a reference value associated with a standard normal distribution
  A very high or very low Z score would be found in the tails

More notes on Z score
  A very high or low Z score means the pattern deviates significantly from a hypothetical random pattern
  For example, at a 95% confidence level the critical Z scores are -1.96 and +1.96
  If Z is between -1.96 and +1.96, you can’t reject the null
    You are seeing one version of a random pattern
  If Z is very high or low (e.g., -2.5 or +5.4), the pattern is too unusual to be the result of random chance, so we reject the null hypothesis
  REMEMBER: The null hypothesis is that features are randomly distributed across the study area

Reference
  Mitchell, A. 2005. ESRI Guide to GIS Analysis, Volume 2. ESRI Press, Redlands, CA.
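As a worked illustration of the Nearest Neighbor Index and the ±1.96 Z score rule from this lecture, here is a minimal Python sketch. It uses the classic Clark and Evans expected distance and standard error for a random pattern, with no edge correction. The function names and the two sample point sets are assumptions made for the example, and the result depends heavily on the study area value you supply.

```python
import math

def nearest_neighbor_index(points, area):
    """Return (index, z) for a set of (x, y) points in a study area of given size.

    index < 1 suggests clustering, index > 1 suggests dispersion.
    Assumes a random pattern under the null hypothesis and no edge correction.
    """
    n = len(points)
    # Observed mean distance from each point to its nearest neighbor.
    observed = sum(
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ) / n
    # Expected mean distance and standard error for a random pattern.
    expected = 0.5 / math.sqrt(n / area)
    std_error = 0.26136 / math.sqrt(n * n / area)
    z = (observed - expected) / std_error
    return observed / expected, z

def interpret(z, critical=1.96):
    """Apply the 95% rule: only reject the null outside -1.96..+1.96."""
    if z < -critical:
        return "clustered"
    if z > critical:
        return "dispersed"
    return "cannot reject the null hypothesis (random)"

# Hypothetical 10 x 10 study area (area = 100) with 25 points in each pattern.
grid = [(1 + 2 * i, 1 + 2 * j) for i in range(5) for j in range(5)]         # evenly spaced
clump = [(5 + 0.1 * i, 5 + 0.1 * j) for i in range(5) for j in range(5)]    # tightly packed

grid_index, grid_z = nearest_neighbor_index(grid, area=100.0)
clump_index, clump_z = nearest_neighbor_index(clump, area=100.0)
print(grid_index, grid_z, interpret(grid_z))
print(clump_index, clump_z, interpret(clump_z))
```

The evenly spaced grid produces a large positive Z score and the tight clump a large negative one, so both fall in the tails and the null hypothesis is rejected in each case, in opposite directions.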