Spatial Analysis Concepts and Challenges
Briggs Henan University 2012

Topics
• Description versus Analysis
• The concepts of Process, Pattern and Analysis
• Issues and challenges in spatial data analysis
• Measuring space

Description and Analysis: Description
• Most GIS systems are used by governments and private companies to describe the real world
• This helps the organization “do its job”
  – For example, manage sewer and water networks
  – Manage land resources
• Most GIS systems are primarily designed for this purpose
  – They are used to develop spatial databases to describe the real world and help manage it.

Description and Analysis: Analysis
Is the location of the software industry different from that of the telecommunications industry?
• Tries to understand the processes which cause or create the patterns in the real world
• Understanding processes:
  – Helps the organization do its job better
    • Make better decisions, for example
  – Helps us understand the phenomenon itself
    • This is the role of science
Here, we are using “centrographic statistics” to help answer this question

Description: Water and Sewer system
Analysis: Do the locations of the software and telecommunications industries differ?
We will talk about analysis.

Process, Pattern and Analysis
• Processes operating in space create patterns
• Spatial Analysis is aimed at:
  – Identifying and describing the pattern
  – Identifying and understanding the process
[Diagram: Processes create (or cause) Patterns]

Spatial Analysis is aimed at:
• Identifying and describing the pattern
The pattern is clearly clustered (points are in “groups”)
• Identifying and understanding the process
Access to transportation. Agglomeration economies* from sharing ideas, access to skilled labor, access to business services.
*cost savings from many firms locating in the same area
We will focus on spatial analysis

Process, Pattern and Analysis
• Often, we cannot observe (or “see”) the process, so we have to infer (“guess at”?) the process by observing the pattern
[Diagram: Processes create (or “cause”) Patterns; we infer Processes from Patterns]

Spatial Analysis: successive levels of sophistication
Four levels of Spatial Analysis; each is more advanced (more difficult!):
1. Spatial data description
2. Exploratory Spatial Data Analysis (ESDA)
3. Spatial statistical analysis and hypothesis testing
4. Spatial modeling and prediction
We will look at all 4 levels in this lecture series
More difficult, but more useful! (more powerful)

Spatial Analysis: successive levels of sophistication
1. Spatial data description:
– Focus is on describing the world, and representing it in a digital format
  --computer map
  --computer database
– Uses classic GIS capabilities
  --buffering, map layer overlay
  --spatial queries & measurement

Spatial Analysis: successive levels of sophistication
2. Exploratory Spatial Data Analysis (ESDA):
– searching for patterns and possible explanations
– GeoVisualization through data graphing and mapping
  --Kernel Density Estimation
  --Overlay transportation network

Spatial Analysis: successive levels of sophistication
2. Exploratory Spatial Data Analysis (ESDA):
– searching for patterns and possible explanations
– GeoVisualization through calculation and display of Centrographic statistics
  --Calculation of Centrographic Statistics

Spatial Analysis: successive levels of sophistication
3.
Spatial statistical analysis and hypothesis testing
– Are data “to be expected” or are they “unexpected” relative to some statistical model, usually of a random process (pure chance)?
We will look at statistical hypothesis testing for:
  --point patterns
  --also for polygon data
[Figure: distribution curve centered on 0, with 2.5% in each tail beyond -1.96 and 1.96]
We can test if the spatial pattern for software & telecommunications companies in Dallas is clustered (a pattern) or “random” (no pattern)

Spatial Analysis: successive levels of sophistication
4. Spatial modeling: prediction
– Construct models (of processes) to predict spatial outcomes (patterns)
Notice how the density of points (number per square km) decreases as we move away from the highway.
We can construct regression models to predict location patterns. However, for spatial data, we need special spatial regression models.
Density of points = f (distance from highway)
[Figure: density of points plotted against distance from highway]

The first example of Spatial Analysis
• John Snow’s maps of cholera in 1850s London
Was it ESDA or hypothesis testing?
• Did he discover the association between water and cholera after drawing the map? That is ESDA
• Did he draw the map in order to prove the association? That is using a map for hypothesis testing

Maps are good, but more is needed!
A. Is this clustered?
B. Is this clustered?
We must test rigorously using spatial analysis methods, not just look and guess.
Source: R & Y, p. 5

Why is this important?
Is it clustered?
We must measure and test, not just look and guess!
Because that is science!
Because that is how earth management decisions must be made!

Why we need analysis, and not just visual examination
See handout!!!
You need this course!

A Clustered or random?
B Clustered or random?
Source: Rogerson and Yamada, 2009
(clustered = points are in “groups”) = pattern exists
(random = points can be anywhere) = no pattern

Issues/Challenges/Problems in Spatial Analysis
Summarize these now. Talk in greater detail about them throughout this lecture series.

Critical Issues in Spatial Analysis
• Spatial autocorrelation
  – Data from locations near to each other are usually more similar than data from locations far away from each other
• Modifiable areal unit problem (MAUP)
  – Results may depend on the specific geographic unit used in the study
  – Province or county; county or city
  – MAUP: zone effect
• Scale affects representation and results
  – Cities may be represented as points or polygons
  – Results depend on the scale at which the analysis is conducted: province or county
  – MAUP: scale effect
• Ecological fallacy
  – Results obtained from aggregated data (e.g. provinces) cannot be assumed to apply to individual people
  – MAUP: individual effect
• Non-uniformity of Space
  – Phenomena are not distributed evenly in space
  – Be careful how you interpret results!
• Edge issues
  – Edges of the map, beyond which there is no data, can significantly affect results

Critical Issues in Spatial Analysis
• Modifiable areal unit problem (MAUP)
  – Results may depend on the specific geographic unit used in the study
  – Province or county; county or city
  – MAUP: zone effect
• Scale affects representation and results
  – Results depend on the scale at which the analysis is conducted
  – MAUP: scale effect
• Ecological fallacy
  – Results obtained from aggregated data (e.g. provinces) cannot be assumed to apply to individual people
  – MAUP: individual effect
• Non-uniformity of Space
  – Phenomena are not distributed evenly in space
  – Be careful how you interpret results!
• Edge issues
  – Edges of the map, beyond which there is no data, can significantly affect results
• Spatial autocorrelation: the biggest of all!
  – Data from locations near to each other are usually more similar than data from locations far away from each other

Modifiable areal unit problem (MAUP): 3 in 1!!!
– Results may depend on the specific geographic unit used in the study
– Dangerous to assume results for one set of units will also apply for another
Zonal effect: similar size and number of units, but different boundaries
• Zip codes versus census tracts
• Postal zones versus city neighborhoods
Scale effect: increases size and decreases number of units
• Counties versus provinces
Individual effect (ecological fallacy): results from geographic units may not apply to individual people

Modifiable areal unit problem (MAUP): zonal
• Census Tracts versus Zip codes
• Problem not as big, usually, as for scale differences
[Maps: Census Tracts (used by US Census Bureau for data); Zip Code Areas (used by US Postal Service)]

Modifiable areal unit problem (MAUP): scale
• Counties versus Provinces
• Usually a bigger problem than for zonal
Will results be the same? Do conclusions still apply?

Aggregation
• combining smaller units into bigger units
• affects results!
• Note how:
  • variance ($s^2$) decreases
  • correlation coefficient ($r_{XY}$) increases (because of less variability)

Scale:
• Results obtained at one scale do not necessarily apply at other scales
  – A pattern may be clustered at one scale but dispersed at another scale
[Maps: Population clustered into cities; City populations are dispersed]
Scale is always very important in spatial analysis!

Scale: Always Important
– ratio of distance on a map to the equivalent distance on the earth's surface.
– Scale must be shown on every map
– Use a scale bar because it stays correct when the map is enlarged or reduced
[Scale bar: 0, 1, 2 km]
Large scale: objects are large, small area covered
Small scale: objects are small, large area covered
[Figure: Dallas shown as a point and as a polygon; scale bar 10 km]
• Affects how objects are represented on a map and how data is stored in a database
  – Important for research design and data collection
  – Cities may be points or polygons

Ecological fallacy
Results from aggregated data (e.g. provinces) cannot be applied to individual people
A special case of the MAUP problem
Encountered in spatial and non-spatial analysis
Usually because a variable was left out (omitted variable)
[Figure: crime rate plotted against income]
If low income provinces have a high crime rate, we cannot assume that low income people commit crimes.
Perhaps low income provinces do not have money to pay for police.
--“Yuan spent on policing” is the omitted variable

Non-uniformity of Space: things are not evenly distributed in space
– Bank robberies are clustered
– But only because banks are clustered
[Maps: Banks; Bank Robberies]
Bonnie and Clyde were two very famous bank robbers in Texas in the 1930s.
They were asked “Why do you rob banks?”
They replied “Because that is where the money is! What a stupid question!”
(the expected answer was, perhaps, “because we needed money for food”)

Non-uniformity of Space and Choropleth Maps
Illiteracy in China
[Two maps, captioned:]
Henan does not have high illiteracy!
Henan has high illiteracy!

Non-uniformity of Space and Choropleth Maps
Always normalize data if drawing a choropleth map
– By total population
– By geographic area
• Do not map “counts” unless population and/or geographic area are the same size for all observation units
• Failing to “normalize” is a very common mistake made by non-professional GIS people
  – You are professionals
  – Do not make that mistake!
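
A small sketch of what “normalizing” means in practice may help here. The unit names, the numbers, and the helper function `normalize` below are all made up for illustration (they are not taken from the slides or from any real census); the point is only that a unit with the larger count can still have the lower rate.

```python
def normalize(counts, denominators, per=1.0):
    """Divide each unit's raw count by its denominator (population or area).

    `per` rescales the result, e.g. per=100 gives a percentage.
    """
    return {unit: per * counts[unit] / denominators[unit] for unit in counts}


# Made-up values for two hypothetical observation units (assumption, not real data).
illiterate = {"Unit A": 50_000, "Unit B": 20_000}       # raw counts
population = {"Unit A": 1_000_000, "Unit B": 200_000}   # total population
area_km2 = {"Unit A": 5_000, "Unit B": 10_000}          # geographic area

# Normalize by total population: illiteracy rate in percent.
rate_percent = normalize(illiterate, population, per=100)

# Normalize by geographic area: illiterate persons per square km.
density_per_km2 = normalize(illiterate, area_km2)

print(rate_percent)      # Unit A has the larger count but the lower rate
print(density_per_km2)
```

Mapping `illiterate` directly would make Unit A look worse than Unit B; mapping `rate_percent` reverses that impression, which is the kind of reversal the two Henan maps illustrate.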
Edge or Boundary Effect
– Every study region has a boundary (unless you study the entire world!)
– You do not have data for outside your study region
– However, the outside data can affect the inside data if there is spatial autocorrelation
– Consequently, edges of the map, beyond which there is no data, can significantly affect results
Solutions:
[Diagram: core study region surrounded by a periphery or guard area]
Use Core/Periphery
  --analyze only the core
  --use edge only for “neighborhood” calculations
  --reduces amount of data available
Use the toroid concept
  --bends the left edge to meet the right and the top to meet the bottom
  --uses all the data
  --assumes that there is no systematic spatial trend in data

Spatial Autocorrelation
• Spatial organization is usually important
• The results from a traditional regression analysis ignore how the observation units are organized spatially!
• Data from locations near to each other are usually more similar than data from locations far away
  – Must be considered in your analysis
  – Also causes serious problems with traditional statistical hypothesis testing
• Spatial statistical models are essential

What is the most common mistake in GIS analysis?
Much more basic than any discussed above.

Single most common error in GIS Analysis
--intending a one to one join of attribute data to spatial table
--getting a one to many join of attribute data to spatial table

Attribute table (51 states):

NAME        FIPS  CODE      POP2000
Alabama     01    AL      4,447,100
Alaska      02    AK        626,932
…..
Georgia     13    GA      8,186,453
Hawaii      15    HI      1,211,537
Idaho       16    ID      1,293,953
…..
Wisconsin   55    WI      5,363,675
Wyoming     56    WY        493,782
Total                   282,421,906

Spatial table after joining attribute to spatial data (54 observations; Hawaii appears four times):

FID  SHAPE    NAME           FIPS  CODE      POP2000
0    Polygon  Alabama        01    AL      4,447,100
1    Polygon  Alaska         02    AK        626,932
…..
11   Polygon  Georgia        13    GA      8,186,453
12   Polygon  Hawaii-Hawaii  15    HI      1,211,537
13   Polygon  Hawaii-Maui    15    HI      1,211,537
14   Polygon  Hawaii-Oahu    15    HI      1,211,537
15   Polygon  Hawaii-Kauai   15    HI      1,211,537
16   Polygon  Idaho          16    ID      1,293,953
…..
53   Polygon  Wisconsin      55    WI      5,363,675
54   Polygon  Wyoming        56    WY        493,782
Total                                    286,056,517

Errors with the attribute data will occur if Hong Kong or Guangdong are not correctly drawn in the shapefile (spatial data)
If there are islands, the province must be drawn as a multi-part feature in the shapefile (the spatial data)
  --then there is only one row in the attribute table
If each island is drawn as a separate feature, there will be multiple rows in the attribute table

Measuring Space
Not easy!

Spatial is special: 3 primary concepts
– Distance
– Adjacency or neighborhood
– Interaction

Fundamental Spatial Concepts
• Distance
  – The magnitude of spatial separation
  – Euclidean (straight line) distance is often only an approximation
• Adjacency or neighborhood
  – Nominal or binary (0,1) equivalent of distance
  – Levels of adjacency exist: 1st, 2nd, 3rd nearest neighbor, etc.
• Interaction
  – The strength of the relationship between entities
  – An inverse function of distance

Distance is not simple!
• Cartesian distance via Pythagoras
$$d_{ij} = \sqrt{(X_i - X_j)^2 + (Y_i - Y_j)^2}$$
Use for projected data, at local scale
• Spherical distance via spherical coordinates
$$\cos d = (\sin a \sin b) + (\cos a \cos b \cos P)$$
where:
d = arc distance
a = latitude of A
b = latitude of B
P = degrees of longitude, A to B
Use for unprojected data, or at world scale
• Possible distance metrics:
  – Euclidean straight line/airline
  – city block/Manhattan metric
  – distance through network
  – time/friction through network

Spatial neighbors based on adjacency
Rook: sharing a boundary
Queen: sharing a boundary or a point
[Figure panels: Square raster; Hexagons; Irregular]

1st and 2nd order adjacency
[Figure: 1st order and 2nd order neighbors for rook, hexagon and queen cases]

Interaction
Based on the Gravity Model
$$I_{ij} = \kappa \frac{P_i^{\alpha} P_j^{\beta}}{d_{ij}^{\gamma}} + e$$
Gravity Model: Interaction between i and j is a function of:
$P_i$ --the population (size) at i
$P_j$ --the population (size) at j
$d_{ij}$ --the distance from i to j
Based on a Hierarchy
How do you fly from Zhengzhou to Wuhan?
[Diagram: Central Place Hierarchy linking Shanghai, Zhengzhou and Wuhan]

Today, we discussed spatial analysis and some of its problems and challenges
However, to do spatial analysis you must have spatial data
Next time: Spatial data and why it is special!

Texts
O’Sullivan, David and David Unwin, 2010. Geographic Information Analysis. Hoboken, NJ: John Wiley, 2nd ed.
Other Useful Books:
Mitchell, Andy, 2005. ESRI Guide to GIS Analysis Volume 2: Spatial Measurement & Statistics. Redlands, CA: ESRI Press.
Allen, David W., 2009. GIS Tutorial II: Spatial Analysis Workbook. Redlands, CA: ESRI Press.
Wong, David W.S. and Jay Lee, 2005. Statistical Analysis of Geographic Information. Hoboken, NJ: John Wiley, 2nd ed.
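
Returning to the “Distance is not simple!” slide, the Cartesian (Pythagoras) formula and the spherical formula given there can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are invented here, latitudes and longitudes are assumed to be in decimal degrees, and the mean Earth radius of 6371 km is an assumed value that does not appear in the slides.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius; not given in the slides


def cartesian_distance(xi, yi, xj, yj):
    """Straight-line distance for projected coordinates (Pythagoras)."""
    return math.hypot(xi - xj, yi - yj)


def spherical_distance_km(lat_a, lon_a, lat_b, lon_b, radius_km=EARTH_RADIUS_KM):
    """Arc distance from cos d = sin a sin b + cos a cos b cos P.

    a, b are the latitudes of A and B; P is the longitude difference A to B.
    Inputs are assumed to be decimal degrees.
    """
    a = math.radians(lat_a)
    b = math.radians(lat_b)
    p = math.radians(lon_b - lon_a)
    cos_d = math.sin(a) * math.sin(b) + math.cos(a) * math.cos(b) * math.cos(p)
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_d = max(-1.0, min(1.0, cos_d))
    return radius_km * math.acos(cos_d)


# A 3-4-5 triangle in projected units.
print(cartesian_distance(0, 0, 3, 4))

# A quarter of the way around the equator, in km under the assumed radius.
print(spherical_distance_km(0, 0, 0, 90))
```

The arc-cosine form follows the slide directly; for points that are very close together it loses precision, so other formulations are often preferred at local scale, where the slide in any case recommends projected data and the Cartesian formula.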