Winter GIS Institute
• Modifiable Areal Unit Problem (MAUP)
• Boundary problems
• Spatial sampling procedures
• Spatial Autocorrelation
• Ecological fallacy
Rachel Franklin
• Our choice of spatial units (or zones) has a large influence on our analytical results
– For example, median household income by county versus state
• Two sides of the MAUP to be aware of:
– Placement of boundaries for units of a given size
– Choice of size of units
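To see both sides of the MAUP with toy numbers, the sketch below aggregates the same hypothetical household incomes (in $1000s, at eight positions along a one-dimensional transect) into zones of different sizes, and into zones with shifted boundaries. The data and the `zone_means` helper are invented for illustration.

```python
from statistics import mean

# Hypothetical household incomes ($1000s) at eight positions along a transect.
incomes = [20, 22, 80, 85, 30, 28, 90, 95]

def zone_means(values, starts):
    """Mean of `values` in each zone; `starts` lists the index where each
    zone begins (the first entry must be 0)."""
    edges = list(starts) + [len(values)]
    return [mean(values[a:b]) for a, b in zip(edges, edges[1:])]

# Scale effect: the same data summarized with large vs. small zones.
large = zone_means(incomes, [0, 4])              # [51.75, 60.75]
small = zone_means(incomes, [0, 2, 4, 6])        # [21, 82.5, 29, 92.5]
# Zoning effect: zones of (mostly) the same size, boundaries shifted by one.
shifted = zone_means(incomes, [0, 1, 3, 5, 7])   # [20, 51, 57.5, 59, 95]
```

With two large zones the halves look fairly similar; the smaller or shifted zones reveal, or blur, the sharp contrasts in the underlying data.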
• It’s important to keep in mind that activity just outside the boundary of our study area may also affect the study area
– For example, studying shopping behavior in Rhode Island
• Size and shape of spatial units can affect our analysis and results
– Example: Tennessee and migration
• Possible solution in some cases: buffers
• How do we ensure that we sample in such a way that we have a representative and unbiased sample for the spatial units we’re interested in?
– In other words, we want an accurate representation of the earth’s surface without sampling each and every point
• Random spatial sample – choosing x and y coordinates at random (or from a range)
• Stratified spatial sample – random sampling within each stratum
• Systematic spatial sample – applying the spatial configuration of a random sample in one stratum to all other strata in the study area
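The three sampling designs can be sketched for a rectangular study area. This is a minimal illustration assuming square k × k strata; the function names and the fixed random seed are hypothetical choices, and the systematic version follows the definition above (one random offset reused in every stratum).

```python
import random

def random_sample(n, width, height, rng):
    """Simple random spatial sample: n points anywhere in the study area."""
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]

def stratified_sample(k, width, height, rng):
    """One random point inside each cell (stratum) of a k x k grid."""
    dx, dy = width / k, height / k
    return [(i * dx + rng.uniform(0, dx), j * dy + rng.uniform(0, dy))
            for i in range(k) for j in range(k)]

def systematic_sample(k, width, height, rng):
    """Draw one random position within the first stratum, then reuse the
    same within-stratum offset in every other stratum."""
    dx, dy = width / k, height / k
    ox, oy = rng.uniform(0, dx), rng.uniform(0, dy)
    return [(i * dx + ox, j * dy + oy) for i in range(k) for j in range(k)]

rng = random.Random(42)                         # fixed seed so results repeat
pts = systematic_sample(3, 90.0, 90.0, rng)     # 9 points on a regular pattern
```

The stratified design guarantees every stratum is represented; the systematic design additionally makes the spacing perfectly regular, which is convenient but can line up badly with periodic patterns on the ground.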
• Tobler’s First Law of Geography: “Everything is related to everything else, but near things are more related than distant things.”
– A variable’s values are related to each other in space – they’re correlated
– This means that observations are often not independent of each other
– For example, house values. If I tell you how much a particular house is worth, does it affect your prediction of the neighboring house’s value?
• We distinguish between two types of autocorrelation: positive (similar values cluster together) and negative (dissimilar values sit next to each other)
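Moran's I is one common way to put a number on the sign of spatial autocorrelation. The sketch below computes global Moran's I with simple binary weights for units arranged in a row (a simplifying assumption; `chain` is a hypothetical helper). Positive values indicate positive autocorrelation, negative values indicate negative autocorrelation, and with no autocorrelation the expected value is −1/(n−1).

```python
def chain(n):
    """Neighbor lists for n units in a row: each touches the ones beside it."""
    return [[j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)]

def morans_i(values, neighbors):
    """Global Moran's I with binary weights: w_ij = 1 if j is in
    neighbors[i], else 0."""
    n = len(values)
    xbar = sum(values) / n
    z = [v - xbar for v in values]                     # deviations from the mean
    s0 = sum(len(nbrs) for nbrs in neighbors)          # sum of all weights
    cross = sum(z[i] * z[j] for i, nbrs in enumerate(neighbors) for j in nbrs)
    return (n / s0) * cross / sum(zi * zi for zi in z)

smooth = morans_i([1, 2, 3, 4, 5, 6], chain(6))        # 0.6: positive
alternating = morans_i([1, 0, 1, 0, 1, 0], chain(6))   # about -1.0: negative
```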
• Assuming that individuals in a group possess the average characteristics of the entire group
– We risk doing this when we use aggregate data for spatial units to make inferences about individuals (e.g. median income and education levels)
• For example, in recent presidential elections, wealthier states have tended to vote Democratic and poorer states, Republican
– But at the individual level, it’s the opposite: wealthier voters have tended to vote Republican
Manipulating GIS data
• This is what GIS is all about – analyzing the spatial relationships between and within features
• Map overlay – combine layers to create single output
– Two categories:
• Tools that do not combine layer attributes (clip & erase)
• Those that do (intersect & union)
• Isolate a set of features from their larger group
– Similar to queries, except queries can only isolate – or select – features in their entirety
– Clip and erase can isolate entire features or just parts of features
• Clip – like a cookie cutter
– Cuts or clips one set of features based on the outline of another
• Erase is the opposite of clip – keeps only features that fall outside the erase layer
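Clip and erase can be illustrated with polygons approximated as sets of raster cells. This is a simplification for illustration; real tools cut exact vector geometry. The `cells` helper, parcel names, and attributes are hypothetical. Note that the outputs keep only the attributes of the input features, not those of the clip or erase layer.

```python
def cells(x0, y0, x1, y1):
    """Hypothetical helper: approximate a rectangle as the set of unit grid
    cells it covers."""
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}

def clip(features, clip_area):
    """Cookie cutter: keep only the parts of each feature inside clip_area."""
    out = {}
    for fid, (geom, attrs) in features.items():
        part = geom & clip_area
        if part:                          # drop features entirely outside
            out[fid] = (part, attrs)
    return out

def erase(features, erase_area):
    """Opposite of clip: keep only the parts of each feature outside erase_area."""
    out = {}
    for fid, (geom, attrs) in features.items():
        part = geom - erase_area
        if part:
            out[fid] = (part, attrs)
    return out

parcels = {
    "A": (cells(0, 0, 4, 4), {"use": "farm"}),
    "B": (cells(4, 0, 8, 4), {"use": "forest"}),
    "C": (cells(10, 0, 12, 2), {"use": "pasture"}),
}
study_area = cells(2, 0, 6, 4)
clipped = clip(parcels, study_area)   # parts of A and B; C is dropped
erased = erase(parcels, study_area)   # the other halves of A and B, plus all of C
```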
Figure: Clip and Erase (graphic source: Price)
• These essentially combine layers
– Both areas and attributes are affected
– Similar to spatial joins
• Union – combines polygon layers
– Creates all possible polygons from combination of both layers
– Both input layers must contain polygons
• Intersect – keeps only the areas common to both layers
– Makes it easier to identify locations where two conditions are in effect simultaneously
• E.g. habitat identification
– Accepts points, lines, or polygons
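Approximating each polygon layer as a grid of cells, each mapped to one attribute value (an assumption for illustration), intersect and union differ only in which cells they keep. Both carry attributes from both inputs. The layer contents and helper names are hypothetical.

```python
def rect(x0, y0, x1, y1, value):
    """Hypothetical helper: a rectangular polygon as {cell: attribute}."""
    return {(x, y): value for x in range(x0, x1) for y in range(y0, y1)}

def intersect(layer_a, layer_b):
    """Keep only cells covered by both layers, carrying both attributes."""
    return {c: (layer_a[c], layer_b[c]) for c in layer_a.keys() & layer_b.keys()}

def union(layer_a, layer_b):
    """Keep every cell covered by either layer; a missing attribute is None."""
    return {c: (layer_a.get(c), layer_b.get(c))
            for c in layer_a.keys() | layer_b.keys()}

soils = rect(0, 0, 4, 4, "wetland soil")
vegetation = rect(2, 2, 6, 6, "reeds")
habitat = intersect(soils, vegetation)   # where both conditions hold
combined = union(soils, vegetation)
# Each distinct attribute combination corresponds to one output polygon.
pieces = set(combined.values())
```

The union yields three pieces (soil only, reeds only, and both), which is the "all possible polygons" behavior described above.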
Figure: Intersect and Union (graphic source: Price)
(found in ArcToolbox)
• Dissolve – groups features together, based on a common attribute
• Buffer – identifies areas that fall within a certain distance of a set of features
• Append and Merge – combine features from two or more layers
– Layers must be the same feature type
– And have the same coordinate system
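A buffer query can be sketched as a distance test: a point falls inside a buffer of width d around a line if its shortest distance to any segment of that line is at most d. The sketch assumes planar (projected) coordinates, which is one reason the coordinate system matters; the road and well locations are made up.

```python
import math

def point_segment_distance(p, a, b):
    """Shortest distance from point p to the line segment a-b (planar)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment, clamping to the endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def within_buffer(points, line, width):
    """Points lying inside a buffer of `width` around a polyline."""
    return [p for p in points
            if min(point_segment_distance(p, a, b)
                   for a, b in zip(line, line[1:])) <= width]

road = [(0, 0), (10, 0), (10, 10)]              # hypothetical polyline
wells = [(5, 1), (5, 3), (12, 5), (11, -1)]     # hypothetical points
print(within_buffer(wells, road, 2.0))          # [(5, 1), (12, 5), (11, -1)]
```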
• Geoprocessing tools are accessible via:
– ArcToolbox
– Menus and tool bars
– Command line
• ModelBuilder and scripts
• Pay special attention to:
– Coordinate systems and projections
– Areas and lengths
• Types of spatial analysis (Longley)
– Queries and reasoning – no changes are made to the database and no new information is produced
• For example, how many cities are within 300 miles of Kansas City?
– Measurements – Describing aspects of geographic data, like length, area, or shape
• For example, calculating the size (or area) of a parcel
– Transformations – Changing or combining data to create new data
• Using logical, mathematical, or geometrical rules
– Descriptive summaries – summary statistics for spatial data
– Optimization – Finding the best locations for a set of objects, given a set of criteria
• For example, bus stop locations in Australia
– Hypothesis testing – Making generalizations about a population from a sample
• Could this spatial pattern have occurred by chance?
• We can query our spatial data lots of ways:
– Browsing the “catalog” or file view
– Map view
– Table view
– Histogram or scatterplot view
– Database queries, using SQL
• Remember, “computers are generally uncomfortable with vagueness” (Longley)
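A query such as “how many cities are within 300 miles of Kansas City?” can be posed in SQL. As a sketch, the example below uses Python's built-in `sqlite3` module and registers a haversine great-circle function with `Connection.create_function`, since plain SQLite has no geographic distance function. The SQL function name `dist_miles` is a hypothetical choice, the city coordinates are approximate, and a spherical Earth is assumed. Spatial databases such as PostGIS provide built-in distance functions instead.

```python
import math
import sqlite3

EARTH_RADIUS_MILES = 3958.8   # mean Earth radius, assuming a spherical Earth

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

conn = sqlite3.connect(":memory:")
conn.create_function("dist_miles", 4, haversine_miles)   # callable from SQL
conn.execute("CREATE TABLE cities (name TEXT, lat REAL, lon REAL)")
conn.executemany("INSERT INTO cities VALUES (?, ?, ?)", [
    ("St. Louis", 38.627, -90.199),   # approximate coordinates
    ("Omaha", 41.257, -95.935),
    ("Chicago", 41.878, -87.630),
    ("Denver", 39.739, -104.990),
])
kansas_city = (39.100, -94.579)
rows = conn.execute(
    "SELECT name FROM cities WHERE dist_miles(lat, lon, ?, ?) <= 300 ORDER BY name",
    kansas_city,
).fetchall()
nearby = [name for (name,) in rows]
print(len(nearby), nearby)   # 2 ['Omaha', 'St. Louis']
```

The query is precise about every term (which table, which distance, which threshold), which is exactly the lack of vagueness computers need.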
• How far apart are two points? How large is a parcel’s area?
• Area
• Distance or length
– Distance may be measured two ways:
1. Straight-line (Pythagorean or Euclidean) distance, also referred to as “as the crow flies”
– Assumes a flat plane; for latitude and longitude we need great-circle distances
2. Manhattan (city-block) or network distance, measured along a grid of streets or a network
• Shape – for example, gerrymandering
– S = P / (3.54√A)
• Where P is perimeter and A is area; 3.54 is twice the square root of π
• S = 1 for the most compact shape, a circle; less compact shapes have larger values
• Slope and aspect
– Digital Elevation Models or DEMs
• Rasters whose cells contain the elevation at that location
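The distance and shape measures above translate directly into code. This sketch assumes planar coordinates for the Euclidean and Manhattan distances, a spherical Earth of radius 6,371 km for the great-circle (haversine) distance, and uses the exact constant 2√π (about 3.54) in the shape index.

```python
import math

def euclidean(p, q):
    """Straight-line ("as the crow flies") distance on a flat plane."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def manhattan(p, q):
    """City-block distance: movement only along perpendicular streets."""
    return abs(q[0] - p[0]) + abs(q[1] - p[1])

def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine distance between latitude/longitude points in degrees,
    treating the Earth as a sphere of the given mean radius."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def shape_index(perimeter, area):
    """S = P / (3.54 * sqrt(A)), using the exact constant 2 * sqrt(pi)."""
    return perimeter / (2 * math.sqrt(math.pi) * math.sqrt(area))

print(euclidean((0, 0), (3, 4)), manhattan((0, 0), (3, 4)))  # 5.0 7
print(round(great_circle_km(0, 0, 0, 1), 1))   # 111.2 km per degree at the equator
print(round(shape_index(2 * math.pi, math.pi), 6))   # circle of radius 1: 1.0
print(round(shape_index(4.0, 1.0), 3))               # unit square: 1.128
```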
• Buffering – Creates an area of a specific and constant width around a point, line, or polygon
– This can be used to identify all objects falling within a certain distance of the original feature
• Point in polygon – Associates points with polygons
– Counts number of points within a polygon
– Attach polygon characteristics to points or vice versa
– A point can lie in only one polygon of a layer; this is determined with a point-in-polygon algorithm
• Polygon overlay – Determining whether two polygons overlap, the extent of their overlap, and what new polygons are created by the overlap
– Spurious polygons or slivers – the coastline weave problem
– Tolerance
• Spatial interpolation – “Guessing” the value of a variable for locations where no measurement has occurred. For example, rainfall, temperature, or elevation
– Inverse distance weighting
– Kriging
• Density estimation and potential – generates a surface from a set of discrete points
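Two of these transformations, point in polygon and inverse distance weighting, are short enough to sketch. The point-in-polygon test below is the common ray-casting (even-odd) algorithm, and the IDW estimator uses weights of 1/d^p with an assumed default power p = 2. Points lying exactly on a polygon edge are not handled consistently, and the tract, store, and rain-gauge data are invented.

```python
import math

def point_in_polygon(pt, polygon):
    """Ray casting: count the polygon edges crossed by a ray running to the
    right of pt; an odd count means pt is inside."""
    x, y = pt
    inside = False
    for i in range(len(polygon)):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % len(polygon)]
        if (y1 > y) != (y2 > y):                    # edge straddles the ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def count_points(points, polygons):
    """Number of points falling in each named polygon."""
    return {name: sum(point_in_polygon(p, poly) for p in points)
            for name, poly in polygons.items()}

def idw(samples, target, power=2.0):
    """Inverse distance weighting over (x, y, value) samples."""
    num = den = 0.0
    for x, y, v in samples:
        d = math.hypot(target[0] - x, target[1] - y)
        if d == 0:
            return v                                # exactly at a sample
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den

tracts = {"west": [(0, 0), (5, 0), (5, 5), (0, 5)],
          "east": [(5, 0), (10, 0), (10, 5), (5, 5)]}
stores = [(1, 1), (2, 4), (7, 2), (9, 4.5), (12, 1)]
print(count_points(stores, tracts))      # {'west': 2, 'east': 2}

gauges = [(0, 0, 10.0), (10, 0, 20.0)]   # hypothetical rainfall (mm)
print(idw(gauges, (5, 0)))               # about 15, halfway between the gauges
```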
• Looking for patterns or anomalies
• Descriptive summaries
– Center
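• Mean center (weights $w_i$ are optional; set every $w_i = 1$ for the unweighted mean):
$$\bar{x} = \frac{\sum_i w_i x_i}{\sum_i w_i}, \qquad \bar{y} = \frac{\sum_i w_i y_i}{\sum_i w_i}$$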
• Centroid – summarizing an area (polygon) with a point
– That is, making points from polygons – uses the average of the polygon’s vertices
• Point of minimum aggregate travel (MAT) – The point that minimizes the total straight-line distance to all points in the set
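The mean center and the MAT point can be compared numerically. The sketch below computes the weighted mean center (Σ w_i x_i / Σ w_i, Σ w_i y_i / Σ w_i) and approximates the MAT point with Weiszfeld's iterative algorithm, a standard method for this problem that is assumed here (the slides do not name a method). This simplified version just skips a data point when the current estimate lands exactly on it; the three stops are made up.

```python
import math

def mean_center(points, weights=None):
    """Weighted mean center; with no weights every point counts equally."""
    if weights is None:
        weights = [1.0] * len(points)
    total = sum(weights)
    x = sum(w * px for w, (px, _) in zip(weights, points)) / total
    y = sum(w * py for w, (_, py) in zip(weights, points)) / total
    return x, y

def total_distance(center, points):
    """Sum of straight-line distances from center to every point."""
    return sum(math.dist(center, p) for p in points)

def mat_point(points, iterations=500, eps=1e-12):
    """Approximate the point of minimum aggregate travel with Weiszfeld's
    algorithm, starting from the mean center."""
    cx, cy = mean_center(points)
    for _ in range(iterations):
        sx = sy = inv = 0.0
        for px, py in points:
            d = math.hypot(px - cx, py - cy)
            if d < eps:          # simplification: skip a point we sit on
                continue
            sx, sy, inv = sx + px / d, sy + py / d, inv + 1.0 / d
        if inv == 0.0:
            break
        cx, cy = sx / inv, sy / inv   # distance-weighted average of the points
    return cx, cy

stops = [(0, 0), (4, 0), (0, 3)]
center, mat = mean_center(stops), mat_point(stops)
print(total_distance(center, stops), total_distance(mat, stops))
```

The mean center minimizes the sum of squared distances, while the MAT point minimizes the sum of plain distances, so the MAT point's total should never be larger.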
– Dispersion
• Mean distance from the centroid
– Spatial Dependence
• We can think of global and local measures of spatial dependence
• The scale we use will determine, in large part, whether we find spatial dependence across a set of objects
– Fragmentation – how broken up into different pieces is the landscape?
• Are these pieces large or small? Compact or spread out?
• One measure is simply the number of patches that exist
• Or we can use the shape measure discussed a few minutes ago: S = P / (3.54√A)
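Counting patches amounts to counting connected groups of cells in a binary raster (1 = habitat, 0 = not). The sketch below uses a breadth-first flood fill. Whether diagonal cells join a patch (queen's case) or only edge-sharing cells do (rook's case) is a choice that changes the count, as the made-up grid shows.

```python
from collections import deque

ROOK = [(-1, 0), (1, 0), (0, -1), (0, 1)]
QUEEN = ROOK + [(-1, -1), (-1, 1), (1, -1), (1, 1)]

def count_patches(grid, queen=False):
    """Number of connected patches of 1-cells in a binary raster."""
    steps = QUEEN if queen else ROOK
    rows, cols = len(grid), len(grid[0])
    seen, patches = set(), 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or (r, c) in seen:
                continue
            patches += 1                    # new patch found: flood-fill it
            seen.add((r, c))
            queue = deque([(r, c)])
            while queue:
                cr, cc = queue.popleft()
                for dr, dc in steps:
                    nr, nc = cr + dr, cc + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and grid[nr][nc] == 1 and (nr, nc) not in seen):
                        seen.add((nr, nc))
                        queue.append((nr, nc))
    return patches

forest = [
    [1, 1, 0, 0, 1],
    [1, 0, 0, 1, 1],
    [0, 0, 1, 0, 0],
    [1, 0, 1, 0, 1],
]
print(count_patches(forest), count_patches(forest, queen=True))   # 5 4
```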
• Best location for a set of points
– “p-median problem” – seeking the best locations for a set of p facilities, such that the total distance from each demand point to its closest facility is minimized
• School location, e.g.
– “Coverage problem” – seeking to minimize the furthest distance traveled
• Fire station location, e.g.
– “Location-Allocation” – We’re not only trying to locate facilities, but also allocate demand for each facility
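For tiny problems, both objectives can be solved by brute force over every combination of p candidate sites (real location problems need optimization solvers or heuristics). The coverage idea is implemented here in its minimax form, matching “minimize the furthest distance traveled”; the homes and candidate sites are hypothetical points on a line.

```python
import math
from itertools import combinations

def p_median(demand, candidates, p):
    """Brute force: the p sites minimizing total distance from each demand
    point to its nearest chosen site."""
    def total(sites):
        return sum(min(math.dist(d, s) for s in sites) for d in demand)
    return min(combinations(candidates, p), key=total)

def minimax_sites(demand, candidates, p):
    """Brute force: the p sites minimizing the longest distance any demand
    point must travel to its nearest chosen site."""
    def worst(sites):
        return max(min(math.dist(d, s) for s in sites) for d in demand)
    return min(combinations(candidates, p), key=worst)

homes = [(0, 0), (1, 0), (2, 0), (3, 0), (12, 0)]   # hypothetical demand
sites = [(1, 0), (2, 0), (7, 0), (12, 0)]           # hypothetical candidates
print(p_median(homes, sites, 1))       # ((2, 0),): serves the cluster of homes
print(minimax_sites(homes, sites, 1))  # ((7, 0),): limits the remote home's trip
```

The two objectives pick different sites: the p-median favors where most demand is (schools), while the minimax form protects the worst-off location (fire stations).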
• Routing on a network
– “Shortest path” – The best path through a network that minimizes distance or travel time
• Google Maps directions, e.g.
– “Traveling Salesman Problem” (TSP) – Seeks the best ordering of a set of stops to minimize total distance traveled
• My milkman, e.g.
• If there are n places to be visited including home base, then there are (n-1)! possible tours to choose from
– Or, really, (n-1)!/2, since it doesn’t matter if a given tour is done forwards or backwards.
• Large n problem and the use of heuristics
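For a handful of stops, the (n−1)! count can be checked directly by brute force and compared with the greedy nearest-neighbor heuristic, one simple heuristic among many used when n is large. The stop coordinates and function names are illustrative.

```python
import math
from itertools import permutations

def tour_length(order, coords):
    """Length of a closed tour that returns to its first stop."""
    legs = zip(order, order[1:] + order[:1])
    return sum(math.dist(coords[a], coords[b]) for a, b in legs)

def brute_force_tour(coords, home=0):
    """Check all (n-1)! orderings of the other stops, starting from home."""
    others = [i for i in range(len(coords)) if i != home]
    best = min(permutations(others),
               key=lambda rest: tour_length((home,) + rest, coords))
    return (home,) + best

def nearest_neighbor_tour(coords, home=0):
    """Greedy heuristic: always go next to the closest unvisited stop."""
    tour, unvisited = [home], set(range(len(coords))) - {home}
    while unvisited:
        here = coords[tour[-1]]
        nxt = min(unvisited, key=lambda i: math.dist(here, coords[i]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tuple(tour)

stops = [(0, 0), (1, 1), (0, 1), (1, 0)]     # home base plus three stops
best = brute_force_tour(stops)
greedy = nearest_neighbor_tour(stops)
print(math.factorial(len(stops) - 1) // 2)   # 3 distinct tours, ignoring direction
print(tour_length(best, stops), tour_length(greedy, stops))
```

On larger inputs the greedy tour is fast but can be noticeably longer than the optimum, which is the usual trade-off with heuristics.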
• Optimum paths - best paths in continuous space
– Locating highways or power lines, for example
– Routing airplane flights
• These are often solved using a raster, where each cell contains a friction value – cost or time associated with crossing the cell
– GIS then finds the least-cost path
• We can differentiate between optimal locations on a network and those in continuous space
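A least-cost path over a friction raster can be found with Dijkstra's shortest-path algorithm, treating each cell as a node linked to its four edge neighbors. Charging the average friction of the two cells for each move is one common convention, assumed here; real tools typically also allow diagonal moves weighted by distance. The terrain grid is invented, and the goal is assumed to be reachable.

```python
import heapq
import math

def least_cost_path(friction, start, goal):
    """Dijkstra on a raster; returns (total cost, list of cells on the path)."""
    rows, cols = len(friction), len(friction[0])
    best, prev = {start: 0.0}, {}
    heap = [(0.0, start)]
    while heap:
        cost, cell = heapq.heappop(heap)
        if cell == goal:
            break
        if cost > best[cell]:
            continue                          # stale queue entry
        r, c = cell
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                step = (friction[r][c] + friction[nr][nc]) / 2
                if cost + step < best.get((nr, nc), math.inf):
                    best[(nr, nc)] = cost + step
                    prev[(nr, nc)] = cell
                    heapq.heappush(heap, (cost + step, (nr, nc)))
    path = [goal]
    while path[-1] != start:                  # walk back to the start
        path.append(prev[path[-1]])
    return best[goal], path[::-1]

terrain = [
    [1, 9, 1],     # a high-friction ridge down the middle column,
    [1, 9, 1],     # with an easy pass along the bottom row
    [1, 1, 1],
]
cost, path = least_cost_path(terrain, (0, 0), (0, 2))
print(cost, path)   # detours through the pass instead of crossing the ridge
```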
• Point patterns
– Is the distribution of points random? Uniform? Can we identify clusters?
• Measures of spatial association
– Global – Do we see positive or negative autocorrelation across our study area?
• Very dependent on scale
– Local – Are values correlated with local neighbors?
• House values
• Crime
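One classic summary for point patterns is the Clark–Evans nearest-neighbor ratio: the observed mean nearest-neighbor distance divided by the mean expected for a random pattern of the same density, 0.5/√(n/A). Values below 1 suggest clustering, near 1 randomness, and above 1 a uniform pattern. This sketch ignores edge effects and significance testing, and the two test patterns are synthetic.

```python
import math

def nearest_neighbor_ratio(points, area):
    """Clark-Evans ratio R = observed / expected mean nearest-neighbor distance."""
    n = len(points)
    observed = sum(
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ) / n
    expected = 0.5 / math.sqrt(n / area)     # random (Poisson) pattern
    return observed / expected

grid = [(i + 0.5, j + 0.5) for i in range(10) for j in range(10)]   # evenly spaced
cluster = [(i * 0.1, j * 0.1) for i in range(10) for j in range(10)]  # one corner
print(nearest_neighbor_ratio(grid, 100.0))      # 2.0: more uniform than random
print(nearest_neighbor_ratio(cluster, 100.0))   # well below 1: clustered
```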
• All measures of spatial association depend on scale
– How do we define neighbors?
• Neighborhoods can be defined based on distance or contiguity
– Distance: My neighbors are those who live within a mile of me, for example
– Contiguity: Refers to polygons. My neighbors are those I share a border with:
• Queen’s case: Shared borders and corners count for contiguity
• Rook’s case: Only shared borders count for contiguity
• 1st order versus 2nd order, etc.: We could choose only our immediate neighbors, or also the neighbors of our neighbors.
• When we define our neighborhood, this is implemented using a “weights matrix”
• Usually 1s and 0s that indicate yes or no for whether a spatial unit is my neighbor
• This is then often “row standardized” – each value is divided by its row total, so every row sums to 1
• Units are not considered neighbors of themselves (the diagonal is 0)
• Binary matrices are generally symmetric – if I’m your neighbor, then you’re my neighbor (row standardization can break this symmetry)
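To make the weights matrix concrete, the sketch below builds binary rook or queen contiguity weights for a small grid of square polygons and then row-standardizes it. A regular grid is assumed so that neighbors can be found from row and column offsets; real polygon layers need a geometric adjacency test instead.

```python
def contiguity_weights(n_rows, n_cols, case="queen"):
    """Binary contiguity weights for a grid of square polygons.
    Rook: shared edges only; queen: shared edges or corners."""
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if case == "queen":
        offsets += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    n = n_rows * n_cols
    w = [[0] * n for _ in range(n)]           # diagonal w[i][i] stays 0
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            for dr, dc in offsets:
                nr, nc = r + dr, c + dc
                if 0 <= nr < n_rows and 0 <= nc < n_cols:
                    w[i][nr * n_cols + nc] = 1
    return w

def row_standardize(w):
    """Divide each row by its total so every unit's weights sum to 1."""
    out = []
    for row in w:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out

queen = contiguity_weights(3, 3, "queen")
rook = contiguity_weights(3, 3, "rook")
standardized = row_standardize(queen)
```

After row standardization, the center cell (index 4) gives each of its eight queen neighbors a weight of 1/8, while a corner cell gives the center 1/3, so the standardized matrix is no longer symmetric.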
• Local Indicators of Spatial Association (LISA)
– LISA statistics indicate the presence or absence of significant spatial clusters or outliers for each location. A randomization approach is used to generate a spatially random reference distribution to assess statistical significance.
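A minimal sketch of LISA using local Moran's I, assuming row-standardized binary weights, units arranged in a row, and conditional randomization: unit i's value is held fixed while its “neighbors” are drawn at random from the other units' values. The pseudo p-value counts how often a random draw is at least as extreme as the observed statistic in the observed direction. The values are made up; a positive I_i marks a cluster (high-high or low-low) and a negative I_i a spatial outlier.

```python
import random

def chain(n):
    """Neighbor lists for n units in a row (each touches the ones beside it)."""
    return [[j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)]

def local_morans_i(values, neighbors, permutations=999, seed=0):
    """Return (I_i, pseudo p-value) for each unit; every unit is assumed
    to have at least one neighbor."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(zi * zi for zi in z) / n
    rng = random.Random(seed)                # fixed seed for reproducibility
    results = []
    for i, nbrs in enumerate(neighbors):
        k = len(nbrs)
        observed = z[i] / m2 * sum(z[j] for j in nbrs) / k
        others = z[:i] + z[i + 1:]           # conditional: unit i stays put
        extreme = 0
        for _ in range(permutations):
            draw = rng.sample(others, k)     # random "neighbors" for unit i
            simulated = z[i] / m2 * sum(draw) / k
            as_extreme = simulated >= observed if observed >= 0 else simulated <= observed
            extreme += as_extreme
        results.append((observed, (extreme + 1) / (permutations + 1)))
    return results

values = [9, 10, 11, 10, 2, 1, 2, 1, 2, 10]   # hypothetical values along a street
results = local_morans_i(values, chain(len(values)))
for i, (stat, p) in enumerate(results):
    print(i, round(stat, 2), round(p, 3))
```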
• Getis-Ord Gi* Statistic
– The resultant Z score tells you where features with either high or low values cluster spatially. This tool works by looking at each feature within the context of neighboring features.
– A feature with a high value is interesting, but may not be a statistically significant hot spot. To be a statistically significant hot spot, a feature will have a high value and be surrounded by other features with high values as well.
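The Gi* z-score can be computed directly from its standard formula. This sketch assumes binary weights in which each feature counts as its own neighbor (the “star” in Gi*), with features arranged in a row; the values are invented. A z-score above about 1.96 corresponds to a hot spot at roughly the 95% confidence level, ignoring multiple-testing adjustments.

```python
import math

def chain(n):
    """Neighbor lists for n features in a row."""
    return [[j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)]

def getis_ord_gi_star(values, neighbors):
    """Gi* z-score for each feature, with w_ij = 1 for the feature itself
    and its neighbors and 0 otherwise."""
    n = len(values)
    xbar = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - xbar ** 2)
    scores = []
    for i, nbrs in enumerate(neighbors):
        members = set(nbrs) | {i}        # the "star": include feature i itself
        w_sum = len(members)             # sum of w_ij (all ones)
        w_sq_sum = len(members)          # sum of squared w_ij (still ones)
        numerator = sum(values[j] for j in members) - xbar * w_sum
        denominator = s * math.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
        scores.append(numerator / denominator)
    return scores

values = [1, 1, 1, 1, 8, 9, 8, 1, 1, 1]   # hypothetical
z = getis_ord_gi_star(values, chain(len(values)))
hot = [i for i, score in enumerate(z) if score > 1.96]
print(hot)   # [5]
```

Feature 4 has a high value (8) but a z-score of only about 1.6, so it is not flagged: as the slide says, a high value alone does not make a significant hot spot.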
• A few options:
– Best case scenario: data are already in shapefile format, or similar
• Or you join, e.g., Excel data to shapefile data
– You collect or create your data yourself
– ArcGIS converts X,Y (longitude, latitude) coordinates into point data
– Or, very commonly, we geocode
• Along with mapping, geocoding is one of the most commonly-used GIS applications
• When we geocode, we attach location information to tabular data
– Addresses of all grocery stores in Providence
– Locations of all capital cities in the world
• We can think of a location-specificity continuum from general (e.g. cities) to specific (e.g. exact addresses)
• The more specific we are in terms of location, the more geographic information we need
– Also, depending on use of geocoded information, exact location may be very important – for example, 911 calls
– Locating cities requires a reference file with city locations
– Locating addresses in Providence requires street name and street number, at a minimum
• Locations can be attached to polygons or points, but the most challenging is matching addresses to lines (street segments)
• Emergency services
• GPS
• Driving directions
• Google maps
• Crime analysis
• Marketing
• Tabular data are compared to a spatial reference layer
– This is what ArcMap uses to match addresses
• This happens in a few steps
– To work best, addresses need to be recognizable to the computer, or standardized
– Then standardized addresses in our table of locations (say, J. Crew stores) are compared to our reference layer
• To understand this, think about the standard components of a street address
– Number
– Prefix direction
– Street name
– Street type
– Suffix direction
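A toy standardizer shows how an address string might be split into these components. The lookup tables are hypothetical and cover only a few directions and street types (real geocoders use far larger tables, such as the USPS standard abbreviations), and unusual cases, such as a street literally named “North”, will be misparsed.

```python
DIRECTIONS = {"N": "N", "NORTH": "N", "S": "S", "SOUTH": "S", "E": "E",
              "EAST": "E", "W": "W", "WEST": "W", "NE": "NE", "NW": "NW",
              "SE": "SE", "SW": "SW"}
STREET_TYPES = {"ST": "ST", "STREET": "ST", "AVE": "AVE", "AVENUE": "AVE",
                "RD": "RD", "ROAD": "RD", "BLVD": "BLVD", "BOULEVARD": "BLVD"}

def parse_address(text):
    """Split a simple street address into standardized components."""
    tokens = text.upper().replace(".", "").replace(",", "").split()
    parts = {"number": None, "prefix": None, "name": None,
             "type": None, "suffix": None}
    if tokens and tokens[0].isdigit():
        parts["number"] = int(tokens.pop(0))
    # Only treat a token as a direction or type if a street name would remain.
    if len(tokens) > 1 and tokens[0] in DIRECTIONS:
        parts["prefix"] = DIRECTIONS[tokens.pop(0)]
    if len(tokens) > 1 and tokens[-1] in DIRECTIONS:
        parts["suffix"] = DIRECTIONS[tokens.pop()]
    if len(tokens) > 1 and tokens[-1] in STREET_TYPES:
        parts["type"] = STREET_TYPES[tokens.pop()]
    parts["name"] = " ".join(tokens)
    return parts

print(parse_address("1600 Pennsylvania Ave. NW"))
print(parse_address("45 North Main Street"))
```

Mapping “Street” and “St.” to the same code is the kind of standardization that lets table entries match the reference layer.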
• The spatial reference layer includes the spatial information that will help locate our list of places in space
– The street name, obviously, if we’re geocoding addresses
– Or city and state
• Names of streets are attached to line segments, or polylines
– Each line segment is associated with a range of street numbers
– These are stored as a “from address” and a “to address”, so house numbers can be interpolated along the segment from its start to its end
• What we don’t know is where, exactly, a building lies on that line segment
• So geocoding always has an element of approximation to it
Figure: a line segment with an address range from 100 to 200
• One range method: A single address range for each chunk of street
• Two range method: An address range for each side of the street
– Obviously more desirable, but not always possible since this information needs to be coded into the reference layer
– ArcMap allows us to include an “offset” in this case
• In both cases, addresses are assigned to a place on the line in proportion to the starting and ending address on the line itself.
– So if the polyline starts at 100 Main St. and ends at 200 Main St., an address of 150 Main St. goes right in the middle
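The proportional placement can be sketched as linear interpolation along a straight segment. For the two-range method, the sketch assumes odd numbers on the left side of the segment and even numbers on the right (a convention that varies by reference layer) and pushes the point sideways by an optional offset. The segment and address ranges are invented.

```python
import math

def interpolate_address(number, start, end, from_addr, to_addr):
    """One-range method: place `number` along start->end in proportion to
    its position within the from/to address range."""
    frac = (number - from_addr) / (to_addr - from_addr)
    return (start[0] + frac * (end[0] - start[0]),
            start[1] + frac * (end[1] - start[1]))

def interpolate_two_range(number, start, end, left_range, right_range, offset=0.0):
    """Two-range method, assuming odd numbers on the left (facing from start
    to end) and even numbers on the right."""
    on_left = number % 2 == 1
    lo, hi = left_range if on_left else right_range
    x, y = interpolate_address(number, start, end, lo, hi)
    dx, dy = end[0] - start[0], end[1] - start[1]
    length = math.hypot(dx, dy)
    nx, ny = -dy / length, dx / length      # unit normal pointing left
    side = 1 if on_left else -1
    return x + side * offset * nx, y + side * offset * ny

main_st = ((0.0, 0.0), (100.0, 0.0))        # hypothetical segment, in meters
print(interpolate_address(150, *main_st, 100, 200))   # (50.0, 0.0)
print(interpolate_two_range(150, *main_st, (101, 199), (100, 200), offset=10))
# (50.0, -10.0): right side, 10 m off the centerline
```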
• Single field – Zip code, state name, power stations
• Alphanumeric Ranges – Helps narrow the search range for address identification, since ArcMap only has to look in that quadrant
• US Cities and states – Locates cities, given city and state names
• US One Address – Matches addresses to points or polygons
• US One Range – Matches addresses to one range of street values
• US Streets – Matches addresses to a range of street values for both sides of the street
• World City and country – Locates cities within countries on a world map
• Zip code – Matches zip codes to a point or polygon reference layer
• Zone option – Additional pieces of information (zip, state, city) that allow us to match over larger areas
Why it’s important to know your study location
• Quirky address styles:
– Queens, NY
– Washington, DC
– Phoenix, AZ
• Quickly growing locations
• Spelling quirks
– Saint and St. / Sainte and Ste.
• Value of “Alias Tables”
– Maxcy Hall v. 112 George Street
• First, load your address table and reference layer into ArcMap
• Then we need to set up an address locator
– Done in ArcCatalog
– This assembles the pieces of information we need in order to geocode
• What is our reference layer?
• What are the key fields we’ll use to locate addresses?
• A “snapshot” of the reference layer is taken at this time
– important to remember
• Geocoding can be done interactively or in batch mode
– Usually we do a combination of both
• The output is a new shapefile or feature class