Spatial Data
What is special about Spatial Data?
Briggs Henan University 2012

What is needed for spatial analysis?
1. Location information: a map
2. An attribute dataset: e.g. population, rainfall
3. Links between the locations and the attributes
4. Spatial proximity information
   – Knowledge about relative spatial location
   – Topological information

Topology: knowledge about relative spatial positioning
Topography: the form of the land surface, in particular its elevation

Berry's geographic matrix
Berry, B.J.L. 1964. Approaches to regional analysis: A synthesis. Annals of the Association of American Geographers, 54, pp. 2-11.
[Figure: Berry's geographic matrix. Rows are locations (areal unit 1, areal unit 2, ... areal unit n; e.g. Henan, Shanxi); columns are attributes or variables (Population, Income, ... Variable P); the matrix is repeated over time (1990, 2000, 2010). Labels in the figure: geographic fact, geographic distribution, geographic associations.]

Types of Spatial Data
• Continuous (surface) data
• Polygon (lattice) data
• Point data
• Network data

Spatial data type 1: Continuous (Surface Data)
• Spatially continuous data
  – attributes exist everywhere
• There are an infinite number of locations
  – But attributes are usually only measured at a few locations
    • There is a sample of point measurements
    • e.g. precipitation, elevation
  – A surface is used to represent continuous data

Spatial data type 2: Polygon Data
• polygons completely covering the area*
  – Attributes exist and are measured at each location
  – Area can be:
    • irregular (e.g. US state or China province boundaries)
    • regular (e.g.
remote sensing images in raster format)
*Polygons completely covering an area are called a lattice

Spatial data type 3: Point data
• Point pattern
  – The locations are the focus
  – In many cases, there is no attribute involved

Spatial data type 4: Network data
• Attributes may measure
  – the network itself (the roads)
  – objects on the network (cars)
• We often treat network objects as point data, which can cause serious errors
  – Crimes occur at addresses on networks, but we often treat them as points
See: Yamada and Thill. Local indicators of network-constrained clusters in spatial point patterns. Geographical Analysis 39 (3), 2007, pp. 268-292.

Which will we study?
• Point data (point pattern analysis: clustering and dispersion)
• Polygon data* (polygon analysis: spatial autocorrelation and spatial regression)
• Continuous data* (surface analysis: interpolation, trend surface analysis and kriging)
*in the fall semester

Converting from one type of data to another.
--very common in spatial analysis

Converting point to continuous data: interpolation
[Figure: map of sample point locations]

Interpolation
• Finding attribute values at locations where there is no data, using locations with known data values
• Usually based on
  – Value at known location
  – Distance from known location
• Methods used
  – Inverse distance weighting
  – Kriging
[Figure: simple linear interpolation between a known and an unknown location]

Converting point data to polygons using Thiessen polygons
[Figure: point locations and their Thiessen polygons]

Thiessen or Proximity Polygons (also called Dirichlet or Voronoi Polygons)
• Polygons created from a point layer
• Each point has a polygon (and each polygon has one point)
• Any location within the polygon is closer to the enclosed point than to any other point
• Space is divided as 'evenly' as possible between the polygons
[Figure: Thiessen or Proximity Polygons]

How to create Thiessen Polygons
1. Connect point to its nearest (closest) neighbor
2.
Draw a perpendicular line at the midpoint of the connecting line
3. Repeat for other points
4. Thiessen polygons

Converting polygon to point data using Centroids
• Centroid: the balancing point for a polygon
• Used to apply point pattern analysis to polygon data
• More about this later

Using a polygon to represent a set of points: Convex Hull
• The smallest convex polygon able to contain a set of points
  – no concave angles pointing inward
• A rubber band wrapped around a set of points
• "Reverse" of the centroid
• Convex hull often used to create the boundary of a study area
  – a "buffer" zone often added
  – used in point pattern analysis to solve the boundary problem
    • called a "guard zone"
[Figure label: "No!" (presumably marking a polygon that is not convex)]

Models for Spatial Data: Raster and Vector
two alternative methods for representing spatial data

Concept of Vector and Raster
[Figure: a real-world scene (river, house, trees) shown as a raster representation (grid with rows and columns 0-9, cells coded R, H, T) and as a vector representation (point, line, polygon)]

Comparing Raster and Vector Models
Raster Model
• area is covered by grid with (usually) equal-size, square cells
• attributes are recorded by giving each cell a single value based on the majority feature (attribute) in the cell, such as land use type or soil type
• Image data is a special case of raster data in which the "attribute" is a reflectance value from the electromagnetic spectrum
  – cells in image data often called pixels (picture elements)
Vector Model
The fundamental concept of vector GIS is that all geographic features in the real world can be represented either as:
• points or dots (nodes): trees, poles, fire plugs, airports, cities
• lines (arcs): streams, streets, sewers
• areas (polygons): land parcels, cities, counties, forest, rock type
Because representation depends on shape, ArcGIS refers to files containing vector data as shapefiles.
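The majority rule described for the raster model can be illustrated with a small sketch. This is only an illustration under stated assumptions: the function names `majority_value` and `rasterize_by_majority`, the 4 × 4 sample grid and the block size are all hypothetical and do not come from the slides.

```python
from collections import Counter


def majority_value(samples):
    """Return the most common attribute among the samples inside one cell."""
    return Counter(samples).most_common(1)[0][0]


def rasterize_by_majority(fine_grid, block):
    """Aggregate a fine grid into coarser cells of block x block samples.

    Each coarse cell stores a single value: the majority attribute of the
    fine samples it covers (hypothetical helper, not from the slides).
    """
    rows, cols = len(fine_grid), len(fine_grid[0])
    coarse = []
    for r in range(0, rows, block):
        coarse_row = []
        for c in range(0, cols, block):
            samples = [
                fine_grid[i][j]
                for i in range(r, min(r + block, rows))
                for j in range(c, min(c + block, cols))
            ]
            coarse_row.append(majority_value(samples))
        coarse.append(coarse_row)
    return coarse


# Made-up land use samples, using the class names that appear on the slides.
land_use = [
    ["wheat", "wheat", "corn", "corn"],
    ["wheat", "fruit", "corn", "corn"],
    ["clover", "clover", "fruit", "corn"],
    ["clover", "clover", "fruit", "fruit"],
]

coarse = rasterize_by_majority(land_use, 2)
```

In this made-up grid the single "fruit" sample in the upper-left block and the single "corn" sample in the lower-right block disappear from the coarse raster, which is the loss of detail that comes with giving each cell one value.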
Raster model
[Figure: land use (or soil type) map with classes wheat, fruit, clover and corn, and the corresponding raster grid]

     0  1  2  3  4  5  6  7  8  9
0    1  1  1  1  1  4  4  5  5  5
1    1  1  1  1  1  4  4  5  5  5
2    1  1  1  1  1  4  4  5  5  5
3    1  1  1  1  1  4  4  5  5  5
4    1  1  1  1  1  4  4  5  5  5
5    2  2  2  2  2  2  2  3  3  3
6    2  2  2  2  2  2  2  3  3  3
7    2  2  2  2  2  2  2  3  3  3
8    2  2  4  4  2  2  2  3  3  3
9    2  2  4  4  2  2  2  3  3  3

Image: each cell (pixel) has a value between 0 and 255 (8 bits)
[Figure: image pixels with example values 21 and 186]

Vector Model
• point (node): 0 dimensions
  – single x,y coordinate pair
  – zero area
  – tree, oil well, location for label
  – Example: Point: 7,2
• line (arc): 1 dimension
  – two (or more) connected x,y coordinates
  – road, stream
  – A network is simply 2 or more connected lines
  – Example: Line: 7,2 8,1
• polygon: 2 dimensions
  – four or more ordered and connected x,y coordinates
  – first and last x,y pairs are the same
  – encloses an area
  – county, lake
  – Example: Polygon: 7,2 8,1 7,1 7,2

Using raster and vector models to represent surfaces

Representing Surfaces with raster and vector models: 3 ways
• Contour lines
  – Lines of equal surface value
  – Good for maps but not computers!
• Digital elevation model (raster)
  – raster cells record surface value
• TIN (vector)
  – Triangulated Irregular Network (TIN)
  – triangle vertices (corners) record surface value

Contour Lines (isolines) for surface representation
Contour lines of constant elevation --also called isolines (iso = equal)
Advantages
• Easy to understand (for most people!)
  – Circle = hill top (or basin)
  – Bend pointing downhill (>) = ridge
  – Bend pointing uphill (<) = valley
  – Closer lines = steeper slope
Disadvantages
• Not good for computer representation
• Lines difficult to store in computer

Raster for surface representation
Each cell in the raster records the height (elevation) of the surface
[Figure: a surface shown with contour lines and as raster cells containing elevation values]

Triangulated Irregular Network (TIN): Vector surface representation
• a set of non-overlapping triangles formed from irregularly spaced points
• preferably, points are located at "significant" locations
  – bottom of valleys, tops of ridges
• Each corner of the triangle (vertex) has:
  – x, y horizontal coordinates
  – z vertical coordinate measuring elevation

Point #   X    Y    Z
1         10   30   160
2         25   30   150
3         30   25   140
4         15   20   130
etc.

[Figure: TIN with numbered vertices, a valley and a ridge]

Draft: How to Create a TIN surface: from points to surfaces
[Figures: Thiessen3.jpg, Thiessen4.jpg]
Links together all spatial concepts: point, line, polygon, surface

Using raster and vector models to represent polygons (and points and lines)

Representing Polygons (and points and lines) with raster and vector models
• Raster model not good
  – not accurate

     0  1  2  3  4  5  6  7  8  9
0    1  1  1  1  1  2  2  2  2  2
1    1  1  1  1  1  2  2  2  2  2
2    1  1  1  1  1  2  2  2  4  4
3    1  1  1  1  1  2  2  2  4  4
4    1  1  1  1  1  2  2  2  2  2
5    4  4  4  4  4  2  2  2  2  2
6    4  4  4  4  4  2  2  2  2  2
7    5  5  5  5  5  3  3  3  3  3
8    5  5  5  5  5  3  3  3  3  3
9    5  5  5  5  5  3  3  3  3  3

• Also a big challenge for the vector model
  – but much more accurate
  – the solution to this challenge resulted in the modern GIS system

Using Raster model for points, lines and polygons
For points --not good!
• Point "lost" if two points in one cell
• Point located at cell center --even if it's not
For lines and polygons
• Line not accurate
• Polygon boundary not accurate

Using vector model to represent points, lines and polygons: Node/Arc/Polygon Topology
The relationships between all spatial elements (points, lines, and polygons) are defined by four concepts:
• Node-Arc relationship
  – specifies which points (nodes) are connected to form arcs (lines)
• Arc-Arc relationship
  – specifies which arcs are connected to form networks
• Polygon-Arc relationship
  – defines polygons (areas) by specifying which arcs form their boundary
• From-To relationship on all arcs (New!)
  – Every arc has a direction from a node to a node
  – This establishes the left side and right side of an arc (e.g. street)
  – Also the polygon on the left and the polygon on the right for every side of the polygon
[Figure: an arc drawn from a "from" node to a "to" node, with its left and right sides labeled]

Node/Arc/Polygon and Attribute Data: Example of computer implementation
[Figure: polygon A34 (Smith Estate) bounded by arcs I-IV through nodes 1-4, with neighboring polygon A35 and the streets Birch and Cherry]

Spatial Data

Node Table
Node ID   Easting   Northing
1         126.5     578.1
2         218.6     581.9
3         224.2     470.4
4         129.1     471.9

Arc Table
Arc ID   From N   To N   L Poly   R Poly
I        4        1               A34
II       1        2               A34
III      2        3      A35      A34
IV       3        4               A34

Polygon Table
Polygon ID   Arc List
A34          I, II, III, IV
A35          III, VI, VII, XI

Attribute Data

Node Feature Attribute Table
Node ID   Control   Crosswalk   ADA?
1         light     yes         yes
2         stop      no          no
3         yield     no          no
4         none      yes         no

Arc Feature Attribute Table
Arc ID   Length   Condition   Lanes   Name
I        106      good        4
II       92       poor        4       Birch
III      111      fair        2
IV       95       fair        2       Cherry

Polygon Feature Attribute Table
Polygon ID   Owner      Address
A34          J. Smith   500 Birch
A35          R. White   200 Main

This is how a vector GIS system works!
This data structure was invented by Scott Morehouse at the Harvard Laboratory for Computer Graphics in the 1970s.
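As a rough sketch of how the node, arc and polygon tables from the example fit together, the snippet below stores them as Python dictionaries and derives two things from them: which polygons are neighbors (from the left/right polygon columns) and which way the boundary of A34 runs (from the from/to nodes). The function names are hypothetical, `None` is an assumed stand-in for the blank L Poly cells, and every arc is assumed to be a single straight segment; none of this is the actual ArcInfo implementation.

```python
import math

# Node table from the slide: node ID -> (easting, northing)
nodes = {
    1: (126.5, 578.1),
    2: (218.6, 581.9),
    3: (224.2, 470.4),
    4: (129.1, 471.9),
}

# Arc table from the slide: arc ID -> (from node, to node, left poly, right poly).
# None is an assumption standing in for the blank "L Poly" cells.
arcs = {
    "I": (4, 1, None, "A34"),
    "II": (1, 2, None, "A34"),
    "III": (2, 3, "A35", "A34"),
    "IV": (3, 4, None, "A34"),
}

# Polygon table: only A34 is used, because A35 refers to arcs not listed above.
polygons = {"A34": ["I", "II", "III", "IV"]}


def arc_length(arc_id):
    """Straight-line length of an arc between its from-node and to-node."""
    start, end, _, _ = arcs[arc_id]
    (x1, y1), (x2, y2) = nodes[start], nodes[end]
    return math.hypot(x2 - x1, y2 - y1)


def adjacency():
    """Polygons that share an arc, read directly from the left/right columns."""
    neighbours = {}
    for _, _, left, right in arcs.values():
        if left is not None and right is not None:
            neighbours.setdefault(left, set()).add(right)
            neighbours.setdefault(right, set()).add(left)
    return neighbours


def signed_area(arc_ids):
    """Shoelace area of a ring of arcs; assumes the arcs are listed in order
    and all point the same way around the ring. Negative means clockwise."""
    ring = [nodes[arcs[a][0]] for a in arc_ids]
    ring.append(ring[0])
    return sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(ring, ring[1:])) / 2


neighbours = adjacency()
area_a34 = signed_area(polygons["A34"])
```

Here adjacency needs no geometry at all: arc III alone says that A34 and A35 are neighbors. The negative signed area means arcs I to IV run clockwise around A34, so the polygon lies on the right of each arc, which agrees with the R Poly column.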
Another graduate student named Jack Dangermond moved to Redlands, CA and started a new company called ESRI Inc. in 1969. He later hired Scott Morehouse, and ESRI released the first commercial GIS system, ARC/INFO (later ArcInfo), in 1982.
Modern GIS was born!

Other ways to represent polygons with vector model
2. Whole polygon structure
3. Points and Polygons structure
• Used in earlier GIS systems before the node/arc/polygon system was invented
• Still used today for some simpler spatial data (e.g. shapefiles)
• Discuss these if we have time!

Vector Data Structures: Whole Polygon
Whole Polygon (boundary structure): list coordinates of points in order as you 'walk around' the outside boundary of the polygon.
– all data stored in one file
– coordinates/borders for adjacent polygons stored twice
  • may not be the same, resulting in slivers (gaps) or overlaps
– all lines are 'double' (except for those on the outside periphery)
– no topological information about polygons
  • which are adjacent and have a common boundary?
– used by the first computer mapping program, SYMAP, in late 1960s
– used by SAS/GRAPH and many later business mapping programs
– still used by shapefiles

Topology: knowledge about relative spatial positioning; knowledge about shared geometry
Topography: the form of the land surface, in particular its elevation

Whole Polygon: illustration
Data File (each entry is polygon ID, x, y; e.g. A34 = polygon A at x=3, y=4)
A34 A44 A42 A32 A34
B44 B54 B52 B42 B44
C32 C42 C40 C30 C32
D42 D52 D50 D40 D42
E15 E55 E54 E34 E30 E10 E15
[Figure: polygons A, B, C, D and E on a grid with x from 0 to 5 and y from 0 to 5]

Vector Data Structures: Points & Polygons
Points and Polygons: list ID numbers of points in order as you 'walk around' the outside boundary
• a second file lists all points and their coordinates.
  – solves the duplicate coordinate/double border problem
  – still no topological information
    • Do not know which polygons have a common border
  – first used by CALFORM, the second generation mapping package, from the Laboratory for Computer Graphics and Spatial Analysis at Harvard in early '70s

Points and Polygons: Illustration
[Figure: polygons A, B, C, D and E with their boundary points numbered 1-12, on a grid with x from 0 to 5 and y from 0 to 5]

Points File
Point ID   x,y
1          3,4
2          4,4
3          4,2
4          3,2
5          5,4
6          5,2
7          5,0
8          4,0
9          3,0
10         1,0
11         1,5
12         5,5

Polygons File
A   1, 2, 3, 4, 1
B   2, 5, 6, 3, 2
C   4, 3, 8, 9, 4
D   3, 6, 7, 8, 3
E   11, 12, 5, 1, 9, 10, 11

Hopefully, you now have a better understanding of what is special about spatial data!
Monday, we will begin talking about Spatial Statistics
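Returning to the Points and Polygons illustration, the following sketch loads the two files as Python dictionaries and shows what "no topological information" means in practice: common borders are not stored, so they have to be searched for by comparing point IDs. The function names `edges`, `shared_borders` and `area` are hypothetical, and the search is a simple illustration rather than the method any particular mapping package used.

```python
from itertools import combinations

# Points file from the illustration: point ID -> (x, y)
points = {
    1: (3, 4), 2: (4, 4), 3: (4, 2), 4: (3, 2),
    5: (5, 4), 6: (5, 2), 7: (5, 0), 8: (4, 0),
    9: (3, 0), 10: (1, 0), 11: (1, 5), 12: (5, 5),
}

# Polygons file from the illustration: polygon ID -> closed ring of point IDs
polygons = {
    "A": [1, 2, 3, 4, 1],
    "B": [2, 5, 6, 3, 2],
    "C": [4, 3, 8, 9, 4],
    "D": [3, 6, 7, 8, 3],
    "E": [11, 12, 5, 1, 9, 10, 11],
}


def edges(ring):
    """Undirected edges of a closed ring, each as a frozenset of two point IDs."""
    return {frozenset(pair) for pair in zip(ring, ring[1:])}


def shared_borders(polys):
    """Find polygon pairs that list the same edge (same two point IDs)."""
    result = {}
    for a, b in combinations(sorted(polys), 2):
        common = edges(polys[a]) & edges(polys[b])
        if common:
            result[(a, b)] = common
    return result


def area(ring):
    """Unsigned shoelace area, looking each coordinate up in the points file."""
    xy = [points[i] for i in ring]
    twice = sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(xy, xy[1:]))
    return abs(twice) / 2


borders = shared_borders(polygons)
areas = {name: area(ring) for name, ring in polygons.items()}
```

Each coordinate is stored once, so neighboring polygons cannot disagree about a shared corner. The border search, however, only finds pairs that list the same two point IDs. In the data as given, E lists only its own corner points, so its borders with A, B and C are not found this way and would need a geometric test, whereas a node/arc/polygon structure records such neighbors directly.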