GIS Data Preparation and Integration: Digesting the Food
7/12/2016 Ron Briggs, UTDallas GISC 6381 GIS Fundamentals

Data Preparation and Integration: the necessary steps
• Geocoding: assigning geographic coordinates to points
  – perhaps the most basic form of spatial data entry
• Data media conversion
  – scanning
  – digitizing
• Data format conversion
  – raster and vector
• Data reduction
• Topology, error detection and topological editing
• Rectification and registration (one on top of the other)
  – overlaying sheets and referencing them to the real world
• Edge matching and image adjustment (side by side)
  – linking and balancing adjacent sheets
• Interpolation
• Conflation

Geocoding: assigning spatial coordinates to point data
Address matching assigns spatial coordinates (an explicit location) to addresses (an implicit location).
Address matching requires a street network file with street attribute information (street name and address number range) for all street segments (block sides):
  – a "zone" variable is required if the data span multiple cities (to handle duplicated street names)
  – precise matching of street names can be problematic
  – completeness (especially for new streets) is important
  – PO boxes, building names, and apartment complex names cause problems
Implementation in ArcGIS is a three-step process:
  – in ArcToolbox (9.2), process the street network file to create a Geocoding Service
  – in ArcMap, load the appropriate geocoding service via Tools/Geocoding/Services Manager
  – in ArcMap, geocode a table of addresses using Tools/Geocoding/Geocode Addresses

Point Location Files containing lat/long or x,y coordinates (e.g. derived via GPS)
  – bring the table (e.g. in .csv or .dbf format) into ArcGIS using the Add Data icon
  – right-click the table name in the Table of Contents (T of C) and select Display X,Y Data
  – the table displays as an "event layer"; export it to a shapefile or geodatabase (gdb) feature class to create a spatial data set
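Address matching usually places an address by interpolating its house number along the address range of the matched street segment. The sketch below assumes a simplified segment record: the `StreetSegment` fields and the `geocode` function are hypothetical illustrations, not the ArcGIS geocoding service or its schema, and the sketch ignores odd/even block sides and offsets from the street centerline.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class StreetSegment:
    """One block side from a street network file (hypothetical schema)."""
    name: str                   # street name, stored upper-case, e.g. "MAIN ST"
    zone: str                   # city/ZIP "zone" that separates duplicate names
    from_num: int               # house number at the `start` end of the segment
    to_num: int                 # house number at the `end` end of the segment
    start: tuple[float, float]  # (x, y) of the segment's first node
    end: tuple[float, float]    # (x, y) of the segment's last node


def geocode(number: int, name: str, zone: str,
            segments: list[StreetSegment]) -> tuple[float, float] | None:
    """Return an interpolated (x, y) for an address, or None if unmatched."""
    for seg in segments:
        if seg.name != name.upper() or seg.zone != zone:
            continue
        lo, hi = sorted((seg.from_num, seg.to_num))
        if not lo <= number <= hi:
            continue
        span = seg.to_num - seg.from_num
        # Fraction of the way from `start` to `end`; works for either direction.
        t = 0.0 if span == 0 else (number - seg.from_num) / span
        x = seg.start[0] + t * (seg.end[0] - seg.start[0])
        y = seg.start[1] + t * (seg.end[1] - seg.start[1])
        return (x, y)
    return None  # no match: misspelled name, missing (new) street, PO box, ...


if __name__ == "__main__":
    segs = [StreetSegment("MAIN ST", "DALLAS", 100, 200, (0.0, 0.0), (100.0, 0.0))]
    print(geocode(150, "Main St", "DALLAS", segs))  # (50.0, 0.0)
```

An address in a different zone, or outside every range, returns None: these are the unmatched records that duplicated street names, new streets, or PO boxes tend to produce.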
Input table must contain three variables at minimum: Feature ID, x, y.

Data Media Conversion: Scanning (automated recording of a map or aerial photo)
• Produces "dumb" raster data
  – great if only a raster representation is needed
  – create a "smart" image using digital image processing techniques
  – vectorize using conversion software
• Automated creation of vector data from scanning is very problematic:
  – documents must be clean
  – complex line work adds error
  – lines shouldn't be broken by text
  – text may be interpreted as lines
  – automatic feature detection (road versus railroad) is difficult
• ESRI's ArcScan for ArcGIS (included with ArcEditor) provides interactive, semi-automated raster-to-vector conversion
  – other vendors offer specialized conversion software
• Scanners
  – electromechanical instruments, $100-$50,000
  – drum or flatbed
  – scan resolution depends on price! down to 20 microns (a micron is a millionth of a meter)
• Scanners v. sensors
  – sensors collect data directly in digital form (e.g. digital cameras)
  – sensor resolution now (2005 onward) matches that of photos, so scanning photos is becoming an old technology
  – still lots of paper maps around, e.g. property ownership records
Digital image processing techniques are used to create a "smart raster"
  – identify the feature type within each raster cell

Data Media Conversion: Digitizing (manually tracing a map or aerial photo)
• Applied to a map or aerial photo
• Uses a hard-copy map/photo on a table/tablet, or a scanned image on screen (heads-up digitizing)
• A pen or cursor detects x, y coordinates
• Coordinates are in inches/cm from the lower left (0,0)
• Control points (tic marks) relate digitized coordinates to real-world lat/long coordinates
• Coordinates are captured in stream or point mode
• Accuracy of the table (but not the user!) is usually better than 0.1 mm
• All nodes and polygons should be marked and numbered first
• Essentially a vector approach
Problems:
• Paper maps are unstable
  – they crease and fold
  – they stretch with humidity (up to 3%)
  – photos are more stable (0.2%)
• Map errors are transferred to the GIS
  – maps are often prepared for display, not accuracy
• The human hand is very shaky
• Digitizing often generates undershoots, overshoots, and double lines
  – editing and clean-up are essential

Data Format Conversion (from vector or raster, to vector or raster: 4 possibilities)
• Vector to vector
  – e.g. whole polygon (e.g. SAS map data) to point/arc/polygon
  – computationally intense
  – no accuracy loss provided the data are 'clean'
  – perfectly transitive
• Raster to raster
  – may involve resampling (see Data Reduction)
  – may involve conversion between different vendors' raster formats (e.g. GRID to BIL)
• Vector to raster: point
  – node x,y assigned to the closest raster cell
  – locational shift is almost inevitable; the error depends on raster cell size
  – two points in one cell are indistinguishable
  – not transitive; the original data cannot be retrieved without error
• Vector to raster: line
  – cells are assigned if touched by the line
  – stair-step appearance of diagonal lines (called aliasing)
  – can be visually improved through anti-aliasing: the brightness of cells is varied based on the fraction of the cell covered by the line
• Raster to vector
  – by far the most difficult
Transitive: the ability to reproduce the original data after conversion.
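The loss of transitivity for points can be shown with a little arithmetic. The sketch below assumes a raster whose lower-left corner sits at a given origin, with square cells and rows counted upward; `point_to_cell` and `cell_to_point` are hypothetical helper names. Converting back can only return the cell center, so the locational shift is at most half the cell diagonal, and two points in one cell collapse into the same cell.

```python
import math


def point_to_cell(x, y, origin_x, origin_y, cell_size):
    """Return (row, col) of the cell containing (x, y).

    Assumes (origin_x, origin_y) is the raster's lower-left corner and that
    rows are counted upward from it.
    """
    col = math.floor((x - origin_x) / cell_size)
    row = math.floor((y - origin_y) / cell_size)
    return row, col


def cell_to_point(row, col, origin_x, origin_y, cell_size):
    """Convert a cell back to a point: the cell centre is all that survives."""
    return (origin_x + (col + 0.5) * cell_size,
            origin_y + (row + 0.5) * cell_size)


if __name__ == "__main__":
    size = 10.0
    a, b = (12.0, 37.5), (18.9, 31.2)          # two distinct points
    cell_a = point_to_cell(*a, 0.0, 0.0, size)
    cell_b = point_to_cell(*b, 0.0, 0.0, size)
    back = cell_to_point(*cell_a, 0.0, 0.0, size)
    print(cell_a, cell_b)                       # (3, 1) (3, 1): indistinguishable
    print(back, round(math.dist(a, back), 2))   # (15.0, 35.0) 3.91
    # Worst-case shift is half the cell diagonal.
    print(round(size * math.sqrt(2) / 2, 2))    # 7.07
```

Halving the cell size halves the worst-case shift, which is why the error "depends on raster size".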
Vector to Raster Conversion
[Figure: a point, an orthogonal line, and a diagonal line (more problematic) shown in vector form and after conversion to raster; note the use of anti-aliasing to improve the line's visual appearance]

Raster to Vector Data Conversion: a 3-step process
• Skeletonizing (or thinning): reduce rasters to unit width
  – the peeling approach successively removes outer edges
  – the medial axis approach determines the set of interior pixels farthest from the outer edges
• Vector extraction: identify lines
  – 4-connected reconstruction
    • joins center points of 4-connected neighbors if present
    • particularly bad for reproducing diagonal lines
  – 8-connected reconstruction
    • joins center points of 8-connected neighbors if present
    • diagonal lines are reproduced, but extra lines are added
  – 8-connected reconstruction with redundancy elimination
    • if a 4-connected neighbor line exists, don't draw the diagonal
    • reduces redundant lines
• Topological reconstruction: recreate the topological structure
  – create nodes at line junctions
  – construct arcs
  – define polygons (manual designation required)
Available via the ArcScan extension for ArcGIS, as well as via several specialized packages from other vendors.

Raster to Vector Conversion: Skeletonizing
For an example, go to: http://www.cosc.canterbury.ac.nz/people/mukundan/covn/Thin.html

Raster to Vector Conversion: Vector Extraction, 4-connected reconstruction
[Figure: raster cells and the vectors extracted from them]
4-connected reconstruction: search the 4 surrounding cells and join center points if present.

Raster to Vector Conversion: Vector Extraction, 8-connected reconstruction
[Figure: raster cells and the vectors extracted from them]
8-connected reconstruction: search the 8 surrounding cells and join center points if present.
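The 4- and 8-connected searches, together with the redundancy-elimination rule (draw a diagonal only if the two cells are not already joined orthogonally), can be sketched on a set of occupied cells. This is an illustrative sketch, not ArcScan's algorithm; `extract_segments` is a hypothetical name, and it reads "already connected by an orthogonal" as "the two diagonal cells share an occupied orthogonal neighbor".

```python
def extract_segments(cells, mode="8r"):
    """Join the centres of occupied raster cells into line segments.

    cells: set of (row, col) tuples for occupied (line) cells.
    mode:  "4"  - 4-connected: orthogonal neighbours only
           "8"  - 8-connected: orthogonal and diagonal neighbours
           "8r" - 8-connected with redundancy elimination: skip a diagonal
                  when its two cells already share an occupied orthogonal
                  neighbour (i.e. they are linked by orthogonal segments)
    Returns a set of frozensets, each holding the two cells a segment joins.
    """
    segments = set()
    for r, c in cells:
        # Look "forward" only, so each undirected pair is tested once.
        for dr, dc in ((0, 1), (1, 0)):
            if (r + dr, c + dc) in cells:
                segments.add(frozenset({(r, c), (r + dr, c + dc)}))
        if mode == "4":
            continue
        for dr, dc in ((1, 1), (1, -1)):
            other = (r + dr, c + dc)
            if other not in cells:
                continue
            # Cells that would connect (r, c) and `other` orthogonally.
            bridges = ((r + dr, c), (r, c + dc))
            if mode == "8r" and any(b in cells for b in bridges):
                continue
            segments.add(frozenset({(r, c), other}))
    return segments


if __name__ == "__main__":
    stair = {(0, 0), (0, 1), (1, 1)}      # an L-shaped corner
    for mode in ("4", "8", "8r"):
        print(mode, len(extract_segments(stair, mode)))
```

On the L-shaped corner, 4- and 8-with-elimination give 2 segments while plain 8-connected adds a redundant third; on a pure diagonal run, 4-connected finds nothing while both 8-connected variants recover the line.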
Raster to Vector Conversion: Vector Extraction, 8-connected reconstruction with redundancy elimination
[Figure: raster cells and the vectors extracted from them]
8-connected with redundancy elimination: draw a diagonal from the 8-cell search only if the cells are not already connected by an orthogonal from the 4-cell search.

Data Format Conversion: Implementation in ArcGIS 9
• From raster to raster: ArcToolbox>Conversion Tools>To Raster>Raster To Other (multiple)
  – converts one or more raster datasets in formats supported by ArcGIS to GRID, IMAGINE, TIFF, or geodatabase raster format
  – can also be accomplished through the ArcCatalog Export function
• From vector to raster: ArcToolbox>Conversion Tools>To Raster>Feature to Raster
  – converts any shapefile, coverage, or geodatabase feature class containing point, line, or polygon features to a raster dataset
  – can also be accomplished through the ArcCatalog Export function
• From raster to vector: ArcToolbox>Conversion Tools>From Raster>Raster to Point, Raster to Polygon, Raster to Polyline
  – converts raster datasets in GRID, IMAGINE, or TIFF format to shapefiles or feature classes; results may not be what you expect!
  – can also be accomplished through the ArcCatalog Export function
• From vector to vector: use the ArcCatalog Export function for conversions between shapefiles, gdb feature classes, coverages, and CAD files
  – the ArcGIS Data Interoperability Extension offers the most comprehensive set of conversions

Data Reduction
• Why?
  – conserve space: disk in the past, communication bandwidth today
  – conserve time: reduce processing time (batch), speed response time (interactive)
• Two approaches: thinning (vector data) and resampling (raster data)
• Resampling (raster data)
  – 'average' the 4 values in a 2-by-2 neighborhood
  – use this 1 value in a single cell occupying the location of the 4 original cells
  – use the mean for interval data; rules are required for ordinal or nominal data
  – not transitive!
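The 2-by-2 resampling can be sketched in a few lines. The `rule` parameter is a hypothetical illustration: the mean follows the advice for interval data, while "majority" (most frequent value) is only one possible rule for nominal data such as land-cover classes, assumed here for the example. A block of 3, 7, 2, 4 collapses to its mean, 4.

```python
from collections import Counter
from statistics import mean


def resample_2x2(grid, rule="mean"):
    """Halve raster resolution by replacing each 2x2 block with one value.

    grid: list of equal-length rows; even numbers of rows and columns assumed.
    rule: "mean"     - arithmetic mean (interval data)
          "majority" - most frequent value (one possible rule for nominal
                       data; ties go to the value met first in the block)
    """
    if rule not in ("mean", "majority"):
        raise ValueError(f"unknown rule: {rule}")
    out = []
    for r in range(0, len(grid), 2):
        row = []
        for c in range(0, len(grid[0]), 2):
            block = [grid[r][c], grid[r][c + 1],
                     grid[r + 1][c], grid[r + 1][c + 1]]
            if rule == "mean":
                row.append(mean(block))
            else:
                row.append(Counter(block).most_common(1)[0][0])
        out.append(row)
    return out


if __name__ == "__main__":
    # The block 3, 7, 2, 4 averages to 4; the four originals cannot be recovered.
    print(resample_2x2([[3, 7], [2, 4]]))
```

Averaging class codes would be meaningless for nominal data (the mean of "water" and "urban" codes is not a land cover), which is why a separate rule is needed.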
• Thinning (vector data)
  – often applied to data digitized in stream mode
  – tolerance elimination: remove nearest-neighbor points that are 'too close' (e.g. when the output device's resolution is insufficient to distinguish them)
  – topological elimination*: remove points unnecessary for the topological structure
  – model-based elimination: fit a polynomial by least squares and record fewer points along its path
  *Normally uses the Douglas-Poiker (or Douglas-Peucker) algorithm: David H. Douglas and Thomas K. Peucker, "Algorithms for the reduction of the number of points required to represent a digitized line or its caricature," Canadian Cartographer, 1973.
  Implement in ArcGIS via the Advanced Editing toolbar, Generalize tool.
[Figure: resampling example; a 2-by-2 block of values 3, 7, 2, 4 (16 bytes) is replaced by a single cell holding their mean, 4 (size labels on the slide: 4 bytes, 1 byte)]

Topology & Errors
Topology:
  – knowledge about relative spatial positioning
  – spatial relationships between features, and rules about these relationships
  – managing data cognizant of shared geometry
Topology implies knowledge of the three Cs:
  – connectivity (linked)
  – congruency (coincident/same as/on top of)
  – contiguity (adjacent)
It is critical that spatial data be created and managed so that they are topologically clean, i.e. free from topological errors; editing must always aim to maintain topological structure.
In topological editing, changes made to one feature (line, polygon, etc.) are also reflected in all other features to which it is connected, coincident, or adjacent.
In the classic GIS data structure model (as discussed in the GIS Data Structures lecture) this implies, for example, that:
  – all arcs have nodes at their end points
  – there is a node wherever arcs intersect or connect
  – a single arc forms the border between contiguous polygons (e.g. Dallas and Tarrant counties)
  – a single arc represents a common boundary (e.g. a state and county boundary)
[Figure: Tarrant and Dallas counties sharing a single boundary arc]

Errors: detection and removal
• GIS packages commonly use topological structure checking to detect errors
• Editing based on node snapping is used to correct errors: moving a feature so its coordinates correspond exactly with another's
• Snapping is conducted based on tolerances (snap if within 1 foot, for example)
• Care must always be taken to ensure that topological "cleaning" does not itself introduce errors (e.g. snapping together nodes and lines that shouldn't be)

Topological errors or real-world occurrences? Common problems:
• dangling arc (node missing at one end)
• no node at an arc intersection (overpass?)
• overshoot (or missing node?)
• undershoot
• pseudo node (but perhaps the road surface changes)
• pseudo arc (connects to itself)
• open polygon
• sliver polygon
• gap

How ArcGIS Handles Topology
• The original coverage data model, introduced with ArcInfo in 1981, incorporated topology as part of the data
  – the CLEAN command checked for, and automatically "fixed", topological errors based on a set tolerance; it could introduce errors into the data
  – the BUILD command then rebuilt polygon structures
• ArcGIS 8.3 introduced topological rules for geodatabases, in which the topological relationships are stored as a topology feature class separate from the data itself
  – the user can generate an error report, review each error, and then fix it in the data if desired, or mark it as an "exception"

Georeferencing: Rectification and Registration (providing true earth location/overlaying layers)
• Rectification: rearrangement of the location of objects to correspond to a specific reference system (usually geodetic)
• Registration: rearrangement of the location of objects in one set so they correspond with those of another, without reference to a specific reference system
Despite the formal difference, the terms are often used interchangeably.
Two methods:
• homogeneous transformation via rotation, translation, scaling, skewing
  – used for map projection and similar conversions
• differential transformation via rubber sheeting
  – used to correctly position distorted images or scanned maps or documents
Most commonly used to relate images (e.g. a scanned photo) to a vector layer, but can also be used to "fix" incorrect positioning of features in a vector layer.
Implemented in ArcMap:
  – via the Georeferencing toolbar for images
  – via the Spatial Adjustment toolbar for vector layers

Transformation (homogeneous conversion)
• translation of origin
  – from the digitizer origin for the sheet to the 'true' origin of the GIS file
• rotation of axes
  – e.g. to true north
• scaling of axes
  – homogeneous
  – differential (ovals to circles)
• skewing of axes
Changing map projections may involve all 4.
[Figure: examples of translation, rotation, differential scaling, and skewing]

Rubber Sheeting (differential conversion)
• The GIS file is differentially 'stretched' so that tic points in the file overlay corresponding ground control (tie) points on the earth's surface (or tic points in a second file)
• A polynomial is fitted by least squares between known ground control coordinates and tic point coordinates in the GIS
  – "least squares" minimizes the sum of the squared distances between tic/tie pairs
• The derived parameters are then applied to all coordinates in the file
• After conversion, tic points are on average closer to the ground control points, but not identical
• Can't do this with a paper map!
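The least-squares step can be sketched for the simplest case, a first-order polynomial (an affine transformation, which bundles the translation, rotation, scaling and skewing of the homogeneous case). This is an illustrative sketch, not the ArcGIS Georeferencing or Spatial Adjustment implementation; the function names are hypothetical, and higher-order rubber sheeting would add terms such as x², xy and y² to the same normal equations. It needs at least three non-collinear tic/tie pairs.

```python
import math


def _det3(a):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def _solve3(m, v):
    """Solve the 3x3 system m @ p = v by Cramer's rule."""
    d = _det3(m)
    if abs(d) < 1e-12:
        raise ValueError("need at least 3 non-collinear control points")
    solution = []
    for i in range(3):
        mi = [row[:] for row in m]   # copy, then replace column i with v
        for r in range(3):
            mi[r][i] = v[r]
        solution.append(_det3(mi) / d)
    return solution


def fit_affine(tics, ties):
    """Least-squares first-order polynomial mapping tic coords to tie coords.

    Model: X = a0 + a1*x + a2*y,  Y = b0 + b1*x + b2*y.
    Solves the normal equations (A^T A) p = A^T b for each output axis.
    """
    rows = [(1.0, x, y) for x, y in tics]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atx = [sum(r[i] * X for r, (X, _) in zip(rows, ties)) for i in range(3)]
    aty = [sum(r[i] * Y for r, (_, Y) in zip(rows, ties)) for i in range(3)]
    return _solve3(ata, atx), _solve3(ata, aty)


def apply_affine(coefs, point):
    """Apply the fitted parameters to any coordinate in the file."""
    (a0, a1, a2), (b0, b1, b2) = coefs
    x, y = point
    return a0 + a1 * x + a2 * y, b0 + b1 * x + b2 * y


def rms_error(coefs, tics, ties):
    """Root-mean-square distance between transformed tics and their ties."""
    sq = [math.dist(apply_affine(coefs, t), g) ** 2 for t, g in zip(tics, ties)]
    return math.sqrt(sum(sq) / len(sq))


if __name__ == "__main__":
    tics = [(0, 0), (10, 0), (0, 10), (10, 10)]              # digitizer units
    ties = [(100, -50), (120, -50), (100, -20), (121, -20)]  # one tie is off by 1
    coefs = fit_affine(tics, ties)
    print(round(rms_error(coefs, tics, ties), 6))            # 0.25
```

With one tie point deliberately off by one unit, the fitted transform leaves a residual of 0.25 at every tic: the tics end up close to the ground control on average, but not identical to it.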
[Figure: tic points in the GIS file linked to ground control (tie) points]
Ground control (tie) and map (tic) locations:
  – the more the better
  – well distributed
  – known lat/long of ground control tie points (usually obtained from GPS) needed for rectification
  – common identifiable points in each file needed for registration

Edge Matching: joining map sheets to create a seamless GIS
Process
• required for topological consistency even if features line up visually
• snapping is used to connect features
Issues
• acceptable tolerance before 'further investigation' of a mismatch
• 'how far back' to go on the sheet(s) with adjustments for a mismatch
Causes of mismatch
• paper map shrinkage/expansion
• errors from digitizing/scanning
  – georeferencing errors
  – accuracy of equipment
  – extrapolation or round-off errors
• overlapping map coverage
[Figure: corresponding features fail to match on two sheets; edge matching in this example would likely require 'further research']
Implement in ArcGIS 9 by:
1. ArcToolbox>Data Management>General>Append (replaces Geoprocessing Tools>Merge in ArcGIS 8)
   – combines two (or more) files, but does not link features
2. Spatial Adjustment toolbar, Edge Match tool
   – links features (after links have been manually identified)

Image Adjustments: raster/image data issues
Raster data are made from separate images (photos) or tiles, which are mosaicked to produce a "seamless" image.
Collars: must be removed for a seamless image
  – overlap between adjacent images
  – borders of scanned maps
Image Balancing and Feathering: adjusting radiometry for consistent and/or desired image color, brightness, and contrast
  – checkerboard appearance
  – abrupt line between adjacent images
  – brightness levels wash out detail in highly reflective areas, but enhance detail in low-reflectance areas
  – inconsistent signature for the same features, especially water, as a function of wind or sun relative to the camera (and is it blue?)
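Feathering can be illustrated on a single row of pixels from two adjacent tiles. This is a minimal sketch of one simple scheme, a linear ramp of weights across the overlap; `feather_rows` is a hypothetical name, and production mosaicking tools may use other weightings and usually balance overall brightness between tiles as well.

```python
def feather_rows(left, right, overlap):
    """Join two image rows whose ends overlap, blending linearly across the seam.

    left, right: lists of brightness values for one row of two adjacent tiles.
    The last `overlap` pixels of `left` cover the same ground as the first
    `overlap` pixels of `right`. Across the overlap the weight on `left`
    falls steadily toward 0 (and the weight on `right` rises toward 1),
    so the seam fades instead of showing an abrupt line.
    """
    if overlap <= 0:
        return left + right
    if overlap > min(len(left), len(right)):
        raise ValueError("overlap is wider than one of the rows")
    mosaic = left[:-overlap]
    for i in range(overlap):
        w = (overlap - i) / (overlap + 1)          # weight on the left tile
        a = left[len(left) - overlap + i]
        b = right[i]
        mosaic.append(w * a + (1 - w) * b)
    return mosaic + right[overlap:]


if __name__ == "__main__":
    print(feather_rows([100] * 5, [140] * 5, 3))
    # [100, 100, 110.0, 120.0, 130.0, 140, 140]
```

Without feathering the row would jump straight from 100 to 140 at the tile boundary, producing the abrupt line and checkerboard appearance described above.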
Digital ortho adjustments:
  – ground control (usually with GPS for visible points) to obtain 'real world' location
  – ground control for the camera's angle relative to the ground
  – camera calibration data to remove lens distortion
  – digital terrain model (DTM) to remove elevation "distance" (5 mi on the map to a mountain top, but 6 mi walking or on the photo if the mountain is 5,280 feet high!)
[Figure: 2005 NCTCOG digital orthos, tiles before and after adjustment; collar removal and image balancing/feathering required]

Interpolation: creating regular spacings from irregular data (e.g. creating a raster elevation surface from a set of point height measurements)
• estimating values for locations with no data based on:
  – known values, and
  – understanding of the spatial behavior of the phenomenon
• generally, closer known values should be given more importance than those farther away
• weighting functions
  – average the closest n (2?) points
    • ignores distance
  – fit a line between the closest 2
  – fit a surface between the closest 3
• trend surface approaches
  – one high-order polynomial
    • oscillation is a problem
  – finite element approach: fit separate polynomials for each local area
  – kriging: uses correlations of values with distance
Implemented in ArcGIS 9 via ArcToolbox>Spatial Analyst Tools>Interpolation
[Figure: estimated values between known points]

Conflation
• create a new master coverage from the best spatial and attribute qualities of two or more source coverages
  – combine multiple coverages into one to simplify support
  – updated data obtained (e.g. a new TIGER file), but enhancements made to the earlier version need to be preserved
  – two groups modify a single file, then need to recreate a single version that preserves both sets of modifications
• create a new master coverage from quality spatial data in one source and quality attribute data in another
  – a somewhat narrower definition
• Depending on the situation, conflation can require a variety of processing tools and can be labor intensive
• Approaches available within ArcGIS 9 include
  – the Spatial Adjustment toolbar, specifically the Attribute Transfer tool
  – ArcToolbox>Analysis Tools>Overlay>Update
• Other add-ins are available, such as
  – MapMerge from ESEA, Mountain View CA, for ArcGIS
  – GIS/T-Conflate for transportation applications

NAVSTAR Global Positioning System (GPS)
  – used to collect ground control for imagery/orthos
  – or for point/line data (manholes, roads, etc.)
NAVSTAR Satellite Program
• 24 NAVSTAR (NAVigation Satellite Time and Ranging) satellites in 11,000 mile orbit provide 24-hour coverage worldwide
• first launched 1978; full system operational December 1993
• a GPS receiver computes locations/elevations via signals from simultaneously visible satellites (minimum 3 for 2-D, 4 for 3-D)
• Selective Availability (SA) security system
  – 100 m accuracy with a single receiver, if active
  – 10-15 m accuracy if inactive
• SA turned off May 1st, 2000
  – multiple ways to counteract SA
  – even the USCG broadcast a correction signal!
  – Europeans threatened to compete
  – regional denial of the signal is possible
• Russia's 21-satellite GLONASS (Global Navigation Satellite System) is also available
Types of Ground Collection and Correction
• Autonomous
  – hand-held unit provides 10 m accuracy (with SA off)
  – $150-$1,500 per unit
• WAAS (Wide Area Augmentation System)
  – <3 meter accuracy in practice (spec. is 7 m vertical/horizontal)
  – base stations (25 across the US) monitor satellites
  – 2 master stations (E & W coast) calculate corrections
  – corrections are uploaded to two geosynchronous satellites over the equator
  – correction signal broadcast to GPS receivers (no special extra equipment needed, unlike DGPS)
  – began operation June 1998
  – to be expanded to cover Canada, Mexico, Panama
  – European EGNOS and Asian MSAS under development
• Differential (DGPS, predecessor to WAAS)
  – accuracy 1-5 m depending on equipment/exact method
  – equipment $1,500-$15,000 per receiver
  – corrects for SA and other errors via either
    • real-time correction signals over FM radio
    • post-processing with data from the Internet
• Kinematic
  – high-accuracy engineering (within cm)
  – two receivers (base station and rover) must lock on to satellites
  – equipment $15-30K per station

Factors Affecting GPS Accuracy
• ionosphere
  – worst in the evening at low altitudes (but ephemeris best there)
• troposphere
  – especially water vapor, which slows the signal
• multipath
  – signals reflected from buildings, cliffs, etc.
• ephemeris: position and number of satellites in the sky
  – 4 required for 3-D (horizontal and vertical), 3 for 2-D (no elevation)
  – ideally, 3 spaced every 120° around the horizon at 20° elevation, plus 1 directly above
• blockage (of the satellite signal)
  – by foliage, buildings, cliffs, etc.
  – the WAAS signal is especially subject to blocking by terrain and buildings because it comes from a geostationary equatorial satellite
Overall, accuracy is better at night than during the day.

Conclusion
Most of the effort in most GIS projects involves data preparation and integration!