DATA QUALITY AND ERROR
• Terminology, types and sources
• Importance
• Handling error and uncertainty

DATA QUALITY
• GIGO: garbage in, garbage out
• Just because it's in the computer doesn't mean it's right
• Accept there will always be errors in GIS

INTRODUCTION
• GIS is a great tool for spatial data analysis and display
• question: what about error?
• data quality, error and uncertainty
    - error propagation
    - confidence in GIS outputs
    - be careful, be aware, be upfront

TERMINOLOGY
• various (often confused) terms in use:
    - error
    - uncertainty
    - accuracy
    - precision
    - data quality

ERROR AND UNCERTAINTY
Error
• wrong or mistaken
• degree of inaccuracy in a calculation, e.g. 2% error
Uncertainty
• lack of knowledge about level of error
• unreliable

ACCURACY AND PRECISION
Accuracy
• extent of system-wide bias in measurement process
Precision
• level of exactness associated with measurement
[Figure: four panels (1-4) contrasting precise/imprecise with accurate/inaccurate measurements]

DATA QUALITY
• degree of excellence in data
• general term for how good the data are
• takes all other definitions into account:
    - error
    - uncertainty
    - precision
    - accuracy

DATA QUALITY (continued)
• based on the following elements:
    - positional accuracy
    - attribute accuracy
    - logical consistency
    - data completeness

POSITIONAL ACCURACY
• spatial: deviation from true position (horizontal or vertical)
• general rule: be within the best possible data resolution
    - e.g. at a scale of 1:50,000, error should be no more than 25 m
• can be measured as root mean square error (RMSE)
    - a summary of the typical distance between the true and estimated locations
• temporal: difference from actual time and/or date

ATTRIBUTE ACCURACY
• classification and measurement accuracy
    - a feature is what the GIS thinks it to be
    - e.g. a railroad is a railroad and not a road
    - e.g. a soil sample agrees with the type mapped
• rated in terms of % correct
• in a database, forest types are grouped and placed within a boundary
• in reality there is no solid boundary with only pine trees growing on one side and only spruce on the other

LOGICAL CONSISTENCY
• presence of contradictory relationships in the database
• non-spatial
    - some crimes recorded at place of occurrence, others at place where report was taken
    - data for one country is for 2000, for another 2001
    - data uses a different source or estimation technique for different years

LOGICAL CONSISTENCY (continued)
• spatial
    - overshoots and gaps in road networks or parcel polygons
[Figure: example of good logical consistency]

COMPLETENESS
• reliability concept
• partially a function of the criteria for including features
• are all instances of a feature the GIS claims to include in fact there?
    - when does a road become a track?
• simply put, how much data is missing?

SOURCES OF ERROR
• sources of error:
    - data collection and input
    - human processing
    - actual changes
    - data manipulation
    - data output

DATA COLLECTION AND INPUT
• inherent instability of the phenomena themselves
    - random variation of most phenomena (e.g. leaf size)
    - edges may not be sharp boundaries (e.g. forest edges)
• description of source data
    - data source name, date of collection, method of collection, date of last modification, producer, reference, scale, projection
    - inclusion of metadata

DATA COLLECTION AND INPUT (continued)
• instrument inaccuracies:
    - satellite/air photo/GPS/spatial surveying
    - e.g. resolution and/or accuracy of digitizing equipment
    - thinnest visible line: 0.1 - 0.2 mm; at a scale of 1:20,000 this is about 2 - 4 m (roughly 6.5 - 13 feet) on the ground
    - anything smaller cannot be captured
    - attribute measuring instruments

DATA COLLECTION AND INPUT (continued)
• model used to represent data
    - e.g. choice of datum, classification system
• data encoding and entry
    - e.g. keying or digitizing errors
[Figure: original line compared with its digitised version]

DATA COLLECTION AND INPUT (continued)
Attribute uncertainty
• uncertainty regarding characteristics (descriptors, attributes, etc.) of geographical entities
• types: imprecise or vague, mixed up, plain wrong
• sources: source document, misinterpretation, database error
[Figure: attribute values 505.9 and 238.4 shown alongside rounded (500, 240), ranged (500-510, 230-240) and swapped versions]

HUMAN PROCESSING
• misinterpretation (e.g. of photos), spatial and attribute
• effects of classification (nominal/ordinal/interval)
• effects of scale change and generalization
[Figure: scale of data, from global DEM through European and national DEMs to local DEM]

HUMAN PROCESSING (continued)
• generalization: simplification of reality by the cartographer to meet restrictions of map scale and physical size, effective communication and message
• can result in: reduction, alteration, omission and simplification of map elements
[Figure: City of Sapporo, Japan, at 1:10,000, 1:25,000 and 1:500,000]

ACTUAL CHANGES
• gradual natural changes: river courses, glacier recession
• catastrophic changes: fires, floods, landslides
• seasonal and daily changes: lake/sea/river levels
• man-made: urban development, new roads
• attribute change: forest growth (height), discontinued trails/roads, road surfacing

ACTUAL CHANGES (continued)
• age of data
[Figure: Northallerton circa 1867 and Northallerton circa 1999]

DATA MANIPULATION
Vector to raster conversion errors
• coding and topological mismatch errors:
    - cell size (majority class and central point)
[Figure: fine raster and coarse raster]

DATA MANIPULATION (continued)
Vector to raster conversion errors
• coding and topological mismatch errors:
    - grid orientation
[Figure: original, tilted and shifted rasters]

DATA MANIPULATION (continued)
• compounding effects of processing and analysis of multiple layers
    - if two layers are each 90% correct, the accuracy of the resulting overlay is around 81% (0.9 x 0.9, assuming the errors in the two layers are independent)
• density of observations: TIN modeling and interpolation
• inappropriate or inadequate class intervals or inputs for models

DATA OUTPUT
• scaling inaccuracies
    - detail on scale bar and scale type
• error caused by inaccuracy of the output devices:
    - resolution of computer screen or printer
    - colour palettes: intended colours don't match from screen to printer

DATA OUTPUT
Use
• information may be incorrectly understood
• information may be inappropriately used

HANDLING ERROR
• must learn to cope with error and uncertainty in GIS applications
    - minimise risk of erroneous results
    - minimise risk to life/property/environment
• more research needed:
    - mathematical models
    - procedures for handling data error and propagation
    - empirical investigation of data error and effects
    - procedures for using output data uncertainty estimates
    - incorporation as standard GIS tools

HANDLING ERROR (continued)
• Awareness
    - knowledge of types, sources and effects
• Minimization
    - use of best available data
    - correct choices of data model/method
• Communication
    - to end user!
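
WORKED EXAMPLE: POSITIONAL ACCURACY
The POSITIONAL ACCURACY and DATA COLLECTION AND INPUT slides quote a root mean square error measure and two map-scale figures. The sketch below is one way those could be checked in Python. The function names `rmse` and `ground_distance_m` and the sample coordinates are made up for illustration and are not part of the original slides; the RMSE here is taken over straight-line (Euclidean) distances between paired points, which is one common convention and an assumption on our part.

```python
import math


def rmse(true_pts, measured_pts):
    """Root mean square error over paired (x, y) points, in coordinate units.

    Assumption: error per point is the Euclidean distance between the
    true and the measured location.
    """
    if not true_pts or len(true_pts) != len(measured_pts):
        raise ValueError("need two non-empty point lists of equal length")
    squared = [
        (tx - mx) ** 2 + (ty - my) ** 2
        for (tx, ty), (mx, my) in zip(true_pts, measured_pts)
    ]
    return math.sqrt(sum(squared) / len(squared))


def ground_distance_m(map_mm, scale_denominator):
    """Convert a distance on the map (mm) to ground distance (m) at 1:scale."""
    return map_mm * scale_denominator / 1000.0


# Hypothetical survey check points (metres) and their digitised positions.
true_pts = [(0, 0), (100, 0), (100, 100), (0, 100)]
measured_pts = [(3, 4), (100, 5), (96, 97), (0, 100)]
print(f"RMSE: {rmse(true_pts, measured_pts):.2f} m")  # RMSE: 4.33 m

# 0.5 mm on a 1:50,000 map corresponds to the 25 m rule of thumb.
print(ground_distance_m(0.5, 50_000), "m")

# Thinnest visible line (0.1 - 0.2 mm) on a 1:20,000 map.
for width_mm in (0.1, 0.2):
    metres = ground_distance_m(width_mm, 20_000)
    print(f"{width_mm} mm -> {metres:.1f} m ({metres / 0.3048:.1f} ft)")
```

Three of the four sample points are each 5 m off and one is exact, so the RMSE (about 4.33 m) is pulled above the mean error of 3.75 m: squaring weights larger errors more heavily than a plain average would.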
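
WORKED EXAMPLE: ERROR PROPAGATION IN OVERLAY
The DATA MANIPULATION slide states that overlaying two layers that are each 90% correct gives a result that is around 81% accurate. A minimal sketch of that arithmetic follows. It assumes that errors in the layers are independent of one another, which the slides do not state explicitly, and the function name `overlay_accuracy` is hypothetical.

```python
from math import prod


def overlay_accuracy(layer_accuracies):
    """Expected proportion correct after overlaying several layers.

    Assumption: errors in each layer are independent, so the chance that
    a location is correct in every layer is the product of the per-layer
    proportions correct.
    """
    accuracies = list(layer_accuracies)
    if not accuracies:
        raise ValueError("need at least one layer")
    for a in accuracies:
        if not 0.0 <= a <= 1.0:
            raise ValueError("accuracies must be proportions between 0 and 1")
    return prod(accuracies)


# Two layers at 90% each, as on the slide.
print(f"{overlay_accuracy([0.9, 0.9]):.0%}")

# Accuracy falls quickly as more layers are combined.
for n in range(1, 6):
    print(n, "layers:", f"{overlay_accuracy([0.9] * n):.1%}")
```

If errors in the layers are correlated (for example, both were digitised from the same flawed base map), the product can understate the overlay accuracy, so the figure is best read as a rough guide rather than a guarantee.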