
Geoprocessing
GTECH361
Lecture 12
3 Categories of Geoprocessing

Analysis

Data extraction; e.g., clip

Overlay; e.g., intersect

Proximity; e.g., buffer
The Geoprocessing Process

Geoprocessing tool

Workflow diagram
The Geoprocessing Process

A typical process consists of:
1. Determine which geoprocessing tools you need
2. Determine the order in which the geoprocessing tools should be used
3. Locate the first tool and open its dialog
4. Enter the tool parameters, including the input and output datasets
5. Run the tool
6. Examine the final output and repeat some or all of the analysis steps as needed
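As an illustration of this workflow, here is a minimal script sketch that chains two tools (a clip followed by a buffer). It assumes the ArcPy site package from ArcGIS Pro is available; the workspace path and dataset names are hypothetical placeholders.

```python
# Minimal sketch of a two-step geoprocessing workflow.
# Assumes ArcPy (ArcGIS Pro) is installed and licensed; paths and
# dataset names below are hypothetical placeholders.
import arcpy

arcpy.env.workspace = r"C:\data\project"   # hypothetical workspace
arcpy.env.overwriteOutput = True

# Step 1: data extraction -- clip roads to the study area
arcpy.analysis.Clip("roads.shp", "study_area.shp", "roads_clip.shp")

# Step 2: proximity -- buffer the clipped roads by 500 meters
arcpy.analysis.Buffer("roads_clip.shp", "roads_buf.shp", "500 Meters")

# Examine the final output, then repeat or adjust steps as needed
print(arcpy.management.GetCount("roads_buf.shp"))
```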
Part of Spatial Decision Making
Geoprocessing in ArcGIS


Geoprocessing tools are organized into toolsets of ArcToolbox
You can create your own tools and toolsets
Creating and Using Models

A model is a collection of geoprocessing
operations that automatically execute in
sequence when the model is run to
produce a final output dataset
ModelBuilder



The building block of a model is called a process
A process consists of a geoprocessing tool
Every tool has parameters:
an input dataset
values that tell the tool what to do (e.g., the distance value for the buffer tool)
an output dataset
ModelBuilder



A model is represented by a diagram that
shows all the processes and the sequence in
which they run
The connecting arrows show how elements
and processes are related to each other
The output of one process is used as the
input for another
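To make the idea of a model concrete, here is a minimal, tool-agnostic sketch in plain Python. The tool functions (clip, buffer) are hypothetical stand-ins for real geoprocessing tools; the point is only that a model is an ordered chain of processes in which the output of one process becomes the input of the next.

```python
# Minimal sketch of a "model" as an ordered chain of processes.
# clip() and buffer() are hypothetical stand-ins for geoprocessing tools.

def clip(dataset, clip_to):
    return f"{dataset}_clipped_to_{clip_to}"

def buffer(dataset, distance):
    return f"{dataset}_buffered_{distance}m"

# Each process = (tool, its parameter values); the sequence is the model.
model = [
    (clip,   {"clip_to": "study_area"}),
    (buffer, {"distance": 500}),
]

data = "roads"                      # initial input dataset
for tool, params in model:
    data = tool(data, **params)     # output of one process feeds the next
print(data)                         # final output dataset
```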
A Bigger Model
Model Element States
Why Use Models






Models provide a big-picture view of a project
Models are reusable
Processes run seamlessly, faster
Processes can be run individually
Models make managing intermediate data easy
Models can be shared
Model Documentation

Description

Metadata

Help

Labels
Spatial and Geo-Statistics
Interpolation and Exploration
Spatial Interpolation in GIS


Estimate the z-value for any point location within the map area
Assume:
the data is continuous
the data is spatially dependent (values can be estimated from surrounding locations)
Interpolation
Photogrammetric Sampling Techniques
(a) regular sampling pattern (profiles); (b) regular pattern (grid);
(c) progressive sampling; (d) selective sampling;
(e) composite sampling
Tobler’s Law of Geography
Inverse Distance Weighted (IDW)
Power (can be chosen by minimizing the root-mean-square prediction error)
Search radius
Barrier
Exact interpolator
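In IDW the estimate at an unsampled location is a weighted average of nearby sample values, with weights 1/d^p (d = distance, p = power). Below is a minimal sketch in plain Python with NumPy; the sample coordinates and z-values are made up, and no search radius or barriers are applied.

```python
import numpy as np

def idw(xy_known, z_known, xy_query, power=2.0):
    """Inverse Distance Weighted estimate at one query point.

    Weights are 1 / distance**power. IDW is an exact interpolator,
    so a query point coinciding with a sample returns that sample's value.
    """
    d = np.linalg.norm(xy_known - xy_query, axis=1)
    if np.any(d == 0):                      # exact at sample locations
        return z_known[np.argmin(d)]
    w = 1.0 / d**power
    return np.sum(w * z_known) / np.sum(w)

# Hypothetical sample points (x, y) and z-values
pts = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
z = np.array([100.0, 105.0, 98.0, 110.0])

print(idw(pts, z, np.array([3.0, 4.0]), power=2.0))
```

Increasing the power gives more influence to the closest samples; comparing the root-mean-square prediction error for different powers is one way to pick it.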
Global Polynomial Interpolation
Local Polynomial Interpolation
Spline

Exact interpolator
Regularized: very smooth
Tension: stiffness factor
Weight
Number of points / radius
Kriging (1)
Kriging (2)
The kriging prediction at location s0 is a weighted sum of the measured values:
Ẑ(s0) = Σ λi · Z(si)   (sum over i = 1 … N)
where
Z(si) = measured value at the i-th location
λi = unknown weight assigned to the value at the i-th location
s0 = prediction location
N = number of measured values
Kriging (3)

Semivariogram:
γ(h) = 0.5 × average[ (value at location i − value at location j)² ]
taken over all pairs of locations separated by distance h
Pairing of One Point with Others
Semi-variogram
Understanding a Semi-variogram
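To make the formula above concrete, here is a minimal sketch (plain Python with NumPy; the sample coordinates and values are made up) that bins all point pairs by separation distance and applies the 0.5 × average-squared-difference rule within each bin.

```python
import numpy as np

def empirical_semivariogram(xy, z, bin_width=5.0, n_bins=6):
    """gamma(h) = 0.5 * average of (z_i - z_j)^2 over pairs whose
    separation distance falls in the distance bin around h."""
    n = len(z)
    dists, sqdiffs = [], []
    for i in range(n):
        for j in range(i + 1, n):
            dists.append(np.linalg.norm(xy[i] - xy[j]))
            sqdiffs.append((z[i] - z[j]) ** 2)
    dists, sqdiffs = np.array(dists), np.array(sqdiffs)

    gammas = []
    for k in range(n_bins):
        lo, hi = k * bin_width, (k + 1) * bin_width
        mask = (dists >= lo) & (dists < hi)
        if mask.any():
            gammas.append((lo + bin_width / 2, 0.5 * sqdiffs[mask].mean()))
    return gammas

# Hypothetical sample points and values (a trend plus noise)
rng = np.random.default_rng(0)
xy = rng.uniform(0, 30, size=(25, 2))
z = xy[:, 0] * 0.5 + rng.normal(0, 1, 25)
for h, g in empirical_semivariogram(xy, z):
    print(f"h ≈ {h:.1f}  gamma ≈ {g:.2f}")
```

Spatially dependent data typically show small semivariance at short distances that rises and levels off as distance increases.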
Spatial Interpolation

Interpolate from contours

estimate values based on the shortest
straight line that joins two contours
Spatial Statistics







Starts with simple descriptive statistics
Measures of distribution
Comparison of local versus global
Outliers
Trends
Autocorrelation and directional variation
Covariance
Describing Spatial Data

Measures of location

Measures of spread

Measures of shape
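For point data, common spatial analogues of these measures are the mean center (location) and the standard distance (spread). A short sketch in plain Python with NumPy, using made-up coordinates:

```python
import numpy as np

# Hypothetical point coordinates (x, y)
xy = np.array([[2.0, 3.0], [4.0, 7.0], [5.0, 5.0], [9.0, 6.0], [6.0, 2.0]])

mean_center = xy.mean(axis=0)                      # measure of location
# Standard distance: root mean squared distance of points from the mean center
standard_distance = np.sqrt(((xy - mean_center) ** 2).sum(axis=1).mean())

print("mean center:", mean_center)
print("standard distance:", standard_distance)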
Mapping and Removing
Trends
Spatial Data Quality
aka Error and Uncertainty
Dimensions of
Geographical Data Quality



Geographical ≠ spatial
Recall Lecture 02: g(x, y, z) = (t, a1, …, an)
Matrix of geographical dimensions and
quality components
Accuracy I

Accuracy is the inverse of error


Many people equate accuracy with quality, but in fact accuracy is just one component of quality.
The definition of accuracy is based on the entity-attribute-value model:



Entities = real-world phenomena
Attribute = relevant property
Values = Quantitative/qualitative measurements
Accuracy II

An error is a discrepancy between the encoded
and actual value of a particular attribute for a
given entity. “Actual value” implies the
existence of an objective, observable reality.
However, reality may be:



Unobservable (e.g., historical data)
Impractical to observe (e.g., too costly)
Perceived rather than real (e.g., subjective
entities such as “neighborhoods")
Accuracy III

We do not need an objective reality in order to assess
accuracy, since all geographical data are collected with
the aid of a model that specifies — implicitly or
explicitly — the required level of abstraction and
generalisation.




This is the database “specification” and is closely related to
the “terrain nominal” concept of perceived reality.
The specification serves as the standard against which
accuracy is assessed. Thus the “actual” value is the value we
would expect based on the specification.
Accuracy is always a relative measure, since it is always
measured relative to the specification.
To judge fitness-for-use, one must judge the data relative to
the specification, and also consider the limitations of the
specification itself.
Spatial Accuracy I


Spatial accuracy is the accuracy of the spatial
component of the database. The metrics used
depend on the dimensionality of the entities
under consideration.
For points, accuracy is defined in terms of the
distance between the encoded location and
“actual” location.


Error can be defined in various dimensions:
x, y, z, horizontal, vertical, total.
Metrics of error are extensions of classical
statistical measures (mean error, RMSE or root
mean squared error, inference tests, confidence
limits, etc.).
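For point data, a minimal sketch (plain Python with NumPy; the coordinate pairs are made up) of the usual horizontal metrics: the mean error per axis, which reveals systematic bias, and the horizontal RMSE of the 2-D error vector.

```python
import numpy as np

# Hypothetical encoded vs. "actual" (check-survey) coordinates for 5 points
encoded = np.array([[1000.2, 500.1], [1010.0, 520.4], [ 995.7, 480.0],
                    [1020.3, 510.9], [1005.1, 495.5]])
actual  = np.array([[1000.0, 500.0], [1010.5, 520.0], [ 996.0, 480.3],
                    [1020.0, 511.0], [1005.0, 495.0]])

err = encoded - actual
mean_error_xy = err.mean(axis=0)                          # systematic bias in x and y
rmse_horizontal = np.sqrt((err ** 2).sum(axis=1).mean())  # RMSE of the horizontal error

print("mean error (x, y):", mean_error_xy)
print("horizontal RMSE:", rmse_horizontal)
```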
Spatial Accuracy II
Spatial Accuracy III

For lines and areas, the situation is more
complex. This is because error is a mixture of
positional error (error in locating well-defined
points along the line) and generalisation error
(error in the points selected to represent the
line).


The epsilon band is usually used to define a zone of uncertainty around the encoded line, within which the "actual" line exists with some probability.
However, there is little agreement (and little
empirical work) on the shape of the band, both
planimetrically and in cross-section.
Spatial Accuracy IV
Temporal Accuracy I



Temporal accuracy is the agreement between
the encoded and “actual” temporal
coordinates for an entity.
Temporal coordinates are often only implicit
in geographical data, e.g., a time stamp
indicating that the entity was valid at some
time. Often this is applied to the entire
database (e.g., a map dated “1995”).
More realistically, temporal coordinates are
the temporal limits within which the entity is
valid (e.g., Pothole Q54D-35-021 existed
between 2/12/96 and 8/9/96).
Temporal Accuracy II


Temporal accuracy is not the same as
“database time”, which is the time the
information was entered into the database.
Temporal accuracy is not the same as
“currentness” (or up-to-dateness) which is
actually an assessment of how well the
database specification meets the needs of a
particular application. A database can be temporally accurate but still out of date; historical applications depend on such data.
Thematic Accuracy I


Thematic accuracy is the accuracy of the
attribute values encoded in a database.
The metrics used here depend on the
measurement scale of the data:


Quantitative data (e.g., precipitation) can be treated
like a z-coordinate (elevation) and assessed using
metrics normally used for vertical error (such as the
RMSE).
Qualitative data (e.g., land use/land cover) are normally assessed using a cross-tabulation of encoded and "actual" classes at a sample of locations. This produces a classification error matrix.
Classification Error Matrix
(rows = encoded class, columns = "actual" class)

             Water   Soil   Vegetation   TOTAL
Water          25      2        3          30
Soil            0     38        2          40
Vegetation      1      4       25          30
TOTAL          26     44       30         100
Thematic Accuracy II



Element in row i, column j of the matrix is the
number of sample locations assigned to class
i but actually belonging to class j.
The sum of the main diagonal divided by the
number of samples is a simple measure of
overall accuracy.
An error of omission occurs when a sample is omitted from its actual class. An error of commission occurs when a sample is included in the wrong class. Every error of omission is also an error of commission.
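Using the matrix above, a short sketch (plain Python with NumPy) of how overall accuracy and the per-class omission and commission errors follow from the columns and rows:

```python
import numpy as np

classes = ["Water", "Soil", "Vegetation"]
# Rows = encoded (assigned) class, columns = "actual" class, from the matrix above
m = np.array([[25,  2,  3],
              [ 0, 38,  2],
              [ 1,  4, 25]])

overall = np.trace(m) / m.sum()            # (25 + 38 + 25) / 100 = 0.88
print(f"overall accuracy: {overall:.2f}")

for k, name in enumerate(classes):
    omission   = (m[:, k].sum() - m[k, k]) / m[:, k].sum()   # actual class k, encoded as something else
    commission = (m[k, :].sum() - m[k, k]) / m[k, :].sum()   # encoded as k, actually something else
    print(f"{name}: omission error {omission:.2f}, commission error {commission:.2f}")
```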
Resolution (Precision)

Resolution refers to the amount of detail that can be
discerned in space, time or theme.


Resolution is an aspect of the database specification


Resolution is always finite
High resolution is not always better
Resolution is linked with accuracy, since the level of
resolution affects the database specification against
which accuracy is assessed.

Two databases with the same overall accuracy levels but
different levels of resolution do not have the same quality; the
database with the lower resolution has less demanding
accuracy requirements.
Spatial Resolution I

Spatial resolution is well-defined in the context of raster data, where it refers to the linear dimension of a cell
Spatial Resolution II


For vector data resolution might be defined as the
minimum mapping unit size. Sometimes mean polygon
size is used instead, but this is erroneous since smaller
polygons may be observable but just not present on the
map
Resolution is distinct from the spatial sampling rate,
although the two are often confused with each other


Sampling rate refers to the distance between samples, while
resolution refers to the size of the sample units
Often resolution and sampling are closely matched, but they do
not necessarily need to be. When the sampling rate is higher
than the resolution, sample units overlap; when the sampling
rate is lower than the resolution, there are gaps between
sample units
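A tiny sketch of the distinction (plain Python, made-up numbers): comparing the linear size of a sample unit (resolution) with the spacing between samples (the inverse of the sampling rate) shows whether units overlap or leave gaps.

```python
def compare(resolution_m, sample_spacing_m):
    """resolution_m = linear size of each sample unit;
    sample_spacing_m = distance between sample-unit centers."""
    if sample_spacing_m < resolution_m:
        return "sample units overlap"
    if sample_spacing_m > resolution_m:
        return f"gaps of {sample_spacing_m - resolution_m} m between sample units"
    return "sample units exactly tile the area"

print(compare(30, 30))   # resolution and sampling closely matched
print(compare(30, 50))   # sampling rate lower than resolution -> gaps
print(compare(30, 20))   # sampling rate higher than resolution -> overlap
```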
Temporal Resolution I

Temporal resolution is the length (temporal duration) of the sampling interval


E.g., the shorter the shutter speed of a camera, the
higher the temporal resolution (other factors being
equal)
Temporal resolution affects the minimum duration of
an event that is discernible. If the duration is less
than the resolution, the event is invisible or at best
leaves a smudge
Temporal Resolution II

Temporal resolution is distinct from temporal
sampling rate


Resolution is the length of the sampling interval,
while sampling rate is the frequency of sampling
over time (e.g., once a day, once a week, etc.).
For example, a motion picture camera might have a temporal resolution of 1/1000 second (i.e., the shutter speed used to capture a single frame) and a sampling rate of 24 frames per second
Thematic Resolution

Thematic resolution refers to the precision of
the measurements or categories for a particular
theme


For categorical data, resolution is the fineness of
category definitions (e.g., “urban” vs. “residential”
and “commercial”)
For quantitative data, thematic resolution is
analogous to spatial resolution in the z-dimension
(i.e., the degree to which small differences in the
quantitative attribute can be discerned)
Consistency I


Consistency refers to the absence of apparent
contradictions in a database. Consistency is a measure of
the internal validity of a database.
Consistency can be defined with reference to the three
dimensions of geographical data



Spatial consistency includes topological consistency
e.g., all one-dimensional objects must intersect at a zero-dimensional object
Temporal consistency is related to temporal topology, e.g., the
constraint that only one event can occur at a given location at a
given time
Thematic consistency refers to a lack of contradictions in redundant
thematic attributes. For example, attribute values for population,
area, and population density must agree for all entities
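As a small illustration of such a thematic consistency check, here is a sketch in plain Python (the records and tolerance are hypothetical) that flags entities whose stored density contradicts the density implied by population and area:

```python
# Hypothetical records with redundant attributes: population, area (km^2), density
records = [
    {"name": "Tract A", "population": 12000, "area_km2": 4.0, "density": 3000.0},
    {"name": "Tract B", "population":  8000, "area_km2": 2.5, "density": 5000.0},  # contradictory
]

for r in records:
    implied = r["population"] / r["area_km2"]
    # Flag a contradiction if the stored density disagrees with the implied one (1% tolerance)
    if abs(implied - r["density"]) > 0.01 * r["density"]:
        print(f"{r['name']}: inconsistent (stored {r['density']}, implied {implied:.1f})")
    else:
        print(f"{r['name']}: consistent")
```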
Consistency II

Attribute redundancy is one way in which
consistency can be assessed
e.g., an entity might have the value “Montgomery” for the attribute
“County” but the value “Honolulu” for the attribute “Town”. This is
inconsistent since there is no Honolulu town in Montgomery County.
City         State
Annapolis    Iowa
...          ...
Here, redundancy is partial, since the state Iowa eliminates the
possibility of the city Annapolis, but the city Annapolis does not
necessarily imply the state Maryland, since Maryland is one of seven (!)
states containing a city of Annapolis
The identification of an inconsistency does not necessarily imply that it
can be corrected
The absence of inconsistencies does not necessarily imply that the data
are accurate
Completeness I


Completeness refers to a lack of errors of omission in
a database. It is assessed relative to the database
specification, which defines the desired degree of
generalisation and abstraction (selective omission)
There are two kinds of completeness


“Data completeness” is a measurable error of omission
observed between the database and the specification. Even
highly generalised databases can be “data complete” if they
contain all of the objects described in the specification
“Model completeness” refers to the agreement between the
database specification and the “abstract universe” that is
required for a particular database application. A database is
“model complete” if its specification is appropriate for a
given application.
Completeness II

Incompleteness can be measured in space, time or theme. Consider a database of buildings in PG County that have been placed on the National Register of Historic Places as of the end of 1995
Spatial incompleteness: the list contains only buildings in Hyattsville
Temporal incompleteness: the list contains only buildings placed on the Register by June 30, 1995
Thematic incompleteness: the list contains only residential buildings
Completeness III

Errors of commission can also be assessed.
These errors can lead to “over-completeness”

Errors of commission in space, time and theme for
the previous example:



The list also contains buildings in Silver Spring
The list contains buildings added to the list in 1996
The list contains historic districts as well as buildings
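A minimal sketch of a data-completeness check in plain Python (the building identifiers are hypothetical): the database is compared against the set of objects the specification says it should contain, yielding errors of omission (incompleteness) and errors of commission (over-completeness).

```python
# Hypothetical specification: building IDs the database should contain
specified = {"B001", "B002", "B003", "B004", "B005"}
# Hypothetical database contents
database  = {"B001", "B002", "B004", "B006"}

omissions   = specified - database      # errors of omission (incompleteness)
commissions = database - specified      # errors of commission ("over-completeness")
data_completeness = len(specified & database) / len(specified)

print("omitted:", sorted(omissions))
print("committed:", sorted(commissions))
print(f"data completeness: {data_completeness:.0%}")
```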
Summary of Important
Points


Data quality is the degree of excellence in a
database. Quality is assessed relative to the database
specification, which defines the desired level of
generalisation and abstraction. The quality of this
specification, and its appropriateness for particular
applications, can also be assessed
Quality assessment and reporting is based on
minimum quality standards (compliance testing or
quality control), metadata standards (truth-in-labelling and fitness-for-use), or market standards
(feedback from users).
Summary of Important
Points


Data quality contains several components,
including accuracy, precision, consistency and
completeness. Each component can be assessed in
space, time and theme (the three basic dimensions
of geographical data).
Various assessment methods can be used for each
component/dimension combination. Some methods
are well-developed and others are not
Managing Uncertainty
in GIS
Why Bother?




Many jurisdictions now require mandatory data
quality reports when transferring data
Individual and agency reputations need to be
protected, particularly when geographic information
is used to support administrative decisions subject to
appeal
Safeguard against possible litigation by those who
allege to have suffered harm through the use of
products that were of insufficient quality to meet
their needs
The basic scientific requirement of being able to
describe how close their information is to the truth it
represents
How Was It Done Before GIS?

Traditional hardcopy maps contain valuable
forms of accuracy statements such as
reliability diagrams and estimates of
positional error

Although these descriptors were imperfect, they at
least represented an attempt by map makers to
convey product limitations to map users,
however


this approach assumed knowledge on the part of users
as to how far the maps could be trusted, and
new users of this information are often unaware of the
potential traps that can lie in misuse of their data and
the associated technology
Managing Uncertainty I


The lack of accuracy estimates for digital data has
the potential to harm reputations of both individuals
and agencies - particularly where administrative
decisions are subject to judicial review
The era of consumer protection also has an impact
upon the issue

While we would not think of purchasing a microwave oven
or video recorder without an instruction booklet and a
warranty against defects, it is still common for organisations
to spend thousands of dollars purchasing geographic data
without receiving any quality documentation
Managing Uncertainty II


Finally, if the collection, manipulation, analysis and
presentation of geographic information is to be
recognized as a valid field of scientific endeavour,
then it is inappropriate that GIS users remain unable
to describe how close their information is to the truth
it represents
The obligation to resolve the issues associated with
uncertainty rests equally with data producers,
software and hardware vendors, system integrators
and end-users alike
Strategies for Managing
Uncertainties in GIS

Core components:





developing formal, rigorous models of uncertainty
understanding how uncertainty propagates through
spatial processing and decision making
communicating uncertainty to different levels of users
in more meaningful ways
designing techniques to assess the fitness for use of
geographic information and reducing uncertainty to
manageable levels for any given application
learning how to make decisions when uncertainty is
present in geographic information, i.e. being able to
absorb uncertainty and cope with it in our everyday
lives
Strategies for Managing
Uncertainties in GIS I

In applying the strategy, consideration is
initially given to:


the type of application
the nature of the decision to be made



low risk versus high risk
non-controversial versus controversial
non-political versus political

the degree to which system outputs are utilised within
the decision making process
Strategies for Managing
Uncertainties in GIS II



Ideally, this prior knowledge permits an assessment of the final product quality specifications to be made before a project is undertaken; however, this may have to be decided later, when the level of uncertainty becomes known
Data, software, hardware and spatial processes are
combined to provide the necessary information products
Assuming that uncertainty in a product is able to be
detected and modelled, the next consideration is how
the various uncertainties may best be communicated to
the user
Strategies for Managing
Uncertainties in GIS III

Finally, the user must decide what product
quality is acceptable for the application and
whether the uncertainty present is appropriate
for the given task.
There are two choices available here:


either reject the product as unsuitable and select
uncertainty reduction techniques to create a more
accurate product, or
absorb (accept) the uncertainty present and use the
product for its intended purpose