
Geoprocessing
GTECH361
Lecture 12
3 Categories of Geoprocessing

Analysis

Data extraction; e.g., clip

Overlay; e.g., intersect

Proximity; e.g., buffer
The Geoprocessing Process

Geoprocessing tool

Workflow diagram
The Geoprocessing Process

A typical process consists of:
1. Determine which geoprocessing tools you need
2. Determine the order in which the geoprocessing tools should be used
3. Locate the first tool and open its dialog
4. Enter the tool parameters, including the input and output datasets
5. Run the tool
6. Examine the final output and repeat some or all of the analysis steps as needed
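As an illustration of this workflow, here is a minimal script sketch that chains two tools (a clip followed by a buffer). It assumes the ArcPy site package from ArcGIS Pro is available; the workspace path and dataset names are hypothetical placeholders.

```python
# Minimal sketch of a two-step geoprocessing workflow.
# Assumes ArcPy (ArcGIS Pro) is installed and licensed; paths and
# dataset names below are hypothetical placeholders.
import arcpy

arcpy.env.workspace = r"C:\data\project"   # hypothetical workspace
arcpy.env.overwriteOutput = True

# Step 1: data extraction -- clip roads to the study area
arcpy.analysis.Clip("roads.shp", "study_area.shp", "roads_clip.shp")

# Step 2: proximity -- buffer the clipped roads by 500 meters
arcpy.analysis.Buffer("roads_clip.shp", "roads_buf.shp", "500 Meters")

# Examine the final output, then repeat or adjust steps as needed
print(arcpy.management.GetCount("roads_buf.shp"))
```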
Part of Spatial Decision Making
Geoprocessing in ArcGIS


Geoprocessing tools are organized into toolsets of ArcToolbox
You can create your own tools and toolsets
Creating and Using Models

A model is a collection of geoprocessing
operations that automatically execute in
sequence when the model is run to
produce a final output dataset
ModelBuilder



The building block of a model is called a process
A process consists of a geoprocessing tool
Every tool has parameters:
an input dataset
values that tell the tool what to do (e.g., the distance value for the buffer tool)
an output dataset
ModelBuilder



A model is represented by a diagram that
shows all the processes and the sequence in
which they run
The connecting arrows show how elements
and processes are related to each other
The output of one process is used as the
input for another
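To make the idea of a model concrete, here is a minimal, tool-agnostic sketch in plain Python. The tool functions (clip, buffer) are hypothetical stand-ins for real geoprocessing tools; the point is only that a model is an ordered chain of processes in which the output of one process becomes the input of the next.

```python
# Minimal sketch of a "model" as an ordered chain of processes.
# clip() and buffer() are hypothetical stand-ins for geoprocessing tools.

def clip(dataset, clip_to):
    return f"{dataset}_clipped_to_{clip_to}"

def buffer(dataset, distance):
    return f"{dataset}_buffered_{distance}m"

# Each process = (tool, its parameter values); the sequence is the model.
model = [
    (clip,   {"clip_to": "study_area"}),
    (buffer, {"distance": 500}),
]

data = "roads"                      # initial input dataset
for tool, params in model:
    data = tool(data, **params)     # output of one process feeds the next
print(data)                         # final output dataset
```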
A Bigger Model
Model Element States
Why Use Models






Models provide a big-picture view of a project
Models are reusable
Processes run seamlessly, faster
Processes can be run individually
Models make managing intermediate data easy
Models can be shared
Model Documentation

Description

Metadata

Help

Labels
Spatial and Geo-Statistics
Interpolation and Exploration
Spatial Interpolation in GIS


Estimate the z-value for any point location within the map area
Assume:
the data is continuous
the data is spatially dependent (values can be estimated from surrounding locations)
Interpolation
Photogrammetric Sampling Techniques
(a) regular sampling pattern (profiles); (b) regular pattern (grid);
(c) progressive sampling; (d) selective sampling;
(e) composite sampling
Tobler’s Law of Geography
Inverse Distance Weighted (IDW)
Power (can be chosen by minimizing the root-mean-square prediction error)
Search radius
Barrier
Exact interpolator
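In IDW the estimate at an unsampled location is a weighted average of nearby sample values, with weights 1/d^p (d = distance, p = power). Below is a minimal sketch in plain Python with NumPy; the sample coordinates and z-values are made up, and no search radius or barriers are applied.

```python
import numpy as np

def idw(xy_known, z_known, xy_query, power=2.0):
    """Inverse Distance Weighted estimate at one query point.

    Weights are 1 / distance**power. IDW is an exact interpolator,
    so a query point coinciding with a sample returns that sample's value.
    """
    d = np.linalg.norm(xy_known - xy_query, axis=1)
    if np.any(d == 0):                      # exact at sample locations
        return z_known[np.argmin(d)]
    w = 1.0 / d**power
    return np.sum(w * z_known) / np.sum(w)

# Hypothetical sample points (x, y) and z-values
pts = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
z = np.array([100.0, 105.0, 98.0, 110.0])

print(idw(pts, z, np.array([3.0, 4.0]), power=2.0))
```

Increasing the power gives more influence to the closest samples; comparing the root-mean-square prediction error for different powers is one way to pick it.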
Global Polynomial Interpolation
Local Polynomial Interpolation
Spline

Exact interpolator
Regularized: very smooth
Tension: stiffness factor
Weight
Number of points / radius
Kriging (1)
Kriging (2)
The kriging prediction at location s0 is a weighted sum of the measured values:
Ẑ(s0) = Σ λi · Z(si)   (sum over i = 1 … N)
where
Z(si) = measured value at the i-th location
λi = unknown weight assigned to the value at the i-th location
s0 = prediction location
N = number of measured values
Kriging (3)

Semivariogram:
γ(h) = 0.5 × average[ (value at location i − value at location j)² ]
taken over all pairs of locations separated by distance h
Pairing of One Point with Others
Semi-variogram
Understanding a Semi-variogram
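To make the formula above concrete, here is a minimal sketch (plain Python with NumPy; the sample coordinates and values are made up) that bins all point pairs by separation distance and applies the 0.5 × average-squared-difference rule within each bin.

```python
import numpy as np

def empirical_semivariogram(xy, z, bin_width=5.0, n_bins=6):
    """gamma(h) = 0.5 * average of (z_i - z_j)^2 over pairs whose
    separation distance falls in the distance bin around h."""
    n = len(z)
    dists, sqdiffs = [], []
    for i in range(n):
        for j in range(i + 1, n):
            dists.append(np.linalg.norm(xy[i] - xy[j]))
            sqdiffs.append((z[i] - z[j]) ** 2)
    dists, sqdiffs = np.array(dists), np.array(sqdiffs)

    gammas = []
    for k in range(n_bins):
        lo, hi = k * bin_width, (k + 1) * bin_width
        mask = (dists >= lo) & (dists < hi)
        if mask.any():
            gammas.append((lo + bin_width / 2, 0.5 * sqdiffs[mask].mean()))
    return gammas

# Hypothetical sample points and values (a trend plus noise)
rng = np.random.default_rng(0)
xy = rng.uniform(0, 30, size=(25, 2))
z = xy[:, 0] * 0.5 + rng.normal(0, 1, 25)
for h, g in empirical_semivariogram(xy, z):
    print(f"h ≈ {h:.1f}  gamma ≈ {g:.2f}")
```

Spatially dependent data typically show small semivariance at short distances that rises and levels off as distance increases.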
Spatial Interpolation

Interpolate from contours

estimate values based on the shortest
straight line that joins two contours
Spatial Statistics







Starts with simple descriptive statistics
Measures of distribution
Comparison of local versus global
Outliers
Trends
Autocorrelation and directional variation
Covariance
Describing Spatial Data

Measures of location

Measures of spread

Measures of shape
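For point data, common spatial analogues of these measures are the mean center (location) and the standard distance (spread). A short sketch in plain Python with NumPy, using made-up coordinates:

```python
import numpy as np

# Hypothetical point coordinates (x, y)
xy = np.array([[2.0, 3.0], [4.0, 7.0], [5.0, 5.0], [9.0, 6.0], [6.0, 2.0]])

mean_center = xy.mean(axis=0)                      # measure of location
# Standard distance: root mean squared distance of points from the mean center
standard_distance = np.sqrt(((xy - mean_center) ** 2).sum(axis=1).mean())

print("mean center:", mean_center)
print("standard distance:", standard_distance)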
Mapping and Removing
Trends
Spatial Data Quality
aka Error and Uncertainty
Dimensions of
Geographical Data Quality



Geographical ≠ spatial
Recall Lecture 02: g(x, y, z) = (t, a1, …, an)
Matrix of geographical dimensions and
quality components
Accuracy I

Accuracy is the inverse of error


Many people equate accuracy with quality, but in fact accuracy is just one component of quality.
The definition of accuracy is based on the entity-attribute-value model:



Entities = real-world phenomena
Attribute = relevant property
Values = Quantitative/qualitative measurements
Accuracy II

An error is a discrepancy between the encoded
and actual value of a particular attribute for a
given entity. “Actual value” implies the
existence of an objective, observable reality.
However, reality may be:



Unobservable (e.g., historical data)
Impractical to observe (e.g., too costly)
Perceived rather than real (e.g., subjective
entities such as “neighborhoods")
Accuracy III

We do not need an objective reality in order to assess
accuracy, since all geographical data are collected with
the aid of a model that specifies — implicitly or
explicitly — the required level of abstraction and
generalisation.




This is the database “specification” and is closely related to
the “terrain nominal” concept of perceived reality.
The specification serves as the standard against which
accuracy is assessed. Thus the “actual” value is the value we
would expect based on the specification.
Accuracy is always a relative measure, since it is always
measured relative to the specification.
To judge fitness-for-use, one must judge the data relative to
the specification, and also consider the limitations of the
specification itself.
Spatial Accuracy I


Spatial accuracy is the accuracy of the spatial
component of the database. The metrics used
depend on the dimensionality of the entities
under consideration.
For points, accuracy is defined in terms of the
distance between the encoded location and
“actual” location.


Error can be defined in various dimensions:
x, y, z, horizontal, vertical, total.
Metrics of error are extensions of classical
statistical measures (mean error, RMSE or root
mean squared error, inference tests, confidence
limits, etc.).
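For point data, a minimal sketch (plain Python with NumPy; the coordinate pairs are made up) of the usual horizontal metrics: the mean error per axis, which reveals systematic bias, and the horizontal RMSE of the 2-D error vector.

```python
import numpy as np

# Hypothetical encoded vs. "actual" (check-survey) coordinates for 5 points
encoded = np.array([[1000.2, 500.1], [1010.0, 520.4], [ 995.7, 480.0],
                    [1020.3, 510.9], [1005.1, 495.5]])
actual  = np.array([[1000.0, 500.0], [1010.5, 520.0], [ 996.0, 480.3],
                    [1020.0, 511.0], [1005.0, 495.0]])

err = encoded - actual
mean_error_xy = err.mean(axis=0)                          # systematic bias in x and y
rmse_horizontal = np.sqrt((err ** 2).sum(axis=1).mean())  # RMSE of the horizontal error

print("mean error (x, y):", mean_error_xy)
print("horizontal RMSE:", rmse_horizontal)
```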
Spatial Accuracy II
Spatial Accuracy III

For lines and areas, the situation is more
complex. This is because error is a mixture of
positional error (error in locating well-defined
points along the line) and generalisation error
(error in the points selected to represent the
line).


The epsilon band is usually used to define a zone of uncertainty around the encoded line, within which the "actual" line exists with some probability.
However, there is little agreement (and little
empirical work) on the shape of the band, both
planimetrically and in cross-section.
Spatial Accuracy IV
Temporal Accuracy I



Temporal accuracy is the agreement between
the encoded and “actual” temporal
coordinates for an entity.
Temporal coordinates are often only implicit
in geographical data, e.g., a time stamp
indicating that the entity was valid at some
time. Often this is applied to the entire
database (e.g., a map dated “1995”).
More realistically, temporal coordinates are
the temporal limits within which the entity is
valid (e.g., Pothole Q54D-35-021 existed
between 2/12/96 and 8/9/96).
Temporal Accuracy II


Temporal accuracy is not the same as
“database time”, which is the time the
information was entered into the database.
Temporal accuracy is not the same as
“currentness” (or up-to-dateness) which is
actually an assessment of how well the
database specification meets the needs of a
particular application. A database can be temporally accurate but still out of date; historical applications depend on such data.
Thematic Accuracy I


Thematic accuracy is the accuracy of the
attribute values encoded in a database.
The metrics used here depend on the
measurement scale of the data:


Quantitative data (e.g., precipitation) can be treated
like a z-coordinate (elevation) and assessed using
metrics normally used for vertical error (such as the
RMSE).
Qualitative data (e.g., land use/land cover) are normally assessed using a cross-tabulation of encoded and "actual" classes at a sample of locations. This produces a classification error matrix.
Classification Error Matrix
(rows = encoded class, columns = "actual" class)

             Water   Soil   Vegetation   TOTAL
Water          25      2        3          30
Soil            0     38        2          40
Vegetation      1      4       25          30
TOTAL          26     44       30         100
Thematic Accuracy II



Element in row i, column j of the matrix is the
number of sample locations assigned to class
i but actually belonging to class j.
The sum of the main diagonal divided by the
number of samples is a simple measure of
overall accuracy.
An error of omission occurs when a sample is omitted from its actual class. An error of commission occurs when a sample is included in the wrong class. Every error of omission is also an error of commission.
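Using the matrix above, a short sketch (plain Python with NumPy) of how overall accuracy and the per-class omission and commission errors follow from the columns and rows:

```python
import numpy as np

classes = ["Water", "Soil", "Vegetation"]
# Rows = encoded (assigned) class, columns = "actual" class, from the matrix above
m = np.array([[25,  2,  3],
              [ 0, 38,  2],
              [ 1,  4, 25]])

overall = np.trace(m) / m.sum()            # (25 + 38 + 25) / 100 = 0.88
print(f"overall accuracy: {overall:.2f}")

for k, name in enumerate(classes):
    omission   = (m[:, k].sum() - m[k, k]) / m[:, k].sum()   # actual class k, encoded as something else
    commission = (m[k, :].sum() - m[k, k]) / m[k, :].sum()   # encoded as k, actually something else
    print(f"{name}: omission error {omission:.2f}, commission error {commission:.2f}")
```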
Resolution (Precision)

Resolution refers to the amount of detail that can be
discerned in space, time or theme.


Resolution is an aspect of the database specification


Resolution is always finite
High resolution is not always better
Resolution is linked with accuracy, since the level of
resolution affects the database specification against
which accuracy is assessed.

Two databases with the same overall accuracy levels but
different levels of resolution do not have the same quality; the
database with the lower resolution has less demanding
accuracy requirements.
Spatial Resolution I

Spatial resolution is well-defined in the context of raster data, where it refers to the linear dimension of a cell
Spatial Resolution II


For vector data resolution might be defined as the
minimum mapping unit size. Sometimes mean polygon
size is used instead, but this is erroneous since smaller
polygons may be observable but just not present on the
map
Resolution is distinct from the spatial sampling rate,
although the two are often confused with each other


Sampling rate refers to the distance between samples, while
resolution refers to the size of the sample units
Often resolution and sampling are closely matched, but they do
not necessarily need to be. When the sampling rate is higher
than the resolution, sample units overlap; when the sampling
rate is lower than the resolution, there are gaps between
sample units
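A tiny sketch of the distinction (plain Python, made-up numbers): comparing the linear size of a sample unit (resolution) with the spacing between samples (the inverse of the sampling rate) shows whether units overlap or leave gaps.

```python
def compare(resolution_m, sample_spacing_m):
    """resolution_m = linear size of each sample unit;
    sample_spacing_m = distance between sample-unit centers."""
    if sample_spacing_m < resolution_m:
        return "sample units overlap"
    if sample_spacing_m > resolution_m:
        return f"gaps of {sample_spacing_m - resolution_m} m between sample units"
    return "sample units exactly tile the area"

print(compare(30, 30))   # resolution and sampling closely matched
print(compare(30, 50))   # sampling rate lower than resolution -> gaps
print(compare(30, 20))   # sampling rate higher than resolution -> overlap
```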
Temporal Resolution I

Temporal resolution is the length (temporal duration) of the sampling interval


E.g., the shorter the shutter speed of a camera, the
higher the temporal resolution (other factors being
equal)
Temporal resolution affects the minimum duration of
an event that is discernible. If the duration is less
than the resolution, the event is invisible or at best
leaves a smudge
Temporal Resolution II

Temporal resolution is distinct from temporal
sampling rate


Resolution is the length of the sampling interval,
while sampling rate is the frequency of sampling
over time (e.g., once a day, once a week, etc.).
For example, a motion picture camera might have a temporal resolution of 1/1000 second (i.e., the shutter speed used to capture a single frame) and a sampling rate of 24 frames per second
Thematic Resolution

Thematic resolution refers to the precision of
the measurements or categories for a particular
theme


For categorical data, resolution is the fineness of
category definitions (e.g., “urban” vs. “residential”
and “commercial”)
For quantitative data, thematic resolution is
analogous to spatial resolution in the z-dimension
(i.e., the degree to which small differences in the
quantitative attribute can be discerned)
Consistency I


Consistency refers to the absence of apparent
contradictions in a database. Consistency is a measure of
the internal validity of a database.
Consistency can be defined with reference to the three
dimensions of geographical data



Spatial consistency includes topological consistency
e.g., all one-dimensional objects must intersect at a zero-dimensional object
Temporal consistency is related to temporal topology, e.g., the
constraint that only one event can occur at a given location at a
given time
Thematic consistency refers to a lack of contradictions in redundant
thematic attributes. For example, attribute values for population,
area, and population density must agree for all entities
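As a small illustration of such a thematic consistency check, here is a sketch in plain Python (the records and tolerance are hypothetical) that flags entities whose stored density contradicts the density implied by population and area:

```python
# Hypothetical records with redundant attributes: population, area (km^2), density
records = [
    {"name": "Tract A", "population": 12000, "area_km2": 4.0, "density": 3000.0},
    {"name": "Tract B", "population":  8000, "area_km2": 2.5, "density": 5000.0},  # contradictory
]

for r in records:
    implied = r["population"] / r["area_km2"]
    # Flag a contradiction if the stored density disagrees with the implied one (1% tolerance)
    if abs(implied - r["density"]) > 0.01 * r["density"]:
        print(f"{r['name']}: inconsistent (stored {r['density']}, implied {implied:.1f})")
    else:
        print(f"{r['name']}: consistent")
```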
Consistency II

Attribute redundancy is one way in which
consistency can be assessed
e.g., an entity might have the value “Montgomery” for the attribute
“County” but the value “Honolulu” for the attribute “Town”. This is
inconsistent since there is no Honolulu town in Montgomery County.
City         State
Annapolis    Iowa
...          ...
Here, redundancy is partial, since the state Iowa eliminates the
possibility of the city Annapolis, but the city Annapolis does not
necessarily imply the state Maryland, since Maryland is one of seven (!)
states containing a city of Annapolis
The identification of an inconsistency does not necessarily imply that it
can be corrected
The absence of inconsistencies does not necessarily imply that the data
are accurate
Completeness I


Completeness refers to a lack of errors of omission in
a database. It is assessed relative to the database
specification, which defines the desired degree of
generalisation and abstraction (selective omission)
There are two kinds of completeness


“Data completeness” is a measurable error of omission
observed between the database and the specification. Even
highly generalised databases can be “data complete” if they
contain all of the objects described in the specification
“Model completeness” refers to the agreement between the
database specification and the “abstract universe” that is
required for a particular database application. A database is
“model complete” if its specification is appropriate for a
given application.
Completeness II

Incompleteness can be measured in space, time or theme. Consider a database of buildings in PG County that have been placed on the National Register of Historic Places as of the end of 1995
Spatial incompleteness: the list contains only buildings in Hyattsville
Temporal incompleteness: the list contains only buildings placed on the Register by June 30, 1995
Thematic incompleteness: the list contains only residential buildings
Completeness III

Errors of commission can also be assessed.
These errors can lead to “over-completeness”

Errors of commission in space, time and theme for
the previous example:



The list also contains buildings in Silver Spring
The list contains buildings added to the list in 1996
The list contains historic districts as well as buildings
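A minimal sketch of a data-completeness check in plain Python (the building identifiers are hypothetical): the database is compared against the set of objects the specification says it should contain, yielding errors of omission (incompleteness) and errors of commission (over-completeness).

```python
# Hypothetical specification: building IDs the database should contain
specified = {"B001", "B002", "B003", "B004", "B005"}
# Hypothetical database contents
database  = {"B001", "B002", "B004", "B006"}

omissions   = specified - database      # errors of omission (incompleteness)
commissions = database - specified      # errors of commission ("over-completeness")
data_completeness = len(specified & database) / len(specified)

print("omitted:", sorted(omissions))
print("committed:", sorted(commissions))
print(f"data completeness: {data_completeness:.0%}")
```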
Summary of Important
Points


Data quality is the degree of excellence in a
database. Quality is assessed relative to the database
specification, which defines the desired level of
generalisation and abstraction. The quality of this
specification, and its appropriateness for particular
applications, can also be assessed
Quality assessment and reporting is based on
minimum quality standards (compliance testing or
quality control), metadata standards (truth-in-labelling and fitness-for-use), or market standards
(feedback from users).
Summary of Important
Points


Data quality contains several components,
including accuracy, precision, consistency and
completeness. Each component can be assessed in
space, time and theme (the three basic dimensions
of geographical data).
Various assessment methods can be used for each
component/dimension combination. Some methods
are well-developed and others are not
Managing Uncertainty
in GIS
Why Bother?




Many jurisdictions now require mandatory data
quality reports when transferring data
Individual and agency reputations need to be
protected, particularly when geographic information
is used to support administrative decisions subject to
appeal
Safeguard against possible litigation by those who
allege to have suffered harm through the use of
products that were of insufficient quality to meet
their needs
The basic scientific requirement of being able to
describe how close their information is to the truth it
represents
How Was It Done Before GIS?

Traditional hardcopy maps contain valuable
forms of accuracy statements such as
reliability diagrams and estimates of
positional error

Although these descriptors were imperfect, they at
least represented an attempt by map makers to
convey product limitations to map users,
however


this approach assumed knowledge on the part of users
as to how far the maps could be trusted, and
new users of this information are often unaware of the
potential traps that can lie in misuse of their data and
the associated technology
Managing Uncertainty I


The lack of accuracy estimates for digital data has
the potential to harm reputations of both individuals
and agencies - particularly where administrative
decisions are subject to judicial review
The era of consumer protection also has an impact
upon the issue

While we would not think of purchasing a microwave oven
or video recorder without an instruction booklet and a
warranty against defects, it is still common for organisations
to spend thousands of dollars purchasing geographic data
without receiving any quality documentation
Managing Uncertainty II


Finally, if the collection, manipulation, analysis and
presentation of geographic information is to be
recognized as a valid field of scientific endeavour,
then it is inappropriate that GIS users remain unable
to describe how close their information is to the truth
it represents
The obligation to resolve the issues associated with
uncertainty rests equally with data producers,
software and hardware vendors, system integrators
and end-users alike
Strategies for Managing
Uncertainties in GIS

Core components:





developing formal, rigorous models of uncertainty
understanding how uncertainty propagates through
spatial processing and decision making
communicating uncertainty to different levels of users
in more meaningful ways
designing techniques to assess the fitness for use of
geographic information and reducing uncertainty to
manageable levels for any given application
learning how to make decisions when uncertainty is
present in geographic information, i.e. being able to
absorb uncertainty and cope with it in our everyday
lives
Strategies for Managing
Uncertainties in GIS I

In applying the strategy, consideration is
initially given to:


the type of application
the nature of the decision to be made



low risk versus high risk
non-controversial versus controversial
non-political versus political

the degree to which system outputs are utilised within
the decision making process
Strategies for Managing
Uncertainties in GIS II



Ideally, this prior knowledge permits an assessment of the final product quality specifications to be made before a project is undertaken; however, this may have to be decided later, when the level of uncertainty becomes known
Data, software, hardware and spatial processes are
combined to provide the necessary information products
Assuming that uncertainty in a product is able to be
detected and modelled, the next consideration is how
the various uncertainties may best be communicated to
the user
Strategies for Managing
Uncertainties in GIS III

Finally, the user must decide what product
quality is acceptable for the application and
whether the uncertainty present is appropriate
for the given task.
There are two choices available here:


either reject the product as unsuitable and select
uncertainty reduction techniques to create a more
accurate product, or
absorb (accept) the uncertainty present and use the
product for its intended purpose