Spatial data quality

February 10, 2006
Geog 458: Map Sources and Errors
Outline
• Why is spatial data quality an issue?
• Taxonomy of uncertainty
• Anatomy of error
• Definition of quality
• Assessing data quality
• Levels of testing
• Components of spatial data quality
Why is spatial data quality an issue?
Examples
• Buying a land parcel
– Discrepancy between the parcel area measured
from a land parcel map and the area measured
with GPS receivers
• Land use change research
– Classification accuracy of remotely sensed
images & its effect on policy-making
Why is spatial data quality an issue?
• Spatial data is used in many decisions and analyses.
• Increasing availability, exchange and use of spatial data
– Good news: more awareness of importance of geographic
information
– Bad news: poor-quality data is increasingly available
• A growing number of users are less aware of spatial data quality
• Gap between data producers and data users
• GIS do not usually provide functionality for analyzing
error propagation
• Spatial data quality has implications for decision-making
– People do not fully appreciate the consequences of poor-quality
data whereas they are easily convinced by pretty maps
• To resolve issues illustrated in the
examples,
– Uncertainty should be defined precisely
• Taxonomy of uncertainty
• How to quantify error?
– The effect of input data on analysis should be
demonstrated
• How to validate error?
Taxonomy of uncertainty
• Real world to spatial data
• The discrepancy between real world and spatial
data is unavoidable;
• The transformation (from real world to data) can
be partitioned into two steps, and errors are
inherent in each step
– Interpretation (or conceptualization)
• Vagueness: arises due to poor definition
• Ambiguity: arises due to disagreement
– Measurement
• Error: discrepancy between observed value and true value;
can be measured only when there is a clear definition of what
constitutes the truth
Anatomy of error
• Error is the discrepancy between observed
value and true value
[Figure: decomposition of total error. Labels: measurement (test data set), mean of measurement, true value (reference data set), systematic error (= bias), random error, total error]
Accuracy is calculated from total error; closeness of an observation to a true value
Precision is calculated from random error (see Figure 6.11 at p.140)
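The decomposition above can be illustrated with a small numeric sketch. It assumes repeated measurements of a single quantity whose true value is known from a reference data set; the function name and the sample values are made up for illustration and are not taken from the textbook figure.

```python
import math
import statistics

def error_components(measurements, true_value):
    """Split the error of repeated measurements into systematic and random parts."""
    mean = statistics.fmean(measurements)
    bias = mean - true_value                       # systematic error
    precision = statistics.pstdev(measurements)    # spread of random error around the mean
    total_errors = [m - true_value for m in measurements]
    rmse = math.sqrt(sum(e * e for e in total_errors) / len(total_errors))
    return {"mean": mean, "bias": bias, "precision": precision, "rmse": rmse}

# Hypothetical repeated measurements of a distance whose true value is 100.0
result = error_components([102.0, 104.0, 103.0, 105.0, 101.0], true_value=100.0)
print(result)
```

In this sketch the squared RMSE (total error) equals the squared bias plus the squared precision, which is one way to see that accuracy depends on both systematic and random error while precision depends on random error only.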
Anatomy of error
• Precision
– Statistical: variability among repeated measurements
– Storage: amount of detail that can be discerned
• Resolution
– Detail in which the data is presented
– Minimum distance which can be recorded
• Scale
– Map scale: ratio of map distance to ground distance
• Discuss the relation of these terms to accuracy
Anatomy of error
• Statistical precision (deviation from sample
mean) is called relative accuracy
• Storage precision sets a lower bound on
resolution
• Resolution sets a lower bound on accuracy
• Geographic scale and resolution are separate
especially in digital maps even though they are
historically related
– Large scale hardcopy map has high resolution
– Small scale hardcopy map has low resolution
Definition of quality
• See the reading C4 (Chrisman 1984)
– Degree of excellence
– Meeting an expectation
– Fitness for use
– Conformance to a standard
• “Totality of characteristics of a product that
bear on its ability to satisfy stated and
implied needs” (ISO 19113)
Who assesses data quality?
• Minimum quality standards
– Data product should pass quality standard
– Data producers are responsible for assessing data quality
– May be too inflexible
• Metadata standards
– Do not impose a quality standard, on the premise that errors are
inherent
– Data producers simply provide documentation (i.e. truth in
labeling)
– Data users are responsible for determining fitness-for-use of the
data
• Market standards
– Uses a two-way information flow between data producers and
data users
Data may not be perfectly accurate, but it can still be useful for particular
applications and purposes. In other words, it has a quality (fitness for use). To
determine data quality, errors should be properly documented (→ data quality)
Levels of error assessment
• Lineage report (descriptive)
– Most primitive level of error assessment, but essential to understanding the
characteristics of data
– Provides data source & processing steps
– Algorithm used for mathematical transformation, geographic scale of source data,
currentness of data
• Deductive estimates (descriptive)
– Guess (extrapolation) based on sample testing; calibration test
– Should be as numeric as possible
• Internal evidence (quantitative measure)
– Provides the result of error propagation analysis;
– What is the impact of parameters in input data on output product or analysis
results?
• e.g. modifiable areal unit problem
• External source (quantitative measure)
– Applies if a “true value” is believed to exist
– Reports the discrepancy between observed values and true values
Read C4 p. 45 (Levels of Testing) or C3 (SDTS data quality section)
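As an illustration of what an error propagation analysis might look like, the sketch below perturbs the vertices of a parcel with random positional error and observes the effect on the computed area. It is a Monte Carlo sketch under stated assumptions, not a method prescribed by the readings: the parcel coordinates, the 1-unit positional standard deviation, the independence of errors between vertices, and the function names are all hypothetical.

```python
import random
import statistics

def shoelace_area(vertices):
    """Planar polygon area from a list of (x, y) vertices (shoelace formula)."""
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0

def simulate_area(vertices, sigma, runs=2000, seed=0):
    """Propagate positional error (std. dev. sigma per coordinate) into area."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    areas = []
    for _ in range(runs):
        noisy = [(x + rng.gauss(0, sigma), y + rng.gauss(0, sigma))
                 for x, y in vertices]
        areas.append(shoelace_area(noisy))
    return statistics.fmean(areas), statistics.stdev(areas)

# Hypothetical 100 x 50 rectangular parcel; its error-free area is 5000
parcel = [(0.0, 0.0), (100.0, 0.0), (100.0, 50.0), (0.0, 50.0)]
mean_area, sd_area = simulate_area(parcel, sigma=1.0)
print(mean_area, sd_area)
```

The spread of the simulated areas gives a rough sense of how much uncertainty in the input coordinates carries through to the derived area, which is the kind of question raised in the land parcel example.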
Matrix of spatial data quality
Columns (components of geographic information): Space, Time, Theme
Rows (components of data quality), with the corresponding CSDGM elements:
• Accuracy: positional accuracy, attribute accuracy
• Consistency: logical consistency
• Completeness: completeness
Components of spatial data quality:
Accuracy
• Accuracy is the inverse of error
• Many people equate accuracy with quality but in
fact accuracy is one component of quality
• An error is a discrepancy between the observed
value and true value
• What if a true value does not exist?
• Relative accuracy may suffice in some cases
– Land parcel area
• Absolute accuracy may be required in some
cases
– Exact geodetic coordinate value
Read SDTS data quality section (reading C3)
Components of spatial data quality:
Accuracy
• Can be divided into spatial, temporal, and
thematic accuracy
• Spatial accuracy and thematic accuracy are
recognized in CSDGM: positional accuracy and
attribute accuracy
• If data is measured on a quantitative scale,
RMSE can be used to report error estimates
• If data is measured on a qualitative scale,
a misclassification matrix can be used to report
error estimates
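Both error reports can be sketched in a few lines. The example assumes a test data set and a reference data set that have already been matched item by item; the elevation values, class labels, and function names are invented for illustration.

```python
import math
from collections import Counter

def rmse(observed, reference):
    """Root mean square error between paired quantitative values."""
    squared = [(o - r) ** 2 for o, r in zip(observed, reference)]
    return math.sqrt(sum(squared) / len(squared))

def misclassification_matrix(observed, reference):
    """Count (reference class, observed class) pairs for qualitative data."""
    return Counter(zip(reference, observed))

def overall_accuracy(matrix):
    """Share of items whose observed class matches the reference class."""
    correct = sum(n for (ref, obs), n in matrix.items() if ref == obs)
    return correct / sum(matrix.values())

# Quantitative example: hypothetical elevations (test vs. reference)
elev_rmse = rmse([10.0, 12.0, 9.0, 15.0], [11.0, 12.0, 7.0, 15.0])

# Qualitative example: hypothetical land cover classes (test vs. reference)
matrix = misclassification_matrix(
    observed=["forest", "urban", "urban", "water", "urban"],
    reference=["forest", "forest", "urban", "water", "urban"],
)
print(elev_rmse, dict(matrix), overall_accuracy(matrix))
```

The off-diagonal entries of the matrix, such as ("forest", "urban") here, show which classes are confused with each other, which a single accuracy figure would hide.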
Components of spatial data quality:
Consistency
• Absence of apparent contradictions in a database
• The fidelity of relationships encoded in the data structure
• Internal validity of a database
• Spatial consistency includes conformance to topological
rules
• Temporal consistency is related to temporal topology
(e.g. only one event can occur at a given location at a given time)
• Thematic consistency refers to a lack of contradictions in
attributes (e.g. density = population / area, so the three
values should agree with one another; e.g. Erie County is
not a part of Washington State)
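A thematic consistency check of the density example could be sketched as follows. The record layout, field names, tolerance, and county values are hypothetical; the sketch only tests whether the stored density agrees with population divided by area.

```python
import math

def inconsistent_density(records, rel_tol=0.01):
    """Return names of records whose stored density contradicts population / area."""
    flagged = []
    for rec in records:
        expected = rec["population"] / rec["area_km2"]
        if not math.isclose(rec["density"], expected, rel_tol=rel_tol):
            flagged.append(rec["name"])
    return flagged

# Hypothetical attribute table; county B stores a density that does not match
counties = [
    {"name": "A", "population": 1000, "area_km2": 50.0, "density": 20.0},
    {"name": "B", "population": 3000, "area_km2": 100.0, "density": 45.0},
]
flagged = inconsistent_density(counties)
print(flagged)
```

A check like this needs no external reference data: it only looks for contradictions inside the database, which is what distinguishes consistency from accuracy.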
Components of spatial data quality:
Completeness
• A lack of errors of omission in a database
• Relationship between the object represented and the
abstract universe of all such objects
• To define the abstract universe, we need specification
(e.g. what constitutes hospital, housing, forest?)
[Figure: relationship between data, specification (spec), and abstract universe]
• Two kinds of completeness
– Data completeness: data relative to specification
– Model completeness: spec. relative to abstract universe
– Even highly generalized data can be data complete if they contain all
of the objects described in the specification
– A data set is model complete if its specification is appropriate for a
given application
Components of spatial data quality:
Completeness
• Within data completeness,
– Errors resulting in overcompleteness are called errors of
commission
– Errors resulting in incompleteness are called errors of omission
• Can be divided into spatial, temporal and thematic
completeness
• For example, consider a database of buildings in Washington State
as of January 2003
– e.g. Spatial incompleteness: data has only one building
– e.g. Temporal incompleteness: data includes only buildings in place by
2000
– e.g. Thematic incompleteness: data has only residential buildings
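Errors of omission and commission within data completeness can be sketched as set differences between the objects in the data and the objects required by the specification. The building identifiers and function name below are hypothetical, and the sketch assumes that objects in the two sets can be matched by a shared identifier.

```python
def completeness_errors(data_ids, spec_ids):
    """Compare objects present in the data with those required by the specification."""
    data_ids, spec_ids = set(data_ids), set(spec_ids)
    omission = spec_ids - data_ids      # required by the spec but missing from the data
    commission = data_ids - spec_ids    # present in the data but not in the spec
    omission_rate = len(omission) / len(spec_ids)
    return omission, commission, omission_rate

# Hypothetical building identifiers
spec_buildings = {"b1", "b2", "b3", "b4"}
data_buildings = {"b1", "b2", "b5"}
omission, commission, omission_rate = completeness_errors(data_buildings, spec_buildings)
print(sorted(omission), sorted(commission), omission_rate)
```

Here "b3" and "b4" are errors of omission and "b5" is an error of commission; whether the specification itself suits the application is a separate question of model completeness.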