Exploring Spatial Confidence with a
Matthew H. Pelkki¹
Abstract.-- One of the main input methods into geographic information
systems remains digitizing spatial data from paper maps. While newer,
automated methods such as global positioning systems promise better
accuracy and quality of data for future GIS, digitizers are likely to be part of
the data collection technology for some time into the future. When digitizing
this data, it is important to remember that a single sample of the representation
of reality is being taken. Information related to the spatial confidence of the
resulting representation often is not quantified well by the digitizing software.
Spatial confidence can be expressed as a percentage or degree of certainty that
a represented object does indeed exist at that location or within a certain
distance of that location. This research found that map registration error
may not be a good indication of the raster resolution required to obtain high
spatial confidence. It was also found that complex lines tend to increase the
amount of error or variance in the resulting digital map.
INTRODUCTION
Obtaining spatial data of high quality is an obvious desire of spatial data users.
Positional accuracy is one important component of data quality (Antenucci et al.,
1991). Error in spatial data is inherent and difficult to remove. While minimizing
error has long been a goal of the spatial sciences (Chrisman, 1991), others (Aronoff,
1989) suggest that since error cannot be eliminated, it should be managed. Certainly,
the first step in managing error is being able to quantify it.
The need to explicitly understand spatial confidence in a GIS is growing as digital
data becomes less specialized and more widespread and mainstream. It is well known
that digital data is perceived to be of higher quality than analog data due to the
appearance of precision and accuracy. As developers of digital data become more and
more removed from the end-users, an explicit understanding of locational confidence is
important to prevent mis-application of data to analyses for which it is not suitable.
Paper maps are currently the most common source for geographic data, and while
¹ Assistant Professor, University of Kentucky, Department of Forestry, Lexington, KY 40546-0073.
automated methods such as global positioning systems promise better accuracy and
quality of data for future GIS, digitizers are likely to be a part of the data collection
technology for some time into the future. Historical data, and much natural resource
data will be available only on paper maps. In these cases, the map is in fact the only
representation of "truth," and any one manual digitization of that map is but one
sample of reality. The map scale is often used as an estimate for the accuracy of the
digitized data (Fisher, 1991) and one sample of reality is often all that is taken due to
time and cost constraints. This limits the information on the quality of the data, as it
assumes that the first sample is truly representative of the "truth."
Ideally, spatial confidence metadata should be attached to a raster GIS that states
something like, "there is a 95% confidence that any object classed within a raster cell
actually exists within that cell or within a radius of X cells from the indicated cell."
From a single sample, even assuming that the analog data represents "truth," this is
difficult to measure. This research involves exploration of absolute locational
accuracy of point and line data that is manually digitized and rasterized to various
resolutions under the assumption that the analog medium being digitized is in fact
"truth."
METHODS
The GIS software chosen for this study was EPPL7 (Environmental Programming
and Planning Language) release 2.1. This is a relatively simple raster-based GIS
created and maintained by the State of Minnesota Land Management Information
Center. It is used as a "desktop" GIS in Minnesota, and has users throughout the
United States and in 15 other countries. Users include universities, several state
natural resource agencies, the National Park Service, and the Fish and Wildlife
Service.
Two separate spatial representations were created digitally. One representation was
composed of ten randomly located points. The second spatial representation was
composed of ten lines, two each having one to five line segments. This was done to
simulate some increasing complexity in lines, but to keep it on a quantifiable scale.
Thus, according to Burrough (1986), more complex lines (those with more vertices)
should have a greater degree of error associated with them. These digital
representations were considered to be "truth," and were printed out on a relatively
stable medium (transparency film) for manual digitizing. The scale of these printed
images was 1:50,000, and the point and line width printed on the map was 1/100th
of an inch, giving an approximate representation width of less than 13 m (0.01 in × 50,000 ≈ 12.7 m).
Both the point and the line images were digitized ten times under two conditions.
The first set of ten samples was collected under a single map registration, while each
sample in the second set was collected under a different map registration. This
was done to see how map registration differences affected positional accuracy. The
registration standard deviation of error was recorded in each case. In the
EPPL7 manual, it is recommended that the standard deviation be less than the desired
resolution for rasterizing the file (LMIC, 1992). All digitizing was done in the center
of a 24 x 36 inch Calcomp 3300 digitizing tablet. The room was climate controlled
to reduce climate-caused distortion of the transparencies.
Once digitized, the data files were rasterized to various resolutions (5, 10, 20, and
40 meters). The digital "truth" file was also rasterized to the same resolutions. Error
in absolute position was determined by overlay, and the error was weighted by the
distance between any one raster cell and the "truth."
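As a rough illustration of this overlay comparison (the paper does not give EPPL7's internal procedure, so the simple cell indexing from an arbitrary origin and all names below are assumptions), the following sketch converts coordinates to raster cell indices and reports the displacement between a digitized point's cell and the "truth" cell, in cells and in map units.

    import math

    def to_cell(x, y, resolution, origin=(0.0, 0.0)):
        """Column/row index of the raster cell containing (x, y)."""
        ox, oy = origin
        return int((x - ox) // resolution), int((y - oy) // resolution)

    def cell_displacement(truth_xy, sample_xy, resolution):
        """Cell offset and distance-weighted error between truth and sample points."""
        tc, tr = to_cell(*truth_xy, resolution)
        sc, sr = to_cell(*sample_xy, resolution)
        dc, dr = sc - tc, sr - tr
        # Error weighted by distance: cell offsets scaled back to map units.
        return (dc, dr), math.hypot(dc, dr) * resolution

    # Hypothetical example at 20 m resolution.
    offset_cells, error_m = cell_displacement((512.0, 488.0), (534.0, 445.0), 20.0)
    print(offset_cells, round(error_m, 1))  # (1, -2) 44.7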
Similar to the study reported by Maffini et al. (1989), this study examines positional
errors. It examines the performance of a single digitizer operator over repeated trials
and does not control for errors introduced in converting the digital "truth" to the
transparency medium, errors in the digitizing tablet or equipment, or the speed of
the operator.
RESULTS
Table 1 shows the results from ten separate digitizing operations on the ten point
locations using the same map registration. The distances for locational error were
calculated by counting the number of raster cells the sample point was displaced from
the "truth" point and multiplying by the resolution to obtain X and Y positional error.
Since under each resolution there were ten points digitized ten times, the five points
with the largest positional errors were discarded, and the positional error that would
contain 95 points is recorded in the leftmost column of table 1. We can see that as
raster resolution gets smaller, the 95% zone of inclusion approaches a value around
20 m.
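A sketch of how such an inclusion value can be computed follows; it assumes the procedure described above (discard the worst five of the 100 sampled errors and report the largest remaining error), and the function name and synthetic error values are illustrative rather than the study's data.

    import random

    def inclusion_95(errors_m):
        """Largest positional error remaining after discarding the worst 5% of samples."""
        n_discard = max(1, round(0.05 * len(errors_m)))
        kept = sorted(errors_m)[:-n_discard]
        return kept[-1]

    # Hypothetical: 100 positional errors (10 points x 10 digitizations), in metres.
    random.seed(1)
    errors = [abs(random.gauss(0.0, 8.0)) for _ in range(100)]
    print(round(inclusion_95(errors), 1))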
Point data digitized under different map registrations is shown in table 2. The
rasterized data for 40 m resolution were omitted, but a trend similar to table 1 appears
to occur. As raster resolution gets smaller, a more accurate measurement of absolute
position error is possible, and the maximum value which includes 95% of all points
approaches some value.

Table 1. Positional error of point data digitized under the same registration and rasterized from
5 m to 40 m resolution. Columns: raster cell resolution; mean distance positional error;
minimum positional error; maximum positional error; distance within which 95/100 cells lie.
1 - Std. dev. of registration = 3.16 m.

Table 2. Positional error of point data digitized under different registrations and rasterized
from 5 m to 30 m resolution. Columns as in table 1. At 30 m resolution: mean distance
positional error 20.8 m; minimum 0 m; maximum 42.4 m; 95/100 cells lie within 42.4 m.
1 - Std. dev. of registrations ranged from 3.69 m to 13.17 m and averaged 6.93 m.
Determining positional error for line data is a bit more complex. Error for lines
includes overshoots and undershoots, as well as locational deviation for portions of
the line. A line error index was calculated for each digitized line as it was compared
to the "truth" line. The error index weighted errors by the magnitude of the deviation
from the "truth" line. The formula used is as follows:
LEI = ( Σ d_i ) / ( N × r ), with the sum taken over the N' cells of the digitized line

where:
LEI = location error index
N = number of raster cells in the "truth" line
N' = number of raster cells in the digitized line
r = raster cell resolution
d_i = distance between the "truth" line and the digitized line for cell i
This index is independent of line distance, but it is not independent of resolution, since
with larger raster cell sizes, small deviations between the "truth" line and the digitized
line will be impossible to detect. Therefore, the error index should decrease with
raster cell size but increase with line complexity. An LEI = 0.50 means that, for any
given line length, 50% of that line lies in an incorrect cell location weighted by the
magnitude of the error.
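A minimal sketch of this calculation is given below, assuming the form of the index shown above (per-cell deviations of the digitized line summed and divided by the "truth" line length, N × r); the cell lists, resolution, and values are hypothetical.

    def location_error_index(truth_cells, digitized_cells, resolution):
        """Location error index: per-cell deviations of the digitized line, summed
        and normalized by the 'truth' line length (N cells x resolution).
        Both inputs are sequences of (col, row) raster cell indices."""
        def nearest_distance(cell, line):
            # Distance, in map units, from one cell to the nearest cell of a line.
            c, r = cell
            return min(((c - lc) ** 2 + (r - lr) ** 2) ** 0.5 for lc, lr in line) * resolution

        total_deviation = sum(nearest_distance(cell, truth_cells) for cell in digitized_cells)
        return total_deviation / (len(truth_cells) * resolution)

    # Hypothetical example at 10 m resolution: a digitized line that wanders one
    # cell off the truth line for two of its six cells.
    truth = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0)]
    digitized = [(0, 0), (1, 1), (2, 1), (3, 0), (4, 0), (5, 0)]
    print(round(location_error_index(truth, digitized, 10.0), 2))  # 0.33

At a coarser resolution the same one-cell wander becomes undetectable, which is the resolution dependence noted above.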
Tables 3 and 4 show the LEI values for lines rasterized to various resolutions under
the same and multiple map registrations, respectively. The tables also show the
maximum deviation in number of cells for the ten sample lines from the "truth" line.

Table 3. Line data digitized under the same registration and rasterized from 5 m to 40 m
resolution (number of cells of maximum deviation in parentheses). Rows give the raster cell
resolution; columns give the number of line segments per line (1 through 5).

Table 4. Line data digitized under different registrations and rasterized from 5 m to 40 m
resolution (number of cells of maximum deviation in parentheses). Rows give the raster cell
resolution; columns give the number of line segments per line (1 through 5).

From the tables, two relationships appear. The first is that as raster cell size increases,
the LEI decreases, as does the maximum number of cells of deviation for the sample
lines. The second relationship is that more complex lines have higher LEI values and
maximum cell deviation values, indicating that complex lines introduce more error
than simple lines. Multiple linear regressions indicated that both of these relationships
are significant at the α = 0.05 level of significance. It is also interesting to note that
incorporating the map registration error into the regression model did not significantly
improve the prediction of error (α = 0.10).
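A regression of this kind could be set up roughly as in the sketch below; it uses the statsmodels formula interface with hypothetical column names and illustrative values, and is not the model actually fitted in this study.

    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical observations: one row per digitized line, recording the raster
    # cell size, the number of line segments, and the resulting LEI.
    data = pd.DataFrame({
        "cell_size":  [5, 5, 10, 10, 20, 20, 40, 40],
        "n_segments": [1, 5, 1, 5, 1, 5, 1, 5],
        "lei":        [0.42, 0.55, 0.30, 0.41, 0.18, 0.29, 0.09, 0.17],
    })

    # Multiple linear regression of LEI on cell size and line complexity.
    model = smf.ols("lei ~ cell_size + n_segments", data=data).fit()
    print(model.summary())   # coefficient table with p-values
    print(model.pvalues)     # compare each p-value with alpha = 0.05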
CONCLUSIONS
In the EPPL7 GIS, the summed standard deviation reported during map registration
appears to be a poor indicator of the locational accuracy of the resulting map. Given the
map scale, the point and line representation equated to just under 13 m, and so
perhaps a better estimate of adequate raster cell size would be the 13 m plus two
standard deviations of registration error (13 m + 2 × 3.16 m ≈ 19.3 m). This would
suggest a cell size of 20 meters.
It appeared that multiple map registrations increased error, but the tests were
uncontrolled and so no comments about the significance of any apparent numerical
differences can be made.
In line data, error is strongly correlated to line complexity. This makes good
intuitive sense, as the more complex the line, the greater the number of vertices
required to represent it with reasonable accuracy. The correlation between line
error and target raster cell size is an indication that larger raster cells do a poor job
of calculating distance, and as the raster cell size increases, differences between the
digitized line and the "truth" line that are less than the raster cell size are rounded to
zero. However, for maps covering large areas, small raster cell size requires a great
deal of processing time and storage.
This preliminary work has identified the need to incorporate various map scales into
the procedure, particularly larger scale maps that will better allow the testing of
registration error on overall locational error. Testing for multiple digitizing personnel,
and various levels of operator experience might also prove interesting.
ACKNOWLEDGMENTS
Special thanks go to Linda Delay, a student worker who performed the digitizing
and ran the computer macros to determine the error components. Without her time
and effort on data collection, this project would still be in the conceptual stage.
REFERENCES
Antenucci, J. C., K. Brown, P. L. Croswell, M. J. Kevany, and H. Archer. 1991.
Geographical information systems: A guide to the technology. Van Nostrand
Reinhold, New York, NY. 301 p.
Aronoff, S. 1989. Geographic Information Systems: A management perspective.
WDL Publications, Ottawa, Canada. 294 p.
Burrough, P. A. 1986. Principles of geographical information systems for land
resources assessment. Oxford University Press, New York, NY. 194 p.
Chrisman, N. R. 1991. The error component in spatial data. In Geographical
Information Systems, Volume 1: Principles, D. J. Maguire, M. F. Goodchild, and
D. W. Rhind, editors. John Wiley and Sons, New York. Pp. 165-174.
Fisher, P. F. 1991. Spatial data sources and data problems. In Geographical
Information Systems, Volume 1: Principles, D. J. Maguire, M. F. Goodchild, and
D. W. Rhind, editors. John Wiley and Sons, New York. Pp. 175-189.
LMIC. 1992. EPPL7 User's Guide, Release 2.0, Tutorial Chapter, page 110. Land
Management Information Center, St. Paul, MN.
Maffini, G., M. Arno, and W. Bitterlich. 1989. Observations and comments on the
generation and treatment of error in digital GIs data. In Accuracy of spatial
databases, M. Goodchild and S. Gopal, editors. Taylor and Francis, Bristol, PA.
Pp. 55-68.
BIOGRAPHICAL SKETCH
Matthew H. Pelkki is an Assistant Professor of Forest Management and Economics
at The University of Kentucky Department of Forestry. He graduated with a B.S.F.
in 1985 from the University of Michigan's School of Natural Resources &
Environment, and then earned an M.S. (1988) and Ph.D. (1992) from the University
of Minnesota's College of Natural Resources. Matthew has been an assistant
professor at the University of Kentucky since 1991, where he teaches courses in timber
management, integrated forest resource management, and applications of GIS in
natural resources. His research interests include dynamic programming and stand-level
optimization, natural resource information system planning and design, and the
economics of bioremediation/ecological restoration.