This file was created by scanning the printed publication.
Errors identified by the software have been corrected;
however, some errors may remain.
The Development of a GIS
Information Quality Module
Jane Drummond 1, Allan Brown 2, Du Daosheng 3, Corné van
Elzakker 4
Abstract.- In the early 1980s the authors first considered 1) the
transfer of traditional mapmakers' expertise regarding spatial
information quality into GIS and 2) ways of determining the quality of
GIS generated information. Taking into account the now accepted
principles that attribute, position, age, and topology quality and
lineage all contribute to the quality of GIS generated information, the
framework for our work was that as a GIS generated new information an
associated uncertainty subsystem should in parallel generate
information on the quality of that information, and that the two sets of
information should be displayed simultaneously to improve the GIS
user's decision making. Now an extant (though incomplete) entity, the
subsystem consists of appropriate means to determine and store the
quality of spatial data and of the processing model used to generate
information, means for determining the generated information quality
(variance propagation, fuzzy sub-set theory, etc.) and means for
visualizing that quality. These are integrated in an Uncertainty
Subsystem called PUIUS. The subsystem has been implemented in
ILWIS. This paper will review the subsystem (and another),
summarize some applications and point the way to further work.
Maintaining high standards has long concerned land surveyors,
photogrammetrists and cartographers, but the Geographic Information System
(GIS) community seemed only alerted to the problem of information quality in
the early 1980s. By the end of the 1980s those in GIS had accepted the
principles that quality of attribute and position, information age, topological
consistency, and lineage all contributed to the quality of any GIS generated
information. In the mid-eighties we first started, formally, to consider
1) the transfer of traditional mapmakers' expertise regarding spatial
information quality into the GIS environment and 2) ways to determine the
quality of GIS generated information.
Taking into account the principles presented in the first paragraph, our
study's framework was that while a GIS generated new information an
associated uncertainty subsystem should in parallel generate information on
the quality of that new information, and that the two sets of information should
(if possible) be displayed simultaneously to improve the GIS user's decision
making. The subsystem itself would consist of appropriate means to determine
(knowledge base, user queries, etc.) and store the quality of spatial data, to
store the quality of the processing model used to generate information, the
means for determining the generated information quality (variance
propagation, fuzzy sub-set theory, etc.) and the means (2D, 3D) for visualizing
that quality. These were integrated in an Uncertainty Subsystem called PUIUS
(Prototype User Interface for an Uncertainty Subsystem). The subsystem has
been implemented in ILWIS, a PC based GIS developed and marketed by the
International Institute for Aerospace Survey and Earth Sciences (ITC) in The
Netherlands. This paper will review this subsystem.
The review will take the form of a summary of projects carried out by the
group which led to the development of PUIUS, and an investigation of a
recent version of another off-the-shelf GIS package which is now marketed
with some information quality processing tools. Thus the five remaining
sections of this paper will deal with i) early ideas; ii) the XGIS project; iii) the
PUIUS project; iv) Idrisi's information quality processing tools; and
v) conclusions.
EARLY IDEAS
In 1983 two of this paper's authors, working at the UK Experimental
Cartography Unit, prepared a very straightforward presentation on digital
spatial data quality for a Visiting Committee. The presentation showed
boundary widths for polygons of various natural features as a function of the
standard deviation (SD) of these boundaries' coordinates, and then the
resulting overlaid polygons. Some visitors expressed surprise at the
demonstrated imprecise nature of the digital spatial data we so expensively
acquired. Although official British science seemed unready to pay for
recognisably imperfect digital spatial data (the two authors concerned
relocated to the Netherlands and China), at about the same time Blakemore
(1983) and Chrisman (1983) were much more successfully convincing
European and North American academics that the imperfections of digital
spatial data represented a respectable research area. The presentation's
polygons represented areas of (assumed) uniform slope, soil class, rainfall and
temperature and were overlaid and used for crop suitability mapping
(ODA, 1983). At the time, and working in the vector environment, it was
cumbersome to determine the precision of such contributing polygons'
boundaries and then produce overlaid polygons with boundary line widths
proportional to (e.g.) their precision. The same problem has now been much
more elegantly addressed in the raster environment of PUIUS (see below).
The user of a GIS is not likely to be interested in the quality of the data per
se, but in the quality of the generated information. In the ODA suitability
mapping project, the input sets of polygons each had a centroid - as had each
polygon resulting from overlay. Standard databasing facilities permit such
centroids to have attributes referring to data quality, and these attributes
can be easily processed. Such was part of the thinking behind much subsequent
work in the group (Drummond, 1987). The group understood a GIS to be both:
(i) a system which used processing models to generate new
information from existing data; and,
(ii) a spatial data archive which allowed one to retrieve and display
stored data.
In the first case the new information will be visualized, but simultaneously
(or by toggling) its quality should be visualized. Error propagation models are
directly related to information generation models, and information generation
models can be mathematical (e.g. computation of aspect; USLE) or logical
(e.g. crop suitability; Community Tax determination). Thus if a mathematical
information processing model gives us X = f(a,b,c) then σX = f(σa,σb,σc),
where σ represents Standard Deviation. Likewise if a logical processing model
gives us Y = f(l,m,n) then pY = f(pl,pm,pn) where p represents a certainty
statistic such as Probability or Certainty Factor (CF). However the processing
model itself may be in error; thus however good the data are, the generated
information may be poor. In the case of a mathematical model its error can be
handled as noise and subsequently processed by σX = f(σa,σb,σc,σn) where n
represents noise. In the case of a logical model, the model itself may have a
certainty statistic associated with it which can be processed by pY =
f(pl,pm,pn,po) with po representing probability or CF of the model holding.
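The mathematical case can be illustrated with a minimal sketch. It assumes independent inputs and a first-order (Taylor series) approximation, neither of which is stated in the text, and it adds the model noise in quadrature. The function name propagate_sd and the example model are hypothetical and are not part of any of the systems described here.

```python
import math

def propagate_sd(f, values, sds, model_noise_sd=0.0, rel_step=1e-6):
    """First-order variance propagation for X = f(*values).

    Assumes the inputs are independent. Partial derivatives are
    estimated by central differences. model_noise_sd is the standard
    deviation attributed to the processing model itself.
    """
    variance = model_noise_sd ** 2
    for i, (v, sd) in enumerate(zip(values, sds)):
        h = rel_step * max(abs(v), 1.0)
        upper = list(values)
        lower = list(values)
        upper[i] = v + h
        lower[i] = v - h
        dfdv = (f(*upper) - f(*lower)) / (2.0 * h)
        variance += (dfdv * sd) ** 2
    return math.sqrt(variance)

# Hypothetical processing model, for illustration only: X = 2a + 3b - c
def model(a, b, c):
    return 2 * a + 3 * b - c

sd_x = propagate_sd(model, [10.0, 5.0, 1.0], [0.5, 0.2, 0.1])
sd_x_noisy = propagate_sd(model, [10.0, 5.0, 1.0], [0.5, 0.2, 0.1],
                          model_noise_sd=1.0)
```

Because the noise term enters the same sum as the data terms, a poor model degrades the result even when the input standard deviations are small.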
In the second case above a GIS might, e.g., be queried to produce a map to
show the position of a utility line through a neighbourhood. The users of the
map might be concerned with the accuracy of depiction of the utility's
location. If the precision of the procedure used to survey in the utility was
recorded in the attribute tables of the GIS, then this information could be
extracted and used to control the width of the utility's mapped line (e.g. 3σ,
allowing >99.9% probability of finding the utility). In this case no processing
model is used to generate new information.
Following our understanding of GIS, a simple GIS using the merged
capabilities of AutoCAD and dBASE (Drummond, 1990) was built. As well as
more usual facilities, this GIS had an interface permitting interrogation of the
user to acquire quality information on each data set. Although not a very
user-friendly interface (when compared to PUIUS, described in a later section) it
permitted the user to provide the quality information of each data item in σ,
probability or natural language terms convertible to CF, as appropriate.
Conversion (for example converting σ to probability using the central limit
theorem when a continuous variable, such as rainfall, was reclassified into a
discontinuous variable such as the rainfall class 300-400 mm a year) was
carried out within the GIS. The acquired information was stored in the
relevant attribute database table and processed using fuzzy sub-set reasoning
to provide CF values for each crop suitability polygon (Drummond, 1987). As
the data concerned (i.e. for the crop suitability project) were all of a
discontinuous type, error propagation using fuzzy logic rather than variance
propagation was appropriate - but variance propagation was supported. Simple
maps showing the quality of generated suitability information, by polygon and
with quality a function of hatching density, were produced.
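The conversion from a standard deviation to a class probability can be sketched as follows. This is an illustrative reading only: it assumes the error in the continuous value is normally distributed about the recorded value, and the function names and sample figures are hypothetical rather than taken from the AutoCAD and dBASE system.

```python
import math

def normal_cdf(x, mean, sd):
    """Cumulative probability of a normal distribution at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def class_probability(value, sd, lower, upper):
    """Probability that the true value lies in the class [lower, upper]."""
    return normal_cdf(upper, value, sd) - normal_cdf(lower, value, sd)

# Hypothetical rainfall observations (mm a year) with an SD of 25 mm,
# reclassified into the 300-400 mm class.
p_mid = class_probability(350.0, 25.0, 300.0, 400.0)   # well inside the class
p_edge = class_probability(300.0, 25.0, 300.0, 400.0)  # on the class boundary
```

A value in the middle of the class is assigned to it with high probability, while a value on the class boundary is close to an even chance, which is the information a later certainty calculation needs.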
FIGURE 1. UNCERTAINTY SUBSYSTEM. The overall concept. [Diagram not recovered from the scan.]
FIGURE 2. UNCERTAINTY SUBSYSTEM. Visualization of Information Quality. [Diagram not recovered from the scan.]
THE XGIS PROJECT
The ITC was willing to partially support three MSc students
(Ramlal, 1991), (Fan, 1992), (Oliveira, 1992) to research data quality matters in
their 'dissertation year' and fund a research associate (Barsoum, 1995). This
was referred to as the XGIS project and represented the development of an
Uncertainty Subsystem - a subsystem of ILWIS (Ramlal et al., 1991). The
overall concept of this subsystem is provided in FIGURE 1.
From FIGURE 1 it can be seen that the Uncertainty Subsystem was
designed with four components. These dealt with the uncertainty of data
(Storing/Determining quality of data), the uncertainty of processing models
(Storing quality of the model), error propagation (Determining of Final
Information Quality) and the visualisation of uncertainty (Visualisation of
Information Quality). The visualisation was tackled first (Ramlal, 1991),
(Ramlal et al., 1992), (Van Elzakker et al., 1992). FIGURE 2 summarises this.
From FIGURE 2 it can be seen that the visual variables texture, size and
value were considered appropriate for depicting data quality. The above
mentioned work by Ramlal, Van Elzakker and Drummond resulted in maps of
a type in which suitability was represented by the visual variable colour (e.g.
green = most suitable to red = least suitable) and data quality by value (e.g.
dark = most certain to light = least certain). Later Brown and Van Elzakker
(Brown et al., 1993) experimented with the use of colour in the representation
of categoric area information quality. They concluded that the simultaneous
representation of attribute and quality is possible (by means of the colour hue
and colour saturation respectively), for both screen display and printer
output, for a limited number of categories and quality levels.
In tackling model quality (Fan, 1992), Fan took on the task of developing
Part B (FIGURE 1) of the Uncertainty Subsystem, addressing (regression)
model building within the GIS environment. By using Linear Least Squares
Regression in model building, information became available on the
performance (quality) of the model. This information was graphically
displayed, allowing the model developer to improve the model. Actual model
development related to toxic cloud release was carried out. Fan established the
concept of a model library in ILWIS where developed and pre-existing models
could be stored along with relevant quality and other information.
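The kind of quality information a least squares fit makes available can be sketched as below. This is a simple one-variable illustration, not Fan's implementation; the function name fit_line, the choice of quality measures and the sample observations are all assumptions made for the example.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return {
        "slope": slope,
        "intercept": intercept,
        # Share of the variation in y explained by the model.
        "r_squared": 1.0 - ss_res / ss_tot,
        # Residual SD (n - 2 degrees of freedom): a candidate value for
        # the model noise stored with the model in a model library.
        "residual_sd": math.sqrt(ss_res / (n - 2)),
    }

# Made-up observations, for illustration only.
fit = fit_line([1.0, 2.0, 3.0, 4.0, 5.0], [2.1, 3.9, 6.2, 7.8, 10.1])
```

Storing the residual standard deviation alongside the fitted coefficients is one way a model's own quality could later be fed into error propagation as a noise term.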
De Oliveira addressed data quality (Oliveira, 1992) (Storing/Determining
Data Quality in FIGURE 1) and produced an appropriate module for the
subsystem. The module is capable of automatically generating data quality
parameters from related 'Ground Truth' data and user-supplied information.
The algorithms, in the main, concerned positional and attribute accuracy
assessment, working at two levels. The first level referred to overall accuracy
parameters at the data set level (e.g. theme) and the second represented
accuracy parameters associated with individual database objects. It was a
requirement of De Oliveira's future employer (University of Aveiro) that
concepts developed and implemented for ILWIS could also be implemented in
Arc/Info to support an Emergency Response GIS. Error propagation for the
particular information generation (or processing) models used (relating to
chemical accidents in the form of toxic cloud releases) was developed.
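The two levels of accuracy parameter can be illustrated with a small sketch. It assumes that positional accuracy is summarised as a root mean square error against ground truth coordinates and attribute accuracy as a proportion of agreement; the text does not say which statistics De Oliveira's module used, and all names and sample values below are hypothetical.

```python
import math

def point_errors(mapped, truth):
    """Planimetric distance between each mapped point and its ground truth."""
    return [math.hypot(mx - tx, my - ty)
            for (mx, my), (tx, ty) in zip(mapped, truth)]

def rmse(errors):
    """Root mean square of a list of errors."""
    return math.sqrt(sum(e ** 2 for e in errors) / len(errors))

def attribute_accuracy(mapped_classes, truth_classes):
    """Proportion of objects whose class agrees with the ground truth."""
    agree = sum(m == t for m, t in zip(mapped_classes, truth_classes))
    return agree / len(truth_classes)

mapped = [(100.0, 200.0), (150.0, 250.0), (300.0, 400.0)]
truth = [(103.0, 204.0), (150.0, 250.0), (300.0, 405.0)]

per_object = point_errors(mapped, truth)  # second level: one value per object
dataset_rmse = rmse(per_object)           # first level: one value per data set
theme_accuracy = attribute_accuracy(["a", "b", "c", "a"], ["a", "b", "a", "a"])
```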
The challenge of developing a complete error propagation module (i.e. Part
C of the Uncertainty Subsystem: Determination of Final Information Quality
- see FIGURE 1) remained unaddressed in the XGIS project. Instead the
remaining funds were used to support development of the user interface
(within the PUIUS Project). The user interface is an important issue if GIS
users are to give serious consideration to matters of data and information
quality.
THE PUIUS PROJECT
In its original form ILWIS operated in the MS-DOS environment. With the
decision to market ILWIS for WINDOWS from 1995 it was necessary to
transfer XGIS into WINDOWS. In its prototype form and in the WINDOWS
environment the ILWIS Uncertainty Subsystem has the name of PUIUS
(Prototype User Interface for an Uncertainty Subsystem). It was hoped that
the PUIUS project would complete the XGIS development, at least in prototype
form. Referring to FIGURE 1, the transfer of parts B and D was completed, and
of part C for discontinuous variables, but not continuous variables
(Barsoum, 1995).
The prototype was developed using Visual Basic. The prototype deals with
the assessment and representation of the quality of discontinuous (categoric)
attribute data and resulting information referring to polygons. Three types of
quality assessment are involved, namely fuzzy set theory using Certainty
Factors (CFs), Bayesian probability theory and an integrated approach
involving also the positional accuracy of polygon boundaries. In essence, the
prototype is intended to provide a user-friendly graphical interface to assist
ILWIS users to assess the quality of their data and information, and to
visualize the results of the assessment using a wide variety of techniques.
Standard ILWIS functionality provides the established tools for examining
data and analysing quality, e.g. confusion matrices, frequency graphs. PUIUS
integrates these. The graphical user interface has a main menu to allow the
user to opt to investigate data quality or model quality, and also to select an
appropriate visualization technique. As already indicated, the data quality
module has been partially implemented in the prototype, the model quality
module not. The visualization module allows a wide choice of techniques,
including tables, graphs, maps and oblique views; for example, the user can
perform a spatial analysis and simultaneously display the result. The
associated quality information can be visualized by one of the available
methods, or it can be accessed in numerical form by clicking on a polygon or
pixel.
In the case of Certainty Factors, visualization was applied to the quality of a
crop suitability map. As the default representation, four suitability classes,
very suitable to not suitable, are represented by four hues (green, yellow,
orange, red). The lower the CF, the greyer (i.e. lower saturation) the hues
become. The user has control over hue choice and the rate of decreasing
saturation. If desired, the visual variable value (lightness) can be used to
represent uncertainty instead of, or as well as, saturation.
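The mapping from suitability class and CF to a display colour can be sketched as follows. This is not the PUIUS code: the two intermediate class names, the hue angles, the base value of 0.8, the power-law rate and the direction in which value varies (lighter for less certain) are all assumptions chosen for illustration.

```python
import colorsys

# Hue angles in degrees; the two middle class names are hypothetical.
DEFAULT_HUES = {
    "very suitable": 120.0,        # green
    "suitable": 60.0,              # yellow
    "marginally suitable": 30.0,   # orange
    "not suitable": 0.0,           # red
}

def quality_colour(suitability, cf, rate=1.0,
                   vary_saturation=True, vary_value=False):
    """Return an (r, g, b) tuple in 0..1 for a class and a CF in 0..1.

    A lower CF gives a greyer colour (lower saturation) and, if
    vary_value is set, a lighter one. rate controls how quickly the
    saturation falls as the CF decreases.
    """
    cf = min(max(cf, 0.0), 1.0)
    saturation = cf ** rate if vary_saturation else 1.0
    value = 0.6 + 0.4 * (1.0 - cf) if vary_value else 0.8
    hue = DEFAULT_HUES[suitability] / 360.0
    return colorsys.hsv_to_rgb(hue, saturation, value)

certain = quality_colour("very suitable", 1.0)    # fully saturated green
uncertain = quality_colour("not suitable", 0.0)   # neutral grey
```

Exposing the hue table and the rate as parameters mirrors the user control over hue choice and rate of decreasing saturation described above.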
In the case of quality assessment using probabilities, PUIUS allows the
adjacent display of results with or without using prior probability. As for the
CF case described above, the visual variables saturation and/or value can be
used to represent uncertainty. Also, for example for a land use map resulting
from the classification of satellite data, blinking can be used to highlight
particular pixels. Several criteria can be selected, e.g. those with classification
accuracy above or below a specified probability threshold, those for which the
classification changes if prior probability is applied.
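The two selection criteria mentioned can be sketched for a handful of pixels. The sketch assumes that per-class likelihoods are available for each pixel and that omitting prior probability is equivalent to treating all classes as equally likely; the function names, class names and numbers are hypothetical.

```python
def classify(likelihoods, priors=None):
    """Return (best_class, posterior_probability) for one pixel.

    likelihoods: {class: P(observation | class)}
    priors:      {class: P(class)}; if None, all classes are equally likely.
    """
    if priors is None:
        priors = {c: 1.0 / len(likelihoods) for c in likelihoods}
    joint = {c: likelihoods[c] * priors[c] for c in likelihoods}
    total = sum(joint.values())
    best = max(joint, key=joint.get)
    return best, joint[best] / total

def blink_masks(pixels, priors, threshold):
    """Flag pixels below the probability threshold and pixels whose
    class changes when prior probability is applied."""
    below, changed = [], []
    for likelihoods in pixels:
        class_without, _ = classify(likelihoods)
        class_with, probability = classify(likelihoods, priors)
        below.append(probability < threshold)
        changed.append(class_without != class_with)
    return below, changed

pixels = [
    {"forest": 0.70, "crops": 0.20, "urban": 0.10},
    {"forest": 0.40, "crops": 0.35, "urban": 0.25},
]
priors = {"forest": 0.2, "crops": 0.6, "urban": 0.2}
below, changed = blink_masks(pixels, priors, threshold=0.6)
```

Either list could then drive a blinking display, with True marking the pixels to highlight.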
IDRISI'S INFORMATION QUALITY PROCESSING TOOLS
Apart from the above-reported work carried out in the ILWIS environment,
the only other off-the-shelf GIS which (as far as the authors are aware)
appears to give consideration to matters of information quality is Idrisi
(Idrisi, 1995). Members of the group were able to examine this in 1995. This is
a developing system, so findings relate to a situation which may now have been
superseded.
The investigation consisted of the execution of two different spatial
information generation tasks using Idrisi and analysis of any uncertainty
processing carried out during their execution. In the case of one test (the
location of a new municipal dump - essentially a buffering operation) Idrisi
handled the uncertainty in a near complete manner, and in the case of the
other (identification of agroclimatic zones based on computed environmental
characteristics) some software development was needed (Du, 1995).
Idrisi allows the insertion of either σ or probability values for spatial data.
If these values have been inserted, then for some processing modules error
will be propagated. The user has to know what values to insert. We chose, as
far as possible, to use data quality values which are standard for medium scale
topographic data as supplied in PR China. There follow brief descriptions of
Idrisi modules which address uncertainty. These are discussed further in
(Idrisi, 1995) under the titles "Uncertainty and Risk" and "Error Propagation
Formulas", where some, perhaps over-simplified, formulae for variance
propagation are presented. These formulae do not seem to be implemented
directly, but having been presented the users can implement them themselves.
PCLASS is an uncertainty handling version of a density slicing module.
For each newly generated class (e.g. <2000m, <4000m, <6000m etc.) PCLASS
calculates the probability of each element in the DTM falling in that class,
from user supplied information on the quality of the original DTM given as a
standard deviation value, and stores it in a displayable probability image.
The central limit theorem is applied.
BAYES allows the user to exploit Bayesian Probability theory. The user
can exploit not only the quality of the datasets being used in the analysis (e.g.
the standard deviation of river or road coordinates processed through PCLASS
to give a probability image) but also the quality of the processing model (e.g.
that the particular rule for identifying suitable new landfill dump sites only
works 54% of the time) and other (a priori) information (e.g. an older study
identifying potential landfill dump sites in the same area). The usefulness of
this tool is enhanced because in the absence of a priori information one can
still exploit the module (although the procedure would no longer be strictly
Bayesian) and at least use the information available on data quality.
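The underlying calculation is Bayes' rule for a single hypothesis, sketched below. This shows the rule itself, not the internals of the BAYES module. Reading "works 54% of the time" as the probability of the rule firing at a truly suitable site, and taking 0.46 as the probability of it firing at an unsuitable one, are assumptions made only for the example, as is the prior of 0.3.

```python
def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes' rule for one hypothesis (e.g. 'this cell is a suitable site').

    prior:               P(hypothesis) before the evidence is considered
    p_evidence_if_true:  P(evidence | hypothesis true)
    p_evidence_if_false: P(evidence | hypothesis false)
    """
    numerator = p_evidence_if_true * prior
    return numerator / (numerator + p_evidence_if_false * (1.0 - prior))

# With a priori information, e.g. an older study suggesting 30% of the
# area is potentially suitable (hypothetical figure).
with_prior = posterior(0.3, 0.54, 0.46)

# Without a priori information: a neutral prior of 0.5, so the result
# depends only on the evidence terms.
without_prior = posterior(0.5, 0.54, 0.46)
```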
FUZZY allows an image of uncertainty values to be generated. These
uncertainty values are generated by one of three different types of membership
function (unlike PCLASS which only uses the central limit theorem). Subsequent
processing of the FUZZY images must be performed using Idrisi's OVERLAY
module, using the appropriate fuzzy logic rules (i.e. that the minimum
available certainty factor is assigned to the output image in the case of
intersection, that the maximum certainty factor is assigned in the case of
union, etc.). The assignment of certainty factors (via the different membership
functions) can be based on the experience or knowledge of the user - rather
than more objective information, such as standard deviation.
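The minimum and maximum rules can be sketched on small rasters held as lists of rows. The two membership functions shown are common shapes chosen for illustration and are not claimed to match those offered by the FUZZY module; all names and values are hypothetical.

```python
import math

def linear_membership(x, low, high):
    """Increasing linear membership: 0 at or below low, 1 at or above high."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)

def sigmoidal_membership(x, low, high):
    """Smooth S-shaped alternative built from a squared sine."""
    t = linear_membership(x, low, high)
    return math.sin(t * math.pi / 2.0) ** 2

def fuzzy_intersection(*images):
    """Cell by cell minimum certainty factor (logical AND)."""
    return [[min(cells) for cells in zip(*rows)] for rows in zip(*images)]

def fuzzy_union(*images):
    """Cell by cell maximum certainty factor (logical OR)."""
    return [[max(cells) for cells in zip(*rows)] for rows in zip(*images)]

slope_cf = [[0.9, 0.4], [0.7, 0.2]]
soil_cf = [[0.6, 0.8], [0.7, 0.1]]

both = fuzzy_intersection(slope_cf, soil_cf)
either = fuzzy_union(slope_cf, soil_cf)
```

With the minimum rule the least certain input controls each output cell, so one poorly known layer limits the certainty of the whole result.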
ERRMAT allows the generation of an error matrix (sometimes called a
confusion matrix in Remote Sensing literature) showing the agreement
between the image (e.g. landuse derived from the classification of a satellite
image) and the 'truth' (perhaps obtained by a set of ground observations and
used as check points).
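An error matrix of this kind can be sketched in a few lines. This is a generic illustration, not ERRMAT itself; the layout (rows for the classified image, columns for the ground truth), the overall accuracy summary and the sample classes are assumptions.

```python
from collections import Counter

def error_matrix(classified, truth):
    """Cross-tabulate classified values (rows) against ground truth (columns)."""
    counts = Counter(zip(classified, truth))
    classes = sorted(set(classified) | set(truth))
    matrix = [[counts[(c, t)] for t in classes] for c in classes]
    return classes, matrix

def overall_accuracy(matrix):
    """Proportion of check points on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Hypothetical check points: class in the image and class observed on the ground.
classified = ["crops", "crops", "forest", "urban", "forest", "crops"]
truth = ["crops", "forest", "forest", "urban", "forest", "urban"]

classes, matrix = error_matrix(classified, truth)
accuracy = overall_accuracy(matrix)
```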
MCE (Multi-Criteria Evaluation) supports information generation using
several sets of input data and user input processing models. If standard
deviation values (called RMS values in Idrisi) are stored in the documentation
file of each input data set, then Idrisi will carry out a simplified variance
propagation to give an estimate of output error, storing it in the relevant
document file. The simplification is such that the results imply no variation in
error across the resulting image, which is neither a helpful nor a valid
assumption when the main purpose of uncertainty analysis is to guide the
decision maker to geographic areas of greater certainty. A more rigorous
approach has been implemented by the authors.
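What a spatially varying propagation might look like is sketched below. This is not the authors' implementation, which is not described here. It assumes the processing model is a weighted linear combination of input layers, that each layer carries its own per-cell standard deviation, and that errors are independent between layers; all names and values are hypothetical.

```python
import math

def weighted_sum_with_sd(layers, sd_layers, weights):
    """Per-cell weighted linear combination and its propagated SD.

    For independent layer errors the variance of each output cell is
    the sum over layers of (weight * sd) ** 2, evaluated cell by cell.
    """
    n_rows, n_cols = len(layers[0]), len(layers[0][0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    out_sd = [[0.0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            out[r][c] = sum(w * layer[r][c]
                            for w, layer in zip(weights, layers))
            variance = sum((w * sd[r][c]) ** 2
                           for w, sd in zip(weights, sd_layers))
            out_sd[r][c] = math.sqrt(variance)
    return out, out_sd

# Two hypothetical criteria on a 1 x 2 raster, equally weighted.
layers = [[[0.8, 0.4]], [[0.6, 0.9]]]
sd_layers = [[[0.05, 0.20]], [[0.05, 0.10]]]
score, score_sd = weighted_sum_with_sd(layers, sd_layers, [0.5, 0.5])
```

Because the standard deviation is computed per cell, the output shows where the result is more or less certain, which a single image-wide error value cannot do.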
Other Idrisi modules contribute to the handling or understanding of
uncertainty and include: REGRESS - for linear regression analysis; and
RANDOM - for simulating error.
CONCLUSIONS
The academic community investigating quality aspects of GIS is large. This
group of authors has regarded the processing of uncertainty as a GIS
subsystem. Others may regard it as a GIS analytical tool; as with many GIS
analytical tools it may never be as well developed in an off-the-shelf system as
in a system developed by specialists particularly interested in error. However
this group has been characterised by individuals who believe that the standard
(i.e. off-the-shelf) tools should be as good as possible, and this is reflected in
the fact that all developments have taken place in well-known environments
(AutoCAD+dBASE; ILWIS; Arc/Info; Idrisi). Obviously development of error
analysis tools must continue to take place without the constraints of
off-the-shelf systems, but it should always be possible to implement the
resulting tools in such systems - if they are to have any effect in the 'real
world'.
We have demonstrated that most components of the Uncertainty Subsystem
can be implemented in ILWIS, and the subsystem could thus become a
standard product and be marketed. Although Idrisi's information quality
generation procedures are (also) incomplete, they are marketed. One can
commend Clark University. By making users at least sometimes think about
information quality such a GIS becomes an enhanced decision support tool.
ACKNOWLEDGEMENTS
Professor Du Daosheng's fellowship at the Topographic Science Section,
Glasgow University, in 1995, was funded by the Royal Society, 6 Carlton
House Terrace, London. The efforts of our supervisees and colleagues,
without whom much of the work referred to in this review would have
remained undone, namely Bheshem Ramlal, Jorge de Oliveira, Fan Shien-ta,
Elia Barsoum and Stephen McGinley, are gratefully acknowledged.
REFERENCES
Barsoum, E.M., 1995. "A User Interface for Error Visualisation in ILWIS",
report to the Division of Cartography, ITC, 1995.
Blakemore, M., 1983. "Generalization and Error in Spatial Data Bases",
Proceedings of AutoCarto 6, 1983, pp. 313-322.
Brown, A. and Elzakker, C.P.J.M. van, 1993. "The use of colour in the
cartographic representation of information quality generated by a GIS." In:
P. Mesenburg (ed.), Procs. Vol. 2, 16th International Cartographic
Conference, Köln, 3-9 May 1993.
Chrisman, N.R., 1983. "The role of quality information in the long-term
functioning of a GIS", Proceedings of AutoCarto 6, 1983, pp. 303-312.
Drummond, J., 1987. "A Framework for handling error in GIS", ITC Journal
1987/1.
Drummond, J., 1990. "Models and Data Quality handled in a
dBASE/AUTOCAD GIS", in ITC Journal 1989 3/4.
Du Daosheng, 1995. "Investigation of information quality processing in an
off-the-shelf GIS". Report prepared for the Royal Society of London at
Topographic Science Section, Glasgow University.
Fan, Shien-ta, 1992. "Uncertainty Subsystem: Assessment of Model Quality",
MSc Thesis, ITC 1992.
Idrisi for WINDOWS, Manual, 1995. Clark University Grad. School of Geog.
ODA, 1983. "Dominica Land Use Planning Project". LRDC, Tolworth Tower,
Surbiton, Surrey, KT6 7DY, UK. Report #127.
Oliveira, J.P. de, 1992. "Generation of Data Quality Parameters in a GIS", MSc
Thesis, ITC 1992.
Ramlal, B. and Drummond, J.E., 1992. "A Prototype Uncertainty Subsystem
implemented in ITC's ILWIS PC based GIS and tested on a Dutch Land
Reallotment Project", EGIS Conference Proceedings, Munich 1992.
Ramlal, Bheshem, 1991. "Communicating Information Quality in a GIS
Environment", MSc Thesis, ITC 1991.
Van Elzakker, C., Ramlal, B. and Drummond, J.E., 1992. "The Visualisation of
GIS Generated Information Quality", August 1992, Archives ISPRS
Congress XVI, Commission IV.
BIOGRAPHICAL SKETCH
Jane Drummond is a staff member of Glasgow University's Topographic
Science Section. Prior to that she worked in ITC's Department of
Geoinformatics, the UK NERC's Experimental Cartography Unit, UNB's
Department of Geodesy and Geomatics Engineering, and Hunting Surveys.
Allan Brown is a staff member of ITC's Department of Geoinformatics.
Previously he has worked in Gadjah Mada University's Department of
Geography, Glasgow University's Topographic Science Section as an Assistant
Lecturer, and at Fairey Surveys.
Du Daosheng is a staff member of the National Key Laboratory at Wuhan
Technical University of Surveying and Mapping, holding the rank of
professor. Prior to that he was Head of the Department of Cartography at the
same institute. From 1982 to 1984 he worked in the UK NERC's Experimental
Cartography Unit.
Corné van Elzakker is a staff member of ITC's Department of
Geoinformatics. Prior to that he was a lecturer in the Cartography Section of
the Department of Geography of Utrecht State University.