The Development of a GIS Information Quality Module
Jane Drummond, Allan Brown, Du Daosheng, Corné van Elzakker
Abstract.- In the early 1980s the authors first considered 1) the transfer of traditional mapmakers' expertise regarding spatial information quality into GIS and 2) ways of determining the quality of GIS generated information. Taking into account the now accepted principles that attribute, position, age, and topology quality and lineage all contribute to the quality of GIS generated information, the framework for our work was that as a GIS generated new information an associated uncertainty subsystem should in parallel generate information on the quality of that information, and that the two sets of information should be displayed simultaneously to better the GIS user's decision making. Now an extant (though incomplete) entity, the subsystem consists of appropriate means to determine and store the quality of spatial data and of the processing model used to generate information, means for determining the generated information quality (variance propagation, fuzzy sub-set theory, etc.) and means for visualizing that quality. These are integrated in an Uncertainty Subsystem called PUIUS. The subsystem has been implemented in ILWIS. This paper will review the subsystem (and another), summarize some applications and point the way to further work.
Maintaining high standards has long concerned land surveyors, photogrammetrists and cartographers, but the Geographic Information System (GIS) community seemed only alerted to the problem of information quality in the early 1980s. By the end of the 1980s those in GIS had accepted the principles that: quality of attribute and position; information age; topological consistency; and lineage all contributed to the quality of any GIS generated information.
We first started, formally, to consider 1) the transfer of traditional mapmakers' expertise regarding spatial information quality into the GIS environment and 2) ways to determine the quality of GIS generated information, in the mid-eighties. Taking into account the principles presented in the first paragraph, our study's framework was that while a GIS generated new information an associated uncertainty subsystem should in parallel generate information on the quality of that new information, and that the two sets of information should (if possible) be displayed simultaneously to better the GIS user's decision making. The subsystem itself would consist of appropriate means to determine (knowledge base, user queries, etc.) and store the quality of spatial data, to store the quality of the processing model used to generate information, the means for determining the generated information quality (variance propagation, fuzzy sub-set theory, etc.) and the means (2D, 3D) for visualizing that quality. These were integrated in an Uncertainty Subsystem called PUIUS (Prototype User Interface for an Uncertainty Subsystem). The subsystem has been implemented in ILWIS, a PC based GIS developed and marketed by the International Institute for Aerospace Survey and Earth Sciences (ITC) in The Netherlands.
This paper will review this subsystem. The review will take the form of a summary of projects carried out by the group which led to the development of PUIUS, and an investigation of a recent version of another off-the-shelf GIS package which is now marketed with some information quality processing tools. Thus the five remaining sections of this paper will deal with: i) early ideas; ii) the XGIS project; iii) the PUIUS project; iv) Idrisi's information quality processing tools; and v) conclusions.
EARLY IDEAS
In 1983 two of this paper's authors, working at the UK Experimental Cartography Unit, prepared a very straightforward presentation on digital spatial data quality for a Visiting Committee. The presentation showed boundary widths for polygons of various natural features as a function of the standard deviation (SD) of these boundaries' coordinates, and then the resulting overlaid polygons. Some visitors expressed surprise at the demonstrated imprecise nature of the digital spatial data we so expensively acquired. Although official British science seemed unready to pay for recognisably imperfect digital spatial data (the two authors concerned relocated to the Netherlands and China), at about the same time Blakemore (1983) and Chrisman (1983) were much more successfully convincing European and North American academics that the imperfections of digital spatial data represented a respectable research area.
The presentation's polygons represented areas of (assumed) uniform slope, soil class, rainfall and temperature and were overlaid and used for crop suitability mapping (ODA, 1983). At the time, and working in the vector environment, determining the precision of such contributing polygons' boundaries and then producing overlaid polygons with boundary line widths proportional to (e.g.) their precision was cumbersome. The same problem has now been much more elegantly addressed in the raster environment of PUIUS (see below).
The user of a GIS is not likely to be interested in the quality of the data per se, but in the quality of the generated information. In the ODA suitability mapping project, the input sets of polygons each had a centroid - as had each polygon resulting from overlay. Standard databasing facilities permit such centroids to carry attributes referring to data quality, and these attributes can be easily processed.
Such was part of the thinking behind much subsequent work in the group (Drummond, 1987). The group understood a GIS to be both: (i) a system which used processing models to generate new information from existing data; and (ii) a spatial data archive which allowed one to retrieve and display stored data.
In the first case the new information will be visualized, but simultaneously (or by toggling) its quality should be visualized. Error propagation models are directly related to information generation models, and information generation models can be mathematical (e.g. computation of aspect; USLE) or logical (e.g. crop suitability; Community Tax determination). Thus if a mathematical information processing model gives us X = f(a, b, c) then σ_X = f(σ_a, σ_b, σ_c), where σ represents Standard Deviation. Likewise if a logical processing model gives us Y = f(l, m, n) then p_Y = f(p_l, p_m, p_n), where p represents a certainty statistic such as Probability or Certainty Factor (CF). However the processing model itself may be in error; thus however good the data are, the generated information may be poor. In the case of a mathematical model its error can be handled as noise and subsequently processed by σ_X = f(σ_a, σ_b, σ_c, σ_n), where n represents noise. In the case of a logical model, the model itself may have a certainty statistic associated with it, which can be processed by p_Y = f(p_l, p_m, p_n, p_o), with p_o representing the probability or CF of the model holding.
In the second case above a GIS might, e.g., be queried to produce a map showing the position of a utility line through a neighbourhood. The users of the map might be concerned with the accuracy of depiction of the utility's location. If the precision of the procedure used to survey in the utility was recorded in the attribute tables of the GIS, then this information could be extracted and used to control the width (e.g. 3σ, allowing >99.9% probability of finding the utility) of the utility's mapping.
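The width rule for the utility line can be illustrated with a minimal sketch. It is not taken from the systems described in this paper: the function name `band_width_mm`, the multiplier `k` and the example map scale are assumptions introduced here, and the band is assumed to extend k standard deviations to either side of the surveyed line (the text leaves open whether the quoted multiple is a full or a half width).

```python
def band_width_mm(sigma_m: float, scale_denominator: float, k: float = 3.0) -> float:
    """Plotted width, in map millimetres, of an uncertainty band around a line.

    sigma_m           -- standard deviation of the surveyed position (ground metres),
                         as it might be read from an attribute table
    scale_denominator -- e.g. 1000 for a 1:1000 map
    k                 -- number of standard deviations on each side (assumed)
    """
    if sigma_m < 0 or scale_denominator <= 0:
        raise ValueError("sigma_m must be >= 0 and scale_denominator > 0")
    ground_width_m = 2.0 * k * sigma_m          # band covers +/- k * sigma
    return ground_width_m * 1000.0 / scale_denominator  # metres -> map mm


# Hypothetical example: sigma = 0.25 m, 1:1000 map, k = 3
width = band_width_mm(0.25, 1000)
print(f"band width on map: {width} mm")
```

The only input is the stored precision of the survey procedure, which the function scales to a plotted width.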
In this case no processing model is used to generate new information.
Following our understanding of GIS, a simple GIS using the merged capabilities of AutoCAD and dBASE (Drummond, 1990) was built. As well as more usual facilities, this GIS had an interface permitting interrogation of the user to acquire quality information on each data set. Although not a very user-friendly interface (when compared to PUIUS, described in a later section) it permitted the user to provide the quality information of each data item in σ, probability or natural language terms convertible to CF, as appropriate. Conversion (for example converting σ to probability using central limit theory when a continuous variable, such as rainfall, was reclassified into a discontinuous variable such as the rainfall class 300-400 mm a year) was carried out within the GIS. The acquired information was stored in the relevant attribute database table and processed using fuzzy sub-set reasoning to provide CF values for each crop suitability polygon (Drummond, 1987). As the data concerned (i.e. for the crop suitability project) were all of a discontinuous type, error propagation using fuzzy logic rather than variance propagation was appropriate - but variance propagation was supported. Simple maps showing the quality of generated suitability information, by polygon and with quality a function of hatching density, were produced.
FIGURE 1 UNCERTAINTY SUBSYSTEM. The overall concept.
FIGURE 2 UNCERTAINTY SUBSYSTEM. Visualization of Information Quality.
THE XGIS PROJECT
The ITC was willing to partially support three MSc students (Ramlal, 1991), (Fan, 1992), (Oliveira, 1992) to research data quality matters in their 'dissertation year' and to fund a research associate (Barsoum, 1995). This was referred to as the XGIS project and represented the development of an Uncertainty Subsystem - a subsystem of ILWIS (Ramlal et al., 1991). The overall concept of this subsystem is provided in FIGURE 1. From FIGURE 1 it can be seen that the Uncertainty Subsystem was designed with four components. These dealt with the uncertainty of data (Storing/Determining quality of data), the uncertainty of processing models (Storing quality of the model), error propagation (Determining of Final Information Quality) and the visualisation of uncertainty (Visualisation of Information Quality).
The visualisation was tackled first (Ramlal, 1991), (Ramlal et al., 1992), (Van Elzakker et al., 1992). FIGURE 2 summarises this. From FIGURE 2 it can be seen that the visual variables texture, size and value were considered appropriate for depicting data quality. The above mentioned work by Ramlal, Van Elzakker and Drummond resulted in maps of a type in which suitability was represented by the visual variable colour (e.g. green = most suitable to red = least suitable) and data quality by value (e.g. dark = most certain to light = least certain). Later Brown and Van Elzakker (Brown et al., 1993) experimented with the use of colour in the representation of categoric area information quality. They concluded that the simultaneous representation of attribute and quality is possible (by means of the colour hue and colour saturation respectively), for both screen display and printer output, for a limited number of categories and quality levels.
In tackling model quality (Fan, 1992), Fan took on the task of developing Part B (FIGURE 1) of the Uncertainty Subsystem, addressing (regression) model building within the GIS environment.
By using Linear Least Squares Regression in model building, information became available on the performance (quality) of the model. This information was graphically displayed, allowing the model developer to improve the model. Actual model development related to toxic cloud release was carried out. Fan established the concept of a model library in ILWIS where developed and pre-existing models could be stored along with relevant quality and other information.
De Oliveira addressed data quality (Oliveira, 1992) (Storing/Determining Data Quality in FIGURE 1) and produced an appropriate module for the subsystem. The module is capable of automatically generating data quality parameters from related 'Ground Truth' data and user-supplied information. The algorithms, in the main, concerned positional and attribute accuracy assessment, working at two levels. The first level referred to overall accuracy parameters at the data set level (e.g. theme) and the second represented accuracy parameters associated with individual database objects. It was a requirement of De Oliveira's future employer (University of Aveiro) that concepts developed and implemented for ILWIS could also be implemented in Arc/Info to support an Emergency Response GIS. Error propagation for the particular information generation (or processing) models used (relating to chemical accidents in the form of toxic cloud releases) was developed.
The challenge of developing a complete error propagation module (i.e. Part C of the Uncertainty Subsystem: Determination of Final Information Quality - see FIGURE 1) has remained unaddressed in the XGIS project. Instead the remaining funds were used to support development of the user interface (within the PUIUS Project). The user interface is an important issue if GIS users are to give serious consideration to matters of data and information quality.
THE PUIUS PROJECT
In its original form ILWIS operated in the MS-DOS environment.
With the decision to market ILWIS for WINDOWS from 1995 it was necessary to transfer XGIS into WINDOWS. In its prototype form and in the WINDOWS environment the ILWIS Uncertainty Subsystem has the name of PUIUS (Prototype Uncertainty Subsystem). It was hoped that the PUIUS project would complete the XGIS development, at least in prototype form. Referring to FIGURE 1, the transfer of parts B and D was completed, and of part C for discontinuous variables, but not continuous variables (Barsoum, 1995). The prototype was developed using Visual Basic.
The prototype deals with the assessment and representation of the quality of discontinuous (categoric) attribute data and resulting information referring to polygons. Three types of quality assessment are involved, namely fuzzy set theory using Certainty Factors (CFs), Bayesian probability theory and an integrated approach involving also the positional accuracy of polygon boundaries. In essence, the prototype is intended to provide a user-friendly graphical interface to assist ILWIS users to assess the quality of their data and information, and to visualize the results of the assessment using a wide variety of techniques. Standard ILWIS functionality provides the established tools for examining data and analysing quality, e.g. confusion matrices and frequency graphs. PUIUS integrates these.
The graphical user interface has a main menu which allows the user to opt to investigate data quality or model quality, and also to select an appropriate visualization technique. As already indicated, the data quality module has been partially implemented in the prototype, the model quality module not. The visualization module allows a wide choice of techniques, including tables, graphs, maps and oblique views. It is possible, for example, to perform a spatial analysis and simultaneously display the result.
The associated quality information can be visualized by one of the available methods, or it can be accessed in numerical form by clicking on a polygon or pixel.
In the case of Certainty Factors, visualization was applied to the quality of a crop suitability map. As the default representation, four suitability classes, very suitable to not suitable, are represented by four hues (green, yellow, orange, red). The lower the CF, the greyer (i.e. lower saturation) the hues become. The user has control over hue choice and the rate of decreasing saturation. If desired, the visual variable value (lightness) can be used to represent uncertainty instead of, or as well as, saturation.
In the case of quality assessment using probabilities, PUIUS allows the adjacent display of results with or without using prior probability. As for the CF case described above, the visual variables saturation and/or value can be used to represent uncertainty. Also, for example for a land use map resulting from the classification of satellite data, blinking can be used to highlight particular pixels. Several criteria can be used to select these pixels, e.g. those with classification accuracy above or below a specified probability threshold, or those for which the classification changes if prior probability is applied.
IDRISI'S INFORMATION QUALITY PROCESSING TOOLS
Apart from the above-reported work carried out in the ILWIS environment, the only other off-the-shelf GIS which (as far as the authors are aware) appears to give consideration to matters of information quality is Idrisi (Idrisi, 1995). Members of the group were able to examine this in 1995. This is a developing system, so findings relate to a situation which may now have been superseded. The investigation consisted of the execution of two different spatial information generation tasks using Idrisi and analysis of any uncertainty processing carried out during their execution.
In the case of one test (the location of a new municipal dump - essentially a buffering operation) Idrisi handled the uncertainty in a near complete manner, and in the case of the other (identification of agroclimatic zones based on computed environmental characteristics) some software development was needed (Du, 1995).
Idrisi allows the insertion of either σ or probability values for spatial data. If these values have been inserted, then for some processing modules error will be propagated. The user has to know what values to insert. We chose, as far as possible, to use data quality values which are standard for medium scale topographic data as supplied in PR China. There follow brief descriptions of Idrisi modules which address uncertainty. (These are discussed further in (Idrisi, 1995) under the titles "Uncertainty and Risk" and "Error Propagation Formulas", where some, perhaps over-simplified, formulae for variance propagation are presented. These formulae do not seem to be implemented directly, but having been presented the users can implement them themselves.)
PCLASS is an uncertainty handling version of a density slicing module. With PCLASS, for some newly generated classes (e.g. <2000m, <4000m, <6000m etc.) a probability of each element in the DTM falling in that new class is calculated, and stored in a displayable probability image, from user supplied information on the quality of the original DTM given as a standard deviation value. The central limit theorem is applied.
BAYES allows the user to exploit Bayesian Probability theory. The user can exploit not only the quality of the datasets being used in the analysis (e.g. the standard deviation of river or road coordinates processed through PCLASS to give a probability image) but also the quality of the processing model (e.g. that the particular rule for identifying suitable new landfill dump sites only works 54% of the time) and other (a priori) information (e.g.
an older study identifying potential landfill dump sites in the same area). The usefulness of this tool is enhanced because in the absence of a priori information one can still exploit the module (although the procedure would no longer be strictly Bayesian) and at least use the information available on data quality.
FUZZY allows an image of uncertainty values to be generated. These uncertainty values are generated by one of three different types of membership function (unlike PCLASS, which only uses the central limit theorem). Subsequent processing of the FUZZY images must be performed using Idrisi's OVERLAY module, using the appropriate fuzzy logic rules (i.e. that the minimum available certainty factor is assigned to the output image in the case of intersection, that the maximum certainty factor is assigned in the case of union, etc.). The assignment of certainty factors (via the different membership functions) can be based on the experience or knowledge of the user - rather than more objective information, such as standard deviation.
ERRMAT allows the generation of an error matrix (sometimes called a confusion matrix in Remote Sensing literature) showing the agreement between the image (e.g. landuse derived from the classification of a satellite image) and the 'truth' (perhaps obtained by a set of ground observations and used as check points).
MCE (Multi-Criteria Evaluation) supports information generation using several sets of input data and user input processing models. If standard deviation values (called RMS values in Idrisi) are stored in the documentation file of each input data set, then Idrisi will carry out a simplified variance propagation to give an estimate of output error, storing it in the relevant documentation file.
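A simplified variance propagation of this general kind can be sketched as below. The sketch is not Idrisi's actual formula, which is not reproduced in this paper. It assumes a weighted linear combination of independent input layers, each described by a single RMS value, and the names `propagate_rms`, `weights` and `rms_values` are hypothetical.

```python
import math
from typing import Sequence


def propagate_rms(weights: Sequence[float], rms_values: Sequence[float]) -> float:
    """RMS error of sum(w_i * layer_i), assuming independent layer errors.

    Each input layer is described by ONE rms value for the whole image,
    so the result is also one value for the whole output image.
    """
    if len(weights) != len(rms_values):
        raise ValueError("weights and rms_values must have the same length")
    # Variances add for independent errors: var_out = sum((w_i * rms_i)^2)
    variance = sum((w * s) ** 2 for w, s in zip(weights, rms_values))
    return math.sqrt(variance)


# Hypothetical example: two layers with equal weights and RMS errors of 6 and 8
output_rms = propagate_rms([0.5, 0.5], [6.0, 8.0])
print(f"output RMS: {output_rms}")
```

The result is one number per output image rather than an image of error values.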
The simplification is such that the results imply no variation in error across the resulting image, which is neither a helpful nor a valid assumption when the main purpose of uncertainty analysis is to guide the decision maker to geographic areas of greater certainty. A more rigorous approach has been implemented by the authors.
Other Idrisi modules contribute to the handling or understanding of uncertainty and include: REGRESS - for linear regression analysis; and RANDOM - for simulating error.
CONCLUSIONS
The academic community investigating quality aspects of GIS is large. This group of authors has regarded the processing of uncertainty as a GIS subsystem. Others may regard it as a GIS analytical tool; as with many GIS analytical tools it may never be as well developed in an off-the-shelf system as in a system developed by specialists particularly interested in error. However this group has been characterised by individuals who believe that the standard (i.e. off-the-shelf) tools should be as good as possible, and this is reflected in the fact that all developments have taken place in well-known environments (AutoCAD+dBASE; ILWIS; Arc/Info; Idrisi). Obviously development of error analysis tools must continue to take place without the constraints of off-the-shelf systems, but it should always be possible to implement the resulting tools in such systems - if they are to have any effect in the 'real world'.
We have demonstrated that most components of the Uncertainty Subsystem can be implemented in ILWIS, and the subsystem could thus become a standard product and be marketed. Although Idrisi's information quality generation procedures are (also) incomplete, they are marketed. One can commend Clark University. By making users at least sometimes think about information quality such a GIS becomes an enhanced decision support tool.
ACKNOWLEDGEMENTS
Professor Du Daosheng's fellowship at the Topographic Science Section, Glasgow University, in 1995, was funded by the Royal Society, 6 Carlton House Terrace, London. The efforts of our supervisees and colleagues, without whom much of the work referred to in this review would have remained undone, namely Bheshem Ramlal, Jorge de Oliveira, Fan Shien-ta, Elia Barsoum and Stephen McGinley, are gratefully acknowledged.
REFERENCES
Barsoum, E.M., 1995. "A User Interface for Error Visualisation in ILWIS", report to the Division of Cartography, ITC, 1995.
Blakemore, M., 1983. "Generalization and Error in Spatial Data Bases", Proceedings of AutoCarto 6, 1983, pp. 313-322.
Brown, A. and Elzakker, C.P.J.M. van, 1993. "The use of colour in the cartographic representation of information quality generated by a GIS." In: P. Mesenburg (ed.), Procs. Vol. 2, 16th International Cartographic Conference, Koln, 3-9 May 1993.
Chrisman, N.R., 1983. "The role of quality information in the long-term functioning of a GIS", Proceedings of AutoCarto 6, 1983, pp. 303-312.
Drummond, J., 1987. "A Framework for handling error in GIS", ITC Journal 1987/1.
Drummond, J., 1990. "Models and Data Quality handled in a dBASE/AUTOCAD GIS", in ITC Journal 1989 3/4.
Du Daosheng, 1995. "Investigation of information quality processing in an off-the-shelf GIS". Report prepared for the Royal Society of London at Topographic Science Section, Glasgow University.
Fan, Shien-ta, 1992. "Uncertainty Subsystem: Assessment of Model Quality", MSc Thesis, ITC 1992.
Idrisi for WINDOWS, Manual, 1995. Clark University Grad. School of Geog.
ODA, 1983. "Dominica Land Use Planning Project". LRDC, Tolworth Tower, Surbiton, Surrey, KT6 7DY, UK. Report #127.
Oliveira, J.P.
de, 1992. "Generation of Data Quality Parameters in a GIS", MSc Thesis, ITC 1992.
Ramlal, B. and Drummond, J.E., 1992. "A Prototype Uncertainty Subsystem implemented in ITC's ILWIS PC based GIS and tested on a Dutch Land Reallotment Project", EGIS Conference Proceedings, Munich 1992.
Ramlal, Bheshem, 1991. "Communicating Information Quality in a GIS Environment", MSc Thesis, ITC 1991.
Van Elzakker, C., Ramlal, B. and Drummond, J.E., 1992. "The Visualisation of GIS Generated Information Quality", August 1992, Archives ISPRS Congress XVI, Commission IV.
BIOGRAPHICAL SKETCH
Jane Drummond is a staff member of Glasgow University's Topographic Science Section. Prior to that she worked in ITC's Department of Geoinformatics, the UK NERC's Experimental Cartography Unit, UNB's Department of Geodesy and Geomatics Engineering, and Hunting Surveys.
Allan Brown is a staff member of ITC's Department of Geoinformatics. Previously he has worked in Gadjah Mada University's Department of Geography, Glasgow University's Topographic Science Section as an Assistant Lecturer, and at Fairey Surveys.
Du Daosheng is a staff member of the National Key Laboratory at Wuhan Technical University of Surveying and Mapping, holding the rank of professor. Prior to that he was Head of the Department of Cartography at the same institute. From 1982 - 1984 he worked in the UK NERC's Experimental Cartography Unit.
Corné van Elzakker is a staff member of ITC's Department of Geoinformatics. Prior to that he was a lecturer in the Cartography Section of the Department of Geography of Utrecht State University.