This file was created by scanning the printed publication. Errors identified by the software have been corrected; however, some errors may remain.

The Effect Of Database Generalization On The Accuracy Of The Viewshed

Peter Fisher¹

Abstract -- This paper examines the effects of database generalization on the area which is determined to be visible in GIS analysis. Many different methods of generalization are possible, but here, for any cell at the target resolution, elevations are determined from: the arithmetic mean, the maximum, the minimum, and the maximum difference from the mean of the cells within the kernel, and all possible combinations of regular spacings. The resolutions studied are 0.5, 0.33, 0.25, and 0.2 of the original study area. The viewsheds determined over these different resolution DEMs are compared with a number of possible viewsheds derived by generalization of the viewshed over the original DEM. Of those tested, the maximum deviation from the mean within the kernel provides the best estimate of the pattern and area of the viewshed at all resolutions.

INTRODUCTION

In recent years considerable attention has been paid to two related lines of research: the representation of digital spatial data at multiple resolutions, where either information is stored at multiple resolutions or is generalized from a detailed (large) scale to a generalized (smaller) scale (Buttenfield and McMaster 1991; Muller, Lagrange and Weibel 1995). For specific studies see, for example, Brown et al. (1993), Chang and Tsai (1991), Isaacson and Ripple (1990), Joao (1995) and Painho (1995). Few pieces of work have taken a dataset at one single resolution and examined derived products in alternative generalizations of the data, but see the work of McMaster (1987). The research reported here is in this same vein.
It takes the viewshed as it is determined in one DEM at the largest resolution, and uses measurements of the 'quality' or 'accuracy' of the viewshed when it is determined from alternative derived DEMs at smaller resolutions. The purpose is to comment on the changing nature of the viewshed and the performance of the alternative generalization operators defined.

¹ Senior Lecturer, University of Leicester, Leicester, United Kingdom

THE VIEWSHED

The viewshed is one of the standard procedures included with GIS designed for analysis of elevation data. It is intended to distinguish locations which can be seen from a particular viewing location (in-view) from those which are out-of-view. A line-of-sight is drawn from the viewer to the viewed location, and if the terrain elevation rises above the elevation of the line-of-sight at any point along it then the viewed position at the end of the line is determined to be out-of-view. Otherwise it is in-view. Here the method of point-to-point determination across a rectangular lattice is used (Fisher 1993).

EXPERIMENTAL PROCEDURE

For the research reported here two Ordnance Survey DEMs from the United Kingdom were used: one of the Malvern Hills and one including the southeast part of Dartmoor. Each is a 20 x 20 km tile of the national elevation coverage of 1:50,000 gridded elevation data recording elevations at 50 m intervals. Therefore each is 401 x 401 values. Within each test area 100 random points were selected within an active area which excluded a 60-cell buffer from the edge of the tile. A region of 120 x 120 cells was then taken around each point. The 60,60 cell within each area was taken as the viewpoint and the viewshed determined. These are binary products which are referred to below as the original viewsheds. Generalizations were then made of the DEM itself, the original viewshed and the viewpoint, so that viewsheds could be determined.
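The line-of-sight test described under THE VIEWSHED can be illustrated with a minimal Python sketch. It is not the Turbo Pascal implementation used in the study, nor the specific point-to-point algorithm of Fisher (1993). The function names, the `observer_height` parameter and the nearest-cell sampling of terrain along the line are all assumptions made for illustration.

```python
def line_of_sight(dem, viewer, target, observer_height=0.0):
    """Return True if the target cell is in-view from the viewer cell.

    dem is a list of rows of elevations; viewer and target are (row, col).
    Terrain along the line is sampled at the nearest grid cell, which is an
    assumption of this sketch rather than the method used in the paper.
    """
    (r0, c0), (r1, c1) = viewer, target
    z0 = dem[r0][c0] + observer_height   # elevation of the eye
    z1 = dem[r1][c1]                     # elevation of the viewed cell
    steps = max(abs(r1 - r0), abs(c1 - c0))
    for i in range(1, steps):            # intermediate positions only
        t = i / steps
        r = round(r0 + t * (r1 - r0))
        c = round(c0 + t * (c1 - c0))
        los_z = z0 + t * (z1 - z0)       # height of the line-of-sight here
        if dem[r][c] > los_z:            # terrain rises above the line
            return False
    return True


def viewshed(dem, viewer, observer_height=0.0):
    """Binary viewshed: 1 = in-view, 0 = out-of-view, for every cell."""
    return [[1 if line_of_sight(dem, viewer, (r, c), observer_height) else 0
             for c in range(len(dem[0]))]
            for r in range(len(dem))]


# A flat 5 x 5 surface with one high cell hides the cell behind it.
dem = [[0] * 5 for _ in range(5)]
dem[2][3] = 10
vs = viewshed(dem, (2, 0))
```

In this toy example the high cell itself is in-view while the cell directly behind it is out-of-view. Different sampling or interpolation choices along the line can change the result for marginal cells.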
For simplicity the original DEMs are generalized so that pixels in the derived datasets map exactly to the original. The generalizations included 0.5, 0.33, 0.25, and 0.2 times reductions, giving DEM arrays of 60 x 60, 40 x 40, 30 x 30 and 24 x 24. The generalization procedures are discussed in the next section. Statistical comparisons include the bias and error of the viewshed area in the generalized DEMs, derived from a comparison with the area of the original viewshed, and the kappa coefficient of agreement between the viewshed in the generalized DEMs and a number of alternative generalizations of the original viewshed.

GENERALIZATION OPERATORS

Generalizing the DEM

It is easy to envisage many alternative approaches to the generalization of a gridded terrain dataset. The methods used here are entirely based on raster filtering. They do not include any surface modelling methods. Another approach is to resample using one of a number of well known algorithms, including nearest neighbour and bilinear interpolation. These methods, however, are generally reserved for the transformation of one gridded dataset to another where the orientation of the grid varies, as well as the size of the grid. Other approaches to generalizing the grid use a more regular geometric approach, and are investigated here. Two primary types of generalization are used. The first type searches for values in the original DEM on a regular spacing basis, while the others take summary statistics from within a kernel area.

Regular Spacing

The simplest method of generalization is to take a value at a regular spacing, m, so that the values in the derived DEM are every m-th value in x and y in the original DEM. If m = 2 then there are 4 possible realizations of this process: first $z_d = z_1$, then $z_d = z_2$, then $z_d = z_3$, and finally $z_d = z_4$ (where $z_1$, $z_2$, $z_3$ and $z_4$ are the alternative elevation values, and $z_d$ is the value in a realization of the generalization process).
If m = 5 then there are 25 different possible values of $z_d$.

Summary Statistics

1) Mean: Within the kernel area the most obvious summary statistic is the mean.

2) Maximum and Minimum: Two further generalization operators are the maximum and minimum values within the kernel.

3) Maximum Deviation from the Mean: Finally, the value of the elevation which is the largest deviation from the mean within the kernel is recorded.

Generalizing the Viewshed

For comparison with derived viewsheds at multiple levels of generalization it is necessary to have a generalization of the original viewshed. It should be recalled that the viewshed treated here is a binary phenomenon: any cell can only be treated as being in-view or out-of-view. Three different strategies are used:

1) All: A cell in the generalized dataset is taken to be in-view if all cells in the kernel are in-view. Otherwise it is out-of-view.

2) Majority: A cell in the generalized dataset is taken to be in-view if the majority of cells in the kernel are in-view. Otherwise it is out-of-view. Where there is an even number of cells, n, in the kernel the majority is taken as $n/2 + 1$.

3) Any: A cell in the generalized dataset is taken to be in-view if any cell in the kernel is in-view. Otherwise the generalized cell is out-of-view.

Generalizing the Viewpoint

Finally, the treatment of the viewpoint is considered. Viewshed algorithms treat the viewpoint very differently. Here, the cell considered to be the viewpoint in any generalization is the generalized cell which contains the viewpoint in the ungeneralized DEM. This approach has the advantage over others that the viewpoint is always at the centre of a cell in any dataset, and so coincides geometrically with a height value in the corresponding DEM. The DEM elevation at the viewpoint is always known as part of the generalized DEM and never inferred.
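The generalization operators described above (regular spacing, the kernel summary statistics, the three viewshed strategies and the viewpoint mapping) can be sketched in Python as follows. This is a minimal illustration, not the code used in the study. All function and dictionary names are hypothetical, grids are assumed to be lists of rows whose dimensions are exact multiples of m, and the tie-break in the maximum-deviation operator (first value encountered) is an assumption, since the text does not specify one.

```python
def kernel_cells(grid, row, col, m):
    """Values of the m x m block of the original grid that maps onto
    generalized cell (row, col)."""
    return [grid[row * m + i][col * m + j] for i in range(m) for j in range(m)]


def max_deviation(values):
    """Elevation in the kernel that deviates most from the kernel mean.
    Ties are resolved by taking the first such value (an assumption)."""
    mean = sum(values) / len(values)
    return max(values, key=lambda z: abs(z - mean))


def majority(values):
    """1 if more than half of the binary cells are in-view; for an even
    number of cells n this threshold is n/2 + 1, as in the text."""
    return 1 if sum(values) >= len(values) // 2 + 1 else 0


DEM_OPERATORS = {
    "mean": lambda v: sum(v) / len(v),
    "max": max,
    "min": min,
    "maxdev": max_deviation,
}

VIEWSHED_OPERATORS = {
    "all": lambda v: 1 if all(v) else 0,
    "majority": majority,
    "any": lambda v: 1 if any(v) else 0,
}


def generalize(grid, m, operator):
    """Apply a kernel operator to every non-overlapping m x m block."""
    rows, cols = len(grid) // m, len(grid[0]) // m
    return [[operator(kernel_cells(grid, r, c, m)) for c in range(cols)]
            for r in range(rows)]


def regular_spacing(grid, m, dr, dc):
    """One of the m * m realizations: every m-th value in both directions,
    starting at offset (dr, dc) with 0 <= dr, dc < m."""
    return [row[dc::m] for row in grid[dr::m]]


def generalize_viewpoint(viewpoint, m):
    """Generalized cell that contains the original viewpoint cell."""
    r, c = viewpoint
    return (r // m, c // m)


dem = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 100]]
vis = [[1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 1, 1], [1, 1, 0, 1]]
dem_max = generalize(dem, 2, DEM_OPERATORS["max"])
vis_all = generalize(vis, 2, VIEWSHED_OPERATORS["all"])
vis_major = generalize(vis, 2, VIEWSHED_OPERATORS["majority"])
```

Keeping the operators in dictionaries makes it straightforward to loop over every method at every reduction, which mirrors the structure of the experiment without claiming to reproduce it.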
Many possible realizations of the original viewshed correspond to the viewshed at a particular scale, and so results here must be regarded as provisional.

Implementation

The above approach was implemented, together with the viewshed algorithm, in Turbo Pascal 7.0. The sampling, generalization by all methods, generation of the viewshed and calculation of summary and comparative statistics of the viewsheds were all integrated. It ran on a Pentium PC which had been checked for faults on the processor. Idrisi v 4.1 was used in support of the analysis to view data at different resolutions, etc., and analysis was done in Excel 5.0 and using bespoke programs.

RESULTS

Generalizing the Original Viewshed

As noted above, three different versions of generalizing the original viewshed were derived. These are based on all, a majority and any of the pixels within the kernel being in view in the original. The results of this analysis can best be seen in Table 1, where the correlation between the area of the original and generalized viewsheds is reported together with the mean bias and error as a result of the generalization for both test areas. It can be seen that the correlation for the majority operator is always closest to 1.0, and that it has much the lowest values of bias and error at all levels of generalization. In short, the area of the viewshed derived by the majority operator is predictable, and is very similar to the original area, even over large resolution changes, while the values as a result of the any and all operators are not so predictable at the larger resolution changes. Furthermore, the values of the majority operator are very close to the values for the original.

Table 1. -- Bias and Error in generalizations of the Original Viewshed.
Dartmoor Generalizations Malvern Error Correl.
Bias 2.70 0.997 2.42 3.39 3.78 0.993 0.5 x Any 0.97 0.999 1.34 0.997 -1.19 -0.87 Major 2.79 0.993 -2.48 -3.60 4.06 0.985 All 5.98 6.58 0.983 4.50 4.97 0.993 0.33 x Any 0.09 0.43 0.999 0.24 0.999 Major -6.21 7.02 0.943 -4.45 5.04 0.972 All 7.10 0.988 6.46 8.36 9.12 0.971 0.25 x Any 0.63 0.998 0.77 0.997 -0.50 -0.56 Major 6.75 0.940 -5.96 -7.93 8.99 0.881 All 9.09 0.980 8.30 10.54 11.46 0.958 0.2 x Any 0.61 0.995 0.70 0.997 -0.26 -0.29 Major 7.92 0.896 -6.99 -9.13 10.39 0.803 All

The Area of the Viewshed

When the viewshed is determined over the generalized DEMs, there are a number of ways to compare it with the original viewshed. Two parameters are examined here: the area visible, and the kappa coefficient of agreement.

The area of the viewshed at different resolutions is easily determined, and in Table 2 the mean bias and error are given for both test areas at all four resolutions. In addition, the correlation coefficients between the two areas are reported. The "best" generalization is that which yields the smallest bias and error and the largest correlation.

Table 2 is divided by the amount of and methods of generalization. The values reported for regular spacings are the minimum area and the maximum area determined among the 4, 9, 16 and 25 possible realizations. There is considerable variation among these, as is reflected in the error measures which are always the largest and smallest of any other generalization. On the other hand, the correlation coefficients for these are on the whole both poorer than the best of the summary statistic values.

Among the summary statistics of the generalizations, 21 out of the 24 measures of accuracy show the maximum deviation from the mean to give the best performance. The correlation coefficients and mean errors yielded by this method are always the best of those reported. The bias is sometimes smaller for some other method (in two cases the maximum within kernel and once the minimum within kernel).

Table 2.
-- Error measures for the area of the viewshed in generalizations of the viewshed as compared with the area in the original viewshed.

                                         Malvern                   Dartmoor
Generalizations                   Bias   Error  Correl.     Bias   Error  Correl.
0.5 x   regular spacing  Max      6.17    8.17   0.802      3.60    4.34   0.956
                         Min      1.14    3.32   0.913      0.47    1.82   0.965
        summary          Max      4.04    6.56   0.809      2.18    3.18   0.946
        statistics       Min      4.27    6.63   0.823      2.03    3.12   0.957
                         Mean     4.52    6.76   0.830      2.23    2.97   0.972
                         MaxDev   3.39    4.02   0.967      2.11    2.58   0.992
0.33 x  regular spacing  Max     11.39   13.68   0.706      7.52    8.75   0.871
                         Min      2.28    6.42   0.725      1.03    3.33   0.897
        summary          Max      6.85    9.34   0.768      4.50    6.05   0.874
        statistics       Min      8.51   11.25   0.688      4.57    6.59   0.880
                         Mean     7.87   10.10   0.768      4.74    6.17   0.910
                         MaxDev   6.60    8.08   0.884      4.08    4.85   0.979
0.25 x  regular spacing  Max     16.34   18.89   0.600     12.26   14.26   0.712
                         Min      3.34    7.42   0.673      1.76    4.88   0.812
        summary          Max      9.55   12.24   0.658      7.46    9.40   0.784
        statistics       Min     12.61   15.86   0.600      7.25    9.48   0.806
                         Mean    11.22   14.01   0.675      7.38    9.12   0.836
                         MaxDev   9.65   11.23   0.844      6.29    7.53   0.955
0.2 x   regular spacing  Max     20.75   23.23   0.526     16.60   18.87   0.648
                         Min      4.15    8.08   0.651      2.30    5.51   0.738
        summary          Max     12.45   15.88   0.542      9.78   12.20   0.705
        statistics       Min     15.83   18.96   0.580     10.26   12.62   0.685
                         Mean    14.00   17.12   0.607     10.14   12.38   0.706
                         MaxDev  12.77   14.60   0.790      8.97   10.57   0.927

The latter of these shows an enormous spread of values which tends to be more typical of the 0.2 generalizations than that yielded by the maximum deviation from the mean. The 0.5 generalizations are generally relatively well correlated, as is shown by the correlation coefficients, but this can rapidly deteriorate.

The Arrangement of the Viewshed

The kappa coefficient of agreement between two sets of data typically ranges from 0 to 1, where 1 reflects perfect correspondence in the arrangements and 0 agreement no better than chance (negative values indicate agreement worse than chance). It has been widely applied to examining the confusion matrix in remote sensing (Congalton et al., 1983). Here the same approach is used.
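For two binary viewsheds, the kappa coefficient can be computed from a 2 x 2 tabulation of in-view and out-of-view cells. The sketch below uses the standard Cohen's kappa formula, (observed agreement - chance agreement) / (1 - chance agreement). The function name and the convention of returning 1.0 when both grids are constant and identical (where kappa is otherwise undefined) are assumptions of this illustration, not details taken from the study.

```python
def kappa(reference, test):
    """Cohen's kappa for two equally sized binary grids
    (1 = in-view, 0 = out-of-view), given as lists of rows."""
    # 2 x 2 tabulation: table[a][b] counts cells with value a in the
    # reference grid and value b in the test grid.
    table = [[0, 0], [0, 0]]
    n = 0
    for ref_row, test_row in zip(reference, test):
        for a, b in zip(ref_row, test_row):
            table[a][b] += 1
            n += 1
    observed = (table[0][0] + table[1][1]) / n
    # Chance agreement from the row and column totals of the tabulation.
    expected = sum(
        (table[k][0] + table[k][1]) * (table[0][k] + table[1][k])
        for k in (0, 1)
    ) / (n * n)
    if expected == 1.0:
        # Both grids are constant and identical; kappa is undefined, so
        # 1.0 is returned by convention (an assumption of this sketch).
        return 1.0
    return (observed - expected) / (1.0 - expected)


identical = kappa([[1, 0], [0, 1]], [[1, 0], [0, 1]])
chance = kappa([[1, 1], [0, 0]], [[1, 0], [1, 0]])
opposite = kappa([[1, 0]], [[0, 1]])
```

In the setting of this paper the reference grid would be the generalized original viewshed and the test grid the viewshed determined over a generalized DEM, with the coefficient then summarized over the viewpoints in each study area.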
Kappa is determined for 2 x 2 tabulations which, for a particular resolution, compare the visibility of pixels in the generalized binary viewshed, as determined by the majority method, with that determined through the DEM when generalized by one of the methods under test. Summary results are reported in Table 3, where the maximum, minimum and mean values of the coefficient for the 100 viewpoints in each study area are reported. The maximum and minimum values report the best and the worst agreements between patterns in any situation. The best generalization method should yield the largest value of kappa at any reduction.

The results for regularly spaced samples report the agreements for the maximum and minimum areas in any single generalization test. The maximum viewshed is the best in 22 out of the 24 values reported. In other words, the largest viewshed generated by regular spacing from the kernel area yields the best agreement in the spatial pattern of the viewshed among all other generalizations of the DEM. On the other hand, the minimum area viewshed always yields the worst agreement. This method of generalization provides an envelope of all other values. Indeed, in the 0.2 generalizations of both the Malvern and Dartmoor test areas, regular spacing yielding the minimum visible area gives almost total disagreement with the original viewshed (0.054 and 0.020).

Table 3. -- Kappa coefficients of agreement between the generalized version of the original viewshed and the viewshed determined over the generalized DEM.
Generalizations Malvern Dartmoor Max Mean Min Max Mean Min 0.661 0.924 0.818 0.899 0.744 0.285 0.5 x regular Max 0.135 0.845 0.688 0.850 0.591 0.201 spacing Min summary Max statistics Min Mean MaxDev 0.33 x regular Max 0.886 0.694 0.235 0.877 0.772 0.480 0.162 0.805 0.465 0.106 0.796 0.540 spacing Min 0.868 0.607 0.864 0.666 0.201 0.182 summary Max statistics Min Mean MaxDev 0.25 x regular Max spacing Min summary Max statistics Min Mean MaxDev 0.238 0.833 0.555 0.174 0.827 0.634 0.2 x regular Max 0.020 0.613 0.320 0.711 0.288 0.054 spacing Min summary Max statistics Min Mean MaxDev

In 17 out of 24 summaries, the maximum deviation within kernel yields the best agreement. In 5 cases the mean is best, and the maximum and minimum within kernel are best once each. With 0.2 generalization the worst levels of agreement are again very poor (0.015 in the 0.2 generalization of the Malvern area by minimum within kernel). Both the mean and the best levels of agreement in all situations are, however, quite acceptable.

CONCLUSION

A number of conclusions can be drawn from the current work. For the viewshed it would appear that the generalization of the binary viewshed to other grid resolutions is best performed by determining those cells at the target resolution with the majority of visible cells. Other immediately apparent methods of generalization do not yield results which match the area and pattern of the original viewshed.

Generalization of the DEM yields many alternative possible viewsheds. Regular spacing of the kernel area yields both the best and worst estimates of the visible area. The result is therefore unpredictable, and although the method is fast and convenient it is not to be recommended as a basis of generalization. Among the statistical summaries of a kernel area, generalization of the DEM by determination of the maximum deviation within the kernel yields the viewshed which best reflects both the area and arrangement of the visible area.
The results are the most stable, and most frequently the best, for this method. It is therefore the method to be recommended (of those tested) in any situation where a DEM requires generalization, and the viewshed is the derived product which is uppermost in the investigator's interest. Finally it should be noted that in most cases neither the area nor the pattern of the viewshed is badly disrupted by generalization of the DEM, although it can be if an injudicious method of generalization is used.

As stated in the introduction, very little research is reported on propagating the effects of alternative generalizations of spatial databases. Research similar to that reported here can be envisaged for almost any spatial data and derived product. Indeed, it is crucial for very many applications that we know best how to generalize both categorical data (the binary viewshed) and continuous data (the DEM), and the possible consequences of generalization. While the results reported here seem quite conclusive, and are very likely to be generalizable to other areas, so long as the viewshed is of interest, there is absolutely no guarantee either that the recommended method of generalization should be the same for all derived products, or that there is not a better generalization operator for the viewshed.

ACKNOWLEDGEMENTS

I would particularly like to thank Jo Wood for some insightful suggestions in the course of this work.

REFERENCES

Brown, D.G., Bian, L., and Walsh, S.J., 1993. Response of a distributed watershed erosion model to variations in the input data aggregation levels. Computers & Geosciences 19, 499-509.

Buttenfield, B.P., and McMaster, R.B. (Editors) 1991. Map Generalization: Making Rules for Knowledge Representation. London: Longman.

Chang, K.-t., and Tsai, B.-w. 1991. The effect of DEM resolution on slope and aspect mapping. Cartography and Geographic Information Systems 18, 69-77.

Congalton, R.G., Oderwald, R.G., and Mead, R.A., 1983.
Assessing Landsat classification accuracy using discrete multivariate analysis statistical techniques. Photogrammetric Engineering and Remote Sensing 49, 79-87.

Fisher, P.F. 1993. Algorithm and Implementation Uncertainty in Viewshed Analysis. International Journal of Geographical Information Systems 7, 331-347.

Isaacson, D.L., and Ripple, W.J. 1990. Comparison of 7.5-minute and 1-degree digital elevation models. Photogrammetric Engineering and Remote Sensing 56, 1523-1527.

Joao, E.M. 1995. The importance of quantifying the effects of generalization. In GIS and Generalization: Methodology and Practice, edited by Muller, J.C., Lagrange, J.P. and Weibel, R. London: Taylor & Francis. pp. 183-193.

McMaster, R.B. 1987. The geometric properties of numerical simplification. Geographical Analysis 19, 330-346.

Muller, J.C., Lagrange, J.P. and Weibel, R. (Editors) 1995. GIS and Generalization: Methodology and Practice. London: Taylor & Francis. 257 pp.

Painho, M. 1995. The effects of generalization on attribute accuracy in natural resource maps. In GIS and Generalization: Methodology and Practice, edited by Muller, J.C., Lagrange, J.P. and Weibel, R. London: Taylor & Francis. pp. 194-206.