A Performance Measure for Validation of Geophysical Flow Simulation Laércio M. Namikawa, Chris Renschler, Byron Rupp, Marcus Bursik Abstract Geophysical flow presents particular challenges for validation of models simulating this phenomenon. A real event geophysical flow is geographically large scale, low recurrent, and dynamic. Fieldwork to collect data from the real event can only gather a small part of the whole information, usually presenting the results in a map with the flow footprint. In contrast with the real event data, a numerical simulation of a model emulating a geophysical flow provides a large amount of information, with values for velocities and flow heights at every step of time. The proposed method considers the characteristics of the real event and of the model simulation to generate a measure of performance based on logistic regression. The dependent variable in the regression is the probability of being inside the real event flow footprint. The independent variables are the flow height in the model simulation and a measure of distance based on the real event flow footprint. Test of the validity of the performance measure using four simulated flows provides results that are in accordance with a qualitative visual analysis. 1. Introduction Geophysical flow presents particular challenges for validation of models emulating this phenomenon due to its large scale, low recurrence, and dynamic nature. The large scale of the flow implies that the knowledge of variables is limited and the acquisition of field data is difficult, while the low recurrence means that the number of cases that can be studied is small. The dynamic nature of the phenomenon requires analysis of a series of data, each from different time instance. The objective of this paper is to present a method to generate a performance measure to be used to calibrate and validate a model of geophysical flow. The performance measure can replace the qualitative analysis widely used to compare output of a numerical simulation of geophysical flow with fieldwork data. The proposed measure considers the particular characteristics of the process under consideration, the simplifications of the simulated world, and the availability of field observations. The proposed comparison procedure will be divided in three phases: one to create a summary of simulation results; the second to define a map with distances to the real event footprint; and the quantitative analysis using statistics and spatial database approach. The simulation real event of a block and ash flow at the Volcán de Colima in April of 1991 triggered by an eruption, will be use test the validity of the proposed method. The data available are the flow footprint from the real event and model simulation results available as a series of adapted grids with flow height at various instances of time. Figure 1 presents the map with the flow mapped on the field. Figure 1. Volcan de Colima lava, block and ash flow. From (Rodriguezelizarraras, Siebe et al. 1991). 2. Literature Review The process of modeling requires comparison of the model result with observations of the modeled phenomenon in the real world to evaluate the model performance. The complexity of this step is often underestimated because modelers do not realize that here lies the culmination of all simplifications applied to the input data, the model, and the quantification of the observations. The framework provided by the scaling theory facilitate the description and analysis of the various scaling applied to both data and models; scaling in this context is the “transformation of information from one spatial and/or temporal scale to another” (Renschler 2003). Therefore, the comparison should consider the sequence of scaling presented in Figure 2, from process to measurement, to database, to modeling, to prediction, to assessment and to validation and measurement scales. Figure 2. Scaling theory, from (Renschler 2003). Comparison of outputs from models of dynamic processes is more difficult than for outputs of static processes. In nature, there is no static process, but when the model is applied, in static processes it is assumed that the output can be analyzed in only one particular time snapshot, allowing the collection of samples values that are unchanged between the input data collection and the comparison samples collection. Common comparison methods for dynamic processes involve either qualitative assessment or quantitative analysis in few key locations. A qualitative analysis method requires a 3–dimensional visualization tool, with a careful selection of visual attributes to achieve a optimum perception (Gahegan 1999). The difficulty with qualitative analysis is the subjectivity and the dependency with the response of the modeler. Quantitative analysis in few locations is a method commonly used in hydrological models. The key locations are located at the basin discharge point, where a summary of the process for the whole basin can be measured. For example, water runoff volume and hydrograph shapes can be used to compare results of a soil-hydrology-vegetation model for various Digital Elevation Models (DEMs) (Kenward, Lettenmaier et al. 2000). In erosion models there is a strong dependency on elevation data and the influence of resolution and quality of the DEM in the model outputs rely on measurement of sediment yield in water bodies at fixed stations, in the watershed outlet ((Renschler and Harbor 2002). Qualitative analysis of dynamic process models can also be executed comparing visually shapes of the affected areas, for example flow scars on the environment. The simulation of volcanic flows based on using parameters from a real event is used to create a map of the flow footprint that is compared to witness observations plotted on the same map ((Takahashi and Tsujimoto 2000); (Crisci, Rongo et al. 2004); (Itoh, Takahama et al. 2000). Quantitative comparison of spatially distributed attributes, such as the flow footprint and final deposit can be executed using regression analysis. Linear regression can be used and, the higher the coefficient of determination r2 between simulated output and the real event, the better the performance of the model is. However, the assumptions of linear regression should be satisfied ((Rogerson 2001): - Relationship between variables is linear; - Errors have mean zero and variance is constant; - Regression residuals are independent; and - Errors have a normal distribution around the regression line. The use of linear regression in not appropriate when the dependent variable has a saturation value ((Sutton, Roberts et al. 1997). In these cases, a logistic regression approach can be used, given that logistic regression is able to define the probability of occurrence of a categorical variable given a independent variable (Rogerson 2001). In environmental modeling, logistic regression is used to predict the probability of a population occurrence given the independent factors ((Bian and West 1997). The objective of the present work is to define a methodology for comparison of a model of dry avalanches results and field observations of real event avalanches. The Titan2D model is implemented as a parallel adaptive simulator using natural terrain description (Patra, Bauer et al. 2003). One of the validation sites for is at the Volcán de Colima, Mexico, which is one of the most active volcanoes in the world. An eruption on April 1991 triggered an avalanche in the south-southwest portion of the volcano. Detailed observations were recorded by (Rodriguezelizarraras, Siebe et al. 1991), including the map presented in Figure 1. The output of the simulation is prepared for 3 dimension visualization, and not suited for a quantitative analysis, requiring some manipulation for the comparison. Given the characteristics of the simulation, the fieldwork data, and the scaling framework, a regression analysis is the most adequate comparison method. 3. Methods The proposed comparison procedure will be divided in three phases: creation of a summary of simulation results; definition of a map with distances to the real event footprint; and the quantitative analysis using statistics and spatial database approach. 3.1. Summary of Simulation Results Titan2D numerical simulation generates files that are appropriate for dynamic, interactive visualization. Every output file correspond to the state of the flow at the selected instant, therefore it contains a snapshot of the flow at a particular time. For the performance measure, a summary needs to be used, allowing comparison with field data related to the flow path. The summary consists of maximum flow height at every location inside the study region for all outputs of the simulation. The following pseudo-code describes the algorithm used to create the summary: Create GRID with cells equal to zero; For each file: Build irregular mesh; For each mesh element: If rectangular element: For each cell inside element: Calculate pile height using linear interpolation; Cell value is the highest between calculated and previous. If triangular element: For each cell inside element: Calculate pile height using linear interpolation; Cell value is the highest between calculated and previous. The resulting file contains data in a regular grid that can be inserted in a geographic Information System (GIS) where it will be compared with the field survey flow map. 3.2. Distances to Real Event Flow Footprint The map of distance to the real event footprint is used as representations of the field flow. The distance data and the fieldwork flow map are used in the comparison. The following steps are required to create the distance information: 3.2.1. Scan and Georeference The Flow Map The existing fieldwork map is scanned and imported into GIS, where it is georeferenced if it is not already. 3.2.2. Digitize Flow Polygons The flow footprint presented in the scanned map needs to be transformed from an image representation into a vector representation. A manual digitalization on the image representation is used. The resulting data is a polygon extending over the flow footprint. 3.2.3. Create Skeleton Lines Delaunay triangulation is used to generate the medial axis line of the flow polygon, sometimes called skeleton lines (van der Poorten and Jones 2002). A purpose built program can be used, but GIS tools that handle DEM in Triangular Irregular Network (TIN) representation are used here. The resulting data for the Colima event, with the digitized flow polygon, the center line of the polygon defined by Delaunay triangulation, and the triangulation is presented in Figure 3. Figure 3. Flow polygon, with center line defined using Delaunay triangulation. 3.2.4. Create Distance to Flow Footprint The flow map from the fieldwork does not contain the flow height values that could be used in a comparison with the simulation data. A strong correlation of the pile height with the distance to the center of the flow footprint and to the distance to the edges of the footprint is assumed. Also in this assumption, it is considered that the footprint from the real flow represented on the map excluded flows that were too small to be distinguished. The measures of distance are executed using the GIS analysis operation of buffering, where the result is a regular grid of distances to the selected feature. Distances are measured in the regular grid space, implying that the distances are measure in multiples of one and squared root of two. The selection of the grid resolution takes into account the scale of the simulated event and the scale of the available maps. The flow map from the Colima event, presented in Figure1, does not have any indication of scale, but given the size of the printed map, it was assumed that the scale was 1:25000. For this scale, accuracy is not better than 5 meters. The base map used has a 1:50000 scale, which implies that the accuracy is no better than 10 meters. The DEM used for the numerical simulation has a distance between elevation posts of 3 arc-seconds, which for the latitude of the region (n 23o 30’) is approximately 90 meters. Given these considerations, the grid size selected for measuring distances was 15 meters, which is a compromise between the worst map and the DEM resolution. In addition to outward distances from the center of the flow and the edges of flow, a composite pseudo distance measure was created by adding the two previous distances. The purpose of this measure is to obtain a quantity that incorporates the information about the width of the flow footprint. The pseudo distance is obtained by a simple addition of the distance to center and the distance to the edges of the real event flow footprint. Figure 4 presents the distance data, with shades of grey representing the increasing distance from the center and from the edges of the flow footprint. Figure 4. Flow polygon, with the combined pseudo distance to the flow. 3.3. Definition of the analysis units. Statistical analysis of the available data from the numerical simulation and the field work data provides the quantitative measure of performance. The principal characteristic of spatial data is the autocorrelation: values of nearby locations are strongly associated. In addition to the autocorrelation, the amount of available data in a regular grid, usually in excess of 100000, suggests that an aggregation unit should be used instead of the individual values on the regular grids. The problem of defining the most adequate aggregation unit shape and size must consider the available information about the analyzed data. The selected shape for this study is rectangular, due to its easier implementation when compared to other alternatives, such as image segmentation. The selection of the size can be based on the consideration of the manageable amount of data and the knowledge about the simulated process and input data. For the study case, the DEM used is a 90-meter resolution data, and using a squared shaped unit with 100 meter side, the total number of units for the study area is 2352, which is considered to be a manageable amount of information. The units are created for the study area and attributes are associated to each unit. For each unit, the following attributes are attached: -Average outward distance to the center of the flow; -Average outward distance to the edges of flow; -Average composite pseudo distance measure created by adding the previous distances; -Average flow height from model numerical simulation; and -Unit inside or outside flow footprint surveyed in the fieldwork. Figure 5 presents the analysis units edges, with a sample unit attributes displayed. Figure 5. Analysis units, with attributes for a sample unit. 4. Quantitative analysis. Quantitative analysis is executed using regression analysis. The assumption used in this study is that the performance of the numerical simulation can be assessed using the correlation of the simulated pile heights and the distances derived from fieldwork flow footprint. The following performance measures hypotheses were initially considered: Smaller flow piles at increasing distances from the flow footprint indicate better simulation performance. The linear regression model, fitted considering the pile height as dependent variable and one of the distances as the independent variable, in this assumption will present the highest decrease ratio in the simulation in the best performance simulation; High flow piles are less likely to be outside the flow footprint and smaller pile heights outside the footprint indicate better simulation performance. The logistic regression model, fitted considering the probability of being inside the flow footprint as the dependent variable and the flow height from the numerical simulation as the independent variable, in this assumption will present the highest probability for a given flow height in the simulation with the best performance. The hypotheses require the existence of simulation results that have been previously classified accordingly to the numerical simulation performance. Test cases ordered from best to worst based on comparison by visual analysis with the real event flow map were defined and are described in the next section. 4.1. Test cases. Two synthetic simulation data sets were created based on the flow footprint and on one model simulation output. The synthetic data considered the best was obtained by assuming that the flow height is zero at 300 meters from the edges of the flow footprint and increases towards the flow. Data from a simulation result was used to define areas with higher flow height with the objective of capturing some of the numerical simulation characteristics. The simulated contour lines are shown in Figure 6, and they were used to generate a regular grid using an inverse distance interpolation. Figure 6. Simulated contour lines used to create the best simulation. The other synthetic simulation data, which is considered as the second best for the performance analysis, was obtained in a similar way as the best result. The difference between them is that the second best has a shorter flow footprint. It was considered that the flow was approximately 2000 meters shorter than the flow for the best case. The contour lines for the second best case are shown in Figure 7, and the corresponding regular grid was created using the same inverse distance interpolation method of the best case data. Figure 7. Simulated contour lines used to create the second best simulation. Model numerical simulation of flow of the Colima event was used to define the third best and the worst test cases. The simulation input parameters were the same except for the displaced volume of material, which was 0.15 million cubic meters fro the third best case and 0.8 million cubic meters of material for the worst case. The output from the larger volume is considered worst because it spreads out farther from the fieldwork surveyed flow footprint in a visual analysis. Figure 8 presents all test cases. (a) (c) (b) (d) Figure 8. Test cases: (a) Best; (b) Second best; (c) Third best; and (d) Worst. 4.2. Preliminary statistical analysis. For each of the analysis units attributes associated based on distances to the flow, simulation flow height, and if the unit is inside or outside flow footprint surveyed in the fieldwork were analyzed to define if some transformation was required. An initial analysis using scatter plot between flow height and distances indicate that the relationship between them is not linear, which is a requirement for linear regression. The solution for this case is to apply a natural logarithm to both variables, yielding the natural logarithm of the pile height and the natural logarithm of the distances. Figure 9 presents the scatter plot using the simulation flow height (identified as PILEHGTM, and PILEHLNM) and the pseudo distance to the flow footprint (identified as DISTCNED and DSTLCNED) after and before the logarithmic transformation. (a) (b) Figure 9. Scatter plot of flow height and pseudo distance: (a) Before ; (b) After logarithmic transformation. 4.3. Hypotheses tests. The hypotheses that smaller flow piles at increasing distances from the flow footprint indicate better simulation performance and that high flow piles heights are less likely to be outside the flow footprint were tested. Based on the results of the tests, performance measure can be defined. 4.3.1. Linear regression model The first hypothesis is that smaller flow piles at increasing distances from the flow footprint indicate better simulation performance. To test the hypothesis, a linear regression model should be fitted considering the flow height as dependent variable and one of the distances as the independent variable. The slope of the fitted regression model is the performance indicator, with rapid decrease in flow height for increasing distances from the flow footprint indicating a better performance than a slow decrease. The choice of the best distance measure is defined by a stepwise selection method, where the different distance measurements are added to the regression, and eliminated if they are not significant to the regression. For each added independent variable, the variables added in the previous steps are checked for significance and eliminated if they are not significant anymore. The stepwise selection of variables for the regression model of the four test cases define that the significant measure of distance is the pseudo distance, obtained by adding the distance from the center and the distance from the edges of the flow footprint. Table 1 presents the selected regression model for each of the four test cases. Table 1. Linear regression models for the test cases. R Test Case Best 0.608 Second 0.562 Third 0. 434 Worst 0.374 Variable Constant Pseudo Distance Constant Pseudo Distance Constant Pseudo Distance Constant Pseudo Distance B 2.698 -0.781 2.669 -0.761 0.144 -0.277 1.007 -0.285 Std. Error 0.236 0.050 0.287 0.060 0.153 0.030 0.139 0.024 t 11.411 -15.526 9.302 -12.581 0.943 -9.215 7.267 -12.105 Sig. 0.000 0.000 0.000 0.000 0.346 0.000 0.000 0.000 The regression model slope behaves as expected for cases one through three, but the worst test case presents a better slope value than third best case. A enhanced assessment of the regression models can be done by using the linear regression equations. Figure 10 presents the flow height calculated using the regression fitted to each of the test cases. Pile Height Estimated by Linear Regression 1.4 1.2 Best 0.8 2nd 3rd 0.6 Worst 0.4 0.2 0 Distance (m) 45 0 40 0 35 0 30 0 25 0 20 0 15 0 10 50 0 0 Pile Height (m) 1 Figure 10. Flow pile height estimated for each of the test cases. The analysis of the regression equations reveals that the right order of performance is achieved only for distances greater than 180 meters. Another shortcoming of using this regression is that the difference between the best and the second best cases is too small. 4.3.2. Logistic regression model The second hypothesis is that high flow piles heights are less likely to be outside the flow footprint. In the best simulation, flow heights outside the footprint are the smallest, and inside are the largest. Based on this hypothesis, a logistic regression model, fitted considering the probability of being inside the flow footprint as the dependent variable and the pile height from the numerical simulation as the independent variable, could provide a performance measure. In this study, the flow pile height for a 0.5 probability of being inside flow path is selected as the model performance measure. Accordingly to the hypothesis, the best performance simulation will have the highest pile height value. The table presents the logistic regression model for each of the four test cases, with their respective pile height for probability 0.5 of being inside the flow footprint. Table 2. Linear regression models for the test cases. Test Case Best Second Third Worst Variable B Sig. Constant Pseudo Distance Constant Pseudo Distance Constant Pseudo Distance Constant Pseudo Distance -0.88141 0.829751 -0.72537 0.753221 -0.31838 0.940915 -2.05608 0.853884 3.24E-09 6.77E-11 1.92E-06 1.4E-09 0.209302 4.41E-06 1.07E-63 3.31E-10 Exp(B) 0.414197 2.292747 0.484147 2.12383 0.727324 2.562326 0.127955 2.348752 Pile Height (p=0.5) 2.892909 2.619593 1.402667 11.11071 The analysis of the pile height for probability 0.5 indicates a decrease as expected for the best, second, and third cases, but an increase for the fourth case. The result shows that this regression is presenting a strong relation with the total flow volume and no information about the flow morphology is considered. By including one of the distances to the flow footprint as an additional independent variable to the logistic regression model, the morphology of the flow can be included. 4.3.3. Logistic regression model with distance The logistic regression including the distance to the center of the flow presents a regression model where the distance is not significant for the best and for the second best cases, but are significant for the third best and the worst cases. Using the distance to flow footprint edge in the logistic regression model generates a model where the pile heights are not significant for the regression in the best, the third best, and the worst case. The pseudo distance, when included in the logistic regression model, is significant for the logistic regression model in all four test cases. The flow height is also significant in these regression models. Therefore, the added distance measure is used with the pile height to predict the probability of being inside the flow footprint. Table 3 presents the logistic regression model for each of the four test cases. Table 2. Linear regression models for the test cases. Test Case Best Second Third Worst Variable Constant Pseudo Distance Flow Pile Height Constant Pseudo Distance Flow Pile Height Constant Pseudo Distance Flow Pile Height Constant Pseudo Distance Flow Pile Height B 0.962 -0.441 0.687 1.878 -0.611 0.562 3.353 -0.899 0.478 3.952 -1.233 0.464 Sig. 0.226 0.019 0.000 0.023 0.002 0.000 0.000 0.000 0.027 0.000 0.000 0.002 Exp(B) 2.617 0.644 1.988 6.540 0.543 1.755 28.574 0.407 1.613 52.062 0.291 1.590 The analysis of the logistic regression model shows that the slope for flow height decreases from the best to the worst cases. In a similar way, the slope for the pseudo distance decreases, but this result is expected given that this measure was generated from the flow footprint. These results indicate that the slope of the flow height in the logistic regression can be selected as the measure of simulation performance. A better assessment can be obtained by plotting the distances and piles height for a 50% probability of being inside the flow footprint. Figure 11 shows the probability for constant distances. Pile Height at 50% Probability of Being Inside Flow 9 Natural Log of Pile Height (m) 8 7 6 5 Best 4 2nd 3 3rd 2 Worst 1 0 -1 45 0 40 0 35 0 30 0 25 0 20 0 15 0 10 0 50 0 -2 Distance (m) Figure 11. Flow height for a 50% probability of being inside the flow. The probability plot shows that for any pseudo distance greater than 100 meters, the flow pile heights required for a simulation to be inside the fieldwork flow footprint increases from the best to the worst. Given that the pseudo distance is the sum of the distance to the center and the distance to the edge of the flow footprint, the 100 meters is in the majority of the area either within 50 meters of the flow footprint, or inside the footprint. For a distance measure equal to 200 meters, the probability of being inside the flow for pile heights between 0 and 20 meters high is presented in Figure 12. Probability of Being Inside Flow at Distance Measure 200 0.7 0.6 Probability 0.5 Best 0.4 2nd 3rd 0.3 Worst 0.2 0.1 0 0 2 4 6 8 10 12 14 16 18 Pile Height (m) Figure 12. Probability of a flow pile being inside the flow at pseudo distance 200. The assessment of the logistic regression can also be obtained by plotting the distances for a 50% probability of being inside the flow footprint for a constant pile height. Figure 13 shows the probability for constant pile heights. Distance at 50% Probability of Being Inside Flow 8 Natural Log of Distance (m) 7 6 5 Best 2nd 4 3rd Worst 3 2 1 18 16 14 12 10 8 6 4 Pile Height (m) 2 Pi le 0 Figure 13. Pseudo distance for a 50% probability of being inside the flow. The distance required for a 50% probability of being inside the flow increases from the worst to the best for pile heights that are greater than 6 meters. Greater distances are required for the best case for piles of the same height when compared to the worst case. For a pile height equal to 10 meters, the probability of being inside the flow is presented in the Figure 14. Probability of Being Inside Flow for a 10 Meter Pile 0.9 0.8 0.7 Probability 0.6 Best 0.5 2nd 0.4 3rd Worst 0.3 0.2 0.1 0 0 50 100 150 200 250 300 350 400 450 Distance (m) Figure 14. Probability of a 10 meter high flow pile being inside the flow. 5. Conclusions The assessment of the performance of a simulation should consider the characteristics that are unique for the phenomenon under study. In an ideal situation, simulation results can be compared to the phenomenon data. For this ideal case, a global measure, such as root mean square of the squared differences can be used as a measure of the model performance. In some other cases, the comparison can be executed on one location only and the performance can be measured by the difference between the simulation result value at that location and the model output. Dynamic phenomena models pose a larger challenge for performance measure, by requiring comparison not only of phenomenon values but also of the difference in time between the occurrences. When the dynamic phenomenon is distributed in a geographical scale area, the challenge is even larger. In these cases, a proper analysis of the data in hand from the real phenomenon and from the model simulation output is required. Geophysical mass flow simulation poses a great challenge for performance measurement. In addition to being a dynamic phenomenon and with large geographical area extents, the quality of information from the real world event is poor. Furthermore, quantified information of the real event is inexistent. Due to these difficulties, the performance measure of the simulation models is rudimentary, based only on subjective visual analysis. In this study, a method of performance measure for geophysical mass flows has been proposed, providing a quantitative measure, contrasting existing approaches for these types of phenomena where only qualitative analysis is made for performance assessment. From the acknowledgement that data from the real event is weak, logistic regression analysis, using the real event flow footprint, distances derived from the footprint, and a summary of the model simulation results, was selected to provide the performance measure. From the logistic regression model, the slope of the simulation pile height is used as the performance measure. The test of the methodology using two simulation results created from the real event flow footprint to represent the best results, and two simulation outputs from the numerical simulation proved that the proposed method provides quantitative performance measures that are coherent with visual analysis. References Bian, L. and E. West (1997). "GIS modeling of elk calving habitat in a prairie environment with statistics." Photogrammetric Engineering and Remote Sensing 63(2): 161-167. Crisci, G. M., R. Rongo, et al. (2004). "The simulation model SCIARA: the 1991 and 2001 lava flows at Mount Etna." Journal of Volcanology and Geothermal Research In Press, Corrected Proof. Gahegan, M. (1999). "Four barriers to the development of effective exploratory visualisation tools for the geosciences." International Journal of Geographical Information Science 13(4): 289-309. Itoh, H., J. Takahama, et al. (2000). "Hazard estimation of the possible pyroclastic flow disasters using numerical simulation related to the 1994 activity at Merapi Volcano." Journal of Volcanology and Geothermal Research 100(1-4): 503-516. Kenward, T., D. P. Lettenmaier, et al. (2000). "Effects of digital elevation model accuracy on hydrologic predictions." Remote Sensing of Environment 74(3): 432444. Patra, A. K., A. C. Bauer, et al. (2003). "Parallel Adaptative Numerical Simulation of Dry Avalanches over Natural Terrain." Journal for Volcanology and Geothermal Research Submited. Renschler, C. S. (2003). "Designing geo-spatial interfaces to scale process models: the GeoWEPP approach." Hydrological Processes 17(5): 1005-1017. Renschler, C. S. and J. Harbor (2002). "Soil erosion assessment tools from point to regional scales-the role of geomorphologists in land management research and implementation." Geomorphology 47(2-4): 189-209. Rodriguezelizarraras, S., C. Siebe, et al. (1991). "Field Observations of Pristine Block-Flow and Ash-Flow Deposits Emplaced April 16-17, 1991 at Volcan-DeColima, Mexico." Journal of Volcanology and Geothermal Research 48(3-4): 399412. Rogerson, P. A. (2001). Statistical Methods for Geography. London, Sage Publications. Sutton, P., C. Roberts, et al. (1997). "A comparison of nighttime satellite imagery and population density for the continental united states." Photogrammetric Engineering and Remote Sensing 63(11): 1303-1313. Takahashi, T. and H. Tsujimoto (2000). "A mechanical model for Merapi-type pyroclastic flow." Journal of Volcanology and Geothermal Research 98(1-4): 91115. van der Poorten, P. M. and C. B. Jones (2002). "Characterisation and generalisation of cartographic lines using Delaunay triangulation." International Journal of Geographical Information Science 16(8): 773-794.