Nearest-neighbor analysis and karst geomorphology: an introduction to spatial statistics

Richard L. Ford
Department of Geosciences
Weber State University
2507 University Circle
Ogden, UT 84408-2507
rford@weber.edu
801-626-6942 (voice)
801-626-7445 (fax)

ABSTRACT

This poster outlines an exercise I use in an upper-division geomorphology course to introduce students to nearest-neighbor analysis, a basic technique in spatial statistics. Nearest-neighbor analysis compares the observed average distance between points and their nearest neighbors with the expected average nearest-neighbor distance in a random pattern of points. The pattern of points on a map or 2-D graph can be classified into three categories: CLUSTERED, RANDOM, and REGULAR. Nearest-neighbor analysis provides an objective method for distinguishing among these possible spatial distributions. The technique also produces a population statistic, the nearest-neighbor index, which can be compared from area to area. Geoscience applications include the analysis of the spatial distribution of karst sinkholes, drumlins, volcanic centers, cirques, river-basin outlets, fossils on bedding planes, and crystals in polished slabs. The technique is also useful in characterizing the distribution of data points or sample locations. In general, nearest-neighbor analysis can be applied to any geoscience phenomenon or feature whose spatial distribution can be categorized as a point pattern. The basic distance data can come from topographic maps, aerial photographs, or field measurements. The exercise presented in this poster applies this technique to the study of karst landforms on topographic maps, specifically the spatial distribution of sinkholes, and draws heavily on the karst studies of McConnell and Horn (1972) and Williams (1972). The procedures and formulae used are those outlined by Davis (1973). Students are often surprised at how common a random distribution of sinkholes is within karst areas.
Of course, if a point pattern is found to be non-random, that is, clustered or regular, then other possible geologic controls need to be investigated, fracture patterns for example. The advantages of introducing nearest-neighbor analysis in an undergraduate lab are that: (1) it reinforces important concepts related to data collection (e.g. significant figures), map use (e.g. scale and the UTM grid), and basic statistics (e.g. hypothesis testing); (2) the necessary calculations are easily handled by most students; and (3) once learned, the technique can be widely applied in geoscience problem-solving.

Ford, R.L., 2003, Nearest-neighbor analysis and karst geomorphology: an introduction to spatial statistics: Geological Society of America Abstracts With Programs, v. 35, no. 6, p. 46.

INTRODUCTION

The scientific study of landforms at the Earth’s surface, the purview of geomorphologists, has become increasingly quantitative over the last 50 years or so. This trend has given rise to a variety of measurement and mathematical-analysis techniques collectively known as geomorphometry. The rise of the geomorphometric approach to landform study has been significant because it allows for rigorous statistical analyses and formal testing of hypotheses. Evans (1981) recognizes two basic strategies available to a geomorphologist as he or she attempts to quantify the form and complexity of the Earth’s surface: (1) specific geomorphometry measures the size, shape, and spatial organization of individual landforms, whereas (2) general geomorphometry treats landscapes as continuous, rough surfaces that can be described by attributes (e.g. elevation, slope, aspect) at sample points across the area. Nearest-neighbor analysis is an example of specific geomorphometry.

NEAREST-NEIGHBOR ANALYSIS

The spatial distribution of many geologic features can be treated as a pattern of points on a map or 2-D graph.
Such a distribution of points can be classified into three end-member categories: CLUSTERED, RANDOM, and REGULAR (Fig. 1). Nearest-neighbor analysis provides an objective method for distinguishing among these spatial distributions. This is done by comparing the average observed distance between features and their nearest neighbors to the expected (theoretical) average nearest-neighbor distance in a distribution generated by a random process.

PROCEDURE

The following description of the steps and formulae used to conduct a nearest-neighbor analysis is abstracted from Davis (1973) and Ebdon (1985). Ebdon’s (1985) discussion of hypothesis testing (one- and two-tailed tests) is particularly helpful.

STEP 1. Using maps, aerial photographs, or field measurements, delineate the distribution of the features of interest as a pattern of points.

STEP 2. Select a representative study area within the total population of points. This study area is generally a square or quadrat large enough to enclose 30 or more points.

STEP 3. Number each feature within the study area and measure the distance to each feature’s nearest neighbor. Points inside the study area may have nearest neighbors just outside the study area; measure these distances rather than distances to nearest neighbors within the study area. In some cases two points will form a reflexive pair, in which each is the nearest neighbor of the other. In that case the nearest-neighbor distance is recorded twice, so that the number of nearest-neighbor distances equals the number of points. This step is a good place to review the concept of map scale by having the students convert all their map distances (in cm or inches) to ground distances (in meters or feet) using the map’s fractional scale. It is also appropriate at this point to review the concept of significant figures. Vacher (1998) provides an excellent review of significant figures for geoscience students.

STEP 4.
Calculate the mean of the measured or observed nearest-neighbor distances ($d_{obs}$).

STEP 5. Calculate the density ($\rho = N / A$) of points within the study area: the number of points ($N$) divided by the area ($A$). Note: the area must be determined using the same linear unit used to measure the nearest-neighbor distances (e.g. if distances are measured in ft, area must be in ft$^2$). Mismatched units are a common mistake made by students as they first learn the technique.

STEP 6. Compare the observed mean nearest-neighbor distance ($d_{obs}$) to the expected values for the various types of distributions. These expected values depend on the density of points ($\rho$) within the study area.
• Random: the expected mean nearest-neighbor distance is given by: $d_{ran} = 1 / (2\sqrt{\rho})$
• Clustered: in the most extreme case the expected mean nearest-neighbor distance will be zero.
• Uniform: the mean distance between nearest neighbors will be maximized in a hexagonal pattern where each point has six equidistant nearest neighbors. In this case the expected mean nearest-neighbor distance is given by: $d_{uni} = 1.0745 / \sqrt{\rho}$

STEP 7. The comparison mentioned in step 6 is best accomplished by calculating the nearest-neighbor index ($R$), given by: $R = d_{obs} / d_{ran}$
If the mean of the measured nearest-neighbor distances ($d_{obs}$) approaches zero, then the nearest-neighbor index ($R$) will approach 0.0, indicating tight clustering within the point distribution. If the mean of the nearest-neighbor distances ($d_{obs}$) approaches the theoretical maximum value ($d_{uni}$), then the nearest-neighbor index ($R$) will approach 2.15 (1.0745 / 0.5), indicating a uniform distribution of points. For a random point distribution the expected value of the nearest-neighbor index ($R$) is 1.0.

STEP 8. It is possible to formally test the significance of the calculated nearest-neighbor index using a normal curve.
The expected standard error of the mean nearest-neighbor distance is analogous to the ordinary standard error of the mean and is calculated using the formula below. The constant in the numerator is derived from considerations of the radius of a unit circle and the Poisson probability model.
$SE_d = 0.26136 / \sqrt{N \rho}$
where $N$ is the number of points and $\rho$ is the density of points. The null and alternate hypotheses are:
$H_0$: the points are randomly distributed ($R = 1.0$).
$H_1$: the points are not randomly distributed ($R \neq 1.0$).
The test statistic is similar to a z-score associated with the standard normal distribution:
$z = (d_{obs} - d_{ran}) / SE_d$
Tables of the standard normal distribution may be used to determine if the z-score is significantly different from zero, the expected z-score for a random distribution. For a significance level of 5% (α = 0.05) in a two-tailed test, the critical regions are z ≥ 1.96 and z ≤ -1.96; if z falls in either region, the null hypothesis is rejected. A z-score ≥ 1.96 suggests a uniform distribution of points, whereas a z-score ≤ -1.96 indicates a strong tendency toward clustering in the distribution. If -1.96 < z < 1.96, you would fail to reject the null hypothesis, and the pattern is treated as random. Ebdon (1985) suggests that a one-tailed test should be used if the direction of departure from random is specified, either towards clustering (negative values of z) or uniformity (positive values of z). A two-tailed test is appropriate if the test is simply being used to determine if a pattern is random.

GEOSCIENCE APPLICATIONS

Nearest-neighbor analysis was first developed by plant ecologists (Clark and Evans, 1954) to analyze the spatial distribution of various plant species. Since that time, the method has seen wide application in geography but lesser use by geoscientists (Davis, 1973). Haggett (1965) provides an introduction to the geographic applications.
Table 1, compiled from Jarvis’ (1981) literature review and augmented by Ford and Williams’ (1989) review and a recent GeoRef search, lists some of the geologic/geomorphic features whose spatial distribution has been investigated using nearest-neighbor analysis. Of course, if a point pattern is found to be non-random, that is, clustered or regular, this finding may shed light on geologic processes that are influencing the spatial distribution of features. Nearest-neighbor analysis also produces a population statistic, the nearest-neighbor index, which can be compared from area to area. This may help to elucidate how the relative importance of different geologic processes changes from place to place.

SINKHOLE KARST OF THE MITCHELL PLAIN, SOUTHERN INDIANA

Amalie Orme (Cal State – Northridge) first called my attention to the work of McConnell and Horn (1972) on the karst of southern Indiana and the potential this work could have in teaching undergraduate geomorphology. McConnell and Horn (1972) used quadrat methods to analyze the distribution of sinkholes on the Mitchell Plain. In this assignment, I ask students to carry out a similar analysis using nearest-neighbor procedures.

The Mitchell Plain is one of two well-developed areas of karst in southern Indiana, the other being the Muscatatuck Plateau (Hasenmueller and others, 2000). The Mitchell Plain is a broad karst plateau underlain by limestones and dolomites of the Sanders and Blue River Groups (Mississippian). The Mississippian and Pennsylvanian formations in this area dip to the west, from the Cincinnati Arch toward the Illinois Basin, forming an alternating series of sandstone uplands and carbonate plains/plateaus. West of the Mitchell Plain, the relatively insoluble Upper Mississippian to Lower Pennsylvanian sandstones form the Crawford Upland.
Likewise, the insoluble shale and siltstone of the Borden Group (Mississippian), which underlies the carbonates of the Sanders and Blue River Groups, form the Norman Upland east of the Mitchell Plain.

WORKED EXAMPLE

The Corydon West (Indiana) 7.5-minute quadrangle covers a portion of the Mitchell Plain near the boundary between Indiana and Kentucky. The Springville escarpment, which marks the physiographic boundary between the Mitchell Plain and the Crawford Upland to the west, is readily seen on this map. A 1-km$^2$ study area was randomly selected and 44 sinkholes were identified within the area. The nearest-neighbor distance for each sinkhole was measured and the various nearest-neighbor calculations are given below:
• Density: $\rho$ = 44 sinkholes / km$^2$ = $4.4 \times 10^{-5}$ sinkholes / m$^2$
• $d_{obs}$ = 97.5 m
• $d_{uni} = 1.0745 / \sqrt{\rho} = 1.0745 / \sqrt{4.4 \times 10^{-5}\ \mathrm{m}^{-2}}$ = 162 m
• $d_{ran} = 1 / (2\sqrt{\rho}) = 1 / (2\sqrt{4.4 \times 10^{-5}\ \mathrm{m}^{-2}})$ = 75.4 m
• $R = d_{obs} / d_{ran}$ = 97.5 m / 75.4 m = 1.29
The observed mean nearest-neighbor distance (97.5 m) is greater than that expected (75.4 m) if the sinkholes were randomly distributed. The nearest-neighbor index is thus greater than 1.0; this suggests the distribution is somewhat uniform. Is this difference statistically significant?
• $H_0$: $R \le 1.0$; the sinkholes are randomly distributed.
• $H_1$: $R > 1.0$; the sinkholes are uniformly distributed.
• $SE_d = 0.26136 / \sqrt{N \rho} = 0.26136 / \sqrt{44 \times 4.4 \times 10^{-5}\ \mathrm{m}^{-2}}$ = 5.94 m
• $z = (d_{obs} - d_{ran}) / SE_d$ = (97.5 m - 75.4 m) / 5.94 m = 3.72
• Critical value = 1.645 (one-tailed test, significance level 0.05)
Because 3.72 exceeds the critical value of 1.645, the null hypothesis can be rejected at the 0.05 level; the distribution of sinkholes can be considered “significantly uniform”. Indeed, there is a suggestion of a NW-SE alignment of many of the sinkholes in this area. A possible explanation is that a northwest-southeast-trending fracture set is controlling, in part, the distribution of sinkholes in this area.
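The arithmetic of the worked example can be checked with a short script. The following is a minimal Python sketch of STEPS 5 through 8, assuming the mean observed distance has already been measured by hand; the function name nearest_neighbor_stats and its parameter names are illustrative and do not come from the poster or from Davis (1973).

```python
import math


def nearest_neighbor_stats(d_obs, n_points, area):
    """Nearest-neighbor index and z-score from summary values.

    d_obs    : mean observed nearest-neighbor distance (e.g. m)
    n_points : number of points N in the study area
    area     : study area A in the matching squared unit (e.g. m^2)
    """
    density = n_points / area                       # rho = N / A
    d_ran = 1.0 / (2.0 * math.sqrt(density))        # expected mean, random pattern
    d_uni = 1.0745 / math.sqrt(density)             # expected mean, hexagonal pattern
    r_index = d_obs / d_ran                         # nearest-neighbor index R
    se_d = 0.26136 / math.sqrt(n_points * density)  # standard error of the mean distance
    z = (d_obs - d_ran) / se_d                      # test statistic
    return {
        "density": density,
        "d_ran": d_ran,
        "d_uni": d_uni,
        "R": r_index,
        "SE_d": se_d,
        "z": z,
    }


# Values from the Corydon West example: 44 sinkholes in 1 km^2 (1.0e6 m^2).
stats = nearest_neighbor_stats(d_obs=97.5, n_points=44, area=1.0e6)

# One-tailed test toward uniformity at the 0.05 level (critical value 1.645).
reject_null = stats["z"] > 1.645

for name, value in stats.items():
    print(f"{name}: {value:.4g}")
print("reject H0 (one-tailed, alpha = 0.05):", reject_null)
```

Run with the poster's values, the script gives R of about 1.29 and z of about 3.72, in agreement with the hand calculation. Passing the area in m$^2$ rather than km$^2$ is what keeps the density in the same linear unit as the distances, the point stressed in STEP 5.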
CONCLUSIONS

Teaching the use of nearest-neighbor analysis in a geomorphology course provides several pedagogic benefits: (1) it reinforces important concepts related to data collection (e.g. significant figures), map use (e.g. fractional scale and the UTM grid), and basic statistics (e.g. hypothesis testing); (2) the necessary calculations are easily handled by most students; and (3) once learned, the technique can be widely applied in geoscience problem-solving. A wide variety of data sources (maps, air photos, field measurements) may be used to obtain the basic distance data.

REFERENCES CITED

Clark, P.J., and Evans, F.C., 1954, Distance to nearest neighbor as a measure of spatial relationships in populations: Ecology, v. 35, p. 445-453.
Dacey, M.F., and Krumbein, W.C., 1976, Topological properties of disjoint channel networks within enclosed regions: Journal of the International Association for Mathematical Geology, v. 8, p. 429-461.
Davis, J.C., 1973, Statistics and Data Analysis in Geology: New York, John Wiley & Sons, 550 p.
Day, M.J., 1978, Morphology and distribution of residual limestone hills (mogotes) in the karst of northern Puerto Rico: Bulletin of the Geological Society of America, v. 89, p. 426-432.
Ebdon, D., 1985, Statistics in Geography (2nd Ed.): Oxford, Blackwell Publishers, 232 p.
Evans, I.S., 1981, General geomorphometry, in Goudie, A., ed., Geomorphological Techniques: London, George Allen & Unwin, p. 31-37.
Foote, M., 1990, Nearest-neighbor analysis of trilobite morphospace: Systematic Zoology, v. 39, p. 371-382.
Ford, D., and Williams, P., 1989, Karst Geomorphology and Hydrology: London, Chapman Hall, 601 p.
Haggett, P., 1965, Locational Analysis in Human Geography: New York, St. Martin’s Press, 339 p.
Hasenmueller, N.R., Powell, R.L., Buehler, M.A., and Sowder, K.H., 2000, Karst in Indiana: Indiana Geological Survey. 10 October 2003 (http://igs.indiana.edu/geology/karst/karstInIndiana/index.cfm).
Jarvis, R.S., 1981, Specific geomorphometry, in Goudie, A., ed., Geomorphological Techniques: London, George Allen & Unwin, p. 42-46.
Jauhiainen, E., 1975, Morphometric analysis of drumlin fields in northern Central Europe: Boreas, v. 4, p. 219-230.
McConnell, H., and Horn, J.M., 1972, Probabilities of surface karst, in Chorley, R.J., ed., Spatial Analysis in Geomorphology: New York, Harper and Row, p. 111-133.
Robinson, G.J., Peterson, J.A., and Anderson, P.A., 1971, Trend surface analysis of corrie altitudes in Scotland: Scottish Geographical Magazine, v. 87, p. 142-146.
Rogerson, P.A., 2001, Statistical Methods for Geography: London, SAGE Publications, 236 p.
Rose, J., and Letzer, J.M., 1975, Drumlin measurements: a test of the reliability of data derived from 1:25000 scale topographic maps: Geological Magazine, v. 112, p. 361-371.
Smalley, I.J., and Unwin, D.J., 1968, The formation and shape of drumlins and their distribution and orientation in drumlin fields: Journal of Glaciology, v. 7, p. 377-390.
Tinkler, K.J., 1971, Statistical analysis of tectonic patterns in areal volcanism: the Bunyaruguru volcanic field in west Uganda: Mathematical Geology, v. 3, p. 335-355.
Unwin, D.J., 1973, The distribution and orientation of corries in northern Snowdonia, Wales: Transactions of the Institute of British Geographers, v. 58, p. 85-97.
Vacher, H.L., 1998, Computational geology 1 – significant figures: Journal of Geoscience Education, v. 46, p. 292-295.
Vincent, P.J., 1987, Spatial distribution of polygonal karst sinks: Zeitschrift für Geomorphologie N.F., v. 31, p. 65-72.
Vitek, J.D., 1973, Patterned ground: A quantitative analysis of pattern: Proceedings of the Association of American Geographers, v. 5, p. 272-275.
Wilkins, D.E., and Ford, R.L., 2007, Nearest neighbor methods applied to dune field organization: The Coral Pink Sand Dunes, Kane County, Utah, USA: Geomorphology, v. 83, p. 48-57.
Williams, P.W., 1972a, The analysis of spatial characteristics of karst terrains, in Chorley, R.J., ed., Spatial Analysis in Geomorphology: New York, Harper and Row, p. 135-163.
_____ 1972b, Morphometric analysis of polygonal karst in New Guinea: Bulletin of the Geological Society of America, v. 83, p. 761-796.

ACKNOWLEDGMENTS

I thank Amalie Orme (Cal State – Northridge) for first introducing me to nearest-neighbor analysis and its potential in geomorphology. I also thank all my geomorphology students over the years (UCLA, University of Utah, & Weber State University) who have cheerfully undertaken this assignment and made important observations and suggestions. Lastly, I thank Cameron Lindsley and Ben Pope (WSU GIS/RS Laboratory) for their invaluable assistance in preparing this poster.

Further Notes: Weaknesses of the Approach:

1. The shape of the study area will greatly affect the results. Long, narrow, rectangular study areas may have low nearest-neighbor indices (values of R) simply because of the constraints imposed by the region’s shape; points distributed within narrow rectangles are necessarily close to one another (Rogerson, 2001).

2. The very fact that the technique requires a study area with a specific boundary may influence the analysis (i.e. “the boundary effect”). One solution is to establish a buffer zone around the study area. Points inside the study area may have nearest neighbors within the buffer zone, and these distances (rather than distances to nearest neighbors within the study area) should be used in calculating the average nearest-neighbor distance (Rogerson, 2001).

TABLE 1. Geomorphic and other geologic features whose spatial distribution has been studied using nearest-neighbor analysis.
Geomorphic/Geologic Feature             References
drumlins                                Smalley and Unwin, 1968; Jauhiainen, 1975; Rose and Letzer, 1975
volcanic centers                        Tinkler, 1971
channel nodes in drainage networks      Dacey and Krumbein, 1976
cirques                                 Unwin, 1973; Robinson and others, 1971
mineral grains on a polished slab       Davis, 1973 (teaching example)
karst depressions (sinkholes/dolines)   Williams, 1972a and 1972b; Vincent, 1987
patterned ground                        Vitek, 1973
karst hills (mogotes)                   Day, 1978
morphologic change in trilobites        Foote, 1990
dune crests                             Wilkins and Ford, 2007
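
Where point coordinates are available (for example, UTM eastings and northings read from the map), the distance measurements of STEP 3 and the buffer-zone remedy described under Further Notes can be carried out in code rather than with a ruler. The following is a minimal Python sketch under that assumption; the function name mean_nn_distance and the sample coordinates are hypothetical and are not data from the poster.

```python
import math


def mean_nn_distance(study_points, buffer_points=()):
    """Mean nearest-neighbor distance for the points in the study area.

    Neighbors may lie in the buffer zone, but only study-area points
    contribute a distance, so the number of distances equals N.
    Reflexive pairs are counted twice, as STEP 3 requires.
    """
    # Study-area points come first, so index i refers to the same point
    # in both study_points and candidates.
    candidates = list(study_points) + list(buffer_points)
    distances = []
    for i, (x, y) in enumerate(study_points):
        nearest = min(
            math.hypot(x - cx, y - cy)
            for j, (cx, cy) in enumerate(candidates)
            if j != i  # a point is not its own neighbor
        )
        distances.append(nearest)
    return sum(distances) / len(distances)


# Hypothetical coordinates in meters, for illustration only.
study = [(0.0, 0.0), (3.0, 0.0), (10.0, 10.0)]
buffer_zone = [(10.0, 12.0)]

d_obs_with_buffer = mean_nn_distance(study, buffer_zone)
d_obs_without_buffer = mean_nn_distance(study)
print(d_obs_with_buffer, d_obs_without_buffer)
```

In this small example the point at (10, 10) has its true nearest neighbor in the buffer zone, so ignoring the buffer inflates the mean distance. This brute-force search compares every pair of points, which is adequate for the 30 to 50 points typical of a class exercise.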