New image processing software for analyzing

1 New image processing software for analyzing object size-frequency 2 distributions, geometry, orientation, and spatial distribution* 3 Ciarán Beggan1 and Christopher W. Hamilton2 4 1 Grant Institute, School of GeoSciences, University of Edinburgh, UK 5 2 Hawai‘i Institute of Geophysics and Planetology, University of Hawai‘i, USA 6 7 Corresponding author: Ciarán Beggan, E-mail: ciaran.beggan@ed.ac.uk; Phone: +44 8 131 650 5916; Address: School of GeoSciences, Grant Institute, West Mains Road, 9 Edinburgh, EH9 3JW, UK 10 11 *Code available from server at http://www.geoanalysis.org 12 13 Abstract 14 Geological Image Analysis Software (GIAS) combines basic tools for calculating 15 object area, abundance, radius, perimeter, eccentricity, orientation, and centroid 16 location, with the first automated method for characterizing the aerial distribution of 17 objects using sample-size-dependent nearest neighbor (NN) statistics. The NN 18 analyses include tests for: (1) Poisson, (2) Normalized Poisson, (3) Scavenged k = 1, 19 and (4) Scavenged k = 2 NN distributions. GIAS is implemented in MATLAB with a 20 Graphical User Interface (GUI) that is available as pre-parsed pseudocode for use 21 with MATLAB, or as a stand-alone application that runs on Windows and Unix 22 systems. GIAS can process raster data (e.g., satellite imagery, photomicrographs, etc.) 23 and tables of object coordinates to characterize the size, geometry, orientation, and 24 spatial organization of a wide range of geological features. This information expedites 25 quantitative measurements of 2D object properties, provides criteria for validating the 1 26 use of stereology to transform 2D object sections into 3D models, and establishes a 27 standardized NN methodology that can be used to compare the results of different 28 geospatial studies and identify objects using non-morphological parameters. 29 30 Key Words: Image analysis, software, size-frequency, spatial distribution, nearest 31 neighbor, vesicles, volcanology 32 33 1. Introduction 34 Geological image analysis extracts information from representations of natural objects 35 that may either be captured by an imaging system (e.g., photomicrographs, aerial 36 photographs, digital satellite imagery) or schematically rendered into visual form 37 (e.g., geological maps). In addition to examining the properties of individual objects, 38 spatial analysis may be used to quantify object distributions and investigate their 39 formation processes. 40 ImageJ (Rasbald, 2005) is a commonly used image processing application that 41 was developed as an open source Java-based program by the National Institutes of 42 Health (NIH). Custom plug-in modules enable ImageJ to solve numerous tasks, 43 including the analysis of vesicle size-frequency distributions within geological thin- 44 sections (e.g., Szramek et al, 2006, Polacci et al., 2007). However, ImageJ and similar 45 programs for Windows (e.g., Scion Image and ImageTool) and Macintosh (e.g., NIH 46 Image) are not specifically designed for geological applications nor do they provide 47 analytical tools for investigating patterns of spatial organization. 48 To take greater advantage of the information contained within geological 49 images, we have developed Geological Image Analysis Software (GIAS). This 50 program combines (1) an image processing module for calculating and visualizing 2 51 object areas, abundance, radii, perimeters, eccentricities, orientations, and centroid 52 locations, and (2) a spatial distribution module that automates sample-size dependent 53 nearest neighbor (NN) analyses. Although other programs can perform the basic 54 functions in the “Image Analysis" module, GIAS is the first program to automate 55 sample-size-dependent analyses of NN distributions. 56 57 2. Motivation 58 Nearest neighbor (NN) analysis is well-suited for investigating patterns of spatial 59 distribution within intrinsically two-dimensional (2D) datasets, such as orthorectified 60 aerial photographs and satellite imagery. Applications of NN analyses to remote 61 sensing imagery include the study of volcanic landforms (Bruno et al., 2004; 2006; 62 Baloga et al., 2007; Bishop, 2008; Hamilton et al., 2009; Bleacher et al., 2009), 63 sedimentary mud volcanoes (Burr et al., 2009), periglacial ice-cored mounds (Bruno 64 et al., 2006), glaciofluvial features (Burr et al., 2009), dune fields (Wilkins and Ford, 65 2007), and impact craters (Bruno et al., 2006). Despite widespread utilization of NN 66 analyses, its value as a remote sensing tool and effectiveness as basis for comparison 67 between different datasets is limited by the lack of a standardized NN methodology— 68 particularly in terms of defining feature field areas, thresholds of significance, and 69 criteria for apply higher-order NN methods. 70 In addition to remote sensing applications, NN analyses can be used to study 71 objects in photomicrographs such as crystals (Jerram et al., 1996; 2003; Jerram and 72 Cheadle, 2000) and vesicles. In this study, we emphasize the application of NN 73 analyses to vesicle distributions to demonstrate how GIAS can be used to validate (or 74 refute) the assumption of randomness, which is a prerequisite for effectively applying 3 75 stereological techniques to derive vesicle volumes from photomicrographs and for 76 selecting appropriate statistical models to characterize those vesicle populations. 77 Vesicle textures preserve information about the pre-eruptive history of 78 magmas and can be used to investigate the dynamics of explosive and effusive 79 volcanic eruptions (e.g., Mangan et al., 1993; Cashman et al., 1994; Mangan and 80 Cashman 1996; Cashman and Kauahikaua, 1997; Polacci and Papale 1997; Blower et 81 al., 2001; Blower et al., 2003; Gaonac'h et al., 2005; Shin et al., 2005; Lautze and 82 Houghton, 2005; Adams et al., 2006; Polacci et al., 2006; Sable et al., 2006; Gurioli 83 et al., 2008). Quantitative vesicle analyses stem from Marsh (1988), who explored the 84 physics of crystal nucleation and growth dynamics to derive an analytical formulation 85 for crystal size distributions. This research was then applied to numerous bubble size- 86 frequency distribution studies (e.g., Sarda and Graham, 1990; Cashman and Mangan, 87 1994; Cashman et al., 1994; Blower et al., 2003). Early studies of vesicle distributions 88 (e.g., Cashman and Mangan, 1994) were limited by their inability to characterize the 89 full range of vesicle sizes because their methodology could not resolve the smallest 90 vesicles. Nested photomicrographs solve this problem because photomicrographs 91 captured at multiple magnifications enable the reconstruction of total vesicle size- 92 frequency distributions (e.g., Adams et al., 2006; Gurioli et al., 2008). 93 In general, vesicle studies are limited by two major factors: transformations of 94 2D cross-sections into representative vesicle volumes (Mangan et al., 1993; Sahagian 95 and Proussevitch, 1998; Higgins, 2000; Jerram and Cheadle, 2000); and development 96 of reliable statistical characterizations of vesicle populations in terms of distribution 97 functions and their spatial characteristics (Morgan and Jerram, 2007; Proussevitch et 98 al., 2007a). These difficulties have been partially addressed by improved statistical 99 techniques for investigating vesicle populations; however, a single cross-section 4 100 cannot be used to reconstruct a representative 3D vesicle distribution unless the 101 objects viewed in a 2D section can be characterized using 2D reference textures with 102 known 3D distributions (Jerram et al., 1996; 2003; Jerram and Cheadle 2000; 103 Proussevitch et al., 2007a). Although synchrotron X-ray tomography is increasingly 104 being used to directly generate 3D vesicle distributions (e.g., Gualda and Rivers, 105 2006; Polacci et al., 2007; Proussevitch et al., 2007b), nested datasets containing 106 multiple Scanning Electron Microscope (SEM) images remain the most common 107 input for vesicle studies because of their superior spatial resolution relative to X-ray 108 tomography. To facilitate the analysis of vesicles in SEM imagery, GIAS can be used 109 to determine the geometric properties of vesicles within the plane of a given 110 photomicrograph and establish if objects fulfil the criteria of spatial randomness, 111 which is required for transforming 2D sections into 3D models using stereology. 112 113 3. Nearest neighbor (NN) analysis 114 Clark and Evans (1954) proposed a simple test for spatial randomness in which the 115 actual mean NN distance (ra) in a population of known density is compared to the 116 expected mean NN distance (re) within a randomly distributed population of 117 equivalent density. Following Clark and Evans (1954), re and expected standard error 118 (σe) of the Poisson distribution are: 119 re  1 2 0 ; e  0.26136 N 0 , [1; 2] 120 where the input population density, ρ0, equals the number of objects (N) divided by 121 the area (A) of the feature field (ρ0 = N / A). The following two test statistics (termed 122 R and c) are used to determine if ra follows a Poisson random distribution: 123 R ra re ; c ra  re e . [3; 4] 5 124 If a test population exhibits a Poisson random distribution, R should ideally 125 have a value of 1, while c should equal 0. If R is approximately equal to 1, then the 126 test population may have a Poisson random distribution. If R > 1, then the test 127 population exhibits greater than random NN spacing (i.e., tends towards a maximum 128 packing arrangement), whereas if R < 1, then the NN distances in the test case are 129 more closely spaced than expected within a random distribution and thus exhibit 130 clustering relative to the Poisson model. 131 To identify statistically significant departures from randomness at the 0.95 and 132 0.99 confidence levels, |c| must exceed the critical values of 1.96 and 2.58, 133 respectively (Clark and Evans, 1954); however, these critical values implicitly assume 134 large sample populations (N > 104). Jerram et al. (1996) and Baloga et al. (2007) note 135 that finite-sampling effects introduce biases into the variation of NN statistics. These 136 biases become significant for small populations (N < 300) and thereby necessitate the 137 use of sample-size-dependent calculations of R and c thresholds. 138 The NN methodology of Clark and Evans (1954) assumes that objects can be 139 approximated as points. However, for frameworks of solid objects, NN statistics 140 exhibit scale-dependent variations in randomness due to the restrictions imposed by 141 avoiding overlapping object placements. For 2D sections through 3D distributions of 142 equal-sized solid spheres, Jerram et al. (2003) define a porosity-dependent threshold 143 for R, which they term the random sphere distribution line (RSDL). With increasing N 144 (i.e., decreasing porosity), the RSDL threshold increases from a value just above 1 to 145 a maximum of 1.65 at 37% porosity, based on the experimental spherical packing 146 model of Finney (1970). Jerram et al. (1996, 2003) also note that compaction, 147 overgrowth, and sorting can introduce systematic variations in R relative to the RSDL 6 148 and thus, when correctly identified, can provide information relating to the relative 149 importance of these processes during rock formation. 150 151 In addition to sample-size-dependent tests for Poisson randomness, Baloga et al. (2007) proposed the following three NN models: 152 153 1. Normalized Poisson NN distributions account for data resolution-limits, which 154 require normalization of re, σe, skewness, and kurtosis values based on a user- 155 defined minimum distance threshold. 156 2. Scavenged Poisson NN distributions assume features consume (or “scavenge”) 157 resources in their surroundings, which affects the kth nearest neighbor by 158 increasing re relative to the standard Poisson NN model. The order of the 159 scavenging model is given by the Poisson index, k, which identifies the 160 number of NN objects participating in the scavenging process and, if k = 0, the 161 standard Poisson NN distribution is obtained. 162 3. Logistic NN distributions apply the classical probability distribution for self- 163 limiting population growth due to the use of available resources. Under this 164 probability assumption, NN distances become more uniform, which leads to 165 smaller skewness, but larger kurtosis compared to the Poisson NN model. 166 167 Sample-size-dependent R and c test statistics, in addition to the expected 168 skewness versus kurtosis values, allow for robust statistical tests of spatial 169 organization. However, prior to this study, these extensions to the classical NN 170 method have not been automated and freely available for use. 171 172 4. Method 7 173 4.1. Input data and processing parameters 174 GIAS is designed to process two input data types: images and tables of object 175 coordinates. In the first instance, images may include rectangular raster files (e.g., 176 *.tiff, *.jpg, *.gif, etc.) encoded in 8-bit grey-scale formats. All inputs are assumed to 177 have an orthogonal projection with equal x and y pixel dimensions. If the input image 178 pixels are not square, their area can be uniquely specified. To appropriately scale the 179 output results, the pixel size must be defined by the analyst for each input image. 180 Image segmentation is achieved using Digital Number (DN) thresholding, 181 such that inputs are converted to a binary threshold logical matrix. GIAS assumes 182 input grey-scale pixel values >250 are white; however, the upper and lower thresholds 183 can be adjusted in the input options. Within input images, black pixels are assumed to 184 be the objects of interest (e.g., vesicles) whereas white pixels are assumed to be part 185 of the background (e.g., non-vesicles). An option is provided for image inversion. 186 Other than performing the binary conversion, GIAS does not modify input 187 images. Consequently, pre-processing may be necessary to remove image artifacts 188 and segment objects into feature classes. To facilitate image analysis and pre- 189 processing of inputs to the NN module, GIAS supplies labeling information and 190 metadata for each object in the image analysis output file; however, the level of image 191 modification will depend upon the particular scientific investigation and the 192 preferences of the analyst. 193 Inputs to the NN module can be passed from the image analysis module or 194 loaded as a two column table (tab- or space- delimited) with x-coordinates in the first 195 column and y-coordinates in the second column. GIAS automatically checks input 196 files for duplicate entries and retains only unique coordinate pairs. 8 197 Slider bars control the detection thresholds for the minimum and maximum 198 object areas, in addition to the distance threshold used within the Normalized Poisson 199 tests. A check-box excludes objects that touch the periphery of the image from 200 subsequent processing. This option is enabled by default because meaningful cross- 201 sectional properties (e.g., area, geometry, and orientation) cannot be calculated for 202 truncated objects. Calculations of object abundance (as a percentage of total image 203 area) are based on total image properties and are unaffected by peripheral object 204 exclusion. Once the input options are satisfactorily set, the analyst simply presses the 205 “Process” button to calculate the output graphs and statistics. 206 207 4.2. Output results 208 GIAS outputs its results to on-screen graphs and text boxes, which are presented in 209 two tabbed panels. Statistics relating to the object area, geometry, and orientation are 210 shown in the first panel, labeled “Image Analysis” (Figure 1). The frequency axis can 211 be plotted in linear or log units with on-screen options for defining custom 212 dimensions. Object areas, radii, perimeters, and eccentricities are presented as 213 histograms, whereas object orientations are displayed using a rose plot. Object radii 214 are estimated by calculating the radii of equal area circles. In addition to the graphical 215 output, there are text-box summaries of the total number of objects, their abundance, 216 minimum, maximum, mean, standard deviation (at 1 σ), skewness, and kurtosis based 217 on the object area calculations. 218 Results of the spatial analyses are summarized in a second panel, labeled 219 “Nearest Neighbor Analysis” (Figure 2). These results include histograms of the 220 measured NN distances, in addition to graphs of R, c, and skewness versus kurtosis 221 (Baloga et al., 2007). The R and c values are presented with sample-size-dependent 9 222 thresholds for rejecting the null hypothesis (i.e., a given model of re) at significance 223 levels of 1 and 2 σ. By default, the program assumes a Poisson random model for the 224 expected spatial distribution; however, the user may also view results based on 225 Normalized Poisson, or Scavenged (k = 1 or 2) models. Similarly, the skewness 226 versus kurtosis plots are provided with sample-size-dependent significance levels of 227 0.90, 0.95, and 0.99. 228 Textbox outputs provide the model-dependent value of re and summarize ra 229 statistics, including the total number of objects, number within the convex hull, and 230 NN distance statistics (minimum, maximum, mean, and standard deviation at 1 and 2 231 σ). The convex hull is a polygon defined by the outermost points in a feature field 232 such that each interior angle of the polygon measurers less than 180º (Graham, 1972). 233 The convex hull thus forms a boundary around a feature field with the minimum 234 possible perimeter length (see Appendix 1 for examples). 235 The input parameters and computed statistics from both tabbed panels can be 236 output in separate user-named text files. The data are saved in a tab-delimited format, 237 allowing the files to be viewed using a text editor or imported into a spreadsheet. 238 239 4.3. Implementation 240 GIAS is written and implemented in MATLAB with a Graphical User Interface (GUI) 241 that is designed for use within the geological community. The program is available as 242 preparsed pseudocode for use with MATLAB (provided both “Image Processing” and 243 “Statistics” toolboxes are installed) or as a standalone application that can run on 244 Windows and Unix systems without requiring MATLAB or its auxiliary toolboxes. 245 These software products may be downloaded from the internet for free via 246 http://www.geoanalysis.org. 10 247 The regionprops function within the MATLAB Image Processing toolbox 248 calculates the following key properties of each object: area, centroid position, 249 perimeter, eccentricity, orientation, equal-area circle equivalent diameter and a list of 250 pixel positions within each vesicle. The pixel list is used to determine which objects 251 are in contact with the image edge and the equal area circle diameters are used to 252 calculate object radii. The eccentricity and orientation are calculated by fitting an 253 ellipse about the object. The angle between the major axis of the ellipse and the x-axis 254 of the image defines the orientation. 255 By default, the NN analysis utilizes centroid positions calculated in the image 256 analysis module, however, object locations may be uploaded as a table of x- and y- 257 coordinates. Within the NN module, the MATLAB function convhull is used to 258 identify the vertices of the convex hull for all of the input objects. The function 259 convhull is also used to calculate the area of the convex hull (Ahull) and to separate 260 interior objects (Ni) from the hull vertices (see Appendix 1 for examples). 261 NN distances are determined by calculating the Euclidean distance between 262 each interior object (centroid) and all other objects within the dataset, including the 263 vertices of the hull. The results for each interior point are sorted in ascending order to 264 identify the minimum NN distance and calculate the actual mean NN distance (ra) by 265 averaging the NN distances for all interior objects (Ni). The interior population 266 density (ρi = Ni / Ahull), provides a statistical estimate of the true density within an 267 infinite domain and thus substitutes for ρ0. We then calculate R as the ratio of ra-to-re. 268 Using sample-size-dependent values of R, c, and their respective thresholds of 269 significance, we may define the following scenarios. If the values of R and c fall 270 within ±2 σ of their expected values, then the results suggest that the model correctly 271 describes the input distribution. If the value of c is outside the range of ±2 σ, then the 11 272 inference is that the input distribution is inconsistent with the model, regardless of the 273 value of R. If c is outside ±2 σ, and R is greater than the +2 σ threshold, then the input 274 distribution exhibits a statistically significant repelling relative to the model. 275 Conversely, if c is outside the ±2 σ range, and R less then than the -2 σ threshold, then 276 the input distribution is inferred to be clustered relative to the model. Lastly, if R is 277 outside ±2 σ, and c is within the ±2 σ range, then the test results are ambiguous, 278 which generally indicates that there is insufficient data to perform the analysis and/or 279 the standard error (σe) of the input distribution is large. 280 One of the novel contributions of our application is the introduction of 281 automated calculations of sample-size-dependent biases in NN statistics. Baloga et al. 282 (2007) described how finite sampling would affect several NN models; however, they 283 only implemented a solution for the Poisson NN case. In our application, we have 284 automated the Poisson NN procedure, extended it to include the three other models of 285 Baloga et al. (2007), and derived the k = 2 case for the scavenged NN distribution 286 (Appendix 2). Automation of the NN method within GIAS enables users to rapidly 287 determine if their data fulfills the size-dependent criteria of statistical significance for 288 the following five spatial distribution models: (1) Poisson, (2) Normalized Poisson, 289 (3) Scavenged k = 1; and (4) Scavenged k = 2. Table 2 summarizes the properties of 290 these models. 291 To obtain sample-size-dependent values of R, c, and their confidence limits, 292 we generated simulations in MATLAB. Following the method of Baloga et al. (2007), 293 confidence limits for the Poisson and the Scavenged k = 1 and 2 models were 294 simulated using 1000 random draws from a modeled distribution. However, for the 295 Normalized Poisson model, limits for the standard Poisson case are employed because 296 simulating unique limits for each user-defined distance threshold is computationally 12 297 intensive in real-time (e.g., requiring approximately four minutes to perform the 298 simulations using a computer with a 2 GHz processor and 1 GB of RAM). 299 GIAS does not include the logistic density distribution described by Baloga et 300 al. (2007) because its parameters (i.e., the logistic mean (m) and the standard 301 deviation (b), see Table 2) are estimated using a best-fit to the input data. The logistic 302 case therefore differs substantially from the other NN analyses because they involve a 303 comparison between input spatial distributions and expected model distributions. 304 Additionally, the fit of a logistic curve to an input distribution does not necessarily 305 imply an underlying logistic process because other processes could generate 306 distributions that better fits the data. Thus given the differences between the logistic 307 density distribution described by Baloga et al. (2007) and the other Poisson NN 308 models, we have omitted the logistic case from the geospatial analysis tools 309 implemented within GIAS. 310 To correctly interpret the results of the Scavenging NN models, it is important 311 to recognize that the formulation of Baloga et al. (2007) is a scale-free distribution. 312 Modification to expected NN distances depend upon the number of objects 313 participating in the scavenging process and not on the location of the objects. 314 Consequently, the mean and standard deviation statistics are not directly related to the 315 radius of the scavenged area. This is an unrealistic condition for resource-scavenging 316 phenomena in nature, which have a fixed upper limit to r. Alternative modeling 317 scenarios, with a user-defined scavenging radius, may be more appropriate for some 318 applications—provided that a scale for the scavenging process can be determined. 319 Distinguishing between Poisson NN and Scavenged Poisson NN models 320 requires statistical separation of their respective NN distributions. The minimum 321 number of objects required will depend on ρ0 and k. In Appendix 3, we show that 13 322 separation of the Poisson and k = 1 Scavenged Poisson models, at the 95% confidence 323 level, requires a minimum 50 to 60 objects for ρ0 values of 0.5 to 0.0005 objects / unit 324 area, whereas 100 to 150 objects are required to distinguish between k = 1 and 2 325 models. 326 In addition to sample-size-dependent R and c statistics, we expand upon the 327 methods of Baloga et al. (2007) by simulating the sample-size-dependent range of 328 expected skewness and kurtosis values for each NN hypothesis. Calculated 329 confidence intervals—at 90%, 95% and 99%—are plotted within GIAS to illustrate 330 the expected ranges of skewness versus kurtosis for each model. Plots of skewness 331 versus kurtosis are also useful for discriminating between populations with otherwise 332 similar R and c values, such as ice-cored mounds (e.g., pingos) and volcanic rootless 333 cones (Bruno et al., 2006). 334 335 5. Results 336 To demonstrate the utility of GIAS, we have applied the program to vesicle patterns 337 within proximal magmatic tephra deposits from Fissure 3 of the 1783-1784 Laki cone 338 row, Iceland. Tables 3 and 4 summarize the results of the “Image Analysis” and 339 “Nearest Neighbor Analysis” modules, respectively. In Appendix 1, we include 340 additional examples of geospatial distributions that are clustered and repelled relative 341 to the Poisson NN model. These examples are based on the Hamilton et al. (2010a, 342 2010b) and demonstrate how statistical NN analyses can be combined with field 343 observations to obtain information about the effects of environmental resource 344 limitation on the spatial distribution of volcanic landforms. Appendix 1 also explores 345 how the geospatial analysis tools that are provided within GIAS can be applied to 346 remote sensing data to quantitatively compare the spatial distribution of landforms in 14 347 different planetary environments, supply non-morphological evidence to discriminate 348 between feature identities, and examine self-organization processes within geological 349 systems. 350 351 5.1. Vesicle analysis 352 Vesicle characteristics can provide information about the pre-eruptive history of 353 magmas, volatile exsolution dynamics, and conduit ascent processes—all of which are 354 important factors in controlling the intensity of explosive volcanic eruptions. 355 Quantification of these parameters relies largely upon the use of vesicle size, number, 356 and volume distributions to establish links between natural volcanic samples and 357 experimental data (Adams et al., 2006). Nested SEM images at several magnifications 358 are typically used in combination with stereology (e.g., Sahagian and Proussevitch, 359 1998) to transform 2D vesicle properties into 3D distributions. Although acquisition 360 and analysis of a complete nested SEM dataset is outside the scope of this study, we 361 present an example of how GIAS can be used to analyze vesicle properties within a 362 single image. For this example, we have chosen a magmatic tephra sample (LAK-t- 363 23c, Emma Passmore, unpublished data) from 1783-1784 Laki eruption in Iceland. 364 The resolution of the image is 3.81 μm / pixel and we have followed the method of 365 Gurioli et al. (2008) by using a minimum object area threshold of 20 pixels (76.2 366 μm2) to avoid complications associated with poorly resolved objects near the limit of 367 resolution within our data. 368 The output of the "Image Analysis" module (Table 3 and Figure 1) reveals that 369 the sample has a vesicularity of 65.48% and—excluding objects in contact with the 370 periphery of the image—there are 537 objects larger than the detection threshold of 371 76.2 μm2. Assuming the properties of an equal area circle, the vesicle diameters range 15 372 from 19.21 μm to 2.85 × 103 μm, with a mean of 3.01 × 102 μm ±6.89 × 102 (at 1 σ). 373 The ratio of equal area circle circumference to object perimeter is 0.77 ±1.53 (at 1 σ); 374 however; for the 21 vesicles larger than 3.10 × 105 μm2 the ratio decreases to 0.55 375 ±0.21, which indicates that the largest vesicles strongly deviate from a circular 376 geometry (see the graph of “Objects Perimeters”). The vesicles have a mean 377 eccentricity of 0.71 ± 0.18 and a mean orientation trending 7.27° to 187.27° ±43.91° 378 (n.b., 90° points towards the top of the image; see “Object Eccentricity” and “Object 379 Orientations” in Figure 1). 380 The NN module results (Table 4 and Figure 2) show that NN distances 381 between vesicle centroids range from 28.05 to 1.53 ×103 μm, with a mean (ra) of 1.75 382 × 102 ±1.15 ×102 (see “Nearest Neighbor Frequency Distribution” in Figure 2). The 383 sample has R and c values within the 2 σ thresholds of significance (Figure 2) and 384 thus we conclude that the vesicle centroids exhibit a Poisson (random) distribution. 385 We have also analyzed the sample using a Normalized Poisson threshold of 5 pixels 386 (19.24 μm), which corresponds to the diameter of a circle that is equal in area to our 387 object area threshold of 20 pixels. Given this threshold, the sample exhibits a repelled 388 distribution with R and c values greater than their respective upper 2 σ limits. The 389 Scavenged k = 1 and 2 cases overestimate re relative to ra, and thus result in clustered 390 results relative to both Scavenged models. 391 The GIAS outputs suggest that vesicles within this sample exhibit an overall 392 Poisson distribution despite the presence of a vesicle fabric trending 7.27° to 187.27° 393 ±43.91°. The random distribution of vesicles supports the usage of a stereological 394 transformation, however, deformation and coalescence have caused vesicles >3.10 × 395 105 μm2 to strongly deviate from a circular geometry. Consequently, if the vesicles 396 represented within this thin-section were transformed into a 3D distribution using 16 397 stereology, it would be necessary to acknowledge propagating errors associated with 398 the non-circular geometry of the largest vesicles in the sample. Additionally, it is 399 important to recognize the resolution limitations associated with analysis because we 400 have applied a minimum object area threshold of 20 pixels, which excludes all 401 vesicles <76.2 μm2 (i.e., 630 of 1167 objects). A detection threshold of 20 pixels is 402 appropriate for studies of vesicle size-frequency distributions because an 403 misclassification of 1 pixel results in an error of 5% of the estimate of total object area 404 (Lucia Gurioli, personal communication, 2009). Nevertheless, this threshold appears 405 insufficient for filtering artifacts associated with the geometric properties of the 406 smallest vesicles in the input image. For instance, eccentricity values calculated for 407 vesicles within the Laki magmatic tephra sample appear to be very high (0.71 ± 0.18), 408 but this is a resolution-induced artifact resulting from pixel-scale asymmetries in a 409 large population of small vesicles. Resolution limitations also affect the Normalized 410 Poisson NN distribution because the large number of objects below the detection 411 thresholds leads to underestimates of ρ0 and re, which in turn results in a repelled 412 distribution relative to the model. In instances where the majority of objects are below 413 the detection limit of analysis, it is therefore imperative to use nested image datasets, 414 and/or high magnification mosaics to resolve the properties of the smallest objects in 415 a given distribution. 416 417 6. Discussion 418 For randomly distributed spherical objects, stereological techniques may be used to 419 convert cross-sectional areas into volumes (Sahagian and Proussevitch, 1998). 420 Higgins (2000) and Morgan and Jerram (2007) have also applied 2D to 3D 421 transformations for crystal shapes such as cubes and prisms based upon a large 17 422 number of observed cross-sectional areas. Nevertheless, for irregular object 423 geometries, transformations do not exist for generating an accurate 3D model from a 424 single 2D image (Mock and Jerram, 2005; Sable et al., 2006). For vesicle studies, 425 direct measurements of 3D vesicle distributions using X-ray microtomography can be 426 used to circumvent the problems associated with stereological transformations (e.g., 427 Gualda and Rivers, 2006; Polacci et al., 2007; Proussevitch et al., 2007b), but the 428 resolution of synchrotron-based X-ray tomography is insufficient to resolve micron- 429 sized vesicle walls, which can result in erroneously large estimates of vesicle 430 permeability (Gurioli et al., 2008). Tomographic resolution limitations similarly 431 prohibit the detection of sub-micron vesicles, which decreases estimates of vesicle 432 number densities relative to investigations of higher resolution SEM images (Gurioli 433 et al., 2008). Consequently, even given the availability of 3D imaging techniques, 434 thin-section analysis continues to supply unique information about vesiculation 435 processes and, therefore, 2D image processing should continue to be explored. 436 Similarly, NN analysis of a single 2D image cannot resolve the spatial 437 organization of the centroids in 3D, unless additional reference information is 438 provided (e.g., Jerram et al., 1996; 2003; Jerram and Cheadle, 2000). Instead, 2D 439 methods characterize the aerial distribution of objects within the plane of the image. 440 When applying NN analyses to vesicle distributions in thin-sections, it is important to 441 remember that object centroids are unlikely to coincide with the center of a vesicle in 442 3D, especially when vesicles are irregularly shaped. To obtain a more meaningful 443 statistical description of vesicle organization it may be necessary to analyze several 444 photomicrographs—preferably three mutually perpendicular thin-sections that sample 445 the principal component directions of the vesicle fabric. Vertical offsets in the 446 position of objects in orthometric aerial photographs and satellite imagery may 18 447 similarly introduce discrepancies between projected object separations and true NN 448 distances, but the vertical offsets are generally small relative to the separation of 449 objects in the plane of the image and hence the differences can be ignored. 450 451 7. Conclusions 452 We have developed Geological Image Analysis Software (GIAS) for calculating the 453 geometric properties of objects and quantifying their spatial distribution. The program 454 improves the efficiency and reproducibility of geological object characterizations by 455 combining image processing tools for calculating object areas, abundance, radii, 456 perimeters, eccentricity, orientation, and centroid location with sample-size-dependent 457 NN tests for (1) Poisson; (2) Normalized Poisson; (3) Scavenged, k = 1; (4) and 458 Scavenged k = 2 distributions. These geospatial analyses have not been implemented 459 in other software packages and, consequently, GIAS represents a novel approach to 460 characterizing vesicle distributions in thin-sections and examining the spatial 461 organization of other geological features in datasets ranging from photomicrographs 462 (e.g., Section 5) to remote sensing imagery of planetary surfaces (e.g., Appendix 1 463 and Hamilton et al., 2010b). 464 Future work will focus on developing the following aspects of GIAS: (1) 465 image segmentation using Kriging analysis of DN histograms (e.g., Oh and Lindquist, 466 1999); (2) automated object recognition (e.g., Proussevitch and Sahagian, 2001; 467 Lindquist et al., 1996; Shin et al, 2005); (3) geometric cluster analysis (e.g., Still et 468 al., 2004); (4) statistical analyses to address truncated and multimodal populations 469 (e.g., Proussevitch et al., 2007a); (5) processing nested image datasets (e.g., Gurioli et 470 al., 2008); (6) incorporating stereological techniques for converting 2D data into 3D 471 distributions and then using the 3D models to calculate vesicle number density (e.g., 19 472 Sahagian and Proussevitch, 1998; Proussevitch et al., 2007a); (7) developing NN 473 analyses for 3D datasets (Jerram and Cheadle, 2000; Mock and Jerram, 2005); and (8) 474 treatment of solid object interactions on NN statistics (e.g., Jerram et al.; 2003). 475 476 Acknowledgements 477 We thank Alexander Proussevitch, Steve Baloga, Sarah Fagents, Bruce Houghton, 478 Lucia Gurioli, and Mark Naylor for insightful discussions relating to vesicle 479 distributions and NN simulations; Emma Passmore for kindly providing the 480 geological thin-sections examined within this study; and Dougal Jerram and 481 Guilherme Gualda for their thorough reviews, which have greatly improved this 482 manuscript. CDB wishes to acknowledge the assistance of Sue Rigby in providing 483 funding for software purchases. CDB was funded under NERC studentship award 484 NER/S/J/2005/13496. CWH is funded by the Icelandic Centre for Research 485 (RANNÍS) and the NASA Mars Data Analysis Program (MDAP) grant 486 NNG05GQ39G. SOEST publication number: 7808. HIGP publication number 1802. 487 488 489 References 490 Adams, N.K., Houghton, B.F., Fagents, S.A., Hildreth, W., 2006. The transition from 491 explosive to effusive eruptive regime: The example of the 1912 Novarupta eruption, 492 Alaska. Geological Society of America Bulletin 118, 620–634. 493 doi:10.1130/B25768.1. 494 20 495 Baloga, S.M., Glaze, L.S., Bruno, B.C., 2007. Nearest-neighbor analysis of small 496 features on Mars: Applications to tumuli and rootless cones. Journal of Geophysical 497 Research 112, E03002, 1-17. doi:10.1029/2005JE002652. 498 499 Bishop, M.A., 2008. Higher-order neighbor analysis of the Tartarus Colles cone 500 groups, Mars: The application of geographical indices to the understanding of cone 501 pattern evolution. Icarus 197, 73–83. 502 503 Bleacher, J.E., Glaze, L.S., Greeley, R., Hauber E., Baloga, S.M., Sakimoto, S.E.H., 504 Williams, D.A., Glotch, T.D., 2009. Spatial and alignment analyses for a field of 505 small volcanic vents south of Pavonis Mons and implications for the Tharsis province, 506 Mars. Journal of Volcanology and Geothermal Research 185, 96-102. 507 doi:10.1016/j.jvolgeores.2009.04.008. 508 509 Blower, J.D., Keating, J.P., Mader, H.M., Phillips, J.C., 2001. Inferring volcanic 510 degassing processes from vesicle size distributions. Geophysical Research Letters 28, 511 347–350. 512 513 Blower, J.D., Keating, J.P., Mader, H.M., Phillips, J.C., 2003. The evolution of 514 bubble size distributions in volcanic eruptions. Journal of Volcanology and 515 Geothermal Research 120, 1–23. 516 517 Bruno B.C., Fagents S.A., Thordarson T., Baloga S.M., Pilger E., 2004. Clustering 518 within rootless cone groups on Iceland and Mars: Effect of nonrandom processes. 519 Journal of Geophysical Research 109, E07009, 1-11. doi:10.1029/2004JE002273. 21 520 521 Bruno, B.C., Fagents, S.A., Hamilton, C.W., Burr, D.M., Baloga, S.M., 2006. 522 Identification of volcanic rootless cones, ice mounds, and impact craters on Earth and 523 Mars: Using spatial distribution as a remote sensing tool. Journal of Geophysical 524 Research 111, E06017, 1-16. doi:10.1029/2005JE002510. 525 526 Burr, D.M., Bruno, B.C., Lanagan, P.D., Glaze, L.S., Jaeger, W.L., Soare, R.J., Wan 527 Bun Tseung, J.M., Skinner, J.A., Baloga, S.M., 2009. Mesoscale raised-rim 528 depressions (MRRDs) on Earth: A review of the characteristics, processes and spatial 529 distributions of analogs for Mars. Planetary and Space Science 57, 579-596. 530 531 Cashman, K.V., Mangan, M.T., 1994. Physical aspects of magmatic degassing: II. 532 Constraints on vesiculation processes from textural studies of eruption products. In: 533 Carroll, M.R. and Holloway, J.R. (Eds.), Volatiles in Magmas. Reviews in 534 Mineralogy, Mineralogical Society of America, pp. 447–478. 535 536 Cashman, K.V., Kauahikaua, J.P., 1997. Reevaluation of vesicle distributions in 537 basaltic lava flows. Geology 25, 419–422. 538 539 Clark, P.J., Evans, F.C., 1954. Distance to nearest neighbor as a measure of spatial 540 relationships in populations. Ecology 35, 445–453. 541 542 Finney J., 1970. The geometry of random packing. Proceedings of the Royal Society 543 of London, Series A 318, 479-493. 544 22 545 Gaonac'h, H., Lovejoy, S., Schertzer, D., 2005. Scaling vesicle distributions and 546 volcanic eruptions. Bulletin of Volcanology 67, 350–357. 547 548 Gualda, G.A.R., Rivers, M., 2006. Quantitative 3D petrography using X-ray 549 tomography: Application to Bishop Tuff pumice clasts. Journal of Volcanology and 550 Geothermal Research 154, 48–62. 551 Gurioli, L., Harris, A.J.L., Houghton, B.F., Polacci, M., Ripepe, M., 2008. Textural 552 and geophysical characterization of explosive basaltic activity at Villarrica volcano, 553 Journal of Geophysical Research 113, B08206, 1-16. doi:10.1029/2007JB005328. 554 555 Hamilton, C.W., Thordarson, T., Fagents, S.A., 2010a, Explosive lava-water 556 interactions I: architecture and emplacement chronology of volcanic rootless cone 557 groups in the 1783-1784 Laki lava flow, Iceland. Bulletin of Volcanology, (in press) 558 559 Hamilton, C.W., Fagents, S.A., Thordarson, T., 2010b, Explosive lava-water 560 interactions II: self-organization processes among volcanic rootless eruption sites in 561 the 1783-1784 Laki lava flow, Iceland. Bulletin of Volcanology, (in press). 562 563 Higgins, M.D., 2000. Measurement of crystal size distributions. American 564 Mineralogist 85, 1105-1116. 565 566 Jerram, D.A., Cheadle, M.J., Hunter, R.H., Elliot, M.T., 1996. The spatial 567 distribution of crystals in rocks. Contributions to Mineralogy and Petrology 125(1), 568 60-74. 23 569 570 Jerram, D. A., Cheadle, M. J., 2000. On the cluster analysis of rocks. American 571 Mineralogist 84, 47-67. 572 573 Jerram, D.A., Cheadle, M.J., Philpotts, A.R., 2003. Quantifying the building blocks of 574 igneous rocks: are clustered crystal frameworks the foundation? Journal of Petrology 575 44, 2033-2051. doi:10.1093/petrology/egg069. 576 577 Lautze, N.C., Houghton, B.F., 2005. Physical mingling of magma and complex 578 eruption dynamics in the shallow conduit at Stromboli volcano, Italy. Geology 33 579 425–428. 580 581 Lindquist, W.B., Lee, S-M., Coker, D.A., Jones, K.W., Spanne, P., 1996. Medial axis 582 analysis of void structure in three-dimensional tomographic images of porous media. 583 Journal of Geophysical Research 101, 8297–8310. 584 585 Mangan, M.T., Cashman, K.V., 1996. The structure of basaltic scoria and reticulite 586 and inferences for vesiculation, foam formation, and fragmentation in lava fountains. 587 Journal of Volcanology and Geothermal Research 73, 1–18. 588 589 Mangan, M.T., Cashman, K.V., Newman S., 1993. Vesiculation of basaltic magma 590 during eruption, Geology 21, 157–160. 591 592 Mangan, M.T., Mastin, M., Sisson, T., 2004. Gas evolution in eruptive conduits: 593 combining insights from high temperature and pressure decompression experiments 24 594 with steady-state flow modelling. Journal of Volcanology and Geothermal Research 595 129, 23–36. 596 597 Marsh, B.D., 1988. Crystal size distributions (CSD) in rocks and the kinetics and 598 dynamics of crystallization: 1. Theory. Contributions to Mineralogy and Petrology 99, 599 277–291. 600 601 Mock, A., Jerram, DA., 2005. Crystal size distributions (CSD) in three dimensions: 602 Insights from the 3D reconstruction of a highly porphyritic rhyolite. Journal of 603 Petrology 46, 1525-1541. 604 605 Morgan, D., Jerram, D.A., 2006. On estimating crystal shape for crystal size 606 distribution analysis. Journal of Volcanology and Geothermal Research 154, 1-7. 607 608 Oh W., Lindquist W., 1999. Image thresholding by indicator Kriging. IEEE 609 Transactions on Pattern Analysis and Machine Intelligence 21, 590–602. 610 611 Polacci, M., Baker, D.R., Bai, L., Mancini L., 2007. Large vesicles record pathways 612 of degassing at basaltic volcanoes. Bulletin of Volcanology 70, 1023-1029. doi 613 10.1007/s00445-007-0184-8. 614 615 Polacci, M., Baker, D.R., Mancini, L., Tromba, G., Zanini, F., 2006. Three- 616 dimensional investigation of volcanic textures by X-ray microtomography and 617 implications for conduit processes. Geophysical Research Letters 33, L13312, 1-5. 618 doi:10.1029/2006GL026241. 25 619 620 Polacci, M., Papale, P., 1997. The evolution of lava flows from ephemeral vents at 621 Mount Etna: insights from vesicle distribution and morphological studies. Journal of 622 Volcanology and Geothermal Research 76, 1–17. 623 624 Proussevitch, A., Sahagian, D., Tsentalovich, E.P., 2007a. Statistical analysis of 625 bubble and crystal size distributions: Formulations and procedures. Journal of 626 Volcanology and Geothermal Research 164, 95-111. 627 628 Proussevitch, A., Sahagian, D., Carlson, W., 2007b. Statistical analysis of bubble and 629 crystal size distributions: Application to Colorado Plateau Basalts. Journal of 630 Volcanology and Geothermal Research 164, 112-126. 631 632 Proussevitch, A., Sahagian, D., 2001. Recognition and separation of discrete objects 633 within complex 3D voxelized structures. Computers & Geosciences 27, 441–454. 634 635 Rasband, W.S., 2005. ImageJ, U. S. National Institutes of Health, Bethesda, 636 Maryland, USA, http://rsb.info.nih.gov/ij/ [accessed November 28, 2009]. 637 638 Sahagian, D., Proussevitch, A., 1998. 3D particle size distributions from 2D 639 observations: stereology for natural applications. Journal of Volcanology and 640 Geothermal Research 84, 173–196. 641 642 Sable, J., Houghton, B.F., Del Carlo, P., Coltelli, M., 2006. Changing conditions of 643 magma ascent and fragmentation during the Etna 122 BC basaltic Plinian eruption: 26 644 evidence from clast microtextures. Journal of Volcanology and Geothermal Research 645 158, 333–354. 646 647 Sarda, P., Graham, D., 1990. Mid-ocean ridge popping rocks: implication for 648 degassing at ridge crests. Earth and Planetary Science Letters 97, 268–289. 649 650 Shin, H., Lindquist, W.B., Sahagian, D.L., Song, S-R., 2005. Analysis of the vesicular 651 structure of basalts. Computers & Geosciences 31, 473–487. 652 653 Stephenson, G., 1992. Mathematical Methods for Science Students, 2nd edn., 654 Longman Scientific & Technical, London, 528pp. 655 656 Still, S., Bialek, W., Bottou, L., 2004. Geometric clustering using the Information 657 Bottleneck method. In: Thrun, S., Saul, L., and Schölkopf, B. (Eds.), Advances in 658 Neural Information Processing Systems, 16, MIT Press, Cambridge, MA, pp. 1165- 659 1172. 660 661 Szramek, L., Gardner, J.E., Larsen, J., 2006. Degassing and microlite crystallization 662 of basaltic andesite magma erupting at Arenal Volcano, Costa Rica. Journal of 663 Volcanology and Geothermal Research 157, 182–201. 664 665 Wilkins, D.E., Ford, R.L., 2007. Nearest neighbor methods applied to dune field 666 organization: The Coral Pink Sand Dunes, Kane County, Utah, USA. Geomorphology 667 83, 48-57. 668 27 669 Appendix 1: Additional nearest neighbor analysis examples 670 Explosive interactions between lava and groundwater can generate volcanic rootless 671 cones (VRCs). VRCs are well suited to nearest neighbor (NN) analyses because they 672 occur in groups, which consist of a large number of landforms that share a common 673 formation mechanism. Geological Image Analysis Software (GIAS) includes NN 674 analysis tools that were used by Hamilton et al. (2010b) to investigate self- 675 organization processes within VRC groups and quantitatively compare the spatial 676 distribution of VRCs in the Laki lava flow in Iceland to analogous landforms in the 677 Tartarus Colles Region of Mars. 678 Figure 1 presents examples of rootless eruption sites within the Hnúta and 679 Hrossatungur groups of the Laki lava flow. In the field, Differential Global 680 Positioning System (DGPS) measurements were used to delimit rootless crater floors 681 and their centroids were assumed to represent the surface projection of underlying 682 rootless eruption sites (Hamilton et al., 2010a, 2010b). Hamilton et al. (2010a) used 683 tephrochronology and stratigraphic relationships to divide the complete study area 684 (Figure 1A) into chronologically and spatially distinct groups, domains, and 685 subdomains. Figure 1B provides an example of a VRC subdomain (Hnúta Subdomain 686 1.1). Figures 1C and D show the convex hull boundaries of the complete study area 687 and Subdomain 1.1 of the Hnúta Group, respectively. Figure 1 also shows the 688 locations of the convex hull vertices (Nv) and interior rootless eruption sites (Ni). 689 Identification of internal boundaries within the Hnúta and Hrossatungur 690 groups allowed Hamilton et al. (2010b) to examine the effects of scale and resource 691 availability on the spatial distribution explosive lava-water interactions. Relative to 692 the Poisson NN model, rootless eruption sites in both the complete study area (Ni = 693 2193) and Hnúta Subdomain 1.1 (Ni = 98) have |c| and |R| values that exceed their 28 694 respective sample-size-dependent limits of confidence at 2 σ. Consequently, neither of 695 these distributions are constant with a Poisson random model. Within the complete 696 study area, R is 0.72, which is lower than the 2 σ confidence limit of R (0.98) and 697 implies that the distribution is clustered with respect to the Poisson NN model. In 698 Hnúta Subdomain 1.1, R is 1.23, which is above the upper 2 σ confidence limit of 699 1.16 and implies that the distribution is repelled relative to a Poisson NN model. 700 Hamilton et al. (2010b) interpret these patterns of spatial distribution within the 701 context of resource abundance and depletion though competitive utilization. On 702 regional scales, VRCs cluster in locations that contain sufficient lava and groundwater 703 to initiate rootless eruptions, but on local scales rootless eruption sites tend to self- 704 organize into distributions that maximize the utilization of limited groundwater 705 resources. Hamilton et al. (2010b) interpret topography to be the primary influence on 706 regional clustering because it would control initial groundwater abundance and 707 regions of lava inundation. In contrast, they attribute local repelling to the non- 708 initiation, or early cessation, of rootless eruptions at unfavorable localities due to 709 groundwater depletion in a competitive system that is analogous to an aquifer 710 perforated by a network of extraction wells. 711 Baloga et al. (2007) showed that rootless eruption sites in the Tartarus Colles 712 Region of Mars have R = 1.22 and c = 6.14. These NN statistics are similar to those 713 observed within Hnúta Subdomain 1.1 (R = 1.23 and c = 4.45), which suggests that a 714 similar formation process in both environments. Thus the statistical NN analysis tools 715 provided within GIAS can be combined with field observations and remote sensing to 716 quantify patterns of spatial distribution and infer underlying mechanisms of formation 717 within a diverse range of geological systems. 718 29 719 Appendix 1 Figure 1: 720 A and B depict rootless eruption sites (red circles) superimposed on a 0.5 m/pixel 721 color aerial photograph showing 1783-1784 Laki eruption lava flows in Iceland. A: 722 Full extent of Hnúta and Hrossatungur groups and an inset (white rectangle) denoting 723 Hnúta Subdomain 1.1 (shown in B). C and D correspond to identical regions shown in 724 A and B, respectively, and provide examples of convex hull boundaries, rootless 725 eruption sites that form convex hull vertices (Nv), and interior points (Ni). 726 727 Appendix 1 Figure 1 30 728 Appendix 2: Derivation for the Scavenged Nearest Neighbor (k = 2) case 729 The probability of finding exactly k points within a circle of radius r on a plane can be 730 found using the Poisson distribution: 731 P(r , k )  (0 r 2 ) k 0 r 2 , e k! [1] 732 where ρ0 is the spatial density—defined by the number of objects (N) divided by the 733 area of the feature field (A). If the process of creating a point on the plane consumes 734 (or “scavenges”), resources that would have formed the kth nearest neighbor, then the 735 process can be modelled as (Baloga et al., 2007): 736 737 738 739 P(r; k scavenges )  1  P(0)  P(1)   P(k ) . [2] For k = 2, the density distribution of the function can be written as: p(r )  (0 ) 3 r 5 e 0r . 2 [3] For convenience, we write   0 , leading to: p(r )   3 r 5 e r . 2 740 741 To find the mean of this distribution, we multiply by r, integrate from 0 to ∞, and 742 normalise the resulting equation, to obtain first moment of the distribution r : [4] 743   3  r  r 5  e r dr 2 744 r  0  r . 5 e r 2 [5] dr 0 745 746 The integrals can be solved using integration by parts and two mathematical 747 identities for the odd and even terms of r (e.g., Stevenson, 1992). For the even terms, 748 the following identity can be used, 749 31 750  r 751 2n  e r dr  2 0 752 1.3.5(2n  1)  , 1 2 n1 n 2 [6]  e r dr [7] while odd integrals of the type,  r 753 2 n 1 2 0 754 can be solved using the substitution x2 = u and solving using integration by parts, 755 e.g.,  3 r  r  e dr  2 756 0  1 ue u du .  20 [8] 757 This leads to the following solution for the mean distribution of the nearest neighbour 758 distances for k = 2: 1 3  5   759 760 r  7 3 3 24   2  15    0  15 . 7 7 1 3 1 16 0 16 0 2 2  The second moment r 2 can be calculated in a similar manner:  761 [9] r2  r 2  r 2  r 3  e r dr 2  0  r 5 e r 2 dr 3 0 . [10] 0 762 763 The standard deviation is calculated using the identity:   15 2 0.27572   2  . 0 16  0 0 3 764 Thus, the expected Standard Error (σe) is: 765 e  0.27572 N  0 , r2  r 2 : [11] [12] 32 766 where N is the number of points in the area of interest. The third and fourth moments 767 can be derived as above, leading to: 768  r3  769 r  r 5  e r dr 2 3  0  r 5 e r 2 dr 105 320 3 ; [13] 2 0  r4  770 r 4  r 5  e r dr 2  0  r 5 e r 2 dr 12 (0 ) 2 . [14] 0 771 772 773 774 Using the moment formulae for skewness and kurtosis: skew  r 3  3 r r 2  2 r ; kurt  r 4  4 r r 3  6 r 3 2 r 2  3 r , [15] 4 we obtain: skew  0.006663  0 3 3 2 ; kurt  0.0174838  4 0 2 . [16; 17] 775 33 776 Appendix 3: Separation of population means 777 To examine the approximate number of samples required to find a statistically 778 significant separation of two Gaussian distributions, where the expected means and 779 standard deviations are known, a commonly-used test is to examine if zero lies within 780 the following confidence interval: 781 782       P s 0  s1  1.96 0  1   0  1  s 0  s1  1.96 0  1   0.95 , N N N N   [1] 783 where s0 and s1 are the sample means, μ0 and μ1 are the expected means and σ0 and σ1 784 are the expected standard deviations. N can be varied to estimate the number of 785 samples required to separate the distributions. In particular, if zero is within the 786 interval, then there is no discernible difference between the means of the distributions 787 at the 95% confidence level. 788 This method is applied to simulate two different means based upon the 789 expected values of the Poisson and the Scavenged NN when k = 1 and k = 2. To 790 simplify the problem, an approximation to a normal distribution is assumed. 791 Simulation from two normal distributions with increasing numbers of N was 792 coded in MATLAB. We found the number of samples required to statistically 793 differentiate between the distributions by varying ρ0. 794 For PNN and Scavenged NN, k = 1: 795 ρ0 (objects / unit area) 0.5 0.05 0.005 0.0005 Minimum N required to separate Poisson NN and Scavenged NN (k = 1) 50 60 60 60 796 797 For Scavenged NN, k = 1 and Scavenged NN, k = 2: 34 ρ0 (objects / unit area) 0.5 0.05 0.005 0.0005 Minimum N required to separate Poisson NN and Scavenged NN (k = 2) 100 120 130 150 798 799 Given that these simulations assume Normal distributions, it would be wise to 800 regard them as indicative of the minimum number of data points required to separate 801 the competing models. A much larger number of samples should be used separate the 802 Scavenged NN, k = 1 and k = 2 models because their expected means are relatively 803 close together. 804 35 805 Electronic Supplements 806 (1) MATLAB preparsed pseudo code and standalone MATLAB executable file, 807 available to the public at http://www.geoanalysis.org. 808 (2) GIAS help file 809 810 Figures and Tables 811 Tables 812 Table 1: Notation. 813 814 Table 2: Nearest neighbor model properties (after Baloga et al., 2007). 815 816 Table 3: “Image Analysis” module: summary of results. 817 818 Table 4: “Nearest Neighbor Analysis” module: summary of results. 819 820 Figures 821 Figure 1: Image analysis program in “Image Analysis” mode, with maximum object 822 areas truncated to ~104 pixels for visualization purposes which excludes very large 823 (and least circular) vesicles from displayed plots and histograms. Note that object 824 thresholding does not affect results presented in Tables 3 and 4. 825 826 Figure 2: Image analysis program in “Nearest Neighbor Analysis” mode with outputs 827 displaying results for a Poisson NN test. 828 36 829 Table 1. Variable A Ahull C k N Ni Nv NN P R R ro ra re P ρ0 ρi Σ σe 830 Definition Area of a feature field Area of the convex hull Statistic for comparing observed (actual) to expected mean NN distances Poisson index Number of features within a sample population Number of features within the convex hull Number of features forming the vertices of the convex hull Abbreviation for nearest neighbor Probability Statistic for comparing actual (ra) to expected (re) mean NN distances Radial distance Threshold distance used to calculate Normalized Poisson NN distributions Mean NN distance observed in the data Mean NN distance calculated from a given NN model (e.g., Poisson) Probability density Population (spatial) density of the input features (ρ0 = N / A) Population (spatial) density of the input features (ρi = Ni / Ahull) Standard deviation Standard error 37 831 Table 2. Statistic Poisson Probability Density (P) 20 re  0r Mean (re) 1 3 15 2 0 4 0 16  0 Mode (rmode) Scavenge k = 2 2(0 ) 2 r 3e  0r 2(0 )3 r 5e  0r 2 1 3 20 5 20 20 0.272249 0.27572 N 0 N 0 N 0 (  3) 0.008187 0.0066638 Standard Error (σe) 0.26136 Skewness 4 30 Kurtosis 2 Scavenge k = 1 3 2 2  3 2 16  4 (0 ) 2  30 3 2  30 3 2 0.016807 0.0174838  4 0 2  0 4 2 Normalized NN 20re   0 ( r0 2 20 e  0 r0 2  r 2 2 Logistic e ( r  m) b b[1  e ( r  m) b ]2 m r 2 ) e  0 r dr 2 r0 1 20 m r02  r 2  1 0 b N 3 0  3  1  2 r (r0  3r )  r   2r 2  3  0    20  1   832 2  3  2 2 2   3r 4  r  r  4r r0  6r   4  0  0 0  0 2  6 5 38 833 Table 3. Image Analysis Resolution (pixel size) Minimum object area Total number of objects (excluding boundary objects) Abundance (% of total) Area (min. to max.) Mean area (±1 σ) Eccentricity (min. to max.) Mean eccentricity (±1 σ) Perimeter (min. to max.) Mean perimeter (±1 σ) Orientation (min. to max.) Mean orientation (±1 σ) Magmatic vesicles (base unit: μm) 3.81 μm 20 pixels = 76.2 μm2 537 65.48 2.90 ×10 to 6.40 ×106 7.11 ×104 (±3.73 ×105) 0.00 to 0.97 0.71 (±0.18) 56.50 to 3.55 ×104 8.64 ×102 (±2.29 ×103) -89.32 to 90.00 7.27 (±43.91) 2 834 39 835 Table 4. Nearest Neighbor (NN) Analysis Magmatic vesicles (base unit: μm) Input distribution properties Resolution (pixel size) Minimum object area Objects inside convex hull (Ni) Interior object density (ρi) Range of NN distances Mean NN distance (ra; ±1 σ) Skewness Kurtosis 3.81 μm 20 pixels = 76.2 μm2 520 7.79 ×10-6 28.05 to 1.53 ×103 1.75 ×102 (±1.15 ×102) 2.48 16.13 Poisson model Expected mean NN distance (re) R with thresholds at 2 σ c with thresholds at 2 σ Conclusions (c value) 1.79 ×102 0.98 (0.97 and 1.07) -0.91 (-1.33 and 2.83) Model supported (<2 σ) Normalized Poisson model Distance threshold Expected mean NN distance (re) R with thresholds at 2 σ c with thresholds at 2 σ Conclusions (c value) 5 pixels = 19.05 μm 1.50 ×102 1.17 (0.97 and 1.07) 3.51 (-1.34 and 2.88) Model rejected (>2 σ) Repelled vs. Normalized Scavenged (k = 1) model Expected mean NN distance (re) R with thresholds at 2 σ c with thresholds at 2 σ Conclusions (c value) 2.69 ×102 0.65 (0.99 and 1.06) -21.82 (-0.69 and 3.91) Model rejected (>2 σ) Clustered vs. k = 1 Scavenged (k = 2) model Expected mean NN distance (re) R with thresholds at 2 σ c with thresholds at 2 σ Conclusions (c value) 3.36 ×102 0.52 (1.00 and 1.06) -37.05 (-0.14 and 4.78) Model rejected (>2 σ) Clustered vs. k = 2 836 40 837 Figure 1 838 41 839 Figure 2 840 42

New image processing software for analyzing

Related documents

Products

Support

New image processing software for analyzing

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib