New image processing software for analyzing

advertisement
1
New image processing software for analyzing object size-frequency
2
distributions, geometry, orientation, and spatial distribution*
3
Ciarán Beggan1 and Christopher W. Hamilton2
4
1
Grant Institute, School of GeoSciences, University of Edinburgh, UK
5
2
Hawai‘i Institute of Geophysics and Planetology, University of Hawai‘i, USA
6
7
Corresponding author: Ciarán Beggan, E-mail: ciaran.beggan@ed.ac.uk; Phone: +44
8
131 650 5916; Address: School of GeoSciences, Grant Institute, West Mains Road,
9
Edinburgh, EH9 3JW, UK
10
11
*Code available from server at http://www.geoanalysis.org
12
13
Abstract
14
Geological Image Analysis Software (GIAS) combines basic tools for calculating
15
object area, abundance, radius, perimeter, eccentricity, orientation, and centroid
16
location, with the first automated method for characterizing the aerial distribution of
17
objects using sample-size-dependent nearest neighbor (NN) statistics. The NN
18
analyses include tests for: (1) Poisson, (2) Normalized Poisson, (3) Scavenged k = 1,
19
and (4) Scavenged k = 2 NN distributions. GIAS is implemented in MATLAB with a
20
Graphical User Interface (GUI) that is available as pre-parsed pseudocode for use
21
with MATLAB, or as a stand-alone application that runs on Windows and Unix
22
systems. GIAS can process raster data (e.g., satellite imagery, photomicrographs, etc.)
23
and tables of object coordinates to characterize the size, geometry, orientation, and
24
spatial organization of a wide range of geological features. This information expedites
25
quantitative measurements of 2D object properties, provides criteria for validating the
1
26
use of stereology to transform 2D object sections into 3D models, and establishes a
27
standardized NN methodology that can be used to compare the results of different
28
geospatial studies and identify objects using non-morphological parameters.
29
30
Key Words: Image analysis, software, size-frequency, spatial distribution, nearest
31
neighbor, vesicles, volcanology
32
33
1. Introduction
34
Geological image analysis extracts information from representations of natural objects
35
that may either be captured by an imaging system (e.g., photomicrographs, aerial
36
photographs, digital satellite imagery) or schematically rendered into visual form
37
(e.g., geological maps). In addition to examining the properties of individual objects,
38
spatial analysis may be used to quantify object distributions and investigate their
39
formation processes.
40
ImageJ (Rasbald, 2005) is a commonly used image processing application that
41
was developed as an open source Java-based program by the National Institutes of
42
Health (NIH). Custom plug-in modules enable ImageJ to solve numerous tasks,
43
including the analysis of vesicle size-frequency distributions within geological thin-
44
sections (e.g., Szramek et al, 2006, Polacci et al., 2007). However, ImageJ and similar
45
programs for Windows (e.g., Scion Image and ImageTool) and Macintosh (e.g., NIH
46
Image) are not specifically designed for geological applications nor do they provide
47
analytical tools for investigating patterns of spatial organization.
48
To take greater advantage of the information contained within geological
49
images, we have developed Geological Image Analysis Software (GIAS). This
50
program combines (1) an image processing module for calculating and visualizing
2
51
object areas, abundance, radii, perimeters, eccentricities, orientations, and centroid
52
locations, and (2) a spatial distribution module that automates sample-size dependent
53
nearest neighbor (NN) analyses. Although other programs can perform the basic
54
functions in the “Image Analysis" module, GIAS is the first program to automate
55
sample-size-dependent analyses of NN distributions.
56
57
2. Motivation
58
Nearest neighbor (NN) analysis is well-suited for investigating patterns of spatial
59
distribution within intrinsically two-dimensional (2D) datasets, such as orthorectified
60
aerial photographs and satellite imagery. Applications of NN analyses to remote
61
sensing imagery include the study of volcanic landforms (Bruno et al., 2004; 2006;
62
Baloga et al., 2007; Bishop, 2008; Hamilton et al., 2009; Bleacher et al., 2009),
63
sedimentary mud volcanoes (Burr et al., 2009), periglacial ice-cored mounds (Bruno
64
et al., 2006), glaciofluvial features (Burr et al., 2009), dune fields (Wilkins and Ford,
65
2007), and impact craters (Bruno et al., 2006). Despite widespread utilization of NN
66
analyses, its value as a remote sensing tool and effectiveness as basis for comparison
67
between different datasets is limited by the lack of a standardized NN methodology—
68
particularly in terms of defining feature field areas, thresholds of significance, and
69
criteria for apply higher-order NN methods.
70
In addition to remote sensing applications, NN analyses can be used to study
71
objects in photomicrographs such as crystals (Jerram et al., 1996; 2003; Jerram and
72
Cheadle, 2000) and vesicles. In this study, we emphasize the application of NN
73
analyses to vesicle distributions to demonstrate how GIAS can be used to validate (or
74
refute) the assumption of randomness, which is a prerequisite for effectively applying
3
75
stereological techniques to derive vesicle volumes from photomicrographs and for
76
selecting appropriate statistical models to characterize those vesicle populations.
77
Vesicle textures preserve information about the pre-eruptive history of
78
magmas and can be used to investigate the dynamics of explosive and effusive
79
volcanic eruptions (e.g., Mangan et al., 1993; Cashman et al., 1994; Mangan and
80
Cashman 1996; Cashman and Kauahikaua, 1997; Polacci and Papale 1997; Blower et
81
al., 2001; Blower et al., 2003; Gaonac'h et al., 2005; Shin et al., 2005; Lautze and
82
Houghton, 2005; Adams et al., 2006; Polacci et al., 2006; Sable et al., 2006; Gurioli
83
et al., 2008). Quantitative vesicle analyses stem from Marsh (1988), who explored the
84
physics of crystal nucleation and growth dynamics to derive an analytical formulation
85
for crystal size distributions. This research was then applied to numerous bubble size-
86
frequency distribution studies (e.g., Sarda and Graham, 1990; Cashman and Mangan,
87
1994; Cashman et al., 1994; Blower et al., 2003). Early studies of vesicle distributions
88
(e.g., Cashman and Mangan, 1994) were limited by their inability to characterize the
89
full range of vesicle sizes because their methodology could not resolve the smallest
90
vesicles. Nested photomicrographs solve this problem because photomicrographs
91
captured at multiple magnifications enable the reconstruction of total vesicle size-
92
frequency distributions (e.g., Adams et al., 2006; Gurioli et al., 2008).
93
In general, vesicle studies are limited by two major factors: transformations of
94
2D cross-sections into representative vesicle volumes (Mangan et al., 1993; Sahagian
95
and Proussevitch, 1998; Higgins, 2000; Jerram and Cheadle, 2000); and development
96
of reliable statistical characterizations of vesicle populations in terms of distribution
97
functions and their spatial characteristics (Morgan and Jerram, 2007; Proussevitch et
98
al., 2007a). These difficulties have been partially addressed by improved statistical
99
techniques for investigating vesicle populations; however, a single cross-section
4
100
cannot be used to reconstruct a representative 3D vesicle distribution unless the
101
objects viewed in a 2D section can be characterized using 2D reference textures with
102
known 3D distributions (Jerram et al., 1996; 2003; Jerram and Cheadle 2000;
103
Proussevitch et al., 2007a). Although synchrotron X-ray tomography is increasingly
104
being used to directly generate 3D vesicle distributions (e.g., Gualda and Rivers,
105
2006; Polacci et al., 2007; Proussevitch et al., 2007b), nested datasets containing
106
multiple Scanning Electron Microscope (SEM) images remain the most common
107
input for vesicle studies because of their superior spatial resolution relative to X-ray
108
tomography. To facilitate the analysis of vesicles in SEM imagery, GIAS can be used
109
to determine the geometric properties of vesicles within the plane of a given
110
photomicrograph and establish if objects fulfil the criteria of spatial randomness,
111
which is required for transforming 2D sections into 3D models using stereology.
112
113
3. Nearest neighbor (NN) analysis
114
Clark and Evans (1954) proposed a simple test for spatial randomness in which the
115
actual mean NN distance (ra) in a population of known density is compared to the
116
expected mean NN distance (re) within a randomly distributed population of
117
equivalent density. Following Clark and Evans (1954), re and expected standard error
118
(σe) of the Poisson distribution are:
119
re 
1
2 0
; e 
0.26136
N 0
,
[1; 2]
120
where the input population density, ρ0, equals the number of objects (N) divided by
121
the area (A) of the feature field (ρ0 = N / A). The following two test statistics (termed
122
R and c) are used to determine if ra follows a Poisson random distribution:
123
R
ra
re
; c
ra  re
e
.
[3; 4]
5
124
If a test population exhibits a Poisson random distribution, R should ideally
125
have a value of 1, while c should equal 0. If R is approximately equal to 1, then the
126
test population may have a Poisson random distribution. If R > 1, then the test
127
population exhibits greater than random NN spacing (i.e., tends towards a maximum
128
packing arrangement), whereas if R < 1, then the NN distances in the test case are
129
more closely spaced than expected within a random distribution and thus exhibit
130
clustering relative to the Poisson model.
131
To identify statistically significant departures from randomness at the 0.95 and
132
0.99 confidence levels, |c| must exceed the critical values of 1.96 and 2.58,
133
respectively (Clark and Evans, 1954); however, these critical values implicitly assume
134
large sample populations (N > 104). Jerram et al. (1996) and Baloga et al. (2007) note
135
that finite-sampling effects introduce biases into the variation of NN statistics. These
136
biases become significant for small populations (N < 300) and thereby necessitate the
137
use of sample-size-dependent calculations of R and c thresholds.
138
The NN methodology of Clark and Evans (1954) assumes that objects can be
139
approximated as points. However, for frameworks of solid objects, NN statistics
140
exhibit scale-dependent variations in randomness due to the restrictions imposed by
141
avoiding overlapping object placements. For 2D sections through 3D distributions of
142
equal-sized solid spheres, Jerram et al. (2003) define a porosity-dependent threshold
143
for R, which they term the random sphere distribution line (RSDL). With increasing N
144
(i.e., decreasing porosity), the RSDL threshold increases from a value just above 1 to
145
a maximum of 1.65 at 37% porosity, based on the experimental spherical packing
146
model of Finney (1970). Jerram et al. (1996, 2003) also note that compaction,
147
overgrowth, and sorting can introduce systematic variations in R relative to the RSDL
6
148
and thus, when correctly identified, can provide information relating to the relative
149
importance of these processes during rock formation.
150
151
In addition to sample-size-dependent tests for Poisson randomness, Baloga et
al. (2007) proposed the following three NN models:
152
153
1. Normalized Poisson NN distributions account for data resolution-limits, which
154
require normalization of re, σe, skewness, and kurtosis values based on a user-
155
defined minimum distance threshold.
156
2. Scavenged Poisson NN distributions assume features consume (or “scavenge”)
157
resources in their surroundings, which affects the kth nearest neighbor by
158
increasing re relative to the standard Poisson NN model. The order of the
159
scavenging model is given by the Poisson index, k, which identifies the
160
number of NN objects participating in the scavenging process and, if k = 0, the
161
standard Poisson NN distribution is obtained.
162
3. Logistic NN distributions apply the classical probability distribution for self-
163
limiting population growth due to the use of available resources. Under this
164
probability assumption, NN distances become more uniform, which leads to
165
smaller skewness, but larger kurtosis compared to the Poisson NN model.
166
167
Sample-size-dependent R and c test statistics, in addition to the expected
168
skewness versus kurtosis values, allow for robust statistical tests of spatial
169
organization. However, prior to this study, these extensions to the classical NN
170
method have not been automated and freely available for use.
171
172
4. Method
7
173
4.1. Input data and processing parameters
174
GIAS is designed to process two input data types: images and tables of object
175
coordinates. In the first instance, images may include rectangular raster files (e.g.,
176
*.tiff, *.jpg, *.gif, etc.) encoded in 8-bit grey-scale formats. All inputs are assumed to
177
have an orthogonal projection with equal x and y pixel dimensions. If the input image
178
pixels are not square, their area can be uniquely specified. To appropriately scale the
179
output results, the pixel size must be defined by the analyst for each input image.
180
Image segmentation is achieved using Digital Number (DN) thresholding,
181
such that inputs are converted to a binary threshold logical matrix. GIAS assumes
182
input grey-scale pixel values >250 are white; however, the upper and lower thresholds
183
can be adjusted in the input options. Within input images, black pixels are assumed to
184
be the objects of interest (e.g., vesicles) whereas white pixels are assumed to be part
185
of the background (e.g., non-vesicles). An option is provided for image inversion.
186
Other than performing the binary conversion, GIAS does not modify input
187
images. Consequently, pre-processing may be necessary to remove image artifacts
188
and segment objects into feature classes. To facilitate image analysis and pre-
189
processing of inputs to the NN module, GIAS supplies labeling information and
190
metadata for each object in the image analysis output file; however, the level of image
191
modification will depend upon the particular scientific investigation and the
192
preferences of the analyst.
193
Inputs to the NN module can be passed from the image analysis module or
194
loaded as a two column table (tab- or space- delimited) with x-coordinates in the first
195
column and y-coordinates in the second column. GIAS automatically checks input
196
files for duplicate entries and retains only unique coordinate pairs.
8
197
Slider bars control the detection thresholds for the minimum and maximum
198
object areas, in addition to the distance threshold used within the Normalized Poisson
199
tests. A check-box excludes objects that touch the periphery of the image from
200
subsequent processing. This option is enabled by default because meaningful cross-
201
sectional properties (e.g., area, geometry, and orientation) cannot be calculated for
202
truncated objects. Calculations of object abundance (as a percentage of total image
203
area) are based on total image properties and are unaffected by peripheral object
204
exclusion. Once the input options are satisfactorily set, the analyst simply presses the
205
“Process” button to calculate the output graphs and statistics.
206
207
4.2. Output results
208
GIAS outputs its results to on-screen graphs and text boxes, which are presented in
209
two tabbed panels. Statistics relating to the object area, geometry, and orientation are
210
shown in the first panel, labeled “Image Analysis” (Figure 1). The frequency axis can
211
be plotted in linear or log units with on-screen options for defining custom
212
dimensions. Object areas, radii, perimeters, and eccentricities are presented as
213
histograms, whereas object orientations are displayed using a rose plot. Object radii
214
are estimated by calculating the radii of equal area circles. In addition to the graphical
215
output, there are text-box summaries of the total number of objects, their abundance,
216
minimum, maximum, mean, standard deviation (at 1 σ), skewness, and kurtosis based
217
on the object area calculations.
218
Results of the spatial analyses are summarized in a second panel, labeled
219
“Nearest Neighbor Analysis” (Figure 2). These results include histograms of the
220
measured NN distances, in addition to graphs of R, c, and skewness versus kurtosis
221
(Baloga et al., 2007). The R and c values are presented with sample-size-dependent
9
222
thresholds for rejecting the null hypothesis (i.e., a given model of re) at significance
223
levels of 1 and 2 σ. By default, the program assumes a Poisson random model for the
224
expected spatial distribution; however, the user may also view results based on
225
Normalized Poisson, or Scavenged (k = 1 or 2) models. Similarly, the skewness
226
versus kurtosis plots are provided with sample-size-dependent significance levels of
227
0.90, 0.95, and 0.99.
228
Textbox outputs provide the model-dependent value of re and summarize ra
229
statistics, including the total number of objects, number within the convex hull, and
230
NN distance statistics (minimum, maximum, mean, and standard deviation at 1 and 2
231
σ). The convex hull is a polygon defined by the outermost points in a feature field
232
such that each interior angle of the polygon measurers less than 180º (Graham, 1972).
233
The convex hull thus forms a boundary around a feature field with the minimum
234
possible perimeter length (see Appendix 1 for examples).
235
The input parameters and computed statistics from both tabbed panels can be
236
output in separate user-named text files. The data are saved in a tab-delimited format,
237
allowing the files to be viewed using a text editor or imported into a spreadsheet.
238
239
4.3. Implementation
240
GIAS is written and implemented in MATLAB with a Graphical User Interface (GUI)
241
that is designed for use within the geological community. The program is available as
242
preparsed pseudocode for use with MATLAB (provided both “Image Processing” and
243
“Statistics” toolboxes are installed) or as a standalone application that can run on
244
Windows and Unix systems without requiring MATLAB or its auxiliary toolboxes.
245
These software products may be downloaded from the internet for free via
246
http://www.geoanalysis.org.
10
247
The regionprops function within the MATLAB Image Processing toolbox
248
calculates the following key properties of each object: area, centroid position,
249
perimeter, eccentricity, orientation, equal-area circle equivalent diameter and a list of
250
pixel positions within each vesicle. The pixel list is used to determine which objects
251
are in contact with the image edge and the equal area circle diameters are used to
252
calculate object radii. The eccentricity and orientation are calculated by fitting an
253
ellipse about the object. The angle between the major axis of the ellipse and the x-axis
254
of the image defines the orientation.
255
By default, the NN analysis utilizes centroid positions calculated in the image
256
analysis module, however, object locations may be uploaded as a table of x- and y-
257
coordinates. Within the NN module, the MATLAB function convhull is used to
258
identify the vertices of the convex hull for all of the input objects. The function
259
convhull is also used to calculate the area of the convex hull (Ahull) and to separate
260
interior objects (Ni) from the hull vertices (see Appendix 1 for examples).
261
NN distances are determined by calculating the Euclidean distance between
262
each interior object (centroid) and all other objects within the dataset, including the
263
vertices of the hull. The results for each interior point are sorted in ascending order to
264
identify the minimum NN distance and calculate the actual mean NN distance (ra) by
265
averaging the NN distances for all interior objects (Ni). The interior population
266
density (ρi = Ni / Ahull), provides a statistical estimate of the true density within an
267
infinite domain and thus substitutes for ρ0. We then calculate R as the ratio of ra-to-re.
268
Using sample-size-dependent values of R, c, and their respective thresholds of
269
significance, we may define the following scenarios. If the values of R and c fall
270
within ±2 σ of their expected values, then the results suggest that the model correctly
271
describes the input distribution. If the value of c is outside the range of ±2 σ, then the
11
272
inference is that the input distribution is inconsistent with the model, regardless of the
273
value of R. If c is outside ±2 σ, and R is greater than the +2 σ threshold, then the input
274
distribution exhibits a statistically significant repelling relative to the model.
275
Conversely, if c is outside the ±2 σ range, and R less then than the -2 σ threshold, then
276
the input distribution is inferred to be clustered relative to the model. Lastly, if R is
277
outside ±2 σ, and c is within the ±2 σ range, then the test results are ambiguous,
278
which generally indicates that there is insufficient data to perform the analysis and/or
279
the standard error (σe) of the input distribution is large.
280
One of the novel contributions of our application is the introduction of
281
automated calculations of sample-size-dependent biases in NN statistics. Baloga et al.
282
(2007) described how finite sampling would affect several NN models; however, they
283
only implemented a solution for the Poisson NN case. In our application, we have
284
automated the Poisson NN procedure, extended it to include the three other models of
285
Baloga et al. (2007), and derived the k = 2 case for the scavenged NN distribution
286
(Appendix 2). Automation of the NN method within GIAS enables users to rapidly
287
determine if their data fulfills the size-dependent criteria of statistical significance for
288
the following five spatial distribution models: (1) Poisson, (2) Normalized Poisson,
289
(3) Scavenged k = 1; and (4) Scavenged k = 2. Table 2 summarizes the properties of
290
these models.
291
To obtain sample-size-dependent values of R, c, and their confidence limits,
292
we generated simulations in MATLAB. Following the method of Baloga et al. (2007),
293
confidence limits for the Poisson and the Scavenged k = 1 and 2 models were
294
simulated using 1000 random draws from a modeled distribution. However, for the
295
Normalized Poisson model, limits for the standard Poisson case are employed because
296
simulating unique limits for each user-defined distance threshold is computationally
12
297
intensive in real-time (e.g., requiring approximately four minutes to perform the
298
simulations using a computer with a 2 GHz processor and 1 GB of RAM).
299
GIAS does not include the logistic density distribution described by Baloga et
300
al. (2007) because its parameters (i.e., the logistic mean (m) and the standard
301
deviation (b), see Table 2) are estimated using a best-fit to the input data. The logistic
302
case therefore differs substantially from the other NN analyses because they involve a
303
comparison between input spatial distributions and expected model distributions.
304
Additionally, the fit of a logistic curve to an input distribution does not necessarily
305
imply an underlying logistic process because other processes could generate
306
distributions that better fits the data. Thus given the differences between the logistic
307
density distribution described by Baloga et al. (2007) and the other Poisson NN
308
models, we have omitted the logistic case from the geospatial analysis tools
309
implemented within GIAS.
310
To correctly interpret the results of the Scavenging NN models, it is important
311
to recognize that the formulation of Baloga et al. (2007) is a scale-free distribution.
312
Modification to expected NN distances depend upon the number of objects
313
participating in the scavenging process and not on the location of the objects.
314
Consequently, the mean and standard deviation statistics are not directly related to the
315
radius of the scavenged area. This is an unrealistic condition for resource-scavenging
316
phenomena in nature, which have a fixed upper limit to r. Alternative modeling
317
scenarios, with a user-defined scavenging radius, may be more appropriate for some
318
applications—provided that a scale for the scavenging process can be determined.
319
Distinguishing between Poisson NN and Scavenged Poisson NN models
320
requires statistical separation of their respective NN distributions. The minimum
321
number of objects required will depend on ρ0 and k. In Appendix 3, we show that
13
322
separation of the Poisson and k = 1 Scavenged Poisson models, at the 95% confidence
323
level, requires a minimum 50 to 60 objects for ρ0 values of 0.5 to 0.0005 objects / unit
324
area, whereas 100 to 150 objects are required to distinguish between k = 1 and 2
325
models.
326
In addition to sample-size-dependent R and c statistics, we expand upon the
327
methods of Baloga et al. (2007) by simulating the sample-size-dependent range of
328
expected skewness and kurtosis values for each NN hypothesis. Calculated
329
confidence intervals—at 90%, 95% and 99%—are plotted within GIAS to illustrate
330
the expected ranges of skewness versus kurtosis for each model. Plots of skewness
331
versus kurtosis are also useful for discriminating between populations with otherwise
332
similar R and c values, such as ice-cored mounds (e.g., pingos) and volcanic rootless
333
cones (Bruno et al., 2006).
334
335
5. Results
336
To demonstrate the utility of GIAS, we have applied the program to vesicle patterns
337
within proximal magmatic tephra deposits from Fissure 3 of the 1783-1784 Laki cone
338
row, Iceland. Tables 3 and 4 summarize the results of the “Image Analysis” and
339
“Nearest Neighbor Analysis” modules, respectively. In Appendix 1, we include
340
additional examples of geospatial distributions that are clustered and repelled relative
341
to the Poisson NN model. These examples are based on the Hamilton et al. (2010a,
342
2010b) and demonstrate how statistical NN analyses can be combined with field
343
observations to obtain information about the effects of environmental resource
344
limitation on the spatial distribution of volcanic landforms. Appendix 1 also explores
345
how the geospatial analysis tools that are provided within GIAS can be applied to
346
remote sensing data to quantitatively compare the spatial distribution of landforms in
14
347
different planetary environments, supply non-morphological evidence to discriminate
348
between feature identities, and examine self-organization processes within geological
349
systems.
350
351
5.1. Vesicle analysis
352
Vesicle characteristics can provide information about the pre-eruptive history of
353
magmas, volatile exsolution dynamics, and conduit ascent processes—all of which are
354
important factors in controlling the intensity of explosive volcanic eruptions.
355
Quantification of these parameters relies largely upon the use of vesicle size, number,
356
and volume distributions to establish links between natural volcanic samples and
357
experimental data (Adams et al., 2006). Nested SEM images at several magnifications
358
are typically used in combination with stereology (e.g., Sahagian and Proussevitch,
359
1998) to transform 2D vesicle properties into 3D distributions. Although acquisition
360
and analysis of a complete nested SEM dataset is outside the scope of this study, we
361
present an example of how GIAS can be used to analyze vesicle properties within a
362
single image. For this example, we have chosen a magmatic tephra sample (LAK-t-
363
23c, Emma Passmore, unpublished data) from 1783-1784 Laki eruption in Iceland.
364
The resolution of the image is 3.81 μm / pixel and we have followed the method of
365
Gurioli et al. (2008) by using a minimum object area threshold of 20 pixels (76.2
366
μm2) to avoid complications associated with poorly resolved objects near the limit of
367
resolution within our data.
368
The output of the "Image Analysis" module (Table 3 and Figure 1) reveals that
369
the sample has a vesicularity of 65.48% and—excluding objects in contact with the
370
periphery of the image—there are 537 objects larger than the detection threshold of
371
76.2 μm2. Assuming the properties of an equal area circle, the vesicle diameters range
15
372
from 19.21 μm to 2.85 × 103 μm, with a mean of 3.01 × 102 μm ±6.89 × 102 (at 1 σ).
373
The ratio of equal area circle circumference to object perimeter is 0.77 ±1.53 (at 1 σ);
374
however; for the 21 vesicles larger than 3.10 × 105 μm2 the ratio decreases to 0.55
375
±0.21, which indicates that the largest vesicles strongly deviate from a circular
376
geometry (see the graph of “Objects Perimeters”). The vesicles have a mean
377
eccentricity of 0.71 ± 0.18 and a mean orientation trending 7.27° to 187.27° ±43.91°
378
(n.b., 90° points towards the top of the image; see “Object Eccentricity” and “Object
379
Orientations” in Figure 1).
380
The NN module results (Table 4 and Figure 2) show that NN distances
381
between vesicle centroids range from 28.05 to 1.53 ×103 μm, with a mean (ra) of 1.75
382
× 102 ±1.15 ×102 (see “Nearest Neighbor Frequency Distribution” in Figure 2). The
383
sample has R and c values within the 2 σ thresholds of significance (Figure 2) and
384
thus we conclude that the vesicle centroids exhibit a Poisson (random) distribution.
385
We have also analyzed the sample using a Normalized Poisson threshold of 5 pixels
386
(19.24 μm), which corresponds to the diameter of a circle that is equal in area to our
387
object area threshold of 20 pixels. Given this threshold, the sample exhibits a repelled
388
distribution with R and c values greater than their respective upper 2 σ limits. The
389
Scavenged k = 1 and 2 cases overestimate re relative to ra, and thus result in clustered
390
results relative to both Scavenged models.
391
The GIAS outputs suggest that vesicles within this sample exhibit an overall
392
Poisson distribution despite the presence of a vesicle fabric trending 7.27° to 187.27°
393
±43.91°. The random distribution of vesicles supports the usage of a stereological
394
transformation, however, deformation and coalescence have caused vesicles >3.10 ×
395
105 μm2 to strongly deviate from a circular geometry. Consequently, if the vesicles
396
represented within this thin-section were transformed into a 3D distribution using
16
397
stereology, it would be necessary to acknowledge propagating errors associated with
398
the non-circular geometry of the largest vesicles in the sample. Additionally, it is
399
important to recognize the resolution limitations associated with analysis because we
400
have applied a minimum object area threshold of 20 pixels, which excludes all
401
vesicles <76.2 μm2 (i.e., 630 of 1167 objects). A detection threshold of 20 pixels is
402
appropriate for studies of vesicle size-frequency distributions because an
403
misclassification of 1 pixel results in an error of 5% of the estimate of total object area
404
(Lucia Gurioli, personal communication, 2009). Nevertheless, this threshold appears
405
insufficient for filtering artifacts associated with the geometric properties of the
406
smallest vesicles in the input image. For instance, eccentricity values calculated for
407
vesicles within the Laki magmatic tephra sample appear to be very high (0.71 ± 0.18),
408
but this is a resolution-induced artifact resulting from pixel-scale asymmetries in a
409
large population of small vesicles. Resolution limitations also affect the Normalized
410
Poisson NN distribution because the large number of objects below the detection
411
thresholds leads to underestimates of ρ0 and re, which in turn results in a repelled
412
distribution relative to the model. In instances where the majority of objects are below
413
the detection limit of analysis, it is therefore imperative to use nested image datasets,
414
and/or high magnification mosaics to resolve the properties of the smallest objects in
415
a given distribution.
416
417
6. Discussion
418
For randomly distributed spherical objects, stereological techniques may be used to
419
convert cross-sectional areas into volumes (Sahagian and Proussevitch, 1998).
420
Higgins (2000) and Morgan and Jerram (2007) have also applied 2D to 3D
421
transformations for crystal shapes such as cubes and prisms based upon a large
17
422
number of observed cross-sectional areas. Nevertheless, for irregular object
423
geometries, transformations do not exist for generating an accurate 3D model from a
424
single 2D image (Mock and Jerram, 2005; Sable et al., 2006). For vesicle studies,
425
direct measurements of 3D vesicle distributions using X-ray microtomography can be
426
used to circumvent the problems associated with stereological transformations (e.g.,
427
Gualda and Rivers, 2006; Polacci et al., 2007; Proussevitch et al., 2007b), but the
428
resolution of synchrotron-based X-ray tomography is insufficient to resolve micron-
429
sized vesicle walls, which can result in erroneously large estimates of vesicle
430
permeability (Gurioli et al., 2008). Tomographic resolution limitations similarly
431
prohibit the detection of sub-micron vesicles, which decreases estimates of vesicle
432
number densities relative to investigations of higher resolution SEM images (Gurioli
433
et al., 2008). Consequently, even given the availability of 3D imaging techniques,
434
thin-section analysis continues to supply unique information about vesiculation
435
processes and, therefore, 2D image processing should continue to be explored.
436
Similarly, NN analysis of a single 2D image cannot resolve the spatial
437
organization of the centroids in 3D, unless additional reference information is
438
provided (e.g., Jerram et al., 1996; 2003; Jerram and Cheadle, 2000). Instead, 2D
439
methods characterize the aerial distribution of objects within the plane of the image.
440
When applying NN analyses to vesicle distributions in thin-sections, it is important to
441
remember that object centroids are unlikely to coincide with the center of a vesicle in
442
3D, especially when vesicles are irregularly shaped. To obtain a more meaningful
443
statistical description of vesicle organization it may be necessary to analyze several
444
photomicrographs—preferably three mutually perpendicular thin-sections that sample
445
the principal component directions of the vesicle fabric. Vertical offsets in the
446
position of objects in orthometric aerial photographs and satellite imagery may
18
447
similarly introduce discrepancies between projected object separations and true NN
448
distances, but the vertical offsets are generally small relative to the separation of
449
objects in the plane of the image and hence the differences can be ignored.
450
451
7. Conclusions
452
We have developed Geological Image Analysis Software (GIAS) for calculating the
453
geometric properties of objects and quantifying their spatial distribution. The program
454
improves the efficiency and reproducibility of geological object characterizations by
455
combining image processing tools for calculating object areas, abundance, radii,
456
perimeters, eccentricity, orientation, and centroid location with sample-size-dependent
457
NN tests for (1) Poisson; (2) Normalized Poisson; (3) Scavenged, k = 1; (4) and
458
Scavenged k = 2 distributions. These geospatial analyses have not been implemented
459
in other software packages and, consequently, GIAS represents a novel approach to
460
characterizing vesicle distributions in thin-sections and examining the spatial
461
organization of other geological features in datasets ranging from photomicrographs
462
(e.g., Section 5) to remote sensing imagery of planetary surfaces (e.g., Appendix 1
463
and Hamilton et al., 2010b).
464
Future work will focus on developing the following aspects of GIAS: (1)
465
image segmentation using Kriging analysis of DN histograms (e.g., Oh and Lindquist,
466
1999); (2) automated object recognition (e.g., Proussevitch and Sahagian, 2001;
467
Lindquist et al., 1996; Shin et al, 2005); (3) geometric cluster analysis (e.g., Still et
468
al., 2004); (4) statistical analyses to address truncated and multimodal populations
469
(e.g., Proussevitch et al., 2007a); (5) processing nested image datasets (e.g., Gurioli et
470
al., 2008); (6) incorporating stereological techniques for converting 2D data into 3D
471
distributions and then using the 3D models to calculate vesicle number density (e.g.,
19
472
Sahagian and Proussevitch, 1998; Proussevitch et al., 2007a); (7) developing NN
473
analyses for 3D datasets (Jerram and Cheadle, 2000; Mock and Jerram, 2005); and (8)
474
treatment of solid object interactions on NN statistics (e.g., Jerram et al.; 2003).
475
476
Acknowledgements
477
We thank Alexander Proussevitch, Steve Baloga, Sarah Fagents, Bruce Houghton,
478
Lucia Gurioli, and Mark Naylor for insightful discussions relating to vesicle
479
distributions and NN simulations; Emma Passmore for kindly providing the
480
geological thin-sections examined within this study; and Dougal Jerram and
481
Guilherme Gualda for their thorough reviews, which have greatly improved this
482
manuscript. CDB wishes to acknowledge the assistance of Sue Rigby in providing
483
funding for software purchases. CDB was funded under NERC studentship award
484
NER/S/J/2005/13496. CWH is funded by the Icelandic Centre for Research
485
(RANNÍS) and the NASA Mars Data Analysis Program (MDAP) grant
486
NNG05GQ39G. SOEST publication number: 7808. HIGP publication number 1802.
487
488
489
References
490
Adams, N.K., Houghton, B.F., Fagents, S.A., Hildreth, W., 2006. The transition from
491
explosive to effusive eruptive regime: The example of the 1912 Novarupta eruption,
492
Alaska. Geological Society of America Bulletin 118, 620–634.
493
doi:10.1130/B25768.1.
494
20
495
Baloga, S.M., Glaze, L.S., Bruno, B.C., 2007. Nearest-neighbor analysis of small
496
features on Mars: Applications to tumuli and rootless cones. Journal of Geophysical
497
Research 112, E03002, 1-17. doi:10.1029/2005JE002652.
498
499
Bishop, M.A., 2008. Higher-order neighbor analysis of the Tartarus Colles cone
500
groups, Mars: The application of geographical indices to the understanding of cone
501
pattern evolution. Icarus 197, 73–83.
502
503
Bleacher, J.E., Glaze, L.S., Greeley, R., Hauber E., Baloga, S.M., Sakimoto, S.E.H.,
504
Williams, D.A., Glotch, T.D., 2009. Spatial and alignment analyses for a field of
505
small volcanic vents south of Pavonis Mons and implications for the Tharsis province,
506
Mars. Journal of Volcanology and Geothermal Research 185, 96-102.
507
doi:10.1016/j.jvolgeores.2009.04.008.
508
509
Blower, J.D., Keating, J.P., Mader, H.M., Phillips, J.C., 2001. Inferring volcanic
510
degassing processes from vesicle size distributions. Geophysical Research Letters 28,
511
347–350.
512
513
Blower, J.D., Keating, J.P., Mader, H.M., Phillips, J.C., 2003. The evolution of
514
bubble size distributions in volcanic eruptions. Journal of Volcanology and
515
Geothermal Research 120, 1–23.
516
517
Bruno B.C., Fagents S.A., Thordarson T., Baloga S.M., Pilger E., 2004. Clustering
518
within rootless cone groups on Iceland and Mars: Effect of nonrandom processes.
519
Journal of Geophysical Research 109, E07009, 1-11. doi:10.1029/2004JE002273.
21
520
521
Bruno, B.C., Fagents, S.A., Hamilton, C.W., Burr, D.M., Baloga, S.M., 2006.
522
Identification of volcanic rootless cones, ice mounds, and impact craters on Earth and
523
Mars: Using spatial distribution as a remote sensing tool. Journal of Geophysical
524
Research 111, E06017, 1-16. doi:10.1029/2005JE002510.
525
526
Burr, D.M., Bruno, B.C., Lanagan, P.D., Glaze, L.S., Jaeger, W.L., Soare, R.J., Wan
527
Bun Tseung, J.M., Skinner, J.A., Baloga, S.M., 2009. Mesoscale raised-rim
528
depressions (MRRDs) on Earth: A review of the characteristics, processes and spatial
529
distributions of analogs for Mars. Planetary and Space Science 57, 579-596.
530
531
Cashman, K.V., Mangan, M.T., 1994. Physical aspects of magmatic degassing: II.
532
Constraints on vesiculation processes from textural studies of eruption products. In:
533
Carroll, M.R. and Holloway, J.R. (Eds.), Volatiles in Magmas. Reviews in
534
Mineralogy, Mineralogical Society of America, pp. 447–478.
535
536
Cashman, K.V., Kauahikaua, J.P., 1997. Reevaluation of vesicle distributions in
537
basaltic lava flows. Geology 25, 419–422.
538
539
Clark, P.J., Evans, F.C., 1954. Distance to nearest neighbor as a measure of spatial
540
relationships in populations. Ecology 35, 445–453.
541
542
Finney J., 1970. The geometry of random packing. Proceedings of the Royal Society
543
of London, Series A 318, 479-493.
544
22
545
Gaonac'h, H., Lovejoy, S., Schertzer, D., 2005. Scaling vesicle distributions and
546
volcanic eruptions. Bulletin of Volcanology 67, 350–357.
547
548
Gualda, G.A.R., Rivers, M., 2006. Quantitative 3D petrography using X-ray
549
tomography: Application to Bishop Tuff pumice clasts. Journal of Volcanology and
550
Geothermal Research 154, 48–62.
551
Gurioli, L., Harris, A.J.L., Houghton, B.F., Polacci, M., Ripepe, M., 2008. Textural
552
and geophysical characterization of explosive basaltic activity at Villarrica volcano,
553
Journal of Geophysical Research 113, B08206, 1-16. doi:10.1029/2007JB005328.
554
555
Hamilton, C.W., Thordarson, T., Fagents, S.A., 2010a, Explosive lava-water
556
interactions I: architecture and emplacement chronology of volcanic rootless cone
557
groups in the 1783-1784 Laki lava flow, Iceland. Bulletin of Volcanology, (in press)
558
559
Hamilton, C.W., Fagents, S.A., Thordarson, T., 2010b, Explosive lava-water
560
interactions II: self-organization processes among volcanic rootless eruption sites in
561
the 1783-1784 Laki lava flow, Iceland. Bulletin of Volcanology, (in press).
562
563
Higgins, M.D., 2000. Measurement of crystal size distributions. American
564
Mineralogist 85, 1105-1116.
565
566
Jerram, D.A., Cheadle, M.J., Hunter, R.H., Elliot, M.T., 1996. The spatial
567
distribution of crystals in rocks. Contributions to Mineralogy and Petrology 125(1),
568
60-74.
23
569
570
Jerram, D. A., Cheadle, M. J., 2000. On the cluster analysis of rocks. American
571
Mineralogist 84, 47-67.
572
573
Jerram, D.A., Cheadle, M.J., Philpotts, A.R., 2003. Quantifying the building blocks of
574
igneous rocks: are clustered crystal frameworks the foundation? Journal of Petrology
575
44, 2033-2051. doi:10.1093/petrology/egg069.
576
577
Lautze, N.C., Houghton, B.F., 2005. Physical mingling of magma and complex
578
eruption dynamics in the shallow conduit at Stromboli volcano, Italy. Geology 33
579
425–428.
580
581
Lindquist, W.B., Lee, S-M., Coker, D.A., Jones, K.W., Spanne, P., 1996. Medial axis
582
analysis of void structure in three-dimensional tomographic images of porous media.
583
Journal of Geophysical Research 101, 8297–8310.
584
585
Mangan, M.T., Cashman, K.V., 1996. The structure of basaltic scoria and reticulite
586
and inferences for vesiculation, foam formation, and fragmentation in lava fountains.
587
Journal of Volcanology and Geothermal Research 73, 1–18.
588
589
Mangan, M.T., Cashman, K.V., Newman S., 1993. Vesiculation of basaltic magma
590
during eruption, Geology 21, 157–160.
591
592
Mangan, M.T., Mastin, M., Sisson, T., 2004. Gas evolution in eruptive conduits:
593
combining insights from high temperature and pressure decompression experiments
24
594
with steady-state flow modelling. Journal of Volcanology and Geothermal Research
595
129, 23–36.
596
597
Marsh, B.D., 1988. Crystal size distributions (CSD) in rocks and the kinetics and
598
dynamics of crystallization: 1. Theory. Contributions to Mineralogy and Petrology 99,
599
277–291.
600
601
Mock, A., Jerram, DA., 2005. Crystal size distributions (CSD) in three dimensions:
602
Insights from the 3D reconstruction of a highly porphyritic rhyolite. Journal of
603
Petrology 46, 1525-1541.
604
605
Morgan, D., Jerram, D.A., 2006. On estimating crystal shape for crystal size
606
distribution analysis. Journal of Volcanology and Geothermal Research 154, 1-7.
607
608
Oh W., Lindquist W., 1999. Image thresholding by indicator Kriging. IEEE
609
Transactions on Pattern Analysis and Machine Intelligence 21, 590–602.
610
611
Polacci, M., Baker, D.R., Bai, L., Mancini L., 2007. Large vesicles record pathways
612
of degassing at basaltic volcanoes. Bulletin of Volcanology 70, 1023-1029. doi
613
10.1007/s00445-007-0184-8.
614
615
Polacci, M., Baker, D.R., Mancini, L., Tromba, G., Zanini, F., 2006. Three-
616
dimensional investigation of volcanic textures by X-ray microtomography and
617
implications for conduit processes. Geophysical Research Letters 33, L13312, 1-5.
618
doi:10.1029/2006GL026241.
25
619
620
Polacci, M., Papale, P., 1997. The evolution of lava flows from ephemeral vents at
621
Mount Etna: insights from vesicle distribution and morphological studies. Journal of
622
Volcanology and Geothermal Research 76, 1–17.
623
624
Proussevitch, A., Sahagian, D., Tsentalovich, E.P., 2007a. Statistical analysis of
625
bubble and crystal size distributions: Formulations and procedures. Journal of
626
Volcanology and Geothermal Research 164, 95-111.
627
628
Proussevitch, A., Sahagian, D., Carlson, W., 2007b. Statistical analysis of bubble and
629
crystal size distributions: Application to Colorado Plateau Basalts. Journal of
630
Volcanology and Geothermal Research 164, 112-126.
631
632
Proussevitch, A., Sahagian, D., 2001. Recognition and separation of discrete objects
633
within complex 3D voxelized structures. Computers & Geosciences 27, 441–454.
634
635
Rasband, W.S., 2005. ImageJ, U. S. National Institutes of Health, Bethesda,
636
Maryland, USA, http://rsb.info.nih.gov/ij/ [accessed November 28, 2009].
637
638
Sahagian, D., Proussevitch, A., 1998. 3D particle size distributions from 2D
639
observations: stereology for natural applications. Journal of Volcanology and
640
Geothermal Research 84, 173–196.
641
642
Sable, J., Houghton, B.F., Del Carlo, P., Coltelli, M., 2006. Changing conditions of
643
magma ascent and fragmentation during the Etna 122 BC basaltic Plinian eruption:
26
644
evidence from clast microtextures. Journal of Volcanology and Geothermal Research
645
158, 333–354.
646
647
Sarda, P., Graham, D., 1990. Mid-ocean ridge popping rocks: implication for
648
degassing at ridge crests. Earth and Planetary Science Letters 97, 268–289.
649
650
Shin, H., Lindquist, W.B., Sahagian, D.L., Song, S-R., 2005. Analysis of the vesicular
651
structure of basalts. Computers & Geosciences 31, 473–487.
652
653
Stephenson, G., 1992. Mathematical Methods for Science Students, 2nd edn.,
654
Longman Scientific & Technical, London, 528pp.
655
656
Still, S., Bialek, W., Bottou, L., 2004. Geometric clustering using the Information
657
Bottleneck method. In: Thrun, S., Saul, L., and Schölkopf, B. (Eds.), Advances in
658
Neural Information Processing Systems, 16, MIT Press, Cambridge, MA, pp. 1165-
659
1172.
660
661
Szramek, L., Gardner, J.E., Larsen, J., 2006. Degassing and microlite crystallization
662
of basaltic andesite magma erupting at Arenal Volcano, Costa Rica. Journal of
663
Volcanology and Geothermal Research 157, 182–201.
664
665
Wilkins, D.E., Ford, R.L., 2007. Nearest neighbor methods applied to dune field
666
organization: The Coral Pink Sand Dunes, Kane County, Utah, USA. Geomorphology
667
83, 48-57.
668
27
669
Appendix 1: Additional nearest neighbor analysis examples
670
Explosive interactions between lava and groundwater can generate volcanic rootless
671
cones (VRCs). VRCs are well suited to nearest neighbor (NN) analyses because they
672
occur in groups, which consist of a large number of landforms that share a common
673
formation mechanism. Geological Image Analysis Software (GIAS) includes NN
674
analysis tools that were used by Hamilton et al. (2010b) to investigate self-
675
organization processes within VRC groups and quantitatively compare the spatial
676
distribution of VRCs in the Laki lava flow in Iceland to analogous landforms in the
677
Tartarus Colles Region of Mars.
678
Figure 1 presents examples of rootless eruption sites within the Hnúta and
679
Hrossatungur groups of the Laki lava flow. In the field, Differential Global
680
Positioning System (DGPS) measurements were used to delimit rootless crater floors
681
and their centroids were assumed to represent the surface projection of underlying
682
rootless eruption sites (Hamilton et al., 2010a, 2010b). Hamilton et al. (2010a) used
683
tephrochronology and stratigraphic relationships to divide the complete study area
684
(Figure 1A) into chronologically and spatially distinct groups, domains, and
685
subdomains. Figure 1B provides an example of a VRC subdomain (Hnúta Subdomain
686
1.1). Figures 1C and D show the convex hull boundaries of the complete study area
687
and Subdomain 1.1 of the Hnúta Group, respectively. Figure 1 also shows the
688
locations of the convex hull vertices (Nv) and interior rootless eruption sites (Ni).
689
Identification of internal boundaries within the Hnúta and Hrossatungur
690
groups allowed Hamilton et al. (2010b) to examine the effects of scale and resource
691
availability on the spatial distribution explosive lava-water interactions. Relative to
692
the Poisson NN model, rootless eruption sites in both the complete study area (Ni =
693
2193) and Hnúta Subdomain 1.1 (Ni = 98) have |c| and |R| values that exceed their
28
694
respective sample-size-dependent limits of confidence at 2 σ. Consequently, neither of
695
these distributions are constant with a Poisson random model. Within the complete
696
study area, R is 0.72, which is lower than the 2 σ confidence limit of R (0.98) and
697
implies that the distribution is clustered with respect to the Poisson NN model. In
698
Hnúta Subdomain 1.1, R is 1.23, which is above the upper 2 σ confidence limit of
699
1.16 and implies that the distribution is repelled relative to a Poisson NN model.
700
Hamilton et al. (2010b) interpret these patterns of spatial distribution within the
701
context of resource abundance and depletion though competitive utilization. On
702
regional scales, VRCs cluster in locations that contain sufficient lava and groundwater
703
to initiate rootless eruptions, but on local scales rootless eruption sites tend to self-
704
organize into distributions that maximize the utilization of limited groundwater
705
resources. Hamilton et al. (2010b) interpret topography to be the primary influence on
706
regional clustering because it would control initial groundwater abundance and
707
regions of lava inundation. In contrast, they attribute local repelling to the non-
708
initiation, or early cessation, of rootless eruptions at unfavorable localities due to
709
groundwater depletion in a competitive system that is analogous to an aquifer
710
perforated by a network of extraction wells.
711
Baloga et al. (2007) showed that rootless eruption sites in the Tartarus Colles
712
Region of Mars have R = 1.22 and c = 6.14. These NN statistics are similar to those
713
observed within Hnúta Subdomain 1.1 (R = 1.23 and c = 4.45), which suggests that a
714
similar formation process in both environments. Thus the statistical NN analysis tools
715
provided within GIAS can be combined with field observations and remote sensing to
716
quantify patterns of spatial distribution and infer underlying mechanisms of formation
717
within a diverse range of geological systems.
718
29
719
Appendix 1 Figure 1:
720
A and B depict rootless eruption sites (red circles) superimposed on a 0.5 m/pixel
721
color aerial photograph showing 1783-1784 Laki eruption lava flows in Iceland. A:
722
Full extent of Hnúta and Hrossatungur groups and an inset (white rectangle) denoting
723
Hnúta Subdomain 1.1 (shown in B). C and D correspond to identical regions shown in
724
A and B, respectively, and provide examples of convex hull boundaries, rootless
725
eruption sites that form convex hull vertices (Nv), and interior points (Ni).
726
727
Appendix 1 Figure 1
30
728
Appendix 2: Derivation for the Scavenged Nearest Neighbor (k = 2) case
729
The probability of finding exactly k points within a circle of radius r on a plane can be
730
found using the Poisson distribution:
731
P(r , k ) 
(0 r 2 ) k 0 r 2
,
e
k!
[1]
732
where ρ0 is the spatial density—defined by the number of objects (N) divided by the
733
area of the feature field (A). If the process of creating a point on the plane consumes
734
(or “scavenges”), resources that would have formed the kth nearest neighbor, then the
735
process can be modelled as (Baloga et al., 2007):
736
737
738
739
P(r; k scavenges )  1  P(0)  P(1)   P(k ) .
[2]
For k = 2, the density distribution of the function can be written as:
p(r )  (0 ) 3 r 5 e 0r .
2
[3]
For convenience, we write   0 , leading to:
p(r )   3 r 5 e r .
2
740
741
To find the mean of this distribution, we multiply by r, integrate from 0 to ∞, and
742
normalise the resulting equation, to obtain first moment of the distribution r :
[4]
743

 3  r  r 5  e r dr
2
744
r 
0

r
.
5
e
r 2
[5]
dr
0
745
746
The integrals can be solved using integration by parts and two mathematical
747
identities for the odd and even terms of r (e.g., Stevenson, 1992). For the even terms,
748
the following identity can be used,
749
31
750

r
751
2n
 e r dr 
2
0
752
1.3.5(2n  1) 
,
1
2 n1 n 2
[6]
 e r dr
[7]
while odd integrals of the type,

r
753
2 n 1
2
0
754
can be solved using the substitution x2 = u and solving using integration by parts,
755
e.g.,

3
r
 r  e dr 
2
756
0

1
ue u du .

20
[8]
757
This leads to the following solution for the mean distribution of the nearest neighbour
758
distances for k = 2:
1 3  5  
759
760
r 
7
3
3
24   2  15    0  15 .
7 7
1 3
1
16 0
16 0 2 2

The second moment r 2 can be calculated in a similar manner:

761
[9]
r2 
r
2
 r 2  r 3  e r dr
2

0

r
5
e
r 2
dr
3
0
.
[10]
0
762
763
The standard deviation is calculated using the identity:  
15 2
0.27572

 2

.
0 16  0
0
3
764
Thus, the expected Standard Error (σe) is:
765
e 
0.27572
N  0
,
r2  r
2
:
[11]
[12]
32
766
where N is the number of points in the area of interest. The third and fourth moments
767
can be derived as above, leading to:
768

r3 
769
r
 r 5  e r dr
2
3

0

r
5
e
r 2
dr
105
320
3
;
[13]
2
0

r4 
770
r
4
 r 5  e r dr
2

0

r
5
e
r 2
dr
12
(0 ) 2
.
[14]
0
771
772
773
774
Using the moment formulae for skewness and kurtosis:
skew  r 3  3 r r 2  2 r
; kurt  r 4  4 r r 3  6 r
3
2
r 2  3 r , [15]
4
we obtain:
skew 
0.006663
 0
3
3
2
; kurt 
0.0174838
 4 0 2
.
[16; 17]
775
33
776
Appendix 3: Separation of population means
777
To examine the approximate number of samples required to find a statistically
778
significant separation of two Gaussian distributions, where the expected means and
779
standard deviations are known, a commonly-used test is to examine if zero lies within
780
the following confidence interval:
781
782




 
P s 0  s1  1.96 0  1   0  1  s 0  s1  1.96 0  1   0.95 ,
N
N
N
N 

[1]
783
where s0 and s1 are the sample means, μ0 and μ1 are the expected means and σ0 and σ1
784
are the expected standard deviations. N can be varied to estimate the number of
785
samples required to separate the distributions. In particular, if zero is within the
786
interval, then there is no discernible difference between the means of the distributions
787
at the 95% confidence level.
788
This method is applied to simulate two different means based upon the
789
expected values of the Poisson and the Scavenged NN when k = 1 and k = 2. To
790
simplify the problem, an approximation to a normal distribution is assumed.
791
Simulation from two normal distributions with increasing numbers of N was
792
coded in MATLAB. We found the number of samples required to statistically
793
differentiate between the distributions by varying ρ0.
794
For PNN and Scavenged NN, k = 1:
795
ρ0
(objects / unit area)
0.5
0.05
0.005
0.0005
Minimum N required to separate
Poisson NN and Scavenged NN (k = 1)
50
60
60
60
796
797
For Scavenged NN, k = 1 and Scavenged NN, k = 2:
34
ρ0
(objects / unit area)
0.5
0.05
0.005
0.0005
Minimum N required to separate
Poisson NN and Scavenged NN (k = 2)
100
120
130
150
798
799
Given that these simulations assume Normal distributions, it would be wise to
800
regard them as indicative of the minimum number of data points required to separate
801
the competing models. A much larger number of samples should be used separate the
802
Scavenged NN, k = 1 and k = 2 models because their expected means are relatively
803
close together.
804
35
805
Electronic Supplements
806
(1) MATLAB preparsed pseudo code and standalone MATLAB executable file,
807
available to the public at http://www.geoanalysis.org.
808
(2) GIAS help file
809
810
Figures and Tables
811
Tables
812
Table 1: Notation.
813
814
Table 2: Nearest neighbor model properties (after Baloga et al., 2007).
815
816
Table 3: “Image Analysis” module: summary of results.
817
818
Table 4: “Nearest Neighbor Analysis” module: summary of results.
819
820
Figures
821
Figure 1: Image analysis program in “Image Analysis” mode, with maximum object
822
areas truncated to ~104 pixels for visualization purposes which excludes very large
823
(and least circular) vesicles from displayed plots and histograms. Note that object
824
thresholding does not affect results presented in Tables 3 and 4.
825
826
Figure 2: Image analysis program in “Nearest Neighbor Analysis” mode with outputs
827
displaying results for a Poisson NN test.
828
36
829
Table 1.
Variable
A
Ahull
C
k
N
Ni
Nv
NN
P
R
R
ro
ra
re
P
ρ0
ρi
Σ
σe
830
Definition
Area of a feature field
Area of the convex hull
Statistic for comparing observed (actual) to expected mean NN distances
Poisson index
Number of features within a sample population
Number of features within the convex hull
Number of features forming the vertices of the convex hull
Abbreviation for nearest neighbor
Probability
Statistic for comparing actual (ra) to expected (re) mean NN distances
Radial distance
Threshold distance used to calculate Normalized Poisson NN distributions
Mean NN distance observed in the data
Mean NN distance calculated from a given NN model (e.g., Poisson)
Probability density
Population (spatial) density of the input features (ρ0 = N / A)
Population (spatial) density of the input features (ρi = Ni / Ahull)
Standard deviation
Standard error
37
831
Table 2.
Statistic
Poisson
Probability
Density (P)
20 re  0r
Mean (re)
1
3
15
2 0
4 0
16  0
Mode (rmode)
Scavenge k = 2
2(0 ) 2 r 3e  0r
2(0 )3 r 5e  0r
2
1
3
20
5
20
20
0.272249
0.27572
N 0
N 0
N 0
(  3)
0.008187
0.0066638
Standard Error
(σe)
0.26136
Skewness
4 30
Kurtosis
2
Scavenge k = 1
3
2
2  3 2 16
 4 (0 ) 2
 30
3
2
 30
3
2
0.016807
0.0174838
 4 0 2
 0
4
2
Normalized NN
20re   0 ( r0
2
20
e  0 r0
2

r
2
2
Logistic
e ( r  m) b
b[1  e ( r  m) b ]2
m
r 2 )
e  0 r dr
2
r0
1
20
m
r02  r 2  1 0
b
N
3
0
 3

1  2
r (r0  3r )  r 
 2r 2 
3  0
 
 20

1  
832
2 
3

2
2
2
  3r 4 
r  r  4r r0  6r 
 4  0  0
0 
0 2 
6
5
38
833
Table 3.
Image Analysis
Resolution (pixel size)
Minimum object area
Total number of objects
(excluding boundary objects)
Abundance (% of total)
Area (min. to max.)
Mean area (±1 σ)
Eccentricity (min. to max.)
Mean eccentricity (±1 σ)
Perimeter (min. to max.)
Mean perimeter (±1 σ)
Orientation (min. to max.)
Mean orientation (±1 σ)
Magmatic vesicles
(base unit: μm)
3.81 μm
20 pixels = 76.2 μm2
537
65.48
2.90 ×10 to 6.40 ×106
7.11 ×104 (±3.73 ×105)
0.00 to 0.97
0.71 (±0.18)
56.50 to 3.55 ×104
8.64 ×102 (±2.29 ×103)
-89.32 to 90.00
7.27 (±43.91)
2
834
39
835
Table 4.
Nearest Neighbor (NN)
Analysis
Magmatic vesicles
(base unit: μm)
Input distribution properties
Resolution (pixel size)
Minimum object area
Objects inside convex hull (Ni)
Interior object density (ρi)
Range of NN distances
Mean NN distance (ra; ±1 σ)
Skewness
Kurtosis
3.81 μm
20 pixels = 76.2 μm2
520
7.79 ×10-6
28.05 to 1.53 ×103
1.75 ×102 (±1.15 ×102)
2.48
16.13
Poisson model
Expected mean NN distance (re)
R with thresholds at 2 σ
c with thresholds at 2 σ
Conclusions (c value)
1.79 ×102
0.98 (0.97 and 1.07)
-0.91 (-1.33 and 2.83)
Model supported (<2 σ)
Normalized Poisson model
Distance threshold
Expected mean NN distance (re)
R with thresholds at 2 σ
c with thresholds at 2 σ
Conclusions (c value)
5 pixels = 19.05 μm
1.50 ×102
1.17 (0.97 and 1.07)
3.51 (-1.34 and 2.88)
Model rejected (>2 σ)
Repelled vs. Normalized
Scavenged (k = 1) model
Expected mean NN distance (re)
R with thresholds at 2 σ
c with thresholds at 2 σ
Conclusions (c value)
2.69 ×102
0.65 (0.99 and 1.06)
-21.82 (-0.69 and 3.91)
Model rejected (>2 σ)
Clustered vs. k = 1
Scavenged (k = 2) model
Expected mean NN distance (re)
R with thresholds at 2 σ
c with thresholds at 2 σ
Conclusions (c value)
3.36 ×102
0.52 (1.00 and 1.06)
-37.05 (-0.14 and 4.78)
Model rejected (>2 σ)
Clustered vs. k = 2
836
40
837
Figure 1
838
41
839
Figure 2
840
42
Download