Indices of regularity

LOCREGULARMC
! Intensive !
To perform Monte Carlo tests for randomness, against the alternative hypotheses of regularity or local
regularity, using three statistics based upon nearest neighbour distances.
RUNNING THE MACRO
Calling statement
locregularmc c1 c2 k1 ;
nsim k1 (999) ;
distances m1 ;
statistics c1-c3.
Input
c1 and c2 should contain paired x and y co-ordinates for each point in the plane.
k1 should be the number of points at which observations are available.
Subcommands
nsim          Number of Monte Carlo simulations
distances     Specify a matrix within which to store the distance matrix
statistics    Specify three columns, within which to store simulated statistics for D (column 1),
              S (column 2) and G (column 3).
Output
- Observed D, S and G statistics, with associated one-sided randomization p-values.
- Indices of regularity, based upon S and G.
Note that only the one-sided randomization p-values which test against the alternative hypothesis of
regularity are given.
ALTERNATIVE PROCEDURES : Standard procedures
No standard MINITAB procedure is available.
TECHNICAL DETAILS
Null hypothesis : Complete spatial randomness in the location of data points within the region.
Alternative hypothesis : Regularity (including local regularity).
Test-statistic : We use three test-statistics, each based upon the squared nearest neighbour (NN) distances :
D, the mean squared nearest neighbour distance.
S, the squared coefficient of variation of the squared nearest neighbour distances.
G, the ratio of the geometric mean of the squared NN distances to their arithmetic mean.
These are defined to be
$$ D = \frac{1}{n}\sum_{i=1}^{n} v_i , \qquad S = \frac{\sum_{i=1}^{n} (v_i - D)^2}{(n-1)\,D^2} , \qquad G = \frac{\left(\prod_{i=1}^{n} v_i\right)^{1/n}}{D} , $$
where the v_i are the squared nearest neighbour distances and n is the number of points.
The test-statistics have the following properties :
- D > 0. Large values of D tend to indicate regularity.
- S ≥ 0. Small values of S tend to indicate regularity. For complete regularity, S = 0.
- G lies between 0 and 1. Large values of G tend to indicate regularity.
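The definitions and properties above can be sketched in Python. This is an illustration only, not the macro's own code: the function names are hypothetical, the points are assumed to be distinct (a zero nearest neighbour distance would make the geometric mean undefined), and S is taken to be the variance of the squared NN distances divided by D squared, an interpretation that is consistent with the index IS = sqrt(S) and with the magnitudes reported in the worked example.

```python
import math


def distance_matrix(points):
    """Euclidean distances between all pairs of (x, y) points."""
    return [[math.hypot(px - qx, py - qy) for (qx, qy) in points]
            for (px, py) in points]


def regularity_statistics(points):
    """Return (D, S, G) computed from squared nearest neighbour distances.

    Assumes at least two points and no duplicated points.
    """
    dist = distance_matrix(points)
    n = len(points)
    # v[i] is the squared distance from point i to its nearest neighbour.
    v = [min(dist[i][j] for j in range(n) if j != i) ** 2 for i in range(n)]
    d = sum(v) / n                                        # arithmetic mean
    s = sum((vi - d) ** 2 for vi in v) / ((n - 1) * d ** 2)  # squared CV (assumed form)
    g = math.exp(sum(math.log(vi) for vi in v) / n) / d   # geometric / arithmetic mean
    return d, s, g


# A completely regular 3 x 3 grid with unit spacing.
grid = [(i, j) for i in range(3) for j in range(3)]
d, s, g = regularity_statistics(grid)
print(d, s, g)  # 1.0 0.0 1.0
```

For the unit grid every squared nearest neighbour distance equals 1, so the sketch gives D = 1, S = 0 and G = 1, which matches the limiting values stated for complete regularity.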
The test-statistic D is often used to test for spatial randomness, against alternative hypotheses of both
clustering and regularity, and S and G can be used for the same purpose. In this macro, we restrict
attention to the one-sided alternative hypothesis of regularity, since there are problems in interpreting
deviations of S and G from randomness in the opposite direction.
All three statistics should be sensitive to global regularity (i.e. the usual kind of large-scale regularity), but Brown & Rothery (1978) suggest that S and G should be more effective test-statistics than D for detecting local regularity (i.e. regularity at the small scale, although the data are
clustered at the large scale).
Simulation procedure : Assume that the sample size is n. For each Monte Carlo simulation, we simulate
n realisations from a continuous uniform distribution on the interval [XMIN, XMAX], and use these as x
co-ordinates. We also simulate n realisations from a continuous uniform distribution on the interval [YMIN,
YMAX], and use these as y co-ordinates. We pair the x and y co-ordinates randomly, and use the resulting
points as our simulated dataset. The D, S and G statistics are then calculated for each simulated dataset
and compared with the observed values.
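The simulation procedure described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the macro itself: the function names are hypothetical, S is assumed to be the squared coefficient of variation of the squared NN distances, and the one-sided p-value is assumed to be (r + 1) / (nsim + 1), where r counts simulated statistics at least as extreme as the observed one in the direction of regularity (large D, small S, large G). That form is consistent with a minimum possible p-value of 0.001 for 999 simulations.

```python
import math
import random


def dsg(points):
    """D, S and G from squared nearest neighbour distances (distinct points assumed)."""
    n = len(points)
    v = [min((px - qx) ** 2 + (py - qy) ** 2
             for j, (qx, qy) in enumerate(points) if j != i)
         for i, (px, py) in enumerate(points)]
    d = sum(v) / n
    s = sum((vi - d) ** 2 for vi in v) / ((n - 1) * d ** 2)
    g = math.exp(sum(math.log(vi) for vi in v) / n) / d
    return d, s, g


def simulate_csr(n, xmin, xmax, ymin, ymax, rng):
    """One simulated dataset of n points, uniform on the rectangle."""
    xs = [rng.uniform(xmin, xmax) for _ in range(n)]
    ys = [rng.uniform(ymin, ymax) for _ in range(n)]
    rng.shuffle(ys)  # random pairing of x and y co-ordinates, as in the text
    return list(zip(xs, ys))


def monte_carlo_pvalues(points, xmin, xmax, ymin, ymax, nsim=999, seed=None):
    """One-sided p-values (for D, S, G) against the alternative of regularity."""
    rng = random.Random(seed)
    d_obs, s_obs, g_obs = dsg(points)
    count_d = count_s = count_g = 0
    for _ in range(nsim):
        d, s, g = dsg(simulate_csr(len(points), xmin, xmax, ymin, ymax, rng))
        count_d += d >= d_obs  # large D indicates regularity
        count_s += s <= s_obs  # small S indicates regularity
        count_g += g >= g_obs  # large G indicates regularity
    return tuple((c + 1) / (nsim + 1) for c in (count_d, count_s, count_g))


# Usage: a regular 6 x 6 lattice inside the unit square.
lattice = [((i + 0.5) / 6, (j + 0.5) / 6) for i in range(6) for j in range(6)]
print(monte_carlo_pvalues(lattice, 0, 1, 0, 1, nsim=99, seed=1))
```

Counting the observed dataset alongside the simulated ones (the "+ 1" in numerator and denominator) keeps the p-value strictly positive, which is why 1 / (nsim + 1) is the smallest value it can take.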
Indices of regularity
Brown and Rothery (1978) suggest that suitable indices of regularity / local regularity might be :
IG = sqrt(1 - G)
IS = sqrt(S).
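As a small arithmetic check, the two indices can be evaluated in Python from the observed S and G values reported in the worked example (S = 0.06669, G = 0.9626). The function name is hypothetical, and the guard against a slightly negative 1 - G is an added precaution against rounding error rather than part of the published definition.

```python
import math


def regularity_indices(s, g):
    """Return (IG, IS) as defined above: IG = sqrt(1 - G), IS = sqrt(S)."""
    i_g = math.sqrt(max(0.0, 1.0 - g))  # guard against rounding pushing G just above 1
    i_s = math.sqrt(s)
    return i_g, i_s


i_g, i_s = regularity_indices(s=0.06669, g=0.9626)
print(f"{i_g:.4f} {i_s:.4f}")  # 0.1934 0.2582
```

Both indices equal 0 for a completely regular pattern (S = 0, G = 1) and increase as the pattern departs from regularity.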
REFERENCE
BROWN, D. & ROTHERY, P. (1978), Randomness and local regularity of points in a plane,
Biometrika, 65, pp. 115-122.
WORKED EXAMPLE FOR LOCREGULARMC
Name of dataset
CELLS
Description
The data record the location of the centres of 42 biological cells within a fixed square of known size. Data
have been scaled so that x and y co-ordinates must lie between 0 and 1.
Source
DIGGLE, P.J. (1983) Statistical analysis of spatial point patterns, Academic Press, London (p. 1).
Original sources
RIPLEY, B.D. (1977), Modelling spatial patterns (with Discussion), JRSS series B, 39, pp. 172-212.
CRICK, F.H.C. & LAWRENCE, P.A. (1975), Compartments and polyclones in insect development, Science,
189, pp. 340-347.
Data
Number of observations = 42
Number of variables = 2
For each point, the x-value (top) and y-value (bottom) are given.
0.350 0.062 0.938 0.462 0.462 0.737 0.800 0.337 0.350 0.637 0.325
0.025 0.362 0.400 0.750 0.900 0.237 0.387 0.750 0.962 0.050 0.287
0.350 0.737 0.487 0.212 0.150 0.525 0.625 0.825 0.650 0.725 0.237
0.600 0.687 0.087 0.337 0.500 0.650 0.950 0.125 0.362 0.512 0.787
0.775 0.450 0.562 0.862 0.237 0.337 0.987 0.775 0.087 0.900 0.862
0.025 0.287 0.575 0.637 0.150 0.462 0.512 0.850 0.187 0.262 0.525
0.637 0.575 0.600 0.175 0.175 0.400 0.462 0.062 0.900
0.812 0.212 0.475 0.650 0.912 0.162 0.425 0.750 0.775
Plot
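[Scatter plot of the 42 cell centres: x on the horizontal axis and y on the vertical axis, both running from 0 to 1.]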
Worksheet
C1    X-co-ordinates
C2    Y-co-ordinates
Aims of analysis
To investigate whether the distribution of cells within the study region is random.
Randomization procedure
MTB > Retrieve "N:\resampling\Examples\Cells.MTW".
Retrieving worksheet from file: N:\resampling\Examples\Cells.MTW
# Worksheet was saved on 24/08/01 15:00:13
Results for: Cells.MTW
MTB > % N:\resampling\library\locregularmc c1 c2 42 ;
SUBC> nsim 999 ;
SUBC> distances m1 ;
SUBC> statistics c4-c6.
Executing from file: N:\resampling\library\locregularmc.MAC
What are the minimum and maximum possible x co-ordinates ?
* NOTE * Please enter 2 values (min and max), then press return.
DATA> 0 1
What are the minimum and maximum possible y co-ordinates ?
* NOTE * Please enter 2 values (min and max), then press return.
DATA> 0 1
Monte-Carlo tests for local regularity
of points in a fixed rectangular plane
Data Display
Observed D statistic     0.01694
Randomization p-value    0.0010
Data Display
Observed S statistic     0.06669
Randomization p-value    0.0010
Data Display
Observed G statistic     0.9626
Randomization p-value    0.001000
Indices of local regularity
Data Display (WRITE)
Index based on G    0.1934
Index based on S    0.2582
* NOTE * For further details, see
BROWN, D. & ROTHERY, P. (1978), Randomness
and local regularity of points in a plane,
Biometrika, 65, pp. 115-122.
Modified worksheet
M1    A 42*42 matrix of distances between sample points
C4    A column of 999 D statistics, one for each simulated dataset
C5    A column of 999 S statistics, one for each simulated dataset
C6    A column of 999 G statistics, one for each simulated dataset
Discussion
All three statistics have picked up on the obvious (global) regularity in the dataset, with p-values of 0.001
(the minimum possible p-value for 999 randomizations) in all cases.