LOCREGULARMC

! Intensive !

To perform Monte Carlo tests for randomness, against the alternative hypotheses of regularity or local regularity, using three statistics based upon nearest neighbour distances.

RUNNING THE MACRO

Calling statement

    locregularmc c1 c2 k1 ;
        nsim k1 (999) ;
        distances m1 ;
        statistics c1-c3.

Input

    c1 and c2 should contain paired x and y co-ordinates for each point in the plane.
    k1 should be the number of points at which observations are available.

Subcommands

    nsim        Number of Monte Carlo simulations.
    distances   Specify a matrix within which to store the distance matrix.
    statistics  Specify three columns within which to store the simulated statistics for D (column 1), S (column 2) and G (column 3).

Output

    Observed D, S and G statistics, with associated one-sided randomization p-values.
    Indices of regularity, based upon S and G.

    Note that only the one-sided randomization p-values, which test against the alternative hypothesis of regularity, are given.

ALTERNATIVE PROCEDURES :

Standard procedures

    No standard MINITAB procedure is available.

TECHNICAL DETAILS

Null hypothesis : Complete spatial randomness in the location of data points within the region.

Alternative hypothesis : Regularity (including local regularity).

Test-statistic : We use three different test-statistics :

    D, the mean squared nearest neighbour distance.
    S, the coefficient of variation of the squared nearest neighbour distances.
    G, the ratio of the geometric mean of the squared nearest neighbour (NN) distances to their arithmetic mean.

These are defined to be

    D = \frac{1}{n} \sum_{i=1}^{n} v_i , \qquad S = \frac{1}{n-1} \sum_{i=1}^{n} (v_i - D)^2 , \qquad G = \frac{1}{D} \left( \prod_{i=1}^{n} v_i \right)^{1/n} ,

where the v_i are squared nearest neighbour distances.

The test-statistics have the following properties :

    D > 0. Large values of D tend to indicate regularity.
    S >= 0. Small values of S tend to indicate regularity. For complete regularity, S = 0.
    G lies between 0 and 1. Large values of G tend to indicate regularity.
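As an illustration only, the three statistics can be sketched in Python (standard library only). This is not the macro's own code, and the function names are hypothetical. One point is an assumption: the definition above shows S as a sum of squared deviations divided by n - 1, whereas the sketch also divides by D squared, so that S is a scale-free (squared) coefficient of variation as the prose describes. The normalisation actually used by the macro is not confirmed here.

```python
import math


def squared_nn_distances(xs, ys):
    """Squared distance from each point to its nearest neighbour."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two points with paired x and y values")
    v = []
    for i in range(n):
        best = math.inf
        for j in range(n):
            if i != j:
                d2 = (xs[i] - xs[j]) ** 2 + (ys[i] - ys[j]) ** 2
                best = min(best, d2)
        v.append(best)
    return v


def regularity_statistics(xs, ys):
    """Return (D, S, G) computed from the squared nearest neighbour distances."""
    v = squared_nn_distances(xs, ys)
    n = len(v)
    d = sum(v) / n  # D: arithmetic mean of the v_i
    if d == 0.0:
        raise ValueError("all points coincide, so S and G are undefined")
    variance = sum((vi - d) ** 2 for vi in v) / (n - 1)
    s = variance / d ** 2  # ASSUMPTION: scaled by D^2 (squared coefficient of variation)
    if min(v) == 0.0:
        g = 0.0  # coincident points give a geometric mean of zero
    else:
        # Geometric mean computed on the log scale to avoid underflow.
        g = math.exp(sum(math.log(vi) for vi in v) / n) / d
    return d, s, g


# Hypothetical check: a 3 x 3 square lattice with spacing 0.5.
xs = [0.0, 0.5, 1.0, 0.0, 0.5, 1.0, 0.0, 0.5, 1.0]
ys = [0.0, 0.0, 0.0, 0.5, 0.5, 0.5, 1.0, 1.0, 1.0]
d_stat, s_stat, g_stat = regularity_statistics(xs, ys)
```

For this lattice every squared nearest neighbour distance equals 0.25, so D = 0.25, S = 0 and G = 1 up to floating-point rounding, which matches the stated behaviour under complete regularity.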
The test-statistic D is often used to test for spatial randomness against alternative hypotheses of both clustering and regularity, and S and G can be used for the same purpose. In this macro we restrict attention to the one-sided alternative hypothesis of regularity, since there are problems in interpreting deviations of S and G from randomness in the opposite direction.

All three statistics should be sensitive to global regularity (i.e. the usual kind of large-scale regularity), but Brown & Rothery (1978) suggest that S and G should be more effective test-statistics than D for detecting local regularity (i.e. regularity at the small scale, although the data are clustered at the large scale).

Simulation procedure : Assume that the sample size is n. For each Monte Carlo simulation, we simulate n realisations from a continuous uniform distribution on the interval [XMIN, XMAX], and use these as x co-ordinates. We also simulate n realisations from a continuous uniform distribution on the interval [YMIN, YMAX], and use these as y co-ordinates. We pair the x and y co-ordinates randomly, and use the resulting points as our simulated dataset.

Indices of regularity

Brown and Rothery (1978) suggest that suitable indices of regularity / local regularity might be :

    IG = sqrt(1 - G)
    IS = sqrt(S)

REFERENCE

BROWN, D. & ROTHERY, P. (1978), Randomness and local regularity of points in a plane, Biometrika, 65, pp. 115-122.

WORKED EXAMPLE FOR LOCREGULARMC

Name of dataset

    CELLS

Description

    The data record the location of the centres of 42 biological cells within a fixed square of known size. Data have been scaled so that x and y co-ordinates must lie between 0 and 1.

Source

    DIGGLE, P.J. (1983) Statistical analysis of spatial point patterns, Academic Press, London (pp. 1).

Original sources

    RIPLEY, B.D. (1977), Modelling spatial patterns (with Discussion), JRSS series B, 39, pp. 172-212.
    CRICK, F.H.C. & LAWRENCE, P.A.
    (1975), Compartments and polyclones in insect development, Science, 189, pp. 340-347.

Data

    Number of observations = 42
    Number of variables = 2

For each point, the x-value (top) and y-value (bottom) are given.

    0.350 0.062 0.938 0.462 0.462 0.737 0.800 0.337 0.350 0.637 0.325 0.025 0.362 0.400 0.750 0.900 0.237 0.387 0.750 0.962 0.050 0.287 0.350 0.737 0.487 0.212 0.150 0.525 0.625 0.825 0.650 0.725 0.237 0.600 0.687 0.087 0.337 0.500 0.650 0.950 0.125 0.362 0.512 0.787 0.775 0.450 0.562 0.862 0.237 0.337 0.987 0.775 0.087 0.900 0.862 0.025 0.287 0.575 0.637 0.150 0.462 0.512 0.850 0.187 0.262 0.525 0.637 0.575 0.600 0.175 0.175 0.400 0.462 0.062 0.900 0.812 0.212 0.475 0.650 0.912 0.162 0.425 0.750 0.775

Plot

    [Scatter plot of y against x; both axes run from 0 to 1.]

Worksheet

    C1  X-co-ordinates
    C2  Y-co-ordinates

Aims of analysis

    To investigate whether the distribution of cells within the study region is random.

Randomization procedure

    MTB > Retrieve "N:\resampling\Examples\Cells.MTW".
    Retrieving worksheet from file: N:\resampling\Examples\Cells.MTW
    # Worksheet was saved on 24/08/01 15:00:13

    Results for: Cells.MTW

    MTB > % N:\resampling\library\locregularmc c1 c2 42 ;
    SUBC> nsim 999 ;
    SUBC> distances m1 ;
    SUBC> statistics c4-c6
    Executing from file: N:\resampling\library\locregularmc.MAC

    What are the minimum and maximum possible x co-ordinates ?
    * NOTE * Please enter 2 values (min and max), then press return.
    DATA> 0 1

    What are the minimum and maximum possible y co-ordinates ?
    * NOTE * Please enter 2 values (min and max), then press return.
    DATA> 0 1

    Monte-Carlo tests for local regularity of points in a fixed rectangular plane

    Data Display
    Observed D statistic     0.01694
    Randomization p-value    0.0010

    Data Display
    Observed S statistic     0.06669
    Randomization p-value    0.0010

    Data Display
    Observed G statistic     0.9626
    Randomization p-value    0.001000

    Indices of local regularity

    Data Display (WRITE)
    Index based on G    Index based on S
    0.1934              0.2582

    * NOTE * For further details, see BROWN, D. & ROTHERY, P.
    (1978), Randomness and local regularity of points in a plane, Biometrika, 65, pp. 115-122.

Modified worksheet

    M1  A 42*42 matrix of distances between sample points
    C4  A column of 999 D statistics, one for each simulated dataset
    C5  A column of 999 S statistics, one for each simulated dataset
    C6  A column of 999 G statistics, one for each simulated dataset

Discussion

    All three statistics have picked up on the obvious (global) regularity in the dataset, with p-values of 0.001 (the minimum possible p-value for 999 randomizations) in all cases.
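The simulation procedure and the one-sided p-values can be sketched in Python (standard library only). This is a rough illustration under stated assumptions, not the macro's implementation, and all function and argument names are hypothetical. It assumes that x and y are drawn independently and uniformly on the supplied limits, which is equivalent to pairing them at random. It assumes that S is scaled by D squared. It also assumes the usual Monte Carlo estimate p = (1 + number of simulated values at least as extreme as the observed one) / (nsim + 1). That estimate is consistent with the minimum p-value of 0.001 for 999 simulations noted in the Discussion, but the macro's exact formula is not given here.

```python
import math
import random


def dsg(points):
    """(D, S, G) from the squared nearest neighbour distances of (x, y) points."""
    n = len(points)
    v = [min((px - qx) ** 2 + (py - qy) ** 2
             for j, (qx, qy) in enumerate(points) if j != i)
         for i, (px, py) in enumerate(points)]
    d = sum(v) / n
    # ASSUMPTION: S is the sample variance of the v_i divided by D^2.
    s = sum((vi - d) ** 2 for vi in v) / (n - 1) / d ** 2
    g = 0.0 if min(v) == 0.0 else math.exp(sum(map(math.log, v)) / n) / d
    return d, s, g


def monte_carlo_regularity_test(points, xlim, ylim, nsim=999, seed=None):
    """One-sided Monte Carlo p-values for D, S and G against regularity."""
    rng = random.Random(seed)
    n = len(points)
    observed = dsg(points)
    # Each count starts at 1 because the observed dataset is treated as
    # one of the nsim + 1 equally likely datasets under the null hypothesis.
    extreme = [1, 1, 1]
    for _ in range(nsim):
        sim = [(rng.uniform(*xlim), rng.uniform(*ylim)) for _ in range(n)]
        d, s, g = dsg(sim)
        extreme[0] += d >= observed[0]  # large D suggests regularity
        extreme[1] += s <= observed[1]  # small S suggests regularity
        extreme[2] += g >= observed[2]  # large G suggests regularity
    p_values = tuple(count / (nsim + 1) for count in extreme)
    return observed, p_values


# Hypothetical usage: a perfectly regular 6 x 6 lattice in the unit square.
lattice = [((i + 0.5) / 6, (j + 0.5) / 6) for i in range(6) for j in range(6)]
observed, p_values = monte_carlo_regularity_test(
    lattice, xlim=(0.0, 1.0), ylim=(0.0, 1.0), nsim=99, seed=1)
```

With a perfectly regular lattice, no simulated random pattern should look as regular as the data, so each p-value should equal the smallest attainable value, 1 / (nsim + 1). This mirrors the pattern reported for the CELLS data, though the lattice is only a synthetic stand-in.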