RNR 473/573

Assignment 1: Random Processes
Spring, 2013
The goal of the assignment is to analyze the distribution from a set of random realizations. We
will generate 10 random realizations of points, perform the nearest neighbor analysis on the
points, and examine the distribution of the average minimum distance statistic from the 10
random realizations. This is a simple application of the Monte Carlo approach. Commonly, in
using the Monte Carlo approach, hundreds, if not thousands, of realizations would be created to generate
a distribution for test statistics. However, this lab will introduce you to the basic concepts.
The principal issue is that with many spatial problems we do not have a theoretical population
distribution against which to compare our observations. This is a result of place-based analysis,
where the characteristics of the analysis area and the spatial distribution of the sample make
the use of theoretical population distributions problematic. One approach to address this issue is to
create, using the Monte Carlo approach, a population distribution of a metric with specific
statistical properties. Most often we generate a population distribution that represents
complete spatial randomness (CSR). We then compare our observed sample metrics to the
generated population distribution to evaluate whether our sample is CSR. With a large sample you can
also usually assume that your CSR population is normally distributed, and you can use the
normal deviate transformation (z-statistic) in the evaluation process. The level of confidence in
generated testing metrics is 1/n, where n is the number of realizations used to compute the
metrics. In our case the probability level we would be testing at is 1/10 or 0.1.
In this lab we will be examining point patterns using the nearest neighbor method. The metric
for which we will create a population distribution is the average minimum distance between neighbors
(NND). We will create the population distribution using only 10 realizations (i.e. samples), which
is a small number and is unlikely to result in a normal distribution of NND, but we will assume that it
does for illustration. Based on our 10 realizations we will compute the mean and variance of
the 10 generated mean nearest neighbor distances, which represent the Expected values
for a random set of points within the specified analysis area. We will then use our generated
statistics to evaluate two point patterns (clustered and uniform) and compare our generated values
to the theoretical population statistics.
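The Monte Carlo idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the lab procedure (which uses EXCEL and ArcMap): the function names `random_points`, `mean_nearest_neighbor_distance`, and `simulate_nnd` are hypothetical, and the nearest neighbor search is a simple brute-force comparison of every pair of points.

```python
import math
import random


def random_points(n, size, rng):
    """Return n (x, y) points drawn uniformly from a size x size square."""
    return [(rng.uniform(0, size), rng.uniform(0, size)) for _ in range(n)]


def mean_nearest_neighbor_distance(points):
    """Average, over all points, of the distance to each point's nearest neighbor."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        # Brute force: compare point i with every other point.
        nearest = min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        total += nearest
    return total / len(points)


def simulate_nnd(realizations=10, n=100, size=100.0, seed=None):
    """Mean NND for each of several random realizations (assumed defaults match the lab)."""
    rng = random.Random(seed)
    return [
        mean_nearest_neighbor_distance(random_points(n, size, rng))
        for _ in range(realizations)
    ]


# One mean NND value per realization; the seed is arbitrary and only makes the run repeatable.
nnd_values = simulate_nnd(seed=1)
print(nnd_values)
```

Increasing `realizations` to several hundred or more gives a smoother generated distribution, at the cost of run time.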
Procedure:
1. We will use the uniform random number function in EXCEL to generate 100 points within a square
study area with dimensions of 100 x 100 (easting 0 to 100, northing 0 to 100). Use the function
RAND() to obtain the random numbers. RAND() outputs random numbers between 0 and 1.0.
Convert the random numbers to your coordinate system (i.e. 0 to 100) by multiplying the RAND()
number by 100. You will save 10 sets of 100 points into 10 EXCEL files (random1, random2,
…, random10). The files should have 3 columns: x, y and z. Make all z values equal to 1,
representing unweighted points. Make sure to create a header row for the files (X, Y, Z). Note, if you
apply RAND() in the 1st cell and then copy it to the other cells, all you need to do is reapply RAND()
in the 1st cell to generate a new realization.
Example of final output:
X     Y     Z
25    45    1
etc.
99    21    1
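For readers who would rather script this step, the sketch below writes the same X, Y, Z layout to text files. It is an assumption-laden alternative, not the lab's method: it produces CSV files rather than EXCEL workbooks, and the function names `write_realization` and `write_all` and the `random{k}.csv` file names are hypothetical.

```python
import csv
import os
import random


def write_realization(path, n=100, size=100.0, rng=random):
    """Write n uniformly random points to a CSV file with X, Y, Z columns."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["X", "Y", "Z"])  # header row
        for _ in range(n):
            # random() returns a value in [0, 1); scaling by size mirrors RAND()*100.
            x = rng.random() * size
            y = rng.random() * size
            writer.writerow([x, y, 1])  # Z = 1 marks an unweighted point


def write_all(folder, count=10):
    """Write random1.csv ... random<count>.csv into folder and return their paths."""
    paths = []
    for k in range(1, count + 1):
        path = os.path.join(folder, f"random{k}.csv")
        write_realization(path)
        paths.append(path)
    return paths
```

Calling `write_all("some_folder")` on an existing folder would create the ten files there; nothing is written until one of the functions is called.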
A word on random number generators: if you are doing stochastic modeling you will need
to use a random number generator. This is a research topic in itself and there are numerous
algorithms available to produce random numbers. People have been trying to build the “perfect”
random number generator for decades, and although they have gotten better, most generators still
have problems. The most common problems include repeated sequences and decreased variability
(i.e. approaching a constant value) as the sequences get longer. If you are producing “short”
sequences (e.g. 100 points) most generators, including EXCEL’s, will suffice. If stochastic
simulation is going to be a focus of your research you should review the literature on random
number generation, select a generator/algorithm that fits your needs, and make sure you
understand how it works.
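One practical habit when working with any generator is to record the seed so a simulation can be repeated exactly. The sketch below uses Python's standard `random` module (a Mersenne Twister generator) purely as an illustration; the seed value 2013 is an arbitrary choice, and EXCEL's RAND() does not expose a seed in this way.

```python
import random

# Two generators started from the same (arbitrary) seed produce the same sequence.
rng_a = random.Random(2013)
rng_b = random.Random(2013)

seq_a = [rng_a.random() for _ in range(5)]
seq_b = [rng_b.random() for _ in range(5)]

print(seq_a == seq_b)
```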
Using Excel: You can create one Excel spreadsheet with 10 sheets or save 10 individual
spreadsheets. You will need to save the spreadsheet using the Excel 97-2003 Worksheet
format. We believe this is an error in ArcMap 10.1, since 10.0 would let you use the current Excel
format.
2. Open ArcMap 10.1. Add the random# EXCEL files (sheet1). Use the Display X,Y Data tool
(right click on the Excel file, then open the Display X,Y Data tool). You do not need to define the
coordinate system. In order to use the Average Nearest Neighbor tool in ArcMap we must convert
the point event files to shapefiles (right click on the event file, open Data, then open Export Data). In
the Export Data window save your shapefile to your work space (I would call them Random1,
Random2, etc.). You can use the layer’s source data for the coordinate system. In the Save as
type drop down menu select Shapefile. Add the shapefile to your map document.
3. Open ArcToolbox. Open the Average Nearest Neighbor tool (Spatial Statistics Tools >
Analyzing Patterns > Average Nearest Neighbor). The Input Feature Class is one of the
Random# shapefiles; use the EUCLIDEAN_DISTANCE method, and click on Generate Report.
Make the Area 10000 units. You will do this once for each of the 10 Random# shapefiles.
When you are done running the Average Nearest Neighbor tool, open the session results
(Geoprocessing menu > Results). Open the HTML Report File. Create a table from the results
that includes Observed Mean Distance, Expected Mean Distance, Nearest Neighbor Ratio, z-score, and p-value.
There are two shapefiles in the Lab1 folder, cluster and uniform. Run the Average Nearest
Neighbor tool on these shapefiles and add the result to the table.
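As a cross-check on the report values, the theoretical CSR quantities can be computed by hand. The sketch below assumes the standard nearest neighbor formulas, expected mean distance = 0.5 / sqrt(n / A) and standard error = 0.26136 / sqrt(n^2 / A), which to our knowledge are the ones the Average Nearest Neighbor tool documents; the function names are hypothetical. For n = 100 points and A = 10000 these give an expected mean distance of 5 units and a standard error of about 0.261.

```python
import math


def expected_mean_distance(n, area):
    """Theoretical mean nearest neighbor distance for n random points in the given area."""
    return 0.5 / math.sqrt(n / area)


def standard_error(n, area):
    """Theoretical standard error of the mean nearest neighbor distance under CSR."""
    return 0.26136 / math.sqrt(n * n / area)


def nearest_neighbor_ratio(observed, n, area):
    """Observed mean distance divided by the expected mean distance."""
    return observed / expected_mean_distance(n, area)


def nn_z_score(observed, n, area):
    """z-score of an observed mean distance against the CSR expectation."""
    return (observed - expected_mean_distance(n, area)) / standard_error(n, area)


# Lab settings: 100 points in a 100 x 100 study area.
print(expected_mean_distance(100, 10000), standard_error(100, 10000))
```

A ratio below 1 (negative z-score) points toward clustering, and a ratio above 1 (positive z-score) points toward dispersion.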
4. Examine the distribution of the observed means and z-values. For your 10 realizations
compute the mean, median, variance and standard deviation for the observed mean values.
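If you prefer to check the summary statistics outside EXCEL, a minimal sketch using Python's `statistics` module follows. The function names `summarize` and `z_against_simulation` are hypothetical, the variance is the sample variance (n - 1 denominator), and the z-score shown assumes, as the lab does for illustration, that the generated distribution is roughly normal.

```python
import statistics


def summarize(values):
    """Mean, median, sample variance, and sample standard deviation of the values."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "variance": statistics.variance(values),  # sample variance, n - 1 denominator
        "std_dev": statistics.stdev(values),
    }


def z_against_simulation(observed, values):
    """z-score of an observed mean distance relative to the simulated distribution."""
    return (observed - statistics.mean(values)) / statistics.stdev(values)
```

Pass in the 10 Observed Mean Distance values from your table as `values`; the observed mean distance of the cluster or uniform shapefile can then be supplied as `observed`.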
5. Write a (brief) report on the results. Your results should have a table. The table should include:
1. The Observed Mean, Expected Mean, Expected Variance, Z-Score, and p-value for the
10 Realizations. Note the Expected Mean and Variance should always be the same.
2. The computed mean, median, variance and standard deviation for your 10 realizations.
3. The results from the clustered and dispersed (uniform) point sets.
Your report should address the following questions:
1. Compare the computed Expected values for the mean and variance to the theoretical
Expected values. Are the computed values approaching the theoretical values?
2. Look at the range of z-scores. What does it tell you about the different patterns created by
the generated random points? Are any of your patterns “clustered” or “dispersed”?
3. Are the patterns in the clustered and dispersed point datasets significantly different from
randomness?
Format for the report:
Header: Name, Date, Assignment Number
I. Objective Statement
II. Review of Methods
III. Results
IV. Discussion
V. Literature Cited (if needed)
VI. Appendix (if needed)