RNR 473/573
Assignment 1: Random Processes
Spring 2013

The goal of the assignment is to analyze the distribution obtained from a set of random realizations. We will generate 10 random realizations of points, perform the nearest neighbor analysis on each set of points, and examine the distribution of the average minimum distance statistic across the 10 realizations. This is a simple application of the Monte Carlo approach. In practice, hundreds if not thousands of realizations would be created to generate a distribution for a test statistic, but this lab will introduce you to the basic concepts.

The principal issue is that for many spatial problems we do not have a theoretical population distribution against which to compare our observations. The analysis is place-based: the characteristics of the analysis area and the spatial distribution of the sample make the use of theoretical population distributions problematic. One approach to this issue is to create, using the Monte Carlo approach, a population distribution of a metric with specific statistical properties. Most often we generate a population distribution that represents complete spatial randomness (CSR). We then compare our observed sample metrics to the generated population distribution to evaluate whether our sample is CSR. With a large sample you can also usually assume that your CSR population is normally distributed, and you can use the normal deviate transformation (z-statistic) in the evaluation process.

The level of confidence in generated testing metrics is 1/n, where n is the number of realizations used to compute the metrics. In our case the probability level we would be testing at is 1/10, or 0.1.

In this lab we will examine point patterns using the nearest neighbor method. The metric for which we will create a population distribution is the average minimum distance between neighbors (NND).
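
For readers who want to see the idea outside of Excel and ArcMap, the following is a minimal Python sketch of the Monte Carlo approach described above. It is not part of the lab workflow. The function names, the default seed handling, and the brute-force distance search are all assumptions made for illustration, and the mean distances it produces carry no edge correction.

```python
import math
import random
import statistics


def mean_nearest_neighbor_distance(points):
    """Average distance from each point to its closest other point (brute force)."""
    nearest = []
    for i, (xi, yi) in enumerate(points):
        # Distance from point i to every other point; keep the smallest.
        best = min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        nearest.append(best)
    return statistics.mean(nearest)


def simulate_csr_nnd(n_realizations=10, n_points=100, side=100.0, seed=None):
    """Return the mean NND of each uniform-random realization in a square area."""
    rng = random.Random(seed)  # hypothetical seed argument, only for repeatability
    results = []
    for _ in range(n_realizations):
        points = [
            (rng.uniform(0.0, side), rng.uniform(0.0, side))
            for _ in range(n_points)
        ]
        results.append(mean_nearest_neighbor_distance(points))
    return results


if __name__ == "__main__":
    nnd_values = simulate_csr_nnd(seed=2013)
    print("mean of the 10 mean NND values:", statistics.mean(nnd_values))
    print("standard deviation:", statistics.stdev(nnd_values))
```

Raising n_realizations to several hundred shows how the generated distribution settles down as more realizations are used, which is the reason 10 realizations is treated here as an illustration only.
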
We will create a population distribution using only 10 realizations (i.e. samples). This is a small number and is unlikely to result in a normal distribution of NND, but we will assume that it does for illustration. From the 10 generated mean nearest neighbor distances we will compute the mean and variance, which represent the Expected values for a random set of points within the specified analysis area. We will then use our generated statistics to evaluate two point patterns (clustered and uniform) and compare our generated values to the theoretical population statistics.

Procedure:

1. We will use the uniform random number generator in Excel to generate 100 points within a square study area with dimensions of 100 x 100 (easting 0 to 100, northing 0 to 100). Use the function RAND() to obtain the random numbers. RAND() outputs random numbers between 0 and 1.0. Convert the random numbers to your coordinate system (i.e. 0 to 100) by multiplying the RAND() number by 100. You will save 10 sets of 100 points into 10 Excel files (random1, random2, …, random10). The files should have 3 columns: X, Y and Z. Make all Z values equal to 1, representing unweighted points. Make sure to create a header row for the files (X, Y, Z). Note: if you apply RAND() in the first cell and then copy it to the other cells, all you need to do to generate a new realization is reapply RAND() in the first cell.

Example of final output:

   X    Y    Z
   25   45   1
   99   21   1
   etc.

A word on random number generators: if you are doing stochastic modeling you will need to use a random number generator. This is a research topic in itself, and there are numerous algorithms available to produce random numbers. People have been trying to build the “perfect” random number generator for decades, and although generators have gotten better, most still have problems. The most common problems include repeated sequences and decreasing variability (i.e. approaching a constant value) as the sequences grow longer.
If you are producing “short” sequences (e.g. 100 points), most generators, including Excel’s, will suffice. If stochastic simulation is going to be a focus of your research, you should review the literature on random number generation, select a generator/algorithm that fits your needs, and make sure you understand how it works.

Using Excel: You can create one Excel spreadsheet with 10 sheets or save 10 individual spreadsheets. You will need to save the spreadsheet in the Excel 97-2003 Worksheet format. We believe this is an error in ArcMap 10.1, since 10.0 would let you use the current Excel format.

2. Open ArcMap 10.1. Add the random# Excel files (sheet1). Use the Display X,Y Data tool (right click on the Excel file and open the Display X,Y Data tool). You do not need to define the coordinate system. In order to use the Average Nearest Neighbor tool in ArcMap we must convert the point event files to shapefiles (right click on the event file, then Data > Export Data). In the Export Data window save your shapefile to your work space (I would call them Random1, Random2, etc.). You can use the layer’s source data for the coordinate system. In the Save as type drop-down menu select Shapefile. Add the shapefile to your map document.

3. Open ArcToolbox. Open the Average Nearest Neighbor tool (Spatial Statistics Tools > Analyzing Patterns > Average Nearest Neighbor). The Input Feature Class is one of the Random# shapefiles. Use the EUCLIDEAN_DISTANCE method and click on Generate Report. Make the Area 10000 units. You will do this 10 times, once for each Random# shapefile. When you are done running the Average Nearest Neighbor tool, open the session results (Geoprocessing menu > Results). Open the HTML Report File. Create a table from the results that includes the Observed Mean Distance, Expected Mean Distance, Nearest Neighbor Ratio, z-score, and p-value. There are two shapefiles in the Lab1 folder, cluster and uniform.
Run the Average Nearest Neighbor tool on these shapefiles and add the results to the table.

4. Examine the distribution of the observed means and z-values. For your 10 realizations compute the mean, median, variance and standard deviation of the observed mean values.

5. Write a (brief) report on the results. Your results should have a table. The table should include:
   1. The Observed Mean, Expected Mean, Expected Variance, Z-Score, and p-value for the 10 realizations. Note that the Expected Mean and Variance should always be the same.
   2. The computed mean, median, variance and standard deviation for your 10 realizations.
   3. The results from the clustered and dispersed (uniform) point sets.

Your report should address the following questions:
   1. Compare the computed Expected values for the mean and variance to the theoretical Expected values. Are the computed values approaching the theoretical values?
   2. Look at the range of z-scores. What does it tell you about the different patterns created by the generated random points? Are any of your patterns “clustered” or “dispersed”?
   3. Are the patterns in the clustered and dispersed point datasets significantly different from randomness?

Format for the report:

Header: Name, Date, Assignment Number
I. Objective Statement
II. Review of Methods
III. Results
IV. Discussion
V. Literature Cited (if needed)
VI. Appendix (if needed)
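
The summary statistics in step 4 and the comparison asked for in question 1 can be sketched in Python as below. This is an illustration under stated assumptions, not the lab's required method: the function names are hypothetical, and the theoretical values use the standard Clark and Evans expressions for a random pattern with no edge correction (expected mean 0.5 / sqrt(n/A), standard error 0.26136 / sqrt(n^2/A)), which we assume match what the Average Nearest Neighbor report calls the Expected Mean Distance and uses for its z-score.

```python
import math
import statistics


def theoretical_csr(n_points, area):
    """Expected mean NND and its standard error under CSR (no edge correction).

    Assumption: standard Clark and Evans formulas for a random pattern.
    """
    expected_mean = 0.5 / math.sqrt(n_points / area)
    standard_error = 0.26136 / math.sqrt(n_points ** 2 / area)
    return expected_mean, standard_error


def z_score(observed_mean, expected_mean, standard_error):
    """Normal deviate: negative suggests clustering, positive suggests dispersion."""
    return (observed_mean - expected_mean) / standard_error


def summarize(observed_means):
    """Mean, median, sample variance and sample standard deviation of the inputs."""
    return {
        "mean": statistics.mean(observed_means),
        "median": statistics.median(observed_means),
        "variance": statistics.variance(observed_means),
        "std_dev": statistics.stdev(observed_means),
    }


if __name__ == "__main__":
    # Values for this lab: 100 points in an area of 10000 square units.
    expected_mean, standard_error = theoretical_csr(100, 10000.0)
    print("theoretical expected mean:", expected_mean)
    print("theoretical variance:", standard_error ** 2)
```

For the lab's 100 points in 10000 square units the theoretical expected mean works out to about 5 units. To use summarize, pass it the ten Observed Mean Distance values from your own results table, then compare its mean and variance against the theoretical values.
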