Assignment 6: Vector Data Analysis
Due March 2, 2012

Introduction
A geographic information system comprises several components, each of which plays a unique part in ensuring its overall functionality. Analytical functions, however, could be considered the centerpiece of a GIS, its raison d'être. Although other ways exist, analysis is the main vehicle for turning raw data into information that can be used in decision making. Most GIS packages offer a wide range of analytical functions, including measurement techniques, queries, proximity analysis, overlay operations, and analysis of surfaces and networks. How these analyses are applied, and which techniques (algorithms) are used to perform them, differs depending on the spatial data model.

This assignment gives you an opportunity to try your hand at analyzing vector data using measurement and overlay tools. Given the relative complexity of these techniques and the time required to learn and practice them, this assignment contains only theory and guided tutorial sections.

Section 1: Theory
Textbook readings: Chapter 6 and lectures

Answers to these questions can be found in the textbook, in the guided tutorial, or online. After reading the sources you find, please provide the answers in your own words. (Each question is worth 2 marks.)

1. Why can the Calculate Geometry tool be used to calculate the area, length, or perimeter of features only if the dataset is projected?
2. Explain the difference between a feature and an entity as it applies to the vector data model.
3. Which data encoding methods can be used for creating raster data and vector data?
4. Queries and overlays can be used to solve the same problems when analysing vector data. What is the difference between the results produced by these two methods of analysis? Describe situations in which you would prefer to use each of these methods.
5. Define the buffering procedure. How is it used in GIS projects?
6. Describe the common principles on which all three polygon-on-polygon overlay procedures are based.

Section 2: Guided Tutorial
This section of the assignment is based on the ideas and data available on the website of the National Center for Ecological Analysis and Synthesis, a research center of the University of California, Santa Barbara. The Center has a wealth of spatial datasets showing the distribution of plants and animals in the Western Hemisphere, along with related data, that can be downloaded for free.

In this part of the assignment you will learn to apply some of the ArcGIS tools for analysis of vector data, using a collection of point and polygon datasets showing the distribution of two armadillo species, the Greater Naked-Tailed Armadillo and the Southern Naked-Tailed Armadillo, collected in South America. You will find answers to four questions about the study area:

1. What is the area of each species range?
2. What is the total area of South America covered by (one or more) species ranges?
3. What portion of the continent is covered by both armadillo species ranges?
4. Do all of the armadillo species sightings occur inside the armadillo species range?

Instructions
1. Open ArcMap and add the data from the T:\Class\Geography\geog303\Assignment6 folder to the map document. This folder contains four datasets:
   SouthAmericaStates.shp: a shapefile containing the boundaries of South American countries.
   GreaterArmadilloPts.shp: a shapefile containing locations of sightings of the greater naked-tailed armadillo (Cabassous tatouay).
   GreaterArmadilloPoly.shp: a shapefile containing the boundary of the species range for the greater naked-tailed armadillo.
   SouthernArmadilloPoly.shp: a shapefile containing the boundary of the species range for the southern naked-tailed armadillo (Cabassous unicinctus).

2. If ArcCatalog is not open in your ArcGIS session, click the corresponding button on the Standard toolbar at the top of the ArcGIS window to open it.
In ArcCatalog, locate the folder with the data used in this exercise. Right-click the name of the first dataset listed in the Catalog window and choose Properties. Click the XY Coordinate System tab to see the projection and coordinate system your data are in. Repeat these steps for all four shapefiles. In this exercise you will measure the area of polygon features and perform several overlays. Will the projection and coordinate system your data are in be suitable for these operations? What type of projection would you want your data to be in?

3. We will have to project all four datasets in order to be able to perform the analyses planned for this exercise. To calculate meaningful polygon areas, we need to transform each dataset into an equal-area projection. Open the ArcToolbox window; in the Data Management Tools toolbox, expand the Projections and Transformations toolset, then the Feature toolset (tools for vector data), and double-click the Project tool.
In the window that opens, enter the name of the dataset you want to project. There are several ways to do that. You can: (1) click the name of the dataset in the Table of Contents of the map document and drag and drop it into the corresponding line in the Project window; (2) navigate to the dataset using the browse button to the right of the line; or (3) select the name of the dataset from the drop-down list for that line.
For your output file, choose a name based on the input dataset's name, for example SouthAmerica_prj. As your Output Coordinate System, select South_America_Albers_Equal_Area_Conic from Predefined > Projected Coordinate Systems > Continental > South America. From the list of Geographic Transformation options ArcGIS provides in the corresponding drop-down list, choose SAD_1969_To_WGS_1984_1. Click OK to run the transformation. Repeat the same procedure for all four shapefiles.

4. Delete the original files from the map document. Now you are all set to start the analysis.

5. To answer the first question (What is the area of each species range?), you will need to use the Calculate Geometry tool to calculate the area of each species range. Open the attribute table of the projected GreaterArmadilloPoly shapefile. In order to calculate the area of the features in this dataset, you first need to add a field that will hold these values. Click the Table Options button in the upper left corner of the Table window and select Add Field. In the window that opens, name your field Area, set its Type to Double, and set both Precision and Scale to 15 (to allow large numbers to be stored). Click OK to create the new field.
After the field is added to the table, right-click its name and select Calculate Geometry. Since the dataset you are working with is small and the calculation is straightforward, you will perform the calculation outside an edit session; ignore the warning that pops up. When the tool window opens, make sure that the Property option is set to Area, accept the default for the Coordinate System option, and choose square kilometres as your Units. Click OK to perform the calculation. Repeat this sequence on the projected SouthernArmadilloPoly shapefile. Record the answers to this question in the table at the end of this assignment.

6. To answer the second question (What is the total area of South America covered by at least one of the two species ranges?), we need to overlay the three polygon datasets using the Union method. Since the question concerns an area of the South American continent, you need to calculate the area of the features in the projected SouthAmericaStates shapefile before performing the overlay. To do that, follow the procedure outlined in step 5 above. After you have prepared the SouthAmericaStates shapefile for further analysis, locate the Analysis Tools toolbox in the ArcToolbox window. Expand the Overlay toolset and open the Union tool.
In the window that opens, add all three projected polygon datasets as input features. Make sure that you save your output dataset where you can find it. For the rest of the settings in this window, accept the default options and click OK to run the Union procedure.
When the output dataset is added to the map document, open its attribute table and examine the fields. You may choose to hide most of the fields that describe the species, except the PRESENCE fields (you should have two, one for each species). Each of these fields contains '1' for polygons representing the area where the corresponding armadillo species is present. Why can we make this assumption? (Question 7: 2 marks) Hint: check the original species range files and think about how the Union operation works. You can consult the ArcGIS Help files.
Select these areas by performing an attribute query that returns the features in which either of the species is present. Hint: use the OR operator. Now you can find how much of the area of the continent has at least one of these species of armadillo present. Technically, you could use any of the three Area fields in the attribute table of the Union output. But can you really? If you want an accurate answer to this question, you will have to recalculate the geometry in one of the Area fields. Why? (Question 8: 2 marks)
Right-click an Area field's name and select Calculate Geometry. Uncheck the Calculate selected records only option to recalculate the area for all the records in the table while maintaining your selection. Recalculate the geometry of all three Area fields in the table and compare the values. Are they different? Why? (Question 9: 2 marks)
After you have done your calculations and compared the values, right-click the Area field's name and select Statistics. This tool returns various summary statistics on the values in the field, including the sum. Record the sum of the area in the table at the end of this assignment. This is the answer to the second question.
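Calculate Geometry derives each feature's area from its vertex coordinates, and the Statistics sum over the OR selection totals the selected Union pieces. The Python sketch below illustrates both ideas outside ArcGIS using the shoelace formula for planar polygon area; it is not Esri's implementation. It assumes a single outer ring per polygon with no holes, vertex coordinates in metres (as in an Albers projection whose linear unit is the metre), and hypothetical field names PRESENCE and PRESENCE_1 for the two presence fields; check the names in your own Union output table.

```python
# Sketch only: planar polygon area from projected coordinates (shoelace
# formula) and the OR-query sum used in step 6. Not Esri's implementation.

def polygon_area_km2(ring):
    """Planar area of a simple polygon ring, in square kilometres.

    Assumes (x, y) vertices in metres in an equal-area projection, one outer
    ring, no holes; the first vertex does not need to be repeated at the end.
    """
    twice_area = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0 / 1_000_000  # m^2 -> km^2


def area_with_either_species(pieces):
    """Sum the areas of Union pieces where either species is present.

    Mirrors an attribute query such as PRESENCE = 1 OR PRESENCE_1 = 1
    (hypothetical field names) followed by Statistics > Sum.
    """
    return sum(
        polygon_area_km2(p["ring"])
        for p in pieces
        if p["PRESENCE"] == 1 or p["PRESENCE_1"] == 1
    )


if __name__ == "__main__":
    # Hypothetical Union output: three rectangular pieces (metres).
    pieces = [
        {"ring": [(0, 0), (2000, 0), (2000, 1000), (0, 1000)], "PRESENCE": 1, "PRESENCE_1": 0},
        {"ring": [(2000, 0), (3000, 0), (3000, 1000), (2000, 1000)], "PRESENCE": 1, "PRESENCE_1": 1},
        {"ring": [(3000, 0), (4000, 0), (4000, 1000), (3000, 1000)], "PRESENCE": 0, "PRESENCE_1": 0},
    ]
    print(area_with_either_species(pieces))  # 3.0
```

The absolute value makes the result independent of whether the vertices run clockwise or counter-clockwise.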
7. Clear the selected features in the Union dataset and turn this layer off. You will not need to work with it any more. To answer the third question in our practice research project (What portion of the continent is covered by both armadillo species ranges?), we will perform another type of polygon-on-polygon overlay, Intersect. Can you explain why in this case we should choose this method? (Question 10: 2 marks)
In the Analysis Tools toolbox, locate and open the Intersect tool. In the window that opens, add all three projected polygon datasets as input features. (The order in which you add them determines the order of attributes in the attribute table of the output dataset.) Make sure that you save your output dataset where you can find it. For the rest of the settings in this window, accept the default options and click OK to run the Intersect procedure.
Examine the output dataset when it is added to the map document. Why does it have only one entity? (Question 11: 2 marks) Hint: you may want to review how the Intersect procedure works in the ArcGIS Help files. Open the attribute table of the output dataset and examine the fields. You may hide the fields that describe the species. Again, you have three Area fields in this attribute table. Compare their values. Why are they different? (Question 12: 2 marks)
Again, you will have to recalculate the area of the output feature using the Calculate Geometry tool. Record the resulting area in the table at the end of this assignment. This is the answer to the third question.

8. Finally, let's find the answer to the last question in our project: Do all of the armadillo species sightings occur inside the armadillo species range? Based on the data we have, we can answer this question only for the greater naked-tailed armadillo. We will use the point-in-polygon overlay procedure to get the answer. Make sure that your output datasets are turned off and only the projected polygon and point datasets are displayed in the data view.
In the Analysis Tools toolbox, locate and open the Spatial Join tool (in the Overlay toolset). In the window that opens, set the projected GreaterArmadilloPts shapefile as the Target Features and the projected GreaterArmadilloPoly shapefile as the Join Features. Make sure that you save your output dataset where you can find it. Leave the Join Operation as one-to-one (the default). In Field Map of Join Features, you may delete duplicate fields (both the point and polygon files contain the same set of attributes). Since we want to know which of the sightings are within the species range boundary, choose WITHIN as the Match Option. Click OK to run the join.
When the output dataset is added to the map document, turn off the point file that was used as an input dataset in this analysis. Open the attribute table of the output dataset and examine its attributes. The Join_Count field displays the results of the spatial join analysis: the cells containing '1' belong to the records representing armadillo sightings within this species range. You will use this field to present the result of this analysis on a map.
Close the attribute table and double-click the output dataset's name to open the Layer Properties window. Open the Symbology tab and select Categories > Unique values to display the features in the dataset. Make sure that Join_Count is selected as the Value Field. Click the Add All Values button to add values from this field to the classification box. By clicking the corresponding text, change the Label for the '0' category to 'outside the range' and for the '1' category to 'within the range'. Remove the check mark next to the All Other Values category and click OK to apply your symbology and close the Properties window.
Create a map showing the result of the Spatial Join analysis using the LetterLandscape template on the Traditional Layouts tab of the Select Template window.
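With the WITHIN match option, Spatial Join tests each sighting point against the range polygons, and Join_Count records how many range polygons matched it. The Python sketch below mimics that result with the classic ray-casting (even-odd) point-in-polygon test; it illustrates the idea and is not how ArcGIS implements WITHIN. It assumes projected coordinates and simple polygons without holes, and it does not treat points lying exactly on a boundary specially, which ArcGIS may handle differently. The coordinates are hypothetical.

```python
# Sketch only: even-odd (ray-casting) point-in-polygon test, used to mimic
# the Join_Count values a WITHIN spatial join would produce.

def point_in_ring(point, ring):
    """Return True if the point lies inside the polygon ring.

    Casts a horizontal ray to the right of the point and counts edge
    crossings; an odd count means the point is inside. Points exactly on
    the boundary are not handled specially.
    """
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y-value
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def join_counts(points, range_rings):
    """Join_Count-style result: how many range polygons contain each point."""
    return [sum(point_in_ring(p, ring) for ring in range_rings) for p in points]


if __name__ == "__main__":
    species_range = [(0, 0), (10, 0), (10, 10), (0, 10)]  # hypothetical range
    sightings = [(5, 5), (15, 5)]                          # hypothetical points
    print(join_counts(sightings, [species_range]))  # [1, 0]
```

Horizontal edges never satisfy the straddle test, so the division inside the loop cannot divide by zero.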
Note: You can make your map more visually attractive by double-clicking the data frame (which contains the map in layout view) and choosing the Focus Data Frame option. When the data frame becomes editable, you can zoom in or pan the map features using the tools you would use in data view.

Question 13: Submit a map showing the results of your work. (5 marks)
Question 14: Submit a table showing your area calculations. (5 marks)

Species                            | Area covered by species range, sq. km
-----------------------------------+--------------------------------------
Greater Naked-Tailed Armadillo     |
Southern Naked-Tailed Armadillo    |
At least one of these species      |
Both species co-occur              |
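Once the table is complete, its rows can be cross-checked: for two overlapping ranges, the area covered by at least one species equals the sum of the two range areas minus the area where they overlap. The Python sketch below encodes that check. It assumes that neither range shapefile contains overlapping polygons of its own and that both ranges lie entirely inside the SouthAmericaStates outline; if they extend beyond it, the Intersect result (which also involves SouthAmericaStates) is clipped to the continent and the check holds only approximately. The tolerance and the example values are hypothetical placeholders, not expected results.

```python
import math

# Sketch only: inclusion-exclusion check on the four areas in the table
# (all in square kilometres). Assumes both ranges lie inside the continent
# outline, so the Intersect result equals the overlap of the two ranges.

def expected_either_area(a_greater, a_southern, a_both):
    """Area covered by at least one range: A + B - (A and B)."""
    return a_greater + a_southern - a_both


def table_is_consistent(a_greater, a_southern, a_either, a_both, rel_tol=1e-3):
    """True if the Union-based total agrees with the three other values.

    rel_tol (a hypothetical default of 0.1 %) allows for small differences
    caused by sliver polygons and rounding.
    """
    expected = expected_either_area(a_greater, a_southern, a_both)
    return math.isclose(a_either, expected, rel_tol=rel_tol)


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not real results.
    print(table_is_consistent(100.0, 80.0, 150.0, 30.0))  # True
    print(table_is_consistent(100.0, 80.0, 180.0, 30.0))  # False
```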