The Yale Map Collection GIS Workshop #3
Finding GIS Data & Preparing It for Use

Census Data Download and Import into ArcGIS

Goals of this Tutorial:
To ensure that you can download both cartographic and demographic data from the U.S. Bureau of the Census in a form that can be imported into ArcGIS.
To ensure that you know how to join attribute data to cartographic data.

1) First, if you are using Internet Explorer, you must add the census sites to your trusted sites list to allow downloads and pop-ups for this exercise.
2) In Internet Explorer, go to Tools>Internet Options and click on the Security Tab.
3) Highlight the green Trusted Sites icon, and click the Sites… button.
4) Uncheck the “Require server verification…” checkbox.
5) Add census.gov and factfinder.census.gov to the Trusted Web Sites list.
6) Click OK twice to exit the Security setting dialog.

Downloading and Pre-processing Census Tract Data from the Internet

7) Create a directory called Census_Data_Download in your working directory (it is always a good idea to place your working directory on the C: drive, and to avoid spaces or special characters in the folder name).
8) Go to www.census.gov.
9) Click on “Geography”.
10) Click on “Census 2000 Geographic Products”.
11) Click on “Census 2000 Map Series” under the Census 2000 Maps heading.
12) Click on “BOUNDARY FILES.”
13) Click on the “Download Boundary Files” icon on the left-hand side.
14) Click on “2000” next to the words “Census Tracts.” (Note that you are provided with MANY possible Census Boundary File choices on this page. However, the Census Bureau does not provide the Block level boundary files on this website. These can be obtained from the ESRI Data & Maps CD Set, or from the ESRI Download Website.)
15) Scroll down to “Census 2000: Census Tracts in ArcView Shapefile (.shp) format.”
16) Click on “Connecticut - tr09_d00_shp.zip (220,667 bytes).” When prompted, save this file to your \Data\Shapefile folder.
17) Next, navigate to the \Data\Shapefile folder, right-click on the file you just downloaded and choose “Extract All.”
18) Make sure you have the three files with the extensions .dbf, .shx and .shp.

Subsetting the Census Boundary File to an Area of Interest (AOI)

Census Tracts 2000

The next step is to extract the New Haven County census tracts from the Connecticut state data using the FIPS (Federal Information Processing Standard) code attribute.

1) Open ArcMap with a new empty map and add the Connecticut census tracts to your map.
2) Save your map in your Census_Data_Download directory as “Census_Data_Download.mxd.”
3) Go to the FIPS PUB 6-4 Lookup website (http://www.census.gov/geo/www/fips/fips65/index.html) to find out which FIPS code is associated with New Haven County (in this case, it is 009).
4) In the Main Menu of ArcMap, go to Selection>Select by Attributes.
5) Enter (or click on the appropriate Field name, operator, and value) "COUNTY" = '009' in the query dialog box. Click Apply and then Close.
6) The census tracts for New Haven County will be highlighted.
7) Right-click on the tr09_d00 layer in the Table of Contents on the left and select Data>Export Data.
8) The Export: Selected Features option will be the default, since there is an active selection in the Map Document.
9) Browse to your working directory and save the exported shapefile as New_Haven_Tracts.shp.
10) Click OK.
11) Click Yes when prompted to add the new shapefile to the map as a layer.
12) Right-click on the tr09_d00 layer and choose Remove to remove it from the current project.
13) Click on the Full Extent button to zoom to New Haven County.
14) Right-click on the New_Haven_Tracts layer and open the Attribute Table.
15) Notice that you have 184 features. Notice also that you have almost no attribute data, other than numerical identifiers.
16) Notice that you do have a column called TRACT. This is going to be the Key Field that we use to join attribute data to our geography.
17) Save your map.
18) If your map now shows only the New Haven County census tracts, congratulations! You have the cartographic data that you need.

Downloading and Pre-Processing Attribute Data

In this section we are going to download attribute data that pertains to the population living in the census tracts for which you downloaded boundaries in the previous section. The attribute data that we want is the racial breakdown of the population. After we download the data we are going to edit it in Excel in order to import it into the GIS, and we will do it in such a way that we can assign each census tract its appropriate population/race values.

1) Go to http://www.census.gov/
2) Click on “American FactFinder” in the left side column.
3) Click on “get data” under Decennial Census.
4) Make sure the radio button for “Census 2000 Summary File 1 (SF 1) 100-Percent Data” in the middle of the page is selected.
5) In the “Select from the following” column on the right side of the page, select “Detailed Tables.”
6) In the new page, under “Select a geographic type,” choose “Census Tract” from the dropdown menu.
7) Select “Connecticut” from the state dropdown menu.
8) Choose “New Haven County” from the county dropdown menu.
9) Under the “Select one or more geographic areas…” menu, select “All Census Tracts” and then click the “Add” button.
10) Highlight the first “Census Tract 0” item and click Remove to remove it from the list. Click the “Next” button.
11) In the “Select one or more tables…” field, choose table “P3. Race” and click on “Add”. Then click “Show Result”.

You now have a table of attributes where each column is a census tract and each row is a different racial attribute. The page shows only the first 10 census tracts, but you can go to the “next” page if you want to see more. We won’t be using all of the attributes (rows), but we will need to download the entire table and then cut out what we don’t need. However, we are still missing one important element.
We need a common identifier for each census tract that we can use to join this table with our table in the GIS.

12) Under “Options” at the top of the page, choose “Show Geographic Identifiers”.

Now you see two tables of information (if you scroll down). The first table shows all of the geographic identifiers for each tract, and the second table is the table of racial attributes that we saw before. We are going to download both of these tables so that we can edit them.

13) Under “Print/Download” at the top of the page, choose “Download” to start downloading the table.
14) When the download page opens, scroll down to the section with the “Database compatible (data rows only)” heading:
a. Choose Comma delimited (.txt).
b. Uncheck the “Include descriptive data element names” box.
c. Before you download the data set you have created, click on the Technical Documentation (PDF) link and save the file to your working folder. This PDF file contains information about the SF1 Data, including how the data is collected and a Data Dictionary.
d. Click “OK” to download the data file.
15) When the Save File dialog opens, save the “output.zip” file to your \Data\Tables folder.
16) Browse to the folder where you saved the output.zip file and extract it to that folder. This should result in four files:
1. dt_dec_2000_sf1_u_data1.txt
2. dt_dec_2000_sf1_u_geo.txt
3. dt_readme.txt
4. readme_dec_2000_sf1.txt

Cleaning Up the Attribute File

17) Start Microsoft Excel and go to File>Open.
18) Browse to the folder that you extracted your attribute data to.
19) Change the “Files of Type” drop-down to “All Types” and open the dt_dec_2000_sf1_u_data1.txt file.
20) In the Text Import Wizard, change the “Original Data Type” to Delimited, then click Next.
21) Check the “Comma” box and click Finish.
22) Click on the A column identifier at the top of the table.
23) Scroll to the right and Shift-Click on X to select all the columns from A to X.
Right-click on the X and select Delete.
24) Repeat the above step to remove columns B through AO (right-click on a selected column header to delete).
25) Repeat the above step for columns E through W.
26) In the Main Menu, select Save As.
27) Browse to the folder where you extracted the data (if it is not already there), and change the “Save as Type” drop-down to CSV (Comma Delimited).
28) Click Save, and click Yes when warned about changing the Data Type.
29) Close Excel and click NO when prompted to save changes.

Opening the Attribute Table in ArcMap

1) Return to ArcMap.
2) Click the Add Data button and browse to the folder that you extracted the Census Attribute Data to.
3) Add the dt_dec_2000_sf1_u_data1.csv file to ArcMap.
4) Notice that your Table of Contents view changes to the Source Tab. This is because you have added data that has no explicit geographic display.

Preparing the Key Field in the Tract Boundary File for Joining to the Attribute File

Like many things the government does, the census data is not perfect and must be altered before it can be joined. The problem is that the TRACT records in the two separate tables are not formatted in the same way. We will need to alter these records so that both tables contain identically formatted values.

1) Right-click on the dt_dec_2000_sf1_u_data1.csv table and open it.
2) Right-click on the New_Haven_Tracts layer and open the Attribute Table.
3) Scroll to the right in both tables and find the TRACT field in each one.
4) Note that in the dt_dec_2000_sf1_u_data1.csv table the TRACT entries are always recorded with six digits, while in the New_Haven_Tracts table the TRACT entries are recorded with either four or six digits. The explanation for the way the TRACT number is recorded in the dt_dec_2000_sf1_u_data1.csv table lies in the way that the Census Bureau creates new Census Tracts in the face of increasing population.
When the population of a Census Tract grows beyond what the Census Bureau considers appropriate, that Tract is typically ‘split’ into two Tracts. When this happens, a suffix of 01, 02, 03, etc. is appended to the Tract Number (depending on the number of times a Tract has been split).

What we need to do is create a Field in one of the tables that exactly matches a Field in the other table. These two identical fields can then be used to join the two datasets. In this case, there is a field in the Tracts Boundary file called NAME, in which the Census Tracts are recorded with the suffixes appearing after a decimal point. This means that we can simply multiply this field by 100 to arrive at a value that corresponds to the TRACT values in the dt_dec_2000_sf1_u_data1.csv table.

5) In the New_Haven_Tracts Attribute Table, click on the Options button and select Add Field.
6) Name the new field KEY_TRACT, and give it the Type: LONG INTEGER.
7) Right-click on the Field Header for the new KEY_TRACT field and select Calculate Values.
8) In the Field Calculator, enter the following expression:

100 * (CDbl ( [NAME] ))

In this case, the field NAME in the New_Haven_Tracts table is formatted as a STRING, so we must convert the STRING value in the field to a Double value (using the CDbl() function) before we can multiply it by 100. Click OK to calculate the new values for the KEY_TRACT field.
9) You can now close both Attribute Tables, if you have not already.

Joining the Attribute Data to the Boundary File

1) Right-click on the New_Haven_Tracts layer in the Table of Contents and select Joins and Relates>Join…
2) Assign the values in the Join Data dialog box as follows:
a. Join attributes from a table
b. …join will be based on: KEY_TRACT
c. …the table to join…: dt_dec_2000_sf1_u_data1.csv
d. …field in the table to base the join on: TRACT
3) Click OK to apply the Join.
4) Open the Attribute Table for the New_Haven_Tracts layer and note that the attribute data has now been joined to the boundary file. If we would like to make the attribute data a permanent part of the boundary file, we can do so by exporting the joined layer to a new shapefile.
5) Right-click on the New_Haven_Tracts layer and go to Data>Export Data…
6) Make sure Export: All features is selected. Browse to your working folder and name the exported shapefile New_Haven_Tracts_Attribs. Click OK and select Yes when prompted to add the layer to your map.
7) Open the Attribute Table of the new layer and note that the New_Haven_Tracts prefix has been removed from the field names in the Attribute Table. This is because the data is no longer “joined,” but is now part of the shapefile we have created.

Finally

The procedure outlined here can be used to download any of the census boundary files and associate those boundaries with census attribute data, except in the case of census blocks. The census block boundary files can be downloaded from the ESRI Census Data Download website. ESRI provides the boundary and attribute data separately, just as the Census Bureau does; however, the ESRI Census Data contains geographic identifiers that have already been altered to make them comparable for joining attributes to boundaries.

Additional Suggested Reading:
Brewer, C., and T. Suchan. Mapping Census 2000: The Geography of U.S. Diversity. ESRI Press, 2001.
U.S. Census Bureau. Summary File 1 Technical Documentation: 2000. 2003.
U.S. Census Bureau. Census of Population and Housing, Summary File 3: Technical Documentation. 2000.
U.S. Census Bureau. "Census 2000 Basics." Washington, DC: U.S. Government Printing Office, 2002.

Downloading Census & Other Data from the ESRI Census Data Website

1. Open your Web Browser and go to http://arcdata.esri.com/data/tiger2000/tiger_download.cfm
2. Select Connecticut from the dropdown menu, or the image map.
3. On the resulting page, under “Select by County,” select New Haven.
4. Click on “Submit Selection.”
5. On the resulting page, select the checkboxes next to the items “Census Blocks” and “Census Block Demographics (SF1)”.
6. Scroll to the bottom of the page and click on the “Proceed to Download” button.
7. Your dataset will be assembled into a single *.zip file and you will be presented with a new page.
8. Click on the “Download File” button. Note: You may need to add esri.com to your trusted sites list, just as you did in the first part of this tutorial.
9. When prompted, browse to your C:\Temp\initials folder, use the New Folder button to create a new folder called ESRI_Census, and save the file there.
10. Once the download has completed, browse to the folder where you saved the file.
11. Unzip the downloaded file and you should find that you have two new ZIP files and a readme.html.
12. Unzip both of the resulting files into the same folder you downloaded to.
13. You will now have four new files, three of which make up the Census Blocks Shapefile for New Haven County (tgr09009blk00.shp, etc.) and one of which is the table containing the Census Block SF1 attribute data for the entire state of Connecticut (tgr09000sf1blk.dbf).

Joining the Attribute File to the Boundary File

1. Open ArcMap, or click on the New Document button to create a new Empty Map.
2. Use the Add Data button to open the Add Data dialog.
3. Browse to the folder where you saved and unzipped the files from the ESRI Census site.
4. Hold down the Ctrl key and select both files (the tgr09009blk00 shapefile and the tgr09000sf1blk.dbf table). Click Add to add them to your Map Document.

Note that you may be presented with a warning that one of the layers you have added is missing its spatial reference.
This is because the Census Block shapefile you downloaded does not have a projection “explicitly defined.” This will not cause many problems in ArcGIS 9.1 and earlier: when the coordinate values that record the positions of the points, lines and polygons of a shapefile fall within the normal range of Latitude (-90 to 90 degrees) and Longitude (-180 to 180 degrees) coordinates, these versions of ArcGIS assume that the shapefile is in a Geographic Coordinate System (Lat/Lon) and act as if the projection has been defined. This is no longer the case in ArcGIS 9.2, so you should get in the habit of defining the projection for your datasets now, so that the lack of a projection definition is not propagated through your collection of derivative shapefiles as you subset and create new shapefiles from the initial file.

Defining the Spatial Coordinate System of Your Data

1. Right-click on the tgr09009blk00 layer and open the Properties dialog.
2. Click on the Source Tab and note that the Coordinate System is GCS_Assumed_Geographic_1, and the Datum is NAD 1927. The “assumed” part of this item means that ArcGIS is assuming the projection information, based upon the range of the coordinates in the shapefile.
3. Close the Properties dialog box.
4. Open the ArcToolbox.
5. Click on the Search Tab at the bottom of the ArcToolbox panel.
6. Enter “define projection” as your search term and click the Search button.
7. Double-click on the first item, which is the Define Projection Tool from the Data Management Toolbox.
8. Select the tgr09009blk00 layer as the Input Dataset.
9. Click on the Select Coordinate System button.
10. In the Browse for Coordinate System dialog, browse to Geographic Coordinate Systems>North America.
11. Select North American Datum 1983.prj and click Add.
12. Click OK on the Spatial Reference Properties dialog to apply the selection.
13. Click OK on the Define Projection Tool to apply the definition.
In the dialog box that shows the progress of the ArcToolbox tool, you will likely see a warning that there is a “Datum Conflict between the Result and Map.” This is because the Map Document and the layers it contains can have different coordinate systems. ArcMap will usually do an on-the-fly projection successfully to overlay data properly. This is not always the case and, in fact, there is a particular NAD 1927 to NAD 1983 shift problem that causes many headaches. In the case of this Map Document, ArcMap assumed the coordinate system was Lat/Lon with a Datum of NAD 1927. When the data layer was then defined as NAD 1983, this Datum Shift problem triggered the warning message. You should redefine the Map Document Coordinate System to avoid overlay problems.

14. Right-click on the Layers item at the top of the Table of Contents, and open the Properties dialog box.
15. Click on the Coordinate System Tab.
16. In the “Select a coordinate system” panel, browse to Predefined>Geographic Coordinate Systems>North America.
17. Select the same North American Datum 1983.prj that you chose for the layer definition.
18. Click OK to apply the change.

Other Data Downloads and Preparation Skills

1. If you have not already, go to http://www.library.yale.edu/MapColl/gis_workshop_materials.html and download the Data file for the “Finding Data…” workshop to the C:\Temp folder you have been working in.
2. Unzip the downloaded file to the C:\Temp folder.

Downloading from the Seamless Data Distribution Website

The Seamless Data Distribution Website is maintained by the USGS and is the primary clearinghouse for raster data produced by the USGS, including orthoimagery, Digital Elevation data, mosaics of Landsat imagery and more. Once familiar with the interface for the Seamless site, you should be able to successfully navigate and use many other ArcIMS-based data sites, as they are based upon the same architecture.
In this exercise you will download elevation data for the New Haven, CT area.

1. Go to the USGS Seamless Data Distribution Website at http://seamless.usgs.gov/website/seamless/ and click on the “View and Download United States Data” link at the right side of the page. Wait for the Map to load.
2. By default, the Zoom Tool will be active. Use it to zoom into New Haven, CT. This may take several zooms.
3. On the right side of the Seamless Application, look for the “Download” tab and activate it.
4. Scroll down the list (noting the available downloads), find the Elevation Group and expand it (if it is not already expanded) by clicking on the triangle next to the word Elevation.
5. Make sure the 1” NED item is checked, and that no other items are enabled. NED is the National Elevation Dataset, which is a raster dataset that describes the elevation at any given point, at several different resolutions (in this case ~30 meters). This type of data is commonly referred to as a Digital Elevation Model, or DEM.
6. Now click on the 1” NED layer name to open a description of the layer in a new window.
7. Enable the Define Download Area Tool, under the Downloads Menu, and drag a box across the city of New Haven to define your area of interest (AOI). Note that the box will remain green unless the area you have defined becomes larger than the Seamless Site allows (you can request up to 1.6GB, in 100MB files, at once). Once you release the mouse button, a new page will be opened in a new browser window (you may have to enable pop-ups for the site). This new window will provide some general information about the data you have requested (projection, bounding coordinates, cell size, etc.) as well as Download Link Buttons to begin downloading the data to your hard drive.
8. Click on the Download Button to post the data request to the server. Yet another window will be opened, which indicates the Current Status of your request on the server.
9. When the Save As… dialog box opens, browse to the \Data\Raster folder to save the file.
10. When the download is complete, browse to find the ZIP file you just downloaded, right-click on it and choose “Extract All.”
11. The data will be extracted to a new subfolder called something like “ned_70561049.” In the rest of this tutorial, this layer will be referred to as ned_########.

Applying an Appropriate Projection to Your Data

Data does not always come in a ‘ready to use’ format. One common necessity is to apply a “Projection” to the data, in order to transform the spatial reference from angular Lat/Long coordinates to planar/linear units, such as feet or meters. Here, you will “project” your Digital Elevation Model to a projection that is appropriate for applying calculations that assume the data is recorded in linear units.

12. Open ArcMap.
13. Use the Add Data button to browse to the folder containing your elevation data. There should be two files there: a polygon shapefile named “METADATA.shp” (which contains the footprint of your elevation data, and its metadata), as well as a raster layer with a name similar to the folder that contains it.
14. Select the ned_######## layer and add it to ArcMap.
15. Right-click on the ned_######## layer and open its Properties dialog box.
16. Select the Source Tab and scroll down to the Spatial Reference information.
17. Note that there is no Linear Unit assigned to this data. Scroll back up to the top and look at the Cellsize (X,Y) item.
18. This data has a Cellsize of about 30 meters, but here it is noted as 0.0002777777777999463. This is because the data has not been projected and is currently spatially referenced in latitude & longitude coordinates, which are angular units of measurement.
19. Click OK to close the Properties dialog.
20. Open the ArcToolbox and search for “Hillshade.” Open the Hillshade tool and use your ned_######## layer as the Input Raster.
Name the Output raster “hillshade01” and place it in the \Data\Raster folder. Change the Z factor option to 3 (this exaggerates the elevation for a better visual quality). Click OK to apply the tool.

When the tool is finished running, you should see a new layer in your Map View window. However, the effect it has produced is not very attractive. The Hillshade layer we have produced is very dark, and the topography it has created seems far more “extreme” than we might have expected. These poor results are related to what we observed earlier in the Spatial Reference and Cellsize of our Digital Elevation Model. Creating a Hillshade involves calculations that assume that the input parameters being used are in linear units, rather than the angular units that we currently have. The same problem would arise if we were to calculate slope, aspect or many of the other mathematical operations we might want to apply to this elevation data. What is necessary is that we “Project” our dataset from the current Latitude & Longitude coordinates, which locate features on the face of the oblate spheroid that is the earth, to a projection that records our data in linear measurements, as if the earth were flat.

21. Return to ArcToolbox and search for “Project Raster.” Open the tool and select your ned_######## layer as the Input raster. Browse to your \Data\Raster folder and save the Output raster as ned_proj. For the Output Coordinate System, click the Properties Icon to open the Spatial Reference Properties dialog box. Click on the Select… button and browse to Projected Coordinate Systems>State Plane>Nad 83>NAD 1983 StatePlane Connecticut FIPS 0600.prj. Click Add. Click OK.
22. ArcToolbox adds the new layer to our Map Document. Right-click on the new ned_proj layer and open the Properties. Select the Source Tab and inspect the changes to the Cellsize and Linear Units items.
23. Use the Hillshade tool again, using the new projected elevation layer, to produce a new Hillshade layer called hillshade_02. Be sure to set the Z Factor to 3, as before.
24. You should find that you now have a much more pleasant-looking result from the Hillshade Tool.

Converting from the Interchange (.e00) File Format to Shapefile

Interchange (.e00) format is a legacy data format from the days of Arc/INFO, when coverages and grids were the default data types for GIS modeling. Interchange files were a way of ‘packaging’ coverages and grids, whose essential data were distributed across more than one folder. While most GIS data is now being produced in, or has been converted to, shapefile format, you will still encounter Interchange format files. ArcCatalog retains the tools necessary to convert from Interchange format.

1. Open ArcCatalog and go to View>Toolbars to enable the “ArcView 8x Tools” toolbar. You should see a toolbar appear with a single dropdown button labeled “Conversion Tools.”
2. Launch the “Import from Interchange File” tool from the Conversion Tools.
3. Browse to the \Data\Other folder and select the newhav.e00 file as your input file.
4. For the Output File, browse to the \Data\Shapefile folder and name the output file nh_wetlands. Click Save.
5. Before you apply the conversion, click the Batch Button and note that this tool provides the ability to convert multiple files at once. Click OK to apply the conversion.
6. In ArcCatalog, browse to the \Data\Shapefile folder and find the nh_wetlands coverage file. In the Catalog Tree window (on the left), expand the nh_wetlands coverage layer so that the four component layers are visible. We only want the polygon layer.
7. Right-click on the polygon component of the nh_wetlands coverage and select Export>To Shapefile (Single). Give the \Data\Shapefile folder as the Output Location, and name the Output Feature Class New_Haven_NWI_Wetlands. Leave the remaining Options at their default values. Click OK to apply the conversion.
8. Once the conversion is complete, you should see New_Haven_NWI_Wetlands.shp appear in your \Data\Shapefile folder.
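
Looking back at the census tract join earlier in this tutorial, the key-field step (multiplying NAME by 100 to obtain a six-digit value that matches TRACT) can also be checked outside ArcMap. The following is only a minimal Python sketch of that idea, not part of the workshop procedure. The function names, the sample rows and the P003001 column name are all assumptions invented for illustration, and Decimal is used in place of CDbl() so that a value such as 1401.02 is converted exactly.

```python
from decimal import Decimal


def key_from_name(name: str) -> int:
    """Turn a tract NAME such as '1401.02' into the six-digit key 140102."""
    # Decimal keeps the text value exact; a binary float may not represent
    # it exactly before the conversion to an integer.
    return int(Decimal(name.strip()) * 100)


def join_tracts(boundary_rows, attribute_rows):
    """Attach attribute rows to boundary rows by tract key.

    Returns (joined, unmatched): joined rows carry both sets of fields,
    unmatched lists the NAME of every boundary row with no attribute record.
    """
    by_tract = {int(row["TRACT"]): row for row in attribute_rows}
    joined, unmatched = [], []
    for row in boundary_rows:
        key = key_from_name(row["NAME"])
        match = by_tract.get(key)
        if match is None:
            unmatched.append(row["NAME"])
        else:
            joined.append({**row, "KEY_TRACT": key, **match})
    return joined, unmatched


# Hypothetical sample rows, invented for illustration only.
boundaries = [{"NAME": "1401"}, {"NAME": "1401.02"}, {"NAME": "9999"}]
attributes = [
    {"TRACT": "140100", "P003001": "4100"},
    {"TRACT": "140102", "P003001": "2950"},
]

joined, unmatched = join_tracts(boundaries, attributes)
for row in joined:
    print(row["NAME"], row["KEY_TRACT"], row["P003001"])
print("unmatched:", unmatched)
```

Listing the unmatched names is a quick way to spot tracts whose keys still differ between the two tables before relying on the join.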