
The Yale Map Collection
GIS Workshop #3
Finding GIS Data & Preparing It for Use
Census Data Download and Import into ArcGIS
Goals of this Tutorial:

- To ensure that you can download both cartographic and demographic
  data from the U.S. Bureau of the Census in a form that can be imported
  into ArcGIS.
- To ensure that you know how to join attribute data to cartographic data.
1) First, if you are using Internet Explorer, you must add the census sites to
your trusted sites list to allow downloads and pop-ups for this exercise.
2) In Internet Explorer, go to Tools>Internet Options and click on the
Security Tab.
3) Highlight the green Trusted Sites icon, and click the Sites… button.
4) Uncheck the “Require server verification…” check box.
5) Add census.gov and factfinder.census.gov to the Trusted Web Sites
list.
6) Click OK twice to exit the Security setting dialog.
Downloading and Pre-processing Census Tract Data from the
Internet
7) Create a directory called Census_Data_Download in your working
directory (it is always a good idea to place your working directory on the
C: Drive, and without spaces or special characters in the folder name).
8) Go to www.census.gov.
9) Click on “Geography”.
10) Click on “Census 2000 Geographic
Products”.
11) Click on “ Census 2000 Map Series ” under
the Census 2000 Maps heading.
12) Click on “BOUNDARY FILES.”
13) Click on the “Download Boundary Files”
icon on the left hand side.
14) Click on “2000” next to the words “Census
Tracts.”
(Note that you are provided with MANY
possible Census Boundary File choices
on this page. However, the census does
not provide access to the Block level
boundary file from the website, although
these can be obtained from the ESRI
Data & Maps CD Set, or from the ESRI
Download Website).
15) Scroll down to “Census 2000: Census
Tracts in ArcView Shapefile (.shp)
format.”
16) Click on “Connecticut - tr09_d00_shp.zip
(220,667 bytes).” When prompted, save this
file to your \Data\Shapefile folder.
17) Next navigate to the \Data\Shapefile folder,
right-click on the file you just downloaded
and “Extract All.”
18) Make sure you have the three files with the
extensions .dbf, .shx, .shp.
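If you would rather check for the three files with a script, the following is a minimal Python sketch. The folder path is a placeholder, and the base name tr09_d00 is assumed from the name of the downloaded zip file; adjust both to match your own setup.

```python
from pathlib import Path

# The three parts this tutorial asks you to confirm after extraction.
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")


def missing_shapefile_parts(folder, base_name):
    """Return the required extensions that are absent for base_name in folder."""
    folder = Path(folder)
    return [ext for ext in REQUIRED_EXTENSIONS
            if not (folder / (base_name + ext)).is_file()]


if __name__ == "__main__":
    # Placeholder location; point this at the folder you extracted into.
    missing = missing_shapefile_parts(r"C:\Data\Shapefile", "tr09_d00")
    print("Missing parts:", missing or "none")
```

An empty list means all three parts are present. Anything listed is a file you should look for before adding the layer to ArcMap.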
Subsetting the Census Boundary File to an
Area of Interest (AOI)
The next step is to extract New Haven county
census tracts from Connecticut state data using
the FIPS (Federal Information Processing
Standard) code attribute.
1) Open ArcMap with a new empty map and add the Connecticut census
tracts to your map.
2) Save your map in your
Census_Data_Download directory
as “Census_Data_Download.mxd.”
3) Go to the FIPS PUB 6-4 Lookup website
(http://www.census.gov/geo/www/fips/fips65/index.html) to find
out which FIPS code is associated
with New Haven County (in this
case, it is 009).
4) In the Main Menu of ArcMap, Go to
Selection>Select by Attributes.
5) Enter (or click on the appropriate Field name,
operator, and value) "COUNTY" = '009' in the
query dialog box. Click Apply and then Close.
6) The census tracts for New Haven County will be
highlighted.
7) Right click on tr09_d00 layer in the Table of Contents on the left and
Select Data>Export Data.
8) The Export: Selected Features option will
be the default, since there is an active
selection in the Map Document.
9) Browse to your working directory and Save
the Export Shapefile as
New_Haven_Tracts.shp.
10) Click OK.
11) Click Yes when prompted to add the new shapefile to the map as a layer.
12) Right-Click on the tr09_d00 layer and choose Remove to remove it from
the current project.
13) Click on the Full Extent
button to zoom into New Haven County.
14) Right-Click on New_Haven_Tracts layer and Open the Attribute Table.
15) Notice that you have 184 features. Notice that you have almost no
attribute data, other than numerical identifiers.
16) Notice that you do have a column called Tract. This is going to be our Key
Field that we use to join attribute data to our geography.
17) Save your map.
18) If your data looks like the picture below, congratulations! You have the
cartographic data that you need.
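The Select by Attributes step in this section can be pictured as a simple filter on the COUNTY field. The Python sketch below is an illustration only and does not use ArcGIS. The sample rows are made up, and the code is compared as text, which is why the query puts '009' in quotes and why the leading zeros matter.

```python
def select_by_attribute(records, field, value):
    """Return the records whose field equals value, compared as text."""
    return [rec for rec in records if str(rec[field]) == value]


# Made-up sample rows; the real tract table carries more fields.
tracts = [
    {"COUNTY": "001", "TRACT": "0101"},
    {"COUNTY": "009", "TRACT": "1401"},
    {"COUNTY": "009", "TRACT": "140102"},
]

new_haven = select_by_attribute(tracts, "COUNTY", "009")
print(len(new_haven), "of", len(tracts), "tracts selected")
```

Searching for "9" instead of "009" would select nothing, because a text field keeps its leading zeros.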
Downloading and Pre-Processing
Attribute Data
In this section we are going to download
attribute data that pertains to the population
that lives in the census tracts for which you
downloaded boundaries in the previous
section. The attribute data that we want is the
racial split of the population. After we
download the data we are going to edit it in
Excel in order to import it into the GIS, and we
will do it in such a way that we can assign
each census tract its appropriate
population/race values.
1) Go to http://www.census.gov/
2) Click on “American FactFinder” on the
left side column.
3) Click on “get data” under Decennial
Census.
4) Make sure the radio
button for “Census
2000 Summary File 1
(SF 1) 100-Percent
Data” in the middle of
the page is selected.
5) In the “Select from the
following” column in
the right side of the
page, select “Detailed
Tables.”
6) In the new page under
“Select a geographic
type” choose “Census
Tract” from the
Dropdown menu.
7) Select “Connecticut”
from the state
dropdown menu.
8) Choose “New Haven County” from the county dropdown menu.
9) Under “Select one or more geographic areas…” menu select “All
Census Tracts” and then click the “Add” button.
10) Highlight the first “Census Tract 0” item and click Remove to remove it
from the list. Click the “Next” button.
11) In the “Select one or more tables…” field choose table “P3. Race” and
click on “Add”. Then click “Show Result”
You now have a table of attributes
where each column is a census tract and
each row is a different racial attribute. The
page shows only the first 10 census
tracts, but you can scroll to the “next”
page if you want to see more. We won’t
be using all of the attributes (rows), but
we will need to download the entire table
and then cut out what we don’t need.
However, we are still missing one
important element. We need a common
identifier for each census tract that we
can use to join this table with our table
in the GIS.
12) Under “Options” on the top of the
page choose “Show Geographic
Identifiers”.
Now you see two tables of information
(if you scroll down). The first table shows
all of the geographic identifiers for each
tract, and the second table is the table of
racial attributes that we saw before. We
are going to download both of these
tables so that we can edit them.
13) Under the “Print/Download” at the
top of the page, choose
“Download” to start downloading
the table.
14) When the download table opens, scroll
down to the section with the
“Database compatible (data rows
only)” heading:
a. Choose the Comma delimited
(.txt).
b. Uncheck the “Include
descriptive data element
names” box.
c. Before you download the data
set you have created, you
should click on the Technical
Documentation (PDF) link and
save the file to your working
folder. This PDF file contains
information about the SF1 Data,
including how the data is
collected and a Data Dictionary.
d. Click “OK” to download the data file.
15) When the Save File dialog opens, Save the “output.zip” file to your
\Data\Tables folder.
16) Browse to the folder you saved the output.zip file in and extract it to that
folder.
This should result in four files:
1. dt_dec_2000_sf1_u_data1.txt
2. dt_dec_2000_sf1_u_geo.txt
3. dt_readme.txt
4. readme_dec_2000_sf1.txt
Cleaning Up the Attribute File
17) Start Microsoft Excel and go to File>Open.
18) Browse to the folder that
you extracted your
attribute data to.
19) Change the “Files of
Type” drop-down to “All
Types” and Open the
dt_dec_2000_sf1_u_data1.txt file.
20) In the Text Import
Wizard, change the
“Original Data Type” to
Delimited, then click Next.
21) Check the “Comma” check box and click Finish.
22) Click on the A column
identifier at the top of the
table.
23) Scroll to the right and
Shift-Click on X to select
all the columns from A to
X. Right-Click on the X
and select Delete.
24) Repeat the above step to
remove columns B
through AO.
25) Repeat the above step for
columns E through W.
26) In the Main Menu, select Save As.
27) Browse to the folder where you extracted the data (if it is not already
there), and change the “Save as Type” drop-down to CSV (Comma
Delimited).
28) Click Save and Yes when warned about changing the Data Type.
29) Close Excel and Click NO when prompted to save changes.
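The column clean-up above can also be scripted. The following is a minimal Python sketch, assuming the first row of the file holds column names. The column names and values in the sample are made up for illustration, so substitute the names of the columns you actually want to keep.

```python
import csv
import io


def keep_columns(text, wanted):
    """Return CSV text containing only the wanted columns, in the given order."""
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=wanted, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)  # columns not listed in wanted are dropped
    return out.getvalue()


# Made-up sample; the real download has many more columns.
sample = "ID,TRACT,TOTAL,EXTRA\nA,140100,4200,x\nB,140102,3100,y\n"
print(keep_columns(sample, ["TRACT", "TOTAL"]))
```

Because the csv module treats every value as text, identifiers are written back exactly as they were read.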
Opening the Attribute Table in ArcMap
1) Return to ArcMap
2) Click the Add Data
button and browse to the folder that you extracted
the Census Attribute Data to.
3) Add the dt_dec_2000_sf1_u_data1.csv file to ArcMap.
4) Notice that your Table of Contents View changes to the Source Tab. This
is because you have added data that has no explicit geographic display.
Preparing the Key Field in the Tract Boundary file for Joining to the
Attribute File
Like many things the government does, the census data is not perfect and
must be altered before it can be joined. The problem is that the TRACT records
in the two separate tables are not formatted in the same way. We will need to
alter these records so that both tables hold identically formatted values.
1) Right-Click on the dt_dec_2000_sf1_u_data1.csv table and OPEN it.
2) Right-Click on the New_Haven_Tracts layer and Open the Attribute
Table.
3) Scroll to the right in both tables and find the TRACT field in each one.
4) Note that in the dt_dec_2000_sf1_u_data1.csv table the TRACT entries
are recorded with six digits, while in the New_Haven_Tracts
table the TRACT entries are recorded with either four or six digits.
The explanation for the way the TRACT number is recorded in the
dt_dec_2000_sf1_u_data1.csv table lies in the way that the Census Bureau
creates new Census Tracts in the face of increasing population. When a Census
Tract becomes larger than the Census Bureau finds appropriate, that Tract is
typically ‘split’ into two Tracts. When this happens, a suffix of 01, 02, 03,
etc. is appended to the Tract Number (depending on the number of times a Tract
has been split).
What we need to do is create a Field in one of the tables that matches
exactly a Field in the other table. These two identical fields can then be used
to join the two datasets. In this case, there is a field in the Tracts Boundary
file called NAME, in which the Census Tracts are recorded with the suffixes
appearing after a decimal point. This means that we can simply multiply this
field by 100 to arrive at a value that corresponds to the Tract values in the
dt_dec_2000_sf1_u_data1.csv table.
5) In the New_Haven_Tracts Attribute Table, click on the Options button
and select Add Field.
6) Name the new field KEY_TRACT, and give it the Type: LONG INTEGER.
7) Right-Click on the Field Header for the new KEY_TRACT field and
Select Calculate Values.
8) In the Field Calculator, enter the following argument:
100 * (CDbl ( [NAME] ))
In this case, the field NAME, in the New_Haven_Tracts table, is
formatted as a STRING so that we must convert the STRING value in the
field to a Double Value (Using the CDbl() function) before we can multiply
it by 100.
Click OK to calculate the new values for the KEY_TRACT field.
9) You can now close both Attribute Tables, if you have not already.
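To see what the KEY_TRACT calculation does to individual values, here is a small Python sketch of the same arithmetic. It is not the Field Calculator expression itself: it uses Python's Decimal type in place of CDbl so that multiplying by 100 is not affected by binary floating-point rounding. The tract names shown are made-up examples.

```python
from decimal import Decimal


def tract_key(name):
    """Convert a tract NAME such as '1401.02' to the integer key 140102."""
    # Decimal keeps the two suffix digits exact before the multiplication.
    return int(Decimal(name.strip()) * 100)


# Made-up tract names: one unsplit tract and two split tracts.
for name in ("1401", "1401.02", "3614.01"):
    print(name, "->", tract_key(name))
```

An unsplit tract gains a 00 suffix, while a split tract keeps its 01, 02, and so on, which gives the six-digit form used in the attribute table.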
Joining the Attribute Data to the Boundary File
1) Right-Click on the New_Haven_Tracts layer in the
Table of Contents and Select Joins and
Relates>Join…
2) Assign the values to the Join Data dialog box as
shown below:
a. Join attributes from a table
b. …join will be based on: KEY_TRACT
c. …the table to join…: dt_dec_2000_sf1_u_data1.csv
d. …field in the table to base the join on: TRACT
3) Click OK to apply the Join.
4) Open the Attribute Table for the
New_Haven_Tracts layer and note the attribute data has now been
joined to the boundary file.
If we would like to make the attribute data a permanent part of the boundary file,
we can do so by exporting the Joined layer to a new shapefile.
5) Right-Click on the New_Haven_Tracts layer and go to Data>Export
Data…
6) Make sure Export:All features is selected. Browse to your working
folder and name the Export shapefile New_Haven_Tracts_Attribs. Click
OK and Select Yes when prompted to add the layer to your map layout.
7) Open the Attribute Table of the new layer and note that the
New_Haven_Tracts prefix has been removed from the field names in the
Attribute Table. This is because the data is no longer “joined,” but is now
part of the shapefile we have created.
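Conceptually, the join in this section matches each boundary record to the attribute row that has the same key value. The Python sketch below illustrates that idea with made-up values; it is not how ArcMap implements joins. It also reports the keys that found no match, which is worth checking after any join.

```python
def join_attributes(features, table, feature_key, table_key):
    """Left-join table rows onto features; return (joined, unmatched_keys)."""
    lookup = {row[table_key]: row for row in table}
    joined, unmatched = [], []
    for feat in features:
        match = lookup.get(feat[feature_key])
        if match is None:
            # Keep the feature, as a join does, but note the missing key.
            unmatched.append(feat[feature_key])
            joined.append(dict(feat))
        else:
            joined.append({**match, **feat})
    return joined, unmatched


# Made-up sample values for illustration only.
features = [{"KEY_TRACT": 140100}, {"KEY_TRACT": 140102}, {"KEY_TRACT": 999900}]
table = [{"TRACT": 140100, "TOTAL": 4200}, {"TRACT": 140102, "TOTAL": 3100}]

joined, unmatched = join_attributes(features, table, "KEY_TRACT", "TRACT")
print("Unmatched keys:", unmatched)
```

Every feature is kept whether or not it finds a match, so an unmatched key shows up as a record with no attribute values rather than as a missing record.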
Finally
The procedure outlined here can be used to download any of the census
boundary files and associate those boundaries with census attribute data, except
in the case of census blocks. The census block boundary files can be
downloaded from the ESRI Census Data Download website. ESRI provides the
boundary and attribute data separately, just as the census does; however, the
ESRI Census Data contains geographic identifiers that have already been altered
to make them comparable for joining attributes to boundaries.
Additional Suggested Reading:

- Brewer, C., and T. Suchan. Mapping Census 2000: The Geography of U.S.
  Diversity. ESRI Press, 2001.
- U.S. Census Bureau. Summary File 1 Technical Documentation: 2000 (2003).
- U.S. Census Bureau. Census of Population and Housing, Summary File 3:
  Technical Documentation (2000).
- U.S. Census Bureau. "Census 2000 Basics." Washington DC: US Government
  Printing Office (2002).
Downloading Census & Other Data from the ESRI
Census Data Website
1. Open your Web Browser and go to
http://arcdata.esri.com/data/tiger2000/tiger_download.cfm
2. Select Connecticut from the dropdown menu, or the image map.
3. On the resulting page, under “Select
by County,” Select New Haven.
4. Click on “Submit Selection.”
5. On the resulting page, select the
checkboxes next to the items:
Census Blocks
Census Block Demographics (SF1)
6. Scroll to the bottom of the page and
Click on the “Proceed to Download”
button.
7. Your dataset will be assembled into a
single *.zip file and you will be
presented with a new page.
8. Click on “Download File” button.
Note: You may need to add esri.com
to your trusted sites list, just as you
did in the first part of this tutorial.
9. When prompted, Browse to your
C:\Temp\initials folder, create a new
folder called ESRI_Census and save
the file there.
10. Once the download has Completed,
Browse to the folder where you
saved the file.
11. UnZip the downloaded file and you should find that you have two new Zip
files and a readme.html.
12. UnZip both of the resulting files into the same folder you have downloaded
to.
13. You will now have four new files, three of which make up the Census
Blocks Shapefile for New Haven County (tgr09009blk00.shp, etc.) and
one of which is the table containing the Census Block SF1 attribute data for
the entire state of Connecticut (tgr09000sf1blk.dbf).
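The two rounds of unzipping described above can be done in one pass with a script. This is a minimal Python sketch using the standard zipfile module. The function name is our own, and it assumes the inner Zip files sit at the top level of the downloaded file.

```python
import zipfile
from pathlib import Path


def extract_nested(zip_path, dest):
    """Extract zip_path into dest, then extract any Zip files found inside it."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as outer:
        outer.extractall(dest)
        inner_names = [n for n in outer.namelist()
                       if n.lower().endswith(".zip")]
    for name in inner_names:
        # Unpack each inner archive into the same folder as the download.
        with zipfile.ZipFile(dest / name) as inner:
            inner.extractall(dest)
    return sorted(p.name for p in dest.iterdir() if p.is_file())
```

Call it with the path of the file you downloaded and the folder you saved it to; it returns the names of the files now in that folder, so you can confirm that the shapefile parts and the .dbf table are all there.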
Joining the Attribute File to the Boundary File
1. Open ArcMap, or click on the New Document
button to create a “New Empty Map.”
2. Use the Add Data
Button to Open the Add
Data dialog.
3. Browse to the folder where
you saved and unzipped
the files from the ESRI
Census site.
4. Hold down the Ctrl-key
and select both files, as
shown on the right. Click
Add to add them to your
Map Document.
Note that you may be presented with a warning that one of the layers you have added
is missing a spatial reference. This is because the Census Block shapefile you downloaded does
not have a projection “explicitly defined.” This will not cause many problems in ArcGIS 9.1 and
earlier because, when the coordinate values that record the positions of the points, lines and
polygons of a shapefile fall within the normal range of Latitude (-90 to 90 degrees) & Longitude
(-180 to 180 degrees) coordinates, these versions of ArcGIS assume that the shapefile is in a Geographic
Coordinate System (Lat/Lon) and will act as if the projection has been defined. This is no longer
the case in ArcGIS 9.2, so you should get in the habit of defining the projection for datasets now, so
that the lack of a projection definition is not propagated through your collection of derivative
shapefiles as you subset and create new shapefiles from the initial file.
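The range test described in this note is easy to picture in code. The Python sketch below is our own illustration of the idea, not ArcGIS's actual logic, and the sample extents are rough, made-up numbers.

```python
def looks_geographic(xmin, ymin, xmax, ymax):
    """True if the extent fits within longitude -180..180 and latitude -90..90."""
    return (-180.0 <= xmin <= xmax <= 180.0
            and -90.0 <= ymin <= ymax <= 90.0)


# Approximate, made-up extents for illustration.
print(looks_geographic(-73.3, 41.2, -72.5, 41.7))         # values in degrees
print(looks_geographic(900000, 600000, 1000000, 700000))  # values in feet or meters
```

A check like this can only say that the numbers could be degrees. It cannot tell which datum they belong to, which is one more reason to define the coordinate system explicitly.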
Defining the Spatial Coordinate System of Your Data
1. Right-click on the
tgr09009blk00 layer and
Open the Properties
dialog.
2. Click on the Source Tab
and note that the
Coordinate System is
GCS_Assumed_Geographic_1,
and the Datum is
NAD 1927. The
“assumed” part of this item
means that ArcGIS is
assuming the projection
information, based upon
the range of the
coordinates in the shapefile.
3. Close the Properties Dialog Box.
4. Open the ArcToolbox.
5. Click on the Search Tab at the bottom of the ArcToolbox
Panel.
6. Enter “define projection” as your search term and click
the Search button.
7. Double click on
the first item,
which is the Define Projection Tool
from the Data Management Toolbox.
8. Select the tgr09009blk00 layer as the
Input Dataset.
9. Click on the Select Coordinate System
button.
10. In the Browse for Coordinate System dialog, browse to Geographic
Coordinate Systems>North America.
11. Select the North American Datum 1983.prj and Click Add.
12. Click OK on the Spatial Reference Properties dialog to apply the selection.
13. Click OK on the Define Projection Tool to apply the definition.
In the Dialog box that shows the progress of the ArcToolbox tool, you
will likely see a warning that there is a “Datum Conflict between the Result and
Map.” This is because the Map Document and the layers it contains can have
different projections. ArcMap will usually do a successful on-the-fly projection
to overlay data properly. This is not always the case and, in fact, there is a
particular NAD 1927 to NAD 1983 shift problem that causes many headaches. In
the case of this Map Document, ArcMap assumed the coordinate system was
Lat/Lon with a Datum of NAD 1927. When the data layer was defined as NAD
1983, this Datum Shift problem triggered the warning message. You should
redefine the Map Document Coordinate System to avoid overlay problems.
14. Right-click on the Layers
item at the top of
the Table of Contents, and
open the Properties dialog box.
15. Click on the Coordinate System
Tab.
16. In the “Select a coordinate
system” panel, browse to
Predefined>Geographic
Coordinate Systems>North
America.
17. Select the same North
American Datum 1983.prj that
you chose for the layer
definition.
18. Click OK to apply the change.
Other Data Downloads and Preparation Skills
1. If you have not already, go to
http://www.library.yale.edu/MapColl/gis_workshop_materials.html and
download the Data file for the “Finding Data…” workshop to the C:\Temp
folder you have been working in.
2. Unzip the downloaded file to the C:\Temp folder.
Downloading from the Seamless Data Distribution Website
The Seamless Data Distribution Website is maintained by the USGS and is the
primary clearinghouse for raster data produced by the USGS, including orthoimagery, Digital Elevation data, mosaics of Landsat imagery and more. Once
familiar with the interface for the Seamless site, you should be able to
successfully navigate and use many other ArcIMS-based data sites, as they are
based upon the same architecture. In this exercise you will download elevation
data for the New Haven, CT area.
1. Go to the USGS Seamless Data Distribution Website at
http://seamless.usgs.gov/website/seamless/ and click on the “View and
Download United States Data,” link at the right side of the page. Wait
for the Map to load.
2. By default, the Zoom Tool
will be active. Use it to zoom into New
Haven, CT. This may take several Zooms.
3. On the right side of the Seamless Application, look for the “Download”
tab and activate it.
4. Scroll down the list (noting the available downloads), find the Elevation
Group and expand it (if not already) by clicking on the triangle next to the
word Elevation.
5. Make sure the 1” NED item is checked, and that no other items are
enabled.
NED is the National Elevation Dataset, which is a raster dataset that
describes the elevation at any given point, at several different resolutions
(in this case ~30 meters). This type of data is commonly referred to as a
Digital Elevation Model, or DEM.
6. Now click on the 1” NED layer name
to open a description of the layer in a
new window.
7. Enable the Define Download Area
Tool, under the Downloads Menu,
and Drag a box across the city of New
Haven to define your area of interest
(AOI).
Note that the box will remain green
unless the area you have defined
becomes larger than the Seamless Site
allows (you can request up to 1.6GB,
in 100MB files, at once).
Once you release the mouse button, a
new page will be opened in a new browser window (you may have to
enable pop-ups for the site). This new window will provide some general
info about the data you have requested (projection, bounding coordinates,
cell size, etc…) as well as Download Link Buttons to begin downloading
the data to your hard drive.
8. Click on the Download Button to post the data request to the server. Yet
another window will be opened, which indicates the Current Status of
your request on the server.
9. When the Save As… dialog box
opens, browse to the \Data\Raster
folder to save the file.
10. When the download is complete,
browse to find the ZIP file you just
downloaded, right-click on it and
“Extract All.”
11. The data will be extracted to a new
subfolder called something like
“ned_70561049.” In the rest of this
tutorial, this layer will be referred to as ned_########.
Applying an Appropriate Projection to Your Data
Data does not always come in a ‘ready to use’ format. One common necessity is
to apply a “Projection” to the data, in order to transform the spatial reference from
angular Lat/Long coordinates to planar/linear units, such as feet or meters. Here,
you will “project” your Digital Elevation Model to a projection that is appropriate
for applying calculations that assume the data is recorded in linear units.
12. Open ArcMap.
13. Use the Add Data button
to browse to the folder containing your
elevation data. There should be two files there: a polygon shapefile
named “METADATA.shp” (which contains the footprint of your elevation
data, and its metadata), as well as a raster layer with a name similar to the
folder that contains it.
14. Select the ned_######## layer and add it to ArcMap.
15. Right-click on the ned_######## layer and Open its Properties Dialog
box.
16. Select the Source Tab and scroll down to the Spatial Reference
information.
17. Note that there is no Linear Unit assigned to this data. Scroll back up to
the top and look at the Cellsize (X,Y) item.
18. This data has a Cellsize of about 30 meters, but here it is noted as
0.0002777777777999463. This is because the data has not been
projected and is currently spatially referenced in latitude & longitude
coordinates, which are angular units of measurement.
19. Click OK to close the Properties Dialog.
20. Open the ArcToolbox
and search for “Hillshade.” Open the
Hillshade tool and use your ned_######## Layer as the Input Raster.
Name the Output raster
“hillshade01” and place it
in the \Data\Raster folder.
Change the Z factor
option to 3 (this
exaggerates the elevation
for a better visual quality).
Click OK to apply the tool.
When the tool is finished running, you should see a new layer in your Map
View window. However, the effect it has produced is not very attractive. The
Hillshade layer we have produced is very dark, and the topography it has
created seems far more “extreme” than we might have expected.
These poor results are related to what
we observed earlier in the Spatial
Reference and Cellsize of our Digital
Elevation Model. Creating a Hillshade
involves calculations that assume that
the input parameters being used are in
linear units, rather than the angular units
that we currently have. This same
problem would be true if we were to
calculate slope, aspect and many other
mathematical operations we might want
to apply to this elevation data. What is
necessary is that we “Project” our
dataset from the current Latitude &
Longitude Coordinates, which locate
features on the face of the oblate
spheroid that is the earth, to a projection
that records our data in linear measurements, as if the earth were flat.
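To make the mismatch concrete: the Cellsize of 0.000277... noted earlier is 1/3600 of a degree, or one arc-second, while the elevation values are in linear units. The Python sketch below estimates what one such cell measures on the ground. It treats the earth as a sphere with a mean radius of 6,371,000 meters and uses 41.3 degrees as an approximate latitude for New Haven; both are our assumptions, so read the results as rough figures.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean radius; a spherical approximation


def cell_size_meters(cell_deg, latitude_deg):
    """Return the approximate (east-west, north-south) cell size in meters."""
    meters_per_degree = math.pi * EARTH_RADIUS_M / 180.0
    north_south = cell_deg * meters_per_degree
    # Lines of longitude converge toward the poles, so the east-west
    # distance shrinks by the cosine of the latitude.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return east_west, north_south


# 41.3 is an assumed, approximate latitude for New Haven.
ew, ns = cell_size_meters(0.0002777777777999463, 41.3)
print(round(ew, 1), "m east-west,", round(ns, 1), "m north-south")
```

The north-south figure lands close to the 30 meters quoted for this dataset, and the east-west figure is smaller. A tool that reads the cell size as 0.000277 linear units therefore sees the terrain as enormously steeper than it is.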
21. Return to ArcToolbox and Search for “Project Raster.” Open the tool
and select your ned_######## layer as the Input raster. Browse to your
\Data\Raster folder and save the Output raster as ned_proj. For the
Output Coordinate System, Click the Properties Icon
to open the
Spatial Reference Properties Dialog Box. Click on the Select… Button
and Browse to Projected Coordinate Systems>State
Plane>Nad 83> NAD 1983 StatePlane Connecticut FIPS 0600.prj.
Click Add. Click OK.
22. ArcToolbox adds the
new layer to our Map
Document. Right-Click
on the new ned_proj
layer and Open the
Properties. Select the
Source Tab and inspect
the changes to the
Cellsize and Linear
Units items.
23. Use the Hillshade tool
again, using the new
projected elevation
layer, to produce a new Hillshade Layer, called hillshade_02. Be sure to
set the Z Factor to 3, like before.
24. You should find that you now have a
much more pleasant looking result
from the Hillshade Tool.
Converting from the Interchange (.e00)
File Format to Shapefile.
Interchange (.e00) format is a legacy data
format from the days of Arc/INFO, when
coverages and grids were the default data type for GIS modeling. Interchange
files were a way of ‘packaging’ coverages and grids, whose essential data were
distributed across more than one folder. While most GIS data is now being
produced in, or has been converted to, shapefile, you will still encounter
Interchange format files. ArcCatalog retains the tools necessary to convert from
Interchange format.
1. Open ArcCatalog and go to View>Toolbars to enable the “ArcView 8x
Tools” toolbar. You should see a toolbar appear with a single dropdown
button labeled “Conversion Tools.”
2. Launch the “Import from Interchange File” tool from the Conversion
Tools.
3. Browse to the \Data\Other folder and select the newhav.e00 file as your
input file.
4. For the Output File, browse to the \Data\Shapefile folder and name the
output file nh_wetlands. Click Save.
5. Before you apply the conversion, Click the Batch Button, and
note that this tool provides the ability to convert multiple files at once.
Click OK to apply the conversion.
6. In ArcCatalog, Browse to the \Data\Shapefile folder, and find the
nh_wetlands coverage file. In the Catalog Tree window (on the left)
expand the nh_wetlands coverage layer so that the four component
layers are visible. We only want the polygon layer.
7. Right-click on the polygon component of the nh_wetlands coverage and
select Export>To Shapefile (Single). Give the \Data\Shapefile folder as
the Output Location, and name the Output Feature Class as
New_Haven_NWI_Wetlands. Leave the remaining Options as their
default values. Click OK to apply the conversion.
8. Once the conversion is complete, you should see the
New_Haven_NWI_Wetlands.shp appear in your \Data\Shapefile
folder.