Lab 7: Spatial Analysis I: Classification

Note: This laboratory covers material in chapters 6 and 7.
Cartographic classification. Cartographic features are frequently classified so that
patterns in the data can be visualized. Classification methods are statistical techniques for
placing individual cases into groups called classes. Maps of polygons that are classified
are called choropleth maps. Choropleth maps are frequently found in atlases and
newspapers where they are used to portray information about areas and their subsets, e.g.,
countries and states. You will explore several automated classification techniques in
ArcView. During this lab, we will consider ways to select the best classification variable
for identifying spatial patterns. We will also look at some data error problems.
Census data and spatial analysis. The U.S. Census is a major source of demographic
data in the United States. Demographic data is data about the distribution of people. The
census has created nested spatial units that contain increasingly smaller areas. The data
that we will work with are organized into spatial units called census tracts, which are
low-resolution expressions of demographic information. The highest resolution units publicly
available are blockgroups, which nest hierarchically inside of the tracts. Tract areas are defined by
population numbers. Areas of high population density have small tracts while areas of
low population tend to have large tracts. You will see these patterns as you look at the
tracts surrounding the Atlanta metropolitan area.
Data Requirements for Classification
In this laboratory we will explore the data requirements for classification. Because we are
classifying polygons of varying size based on their data values, we must factor out the
effect of size on the classified variable. For example, we expect a very large polygon to have
many more people living in it than a very small polygon simply because it covers more ground. To
examine the differences in population structure, we look at a variable that is a ratio of
population to area called population density. It is very important to remember that only
area-normalized data or other ratio data should be used in a choropleth map.
Start ArcMap. Copy the data folder for this lab into your own working directory. Add the
atlanta.shp layer from your working directory. This layer is composed of demographic data
collected during the 1990 census.
Open the table for the Atlanta layer and examine the fields that are available in this data.
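The effect of polygon size can be sketched outside ArcMap with a few lines of Python. This is only an illustration: the tract names, populations, and areas below are made up and do not come from atlanta.shp. It shows how ranking tracts by raw count and ranking them by density can give different orders.

```python
# Hypothetical tracts: names, populations, and areas are invented for illustration.
tracts = [
    {"name": "downtown", "pop": 4000, "sq_miles": 0.5},
    {"name": "suburban", "pop": 4000, "sq_miles": 4.0},
    {"name": "rural", "pop": 5000, "sq_miles": 50.0},
]


def density(tract):
    """People per square mile for one tract."""
    return tract["pop"] / tract["sq_miles"]


# Rank the same tracts two ways: by raw count and by area-normalized density.
by_count = sorted(tracts, key=lambda t: t["pop"], reverse=True)
by_density = sorted(tracts, key=density, reverse=True)

print("Ranked by raw population:", [t["name"] for t in by_count])
print("Ranked by density:       ", [t["name"] for t in by_density])
```

In this made-up data the largest tract has the most people but the lowest density, which is the kind of size effect that normalization is meant to factor out.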
Question: List all of the fields (attributes) that are available in the atlanta.shp table.
Look at the Fields labeled Pop_90, Pop_93, Pop_98, and Pop_growth.
Question: Is the Pop_growth field related to Pop_90, Pop_93, and Pop_98? What do
the values in the Pop_growth field mean? How were they calculated?
Open the Properties window for the Atlanta layer and select the Symbology tab. Show
the data as Graduated Colors (under Quantities -> Graduated Colors). Set the
Classification Field to Pop_90 and Normalize by Sq_miles. Select a color ramp of your
choice. Apply the changes, but do not close the window. Please do not use the Classify
button yet; for now we will accept the default classification.
Question: What attribute are you mapping when you normalize population with
area? What do you think might be wrong about mapping raw population values
without normalization?
Question: Why do you suppose the lowest class has a negative value for
Population/Area?
A negative value somewhere in the Pop_90 field appears to be skewing the classification, so
we will need to do some data manipulation. Click on the
Classify button. The Classification window will open. This window provides
information on the data being classified, and has several options for you to modify the
classification. For instance, you can change the classification method, number of classes,
breaks between classes, and you can exclude certain data from classification.
We will now remove the negative values from your classification. Click on the
Exclusion button. The Data Exclusion Properties window will open. This window
allows you to write logical queries in order to select particular records from your data set.
Because these are logical expressions it is very important to have all of the parameters in
the correct locations. Incorrect expressions cannot be processed. The query builder
window contains a scrolling list of Fields on the left and a scrolling list of Values on the
right. In between the lists are the automatic logical operators that are available to you. In
this window, write a query to select all of the records containing a value of -99 in the
Pop_90 field. To write a query, double click on a field, select a logical operator,
and then double click on a value. A sample query is shown below.
[Figure: sample query in the Data Exclusion Properties window]
Click OK and look at the new statistics in the Classification window. When you are
done, click OK and Apply the changes.
Question: Describe what effect this query has had on your classification.
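As a rough analogue of the exclusion query, the Python sketch below filters a sentinel value out of a list before summarizing it. The population values are hypothetical; only the -99 code is taken from the lab text. The `summarize` helper is an invented name for this example and not part of ArcMap.

```python
from statistics import mean

NO_DATA = -99  # the sentinel value named in the lab text

# Hypothetical Pop_90 values; only the -99 code comes from the lab.
pop_90 = [1200, 3400, -99, 5600, 2100, -99, 4300]


def summarize(values):
    """Return the basic statistics a classification dialog typically reports."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
    }


# Similar in spirit to excluding records where Pop_90 = -99.
valid = [v for v in pop_90 if v != NO_DATA]

print("All records:     ", summarize(pop_90))
print("After exclusion: ", summarize(valid))
```

Comparing the two summaries is a quick way to check whether a no-data code has been treated as a real measurement.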
Switch to layout view. Copy and paste your existing data frame so that you have a
second copy (recall that we did this in lab 2). You will want to re-size the data frames so
that you can see both of them at the same time.
Change the classification on the second data frame so that it is showing the Pop_90 field
but is not normalized.
Question: Compare your classified population map, normalized by area, to your
map that is not normalized. Which is more meaningful? Write a short analysis of
the population patterns in the Atlanta urban area.
The variable that you created through classification was a population density variable.
Another way to make such a variable is to create a new field and calculate the density
value as a ratio of population to area. In order to do this you will need to edit the data.
Open the attribute table for the Atlanta.shp layer in your new data frame (the one that is
not normalized).
Click on the Options -> Add Field button.
Add a new field called Popden. It should be type double, with precision of 13 and scale
of 6.
Now start editing. Right click on the field header for your new Popden field. Select
“Calculate Values.”
The field calculator will open. Here you will write a mathematical expression in which
attribute names stand for the field values. Double click on the Field name Pop_90.
Click on the / (divide by) operator. Double click on the Field name Sq_miles.
Your Field Calculator should look like this:
[Figure: Field Calculator dialog with Pop_90 divided by Sq_miles]
Select OK. Save your edits and stop editing. Now classify your data using the Symbology
tab in the Properties window. Let the Classification Field be Popden and select <None>
for the Normalization field.
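The field calculation can also be pictured as a loop over table rows. The sketch below is a plain Python stand-in, not the Field Calculator itself: the two rows and the `add_popden` helper are hypothetical, and rounding to six decimals is an assumption meant to mirror the scale of 6 chosen for the Popden field.

```python
# Hypothetical rows standing in for the attribute table; values are invented.
rows = [
    {"Tract": "A", "Pop_90": 2500, "Sq_miles": 1.25},
    {"Tract": "B", "Pop_90": 1800, "Sq_miles": 7.5},
]


def add_popden(table, scale=6):
    """Add a Popden field equal to Pop_90 / Sq_miles, rounded to `scale` decimals."""
    for row in table:
        row["Popden"] = round(row["Pop_90"] / row["Sq_miles"], scale)
    return table


for row in add_popden(rows):
    print(row["Tract"], row["Popden"])
```

Because the stored field is computed once, it does not update if Pop_90 or Sq_miles is edited later; the expression would have to be run again.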
Question: Is there a difference between the two approaches to normalizing
population data by area? What would have happened if you had used a Pop_98/
Sq_miles ratio?
Cartographic Classification
We will now explore different types of classification available in ArcView. ArcView
provides four approaches to classification: Natural Breaks (which is the default), Equal
Interval, Quantile, and Standard Deviation. All of these classification approaches are
based on the structure of a data variable. A Natural Breaks approach is based upon the
histogram of a data variable with counts on the y-axis and values on the x-axis. A simple
histogram appears below.
[Figure: simple histogram of counts against values]
In the histogram we can see a Natural Break about halfway along the x-axis. This
classification approach assumes that humans can intuitively find such breaks in a
histogram, and design classes accordingly. ArcView uses an algorithm to statistically
optimize the Natural Breaks approach. This is a nice approach because it is intuitively
easy to understand. It can be inappropriate when some classes in a histogram have many
counts and others have few. For explanations of the other three classification approaches, use
the ArcGIS desktop help index and type in “classification.” The page on standard
classification schemes is brief, but should be sufficient.
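To make the four schemes concrete, here is a minimal Python sketch that computes upper class limits for a small, made-up data set. It is not the code ArcView runs. The quantile rule is one simple convention among several, the natural breaks function is a brute-force stand-in that minimizes within-class squared deviation and is only practical for tiny lists, and the standard deviation function assumes boundaries at the mean and one population standard deviation on either side.

```python
from itertools import combinations
from statistics import mean, pstdev


def equal_interval(values, k):
    """Upper class limits that split the data range into k equal-width classes."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k + 1)]


def quantile(values, k):
    """Upper class limits that put roughly the same number of cases in each class."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * i) // k - 1] for i in range(1, k + 1)]


def natural_breaks(values, k):
    """Try every way to cut the sorted data into k classes and keep the tightest one."""
    ordered = sorted(values)
    n = len(ordered)

    def squared_deviation(group):
        center = mean(group)
        return sum((v - center) ** 2 for v in group)

    best_cost, best_bounds = None, None
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0,) + cuts + (n,)
        cost = sum(squared_deviation(ordered[a:b]) for a, b in zip(bounds, bounds[1:]))
        if best_cost is None or cost < best_cost:
            best_cost, best_bounds = cost, bounds
    return [ordered[b - 1] for b in best_bounds[1:]]


def standard_deviation(values):
    """Class boundaries at the mean and one standard deviation either side (assumed layout)."""
    center, spread = mean(values), pstdev(values)
    return [center - spread, center, center + spread]


# Hypothetical values with two clusters and one outlier.
data = [1, 2, 3, 4, 5, 20, 21, 22, 23, 100]

print("Equal interval:    ", equal_interval(data, 3))
print("Quantile:          ", quantile(data, 3))
print("Natural breaks:    ", natural_breaks(data, 3))
print("Standard deviation:", standard_deviation(data))
```

With skewed data like this, the same values land in quite different classes depending on the scheme, which is worth keeping in mind when comparing the maps later in the lab.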
At the beginning of Exercise 1 we said that choropleth data must be ratio data. Look now
at your data table for Atlanta.
Question: List all of the attributes for your data table. State whether you think each
attribute is a ratio, is not a ratio, or you are not sure.
Decide on a ratio variable field from your table to use in this exercise. You
may not use the Popden field that you created.
Question: What is the name of the attribute you have chosen? How do you know
that it is a ratio variable?
Now you will copy your existing data frame so that you have a total of four data frames
with the same atlanta.shp data.
Now you will classify your variable using each of the classification methods. You will
print one map with the four different classifications to turn in with your lab. Make
certain that you use the layout view, or add text to your view, so that your TA or instructor
can tell which classification method you used for each map.
Open the Properties window for the Atlanta layer in the first data frame. Select the
Symbology tab. Click on the Classify button. The Classification window will pop up.
Pull down the Classification Method menu, and set the method to Equal interval. Notice
what has happened to the histogram in the classification window. Click OK to close the
Classification window. In the Properties window, click OK to apply the changes and
close the window. Label this data layer using the text tool. Be sure to include the
variable being classified.
Repeat this technique in your remaining data frames for the Quantile, Natural Breaks, and
Standard Deviation classification methods. Print your map to turn in.
Question: What do the histograms in the Classification window show? How do
they change with the different classifications?
Question: Write a discussion of the differences and similarities between your maps.
Do different approaches create different patterns? Do you see a spatial pattern in
your variable? If so, what does the pattern mean? Which classification method do
you like best for your variable?
Conclusion
In this lab we have explored some of the database issues related to spatial classification.
We have seen that when small errors or irregularities occur in a data table, extreme errors
can be produced through mathematical manipulations. In order to use mathematical or
statistical methods, it is very important to develop the ability to spot problems in the data,
diagnose them, and solve them. In addition, we experimented with creating some simple
choropleth maps. Classification is a major area of cartography, and has been covered here
very briefly. Further experimentation with the classification methods in ArcView can
help to develop a deeper understanding of the effects of classification.