Full count vs percent, classification, and classes

Displaying your data and using Classify
Exploring how to use the legend classify command
When displaying data on a map there are several things you should be aware of:
1. Since polygon sizes are different, many times large areas simply have larger numbers
2. Thus, normalizing by population can produce different results
3. How you classify your data can also emphasize different patterns
4. The number of classes you use can add to complexity
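Points 1 and 2 can be illustrated with a small sketch. The three areas and their numbers below are made up purely for illustration (they are not census figures), and the names `areas` and `percent` are hypothetical:

```python
# Hypothetical counts for three made-up areas (NOT real census data).
areas = {
    "A": {"group": 900, "total": 1000},
    "B": {"group": 4000, "total": 10000},
    "C": {"group": 150, "total": 200},
}

def percent(group, total):
    """Share of the total population, expressed as a percentage."""
    return 100.0 * group / total

# Rank the areas by raw count, then by percent of total population.
by_count = sorted(areas, key=lambda k: areas[k]["group"], reverse=True)
by_percent = sorted(areas, key=lambda k: percent(**areas[k]), reverse=True)

print(by_count)    # ranking driven by how big each area is
print(by_percent)  # ranking after normalizing by population
```

With these made-up numbers the largest area ranks first by count but last by percent, which is the kind of change normalizing can produce.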
Default with total count, 5 classes and Natural Breaks
Here are the contiguous 48 states for Whites in the US
In part, large states end up at the highest end of the category
This is using the default “natural breaks”, which probably isn’t the best classification here
natural breaks classification
See Also: classification, Jenks' optimization
[cartography] A method of manual data classification that seeks to partition data into
classes based on natural groups in the data distribution. Natural breaks occur in the
histogram at the low points of valleys. Breaks are assigned in the order of the size of
the valleys, with the largest valley being assigned the first natural break.
Here I have changed to a simpler 3 classes
Notice now the smaller eastern states and large but mostly low density western states fall into the lowest category
But the size of the categories is quite a bit different
Real Definition of Natural Breaks
Jenks' optimization: The method requires an iterative process. That is, calculations must be repeated using different breaks in the dataset to determine which set of breaks has the smallest in-class variance. The process is started by dividing the ordered data into groups. Initial group divisions can be arbitrary. There are four steps that must be repeated:
• Calculate the sum of squared deviations between classes (SDBC).
• Calculate the sum of squared deviations from the array mean (SDAM).
• Subtract the SDBC from the SDAM (SDAM-SDBC). This equals the sum of the squared deviations from the class means.
• After inspecting each of the SDBC, a decision is made to move one unit from the class with the largest SDBC toward the class with the lowest SDBC.
• New class deviations are then calculated, and the process is repeated until the sum of the within class deviations reaches a minimal value.[1][5]
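The goal described above, breaks that minimize the sum of squared deviations from the class means, can be sketched in a few lines. Note that this is not the iterative unit-moving procedure itself: it is a brute-force search that tries every possible set of breaks, which is only practical for small datasets, and it is not necessarily what any GIS package does internally. The function names and the sample values are hypothetical.

```python
from itertools import combinations

def sum_sq_dev(values):
    """Sum of squared deviations of the values from their own mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def natural_breaks(values, k):
    """Split sorted values into k contiguous classes with the smallest
    total within-class sum of squared deviations (exhaustive search)."""
    data = sorted(values)
    n = len(data)
    best_cost, best_classes = None, None
    # Each combination picks the k-1 positions where one class ends.
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0,) + cuts + (n,)
        classes = [data[bounds[i]:bounds[i + 1]] for i in range(k)]
        cost = sum(sum_sq_dev(c) for c in classes)
        if best_cost is None or cost < best_cost:
            best_cost, best_classes = cost, classes
    return best_classes, best_cost

# Made-up values with three obvious clusters (illustration only).
classes, cost = natural_breaks([1, 2, 3, 10, 11, 12, 50, 52], 3)
print(classes, cost)
```

On these made-up values the search groups the three visible clusters together, which matches the "valleys in the histogram" description of natural breaks.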
5 Classes
3 Classes
What is it doing?
Not always clear.
In my opinion it works better with remotely sensed data.
If data is logarithmic, then use a log or geometric classification.
Make sure your classification scheme reflects whatever you’re trying to do.
Equal Interval
Results of Equal Interval
Now we see how the really big states and the small ones fall, based on population, into equal-sized classes
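As a rough sketch of what an equal interval classification computes, the range of the data is divided into classes of the same width. The function names and the values below are hypothetical and chosen only to show the effect of one very large value:

```python
from bisect import bisect_left

def equal_interval_breaks(values, k):
    """Upper bounds of k classes that all have the same width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k + 1)]

def class_counts(values, breaks):
    """Count how many values fall at or below each upper bound."""
    counts = [0] * len(breaks)
    for v in values:
        # Clamp so rounding can never push the maximum past the last class.
        idx = min(bisect_left(breaks, v), len(breaks) - 1)
        counts[idx] += 1
    return counts

# Made-up, right-skewed values (illustration only).
values = [0, 10, 20, 100]
breaks = equal_interval_breaks(values, 5)
counts = class_counts(values, breaks)
print(breaks)
print(counts)
```

With skewed values like these, most observations land in the lowest class and the middle classes stay empty.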
Geometric Progression
Since our data is highly skewed to the right, we might want to try a geometric progression
Geometric Progression
Now the really small states fall in a really small range, the middle-sized ones have a larger range, and the largest ones have the largest range
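The growing class ranges can be sketched with a plain geometric progression, where each upper bound is the previous one multiplied by a constant ratio. This is an assumption about the general idea only; the geometric option in a given GIS package may compute its breaks differently. The function name and values are hypothetical, and the sketch assumes all values are positive.

```python
def geometric_breaks(values, k):
    """Upper bounds of k classes whose bounds grow by a constant ratio."""
    lo, hi = min(values), max(values)
    if lo <= 0:
        raise ValueError("a geometric progression needs positive values")
    ratio = (hi / lo) ** (1.0 / k)
    breaks = [lo * ratio ** i for i in range(1, k + 1)]
    breaks[-1] = hi  # avoid floating-point drift on the last bound
    return breaks

# Made-up, right-skewed values (illustration only).
values = [1, 5, 40, 1000]
breaks = geometric_breaks(values, 3)

# Width of each class: small at the low end, large at the high end.
lower = [min(values)] + breaks[:-1]
widths = [b - a for a, b in zip(lower, breaks)]
print(breaks)
print(widths)
```

The widths increase from class to class, so small values are separated finely while the few very large values share one wide class.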
Now do it by percent white
Percent White
Consider the future of Republicans.
Switch to 5 Classes to improve detail
Exploring for a Geometric Progression
Given the fairly even distribution of the data, there doesn’t seem to be anything gained by going to a geometric progression
Further Explorations
• Now explore Hispanic and Black Populations
Final note and caution
• How you display your data can give quite different answers