ArcView Student Workbook
Backgrounder: Lying with Maps
In the following examples you will see how classifying data using different classification
methods can produce very different maps. The various classification methods place data
into groups that have similar characteristics or values.
ArcView allows you to classify data using five different methods: natural breaks,
quantile, equal area, equal interval, and by standard deviations. Each of these methods
places the 'boundary' between the groups of data at different intervals.
To understand what is going on when you apply different classification methods to data,
look at the map of Canada below each description. These maps illustrate how Canada's
population (by Census Division) can be represented using the different classification
methods. Think about how different the five classification methods are. Think about
which classification might be best for a particular set of data. Also, think about how
you could misrepresent data, such as population, by choosing one classification method
over another. This kind of misrepresentation is used all the time in the media to reinforce
points, convince people to take sides in arguments, and also to warp the way in which
people perceive reality.
It is difficult to say which method is best; it depends on the data, what you are trying
to map, and what pattern you are looking for.
Natural Breaks
Classification by natural breaks uses a complex statistical method to find groupings and
patterns in the data. The computer looks for jumps in the values. After the computer finds
the pattern it breaks the data up, so block groups having similar values are placed in the
same class. This is the default classification: when you first classify a theme using a
Graduated Colour method, this is the classification type that ArcView will use. This
method is usually best for single maps and not for comparing two different maps.
When Canada's population is mapped using the natural breaks classification, we can
clearly see that the highest populations occur close to the USA / Canada border, as well as
around other major urban areas. This classification is a fairly accurate depiction of
Canada's population: the lighter areas are mostly wilderness and prairie, where relatively
few people live.
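The idea of "looking for jumps in the values" can be sketched in a few lines of Python. This is not ArcView code, and the workbook does not say exactly which algorithm ArcView uses. The sketch assumes the goal is to keep the values inside each class as close to their class average as possible (the objective usually associated with natural breaks), and it simply tries every possible way of cutting the sorted list, which is only practical for small lists. The function names and the sample values are made up for illustration.

```python
from itertools import combinations


def sum_squared_dev(group):
    """Sum of squared differences between each value and the group mean."""
    mean = sum(group) / len(group)
    return sum((v - mean) ** 2 for v in group)


def natural_breaks(values, n_classes):
    """Brute-force search for the grouping with the tightest classes.

    Assumption: "natural" means minimum total squared deviation within
    classes. Requires len(values) >= n_classes.
    """
    ordered = sorted(values)
    best_cost, best_classes = None, None
    # Try every way of cutting the sorted list into n_classes non-empty runs.
    for cuts in combinations(range(1, len(ordered)), n_classes - 1):
        edges = (0, *cuts, len(ordered))
        classes = [ordered[a:b] for a, b in zip(edges, edges[1:])]
        cost = sum(sum_squared_dev(c) for c in classes)
        if best_cost is None or cost < best_cost:
            best_cost, best_classes = cost, classes
    return best_classes


# Made-up values with two obvious jumps (4 to 20, and 23 to 90).
sample_values = [2, 3, 4, 20, 22, 23, 90, 95]
natural_classes = natural_breaks(sample_values, 3)
print(natural_classes)
```

With these sample values the class boundaries fall at the two large jumps, so similar values end up in the same class.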
Quantile
In a quantile classification, the same number of features are assigned to each class. For
example, if you had a map of 35 cities, and you wanted to divide it into 5 classes, each
class would have 7 cities.
Quantile classifications are best used on datasets where there is a fairly even spread of
values. That is, the data should not group or cluster around particular values or
geographic locations.
The quantile classification method may be really simple to understand, but it is often the
most misleading. Data sets such as population counts can become highly distorted when
using the quantile classification method, as only a few places may be highly populated.
When working with population, this can usually be overcome by expressing population in
terms of density or percentage. When working with the actual population counts, it can
also be overcome by increasing the number of classes. You can see this effect when
Canada's population is mapped using this method: this map appears to show areas of
relatively high population in places such as Northern Ontario, Labrador, and the Rocky
Mountains. In these areas, the population is actually much lower than this map indicates.
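The 35-cities example, and the distortion described above, can be reproduced with a short Python sketch. This is not ArcView code: the function name and the city populations are made up, and the way leftover features are shared out when the count does not divide evenly is an assumption.

```python
def quantile_classes(values, n_classes):
    """Split the sorted values into n_classes groups of (nearly) equal size."""
    ordered = sorted(values)
    size, extra = divmod(len(ordered), n_classes)
    classes = []
    start = 0
    for i in range(n_classes):
        # The first `extra` classes take one additional feature when the
        # count does not divide evenly (an assumption, not ArcView's rule).
        end = start + size + (1 if i < extra else 0)
        classes.append(ordered[start:end])
        start = end
    return classes


# Made-up populations for 35 cities: 33 small ones and 2 very large ones.
city_populations = [1000 * (i + 1) for i in range(33)] + [500_000, 2_000_000]
quantile_result = quantile_classes(city_populations, 5)
for number, members in enumerate(quantile_result, start=1):
    print(f"Class {number}: {len(members)} cities, "
          f"{members[0]:,} to {members[-1]:,}")
```

Every class holds 7 cities, but the top class lumps cities of about 30,000 people together with a city of 2,000,000. On a map, all seven would get the same "highest" colour.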
Equal Area
In ArcView, this method of classification can only be used on polygon themes, as lines
and points do not possess an area. This method groups the polygons so that each class
covers approximately the same amount of geographic area. ArcView determines the total
area, divides this area by the number of classes, and then groups the polygon features into
classes accordingly. This method is not used very often because it is based more on the size of the
area that you are interested in than on the actual data.
If the area of all the polygon features is approximately the same, the equal area
classification will look much like the quantile classification. However, if the
polygon features have wildly varying areas, the classification will look very different
from the quantile classification.
An equal area classification also produces a distorted view of Canada's population.
Again, it shows Northern Ontario, the Yukon Territory and parts of the Northwest
Territories as having relatively high populations, when we know that this is not the case!
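A rough Python sketch of the equal area idea follows. It is not ArcView code. The workbook only says that the total area is divided by the number of classes, so the rule used here for deciding which class a polygon belongs to (the class in which the polygon's midpoint falls, measured in cumulative area) is an assumption, and the populations and areas are made up.

```python
def equal_area_classes(features, n_classes):
    """Group (value, area) pairs so each class covers a similar total area."""
    ordered = sorted(features)  # sort by attribute value, lowest first
    total_area = sum(area for _, area in ordered)
    target = total_area / n_classes  # area each class should cover
    classes = [[] for _ in range(n_classes)]
    covered = 0.0
    for value, area in ordered:
        # Assumed rule: a polygon goes in the class where its midpoint
        # falls along the running total of area.
        index = min(int((covered + area / 2) / target), n_classes - 1)
        classes[index].append((value, area))
        covered += area
    return classes


# Made-up (population, area) pairs: three huge, sparsely populated
# divisions and three tiny, heavily populated ones.
divisions = [(500, 400), (800, 300), (1200, 300),
             (50000, 10), (90000, 5), (200000, 5)]
area_result = equal_area_classes(divisions, 2)
for number, members in enumerate(area_result, start=1):
    print(f"Class {number}: {members}")
```

Because the three large divisions cover almost all of the area, two of them land in the upper class alongside the cities, even though their populations are tiny by comparison.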
Equal Interval
This method classifies the data so each category or class has the same range. In other
words the difference between the high and low values is the same for every class. For
example, if someone wanted to classify some data which ranged from 0 to 100 (for instance, when
you are working with percentages), and the desired number of classes was 4, each class
would have a range of 25. The first class would run from 0 to 25, the second from 25 to
50, the third from 50 to 75 and the fourth from 75 to 100. (Why would 0-24, 25-49, 50-74 and 75-100 be statistically more accurate?)
The population map produced by an equal interval classification is much more accurate than
the previous two; however, it does severely under-represent the population across the
country. This is because the population of Canada's census divisions ranges from very
low (2,155) to very high (2,275,770). The mean population, by Census Division, is
99,639 - a value that is much closer to the lowest population value than it is to the
highest. Therefore, when the population is broken up into five ranges, each with an
interval of 454,723, most of the census divisions will fall into the lowest class - ranging
from 2,155 to 456,878. So, Canada appears to have a very low population across the
country.
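The arithmetic behind both examples can be checked with a short Python sketch. This is not ArcView code and the function name is made up; the minimum, maximum and mean are the figures quoted above.

```python
def equal_interval_breaks(low, high, n_classes):
    """Return the n_classes + 1 class boundaries between low and high."""
    width = (high - low) / n_classes
    return [low + i * width for i in range(n_classes + 1)]


# The percentage example: 0 to 100 in 4 classes.
percent_breaks = equal_interval_breaks(0, 100, 4)
print(percent_breaks)

# Canada's census divisions: lowest and highest population, 5 classes.
canada_breaks = equal_interval_breaks(2_155, 2_275_770, 5)
print([round(b) for b in canada_breaks])

# The mean quoted in the text sits well inside the first class.
mean_population = 99_639
print(canada_breaks[0] <= mean_population < canada_breaks[1])
```

The first interval ends at 2,155 + 454,723 = 456,878, so any census division with a population near the mean falls into the lowest class.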
Standard Deviation
Standard deviation is a measure of how statistically dispersed the data is. If the data is
spread out over a wide range, the standard deviation is quite large. If it is spread out over
a narrow range, the standard deviation is a smaller value.
When classifying data using the standard deviation method, ArcView finds the average
value (the mean), and then places class breaks above and below the mean at intervals of
either 1/4, 1/2, or 1 standard deviation until all the data values are contained within the
classes. Therefore, the data is classed based on how spread out it is (that is, how far
the values lie from the mean).
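The steps described above can be sketched in Python as follows. This is not ArcView code. The function name and sample values are made up, and the sketch assumes the population standard deviation; the workbook does not say which form ArcView uses.

```python
import math
import statistics


def std_dev_breaks(values, step=1.0):
    """Class breaks at mean +/- k * step * sd, extended to cover all values.

    `step` is the interval in standard deviations (0.25, 0.5 or 1).
    """
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population form: an assumption
    if sd == 0:
        return [mean]  # every value is identical, so there is nothing to split
    # Number of steps needed below and above the mean to reach min and max.
    below = math.ceil((mean - min(values)) / (sd * step))
    above = math.ceil((max(values) - mean) / (sd * step))
    return [mean + k * step * sd for k in range(-below, above + 1)]


# Made-up values with a mean of 5 and a standard deviation of 2.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
whole_sd_breaks = std_dev_breaks(sample, step=1.0)
half_sd_breaks = std_dev_breaks(sample, step=0.5)
print(whole_sd_breaks)
print(half_sd_breaks)
```

A smaller step gives more, narrower classes around the same mean; the breaks always describe distance from the mean rather than the raw values themselves.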
Below, you can see a map of Canada's population, classified using the Standard
Deviation method. This map cannot be criticized as inaccurate; it simply
illustrates how the Census Divisions vary from the mean.
Choosing a classification scheme:
To decide which scheme to use, you need to know how the data values are distributed
across their range. You can eyeball the data to see what the distribution is like, or you can
sample every 5th or 10th value, place the sampled values in a table with selected
ranges, and see how the data is distributed. The best way is to create a bar chart.
ArcView can do this for you.
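If you want to see what such a bar chart is counting, the following Python sketch (not ArcView code) sorts values into equal-width bins and prints one bar per bin. The function name and the sample values are made up for illustration.

```python
def frequency_counts(values, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    if width == 0:
        width = 1  # all values identical: everything lands in the first bin
    counts = [0] * n_bins
    for v in values:
        # The maximum value would land one bin too far, so clamp it.
        index = min(int((v - low) / width), n_bins - 1)
        counts[index] += 1
    return counts


# Made-up values: nine small numbers and one very large one.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 40]
counts = frequency_counts(data, 4)
for number, count in enumerate(counts, start=1):
    print(f"Bin {number}: {'#' * count}")
```

Here nearly every value piles into the first bin and a single value sits alone in the last one, which is the kind of shape that tells you the data is not evenly distributed.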
• If the data is unevenly distributed (many features have the same or similar values,
and there are gaps between groups of values), use natural breaks.
• If the data is evenly distributed and you want to emphasize the difference between
features, use equal interval or standard deviation.
• If the data is evenly distributed and you want to emphasize the relative difference
between features, use quantile.
NOTE: The following graphs represent the frequency with which the values occur in the
data.
[Figure: three frequency graphs, labelled A, B and C]
A - Unevenly distributed data
B - Evenly distributed data
C - Data with an outlier
Outliers, what are they?
When you graph data, you may find that you have a few extremely high or low values.
These are called outliers, and they can skew your class ranges, and hence the pattern on
your map, especially if you are using equal interval or standard deviation. Natural breaks
can isolate outliers. Outliers may be present because of errors in the data, they may be
anomalies based on a small data sample, or they may be completely valid. You can use
the null value to ignore them, create a class just for the outliers, or place
them in the nearest class.
The ideal situation:
[Figure: frequency curve divided by vertical bars; vertical axis: Frequency, horizontal axis: Actual Values]
The curved line represents how frequently the values occur throughout the data. This
is an ideal situation: the data follows a normal distribution. The spaces between the bars
represent the class ranges and the attempt to place an equal number of values in each range.
A few final words:
• Three to six classes is the best number of classes to use.
• You may want to adjust the actual values of the ranges to make them easier to
read. If you don't have to show exact numbers, change the labeling to the
maximum and minimum values.
• If the numbers are not important you may want to leave them out and label the
ranges with words like high, medium and low. Make sure your legend title always
shows the unit of measure your data is in.
Your Turn……
Select a field from the world table demog.dbf or worldata.dbf (found on the O
drive under Student Apps/ Geo_Data/World…); for example, you may want to select birth
rates (Bir_rate). Remember that you must join these tables to a world shapefile! Alternatively,
you can use Cntry95.shp, a shapefile in the ArcCanada folder that already has a lot of
data joined to it. Create five thematic maps, one for each of the classifications
described above. You don't have to change anything but the classification type. Include
a brief write-up describing why each map is different.
Create one more map, using any field and any classification of your choice, that
LIES; in other words, create a map that doesn't show the true pattern. The map
should twist or distort the truth in some way. You can even play with the choices in the
Legend Editor: try changing the number of classes, the type of classification,
the shading or colours, or the way the numbers are ramped. There is one simple
rule to follow: do not change the actual data in any way. The data must be
left alone. Finally, in a few sentences, describe the lie or distortion.
Hand in the following:
1. Five thematic maps, each using a different classification.
2. A short write-up comparing the five maps and explaining how they differ.
3. One thematic map that distorts the truth.
4. A brief write-up describing the distortion.