ArcView Student Workbook

Backgrounder: Lying with Maps

In the following examples you will see how classifying data with different classification methods can produce very different maps. Each classification method places data into groups that have similar characteristics or values. ArcView allows you to classify data using five different methods: natural breaks, quantile, equal area, equal interval, and standard deviation. Each of these methods places the 'boundaries' between the groups of data at different values.

To understand what is going on when you apply different classification methods to data, look at the map of Canada that follows each description. These maps illustrate how Canada's population (by Census Division) can be represented using the different classification methods. Think about how different the five classification methods are, and which classification might be best for a particular set of data. Also think about how you could misrepresent data, such as population, by choosing one classification method over another. This kind of misrepresentation is used all the time in the media to reinforce points, to convince people to take sides in arguments, and to warp the way people perceive reality.

It is difficult to say which method is best; it depends on the data, on what you are trying to map, and on what pattern you are looking for.

Natural Breaks

Classification by natural breaks uses a complex statistical method to find groupings and patterns in the data. The computer looks for jumps in the values and places the class breaks at those jumps, so that features with similar values (here, Census Divisions) are placed in the same class. This is the default classification: when you first classify a theme using a Graduated Colour legend, this is the classification type that ArcView will use. This method is usually best for single maps rather than for comparing two different maps, because the breaks are fitted to each dataset and will usually differ from one map to the next.
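Natural breaks classification is commonly identified with Jenks' optimization, which chooses the breaks that keep the values in each class as close as possible to their class average. The sketch below is a minimal illustration under that assumption, not ArcView's own code: it sorts the values and uses dynamic programming to find the k classes with the smallest total squared deviation from each class mean. The function name `natural_breaks` and the sample values are hypothetical.

```python
from typing import List, Sequence


def natural_breaks(values: Sequence[float], k: int) -> List[float]:
    """Return the upper bound of each of k classes, lowest class first.

    Jenks-style optimal breaks: the sorted values are split into k runs so that
    the total squared deviation of each value from its class mean is as small
    as possible (exhaustive dynamic programming, O(k * n^2)).
    """
    data = sorted(values)
    n = len(data)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of values")

    # Prefix sums of values and squared values give any run's deviation in O(1).
    s = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(data):
        s[i + 1] = s[i] + v
        s2[i + 1] = s2[i] + v * v

    def sse(i: int, j: int) -> float:
        """Squared deviation of data[i:j] from its own mean."""
        total = s[j] - s[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j] = smallest total deviation for data[:j] split into c classes;
    # start[c][j] = index where the last of those c classes begins.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    start = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                candidate = cost[c - 1][i] + sse(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    start[c][j] = i

    # Walk back from the last class to recover each class's upper bound.
    uppers = []
    j = n
    for c in range(k, 0, -1):
        uppers.append(data[j - 1])
        j = start[c][j]
    return uppers[::-1]


# Three clear groups of hypothetical values; the breaks fall at the jumps.
print(natural_breaks([1, 2, 3, 10, 11, 12, 50, 52], 3))  # → [3, 12, 52]
```

Because it tries every possible split, this version is fine for a few hundred features such as Census Divisions but would be slow for very large datasets.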
When Canada's population is mapped using the natural breaks classification, we can clearly see that the highest populations occur close to the USA/Canada border, as well as around other major urban areas. This classification is a fairly accurate depiction of Canada's population: the lighter areas are mostly wilderness and prairie, where relatively few people live.

Quantile

In a quantile classification, the same number of features is assigned to each class. For example, if you had a map of 35 cities and wanted to divide it into 5 classes, each class would contain 7 cities. Quantile classifications are best used on datasets with a fairly even spread of values; that is, the data should not group or cluster around particular values or geographic locations.

The quantile classification method may be simple to understand, but it is often the most misleading. Data sets such as population counts can become highly distorted by the quantile method: only a few places may be highly populated, but the top class must still hold as many features as every other class, so it fills up with places whose populations are much smaller. When working with population, this can usually be overcome by expressing population as a density or a percentage. When working with actual population counts, it can also be reduced by increasing the number of classes. You can see this effect when Canada's population is mapped using this method: the map appears to show relatively high populations in places such as Northern Ontario, Labrador and the Rocky Mountains, where the population is actually much lower than the map indicates.

Equal Area

In ArcView, this method of classification can only be used on polygon themes, because lines and points do not have an area. This method groups the polygons so that each class covers approximately the same amount of geographic area.
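Quantile and equal area classification differ only in what each class is filled with: quantile gives every class the same number of features, while equal area gives every class roughly the same total area. The minimal sketch below makes that difference concrete. It assumes the features are ranked by value and each feature is placed according to how much area lower-valued features have already covered; ArcView's exact rule may differ, and the function names and sample numbers are hypothetical.

```python
from typing import List, Sequence


def _rank_order(values: Sequence[float]) -> List[int]:
    """Indices of the features, sorted from lowest to highest value."""
    return sorted(range(len(values)), key=lambda i: values[i])


def quantile_classes(values: Sequence[float], k: int) -> List[int]:
    """Class index (0 = lowest) for each feature; about n/k features per class."""
    n = len(values)
    classes = [0] * n
    for rank, i in enumerate(_rank_order(values)):
        classes[i] = rank * k // n
    return classes


def equal_area_classes(values: Sequence[float], areas: Sequence[float], k: int) -> List[int]:
    """Class index (0 = lowest) for each feature; about total_area/k area per class.

    Hypothetical rule: a feature goes into the class whose share of the total
    area contains the area already covered by lower-valued features.
    """
    total = sum(areas)
    classes = [0] * len(values)
    covered = 0.0
    for i in _rank_order(values):
        classes[i] = min(int(covered * k / total), k - 1)
        covered += areas[i]
    return classes


values = [5, 1, 4, 2, 3, 6]  # hypothetical populations of six polygons
print(quantile_classes(values, 3))                         # → [2, 0, 1, 0, 1, 2]
print(equal_area_classes(values, [1, 1, 1, 1, 1, 1], 3))   # → [2, 0, 1, 0, 1, 2]
print(equal_area_classes(values, [1, 25, 1, 1, 1, 1], 3))  # → [2, 0, 2, 2, 2, 2]
```

With equal polygon areas the two methods agree. When the lowest-valued polygon is also very large (much like a sparsely populated northern Census Division), it fills the lowest class on its own, the middle class stays empty, and every other polygon lands in the top class.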
ArcView determines the total area, divides this area by the number of classes, and then groups the polygon features into classes accordingly. This method is not used very often, because it is based more on the size of the areas you are interested in than on the actual data values. If all the polygon features have approximately the same area, the equal area classification will look very similar to the quantile classification. However, if the polygon features have wildly varying areas, the classification will look vastly different from the quantile classification.

An equal area classification also produces a distorted view of Canada's population. Again, it shows Northern Ontario, the Yukon Territory and parts of the Northwest Territories as having relatively high populations, when we know that this is not the case!

Equal Interval

This method classifies the data so that each class has the same range. In other words, the difference between the high and low values is the same for every class. For example, if you wanted to classify data ranging from 0 to 100 (e.g. percentages) into 4 classes, each class would have a range of 25: the first class would run from 0 to 25, the second from 25 to 50, the third from 50 to 75 and the fourth from 75 to 100. (Why would 0-24, 25-49, 50-74 and 75-100 be statistically more accurate?)

The population map produced by an equal interval classification is much more accurate than the previous two; however, it severely under-represents the population across the country. This is because the population of Canada's Census Divisions ranges from very low (2,155) to very high (2,275,770). The mean population by Census Division is 99,639, a value much closer to the lowest population than to the highest.
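Equal interval breaks depend only on the lowest value, the highest value and the number of classes. The minimal sketch below computes them from the minimum and maximum Census Division populations quoted above with five classes, and then checks which class the mean falls into. The function names are hypothetical, and treating each break as the inclusive upper bound of its class (so a value of exactly 25 goes into the 0 to 25 class) is an assumption made so that every value belongs to exactly one class.

```python
from bisect import bisect_left
from typing import List


def equal_interval_breaks(low: float, high: float, k: int) -> List[float]:
    """Upper bound of each of k classes of equal width between low and high."""
    width = (high - low) / k
    return [low + width * (c + 1) for c in range(k)]


def assign_class(value: float, breaks: List[float]) -> int:
    """Index of the first class whose upper bound is >= value (0 = lowest)."""
    # Assumption: each break is an inclusive upper bound; values above the
    # last break are clamped into the top class.
    return min(bisect_left(breaks, value), len(breaks) - 1)


breaks = equal_interval_breaks(2155, 2275770, 5)
print(breaks[0] - 2155)             # width of each class → 454723.0
print(breaks)                       # → [456878.0, 911601.0, 1366324.0, 1821047.0, 2275770.0]
print(assign_class(99639, breaks))  # the mean lands in the lowest class → 0
print(equal_interval_breaks(0, 100, 4))  # → [25.0, 50.0, 75.0, 100.0]
```

Only the two extreme values shape the classes, so a single very populous Census Division stretches every interval and pushes nearly everything else into the first class.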
Because the values are so skewed, when the population is divided into five ranges, each 454,723 wide, most of the Census Divisions fall into the lowest class, which runs from 2,155 to 456,878. As a result, Canada appears to have a very low population across the whole country.

Standard Deviation

Standard deviation is a measure of how spread out (dispersed) the data values are. If the data is spread out over a wide range, the standard deviation is large; if it is concentrated in a narrow range, the standard deviation is small. When classifying data using the standard deviation method, ArcView finds the average value (the mean), and then places class breaks above and below the mean at intervals of 1/4, 1/2 or 1 standard deviation until all the data values are contained within the classes. The data is therefore classed by how spread out it is, that is, by how far each value lies from the mean.

Below, you can see a map of Canada's population classified using the standard deviation method. You cannot criticize this map as inaccurate; it simply illustrates how the Census Divisions vary from the mean.

Choosing a classification scheme:

To decide which scheme to use, you need to know how the data values are distributed across their range. You can eyeball the data to get a sense of the distribution, or you can sample every 5th or 10th value, tally the samples in a table of selected ranges, and see how the data is distributed. The best way is to create a bar chart showing how often values fall in each range. ArcView can do this for you.

If the data is unevenly distributed (many features have the same or similar values, and there are gaps between groups of values), use natural breaks. If the data is evenly distributed and you want to emphasize the difference between features, use equal interval or standard deviation.
If the data is evenly distributed and you want to emphasize the relative difference between features, use quantile.

NOTE: The following graphs show how often (the frequency with which) values occur in the data:
A - Unevenly distributed data
B - Evenly distributed data
C - Data with an outlier

Outliers, what are they?

When you graph data, you may find that you have a few extremely high or low values. These are called outliers, and they can skew your class ranges and hence the pattern on your map, especially if you are using equal interval or standard deviation. Natural breaks can isolate outliers. Outliers may be present because of errors in the data, they may be anomalies caused by a small data sample, or they may be completely valid. You can use the null value to ignore them, you can create a class just for the outliers, or you can place them in the nearest class.

The ideal situation:

[Graph: frequency plotted against actual values]

The curved line represents how often values occur throughout the data. This is the ideal situation: the values form a normal distribution. The spacing between the bars represents the class ranges, which attempt to place an equal number of values in each range.

A few final words:

Three to six classes is the best number of classes to use. You may want to adjust the actual values of the ranges to make them easier to read. If you don't have to show exact numbers, change the labelling to the maximum and minimum values. If the numbers are not important, you may want to leave them out and label the ranges with words such as high, medium and low. Make sure your legend title always shows the unit of measure of your data.

Your Turn

Select a particular field from the world table demog.dbf or worldata.dbf (found on the O drive under Student Apps/ Geo_Data/World…). For example, you may want to select birth rates (Bir_rate). Remember that you must join these tables to a World Shapefile!
Alternatively, you can use a Shapefile that already has a lot of data joined to it: Cntry95.shp, found in the ArcCanada folder.

Create five thematic maps, one for each of the classifications described above. You don't have to change anything except the classification type. Include a brief write-up describing why each map is different.

Then, using any field and any classification of your choice, create one more map that LIES; in other words, a map that doesn't show the true pattern. The map should twist or distort the truth in some way. You can even play with the choices in the Legend Editor: try changing the number of classes, the type of classification, the shading or colours, or even the way the numbers are ramped. There is one simple rule: do not change the actual data in any way. The data must be left alone. Finally, describe the lie or distortion in a few sentences.

Hand in the following:
1. Five thematic maps, each with a different classification.
2. A short write-up comparing the five maps and explaining how they differ.
3. One thematic map that distorts the truth.
4. A brief write-up describing the distortion.