DATA STANDARDIZATION and CLASSIFICATION
Cartographic Design for GIS (Geog. 340)
Prof. Hugh Howard
American River College

STANDARDIZATION
• Normalization
• Transformation of raw data values to different, more meaningful values
  – To map densities instead of “raw” values
  – To map proportions between variables
  – To map other relationships between variables
  – To map statistical summaries

MAPPING DENSITY
• How much of a particular thing exists within a given area
• Larger enumeration units often have “more” of a particular thing
  – Mapping density is not necessary if all you want to do is show where “more” is
  – Accounting for the varying sizes of enumeration units can be more revealing

MAPPING DENSITY
Population/Area: “persons per square mile”

MAPPING DENSITY
Bushels/Area: “bushels per acre”

MAPPING PROPORTIONS
• Proportions represent the relationship of a part to a whole
• Several ways to express proportions
  – Quotient: 0.0-1.0
  – Percentage: 0-100%
  – Rate: 7 per 1,000

MAPPING PROPORTIONS
Persons 60 and Over
Persons 60 and Over/Total Persons*100: “percentage of seniors”

MAPPING PROPORTIONS
Non Grads/Total Population*100: “percentage of non grads”

MAPPING RELATIONSHIPS
• It is often revealing to show how two variables are related (in a manner that is not strictly proportional)
• Several ways to express relationships
  – Quotient: 0.0-infinity
  – Percentage: 0-infinity%
  – Rate: 1,500 per 100

MAPPING RELATIONSHIPS
Females/Males: “ratio of females to males”

MAPPING RELATIONSHIPS
Acres of Cropland/Population*1,000: “acres per 1,000 people”

MAPPING STAT. SUMMARIES
• Enumeration units can be represented according to calculated statistics
  – Median
  – Mean (average)
  – Standard Deviation, etc.

MAPPING STAT. SUMMARIES
[Animations: raw and standardized values (slow version and fast version)]

STANDARDIZATION
• Transformation of raw data values to different, more meaningful values
  – Densities, Proportions, Relationships, and Statistical Summaries
• In conjunction with data classification, normalization allows us to craft our message…

DATA CLASSIFICATION
• The act of organizing attribute values into categories, or groups
• Can be qualitative or quantitative, and based on any of the four measurement scales
  – Nominal
  – Ordinal
  – Interval
  – Ratio

DATA CLASSIFICATION
• NOMINAL (Zoning): Commercial, Residential, Industrial
• ORDINAL (Visibility): Poor, Fair, Good
• INTERVAL (Quality of Life): 2.4 - 4.7, 4.8 - 6.3, 6.4 - 8.6
• RATIO (Population): 0 - 500, 501 - 1,000, 1,001 - 1,500

DATA CLASSIFICATION
• One of the most interesting aspects of thematic mapping
  – One set of attribute values can yield many different maps, depending on the classification scheme
  – The scheme you choose can strongly influence how your map is perceived

DATA CLASSIFICATION
[Animation: population classified using the equal interval, quantile, and natural breaks methods]

DATA CLASSIFICATION
• There is no “best” method
• Certain methods are not well suited to particular situations

DATA CLASSIFICATION
• How many classes should you use?
  – Anywhere from 3 to 7
  – 5 is probably optimal
  – An odd number of classes has a “middle” class
• It is difficult to differentiate large numbers of tints

DATA CLASSIFICATION
[Animation: agricultural sales shown using 2, 4, and 6 classes]

DATA CLASSIFICATION
• Equal Interval
  – Each class occupies an equal interval along the number line, or histogram
[Figure: histogram of TOWN POPULATION; no gaps between classes]

DATA CLASSIFICATION
• Advantages of Equal Interval
  – Can be easy to understand and interpret
  – Good for attributes that are normally represented using uniform classes: elevation, precipitation, temperature
  – Example classes: 0 – 20, 21 – 40, 41 – 60, 61 – 80, 81 – 100

DATA CLASSIFICATION
• Disadvantage of Equal Interval
  – Poor at considering the distribution of data along the number line
  – Doesn't work well with skewed distributions (can result in empty classes)

DATA CLASSIFICATION
• Quantile
  – Each class contains the same (or similar) number of attribute values
  – 4 classes: quartiles
  – 5 classes: quintiles
  – 6 classes: sextiles
[Figure: histogram of TOWN POPULATION; gaps between classes]

DATA CLASSIFICATION
• Advantage of Quantile
  – Ensures that a choropleth map will have the same number of darkest polygons as lightest, etc.
  – Example: 67 counties in 5 classes gives ≈13 counties per class

DATA CLASSIFICATION
• Disadvantage of Quantile
  – Poor at considering the distribution of data along the number line
  – Doesn’t work well with skewed distributions (one or two classes can occupy the majority of the range)

DATA CLASSIFICATION
• Natural Breaks
  – Each class contains a cluster of attribute values, with “natural” breaks between classes
[Figure: histogram of TOWN POPULATION; more subjective; gaps between classes]

DATA CLASSIFICATION
• Advantage of Natural Breaks
  – Very good at considering how the data are distributed along the number line; each classification is “custom tailored”
  – Works well with skewed data distributions

DATA CLASSIFICATION
• Disadvantages of Natural Breaks
  – Subjective, and results will differ
  – More difficult to compare with other maps
  – One or two classes can end up occupying the majority of the data's range

DATA CLASSIFICATION
• Classification for map comparison
  – Use the same method for all maps (if possible)
  – Equal interval with identical break values often works best (shown here)
  – Quantile can also work well
  – By definition, natural breaks will result in different classifications on different maps, making comparison difficult
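
WORKED EXAMPLE (SKETCH)
The standardization formulas (density, percentage) and two of the classification methods described in these slides (equal interval, quantile) can be sketched in Python as shown here. This is only a minimal illustration under stated assumptions, not the method used by any particular GIS package: the town values are invented, every function name is hypothetical, and the rank-based quantile assignment is just one simple way to get equal counts per class (it may place tied values in different classes). Natural breaks is omitted for brevity.

```python
# Hypothetical sample data, invented for illustration only:
# (town, population, area in square miles, persons 60 and over)
TOWNS = [
    ("A", 100, 4.0, 15),
    ("B", 300, 10.0, 60),
    ("C", 350, 5.0, 35),
    ("D", 800, 40.0, 80),
    ("E", 1300, 26.0, 260),
    ("F", 900, 9.0, 90),
]


def density(count, area):
    """Density: how much of a thing exists per unit of area."""
    return count / area


def percentage(part, whole):
    """Proportion of a part to a whole, expressed as 0-100%.

    Equivalent to part / whole * 100; multiplying first reduces
    floating-point rounding noise for whole-number inputs.
    """
    return 100 * part / whole


def equal_interval_breaks(values, n_classes):
    """Upper class limits that split the data range into equal widths."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    return [lo + width * i for i in range(1, n_classes + 1)]


def classify(value, breaks):
    """Index of the first class whose upper limit is >= value."""
    for idx, upper in enumerate(breaks):
        if value <= upper:
            return idx
    # Guard against rounding in the last break: use the top class.
    return len(breaks) - 1


def quantile_classes(values, n_classes):
    """Class index per value so that classes hold equal (or similar) counts.

    Assumption: values are ranked and the ranks are cut into equal groups.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    classes = [0] * len(values)
    for rank, i in enumerate(order):
        classes[i] = rank * n_classes // len(values)
    return classes


populations = [town[1] for town in TOWNS]

# Standardized values: persons per square mile and percentage of seniors.
densities = [density(pop, area) for _, pop, area, _ in TOWNS]
senior_pct = [percentage(seniors, pop) for _, pop, _, seniors in TOWNS]

# The same raw populations classified two ways, using 3 classes.
breaks = equal_interval_breaks(populations, 3)
equal_classes = [classify(pop, breaks) for pop in populations]
quant_classes = quantile_classes(populations, 3)

print("densities:", densities)
print("senior %:", senior_pct)
print("equal interval breaks:", breaks)
print("equal interval classes:", equal_classes)
print("quantile classes:", quant_classes)
```

With these invented populations, equal interval places three towns in the lowest class, two in the middle, and one in the highest, while quantile places two towns in each class. The same attribute values therefore yield two different maps, depending on the classification scheme.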