02 Data Standardization, Classification S12

DATA
STANDARDIZATION
and
CLASSIFICATION
Cartographic Design for GIS (Geog. 340)
Prof. Hugh Howard
American River College
STANDARDIZATION
• Normalization
• Transformation of raw data values to
different, more meaningful values
– To map densities instead of “raw” values
– To map proportions between variables
– To map other relationships between
variables
– To map statistical summaries
MAPPING DENSITY
• How much of a particular thing exists
within a given area
• Larger enumeration units often have
"more" of a particular thing
– Mapping density is not necessary if all
you want to do is show where “more” is
– Accounting for the varying sizes of
enumeration units can be more revealing
MAPPING DENSITY
Population/Area
“persons per square mile”
MAPPING DENSITY
Bushels/Area
“bushels per acre”
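A minimal sketch of the density calculation above, assuming two hypothetical enumeration units (the names, populations, and areas are invented for illustration). It shows that the unit with "more" people can still be the less dense one.

```python
# Hypothetical enumeration units: (name, population, area in square miles).
counties = [
    ("A", 50_000, 1_000.0),
    ("B", 20_000, 100.0),
]


def density(count, area):
    """Return count per unit area, e.g. persons per square mile."""
    if area <= 0:
        raise ValueError("area must be positive")
    return count / area


# County A has more people, but County B is four times as dense.
densities = {name: density(pop, area) for name, pop, area in counties}
for name, value in densities.items():
    print(f"County {name}: {value:.1f} persons per square mile")
```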
MAPPING PROPORTIONS
• Proportions represent the relationship
of a part to a whole
• Several ways to express proportions
– Quotient: 0.0-1.0
– Percentage: 0-100%
– Rate: 7 per 1,000
MAPPING PROPORTIONS
Persons 60 and Over
Persons 60 and Over/Total Persons*100
“percentage of seniors”
MAPPING PROPORTIONS
Non Grads/Total Population*100
“percentage of non grads”
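The three ways of expressing a proportion differ only in the scaling factor. The sketch below assumes hypothetical counts of seniors and total persons; the function name `scaled_proportion` is made up for this example.

```python
def scaled_proportion(part, whole, per=1):
    """Return part/whole scaled by `per`.

    per=1 gives a quotient, per=100 a percentage, per=1_000 a rate per 1,000.
    """
    if whole <= 0:
        raise ValueError("whole must be positive")
    if not 0 <= part <= whole:
        raise ValueError("a part cannot be negative or exceed its whole")
    return part * per / whole


# Hypothetical counts for one enumeration unit.
seniors, total_persons = 1_750, 25_000

quotient = scaled_proportion(seniors, total_persons)                 # 0.0-1.0
percentage = scaled_proportion(seniors, total_persons, per=100)      # 0-100%
rate_per_1000 = scaled_proportion(seniors, total_persons, per=1_000)

print(quotient, percentage, rate_per_1000)
```

Because the part can never exceed the whole, the quotient stays between 0.0 and 1.0.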
MAPPING RELATIONSHIPS
• It is often revealing to show how two
variables are related (in a manner that
is not strictly proportional)
• Several ways to express relationships
– Quotient: 0.0-infinity
– Percentage: 0-infinity%
– Rate: 1,500 per 100
MAPPING RELATIONSHIPS
Females/Males
“ratio of females to males”
MAPPING RELATIONSHIPS
Acres of Cropland/Population*1,000
“acres per 1,000 people”
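Unlike a proportion, a relationship between two separate variables has no upper bound. A small sketch, assuming hypothetical counts for a single enumeration unit (the function name `ratio` and all numbers are illustrative):

```python
def ratio(numerator, denominator, per=1):
    """Return numerator/denominator scaled by `per`; the result may exceed `per`."""
    if denominator == 0:
        raise ZeroDivisionError("denominator must be non-zero")
    return numerator * per / denominator


# Hypothetical values for one enumeration unit.
females, males = 5_200, 4_800
cropland_acres, population = 12_000, 8_000

females_to_males = ratio(females, males)                      # greater than 1 here
acres_per_1000 = ratio(cropland_acres, population, per=1_000)

print(f"{females_to_males:.2f} females per male")
print(f"{acres_per_1000:.0f} acres per 1,000 people")
```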
MAPPING STAT. SUMMARIES
• Enumeration units can be represented
according to calculated statistics
– Median
– Mean (average)
– Standard Deviation, etc.
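The summaries listed above can be computed with Python's standard `statistics` module. The household incomes below are invented sample values for one enumeration unit, and the choice of population standard deviation (`pstdev`) rather than sample standard deviation is an assumption.

```python
import statistics

# Hypothetical household incomes (in $1,000s) within one enumeration unit.
incomes = [30, 40, 50, 60, 120]

unit_mean = statistics.mean(incomes)      # pulled upward by the one high value
unit_median = statistics.median(incomes)  # the middle value, less affected
unit_stdev = statistics.pstdev(incomes)   # spread around the mean

print(unit_mean, unit_median, round(unit_stdev, 2))
```

Each enumeration unit would then be mapped by one of these computed values instead of by its raw observations.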
MAPPING STAT. SUMMARIES
[Animation: raw and standardized values (slow and fast versions)]
STANDARDIZATION
• Transformation of raw data values to
different, more meaningful values
– Densities, Proportions, Relationships,
and Statistical Summaries
• In conjunction with data classification,
normalization allows us to craft our
message…
DATA
CLASSIFICATION
DATA CLASSIFICATION
• The act of organizing attribute values
into categories, or groups
• Can be qualitative or quantitative, and
based on any of the four
measurement scales
– Nominal
– Ordinal
– Interval
– Ratio
DATA CLASSIFICATION
NOMINAL (Zoning): Commercial; Residential; Industrial
ORDINAL (Visibility): Poor; Fair; Good
INTERVAL (Quality of Life): 2.4 - 4.7; 4.8 - 6.3; 6.4 - 8.6
RATIO (Population): 0 - 500; 501 - 1,000; 1,001 - 1,500
DATA CLASSIFICATION
• One of the most interesting aspects of
thematic mapping
– One set of attribute values can yield
many different maps, depending on the
classification scheme
– The scheme you choose can strongly
influence how your map is perceived
DATA CLASSIFICATION
• Animation showing population using
equal interval, quantile, and natural
breaks classification methods
DATA CLASSIFICATION
There is no “best” method
Certain methods are not well
suited to particular situations
DATA CLASSIFICATION
• How many classes should you use?
– Anywhere from 3 to 7
– 5 is probably optimal
– An odd number of classes has a “middle” class
– Large numbers of tints are difficult to differentiate
DATA CLASSIFICATION
• Animation showing agricultural sales
using 2, 4, and 6 classes
DATA CLASSIFICATION
• Equal Interval
– Each class occupies an equal interval
along the number line, or histogram
[Figure: town population]
No gaps between classes
DATA CLASSIFICATION
• Advantages of Equal Interval
– Can be easy to understand and interpret
– Good for attributes that are normally
represented using uniform classes:
elevation, precipitation, temperature
0 – 20
21 – 40
41 – 60
61 – 80
81 – 100
DATA CLASSIFICATION
• Disadvantage of Equal Interval
– Considers the distribution of data along the number line poorly
– Doesn't work well with skewed distributions (can result in empty classes)
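The equal-interval scheme, and the empty classes it can produce with skewed data, can be sketched as follows. The function names and the town populations are hypothetical, and placing a value that falls exactly on a break into the lower class is an assumption of this sketch.

```python
def equal_interval_breaks(values, k):
    """Return k + 1 break values that divide the data range into k equal widths."""
    if k < 1:
        raise ValueError("k must be at least 1")
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(k + 1)]


def class_index(value, breaks):
    """Return the 0-based class for `value`; a value on a break joins the lower class."""
    for i in range(1, len(breaks)):
        if value <= breaks[i]:
            return i - 1
    raise ValueError("value lies above the highest break")


# Hypothetical, strongly skewed town populations (in thousands).
towns = [0, 5, 10, 15, 100]
breaks = equal_interval_breaks(towns, 5)

counts = [0] * 5
for town in towns:
    counts[class_index(town, breaks)] += 1

print(breaks)  # class limits
print(counts)  # number of towns per class
```

With these values, four towns share the lowest class, one town sits alone in the highest class, and the three classes in between are empty.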
DATA CLASSIFICATION
• Quantile
– Each class contains the same (or
similar) number of attribute values
[Figure: town population]
4 classes: quartiles
5 classes: quintiles
6 classes: sextiles
Gaps between classes
DATA CLASSIFICATION
• Advantage of Quantile
– Ensures that a choropleth map will have
the same number of darkest polygons as
lightest, etc.
67 Counties
5 Classes
≈13 Counties
per Class
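A minimal sketch of quantile classification for the 67-county, 5-class case above. The attribute values are placeholders, the function name is hypothetical, and the sketch ignores ties, so equal values could land in different classes.

```python
def quantile_classes(values, k):
    """Split the sorted values into k classes of equal (or nearly equal) size."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and the number of values")
    ordered = sorted(values)
    n = len(ordered)
    # Integer cut positions spread as evenly as possible over the n values.
    cuts = [i * n // k for i in range(k + 1)]
    return [ordered[cuts[i]:cuts[i + 1]] for i in range(k)]


# Placeholder attribute values for 67 counties.
county_values = list(range(1, 68))
classes = quantile_classes(county_values, 5)
sizes = [len(c) for c in classes]

print(sizes)
```

Since 67 does not divide evenly by 5, the classes hold either 13 or 14 counties each.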
DATA CLASSIFICATION
• Disadvantage of Quantile
– Considers the distribution of data along the number line poorly
– Doesn’t work well with skewed distributions (one or two classes can occupy the majority of the range)
DATA CLASSIFICATION
• Natural Breaks
– Each class contains clusters of attribute
values, and “natural” breaks between
[Figure: town population]
More subjective
Gaps between classes
DATA CLASSIFICATION
• Advantages of Natural Breaks
– Considers the distribution of data along the number line very well; each classification is “custom tailored”
– Works well with skewed data distributions
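One simple way to imitate finding "natural" breaks is to sort the values and cut at the largest gaps between neighbors. This is only an illustrative stand-in, not the optimization-based methods (such as Jenks) that GIS packages typically offer; the function name and the town populations are hypothetical.

```python
def largest_gap_classes(values, k):
    """Split sorted values into k classes by cutting at the k - 1 largest gaps."""
    ordered = sorted(values)
    if not 1 <= k <= len(ordered):
        raise ValueError("k must be between 1 and the number of values")
    # Each entry: (gap size, index of the value that would start a new class).
    gaps = sorted(
        ((ordered[i] - ordered[i - 1], i) for i in range(1, len(ordered))),
        reverse=True,
    )
    cut_points = sorted(index for _, index in gaps[: k - 1])
    bounds = [0] + cut_points + [len(ordered)]
    return [ordered[bounds[j]:bounds[j + 1]] for j in range(k)]


# Hypothetical town populations (in thousands) forming three clusters.
towns = [2, 3, 4, 20, 22, 23, 90, 95]
clusters = largest_gap_classes(towns, 3)

print(clusters)
```

The class limits follow the clusters in the data, so the classes are neither equal in width nor equal in size.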
DATA CLASSIFICATION
• Disadvantages of Natural Breaks
– Subjective, and results will differ
– More difficult to compare with other maps
– One or two classes can end up occupying
the majority of the data's range
DATA CLASSIFICATION
• Classification for map comparison
– Use the same method for all maps (if
possible)
– Equal interval with identical break values
often works best (shown here)
– Quantile can also work well
– By definition, natural breaks will result in
different classifications on different maps,
making comparison difficult
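For map comparison with equal interval, one possible approach is to compute a single set of breaks from the combined range of all maps. The two population lists below are invented, and pooling the minimum and maximum across datasets is an assumption of this sketch.

```python
def shared_equal_interval_breaks(datasets, k):
    """Return k + 1 equal-interval breaks spanning the range of all datasets."""
    if k < 1:
        raise ValueError("k must be at least 1")
    lo = min(min(d) for d in datasets)
    hi = max(max(d) for d in datasets)
    width = (hi - lo) / k
    return [lo + i * width for i in range(k + 1)]


# Hypothetical populations (in thousands) for the same units in two years.
population_earlier = [10, 40, 70]
population_later = [20, 60, 110]

shared_breaks = shared_equal_interval_breaks(
    [population_earlier, population_later], 5
)
print(shared_breaks)
```

Both maps are then classified with the same break values, so a given tint means the same thing on each.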