Maps and Time Series Stat 579
 Heike Hofmann

Outline
• Quick review: melting
• Casting - the opposite of melting
• Maps: polygons, choropleth
Warm-up
• Start R and load data ‘fbi’ from http://www.hofroe.net/stat579/crimes-2012.csv
• This data set contains number of crimes by type
for each state in the U.S.
• Investigate which states have the highest number
of crimes (almost independently of type)
• Pick one state and crime type and plot a time
series
Reshaping Data
• Two step process:
• melt: get data into a “convenient” shape, i.e. one that is particularly flexible
• cast: cast data into new shape(s) that are better suited for analysis
melt.data.frame(data, id.vars, measure.vars,
na.rm = F, ...)
• id.vars: all identifiers (keys) and qualitative variables
• measure.vars: all quantitative variables
[Figure: original data with key X1 and columns X2 to X5, split into id.vars and measure.vars, next to its molten form (“long & skinny”)]
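A minimal sketch of melting, assuming the reshape2 package provides melt. The data frame wide and its column names are made up for illustration and are not the fbi data.

```r
library(reshape2)

# Made-up wide data: one row per state and year, one column per crime type.
wide <- data.frame(
  state    = c("Iowa", "Iowa", "Ohio", "Ohio"),
  year     = c(2008, 2009, 2008, 2009),
  burglary = c(10, 12, 30, 28),
  robbery  = c(3, 4, 9, 11)
)

# id.vars identify an observation; measure.vars are stacked into
# a pair of columns named variable and value.
molten <- melt(wide,
               id.vars = c("state", "year"),
               measure.vars = c("burglary", "robbery"))
print(molten)
```

Each original row contributes one molten row per measure variable, so the result is longer and narrower than the input.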
Casting
• Function cast
dcast(dataset, rows ~ columns, aggregate)
[Figure: molten data arranged into rows and columns, each cell filled with aggregate(data)]
Data aggregation sometimes is just a transformation
Then, cast
• Row variables, column variables, and a summary function (sum, mean, max, etc.)
• dcast(molten, row ~ col, summary)
• dcast(molten, row ~ . , summary)
• dcast(molten, row1 + row2 ~ col, summary)
• dcast(molten, . ~ col, summary)
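The four casting forms can be sketched on a small molten data set, assuming dcast comes from the reshape2 package. The data values and the object names (by_state_type and so on) are hypothetical.

```r
library(reshape2)

# Made-up molten data: two states, two years, two crime types.
molten <- data.frame(
  state    = rep(c("Iowa", "Ohio"), each = 4),
  year     = rep(c(2008, 2008, 2009, 2009), times = 2),
  variable = rep(c("burglary", "robbery"), times = 4),
  value    = c(10, 3, 12, 4, 30, 9, 28, 11)
)

# row ~ col: one row per state, one column per crime type.
by_state_type <- dcast(molten, state ~ variable, sum, value.var = "value")

# row ~ . : one row per state, a single summary column.
by_state <- dcast(molten, state ~ ., sum, value.var = "value")

# row1 + row2 ~ col: one row per state and year combination.
by_state_year <- dcast(molten, state + year ~ variable, sum, value.var = "value")

# . ~ col: a single row, one column per crime type.
by_type <- dcast(molten, . ~ variable, sum, value.var = "value")

print(by_state_type)
```

Passing value.var explicitly avoids relying on dcast guessing which column holds the values.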
Casting
• Using dcast:
• find the number of all offenses in 2009
• find the number of offenses by type of crime
• find the number of all offenses by state
What is a map?
Set of points specifying latitude and longitude
[Figure: scatterplot of points, long on the x-axis, lat on the y-axis]
Polygon: connect dots in correct order
[Figure: the same points connected into a polygon, long vs. lat]
What is a map?
Polygon: connect only the correct dots
[Figure: polygons drawn on long vs. lat axes]
Grouping
• Use parameter group to connect the
“right” dots (need to create grouping
sometimes)
qplot(long, lat, geom="point", data=states)
[Figure: two plots of the states data, long on the x-axis, lat on the y-axis]
qplot(long, lat, geom="path", data=states, group=group)
qplot(long, lat, geom="polygon", data=states, group=group, fill=region)
[Figure: two polygon maps of the states, long on the x-axis, lat on the y-axis, one with a colour legend for lat]
qplot(long, lat, geom="polygon", data=states.map, fill=lat, group=group)
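A minimal sketch of the grouping idea using ggplot() instead of qplot(), which is deprecated in recent ggplot2 releases. It assumes the ggplot2 and maps packages are installed; the styling choices (fill and outline colours) are arbitrary.

```r
library(ggplot2)  # map_data() also needs the maps package to be installed

# One row per polygon vertex; group identifies the polygon a vertex
# belongs to, order gives the sequence in which vertices are connected.
states <- map_data("state")

# Without group = group, vertices of different polygons would be joined
# to each other and the map would be criss-crossed by stray lines.
p <- ggplot(states, aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = "grey85", colour = "white") +
  coord_quickmap()
print(p)
```

Some states consist of several polygons (for example islands), which is why group is finer than region.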
Practice
• Using the maps package, pull out map data for all
US counties
counties <- map_data("county")
• Draw a map of counties (polygons & path geom)
• Colour all counties called “story”
• Advanced: What county names are used often?
Merging Data
• Merging data from different datasets:
merge(x, y, by = intersect(names(x), names(y)),
by.x = by, by.y = by, all = FALSE, all.x = all, all.y = all,
sort = TRUE, suffixes = c(".x",".y"), incomparables = NULL, ...)
e.g.:
states.fbi <- merge(states, fbi.cast,
by.x="", by.y="Abbr")
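A minimal sketch of merging map data with a summary table, using base R only. The data frames map_df and crime_df and all their column names are hypothetical stand-ins, not the actual states or fbi.cast objects.

```r
# Made-up map data: three vertices per state, with a drawing order.
map_df <- data.frame(
  region = c("iowa", "iowa", "iowa", "ohio", "ohio", "ohio"),
  order  = 1:6,
  long   = c(-96, -91, -93, -84, -81, -82),
  lat    = c(43, 43, 40, 41, 41, 39)
)

# Made-up summary table: one row per state, key stored under another name.
crime_df <- data.frame(
  State = c("ohio", "iowa"),
  total = c(78, 29)
)

# by.x and by.y name the key column in each data set; the key values
# must match exactly (same spelling, same case).
# all.x = TRUE keeps map rows that have no match in crime_df.
merged <- merge(map_df, crime_df, by.x = "region", by.y = "State", all.x = TRUE)

# merge() may change the row order, so restore the drawing order
# before plotting polygons.
merged <- merged[order(merged$order), ]
print(merged)
```

Every vertex row of a state receives that state's total, which is what a fill aesthetic needs later on.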
Merging Data
• Merging data from different datasets:
[Figure: several data sets, each with a region column (e.g. alabama) and one of the variables X1, X2, X3, combined by region into one table with columns region, X1, X2, X3]
Practice
• Merge the fbi crime data and the map of the
States
• Plot choropleth maps of crimes.
• Describe the patterns that you see.
• Advanced: try to cluster the states according to
crime rates (use hclust)
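A minimal sketch of the clustering step with base R. The rates below are invented for four hypothetical states A to D; average linkage and the cut into two clusters are arbitrary choices for illustration.

```r
# Made-up crime rates: rows are states, columns are crime types.
rates <- data.frame(
  burglary = c(5.0, 5.2, 9.8, 10.1),
  robbery  = c(1.0, 1.1, 3.9, 4.2),
  row.names = c("A", "B", "C", "D")
)

# Scale the columns so that no crime type dominates the distances,
# then cluster on Euclidean distances between states.
hc <- hclust(dist(scale(rates)), method = "average")

# Cut the tree into two clusters and inspect the membership.
clusters <- cutree(hc, k = 2)
print(clusters)
```

Calling plot(hc) draws the dendrogram, which can help in choosing the number of clusters.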