Maps and Time Series
Stat 579
Heike Hofmann

Outline
• Quick review: melting
• Casting - the opposite of melting
• Maps: polygons, choropleth

Warm-up
• Start R and load data ‘fbi’ from http://www.hofroe.net/stat579/crimes-2012.csv
• This data set contains the number of crimes by type for each state in the U.S.
• Investigate which states have the highest number of crimes (almost independently of type)
• Pick one state and crime type and plot a time series

Reshaping Data
• Two-step process:
  • melt: get data into a “convenient” shape, i.e. one that is particularly flexible
  • cast: cast data into new shape(s) that are better suited for analysis

    melt.data.frame(data, id.vars, measure.vars, na.rm = F, ...)

• id.vars: all identifiers (keys) and qualitative variables
• measure.vars: all quantitative variables
[Diagram: original data (columns key, X1, X2, X3, X4, X5), divided into id.vars and measure.vars, melted into the molten form: “long & skinny”]

Casting
• Function cast:

    dcast(dataset, rows ~ columns, aggregate)

[Diagram: result table with rows, columns, and cells filled by aggregate(data)]
• Data aggregation sometimes is just a transformation

Then, cast
• Row variables, column variables, and a summary function (sum, mean, max, etc.)
• dcast(molten, row ~ col, summary)
• dcast(molten, row1 + row2 ~ col, summary)
• dcast(molten, row ~ . , summary)
• dcast(molten, . ~ col, summary)

Casting
• Using dcast:
  • find the number of all offenses in 2009
  • find the number of offenses by type of crime
  • find the number of all offenses by state

What is a map?
• Set of points specifying latitude and longitude
[Figure: points plotted as lat (40.5 to 43.5) against long (-96 to -91)]
• Polygon: connect dots in correct order
[Figure: the same points connected into a polygon, lat against long]
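The role of point order can be tried out on a very small example. The sketch below is only an illustration: the four corner coordinates are made up (they are not taken from the maps package), and it is written with ggplot() and geom_polygon() rather than qplot(), which recent ggplot2 releases deprecate.

```r
library(ggplot2)

# Hypothetical outline: four corners of a rectangle, listed in drawing order.
# The coordinates are made up for illustration.
outline <- data.frame(
  long = c(-96, -91, -91, -96),
  lat  = c(40.5, 40.5, 43.5, 43.5)
)

# A map as a set of points
p_points <- ggplot(outline, aes(long, lat)) + geom_point()

# Connecting the points in the stored row order gives the rectangle
p_polygon <- ggplot(outline, aes(long, lat)) + geom_polygon()

# Same points, rows swapped: the polygon now crosses itself
scrambled <- outline[c(1, 3, 2, 4), ]
p_scrambled <- ggplot(scrambled, aes(long, lat)) + geom_polygon()
```

Printing p_polygon and p_scrambled side by side shows why the rows of a map data set have to stay in drawing order.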
• Polygon: connect only the correct dots
[Figure: outline drawn as lat (30 to 40) against long (-95 to -85)]

Grouping
• Use parameter group to connect the “right” dots (need to create grouping sometimes)

    qplot(long, lat, geom="point", data=states)
    qplot(long, lat, geom="path", data=states, group=group)

[Figures: US states as points and as paths, lat (30 to 45) against long (-120 to -70)]

    qplot(long, lat, geom="polygon", data=states, group=group, fill=region)
    qplot(long, lat, geom="polygon", data=states.map, fill=lat, group=group)

[Figures: US states as filled polygons, lat (30 to 45) against long (-120 to -70)]

Practice
• Using the maps package, pull out map data for all US counties

    counties <- map_data("county")

• Draw a map of counties (polygons & path geom)
• Colour all counties called “story”
• Advanced: What county names are used often?

Merging Data
• Merging data from different datasets:

    merge(x, y, by = intersect(names(x), names(y)),
          by.x = by, by.y = by, all = FALSE, all.x = all, all.y = all,
          sort = TRUE, suffixes = c(".x",".y"), incomparables = NULL, ...)

  e.g.:

    states.fbi <- merge(states, fbi.cast, by.x="", by.y="Abbr")

Merging Data
• Merging data from different datasets:
[Diagram: two tables that share the key column region (e.g. alabama) are combined into one table with columns region, X1, X2, X3]

Practice
• Merge the fbi crime data and the map of the States
• Plot choropleth maps of crimes.
• Describe the patterns that you see.
• Advanced: try to cluster the states according to crime rates (use hclust)
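The melt, cast, merge and plot steps can be chained as in the following sketch. Everything in it is an assumption made for illustration: the data frame fbi.toy, its column names (State, Year, Burglary, Robbery), the counts, and the two-polygon "map" called shapes are made up and do not reflect the layout of the real CSV file or of map_data("state"). It assumes the reshape2 and ggplot2 packages are installed.

```r
library(reshape2)  # melt(), dcast()
library(ggplot2)

# Hypothetical stand-in for the fbi data (made-up counts and column names)
fbi.toy <- data.frame(
  State    = c("square", "square", "triangle", "triangle"),
  Year     = c(2008, 2009, 2008, 2009),
  Burglary = c(10, 12, 20, 18),
  Robbery  = c(1, 2, 3, 4)
)

# Step 1: melt into long form, one row per State/Year/crime type
molten <- melt(fbi.toy, id.vars = c("State", "Year"),
               measure.vars = c("Burglary", "Robbery"))

# Step 2: cast into the shapes needed for analysis
by.state.type <- dcast(molten, State ~ variable, sum)  # state by crime type
fbi.cast <- dcast(molten, State ~ ., sum)              # all offenses by state
names(fbi.cast) <- c("State", "Total")

# Hypothetical "map": two polygons with their vertices in drawing order
shapes <- data.frame(
  region = rep(c("square", "triangle"), times = c(4, 3)),
  group  = rep(c(1, 2), times = c(4, 3)),
  order  = c(1:4, 1:3),
  long   = c(0, 1, 1, 0, 2, 3, 2.5),
  lat    = c(0, 0, 1, 1, 0, 0, 1)
)

# Step 3: merge on the shared key. merge() may reorder rows,
# so restore the drawing order before plotting.
shapes.fbi <- merge(shapes, fbi.cast, by.x = "region", by.y = "State")
shapes.fbi <- shapes.fbi[order(shapes.fbi$group, shapes.fbi$order), ]

# Step 4: choropleth, one polygon per group, filled by the merged value
p <- ggplot(shapes.fbi, aes(long, lat, group = group, fill = Total)) +
  geom_polygon()
```

With real map data the same re-sorting step applies, since the data frame returned by map_data() carries an order column alongside long, lat, group and region.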