Function writing, more advanced plots in R
Stat 480

Outline
• Some more function writing: conditional expressions
• More advanced plots in R

Basic Structure
• Name
• Input arguments
  • names
  • default values
• Body
• Output values

A first function
  # note: defining mean() masks base::mean() until rm(mean)
  mean <- function(x) {
    return(sum(x)/length(x))
  }

  mean(1:15)          # 8
  mean(c(1:15, NA))   # NA: the missing value propagates through sum()

  mean <- function(x, na.rm = FALSE) {
    if (na.rm) x <- na.omit(x)   # drop missing values only when asked to
    return(sum(x)/length(x))
  }

  mean(1:15)                      # 8
  mean(c(1:15, NA), na.rm = TRUE) # 8

Function mean
• Name: mean
• Input arguments: x, na.rm
  • names: x, na.rm
  • default values: na.rm = FALSE
• Body: if (na.rm) x <- na.omit(x)
• Output values: return(sum(x)/length(x))

Function Writing
• Start simple, then extend
• Test each step of the way
• Don't try too much at once
• help(browser): step through a function while it runs

Practice
• For the Batting data in the Lahman package:
  • Write a function called mba
    input: playerID
    output: lifetime batting average for playerID
  • What does mba("bondsba01") do?
  • Write a function called pstats
    input: playerID
    output: lifetime batting average for playerID and total number of at bats

Checkpoint
• Submit all of your code for the last practice exercise at http://heike.wufoo.com/forms/check-point/

Let's explore Google's flu trends
• https://www.google.org/flutrends/us/
• We can read this in R:
  library(RCurl)
  x <- getURL("https://www.google.org/flutrends/us/")
  flu <- read.csv(text = x)

Practice
• For the flu trend data:
  • Melt the data (library(reshape2)) so that the flu trends of all states end up in a single variable. Keep the Date variable.
  • Plot the flu trends since the beginning of this year, faceted by state.

Practice
• For the flu trend data:
  • Write a function fitflu that fits a linear model (lm) of the number of flu cases on Date and returns the slope of this fit.
  • Try the function out on the last five observations for Iowa.
  • Use dplyr routines to get the flu trends for all states.

What is a map?
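At its simplest, map data is a table with one row per boundary point: a longitude, a latitude and an id for the shape the point belongs to. The sketch below assumes ggplot2 and uses two made-up unit squares (hypothetical coordinates, not real boundaries) to contrast plain points, one polygon drawn through every point, and one polygon per `group`.

```r
library(ggplot2)

# Two made-up "regions", each a unit square listed corner by corner.
# Hypothetical coordinates, chosen only to show the structure of map data:
# one row per boundary point with long, lat and a group id.
toy <- data.frame(
  long   = c(0, 1, 1, 0,  2, 3, 3, 2),
  lat    = c(0, 0, 1, 1,  0, 0, 1, 1),
  group  = rep(1:2, each = 4),
  region = rep(c("west", "east"), each = 4)
)

# Points only: the raw boundary locations.
p_points <- ggplot(toy, aes(long, lat)) + geom_point()

# One polygon through all eight points in row order: without a group,
# ggplot2 also draws edges that join the two squares.
p_one <- ggplot(toy, aes(long, lat)) + geom_polygon()

# One polygon per group: each square is closed on its own.
p_grouped <- ggplot(toy, aes(long, lat, group = group, fill = region)) +
  geom_polygon()
```

Printing `p_one` and then `p_grouped` shows the difference. In real outlines such as `map_data("state")`, the `group` column plays the same role.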
(Figure: a set of boundary points plotted as lat against long)
• Set of points specifying latitude and longitude
(Figure: the same points joined into a polygon)
• Polygon: connect the dots in the correct order

What is a map?
(Figure: several neighbouring polygons, lat against long)
• Polygon: connect only the correct dots

Grouping
• Use the parameter group to connect the "right" dots (sometimes the grouping variable has to be created first)
  library(ggplot2)
  states <- map_data("state")   # state outlines; requires the maps package

  qplot(long, lat, geom = "point", data = states)
  qplot(long, lat, geom = "path", data = states, group = group)
  qplot(long, lat, geom = "polygon", data = states, group = group, fill = region)
  qplot(long, lat, geom = "polygon", data = states, fill = lat, group = group)

Practice
• Using the maps package, pull out map data for all US counties:
  counties <- map_data("county")
• Draw a map of the counties (polygon and path geoms)
• Colour all counties called "story" (county names are in the subregion column)
• Advanced: which county names are used most often?

Merging Data
• Merging data from different datasets:
  merge(x, y, by = intersect(names(x), names(y)),
        by.x = by, by.y = by, all = FALSE, all.x = all, all.y = all,
        sort = TRUE, suffixes = c(".x", ".y"), incomparables = NULL, ...)
• e.g.:
  flumap <- merge(states, flu.states, by = "region")

Merging Data
• Merging data from different datasets:
(Figure: a table with columns region, X1 and a table with columns region, X2, X3 are matched on region, e.g. alabama, to give one table with columns region, X1, X2, X3)

Flu trends across the US
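The closing slide maps flu trends across the US. A sketch of one way to get there, assuming ggplot2 with the maps package installed and a melted flu data frame `flu.long` with columns `Date`, `region` and `cases` (hypothetical names; `region` must hold lower-case state names as in `map_data("state")`, so the raw column names would need converting first). `fitflu` follows the practice description and returns the slope of `lm(cases ~ Date)`. The flu numbers are made up for illustration.

```r
library(ggplot2)  # map_data() also needs the maps package installed

# fitflu: slope of a straight-line fit of flu cases on Date.
# A Date counts days, so the slope is in cases per day.
fitflu <- function(df) {
  fit <- lm(cases ~ Date, data = df)
  coef(fit)[[2]]
}

# Made-up long-format flu data for two states (hypothetical values for
# illustration only; substitute the melted flu trend data).
dates <- as.Date("2014-01-05") + 7 * (0:4)
flu.long <- data.frame(
  Date   = rep(dates, 2),
  region = rep(c("iowa", "illinois"), each = 5),
  cases  = c(100, 120, 140, 160, 180,   90, 85, 80, 75, 70)
)

# One slope per state, using base R split() and sapply().
slopes <- sapply(split(flu.long, flu.long$region), fitflu)
flu.states <- data.frame(region = names(slopes), slope = unname(slopes))

# State outlines: one row per boundary point (long, lat, group, order, region).
states <- map_data("state")

# Match map rows to flu slopes by region; states without a slope are
# dropped because all = FALSE by default.
flumap <- merge(states, flu.states, by = "region")

# merge() sorts by region and does not keep the original row order, so
# restore the drawing order before connecting the dots.
flumap <- flumap[order(flumap$order), ]

p <- ggplot(flumap, aes(long, lat, group = group, fill = slope)) +
  geom_polygon() +
  coord_quickmap()
p
```

The slides suggest dplyr for the per-state step; an equivalent sketch is `flu.long %>% group_by(region) %>% summarise(slope = coef(lm(cases ~ Date))[[2]])`. Re-sorting by `order` after `merge()` is the step most easily forgotten: without it the polygons are drawn with their points out of sequence.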