Function writing, more advanced plots in R stat 480

Outline
• some more function writing: conditional expressions
• more advanced plots in R
Basic Structure
• Name
• Input arguments
  • names
  • default values
• Body
• Output values
A first function
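# Version 1: no handling of missing values
# (note: this definition masks base R's mean() in the workspace)
mean <- function(x) {
  return(sum(x)/length(x))
}

mean(1:15)          # 8
mean(c(1:15, NA))   # NA, because the sum of a vector containing NA is NA

# Version 2: optionally drop missing values first
mean <- function(x, na.rm=FALSE) {
  if (na.rm) x <- na.omit(x)
  return(sum(x)/length(x))
}

mean(1:15)
mean(c(1:15, NA), na.rm=TRUE)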
Function mean
• Name: mean
• Input arguments: x, na.rm=FALSE
  • names: x, na.rm
  • default values: na.rm defaults to FALSE; x has no default
• Body: if (na.rm) x <- na.omit(x)
• Output values: return(sum(x)/length(x))
Function Writing
• Start simple, then extend
• Test out each step of the way
• Don’t try too much at once
• help(browser)
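The advice above can be sketched as follows. This is only an illustration: the function name my_mean is made up here (not from the slides), and the checks use stopifnot() so that a failing step stops immediately.

```r
# Step 1: simplest version, checked right away
my_mean <- function(x) {
  sum(x) / length(x)
}
stopifnot(my_mean(1:15) == 8)

# Step 2: extend with ONE new feature (dropping NAs), then re-check everything
my_mean <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  # To pause here and inspect x interactively, temporarily insert browser()
  sum(x) / length(x)
}
stopifnot(my_mean(1:15) == 8)                        # old behaviour still works
stopifnot(is.na(my_mean(c(1:15, NA))))               # NA propagates by default
stopifnot(my_mean(c(1:15, NA), na.rm = TRUE) == 8)   # new feature works
```

Re-running the earlier checks after each extension is what catches a change that breaks something that used to work.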
Practice
• For the Batting data in the Lahman package:
• Write a function called mba
  input: playerID
  output: life-time batting average for playerID
• What does mba("bondsba01") do?
• write a function called pstats
input: playerID
output: life-time batting average for playerID & number of overall at bats
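One possible shape for these functions, sketched on a small made-up data frame (toy_batting, invented here) that mimics the playerID, AB and H columns of the Batting table. It assumes that "life-time batting average" means total hits divided by total at bats; the data argument is added for illustration and is not part of the exercise.

```r
# Made-up stand-in for Lahman::Batting (one row per player and season)
toy_batting <- data.frame(
  playerID = c("a01", "a01", "b02"),
  AB = c(100, 300, 50),
  H  = c(30, 90, 10)
)

# Life-time batting average: total hits / total at bats (assumed definition)
mba <- function(id, data = toy_batting) {
  player <- data[data$playerID == id, ]
  sum(player$H, na.rm = TRUE) / sum(player$AB, na.rm = TRUE)
}

# Same idea, also returning the overall number of at bats
pstats <- function(id, data = toy_batting) {
  player <- data[data$playerID == id, ]
  ab <- sum(player$AB, na.rm = TRUE)
  data.frame(playerID = id, ba = sum(player$H, na.rm = TRUE) / ab, ab = ab)
}

mba("a01")      # 120 hits / 400 at bats
pstats("b02")
```

With the Lahman package loaded, the same functions could presumably be called as mba("bondsba01", data = Batting).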
Checkpoint
• Submit all of your code for the last Your Turn at
http://heike.wufoo.com/forms/check-point/
Let’s explore Google’s flu trends
• https://www.google.org/flutrends/us/
• We can read this in R:
library(RCurl)
x <- getURL("https://www.google.org/flutrends/us/")
read.csv(text=x)
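The text argument of read.csv() parses a character string as if it were the contents of a file. A minimal sketch with a made-up string standing in for the downloaded text (the column names and values are invented for illustration):

```r
# A small CSV held in a string, standing in for the downloaded text (made up)
x <- "Date,Iowa,Ohio\n2014-01-05,100,200\n2014-01-12,120,180\n"

# Parse the string as if it were a file, and keep the result
flu <- read.csv(text = x)
str(flu)
```

If the downloaded text starts with explanatory lines before the header row, the skip argument of read.csv() may be needed to jump over them.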
Practice
• For the flu trend data:
• melt the data (library(reshape2)) for all state flu
trends into a single variable. Keep the date
variable.
• Plot the flu trends since the beginning of this
year, facet by state.
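As a hint for the melting step, here is a sketch on made-up wide data (flu_wide and its values are invented; the real data has many more columns). The names state and cases for the new columns are a choice made here, not something the exercise prescribes.

```r
library(reshape2)

# Made-up wide data: one column per state, one row per date
flu_wide <- data.frame(
  Date = as.Date(c("2014-01-05", "2014-01-12", "2014-01-19")),
  Iowa = c(100, 120, 90),
  Ohio = c(200, 180, 210)
)

# Keep Date as the identifier; stack all state columns into one value column
flu_long <- melt(flu_wide, id.vars = "Date",
                 variable.name = "state", value.name = "cases")
flu_long
```

In this long form each row is one date and state combination, which is the shape that facetting by state expects.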
Practice
• For the flu trend data:
• write a function fitflu that fits a linear model (lm)
of the number of flu cases in Date and returns
the slope of this fit.
• try the function out on the last five observations
for Iowa
• use dplyr routines to get the flu trends for all
states
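A possible outline for fitflu, sketched on made-up data. The column names Date, cases and state are assumptions, and Date is converted with as.numeric() so that the slope reads as change in cases per day. The per-state step is shown with base R's split() and sapply() as a stand-in for the dplyr routines the exercise asks for.

```r
# Made-up weekly observations for one state
iowa <- data.frame(
  Date  = as.Date("2014-01-05") + 7 * (0:4),
  cases = c(100, 110, 120, 130, 140)
)

# Fit cases against Date and return the slope (change in cases per day)
fitflu <- function(data) {
  fit <- lm(cases ~ as.numeric(Date), data = data)
  unname(coef(fit)[2])
}

# Try it on the last five observations
fitflu(tail(iowa, 5))

# Made-up second state with the trend reversed, then one slope per state
ohio <- transform(iowa, cases = rev(cases))
both <- rbind(data.frame(state = "Iowa", iowa),
              data.frame(state = "Ohio", ohio))
slopes <- sapply(split(both, both$state), fitflu)
slopes
```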
What is a map?
Set of points specifying latitude and longitude
[Figure: points plotted with long on the x-axis and lat on the y-axis]
Polygon: connect dots in correct order
[Figure: the same points connected into an outline, long vs. lat]
Polygon: connect only the correct dots
[Figure: outline map, long vs. lat]
Grouping
• Use parameter group to connect the
“right” dots (need to create grouping
sometimes)
qplot(long, lat, geom="point", data=states)
[Figure: two map panels with long on the x-axis and lat on the y-axis]
qplot(long, lat, geom="path", data=states, group=group)
qplot(long, lat, geom="polygon", data=states, group=group, fill=region)
[Figure: two map panels with long on the x-axis and lat on the y-axis]
qplot(long, lat, geom="polygon", data=states.map, fill=lat, group=group)
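To see what group does without loading real map data, here is a sketch on two made-up squares (the data frame squares is invented for illustration). It is written with ggplot() rather than qplot(), which recent ggplot2 releases mark as deprecated; the aesthetics are the same as in the calls above.

```r
library(ggplot2)

# Made-up outlines: two separate unit squares, identified by `group`
squares <- data.frame(
  long  = c(0, 1, 1, 0,  2, 3, 3, 2),
  lat   = c(0, 0, 1, 1,  0, 0, 1, 1),
  group = rep(c("a", "b"), each = 4)
)

# Without group, all eight points are joined into one shape
p_wrong <- ggplot(squares, aes(long, lat)) + geom_polygon()

# With group, each square is drawn as its own polygon
p_right <- ggplot(squares, aes(long, lat, group = group)) + geom_polygon()
```

Printing p_wrong and p_right side by side shows the stray connection between the two squares disappearing once group is supplied.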
Practice
• Using the maps package, pull out map data for all US counties
counties <- map_data("county")
• Draw a map of counties (polygons & path geom)
• Colour all counties called “story”
• Advanced: What county names are used often?
Merging Data
• Merging data from different datasets:
merge(x, y, by = intersect(names(x), names(y)),
      by.x = by, by.y = by, all = FALSE, all.x = all, all.y = all,
      sort = TRUE, suffixes = c(".x",".y"), incomparables = NULL, ...)
e.g.:
flumap <- merge(states, flu.states, by="region")
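A small sketch of what this merge does, using made-up stand-ins for states and flu.states (all coordinates and the cases column are invented for illustration):

```r
# Made-up map outline points: several rows per region
states <- data.frame(
  region = c("alabama", "alabama", "alabama", "iowa", "iowa"),
  long   = c(-87.5, -85.0, -86.0, -96.0, -91.0),
  lat    = c(31.0, 31.0, 35.0, 43.0, 41.0)
)

# Made-up per-state values: one row per region
flu.states <- data.frame(region = c("alabama", "iowa"),
                         cases  = c(500, 300))

# Rows are matched on region; each outline row receives its state's value
flumap <- merge(states, flu.states, by = "region")
flumap
```

Because merge() sorts its result by default (sort = TRUE), map data may need to be put back into drawing order before plotting, for example by re-sorting on the order column that map_data() provides.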
[Diagram: a table with columns region and X1 (one alabama row) and a table with columns region, X2 and X3 (three alabama rows) are merged on region into a single table with columns region, X1, X2 and X3 (three alabama rows)]
Flu trends across the US