Introduction to R - Environmental Statistics Group

Introduction to R Workshop
June 23-25, 2010
Southwest Fisheries Science Center
3333 North Torrey Pines Court
La Jolla, CA 92037
Eric Archer
eric.archer@noaa.gov
858-546-7121
Introduction to R
1) How R thinks
• Environment
• Data Structures
• Data Input/Output
2) Becoming a codeR
• Data Selection and Manipulation
• Data Summary
• Functions
3) Visualization and analysis
• Data Processing (‘apply’ family)
• Plotting & Graphics
• Statistical Distributions
• Statistical Tests
• Model Fitting
• Packages, Path, Options
S, S-Plus, R
S
Chambers, Becker, Wilks
1984: Bell Labs
S-Plus
1988: Statistical Sciences
1993: MathSoft
2001: Insightful
2008: TIBCO
R
Ihaka & Gentleman
1996
(The R Project)
“Programming ought to be regarded as an
integral part of effective and responsible data
analysis”
- Venables and Ripley. 1999. S Programming
Why R?
• Free
• Open source
• Many packages
• Large support base
• Multi-platform
• Vectorization
Workspace
Entering commands
• commands and assignments executed or evaluated immediately
• separated by new line (Enter/Return) or semicolon
• recall commands with ↑ or ↓
• case sensitive
• everything is some sort of function that does something
Getting help
> help(mean)
> ?median
> help("[")
> example(mean)
> help.search("regression")
> RSiteSearch("genetics")
R Project website: http://www.r-project.org/
Workspace
ls()               list objects in workspace
rm(…)              remove objects from workspace
rm(list = ls())    remove all objects from workspace
save.image()       save workspace (default file is ".RData")
load(".RData")     load saved workspace
history()          view command history
loadhistory()      load command history
savehistory()      save command history
#                  comment
Assignment and data creation
<-                 assign
c(…)               combine arguments into a vector
seq(x)             generate sequence from 1 to x
seq(from,to,by)    generate sequence with increment by
from:to            generate sequence from .. to
rep(x,times)       replicate x
letters,LETTERS    vectors of the 26 lower and upper case letters
> x <- 1
> y <- "A"
> my.vec <- c(1, 5, 6, 10)
> my.nums <- 12:24
> x
[1] 1
> y
[1] "A"
> my.vec
[1] 1 5 6 10
> my.nums
[1] 12 13 14 15 16 17 18 19 20 21 22 23 24
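The sequence and replication helpers listed on this slide can be tried directly. This is a minimal sketch; the variable names (s1, s2, r1, first.three) are illustrative only and do not come from the workshop files.

```r
# Sequences and replication (all object names here are illustrative)
s1 <- seq(5)                            # 1 2 3 4 5
s2 <- seq(from = 2, to = 10, by = 2)    # 2 4 6 8 10
s3 <- 3:7                               # 3 4 5 6 7
r1 <- rep(c("a", "b"), times = 2)       # "a" "b" "a" "b"
first.three <- letters[1:3]             # "a" "b" "c"
```

Note that seq(x) with a single positive number behaves like 1:x, while seq(from, to, by) returns doubles rather than integers.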
Data Structures
Object modes (atomic structures)
integer      whole numbers (15, 23, 8, 42, 4, 16)
numeric      real numbers (double precision: 3.14, 0.0002, 6.022E23)
character    text string ("Hello World", "ROFLMAO", "A")
logical      TRUE/FALSE or T/F
Object classes
vector       object with atomic mode
factor       vector object with discrete groups (ordered/unordered)
array        multiple dimensions
matrix       2-dimensional array
list         vector of components
data.frame   "matrix-like" list of variables with the same number of rows
Special Values
NULL         object of zero length, test with is.null(x)
NA           Not Available / missing value, test with is.na(x)
NaN          Not a number, test with is.nan(x) (e.g. 0/0, log(-1))
Inf, -Inf    Positive/negative infinity, test with is.infinite(x) (e.g. 1/0)
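The tests for special values behave slightly differently from one another, which is easy to miss. The sketch below uses a made-up vector v to show which test catches which value; it is an illustration, not part of the workshop material.

```r
# A vector holding one ordinary value and each special numeric value
v <- c(1, NA, 0/0, 1/0, -1/0)

na.flag  <- is.na(v)        # TRUE for NA and also for NaN
nan.flag <- is.nan(v)       # TRUE only for NaN
inf.flag <- is.infinite(v)  # TRUE for Inf and -Inf
fin.flag <- is.finite(v)    # TRUE only for ordinary numbers

null.flag <- is.null(NULL)  # TRUE
null.len  <- length(NULL)   # 0

# is.finite() is a convenient filter before computing a summary
clean.mean <- mean(v[is.finite(v)])   # 1
```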
Vectors
Creation and info
vector(mode,length)   create vector
length(x)             number of elements
names(x)              get or set names
Indexing (number, character (name), or logical)
x[n]                  nth element
x[-n]                 all but the nth element
x[a:b]                elements a to b
x[-(a:b)]             all but elements a to b
x[c(…)]               specific elements
x["name"]             "name" element
x[x > a]              all elements greater than a
x[x %in% c(…)]        all elements in the set
Vectors
Create a vector
> x <- 1:10
Give the elements some names (only five names are supplied for ten elements, so the remaining names are NA)
> names(x) <- c("first","second","third","fourth","fifth")
Select elements based on another vector
> i <- c(1,5)
> x[i]
first fifth
    1     5
> x[-c(i,8)]
second  third fourth   <NA>   <NA>   <NA>   <NA>
     2      3      4      6      7      9     10
Vectors
Logical testing
==         equals
>, <       greater, less than
>=, <=     greater, less than or equal to
!          not
&, &&      and (single is element-by-element, double uses only the first element)
|, ||      or (same distinction between single and double)
Select elements based on a condition
> x <- 1:10
> x < 5
 [1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
> x[x < 5]
[1] 1 2 3 4
& vs &&
> x < 5 & x > 2
 [1] FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
> x < 5 && x > 2
[1] FALSE
Note: in R 4.3.0 and later, && and || signal an error when either side has length greater than one.
Vectorization
Operator recycles smaller object enough times to cover larger object
> x <- 4
> y <- c(5, 6, 7, 8, 9, 10)
> z <- x + y
> z
[1] 9 10 11 12 13 14
> x <- c(3, 5)
> z <- x + y
> z
[1] 8 11 10 13 12 15
> i <- 1:10
> j <- c(T, T, F)
> i[j]
[1]  1  2  4  5  7  8 10
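Recycling also happens when the longer length is not a multiple of the shorter one, but R then issues a warning. A small sketch with made-up vectors a, b, and d (suppressWarnings is used here only so the example runs quietly):

```r
a <- 1:6
b <- c(10, 20)
ab <- a + b     # b is recycled three times: 11 22 13 24 15 26

# Lengths 5 and 2: b is recycled 2.5 times, with a warning about
# the longer object length not being a multiple of the shorter one
d <- suppressWarnings(1:5 + b)   # 11 22 13 24 15
```

Silent recycling is convenient for scalars but can hide length mismatches, so it is worth checking length() when a result looks odd.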
Object Information
summary(x)        generic summary of object
str(x)            display object structure
mode(x)           get or set storage mode
class(x)          name of object class
is.<class>(x)     test type of object (is.numeric, is.logical, etc.)
attr(x, which)    get or set the attribute of an object
attributes(x)     get or set all attributes of an object
Object Information
> y <- 1:10
> str(y)
int [1:10] 1 2 3 4 5 6 7 8 9 10
> mode(y)
[1] "numeric"
> class(y)
[1] "integer"
> is.character(y)
[1] FALSE
> is.integer(y)
[1] TRUE
> is.double(y)
[1] FALSE
> is.numeric(y)
[1] TRUE
Object Information
> x <- 1:4
> names(x) <- c("first","second","third","four")
> x
 first second  third   four
     1      2      3      4
> str(x)
 Named int [1:4] 1 2 3 4
 - attr(*, "names")= chr [1:4] "first" "second" "third" "four"
> attributes(x)
$names
[1] "first" "second" "third" "four"
> attr(x, "notes") <- "This is a really important vector."
> attributes(x)
$names
[1] "first" "second" "third" "four"
$notes
[1] "This is a really important vector."
> attr(x, "date") <- 20090624
> attributes(x)
$names
[1] "first" "second" "third" "four"
$notes
[1] "This is a really important vector."
$date
[1] 20090624
> x
 first second  third   four
     1      2      3      4
attr(,"notes")
[1] "This is a really important vector."
attr(,"date")
[1] 20090624
coercion
as.<class>(x)
coerces object x to <class> if possible
> x <- 1:10
> x.char <- as.character(x)
> as.numeric(x.char)
 [1]  1  2  3  4  5  6  7  8  9 10
> y <- letters[1:10]
> as.numeric(y)
[1] NA NA NA NA NA NA NA NA NA NA
Warning message:
NAs introduced by coercion
> z <- "1char"
> as.numeric(z)
[1] NA
Warning message:
NAs introduced by coercion
> logic.chars <- c("TRUE", "FALSE", "T", "F", "t", "f", "0", "1")
> as.logical(logic.chars)
[1]  TRUE FALSE  TRUE FALSE    NA    NA    NA    NA
> logic.nums <- c(-2, -1, 0, 1.5, 2, 100)
> as.logical(logic.nums)
[1] TRUE TRUE FALSE TRUE TRUE TRUE
Factors
• Discrete ordered or unordered data
• Internally represented numerically
factor(x, levels, labels, exclude, ordered)
levels(x)
labels(x)
is.factor(x),is.ordered(x)
Factors
> x <- c("b", "a", "a", "c", "B", "d", "a", "d")
> x.fac <- factor(x)
> x.fac
[1] b a a c B d a d
Levels: a b B c d
> str(x.fac)
Factor w/ 5 levels "a","b","B","c",..: 2 1 1 4 3 5 1 5
> levels(x.fac)
[1] "a" "b" "B" "c" "d"
> labels(x.fac)
[1] "1" "2" "3" "4" "5" "6" "7" "8"
> as.numeric(x.fac)
[1] 2 1 1 4 3 5 1 5
> as.character(x.fac)
[1] "b" "a" "a" "c" "B" "d" "a" "d"
Factors
> x.fac.lvl <- factor(x, levels = c("a", "c"))
> x.fac.lvl
[1] <NA> a    a    c    <NA> <NA> a    <NA>
Levels: a c
> x.fac.exc <- factor(x, exclude = c("a", "c"))
> x.fac.exc
[1] b    <NA> <NA> <NA> B    d    <NA> d
Levels: b B d
> x.fac.lbl <- factor(x, labels = c("L1", "L2", "L3", "L4", "L5"))
> x.fac.lbl
[1] L2 L1 L1 L4 L3 L5 L1 L5
Levels: L1 L2 L3 L4 L5
> x.fac[2] < x.fac[1]
[1] NA
Warning message:
In Ops.factor(x.fac[2], x.fac[1]) : < not meaningful for factors
> x.ord <- factor(x, ordered = TRUE)
> x.ord
[1] b a a c B d a d
Levels: a < b < B < c < d
> x.ord[2] < x.ord[1]
[1] TRUE
Arrays and Matrices
array(data, dim, dimnames)            create array (filled column by column)
matrix(data, nrow, ncol, dimnames)    create matrix
x[row, col]                           element at row, col
x[row, ]  x[, col]                    vector of a row or of a column
x["name", ], etc.                     vector of row "name"
dim(x)                                retrieve or set dimensions
nrow(x)                               number of rows
ncol(x)                               number of columns
dimnames(x)                           retrieve or set dimension names
rownames(x)                           retrieve or set row names
colnames(x)                           retrieve or set column names
cbind(…)                              create array from columns
rbind(…)                              create array from rows
t(x)                                  transpose (matrices)
Arrays and Matrices
Create an array
> x <- array(1:10, dim = c(4, 6))
> x
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    5    9    3    7    1
[2,]    2    6   10    4    8    2
[3,]    3    7    1    5    9    3
[4,]    4    8    2    6   10    4
> str(x)
int [1:4, 1:6] 1 2 3 4 5 6 7 8 9 10 ...
> attributes(x)
$dim
[1] 4 6
> dim(x)
[1] 4 6
> dimnames(x)
NULL
Arrays and Matrices
Set column or row names
> colnames(x) <- c("col1", "col2", "col3", "col4", "5", "6")
> x
     col1 col2 col3 col4  5  6
[1,]    1    5    9    3  7  1
[2,]    2    6   10    4  8  2
[3,]    3    7    1    5  9  3
[4,]    4    8    2    6 10  4
> colnames(x) <- c("column1", "column2")
Error in dimnames(x) <- dn :
length of 'dimnames' [2] not equal to array extent
> colnames(x)[1] <- "column1"
> x
     column1 col2 col3 col4  5  6
[1,]       1    5    9    3  7  1
[2,]       2    6   10    4  8  2
[3,]       3    7    1    5  9  3
[4,]       4    8    2    6 10  4
Arrays and Matrices
Set row and column names using dimnames
> dimnames(x) <- list(c("first", "second", "third", "4"), NULL)
> x
       [,1] [,2] [,3] [,4] [,5] [,6]
first     1    5    9    3    7    1
second    2    6   10    4    8    2
third     3    7    1    5    9    3
4         4    8    2    6   10    4
Setting dimension names
> dimnames(x) <- list(my.rows = c("first", "second", "third", "4"), my.cols = NULL)
> x
        my.cols
my.rows  [,1] [,2] [,3] [,4] [,5] [,6]
  first     1    5    9    3    7    1
  second    2    6   10    4    8    2
  third     3    7    1    5    9    3
  4         4    8    2    6   10    4
Arrays
Change dimensionality of array
> dim(x) <- c(6, 4)
> x
     [,1] [,2] [,3] [,4]
[1,]    1    7    3    9
[2,]    2    8    4   10
[3,]    3    9    5    1
[4,]    4   10    6    2
[5,]    5    1    7    3
[6,]    6    2    8    4
> dim(x) <- c(3, 4, 2)
> x
, , 1
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8    1
[3,]    3    6    9    2
, , 2
     [,1] [,2] [,3] [,4]
[1,]    3    6    9    2
[2,]    4    7   10    3
[3,]    5    8    1    4
Arrays and Matrices
Bind several vectors into an array
> i1 <- seq(from = 1, to = 20, length = 10)
> i2 <- seq(from = 3.4, to = 25, length = 10)
> i3 <- seq(from = 15, to = 25, length = 10)
> i <- cbind(i1, i2, i3)
> i
             i1   i2       i3
 [1,]  1.000000  3.4 15.00000
 [2,]  3.111111  5.8 16.11111
 [3,]  5.222222  8.2 17.22222
 [4,]  7.333333 10.6 18.33333
 [5,]  9.444444 13.0 19.44444
 [6,] 11.555556 15.4 20.55556
 [7,] 13.666667 17.8 21.66667
 [8,] 15.777778 20.2 22.77778
 [9,] 17.888889 22.6 23.88889
[10,] 20.000000 25.0 25.00000
Arrays and Matrices
> j <- rbind(i1, i2, i3)
> j
   [,1]      [,2]      [,3]      [,4]      [,5]     [,6]     [,7]     [,8]     [,9]
i1  1.0  3.111111  5.222222  7.333333  9.444444 11.55556 13.66667 15.77778 17.88889
i2  3.4  5.800000  8.200000 10.600000 13.000000 15.40000 17.80000 20.20000 22.60000
i3 15.0 16.111111 17.222222 18.333333 19.444444 20.55556 21.66667 22.77778 23.88889
   [,10]
i1    20
i2    25
i3    25
> i <- cbind(col1 = i1, col2 = i2, col3 = i3)
Lists
• Special vector
• Collection of elements of different modes
• Often used as return type of functions
list(…), vector("list", length)    create list
x[i]                               list of element i
x[[i]]                             element i
x["name"]                          list of element name
x[["name"]] or x$name              element name
unlist                             transform list to a vector
Lists
> x <- list(1:10, c("a", "b"), c(TRUE, TRUE, FALSE, TRUE), 5)
> x
[[1]]
[1] 1 2 3 4 5 6 7 8 9 10
[[2]]
[1] "a" "b"
[[3]]
[1]  TRUE  TRUE FALSE  TRUE
[[4]]
[1] 5
> is.list(x)
[1] TRUE
> is.vector(x)
[1] TRUE
> is.numeric(x)
[1] FALSE
Lists
What are the elements in a list?
> x[1]
[[1]]
 [1]  1  2  3  4  5  6  7  8  9 10
> str(x[1])
List of 1
 $ : int [1:10] 1 2 3 4 5 6 7 8 9 10
> mode(x[1])
[1] "list"
> x[[1]]
 [1]  1  2  3  4  5  6  7  8  9 10
> str(x[[1]])
 int [1:10] 1 2 3 4 5 6 7 8 9 10
> mode(x[[1]])
[1] "numeric"
Lists
> y <- list(numbers = c(5, 10:25), initials = c("rnm", "fds"))
> y
$numbers
 [1]  5 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
$initials
[1] "rnm" "fds"
> y$initials
[1] "rnm" "fds"
> y["numbers"]
$numbers
 [1]  5 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
> y$new.element <- "This is new"
> y
$numbers
 [1]  5 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
$initials
[1] "rnm" "fds"
$new.element
[1] "This is new"
Data Frames
• Like matrices, but columns can be of different modes
• A list whose components are the columns, all with the same number of rows
x[["name"]] or x$name     column name
x[row, column], etc.
> age <- c(1:5)
> color <- c("neonate", "two-tone", "speckled", "mottled", "adult")
> juvenile <- c(TRUE, TRUE, FALSE, FALSE, FALSE)
> spotted <- data.frame(age, color, juvenile)
> spotted
  age    color juvenile
1   1  neonate     TRUE
2   2 two-tone     TRUE
3   3 speckled    FALSE
4   4  mottled    FALSE
5   5    adult    FALSE
Data Frames
> is.matrix(spotted)
[1] FALSE
> is.array(spotted)
[1] FALSE
> is.list(spotted)
[1] TRUE
> is.data.frame(spotted)
[1] TRUE
> spotted$age
[1] 1 2 3 4 5
> spotted$age[2]
[1] 2
> spotted$color[2]
[1] two-tone
Levels: adult mottled neonate speckled two-tone
> spotted[spotted$age < 3, ]
  age    color juvenile
1   1  neonate     TRUE
2   2 two-tone     TRUE
Data Frames
Forcing character columns
(In R 4.0.0 and later, stringsAsFactors = FALSE is the default, so character columns are no longer converted to factors automatically. The output below is from an earlier version.)
> str(spotted)
'data.frame':   5 obs. of  3 variables:
 $ age     : int  1 2 3 4 5
 $ color   : Factor w/ 5 levels "adult","mottled",..: 3 5 4 2 1
 $ juvenile: logi  TRUE TRUE FALSE FALSE FALSE
> spotted2 <- data.frame(age.class = age,
+   color.pattern = color, juvenile.stat = juvenile,
+   stringsAsFactors = FALSE)
> spotted2
  age.class color.pattern juvenile.stat
1         1       neonate          TRUE
2         2      two-tone          TRUE
3         3      speckled         FALSE
4         4       mottled         FALSE
5         5         adult         FALSE
> str(spotted2)
'data.frame':   5 obs. of  3 variables:
 $ age.class    : int  1 2 3 4 5
 $ color.pattern: chr  "neonate" "two-tone" "speckled" "mottled" ...
 $ juvenile.stat: logi  TRUE TRUE FALSE FALSE FALSE
Data Frames
Deleting columns
> spotted$age <- NULL
> spotted
     color juvenile
1  neonate     TRUE
2 two-tone     TRUE
3 speckled    FALSE
4  mottled    FALSE
5    adult    FALSE
Creating new columns
> spotted$freq <- c(0.3, 0.2, 0.2, 0.15, 0.15)
> spotted$have.data <- TRUE
> spotted
     color juvenile freq have.data
1  neonate     TRUE 0.30      TRUE
2 two-tone     TRUE 0.20      TRUE
3 speckled    FALSE 0.20      TRUE
4  mottled    FALSE 0.15      TRUE
5    adult    FALSE 0.15      TRUE
Data Frames
subset(x, subset, select)
(These examples use the original spotted data frame, with the age column still present.)
> subset(spotted, age >= 3)
  age    color juvenile
3   3 speckled    FALSE
4   4  mottled    FALSE
5   5    adult    FALSE
> subset(spotted, juvenile == FALSE & age <= 4)
  age    color juvenile
3   3 speckled    FALSE
4   4  mottled    FALSE
> subset(spotted, age <= 2, select = c("color", "juvenile"))
     color juvenile
1  neonate     TRUE
2 two-tone     TRUE
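subset() is a convenience wrapper around the bracket indexing shown on the earlier data frame slides. The sketch below rebuilds the spotted data frame from the slides and shows the two forms side by side; by.subset and by.index are illustrative names.

```r
# Rebuild the example data frame from the slides
spotted <- data.frame(
  age = 1:5,
  color = c("neonate", "two-tone", "speckled", "mottled", "adult"),
  juvenile = c(TRUE, TRUE, FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Same rows and columns selected two ways
by.subset <- subset(spotted, age <= 2, select = c("color", "juvenile"))
by.index  <- spotted[spotted$age <= 2, c("color", "juvenile")]
```

One difference worth knowing: when the condition evaluates to NA for a row, subset() drops that row, while bracket indexing returns a row of NAs.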
Directory management
dir()          list files in directory
setwd(path)    set working directory
getwd()        get working directory
?files         help on file and directory manipulation
Data Input/Output
Standard ASCII Format
read.table     creates a data frame from text file
read.csv       read comma-delimited file
read.delim     read tab-delimited file
read.fwf       read fixed width format
write.table    write data to text file
write.csv      write comma-delimited file
R Binary Format
save           writes binary R objects
save.image     writes current workspace in R binary format
load           reload files written with save
R Text Format
dump           creates text representation of R objects
source         accept input from text file (scripts)
Data Input/Output
Reading ASCII
> sets <- read.csv("Sets_All.csv", header = TRUE)
> sets$Ordered.Year <- ordered(sets$Year)
> sets$SpotCd.Fac <- factor(sets$SpotCd, exclude = NULL)
> spotted.sets <- sets[sets$Sp1Cd == 2, ]
> write.table(spotted.sets, file = "spotted.txt",
+ row.names = FALSE)
Reading R binary
> save(spotted.sets, file = "spotted.RData")
> rm(list = ls())
> load("spotted.RData")
Reading R commands
> positions <- spotted.sets[, c("Latitude", "Longitude")]
> dump("positions", file = "set_positions.R")
> rm(list = ls())
> source("set_positions.R")
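The examples above depend on the workshop's Sets_All.csv file. A self-contained round trip can be sketched with temporary files instead; the obs data frame and its columns below are made up for illustration.

```r
# Hypothetical data frame standing in for the workshop data
obs <- data.frame(id = 1:3,
                  species = c("spotted", "spinner", "common"),
                  length.cm = c(181.5, 172.0, 210.3),
                  stringsAsFactors = FALSE)

# ASCII round trip: write a CSV, then read it back
csv.file <- tempfile(fileext = ".csv")
write.csv(obs, file = csv.file, row.names = FALSE)
obs2 <- read.csv(csv.file, stringsAsFactors = FALSE)

# Binary round trip: save, remove, and reload the object
rdata.file <- tempfile(fileext = ".RData")
save(obs, file = rdata.file)
rm(obs)
load(rdata.file)   # restores 'obs' under its original name
```

row.names = FALSE keeps write.csv from adding a column of row names, which would otherwise come back as an extra column when the file is read.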
Writing Scripts
• Text files containing commands and comments written as if executed
on command line (usually end with .r)
• From R GUI : File|New script
• Any text editor (Notepad, Tinn-R, VEDIT, etc.)
Commands executed with:
• source("filename.r")
• Copy/paste
• From R Editor : Edit|Run...
Exercise 1A : Assemble data frame
1. Assemble a data frame from “Homework 1” files with only these columns (make these
names and in this order): boat (character), skipper (character), lat, lon, year, month,
day, mammals, turtles, fish
2. Add a column classifying each trip by season: Winter: Dec – Feb, Spring: Mar – May,
Summer: Jun – Aug, Fall: Sep – Nov
3. Add three columns classifying bycatch size for each of:
fish : < 15 (small), 15 – 200 (medium), > 200 (large)
turtles : < 4 (small), >= 4 (large)
mammals: < 2 (small), >= 2 (large)
4. Add column indicating that boat needs to be inspected if any bycatch class is “large”
5. Write your new data frame to a .csv file
Exercise 1B : Make a list
1. Read .csv file from 1A into clean R environment
2. Create a list with one element for the entire data set and one element per bycatch
type (4 elements total). Each bycatch element should contain a named vector of the
number of trips with small, medium, and large bycatches
3. How many trips needed to be inspected?
4. How many trips had no bycatch at all?
5. Save list and results from 3 & 4 in an R workspace
End Day 1
Data Selection and Manipulation
sample(x, size, replace, prob)          take a random sample from x
cut(x, breaks, labels)                  divide vector into intervals
%in%                                    return logical vector of matches
which(x)                                return index of TRUE results
all(…), any(…)                          return TRUE if all or any arguments are TRUE
unique(x)                               return unique observations in vector
duplicated(x)                           return logical vector flagging duplicated observations
sort                                    sort vector or factor
order                                   return the ordering index based on one or more arguments
merge()                                 merge two data frames by common cols or rows
ceiling, floor, trunc, round, signif    rounding functions
sample
> x <- 1:5
Sample x (jumble or permute)
> sample(x)
[1] 2 1 4 5 3
Sample from x
> sample(x, 3)
[1] 2 4 3
Sample with replacement
> sample(x, 10, replace = TRUE)
 [1] 2 3 5 3 3 4 2 1 4 4
Sample with modified probabilities
> cars <- c("Ford", "GM", "Toyota", "VW", "Subaru", "Honda")
> male.wts <- c(6, 5, 3, 1, 3, 3)
> female.wts <- c(3, 3, 4, 8, 3, 6)
> male.survey <- sample(cars, 100, replace = TRUE, prob = male.wts)
> female.survey <- sample(cars, 100, replace = TRUE, prob = female.wts)
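Because sample() is random, the outputs above will differ on every run. Setting a seed makes a draw repeatable. In this sketch the seed value 42 and the names survey.a and survey.b are arbitrary choices for illustration.

```r
cars <- c("Ford", "GM", "Toyota", "VW", "Subaru", "Honda")
male.wts <- c(6, 5, 3, 1, 3, 3)   # weights need not sum to 1

set.seed(42)   # arbitrary seed, used only to make the draw repeatable
survey.a <- sample(cars, 100, replace = TRUE, prob = male.wts)

set.seed(42)   # same seed again
survey.b <- sample(cars, 100, replace = TRUE, prob = male.wts)

same.draw <- identical(survey.a, survey.b)   # TRUE: same seed, same draw
survey.counts <- table(survey.a)             # responses per make
```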
cut
cut(x, breaks, labels = NULL,
include.lowest = FALSE, right = TRUE, dig.lab = 3,
ordered_result = FALSE, ...)
> y <- c(4, 5, 6, 10, 11, 30, 49, 50, 51)
Bins : 5 < y <= 10, 10 < y <= 30, 30 < y <= 50
> y.cut <- cut(y, breaks = c(5, 10, 30, 50))
> y.cut
[1] <NA>    <NA>    (5,10]  (5,10]  (10,30] (10,30] (30,50] (30,50] <NA>
Levels: (5,10] (10,30] (30,50]
> str(y.cut)
 Factor w/ 3 levels "(5,10]","(10,30]",..: NA NA 1 1 2 2 3 3 NA
Bins : 5 <= y <= 10, 10 < y <= 30, 30 < y <= 50
> cut(y, breaks = c(5, 10, 30, 50), include.lowest = TRUE)
[1] <NA>    [5,10]  [5,10]  [5,10]  (10,30] (10,30] (30,50] (30,50] <NA>
Levels: [5,10] (10,30] (30,50]
Bins : 5 <= y < 10, 10 <= y < 30, 30 <= y < 50
> cut(y, breaks = c(5, 10, 30, 50), right = FALSE)
[1] <NA>    [5,10)  [5,10)  [10,30) [10,30) [30,50) [30,50) <NA>    <NA>
Levels: [5,10) [10,30) [30,50)
Bins : 5 <= y < 10, 10 <= y < 30, 30 <= y <= 50
> cut(y, breaks = c(5, 10, 30, 50), include.lowest = TRUE, right = FALSE)
[1] <NA>    [5,10)  [5,10)  [10,30) [10,30) [30,50] [30,50] [30,50] <NA>
Levels: [5,10) [10,30) [30,50]
%in%, which
> x <- sample(1:10, 20, replace = TRUE)
> x
 [1]  4 10  2  3  4  3  6  4  7  3  9  1  3  4  7  1  3  2  8
[20]  5
> x %in% c(3, 10, 2, 1)
 [1] FALSE  TRUE  TRUE  TRUE FALSE  TRUE FALSE FALSE FALSE
[10]  TRUE FALSE  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE
[19] FALSE FALSE
> x[x %in% c(3, 10, 2, 1)]
 [1] 10  2  3  3  3  1  3  1  3  2
> which(x %in% c(3, 10, 2, 1))
 [1]  2  3  4  6 10 12 13 16 17 18
> which(x < 5)
 [1]  1  3  4  5  6  8 10 12 13 14 16 17 18
> x[which(x > 6)]
[1] 10  7  9  7  8
any, all
> x <- sample(1:10, 20, replace = TRUE)
> x
 [1]  2  7  8  1  1  7  5  8  6  7  3  7  2  1  5 10  3  9  1  2
> any(x == 6)
[1] TRUE
> all(x < 5)
[1] FALSE
unique, duplicated
> x <- sample(1:10, 20, replace = TRUE)
> x
 [1]  6  5  1  8  9  6  2  3  8  9  8 10 10  2  9  3  4  3  4
[20] 10
> unique(x)
[1]  6  5  1  8  9  2  3 10  4
> duplicated(x)
 [1] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE
[10]  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE
[19]  TRUE  TRUE
sort, order
> x <- sample(1:10, 20, replace = TRUE)
> x
 [1]  3  6  7  1  5  3 10  3  7  2  3  9  1  8  4  3  8  2
[19]  4  1
> sort(x)
 [1]  1  1  1  2  2  3  3  3  3  3  4  4  5  6  7  7  8  8
[19]  9 10
> sort(x, decreasing = TRUE)
 [1] 10  9  8  8  7  7  6  5  4  4  3  3  3  3  3  2  2  1
[19]  1  1
> order(x)
 [1]  4 13 20 10 18  1  6  8 11 16 15 19  5  2  3  9 14 17
[19] 12  7
> trips <- read.csv("homework 1a df.csv")
> month.sort <- trips[order(trips$month), ]
> month.days.sort <- trips[order(trips$month, trips$day), ]
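Since the homework file may not be at hand, the same multi-key ordering can be sketched on a tiny made-up data frame. trips.demo and its contents are hypothetical stand-ins for the trips data.

```r
# Hypothetical stand-in for the trips data frame
trips.demo <- data.frame(month = c(3, 1, 3, 1),
                         day   = c(5, 20, 2, 7),
                         boat  = c("A", "B", "C", "D"),
                         stringsAsFactors = FALSE)

# order() returns row indices; ties in month are broken by day
idx <- order(trips.demo$month, trips.demo$day)         # 4 2 3 1
sorted <- trips.demo[idx, ]

# Decreasing month but increasing day: negate the numeric key
idx.desc <- order(-trips.demo$month, trips.demo$day)   # 3 1 4 2
```

sort() returns the sorted values themselves, whereas order() returns positions, which is why order() is the one to use for rearranging whole rows of a data frame.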
merge
merge(x, y, by = intersect(names(x), names(y)),
by.x = by, by.y = by, all = FALSE, all.x = all,
all.y = all, sort = TRUE, suffixes = c(".x",".y"), ...)
> rm(list = ls())
> load("merge data.rdata")
> str(cranial)
'data.frame':   20 obs. of  2 variables:
 $ id   : Factor w/ 20 levels "Specimen-1","Specimen-12",..: 14 11 13 7 20 18 3 10 5 17 ...
 $ skull: num  260 266 259 273 262 ...
> str(haps)
'data.frame':   20 obs. of  2 variables:
 $ id  : Factor w/ 20 levels "Specimen-1","Specimen-10",..: 16 12 15 18 8 7 3 13 6 9 ...
 $ haps: Factor w/ 5 levels "A","B","C","D",..: 1 4 4 5 5 3 1 3 3 4 ...
> merge(haps, cranial)
           id haps    skull
1  Specimen-1    A 255.4461
2 Specimen-12    A 262.5730
3 Specimen-16    E 256.2258
4 Specimen-22    E 259.2000
...
merge
merge(x, y, by = intersect(names(x), names(y)),
by.x = by, by.y = by, all = FALSE, all.x = all,
all.y = all, sort = TRUE, suffixes = c(".x",".y"), ...)
> str(sex)
'data.frame':   40 obs. of  2 variables:
 $ specimens: Factor w/ 40 levels "Specimen-1","Specimen10",..: 1 12 23 34 36 37 38 39 40 2 ...
 $ sex      : Factor w/ 2 levels "F","M": 1 2 1 2 2 2 2 2 2 1 ...
> str(trials)
'data.frame':   30 obs. of  2 variables:
 $ id   : Factor w/ 23 levels "Specimen-1","Specimen-18",..: 5 6 1 9 3 7 8 2 10 4 ...
 $ value: num  30.1 23.1 24.3 22.6 36.7 ...
> merge(sex, trials, by.x = "specimens", by.y = "id")
    specimens sex    value
1  Specimen-1   F 24.28745
2 Specimen-11   F 23.90455
3 Specimen-12   M 27.41010
4 Specimen-14   M 36.84547
5 Specimen-15   M 20.08898
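The effect of the all argument is easiest to see on a small example. The sketch below uses two made-up data frames (sex.demo and trials.demo) whose key columns have different names, as in the slide.

```r
# Hypothetical data: key column is 'specimens' in one frame, 'id' in the other
sex.demo <- data.frame(specimens = c("S-1", "S-2", "S-3"),
                       sex = c("F", "M", "F"),
                       stringsAsFactors = FALSE)
trials.demo <- data.frame(id = c("S-2", "S-3", "S-3", "S-4"),
                          value = c(30.1, 23.1, 24.3, 22.6),
                          stringsAsFactors = FALSE)

# Default (all = FALSE): only keys present in both frames are kept
inner <- merge(sex.demo, trials.demo, by.x = "specimens", by.y = "id")

# all = TRUE: unmatched rows from either frame are kept and padded with NA
outer <- merge(sex.demo, trials.demo, by.x = "specimens", by.y = "id",
               all = TRUE)
```

Here inner has three rows (S-2 once, S-3 twice) and outer has five, adding S-1 with a missing value and S-4 with a missing sex. The key column in the result takes its name from by.x.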
String Manipulation
nchar(x)                     number of characters in string
substr(x, start, stop)       extract or replace substrings
strsplit(x, split)           split string
paste(..., sep, collapse)    concatenate vectors
format                       format object for printing
grep, sub, gsub              pattern matching and replacement
nchar, substr, strsplit
> x <- "This is a sentence."
> nchar(x)
[1] 19
> substr(x, 3, 9)
[1] "is is a"
> substr(x, 1, 4) <- "That"
> x
[1] "That is a sentence."
> strsplit(x, " ")
[[1]]
[1] "That"      "is"        "a"         "sentence."
> strsplit(x, "a")
[[1]]
[1] "Th"         "t is "      " sentence."
paste
> sites <- LETTERS[1:6]
> paste("Site", sites)
[1] "Site A" "Site B" "Site C" "Site D" "Site E" "Site F"
> paste("Site", sites, sep = "-")
[1] "Site-A" "Site-B" "Site-C" "Site-D" "Site-E" "Site-F"
> paste("Site", sites, sep = "_", collapse = ",")
[1] "Site_A,Site_B,Site_C,Site_D,Site_E,Site_F"
Data Summary
summary               summarizes object (different for each class)
table                 create contingency table
sum(x), prod(x)       sum and product of vector
cumsum(x)             vector of cumulative sums
rowSums, colSums      compute row or column sums
rowMeans, colMeans    compute row or column means
rowsum(x, group)      compute column sums for a grouping variable
table
> trips <- read.csv("homework 1a df.csv")
> table(season = trips$season)
season
  Fall Spring Summer Winter
  2503   2546   2336   2615
> table(season = trips$season, fish.class = trips$fish.class)
        fish.class
season   Large Medium Small
  Fall    1499    897   107
  Spring  1505    960    81
  Summer  1380    865    91
  Winter  1550    959   106
> turtle.class.table <- as.data.frame(table(turtle.class = trips$turtle.class))
> str(turtle.class.table)
'data.frame':   2 obs. of  2 variables:
 $ turtle.class: Factor w/ 2 levels "Large","Small": 1 2
 $ Freq        : int  3443 6557
> turtle.class.table
  turtle.class Freq
1        Large 3443
2        Small 6557
row/col sums/means
> x <- matrix(1:18, nrow = 6, ncol = 3)
> x
     [,1] [,2] [,3]
[1,]    1    7   13
[2,]    2    8   14
[3,]    3    9   15
[4,]    4   10   16
[5,]    5   11   17
[6,]    6   12   18
> rowSums(x)
[1] 21 24 27 30 33 36
> colMeans(x)
[1]  3.5  9.5 15.5
> rowsum(x, c(1, 1, 2, 2, 3, 3))
  [,1] [,2] [,3]
1    3   15   27
2    7   19   31
3   11   23   35
> rowsum(x, c("a", "a", "a", "b", "b", "b"))
  [,1] [,2] [,3]
a    6   24   42
b   15   33   51
Data Summary
min, max                return minimum or maximum values
range                   return a vector of minimum and maximum values
which.min, which.max    return index of first minimum or maximum value
mean(x)                 arithmetic mean of vector
sd, var, cov, cor       standard deviation, variance, covariance, correlation
median(x)               median of vector
quantile(x, probs)      give quantiles of vector
> x <- sample(1:100, 50, replace = TRUE)
> mean(x)
[1] 55.82
> median(x)
[1] 51.5
> range(x)
[1]   1 100
> quantile(x, probs = 0.1)
 10%
21.9
> quantile(x, probs = c(0.025, 0.5, 0.975))
  2.5%    50%  97.5%
 6.825 51.500 98.325
Functions
fun.name <- function(args) {
  statements
  x or return(x)
}
• result of last statement is return value
• arguments (args) passed by value
• can give default arguments
• "..." passes unmatched arguments to other functions
Functions
F2C <- function(faren) {
  # converts Fahrenheit to Celsius
  cels <- round((faren - 32) * 5/9, 2)
  paste(faren, "deg. Fahrenheit =", cels, "deg. Celsius",
        sep = " ", collapse = "")
}
# fixed default for sample.size
sample.mean <- function(x, sample.size = 10) {
  y <- sample(x, size = sample.size, replace = TRUE)
  mean(y)
}
# default computed from another argument
sample.mean <- function(x, sample.size = length(x)) {
  y <- sample(x, size = sample.size, replace = TRUE)
  mean(y)
}
# unmatched arguments passed on to sample()
sample.mean <- function(x, ...) {
  y <- sample(x, ...)
  mean(y, na.rm = TRUE)
}
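To see how "..." forwards arguments, the last version of sample.mean can be called with and without extra arguments. This is a minimal sketch; the vector vals and the seed are made up for illustration.

```r
# Last version from the slide: unmatched arguments go to sample()
sample.mean <- function(x, ...) {
  y <- sample(x, ...)
  mean(y, na.rm = TRUE)
}

vals <- c(2, 4, 6, 8, NA)   # hypothetical data with one missing value
set.seed(1)                 # arbitrary seed, for repeatability only

# No extra arguments: sample() just permutes vals, and mean() drops the NA,
# so the result is always (2 + 4 + 6 + 8) / 4 = 5
m1 <- sample.mean(vals)

# size and replace are not arguments of sample.mean, so they travel
# through "..." to sample()
m2 <- sample.mean(vals, size = 3, replace = TRUE)
```

The trade-off of "..." is flexibility against checking: a misspelled argument name is passed along silently and only fails (or is ignored) inside the function that receives it.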
Functions
if(cond) {statements} else {statements}    evaluate condition
ifelse(test, yes, no)                      evaluate test element by element, return yes or no
for(var in seq) {statements}               execute one loop for each var in seq
while(cond) {statements}                   execute loop as long as condition is true
repeat {statements}                        execute statements repeatedly until break
break                                      exits loop
next                                       moves to next iteration in loop
switch(EXPR, ...)                          select from list of alternatives
print(x)                                   prints object x to screen
stop("...")                                stop function and print error message
warning("...")                             generate warning message
stopifnot(cond)                            stop if cond not TRUE
if, ifelse
fishery.status.1 <- function(catch, catch.limit = 20) {
  result <- list(to.close = TRUE, remaining.catch = NA)
  if (catch < catch.limit) {
    result$to.close = FALSE
    result$remaining.catch = catch.limit - catch
  } else {
    result$to.close = TRUE
    result$remaining.catch = 0
  }
  result
}
fishery.status.2 <- function(catch, catch.limit = 20) {
  to.close <- catch >= catch.limit
  remaining.catch <- ifelse(catch < catch.limit, catch.limit - catch, 0)
  list(to.close = to.close, remaining.catch = remaining.catch)
}
> x <- c(TRUE, TRUE, FALSE)
> y <- c(FALSE, TRUE, FALSE)
> z <- c(TRUE, FALSE, FALSE)
> x & y
[1] FALSE  TRUE FALSE
> x && y
[1] FALSE
> x && z
[1] TRUE
Note: in R 4.3.0 and later, && and || signal an error when either side has length greater than one.
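The practical difference between the two fishery functions shows up when catch is a vector: if() expects a single TRUE or FALSE, while ifelse() works element by element. A short sketch, where the catches vector is made up for illustration:

```r
# Vectorized version from the slide
fishery.status.2 <- function(catch, catch.limit = 20) {
  to.close <- catch >= catch.limit
  remaining.catch <- ifelse(catch < catch.limit, catch.limit - catch, 0)
  list(to.close = to.close, remaining.catch = remaining.catch)
}

catches <- c(5, 20, 30)   # hypothetical annual catches
status <- fishery.status.2(catches)
status$to.close           # FALSE TRUE TRUE
status$remaining.catch    # 15 0 0
```

Passing the same vector to the if-based version would not give one answer per year; in R 4.2.0 and later, if() with a condition of length greater than one is an error.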
for
make.plates <- function(num.plates) {
  plate.vec <- vector("character", length = num.plates)
  # seq_len() gives an empty loop when num.plates is 0
  for(i in seq_len(num.plates)) {
    first.num <- sample(0:9, 1)
    chars <- sample(LETTERS, 3, replace = TRUE)
    chars <- paste(chars, collapse = "")
    last.nums <- sample(0:9, 3, replace = TRUE)
    last.nums <- paste(last.nums, collapse = "")
    plate.vec[i] <- paste(first.num, chars, last.nums, sep = "", collapse = "")
  }
  plate.vec
}
check.plates <- function(plates, reserved) {
  bad.plates <- vector("character")
  for(plate in plates) {
    # characters 2 to 4 are the three letters of the plate
    plate.str <- substr(plate, 2, 4)
    if (plate.str %in% reserved) bad.plates <- c(bad.plates, plate)
  }
  bad.plates
}
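Because substr() and %in% are both vectorized, the plate check can also be written without a loop. The function name check.plates.vec and the example plates are hypothetical; this is an alternative sketch, not the version from the workshop.

```r
# Loop-free alternative: extract all letter groups at once, then filter
check.plates.vec <- function(plates, reserved) {
  plates[substr(plates, 2, 4) %in% reserved]
}

plates <- c("1ABC234", "2XYZ567", "3ABC890")   # made-up plates
bad <- check.plates.vec(plates, reserved = c("ABC", "QQQ"))
# bad is "1ABC234" "3ABC890"
```

The loop version grows bad.plates one element at a time, which copies the vector on each append; the vectorized form avoids that and is usually shorter to read.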
bootstrap example
Question: How many trips had “small” bycatches for all categories?
More importantly: What is the variance of this measure?
trips <- read.csv("homework 1a df.csv")
boot.bycatch <- function(trip.df, nrep) {
  obs.num.small <- num.all.small(trip.df)
  boot.results <- vector("numeric", nrep)
  for(i in seq_len(nrep)) {
    # resample rows with replacement
    boot.rows <- sample(1:nrow(trip.df), nrow(trip.df), replace = TRUE)
    boot.df <- trip.df[boot.rows, ]
    boot.results[i] <- num.all.small(boot.df)
  }
  list(observed = obs.num.small, boot.dist = boot.results)
}
num.all.small <- function(trip.df) {
  # number of trips where all three bycatch classes are "Small"
  f.small <- trip.df$fish.class == "Small"
  t.small <- trip.df$turtle.class == "Small"
  m.small <- trip.df$mammal.class == "Small"
  sum(f.small & t.small & m.small)
}
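To answer the variance question, the bootstrap distribution returned by boot.bycatch can be summarized with var() and quantile(). The sketch below repeats the two functions so it runs on its own and substitutes a small simulated data frame (fake.trips, an assumption made here) for the homework file.

```r
num.all.small <- function(trip.df) {
  sum(trip.df$fish.class == "Small" &
      trip.df$turtle.class == "Small" &
      trip.df$mammal.class == "Small")
}
boot.bycatch <- function(trip.df, nrep) {
  obs.num.small <- num.all.small(trip.df)
  boot.results <- vector("numeric", nrep)
  for(i in seq_len(nrep)) {
    boot.rows <- sample(1:nrow(trip.df), nrow(trip.df), replace = TRUE)
    boot.results[i] <- num.all.small(trip.df[boot.rows, ])
  }
  list(observed = obs.num.small, boot.dist = boot.results)
}

# Simulated stand-in for the trips data (not the workshop file)
set.seed(1)   # arbitrary seed
n <- 200
classes <- c("Small", "Large")
fake.trips <- data.frame(
  fish.class   = sample(classes, n, replace = TRUE),
  turtle.class = sample(classes, n, replace = TRUE),
  mammal.class = sample(classes, n, replace = TRUE),
  stringsAsFactors = FALSE
)

res <- boot.bycatch(fake.trips, nrep = 500)
boot.var <- var(res$boot.dist)                               # variance of the measure
boot.ci  <- quantile(res$boot.dist, probs = c(0.025, 0.975)) # percentile interval
```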
Exercise 2A : Reformat dates
1) Use "Homework 2 sets.csv"
2) Write function to split Date into Year, Month, Day
3) Save function as R object
4) Create numeric Year, Month, Day columns in data frame
5) Create new Date character column that is DD-MM-YY
6) Remove old Date column and save new data frame under new name
Exercise 2B : Bootstrap fishery closures
1) Use “Homework 2 catches.txt"
2) Write and save a function that takes catch.data, a catch.limit, and a number
of bootstrap replicates. The function should bootstrap the catch over all
years and return two objects: 1) a distribution of the number of years with
closures, and 2) a distribution of the average catch remaining.
3) Run bootstrap with catch limits of 20 and 50 at 1000 replicates each.
Extra: Create a table showing the frequency distribution of the number of
closures in the bootstrap result.
End Day 2
Data Processing - ‘apply’ family
lapply(X, FUN, …)              apply function to list or vector
sapply(X, FUN, …)              simplified version of lapply
apply(X, MARGIN, FUN, …)       apply function to margins of array
tapply(X, INDEX, FUN, …)       apply function to ragged array
by(data, INDICES, FUN, ...)    apply function to data frame
aggregate(x, by, FUN, ...)     compute function for subsets of object
lapply
lapply returns list
> spring.trip <- trips$season == "Spring"
> spring.fish <- trips$fish[spring.trip & trips$fish > 0]
> spring.turtles <- trips$turtles[spring.trip & trips$turtles > 0]
> spring.mammals <- trips$mammals[spring.trip & trips$mammals > 0]
>
> spring <- list(fish = spring.fish, turtles = spring.turtles, mammals = spring.mammals)
>
> lapply(spring, length)
$fish
[1] 2525
$turtles
[1] 1274
$mammals
[1] 2119
> lapply(spring, mean)
$fish
[1] 250.2356
$turtles
[1] 5.49843
$mammals
[1] 3.050024
sapply
sapply returns vector or matrix
> sapply(spring, median)
   fish turtles mammals
    250       5       3
> sapply(spring, function(i) sum(i > 5 & i < 20))
   fish turtles mammals
     63     623       0
> sapply(spring, function(i) c(n = length(i), mean = mean(i), var = var(i)))
           fish    turtles     mammals
n     2525.0000 1274.00000 2119.000000
mean   250.2356    5.49843    3.050024
var  20785.6612    8.61783    1.953115
apply
> bycatch.df <- subset(trips, , c("fish", "turtles", "mammals"))
Apply across columns
> apply(bycatch.df, 2, mean)
    fish  turtles  mammals
248.6285   2.7283   2.5160
> apply(bycatch.df, 2, quantile, prob = c(0.025, 0.975))
      fish turtles mammals
2.5%     8       0       0
97.5%  489      10       5
Apply across rows
> bycatch.sum <- apply(bycatch.df, 1, sum)
> range(bycatch.sum)
[1]   0 512
> mean(bycatch.sum)
[1] 253.8728
tapply
apply function based on groups
> tapply(trips$fish, trips$season, mean)
    Fall   Spring   Summer   Winter
250.1322 248.1716 250.5051 245.9576
> tapply(trips$fish, list(season = trips$season, class = trips$fish.class), median)
        class
season   Large Medium Small
  Fall   354.0    112   6.0
  Spring 354.0    107   5.0
  Summer 353.5    111   3.0
  Winter 348.0    108   5.5
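The members of the apply family differ mainly in what they iterate over and what they return. The sketch below runs each on a tiny made-up data frame (catch.demo, a hypothetical stand-in for trips) so the results can be checked by hand.

```r
# Hypothetical stand-in for the trips data
catch.demo <- data.frame(season  = c("Fall", "Fall", "Spring", "Spring"),
                         fish    = c(10, 30, 20, 40),
                         turtles = c(1, 3, 0, 2),
                         stringsAsFactors = FALSE)
counts <- catch.demo[, c("fish", "turtles")]

col.ranges <- lapply(counts, range)     # list: one range per column
col.means  <- sapply(counts, mean)      # named vector: fish 25, turtles 1.5
row.totals <- apply(counts, 1, sum)     # one total per row: 11 33 20 42
by.season  <- tapply(catch.demo$fish, catch.demo$season, mean)  # Fall 20, Spring 30
```

A data frame is a list of columns, which is why lapply and sapply work column by column here, while apply with MARGIN = 1 treats the numeric columns as a matrix and works row by row.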
Exercise 3 : Bootstrap with apply
1) Rewrite bootstrap from Exercise 2B using apply family
2) Run bootstrap with catch limits of 10, 15, 20, 30, 50, 60.
3) Summarize mean and median of results for each catch limit in one object
Simulated growth data
Create a function that simulates growth data according to a Gompertz model,
$$\text{length} = L_0 \, e^{\,k\left(1 - e^{-g \cdot \text{age}}\right)}$$
The output should have two columns (age and length).
Age should be rounded to two decimal places.
Length should be rounded to one decimal place.
Try to put in checks and traps for screwy input data.
sim.growth.func <- function(age.range, L0, k, g, sd, sample.size)
age.range is a two element vector giving min and max ages
L0 is length at birth
k, g are model rate parameters
sd is the standard deviation for the error term
sample.size is the number of samples to return
[Figure: simulated growth data, Length (cm) plotted against Age (years)]
Simulated growth data
# Gompertz growth function
gomp.func <- function(age.vec, LAB, k, g) {
  LAB * exp(k * (1 - exp(-g * age.vec)))
}
# A function to create simulated growth data according
#   to a Gompertz equation
sim.growth.func <- function(age.range, LAB, k, g,
                            std.dev, sample.size = 1000) {
  # Check to make sure age.range is a reasonable vector
  if (!is.numeric(age.range) || !is.vector(age.range))
    stop("'age.range' is not a numeric vector")
  if (length(age.range) != 2) stop("'age.range' must have two elements")
  if (any(age.range < 0)) stop("'age.range' < 0")
  if (age.range[1] >= age.range[2])
    stop("'age.range[1]' >= 'age.range[2]'")
  # Generate some random ages between min and max of age.range
  random.ages <- runif(sample.size, age.range[1], age.range[2])
  # Calculate the expected length for those ages from the Gompertz equation
  expected.length <- gomp.func(random.ages, LAB, k, g)
  # Add some error to the lengths and return a two-column data frame
  length.err <- rnorm(sample.size, 0, std.dev)
  data.frame(age = random.ages, length = expected.length + length.err)
}
growth.df <- sim.growth.func(age.range = c(0, 65), LAB = 10, k = 2, g = 0.25, std.dev = 5)
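The exercise asks for ages rounded to two decimal places and lengths to one, while the function above returns unrounded values. The following is a minimal sketch of one way to add the rounding and to confirm that bad input is trapped; sim.growth.rounded is a name invented for this sketch, and the seed is arbitrary.

```r
# Gompertz growth function, repeated so the sketch is self-contained
gomp.func <- function(age.vec, LAB, k, g) {
  LAB * exp(k * (1 - exp(-g * age.vec)))
}

# Hypothetical variant of the simulator that rounds its output
sim.growth.rounded <- function(age.range, LAB, k, g, std.dev, sample.size = 1000) {
  # stopifnot() raises an error if any condition is not TRUE
  stopifnot(is.numeric(age.range), length(age.range) == 2,
            all(age.range >= 0), age.range[1] < age.range[2])
  ages <- runif(sample.size, age.range[1], age.range[2])
  lengths <- gomp.func(ages, LAB, k, g) + rnorm(sample.size, 0, std.dev)
  data.frame(age = round(ages, 2), length = round(lengths, 1))
}

set.seed(1)  # arbitrary seed so the draw can be repeated
rounded.df <- sim.growth.rounded(c(0, 65), LAB = 10, k = 2, g = 0.25, std.dev = 5)
str(rounded.df)

# A reversed age range should be rejected rather than silently accepted
bad.call <- tryCatch(
  sim.growth.rounded(c(65, 0), 10, 2, 0.25, 5),
  error = function(e) "error trapped"
)
bad.call
```

Using stopifnot() keeps the checks compact; separate if/stop pairs, as in the function above, give more specific error messages.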
Plot
plot(x, y = NULL, type = "p", xlim = NULL, ylim = NULL,
     log = "", main = NULL, sub = NULL, xlab = NULL, ylab = NULL,
     ann = par("ann"), axes = TRUE, frame.plot = axes,
     panel.first = NULL, panel.last = NULL,
     col = par("col"), bg = NA, pch = par("pch"),
     cex = 1, lty = par("lty"), lab = par("lab"),
     lwd = par("lwd"), asp = NA, ...)
plot(growth.df$age, growth.df$length, xlab = "Age (years)", ylab = "Length (cm)")
[Figure: scatter plot of growth.df, Length (cm) against Age (years)]
Hist
hist(x, breaks = "Sturges", freq = NULL, probability = !freq,
     include.lowest = TRUE, right = TRUE,
     density = NULL, angle = 45, col = NULL, border = NULL,
     main = paste("Histogram of" , xname),
     xlim = range(breaks), ylim = NULL,
     xlab = xname, ylab,
     axes = TRUE, plot = TRUE, labels = FALSE,
     nclass = NULL, ...)
> hist(growth.df$age)
> hist(growth.df$age, breaks = c(0:5, seq(6, 12, 2), 15, 20, 40, max(growth.df$age)),
+   col = "black", border = "white")
[Figure: two panels titled "Histogram of growth.df$age". With the default equal-width breaks the y-axis is Frequency; with the unequal custom breaks the y-axis is Density.]
Boxplot
boxplot(formula, data = NULL, ..., subset, na.action = NULL)
boxplot(x, ..., range = 1.5, width = NULL, varwidth = FALSE,
notch = FALSE, outline = TRUE, names, plot = TRUE,
border = par("fg"), col = NULL, log = "",
pars = list(boxwex = 0.8, staplewex = 0.5, outwex = 0.5),
horizontal = FALSE, add = FALSE, at = NULL)
> age.breaks <- hist(growth.df$age)$breaks
> binned.age <- cut(growth.df$age, breaks = age.breaks)
> boxplot(growth.df$length ~ binned.age, xlab = "Age bin", ylab = "Length")
[Figure: density histogram of growth.df$age next to box plots of Length by Age bin; the bin labels shown include (0,1], (2,3], (4,5], (6,8], (10,12], (15,20], (40,65]]
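The boxplot example reuses the breaks computed by hist() as bin edges for cut(). A minimal sketch of that link on made-up ages (the vector ages and the seed are assumptions for illustration):

```r
set.seed(42)               # arbitrary seed
ages <- runif(200, 0, 65)  # hypothetical ages

# plot = FALSE returns the histogram object without drawing anything
age.hist <- hist(ages, plot = FALSE)
age.hist$breaks

# cut() assigns each age to a bin; include.lowest = TRUE matches hist()'s default
binned <- cut(ages, breaks = age.hist$breaks, include.lowest = TRUE)
bin.counts <- as.vector(table(binned))

# The table of bins should agree with the counts stored in the histogram
rbind(hist = age.hist$counts, cut = bin.counts)
```

Without include.lowest = TRUE, a value exactly equal to the first break falls outside every bin and becomes NA.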
Modifying Graphs
abline                    add straight lines to plot
lines                     join points at coordinates with lines
points                    place points on plot
title                     add labels to a plot
text                      write text on a plot
?plot.default             default plot options
par                       set or get graphical parameters
layout(mat, ...)          divide graphical screen into matrix
split.screen(figs, ...)   divide graphical screen into sub-screens
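The functions listed above all add to, or configure, a plot that already exists. The sketch below strings several of them together on made-up data; the variables, seed, and label text are assumptions for illustration. It draws to a null PDF device so it runs without opening a window; drop the pdf(NULL) and dev.off() lines to see the result on screen.

```r
pdf(NULL)    # null device: nothing is written to disk
set.seed(7)  # arbitrary seed
x <- 1:20
y <- 2 * x + rnorm(20, sd = 3)  # hypothetical data scattered around y = 2x

old.par <- par(mar = c(4, 4, 2, 1))  # par() returns the settings it replaces
plot(x, y, xlab = "", ylab = "")     # base plot with empty axis labels
abline(h = mean(y), col = "grey")    # horizontal reference line at the mean of y
lines(x, 2 * x, lty = "dashed")      # join the points (x, 2x) with a dashed line
above <- y > 2 * x
points(x[above], y[above], pch = 19, col = "red")  # overplot points above the line
title(main = "Modified plot", xlab = "x", ylab = "y")
text(1, max(y), "red: above dashed line", pos = 4) # pos = 4 writes to the right
par(old.par)                         # restore the previous settings
invisible(dev.off())
```

Saving the value returned by par() and passing it back afterwards is a common way to avoid leaving changed settings behind for later plots.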
> newborns <- growth.df[growth.df$age <= 3, ]
> adults <- growth.df[growth.df$age > 3, ]
> plot(adults$age, adults$length, xlim = range(growth.df$age),
+   ylim = range(growth.df$length), xlab = "", ylab = "",
+   col = "red", pch = 21)
> abline(v = 3, col = "green")
> par(new = TRUE)
> plot(newborns$age, newborns$length, xlim = range(growth.df$age),
+   ylim = range(growth.df$length), xlab = "Age", ylab = "Length",
+   col = "blue", pch = 21)
> text(3, 80, "Transition", pos = 4)
[Figure: Length against Age, with adults (age > 3) in red, newborns (age <= 3) in blue, and a green vertical line at age 3 labeled "Transition"]
Modifying Graphs
> layout(matrix(c(1, 1, 2, 3), 2, 2, byrow = TRUE))
> plot(growth.df$age, growth.df$length, xlab = "Age", ylab = "Length",
+   main = "Simulated growth data")
> age.breaks <- seq(0, max(growth.df$age) + 5, 5)
> binned.age <- cut(growth.df$age, age.breaks)
> hist(growth.df$age, age.breaks, xlab = "Age", main = "")
> boxplot(growth.df$length ~ binned.age,
+   names = age.breaks[-length(age.breaks)], xlab = "Age bin")
[Figure: three-panel layout. Top (full width): scatter plot "Simulated growth data", Length against Age. Bottom left: histogram of Age (y-axis Frequency). Bottom right: box plots of length by Age bin.]
Curve
curve(expr, from, to, n = 101, add = FALSE, type = "l",
ylab = NULL, log = NULL, xlim = NULL, ...)
> curve(sin, -10, 10)
> plot(growth.df$age, growth.df$length, xlab = "Age",
+   ylab = "Length", main = "")
> curve(10 * exp(2 * (1 - exp(-0.25 * x))),
+   add = TRUE, lty = "dashed", lwd = 2, col = "red")
[Figure: left panel, sin(x) for x from -10 to 10; right panel, Length against Age with the Gompertz curve overlaid as a dashed red line]
Statistical Distributions
d<dist>           density
p<dist>           distribution function
q<dist>           quantile function
r<dist>           random number
dunif, dnorm, dgamma, dbeta, dchisq, etc.
> library(help = "stats")
> set.seed(x)     set random number seed
[Figure: three panels for the standard normal: dnorm(x) and pnorm(x) for x from -3 to 3, and qnorm(x) for x from 0 to 1]
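The four prefixes apply in the same way to every distribution in the stats package. A short sketch using the normal distribution, with an arbitrary seed to show what set.seed does:

```r
# d: density, p: cumulative probability, q: quantile, r: random draws
dnorm(0)       # height of the standard normal density at 0
pnorm(1.96)    # P(X <= 1.96)
qnorm(0.975)   # the value x for which pnorm(x) is 0.975

# The quantile function is the inverse of the distribution function
round.trip <- qnorm(pnorm(1.5))
round.trip

# Resetting the seed reproduces the same "random" numbers
set.seed(123)  # arbitrary seed
draw1 <- rnorm(5)
set.seed(123)
draw2 <- rnorm(5)
identical(draw1, draw2)

# Distribution parameters are passed as extra arguments
runif(3, min = 0, max = 65)
```

Setting the seed at the top of a script makes simulations such as sim.growth.func repeatable from run to run.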
Statistical Tests
binom.test(x, n, p = 0.5,
alternative = c("two.sided", "less", "greater"),
conf.level = 0.95)
chisq.test(x, y = NULL, correct = TRUE,
p = rep(1/length(x), length(x)), rescale.p = FALSE,
simulate.p.value = FALSE, B = 2000)
t.test(x, y = NULL,
alternative = c("two.sided", "less", "greater"),
mu = 0, paired = FALSE, var.equal = FALSE,
conf.level = 0.95, ...)
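As a minimal sketch of how these test functions are called, the example below runs binom.test and chisq.test on made-up counts. All numbers and names here are hypothetical and chosen only to show the calling pattern.

```r
# Hypothetical sample: 14 females among 20 animals, tested against p = 0.5
sex.ratio.test <- binom.test(14, 20, p = 0.5)
sex.ratio.test$p.value

# Hypothetical 2 x 2 table of counts by area and size class
counts <- matrix(c(30, 20, 15, 35), nrow = 2,
                 dimnames = list(area = c("North", "South"),
                                 size = c("Large", "Small")))
size.test <- chisq.test(counts)
size.test$statistic
size.test$p.value

# Each test returns an object of class "htest"
class(size.test)
```

For a 2 x 2 table, chisq.test applies the continuity correction by default (correct = TRUE in the signature above).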
t-test
> male.growth <- sim.growth.func(c(0, 65), 10, 2.05, 0.27, 5)
> female.growth <- sim.growth.func(c(0, 65), 10, 1.99, 0.23, 4)
> adult.males <- male.growth[male.growth[, "age"] > 18, ]
> adult.females <- female.growth[female.growth[, "age"] > 18, ]
> gender.test <- t.test(adult.males[, "length"], adult.females[, "length"])
> gender.test
        Welch Two Sample t-test
data:  adult.males[, "length"] and adult.females[, "length"]
t = 19.3369, df = 1427.025, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 4.146325 5.082547
sample estimates:
mean of x mean of y
 77.56675  72.95232
> str(gender.test)
List of 9
 $ statistic  : Named num 19.3
  ..- attr(*, "names")= chr "t"
 $ parameter  : Named num 1427
  ..- attr(*, "names")= chr "df"
 $ p.value    : num 3.56e-74
 $ conf.int   : atomic [1:2] 4.15 5.08
  ..- attr(*, "conf.level")= num 0.95
 $ estimate   : Named num [1:2] 77.6 73.0
  ..- attr(*, "names")= chr [1:2] "mean of x" "mean of y"
 $ null.value : Named num 0
  ..- attr(*, "names")= chr "difference in means"
 $ alternative: chr "two.sided"
 $ method     : chr "Welch Two Sample t-test"
 $ data.name  : chr "adult.males[, \"length\"] and adult.females[, \"length\"]"
 - attr(*, "class")= chr "htest"
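Because the test result is a list, its components can be pulled out with $ and reused in later code. A minimal sketch on made-up samples (the vectors, their means, and the seed are assumptions for illustration):

```r
set.seed(10)  # arbitrary seed
males <- rnorm(50, mean = 78, sd = 5)    # hypothetical adult male lengths
females <- rnorm(50, mean = 73, sd = 4)  # hypothetical adult female lengths

length.test <- t.test(males, females)

# Individual components, named as in the str() listing above
length.test$p.value
length.test$estimate
ci <- length.test$conf.int
ci

# The observed difference in means lies inside the confidence interval
diff.means <- unname(length.test$estimate[1] - length.test$estimate[2])
diff.means
```

Extracting values this way avoids copying numbers from printed output by hand, which matters when a test is run many times, as in a bootstrap.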
Model Fitting
Linear Models
lm(formula, data, subset, weights, na.action,
method = "qr", model = TRUE, x = FALSE, y = FALSE, qr = TRUE,
singular.ok = TRUE, contrasts = NULL, offset, ...)
Analysis of Variance Model
aov(formula, data = NULL, projections = FALSE, qr = TRUE,
contrasts = NULL, ...)
Generalized Linear Models
glm(formula, family = gaussian, data, weights, subset,
na.action, start = NULL, etastart, mustart,
offset, control = glm.control(...), model = TRUE,
method = "glm.fit", x = FALSE, y = TRUE, contrasts = NULL, ...)
Nonlinear Least Squares
nls(formula, data, start, control, algorithm,
trace, subset, weights, na.action, model,
lower, upper, ...)
Non-Linear Minimization
nlm(f, p, hessian = FALSE, typsize=rep(1, length(p)), fscale=1,
print.level = 0, ndigit=12, gradtol = 1e-6,
stepmax = max(1000 * sqrt(sum((p/typsize)^2)), 1000),
steptol = 1e-6, iterlim = 100, check.analyticals = TRUE, ...)
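Two of the fitting functions above are sketched below on made-up inputs: glm with a Poisson family, and nlm minimizing a simple function. The data, the seed, and the target values are assumptions for illustration only.

```r
# glm: Poisson regression of hypothetical catch counts on effort
set.seed(3)  # arbitrary seed
effort <- runif(100, 1, 10)
catch <- rpois(100, lambda = exp(0.5 + 0.2 * effort))
catch.glm <- glm(catch ~ effort, family = poisson)
coef(catch.glm)

# nlm: minimize a function of a parameter vector p, starting from c(0, 0)
# This function has its minimum at p = c(3, -1)
sq.dist <- function(p) sum((p - c(3, -1))^2)
min.fit <- nlm(sq.dist, p = c(0, 0))
min.fit$estimate  # parameter values at the minimum found
min.fit$minimum   # function value at that point
```

nlm only needs a function that returns a single number, so it can be used for custom likelihoods that do not fit the formula interface of lm, glm, or nls.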
lm
> sim.growth <- sim.growth.func(c(0, 65), 10, 2, 0.25, 5)
> juv <- as.data.frame(sim.growth[sim.growth[, "age"] < 10, ])
> juv.lm <- lm(length ~ age, juv)
> juv.lm
Call:
lm(formula = length ~ age, data = juv)
Coefficients:
(Intercept)          age
     12.438        5.584
> plot(juv.lm)
Waiting to confirm page change...
Waiting to confirm page change...
Waiting to confirm page change...
Waiting to confirm page change...
> plot(juv)
> abline(coef = juv.lm$coefficients, col = "red", lty = "dashed")
[Figure: the "Residuals vs Fitted" and "Normal Q-Q" diagnostic plots for lm(length ~ age), and a scatter plot of length against age with the fitted line dashed in red]
Model Fitting
fitted     extract fitted values for models
coef       extract model coefficients
resid      extract model residuals
deviance   extract deviances for models
logLik     calculate log-likelihood for model fit
AIC        calculate AIC for model fit
predict    predictions from model results
anova      calculate analysis of variance tables
> coef(juv.lm)
(Intercept)         age
      12.88        5.28
> logLik(juv.lm)
'log Lik.' -508 (df=3)
> AIC(juv.lm)
[1] 1023
> predict(juv.lm, data.frame(age = c(1, 5, 10)))
   1    2    3
18.2 39.3 65.6
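The extractor functions can be checked against one another. The sketch below uses R's built-in cars data set (speed and stopping distance) rather than the simulated growth data, so that it runs on its own; the object names are assumptions for illustration.

```r
# Linear model of stopping distance on speed, using the built-in 'cars' data
cars.lm <- lm(dist ~ speed, data = cars)

# Fitted values plus residuals reproduce the observed response
recon <- fitted(cars.lm) + resid(cars.lm)

# For lm, the deviance is the residual sum of squares
rss <- sum(resid(cars.lm)^2)
deviance(cars.lm)

# AIC is -2 * log-likelihood + 2 * (number of estimated parameters)
ll <- logLik(cars.lm)
manual.aic <- -2 * as.numeric(ll) + 2 * attr(ll, "df")
AIC(cars.lm)

# Predictions need a data frame whose column names match the model terms
predict(cars.lm, data.frame(speed = c(10, 20)))
anova(cars.lm)
```

The df attribute of logLik counts the residual variance as a parameter, which is why a two-coefficient model such as juv.lm reports df=3.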
nls
> gomp.form <- formula(length ~ LAB * exp(k * (1 - exp(-g * age))))
> growth.nls <- nls(gomp.form, sim.growth, start = c(LAB = 5, k = 5, g = 0.6))
> growth.nls
Nonlinear regression model
  model: length ~ LAB * exp(k * (1 - exp(-g * age)))
   data: sim.growth
   LAB      k      g
10.995  1.905  0.236
 residual sum-of-squares: 24793
Number of iterations to convergence: 6
Achieved convergence tolerance: 9.67e-06
> plot(sim.growth)
> age.vec <- 1:max(sim.growth$age)
> lines(age.vec, predict(growth.nls, list(age = age.vec)), col = "red",
+   lty = "dashed", lwd = 2)
[Figure: scatter plot of length against age for sim.growth, with the fitted Gompertz curve overlaid as a dashed red line]
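With simulated data the true parameters are known, so the fit can be compared against them. The following is a self-contained sketch under assumed settings: the seed, sample size, and starting values are arbitrary choices, and sim.df is a name invented here.

```r
set.seed(2010)  # arbitrary seed
LAB.true <- 10
k.true <- 2
g.true <- 0.25

# Simulate ages and Gompertz lengths with normal error (sd = 5)
age <- runif(500, 0, 65)
len <- LAB.true * exp(k.true * (1 - exp(-g.true * age))) + rnorm(500, 0, 5)
sim.df <- data.frame(age = age, length = len)

# Fit the same model form; names in 'start' must match the formula
gomp.fit <- nls(length ~ LAB * exp(k * (1 - exp(-g * age))),
                data = sim.df,
                start = c(LAB = 8, k = 1.5, g = 0.3))

# Estimates next to the values used to simulate the data
rbind(estimate = coef(gomp.fit),
      truth = c(LAB.true, k.true, g.true))

# Standard errors and t values for each parameter
summary(gomp.fit)$coefficients
```

nls can fail to converge when the starting values are far from the solution, so it is worth plotting the curve implied by the starting values over the data first.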
Packages, Path, & Options
library()                   list available packages
library(package)            load package
library(help = "package")   list info about package (build, functions, etc.)
require(package)            loads package and returns FALSE if not present
attach(x, pos)              attach database (list, data frame, or file) to search path
detach(x)                   remove database from search path
search()                    list attached packages in search path
options(...)                set and examine global options
?Startup                    control initialization of R session
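A short sketch of how these functions behave in practice. It uses the splines package, which ships with R, and a deliberately made-up package name (notARealPackage123) to show the FALSE return value of require.

```r
# require() returns TRUE or FALSE instead of stopping with an error
has.splines <- require(splines)
has.splines

# A package that is not installed gives FALSE (plus a warning, suppressed here)
has.missing <- suppressWarnings(
  require("notARealPackage123", character.only = TRUE)
)
has.missing

# A loaded package appears on the search path
"package:splines" %in% search()

# options() returns the previous values, so a change can be undone
old.opts <- options(digits = 4)
print(pi)          # printed with 4 significant digits
options(old.opts)  # restore the earlier setting
getOption("digits")
```

The TRUE/FALSE result makes require convenient inside if statements, whereas library is the usual choice at the top of a script because it stops immediately when a package is missing.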