Fitting Royle’s N-mixture Model with Package unmarked in R
Written by Ian Fiske and Richard Chandler
Sinead Borchert
4/16/13
Package unmarked includes code for modeling the occupancy (presence/absence), abundance, and
density of unmarked animals. In this tutorial I will discuss the pcount function (using sample code
provided by the authors), which models abundance for a closed population. The data are repeated
counts of a population at a set of sites in the field. The model relates abundance to site covariates
(e.g. habitat) and relates detection to observation covariates (e.g. wind, time of day). The ultimate
goal is to turn observed field counts into an estimate of abundance that accounts for imperfect
detection, and to determine how environmental factors contribute to variation in abundance.
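To make the data structure concrete, here is a minimal simulation sketch of the process the N-mixture model assumes: a fixed (closed) abundance at each site, and repeat visits that each detect only some of the animals. The values of lambda and p and the numbers of sites and visits are illustrative assumptions of mine, not estimates from the Alder Flycatcher data.

```r
# Simulate repeated counts under the N-mixture assumptions (illustrative values only)
set.seed(1)
n.sites  <- 50    # number of sites (assumed)
n.visits <- 3     # repeat visits per site (assumed)
lambda   <- 2     # expected abundance per site (assumed)
p        <- 0.3   # per-individual detection probability (assumed)

# Latent abundance at each site; the population is closed, so N is fixed across visits
N <- rpois(n.sites, lambda)

# On each visit, each of the N individuals is detected independently with probability p.
# rep(N, n.visits) lines up with the column-wise fill of the matrix (row i uses N[i]).
y <- matrix(rbinom(n.sites * n.visits, size = rep(N, n.visits), prob = p),
            nrow = n.sites, ncol = n.visits)

# The largest count at a site can never exceed the true abundance there
naive.max <- apply(y, 1, max)
mean(N)          # true mean abundance in this simulation
mean(naive.max)  # naive index from raw counts, biased low when p < 1
```

The gap between the two means is the reason for modeling detection explicitly instead of analyzing the raw counts.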
References:
Fiske, I. and R. B. Chandler. 2011. unmarked: An R package for fitting hierarchical models of wildlife
occurrence and abundance. Journal of Statistical Software 43(10), 1–23.
Royle, J. A. 2004. N-mixture models for estimating population size from spatially replicated
counts. Biometrics 60, 108–115.
Kéry, M., J. A. Royle, and H. Schmid. 2005. Modeling avian abundance from replicated counts using
binomial mixture models. Ecological Applications 15(4), 1450–1461.
Package unmarked can be downloaded at: http://cran.r-project.org/web/packages/unmarked/index.html
Lots of information on package unmarked and a help forum can be found at the authors’ website:
https://sites.google.com/site/unmarkedinfo/
My comments follow the # symbol and precede the code.
Calibri = the code
Courier New = the output
#
# INSTALL PACKAGE UNMARKED
#
# Installs directly from the CRAN website
install.packages("unmarked")
# Load the package
library(unmarked)
# Set the working directory to where your file is
setwd("C:/Users/Sinead/Documents/R_Seminar")
# Check that you have the right working directory saved (optional)
getwd()
#
# SET-UP DATA FOR ANALYSIS
#
# Import data
# This is repeated count data of Alder Flycatchers, with site covariates for vegetation structure and
# percent woody cover, and detection covariates for the time and date of each visit
# str() checks the structure of the data
alfl.data <- read.csv("alfl.csv", row.names=1)
str(alfl.data)
'data.frame':   50 obs. of  12 variables:
$ alfl1 : int 1 2 0 1 0 3 0 0 0 0 ...
$ alfl2 : int 0 3 1 0 0 1 0 0 1 0 ...
$ alfl3 : int 1 0 1 1 0 1 0 0 2 0 ...
$ struct: num 5.45 4.75 14.7 5.05 4.15 9.75 9.6 15.7 9.2 7.75 ...
$ woodyO: int 6 1 7 6 2 8 4 0 4 3 ...
$ woody : num 0.3 0.05 0.35 0.3 0.1 0.4 0.2 0 0.2 0.15 ...
$ time.1: num 8.68 9.43 8.25 7.77 9.57 9.1 8.6 8.12 7.63 9.92 ...
$ time.2: num 8.73 7.4 6.7 6.23 9.55 9.12 8.62 7.92 7.43 5.67 ...
$ time.3: num 5.72 7.58 7.62 7.17 5.73 9.12 6.72 8.07 7.6 9.72 ...
$ date.1: int 6 20 20 20 8 8 8 1 1 1 ...
$ date.2: int 25 32 32 32 27 27 27 27 27 27 ...
$ date.3: int 34 54 47 47 36 36 36 36 36 36 ...
# Create an object that will pull out the count data
alfl.y <- alfl.data[,c("alfl1", "alfl2", "alfl3")]
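# Standardize the site covariates (subtract the mean, divide by the SD) so they are all on the same scale.
# Covariates with very different ranges make the coefficients hard to compare and can cause
# problems when the model is fitted.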
woody.mean <- mean(alfl.data$woody)
woody.sd <- sd(alfl.data$woody)
woody.z <- (alfl.data$woody-woody.mean)/woody.sd
struct.mean <- mean(alfl.data$struct)
struct.sd <- sd(alfl.data$struct)
struct.z <- (alfl.data$struct-struct.mean)/struct.sd
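As a cross-check on the manual standardization, base R's scale() should give the same z-scores. The sketch below uses only the first six woody values printed by str() above, so the numbers will differ from those for the full data set; the object names ending in .manual and .scale are my own.

```r
# First six woody values from the str() output (a toy subset, not the full data)
woody <- c(0.30, 0.05, 0.35, 0.30, 0.10, 0.40)

# Manual z-score, as in the tutorial code
woody.z.manual <- (woody - mean(woody)) / sd(woody)

# scale() centers on the mean and divides by the sample SD;
# it returns a one-column matrix, so convert it back to a plain vector
woody.z.scale <- as.numeric(scale(woody))

all.equal(woody.z.manual, woody.z.scale)

# Back-transform a standardized value (here z = 1) to the original units
z <- 1
z * sd(woody) + mean(woody)
```

The back-transformation on the last line is the same arithmetic used later to relabel a plot axis in original units.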
# Create your unmarkedFrame
# Function pcount fits the N-Mixture model of Royle
# Site covariates stay the same. Observation covariates (time and date) change with each visit.
# “y” is the count data we pulled out earlier
library(unmarked)
alfl.umf <- unmarkedFramePCount(y=alfl.y,
siteCovs=data.frame(woody=woody.z, struct=struct.z),
obsCovs=list(time=alfl.data[,c("time.1", "time.2", "time.3")],
date=alfl.data[,c("date.1", "date.2", "date.3")]))
summary(alfl.umf)
unmarkedFrame Object
50 sites
Maximum number of observations per site: 3
Mean number of observations per site: 3
Sites with at least one detection: 34
Tabulation of y observations:
   0    1    2    3 <NA>
  85   42   17    6    0

Site-level covariates:
     woody             struct
 Min.   :-1.5967   Min.   :-1.80997
 1st Qu.:-0.6019   1st Qu.:-0.77096
 Median :-0.1045   Median : 0.02358
 Mean   : 0.0000   Mean   : 0.00000
 3rd Qu.: 0.6417   3rd Qu.: 0.60241
 Max.   : 2.3826   Max.   : 3.20894

Observation-level covariates:
      time            date
 Min.   :4.900   Min.   : 1.00
 1st Qu.:6.550   1st Qu.:11.00
 Median :7.560   Median :27.00
 Mean   :7.553   Mean   :24.83
 3rd Qu.:8.620   3rd Qu.:34.00
 Max.   :9.920   Max.   :54.00
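The summary shows time and date on their raw scales, while the site covariates were standardized. If you also want the observation covariates standardized, one option is to pool all visits when computing the mean and SD, so that every visit column shares one scale. A sketch using the first three sites' visit times from the str() output; the names time.mat, time.mean, time.sd and time.z are my own.

```r
# Visit times (hours) for the first three sites, copied from the str() output
toy <- data.frame(time.1 = c(8.68, 9.43, 8.25),
                  time.2 = c(8.73, 7.40, 6.70),
                  time.3 = c(5.72, 7.58, 7.62))

# Pool every visit to get a single mean and a single SD for the covariate
time.mat  <- as.matrix(toy)
time.mean <- mean(time.mat)
time.sd   <- sd(as.vector(time.mat))

# Standardize; the result keeps the sites-by-visits layout
time.z <- (time.mat - time.mean) / time.sd
time.z
```

A table like as.data.frame(time.z) could then take the place of the raw time columns in the obsCovs list.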
#
# FIT MODELS
#
# Fit models to your data and extract the estimates
# Detection covariates follow the first tilde and abundance covariates follow the second.
# The detection process is always modeled as binomial. The abundance process is modeled as Poisson
# by default, but you can specify zero-inflated Poisson ("ZIP") or negative binomial ("NB") using
# "mixture=". Some of the models compared below use these alternatives.
# Detection estimates are on the logit scale and abundance estimates are on the log scale.
# The first fitted model (fm1) has no covariates, so it estimates only the two intercepts.
# I am fitting 10 models here, so there will be a lot of code and output. The key is to look at the
# parameter estimates for the BEST model, which is "fm4". For the purposes of this tutorial, you can
# skim the other 9 models and scroll down to the parameter estimates for "fm4", where I explain in
# further detail.
# When you run this code you will get a warning (not an error): "K was not specified and was set to...".
# The authors mention not to worry about this.
# K is the upper bound used when summing over the possible abundance values at each site. If you don't
# set it, pcount chooses a value automatically. It should be high enough that raising it further does
# not change the parameter estimates, typically 100 or higher.
(fm1 <- pcount(~1 ~1, alfl.umf))
Call:
pcount(formula = ~1 ~ 1, data = alfl.umf)
Abundance:
 Estimate    SE    z P(>|z|)
    0.777 0.283 2.74 0.00608

Detection:
 Estimate    SE    z P(>|z|)
   -0.904 0.394 -2.3  0.0217

AIC: 313.9004
Warning message:
In pcount(~1 ~ 1, alfl.umf) : K was not specified and was set to 103.
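The warning about K relates to how the likelihood is computed: the abundance N at each site is unknown, so the model sums over every possible N up to K. Below is a rough base-R sketch of that sum for a single site, plugging in the fm1 estimates printed above. The function name site.lik is my own, and this illustrates the idea only; it is not pcount's internal code.

```r
# Likelihood contribution of one site under the Poisson N-mixture model,
# truncating the sum over the unknown abundance N at K
site.lik <- function(y, lambda, p, K) {
  stopifnot(K >= max(y))
  N <- max(y):K                      # N cannot be smaller than the largest count
  prior <- dpois(N, lambda)          # Pr(N) under the Poisson abundance model
  # Pr(y | N): product of binomial probabilities across visits, for each N
  obs <- sapply(N, function(n) prod(dbinom(y, size = n, prob = p)))
  sum(prior * obs)
}

y.site <- c(1, 0, 1)        # counts at the first site (from the str() output)
lambda <- exp(0.777)        # fm1 abundance intercept, back-transformed from the log scale
p      <- plogis(-0.904)    # fm1 detection intercept, back-transformed from the logit scale

# The likelihood stops changing once K is well above any plausible abundance
sapply(c(5, 10, 50, 103), function(K) site.lik(y.site, lambda, p, K))
```

If the values were still changing at the largest K, that would be a sign to refit with a larger K.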
backTransform(fm1, type="state")
Backtransformed linear combination(s) of Abundance estimate(s)
 Estimate    SE LinComb (Intercept)
     2.17 0.615   0.777           1
backTransform(fm1, type="det")
Backtransformed linear combination(s) of Detection estimate(s)
 Estimate     SE LinComb (Intercept)
    0.288 0.0808  -0.904           1
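The back-transformed values can be reproduced by hand, which helps show what backTransform() is doing. The sketch below applies the inverse link functions to the fm1 intercepts and uses a delta-method approximation for the standard errors. That the package uses the same approximation is my assumption, although the numbers agree with the output above.

```r
est.lam <- 0.777;  se.lam <- 0.283   # fm1 abundance intercept and SE (log scale)
est.p   <- -0.904; se.p   <- 0.394   # fm1 detection intercept and SE (logit scale)

# Inverse links: exp() for abundance, inverse logit for detection
lambda.hat <- exp(est.lam)
p.hat      <- plogis(est.p)

# Delta method: link-scale SE times the derivative of the inverse link
se.lambda <- lambda.hat * se.lam
se.p.hat  <- p.hat * (1 - p.hat) * se.p

round(c(lambda = lambda.hat, SE = se.lambda), 3)
round(c(p = p.hat, SE = se.p.hat), 4)
```

So under the null model, the expected abundance is a little over two birds per site, and each bird has roughly a 29% chance of being counted on a given visit.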
(fm2 <- pcount(~date+time ~1, alfl.umf))
Call:
pcount(formula = ~date + time ~ 1, data = alfl.umf)
Abundance:
 Estimate    SE    z P(>|z|)
    0.436 0.164 2.65 0.00802

Detection:
            Estimate    SE     z  P(>|z|)
(Intercept)   -0.352 0.274 -1.28 1.99e-01
date          -0.768 0.176 -4.35 1.35e-05
time          -0.460 0.172 -2.67 7.61e-03

AIC: 286.6671
Warning message:
In pcount(~date + time ~ 1, alfl.umf) :
K was not specified and was set to 103.
(fm3 <- pcount(~date+time ~woody, alfl.umf))
Call:
pcount(formula = ~date + time ~ woody, data = alfl.umf)
Abundance:
            Estimate    SE    z  P(>|z|)
(Intercept)    0.379 0.191 1.98 0.047647
woody          0.466 0.124 3.77 0.000162

Detection:
            Estimate    SE     z  P(>|z|)
(Intercept)   -0.461 0.312 -1.48 1.40e-01
date          -0.720 0.179 -4.02 5.72e-05
time          -0.425 0.175 -2.43 1.51e-02

AIC: 274.868
Warning message:
In pcount(~date + time ~ woody, alfl.umf) :
K was not specified and was set to 103.
# "fm4" is the best model (it has the lowest AIC of all the models, 274.2125). Notice the estimates
# for abundance. Both are positive, indicating that expected Alder Flycatcher abundance increases
# with percent woody cover (0.411) and with vegetation structure (0.213), although the struct
# effect is only weakly supported (p = 0.09).
# The estimates for detection are both negative, indicating that Alder Flycatcher detectability
# decreases later in the season (date = -0.701) and later in the day (time = -0.457).
(fm4 <- pcount(~date+time ~woody+struct, alfl.umf))
Call:
pcount(formula = ~date + time ~ woody + struct, data = alfl.umf)
Abundance:
            Estimate    SE    z P(>|z|)
(Intercept)    0.389 0.198 1.97 0.04904
woody          0.411 0.127 3.24 0.00121
struct         0.213 0.126 1.69 0.09145

Detection:
            Estimate    SE     z  P(>|z|)
(Intercept)   -0.519 0.321 -1.62 1.06e-01
date          -0.701 0.177 -3.96 7.49e-05
time          -0.457 0.175 -2.61 9.00e-03

AIC: 274.2125
Warning message:
In pcount(~date + time ~ woody + struct, alfl.umf) :
K was not specified and was set to 103.
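Because the fm4 estimates are on the log (abundance) and logit (detection) scales, exponentiating them gives effect sizes that are easier to read. The sketch below also adds approximate 95% Wald intervals (estimate plus or minus 1.96 SE), which is a rough approximation I am adding here and not something the tutorial output reports.

```r
# fm4 slope estimates and SEs, copied from the output above
est <- c(woody = 0.411, struct = 0.213, date = -0.701, time = -0.457)
se  <- c(woody = 0.127, struct = 0.126, date = 0.177,  time = 0.175)

# Approximate 95% Wald intervals on the link scale
lower <- est - qnorm(0.975) * se
upper <- est + qnorm(0.975) * se

# Exponentiate: abundance terms (woody, struct) become multiplicative changes in
# expected abundance per 1 SD of the covariate; detection terms (date, time)
# become odds ratios per unit of the covariate as supplied to the model
round(exp(cbind(estimate = est, lower = lower, upper = upper)), 2)
```

An interval that includes 1 on this scale (as it does for struct) corresponds to a link-scale interval that includes 0.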
(fm5 <- pcount(~date+time ~1, alfl.umf,mixture="NB"))
Call:
pcount(formula = ~date + time ~ 1, data = alfl.umf, mixture = "NB")
Abundance:
 Estimate    SE    z P(>|z|)
    0.477 0.195 2.44  0.0145

Detection:
            Estimate    SE     z  P(>|z|)
(Intercept)   -0.422 0.313 -1.35 1.78e-01
date          -0.754 0.177 -4.27 1.98e-05
time          -0.468 0.173 -2.70 6.86e-03

Dispersion:
 Estimate   SE   z P(>|z|)
     1.94 1.77 1.1   0.271

AIC: 288.2807
Warning message:
In pcount(~date + time ~ 1, alfl.umf, mixture = "NB") :
K was not specified and was set to 103.
(fm6 <- pcount(~date+time ~1, alfl.umf,mixture="ZIP"))
Call:
pcount(formula = ~date + time ~ 1, data = alfl.umf, mixture = "ZIP")
Abundance:
 Estimate    SE    z P(>|z|)
    0.622 0.243 2.56  0.0105

Detection:
            Estimate    SE     z  P(>|z|)
(Intercept)   -0.443 0.314 -1.41 1.58e-01
date          -0.745 0.178 -4.19 2.82e-05
time          -0.447 0.173 -2.59 9.67e-03

Zero-inflation:
 Estimate    SE     z P(>|z|)
    -1.93 0.993 -1.94  0.0523

AIC: 287.5286
Warning message:
In pcount(~date + time ~ 1, alfl.umf, mixture = "ZIP") :
K was not specified and was set to 103.
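To see how the three abundance distributions differ, it can help to compare their probability mass functions directly. This is a base-R sketch with made-up parameter values (lambda, psi and alpha below are illustrative, not estimates from these models); dzip is a small helper I define here, since base R has no zero-inflated Poisson function.

```r
# Zero-inflated Poisson pmf: with probability psi a site is a structural zero,
# otherwise its abundance is Poisson(lambda)
dzip <- function(x, lambda, psi) {
  psi * (x == 0) + (1 - psi) * dpois(x, lambda)
}

lambda <- 2      # illustrative mean abundance (assumed)
psi    <- 0.3    # illustrative zero-inflation probability (assumed)
alpha  <- 1.5    # illustrative negative binomial size parameter (assumed)

x <- 0:8
round(rbind(Poisson = dpois(x, lambda),
            ZIP     = dzip(x, lambda, psi),
            NB      = dnbinom(x, size = alpha, mu = lambda)), 3)

# The NB variance exceeds the Poisson variance by mu^2 / size
c(poisson.var = lambda, nb.var = lambda + lambda^2 / alpha)
```

Both alternatives put more probability on zero than the Poisson does, which is why they are candidates when many sites have no birds.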
(fm7 <- pcount(~date+time ~woody,alfl.umf,mixture="ZIP"))
Call:
pcount(formula = ~date + time ~ woody, data = alfl.umf, mixture =
"ZIP")
Abundance:
            Estimate    SE    z  P(>|z|)
(Intercept)    0.378 0.191 1.98 0.047702
woody          0.466 0.124 3.76 0.000167

Detection:
            Estimate    SE     z  P(>|z|)
(Intercept)   -0.461 0.312 -1.48 1.40e-01
date          -0.719 0.179 -4.02 5.76e-05
time          -0.426 0.175 -2.43 1.49e-02

Zero-inflation:
 Estimate SE      z P(>|z|)
    -8.87 27 -0.329   0.742

AIC: 276.8709
Warning message:
In pcount(~date + time ~ woody, alfl.umf, mixture = "ZIP") :
K was not specified and was set to 103.
(fm8 <- pcount(~date+time ~struct,alfl.umf,mixture="ZIP"))
Call:
pcount(formula = ~date + time ~ struct, data = alfl.umf, mixture =
"ZIP")
Abundance:
            Estimate    SE    z P(>|z|)
(Intercept)    0.528 0.235 2.25  0.0243
struct         0.321 0.131 2.46  0.0139

Detection:
            Estimate    SE     z  P(>|z|)
(Intercept)   -0.488 0.314 -1.56 1.20e-01
date          -0.723 0.175 -4.13 3.61e-05
time          -0.488 0.172 -2.83 4.64e-03

Zero-inflation:
 Estimate   SE     z P(>|z|)
    -2.62 1.65 -1.59   0.113

AIC: 284.2515
Warning message:
In pcount(~date + time ~ struct, alfl.umf, mixture = "ZIP") :
K was not specified and was set to 103.
(fm9 <- pcount(~date+time ~woody+struct, alfl.umf,mixture="ZIP"))
Call:
pcount(formula = ~date + time ~ woody + struct, data = alfl.umf,
mixture = "ZIP")
Abundance:
            Estimate    SE    z P(>|z|)
(Intercept)    0.389 0.197 1.97 0.04904
woody          0.411 0.127 3.24 0.00121
struct         0.213 0.126 1.69 0.09152

Detection:
            Estimate    SE     z  P(>|z|)
(Intercept)   -0.519 0.321 -1.61 1.06e-01
date          -0.701 0.177 -3.96 7.48e-05
time          -0.457 0.175 -2.61 8.99e-03

Zero-inflation:
 Estimate   SE      z P(>|z|)
    -11.5 88.2 -0.131   0.896

AIC: 276.2128
Warning message:
In pcount(~date + time ~ woody + struct, alfl.umf, mixture = "ZIP") :
K was not specified and was set to 103.
(fm10<- pcount(~date+time ~woody+struct, alfl.umf,mixture="NB"))
Call:
pcount(formula = ~date + time ~ woody + struct, data = alfl.umf,
mixture = "NB")
Abundance:
            Estimate    SE    z P(>|z|)
(Intercept)    0.389 0.198 1.97 0.04904
woody          0.411 0.127 3.24 0.00121
struct         0.213 0.126 1.69 0.09145

Detection:
            Estimate    SE     z  P(>|z|)
(Intercept)   -0.519 0.321 -1.62 1.06e-01
date          -0.701 0.177 -3.96 7.49e-05
time          -0.457 0.175 -2.61 9.00e-03

Dispersion:
 Estimate   SE     z P(>|z|)
       11 84.8 0.129   0.897

AIC: 276.2128
Warning message:
In pcount(~date + time ~ woody + struct, alfl.umf, mixture = "NB") :
K was not specified and was set to 103.
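The zero-inflation estimate in fm9 (-11.5) and the dispersion estimate in fm10 (11) both have very large standard errors. Assuming, as I read the pcount documentation, that zero-inflation is reported on the logit scale and dispersion on the log scale, a quick back-transformation suggests why: both models have effectively collapsed to the Poisson model fm4. A sketch of that reasoning:

```r
# Back-transform the extra parameters (scales are my assumption, see above)
psi.fm9    <- plogis(-11.5)   # zero-inflation probability, essentially 0
alpha.fm10 <- exp(11)         # NB size parameter, very large

# Extra variance the NB adds over the Poisson, evaluated at the fm4 intercept mean
mu <- exp(0.389)
extra.var <- mu^2 / alpha.fm10

# AIC = 2k - 2*logLik: the same log-likelihood with one extra parameter costs 2 AIC units
aic.fm4 <- 274.2125; aic.fm9 <- 276.2128; aic.fm10 <- 276.2128
c(psi = psi.fm9, extra.var = extra.var,
  dAIC.fm9 = aic.fm9 - aic.fm4, dAIC.fm10 = aic.fm10 - aic.fm4)
```

This is consistent with fm9 and fm10 having the same abundance and detection estimates as fm4 and an AIC almost exactly 2 units higher.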
#
# MODEL SELECTION
#
# Put the fitted models in a "fitList". A fitList is a named collection of models fitted to the same data.
# fitList() organizes the models for model selection.
fms <- fitList("lam(.)p(.)"                       = fm1,
               "lam(.)p(date+time)"               = fm2,
               "lam(woody)p(date+time)"           = fm3,
               "lam(woody+struct)p(date+time)"    = fm4,
               "lam(.)p(date+time)NB"             = fm5,
               "lam(.)p(date+time)ZIP"            = fm6,
               "lam(woody)p(date+time)ZIP"        = fm7,
               "lam(struct)p(date+time)ZIP"       = fm8,
               "lam(woody+struct)p(date+time)ZIP" = fm9,
               "lam(woody+struct)p(date+time)NB"  = fm10)
# Rank them by AIC with modSel()
# Once you run the code, the best model ("fm4", or "lam(woody+struct)p(date+time)") is listed first
# because it has the lowest AIC value.
# delta is the difference in AIC from the best model. A delta < 2 suggests the model has substantial support.
# The AIC weight is the relative support for a model within this candidate set (often read as the
# probability that it is the best of the models compared).
# Check the placement of your null model ("lam(.)p(.)"), which has no covariates. Ideally you want it
# to be near the bottom.
(ms <- modSel(fms))
                                 nPars    AIC delta   AICwt cumltvWt
lam(woody+struct)p(date+time)        6 274.21  0.00 3.7e-01     0.37
lam(woody)p(date+time)               5 274.87  0.66 2.6e-01     0.63
lam(woody+struct)p(date+time)ZIP     7 276.21  2.00 1.3e-01     0.76
lam(woody+struct)p(date+time)NB      7 276.21  2.00 1.3e-01     0.90
lam(woody)p(date+time)ZIP            6 276.87  2.66 9.7e-02     1.00
lam(struct)p(date+time)ZIP           6 284.25 10.04 2.4e-03     1.00
lam(.)p(date+time)                   4 286.67 12.45 7.2e-04     1.00
lam(.)p(date+time)ZIP                5 287.53 13.32 4.7e-04     1.00
lam(.)p(date+time)NB                 5 288.28 14.07 3.2e-04     1.00
lam(.)p(.)                           2 313.90 39.69 8.8e-10     1.00
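The delta and weight columns follow directly from the AIC values, so they can be recomputed by hand as a check. The sketch below uses the standard Akaike weight formula with the AIC values copied from the model output above; the short names fm1 to fm10 stand in for the longer fitList labels.

```r
# AIC values copied from the ten model fits above
aic <- c(fm1 = 313.9004, fm2 = 286.6671, fm3 = 274.868,  fm4 = 274.2125,
         fm5 = 288.2807, fm6 = 287.5286, fm7 = 276.8709, fm8 = 284.2515,
         fm9 = 276.2128, fm10 = 276.2128)

# delta: difference from the lowest AIC in the set
delta <- aic - min(aic)

# Akaike weights: relative likelihoods exp(-delta/2), normalized to sum to 1
wt <- exp(-delta / 2) / sum(exp(-delta / 2))

# Assemble a table ordered from best to worst, with cumulative weights
tab <- data.frame(AIC = aic, delta = delta, AICwt = wt)
tab <- tab[order(tab$AIC), ]
tab$cumltvWt <- cumsum(tab$AICwt)
round(tab, 2)
```

Note that the second-ranked model (woody only) is within 1 AIC unit of fm4, so the two are close competitors.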
#
# ANALYSIS AND PREDICTIONS
#
# We may be interested in predicting the expected detection probability as a function of time of day
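# The time grid below is on a standardized scale (mean 0, SD 1), so it assumes that time and date were
# standardized before fitting. The unmarkedFrame built earlier stores them on their raw scale, so
# adjust the grid if your model was fit to raw values. We have to fix the date at an arbitrary value (0 here).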
# predict() returns predicted values from fitted model objects
newData1 <- data.frame(time=seq(-2.08, 1.86, by=0.1), date=0)
E.p <- predict(fm4, type="det", newdata=newData1, appendData=TRUE)
head(E.p)
  Predicted        SE     lower     upper  time date
1 0.6062208 0.1305971 0.3450656 0.8181269 -2.08    0
2 0.5952590 0.1283500 0.3411001 0.8068848 -1.98    0
3 0.5842014 0.1259704 0.3370719 0.7951837 -1.88    0
4 0.5730583 0.1234691 0.3329757 0.7830342 -1.78    0
5 0.5618406 0.1208577 0.3288059 0.7704513 -1.68    0
6 0.5505593 0.1181491 0.3245565 0.7574540 -1.58    0
# Plot it
# ylim is the y-axis and goes from 0 to 1 for probability
plot(Predicted ~ time, E.p, type="l", ylim=c(0,1),
xlab="time of day (standardized)",
ylab="Expected detection probability")
lines(lower ~ time, E.p, type="l", col=gray(0.5))
lines(upper ~ time, E.p, type="l", col=gray(0.5))
(this produces a plot in R of the relationship between time of day and
expected detection probability)
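The predicted detection curve can be approximated by hand from the fm4 detection estimates, which shows what predict() computes for type="det". This sketch uses the rounded coefficients from the printed output, so it agrees with head(E.p) only to about three decimal places.

```r
# fm4 detection estimates (logit scale), copied from the model output
b0     <- -0.519
b.date <- -0.701
b.time <- -0.457

# Same grid as newData1: time varies, date is held at 0
time.grid  <- seq(-2.08, 1.86, by = 0.1)
date.fixed <- 0

# Linear predictor on the logit scale, then inverse logit to get a probability
p.hat <- plogis(b0 + b.date * date.fixed + b.time * time.grid)
head(round(p.hat, 4))
```

The negative time coefficient is what makes the curve slope downward across the grid.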
# Find the expected abundance over the range of the percent cover of "woody" vegetation.
# Note that struct changes along with woody in this grid, so the curve reflects both covariates.
newData2 <- data.frame(woody=seq(-1.6, 2.38,,50),struct=seq(-1.8,3.2,,50))
E.N <- predict(fm4, type="state", newdata=newData2, appendData=TRUE)
head(E.N)
  Predicted        SE     lower    upper     woody    struct
1 0.5213254 0.1851973 0.2598516 1.045905 -1.600000 -1.800000
2 0.5508414 0.1893141 0.2808563 1.080361 -1.518776 -1.697959
3 0.5820285 0.1934132 0.3034468 1.116364 -1.437551 -1.595918
4 0.6149814 0.1974977 0.3277205 1.154039 -1.356327 -1.493878
5 0.6497999 0.2015747 0.3537757 1.193524 -1.275102 -1.391837
6 0.6865898 0.2056563 0.3817107 1.234981 -1.193878 -1.289796
# Plot it, but convert the x-axis back to original scale by multiplying by the sd and adding back the mean
plot(Predicted ~ woody, E.N, type="l", ylim=c(-.1,max(E.N$Predicted)),
xlab="Percent cover - woody vegetation",
ylab="Expected abundance, E[N]",
xaxt="n")
xticks <- -1:2
xlabs <- xticks*woody.sd + woody.mean
axis(1, at=xticks, labels=round(xlabs, 1))
lines(lower ~ woody, E.N, type="l", col=gray(0.5))
lines(upper ~ woody, E.N, type="l", col=gray(0.5))
(this produces a plot in R of the relationship between percent cover
woody veg and expected abundance)
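The prediction grid used above raises woody and struct together, so the plotted curve mixes both effects. If the aim is to show the effect of woody cover alone, one option is to hold struct at 0, its mean on the standardized scale. A by-hand sketch of both versions using the rounded fm4 abundance estimates; the names EN.joint and EN.woody are my own.

```r
# fm4 abundance estimates (log scale), copied from the model output
b0       <- 0.389
b.woody  <- 0.411
b.struct <- 0.213

woody.grid  <- seq(-1.6, 2.38, length.out = 50)
struct.grid <- seq(-1.8, 3.2,  length.out = 50)

# Joint grid, as in newData2: both covariates increase together
EN.joint <- exp(b0 + b.woody * woody.grid + b.struct * struct.grid)

# Alternative: vary woody only, holding struct at its mean (0 after standardizing)
EN.woody <- exp(b0 + b.woody * woody.grid + b.struct * 0)

head(round(cbind(woody = woody.grid, joint = EN.joint, woody.only = EN.woody), 3))
```

The joint column should match head(E.N) to about three decimal places; the woody.only column is what you would get from predict() with struct=0 in newdata.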