Fitting Royle's N-mixture Model with Package unmarked in R
Package unmarked written by Ian Fiske and Richard Chandler
Tutorial by Sinead Borchert, 4/16/13

Package unmarked includes code for modeling the occupancy (presence/absence), abundance, and density of unmarked animals. In this tutorial I discuss the pcount function (using sample code provided by the package authors), which models abundance for a closed population. The data are repeated counts of animals at each site in the field. These counts are used to relate abundance to site covariates (e.g. habitat) and detection probability to detection covariates (e.g. wind, time of day). The ultimate goal is to turn observed field counts into estimates of abundance that account for imperfect detection, and to determine how environmental factors contribute to variation in abundance.

References:
Fiske, I. and R. B. Chandler. 2011. unmarked: An R package for fitting hierarchical models of wildlife occurrence and abundance. Journal of Statistical Software 43(10), 1–23.
Royle, J. A. 2004. N-Mixture Models for Estimating Population Size from Spatially Replicated Counts. Biometrics 60, 108–115.
Kéry, M., J. A. Royle, and H. Schmid. 2005. Modeling Avian Abundance from Replicated Counts Using Binomial Mixture Models. Ecological Applications 15(4), 1450–1461.

Package unmarked can be downloaded at: http://cran.r-project.org/web/packages/unmarked/index.html
Lots of information on package unmarked and a help forum can be found at the authors' website: https://sites.google.com/site/unmarkedinfo/

My comments follow the # symbol and precede the code.
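To make the data structure concrete, here is a minimal base-R simulation sketch (no unmarked functions) of the process that Royle's N-mixture model assumes. Each site has a latent abundance N drawn from a Poisson distribution with mean lambda, and each visit produces a count drawn from a binomial distribution with N trials and detection probability p. The values of sim.lambda and sim.p are hypothetical and chosen only for illustration; the 50 sites and 3 visits mirror the Alder Flycatcher data used below.

```r
# Hypothetical simulation of the repeated-count design behind pcount():
#   N[i]    ~ Poisson(lambda)      latent abundance at site i
#   y[i, j] ~ Binomial(N[i], p)    count on visit j to site i
set.seed(42)                 # arbitrary seed for reproducibility
n.sites    <- 50             # number of sites (as in the ALFL data)
n.visits   <- 3              # repeat visits per site
sim.lambda <- 2              # hypothetical mean abundance per site
sim.p      <- 0.3            # hypothetical per-visit detection probability

sim.N <- rpois(n.sites, sim.lambda)   # true (unobserved) abundance per site

# rep(sim.N, n.visits) lines up with the column-major fill of matrix(),
# so row i of every column uses the same N[i]
sim.y <- matrix(rbinom(n.sites * n.visits, size = rep(sim.N, n.visits), prob = sim.p),
                nrow = n.sites, ncol = n.visits)

head(cbind(N = sim.N, sim.y))  # true abundance next to the three counts
```

Because p is less than 1, the counts at a site are typically smaller than its true N; the N-mixture model uses the variation among repeat counts at the same site to separate abundance from detection.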
R output appears directly below the command that produces it.

# # INSTALL PACKAGE UNMARKED # #

# Install directly from CRAN
install.packages("unmarked")

# Load the package
library(unmarked)

# Set the working directory to the folder that contains your data file
setwd("C:/Users/Sinead/Documents/R_Seminar")

# Check that you have the right working directory (optional)
getwd()

# # SET UP DATA FOR ANALYSIS # #

# Import data
# These are repeated counts of Alder Flycatchers (three visits per site) with site covariates for vegetation structure and percent woody cover, and detection covariates for the time and date of each visit
# str() checks the structure of the data
alfl.data <- read.csv("alfl.csv", row.names=1)
str(alfl.data)

'data.frame': 50 obs. of 12 variables:
 $ alfl1 : int 1 2 0 1 0 3 0 0 0 0 ...
 $ alfl2 : int 0 3 1 0 0 1 0 0 1 0 ...
 $ alfl3 : int 1 0 1 1 0 1 0 0 2 0 ...
 $ struct: num 5.45 4.75 14.7 5.05 4.15 9.75 9.6 15.7 9.2 7.75 ...
 $ woodyO: int 6 1 7 6 2 8 4 0 4 3 ...
 $ woody : num 0.3 0.05 0.35 0.3 0.1 0.4 0.2 0 0.2 0.15 ...
 $ time.1: num 8.68 9.43 8.25 7.77 9.57 9.1 8.6 8.12 7.63 9.92 ...
 $ time.2: num 8.73 7.4 6.7 6.23 9.55 9.12 8.62 7.92 7.43 5.67 ...
 $ time.3: num 5.72 7.58 7.62 7.17 5.73 9.12 6.72 8.07 7.6 9.72 ...
 $ date.1: int 6 20 20 20 8 8 8 1 1 1 ...
 $ date.2: int 25 32 32 32 27 27 27 27 27 27 ...
 $ date.3: int 34 54 47 47 36 36 36 36 36 36 ...

# Pull out the count data (one column per visit)
alfl.y <- alfl.data[,c("alfl1", "alfl2", "alfl3")]

# Standardize the site covariates (subtract the mean, divide by the standard deviation) so that they are all on a common scale; this makes their coefficients directly comparable and can help the model-fitting routine converge
woody.mean <- mean(alfl.data$woody)
woody.sd <- sd(alfl.data$woody)
woody.z <- (alfl.data$woody - woody.mean)/woody.sd
struct.mean <- mean(alfl.data$struct)
struct.sd <- sd(alfl.data$struct)
struct.z <- (alfl.data$struct - struct.mean)/struct.sd

# Create your unmarkedFrame
# Function pcount fits Royle's N-mixture model, using the data stored in an unmarkedFramePCount
# Site covariates stay the same across visits. Observation covariates (time and date) change with each visit.
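As a quick check of the standardization step, the sketch below applies the same z-score formula to the first ten woody values printed by str() above and compares it with base R's scale(), which by default centres by the mean and divides by the standard deviation. This uses only ten values for illustration, so the numbers differ from the tutorial's woody.z; the names woody10, z.manual, and z.scale are hypothetical and chosen so they do not overwrite the tutorial's objects.

```r
# First ten woody-cover values from the str() output (illustration only)
woody10 <- c(0.3, 0.05, 0.35, 0.3, 0.1, 0.4, 0.2, 0, 0.2, 0.15)

# Manual z-score, as in the tutorial code
z.manual <- (woody10 - mean(woody10)) / sd(woody10)

# scale() centres and scales by default; as.vector() drops its attributes
z.scale <- as.vector(scale(woody10))

all.equal(z.manual, z.scale)   # TRUE
c(mean = mean(z.manual), sd = sd(z.manual))   # mean ~ 0, sd = 1
```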
# "y" is the count data we pulled out earlier
alfl.umf <- unmarkedFramePCount(y=alfl.y,
    siteCovs=data.frame(woody=woody.z, struct=struct.z),
    obsCovs=list(time=alfl.data[,c("time.1", "time.2", "time.3")],
                 date=alfl.data[,c("date.1", "date.2", "date.3")]))
summary(alfl.umf)

unmarkedFrame Object

50 sites
Maximum number of observations per site: 3
Mean number of observations per site: 3
Sites with at least one detection: 34

Tabulation of y observations:
   0    1    2    3 <NA>
  85   42   17    6    0

Site-level covariates:
     woody             struct
 Min.   :-1.5967   Min.   :-1.80997
 1st Qu.:-0.6019   1st Qu.:-0.77096
 Median :-0.1045   Median : 0.02358
 Mean   : 0.0000   Mean   : 0.00000
 3rd Qu.: 0.6417   3rd Qu.: 0.60241
 Max.   : 2.3826   Max.   : 3.20894

Observation-level covariates:
      time            date
 Min.   :4.900   Min.   : 1.00
 1st Qu.:6.550   1st Qu.:11.00
 Median :7.560   Median :27.00
 Mean   :7.553   Mean   :24.83
 3rd Qu.:8.620   3rd Qu.:34.00
 Max.   :9.920   Max.   :54.00

# time and date are still on their original scales here. The detection estimates below and the predictions at the end of this tutorial (time from -2.08 to 1.86, date fixed at 0) are on the standardized scale, so standardize the observation covariates as well before fitting
obsCovs(alfl.umf) <- scale(obsCovs(alfl.umf))

# # FIT MODELS # #

# Fit models to your data and extract the estimates
# In the pcount formula, detection covariates follow the first tilde and abundance covariates follow the second (~detection ~abundance)
# The detection process is always modeled as binomial. The abundance process is modeled as Poisson by default, but you can also specify a zero-inflated Poisson (mixture="ZIP") or a negative binomial (mixture="NB")
# Some of the models compared below use the ZIP or NB mixture
# Abundance estimates are on the log scale and detection estimates are on the logit scale
# The first model has no covariates (intercepts only) and serves as the null model
# I am fitting 10 models here, so there is a lot of code. The key is to look at the parameter estimates for the BEST model, which is "fm4". For the purposes of this tutorial, disregard the other 9 models and scroll down to the parameter estimates for "fm4", where I explain them in further detail.
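The model description above can be made concrete with a small base-R sketch (not unmarked's internal code) of the likelihood contribution of one site under Poisson abundance and binomial detection. Because the true abundance N is unknown, the likelihood sums over every possible value of N, from the largest count observed at the site up to an upper bound K; the model is fitted by maximizing the product of these site terms over all sites. The function name site.lik is hypothetical; the counts (1, 0, 1) are site 1 of the Alder Flycatcher data, and lambda = 2.17 and p = 0.288 are the back-transformed fm1 estimates reported below, used here only as plausible values.

```r
# Likelihood of one site's counts under the Poisson N-mixture model:
#   L = sum over N = max(y)..K of  prod_j dbinom(y[j], N, p) * dpois(N, lambda)
site.lik <- function(y, lambda, p, K) {
  N.vals   <- max(y):K                      # N cannot be below the largest count
  det.part <- sapply(N.vals, function(N) prod(dbinom(y, size = N, prob = p)))
  sum(det.part * dpois(N.vals, lambda))     # mix over the Poisson prior on N
}

y1 <- c(1, 0, 1)                            # counts from site 1 (alfl1-alfl3)

# Compare a too-small truncation point with two large ones
lik.5   <- site.lik(y1, lambda = 2.17, p = 0.288, K = 5)
lik.103 <- site.lik(y1, lambda = 2.17, p = 0.288, K = 103)
lik.200 <- site.lik(y1, lambda = 2.17, p = 0.288, K = 200)
c(K5 = lik.5, K103 = lik.103, K200 = lik.200)
```

Truncating at K = 5 gives a slightly smaller value because it drops the terms for N > 5, while K = 103 and K = 200 agree to machine precision. This is why a sufficiently large K does not affect the estimates.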
# When you run this code you will get a warning message, "K was not specified and was set to ...". The authors note that you need not worry about this.
# K is the upper index of integration (the largest abundance considered at any site). If you do not set it, pcount sets it automatically to the maximum observed count plus 100 (here 3 + 100 = 103). K should be high enough that it does not affect the parameter estimates; you can check this by refitting with a larger value, e.g. pcount(~1 ~1, alfl.umf, K=200).
(fm1 <- pcount(~1 ~1, alfl.umf))

Call:
pcount(formula = ~1 ~ 1, data = alfl.umf)

Abundance:
 Estimate    SE    z P(>|z|)
    0.777 0.283 2.74 0.00608

Detection:
 Estimate    SE    z P(>|z|)
   -0.904 0.394 -2.3  0.0217

AIC: 313.9004
Warning message:
In pcount(~1 ~ 1, alfl.umf) : K was not specified and was set to 103.

backTransform(fm1, type="state")
Backtransformed linear combination(s) of Abundance estimate(s)

 Estimate    SE LinComb (Intercept)
     2.17 0.615   0.777           1

backTransform(fm1, type="det")
Backtransformed linear combination(s) of Detection estimate(s)

 Estimate     SE LinComb (Intercept)
    0.288 0.0808  -0.904           1

(fm2 <- pcount(~date+time ~1, alfl.umf))

Call:
pcount(formula = ~date + time ~ 1, data = alfl.umf)

Abundance:
 Estimate    SE    z P(>|z|)
    0.436 0.164 2.65 0.00802

Detection:
            Estimate    SE     z  P(>|z|)
(Intercept)   -0.352 0.274 -1.28 1.99e-01
date          -0.768 0.176 -4.35 1.35e-05
time          -0.460 0.172 -2.67 7.61e-03

AIC: 286.6671
Warning message:
In pcount(~date + time ~ 1, alfl.umf) : K was not specified and was set to 103.

(fm3 <- pcount(~date+time ~woody, alfl.umf))

Call:
pcount(formula = ~date + time ~ woody, data = alfl.umf)

Abundance:
            Estimate    SE    z  P(>|z|)
(Intercept)    0.379 0.191 1.98 0.047647
woody          0.466 0.124 3.77 0.000162

Detection:
            Estimate    SE     z  P(>|z|)
(Intercept)   -0.461 0.312 -1.48 1.40e-01
date          -0.720 0.179 -4.02 5.72e-05
time          -0.425 0.175 -2.43 1.51e-02

AIC: 274.868
Warning message:
In pcount(~date + time ~ woody, alfl.umf) : K was not specified and was set to 103.

# "fm4" is the best model (it has the lowest AIC of all the models, 274.2125). Notice the estimates for abundance.
# Both abundance estimates are positive, indicating that expected Alder Flycatcher abundance increases with percent woody cover (woody = 0.411) and with vegetation structure (struct = 0.213), although the structure effect is only marginally significant (P = 0.091). The estimates for detection are both negative, indicating that Alder Flycatcher detectability decreases later in the season (date = -0.701) and later in the day (time = -0.457).
(fm4 <- pcount(~date+time ~woody+struct, alfl.umf))

Call:
pcount(formula = ~date + time ~ woody + struct, data = alfl.umf)

Abundance:
            Estimate    SE    z P(>|z|)
(Intercept)    0.389 0.198 1.97 0.04904
woody          0.411 0.127 3.24 0.00121
struct         0.213 0.126 1.69 0.09145

Detection:
            Estimate    SE     z  P(>|z|)
(Intercept)   -0.519 0.321 -1.62 1.06e-01
date          -0.701 0.177 -3.96 7.49e-05
time          -0.457 0.175 -2.61 9.00e-03

AIC: 274.2125
Warning message:
In pcount(~date + time ~ woody + struct, alfl.umf) : K was not specified and was set to 103.

(fm5 <- pcount(~date+time ~1, alfl.umf, mixture="NB"))

Call:
pcount(formula = ~date + time ~ 1, data = alfl.umf, mixture = "NB")

Abundance:
 Estimate    SE    z P(>|z|)
    0.477 0.195 2.44  0.0145

Detection:
            Estimate    SE     z  P(>|z|)
(Intercept)   -0.422 0.313 -1.35 1.78e-01
date          -0.754 0.177 -4.27 1.98e-05
time          -0.468 0.173 -2.70 6.86e-03

Dispersion:
 Estimate   SE   z P(>|z|)
     1.94 1.77 1.1   0.271

AIC: 288.2807
Warning message:
In pcount(~date + time ~ 1, alfl.umf, mixture = "NB") : K was not specified and was set to 103.

(fm6 <- pcount(~date+time ~1, alfl.umf, mixture="ZIP"))

Call:
pcount(formula = ~date + time ~ 1, data = alfl.umf, mixture = "ZIP")

Abundance:
 Estimate    SE    z P(>|z|)
    0.622 0.243 2.56  0.0105

Detection:
            Estimate    SE     z  P(>|z|)
(Intercept)   -0.443 0.314 -1.41 1.58e-01
date          -0.745 0.178 -4.19 2.82e-05
time          -0.447 0.173 -2.59 9.67e-03

Zero-inflation:
 Estimate    SE     z P(>|z|)
    -1.93 0.993 -1.94  0.0523

AIC: 287.5286
Warning message:
In pcount(~date + time ~ 1, alfl.umf, mixture = "ZIP") : K was not specified and was set to 103.
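The fm1 back-transformations and the fm4 interpretation above can be reproduced by hand, since abundance uses a log link and detection uses a logit link. The sketch below uses only base R (exp() and plogis(), the inverse logit) on estimates copied from the output; the object names fm4.abund and fm4.det are hypothetical. Because the covariates are standardized, each slope refers to a 1-SD increase in that covariate.

```r
# fm1 intercepts (link scale) copied from the output
exp(0.777)       # abundance, inverse log link: about 2.17 birds per site
plogis(-0.904)   # detection, inverse logit link: about 0.288 per visit

# fm4 slopes, per 1 SD of each standardized covariate
fm4.abund <- c(woody = 0.411, struct = 0.213)
fm4.det   <- c(date = -0.701, time = -0.457)

exp(fm4.abund)   # multiplicative change in expected abundance per +1 SD
exp(fm4.det)     # multiplicative change in the odds of detection per +1 SD
```

For example, exp(0.411) is about 1.51, so a 1-SD increase in woody cover is associated with roughly a 50% increase in expected abundance, holding structure constant.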
(fm7 <- pcount(~date+time ~woody, alfl.umf, mixture="ZIP"))

Call:
pcount(formula = ~date + time ~ woody, data = alfl.umf, mixture = "ZIP")

Abundance:
            Estimate    SE    z  P(>|z|)
(Intercept)    0.378 0.191 1.98 0.047702
woody          0.466 0.124 3.76 0.000167

Detection:
            Estimate    SE     z  P(>|z|)
(Intercept)   -0.461 0.312 -1.48 1.40e-01
date          -0.719 0.179 -4.02 5.76e-05
time          -0.426 0.175 -2.43 1.49e-02

Zero-inflation:
 Estimate SE      z P(>|z|)
    -8.87 27 -0.329   0.742

AIC: 276.8709
Warning message:
In pcount(~date + time ~ woody, alfl.umf, mixture = "ZIP") : K was not specified and was set to 103.

(fm8 <- pcount(~date+time ~struct, alfl.umf, mixture="ZIP"))

Call:
pcount(formula = ~date + time ~ struct, data = alfl.umf, mixture = "ZIP")

Abundance:
            Estimate    SE    z P(>|z|)
(Intercept)    0.528 0.235 2.25  0.0243
struct         0.321 0.131 2.46  0.0139

Detection:
            Estimate    SE     z  P(>|z|)
(Intercept)   -0.488 0.314 -1.56 1.20e-01
date          -0.723 0.175 -4.13 3.61e-05
time          -0.488 0.172 -2.83 4.64e-03

Zero-inflation:
 Estimate   SE     z P(>|z|)
    -2.62 1.65 -1.59   0.113

AIC: 284.2515
Warning message:
In pcount(~date + time ~ struct, alfl.umf, mixture = "ZIP") : K was not specified and was set to 103.

(fm9 <- pcount(~date+time ~woody+struct, alfl.umf, mixture="ZIP"))

Call:
pcount(formula = ~date + time ~ woody + struct, data = alfl.umf, mixture = "ZIP")

Abundance:
            Estimate    SE    z P(>|z|)
(Intercept)    0.389 0.197 1.97 0.04904
woody          0.411 0.127 3.24 0.00121
struct         0.213 0.126 1.69 0.09152

Detection:
            Estimate    SE     z  P(>|z|)
(Intercept)   -0.519 0.321 -1.61 1.06e-01
date          -0.701 0.177 -3.96 7.48e-05
time          -0.457 0.175 -2.61 8.99e-03

Zero-inflation:
 Estimate   SE      z P(>|z|)
    -11.5 88.2 -0.131   0.896

AIC: 276.2128
Warning message:
In pcount(~date + time ~ woody + struct, alfl.umf, mixture = "ZIP") : K was not specified and was set to 103.
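fm9 returns abundance and detection estimates that are practically identical to fm4's, with a zero-inflation estimate of -11.5 and a huge SE (88.2); the negative binomial model fm10 shows the same pattern with a dispersion estimate of 11. The base-R sketch below suggests why: with those values, both mixtures collapse to an ordinary Poisson. It assumes (not confirmed here) that unmarked reports the zero-inflation parameter on the logit scale and the NB dispersion as the log of the dnbinom() size parameter; dzip() is a hypothetical helper, and the abundance is evaluated at average woody cover and structure (standardized covariates = 0).

```r
zi.lambda <- exp(0.389)      # fm9 abundance intercept, log scale -> about 1.48
zi.psi    <- plogis(-11.5)   # ASSUMED logit-scale zero-inflation -> about 1e-5

# Hypothetical ZIP probability mass function: extra zeros with probability psi
dzip <- function(x, lambda, psi) {
  ifelse(x == 0,
         psi + (1 - psi) * dpois(0, lambda),   # structural + Poisson zeros
         (1 - psi) * dpois(x, lambda))         # Poisson part for x > 0
}

x.vals <- 0:6
round(rbind(poisson = dpois(x.vals, zi.lambda),
            zip     = dzip(x.vals, zi.lambda, zi.psi),
            # ASSUMED: NB size = exp(dispersion estimate of fm10)
            nb      = dnbinom(x.vals, size = exp(11), mu = zi.lambda)), 4)
```

The three rows are essentially indistinguishable, which is consistent with fm9 and fm10 having the same log-likelihood as fm4 and an AIC exactly 2 higher (one extra parameter).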
(fm10 <- pcount(~date+time ~woody+struct, alfl.umf, mixture="NB"))

Call:
pcount(formula = ~date + time ~ woody + struct, data = alfl.umf, mixture = "NB")

Abundance:
            Estimate    SE    z P(>|z|)
(Intercept)    0.389 0.198 1.97 0.04904
woody          0.411 0.127 3.24 0.00121
struct         0.213 0.126 1.69 0.09145

Detection:
            Estimate    SE     z  P(>|z|)
(Intercept)   -0.519 0.321 -1.62 1.06e-01
date          -0.701 0.177 -3.96 7.49e-05
time          -0.457 0.175 -2.61 9.00e-03

Dispersion:
 Estimate   SE     z P(>|z|)
       11 84.8 0.129   0.897

AIC: 276.2128
Warning message:
In pcount(~date + time ~ woody + struct, alfl.umf, mixture = "NB") : K was not specified and was set to 103.

# # MODEL SELECTION # #

# Put the fitted models in a "fitList", a named collection of fitted models
# fitList() organizes the models for model selection
fms <- fitList("lam(.)p(.)"                       = fm1,
               "lam(.)p(date+time)"               = fm2,
               "lam(woody)p(date+time)"           = fm3,
               "lam(woody+struct)p(date+time)"    = fm4,
               "lam(.)p(date+time)NB"             = fm5,
               "lam(.)p(date+time)ZIP"            = fm6,
               "lam(woody)p(date+time)ZIP"        = fm7,
               "lam(struct)p(date+time)ZIP"       = fm8,
               "lam(woody+struct)p(date+time)ZIP" = fm9,
               "lam(woody+struct)p(date+time)NB"  = fm10)

# Rank them by AIC with modSel()
# The best model ("fm4", or "lam(woody+struct)p(date+time)") has the lowest AIC value
# delta is the difference in AIC from the best model; a delta < 2 suggests that a model has substantial support
# The AIC weight (AICwt) can be interpreted as the probability that a model is the best of this candidate set; for example, fm4 has a weight of 0.37
# Check the placement of your null model with no covariates ("lam(.)p(.)"); ideally it should be near the bottom, as it is here
(ms <- modSel(fms))

                                 nPars    AIC delta   AICwt cumltvWt
lam(woody+struct)p(date+time)        6 274.21  0.00 3.7e-01     0.37
lam(woody)p(date+time)               5 274.87  0.66 2.6e-01     0.63
lam(woody+struct)p(date+time)ZIP     7 276.21  2.00 1.3e-01     0.76
lam(woody+struct)p(date+time)NB      7 276.21  2.00 1.3e-01     0.90
lam(woody)p(date+time)ZIP            6 276.87  2.66 9.7e-02     1.00
lam(struct)p(date+time)ZIP           6 284.25 10.04 2.4e-03     1.00
lam(.)p(date+time)                   4 286.67 12.45 7.2e-04     1.00
lam(.)p(date+time)ZIP                5 287.53 13.32 4.7e-04     1.00
lam(.)p(date+time)NB                 5 288.28 14.07 3.2e-04     1.00
lam(.)p(.)                           2 313.90 39.69 8.8e-10     1.00

# # ANALYSIS AND PREDICTIONS # #

# We may be interested in predicting the expected detection probability as a function of time of day
# time was standardized, so we predict over a range of values on that scale. We also have to fix date at some value; date = 0 is its mean on the standardized scale.
# predict() returns predicted values from fitted model objects
newData1 <- data.frame(time=seq(-2.08, 1.86, by=0.1), date=0)
E.p <- predict(fm4, type="det", newdata=newData1, appendData=TRUE)
head(E.p)

  Predicted        SE     lower     upper  time date
1 0.6062208 0.1305971 0.3450656 0.8181269 -2.08    0
2 0.5952590 0.1283500 0.3411001 0.8068848 -1.98    0
3 0.5842014 0.1259704 0.3370719 0.7951837 -1.88    0
4 0.5730583 0.1234691 0.3329757 0.7830342 -1.78    0
5 0.5618406 0.1208577 0.3288059 0.7704513 -1.68    0
6 0.5505593 0.1181491 0.3245565 0.7574540 -1.58    0

# Plot it
# ylim sets the y-axis limits; 0 to 1 covers the full range of a probability
plot(Predicted ~ time, E.p, type="l", ylim=c(0,1),
     xlab="time of day (standardized)", ylab="Expected detection probability")
lines(lower ~ time, E.p, type="l", col=gray(0.5))
lines(upper ~ time, E.p, type="l", col=gray(0.5))

(This produces a plot of expected detection probability against standardized time of day, with the lower and upper confidence limits in gray.)

# Find the expected abundance over the range of percent cover of "woody" vegetation.
# Note: woody and struct both vary together here, so the curve reflects both covariates changing at once. To isolate the effect of woody cover, hold struct at its mean instead (struct=0).
newData2 <- data.frame(woody=seq(-1.6, 2.38, length.out=50), struct=seq(-1.8, 3.2, length.out=50))
E.N <- predict(fm4, type="state", newdata=newData2, appendData=TRUE)
head(E.N)

  Predicted        SE     lower    upper     woody    struct
1 0.5213254 0.1851973 0.2598516 1.045905 -1.600000 -1.800000
2 0.5508414 0.1893141 0.2808563 1.080361 -1.518776 -1.697959
3 0.5820285 0.1934132 0.3034468 1.116364 -1.437551 -1.595918
4 0.6149814 0.1974977 0.3277205 1.154039 -1.356327 -1.493878
5 0.6497999 0.2015747 0.3537757 1.193524 -1.275102 -1.391837
6 0.6865898 0.2056563 0.3817107 1.234981 -1.193878 -1.289796

# Plot it, converting the x-axis labels back to the original scale by multiplying by the sd and adding back the mean
# Set ylim from 0 to the largest upper confidence limit so the upper line is not cut off
plot(Predicted ~ woody, E.N, type="l", ylim=c(0, max(E.N$upper)),
     xlab="Percent cover - woody vegetation", ylab="Expected abundance, E[N]", xaxt="n")
xticks <- -1:2
xlabs <- xticks*woody.sd + woody.mean
axis(1, at=xticks, labels=round(xlabs, 1))
lines(lower ~ woody, E.N, type="l", col=gray(0.5))
lines(upper ~ woody, E.N, type="l", col=gray(0.5))

(This produces a plot of expected abundance against percent woody cover, with the lower and upper confidence limits in gray.)
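To see what predict() is doing for the point estimates, the sketch below recomputes the first rows of head(E.N) and head(E.p) by hand from the fm4 coefficients: the linear predictor is built from the standardized covariates and then passed through the inverse link (exp() for abundance, plogis() for detection). The helper names EN.hand and Ep.hand are hypothetical. The coefficients are copied from the printed output, which is rounded to three decimals, so the results differ slightly from predict() in the fourth decimal place; standard errors and confidence limits are not reproduced here.

```r
# fm4 coefficients copied from the printed output (rounded)
beta.state <- c(int = 0.389, woody = 0.411, struct = 0.213)
beta.det   <- c(int = -0.519, date = -0.701, time = -0.457)

# Expected abundance: exp() of the log-scale linear predictor
EN.hand <- function(woody, struct) {
  unname(exp(beta.state["int"] + beta.state["woody"] * woody +
               beta.state["struct"] * struct))
}

# Detection probability: inverse logit of the linear predictor
Ep.hand <- function(time, date) {
  unname(plogis(beta.det["int"] + beta.det["date"] * date +
                  beta.det["time"] * time))
}

EN.hand(woody = -1.6, struct = -1.8)   # compare with row 1 of head(E.N): 0.5213
Ep.hand(time = -2.08, date = 0)        # compare with row 1 of head(E.p): 0.6062
```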