MT3835: Wildlife Population Assessment
David Borchers, 2002

Distance Sampling
Line Transect Methods

Before lecture: load library; copy "MT5751 misc functions.txt" into command window.

1. Motivating Example

What if we don’t manage to count all animals in the covered region?

# Make heterogeneous process model
myreg <- generate.region(x.length=100, y.width=50)
mydens <- generate.density(nint.x=100, nint.y=50, southwest=1, southeast=10, northwest=40)
mydens <- add.hotspot(mydens, myreg, x=20, y=10, altitude=200, sigma=10)
mydens <- add.hotspot(mydens, myreg, x=80, y=25, altitude=100, sigma=15)
plot(mydens, myreg, eye.vert=10, eye.horiz=320)

# Generate heterogeneous population (same type as book)
mypop.pars <- setpars.population(myreg, mydens, number.groups=1000,
    size.method="poisson", size.min=1, size.max=8, size.mean=3,
    exposure.method="beta", exposure.min=0.00001, exposure.max=1,
    exposure.mean=0.5, exposure.shape=1.5,
    type.values=c("M", "F"), type.prob=c(0.5,0.5),
    density.pop=mydens, adjust.interactive=T)
set.seed(1234)
mypop <- generate.population(mypop.pars)
plot(mypop)

[Figure: Plot of the population]

# Generate the line transect survey design
pars.des.lt <- setpars.design.lt(myreg, n.transects=6, n.units=18, visual.range=2, percent.on.effort=0.5)
set.seed(1820)
mydes <- generate.design.lt(pars.des.lt)
plot(mydes, show.paths=T)

[Figure: Plot of the survey design]

# Survey the population using the design
pars.sur.lt <- setpars.survey.lt(mypop, mydes, disthalf.min=0.5, disthalf.max=1.2)
set.seed(1652)
mysamp <- generate.sample.lt(pars.sur.lt)
plot(mysamp, whole.population=T, dsf=0.8)
summary(mysamp)

[Figure: Plot of the sample]

We saw n=58 groups and the covered region is 12% of the survey region, so the plot sampling estimator gives the following abundance estimate:

n <- 58
pi.c <- 0.12
est.pl <- n/pi.c
est.pl

Now fit the line transect model:

est.cond.lt <- point.est.lt(mysamp, conditional=T)

The above call to the line transect estimation function point.est.lt() puts estimates and related data into the object est.cond.lt.
You can see the names of the things in this object by typing

names(est.cond.lt)

To see the estimate of group abundance, type

round(est.cond.lt$Nhat.grp)

The true population size is 1,000, the plot survey estimate is 483 and the line transect estimate is 972. What is going on?

2. The Line Transect Point Estimator of N

Animals farther from the line have been missed (see plot):

sigma2 <- est.cond.lt$theta
hn <- function(x, sigma2) {exp(-x^2/(2*sigma2))}
w <- mysamp$design$visual.range
x <- seq(0, w, length=100)
f <- hn(x, sigma2)
plot(x, f, type="l", main="Estimated detection function", ylim=c(0,1), col="blue", bty="n")
lines(c(0,0), c(0,1), lwd=2)
lines(c(0,w), c(0,0), lwd=2)
# add dotted line at height of intercept:
lines(c(0,w), rep(f[1],2), lty=2)
# add dotted line to complete box:
lines(c(w,w), c(0,f[1]), lty=2)
# shade missed bit:
polygon(c(x,x[length(x)]), c(f,f[1]), col="lightblue")

The area under the detection function is

mu <- est.cond.lt$mu
mu

The area under the curve if we detected everything would be 1*w = 1*2 = 2, so the proportion detected is

p <- mu/2
p

Correcting the plot sampling abundance estimate for the fact that we only detected a proportion p of the animals in the covered region, we get

round(est.pl/p)

which gives the line transect estimate:

round(est.cond.lt$Nhat.grp)

Line transect estimators "correct" plot sampling estimators to take account of the proportion of animals within the covered region that are missed. We can see algebraically what the line transect estimator has done. The plot sampling estimate is

\[ \hat{N} = \frac{n}{\pi_c} = \frac{n}{a/A} = \frac{nA}{a} = \frac{nA}{2wL} \]

where \(\pi_c = a/A\) is the proportion of the survey region (area \(A\)) that is covered (area \(a = 2wL\)), w is the "truncation distance" (the maximum distance searched from the line), and L is the total line length searched.
The line transect estimator just "corrects" this, by dividing it by the estimated probability of detection, which is the area under the estimated detection function (\(\hat{\mu}\)) divided by the area under the line of certain detection (w):

\[ \hat{p} = \frac{\hat{\mu}}{w} \]

Hence

\[ \hat{N} = \frac{nA}{2wL\,\hat{p}} = \frac{nA}{2wL} \cdot \frac{w}{\hat{\mu}} = \frac{nA}{2\hat{\mu}L} \]

General formulation: conditional and full likelihood functions (on board; see sections 7.2.4 and 7.2.5 of the book).

See Tutorial 1 for a derivation of the MLE from the conditional likelihood for the half normal detection function.

Grouped populations

Many animals occur in groups (herds of caribou, schools of fish, pods of whales, flocks of birds) and it is often the group that is the detection unit: your attention is drawn by the group as a whole, not by each individual animal separately. In this case it is often best to estimate group abundance, and to separately estimate mean group size. The WiSP library estimates group abundance ($Nhat.grp), mean group size ($Es), and individual abundance ($Nhat.ind=$Nhat.grp*$Es).

Conventionally, for group abundance:

\[ CV^2(\hat{N}) = CV^2\!\left(\frac{n}{L}\right) + CV^2(\hat{\mu}) \]

and for individual abundance:

\[ CV^2(\hat{N}) = CV^2\!\left(\frac{n}{L}\right) + CV^2(\hat{\mu}) + CV^2(\hat{E}[s]) \]

These formulas treat the components (encounter rate, detection function estimate and mean group size estimate) as independent. If we bootstrap, we need not make this assumption.

3. Model Selection

We assumed above that the detection function was half-normal, i.e., that the probability of detecting an animal at perpendicular distance x is

\[ p(x) = \exp\!\left(-\frac{x^2}{2\sigma^2}\right) \]

where \(\sigma^2\) is the unknown parameter of this model. Note that p(0)=1.

This is a new sort of assumption: one that concerns the observation model rather than the process model. Recall that the main assumption we made about the process model was that animals were uniformly distributed in the survey region, but that we then used a nonparametric bootstrap to estimate a CI, which does not rely on that assumption. With plot sampling we had no observation model, so we did not need to worry about observation model assumptions. Now we do.
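To make the half-normal model and the correction factor concrete, here is a minimal base-R sketch (it does not use WiSP). It computes the area under the half-normal detection function both numerically and in closed form, then checks that the two ways of writing the line transect estimator agree. The value of sigma2 and the line length L are illustrative assumptions, not estimates from the survey above; n = 58, w = 2 and the 100 x 50 region come from the notes.

```r
# Half-normal detection function, as defined in the notes
hn <- function(x, sigma2) exp(-x^2 / (2 * sigma2))

# Area under the detection function on [0, w], by numerical integration
mu.numeric <- function(sigma2, w) {
  integrate(hn, lower = 0, upper = w, sigma2 = sigma2)$value
}

# The same area in closed form, using the normal cdf
mu.closed <- function(sigma2, w) {
  sqrt(2 * pi * sigma2) * (pnorm(w / sqrt(sigma2)) - 0.5)
}

sigma2 <- 1   # ASSUMED value, for illustration only
w <- 2        # truncation distance (visual.range=2 in the notes)

mu <- mu.closed(sigma2, w)   # area under the detection function
p  <- mu / w                 # proportion detected in the covered region

# Two algebraically equivalent forms of the line transect estimator
n <- 58       # groups seen (from the notes)
A <- 5000     # area of the 100 x 50 survey region
L <- 150      # ASSUMED total line length, for illustration only
N.corrected <- (n * A / (2 * w * L)) / p   # plot estimate divided by p
N.direct    <- n * A / (2 * mu * L)        # nA / (2 mu L)
```

Because p = mu/w, the factor w cancels and N.corrected and N.direct are the same number; with real data, mu would come from the fitted model (est.cond.lt$mu) rather than from an assumed sigma2.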
Here’s an alternative detection function model (called the "hazard rate" model):

\[ p(x) = 1 - \exp\!\left[-\left(\frac{x}{\theta_1}\right)^{-\theta_2}\right] \]

It has two unknown parameters, \(\theta_1\) and \(\theta_2\). Note that p(0)=1. Here’s the fit to the data:

est.cond.hr.lt <- point.est.lt(mysamp, conditional=T, model="hazard.rate")
round(est.cond.hr.lt$Nhat.grp)

This model gives an abundance estimate of 1,370 (compared to 977 from the half normal model). Compare the fits visually:

# close graphics window, then:
par(mfrow=c(2,1))
est.cond.hn.lt <- point.est.lt(mysamp, conditional=T)
est.cond.hr.lt <- point.est.lt(mysamp, model="hazard.rate", conditional=T)

[Figure: Plots of two detection function fits]

Which model looks better to you?

The value of -2log(L) gives an indication of how good the fit is (the smaller it is, the better the fit). The log-likelihood is contained in

est.cond.hr.lt$log.likelihood
# and
est.cond.hn.lt$log.likelihood

So the hazard rate model fits better. But remember that the hazard rate model has one more unknown parameter than the half normal model. The question we should really ask is: "Is the improvement in fit sufficient to warrant estimating one more parameter?"

This is a model selection problem and we can use the AIC of each model to choose between them. Compare the two AICs:

est.cond.hr.lt$AIC
# and
est.cond.hn.lt$AIC

The half normal model AIC is smaller than the hazard rate model AIC, so the answer to the question posed above is No. We base estimation on the half normal model.

4. Interval estimation

Implement the bootstrap in exactly the same way as with plot surveys, resampling the strips where before we resampled the plots. Note that when you resample a strip, you get as data not only the number of animals in the strip (as you did with plot sampling), but also their perpendicular distances from the line. The line transect estimator of N (and related parameters) is used where the plot sampling estimator was used before.
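The resampling scheme just described can be sketched in base R without WiSP. This is an illustration under stated assumptions, not the code behind int.est.lt(): the survey data are simulated, the transect layout (12 transects of length 10), the true sigma2 and the helper names sim.transect and Nhat.lt are all hypothetical, and the interval is a simple percentile interval.

```r
set.seed(42)

hn <- function(x, sigma2) exp(-x^2 / (2 * sigma2))
mu.hn <- function(sigma2, w) {
  sqrt(2 * pi * sigma2) * (pnorm(w / sqrt(sigma2)) - 0.5)
}

# ASSUMED survey layout, for illustration only
w <- 2; A <- 5000
n.transects <- 12
len <- rep(10, n.transects)

# Simulate the detected perpendicular distances on one transect
sim.transect <- function() {
  x <- runif(rpois(1, 8), 0, w)               # animals in the strip
  x[runif(length(x)) < hn(x, sigma2 = 0.8)]   # keep those detected
}
transects <- replicate(n.transects, sim.transect(), simplify = FALSE)

# Half-normal line transect estimate of N from a set of transects:
# maximise the conditional likelihood prod hn(x)/mu, then use nA/(2 mu L)
Nhat.lt <- function(transects, len, w, A) {
  x <- unlist(transects)
  negll <- function(log.s2) {
    s2 <- exp(log.s2)
    -sum(log(hn(x, s2) / mu.hn(s2, w)))
  }
  s2.hat <- exp(optimize(negll, interval = c(-5, 5))$minimum)
  length(x) * A / (2 * mu.hn(s2.hat, w) * sum(len))
}

N.hat <- Nhat.lt(transects, len, w, A)

# Bootstrap: resample whole transects (counts AND distances) with replacement
nboot <- 99
boot <- replicate(nboot, {
  i <- sample(n.transects, replace = TRUE)
  Nhat.lt(transects[i], len[i], w, A)
})
ci <- quantile(boot, c(0.025, 0.975))   # percentile interval
```

The point of resampling whole transects is that each resampled unit carries its own distances with it, so the detection function is refitted in every replicate and its uncertainty is reflected in the interval.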
int.lt <- int.est.lt(mysamp, nboot=99, plot.all.fits=T)
#int.lt<-int.est.lt(mysamp,nboot=9,plot.all.fits=T)

# can look at bootstrap distribution of μ for example, like this:
plot.boot.dbn(int.lt$boot.dbn$mu, ci=int.lt$ci$mu, vlevels=c(0.025,0.975), mean=T, nclass=10, main="Bootstrap distribution of mu")

# or mean group size, E[s]:
plot.boot.dbn(int.lt$boot.dbn$Es, ci=int.lt$ci$Es, vlevels=c(0.025,0.975), mean=T, nclass=10, main="Bootstrap distribution of E[s]")

# or the detection function parameter sigma^2:
plot.boot.dbn(int.lt$boot.dbn$theta[,1], ci=int.lt$ci$theta[1,], vlevels=c(0.025,0.975), mean=T, nclass=10, main="Bootstrap distribution of sigma^2")

5. Assumptions

Main Assumptions & Effect of violating them:

1. Assumption 1: All animals on the transect line are detected: p(0)=1. (Note: It is conventional in the line transect literature to refer to the detection function as g(x), not p(x), and this assumption is commonly referred to as the "g(0)=1" assumption.)
   Effect of violation: Estimates of abundance are negatively biased in proportion to p(0). For example, if p(0)=0.25, estimates will on average be only 25% of the true abundance.

2. Assumption 2: Groups are randomly (uniformly) and independently distributed in the survey region.
   Effect of violation: CIs based on the assumption tend to be biased. Provided robust interval estimation methods are used (e.g. transect-based nonparametric bootstrap), violation of this assumption is of no great consequence.

3. Assumption 3: Animals do not move before detection.
   Effect of violation: Random movement induces positive bias (the encounter rate is "too high"). Provided object movement is slow relative to movement of the observer, the bias is small. Responsive movement can cause large bias (positive if there is attraction to the observer, negative if there is avoidance).

4. Assumption 4: Distances are measured accurately.
   Effect of violation: The estimator is fairly robust to random errors in measurement.
   It is sensitive to systematic bias in distance measurement, and to rounding to zero distance.

Provided assumption 1 holds, line transect estimators are "Pooling Robust". That is, no bias is introduced by pooling data from animals with different detectabilities (big groups and small groups, for example, or "exposed" and "unexposed" animals).

The figure below is the simulated distribution of a line transect estimator from a population with substantial heterogeneity. Note that the true mean and simulated mean are almost identical: heterogeneity has not resulted in bias even though the line transect estimator does not take explicit account of heterogeneity. This is a powerful feature of line transect methods (and distance sampling methods in general) because animal populations are almost always heterogeneous and it is difficult to model this heterogeneity well. Line transect methods sidestep this difficulty; they do not need to model heterogeneity to be unbiased. Mark recapture and removal methods, by contrast, are not pooling robust, as we shall see.

Key Idea: Estimate p by modelling the decline in detection frequency with distance.

Process Model: Animals distributed uniformly and independently in the survey region.

Observation model: All animals on the line (or point) are detected with probability 1; detection probability decreases with distance; detections independent.

Likelihood function (see book)

Very Brief mention of Point Transect methods:

Same observation models as with line transects. The uniform process model for animal location leads to a different distribution of distances (x’s) from that for line transects (see the figure below). A weakness of point transect methods is that you have least information where you need it most: near the origin. This is in contrast to line transect methods.

[Figure: for line transects and for point transects, the detection function g(x), the true distribution of animals, and the observed distribution of distances, each plotted out to distance W]
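The contrast between line and point transects can be sketched numerically in base R. The sketch assumes a half-normal g(x) with an illustrative sigma2 = 1 and W = 2 (neither value comes from the notes). It relies on the standard result that, under a uniform process model, the number of animals at distance r from a point grows in proportion to r, whereas the number at perpendicular distance x from a line is constant in x. The names f.line and f.point are hypothetical.

```r
# Half-normal detection function g(x); sigma2 and W are ASSUMED values
hn <- function(x, sigma2) exp(-x^2 / (2 * sigma2))
sigma2 <- 1
W <- 2

# Line transect: true distribution of x is uniform, so the observed
# density is proportional to g(x)
f.line <- function(x) {
  hn(x, sigma2) / integrate(hn, 0, W, sigma2 = sigma2)$value
}

# Point transect: true distribution of r is proportional to r, so the
# observed density is proportional to r * g(r)
rg <- function(r) r * hn(r, sigma2)
f.point <- function(r) rg(r) / integrate(rg, 0, W)$value

# Observed densities at the line/point and a little way out
f.line(c(0, 0.5))
f.point(c(0, 0.5))
```

The observed density for line transects is highest at distance zero, while for point transects it is exactly zero there, which is why point transect data are thin near the origin. Calling curve(f.line, 0, W) and curve(f.point, 0, W) would draw the two shapes.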