MT3835 2002 Lectures Part 3 (Line transects)

MT3835: Wildlife Population Assessment
David Borchers, 2002
Distance Sampling
Line Transect Methods
Before lecture:
 load library
 copy MT5751 misc functions.txt into command window
1. Motivating Example
What if we don’t manage to count all animals in the covered region?
# Make heterogeneous process model
myreg <- generate.region(x.length=100, y.width=50)
mydens <- generate.density(nint.x=100, nint.y=50,
southwest=1,southeast=10, northwest=40)
mydens<-add.hotspot(mydens,myreg,x=20,y=10,altitude=200, sigma=10)
mydens <- add.hotspot(mydens, myreg, x=80, y=25, altitude=100,
sigma=15)
plot(mydens,myreg,eye.vert=10,eye.horiz=320)
# Generate heterogeneous population (same type as book)
mypop.pars<-setpars.population (myreg, mydens,
number.groups=1000, size.method="poisson",size.min=1, size.max=8,
size.mean=3, exposure.method="beta",exposure.min=0.00001,
exposure.max=1,exposure.mean=0.5, exposure.shape=1.5,type.values=
c("M", "F"), type.prob= c(0.5,0.5), density.pop=mydens,
adjust.interactive=T)
set.seed(1234)
mypop <- generate.population(mypop.pars)
plot(mypop)
Plot of the population
pars.des.lt<-setpars.design.lt(myreg, n.transects=6, n.units=18,
visual.range=2, percent.on.effort=0.5)
set.seed(1820)
mydes <- generate.design.lt(pars.des.lt)
plot(mydes,show.paths=T)
Plot of the survey design
pars.sur.lt<-setpars.survey.lt (mypop, mydes, disthalf.min=0.5,
disthalf.max=1.2)
set.seed(1652)
mysamp <- generate.sample.lt (pars.sur.lt)
plot(mysamp,whole.population=T,dsf=0.8)
summary(mysamp)
Plot of the sample
Saw n=58 groups; covered region is 12% of survey region, so using plot sampling estimator,
abundance is estimated to be
n<-58
pi.c<-0.12
est.pl<-n/pi.c
est.pl
est.cond.lt<-point.est.lt(mysamp, conditional=T)
The above call to the line transect estimation function point.est.lt() puts estimates and
related data into the object est.cond.lt. You can see the names of the things in this object
by typing
names(est.cond.lt)
To see the estimate of group abundance, type
round(est.cond.lt$Nhat.grp)
The true population size is 1,000, the plot survey estimate is 483 and the line transect estimate
is 972.
What is going on?
2. The Line Transect Point Estimator of N
Animals farther from the line have been missed (see plot)
sigma2<-est.cond.lt$theta
hn<-function(x,sigma2) {exp(-x^2/(2*sigma2))}
w<-mysamp$design$visual.range
x<-seq(0,w,length=100)
f<-hn(x,sigma2)
plot(x,f,type="l",main="Estimated detection function",ylim=c(0,1),col="blue",bty="n")
lines(c(0,0),c(0,1),lwd=2)
lines(c(0,w),c(0,0),lwd=2)
# add dotted line at height of intercept:
lines(c(0,w),rep(f[1],2),lty=2)
# add dotted line to complete box:
lines(c(w,w),c(0,f[1]),lty=2)
# shade missed bit:
polygon(c(x,x[length(x)]),c(f,f[1]),col="lightblue")
Area under detection function is
mu<-est.cond.lt$mu
mu
If all animals out to w=2 were detected, the area under the curve would be 1*w = 1*2 = 2
So proportion detected is
p<-mu/2
p
Correcting the plot sampling abundance estimate for the fact that we only detected a proportion p of the
animals in the covered region, we get
round(est.pl/p)
which gives the line transect estimate.
round(est.cond.lt$Nhat.grp)
Line transect estimators “correct” plot sampling estimators to take account of the
proportion of animals within the covered region that are missed.
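The size of this correction can be checked by hand from the figures quoted in these notes (n=58, covered proportion 0.12, w=2, line transect estimate 972). The sketch below is plain arithmetic in base R, not WiSP output; the names p.implied and mu.implied are introduced here for illustration only.

```r
# Numbers quoted in the notes above
n       <- 58      # groups detected
pi.c    <- 0.12    # proportion of the survey region covered
w       <- 2       # truncation distance (visual.range in the design)
Nhat.lt <- 972     # line transect estimate quoted in the notes

est.pl     <- n / pi.c          # plot sampling estimate, about 483
p.implied  <- est.pl / Nhat.lt  # detection probability implied by the two estimates
mu.implied <- p.implied * w     # implied area under the detection function

round(est.pl)
round(p.implied, 3)
round(mu.implied, 3)
```

On these figures roughly half of the groups in the covered region were detected, which is why the plot sampling estimate is about half the true abundance.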
We can see algebraically what the line transect estimator has done. The plot sampling estimate is
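$$\hat{N} = \frac{n}{\pi_c} = \frac{n}{a/A} = \frac{nA}{a} = \frac{nA}{2wL}$$
where $\pi_c = a/A$ is the proportion of the survey region (area $A$) that is covered, $a = 2wL$ is the area of the covered region, $w$ is the “truncation distance” (the maximum distance searched from the line), and $L$ is the total line length searched.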
The line transect estimator just “corrects” this by dividing it by the estimated probability of detection,
which is the area under the estimated detection function ($\hat{\mu}$) divided by the area under the line of
certain detection ($w$):
$$\hat{p} = \frac{\hat{\mu}}{w}$$
Hence
$$\hat{N} = \frac{nA}{2wL} \cdot \frac{1}{\hat{p}} = \frac{nA}{2wL} \cdot \frac{w}{\hat{\mu}} = \frac{nA}{2\hat{\mu}L}$$
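The final expression can be coded directly. The following is a base R sketch, not the WiSP implementation: the function names mu.hn and lt.abundance are illustrative, A = 100*50 comes from the generate.region call, L = 150 is back-calculated from 2wL = 0.12*A with w = 2, and sigma2 = 0.64 is a hypothetical parameter value rather than the fitted one.

```r
# Area under a half-normal detection function on [0, w] (closed form)
mu.hn <- function(sigma2, w) {
  sqrt(2 * pi * sigma2) * (pnorm(w / sqrt(sigma2)) - 0.5)
}

# Line transect point estimate: N = n A / (2 mu L)
lt.abundance <- function(n, A, L, mu) n * A / (2 * mu * L)

n <- 58                      # groups detected (from the notes)
A <- 100 * 50                # survey region area (from generate.region)
w <- 2                       # truncation distance
L <- 0.12 * A / (2 * w)      # total line length implied by 12% coverage
sigma2 <- 0.64               # hypothetical half-normal parameter

mu   <- mu.hn(sigma2, w)
N.lt <- lt.abundance(n, A, L, mu)
N.pl <- n * A / (2 * w * L)  # plot sampling estimate

# The line transect estimate is the plot estimate divided by p = mu / w
all.equal(N.lt, N.pl / (mu / w))
```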
General formulation: conditional and full likelihood functions
(on board; see sections 7.2.4 and 7.2.5 of the book)
See Tutorial 1 for a derivation of the MLE from the conditional likelihood for the half normal
detection function.
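As a numerical companion to that derivation, here is a sketch that maximises the conditional likelihood for a half-normal detection function truncated at w. Conditional on detection, a distance x has density p(x)/mu on [0, w]. This is base R on simulated data, not WiSP; the true value sigma2.true = 1, the sample size and the function name negll are all assumptions made for illustration.

```r
set.seed(1)
w <- 2
sigma2.true <- 1   # hypothetical true parameter

# Simulate detected distances: animals uniform on [0, w],
# each kept with half-normal probability p(x)
x.all <- runif(5000, 0, w)
x <- x.all[runif(5000) < exp(-x.all^2 / (2 * sigma2.true))]

# Negative conditional log-likelihood: f(x) = p(x) / mu on [0, w]
negll <- function(sigma2, x, w) {
  mu <- sqrt(2 * pi * sigma2) * (pnorm(w / sqrt(sigma2)) - 0.5)
  -sum(-x^2 / (2 * sigma2) - log(mu))
}

# One-dimensional numerical maximisation over sigma2
fit <- optimize(negll, interval = c(0.01, 10), x = x, w = w)
sigma2.hat <- fit$minimum
sigma2.hat
```

The estimate should land close to the value used to simulate the data; a numerical search is used here because truncation at w means mu depends on sigma2 through the normal distribution function.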
Grouped populations
Many animals occur in groups (herds of caribou, schools of fish, pods of whales, flocks of birds)
and it is often the group that is the detection unit – your attention is drawn by the group as a whole,
not to each individual animal separately.
In this case it is often best to estimate group abundance, and to separately estimate mean group
size. The WiSP library estimates group abundance ($Nhat.grp), mean group size ($Es), and
individual abundance ($Nhat.ind=$Nhat.grp*$Es).

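Conventionally, the components of the estimator are assumed to be independent, so that for group abundance:
$$\mathrm{CV}^2[\hat{N}] \approx \mathrm{CV}^2\left[\frac{n}{L}\right] + \mathrm{CV}^2[\hat{\mu}],$$
and for individual abundance:
$$\mathrm{CV}^2[\hat{N}] \approx \mathrm{CV}^2\left[\frac{n}{L}\right] + \mathrm{CV}^2[\hat{\mu}] + \mathrm{CV}^2[\hat{E}[s]]$$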
If we bootstrap, we need not make this assumption.
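The conventional formulas amount to adding squared CVs of the components. A minimal arithmetic sketch in base R follows; the three component CVs are hypothetical values chosen for illustration, not results from the example survey.

```r
# Hypothetical component CVs (illustrative values only)
cv.er <- 0.20   # CV of the encounter rate n/L
cv.mu <- 0.10   # CV of the estimated area under the detection function
cv.Es <- 0.05   # CV of the estimated mean group size

# Squared CVs add when the components are treated as independent
cv.Nhat.grp <- sqrt(cv.er^2 + cv.mu^2)
cv.Nhat.ind <- sqrt(cv.er^2 + cv.mu^2 + cv.Es^2)

round(c(group = cv.Nhat.grp, individual = cv.Nhat.ind), 3)
```

With values like these the encounter rate term dominates, so the CV of the abundance estimate is only a little larger than the CV of n/L.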
3. Model Selection
We assumed above that the detection function was half-normal, i.e., that the probability of detecting an
animal at perpendicular distance x is
$$p(x) = \exp\left(-\frac{x^2}{2\sigma^2}\right)$$
where $\sigma^2$ is the unknown parameter of this model. Note that p(0)=1.
This is a new sort of assumption: one that concerns the observation model rather than the process
model. Recall that the main assumption we made about the process model was that animals were
uniformly distributed in the survey region, but that we then used a nonparametric bootstrap to estimate a
CI, which does not rely on that assumption. With plot sampling we had no observation model,
so we did not need to worry about observation model assumptions. Now we do.
Here’s an alternative detection function model (called the “hazard rate” model):
$$p(x) = 1 - \exp\left[-\left(\frac{x}{\theta_1}\right)^{-\theta_2}\right]$$
It has two unknown parameters, $\theta_1$ and $\theta_2$. Note that p(0)=1.
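To compare the two shapes, both detection functions can be written as plain R functions. This sketch assumes the standard hazard-rate form 1 - exp(-(x/theta1)^(-theta2)); the function name hr and all parameter values are hypothetical, chosen only to show that both curves equal 1 at x = 0 and that the hazard-rate curve has a wider "shoulder" near the line.

```r
# Half-normal detection function (same form as hn() earlier in the notes)
hn <- function(x, sigma2) exp(-x^2 / (2 * sigma2))

# Hazard-rate detection function (assumed standard form)
hr <- function(x, theta1, theta2) 1 - exp(-(x / theta1)^(-theta2))

# Both satisfy p(0) = 1
hn(0, sigma2 = 0.64)
hr(0, theta1 = 0.8, theta2 = 3)

# Near the line the hazard-rate curve stays close to 1 for longer
x <- c(0.2, 0.4, 0.8, 1.6)
round(cbind(x, half.normal = hn(x, 0.64), hazard.rate = hr(x, 0.8, 3)), 3)
```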
Here’s the fit to the data:
est.cond.hr.lt<-point.est.lt(mysamp, conditional=T,
model="hazard.rate")
round(est.cond.hr.lt$Nhat.grp)
This model gives an abundance estimate of 1,370 (compared to 977 from the half normal model).
Compare the fits visually:
#close graphics window, then:
par(mfrow=c(2,1))
est.cond.hn.lt<-point.est.lt(mysamp, conditional=T)
est.cond.hr.lt<-point.est.lt(mysamp,model="hazard.rate" ,
conditional=T)
Plots of two detection function fits
Which model looks better to you?
The value of –2log(L) gives an indication of how good the fit is (the smaller it is, the better the fit). The
log-likelihood is contained in
est.cond.hr.lt$log.likelihood
#and
est.cond.hn.lt$log.likelihood
So the hazard rate model fits better. But remember that the hazard rate model has one more unknown
parameter than the half normal model. The question we should really ask is:
“Is the improvement in fit sufficient to warrant estimating one more parameter?”
This is a model selection problem and we can use the AIC of each model to choose between them.
Compare the two AICs:
est.cond.hr.lt$AIC
#and
est.cond.hn.lt$AIC
The half normal model AIC is smaller than the hazard rate model AIC, so the answer to the question
above is No. We base estimation on the half normal model.
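The trade-off can be made concrete with the definition AIC = -2 log(L) + 2k, where k is the number of estimated parameters. In this base R sketch the log-likelihood values are hypothetical, chosen so that the two-parameter model fits slightly better yet still has the larger AIC; they are not the values WiSP returns for this sample.

```r
# AIC from a log-likelihood and the number of estimated parameters
aic <- function(loglik, k) -2 * loglik + 2 * k

# Hypothetical log-likelihoods (illustrative values only)
loglik.hn <- -30.0   # half normal, 1 parameter
loglik.hr <- -29.5   # hazard rate, 2 parameters (slightly better fit)

aic.hn <- aic(loglik.hn, k = 1)
aic.hr <- aic(loglik.hr, k = 2)

c(half.normal = aic.hn, hazard.rate = aic.hr)
```

The extra parameter costs 2 AIC units, so the hazard rate model would need to improve the log-likelihood by more than 1 to be preferred.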
4. Interval estimation
Implement the bootstrap in exactly the same way it is implemented with plot surveys, resampling the
strips where before we resampled the plots. Note that when you resample any strip, you get as data not
only the number of animals in the strip (as you did with plot sampling), but also their perpendicular
distances from the line. The line transect estimator of N (and related parameters) is used where the plot
sampling estimator was used before.
int.lt<-int.est.lt(mysamp,nboot=99,plot.all.fits=T)
#int.lt<-int.est.lt(mysamp,nboot=9,plot.all.fits=T)
# can look at bootstrap distribution of μ for example, like this:
plot.boot.dbn(int.lt$boot.dbn$mu,ci=int.lt$ci$mu,vlevels=c(0.025,0.975),
mean=T,nclass=10,main="Bootstrap distribution of mu")
# or mean group size, E[s]:
plot.boot.dbn(int.lt$boot.dbn$Es,ci=int.lt$ci$Es,vlevels=c(0.025,0.975),
mean=T,nclass=10,main="Bootstrap distribution of E[s]")
# or the detection function parameter σ2:
plot.boot.dbn(int.lt$boot.dbn$theta[,1],ci=int.lt$ci$theta[1,],
vlevels=c(0.025,0.975),mean=T,nclass=10,
main="Bootstrap distribution of sigma^2")
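For readers who want to see the resampling idea without WiSP, the following base R sketch bootstraps whole transects and refits a half-normal detection function each time. It is not how int.est.lt() is implemented; the simulated survey (6 transects of length 25, sigma2 = 0.64, about 20 animals per strip) and the names sim.transect and est.N are assumptions made for illustration.

```r
set.seed(42)
w <- 2; A <- 5000          # truncation distance and survey region area
L.each <- 25; n.trans <- 6 # hypothetical transect length and number

# Area under a half-normal detection function on [0, w]
mu.hn <- function(s2) sqrt(2 * pi * s2) * (pnorm(w / sqrt(s2)) - 0.5)

# Simulate the detected perpendicular distances on one transect
sim.transect <- function() {
  x <- runif(rpois(1, 20), 0, w)
  x[runif(length(x)) < exp(-x^2 / (2 * 0.64))]
}
transects <- replicate(n.trans, sim.transect(), simplify = FALSE)

# Line transect point estimate of N from a list of transects
est.N <- function(tr) {
  x <- unlist(tr)
  negll <- function(s2) -sum(-x^2 / (2 * s2) - log(mu.hn(s2)))
  s2 <- optimize(negll, c(0.01, 10))$minimum
  length(x) * A / (2 * mu.hn(s2) * length(tr) * L.each)
}

# Resample transects with replacement; each resampled transect brings
# its count and its distances, and N is re-estimated every time
boot.N <- replicate(199, est.N(sample(transects, n.trans, replace = TRUE)))
ci <- quantile(boot.N, c(0.025, 0.975))

est.N(transects)
ci
```

Because the transect is the resampling unit, the spread of boot.N reflects variation in encounter rate between transects as well as uncertainty in the fitted detection function.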
5. Assumptions
Main Assumptions & Effect of violating them:
Assumption 1: All animals on the transect line are detected: p(0)=1. (Note: It is conventional in
the line transect literature to refer to the detection function as g(x), not p(x), and this assumption is
commonly referred to as the “g(0)=1” assumption.)
Effect of violation: Estimates of abundance are negatively biased in proportion to p(0). For
example, if p(0)=0.25, estimates will on average be only 25% of the true abundance.
Assumption 2: Groups are randomly (uniformly) and independently distributed in the survey
region.
Effect of violation: CIs based on the assumption tend to be biased. Provided robust interval
estimation methods are used (e.g. transect-based nonparametric bootstrap), violation of this
assumption is of no great consequence.
Assumption 3: Animals do not move before detection.
Effect of violation: Random movement induces positive bias (the encounter rate is “too
high”). Provided object movement is slow relative to movement of the observer, the bias is
small. Responsive movement can cause large bias (positive if there is attraction to the
observer, negative if there is avoidance).
Assumption 4: Distances are measured accurately.
Effect of violation: The estimator is fairly robust to random errors in measurement. It is
sensitive to systematic bias in distance measurement, and to rounding to zero distance.
Provided assumption 1 holds, line transect estimators are “pooling robust”. That is, no bias is
introduced by pooling data from animals with different detectabilities (big groups and small groups, for
example, or “exposed” and “unexposed” animals). The figure below is the simulated distribution of a
line transect estimator from a population with substantial heterogeneity. Note that the true mean and
simulated mean are almost identical – heterogeneity has not resulted in bias even though the line
transect estimator does not take explicit account of heterogeneity. This is a powerful feature of line
transect methods (and distance sampling methods in general) because animal populations are almost
always heterogeneous and it is difficult to model this heterogeneity well. Line transect methods
sidestep this difficulty – they do not need to model heterogeneity to be unbiased.
Mark recapture and removal methods, by contrast, are not pooling robust, as we shall see.
Key Idea:
Estimate p by modelling the decline in detection frequency with distance.
Process Model:
Animals distributed uniformly and independently in the survey region.
Observation model:
All animals on the line (or point) are detected with probability 1; detection probability
decreases with distance; detections independent.
Likelihood function
(see book)
Very Brief mention of Point Transect methods:

Same observation models as with line transects.

Uniform process model for animal location leads to a different distribution of
distances (x’s) than for line transects (see next page).

A weakness of point transect methods is that you have least information where you need it
most – near the origin. This is in contrast to line transect methods.
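The difference in the distribution of distances can be seen by simulation. In this base R sketch (simulated animal locations only, no detection process; sample sizes are arbitrary), animals uniform in a strip give perpendicular distances that are uniform on [0, w], while animals uniform in a disc around a point give radial distances with density 2r/w^2, so few animals are close to the point.

```r
set.seed(7)
w <- 2
n <- 20000

# Line transect: animals uniform in the strip, so the perpendicular
# distance from the line is uniform on [0, w]
x.line <- runif(n, 0, w)

# Point transect: animals uniform in the square around the point,
# keeping those within the disc of radius w
px <- runif(4 * n, -w, w)
py <- runif(4 * n, -w, w)
r  <- sqrt(px^2 + py^2)
r.point <- r[r <= w]

mean(x.line)            # close to w/2
mean(r.point)           # close to 2w/3
mean(x.line  < w / 2)   # about half the animals are in the inner half of the strip
mean(r.point < w / 2)   # only about a quarter are in the inner half of the radius
```

This is why a point transect sample contains relatively few distances near zero, which is where the shape of the detection function matters most.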
[Figure: observed distribution of distances, detection function g(x), and true distribution of animals out to truncation distance W, shown for a line transect and for a point transect.]