Lecture 1:

MT5751 2002/3
Removal Methods
Before lecture:
 load library
 load non-library functions
1. Introduction
Roughly speaking:
- The state model provides the basis for drawing inferences about the population outside the covered region.
- The observation model provides the basis for drawing inferences about the undetected population within the covered region.
Recall that
- Plot sampling involves a state model (for animal distribution) but no observation model (because all animals in the covered region are observed with certainty).
- Distance sampling involves both a state model (for animal distribution) and an observation model (of which the detection function is the central component).
Simple removal methods¹ and mark recapture methods² assume that all animals in the survey region are
at risk of being captured – that is, that the whole survey region is the covered region. The likelihood
functions for these two classes of method are therefore based entirely on observation models.
The simplest possible observation model involves all animals in the survey region having equal,
independent probability (p) of capture. This is the observation model on which the simplest removal
and mark recapture methods are based. If we consider capture a “success” and the N animals in the
population the “trials”, then with equal, independent capture, the number of animals captured (n) is
binomial, with parameters p and N.
This should sound familiar: the plot sampling likelihood was based on exactly the same assumptions.
There is a crucial difference here though – we don’t know the probability of detection (“success”).
With plot sampling we knew it because we knew what fraction of the survey region was covered and
all animals in it were detected. With line transect methods we knew detection probability on the line
and could estimate the average probability from the drop off in number of detections with distance.
With removal methods “success” is “capture”, probability of capture depends on unknown factors (how
animals behave, what sort of capture is involved, how good the surveyors are, etc) and we don’t know
capture probability anywhere.
From a single survey, we get one bit of data: n, the number of animals captured. But we have two
unknown parameters: N and p. You can’t sensibly estimate two parameters with only one bit of
information!
Let's say we captured and removed 64 animals from a population. What would you estimate N to be?
All the survey really tells you is that N is at least 64. But N=64 and p=1 is just as plausible as N=640
and p=0.1, or N=6400 and p=0.01, etc.
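To see this non-identifiability numerically, here is a minimal sketch in plain Python (rather than the R/WiSP code used later in these notes) of the binomial observation model. For every (N, p) pair with Np = 64, the observed catch n = 64 is the most probable outcome, so the data alone cannot separate N from p:

```python
from math import lgamma, log

def log_binom_pmf(n, N, p):
    """log P(n captures | N animals, capture probability p), computed in
    log space so large N does not overflow."""
    if p == 1.0:
        return 0.0 if n == N else float("-inf")
    return (lgamma(N + 1) - lgamma(n + 1) - lgamma(N - n + 1)
            + n * log(p) + (N - n) * log(1 - p))

for N, p in [(64, 1.0), (640, 0.1), (6400, 0.01)]:
    # Each pair has expected catch Np = 64, so n = 64 is the
    # modal (most probable) catch under every one of them.
    mode = max(range(N + 1), key=lambda k: log_binom_pmf(k, N, p))
    print(N, p, mode)
```

The single observation n = 64 sits at the peak of all three likelihoods, which is exactly why a second survey occasion is needed.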
¹ Removal methods are also sometimes called “harvest methods”.
² Mark recapture methods are also called “capture recapture methods”.
2. Simple Removal Method³
Suppose we go back and survey the population again (in exactly the same way, with exactly the same
observers, in exactly the same conditions), and this time we capture 33 animals. What does this tell us,
besides that there were at least 97 animals there to start with? Well, if 64 was a minuscule fraction of
the original population and capture probability on the second survey is the same as that on the first,
then we would expect to catch about 64 animals on the second occasion as well (because the population
is virtually unchanged, and we are as likely to catch animals on both occasions). Conversely, if 64 was
a big fraction of the original population and capture probability on the second survey was the same as
that on the first, then we should see few animals the second time round (because removing 64 reduced
the population by a large fraction).
This is the basic idea behind the simple removal method. We can be more precise about it. If we
captured exactly the same proportion of the population on each occasion (which is what we expect, on
average, if p is the same), then
33/N2 = 64/N
We also know that there were 64 fewer animals there at the time of the second survey, than at the time
of the first:
N2 = N − 64
Solving these two equations for N, we get the estimator:
N̂ = 64²/(64 − 33) ≈ 132
or, in general:
N̂ = n1²/(n1 − n2)
As well as being an intuitively sensible estimator, this turns out to be the MLE for N for the two-sample removal method.
(derivation on board)⁴
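The arithmetic of the estimator, together with the implied capture probability estimate p̂ = n1/N̂ = (n1 − n2)/n1, can be checked with a few lines of plain Python (a stand-in for the WiSP functions used below):

```python
def removal_estimate(n1, n2):
    """Two-sample removal estimator:
    Nhat = n1^2 / (n1 - n2),  phat = (n1 - n2) / n1.
    Requires n1 > n2; otherwise the estimate is inadmissible."""
    if n1 <= n2:
        raise ValueError("inadmissible sample: need n1 > n2")
    return n1 ** 2 / (n1 - n2), (n1 - n2) / n1

Nhat, phat = removal_estimate(64, 33)
print(round(Nhat, 1), round(phat, 3))  # 132.1 0.484
```

So catching 64 then 33 animals suggests a population of about 132, with roughly a 48% chance of capture on each occasion.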
How good is the estimator?
We investigate by simulation – using the same population we used for the plot sampling and line
sampling surveys:
myreg <- generate.region(x.length=100, y.width=50)
mydens <- generate.density(nint.x=100, nint.y=50, southwest=1, southeast=10, northwest=40)
mydens <- add.hotspot(mydens, myreg, x=20, y=10, altitude=200, sigma=10)
mydens <- add.hotspot(mydens, myreg, x=80, y=25, altitude=100, sigma=15)
plot(mydens, myreg, eye.vert=10, eye.horiz=320)
³ Note: with removal and mark recapture methods one tends to capture individual animals, not groups of animals. For this reason we define groups to be animals in the notes for these methods.
⁴ Aside from the derivation of the general MLEs for an S-sample removal method (where S is any number greater than 1), we restrict our attention in this course to the case S=2 (the two-sample removal method).
# Generate heterogeneous population
mypop.pars <- setpars.population(myreg, mydens, number.groups=1000,
  size.method="poisson", size.min=1, size.max=8, size.mean=3,
  exposure.method="beta", exposure.min=0.00001, exposure.max=1,
  exposure.mean=0.5, exposure.shape=1.5,
  type.values=c("M", "F"), type.prob=c(0.5, 0.5),
  density.pop=mydens, adjust.interactive=F)
set.seed(1234)
mypop <- generate.population(mypop.pars)
summary(mypop)
plot(mypop)
# Set survey up:
des.rm<-generate.design.rm(myreg, n.occ=2)
pars.sur.rm<-setpars.survey.rm(mypop, des.rm, pmin=0.002, pmax = 0.2)
set.seed(1295)
hetero.samp <- generate.sample.rm (pars.sur.rm)
plot(hetero.samp,whole.population=T)
summary(hetero.samp)
# Saw n=201 groups (more than 3 times the sample size we got with the line transect survey of this population).
point.rm<-point.est.rm(hetero.samp, numerical=FALSE, plot=TRUE)
round(point.rm$Nhat.grp)
The true population size is 1,000; the plot survey estimate was 492; the line transect estimate was 977;
the simple removal method estimate is 383!
WOOOAA - what is going on here?!
Could it be heterogeneity (lack of pooling robustness)? Let's make all animals equally detectable (in a
way that gives the same sample size) and see what happens.
pars.sur.rm<-setpars.survey.rm(mypop, des.rm, pmin=0.12, pmax = 0.12)
set.seed(1066)
homo.samp <- generate.sample.rm (pars.sur.rm)
#plot(homo.samp)
summary(homo.samp)
homo.point.rm<-point.est.rm(homo.samp, numerical=FALSE, plot=TRUE)
round(homo.point.rm$Nhat.grp)
We captured n=209 groups (much the same as above). This looks better, with abundance estimate
1100.
Unmodelled heterogeneity might indeed be the problem: if on both occasions we tend to catch the more
catchable animals, we deplete the catchable population by more than we deplete the population as a
whole, and this makes the estimator think there are fewer animals there than there really are. In an
extreme case, in which the population was composed of animals that were either catchable or not, we’d
estimate the abundance of the catchable population only! Although the above results hint at this effect,
they come from only one survey so in themselves are not strong evidence of anything.
We can learn more about the estimator’s properties by simulating many surveys.
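The sort of study meant here can be sketched in plain Python (a stand-in for the WiSP simulation; the Beta parameters are illustrative assumptions chosen to give roughly the sample sizes above). Each replicate draws a heterogeneous population of 1,000 animals, runs a two-sample removal survey, and applies N̂ = n1²/(n1 − n2):

```python
import random

random.seed(1234)
N_true = 1000                        # true population size
estimates = []
for _ in range(500):                 # 500 simulated two-sample removal surveys
    # Assumed heterogeneity: each animal gets its own capture probability
    p = [random.betavariate(0.5, 3.6) for _ in range(N_true)]
    caught1 = [random.random() < pi for pi in p]
    n1 = sum(caught1)
    # Only animals not removed on occasion 1 are at risk on occasion 2
    n2 = sum(random.random() < pi for pi, c in zip(p, caught1) if not c)
    if n1 > n2:                      # discard inadmissible samples
        estimates.append(n1 ** 2 / (n1 - n2))

estimates.sort()
median_Nhat = estimates[len(estimates) // 2]
print(len(estimates), round(median_Nhat))  # median estimate falls far short of 1000
```

Because the catchable animals are depleted faster than the population as a whole, the typical estimate here is only a fraction of the true 1,000, illustrating the negative bias described above.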
Here are the results of a simulation study which give an indication of the size of the problem. The
dashed line is true abundance, the solid line the mean of the simulated values.
The removal method is not pooling robust
Here’s a graph that shows the problem more generally:
There is another serious problem with the method, which is apparent from the following sample:
pars.sur.rm<-setpars.survey.rm(mypop, des.rm, pmin=0.12, pmax = 0.12)
set.seed(134321)
inad.samp <- generate.sample.rm (pars.sur.rm)
#plot(inad.samp)
summary(inad.samp)
SAMPLE SUMMARY (REMOVAL METHODS)
--------------------------------
number of survey occasions:               2
Total number of captured groups:          263
Number removed by start of each occasion: 0 129
Number captured on each occasion:         129 134
This sample has n2>n1, which gives a negative estimate of abundance! We call this sort of estimate
“inadmissible” because it is not sensible. (WiSP deals with inadmissible estimates as follows: it reports
inadmissible point estimates as −1 and discards them from bootstrap estimates.) The higher the fraction
of the population removed, the less likely this is to happen, but for small p and small N, the probability
can be really high!
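A quick Monte Carlo sketch in plain Python (N and p here are illustrative choices, not values from the survey above) shows how often a two-sample survey yields n2 ≥ n1 and hence an inadmissible estimate:

```python
import random

random.seed(1)

def prob_inadmissible(N, p, reps=10000):
    """Monte Carlo estimate of P(n2 >= n1) under the simple removal model."""
    bad = 0
    for _ in range(reps):
        n1 = sum(random.random() < p for _ in range(N))
        # Second occasion: only the N - n1 remaining animals are at risk
        n2 = sum(random.random() < p for _ in range(N - n1))
        bad += (n2 >= n1)
    return bad / reps

print(prob_inadmissible(100, 0.05))  # small p, small N: inadmissible very often
print(prob_inadmissible(100, 0.5))   # large removed fraction: almost never
```

With p = 0.05 the expected catches on the two occasions are nearly equal, so close to half of all surveys are inadmissible; with p = 0.5 the second catch is almost always much smaller than the first.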
Here’s a graph which shows the problem for a range of values of p and N.
From the above two plots you can see that you have got to catch a large proportion of the population
for the removal method to work at all well! (You can do this by having p high and/or by having many
capture occasions.) As an abundance estimation method, it is really only a viable option if a large
proportion of the population is being removed anyway, or if you want to remove a large proportion of
the population. Even in this case, however, it may be biased due to unmodelled heterogeneity.
3. Interval Estimation
Because the whole survey region is covered, a nonparametric bootstrap can’t resample from plots in the
way we did with plot sampling and line transect methods (there is only one – the whole region). All the
uncertainty comes from the observation process and the sampling units are the animal groups
themselves.
To implement a nonparametric bootstrap, we resample from the observed capture histories.
Capture histories
A capture history is a set of 0’s and 1’s, with a 1 in position s indicating capture on occasion s and a 0
indicating no capture. For a two-sample survey, there are three possible capture histories:
- (1,0) for animals captured on the first occasion (there are n1 of them);
- (0,1) for animals captured on the second occasion (there are n2 of them);
- (0,0) for uncaptured animals (there are N − n1 − n2 of them).
You can’t have the capture history (1,1) because an animal captured on the first occasion is removed
and unavailable for capture on the second occasion.
A nonparametric bootstrap for the two-sample removal method proceeds as follows:
1. We create a pseudo-population consisting of the (n1 + n2) observed capture histories and N̂ − (n1 + n2) capture histories (0,0) for the uncaptured animals;
2. We “resample” N̂ capture histories, with replacement, from the pseudo-population;
3. Each time we resample, we calculate an estimate of N;
4. If we resample B=999 times, we have 999 estimates of N; the distribution of these is our estimated sampling distribution;
5. To get the CI, we use the estimated N below which 2.5% of the 999 estimates of N fall, and the N above which 2.5% of the estimates of N fall. (This is called the “percentile method” of calculating a CI from bootstrap resamples.)
The method extends naturally to removal methods with more than two samples.
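These five steps can be sketched in plain Python (a stand-in for int.est.rm below; the counts are those of the worked example, n1 = 64 and n2 = 33):

```python
import random

random.seed(1654)
n1, n2 = 64, 33
Nhat = round(n1 ** 2 / (n1 - n2))            # point estimate, here 132

# Step 1: pseudo-population of capture histories
pseudo = [(1, 0)] * n1 + [(0, 1)] * n2 + [(0, 0)] * (Nhat - n1 - n2)

boot = []
for _ in range(999):                         # steps 2-4
    resample = random.choices(pseudo, k=Nhat)
    b1 = sum(h == (1, 0) for h in resample)
    b2 = sum(h == (0, 1) for h in resample)
    if b1 > b2:                              # discard inadmissible estimates
        boot.append(b1 ** 2 / (b1 - b2))

boot.sort()                                  # step 5: percentile CI
lo = boot[int(0.025 * len(boot))]
hi = boot[int(0.975 * len(boot))]
print(round(lo), round(hi))
```

Note the strong right skew of the bootstrap distribution: resamples in which b1 and b2 happen to be close produce very large estimates, so the upper limit sits much further from N̂ than the lower limit does.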
# Heterogeneous population:
#int.prof.rm <- int.est.rm(mysamp, ci.type="profile")
#round(int.prof.rm$ci$Nhat.grp)
set.seed(1654)
int.boot.rm <- int.est.rm(hetero.samp, nboot=499)
round(int.boot.rm$ci$Nhat.grp)
# or, to get a bit more control over what is plotted:
ests <- int.boot.rm$boot.dbn$Nhat.grp
plot.boot.dbn(ests[ests<1500], ci=int.boot.rm$ci$Nhat.grp,
  estname="Group abundance", nclass=10)
# Homogeneous population:
#homo.int.prof.rm <- int.est.rm(mysamp, ci.type="profile")
#round(homo.int.prof.rm$ci$Nhat.grp)
set.seed(1654)
homo.int.boot.rm <- int.est.rm(homo.samp, nboot=499)
round(homo.int.boot.rm$ci$Nhat.grp)
# or, to get a bit more control over what is plotted:
ests <- homo.int.boot.rm$boot.dbn$Nhat.grp
plot.boot.dbn(ests[ests<8000], ci=homo.int.boot.rm$ci$Nhat.grp,
  estname="Group abundance", nclass=10)
4. Summary of Removal Method
Key Idea:
Estimate p by changing N in a known way
State model:
None; assume complete coverage
Observation model:
Capture probability is the same for all animals on all occasions; independent captures.
Likelihood function:
(see book)
Main Assumptions & Effect of violating them:
1. Assumption 1: All animals are detected with equal probability, p.
Effect of violation: Estimates of abundance are negatively biased by violation of this
assumption. The more p varies in the population (the more heterogeneity), the larger the bias.
It can be large. The simple removal method is not pooling robust.
A change in p between surveys due to surveyor or occasion effects also causes bias. Consider
the case in which all animals are equally detectable on any one occasion, but p differs between
occasions. If p1>p2, then n1 tends to be inflated relative to n2 and this causes negative bias.
Conversely, if p1<p2, then n1 tends to be deflated relative to n2 and this causes positive bias.
(The catch-effort method tries to deal with this problem – see later.)
2. Assumption 2: Groups are randomly (uniformly) and independently distributed in the survey
region.
Effect of violation: CIs based on the assumption tend to be biased. Provided robust interval
estimation methods are used (e.g. a capture-history-based nonparametric bootstrap), violation
of this assumption is of no great consequence.
3. Assumption 3: Removals are known.
Effect of violation: Bias and/or variance inflation. There are methods that try to deal with
uncertainty in removals; we do not deal with them in this short course.
5. Catch-Effort Method
If survey effort changes from one occasion to another, we would expect the capture probability to
change between occasions: the more effort you put into catching animals, the more likely you are to
catch them. We can extend the simple removal method to cope with this sort of situation.
The obvious way to do it might seem to be to have a different capture probability parameter for each
occasion: ps for occasion s. This does not work. Consider a two-sample removal method. It generates
two bits of data: n1 and n2. If we allow capture probability on occasion 1 to be p1 and that on occasion 2
to be p2, we have three parameters: p1, p2 and N. You can’t estimate three parameters sensibly with
only two bits of data. Having more capture occasions does not help: for each new occasion you get one
more bit of data (ns for occasion s) and one more parameter (ps for occasion s). You always have one
more parameter than you have bits of data!
A solution is to make capture probability a function of survey effort. A function commonly used in the
analysis of fisheries catch data is
p(ls) = 1 − exp{−θ ls}
where ls is survey effort on occasion s. Below are some examples of the shapes this function can take.
The dots are observed captures from a three sample removal method; the dashed line is the maximum
likelihood fit to these data; the other lines are just examples of other shapes the function can take.
With this parameterisation you have only two parameters: θ and N, so you can estimate them from
two or more survey occasions.
It is easy to write down the likelihood for this case – it is just the removal method likelihood with the
constant capture probability p replaced by the function above for each occasion. However, maximising
the likelihood is not easy and the MLE has to be found numerically. We don’t do this in this course.
Before the advent of powerful desktop computers, this was a real obstacle to estimation. As a result,
various approximations were used to simplify the problem. A common one is to approximate the
function p(ls) above by
p(ls) ≈ θ ls
This approximation is quite good when p(ls) is small (which in fisheries applications it usually is). To
see this without algebra, look at the figure above. N is around 200 for this population: the curves are
very nearly linear when number of detections is small (below about 50) – this is the region in which
p(ls) is small. (Tutorial 2 contains an example using this approximation.)
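The quality of the linear approximation is easy to check numerically in plain Python (θ and the effort values here are illustrative; θ stands for the slope parameter in the function above):

```python
from math import exp

def p_exact(theta, l):
    """Capture probability as a function of effort: p(l) = 1 - exp(-theta * l)."""
    return 1 - exp(-theta * l)

def p_approx(theta, l):
    """Small-p linear approximation: p(l) ~= theta * l."""
    return theta * l

theta = 0.01
for l in [1, 5, 10, 50]:
    # The two agree closely while p is small, and diverge as p grows
    print(l, round(p_exact(theta, l), 4), round(p_approx(theta, l), 4))
```

At l = 1 the two probabilities differ by less than 0.0001, while at l = 50 (where p is about 0.4) the linear form overstates p by more than 0.1, which is why the approximation is reserved for the small-p settings typical of fisheries catch data.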