Lecture 1:

advertisement
MT5751 2002/3
Mark Recapture Methods
Before lecture:
 load wisp library
1. Introduction
Simple mark recapture methods1 assume that all animals in the survey region are at risk of being
captured – that is, that the whole survey region is the covered region. The likelihood functions for this
two classes of method are therefore based entirely on observation models.
The simplest possible observation model involves all animals in the survey region having equal,
independent probability (p) of capture. This is the observation model on which the simplest mark
recapture method is based. If we consider capture a “success” and the N animals in the population the
“trials”, then with equal, independent capture, the number of animals captured (n) is binomial, with
parameters p and N.
This should sound very familiar by now: the plot sampling and simple removal method likelihood
functions were based on exactly the same sort of assumptions; and they appeared in distance sampling
as well. As with the removal method, we don’t know the probability of “success” and we have to
estimate both N and p.
Mark recapture methods implicitly contain removal methods (as we shall see), but they include a new
kind of data: data from recaptures (animals that are caught after having been caught before): these give
capture history (1,1) which we could not get with the two sample removal method.
Here’s a schematic representation of a mark recapture survey in which animals are marked in such a
way that when we catch them again, we can see that they have been caught before, but not when, nor
which particular previously caught animal they are. (We concentrate on this sort of marking – there are
other sorts and the theory changes a bit when they are used, but the basic ideas are the same for all
methods.)
Notation:
 Ms is number of marked animals at the start of occasion s
 ms is number of marked animals that are captured pm occasion s
 Us is number of unmarked animals at the start of occasion s
 us is number of unmarked animals captured on occasion s
We assume throughout that all captured animals are marked and returned to the population.
1
Mark recapture methods are also called “capture recapture methods”.
From the first survey, we get one bit of data: n1=u1. From all others we get two new bits of data: the
number of unmarked captures (us on occasion s) and the number of previously marked animals that are
captured (ms on occasion s).
How does this help?
2. The Simplest Mark Recapture Method
Suppose, as before, that we captured n1=u1=64 animals from a population on our first survey. Instead of
removing them, we mark them and return them to the population.
We go back after some time (enough time to let the marked animals mix with the unmarked and to get
over the experience of being marked) and survey the population again (in exactly the same way, with
exactly the same observers, in exactly the same conditions), and this time we capture m2=33 of the
animals we marked on the first occasion, and u2=34 unmarked animals. What does this tell us, aside
from the fact that there were at least 97 animals there to start with (as the removal method did)?
We caught 33 64ths of the animals we marked. So if all animals are equally catchable, we can
immediately estimate p to be 33/64. More generally, for a two-sample experiment:
pˆ 
m2
m
 2
n1
M2
We also know that we caught a total of 34+33=67 animals on the second occasion, and if this is 33
64ths of the population, an intuitively sensible estimator of N is
67
67  64
Nˆ 

 126
34
 34 
 
 64 
Or, more generally:
Nˆ 
n2
 m2

 n1




n1 n 2
m2
Not only is this estimator sensible, but it turns out to be the maximum likelihood estimator as well. It is
variously referred to as the Lincoln index, Lincoln-Petersen estimator, Petersen estimator, ... (after the
two who first derived or at least publicised it).
(MLE derivation on board for the case p1 not necessarily equal to p2)
(see Tutorial 2 for the derivation when p1=p2)
It turns out that the MLE of N is quite biased for small sample sizes. A less biased estimator is
n  1n2  1  1
Nˆ c  1
m2  1
This is “Chapman’s modified estimator”, named after its author.
How good are these estimators of abundance?
We investigate by simulation – using the same population we used for the plot sampling, removal
method, and line sampling surveys:
myreg <- generate.region(x.length=100, y.width=50)
myreg <- generate.region(x.length=100, y.width=50)
mydens <- generate.density(nint.x=100, nint.y=50, southwest=1,
southeast=10, northwest=40)
mydens<-add.hotspot(mydens,myreg,x=20,y=10,altitude=200, sigma=10)
mydens <- add.hotspot(mydens, myreg, x=80, y=25, altitude=100,
sigma=15)
plot(mydens,myreg,eye.vert=10,eye.horiz=320)
# Generate heterogeneous population
hetero.pop.pars<-setpars.population (myreg, mydens,
number.groups=1000, size.method="poisson",size.min=1, size.max=8,
size.mean=3, exposure.method="beta",exposure.min=0.00001,
exposure.max=1,exposure.mean=0.5, exposure.shape=1.5,type.values=
c("M", "F"), type.prob= c(0.5,0.5), density.pop=mydens,
adjust.interactive=F)
set.seed(1234)
hetero.pop <- generate.population(hetero.pop.pars)
plot(hetero.pop)
# Set survey up to generate about the same n:
des.cr<-generate.design.cr(myreg, n.occ=2)
pars.sur.cr<-setpars.survey.cr(hetero.pop, des.cr,
pmin.unmarked=0.002, pmax.unmarked = 0.2)
set.seed(1295)
samp.cr <- generate.sample.cr (pars.sur.cr)
plot(samp.cr, whole.population=T)
summary(samp.cr)
#Saw n=207, much the same as with the removal method (and more than 3 times the sample size of
that we got with the line transect method survey of this population).
point.cr.M0<-point.est.crM0(samp.cr,Chapmod=T)
round(point.cr.M0$Nhat.grp)
point.cr.M0<-point.est.crM0(samp.cr,Chapmod=F)
round(point.cr.M0$Nhat.grp)
# Bootstrap CI:
set.seed(1654)
int.cr.M0<-int.est.crM0(samp.cr,ci.type=”boot.nonpar”,nboot=99)
names(int.cr.M0)
names(int.cr.M0$ci)
round(int.cr.M0$ci$Nhat.grp)
# can look at bootstrap distribution of Nhat.grp like this:
plot.boot.dbn(int.cr.M0$boot.dbn$Nhat.grp,ci=int.cr.M0$ci$Nhat.grp,es
tname =”Group abundance”,nclass=10)
# can look at bootstrap distribution of p for example, like this:
plot.boot.dbn(int.cr.M0$boot.dbn$phat,ci=int.cr.M0$ci$phat,estname
=”Capture probability”,nclass=10)
Note that the p estimate is close to the maximum true p (0.2) – not the average true p (which is about
0.11).
The true population size is 1,000. The estimates from this and previously used methods are shown
below. First we do a plot sampling survey of this population for comparison (when we first looked at
plot sampling, we used a different population):
pars.des.pl<-setpars.design.pl(myreg, n.interval.x=10,
n.interval.y=10, method="random",area.covered = 0.2)
set.seed(1212)
des.pl <- generate.design.pl(pars.des.pl)
#plot(des.pl)
set.seed(1652)
samp.pl <- generate.sample.pl(hetero.pop, des.pl)
plot(samp.pl, whole.population=T)
summary(samp.pl)
est.pl<-point.est.pl(samp.pl)
est.pl$Nhat.grp
int.pl<-int.est.pl(samp.pl)
int.pl$ci$Nhat.grp
Estimator
Plot sampling
Line Transect
Removal
Mark recapture
Coverage
probability
20%
12%
100%
100%
Sample size
Point estimate
194
58
207
201
970
972
383
574
Bootstrap 95%
Confidence interval
(775; 1190)
(608; 1338)
(261; 971)
(399; 831)
Does the mark-recapture method have the same weakness as the removal method with respect to
heterogeneous capture probability? Could unmodelled heterogeneity (lack of pooling robustness) be the
problem again? Lets make all animals equally detectable (in a way that gives about the same sample
size) and see what happens.
homo.pars.sur.cr<-setpars.survey.cr(hetero.pop, des.cr,
pmin.unmarked=0.106, pmax.unmarked = 0.106)
set.seed(243)
homo.samp.cr <- generate.sample.cr (homo.pars.sur.cr)
#plot(homo.samp.cr)
summary(homo.samp.cr)
homo.point.cr.M0<-point.est.crM0(homo.samp.cr,Chapmod=T)
round(homo.point.cr.M0$Nhat.grp)
homo.point.cr.M0<-point.est.crM0(homo.samp.cr,Chapmod=F)
round(homo.point.cr.M0$Nhat.grp)
set.seed(1654)
homo.int.cr.M0<-int.est.crM0(homo.samp.cr,nboot=99)
round(homo.int.cr.M0$ci$Nhat.grp)
plot.boot.dbn(homo.int.cr.M0$boot.dbn$Nhat.grp,ci=
homo.int.cr.M0$ci$Nhat.grp,estname =”Group abundance”,nclass=10)
homo.point.cr.M0$phat
plot.boot.dbn(homo.int.cr.M0$boot.dbn$phat,ci=
homo.int.cr.M0$ci$phat,estname =”Capture probability”,nclass=10)
Now p estimate is half of what it was. This looks much better!
It looks like unmodelled heterogeneity is the problem again. We can’t really conclude this from only
one sample, but we can investigate it by simulation (and in some cases analytically). Here are results
from a simulation study using the same population as was used for similar plots for the removal
method.
Mark recapture estimators of abundance are not pooling robust.
Actually that is unfair, because we have used the very simplest estimator above. There are many and
various mark recapture estimators available. The statement is true for those that do not allow for animal
heterogeneity – i.e. differences in catchability between individual animals. Unfortunately we don’t
have time to go into estimators that do allow for animal heterogeneity.
Otis et al. (1978) developed a useful classification of mark recapture models that is often quoted. They
recognised three sources of heterogeneity, and together with a model with no heterogeneity, these
provide a basis for classification of the methods, as follows:
1.
M0: No heterogeneity (the model we used above).
1.
Mt: Changes in capture probability between capture occasions. Subscript t (for time) is used to
denote this source of heterogeneity. Capture probability might change from one occasion to
another (while remaining constant within occasion – perhaps) due to the survey effort, survey type
(type of capture mechanism), environmental variables, etc.
2.
Mb: Capture probability due to marking. Subscript b (for animal behaviour) is used to denote this
source of heterogeneity. Animals might become “trap-happy” if being caught is rewarding (getting
food from a baited trap, for example), or “trap-shy” if being caught is traumatic (having a plastic
tag stapled through your ear, for example).
3.
Mh: Changes in capture probability due to individual animal differences (animal heterogeneity).
Subscript h (for heterogeneity) is used to denote this source of heterogeneity.
Further models are constructed by combining some or all of the above: Mtb, Mth, Mbh, Mtbb
For lack of time, and because they are substantially more complex, we do not deal with any the hmodels here. In addition, note that you can’t estimate both behaviour and time effects in model Mtb
without further assumptions. You can see this by thinking about how many bits of data you have, and
how many parameters you need to estimate.
With a two-sample survey you have three bits of data (n1, u2, m2) and four parameters to estimate: N,
capture probabilities p1, p2 for unmarked animals on the two occasions, and capture probability c 2 for
captured animals on the two second occasion (there are no captured animals on the first occasion). You
can’t sensibly estimate four parameters with only three bits of data.
With a three-sample survey you have five bits of data (n1, u2, m2, u3, m3) but now you have six
parameters to estimate: N, p1, p2, p3, c2 , c3. You can’t sensibly estimate six parameters with only five
bits of data. And so on.
See Tutorial 2 for an example of an assumption that reduces the number of parameters to an estimable
number. (Aside: when you can’t estimate all the parameters from the data, the parameters are said to be
not “identifiable”. You will come across this term in the mark recapture literature.)
Example of Mt estimation:
point.cr.Mt<-point.est.crMt(samp.cr)
round(point.cr.Mt$Nhat.grp)
point.cr.Mt
int.cr.Mt<-int.est.crMt(samp.cr,ci.type=”boot.nonpar”,nboot=99)
round(int.cr.Mt$ci$Nhat.grp)
# to look at bootstrap distribution of p1, for example:
# (need to be careful with indices)
plot.boot.dbn(int.cr.Mt$boot.dbn$phat[,1],ci=int.cr.Mt$ci$phat[1,],
estname =”Capture probability p1”,nclass=10)
# or, for groupsize:
plot.boot.dbn(int.cr.Mt$boot.dbn$Es,ci=int.cr.Mt$ci$Es, estname
=”Mean group size”,nclass=10)
Example of Mb estimation:
point.cr.Mb<-point.est.crMb(samp.cr)
round(point.cr.Mb$Nhat.grp)
point.cr.Mb
int.cr.Mb<-int.est.crMb(samp.cr,ci.type=”boot.nonpar”,nboot=99)
round(int.cr.Mb$ci$Nhat.grp)
# To look at bootstrap distribution of p[1]=p(unmarked):
# (need to be careful with indices)
par(mfrow=c(2,1)) #(divides the graphics window into 2 rows, 1 col)
point.cr.Mb$phat
plot.boot.dbn(int.cr.Mb$boot.dbn$phat[,1],ci=int.cr.Mb$ci$phat[1,],
estname=”Capture probability p(unmarked)”,nclass=10)
plot.boot.dbn(int.cr.Mb$boot.dbn$phat[,2],ci=int.cr.Mb$ci$phat[1,],
estname=”Capture probability p(marked)”,nclass=10)
par(mfrow=c(1,1)) #(makes graphics window 1 row, 1 col)
Model Selection
It is a BIG
ISSUE in mark recapture surveys.
Compare the log-likelihoods and AICs of the three models used above:
point.cr.M0$log.Likelihood; point.cr.Mt$log.Likelihood;
point.cr.Mb$log.Likelihood
point.cr.M0$AIC; point.cr.Mt$AIC; point.cr.Mb$AIC
round(point.cr.M0$Nhat.grp); round(point.cr.Mt$Nhat.grp);
round(point.cr.Mb$Nhat.grp)
Multiple captures
Lets now generate a population with trap-shyness (and 8 capture occasions) and see how the model
selection criterion works:
# Heterogeneous, trap-shy
des.cr<-generate.design.cr(myreg, n.occ=8)
Hetero.pars.sur.cr<-setpars.survey.cr(hetero.pop, des.cr,
pmin.unmarked=0.003, pmax.unmarked = 0.3,
pmin.marked=0.0003, pmax.marked = 0.03)
set.seed(243)
Hetero.samp.cr <- generate.sample.cr(Hetero.pars.sur.cr)
summary(Hetero.samp.cr)
Hetero.point.cr.M0<-point.est.crM0(Hetero.samp.cr)
Hetero.point.cr.Mt<-point.est.crMt(Hetero.samp.cr)
Hetero.point.cr.Mb<-point.est.crMb(Hetero.samp.cr)
round(Hetero.point.cr.M0$Nhat.grp);
round(Hetero.point.cr.Mt$Nhat.grp);
round(Hetero.point.cr.Mb$Nhat.grp);
Hetero.point.cr.M0$log.Likelihood; Hetero.point.cr.Mt$log.Likelihood;
Hetero.point.cr.Mb$log.Likelihood
Hetero.point.cr.M0$phat
Hetero.point.cr.Mt$phat
Hetero.point.cr.Mb$phat
Hetero.point.cr.M0$AIC; Hetero.point.cr.Mt$AIC;
Hetero.point.cr.Mb$AIC
Mark Recapture estimators can be very biased if you have the wrong model.
Model selection is important!
Question: Although we did not model the heterogeneity in the population, the point estimate from the
best model is not far off the true abundance. Why do you think this is?
Answer: We captured 60% of the population! At the very worst the estimator could be biased by 40%
in this case. The point estimate is about 30% lower than the true abundance.
# Heterogeneous, reducing effort
des.cr<-generate.design.cr(myreg, n.occ=8, effort=c(8:1)/5)
Hetero.pars.sur.cr<-setpars.survey.cr(hetero.pop, des.cr,
pmin.unmarked=0.002, pmax.unmarked = 0.2)
set.seed(243)
Hetero.samp.cr <- generate.sample.cr (Hetero.pars.sur.cr)
summary(Hetero.samp.cr)
Hetero.point.cr.M0<-point.est.crM0(Hetero.samp.cr)
Hetero.point.cr.Mt<-point.est.crMt(Hetero.samp.cr)
Hetero.point.cr.Mb<-point.est.crMb(Hetero.samp.cr)
round(Hetero.point.cr.M0$Nhat.grp);
round(Hetero.point.cr.Mt$Nhat.grp);
round(Hetero.point.cr.Mb$Nhat.grp);
Hetero.point.cr.M0$log.Likelihood; Hetero.point.cr.Mt$log.Likelihood;
Hetero.point.cr.Mb$log.Likelihood
Hetero.point.cr.M0$phat
Hetero.point.cr.Mt$phat
Hetero.point.cr.Mb$phat
Hetero.point.cr.M0$AIC; Hetero.point.cr.Mt$AIC;
Hetero.point.cr.Mb$AIC
# Homogeneous, trap-shy
des.cr<-generate.design.cr(myreg, n.occ=8)
homo.pars.sur.cr<-setpars.survey.cr(hetero.pop, des.cr,
pmin.unmarked=0.1, pmax.unmarked = 0.1, pmin.marked=0.01, pmax.marked
= 0.01)
set.seed(243)
Homo.samp.cr <- generate.sample.cr (homo.pars.sur.cr)
summary(Homo.samp.cr)
Homo.point.cr.M0<-point.est.crM0(Homo.samp.cr)
Homo.point.cr.Mt<-point.est.crMt(Homo.samp.cr)
Homo.point.cr.Mb<-point.est.crMb(Homo.samp.cr)
round(Homo.point.cr.M0$Nhat.grp); round(Homo.point.cr.Mt$Nhat.grp);
round(Homo.point.cr.Mb$Nhat.grp)
Homo.point.cr.M0$log.Likelihood; Homo.point.cr.Mt$log.Likelihood;
Homo.point.cr.Mb$log.Likelihood
Homo.point.cr.M0$phat
Homo.point.cr.Mt$phat
Homo.point.cr.Mb$phat
Homo.point.cr.M0$AIC; Homo.point.cr.Mt$AIC; Homo.point.cr.Mb$AIC
# Homogeneous, reducing effort
des.cr<-generate.design.cr(myreg, n.occ=8, effort=c(15:8)/15)
homo.pars.sur.cr<-setpars.survey.cr(hetero.pop, des.cr,
pmin.unmarked=0.11, pmax.unmarked = 0.11)
set.seed(243)
Homo.samp.cr <- generate.sample.cr (homo.pars.sur.cr)
#plot(Homo.samp.cr)
summary(Homo.samp.cr)
Homo.point.cr.M0<-point.est.crM0(Homo.samp.cr)
Homo.point.cr.Mt<-point.est.crMt(Homo.samp.cr)
Homo.point.cr.Mb<-point.est.crMb(Homo.samp.cr)
round(Homo.point.cr.M0$Nhat.grp); round(Homo.point.cr.Mt$Nhat.grp);
round(Homo.point.cr.Mb$Nhat.grp);
Homo.point.cr.M0$log.Likelihood; Homo.point.cr.Mt$log.Likelihood;
Homo.point.cr.Mb$log.Likelihood
Homo.point.cr.M0$phat
Homo.point.cr.Mt$phat
Homo.point.cr.Mb$phat
Homo.point.cr.M0$AIC; Homo.point.cr.Mt$AIC; Homo.point.cr.Mb$AIC
3. Interval Estimation
To implement a nonparametric bootstrap, we resample from the observed capture histories, as we did
with the removal method.
Capture histories
A capture history is a set of 0’s and 1’s, with a 1 in position s indicating capture on occasion s and a 0
indicating no capture. For a two sample survey, there are four possible capture histories:




(1,0) for animals captured only on the first occasion (there are n1 = u1 of them);
(0,1) for animals captured only on the second occasion (there are u2 of them)
(1,1) for animals captured on both occasions (there are m2 of them)
(0,0) for uncaptured animals (there are N-u1-u2-m2 of them)
Capture history (1,1) gives additional information beyond that which was available with the removal
method.
A nonparametric bootstrap for the two sample mark recapture method proceeds as follows:
1.
We create a pseudo-population consisting of the (n=u1+u2) observed capture histories and Nˆ  n
capture histories (0,0) for the uncaptured animals;
2.
We “resample” N̂ , capture histories with replacement, from the pseudo-population;
3.
Each time we resample, we calculate an estimate of N;
4.
If we resample B=999 times, we have 999 estimates of N; the distribution of these is our estimated
sampling distribution.
5.
To get the CI, we use the estimated N below which 2.5% of the 999 estimates of N fall, and the N
above which 2.5% of the estimates of N fall. (This is called the “percentile method” of calculating
a CI from bootstrap resamples.)
The method extends naturally to mark recapture methods with more than two samples.
Summary
Key Idea:
Estimate p by proportion of marked animals that are recaptured
Process Model:
None; assume complete coverage
Observation model:
Various. With simplest, capture probability is the same for all animals on all occasions. Other
models allow it to depend on capture occasion, trap response, and/or individual differences by
animal; independent captures.
Likelihood function
(see book)
Main Assumptions & Effect of violating them:
1.
Assumption 1: Capture probability is
a.
M0: is constant
b.
Mt: depends only on capture occasion
c.
Mb: is constant except that marked animals are more/less catchable than unmarked
d.
Mtb: depends on capture occasion and whether or not an animal has been caught.
e.
Mh*: (where * is any of the above subscripts) depends on animal-level variables, and
possibly other factors, as above.
Effect of violation:
All estimators are biased by violation of the assumption above on which they are based. A
particular problem is that catchability is very often different for different animals (there is
heterogeneity in the population) and if this is not modelled (using Mh* type models). Any of
the estimators that assume that there is not animal-level difference in catchability will be
baised in this case – sometimes by a large amount. It is important to try a variety of plausible
models (including Mh* type models); AIC can be used to select between them.
You can sometimes reduce the unmodelled heterogeneity problem by using very different
capture methods on different capture occasions. Bias arises from unmodelled heterogeneity
because some animals are more catchable on all occasions than others – the abundance
estimate tends towards the abundance of the more catchable animals (which is lower than the
abundance of all animals). With sufficiently different capture methods, animals that are very
catchable by the method used on the first occasion are not particularly catchable using the
method used on the second occasion, etc. You want the capture methods to be such that the
fact that an animal was (or was not) caught on the first occasion implies nothing about its
probability of being caught on the second occasion.
2.
Assumption 2: Animals are captured independently of one another.
Effect of violation: CI’s based on the assumption tend to be biased and point estimates may
be biased. Violation of this assumption tends to be a problem if practicalities make it so. For
example, if a trap can hold only one animal, then when all traps are full, the probability of
catching any of the uncaptured animals is zero: capture probability depends on how many
animals have been capture. Care should be taken in designing the experiment to avoid this sort
of problem.
3.
Assumption 3: No marks are lost.
Effect of violation: Mark loss results in positive bias in abundance estimation. The lower
recapture frequency that mark loss causes is interpreted as a smaller fraction of the population
having been marked than actually has been marked. This causes the abundance estimator to be
positively biased. The same effect occurs if marks are not reported even though they are still
on the animals. This is common in situations where recaptures are made by members of the
public rather than a dedicated survey team (e.g. where sport fisherpeople return marks from
fish they catch; where birdwatchers return marks from dead birds).
References
Otis,D.L. and Burnham,K.P. and White,G.C. and Anderson, D.R. (1978) Statistical inference from
capture data on closed animal populations. Wildlife Monographs 62.
Download