Detectability and Sampling (Chapter 16)

To this point, all sampling methods considered have assumed that the variable of interest is
measured without error and that the only source of variation is natural variation between
the observed sampling units. Particularly in situations where we count the number of some
species in some subplot of a large region, it is not always the case that detection of all such
species is “perfect.” In fact, with many elusive animal species (birds, fish, bears, etc.), detectability is far from perfect, and we need to account for the probability of detection of such
species in estimating species population totals or means.
Consider some region within which we want to estimate the total number of objects in
the region, or the mean number of objects in the region per unit area. To carry out this type
of estimation under imperfect detectability, we introduce the following notation:
τ = the actual total # of objects in the whole region,
A = the area of the region,
D = τ /A = the actual density of objects per unit area in the region,
y = the observed # of objects in the region under imperfect detectability,
[Note that y here is a random variable]
p = the probability of detection (assumed equal for all objects).
• Assume also that the detections are independent of one another (i.e.: the fact that one
object is detected has no bearing on whether or not any other object is detected). Is
this reasonable?
• What possible values could the random variable y have?
• Under the assumption of independent detections, and equal probabilities of detection
on each object, the distribution of y is binomial, $y \sim \mathrm{Bin}(\tau, p)$,
with mean $E(y) = \tau p$
and variance $\mathrm{Var}(y) = \tau p(1-p)$.
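The binomial mean and variance for the observed count can be checked exactly by summing over the possible values of y. This is only a small sketch: the values of `tau` and `p` are illustrative assumptions and do not come from the notes.

```r
# Exact check of E(y) = tau * p and Var(y) = tau * p * (1 - p) for y ~ Bin(tau, p).
# tau and p are hypothetical values chosen only for illustration.
tau <- 50     # assumed true number of objects in the region
p   <- 0.7    # assumed detection probability

k    <- 0:tau                              # all possible values of y
prob <- dbinom(k, size = tau, prob = p)    # P(y = k)

mean_y <- sum(k * prob)                    # E(y) computed from the pmf
var_y  <- sum((k - mean_y)^2 * prob)       # Var(y) computed from the pmf

c(mean_from_pmf = mean_y, tau_p = tau * p)
c(var_from_pmf = var_y, tau_p_1mp = tau * p * (1 - p))
```

Each printed pair should agree up to floating-point error.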
Estimating τ , D for Known Detectability p:
• Suppose, for example, that we detect y animals where p is known. Since E(y) = τp
and the observed y is an unbiased estimate of E(y), then y/p is an unbiased estimate of
τ. Hence, the estimated total and corresponding variance are given by:
$$\hat{\tau} = \frac{y}{p} \quad\text{and}\quad \mathrm{Var}(\hat{\tau}) = \mathrm{Var}\left(\frac{y}{p}\right) = \frac{\tau p(1-p)}{p^2} = \frac{\tau(1-p)}{p}.$$
• The estimated variance of $\hat{\tau}$ is given by: $\widehat{\mathrm{Var}}(\hat{\tau}) = \dfrac{\hat{\tau}(1-p)}{p}$.
• To estimate the density of objects per unit area in the region (D = τ/A), we use:
$$\hat{D} = \frac{\hat{\tau}}{A} = \frac{y}{pA}, \qquad \mathrm{Var}(\hat{D}) = \frac{\tau(1-p)}{A^2 p}, \qquad \widehat{\mathrm{Var}}(\hat{D}) = \frac{\hat{\tau}(1-p)}{A^2 p}.$$
Example (Problem 1 on page 197): In an aerial survey in Alaska, 82 moose were detected.
Intensive independent studies determined the probability of detection to be 0.89. Estimate
the total number of moose in the study region and estimate the variance of that estimate.
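Here, y = 82 moose, and p = 0.89 (probability of detection), so we estimate:
$$\hat{\tau} = \frac{y}{p} = \frac{82}{.89} = 92.1 \text{ moose},$$
with standard error:
$$\widehat{\mathrm{SE}}(\hat{\tau}) = \sqrt{\frac{\hat{\tau}(1-p)}{p}} = \sqrt{\frac{92.1(1-.89)}{.89}} = \sqrt{11.39} = 3.38 \text{ moose}.$$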
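The moose calculation with known detectability can be reproduced in R. The helper name `total_known_p` is a hypothetical choice for this sketch and is not from the text; it simply applies the estimator y/p and its estimated variance.

```r
# Estimated total and standard error when the detection probability p is treated as known.
# total_known_p is a hypothetical helper name used only in this sketch.
total_known_p <- function(y, p) {
  tau_hat <- y / p                    # unbiased estimate of the total tau
  var_hat <- tau_hat * (1 - p) / p    # estimated Var(tau_hat)
  list(tau_hat = tau_hat, var_hat = var_hat, se_hat = sqrt(var_hat))
}

# Moose example: 82 animals detected, detection probability 0.89
moose <- total_known_p(y = 82, p = 0.89)
round(unlist(moose), 2)
```

The estimated total of about 92.1 moose and the estimated variance of about 11.39 match the hand calculation; the standard error is roughly 3.4 moose.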
• We assume that p is somehow known here, but, in practice, it must be estimated. If it is
estimated within the same study, by, for example, ground-truthing the aerial estimates
on a subset of plots in the study area, then $\hat{\tau}$ above is the same as the ratio estimator
(because p would be the reciprocal of the ratio r of actual to visual; see Chap. 7) and
the standard error should be estimated by the formula for ratio estimation and not the
formula above.
• If the estimate of p comes from another study independent of the current one, then if
there is a standard error associated with the estimate we should use the methods below
for estimated p. Methods of estimating detectability include mark-recapture methods,
radio-collaring methods, distance-based methods, and regression-based methods.
• Whatever was done, it is important to recognize that if p comes from outside the current study, then it is an independent estimate of the detectability.
Estimating τ, D with Estimated Detectability $\hat{p}$:
• Suppose now that instead of assuming the detectability p is known, p is in fact estimated
independently by $\hat{p}$ with some variance $\mathrm{Var}(\hat{p})$.
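• Here, τ is estimated through a ratio estimator given by:
$$\hat{\tau} = \frac{y}{\hat{p}},$$
with variance given by:
$$\begin{aligned}
\mathrm{Var}(\hat{\tau}) &\approx \frac{\mu_y^2}{\mu_{\hat{p}}^4}\,\mathrm{Var}(\hat{p}) + \frac{1}{\mu_{\hat{p}}^2}\,\mathrm{Var}(y) \qquad \text{(Delta Method (\#2 on p. 38 of notes))} \\
&= \frac{\tau^2 p^2}{p^4}\,\mathrm{Var}(\hat{p}) + \frac{1}{p^2}\,\mathrm{Var}(y) = \frac{1}{p^2}\left[\mathrm{Var}(y) + \tau^2\,\mathrm{Var}(\hat{p})\right] \\
&= \frac{1}{p^2}\left[\tau p(1-p) + \tau^2\,\mathrm{Var}(\hat{p})\right] \\
&= \underbrace{\tau\left(\frac{1-p}{p}\right)}_{\text{variation due to imperfect detection}} + \underbrace{\frac{\tau^2}{p^2}\,\mathrm{Var}(\hat{p})}_{\text{variation in } \hat{p}}.
\end{aligned}$$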
• There is no covariance term in the Delta Method variance approximation above as it
is assumed that the current survey is independent of the one used to estimate p.
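• The estimated approximate variance (taking into account the effect of estimated detectability) is
$$\widehat{\mathrm{Var}}(\hat{\tau}) = \hat{\tau}\left(\frac{1-\hat{p}}{\hat{p}}\right) + \frac{\hat{\tau}^2}{\hat{p}^2}\,\widehat{\mathrm{Var}}(\hat{p}).$$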
Back to the Moose Example: Suppose in addition to being told that y = 82 moose were
detected and that the detectability was estimated to be $\hat{p} = .89$, we are told that $\widehat{\mathrm{SE}}(\hat{p}) = .05$.
Then
$$\widehat{\mathrm{Var}}(\hat{\tau}) = 11.39 + \frac{92.1^2}{.89^2}(.05)^2 = 11.39 + 26.8 = 38.2 \implies \widehat{\mathrm{SE}}(\hat{\tau}) = 6.2.$$
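The same moose calculation with estimated detectability can be sketched in R, keeping the two variance components separate so their relative sizes are visible. The helper name `total_est_p` is a hypothetical choice for this sketch.

```r
# Estimated total and SE when p is estimated independently with standard error se_p.
# total_est_p is a hypothetical helper name used only in this sketch.
total_est_p <- function(y, p_hat, se_p) {
  tau_hat    <- y / p_hat
  var_detect <- tau_hat * (1 - p_hat) / p_hat      # component due to imperfect detection
  var_phat   <- (tau_hat^2 / p_hat^2) * se_p^2     # component due to estimating p
  c(tau_hat = tau_hat, var_detect = var_detect, var_phat = var_phat,
    se_hat = sqrt(var_detect + var_phat))
}

# Moose example: y = 82, p_hat = 0.89, SE(p_hat) = 0.05
moose_est <- total_est_p(y = 82, p_hat = 0.89, se_p = 0.05)
round(moose_est, 2)
```

The two components come out near 11.39 and 26.8, giving a standard error of about 6.2 moose, in line with the hand calculation.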
• Note that this SE is almost double what it was in the earlier calculation (3.38 vs. 6.2).
Hence, taking into account the variation in $\hat{p}$ demonstrates how badly we underestimated the SE initially, and how important it is to take this extra source of variation
into account.
• It will usually be the case (and certainly should be the case!) that an estimate of p,
$\hat{p}$, includes an estimate of the variability of $\hat{p}$. Unfortunately, independent estimates
taken from other papers are often treated as “truth” without any consideration of the
variability underlying such estimates.
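The delta-method approximation for the variance of y divided by an independent estimate of p can be checked by simulation. This sketch rests on assumptions that are not in the notes: the independent detectability study is taken to be a binomial experiment on `m` marked animals, and the values of `tau`, `p`, `m`, and `reps` are illustrative.

```r
# Monte Carlo check of the delta-method variance for tau_hat = y / p_hat.
# Assumptions (hypothetical, for illustration only):
#   - p_hat comes from an independent study that detects X ~ Bin(m, p) of m marked animals
#   - tau, p, m, and reps are made-up values
set.seed(1)
tau  <- 92
p    <- 0.89
m    <- 200
reps <- 200000

y       <- rbinom(reps, size = tau, prob = p)     # counts from the current survey
p_hat   <- rbinom(reps, size = m, prob = p) / m   # independent estimates of p
tau_hat <- y / p_hat

var_p_hat <- p * (1 - p) / m                      # Var(p_hat) under the assumed study
delta_var <- tau * (1 - p) / p + (tau^2 / p^2) * var_p_hat
mc_var    <- var(tau_hat)

c(delta_method = delta_var, monte_carlo = mc_var, mean_tau_hat = mean(tau_hat))
```

With these settings the simulated variance should sit close to the delta-method value, and the mean of the simulated estimates should sit slightly above `tau`, reflecting the small bias of a ratio estimator.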
Detectability with Simple Random Sampling: Suppose we take an SRS (without replacement) of size n from a population of N units. We might consider units to be plots within
some region, where animals within a selected plot are detected with constant probability p,
independently. Let:
$Y_i$ = the actual number of objects (animals) in unit i, i = 1, . . . , N,
$y_i$ = the observed number of objects in unit i,
so that $y_i \sim \mathrm{Bin}(Y_i, p)$, where we assume for now that p is known. The goal, as before, is to
estimate the population total number of objects $\tau = \sum_{i=1}^{N} Y_i$.
• We know from earlier that for a given sampled unit i:
$$\hat{Y}_i = \frac{y_i}{p}, \qquad \mathrm{Var}(\hat{Y}_i) = \frac{Y_i(1-p)}{p}, \qquad \text{where } E(\hat{Y}_i) = Y_i.$$
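With these then, an unbiased estimate of the population total τ is:
$$\hat{\tau} = \frac{N}{n}\sum_{i=1}^{n}\hat{Y}_i = \frac{N}{n}\sum_{i=1}^{n}\frac{y_i}{p} = N\,\frac{\bar{y}}{p}, \qquad \text{where } \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i.$$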
• The variance of $\hat{\tau}$ (derived later in these notes and in Sec. 16.7 of the text) is
$$\mathrm{Var}(\hat{\tau}) = N^2\left[\left(\frac{N-n}{N}\right)\frac{\sigma^2}{n} + \left(\frac{1-p}{p}\right)\frac{\mu}{n}\right],$$
where $\mu = \dfrac{\tau}{N}$ and $\sigma^2 = \dfrac{1}{N-1}\sum_{i=1}^{N}(Y_i-\mu)^2$ is the natural variability in the population
units.
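• An unbiased estimator of $\mathrm{Var}(\hat{\tau})$ is given by
$$\widehat{\mathrm{Var}}(\hat{\tau}) = \frac{N^2}{p^2}\left[\left(\frac{N-n}{N}\right)\frac{s^2}{n} + \left(\frac{1-p}{N}\right)\bar{y}\right],$$
where $s^2 = \dfrac{1}{n-1}\sum_{i=1}^{n}(y_i-\bar{y})^2$ is the sample variance of the observed counts.
Note: $s^2$ does not estimate $\sigma^2$ (as was the case earlier with two-stage sampling). $s^2$
underestimates $\sigma^2$.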
Worst Case: Detectability in SRS with p Unknown: Suppose now we take an SRS of n units
from a population of N units, where p is unknown and is estimated independently by $\hat{p}$ with
variance $\mathrm{Var}(\hat{p})$. As before for p unknown, the population total τ is estimated via a ratio
estimator:
$$\hat{\tau} = \frac{N\bar{y}}{\hat{p}},$$
with approximate variance given by:
$$\mathrm{Var}(\hat{\tau}) \approx N^2\Bigg[\underbrace{\left(\frac{N-n}{N}\right)\frac{\sigma^2}{n}}_{\text{variability due to SRS}} + \underbrace{\left(\frac{1-p}{p}\right)\frac{\mu}{n}}_{\text{variability due to imperfect detectability}} + \underbrace{\frac{\mu^2}{p^2}\,\mathrm{Var}(\hat{p})}_{\text{variability due to estimating } p \text{ (Delta Method)}}\Bigg].$$
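This variance is estimated by:
$$\widehat{\mathrm{Var}}(\hat{\tau}) = \frac{N^2}{\hat{p}^2}\left[\left(\frac{N-n}{N}\right)\frac{s^2}{n} + \left(\frac{1-\hat{p}}{N}\right)\bar{y} + \frac{\bar{y}^2}{\hat{p}^2}\,\widehat{\mathrm{Var}}(\hat{p})\right].$$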
• There are 3 variance components here to account for all 3 levels of estimation (SRS,
Imperfect Detectability, Estimation of p).
Example: Problem 4, page 197: Suppose an SRS of n = 5 plots is selected from a study area
of N = 100 plots and that the numbers of animals detected in the five plots are 10, 7, 0,
0, and 5, but that the probability of detection for any animal in a selected plot is p = .80.
Estimate the total number of animals in the study region and estimate the variance of the
estimator.
R is employed below to answer this question:
> N <- 100
> n <- 5
> y <- c(10,7,0,0,5)
> p <- .8
> N*mean(y)/p      # Estimate of the total number of animals in the region: tauhat = N*ybar/p
[1] 550
The estimated variability will be computed in three ways: assuming no error in estimating
p, assuming $\mathrm{SE}(\hat{p}) = .05$, and assuming $\mathrm{SE}(\hat{p}) = .25$, for the sake of comparison.
# If Detectability Estimated with SE of 0
# =======================================
> var.srs <- (N^2/p^2)*((N-n)/N)*var(y)/n     # Var_SRS(tauhat) = (N^2/p^2) ((N-n)/N) (s^2/n)
> var.srs
[1] 57296.9
# This is the major contribution to the variability.
> var.det <- (N^2/p^2)*((1-p)/N)*mean(y)      # Var_ID(tauhat) = (N^2/p^2) ((1-p)/N) ybar
> var.det
[1] 137.5
# Variability due to detectability is minor.
> sqrt(var.srs + var.det)                     # SE(tauhat) = sqrt(Var_SRS(tauhat) + Var_ID(tauhat))
[1] 239.6547
# SE of tauhat
# If Detectability Estimated with SE of .05
# =========================================
> var.srs <- (N^2/p^2)*((N-n)/N)*var(y)/n
> var.det <- (N^2/p^2)*(((1-p)/N)*mean(y) + (mean(y)^2/p^2)*(.05)^2)
> sqrt(var.srs + var.det)
[1] 242.1074
# SE of tauhat
# If Detectability Estimated with SE of .25
# =========================================
> var.srs <- (N^2/p^2)*((N-n)/N)*var(y)/n
> var.det <- (N^2/p^2)*(((1-p)/N)*mean(y) + (mean(y)^2/p^2)*(.25)^2)
> sqrt(var.srs + var.det)
[1] 294.9159
# SE of tauhat
• Note that the estimated SE with detectability estimated with a SE of .05 is not much
different from having no error in the estimation of $\hat{p}$ (i.e.: assuming p is known).
• The estimated SE with detectability estimated with a SE of .25 is appreciably larger
than when p is assumed known, as the third piece of the variance above is now large
relative to the first component.
• The bottom line in this problem is that getting an accurate estimate of p is not that
important here; getting a larger sample size n is more important here.
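The three standard errors in this example can also be obtained from a single function that returns each variance component separately, which makes the comparison in the bullets above easier to see. The function name `srs_detect_se` and its argument names are hypothetical choices for this sketch; the formulas are the estimated variance components given earlier.

```r
# Estimated total and variance components for an SRS with imperfect detectability.
# srs_detect_se is a hypothetical helper; se_p = 0 corresponds to p being known.
srs_detect_se <- function(y, N, p, se_p = 0) {
  n    <- length(y)
  ybar <- mean(y)
  var_srs <- (N^2 / p^2) * ((N - n) / N) * var(y) / n   # SRS component
  var_det <- (N^2 / p^2) * ((1 - p) / N) * ybar         # imperfect-detection component
  var_p   <- (N^2 / p^2) * (ybar^2 / p^2) * se_p^2      # estimation-of-p component
  c(tau_hat = N * ybar / p, var_srs = var_srs, var_det = var_det,
    var_p = var_p, se = sqrt(var_srs + var_det + var_p))
}

y   <- c(10, 7, 0, 0, 5)
res <- sapply(c(0, 0.05, 0.25),
              function(s) srs_detect_se(y, N = 100, p = 0.8, se_p = s))
colnames(res) <- c("se_p=0", "se_p=.05", "se_p=.25")
round(res, 1)
```

The `se` row reproduces the three standard errors from the transcript, and the `var_p` row shows the third component growing from zero to a size comparable with the SRS component when SE(p̂) = .25.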
Derivation of Variance Expressions: The derivation of the variances of the estimators of τ
under the various scenarios described above illustrates some useful techniques: the delta
method (described earlier in the notes) and two common results on iterated expectations
(sometimes called the laws of total expectation and total variance). These latter two results
are:
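1. $E(Y) = E\left[E(Y \mid X)\right]$.
2. $\mathrm{Var}(Y) = \underbrace{E\left[\mathrm{Var}(Y \mid X)\right]}_{\text{var. within } y \text{ at some } x \text{, averaged over all } x} + \underbrace{\mathrm{Var}\left[E(Y \mid X)\right]}_{\text{var. due to differences in the } \mu_{Y|X}\text{'s}}$.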
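Both results can be verified exactly on a small example in which X is the true count of a plot chosen uniformly at random and, given X, the observed count is binomial. The plot counts in `Y` and the value of `p` are hypothetical and chosen only for illustration.

```r
# Exact check of the laws of total expectation and total variance.
# Assumed setup (hypothetical): one plot is chosen uniformly from 4 plots with true
# counts Y, and the observed count y given the plot is Bin(Y_i, p).
Y <- c(0, 3, 5, 12)
p <- 0.8

# Pieces of the decomposition
within  <- mean(Y * p * (1 - p))             # E[Var(y | X)]
between <- mean((p * Y - mean(p * Y))^2)     # Var[E(y | X)] for a uniform draw

# Direct calculation from the marginal (mixture) distribution of y
k   <- 0:max(Y)
pmf <- sapply(k, function(kk) mean(dbinom(kk, size = Y, prob = p)))
mean_direct <- sum(k * pmf)
var_direct  <- sum((k - mean_direct)^2 * pmf)

c(mean_direct = mean_direct, mean_iterated = p * mean(Y))
c(var_direct = var_direct, var_decomposed = within + between)
```

Each pair should agree up to floating-point error.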
The derivation of the variance expressions for $\hat{\tau}$ in the order in which they were considered
above follows.
1. Known detectability p over a whole region
(Section 16.1): The expression for $\mathrm{Var}(\hat{\tau})$ (eq. 4 on p. 186 of text) follows directly
from the variance of a binomial random variable. The unbiasedness of $\hat{\tau}$ follows from
the expected value of a binomial random variable.
2. Estimated detectability over a whole region
(Section 16.3): The estimated population total is $\hat{\tau} = y/\hat{p}$ where y is a binomial(τ, p)
random variable, and $\hat{p}$ is a random variable with $E(\hat{p}) = p$ (at least approximately)
and variance $\mathrm{Var}(\hat{p})$. We then use the delta method approximation for the variance of
the ratio of two random variables:
$$\mathrm{Var}\left(\frac{Y}{X}\right) \approx \left(\frac{\mu_Y^2}{\mu_X^4}\right)\sigma_X^2 + \left(\frac{1}{\mu_X^2}\right)\sigma_Y^2 - 2\left(\frac{\mu_Y}{\mu_X^3}\right)\rho\,\sigma_X\sigma_Y,$$
where, in this situation, ρ = 0 since y and $\hat{p}$ are assumed to be independent. Substituting E(y) = τp and Var(y) = τp(1 − p) (since y is a binomial random variable), and
$E(\hat{p}) = p$ (at least approximately), gives eq. (7) on p. 188 of the text:
$$\mathrm{Var}(\hat{\tau}) \approx \tau\left(\frac{1-p}{p}\right) + \frac{\tau^2}{p^2}\,\mathrm{Var}(\hat{p}).$$
As a ratio estimator, $\hat{\tau}$ is not unbiased in this situation.
3. Known detectability with simple random sampling
(Section 16.4): This derivation is outlined in Section 16.7. Based on an SRS of n
plots from N plots, where the detection probability p is known, the estimator for the
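population total was given as:
$$\hat{\tau} = \frac{N}{n}\sum_{i=1}^{n}\hat{Y}_i = \frac{N}{n}\sum_{i=1}^{n}\frac{y_i}{p}, \quad \text{with variance} \quad \mathrm{Var}(\hat{\tau}) = N^2\left[\left(\frac{N-n}{N}\right)\frac{\sigma^2}{n} + \left(\frac{1-p}{p}\right)\frac{\mu}{n}\right].$$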
Note that the estimate $\hat{\tau}$ obtained depends on which n plots are chosen. It’s easy to
compute the expectation and variance of $\hat{\tau}$ given the particular set S of n plots chosen
for the SRS. So we condition on S and write:
$$E(\hat{\tau} \mid S) = E\left(\frac{N}{np}\sum_{i\in S} y_i \,\Big|\, S\right) = \frac{N}{np}\sum_{i\in S}E(y_i \mid S) = \frac{N}{np}\sum_{i\in S} pY_i = \frac{N}{n}\sum_{i\in S}Y_i$$
because $y_i$ is binomial($Y_i$, p) where $Y_i$ is the actual number of individuals in unit i. Now
we take the expectation of the above expression over all possible SRS’s S. Since this is
just the expected value of N times the sample mean from an SRS, we know by results
in chapter 2 (for a finite population) that the expected value is N times the population
mean. In other words, the unconditional expected value of $\hat{\tau}$ is
$$E(\hat{\tau}) = E\left[E(\hat{\tau} \mid S)\right] = E\left[\frac{N}{n}\sum_{i\in S}Y_i\right] = N\,E(\bar{Y}) = N\mu = N\left(\frac{\tau}{N}\right) = \tau.$$
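Hence, $\hat{\tau}$ is an unbiased estimator of τ.
To obtain the variance of $\hat{\tau}$, we again condition on S. First, note that
$$\begin{aligned}
\mathrm{Var}(\hat{\tau}\mid S) &= \mathrm{Var}\left(\frac{N}{np}\sum_{i\in S}y_i \,\Big|\, S\right) = \frac{N^2}{n^2p^2}\sum_{i\in S}\mathrm{Var}(y_i\mid S) \\
&= \frac{N^2}{n^2p^2}\sum_{i\in S}Y_i\,p(1-p) = \frac{N^2}{n^2}\left(\frac{1-p}{p}\right)\sum_{i\in S}Y_i.
\end{aligned}$$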
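The variance of $\hat{\tau}$ is then computed as
$$\begin{aligned}
\mathrm{Var}(\hat{\tau}) &= E\left[\mathrm{Var}(\hat{\tau}\mid S)\right] + \mathrm{Var}\left[E(\hat{\tau}\mid S)\right] \\
&= E\left[\frac{N^2}{n^2}\left(\frac{1-p}{p}\right)\sum_{i\in S}Y_i\right] + \mathrm{Var}\left[\frac{N}{n}\sum_{i\in S}Y_i\right] \\
&= \frac{N^2}{n^2}\left(\frac{1-p}{p}\right)\cdot n\mu + N^2\,\mathrm{Var}(\bar{Y}) \qquad (\text{since } \mu = \tau/N) \\
&= \frac{N^2(1-p)\mu}{pn} + N^2\left(\frac{N-n}{N}\right)\frac{\sigma^2}{n} = N^2\left[\left(\frac{N-n}{N}\right)\frac{\sigma^2}{n} + \left(\frac{1-p}{p}\right)\frac{\mu}{n}\right].
\end{aligned}$$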
This is the equation in the middle of p. 193 of the text.
4. Estimated detectability with simple random sampling
(Section 16.5) The derivation of the variance of $\hat{\tau}$ in this case is outlined on p. 194; the
complete derivation will be left as a homework exercise.
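The results for known detectability with simple random sampling (item 3) can be checked by simulation: repeatedly draw an SRS of plots, thin the true counts binomially, and compare the simulated mean and variance of the estimator with τ and the variance formula. Everything about the population here is an assumption made for illustration: the true counts are generated from a Poisson distribution with mean 5, and `N`, `n`, `p`, and the number of replicates are made-up values.

```r
# Simulation check of unbiasedness and of the variance formula for SRS with known p.
# The population (Poisson counts), N, n, p, and the replicate count are hypothetical.
set.seed(42)
N <- 100; n <- 5; p <- 0.8
Y <- rpois(N, lambda = 5)            # assumed true counts in the N plots

tau    <- sum(Y)
mu     <- mean(Y)
sigma2 <- var(Y)                     # divisor N - 1, as in the notes
true_var <- N^2 * (((N - n) / N) * sigma2 / n + ((1 - p) / p) * mu / n)

one_survey <- function() {
  S <- sample(N, n)                          # SRS without replacement
  y <- rbinom(n, size = Y[S], prob = p)      # imperfect detection within each plot
  tau_hat <- N * mean(y) / p
  var_hat <- (N^2 / p^2) * (((N - n) / N) * var(y) / n + ((1 - p) / N) * mean(y))
  c(tau_hat = tau_hat, var_hat = var_hat)
}

sims <- replicate(50000, one_survey())       # 2 x 50000 matrix

c(tau = tau, mean_tau_hat = mean(sims["tau_hat", ]))
c(formula_var = true_var,
  simulated_var = var(sims["tau_hat", ]),
  mean_var_hat = mean(sims["var_hat", ]))
```

The simulated mean of the estimator should be close to `tau`, and both the simulated variance and the average of the variance estimates should be close to the formula value, consistent with the unbiasedness claims in the notes.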