Detectability and Sampling (Chapter 16)

To this point, all sampling methods considered have assumed that the variable of interest is measured without error and that the only source of variation is natural variation between the observed sampling units. Particularly when we count the number of individuals of some species in a subplot of a large region, detection of all such individuals is not always "perfect." In fact, with many elusive animal species (birds, fish, bears, etc.), detectability is far from perfect, and we need to account for the probability of detection when estimating population totals or means.

Consider some region within which we want to estimate the total number of objects, or the mean number of objects per unit area. To carry out this type of estimation under imperfect detectability, we introduce the following notation:

$\tau$ = the actual total number of objects in the whole region,
$A$ = the area of the region,
$D = \tau/A$ = the actual density of objects per unit area in the region,
$y$ = the observed number of objects in the region under imperfect detectability [note that $y$ here is a random variable],
$p$ = the probability of detection (assumed equal for all objects).

• Assume also that the detections are independent of one another (i.e., the fact that one object is detected has no bearing on whether or not any other object is detected). Is this reasonable?

• What possible values could the random variable $y$ have?

• Under the assumption of independent detections and equal probabilities of detection for each object, the distribution of $y$ is $y \sim \mathrm{Binomial}(\tau, p)$, with mean $E(y) = \tau p$ and variance $\mathrm{Var}(y) = \tau p(1-p)$.

Estimating $\tau$, $D$ for Known Detectability $p$:

• Suppose, for example, that we detect $y$ animals where $p$ is known. Since $E(y) = \tau p$ and the observed $y$ is an unbiased estimate of $E(y)$, $y/p$ is an unbiased estimate of $\tau$. Hence, the estimated total and corresponding variance are given by:

$$\hat{\tau} = \frac{y}{p} \quad \text{and} \quad \mathrm{Var}(\hat{\tau}) = \mathrm{Var}\!\left(\frac{y}{p}\right) = \frac{\tau p(1-p)}{p^2} = \frac{\tau(1-p)}{p}.$$

• The estimated variance of $\hat{\tau}$ is given by:

$$\widehat{\mathrm{Var}}(\hat{\tau}) = \frac{\hat{\tau}(1-p)}{p}.$$

• To estimate the density of objects per unit area in the region ($D = \tau/A$), we use:

$$\hat{D} = \frac{\hat{\tau}}{A} = \frac{y}{pA}, \qquad \mathrm{Var}(\hat{D}) = \frac{\tau(1-p)}{A^2 p}, \qquad \widehat{\mathrm{Var}}(\hat{D}) = \frac{\hat{\tau}(1-p)}{A^2 p}.$$

Example (Problem 1 on page 197): In an aerial survey in Alaska, 82 moose were detected. Intensive independent studies determined the probability of detection to be 0.89. Estimate the total number of moose in the study region and estimate the variance of that estimate.

Here, $y = 82$ moose and $p = 0.89$ (probability of detection), so we estimate:

$$\hat{\tau} = \frac{y}{p} = \frac{82}{.89} = 92.1 \text{ moose},$$

with standard error:

$$\widehat{\mathrm{SE}}(\hat{\tau}) = \sqrt{\frac{\hat{\tau}(1-p)}{p}} = \sqrt{\frac{92.1(1-.89)}{.89}} = \sqrt{11.39} = 3.38 \text{ moose}.$$

• We assume that $p$ is somehow known here, but in practice it must be estimated. If it is estimated within the same study (for example, by ground-truthing the aerial counts on a subset of plots in the study area), then $\hat{\tau}$ above is the same as the ratio estimator (because $p$ would be the reciprocal of the ratio $r$ of actual to visual counts; see Chap. 7), and the standard error should be estimated by the formula for ratio estimation and not the formula above.

• If the estimate of $p$ comes from another study independent of the current one, and there is a standard error associated with the estimate, we should use the methods below for estimated $p$. Methods of estimating detectability include mark-recapture methods, radio-collaring methods, distance-based methods, and regression-based methods.

• Whatever was done, it is important to recognize that if $p$ comes from outside the current study, then it is an independent estimate of the detectability.

Estimating $\tau$, $D$ with Estimated Detectability $\hat{p}$:

• Suppose now that instead of assuming the detectability $p$ is known, $p$ is in fact estimated independently by $\hat{p}$ with some variance $\mathrm{Var}(\hat{p})$.
• Here, $\tau$ is estimated through a ratio estimator given by $\hat{\tau} = y/\hat{p}$, with variance given by the Delta Method (#2 on p. 38 of notes):

$$\begin{aligned}
\mathrm{Var}(\hat{\tau}) &\approx \frac{\mu_y^2}{\mu_{\hat{p}}^4}\,\mathrm{Var}(\hat{p}) + \frac{1}{\mu_{\hat{p}}^2}\,\mathrm{Var}(y) \\
&= \frac{\tau^2 p^2}{p^4}\,\mathrm{Var}(\hat{p}) + \frac{1}{p^2}\,\mathrm{Var}(y) = \frac{1}{p^2}\left[\mathrm{Var}(y) + \tau^2\,\mathrm{Var}(\hat{p})\right] \\
&= \frac{1}{p^2}\left[\tau p(1-p) + \tau^2\,\mathrm{Var}(\hat{p})\right] \\
&= \underbrace{\tau\left(\frac{1-p}{p}\right)}_{\text{variation due to imperfect detection}} + \underbrace{\frac{\tau^2}{p^2}\,\mathrm{Var}(\hat{p})}_{\text{variation in } \hat{p}}.
\end{aligned}$$

• There is no covariance term in the Delta Method variance approximation above because the current survey is assumed to be independent of the one used to estimate $p$.

• The estimated approximate variance (taking into account the effect of estimated detectability) is

$$\widehat{\mathrm{Var}}(\hat{\tau}) = \hat{\tau}\left(\frac{1-\hat{p}}{\hat{p}}\right) + \frac{\hat{\tau}^2}{\hat{p}^2}\,\widehat{\mathrm{Var}}(\hat{p}).$$

Back to the Moose Example: Suppose that, in addition to being told that $y = 82$ moose were detected and that the detectability was estimated to be $\hat{p} = .89$, we are told that $\widehat{\mathrm{SE}}(\hat{p}) = .05$. Then

$$\widehat{\mathrm{Var}}(\hat{\tau}) = 11.39 + \frac{92.1^2}{.89^2}(.05)^2 = 11.39 + 26.8 = 38.2 \implies \widehat{\mathrm{SE}}(\hat{\tau}) = 6.2.$$

• Note that this SE is almost double what it was in the earlier calculation (3.38 vs. 6.2). Taking the variation in $\hat{p}$ into account shows how badly we underestimated the SE initially, and how important it is to include this extra source of variation.

• It will usually be the case (and certainly should be the case!) that an estimate $\hat{p}$ of $p$ comes with an estimate of the variability of $\hat{p}$. Unfortunately, independent estimates $\hat{p}$ taken from other papers are often treated as "truth" without any consideration of the variability underlying such estimates.

Detectability with Simple Random Sampling:

Suppose we take an SRS (without replacement) of size $n$ from a population of $N$ units. We might consider units to be plots within some region, where animals within a selected plot are detected with constant probability $p$, independently. Let:

$Y_i$ = the actual number of objects (animals) in unit $i$, $i = 1, \ldots, N$,
$y_i$ = the observed number of objects in unit $i$, so that $y_i \sim \mathrm{Bin}(Y_i, p)$,

where we assume for now that $p$ is known. The goal, as before, is to estimate the population total number of objects $\tau = \sum_{i=1}^{N} Y_i$.

• We know from earlier that for a given sampled unit $i$:

$$\hat{Y}_i = \frac{y_i}{p}, \qquad \mathrm{Var}(\hat{Y}_i) = \frac{Y_i(1-p)}{p}, \qquad \text{where } E(\hat{Y}_i) = Y_i.$$

With these, an unbiased estimate of the population total $\tau$ is:

$$\hat{\tau} = \frac{N}{n}\sum_{i=1}^{n}\hat{Y}_i = \frac{N}{n}\sum_{i=1}^{n}\frac{y_i}{p} = N\,\frac{\bar{y}}{p}, \qquad \text{where } \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i.$$

• The variance of $\hat{\tau}$ (derived later in these notes and in Sec. 16.7 of the text) is

$$\mathrm{Var}(\hat{\tau}) = N^2\left[\left(\frac{N-n}{N}\right)\frac{\sigma^2}{n} + \left(\frac{1-p}{p}\right)\frac{\mu}{n}\right],$$

where $\mu = \tau/N$ and $\sigma^2 = \frac{1}{N-1}\sum_{i=1}^{N}(Y_i - \mu)^2$ is the natural variability in the population units.

• An unbiased estimator of $\mathrm{Var}(\hat{\tau})$ is given by

$$\widehat{\mathrm{Var}}(\hat{\tau}) = \frac{N^2}{p^2}\left[\left(\frac{N-n}{N}\right)\frac{s^2}{n} + \left(\frac{1-p}{N}\right)\bar{y}\right],$$

where $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2$ is the sample variance of the observed counts.

Note: $s^2$ does not estimate $\sigma^2$ (as was the case earlier with two-stage sampling). $s^2$ underestimates $\sigma^2$.

Worst Case: Detectability in SRS with $p$ Unknown:

Suppose now we take an SRS of $n$ units from a population of $N$ units, where $p$ is unknown and is estimated independently by $\hat{p}$ with variance $\mathrm{Var}(\hat{p})$. As before for $p$ unknown, the population total $\tau$ is estimated via a ratio estimator:

$$\hat{\tau} = \frac{N\bar{y}}{\hat{p}},$$

with approximate variance (by the Delta Method) given by:

$$\mathrm{Var}(\hat{\tau}) \approx N^2\Bigg[\underbrace{\left(\frac{N-n}{N}\right)\frac{\sigma^2}{n}}_{\text{variability due to SRS}} + \underbrace{\left(\frac{1-p}{p}\right)\frac{\mu}{n}}_{\text{variability due to imperfect detectability}} + \underbrace{\frac{\mu^2}{p^2}\,\mathrm{Var}(\hat{p})}_{\text{variability due to estimating } p}\Bigg].$$

This variance is estimated by:

$$\widehat{\mathrm{Var}}(\hat{\tau}) = \frac{N^2}{\hat{p}^2}\left[\left(\frac{N-n}{N}\right)\frac{s^2}{n} + \left(\frac{1-\hat{p}}{N}\right)\bar{y} + \frac{\bar{y}^2}{\hat{p}^2}\,\widehat{\mathrm{Var}}(\hat{p})\right].$$

• There are 3 variance components here, one for each of the 3 levels of estimation (SRS, imperfect detectability, estimation of $p$).
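The three variance components above can be collected in one small R function. What follows is only a minimal sketch and is not code from the text: the function name `srs_detect_total`, its argument names, and the counts in the usage lines are all hypothetical, chosen purely for illustration. Setting `se_p = 0` reduces it to the known-$p$ estimator.

```r
# Hypothetical helper (name and arguments are illustrative, not from the text).
#   y    : observed counts in the n sampled units
#   N    : number of units in the population
#   p    : detection probability (known, or an independent estimate)
#   se_p : standard error of the estimate of p (0 if p is treated as known)
srs_detect_total <- function(y, N, p, se_p = 0) {
  n    <- length(y)
  ybar <- mean(y)
  k    <- N^2 / p^2                             # common factor N^2 / p^2
  v_srs  <- k * ((N - n) / N) * var(y) / n      # variability due to SRS
  v_det  <- k * ((1 - p) / N) * ybar            # variability due to imperfect detectability
  v_phat <- k * (ybar^2 / p^2) * se_p^2         # variability due to estimating p
  v_total <- v_srs + v_det + v_phat
  list(tau_hat    = N * ybar / p,
       components = c(srs = v_srs, detect = v_det, p_hat = v_phat),
       var        = v_total,
       se         = sqrt(v_total))
}

# Made-up counts from n = 4 of N = 50 plots, with p-hat = 0.9 and SE(p-hat) = 0.1
fit <- srs_detect_total(y = c(3, 0, 6, 3), N = 50, p = 0.9, se_p = 0.1)
fit$tau_hat       # N * ybar / p
fit$components    # the three pieces, in the order SRS, detectability, p-hat
fit$se
```

Returning the components separately, rather than only their sum, makes it easy to see which source of variation dominates in a given survey.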
Example (Problem 4 on page 197): Suppose an SRS of $n = 5$ plots is selected from a study area of $N = 100$ plots, that the numbers of animals detected in the five plots are 10, 7, 0, 0, and 5, and that the probability of detection for any animal in a selected plot is $p = .80$. Estimate the total number of animals in the study region and estimate the variance of the estimator.

R is used to answer this question. The estimate of the total is $\hat{\tau} = N\bar{y}/p$:

    > N <- 100
    > n <- 5
    > y <- c(10,7,0,0,5)
    > p <- .8
    > N*mean(y)/p     # Estimate of the total number
    [1] 550           # of animals in the region

The estimated variability will be computed in three ways, for the sake of comparison: assuming no error in estimating $p$, assuming $\widehat{\mathrm{SE}}(\hat{p}) = .05$, and assuming $\widehat{\mathrm{SE}}(\hat{p}) = .25$. The components computed are

$$\widehat{\mathrm{Var}}_{SRS}(\hat{\tau}) = \frac{N^2}{p^2}\left(\frac{N-n}{N}\right)\frac{s^2}{n}, \qquad \widehat{\mathrm{Var}}_{ID}(\hat{\tau}) = \frac{N^2}{p^2}\left(\frac{1-p}{N}\right)\bar{y}, \qquad \widehat{\mathrm{SE}}(\hat{\tau}) = \sqrt{\widehat{\mathrm{Var}}_{SRS}(\hat{\tau}) + \widehat{\mathrm{Var}}_{ID}(\hat{\tau})},$$

where, in the second and third cases, the term for estimating $p$ is added inside `var.det`.

    # If Detectability Estimated with SE of 0
    # =======================================
    > var.srs <- (N^2/p^2)*((N-n)/N)*var(y)/n
    > var.srs
    [1] 57296.9       # This is the major contribution to the variability.
    > var.det <- (N^2/p^2)*((1-p)/N)*mean(y)
    > var.det
    [1] 137.5         # Variability due to detectability is minor.
    > sqrt(var.srs + var.det)
    [1] 239.6547      # SE of tau-hat

    # If Detectability Estimated with SE of .05
    # =========================================
    > var.srs <- (N^2/p^2)*((N-n)/N)*var(y)/n
    > var.det <- (N^2/p^2)*(((1-p)/N)*mean(y) + (mean(y)^2/p^2)*(.05)^2)
    > sqrt(var.srs + var.det)
    [1] 242.1074      # SE of tau-hat

    # If Detectability Estimated with SE of .25
    # =========================================
    > var.srs <- (N^2/p^2)*((N-n)/N)*var(y)/n
    > var.det <- (N^2/p^2)*(((1-p)/N)*mean(y) + (mean(y)^2/p^2)*(.25)^2)
    > sqrt(var.srs + var.det)
    [1] 294.9159      # SE of tau-hat

• Note that the estimated SE when detectability is estimated with an SE of .05 is not much different from the estimated SE when there is no error in the estimation of $p$ (i.e., when $p$ is assumed known).
• The estimated SE when detectability is estimated with an SE of .25 is appreciably larger than when $p$ is assumed known, because the third piece of the variance above is now large relative to the first component.

• The bottom line in this problem is that getting a more precise estimate of $p$ is not that important here; getting a larger sample size $n$ matters more.

Derivation of Variance Expressions:

The derivation of the variances of the estimators of $\tau$ under the various scenarios described above illustrates some useful techniques: the delta method (described earlier in the notes) and two common results on iterated expectations (sometimes called the laws of total expectation and total variance). These latter two results are:

1. $E(Y) = E\left[E[Y \mid X]\right]$.

2. $\mathrm{Var}(Y) = \underbrace{E\left[\mathrm{Var}(Y \mid X)\right]}_{\text{var. within } y \text{ at some } x \text{, averaged over all } x} + \underbrace{\mathrm{Var}\left[E(Y \mid X)\right]}_{\text{var. due to differences in the } \mu_{Y|X}\text{'s}}$.

The derivations of the variance expressions for $\hat{\tau}$ follow, in the order in which they were considered above.

1. Known detectability $p$ over a whole region (Section 16.1): The expression for $\mathrm{Var}(\hat{\tau})$ (eq. 4 on p. 186 of text) follows directly from the variance of a binomial random variable. The unbiasedness of $\hat{\tau}$ follows from the expected value of a binomial random variable.

2. Estimated detectability over a whole region (Section 16.3): The estimated population total is $\hat{\tau} = y/\hat{p}$, where $y$ is a binomial($\tau$, $p$) random variable and $\hat{p}$ is a random variable with $E(\hat{p}) = p$ (at least approximately) and variance $\mathrm{Var}(\hat{p})$. We then use the delta method approximation for the variance of the ratio of two random variables:

$$\mathrm{Var}\!\left(\frac{Y}{X}\right) \approx \left(\frac{\mu_Y^2}{\mu_X^4}\right)\sigma_X^2 + \left(\frac{1}{\mu_X^2}\right)\sigma_Y^2 - 2\left(\frac{\mu_Y}{\mu_X^3}\right)\rho\,\sigma_X\sigma_Y,$$

where, in this situation, $\rho = 0$ since $y$ and $\hat{p}$ are assumed to be independent. Substituting $E(y) = \tau p$ and $\mathrm{Var}(y) = \tau p(1-p)$ (since $y$ is a binomial random variable), and $E(\hat{p}) = p$ (at least approximately), gives eq. (7) on p. 188 of the text:

$$\mathrm{Var}(\hat{\tau}) \approx \tau\left(\frac{1-p}{p}\right) + \frac{\tau^2}{p^2}\,\mathrm{Var}(\hat{p}).$$

As a ratio estimator, $\hat{\tau}$ is not unbiased in this situation.

3. Known detectability with simple random sampling (Section 16.4): This derivation is outlined in Section 16.7. Based on an SRS of $n$ plots from $N$ plots, where the detection probability $p$ is known, the estimator for the population total was given as:

$$\hat{\tau} = \frac{N}{n}\sum_{i=1}^{n}\hat{Y}_i = \frac{N}{n}\sum_{i=1}^{n}\frac{y_i}{p}, \quad \text{with variance} \quad \mathrm{Var}(\hat{\tau}) = N^2\left[\left(\frac{N-n}{N}\right)\frac{\sigma^2}{n} + \left(\frac{1-p}{p}\right)\frac{\mu}{n}\right].$$

Note that the estimate $\hat{\tau}$ obtained depends on which $n$ plots are chosen. It is easy to compute the expectation and variance of $\hat{\tau}$ given the particular set $S$ of $n$ plots chosen for the SRS. So we condition on $S$ and write:

$$E(\hat{\tau} \mid S) = E\!\left(\frac{N}{np}\sum_{i \in S} y_i \,\Big|\, S\right) = \frac{N}{np}\sum_{i \in S} E(y_i \mid S) = \frac{N}{np}\sum_{i \in S} pY_i = \frac{N}{n}\sum_{i \in S} Y_i,$$

because $y_i$ is binomial($Y_i$, $p$), where $Y_i$ is the actual number of individuals in unit $i$. Now we take the expectation of the above expression over all possible SRSs $S$. Since this is just the expected value of $N$ times the sample mean from an SRS, we know by results in Chapter 2 (for a finite population) that the expected value is $N$ times the population mean. In other words, the unconditional expected value of $\hat{\tau}$ is

$$E(\hat{\tau}) = E\left[E(\hat{\tau} \mid S)\right] = E\!\left[\frac{N}{n}\sum_{i \in S} Y_i\right] = N\,E(\bar{Y}) = N\mu = N\left(\frac{\tau}{N}\right) = \tau.$$

Hence, $\hat{\tau}$ is an unbiased estimator of $\tau$. To obtain the variance of $\hat{\tau}$, we again condition on $S$. First, note that

$$\mathrm{Var}(\hat{\tau} \mid S) = \mathrm{Var}\!\left(\frac{N}{np}\sum_{i \in S} y_i \,\Big|\, S\right) = \frac{N^2}{n^2p^2}\sum_{i \in S}\mathrm{Var}(y_i \mid S) = \frac{N^2}{n^2p^2}\sum_{i \in S} Y_i\,p(1-p) = \frac{N^2}{n^2}\left(\frac{1-p}{p}\right)\sum_{i \in S} Y_i.$$

The variance of $\hat{\tau}$ is then computed as

$$\begin{aligned}
\mathrm{Var}(\hat{\tau}) &= E\left[\mathrm{Var}(\hat{\tau} \mid S)\right] + \mathrm{Var}\left[E(\hat{\tau} \mid S)\right] \\
&= E\!\left[\frac{N^2}{n^2}\left(\frac{1-p}{p}\right)\sum_{i \in S} Y_i\right] + \mathrm{Var}\!\left[\frac{N}{n}\sum_{i \in S} Y_i\right] \\
&= \frac{N^2}{n^2}\left(\frac{1-p}{p}\right)\cdot n\mu + N^2\,\mathrm{Var}(\bar{Y}) \qquad (\text{since } \mu = \tau/N) \\
&= \frac{N^2(1-p)\mu}{pn} + N^2\left(\frac{N-n}{N}\right)\frac{\sigma^2}{n} = N^2\left[\left(\frac{N-n}{N}\right)\frac{\sigma^2}{n} + \left(\frac{1-p}{p}\right)\frac{\mu}{n}\right].
\end{aligned}$$

This is the equation in the middle of p. 193 of the text.

4. Estimated detectability with simple random sampling (Section 16.5): The derivation of the variance of $\hat{\tau}$ in this case is outlined on p. 194; the complete derivation is left as a homework exercise.
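The unbiasedness and variance results in item 3 can also be checked numerically. The sketch below is not from the text: it builds a made-up finite population (the values of `N`, `n`, `p`, the Poisson-generated counts `Y`, the seed, and the number of replicates are all assumptions chosen only for illustration), repeatedly draws an SRS, applies independent binomial detection within the sampled units, and compares the simulated mean and variance of $\hat{\tau}$ with $\tau$ and with the formula derived above.

```r
set.seed(1)                        # arbitrary seed, for reproducibility only
N <- 200; n <- 20; p <- 0.7        # assumed illustrative values
Y <- rpois(N, lambda = 5)          # made-up true counts Y_1, ..., Y_N
tau    <- sum(Y)
mu     <- tau / N
sigma2 <- var(Y)                   # var() divides by N - 1, matching sigma^2

# Theoretical variance of tau-hat for known p under SRS
var_theory <- N^2 * (((N - n) / N) * sigma2 / n + ((1 - p) / p) * mu / n)

# One replicate: draw the sample S, then detect animals within the sampled units
one_estimate <- function() {
  S <- sample(N, n)                        # SRS without replacement from 1..N
  y <- rbinom(n, size = Y[S], prob = p)    # y_i ~ Bin(Y_i, p), independently
  N * mean(y) / p
}
tau_hats <- replicate(20000, one_estimate())

c(tau = tau, mean_tau_hat = mean(tau_hats))         # close if tau-hat is unbiased
c(theory = var_theory, simulated = var(tau_hats))   # agree up to Monte Carlo error
```

The two stages inside `one_estimate` mirror the conditioning argument: the choice of `S` drives $\mathrm{Var}[E(\hat{\tau} \mid S)]$, and the binomial draw given `S` drives $E[\mathrm{Var}(\hat{\tau} \mid S)]$.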