Line-Intercept Sampling (Chapter 19)

advertisement
Line-Intercept Sampling (Chapter 19)
Most of the sampling methods discussed thus far have consisted of choosing a sample from
a well-defined population according to some probabilistic mechanism. Consider now taking
a sample in some area where there are stationary “objects” of interest (e.g., ponds, shrubs,
patches of a certain habitat type, wolf tracks in the snow) and the objective is to estimate the
total number or density of the objects or the average or total value of some characteristic of
the objects. One way to carry out such objectives is to randomly choose “lines” (transects)
through the area, and sample all units which are intercepted by these lines; hence the term
line-intercept sampling.
Example: Reconsider the farms example, where a 25x25 unit2 area (625 pixels) was partitioned into a number of farms of different sizes and with different numbers of workers per
farm. We considered this example in the notes for Chapter 6 on Unequal Probability Sampling. The goal was to estimate the total number of farms, total number of workers per farm,
the mean size of a farm, or the mean number of workers per farm. Recall that we sampled
farms with probability proportional to size (PPS) by picking random points on the grid. We
looked at Hansen-Hurwitz and Horvitz-Thompson estimators of the population parameters.
• For line-intercept sampling, consider taking line transects perpendicular to the base
of the farms area. Since the length along this base is 25 units, we randomly select a
number between 0 and 25, and sample the line drawn vertically from the chosen point.
• For each vertical transect chosen, we sample all farms intersected by the transect and
record the variables of interest (the total number of farms will be estimated by letting
yi = l).
• Estimation of these quantities using line-intercept sampling will be demonstrated later
in this handout using R.
Notation for Line-Intercept Sampling: Let:
K = the number of distinct objects in the population (# of farms)
yk = the response variable for the k th unit, k = 1, . . . , K
τ =
K
X
yk = the population total (total # farms, or total # workers)
k=1
A = the total area of the study region (= 625 for the farms example)
b = the baseline width of the study region,
wk = the width along the baseline of the k th object, k = 1, . . . , K,
D = τ /A = the density of the response variable per unit area (# workers per unit2 )
wk
= the probability the k th unit is intersected by a randomly chosen line transect
pk =
b
(selection probability), k = 1, . . . , K,
112
πk = the probability that the k th unit is included in a sample of n line transects
(inclusion probability), k = 1, . . . , K,
πkh = the joint inclusion probability of the k th and hth units in a random sample of n
line transects, k, h = 1, . . . , K.
Thompson presents two basic ways with line-intercept sampling to estimate population totals
and means and calculate associated standard errors. For estimating a population total, these
two ways are:
1. Separate transects estimator: calculate the Horvitz-Thompson estimate for each of the
n transects separately, then average these n estimates to get an overall estimate of the
population total.
2. Horvitz-Thompson estimator: calculate the Horvitz-Thompson estimate based on the
distinct objects intercepted by the whole set of n transects.
These two approaches are discussed in more detail below.
Separate Transects Estimator: Use the information attained from a single line transect (say
the ith transect) to estimate the population total τ . Let Ci be the set of all objects in the
region intersected by the ith transect and suppose we have measured some response yk on
each of the units in Ci . Then an unbiased estimator of the population total is given by:
X yk
vi =
.
k∈Ci pk
This is the Horvitz-Thompson estimator for a single transect (note that we cannot calculate
the Hansen-Hurwitz estimator because individual objects are not selected randomly with
replacement).
• For n such vertical line transects, we obtain n estimates of τ , given by v1 , v2 , . . . , vn ,
where these estimates are independent and identically distributed. Hence, an unbiased
estimator of τ and corresponding variance based on all n transects are given by:
τbp =
Var(τbp ) =
n
1X
vi (the average of the vi ’s),
n i=1
1
σ2
Var(vi ) = v (where σv2 is the variance of vi ).
n
n
• Using the sample variance of the vi ’s as an estimate of σv2 , an unbiased estimate of the
variance of τbp is given by:
d τb ) =
Var(
p
n
1 X
s2v
where: s2v =
(vi − τbp )2 .
n
n − 1 i=1
• Note that there is no finite population correction (fpc) here. Why?
113
• Note that with this estimator, a given object might be intersected by more than one
transect and hence be included in the sample more than once.
Horvitz-Thompson Estimator: Recall that the Horvitz-Thompson estimator gives a general
way of acquiring an unbiased estimator of a population total where distinct selected units are
used but once in the development of the estimator. For line-transect sampling, we use the
whole set of objects intercepted by at least one of the n transects. Letting v be the number of
distinct objects intercepted, recall that the general form of the Horvitz-Thompson estimator
of the total and corresponding
variance and estimated variance are given by:
v
X
yk
τbπ =
(where v = the # of distinct units selected),
k=1 πk
Var(τbπ ) =
d τb ) =
Var(
π
¶
N µ
X
1 − πi
i=1
Ã
v
X
i=1
πi
!
yi2
+
N X
X
Ã
i=1 j6=i
Ã
v X
X
πij − πi πj
πi πj
πij − πi πj
1 − πi 2
yi +
2
πi
πi πj
i=1 j6=i
!
yi yj ,
!
yi yj
,
πij
where the πi ’s and πij ’s are the inclusion and joint inclusion probabilities.
• So, for example, if yk = the number of workers on farm k, and v = the total number of
distinct farms selected in the line transect samples, then an unbiased (H-T) estimate
of the total number of workers in the farm region is given by:
τbπ =
v
X
yk
k=1
πk
.
To use the Horvitz-Thompson estimator for line-intercept sampling, we need to compute
the inclusion and joint inclusion probabilities for all objects intercepted (sampled) by the
vertical transects. How do we do this?
πk =
=
πkh =
=
=
• One favorable aspect of the Horvitz-Thompson estimator over the separate transect estimator (as previously discussed) is the fact that it only relies on distinct units from the
sample. The major drawback is the difficulty with which the joint inclusion probabilities
114
are computed. In addition, the estimated variance of a Horvitz-Thompson estimator is
not guaranteed to be positive, while the separate transect variance estimator is.
Estimating the Density using the Total: With either of the two estimators for the population
total just developed, an estimate of the density per unit area (mean) and corresponding
variance are given by:
c=
D
τb
c =
, Var(D)
A
where A = the total area of the study region.
• If the study region is rectangular with base of width b and constant transect length l,
then A = bl. Note that the estimators for the total and the density do not rely on the
transect length in any way.
• If the study region is irregularly-shaped, so that the transect lengths may vary from
transect to transect, the estimators given are still unbiased estimators of the population
quantities. However, if the lengths of the transects vary greatly, we might make use
of any relationship between the response variable and the transect length via ratio or
regression estimation to improve the estimators. This will be discussed later.
Example (Example 19.1 on pp. 247-250 in the text): Researchers were interested in estimating the abundance of wolverines in a certain region. Aircraft were flown over selected
transects looking for tracks in the snow. Each set of tracks encountered along the transect
was mapped. The variable of interest in the study was yk = the number of wolverines associated with the k th set of tracks. A map of the survey area is shown below with the tracks
mapped. The area was 36 miles in length at the base and 20 miles in length from the base.
115
Four systematic samples of 3 transects each were taken with random starting points in the
first 12 miles given by A1, B1, C1, and D1 in the map. So, for the purposes of the study, there
were four total transects, each consisting of three flights over the region. The 12 selected
transects intersected v = 4 distinct sets of tracks. Letting:
yk = the number of wolverines on the k th set of tracks (k = 1, 2, 3, 4),
wk = the width of the projection of the k th set of tracks onto the base of the region,
pk = wk /12 = the probability of intersecting the k th set of tracks for a given transect,
πk = 1 − (1 − pk )4 = the inclusion probability of the k th set of tracks in the sample,
the following table summarizes the information on the tracks:
Track #
1
2
3
4
yi
1
2
2
1
wi (miles)
5.25
7.50
2.40
7.05
pi = wi /12
.4375
.6250
.2000
.5875
πi
.90
.98
.59
.97
Separate Transects Estimate of τ = Total # of Wolverines The first transect (A1, A2, and
A3) intersects the 1st, 2nd, and 4th sets of tracks, giving the following estimate of the total:
v1 =
y1 y2 y4
1
2
1
+
+
=
+
+
= 7.1878 wolverines,
p1 p2 p4
.4375 .6250 .5875
and similarly, v2 = 7.1878, v3 = 11.7021, & v4 = 11.7021. Hence, the estimated total number
of wolverines in the region, and corresponding standard error are given by:
τbp =
4
1X
1
vi = [7.1878 + 7.1878 + 11.7021 + 11.7021] = 9.44 wolverines,
4 i=1
4
s
d τb ) =
SE(
p
s2v
n
v
u
u
=t
4
1 X
(vi − τbp )2 /n =
4 − 1 i=1
s
6.7930 √
= 1.698 = 1.30 wolverines.
4
Horvitz-Thompson Estimate of τ = Total # of Wolverines Using the inclusion probabilities
in the above table, the estimated total number of wolverines in the region is:
τbπ
v
X
yk
·
¸
2
2
1
1
=
+
+
+
= 7.57 wolverines.
=
.90 .98 .59 .97
k=1 πk
The joint inclusion probabilities are computed on page 250 of the text, with the standard
error found to be 2.30 wolverines, which is nearly twice that of the separate transects estimate.
116
Example 2 (Farms Example): Recall the scenario given earlier where line transects are used
to sample farms in the farm example. R will be used to estimate population parameters of
interest under two settings. First, we consider estimation using a single transect, and second
we consider estimation using three transects. The reason for looking at a single transect first
is to illustrate clearly the computation of the joint inclusion probabilities, and to examine
the gains in terms of the SE’s in taking more than one transect.
> runif(1,0,25)
[1] 10.78054
# Select x-coordinate for transect
# Record the widths of the 6 farms intersected
# ============================================
> wk <- c(1,4,2,8,5,5)
> pk <- wk/25
# Prob. of inclusion for a single transect
> pk
µ
¶
wk
[1] 0.04 0.16 0.08 0.32 0.20 0.20
pk = πk =
b
#
#
>
>
>
>
>
>
>
Record overlap of the farms for computation of joint inclusion probabilities
============================================================================
wkh <- matrix(0,nrow=6,ncol=6)
wkh[1,2] <- 1; wkh[1,3] <- 1; wkh[1,4] <- 1; wkh[1,5] <- 1
wkh[1,6] <- 1; wkh[2,3] <- 2; wkh[2,4] <- 4; wkh[2,5] <- 4
wkh[2,6] <- 4; wkh[3,4] <- 2; wkh[3,5] <- 2; wkh[3,6] <- 2
wkh[4,5] <- 5; wkh[4,6] <- 5; wkh[5,6] <- 4
pkh <- wkh/25
µ
¶
pkh
wkh
πkh =
b
[,1] [,2] [,3] [,4] [,5] [,6]
[1,]
0 0.04 0.04 0.04 0.04 0.04
[2,]
0 0.00 0.08 0.16 0.16 0.16
[3,]
0 0.00 0.00 0.08 0.08 0.08
[4,]
0 0.00 0.00 0.00 0.20 0.20
[5,]
0 0.00 0.00 0.00 0.00 0.16
[6,]
0 0.00 0.00 0.00 0.00 0.00
# Estimate the total number of farms (N). Here, yi=1 for all farms
# =================================================================
> yk <- rep(1,6)
> yk
[1] 1 1 1 1 1 1
Ã
!
> tau.hat <- sum(yk/pk)
v
X
yi
τbπ =
> tau.hat
# H-T Estimate of
i=1 πi
[1] 56.875
#
N (N=79 here)
117
#
#
>
>
>
Estimate the variance of the estimated total number of farms (Eq. 6, p.54)
==========================================================================
c1 <- sum((1/pk^2 - 1/pk)*yk^2)
Ã
Ã
! !
v
X
c2 <- 0
1
1
c1 =
yk2
−
2
for (k in 1:5){
π
π
k
k
k=1
for (h in (k+1):6){
c2 <- c2 + 2*((1/(pk[k]*pk[h]) - 1/pkh[k,h])*yk[k]*yk[h])
}


¶
v Xµ
X
1
1
}
c2 = 2
−
yk yh 
π
π
π
> var.tau.hat <- c1 + c2
k
h
kh
k=1 h>k
> sqrt(var.tau.hat) # Estimated SE of tau.hat
[1] 52.51562
# Estimate the total number of workers
# ====================================
> yk <- c(3,5,1,9,4,5)
> tau.hat <- sum(yk/pk)
> tau.hat
# Estimated number of workers
[1] 191.875
> c1 <- sum((1/pk^2 - 1/pk)*yk^2)
> c2 <- 0
> for (k in 1:5){
for (h in (k+1):6){
c2 <- c2 + 2*((1/(pk[k]*pk[h]) - 1/pkh[k,h])*yk[k]*yk[h])
}
}
> var.tau.hat <- c1 + c2
> sqrt(var.tau.hat) # Estimated SE of tau.hat
[1] 172.0585
Now, suppose three transects are selected at random. We will use the transect selected above
and just select two more below.
> runif(2,0,25)
[1] 7.020874 4.758928
#
#
>
>
>
>
# Two more randomly selected x-coordinates
Widths and Selection Probabilities for all 3 Transects
======================================================
w1 <- c(1,4,2,8,5,5)
w2 <- c(3,3,4,4,2,4,3,2,6,3)
w3 <- c(5,2,2,4,3,2,5,6,2)
p1 <- w1/25; p2 <- w2/25; p3 <- w3/25
118
# Estimate the total number of farms (separate transects)
#========================================
> y1 <- rep(1,6)
> y2 <- rep(1,10)
> y3 <- rep(1,9)
> v1 <- sum(y1/p1)
> v2 <- sum(y2/p2)
> v3 <- sum(y3/p3)
> c(v1,v2,v3)
[1] 56.875 81.250 78.750
> mean(c(v1,v2,v3))
# Estimated total number Ã
!
n
1X
[1] 72.29167
#
of farms
vi
τbp =
n i=1
> sqrt(var(c(v1,v2,v3))/3)

s 
[1] 7.742043
# SE of the estimate
s2v 
SE(τbp ) =
n
# Estimate the total number of workers (separate transects)
#==========================================
> y1 <- c(3,5,1,9,4,5)
> y2 <- c(4,2,7,5,3,4,5,2,2,3)
> y3 <- c(6,3,3,4,2,3,3,2,1)
> v1 <- sum(y1/p1)
> v2 <- sum(y2/p2)
> v3 <- sum(y3/p3)
> c(v1,v2,v3)
[1] 191.875 287.500 220.000
> mean(c(v1,v2,v3))
!
[1] 233.125
# Estimated total number of workers ÃX
N
yi = 249
> sqrt(var(c(v1,v2,v3))/3)
i=1
[1] 28.3739
# SE of the estimate
#
#
>
>
>
>
>
>
Horvitz-Thompson Estimates: Use only DISTINCT farms
===================================================
w1 <- c(1,4,2,8,5,5)
w2 <- c(3,3,4,4,2,4,3,2,6,3)
w3 <- c(5,2,2,3,2,5,2)
p1 <- w1/25
p2 <- w2/25
p3 <- w3/25
119
# Estimate the total number of farms (H-T)
# ========================================
> y1 <- rep(1,6)
> y2 <- rep(1,10)
> y3 <- rep(1,7)
> p <- c(p1,p2,p3)
> y <- c(y1,y2,y3)
> sum(y/(1-(1-p)^3))
Ã
!
v
v
X
X
y
y
k
k
[1] 77.26119
τbπ =
=
n
k=1 πk
k=1 1 − (1 − pk )
# Estimate the total number of workers (H-T)
# ==========================================
> y1 <- c(3,5,1,9,4,5)
> y2 <- c(4,2,7,5,3,4,5,2,2,3)
> y3 <- c(6,3,3,2,3,3,1)
> p <- c(p1,p2,p3)
> y <- c(y1,y2,y3)
> sum(y/(1-(1-p)^3))
Ã
!
v
v
X
X
yk
yk
[1] 253.6919
τbπ =
=
n
k=1 πk
k=1 1 − (1 − pk )
The standard errors for the Horvitz-Thompson estimates were not calculated here, as the
code becomes increasingly lengthy with more than 1 transect.
120
Farms Map
The 25x25 grid map below represents the boundaries of farms in a certain area.
• The letters for each farm represent the “type” of farm.
• The numbers for each farm represent the number of workers on the farm.
• After adding the missing farm information from earlier in the course, there are N = 79
total farms and τy = 249 total workers.
B1
A1
C3
A2
D2
B2
C4
C8
A5
A3
B2
C D2
2
B5
C3
B4
D2
A1
D3
C2
D4
B2
B
2 C1
A9
B3
D2 A B
1 2
B4
A1
C3
D5
C2
D4
A3
D4
C B1
3
D5
B3
A1
A C2 B A3
3
2
D1
B4
C2
D7
C3
D5
A2
C6
C
2
B3
C3
B5
B4
D4
A2
D3
A3
B2
D4
B
3
A8
C4
121
A
B1
2 B
3 C A2 C2
B6
4
D4
B3
A5
C5
Download