Some Extensions of Line Intercept Sampling

Assume we have a random sample of $n$ transects and that we will use the separate transects estimator
\[ \hat\tau_p = \frac{1}{n}\sum_{i=1}^{n} v_i \]
in equation (3) on page 246 of the text and estimate its variance by equation (5), namely
\[ \widehat{\mathrm{Var}}(\hat\tau_p) = s_v^2/n. \]
That is, we will calculate the Horvitz-Thompson estimate for each transect separately and then average the estimates to get an overall estimate.
Recall that $w_k$ is the width of the $k$th object on a transect, so its probability of inclusion for a single transect is $p_k = w_k/b$, where $b$ is the length of the baseline. Recall also that $C_i$ is the collection of sampled objects on the $i$th transect. The $w_k$'s should also carry a subscript $i$ indicating which transect the object is on, but Thompson has chosen to suppress the additional subscript with the understanding that the $w_k$'s are different for each transect.
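The separate transects estimator and its variance estimate can be sketched in a few lines of Python. This is only an illustration: the function name, the data layout (one list of `(y_k, w_k)` pairs per transect) and the numbers are made up for the example, and an empty transect is taken to contribute $v_i = 0$.

```python
from statistics import mean, variance

def separate_transects_total(transects, b):
    """Average of per-transect Horvitz-Thompson estimates.

    transects: one list per transect of (y_k, w_k) pairs (hypothetical layout).
    b: length of the baseline, in the same units as the widths w_k.
    Returns (tau_hat, var_hat).
    """
    # v_i = sum over intersected objects of y_k / p_k, with p_k = w_k / b
    v = [sum(y / (w / b) for y, w in objs) for objs in transects]
    n = len(v)
    tau_hat = mean(v)           # (1/n) * sum of v_i
    var_hat = variance(v) / n   # s_v^2 / n, sample variance of the v_i
    return tau_hat, var_hat

# Made-up data: four transects, baseline of length 100
transects = [
    [(2.0, 10.0), (3.0, 20.0)],
    [(1.0, 10.0)],
    [],                          # transect that intersects no objects
    [(4.0, 25.0), (2.0, 20.0)],
]
tau_hat, var_hat = separate_transects_total(transects, b=100.0)
print(tau_hat, var_hat)
```

At least two transects are needed, since the sample variance is undefined for $n = 1$.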
There are two basic extensions of line intercept sampling as discussed in the text which
are given here. The first considers the case of estimating the mean response per object (as
opposed to per unit area). The second considers the situation of variable lengths in the
transects.
Extensions:
1. Estimating µ, the Per Object Mean: While Thompson considers estimating τ , the total
of the y-values for the population, and D = τ /A, the density as number per unit area,
he does not consider estimating the mean value of y per object, µ = τ /K, where K is
the number of objects in the population.
• For example, suppose we wanted to estimate the mean height of shrubs sampled
via this method. If K is known, the estimation is straightforward:
\[ \hat\mu = \frac{\hat\tau}{K}, \qquad \widehat{\mathrm{Var}}(\hat\mu) = \frac{\widehat{\mathrm{Var}}(\hat\tau)}{K^2}. \]
• However, if K is not known (as would usually be the case), then K must also be estimated and a ratio of means estimator used. To estimate K using the method in equation (3) on page 246,
\[ \hat\tau_p = \frac{1}{n}\sum_{i=1}^{n} v_i, \quad \text{where } v_i = \sum_{k\in C_i}\frac{y_k}{p_k}, \]
we first estimate K for each transect separately using a Horvitz-Thompson estimator (with $y_k = 1$ for all objects) and then average the individual transect estimates of K:
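\[ \hat K_i = \sum_{k\in C_i}\frac{1}{w_k/b} = b\sum_{k\in C_i}\frac{1}{w_k}, \quad \text{and then} \quad \hat K = \frac{1}{n}\sum_{i=1}^{n}\hat K_i = \frac{b}{n}\sum_{i=1}^{n}\sum_{k\in C_i}\frac{1}{w_k}. \]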
The estimate of the population total τ is:
\[ \hat\tau = \frac{1}{n}\sum_{i=1}^{n} v_i = \frac{1}{n}\sum_{i=1}^{n}\sum_{k\in C_i}\frac{y_k}{w_k/b} = \frac{b}{n}\sum_{i=1}^{n}\sum_{k\in C_i}\frac{y_k}{w_k}. \]
Combining the estimates of K and τ , a ratio estimator of µ is:
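\[ \hat\mu = \frac{\hat\tau}{\hat K} = \frac{\frac{1}{n}\sum_{i=1}^{n}\sum_{k\in C_i} y_k/w_k}{\frac{1}{n}\sum_{i=1}^{n}\sum_{k\in C_i} 1/w_k} = \frac{\frac{1}{n}\sum_{i=1}^{n} t_i}{\frac{1}{n}\sum_{i=1}^{n} u_i} = \frac{\bar t}{\bar u}, \]
a ratio of two means, where
\[ t_i = \sum_{k\in C_i}\frac{y_k}{w_k} \quad \text{and} \quad u_i = \sum_{k\in C_i}\frac{1}{w_k}. \]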
• To estimate the variance, we can use the Delta method formula (see the handout
on the Delta method, part 3, for the case where N is unknown - page 39 in your
notes) or bootstrapping of the pairs. The Delta method approximation to the
variance is:
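\[ \widehat{\mathrm{Var}}(\hat\mu) \approx \left(\frac{\bar t^{\,2}}{\bar u^{4}}\right)\frac{s_u^2}{n} + \left(\frac{1}{\bar u^{2}}\right)\frac{s_t^2}{n} - 2\left(\frac{\bar t}{\bar u^{3}}\right)\frac{\hat\rho_{t,u}\, s_t s_u}{n}, \]
where $s_t^2$ and $s_u^2$ are the sample variances and $\hat\rho_{t,u}$ is the sample correlation between the $t_i$'s and the $u_i$'s. Note that $b$ is not needed in either formula.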
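As an illustration of the ratio estimator of the per-object mean and its Delta-method variance, here is a minimal Python sketch. The function name, the data layout (one list of `(y_k, w_k)` pairs per transect) and the shrub numbers are hypothetical. The sample covariance of the $t_i$ and $u_i$ is used in place of $\hat\rho_{t,u} s_t s_u$, which is the same quantity.

```python
from statistics import mean, variance

def per_object_mean(transects):
    """Ratio-of-means estimate of mu with a Delta-method variance.

    transects: one list per transect of (y_k, w_k) pairs (hypothetical layout).
    Returns (mu_hat, var_hat).
    """
    t = [sum(y / w for y, w in objs) for objs in transects]    # t_i
    u = [sum(1.0 / w for _, w in objs) for objs in transects]  # u_i
    n = len(t)
    t_bar, u_bar = mean(t), mean(u)
    mu_hat = t_bar / u_bar
    # Sample covariance of (t_i, u_i); equals rho_hat * s_t * s_u
    s_tu = sum((a - t_bar) * (c - u_bar) for a, c in zip(t, u)) / (n - 1)
    var_hat = ((t_bar**2 / u_bar**4) * variance(u) / n
               + (1.0 / u_bar**2) * variance(t) / n
               - 2.0 * (t_bar / u_bar**3) * s_tu / n)
    return mu_hat, var_hat

# Made-up shrub data: (height y_k, width w_k) on three transects
shrubs = [
    [(2.0, 1.0), (4.0, 2.0)],
    [(3.0, 1.0)],
    [(6.0, 2.0), (2.0, 0.5)],
]
mu_hat, var_hat = per_object_mean(shrubs)
print(mu_hat, var_hat)
```

The baseline length never appears in the function, in line with the note that $b$ is not needed. The sketch assumes at least two transects and at least one intersected object overall.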
2. Transects of Variable Length: If the region is irregularly shaped, then the transects may
vary in length. Let L1 , . . . , Ln denote the lengths of the n randomly chosen transects.
The Li are then random variables whose values are unknown until the transects are
selected.
• Let E(L) be the expected length of a random transect. Then E(L) = A/b where A
is the total area of the region and b is the length of the baseline.
• The separate transects and Horvitz-Thompson estimators on pages 245-247 of the
text (pages 113-114 of the notes) are still unbiased even if transect lengths differ
and we can, if we choose, ignore the differing lengths of the transects.
• However, we might want to consider alternative estimators for two reasons:
(i) Estimators which account for the differing transect lengths might be
better (i.e., have smaller variance).
(ii) We might not know the total area A for an irregularly shaped region
and might want alternative estimators which do not require knowledge
of the total area.
Each of these reasons is explored below.
(i) Accounting for Different Transect Lengths: Consider estimation of τ , the population total. The estimator of τ (Eq.(3) on page 246 of the text) is constructed by
computing the Horvitz-Thompson (H-T) estimator of τ on each transect and then
averaging the estimates. The H-T estimator on each transect is:
\[ v_i = \sum_{k\in C_i}\frac{y_k}{p_k} = b\sum_{k\in C_i}\frac{y_k}{w_k} = b\,t_i, \quad \text{where } t_i = \sum_{k\in C_i}\frac{y_k}{w_k}. \]
• It seems clear that vi based on longer transects will tend to be larger than those
based on shorter transects. Therefore, we could use transect length as an auxiliary
variable if we knew E(L), the average transect length for the whole area. If we view
the transect as the sampling unit and vi as the response, then we want to use ratio
estimation to estimate the mean value of v for the whole population of possible
transects. From page 68 in the text, with yi representing vi , xi representing Li ,
and µx = E(L), the ratio estimate of τ is:
\[ \hat\tau_r = \left(\frac{\sum_{i=1}^{n} v_i}{\sum_{i=1}^{n} L_i}\right)E(L) = \frac{b\,E(L)\sum_{i=1}^{n} t_i}{\sum_{i=1}^{n} L_i} = \frac{A\sum_{i=1}^{n} t_i}{\sum_{i=1}^{n} L_i}. \qquad (*) \]
• The variance of $\hat\tau_r$ can be approximated by Equation (5) on page 69:
\[ \widehat{\mathrm{Var}}(\hat\tau_r) = \frac{s_r^2}{n} = \frac{1}{n(n-1)}\sum_{i=1}^{n}(v_i - rL_i)^2, \]
where $r = \sum_{i=1}^{n} v_i \big/ \sum_{i=1}^{n} L_i$ and where the finite population correction (fpc) has been ignored because the population of possible transects is assumed to be infinite.
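A short Python sketch of the ratio estimate of the total and its approximate variance follows. The function name and the numbers are invented for illustration; `v` holds the per-transect Horvitz-Thompson estimates $v_i$, `L` the transect lengths, and `expected_length` is $E(L) = A/b$, assumed known here.

```python
def ratio_total(v, L, expected_length):
    """Ratio estimate of tau using transect length as the auxiliary variable.

    v: per-transect Horvitz-Thompson estimates v_i
    L: transect lengths L_i
    expected_length: E(L) = A / b (assumed known)
    Returns (tau_r, var_hat), with the fpc ignored.
    """
    n = len(v)
    r = sum(v) / sum(L)                      # r = sum v_i / sum L_i
    tau_r = r * expected_length
    # s_r^2 / n = sum (v_i - r L_i)^2 / (n (n - 1))
    var_hat = sum((vi - r * Li) ** 2 for vi, Li in zip(v, L)) / (n * (n - 1))
    return tau_r, var_hat

# Made-up values for four transects
v = [32.0, 8.0, 20.0, 40.0]
L = [15.0, 5.0, 10.0, 20.0]
tau_r, var_hat = ratio_total(v, L, expected_length=12.0)
print(tau_r, var_hat)
```

When the $v_i$ are nearly proportional to the $L_i$, the residuals $v_i - rL_i$ are small, which is the situation in which the ratio estimator is expected to beat the plain average of the $v_i$.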
(ii) Estimating D when A is Unknown: Consider estimation of D = τ /A, the population density per unit area. Based on the equation (*) above, an estimate of D is:
\[ \hat D_r = \frac{\hat\tau_r}{A} = \frac{\sum_{i=1}^{n} t_i}{\sum_{i=1}^{n} L_i}, \quad \text{where } t_i = \sum_{k\in C_i}\frac{y_k}{w_k}. \]
Note that $\hat D_r$ does not depend on knowing A or E(L).
• If each object intersected by a transect represents one animal, then the density of
animals is estimated by setting yk = 1. However, if each object can represent more
than one animal (such as Example 19.1 in the last handout on wolverine tracks)
then the yk are the number of animals associated with each object. If objects
encountered are bushes and yk is the weight of berries on a bush, then D is the
average weight of berries per unit area.
• The estimated variance of $\hat D_r$ is (see the formulas at the top of p. 70 of the text):
\[ \widehat{\mathrm{Var}}(\hat D_r) = \frac{\widehat{\mathrm{Var}}(\hat\tau_r)}{A^2} = \frac{1}{n(n-1)}\sum_{i=1}^{n}\left(\frac{v_i - rL_i}{A}\right)^2 = \frac{1}{n(n-1)}\sum_{i=1}^{n}\left[\frac{t_i - L_i\left(\sum_{j=1}^{n} t_j \big/ \sum_{j=1}^{n} L_j\right)}{E(L)}\right]^2 = \frac{1}{n\,E(L)^2}\cdot\frac{1}{n-1}\sum_{i=1}^{n}\left(t_i - L_i\hat D_r\right)^2, \]
since $v_i = b\,t_i$ and $A = b\,E(L)$. If E(L) is not known, it can be estimated by $\bar L = \frac{1}{n}\sum_{i=1}^{n} L_i$, the sample mean transect length.
• Note that the ratio estimate of D based on a single transect of length $L_i$ would be
\[ \hat D_i = \frac{(b\,t_i/L_i)\,E(L)}{A} = \frac{t_i}{L_i}, \quad \text{so that} \quad \hat D_r = \frac{\sum_{i=1}^{n} L_i\hat D_i}{\sum_{i=1}^{n} L_i} \]
is a weighted average of the ratio estimates of density from the individual transects.
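The density estimate, its estimated variance, and the weighted-average identity can be checked numerically with a small Python sketch. The function name and data are hypothetical; when `expected_length` is not supplied, the sample mean transect length is substituted for E(L), as suggested above.

```python
def density_ratio(t, L, expected_length=None):
    """Ratio estimate of density D and its estimated variance.

    t: per-transect sums t_i = sum of y_k / w_k
    L: transect lengths L_i
    expected_length: E(L) if known; otherwise the sample mean of L is used.
    Returns (d_r, var_hat).
    """
    n = len(t)
    d_r = sum(t) / sum(L)
    if expected_length is None:
        expected_length = sum(L) / n          # L-bar
    s2 = sum((ti - Li * d_r) ** 2 for ti, Li in zip(t, L)) / (n - 1)
    var_hat = s2 / (n * expected_length ** 2)
    return d_r, var_hat

# Made-up values for four transects
t = [3.2, 0.8, 2.0, 4.0]
L = [15.0, 5.0, 10.0, 20.0]
d_r, var_hat = density_ratio(t, L)

# Per-transect densities and their length-weighted average
d_i = [ti / Li for ti, Li in zip(t, L)]
weighted = sum(Li * di for Li, di in zip(L, d_i)) / sum(L)
print(d_r, weighted, var_hat)
```

The two printed density values agree (up to floating-point rounding), because each weight $L_i$ cancels the denominator of $\hat D_i = t_i/L_i$.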
Transect Sampling Experiment
Suppose the desks in this classroom are randomly dispersed, and we use transect sampling
to estimate the proportion of the room covered by desks, as well as the total number of desks
in the room. Note that both of these quantities can be computed exactly.
• To estimate these population quantities via line intercept sampling, the front wall is
considered the base of the classroom, and four transects are randomly selected. Along
these line transects, we count the number of desks intersected, and need to measure
the inclusion and joint inclusion probabilities of all desks encountered. How might we
do this?
• The length of the baseline is 306 inches (= 777 cm), and each transect is 274 inches (= 696 cm) long. Four random transects will be selected (we could instead take a systematic sample with interval 306/4 = 76.5 inches and a random starting point between 0 and 76.5, and treat it as an SRS). Breaking into groups, we want to record the following for each desk intersected by a transect:
1. The vertical length of intersection with the desk (i.e.: how much of the surface of
the desk is intersected along the transect?)
2. The horizontal width of the desk to the left of the transect.
3. The horizontal width of the desk to the right of the transect.
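The systematic option mentioned above (equal spacing along the baseline with a random start) could be generated as in the following Python sketch. The function name and the optional `rng` argument are illustrative choices, not part of the handout.

```python
import random

def systematic_transects(baseline, n, rng=random):
    """Positions of n equally spaced transects along the baseline.

    baseline: length of the baseline (306 inches in the classroom example)
    n: number of transects
    rng: source of randomness; any object with a uniform(a, b) method
    """
    interval = baseline / n                 # 306 / 4 = 76.5 inches
    start = rng.uniform(0, interval)        # random starting point
    return [start + i * interval for i in range(n)]

positions = systematic_transects(306, 4, rng=random.Random(2024))
print([round(p, 1) for p in positions])
```

Passing a seeded `random.Random` instance makes the draw reproducible, which is convenient if several groups need to work from the same transects.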
• The resulting data are recorded in the table below:
Transect 1:  Length | Left Width | Right Width
Transect 2:  Length | Left Width | Right Width
Transect 3:  Length | Left Width | Right Width
Transect 4:  Length | Left Width | Right Width
• The total number of desks in the room can be estimated in two ways:
1. Treating the desks encountered as one big sample, the total number of desks can
be estimated with a Horvitz-Thompson estimator, through computation of the
inclusion and joint inclusion probabilities.
2. Treating the transects separately, a Horvitz-Thompson estimate $v_i$ can be computed for each of the four transects, and the four estimates then averaged to give $\hat\tau$.
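The separate transects approach for the number of desks can be sketched in Python as follows. Two things here are assumptions rather than statements from the handout: the width $w_k$ of a desk is taken to be its left width plus its right width as measured from the transect, and the recorded measurements are invented (every desk is 24 inches wide in this made-up data set).

```python
def estimate_desk_count(transects, b):
    """Separate transects estimate of the number of desks.

    transects: one list per transect of (left_width, right_width) pairs
               (hypothetical layout; widths in inches)
    b: length of the baseline in inches
    Returns (k_hat, k_i), the overall estimate and the per-transect estimates.
    """
    # Assumption: w_k = left + right, so p_k = (left + right) / b and y_k = 1
    k_i = [sum(b / (left + right) for left, right in desks)
           for desks in transects]
    k_hat = sum(k_i) / len(k_i)
    return k_hat, k_i

# Made-up measurements for four transects
transects = [
    [(10.0, 14.0), (6.0, 18.0)],
    [(12.0, 12.0)],
    [(20.0, 4.0), (8.0, 16.0), (3.0, 21.0)],
    [(11.0, 13.0), (5.0, 19.0)],
]
k_hat, k_i = estimate_desk_count(transects, b=306.0)
print(k_hat, k_i)
```

Because the room's true desk count can be obtained by direct counting, the estimate can be compared against it after the experiment.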
• There are two ways we could estimate the total area covered by the desks:
1. Let $y_i$ = the length of transect $i$ which intersects a desk; then a ratio estimator of the area covered by desks is $\hat\tau_r = A\cdot\sum y_i \big/ \sum L_i$, where $A$ = the area of the region (the room in this case). The estimated proportion of the area covered by desks is $\sum y_i \big/ \sum L_i$.
2. Rather than just measuring the length of each transect which falls on each desk, we could measure the area $y_k$ of each intercepted desk. Then we could use either a separate transects estimate or Horvitz-Thompson estimate of the total area covered by desks. The separate transects estimate is
\[ \hat\tau = \frac{1}{n}\sum_{i=1}^{4} v_i, \quad \text{where } v_i = \sum_{k\in C_i}\frac{y_k}{w_k/b}. \]
If the area of each desk is the same, say $y$, then this simplifies to
\[ \hat\tau = y\cdot\frac{1}{n}\sum_{i=1}^{4}\sum_{k\in C_i}\frac{1}{w_k/b} = y\hat K, \]
where $\hat K$ is the estimated total number of desks. If we know the area A of the room, then an alternative estimate of the proportion of the room covered is $\hat\tau/A$; if the desks are all the same size, this equals $y\hat K/A$.
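For the first (ratio) approach to the covered area, a minimal Python sketch is given below. The intercept lengths are invented, and the room is assumed to be a rectangle of 306 by 274 inches so that $A = 306 \times 274$ square inches; the handout gives the two lengths but does not state the area.

```python
def covered_proportion(intercept_lengths, transect_lengths):
    """Ratio estimate of the proportion of the region covered.

    intercept_lengths: y_i, length of transect i lying on desks
    transect_lengths:  L_i, total length of transect i
    """
    return sum(intercept_lengths) / sum(transect_lengths)

# Made-up intercept lengths (inches) on four transects of 274 inches each
y = [40.0, 22.0, 58.0, 44.0]
L = [274.0] * 4

proportion = covered_proportion(y, L)
room_area = 306.0 * 274.0          # assumes a rectangular room
covered_area = room_area * proportion
print(proportion, covered_area)
```

The proportion itself does not require the room area; the area is needed only to convert the proportion into square inches covered.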