Some Extensions of Line Intercept Sampling

Assume we have a random sample of $n$ transects and that we will use the separate transects estimator $\hat{\tau}_p = \frac{1}{n}\sum_{i=1}^{n} v_i$ in equation (3) on page 246 of the text and estimate its variance by equation (5), namely $\widehat{\mathrm{Var}}(\hat{\tau}_p) = s_v^2/n$. That is, we will calculate the Horvitz-Thompson estimate for each transect separately and then average the estimates to get an overall estimate. Recall that $w_k$ is the width of the $k$th object on a transect and so its probability of inclusion for a single transect is $p_k = w_k/b$, where $b$ is the length of the baseline. Recall also that $C_i$ is the collection of sampled objects on the $i$th transect. The $w_k$'s should also have a subscript $i$ indicating which transect the object is on, but Thompson has chosen to suppress the additional subscript with the understanding that the $w_k$'s are different for each transect.

Two basic extensions of line intercept sampling as discussed in the text are given here. The first considers estimating the mean response per object (as opposed to per unit area). The second considers transects of variable length.

Extensions:

1. Estimating $\mu$, the Per Object Mean: While Thompson considers estimating $\tau$, the total of the $y$-values for the population, and $D = \tau/A$, the density as number per unit area, he does not consider estimating the mean value of $y$ per object, $\mu = \tau/K$, where $K$ is the number of objects in the population.

• For example, suppose we wanted to estimate the mean height of shrubs sampled via this method. If $K$ is known, the estimation is straightforward:

$$\hat{\mu} = \frac{\hat{\tau}}{K}, \qquad \widehat{\mathrm{Var}}(\hat{\mu}) = \frac{\widehat{\mathrm{Var}}(\hat{\tau})}{K^2}.$$

• However, if $K$ is not known (as would usually be the case), then $K$ must also be estimated and a ratio of means estimator used.
First, to estimate $K$ using the method in equation (3) on page 246, $\hat{\tau}_p = \frac{1}{n}\sum_{i=1}^{n} v_i$ where $v_i = \sum_{k \in C_i} y_k/p_k$, we first estimate $K$ for each transect separately using a Horvitz-Thompson estimator (with $y_k = 1$ for all objects) and then average the individual transect estimates of $K$:

$$\hat{K}_i = \sum_{k \in C_i} \frac{1}{w_k/b} = b \sum_{k \in C_i} \frac{1}{w_k}, \quad \text{and then} \quad \hat{K} = \frac{1}{n}\sum_{i=1}^{n} \hat{K}_i = \frac{b}{n}\sum_{i=1}^{n}\sum_{k \in C_i} \frac{1}{w_k}.$$

The estimate of the population total $\tau$ is:

$$\hat{\tau} = \frac{1}{n}\sum_{i=1}^{n} v_i = \frac{1}{n}\sum_{i=1}^{n}\sum_{k \in C_i} \frac{y_k}{w_k/b} = \frac{b}{n}\sum_{i=1}^{n}\sum_{k \in C_i} \frac{y_k}{w_k}.$$

Combining the estimates of $K$ and $\tau$, a ratio estimator of $\mu$ is:

$$\hat{\mu} = \frac{\hat{\tau}}{\hat{K}} = \frac{\frac{1}{n}\sum_{i=1}^{n}\sum_{k \in C_i} y_k/w_k}{\frac{1}{n}\sum_{i=1}^{n}\sum_{k \in C_i} 1/w_k} = \frac{\frac{1}{n}\sum_{i=1}^{n} t_i}{\frac{1}{n}\sum_{i=1}^{n} u_i} = \frac{\bar{t}}{\bar{u}},$$

a ratio of two means, where $t_i = \sum_{k \in C_i} y_k/w_k$ and $u_i = \sum_{k \in C_i} 1/w_k$.

• To estimate the variance, we can use the Delta method formula (see the handout on the Delta method, part 3, for the case where $N$ is unknown, page 39 in your notes) or bootstrapping of the pairs $(t_i, u_i)$. The Delta method approximation to the variance is:

$$\widehat{\mathrm{Var}}(\hat{\mu}) \approx \left(\frac{\bar{t}^{\,2}}{\bar{u}^{4}}\right)\frac{s_u^2}{n} + \left(\frac{1}{\bar{u}^{2}}\right)\frac{s_t^2}{n} - 2\left(\frac{\bar{t}}{\bar{u}^{3}}\right)\frac{\hat{\rho}_{t,u}\, s_t s_u}{n},$$

where $s_t^2$ and $s_u^2$ are the sample variances and $\hat{\rho}_{t,u}$ is the sample correlation between the $t_i$'s and the $u_i$'s. Note that $b$ is not needed in either formula.

2. Transects of Variable Length: If the region is irregularly shaped, then the transects may vary in length. Let $L_1, \ldots, L_n$ denote the lengths of the $n$ randomly chosen transects. The $L_i$ are then random variables whose values are unknown until the transects are selected.

• Let $E(L)$ be the expected length of a random transect. Then $E(L) = A/b$, where $A$ is the total area of the region and $b$ is the length of the baseline.

• The separate transects and Horvitz-Thompson estimators on pages 245-247 of the text (pages 113-114 of the notes) are still unbiased even if transect lengths differ, and we can, if we choose, ignore the differing lengths of the transects.
• However, we might want to consider alternative estimators for two reasons:
(i) Estimators which account for the differing transect lengths might be better (i.e., have smaller variance).
(ii) We might not know the total area $A$ for an irregularly shaped region and might want alternative estimators which do not require knowledge of the total area.
Each of these reasons is explored below.

(i) Accounting for Different Transect Lengths: Consider estimation of $\tau$, the population total. The estimator of $\tau$ (Eq. (3) on page 246 of the text) is constructed by computing the Horvitz-Thompson (H-T) estimator of $\tau$ on each transect and then averaging the estimates. The H-T estimator on each transect is:

$$v_i = \sum_{k \in C_i} \frac{y_k}{p_k} = b \sum_{k \in C_i} \frac{y_k}{w_k} = b\,t_i, \quad \text{where } t_i = \sum_{k \in C_i} \frac{y_k}{w_k}.$$

• It seems clear that the $v_i$ based on longer transects will tend to be larger than those based on shorter transects. Therefore, we could use transect length as an auxiliary variable if we knew $E(L)$, the average transect length for the whole area. If we view the transect as the sampling unit and $v_i$ as the response, then we want to use ratio estimation to estimate the mean value of $v$ for the whole population of possible transects. From page 68 in the text, with $y_i$ representing $v_i$, $x_i$ representing $L_i$, and $\mu_x = E(L)$, the ratio estimate of $\tau$ is:

$$\hat{\tau}_r = \frac{\sum_{i=1}^{n} v_i}{\sum_{i=1}^{n} L_i}\,E(L) = \frac{b\,E(L)\sum_{i=1}^{n} t_i}{\sum_{i=1}^{n} L_i} = \frac{A\sum_{i=1}^{n} t_i}{\sum_{i=1}^{n} L_i}. \qquad (*)$$

• The variance of $\hat{\tau}_r$ can be approximated by Equation (5) on page 69:

$$\widehat{\mathrm{Var}}(\hat{\tau}_r) = \frac{s_r^2}{n} = \frac{1}{n(n-1)}\sum_{i=1}^{n}(v_i - rL_i)^2,$$

where $r = \sum_{i=1}^{n} v_i \big/ \sum_{i=1}^{n} L_i$ and where the finite population correction (fpc) has been ignored because the population of possible transects is assumed to be infinite.

(ii) Estimating $D$ when $A$ is Unknown: Consider estimation of $D = \tau/A$, the population density per unit area. Based on equation (*) above, an estimate of $D$ is:

$$\hat{D}_r = \frac{\hat{\tau}_r}{A} = \frac{\sum_{i=1}^{n} t_i}{\sum_{i=1}^{n} L_i}, \quad \text{where } t_i = \sum_{k \in C_i} \frac{y_k}{w_k}.$$

Note that $\hat{D}_r$ does not depend on knowing $A$ or $E(L)$.

• If each object intersected by a transect represents one animal, then the density of animals is estimated by setting $y_k = 1$. However, if each object can represent more than one animal (such as Example 19.1 in the last handout on wolverine tracks), then the $y_k$ are the numbers of animals associated with each object. If the objects encountered are bushes and $y_k$ is the weight of berries on a bush, then $D$ is the average weight of berries per unit area.

• The estimated variance of $\hat{D}_r$ is (see the formulas at the top of p. 70 of the text):

$$\widehat{\mathrm{Var}}(\hat{D}_r) = \frac{\widehat{\mathrm{Var}}(\hat{\tau}_r)}{A^2} = \frac{1}{n(n-1)}\sum_{i=1}^{n}\left(\frac{v_i - rL_i}{A}\right)^2 = \frac{1}{n(n-1)}\sum_{i=1}^{n}\left[\frac{t_i - L_i\left(\sum_{j=1}^{n} t_j \big/ \sum_{j=1}^{n} L_j\right)}{E(L)}\right]^2 = \frac{1}{n\,E(L)^2}\cdot\frac{1}{n-1}\sum_{i=1}^{n}\left(t_i - L_i\hat{D}_r\right)^2,$$

since $v_i = b\,t_i$ and $A = b\,E(L)$. If $E(L)$ is not known, it can be estimated by $\bar{L} = \frac{1}{n}\sum_{i=1}^{n} L_i$, the sample mean transect length.

• Note that the ratio estimate of $D$ based on a single transect of length $L_i$ would be

$$\hat{D}_i = \frac{(b\,t_i/L_i)\,E(L)}{A} = \frac{t_i}{L_i}, \quad \text{so that} \quad \hat{D}_r = \frac{\sum_{i=1}^{n} L_i\hat{D}_i}{\sum_{i=1}^{n} L_i}$$

is a weighted average of the ratio estimates of density from the individual transects, with weights proportional to the transect lengths.

Transect Sampling Experiment

Suppose the desks in this classroom are randomly dispersed, and we use transect sampling to estimate the proportion of the room covered by desks, as well as the total number of desks in the room. Note that both of these quantities can be computed exactly.

• To estimate these population quantities via line intercept sampling, the front wall is considered the baseline of the classroom, and four transects are randomly selected. Along these line transects, we count the number of desks intersected, and we need to measure the inclusion and joint inclusion probabilities of all desks encountered. How might we do this?

• The length of the baseline is 306 inches (= 777 cm). Four random transects will be selected (we could do a systematic sample with interval 306/4 = 76.5 inches and a random starting point between 0 and 76.5, and treat it as an SRS).
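The selection of transect positions along the baseline can be sketched in Python. This is only an illustrative sketch: the function names `srs_positions` and `systematic_positions` and the seed are assumptions made here, not part of the notes; only the baseline length (306 inches) and the number of transects (4) come from the text.

```python
import random

BASELINE_IN = 306.0  # baseline length in inches (from the notes)
N_TRANSECTS = 4      # number of transects (from the notes)


def srs_positions(baseline, n, rng):
    """Simple random sample: n independent uniform positions along the baseline."""
    return sorted(rng.uniform(0.0, baseline) for _ in range(n))


def systematic_positions(baseline, n, rng):
    """Systematic sample: one random start within the first interval, then equal spacing."""
    interval = baseline / n              # 306 / 4 = 76.5 inches
    start = rng.uniform(0.0, interval)   # random starting point
    return [start + j * interval for j in range(n)]


rng = random.Random(2024)  # arbitrary seed (an assumption) so the draw can be repeated
print("SRS positions (in):       ",
      [round(p, 1) for p in srs_positions(BASELINE_IN, N_TRANSECTS, rng)])
print("Systematic positions (in):",
      [round(p, 1) for p in systematic_positions(BASELINE_IN, N_TRANSECTS, rng)])
```

Each position is the distance along the front wall at which a transect is run perpendicular to the baseline.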
• Breaking into groups, we want to record the following for each desk intersected by a transect (the transects are each of length 274 inches = 696 cm):
1. The vertical length of intersection with the desk (i.e., how much of the surface of the desk is intersected along the transect?).
2. The horizontal width of the desk to the left of the transect.
3. The horizontal width of the desk to the right of the transect.
The left and right widths together give the width $w_k$ of the desk.

• The resulting data are recorded in the table below (one row for each desk intersected):

  Transect   Length   Left Width   Right Width
  1
  2
  3
  4

• The total number of desks in the room can be estimated in two ways:
1. Treating the desks encountered as one big sample, the total number of desks can be estimated with a Horvitz-Thompson estimator, through computation of the inclusion and joint inclusion probabilities.
2. Treating the transects separately, a Horvitz-Thompson estimate $v_i$ can be computed for each of the four transects and the four estimates then averaged to give $\hat{\tau}$.

• There are two ways we could estimate the total area covered by the desks:
1. Let $y_i$ = the length of transect $i$ which intersects a desk; then a ratio estimator of the area covered by desks is $\hat{\tau}_r = A \cdot \sum y_i \big/ \sum L_i$, where $A$ = the area of the region (the room in this case). The estimated proportion of the area covered by desks is $\sum y_i \big/ \sum L_i$.
2. Rather than just measuring the length of each transect which falls on each desk, we could measure the area $y_k$ of each intercepted desk. Then we could use either a separate transects estimate or a Horvitz-Thompson estimate of the total area covered by desks. The separate transects estimate is

$$\hat{\tau} = \frac{1}{n}\sum_{i=1}^{n} v_i \ \ (n = 4), \quad \text{where } v_i = \sum_{k \in C_i} \frac{y_k}{w_k/b}.$$

If the area of each desk is the same, say $y$, then this simplifies to

$$\hat{\tau} = y \cdot \frac{1}{n}\sum_{i=1}^{n}\sum_{k \in C_i} \frac{1}{w_k/b} = y\hat{K},$$

where $\hat{K}$ is the estimated total number of desks. If we know the area $A$ of the room, then an alternative estimate of the proportion of the room covered is $\hat{\tau}/A$; if the desks are all the same size, this equals $y\hat{K}/A$.
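The separate-transects calculations for the experiment, together with the ratio estimators from the two extensions, can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the desk measurements are made-up illustrative numbers (the notes record no data), every desk is given the same hypothetical area of 432 square inches, $w_k$ is taken as left width plus right width, and the room is treated as a rectangle so that $A = b \times L$. The function names are hypothetical.

```python
import math
from statistics import mean, variance

B = 306.0            # baseline length in inches (from the notes)
LENGTHS = [274.0] * 4  # transect lengths in inches (from the notes)
DESK_AREA = 432.0    # HYPOTHETICAL common desk area y, in square inches

# HYPOTHETICAL data: one list per transect of (y_k, w_k) pairs,
# where w_k = left width + right width of desk k, in inches.
transects = [
    [(DESK_AREA, 24.0), (DESK_AREA, 26.0)],
    [(DESK_AREA, 25.0)],
    [(DESK_AREA, 24.0), (DESK_AREA, 28.0), (DESK_AREA, 22.0)],
    [(DESK_AREA, 30.0), (DESK_AREA, 24.0)],
]


def transect_sums(objects):
    """Return (t_i, u_i) = (sum of y_k / w_k, sum of 1 / w_k) for one transect."""
    return (sum(y / w for y, w in objects), sum(1.0 / w for _, w in objects))


def total_estimate(values, b):
    """Separate-transects estimate: mean of v_i = b * value_i, with variance s_v^2 / n."""
    v = [b * x for x in values]
    return mean(v), variance(v) / len(v)


def per_object_mean(t, u):
    """Ratio estimate mu_hat = t_bar / u_bar with the Delta-method variance."""
    n, tb, ub = len(t), mean(t), mean(u)
    # Sample covariance, equal to rho_hat * s_t * s_u.
    s_tu = sum((a - tb) * (c - ub) for a, c in zip(t, u)) / (n - 1)
    var = ((tb**2 / ub**4) * variance(u) / n
           + variance(t) / (ub**2 * n)
           - 2.0 * (tb / ub**3) * s_tu / n)
    return tb / ub, var


def density_ratio(t, lengths):
    """D_r = sum t_i / sum L_i; variance uses L_bar in place of E(L)."""
    n = len(t)
    d = sum(t) / sum(lengths)
    ss = sum((ti - li * d) ** 2 for ti, li in zip(t, lengths))
    return d, ss / (n * (n - 1) * mean(lengths) ** 2)


t, u = zip(*(transect_sums(objs) for objs in transects))
K_hat, var_K = total_estimate(u, B)      # estimated number of desks (y_k = 1)
tau_hat, var_tau = total_estimate(t, B)  # estimated total area covered by desks
mu_hat, var_mu = per_object_mean(t, u)   # estimated mean area per desk
D_r, var_D = density_ratio(t, LENGTHS)   # estimated proportion of the room covered

print(f"K_hat = {K_hat:.2f} (SE {math.sqrt(var_K):.2f})")
print(f"tau_hat = {tau_hat:.1f} sq in, mu_hat = {mu_hat:.1f} sq in")
print(f"D_r = {D_r:.4f} (SE {math.sqrt(var_D):.4f})")
```

Because every hypothetical desk has the same area, the sketch reproduces the simplification $\hat{\tau} = y\hat{K}$, and with equal transect lengths $\hat{D}_r$ coincides with $\hat{\tau}/A$ for a rectangular room.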