Module H6 Woods Solution 10
SADC Course in Statistics

“To the Woods” Discussion

The discussion in Session 10 of the “To the Woods” exercise will be based on the results presented in this handout. It outlines what was expected in the practical: a set of results (similar to the data values you might have collected) and the corresponding answers to the questions posed. Your data will be different, but the general approach and methods would be similar. The objective of estimating the proportion of large trees in the forest will also be discussed, together with the meaning and use of post-stratification.

Part 1 - Scheme A - SRS of 14 strips from the whole forest

Data

Strip number   Small trees (x)   Large trees (y)   Total (z)
008                 172                98              270
019                 232               101              333
028                 218               103              321
030                 216               103              319
035                 204               103              307
060                 217               108              325
064                 202                98              300
065                 185                90              275
070                 192               122              314
113                 125                81              206
117                 125                88              213
136                 124                84              208
151                 118                70              188
156                 130                87              217
Total              2460              1336             3796
Mean             175.71             95.43           271.14
Variance                                     $s_z^2$ = 2840.75

(i) Mean number of large trees per strip: $\bar{y} = 95.43$.
    Mean overall number of trees per strip: $\bar{z} = 271.14$.

(ii) Proportion of large trees in the forest:
\[ \frac{\bar{y}}{\bar{z}} = \frac{y_T}{z_T} = \frac{1336}{3796} = 0.3519 \]

(iii) Total number of trees in the forest $= 168\,\bar{z} = 45552$, where 168 is the number of strips in the forest.

(iv) Standard error of the estimate in (iii):
\[ 168\sqrt{\left(1 - \frac{14}{168}\right)\frac{s_z^2}{14}} = 2291.2 \]

Part 2 - Scheme B - Stratified random sample (8 strips from Region 1, 6 from Region 2)

Data

Region 1
Strip number   Small (x)   Large (y)   Total (z)
021               159          87         246
028               218         103         321
030               216         103         319
033               192          96         288
043               216          95         311
044               204          98         302
048               189          98         287
086               208         103         311
Total            1602         783        2385
Mean            200.3        97.9       298.1
Variance        397.2        29.8       607.1

Region 2
Strip number   Small (x)   Large (y)   Total (z)
017               125          81         206
032               118          84         202
033               128          80         208
042               154          85         239
063               136          86         222
066               115          85         200
Total             776         501        1277
Mean            129.3        83.5       212.8
Variance        201.4         5.9       224.1

(i) Weighting each region's mean by its share of the 168 strips:
\[ \bar{y} = \frac{96}{168}\bar{y}_1 + \frac{72}{168}\bar{y}_2 = 91.71, \qquad \bar{z} = \frac{96}{168}\bar{z}_1 + \frac{72}{168}\bar{z}_2 = 261.57 \]

(ii) Proportion of large trees $= \bar{y}/\bar{z} = 0.3506$.

(iii) Total number of trees in Region 1 $= 96\,\bar{z}_1 = 28620$.
      Total number of trees in Region 2 $= 72\,\bar{z}_2 = 15324$.
      Total number of trees in the forest $= 43944$.

(iv) The variance of the estimate in (iii) for the total number of trees in the forest is
\[ 96^2\,\mathrm{Var}(\bar{z}_1) + 72^2\,\mathrm{Var}(\bar{z}_2) = 96^2\left(1 - \frac{8}{96}\right)\frac{s_1^2}{8} + 72^2\left(1 - \frac{6}{72}\right)\frac{s_2^2}{6} = \frac{96 \times 88}{8}\,s_1^2 + \frac{72 \times 66}{6}\,s_2^2 = 1056\,s_1^2 + 792\,s_2^2 = 818513 \]
where $s_1^2$ and $s_2^2$ are the sample variances of z in Regions 1 and 2 (unrounded values are used in the final step).
Hence, standard error $= \sqrt{818513} = 904.72$.

FURTHER QUESTIONS:

1. If we look at the sample mean and variance in each of the two regions, we see that there are substantial differences between them. For example, there are many more trees, both small and large, in Region 1 than in Region 2. In Scheme A, we ignored any differences between the two regions and this led to a large variance. In Scheme B, we took account of the difference between the two regions in computing the standard error of our estimate of the total number of trees in the forest. Since the variation within each region is smaller than in the forest as a whole, this led to a smaller standard error.

2. We can use the approach given above, i.e. $\bar{y}/\bar{z}$, or we could use the mean of the proportions in each strip, i.e. $\frac{1}{n}\sum_i y_i/z_i$. For instance, in Scheme A, we would get
\[ \frac{1}{14}\left(\frac{98}{270} + \frac{101}{333} + \cdots + \frac{87}{217}\right) = 0.357 \]
which is quite close to the first estimate of 0.352. The alternative estimate above is not recommended since it has large bias and variance. What is perhaps more important is that the two estimates measure different things: the first is taken with respect to trees, the second with respect to strips. If we were actually interested in strips, then the average proportion above might be relevant.
However, in our exercise, we are interested in the proportion of large trees in the forest as a whole, and we do better to use the ratio of the sample totals for y and z.

3. This is because we have proportional allocation (i.e. 96:72 is the same ratio as 8:6). For example, the mean number of large trees per strip estimated by
\[ \bar{y} = \frac{96}{168}\bar{y}_1 + \frac{72}{168}\bar{y}_2 \]
is identical to
\[ \bar{y} = \frac{8}{14}\bar{y}_1 + \frac{6}{14}\bar{y}_2 = \frac{8\bar{y}_1 + 6\bar{y}_2}{14} = \text{mean of the 14 observations} \]
Thus, if you have proportional allocation, the estimate of the mean obtained by weighting the means of each region is identical to the ordinary mean of all the observations taken together.

4. The argument is incorrect because the total number of large trees, y, in a given strip cannot be regarded as binomially distributed, since
(a) we are not sampling trees individually, but in groups, i.e. strips, and
(b) the probability of a large tree is not the same from strip to strip.
We do not know in advance how many trees will occur in each of the strips included in our sample; each strip has to be considered as a unit with two measurements taken from it, namely y and z. We are then interested in the ratio of the population total of y to that of z. Note also that both y and z are random variables, and hence determining the standard error of their ratio is not trivial.

We now discuss briefly some ideas associated with post-stratification.
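Before moving on, readers who want to check the arithmetic in Parts 1 and 2 and in Further Questions 2 and 3 may find a short script helpful. The following is only a sketch in Python, not part of the original practical: the variable names and the helper `var_of_total` are illustrative choices, and it assumes the forest has 96 + 72 = 168 strips, as the formulas in the handout imply. It uses the standard-library `statistics.variance`, which divides by n - 1, matching the sample variances in the tables.

```python
from math import sqrt
from statistics import mean, variance  # variance() uses the n - 1 divisor

N1, N2 = 96, 72   # strips in Region 1 and Region 2 (from the handout)
N = N1 + N2       # 168 strips in the whole forest

# Scheme A: large trees (y) and all trees (z) in the 14 sampled strips
y_a = [98, 101, 103, 103, 103, 108, 98, 90, 122, 81, 88, 84, 70, 87]
z_a = [270, 333, 321, 319, 307, 325, 300, 275, 314, 206, 213, 208, 188, 217]

# Scheme B: the same measurements, recorded by region
y_b1 = [87, 103, 103, 96, 95, 98, 98, 103]
z_b1 = [246, 321, 319, 288, 311, 302, 287, 311]
y_b2 = [81, 84, 80, 85, 86, 85]
z_b2 = [206, 202, 208, 239, 222, 200]


def var_of_total(values, population_size):
    """Estimated variance of population_size * mean(values) for a simple
    random sample without replacement (finite population correction included)."""
    n = len(values)
    return population_size**2 * (1 - n / population_size) * variance(values) / n


# Part 1 (iii) and (iv): total number of trees and its standard error
total_a = N * mean(z_a)
se_a = sqrt(var_of_total(z_a, N))

# Part 2 (iii) and (iv): stratum totals are added, and so are their variances
total_b = N1 * mean(z_b1) + N2 * mean(z_b2)
se_b = sqrt(var_of_total(z_b1, N1) + var_of_total(z_b2, N2))

# Further Question 2: ratio of totals versus mean of per-strip ratios (Scheme A)
ratio_of_totals = sum(y_a) / sum(z_a)
mean_of_ratios = mean(y / z for y, z in zip(y_a, z_a))

# Further Question 3: with proportional allocation the weighted mean
# coincides with the plain mean of all 14 observations
weighted_mean_y = (N1 * mean(y_b1) + N2 * mean(y_b2)) / N
pooled_mean_y = mean(y_b1 + y_b2)

print(f"Scheme A: total = {total_a:.0f}, SE = {se_a:.1f}")
print(f"Scheme B: total = {total_b:.0f}, SE = {se_b:.2f}")
print(f"Ratio of totals = {ratio_of_totals:.3f}, mean of ratios = {mean_of_ratios:.3f}")
print(f"Weighted mean = {weighted_mean_y:.2f}, pooled mean = {pooled_mean_y:.2f}")
```

Run as written, the script should reproduce the totals and standard errors quoted in Parts 1 and 2 up to rounding, and the two means in the last line should agree. Replacing the lists with your own strip counts gives the corresponding results for your sample.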