"Woods" solution

Module H6 Woods Solution 10
“To the Woods” Discussion
The discussion in Session 10 of the “To the Woods” exercise will be based on results
presented in this handout.
The following gives an outline of what was expected in the practical, i.e. a set of results
(similar to the data values you might have collected), and the corresponding answers to the
questions posed. Clearly your data will be different, but the general approach and methods
used would be similar.
Estimation of the proportion of large trees in the forest will also be discussed,
together with the meaning and use of post-stratification.
Part 1 - Scheme A - SRS of 14 strips from whole forest
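Data

| Strip number | Small trees (x) | Large trees (y) | Total (z) |
|---|---|---|---|
| 008 | 172 | 98 | 270 |
| 019 | 232 | 101 | 333 |
| 028 | 218 | 103 | 321 |
| 030 | 216 | 103 | 319 |
| 035 | 204 | 103 | 307 |
| 060 | 217 | 108 | 325 |
| 064 | 202 | 98 | 300 |
| 065 | 185 | 90 | 275 |
| 070 | 192 | 122 | 314 |
| 113 | 125 | 81 | 206 |
| 117 | 125 | 88 | 213 |
| 136 | 124 | 84 | 208 |
| 151 | 118 | 70 | 188 |
| 156 | 130 | 87 | 217 |
| Total | 2460 | 1336 | 3796 |
| Mean | 175.71 | 95.43 | 271.14 |
| Variance | | | $s_z^2$ = 2840.75 |

SADC Course in Statistics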
(i) Mean number of large trees per strip, $\bar{y}$ = 95.43
Mean overall number of trees per strip, $\bar{z}$ = 271.14
(ii) Proportion of large trees in the forest = $\bar{y}/\bar{z} = y_T/z_T$ = 0.3519, where $y_T$ and $z_T$ are the sample totals of y and z
(iii) Total number of trees in the forest = $168\,\bar{z}$ = 45552
(iv) Standard error of the estimate in (iii) = $168\sqrt{(1 - 14/168)\, s_z^2/14}$ = 2291.2
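The Scheme A calculations can be reproduced with a short script. This is only a sketch using the handout's example data; the variable names are illustrative, and the forest size of 168 strips is taken from the formulas above.

```python
import math

# Scheme A sample: total trees (z) and large trees (y) in each of the 14 strips.
z = [270, 333, 321, 319, 307, 325, 300, 275, 314, 206, 213, 208, 188, 217]
y = [98, 101, 103, 103, 103, 108, 98, 90, 122, 81, 88, 84, 70, 87]

N = 168          # strips in the whole forest
n = len(z)       # strips sampled (14)

y_bar = sum(y) / n
z_bar = sum(z) / n
proportion_large = sum(y) / sum(z)      # ratio of the sample totals

# Sample variance of z with divisor n - 1.
s2_z = sum((zi - z_bar) ** 2 for zi in z) / (n - 1)

total_trees = N * z_bar
# Standard error of the estimated total, including the finite population correction.
se_total = N * math.sqrt((1 - n / N) * s2_z / n)

print(round(y_bar, 2), round(z_bar, 2), round(proportion_large, 4))
print(round(total_trees), round(se_total, 1))
```

Replacing the two lists with your own strip counts gives the corresponding estimates for your sample.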
Part 2 - Scheme B - Stratified random sample (8 strips from Region 1, 6 from Region 2)
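Data

Region 1

| Strip number | Small (x) | Large (y) | Total (z) |
|---|---|---|---|
| 021 | 159 | 87 | 246 |
| 028 | 218 | 103 | 321 |
| 030 | 216 | 103 | 319 |
| 033 | 192 | 96 | 288 |
| 043 | 216 | 95 | 311 |
| 044 | 204 | 98 | 302 |
| 048 | 189 | 98 | 287 |
| 086 | 208 | 103 | 311 |
| Total | 1602 | 783 | 2385 |
| Mean | 200.3 | 97.9 | 298.1 |
| Variance | 397.2 | 29.8 | 607.1 |

Region 2

| Strip number | Small (x) | Large (y) | Total (z) |
|---|---|---|---|
| 017 | 125 | 81 | 206 |
| 032 | 118 | 84 | 202 |
| 033 | 128 | 80 | 208 |
| 042 | 154 | 85 | 239 |
| 063 | 136 | 86 | 222 |
| 066 | 115 | 85 | 200 |
| Total | 776 | 501 | 1277 |
| Mean | 129.3 | 83.5 | 212.8 |
| Variance | 201.4 | 5.9 | 224.1 |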
(i) $\bar{y} = \frac{96}{168}\bar{y}_1 + \frac{72}{168}\bar{y}_2 = 91.71$
$\bar{z} = \frac{96}{168}\bar{z}_1 + \frac{72}{168}\bar{z}_2 = 261.57$
(ii) Proportion of large trees = $\bar{y}/\bar{z}$ = 0.3506
(iii) Total number of trees in Region 1 = 28620
Total number of trees in Region 2 = 15324
Total number of trees in the forest = 43944
(iv) Variance of the estimate in (iii) for the total number of trees in the forest is
$96^2\,\mathrm{Var}(\bar{z}_1) + 72^2\,\mathrm{Var}(\bar{z}_2)$
$= 96^2 (1 - 8/96)\, s_1^2/8 + 72^2 (1 - 6/72)\, s_2^2/6$
$= (96)(88)\, s_1^2/8 + (72)(66)\, s_2^2/6$
$= 1056\, s_1^2 + 792\, s_2^2 = 818513$
Hence, standard error = $\sqrt{818513}$ = 904.72
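As a check on the stratified calculation, the following sketch recomputes the regional totals and the standard error from the example data. The helper function name is an illustrative choice, not part of the handout, and the unrounded sample variances are used, which is why the result matches 818513 rather than the figure obtained from the rounded table entries.

```python
import math
from statistics import mean, variance

# Total trees (z) per sampled strip in each region (Scheme B).
z1 = [246, 321, 319, 288, 311, 302, 287, 311]   # Region 1: 8 of 96 strips
z2 = [206, 202, 208, 239, 222, 200]             # Region 2: 6 of 72 strips
N1, N2 = 96, 72

def stratum_total_and_variance(sample, N_h):
    """Return the estimated stratum total and the variance of that estimate."""
    n_h = len(sample)
    total = N_h * mean(sample)
    # statistics.variance uses the n - 1 divisor; (1 - n_h / N_h) is the
    # finite population correction.
    var_total = N_h ** 2 * (1 - n_h / N_h) * variance(sample) / n_h
    return total, var_total

t1, v1 = stratum_total_and_variance(z1, N1)
t2, v2 = stratum_total_and_variance(z2, N2)

forest_total = t1 + t2
se_forest_total = math.sqrt(v1 + v2)
print(round(t1), round(t2), round(forest_total), round(se_forest_total, 2))
```

The stratum variances simply add because the two regions are sampled independently.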
FURTHER QUESTIONS:
1. If we look at the sample mean and variance in each of the two regions, we see that
there are substantial differences between the two regions. For example there are
many more trees, both small and large, in region 1 compared to region 2. In
scheme A, we ignored any differences between the two regions and this led to a
large variance. In scheme B, we took account of the difference between the two
regions in computing the standard error of our estimate of the total number of
trees in the forest. Since the variation within each region is smaller than in the
forest as a whole, this led to a smaller standard error.
2. We can use the approach given above, i.e. $\bar{y}/\bar{z}$, or we could use the mean of the
proportions in each strip, i.e. $\frac{1}{n}\sum (y_i/z_i)$. For instance, in scheme A, we would
get
$(98/270 + 101/333 + \dots + 87/217)/14 = 0.357$
which is quite close to the first estimate of 0.352. The alternative estimate above is
not recommended since it has large bias and variance. What is perhaps more
important is that they are measuring different things: the first is taken with respect
to trees, the second with respect to strips. If we were actually interested in strips,
then the average proportion above might be relevant. However, in our exercise, we
are interested in the proportion of trees in the forest as a whole and we do better to
use the ratio of the sample totals for y and z.
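The difference between the two estimates can be seen numerically. The sketch below uses the Scheme A data; the last quantity is an added illustration (not in the handout) showing that the ratio of totals equals the mean of the strip proportions when each strip is weighted by its share of the sampled trees, which is the sense in which it is taken "with respect to trees".

```python
# Scheme A sample: total trees (z) and large trees (y) per strip.
z = [270, 333, 321, 319, 307, 325, 300, 275, 314, 206, 213, 208, 188, 217]
y = [98, 101, 103, 103, 103, 108, 98, 90, 122, 81, 88, 84, 70, 87]

# Estimate taken with respect to trees: ratio of the sample totals.
ratio_of_totals = sum(y) / sum(z)

# Estimate taken with respect to strips: unweighted mean of strip proportions.
mean_of_ratios = sum(yi / zi for yi, zi in zip(y, z)) / len(y)

# Weighting each strip proportion by its share of all sampled trees
# recovers the ratio of totals exactly.
weighted_mean_of_ratios = sum((zi / sum(z)) * (yi / zi) for yi, zi in zip(y, z))

print(round(ratio_of_totals, 3), round(mean_of_ratios, 3))
```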
3. This is because we have proportional allocation (i.e. 96:72 is the same ratio as 8:6).
For example, the mean number of large trees per strip estimated by
$\bar{y} = \frac{96}{168}\bar{y}_1 + \frac{72}{168}\bar{y}_2$
is identical to
$\bar{y} = \frac{8}{14}\bar{y}_1 + \frac{6}{14}\bar{y}_2 = (8\bar{y}_1 + 6\bar{y}_2)/14$ = mean of the 14 observations.
Thus if you have proportional allocation, then the estimate of the mean obtained by
weighting the means of each region is identical to the ordinary mean of all the
observations taken together.
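A quick numerical check of this identity, using the large-tree counts from Scheme B (a sketch; the variable names are illustrative):

```python
from statistics import mean

# Large trees (y) per sampled strip in each region (Scheme B).
y1 = [87, 103, 103, 96, 95, 98, 98, 103]   # Region 1: 8 of 96 strips
y2 = [81, 84, 80, 85, 86, 85]              # Region 2: 6 of 72 strips
N1, N2 = 96, 72

# Proportional allocation: the same sampling fraction in both regions.
same_fraction = (len(y1) / N1) == (len(y2) / N2)

# Stratified estimate: region means weighted by region size.
weighted_mean = (N1 * mean(y1) + N2 * mean(y2)) / (N1 + N2)

# Ordinary mean of all 14 observations pooled together.
pooled_mean = mean(y1 + y2)

print(same_fraction, round(weighted_mean, 2), round(pooled_mean, 2))
```

With a different allocation, say 10 strips from Region 1 and 4 from Region 2, the two means would in general no longer agree.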
4. The argument is incorrect because the total number of large trees, y, in a given
strip, cannot be regarded as being binomially distributed since (a) we are not
sampling trees individually, but in groups, i.e. strips, and (b) the probability of a
large tree in any strip is not the same from strip to strip. We do not know in
advance how many trees will occur in each of the strips included in our sample;
each strip has to be considered as a unit with two measurements being taken from
it, namely y and z. We are then interested in the ratio of the population total of y to
that for z. Note also that both y and z are random variables and hence determining
the standard error of their ratio is not trivial.
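One common way to approximate the standard error of such a ratio is the first-order (Taylor linearization) formula found in standard survey sampling texts. The handout does not give this formula, so the sketch below is offered only as an illustration of why the calculation differs from the binomial one; it is applied to the Scheme A data under the assumption of simple random sampling of strips.

```python
import math

# Scheme A sample: total trees (z) and large trees (y) per strip.
z = [270, 333, 321, 319, 307, 325, 300, 275, 314, 206, 213, 208, 188, 217]
y = [98, 101, 103, 103, 103, 108, 98, 90, 122, 81, 88, 84, 70, 87]

N = 168          # strips in the whole forest
n = len(z)       # strips sampled

ratio = sum(y) / sum(z)
z_bar = sum(z) / n

# Residuals d_i = y_i - ratio * z_i; they sum to zero by construction.
residuals = [yi - ratio * zi for yi, zi in zip(y, z)]
s2_d = sum(d ** 2 for d in residuals) / (n - 1)

# Linearized approximation (an assumption here, not the handout's method):
# Var(ratio) is roughly (1 - n/N) * s2_d / (n * z_bar^2).
var_ratio = (1 - n / N) * s2_d / (n * z_bar ** 2)
se_ratio = math.sqrt(var_ratio)

print(round(ratio, 4), round(se_ratio, 4))
```

The variability enters through the strip-level residuals, so both y and z contribute, in contrast to a binomial calculation that would treat individual trees as independent draws.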
We now discuss briefly some ideas associated with post-stratification.