Comparing different sampling schemes

Module H6 Practicals 13&14
1. One aim of this exercise (to be done in pairs) is to ensure that you are clear about the
difference between drawing a stratified random sample with proportional allocation
and drawing a cluster sample with probability proportional to size.
A second aim is to compute estimates according to different sampling schemes and assess
the benefits and limitations of such schemes, both in terms of the practicalities associated
with drawing the sample, and with respect to the precision of the sample estimates
obtained.
A third aim is to give you an appreciation of how you might compute probabilities of
selection and thereby determine whether your sample is self-weighting.
A fourth aim is to give you an appreciation of what is meant by design effects and how
these may be computed for the fairly simple scenario being dealt with in this exercise.
You are expected to do parts (a) to (d) during the time allocated for practical work in
Session 13 and continue with the remaining parts of this exercise during the allocated time
for practical work in Session 14.
Problem description:
In a small district, there are 5 administrative areas which include 20, 15, 25, 16 and 27
villages respectively. The total number of cows in the district is to be estimated, sampling
no more than 15 of the 103 villages in the district. Assume that once a village is visited, it
is relatively easy to measure accurately the number of cows in the village.
The data for the whole population, i.e. administrative area, village number, and total cow
numbers in each village, are available to you in the worksheet named cows in the Excel file
named H6_data.xls, but you need to pretend, for the purpose of this exercise, that you
will have the information only for those villages that are visited. However, you will be
asked at the end to use all the population data to check how good your estimates from
the different schemes are. The number of villages in each area may be assumed known,
with the values given above.
SADC Course in Statistics
Your Task:
(a) Select a sample of 15 villages using each of the following procedures, collect data on
cow numbers and organise the data in a new spreadsheet, identifying clearly which
sampling procedure applies to which data subset.
(i) A simple random sample.
(ii) A stratified random sample with 3 units drawn from each stratum (admin. area) at
random.
(iii) A stratified random sample with proportional allocation.
(iv) A cluster sample, choosing 3 clusters with replacement, with probability proportional
to size, and 5 villages from each of these clusters without replacement.
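The four selection procedures in (a) can be sketched in Python as a cross-check on whatever method you use in Excel. This is only an illustrative sketch: the function names, the seed and the (area, village) numbering of the frame are assumptions made for the example, not part of the practical. It also assumes that if the same area is drawn more than once in scheme (iv), a fresh set of 5 villages is drawn each time.

```python
import random

# Number of villages in each administrative area (from the problem description).
AREA_SIZES = [20, 15, 25, 16, 27]

def build_frame(area_sizes):
    """Sampling frame as (area, village) pairs, both numbered from 1."""
    return [(a, v) for a, size in enumerate(area_sizes, start=1)
            for v in range(1, size + 1)]

def simple_random_sample(frame, n, rng):
    """n villages drawn without replacement from the whole district."""
    return rng.sample(frame, n)

def stratified_sample(frame, n_per_area, rng):
    """n_per_area[a-1] villages drawn without replacement from area a."""
    chosen = []
    for a, n_h in enumerate(n_per_area, start=1):
        villages = [u for u in frame if u[0] == a]
        chosen.extend(rng.sample(villages, n_h))
    return chosen

def pps_cluster_sample(frame, area_sizes, n_clusters, m, rng):
    """Areas drawn with replacement with probability proportional to size,
    then m villages drawn without replacement for each draw."""
    areas = list(range(1, len(area_sizes) + 1))
    draws = rng.choices(areas, weights=area_sizes, k=n_clusters)
    return [(a, rng.sample([u for u in frame if u[0] == a], m)) for a in draws]

rng = random.Random(2024)  # arbitrary seed, only so the run is repeatable
frame = build_frame(AREA_SIZES)
srs = simple_random_sample(frame, 15, rng)                # scheme (i)
st_equal = stratified_sample(frame, [3] * 5, rng)         # scheme (ii)
st_prop = stratified_sample(frame, [3, 2, 4, 2, 4], rng)  # scheme (iii)
clus = pps_cluster_sample(frame, AREA_SIZES, 3, 5, rng)   # scheme (iv)
```

Each scheme returns 15 village selections; for scheme (iv) they are grouped by the area drawn.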
(b) Use data from your simple random sample to determine an estimate, say XRAN, of the
total number of cows in the district (XRAN =            ).
Calculate also the standard error of your estimate using the formula
$$\text{Std. error of XRAN} = 103 \sqrt{\left(1 - \frac{15}{103}\right) \frac{s^2}{15}} = \text{<enter computed value>}$$
where $s^2$ refers to the sample variance.
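The calculation in (b) can be sketched as follows: the estimated total is 103 times the sample mean, and the standard error is 103 times the square root of (1 - 15/103) times the sample variance divided by 15. The function name `srs_total` and the cow counts are made up for illustration; substitute the counts from your own sample.

```python
import math
import statistics

def srs_total(sample, N):
    """Estimated population total and its standard error under simple
    random sampling without replacement from N units."""
    n = len(sample)
    s2 = statistics.variance(sample)          # sample variance, divisor n - 1
    total = N * statistics.mean(sample)       # N times the sample mean
    se = N * math.sqrt((1 - n / N) * s2 / n)  # includes the finite population correction
    return total, se

# Hypothetical cow counts for 15 sampled villages (NOT the course data).
y = [12, 30, 7, 45, 22, 18, 9, 27, 33, 15, 21, 40, 11, 26, 19]
x_ran, se_ran = srs_total(y, N=103)
print(round(x_ran, 1), round(se_ran, 1))
```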
(c) Now consider data obtained from your stratified sample, drawing an equal-sized sample,
i.e. three villages at random, from each stratum.
From the summary statistics, estimate the total number of cows, XSTEQ and its standard
error, and enter the values below. You may use Excel facilities for this purpose.
XSTEQ =
s.e. (XSTEQ) =
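If you want to check your Excel working, one standard way to estimate a total from a stratified random sample is to add up the stratum estimates N_h times the stratum sample mean, and to add up the stratum variances N_h^2 (1 - n_h/N_h) s_h^2 / n_h before taking the square root. The sketch below assumes this estimator; the function name and the cow counts are hypothetical. The same function applies to any allocation, since it takes n_h from the data.

```python
import math
import statistics

def stratified_total(samples, sizes):
    """Estimated total and standard error from a stratified random sample.
    samples[h] holds the observations from stratum h; sizes[h] is N_h."""
    total = 0.0
    var = 0.0
    for y_h, N_h in zip(samples, sizes):
        n_h = len(y_h)
        total += N_h * statistics.mean(y_h)
        var += N_h ** 2 * (1 - n_h / N_h) * statistics.variance(y_h) / n_h
    return total, math.sqrt(var)

AREA_SIZES = [20, 15, 25, 16, 27]
# Hypothetical cow counts, three villages per area (NOT the course data).
equal_alloc = [[12, 30, 21], [7, 15, 11], [45, 33, 40], [22, 18, 26], [9, 27, 19]]
x_steq, se_steq = stratified_total(equal_alloc, AREA_SIZES)
print(round(x_steq, 1), round(se_steq, 1))
```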
(d) Next consider the stratified random sample obtained with proportional allocation.
You should get sample sizes as follows. If you haven’t got these sample sizes, check with a
staff member what has gone wrong with your computations.
i.e. n1 = 3, n2 = 2, n3 = 4, n4 = 2, n5 = 4
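The sample sizes above come from sharing the 15 villages among the areas in proportion to their sizes and rounding. The short sketch below reproduces them and also computes the probability of selection n_h/N_h of a village in each area. Because of the rounding, these probabilities are close to, but not exactly equal to, 15/103, so the sample is only approximately self-weighting. Plain rounding happens to add up to 15 here; that is not guaranteed for other stratum sizes.

```python
AREA_SIZES = [20, 15, 25, 16, 27]   # villages per area, from the problem description
N = sum(AREA_SIZES)                 # 103 villages in total
n = 15                              # overall sample size

exact = [n * N_h / N for N_h in AREA_SIZES]   # unrounded proportional shares
alloc = [round(x) for x in exact]             # rounded sample sizes n_h
probs = [n_h / N_h for n_h, N_h in zip(alloc, AREA_SIZES)]  # selection probabilities

print(alloc)
print([round(p, 3) for p in probs], round(n / N, 3))
```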
As with the simple random sample, estimate XSTPR, i.e. the total number of cows in the
district. Find also the standard error of this estimate. Note down the formulae used and
the final results below.
XSTPR =
s.e. (XSTPR) =
(e) Now consider data obtained from your cluster sample, drawing three clusters with
replacement and then 5 villages at random (without replacement) from each of the three
selected clusters.
From the summary statistics, estimate the total number of cows, XCLUS and its standard
error, and enter the values below.
XCLUS =
s.e. (XCLUS) =
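One standard estimator for this design, which may or may not be the formula intended in your course notes, uses the fact that clusters are drawn with replacement with probability proportional to size. The estimated total is then 103 times the average of the three cluster sample means, and its standard error is 103 times the square root of the variance of those three means divided by 3. The sketch below assumes this estimator; the function name and the cow counts are hypothetical. Under this design the expected number of times a given village is selected is 3 x (M_i/103) x (5/M_i) = 15/103 whatever the area, which is why the sample is self-weighting.

```python
import math
import statistics

def pps_cluster_total(cluster_samples, M):
    """Estimated total and standard error when clusters are drawn with
    replacement with probability proportional to size and an equal-sized
    simple random sample of units is taken within each drawn cluster.
    cluster_samples: one list of observations per cluster draw.
    M: total number of units (villages) in the population."""
    means = [statistics.mean(y) for y in cluster_samples]
    n = len(means)                                   # number of cluster draws
    total = M * statistics.mean(means)
    se = M * math.sqrt(statistics.variance(means) / n)
    return total, se

# Hypothetical cow counts, 5 villages in each of 3 drawn areas (NOT the course data).
draws = [[12, 30, 21, 18, 25], [45, 33, 40, 38, 29], [9, 27, 19, 14, 22]]
x_clus, se_clus = pps_cluster_total(draws, M=103)
print(round(x_clus, 1), round(se_clus, 1))
```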
(f) Now compute the actual population results (but remember that these would not be
available to you in practice). Note down the results below.
Mean =            ;  Std. Dev. =            ;  Sum =
Complete the table given below using the results you have obtained so far. The column
labelled 'error' corresponds to the difference between your estimate of the total number of
cows and the true population value. Comment on the appropriateness of the different
sampling schemes by looking at the results of your table.
Sampling method    Estimate    Std. Error    Error    % Error
(a) XRAN
(b) XSTEQ
(c) XSTPR
(d) XCLUS
Comments:
(g) Finally, use your results above to calculate the design effect (deff) for sampling schemes
(b), (c) and (d) in the table, i.e. XSTEQ, XSTPR and XCLUS. Which design would you regard
as providing the most precise estimates? How will the values of deff help in a future survey?
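The design effect is the ratio of the variance of an estimate under the design used to the variance under a simple random sample of the same size. A minimal sketch follows, assuming you take the simple random sampling variance from your standard error in part (b); you could instead base it on the population standard deviation found in part (f). The function names and the numbers in the usage lines are hypothetical.

```python
import math

def design_effect(se_design, se_srs):
    """deff = variance under the design / variance under simple random
    sampling with the same number of sampled units."""
    return (se_design / se_srs) ** 2

def sample_size_for_design(n_srs, deff):
    """Approximate number of units needed under the design to match the
    precision of a simple random sample of size n_srs."""
    return math.ceil(n_srs * deff)

# Hypothetical standard errors (NOT results from the course data).
deff = design_effect(se_design=150.0, se_srs=200.0)
print(deff, sample_size_for_design(15, deff))
```

A deff below 1 means the design is more precise than simple random sampling for the same sample size, and a deff above 1 means it is less precise. In planning a future survey, multiplying the sample size needed under simple random sampling by deff gives a rough sample size for the design.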