Estimation in Stratified Random Sampling

(Session 07)
SADC Course in Statistics
Learning Objectives
By the end of this session, you will be able to
• explain what is meant by stratification, how
a stratified sample is drawn, and its
advantages
• explain proportional or Neyman’s allocation
of sample sizes to each stratum
• compute estimates of the population mean
and population total from results of a
stratified random sample
• determine measures of precision for the
above estimates
Review of stratified sampling
• We recall first that stratification is done when
it is possible to divide the population into
groups (strata) so that the within group
variance is small, ideally as small as possible.
• From each stratum, a sample of suitable size
is drawn, usually using simple random
sampling.
• The greatest challenge is in defining a suitable
stratification variable.
• Stratification is also useful when information
is required for each stratum (e.g. each region in
a country) as well as for the whole population.
Advantages of stratification
• Sampling from each stratum guarantees that
the overall sample is more representative of
the whole population compared to a simple
random sample
• If each stratum is more homogeneous, i.e. less
variable than the population as a whole with
respect to key responses of interest, then
estimates will be more precise
• Likely to be administratively convenient, e.g.
when different sampling procedures need to be
applied to different strata (see ELUS example in
Practical 2 for large sized estates of >500ha)
Sampling with proportional allocation
• Suppose there are m strata and a sample
of size ni is chosen from the Ni units in
stratum i.
• Then total population size is $N = \sum_{i=1}^{m} N_i$,
while the sample size is $n = \sum_{i=1}^{m} n_i$.
• Often convenient to choose $n_i$ so that
$$\frac{n_1}{N_1} = \frac{n_2}{N_2} = \dots = \frac{n_m}{N_m} = \frac{n}{N}$$
• This is called proportional allocation
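As a rough illustration, proportional allocation can be sketched in Python. The stratum sizes are borrowed from the cattle example later in this session; the function name `proportional_allocation` and the overall sample size of 15 are assumptions made only for this sketch, and the results are left unrounded.

```python
def proportional_allocation(stratum_sizes, n):
    """Unrounded proportional allocation: n_i = n * N_i / N."""
    N = sum(stratum_sizes)  # total population size
    return [n * N_i / N for N_i in stratum_sizes]


# Stratum sizes N_i (illustrative) and an assumed total sample size n.
stratum_sizes = [186, 214, 236]
allocation = proportional_allocation(stratum_sizes, n=15)

for N_i, n_i in zip(stratum_sizes, allocation):
    print(f"N_i = {N_i}: n_i = {n_i:.2f}, sampling fraction = {n_i / N_i:.4f}")
```

In practice each n_i has to be rounded to a whole number, so the sampling fractions are only approximately equal.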
Sampling using Neyman’s allocation
• If costs of sampling are the same in each
stratum, but variability differs between strata
(although each stratum is homogeneous within
itself), then it is sensible to take more samples
where there is greater variability, i.e. sample in
proportion to the stratum standard deviation,
weighted by the stratum size.
• The appropriate value of ni in this case, see
below, is called Neyman’s (or optimum)
allocation.
$$n_i = \frac{n \, N_i S_i}{\sum_{j=1}^{m} N_j S_j}$$
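A minimal Python sketch of Neyman's allocation follows. The standard deviations are not known population values: the sample variances from the cattle example later in this session are used as stand-ins, and the function name `neyman_allocation` and the total sample size of 15 are assumptions for illustration only.

```python
import math


def neyman_allocation(stratum_sizes, stratum_sds, n):
    """Unrounded Neyman allocation: n_i = n * N_i * S_i / sum_j(N_j * S_j)."""
    weights = [N_i * S_i for N_i, S_i in zip(stratum_sizes, stratum_sds)]
    total = sum(weights)
    return [n * w / total for w in weights]


# Assumed inputs: stratum sizes N_i and stand-in standard deviations S_i.
stratum_sizes = [186, 214, 236]
stratum_sds = [math.sqrt(969.2), math.sqrt(104.0), math.sqrt(244.8)]
neyman = neyman_allocation(stratum_sizes, stratum_sds, n=15)
print([round(n_i, 2) for n_i in neyman])
```

With these inputs the most variable stratum receives the largest share of the sample, even though it is the smallest stratum.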
Other issues and allocation methods
• The above assumes the within-stratum standard
deviations Si are known. A pilot run or a previous
study may give estimates.
• But results from a pilot run may give very poor
estimates, since they will often be based on
very small sample sizes
• Also note that Neyman’s allocation may lead to
very few units being sampled from some strata
– not useful if separate results for each stratum
are also needed.
• Other methods of allocation exist, e.g.
incorporating possible differences in sampling
costs
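One commonly quoted cost-based rule, which is not given in these slides and is included here only as an assumption, allocates the sample in proportion to N_i S_i / sqrt(c_i), where c_i is the cost of sampling one unit in stratum i. The sketch below uses entirely hypothetical standard deviations and costs, and the function name `cost_weighted_allocation` is invented for the example.

```python
import math


def cost_weighted_allocation(stratum_sizes, stratum_sds, unit_costs, n):
    """Allocate n in proportion to N_i * S_i / sqrt(c_i) (unrounded)."""
    weights = [
        N_i * S_i / math.sqrt(c_i)
        for N_i, S_i, c_i in zip(stratum_sizes, stratum_sds, unit_costs)
    ]
    total = sum(weights)
    return [n * w / total for w in weights]


# All numbers below are hypothetical and chosen only for illustration.
sizes = [186, 214, 236]
sds = [30.0, 10.0, 15.0]
costs = [4.0, 1.0, 1.0]  # stratum 1 assumed four times as costly per unit
with_costs = cost_weighted_allocation(sizes, sds, costs, n=15)
equal_costs = cost_weighted_allocation(sizes, sds, [1.0, 1.0, 1.0], n=15)
print([round(x, 2) for x in with_costs])
print([round(x, 2) for x in equal_costs])
```

When all unit costs are equal the rule reduces to Neyman's allocation, which is why the second call serves as a check on the first.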
Estimating the population mean
• First carry out computations for each
stratum, i.e. find mean and variance for ith
stratum.
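• The estimate of the population mean is then
$$\bar{x}_{ST} = \frac{1}{N}\sum_{i=1}^{m} N_i \bar{x}_i$$
with variance
$$\widehat{\mathrm{Var}}(\bar{x}_{ST}) = \frac{1}{N^2}\sum_{i=1}^{m} N_i^2 \, \widehat{\mathrm{Var}}(\bar{x}_i) = \sum_{i=1}^{m} \left(\frac{N_i}{N}\right)^2 \left(1 - \frac{n_i}{N_i}\right) \frac{s_i^2}{n_i}$$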
Estimating the population total
• As with the mean, first find an estimate for
the total in the ith stratum, i.e. $N_i \bar{x}_i$
• The estimate of the population total is then
$$\sum_{i=1}^{m} N_i \bar{x}_i = N \bar{x}_{ST}$$
with variance
$$N^2 \, \widehat{\mathrm{Var}}(\bar{x}_{ST})$$
Note: Use expressions on the previous
page in computing these estimates
An example
Government agricultural inspectors carry out a
survey of cattle ownership in a region divided
into 3 administrative areas. Five farms are
selected from each area and the number of
cattle recorded as shown below. The total
number of farms is 636.
Area   Number of farms   No. of cattle
1      186               8, 50, 92, 60, 34
2      214               0, 0, 4, 12, 24
3      236               16, 4, 28, 46, 28
Questions to answer
What is the mean number of cattle per farm?
What is the total number of cattle in the region?
First need to compute some summaries:
Area   Ni    Mean xi   si^2    1 - fi
1      186   48.8      969.2   0.9731
2      214   8.0       104.0   0.9766
3      236   24.4      244.8   0.9788
Note: fi = ni/Ni in ith stratum.
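The summaries can be reproduced from the raw counts with a short Python sketch. The variable names (`farms`, `cattle`, `summaries`) are chosen for this illustration only; the data are the farm numbers and cattle counts given in the example.

```python
from statistics import mean, variance

# Number of farms N_i and sampled cattle counts for each area (from the example).
farms = {1: 186, 2: 214, 3: 236}
cattle = {
    1: [8, 50, 92, 60, 34],
    2: [0, 0, 4, 12, 24],
    3: [16, 4, 28, 46, 28],
}

summaries = {}
for area, counts in cattle.items():
    n_i = len(counts)
    summaries[area] = {
        "N_i": farms[area],
        "mean": mean(counts),          # stratum sample mean
        "var": variance(counts),       # sample variance s_i^2 (divisor n_i - 1)
        "fpc": 1 - n_i / farms[area],  # 1 - f_i, with f_i = n_i / N_i
    }

for area, s in summaries.items():
    print(area, s["N_i"], round(s["mean"], 1), round(s["var"], 1), round(s["fpc"], 4))
```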
Answers for estimating mean
The mean number of cattle per farm is
estimated as:
$$\bar{x}_{ST} = \frac{1}{N}\sum_{i=1}^{3} N_i \bar{x}_i = 16547.2/636 = 26.02$$
i.e. approximately 26 cattle per farm.
This has variance:
$$\sum_{i=1}^{3} \left(\frac{N_i}{N}\right)^2 \left(1 - \frac{n_i}{N_i}\right) \frac{s_i^2}{n_i} = 25.031$$
Hence its std. error = 5.0
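The arithmetic for the mean and its variance can be checked with a short Python sketch. It starts from the summary statistics computed earlier; the tuple layout and variable names are assumptions made for this illustration.

```python
import math

# Summary statistics per area: (N_i, n_i, stratum mean, s_i^2).
strata = [
    (186, 5, 48.8, 969.2),
    (214, 5, 8.0, 104.0),
    (236, 5, 24.4, 244.8),
]

N = sum(N_i for N_i, _, _, _ in strata)

# Stratified mean: weighted average of stratum means with weights N_i / N.
mean_st = sum(N_i * xbar_i for N_i, _, xbar_i, _ in strata) / N

# Variance: sum over strata of (N_i/N)^2 * (1 - n_i/N_i) * s_i^2 / n_i.
var_st = sum(
    (N_i / N) ** 2 * (1 - n_i / N_i) * s2_i / n_i
    for N_i, n_i, _, s2_i in strata
)
se_st = math.sqrt(var_st)

print(f"mean = {mean_st:.2f}, variance = {var_st:.2f}, std. error = {se_st:.1f}")
```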
Answers for estimating total
• The total number of cattle in the region is
estimated as: $N \bar{x}_{ST} = 636 \times 26.02 = 16547$
This has variance:
$$N^2 \, \widehat{\mathrm{Var}}(\bar{x}_{ST}) = (636)^2 \times 25.031$$
Hence its standard error is
$$636 \times \sqrt{25.031} \approx 636 \times 5.003 = 3181.9$$
Estimating population proportion
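• As with the mean, first find an estimate for the
proportion in the ith stratum, i.e. $p_i = r_i/n_i$,
where $r_i$ is the number of sampled units in
stratum i that have the attribute of interest
• The estimate of the population proportion is
then
$$\frac{1}{N}\sum_{i=1}^{m} N_i p_i$$
with variance
$$\frac{1}{N^2}\sum_{i=1}^{m} N_i^2 \left(1 - \frac{n_i}{N_i}\right) \frac{p_i(1-p_i)}{n_i - 1}$$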
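The proportion estimator and its variance can be sketched in Python as follows. The slides give no worked example for proportions, so the counts used here are hypothetical: they come from applying an assumed threshold of "at least 10 cattle" to the cattle example data. The function name `stratified_proportion` is also an assumption.

```python
import math


def stratified_proportion(stratum_sizes, sample_sizes, successes):
    """Return (estimate, variance) of a population proportion from a stratified sample."""
    N = sum(stratum_sizes)
    estimate = 0.0
    variance = 0.0
    for N_i, n_i, r_i in zip(stratum_sizes, sample_sizes, successes):
        p_i = r_i / n_i  # stratum sample proportion
        estimate += N_i * p_i / N
        variance += (N_i / N) ** 2 * (1 - n_i / N_i) * p_i * (1 - p_i) / (n_i - 1)
    return estimate, variance


# Hypothetical counts r_i: sampled farms with at least 10 cattle in each area.
p_st, var_p_st = stratified_proportion([186, 214, 236], [5, 5, 5], [4, 2, 4])
print(f"proportion = {p_st:.3f}, std. error = {math.sqrt(var_p_st):.3f}")
```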
Practical work follows…