Chapter 5 Stratified Random Sampling

Chapter 5
Stratified Random Sampling
• Advantages of stratified random sampling
• How to select a stratified random sample
• Estimating population mean and total
• Determining sample size, allocation
• Estimating population proportion; sample size and allocation
• Optimal rule for choosing strata

Stratified Random Sampling

The ultimate function of stratification is to organize the population into homogeneous subsets and to select a simple random sample (SRS) of the appropriate size from each stratum.
Stratified Random Sampling

• Often-used option
  – May produce a smaller bound on the error of estimation (BOE) than an SRS of the same size
  – Cost per observation may be reduced
  – Can provide estimates of population parameters for subgroups
• Useful when the population is heterogeneous and it is possible to establish strata that are each reasonably homogeneous
Improved Sampling Designs with Auxiliary Information
• Chapter 5: Stratified Random Sampling
• Chapter 6: Ratio and Regression Estimators
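Stratified Random Sampling: Notation

• $\bar{y}_i$: sample mean of data from stratum $i$, $i = 1, \dots, L$
• $n_i$: sample size for stratum $i$
• $N_i$: number of population elements in stratum $i$, with $N = N_1 + N_2 + \dots + N_L$
• $s_i^2$: sample variance of data from stratum $i$
• $\mu_i$: population mean of stratum $i$
• $\tau_i$: population total of stratum $i$
• Population total: $\tau = \tau_1 + \tau_2 + \dots + \tau_L$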
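Stratified Random Sampling

An SRS is taken within each stratum, so:
$$E(\bar{y}_i) = \mu_i$$
$$\hat{\tau}_i = N_i \bar{y}_i; \quad E(\hat{\tau}_i) = E(N_i \bar{y}_i) = N_i E(\bar{y}_i) = N_i \mu_i = \tau_i$$
Estimate the population total $\tau$ by summing the estimates of $\tau_i$; dividing by $N$ gives the estimate of the mean:
$$\hat{\mu} = \frac{\hat{\tau}_1 + \hat{\tau}_2 + \dots + \hat{\tau}_L}{N} = \bar{y}_{st}$$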
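Stratified Random Sampling: Estimate of Mean

$$\bar{y}_{st} = \frac{1}{N}\left[N_1\bar{y}_1 + N_2\bar{y}_2 + \dots + N_L\bar{y}_L\right] = \frac{1}{N}\sum_{i=1}^{L} N_i\bar{y}_i$$

$$\hat{V}(\bar{y}_{st}) = \frac{1}{N^2}\left[N_1^2\hat{V}(\bar{y}_1) + N_2^2\hat{V}(\bar{y}_2) + \dots + N_L^2\hat{V}(\bar{y}_L)\right]$$

$$= \frac{1}{N^2}\left[N_1^2\left(1-\frac{n_1}{N_1}\right)\frac{s_1^2}{n_1} + N_2^2\left(1-\frac{n_2}{N_2}\right)\frac{s_2^2}{n_2} + \dots + N_L^2\left(1-\frac{n_L}{N_L}\right)\frac{s_L^2}{n_L}\right]$$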
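The two estimators above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the stratum sizes, sample sizes, and standard deviations are the ones quoted in the worksheet summary later in this chapter, while the stratum means in `ybar` are hypothetical values chosen only to make the example run.

```python
def stratified_mean(N, ybar):
    """y_st = (1/N) * sum(N_i * ybar_i), with N = sum(N_i)."""
    total_N = sum(N)
    return sum(Ni * yi for Ni, yi in zip(N, ybar)) / total_N


def stratified_mean_variance(N, n, s):
    """V_hat(y_st) = (1/N^2) * sum(N_i^2 * (1 - n_i/N_i) * s_i^2 / n_i)."""
    total_N = sum(N)
    terms = (Ni**2 * (1 - ni / Ni) * si**2 / ni for Ni, ni, si in zip(N, n, s))
    return sum(terms) / total_N**2


N = [155, 62, 93]          # stratum sizes (worksheet summary)
n = [20, 8, 12]            # stratum sample sizes (worksheet summary)
s = [5.95, 15.25, 9.36]    # stratum sample standard deviations (worksheet)
ybar = [30.0, 25.0, 20.0]  # HYPOTHETICAL stratum means, for illustration only

y_st = stratified_mean(N, ybar)
v_st = stratified_mean_variance(N, n, s)
print(f"y_st = {y_st:.2f}, V_hat(y_st) = {v_st:.2f}")
```

Because the means are made up, `y_st` will not match the worksheet's 27.7; the variance depends only on the worksheet quantities and should agree with the 1.97 reported there.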
Stratified Random Sampling: Estimate of Mean $\mu$, BOE

$$\hat{V}(\bar{y}_{st}) = \frac{1}{N^2}\left[N_1^2\hat{V}(\bar{y}_1) + N_2^2\hat{V}(\bar{y}_2) + \dots + N_L^2\hat{V}(\bar{y}_L)\right]$$

$$= \frac{1}{N^2}\left[N_1^2\left(1-\frac{n_1}{N_1}\right)\frac{s_1^2}{n_1} + N_2^2\left(1-\frac{n_2}{N_2}\right)\frac{s_2^2}{n_2} + \dots + N_L^2\left(1-\frac{n_L}{N_L}\right)\frac{s_L^2}{n_L}\right]$$

BOE:
$$1.96\sqrt{\frac{1}{N^2}\left[N_1^2\left(1-\frac{n_1}{N_1}\right)\frac{s_1^2}{n_1} + N_2^2\left(1-\frac{n_2}{N_2}\right)\frac{s_2^2}{n_2} + \dots + N_L^2\left(1-\frac{n_L}{N_L}\right)\frac{s_L^2}{n_L}\right]}$$
Stratified Random Sampling: Estimate of Population Total

$$N\bar{y}_{st} = N_1\bar{y}_1 + N_2\bar{y}_2 + \dots + N_L\bar{y}_L = \sum_{i=1}^{L} N_i\bar{y}_i$$

$$\hat{V}(N\bar{y}_{st}) = N^2\hat{V}(\bar{y}_{st}) = N_1^2\hat{V}(\bar{y}_1) + N_2^2\hat{V}(\bar{y}_2) + \dots + N_L^2\hat{V}(\bar{y}_L)$$

$$= N_1^2\left(1-\frac{n_1}{N_1}\right)\frac{s_1^2}{n_1} + N_2^2\left(1-\frac{n_2}{N_2}\right)\frac{s_2^2}{n_2} + \dots + N_L^2\left(1-\frac{n_L}{N_L}\right)\frac{s_L^2}{n_L}$$
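As a rough check of the total estimator, the sketch below applies it to the worksheet summary quoted later in this chapter ($N_i$, $n_i$, $s_i$, and $\bar{y}_{st} = 27.7$). The function name is an illustrative choice, and the 1.96 multiplier simply mirrors the one used for the mean; neither comes from the slides' worksheet.

```python
from math import sqrt


def total_variance(N, n, s):
    """V_hat(N * y_st) = sum(N_i^2 * (1 - n_i/N_i) * s_i^2 / n_i)."""
    return sum(Ni**2 * (1 - ni / Ni) * si**2 / ni for Ni, ni, si in zip(N, n, s))


N = [155, 62, 93]        # stratum sizes (worksheet summary)
n = [20, 8, 12]          # stratum sample sizes (worksheet summary)
s = [5.95, 15.25, 9.36]  # stratum sample standard deviations (worksheet)
y_st = 27.7              # stratified mean reported in the worksheet summary

tau_hat = sum(N) * y_st           # estimated population total, N * y_st
v_tau = total_variance(N, n, s)   # estimated variance of the total
boe_tau = 1.96 * sqrt(v_tau)      # normal-based bound (assumed multiplier)
print(f"tau_hat = {tau_hat:.0f}, V_hat = {v_tau:.1f}, BOE = {boe_tau:.1f}")
```

Note that `v_tau` equals $N^2$ times the variance of the mean, so dividing it by $310^2$ recovers the 1.97 reported for $\bar{y}_{st}$.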
Stratified Random Sampling: BOE for Mean $\mu$ and Total $\tau$, t distribution

When the stratum sample sizes are small, the t distribution can be used.

$$\text{Satterthwaite } df = \frac{\left(\sum_{k=1}^{L} a_k s_k^2\right)^2}{\sum_{k=1}^{L} \dfrac{\left(a_k s_k^2\right)^2}{n_k - 1}}, \quad \text{where } a_k = \frac{N_k(N_k - n_k)}{n_k}$$

BOE for $\mu$:
$$t_{df}\sqrt{\frac{1}{N^2}\left[N_1^2\left(1-\frac{n_1}{N_1}\right)\frac{s_1^2}{n_1} + N_2^2\left(1-\frac{n_2}{N_2}\right)\frac{s_2^2}{n_2} + \dots + N_L^2\left(1-\frac{n_L}{N_L}\right)\frac{s_L^2}{n_L}\right]}$$

BOE for $\tau$:
$$t_{df}\sqrt{N_1^2\left(1-\frac{n_1}{N_1}\right)\frac{s_1^2}{n_1} + N_2^2\left(1-\frac{n_2}{N_2}\right)\frac{s_2^2}{n_2} + \dots + N_L^2\left(1-\frac{n_L}{N_L}\right)\frac{s_L^2}{n_L}}$$
Degrees of Freedom (worksheet cont.)

Stratified Random Sample Summary:
$$a_k = \frac{N_k(N_k - n_k)}{n_k}; \quad n_1 = 20,\ n_2 = 8,\ n_3 = 12$$
$$N_1 = 155,\ N_2 = 62,\ N_3 = 93; \quad a_1 = 1046.25,\ a_2 = 418.5,\ a_3 = 627.75$$

$$df = \frac{\left(1046.25 \cdot 5.95^2 + 418.5 \cdot 15.25^2 + 627.75 \cdot 9.36^2\right)^2}{\dfrac{\left(1046.25 \cdot 5.95^2\right)^2}{19} + \dfrac{\left(418.5 \cdot 15.25^2\right)^2}{7} + \dfrac{\left(627.75 \cdot 9.36^2\right)^2}{11}} = 21.09; \quad t_{21.09} = 2.08$$

(see Excel worksheet)
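The degrees-of-freedom calculation above can be reproduced with a short Python sketch. The function name is an illustrative choice; the inputs are the worksheet values shown on this slide. The standard library has no t quantile function, so the sketch takes the critical value 2.08 from the slide rather than computing it.

```python
from math import sqrt


def satterthwaite_df(N, n, s):
    """df = (sum a_k s_k^2)^2 / sum((a_k s_k^2)^2 / (n_k - 1)),
    with a_k = N_k * (N_k - n_k) / n_k."""
    a = [Nk * (Nk - nk) / nk for Nk, nk in zip(N, n)]
    w = [ak * sk**2 for ak, sk in zip(a, s)]
    return sum(w) ** 2 / sum(wk**2 / (nk - 1) for wk, nk in zip(w, n))


N = [155, 62, 93]        # stratum sizes
n = [20, 8, 12]          # stratum sample sizes
s = [5.95, 15.25, 9.36]  # stratum sample standard deviations

a = [Nk * (Nk - nk) / nk for Nk, nk in zip(N, n)]  # 1046.25, 418.5, 627.75
df = satterthwaite_df(N, n, s)

# BOE for the mean: t_df * sqrt((1/N^2) * sum(a_k * s_k^2)).
t_crit = 2.08  # critical value quoted on the slide for df = 21.09
v_mean = sum(ak * sk**2 for ak, sk in zip(a, s)) / sum(N) ** 2
boe_mean = t_crit * sqrt(v_mean)
print(f"df = {df:.2f}, BOE for the mean = {boe_mean:.2f}")
```

The weights $a_k s_k^2$ are exactly the terms $N_k^2(1 - n_k/N_k)s_k^2/n_k$ of the variance formula, which is why the same list serves both the degrees of freedom and the bound.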
Compare BOE in Stratified Random Sample and SRS (worksheet cont.)

Stratified Random Sample Summary:
$$n = 40, \quad \bar{y}_{st} = 27.7; \quad \hat{V}(\bar{y}_{st}) = 1.97$$

If the observations were from an SRS:
$$s = 11.31, \quad \hat{V}(\bar{y}) = \left(1 - \frac{40}{310}\right)\frac{11.31^2}{40} = 2.79$$

The stratified random sample has more precision.
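The comparison can be checked numerically. The sketch below recomputes the SRS variance from the values on the slide and compares the two bounds; using $2\sqrt{\hat{V}}$ as the bound is an assumption that follows the convention of the sample-size slides, and the function name is illustrative.

```python
from math import sqrt


def srs_mean_variance(N, n, s):
    """V_hat(ybar) for an SRS: (1 - n/N) * s^2 / n."""
    return (1 - n / N) * s**2 / n


v_srs = srs_mean_variance(N=310, n=40, s=11.31)  # slide reports 2.79
v_st = 1.97                                      # stratified variance from the slide

boe_srs = 2 * sqrt(v_srs)  # approximate 95% bound under SRS
boe_st = 2 * sqrt(v_st)    # approximate 95% bound under stratification
print(f"V_srs = {v_srs:.2f}, BOE_srs = {boe_srs:.2f}, BOE_st = {boe_st:.2f}")
```

With the same 40 observations, the stratified design gives the smaller bound, which is the sense in which it "has more precision".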
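Approx. Sample Size to Estimate $\mu$

$$2\sqrt{V(\bar{y}_{st})} = B \;\Rightarrow\; V(\bar{y}_{st}) = \frac{B^2}{4}$$

Let $n_i = a_i n$, where $a_i$ = proportion of the sample from stratum $i$:
$$\frac{1}{N^2}\sum_{i=1}^{L} N_i^2\left(1 - \frac{a_i n}{N_i}\right)\frac{s_i^2}{a_i n} = \frac{B^2}{4}$$

Expanding each term as $\frac{N_i^2 s_i^2}{a_i n} - N_i s_i^2$ and solving for $n$:
$$n = \frac{\sum_{i=1}^{L} N_i^2 s_i^2 / a_i}{N^2 D + \sum_{i=1}^{L} N_i s_i^2}, \quad \text{where } D = \frac{B^2}{4}$$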
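Approx. Sample Size to Estimate $\tau$

$$2\sqrt{V(N\bar{y}_{st})} = B \;\Rightarrow\; V(\bar{y}_{st}) = \frac{B^2}{4N^2}$$

Let $n_i = a_i n$, where $a_i$ = proportion of the sample from stratum $i$:
$$\frac{1}{N^2}\sum_{i=1}^{L} N_i^2\left(1 - \frac{a_i n}{N_i}\right)\frac{s_i^2}{a_i n} = \frac{B^2}{4N^2}$$

Solving for $n$:
$$n = \frac{\sum_{i=1}^{L} N_i^2 s_i^2 / a_i}{N^2 D + \sum_{i=1}^{L} N_i s_i^2}, \quad \text{where } D = \frac{B^2}{4N^2}$$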
Summary: Approx. Sample Size to Estimate $\mu$, $\tau$

$$n = \frac{\sum_{i=1}^{L} N_i^2 s_i^2 / a_i}{N^2 D + \sum_{i=1}^{L} N_i s_i^2}$$

• $D = \dfrac{B^2}{4}$ when estimating $\mu$
• $D = \dfrac{B^2}{4N^2}$ when estimating $\tau$
Example: Sample Size to Estimate $\mu$ (worksheet cont.)

$$n = \frac{\sum_{i=1}^{L} N_i^2 s_i^2 / a_i}{N^2 D + \sum_{i=1}^{L} N_i s_i^2}, \quad D = \frac{B^2}{4}$$

Prior survey: $\sigma_1 \approx 5$, $\sigma_2 \approx 15$, $\sigma_3 \approx 10$.
Estimate $\mu$ to within 2 hrs with 95% conf.; the allocation proportions are $a_1 = a_2 = a_3 = 1/3$.

$$B = 2; \quad D = \frac{B^2}{4} = 1; \quad N^2 D = 310^2 = 96{,}100$$

$$\sum_{i=1}^{3} \frac{N_i^2 s_i^2}{a_i} = \frac{155^2(25)}{1/3} + \frac{62^2(225)}{1/3} + \frac{93^2(100)}{1/3} = 6{,}991{,}275$$

$$\sum_{i=1}^{3} N_i s_i^2 = 155(25) + 62(225) + 93(100) = 27{,}125$$

$$n = \frac{6{,}991{,}275}{96{,}100 + 27{,}125} = 56.7 \approx 57$$

so $n_1 = n_2 = n_3 = \frac{1}{3}(57) = 19$.
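Example: Sample Size to Estimate $\tau$ (worksheet cont.)

$$n = \frac{\sum_{i=1}^{L} N_i^2 s_i^2 / a_i}{N^2 D + \sum_{i=1}^{L} N_i s_i^2}, \quad D = \frac{B^2}{4N^2}$$

Prior survey: $\sigma_1 \approx 5$, $\sigma_2 \approx 15$, $\sigma_3 \approx 10$.
Estimate $\tau$ to within 400 hrs with 95% conf.; the allocation proportions are $a_1 = a_2 = a_3 = 1/3$.

$$D = \frac{B^2}{4N^2} = \frac{400^2}{4N^2} = \frac{160{,}000}{4N^2} = \frac{40{,}000}{N^2}; \quad N^2 D = 40{,}000$$

$$\sum_{i=1}^{3} \frac{N_i^2 s_i^2}{a_i} = \frac{155^2(25)}{1/3} + \frac{62^2(225)}{1/3} + \frac{93^2(100)}{1/3} = 6{,}991{,}275$$

$$\sum_{i=1}^{3} N_i s_i^2 = 155(25) + 62(225) + 93(100) = 27{,}125$$

$$n = \frac{6{,}991{,}275}{40{,}000 + 27{,}125} = 104.2, \text{ rounded up to } 105$$

so $n_1 = n_2 = n_3 = \frac{1}{3}(105) = 35$.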
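Both worked examples can be reproduced with one small function. This is a sketch under the slides' own formula, with $D = B^2/4$ for the mean and $D = B^2/(4N^2)$ for the total; the function name and the `estimate` switch are illustrative choices, not part of the worksheet.

```python
from math import ceil


def approx_sample_size(N, sigma, a, B, estimate="mean"):
    """n = sum(N_i^2 s_i^2 / a_i) / (N^2 D + sum(N_i s_i^2))."""
    total_N = sum(N)
    if estimate == "mean":
        D = B**2 / 4
    elif estimate == "total":
        D = B**2 / (4 * total_N**2)
    else:
        raise ValueError("estimate must be 'mean' or 'total'")
    numerator = sum(Ni**2 * si**2 / ai for Ni, si, ai in zip(N, sigma, a))
    denominator = total_N**2 * D + sum(Ni * si**2 for Ni, si in zip(N, sigma))
    return numerator / denominator


N = [155, 62, 93]          # stratum sizes
sigma = [5, 15, 10]        # prior-survey standard deviations
a = [1 / 3, 1 / 3, 1 / 3]  # equal allocation proportions

n_mean = approx_sample_size(N, sigma, a, B=2, estimate="mean")
n_total = approx_sample_size(N, sigma, a, B=400, estimate="total")

# Round up so the bound is met, then split according to the proportions.
print(ceil(n_mean), [ceil(n_mean) * ai for ai in a])
print(ceil(n_total), [ceil(n_total) * ai for ai in a])
```

Rounding up rather than to the nearest integer is deliberate: 104.2 becomes 105, since a sample of 104 would fall slightly short of the required bound.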
5.5 Allocation of the Sample
Objective: obtain estimators with small variance at the lowest cost.
Allocation is affected by 3 factors:

1. Total number of elements in each stratum
2. Variability in each stratum
3. Cost per observation in each stratum
5.5 Allocation of the Sample: Proportional Allocation

If variability and cost information for the strata are not available, proportional allocation can be used.
Sample size for stratum $h$:
$$n_h = n\,\frac{N_h}{N}$$
In general this is not the optimum choice for the stratum sample sizes.
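Proportional allocation is a one-line calculation. The sketch below applies it to the stratum sizes from the worksheet summary with a total sample of 40; the function name is illustrative.

```python
def proportional_allocation(n, N):
    """n_h = n * N_h / N for each stratum h."""
    total_N = sum(N)
    return [n * Nh / total_N for Nh in N]


N = [155, 62, 93]  # stratum sizes (worksheet summary)
alloc = proportional_allocation(40, N)
print(alloc)
```

For these stratum sizes the result is 20, 8, and 12, which coincides with the stratum sample sizes in the worksheet summary. In general the values are not whole numbers and need rounding.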
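5.5 Optimal {min $V(\bar{y}_{st})$} allocation of the sample: same cost/obs in each stratum

$$\hat{V}(\bar{y}_{st}) = \frac{1}{N^2}\sum_{i=1}^{L} N_i^2\left(1 - \frac{n_i}{N_i}\right)\frac{s_i^2}{n_i}$$

$$\min_{n_1, n_2, \dots, n_L} \hat{V}(\bar{y}_{st}), \text{ subject to } g(n_1, n_2, \dots, n_L) = 0,$$
$$\text{where } g(n_1, n_2, \dots, n_L) = n_1 + n_2 + \dots + n_L - n$$

Use Lagrange multipliers:
$$\frac{\partial \hat{V}(\bar{y}_{st})}{\partial n_i} + \lambda\frac{\partial g}{\partial n_i} = 0, \quad i = 1, \dots, L$$

$$\Rightarrow\; n_i = n\,\frac{N_i s_i}{\sum_{k=1}^{L} N_k s_k}, \quad i = 1, \dots, L$$

• $n_i$ is directly proportional to stratum size and stratum variability
• This method of choosing $n_1, n_2, \dots, n_L$ is called Neyman allocation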
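A minimal sketch of Neyman allocation follows. It uses the stratum sizes and prior-survey standard deviations from the sample-size example ($\sigma_i \approx 5, 15, 10$) and, as an assumption for illustration, the total sample size of 40 from the worksheet summary; the function name is illustrative.

```python
def neyman_allocation(n, N, s):
    """n_i = n * N_i s_i / sum(N_k s_k)."""
    weights = [Ni * si for Ni, si in zip(N, s)]
    total = sum(weights)
    return [n * w / total for w in weights]


N = [155, 62, 93]  # stratum sizes
s = [5, 15, 10]    # prior-survey standard deviations

alloc = neyman_allocation(40, N, s)
print([round(x, 2) for x in alloc])
```

Stratum 1 is the largest but the least variable, so it receives fewer observations here than under proportional allocation (about 11.8 instead of 20), while strata 2 and 3 have the same product $N_i s_i = 930$ and receive equal shares.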
5.5 Optimal {min $V(\bar{y}_{st})$} allocation of the sample: same cost/obs in each stratum

From the previous slide:
$$n_i = n\,\frac{N_i s_i}{\sum_{k=1}^{L} N_k s_k}, \quad i = 1, \dots, L$$

Substituting $a_i = \dfrac{n_i}{n}$ into
$$n = \frac{\sum_{i=1}^{L} N_i^2 s_i^2 / a_i}{N^2 D + \sum_{i=1}^{L} N_i s_i^2}$$
gives
$$n = \frac{\left(\sum_{i=1}^{L} N_i s_i\right)^2}{N^2 D + \sum_{i=1}^{L} N_i s_i^2}$$
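The closed form for $n$ under Neyman allocation can be checked against the general sample-size formula. The sketch below does this for the earlier example of estimating $\mu$ to within $B = 2$ (so $D = 1$), using the prior-survey standard deviations; the function name is illustrative and the numbers are not taken from Worksheet 11.

```python
from math import ceil, isclose


def neyman_sample_size(N, s, D):
    """n = (sum N_i s_i)^2 / (N^2 D + sum N_i s_i^2)."""
    total_N = sum(N)
    numerator = sum(Ni * si for Ni, si in zip(N, s)) ** 2
    denominator = total_N**2 * D + sum(Ni * si**2 for Ni, si in zip(N, s))
    return numerator / denominator


N = [155, 62, 93]  # stratum sizes
s = [5, 15, 10]    # prior-survey standard deviations
D = 2**2 / 4       # B = 2 when estimating the mean

n_neyman = neyman_sample_size(N, s, D)

# Cross-check: plug the Neyman proportions a_i into the general formula.
weights = [Ni * si for Ni, si in zip(N, s)]
a = [w / sum(weights) for w in weights]
n_general = sum(Ni**2 * si**2 / ai for Ni, si, ai in zip(N, s, a)) / (
    sum(N) ** 2 * D + sum(Ni * si**2 for Ni, si in zip(N, s))
)
print(f"n = {n_neyman:.2f}, rounded up to {ceil(n_neyman)}")
```

The two routes agree because $\sum N_i^2 s_i^2 / a_i$ collapses to $(\sum N_i s_i)^2$ once $a_i = N_i s_i / \sum N_k s_k$ is substituted.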
5.5 Optimal {min $V(\bar{y}_{st})$} allocation of the sample: same cost/obs in each stratum

Worksheet 11
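5.5 Optimal {min $V(\bar{y}_{st})$} allocation of the sample for fixed cost $C$: $c_i$ = cost/obs in stratum $i$

$$\hat{V}(\bar{y}_{st}) = \frac{1}{N^2}\sum_{i=1}^{L} N_i^2\left(1 - \frac{n_i}{N_i}\right)\frac{s_i^2}{n_i}$$

$$\min_{n_1, n_2, \dots, n_L} \hat{V}(\bar{y}_{st}), \text{ subject to } g(n_1, n_2, \dots, n_L) = 0,$$
$$\text{where } g(n_1, n_2, \dots, n_L) = c_1 n_1 + c_2 n_2 + \dots + c_L n_L - C$$

Use Lagrange multipliers:
$$\frac{\partial \hat{V}(\bar{y}_{st})}{\partial n_i} + \lambda\frac{\partial g}{\partial n_i} = 0, \quad i = 1, \dots, L$$

$$\Rightarrow\; n_i = n\,\frac{N_i s_i / \sqrt{c_i}}{\sum_{k=1}^{L} N_k s_k / \sqrt{c_k}}, \quad i = 1, \dots, L$$

• $n_i$ is directly proportional to stratum size and stratum variability
• $n_i$ is inversely proportional to the square root of the stratum cost/obs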
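A sketch of the cost-weighted allocation follows. The stratum sizes and standard deviations come from the earlier examples, but the per-observation costs in `c` are hypothetical values invented for illustration, as are the function name and the total sample size of 40.

```python
from math import sqrt


def cost_optimal_allocation(n, N, s, c):
    """n_i = n * (N_i s_i / sqrt(c_i)) / sum(N_k s_k / sqrt(c_k))."""
    weights = [Ni * si / sqrt(ci) for Ni, si, ci in zip(N, s, c)]
    total = sum(weights)
    return [n * w / total for w in weights]


N = [155, 62, 93]  # stratum sizes
s = [5, 15, 10]    # prior-survey standard deviations
c = [9, 9, 16]     # HYPOTHETICAL costs per observation, for illustration only

alloc = cost_optimal_allocation(40, N, s, c)
equal_cost = cost_optimal_allocation(40, N, s, [1, 1, 1])
print([round(x, 2) for x in alloc])
print([round(x, 2) for x in equal_cost])
```

With equal costs the weights reduce to $N_i s_i$, so the second result is the Neyman allocation. Raising the cost in stratum 3 shifts observations away from it and toward the cheaper strata.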
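5.5 Optimal {min $V(\bar{y}_{st})$} allocation of the sample: $c_i$ = cost/obs in stratum $i$

From the previous slide:
$$n_i = n\,\frac{N_i s_i / \sqrt{c_i}}{\sum_{k=1}^{L} N_k s_k / \sqrt{c_k}}, \quad i = 1, \dots, L$$

Substituting $a_i = \dfrac{n_i}{n}$ into
$$n = \frac{\sum_{i=1}^{L} N_i^2 s_i^2 / a_i}{N^2 D + \sum_{i=1}^{L} N_i s_i^2}$$
gives
$$n = \frac{\left(\sum_{k=1}^{L} N_k s_k / \sqrt{c_k}\right)\left(\sum_{i=1}^{L} N_i s_i \sqrt{c_i}\right)}{N^2 D + \sum_{i=1}^{L} N_i s_i^2}$$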
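The cost-weighted sample size can be sketched the same way. The costs in `c` are hypothetical, chosen only for illustration, and $D = 1$ corresponds to estimating $\mu$ to within $B = 2$ as in the earlier example; none of these numbers are taken from Worksheet 12.

```python
from math import ceil, sqrt


def cost_optimal_sample_size(N, s, c, D):
    """n = (sum N_k s_k / sqrt(c_k)) * (sum N_i s_i sqrt(c_i))
           / (N^2 D + sum N_i s_i^2)."""
    total_N = sum(N)
    over_cost = sum(Ni * si / sqrt(ci) for Ni, si, ci in zip(N, s, c))
    times_cost = sum(Ni * si * sqrt(ci) for Ni, si, ci in zip(N, s, c))
    denominator = total_N**2 * D + sum(Ni * si**2 for Ni, si in zip(N, s))
    return over_cost * times_cost / denominator


N = [155, 62, 93]  # stratum sizes
s = [5, 15, 10]    # prior-survey standard deviations
c = [9, 9, 16]     # HYPOTHETICAL costs per observation, for illustration only
D = 2**2 / 4       # B = 2 when estimating the mean

n_cost = cost_optimal_sample_size(N, s, c, D)
n_equal = cost_optimal_sample_size(N, s, [4, 4, 4], D)
print(f"unequal costs: {ceil(n_cost)}, equal costs: {ceil(n_equal)}")
```

When all costs are equal, the $\sqrt{c}$ factors cancel and the result matches the Neyman sample size. With unequal costs the required $n$ is somewhat larger, because observations move toward cheaper strata rather than toward the variance-minimizing split.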
5.5 Optimal {min $V(\bar{y}_{st})$} allocation of the sample: $c_i$ = cost/obs in each stratum

Worksheet 12