Class exercise (1)
NOTES
This classroom exercise concerns numerical illustrations of simple random, clustered (multi-stage)
and stratified sampling. It is recommended that the students work in pairs.
To begin with, from a given population we will select simple random samples of elements of
various sizes to illustrate how the distribution of sample means varies with sample size. We will
also illustrate how the variability among elements in any given sample estimates the population
variance.
1 The population
(1) In order to facilitate quick selection of many random samples, we will employ a simple, well-mixed population with known characteristics in which units with different values appear in an entirely random order, so that any arbitrary set of units can be regarded as a random sample.
Our example is artificial in two respects:
- We actually know the entire population (the Yj values vary in the range 0 to 9).
- The population is thoroughly mixed, at least in relation to values of the variable of interest, i.e. the Yj values appear in an entirely random order.
(2) Theoretically, the population parameters are:
\bar{Y} = \frac{1}{10}\sum_{j=1}^{10} Y_j = 4.500; \qquad \sigma^2 = \frac{1}{10}\sum_{j=1}^{10}\left(Y_j - 4.5\right)^2 = 8.25; \qquad \sigma = 2.87.
Our illustrative population is shown in tab “40x40 random digits 0-9”. Let us assume that these
digits represent values of some variable Y, the average value of which we are interested in
estimating. Our table of 1,600 digits is of limited size and not perfect. It compares as follows with
the theoretical values above (see tab “frequency distribution”):
Comparison of frequencies in the 1,600-digit table with the expected frequencies

Frequency distribution of digits
digit:         0    1    2    3    4    5    6    7    8    9  |   ȳ      σ²     S²
theoretical  160  160  160  160  160  160  160  160  160  160  | 4.500  8.250  8.250
actual       151  162  161  144  176  169  169  170  154  144  | 4.498  7.940  7.945
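Such a comparison is easy to reproduce; the minimal Python sketch below uses a random stand-in for the 1,600 digits (the actual tab is not reproduced here) and computes the frequency distribution, the mean, σ² and S²:

import numpy as np

rng = np.random.default_rng(0)
digits = rng.integers(0, 10, size=1600)   # stand-in for the 1,600 digits in the "40x40" tab

freq = np.bincount(digits, minlength=10)  # actual frequency of each digit 0-9
mean = digits.mean()                      # compare with the theoretical 4.500
sigma2 = digits.var(ddof=0)               # sigma^2: divisor N
S2 = digits.var(ddof=1)                   # S^2: divisor N - 1
print(freq, mean, sigma2, S2)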
2 Simple random sampling
For illustrative purposes, we construct six sets of simple random samples:
      Sample “design”              Sample size        Number of samples
(1r)  one quarter of each row      n = (1x10) = 10    160
(1c)  one quarter of each column   n = (10x1) = 10    160
(2)   each (4x4) square            n = (4x4) = 16     100
(3r)  each whole row               n = (1x40) = 40    40
(3c)  each whole column            n = (40x1) = 40    40
(4)   each (10x10) square          n = (10x10) = 100  16
In each case, the above-mentioned samples amount to a very small subset of all possible samples of the given size that can be drawn from the population. Many more can easily be created.
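As an illustration of design (1r), the sketch below (using a random stand-in for the 40x40 digit table, since the actual tab is not reproduced here) splits each row into four quarters of 10 digits and summarizes the resulting 160 sample means:

import numpy as np

rng = np.random.default_rng(0)
table = rng.integers(0, 10, size=(40, 40))   # stand-in for the "40x40 random digits 0-9" tab

# Design (1r): each quarter of a row is one sample of n = 10, giving 4 x 40 = 160 samples.
samples = table.reshape(40, 4, 10)           # (row, quarter, 10 digits)
means = samples.mean(axis=2).ravel()         # the 160 sample means

print("mean =", means.mean())
print("var  =", means.var(ddof=0))           # variance of this sampling distribution
print("StDev=", means.std(ddof=0))
print("cv   =", means.std(ddof=0) / means.mean())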
3 Clusters with different degrees of homogeneity
Here we have considered the 100 square clusters of size (4x4).
Clusters with five different degrees of homogeneity have been formed:
(1) Entirely random clusters
(2) For each set of 4 columns of digits, the first column is sorted by increasing value of Y. Such sorting is applied to the left half of the table of (40x40) digits (see the sketch after this list). The (4x4) clusters are formed in the normal way, but using the set of digits sorted as above.
(3) As above, except that the sorting is applied to the whole table.
(4) The sorting is applied to the first two columns of each set of 4 columns of digits in the left half
of the table.
(5) The above is applied to the whole table.
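A minimal sketch of scheme (2), assuming the table is available as a 40x40 NumPy array (a random stand-in is generated here; the helper names are illustrative):

import numpy as np

rng = np.random.default_rng(1)
table = rng.integers(0, 10, size=(40, 40))    # stand-in for the 40x40 digit table

def scheme2(tab):
    # In the left half of the table only, sort the first column of every
    # set of 4 columns in increasing order of Y.
    t = tab.copy()
    for c in range(0, 20, 4):                 # column sets 0-3, 4-7, ..., 16-19
        t[:, c] = np.sort(t[:, c])
    return t

def clusters_4x4(tab):
    # Cut the 40x40 table into its 100 non-overlapping (4x4) square clusters.
    return [tab[r:r+4, c:c+4] for r in range(0, 40, 4) for c in range(0, 40, 4)]

cluster_means = [c.mean() for c in clusters_4x4(scheme2(table))]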
If a simple random sample of a clusters (each of size n) were selected from a population of A
clusters, the variance of the mean for this clustered sample would be
Var(\bar{y}_A) = \frac{A - a}{A}\cdot\frac{S_A^2}{a} = (1 - f)\cdot\frac{S_A^2}{a}, \quad f = \frac{a}{A}, \quad \text{with } S_A^2 = \frac{\sum_k\left(\bar{Y}_k - \bar{Y}\right)^2}{A - 1}.
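A sketch of this calculation, assuming the A cluster means are available as an array (the function name is illustrative):

import numpy as np

def var_clustered_mean(cluster_means, a):
    # Var(ybar_A) for a simple random sample of a clusters out of the A clusters
    # whose means are supplied (all clusters of the same size n).
    cluster_means = np.asarray(cluster_means, dtype=float)
    A = len(cluster_means)
    S_A2 = cluster_means.var(ddof=1)   # sum_k (Ybar_k - Ybar)^2 / (A - 1)
    f = a / A
    return (1 - f) * S_A2 / a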
When the clusters are formed by entirely random grouping as in (1), the variance would be identical
to that for a simple random sample of elements of the same size, i.e. of (a.n) elements:
Var_0(\bar{y}) = \frac{N - a\cdot n}{N}\cdot\frac{S^2}{a\cdot n} = \frac{A - a}{A}\cdot\frac{S^2}{a\cdot n}, \quad \text{giving} \quad S_{1A}^2 \cong \frac{S^2}{n}, \quad \text{or} \quad Var_1(\bar{y}_A) \cong Var_0(\bar{y}).
In schemes (2)-(5), clusters correspond to increasingly homogeneous groupings of elements.
Greater homogeneity within clusters implies greater variability between clusters. Hence the cluster
means in populations (2)-(5) are increasingly more diverse compared to cluster means in (1).
S_{5A}^2 \ge S_{4A}^2 \ge S_{3A}^2 \ge S_{2A}^2 \ge S_{1A}^2 \cong \frac{S^2}{n}.

Generally, with n \gg 1, S_{iA}^2 \ll S^2.

Var_k(\bar{y}_A) = \frac{A - a}{A}\cdot\frac{S_{kA}^2}{a}, which gives Var_k(\bar{y}_A) > Var_{k-1}(\bar{y}_A), \; k = 1, \dots, 5.

deft_k^2 = \frac{Var_k(\bar{y}_A)}{Var_0(\bar{y})} = \frac{S_{kA}^2}{S^2/n}, \; k = 1, \dots, 5.

deft_{5A}^2 \ge deft_{4A}^2 \ge deft_{3A}^2 \ge deft_{2A}^2 \ge deft_{1A}^2 \cong 1.
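The design effect can be computed directly from these quantities; a minimal sketch (function name illustrative) comparing a sample of a clusters of size n with an SRS of a·n elements:

import numpy as np

def deft2(cluster_means, a, n, S2, N):
    # deft^2 = Var_k(ybar_A) / Var_0(ybar) for a sample of a clusters of size n,
    # against an SRS of a*n elements from the N population elements.
    cluster_means = np.asarray(cluster_means, dtype=float)
    A = len(cluster_means)
    S_kA2 = cluster_means.var(ddof=1)
    var_clustered = (1 - a / A) * S_kA2 / a
    var_srs = (1 - a * n / N) * S2 / (a * n)
    return var_clustered / var_srs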
4 Stratification
If sample selection and estimation are done separately within each stratum, the same basic expressions given above apply to each stratum. Using subscript h to refer to a particular stratum, we have, with SRS within strata:
Var(\bar{y}_h) = (1 - f_h)\cdot\frac{S_h^2}{n_h}, \quad \text{with} \quad S_h^2 = \frac{\sum_j\left(Y_{hj} - \bar{Y}_h\right)^2}{N_h - 1},
summed over Nh units in the stratum h.
In putting together the results from different strata, we often weight them in proportion to stratum size, i.e. W_h = N_h/N. For the total population \bar{Y} = \sum_h W_h\cdot\bar{Y}_h, and if the W_h are known, \bar{y} = \sum_h W_h\cdot\bar{y}_h and Var(\bar{y}) = \sum_h W_h^2\cdot Var(\bar{y}_h).
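A sketch of this stratified estimator, assuming one simple random sample per stratum and known stratum sizes (function and argument names are illustrative):

import numpy as np

def stratified_estimate(strata_samples, stratum_sizes):
    # ybar_st = sum_h W_h * ybar_h and var(ybar_st) = sum_h W_h^2 * (1 - f_h) * s_h^2 / n_h
    N = sum(stratum_sizes)
    ybar, var = 0.0, 0.0
    for y_h, N_h in zip(strata_samples, stratum_sizes):
        y_h = np.asarray(y_h, dtype=float)
        n_h, W_h, f_h = len(y_h), N_h / N, len(y_h) / N_h
        ybar += W_h * y_h.mean()
        var += W_h**2 * (1 - f_h) * y_h.var(ddof=1) / n_h
    return ybar, var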
For simplicity in our illustrations, we consider the population as divided into H strata equal in
population as well as in sample size (Wh = Nh/N = nh/n = 1/H). With this, it follows from the above
expression that with SRS within each stratum, variance can be written as
Var y  E 
1  f   h S h2 
.
.
n  H 
Comparing this with unstratified SRS, the effect of stratification is in proportion to the ratio of the average within-stratum variance \bar{S}^2 = \sum_h S_h^2/H to the unstratified value S^2.
We may decompose the total variance into variation within strata and variation among the strata means:

\sum_h\sum_j\left(Y_{hj} - \bar{Y}\right)^2 = \sum_h\sum_j\left(Y_{hj} - \bar{Y}_h\right)^2 + \sum_h\sum_j\left(\bar{Y}_h - \bar{Y}\right)^2,
or, dividing by N, we may write the above as \sigma^2 = \bar{\sigma}^2 + \Delta^2, where the first term on the right is the within-stratum component and the second the between-strata component; \Delta^2 is the mean squared deviation of the strata means from the overall mean. The proportionate reduction in variance from stratification is approximately
2
2

 hj  Yh  Y 

 hj Yhj  Y
2

2


 h N h .  Yh  Y 
 hj Yhj  Yh

2
2
  h N h .  Yh  Y 
2
.
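A sketch computing this ratio from population values grouped by stratum (function name illustrative):

import numpy as np

def prop_reduction(strata_values):
    # Between-strata sum of squares over the total sum of squares,
    # i.e. the approximate proportionate reduction in variance from stratification.
    strata_values = [np.asarray(y, dtype=float) for y in strata_values]
    grand = np.concatenate(strata_values).mean()
    within = sum(((y - y.mean())**2).sum() for y in strata_values)
    between = sum(len(y) * (y.mean() - grand)**2 for y in strata_values)
    return between / (within + between)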
The actual gain is slightly smaller due to the (generally minor) difference in the definition of \sigma and S. With H strata of equal size N/H, it is seen to be

\frac{S^2 - \bar{S}^2}{S^2} \cong \frac{\Delta^2 - \frac{H-1}{N}\cdot\bar{S}^2}{S^2}.
An important point to note is that exactly the same idea applies when we are dealing with clusters rather than elements as the sampling units in a stratified design. The above quantities then refer to the variance of cluster means. With a given stratification, the deviation between strata means, \Delta^2, is the same whether element or cluster sampling is used within strata. By contrast, S^2, or the within-stratum term \bar{S}^2, is usually much smaller for cluster means than for individual elements (as noted earlier).
Hence with cluster sampling, the relative gain from stratification is usually much more appreciable.
5 Estimating variance from the sample
In our illustrations, the average over all samples of the sample mean \bar{y} = \sum_i y_i/n is equal to the population mean \bar{Y}. We say that the expected value of the former equals the latter: E[\bar{y}] = \bar{Y}; i.e. \bar{y} provides an unbiased estimator of \bar{Y}.
Furthermore, the variability among elements in any particular sample provides a measure of that variability in the population, i.e.

\hat{\sigma}^2 = \frac{\sum_i\left(y_i - \bar{y}\right)^2}{n} \cong \frac{\sum_j\left(Y_j - \bar{Y}\right)^2}{N} = \sigma^2,

where the summation on the left is over the n elements of the sample and that on the right is over the N population elements.
Actually, for an SRS the exact relationship happens to be E[s^2] = S^2, where

s^2 = \frac{\sum_i\left(y_i - \bar{y}\right)^2}{n - 1} \quad \text{and} \quad S^2 = \frac{\sum_j\left(Y_j - \bar{Y}\right)^2}{N - 1}.

Hence for a simple random sample

E[var(\bar{y})] = Var(\bar{y}),

where var(\bar{y}) = (1 - f)\,s^2/n is estimated from the sample and Var(\bar{y}) = (1 - f)\cdot S^2/n is its population value.
This is the basis on which we can estimate the variance (a measure of variability among different
samples) from the results of a single sample that is available.
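In practice this is a one-line calculation from the selected sample; a minimal sketch for SRS (function name illustrative):

import numpy as np

def var_hat_srs(sample, N):
    # var(ybar) = (1 - f) * s^2 / n estimated from a single simple random sample.
    y = np.asarray(sample, dtype=float)
    n = len(y)
    s2 = y.var(ddof=1)           # s^2 with divisor n - 1, unbiased for S^2 under SRS
    return (1 - n / N) * s2 / n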
It is important to note that the variance computed above provides a valid estimate only for simple random sampling. For more complex designs, estimating the variance will involve more complex formulae that take into account the complexity of the design. But interestingly, an important result of sampling theory is that for many complex designs the relationship E[s^2] \cong S^2 still holds approximately.
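This can be checked numerically; the sketch below builds an artificial population of very homogeneous clusters (an assumption made only for illustration), repeatedly draws samples of whole clusters, and compares the average of s² with S²:

import numpy as np

rng = np.random.default_rng(3)
pop = np.sort(rng.integers(0, 10, size=1600)).reshape(100, 16)  # 100 homogeneous clusters of n = 16

S2 = pop.ravel().var(ddof=1)
s2_values = []
for _ in range(2000):
    picked = rng.choice(100, size=40, replace=False)    # a = 40 clusters per sample
    s2_values.append(pop[picked].ravel().var(ddof=1))
print(S2, np.mean(s2_values))   # the average of s^2 stays close to S^2 even for this clustered design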
Population characteristics (tab “frequency distribution”)
          Theoretical   Illustrative
mean=     4.500         4.498
var=      8.250         7.940
StDev=    2.870         2.818
cv=       0.638         0.626
Examples of simple random samples
(1) n=10: (10x1); (1x10)
(2) n=16: (4x4)
(3) n=40: (40x1); (1x40)
(4) n=100: (10x10)
[Worksheet tabs: statistics of the sampling distributions (mean, var, StDev, cv) computed for the unstratified designs and, for the stratified illustrations, separately for stratum (1), stratum (2) and the combined estimate (1+2).]
Complete the statistics for the sampling distributions
         (1.r)   (1.c)   (2)   (3.r)   (3.c)   (4)
mean=
var=
StDev=
cv=
[Worksheet: some columns are pre-filled, with (mean, var, StDev, cv) values of (4.50, 0.85, 0.92, 0.20), (4.50, 1.60, 1.27, 0.28) and (4.50, 2.25, 1.50, 0.33).]