CLUSTER SAMPLING
In random sampling, it is presumed that the population has been divided into a finite number of distinct and identifiable units called sampling units. The smallest units into which the population can be divided are called the elements of the population, and a group of such elements is known as a cluster. After the population has been divided into specified clusters (as a simple rule, the number of elements in a cluster should be small and the number of clusters should be large), the required number of clusters is obtained by the method of either equal or unequal probabilities of selection. Such a procedure, in which the sampling unit is a cluster, is called cluster sampling. If the entire area containing the population under study is subdivided into smaller area segments, and each element in the population is associated with one and only one such area segment, the procedure is alternatively called area sampling. There are two main reasons for using a cluster as the sampling unit.
i) Usually a complete list of the population units is not available, and therefore the use of the individual unit as the sampling unit is not feasible.
ii) Even when a complete list of the population units is available, using the cluster as the sampling unit can reduce the cost of sampling considerably.
For instance, in a population survey it may be cheaper to collect data from all persons in a sample of households than from a sample of the same number of persons selected directly from the list of all persons. Similarly, it would be operationally more convenient to survey all households situated in a sample of areas, such as villages, than to survey a sample of the same number of households selected at random from a list of all households. Another example of the utility of cluster sampling is provided by crop surveys, where locating a randomly selected farm or plot takes a considerable part of the total time of the survey, but once the plot is located, the additional time taken to identify and survey a few neighbouring plots is generally only marginal.
Theory of equal clusters
Suppose the population consists of N clusters, each of M elements. A sample of n clusters
is drawn by the method of simple random sampling and every unit in the selected clusters is
enumerated. Let us denote by
$y_{ij}$ , value of the $j$-th element in the $i$-th cluster, $j = 1, \ldots, M$; $i = 1, \ldots, N$.

$\bar{y}_{i.} = \dfrac{1}{M} \sum_{j=1}^{M} y_{ij}$ , mean per element of the $i$-th cluster.

$\bar{Y}_N = \dfrac{1}{N} \sum_{i=1}^{N} \bar{y}_{i.}$ , mean of cluster means in the population of $N$ clusters.

$\bar{Y} = \dfrac{1}{NM} \sum_{i=1}^{N} \sum_{j=1}^{M} y_{ij}$ , mean per element in the population.

$\bar{y}_n = \dfrac{1}{n} \sum_{i=1}^{n} \bar{y}_{i.}$ , mean of cluster means in a sample of $n$ clusters.

$\bar{y} = \dfrac{1}{nM} \sum_{i=1}^{n} \sum_{j=1}^{M} y_{ij}$ , mean per element in the sample.

Note: $\bar{Y}_N = \bar{Y}$ and $\bar{y}_n = \bar{y}$, if the sizes of the clusters are the same.

$S_i^2 = \dfrac{1}{M-1} \sum_{j=1}^{M} (y_{ij} - \bar{y}_{i.})^2$ , mean square between elements within the $i$-th cluster.

$S_w^2 = \dfrac{1}{N} \sum_{i=1}^{N} S_i^2$ , mean square within clusters.

$S_b^2 = \dfrac{1}{N-1} \sum_{i=1}^{N} (\bar{y}_{i.} - \bar{Y}_N)^2$ , mean square between cluster means in the population.

$S^2 = \dfrac{1}{NM-1} \sum_{i=1}^{N} \sum_{j=1}^{M} (y_{ij} - \bar{Y})^2$ , mean square between elements in the population.

$\rho = \dfrac{E(y_{ij} - \bar{Y})(y_{ik} - \bar{Y})}{E(y_{ij} - \bar{Y})^2} = \dfrac{\dfrac{1}{NM(M-1)} \sum_{i=1}^{N} \sum_{j \neq k}^{M} (y_{ij} - \bar{Y})(y_{ik} - \bar{Y})}{\dfrac{1}{NM} \sum_{i=1}^{N} \sum_{j=1}^{M} (y_{ij} - \bar{Y})^2} = \dfrac{\sum_{i=1}^{N} \sum_{j \neq k}^{M} (y_{ij} - \bar{Y})(y_{ik} - \bar{Y})}{(M-1)(NM-1) S^2}$ , the intracluster correlation coefficient between elements within clusters.
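These definitions are easy to verify numerically. The sketch below (Python with NumPy; the array layout and all names are illustrative, not part of the text) computes $S_w^2$, $S_b^2$, $S^2$ and $\rho$ for a small artificial population and checks the identity $(NM-1) S^2 = N(M-1) S_w^2 + M(N-1) S_b^2$, which is proved later as equation (7.8).

```python
import numpy as np

def population_quantities(Y):
    """Y: N x M array; row i holds the M elements of cluster i."""
    N, M = Y.shape
    ybar_i = Y.mean(axis=1)                 # cluster means  ybar_i.
    S_w2 = Y.var(axis=1, ddof=1).mean()     # S_w^2 = (1/N) sum_i S_i^2
    S_b2 = ybar_i.var(ddof=1)               # S_b^2, divisor N - 1
    S2 = Y.var(ddof=1)                      # S^2, divisor NM - 1
    d = Y - Y.mean()
    # sum_i sum_{j != k} d_ij d_ik = (row sum)^2 - sum of squares, per row
    cross = (d.sum(axis=1) ** 2 - (d ** 2).sum(axis=1)).sum()
    rho = cross / ((M - 1) * (N * M - 1) * S2)
    # identity proved later as (7.8): (NM-1) S^2 = N(M-1) S_w^2 + M(N-1) S_b^2
    assert np.isclose((N * M - 1) * S2, N * (M - 1) * S_w2 + M * (N - 1) * S_b2)
    return S_w2, S_b2, S2, rho

rng = np.random.default_rng(1)
Y = rng.normal(size=(50, 8)) + rng.normal(size=(50, 1))  # shared cluster effect
print(population_quantities(Y))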
Theorem: If a simple random sample, wor, of $n$ clusters, each having $M$ elements, is drawn from a population of $N$ clusters, then the sample mean $\bar{y}_n$ is an unbiased estimator of the population mean $\bar{Y}$, and its variance is
$$V(\bar{y}_n) = \left( \frac{1}{n} - \frac{1}{N} \right) S_b^2 = \frac{1-f}{n} S_b^2 , \quad \text{where } f = n/N .$$
Proof: We have
$$E(\bar{y}_n) = E\left( \frac{1}{n} \sum_{i=1}^{n} \bar{y}_{i.} \right) = \frac{1}{n} \sum_{i=1}^{n} E(\bar{y}_{i.}) = \frac{1}{N} \sum_{i=1}^{N} \bar{y}_{i.} = \bar{Y}_N = \bar{Y} .$$
To obtain the variance, we have, by definition,
$$V(\bar{y}_n) = E(\bar{y}_n - \bar{Y}_N)^2 = E\left[ \frac{1}{n} \sum_{i=1}^{n} (\bar{y}_{i.} - \bar{Y}_N) \right]^2 = \frac{1}{n^2} \left[ \sum_{i=1}^{n} E(\bar{y}_{i.} - \bar{Y}_N)^2 + \sum_{i \neq i'}^{n} \sum E(\bar{y}_{i.} - \bar{Y}_N)(\bar{y}_{i'.} - \bar{Y}_N) \right] .$$
Consider
$$E(\bar{y}_{i.} - \bar{Y}_N)^2 = \frac{1}{N} \sum_{i=1}^{N} (\bar{y}_{i.} - \bar{Y}_N)^2 = \frac{N-1}{N} S_b^2 \quad (7.1)$$
and
$$E(\bar{y}_{i.} - \bar{Y}_N)(\bar{y}_{i'.} - \bar{Y}_N) = \frac{1}{N(N-1)} \sum_{i \neq i'}^{N} \sum (\bar{y}_{i.} - \bar{Y}_N)(\bar{y}_{i'.} - \bar{Y}_N)$$
$$= \frac{1}{N(N-1)} \left[ \sum_{i=1}^{N} (\bar{y}_{i.} - \bar{Y}_N) \sum_{i'=1}^{N} (\bar{y}_{i'.} - \bar{Y}_N) - \sum_{i=1}^{N} (\bar{y}_{i.} - \bar{Y}_N)^2 \right]$$
$$= -\frac{1}{N(N-1)} \sum_{i=1}^{N} (\bar{y}_{i.} - \bar{Y}_N)^2 = -\frac{1}{N} S_b^2 . \quad (7.2)$$
In view of equations (7.1) and (7.2), $V(\bar{y}_n)$ reduces to
$$V(\bar{y}_n) = \frac{1}{n^2} \left[ \sum_{i=1}^{n} \frac{N-1}{N} S_b^2 - \sum_{i \neq i'}^{n} \sum \frac{1}{N} S_b^2 \right] = \frac{1}{n^2} \left[ \frac{n(N-1)}{N} S_b^2 - \frac{n(n-1)}{N} S_b^2 \right] = \frac{N-n}{nN} S_b^2 = \frac{1-f}{n} S_b^2 .$$
Note: For large $N$, $V(\bar{y}_n) \approx \dfrac{1}{n} S_b^2$.
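As a quick check of the theorem, one can draw many cluster samples from a synthetic population and compare the empirical mean and variance of $\bar{y}_n$ with $\bar{Y}$ and $\frac{1-f}{n} S_b^2$. A minimal simulation sketch (Python/NumPy; all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
N, M, n = 40, 10, 8
# Clustered population: a shared cluster effect makes S_b^2 sizeable.
Y = rng.normal(size=(N, M)) + 2.0 * rng.normal(size=(N, 1))

ybar_i = Y.mean(axis=1)
S_b2 = ybar_i.var(ddof=1)
theory = (1 / n - 1 / N) * S_b2                 # V(ybar_n) = (1/n - 1/N) S_b^2

reps = 20_000
est = np.empty(reps)
for r in range(reps):
    idx = rng.choice(N, size=n, replace=False)  # srswor of n clusters
    est[r] = ybar_i[idx].mean()                 # ybar_n for this sample

print("mean of estimates:", est.mean(), " vs Ybar  =", Y.mean())
print("var of estimates :", est.var(), " vs theory =", theory)
```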
Alternative expression of $V(\bar{y}_n)$ in terms of the correlation coefficient

Consider the intracluster correlation coefficient between elements within clusters, defined as
$$\rho = \frac{E(y_{ij} - \bar{Y})(y_{ik} - \bar{Y})}{E(y_{ij} - \bar{Y})^2} = \frac{\sum_{i=1}^{N} \sum_{j \neq k}^{M} (y_{ij} - \bar{Y})(y_{ik} - \bar{Y})}{(M-1)(NM-1) S^2} ,$$
so that
$$\sum_{i=1}^{N} \sum_{j \neq k}^{M} (y_{ij} - \bar{Y})(y_{ik} - \bar{Y}) = (M-1)(NM-1)\, \rho\, S^2 .$$
By definition,
$$V(\bar{y}_n) = \frac{1-f}{n} S_b^2 = \frac{1-f}{n(N-1)} \sum_{i=1}^{N} (\bar{y}_{i.} - \bar{Y}_N)^2 . \quad (7.3)$$
Consider
$$\sum_{i=1}^{N} (\bar{y}_{i.} - \bar{Y}_N)^2 = \sum_{i=1}^{N} \left[ \frac{1}{M} \sum_{j=1}^{M} (y_{ij} - \bar{Y}_N) \right]^2 = \frac{1}{M^2} \sum_{i=1}^{N} \left[ \sum_{j=1}^{M} (y_{ij} - \bar{Y}) \right]^2$$
$$= \frac{1}{M^2} \left[ \sum_{i=1}^{N} \sum_{j=1}^{M} (y_{ij} - \bar{Y})^2 + \sum_{i=1}^{N} \sum_{j \neq k}^{M} (y_{ij} - \bar{Y})(y_{ik} - \bar{Y}) \right] , \text{ as } \bar{Y}_N = \bar{Y} \quad (7.4)$$
$$= \frac{1}{M^2} \left[ (NM-1) S^2 + (M-1)(NM-1)\, \rho\, S^2 \right] = \frac{(NM-1) S^2}{M^2} \left[ 1 + (M-1)\rho \right] . \quad (7.5)$$
Substituting equation (7.5) in equation (7.3), we get
$$V(\bar{y}_n) = \frac{1-f}{n} \left[ \frac{(NM-1) S^2}{M^2 (N-1)} \right] [1 + (M-1)\rho] .$$
Note: For large $N$, $\frac{1}{N} \to 0$, so that $1 - f \to 1$, and
$$\frac{NM-1}{M^2 (N-1)} = \frac{N(M - 1/N)}{N M^2 (1 - 1/N)} \approx \frac{1}{M} .$$
Hence,
$$V(\bar{y}_n) \approx \frac{S^2}{nM} [1 + (M-1)\rho] .$$
Corollary: $\hat{Y} = NM \bar{y}_n$ is an unbiased estimate of the population total $Y$, and its variance is
$$V(\hat{Y}) = N^2 M^2 \left( \frac{1-f}{n} \right) S_b^2 = N^2 \left( \frac{1-f}{n} \right) \frac{(NM-1) S^2}{N-1} [1 + (M-1)\rho] \approx N^2 M \left( \frac{1-f}{n} \right) S^2 [1 + (M-1)\rho] , \text{ for large } N .$$
Estimation of the variance $V(\bar{y}_n)$

Define
$$s_b^2 = \frac{1}{n-1} \sum_{i=1}^{n} (\bar{y}_{i.} - \bar{y}_n)^2 = \frac{1}{n-1} \left[ \sum_{i=1}^{n} \bar{y}_{i.}^2 - n \bar{y}_n^2 \right] , \text{ then}$$
$$E(s_b^2) = \frac{1}{n-1} \left[ \sum_{i=1}^{n} E(\bar{y}_{i.}^2) - n E(\bar{y}_n^2) \right] .$$
Note that $V(\bar{y}_{i.}) = E(\bar{y}_{i.}^2) - \bar{Y}_N^2$, so that
$$E(\bar{y}_{i.}^2) = \left( \frac{N-1}{N} \right) S_b^2 + \bar{Y}_N^2 , \quad (7.6)$$
and $V(\bar{y}_n) = E(\bar{y}_n^2) - \bar{Y}_N^2$, so that
$$E(\bar{y}_n^2) = \left( \frac{N-n}{nN} \right) S_b^2 + \bar{Y}_N^2 . \quad (7.7)$$
In view of equations (7.7) and (7.6), $E(s_b^2)$ reduces to
$$E(s_b^2) = \frac{1}{n-1} \left[ n \left( \frac{N-1}{N} \right) S_b^2 - n \left( \frac{N-n}{nN} \right) S_b^2 \right] = \frac{1}{n-1} \left( \frac{nN - n - N + n}{N} \right) S_b^2 = S_b^2 .$$
This shows that $s_b^2$ is an unbiased estimate of $S_b^2$. Hence $v(\bar{y}_n) = \dfrac{1-f}{n} s_b^2$ is an unbiased estimator of $V(\bar{y}_n) = \dfrac{1-f}{n} S_b^2$.
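In practice only the $n$ selected clusters are observed, and the estimate, its variance estimate, and (via the corollary above) the population total are computed from them alone. A minimal sketch, assuming the sample is stored as an $n \times M$ array (Python/NumPy; function name hypothetical):

```python
import numpy as np

def cluster_estimates(sample, N):
    """sample: n x M array holding the n enumerated clusters;
    N: total number of clusters in the population."""
    n, M = sample.shape
    f = n / N                                   # sampling fraction
    ybar_i = sample.mean(axis=1)                # cluster means
    ybar_n = ybar_i.mean()                      # unbiased for the population mean
    s_b2 = ybar_i.var(ddof=1)                   # unbiased for S_b^2
    v_mean = (1 - f) / n * s_b2                 # v(ybar_n) = (1-f) s_b^2 / n
    Y_hat = N * M * ybar_n                      # estimated population total
    v_total = (N * M) ** 2 * v_mean             # v(Y_hat) = N^2 M^2 v(ybar_n)
    return ybar_n, v_mean, Y_hat, v_total
```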
Relative efficiency (RE) of cluster sampling

In sampling $nM$ elements from the population by simple random sampling, wor, the variance of the sample mean $\bar{y}$ is given by
$$V(\bar{y}_{sr}) = \left( \frac{NM - nM}{NM} \right) \frac{S^2}{nM} = \frac{1-f}{nM} S^2 , \quad \text{while} \quad V(\bar{y}_n) = \frac{1-f}{n} S_b^2 .$$
Thus, the relative efficiency of cluster sampling compared with simple random sampling is given by
$$RE = \frac{V(\bar{y}_{sr})}{V(\bar{y}_n)} = \frac{S^2}{M S_b^2} .$$
This shows that the efficiency of cluster sampling increases as the mean square between cluster means $S_b^2$ decreases.
Note: For large $N$, the relative efficiency of cluster sampling in terms of the intracluster correlation coefficient $\rho$ is given by
$$RE = \frac{V(\bar{y}_{sr})}{V(\bar{y}_n)} = \frac{1}{1 + (M-1)\rho} .$$
It can be seen that the relative efficiency depends on the value of $\rho$:
i) if $\rho = 0$, then $V(\bar{y}_{sr}) = V(\bar{y}_n)$, i.e. both methods are equally precise;
ii) if $\rho > 0$, then $V(\bar{y}_{sr}) < V(\bar{y}_n)$, i.e. simple random sampling is more precise;
iii) if $\rho < 0$, then $V(\bar{y}_{sr}) > V(\bar{y}_n)$, i.e. cluster sampling is more precise.
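The factor $1 + (M-1)\rho$ is the familiar design effect. The following sketch (Python/NumPy; the population is illustrative) compares the exact ratio $S^2 / (M S_b^2)$ with the large-$N$ approximation $1/[1 + (M-1)\rho]$:

```python
import numpy as np

rng = np.random.default_rng(3)
N, M = 200, 12
Y = rng.normal(size=(N, M)) + rng.normal(size=(N, 1))   # rho > 0 by construction

S2 = Y.var(ddof=1)
S_b2 = Y.mean(axis=1).var(ddof=1)
d = Y - Y.mean()
cross = (d.sum(axis=1) ** 2 - (d ** 2).sum(axis=1)).sum()
rho = cross / ((M - 1) * (N * M - 1) * S2)

print("exact RE  =", S2 / (M * S_b2))          # V(ybar_sr) / V(ybar_n)
print("approx RE =", 1 / (1 + (M - 1) * rho))  # large-N expression
```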
Estimation of the relative efficiency of cluster sampling

We have
$$\text{Est.}(RE) = \frac{\text{Est.}\, S^2}{M\, \text{Est.}\, S_b^2} .$$
Here $s^2$ will not be an unbiased estimate of $S^2$, i.e. $E(s^2) \neq S^2$, because a sample of $nM$ elements is not taken randomly from the population of $NM$ elements. To find an unbiased estimate of $S^2$, consider
$$(NM-1) S^2 = \sum_{i=1}^{N} \sum_{j=1}^{M} (y_{ij} - \bar{Y})^2 = \sum_{i=1}^{N} \sum_{j=1}^{M} (y_{ij} - \bar{y}_{i.} + \bar{y}_{i.} - \bar{Y})^2$$
$$= \sum_{i=1}^{N} \sum_{j=1}^{M} \left[ (y_{ij} - \bar{y}_{i.})^2 + (\bar{y}_{i.} - \bar{Y})^2 + 2 (y_{ij} - \bar{y}_{i.})(\bar{y}_{i.} - \bar{Y}) \right]$$
$$= \sum_{i=1}^{N} \sum_{j=1}^{M} (y_{ij} - \bar{y}_{i.})^2 + M \sum_{i=1}^{N} (\bar{y}_{i.} - \bar{Y})^2 + 0 = (M-1) \sum_{i=1}^{N} S_i^2 + M(N-1) S_b^2$$
$$= N(M-1) S_w^2 + M(N-1) S_b^2 . \quad (7.8)$$
It can be seen that in a random sample of $n$ clusters, $s_b^2$ and $s_w^2$ will provide unbiased estimates of $S_b^2$ and $S_w^2$, respectively. Define
$$s_w^2 = \frac{1}{n(M-1)} \sum_{i=1}^{n} \sum_{j=1}^{M} (y_{ij} - \bar{y}_{i.})^2 , \quad \text{and} \quad s_b^2 = \frac{1}{n-1} \sum_{i=1}^{n} (\bar{y}_{i.} - \bar{y}_n)^2 .$$
Consider
$$s_w^2 = \frac{1}{n(M-1)} \sum_{i=1}^{n} \sum_{j=1}^{M} (y_{ij} - \bar{y}_{i.})^2 = \frac{1}{n(M-1)} \left[ \sum_{i=1}^{n} \sum_{j=1}^{M} y_{ij}^2 - M \sum_{i=1}^{n} \bar{y}_{i.}^2 \right] , \text{ so that}$$
$$E(s_w^2) = \frac{1}{n(M-1)} \left[ \sum_{i=1}^{n} \sum_{j=1}^{M} E(y_{ij}^2) - M \sum_{i=1}^{n} E(\bar{y}_{i.}^2) \right] .$$
Note that $V(y_{ij}) = E(y_{ij}^2) - \bar{Y}_N^2$, then
$$E(y_{ij}^2) = \frac{(NM-1)}{NM} S^2 + \bar{Y}_N^2 . \quad \text{Similarly, we can see that } E(\bar{y}_{i.}^2) = \frac{(N-1)}{N} S_b^2 + \bar{Y}_N^2 .$$
Therefore,
$$E(s_w^2) = \frac{1}{n(M-1)} \left[ \sum_{i=1}^{n} \sum_{j=1}^{M} \left( \frac{(NM-1)}{NM} S^2 + \bar{Y}_N^2 \right) - M \sum_{i=1}^{n} \left( \frac{(N-1)}{N} S_b^2 + \bar{Y}_N^2 \right) \right]$$
$$= \frac{1}{n(M-1)} \left[ nM \frac{(NM-1)}{NM} S^2 + nM \bar{Y}_N^2 - nM \frac{(N-1)}{N} S_b^2 - nM \bar{Y}_N^2 \right]$$
$$= \frac{1}{N(M-1)} \left[ (NM-1) S^2 - M(N-1) S_b^2 \right] = \frac{1}{N(M-1)} \left[ N(M-1) S_w^2 \right] = S_w^2 ,$$
by using the relation given in equation (7.8). Also,
$$E(s_b^2) = S_b^2 , \text{ as the } n \text{ clusters are drawn under srswor}.$$
Thus, an unbiased estimate of $S^2$ will be
$$\hat{S}^2 = \frac{1}{NM-1} \left[ N(M-1) s_w^2 + M(N-1) s_b^2 \right] .$$
Therefore,
$$\text{Est.}(RE) = \frac{\dfrac{1}{NM-1} \left[ N(M-1) s_w^2 + M(N-1) s_b^2 \right]}{M s_b^2} .$$
Note: For large $N$,
$$\text{Est.}(RE) = \frac{N(M-1) s_w^2 + M(N-1) s_b^2}{N(M - 1/N)\, M s_b^2} = \frac{N(M-1) s_w^2 + NM(1 - 1/N) s_b^2}{N(M - 1/N)\, M s_b^2} \approx \frac{(M-1) s_w^2 + M s_b^2}{M^2 s_b^2} .$$
Estimation of $\rho$

For large $N$, $RE = \dfrac{1}{1 + (M-1)\rho} = E$ (say), so that
$$\hat{E} + (M-1) \hat{E} \hat{\rho} = 1 , \quad \text{or} \quad \hat{\rho} = \frac{1 - \hat{E}}{(M-1) \hat{E}} , \quad \text{where} \quad \hat{E} = \frac{(M-1) s_w^2 + M s_b^2}{M^2 s_b^2} .$$
Therefore,
$$\hat{\rho} = \left[ 1 - \frac{(M-1) s_w^2 + M s_b^2}{M^2 s_b^2} \right] \frac{M^2 s_b^2}{(M-1) \left[ (M-1) s_w^2 + M s_b^2 \right]} = \frac{M(M-1) s_b^2 - (M-1) s_w^2}{(M-1) \left[ (M-1) s_w^2 + M s_b^2 \right]} = \frac{M s_b^2 - s_w^2}{(M-1) s_w^2 + M s_b^2} .$$
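Both Est.(RE) and $\hat{\rho}$ depend on the sample only through $s_w^2$ and $s_b^2$. A minimal sketch of the large-$N$ versions (Python/NumPy; function name hypothetical):

```python
import numpy as np

def est_re_and_rho(sample):
    """sample: n x M array of enumerated clusters; large-N estimators."""
    n, M = sample.shape
    s_w2 = sample.var(axis=1, ddof=1).mean()      # s_w^2, mean within-cluster MS
    s_b2 = sample.mean(axis=1).var(ddof=1)        # s_b^2, between cluster means
    est_re = ((M - 1) * s_w2 + M * s_b2) / (M ** 2 * s_b2)
    rho_hat = (M * s_b2 - s_w2) / ((M - 1) * s_w2 + M * s_b2)
    return est_re, rho_hat
```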
Alternative method

We have
$$\rho = \frac{\dfrac{1}{M-1} \sum_{i=1}^{N} \sum_{j \neq k}^{M} (y_{ij} - \bar{Y})(y_{ik} - \bar{Y})}{(NM-1) S^2} , \quad \text{and} \quad (NM-1) S^2 = N(M-1) S_w^2 + M(N-1) S_b^2 .$$
Note that, from equation (7.4),
$$M^2 \sum_{i=1}^{N} (\bar{y}_{i.} - \bar{Y}_N)^2 = \sum_{i=1}^{N} \sum_{j=1}^{M} (y_{ij} - \bar{Y})^2 + \sum_{i=1}^{N} \sum_{j \neq k}^{M} (y_{ij} - \bar{Y})(y_{ik} - \bar{Y})$$
or
$$\sum_{i=1}^{N} \sum_{j \neq k}^{M} (y_{ij} - \bar{Y})(y_{ik} - \bar{Y}) = M^2 \sum_{i=1}^{N} (\bar{y}_{i.} - \bar{Y})^2 - \sum_{i=1}^{N} \sum_{j=1}^{M} (y_{ij} - \bar{Y})^2$$
$$= M^2 (N-1) S_b^2 - (NM-1) S^2 = M^2 (N-1) S_b^2 - N(M-1) S_w^2 - M(N-1) S_b^2$$
$$= M(N-1)(M-1) S_b^2 - N(M-1) S_w^2 .$$
Hence,
$$\rho = \frac{M(N-1) S_b^2 - N S_w^2}{M(N-1) S_b^2 + N(M-1) S_w^2} .$$
It can be seen that in a random sample of $n$ clusters, $s_b^2$ and $s_w^2$ will provide unbiased estimates of $S_b^2$ and $S_w^2$, respectively. Therefore, an estimator of $\rho$ will be
$$\hat{\rho} = \frac{M(N-1) s_b^2 - N s_w^2}{M(N-1) s_b^2 + N(M-1) s_w^2} , \quad \text{and for large } N , \quad \hat{\rho} = \frac{M s_b^2 - s_w^2}{M s_b^2 + (M-1) s_w^2} .$$
Determination of optimum cluster size
The best size of cluster to use depends on the cost of collecting information from clusters and on the resulting variance. Regarding the variance function, it is found that variability between elements within clusters increases as the size of the cluster increases (this means that large clusters tend to be more heterogeneous than small clusters) and decreases with an increasing number of clusters. On the other hand, the cost decreases as the size of the cluster increases and increases as the number of clusters increases. Hence, it is necessary to determine a balancing point by finding the optimum cluster size and the number of clusters in the sample which minimize the sampling variance for a given cost or, alternatively, minimize the cost for a fixed variance.
The cost of a survey, apart from overhead cost, will be made up of two components:
i) cost due to expenses in enumerating the elements in the sample and in travelling within the cluster, which is proportional to the number of elements in the sample;
ii) cost due to expenses on travelling between clusters, which is proportional to the distance to be travelled between clusters. It has been shown empirically that the expected value of the minimum distance between $n$ points located at random is proportional to $\sqrt{n}$.
The cost of a survey can therefore be expressed as
$$C = c_1 n M + c_2 \sqrt{n} ,$$
where $c_1$ is the cost of collecting information from an element within a cluster and $c_2$ is the cost per unit distance travelled between clusters. In various agricultural surveys it has been observed that $S_w^2$ is related to $M$ by the relation $S_w^2 = a M^g$, $g > 0$, where $a$ and $g$ are positive constants. Then
$$S_b^2 = \frac{(NM-1) S^2 - N(M-1) a M^g}{M(N-1)} \approx S^2 - (M-1) a M^{g-1} , \text{ for large } N .$$
Thus, for large $N$, the variance $V(\bar{y}_n)$ reduces to
$$V(\bar{y}_n) = \frac{1}{n} \left[ S^2 - (M-1) a M^{g-1} \right] .$$
The problem is to determine $n$ and $M$ such that, for a specified cost, the variance of $\bar{y}_n$ is a minimum. Using calculus methods, we form
$$\phi = V(\bar{y}_n) + \lambda (c_1 n M + c_2 \sqrt{n} - C) ,$$
where $\lambda$ is an unknown constant. Differentiating with respect to $n$ and $M$ respectively, and equating the results to zero, we obtain
$$0 = -\frac{1}{n^2} \left[ S^2 - (M-1) a M^{g-1} \right] + \lambda \left( c_1 M + \frac{c_2}{2\sqrt{n}} \right) , \text{ so that}$$
$$V(\bar{y}_n) = \lambda n \left( c_1 M + \frac{c_2}{2\sqrt{n}} \right) \quad (7.9)$$
and
$$0 = \frac{\partial}{\partial M} V(\bar{y}_n) + \lambda c_1 n , \text{ so that } \frac{\partial}{\partial M} V(\bar{y}_n) = -\lambda c_1 n . \quad (7.10)$$
On eliminating $\lambda$ from equations (7.9) and (7.10), we have
$$-\frac{1}{V(\bar{y}_n)} \frac{\partial V(\bar{y}_n)}{\partial M} = \frac{c_1 n}{n \left( c_1 M + \dfrac{c_2}{2\sqrt{n}} \right)}$$
or
$$-\frac{M}{V(\bar{y}_n)} \frac{\partial V(\bar{y}_n)}{\partial M} = \left( 1 + \frac{c_2}{2 c_1 M \sqrt{n}} \right)^{-1} .$$
Now, solving $c_1 n M + c_2 \sqrt{n} - C = 0$ as a quadratic in $\sqrt{n}$, we have
$$\sqrt{n} = \frac{-c_2 + \sqrt{c_2^2 + 4 c_1 M C}}{2 c_1 M} \quad \text{or} \quad 2 c_1 M \sqrt{n} = -c_2 + c_2 \sqrt{1 + \frac{4 c_1 M C}{c_2^2}} = c_2 \left( \sqrt{1 + \frac{4 c_1 M C}{c_2^2}} - 1 \right) .$$
Hence,
$$-\frac{M}{V(\bar{y}_n)} \frac{\partial V(\bar{y}_n)}{\partial M} = \left[ 1 + \frac{1}{\sqrt{1 + 4 c_1 M C / c_2^2} - 1} \right]^{-1} = 1 - \left( 1 + \frac{4 c_1 M C}{c_2^2} \right)^{-1/2} . \quad (7.11)$$
Now, for the left-hand side of equation (7.11), we have
$$\frac{M}{V(\bar{y}_n)} \frac{\partial V(\bar{y}_n)}{\partial M} = \frac{M}{n V(\bar{y}_n)} \frac{\partial}{\partial M} \left[ S^2 - (M-1) a M^{g-1} \right] = -\frac{1}{n V(\bar{y}_n)} \left[ a g M^g - a(g-1) M^{g-1} \right] .$$
Therefore,
$$\frac{a M^{g-1} \left[ g M - (g-1) \right]}{n V(\bar{y}_n)} = 1 - \left( 1 + \frac{4 c_1 M C}{c_2^2} \right)^{-1/2} . \quad (7.12)$$
It is difficult to obtain an explicit expression for $M$. However, $M$ can be obtained by an iterative (trial and error) method. On substituting the value of $M$ thus obtained in the expression for $\sqrt{n}$ above, we can obtain the optimum value of $n$.
It is evident from equation (7.12) that the optimum size of the unit becomes smaller when:
i) $c_1$ increases, i.e. the time of measurement increases;
ii) $c_2$ decreases, i.e. travel becomes cheaper;
iii) the total cost $C$ of the survey increases.
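The trial-and-error determination of $M$ is easy to mechanize. The sketch below (Python; the values of $S^2$, $a$, $g$, $c_1$, $c_2$ and $C$ are purely illustrative and would in practice come from earlier surveys) searches a grid of cluster sizes, obtains $n$ from the cost equation for each $M$, and picks the $M$ minimizing the large-$N$ variance, which is in the spirit of solving (7.12):

```python
import math

# All parameter values below are purely illustrative.
S2, a, g = 10.0, 1.0, 0.2        # variance model: S_w^2 = a * M^g
c1, c2, C = 1.0, 100.0, 200.0    # per-element cost, travel cost, total budget

def n_for(M):
    """n satisfying the cost equation c1*n*M + c2*sqrt(n) = C (positive root)."""
    root = (-c2 + math.sqrt(c2 ** 2 + 4 * c1 * M * C)) / (2 * c1 * M)  # sqrt(n)
    return root ** 2

def variance(M):
    """Large-N variance V(ybar_n) = [S^2 - (M-1) a M^(g-1)] / n."""
    return (S2 - (M - 1) * a * M ** (g - 1)) / n_for(M)

# Trial-and-error search over cluster sizes, as suggested in the text.
grid = [1 + 0.1 * k for k in range(300)]
feasible = [M for M in grid if S2 - (M - 1) * a * M ** (g - 1) > 0]
M_opt = min(feasible, key=variance)
print("optimum M ~", round(M_opt, 1), " optimum n ~", round(n_for(M_opt), 1))
```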
Cluster sampling for proportions

Suppose it is desired to estimate the proportion $P$ of elements belonging to a specified class $A$ when the population consists of $N$ clusters, each of size $M$, and a random sample, wor, of $n$ clusters is selected. Defining $y_{ij}$ as 1 if the $j$-th element of the $i$-th cluster belongs to class $A$ and 0 otherwise, it is easy to see that $a_i = \sum_{j=1}^{M} y_{ij}$ gives the total number of elements in the $i$-th cluster that belong to class $A$, and $p_i = \dfrac{a_i}{M}$ is the proportion in the $i$-th cluster. Hence the proportion $P$ is
$$P = \frac{1}{NM} \sum_{i=1}^{N} \sum_{j=1}^{M} y_{ij} = \frac{1}{NM} \sum_{i=1}^{N} a_i = \frac{1}{N} \sum_{i=1}^{N} p_i .$$
An unbiased estimate of $P$ is $\hat{P} = \dfrac{1}{n} \sum_{i=1}^{n} p_i = p$, and
$$V(p) = \left( \frac{1}{n} - \frac{1}{N} \right) \frac{1}{N-1} \sum_{i=1}^{N} (p_i - P)^2 \approx \frac{1}{nN} \sum_{i=1}^{N} (p_i - P)^2 , \text{ for large } N .$$
As an estimate of $V(p)$ we may use
$$\hat{V}(p) = \left( \frac{1}{n} - \frac{1}{N} \right) \frac{1}{n-1} \sum_{i=1}^{n} (p_i - p)^2 .$$
Alternatively, if we take a simple random sample, wor, of $nM$ elements from the population of size $NM$, the variance of $p$ is
$$V(p_{sr}) = \left( \frac{NM - nM}{NM - 1} \right) \frac{PQ}{nM} \approx \left( 1 - \frac{n}{N} \right) \frac{PQ}{nM} \approx \frac{PQ}{nM} , \text{ for large } N ,$$
where $Q = 1 - P$.
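A sketch of these proportion formulas (Python/NumPy; names hypothetical), taking the sampled clusters as an $n \times M$ array of 0/1 indicators:

```python
import numpy as np

def estimate_proportion(sample01, N):
    """sample01: n x M array of 0/1 class indicators for the sampled clusters;
    N: number of clusters in the population."""
    n, M = sample01.shape
    p_i = sample01.mean(axis=1)                   # cluster proportions p_i = a_i / M
    p = p_i.mean()                                # unbiased estimate of P
    v_hat = (1 / n - 1 / N) * p_i.var(ddof=1)     # estimated V(p)
    return p, v_hat
```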
Theory of unequal clusters

There are a number of situations where the cluster size varies from cluster to cluster. For example, villages or urban blocks, which are groups of households, and households, which are groups of persons, are usually considered as clusters for purposes of sampling because of operational convenience.
Suppose the population consists of $N$ clusters of sizes $M_1, M_2, \ldots, M_N$ such that $\sum_{i=1}^{N} M_i = M_0$. A sample of $n$ clusters is drawn by the method of simple random sampling, wor, and all elements of the selected clusters are surveyed. Let us denote by

$y_{ij}$ , value of the $j$-th element in the $i$-th cluster, $j = 1, 2, \ldots, M_i$; $i = 1, 2, \ldots, N$.

$\bar{y}_{i.} = \dfrac{1}{M_i} \sum_{j=1}^{M_i} y_{ij}$ , mean per element of the $i$-th cluster.

$\bar{Y}_N = \dfrac{1}{N} \sum_{i=1}^{N} \bar{y}_{i.}$ , mean of the cluster means in the population of $N$ clusters.

$\bar{y}_n = \dfrac{1}{n} \sum_{i=1}^{n} \bar{y}_{i.}$ , mean of the cluster means in the sample of $n$ clusters.

$\bar{Y} = \dfrac{1}{M_0} \sum_{i=1}^{N} \sum_{j=1}^{M_i} y_{ij} = \dfrac{1}{N \bar{M}} \sum_{i=1}^{N} M_i \bar{y}_{i.}$ , mean per element in the population.

$\bar{M} = \dfrac{1}{N} \sum_{i=1}^{N} M_i = \dfrac{M_0}{N}$ , mean cluster size.
Three estimators of the population mean $\bar{Y}$ that are in common use may be considered.

1st estimator: It is defined as the sample mean of the cluster means, $\bar{y}_I = \dfrac{1}{n} \sum_{i=1}^{n} \bar{y}_{i.} = \bar{y}_n$.
By definition,
$$E(\bar{y}_I) = E\left( \frac{1}{n} \sum_{i=1}^{n} \bar{y}_{i.} \right) = \frac{1}{N} \sum_{i=1}^{N} \bar{y}_{i.} = \bar{Y}_N \neq \bar{Y} , \text{ as the sampling is sr} .$$
Thus, $\bar{y}_I$ is a biased estimator of the population mean $\bar{Y}$.
The bias of the estimator is given by
$$B = E(\bar{y}_I) - \bar{Y} = \frac{1}{N} \sum_{i=1}^{N} \bar{y}_{i.} - \frac{1}{M_0} \sum_{i=1}^{N} M_i \bar{y}_{i.} = \frac{1}{N} \sum_{i=1}^{N} \bar{y}_{i.} - \frac{1}{N \bar{M}} \sum_{i=1}^{N} M_i \bar{y}_{i.}$$
$$= \frac{1}{N \bar{M}} \left[ \bar{M} \sum_{i=1}^{N} \bar{y}_{i.} - \sum_{i=1}^{N} M_i \bar{y}_{i.} \right] = -\frac{1}{N \bar{M}} \sum_{i=1}^{N} (M_i - \bar{M}) \bar{y}_{i.}$$
$$= -\frac{1}{N \bar{M}} \sum_{i=1}^{N} (M_i - \bar{M}) (\bar{y}_{i.} - \bar{Y}_N + \bar{Y}_N)$$
$$= -\left[ \frac{1}{N \bar{M}} \sum_{i=1}^{N} (M_i - \bar{M})(\bar{y}_{i.} - \bar{Y}_N) + \frac{\bar{Y}_N}{N \bar{M}} \sum_{i=1}^{N} (M_i - \bar{M}) \right]$$
$$= -\frac{1}{\bar{M}} \mathrm{Cov}(\bar{y}_{i.}, M_i) , \quad \text{since } \sum_{i=1}^{N} (M_i - \bar{M}) = 0 .$$
This shows that the bias is expected to be small when $M_i$ and $\bar{y}_{i.}$ are not highly correlated. In such a case, it is advisable to use this estimator.
Its variance is given by
$$V(\bar{y}_I) = E(\bar{y}_I - \bar{Y}_N)^2 = \frac{1-f}{n} S_b^2 , \quad \text{where } S_b^2 = \frac{1}{N-1} \sum_{i=1}^{N} (\bar{y}_{i.} - \bar{Y}_N)^2 ,$$
and an unbiased estimator of $V(\bar{y}_I)$ is
$$v(\bar{y}_I) = \frac{1-f}{n} s_b^2 , \quad \text{where } s_b^2 = \frac{1}{n-1} \sum_{i=1}^{n} (\bar{y}_{i.} - \bar{y}_I)^2 .$$
2nd estimator: It is defined as $\bar{y}_{II} = \dfrac{1}{n \bar{M}} \sum_{i=1}^{n} M_i \bar{y}_{i.}$.
By definition,
$$E(\bar{y}_{II}) = \frac{1}{n \bar{M}} \sum_{i=1}^{n} E(M_i \bar{y}_{i.}) = \frac{1}{\bar{M}} \left[ \frac{1}{N} \sum_{i=1}^{N} M_i \bar{y}_{i.} \right] = \frac{1}{N \bar{M}} \sum_{i=1}^{N} M_i \bar{y}_{i.} = \bar{Y} , \text{ as srswor} .$$
This shows that $\bar{y}_{II}$ is an unbiased estimate of $\bar{Y}$. Its variance is given by
$$V(\bar{y}_{II}) = V\left( \frac{1}{n \bar{M}} \sum_{i=1}^{n} M_i \bar{y}_{i.} \right) = V\left( \frac{1}{n} \sum_{i=1}^{n} \frac{M_i \bar{y}_{i.}}{\bar{M}} \right) .$$
Define a variate
$$u_i = \frac{M_i \bar{y}_{i.}}{\bar{M}} , \quad i = 1, 2, \ldots, N .$$
Let $\bar{u}$ and $\bar{U}$ be the sample and population means of the variable $u$, respectively, where
$$\bar{u} = \frac{1}{n} \sum_{i=1}^{n} \frac{M_i \bar{y}_{i.}}{\bar{M}} = \bar{y}_{II} , \quad \text{and} \quad \bar{U} = \frac{1}{N} \sum_{i=1}^{N} \frac{M_i \bar{y}_{i.}}{\bar{M}} = \frac{1}{M_0} \sum_{i=1}^{N} M_i \bar{y}_{i.} = \bar{Y} .$$
Therefore,
$$V(\bar{y}_{II}) = V(\bar{u}) = \frac{1-f}{n} S_b'^2 , \text{ as the clusters are randomly drawn wor} ,$$
where
$$S_b'^2 = \frac{1}{N-1} \sum_{i=1}^{N} (u_i - \bar{U})^2 = \frac{1}{N-1} \sum_{i=1}^{N} \left( \frac{M_i \bar{y}_{i.}}{\bar{M}} - \bar{Y} \right)^2 ,$$
and an unbiased estimator of $V(\bar{y}_{II})$ is
$$v(\bar{y}_{II}) = \frac{1-f}{n} s_u^2 , \quad \text{where } s_u^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{M_i \bar{y}_{i.}}{\bar{M}} - \bar{y}_{II} \right)^2 .$$
3rd estimator: It is defined as $\bar{y}_{III} = \dfrac{1}{\sum_{i=1}^{n} M_i} \sum_{i=1}^{n} M_i \bar{y}_{i.}$. This estimator is a ratio estimator of the form $\hat{R} = \dfrac{\sum_i y_i}{\sum_i x_i}$, and its variance is obtained by replacing $x_i$ by $M_i$ and $y_i$ by $M_i \bar{y}_{i.}$ in the variance of the ratio estimator, where
$$V(\hat{R}) = \frac{1-f}{n(N-1) \bar{X}^2} \sum_{i=1}^{N} (y_i - R x_i)^2 , \quad \text{and} \quad \bar{X}^2 = \left( \frac{1}{N} \sum_{i=1}^{N} M_i \right)^2 = \bar{M}^2 .$$
Hence,
$$V(\bar{y}_{III}) = \frac{1-f}{n(N-1) \bar{M}^2} \sum_{i=1}^{N} \left( M_i \bar{y}_{i.} - \bar{Y} M_i \right)^2 = \frac{1-f}{n(N-1)} \sum_{i=1}^{N} \frac{M_i^2}{\bar{M}^2} (\bar{y}_{i.} - \bar{Y})^2 = \frac{1-f}{n} S_b''^2 ,$$
where
$$S_b''^2 = \frac{1}{N-1} \sum_{i=1}^{N} \left( \frac{M_i}{\bar{M}} \right)^2 (\bar{y}_{i.} - \bar{Y})^2 .$$
An unbiased estimate of $V(\bar{y}_{III})$ is given by
$$v(\bar{y}_{III}) = \frac{1-f}{n} s_b''^2 , \quad \text{where} \quad s_b''^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{M_i}{\bar{M}} \right)^2 (\bar{y}_{i.} - \bar{y}_{III})^2 .$$
Cluster sampling with varying probabilities and with replacement

Theorem: If a sample of $n$ clusters is drawn with probabilities proportional to size, i.e. $p_i \propto M_i$ or $p_i = \dfrac{M_i}{M_0}$, and with replacement, then an unbiased estimate of $\bar{Y}$ is given by
$$\bar{y}_n = \frac{1}{n} \sum_{i=1}^{n} \bar{y}_{i.} , \quad \text{with variance} \quad V(\bar{y}_n) = \frac{1}{n} \sum_{i=1}^{N} \frac{M_i}{M_0} (\bar{y}_{i.} - \bar{Y})^2 .$$
Proof: By definition,
$$E(\bar{y}_n) = E\left( \frac{1}{n} \sum_{i=1}^{n} \bar{y}_{i.} \right) = \frac{1}{n} \sum_{i=1}^{n} E(\bar{y}_{i.}) = \frac{1}{n} \sum_{i=1}^{n} \left( \sum_{i=1}^{N} p_i \bar{y}_{i.} \right) = \frac{1}{M_0} \sum_{i=1}^{N} M_i \bar{y}_{i.} = \bar{Y} .$$
This shows that $\bar{y}_n$ is an unbiased estimator of $\bar{Y}$.
To obtain the variance of $\bar{y}_n$, we have
$$V(\bar{y}_n) = E[\bar{y}_n - E(\bar{y}_n)]^2 = E(\bar{y}_n^2) - \bar{Y}^2 . \quad (7.13)$$
Consider
$$E(\bar{y}_n^2) = \frac{1}{n^2} E\left( \sum_{i=1}^{n} \bar{y}_{i.} \right)^2 = \frac{1}{n^2} \left[ \sum_{i=1}^{n} E(\bar{y}_{i.}^2) + \sum_{i \neq i'}^{n} \sum E(\bar{y}_{i.}) E(\bar{y}_{i'.}) \right]$$
$$= \frac{1}{n^2} \left[ n \sum_{i=1}^{N} \frac{M_i}{M_0} \bar{y}_{i.}^2 + n(n-1) \bar{Y}^2 \right] ,$$
since the $i$-th cluster is drawn with probability $\dfrac{M_i}{M_0}$ and the sampling of clusters is wr, i.e. $E(\bar{y}_{i.}) = \bar{Y} = E(\bar{y}_{i'.})$. Thus
$$E(\bar{y}_n^2) = \frac{1}{n} \left[ \sum_{i=1}^{N} \frac{M_i}{M_0} \bar{y}_{i.}^2 + (n-1) \bar{Y}^2 \right] . \quad (7.14)$$
In view of equations (7.14) and (7.13), we get
$$V(\bar{y}_n) = \frac{1}{n} \sum_{i=1}^{N} \frac{M_i}{M_0} \bar{y}_{i.}^2 + \frac{n-1}{n} \bar{Y}^2 - \bar{Y}^2 = \frac{1}{n} \left[ \sum_{i=1}^{N} \frac{M_i}{M_0} \bar{y}_{i.}^2 - \bar{Y}^2 \right] = \frac{1}{n} \sum_{i=1}^{N} \frac{M_i}{M_0} (\bar{y}_{i.} - \bar{Y})^2 = \frac{\sigma_b^2}{n} , \text{ (say)} .$$
Estimation of $V(\bar{y}_n)$

Define
$$s_b^2 = \frac{1}{n-1} \sum_{i=1}^{n} (\bar{y}_{i.} - \bar{y}_n)^2 , \text{ then}$$
$$E(s_b^2) = \frac{1}{n-1} \left[ \sum_{i=1}^{n} E(\bar{y}_{i.}^2) - n E(\bar{y}_n^2) \right] = \frac{1}{n-1} \left[ n \sum_{i=1}^{N} \frac{M_i}{M_0} \bar{y}_{i.}^2 - n \left\{ V(\bar{y}_n) + \bar{Y}^2 \right\} \right]$$
$$= \frac{1}{n-1} \left[ n \left( \sum_{i=1}^{N} \frac{M_i}{M_0} \bar{y}_{i.}^2 - \bar{Y}^2 \right) - n V(\bar{y}_n) \right] = \frac{1}{n-1} \left[ n \sum_{i=1}^{N} \frac{M_i}{M_0} (\bar{y}_{i.} - \bar{Y})^2 - n \frac{\sigma_b^2}{n} \right]$$
$$= \frac{1}{n-1} \left( n \sigma_b^2 - \sigma_b^2 \right) = \sigma_b^2 .$$
This shows that $s_b^2$ is an unbiased estimate of $\sigma_b^2$. Therefore, $\hat{V}(\bar{y}_n) = \dfrac{1}{n} s_b^2$ is an unbiased estimate of $V(\bar{y}_n) = \sigma_b^2 / n$.
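A small simulation sketch of the pps-with-replacement scheme (Python/NumPy; all values illustrative) that draws clusters with probabilities $M_i / M_0$ and applies $\bar{y}_n$ and $\hat{V}(\bar{y}_n) = s_b^2 / n$:

```python
import numpy as np

rng = np.random.default_rng(11)
N, n = 30, 6
M_i = rng.integers(5, 40, size=N)                  # unequal cluster sizes (illustrative)
ybar_i = rng.normal(10, 2, size=N)                 # population cluster means
M0 = M_i.sum()
Ybar = (M_i * ybar_i).sum() / M0                   # population mean per element

# Draw n clusters with probabilities p_i = M_i / M0, with replacement.
idx = rng.choice(N, size=n, replace=True, p=M_i / M0)
y = ybar_i[idx]

ybar_n = y.mean()                                  # unbiased for Ybar
v_hat = y.var(ddof=1) / n                          # Vhat(ybar_n) = s_b^2 / n
print("estimate:", ybar_n, " Vhat:", v_hat, " true Ybar:", Ybar)
```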