CLUSTER SAMPLING

In random sampling it is presumed that the population has been divided into a finite number of distinct and identifiable units called sampling units. The smallest units into which the population can be divided are called the elements of the population, and a group of such elements is known as a cluster. After the population has been divided into specified clusters (as a working rule, the number of elements in a cluster should be small and the number of clusters should be large), the required number of clusters is drawn either with equal or with unequal probabilities of selection. Such a procedure, in which the sampling unit is a cluster, is called cluster sampling. If the entire area containing the population under study is subdivided into smaller area segments, and each element of the population is associated with one and only one such segment, the procedure is alternatively called area sampling.

There are two main reasons for using a cluster as the sampling unit.

i) Usually a complete list of the population units is not available, so the use of the individual unit as the sampling unit is not feasible.

ii) Even when a complete list of the population units is available, using clusters as sampling units can reduce the cost of sampling considerably. For instance, in a population survey it may be cheaper to collect data from all persons in a sample of households than from a sample of the same number of persons selected directly from the list of all persons. Similarly, it is operationally more convenient to survey all households situated in a sample of areas such as villages than to survey the same number of households selected at random from a list of all households.
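As a concrete illustration of the procedure just described, the following minimal sketch uses a made-up population of households grouped into equal area segments (all numbers here are hypothetical): a simple random sample of clusters is drawn, and every element of each selected cluster is enumerated.

```python
import random

# Hypothetical population: 200 households (elements) grouped into
# 40 area segments (clusters) of 5 households each.
random.seed(1)
population = [random.gauss(50, 10) for _ in range(200)]
clusters = [population[5 * i:5 * i + 5] for i in range(40)]

# Cluster sampling: draw n clusters by simple random sampling without
# replacement and enumerate every element of each selected cluster.
n = 8
sampled_clusters = random.sample(clusters, n)
elements = [y for cluster in sampled_clusters for y in cluster]

estimate = sum(elements) / len(elements)  # sample mean per element
```

Note that the sample size in elements is fixed at $nM$ here only because the clusters are of equal size; the unequal-cluster case is treated later in the chapter.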
Another example of the utility of cluster sampling is provided by crop surveys, where locating a randomly selected farm or plot takes a considerable part of the total time of the survey, but once the plot is located, the time taken to identify and survey a few neighbouring plots is generally only marginal.

Theory of equal clusters

RU Khan

Suppose the population consists of $N$ clusters, each of $M$ elements. A sample of $n$ clusters is drawn by the method of simple random sampling, and every unit in the selected clusters is enumerated. Let us write

$y_{ij}$ = value of the $j$th element in the $i$th cluster, $j = 1, \dots, M$; $i = 1, \dots, N$.

$\bar{y}_{i.} = \frac{1}{M} \sum_{j=1}^{M} y_{ij}$, mean per element of the $i$th cluster.

$\bar{Y}_N = \frac{1}{N} \sum_{i=1}^{N} \bar{y}_{i.}$, mean of cluster means in the population of $N$ clusters.

$\bar{Y} = \frac{1}{NM} \sum_{i=1}^{N} \sum_{j=1}^{M} y_{ij}$, mean per element in the population.

$\bar{y}_n = \frac{1}{n} \sum_{i=1}^{n} \bar{y}_{i.}$, mean of cluster means in a sample of $n$ clusters.

$\bar{y} = \frac{1}{nM} \sum_{i=1}^{n} \sum_{j=1}^{M} y_{ij}$, mean per element in the sample.

Since the clusters are of equal size, $\bar{Y}_N = \bar{Y}$ and $\bar{y}_n = \bar{y}$.

Note:

$S_i^2 = \frac{1}{M-1} \sum_{j=1}^{M} (y_{ij} - \bar{y}_{i.})^2$, mean square between elements within the $i$th cluster.

$S_w^2 = \frac{1}{N} \sum_{i=1}^{N} S_i^2$, mean square within clusters.

$S_b^2 = \frac{1}{N-1} \sum_{i=1}^{N} (\bar{y}_{i.} - \bar{Y}_N)^2$, mean square between cluster means in the population.

$S^2 = \frac{1}{NM-1} \sum_{i=1}^{N} \sum_{j=1}^{M} (y_{ij} - \bar{Y})^2$, mean square between elements in the population.

$\rho = \dfrac{E(y_{ij} - \bar{Y})(y_{ik} - \bar{Y})}{E(y_{ij} - \bar{Y})^2} = \dfrac{\frac{1}{NM(M-1)} \sum_{i=1}^{N} \sum_{j \ne k}^{M} (y_{ij} - \bar{Y})(y_{ik} - \bar{Y})}{\frac{1}{NM} \sum_{i=1}^{N} \sum_{j=1}^{M} (y_{ij} - \bar{Y})^2} = \dfrac{\sum_{i=1}^{N} \sum_{j \ne k}^{M} (y_{ij} - \bar{Y})(y_{ik} - \bar{Y})}{(M-1)(NM-1) S^2}$,

the intracluster correlation coefficient between elements within clusters.

Theorem: If a simple random sample, wor, of $n$ clusters, each having $M$ elements, is drawn from a population of $N$ clusters, the sample mean $\bar{y}_n$ is an unbiased estimator of the population mean $\bar{Y}$, and its variance is $V(\bar{y}_n) = \frac{1-f}{n} S_b^2 = \left( \frac{1}{n} - \frac{1}{N} \right) S_b^2$.

Proof: We have

$E(\bar{y}_n) = E \left( \frac{1}{n} \sum_{i=1}^{n} \bar{y}_{i.} \right) = \frac{1}{n} \sum_{i=1}^{n} E(\bar{y}_{i.}) = \frac{1}{N} \sum_{i=1}^{N} \bar{y}_{i.} = \bar{Y}_N = \bar{Y}$.

To obtain the variance, we have, by definition,

$V(\bar{y}_n) = E(\bar{y}_n - \bar{Y}_N)^2 = E \left[ \frac{1}{n} \sum_{i=1}^{n} (\bar{y}_{i.} - \bar{Y}_N) \right]^2 = \frac{1}{n^2} \left[ \sum_{i=1}^{n} E(\bar{y}_{i.} - \bar{Y}_N)^2 + \sum_{i \ne i'}^{n} E(\bar{y}_{i.} - \bar{Y}_N)(\bar{y}_{i'.} - \bar{Y}_N) \right]$.

Consider

$E(\bar{y}_{i.} - \bar{Y}_N)^2 = \frac{1}{N} \sum_{i=1}^{N} (\bar{y}_{i.} - \bar{Y}_N)^2 = \frac{N-1}{N} S_b^2$  (7.1)

and

$E(\bar{y}_{i.} - \bar{Y}_N)(\bar{y}_{i'.} - \bar{Y}_N) = \frac{1}{N(N-1)} \sum_{i \ne i'}^{N} (\bar{y}_{i.} - \bar{Y}_N)(\bar{y}_{i'.} - \bar{Y}_N)$

$= \frac{1}{N(N-1)} \left[ \sum_{i=1}^{N} (\bar{y}_{i.} - \bar{Y}_N) \sum_{i'=1}^{N} (\bar{y}_{i'.} - \bar{Y}_N) - \sum_{i=1}^{N} (\bar{y}_{i.} - \bar{Y}_N)^2 \right] = - \frac{1}{N(N-1)} \sum_{i=1}^{N} (\bar{y}_{i.} - \bar{Y}_N)^2 = - \frac{1}{N} S_b^2$.  (7.2)

In view of equations (7.1) and (7.2), $V(\bar{y}_n)$ reduces to

$V(\bar{y}_n) = \frac{1}{n^2} \left[ n \, \frac{N-1}{N} S_b^2 - \frac{n(n-1)}{N} S_b^2 \right] = \frac{N-n}{nN} S_b^2 = \frac{1-f}{n} S_b^2$.

Note: For large $N$, $V(\bar{y}_n) \approx \frac{1}{n} S_b^2$.

Alternative expression of $V(\bar{y}_n)$ in terms of the intracluster correlation coefficient

From the definition of $\rho$ given above,

$\sum_{i=1}^{N} \sum_{j \ne k}^{M} (y_{ij} - \bar{Y})(y_{ik} - \bar{Y}) = \rho (M-1)(NM-1) S^2$.

By definition,

$V(\bar{y}_n) = \frac{1-f}{n} S_b^2 = \frac{1-f}{n(N-1)} \sum_{i=1}^{N} (\bar{y}_{i.} - \bar{Y}_N)^2$.  (7.3)

Consider

$\sum_{i=1}^{N} (\bar{y}_{i.} - \bar{Y}_N)^2 = \sum_{i=1}^{N} \left( \frac{1}{M} \sum_{j=1}^{M} y_{ij} - \bar{Y} \right)^2 = \frac{1}{M^2} \sum_{i=1}^{N} \left[ \sum_{j=1}^{M} (y_{ij} - \bar{Y}) \right]^2$, as $\bar{Y}_N = \bar{Y}$,

$= \frac{1}{M^2} \left[ \sum_{i=1}^{N} \sum_{j=1}^{M} (y_{ij} - \bar{Y})^2 + \sum_{i=1}^{N} \sum_{j \ne k}^{M} (y_{ij} - \bar{Y})(y_{ik} - \bar{Y}) \right]$  (7.4)

$= \frac{1}{M^2} \left[ (NM-1) S^2 + \rho (M-1)(NM-1) S^2 \right] = \frac{(NM-1) S^2}{M^2} [1 + (M-1)\rho]$.  (7.5)

Substituting equation (7.5) in equation (7.3), we get

$V(\bar{y}_n) = \frac{1-f}{n} \, \frac{(NM-1) S^2}{M^2 (N-1)} [1 + (M-1)\rho]$.

Note: For large $N$, $\frac{1}{N} \to 0$, so that $(1-f) \to 1$ and $\frac{NM-1}{M^2(N-1)} = \frac{N(M - 1/N)}{N M^2 (1 - 1/N)} \approx \frac{1}{M}$. Hence

$V(\bar{y}_n) \approx \frac{S^2}{nM} [1 + (M-1)\rho]$.

Corollary: $\hat{Y} = NM \bar{y}_n$ is an unbiased estimate of the population total $Y$, and its variance is

$V(\hat{Y}) = N^2 M^2 \, \frac{1-f}{n} S_b^2 = N^2 M^2 \, \frac{1-f}{n} \, \frac{(NM-1) S^2}{M^2 (N-1)} [1 + (M-1)\rho] \approx \frac{1-f}{n} N^2 M S^2 [1 + (M-1)\rho]$, for large $N$.

Estimation of the variance $V(\bar{y}_n)$

Define $s_b^2 = \frac{1}{n-1} \sum_{i=1}^{n} (\bar{y}_{i.} - \bar{y}_n)^2 = \frac{1}{n-1} \left[ \sum_{i=1}^{n} \bar{y}_{i.}^2 - n \bar{y}_n^2 \right]$; then

$E(s_b^2) = \frac{1}{n-1} \left[ \sum_{i=1}^{n} E(\bar{y}_{i.}^2) - n E(\bar{y}_n^2) \right]$.

Note that $V(\bar{y}_{i.}) = E(\bar{y}_{i.}^2) - \bar{Y}_N^2$, so that

$E(\bar{y}_{i.}^2) = \frac{N-1}{N} S_b^2 + \bar{Y}_N^2$  (7.6)
and $V(\bar{y}_n) = E(\bar{y}_n^2) - \bar{Y}_N^2$, so that

$E(\bar{y}_n^2) = \frac{N-n}{nN} S_b^2 + \bar{Y}_N^2$.  (7.7)

In view of equations (7.6) and (7.7), $E(s_b^2)$ reduces to

$E(s_b^2) = \frac{1}{n-1} \left[ n \, \frac{N-1}{N} S_b^2 - \frac{N-n}{N} S_b^2 \right] = \frac{S_b^2}{(n-1)N} [n(N-1) - (N-n)] = \frac{N(n-1)}{(n-1)N} S_b^2 = S_b^2$.

This shows that $s_b^2$ is an unbiased estimate of $S_b^2$. Hence $v(\bar{y}_n) = \frac{1-f}{n} s_b^2$ is an unbiased estimator of $V(\bar{y}_n) = \frac{1-f}{n} S_b^2$.

Relative efficiency (RE) of cluster sampling

In sampling $nM$ elements from the population by simple random sampling, wor, the variance of the sample mean $\bar{y}$ is given by

$V(\bar{y}_{sr}) = \frac{NM - nM}{NM} \, \frac{S^2}{nM} = \frac{1-f}{nM} S^2$, and $V(\bar{y}_n) = \frac{1-f}{n} S_b^2$.

Thus, the relative efficiency of cluster sampling compared with simple random sampling is

$RE = \frac{V(\bar{y}_{sr})}{V(\bar{y}_n)} = \frac{S^2}{M S_b^2}$.

This shows that the efficiency of cluster sampling increases as the mean square between cluster means $S_b^2$ decreases.

Note: For large $N$, the relative efficiency of cluster sampling in terms of the intracluster correlation coefficient is given by

$RE = \frac{V(\bar{y}_{sr})}{V(\bar{y}_n)} = \frac{1}{1 + (M-1)\rho}$.

It can be seen that the relative efficiency depends on the value of $\rho$:

i) if $\rho = 0$, then $V(\bar{y}_{sr}) = V(\bar{y}_n)$, i.e. both methods are equally precise;

ii) if $\rho > 0$, then $V(\bar{y}_{sr}) < V(\bar{y}_n)$, i.e. simple random sampling is more precise;

iii) if $\rho < 0$, then $V(\bar{y}_{sr}) > V(\bar{y}_n)$, i.e. cluster sampling is more precise.

Estimation of the relative efficiency of cluster sampling

We have Est.$(RE) = \dfrac{\text{Est.}(S^2)}{M \, \text{Est.}(S_b^2)}$. Here $s^2$ will not be an unbiased estimate of $S^2$, i.e. $E(s^2) \ne S^2$, because a sample of $nM$ elements is not taken randomly from the population of $NM$ elements. To find an unbiased estimate of $S^2$, consider

$(NM-1) S^2 = \sum_{i=1}^{N} \sum_{j=1}^{M} (y_{ij} - \bar{Y})^2 = \sum_{i=1}^{N} \sum_{j=1}^{M} \left[ (y_{ij} - \bar{y}_{i.}) + (\bar{y}_{i.} - \bar{Y}) \right]^2$

$= \sum_{i=1}^{N} \sum_{j=1}^{M} (y_{ij} - \bar{y}_{i.})^2 + M \sum_{i=1}^{N} (\bar{y}_{i.} - \bar{Y})^2 + 0 = \sum_{i=1}^{N} (M-1) S_i^2 + M(N-1) S_b^2 = N(M-1) S_w^2 + M(N-1) S_b^2$.  (7.8)
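Both the variance theorem for $\bar{y}_n$ and the identity (7.8) can be verified by complete enumeration on a tiny artificial population; the cluster values below are made up for illustration.

```python
from itertools import combinations
from statistics import mean, variance  # sample variance, divisor (k - 1)

# Tiny artificial population: N = 4 clusters of M = 3 elements each.
clusters = [[2, 4, 6], [1, 3, 5], [7, 9, 8], [4, 4, 7]]
N, M, n = 4, 3, 2

all_values = [y for c in clusters for y in c]
Ybar = mean(all_values)                       # population mean per element
S2 = variance(all_values)                     # S^2, divisor NM - 1
Sb2 = variance([mean(c) for c in clusters])   # S_b^2, divisor N - 1
Sw2 = mean(variance(c) for c in clusters)     # S_w^2 = (1/N) sum of S_i^2

# Exact distribution of ybar_n over all C(N, n) cluster samples
estimates = [mean(mean(c) for c in s) for s in combinations(clusters, n)]
unbiased = abs(mean(estimates) - Ybar) < 1e-9            # E(ybar_n) = Ybar
V_exact = mean((e - Ybar) ** 2 for e in estimates)
V_ok = abs(V_exact - (1 / n - 1 / N) * Sb2) < 1e-9       # variance theorem

# Analysis-of-variance identity (7.8)
identity_ok = abs((N * M - 1) * S2
                  - (N * (M - 1) * Sw2 + M * (N - 1) * Sb2)) < 1e-9
print(unbiased, V_ok, identity_ok)  # -> True True True
```

Enumerating all $\binom{N}{n}$ samples is feasible only for toy populations, but it checks the formulas exactly rather than by simulation.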
It can be seen that in a random sample of $n$ clusters, $s_b^2$ and $s_w^2$ will provide unbiased estimates of $S_b^2$ and $S_w^2$, respectively. Define

$s_w^2 = \frac{1}{n(M-1)} \sum_{i=1}^{n} \sum_{j=1}^{M} (y_{ij} - \bar{y}_{i.})^2$, and $s_b^2 = \frac{1}{n-1} \sum_{i=1}^{n} (\bar{y}_{i.} - \bar{y}_n)^2$.

Consider

$s_w^2 = \frac{1}{n(M-1)} \left[ \sum_{i=1}^{n} \sum_{j=1}^{M} y_{ij}^2 - M \sum_{i=1}^{n} \bar{y}_{i.}^2 \right]$, so that

$E(s_w^2) = \frac{1}{n(M-1)} \left[ \sum_{i=1}^{n} \sum_{j=1}^{M} E(y_{ij}^2) - M \sum_{i=1}^{n} E(\bar{y}_{i.}^2) \right]$.

Note that $V(y_{ij}) = E(y_{ij}^2) - \bar{Y}_N^2$, so that $E(y_{ij}^2) = \frac{NM-1}{NM} S^2 + \bar{Y}_N^2$. Similarly, $E(\bar{y}_{i.}^2) = \frac{N-1}{N} S_b^2 + \bar{Y}_N^2$.

Therefore,

$E(s_w^2) = \frac{1}{n(M-1)} \left[ nM \, \frac{NM-1}{NM} S^2 + nM \bar{Y}_N^2 - nM \, \frac{N-1}{N} S_b^2 - nM \bar{Y}_N^2 \right] = \frac{1}{N(M-1)} \left[ (NM-1) S^2 - M(N-1) S_b^2 \right] = \frac{1}{N(M-1)} \left[ N(M-1) S_w^2 \right] = S_w^2$,

by using the relation given in equation (7.8). Also $E(s_b^2) = S_b^2$, as the $n$ clusters are drawn under srswor.

Thus, an unbiased estimate of $S^2$ will be

$\hat{S}^2 = \frac{1}{NM-1} \left[ N(M-1) s_w^2 + M(N-1) s_b^2 \right]$.

Therefore,

$\text{Est.}(RE) = \dfrac{N(M-1) s_w^2 + M(N-1) s_b^2}{(NM-1) \, M s_b^2}$.

Note: For large $N$,

$\text{Est.}(RE) = \frac{N(M-1) s_w^2 + M(N-1) s_b^2}{N(M - 1/N) \, M s_b^2} \approx \frac{N(M-1) s_w^2 + NM s_b^2}{N M^2 s_b^2} = \frac{(M-1) s_w^2 + M s_b^2}{M^2 s_b^2}$.

Estimation of $\rho$

For large $N$, $RE = \frac{1}{1 + (M-1)\rho} = E$ (say), so that

$1 + (M-1)\hat{\rho} = \frac{1}{\hat{E}}$, i.e. $\hat{\rho} = \frac{1 - \hat{E}}{(M-1)\hat{E}}$, where $\hat{E} = \frac{(M-1) s_w^2 + M s_b^2}{M^2 s_b^2}$.

Hence

$\hat{\rho} = \frac{M^2 s_b^2 - [(M-1) s_w^2 + M s_b^2]}{(M-1)[(M-1) s_w^2 + M s_b^2]} = \frac{M(M-1) s_b^2 - (M-1) s_w^2}{(M-1)[(M-1) s_w^2 + M s_b^2]} = \frac{M s_b^2 - s_w^2}{(M-1) s_w^2 + M s_b^2}$.

Alternative method

We have

$\rho = \dfrac{\sum_{i=1}^{N} \sum_{j \ne k}^{M} (y_{ij} - \bar{Y})(y_{ik} - \bar{Y})}{(M-1)(NM-1) S^2}$, and $(NM-1) S^2 = N(M-1) S_w^2 + M(N-1) S_b^2$.

Note that, from equation (7.4),

$M^2 \sum_{i=1}^{N} (\bar{y}_{i.} - \bar{Y}_N)^2 = \sum_{i=1}^{N} \sum_{j=1}^{M} (y_{ij} - \bar{Y})^2 + \sum_{i=1}^{N} \sum_{j \ne k}^{M} (y_{ij} - \bar{Y})(y_{ik} - \bar{Y})$,

or

$\sum_{i=1}^{N} \sum_{j \ne k}^{M} (y_{ij} - \bar{Y})(y_{ik} - \bar{Y}) = M^2 (N-1) S_b^2 - (NM-1) S^2$.

Therefore

$\rho (M-1)(NM-1) S^2 = M^2 (N-1) S_b^2 - N(M-1) S_w^2 - M(N-1) S_b^2 = (M-1) \left[ M(N-1) S_b^2 - N S_w^2 \right]$.
Hence,

$\rho = \frac{M(N-1) S_b^2 - N S_w^2}{(NM-1) S^2} = \frac{M(N-1) S_b^2 - N S_w^2}{M(N-1) S_b^2 + N(M-1) S_w^2}$.

It can be seen that in a random sample of $n$ clusters, $s_b^2$ and $s_w^2$ will provide unbiased estimates of $S_b^2$ and $S_w^2$, respectively. Therefore, an estimator of $\rho$ will be

$\hat{\rho} = \frac{M(N-1) s_b^2 - N s_w^2}{M(N-1) s_b^2 + N(M-1) s_w^2}$, and for large $N$, $\hat{\rho} = \frac{M s_b^2 - s_w^2}{M s_b^2 + (M-1) s_w^2}$.

Determination of optimum cluster size

The best size of cluster to use depends on the cost of collecting information from clusters and on the resulting variance. Regarding the variance function, it is found that the variability between elements within clusters increases as the size of cluster increases (that is, large clusters are found to be more heterogeneous than small clusters) and decreases as the number of clusters increases. On the other hand, the cost decreases as the size of cluster increases and increases with the number of clusters. Hence it is necessary to find a balancing point by determining the optimum cluster size and the number of clusters in the sample, so as to minimize the sampling variance for a given cost or, alternatively, minimize the cost for a fixed variance.

The cost of a survey, apart from overhead cost, will be made up of two components:

i) the cost of enumerating the elements in the sample and of travelling within the clusters, which is proportional to the number of elements in the sample;

ii) the cost of travelling between clusters, which is proportional to the distance to be travelled between clusters. It has been shown empirically that the expected value of the minimum distance between $n$ points located at random is proportional to $\sqrt{n}$.

The cost of a survey can therefore be expressed as

$C = c_1 n M + c_2 \sqrt{n}$,

where $c_1$ is the cost of collecting information from an element within a cluster and $c_2$ is the cost per unit distance travelled between clusters.
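The identity for $\rho$ derived just above can be checked numerically on a small made-up population, comparing the direct definition (a sum over ordered pairs $j \ne k$ within each cluster) with the expression in terms of $S_b^2$ and $S_w^2$.

```python
from statistics import mean, variance

# Made-up population: N = 3 clusters of M = 4 elements each.
clusters = [[1, 2, 2, 3], [5, 6, 7, 6], [2, 4, 3, 3]]
N, M = 3, 4
Y = mean(y for c in clusters for y in c)
S2 = variance([y for c in clusters for y in c])

# rho from its definition (ordered pairs j != k within each cluster)
cross = sum((c[j] - Y) * (c[k] - Y)
            for c in clusters
            for j in range(M) for k in range(M) if j != k)
rho_direct = cross / ((M - 1) * (N * M - 1) * S2)

# rho from the identity in terms of S_b^2 and S_w^2
Sb2 = variance([mean(c) for c in clusters])
Sw2 = mean(variance(c) for c in clusters)
rho_identity = ((M * (N - 1) * Sb2 - N * Sw2)
                / (M * (N - 1) * Sb2 + N * (M - 1) * Sw2))

print(abs(rho_direct - rho_identity) < 1e-9)  # -> True
```

For this particular population the clusters are internally homogeneous, so $\rho$ comes out large and positive, the case where cluster sampling loses efficiency.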
In various agricultural surveys it has been observed that $S_w^2$ is related to $M$ by the relation $S_w^2 = a M^g$, $g > 0$, where $a$ and $g$ are positive constants. Then

$S_b^2 = \frac{(NM-1) S^2 - N(M-1) a M^g}{M(N-1)} \approx S^2 - (M-1) a M^{g-1}$, for large $N$.

Thus, for large $N$, the variance $V(\bar{y}_n)$ reduces to

$V(\bar{y}_n) = \frac{1}{n} \left[ S^2 - (M-1) a M^{g-1} \right]$.

The problem is to determine $n$ and $M$ such that, for a specified cost, the variance of $\bar{y}_n$ is a minimum. Using calculus methods we form

$\phi = V(\bar{y}_n) + \lambda (c_1 n M + c_2 \sqrt{n} - C)$,

where $\lambda$ is an unknown constant. Differentiating with respect to $n$ and $M$ respectively, and equating the results to zero, we obtain

$0 = -\frac{1}{n^2} \left[ S^2 - (M-1) a M^{g-1} \right] + \lambda \left( c_1 M + \frac{c_2}{2\sqrt{n}} \right)$, so that $V(\bar{y}_n) = \lambda \left( c_1 n M + \frac{c_2 \sqrt{n}}{2} \right)$  (7.9)

and

$0 = \frac{\partial V(\bar{y}_n)}{\partial M} + \lambda c_1 n$, so that $-M \, \frac{\partial V(\bar{y}_n)}{\partial M} = \lambda c_1 n M$.  (7.10)

On eliminating $\lambda$ from equations (7.9) and (7.10), we have

$\dfrac{V(\bar{y}_n)}{-M \, \partial V(\bar{y}_n)/\partial M} = \dfrac{c_1 n M + c_2 \sqrt{n}/2}{c_1 n M} = 1 + \dfrac{c_2}{2 c_1 M \sqrt{n}}$.

Now, solving $c_1 n M + c_2 \sqrt{n} - C = 0$ as a quadratic in $\sqrt{n}$, we have

$\sqrt{n} = \dfrac{-c_2 + \sqrt{c_2^2 + 4 c_1 M C}}{2 c_1 M} = \dfrac{c_2}{2 c_1 M} \left[ \left( 1 + \dfrac{4 c_1 M C}{c_2^2} \right)^{1/2} - 1 \right]$.

Hence

$\dfrac{V(\bar{y}_n)}{-M \, \partial V(\bar{y}_n)/\partial M} = 1 + \left[ \left( 1 + \dfrac{4 c_1 M C}{c_2^2} \right)^{1/2} - 1 \right]^{-1}$.  (7.11)

Now, evaluating the left-hand side of equation (7.11): since

$-M \, \dfrac{\partial V(\bar{y}_n)}{\partial M} = \dfrac{1}{n} \left[ a g M^g - a(g-1) M^{g-1} \right] = \dfrac{a M^{g-1}}{n} \left[ gM - (g-1) \right]$,

equation (7.11) can be inverted and written as

$\dfrac{a M^{g-1} \left[ gM - (g-1) \right]}{n V(\bar{y}_n)} = 1 - \left( 1 + \dfrac{4 c_1 M C}{c_2^2} \right)^{-1/2}$.  (7.12)

It is difficult to get an explicit expression for $M$ from equation (7.12); since $n V(\bar{y}_n) = S^2 - (M-1) a M^{g-1}$ does not involve $n$, (7.12) is an equation in $M$ alone, and $M$ can be obtained by the iterative (trial and error) method. On substituting the value of $M$ thus obtained in the expression for $\sqrt{n}$ above, we can obtain the optimum value of $n$.

It is evident from equation (7.12) that the optimum size of cluster becomes smaller when

i) $c_1$ increases, i.e. the time of measurement increases;

ii) $c_2$ decreases, i.e. travel becomes cheaper;

iii) the total cost of the survey $C$ increases.
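The trial-and-error solution described above can be sketched as a direct search: for each trial $M$ the cost constraint fixes $n$, and we keep the $M$ that minimizes the large-$N$ variance, which is equivalent to solving (7.12) iteratively. All constants below ($S^2$, $a$, $g$, $c_1$, $c_2$, $C$) are assumed purely for illustration, and the empirical model $S_w^2 = aM^g$ is trusted only over a moderate range of $M$.

```python
import math

# Assumed inputs: variance model S_w^2 = a * M**g and cost C = c1*n*M + c2*sqrt(n).
S2, a, g = 100.0, 20.0, 0.4
c1, c2, C = 1.0, 200.0, 2000.0

def n_for(M):
    # Positive root of the quadratic c1*M*x^2 + c2*x - C = 0 in x = sqrt(n).
    root = (-c2 + math.sqrt(c2 * c2 + 4.0 * c1 * M * C)) / (2.0 * c1 * M)
    return root * root

def V(M):
    # Large-N variance: V(ybar_n) = [S^2 - a*(M-1)*M^(g-1)] / n
    return (S2 - a * (M - 1) * M ** (g - 1)) / n_for(M)

# Trial-and-error search over a moderate range of M, playing the role of (7.12)
best_M = min(range(1, 11), key=V)
print(best_M, round(n_for(best_M), 1))  # optimum cluster size and number of clusters
```

With these particular constants the search settles on a small cluster size with a fractional $n$ that would be rounded in practice; changing $c_2$ upward (costlier travel) pushes the optimum toward larger clusters, as the text predicts.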
Cluster sampling for proportions

Suppose it is desired to estimate the proportion $P$ of elements belonging to a specified class $A$ when the population consists of $N$ clusters, each of size $M$, and a random sample, wor, of $n$ clusters is selected. Defining $y_{ij}$ as 1 if the $j$th element of the $i$th cluster belongs to the class $A$ and 0 otherwise, it is easy to note that $a_i = \sum_{j=1}^{M} y_{ij}$ gives the total number of elements in the $i$th cluster that belong to class $A$, and $p_i = a_i / M$ is the proportion in the $i$th cluster. Hence the proportion $P$ is

$P = \frac{1}{NM} \sum_{i=1}^{N} \sum_{j=1}^{M} y_{ij} = \frac{1}{NM} \sum_{i=1}^{N} a_i = \frac{1}{N} \sum_{i=1}^{N} p_i$.

An unbiased estimate of $P$ is $\hat{P} = p = \frac{1}{n} \sum_{i=1}^{n} p_i$, and

$V(p) = \left( \frac{1}{n} - \frac{1}{N} \right) \frac{1}{N-1} \sum_{i=1}^{N} (p_i - P)^2 \approx \frac{1}{nN} \sum_{i=1}^{N} (p_i - P)^2$, for large $N$.

As an estimate of $V(p)$ we may use

$\hat{V}(p) = \left( \frac{1}{n} - \frac{1}{N} \right) \frac{1}{n-1} \sum_{i=1}^{n} (p_i - p)^2$.

Alternatively, if we take a simple random sample, wor, of $nM$ elements from the population of size $NM$, the variance of $p$ is

$V(p) = \frac{NM - nM}{NM - 1} \, \frac{PQ}{nM} \approx \left( 1 - \frac{n}{N} \right) \frac{PQ}{nM}$, for large $N$.

Theory of unequal clusters

There are a number of situations where the cluster size varies from cluster to cluster: for example, villages or urban blocks, which are groups of households, and households, which are groups of persons, are usually considered as clusters for purposes of sampling because of operational convenience.

Suppose the population consists of $N$ clusters of sizes $M_1, M_2, \dots, M_N$, such that $\sum_{i=1}^{N} M_i = M_0$. A sample of $n$ clusters is drawn by the method of simple random sampling, wor, and all elements of the selected clusters are surveyed. Let us write

$y_{ij}$ = value of the $j$th element in the $i$th cluster, $j = 1, 2, \dots, M_i$; $i = 1, 2, \dots, N$.

$\bar{y}_{i.} = \frac{1}{M_i} \sum_{j=1}^{M_i} y_{ij}$, mean per element of the $i$th cluster.

$\bar{Y}_N = \frac{1}{N} \sum_{i=1}^{N} \bar{y}_{i.}$, mean of the cluster means in the population of $N$ clusters.

$\bar{y}_n = \frac{1}{n} \sum_{i=1}^{n} \bar{y}_{i.}$, mean of the cluster means in the sample of $n$ clusters.

$\bar{Y} = \frac{1}{M_0} \sum_{i=1}^{N} \sum_{j=1}^{M_i} y_{ij} = \frac{1}{M_0} \sum_{i=1}^{N} M_i \bar{y}_{i.}$, mean per element in the population.

$\bar{M} = \frac{M_0}{N} = \frac{1}{N} \sum_{i=1}^{N} M_i$, mean cluster size.
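Returning briefly to the proportions case above, the estimator $\hat{P} = p$ and its variance estimate can be sketched as follows; the counts are hypothetical.

```python
from statistics import mean, variance

# Hypothetical data: a wor sample of n = 5 clusters of M = 8 households
# each, from N = 60 clusters; a[i] = households of class A in cluster i.
N, M, n = 60, 8, 5
a = [3, 5, 2, 4, 6]
p_i = [ai / M for ai in a]                 # cluster proportions p_i = a_i / M

p_hat = mean(p_i)                          # unbiased estimate of P
v_hat = (1 / n - 1 / N) * variance(p_i)    # estimated variance of p_hat
print(p_hat, round(v_hat, 6))              # -> 0.5 0.007161
```

Note that the variance estimate is built from the between-cluster spread of the $p_i$ alone; the within-cluster composition enters only through each $a_i$.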
Three estimators of the population mean $\bar{Y}$ that are in common use may be considered.

1st estimator: It is defined by the sample mean of cluster means, $\bar{y}_I = \frac{1}{n} \sum_{i=1}^{n} \bar{y}_{i.} = \bar{y}_n$.

By definition,

$E(\bar{y}_I) = E \left( \frac{1}{n} \sum_{i=1}^{n} \bar{y}_{i.} \right) = \frac{1}{N} \sum_{i=1}^{N} \bar{y}_{i.} = \bar{Y}_N \ne \bar{Y}$, as the sampling is simple random.

Thus, $\bar{y}_I$ is a biased estimator of the population mean $\bar{Y}$. The bias of the estimator is given by

$B = E(\bar{y}_I) - \bar{Y} = \frac{1}{N} \sum_{i=1}^{N} \bar{y}_{i.} - \frac{1}{N\bar{M}} \sum_{i=1}^{N} M_i \bar{y}_{i.} = \frac{1}{N\bar{M}} \left[ \sum_{i=1}^{N} \bar{M} \bar{y}_{i.} - \sum_{i=1}^{N} M_i \bar{y}_{i.} \right] = -\frac{1}{N\bar{M}} \sum_{i=1}^{N} (M_i - \bar{M}) \bar{y}_{i.}$

$= -\frac{1}{N\bar{M}} \sum_{i=1}^{N} (M_i - \bar{M}) (\bar{y}_{i.} - \bar{Y}_N + \bar{Y}_N) = -\frac{1}{N\bar{M}} \sum_{i=1}^{N} (M_i - \bar{M}) (\bar{y}_{i.} - \bar{Y}_N) - \frac{\bar{Y}_N}{N\bar{M}} \sum_{i=1}^{N} (M_i - \bar{M}) = -\frac{1}{\bar{M}} \, \text{Cov}(\bar{y}_{i.}, M_i)$,

since $\sum_{i=1}^{N} (M_i - \bar{M}) = 0$. This shows that the bias is expected to be small when $M_i$ and $\bar{y}_{i.}$ are not highly correlated. In such a case, it is advisable to use this estimator. Its variance is given by

$V(\bar{y}_I) = E(\bar{y}_I - \bar{Y}_N)^2 = \frac{1-f}{n} S_b^2$, where $S_b^2 = \frac{1}{N-1} \sum_{i=1}^{N} (\bar{y}_{i.} - \bar{Y}_N)^2$,

and an unbiased estimator of $V(\bar{y}_I)$ is

$v(\bar{y}_I) = \frac{1-f}{n} s_b^2$, where $s_b^2 = \frac{1}{n-1} \sum_{i=1}^{n} (\bar{y}_{i.} - \bar{y}_I)^2$.

2nd estimator: It is defined as $\bar{y}_{II} = \frac{1}{n\bar{M}} \sum_{i=1}^{n} M_i \bar{y}_{i.}$.

By definition,

$E(\bar{y}_{II}) = \frac{1}{n\bar{M}} E \left( \sum_{i=1}^{n} M_i \bar{y}_{i.} \right) = \frac{1}{N\bar{M}} \sum_{i=1}^{N} M_i \bar{y}_{i.} = \frac{1}{M_0} \sum_{i=1}^{N} M_i \bar{y}_{i.} = \bar{Y}$, under srswor.

This shows that $\bar{y}_{II}$ is an unbiased estimate of $\bar{Y}$. To obtain its variance, define a variate

$u_i = \frac{M_i \bar{y}_{i.}}{\bar{M}}$, $i = 1, 2, \dots, N$,

and let $\bar{u}$ and $\bar{U}$ be the sample and population means of the variable $u$, respectively, so that

$\bar{u} = \frac{1}{n} \sum_{i=1}^{n} \frac{M_i \bar{y}_{i.}}{\bar{M}} = \bar{y}_{II}$, and $\bar{U} = \frac{1}{N} \sum_{i=1}^{N} \frac{M_i \bar{y}_{i.}}{\bar{M}} = \frac{1}{M_0} \sum_{i=1}^{N} M_i \bar{y}_{i.} = \bar{Y}$.

Therefore, as the clusters are drawn randomly wor,

$V(\bar{y}_{II}) = V(\bar{u}) = \frac{1-f}{n} S_b'^2$, where $S_b'^2 = \frac{1}{N-1} \sum_{i=1}^{N} (u_i - \bar{U})^2 = \frac{1}{N-1} \sum_{i=1}^{N} \left( \frac{M_i \bar{y}_{i.}}{\bar{M}} - \bar{Y} \right)^2$,

and an unbiased estimator of $V(\bar{y}_{II})$ is

$v(\bar{y}_{II}) = \frac{1-f}{n} s_u^2$, where $s_u^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{M_i \bar{y}_{i.}}{\bar{M}} - \bar{y}_{II} \right)^2$.

3rd estimator: It is defined as $\bar{y}_{III} = \dfrac{\sum_{i=1}^{n} M_i \bar{y}_{i.}}{\sum_{i=1}^{n} M_i}$. This estimate is a ratio estimate of the form $\hat{R} = \sum_i y_i / \sum_i x_i$, and its variance is obtained by replacing $x_i$ by $M_i$ and $y_i$ by $M_i \bar{y}_{i.}$ in the variance of the ratio estimator.
Here

$V(\hat{R}) = \frac{1-f}{n(N-1)\bar{X}^2} \sum_{i=1}^{N} (y_i - R x_i)^2$, with $\bar{X}^2 = \left( \frac{1}{N} \sum_{i=1}^{N} M_i \right)^2 = \bar{M}^2$.

Hence

$V(\bar{y}_{III}) = \frac{1-f}{n(N-1)\bar{M}^2} \sum_{i=1}^{N} (M_i \bar{y}_{i.} - \bar{Y} M_i)^2 = \frac{1-f}{n(N-1)} \sum_{i=1}^{N} \left( \frac{M_i}{\bar{M}} \right)^2 (\bar{y}_{i.} - \bar{Y})^2 = \frac{1-f}{n} S_b''^2$,

where $S_b''^2 = \frac{1}{N-1} \sum_{i=1}^{N} \left( \frac{M_i}{\bar{M}} \right)^2 (\bar{y}_{i.} - \bar{Y})^2$.

An estimate of $V(\bar{y}_{III})$ is given by

$v(\bar{y}_{III}) = \frac{1-f}{n} s_b''^2$, where $s_b''^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{M_i}{\bar{M}} \right)^2 (\bar{y}_{i.} - \bar{y}_{III})^2$.

Cluster sampling with varying probabilities and with replacement

If a sample of $n$ clusters is drawn with probabilities proportional to size, i.e. $p_i \propto M_i$ or $p_i = M_i / M_0$, and with replacement, then an unbiased estimate of $\bar{Y}$ is given by the following theorem.

Theorem: $\bar{y}_n = \frac{1}{n} \sum_{i=1}^{n} \bar{y}_{i.}$ is an unbiased estimator of $\bar{Y}$, with variance $V(\bar{y}_n) = \frac{1}{n} \sum_{i=1}^{N} \frac{M_i}{M_0} (\bar{y}_{i.} - \bar{Y})^2$.

Proof: By definition,

$E(\bar{y}_n) = E \left( \frac{1}{n} \sum_{i=1}^{n} \bar{y}_{i.} \right) = \frac{1}{n} \sum_{i=1}^{n} E(\bar{y}_{i.}) = \frac{1}{n} \sum_{i=1}^{n} \left[ \sum_{i=1}^{N} p_i \bar{y}_{i.} \right] = \sum_{i=1}^{N} \frac{M_i}{M_0} \bar{y}_{i.} = \bar{Y}$.

This shows that $\bar{y}_n$ is an unbiased estimator of $\bar{Y}$. To obtain the variance of $\bar{y}_n$, we have

$V(\bar{y}_n) = E[\bar{y}_n - E(\bar{y}_n)]^2 = E(\bar{y}_n^2) - \bar{Y}^2$.  (7.13)

Consider

$E(\bar{y}_n^2) = \frac{1}{n^2} E \left( \sum_{i=1}^{n} \bar{y}_{i.} \right)^2 = \frac{1}{n^2} \left[ \sum_{i=1}^{n} E(\bar{y}_{i.}^2) + \sum_{i \ne i'}^{n} E(\bar{y}_{i.}) E(\bar{y}_{i'.}) \right] = \frac{1}{n^2} \left[ n \sum_{i=1}^{N} \frac{M_i}{M_0} \bar{y}_{i.}^2 + n(n-1) \bar{Y}^2 \right]$,

since the $i$th cluster is drawn with probability $M_i / M_0$ and the clusters are sampled wr, so that $E(\bar{y}_{i.}) = \bar{Y} = E(\bar{y}_{i'.})$,

$= \frac{1}{n} \left[ \sum_{i=1}^{N} \frac{M_i}{M_0} \bar{y}_{i.}^2 + (n-1) \bar{Y}^2 \right]$.  (7.14)

In view of equations (7.14) and (7.13), we get

$V(\bar{y}_n) = \frac{1}{n} \left[ \sum_{i=1}^{N} \frac{M_i}{M_0} \bar{y}_{i.}^2 + (n-1) \bar{Y}^2 - n \bar{Y}^2 \right] = \frac{1}{n} \left[ \sum_{i=1}^{N} \frac{M_i}{M_0} \bar{y}_{i.}^2 - \bar{Y}^2 \right] = \frac{1}{n} \sum_{i=1}^{N} \frac{M_i}{M_0} (\bar{y}_{i.} - \bar{Y})^2 = \frac{\sigma_b^2}{n}$, say.

Estimation of $V(\bar{y}_n)$

Define $s_b^2 = \frac{1}{n-1} \sum_{i=1}^{n} (\bar{y}_{i.} - \bar{y}_n)^2$; then

$E(s_b^2) = \frac{1}{n-1} \left[ \sum_{i=1}^{n} E(\bar{y}_{i.}^2) - n E(\bar{y}_n^2) \right] = \frac{1}{n-1} \left[ n \sum_{i=1}^{N} \frac{M_i}{M_0} \bar{y}_{i.}^2 - n \bar{Y}^2 - n V(\bar{y}_n) \right]$

$= \frac{1}{n-1} \left[ n \sum_{i=1}^{N} \frac{M_i}{M_0} (\bar{y}_{i.} - \bar{Y})^2 - n V(\bar{y}_n) \right] = \frac{1}{n-1} (n \sigma_b^2 - \sigma_b^2) = \sigma_b^2$.

This shows that $s_b^2$ is an unbiased estimate of $\sigma_b^2$. Therefore, $\hat{V}(\bar{y}_n) = \frac{1}{n} s_b^2$ is an unbiased estimate of $V(\bar{y}_n) = \sigma_b^2 / n$.
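The behaviour of the estimators in the last two sections can be verified by exact enumeration on a small made-up population with unequal clusters: $\bar{y}_{II}$ and the pps-with-replacement estimator are exactly unbiased, while $\bar{y}_I$ carries a bias whenever $M_i$ and $\bar{y}_{i.}$ are correlated.

```python
from itertools import combinations, product
from statistics import mean

# Made-up population of N = 4 clusters of unequal sizes.
clusters = [[2, 4], [1, 3, 5, 7], [6, 9, 9], [8]]
N, n = 4, 2
M0 = sum(len(c) for c in clusters)
Mbar = M0 / N
Y = sum(sum(c) for c in clusters) / M0          # mean per element

# Exact expectations of ybar_I and ybar_II over all C(N, n) wor samples
E_yI = mean(mean(mean(c) for c in s) for s in combinations(clusters, n))
E_yII = mean(mean(len(c) * mean(c) / Mbar for c in s)
             for s in combinations(clusters, n))
print(abs(E_yII - Y) < 1e-9)   # ybar_II is exactly unbiased -> True
print(abs(E_yI - Y) > 0.1)     # ybar_I is biased here -> True

# pps-with-replacement estimator: expectation over all ordered pairs of draws
p = [len(c) / M0 for c in clusters]
E_pps = sum(p[i] * p[j] * (mean(clusters[i]) + mean(clusters[j])) / 2
            for i, j in product(range(N), repeat=2))
print(abs(E_pps - Y) < 1e-9)   # -> True
```

Here the larger clusters also have larger means, so $\bar{y}_I$, which ignores the sizes $M_i$, overweights the small clusters, exactly the situation the bias formula $B = -\text{Cov}(\bar{y}_{i.}, M_i)/\bar{M}$ describes.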