Estimation of Population Mean in Simple and Stratified Random Sampling

Hulya Cingi and Nilgün Özgül
Hacettepe University, Department of Statistics, Beytepe, 06800, Ankara, Turkey.
e-mails: kadilar@hacettepe.edu.tr; hcingi@hacettepe.edu.tr

Abstract

We propose a ratio estimator for the population mean in simple random sampling, built from the estimators in Bahl and Tuteja [1] and Prasad [7]. We also adapt the proposed estimator to stratified sampling using the separate ratio estimation method. After obtaining the mean square error (MSE) equations of the proposed estimators in both simple and stratified random sampling, we derive theoretical conditions under which the proposed estimators are more efficient than the other estimators. These conditions are supported by a numerical example.

Key words: Separate ratio estimator, auxiliary information, sampling, efficiency.

2000 AMS Classification: 62D05

1. Introduction

When information is available on an auxiliary variable, $x$, that is positively correlated with the study variable, $y$, the ratio estimator is widely used to estimate the population mean, $\bar{Y}$, as follows:

$$\bar{y}_r = \frac{\bar{y}}{\bar{x}}\,\bar{X}, \tag{1}$$

where $\bar{y}$ and $\bar{x}$ are the sample means of the study and auxiliary variables, respectively, and it is assumed that the population mean, $\bar{X}$, of the auxiliary variable is known. It is well known that the MSE equation of the ratio estimator is given by

$$MSE(\bar{y}_r) \cong \lambda \bar{Y}^2 \left( C_y^2 - 2 C_{yx} + C_x^2 \right), \tag{2}$$

where $\lambda = \frac{1-f}{n}$; $f = \frac{n}{N}$; $n$ is the sample size; $N$ is the number of units in the population; $C_{yx} = \rho\, C_y C_x = \rho\, \frac{S_y S_x}{\bar{Y}\bar{X}}$; $\rho$ is the population correlation coefficient between the auxiliary and the study variables; $S_x^2$ and $S_y^2$ are the population variances of the auxiliary and the study variables, respectively; $C_x$ and $C_y$ are the population coefficients of variation of the auxiliary and study variables, respectively.
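As a minimal sketch of (1) and (2), the following Python functions compute the ratio estimate from a sample and the first-order MSE from population quantities. The function names and the toy numbers are illustrative assumptions and do not come from the paper.

```python
from statistics import fmean


def ratio_estimate(y, x, X_bar):
    """Ratio estimate of the population mean, as in (1): (y_bar / x_bar) * X_bar."""
    return fmean(y) / fmean(x) * X_bar


def mse_ratio(Y_bar, n, N, C_y, C_x, rho):
    """First-order MSE of the ratio estimator, as in (2)."""
    lam = (1 - n / N) / n          # lambda = (1 - f) / n with f = n / N
    C_yx = rho * C_y * C_x         # C_yx = rho * C_y * C_x
    return lam * Y_bar**2 * (C_y**2 - 2 * C_yx + C_x**2)


# Toy sample (hypothetical values, for illustration only).
y = [2.0, 4.0, 6.0]
x = [1.0, 2.0, 3.0]
est = ratio_estimate(y, x, X_bar=2.5)
print(est)
print(mse_ratio(Y_bar=10.0, n=5, N=50, C_y=1.0, C_x=1.0, rho=0.0))
```

With perfectly correlated variables and equal coefficients of variation, the bracket in (2) vanishes, which is one quick sanity check on the formula.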
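Prasad [7] suggested the following ratio estimator:

$$\bar{y}_P = \kappa\, \bar{y}_r = \kappa\, \frac{\bar{y}}{\bar{x}}\,\bar{X}, \tag{3}$$

where $\kappa$ is a constant whose optimal value for the estimator in (3) is

$$\kappa_P = \frac{1 + \lambda C_{yx}}{1 + \lambda C_y^2}.$$

The MSE of this estimator can be given by

$$MSE_{min}(\bar{y}_P) \cong \bar{Y}^2 \left[ 1 + \lambda C_x^2 - \frac{\left(1 + \lambda C_{yx}\right)^2}{1 + \lambda C_y^2} \right]. \tag{4}$$

Bahl and Tuteja [1] suggested the following estimator:

$$\bar{y}_{BT} = \bar{y}\, \exp\!\left( \frac{\bar{X} - \bar{x}}{\bar{X} + \bar{x}} \right), \tag{5}$$

where exp is the exponential function. The MSE equation of this estimator can be given by

$$MSE(\bar{y}_{BT}) \cong \lambda \bar{Y}^2 \left( C_y^2 - C_{yx} + \frac{1}{4} C_x^2 \right). \tag{6}$$

Although there have been many studies on combined estimators in stratified random sampling in recent years, such as Shabbir and Gupta [9,10], Singh et al. [11], and Koyuncu and Kadilar [6], authors rarely consider separate estimators in the stratified random sampling literature. However, Vishwakarma and Singh [12] show that, for their proposed estimators, the separate estimators are always more efficient than the combined estimators. For this reason, in this study we adapt the estimator proposed for simple random sampling to stratified random sampling using the separate method.

2. Suggested Estimator in Simple Random Sampling

Replacing $\bar{y}$ in the estimator of Bahl and Tuteja [1], given in (5), with the traditional ratio estimator, given in (1), and motivated by the estimator of Prasad [7], given in (3), we propose a new ratio estimator as follows:

$$\bar{y}_{pr} = \kappa\, \frac{\bar{y}}{\bar{x}}\,\bar{X}\, \exp\!\left( \frac{\bar{X} - \bar{x}}{\bar{X} + \bar{x}} \right). \tag{7}$$

To obtain the MSE equation for the proposed estimator, we use the Taylor series method defined by

$$h(\hat{X}, \hat{Y}) \cong h(X, Y) + \left.\frac{\partial h(a,b)}{\partial a}\right|_{X,Y} (\hat{X} - X) + \left.\frac{\partial h(a,b)}{\partial b}\right|_{X,Y} (\hat{Y} - Y) \tag{8}$$

[13], where, for the proposed estimator, $h(\hat{X}, \hat{Y}) = \bar{y}_{pr}$, $\hat{X} = \bar{x}$, $X = \bar{X}$, $\hat{Y} = \kappa\bar{y}$ and $Y = \kappa\bar{Y}$, so that $h(X, Y) = \kappa\bar{Y}$. The partial derivatives are

$$\left.\frac{\partial \bar{y}_{pr}}{\partial \bar{x}}\right|_{X,Y} = -\frac{\kappa\bar{Y}}{\bar{X}} - \frac{\kappa\bar{Y}}{2\bar{X}} = -\frac{3}{2}\,\kappa R, \qquad \left.\frac{\partial \bar{y}_{pr}}{\partial (\kappa\bar{y})}\right|_{X,Y} = 1,$$

where $R = \bar{Y}/\bar{X}$. Then, with the aid of (8), we can write

$$E\left(\bar{y}_{pr} - \bar{Y}\right)^2 \cong E\left[ -\frac{3}{2}\,\kappa R\,(\bar{x} - \bar{X}) + (\kappa\bar{y} - \bar{Y}) \right]^2,$$

$$MSE(\bar{y}_{pr}) \cong \frac{9}{4}\,\kappa^2 R^2\, V(\bar{x}) - 3\kappa R\, \mathrm{cov}(\bar{x}, \kappa\bar{y}) + MSE(\kappa\bar{y}),$$

where $V(\bar{x}) = \lambda S_x^2$, $\mathrm{cov}(\bar{x}, \kappa\bar{y}) = \kappa\, \mathrm{cov}(\bar{x}, \bar{y}) = \kappa \lambda S_{yx}$, and $MSE(\kappa\bar{y}) = \kappa^2 \lambda S_y^2 + (\kappa - 1)^2 \bar{Y}^2$ (see [8]). Using these equations, we can write

$$MSE(\bar{y}_{pr}) \cong (\kappa - 1)^2 \bar{Y}^2 + \kappa^2 \lambda \bar{Y}^2 \left( C_y^2 - 3 C_{yx} + \frac{9}{4} C_x^2 \right). \tag{9}$$

Setting $\partial MSE(\bar{y}_{pr}) / \partial \kappa = 0$, we get the optimum value of $\kappa$ as

$$\kappa_{pr} = \frac{1}{1 + \lambda A}, \tag{10}$$

where $A = C_y^2 - 3 C_{yx} + \frac{9}{4} C_x^2$. In this way, when $\kappa$ is replaced with $\kappa_{pr}$ in (9), the minimum MSE of the proposed estimator can be written as

$$MSE_{min}(\bar{y}_{pr}) \cong (\kappa_{pr} - 1)^2 \bar{Y}^2 + \kappa_{pr}^2 \lambda \bar{Y}^2 A = \frac{\lambda \bar{Y}^2 A}{1 + \lambda A} = \kappa_{pr} \lambda \bar{Y}^2 A. \tag{11}$$

When there is no information about the population, one can estimate $\kappa_{pr}$ from the sample by

$$\hat{\kappa}_{pr} = \frac{1}{1 + \lambda \hat{A}},$$

where $\hat{A} = \hat{C}_y^2 - 3 \hat{C}_{yx} + \frac{9}{4} \hat{C}_x^2$. Here $\hat{C}_x$ and $\hat{C}_y$ are the sample coefficients of variation of the auxiliary and study variables, respectively, and $\hat{C}_{yx} = \hat{\rho}\, \hat{C}_y \hat{C}_x$, where $\hat{\rho}$ is the sample correlation coefficient between the auxiliary and the study variables.

Note that the value of $\kappa_{pr}$ is always between 0 and 1 ($0 < \kappa_{pr} < 1$), because $\lambda$ and $A$ are positive; indeed, $A = \left(C_y - \frac{3}{2}\rho C_x\right)^2 + \frac{9}{4}\left(1 - \rho^2\right) C_x^2$, which is positive whenever $|\rho| < 1$.

3. Efficiency Comparisons in Simple Random Sampling

In this section, we obtain the efficiency conditions for the proposed estimator by comparing its MSE with the MSE of the sample mean, the traditional ratio estimator, and the ratio estimators suggested by Prasad [7] and Bahl and Tuteja [1].

It is well known that under simple random sampling without replacement (SRSWOR) the variance of the sample mean is

$$V(\bar{y}) = \lambda \bar{Y}^2 C_y^2. \tag{12}$$

We first compare the MSE of the proposed estimator, given in (11), with the variance of the sample mean. This comparison gives the following condition:

$$MSE_{min}(\bar{y}_{pr}) < V(\bar{y}) \iff \kappa_{pr} < \frac{C_y^2}{A}. \tag{13}$$

When this condition is satisfied, the proposed estimator is more efficient than the sample mean.

Secondly, we compare the MSE of the proposed estimator with the MSE of the traditional ratio estimator, given in (2). We have the following condition:

$$MSE_{min}(\bar{y}_{pr}) < MSE(\bar{y}_r) \iff \kappa_{pr} < \frac{B}{A}, \tag{14}$$

where $B = C_y^2 - 2 C_{yx} + C_x^2$. When condition (14) is satisfied, the proposed estimator is more efficient than the traditional ratio estimator.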
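The optimum constant in (10), the minimum MSE in (11) and the conditions (13) and (14) obtained so far can be checked numerically. The sketch below is only an illustration: the function names and the population values are hypothetical assumptions, not taken from the paper, and the grid search is just a crude way to confirm that (10) minimizes (9).

```python
import math


def proposed_estimate(y_bar, x_bar, X_bar, kappa):
    """Proposed estimator (7), written in terms of the sample means."""
    return kappa * y_bar * (X_bar / x_bar) * math.exp((X_bar - x_bar) / (X_bar + x_bar))


def a_term(C_y, C_x, rho):
    """A = C_y^2 - 3 C_yx + (9/4) C_x^2 with C_yx = rho * C_y * C_x."""
    return C_y**2 - 3 * rho * C_y * C_x + 2.25 * C_x**2


def mse_proposed(kappa, Y_bar, lam, A):
    """First-order MSE (9) as a function of kappa."""
    return (kappa - 1) ** 2 * Y_bar**2 + kappa**2 * lam * Y_bar**2 * A


def kappa_opt(lam, A):
    """Optimum constant (10)."""
    return 1 / (1 + lam * A)


# Hypothetical population values, for illustration only.
Y_bar, lam, C_y, C_x, rho = 50.0, 0.05, 1.2, 0.9, 0.7
A = a_term(C_y, C_x, rho)
B = C_y**2 - 2 * rho * C_y * C_x + C_x**2

k = kappa_opt(lam, A)
mse_min = mse_proposed(k, Y_bar, lam, A)
closed_form = k * lam * Y_bar**2 * A      # right-hand side of (11)

# Crude grid search over kappa in (0, 2] to confirm that (10) minimizes (9).
grid_best = min(mse_proposed(i / 1000, Y_bar, lam, A) for i in range(1, 2001))

beats_sample_mean = k < C_y**2 / A        # condition (13)
beats_ratio = k < B / A                   # condition (14)
print(k, mse_min, closed_form, grid_best, beats_sample_mean, beats_ratio)
```

For these made-up values condition (13) holds while (14) does not, which illustrates that the conditions are genuine restrictions on the population rather than identities.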
Thirdly, comparing the MSE of the proposed estimator with the MSE of the estimator in Prasad [7], given in (4), we have the following condition:

$$MSE_{min}(\bar{y}_{pr}) < MSE_{min}(\bar{y}_P) \iff \kappa_{pr} < \frac{D}{\lambda A}, \tag{15}$$

where $D = 1 - \frac{\left(1 + \lambda C_{yx}\right)^2}{1 + \lambda C_y^2} + \lambda C_x^2$. When condition (15) is satisfied, the proposed estimator is more efficient than the ratio estimator suggested by Prasad [7].

Finally, we compare the MSE of the proposed estimator with the MSE of the estimator in Bahl and Tuteja [1], given in (6), and we have the following condition:

$$MSE_{min}(\bar{y}_{pr}) < MSE(\bar{y}_{BT}) \iff \kappa_{pr} < \frac{E}{A}, \tag{16}$$

where $E = C_y^2 - C_{yx} + \frac{1}{4} C_x^2$. When condition (16) is satisfied, the proposed estimator is more efficient than the estimator suggested by Bahl and Tuteja [1].

From (13)-(16), we can also find the upper bound on $\kappa_{pr}$ for the proposed estimator to be more efficient than the other estimators.

4. Suggested Estimator in Stratified Random Sampling

The separate ratio estimator for the population total, $Y$, in stratified random sampling is defined by

$$\hat{Y}_{rs} = \sum_{h=1}^{\ell} \frac{\bar{y}_h}{\bar{x}_h}\, X_h, \tag{17}$$

where $X_h = N_h \bar{X}_h$; $N_h$ is the population size in stratum $h$; $\bar{X}_h$ is the population mean of the auxiliary variable in stratum $h$; $\bar{x}_h$ and $\bar{y}_h$ are the sample means of the auxiliary and study variables, respectively, in stratum $h$; and $\ell$ is the total number of strata [3]. Dividing both sides of (17) by $N$, we obtain the separate ratio estimator for the population mean in stratified random sampling as

$$\bar{y}_{rs} = \sum_{h=1}^{\ell} \omega_h\, \frac{\bar{y}_h}{\bar{x}_h}\, \bar{X}_h, \tag{18}$$

where $\omega_h = \frac{N_h}{N}$ [2].

Adapting the proposed estimator in (7) to the separate ratio estimator in (18), we suggest a new estimator for the population mean in stratified random sampling as follows:

$$\bar{y}_{prst} = \sum_{h=1}^{\ell} \omega_h\, \kappa_h\, \frac{\bar{y}_h}{\bar{x}_h}\, \bar{X}_h\, \exp\!\left( \frac{\bar{X}_h - \bar{x}_h}{\bar{X}_h + \bar{x}_h} \right), \tag{19}$$

where $\kappa_h$ is a constant for stratum $h$.
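A minimal sketch of how the stratified estimator in (19) could be computed from per-stratum samples is given below. The data layout (a list of dictionaries with keys `N_h`, `X_bar_h`, `kappa_h`, `y`, `x`) and the toy values are assumptions made for illustration only.

```python
import math
from statistics import fmean


def proposed_stratified_estimate(strata):
    """Estimator (19): weighted sum over strata of the within-stratum estimator (7).

    `strata` is a list of dicts (a hypothetical layout) with keys:
      N_h      population size of the stratum
      X_bar_h  known population mean of the auxiliary variable in the stratum
      kappa_h  constant chosen for the stratum
      y, x     sampled values of the study and auxiliary variables
    """
    N = sum(s["N_h"] for s in strata)
    total = 0.0
    for s in strata:
        omega_h = s["N_h"] / N                      # stratum weight N_h / N
        y_bar_h, x_bar_h = fmean(s["y"]), fmean(s["x"])
        X_bar_h = s["X_bar_h"]
        exp_term = math.exp((X_bar_h - x_bar_h) / (X_bar_h + x_bar_h))
        total += omega_h * s["kappa_h"] * y_bar_h * (X_bar_h / x_bar_h) * exp_term
    return total


# Toy strata (hypothetical values). The sample means of x equal the known
# stratum means and kappa_h = 1, so (19) reduces to sum of omega_h * y_bar_h.
toy = [
    {"N_h": 30, "X_bar_h": 5.0, "kappa_h": 1.0, "y": [1.0, 3.0], "x": [4.0, 6.0]},
    {"N_h": 70, "X_bar_h": 10.0, "kappa_h": 1.0, "y": [10.0, 20.0], "x": [8.0, 12.0]},
]
estimate = proposed_stratified_estimate(toy)
print(estimate)
```

Setting every `kappa_h` to 1 and dropping the exponential factor would give the separate ratio estimator in (18), which is a convenient way to compare the two on the same data.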
The MSE of this estimator can be obtained by

$$\begin{aligned}
E\left(\bar{y}_{prst} - \bar{Y}\right)^2 &= E\left[ \sum_{h=1}^{\ell} \omega_h \kappa_h \frac{\bar{y}_h}{\bar{x}_h} \bar{X}_h \exp\!\left( \frac{\bar{X}_h - \bar{x}_h}{\bar{X}_h + \bar{x}_h} \right) - \sum_{h=1}^{\ell} \omega_h \bar{Y}_h \right]^2 \\
&= E\left[ \sum_{h=1}^{\ell} \omega_h \left( \kappa_h \frac{\bar{y}_h}{\bar{x}_h} \bar{X}_h \exp\!\left( \frac{\bar{X}_h - \bar{x}_h}{\bar{X}_h + \bar{x}_h} \right) - \bar{Y}_h \right) \right]^2 \\
&\cong \sum_{h=1}^{\ell} \omega_h^2\, E\left[ \kappa_h \frac{\bar{y}_h}{\bar{x}_h} \bar{X}_h \exp\!\left( \frac{\bar{X}_h - \bar{x}_h}{\bar{X}_h + \bar{x}_h} \right) - \bar{Y}_h \right]^2 \\
&= \sum_{h=1}^{\ell} \omega_h^2\, MSE(\bar{y}_{prh}),
\end{aligned}$$

where $\bar{y}_{prh}$ denotes the estimator (7) applied within stratum $h$ and the cross-product terms between strata are dropped. Applying (9) within each stratum, we can write the MSE of the proposed estimator as

$$MSE(\bar{y}_{prst}) \cong \sum_{h=1}^{\ell} \omega_h^2 (\kappa_h - 1)^2 \bar{Y}_h^2 + \sum_{h=1}^{\ell} \omega_h^2 \kappa_h^2 \lambda_h \bar{Y}_h^2 \left( C_{yh}^2 - 3 C_{yxh} + \frac{9}{4} C_{xh}^2 \right), \tag{20}$$

where $\lambda_h = \frac{1 - f_h}{n_h}$; $f_h = \frac{n_h}{N_h}$; $n_h$ is the sample size in stratum $h$; $C_{yxh} = \rho_h C_{yh} C_{xh}$; $\rho_h$ is the population correlation coefficient between the auxiliary and the study variables in stratum $h$; $C_{xh}$ and $C_{yh}$ are the population coefficients of variation of the auxiliary and study variables, respectively, in stratum $h$.

Setting $\partial MSE(\bar{y}_{prst}) / \partial \kappa_h = 0$ for each stratum, we get the optimum value of $\kappa_h$ as

$$\kappa_{prh} = \frac{1}{1 + \lambda_h A_h}, \qquad h = 1, 2, \ldots, \ell, \tag{21}$$

where $A_h = C_{yh}^2 - 3 C_{yxh} + \frac{9}{4} C_{xh}^2$. As in simple random sampling, $\kappa_{prh}$ can also be estimated from the sample for each stratum. Using this notation, when $\kappa_h$ is replaced with $\kappa_{prh}$ in (20), the minimum MSE of the proposed estimator can be written as

$$MSE_{min}(\bar{y}_{prst}) \cong \sum_{h=1}^{\ell} \kappa_{prh}\, \lambda_h\, \omega_h^2\, \bar{Y}_h^2 A_h. \tag{22}$$

It is clear that the values of $\kappa_{prh}$ differ from stratum to stratum, but all of them are between 0 and 1.

5. Efficiency Comparisons in Stratified Random Sampling

The traditional estimator in stratified random sampling is defined by

$$\bar{y}_{st} = \sum_{h=1}^{\ell} \omega_h \bar{y}_h.$$

It is well known that the MSE equations of the traditional and the separate ratio estimators in stratified random sampling are, respectively,

$$MSE(\bar{y}_{st}) = \sum_{h=1}^{\ell} \lambda_h\, \omega_h^2\, \bar{Y}_h^2 C_{yh}^2, \tag{23}$$

$$MSE(\bar{y}_{rs}) \cong \sum_{h=1}^{\ell} \lambda_h\, \omega_h^2\, \bar{Y}_h^2 B_h, \tag{24}$$

where $B_h = C_{yh}^2 - 2 C_{yxh} + C_{xh}^2$. When we compare these MSE equations with the MSE equation of the proposed estimator, given in (22), we have the following conditions:

$$\sum_{h=1}^{\ell} \kappa_{prh} A_h < \sum_{h=1}^{\ell} C_{yh}^2, \tag{25}$$

$$\sum_{h=1}^{\ell} \kappa_{prh} A_h < \sum_{h=1}^{\ell} B_h. \tag{26}$$

When condition (25) is satisfied, the proposed estimator is more efficient than the traditional stratified estimator; similarly, when condition (26) is satisfied, the proposed estimator is more efficient than the separate ratio estimator.

6. Numerical Example

We use the data in Kadilar and Cingi [4,5] to compare the efficiencies of the classical and proposed estimators in simple and stratified random sampling, respectively. These data sets concern the level of apple production as the study variable and the number of apple trees as the auxiliary variable, in 106 villages in the Marmara Region and in 854 villages in 6 strata of Turkey, respectively (1: Marmara, 2: Aegean, 3: Mediterranean, 4: Central Anatolia, 5: Black Sea, 6: East and Southeast Anatolia), in 1999 (Source: Institute of Statistics, Republic of Turkey).

6.1 Numerical example for simple random sampling

Table 1 gives the population statistics. Using simple random sampling, we take the sample size as n = 20. Recall from Section 3 that the bounds on $\kappa_{pr}$ in the efficiency conditions do not involve the sample size, except in condition (15), where $\lambda$ appears. Note that the correlation coefficient ($\rho$) between the auxiliary and study variables is 0.82 for this data set.

INSERT TABLE 1

INSERT TABLE 2

We compute the MSE values of the sample mean, traditional ratio, Prasad, Bahl-Tuteja, and proposed estimators using equations (12), (2), (4), (6) and (11), respectively. Using these MSE values, we compute the relative efficiency of each estimator, say $\hat{Y}$, with respect to the sample mean by

$$RE(\hat{Y}) = \frac{MSE(\bar{y})}{MSE(\hat{Y})}, \qquad \hat{Y} = \bar{y},\ \bar{y}_r,\ \bar{y}_P,\ \bar{y}_{BT},\ \bar{y}_{pr},$$

expressed as a percentage. These relative efficiency values are shown in Table 2. We observe that the most efficient estimator is the proposed estimator. This result is expected, because the conditions (13)-(16) are all satisfied, as follows:

$$\frac{C_y^2}{A} = 2.914; \qquad \frac{B}{A} = 1.299; \qquad \frac{D}{\lambda A} = 0.854; \qquad \frac{E}{A} = 1.937.$$
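As a rough cross-check, the bounds above and the relative efficiencies can be recomputed from the rounded population statistics printed in Table 1 (N = 106, n = 20, C_y = 4.181, C_x = 2.018, C_yx = 6.881). The following Python sketch does this; the variable names are illustrative, and small differences from the published figures are to be expected because the inputs are rounded to three decimals.

```python
# Population statistics as printed in Table 1 (rounded values).
N, n = 106, 20
C_y, C_x, C_yx = 4.181, 2.018, 6.881

lam = (1 - n / N) / n                                   # lambda = (1 - f) / n
A = C_y**2 - 3 * C_yx + 2.25 * C_x**2                   # used in (9)-(11)
B = C_y**2 - 2 * C_yx + C_x**2                          # used in (14)
D = 1 - (1 + lam * C_yx) ** 2 / (1 + lam * C_y**2) + lam * C_x**2   # used in (15)
E = C_y**2 - C_yx + 0.25 * C_x**2                       # used in (16)
kappa_pr = 1 / (1 + lam * A)                            # optimum constant (10)

# Upper bounds on kappa_pr from conditions (13)-(16).
bounds = {
    "(13)": C_y**2 / A,
    "(14)": B / A,
    "(15)": D / (lam * A),
    "(16)": E / A,
}

# MSE of each estimator divided by Y_bar^2, from (12), (2), (4), (6), (11).
mse_rel = {
    "sample mean": lam * C_y**2,
    "ratio": lam * B,
    "Prasad": D,
    "Bahl-Tuteja": lam * E,
    "proposed": kappa_pr * lam * A,
}
re = {name: 100 * mse_rel["sample mean"] / v for name, v in mse_rel.items()}

print(f"lambda = {lam:.3f}, A = {A:.3f}, kappa_pr = {kappa_pr:.3f}")
for label, bound in bounds.items():
    print(f"condition {label}: kappa_pr < {bound:.3f} -> {kappa_pr < bound}")
for name, value in re.items():
    print(f"RE({name}) = {value:.1f}")
```

Because the common factor $\bar{Y}^2$ cancels in every ratio, the population mean of the study variable is not needed for this check.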
It is worth pointing out that we obtain $\kappa_{pr}$ = 0.804 for this data set. In addition, we checked condition (15) for various sample sizes, and the condition is satisfied for all of them.

6.2 Numerical example for stratified random sampling

Table 3 gives the population statistics. Using Neyman allocation in stratified random sampling, we obtain the sample size for each stratum, $n_h$ (h = 1, 2, …, 6), as shown in Table 3. For details, please see Kadilar and Cingi [5].

INSERT TABLE 3

INSERT TABLE 4

We compute the MSE values of the proposed, traditional and separate ratio estimators using equations (22)-(24), respectively. Using these MSE values, we compute the relative efficiency of each estimator, say $\hat{Y}_{st}$, with respect to the traditional stratified estimator by

$$RE(\hat{Y}_{st}) = \frac{MSE(\bar{y}_{st})}{MSE(\hat{Y}_{st})}, \qquad \hat{Y}_{st} = \bar{y}_{st},\ \bar{y}_{rs},\ \bar{y}_{prst},$$

expressed as a percentage. These relative efficiency values are shown in Table 4. We observe that the most efficient estimator is the proposed estimator. This result is expected, because the conditions (25) and (26) are both satisfied, as follows:

$$\sum_{h=1}^{6} C_{yh}^2 = 92.745; \qquad \sum_{h=1}^{6} B_h = 29.388.$$

We note that we obtain $\sum_{h=1}^{6} \kappa_{prh} A_h = 17.370$ for this data set. It is worth pointing out that the sample size has no effect on the bounds in these conditions.

7. Conclusion

We develop a new ratio estimator for the population mean in simple random sampling using the estimator suggested in Bahl and Tuteja [1], and adapt this new estimator to stratified random sampling using the separate method. Theoretically and numerically, we demonstrate that the proposed estimators in both simple and stratified random sampling have the smallest MSE values under certain conditions and for a specific data set.

References

[1] Bahl, S. and Tuteja, R.K. Ratio and Product Exponential Estimators, Journal of Information and Optimization Sciences 12 (1), 159-164, 1991.
[2] Cingi, H. Sampling Theory (Hacettepe University Press, 1994). (in Turkish)
[3] Cochran, W.G. Sampling Techniques (John Wiley and Sons, 1977).
[4] Kadilar, C. and Cingi, H. A Study on the Chain Ratio-Type Estimator, Hacettepe Journal of Mathematics and Statistics 32 (1), 105-108, 2003.
[5] Kadilar, C. and Cingi, H. Ratio Estimators in Stratified Random Sampling, Biometrical Journal 45 (2), 218-225, 2003.
[6] Koyuncu, N. and Kadilar, C. Ratio and Product Estimators in Stratified Random Sampling, Journal of Statistical Planning and Inference 139 (8), 2552-2558, 2009.
[7] Prasad, B. Some Improved Ratio Type Estimators of Population Mean and Ratio in Finite Population Sample Surveys, Communications in Statistics: Theory and Methods 18 (1), 379-392, 1989.
[8] Searls, D.T. Utilization of Known Coefficient of Kurtosis in the Estimation Procedure of Variance, Journal of American Statistical Association 59, 1225-1226, 1964.
[9] Shabbir, J. and Gupta, S. Improved Ratio Estimators in Stratified Sampling, American Journal of Mathematical and Management Sciences 25, 293-311, 2005.
[10] Shabbir, J. and Gupta, S. A New Estimator of Population Mean in Stratified Sampling, Communications in Statistics: Theory and Methods 35 (7), 1201-1209, 2006.
[11] Singh, H.P., Tailor, R., Singh, S., and Kim, J.M. A Modified Estimator of Population Mean Using Power Transformation, Statistical Papers 49, 37-58, 2008.
[12] Vishwakarma, G.K. and Singh, H.P. Ratio-Product Estimators in Stratified Sampling, Statistical Methodology (accepted), 2009.
[13] Wolter, K.M. Introduction to Variance Estimation (Springer-Verlag, 1985).

Table 1. Data statistics of the population for the simple random sampling.

N = 106                  $\bar{Y}$ = 1536.774
n = 20                   $\bar{X}$ = 24375.594
$\lambda$ = 0.041        A = 6.000
$\rho$ = 0.816           B = 7.789
$C_{yx}$ = 6.881         D = 0.208
$C_y$ = 4.181            E = 11.616
$C_x$ = 2.018            $\kappa_{pr}$ = 0.804

Table 2. Relative efficiency of estimators in the simple random sampling.

Estimator                          RE
sample mean, $\bar{y}$             100
traditional, $\bar{y}_r$           224.415
Prasad, $\bar{y}_P$                341.205
Bahl-Tuteja, $\bar{y}_{BT}$        150.475
proposed, $\bar{y}_{pr}$           362.345

Table 3. Data statistics of the population for the stratified random sampling.

Quantity           Total     h=1      h=2      h=3      h=4      h=5      h=6
$N_h$              854       106      106      94       171      204      173
$n_h$              140       9        17       38       67       7        2
$\bar{X}_h$        37600     24375    27422    72410    74365    26442    9844
$\bar{Y}_h$        2930      1537     2213     9384     5588     967      404
$\omega_h$                   0.124    0.124    0.110    0.200    0.239    0.203
$\lambda_h$                  0.102    0.049    0.016    0.009    0.138    0.006
$\rho_h$           0.917     0.816    0.856    0.901    0.986    0.713    0.894
$C_{xh}$           3.851     2.018    2.095    2.220    3.841    1.717    1.909
$C_{yh}$           5.838     4.181    5.221    3.187    5.126    2.471    2.339
$C_{yxh}$          20.604    6.881    9.365    6.376    19.408   3.026    3.990
$A_h$                        6.000    9.043    2.119    1.237    3.663    1.701
$B_h$                        7.789    12.919   2.334    2.208    3.004    1.135
$\kappa_{prh}$               0.621    0.691    0.968    0.989    0.664    0.990

Table 4. Relative efficiency of estimators in the stratified random sampling.

Estimator                          RE
stratified, $\bar{y}_{st}$         100
separate ratio, $\bar{y}_{rs}$     416.507
proposed, $\bar{y}_{prst}$         658.390
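The stratified results can be cross-checked in the same spirit from the per-stratum figures in Table 3. The sketch below recomputes the sums in conditions (25) and (26) and the relative efficiencies from (22)-(24). It is only an approximate check under stated assumptions: the variable names are illustrative, and $\omega_h$ and $\lambda_h$ are taken as printed in the table (three decimals) rather than recomputed from $n_h$ and $N_h$, so the results agree with the published values only up to rounding.

```python
# Per-stratum statistics as printed in Table 3 (rounded values, strata 1..6).
omega = [0.124, 0.124, 0.110, 0.200, 0.239, 0.203]
lam = [0.102, 0.049, 0.016, 0.009, 0.138, 0.006]   # taken as printed
Y_bar = [1537, 2213, 9384, 5588, 967, 404]
C_x = [2.018, 2.095, 2.220, 3.841, 1.717, 1.909]
C_y = [4.181, 5.221, 3.187, 5.126, 2.471, 2.339]
C_yx = [6.881, 9.365, 6.376, 19.408, 3.026, 3.990]

# A_h, B_h and the optimum constants kappa_prh from (21).
A = [cy**2 - 3 * cyx + 2.25 * cx**2 for cy, cx, cyx in zip(C_y, C_x, C_yx)]
B = [cy**2 - 2 * cyx + cx**2 for cy, cx, cyx in zip(C_y, C_x, C_yx)]
kappa = [1 / (1 + l * a) for l, a in zip(lam, A)]

# Sums appearing in conditions (25) and (26).
sum_Cy2 = sum(cy**2 for cy in C_y)
sum_B = sum(B)
sum_kA = sum(k * a for k, a in zip(kappa, A))

# Common per-stratum factor lambda_h * omega_h^2 * Y_bar_h^2 in (22)-(24).
t = [l * w**2 * yb**2 for l, w, yb in zip(lam, omega, Y_bar)]
mse_st = sum(th * cy**2 for th, cy in zip(t, C_y))              # (23)
mse_rs = sum(th * b for th, b in zip(t, B))                     # (24)
mse_pr = sum(th * k * a for th, k, a in zip(t, kappa, A))       # (22)

re_rs = 100 * mse_st / mse_rs
re_pr = 100 * mse_st / mse_pr

print(f"sum C_yh^2 = {sum_Cy2:.3f}, sum B_h = {sum_B:.3f}, sum kappa*A = {sum_kA:.3f}")
print(f"RE(separate ratio) = {re_rs:.1f}, RE(proposed) = {re_pr:.1f}")
```

The three sums land close to the values quoted in Section 6.2, and the ordering of the three MSEs matches Table 4.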