Estimation of Population Mean in Simple and Stratified Sampling

Estimation of Population Mean in Simple and Stratified Random
Sampling
Hulya Cingi and Nilgün Özgül
Hacettepe University, Department of Statistics, Beytepe, 06800, Ankara, Turkey.
e-mails: kadilar@hacettepe.edu.tr; hcingi@hacettepe.edu.tr
Abstract
We propose a ratio estimator for the population mean in simple random
sampling that combines the estimators of Bahl and Tuteja [1] and Prasad [7].
We also adapt the proposed estimator to stratified random sampling using
the separate ratio estimation method. After obtaining the mean square error (MSE)
equations of the proposed estimators in both simple and stratified random
sampling, we derive theoretical conditions under which the proposed estimators are
more efficient than the other estimators. These conditions are also supported by a
numerical example.
Key words : Separate ratio estimator, auxiliary information, sampling, efficiency.
2000 AMS Classification : 62 D 05
1. Introduction
When information is available on an auxiliary variable, x, that is positively
correlated with the study variable, y, the ratio estimator is widely used
to estimate the population mean, $\bar{Y}$, as follows:
$$\bar{y}_r = \frac{\bar{y}}{\bar{x}} \bar{X}, \tag{1}$$
where $\bar{y}$ and $\bar{x}$ are the sample means of the study and auxiliary variables,
respectively, and it is assumed that the population mean, $\bar{X}$, of the auxiliary
variable is known. It is well known that the MSE equation of the ratio estimator is
given by
$$\mathrm{MSE}(\bar{y}_r) \cong \gamma \bar{Y}^2 \left( C_y^2 - 2 C_{yx} + C_x^2 \right), \tag{2}$$
where $\gamma = \frac{1-f}{n}$; $f = \frac{n}{N}$; n is the sample size; N is the number of units in the
population; $C_{yx} = \rho C_y C_x = \rho \frac{S_y S_x}{\bar{Y} \bar{X}}$; $\rho$ is the population correlation coefficient
between the auxiliary and the study variables; $S_x^2$ and $S_y^2$ are the population
variances of the auxiliary and the study variables, respectively; $C_x$ and $C_y$ are the
population coefficients of variation of the auxiliary and study variables, respectively.
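As an illustrative sketch, equations (1) and (2) can be evaluated as follows. The function names are hypothetical, the sampling factor is written out as (1 - f)/n, and the numerical inputs are the population values reported in Table 1 of this paper.

```python
def sampling_factor(n: int, N: int) -> float:
    """Return (1 - f) / n, where f = n / N is the sampling fraction."""
    return (1.0 - n / N) / n


def ratio_estimate(y_bar: float, x_bar: float, X_bar: float) -> float:
    """Classical ratio estimate of the population mean, equation (1)."""
    return y_bar / x_bar * X_bar


def mse_ratio(factor: float, Y_bar: float, C_y: float, C_x: float, C_yx: float) -> float:
    """First-order MSE of the ratio estimator, equation (2)."""
    return factor * Y_bar**2 * (C_y**2 - 2.0 * C_yx + C_x**2)


# Population values taken from Table 1 (N = 106, n = 20).
factor = sampling_factor(n=20, N=106)
mse_r = mse_ratio(factor, Y_bar=1536.774, C_y=4.181, C_x=2.018, C_yx=6.881)
print(round(factor, 3))  # 0.041, the value listed in Table 1
```

The MSE in (2) depends on the sample only through the sampling factor, so it can be computed before any sample is drawn.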
Prasad [7] suggested the following ratio estimator:
$$\bar{y}_P = \kappa \bar{y}_r = \kappa \frac{\bar{y}}{\bar{x}} \bar{X}, \tag{3}$$
where $\kappa$ is a constant whose optimal value for the estimator in (3) is
$\kappa_P = \frac{1 + \gamma C_{yx}}{1 + \gamma C_y^2}$.
The MSE of this estimator can be given by
$$\mathrm{MSE}_{min}(\bar{y}_P) \cong \bar{Y}^2 \left[ 1 - \frac{\left( 1 + \gamma C_{yx} \right)^2}{1 + \gamma C_y^2} + \gamma C_x^2 \right]. \tag{4}$$
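The following sketch evaluates the optimal constant of Prasad's estimator and the bracketed term of (4) for the Table 1 population. The function names are hypothetical, and the signs used in the formulas are an assumption, chosen because they reproduce the value D = 0.208 reported in Table 1.

```python
def prasad_constant(factor: float, C_y: float, C_yx: float) -> float:
    """Optimal constant for Prasad's estimator (signs assumed, see the note above)."""
    return (1.0 + factor * C_yx) / (1.0 + factor * C_y**2)


def prasad_bracket(factor: float, C_y: float, C_x: float, C_yx: float) -> float:
    """Bracketed term of equation (4); the minimum MSE is Y_bar**2 times this value."""
    return 1.0 - (1.0 + factor * C_yx) ** 2 / (1.0 + factor * C_y**2) + factor * C_x**2


# Population values from Table 1; factor is (1 - f) / n with n = 20, N = 106.
factor = (1.0 - 20 / 106) / 20
kappa_P = prasad_constant(factor, C_y=4.181, C_yx=6.881)
D = prasad_bracket(factor, C_y=4.181, C_x=2.018, C_yx=6.881)
mse_min_prasad = 1536.774**2 * D
print(round(D, 3))  # 0.208, matching D in Table 1
```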
Bahl and Tuteja [1] suggested the following estimator:
$$\bar{y}_{BT} = \bar{y} \exp\left( \frac{\bar{X} - \bar{x}}{\bar{X} + \bar{x}} \right), \tag{5}$$
where exp is the exponential function. The MSE equation of this estimator can be
given by
$$\mathrm{MSE}(\bar{y}_{BT}) \cong \gamma \bar{Y}^2 \left( C_y^2 - C_{yx} + \frac{1}{4} C_x^2 \right). \tag{6}$$
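A minimal sketch of the exponential estimator in (5) and its MSE in (6) is given below. The function names are hypothetical and the inputs are the Table 1 population values.

```python
import math


def bahl_tuteja_estimate(y_bar: float, x_bar: float, X_bar: float) -> float:
    """Exponential ratio-type estimate of the population mean, equation (5)."""
    return y_bar * math.exp((X_bar - x_bar) / (X_bar + x_bar))


def mse_bahl_tuteja(factor: float, Y_bar: float, C_y: float, C_x: float, C_yx: float) -> float:
    """First-order MSE of the Bahl-Tuteja estimator, equation (6)."""
    return factor * Y_bar**2 * (C_y**2 - C_yx + C_x**2 / 4.0)


# Population values from Table 1; factor is (1 - f) / n with n = 20, N = 106.
factor = (1.0 - 20 / 106) / 20
mse_bt = mse_bahl_tuteja(factor, Y_bar=1536.774, C_y=4.181, C_x=2.018, C_yx=6.881)

# When the sample mean of x equals the population mean, the exponential
# factor is exp(0) = 1 and the estimate reduces to the sample mean of y.
unchanged = bahl_tuteja_estimate(y_bar=12.0, x_bar=30.0, X_bar=30.0)
```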
Although there have been many studies on combined estimators in stratified
random sampling in recent years, such as Shabbir and Gupta [9,10], Singh et al.
[11], and Koyuncu and Kadilar [6], authors rarely consider separate estimators
in the stratified random sampling literature. However, Vishwakarma and Singh [12]
show that the separate estimators are always more efficient than the combined
estimators for their proposed estimators. For this reason, we adapt the estimator
proposed for simple random sampling to stratified random sampling using
the separate method in this study.
2. Suggested Estimator in Simple Random Sampling
Replacing $\bar{y}$ in the estimator of Bahl and Tuteja [1], given in (5), with the
traditional ratio estimator, given in (1), and motivated by the estimator of Prasad [7],
given in (3), we propose a new ratio estimator as follows:
$$\bar{y}_{pr} = \kappa \frac{\bar{y}}{\bar{x}} \bar{X} \exp\left( \frac{\bar{X} - \bar{x}}{\bar{X} + \bar{x}} \right). \tag{7}$$
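To obtain the MSE equation for the proposed estimator, we use the Taylor series
method defined by
$$h(\hat{X}, \hat{Y}) \cong h(X, Y) + \left. \frac{\partial h(a, b)}{\partial a} \right|_{X, Y} (\hat{X} - X) + \left. \frac{\partial h(a, b)}{\partial b} \right|_{X, Y} (\hat{Y} - Y), \tag{8}$$
[13], where $h(\hat{X}, \hat{Y}) = \bar{y}_{pr}$, $h(X, Y) = \bar{Y}$, $\hat{X} = \bar{x}$, $X = \bar{X}$ and $\hat{Y} = \bar{y}_{\kappa} = \kappa \bar{y}$, so
$Y = \kappa \bar{Y}$ for the proposed estimator.
$$\left. \frac{\partial h(a, b)}{\partial a} \right|_{\bar{X}, \bar{Y}} = \left. \frac{\partial \bar{y}_{pr}}{\partial \bar{x}} \right|_{\bar{X}, \bar{Y}} = -\kappa \bar{Y} \frac{1}{\bar{X}} - \kappa \frac{\bar{Y}}{2 \bar{X}} = -\frac{3}{2} \kappa R,$$
$$\left. \frac{\partial h(a, b)}{\partial b} \right|_{\bar{X}, \bar{Y}} = \left. \frac{\partial \bar{y}_{pr}}{\partial \bar{y}_{\kappa}} \right|_{\bar{X}, \bar{Y}} = 1,$$
where $R = \frac{\bar{Y}}{\bar{X}}$.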
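The derivative with respect to the sample mean of x can be checked numerically. The sketch below implements estimator (7) under a hypothetical function name, uses made-up population values chosen only to exercise the formula, and compares a central finite difference at the population means with the analytic value -(3/2) times the constant times R.

```python
import math


def proposed_estimate(y_bar: float, x_bar: float, X_bar: float, kappa: float) -> float:
    """Estimator (7): constant * ratio estimate * exponential correction."""
    return kappa * y_bar / x_bar * X_bar * math.exp((X_bar - x_bar) / (X_bar + x_bar))


# Hypothetical values, not taken from the paper's data.
X_bar, Y_bar, kappa = 50.0, 20.0, 0.8
R = Y_bar / X_bar
step = 1e-6

# Central difference in x_bar, evaluated at (x_bar, y_bar) = (X_bar, Y_bar).
numeric = (
    proposed_estimate(Y_bar, X_bar + step, X_bar, kappa)
    - proposed_estimate(Y_bar, X_bar - step, X_bar, kappa)
) / (2.0 * step)
analytic = -1.5 * kappa * R
print(numeric, analytic)
```

The two printed values should agree to several decimal places, which is the behaviour the first-order expansion relies on.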
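Then, with the aid of (8), we can write
$$E\left( \bar{y}_{pr} - \bar{Y} \right)^2 \cong E\left[ -\frac{3}{2} \kappa R \left( \bar{x} - \bar{X} \right) + \left( \bar{y}_{\kappa} - \bar{Y} \right) \right]^2,$$
$$\mathrm{MSE}(\bar{y}_{pr}) \cong \frac{9}{4} \kappa^2 R^2 V(\bar{x}) - 3 \kappa R \, \mathrm{cov}(\bar{x}, \bar{y}_{\kappa}) + V(\bar{y}_{\kappa}),$$
where
$$V(\bar{x}) = \gamma S_x^2,$$
$$\mathrm{cov}(\bar{x}, \bar{y}_{\kappa}) = \kappa \, \mathrm{cov}(\bar{x}, \bar{y}) = \kappa \gamma S_{yx},$$
$$V(\bar{y}_{\kappa}) = \kappa^2 \gamma S_y^2 + (\kappa - 1)^2 \bar{Y}^2,$$
(see [8]).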
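Using these equations, we can write
$$\mathrm{MSE}(\bar{y}_{pr}) \cong (\kappa - 1)^2 \bar{Y}^2 + \kappa^2 \gamma \bar{Y}^2 \left( C_y^2 - 3 C_{yx} + \frac{9}{4} C_x^2 \right). \tag{9}$$
Setting $\frac{\partial \mathrm{MSE}(\bar{y}_{pr})}{\partial \kappa} = 0$, we get the optimum value of $\kappa$ as
$$\kappa_{pr} = \frac{1}{1 + \gamma A}, \tag{10}$$
where $A = C_y^2 - 3 C_{yx} + \frac{9}{4} C_x^2$. In this way, when $\kappa$ is replaced with $\kappa_{pr}$ in (9), the
minimum MSE of the proposed estimator can be written as
$$\mathrm{MSE}_{min}(\bar{y}_{pr}) \cong (\kappa_{pr} - 1)^2 \bar{Y}^2 + \kappa_{pr}^2 \gamma \bar{Y}^2 A = \frac{\gamma \bar{Y}^2 A}{1 + \gamma A} = \kappa_{pr} \gamma \bar{Y}^2 A. \tag{11}$$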
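The optimum constant and the closed form of the minimum MSE can be verified numerically. In this sketch the function names are hypothetical and the inputs are the Table 1 population values; the check confirms that plugging the optimum into (9) gives the closed form in (11), and that nearby constants give a larger MSE.

```python
def a_term(C_y: float, C_x: float, C_yx: float) -> float:
    """A = C_y^2 - 3 C_yx + (9/4) C_x^2."""
    return C_y**2 - 3.0 * C_yx + 2.25 * C_x**2


def mse_proposed(kappa: float, factor: float, Y_bar: float, A: float) -> float:
    """MSE of the proposed estimator for a given constant, equation (9)."""
    return (kappa - 1.0) ** 2 * Y_bar**2 + kappa**2 * factor * Y_bar**2 * A


def optimal_constant(factor: float, A: float) -> float:
    """Optimum constant, equation (10)."""
    return 1.0 / (1.0 + factor * A)


# Population values from Table 1; factor is (1 - f) / n with n = 20, N = 106.
factor = (1.0 - 20 / 106) / 20
Y_bar = 1536.774
A = a_term(C_y=4.181, C_x=2.018, C_yx=6.881)

kappa_pr = optimal_constant(factor, A)
mse_min = mse_proposed(kappa_pr, factor, Y_bar, A)
closed_form = kappa_pr * factor * Y_bar**2 * A  # right-hand side of (11)
print(round(kappa_pr, 3))  # 0.804, the value listed in Table 1
```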
When there is no information about the population, one can estimate $\kappa_{pr}$
from the sample by
$$\hat{\kappa}_{pr} = \frac{1}{1 + \gamma \hat{A}},$$
where $\hat{A} = \hat{C}_y^2 - 3 \hat{C}_{yx} + \frac{9}{4} \hat{C}_x^2$. Here $\hat{C}_x$ and $\hat{C}_y$ are the sample coefficients of
variation of the auxiliary and study variables, respectively, and $\hat{C}_{yx} = \hat{\rho} \hat{C}_y \hat{C}_x$, where
$\hat{\rho}$ is the sample correlation coefficient between the auxiliary and the study
variables.
Note that the value of $\kappa_{pr}$ is always between 0 and 1
($0 < \kappa_{pr} < 1$), because $\gamma$ and A are always positive.
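A sample-based estimate of the optimum constant can be sketched as follows. The function name and the small data set are hypothetical; the sample coefficients of variation use the usual n - 1 divisor, which is an assumption, since the paper does not state the divisor.

```python
from statistics import mean, stdev


def estimate_constant(y: list, x: list, N: int) -> float:
    """Estimate the optimum constant from paired sample values of y and x."""
    n = len(y)
    factor = (1.0 - n / N) / n
    y_bar, x_bar = mean(y), mean(x)
    C_y_hat = stdev(y) / y_bar
    C_x_hat = stdev(x) / x_bar
    # Sample covariance divided by the product of means equals
    # rho_hat * C_y_hat * C_x_hat.
    s_yx = sum((yi - y_bar) * (xi - x_bar) for yi, xi in zip(y, x)) / (n - 1)
    C_yx_hat = s_yx / (y_bar * x_bar)
    A_hat = C_y_hat**2 - 3.0 * C_yx_hat + 2.25 * C_x_hat**2
    return 1.0 / (1.0 + factor * A_hat)


# Hypothetical sample of size 5 from a population of size 50.
y_sample = [12.0, 15.0, 9.0, 20.0, 14.0]
x_sample = [30.0, 41.0, 25.0, 52.0, 33.0]
kappa_hat = estimate_constant(y_sample, x_sample, N=50)
```

Because the estimated A is bounded below by the square of the difference between the estimated C_y and 1.5 times the estimated C_x, the estimate stays in the interval from 0 to 1 for any sample.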
3. Efficiency Comparisons in Simple Random Sampling
In this section, we try to obtain the efficiency conditions for the proposed
estimator by comparing the MSE of the proposed estimator with the MSE of the
sample mean, traditional ratio estimator and the ratio estimators suggested by
Prasad [7] and Bahl and Tuteja [1].
It is well known that under simple random sampling without replacement
(SRSWOR) the variance of the sample mean is
$$V(\bar{y}) = \gamma \bar{Y}^2 C_y^2. \tag{12}$$
We first compare the MSE of the proposed estimator, given in (11), with
the variance of the sample mean. By this comparison, we have the following
condition:
$$\mathrm{MSE}(\bar{y}_{pr}) < V(\bar{y}),$$
$$\kappa_{pr} < \frac{C_y^2}{A}. \tag{13}$$
When this condition is satisfied, the proposed estimator is more efficient than the
sample mean.
Secondly, we compare the MSE of the proposed estimator with the MSE of
the traditional ratio estimator, given in (2). We have the following condition:
$$\mathrm{MSE}(\bar{y}_{pr}) < \mathrm{MSE}(\bar{y}_r),$$
$$\kappa_{pr} < \frac{B}{A}, \tag{14}$$
where $B = C_y^2 - 2 C_{yx} + C_x^2$. When the condition (14) is satisfied, the proposed
estimator is more efficient than the traditional ratio estimator.
Thirdly, comparing the MSE of the proposed estimator with the MSE of
the estimator in Prasad [7], given in (4), we have the following condition:
$$\mathrm{MSE}(\bar{y}_{pr}) < \mathrm{MSE}(\bar{y}_P),$$
$$\kappa_{pr} < \frac{D}{\gamma A}, \tag{15}$$
where $D = 1 - \frac{\left( 1 + \gamma C_{yx} \right)^2}{1 + \gamma C_y^2} + \gamma C_x^2$. When the condition (15) is satisfied, we can say
that the proposed estimator is more efficient than the ratio estimator suggested by
Prasad [7].
Finally, we compare the MSE of the proposed estimator with the MSE of
the estimator in Bahl and Tuteja [1], given in (6), and we have the following
condition:
$$\mathrm{MSE}(\bar{y}_{pr}) < \mathrm{MSE}(\bar{y}_{BT}),$$
$$\kappa_{pr} < \frac{E}{A}, \tag{16}$$
where $E = C_y^2 - C_{yx} + \frac{1}{4} C_x^2$. When the condition (16) is satisfied, the proposed
estimator is more efficient than the ratio estimator suggested by Bahl and Tuteja
[1]. From (13)-(16), we can also find the upper bound of $\kappa_{pr}$ for the
proposed estimator to be more efficient than the other estimators.
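The four efficiency conditions can be checked together, as in the sketch below for the Table 1 population. The variable names are hypothetical. For condition (15) the bound is taken as D divided by the product of the sampling factor and A, which is what comparing (11) with (4) gives and which reproduces the value 0.854 quoted in the numerical example; the signs inside D are an assumption that reproduces D = 0.208 in Table 1.

```python
# Population values from Table 1; factor is (1 - f) / n with n = 20, N = 106.
factor = (1.0 - 20 / 106) / 20
C_y, C_x, C_yx = 4.181, 2.018, 6.881

A = C_y**2 - 3.0 * C_yx + 2.25 * C_x**2
B = C_y**2 - 2.0 * C_yx + C_x**2
D = 1.0 - (1.0 + factor * C_yx) ** 2 / (1.0 + factor * C_y**2) + factor * C_x**2
E = C_y**2 - C_yx + C_x**2 / 4.0

kappa_pr = 1.0 / (1.0 + factor * A)

# Upper bounds on the optimum constant from conditions (13)-(16).
bounds = {
    "sample mean (13)": C_y**2 / A,
    "traditional ratio (14)": B / A,
    "Prasad (15)": D / (factor * A),
    "Bahl-Tuteja (16)": E / A,
}
satisfied = {name: kappa_pr < bound for name, bound in bounds.items()}
tightest = min(bounds, key=bounds.get)
```

The smallest of the four bounds is the binding one: the proposed estimator beats all four competitors exactly when the optimum constant lies below it.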
4. Suggested Estimator in Stratified Random Sampling
The separate ratio estimator for the population total, Y, in stratified random
sampling is defined by
$$\hat{Y}_{rs} = \sum_{h=1}^{L} \frac{\bar{y}_h}{\bar{x}_h} X_h, \tag{17}$$
where $X_h = N_h \bar{X}_h$; $N_h$ is the population size in stratum h; $\bar{X}_h$ is the
population mean of the auxiliary variable in stratum h; $\bar{x}_h$ and $\bar{y}_h$ are the
sample means of the auxiliary and study variables, respectively, in stratum h;
and L is the total number of strata [3].
When we divide both sides of (17) by N, it is clear that we obtain the
separate ratio estimator for the population mean in stratified random sampling
as
$$\bar{y}_{rs} = \sum_{h=1}^{L} \omega_h \frac{\bar{y}_h}{\bar{x}_h} \bar{X}_h, \tag{18}$$
where $\omega_h = \frac{N_h}{N}$ [2].
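Adapting the proposed estimator in (7) to the separate ratio estimator in
(18), we suggest a new estimator for the population mean in stratified random
sampling as follows:
$$\bar{y}_{prst} = \sum_{h=1}^{L} \omega_h \kappa_h \frac{\bar{y}_h}{\bar{x}_h} \bar{X}_h \exp\left( \frac{\bar{X}_h - \bar{x}_h}{\bar{X}_h + \bar{x}_h} \right), \tag{19}$$
where $\kappa_h$ is a constant for stratum h. The MSE of this estimator can be
obtained by
$$\begin{aligned}
E\left( \bar{y}_{prst} - \bar{Y} \right)^2
&= E\left[ \sum_{h=1}^{L} \omega_h \kappa_h \frac{\bar{y}_h}{\bar{x}_h} \bar{X}_h \exp\left( \frac{\bar{X}_h - \bar{x}_h}{\bar{X}_h + \bar{x}_h} \right) - \sum_{h=1}^{L} \omega_h \bar{Y}_h \right]^2 \\
&= E\left[ \sum_{h=1}^{L} \omega_h \left( \kappa_h \frac{\bar{y}_h}{\bar{x}_h} \bar{X}_h \exp\left( \frac{\bar{X}_h - \bar{x}_h}{\bar{X}_h + \bar{x}_h} \right) - \bar{Y}_h \right) \right]^2 \\
&\cong E\left[ \sum_{h=1}^{L} \omega_h^2 \left( \kappa_h \frac{\bar{y}_h}{\bar{x}_h} \bar{X}_h \exp\left( \frac{\bar{X}_h - \bar{x}_h}{\bar{X}_h + \bar{x}_h} \right) - \bar{Y}_h \right)^2 \right] \\
&= \sum_{h=1}^{L} \omega_h^2 E\left( \kappa_h \frac{\bar{y}_h}{\bar{x}_h} \bar{X}_h \exp\left( \frac{\bar{X}_h - \bar{x}_h}{\bar{X}_h + \bar{x}_h} \right) - \bar{Y}_h \right)^2 \\
&= \sum_{h=1}^{L} \omega_h^2 \, \mathrm{MSE}(\bar{y}_{prh}).
\end{aligned}$$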
Applying (9) within each stratum, we can write the MSE of the proposed estimator
as
$$\mathrm{MSE}(\bar{y}_{prst}) \cong \sum_{h=1}^{L} \omega_h^2 \left[ (\kappa_h - 1)^2 \bar{Y}_h^2 + \kappa_h^2 \gamma_h \bar{Y}_h^2 \left( C_{yh}^2 - 3 C_{yxh} + \frac{9}{4} C_{xh}^2 \right) \right], \tag{20}$$
where $\gamma_h = \frac{1 - f_h}{n_h}$; $f_h = \frac{n_h}{N_h}$; $n_h$ is the sample size in stratum h;
$C_{yxh} = \rho_h C_{yh} C_{xh}$; $\rho_h$ is the population correlation coefficient between the auxiliary
and the study variables in stratum h; $C_{xh}$ and $C_{yh}$ are the population
coefficients of variation of the auxiliary and study variables, respectively, in
stratum h.
Setting $\frac{\partial \mathrm{MSE}(\bar{y}_{prst})}{\partial \kappa_h} = 0$ for each stratum, we get the optimum value of $\kappa_h$
as
$$\kappa_{prh} = \frac{1}{1 + \gamma_h A_h}, \quad h = 1, 2, \ldots, L, \tag{21}$$
where $A_h = C_{yh}^2 - 3 C_{yxh} + \frac{9}{4} C_{xh}^2$. As in simple random sampling, $\kappa_{prh}$
can also be estimated from the sample for each stratum. Using this notation,
when $\kappa_h$ is replaced with $\kappa_{prh}$ in (20), the minimum MSE of the proposed estimator
can be written as
$$\mathrm{MSE}_{min}(\bar{y}_{prst}) \cong \sum_{h=1}^{L} \kappa_{prh} \gamma_h \omega_h^2 \bar{Y}_h^2 A_h. \tag{22}$$
It is clear that the values of $\kappa_{prh}$ differ from stratum to stratum but all of
them are between 0 and 1.
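The stratum-wise optimum constants in (21) and the minimum MSE in (22) can be computed as in the following sketch. The variable names are hypothetical. The inputs are the stratum values tabulated in Table 3, and the sampling factors are taken directly from that table rather than recomputed from the stratum sample sizes.

```python
# Stratum-level values as tabulated in Table 3 (strata 1 to 6).
omega_h = [0.124, 0.124, 0.110, 0.200, 0.239, 0.203]  # stratum weights N_h / N
gamma_h = [0.102, 0.049, 0.016, 0.009, 0.138, 0.006]  # tabulated sampling factors
Y_bar_h = [1537.0, 2213.0, 9384.0, 5588.0, 967.0, 404.0]
A_h = [6.000, 9.043, 2.119, 1.237, 3.663, 1.701]

# Equation (21): one optimum constant per stratum.
kappa_prh = [1.0 / (1.0 + g * a) for g, a in zip(gamma_h, A_h)]

# Equation (22): minimum MSE of the proposed stratified estimator.
mse_min_prst = sum(
    k * g * w**2 * Y**2 * a
    for k, g, w, Y, a in zip(kappa_prh, gamma_h, omega_h, Y_bar_h, A_h)
)
print([round(k, 3) for k in kappa_prh])
```

The printed constants agree with the last row of Table 3 up to the rounding of the tabulated inputs.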
5. Efficiency Comparisons in Stratified Random Sampling
The traditional estimator in stratified random sampling is defined by
$$\bar{y}_{st} = \sum_{h=1}^{L} \omega_h \bar{y}_h.$$
It is well known that the MSE equations of the traditional and the separate ratio
estimators in stratified random sampling are, respectively,
$$\mathrm{MSE}(\bar{y}_{st}) = \sum_{h=1}^{L} \gamma_h \omega_h^2 \bar{Y}_h^2 C_{yh}^2, \tag{23}$$
$$\mathrm{MSE}(\bar{y}_{rs}) \cong \sum_{h=1}^{L} \gamma_h \omega_h^2 \bar{Y}_h^2 B_h, \tag{24}$$
where $B_h = C_{yh}^2 - 2 C_{yxh} + C_{xh}^2$. When we compare these MSE equations with the
MSE equation of the proposed estimator, given in (22), we have the following
conditions:
$$\sum_{h=1}^{L} \kappa_{prh} A_h < \sum_{h=1}^{L} C_{yh}^2, \tag{25}$$
$$\sum_{h=1}^{L} \kappa_{prh} A_h < \sum_{h=1}^{L} B_h. \tag{26}$$
When the condition (25) is satisfied, the proposed estimator is more efficient than
the traditional stratified estimator, and similarly, when the condition (26) is
satisfied, the proposed estimator is more efficient than the separate ratio estimator.
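The sums that appear in conditions (25) and (26) can be evaluated as in this sketch, which uses the stratum values tabulated in Table 3. The variable names are hypothetical, and the sampling factors are the tabulated ones.

```python
# Stratum-level values as tabulated in Table 3 (strata 1 to 6).
gamma_h = [0.102, 0.049, 0.016, 0.009, 0.138, 0.006]  # tabulated sampling factors
C_yh = [4.181, 5.221, 3.187, 5.126, 2.471, 2.339]
A_h = [6.000, 9.043, 2.119, 1.237, 3.663, 1.701]
B_h = [7.789, 12.919, 2.334, 2.208, 3.004, 1.135]

# Optimum constant per stratum, then the left-hand side shared by (25) and (26).
kappa_prh = [1.0 / (1.0 + g * a) for g, a in zip(gamma_h, A_h)]
lhs = sum(k * a for k, a in zip(kappa_prh, A_h))

rhs_25 = sum(c**2 for c in C_yh)  # right-hand side of (25)
rhs_26 = sum(B_h)                 # right-hand side of (26)

condition_25 = lhs < rhs_25
condition_26 = lhs < rhs_26
```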
6. Numerical Example
We use the data in Kadilar and Cingi [4,5] to compare the efficiencies of the
classical and proposed estimators in simple and stratified random
sampling, respectively. These data sets concern the level of apple production as
the study variable and the number of apple trees as the auxiliary variable in 106 villages
in the Marmara Region and in 854 villages in 6 strata of Turkey, respectively
(1: Marmara, 2: Aegean, 3: Mediterranean, 4: Central Anatolia, 5: Black Sea,
6: East and Southeast Anatolia), in 1999 (Source: Institute of Statistics, Republic of
Turkey).
6.1 Numerical example for simple random sampling
Table 1 gives the statistics of the population. Using simple
random sampling, we take the sample size as n = 20. Recall that
the sample size has no effect on the efficiency comparisons of the estimators,
except for condition (15), as shown in Section 3. Note that the correlation
coefficient ($\rho$) between the auxiliary and study variables is 0.82 for this data set.
INSERT TABLE 1
INSERT TABLE 2
We compute the MSE values of the sample mean, traditional ratio, Prasad,
Bahl-Tuteja, and proposed estimators using equations (12), (2), (4), (6) and
(11), respectively. Using these MSE values, we compute the relative efficiency of
the estimators, say $\hat{Y}$, with respect to the sample mean by
$$RE(\hat{Y}) = \frac{\mathrm{MSE}(\bar{y})}{\mathrm{MSE}(\hat{Y})}, \quad \hat{Y} = \bar{y}, \bar{y}_r, \bar{y}_P, \bar{y}_{BT}, \bar{y}_{pr}.$$
These relative efficiency values are shown in Table 2. We observe that the most
efficient estimator is the proposed estimator. This result is expected
because conditions (13)-(16) are all satisfied, with the following upper bounds:
$$\frac{C_y^2}{A} = 2.914; \quad \frac{B}{A} = 1.299; \quad \frac{D}{\gamma A} = 0.854; \quad \frac{E}{A} = 1.937.$$
We obtain $\kappa_{pr} = 0.804$ for this data set, which is below each of these bounds. In addition,
we tried various sample sizes for condition (15), and the
condition is satisfied for all of them.
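The relative efficiencies for simple random sampling can be recomputed from the Table 1 statistics as sketched below. The variable names are hypothetical, the values are expressed as percentages as in Table 2, and the signs in the Prasad term are an assumption that reproduces D = 0.208 in Table 1. Small differences from Table 2 in the first decimal place come from the rounding of the tabulated inputs.

```python
# Population values from Table 1; factor is (1 - f) / n with n = 20, N = 106.
factor = (1.0 - 20 / 106) / 20
C_y, C_x, C_yx = 4.181, 2.018, 6.881

A = C_y**2 - 3.0 * C_yx + 2.25 * C_x**2
kappa_pr = 1.0 / (1.0 + factor * A)

# MSE of each estimator divided by Y_bar**2; the common factor cancels in RE.
scaled_mse = {
    "sample mean": factor * C_y**2,
    "traditional ratio": factor * (C_y**2 - 2.0 * C_yx + C_x**2),
    "Prasad": 1.0 - (1.0 + factor * C_yx) ** 2 / (1.0 + factor * C_y**2) + factor * C_x**2,
    "Bahl-Tuteja": factor * (C_y**2 - C_yx + C_x**2 / 4.0),
    "proposed": kappa_pr * factor * A,
}

relative_efficiency = {
    name: 100.0 * scaled_mse["sample mean"] / value
    for name, value in scaled_mse.items()
}
most_efficient = max(relative_efficiency, key=relative_efficiency.get)
```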
6.2 Numerical example for stratified random sampling
In Table 3, we observe the statistics about the population. Using Neyman
allocation in the stratified random sampling, we obtain the sample size for each
stratum, nh (h = 1,2,…,6), as shown in Table 3. For details, please see Kadilar and
Cingi [5].
INSERT TABLE 3
INSERT TABLE 4
We compute the MSE values of the proposed, traditional and separate ratio
estimators using equations (22)-(24), respectively. Using these MSE values, we
compute the relative efficiency of the estimators, say $\hat{Y}_{st}$, with respect to the
traditional stratified estimator by
$$RE(\hat{Y}_{st}) = \frac{\mathrm{MSE}(\bar{y}_{st})}{\mathrm{MSE}(\hat{Y}_{st})}, \quad \hat{Y}_{st} = \bar{y}_{st}, \bar{y}_{rs}, \bar{y}_{prst}.$$
These relative efficiency values are shown in Table 4. We observe that the most
efficient estimator is the proposed estimator. This result is expected
because conditions (25) and (26) are both satisfied:
$$\sum_{h=1}^{6} C_{yh}^2 = 92.745; \quad \sum_{h=1}^{6} B_h = 29.388.$$
We note that we obtain $\sum_{h=1}^{6} \kappa_{prh} A_h = 17.370$ for this data set. It is
worth pointing out that the sample size has no effect on the efficiency comparisons
of the estimators for these conditions.
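The stratified relative efficiencies can be recomputed from the Table 3 statistics as in the sketch below. The variable and function names are hypothetical, the values are expressed as percentages as in Table 4, and the sampling factors are the tabulated ones. Because the tabulated inputs are rounded to three decimals, the results agree with Table 4 only to within about one percent.

```python
# Stratum-level values as tabulated in Table 3 (strata 1 to 6).
omega_h = [0.124, 0.124, 0.110, 0.200, 0.239, 0.203]  # stratum weights N_h / N
gamma_h = [0.102, 0.049, 0.016, 0.009, 0.138, 0.006]  # tabulated sampling factors
Y_bar_h = [1537.0, 2213.0, 9384.0, 5588.0, 967.0, 404.0]
C_yh = [4.181, 5.221, 3.187, 5.126, 2.471, 2.339]
A_h = [6.000, 9.043, 2.119, 1.237, 3.663, 1.701]
B_h = [7.789, 12.919, 2.334, 2.208, 3.004, 1.135]


def weighted_sum(terms: list) -> float:
    """Sum of gamma_h * omega_h^2 * Y_bar_h^2 * term_h over the strata."""
    return sum(
        g * w**2 * Y**2 * t
        for g, w, Y, t in zip(gamma_h, omega_h, Y_bar_h, terms)
    )


kappa_prh = [1.0 / (1.0 + g * a) for g, a in zip(gamma_h, A_h)]

mse_st = weighted_sum([c**2 for c in C_yh])                        # equation (23)
mse_rs = weighted_sum(B_h)                                         # equation (24)
mse_prst = weighted_sum([k * a for k, a in zip(kappa_prh, A_h)])   # equation (22)

relative_efficiency = {
    "stratified": 100.0,
    "separate ratio": 100.0 * mse_st / mse_rs,
    "proposed": 100.0 * mse_st / mse_prst,
}
```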
7. Conclusion
We develop a new ratio estimator for the population mean in simple random
sampling using the estimator suggested in Bahl and Tuteja [1] and adapt this new
estimator to stratified random sampling using the separate method.
Theoretically and numerically, we demonstrate that the proposed estimators in
both simple and stratified random sampling have the smallest MSE values
under certain conditions and for a specific data set.
References
[1] Bahl, S. and Tuteja, R.K. Ratio and Product Exponential Estimators,
Journal of Information and Optimization Sciences 12 (1), 159-164, 1991.
[2] Cingi, H. Sampling Theory (Hacettepe University Press, 1994). (in Turkish)
[3] Cochran, W.G. Sampling Techniques (John Wiley and Sons, 1977).
[4] Kadilar, C. and Cingi, H. A study on the chain ratio-type estimator,
Hacettepe Journal of Mathematics and Statistics 32 (1), 105-108, 2003.
[5] Kadilar, C. and Cingi, H. Ratio Estimators in Stratified Random Sampling,
Biometrical Journal 45 (2), 218-225, 2003.
[6] Koyuncu, N. and Kadilar, C. Ratio and Product Estimators in Stratified
Random Sampling, Journal of Statistical Planning and Inference 139 (8),
2552-2558, 2009.
[7] Prasad, B. Some Improved Ratio Type Estimators of Population Mean and
Ratio in Finite Population Sample Surveys, Communications in Statistics:
Theory and Methods 18 (1), 379-392, 1989.
[8] Searls, D.T. Utilization of Known Coefficient of Kurtosis in the Estimation
Procedure of Variance, Journal of American Statistical Association 59,
1225-1226, 1964.
[9] Shabbir, J. and Gupta, S. Improved Ratio Estimators in Stratified Sampling,
American Journal of Mathematical and Management Sciences 25, 293-311,
2005.
[10] Shabbir, J. and Gupta, S. A New Estimator of Population Mean in Stratified
Sampling, Communications in Statistics: Theory and Methods 35 (7), 1201-1209, 2006.
[11] Singh, H.P., Tailor, R., Singh, S., and Kim, J.M. A Modified Estimator of
Population Mean Using Power Transformation, Statistical Papers 49, 37-58,
2008.
[12] Vishwakarma, G.K. and Singh, H.P. (2009) Ratio-Product Estimators in
Stratified Sampling, Statistical Methodology (accepted)
[13] Wolter, K.M. Introduction to Variance Estimation (Springer-Verlag, 1985).
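Table 1 Data statistics of the population for the simple random sampling.

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| $N$ | 106 | $\bar{Y}$ | 1536.774 |
| $n$ | 20 | $\bar{X}$ | 24375.594 |
| $\gamma$ | 0.041 | $A$ | 6.000 |
| $\rho$ | 0.816 | $B$ | 7.789 |
| $C_{yx}$ | 6.881 | $D$ | 0.208 |
| $C_y$ | 4.181 | $E$ | 11.616 |
| $C_x$ | 2.018 | $\kappa_{pr}$ | 0.804 |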
Table 2 Relative efficiency of estimators in the simple random sampling.

| Estimator | RE |
|---|---|
| sample mean ($\bar{y}$) | 100 |
| traditional ratio ($\bar{y}_r$) | 224.415 |
| Prasad ($\bar{y}_P$) | 341.205 |
| Bahl-Tuteja ($\bar{y}_{BT}$) | 150.475 |
| proposed ($\bar{y}_{pr}$) | 362.345 |
Table 3 Data statistics of the population for the stratified random sampling.

| Statistic | Population | Stratum 1 | Stratum 2 | Stratum 3 | Stratum 4 | Stratum 5 | Stratum 6 |
|---|---|---|---|---|---|---|---|
| $N_h$ | 854 | 106 | 106 | 94 | 171 | 204 | 173 |
| $n_h$ | 140 | 9 | 17 | 38 | 67 | 7 | 2 |
| $\bar{X}_h$ | 37600 | 24375 | 27422 | 72410 | 74365 | 26442 | 9844 |
| $\bar{Y}_h$ | 2930 | 1537 | 2213 | 9384 | 5588 | 967 | 404 |
| $\omega_h$ | - | 0.124 | 0.124 | 0.110 | 0.200 | 0.239 | 0.203 |
| $\gamma_h$ | - | 0.102 | 0.049 | 0.016 | 0.009 | 0.138 | 0.006 |
| $\rho_h$ | 0.917 | 0.816 | 0.856 | 0.901 | 0.986 | 0.713 | 0.894 |
| $C_{xh}$ | 3.851 | 2.018 | 2.095 | 2.220 | 3.841 | 1.717 | 1.909 |
| $C_{yh}$ | 5.838 | 4.181 | 5.221 | 3.187 | 5.126 | 2.471 | 2.339 |
| $C_{yxh}$ | 20.604 | 6.881 | 9.365 | 6.376 | 19.408 | 3.026 | 3.990 |
| $A_h$ | - | 6.000 | 9.043 | 2.119 | 1.237 | 3.663 | 1.701 |
| $B_h$ | - | 7.789 | 12.919 | 2.334 | 2.208 | 3.004 | 1.135 |
| $\kappa_{prh}$ | - | 0.621 | 0.691 | 0.968 | 0.989 | 0.664 | 0.990 |
Table 4 Relative efficiency of estimators in the stratified random sampling.

| Estimator | RE |
|---|---|
| stratified ($\bar{y}_{st}$) | 100 |
| separate ratio ($\bar{y}_{rs}$) | 416.507 |
| proposed ($\bar{y}_{prst}$) | 658.390 |