A General Class of Estimators of Population Median Using Two Auxiliary
Variables in Double Sampling
Jack Allen 1, Housila P. Singh 2, Sarjinder Singh 3, Florentin Smarandache 4
1 School of Accounting and Finance, Griffith University, Australia
2 School of Studies in Statistics, Vikram University, Ujjain - 456 010 (M. P.), India
3 Department of Mathematics and Statistics, University of Saskatchewan, Canada
4 Department of Mathematics, University of New Mexico, Gallup, USA
Abstract:
In this paper we have suggested two classes of estimators for the population median MY of the study character Y using information on two auxiliary characters X and Z in double sampling. It has been shown that the suggested classes of estimators are more efficient than the one suggested by Singh et al (2001). Estimators based on estimated optimum values have also been considered, together with their properties. The optimum values of the first phase and second phase sample sizes are also obtained for a fixed cost of survey.
Keywords: Median estimation, Chain ratio and regression estimators, Study variate, Auxiliary
variate, Classes of estimators, Mean squared errors, Cost, Double sampling.
2000 MSC: 60E99
1. INTRODUCTION
In survey sampling, statisticians often come across the study of variables which have highly skewed distributions, such as income and expenditure. In such situations, the estimation of the median deserves special attention. Kuk and Mak (1989) were the first to introduce the estimation of the population median of the study variate Y using auxiliary information in survey sampling. Francisco and Fuller (1991) also considered the problem of estimation of the median as part of the estimation of a finite population distribution function. Later, Singh et al (2001) dealt extensively with the problem of estimation of the median using information on an auxiliary variate in two phase sampling.
Consider a finite population U = {1, 2, …, i, …, N}. Let Y and X denote the study variable and the auxiliary variable, taking values Yi and Xi respectively for the i-th unit. When the two variables are strongly related but no information is available on the population median MX of X, we seek to estimate the population median MY of Y from a sample Sm obtained through a two-phase selection. Permitting a simple random sampling without replacement (SRSWOR) design in each phase, the two-phase sampling scheme is as follows:
(i) The first phase sample Sn (Sn ⊂ U) of fixed size n is drawn to observe only X, in order to furnish an estimate of MX.
(ii) Given Sn, the second phase sample Sm (Sm ⊂ Sn) of fixed size m is drawn to observe Y only.
Assuming that the median MX of the variable X is known, Kuk and Mak (1989) suggested a ratio estimator for the population median MY of Y as

$\hat{M}_1 = \hat{M}_Y \frac{M_X}{\hat{M}_X}$   (1.1)
where $\hat{M}_Y$ and $\hat{M}_X$ are the sample estimators of MY and MX respectively, based on a sample Sm of size m. Suppose that y(1), y(2), …, y(m) are the y values of the sample units in ascending order. Further, let t be an integer such that Y(t) ≤ MY ≤ Y(t+1), and let p = t/m be the proportion of Y values in the sample that are less than or equal to the median value MY, an unknown population parameter. If p̂ is a predictor of p, the sample median $\hat{M}_Y$ can be written in terms of quantiles as $\hat{Q}_Y(\hat{p})$ with $\hat{p} = 0.5$. Kuk and Mak (1989) define a matrix of proportions Pij(x,y) as
              X ≤ MX        X > MX        Total
Y ≤ MY        P11(x,y)      P12(x,y)      P1⋅(x,y)
Y > MY        P21(x,y)      P22(x,y)      P2⋅(x,y)
Total         P⋅1(x,y)      P⋅2(x,y)      1

and a position estimator of MY given by

$\hat{M}_Y^{(p)} = \hat{Q}_Y\left(\hat{p}_Y\right)$   (1.2)

where

$\hat{p}_Y = \frac{1}{m}\left[\frac{m_x\,\hat{p}_{11}(x,y)}{\hat{p}_{\cdot 1}(x,y)} + \frac{(m-m_x)\,\hat{p}_{12}(x,y)}{\hat{p}_{\cdot 2}(x,y)}\right] \approx \frac{2\left[m_x\,\hat{p}_{11}(x,y) + (m-m_x)\,\hat{p}_{12}(x,y)\right]}{m}$
with $\hat{p}_{ij}(x,y)$ being the sample analogues of the Pij(x,y) obtained from the sample Sm, and mx the number of units in Sm with X ≤ MX.
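To fix ideas, the following is a minimal computational sketch of (1.1) and (1.2); it is ours rather than the authors' code, NumPy is assumed, and all helper names are hypothetical. It uses the approximate form of p̂Y above, with the margins p̂⋅1 and p̂⋅2 replaced by 1/2 (legitimate since MX is the true population median), and classifies Y against its sample median.

    import numpy as np

    def ratio_estimator(y, x, MX):
        # Kuk-Mak ratio estimator (1.1): M1 = My_hat * MX / Mx_hat
        return np.median(y) * MX / np.median(x)

    def position_estimator(y, x, MX):
        # Kuk-Mak position estimator (1.2): estimate the position p_Y of the
        # median from the 2x2 classification of the sample, then read off the
        # corresponding sample quantile of Y.
        m = len(y)
        below = x <= MX                        # units with X <= MX
        mx = below.sum()
        My_hat = np.median(y)
        p11 = np.mean(below & (y <= My_hat))   # sample analogue of P11(x,y)
        p12 = np.mean(~below & (y <= My_hat))  # sample analogue of P12(x,y)
        pY = 2.0 * (mx * p11 + (m - mx) * p12) / m
        return np.quantile(y, np.clip(pY, 0.0, 1.0))

    # toy illustration on skewed, correlated data
    rng = np.random.default_rng(42)
    x = rng.lognormal(mean=3.0, sigma=0.5, size=200)
    y = 2.0 * x * rng.lognormal(mean=0.0, sigma=0.2, size=200)
    MX = np.exp(3.0)                           # known population median of X
    print(ratio_estimator(y, x, MX), position_estimator(y, x, MX))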
Let $\tilde{F}_{YA}(y)$ and $\tilde{F}_{YB}(y)$ denote the proportions of units in the sample Sm with X ≤ MX and X > MX, respectively, that have Y values less than or equal to y. Then for estimating MY, Kuk and Mak (1989) suggested the 'stratification estimator'

$\hat{M}_Y^{(St)} = \inf\left\{y : \tilde{F}_Y(y) \ge 0.5\right\}$   (1.3)

where $\tilde{F}_Y(y) \cong \frac{1}{2}\left[\tilde{F}_{YA}(y) + \tilde{F}_{YB}(y)\right]$.
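A direct transcription of (1.3) reads as follows (again a sketch under the same assumptions as above; both strata are assumed non-empty):

    import numpy as np

    def stratification_estimator(y, x, MX):
        # Kuk-Mak 'stratification estimator' (1.3): the smallest sample y at
        # which the average of the within-stratum empirical distribution
        # functions of Y (strata: X <= MX and X > MX) reaches 0.5.
        below = x <= MX
        ya, yb = np.sort(y[below]), np.sort(y[~below])  # both strata assumed non-empty
        for t in np.sort(y):
            FA = np.searchsorted(ya, t, side="right") / len(ya)
            FB = np.searchsorted(yb, t, side="right") / len(yb)
            if 0.5 * (FA + FB) >= 0.5:
                return t
        return y.max()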
It is to be noted that the estimators defined in (1.1), (1.2) and (1.3) are based on prior knowledge of the median MX of the auxiliary character X. In many situations of practical importance the population median MX of X may not be known. This led Singh et al (2001) to discuss the problem of estimating the population median MY in double sampling and to suggest an analogous ratio estimator

$\hat{M}_{1d} = \hat{M}_Y \frac{\hat{M}_X^{(1)}}{\hat{M}_X}$   (1.4)

where $\hat{M}_X^{(1)}$ is the sample median based on the first phase sample Sn.
Sometimes, even if MX is unknown, information on a second auxiliary variable Z, closely related to X but, compared with X, remotely related to Y, is available on all units of the population. This type of situation has been briefly discussed by, among others, Chand (1975), Kiregyera (1980, 1984), Srivenkataramana and Tracy (1989), Sahoo and Sahoo (1993) and Singh (1993). Let MZ be the known population median of Z. Defining
$e_0 = \frac{\hat{M}_Y}{M_Y} - 1$, $e_1 = \frac{\hat{M}_X}{M_X} - 1$, $e_2 = \frac{\hat{M}_X^{(1)}}{M_X} - 1$, $e_3 = \frac{\hat{M}_Z}{M_Z} - 1$ and $e_4 = \frac{\hat{M}_Z^{(1)}}{M_Z} - 1$

such that E(ek) ≅ 0 and |ek| < 1 for k = 0, 1, 2, 3, 4; where $\hat{M}_Z$ and $\hat{M}_Z^{(1)}$ are the sample median estimators of MZ based on the second phase sample Sm and the first phase sample Sn respectively. Let us define the following two new matrices:
              X ≤ MX        X > MX        Total
Z ≤ MZ        P11(x,z)      P12(x,z)      P1⋅(x,z)
Z > MZ        P21(x,z)      P22(x,z)      P2⋅(x,z)
Total         P⋅1(x,z)      P⋅2(x,z)      1

and

              Y ≤ MY        Y > MY        Total
Z ≤ MZ        P11(y,z)      P12(y,z)      P1⋅(y,z)
Z > MZ        P21(y,z)      P22(y,z)      P2⋅(y,z)
Total         P⋅1(y,z)      P⋅2(y,z)      1
Using the results given in Appendix 1, to the first order of approximation, we have

$E\left(e_0^2\right) = \left(\frac{N-m}{N}\right)(4m)^{-1}\left\{M_Y f_Y(M_Y)\right\}^{-2}$
$E\left(e_1^2\right) = \left(\frac{N-m}{N}\right)(4m)^{-1}\left\{M_X f_X(M_X)\right\}^{-2}$
$E\left(e_2^2\right) = \left(\frac{N-n}{N}\right)(4n)^{-1}\left\{M_X f_X(M_X)\right\}^{-2}$
$E\left(e_3^2\right) = \left(\frac{N-m}{N}\right)(4m)^{-1}\left\{M_Z f_Z(M_Z)\right\}^{-2}$
$E\left(e_4^2\right) = \left(\frac{N-n}{N}\right)(4n)^{-1}\left\{M_Z f_Z(M_Z)\right\}^{-2}$
$E\left(e_0 e_1\right) = \left(\frac{N-m}{N}\right)(4m)^{-1}\left\{4P_{11}(x,y)-1\right\}\left\{M_X M_Y f_X(M_X) f_Y(M_Y)\right\}^{-1}$
$E\left(e_0 e_2\right) = \left(\frac{N-n}{N}\right)(4n)^{-1}\left\{4P_{11}(x,y)-1\right\}\left\{M_X M_Y f_X(M_X) f_Y(M_Y)\right\}^{-1}$
$E\left(e_0 e_3\right) = \left(\frac{N-m}{N}\right)(4m)^{-1}\left\{4P_{11}(y,z)-1\right\}\left\{M_Y M_Z f_Y(M_Y) f_Z(M_Z)\right\}^{-1}$
$E\left(e_0 e_4\right) = \left(\frac{N-n}{N}\right)(4n)^{-1}\left\{4P_{11}(y,z)-1\right\}\left\{M_Y M_Z f_Y(M_Y) f_Z(M_Z)\right\}^{-1}$
$E\left(e_1 e_2\right) = \left(\frac{N-n}{N}\right)(4n)^{-1}\left\{M_X f_X(M_X)\right\}^{-2}$
$E\left(e_1 e_3\right) = \left(\frac{N-m}{N}\right)(4m)^{-1}\left\{4P_{11}(x,z)-1\right\}\left\{M_X M_Z f_X(M_X) f_Z(M_Z)\right\}^{-1}$
$E\left(e_1 e_4\right) = \left(\frac{N-n}{N}\right)(4n)^{-1}\left\{4P_{11}(x,z)-1\right\}\left\{M_X M_Z f_X(M_X) f_Z(M_Z)\right\}^{-1}$
$E\left(e_2 e_3\right) = \left(\frac{N-n}{N}\right)(4n)^{-1}\left\{4P_{11}(x,z)-1\right\}\left\{M_X M_Z f_X(M_X) f_Z(M_Z)\right\}^{-1}$
$E\left(e_2 e_4\right) = \left(\frac{N-n}{N}\right)(4n)^{-1}\left\{4P_{11}(x,z)-1\right\}\left\{M_X M_Z f_X(M_X) f_Z(M_Z)\right\}^{-1}$
$E\left(e_3 e_4\right) = \left(\frac{N-n}{N}\right)(4n)^{-1}\left\{M_Z f_Z(M_Z)\right\}^{-2}$
where it is assumed that as N→∞ the distribution of the trivariate variable (X,Y,Z) approaches a
continuous distribution with marginal densities fX(x), fY(y) and fZ(z) for X, Y and Z respectively.
This assumption holds in particular under a superpopulation model framework, treating the
values of (X, Y, Z) in the population as a realization of N independent observations from a
continuous distribution. We also assume that fY(MY), fX(MX) and fZ(MZ) are positive.
Under these conditions, the sample median $\hat{M}_Y$ is consistent and asymptotically normal (Gross, 1980) with mean MY and variance

$\left(\frac{N-m}{N}\right)(4m)^{-1}\left\{f_Y(M_Y)\right\}^{-2}$
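As a quick plausibility check of this expression (a simulation sketch of ours, not part of the paper; NumPy is assumed and the population, sizes and replication count are arbitrary choices), one can compare it with the Monte Carlo variance of the sample median under SRSWOR:

    import numpy as np

    rng = np.random.default_rng(0)
    N, m = 20_000, 400
    Y = rng.exponential(scale=1.0, size=N)    # a skewed study variable

    MY = np.median(Y)                         # finite-population median
    fY = np.exp(-MY)                          # superpopulation density of Exp(1) at MY
    approx = ((N - m) / N) / (4 * m * fY**2)  # ((N-m)/N)(4m)^{-1}{f_Y(M_Y)}^{-2}

    # Monte Carlo variance of the sample median over repeated SRSWOR draws
    meds = [np.median(rng.choice(Y, size=m, replace=False)) for _ in range(2000)]
    print(approx, np.var(meds))               # the two values should be close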
In this paper we suggest classes of estimators for MY using information on two auxiliary variables X and Z in double sampling and analyze their properties.
2. SUGGESTED CLASS OF ESTIMATORS
Motivated by Srivastava (1971), we suggest a class of estimators of MY as

$g = \left\{\hat{M}_Y^{(g)} : \hat{M}_Y^{(g)} = \hat{M}_Y\, g(u,v)\right\}$   (2.1)

where $u = \hat{M}_X/\hat{M}_X^{(1)}$, $v = \hat{M}_Z^{(1)}/M_Z$, and g(u,v) is a function of u and v such that g(1,1) = 1 and such that it satisfies the following conditions:
1. Whatever be the samples (Sn and Sm) chosen, (u,v) assumes values in a closed convex subspace, P, of the two-dimensional real space containing the point (1,1).
2. The function g(u,v) is continuous in P, such that g(1,1) = 1.
3. The first and second order partial derivatives of g(u,v) exist and are continuous in P.
Expanding g(u,v) about the point (1,1) in a second order Taylor's series and taking expectations, it is found that

$E\left(\hat{M}_Y^{(g)}\right) = M_Y + O\left(n^{-1}\right)$

so the bias is of order n⁻¹. Using a first order Taylor's series expansion around the point (1,1) and noting that g(1,1) = 1, we have

$\hat{M}_Y^{(g)} \cong M_Y\left[1 + e_0 + \left(e_1 - e_2\right)g_1(1,1) + e_4\, g_2(1,1)\right] + O\left(n^{-1}\right)$

or

$\left(\hat{M}_Y^{(g)} - M_Y\right) \cong M_Y\left[e_0 + \left(e_1 - e_2\right)g_1(1,1) + e_4\, g_2(1,1)\right]$   (2.2)

where g1(1,1) and g2(1,1) denote the first order partial derivatives of g(u,v) with respect to u and v respectively at the point (1,1).
Squaring both sides of (2.2) and then taking expectations, we get the variance of $\hat{M}_Y^{(g)}$, to the first degree of approximation, as

$Var\left(\hat{M}_Y^{(g)}\right) = \frac{1}{4\left(f_Y(M_Y)\right)^2}\left[\left(\frac{1}{m}-\frac{1}{N}\right) + \left(\frac{1}{m}-\frac{1}{n}\right)A + \left(\frac{1}{n}-\frac{1}{N}\right)B\right]$   (2.3)

where

$A = \left(\frac{M_Y f_Y(M_Y)}{M_X f_X(M_X)}\right)g_1(1,1)\left[\left(\frac{M_Y f_Y(M_Y)}{M_X f_X(M_X)}\right)g_1(1,1) + 2\left(4P_{11}(x,y)-1\right)\right]$   (2.4)

$B = \left(\frac{M_Y f_Y(M_Y)}{M_Z f_Z(M_Z)}\right)g_2(1,1)\left[\left(\frac{M_Y f_Y(M_Y)}{M_Z f_Z(M_Z)}\right)g_2(1,1) + 2\left(4P_{11}(y,z)-1\right)\right]$   (2.5)
The variance of $\hat{M}_Y^{(g)}$ in (2.3) is minimized for

$g_1(1,1) = -\frac{M_X f_X(M_X)}{M_Y f_Y(M_Y)}\left(4P_{11}(x,y)-1\right)$,  $g_2(1,1) = -\frac{M_Z f_Z(M_Z)}{M_Y f_Y(M_Y)}\left(4P_{11}(y,z)-1\right)$   (2.6)

Thus the resulting (minimum) variance of $\hat{M}_Y^{(g)}$ is given by

$\min Var\left(\hat{M}_Y^{(g)}\right) = \frac{1}{4\left(f_Y(M_Y)\right)^2}\left[\left(\frac{1}{m}-\frac{1}{N}\right) - \left(\frac{1}{m}-\frac{1}{n}\right)\left(4P_{11}(x,y)-1\right)^2 - \left(\frac{1}{n}-\frac{1}{N}\right)\left(4P_{11}(y,z)-1\right)^2\right]$   (2.7)
We now prove the following theorem.

Theorem 2.1. Up to terms of order n⁻¹,

$Var\left(\hat{M}_Y^{(g)}\right) \ge \frac{1}{4\left(f_Y(M_Y)\right)^2}\left[\left(\frac{1}{m}-\frac{1}{N}\right) - \left(\frac{1}{m}-\frac{1}{n}\right)\left(4P_{11}(x,y)-1\right)^2 - \left(\frac{1}{n}-\frac{1}{N}\right)\left(4P_{11}(y,z)-1\right)^2\right]$

with equality holding if

$g_1(1,1) = -\frac{M_X f_X(M_X)}{M_Y f_Y(M_Y)}\left(4P_{11}(x,y)-1\right)$,  $g_2(1,1) = -\frac{M_Z f_Z(M_Z)}{M_Y f_Y(M_Y)}\left(4P_{11}(y,z)-1\right)$.
It is interesting to note that the lower bound of the variance of $\hat{M}_Y^{(g)}$ at (2.1) is the variance of the linear regression estimator

$\hat{M}_Y^{(l)} = \hat{M}_Y + \hat{d}_1\left(\hat{M}_X^{(1)} - \hat{M}_X\right) + \hat{d}_2\left(M_Z - \hat{M}_Z^{(1)}\right)$   (2.8)

where

$\hat{d}_1 = \frac{\hat{f}_X(\hat{M}_X)}{\hat{f}_Y(\hat{M}_Y)}\left(4\hat{p}_{11}(x,y)-1\right)$ and $\hat{d}_2 = \frac{\hat{f}_Z(\hat{M}_Z)}{\hat{f}_Y(\hat{M}_Y)}\left(4\hat{p}_{11}(y,z)-1\right)$,

with $\hat{p}_{11}(x,y)$ and $\hat{p}_{11}(y,z)$ being the sample analogues of $P_{11}(x,y)$ and $P_{11}(y,z)$ respectively; $\hat{f}_Y(\hat{M}_Y)$, $\hat{f}_X(\hat{M}_X)$ and $\hat{f}_Z(\hat{M}_Z)$ can be obtained by following Silverman (1986).
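As an illustration, the estimator (2.8) can be computed with kernel density estimates in place of the f̂'s. The sketch below is ours, not the authors' code: SciPy's gaussian_kde with its Silverman-type bandwidth rule stands in for the density estimation of Silverman (1986), and all names are hypothetical.

    import numpy as np
    from scipy.stats import gaussian_kde

    def kde_at_median(sample):
        # Gaussian kernel estimate of the density at the sample median,
        # using Silverman's rule-of-thumb bandwidth
        return gaussian_kde(sample, bw_method="silverman")(np.median(sample))[0]

    def regression_estimator(y, x, z, x1, z1, MZ):
        # Linear regression estimator (2.8) in double sampling.
        # y, x, z: second phase sample S_m; x1, z1: first phase sample S_n;
        # MZ: known population median of Z.
        My, Mx, Mz = np.median(y), np.median(x), np.median(z)
        Mx1, Mz1 = np.median(x1), np.median(z1)
        fy, fx, fz = kde_at_median(y), kde_at_median(x), kde_at_median(z)
        p11_xy = np.mean((x <= Mx) & (y <= My))  # sample analogue of P11(x,y)
        p11_yz = np.mean((z <= Mz) & (y <= My))  # sample analogue of P11(y,z)
        d1 = (fx / fy) * (4 * p11_xy - 1)
        d2 = (fz / fy) * (4 * p11_yz - 1)
        return My + d1 * (Mx1 - Mx) + d2 * (MZ - Mz1)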
Any parametric function g(u,v) satisfying conditions (1), (2) and (3) can generate an asymptotically acceptable estimator. The class of such estimators is large. The following simple functions g(u,v), for example, give seven estimators of the class:

$g^{(1)}(u,v) = u^{\alpha}v^{\beta}$,  $g^{(2)}(u,v) = \frac{1+\alpha(u-1)}{1-\beta(v-1)}$,
$g^{(3)}(u,v) = 1+\alpha(u-1)+\beta(v-1)$,  $g^{(4)}(u,v) = \left\{1-\alpha(u-1)-\beta(v-1)\right\}^{-1}$,
$g^{(5)}(u,v) = w_1 u^{\alpha} + w_2 v^{\beta}$, with $w_1+w_2=1$,
$g^{(6)}(u,v) = \alpha u + (1-\alpha)v^{\beta}$,  $g^{(7)}(u,v) = \exp\left\{\alpha(u-1)+\beta(v-1)\right\}$.

Let the seven estimators generated by g(i)(u,v) be denoted by $\hat{M}_Y^{(gi)} = \hat{M}_Y\, g^{(i)}(u,v)$, i = 1 to 7. It is easily seen that the optimum values of the parameters α, β, wi (i = 1, 2) are given by the right hand sides of (2.6).
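For instance, for $g^{(1)}(u,v) = u^{\alpha}v^{\beta}$ the first order partial derivatives at (1,1) are $g_1(1,1) = \alpha$ and $g_2(1,1) = \beta$, so the optimum parameter values follow directly from (2.6) (a routine verification, written out here for concreteness):

$\alpha_{opt} = -\frac{M_X f_X(M_X)}{M_Y f_Y(M_Y)}\left(4P_{11}(x,y)-1\right)$,  $\beta_{opt} = -\frac{M_Z f_Z(M_Z)}{M_Y f_Y(M_Y)}\left(4P_{11}(y,z)-1\right)$

In practice these would be replaced by their sample estimates, as discussed in Section 4.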
3. A WIDER CLASS OF ESTIMATORS
The class of estimators (2.1) does not include the estimator

$\hat{M}_{Yd} = \hat{M}_Y + d_1\left(\hat{M}_X^{(1)} - \hat{M}_X\right) + d_2\left(M_Z - \hat{M}_Z^{(1)}\right)$, (d1, d2) being constants.

However, this estimator is included if we consider a class of estimators wider than (2.1), defined by

$\hat{M}_Y^{(G)} = G\left(\hat{M}_Y, u, v\right)$   (3.1)

of MY, where G(⋅) is a function of $\hat{M}_Y$, u and v such that $G(M_Y,1,1) = M_Y$ and $G_1(M_Y,1,1) = 1$, with $G_1(M_Y,1,1)$ denoting the first partial derivative of G(⋅) with respect to $\hat{M}_Y$ at the point (MY,1,1).
Proceeding as in Section 2, it is easily seen that the bias of $\hat{M}_Y^{(G)}$ is of order n⁻¹ and that, up to this order of terms, the variance of $\hat{M}_Y^{(G)}$ is given by

$Var\left(\hat{M}_Y^{(G)}\right) = \frac{1}{4\left(f_Y(M_Y)\right)^2}\Bigg[\left(\frac{1}{m}-\frac{1}{N}\right) + \left(\frac{1}{m}-\frac{1}{n}\right)\left(\frac{f_Y(M_Y)}{M_X f_X(M_X)}\right)G_2(M_Y,1,1)\left\{\left(\frac{f_Y(M_Y)}{M_X f_X(M_X)}\right)G_2(M_Y,1,1) + 2\left(4P_{11}(x,y)-1\right)\right\}$
$\qquad + \left(\frac{1}{n}-\frac{1}{N}\right)\left(\frac{f_Y(M_Y)}{M_Z f_Z(M_Z)}\right)G_3(M_Y,1,1)\left\{\left(\frac{f_Y(M_Y)}{M_Z f_Z(M_Z)}\right)G_3(M_Y,1,1) + 2\left(4P_{11}(y,z)-1\right)\right\}\Bigg]$   (3.2)

where G2(MY,1,1) and G3(MY,1,1) denote the first partial derivatives of G(⋅) with respect to u and v respectively at the point (MY,1,1).
The variance of $\hat{M}_Y^{(G)}$ is minimized for

$G_2(M_Y,1,1) = -\frac{M_X f_X(M_X)}{f_Y(M_Y)}\left(4P_{11}(x,y)-1\right)$,  $G_3(M_Y,1,1) = -\frac{M_Z f_Z(M_Z)}{f_Y(M_Y)}\left(4P_{11}(y,z)-1\right)$   (3.3)

Substitution of (3.3) in (3.2) yields the minimum variance of $\hat{M}_Y^{(G)}$ as

$\min Var\left(\hat{M}_Y^{(G)}\right) = \frac{1}{4\left(f_Y(M_Y)\right)^2}\left[\left(\frac{1}{m}-\frac{1}{N}\right) - \left(\frac{1}{m}-\frac{1}{n}\right)\left(4P_{11}(x,y)-1\right)^2 - \left(\frac{1}{n}-\frac{1}{N}\right)\left(4P_{11}(y,z)-1\right)^2\right] = \min Var\left(\hat{M}_Y^{(g)}\right)$   (3.4)
Thus we have established the following theorem.

Theorem 3.1. Up to terms of order n⁻¹,

$Var\left(\hat{M}_Y^{(G)}\right) \ge \frac{1}{4\left(f_Y(M_Y)\right)^2}\left[\left(\frac{1}{m}-\frac{1}{N}\right) - \left(\frac{1}{m}-\frac{1}{n}\right)\left(4P_{11}(x,y)-1\right)^2 - \left(\frac{1}{n}-\frac{1}{N}\right)\left(4P_{11}(y,z)-1\right)^2\right]$

with equality holding if

$G_2(M_Y,1,1) = -\frac{M_X f_X(M_X)}{f_Y(M_Y)}\left(4P_{11}(x,y)-1\right)$,  $G_3(M_Y,1,1) = -\frac{M_Z f_Z(M_Z)}{f_Y(M_Y)}\left(4P_{11}(y,z)-1\right)$.
If the information on the second auxiliary variable Z is not used, then the class of estimators $\hat{M}_Y^{(G)}$ reduces to the class of estimators of MY

$\hat{M}_Y^{(H)} = H\left(\hat{M}_Y, u\right)$   (3.5)

where $H(\hat{M}_Y, u)$ is a function of $(\hat{M}_Y, u)$ such that $H(M_Y,1) = M_Y$ and $H_1(M_Y,1) = 1$, with $H_1(M_Y,1) = \left.\frac{\partial H(\cdot)}{\partial \hat{M}_Y}\right|_{(M_Y,1)}$. The class of estimators $\hat{M}_Y^{(H)}$ is reported by Singh et al (2001).
The minimum variance of $\hat{M}_Y^{(H)}$ to the first degree of approximation is given by

$\min Var\left(\hat{M}_Y^{(H)}\right) = \frac{1}{4\left(f_Y(M_Y)\right)^2}\left[\left(\frac{1}{m}-\frac{1}{N}\right) - \left(\frac{1}{m}-\frac{1}{n}\right)\left(4P_{11}(x,y)-1\right)^2\right]$   (3.6)
From (3.4) and (3.6) we have

$\min Var\left(\hat{M}_Y^{(H)}\right) - \min Var\left(\hat{M}_Y^{(G)}\right) = \left(\frac{1}{n}-\frac{1}{N}\right)\frac{\left(4P_{11}(y,z)-1\right)^2}{4\left(f_Y(M_Y)\right)^2}$   (3.7)

which is always positive (provided P11(y,z) ≠ 1/4). Thus the proposed class of estimators $\hat{M}_Y^{(G)}$ is more efficient than the estimator $\hat{M}_Y^{(H)}$ considered by Singh et al (2001).
4. ESTIMATOR BASED ON ESTIMATED OPTIMUM VALUES
We denote

$\alpha_1 = \frac{M_X f_X(M_X)}{M_Y f_Y(M_Y)}\left(4P_{11}(x,y)-1\right)$,  $\alpha_2 = \frac{M_Z f_Z(M_Z)}{M_Y f_Y(M_Y)}\left(4P_{11}(y,z)-1\right)$   (4.1)

In practice the optimum values g1(1,1) (= -α1) and g2(1,1) (= -α2) are not known, so we estimate them from the sample data at hand. Estimators of the optimum values of g1(1,1) and g2(1,1) are given by

$\hat{g}_1(1,1) = -\hat{\alpha}_1$,  $\hat{g}_2(1,1) = -\hat{\alpha}_2$   (4.2)
where

$\hat{\alpha}_1 = \frac{\hat{M}_X \hat{f}_X(\hat{M}_X)}{\hat{M}_Y \hat{f}_Y(\hat{M}_Y)}\left(4\hat{p}_{11}(x,y)-1\right)$,  $\hat{\alpha}_2 = \frac{\hat{M}_Z \hat{f}_Z(\hat{M}_Z)}{\hat{M}_Y \hat{f}_Y(\hat{M}_Y)}\left(4\hat{p}_{11}(y,z)-1\right)$   (4.3)
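These estimates use the same kernel-based ingredients as (2.8); a brief sketch under the same assumptions as before (SciPy assumed, names ours):

    import numpy as np
    from scipy.stats import gaussian_kde

    def alpha_hats(y, x, z):
        # Sample estimates (4.3) of alpha_1 and alpha_2 in (4.1)
        My, Mx, Mz = np.median(y), np.median(x), np.median(z)
        fy = gaussian_kde(y, bw_method="silverman")(My)[0]
        fx = gaussian_kde(x, bw_method="silverman")(Mx)[0]
        fz = gaussian_kde(z, bw_method="silverman")(Mz)[0]
        a1 = (Mx * fx / (My * fy)) * (4 * np.mean((x <= Mx) & (y <= My)) - 1)
        a2 = (Mz * fz / (My * fy)) * (4 * np.mean((z <= Mz) & (y <= My)) - 1)
        return a1, a2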
Now, following the procedure discussed in Singh and Singh (19xx) and Srivastava and Jhajj (1983), we define the following class of estimators of MY (based on estimated optimum values) as

$\hat{M}_Y^{(g*)} = \hat{M}_Y\, g^*\left(u, v, \hat{\alpha}_1, \hat{\alpha}_2\right)$   (4.4)

where g*(⋅) is a function of $(u, v, \hat{\alpha}_1, \hat{\alpha}_2)$ such that

$g^*(1,1,\alpha_1,\alpha_2) = 1$

$g_1^*(1,1,\alpha_1,\alpha_2) = \left.\frac{\partial g^*(\cdot)}{\partial u}\right|_{(1,1,\alpha_1,\alpha_2)} = -\alpha_1$

$g_2^*(1,1,\alpha_1,\alpha_2) = \left.\frac{\partial g^*(\cdot)}{\partial v}\right|_{(1,1,\alpha_1,\alpha_2)} = -\alpha_2$

$g_3^*(1,1,\alpha_1,\alpha_2) = \left.\frac{\partial g^*(\cdot)}{\partial \hat{\alpha}_1}\right|_{(1,1,\alpha_1,\alpha_2)} = 0$

$g_4^*(1,1,\alpha_1,\alpha_2) = \left.\frac{\partial g^*(\cdot)}{\partial \hat{\alpha}_2}\right|_{(1,1,\alpha_1,\alpha_2)} = 0$
and such that it satisfies the following conditions:
1. Whatever be the samples (Sn and Sm) chosen, $(u, v, \hat{\alpha}_1, \hat{\alpha}_2)$ assumes values in a closed convex subspace, S, of the four-dimensional real space containing the point (1, 1, α1, α2).
2. The function $g^*(u, v, \hat{\alpha}_1, \hat{\alpha}_2)$ is continuous in S.
3. The first and second order partial derivatives of $g^*(u, v, \hat{\alpha}_1, \hat{\alpha}_2)$ exist and are continuous in S.
Under the above conditions, it can be shown that

$E\left(\hat{M}_Y^{(g*)}\right) = M_Y + O\left(n^{-1}\right)$

and, to the first degree of approximation, the variance of $\hat{M}_Y^{(g*)}$ is given by

$Var\left(\hat{M}_Y^{(g*)}\right) = \min Var\left(\hat{M}_Y^{(g)}\right)$   (4.5)

where $\min Var\left(\hat{M}_Y^{(g)}\right)$ is given in (2.7).
A wider class of estimators of MY based on estimated optimum values is defined by

$\hat{M}_Y^{(G*)} = G^*\left(\hat{M}_Y, u, v, \hat{\alpha}_1^*, \hat{\alpha}_2^*\right)$   (4.6)

where

$\hat{\alpha}_1^* = \frac{\hat{M}_X \hat{f}_X(\hat{M}_X)}{\hat{f}_Y(\hat{M}_Y)}\left(4\hat{p}_{11}(x,y)-1\right)$,  $\hat{\alpha}_2^* = \frac{\hat{M}_Z \hat{f}_Z(\hat{M}_Z)}{\hat{f}_Y(\hat{M}_Y)}\left(4\hat{p}_{11}(y,z)-1\right)$   (4.7)

are the estimates of

$\alpha_1^* = \frac{M_X f_X(M_X)}{f_Y(M_Y)}\left(4P_{11}(x,y)-1\right)$,  $\alpha_2^* = \frac{M_Z f_Z(M_Z)}{f_Y(M_Y)}\left(4P_{11}(y,z)-1\right)$   (4.8)

and G*(⋅) is a function of $\left(\hat{M}_Y, u, v, \hat{\alpha}_1^*, \hat{\alpha}_2^*\right)$ such that
$G^*\left(M_Y,1,1,\alpha_1^*,\alpha_2^*\right) = M_Y$

$G_1^*\left(M_Y,1,1,\alpha_1^*,\alpha_2^*\right) = \left.\frac{\partial G^*(\cdot)}{\partial \hat{M}_Y}\right|_{(M_Y,1,1,\alpha_1^*,\alpha_2^*)} = 1$

$G_2^*\left(M_Y,1,1,\alpha_1^*,\alpha_2^*\right) = \left.\frac{\partial G^*(\cdot)}{\partial u}\right|_{(M_Y,1,1,\alpha_1^*,\alpha_2^*)} = -\alpha_1^*$

$G_3^*\left(M_Y,1,1,\alpha_1^*,\alpha_2^*\right) = \left.\frac{\partial G^*(\cdot)}{\partial v}\right|_{(M_Y,1,1,\alpha_1^*,\alpha_2^*)} = -\alpha_2^*$

$G_4^*\left(M_Y,1,1,\alpha_1^*,\alpha_2^*\right) = \left.\frac{\partial G^*(\cdot)}{\partial \hat{\alpha}_1^*}\right|_{(M_Y,1,1,\alpha_1^*,\alpha_2^*)} = 0$

$G_5^*\left(M_Y,1,1,\alpha_1^*,\alpha_2^*\right) = \left.\frac{\partial G^*(\cdot)}{\partial \hat{\alpha}_2^*}\right|_{(M_Y,1,1,\alpha_1^*,\alpha_2^*)} = 0$
Under these conditions it can easily be shown that

$E\left(\hat{M}_Y^{(G*)}\right) = M_Y + O\left(n^{-1}\right)$

and, to the first degree of approximation, the variance of $\hat{M}_Y^{(G*)}$ is given by

$Var\left(\hat{M}_Y^{(G*)}\right) = \min Var\left(\hat{M}_Y^{(G)}\right)$   (4.9)

where $\min Var\left(\hat{M}_Y^{(G)}\right)$ is given in (3.4).

It is to be mentioned that a large number of estimators can be generated from the classes $\hat{M}_Y^{(g*)}$ and $\hat{M}_Y^{(G*)}$ based on estimated optimum values.
5. EFFICIENCY OF THE SUGGESTED CLASS OF ESTIMATORS FOR FIXED COST
The appropriate estimator based on single-phase sampling, without using any auxiliary variable, is $\hat{M}_Y$, whose variance is given by

$Var\left(\hat{M}_Y\right) = \left(\frac{1}{m}-\frac{1}{N}\right)\frac{1}{4\left(f_Y(M_Y)\right)^2}$   (5.1)

When we do not use any auxiliary character, the cost function is of the form C0 = mC1, where C0 and C1 are the total cost and the cost per unit of collecting information on the character Y. The optimum value of the variance for the fixed cost C0 is given by

$Opt.Var\left(\hat{M}_Y\right) = V_0\left(\frac{C_1}{C_0}-\frac{1}{N}\right)$   (5.2)

where

$V_0 = \frac{1}{4\left(f_Y(M_Y)\right)^2}$   (5.3)
When we use one auxiliary character X, the cost function is given by

$C_0 = C_1 m + C_2 n$   (5.4)

where C2 is the cost per unit of collecting information on the auxiliary character X. The optimum sample sizes under (5.4), for which the minimum variance of $\hat{M}_Y^{(H)}$ is attained at the fixed cost, are

$m_{opt} = \frac{C_0\sqrt{\left(V_0-V_1\right)/C_1}}{\sqrt{\left(V_0-V_1\right)C_1}+\sqrt{V_1 C_2}}$,  $n_{opt} = \frac{C_0\sqrt{V_1/C_2}}{\sqrt{\left(V_0-V_1\right)C_1}+\sqrt{V_1 C_2}}$   (5.5)

where $V_1 = V_0\left(4P_{11}(x,y)-1\right)^2$.
Putting these optimum values of m and n into the minimum variance expression of $\hat{M}_Y^{(H)}$ in (3.6), we get the optimum $\min Var\left(\hat{M}_Y^{(H)}\right)$ as

$Opt.\min Var\left(\hat{M}_Y^{(H)}\right) = \frac{\left[\sqrt{\left(V_0-V_1\right)C_1}+\sqrt{V_1 C_2}\right]^2}{C_0} - \frac{V_0}{N}$   (5.7)
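Numerically, (5.5) and (5.7) are straightforward to evaluate. The sketch below is ours and the inputs are hypothetical; it ignores the V0/N term, i.e. takes N large:

    import numpy as np

    def optimum_allocation_one_aux(V0, P11_xy, C0, C1, C2):
        # Optimum m, n of (5.5) and the optimal variance (5.7), large N.
        # V0 = 1/(4 f_Y(M_Y)^2); V1 = V0 (4 P11(x,y) - 1)^2.
        V1 = V0 * (4 * P11_xy - 1) ** 2
        denom = np.sqrt((V0 - V1) * C1) + np.sqrt(V1 * C2)
        m_opt = C0 * np.sqrt((V0 - V1) / C1) / denom
        n_opt = C0 * np.sqrt(V1 / C2) / denom
        return m_opt, n_opt, denom ** 2 / C0

    # hypothetical inputs: budget 1000, C1 = 5, C2 = 1, P11(x,y) = 0.40
    print(optimum_allocation_one_aux(V0=1.0, P11_xy=0.40, C0=1000, C1=5, C2=1))

One can verify that the returned m and n satisfy the budget constraint C1 m + C2 n = C0.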
Similarly, when we use the additional character Z, the cost function is given by

$C_0 = C_1 m + \left(C_2+C_3\right)n$   (5.8)

where C3 is the cost per unit of collecting information on the character Z. It is assumed that C1 > C2 > C3. The optimum values of m and n for fixed cost C0, which minimize the minimum variance of $\hat{M}_Y^{(g)}$ (or $\hat{M}_Y^{(G)}$) in (2.7) (or (3.4)), are given by

$m_{opt} = \frac{C_0\sqrt{\left(V_0-V_1\right)/C_1}}{\sqrt{\left(V_0-V_1\right)C_1}+\sqrt{\left(C_2+C_3\right)\left(V_1-V_2\right)}}$   (5.9)

$n_{opt} = \frac{C_0\sqrt{\left(V_1-V_2\right)/\left(C_2+C_3\right)}}{\sqrt{\left(V_0-V_1\right)C_1}+\sqrt{\left(C_2+C_3\right)\left(V_1-V_2\right)}}$   (5.10)

where $V_2 = V_0\left(4P_{11}(y,z)-1\right)^2$.
The optimum variance of $\hat{M}_Y^{(g)}$ (or $\hat{M}_Y^{(G)}$) corresponding to the optimal two-phase sampling strategy is

$Opt.\left[\min Var\left(\hat{M}_Y^{(g)}\right) \text{ or } \min Var\left(\hat{M}_Y^{(G)}\right)\right] = \frac{\left[\sqrt{\left(V_0-V_1\right)C_1}+\sqrt{\left(C_2+C_3\right)\left(V_1-V_2\right)}\right]^2}{C_0} - \frac{V_2}{N}$   (5.11)
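The two-auxiliary allocation (5.9)-(5.10) and variance (5.11) admit the same numerical treatment (again a sketch of ours; it requires V1 > V2, i.e. X more closely related to Y than Z is):

    import numpy as np

    def optimum_allocation_two_aux(V0, P11_xy, P11_yz, C0, C1, C2, C3):
        # Optimum m, n of (5.9)-(5.10) and the optimal variance (5.11), large N.
        V1 = V0 * (4 * P11_xy - 1) ** 2
        V2 = V0 * (4 * P11_yz - 1) ** 2      # V1 > V2 assumed
        denom = np.sqrt((V0 - V1) * C1) + np.sqrt((C2 + C3) * (V1 - V2))
        m_opt = C0 * np.sqrt((V0 - V1) / C1) / denom
        n_opt = C0 * np.sqrt((V1 - V2) / (C2 + C3)) / denom
        return m_opt, n_opt, denom ** 2 / C0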
Assuming large N, the proposed two-phase sampling strategy would be profitable over single-phase sampling so long as

$Opt.Var\left(\hat{M}_Y\right) > Opt.\left[\min Var\left(\hat{M}_Y^{(g)}\right) \text{ or } \min Var\left(\hat{M}_Y^{(G)}\right)\right]$

i.e.

$\frac{C_2+C_3}{C_1} < \left[\frac{\sqrt{V_0}-\sqrt{V_0-V_1}}{\sqrt{V_1-V_2}}\right]^2$   (5.12)
When N is large, the proposed two-phase sampling strategy is more efficient than that of Singh et al (2001) if

$Opt.\left[\min Var\left(\hat{M}_Y^{(g)}\right) \text{ or } \min Var\left(\hat{M}_Y^{(G)}\right)\right] < Opt.\left[\min Var\left(\hat{M}_Y^{(H)}\right)\right]$

i.e.

$\frac{C_2+C_3}{C_2} < \frac{V_1}{V_1-V_2}$   (5.13)
6. GENERALIZED CLASS OF ESTIMATORS
We suggest a class of estimators of MY as
$\Im = \left\{\hat{M}_Y^{(F)} : \hat{M}_Y^{(F)} = F\left(\hat{M}_Y, u, v, w\right)\right\}$   (6.1)

where $u = \hat{M}_X/\hat{M}_X^{(1)}$, $v = \hat{M}_Z^{(1)}/M_Z$, $w = \hat{M}_Z/M_Z$, and the function F(⋅) assumes a value in a bounded closed convex subset W ⊂ ℜ⁴ which contains the point T = (MY,1,1,1) and is such that F(T) = MY ⇒ F1(T) = 1, with F1(T) denoting the first order partial derivative of F(⋅) with respect to $\hat{M}_Y$ at the point T. Using a first order Taylor's series expansion around the point T, we get
$\hat{M}_Y^{(F)} = F(T) + \left(\hat{M}_Y - M_Y\right)F_1(T) + (u-1)F_2(T) + (v-1)F_3(T) + (w-1)F_4(T) + O\left(n^{-1}\right)$   (6.2)

where F2(T), F3(T) and F4(T) denote the first order partial derivatives of $F(\hat{M}_Y, u, v, w)$ with respect to u, v and w respectively at the point T. Under the assumption that F(T) = MY and F1(T) = 1, we have the following theorem.
Theorem 6.1. Any estimator in ℑ is asymptotically unbiased and normal.
Proof: Following Kuk and Mak (1989), let PY, PX and PZ denote the proportions of Y, X and Z values in the second phase sample for which Y ≤ MY, X ≤ MX and Z ≤ MZ respectively, and let P′X and P′Z be the corresponding first phase proportions; then we have

$\hat{M}_Y - M_Y = \frac{1}{2f_Y(M_Y)}\left(1-2P_Y\right) + O_p\left(n^{-1/2}\right)$

$\hat{M}_X - M_X = \frac{1}{2f_X(M_X)}\left(1-2P_X\right) + O_p\left(n^{-1/2}\right)$

$\hat{M}_X^{(1)} - M_X = \frac{1}{2f_X(M_X)}\left(1-2P'_X\right) + O_p\left(n^{-1/2}\right)$

$\hat{M}_Z - M_Z = \frac{1}{2f_Z(M_Z)}\left(1-2P_Z\right) + O_p\left(n^{-1/2}\right)$

and

$\hat{M}_Z^{(1)} - M_Z = \frac{1}{2f_Z(M_Z)}\left(1-2P'_Z\right) + O_p\left(n^{-1/2}\right)$

Using these expressions in (6.2), we get the required results.
Expression (6.2) can be rewritten as

$\hat{M}_Y^{(F)} - M_Y \cong \left(\hat{M}_Y - M_Y\right) + (u-1)F_2(T) + (v-1)F_3(T) + (w-1)F_4(T)$

or

$\hat{M}_Y^{(F)} - M_Y \cong M_Y e_0 + \left(e_1-e_2\right)F_2(T) + e_4 F_3(T) + e_3 F_4(T)$   (6.3)

Squaring both sides of (6.3) and then taking expectations, we get the variance of $\hat{M}_Y^{(F)}$, to the first degree of approximation, as

$Var\left(\hat{M}_Y^{(F)}\right) = \frac{1}{4\left(f_Y(M_Y)\right)^2}\left[\left(\frac{1}{m}-\frac{1}{N}\right)A_1 + \left(\frac{1}{m}-\frac{1}{n}\right)A_2 + \left(\frac{1}{n}-\frac{1}{N}\right)A_3\right]$   (6.4)
where

$A_1 = 1 + \left(\frac{f_Y(M_Y)}{M_Z f_Z(M_Z)}\right)^2 F_4^2(T) + 2\left(4P_{11}(y,z)-1\right)\left(\frac{f_Y(M_Y)}{M_Z f_Z(M_Z)}\right)F_4(T)$

$A_2 = \left(\frac{f_Y(M_Y)}{M_X f_X(M_X)}\right)F_2(T)\left[\left(\frac{f_Y(M_Y)}{M_X f_X(M_X)}\right)F_2(T) + 2\left(4P_{11}(x,y)-1\right) + 2\left(4P_{11}(x,z)-1\right)\left(\frac{f_Y(M_Y)}{M_Z f_Z(M_Z)}\right)F_4(T)\right]$

$A_3 = \left(\frac{f_Y(M_Y)}{M_Z f_Z(M_Z)}\right)F_3(T)\left[\left(\frac{f_Y(M_Y)}{M_Z f_Z(M_Z)}\right)F_3(T) + 2\left(4P_{11}(y,z)-1\right) + 2\left(\frac{f_Y(M_Y)}{M_Z f_Z(M_Z)}\right)F_4(T)\right]$
The variance of $\hat{M}_Y^{(F)}$ at (6.4) is minimized for

$F_2(T) = -\frac{\left[\left(4P_{11}(x,y)-1\right) - \left(4P_{11}(x,z)-1\right)\left(4P_{11}(y,z)-1\right)\right]}{\left[1-\left(4P_{11}(x,z)-1\right)^2\right]} \cdot \frac{M_X f_X(M_X)}{f_Y(M_Y)} = -a_1 \ (say)$

$F_3(T) = -\frac{\left(4P_{11}(x,z)-1\right)\left[\left(4P_{11}(x,y)-1\right) - \left(4P_{11}(y,z)-1\right)\left(4P_{11}(x,z)-1\right)\right]}{\left[1-\left(4P_{11}(x,z)-1\right)^2\right]} \cdot \frac{M_Z f_Z(M_Z)}{f_Y(M_Y)} = -a_2 \ (say)$

$F_4(T) = -\frac{\left[\left(4P_{11}(y,z)-1\right) - \left(4P_{11}(x,y)-1\right)\left(4P_{11}(x,z)-1\right)\right]}{\left[1-\left(4P_{11}(x,z)-1\right)^2\right]} \cdot \frac{M_Z f_Z(M_Z)}{f_Y(M_Y)} = -a_3 \ (say)$   (6.5)
Thus the resulting (minimum) variance of $\hat{M}_Y^{(F)}$ is given by

$\min Var\left(\hat{M}_Y^{(F)}\right) = \frac{1}{4\left(f_Y(M_Y)\right)^2}\left[\left(\frac{1}{m}-\frac{1}{N}\right) - \left(\frac{1}{m}-\frac{1}{n}\right)\left\{\frac{D^2}{1-\left(4P_{11}(x,z)-1\right)^2} + \left(4P_{11}(x,y)-1\right)^2\right\} - \left(\frac{1}{n}-\frac{1}{N}\right)\left(4P_{11}(y,z)-1\right)^2\right]$

$= \min Var\left(\hat{M}_Y^{(G)}\right) - \left(\frac{1}{m}-\frac{1}{n}\right)\frac{D^2}{4\left(f_Y(M_Y)\right)^2\left[1-\left(4P_{11}(x,z)-1\right)^2\right]}$   (6.6)

where

$D = \left(4P_{11}(y,z)-1\right) - \left(4P_{11}(x,y)-1\right)\left(4P_{11}(x,z)-1\right)$   (6.7)

and $\min Var\left(\hat{M}_Y^{(G)}\right)$ is given in (3.4).
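The gain of the F-class over the G-class in (6.6) is governed by D and by the quantity V3 defined in (6.12) below; a small sketch of ours makes the bookkeeping explicit:

    import numpy as np

    def f_class_gain(V0, P11_xy, P11_yz, P11_xz, m, n):
        # D of (6.7), V3 of (6.12), and the variance reduction
        # min.Var(G) - min.Var(F) = (1/m - 1/n) V3 implied by (6.6).
        rxy, ryz, rxz = 4 * P11_xy - 1, 4 * P11_yz - 1, 4 * P11_xz - 1
        D = ryz - rxy * rxz
        V3 = D ** 2 * V0 / (1.0 - rxz ** 2)      # requires |4 P11(x,z) - 1| < 1
        return D, V3, (1.0 / m - 1.0 / n) * V3   # reduction >= 0 since m < n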
Expression (6.6) clearly indicates that the proposed class of estimators $\hat{M}_Y^{(F)}$ is more efficient than the class of estimators $\hat{M}_Y^{(g)}$ (or $\hat{M}_Y^{(G)}$), and hence than the class of estimators $\hat{M}_Y^{(H)}$ suggested by Singh et al (2001) and the estimator $\hat{M}_Y$, at their optimum conditions.
The estimator based on estimated optimum values is defined by

$p^* = \left\{\hat{M}_Y^{(F*)} : \hat{M}_Y^{(F*)} = F^*\left(\hat{M}_Y, u, v, w, \hat{a}_1, \hat{a}_2, \hat{a}_3\right)\right\}$   (6.8)

where

$\hat{a}_1 = \frac{\left[\left(4\hat{p}_{11}(x,y)-1\right) - \left(4\hat{p}_{11}(x,z)-1\right)\left(4\hat{p}_{11}(y,z)-1\right)\right]}{\left[1-\left(4\hat{p}_{11}(x,z)-1\right)^2\right]} \cdot \frac{\hat{M}_X \hat{f}_X(\hat{M}_X)}{\hat{f}_Y(\hat{M}_Y)}$

$\hat{a}_2 = \frac{\left(4\hat{p}_{11}(x,z)-1\right)\left[\left(4\hat{p}_{11}(x,y)-1\right) - \left(4\hat{p}_{11}(y,z)-1\right)\left(4\hat{p}_{11}(x,z)-1\right)\right]}{\left[1-\left(4\hat{p}_{11}(x,z)-1\right)^2\right]} \cdot \frac{\hat{M}_Z \hat{f}_Z(\hat{M}_Z)}{\hat{f}_Y(\hat{M}_Y)}$

$\hat{a}_3 = \frac{\left[\left(4\hat{p}_{11}(y,z)-1\right) - \left(4\hat{p}_{11}(x,y)-1\right)\left(4\hat{p}_{11}(x,z)-1\right)\right]}{\left[1-\left(4\hat{p}_{11}(x,z)-1\right)^2\right]} \cdot \frac{\hat{M}_Z \hat{f}_Z(\hat{M}_Z)}{\hat{f}_Y(\hat{M}_Y)}$   (6.9)

are the sample estimates of a1, a2 and a3 given in (6.5) respectively, and F*(⋅) is a function of $\left(\hat{M}_Y, u, v, w, \hat{a}_1, \hat{a}_2, \hat{a}_3\right)$ such that
$F^*(T^*) = M_Y \Rightarrow F_1^*(T^*) = \left.\frac{\partial F^*(\cdot)}{\partial \hat{M}_Y}\right|_{T^*} = 1$

$F_2^*(T^*) = \left.\frac{\partial F^*(\cdot)}{\partial u}\right|_{T^*} = -a_1$

$F_3^*(T^*) = \left.\frac{\partial F^*(\cdot)}{\partial v}\right|_{T^*} = -a_2$

$F_4^*(T^*) = \left.\frac{\partial F^*(\cdot)}{\partial w}\right|_{T^*} = -a_3$

$F_5^*(T^*) = \left.\frac{\partial F^*(\cdot)}{\partial \hat{a}_1}\right|_{T^*} = 0$

$F_6^*(T^*) = \left.\frac{\partial F^*(\cdot)}{\partial \hat{a}_2}\right|_{T^*} = 0$

$F_7^*(T^*) = \left.\frac{\partial F^*(\cdot)}{\partial \hat{a}_3}\right|_{T^*} = 0$

where T* = (MY, 1, 1, 1, a1, a2, a3).
Under these conditions it can easily be shown that

$E\left(\hat{M}_Y^{(F*)}\right) = M_Y + O\left(n^{-1}\right)$

and, to the first degree of approximation, the variance of $\hat{M}_Y^{(F*)}$ is given by

$Var\left(\hat{M}_Y^{(F*)}\right) = \min Var\left(\hat{M}_Y^{(F)}\right)$   (6.10)

where $\min Var\left(\hat{M}_Y^{(F)}\right)$ is given in (6.6).
Under the cost function (5.8), the optimum values of m and n which minimize the minimum variance of $\hat{M}_Y^{(F)}$ in (6.6) are given by

$m_{opt} = \frac{C_0\sqrt{\left(V_0-V_1-V_3\right)/C_1}}{\sqrt{\left(V_0-V_1-V_3\right)C_1}+\sqrt{\left(V_1-V_2+V_3\right)\left(C_2+C_3\right)}}$   (6.11)

$n_{opt} = \frac{C_0\sqrt{\left(V_1-V_2+V_3\right)/\left(C_2+C_3\right)}}{\sqrt{\left(V_0-V_1-V_3\right)C_1}+\sqrt{\left(V_1-V_2+V_3\right)\left(C_2+C_3\right)}}$

where

$V_3 = \frac{D^2 V_0}{\left[1-\left(4P_{11}(x,z)-1\right)^2\right]}$   (6.12)
For large N, the optimum value of $\min Var\left(\hat{M}_Y^{(F)}\right)$ is given by

$Opt.\min Var\left(\hat{M}_Y^{(F)}\right) = \frac{\left[\sqrt{\left(V_0-V_1-V_3\right)C_1}+\sqrt{\left(V_1-V_2+V_3\right)\left(C_2+C_3\right)}\right]^2}{C_0}$   (6.13)
The proposed two-phase sampling strategy would be profitable over single-phase sampling so long as $Opt.Var\left(\hat{M}_Y\right) > Opt.\min Var\left(\hat{M}_Y^{(F)}\right)$, i.e.

$\frac{C_2+C_3}{C_1} < \left[\frac{\sqrt{V_0}-\sqrt{V_0-V_1-V_3}}{\sqrt{V_1-V_2+V_3}}\right]^2$   (6.14)
It follows from (5.7) and (6.13) that

$Opt.\min Var\left(\hat{M}_Y^{(F)}\right) < Opt.\min Var\left(\hat{M}_Y^{(H)}\right)$

if

$\frac{\sqrt{V_0-V_1}-\sqrt{V_0-V_1-V_3}}{\sqrt{V_1-V_2+V_3}} > \sqrt{\frac{C_2+C_3}{C_1}} - \sqrt{\frac{V_1 C_2}{\left(V_1-V_2+V_3\right)C_1}}$

for large N.
Further, we note from (5.11) and (6.13) that

$Opt.\min Var\left(\hat{M}_Y^{(F)}\right) < Opt.\left[\min Var\left(\hat{M}_Y^{(g)}\right) \text{ or } \min Var\left(\hat{M}_Y^{(G)}\right)\right]$   (6.15)

if

$\frac{C_2+C_3}{C_1} < \left[\frac{\sqrt{V_0-V_1}-\sqrt{V_0-V_1-V_3}}{\sqrt{V_1-V_2+V_3}-\sqrt{V_1-V_2}}\right]^2$   (6.16)
REFERENCES
Chand, L. (1975): Some ratio-type estimators based on two or more auxiliary variables. Unpublished Ph.D. dissertation, Iowa State University, Ames, Iowa.
Francisco, C.A. and Fuller, W.A. (1991): Quantile estimation with a complex survey design. Ann. Statist., 19, 454-469.
Gross, S.T. (1980): Median estimation in sample surveys. Proceedings of the Section on Survey Research Methods, American Statistical Association, 181-184.
Kiregyera, B. (1980): A chain ratio-type estimator in finite population double sampling using two auxiliary variables. Metrika, 27, 217-223.
Kiregyera, B. (1984): Regression-type estimators using two auxiliary variables and the model of double sampling from finite populations. Metrika, 31, 215-226.
Kuk, A.Y.C. and Mak, T.K. (1989): Median estimation in the presence of auxiliary information. J. R. Statist. Soc. B, 51(2), 261-269.
Sahoo, J. and Sahoo, L.N. (1993): A class of estimators in two-phase sampling using two auxiliary variables. Jour. Ind. Statist. Assoc., 31, 107-114.
Silverman, B.W. (1986): Density Estimation for Statistics and Data Analysis. Chapman and Hall, London.
Singh, H.P. (1993): A chain ratio-cum-difference estimator using two auxiliary variates in double sampling. Journal of Ravishankar University, 6(B) (Science), 79-83.
Singh, S., Joarder, A.H. and Tracy, D.S. (2001): Median estimation using double sampling. Aust. N. Z. J. Statist., 43(1), 33-46.
Srivastava, S.K. (1971): A generalized estimator for the mean of a finite population using multiauxiliary information. Jour. Amer. Statist. Assoc., 66, 404-407.
Srivastava, S.K. and Jhajj, H.S. (1983): A class of estimators of the population mean using multiauxiliary information. Cal. Statist. Assoc. Bull., 32, 47-56.
Srivenkataramana, T. and Tracy, D.S. (1989): Two-phase sampling for selection with probability proportional to size in sample surveys. Biometrika, 76, 818-821.