
Brewer, K.R.W. (1963b). Ratio estimation and finite populations

THE AUSTRALIAN JOURNAL OF STATISTICS
Published by THE STATISTICAL SOCIETY OF AUSTRALIA
VOL. 5, No. 3, NOVEMBER 1963

RATIO ESTIMATION AND FINITE POPULATIONS: SOME
RESULTS DEDUCIBLE FROM THE ASSUMPTION OF AN
UNDERLYING STOCHASTIC PROCESS
K. R. W. BREWER
Commonwealth Bureau of Census and Statistics, Canberra
1. Introduction
Practically all the formulae used in the application of sampling
methods to finite populations are derived solely from the properties
of the particular finite population concerned. The population is
treated as an entity in itself, completely independent of any stochastic
process which may have generated it. As a result, a number of
important questions have hitherto remained unanswered and, in a
sense, unanswerable. In this paper consideration is given to the
implications of the existence of an underlying stochastic process, and,
on the basis of this assumption, the following results are obtained:
(1) An expression for the conditional variance of a ratio estimator,
subject to a particular sample of population units having been
selected.
(2) The optimum probabilities of selection of the individual
units in the population.
(3) The likely accuracy of a ratio estimate based on the largest
population units deliberately selected (a "partial collection").
(4) The extent of the diminution in the variance of the ratio
estimator obtained by sampling without replacement (with
total probability of selection proportional to size or some
function of size) instead of with replacement (with the
probability of selection at each draw proportional to size
or the same function of size).
(5) An extension of the expressions and estimates for conditional
variance to two-stage sampling. (These formulae can easily
be generalized to any desired number of stages.)
Manuscript received September 13, 1962 ; revised June 18, 1963.
2. A Generalized Form of the Ratio Estimator
For the purpose of this discussion, "ratio estimation" will be
taken to mean the use of any formula in which the ratio of the item $Y$
to the benchmark item $Z$ is estimated by the ratio of some linear
function of the sample values $y_i$ ($i = 1, 2, \ldots, n$ referring to the
order of selection in the sample) to the same linear function of the
sample values $z_i$, in which each coefficient in the linear function can
depend only on the benchmark item value of the particular sample
unit to which it applies. Using this definition, the expression "ratio
estimation" includes three important particular cases:
(1) Unbiased estimation with probability of selection proportional
to size, "size" being in this instance the values of the benchmark
item $Z_I$ ($I = 1, 2, \ldots, N$ referring to some ordering
of units within the population). The coefficients in the
linear function of the $y_i$ and $z_i$ are proportional to $z_i^{-1}$ and
the estimator of the ratio $Y/Z$ reduces to $n^{-1}\sum'(y_i/z_i)$.
NOTE: $\sum'$ is used in this paper to denote a summation
over $i = 1, 2, \ldots, n$ and $\sum$ is used for a summation over
$I = 1, 2, \ldots, N$.
(2) Ordinary ratio estimation with equal probabilities of selection
for all units. The coefficients of the linear functions are
all unity and the estimator of the ratio $Y/Z$ is
$\sum' y_i / \sum' z_i$.
(3) A modified form of regression estimation in which it is
assumed that the constant term is zero. The coefficients
are proportional to $z_i$ and the estimator of $Y/Z$ is
$\sum' y_i z_i / \sum' z_i^2$.
It will be noted that in the first two cases the probabilities of
selection were specified, and, because in both these cases the coefficients
of the linear functions were inversely proportional to the probabilities
of selection, the estimators were consistent. The modified regression
estimator is a consistent estimator when selection is made with
probability inversely proportional to size.
The discussion in this paper will be limited (except in Section 6)
to such consistent ratio estimators. The consistent ratio estimator
of $Y$ is then $y''$, where
$$ y'' = Z \sum' p_i^{-1} y_i \Big/ \sum' p_i^{-1} z_i \tag{1} $$
where $P_I$ is the probability of selecting the $I$th population unit for
the sample at any draw, and $p_i$ is the particular value of $P_I$
corresponding to the sample unit selected at the $i$th draw.
It should be noted that, when sampling is without replacement,
$P_I$ cannot strictly be regarded as the probability of selection of the $I$th
population unit at any single draw. It is therefore necessary to define
$P_I$ as one $n$th part of the total probability of selection of the $I$th
population unit in a sample of $n$. Except where the contrary is explicitly
stated, it will be taken that " selection without replacement " means
strictly what it says, e.g. if systematic selection from a randomly
ordered population (with probability proportional to size) is used to
ensure selection without replacement, then the size of each population
unit should be no larger than the skip interval.
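As a minimal sketch of equation (1) and its three particular cases, the following Python fragment may help. The function name, argument names and the numerical values are hypothetical and chosen only for illustration; none of them comes from the paper.

```python
def ratio_estimate(y, z, p, Z):
    """Equation (1): y'' = Z * sum(y_i / p_i) / sum(z_i / p_i).

    y, z : sample values of the item and of the benchmark item
    p    : selection probabilities of the sampled units (any common
           scale factor cancels between numerator and denominator)
    Z    : known population total of the benchmark item
    """
    numerator = sum(yi / pi for yi, pi in zip(y, p))
    denominator = sum(zi / pi for zi, pi in zip(z, p))
    return Z * numerator / denominator


# Hypothetical sample of n = 3 units and benchmark total (illustrative values).
y = [12.0, 30.0, 45.0]
z = [10.0, 20.0, 40.0]
Z = 200.0

# Case (1): probability proportional to size, giving Z * mean(y_i / z_i).
pps_estimate = ratio_estimate(y, z, [zi / Z for zi in z], Z)
# Case (2): equal probabilities, giving Z * sum(y) / sum(z).
equal_estimate = ratio_estimate(y, z, [1.0, 1.0, 1.0], Z)
# Case (3): probability inversely proportional to size,
# giving Z * sum(y * z) / sum(z ** 2).
inverse_estimate = ratio_estimate(y, z, [1.0 / zi for zi in z], Z)
```

Because only the relative sizes of the $p_i$ enter equation (1), the same function covers all three cases by changing the probabilities passed in.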
3. The Concept of an Underlying Stochastic Process
The typical case, in which ratio estimation is employed to estimate
the total value of the item $Y$ from a sample, is that in which the values
$Y_I$ are in some sense dependent on the known values $Z_I$ of the
benchmark item. To take an imaginary example, if the problem were to
estimate the production of butter in Australia each month, the value
of butter production $Y_I$ by a given factory in a given month in fact
depends to a large extent on the known value of butter production
$Z_I$ by the same factory over the year covered by the last Factory
Census. Alternatively, it may be regarded as dependent on the known
wage bill for that factory in that month, in which case "wages"
would be a useful benchmark item. In other words, given the value
of the benchmark item $Z_I$ we may make a stochastic estimate of $Y_I$.
The existence of a stochastic relation between $Y_I$ and $Z_I$ is, in
fact, a hidden assumption behind the decision to use ratio estimation
in nearly all practical cases. For although such a decision is justified
formally on the grounds of a high correlation between $Y_I$ and $Z_I$,
the ground for assuming that a high correlation exists is almost
invariably that there is some form of stochastic dependence between
$Y_I$ and $Z_I$.
From this point of view, the actual finite population with which
the sampling statistician is confronted may be regarded as one
particular state of affairs (namely the state of affairs which in fact
exists) from all the possible states of affairs which might have existed,
given the benchmark item information. It may, in fact, be regarded
as a sample of one from an infinite number of finite populations, all
with the same values of $Z_I$ but with values of $Y_I$ varying from
population to population and stochastically dependent on the $Z_I$.
The simplest form which this dependence can take, in which
case the generalized form of ratio estimation described in the previous
section is appropriate, is one which appears to have been first suggested
by Cochran (1953) (see particularly pages 123-4 and 211-2). The
form of this dependence is that the value of $Y_I$ for the $k$th population
is $Y_{kI}$, where
$$ Y_{kI} = \beta Z_I + U_{kI} \tag{2} $$
and the $U_{kI}$ are independently distributed with zero mean and variance
$\sigma_I^2$.
The value of $\sigma_I^2$ may be regarded as a function of $Z_I$. If the $Y_I$
are subject to proportionate variations, $\sigma_I^2$ will be proportional to
$Z_I^2$. If on the other hand they are subject to equal variations, $\sigma_I^2$ will
be a constant. If they may be regarded as made up of large numbers
of small equal elementary units, each subject to equal and independent
variations, $\sigma_I^2$ will be proportional to $Z_I$. In general it will be assumed
that
$$ \sigma_I^2 = \sigma^2 Z_I^{2\gamma} \tag{3} $$
where $\sigma^2$ and $\gamma$ are constants. It may be seen intuitively that $\gamma$
is nearly always between 0 and 1. According to Cochran (1953),
p. 212, it is usually between $\tfrac{1}{2}$ and 1 and this has been borne out in
empirical studies by the author. The model described by equations
(2) and (3) will form the basis for the results obtained in this paper.
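The model of equations (2) and (3) can be sketched as a small population generator. The paper specifies only the mean and variance of the disturbances; the normal distribution used here, and all names and parameters, are assumptions made purely for illustration.

```python
import random


def generate_population(Z_values, beta, sigma, gamma, rng):
    """Draw one population Y_I = beta * Z_I + U_I, as in equation (2).

    Each U_I has zero mean and standard deviation sigma * Z_I ** gamma,
    so that Var(U_I) = sigma^2 * Z_I^(2 * gamma), as in equation (3).
    Normality of U_I is an illustrative assumption, not part of the model.
    """
    return [beta * Z + rng.gauss(0.0, sigma * Z ** gamma) for Z in Z_values]


# Hypothetical benchmark values; each call yields a different "state of affairs"
# sharing the same Z_I, which is the sense in which the actual population is
# regarded as a sample of one.
rng = random.Random(12345)
benchmarks = [5.0, 10.0, 20.0, 40.0, 80.0]
population_a = generate_population(benchmarks, 1.3, 0.5, 0.75, rng)
population_b = generate_population(benchmarks, 1.3, 0.5, 0.75, rng)
```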
An approach somewhat analogous to that used in this paper
was employed in an article by Godambe (1955). Godambe's theory
virtually uses that part of Cochran's model which may be expressed by
equation (2) without equation (3), but the particular form in which
he uses it is applicable only to sampling without replacement. The
most important result in Godambe's paper duplicated in the present
one is his equation (15), which corresponds to the one numbered (22)
in this paper. Some related problems were also investigated in a
very similar fashion in a paper by Des Raj (1958).
4. Conditional Variances for Particular Samples Drawn
Without Replacement
The customary definition of the mean square error of a ratio
estimator is based on the average value of the squared deviation from
the population total over all possible samples of a given size. This
mean square error is a function of the individual population values
and of the numerical size of the sample only. The accuracy of any
sample estimate of a population total is usually measured by the
sample estimate of this mean square error, and though this estimate
may vary from sample to sample, the variations have no relevance
to the accuracy of each particular sample, but are only unavoidable
deviations from the " true " mean square error. Thus no account is
taken of whether the sample units selected are (by chance) large or
small. It is, nevertheless, reasonable to suppose that if large units
were selected by chance the resulting sample estimates would be more
accurate than if small units were selected by chance. As long as the
population is regarded as an entity in itself, there is no way of
measuring these different accuracies and, indeed, no reason to assume
that the accuracies do differ. When the assumption is made of an
underlying stochastic process, more particularly of the model described
by equations (2) and (3), it becomes possible to measure these different
accuracies.
In order to do this it is necessary to regard the $y''$ of equation (1)
as an estimator not only of $Y$, but also of $\beta Z$, or, since the population
from which the sample is drawn is being regarded as one (the $k$th)
of an infinite number of possible populations, it is better to say that
$y''_k$ must be regarded as an estimator both of $Y_k$ and $\beta Z$. In some
situations the statistician is interested in all possible states of affairs
but in others he may be interested only in the one existing state.
This, in practical situations, determines whether he wants to estimate
$Y_k$ or $\beta Z$. The theory for both these situations is developed here.
Denoting by $E_i$ the expectation over all possible populations
subject to the particular values of $Z_I$ selected,
$$ E_i(y''_k) = \beta Z. \tag{4} $$
Thus, given the model described by equation (2), $y''_k$ is an unbiased
estimator of $\beta Z$ no matter which population units are selected in
sample, and therefore it is also an unbiased estimator of $\beta Z$ over all
possible samples of size $n$.
The variance of $y''_k$ as an estimator of $\beta Z$ depends on whether
selection is with or without replacement. The treatment of the
case where selection is without replacement is the simpler, as it is
known that every sample unit is a different population unit. Consequently,
this case will be considered first, and sampling with replacement
later (in Section 7). For convenience of notation, the subscript $k$
will be omitted from the remainder of this paper. Given, then, that
selection is without replacement, the conditional variance of $y''$
as an estimator of $\beta Z$, subject to the particular values of $Z_I$ selected,
is
$$
\begin{aligned}
({}_{\beta Z}\sigma^2_{y''} \mid i) &= E_i (y'' - \beta Z)^2 \\
&= Z^2 E_i \Big\{ \sum' p_i^{-1} u_i \Big/ \sum' p_i^{-1} z_i \Big\}^2 \\
&= Z^2 \sum' p_i^{-2} \sigma_i^2 \Big/ \Big( \sum' p_i^{-1} z_i \Big)^2 \\
&= Z^2 \sum' p_i^{-2} \sigma_i^2 \big/ n^2 z'^2
\end{aligned}
\tag{5}
$$
where $z' = n^{-1} \sum' p_i^{-1} z_i$ is the unbiased sample estimator of $Z$. Similarly, the
conditional variance of $y''$ regarded as an estimator of $Y$, subject
to the particular values of $Z_I$ selected, is
$$
\begin{aligned}
(\sigma^2_{y''} \mid i) &= E_i (y'' - Y)^2 = E_i \Big[ Z \sum' p_i^{-1} u_i \Big/ \sum' p_i^{-1} z_i - \sum U_I \Big]^2 \\
&= E_i \Big[ \sum' \Big( p_i^{-1} Z - \sum' p_j^{-1} z_j \Big) u_i \Big/ \sum' p_j^{-1} z_j - \sum'' U_I \Big]^2 \\
&= Z^2 \sum' (p_i^{-1} - n z' Z^{-1})^2 \sigma_i^2 \big/ n^2 z'^2 + \sum'' \sigma_I^2
\end{aligned}
\tag{6}
$$
where $\sum''$ indicates a summation over all the population units not
selected in sample.
Substituting for $\sigma_I^2$ from equation (3), equations (5) and (6) become
$$ ({}_{\beta Z}\sigma^2_{y''} \mid i) = Z^2 \sigma^2 \sum' p_i^{-2} z_i^{2\gamma} \big/ n^2 z'^2 \tag{7} $$
and
$$ (\sigma^2_{y''} \mid i) = \sigma^2 \Big[ Z^2 \sum' (p_i^{-1} - n z' Z^{-1})^2 z_i^{2\gamma} \big/ n^2 z'^2 + \sum'' Z_I^{2\gamma} \Big]. \tag{8} $$
Both these expressions are functions of the particular values of
$Z_I$ selected, and their different values indicate the variation of the
accuracy of $y''$ (as an estimator of $\beta Z$ and $Y$ respectively) for different
samples.
It is to be noted that, if the model defined by equations (2) and (3)
describes perfectly the true dependence of $Y_I$ on $Z_I$, the significance
of equation (4) is that $y''$ is an unbiased estimator of $\beta Z$ regardless of
the values of the individual $P_I$, that is of how the sample is chosen.
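A minimal sketch of equations (7) and (8), assuming that the benchmark values of both the sampled and the unsampled units are available. The function and argument names are hypothetical.

```python
def conditional_variances(z_sample, p_sample, Z_unsampled, sigma2, gamma):
    """Conditional variances of y'' for a sample drawn without replacement.

    Returns (variance as an estimator of beta*Z, variance as an estimator of Y),
    following equations (7) and (8) respectively.
    """
    n = len(z_sample)
    Z = sum(z_sample) + sum(Z_unsampled)          # population benchmark total
    # z' = n^-1 * sum(z_i / p_i), the unbiased sample estimator of Z
    z_prime = sum(zi / pi for zi, pi in zip(z_sample, p_sample)) / n
    denom = (n * z_prime) ** 2

    # Equation (7)
    var_beta_z = Z ** 2 * sigma2 * sum(
        zi ** (2 * gamma) / pi ** 2 for zi, pi in zip(z_sample, p_sample)
    ) / denom

    # Equation (8): sampled-unit term plus the contribution of unsampled units
    sampled_term = Z ** 2 * sum(
        (1.0 / pi - n * z_prime / Z) ** 2 * zi ** (2 * gamma)
        for zi, pi in zip(z_sample, p_sample)
    ) / denom
    unsampled_term = sum(ZI ** (2 * gamma) for ZI in Z_unsampled)
    var_y = sigma2 * (sampled_term + unsampled_term)
    return var_beta_z, var_y
```

One quick check of the formulae: if every unit is enumerated with equal probabilities, the variance as an estimator of $Y$ vanishes, while the variance as an estimator of $\beta Z$ remains $\sigma^2 \sum Z_I^{2\gamma}$.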
5. Optimum Selection Probabilities
Denoting by ${}_{\beta Z}\sigma^2_{y''}$ and $\sigma^2_{y''}$ the expectations over all possible
samples of size $n$ of the expressions on the left-hand sides of equations
(5) and (6), these may be termed the expected variances of $y''$ regarded
as an estimator of $\beta Z$ and $Y$ respectively. The optimum values of $P_I$
are those which minimize these two expressions. An approximate
formula for the optimum selection probabilities which minimize
${}_{\beta Z}\sigma^2_{y''}$ can be derived as follows:
The expression in square brackets on the right-hand side tends to
unity as $n$ increases. The difference, which is well known to be of
the order of $n^{-1}$, will be neglected for the remainder of this paper.
Thus
$$
\begin{aligned}
{}_{\beta Z}\sigma^2_{y''} &= Z^2 E \sum' p_i^{-2} \sigma_i^2 \Big/ \Big( E \sum' p_i^{-1} z_i \Big)^2 \\
&= n^{-1} \sum P_I^{-1} \sigma_I^2.
\end{aligned}
\tag{10}
$$
Differentiating this expression partially with respect to $P_I$,
holding the sum of the $P_I$ equal to unity by means of a Lagrangian
multiplier, and equating the resulting partial differential coefficient
to zero, the optimum values of $P_I$ which result are given by
$$ P_I = \sigma_I \Big/ \sum \sigma_I. \tag{11} $$
The analogy is close between this formula and the usual optimum
allocation for stratified sampling, where with the usual notation,
$$ n_h = n N_h S_h \Big/ \sum_h N_h S_h, \tag{12} $$
as may be seen by putting $n$ and all the $N_h$ in equation (12) equal
to unity.
More particularly, it follows from equations (3) and (11) that
$$ P_I = Z_I^{\gamma} \Big/ \sum Z_I^{\gamma}. \tag{13} $$
The optimum probabilities of selection are thus proportional
to $Z_I^{\gamma}$.
This result also holds when the object is to minimize $\sigma^2_{y''}$, for
${}_{\beta Z}\sigma^2_{y''}$ can be split into two components, one from each stage of
sampling, that is
$$
\begin{aligned}
{}_{\beta Z}\sigma^2_{y''} &= E(y'' - \beta Z)^2 \\
&= E(y'' - Y)^2 + E(Y - \beta Z)^2 \\
&= \sigma^2_{y''} + \sum \sigma_I^2.
\end{aligned}
\tag{14}
$$
The second term, which represents the first stage variance, is not
dependent on the $P_I$. Hence whatever values of $P_I$ minimize
${}_{\beta Z}\sigma^2_{y''}$ also minimize $\sigma^2_{y''}$.
The formula $P_I \propto Z_I^{\gamma}$ implies that, when $\gamma = 0$, equal probabilities
of selection are optimum, and when $\gamma = 1$, the optimum probabilities
of selection are proportional to size, in which case the ratio estimator
is unbiased.
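The optimum probabilities of equation (13) and the expected variance of equation (10) can be sketched as follows, under the model of equation (3). The names and the small numerical example are hypothetical, chosen so that the comparison can be checked by hand.

```python
def optimum_probabilities(Z_values, gamma):
    """Equation (13): P_I proportional to Z_I ** gamma, scaled to sum to one."""
    weights = [Z ** gamma for Z in Z_values]
    total = sum(weights)
    return [w / total for w in weights]


def expected_variance_beta_z(Z_values, P, sigma2, gamma, n):
    """Equation (10) with sigma_I^2 = sigma^2 * Z_I^(2*gamma) from equation (3)."""
    return sum(
        sigma2 * Z ** (2 * gamma) / PI for Z, PI in zip(Z_values, P)
    ) / n


# Hypothetical population of three units with gamma = 1/2 and n = 2.
Z_values = [1.0, 4.0, 9.0]
P_opt = optimum_probabilities(Z_values, 0.5)        # proportional to 1, 2, 3
P_equal = [1.0 / 3] * 3
P_pps = [Z / sum(Z_values) for Z in Z_values]

v_opt = expected_variance_beta_z(Z_values, P_opt, 1.0, 0.5, 2)
v_equal = expected_variance_beta_z(Z_values, P_equal, 1.0, 0.5, 2)
v_pps = expected_variance_beta_z(Z_values, P_pps, 1.0, 0.5, 2)
```

In this example both equal-probability and proportional-to-size selection give a larger expected variance than the intermediate optimum, which is what equation (13) leads one to expect when $\gamma$ lies strictly between 0 and 1.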
6. Ratio Estimation from " Partial Collections "
A possible objection to the above derivation of optimum probabilities
of selection is the following. Assuming that equations (2)
and (3) describe the manner in which the population has been
generated and that $0 \le \gamma \le 1$, it is tedious but not difficult to show that,
for a given fixed sample size of $n$, the most accurate ratio estimate for
the population total can be constructed from a "partial collection",
that is, the $n$ units with the largest values of $Z_I$. In view of this,
the optimum probabilities calculated above appear to be irrelevant.
It must indeed be conceded that a ratio estimator based on such
a partial collection is, on these assumptions, an unbiased estimator of
$\beta Z$ and that its conditional variance is less than the expected variance
of any possible estimator of the type described by equation (1). On
the other hand it may well be unwise to abandon a sampling plan for
which a variance can be calculated, regardless of any assumptions,
for one giving only a conditional variance valid on the assumption
that equations (2) and (3) describe the generation of the population.
Nevertheless, if the statistician is satisfied that equations (2)
and (3) hold sufficiently well for his purpose, it is possible for him to
minimize the conditional variance of his estimate by choosing to
select the $n$ units with the largest values of $Z_I$. It is still possible,
and in fact usual, with partial collections, to use equation (1) for
estimation by writing $n$ for $p_i^{-1}$. The conditional variances appropriate
to such a method of estimation can be obtained from equations (7)
and (8) writing $n$ for $p_i^{-1}$ and $\sum' z_i$ for $z'$. This is arithmetically
equivalent to treating the partial collection as though it were a sample
drawn with equal probabilities of selection for all units in the population.
However, the conditional variances may be reduced still
further, in fact minimized, by treating the partial collection as though
it were a sample drawn with unequal probabilities. The conditional
variance of $y''$ regarded as an estimator of $\beta Z$ can be minimized by
treating $p_i^{-1}$ as though it were proportional to $z_i \sigma_i^{-2}$. This yields
the equations
$$
\begin{aligned}
y'' &= Z \sum' y_i z_i \sigma_i^{-2} \Big/ \sum' z_i^2 \sigma_i^{-2} \\
&= Z \sum' y_i z_i^{1-2\gamma} \Big/ \sum' z_i^{2-2\gamma}
\end{aligned}
\tag{15}
$$
and
$$ ({}_{\beta Z}\sigma^2_{y''} \mid i) = Z^2 \Big/ \sum' z_i^2 \sigma_i^{-2} = Z^2 \sigma^2 \Big/ \sum' z_i^{2-2\gamma}. \tag{16} $$
The conditional variance of $y''$ regarded as an estimator of $Y$
is minimized by treating $p_i^{-1}$ as though it were proportional to
$(Z - \sum' z_i) z_i \sigma_i^{-2} + \sum' z_i^2 \sigma_i^{-2}$. This yields the equations
$$
\begin{aligned}
y'' &= \sum' y_i + \Big( Z - \sum' z_i \Big) \sum' y_i z_i \sigma_i^{-2} \Big/ \sum' z_i^2 \sigma_i^{-2} \\
&= \sum' y_i + \Big( Z - \sum' z_i \Big) \sum' y_i z_i^{1-2\gamma} \Big/ \sum' z_i^{2-2\gamma}
\end{aligned}
\tag{17}
$$
and
$$
\begin{aligned}
(\sigma^2_{y''} \mid i) &= \Big( Z - \sum' z_i \Big)^2 \Big/ \sum' z_i^2 \sigma_i^{-2} + \sum'' \sigma_I^2 \\
&= \Big( Z - \sum' z_i \Big)^2 \sigma^2 \Big/ \sum' z_i^{2-2\gamma} + \sigma^2 \sum'' Z_I^{2\gamma}.
\end{aligned}
\tag{18}
$$
It is interesting to note that the approach of this section can be used
to analyse the results obtained from any sample, no matter how it was
selected, provided the model of equations (2) and (3) is known to hold
to a sufficient degree of accuracy.
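Equations (15) to (18) can be sketched in a few lines of Python, assuming $\gamma$ is known. The function names and arguments are hypothetical, and the benchmark values of the unsampled units are assumed to be available for the variance in equation (18).

```python
def partial_collection_estimates(y, z, Z, gamma):
    """Minimum conditional variance estimates from any given sample.

    Returns (estimate of beta*Z from equation (15),
             estimate of Y from equation (17)).
    """
    slope = sum(yi * zi ** (1 - 2 * gamma) for yi, zi in zip(y, z)) / sum(
        zi ** (2 - 2 * gamma) for zi in z
    )
    estimate_beta_z = Z * slope
    # Observed units are added directly; only the remainder is predicted.
    estimate_y = sum(y) + (Z - sum(z)) * slope
    return estimate_beta_z, estimate_y


def partial_collection_variances(z, Z_unsampled, sigma2, gamma):
    """Conditional variances of equations (16) and (18)."""
    Z = sum(z) + sum(Z_unsampled)
    s_zz = sum(zi ** (2 - 2 * gamma) for zi in z)
    var_beta_z = Z ** 2 * sigma2 / s_zz
    var_y = (Z - sum(z)) ** 2 * sigma2 / s_zz + sigma2 * sum(
        ZI ** (2 * gamma) for ZI in Z_unsampled
    )
    return var_beta_z, var_y
```

For instance, when the sampled values satisfy $y_i = 2 z_i$ exactly, both estimates reduce to $2Z$.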
7. Comparison between Sampling Without and Sampling
With Replacement
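In this section it will be assumed that sampling with replacement
is made with the probability of selection at each draw the same for a
given population unit. The expression for the ratio estimator $y''$
is that given by equation (1). The expected variance of $y''$ as an
estimator of $\beta Z$ will be written ${}_{\beta Z}\tilde\sigma^2_{y''}$, the tilde indicating that sampling
is with replacement.
Then
$$
\begin{aligned}
{}_{\beta Z}\tilde\sigma^2_{y''} &= E(y'' - \beta Z)^2 = Z^2 E E_i \Big\{ \sum' p_i^{-1} u_i \Big/ \sum' p_i^{-1} z_i \Big\}^2 \\
&= Z^2 E \Big( \Big[ \sum' p_i^{-2} \sigma_i^2 + \sum'\sum'_{i \ne j} \delta_{ij} p_i^{-2} \sigma_i^2 \Big] \Big/ \Big( \sum' p_i^{-1} z_i \Big)^2 \Big)
\end{aligned}
\tag{19}
$$
where $\sum'\sum'$ indicates a double summation over $i = 1, 2, \ldots, n$ and
over $j = 1, 2, \ldots, n$ and where $\delta_{ij}$ is unity when the $i$th and $j$th sample
units are the same population unit and zero in all other cases. Then
$$
\begin{aligned}
{}_{\beta Z}\tilde\sigma^2_{y''} &= Z^2 \{ E p_i^{-2} \sigma_i^2 + (n-1) E \delta_{ij} p_i^{-2} \sigma_i^2 \} \big/ n (E p_i^{-1} z_i)^2 \\
&= n^{-1} \Big\{ \sum P_I^{-1} \sigma_I^2 + (n-1) \sum \sigma_I^2 \Big\}.
\end{aligned}
\tag{20}
$$
The expected variance of $y''$ as an estimator of $Y$ is therefore,
from equation (14),
$$ \tilde\sigma^2_{y''} = n^{-1} \Big\{ \sum P_I^{-1} \sigma_I^2 - \sum \sigma_I^2 \Big\}. \tag{21} $$
The corresponding expression for sampling without replacement
is obtained by comparing equations (10) and (14),
$$ \sigma^2_{y''} = n^{-1} \Big\{ \sum P_I^{-1} \sigma_I^2 - n \sum \sigma_I^2 \Big\}. \tag{22} $$
The absolute amount by which the expected variance is reduced by
sampling without (as opposed to with) replacement is therefore
$n^{-1}(n-1) \sum \sigma_I^2$ and is independent of the probabilities of selection.
Thus if a worthwhile saving is achieved with equal probabilities of
selection, the same absolute saving with more nearly optimum
probabilities of selection is even more worthwhile.
The factor by which the expected variance is reduced is in fact
$$ \sigma^2_{y''} / \tilde\sigma^2_{y''} = \sum (P_I^{-1} - n) \sigma_I^2 \Big/ \sum (P_I^{-1} - 1) \sigma_I^2. \tag{23} $$
There are a number of interesting special cases of this formula.
(i) If all $P_I^{-1} = N$ it reduces to $(N-n)/(N-1)$ which is the
well-known reduction factor for equal probabilities of selection.
(ii) If selection is with probability proportional to size and
$R_I = Z_I / Z$,
$$ \sigma^2_{y''} / \tilde\sigma^2_{y''} = \sum (R_I^{-1} - n) \sigma_I^2 \Big/ \sum (R_I^{-1} - 1) \sigma_I^2. \tag{24} $$
If $\sigma_I^2 = \sigma^2 Z_I^{2\gamma}$, then with $\gamma = \tfrac{1}{2}$ this again gives $(N-n)/(N-1)$
but with $\gamma = 1$ it gives $(1 - n \sum R_I^2)/(1 - \sum R_I^2)$.
(iii) If $\sigma_I^2 = \sigma^2 Z_I^{2\gamma}$, then
$$ \sigma^2_{y''} / \tilde\sigma^2_{y''} = \sum (P_I^{-1} - n) Z_I^{2\gamma} \Big/ \sum (P_I^{-1} - 1) Z_I^{2\gamma}. \tag{25} $$
If, in addition, optimum probabilities of selection are used with
$P_I \propto Z_I^{\gamma}$,
$$ \sigma^2_{y''} / \tilde\sigma^2_{y''} = \Big\{ \Big( \sum Z_I^{\gamma} \Big)^2 - n \sum Z_I^{2\gamma} \Big\} \Big/ \Big\{ \Big( \sum Z_I^{\gamma} \Big)^2 - \sum Z_I^{2\gamma} \Big\}. \tag{26} $$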
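The reduction factor of equation (23) is simple to evaluate numerically. The sketch below uses hypothetical names and inputs; it takes the unit variances $\sigma_I^2$ as given rather than deriving them from equation (3).

```python
def reduction_factor(P, unit_variances, n):
    """Equation (23): ratio of the expected variance without replacement
    to the expected variance with replacement, for sample size n."""
    numerator = sum((1.0 / PI - n) * s for PI, s in zip(P, unit_variances))
    denominator = sum((1.0 / PI - 1.0) * s for PI, s in zip(P, unit_variances))
    return numerator / denominator


# Special case (i): equal probabilities with N = 5 and n = 2 give (N-n)/(N-1).
equal_case = reduction_factor([0.2] * 5, [1.0, 2.0, 3.0, 4.0, 5.0], 2)

# Special case (ii) with gamma = 1: selection proportional to size,
# hypothetical sizes 1, 2, 3, 4 and unit variances proportional to size squared.
pps_case = reduction_factor([0.1, 0.2, 0.3, 0.4], [1.0, 4.0, 9.0, 16.0], 2)
```

In the second case the result agrees with $(1 - n\sum R_I^2)/(1 - \sum R_I^2)$, here $0.4/0.7$.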
8. Selection with Minimum Replacement
When certain population units are so large that no scheme of
selection without replacement is compatible with the total probability
of selection being proportional to size, the variance is minimized by
taking these large units out of the population, enumerating them
separately, and selecting the remainder of the sample from the rump
population in the usual way. If this is inconvenient for any reason,
the population may be kept in its original state and each large unit
selected at least a certain minimum number of times and at most
once more often, as in systematic selection.
In dealing with selection without replacement $P_I$ was defined as
one $n$th part of the total probability of selection. When selection
is with minimum replacement $P_I$ must be defined as one $n$th part of
the number of times the $I$th unit is certain of selection plus one $n$th
part of the probability of it being selected an extra time also. Thus
if the unit is large enough to be selected $\nu_I$ times with certainty, $P_I$
will lie between $\nu_I/n$ and $(\nu_I + 1)/n$.
The expected variance of the ratio estimator may be derived
using a method analogous to that of the previous section.
Consider the expression $E_i(\sum' p_i^{-1} u_i)^2$ which appears in the
numerator in the formula for conditional variance. A unit which
appears in the sample $\nu_I$ times with certainty will contribute either
$\nu_I$ or $\nu_I + 1$ terms to the summation, all of magnitude $P_I^{-1} U_I$. The
probability that there will be $\nu_I$ such terms is $\nu_I + 1 - nP_I$ and the
probability that there will be $\nu_I + 1$ such terms is $nP_I - \nu_I$. Thus
the contribution of this unit to $E_i(\sum' p_i^{-1} u_i)^2$ is $\nu_I^2 P_I^{-2} \sigma_I^2$
with probability $\nu_I + 1 - nP_I$, and with probability $nP_I - \nu_I$ its contribution
is $(\nu_I + 1)^2 P_I^{-2} \sigma_I^2$. Hence its expected contribution is:
$$
\begin{aligned}
&(\nu_I + 1 - nP_I)\, \nu_I^2 P_I^{-2} \sigma_I^2 + (nP_I - \nu_I)(\nu_I + 1)^2 P_I^{-2} \sigma_I^2 \\
&\qquad = \{ nP_I (2\nu_I + 1) - \nu_I (\nu_I + 1) \} P_I^{-2} \sigma_I^2.
\end{aligned}
\tag{27}
$$
The expected variance of the ratio estimator (regarded as an
estimator of $\beta Z$) is therefore
$$ {}_{\beta Z}\sigma^2_{y''} = n^{-2} \sum \{ nP_I (2\nu_I + 1) - \nu_I (\nu_I + 1) \} P_I^{-2} \sigma_I^2 \tag{28} $$
and the expected variance of the ratio estimator (regarded as an
estimator of $Y$) is
$$ \sigma^2_{y''} = n^{-2} \sum \{ nP_I (2\nu_I + 1) - \nu_I (\nu_I + 1) - n^2 P_I^2 \} P_I^{-2} \sigma_I^2. \tag{29} $$
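The algebra behind equation (27) and the resulting variance of equation (28) can be checked numerically. In this sketch the function names are hypothetical, and $\nu_I$ is taken to be the integer part of $nP_I$, which is an assumption consistent with $P_I$ lying between $\nu_I/n$ and $(\nu_I+1)/n$.

```python
import math


def expected_contribution(P_I, sigma2_I, n, nu_I):
    """Both sides of equation (27) for one population unit.

    Returns (two-outcome expectation, closed form) so they can be compared.
    """
    direct = (
        (nu_I + 1 - n * P_I) * nu_I ** 2 + (n * P_I - nu_I) * (nu_I + 1) ** 2
    ) * sigma2_I / P_I ** 2
    closed = (n * P_I * (2 * nu_I + 1) - nu_I * (nu_I + 1)) * sigma2_I / P_I ** 2
    return direct, closed


def expected_variance_min_replacement(P, unit_variances, n):
    """Equation (28), with nu_I assumed equal to floor(n * P_I).

    At an exact boundary n * P_I = nu_I + 1 either choice of nu_I gives the
    same value, so the floor is harmless there.
    """
    total = 0.0
    for PI, s in zip(P, unit_variances):
        nu = math.floor(n * PI)
        total += (n * PI * (2 * nu + 1) - nu * (nu + 1)) * s / PI ** 2
    return total / n ** 2
```

When no unit is large enough to be certain of selection (all $\nu_I = 0$), equation (28) reduces to equation (10), which gives a simple consistency check.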
9. Estimators of Model Parameters
The problem of estimating variance is effectively that of
estimating the $\sigma_I^2$, which are the only unknown terms in all
the variance expressions considered in this paper. Since only one
value of $Y_I$ is available, the individual $\sigma_I^2$ cannot be estimated without
the use of the assumption expressed by equation (3), in which case
the problem is to estimate $\sigma^2$ and $\gamma$. This in turn involves the
estimation of $\beta$.
(i) Estimation of $\beta$ and $\sigma^2$ with $\gamma$ assumed known
In practice the value of $\gamma$ for any population is likely to remain
stable over a long period and it would suffice to determine its value
at infrequent intervals. In the meantime $\beta$ and $\sigma^2$ could be estimated,
usually from samples, using an assumed value of $\gamma$, in the following
fashion.
Assuming that the equations (2) and (3) describe the manner in
which the population has been generated, it was indicated in Section 6
that the minimum conditional variance estimator of $\beta$ is
$$ b = \sum' y_i z_i^{1-2\gamma} \Big/ \sum' z_i^{2-2\gamma} \tag{30} $$
and that its conditional variance is
$$ (\sigma_b^2 \mid i) = \sigma^2 \Big/ \sum' z_i^{2-2\gamma}. \tag{31} $$
Now from the fact that the $y_i$ are distributed with mean $\beta z_i$ and
variance $\sigma^2 z_i^{2\gamma}$, it can be shown that the $y_i - b z_i$ are distributed with
zero mean and variance
$$ \sigma^2 z_i^{2\gamma} \Big( 1 - z_i^{2-2\gamma} \Big/ \sum' z_i^{2-2\gamma} \Big) \tag{32} $$
and that consequently an unbiased estimator of $\sigma^2$ is
$$ s^2 = n^{-1} \sum' \Big\{ (y_i - b z_i)^2 \Big/ z_i^{2\gamma} \Big( 1 - z_i^{2-2\gamma} \Big/ \sum' z_i^{2-2\gamma} \Big) \Big\}. \tag{33} $$
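Equations (30) and (33) translate directly into code once $\gamma$ is assumed known. The following is a minimal sketch with hypothetical names; it requires at least two sample units, since with a single unit the bracket in equation (32) is zero.

```python
def estimate_beta_sigma2(y, z, gamma):
    """Estimate beta by equation (30) and sigma^2 by equation (33)."""
    n = len(y)
    s_zz = sum(zi ** (2 - 2 * gamma) for zi in z)
    b = sum(yi * zi ** (1 - 2 * gamma) for yi, zi in zip(y, z)) / s_zz
    # Each squared residual is scaled by its own variance factor from
    # equation (32), so that every term has expectation sigma^2.
    s2 = sum(
        (yi - b * zi) ** 2
        / (zi ** (2 * gamma) * (1.0 - zi ** (2 - 2 * gamma) / s_zz))
        for yi, zi in zip(y, z)
    ) / n
    return b, s2
```

As a hand-checkable case, two units with $z = (1, 1)$, $y = (1, 3)$ and $\gamma = \tfrac{1}{2}$ give $b = 2$ and $s^2 = 2$.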
(ii) Simultaneous estimation of $\beta$, $\sigma^2$ and $\gamma$
If $\gamma$ is unknown, it is necessary to assume that the distribution
of the $U_I$ is normal and to use the method of maximum likelihood.
The maximum likelihood estimator of $\beta$ is then
$$ b = \sum' y_i z_i^{1-2g} \Big/ \sum' z_i^{2-2g} \tag{34} $$
where $g$ is the maximum likelihood estimator of $\gamma$. Except for the
substitution of $g$ for $\gamma$, it will be seen that this estimator is identical
with that of equation (30). The estimator of $\sigma^2$, however, is smaller
than the unbiased estimator of equation (33), as is usual with maximum
likelihood estimators of variance, and takes the form
$$ s^2_{ML} = n^{-1} \sum' (y_i - b z_i)^2 / z_i^{2g}. \tag{35} $$
Putting
$$ \hat u_i = y_i - b z_i, \tag{36} $$
equation (35) can be written
$$ s^2_{ML} = n^{-1} \sum' \hat u_i^2 / z_i^{2g}. \tag{37} $$
It is not possible to write an explicit formula for the maximum
likelihood estimator of $\gamma$, but the method yields the following implicit
expression:
$$ \operatorname{cov} \{ \hat u_i^2 z_i^{-2g}, \log z_i \} = 0. \tag{38} $$
This is in accordance with what might be expected from another
line of reasoning, namely, that since over all possible populations
generated by the hypothesized stochastic process the covariance of
$u_i^2 z_i^{-2\gamma}$ with all possible functions of $z_i$ is zero, any reasonable estimator
of $\gamma$ might be expected to equate the covariance of the $\hat u_i^2 z_i^{-2g}$ with
some function of $z_i$ to zero. The method of maximum likelihood is
still needed to indicate that the most appropriate function of the $z_i$
for this purpose is the logarithmic.
It does not appear possible to find any iterative formula for $g$,
and graphical methods must be used. However, there would not
appear to be much point in calculating the covariances for assumed
values of $\gamma$ much less than $\tfrac{1}{2}$ or more than unity. Empirical studies
suggest that the correlation between $\hat u_i^2 z_i^{-2g}$ and $\log z_i$ varies
approximately linearly with $g$ over a considerable range where the value of
the correlation is in the region of zero.
The author is indebted to Mr. N. F. Nettheim for suggesting
this method of estimating $\gamma$.
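The graphical search for $g$ amounts to evaluating the covariance of equation (38) over a grid of trial values and looking for the change of sign. The sketch below is one way this might be tabulated; the function names are hypothetical, and the covariance is taken as the simple sample covariance with divisor $n$, which is an assumption (the divisor does not affect where the covariance is zero).

```python
import math


def covariance_for_g(y, z, g):
    """Sample covariance of u_i^2 * z_i^(-2g) with log z_i, as in equation (38),
    where u_i = y_i - b * z_i and b is given by equation (34) for this g."""
    n = len(y)
    b = sum(yi * zi ** (1 - 2 * g) for yi, zi in zip(y, z)) / sum(
        zi ** (2 - 2 * g) for zi in z
    )
    scaled = [(yi - b * zi) ** 2 * zi ** (-2 * g) for yi, zi in zip(y, z)]
    logs = [math.log(zi) for zi in z]
    mean_scaled = sum(scaled) / n
    mean_logs = sum(logs) / n
    return sum(
        (a - mean_scaled) * (l - mean_logs) for a, l in zip(scaled, logs)
    ) / n


def scan_g(y, z, grid):
    """Tabulate (g, covariance) pairs; the estimate of gamma lies where the
    covariance changes sign, which can then be read off or interpolated."""
    return [(g, covariance_for_g(y, z, g)) for g in grid]
```

A grid running from about 0.5 to 1.0 matches the range the text suggests is worth examining; given the approximately linear behaviour near zero, linear interpolation between the two grid points that bracket the sign change is a natural refinement.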
Unfortunately the logarithmic function lacks robustness in that
it varies very rapidly when $z_i$ is insignificantly small, and one or two
small units could easily dominate the whole covariance expression in
equation (38). To avoid this, it is necessary either to truncate the
distribution at some arbitrary point or to use some other statistic
which approximates to $\log z_i$ in the upper tail. The work of Simon
(1955) and of Simon and Bonini (1958) indicates that most populations
of the type to which estimation sampling is usually applied fit closely
the Yule distribution, which approximates to a Pareto distribution
in the upper tail. It would then follow that the logarithms of the
rank orders (reckoning the rank order of the largest to be one, that
of the second largest to be two, etc.) or of the rank orders diminished
by $\tfrac{1}{2}$, would, in the upper tail, be spaced roughly proportionately to
the spacing of the $\log z_i$.
10. Conditional Variances for Two-Stage Sampling
So far, it has been shown that $y''$ is, given the assumption of
the underlying stochastic process, an unbiased estimator of $\beta Z$ and
that, if all the sample units are different population units, the
conditional variance, subject to the selection of a particular set of $Z_I$,
is given by equation (7). It has also been shown that, regarding $y''$
as an estimator of $Y$, the conditional variance of $y''$ could be regarded
as the second stage variance of the hypothetical sample design. This
conditional variance is given in equation (8). If, however, the $y_i$
are not accurately known, but are themselves estimated by $y''_i$ (on
the basis of a second stage sample of the same kind) then
$$ y''_i = z_i \sum'_i p_{ii'}^{-1} y_{ii'} \Big/ \sum'_i p_{ii'}^{-1} z_{ii'} \tag{39} $$
and
$$ y'' = Z \sum' p_i^{-1} y''_i \Big/ \sum' p_i^{-1} z_i \tag{40} $$
where $\sum'_i$ is used to denote a summation over $i' = 1, 2, \ldots, n_i$,
$\sum_I$ for a summation over $I' = 1, 2, \ldots, N_I$ and $\sum''_I$ indicates a summation
over all the second-stage population units in the $I$th first-stage
population unit which were not selected at the second stage of sampling.
To arrive at a formula for the conditional variance of $y''$ it is
necessary to redefine the stochastic distribution by the following
set of equations:
$$ Y_{II'} = \beta_I Z_{II'} + U_{II'} \tag{41} $$
and
$$ \beta_I = \beta + Z_I^{-1} U_I \tag{42} $$
from which it follows that
$$ Y_I = \beta Z_I + U_I + \sum_I U_{II'} \tag{43} $$
where $U_I$ and $U_{II'}$ are independent random variables with zero means
and variances $\sigma_I^2$ and $\sigma_{II'}^2$ respectively and
$$ \sigma_I^2 = \sigma_1^2 Z_I^{2\gamma_1} \tag{44} $$
$$ \sigma_{II'}^2 = \sigma_{2I}^2 Z_{II'}^{2\gamma_{2I}}. \tag{45} $$
It then follows immediately that for selection without replacement
11. Estimators of Model Parameters in Two-Stage
Sampling
It is clear that the second stage parameters $\beta_{2i}$, $\sigma_{2i}^2$ and $\gamma_{2i}$ can
be estimated in the same way as the parameters $\beta$, $\sigma^2$ and $\gamma$ in Section 9,
but for the estimation of $\beta_1$, $\sigma_1^2$ and $\gamma_1$ it is no longer possible to use
similar formulae, as the $y_i$ are not exactly known and the $y''_i$ themselves
have a conditional variance arising from the second stage of sampling,
which can, however, be estimated.
It is theoretically possible to use maximum likelihood for
estimating the first stage parameters, but in practice the formulae
are unduly complicated, involving the simultaneous solution of two,
or, if $\gamma_1$ is unknown, three intrinsic equations. For practical purposes
it is probably best to use a sub-optimum procedure leading to a
simpler set of equations.
the conditional variance of 4, is
Eii’Oi=Eii.(~l;--b,z,)’
2
(52)
(ii) Simultaneous estimation of $\beta_1$, $\sigma_1^2$ and $\gamma_1$
By analogy with equation (38) an estimator of $\gamma_1$ is given by
(54)
cov
i
[&;2QLz;2Q1
(sy$
z
I i’)(l-Y”&-2g’/C’ zi2--2Q1 1
..
As with equation (38), it is necessary to use graphical methods
of solution.
References
Cochran, W. G. (1953). Sampling Techniques. John Wiley & Sons, Inc., New York;
Chapman & Hall Ltd., London.
Godambe, V. P. (1955). "A Unified Theory of Sampling from Finite Populations."
J. Roy. Statist. Soc., B17, 269-278.
Raj, Des (1958). "On the Relative Accuracy of Some Sampling Techniques."
J. Amer. Statist. Assoc., 53, 98-101.
Simon, H. A. (1955). "On a Class of Skew Distribution Functions." Biometrika,
42, 425-440.
Simon, H. A., and Bonini, C. P. (1958). "The Size Distribution of Business Firms."
American Economic Review, 48, 602-617.