9.1. How is equation (9.1) derived?

In this appendix we consider repeat surveys, where the interest lies in the difference in prevalence with respect to an earlier estimate, particularly a decline. Let $p_0$ be the overall
prevalence estimated in the former survey. From the same survey data, the between-cluster
variation for the former survey can also be estimated, as illustrated in section 5.2 above. Let
$\hat{\sigma}_{B0}^2$ denote such an estimate. Hence, for the subsequent survey, $p_0$ is a significant reference value for the actual
population prevalence $\pi$. It can be used as a known threshold or cut-off, so that two competing
statistical hypotheses may be stated. The null hypothesis:
$$H_0: \pi \geq p_0$$
signifying either stable or increasing prevalence, versus the alternative hypothesis:
$$H_1: \pi < p_0$$
which signifies a decline in prevalence in the time period between the two consecutive surveys.
Under this scenario the appropriate statistical approach is not that of the precision of an estimate,
but rather a hypothesis test. More specifically, we would like to assess the statistical evidence
in favour of one of the two hypotheses. Survey data are then used to test the null
hypothesis at a stated significance level, usually 95%, meaning a 5% probability of wrongly
rejecting $H_0$ in favour of $H_1$ (Type I error). However, the risk of wrongly accepting $H_0$
also exists, which in a probabilistic perspective is a totally different error (Type II error), so it has to
be considered while planning the sample size. The probability of correctly rejecting $H_0$ in
favour of $H_1$, i.e. the complement of the Type II error probability, is the test power. The strategy
of sample size computation in repeat surveys illustrated here allows for both controlling the
test power and, at the same time, accounting for the cluster sampling design. However, for every
statistical test there exists a probabilistic trade-off between the significance level and the power:
choosing a significance level too close to 1 leads to power close to 0, unless the sample
size increases. Three key evaluations have to be made in order to plan the sample size:
1. to choose the significance level, usually 0.95. This leaves a 5% probability of incorrect
rejection of $H_0$, i.e. of concluding erroneously in favour of a decline in the repeated survey;
2. to choose the minimum power desired for the test, say $1-\beta$. Being a probability, the minimum
power must lie in the interval $[0, 1]$. It is generally set lower than the significance level, for instance 0.80 or 0.90. This means a 20% or, respectively, a 10% probability of false acceptance
of $H_0$, i.e. of concluding erroneously against a decline in the repeated survey;
3. to anticipate with a “prior guess” the magnitude of the between-cluster variation as illustrated in section 5.2. Moreover, as discussed in section 5.5, it is usually easier to anticipate
a “prior guess” in terms of the coefficient of variation as defined in section 5.2.4. Thus in
the following we will refer to the coefficient of variation $k$ whenever such an anticipation is needed.
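The first two choices enter the sample size computation through standard Normal quantiles. The following minimal sketch, using only the Python standard library, shows how those quantiles can be obtained; the helper name `one_tailed_z` is hypothetical and not part of the handbook.

```python
from statistics import NormalDist


def one_tailed_z(probability: float) -> float:
    """Standard Normal quantile cutting off `probability` in the lower tail."""
    return NormalDist().inv_cdf(probability)


z_sig = one_tailed_z(0.95)    # significance level 0.95 (5% Type I error)
z_pow80 = one_tailed_z(0.80)  # minimum power 0.80 (20% Type II error)
z_pow90 = one_tailed_z(0.90)  # minimum power 0.90 (10% Type II error)

print(f"{z_sig:.3f} {z_pow80:.3f} {z_pow90:.3f}")
```

The quantile for 0.95 is about 1.645, which is commonly rounded to 1.65 in hand calculations.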
In more detail with regard to point 3, let $\hat{p}$ be the estimated prevalence computed from the more
recent survey data and let $\mathrm{Var}(\hat{p})$ be its variance. $\mathrm{Var}(\hat{p})$ differs across sampling designs.
It would equal $\pi(1-\pi)/n$ under simple random sampling of $n$ individuals from a population with prevalence $\pi$. In a cluster-sampled TB prevalence
survey, as illustrated in section 5.1, $\mathrm{Var}(\hat{p})$ is inflated by the variability among the (actual) prevalences
at cluster level:
$$\mathrm{Var}(\hat{p}) = \frac{\pi(1-\pi)}{mc} + \frac{(m-1)\,\sigma_B^2}{mc} \qquad (1)$$
where $\sigma_B^2$ is the (absolute) measure of the between-cluster variation, and $m$ is the (constant) cluster size, i.e. the number of individuals to be surveyed within each of $c$ clusters randomly selected from
the collection of all clusters partitioning the eligible population. Therefore, the total sample size is
$n = mc$ and the global estimated prevalence at national level is given by the average of cluster
estimates. Note that, for a given total sample size, $\mathrm{Var}(\hat{p})$ tends to be an increasing function of the cluster size $m$ and a decreasing
function of the number of clusters surveyed $c$. As a matter of fact, although the practical split of the
total sample size into $m$ and $c$, as discussed in section 5.1, would be a compromise among statistical, logistical and budgetary requirements, the mathematics indicates that larger $c$ with smaller $m$ improves the
estimate’s precision (by reducing the magnitude of $\mathrm{Var}(\hat{p})$ in equation (1)) and ultimately improves the
stability of $\hat{p}$.
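To make the effect of the split between cluster size and number of clusters concrete, here is a minimal sketch. It assumes that the cluster-sampling variance has the form [prev(1 - prev) + (m - 1) sigma_B^2] / (m c), with m the cluster size and c the number of clusters; the function names and all numerical values are hypothetical and chosen only for illustration.

```python
def cluster_variance(prev: float, sigma_b2: float, m: int, c: int) -> float:
    """Variance of the overall prevalence estimate under the assumed cluster-sampling form."""
    return (prev * (1 - prev) + (m - 1) * sigma_b2) / (m * c)


def srs_variance(prev: float, n: int) -> float:
    """Variance of the prevalence estimate under simple random sampling of n individuals."""
    return prev * (1 - prev) / n


# Hypothetical planning values, for illustration only.
prev = 0.005                   # assumed true prevalence (0.5%)
sigma_b2 = (0.5 * prev) ** 2   # assumed between-cluster variance (coefficient of variation 0.5)
n_total = 48_000               # total sample size, kept fixed across the splits

# Same total sample size, split into fewer and larger clusters from one row to the next.
splits = [(400, 120), (600, 80), (800, 60)]  # (cluster size m, number of clusters c)
variances = [cluster_variance(prev, sigma_b2, m, c) for m, c in splits]
design_effects = [v / srs_variance(prev, n_total) for v in variances]

for (m, c), v, deff in zip(splits, variances, design_effects):
    print(f"m={m:4d}  c={c:4d}  variance={v:.3e}  design effect={deff:.2f}")
```

With the total sample size held fixed, the printed variance grows as clusters become larger and fewer, which is the pattern described in the text.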
The test we are interested in is one-tailed and technically relies upon the usual Normal approximation for large samples. With the customary 95% significance level, the one-tailed z-score 1.65 is
involved. Statistical evidence would indicate rejecting $H_0$ in favour of the decline stated in $H_1$, with
a 5% probability of false rejection, when the following relation holds, with $p_0$ the prevalence estimated in the former survey:
$$\hat{p} < p_0 - 1.65\,\sqrt{\mathrm{Var}_0(\hat{p})} \qquad (2)$$
which is equivalent to a p-value less than 0.05. The term $\mathrm{Var}_0(\hat{p})$ in equation (2) denotes the variance
of the estimator $\hat{p}$ when the null hypothesis of no decline holds. We now look for the sample size
ensuring the chosen minimum power $1-\beta$:
$$\Pr\left(\hat{p} < p_0 - 1.65\,\sqrt{\mathrm{Var}_0(\hat{p})} \;\middle|\; H_1\right) \geq 1-\beta \qquad (3)$$
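The decision rule of the one-tailed test can be sketched as follows. This is a minimal illustration, assuming the rule "reject the null hypothesis when the new estimate falls more than 1.65 standard errors below the former prevalence"; the function name `decline_test` and the numerical inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def decline_test(p_hat: float, p0: float, var0: float, z_sig: float = 1.65):
    """One-tailed test of 'no decline' against 'decline'.

    p_hat : prevalence estimated in the more recent survey
    p0    : prevalence estimated in the former survey (threshold)
    var0  : variance of p_hat when the null hypothesis holds
    Returns (reject_h0, p_value).
    """
    se0 = sqrt(var0)
    reject_h0 = p_hat < p0 - z_sig * se0
    # Lower-tail p-value under the Normal approximation.
    p_value = NormalDist().cdf((p_hat - p0) / se0)
    return reject_h0, p_value


# Hypothetical values: former prevalence 0.5%, null standard error 0.05%.
p0 = 0.005
var0 = 2.5e-7
reject_a, p_a = decline_test(0.0040, p0, var0)  # two standard errors below p0
reject_b, p_b = decline_test(0.0045, p0, var0)  # one standard error below p0
print(reject_a, round(p_a, 3), reject_b, round(p_b, 3))
```

Because 1.65 is a rounded quantile, the rule and the 0.05 p-value cut-off can disagree for estimates lying extremely close to the threshold; the two cases above are far enough from it to agree.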
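Let $p_1$ be a value for the population prevalence indicating a decline, i.e. $p_1 < p_0$, so that $H_1$ would
hold. Equation (3) can be rewritten as:
$$\Pr\left(\frac{\hat{p} - p_1}{\sqrt{\mathrm{Var}_1(\hat{p})}} < \frac{p_0 - p_1 - 1.65\,\sqrt{\mathrm{Var}_0(\hat{p})}}{\sqrt{\mathrm{Var}_1(\hat{p})}}\right) \geq 1-\beta \qquad (4)$$
where $\mathrm{Var}_1(\hat{p})$ denotes the variance of $\hat{p}$ when the true prevalence is $p_1$.
Again a Normal approximation is used, so that the z-score of probability $1-\beta$
is now involved, denoted by $z_{1-\beta}$: equation (4) is satisfied when the right-hand side of the inner inequality is at least $z_{1-\beta}$. For instance, by choosing 80% or 90% as minimum power $1-\beta$, the
z-score would equal 0.84 or 1.28 respectively. By substituting equation (1), i.e. under a cluster sample survey, and by expressing the between-cluster variation in the more recent survey by
means of the coefficient of variation $k$, equation (4) after some algebra provides the following equation for sample size computation: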
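$$n = mc \geq \frac{\left\{1.65\,\sqrt{p_0(1-p_0) + (m-1)\,\hat{\sigma}_{B0}^2} + z_{1-\beta}\,\sqrt{p_1(1-p_1) + (m-1)\,k^2 p_1^2}\right\}^2}{(p_0 - p_1)^2} \qquad (5)$$
where $\hat{\sigma}_{B0}^2$ is the between-cluster variation estimated from the former survey and
$k$ denotes the coefficient of variation of the cluster-specific true prevalences, assuming
$p_1$ as the true overall prevalence in the more recent survey, so that
$\sigma_B = k\,p_1$ holds. For equation (5) to be
practically implemented, a “prior guess” on $k$ is needed, as stated in point 3 above.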
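As a minimal sketch of how such a computation might be carried out, the function below assumes the sample size rule n >= {z_sig sqrt[p0(1 - p0) + (m - 1) sigma_B0^2] + z_pow sqrt[p1(1 - p1) + (m - 1) k^2 p1^2]}^2 / (p0 - p1)^2, which follows from combining the cluster-sampling variance with the power requirement. The function name, its parameters and all input values are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def repeat_survey_sample_size(p0, p1, m, sigma_b0_sq, k, power=0.80, z_sig=1.65):
    """Total sample size for detecting a decline from p0 to p1 under cluster sampling.

    p0          : prevalence estimated in the former survey
    p1          : assumed (lower) prevalence in the repeat survey
    m           : cluster size
    sigma_b0_sq : between-cluster variance estimated from the former survey
    k           : prior guess of the coefficient of variation in the repeat survey
    """
    if not p1 < p0:
        raise ValueError("p1 must be smaller than p0 to represent a decline")
    z_pow = NormalDist().inv_cdf(power)
    spread_h0 = p0 * (1 - p0) + (m - 1) * sigma_b0_sq   # n times the variance under H0
    spread_h1 = p1 * (1 - p1) + (m - 1) * (k * p1) ** 2  # n times the variance under H1
    return (z_sig * sqrt(spread_h0) + z_pow * sqrt(spread_h1)) ** 2 / (p0 - p1) ** 2


# Hypothetical planning values, for illustration only.
p0, p1, m = 0.005, 0.003, 500
sigma_b0_sq = (0.5 * p0) ** 2
n_needed = repeat_survey_sample_size(p0, p1, m, sigma_b0_sq, k=0.5, power=0.80)
clusters_needed = ceil(n_needed / m)
print(ceil(n_needed), clusters_needed)
```

Rounding the number of clusters up, rather than the total sample size, keeps the cluster size constant as the design assumes.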
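In case of no clustering or no variation in prevalence between clusters we have $m = 1$ or $\hat{\sigma}_{B0} = k = 0$ respectively,
so that equation (5) reduces to the standard formula for simple random sampling. In the opposite
extreme situation, $\hat{\sigma}_{B0}^2$ equals its upper bound $p_0(1-p_0)$
and $k$ equals its maximum $\sqrt{(1-p_1)/p_1}$. Substituting
in equation (5) gives the worst scenario:
$$n \geq m\,\frac{\left\{1.65\,\sqrt{p_0(1-p_0)} + z_{1-\beta}\,\sqrt{p_1(1-p_1)}\right\}^2}{(p_0 - p_1)^2} \qquad (6)$$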
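The two limiting cases can be checked numerically. The sketch below is illustrative only: it assumes the general rule n >= {z_sig sqrt[p0(1 - p0) + (m - 1) sigma_B0^2] + z_pow sqrt[p1(1 - p1) + (m - 1) k^2 p1^2]}^2 / (p0 - p1)^2, and all function names and input values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def general_sample_size(p0, p1, m, sigma_b0_sq, k, power=0.80, z_sig=1.65):
    """Assumed general rule under cluster sampling (m = cluster size)."""
    z_pow = NormalDist().inv_cdf(power)
    spread_h0 = p0 * (1 - p0) + (m - 1) * sigma_b0_sq
    spread_h1 = p1 * (1 - p1) + (m - 1) * (k * p1) ** 2
    return (z_sig * sqrt(spread_h0) + z_pow * sqrt(spread_h1)) ** 2 / (p0 - p1) ** 2


def srs_sample_size(p0, p1, power=0.80, z_sig=1.65):
    """Standard one-tailed formula under simple random sampling."""
    z_pow = NormalDist().inv_cdf(power)
    return (z_sig * sqrt(p0 * (1 - p0)) + z_pow * sqrt(p1 * (1 - p1))) ** 2 / (p0 - p1) ** 2


def worst_case_sample_size(p0, p1, m, power=0.80, z_sig=1.65):
    """Worst scenario: m times the simple random sampling sample size."""
    return m * srs_sample_size(p0, p1, power, z_sig)


# Hypothetical planning values, for illustration only.
p0, p1, m = 0.005, 0.003, 500

# No clustering (m = 1), or no between-cluster variation, gives the SRS formula.
n_no_clustering = general_sample_size(p0, p1, 1, (0.5 * p0) ** 2, k=0.5)
n_no_variation = general_sample_size(p0, p1, m, 0.0, k=0.0)
n_srs = srs_sample_size(p0, p1)

# Between-cluster variation at its upper bounds gives the worst scenario.
n_at_bounds = general_sample_size(p0, p1, m, p0 * (1 - p0), k=sqrt((1 - p1) / p1))
n_worst = worst_case_sample_size(p0, p1, m)

print(round(n_srs), round(n_worst))
```

Both reductions hold up to floating-point rounding, which is a useful sanity check on any implementation of the general rule.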
Equation (6) supplies a sample size planning strategy for repeat cluster surveys that does not require
any “prior guess” on the between-cluster variability while still ensuring the chosen minimum power.
The strategy suggests considering $m$ times the sample size needed under simple random (one-to-one individual) sampling, where $m$ is the cluster size. As a consequence, it produces larger sample sizes than equation (5), which become
increasingly unnecessary the further reality deviates from the assumed worst scenario. Thus it is advisable
only if a “prior guess” on $k$ were highly hazardous and/or for high TB prevalence countries. Again,
the finite population correction, as illustrated in section 5.6, is appropriate and applies straightforwardly to the result of sample size equation (5) or, equivalently, the worst scenario equation (6).
Whatever sample size computation approach is used, either equation (5) or equation (6), notice
that the higher the chosen power $1-\beta$ is, the larger the sample size needed to achieve that power.
Moreover, the greater the distance between the two competing values $p_0$ and $p_1$, i.e. between the
former estimated prevalence and the assumed decreased prevalence in the subsequent survey,
the smaller the sample size required for the test to achieve the chosen power. On the other hand,
competing values $p_0$ and $p_1$ close to each other, as when assuming a small decline or in case of
a short time between the repeat surveys, would require larger sample sizes to provide sufficient
statistical evidence to make a decision either in favour of or against the decline.
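These two tendencies can be illustrated with a small table. The sketch below uses, as a simplifying assumption, the standard one-tailed formula under simple random sampling; the function name and the prevalence values are hypothetical. Under the worst scenario every figure would simply be multiplied by the cluster size.

```python
from math import ceil, sqrt
from statistics import NormalDist


def srs_sample_size(p0, p1, power, z_sig=1.65):
    """Sample size to detect a decline from p0 to p1 under simple random sampling."""
    z_pow = NormalDist().inv_cdf(power)
    return (z_sig * sqrt(p0 * (1 - p0)) + z_pow * sqrt(p1 * (1 - p1))) ** 2 / (p0 - p1) ** 2


p0 = 0.005                         # hypothetical former prevalence (0.5%)
declines = (0.004, 0.003, 0.002)   # hypothetical prevalences in the repeat survey
powers = (0.80, 0.90)

table = {
    (power, p1): ceil(srs_sample_size(p0, p1, power))
    for power in powers
    for p1 in declines
}

for (power, p1), n in table.items():
    print(f"power={power:.2f}  p1={p1:.3f}  n={n}")
```

Reading down the output, the sample size falls as the assumed decline grows and rises when the power is raised from 0.80 to 0.90.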