9.1. How is equation (9.1) derived?

In this appendix we consider repeat surveys, where the interest lies in the difference in prevalence with respect to an earlier estimate, particularly a decline. Let $\pi_0$ be the overall prevalence estimated in the former survey. From the same survey data, the between-cluster variation for the former survey can also be estimated, as illustrated in section 5.2 above. Let $\sigma_{B0}^2$ denote such an estimate.

Hence, for the subsequent survey, $\pi_0$ is a meaningful reference value for the actual population prevalence $\pi$. It can be used as a known threshold or cut-off, so that two competing statistical hypotheses may be stated. The null hypothesis $H_0: \pi \ge \pi_0$ signifies either stable or increasing prevalence; the alternative hypothesis $H_1: \pi < \pi_0$ signifies a decline in prevalence in the period between the two consecutive surveys.

Under this scenario the appropriate statistical approach is not that of the precision of an estimate but rather a hypothesis test. More specifically, we would like to assess the statistical evidence in favour of one of the two hypotheses. Survey data are used to test the null hypothesis $H_0$ at a stated significance level, usually 95% (that is, $\alpha = 0.05$), meaning a 5% probability of wrongly rejecting $H_0$ in favour of $H_1$ (Type I error). There is also the risk of wrongly accepting $H_0$ (Type II error), which in probabilistic terms is an entirely different error and therefore has to be considered when planning the sample size. The probability of correctly rejecting $H_0$ in favour of $H_1$, i.e. the complement of the Type II error probability, is the power of the test. This appendix illustrates a strategy for sample size computation in repeat surveys that controls the power of the test while accounting for the cluster sampling design.
For every statistical test, however, there is a probabilistic trade-off between the significance level and the power: choosing a significance level too close to 1 (that is, $\alpha$ too close to 0) leads to power close to 0, unless the sample size increases. Three key choices have to be made in order to plan the sample size:

1. Choose the significance level, usually 0.95 (that is, $\alpha = 0.05$). This leaves a 5% probability of incorrectly rejecting $H_0$, i.e. of concluding erroneously in favour of a decline in the repeat survey.
2. Choose the minimum power desired for the test, say $1-\beta$. Being a probability, the minimum power must lie in the interval $[0, 1]$. It is generally set lower than the significance level, for instance 0.80 or 0.90. This means a 20% or, respectively, a 10% probability of falsely accepting $H_0$, i.e. of concluding erroneously against a decline in the repeat survey.
3. Anticipate with a "prior guess" the magnitude of the between-cluster variation, as illustrated in section 5.2. Moreover, as discussed in section 5.5, it is usually easier to anticipate a "prior guess" in terms of the coefficient of variation $k$, as defined in section 5.2.4. Thus in the following we refer to $k$ whenever such an anticipation is needed.

In more detail with regard to point 3, let $p$ be the estimated prevalence computed from the more recent survey data and let $\mathrm{Var}(p)$ be its variance. $\mathrm{Var}(p)$ differs between sampling designs. It would equal $\pi(1-\pi)/N$ under simple random sampling. In a cluster-sampled TB prevalence survey, as illustrated in section 5.1, $\mathrm{Var}(p)$ is inflated by the variability among the (actual) prevalences at cluster level:

$$\mathrm{Var}(p) = \frac{\pi(1-\pi) + (m-1)\,\sigma_B^2}{c\,m} \qquad (1)$$

where $\sigma_B^2$ is the (absolute) measure of the between-cluster variation, and $m$ is the (constant) cluster size, i.e. the number of individuals to be surveyed within each of $c$ clusters randomly selected from the collection of all clusters partitioning the eligible population. Therefore, the total sample size is $N = c\,m$ and the global estimated prevalence $p$ at national level is given by the average of the cluster estimates.
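As a rough numerical illustration of how clustering inflates the variance, the sketch below assumes that equation (1) has the form $\mathrm{Var}(p) = [\pi(1-\pi) + (m-1)\sigma_B^2]/(cm)$, with $\sigma_B^2 = k^2\pi^2$ when the between-cluster variation is expressed through the coefficient of variation. The function names and the input values are hypothetical and chosen only for illustration; they are not taken from any survey.

```python
import math


def cluster_variance(pi, sigma_b2, m, c):
    """Variance of the estimated prevalence under the assumed form of equation (1).

    pi       -- true overall prevalence (between 0 and 1)
    sigma_b2 -- between-cluster variance of the true cluster prevalences
    m        -- constant cluster size
    c        -- number of clusters surveyed
    """
    if not 0.0 < pi < 1.0:
        raise ValueError("pi must lie strictly between 0 and 1")
    if sigma_b2 < 0 or m < 1 or c < 1:
        raise ValueError("sigma_b2 must be >= 0, m and c must be >= 1")
    return (pi * (1.0 - pi) + (m - 1) * sigma_b2) / (c * m)


def design_effect(pi, sigma_b2, m):
    """Ratio of the clustered variance to the simple random sampling variance."""
    return 1.0 + (m - 1) * sigma_b2 / (pi * (1.0 - pi))


if __name__ == "__main__":
    # Hypothetical inputs: prevalence 1%, coefficient of variation k = 0.5.
    pi, k = 0.01, 0.5
    sigma_b2 = (k * pi) ** 2  # sigma_B^2 = k^2 * pi^2
    # Same total sample size N = 30 000, split in two different ways.
    for m, c in [(500, 60), (1000, 30)]:
        var = cluster_variance(pi, sigma_b2, m, c)
        print(f"m={m}, c={c}: DEFF={design_effect(pi, sigma_b2, m):.2f}, "
              f"SE={math.sqrt(var):.5f}")
```

For a fixed total sample size, the split with more, smaller clusters yields the smaller standard error in this sketch.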
Note that $\mathrm{Var}(p)$ tends to be an increasing function of the cluster size $m$ and a decreasing function of the number of clusters surveyed $c$. As a matter of fact, although the practical split of the total sample size $N$ into $c$ and $m$, as discussed in section 5.1, is a compromise among statistical, logistical and budgetary requirements, the mathematics indicates that a larger $c$ and a smaller $m$ improve the precision of the estimate (by reducing the magnitude of $\mathrm{Var}(p)$ in equation (1)) and ultimately improve the stability of $p$.

The test we are interested in is one-tailed and relies on the usual Normal approximation for large samples. With the customary 95% significance level ($\alpha = 0.05$), the one-tailed z-score 1.65 is involved. Statistical evidence would lead us to reject $H_0$ in favour of the decline stated in $H_1$, with a 5% probability of false rejection, when the following relation holds:

$$\frac{p - \pi_0}{\sqrt{\mathrm{Var}_0(p)}} < -1.65 \qquad (2)$$

which is equivalent to a p-value less than 0.05. The term $\mathrm{Var}_0(p)$ in equation (2) denotes the variance of the estimator $p$ when the null hypothesis of no decline holds, i.e. equation (1) evaluated at $\pi = \pi_0$ and $\sigma_B^2 = \sigma_{B0}^2$.

We now look for the sample size ensuring the chosen minimum power $1-\beta$:

$$P(\text{reject } H_0 \mid H_1 \text{ true}) \ge 1-\beta \qquad (3)$$

Let $\pi_1$ be a value for the population prevalence indicating a decline, i.e. $\pi_1 < \pi_0$, so that $H_1$ would hold. Equation (3) can be rewritten as:

$$P\left(p < \pi_0 - 1.65\sqrt{\mathrm{Var}_0(p)} \;\middle|\; \pi = \pi_1\right) \ge 1-\beta \qquad (4)$$

Again a Normal approximation is used to find the sample size satisfying equation (4), so that the z-score of probability $1-\beta$ is now involved, denoted by $z_{1-\beta}$. For instance, by choosing 80% or 90% as minimum power $1-\beta$, the z-score would equal 0.84 or 1.28 respectively. Standardizing $p$ under $\pi = \pi_1$, with $\mathrm{Var}_1(p)$ denoting its variance in that case, turns equation (4) into the requirement $\pi_0 - \pi_1 \ge 1.65\sqrt{\mathrm{Var}_0(p)} + z_{1-\beta}\sqrt{\mathrm{Var}_1(p)}$. By substituting equation (1), i.e. under a cluster sample survey, and by expressing the between-cluster variation in the more recent survey by means of the coefficient of variation, this provides, after some algebra, the following equation for sample size computation:

$$N \ge \frac{\left[1.65\sqrt{\pi_0(1-\pi_0) + (m-1)\,\sigma_{B0}^2} + z_{1-\beta}\sqrt{\pi_1(1-\pi_1) + (m-1)\,k_1^2\pi_1^2}\right]^2}{(\pi_0 - \pi_1)^2} \qquad (5)$$

where $k_1$ denotes the coefficient of variation of the cluster-specific true prevalences, assuming $\pi_1$ as the true overall prevalence in the more recent survey, so that $\sigma_{B1}^2 = k_1^2\pi_1^2$ holds. For equation (5) to be practically implemented, a "prior guess" on $k_1$ is needed, as stated in point 3 above.
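The computation can be sketched in a few lines of Python. This is a minimal sketch, assuming that equation (5) takes the form $N \ge [z_{1-\alpha}\sqrt{\pi_0(1-\pi_0) + (m-1)\sigma_{B0}^2} + z_{1-\beta}\sqrt{\pi_1(1-\pi_1) + (m-1)k_1^2\pi_1^2}]^2 / (\pi_0-\pi_1)^2$. The function name, argument names and example inputs are hypothetical. The z-scores come from the standard library rather than from the rounded values 1.65, 0.84 and 1.28, so results may differ slightly from a hand calculation.

```python
import math
from statistics import NormalDist


def repeat_survey_sample_size(pi0, pi1, sigma_b0_sq, k1, m,
                              alpha=0.05, power=0.80):
    """Total sample size N for a one-sided test of decline (assumed equation (5)).

    pi0         -- prevalence estimated in the former survey (threshold)
    pi1         -- assumed lower prevalence in the repeat survey
    sigma_b0_sq -- between-cluster variance estimated from the former survey
    k1          -- prior guess of the coefficient of variation in the repeat survey
    m           -- constant cluster size
    alpha       -- Type I error probability (0.05 for the "95%" level)
    power       -- minimum power 1 - beta
    """
    if not 0.0 < pi1 < pi0 < 1.0:
        raise ValueError("need 0 < pi1 < pi0 < 1 (a decline)")
    if m < 1 or sigma_b0_sq < 0 or k1 < 0:
        raise ValueError("m must be >= 1; sigma_b0_sq and k1 must be >= 0")

    z_alpha = NormalDist().inv_cdf(1.0 - alpha)  # about 1.645 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)        # about 0.84 for power = 0.80

    # Per-individual variance terms under H0 and under the assumed decline.
    unit_var0 = pi0 * (1.0 - pi0) + (m - 1) * sigma_b0_sq
    unit_var1 = pi1 * (1.0 - pi1) + (m - 1) * (k1 * pi1) ** 2

    n = (z_alpha * math.sqrt(unit_var0)
         + z_power * math.sqrt(unit_var1)) ** 2 / (pi0 - pi1) ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # Hypothetical inputs: former prevalence 1%, assumed decline to 0.7%,
    # coefficient of variation 0.5 in both surveys, clusters of 600 people.
    pi0, pi1, k, m = 0.01, 0.007, 0.5, 600
    for power in (0.80, 0.90):
        n = repeat_survey_sample_size(pi0, pi1, (k * pi0) ** 2, k, m, power=power)
        print(f"power={power:.2f}: N={n}, clusters={math.ceil(n / m)}")
```

Setting `m=1` with zero between-cluster variation recovers the simple random sampling formula, which gives a quick sanity check on any implementation.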
In the case of no clustering or no variation in prevalence between clusters, the design effect (section 5.1) equals 1 and $\sigma_B^2 = k = 0$, so that equation (5) reduces to the standard formula for simple random sampling. At the opposite extreme, the between-cluster variance $\sigma_B^2$ equals its upper bound $\pi(1-\pi)$ (evaluated at $\pi_0$ and $\pi_1$ respectively) and the design effect equals its maximum $m$. Substituting in equation (5) gives the worst scenario:

$$N \ge m\,\frac{\left[1.65\sqrt{\pi_0(1-\pi_0)} + z_{1-\beta}\sqrt{\pi_1(1-\pi_1)}\right]^2}{(\pi_0 - \pi_1)^2} \qquad (6)$$

Equation (6) supplies a sample size planning strategy for repeat cluster surveys that requires no "prior guess" on the between-cluster variability while still ensuring the chosen minimum power. The strategy amounts to taking $m$ times the sample size needed under simple random (one-to-one individual) sampling. As a consequence it produces larger sample sizes than equation (5), which are largely unnecessary to the extent that reality deviates from the assumed worst scenario. It is therefore recommended only if a "prior guess" on $k$ would be highly unreliable and/or for countries with high TB prevalence.

Again the finite population correction, as illustrated in section 5.6, is appropriate and applies directly to the result of sample size equation (5) or, equivalently, of the worst scenario equation (6).

Whichever sample size computation approach is used, equation (5) or equation (6), note that the higher the chosen power, the larger the sample size needed to achieve it. Moreover, the greater the distance between the two competing values $\pi_0$ and $\pi_1$, i.e. between the former estimated prevalence and the assumed decreased prevalence in the subsequent survey, the smaller the sample size required for the test to achieve the chosen power. Conversely, competing values $\pi_0$ and $\pi_1$ that are close to each other, as when a small decline is assumed or the time between the repeat surveys is short, require larger sample sizes to provide sufficient statistical evidence for a decision either in favour of or against the decline.

General references

1. Fleiss, Levin and Paik, 2003, Statistical Methods for Rates and Proportions, 3rd ed., Wiley (pp. 30-34)
2. Hayes and Moulton, 2009, Cluster Randomized Trials, Chapman & Hall (chapter 7)