ATLA 31, Supplement 1, 65–75, 2003

The Role of Control Groups in Mutagenicity Studies: Matching Biological and Statistical Relevance1

Dieter Hauschke,2 Torsten Hothorn3 and Juliane Schäfer4

2Department of Biometry, ALTANA Pharma, 78467 Konstanz, Germany; 3Department of Medical Informatics, Biometry and Epidemiology, University of Erlangen-Nuremberg, 91054 Erlangen, Germany; 4Department of Statistics, University of Munich, 80539 Munich, Germany

Summary: The statistical test of the conventional hypothesis of “no treatment effect” is commonly used in the evaluation of mutagenicity experiments. Failing to reject the hypothesis often leads to the conclusion in favour of safety. The major drawback of this indirect approach is that what is controlled by a prespecified level α is the probability of erroneously concluding hazard (producer risk). However, the primary concern of safety assessment is the control of the consumer risk, i.e. limiting the probability of erroneously concluding that a product is safe. In order to restrict this risk, safety has to be formulated as the alternative, and hazard, i.e. the opposite, has to be formulated as the null hypothesis. The direct safety approach is examined for the case when the corresponding threshold value is expressed either as a fraction of the population mean for the negative control, or as a fraction of the difference between the positive and negative controls.

Key words: biological relevance, Fieller confidence intervals, mutagenicity studies, test on equivalence.

1In this paper, the terms “proof of hazard/safety” should be interpreted in a statistical sense for the underlying experimental conditions.

Introduction

Before the administration of the first dose of a new compound to a human subject, a safety assessment has to be performed in mutagenicity studies. Statistical analysis plays a fundamental part in the interpretation of the data from the corresponding experiments. Usually, the conventional null hypothesis of “no difference in the effect” between the treatment and negative control group is tested. Failing to reject the null hypothesis often leads to the conclusion that the compound has no deleterious effect in the biological model concerned. The major drawback of this indirect procedure is that what is controlled by the pre-specified significance level is the probability of erroneously concluding hazard (producer risk). However, the primary concern of safety assessment is the control of consumer risk, i.e. limiting the probability of erroneously concluding that a product is safe. Thus, the adequate test problem should be formulated by reversing the null hypothesis and the alternative, and incorporating a threshold value defined a priori. A solution is derived for this problem for the case of normally distributed random variables, when the threshold is expressed either as a fraction of the population mean for the negative control, or as a fraction of the difference between the positive and negative controls.

Statistical Proof of Hazard

The following one-way layout represents a typical experimental design as used in genotoxicity assessment:

{Negative control, Dose$_1$, Dose$_2$, ..., Dose$_k$, Positive control}.

One objective of the analysis is to identify the no-observed adverse effect dose (NOAED), that is, the highest experimental dose with no statistically significant increase in the safety endpoint relative to the negative control. The inclusion of a positive control with known mutagenic potential allows a check to be made on the sensitivity of the test system.

Let $X_{ij}$ denote the observation of the primary endpoint for the $j$th experimental unit in the $i$th dose group $D_i$ ($i = 0$ denotes the negative control and $i = k + 1$ the positive control, respectively). It is assumed that these random variables are mutually independent and normally distributed with location parameters $\mu_i$ and unknown but common variance $\sigma^2$. Without loss of generality, it is assumed that the population means are positive, and that it is known a priori that, if there is a critical response to the substance, it will increase in magnitude, that is $\mu_i > \mu_0$.

Assuming that the mean response is a non-decreasing function of the dose level, i.e. $\mu_0 \le \mu_1 \le \mu_2 \le \ldots \le \mu_k$, the conventional approach (proof of hazard) can be performed by the following sequential procedure (1), starting with an assessment of assay sensitivity:

$$H_0: \mu_{k+1} - \mu_0 \le 0 \quad \text{(no assay sensitivity)}$$
$$H_1: \mu_{k+1} - \mu_0 > 0 \quad \text{(assay sensitivity)}.$$

A comparison between the doses and the negative control is only performed if $H_0$ was rejected at level α in favour of $H_1$ (assay sensitivity with respect to the negative control) according to Student’s t test. Starting with all doses in the next step, the hypothesis

$$H_0^k: \mu_0 = \mu_1 = \mu_2 = \ldots = \mu_k$$
$$H_1^k: \mu_0 \le \mu_1 \le \mu_2 \le \ldots \le \mu_k \ \text{and}\ \mu_0 < \mu_k$$

is tested by applying a trend test, e.g. Bartholomew’s (2). For a more-detailed discussion of other trend tests for the statistical analysis of monotone dose–response relationships in mutagenicity assays, see also Hothorn et al. (3). If $H_0^k$ is rejected at level α, the test is repeated without the highest dose:

$$H_0^{k-1}: \mu_0 = \mu_1 = \mu_2 = \ldots = \mu_{k-1}$$
$$H_1^{k-1}: \mu_0 \le \mu_1 \le \mu_2 \le \ldots \le \mu_{k-1} \ \text{and}\ \mu_0 < \mu_{k-1}.$$

In the case of a non-significant result (p value > α), the procedure stops. In general, $H_0^i$, $i = k, \ldots, 1$, is tested at level α if, and only if, all $H_0^l$, $l = i + 1, \ldots, k$, have been rejected at level α. Hence, the NOAED is the highest dose $D_i$ for which $H_0^i$ was not rejected. Based on the closed testing procedure, Maurer et al. (4) have shown that this a priori ordered test hierarchy controls the family-wise error, i.e. the error over all tested hypotheses.

Obviously, the NOAED represents a statistical no-effect dose that depends on the power of the study. Hence, a less-sensitive mutagenicity experiment with a small sample size results in higher safe doses than the corresponding study with a larger sample size and lower variability, which is exactly the opposite of what is desired. On the other hand, a significant statistical result could provide evidence for the conclusion that there is a mutagenic effect of the treatment. However, even good laboratory practice with a large sample size and little experimental variation may lead to the problem that an unimportant difference will be statistically significant (5). The classical approach therefore often leads to the problem that statistical significance does not necessarily mean biological relevance, and that statistical non-significance does not necessarily correspond to biological irrelevance (6).

The major reason for these difficulties involves the choice of the null hypothesis and the alternative. In statistical hypothesis testing, the null and alternative hypotheses are not treated equally, and this results in an inherent imbalance. The likelihood of the alternative is demonstrated by measuring the strength of evidence against the null hypothesis. A way of directly concluding that a substance has no harmful effect is the proof of sufficient safety. This requires that the test problem should be formulated by reversing the null hypothesis and the alternative, and incorporating a threshold that quantifies the maximum tolerable increase of risk relative to the control.

In the next section, the test procedure for the direct approach is derived, by assuming that the threshold is expressed either as a fraction of the population mean for the negative control, or as a fraction of the difference between positive and negative controls. Recently, these two definitions of a threshold value were also used in the validation of an internal standard in comet assay analysis (7). The first definition is also implicitly applied in the assessment of a potential mutagenic effect of a substance by the Ames assay. In that assay, one decides in favour of mutagenicity if at least two doses produce a result more than two-fold the spontaneous background.

Statistical Proof of Safety

Regulatory requirements for new drug development allow the sponsor to proceed along the lines indicated by the fundamental assumptions that: a) drugs are considered non-efficacious until proven otherwise; and b) drugs are considered sufficiently safe until proven otherwise. Therefore, classical statistical testing directly controls the consumer risk for demonstrating efficacy, but only the producer risk for demonstrating sufficient safety. However, it is intuitively clear that the consumer risk should always be of primary concern. Therefore, the adequate test problem for mutagenic studies is formulated for the two-sample design as follows, providing consistency of the consumer risk for approval based on efficacy as well as on safety:

$$H_0^i: \mu_i - \mu_0 \ge \delta \quad \text{(dose } D_i \text{ is hazardous under test conditions)}$$
$$H_1^i: \mu_i - \mu_0 < \delta \quad \text{(dose } D_i \text{ is safe under test conditions)},$$

where $(-\infty, \delta)$, $\delta > 0$, denotes the safety range. Inherently, it is necessary to define a priori a minimally relevant safety threshold δ. This means that an increase of the safety endpoint up to δ is still acceptable.

Hothorn & Hauschke (8) applied this concept for the one-way layout with k increasing doses. Instead of using the term NOAED, the authors introduced the definition of maximum safe dose (MAXSD) as follows:

$$\mathrm{MAXSD} = D_i, \quad \text{where } i = \max\{i : \mu_j - \mu_0 < \delta,\ j = 1, \ldots, i\}.$$
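The MAXSD definition can be made concrete with a minimal R sketch. The function name `maxsd_index` and all numeric values are hypothetical and chosen only for illustration. The function works on assumed true mean differences rather than on data, so it mirrors the definition, not a test procedure.

```r
# Hypothetical helper (not from the paper): index i of the maximum safe dose,
# i.e. the largest i such that mu_j - mu_0 < delta holds for every j = 1, ..., i.
maxsd_index <- function(mean_diff, delta) {
  safe <- mean_diff < delta   # safety condition, dose by dose
  sum(cumprod(safe))          # number of leading doses that are all safe
}

# Made-up differences mu_i - mu_0 for k = 4 increasing doses
mean_diff <- c(0.5, 1.2, 4.0, 2.0)

# Doses 1 and 2 qualify; dose 4 is excluded because dose 3 is not safe
maxsd_index(mean_diff, delta = 3)
```

A result of 0 indicates that even the lowest dose fails the safety condition, so no MAXSD exists among the doses considered.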
It should be noted that this definition assumes only that all doses lower than MAXSD must also be safe. Hothorn & Hauschke (8) described the following sequentially rejecting procedure, controlling the family-wise error for the determination of the highest safe dose. Starting with the lowest dose, the shifted hypothesis

$$H_0^1: \mu_1 - \mu_0 \ge \delta \quad \text{(dose } D_1 \text{ is hazardous under test conditions)}$$
$$H_1^1: \mu_1 - \mu_0 < \delta \quad \text{(dose } D_1 \text{ is safe under test conditions)}$$

is tested by the two-sample t test. The procedure stops if $H_0^1$ is not rejected and hence $D_1$ could not be proven to be safe. If $H_0^1$ is rejected at level α, the problem

$$H_0^2: \mu_2 - \mu_0 \ge \delta \quad \text{(dose } D_2 \text{ is hazardous under test conditions)}$$
$$H_1^2: \mu_2 - \mu_0 < \delta \quad \text{(dose } D_2 \text{ is safe under test conditions)}$$

is tested. Again, in the case of a non-significant result, the procedure stops. In general, $H_0^i$, $i = 1, \ldots, k$, is tested at level α if, and only if, all $H_0^l$, $l < i$, have been rejected at level α. The MAXSD is the highest dose $D_i$, $i = 1, \ldots, k$, for which the shifted null hypothesis $H_0^i: \mu_i - \mu_0 \ge \delta$ was rejected in favour of $H_1^i: \mu_i - \mu_0 < \delta$ (safety), that is:

$$t_i = \frac{\bar{X}_i - \bar{X}_0 - \delta}{S\sqrt{\dfrac{1}{n_0} + \dfrac{1}{n_i}}} \le -t_{\alpha,\,n_0+n_i-2},$$

where $t_{\alpha,\nu}$ is the $(1 - \alpha)$ quantile of the central t-distribution with ν degrees of freedom, $\bar{X}_i$ and $\bar{X}_0$ denote the sample means of dose $D_i$ and the negative control, $n_0$ and $n_i$ are the corresponding sample sizes and $S^2$ is the pooled estimator of $\sigma^2$:

$$S^2 = \frac{\sum_{j=1}^{n_0} (X_{0j} - \bar{X}_0)^2 + \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_i)^2}{n_0 + n_i - 2}.$$

In practice, there is often a reluctance to define δ a priori. If δ can only be specified a posteriori, the above stepwise procedure should be based on the classical confidence intervals, i.e. concluding safety of dose $D_i$ if the one-sided 100(1 – α)% confidence interval for $\mu_i - \mu_0$ is included in the safety range:

$$\left(-\infty,\ \bar{X}_i - \bar{X}_0 + t_{\alpha,\,n_0+n_i-2}\, S\sqrt{\frac{1}{n_0} + \frac{1}{n_i}}\right) \subset (-\infty, \delta).$$

Specification of δ requires the statisticians and genetic toxicologists to think about what constitutes a minimally relevant difference; ideally, this should happen at the planning stage of the experiment, but not later than after the statistical analysis, when point estimates and confidence intervals have been calculated, and the results are to be discussed.

A more common situation in practice is that the value δ is expressed as a proportion of the unknown population mean $\mu_0$ of the negative control. Suppose that $\delta = f\mu_0$, $f > 0$, then the foregoing test problem can be formulated as:

$$H_0^i: \mu_i - \mu_0 \ge f\mu_0$$
$$H_1^i: \mu_i - \mu_0 < f\mu_0$$

which can be restated as:

$$H_0^i: \frac{\mu_i}{\mu_0} \ge 1 + f$$
$$H_1^i: \frac{\mu_i}{\mu_0} < 1 + f$$

where $(-\infty, 1 + f)$ is the corresponding safety interval for the ratio of $\mu_i$ and $\mu_0$. By analogy, the maximum safe dose is defined as:

$$\mathrm{MAXSD} = D_i, \quad \text{where } i = \max\left\{i : \frac{\mu_j}{\mu_0} < 1 + f,\ j = 1, \ldots, i\right\}.$$

Sasabuchi (9) demonstrated that the size-α likelihood ratio test rejects the null hypothesis $H_0^i$, $i = 1, \ldots, k$, concerning the ratio of the two means, if:

$$t_i = \frac{\bar{X}_i - (1 + f)\bar{X}_0}{S\sqrt{\dfrac{1}{n_i} + \dfrac{(1 + f)^2}{n_0}}} \le -t_{\alpha,\,n_0+n_i-2}.$$

Hauschke et al. (10) have shown that the condition $t_i \le -t_{\alpha,\,n_0+n_i-2}$ is equivalent to:

$$\theta_u^i \le 1 + f \quad \text{and} \quad \bar{X}_0^2 > a_0,$$

where

$$\theta_u^i = \frac{\bar{X}_0\bar{X}_i + \sqrt{a_0\bar{X}_i^2 + a_i\bar{X}_0^2 - a_0 a_i}}{\bar{X}_0^2 - a_0}, \qquad a_0 = \frac{S^2}{n_0}\, t^2_{\alpha,\,n_0+n_i-2}, \qquad a_i = \frac{S^2}{n_i}\, t^2_{\alpha,\,n_0+n_i-2}.$$

It should be noted that the one-sided 100(1 – α)% confidence interval $(-\infty, \theta_u^i)$ for $\mu_i/\mu_0$ is a special case of the more-general confidence interval according to Fieller (11). This is because the estimators $\bar{X}_i$ and $\bar{X}_0$ are uncorrelated. Therefore, the corresponding sequentially rejecting procedure based on either corresponding tests or confidence intervals can be easily applied to the situation where the parameter of interest is expressed as a ratio of location parameters.
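The equivalence between the Sasabuchi-type test and the Fieller upper limit can be checked numerically. The following R sketch assumes the formulas stated above; the function name `ratio_safety`, its arguments and the threshold f = 1 (a two-fold increase) are illustrative assumptions, not part of the paper. The data are the negative control and 30mg/kg counts from Table 1 of the example.

```r
# Sketch of the ratio-based proof of safety for one dose versus the negative
# control. Function and argument names are illustrative assumptions.
ratio_safety <- function(x0, xi, f, alpha = 0.05) {
  n0 <- length(x0)
  ni <- length(xi)
  df <- n0 + ni - 2
  # Pooled variance estimate S^2
  s2 <- (sum((x0 - mean(x0))^2) + sum((xi - mean(xi))^2)) / df
  tq <- qt(1 - alpha, df)   # t_{alpha, df}, the (1 - alpha) quantile

  # Test statistic for H0: mu_i / mu_0 >= 1 + f
  t_i <- (mean(xi) - (1 + f) * mean(x0)) /
    sqrt(s2 * (1 / ni + (1 + f)^2 / n0))

  # One-sided upper Fieller limit theta_u for mu_i / mu_0
  a0 <- s2 / n0 * tq^2
  ai <- s2 / ni * tq^2
  theta_u <- (mean(x0) * mean(xi) +
                sqrt(a0 * mean(xi)^2 + ai * mean(x0)^2 - a0 * ai)) /
    (mean(x0)^2 - a0)

  list(t = t_i, crit = -tq, theta_u = theta_u,
       reject_by_test = t_i <= -tq,
       reject_by_ci = (theta_u <= 1 + f) && (mean(x0)^2 > a0))
}

x0 <- c(3, 2, 2, 3, 2, 5, 1)   # negative control (Table 1)
x1 <- c(5, 4, 4, 4, 2)         # 30mg/kg hydroquinone (Table 1)

res <- ratio_safety(x0, x1, f = 1)   # f = 1 is an assumed threshold
res$theta_u                          # about 2.31, the value reported in Table 2
res$reject_by_test == res$reject_by_ci
```

With the assumed f = 1, the upper limit exceeds 1 + f = 2, so neither formulation rejects the hazard hypothesis; both decisions agree, as the equivalence requires.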
The corresponding threshold value for the difference $\mu_i - \mu_0$ is $\delta = f\mu_0$, which is equivalent to the condition that a dose $D_i$ is considered safe if $\mu_i < \mu_0 + \delta = (1 + f)\mu_0$. This is illustrated in Figure 1. Obviously, this critical threshold value should be based on biological relevance rather than on statistical reasoning, thus taking into account corresponding safety considerations.

Figure 1: Graphical interpretation of the threshold
[Figure not reproduced. Recoverable labels: $\mu_0$, $\mu_0 + \delta$ and $\mu_{k+1}$ on the axis; Δ as the distance between $\mu_0$ and $\mu_{k+1}$; the regions $H_1^i$ and $H_0^i$.]

Using the definition $\delta = f\mu_0$ is one possible way of relating statistical and biological relevance. Another approach is to incorporate the difference between the positive and the negative controls, i.e. $\Delta = \mu_{k+1} - \mu_0$. A dose $D_i$ is considered safe if the corresponding mean is smaller than the mean of the negative control plus a fraction of the difference between positive and negative control, that is $\mu_i < \mu_0 + \delta = \mu_0 + f(\mu_{k+1} - \mu_0)$, $f > 0$ (see Figure 1). This formulation is equivalent to the condition that the threshold value for the difference $\mu_i - \mu_0$ is $\delta = f\Delta = f(\mu_{k+1} - \mu_0)$. The corresponding formulation of the test problem leads to the following one-sided shifted null hypotheses:

$$H_0^i: \mu_i - \mu_0 \ge f(\mu_{k+1} - \mu_0)$$
$$H_1^i: \mu_i - \mu_0 < f(\mu_{k+1} - \mu_0)$$

which can be restated as:

$$H_0^i: \frac{\mu_i - \mu_0}{\mu_{k+1} - \mu_0} \ge f$$
$$H_1^i: \frac{\mu_i - \mu_0}{\mu_{k+1} - \mu_0} < f.$$

Schäfer (12) has shown that $H_0^i$ can be rejected, if:

$$t_i = \frac{\bar{X}_i - f\bar{X}_{k+1} - (1 - f)\bar{X}_0}{S\sqrt{\dfrac{1}{n_i} + \dfrac{f^2}{n_{k+1}} + \dfrac{(1 - f)^2}{n_0}}} \le -t_{\alpha,\,n_0+n_{k+1}+n_i-3},$$

which is equivalent to:

$$\theta_u^i \le f \quad \text{and} \quad 0 \le G_i < 1,$$

where:

$$\theta_u^i = \frac{1}{1 - G_i}\left[R_i - G_i\frac{c_0}{c_{k+1}} + \frac{t_{\alpha,\,n_0+n_{k+1}+n_i-3}\, S}{Z}\sqrt{c_i(1 - G_i) - 2c_0R_i + c_{k+1}R_i^2 + \frac{c_0^2}{c_{k+1}}G_i}\,\right],$$

$$Z = \bar{X}_{k+1} - \bar{X}_0, \qquad G_i = \frac{t^2_{\alpha,\,n_0+n_{k+1}+n_i-3}\, S^2 c_{k+1}}{Z^2}, \qquad R_i = \frac{\bar{X}_i - \bar{X}_0}{\bar{X}_{k+1} - \bar{X}_0},$$

$$c_0 = \frac{1}{n_0}, \qquad c_i = \frac{1}{n_i} + \frac{1}{n_0}, \qquad c_{k+1} = \frac{1}{n_{k+1}} + \frac{1}{n_0}.$$

Here $S^2$ is pooled over the three groups involved (dose $D_i$, negative control and positive control), in line with the $n_0 + n_{k+1} + n_i - 3$ degrees of freedom. In this situation, the estimators $\bar{X}_i - \bar{X}_0$ and $\bar{X}_{k+1} - \bar{X}_0$ are correlated, so the calculation of the one-sided 100(1 – α)% confidence interval $(-\infty, \theta_u^i)$ for $(\mu_i - \mu_0)/(\mu_{k+1} - \mu_0)$ must be based on Fieller’s method (11). Analogously, the corresponding sequential rejecting procedure based on either corresponding tests or confidence intervals can be used.

However, because the threshold value is defined as a fraction of the difference $\mu_{k+1} - \mu_0$, the selection of a suitable dose of the positive control is of outstanding importance. Increasing the difference by using a high dose of the positive control implies that the derived threshold might not be considered as a minimum acceptable increase in the safety endpoint. Thus, doses should not be so high that excessive responses are observed (13).

An Example

Adler & Kliesch (14) published raw data from a micronucleus mutagenicity assay on hydroquinone. The results for male mice at the 24-hour sampling time are given in Table 1.

Table 1: Number of micronuclei per animal and 2000 scored cells for the negative control, four doses of hydroquinone and the positive control cyclophosphamide

Treatment group               Number of micronuclei/2000 cells
Negative control              3, 2, 2, 3, 2, 5, 1
30mg/kg                       5, 4, 4, 4, 2
50mg/kg                       7, 4, 6, 8, 6
75mg/kg                       9, 18, 13, 12, 18
100mg/kg                      22, 13, 23, 22, 20
25mg/kg cyclophosphamide      33, 15, 32, 20

Table 2 provides the summary statistics for the negative control, the four dose levels of hydroquinone, and for the positive control, cyclophosphamide. Additionally, the corresponding upper 95% confidence limits for $\mu_i/\mu_0$ and $(\mu_i - \mu_0)/(\mu_{k+1} - \mu_0)$, $i = 1, \ldots, 4$, are given; the latter can be interpreted as the fraction of the mutagenic potency of the positive control, measured as positive minus negative control.

Table 2: Sample means, sizes for the number of micronuclei and upper 95% confidence limits for $\mu_i/\mu_0$ and $(\mu_i - \mu_0)/(\mu_{k+1} - \mu_0)$, $i = 1, \ldots, 4$

                                                                            Upper confidence limit for
Treatment group                             Sample mean   Sample size   $\mu_i/\mu_0$   $(\mu_i - \mu_0)/(\mu_{k+1} - \mu_0)$
i = 0: Negative control                     2.57          7             -               -
i = 1: 30mg/kg                              3.80          5             2.31            0.24
i = 2: 50mg/kg                              6.20          5             3.88            0.35
i = 3: 75mg/kg                              14.0          5             19.09           0.74
i = k = 4: 100mg/kg                         20.0          5             29.20           1.04
i = k + 1 = 5: 25mg/kg cyclophosphamide     25.0          4             -               -

Obviously, safety cannot be concluded for the doses 50, 75 and 100mg/kg because they show an unacceptable increase relative to both the negative control and the difference between the positive and negative control. The low dose 30mg/kg shows only a slight increase, which might be regarded as biologically unimportant, and this dose could therefore be considered as MAXSD.

S functions for both standard Fieller confidence intervals (two-sample design) and correlated Fieller confidence intervals (many-to-one design) are given in Appendices 1 and 2. The output for the above example is in Appendix 3. All three files can be downloaded from http://www.bioinf.uni-hannover.de/INVITROSTAT.

Conclusions

The consumer risk, i.e. the probability of erroneously concluding safety, is not controlled by the classical testing approach (proof of hazard). Furthermore, this approach often leads to a mismatch between statistical significance and biological relevance. One major reason for this logical difficulty is clearly described by Fisher (15): “ . . . the null hypothesis is never proved or established, but is possibly disproved in the course of experimentation.
Every experiment may be said to exist only in order to give the facts a chance of disproving the null hypothesis.” Thus, the adequate test problem should be formulated as a proof of safety by reversing the role of the null hypothesis and the alternative, and incorporating a threshold value. This report is concerned with the safety approach when the threshold value is expressed either relative to the negative control mean, or as a fraction of the difference between positive and negative controls. It should be noted that the statistical methodology was developed for a normally distributed endpoint with common variance. Further research is needed on violations of these assumptions, e.g. variance heterogeneity and/or non-normal distributions. Of course, this approach is suitable, not only for mutagenicity studies, but also for every toxicological problem related to safety.

Acknowledgement

This paper was partly sponsored by ECVAM (via EC contract number 17159-2000-11F1ED ISP DE).

References

1. Hothorn, L.A. (1995). Biostatistical analysis of the control vs k treatments design including a positive control group. In Biometrie in der chemisch-pharmazeutischen Industrie (ed. J. Vollmar), pp. 19–26. Stuttgart, Germany: Gustav Fischer Verlag.
2. Bartholomew, D.J. (1961). Ordered tests in the analysis of variance. Biometrika 48, 325–332.
3. Hothorn, L.A., Hayashi, M. & Seidel, D. (2000). Dose–response relationship in mutagenicity assays including an appropriate positive control group: a multiple testing approach. Environmental and Ecological Statistics 7, 27–42.
4. Maurer, W., Hothorn, L.A. & Lehmacher, W. (1995). Multiple comparisons in drug clinical and preclinical assays: a priori ordered hypotheses. In Biometrie in der chemisch-pharmazeutischen Industrie (ed. J. Vollmar), pp. 3–18. Stuttgart, Germany: Gustav Fischer Verlag.
5. Hauschke, D., Hayashi, M., Lin, K.K., Lovell, D.P., Robinson, W.D. & Yoshimura, I. (1997). Recommendations for biostatistics of mutagenicity studies. Drug Information Journal 31, 323–326.
6. Hauschke, D. & Hothorn, L.A. (1998). Safety assessment in toxicological studies: proof of safety versus proof of hazard. In Design and Analysis of Animal Studies in Pharmaceutical Development (ed. S-C. Chow & J-P. Liu), pp. 197–225. New York, NY, USA: Marcel Dekker.
7. De Boeck, M., Touil, N., De Visscher, G., Vande, P.A. & Kirsch-Volders, M. (2000). Validation and implementation of an internal standard in comet assay analysis. Mutation Research 469, 181–197.
8. Hothorn, L.A. & Hauschke, D. (2000). Identifying the maximum safe dose: a multiple testing approach. Journal of Biopharmaceutical Statistics 10, 15–30.
9. Sasabuchi, S. (1988). A multivariate one-sided test with composite hypotheses determined by linear inequalities when the covariance matrix has an unknown scale factor. Memoirs of the Faculty of Science, Kyushu University, Series A, Mathematics 42, 9–19.
10. Hauschke, D., Kieser, M. & Hothorn, L.A. (1999). Proof of safety in toxicology based on the ratio of two means for normally distributed data. Biometrical Journal 41, 295–304.
11. Fieller, E. (1954). Some problems in interval estimation. Journal of the Royal Statistical Society B 16, 175–185.
12. Schäfer, J. (2001). Kriterien zur Entscheidung über therapeutische Äquivalenz (Criteria for the Decision on Therapeutic Equivalence; in German). Masters Thesis, University of Munich.
13. Anon. (1991). Guidance Note: The Practical Interpretation of Annex V: Test Method B10, the In Vitro Mammalian Cell Cytogenetics Test. XI/574/91 Rev. 2. Brussels, Belgium: Commission of the EC, Directorate General Environment.
14. Adler, I.D. & Kliesch, U. (1990). Comparison of single and multiple treatment regimens in the mouse bone marrow micronucleus assay for hydroquinone and cyclophosphamide. Mutation Research 234, 115–123.
15. Fisher, R.A. (1935). The Design of Experiments. London, UK: Oliver & Boyd.
Appendix 1: An S function for standard Fieller confidence intervals for the two-sample problem

fieller <- function(treat, group,
                    alternative = c("two.sided", "greater", "less"),
                    conf.level = 0.95)
{
    ##
    ## Computes parametric confidence intervals for the ratio of
    ## mean(dosis)/mean(control) for a two-sample design
    ##
    ## Input:
    ##   treat: numeric vector of measurements
    ##   group: a factor at levels "dosis" and "control"
    ##   alternative: side of the confidence sets to be computed
    ##   conf.level: the confidence level
    ##
    ## Output:
    ##   a numeric vector c(lower, upper) with attribute "conf.level"
    ##
    ## Example:
    ##   treat <- c(rnorm(10,3), rnorm(10,1))
    ##   group <- factor(c(rep("dosis", 10), rep("control",10)))
    ##   fieller(treat, group)
    ##

    if (!is.vector(treat) || is.null(treat)) stop("treat is no vector")
    if (is.null(group)) stop("no groups given")
    if (length(treat) != length(group)) stop("length differ")
    alternative <- match.arg(alternative)
    alpha <- 1 - conf.level

    if (!any(levels(group) == "dosis")) stop("No treatment group defined")
    if (!any(levels(group) == "control")) stop("No control defined")

    x <- treat[group == "control"]
    y <- treat[group == "dosis"]
    m <- length(x)
    n <- length(y)

    ## S holds the pooled variance estimate (S^2 in the text)
    S <- (1/(m + n - 2))*(sum((x - mean(x))^2) + sum((y - mean(y))^2))

    ## the t quantile only enters squared, so its sign does not matter
    cint <- switch(alternative,
        two.sided = {
            tquant <- qt(alpha/2, m + n - 2)
            ax <- S/m*tquant^2
            ay <- S/n*tquant^2
            sqrtt <- sqrt(ax*mean(y)^2 + ay*mean(x)^2 - ax*ay)
            c((mean(x)*mean(y) - sqrtt)/(mean(x)^2 - ax),
              (mean(x)*mean(y) + sqrtt)/(mean(x)^2 - ax))
        },
        greater = {
            tquant <- qt(alpha, m + n - 2)
            ax <- S/m*tquant^2
            ay <- S/n*tquant^2
            sqrtt <- sqrt(ax*mean(y)^2 + ay*mean(x)^2 - ax*ay)
            c((mean(x)*mean(y) - sqrtt)/(mean(x)^2 - ax), Inf)
        },
        less = {
            tquant <- qt(alpha, m + n - 2)
            ax <- S/m*tquant^2
            ay <- S/n*tquant^2
            sqrtt <- sqrt(ax*mean(y)^2 + ay*mean(x)^2 - ax*ay)
            c(0, (mean(x)*mean(y) + sqrtt)/(mean(x)^2 - ax))
        })

    ## the interval is only bounded if mean(x)^2 exceeds ax
    if (ax > mean(x)^2) stop("mean(x) is not significantly unequal zero")
    attr(cint, "conf.level") <- conf.level
    return(cint)
}

Appendix 2: An S function for correlated Fieller confidence intervals according to Schäfer (12)

fiellermuta <- function(treat, group,
                        alternative = c("two.sided", "greater", "less"),
                        conf.level = 0.95)
{
    ##
    ## Computes parametric confidence intervals for the ratio
    ## (mean(dosis) - mean(ncontrol))/(mean(pcontrol) - mean(ncontrol))
    ## for a many-to-one design
    ##
    ## Input:
    ##   treat: numeric vector of measurements
    ##   group: a factor at levels "dosis", "pcontrol" and "ncontrol"
    ##          indicating the group
    ##   alternative: side of the confidence sets to be computed
    ##   conf.level: the confidence level
    ##
    ## Output:
    ##   a numeric vector c(lower, upper) with attribute "conf.level"
    ##
    ## Example:
    ##   treat <- c(rnorm(10,3), rnorm(10,5), rnorm(10))
    ##   group <- factor(c(rep("dosis", 10), rep("pcontrol", 10),
    ##                     rep("ncontrol",10)))
    ##   fiellermuta(treat, group)
    ##

    if (!is.vector(treat) || is.null(treat)) stop("treat is no vector")
    if (is.null(group)) stop("no groups given")
    if (length(treat) != length(group)) stop("length differ")
    alternative <- match.arg(alternative)

    if (!any(levels(group) == "dosis")) stop("No dosis group defined")
    if (!any(levels(group) == "pcontrol")) stop("No positive control defined")
    if (!any(levels(group) == "ncontrol")) stop("No negative control defined")

    ndosis <- sum(group == "dosis")
    npcontrol <- sum(group == "pcontrol")
    nncontrol <- sum(group == "ncontrol")
    df <- ndosis + npcontrol + nncontrol - 3

    ## c_i, c_{k+1} and c_0 of the text
    cdosis <- 1/ndosis + 1/nncontrol
    cnpcontrol <- 1/npcontrol + 1/nncontrol
    cnncontrol <- 1/nncontrol

    ## variance estimate pooled over all three groups
    pooledvar <- ((ndosis - 1) * var(treat[group=="dosis"]) +
                  (npcontrol - 1) * var(treat[group=="pcontrol"]) +
                  (nncontrol - 1) * var(treat[group=="ncontrol"]))/df

    ## Z and R_i of the text
    z <- mean(treat[group=="pcontrol"]) - mean(treat[group=="ncontrol"])
    rdosis <- (mean(treat[group=="dosis"]) - mean(treat[group=="ncontrol"]))/z

    cint <- switch(alternative,
        "two.sided" = {
            alpha <- (1 - conf.level)/2
            gdosis <- (qt(1 - alpha, df)^2 * pooledvar * cnpcontrol)/(z^2)
            lower <- 1/(1 - gdosis) * (rdosis - (gdosis * cnncontrol)/cnpcontrol -
                (qt(1 - alpha, df) * sqrt(pooledvar))/z *
                sqrt(cdosis * (1 - gdosis) - 2 * cnncontrol * rdosis +
                     cnpcontrol * rdosis^2 + (cnncontrol^2/cnpcontrol) * gdosis))
            upper <- 1/(1 - gdosis) * (rdosis - (gdosis * cnncontrol)/cnpcontrol +
                (qt(1 - alpha, df) * sqrt(pooledvar))/z *
                sqrt(cdosis * (1 - gdosis) - 2 * cnncontrol * rdosis +
                     cnpcontrol * rdosis^2 + (cnncontrol^2/cnpcontrol) * gdosis))
            c(lower, upper)
        },
        "less" = {
            alpha <- 1 - conf.level
            gdosis <- (qt(1 - alpha, df)^2 * pooledvar * cnpcontrol)/(z^2)
            upper <- 1/(1 - gdosis) * (rdosis - (gdosis * cnncontrol)/cnpcontrol +
                (qt(1 - alpha, df) * sqrt(pooledvar))/z *
                sqrt(cdosis * (1 - gdosis) - 2 * cnncontrol * rdosis +
                     cnpcontrol * rdosis^2 + (cnncontrol^2/cnpcontrol) * gdosis))
            c(0, upper)
        },
        "greater" = {
            alpha <- 1 - conf.level
            gdosis <- (qt(1 - alpha, df)^2 * pooledvar * cnpcontrol)/(z^2)
            lower <- 1/(1 - gdosis) * (rdosis - (gdosis * cnncontrol)/cnpcontrol -
                (qt(1 - alpha, df) * sqrt(pooledvar))/z *
                sqrt(cdosis * (1 - gdosis) - 2 * cnncontrol * rdosis +
                     cnpcontrol * rdosis^2 + (cnncontrol^2/cnpcontrol) * gdosis))
            c(lower, Inf)
        })

    attr(cint, "conf.level") <- conf.level
    return(cint)
}

The S-functions ‘fieller’ and ‘fiellermuta’ implement the methods described in this paper. Both programs can be executed by using the commercial program “S-Plus” (http://www.insightful.com/), as well as the freely-available system “R” (http://www.r-project.org).

Appendix 3: The output for the example data

R : Copyright 2002, The R Development Core Team
Version 1.4.1 (2002-01-30)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type ‘license()’ or ‘licence()’ for distribution details.

R is a collaborative project with many contributors.
Type ‘contributors()’ for more information.
Type ‘demo()’ for some demos, ‘help()’ for on-line help, or
‘help.start()’ for a HTML browser interface to help.
Type ‘q()’ to quit R.

> source("fiellermuta.s")
> source("fieller.s")
>
> # data from the micronuclei example
>
> Cminus <- c(3,2,2,3,2,5,1)
> Cplus <- c(33, 15, 32, 20)
> D1 <- c(5,4,4,4,2)
> group <- factor(c(rep("control",7), rep("dosis",5)))
>
> # Standard Fieller confidence intervals for the two-sample problem
>
> print(fieller(c(Cminus, D1), group, alternative="less"))
[1] 0.000000 2.311095
attr(,"conf.level")
[1] 0.95
>
> D2 <- c(7,4,6,8,6)
> print(fieller(c(Cminus, D2), group, alternative="less"))
[1] 0.000000 3.882335
attr(,"conf.level")
[1] 0.95
>
> D3 <- c(9, 18, 13, 12, 18)
> print(fieller(c(Cminus, D3), group, alternative="less"))
[1] 0.00000 19.08949
attr(,"conf.level")
[1] 0.95
>
> D4 <- c(22,13,23,22,20)
> print(fieller(c(Cminus, D4), group, alternative="less"))
[1] 0.00000 29.20161
attr(,"conf.level")
[1] 0.95
>
> # Correlated Fieller confidence intervals with positive and negative control
>
> group <- factor(c(rep("dosis", 5), rep("ncontrol", 7), rep("pcontrol", 4)))
>
> print(fiellermuta(c(D1, Cminus, Cplus), group, conf.level=0.95, alternative="less"))
[1] 0.0000000 0.2442684
attr(,"conf.level")
[1] 0.95
>
> print(fiellermuta(c(D2, Cminus, Cplus), group, conf.level=0.95, alternative="less"))
[1] 0.0000000 0.3509787
attr(,"conf.level")
[1] 0.95
>
> print(fiellermuta(c(D3, Cminus, Cplus), group, conf.level=0.95, alternative="less"))
[1] 0.0000000 0.7360543
attr(,"conf.level")
[1] 0.95
>
> print(fiellermuta(c(D4, Cminus, Cplus), group, conf.level=0.95, alternative="less"))
[1] 0.000000 1.043725
attr(,"conf.level")
[1] 0.95
>
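The upper limits printed above can be fed into the sequentially rejecting procedure to read off a MAXSD. The following R sketch is one possible way to do this. The helper `maxsd_from_limits` and both thresholds (f = 0.25 for the fraction of the control difference, 1 + f = 2 for the ratio) are assumptions made purely for illustration; the paper does not fix a value of f. The sketch also assumes that the side conditions of the Fieller intervals hold for every dose, which the printed finite limits suggest.

```r
# Upper 95% limits copied from the output above (30, 50, 75, 100 mg/kg)
doses       <- c(30, 50, 75, 100)
upper_ratio <- c(2.311095, 3.882335, 19.08949, 29.20161)     # for mu_i / mu_0
upper_frac  <- c(0.2442684, 0.3509787, 0.7360543, 1.043725)  # for (mu_i - mu_0)/(mu_{k+1} - mu_0)

# Hypothetical helper: step through the doses from the lowest upwards and
# stop at the first dose whose upper limit exceeds the threshold.
maxsd_from_limits <- function(doses, upper, threshold) {
  n_safe <- sum(cumprod(upper <= threshold))
  if (n_safe == 0) NA else doses[n_safe]
}

maxsd_from_limits(doses, upper_frac, threshold = 0.25)   # assumed f = 0.25
maxsd_from_limits(doses, upper_ratio, threshold = 2)     # assumed 1 + f = 2
```

Under the assumed f = 0.25, only the 30mg/kg dose passes, in line with the discussion of the example. Under the assumed two-fold ratio threshold, no dose can be declared safe and the helper returns NA. The choice of threshold therefore drives the conclusion and should rest on biological considerations.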