Issues and Experience in Analyzing Transgenic Mouse Carcinogenicity Studies: An Industry Perspective

Ronald Menton
Wyeth Research
2005 FDA/Industry Statistics Workshop
Washington, DC, 14-16 Sep 2005

Outline
• Some statistical questions for 2-year studies
• Transgenic models
• Some thoughts on the questions for transgenic models
• Final comments

Study Design Questions?
• Are two control groups needed?
• How many animals per group?
• What groups are needed?
• Statistical methods?

Some In-Life Questions?
• When should we terminate group x?
• When should we terminate the study?
• Do we have a valid study?

Questions at End of Study?
• DO WE HAVE A VALID STUDY?
• ARE ANY FINDINGS STATISTICALLY SIGNIFICANT?

Transgenic Mouse Models
• Mouse model more susceptible to drug-induced tumors due to
  – Knocking out a gene associated with tumor suppression (e.g., p53+/-, XPA)
  – Insertion of multiple copies of a human gene associated with tumor promotion (e.g., TgrasH2, Tg.AC)
• The increased signal permits shorter study duration and smaller group sizes

Transgenic Models
• Current regulations (ICH S1B) permit sponsors to conduct the traditional 2-year rat study plus a short- or medium-term rodent study in lieu of 2-year studies in both rats and mice
• The Committee for Medicinal Products for Human Use stated that the TgrasH2 and p53+/- mouse models are acceptable alternatives to the 2-year mouse study (CPMP 2004)

Why Conduct Transgenic Study?
• Faster
  – In-life: 6 months vs 2 years
  – Study completion: 1 year vs > 3 years
• Fewer Resources
  – Fewer animals
  – People
  – Space
• Increased flexibility for drug development

Typical Study Design for 2-Year Rodent Study

                   Number of Animals
Group              Males    Females
Control Group 1    50-75    50-75
Control Group 2    50-75    50-75
Low Dosage         50-75    50-75
Mid Dosage         50-75    50-75
High Dosage        50-75    50-75

Are Two Control Groups Needed?
• Many companies routinely use two vehicle control groups for 2-year carcinogenicity studies.
• Why?
  – Permits an assessment of variation in tumor rates between groups
  – Poor survival in control group is problematic
• See Haseman (1990) for discussion

Multiple Control Groups in 2-Year Studies
Eight of 14 companies indicated that multiple control groups are employed for at least 75% of their studies.

[Bar chart: number of companies by percentage of "Studies with Multiple Control Groups". 0%: 0; < 25%: 1; 25-75%: 4; > 75%: 8]

What type of multiple control group designs are routinely used?
9  Two vehicle control groups
2  Vehicle control and water control
2  Vehicle control and untreated control

Source: Survey of 14 PhRMA Companies on Statistical Methods Used for 2-year Rodent Carcinogenicity Studies. Menton R (2003)

Are Two Control Groups Needed?
• Are two vehicle control groups needed in short-term carcinogenicity studies?
• Not for most models
  – Low spontaneous rate of tumors
  – Survival rate usually high for at least 6 months

[Figure: Survival for P53 Mouse from 6 NTP Studies. Source: NTP Web Site]

Mortality in TgrasH2 Mice

                   Male                          Female
                   VC      MNU (1)  MNU (2)      VC      MNU (1)  MNU (2)
N studies          12      4        7            12      4        7
N animals          180     60       104          179     60       105
Mortality range    0-13%   0-33%    0-100%       0-13%   13-27%   13-100%
Mortality mean     2.8%    13.3%    57.7%        3.9%    20%      55.2%

(1) 13-week studies
(2) 26-week studies
Adapted from Table 4 in Takaoka (2003)

Spontaneous Tumors in P53 Mice

                                     Tumor Incidence
Neoplasm                             Male             Female
All Organs
  Leukemia: Granulocytic             1/108 (0.93%)    0/109
  Malignant Lymphoma                 2/108 (1.85%)    2/109 (1.83%)
  Osteosarcoma or Osteoma            2/108 (1.85%)    0/109
Bone
  Osteosarcoma                       2/108 (1.85%)    0/109
Lung
  Alveolar/Bronchiolar Adenoma       0/108            1/109 (0.92%)
Skin
  Sarcoma                            2/108 (1.85%)    3/109 (2.75%)

Adapted from NTP Website

Spontaneous Tumors in TgrasH2 Mice
• Usui (2001) summarized tumor incidence and time of first tumor for common spontaneous tumors (incidence > 1%) in 12 ILSI ACT studies.
• 180 male and 178 female mice (15 per study/sex)
• Male tumor incidence: 0-1.8%
• Female tumor incidence: 0-2.3%
• In most cases, the incidence of these common tumors was only marginally greater than 1.0%

How Many Animals Per Group?
• 2-year mouse studies typically use between 50 and 65 animals per group.
• Study duration was typically 24 months for both rat and mouse studies. The number of animals per group per sex was evenly divided between 50, 60, and 65.

How Many Animals Per Group?
• Original ILSI protocols recommended 15 animals per group for transgenic studies
• Recent papers and presentations have recommended 20-25 per group
  – Morton (2002)
  – Lin (2004)
  – CPMP (2004)

Sample Size
• Recommend 20 to 25 mice/sex/group for carcinogenicity assessment studies in TgrasH2 mice. (Morton 2002)
• Group size of 15 animals in the original transgenic mouse study protocol is too small. To have a level of power between 80 and 90% in detecting a true 15% difference, 20-25 animals per group are needed. (Lin 2004)
• The number of animals per group in the ILSI/HESI studies is too small. An increase in group size to 20-25 animals per group is recommended. (CPMP 2004)

Power to Detect Selected Increases in Tumor Rate Assuming Background Tumor Rate Near 0

        n=15              n=20              n=25              n=30
P2      α=0.01  α=0.05    α=0.01  α=0.05    α=0.01  α=0.05    α=0.01  α=0.05
0.10    0.29    0.55      0.39    0.65      0.46    0.73      0.57    0.80
0.15    0.44    0.71      0.57    0.81      0.68    0.88      0.77    0.92
0.20    0.59    0.81      0.73    0.90      0.82    0.95      0.90    0.97
0.25    0.70    0.88      0.84    0.95      0.91    0.98      0.95    0.99
0.30    0.80    0.93      0.91    0.98      0.96    0.99      0.98    0.99
0.35    0.80    0.95      0.93    0.98      0.97    0.99      0.99    0.99

Adapted from Lin (2004)

Power to Detect 15% Increase in Tumor Rate for Sample Sizes of 15, 20, and 25 (a)

Number of          Historical prevalence of spontaneous neoplasms
mice/sex/group     0%      3.75%   7.5%

Sexes analyzed separately. Test will detect change in one sex or both.
15                 0.60    0.46    0.53
20                 0.78    0.66    0.60
25                 0.86    0.67    0.74

Both sexes analyzed together with blocking.
15                 0.77    0.52    0.62
20                 0.90    0.74    0.66
25                 0.96    0.72    0.81

(a) Assumptions for these sample power simulations include:
1. A trend test is performed.
2. Three treatment groups and a negative control group are analyzed.
3. Prevalence of treatment-related neoplasm increases proportionally to the dosage.
4. There are no sex differences in neoplastic responses.
5. p < 0.05 is statistically significant.
Adapted from Table 2 in Morton (2002)

What Groups to Include?
• Typical 2-year carcinogenicity study includes 5 groups: C1, C2, L, M, H
• All but one respondent indicated that a typical study includes three dose groups, with one stating that they usually employ four dose groups.

Study Design for TgrasH2 Study

                        NO. OF MICE
                        Toxicity
GROUP                   M     F
CB6F1-TgrasH2
  Vehicle Control       25    25
  Positive Control      25    25
  Low-Dose              25    25
  Mid-Dose              25    25
  High-Dose             25    25
CB6F1-nonTgrasH2
  Vehicle Control       25    25
  High-Dose             25    25

Adapted from www.rash2.com

What Groups to Include?
• Original ILSI protocol recommended 7 groups
• C, L, M, H, Positive Control, WT-C, WT-H
• WT groups are now considered optional
• Two questions:
  – Is the PC group needed?
  – If the PC group is included, how many animals are needed in this group?
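The second question above can be explored numerically. The following is a minimal sketch, not the calculation used for these slides: it computes power by exact enumeration of the binomial outcomes rather than by simulation, and it assumes a one-sided Fisher exact test, 25 animals in the vehicle control group, and a 5% background incidence. The function names (`fisher_one_sided_p`, `exact_power`) are hypothetical.

```python
from math import comb


def binom_pmf(k, n, p):
    """Binomial probability of k tumor-bearing animals out of n."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def fisher_one_sided_p(x_ctrl, n_ctrl, x_trt, n_trt):
    """One-sided Fisher exact p-value: P(treated count >= x_trt | margins)."""
    total = x_ctrl + x_trt
    denom = comb(n_ctrl + n_trt, total)
    upper = min(total, n_trt)
    # math.comb returns 0 when total - k exceeds n_ctrl, so impossible
    # tables drop out of the sum automatically.
    tail = sum(comb(n_trt, k) * comb(n_ctrl, total - k)
               for k in range(x_trt, upper + 1))
    return tail / denom


def exact_power(p_ctrl, p_trt, n_ctrl, n_trt, alpha=0.05):
    """Probability that the one-sided Fisher test rejects at level alpha."""
    power = 0.0
    for x0 in range(n_ctrl + 1):
        w0 = binom_pmf(x0, n_ctrl, p_ctrl)
        for x1 in range(n_trt + 1):
            if fisher_one_sided_p(x0, n_ctrl, x1, n_trt) <= alpha:
                power += w0 * binom_pmf(x1, n_trt, p_trt)
    return power


# Assumed scenario: 25 vehicle controls, 5% background incidence.
for pc_incidence in (0.5, 0.6, 0.7, 0.8, 0.9):
    row = [exact_power(0.05, pc_incidence, 25, n_pc) for n_pc in (15, 20, 25)]
    print(pc_incidence, [round(100 * v, 1) for v in row])
```

Because the test sidedness and control group size are assumptions here, the printed values need not match published tables exactly; the sketch is meant only to show how power changes with the size of the positive control group.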
Positive Controls in Short-term Studies
• Storer (2001) summarized results for 19 ILSI ACT studies that used p-cresidine as the positive control group
• N=15 per sex
• Males
  – P-cresidine was considered positive for 18 of 19 studies
  – Bladder tumor incidence ranged from 0 to 86.7%
• Females
  – P-cresidine was considered positive for 15 of 19 studies
  – Bladder tumor incidence ranged from 0 to 60%

Positive Controls in Short-term Studies
Incidence of Select Neoplasms in TgrasH2 Mice Treated with MNU

                                                  Male (7 Studies)    Female (7 Studies)
Organ/Diagnosis                                   Range      Mean     Range      Mean
Forestomach/Squamous cell papilloma/carcinoma     87-100%    96%      93-100%    98%
Multisystemic/Malignant lymphomas                 53-87%     76%      53-100%    76%

Adapted from Table 8 in Takaoka (2003)

Power for Comparing Tumor Incidence Between Positive Control and Vehicle Control Group

                      Background Incidence = 5%    Background Incidence = 10%
Tumor Incidence in    Number in PC Group           Number in PC Group
Positive Control      n=15      n=25               n=15      n=25
Group
50%                   94.9%     89.5%              83.7%     74.8%
60%                   97.0%     99.1%              91.1%     95.3%
70%                   >99.9%    99.6%              99.5%     98.0%
80%                   99.9%     >99.9%             99.8%     >99.9%
90%                   >99.9%    >99.9%             >99.9%    >99.9%

Calculations assume that tumor incidence is compared between the two groups using a Fisher Exact test at the 5% significance level. Power was computed via simulation (5000 runs per simulation).

Possible Design for 6-Month P53+/- or TgrasH2 Study

                                  Number of Animals
Group                             Males    Females
Control Group                     25       25
Low Dosage                        25       25
Mid Dosage (1)                    25       25
High Dosage                       25       25
Positive Control Group (2, 3)     15-20    15-20

(1) Do we need three dosage groups?
(2) After demonstrating model assay validity, do we need the positive control group?
(3) 20 animals if tumor incidence in target organs is 50-60%. 15 animals if tumor incidence in target organs is 70%.

Statistical Methods?
• Peto’s test is commonly used for the statistical analysis of tumor data for 2-year carcinogenicity studies
• Eleven of 13 respondents were familiar with the procedures detailed in the draft FDA guidance document, “Statistical Aspects of Design, Analysis, and Interpretation of Animal Carcinogenicity Studies”.
• Twelve companies stated that they are using Peto-type tests for the analysis of tumor data.

Options for Statistical Methodology for P53 and TgrasH2 Studies
• Cochran-Armitage trend test and Fisher’s exact test
  – Exclude animals that die with short survival times.
  – Definition of sufficient survival based on time of tumor observation in sponsor’s historical data and literature
• Peto methods
• Poly-k methods

Cochran-Armitage and Fisher Exact Tests
Advantages
• Simple, well-known tests
• Exact tests available
• Easy to block or stratify for other covariates
• Appropriate if there are few fatal tumors and intercurrent mortality is similar among groups
Disadvantages
• Requires specification of survival time for excluding animals
• Does not account for time of tumor onset or cause of death

Peto Methods
Advantages
• FDA may use Peto’s method
• Accounts for time of tumor onset and cause of death
• Software available
• Exact tests available
• Scientists familiar with methods
Disadvantages
• Requires specification of incidental intervals
• Specification of incidental intervals is complicated due to small number of deaths in vehicle control groups
• Complexity makes stratification/blocking more difficult

Poly-k Methods
Advantages
• Adjusts for mortality
• Does not require cause of death determination
• Do not have to specify time intervals
• Easy to block or stratify for the two studies
• Fairly simple method
Disadvantages
• Not much experience with 6-month studies
• Biologists not familiar with method
• Application of exact tests for poly-k method is a research topic

Statistical Methods?
Incidence of mortality, neoplasms/select non-neoplasms will be compared among dosage groups using the Cochran-Armitage trend test and Fisher's exact test between each dosage group and the vehicle-control group. If excessive intercurrent mortality is observed then the trend and pairwise tests of tumor data will be conducted using Peto's method.

What constitutes excessive mortality?
Number of early deaths: > 5? > 10?
Employ poly-k method?

Questions During In Life
• Mortality and/or differential intercurrent mortality raises statistical questions during conduct of 2-year studies
  – Should the high dose be lowered?
  – Should one or more groups be terminated early?
  – Should the study be terminated early?
• Ten of 13 companies indicated that at least one dose group was terminated early or the top dose lowered for at least one study in the past five years.

Mortality Guidelines for 2-year Studies
• 20-30 animals per group should be alive during weeks 80-90
  – FDA Draft Guidance (May 2001)
• High-dose group could be terminated early when the survival of the group is reduced to 10-12 animals
  – Fairweather et al (1998). Drug Information Journal
• A study could be terminated if survival of the control group goes below 20-30 after weeks 80-90
  – FDA Draft Guidance (May 2001)

Mortality Issues for Short-term Studies
• Survival is usually very high in short-term studies
• However, what do we do if it isn’t?
• What are the criteria for evaluating if a study is acceptable, terminating a study, or terminating a dosage group?
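One way to explore what reduced survival costs in sensitivity is to simulate the trend test with fewer animals in the high-dose group. The following is a minimal sketch under stated assumptions: it uses the asymptotic (normal approximation) one-sided Cochran-Armitage test with equally spaced dose scores 0, 1, 2, 3, and it assumes tumor incidence rises linearly from control to high dose. The function names and the chosen rates are hypothetical and are not taken from the slides.

```python
import random
from statistics import NormalDist


def cochran_armitage_p(tumors, sizes, scores=None):
    """One-sided (increasing trend) asymptotic Cochran-Armitage p-value."""
    if scores is None:
        scores = list(range(len(sizes)))  # assumption: equally spaced scores
    total_n = sum(sizes)
    p_bar = sum(tumors) / total_n
    if p_bar in (0.0, 1.0):
        return 1.0  # no tumors (or all tumors): nothing to test
    num = sum(d * (x - n * p_bar) for d, x, n in zip(scores, tumors, sizes))
    s1 = sum(n * d for d, n in zip(scores, sizes))
    s2 = sum(n * d * d for d, n in zip(scores, sizes))
    var = p_bar * (1 - p_bar) * (s2 - s1 * s1 / total_n)
    return 1.0 - NormalDist().cdf(num / var ** 0.5)


def simulate_power(sizes, rates, alpha=0.05, runs=1000, seed=0):
    """Fraction of simulated studies in which the trend test rejects."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        tumors = [sum(rng.random() < p for _ in range(n))
                  for n, p in zip(sizes, rates)]
        if cochran_armitage_p(tumors, sizes) <= alpha:
            hits += 1
    return hits / runs


# Assumed scenario: 3% background, 25-point increase at the high dose,
# low and mid doses interpolated linearly between control and high dose.
background, increase = 0.03, 0.25
rates = [background + increase * step / 3 for step in range(4)]
for n_high in (25, 15, 10):
    sizes = [25, 24, 22, n_high]
    print(n_high, simulate_power(sizes, rates))
```

With low background rates the normal approximation can be inaccurate, so an exact or permutation version of the test would be the more careful choice; the approximation is used here only to keep the sketch short.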
Mortality Issues for Short-term Studies
• We (the scientific community) do not currently know how many animals are needed at the end of a 26-week carcinogenicity study
• We also do not know how many weeks represent sufficient exposure
• We do know that the more animals per group, the more sensitive the statistical tests will be for detecting compound-related tumor increases of a specified magnitude

Power for Reduced Survival
Background rate: 0.1%, 3%
Tumor rate increase at high dose: 15%, 20%, 25%, 30%, 35%
Sample size at high dose: 15, 10
Power:
55-67%   72-84%   85-92%   93-96%   96-99%
44-47%   62-69%   75-79%   85-88%   90-96%
44-48%   59-66%   70-79%   83-90%   89-94%
32-40%   44-54%   56-67%   66-79%   77-89%

Description of Power Calculations
• Simulations were conducted to estimate the probability of detecting differences of 15-35% in tumor rates between the treated groups and control group
  – Power calculations assume that tumor incidence is compared among 4 dosage groups using a one-sided Cochran-Armitage trend test conducted at the 5% significance level
  – Background tumor incidence ranged from 0.1% to 3%
  – Tumor incidence in L and M dosage groups ranged from background rates to 2/3 of that in H dosage group
  – Power was computed via simulation (1000 runs per simulation)
  – Calculations were performed for two sets of sample sizes: 25, 24, 22, and 15 in the C, L, M, and H dosage groups, and 25, 24, 22, and 10 in the C, L, M, and H dosage groups

Some Thoughts On Mortality Guidelines for Short-term Studies
• xx-yy animals per group should be alive during weeks ww-zz
  – xx-yy = 15-20?
  – ww-zz likely species dependent
• High-dose group could be terminated early when the survival of the group is reduced to 10-15 (?) animals before weeks ww-zz.
• A study could be terminated if survival of the control group goes below 20 (assuming n = 25) before weeks ww-zz

Are Any Findings Statistically Significant?
• What is Considered Statistically Significant?
• Different approaches are utilized to adjust for the multiple statistical tests performed in 2-year carcinogenicity studies.
• Six of 13 companies employ the decision rule in FDA’s draft guidance document of 0.025 for rare tumors and 0.005 for common tumors.

What significance levels are used for the evaluation of rare/common tumors?

Companies   Rare/Common
4           0.05/0.05 with no adjustments for multiple tumors
1           0.05/0.05 with an adjustment for multiple tumors
2           0.05/0.01, i.e., Haseman Rule
6           0.025/0.005, i.e., FDA Decision Rule

Decision Rule in FDA’s Draft Guidance
Significance levels for making statistical decisions to accommodate the multiple tests

                                        Tests for Positive Trend    Control-High Pairwise Comparisons
Standard 2-Year Studies in              Common tumors = 0.005       Common tumors = 0.01
Rat & Mouse                             Rare tumors = 0.025         Rare tumors = 0.05
Alternative ICH Studies (e.g.,          Common tumors = 0.01        Under development and not yet
2-year rat study + 6-month              Rare tumors = 0.05          available.
mouse study)

Adapted from US FDA (May 2001)

What is Considered Statistically Significant?
• Is a multiplicity adjustment needed for short-term studies?
• No
  – Only a handful of tumor types observed in a study
  – Probability of a false positive is low due to low spontaneous rate

Final Comments
• Alternative mouse models provide additional flexibility in drug development
• While 25 animals per sex/group is reasonable for the control and treated transgenic groups, smaller sample sizes make sense for the positive control group
• Simple statistical methods work well when survival is high
• More research and/or guidance is needed on defining adequate survival

Some References
• CPMP Safety Working Party. CHMP SWP conclusions and recommendations on the use of genetically modified animal models for carcinogenicity assessment. London, 23 June 2004.
• Haseman JK, Hajian G, Crump KS, Selwyn MR, and Peace KE. Dual controls in rodent carcinogenicity studies.
In: Statistical issues in drug research and development, Ed by KE Peace. Marcel Dekker, New York. 1990.
• Lin K. Statistical Issues in Review of Carcinogenicity Studies of Pharmaceuticals. Drug Information Association 40th Annual Meeting, June 16, 2004, Washington, DC.
• MacDonald J, et al. The utility of genetically modified mouse assays for identifying human carcinogens: a basic understanding and path forward. Toxicol Sci. 2004:188-94.
• Menton R and Perry R. Statistical Methods for 2-Year Rodent Carcinogenicity Studies. Midwest Biopharmaceutical Workshop, Muncie, IN, 2003.
• Morton D. The Tg rasH2 Mouse in Cancer Hazard Identification. Toxicol Pathol. 2002:139-146.
• NTP web pages on Historical Controls for P53 Mice. http://ntp.niehs.nih.gov/ Study Results & Research Projects >> Study Data Searches >> Historical Controls >> NTP Historical Control for Genetically-Modified Models
• Storer R, et al. p53+/- Hemizygous Knockout Mouse: Overview of Available Data. Toxicol Pathol. 2001, 29 Suppl:30-50.
• Takaoka M, et al. Interlaboratory comparison of short-term carcinogenicity studies using CB6F1-rasH2 transgenic mice. Toxicol Pathol. 2003:191-9.
• US Food and Drug Administration. Statistical Aspects of Design, Analysis, and Interpretation of Animal Carcinogenicity Studies. Draft Guidance for Industry, May 2001.
• Usui T, et al. CB6F1-rasH2 mouse: Overview of Available Data. Toxicol Pathol. 2001, 29 Suppl:90-108.