Stratified Analysis of a Binary Endpoint and “Beyond”

Christy Chuang-Stein
Statistical Research and Consulting Center, Pfizer Inc
ASA Biopharm Section Webinar, May 7 2009

Related Webinars Offered Previously

• October 21, 2008: Devan Mehrotra, Stratified Analyses: Tips for Improving Power (http://www.biopharmnet.com/doc/2008_10_21_webinar.pdf)
• April 3, 2009: Frank Harrell, Case Study in Parametric Survival Modeling. First 16 slides or so on “Covariable Adjustment in Randomized Clinical Trials” (http://www.biopharmnet.com/doc/2009_04_03_webinar.pdf)

Outline of This Webinar

• Stratified Analysis of a Binary Endpoint
  - Inverse vs CMH Weighting
  - Simpson’s Paradox and Collapsibility
• Beyond
  - Stratified Randomization vs Stratified Analysis
  - Stratification and Subgroup Analysis
  - Sample Sizing for a Multi-regional Trial
  - Regulatory Guidances on Global Trials, Data Extrapolation
• Conclusion

A Sepsis Study

• A confirmatory, double-blind, placebo-controlled trial in severe sepsis; IV treatment with 96 hours duration; randomization stratified by center.
• Primary analysis was of the 28-day mortality rate after treatment onset, stratified by 3 pre-specified covariates: APACHE II score, age and protein C activity.
• Trial was terminated by an independent DSMB for efficacy after the 2nd interim analysis of 1520 patients.
• Many subgroup analyses were conducted, including APACHE II subgroups (4, defined by the observed quartiles), subgroups defined by the components of the APACHE II score, and subgroups defined by 1, 2, 3, or at least 4 organ dysfunctions.

Notations for 28 Day Mortality Rate

| Treatment | APACHE II score stratum: 3-19 (1Q) | 20-24 (2Q) | 25-29 (3Q) | 30-53 (4Q) |
|---|---|---|---|---|
| New Trt | p11 | p12 | p13 | p14 |
| Placebo | p21 | p22 | p23 | p24 |
| Risk Difference | d1 | d2 | d3 | d4 |

When Dealing with Binary Outcome

Three measures are commonly used to assess efficacy within the j-th APACHE II stratum:
• Risk difference $d_j = p_{1j} - p_{2j}$
• Relative risk $r_j = p_{1j} / p_{2j}$
• Odds ratio $o_j = \{p_{1j}(1 - p_{2j})\} / \{(1 - p_{1j})\,p_{2j}\}$

Denote the observed rate by $\hat p_{ij} = n_{ij1} / n_{ij+}$.
We will focus on the risk difference. In each stratum, estimate $p_{1j} - p_{2j}$ by $\hat d_j = \hat p_{1j} - \hat p_{2j}$. We will then form an overall treatment effect estimate and construct a test statistic.

Test an Overall Treatment Effect

A common approach is to form a weighted average and construct a test statistic for the overall effect as

$$\hat d = \frac{\sum_j w_j \hat d_j}{\sum_j w_j}, \qquad X^2 = \frac{\hat d^{\,2}}{\widehat{\mathrm{var}}(\hat d)}$$

$X^2$ has an asymptotic chi-square distribution with 1 degree of freedom if $\sum_j w_j d_j = 0$.

Choice of Weights: Method 1

Inverse variance: $w_j$ is equal to the inverse of the sample variance of $\hat d_j$. In this case, $X^2$ will be

$$X^2 = \frac{\left(\sum_j w_j \hat d_j\right)^2}{\sum_j w_j}$$

When $d_j = d$ (the risk difference is uniform across the strata), the inverse variance weighting produces the minimum variance estimate for the common risk difference $d$, which is unbiased for large samples.
This method is favored by meta analysts.
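The weighted-average estimate and the inverse variance weights can be sketched in a few lines of Python. This is an illustration only, not the author's code: the function names are made up for this sketch, the binomial variance estimate p(1-p)/n is assumed for each arm, and the inputs are the rounded 28-day mortality rates and group sizes that the deck reports for the sepsis trial, so results can only be approximate.

```python
from math import sqrt

def risk_difference_strata(p1, n1, p2, n2):
    """Per-stratum risk differences d_j and their estimated variances."""
    d = [a - b for a, b in zip(p1, p2)]
    var = [a * (1 - a) / m + b * (1 - b) / k
           for a, m, b, k in zip(p1, n1, p2, n2)]
    return d, var

def weighted_estimate(d, var, weights):
    """Weighted average of the d_j, its standard error, and X^2 = d_hat^2 / var(d_hat)."""
    total = sum(weights)
    d_hat = sum(w * dj for w, dj in zip(weights, d)) / total
    se = sqrt(sum(w * w * v for w, v in zip(weights, var))) / total
    return d_hat, se, (d_hat / se) ** 2

# Rounded observed rates and group sizes by APACHE II quartile (sepsis trial)
p_new, n_new = [0.15, 0.22, 0.24, 0.38], [218, 218, 204, 210]
p_pbo, n_pbo = [0.12, 0.26, 0.36, 0.49], [215, 222, 162, 241]

d, var = risk_difference_strata(p_new, n_new, p_pbo, n_pbo)
iv_weights = [1 / v for v in var]            # Method 1: inverse variance
d_hat, se, x2 = weighted_estimate(d, var, iv_weights)
ci = (d_hat - 1.96 * se, d_hat + 1.96 * se)  # normal-approximation 95% CI
print(f"estimate {d_hat:.3f}, 95% CI ({ci[0]:.3f}, {ci[1]:.3f}), X2 {x2:.2f}")
```

With these rounded inputs the interval should land close to the (-8.1%, -0.1%) inverse variance interval quoted for the sepsis trial. Passing a different weight list to `weighted_estimate` gives the corresponding estimate for any other weighting scheme.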

Choice of Weights: Method 2

CMH method: $w_j$ is proportional to the harmonic mean of $n_{1j+}$ and $n_{2j+}$, that is, $w_j = (1/n_{1j+} + 1/n_{2j+})^{-1} = n_{1j+} n_{2j+} / n_j$ with $n_j = n_{1j+} + n_{2j+}$.
This method produces the $X^2$ test by Cochran, which is asymptotically equivalent to a test developed by Mantel and Haenszel. Continuity correction could be applied.

$$X^2_{CMH} = \frac{\left[\sum_j \frac{n_{1j+} n_{2j+}}{n_j}\,\hat d_j\right]^2}{\sum_j \frac{n_{1j+} n_{2j+}}{n_j}\,\bar p_j (1 - \bar p_j)}, \qquad \sum_j \frac{n_{1j+} n_{2j+}}{n_j}\,\hat d_j = \sum_j \left(\frac{1}{n_{1j+}} + \frac{1}{n_{2j+}}\right)^{-1} \hat d_j$$

where $\bar p_j$ denotes the observed event rate in stratum $j$ with the two treatment groups pooled.

CMH Method

Let $f_j$ represent the relative frequency of patients in the j-th stratum in the population.
When the study population mimics the target population, the CMH estimate is approximately unbiased for $\sum_j f_j d_j$.
This makes CMH weighting attractive when one is not sure if the treatment effect is the same across the strata.

Assumptions on True Mortality Rates

| Treatment | Disease severity score: 1Q | 2Q | 3Q | 4Q |
|---|---|---|---|---|
| New Trt | 12% | 21% | 27% | 36% |
| Placebo | 12% | 24% | 36% | 48% |
| True Risk Difference | 0% | -3% | -9% | -12% |

When the mortality rate is low, there is not much room to improve. Most of the benefit is in the high-risk population.

Impact of Weighting

• Weighting by the relative frequency of a stratum within the population leads to an overall treatment effect $\sum_j f_j d_j$ with magnitude 0.25*(0) + 0.25*(3%) + 0.25*(9%) + 0.25*(12%) = 6%.
• Assume equal allocation within each stratum. The overall treatment effect estimate under the CMH weighting will approach 6% for large samples.
• If we use the inverse variance weighting, we will weigh treatment effects in the 1Q, 2Q, 3Q and 4Q strata by 2.23 : 1.38 : 1.20 : 1.00. The effect estimate will approach 4.5% for large samples.
• The inverse variance weighting will underestimate the parameter of interest, $\sum_j f_j d_j$, in this case.

Results for 28 Day Mortality Rate

| Treatment | APACHE II score stratum: 3-19 (1Q) | 20-24 (2Q) | 25-29 (3Q) | 30-53 (4Q) |
|---|---|---|---|---|
| New Trt | 15% (N=218) | 22% (N=218) | 24% (N=204) | 38% (N=210) |
| Placebo | 12% (N=215) | 26% (N=222) | 36% (N=162) | 49% (N=241) |
| Risk Difference | 3% | -4% | -12% | -11% |

Findings from the Sepsis Trial

• The CMH test statistic has a value of 7.310 with 1 degree of freedom (no continuity correction). The two-sided P-value is 0.0068. The CMH test statistic computes the variance assuming $p_{1j} = p_{2j}$ for all j.
• A 95% CI for the overall difference in the mortality rate (new treatment - placebo) under the CMH weighting is (-9.8%, -1.6%). The calculation of variance in this case does not assume $p_{1j} = p_{2j}$.
• The inverse variance approach produces a 95% CI for the difference in the mortality rate (new treatment - placebo) of (-8.1%, -0.1%).

Comparing across Strata

• The differences in the mortality rates (new treatment - placebo) in the 4 APACHE II strata range from 3% to -12%.
• The graph suggests a possible interaction that might be qualitative in nature.
• We will look at an approach proposed by Gail and Simon (1985, Biometrics, 41:361-372) to test for qualitative interaction.

[Figure: risk difference (new treatment - placebo) by APACHE II stratum, 1Q to 4Q; vertical axis from -0.15 to 0.05.]

Dmitrienko et al (2005).
Analysis of Clinical Trials Using SAS.

Test for Qualitative Interaction

Let $O^+ = \{d : d_j \ge 0 \text{ for all } j\}$, the set of non-negative differences.
Let $O^- = \{d : d_j \le 0 \text{ for all } j\}$, the set of non-positive differences.

$$Q^+ = \sum_{j=1}^{J} \frac{\hat d_j^{\,2}}{s_j^2}\, I(\hat d_j > 0), \qquad Q^- = \sum_{j=1}^{J} \frac{\hat d_j^{\,2}}{s_j^2}\, I(\hat d_j < 0), \qquad Q = \min(Q^+, Q^-)$$

where $s_j^2$ is the estimated variance of $\hat d_j$.

• Q > c can be used to test the null hypothesis of no qualitative interaction.
• Q follows a fairly complex distribution based on a weighted sum of chi-square distributions.
• SAS code is available in the book by Dmitrienko et al.

Test for Qualitative Interaction

• Q+ can be used to test the null hypothesis of all differences being negative. Q- can be used to test the null hypothesis of all differences being positive.
• For the sepsis study, the two-sided Gail-Simon test has a P-value of 0.4822. The one-sided P-value for H0 of positive differences (new treatment - placebo) is 0.0030. The one-sided P-value for H0 of negative differences is 0.6005.
• Like other interaction tests, the G-S test requires strong evidence before we can reject the hypothesis of no qualitative interaction.

In the End…

Data from this single study led to the approval of Xigris®.

Xigris® INDICATIONS AND USAGE
Xigris is indicated for the reduction of mortality in adult patients with severe sepsis (sepsis associated with acute organ dysfunction) who have a high risk of death (e.g., as determined by APACHE II). Safety and efficacy have not been established in adult patients with severe sepsis and lower risk of death.
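To make the Gail-Simon calculation for the sepsis strata concrete, here is a rough Python sketch. It is not the SAS code from Dmitrienko et al. The P-value formulas are the binomial-weighted chi-square mixtures from Gail and Simon (1985) as best recalled, the function names are hypothetical, and the inputs are the rounded rates from the results table, so the output only approximates the reported P-values.

```python
from math import comb, erfc, exp, gamma, sqrt

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    # Start from the closed form for df = 1 or 2, then step up by 2 using
    # SF(df + 2) = SF(df) + (x/2)^(df/2) * exp(-x/2) / Gamma(df/2 + 1).
    k = 1 if df % 2 else 2
    sf = erfc(sqrt(x / 2)) if k == 1 else exp(-x / 2)
    while k < df:
        sf += (x / 2) ** (k / 2) * exp(-x / 2) / gamma(k / 2 + 1)
        k += 2
    return sf

def gail_simon(d, var):
    """Return (two-sided p, p for H0: all d_j <= 0, p for H0: all d_j >= 0)."""
    J = len(d)
    z2 = [dj * dj / v for dj, v in zip(d, var)]
    sum_over_positive = sum(z for z, dj in zip(z2, d) if dj > 0)
    sum_over_negative = sum(z for z, dj in zip(z2, d) if dj < 0)

    def one_sided(q):
        return sum(comb(J, k) * 0.5 ** J * chi2_sf(q, k) for k in range(1, J + 1))

    q = min(sum_over_positive, sum_over_negative)
    two_sided = sum(comb(J - 1, k) * 0.5 ** (J - 1) * chi2_sf(q, k)
                    for k in range(1, J))
    # Positive d_j are evidence against "all negative", and vice versa.
    return two_sided, one_sided(sum_over_positive), one_sided(sum_over_negative)

# Rounded observed rates and group sizes by APACHE II quartile (sepsis trial)
p_new, n_new = [0.15, 0.22, 0.24, 0.38], [218, 218, 204, 210]
p_pbo, n_pbo = [0.12, 0.26, 0.36, 0.49], [215, 222, 162, 241]
d = [a - b for a, b in zip(p_new, p_pbo)]
var = [a * (1 - a) / m + b * (1 - b) / k
       for a, m, b, k in zip(p_new, n_new, p_pbo, n_pbo)]

p_two_sided, p_h0_negative, p_h0_positive = gail_simon(d, var)
print(f"two-sided {p_two_sided:.3f}, H0 negative {p_h0_negative:.3f}, "
      f"H0 positive {p_h0_positive:.4f}")
```

The three values should fall near the 0.4822, 0.6005 and 0.0030 quoted for the sepsis study. Small discrepancies are expected because the rates are rounded to whole percentages.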

Table in the Package Insert

| APACHE II quartile (score) | Xigris: Total | Xigris: Mortality rate | Placebo: Total | Placebo: Mortality rate |
|---|---|---|---|---|
| 1st + 2nd (3-24) | 436 | 18.8% | 437 | 19.0% |
| 3rd + 4th (25-53) | 414 | 30.9% | 403 | 43.7% |

• Patients who have a high risk of death are represented by an APACHE II score in the 3rd and 4th APACHE II score categories.
• Treatment effects need to differ by more than what is shown in this case for the Gail-Simon test to conclude an interaction.

Questions

• Could one have anticipated this extent of treatment difference before the trial?
• If yes, what would have been a good design and analysis strategy?
• Options
  - Specify the high risk population as the primary analysis population and enroll adequate patients in this group.
  - Test both the high risk population and the entire population with adjustment for multiplicity.
• Analysis follows the design strategy.

The LIFE Study

• Losartan Intervention For Endpoint Reduction in Hypertension Study.
• Conducted at 945 sites in 7 countries. Enrolled 9193 hypertensive patients with left ventricular hypertrophy (LVH).
• The primary endpoint is a composite endpoint of cardiovascular death, stroke, and myocardial infarction.
• Results were reviewed by the FDA Cardiovascular and Renal Drugs AC on Jan 6 2003 for a new proposed indication:
  Cozaar is indicated to reduce the risk of cardiovascular morbidity and mortality as measured by the combined incidence of cardiovascular death, stroke, and myocardial infarction in hypertensive patients with left ventricular hypertrophy.

Some Background

• Losartan’s then label states that the effect on blood pressure reduction in blacks was somewhat less than that in whites (a common statement for beta-blockers).
• The FDA statistician quoted data from three endpoint studies of other drugs. These studies demonstrated less or no treatment effect in blacks when compared to whites.
• On the primary endpoint, when compared to atenolol, losartan had a hazard ratio of 0.869 (95% CI from 0.772 to 0.979) with a P-value of 0.021. The effect came primarily from the stroke component of the composite.
• The issue of how losartan compared to atenolol in blacks came up.

Hazard Ratio and 95% CIs - Primary Endpoint

[Figure: forest plot of hazard ratios and 95% CIs for the primary endpoint. Rows: Overall, United States, Male, Female, Black, White, Age <65, Age 65 or over. Axis label (partially legible): Favors Losartan.]

Gail-Simon Test

• Nominal p-value for Black vs. Non-Black qualitative interaction = 0.016.
• Impossible to correctly adjust this p-value for multiple comparisons post hoc.
• 3 subgroups were pre-specified for special importance (U.S. region, Diabetics, ISH).
• To do it correctly, the formal analysis plan would need to list all important subgroups and specify a method to correctly adjust for the number of tests.

Source: John Lawrence’s (FDA Statistical Reviewer) slides at the January 6 2003 FDA AC meeting.
For more discussion, see http://www.fda.gov/ohrms/dockets/ac/03/slides/3920s1.htm

COZAAR® Package Insert

Indications and Usage
… COZAAR is indicated to reduce the risk of stroke in patients with hypertension and left ventricular hypertrophy, but there is evidence that this benefit does not apply to Black patients. …

Clinical Pharmacology
In the LIFE study, Black patients treated with atenolol were at lower risk of experiencing the primary composite endpoint compared with Black patients treated with COZAAR…. This finding could not be explained on the basis of differences in the populations other than race or on any imbalances between treatment groups… the LIFE study provides no evidence that the benefits of COZAAR on reducing the risk of cardiovascular events in hypertensive patients with left ventricular hypertrophy apply to Black patients.

Observations

• In the case of Xigris, subgroups defined by APACHE II score were pre-specified. Statistical significance was not achieved by the Gail-Simon test at the 5% level.
• In the case of COZAAR, race subgroups were not pre-specified. They are, however, among the “usual” demographic subgroups and there is an a priori reason for looking at this subgroup. A post hoc Gail-Simon test produced a P-value less than 0.05.
• The end results (language in the product package insert) are similar: the label describes differential treatment effects in the subgroups.

Clinical Summary of Safety

| Study | Drug A | Drug B |
|---|---|---|
| 1 | 8% | 4% |
| 2 | 7% | 6% |
| 3 | 1% | 1% |
| 4 | 1% | 2% |
| 5 | 21% | 20% |
| 6 | 8% | 10% |
| Total / Avg | 13% (1000 pts) | 9.5% (750 pts) |

13% vs 9.5%: a two-sided P-value of 0.023.

Clinical Summary of Safety

| Study | Drug A | # of Pts | Drug B | # of Pts |
|---|---|---|---|---|
| 1 | 8% | 100 | 4% | 100 |
| 2 | 7% | 100 | 6% | 100 |
| 3 | 1% | 100 | 1% | 100 |
| 4 | 1% | 100 | 2% | 100 |
| 5 | 21% | 500 | 20% | 250 |
| 6 | 8% | 100 | 10% | 100 |
| Total / Avg | 13% | 1000 | 9.5% | 750 |

• The 95% CI for the difference (A - B) using inverse variance weighting is (-0.017, 0.018), with a point estimate of 0.001. What happened?
• The study with the highest AE rates had twice as many subjects on Drug A as on Drug B.

Simpson’s Paradox

| Treatment | Study 1: Event | Study 1: No Event | Study 2: Event | Study 2: No Event |
|---|---|---|---|---|
| New | 180 (60%) | 120 (40%) | 60 (30%) | 140 (70%) |
| Control | 60 (60%) | 40 (40%) | 60 (30%) | 140 (70%) |
| Total | New: 300, Control: 100 | | New: 200, Control: 200 | |

• Within each study, the two groups have the same event rates.
• Study 1 randomized patients 1:1:1:1 to 3 doses and 1 control.
• Study 2 randomized patients 1:1 to one dose and control.
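The contrast between pooling and stratifying in the two-study table can be checked with a short Python sketch. The function names are made up for this illustration, the pooled comparison is assumed to be a Pearson chi-square test without continuity correction, and the stratified estimate uses the CMH weights introduced earlier in the deck.

```python
from math import erfc, sqrt

# (events, patients) per arm, taken from the two-study table
studies = [
    {"new": (180, 300), "control": (60, 100)},   # Study 1: three dose arms vs one control arm
    {"new": (60, 200), "control": (60, 200)},    # Study 2: one dose arm vs control
]

def pooled_analysis(studies):
    """Crude event rates and chi-square P-value after collapsing over studies."""
    e1 = sum(s["new"][0] for s in studies)
    n1 = sum(s["new"][1] for s in studies)
    e2 = sum(s["control"][0] for s in studies)
    n2 = sum(s["control"][1] for s in studies)
    p = (e1 + e2) / (n1 + n2)
    chi2 = (e1 / n1 - e2 / n2) ** 2 / (p * (1 - p) * (1 / n1 + 1 / n2))
    return e1 / n1, e2 / n2, erfc(sqrt(chi2 / 2))   # chi-square(1) upper tail

def stratified_risk_difference(studies):
    """Risk difference averaged over studies with CMH weights n1*n2/(n1+n2)."""
    num = den = 0.0
    for s in studies:
        (e1, n1), (e2, n2) = s["new"], s["control"]
        w = n1 * n2 / (n1 + n2)
        num += w * (e1 / n1 - e2 / n2)
        den += w
    return num / den

rate_new, rate_control, p_value = pooled_analysis(studies)
print(f"pooled: new {rate_new:.0%}, control {rate_control:.0%}, P = {p_value:.3f}")
print(f"stratified risk difference: {stratified_risk_difference(studies):.3f}")
```

Pooling mixes the 3:1 study, which has the higher event rate, with the 1:1 study, so the new treatment looks worse even though the within-study difference is zero in both studies. The stratified estimate stays at zero.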

Results Pooled over Studies

| Treatment | Event | No Event | Combined |
|---|---|---|---|
| New | 240 (48%) | 260 (52%) | 500 |
| Control | 120 (40%) | 180 (60%) | 300 |

• Pooling produces an event rate of 48% for the new treatment and 40% for the control. The chi-square statistic has a two-sided P-value of 0.028.
• Conducting an un-stratified (un-adjusted) analysis in this case will lead to an erroneous conclusion.

Collapsibility

• In this example, the risk difference is not collapsible over the studies (i.e., we can’t ignore “study”).
• Randomization (treatment assignment) is not independent of study in the two-way marginal table of treatment by study.

| | Study 1 | Study 2 | Combined |
|---|---|---|---|
| New Treatment | 300 | 200 | 500 |
| Control | 100 | 200 | 300 |
| Total | 400 | 400 | |

Collapsibility

• When both the randomization ratio and the risk difference are the same across studies, the risk difference is collapsible over studies.
• In this case, the proportion of events for each treatment is a weighted average of the proportions in individual studies, with weights proportional to the study sizes.

| | Study 1 | Study 2 | Combined |
|---|---|---|---|
| New Treatment | 60% | 30% | 40% |
| Control | 60% | 30% | 40% |
| Total (randomization ratio) | 400 (3:1) | 800 (3:1) | |

In General

If the two treatments have the same effect in all studies (null hypothesis) and in addition, the randomization ratio is the same, then risk difference, risk ratio, and odds ratio are all collapsible across studies.
In the above case, the risk difference is 0 and the relative risk and odds ratio are 1.
Otherwise, collapsibility depends on the chosen measure of association (risk difference, risk ratio, odds ratio): Greenland, 1998, Encyclopedia of Biostatistics.

Collapsibility Depends on Measure

1:1 randomization, equal risk difference in two studies

| | Study 1 | Study 2 | Combined |
|---|---|---|---|
| New Trt | 80% (N = 100) | 40% (N = 100) | 60% |
| Control | 60% (N = 100) | 20% (N = 100) | 40% |
| Risk Diff | 0.20 | 0.20 | 0.20 |
| Risk Ratio | 1.33 | 2.00 | 1.50 |
| Odds Ratio | 2.67 | 2.67 | 2.25 |

For example, the combined odds ratio is (0.60/0.40) / (0.40/0.60) = 2.25, although the odds ratio is 2.67 in each study.

Observations

• Meta-analysis procedures are frequently used to combine efficacy results.
• One should use meta-analysis (stratified analysis) when summarizing safety data from different studies, especially when studies have different patient populations and/or different randomization ratios.
• If there is no a priori information suggesting different risk differences for different studies, inverse variance weighting would be a good choice.
• One should always consider stratified analysis when covariates are highly correlated with the response.

Stratified (Adjusted) Analysis

• Factor defining strata is prognostic of response.
  - Allowing comparison within more homogeneous groups.
• Factor defining strata is predictive of treatment effect.
  - Issue of interaction
  - Evaluating treatment effect within subgroups
  - Overall treatment effect might be less meaningful if the interaction between treatment and factor is substantial

Stratified Randomization vs Analysis

• If we employ stratified randomization, the convention is to include the stratifying factor in the analysis (CPMP/EWP/2863/99 on adjustment for baseline covariates).
• When there are >=50 patients in each treatment group, Grizzle found that there was little advantage to using stratified randomization with two strata when the strata are roughly equally represented (Grizzle, Controlled Clinical Trials, 1982).
• The incremental benefit of stratified randomization beyond that due to the stratified analysis is minimal (Permutt, DIJ 2007).

Stratified Randomization vs Analysis

• The above is due to the fact that, for a reasonable sample size, the chance that the randomization will produce the type of imbalance that will substantially affect the inference is low.
• If a stratum is small, stratified randomization could reduce the chance of imbalance.
• If we are forced to treat the un-stratified analysis as the primary analysis, stratified randomization could generally give us results close to those from an adjusted analysis.
• Stratified allocation is used to ensure adequate (or even greater) representation of a particular type of patient in the study.

Permutt, DIJ 2007

• 100 subjects will be randomized to one of two treatments. There are 50 men and 50 women.
• Gender is a prognostic factor and could be used as a stratifying factor for randomization and/or analysis, resulting in 4 options: stratified randomization and analysis (R&A), stratified randomization only (R Only), stratified analysis only (A Only), Neither.
• Assume the standard deviation is 10, and a treatment effect that will result in 80% power with 25 per group per gender under the R&A option (i.e., Δ = 5.6).
• Assume no treatment by gender interaction, with the gender effect varying between 0 and 20.

Permutt, DIJ 2007

• Under “A Only” (stratified analysis without stratified randomization), the power was calculated for each possible (treatment, gender) allocation combination. The power was then averaged using the probability under the hypergeometric distribution as the weight.
• Under “R Only” (stratified randomization without stratified analysis), the Type I error could be lower than the nominal level (two-sided 5%) because the reduction in the variance of the estimated treatment effect due to stratified randomization is not properly accounted for in the analysis. (See the original paper.)

Numerical Results (Permutt, DIJ 2007)

| Gender effect | R&A | R Only | A Only | Neither |
|---|---|---|---|---|
| 0 | 0.800 | 0.800 | 0.799 | 0.800 |
| 2 | 0.800 | 0.799 | 0.799 | 0.796 |
| 4 | 0.800 | 0.795 | 0.799 | 0.784 |
| 6 | 0.800 | 0.790 | 0.799 | 0.765 |
| 8 | 0.800 | 0.783 | 0.799 | 0.739 |
| 12 | 0.800 | 0.765 | 0.799 | 0.671 |
| 16 | 0.800 | 0.744 | 0.799 | 0.590 |
| 20 | 0.800 | 0.724 | 0.799 | 0.508 |

Stratification & Subgroup Analysis

• How does the treatment perform in patients with mild disease?
• Do patients with mild/moderate disease respond to the treatment similarly to patients with severe disease?
  - This is typically phrased as an interaction between treatment and disease severity at baseline.
• If a heterogeneous effect (interaction) exists, is it qualitative or quantitative?

Subgroup Analysis: Issues

• Multiplicity leading to an inflated false positive rate
• Lack of statistical power leading to an inflated false negative rate
• Treatment groups not comparable because randomization was not done within the subgroups
• Appropriate reporting/interpretation to ensure scientifically defensible and balanced conclusions

We will focus on the first two issues here.

False Positive Multiplicity

• With multiple subgroup analyses, the probability of a false positive finding is substantial.
• With 10 independent tests (α = 0.05), the chance of at least one false positive is > 40% (1 - 0.95^10 ≈ 0.40).

Lagakos (2006) NEJM 354;16

Forest Plot of Treatment Effect

[Figure: forest plot for a hypothetical study, panel title “Typical Result”.]

• Hypothetical study: 4000 patients in 20 countries (200 patients each) with a control arm risk of 20% and an experimental arm risk of 15%.
• Homogeneous absolute risk reduction of 5% in all countries.
Marschner (DIA Annual Meeting)

Simulation Study of Country Differences

• In 10,000 simulations of similar studies, the largest and smallest treatment effects among the 20 countries were calculated.
  - On average, the largest treatment effect among the 20 countries was a 15% absolute risk reduction on the experimental therapy.
  - On average, the smallest treatment effect among the 20 countries was a 5% absolute risk increase on the experimental therapy.
• Purely by chance, the observed experimental treatment effect in different countries can be expected to range from extremely beneficial to apparently harmful.

Marschner (DIA Annual Meeting)

Prob of Neg Result for One Subgroup

Assuming two groups and a continuous endpoint:

Factors increasing the probability
• Substantial imbalance between treatment groups
• Substantial differences in the subgroup size
• A large number of subgroups

Factors decreasing the probability
• Balanced treatments and subgroup size
• A large treatment effect size
• A large sample size

Disjoint Subgroups

• 2-sided α = 0.05
• 1:1 ratio with perfect balance between treatments
• Various scenarios for subgroup size

| # of Subgroups | 80% Power | 90% Power |
|---|---|---|
| 3 | 15 - 35% | 9 - 30% |
| 5 | 40 - 60% | 30 - 35% |

Li, Chuang-Stein, Hoseyni, DIJ (2007), 41:47-56.
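For disjoint subgroups, the chance of seeing at least one negative subgroup estimate can be approximated in closed form. The sketch below is not taken from the cited paper. It assumes a normally distributed endpoint with known variance, a homogeneous true effect, perfect balance between treatments, and a study sized so that the overall z-statistic has mean z(1-α/2) + z(power). The function name and the subgroup fractions are illustrative.

```python
from statistics import NormalDist

def prob_any_negative(fractions, power, alpha_two_sided=0.05):
    """P(at least one disjoint subgroup shows a negative point estimate).

    fractions: share of the total sample in each subgroup (should sum to 1).
    A subgroup holding a fraction f of the patients has a z-statistic with
    mean delta * sqrt(f), where delta is the mean of the overall z-statistic.
    """
    nd = NormalDist()
    delta = nd.inv_cdf(1 - alpha_two_sided / 2) + nd.inv_cdf(power)
    p_all_positive = 1.0
    for f in fractions:
        p_all_positive *= nd.cdf(delta * f ** 0.5)   # subgroups are independent
    return 1 - p_all_positive

for k in (3, 5):
    equal = [1 / k] * k
    print(k, "equal subgroups:",
          f"{prob_any_negative(equal, 0.80):.1%} at 80% power,",
          f"{prob_any_negative(equal, 0.90):.1%} at 90% power")
```

Under these assumptions, equally sized subgroups give values near the lower end of each range in the table above. Unequal fractions such as `[0.1, 0.2, 0.7]` push the probability up, which is consistent with differences in subgroup size being listed as a factor that increases it.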

Overlapping Subgroups (Simulations)

• Each baseline covariate defines 3 subgroups with equal proportions (2 or 5 covariates).
• Probabilities based on simulations (1000 replicates).
• Unconditional on the overall result.

| # of Subgroups | Effect Size = 0.25, 80% Power (253/group) | Effect Size = 0.25, 90% Power (338/group) | Effect Size = 0.50, 80% Power (64/group) | Effect Size = 0.50, 90% Power (86/group) |
|---|---|---|---|---|
| 6 | 24% | 17% | 22% | 15% |
| 15 | 38% | 26% | 43% | 27% |

Overlapping Subgroups (Simulations)

• Each baseline covariate defines 3 subgroups with equal proportions (2 or 5 covariates).
• Probabilities based on simulations (1000 replicates).
• Conditional on a statistically significant overall result.

| # of Subgroups | Effect Size = 0.25, 80% Power (253/group) | Effect Size = 0.25, 90% Power (338/group) | Effect Size = 0.50, 80% Power (64/group) | Effect Size = 0.50, 90% Power (86/group) |
|---|---|---|---|---|
| 6 | 12% | 11% | 13% | 9% |
| 15 | 28% | 21% | 27% | 21% |

MERIT-HF Trial

• The only pivotal trial to assess the efficacy and safety of metoprolol (Toprol XL) as an adjunctive therapy to optimal standard therapy for patients with congestive heart failure.
• There were 3991 patients from several hundred sites in the US and 13 European countries.
• The study has two primary endpoints, total mortality and a composite endpoint.
• 27% of the patients (539 on placebo and 532 on metoprolol) were from the US.

HR & 95% CI - Total Mortality

[Figure: forest plot of hazard ratios and 95% CIs for total mortality. Rows: All, US, Europe, NYHA II, NYHA III, NYHA IV, EF <= 0.25, EF > 0.25, Previous acute MI: Y, Previous acute MI: N, Gender - Male, Gender - Female, Age <= 69.4, Age > 69.4. Horizontal axis from 0.0 to 2.0, labeled “Favors Toprol-XL” and “Favors Placebo”.]

Implication for Designing Global Trials

Desire to control (minimize) the probability of observing a negative treatment effect in at least one region when the treatment effect is positive and uniform across all regions in a multi-regional (global) trial.

Current State

[Slide image. Source: Bob O’Neill, PhRMA/FDA Workshop on Multi-Regional Trials 2007.]

[Slide image. Source: Robert Califf, PhRMA/FDA Workshop 2007.]

MRCT Cross-Functional Key Issues Team

• The Biostatistics and Data Management Group convened a Multi-Regional Clinical Trials (MRCT) Key Issues team after the workshop.
• Bruce Binkowitz (Merck), stat co-chair of the MRCT working group, will present the group’s progress at the Harvard/Schering Plough workshop on May 28-29.
• The theme of the workshop is “Global Trials: Challenges and Opportunities”.

Simultaneous Global Development Committee

• PhRMA also has an SGD Committee.
• Its focus is regulatory, seeking to enable a regulatory framework for allowing global development of therapies that could
  - result in simultaneous global submissions with one single global data set
  - expedite global patient access to these products
• Current focus is China, Korea, Taiwan and Japan.

Asian Region Cooperation

• The 1st China-Japan-Korea Ministerial Meeting on Health was held in Korea in April 2007. The 2nd one took place in Nov 2008 in Beijing.
• They declared in the “Joint Statement of the First Tripartite Health Ministers Meeting (THMM)” to jointly promote cooperation in areas of Clinical Researches, ...
• Cooperation in an investigation on ethnic factors
  - MHLW set up a study group to investigate differences in PK/PD and safety among Asian populations.
  - The 1st report on PK differences is targeted for 2Q2009.
  - The goal: could Asia be regarded as “one population”?

One Approach for Sample Sizing

1. A continuous endpoint that follows a normal distribution. Large values are desirable.
2. Treatment effect within each region is estimated by the difference in the observed means (or observed mean changes from baseline).
3. Effect size (Δ/σ) is uniform across regions.
4. The one-sided significance level for the primary analysis on the overall treatment effect is 2.5%. Power to detect (Δ/σ) is 1-β.
5. For simplicity, we will work with 3 regions with 1:1 allocation to 2 treatments.
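Under assumptions 1 to 5, the overall sample size per group follows from the usual normal-approximation formula N = 2 (z(1-α) + z(1-β))² / (Δ/σ)². The sketch below is only an illustration of that formula. The function names and the regional fractions (0.2, 0.2, 0.6) are assumptions made for the example, and a t-test based calculation would give slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size, power, alpha_one_sided=0.025):
    """Patients per treatment group, two-sample comparison, normal approximation."""
    nd = NormalDist()
    z_sum = nd.inv_cdf(1 - alpha_one_sided) + nd.inv_cdf(power)
    return ceil(2 * (z_sum / effect_size) ** 2)

def regional_sizes(n, fractions):
    """Split a per-group sample size across regions by their fractions."""
    return [round(n * f) for f in fractions]

n80 = n_per_group(0.25, 0.80)    # effect size 0.25, 80% power
n90 = n_per_group(0.25, 0.90)    # effect size 0.25, 90% power
print(n80, n90)
print(regional_sizes(n80, [0.2, 0.2, 0.6]))   # hypothetical regional split
```

The regional split is the quantity the rest of this section is concerned with: the overall N fixes the power of the primary analysis, while the fractions determine how noisy each regional estimate will be.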
Framework (Kawai et al DIJ, 2008)

Sample size per group: N. The number N is determined to provide 80% or 90% power for the primary analysis at the one-sided 2.5% level.

Estimated treatment effect (new treatment - placebo) and proportion of patients by region:

Region                     Estimated effect   Proportion
Region 1 [smallest]        D1                 p1
Region 2 [2nd smallest]    D2                 p2
Region 3 [largest]         D3                 p3

Due to the constraints p1 ≤ p2 ≤ p3 and p1 + p2 + p3 = 1,

$$p_1 \le p_2 \le \frac{1 - p_1}{2}$$

Basis for Deciding Regional Size

[Figure: estimated treatment effects (new treatment - placebo) D1, D2 and D3 for Regions 1, 2 and 3, shown relative to 0.]

We want a high probability (e.g. 80% or 90%) that the point estimates for the treatment effect in all regions are positive.
Pcs = probability that the three regions show consistent results.

Plots of Pcs against p1 with p2 = p1

[Figure: probability of observing all Di > 0 (Pcs, vertical axis) against p1 (horizontal axis, 0.05 to 0.30), with one curve for 90% power and one for 80% power. Annotation: "Pcs never reaches 90%". Marked values: 0.151, 0.213, 0.277.]

Worst case with two small regions and a large one.

But …

In practice, inference concerning regional results (as a secondary analysis) is relevant only if the overall treatment effect in the confirmatory trial is statistically significant.
This calls for looking at Pcs conditional on first concluding a significant overall treatment effect at the one-sided 2.5% level.
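The two versions of Pcs discussed above can be sketched as follows. This is a minimal illustration, not the authors' computation: it assumes independent, normally distributed regional estimates with a known common standard deviation, a per-group N chosen (before rounding) to give exactly the stated power, and an overall test based on the pooled difference in means. The function names are hypothetical. Under these assumptions the regional z-statistic for region j has mean (z_alpha + z_beta) * sqrt(pj), so the unconditional Pcs is a product of normal probabilities; the conditional Pcs is estimated by simulation.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def drift(alpha=0.025, power=0.90):
    """Mean of the overall z-statistic when N gives exactly the stated power."""
    return STD_NORMAL.inv_cdf(1 - alpha) + STD_NORMAL.inv_cdf(power)


def unconditional_pcs(fractions, alpha=0.025, power=0.90):
    """P(all regional point estimates > 0); fractions = (p1, p2, p3), summing to 1."""
    c = drift(alpha, power)
    prob = 1.0
    for p in fractions:
        prob *= STD_NORMAL.cdf(c * p ** 0.5)
    return prob


def conditional_pcs(fractions, alpha=0.025, power=0.90, n_sim=100_000, seed=1):
    """Monte Carlo estimate of P(all regional estimates > 0 | overall test significant)."""
    rng = random.Random(seed)
    c = drift(alpha, power)
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)
    roots = [p ** 0.5 for p in fractions]
    significant = 0
    consistent = 0
    for _ in range(n_sim):
        # Regional z-statistics: independent, unit variance, mean c * sqrt(pj).
        z = [rng.gauss(c * r, 1.0) for r in roots]
        # The pooled z-statistic is the sqrt(pj)-weighted sum of regional z-statistics.
        overall = sum(r * zj for r, zj in zip(roots, z))
        if overall > z_crit:
            significant += 1
            if min(z) > 0:
                consistent += 1
    return consistent / significant


if __name__ == "__main__":
    split = (0.05, 0.05, 0.90)
    print(round(unconditional_pcs(split, power=0.90), 3))
    print(round(unconditional_pcs(split, power=0.80), 3))
    print(round(conditional_pcs(split, power=0.90), 3))
```

For the split (0.05, 0.05, 0.9), the unconditional values come out at about 58.6% for 90% power and 53.7% for 80% power, which agree with the parenthesized entries in the table that follows. With p1 = p2 = 0.151 and 90% power, the unconditional Pcs is about 0.80. In this approximation the effect size cancels out, so Pcs depends only on the power and the regional split. The simulated conditional values rest on the assumptions above and are not claimed to reproduce the conditional figures in the slides, whose method of computation is not described here.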
Conditional Pcs vs Unconditional Pcs

Entries are conditional Pcs (unconditional Pcs), in percent.

(p1, p2, p3)          80% power      90% power
(0.05, 0.05, 0.9)     57.5 (53.7)    64.6 (58.6)
(0.1, 0.1, 0.8)       71.1 (65.6)    73.8 (71.7)
(0.15, 0.15, 0.7)     82.5 (73.4)    82.5 (79.9)
(0.2, 0.2, 0.6)       86.1 (78.9)    90.0 (85.3)
(0.25, 0.25, 0.5)     90.9 (82.5)    92.2 (88.8)
(0.3, 0.3, 0.4)       93.2 (84.5)    94.4 (90.7)

Treatment effect = 0.250, σ = 1

Conditional Pcs When p2 = p1

[Figure: conditional and unconditional Pcs against p1, for 90% and 80% power. Legend: Power = 90% (○: Δ/σ = 0.125, +: Δ/σ = 0.250); Power = 80% (○: Δ/σ = 0.125, +: Δ/σ = 0.250).]

PMDA Guidance (Sept 28 2007)

Basic Principles on Global Clinical Trials ( http://www.pmda.go.jp/english/publications/index.html )
Method 1: Look at D_Japan/D_all. Want Pr(D_Japan/D_all > 0.5 | common Δ) > 80%.
Method 2: The “consistency” approach.

EMEA Reflection Paper

Released for public comments in January 2009.
Questions the relevance of some clinical data from emerging regions to support marketing applications in the EU, due to
  intrinsic factors, including genetics and nature of disease
  extrinsic factors, including medical practice, disease definition and study population
Includes 5 product areas where extrapolation of study results to the European population had been found to be difficult.
Encourages an in-depth prospective evaluation of factors if the trial is to provide evidence to support EU filing.
It is possible that additional clinical trials within the EU might be necessary if extrapolation is judged to be problematic.

Summary

When there is no reason to suspect that the risk difference differs across strata, IV weighting produces the minimum variance and asymptotically unbiased estimate. However, when the proportions are in the range of (0.25, 0.75), CMH estimates are generally quite close to the IV estimates.
When the risk difference is suspected to differ across strata, CMH tends to produce more sensible estimates.
It is critically important to know the studies and where the data came from. Naïve pooling could produce very misleading results and should be avoided.
Stratification often leads to subgroup analysis. We need to consider the role subgroup analysis will play in reporting and interpreting trial results.

References

Califf RM. (2007). Multiregional clinical trials. Presented at the PhRMA-FDA workshop, Oct 29-30, Washington DC.
Dmitrienko A, Molenberghs G, Chuang-Stein C, Offen W. (2005). Analysis of Clinical Trials Using SAS: A Practical Guide. Cary, NC: SAS Institute Inc.
EMEA. Points to consider on adjustment for baseline covariates. CPMP/EWP/2863/99 (Nov 2003, coming into operation).
EMEA. Reflection paper on the extrapolation of results from clinical studies conducted outside Europe to the EU-population. CHMP/EWP/692702/2008. Released for public comments, January 2009.
Greenland S. (1998). Collapsibility. Encyclopedia of Biostatistics, Wiley. 786-788.
Grizzle JE. (1982). A note on stratifying versus complete random assignment in clinical trials. Controlled Clinical Trials, 3:365-368.
Kawai N, Chuang-Stein C, Komiyama O, Ii Y. (2008).
An approach to rationalize partitioning sample size into individual regions in a multi-regional trial. Drug Information Journal, 42(2):139-147.
Li Z, Chuang-Stein C, Hoseyni C. (2007). The probability of observing negative subgroup results when the treatment effect is positive and homogeneous across all subgroups. Drug Information Journal, 41(1):47-56.
Ministry of Health, Labour and Welfare. (2007). Basic Principles on Global Clinical Trials. Available at: http://www.pmda.go.jp/operations/notice/2007/file/0928010-e.pdf
O’Neill R. (2007). Multi-regional Clinical Trials: Why be concerned? A Regulatory perspective on Issues. Presented at the PhRMA-FDA workshop, Oct 29-30, Washington DC.
Permutt T. (2007). A note on stratification in clinical trials. Drug Information Journal, 41:719-722.