Challenges in Process Comparison Studies
Seth Clark, Merck and Co., Inc.
Acknowledgements: Robert Capen, Dave Christopher, Phil Bennett, Robert Hards, Xiaoyu Chen, Edith Senderak, Randy Henrickson

Key Issues
• There are different challenges for biologics versus small molecules in process comparison studies
• The biologic problem is often poorly defined
• Strategies for addressing risks associated with process variability early in the product life cycle, with limited experience

Biologic Process Comparison Problem
• Biological products such as monoclonal antibodies have complex bioprocesses to derive, purify, and formulate the “drug substance” (DS) and “drug product” (DP)
[Figure: process flow from cells and medium through fermentation, separation and purification (buffers, resins), and filtration to DS, then formulation (buffers) to DP]
• The process definition established for Phase I clinical supplies may have to be changed for Phase III supplies (for example):
  – Scale-up change: 500 L fermenter to 5000 L fermenter
  – Change manufacturing site
  – Remove additional impurity for marketing advantage
  – Change resin manufacturer to a more reliable source

Comparison Exercise
ICH Q5E: The goal of the comparability exercise is to ensure the quality, safety and efficacy of drug product produced by a changed manufacturing process, through collection and evaluation of the relevant data to determine whether there might be any adverse impact on the drug product due to the manufacturing process changes.
[Figure: comparison decision flowchart with yes/no (Y/N) branches]
  Decision nodes:
  – Scientific justification for analytical-only comparison?
  – Meaningful change in CQAs or important analytical QAs?
  – Meaningful change in preclinical animal and/or clinical S/E?
  Outcomes: Comparable, Not Comparable

What about QbD?
[Figure: X space (critical process parameters, material attributes; knowledge space) and Y space (critical quality attributes; acceptable quality constraint region that links to safety, efficacy, etc.), connected by models annotated “Complete?” and “Models?”]
$S/E = f(\mathrm{CQAs}) + e = f(g(\mathrm{CPP})) + e$
QbD relates critical process parameters (CPPs) to CQAs, which drive S/E in the clinic.
[Figure, continued: Z space (clinical safety/efficacy, S/E; acceptable clinical S/E)]

Risks and Appropriate Test

                               Truth: Comparable         Truth: Not Comparable
  Conclusion: Comparable       Correct                   Consumer risk (mostly)
  Conclusion: Not Comparable   Producer risk (mostly)    Correct

H0: Not comparable analytically
  Action: Examine with scientific judgment; determine whether preclinical/clinical studies are needed to determine comparability
Ha: Comparable analytically
  Action: Support scientific argument with evidence for comparable CQAs
• Hypotheses of an equivalence type of test
• Process mean and variance are both important
• Study design and “sample size” need to be addressed
• Meaningful differences are often not clear
• Difficulty defining meaningful differences and the need to demonstrate “highly similar” imply that statistically meaningful differences may also warrant further evaluation
• Non-comparability can result from “improvement”

Specification Setting
[Figure: CQA axis marked USL and URL; mapping f(CQAs) = S/E ??]
[Figure, continued: CQA axis marked LRL and LSL; mapped axis: clinical safety/efficacy (S/E)]
• In many cases for biologics, an explicit f linking CQA to S/E is unknown
• There is usually a qualitative link between CQA and S/E
• It is difficult to establish such an f for biologics
• Specs correspond to this link and are refined and supported with clinical experience and data on process capability and stability

Process and Spec Life Cycle
[Figure: CQA release results over time (preclinical, Phase I, Phase III, commercial) for process development and Processes 1 to 4, plotted against USL and LSL, with the design space in effect; data sources along the time axis: preclinical/animal data, Phase I and Phase III clinical trial data, commercial data]
Numbered annotations:
  1. Preliminary specs and Process 1 identified
  2. Upper spec revised based on clinical safety (S); process revised to lower mean
  3. Process revised again but not tested in the clinic (analytical comparison only)
  4. Process 3 in commercial production, with further post-approval changes

Sample Size Problem
• “Wide format”
• Unbalanced (N old process > N new process)
• Process variation, N = # lots
  – Usually more of a concern
  – Independence of lots
  – What drives # lots available?
    1. Needs for clinical program
    2. Time, resources, funding available
    3. Rules of thumb
       – Minimum 3 lots/process for release
       – 3 lots/process or fewer for stability
       – 1-2 for forced degradation (2 previous vs 1 new)
• DF for estimating assay variation
  – Usually less of a concern
    • Have multiple stability testing results
    • Have assay qualification/validation data sets

More about # of Lots

  DP Lot      DS Lot
  L00528578   07-001004
  L00528579   07-001007
  L00518510   07-001013
  L00518511   07-001013
  L00518542   07-001013

Same source DS lot! (the last three DP lots share DS lot 07-001013)

“Three consecutive successful batches has become the de facto industry practice, although this number is not specified in the FDA guidance documents” Schneider et al. (2006)
“…batches are not independent.
This could be the case if the manufacturer does not shut down, clean out, and restart the manufacturing process from scratch for each of the validation batches.” Peterson (2008)

Stability Concerns
[Figure: two panels. Forced degradation: evaluate differences in slope between processes; evaluate differences in derivative curve $\partial \mathrm{CQA} / \partial \mathrm{week}$. Long-term stability: blue process shows improvement in rate → not comparable]
$Y = (\mu + \mathrm{Lot}) + (1 + \mathrm{LotTemp} + \mathrm{Temp}) \cdot f(\mathrm{Months}) + e_{\mathrm{Test}} + e_{\mathrm{Residual}}$
• Constrained-intercept, multiple-temperature model gives more precise lot release means and good estimates of assay + sample variation
• Similar sample size problems
• Generally don’t test for differences in lot variation given limited # lots

Methods and Practicalities
• Methods used
  – Comparable to data range: $X_{2(n_2)} \le X_{1(n_1)}$ and $X_{2(1)} \ge X_{1(1)}$
  – Conforms to control limit: $X_{2(n_2)} \le \bar{X}_1 + k\,SE$ and $X_{2(1)} \ge \bar{X}_1 - k\,SE$
    • Tolerance limits
    • 3 sigma limits
    • Multivariate process control
  – Difference test: $\bar{X}_1 - \bar{X}_2 + t\,SE_{\mathrm{diff}} < 0$ or $\bar{X}_1 - \bar{X}_2 - t\,SE_{\mathrm{diff}} > 0$
  – Equivalence test: $\bar{X}_1 - \bar{X}_2 + t\,SE_{\mathrm{diff}} < \Delta$ and $\bar{X}_1 - \bar{X}_2 - t\,SE_{\mathrm{diff}} > -\Delta$
• Not practical
  – Process variance comparison
  – Large # lots late in development, prior to commercial

Methods and Practicalities
[Figure: comparison of the methods; plotting symbols are N historical lots]
Settings: comparisons to N2 = 3 new lots; LSL = -1, Mean = 0, USL = 1; Delta = 0.25; Assay var = 2 × lot var; Total SD = 0.19
• Alpha = Pr(test concludes analytically comparable when not) = Pr(consumer risk)
• Beta = Pr(test concludes not analytically comparable when it is) = Pr(producer risk)

Defining a Risk Based Meaningful Difference
[Figure: $\sigma$ (RSD) plotted against $\mu$. The $C_{pk} \ge C$ boundary is a triangle with base from $(LRL, 0)$ to $(URL, 0)$ and apex at $\left(\frac{URL + LRL}{2}, \frac{URL - LRL}{6 C_{pk}}\right)$. The $C_{pu} \ge C$ boundary has base from $(0, 0)$ to $(URL, 0)$. Plotted points: starting process; 1 = change not meaningful; 2 = change meaningful; 3 = change borderline meaningful]
$C_{pk} = \min\left(\frac{\mu - LRL}{3\sigma}, \frac{URL - \mu}{3\sigma}\right)$
$C_{pu} = \frac{\ln(URL) - \ln(\mu)}{3\sigma}$
The risk level of meaningful differences is fine-tuned through $C_{pk}$ or $C_{pu}$.
LRL = lower release limit, URL = upper release limit, $\mu$
= process mean, $\sigma$ = process standard deviation
[Figure axis label: key quality characteristic]

Defining a Risk Based Meaningful Difference
[Figure: $\sigma$ (RSD) plotted against $\mu$ with the $C_{pk} \ge C$ boundary (base from $(LRL, 0)$ to $(URL, 0)$, apex at $\left(\frac{URL + LRL}{2}, \frac{URL - LRL}{6 C_{pk}}\right)$) and the $C_{pu} \ge C$ boundary (base from $(0, 0)$ to $(URL, 0)$). Plotted points: starting process; 1 = meaningful change; 2 = meaningful change?]
Underlying assumption: we are starting with a process that already has acceptable risk.

Two-sided meaningful change
• Simplifying assumptions
  – Process 1 is in control with good capability (true $C_{pk} > C$) with respect to the meaningful change window $(L, U)$
  – Process 1 is approximately centered in the meaningful change window
  – Process distributions are normal with the same process variance, $\sigma^2$
• Equivalence test on the process distribution mean difference
  H0: $|\mu_1 - \mu_2| \ge \Delta$
  HA: $|\mu_1 - \mu_2| < \Delta$
Risk-based $\Delta$ in terms of $C_{pk}$:
  $\Delta = \frac{U - L}{2}\left(1 - \frac{C}{C_{pk}}\right)$
The power of this test at $\mu_1 - \mu_2 = 0$ for unbalanced $n$ gives the sample size calculation:
  $\frac{n_1 n_2}{n_1 + n_2} \ge \left[\frac{t_{n_1 + n_2 - 2,\,1 - \alpha} + t_{n_1 + n_2 - 2,\,1 - \beta/2}}{3\,(C_{pk} - C)}\right]^2$
Sample size is driven by the type I and II risks and by $C_{pk} - C$, the process risk relative
to max risk

Two-sided meaningful change sample sizes
[Figure: required sample sizes; axes: historical lots, new lots]
• A comparison of 3 batches to 3 batches requires a 3 sigma effect size
• A 2 sigma effect size requires a 13 batch historical database to compare to 3 new batches
• A 1 sigma effect size requires a 70 batch historical database to compare to 10 new batches (not shown)
Effect size = process capability in # sigmas vs max tolerable capability in # sigmas

One-sided (upper) meaningful change
• Similar simplifying assumptions as with the two-sided evaluation
  – Meaningful change window is now $(0, U)$
• Test on the process distribution mean difference
  Linear:
    H0: $\mu_2 - \mu_1 \ge \Delta$
    HA: $\mu_2 - \mu_1 < \Delta$
    Risk-based $\Delta$ in terms of $C_{pk}$: $\Delta = \frac{U}{2}\left(1 - \frac{C}{C_{pk}}\right)$
  Ratio:
    H0: $\mu_2 / \mu_1 \ge \Delta$
    HA: $\mu_2 / \mu_1 < \Delta$
    Risk-based $\Delta$ in terms of $C_{pk}$: $\Delta = 2^{\,1 - C / C_{pk}}$
The sample size at $\mu_2 - \mu_1 = 0$ or $\mu_2 / \mu_1 = 1$ for unbalanced $n$:
  $\frac{n_1 n_2}{n_1 + n_2} \ge \left[\frac{t_{n_1 + n_2 - 2,\,1 - \alpha} + t_{n_1 + n_2 - 2,\,1 - \beta}}{3\,(C_{pk} - C)}\right]^2$
Sample size is driven by the type I and II risks and by $C_{pk} - C$, the process risk relative to max risk.

One-sided meaningful change sample sizes
[Figure: required sample sizes; axes: historical lots, new lots]
• A comparison of 3 batches to 3 batches requires a 3 sigma effect size
• A 2 sigma effect size requires a 6 batch historical database to compare to 3 new batches
• A 1 sigma effect size requires a 20 batch historical database to compare to 10 new batches (not shown)
Effect size = process capability in # sigmas vs max tolerable capability in # sigmas

Study Design Issues
Designs for highly variable assays: what is a better design?
[Figure: two candidate designs for runs 1 to $n_a$. In the first, each assay run contains lots from a single process (for example Run 1: P1L1, P1L2; Run 2: P2L1, P2L2), so the comparison is “Process 1 + assay” versus “Process 2 + assay”. In the second, each run contains one lot from each process (Run 1: P1L1, P2L1; Run 2: P1L2, P2L2; …; Run $n_a$: P1Lk, P2Lk)]

Sample size with control of assay variation
[Figure: comparison of the methods when both processes are tested in the same runs]
Settings: comparisons to N2 = 3 new lots; LSL = -1, Mean = 0, USL = 1; Delta = 0.25; Run var = 2 × lot var; Rep var = lot var; Total SD = 0.15

Summary
• Many challenges in process comparison for biologics, chief being the number of lots available to evaluate the change
• For a risk-based mean shift comparison, process capability needs to be at least a 4 or 5 sigma process within meaningful change windows, such as within release limits
• Careful design of method testing and use of stability information can improve sample size requirements
• If this is not achievable, the test/criteria need to be less powerful (increased producer risk), such as by “flagging” any observed difference to protect consumer risk
• Flagged changes need to be assessed scientifically to determine analytical comparability

Backup

References
• ICH Q5E: Comparability of Biotechnological/Biological Products Subject to Changes in their Manufacturing Process
• Peterson, J. (2008), “A Bayesian Approach to the ICH Q8 Definition of Design Space,” Journal of Biopharmaceutical Statistics, 18: 959-975
• Schneider, R., Huhn, G., Cini, P. (2006),
“Aligning PAT, validation, and post-validation process improvement,” Process Analytical Technology Insider Magazine, April
• Chow, Shein-Chung, and Liu, Jen-pei (2009), Design and Analysis of Bioavailability and Bioequivalence Studies, CRC Press
• Pearn and Chen (1999), “Making Decisions in Assessing Process Capability Index Cpk”

Defining a Risk Based Meaningful Difference
[Figure: two panels of $\sigma$ plotted against $\mu$. Left: $C_{pk} \ge C$ boundary with base from $(LRL, 0)$ to $(URL, 0)$ and apex at $\left(\frac{URL + LRL}{2}, \frac{URL - LRL}{6 C_{pk}}\right)$. Right: $C_{pm} \ge C$ boundary with base from $(LRL, 0)$ to $(URL, 0)$ and top at $\left(\frac{URL + LRL}{2}, \frac{URL - LRL}{6 C_{pm}}\right)$. Plotted points: starting process; 1 = change not meaningful; 2 = change meaningful; 3 = change borderline meaningful]
$C_{pk} = \min\left(\frac{\mu - LRL}{3\sigma}, \frac{URL - \mu}{3\sigma}\right)$
$C_{pm} = \frac{URL - LRL}{6\sigma\sqrt{1 + \left(\frac{\mu - T}{\sigma}\right)^2}}$, with $T$ the target value
The risk level of meaningful differences is fine-tuned through $C_{pk}$ or $C_{pm}$.
LRL = lower release limit, URL = upper release limit, $\mu$ = process mean, $\sigma$ = process standard deviation

Test Cpk?
Assume process 1 is in control and has good capability (true $C_{pk} > 1$) with respect to the release limits. Suppose process 2 is considered comparable to process 1 if $C_{pk,2} > 1$. That is, we want to test
  H0: $C_{pk,2} \le 1$ (examine with scientific judgment)
  HA: $C_{pk,2} > 1$ (evidence for comparable CQAs)
How many lots are needed to have 80% power with alpha = 0.05, assuming they are measured with high precision (e.g., assay variation negligible)?
  Critical value $= \frac{\sqrt{2/(n-1)}\;\Gamma[(n-1)/2]}{3\sqrt{n}\;\Gamma[(n-2)/2]}\; t(n-1,\, 1-\alpha,\, 3C\sqrt{n})$
  Power $= 1 - \Pr\left[t(n-1,\, 3C_{pk}\sqrt{n}) \le t(n-1,\, 1-\alpha,\, 3C\sqrt{n})\right]$
Here $t(n-1, 1-\alpha, \delta)$ is the $1-\alpha$ quantile of the noncentral $t$ distribution with $n-1$ degrees of freedom and noncentrality $\delta$, and $t(n-1, \delta)$ is a random variable with that distribution.
Pearn and Chen (1999), “Making Decisions in Assessing Process Capability Index Cpk”

Power
Assume process 1 is in control and has good capability (true $C_{pk} > 1$) with respect to the release limits. Suppose process 2 is considered comparable to process 1 if $C_{pk,2} > 1$.
That is, we want to test
  H0: $C_{pk,2} \le 1$ (examine further with scientific judgment)
  HA: $C_{pk,2} > 1$ (evidence for comparable CQAs)

  alpha   Cpk2   K (sigmas, mean from limits)   N    Power
  0.05    1.33   4                              49   0.80
  0.05    1.67   5                              17   0.82
  0.05    1.33   4                              10   0.23
  0.05    1.33   4                              5    0.13
  0.05    1.33   4                              3    0.09
  0.05    1.67   5                              10   0.54
  0.05    1.67   5                              5    0.25
  0.05    1.67   5                              3    0.13

Comparability to Range Method
[Figure: “Process distribution?” with historical lot means P1L1 to P1L6 and new lot means P2L1 to P2L3]
  H0: $\mu_{2(n_2)} \ge \mu_{1(n_1)} + \Delta$ or $\mu_{2(1)} \le \mu_{1(1)} - \Delta$
  HA: $\mu_{2(n_2)} < \mu_{1(n_1)} + \Delta$ and $\mu_{2(1)} > \mu_{1(1)} - \Delta$
1. Determine the subset of all historical lots that are representative of the historical lot distribution and have sufficient data
2. The list of historical true lot means defines our historical distribution
3. The new process (P2) has significant evidence of comparability if the range of true lot means for the new process can be shown to be within the range of the historical true lot means + meaningful difference
4. If the meaningful difference is not defined, set $\Delta = 0$
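
Worked example (range method). The range comparison above can be sketched in a few lines of Python. This is only a descriptive check on estimated lot means, not the formal hypothesis test, which would also have to account for the uncertainty in each lot mean. The function name `within_historical_range` and all data values are hypothetical and are not taken from the slides.

```python
from typing import Sequence


def within_historical_range(historical: Sequence[float],
                            new: Sequence[float],
                            delta: float = 0.0) -> bool:
    """Descriptive range check on estimated lot means (hypothetical helper).

    Returns True when every new lot mean lies strictly inside
    (min(historical) - delta, max(historical) + delta).
    """
    if not historical or not new:
        raise ValueError("both processes need at least one lot mean")
    if delta < 0:
        raise ValueError("delta must be non-negative")
    lower = min(historical) - delta
    upper = max(historical) + delta
    return lower < min(new) and max(new) < upper


# Hypothetical lot means for illustration only.
p1_lots = [-0.20, -0.05, 0.00, 0.08, 0.15, 0.21]   # six historical lots
p2_inside = [-0.10, 0.02, 0.12]                    # three new lots, inside range
p2_shifted = [0.05, 0.18, 0.30]                    # three new lots, shifted up

print(within_historical_range(p1_lots, p2_inside))
print(within_historical_range(p1_lots, p2_shifted))
print(within_historical_range(p1_lots, p2_shifted, delta=0.25))
```

With `delta=0.0` the check reduces to the plain historical range, which mirrors step 4 of the list above.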
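
Worked example (sample size inequalities). The sample size conditions on the two-sided and one-sided meaningful change slides can be checked numerically. The sketch below assumes the inequality n1·n2/(n1+n2) ≥ [(t_{1-α} + t_{1-β/2 or 1-β}) / effect]², with effect = 3(Cpk − C), alpha = 0.05 and beta = 0.20. All function names are hypothetical. The Student t quantile is computed with the standard library only (Simpson's rule plus bisection); a statistics package would normally be used instead.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """CDF of Student's t for x >= 0, via Simpson's rule on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(p: float, df: int) -> float:
    """Quantile of Student's t for 0.5 <= p < 1, by bisection on [0, 50]."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def lots_sufficient(n1: int, n2: int, effect: float, alpha: float = 0.05,
                    beta: float = 0.20, two_sided: bool = True) -> bool:
    """Check n1*n2/(n1+n2) >= ((t_alpha + t_beta) / effect)**2."""
    df = n1 + n2 - 2
    t_alpha = t_quantile(1 - alpha, df)
    t_beta = t_quantile(1 - (beta / 2 if two_sided else beta), df)
    return n1 * n2 / (n1 + n2) >= ((t_alpha + t_beta) / effect) ** 2


def min_historical_lots(n2: int, effect: float, alpha: float = 0.05,
                        beta: float = 0.20, two_sided: bool = True,
                        max_lots: int = 500) -> int:
    """Smallest number of historical lots n1 satisfying the inequality."""
    for n1 in range(2, max_lots + 1):
        if lots_sufficient(n1, n2, effect, alpha, beta, two_sided):
            return n1
    raise ValueError("no n1 up to max_lots satisfies the inequality")


# effect = 3 * (Cpk - C), i.e. the effect size in sigmas
print(min_historical_lots(n2=3, effect=2.0, two_sided=True))
print(min_historical_lots(n2=3, effect=2.0, two_sided=False))
```

Under these assumptions the search should reproduce the 13-lot (two-sided) and 6-lot (one-sided) historical databases quoted for a 2 sigma effect size with 3 new lots. The 1 sigma cases sit very close to the boundary of the inequality, so small differences in rounding or in the assumed beta can move the answer by a lot or so.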