Jacqueline Law, Art DeVault
Roche Molecular Systems
Sept 19, 2003

Diagnostics

Methods Comparison Studies for Quantitative Nucleic Acid Assays

Outline

- Introduction
- PCR based quantitative nucleic acid assays
- Literature references
- Acceptance criteria
- Examples
- References

Methods Comparison Studies: to validate a new assay

Purposes:
- To show that the new assay has good agreement with the reference assays
- To show that the assay performs similarly with different types of specimen

Premises of methods comparison studies:
- A linear relationship between the two assays
- LOD and dynamic range have to be already established
- Appropriate transformation to normalize the data

Analysis:
- To detect constant bias and proportional bias

Constant Bias: the difference between the two methods is constant across the data range

[Figure: two panels, "Method Y vs. Method X" (x-axis: Method X, y-axis: Method Y) and "Difference vs. Average" (x-axis: Average, y-axis: Difference)]

Proportional Bias: the difference between the two methods is linear across the data range

[Figure: two panels, "Method Y vs. Method X" (x-axis: Method X, y-axis: Method Y) and "Difference vs. Average" (x-axis: Average, y-axis: Difference)]

PCR based nucleic acid assays

To quantify the viral load by PCR method

Characteristics:
- A wide dynamic range (e.g. 10 cp/mL to 1E7 cp/mL)
- Skewed distribution (non-normal): typically log10 transformation for the data
- Heteroscedasticity: variance is higher at higher titer levels
- log10 transformation may not achieve homogeneity in variance (variance at lower end may increase)
- Other transformation: $\log\left(x + \sqrt{x^2 + c^2}\right)$ for a constant $c$

PCR based assays: a wide dynamic range - data are log10 transformed

[Figure: two panels, "Untransformed" (y-axis: Observed Titer (cp/mL)) and "Log10 transformed" (y-axis: Observed Titer (Log cp/mL)), each plotted against Nominal Titer]
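As a small illustration of why titers are usually analyzed on the log10 scale, the sketch below simulates replicate measurements at a low and a high nominal level and compares the replicate SD before and after the transformation. The constant 20% CV error model, the nominal levels, and the function names are assumptions made only for this example; real assay data may behave differently, for instance with larger log-scale variance at the low end.

```python
import math
import random
import statistics

def simulate_replicates(nominal, cv=0.20, n=200, seed=0):
    """Simulate n titers (cp/mL) with multiplicative, log-normal error.

    The constant-CV error model is an assumption made for this sketch only.
    """
    rng = random.Random(seed)
    sigma = math.sqrt(math.log(1.0 + cv ** 2))  # log-scale SD implied by the CV
    return [nominal * math.exp(rng.gauss(0.0, sigma)) for _ in range(n)]

def sd_raw_and_log10(titers):
    """Return (SD in cp/mL, SD in log10 cp/mL) for a list of titers."""
    logs = [math.log10(t) for t in titers]
    return statistics.stdev(titers), statistics.stdev(logs)

# Hypothetical nominal levels near the two ends of a wide dynamic range.
low_raw_sd, low_log_sd = sd_raw_and_log10(simulate_replicates(1e2, seed=1))
high_raw_sd, high_log_sd = sd_raw_and_log10(simulate_replicates(1e6, seed=2))

print(f"SD at 1E2 cp/mL: raw = {low_raw_sd:.3g}, log10 = {low_log_sd:.3f}")
print(f"SD at 1E6 cp/mL: raw = {high_raw_sd:.3g}, log10 = {high_log_sd:.3f}")
```

Under this error model the raw SD grows with the titer while the log10 SD stays roughly constant, which is the motivation for transforming before comparing methods.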
PCR based assays: log10 transformation may remove some skewness

[Figure: distributions of titer at two nominal levels, in "Untransformed" and "log10 transformed" panels (x-axis: Titer)]

Literature references on Methods Comparison Studies

- Correlation coefficient
- Other coefficients
- T-test
- Bland-Altman plot
- Ordinary least squares regression
- Passing-Bablok regression
- Deming regression

Correlation coefficient R or R^2

- Measures the strength of the linear relationship between two assays
- Does not measure agreement: cannot detect constant or proportional bias
- Can be artificially high for assays that cover a wide range: how high is high? 0.95? 0.99? 0.995?

Other coefficients

Concordance coefficient (Lin, 1989):
- Measures how closely the paired results of the two assays fall on the 45° line through the origin

$$\rho_C = \frac{2\sigma_{12}}{\sigma_1^2 + \sigma_2^2 + (\mu_1 - \mu_2)^2}$$

Gold-standard correlation coefficient (St. Laurent, 1998):
- Measures the agreement between a new assay and a gold standard

$$\rho_G = \sqrt{\frac{S_{GG}}{S_{DD} + S_{GG}}}$$

T-test

- Paired t-test on the difference in the measurements by two assays
- Can only detect constant bias
- Cannot detect proportional bias

Bland-Altman graphical analysis (Bland and Altman, 1986)

Methods:
- Plot the Difference of the two assays (D = X - Y) vs.
  the Average of the two assays (A = (X + Y)/2)
- Visually inspect the plot for trends: a trend indicates proportional bias
- Summarize the bias between the two assays by the mean, SD, and 95% CI: this estimates the constant bias
- Modification: regress D on A and test if slope = 0 (Hawkins, 2002)
- A useful visual tool for checking transformation, heteroscedasticity, outliers, curvature

Bland-Altman plot (continued)

[Figure: two panels, "Method Y vs. Method X" (x-axis: Method X (log Titer), y-axis: Method Y (log Titer)) and "Difference vs. Average" (x-axis: Average, y-axis: Difference (log Titer))]

Ordinary least-squares regression

Methods:
- Regress the observed data of the new assay (Y) on those of the reference assay (X)
- Minimize the squared deviations from the fitted regression line in the vertical direction
- Modifications: weighted least squares

Assumptions:
- The reference assay (X) is error free, or the error is relatively small compared to the range of the measurements
- e.g. in clinical chemistry studies, the measurement errors are minimal

Ordinary least-squares regression (continued)

If measurement errors exist in both assays, the estimates are biased:
- slope tends to be smaller
- intercept tends to be larger

Passing-Bablok regression (Passing and Bablok, 1983)

A nonparametric approach: robust to outliers

Methods:
- Estimate the slope by the shifted median of the slopes between all possible pairs of points (Theil estimate)
- Confidence intervals by rank techniques

Assumptions:
- The measurement errors in both assays follow the same type of distribution (not necessarily normal)
- The ratio of the error variances is a constant (variance not necessarily constant across the range of data)
- The sampling distributions of the samples are arbitrary

Deming regression (Linnet, 1990)

Methods:
- Orthogonal least squares estimates: minimize the squared deviation of the observed data from the regression line
- Standard errors for the
  estimates obtained by the jackknife method
- Weighted Deming regression when heteroscedastic

Assumptions:
- Measurement errors for both assays follow independent normal distributions with mean 0
- Error variances are assumed to be proportional (variance not necessarily constant across the range of data)

Comparison of the 3 regression methods (Linnet, 1993)

Electrolyte study (homogeneous variance):
- OLS, Passing-Bablok: biased slope, large Type I error, larger RMSE than Deming
- Deming: unbiased slope, correct Type I error

Metabolite study (heterogeneous variance):
- All have unbiased slope estimates
- Weighted LS and weighted Deming are most efficient
- Type I error is large for OLS, weighted LS, Deming and Passing-Bablok

Presence of outliers:
- Passing-Bablok is robust to outliers
- Deming regression requires detection of outliers

Software

Statistical packages: SAS, Splus

Other packages (for Bland-Altman plot, OLS regression, Passing-Bablok regression, Deming regression):
- Analyse-it (Excel add-on): does not support weighted Deming regression
- Method Validator (freeware)
- CBStat (Linnet K.)

Acceptance criteria for regression type analysis

- Independent acceptance criteria for slope and intercept estimates: e.g. slope estimate within (0.9, 1.1), intercept estimate within (-0.2, 0.2)
- Drawback: asymmetrical acceptance region across the data range

Asymmetrical acceptance region

[Figure: two panels for slope within (0.9, 1.1) and intercept within (-0.2, 0.2), with bounding lines Y = 0.2 + 1.1*X and Y = -0.2 + 0.9*X: "Method Y (Log Titer) vs. Method X (Log Titer)" and "Bias = Method Y - Method X (Log Titer) vs. Method X (Log Titer)"]
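To see how independent limits on slope and intercept translate into limits on the bias, note that for a fitted line Y = a + b*X the difference Y - X at a reference value X is a + (b - 1)*X. The sketch below evaluates that expression over the example limits from the slide. The function name and the chosen log titer levels are hypothetical, and this is only one way to read the drawback described above.

```python
def bias_bounds(x, intercept_range=(-0.2, 0.2), slope_range=(0.9, 1.1)):
    """Smallest and largest bias a + (b - 1) * x allowed by independent limits.

    The bias is linear in a and in b separately, so its extremes over the
    rectangle of allowed (a, b) values occur at the four corners.
    """
    corners = [a + (b - 1.0) * x for a in intercept_range for b in slope_range]
    return min(corners), max(corners)

# Hypothetical log10 titer levels spanning the plotted range.
for x in (2, 4, 6, 8):
    lower, upper = bias_bounds(x)
    print(f"log titer {x}: allowed bias from {lower:+.2f} to {upper:+.2f}")
```

The allowed bias widens as the titer increases, so the same slope and intercept limits tolerate much more disagreement at the high end of the range than at the low end.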
Proposed acceptance criteria

Goals:
- to show that the new assay is 'equivalent' to the reference assay
- to demonstrate that the bias between the two assays is within some acceptable threshold across the clinical range

Acceptance Criteria:

$$|E(\text{Bias})| = |E(Y - X)| < A$$

Choice of tolerance level A: accuracy specification for the new assay

Mathematical models

Reference Assay: $X_i = \tau_i + \varepsilon_i$
New Assay: $Y_i = \alpha + \beta\tau_i + \delta_i$

where $\tau_i$ is the true concentration, $\varepsilon_i$ and $\delta_i$ are the independent random measurement errors

Bias: $Y_i - X_i = \alpha + (\beta - 1)\tau_i + (\delta_i - \varepsilon_i)$

Acceptance Criteria: $|E(Y_i - X_i)| = |\alpha + (\beta - 1)\tau_i| < A$

Comparison of the acceptance criteria: {Int (-0.2, 0.2), Slope (0.9, 1.1)} vs. {A = 0.5, L = 2, U = 7}

[Figure: acceptance regions under the two criteria, in two panels: "Bias: Method Y - Method X (Log Titer) vs. Method X (Log Titer)" and "Method Y (Log Titer) vs. Method X (Log Titer)"]

Acceptance region for the parameters: criteria for the intercept and slope are dependent

[Figure: acceptance region in the parameter plane; x-axis: Intercept (Alpha), y-axis: Slope (Beta)]

Equivalence test

$$H_0: |\text{Bias}| \ge A \quad \text{vs.} \quad H_a: |\text{Bias}| < A$$

where A is the accuracy specification of the new assay

Methods:
- If the 90% two-sided confidence interval of the Bias lies entirely within the acceptance region (-A, A), then the two assays are equivalent
- Deming-Jackknife is used to do the estimation

Deming regression (a.k.a. errors-in-variables regression, a structural or functional relationship model)

Minimize the sum of squares:

$$S = \sum_{i=1}^{n}\left[(x_i - \tau_i)^2 + \lambda\,(y_i - \alpha - \beta\tau_i)^2\right]$$

where $\lambda = \mathrm{Var}(\varepsilon)/\mathrm{Var}(\delta)$ (assumed known or to be estimated)

The solutions are given by:

$$\hat\beta = \frac{(\lambda S_{yy} - S_{xx}) + \sqrt{(S_{xx} - \lambda S_{yy})^2 + 4\lambda S_{xy}^2}}{2\lambda S_{xy}}, \qquad \hat\alpha = \bar{y} - \hat\beta\,\bar{x}$$

Weighted Deming regression, with weights:

$$w_i = \frac{1}{SD_i^2} \propto \frac{1}{\left[(\hat{X}_i + \hat{Y}_i)/2\right]^2}$$

Estimation of $\lambda$ in Deming regression

Duplicate measurements:

$$SD_X^2 = \frac{1}{2N}\sum_{i=1}^{N}(x_{i1} - x_{i2})^2, \qquad SD_Y^2 = \frac{1}{2N}\sum_{i=1}^{N}(y_{i1} - y_{i2})^2, \qquad \hat\lambda = \frac{SD_X^2}{SD_Y^2}$$

>2 replicates: residual errors by ANOVA

Mis-specification of $\lambda$ (Linnet 1998):
- biased slope estimate
- large Type I error

Jackknife estimation: to obtain the final parameter estimates and the SEs

- Omit one pair of data at a time and obtain the Deming-regression estimates $\hat\alpha_{(i)}, \hat\beta_{(i)}$
- The ith pseudo-values of the intercept and slope are:

$$\tilde\alpha_i = n\hat\alpha - (n-1)\hat\alpha_{(i)}, \qquad \tilde\beta_i = n\hat\beta - (n-1)\hat\beta_{(i)}$$

- Final estimates and SEs for $\alpha$ and $\beta$ are the mean and standard error of $\tilde\alpha_i$ and $\tilde\beta_i$

Bias estimation by Jackknife

- At each nominal level $\tau$, the ith pseudo-value of the Bias is:

$$\text{Bias}_i = n\left[\hat\alpha + (\hat\beta - 1)\tau\right] - (n-1)\left[\hat\alpha_{(i)} + (\hat\beta_{(i)} - 1)\tau\right]$$

- The bias estimate and the SE at each nominal level are the mean and SE of $\text{Bias}_i$
- The 90% CI of the bias at each nominal level are compared to the acceptance region (-A, A)
- The two assays are concluded to be equivalent if all the CI lie entirely within (-A, A)

Example 1: methods comparison for two HIV-1 assays

[Figure: New Method (log Titer) vs. Reference Method]

Bland-Altman plot: potential outliers in the data

[Figure: Bland-Altman plot; y-axis: Difference = New - Reference (log Titer), x-axis: Average]

Identify outliers: fitting a linear regression line to the Bland-Altman plot

[Figure: two panels, "Residual Plot" (y-axis: Studentized Residual, x-axis: Fitted Differences) and "Leverage Plot" (y-axis: Leverage, x-axis: Sample); labeled samples: 34860, 34944, 34851, 34794]

Remove outliers: Bland-Altman plot shows no trend in Difference vs.
Average

[Figure: Bland-Altman plot after removing outliers; y-axis: Difference = New - Reference (log Titer), x-axis: Average]

- mean difference = 0.02 (95% CI: -0.06, 0.10)
- slope = 0.033 (p-value = 0.5)

Regression analysis: results from the 3 methods are very similar

[Figure: New Method (log Titer) vs. Reference Method, with fitted lines from OLS, Passing-Bablok, and Deming-Jackknife]

Bias estimation: almost all 90% CI lie within the tolerance bounds (-0.2, +0.2)

[Figure: estimated bias with 90% CI; y-axis: Difference = New - Reference (log Titer), x-axis: Reference]

Example 2: to show matrix equivalency between EDTA Plasma and Serum

[Figure: Serum (log Titer) vs. EDTA (log Titer)]

Bland-Altman plot on average titer: most titers higher than 1E5 IU/mL, heteroscedasticity?

[Figure: Bland-Altman plot; y-axis: Difference = Serum - EDTA (log Titer), x-axis: Average]

- slope = 0.03 (p-value = 0.6)
- mean difference = -0.06 (95% CI: -0.16, 0.04)

Checking for heteroscedasticity: residual errors from random effects models

[Figure: two panels, SD of EDTA (log Titer) and SD of Serum (log Titer), each plotted against the mean titer]

$\lambda \approx 1$:
- Pooled within-sample SD for EDTA = 0.0706
- Pooled within-sample SD for Serum = 0.0715

[Figure: Lambda = Var(EDTA errors)/Var(Serum errors) vs. Average, with the median Lambda marked]

Bias estimation: large variability at low titers due to sparse data, fail to demonstrate equivalency at low end

[Figure: estimated bias; y-axis: Difference = Serum - EDTA (log Titer), x-axis: EDTA (log Titer)]

References

Bland M., Altman D. (1986). ‘Statistical methods for assessing agreement between two methods of clinical measurement’. Lancet 327: 307-310.
Hawkins D. (2002). ‘Diagnostics for conformity of paired quantitative measurements’. Stat in Med 21: 1913-1935.
Lin L.K. (1989). ‘A concordance correlation coefficient to evaluate reproducibility’. Biometrics 45: 255-268.
Linnet K. (1990). ‘Estimation of the linear relationship between the measurements of two methods with proportional bias’. Stat in Med 9: 1463-1473.
Linnet K. (1993). ‘Evaluation of regression procedures for methods comparison studies’. Clin Chem 39: 424-432.
Linnet K. (1998). ‘Performance of Deming regression analysis in case of misspecified analytical error ratio in method comparison studies’. Clin Chem 44: 1024-1031.
Linnet K. (1999). ‘Necessary sample size for method comparison studies based on regression analysis’. Clin Chem 45: 882-894.
Passing H., Bablok W. (1983). ‘A new biometrical procedure for testing the equality of measurements from two different analytical methods’. J Clin Chem Clin Biochem 21: 709-720.
Passing H., Bablok W. (1984). ‘Comparison of several regression procedures for method comparison studies and determination of sample sizes’. J Clin Chem Clin Biochem 22: 431-445.
St. Laurent R.T. (1998). ‘Evaluating Agreement with a Gold Standard in Method Comparison Studies’. Biometrics 54: 537-545.
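The Deming regression, jackknife, and equivalence-test slides describe a procedure that can be sketched in a few lines of Python. The version below is a minimal, unweighted sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the data are simulated, and the 90% interval uses a normal quantile where a Student t quantile may be more appropriate for small samples.

```python
import math
import random
import statistics

def deming(x, y, lam=1.0):
    """Return (intercept, slope); lam = Var(X errors) / Var(Y errors)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    d = lam * syy - sxx
    # Closed-form slope; assumes sxy is not zero.
    slope = (d + math.sqrt(d * d + 4.0 * lam * sxy * sxy)) / (2.0 * lam * sxy)
    return ybar - slope * xbar, slope

def jackknife_bias_ci(x, y, level, lam=1.0, crit=None):
    """Jackknife estimate and two-sided 90% CI of the bias at one nominal level."""
    n = len(x)
    if crit is None:
        # Assumption: normal quantile; a t quantile may suit small n better.
        crit = statistics.NormalDist().inv_cdf(0.95)
    a, b = deming(x, y, lam)
    full = a + (b - 1.0) * level
    pseudo = []
    for i in range(n):
        # Leave out the i-th pair and refit.
        a_i, b_i = deming(x[:i] + x[i + 1:], y[:i] + y[i + 1:], lam)
        pseudo.append(n * full - (n - 1) * (a_i + (b_i - 1.0) * level))
    est = statistics.mean(pseudo)
    se = statistics.stdev(pseudo) / math.sqrt(n)
    return est, est - crit * se, est + crit * se

def is_equivalent(x, y, levels, tol, lam=1.0):
    """True if every 90% CI of the bias lies entirely within (-tol, tol)."""
    for level in levels:
        _, lower, upper = jackknife_bias_ci(x, y, level, lam)
        if lower <= -tol or upper >= tol:
            return False
    return True

# Simulated log10 titers (hypothetical data, not taken from the slides).
rng = random.Random(1)
true_titer = [2.0 + 5.0 * i / 29 for i in range(30)]
ref = [t + rng.gauss(0.0, 0.07) for t in true_titer]
new = [t + rng.gauss(0.0, 0.07) for t in true_titer]

intercept, slope = deming(ref, new, lam=1.0)
print(f"Deming fit: intercept = {intercept:.3f}, slope = {slope:.3f}")
print("Equivalent within A = 0.5 on [2, 7]:",
      is_equivalent(ref, new, [2, 3, 4, 5, 6, 7], tol=0.5))
```

Passing the pseudo-values of the bias directly, rather than combining separate intervals for intercept and slope, keeps the dependence between the two estimates inside the interval at each nominal level.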