Statistical Analysis Library SDD Rev 0.1

Statistical Analysis Library
Software Design Document
August 14, 2012
Rev 0.1

Required Approvals:
Role | Print Name | Signature | Date
Author of this Revision | Andrew Buckler | |
System Engineer | Andrew Buckler | |

Document Revisions:
Revision | Revised By | Reason for Update | Date
0.1 | AJ Buckler | Initial version | August 14, 2012

BBMSC

Table of Contents
1. EXECUTIVE SUMMARY
  1.1. REFERENCED STANDARDS
    1.1.1. Terminology and Methodological Standards
    1.1.2. Implementation Technology Standards
2. COMPONENT-LEVEL REQUIREMENTS
  2.1. FUNCTIONALITY
3. PLATFORM SPECIFIC MODEL
  3.1. APPROACH
    3.1.1. Module for Bias and Linearity Analysis
    3.1.2. Module for Bland-Altman and Lin’s Concordance Correlation Coefficient (CCC)
    3.1.3. Module to compute Linear Mixed Effects Model (LME)
    3.1.4. Module to aggregate measurement uncertainty
  3.2. ANALYSIS MODULES IMPLEMENTATION DETAILS
REFERENCES

1. Executive Summary

In the first iteration, the library of reusable statistical analysis modules is called from the R server under Iterate.

1.1. Referenced Standards

1.1.1. Terminology and Methodological Standards

We proactively apply consensus terms and carefully discipline ourselves to documented implementations. Certain terms, notably precision, accuracy, repeatability, reproducibility, variability, and uncertainty, represent qualitative concepts and thus should be used with care [1]. For example, accuracy should not be equated with small bias: the colloquial Wikipedia definition suggests that accuracy and precision are independent [2], which leads to confusion such as the idea that high accuracy can occur with low precision. See the Application Scope Description document (Analyze ASD) for definitions and methodological discussions.

Domain Standards | Description
Metrology Workshop | Terminology and methodology standards

1.1.2. Implementation Technology Standards

Technology Standards | Description
R | Open source statistical system

2. Component-level Requirements

Note that the normal process of requirements development does not guarantee that adjacent requirements are directly related. In situations where requirements are tightly related or where requirements are to be considered in the context of an upper level requirement, explicit parent-child relationships have been created. The Origin of a requirement is intended to identify the application for which the requirement was originally defined.

2.1. Functionality

Functional requirements.
Requirements | Comment/Test Indication
Activity: Develop analytical method to support testable hypothesis (e.g., in R scripts) | N/A (heading)
Customize the analysis tool. | N/A (heading)
Analyze data with known statistical properties to determine whether custom statistical tools are producing valid results. | Specific test patterns with expected results
Customize built-in statistical methods | Use of layered functions
Add the measures, summary statistics, outlier analyses, plot types, and other statistical methods needed for the specific study design. | Configurable items in call layer
Load statistical methods into analysis tool | Source functions using R server under Iterate
Configure presentation of longitudinal data | Support of QI-Bench “chg” file
Configure the report generator to tailor the formats of exported data views. The report generator exports data views to files according to a run-time-configured list of the data views that should be included in the report. | Configurable items in call layer
Activity: Perform an analysis for a specific test (repeat across tests considered for group analysis) | N/A (heading)
The technical quality of imaging biomarkers is assessed with respect to the accuracy and precision of the related physical measurement(s). | Bias and Linearity, Bland Altman and Lin’s CCC, Linear Mixed Effects
The following comparisons of markups can be made: | N/A (heading)
Analyze statistical variability | Bland Altman and Lin’s CCC, Linear Mixed Effects
Measurements of agreement | Bland Altman and Lin’s CCC, Linear Mixed Effects
User-defined calculations | Configurable items in call layer
Correlate tumor change with clinical indicators | Support of QI-Bench “cov” file as input to Bias and Linearity and Linear Mixed Effects modules
Calculate regression analysis | Linear Mixed Effects
Calculate factor analysis | Support of QI-Bench “cov” and “dcm” files as input to Bias and Linearity and Linear Mixed Effects modules
Calculate ANOVA | Linear Mixed Effects module
Provide review capability to user for all calculated items and plots | Configurable items in call layer
Drill-down interesting cases, e.g., outlying sub-distributions, compare readers on hard tumors, etc. | Subset S files with configurable items in call layer

3. Platform Specific Model

The statistical analysis methods are called from Iterate, built on Taverna, and interfaced to the RDSM.

3.1. Approach

Insofar as possible, we apply the same analysis techniques to both literature sources (providing a meta-analysis) and the directly measured data, so as to facilitate straightforward comparison of the two disparate sources. A second principle in our approach is to provide multiple metrics where possible, because any given metric may have limitations or subtleties in interpretation, whereas other metrics may be complementary.

3.1.1. Module for Bias and Linearity Analysis

Because there is no universally agreed-upon definition of linearity, there is also no consensus on a single measure of linearity. For biomarker measurements calibrated to an accepted standard reference, a direct linear response is expected: the biomarker versus reference relationship is linear with slope = 1 and intercept = 0. The re-usable analysis module used to assess bias and linearity takes as input data files comprising reference truth data, such as that available from the fluid-displacement method applied to synthetic phantoms or ex-vivo specimens, and calculates a set of metrics that represent foundational characteristics of the assay. This analysis is only defined when reliable reference truth exists.

Methodology

The module assesses linearity, or goodness of fit to a reference function, considering a number of factors when selecting the most appropriate model. When the biomarker characteristic curve is linear and a standard reference is available, the standard practice methodologies assess the strength of the linear relationship between the quantitative biomarker and the measurand. A standard reference, or phantom, is used, with the regressor assumed to be fixed and not to vary noticeably under the same environmental conditions.

Linear Regression

Assumptions:
- The measurand is fixed and the biomarker is measured at that value
- The measurand value is not related to the slope, intercept or variance of the biomarker

Measurand: The known quantitative measurement of a physical or digital phantom that is known within an acceptably small tolerance. For a physical phantom, the measurement is known with negligible error. In the case of a digital phantom, the measurand is known without error.

Model:
- $y = \beta_1 x + \beta_0$
- Known error distribution: Normal
- Least squares

Estimates:
- Slope, intercept and error variance
- Confidence limits

Linear Performance:
- Significant slope ($H_0\colon \beta_1 = 0$)
- Optional: significant intercept ($H_0\colon \beta_0 = 0$)

Test for Nonlinearity:
- Significant quadratic term (p < 0.05)

A module to analyze the bias and linearity has been written in R, a language and environment for statistical computing and graphics.
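The least-squares estimates listed above (slope, intercept, and error variance) can be illustrated with a minimal sketch. It is written in plain Python for illustration only and is not the R module itself; the function name `fit_line`, the returned dictionary keys, and the additive error term implied by the error variance are assumptions of this sketch. Confidence limits and p-values are omitted because they require t-distribution quantiles that the Python standard library does not provide.

```python
import math


def fit_line(x, y):
    """Ordinary least squares fit of y = b1 * x + b0 (hypothetical helper).

    x: reference (measurand) values, treated as fixed.
    y: biomarker measurements taken at those values.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")

    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("reference values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Error variance: residual sum of squares over n - 2 degrees of freedom.
    residuals = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
    error_variance = sum(r * r for r in residuals) / (n - 2)

    # Standard errors, and the t statistic for H0: slope = 0.
    se_slope = math.sqrt(error_variance / sxx)
    se_intercept = math.sqrt(error_variance * (1.0 / n + mean_x ** 2 / sxx))
    t_slope = slope / se_slope if se_slope > 0 else math.inf

    return {
        "slope": slope,
        "intercept": intercept,
        "error_variance": error_variance,
        "se_slope": se_slope,
        "se_intercept": se_intercept,
        "t_slope": t_slope,
    }


if __name__ == "__main__":
    # Made-up reference and measured values, for illustration only.
    reference = [1, 2, 3, 4, 5]
    measured = [1.1, 1.9, 3.2, 3.9, 5.1]
    print(fit_line(reference, measured))
```

For a calibrated biomarker, the fitted slope would be expected to be close to 1 and the intercept close to 0; a nonlinearity check would add a quadratic term to the same fit and examine its significance.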
The Bias and Linearity module takes the SEGFILE = s_Bias-Linearity_seg.csv and the DCMFILE = s_Bias-Linearity_cov.csv files as input, merges them, and creates the intermediate means and standard deviations, which correspond to the original means and standard deviations reported in the literature data. In addition, the code generates the standard deviations of the unique BYVAR combinations, where BYVAR is the variable to summarize by (e.g., "ManufacturersModelName" will produce a scatter-plot of reference versus measured values for each model). If no by-variable is specified, then a plot is created for each unique SUBJID. In such a scatter-plot, one-standard-deviation error bars are attached to all the measurements except those for which the authors did not report a standard deviation.

For a user-specified number of iterations (NSIM), the code generates one value per phantom. The code then computes the slope and intercept of the regression line, as well as the average difference between the simulated value and the fluid displacement value. For each iteration, the slope, intercept and average difference are saved. After the specified number of iterations is complete, the code computes the 2.5 and 97.5 percentiles of the slope, intercept, and average difference to give a 95% non-parametric confidence interval for each. In addition, the code outputs a non-parametric 95% confidence interval for the quadratic term, together with the quadratic coefficient, the p-value associated with the quadratic coefficient, and the percentage of quadratic p-values greater than 0.05. Finally, the code computes the concordance correlation coefficient (CCC).

Resulting Metrics

The following performance metrics for linearity of quantitative biomarkers are utilized:
- Significance of quadratic term
- Slope = 1.0 ± tolerance
- Intercept = 0.0 ± tolerance
- Pearson correlation coefficient

3.1.2. Module for Bland-Altman and Lin’s Concordance Correlation Coefficient (CCC)

We utilize two re-usable modules to characterize repeatability and reproducibility. This analysis module computes relatively simple but powerful and popular metrics on input data records representing up to two inter-reader, up to two intra-reader, and up to two test-retest readings. It does not use a model; the Linear Mixed Effects module (Section 3.1.3) produces the most accurate assessments because it can account for mixed effects and utilize all available readings.

Methodology

One of the more popular methods for describing agreement, whether between or within readers, across test-retest acquisitions, or in other settings, was promulgated by Bland and Altman in a landmark paper from 1986 [3]. The authors note that “In clinical measurement comparison of a new measurement technique with an established one is often needed to see whether they agree sufficiently for the new to replace the old. Such investigations are often analyzed inappropriately, notably by using correlation coefficients. The use of correlation is misleading. An alternative approach, based on graphical techniques and simple calculations, is described, together with the relation between this analysis and the assessment of repeatability.” The paper describes a calculation method and a convention for presenting the results graphically, both of which we implement here.

Another landmark paper describes the “Concordance Correlation Coefficient” (CCC), which may be compared to correlation coefficients but seeks to avoid a common difficulty with them [4]. The metric seeks to overcome limitations of Pearson correlation coefficients, paired t-tests, and application of least squares analysis.
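The two calculations introduced above can be illustrated with a minimal sketch. It is written in plain Python for illustration only and is not the R module itself; the function names `bland_altman` and `lins_ccc` are hypothetical. The sketch assumes the common convention of placing the agreement limits at the mean difference plus or minus 1.96 standard deviations of the differences, and it uses the population (1/n) variances and covariance in the CCC, following the usual statement of Lin's estimator.

```python
import math


def bland_altman(a, b):
    """Bias and limits of agreement for paired readings a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need at least 2 paired readings")
    diffs = [ai - bi for ai, bi in zip(a, b)]
    means = [(ai + bi) / 2.0 for ai, bi in zip(a, b)]
    n = len(diffs)
    bias = sum(diffs) / n
    # Sample standard deviation of the paired differences.
    sd = math.sqrt(sum((d - bias) ** 2 for d in diffs) / (n - 1))
    return {
        "means": means,   # x-axis of the Bland-Altman chart
        "diffs": diffs,   # y-axis of the Bland-Altman chart
        "bias": bias,
        "lower": bias - 1.96 * sd,  # lower agreement limit
        "upper": bias + 1.96 * sd,  # upper agreement limit
    }


def lins_ccc(a, b):
    """Lin's concordance correlation coefficient for paired readings."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need at least 2 paired readings")
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    var_a = sum((ai - mean_a) ** 2 for ai in a) / n
    var_b = sum((bi - mean_b) ** 2 for bi in b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / n
    # The squared mean difference in the denominator penalizes any
    # systematic shift away from the line of agreement.
    return 2.0 * cov / (var_a + var_b + (mean_a - mean_b) ** 2)


if __name__ == "__main__":
    # Made-up paired readings, for illustration only.
    reader_1 = [10.0, 12.0, 14.0, 16.0]
    reader_2 = [11.0, 11.0, 15.0, 15.0]
    print(bland_altman(reader_1, reader_2))
    print(lins_ccc(reader_1, reader_2))
```

Two identical series give a CCC of 1, while a constant offset between otherwise perfectly correlated series lowers the CCC even though the Pearson correlation remains 1.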
The concordance correlation coefficient is a measure of agreement: it is the product of the correlation coefficient and a bias term that reflects the degree to which the regression line differs from the line of agreement. The further the regression line is from the line of agreement, the higher the penalty and the lower the CCC. It has come to be known as Lin’s CCC, which we provide here.

Resulting Metrics

The following performance metrics for repeatability and reproducibility of quantitative biomarkers are utilized:
- Bland-Altman charts for inter-reader, intra-reader, and test-retest performance, annotated with upper and lower agreement limits.
- Lin’s CCC for inter-reader, intra-reader, and test-retest performance.

3.1.3. Module to compute Linear Mixed Effects Model (LME)

We utilize two re-usable modules to characterize repeatability and reproducibility. This analysis module provides additional insight beyond the relatively simple metrics produced by the Bland-Altman and Lin’s CCC methods. It accepts as input data records representing an arbitrary number of inter-reader, intra-reader, and/or test-retest readings, and models multiple fixed and random effects. As such, it is capable of the most accurate assessment, because it can account for multiple sources of variability and utilize all available readings. That said, it is also the most complex and is highly dependent on the appropriateness of the model assumptions as well as the effect assignment.

Methodology

<cite methods papers that substantiates approach, e.g. [5]>
<brief overview of approach, assumptions, and how to interpret the results>

Resulting Metrics

The following performance metrics for repeatability and reproducibility of quantitative biomarkers are utilized:
- Pareto of effects
- Distribution of model effects, including residuals
- Inter-reader and intra-reader ICC
- Q-Q plot indicating how well the residuals follow a normal distribution

3.1.4. Module to aggregate measurement uncertainty

<fill in>.

Methodology

<cite methods papers that substantiates approach>
<brief overview of approach, assumptions, and how to interpret the results>

Resulting Metrics

The following performance metrics for linearity of quantitative biomarkers are utilized:
<itemize>

3.2. Analysis Modules Implementation Details

Our approach is to code modules in MATLAB <ref>, R [6], and SAS <ref>.

<more details>

Core Analysis Modules:
- AnalyzeBiasAndLinearity
- PerformBlandAltmanAndCCC
- ModelLinearMixedEffects
- ComputeAggregateUncertainty

Meta-analysis Extraction Modules:
- CalculateReadingsFromMeanStdev
  o written in MATLAB to generate synthetic data in cases where the number of readings is known and a mean and standard deviation are reported for them. In particular, it generates N random numbers from a normal distribution and normalizes them to the desired mean and standard deviation, mu and sigma respectively.
- CalculateReadingsFromStatistics
  o written in R to generate synthetic data in cases where the number of readings is known, and the mean, standard deviation, and inter- and intra-reader correlation coefficients are reported for them.
- CalculateReadingsAnalytically

Utility Functions:
- PlotBlandAltman
- GapBarplot
- BLscatterplotfn

REFERENCES

1. Guidelines for Evaluating and Expressing the Uncertainty of NIST Measurement Results, Appendix D. Available from: http://physics.nist.gov/Pubs/guidelines/appd.1.html#d12, accessed August 1, 2012.
2. Accuracy and precision. Available from: http://en.wikipedia.org/wiki/Accuracy_and_precision, accessed July 31, 2012.
3. Bland, J.M. and D.G. Altman, Statistical methods for assessing agreement between two methods of clinical measurement. Lancet, 1986. 1(8476): p. 307-10.
4. Lin, L.I., A concordance correlation coefficient to evaluate reproducibility. Biometrics, 1989. 45(1): p. 255-68.
5. Eliasziw, M., et al., Statistical methodology for the concurrent assessment of interrater and intrarater reliability: using goniometric measurements as an example. Phys Ther, 1994. 74(8): p. 777-88.
6. R: A Language and Environment for Statistical Computing, 2012, R Core Team: Vienna.