Statistical Analysis Library SDD
Rev 0.1
Statistical Analysis Library Software Design Document
August 14, 2012
Rev 0.1
Required Approvals (Print Name, Signature, Date):
Author of this Revision: Andrew Buckler
System Engineer: Andrew Buckler
Document Revisions:
Revision | Revised By | Reason for Update | Date
0.1 | AJ Buckler | Initial version | August 14, 2012
BBMSC
Table of Contents
1. Executive Summary
1.1. Referenced Standards
1.1.1. Terminology and Methodological Standards
1.1.2. Implementation Technology Standards
2. Component-level Requirements
2.1. Functionality
3. Platform Specific Model
3.1. Approach
3.1.1. Module for Bias and Linearity Analysis
3.1.2. Module for Bland-Altman and Lin’s Concordance Correlation Coefficient (CCC)
3.1.3. Module to compute Linear Mixed Effects Model (LME)
3.1.4. Module to aggregate measurement uncertainty
3.2. Analysis Modules Implementation Details
References
1. Executive Summary
In the first iteration, the library of re-usable statistical analysis modules is called from
the R server under Iterate.
1.1. Referenced Standards
1.1.1. Terminology and Methodological Standards
We proactively apply consensus terms and carefully discipline ourselves to
documented implementations. Certain terms, notably precision, accuracy, repeatability,
reproducibility, variability, and uncertainty, represent qualitative concepts and thus
should be used with care [1]. For example, accuracy should not be equated with small
bias; the colloquial Wikipedia definition suggests that accuracy and precision are
independent [2], leading to confusion such as the claim that high accuracy can occur
with low precision.
See the Application Scope Description document (Analyze ASD) for definitions and
methodological discussions.
Domain
Standards
Description
Metrology
Workshop
Terminology and methodology standards
1.1.2. Implementation Technology Standards
Technology
Standards
Description
R
Open source statistical system
2. Component-level Requirements
Note that the normal process of requirements development does not guarantee that
adjacent requirements are directly related. In situations where requirements are tightly
related or where requirements are to be considered in the context of an upper level
requirement, explicit parent-child relationships have been created.
The Origin of a requirement is intended to identify the application for which the
requirement was originally defined.
2.1. Functionality
Functional requirements.
Requirements | Comment/Test Indication
Activity: Develop analytical method to support testable hypothesis (e.g., in R scripts) | N/A (heading)
Customize the analysis tool. | N/A (heading)
Analyze data with known statistical properties to determine whether custom statistical tools are producing valid results. | Specific test patterns with expected results
Customize built-in statistical methods | Use of layered functions
Add the measures, summary statistics, outlier analyses, plot types, and other statistical methods needed for the specific study design. | Configurable items in call layer
Load statistical methods into analysis tool | Source functions using R server under Iterate
Configure presentation of longitudinal data | Support of QI-Bench “chg” file
Configure the report generator to tailor the formats of exported data views. The report generator exports data views to files according to a run-time-configured list of the data views that should be included in the report. | Configurable items in call layer
Activity: Perform an analysis for a specific test (repeat across tests considered for group analysis) | N/A (heading)
The technical quality of imaging biomarkers is assessed with respect to the accuracy and precision of the related physical measurement(s). | Bias and Linearity, Bland Altman and Lin’s CCC, Linear Mixed Effects
The following comparisons of markups can be made: | N/A (heading)
Analyze statistical variability | Bland Altman and Lin’s CCC, Linear Mixed Effects
Measurements of agreement | Bland Altman and Lin’s CCC, Linear Mixed Effects
User-defined calculations | Configurable items in call layer
Correlate tumor change with clinical indicators | Support of QI-Bench “cov” file as input to Bias and Linearity and Linear Mixed Effects modules
Calculate regression analysis | Linear Mixed Effects
Calculate factor analysis | Support of QI-Bench “cov” and “dcm” files as input to Bias and Linearity and Linear Mixed Effects modules
Calculate ANOVA | Linear Mixed Effects module
Provide review capability to user for all calculated items and plots | Configurable items in call layer
Drill-down interesting cases, e.g., outlying sub-distributions, compare readers on hard tumors, etc. | Subset S files with configurable items in call layer
3. Platform Specific Model
The statistical analysis methods are called from Iterate, built on Taverna, and interfaced
to the RDSM.
3.1. Approach
Insofar as possible, we apply the same analysis techniques to both literature sources
(providing a meta-analysis) and the directly measured data, so as to facilitate
straightforward comparison of the two disparate sources.
A second principle in our approach is to provide multiple metrics where possible, given
that any single metric may have limitations or subtleties in interpretation, whereas other
metrics may be complementary.
3.1.1. Module for Bias and Linearity Analysis
Because there is no universally agreed-upon definition of linearity, there is also no
consensus on a single measure of linearity. For biomarker measurements calibrated to
an accepted standard reference, there is some expectation of a direct linear response,
that is, that the biomarker v. reference relationship is linear with slope=1 and
intercept=0. The re-usable analysis module used to assess bias and linearity takes
as input data files comprising reference truth data, such as that available from the
fluid-displacement method applied to synthetic phantoms or ex-vivo specimens, and
calculates a set of metrics that represent foundational characteristics of the assay. This
analysis is only defined when reliable reference truth exists.
Methodology
The module assesses linearity, or goodness of fit to a reference function, considering a
number of factors when selecting the most appropriate model. When the biomarker
characteristic curve is linear and a standard reference is available, the standard-practice
methodologies for characteristic curves assess the strength of the linear relationship
between the quantitative biomarker and the measurand. A standard reference, or
phantom, is used, with the regressor assumed to be fixed and not to vary noticeably
under the same environmental conditions.
• Linear Regression Assumptions:
o The measurand is fixed and the biomarker is measured at that value
o The measurand value is not related to the slope, intercept or variance of the
biomarker
• Measurand:
o The known quantitative measurement of a physical or digital phantom, known
within an acceptably small tolerance. For a physical phantom, the
measurement is known with negligible error. In the case of a digital phantom,
the measurand is known without error.
• Model:
o $y = \beta_1 x + \beta_0 + \varepsilon$
o Known error distribution – Normal
o Least squares
• Estimates:
o Slope, intercept and error variance
o Confidence limits
• Linear Performance:
o Significant slope ($H_0: \beta_1 = 0$)
o Optional: Significant intercept ($H_0: \beta_0 = 0$)
• Test for Nonlinearity:
o Significant quadratic term (p<0.05)
A module to analyze bias and linearity has been written in R, a language and
environment for statistical computing and graphics. The Bias and Linearity module takes
the SEGFILE = s_Bias-Linearity_seg.csv and the DCMFILE = s_Bias-Linearity_cov.csv
files as input, merges them, and creates the intermediate means and standard
deviations, which correspond to the original means and standard deviations
reported in the literature data. In addition, the code generates the standard deviations of
the unique BYVAR combinations, where BYVAR is the variable to summarize by (e.g.,
"ManufacturersModelName" will produce a scatter plot of reference versus measured
values for each model). If no by-variable is specified, a plot is created for each
unique SUBJID. In each scatter plot, one-standard-deviation error bars are attached to
all measurements except those for which the authors did not report a standard
deviation.
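The grouping step described above can be sketched as follows. This is an illustrative Python sketch only, not the R module itself: the function name `summarize_by`, the `value_col` parameter, and the "measured" column are hypothetical, while "SUBJID" and "ManufacturersModelName" are taken from the text. It assumes the two input files have already been merged into a list of row dictionaries.

```python
import statistics
from collections import defaultdict


def summarize_by(rows, value_col, byvar=None):
    """Group merged rows by `byvar` (falling back to SUBJID when no
    by-variable is given) and return {group: (mean, sd)}.

    sd is None for groups with a single reading, mirroring the case
    where no standard deviation is available for the error bars.
    """
    key = byvar if byvar else "SUBJID"
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(float(row[value_col]))
    return {
        group: (
            statistics.fmean(values),
            statistics.stdev(values) if len(values) > 1 else None,
        )
        for group, values in groups.items()
    }


# Hypothetical merged rows; "measured" is an assumed column name.
rows = [
    {"SUBJID": "p1", "ManufacturersModelName": "A", "measured": 10.0},
    {"SUBJID": "p2", "ManufacturersModelName": "A", "measured": 12.0},
    {"SUBJID": "p3", "ManufacturersModelName": "B", "measured": 20.0},
]
by_model = summarize_by(rows, "measured", byvar="ManufacturersModelName")
by_subject = summarize_by(rows, "measured")  # defaults to SUBJID
```

Each entry of the result would correspond to one scatter plot, with the standard deviation driving the error bars where it is available.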
For a user-specified number of iterations (NSIM), the code generates one value per
phantom. The code then computes the slope and intercept of the regression line, as
well as the average difference between the simulated value and the fluid-displacement
value. For each iteration, the slope, intercept, and average difference are saved. After
the specified number of iterations is complete, the code computes the 2.5th and 97.5th
percentiles of the slope, intercept, and average difference to give 95% non-parametric
confidence intervals. In addition, the code outputs a non-parametric 95% confidence
interval for the quadratic term, which consists of the quadratic coefficient and the p-value
associated with it, and the percentage of quadratic p-values greater than 0.05. Finally,
the code computes the concordance correlation coefficient (CCC).
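The simulation loop above can be sketched as follows. This is a minimal Python illustration, not the R implementation: the names `fit_line`, `percentile_interval`, and `simulate_bias_linearity` are hypothetical, and the sketch assumes each simulated value is drawn from a normal distribution with the phantom's reported mean and standard deviation, which the text does not state explicitly. The quadratic term and the CCC are omitted.

```python
import random
import statistics


def fit_line(x, y):
    """Ordinary least-squares slope and intercept of y regressed on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def percentile_interval(values):
    """2.5th and 97.5th percentiles of the simulated values.

    quantiles(n=40) returns 39 cut points at 2.5%, 5%, ..., 97.5%.
    Requires at least two values.
    """
    cuts = statistics.quantiles(values, n=40)
    return cuts[0], cuts[-1]


def simulate_bias_linearity(reference, means, sds, nsim, seed=None):
    """Run nsim iterations; each draws one value per phantom (assumed
    normal), fits a line against the reference values, and records the
    slope, intercept, and average difference."""
    rng = random.Random(seed)
    slopes, intercepts, diffs = [], [], []
    for _ in range(nsim):
        sim = [rng.gauss(m, s) for m, s in zip(means, sds)]
        slope, intercept = fit_line(reference, sim)
        slopes.append(slope)
        intercepts.append(intercept)
        diffs.append(statistics.fmean([s - r for s, r in zip(sim, reference)]))
    return {
        "slope": percentile_interval(slopes),
        "intercept": percentile_interval(intercepts),
        "mean_difference": percentile_interval(diffs),
    }


# Hypothetical phantoms whose reported means match the reference values.
result = simulate_bias_linearity(
    reference=[1.0, 2.0, 3.0, 4.0, 5.0],
    means=[1.0, 2.0, 3.0, 4.0, 5.0],
    sds=[0.1] * 5,
    nsim=500,
    seed=1,
)
```

For an unbiased, linear measurement such as this hypothetical one, the slope interval would be expected to bracket 1 and the intercept and average-difference intervals to bracket 0.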
Resulting Metrics
The following performance metrics for linearity of quantitative biomarkers are utilized:
• Significance of quadratic term
• Slope = 1.0 + tolerance
• Intercept = 0.0 + tolerance
• Pearson correlation coefficient
3.1.2. Module for Bland-Altman and Lin’s Concordance Correlation Coefficient (CCC)
We utilize two re-usable modules to characterize repeatability and reproducibility. This
analysis module computes relatively simple but powerful and popular metrics on input
data records representing up to two inter-reader, up to two intra-reader, and up to two
test-retest readings. It does so without the use of a model, even though a model produces
the most accurate assessments given its ability to account for mixed effects and
utilize all available readings.
Methodology
One of the more popular methods for describing agreement, whether between or within
readers, across test-retest acquisitions, or in other settings, was promulgated by Bland
and Altman in a landmark paper from 1986 [3]. The authors note that “In clinical measurement
comparison of a new measurement technique with an established one is often needed
to see whether they agree sufficiently for the new to replace the old. Such investigations
are often analyzed inappropriately, notably by using correlation coefficients. The use of
correlation is misleading. An alternative approach, based on graphical techniques and
simple calculations, is described, together with the relation between this analysis and
the assessment of repeatability.” The paper describes a calculation method and a
convention regarding how to graphically present the results that we implement here.
Another landmark paper describes the “Concordance Correlation Coefficient” (CCC),
which may be compared to correlation coefficients but seeks to avoid a common difficulty
with them [4]. The metric seeks to overcome limitations of Pearson correlation
coefficients, paired t-tests, and application of least squares analysis. The concordance
correlation coefficient is a measure of agreement: it is the correlation coefficient
penalized by a bias term that reflects the degree to which the
regression line differs from the line of agreement. The further the regression line is from
the line of agreement, the higher the penalty, and the lower the CCC. It has come to be
known as Lin’s CCC, which we provide here.
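Both calculations are short enough to sketch directly. The following Python is an illustration under stated assumptions, not the library's R code: the function names `bland_altman` and `lins_ccc` are hypothetical, the agreement limits use the common choice of the mean difference plus or minus 1.96 standard deviations of the differences, and the CCC uses the 1/n moment form, 2*s_xy / (s_x^2 + s_y^2 + (mean_x - mean_y)^2).

```python
import statistics


def bland_altman(x, y, z=1.96):
    """Return (bias, lower limit, upper limit) for paired readings.

    bias is the mean of the paired differences; the agreement limits
    are bias -/+ z times the sample standard deviation of the differences.
    """
    diffs = [a - b for a, b in zip(x, y)]
    bias = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - z * sd, bias + z * sd


def lins_ccc(x, y):
    """Concordance correlation coefficient using 1/n variances and
    covariance. Equals 1 only when every pair lies on the line y = x."""
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


# Hypothetical paired readings from two readers.
reader_1 = [10.0, 12.0, 14.0, 16.0]
reader_2 = [9.0, 12.0, 15.0, 16.0]
bias, lower, upper = bland_altman(reader_1, reader_2)
ccc = lins_ccc(reader_1, reader_2)
```

A constant offset between two otherwise perfectly correlated readers leaves the Pearson correlation at 1 but lowers the CCC, which is the penalty described above.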
Resulting Metrics
The following performance metrics for repeatability and reproducibility of quantitative
biomarkers are utilized:
• Bland-Altman charts for inter-reader, intra-reader, and test-retest performance,
annotated with upper and lower agreement limits.
• Lin’s CCC for inter-reader, intra-reader, and test-retest performance.
3.1.3. Module to compute Linear Mixed Effects Model (LME)
We utilize two re-usable modules to characterize repeatability and reproducibility. This
analysis module provides additional insight beyond the relatively simple metrics
produced by the Bland-Altman and Lin’s CCC methods. This module accepts as input
data records representing an arbitrary number of inter-reader, intra-reader, and/or
test-retest readings to model multiple fixed and random effects. As such, it is capable of the
most accurate assessment due to its ability to account for multiple sources of variability
and utilize all available readings. That said, it is also the most complex and is highly
dependent on the appropriateness of model assumptions as well as effect assignment.
Methodology
<cite methods papers that substantiates approach, e.g. [5]>
<brief overview of approach, assumptions, and how to interpret the results>
Resulting Metrics
The following performance metrics for repeatability and reproducibility of quantitative
biomarkers are utilized:
• Pareto of effects
• Distribution of model effects, including residuals
• Inter-reader and intra-reader ICC
• qq-plot indicating how well the residuals follow a normal distribution
3.1.4. Module to aggregate measurement uncertainty
<fill in>.
Methodology
<cite methods papers that substantiates approach>
<brief overview of approach, assumptions, and how to interpret the results>
Resulting Metrics
The following performance metrics for linearity of quantitative biomarkers are utilized:

<itemize>
3.2. Analysis Modules Implementation Details
Our approach is to code modules in Matlab <ref>, R [6], and SAS <ref>. <more details>
Core Analysis Modules:
• AnalyzeBiasAndLinearity
• PerformBlandAltmanAndCCC
• ModelLinearMixedEffects
• ComputeAggregateUncertainty
Meta-analysis Extraction Modules:
• CalculateReadingsFromMeanStdev
o written in MATLAB to generate synthetic data in cases where the number
of readings is known and a mean and standard deviation are reported for
them. In particular, it generates N random numbers from a normal
distribution and normalizes them to the desired mean and standard
deviation, mu and sigma respectively.
• CalculateReadingsFromStatistics
o written in R to generate synthetic data in cases where the number of
readings is known, and the mean, standard deviation, and inter- and
intra-reader correlation coefficients are reported for them.
• CalculateReadingsAnalytically
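The normalization performed by CalculateReadingsFromMeanStdev can be sketched as follows. This is an illustrative Python rendering of the idea, not the MATLAB module: the function name `readings_from_mean_stdev` and its `seed` parameter are hypothetical, and the sketch assumes "standard deviation" means the sample (n-1) standard deviation.

```python
import random
import statistics


def readings_from_mean_stdev(n, mu, sigma, seed=None):
    """Generate n synthetic readings whose sample mean and sample
    standard deviation equal mu and sigma (up to rounding).

    Draws n standard-normal values, standardizes them with their own
    sample mean and standard deviation, then rescales to mu and sigma.
    Requires n >= 2 so that the sample standard deviation is defined.
    """
    rng = random.Random(seed)
    raw = [rng.gauss(0.0, 1.0) for _ in range(n)]
    m = statistics.fmean(raw)
    s = statistics.stdev(raw)
    return [mu + sigma * (r - m) / s for r in raw]


# Hypothetical literature summary: 8 readings, mean 25.0, sd 3.5.
readings = readings_from_mean_stdev(8, mu=25.0, sigma=3.5, seed=42)
```

Because the draws are standardized before rescaling, the synthetic readings reproduce the reported summary statistics rather than merely approximating them in expectation.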
Utility Functions:
• PlotBlandAltman
• GapBarplot
• BLscatterplotfn
REFERENCES
1. Guidelines for Evaluating and Expressing the Uncertainty of NIST Measurement
Results, Appendix D. Available from:
http://physics.nist.gov/Pubs/guidelines/appd.1.html#d12, accessed August 1,
2012.
2. Accuracy and precision. Available from:
http://en.wikipedia.org/wiki/Accuracy_and_precision, accessed July 31, 2012.
3. Bland, J.M. and D.G. Altman, Statistical methods for assessing agreement
between two methods of clinical measurement. Lancet, 1986. 1(8476): p. 307-10.
4. Lin, L.I., A concordance correlation coefficient to evaluate reproducibility.
Biometrics, 1989. 45(1): p. 255-68.
5. Eliasziw, M., et al., Statistical methodology for the concurrent assessment of
interrater and intrarater reliability: using goniometric measurements as an
example. Phys Ther, 1994. 74(8): p. 777-88.
6. R: A Language and Environment for Statistical Computing, 2012, R Core Team:
Vienna.