Analytical vs. Diagnostic Performance: An Overview, with an
Emphasis on Method Comparison and the Assessment of Bias
Bente Flatland, DVM, MS, DACVIM, DACVP
Associate Professor of Clinical Pathology
Department of Biomedical and Diagnostic Sciences
University of Tennessee, Knoxville, TN
This presentation is adapted from: Flatland B, Friedrichs KR, Klenner S. Differentiating between Analytical and
Diagnostic Performance Evaluation with a Focus on the Method Comparison Study and Identification of Bias.
Veterinary Clinical Pathology 2014; in press. I wish to acknowledge my manuscript co-authors for their
contributions to the material in these proceedings.
Abbreviations Used (listed alphabetically)
CLIA: Clinical and Laboratory Improvement Amendments
CV: coefficient of variation
EQA: external quality assessment
PT: proficiency testing
QC: quality control
STARD: Standards for Reporting of Diagnostic Accuracy
TE: total error
Introduction
Analytical and diagnostic performance of laboratory tests are separate concepts, although they are
clearly related in that a test having poor analytical performance is likely to have poor diagnostic
performance.1 Analytical performance refers to how well an instrument or method can measure the
analyte of interest – in other words, are results both reliable (reproducible) and valid (accurate)?
Diagnostic performance refers to how well a given test can discriminate diseased and non-diseased
individuals. Evaluation of diagnostic performance should follow evaluation of analytical performance.2
Evaluation of Analytical Error, with an Emphasis on Bias Assessment
Analytical performance evaluation assesses the magnitude of analytical error, including
evaluation of imprecision (random error) and bias (inaccuracy). Bias refers to the difference between a
value measured by the test (index) instrument and the "true" value of the analyte as measured by a
comparative method.3 Bias is relative, and the magnitude of bias observed depends upon the comparative
method or material chosen to represent the "true" analyte value.1
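To make the calculation concrete, the following sketch (all values hypothetical, not drawn from any cited study) computes constant and percentage bias from paired index and comparative results:

    # Hypothetical paired results; units are arbitrary.
    index_results       = [4.1, 5.3, 6.0, 7.2, 8.4]   # index (test) method
    comparative_results = [4.0, 5.0, 5.8, 7.0, 8.0]   # comparative ("true") method

    # Constant bias: mean of the paired differences.
    diffs = [i - c for i, c in zip(index_results, comparative_results)]
    bias = sum(diffs) / len(diffs)

    # Proportional bias is often expressed as a mean percentage of the
    # comparative value.
    pct_bias = 100 * sum(d / c for d, c in zip(diffs, comparative_results)) / len(diffs)

    print(f"mean bias = {bias:.2f} units; mean percentage bias = {pct_bias:.1f}%")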
The true value of an analyte is a theoretical concept4; in clinical laboratory medicine, the “true”
value is ideally represented by results from a thoroughly researched, highly stable laboratory method
known as a definitive method.5 Definitive methods are complex, technically demanding, and may be
expensive. These are typically used during instrument and method development and to assign values to
certified reference materials and standards at the manufacturing level. In contrast, a reference method can
be employed by qualified laboratories for routine clinical use and has thoroughly documented accuracy
and precision and low susceptibility to interferents; practicality is not necessarily a prime concern for
reference methods.5 A field method is intended to be practical and has adequate precision, accuracy, and
specificity for its intended use.5 For routine clinical use, reference laboratories employ reference methods
or field methods; point-of-care testing (POCT) sites use field methods.
If definitive methods are not available or practical, what options do laboratories have for defining
“true” analyte concentration? Comparative materials or methods that may be used include (a) the known
values of certified reference materials, (b) the known values of assayed quality control materials, (c) peer
group means from EQA/PT events, and (d) mean values from another reference or field method (method
comparison study).1 Each of these approaches to bias assessment has pros and cons, and the choice of
comparative material or method for bias assessment should be made keeping goals of the instrument
performance study (i.e., intended uses of the bias data) in mind. Choice of comparative method may also
be influenced by logistical considerations such as method or material availability, financial costs, and
analyte concentrations of interest.1
The major purpose of any method comparison study is to define bias (inaccuracy) of an index
instrument or method. Even if bias is minimal, agreement between methods may be less than optimal if
either method (or both) is imprecise; full assessment of method agreement evaluates both bias and
imprecision.1
In theory, bias assessment should always be species-specific; in practice, its feasibility may be dictated by
complexities of the analyte and test systems in question, as well as by laboratory resources. Species-specific
bias assessment presents logistical challenges, including availability of suitable, stable, species-specific
samples and the financial and time costs of investigating bias for multiple species.1 It may be less crucial
for simple, low-molecular-weight analytes (e.g., electrolytes or glucose) than for more complex tests (e.g.,
immunoassays or hematology testing).6 Further, necessity of species-specific bias assessment may
depend on intended use of the bias data (e.g., QC validation or instrument harmonization). Research
documenting the impact of species-specific bias assessment (or lack thereof) on successful veterinary
laboratory outcomes is needed.1
When two field methods (or a field and reference method) are compared, bias is expected.
Comparison of two field methods is relevant if investigators wish to (a) show that a new method is a
satisfactory substitute for an older method, (b) investigate whether a new method being installed at the
testing site requires method-specific reference intervals and to aid reference interval transfer, (c)
determine comparability of two methods (e.g., for performing serial patient evaluations), or (d) facilitate
local harmonization of multiple instruments at the testing site via calibration to a consensus mean or other
value.7
Methods comparison studies should state the purpose of the study (i.e., intended use of the bias
data), present the type and magnitude of any bias identified, and discuss clinical implications of the
observed bias.1 The type of statistical analysis performed in method comparison studies is dictated by the
type of variable(s) being assessed and differs for continuous numerical data versus qualitative or semiquantitative data.8 Further, method comparison statistics assume that data are independent (i.e., no
repeated measures from the same individuals). Statistical methods (including how regression and
difference plots are constructed) and the equation used to calculate bias should be given. When bias data
are included in calculations of observed total error (TEobs), the absolute mathematical value for bias is
used.9
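By way of illustration only (the statistical approach must be chosen to suit the data, as noted above), the sketch below computes the quantities underlying a simple difference (Bland-Altman) plot and an observed total error estimate using one common formulation, TEobs = |bias%| + 2 × CV%; all inputs are hypothetical:

    import statistics

    # Hypothetical paired results from the index and comparative methods.
    index = [102, 98, 110, 95, 105, 100]
    comp  = [100, 97, 107, 96, 103, 99]

    # Difference (Bland-Altman) plot quantities: each paired difference is
    # plotted against the mean of that pair.
    diffs = [i - c for i, c in zip(index, comp)]
    means = [(i + c) / 2 for i, c in zip(index, comp)]   # x-axis values of the plot
    bias  = statistics.mean(diffs)                       # constant bias estimate
    loa   = 1.96 * statistics.stdev(diffs)               # 95% limits of agreement half-width

    # Observed total error combines the absolute bias with imprecision;
    # CV here is a hypothetical value from replicate control measurements.
    cv_pct   = 3.0
    bias_pct = 100 * bias / statistics.mean(comp)
    te_obs   = abs(bias_pct) + 2 * cv_pct                # one common TEobs formulation

    print(f"bias = {bias:.2f}; limits of agreement = bias +/- {loa:.2f}")
    print(f"TEobs = {te_obs:.1f}%")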
Analytical Quality Requirements in Analytical Performance Assessment
A full assessment of analytical performance also involves comparison to an analytical quality
requirement. A quality requirement (a.k.a. analytical quality specification) is a pre-determined
benchmark to which analytical performance of an instrument or method is judged. Analytical quality
requirements are derived from various resources, including expert consensus, biological variation data,
and regulatory requirements, and are classified into a hierarchy denoting how rigorous (i.e., how
evidence-based and medically relevant) they are.10 Commonly used analytical quality requirements in
veterinary medicine are allowable total error (TEa, consensus-based, level 3), maximum total error (TEmax,
biological variation-based, level 2a), maximum imprecision (CVmax, biological variation-based, level 2a),
maximum bias (Biasmax, biological variation-based, level 2a), and performance requirements established
by CLIA (regulatory-based, level 4).1,9 Biological variation data must obviously be available for the
species in question in order for biological variation-based quality requirements to be calculated.
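For example, the widely used biological variation-based derivations can be computed as in the sketch below (the multipliers shown are the commonly cited ones and should be confirmed against the source adopted; CV_I and CV_G denote within- and between-individual biological variation, and the analyte values are hypothetical):

    import math

    cv_i, cv_g = 6.0, 10.0   # hypothetical biological variation data (%)

    cv_max   = 0.5 * cv_i                              # maximum imprecision (CVmax)
    bias_max = 0.25 * math.sqrt(cv_i**2 + cv_g**2)     # maximum bias (Biasmax)
    te_max   = 1.65 * cv_max + bias_max                # maximum total error (TEmax)

    print(f"CVmax = {cv_max:.1f}%, Biasmax = {bias_max:.1f}%, TEmax = {te_max:.1f}%")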
Depending on the chosen quality requirement, assessment of analytical performance can focus on
imprecision (e.g., observed CV compared to CVmax), bias (e.g., observed bias compared to biasmax), or
both (e.g., TEobs compared to TEa, TEmax, or a CLIA requirement). Any observed value less than the
relevant/chosen quality requirement is considered acceptable.1,11 Observed values greater than the quality
requirement indicate that analytical performance falls short of what is desired and should prompt
trouble-shooting to identify causes of suboptimal analytical performance. If steps can be taken that
should improve analytical performance, the performance evaluation should be repeated, and results
reassessed in light of the quality requirement.11
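The acceptance logic itself is simple, as the following sketch (hypothetical observed values and requirements) illustrates:

    # An observed value below the chosen quality requirement is acceptable;
    # otherwise, trouble-shooting is indicated. All values are hypothetical (%).
    checks = {
        "CV vs CVmax":     (3.5, 3.0),    # (observed, requirement)
        "bias vs Biasmax": (1.2, 2.9),
        "TEobs vs TEa":    (8.4, 10.0),
    }
    for name, (observed, requirement) in checks.items():
        verdict = "acceptable" if observed < requirement else "trouble-shoot"
        print(f"{name}: observed {observed}% vs requirement {requirement}% -> {verdict}")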
Diagnostic Performance Assessment
Diagnostic performance of any laboratory test should be assessed relative to disease diagnosis
and is only needed and relevant for analyte changes of medical importance.1 Diagnostic performance is
most easily assessed for individual laboratory tests for diseases that have a clear gold standard test that
can be used for comparison. A gold standard (a.k.a. reference standard) is the best available test or
method (or constellation of test results and clinical findings) for which a positive (abnormal) result
definitively confirms the presence of disease (also referred to as the “target condition”).12 The gold
standard used to confirm presence of disease and definitively classify study subjects as positive or
negative should be independent of the test whose diagnostic performance is under evaluation (the index
test).2,12
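As a hypothetical illustration of this comparison, sensitivity and specificity of an index test can be tabulated against independent gold standard classifications as follows:

    # Each subject is (index_test_positive, gold_standard_positive);
    # the data are hypothetical.
    subjects = [(True, True), (True, False), (False, False), (True, True),
                (False, True), (False, False), (True, True), (False, False)]

    tp = sum(1 for idx, gold in subjects if idx and gold)          # true positives
    fp = sum(1 for idx, gold in subjects if idx and not gold)      # false positives
    fn = sum(1 for idx, gold in subjects if not idx and gold)      # false negatives
    tn = sum(1 for idx, gold in subjects if not idx and not gold)  # true negatives

    sensitivity = tp / (tp + fn)   # proportion of diseased subjects detected
    specificity = tn / (tn + fp)   # proportion of non-diseased subjects ruled out
    print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")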
Diagnostic performance may also be of interest for individual tests that, while relatively nonspecific when interpreted individually, have value as part of a panel of tests, or have value when
interpreted in light of a stringent clinical decision threshold (as opposed to a wider population-based
reference interval).1
Studies of diagnostic performance should follow STARD criteria.12,13 Points to keep in mind
when designing such studies are that (a) laboratories implementing well-established methods having
known diagnostic performance may not need to make their own assessment of diagnostic performance
[however, some assessment of analytical performance is required], (b) diagnostic performance of a test is
dependent upon the decision limits used to determine whether a test is positive or negative, (c) the
definition of "disease" (the target condition) should be clinically relevant to patient quality-of-life [and
potentially logistical and/or financial considerations], and (d) inclusion and exclusion criteria for study
subjects must be carefully considered and clearly stated.1
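Point (b) above can be illustrated with a short sketch (hypothetical results): moving the decision limit trades sensitivity against specificity for the same set of measurements.

    diseased     = [7.9, 8.4, 9.1, 10.2, 11.0]   # results in diseased subjects
    non_diseased = [5.1, 6.0, 6.8, 7.5, 8.1]     # results in non-diseased subjects

    for cutoff in (7.0, 8.0, 9.0):               # "positive" = result above cutoff
        sens = sum(1 for x in diseased if x > cutoff) / len(diseased)
        spec = sum(1 for x in non_diseased if x <= cutoff) / len(non_diseased)
        print(f"cutoff {cutoff}: sensitivity = {sens:.2f}, specificity = {spec:.2f}")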
Summary
Before a new method is introduced for routine use, assessment of analytical (and sometimes
diagnostic) performance should be carried out. The main purpose of method comparison in assessing
analytic performance is to define bias (inaccuracy) relative to a comparative method or material, chosen
to represent “true” results. Authors of method comparison studies should clearly define their purpose and
quality requirements in advance of data collection, realizing that the choice of comparative method or
material depends, in part, upon intended use of the resulting bias data.1 Method comparison study data
analysis and interpretation should document not only the type and magnitude of bias, but also discuss its
clinical implications.1 Only after analytical performance has been assessed and appropriate reference
intervals and/or clinical decision thresholds established can evaluation of diagnostic performance proceed.
Diagnostic performance assessment is easiest when a clear gold standard test for disease diagnosis is
available. Authors of diagnostic performance studies should follow STARD criteria and clearly define
disease stage and the study population prior to data collection.1
References
1. Flatland B, Friedrichs KR, Klenner S. Differentiating between analytical and diagnostic
performance evaluation with a focus on the method comparison study and identification of bias.
Vet Clin Pathol 2014; in press.
2. Jensen AL, Kjelgaard-Hansen M. Diagnostic test validation. In: Weiss DJ, Wardrop KJ, eds.
Schalm’s Veterinary Hematology. 6th ed. Ames, IA: Wiley-Blackwell; 2010:1027-1033.
3. Stockham SL, Scott MA. Introductory concepts. In: Stockham SL, Scott MA. Fundamentals of
Veterinary Clinical Pathology. 2nd ed. Ames, IA: Blackwell Publishing; 2008:3-51.
4. Armbruster D. Accuracy controls: assessing trueness (bias). Clin Lab Med 2013;33:125-137.
5. Tietz NW. A model for a comprehensive measurement system in clinical chemistry. Clin Chem
1979;25:833-839.
6. Armbruster D, Miller RR. The Joint Committee for Traceability in Laboratory Medicine
(JCTLM): A global approach to promote the standardization of clinical laboratory test results.
Clin Biochem Rev 2007;28:105-113.
7. Miller WG, Myers GL, Gantzer ML, et al. Roadmap for harmonization of clinical laboratory
measurement procedures. Clin Chem 2011;57:1108-1117.
8. Jensen AL, Kjelgaard-Hansen M. Method comparison in the clinical laboratory. Vet Clin Pathol
2006;35:276-286.
9. Harr KE, Flatland B, Nabity M, Freeman KP. ASVCP Guidelines: allowable total error
guidelines for biochemistry. Vet Clin Pathol 2013;42:424-436.
10. Kenny D, Fraser CG, Hyltoft Petersen P, Kallner A. Consensus agreement. Scand J Clin Lab
Invest 1999;59:585.
11. Lester S, Harr KE, Rishniw M, Pion PD. Current quality assurance concepts and considerations
for quality control of in-clinic biochemistry testing. J Am Vet Med Assoc 2013;242:182-192.
12. Bossuyt PM, Reitsma JB, Bruns DE, et al. The STARD Statement for reporting studies of
diagnostic accuracy: explanation and elaboration. Clin Chem 2003;49:7-18.
13. Bossuyt PM, Reitsma JB, Bruns DE, et al. Towards complete and accurate reporting of studies of
diagnostic accuracy: the STARD initiative. Vet Clin Pathol 2007;36:8-12.