DRAFT 9/25/08
Recommendation Process
Advisory Committee on Heritable Disorders and Genetic Diseases of Newborns and Children
Purpose
The Advisory Committee on Heritable Disorders and Genetic Diseases of Newborns and Children
(the Advisory Committee) has, as one of its charges, the responsibility of making evidence-based
recommendations regarding what additional tests should be added to the current core set of
newborn screening tests (NBS), as well as evaluating and updating the current core set. To support
the steps required to meet this responsibility, the Advisory Committee has approved a process for
nominating conditions and a process for the systematic review of the evidence (SER) regarding
screening for these conditions. The purpose of this document is to outline the process for creating
a recommendation regarding adding tests to the core set based on the SER.
Indirect chain of evidence
For evaluation of the heritable conditions nominated for recommended inclusion in the uniform
NBS panel of disorders, it is unlikely that the Advisory Committee will have the evidence of
newborn screening or treatment for rare disorders that is widely considered by formal traditional
evidence reviews to be the most reliable. Such evidence usually includes peer-reviewed, large
scale, replicated intervention studies or randomized controlled trials. For most disorders, it will be
necessary to consider evidence that is a compilation of less robust data, such as modest-sized open-label clinical studies when evaluating treatment, and extrapolation from population-based observational studies, as available, when evaluating tests. This approach involves creating a chain
of evidence beginning with what is known about the condition and then moving to evaluate the
technical performance of the test, or analytic validity. Next, evidence regarding the strength of the
association between the test result and the condition of interest, or the test’s ability to accurately
and reliably identify or predict the disorder or outcome of interest, or clinical validity, must be
evaluated. Finally, the Advisory Committee must evaluate the test’s clinical utility, or the efficacy
and effectiveness of the test in directing the management of newborns and children, balancing
important health outcomes (benefits) and harms of identification and treatment. This will involve
considering whether there is adequate evidence regarding the effectiveness of treatment for the
condition, and adequate consideration of potential harm. While this approach is similar to those
used in other evidence-based recommendation processes, we recognize that allowances must be
made for the evaluation of rare disorders, which is unique in that we may not understand the
clinical significance of test results, the phenotypic expression of detected genotypes, or the full range
of potentially effective medical or other management options.
Key Questions
The chain of indirect evidence is put together with a set of key questions flowing from an analytic
framework that conceptualizes the issue under study. These questions guide the literature search, help organize the systematic evidence review, and provide a road map for the Advisory Committee
for translating the evidence into a recommendation. The Advisory Committee will need to review
the SER and determine if the evidence for each key question is convincing, adequate, or
inadequate. Figure 1 is a generic analytic framework for use by the Advisory Committee. The
numbers in the figure correspond to the key questions discussed below.
Figure 1. Analytic Framework
[Figure: a flow diagram linking the general population of newborns, through testing for the condition and identification of the condition, to treatment of the condition and, ultimately, mortality, morbidity, and other outcomes; separate boxes capture the harms of testing/identification and the harms of treatment/other interventions. Numbered arrows correspond to the key questions discussed below.]
Key question 1: Is there direct evidence that screening for the condition at birth leads to improved
health outcomes? (overarching question)
As mentioned previously, the level of evidence to support the overarching question involves
controlled intervention trials involving screen-detected individuals. Again, for many conditions considered by the Advisory Committee, it is unlikely that direct evidence will be available. The remaining key questions allow for the development of a chain of indirect evidence
that, if adequately addressed by research, can be used to support a recommendation.
Key question 2: What is known about the condition? Is the condition well-defined and
important? What is the incidence of the condition in the U.S. population? What is the spectrum of
disease for the condition? What is the natural history of the condition, including the impact of
recognition and treatment?
How well and how specifically the condition is defined is an essential piece of information that guides the rest of the evidence review and the consideration of any recommendation.
Sufficient importance can be judged by considering both incidence of the condition and the
severity of its health impact, such that a condition of lower severity can be important due to a high
incidence, and a rare condition can be important due to serious health consequences.
Understanding the spectrum of disease is essential in considering whether there are cases of the
condition for which treatment is not effective or otherwise unwarranted, which also relates to the
natural history of the condition.
Key question 3: Is there a test for the condition with sufficient analytic utility and validity?
Analytic utility involves the choice of the testing target or targets, the choice of testing platform, and the availability of and access to testing reagents, including whether these are commercially available, custom-synthesized, “home-brewed”, and/or part of current research, and whether they have right-to-use (RTU) clearance.
Analytic validity refers to the technical, laboratory accuracy of the test in measuring what it is
intended to measure. It must be distinguished from clinical validity, which is the test’s ability to
predict the development of clinical disease. For example, TMS testing may result in a pattern of
acylcarnitines and/or amino acids that is associated with a certain condition. Analytic validity
would deal with the sensitivity and specificity of the TMS testing protocol in accurately and
reliably detecting that pattern. Types of evidence for analytic validity are different from those for
clinical validity and need to address pre-analytic, analytic, and post-analytic issues. Pre-analytical
issues to evaluate include sample and reagent stability.
Consideration of the analytical phase involves evaluating accuracy (including method comparison), precision (both inter- and intra-assay), recovery, linearity, carry-over (if applicable), detection limits, signal suppression (if applicable, especially for MS/MS), intensity criteria (signal/noise), age- and gender-matched reference values (if applicable), disease range, and the cutoff level defining clinical significance (required for a 2nd-tier test). Post-analytical issues to consider include evaluation of the interpretive guidelines used to define a case, the spectrum of differential diagnoses, and the algorithm for short-term follow-up/confirmatory testing (biochemical, in vitro, and/or molecular).
The Advisory Committee will use explicit criteria for judging this evidence as adequate
(acceptable quality and sufficient number of studies) or inadequate (too much uncertainty
regarding the analytic validity). A detailed description of evaluating analytic validity, developed in part by the EGAPP (Evaluating Genomic Applications in Practice and Prevention) Working Group (EWG) of the Centers for Disease Control and Prevention (CDC), is presented in Appendix A and can serve as a starting point for discussion.
It is difficult to determine in isolation what level of analytic validity should be considered
sufficient, as the ramifications of errors from analytic validity are seen when evaluating clinical
validity. However, analytic validity is key to the dissemination of the test. The goal, of course,
would be to have very high analytic sensitivity and specificity and a high level of certainty that
testing programs across the country would be able to implement use of this test with the same level
of analytic validity.
It is possible that evidence on clinical validity will be adequate, while evidence on analytic validity
is not available or is otherwise inadequate. In such cases, it may still be acceptable for the Advisory Committee to make a positive recommendation to add the condition to the core set, though issues of dissemination and implementation will need to be carefully considered.
Key question 4: Does the test accurately and reliably detect the condition and clinical disease?
This refers to the test’s clinical validity: the ability of the test to accurately predict the development of symptomatic or clinical disease. Clinical sensitivity and specificity drive both false positives, which carry certain risks, and false negatives, which would be detected later if and when the condition became symptomatic. Key metrics to consider for clinical validity include the sensitivity, specificity, positive predictive value, and false positive rate.
There are two parts to this key question. The first is whether the evidence is sufficient to conclude that we know what the clinical validity is; this involves only a consideration of the strength and quality (taken together, the adequacy) of the evidence in the SER to determine that we know the sensitivity and specificity of the test. The second part is whether this level of clinical validity is sufficient to justify testing, given the ability of the test to detect a reasonable number of affected individuals who would be expected to manifest clinical disease, the trade-off of risks of false positives, and the benefits of early detection of true positives; these issues relate to both test performance and the incidence/prevalence of the condition. Consideration must be given
to the potential for individuals to test positive but not develop clinical disease. Issues of trade-offs
between false positives, false negatives, and identification of non-clinical conditions all impact
clinical utility. A detailed description of evaluating clinical validity, modified from the in-press
article on the EWG methods, is presented in Appendix B.
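To make these trade-offs concrete, the sketch below (in Python) shows how low prevalence depresses the positive predictive value even when clinical sensitivity and specificity are high. All numbers are hypothetical illustrations, not estimates for any actual condition or drawn from an SER.

```python
# Expected screening outcomes for a birth cohort; hypothetical inputs only.

def screening_outcomes(births, prevalence, sensitivity, specificity):
    """Expected true/false positive and negative counts, plus PPV."""
    affected = births * prevalence
    unaffected = births - affected
    tp = affected * sensitivity          # cases detected at birth
    fn = affected - tp                   # cases missed until clinical onset
    fp = unaffected * (1 - specificity)  # unaffected newborns flagged
    tn = unaffected - fp
    ppv = tp / (tp + fp)                 # chance a positive result is a true case
    return tp, fp, fn, tn, ppv

# Hypothetical rare disorder: 1 in 50,000 births, screened across a
# cohort of roughly 4 million births.
tp, fp, fn, tn, ppv = screening_outcomes(
    births=4_000_000, prevalence=1 / 50_000,
    sensitivity=0.99, specificity=0.999)
print(f"true positives:  {tp:7.0f}")
print(f"false positives: {fp:7.0f}")  # dwarfs true positives at this prevalence
print(f"false negatives: {fn:7.1f}")
print(f"PPV: {ppv:.1%}")              # about 2%: ~50 false positives per true case
```

With these hypothetical inputs, the test flags roughly 4,000 unaffected newborns for every 79 true cases detected, which is exactly the kind of trade-off that must be weighed against the benefits of early detection.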
Key question 5: Are there available treatments for the condition that improve important health
outcomes? Does treatment of the condition detected through NBS improve important health
outcomes when compared with waiting until clinical detection? Are there subsets of affected
children more likely to benefit from treatment that can be identified through testing or clinical
findings? Are the treatments for affected children standardized, widely available, and if
appropriate, FDA approved?
This question refers to clinical utility, or the ability of testing for the condition to translate to
improvements in important health outcomes, and to whether the potential benefits of testing,
diagnosis and treatment exceed the potential harms. It involves evaluating whether there are
treatments available and the effectiveness of treatment when provided for those in whom the
condition would become clinically manifest, or provided in order to decrease the risk of
developing clinical disease. It is important to note that treatment may include a broad list of
interventions including counseling and support services, beyond the narrow definition of medical
therapy.
To address this question, the Advisory Committee will need to determine the value of the proposed health outcomes under consideration. The EWG is in the process of publishing a paper on health outcomes
for consideration in evidence-based recommendations for genomic tests. This list is referenced in
the Secretary’s Advisory Committee on Genetics, Health, and Society (SACGHS) report, U.S.
System of Oversight of Genetic Testing: A Response to the Charge of the Secretary of Health and
Human Services (see table below).
These outcomes are not of equal weight or value, and it is likely that a good deal of debate in
Advisory Committee deliberations regarding clinical utility will involve balancing the tradeoffs
between different favorable and unfavorable outcomes. A detailed description of evaluating
clinical utility, modified from the in-press article on the EWG methods, is presented in Appendix
C.
Key questions 6 and 7: Are there harms or risks identified for the identification and/or treatment
of affected children?
These questions are often incompletely addressed in medical research, yet are key to allowing the
Advisory Committee to balance the potential benefits and risks when making a recommendation
regarding a condition. Included in harms are direct harms to physical health as well as other issues
including labeling, anxiety, adverse impacts on parent and family relationships, and other ethical,
legal, and social implications. At times, the Advisory Committee may need to estimate the degree
or “upper bounds” of potential harm to support decisions regarding net benefit of testing for a
condition.
Key question 8: What is the estimated cost-effectiveness of testing for the condition?
This question does not appear in the analytic framework diagram, but is a consideration that the
Advisory Committee is specifically interested in. There is little published empirical research on the cost-effectiveness of any health care service, and we would not expect to find studies involving primary data collection on the cost-effectiveness of newborn screening. Instead, this question may be
addressed in the literature through decision modeling, which can provide estimates that the
Advisory Committee will take into consideration when adopting a recommendation.
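As an illustration of the kind of decision modeling referred to above, the sketch below computes a simple incremental cost per case detected early. Every input (cohort size, prevalence, test performance, and costs) is a hypothetical placeholder rather than a figure from any evidence review; a real model would be far more detailed.

```python
# Minimal cost-effectiveness sketch for newborn screening; all inputs hypothetical.

def cost_per_case_detected_early(births, prevalence, sensitivity,
                                 specificity, cost_per_test,
                                 cost_per_confirmation):
    """Incremental cost of screening divided by cases detected early."""
    affected = births * prevalence
    screen_positives = (affected * sensitivity
                        + (births - affected) * (1 - specificity))
    total_cost = (births * cost_per_test
                  + screen_positives * cost_per_confirmation)
    early_detections = affected * sensitivity
    return total_cost / early_detections

# Hypothetical inputs: $2 marginal cost per screen, $500 confirmatory work-up.
cpc = cost_per_case_detected_early(
    births=4_000_000, prevalence=1 / 50_000,
    sensitivity=0.99, specificity=0.999,
    cost_per_test=2.0, cost_per_confirmation=500.0)
print(f"cost per case detected early: ${cpc:,.0f}")
```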
Translating evidence into recommendation categories for Advisory Committee reports
Based on the evidence report, assessment of the strength and quality of the available evidence, and
consideration of other clinical and social contextual issues, the Advisory Committee will make
recommendations to the Secretary of Health and Human Services regarding whether conditions
should be added to the core set of those recommended for newborn screening. The information is
intended to provide transparent, authoritative advice. It may also be used to promote specific
research to fill in gaps in the evidence for specific conditions. There are three elements to consider
in making the recommendation: magnitude of net benefit, overall adequacy of evidence, and
certainty of net benefit/harm.
Magnitude of net benefit
Essential factors for the development of a recommendation include: the relative importance of the outcomes considered; the health benefits associated with testing for the condition and subsequent interventions (or, if health benefits are not available from the literature, the maximum potential benefits); the harms associated with testing for the condition, such as adverse clinical outcomes, increases in risk, and unintended ethical, legal, and/or social issues that result from testing and subsequent interventions (or, if harms are not available from the literature, the maximum potential harms); and the efficacy and effectiveness of testing for the condition and follow-up compared with current practice, which might include doing nothing. Benefits and harms may include
psychosocial, familial and social outcomes. Simple decision models or outcomes tables might be
helpful in assessing the magnitudes of benefits and harms, and in estimating the net effect.
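As a sketch of such an outcomes table, the example below tallies hypothetical benefits and harms per 100,000 newborns screened and takes the difference as the net effect. The outcome categories, counts, and value weights are invented for illustration; in practice they would be central subjects of Committee deliberation.

```python
# Toy outcomes table: expected events per 100,000 screened, with value
# weights (positive = benefit, negative = harm). All entries hypothetical.
outcomes_per_100k = {
    "death or severe disability averted": (1.5, 10.0),
    "milder morbidity averted":           (3.0,  3.0),
    "false-positive family anxiety":      (80.0, -0.1),
    "treatment of non-clinical cases":    (2.0, -2.0),
}

net_effect = sum(count * weight for count, weight in outcomes_per_100k.values())
for name, (count, weight) in outcomes_per_100k.items():
    print(f"{name:38s} {count * weight:+7.1f}")
print(f"{'net effect':38s} {net_effect:+7.1f}")
```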
Consistent with the processes of other evidence-based recommendation groups, the magnitude of net benefit (benefit minus harm) can be classified as at least moderate, small, or zero/net harm. For the purposes of the Advisory Committee in making recommendations, moderate or greater net benefit will be considered “significant” and will support a recommendation to add the condition, and zero/harmful net benefit will support a recommendation to not add the condition. Those conditions
where the magnitude of net benefit is classified as small will be discussed on a case-by-case basis
and classified as either significant or not significant. A recommendation to add a condition where
testing is expected to provide only small net benefit should be supported by a high degree of
certainty based on the evidence (see certainty of net benefit below).
Overall adequacy of evidence
The adequacy of the evidence to answer the key questions can be summarized and classified
across the questions as adequate or inadequate (using explicit criteria). This is also referred to as
assessing the strength of the linkages in the chain of evidence. Adequate evidence to support a recommendation would require studies of fair or better quality addressing at least clinical utility. Insufficient
evidence would include no evidence, studies of poor quality, or studies with conflicting results.
There are six critical appraisal questions that should be used to determine adequacy of the evidence
for each key question:
1. Do the studies have the appropriate research design to answer the key
question?
2. To what extent are the studies of high quality (internal validity)?
3. To what extent are the studies generalizable to the US population (external
validity)?
4. How many studies have been done to answer the key question, and how large were they (precision of the evidence)?
5. How consistent are the studies?
6. Are there additional factors supporting conclusions?
For the evidence to be adequate to support a conclusion, most, if not all, of these questions must be answered satisfactorily.
Certainty of net benefit
Based on the summaries of the evidence for each key question and the evidence chain, the
certainty of the conclusions regarding the net benefit can be classified as sufficient or low. A
conclusion to either recommend adding or not adding the condition with sufficient certainty carries an acceptable risk of “being wrong” and thus a low susceptibility to being
overturned or otherwise altered by additional research. Insufficient certainty should not lead to a
recommendation for or against adding the condition, but should lead to a recommendation for
further research.
Finally, there are likely to be conditions where the evidence is inadequate to reach a conclusion
and make a recommendation based on at least fair evidence of clinical utility and significant net
benefit, but contextual issues support a recommendation to add the condition with a commitment
to fill in the gaps in evidence as experience with the test is gained. We recognize that these
recommendations do not meet the strict criteria of evidence-based as generally accepted, but are
“evidence-informed” or “evidence-supported”. Contextual issues might include known benefits associated with testing (and intervention) for similar conditions, a high incidence that would translate to potentially substantial net benefit, the availability of promising but as-yet-unproven new therapies, or indirect evidence of lower-value health outcomes coupled with evidence of low potential harm. These conditions will be recommended with “provisional” status. Conditions
added with a provisional status should be re-evaluated at a time when sufficient numbers of tests
have been performed such that observational data may be available to fill in the gaps in the
evidence chain. This amount time will depend on the incidence of the condition in the populations
tested.
Similarly, population-based pilot studies should be developed and implemented in order to answer
specific evidence gaps. These pilots must be applicable to U.S. populations. The decision whether to recommend a test provisionally or to refer it for pilot studies should be made with careful consideration of the potential harms associated with the premature acceptance of unproven clinical strategies, weighed against the potential health benefits, and the potential harms, of waiting for more compelling evidence.
Recommendations will be based on the level of certainty that testing will result in significant net
health benefit, based on the evaluation of the evidence. The following matrix will guide
the recommendation category.
Table 1: Decision Matrix for Advisory Committee Recommendations

RECOMMENDATION | LEVEL OF CERTAINTY | MAGNITUDE OF NET BENEFIT
Recommend adding the test to the core set | Sufficient | Significant
Recommend not adding the test to the core set | Sufficient | Zero or net harm
Recommend adding the test with “provisional” status | Insufficient, but the potential for net benefit is compelling enough to add the test now, with a commitment to evaluate the experience with the test over time | Potentially significant, supported by contextual considerations
Recommend not adding the test now, but instead recommend pilot studies | Insufficient, and additional evidence is needed to make a conclusion about net benefit | Potentially significant or unknown
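The matrix can be read as a lookup from (level of certainty, magnitude of net benefit) to a recommendation category. The sketch below encodes Table 1 that way, purely to make the decision logic explicit; the shorthand labels are ours, not official terminology.

```python
# Table 1 as a lookup; keys are shorthand for the matrix cells above.
def recommendation(certainty: str, net_benefit: str) -> str:
    matrix = {
        ("sufficient", "significant"):
            "Recommend adding the test to the core set",
        ("sufficient", "zero or net harm"):
            "Recommend not adding the test to the core set",
        ("insufficient but compelling", "potentially significant"):
            'Recommend adding the test with "provisional" status',
        ("insufficient", "potentially significant or unknown"):
            "Recommend not adding the test now; recommend pilot studies",
    }
    return matrix[(certainty, net_benefit)]

print(recommendation("sufficient", "significant"))
```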
APPENDIX A
Analytic Validity
The analytic validity of a newborn screening test is its ability to accurately and reliably measure the marker of interest, whether a specific pattern of acylcarnitines and/or amino acids (in the case of tandem mass spectrometry) or a specific genetic mutation, in the clinical laboratory and in specimens representative of the population of interest. Analytic validity includes analytic sensitivity
(detection rate), analytic specificity (1-false positive rate), reliability (e.g., repeatability of test
results), and assay robustness (e.g., resistance to small changes in pre-analytic or analytic
variables). Errors that affect analytic validity can occur throughout the testing process, and are
categorized as pre-analytic, analytic, and post-analytic. Pre-analytic errors are related to samples
(e.g., wrong sample type, insufficient amount, sample mislabeled at the source), sample handling
(e.g., transport temperature, time in transport, mix-up/mislabeling in laboratory), and data entry.
Post-analytic errors are generally related to transcription/data entry of results and laboratory
reports that contain incorrect or confusing interpretations. It has been estimated that pre- and post-analytic variables are the biggest contributors to laboratory mistakes, accounting for at least two-thirds of all errors. Studies performed on specimens that do not represent routinely analyzed
clinical specimens, and that are not subject to all aspects of the routine testing process (e.g., sample
collection, transport, processing), are not sufficient for generalizable characterization of analytic
validity.
Tests kits or reagents that have been cleared or approved by the Food and Drug Administration
(FDA) may provide information on analytic validity that is publicly available for review (e.g.,
510(k) summaries). However, a large proportion of testing offerings are currently available as laboratory-developed tests (LDTs) and have not been reviewed by the FDA.
Consequently, information from other sources must be sought and evaluated. Different tests may
use a similar methodology (such as TMS), and information regarding the analytic validity of a
common technology may be informative. However, general information about the technology
cannot be used as a substitute for specific information about the test under review.
Below is a hierarchy of study designs that have been (or could be) used to obtain unbiased and reliable information about analytic validity; it provides a quality ranking of data sources. The best information would come from collaborative studies using a single large, carefully selected panel of well-characterized control samples that are blindly tested and reported, with the results independently analyzed. Data from proficiency testing schemes (Levels 1 or 2) can provide information about all three phases of analytic validity (i.e., pre-analytic, analytic, and post-analytic) and about inter-laboratory variability.
Hierarchy of study designs/data sources
Level 1
- Collaborative study using a large panel of well-characterized samples
- Summary data from well-designed external proficiency testing schemes or inter-laboratory comparison programs
Level 2
- Other data from proficiency testing schemes
- Well-designed peer-reviewed studies (e.g., method comparisons, validation studies)
- Expert-panel-reviewed FDA summaries
Level 3
- Less well-designed peer-reviewed studies
Level 4
- Unpublished and/or non-peer-reviewed research, clinical laboratory, or manufacturer data
- Studies on performance of the same basic methodology, but used to test for a different target
The list below presents criteria for assessing the quality of individual studies on analytic validity.
The quantity of data includes the number of reports, the total number of positive and negative
controls studied, and the range of methodologies represented. The consistency of findings can be
assessed formally (e.g., by testing for homogeneity), or by less formal methods (e.g., providing a
central estimate and range of values) when sufficient data are lacking. One or more internally
valid studies do not necessarily provide sufficient information to justify routine clinical usage.
Supporting the use of a test in routine clinical practice generally requires studies that provide
estimates of analytic validity that are generalizable to use in diverse “real world” settings. Also,
existing data may support the reliable performance of one methodology, but no data may be
available to assess the performance of one or more other methodologies.
Criteria for evaluating study quality
Adequate descriptions of the index test (test under evaluation)
- Source and inclusion of positive and negative control materials
- Reproducibility of test results
- Quality control/assurance measures
Adequate descriptions of the test under evaluation
- Specific methods evaluated
- Number of positive samples and negative controls tested
Adequate descriptions of the basis for the ‘right answer’
- Comparison to a ‘gold standard’ referent test
- Consensus (e.g., external proficiency testing)
- Characterized control materials (e.g., NIST*, sequenced)
Avoidance of biases
- Blinded testing and interpretation
- Specimens represent routinely analyzed clinical specimens in all aspects (e.g., collection, transport, processing)
- Reporting of test failures and uninterpretable or indeterminate results
Analysis of data
- Point estimates of analytic sensitivity and specificity with 95% confidence intervals (see the sketch after this list)
- Sample size / power calculations addressed
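As a sketch of the confidence-interval criterion in the list above, the example below computes a Wilson score interval for analytic sensitivity estimated from a panel of positive controls. The Wilson interval is one common choice among several; the document does not prescribe a method, and the panel counts are hypothetical.

```python
import math

def wilson_ci(successes, n, z=1.96):
    """Wilson 95% score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# Hypothetical validation panel: 58 of 60 positive controls detected.
lo, hi = wilson_ci(successes=58, n=60)
print(f"analytic sensitivity: {58/60:.3f} (95% CI {lo:.3f}-{hi:.3f})")
# Small panels give wide intervals, which is why the number and
# distribution of challenges matter when grading the evidence.
```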
Finally, the evidence must be examined overall and a decision made as to whether the evidence is graded as convincing, adequate, or inadequate. When the quality of evidence is Convincing, the
observed estimate or effect is likely to be real, rather than explained by flawed study methodology,
and the conclusion is unlikely to be strongly affected by the results of future studies. When the
quality of evidence is Adequate, the observed results may be influenced by flaws in study
methodology and, as more information becomes available, the estimate or effect may change
enough to alter the conclusion. When the quality of evidence is Inadequate, the observed results
are more likely to be the result of flaws in study methodology rather than an accurate assessment,
and subsequent information is more likely to change the estimate or effect enough to change the
conclusion. Availability of only marginal quality studies always results in Inadequate quality.
The criteria for grading evidence for analytic validity are presented below.
Evidence grading for analytic validity
Convincing evidence: Studies that provide confident estimates of analytic sensitivity and specificity using intended sample types from representative populations
- Two or more Level 1 or 2 studies that are generalizable, have a sufficient number and distribution of challenges, and report consistent results
- One Level 1 or 2 study that is generalizable and has an appropriate number and distribution of challenges
Adequate evidence:
- Two or more Level 1 or 2 studies that lack the appropriate number and/or distribution of challenges, or that are consistent but not generalizable
- Modeling showing that lower-quality (Level 3, 4) studies may be acceptable for a specific, well-defined clinical scenario
Inadequate evidence:
- Combinations of higher-quality studies that show important unexplained inconsistencies
- One or more lower-quality studies (Level 3 or 4)
- Expert opinion
APPENDIX B
Clinical Validity
Clinical validity of a newborn screening test may be defined as its ability to accurately and reliably
predict the clinically defined disorder of interest. Clinical validity encompasses clinical sensitivity
and specificity, and the disorder prevalence (the proportion of individuals in the selected setting
who have, or will develop, the clinical disorder of interest). The positive and negative predictive
values can be computed from the clinical sensitivity, clinical specificity and prevalence. Other
variables important to clinical validity are penetrance (usually associated with genetic testing, this
is the proportion of individuals with a specific genotype who manifest the specific associated
phenotype; there is a similar construct for TMS patterns), expressivity (the extent to which a
specific phenotype is expressed in individuals with the associated genotype or a disorder is
expressed in an individual with the associated condition defined by the TMS pattern), and the
genetic and environmental factors that may impact the disorder or the tests.
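Because the predictive values follow directly from clinical sensitivity, clinical specificity, and prevalence, they can be computed by Bayes' rule, as in the sketch below; the inputs are hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) given clinical validity estimates and prevalence."""
    tp = sensitivity * prevalence              # affected, test positive
    fp = (1 - specificity) * (1 - prevalence)  # unaffected, test positive
    fn = (1 - sensitivity) * prevalence        # affected, test negative
    tn = specificity * (1 - prevalence)        # unaffected, test negative
    return tp / (tp + fp), tn / (tn + fn)

# Hypothetical disorder at a prevalence of 1 in 25,000 births.
ppv, npv = predictive_values(sensitivity=0.98, specificity=0.998,
                             prevalence=1 / 25_000)
print(f"PPV: {ppv:.2%}  NPV: {npv:.5%}")  # PPV stays low at rare-disorder prevalence
```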
As with analytic validity, the important characteristics defining overall quality of evidence on
clinical validity are the internal validity of individual studies, the number of studies, the
representativeness of the study population(s) compared to the population(s) to be tested, and the
consistency and generalizability of the findings. The list below provides a hierarchy of study
designs for assessing clinical validity.
Hierarchy of study designs/data sources
Level 1
- Well-designed longitudinal cohort studies
- Validated clinical decision rule*
Level 2
- Well-designed case-control studies
Level 3
- Lower-quality case-control and cross-sectional studies
- Unvalidated clinical decision rule*
Level 4
- Case series
- Unpublished and/or non-peer-reviewed research, clinical laboratory, or manufacturer data
- Consensus guidelines
- Expert opinion
* A clinical decision rule is an algorithm leading to result categorization. It can also be defined as a clinical tool that quantifies the contributions made by different variables (e.g., test result, family history) in order to determine the classification/interpretation of a test result (e.g., for diagnosis, prognosis, therapeutic response) in situations requiring complex decision-making.
The list below provides criteria adopted for grading the internal validity of studies (e.g., study
design, execution, minimizing bias). The quantity of data includes the number of studies or the
number and racial/ethnic distribution of total subjects in the studies. The overall consistency of
clinical validity estimates can be determined by formal methods such as meta-analysis. Minimally,
estimates of clinical sensitivity and specificity should include confidence intervals. In most
instances, estimates of clinical validity will be computed from small datasets focused on
individuals with the disease, or from case/control studies which may, or may not, represent the
wide range or frequency of results that will be found in the general population. However, when
tests are to be widely applied (e.g., for screening) additional data may be needed from the general
population to better quantify clinical validity prior to introduction.
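As a sketch of such a formal consistency assessment, the example below pools clinical sensitivity across three hypothetical small studies using inverse-variance weighting on the logit scale, a standard fixed-effect meta-analytic approach (one of several an evidence reviewer might use).

```python
import math

# (true positives, false negatives) from three hypothetical studies.
studies = [(18, 2), (45, 3), (27, 5)]

weights, logits = [], []
for tp, fn in studies:
    logit = math.log(tp / fn)  # logit of sensitivity tp / (tp + fn)
    var = 1 / tp + 1 / fn      # approximate variance of the logit
    logits.append(logit)
    weights.append(1 / var)

pooled_logit = sum(w * l for w, l in zip(weights, logits)) / sum(weights)
se = math.sqrt(1 / sum(weights))

def to_prob(x):
    return 1 / (1 + math.exp(-x))

pooled = to_prob(pooled_logit)
lo, hi = to_prob(pooled_logit - 1.96 * se), to_prob(pooled_logit + 1.96 * se)
print(f"pooled sensitivity: {pooled:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```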
Criteria for assessing study quality
Clear description of the disorder/phenotype and outcomes of interest
- Status verified for all cases
- Appropriate verification of controls
- Verification does not rely on index test result
- Prevalence estimates are provided
Adequate description of study design and test/methodology
Adequate description of the study population
- Inclusion/exclusion criteria
- Sample size, demographics
- Study population defined and representative of the clinical population to be tested
- Allele/genotype frequencies or analyte distributions known in the general population and in subpopulations
Independent blind comparison with appropriate, credible reference standard(s)
- Independent of the test
- Used regardless of test results
- Description of handling of indeterminate results and outliers
- Blinded testing and interpretation of results
Analysis of data
- Possible biases are identified and their potential impact discussed
- Point estimates of clinical sensitivity and specificity with 95% confidence intervals
- Estimates of positive and negative predictive values
Finally, the evidence must be examined overall and a decision made as to whether the evidence is graded as convincing, adequate, or inadequate (see Appendix A). The criteria for grading evidence of clinical validity are presented below.
Evidence grading for clinical validity
Convincing evidence: Well-designed and conducted studies in representative population(s) that measure the strength of association between a genotype or biomarker and a specific and well-defined disease or phenotype
- Systematic review/meta-analysis of Level 1 studies with homogeneity
- Validated clinical decision rule (CDR)
- High-quality Level 1 cohort study
Adequate evidence:
- Systematic review of lower-quality studies
- Review of Level 1 or 2 studies with heterogeneity
- Case-control study with good reference standards
- Unvalidated CDR (Level 2)
Inadequate evidence:
- Single case-control study
  - Non-consecutive cases
  - Lacks consistently applied reference standards
- Single Level 2 or 3 cohort/case-control study
  - Reference standard defined by the test or not used systematically
  - Study not blinded
- Level 4 data
APPENDIX C
Clinical Utility
The clinical utility of a newborn screening test refers to evidence of improved measurable clinical
outcomes, and its usefulness and added value to patient management decision-making compared to
current management without testing. If a test has utility, it means that the results, positive or negative, provide information that is of value to the person (or sometimes the individual’s family
or community) in seeking an effective treatment or preventive strategy. Clinical utility
encompasses effectiveness (evidence of utility in real clinical settings), and the net benefit (the
balance of benefits and harms). Frequently, it also involves assessment of efficacy (evidence of
utility in controlled settings). As was the case with analytic and clinical validity, the three
important quality characteristics for clinical utility are quality of individual studies and the overall
body of evidence, the quantity of relevant data, and the consistency and generalizability of the
findings. The lists below provide the hierarchy of study designs for clinical utility, and criteria for
grading the internal validity of studies (e.g., study design, execution, minimizing bias) adopted
from other published approaches.
Hierarchy of study designs/data sources
Level 1
- Meta-analysis of randomized controlled trials (RCTs)
Level 2
- A single randomized controlled trial
Level 3
- Controlled trial without randomization
- Cohort or case-control study
Level 4
- Case series
- Unpublished and/or non-peer-reviewed studies
- Clinical laboratory or manufacturer data
- Consensus guidelines
- Expert opinion
Criteria for assessing study quality
Clear description of the outcomes of interest
- What was the relative importance of the outcomes measured; which were pre-specified primary outcomes and which were secondary?
Clear presentation of the study design
- Was there a clear definition of the specific outcomes or decision options to be studied (clinical and other endpoints)?
- Was interpretation of outcomes/endpoints blinded?
- Were negative results verified?
- Was data collection prospective or retrospective?
- If an experimental study design was used, were subjects randomized?
- Were intervention and evaluation of outcomes blinded?
- Did the study include comparison with current practice/empirical treatment (value added)?
Intervention
- What interventions were used?
- What were the criteria for the use of the interventions?
Analysis of data
- Is the information provided sufficient to rate the quality of the studies?
- Are the data relevant to each outcome identified?
- Is the analysis or modeling explicit and understandable?
- Are analytic methods pre-specified, adequately described, and appropriate for the study design?
- Were losses to follow-up, and the resulting potential for bias, accounted for?
- Is there assessment of other sources of bias and confounding?
- Are there point estimates of impact with 95% confidence intervals?
- Is the analysis adequate for the proposed use?
Finally, the evidence must be examined overall and a decision made as to whether the evidence is graded as convincing, adequate, or inadequate (see Appendix A). The criteria for grading evidence of clinical utility are presented below.
Grading evidence for clinical utility
Convincing evidence: Well-designed and conducted studies in representative population(s) that assess specified health outcomes
- Systematic review/meta-analysis of RCTs showing consistency in results
- At least one large RCT (Level 2)
Adequate evidence:
- Systematic review with heterogeneity
- One or more controlled trials without randomization (Level 3)
- Systematic review of Level 3 cohort studies with consistent results
Inadequate evidence:
- Systematic review of Level 3 quality studies or studies with heterogeneity
- Single Level 3 cohort or case-control study
- Level 4 data