QI-Bench "Analyze" Scope Description (ASD)
June 2011
Rev 0.1

Required Approvals (print name / signature / date):
Author of this Revision: Andrew J. Buckler
Project Manager: Andrew J. Buckler

Document Revisions:
Revision   Revised By    Reason for Update   Date
0.1        AJ Buckler    Initial version     June 2011

Table of Contents
1. Executive Summary
   1.1. Application Purpose
   1.2. Application Scope
   1.3. The Reason Why the Application Is Necessary
   1.4. Terms Used in This Document
2. Profiles
   2.1. Information Profiles
   2.2. Functional Profiles
      2.2.1. Biostatistical Assessment of Predictive Biomarkers
      2.2.2. Biostatistical Assessment of Prognostic Biomarkers
   2.3. Behavioral Profiles
3. Conformance Assertions
4. References

1. Executive Summary

Imaging biomarkers are developed for use in the clinical care of patients and in the conduct of clinical trials of therapy. In clinical practice, imaging biomarkers are intended to (a) detect and characterize disease, before, during, or after a course of therapy, and (b) predict the course of disease, with or without therapy. In clinical research, imaging biomarkers are intended to be used in defining endpoints of clinical trials. A precondition for the adoption of a biomarker for use in either setting is the demonstration of the ability to standardize the biomarker across imaging devices and clinical centers, together with the assessment of the biomarker's safety and efficacy.

Currently, qualitative imaging biomarkers are used extensively by the medical community. Enabled by major improvements in clinical imaging, the possibility of developing quantitative biomarkers is emerging. In this document, "biomarker" refers to the measurement derived from an imaging method, and "device" or "test" refers to the hardware/software used to generate the image and extract the measurement.

Regulatory approval for clinical use [1] and regulatory qualification for research use depend on demonstrating proof of performance relative to the intended application of the biomarker:
- in a defined patient population,
- for a specific biological phenomenon associated with a known disease state,
- with evidence in large patient populations, and
- externally validated.
The use of imaging biomarkers occurs at a time of great pressure on the cost of medical services. To allow maximum speed and economy in the validation process, this strategy is proposed as a methodological framework by which stakeholders may work together.

1.1. Application Purpose

The purpose of the QI-Bench project is to aggregate evidence relevant to the process of implementing imaging biomarkers so that data of sufficient quality and quantity are generated to support the responsible use of these new tools in clinical settings. The efficiencies that follow from this approach could translate into defined processes that can be sustained to develop and refine imaging diagnostic and monitoring tools for the healthcare marketplace, enabling sustained progress in improving healthcare outcomes. Specifically, the "Analyze" application is developed to allow users to:
- characterize the method relative to its intended use, and
- apply the existing tools and/or extend them.

1.2. Application Scope

From a technology point of view, Analyze refers to the part of the project most closely associated with the Measurement Variability Toolkit portion of AVT live, as well as the ideas being discussed presently for a library of reference statistical analysis methods. Its job is to enrich the logical specification with statistical results. Most literally, Analyze would be packaged in two forms: 1) as a web service linking to the databases on the project server dev.bbmsc.com; and 2) as a local installation/instance of the functionality for more sophisticated users.

1.3. The Reason Why the Application Is Necessary

Biomarkers are useful for clinical practice only if they improve performance and to the extent that they are clinically relevant. As such, objective evidence regarding the biomarkers' relationships to health status must be established. Imaging biomarkers are usually used in concert with other types of biomarkers and with clinical endpoints (such as patient-reported outcomes (PRO) or survival). Imaging and other biomarkers are often essential to the qualification of each other.

In the past decade researchers have grappled with emerging high-throughput technologies and the data analysis problems they present. Statistical validation provides a means of understanding the results of high-throughput datasets. Conceptually, statistical validation involves associating elements in the results of high-throughput data analysis with concepts in an ontology of interest, using the ontology hierarchy to create a summarization of the result, and computing statistical significance for observed trends in such a way as to identify the performance of methods within tested contexts for use and the limits of generalizability across them. The canonical example of statistical validation is achieving sufficient statistical power to either prove or disprove a hypothesis, usually stated along with a characterization of variability under defined scenarios.

Determining the biological relevance of a quantitative imaging read-out is a difficult problem. First, it is important to establish to what extent a read-out is an intermediate endpoint, capable of being measured prior to a definitive endpoint, that is causally rather than coincidentally related. Second, given the combinatorial complexity that arises with a multiplicity of contexts for use, coupled with the cost in time and resources to interrogate that space fully by experiment, a logical and mathematical framework is needed to establish how extant study data may be used to establish performance in contexts that have not been explicitly tested.
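To make concrete the notion of "sufficient statistical power ... stated along with a characterization of variability under defined scenarios," the following Python sketch shows the kind of calculation such a statement implies. It is purely illustrative and not part of the QI-Bench specification: the function name and the example correlation are hypothetical, and a real validation plan would be built around the specific hypothesis, endpoint, and variance structure at hand.

# Illustrative only: approximate sample size needed to show that a
# biomarker-endpoint correlation is not due to chance, via the Fisher z
# transformation. All names and numbers are hypothetical placeholders.
from math import atanh, ceil
from statistics import NormalDist

def n_for_correlation(rho: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size for a two-sided test of H0: rho = 0."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for the two-sided test
    z_beta = z.inv_cdf(power)            # quantile corresponding to desired power
    c = atanh(rho)                       # Fisher z of the hypothesized correlation
    return ceil(((z_alpha + z_beta) / c) ** 2 + 3)

if __name__ == "__main__":
    # e.g., a moderate hypothesized biomarker-endpoint correlation of 0.4
    print(n_for_correlation(0.4))   # roughly 47 subjects under these assumptions

The Fisher z approximation is used here only because it keeps the arithmetic transparent; any appropriate power calculation for the hypothesis and variability scenario of interest could be substituted.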
1.4. Terms Used in This Document

The following are commonly used terms that may be of assistance to the reader.

AAS     Application Architecture Specification
ASD     Application Scope Description
BAM     Business Architecture Model
BRIDG   Biomedical Research Integrated Domain Group
caBIG   Cancer Biomedical Informatics Grid
caDSR   Cancer Data Standards Registry and Repository
CAT     Composite Architecture Team
CBIIT   Center for Biomedical Informatics and Information Technology
CFSS    Conceptual Functional Service Specification
CIM     Computational Independent Model
DAM     Domain Analysis Model
EAS     Enterprise Architecture Specification
ECCF    Enterprise Conformance and Compliance Framework
EOS     End of Support
ERB     Enterprise Review Board
EUC     Enterprise Use-case
IMS     Issue Management System (Jira)
KC      Knowledge Center
NCI     National Cancer Institute
NIH     National Institutes of Health
PIM     Platform Independent Model
PSM     Platform Specific Model
PMO     Project Management Office
PMP     Project Management Plan
QA      Quality Assurance
QSR     FDA's Quality System Regulation
SAIF    Service Aware Interoperability Framework
SDD     Software Design Document
SIG     Service Implementation Guide
SUC     System Level Use-case
SME     Subject Matter Expert
SOA     Service Oriented Architecture
SOW     Statement of Work
UML     Unified Modeling Language
UMLS    Unified Medical Language System
VCDE    Vocabularies & Common Data Elements

When using the template, extend this list with specific terms related to the particular EUC being documented.

2. Profiles

A profile is a named set of cohesive capabilities. A profile enables an application to be used at different levels and allows implementers to provide different levels of capability in differing contexts. Whereas interoperability is the metric for services, applications focus on usability (from a user's perspective) and reusability (from an implementer's). Include the following three components in each profile:
- Information Profile: identification of a named set of information descriptions (e.g., semantic signifiers) that are supported by one or more operations.
- Functional Profile: a named list of a subset of the operations defined as dependencies within this specification which must be supported in order to claim conformance to the profile.
- Behavioral Profile: the business workflow context (choreography) that fulfills one or more business purposes for this application. This may optionally include additional constraints where relevant.

Fully define the profiles being defined by this version of the application. When appropriate, a minimum profile should be defined. For example, if an application provides access to several business workflows, then one or more should be deemed essential to the purpose of the application. Each functional profile must identify which interfaces are required and, when relevant, where specific data groupings and similar details are covered.
When profiling, consider the use of your application in:
- differing business contexts,
- different localizations,
- different information models,
- partner-to-partner interoperability contexts, and
- product packaging and offerings.

Profiles themselves are optional components of application specifications, not necessarily defining dependencies in the way they define usage for services. Nevertheless, profiles may be an effective means of creating groupings of components that make sense within the larger application concept.

2.1. Information Profiles

Identify a named set of information descriptions (e.g., semantic signifiers) that are supported by one or more operations.

2.2. Functional Profiles

The most demanding standard for clinical biomarker application stems from federal drug approval agencies, which have a statutory requirement that any test have demonstrated validity and reliability. In casual scientific conversations in imaging contexts, the words reliability and validity are often used to describe a variety of properties (and sometimes the same one). The metrology view of proof of performance dictates that a measurement result is complete only when it includes a quantitative statement of its uncertainty [2,3]. Generating this statement typically involves the identification and quantification of many sources of uncertainty, including those due to reproducibility and repeatability (which themselves may be due to multiple sources). Measures of uncertainty are required to assess whether a result is adequate for its intended purpose and how it compares with alternative methods. A high level of uncertainty can limit utilization, as uncertainty reduces statistical power, especially in the multi-center trials needed for major studies. Uncertainty also compromises longitudinal measurements, especially when patients move between centers or when scanner changes occur.

2.2.1. Biostatistical Assessment of Predictive Biomarkers [4]

Biomarker qualification will be determined by analyses of data addressing treatment-induced changes in the biomarker that also correlate with the corresponding clinical outcome. The performance of the biomarker will be assessed against standard practice as a benchmark, i.e., relative to how the marker performs against established practice. Data used in the study would variously include results from the published literature, retrospectively re-analyzed data from previous clinical trials, and analyzed data from existing ongoing trials, including those based on RSNA/QIBA protocols and profiles.

The primary purpose of predictive response markers in the phase II setting is to serve as an early but accurate indicator of a promising treatment effect on survival. The key criteria proposed to judge the utility of the new endpoint primarily relate to its ability to accurately and reproducibly predict the eventual phase III endpoint for treatment effect, which is assessed by a difference between two arms on progression-free or overall survival, both at the patient level and, more importantly, at the trial level. More precisely, the measure of treatment effect on the phase II endpoint must correlate sufficiently well with the measure of treatment effect on the phase III primary endpoint such that the former can be considered reasonably predictive of the latter. It is not sufficient that the endpoint being considered for a phase II trial be a prognostic indicator of clinical outcome.
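The trial-level criterion just described can be illustrated with a minimal Python sketch. It is not part of the QI-Bench specification: the per-trial effect estimates below are hypothetical placeholders, and the unweighted Pearson correlation stands in for the weighted, meta-analytic estimators used in the cited literature.

# Illustrative only: trial-level association between treatment effects on a
# candidate early (imaging) endpoint and on the definitive phase III endpoint.
# The arrays are hypothetical per-trial effect estimates (e.g., log hazard ratios).
import numpy as np

effect_early = np.array([-0.35, -0.10, -0.48, -0.22, -0.05, -0.40])
effect_definitive = np.array([-0.28, -0.05, -0.40, -0.15, 0.02, -0.33])

# Pearson correlation of treatment effects across trials; its square is the
# "trial-level R^2" often reported in meta-analytic surrogacy evaluations.
r_trial = np.corrcoef(effect_early, effect_definitive)[0, 1]
print(f"trial-level correlation = {r_trial:.2f}, R^2 = {r_trial**2:.2f}")

# In practice each per-trial estimate carries its own uncertainty, so weighted or
# measurement-error-aware analyses from the cited literature are preferred over
# this naive correlation.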
Within the context of a clinical trial, the early endpoint must capture at least a component of treatment benefit, a concept that specifies that a change due to treatment in the early endpoint predicts a change in the ultimate clinical endpoint. Theoretical principles to define treatment benefit were outlined by Prentice [5], although capturing the full treatment benefit (as measured by the phase III endpoint) has been recognized as too cumbersome to be useful in practice [6,7]. A more practical and demonstrable criterion requires that the early endpoint capture a substantial proportion of the treatment benefit, for example, more than 50%. This approach has been used to establish the utility of endpoints such as progression-free survival (PFS) by demonstrating that they are sufficiently predictive of overall survival (OS), even if they do not satisfy the Prentice criterion [8-16]. We are primarily trying to show an association of the biomarker with the clinical endpoint that is not likely due to chance. Eventually we would like to do this better than present methods such as RECIST: how does the 50% threshold compare with RECIST? We would like to claim that FDG and VIA are better than RECIST (or an equivalent measure of disease state).

The Freedman approach involves estimating the treatment effect on the true endpoint and then assessing the proportion of that treatment effect explained by the early endpoint. However, as noted by Freedman, this approach has statistical power limitations that will generally preclude conclusively demonstrating that a substantial proportion of the treatment benefit at the individual patient level is explained by the early endpoint. In addition, it has been recognized that the proportion explained is not a true proportion, since it may exceed 100%, and that, while it may be estimated within a single trial, data from multiple trials are required to provide a robust estimate of the predictive endpoint. It can also have interpretability problems, as pointed out by Freedman. Buyse and Molenberghs also proposed an adjusted correlation method that overcomes some of these issues.
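The proportion-of-treatment-effect idea can likewise be illustrated with a short, self-contained sketch on simulated data. Everything in it is hypothetical: the variable names, the simulated effect sizes, and the use of ordinary least squares in place of the survival and mixed models employed in the cited papers.

# Illustrative only: Freedman's "proportion of treatment effect explained" (PTE)
# computed with ordinary least squares on simulated, hypothetical data.
import numpy as np

rng = np.random.default_rng(0)
n = 200
treatment = rng.integers(0, 2, size=n)                 # 0 = control, 1 = treated
early = -0.8 * treatment + rng.normal(0, 1.0, size=n)  # early imaging endpoint
outcome = 0.9 * early - 0.3 * treatment + rng.normal(0, 1.0, size=n)  # clinical endpoint

def ols_coef(y, X):
    """Least-squares coefficients with an intercept column prepended."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

beta_unadjusted = ols_coef(outcome, treatment)[1]                          # treatment alone
beta_adjusted = ols_coef(outcome, np.column_stack([treatment, early]))[1]  # adjusted for early endpoint

pte = 1 - beta_adjusted / beta_unadjusted
print(f"proportion of treatment effect explained ~ {pte:.2f}")
# As noted above, PTE is not a true proportion (it can fall outside [0, 1]) and a
# single-trial estimate is imprecise; multi-trial data are needed for robustness.

With the simulated coefficients above, the early endpoint mediates roughly 70% of the total treatment effect, so the printed value should land in that vicinity.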
2.2.2. Biostatistical Assessment of Prognostic Biomarkers [17]

The assessment framework stems from the accepted definition of a surrogate marker as a measure that can substitute for a more difficult, distant, or expensive-to-measure endpoint in predicting the effect of a treatment or therapy in a clinical trial [18]. Greatly complicating the issue is the fact that all definitions of surrogacy revolve around elucidating the joint and conditional distributions of the desired endpoint and the putative surrogate, and their dependence on a specified therapy [19-22]. Therefore, what may work adequately for a given endpoint and one type of therapy may not be adequate for the same endpoint and a different type of therapy. Disease screening calls for a prognostic marker, where it is neither necessary nor possible to anticipate all the potential therapies for which a surrogate marker might be desired. Nevertheless, as measurements are developed that capture more and more accurately the structure, functioning, and tissue metabolism of pre-symptomatic cancer, it will become more likely that proposed biomarkers are on the causal pathway to the symptomatic disease and its clinical outcomes and can function as surrogate markers for at least one element of disease.

Furthermore, the longitudinal nature of the proposed screening application allows correlation of changes within a person over time between different elements of disease, including different measures of structural change such as volumetric CT findings. So that the screening studies will support the analyses researchers may want to perform to evaluate putative biomarkers and assess their potential for surrogacy, the study design provides adequate precision for estimating the joint relationship between proposed biomarkers and desired endpoints. At the very least, investigators will be able to identify a number of promising biomarkers for use in early development of treatments that can then be tested in trials as surrogates for treatment effects. These initial objectives for surrogacy may require somewhat different validation standards than the use of surrogates by regulatory authorities in registering a new drug treatment.

Surrogacy means more than a demonstrable or even a strong association between the desired endpoint and the proposed surrogate, and original definitions have been criticized as being limited in scope and having fundamental shortcomings [23]. Recent proposals in the context of meta-analysis get more to the heart of surrogacy. By correlating changes in the surrogate with changes in a primary endpoint, these approaches more directly address the surrogacy question. These analytic techniques are equally applicable in a longitudinal setting, such as screening.

The techniques for doing so are most easily described in the context of a continuous surrogate (e.g., change in nodule volume) and a continuous outcome. Linear mixed models [24] with random slopes (or, more generally, random functions) and intercepts through time are built for both the surrogate marker and the endpoint. That is, the joint distribution of the surrogate marker and the endpoint is modeled using the same techniques as used for each variable individually. The degree to which the random slopes for the surrogate and the endpoint are correlated gives a direct measure of how well changes in the surrogate correlate with changes in the endpoint. The ability of the surrogate to extinguish the influence of potent risk factors in a multivariate model further strengthens its use as a surrogate marker. In practice, there will often be competing candidate surrogate markers, each correlated to a different degree with the endpoint. The preferred surrogate is one that is biologically defensible and most highly correlated with the endpoint. The statistical significance of the differences between such correlations can be evaluated using a parametric bootstrap [25].
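As a rough computational companion to the approach just described, the following sketch substitutes per-subject least-squares slopes for jointly estimated random slopes and correlates them. It is a simplified, hypothetical stand-in rather than the analysis prescribed here; a real evaluation would fit the joint mixed model with dedicated software and use the parametric bootstrap cited above to compare competing surrogates.

# Illustrative only: correlate per-subject slopes of a surrogate and an endpoint
# over time, as a crude stand-in for the correlation of random slopes from a
# joint linear mixed model. All data and names are hypothetical.
import numpy as np

rng = np.random.default_rng(1)
n_subjects, n_visits = 60, 5
time = np.arange(n_visits, dtype=float)

surrogate_slopes, endpoint_slopes = [], []
for _ in range(n_subjects):
    true_slope = rng.normal(0.0, 1.0)
    # Simulated longitudinal trajectories sharing a subject-level slope plus noise.
    surrogate = true_slope * time + rng.normal(0, 0.5, n_visits)
    endpoint = 0.7 * true_slope * time + rng.normal(0, 0.5, n_visits)
    # Per-subject least-squares slope (degree-1 polynomial fit).
    surrogate_slopes.append(np.polyfit(time, surrogate, 1)[0])
    endpoint_slopes.append(np.polyfit(time, endpoint, 1)[0])

r = np.corrcoef(surrogate_slopes, endpoint_slopes)[0, 1]
print(f"correlation of per-subject slopes ~ {r:.2f}")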
2.3. Behavioral Profiles

The behavioral profile describes the business workflow context (choreography) that fulfills one or more business purposes for this application. This may optionally include additional constraints where relevant.

3. Conformance Assertions

Conformance Assertions are testable, verifiable statements made in the context of a single RM-ODP Viewpoint (ISO Standard Reference Model for Open Distributed Processing, ISO/IEC IS 10746 | ITU-T X.900). They may be made in four of the five RM-ODP Viewpoints, i.e., Enterprise, Information, Computational, and/or Engineering. The Technology Viewpoint specifies a particular implementation/technology binding that is run within a "test harness" to establish the degree to which the implementation conforms to a given set of Conformance Assertions made in the other RM-ODP Viewpoints.

Conformance Assertions are conceptually non-hierarchical. However, Conformance Assertions may have hierarchical relationships to other Conformance Assertions within the same Viewpoint (i.e., be increasingly specific). They are not, however, extensible in and of themselves.

4. References

1. http://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfcfr/CFRSearch.cfm?CFRPart=820&showFR=1, accessed 28 February 2010.
2. International Organization for Standardization. Guide to the Expression of Uncertainty in Measurement. Geneva: International Organization for Standardization; 1993.
3. Joint Committee for Guides in Metrology. International Vocabulary of Metrology – Basic and General Concepts and Associated Terms. Paris: Bureau International des Poids et Mesures; 2008.
4. Sargent DJ, Rubinstein L, Schwartz L, Dancey JE, Gatsonis C, Dodd LE, Shankar LK. Validation of novel imaging methodologies for use as cancer clinical end-points. Eur J Cancer 2009;45:290–299.
5. Prentice RL. Surrogate endpoints in clinical trials: definitions and operational criteria. Stat Med 1989;8:431–40.
6. Freedman LS, Graubard BI, Schatzkin A. Statistical validation of intermediate endpoints for chronic diseases. Stat Med 1992;11:167–78.
7. Korn EL, Albert PS, McShane LM. Assessing surrogates as trial endpoints using mixed models. Stat Med 2005;24:163–82.
8. Buyse M, Burzykowski T, Carroll K, et al. Progression-free survival is a surrogate for survival in advanced colorectal cancer. J Clin Oncol 2007;25:5218–24.
9. Buyse M, Thirion P, Carlson RW, Burzykowski T, Molenberghs G, Piedbois P. Relation between tumor response to first line chemotherapy and survival in advanced colorectal cancer: a meta-analysis. Lancet 2000;356:373–8.
10. Burzykowski T, Buyse M, Piccart-Gebhart MJ, et al. Evaluation of tumor response, disease control, progression-free survival, and time to progression as potential surrogate end points in metastatic breast cancer. J Clin Oncol 2008;26:1987–92.
11. Buyse M, Molenberghs G. Criteria for the validation of surrogate endpoints in randomized experiments. Biometrics 1998;54:1014–29.
12. Burzykowski T, Molenberghs G, Buyse M, Geys H, Renard D. Validation of surrogate end points in multiple randomized clinical trials with failure time end points. Appl Stat 2001;50:405–22.
13. Burzykowski T, Molenberghs G, Buyse M. The validation of surrogate end points by using data from randomized clinical trials: a case-study in advanced colorectal cancer. J Royal Stat Soc A 2004;167:103–24.
14. Bruzzi P, Del Mastro L, Sormani MP, et al. Objective response to chemotherapy as a potential surrogate end point of survival in metastatic breast cancer patients. J Clin Oncol 2005;23:5117–25.
15. Sargent DJ, Wieand HS, Haller DG, et al. Disease-free survival versus overall survival as a primary end point for adjuvant colon cancer studies: individual patient data from 20,898 patients on 18 randomized trials. J Clin Oncol 2005;23:8664–70.
16. Sargent DJ, Patiyil S, Yothers G, et al. End points for colon cancer adjuvant trials: observations and recommendations based on individual patient data from 20,898 patients on 18 randomized trials from the ACCENT group. J Clin Oncol 2007;25:4569–74.
17. Nevitt MC, Felson DT, Lester G. The Osteoarthritis Initiative: Protocol for the Cohort Study. Osteoarthritis Initiative, version 1.1, June 21, 2006.
18. Prentice RL. Surrogate endpoints in clinical trials: definition and operational criteria. Statistics in Medicine 1989;8:431–40.
19. Buyse M, Molenberghs G, Burzykowski T, Renard D, Geys H. Statistical validation of surrogate endpoints: problems and proposals. Drug Information Journal 2000;34:447–54.
20. Freedman LS, Graubard BI, Schatzkin A. Statistical validation of intermediate endpoints for chronic diseases. Statistics in Medicine 1992;11:167–78.
21. Buyse M, Molenberghs G, Burzykowski T, Geys H, Renard D. The validation of surrogate endpoints in meta-analyses of randomized experiments. Biostatistics 2000;1:1–19.
22. Fleming TR, DeMets DL. Surrogate endpoints in clinical trials: are we being misled? Ann Intern Med 1996;125:605–13.
23. Fleming TR, DeMets DL. Surrogate endpoints in clinical trials: are we being misled? Ann Intern Med 1996;125:605–13.
24. McCulloch CE, Searle SR. Generalized, Linear, and Mixed Models. New York: Wiley; 2000.
25. Davison AC, Hinkley DV. Bootstrap Methods and Their Application. Cambridge: Cambridge University Press; 1997.