QI-Bench: Informatics Services for Characterizing
Performance of Quantitative Medical Imaging
System Use Case
August 16, 2011
Rev 1.0
Required Approvals:
Author of this Revision: Andrew J. Buckler
System Architect: Andrew J. Buckler

Document Revisions:
Revision 0.1 (June 15, 2011): Revised by Andrew J. Buckler. Reason for update: Stand-alone doc from EUC.
Revision 1.0 (August 16, 2011): Revised by Andrew J. Buckler. Reason for update: Updated to open Phase 4.
Table of Contents
1. INTRODUCTION
1.1. PURPOSE & SCOPE
1.2. INVESTIGATORS, COLLABORATORS, AND ACKNOWLEDGEMENTS
1.3. TERMS USED IN THIS DOCUMENT
2. DESIGN OVERVIEW
3. USE CASES
3.1. CREATE AND MANAGE SEMANTIC INFRASTRUCTURE AND LINKED DATA ARCHIVES
3.1.1. Define, Extend, and Disseminate Ontologies, Vocabularies, and Templates
3.1.2. Install and Configure Linked Data Archive Systems
3.1.3. Create and Manage User Accounts, Roles, and Permissions
3.1.4. Query and Retrieve Data from Linked Data Archive
3.2. CREATE AND MANAGE PHYSICAL AND DIGITAL REFERENCE OBJECTS
3.2.1. Develop Physical and/or Digital Phantom(s)
3.2.2. Import Data from Experimental Cohort to form Reference Data Set
3.2.3. Create Ground Truth Annotations and/or Manual Seed Points in Reference Data Set
3.3. CORE ACTIVITIES FOR BIOMARKER DEVELOPMENT
3.3.1. Set up an Experimental Run
3.3.2. Execute an Experimental Run
3.3.3. Analyze an Experimental Run
3.4. COLLABORATIVE ACTIVITIES TO STANDARDIZE AND/OR OPTIMIZE THE BIOMARKER
3.4.1. Validate Biomarker in Single Center or Otherwise Limited Conditions
3.4.2. Team Optimizes Biomarker Using One or More Tests
3.4.3. Support “Open Science” Publication model
3.5. CONSORTIUM ESTABLISHES CLINICAL UTILITY / EFFICACY OF PUTATIVE BIOMARKER
3.5.1. Measure Correlation of Imaging Biomarkers with Clinical Endpoints
3.5.2. Comparative Evaluation vs. Gold Standards or Otherwise Accepted Biomarkers
3.5.3. Formal Registration of Data for Qualification
3.6. COMMERCIAL SPONSOR PREPARES DEVICE / TEST FOR MARKET
3.6.1. Organizations Issue “Challenge Problems” to Spur Innovation
3.6.2. Compliance / Proficiency Testing of Candidate Implementations
3.6.3. Formal Registration of Data for Approval or Clearance
4. REFERENCES
1. Introduction
1.1. Purpose & Scope
Quantitative results from imaging methods have the potential to be used as biomarkers in both routine
clinical care and in clinical trials, in accordance with the widely accepted NIH Consensus Conference
definition of a biomarker.1 In particular, when used as biomarkers in therapeutic trials, imaging methods
have the potential to speed the development of new products to improve patient care. 2,3
Imaging biomarkers are developed for use in the clinical care of patients and in the conduct of clinical
trials of therapy. In clinical practice, imaging biomarkers are intended to (a) detect and characterize
disease, before, during or after a course of therapy, and (b) predict the course of disease, with or without
therapy. In clinical research, imaging biomarkers are intended to be used in defining endpoints of clinical
trials. A precondition for the adoption of the biomarker for use in either setting is the demonstration of the
ability to standardize the biomarker across imaging devices and clinical centers and the assessment of
the biomarker’s safety and efficacy.
Although qualitative biomarkers can be useful, the medical community currently emphasizes the need for
objective, ideally quantitative, biomarkers. “Biomarker” refers to the measurement derived from an
imaging method, and “device” or “test” refers to the hardware/software used to generate the image and
extract the measurement.
Regulatory approval for clinical use4 and regulatory qualification for research use depend on
demonstrating proof of performance relative to the intended application of the biomarker:
 In a defined patient population,
 For a specific biological phenomenon associated with a known disease state,
 With evidence in large patient populations, and
 Externally validated.
This document describes public resources for methods and services that may be used for the
assessment of imaging biomarkers that are needed to advance the field. It sets out the workflows that
are derived from the problem space and the goal for these informatics services as described in the Basic Story
Board.
1.2. Investigators, Collaborators, and Acknowledgements


 Buckler Biomedical Associates LLC
 Kitware, Inc.
In collaboration with:
 Information Technology Laboratory (ITL) of the National Institute of Standards and Technology
(NIST)
 Quantitative Imaging Biomarker Alliance (QIBA)
 Imaging Workspace of caBIG
It is also important to acknowledge the many specific individuals who have contributed to the
development of these ideas. Some of the most significant include Dan Sullivan, Constantine
Gatsonis, Dave Raunig, Georgia Tourassi, Howard Higley, Joe Chen, Rich Wahl, Richard Frank, David
Mozley, Larry Schwartz, Jim Mulshine, Nick Petrick, Ying Tang, Mia Levy, Bob Schwanke, and many
others <if you do not see your name, please do not hesitate to raise the issue as it is our express
intent to have this viewed as an inclusive team effort and certainly not only the work of the direct
investigators.>
1.3. Terms Used in This Document
The following are commonly used terms that may be of assistance to the reader.
BAM: Business Architecture Model
BRIDG: Biomedical Research Integrated Domain Group
BSB: Basic Story Board
caB2B: Cancer Bench-to-Bedside
caBIG: Cancer Biomedical Informatics Grid
CAD: Computer-Aided Diagnosis
caDSR: Cancer Data Standards Registry and Repository
CDDS: Clinical Decision Support Systems
CD: Compact Disc
CDISC: Clinical Data Interchange Standards Consortium
CBER: Center for Biologics Evaluation and Research
CDER: Center for Drug Evaluation and Research
CIOMS: Council for International Organizations of Medical Sciences
CIRB: Central institutional review board
Clinical management: The care of individual patients, whether they be enrolled in clinical trial(s) or not
Clinical trial: A regulatory directed activity to prove a testable hypothesis for a determined purpose
CT: Computed Tomography
DAM: Domain Analysis Model
DICOM: Digital Imaging and Communication in Medicine
DNA: Deoxyribonucleic Acid
DSMB: Data Safety Monitoring Board
ECCF: Enterprise Conformance and Compliance Framework
eCRF: Electronic Case Report Form
EKG: Electrocardiogram
EMR: Electronic Medical Records
EUC: Enterprise Use Case
EVS: Enterprise Vocabulary Services
FDA: Food and Drug Administration
FDG: Fluorodeoxyglucose
HL7: Health Level Seven
IBC: Institutional Biosafety Committee
IBE: Institute of Biological Engineering
IRB: Institutional Review Board
IOTF: Interagency Oncology Task Force
IQ: Image Query
IVD: in-vitro diagnosis
MHRA: Medicines and Healthcare Products Regulatory Agency
MRI: Magnetic Resonance Imaging
NCI: National Cancer Institute
NIBIB: National Institute of Biomedical Imaging and Engineering
NIH: National Institutes of Health
NLM: National Library of Medicine
Nuisance variable: A random variable that decreases the statistical power while adding no information of itself
Observation: The act of recognizing and noting a fact or occurrence
PACS: Picture Archiving and Communication System
PET: Positron Emission Tomography
Pharma: pharmaceutical companies
Phenotype: The observable physical or biochemical characteristics of an organism, as determined by both genetic makeup and environmental influences. 5
PI: Principal Investigator
PRO: Patient Reported Outcomes
QA: Quality Assurance
QC: Quality Control
RMA: Robust Multi-array Average
RNA: Ribonucleic Acid
SDTM: Study Data Tabulation Model
SEP: Surrogate End Point
SNOMED CT: Systematized Nomenclature of Medicine – Clinical Terms
SOA: Service-Oriented Architecture
SUC: System Use Case
Surrogate endpoint: In clinical trials, a measure of effect of a certain treatment that correlates with a real clinical endpoint but does not necessarily have a guaranteed relationship. 6
Tx: Treatment
UMLS: Unified Medical Language System
US: Ultrasound
VCDE: Vocabularies & Common Data Elements
VEGF: vascular endothelial growth factor
WHO: World Health Organization
XIP: eXtensible Imaging Platform
XML: Extensible Markup Language
2. Design Overview
The workflow for statistical validation implemented by QI-Bench is shown in Figure 1.
[Figure 1 (below) depicts the QI-Bench semantic workflow. Elements shown include: the Quantitative Imaging Specification Language, with UPICT Protocols and QIBA Profiles entered via a Ruby on Rails web service; sources of clinical study results (UPICT Protocols, QIBA Profiles, literature papers, and other sources); the Reference Data Set Manager, holding Reference Data Sets, annotations, and analysis results; batch analysis scripts (BatchMake) executed by the Batch Analysis Service; QIBO, whose red edges represent biostatistical generalizability; and output to a Clinical Body of Evidence formatted as SDTM and/or other standardized registrations.]
Figure 1: QI-Bench Semantic Workflow
This functionality is decomposed into the following free-standing, but linked, applications (Figure 2).
[Figure 2 (below) depicts the five linked QI-Bench applications:
• Specify: specify the context for use and assay methods, using consensus terms in doing so (QIBO, RadLex/Snomed/NCIt; built using Ruby on Rails).
• Formulate: assemble applicable reference data sets, including both imaging and non-imaging clinical data (caB2B, NBIA, PODS data elements, DICOM query tools).
• Execute: compose and iterate batch analyses on reference data, accumulating quantitative read-outs for analysis (MIDAS, BatchMake, Condor Grid; built using CakePHP).
• Analyze: characterize the method relative to intended use, applying the existing tools and/or extending them (MVT portion of AVT, reusable library of R scripts).
• Package: compile evidence for regulatory filings, using standards in transfer to regulatory agencies (SDTM standard of CDISC into repositories like FDA’s Janus).]
Figure 2: QI-Bench Applications
QI-Bench may be deployed at various scopes and configurations:
Figure 3: QI-Bench Web Deployment
3. Use Cases
The following sections describe the principal workflows that have been identified. The sequence in
which they are presented draws attention to the fact that each category of workflows builds on the
others. As such, it forms a rough chronology of what users do with a given biomarker over time, and
may also guide the design so that it can be implemented and staged efficiently (Fig. 4).
Figure 4: Relationship between workflow categories, illustrating the progressive nature of the
activities they describe and suggesting a possible means for efficient implementation and
staging.
3.1. Create and Manage Semantic Infrastructure and Linked Data Archives
Scientific research in the medical imaging field involves interdisciplinary teams that in general perform
separate but related tasks, from acquiring datasets to analyzing the processing results. Collaborative
activity requires that these tasks be defined and implemented with sophisticated infrastructure that ensures
interoperability and security.
The number and size of the datasets continue to increase every year due to advancements in the field. In
order to streamline the management of images coming from clinical scanners, hospitals rely on picture
archiving and communication systems (PACS). In general, however, research teams can rarely access
PACS located in hospitals due to security restrictions and confidentiality agreements. Furthermore, PACS
have become increasingly complex and often do not fit into the scientific research pipeline.
The workflows associated with this enterprise use case utilize a “Linked Image Archive” for long-term
storage of images and clinical data and a “Reference Data Set Manager” to allow creation and use of
working sets of data used for specific purposes according to specified experimental runs or analyses. As
such, the Reference Data Set is a selected subset of what is available in the Linked Data Archive (Fig. 5).
Figure 5: Use Case Model for Create and Manage Reference Data Set (architectural view)
As with the categories as a whole, individual workflows are generally understood as building
on each other (Fig. 6).
Figure 6: Workflows are presented to highlight how they build on each other.
The following sections describe the workflows to establish and manage the Linked Data Archive and
Reference Data Sets.
3.1.1. Define, Extend, and Disseminate Ontologies, Vocabularies, and
Templates
Means to structure and define lexicons, vocabularies, and interoperable architectures in support of the
creation and integration of Linked Data Archives, Reference Data Sets, Image Annotation Tool templates,
etc. are needed to establish common grammars and semantics and to provide for external interfaces.
For example, users and clinicians need to record different image metadata (quantitative and semantic),
necessitating a different “template” to be filled out while evaluating the image. Means for extending
Annotation Tool templates so that new experiments can be supported without programming on the
application (workstation) side are required. Once a template is created, any workstation can simply select it
to collect Reference Data Set annotations per the template; this will be crucial when a new cancer study is started.
A platform-independent image metadata template editor is needed to enable the community to define custom
templates for research and clinical reporting. There will ultimately be a tie-in to eCRF.
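
By way of illustration only, the following is a minimal sketch of what such a platform-independent template definition and its validation might look like, written here in Python. The template identifier, field names, and vocabulary bindings are hypothetical; the actual schema would be established by this workflow.

# Minimal sketch of a platform-independent annotation template definition.
# All field names and codes are hypothetical; a real template would bind each
# field to consensus terms (e.g., RadLex or SNOMED CT codes) managed by the
# semantic infrastructure described above.

LESION_VOLUMETRY_TEMPLATE = {
    "template_id": "lesion-volumetry-0.1",          # hypothetical identifier
    "modality": "CT",
    "fields": [
        {"name": "lesion_location", "type": "coded", "vocabulary": "RadLex"},
        {"name": "lesion_volume_mm3", "type": "float", "units": "mm^3"},
        {"name": "margin_conspicuity", "type": "coded", "vocabulary": "local"},
        {"name": "reader_confidence", "type": "integer", "range": [1, 5]},
    ],
}

def validate_annotation(annotation: dict, template: dict) -> list:
    """Return a list of problems found when checking an annotation against a template."""
    problems = []
    declared = {f["name"] for f in template["fields"]}
    for key in annotation:
        if key not in declared:
            problems.append(f"field '{key}' is not declared in {template['template_id']}")
    for field in template["fields"]:
        if field["name"] not in annotation:
            problems.append(f"required field '{field['name']}' is missing")
    return problems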
PRE-CONDITIONS
 Relationship to external knowledge sources (e.g., ontologies) is understood and links are made.
 Information model for our services has been established.
FLOW OF EVENTS
1. Scientist identifies novel measurement, observation, processing step, or statistical analysis
method.
2. Scientist defines new data structures.
3. Semantically defined interoperable data available on grid.
4. Grid users able to discover and integrate those data without coding.
5. A new user can now open the same study in a viewer and, when they annotate, see the new
annotations and markups associated with the algorithm.
POST-CONDITIONS
 Controlled vocabulary and semantic infrastructure is defined.
 Semantics are maintained.
3.1.2. Install and Configure Linked Data Archive Systems
Imaging biomarker developers have a critical need to work with as large and diverse a body of data as
possible, as early as possible in the development cycle, and then subsequently throughout it. This spans a wide range of
potentially useful imaging datasets including synthetic and real clinical scans of phantoms and clinical
imaging datasets of patients with and without the disease/condition being measured. It is also important
to have sufficient metadata (i.e. additional clinical information) to develop the algorithms and obtain early
indications of full algorithm or algorithm component performance. All data and metadata should be stored
in an electronic format easy to manipulate.
The community will subdivide obtained data into a subset that is used for development (completely
viewable) and another which is accessible through defined services (rather than completely viewable)
used to assess algorithm performance. Identification of two subsets of data that are similar in
characteristics allows the needs of the diverse stakeholders to be met. Further sub-division may be done
by users (for example, within the development team, they may choose to create subdivisions of the
available training data into “true” training vs. a sub-group which is set aside for use in internal testing).
However, the community as a whole will sequester a portion of the available data as the means for a publicly recognized testing capability. This latter use case is further developed as support for a publicly trusted
honest broker to sequester test sets and apply them within a performance evaluation regime that may be
used for a variety of purposes, as described later in the set of use cases.
PRE-CONDITIONS
 Data exists from some sources that are representative of the clinical contexts for use and that
represent to some level (even if not perfectly) the target patient population.
 Data is de-identified to remove personal health information (PHI).
 To the extent available, data is linked (or capable of being linked) across imaging and non-imaging
data elements.
 Policy is set for what data goes into the development set vs. that which is sequestered.
WORK FLOW
1. Create a network of federated databases.
.1 We need to collect as much information on the provenance (at least origin, ideally transformations at all stages of processing) of the data as possible. Users of the data should be able to put together an independent set of data with the same characteristics and end up with the same results when analyzed quantitatively.
2. Implement a secure protocol that seamlessly queries and retrieves interconnected instances. This
“mesh” will work as a communication network in which each node acts as a messaging relay.
.1 If a client requires a dataset not present on the local server, that server will automatically broadcast a query/retrieve message to the connected node, which will either relay the message or send the actual dataset if it is stored there.
3. For the development set:
.1 Provide access to public data with no security, i.e., with a guest login.
.2 Provide a mechanism to set up purpose-built collections.
.3 Provide a mechanism to create access lists for verifying whether a user should be allowed visibility to a collection.
.4 Allow the user to add data to a data basket (see the query types below in “Query and Retrieve Data from Linked Data Archive”).
4. Store sequestered data and make sure that only trusted entities can access the data stored in the
database.
.1 This sequestered data will be accessed exclusively by trusted sites that might be located remotely. In order to ensure that the site is trusted, we will encrypt and sign the data using public-key cryptography (a minimal sketch of this step follows the workflow list below). This mechanism is highly secure and ensures not only the integrity of the data transferred but also that only the intended recipient can open the data. Furthermore, the transfer channel between the two sites will be encrypted using secure-socket layer (SSL) encryption for further security.
.2 Provide a set of services (but not the direct data) to users.
.3 Provide means and policy to refresh sequestered test sets.
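
As a minimal sketch of step 4.1 (not the project’s actual implementation), signing and wrapping a sequestered dataset might look as follows using the Python cryptography package; key management, certificate handling, and the SSL transport channel are omitted.

# Minimal sketch of step 4.1: sign a sequestered dataset and encrypt it so that
# only the intended trusted site can open it. Key management, certificates, and
# the SSL transport channel are intentionally omitted.
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

def protect_dataset(data: bytes, sender_private_key, recipient_public_key):
    # Sign the payload so the recipient can verify integrity and origin.
    signature = sender_private_key.sign(
        data,
        padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                    salt_length=padding.PSS.MAX_LENGTH),
        hashes.SHA256(),
    )
    # Encrypt the (potentially large) payload with a fresh symmetric key,
    # then wrap that key with the recipient's public key.
    symmetric_key = Fernet.generate_key()
    ciphertext = Fernet(symmetric_key).encrypt(data)
    wrapped_key = recipient_public_key.encrypt(
        symmetric_key,
        padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                     algorithm=hashes.SHA256(), label=None),
    )
    return ciphertext, wrapped_key, signature

# Example usage with throwaway keys (a real deployment would use managed key pairs):
sender_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
recipient_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
package = protect_dataset(b"example DICOM payload", sender_key, recipient_key.public_key())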
POST-CONDITIONS
 Appropriate access controls are in place and understood.
 Formal data sharing agreements are not required, or are in place, for exploratory cross-center
analysis.
 Data is in the archive and may be queried and read out.
 Informatics services are defined and available for users to access the data sets, directly or
indirectly as provided for in the sequestering policy.
3.1.3. Create and Manage User Accounts, Roles, and Permissions
Figure 7: Use Case Model for Create and Manage Reference Data Set (activity relationships)
PRE-CONDITIONS
 Architectural elements are implemented and capable of being maintained.
WORK FLOW
The IT personnel use the development and customization interface to prepare the tool support for the study
according to the tool adaptation specifications. This includes, at a minimum:
 customizing the Image Annotation Tool representation of annotations,
 customizing the de-identification tools,
 customizing the annotation tool,
 customizing the statistical analysis tool,
 configuring the internal databases,
 installing the tools and data at all the sites where the team members work, and
 adapting the clinical trial management system interface.
POST-CONDITIONS
 Operation may proceed.
3.1.4. Query and Retrieve Data from Linked Data Archive
Once digital media are archived in the system, they are made accessible to other research units, and
several methods of transfer are available: either via the internet or via file sharing. The stored digital
content can be searched, visualized, and downloaded directly from the system.
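
As an illustration only, a script-level client for such search and download services might look like the following; the endpoint URL, query parameters, and response format are hypothetical and would be defined by the deployed archive interface.

# Minimal sketch of querying the Linked Data Archive and downloading matching
# datasets over HTTP. The endpoint URL, query parameters, and response layout
# are hypothetical; the deployed service defines the actual interface.
import requests

ARCHIVE_URL = "https://archive.example.org/api"   # hypothetical deployment

def find_annotated_series(organ: str, phenotype: str) -> list:
    """Query for annotated image series matching an organ and phenotype term."""
    response = requests.get(
        f"{ARCHIVE_URL}/series",
        params={"organ": organ, "phenotype": phenotype, "annotated": "true"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()

def download_series(series_id: str, destination: str) -> None:
    """Stream one series (e.g., a DICOM archive) to a local file."""
    with requests.get(f"{ARCHIVE_URL}/series/{series_id}/download",
                      stream=True, timeout=300) as response:
        response.raise_for_status()
        with open(destination, "wb") as handle:
            for chunk in response.iter_content(chunk_size=1 << 20):
                handle.write(chunk)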
PRE-CONDITIONS
 Analysis goal has been defined.
 Type of query and any necessary parameters are determined.
FLOW OF EVENTS
1. If not already done, establish appropriate data elements and concepts to represent the query
grammar by following workflow “Extend and Disseminate Ontologies, Templates, and
Vocabularies.”
2. If not already done, establish a base of data from which query will be satisfied by following
workflow “Install and Configure Linked Data Archive Systems.”
3. User is looking for examples of annotated images of a certain study.
4. User seeks annotated examples of radiology images for specific tissue or organ.
5. Rare event detection (e.g., rare lesions) – computer triages images and identifies images that
require secondary review
6. User is looking for examples of annotated images of a certain phenotype.
7. Content-based image retrieval (content-based image retrieval implies support by computer
methods for complex queries, performing potentially dynamic image classification according to
image annotations).
POST-CONDITIONS
 Required data have been defined and collected.
3.2. Create and Manage Physical and Digital Reference Objects
The first and critical building block in the successful implementation of quantitative imaging biomarkers is
to establish the quality of the physical measurements involved in the process. The technical quality of
imaging biomarkers is assessed with respect to the accuracy and precision of the related physical
measurement(s). The next stage is to establish clinical utility (e.g., by sensitivity and specificity) in a
defined clinical context of use. Consequently, NIST-traceable materials and objects are required to meet
the measurement needs, guidelines and benchmarks. Appropriate reference objects (phantoms) for the
technical proficiency studies with respect to accuracy and precision, and well-curated and characterized
clinical Reference Data Sets with respect to sensitivity and specificity must be explicitly identified.
Individual workflows are generally understood as building on each other (Fig. 8).
Figure 8: Workflows are presented to highlight how they build on each other.
The following sections describe the workflows to Create and Manage Physical and Digital Reference
Objects.
3.2.1. Develop Physical and/or Digital Phantom(s)
Imaging phantoms, or simply "phantoms", are specially designed objects that are scanned or imaged in
the field of medical imaging to evaluate, analyze, and tune the performance of various imaging devices.
These objects are more readily available and provide more consistent results than the use of a living
subject or cadaver, and likewise avoid subjecting a living subject to direct risk. Phantoms were originally
employed for use in 2D x-ray based imaging techniques such as radiography or fluoroscopy, though more
recently phantoms with desired imaging characteristics have been developed for 3D techniques such as
MRI, CT, Ultrasound, PET, and other imaging methods or modalities.
A phantom used to evaluate an imaging device should respond in a similar manner to how human tissues
and organs would act in that specific imaging modality. For instance, phantoms made for 2D radiography
may hold various quantities of x-ray contrast agents with similar x-ray absorbing properties to normal
tissue to tune the contrast of the imaging device or modulate the patient's exposure to radiation. In such a
case, the radiography phantom would not necessarily need to have similar textures and mechanical
properties since these are not relevant in x-ray imaging modalities. However, in the case of
ultrasonography, a phantom with similar rheological and ultrasound scattering properties to real tissue
would be essential, but x-ray absorbing properties would not be needed.7
PRE-CONDITIONS
 Characteristics of the targeted biology are known.
 Interaction between biology and the intended imaging process are sufficiently understood to
discern what is needed to mimic expected behavior.
WORK FLOW
1. Acquire or develop reference object(s) and other support material(s) for controlled experimentation
and ongoing quality control (QC).
2. Describe phantom and other controlled condition support material for “stand-alone” assessment
and required initial and ongoing quality control specifics.
3. Address issues including manufacturing process, shelf life, physical standards traceability, and
shipping.
4. Implement comprehensive QC program, including the process for initial acceptance of an imaging
device as well as required ongoing QC procedures, data analysis, and reporting requirements.
POST-CONDITIONS
 Phantoms exist and are capable of being deployed at centers and/or shipped as part of proficiency
testing protocols.
3.2.2. Import Data from Experimental Cohort to form Reference Data Set
This Use Case includes the activities necessary for defining and assembling analysis datasets that draw
from the totality of data stored in the Linked Data Archive. Typically, the datasets are generated for
exploratory queries used in the presentation of comparative data. The Reference Data Set Manager
integrates with grid computing environments, giving the different research groups a universal interface
to several distributed computing environments. It is understood that this interface is provided through
the open-source batch-processing tool called the Batch Analysis Service.
In this workflow, we utilize Reference Data Set Manager, a web-based digital repository tuned for medical
and scientific datasets that provides a flexible data management facility, a search engine, and an online
image viewer. The Reference Data Set Manager allows users to store, manage and share scientific
datasets, from a web browser or through a generic programming interface, thereby facilitating the
dissemination of valuable imaging datasets to research collaborators.
By way of example, a core set of data needed to develop a cancer therapy assessment algorithm is:
1. A set of images with known ground truth (e.g. anthropomorphic phantom).
.1
This dataset would ideally consist of real or simulated scans of a Reference Data Set of
physical objects with known volumetric characteristics.
.2
A range of object sizes, densities, shapes, and lung attachment scenarios should be
represented.
.3
A range of image acquisition characteristics should be obtained including variation in slice
thickness, tube current, and reconstruction kernel.
.4
Metadata must contain the location and volumetric characteristics of all objects and any
additional information on their surrounding or adjacent environment.
2. A set of clinical images where outcome has been determined.
.1
This dataset would ideally consist of longitudinal scans of a large and diverse Reference
Data Set of patients using many different image acquisition devices and image acquisition
parameters.
.2
The location and volumetric assessment of all lesions within each longitudinal acquisition
must be established by an independent method, such as the assessment of multiple expert
readers. This should include the localization and volumetric estimation of new lesions.
.3
Metadata should at a minimum contain the location and independent volumetric assessment
of all lesions, including the location of new lesions. Additional information on the variance of
the independent volumetric assessment should also be available.
.4
Additional metadata, such as the clinical characteristics of the patients (e.g. age, gender),
classification of cancer (e.g. small cell) and lesion types (e.g. solid, non-solid), lesion
attachment scenarios (e.g. lung pleura, major vessels), and cancer therapy approach,
magnitude and duration, would also be useful to algorithm developers as they determine the
strengths and weaknesses of different algorithmic methods.
PRE-CONDITIONS
 Experimental question is defined.
 The linked data archive from which the Reference Data Set will be assembled has been determined.
WORK FLOW
The activity consists of sub-activities, with data flows between them.
1. Determine inclusion / exclusion criteria that determine whether a particular patient, series, or tumor
will be included in the study:
General criteria may include: Case closed? N usable follow-up series? Patient outcome known?
Summary assessment of progression available?
Image criteria may include: Acquisition parameters, Image quality assessment.
Tumor criteria may include: type of tumor, grade of tumor, genomic information of the tumor,
proteomic information of the tumor.
Demographic information may include: Age – at the time of image acquisition, Sex.
Clinical criteria might include: Excision findings? Survival, Pathology criteria, Histology criteria,
Past Radiology criteria, CEA criteria (a blood biomarker), other blood biomarkers, other
biomarker criteria, cancer-specific staging, trial-specific assessment of change, Stage of
Cancer, Tumor-Node-Metastases, Pain, Difficulty breathing, Peak flow meter readings, Quality
of life measurements.
Only a subset of these will be used as inclusion criteria on any specific study. Some depend on
what kind of malignancy is being studied.
Note that the clinical criteria might be used for selecting patients and for documenting patient
population characteristics even if the study does not correlate image measurements with other
clinical indicators.
2. Apply pre-screening criteria to prepare a list of candidate patients for inclusion in the study,
possibly using workflow “Query and Retrieve Data from Linked Data Archive.”
3. Select cases for the Reference Data Set according to the inclusion criteria.
4. Once the selection of the input datasets and parameters is performed for each stage of the
pipeline, validate the workflow to make sure it satisfies the global requirements and store it for
archiving and further editing.
5. Avoid downloading large Reference Data Sets every time an experiment is run. The first
time the experiment is run, query the Image Archive for the selected datasets and cache them
locally for further retrieval.
6. If a dataset is not stored locally (for example, if the cache is full) automatically fetch the data and
transfer it to the client.
7. Store the metadata information associated with a given dataset as well as a full data provenance
record.
8. Retrieve clinical data from CTMS.
.1
Review clinical data for quality and criteria.
9. Audit the selection process to assure the quality of the selections and, for example, to check for
bias in the selections. Approaches to bias reduction:
.1
Define the hypotheses that we would seek to explore using the dataset and thereafter to
design the experiments that would allow these explorations.
.2
Establish rules for the fraction of cases that are submitted, e.g., the first 70% enrolled in
chronological order
.3
A masked index of all the subjects in all arms (active intervention arm(s) and control arm) of multiple trials from multiple companies that meet the selection criteria (imaging modality, anatomic area of concern, tumor type, etc.) could be assembled. Using a random number scheme, a predetermined number of subjects from the entire donated set of subjects could be selected, perhaps with mechanisms to ensure that no single trial or vendor dominates the Reference Data Set (a minimal sketch of such a selection scheme follows this list). The data would be de-identified as to corporate source, trial, and of course subject ID.
.4
Since the universe from which the subjects would be chosen is complete (hopefully including
both successful and unsuccessful trials) and the selection is randomized, we should have
eliminated many of the potential sources of bias. Since the data would be pooled and deidentified but not attributable, the anxiety related to releasing proprietary information should
be lessened too.
10. Capture user comments needed to explain any non-obvious decisions or actions taken, such as
when it is unclear whether a case meets the selection criteria.
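
A minimal sketch of the randomized, masked selection outlined in step 9.3 is shown below; the record layout and the per-trial cap are illustrative assumptions, not a prescribed policy.

# Minimal sketch of the randomized, masked selection described in step 9.3.
# The subject record layout is illustrative; real selections would work from a
# masked index supplied by the honest broker.
import random
from collections import defaultdict

def select_subjects(masked_index, n_required, max_fraction_per_trial=0.3, seed=12345):
    """Draw n_required subjects at random, capping each trial's contribution."""
    rng = random.Random(seed)            # fixed seed so the draw is auditable
    cap = int(max_fraction_per_trial * n_required)
    per_trial = defaultdict(int)
    selected = []
    for subject in rng.sample(masked_index, k=len(masked_index)):
        if len(selected) == n_required:
            break
        if per_trial[subject["trial_id"]] >= cap:
            continue                     # prevent one trial/vendor from dominating
        per_trial[subject["trial_id"]] += 1
        selected.append(subject["masked_subject_id"])
    return selected

eligible = [{"masked_subject_id": f"S{i:04d}", "trial_id": f"T{i % 7}"} for i in range(500)]
cohort = select_subjects(eligible, n_required=100)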
POST-CONDITIONS
 One or more Reference Data Sets are assembled, each with a defined purpose and workflow.
 The Reference Data Set(s) include all linked data necessary for the objectives (e.g., non-imaging
data also).
3.2.3. Create Ground Truth Annotations and/or Manual Seed Points in
Reference Data Set
There are essentially two types of experimental studies supported. One is a correlative analysis where
an imaging measure is taken and a coefficient of correlation is computed with another parameter, such as
clinical outcome. A second type is where a measure of accuracy is computed, which necessitates a
representation of the “ground truth,” whether this be defined in a manner traceable to physical standards
or alternatively by a consensus of “experts” where physical traceability is not possible or not feasible.
This workflow is utilized in the latter case.
PRE-CONDITIONS
 One or more Reference Data Sets have been assembled.
 Definition of what constitutes “ground truth” for the data set is established and has been checked
as to its suitability for the experimental objective it will support.
WORK FLOW
1. If not already done, establish appropriate data elements and concepts to represent the annotations
by following workflow “Extend and Disseminate Ontologies, Templates, and Vocabularies.”
2. If not already done, create a Reference Data Set by following workflow “Import Data from
Experimental Cohort to Form Reference Data Set.”
3. The investigators define annotation instructions that specify in detail what the radiologist/reader
should do with each type of reading task.
4. Trans-code imported data into Image Annotation Tool, manually cleaning up any data that cannot
easily be trans-coded automatically.
5. Create nominal ground truth annotations. (This differs from ordinary reading tasks by removing any
tool restrictions and by allowing the reader a lot more time to do the job right. It may entail
presenting several expert markups for comparison, selection, or averaging.)
.1
The investigators assign reading tasks to radiologist/readers, placing a seed annotation in
each task, producing worklists.
.2
The radiologist/readers prepare seed annotations for each of the qualifying biological
features (e.g., tumors) in each of the cases, attaching the instructions to each seed
annotation and assuring that the seed annotations are consistent with the instructions.
.3 The radiologist/readers annotate according to a reference method (e.g., RECIST), to allow comparative studies should that be within the objectives of the experiment on this Reference Data Set.
.4 Inspect and edit annotations, typically as XML, to associate them with other study data.
6. Record audit trail information needed to assure the validity of the study.
POST-CONDITIONS
 The Reference Data Set has been annotated with properly defined and implemented “ground truth”
and/or manual “seed points” as defined for the experimental purpose of the set.
3.3. Core Activities for Biomarker Development
In general, biomarker development is the activity of finding and utilizing signatures for clinically relevant
hallmarks with known (and attractive) bias and variance, e.g., signatures indicating apoptosis, reduction,
metabolism, proliferation, angiogenesis, or other processes evident in ex-vivo tissue imaging that may
cascade to the point where they affect organ function and structure. The aim is to validate phenotypes that
may be measured with a known (and attractive) confidence interval. Such image-derived metrics may involve the
extraction of lesions from normal anatomical background and the subsequent analysis of this extracted
region over time, in order to yield a quantitative measure of some anatomic, physiologic or
pharmacokinetic characteristic. Computational methods that inform these analyses are being developed
by users in the field of quantitative imaging, computer-aided detection (CADe) and computer-aided
diagnosis (CADx).8,9 They may also be obtained using quantitative outputs, such as those derived from
molecular imaging.
Individual workflows are generally understood as building on each other (Fig. 9).
Figure 9: Workflows are presented to highlight how they build on each other.
The following sections describe workflows for Core Activities for Biomarker Development.
3.3.1. Set up an Experimental Run
It is perhaps helpful to describe two fundamental types of experiments: those that call for new acquisitions
(and post-processing), vs. those which utilize already acquired data (only require the post-processing).
However, in the most flexible generality, these are both understood simply as how one defines the
starting point and which steps are included in the run. As such, they may both be supported using the
same toolset. In the former case, there is need to support means by which the physical phantom or other
imaging object is present local to a device, where the tool provides “directions” for how and when the
scans are done with capture of the results; in the latter case, the tool must interface with the candidate
implementation and may run local to the user or remote from them. In either case, the fundamental
abstraction of an imaging pipeline is useful, where the notion is that any given experiment describes the
full pipeline but focuses on granularity at the stage of the pipeline of interest (Fig. 10).
[Figure 10 (below) depicts an example pipeline for assessing change in tumor burden: images are obtained per time point (patient prep; acquire; imaging agent, if any; reconstruct and post-process); lesion volume is calculated per time point (vt); volumes are subtracted to yield the volume change per target lesion (Δvt); and per-lesion changes are interpreted as the change in tumor burden by volume (ΔT.B.). Alternatively, images may be processed directly to analyze change.]
Figure 10: Example technical description of a biomarker, elaborating the pipeline. The read-outs
subject to statistical analysis as well as the pipeline steps are given.
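
To make the read-outs in Figure 10 concrete, the sketch below computes the per-lesion volume change and the change in tumor burden by volume from per-lesion volumes at two time points; the input structure and values are illustrative, not taken from any actual study.

# Minimal sketch of the read-outs named in Figure 10: per-lesion volume change
# (delta v_t) and change in tumor burden by volume (delta T.B.). The input
# structure (lesion id -> volume in mm^3 per time point) is illustrative.

def volume_changes(baseline: dict, followup: dict) -> dict:
    """Per-target-lesion volume change between two time points."""
    return {lesion: followup[lesion] - baseline[lesion]
            for lesion in baseline if lesion in followup}

def tumor_burden_change(baseline: dict, followup: dict) -> float:
    """Change in tumor burden by volume, summed over target lesions."""
    return sum(volume_changes(baseline, followup).values())

baseline_volumes = {"lesion-1": 1520.0, "lesion-2": 640.0}   # mm^3, illustrative
followup_volumes = {"lesion-1": 1210.0, "lesion-2": 705.0}
print(volume_changes(baseline_volumes, followup_volumes))
print(tumor_burden_change(baseline_volumes, followup_volumes))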
The Batch Analysis Service is an open-source, cross platform tool for batch processing large amounts of
data. The Batch Analysis Service can process datasets either locally or on distributed systems. The Batch
Analysis Service uses a scripting language with a specific semantic to define loops and conditions. The
Batch Analysis Service provides a scripting interface to command line applications. The arguments of the
command line executable can be wrapped automatically if the executable is able to produce a specific
XML description of its command line arguments. Manual wrapping can also be performed using the
“Application Harness” interface provided by The Batch Analysis Service.
The Batch Analysis Service allows users to upload scripts with the associated parameter descriptions
directly into the Reference Data Set Manager. Thus, when a user decides to process a Reference Data
Set using a given Batch Analysis Service script, the Reference Data Set Manager automatically parses the
parameter description and generates an HTML page. The HTML page provides an easy way to enter
tuning parameters for the given algorithm, and once the user has specified the parameters, the Batch
Analysis Service configuration file is generated and the script is run on the selected Reference Data
Set(s). This flexibility allows Processing Pipelines to be shared easily between organizations, since the Batch
Analysis Service script describes the pipeline.
PRE-CONDITIONS
 In cases where the experimental objective requires a physical or digital phantom, such is available.
WORK FLOW
1. Define the biomarker in terms of pipeline steps as well as read-outs that will be subject of analysis.
2. If not already done, establish appropriate data elements and concepts to represent the pipeline and
the annotations on which it will operate by following workflow “Extend and Disseminate Ontologies,
Templates, and Vocabularies.”
3. Depending on the experimental objective, two basic possibilities exist for creating the image
markups:
.1
The user will implement the Application Harness API to interface a candidate biomarker
implementation to the Batch Analysis Service.
.2
Alternatively, the reference implementation for the given biomarker will be utilized.
4. Depending on the experimental objective, means for ensuring availability of physical and/or digital phantoms are undertaken.
5. Based on the pipeline definition, describe the experimental run to include, for example, set up for
image readers so as to guide the user through the task, and save the results.
.1 It could, for example, create an observation for each tumor, each in a different color, or one tumor per session, depending on the experiment design. This activity also tailors the order in which the steps of the reading task may be performed.
.2
Design an experiment-specific tumor identification scheme and install it in the tumor
identification function in the task form preparation tool.
.3
Define the types of observations, characteristics, and modifiers to be collected in each
reading session and program them into the constrained-vocabulary menus. (May only be
done in ground truth annotations in early phases.)
.4
Determine whether it will be totally algorithmic or will have manual steps. There are a
number of reasons for manual steps, including for example if an acquisition is needed,
whether the reader chooses his/her own seed stroke, whether manual improvements are
permitted after the algorithm has made the attempt, or combinations of these. Allow for
manual, semi-automatic, and automatic annotation.
.5
Specify the sequence of steps associated with acquisitions, for example of a phantom
shipped to the user, of new image data that will be added to the Reference Data Set(s).
.6
Specify the sequence of steps that the reader is expected to perform on each type of reading
task for data in the Reference Data Set(s).
6. Customize the analysis tool.
.1
Analyze data with known statistical properties to determine whether custom statistical tools
are producing valid results.
.2
Customize built-in statistical methods
.3
Add the measures, summary statistics, outlier analyses, plot types, and other statistical
methods needed for the specific study design.
.4
Load statistical methods into analysis tool
.5
Configure presentation of longitudinal data
.6
Customize outlier analysis
.7
Configure the report generator to tailor the formats of exported data views. The report
generator exports data views to files according to a run-time-configured list of the data views
that should be included in the report.
7. Install databases and tools as needed at each site, loading the databases with initial data. This
includes installing image databases at all actor sites, installing clinical report databases at all
clinical sites, and installing annotation databases at Reader, Statistician, and PI sites.
8. Represent the Processing Pipeline so that it may be easily shared between organizations.
POST-CONDITIONS
 One or more Processing Pipeline scripts are defined and may be executed on one or more
Reference Data Sets.
3.3.2. Execute an Experimental Run
Once the Processing Pipeline script is written, the Batch Analysis Service can run the script locally or
across distributed processing jobs expressed as a directed-acyclic graph (DAG). When distributing jobs,
the Batch Analysis Service first creates the appropriate DAG based on the current input script. By default,
the Batch Analysis Service performs a loop unrolling and considers each scope within the loop as a
different job which may ultimately be distributed on different nodes. This allows distributing independent
jobs automatically (as long as each iteration is considered independent). The Batch Analysis Service also
provides a way for the user to specify if a loop should be executed sequentially instead of in parallel.
Before launching the script (or jobs) on the grid, the user can visualize the DAG to make sure it is correct.
Then, when running the job, the user can monitor the progress of each job distributed with the Batch
Analysis Service.
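
To illustrate the loop unrolling described above, the sketch below expands a per-case loop into independent job specifications with an explicit dependency structure, i.e., a small DAG; the job record format and script names are illustrative, and the Batch Analysis Service constructs its own DAG directly from the script.

# Minimal sketch of loop unrolling into a directed-acyclic graph of jobs.
# Each iteration of the per-case loop becomes an independent job; a single
# "analyze" job depends on all of them. The job record format is illustrative.

def unroll_to_dag(case_ids, script="measure_volume.bms"):
    jobs = {}
    for case_id in case_ids:                      # one independent job per loop iteration
        jobs[f"process-{case_id}"] = {
            "command": [script, "--case", case_id],
            "depends_on": [],
        }
    jobs["analyze"] = {                           # runs only after every per-case job
        "command": ["collect_readouts.py"],
        "depends_on": [f"process-{c}" for c in case_ids],
    }
    return jobs

def ready_jobs(jobs, finished):
    """Jobs whose dependencies are all satisfied; these may run in parallel."""
    return [name for name, job in jobs.items()
            if name not in finished and all(d in finished for d in job["depends_on"])]

dag = unroll_to_dag(["case-001", "case-002", "case-003"])
print(ready_jobs(dag, finished=set()))            # the three per-case jobs
print(ready_jobs(dag, finished={"process-case-001", "process-case-002", "process-case-003"}))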
The Biomarker Evaluation GUI provides online monitoring of the grid processing in real time. Results of
the processing at each node of the grid are instantly transferred back to the Biomarker Evaluation GUI
and can be used to generate dashboards for batch processing. This permits the user to quickly check the
validity of the processing by comparing the results with known baselines. Each row of the dashboard
corresponds to a particular processing stage and is reported in red if the result does not meet the
validation criterion.
PRE-CONDITIONS
 A Processing Pipeline script is defined and/or conceived that may be executed on one or more
Reference Data Sets.
 One or more Reference Data Set(s) have been assembled and/or conceived for batch analysis.
 In cases where the experimental objective requires “ground truth”, “gold standard”, and/or “manual
seed points”, then such annotations have been created through workflow “Create Ground Truth
Annotations and/or Manual Seed Points in Reference Data Set”.
WORK FLOW
1. If not already done, set up a processing pipeline script by following workflow “Set up an
Experimental Run.”
2. If not already done, create a Reference Data Set by following workflow “Import Data from
Experimental Cohort to Form Reference Data Set” (optionally followed by workflow “Create Ground Truth
Annotations and/or Manual Seed Points in Reference Data Set”).
3. Create electronic task forms for each of the manual tasks (e.g., acquisition or reading) to be
analyzed in the study.
4. Fully-automated methods can just run at native compute speed. For semi-automated methods:
.1
Assign each task form to one or more human participants, organizing the assignments into
worklists that could specify, for example, when each task is performed and how many tasks
are performed in one session.
5. Support two levels of Application Harness:
.1
Full patient work-up: candidate algorithm or method encompasses not only individual tumor
mark-up but also works across all tumors – just call the algorithm with the whole patient data
.2
Individual tumor calculations: candidate algorithm expects to be invoked on an individual
tumor basis – requires implementation of a reference implementation for navigating across
measurable tumors and call the algorithm on each one
6. Upload Processing Pipelines and run them on selected Reference Data Sets.
.1 Translate to a grid job description and send it to the distributed computing environment along with the input datasets.
.2
Integrate input data and parameters, processing tools and validation results.
.3
For semi-automated methods which require a “user in-the-loop”, the manual process will be
performed beforehand and manual parameters will be passed to the system via XML files
(e.g., seed points) without any disruption of the batch processing workflow.
.4 Automatically collect the parameters, data, and results during each stage of the Processing Pipeline and store the results in the database for further analysis.
.5 Develop an interface to the statistical packages and the evaluation system. When generating a distributed computing job list, ensure that the post-processing information (values and datasets) is collected and that a post-processing package is created and sent via the REST protocol.
.6 Record all the input parameters, machine specifications, input datasets, and final results and store them in the database. At the end of the experiment, the user should be able to access the processed data and visualize the results via a web interface.
.7
Integrate with grid computing technology to enable efficient distributed processing.
.8
Process datasets either locally or on distributed systems.
.9
Use a scripting language with a specific semantic to define loops and conditions.
.10 Provide a scripting interface to any command line applications.
.11 Perform a loop unrolling and consider each scope within the loop as a different job, which
will be ultimately distributed on different nodes.
.12 From a single script, translate a complete workflow to a set of job requirements for different
grid engines.
.13 Run command line executables associated with a description of the expected command line
parameters. This pre-processing step is completely transparent to the user.
7. Provide online monitoring of the grid processing in real time.
8. Generate result dashboards. A dashboard allows one to quickly validate a processing task by
comparing the results with known baselines. A dashboard is a table showing the results of an
experiment.
9. Record audit trail information needed to assure the validity of the study.
POST-CONDITIONS
 Results are available for analysis.
3.3.3. Analyze an Experimental Run
This workflow supports hypothesis testing of the form given in decision trees and claims for the context of use.
PRE-CONDITIONS
 Results are available for analysis.
WORK FLOW
1. If not already done, create analyzable results by following workflow “Execute an Experimental
Run.”
2. The following comparisons of markups can be made:
.1
Analyze statistical variability
.2
Measurements of agreement
.3
User-defined calculations
.4
Correlate tumor change with clinical indicators
.5
Calculate regression analysis
.6
Calculate factor analysis
.7
Calculate ANOVA
.8
Calculate outliers
3. Estimate the confidence intervals on tumor measurements due to the selected independent
variables, as measured by validated volume difference measures (a minimal sketch of such an analysis follows this list).
4. Review statistics results to uncover promising hypotheses about the data. Typical methods
include: Box plot, Histogram, Multi-vari chart, Run chart, Pareto chart, Scatter plot, Stem-and-leaf
plot, Odds ratio, Chi-square, Median polish, or Venn diagrams.
5. Provide review capability to the user for all calculated items and plots.
6. Drill down into interesting cases, e.g., outlying sub-distributions, compare readers on hard tumors, etc.
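
For the confidence-interval estimate in step 3, one common formulation is a bias and limits-of-agreement summary of candidate versus reference volumes; the sketch below uses NumPy with placeholder values and is illustrative rather than a prescribed analysis.

# Minimal sketch of an agreement analysis between candidate and reference
# volume measurements: bias, 95% limits of agreement, and a normal-theory 95%
# confidence interval on the mean difference. The arrays are placeholders for
# the read-outs accumulated during the experimental run.
import numpy as np

candidate = np.array([1510.0, 655.0, 2310.0, 980.0, 1210.0])   # mm^3, illustrative
reference = np.array([1480.0, 630.0, 2395.0, 1005.0, 1180.0])

diff = candidate - reference
bias = diff.mean()
sd = diff.std(ddof=1)

loa_low, loa_high = bias - 1.96 * sd, bias + 1.96 * sd          # 95% limits of agreement
sem = sd / np.sqrt(len(diff))
ci_low, ci_high = bias - 1.96 * sem, bias + 1.96 * sem          # ~95% CI on the bias

print(f"bias = {bias:.1f} mm^3, 95% LoA = [{loa_low:.1f}, {loa_high:.1f}]")
print(f"95% CI on bias = [{ci_low:.1f}, {ci_high:.1f}]")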
POST-CONDITIONS
 Analyses are performed and reports are generated.
3.4. Collaborative Activities to Standardize and/or Optimize the Biomarker
The first and critical building block in the successful implementation of quantitative imaging biomarkers is
to establish the quality of the physical measurements involved in the process. The technical quality of
imaging biomarkers is assessed with respect to the accuracy and reproducibility of the related physical
measurement(s). Consequently, a well thought-out testing protocol must be developed so that, when
carefully executed, it can ensure that the technical quality of the physical measurements involved in
deriving the candidate biomarker is adequate. The overarching goal is to develop a generalizable
approach for technical proficiency testing which can be adapted to meet the specific needs for a diverse
range of imaging biomarkers (e.g., anatomic, functional, as well as combinations).
Guidelines of “good practice” to address the following issues are needed: (i) composition of the
development and test data sets, (ii) data sampling schemes, (iii) final evaluation metrics such as accuracy
as well as ROC and FROC metrics for algorithms that extend to detection and localization. With
development/testing protocols in place, the user would be able to report the estimated accuracy and
reproducibility of their algorithms on phantom data by specifying the protocol they have used.
Furthermore, they would be able to demonstrate which algorithmic implementations produce the most
robust and unbiased results (i.e., less dependent on the development/testing protocol). The framework
we propose must be receptive to future modifications by adding new development/testing protocols based
on up-to-date discoveries.
Inter-reader variation indicates differences in the training and/or proficiency of readers. Intra-reader differences
indicate differences arising from the difficulty of cases. To show the clinical performance of an imaging test, the
sponsor generally needs to provide performance data on a properly-sized validated set that represents a
true patient population on which the test will be used. For most novel devices or imaging agents, this is
the pivotal clinical study that will establish whether performance is adequate.
In this section, we describe workflows that start with a developed biomarker and seek to refine it by
organized group activities of various kinds. These activities are facilitated by deployment of the
Biomarker Evaluation Framework within and across centers as a means of supporting the interaction
between investigators and to support a disciplined process of accumulating a body of evidence that will
ultimately be capable of being used for regulatory filings.
By way of example, a typical scenario demonstrating the Reference Data Set Manager involves
three investigators working together to refine a biomarker and the tests to measure it: Alice, who is
responsible for acquiring images for a clinical study; Martin, who manages an image processing
laboratory responsible for analyzing the images acquired by Alice; and Steve, a statistician located at a
different institution. First, Alice receives volumetric images from her clinical collaborators; she logs into the
Reference Data Set Manager and creates the proper Reference Data Sets. She uses the web
interface to upload the datasets into the system. The metadata are automatically extracted from the
datasets (DICOM or other well-known scientific file formats). She then adds more information about each
dataset, such as demographic and clinical information, and changes the Reference Data Set’s policies to
make it available to Martin. Martin is instantly notified that new datasets are available in the system and
are ready to be processed. Martin logs in and starts visualizing the datasets online. He visualizes the
datasets as slices and also uses more complex rendering techniques to assess the quality of the
acquisition. As he browses each dataset, Martin selects a subset of datasets of interest and puts them in
the electronic cart. At the end of the session, he downloads the datasets in his cart in bulk and gives them
to his software engineers to train the different algorithms. As soon as the algorithms are validated on the
training datasets, Martin uploads the algorithms, selects the remaining testing datasets and applies the
Processing Pipeline to the full Reference Data Set using the Batch Analysis Service. The pipeline is
automatically distributed to all the available machines in the laboratory, decreasing the computation time
by several orders of magnitude. The datasets and reports generated by the Processing Pipeline are
automatically uploaded back into the system. During this time, Martin can monitor the overall progress of
the processing via his web browser. When the processing is done, Martin gives access to Steve in order
to validate the results statistically. Even when located on the other side of the world, Steve can access and visualize the
results, make comments, and upload his statistical analysis into the system.
Individual workflows are generally understood as building on each other (Fig. 14).
Figure 11: Workflows are presented to highlight how they build on each other.
The following sections describe workflows for Collaborative Activities to Standardize and/or Optimize the
Biomarker.
3.4.1. Validate Biomarker in Single Center or Otherwise Limited
Conditions
Clinicians, educators, behavioral scientists, health scientists, and many other consumers of tests and
measurements typically demand that any test have demonstrated validity and reliability. In casual
scientific conversations in imaging contexts, the words reliability and validity are often used to describe a
variety of properties (and sometimes the same one). The metrology view of proof of performance dictates
that a measurement result is complete only when it includes a quantitative statement of its
uncertainty.10,11 Generating this statement typically involves the identification and quantification of many
sources of uncertainty, including those due to reproducibility and repeatability (which themselves may be
due to multiple sources). Measures of uncertainty are required to assess whether a result is adequate for
its intended purpose and how it compares with alternative methods. A high level of uncertainty can limit
utilization, as uncertainty reduces statistical power, especially in the multi-center trials needed for major
studies. Uncertainty compromises longitudinal measurements, especially when patients move between
centers or when scanner changes occur.
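As a concrete (if simplified) illustration of such an uncertainty statement, the repeatability component can be estimated from paired test-retest measurements; the values below are made up and the calculation is only a sketch.

import numpy as np

# Hypothetical test-retest volumes (mL) for the same lesions, no intervening therapy.
scan1 = np.array([12.1, 8.4, 30.2, 5.7, 21.9])
scan2 = np.array([11.5, 9.0, 28.8, 6.1, 23.0])

# Within-subject standard deviation from paired replicates, and the repeatability
# coefficient: the change expected to be exceeded by only ~5% of repeat measurements
# on an unchanged subject.
d = scan1 - scan2
wsd = np.sqrt(np.mean(d ** 2) / 2.0)
rc = 1.96 * np.sqrt(2.0) * wsd
print(f"Within-subject SD: {wsd:.2f} mL; repeatability coefficient: {rc:.2f} mL")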
The following figures illustrate what occurs at this stage of a biomarker’s development, as it is
represented by one or more candidate test implementations. Figure 15 illustrates one approach for the
evaluation of a set of segmentation/classification algorithms, in this case for processing slide images for
ex-vivo pathology, but with a validation methodology equally applicable to in-vivo radiology.12
Figure 12: Algorithm validation comprises the cross-comparison of the proposed method with
other methods and the assessment of its performance.
Figure 16 provides a use case diagram view of the steps being undertaken.
Figure 13: Use Case Model to Validate Biomarker under Limited Conditions
The user is working, possibly at a single center or possibly with a consortium. S/he wants to test out a new
image processing algorithm on a set of radiology studies. He has tested out the algorithm on one set of
images at his local institute and is confident his code can be run remotely in a batch processing mode.
One of the consortium collaborators has a set of radiology images (studies) that can be used to test the
algorithm. The results of the algorithm include image masks (Markup) and quantitative results
(Annotation) that can be shared and viewed by everyone.
Note that this is an aggregate workflow that specializes workflows elaborated earlier in this document.
PRE-CONDITIONS
 At least some development of the biomarker as measured by a candidate implementation has been
pursued, whether through workflows such as described in “Biomarker Development” or otherwise.
FLOW OF EVENTS
1. If not already done, follow workflow “Import Data to Reference Data Set to Form Reference Data
Set”
2. As needed, follow workflow “Create Ground Truth Annotations and/or Manual Seed Points in
Reference Data Set”
3. As needed, follow workflow “Develop Physical and/or Digital Phantom(s)”
4. If not already done, follow workflow “Set up an Experimental Run.”
.1
User puts his code and description of files (URL of Image Archive, Filenames or StudyIDs,
etc.) into the DataModel defined by the [Image Analysis/Biomarker Evaluation Framework, Image
Annotation Tool] and the algorithm code is saved in the Algorithm Code Storage. The name
of the algorithm is saved in the associated data model – dynamic extension applied. (A generic,
purely illustrative sketch of such a batch wrapper follows this list.)
5. Follow workflow “Execute an Experimental Run.”
.1
User then sends this information out via Grid Services and the Image Archive and
associated CPU processors start to run the batch processing. [Image Analysis/Biomarker
Evaluation Framework, Image Archive Data Service, Image Information Data Service].
.2
As the algorithm is processed on each of the images, the results are saved in new
Annotation and markup that will be associated with that image.
.3
After processing is complete, user receives status that the remote job is completed and
available for viewing.
.4
User has the ability to see the results. Alternatively, they can develop methods to extract
all annotation results through the Biomarker Evaluation Framework without having to go
through the viewer.
6. Follow workflow “Analyze an Experimental Run.”
.1
Qualitative pairwise algorithm comparison.
.2
The goal is to provide a validation assessment of technical and clinical performance via high-throughput computing tasks.
7. Iterate for confidence and refinement.
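The Application Harness and Grid Services interfaces are specified elsewhere; purely as an illustrative sketch (all file names, function names, and formats here are hypothetical), a candidate algorithm wrapped for batch execution over a Reference Data Set manifest might look like the following.

import json
from pathlib import Path

def segment_lesion(image_path):
    # Placeholder for the candidate implementation; returns markup and a quantitative result.
    return {"markup": image_path + ".mask.nii.gz", "volume_mm3": 0.0}

def run_batch(manifest_path, output_dir):
    # Process every study listed in the manifest and write one annotation record per image.
    manifest = json.loads(Path(manifest_path).read_text())  # e.g., {"studies": ["img1.dcm", ...]}
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for image_path in manifest["studies"]:
        result = segment_lesion(image_path)
        out = Path(output_dir) / (Path(image_path).stem + "_annotation.json")
        out.write_text(json.dumps({"image": image_path, **result}))

if __name__ == "__main__":
    run_batch("reference_data_set_manifest.json", "annotations")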
POST-CONDITIONS
 Quantitative analysis algorithm is ready for further evaluation within a consortium or by a sponsor
as part of one or more regulatory pathways.
3.4.2. Team Optimizes Biomarker Using One or More Tests
When one or more imaging tests reach the point where their technical performance has been
demonstrated in well-controlled settings, whether by an individual sponsor or by a collaboration,
organized activities undertaken by teams are pursued to optimize the biomarker and characterize/improve
the class of tests available to measure it. These activities generally include:
 Implementation and refinement of protocols that include recommended operating points for
acquisition, analysis, interpretation, and QC, according to the documented intended use,
 Development and merging of training and test datasets from various sources to establish or
augment linked data archive(s)
 Assessment and characterization of variability, minimum detectable change, and other aspects of
performance in the intended environment including subject variability associated with the
physiological and pathophysiological processes present in the target population – that is, moving
beyond the more highly controlled conditions under which the biomarker and its tests may have been
initially discovered and developed.
Prospective trials and/or retrospective analyses are stored as Reference Data Sets based on the
intended-use claims, with care to exclude biases. The studies are undertaken in part to provide data to
support proposed cut-points (i.e., decision thresholds), if imaging results are not reported as a continuous
variable, and performance characteristics (including sensitivity, specificity and accuracy) are reported to
complete this step.
PRE-CONDITIONS
 The claimed intended use of the imaging test must be clearly stated before initiating technical
validation studies so that appropriate data are generated to support that use.
 Summary of Good Manufacturing Practices (GMP) issues as they relate to producing the imaging
test.
 Devices and software used to perform the imaging test must meet quality system requirements.13
FLOW OF EVENTS
A useful way of thinking of this workflow is that it specializes / extends workflow “Validate Biomarker in
Single Center or Otherwise Limited Conditions”:
1. Write a Profile, in part through following workflow “Extend and Disseminate Ontologies, Templates,
and Vocabularies”:
.1
Describe the patient population characteristics, which determine how general the study is,
and therefore the importance of the result. The study design should include how to capture
the data to back up claims about the patient population characteristics.
.2
High-level descriptions of the processing hardware and software, highlighting potential
weaknesses, should be provided.
.3
Detailed descriptions of the procedures to be used for image acquisition, analysis and
interpretation of the quantitative imaging biomarker as a clinical metric should be included.
.4
Procedures to be used when results are not interpretable or are discrepant from other known
test results must be described; this is especially important for imaging tests used for
eligibility or assignment to treatment arms.
.5
Develop a standardized schema for validation and compliance reports. Ensure consensus
on the XML schema is attained with all QIBA participants.
2. As needed, follow workflow “Develop Physical and/or Digital Phantom(s)”
3. Use the Profile to automatically follow the workflow “Import Data to Reference Data Set to Form
Reference Data Set,” with the Profile driving automatic data selection.
.1
Automatically select the optimal datasets based on their associated metadata. If only a limited
number of datasets is identified, the user will be asked to provide more datasets. Similarly, if
too many datasets are returned, the user will be asked to choose the specific datasets to be
included in the experiment.
4. As needed, follow workflow “Create Ground Truth Annotations and/or Manual Seed Points in
Reference Data Set” to annotate the Reference Data Set(s).
5. Use the Profile to automatically follow the workflow “Set up an Experimental Run” to describe the
Processing Pipeline for a set of validation script(s).
6. If not already done, wrap candidate implementation(s) to be compatible with Application Harness.
7. Follow workflow “Execute an Experimental Run.”
8. Follow workflow “Analyze an Experimental Run” according to the prescriptions for validation and
compliance as outlined in the Profile.
9. Iterate for confidence and refinement.
10. After an experiment runs on selected datasets, results should be compared to the expected target
values defined in the “Qualification Data” file.
POST-CONDITIONS
 One or more validated imaging tests measuring a biomarker with known performance
characteristics regarding accuracy and reproducibility within a specified context for use.
3.4.3. Support “Open Science” Publication model
The user shares and disseminates the artifacts, results, processed data, and, in some cases, the primary
data. For hypothesis testing scientific articles or reports, the user documents the methods, materials, and
results. The user explains if and how the hypothesis is rejected or accepted in light of the experiments
conducted (the "new findings") and in the context of other similar reported findings. For discovery science
scientific articles or reports, the user documents the methods and materials used in the experiments as
well as the results, and summarizes the characteristics and intended use of the data or derived materials
created (the "new findings"). In either case, the final documentation should be sufficient for another
person similarly skilled in the field to replicate the experiment and new findings. Preliminary results may
be released prior to formal presentation of results as, generally, a peer-reviewed journal article. Forums
for sharing results may include staff seminars and seminars or posters at a scientific conference in
addition to published journals. Primary or processed data may be submitted to accessible databases.
The “open science” initiative allows scientists to expand upon traditional research results by providing
software for interactively viewing underlying source data. The interactive Image Viewer system enhances
standard scientific publishing by adding interactive visualization. Using the interactive Image Viewer,
authors have the ability to create 3-dimensional visualizations of their datasets, add 3-dimensional
annotations and measurements, and make the datasets available to reviewers and readers. The system is
composed of two main components: the archiving system and the visualization software. A customized
version of Reference Data Set Manager provides the data storage, delivers low-resolution datasets for
pre-visualization, and in the background serves the full-resolution dataset. Reference Data Set Manager
must support MeSH, the U.S. National Library of Medicine (NLM) controlled vocabulary used for indexing
scientific publications. The second component, the interactive Image Viewer visualization software,
interacts directly with the Reference Data Set Manager in order to retrieve stored datasets. Readers of an
interactive Image Viewer-enabled manuscript can automatically launch the interactive Image Viewer
software by clicking on a web link directly in the PDF. Within ten seconds, a low-resolution dataset is
loaded from Reference Data Set Manager and can be interactively manipulated in 3D via the interactive
Image Viewer software.
PRE-CONDITIONS
 Primary and/or processed data sets, new equipment or software, and/or other artifacts generated.
 Data have been interpreted.
 Hypothesis accepted or rejected (applicable to hypothesis testing).
 Experimental materials and resultant materials available.
FLOW OF EVENTS
1. If not already done, create reportable results by following workflow “Analyze an Experimental Run.”
2. Summarize the results.
3. Describe and/or interpret the results in the broader scientific context of the research ("narrate the
findings"), using tables and figures as required to illustrate key points.
4. Communicate the validation, evaluation, and interpretation in any of several ways, including, but
not limited to:
.1
Publish a manuscript
.2
Present at a scientific meeting
.3
Submit raw or processed data to appropriate resources/organizations
.4
Establish mechanism(s) to share new equipment or software
5. Specific examples:
.1
Make image analysis algorithms available as publicly accessible caGrid analytical services.
.2
Upload raw image and other data sets into a repository where on-line journals may make them
accessible to readers through a user interface that can run reference algorithms and/or give
access to the data for local consideration.
.3
Perform the workflow in a highly consistent manner so that results can be reproduced and
standardized, eventually using common analysis tools.
POST-CONDITIONS
 Publication of scientific articles or abstracts.
 Work has been shared through at least one means.
 Data and/or artifacts have been made available.
 New equipment or software is disseminated.
3.5. Consortium Establishes Clinical Utility / Efficacy of Putative Biomarker
Biomarkers are useful only when accompanied by objective evidence regarding the biomarkers’
relationships to health status. Imaging biomarkers are usually used in concert with other types of
biomarkers and with clinical endpoints (such as patient reported outcomes (PRO) or survival). Imaging
and other biomarkers are often essential to the qualification of each other.
The following figure expands on Figure 17 and specializes workflow “Team Optimizes Biomarker Using
One or More Tests” as previously elaborated to build statistical power regarding the clinical utility and/or
efficacy of a biomarker.
Figure 14: Use Case Model to Establish Clinical Utility / Efficacy of a Putative Biomarker
Individual workflows are generally understood as building on each other (Fig. 18).
Figure 15: Workflows are presented to highlight how they build on each other.
The following workflows are elaborated.
3.5.1. Measure Correlation of Imaging Biomarkers with Clinical Endpoints
The class of tests serves as a basis for defining the measurement technology for a biomarker which may
then be assessed as to its clinical utility. This assessment may be done in the context of an effort to
qualify the biomarker for use in regulatory decision making in clinical trials, or it may be a comparable
activity associated with individual patient management without explicitly following a qualification pathway.
In either case, the hallmark of this step is the assessment of clinical utility on the basis of at least some
capability to measure it.
Biomarker reproducibility in the clinical context is assessed using scans from patients that were imaged
with the particular modality repeatedly and over an appropriately short period of time, without intervening
therapy. The statistical approaches include standard analyses using intraclass correlation and Bland-Altman plots for the assessment of agreement between measurements.14,15 However, more detailed
determinations are also of interest for individual biomarkers. For example, it may be useful to determine
the magnitude of observed change in a biomarker that would support a conclusion of change in the true
measurement for an individual patient. It may also be of interest to determine if two modalities measuring
the same quantity can be used interchangeably. 16 The diagnostic accuracy of biomarkers (that is, the
accuracy in detecting and characterizing the disease) is assessed using methods suitable to the nature of
the detection task, such as ROC, FROC, and LROC. In settings where the truth can be effectively
considered as binary and the task is one of detection without reference to localization, the broad array of
ROC methods will be appropriate.17,18 Since the majority of VIA imaging biomarkers in the volumetric
analysis area produce measurements on a continuous scale, methods for estimating and comparing ROC
curves from continuous data are needed. In settings where a binary truth is still possible but localization is
important, methods from free-response ROC analysis are appropriate.19,20,21
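A minimal sketch of the agreement analyses named above, using made-up repeat-scan values; a production analysis would follow the cited statistical references.

import numpy as np

# Hypothetical repeat measurements (tumor volume, mL) from a test-retest study.
m1 = np.array([12.1, 8.4, 30.2, 5.7, 21.9, 15.3])
m2 = np.array([11.5, 9.0, 28.8, 6.1, 23.0, 14.7])

# Bland-Altman bias and 95% limits of agreement.
diff = m1 - m2
bias = diff.mean()
loa = 1.96 * diff.std(ddof=1)
print(f"Bias {bias:.2f} mL, limits of agreement [{bias - loa:.2f}, {bias + loa:.2f}] mL")

# One-way random-effects intraclass correlation, ICC(1,1), from the ANOVA decomposition.
data = np.stack([m1, m2], axis=1)  # subjects x repeats
n, k = data.shape
msb = k * ((data.mean(axis=1) - data.mean()) ** 2).sum() / (n - 1)
msw = ((data - data.mean(axis=1, keepdims=True)) ** 2).sum() / (n * (k - 1))
icc = (msb - msw) / (msb + (k - 1) * msw)
print(f"ICC(1,1) = {icc:.3f}")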
PRE-CONDITIONS
 A mechanistic understanding or “rationale” of the role of the feature(s) assessed by the imaging
test in healthy and disease states is documented.
 A statement of value to stakeholders (patients, manufacturers, biopharma, etc.), expressed in the
context of the alternatives is understood (e.g., with explicit reference to methods that are presently
used in lieu of the proposed biomarker).
 Consensus exists on whether the test is quantitative, semi-quantitative, or qualitative (descriptive);
what platform will be used; what is to be measured; controls; scoring procedures, including the
values that will be used (e.g., pos vs. neg; 1+, 2+, 3+); interpretation; etc.
FLOW OF EVENTS
1. If not already done, follow workflow “Install and Configure Linked Data Archive Systems.”
2. If not already done, follow workflow “Import Data to Reference Data Set to Form Reference Data
Set” for each of potentially several Reference Data Sets as needed to build statistical power.
3. If not already done, follow workflow “Team Optimizes Biomarker Using One or More Tests” to
perform clinical performance groundwork to characterize sensitivity and specificity for readers using
the imaging test when interpreted as a biomarker under specified conditions.
4. Conduct pilot study(ies) of the analysis to establish capability of the class of tests that represent the
biomarker using the training set (e.g., following Sargent et al., with utility determinations restricted
to demonstrating that in single studies the endpoint captures much of the treatment benefit at the
individual patient level).22
.1
Demonstrating a high correlation at the patient level between the early endpoint and the
ultimate clinical endpoint within a trial, randomized or not, is not sufficient to validate an
endpoint. Such a correlation may be a result of prognostic factors that influence both
endpoints, rather than a result of similar treatment effect on the two endpoints. Despite this
caveat, a reasonably high patient level correlation (for example, >50%) would suggest the
possible utility of the early endpoint and the value of subsequently assessing, by means of a
larger analysis, the predictive ability of the early endpoint for the ultimate phase 3 endpoint
for treatment effect at the trial level.
.2
For predictive markers, the Freedman approach involves estimating the treatment effect on
the true endpoint, defined as s, and then assessing the proportion of treatment effect
explained by the early endpoint. However, as noted by Freedman, this approach has
statistical power limitations that will generally preclude conclusively demonstrating that a
substantial proportion of the treatment benefit at the individual patient level is explained by
the early endpoint. In addition, it has been recognized that the proportion explained is not
indeed a true proportion, as it may exceed 100%, and that, whilst it may be estimated within
a single trial, data from multiple trials are required to provide a robust estimate of the
predictive endpoint. Additionally, it can have interpretability problems, also pointed out by
Freedman. Buyse and Molenberghs also proposed an adjusted correlation method that
overcomes some of these issues.
.3
For prognostic markers, the techniques for doing so are most easily described in the context
of a continuous surrogate (e.g. change in nodule volume) and a continuous outcome. Linear
mixed models23 with random slopes (or, more generally, random functions) and intercepts
through time are built for both the surrogate marker and the endpoint. That is, the joint
distribution of the surrogate marker and the endpoint are modeled using the same
techniques as used for each variable individually. The degree to which the random slopes for
the surrogate and the endpoint are correlated gives a direct measure of how well changes in
the surrogate correlate with changes in the endpoint. The ability of the surrogate to
extinguish the influence of potent risk factors, in a multivariate model, further strengthens its
use as a surrogate marker. (A rough sketch of this analysis follows this list.)
5. Conduct the pivotal meta-analysis on the test set (extending the results to the trial level and
establishing the achievable generalizability based on available data). Follow statistical study
designs as consistent with the claims and the type of biomarker along the lines described in the
Basic Story Board for predictive vs. prognostic biomarkers.
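A rough sketch of the prognostic-marker analysis outlined in step 4.3, assuming a long-format table of repeated measurements; the file name, column names, and formula are illustrative assumptions, and a formal analysis would follow the joint-modeling approach in the cited references.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format data: one row per patient visit.
df = pd.read_csv("longitudinal_measurements.csv")  # columns: patient, time, surrogate, endpoint

def random_slopes(outcome):
    # Linear mixed model with random intercepts and slopes over time; returns per-patient slopes.
    fit = smf.mixedlm(f"{outcome} ~ time", df, groups=df["patient"], re_formula="~time").fit()
    return pd.Series({pid: re["time"] for pid, re in fit.random_effects.items()})

s_slopes = random_slopes("surrogate")
e_slopes = random_slopes("endpoint").reindex(s_slopes.index)

# Correlation of per-patient slopes indicates how well change in the surrogate tracks
# change in the endpoint (fit separately here; a joint model estimates this directly).
print(f"Correlation of random slopes: {np.corrcoef(s_slopes, e_slopes)[0, 1]:.2f}")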
POST-CONDITIONS
 Sufficient data for registration according to workflow “Formal Registration of Data for Approval or
Clearance” is available.
3.5.2. Comparative Evaluation vs. Gold Standards or Otherwise Accepted
Biomarkers
Whereas the correlation of a putative biomarker with clinical endpoints seems instrumental for the
biomarker to be qualified for defined uses, in and of itself it does not result in the acceptance by the
community that it ought to be used vs. what the community is already using. In addition to the efficacy of
a biomarker unto itself as described in workflow “Measure Correlation of Imaging Biomarkers with Clinical
Endpoints,” comparative analyses would be pursued that identify the relative advantages (or
disadvantages as the case may be) of using this biomarker vs. another biomarker. Two specific examples
that are currently relevant include spirometry vs. CT-based lung densitometry, and the use of diameter
measurements on single axial slices as presently inculcated in RECIST. Ultimately, use of all putative
imaging biomarkers is understood in relation to how the assessment is done without the benefit of the
imaging biomarker, and industry uptake of the biomarker requires an evaluation of relative performance
against identified figures of merit.
Following the RECIST example, when RECIST was defined (reference) it was the decision of the
assessment group which measurement should be chosen as the basis for the final clinical classification of
disease state development (progression, stability, response to treatment). Already at that time volumes
were under investigation. Due to the impracticality at that time of making easy tumor measurements (a
highly manual process of tumor lesion markup; thick image slices that made a volumetric assessment
imprecise), diameter was the way to go forward. RECIST went through several rounds of refinement with
regard to clinical validity of classification and usefulness in different phases of clinical drug development.
Today RECIST 1.1 is the accepted standard for clinical therapy response assessment in oncology trials.
Meanwhile there have been further developments in imaging techniques (higher spatial resolution,
thinner slices) and in algorithm development that make a volume measurement feasible. There are
studies (reference to Merck) available that support the higher sensitivity of volume measurements versus
diameter measurements with regard to change detection. Therefore it is in the interest of the relevant
stakeholders to work towards recommendations for datasets that enable FDA to accept the use of
volume instead of diameter for RECIST. A core set of data is needed to prove the validity of volume
assessment compared to diameter measurements, based on high-quality, mixed (less than 5 mm) slice-thickness
images of cancer cases acquired in several clinical trials by different pharmaceutical companies
and academic consortia:
 This dataset would ideally consist of longitudinal scans of different clinical trials sponsored by
pharmaceutical companies and comparable trials of publicly funded research (e.g. LIDC)
 A range of disease states, therapeutic interventions, and RECIST-based clinical outcomes should be
covered.
 All diameter measurements must be available.
 A real-life range of image acquisition devices and image acquisition parameters should be represented.
 Metadata must contain a basic set of additional clinical data that support the clinical case
identification and validation.
 The location and volumetric assessment of all lesions within each longitudinal acquisition must be
established by an FDA accepted method, like a multi-reader approach.
PRE-CONDITIONS
 A presumably more effective biomarker must be proposed for comparison under documented
conditions with an existing, explicitly identified, and clinically accepted biomarker.
FLOW OF EVENTS
There are two approaches:
 Follow workflow “Measure Correlation of Imaging Biomarkers with Clinical Endpoints” for each of the
two biomarkers, the “new” and the “accepted” and assess the degree to which each (independently)
correlates with the desired clinical endpoint. The comparison is framed “indirectly” in terms of how well
each correlates. The one that correlates better is said to be superior (see the sketch following this list).
 Alternatively, the two biomarkers may be compared directly by following workflow “Measure
Correlation of Imaging Biomarkers with Clinical Endpoints” only for the new biomarker and replacing the
target of the correlation with the result of following workflow “Create Ground Truth Annotations and/or
Manual Seed Points in Reference Data Set” in the Reference Data Sets according to the previously
accepted biomarker. The comparison in this case is more direct, with the implication being that the
biomarker which calls an event first is considered better. The caveat is that the accepted biomarker may
not actually be correct; in fact, it may be that the reason the new biomarker is proposed is to overcome
some deficiency in the prior biomarker, so a direct comparison may be inconclusive: the “truth” of
the event called is not established, nor is it clear which one is correct in those cases where one biomarker
calls an event but the other does not (Fig. 19).
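For illustration, the “indirect” framing in the first bullet can be sketched as follows; the values are made up, and in practice they would come from the Reference Data Set correlation workflow.

import numpy as np
from scipy import stats

# Hypothetical per-patient values for the endpoint and the two biomarkers.
endpoint = np.array([0.8, 1.5, 3.1, 2.2, 4.0, 2.9, 1.1, 3.6])
new_marker = np.array([0.7, 1.7, 2.9, 2.5, 3.8, 3.1, 1.0, 3.4])
accepted_marker = np.array([1.0, 1.2, 2.5, 2.8, 3.2, 2.2, 1.6, 3.0])

# How well does each biomarker (independently) correlate with the clinical endpoint?
r_new, p_new = stats.spearmanr(new_marker, endpoint)
r_old, p_old = stats.spearmanr(accepted_marker, endpoint)
print(f"New biomarker:      rho = {r_new:.2f} (p = {p_new:.3f})")
print(f"Accepted biomarker: rho = {r_old:.2f} (p = {p_old:.3f})")
# Under the indirect framing, the biomarker with the stronger correlation is judged superior,
# subject to the caveats noted in the second bullet.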
Figure 16: Example Result of Comparing New Biomarker to Previous Accepted Biomarker
POST-CONDITIONS
 The relative performance characteristics of the two biomarkers are known, subject to caveats in the
approach used.
3.5.3. Formal Registration of Data for Qualification
Sub-communities cooperate to pursue regulatory qualification of a “class” of implementations said to be
the biomarker itself. These sub-communities have a critical need to work with as large and diverse a
Reference Data Set of imaging data as possible, across multiple viable implementations, to substantiate and
characterize the performance of the imaging biomarker independent of a specific implementation, and
use this in the context of the biomarker qualification pathway. This spans a wide range of potentially
useful imaging datasets including synthetic and real clinical scans of phantoms and clinical imaging
datasets of patients with and without the disease/condition being measured.
Biomarkers are Qualified by therapeutic review groups in CDER. The process is still evolving but it
generally requires evidence from clinical trials that the biomarker information correlates closely with a
clinically meaningful parameter. We have adopted concepts and language from the current FDA process
for the qualification of biomarkers,24,25,26 to make clear the specifics regarding necessary steps for a
sponsoring collaborative to use it for qualification of putative biomarkers (Fig. 20).
Figure 17: The qualification pathway is a collaborative undertaking between sponsors and
regulatory agencies. In the figure, activities undertaken by the sponsor are indicated in the left
hand column. Activities undertaken by national regulatory agencies (e.g., FDA or European
Medicines Agency (EMEA)) are indicated on the right, and documents used to facilitate the
communication are indicated in the center. It should be noted that the sponsor in this schematic
could be a collaborative enterprise rather than a single commercial entity, to reflect the multi-stakeholder nature of the activity.
PRE-CONDITIONS
 Sufficient data for registration is available.
FLOW OF EVENTS
1. Development of a “Briefing Document,” for the regulatory agency, that describes all known
evidence accumulated that pertains to the imaging biomarker’s qualification and that lays out a plan
to complete steps to conclude the qualification process.
.1
Claims (executive summary in the guidance)
.2
Organization and administrative details for the organization sponsoring the qualification
(section 2.1 in the guidance)
.3
Clinical context for use and decision trees (section 2.2 in the guidance)
.4
Summary of literature and what is done to date (section 2.3 in the guidance)
.5
Recommendation for completing the qualification (full data package) (section 2.4 in the
guidance)
.6
Technical description of how the biomarker is measured (section 2.5 in the guidance)
2. Pursuit of a face-to-face meeting with the regulatory agency Biomarker Qualification Review Team
to elicit agency feedback on the plan to complete the “Full Data Package.”
3. Complete the Full Data Package:
.1
Conduct (a) pilot(s) of the meta-analysis to establish capability of the class of tests that
represent the biomarker by following workflow “Measure Correlation of Imaging Biomarkers
with Clinical Endpoints.”
.2
Conduct the pivotal meta-analysis on the test set (extending the results to the trial level and
establishing the achievable generalizability based on available data) by following workflow
“Measure Correlation of Imaging Biomarkers with Clinical Endpoints.”
4. Design, test, and implement specifications and services to utilize SDTM datasets, populate the
Janus database, and create SDTM materialized views.
5. Utilize the agency’s SDTM Validation Service as it is made publicly available to clinical trial
sponsors to enable validation of their data sets using the same criteria to be applied by Janus. This
service will not load data into the Janus repository as it is intended to be used as a preview or
check of draft data sets before actual submission to FDA. Once data sets pass the SDTM
Validation, sponsors will be able to submit them to Janus via the FDA gateway.
6. Draft guidance on incorporation of the imaging biomarker into clinical trials.
7. Meeting with the Biomarker Qualification Review Team to elicit regulatory agency feedback on the
clinical efficacy study results for the purpose of obtaining agency acceptance that the biomarker
is recognized as qualified.
8. Utilize the agency’s Load Service to automatically load data sets that are submitted to Janus and
"pass" the SDTM validation checks via the integrated STDM Validation service into the Oracle
database.
9. Promote use of the qualified imaging biomarker through education.
POST-CONDITIONS
 Biomarker is qualified for use in a known and specified clinical context.
3.6. Commercial Sponsor Prepares Device / Test for Market
Individual workflows are generally understood as building on each other (Fig. 21).
Figure 18: Workflows are presented to highlight how they build on each other.
The following workflows are elaborated.
3.6.1. Organizations Issue “Challenge Problems” to Spur Innovation
One approach to encouraging innovation that has proven productive in many fields is for an organization
to announce and administer a public “challenge” whereby a problem statement is given and solutions are
solicited from interested parties that “compete” for how well they address the problem statement. The
development of image processing algorithms has benefitted from this approach with many organized
activities from a number of groups. Some of these groups are organized by industry (e.g., Medical Image
Computing and Computer Assisted Intervention or MICCAI27), academia (e.g., at Cornell University28), or
government agencies (e.g., NIST29). This workflow is intended to support such challenges.
With regard to benchmarking, NIST’s Information Technology Laboratory has experience with
performance evaluation of software and algorithms for text retrieval, biometrics, face recognition, speech,
and motion image quality. While the evaluation of biomedical change analysis algorithms presents very
real challenges in developing suitable assessment methods, the use of benchmarking promises to provide
significant insight into algorithms and to contribute to their improvement.
It is important to note that one of the reasons for doing this is to meet the need that a biomarker is defined
in part by the “class” of tests available for it. That is, it is not defined by a single test or candidate
implementation but rather by an aggregated understanding of potentially several such tests. As such, it is
necessary through this or other means to organize activities to determine how the class performs,
irrespective of any candidate that purports to be a member of the class. As such, this workflow is related
to the “Compliance / Proficiency Testing of Candidate Implementations” workflow and it may be that an
organization such as NIST can both host challenges as well as serve in the trusted broker role using
common infrastructure for these separate but related functions.
Note that this is an aggregate workflow that specializes workflows elaborated earlier in this document.
PRE-CONDITIONS
 A well-defined problem space has been described where there are multiple parties that have an
interest in offering or developing solutions.
FLOW OF EVENTS
1. Organization hosting the challenge prepares a protocol and set of instructions, including data
interchange methods, for use by participants.
.1
If not already done, follow workflow “Install and Configure Linked Data Archive Systems”
.2
Optionally, follow workflow “Extend and Disseminate Ontologies, Templates, and
Vocabularies”
.3
Follow workflow “Import Data to Reference Data Set to Form Reference Data Set” to create
a Reference Data Set for use by participants.
.4
Follow workflow “Create Ground Truth Annotations and/or Manual Seed Points in Reference
Data Set” to annotate the Reference Data Set.
.5
Follow workflow “Develop Physical and/or Digital Phantom(s)”
.6
Follow workflow “Set up an Experimental Run” to create a Processing Pipeline script for use
by participants.
2. Potential participant downloads and reads the protocol and instructions.
3. Participant expresses interest and is cleared for participation.
.1
Participant sets up the informatics tooling associated with the Biomarker Evaluation
Framework.
.2
The participant wraps their algorithm to make it compatible with the Application Harness.
.3
Organization hosting the challenge follows workflow “Create and Manage User Accounts,
Roles, and Permissions” to register the participant.
4. Participant accesses the following as have been established for the challenge:
.1
The Processing Pipeline script which has been set up for the challenge.
.2
The Reference Data Set which has been assembled for the challenge.
5. Optionally, participant may follow any of the workflows in “Biomarker Development” to prepare for
the challenge.
6. Participant follows workflow “Execute an Experimental Run” to meet the challenge, sharing the
results according to the formal submission instructions provided by the organizer.
7. The organizer follows workflow “Analyze an Experimental Run” on submitted entries.
.1
Analyze results according to the challenge objectives;
.2
Provide Participants with individual analysis of their results; and
.3
Publish the results of the evaluation, without publicly identifying individual scores by
Participant.
8. If it serves the desired goal, the following workflows could be structured in a challenge format as
specializations of this otherwise flexible concept:
.1
Organize the challenge so as to follow the workflow “Measure Correlation of Imaging
Biomarkers with Clinical Endpoints”
.2
Organize the challenge so as to follow the workflow “Comparative Evaluation vs. Gold
Standards or Otherwise Accepted Biomarkers”
POST-CONDITIONS
 Participants have a calibrated understanding of how their solution compares with other solutions.
 The public (or members of the organization that sponsored the challenge) understand how the
“class” of solutions to the problem cluster in terms of performance.
3.6.2. Compliance / Proficiency Testing of Candidate Implementations
A substantial goal of this effort is to provide infrastructure whereby multiple stakeholders collaborate to
test hypotheses about the technical feasibility and the medical value of imaging biomarkers. In these
examples, the outcome will be an efficient means to collectively gather and analyze a body of evidence
which would be accepted by regulatory bodies that can then be utilized by individual entities to make
clinical trial management more cost effective than if they had to pursue it individually. Once such imaging
biomarkers have been validated and accepted by the community, including the FDA, validation of the
individual candidate tests could proceed as a “proficiency test” conducted by a trusted broker rather than
requiring each sponsor to individually collect evidence and prove performance from the ground up.
The approach is to build a body of clinical evidence on the clinical utility of a given biomarker, based on
some number of tests to measure it that may be collectively described as a class of acceptable tests for
the biomarker. This class is characterized by the Profile as defined in workflow “Team Optimizes
Biomarker Using One or More Tests” and supported by the results of either or both of the “Measure
Correlation of Imaging Biomarkers with Clinical Endpoints” and/or “Comparative Evaluation vs. Gold
Standards or Otherwise Accepted Biomarkers” workflows, regardless of whether formal qualification is
sought.
The Profile is used to establish targeted levels of performance, with means for formal data selection that
allow a batch process to be run on data sequestered by a trusted broker, as requested by commercial
entities that wish to obtain a certificate of compliance (formal) or simply an assessment of proficiency as
measured with respect to the Profile.
PRE-CONDITIONS
 A Profile exists and is associated with a body of clinical evidence sufficient to establish the
performance of a class of compliant measurements for the biomarker.
FLOW OF EVENTS
In this case there are two primary actors: the development sponsor, and the honest broker:
1. Individual sponsor:
.1
The sponsor needs to identify what clinical indications for use it wishes to have its
implementation tested against.
.2
Algorithms included in the imaging test for data and results interpretation must be pre-specified
before the study data is analyzed. Alteration of the algorithm to better fit the data is
generally not acceptable and may invalidate a study.
.3
It needs means to interface its implementation to the broker’s system in a black-box manner
such that the broker does not have visibility into proprietary implementation details.
.4
It needs to receive back performance data and supporting documentation capable of being
incorporated into regulatory filings at its discretion.
2. Honest broker:
.1
The honest broker needs means to archive data sets that may be selectively accessed
according to specific clinical indications and that may be mapped to image quality standards
that have been described as so-called “acceptable”, “target”, and “ideal”.
.2
It needs to accept black-box systems for interface and running in batch on selected data
sets.
.3
It needs means to set seed points or support other reader interaction in semi-automated
scenarios.
.4
It needs to produce documentation regarding results inclusive of a charge-back mechanism
to recover operational costs.
3. The training set will continue to be available and refreshed with new cases for direct access by
interested investigators for testing of new imaging software algorithms or clinical hypotheses.
Future investigators will have access to the training set for additional studies.
4. Define services whereby the test set is indirectly accessible via the trusted broker.
POST-CONDITIONS
 Indication of whether the candidate implementation complies with the Profile (which in turn
specifies the targeted performance with respect to clinical context for use).
3.6.3. Formal Registration of Data for Approval or Clearance
Imaging devices are Approved (or Cleared) by CDRH. Recently CDRH has been requiring more evidence
of patient benefit before it will approve a new device. An issue particularly relevant to algorithm approval
is that a developer needs large numbers of clinical images for algorithm development, and
then a statistically valid number of cases to establish performance. Ideally the testing cases should be
different from the development cases, and regulatory agencies would like the testing to be carried out by
a neutral party to ensure that the results are trusted.
This workflow refers to the activity of implementation teams seeking public compliance certifications or
other evaluations that are contributory to their 510(k) and/or PMA applications, so as to leverage data
publicly sequestered by an honest broker. There are two objectives in doing so: a) so that individual sponsors
don’t need to bear the full cost and time to collect such data themselves; and b) to provide a trusted
objectivity that their proposed implementation is indeed a compliant member of the class of valid
implementations of a biomarker, with performance that meets or exceeds targeted levels of performance
recognized by national regulatory organizations.
With this body of data in place and structured according to regulatory agency preferences as to format
(e.g., in SDTM), then it may be referenced as a “master file” contributory to approval or clearance as long
as the product has passed through workflow “Compliance / Proficiency Testing of Candidate
Implementations” and bears a certificate of compliance.
PRE-CONDITIONS
 A Profile exists and is associated with a body of clinical evidence sufficient to establish the
performance of a class of compliant measurements for the biomarker.
 A commercial sponsor has obtained a certificate of compliance for their candidate implementation.
FLOW OF EVENTS
1. Design, test, and implement specifications and services to utilize SDTM datasets, populate the
Janus database, and create SDTM materialized views.
2. Utilize the agency’s SDTM Validation Service as it is made publicly available to clinical trial
sponsors to enable validation of their data sets using the same criteria to be applied by Janus. This
service will not load data into the Janus repository as it is intended to be used as a preview or
check of draft data sets before actual submission to FDA. Once data sets pass the SDTM
Validation, sponsors will be able to submit them to Janus via the FDA gateway.
3. Utilize the agency’s Load Service to automatically load data sets that are submitted to Janus and
"pass" the SDTM validation checks via the integrated STDM Validation service into the Oracle
database.
POST-CONDITIONS
 Data is registered.
4. References
1. Clinical Pharmacology & Therapeutics (2001) 69, 89–95; doi: 10.1067/mcp.2001.113989.
2. Janet Woodcock and Raymond Woosley. The FDA Critical Path Initiative and Its Influence on New Drug Development. Annu. Rev. Med. 2008. 59:1–12.
3. http://www.fda.gov/ScienceResearch/SpecialTopics/CriticalPathInitiative/CriticalPathOpportunitiesReports/ucm077262.htm, accessed 5 January 2010.
4. http://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfcfr/CFRSearch.cfm?CFRPart=820&showFR=1, accessed 28 February 2010.
5. http://www.answers.com/topic/phenotype. Accessed 17 February 2010.
6. http://www.answers.com/topic/surrogate-endpoint. Accessed 17 February 2010.
7. Iturralde, Mario P. (1990), CRC dictionary and handbook of nuclear medicine and clinical imaging, Boca Raton, Fla.: CRC Press, pp. 564, ISBN 0849332338.
8. Giger, QIBA newsletter, February 2010.
9. Giger M, Update on the potential of computer-aided diagnosis for breast disease, Future Oncol. (2010) 6(1), 1-4.
10. International Organization for Standardization, “Guide to the Expression of Uncertainty in Measurement”, (International Organization for Standardization, Geneva) 1993.
11. Joint Committee for Guides in Metrology, “International Vocabulary of Metrology – Basic and General Concepts and Associated Terms”, (Bureau International des Poids et Mesures, Paris) 2008.
12. Pan T., et al. “Imaging Data Analysis and Management for Microscopy Images of Diffuse Gliomas,” as presented at the November 2010 TBPT Face to Face meeting, Houston, TX.
13. Food and Drug Administration, 21 CFR 820, Quality system regulation. http://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfcfr/CFRSearch.cfm?CFRPart=820. Accessed 5 January 2010.
14. Fleiss, J. The design and analysis of clinical experiments. Wiley, New York 1986.
15. Bland M. and Altman D.; Measuring agreement in method comparison studies. Stat Methods Med Res 1999; 8; 135.
16. Barnhart H. and Barboriak D. Applications of the Repeatability of Quantitative Imaging Biomarkers: A Review of Statistical Analysis of Repeat Data Sets. Translational Oncology (2009) 2, 231–235.
17. Pepe, M. S. (2003). The statistical evaluation of medical tests for classification and prediction. New York, NY: Oxford University Press.
18. Zhou, Z., Obuchowski, N., & McClish, D. (2002). Statistical Methods in Diagnostic Medicine. New York: Wiley.
19. Chakraborty, D. P. and Berbaum, K. S. Observer studies involving detection and localization: Modeling, analysis and validation. Medical Physics 31(8), 2313–2330. 2004.
20. Edwards, D. C., Kupinski, M. A., Metz, C. E., and Nishikawa, R. M. Maximum likelihood fitting of FROC curves under an initial-detection-and-candidate-analysis model. Medical Physics 29(12), 2861–2870. (2002).
21. Bandos, A, Rockette H, Song T, Gur D. Area under the Free-Response ROC Curve (FROC) and a Related Summary Index. Biometrics, 2009; 65, 247–256.
22. Sargent DJ et al., Validation of novel imaging methodologies for use as cancer clinical trial endpoints, European Journal of Cancer, 45 (2009) 290-299.
23. McCulloch CE, Searle SR. Generalized, Linear and Mixed Models. New York: Wiley; 2000.
24. Goodsaid F and Frueh F, Process map proposal for the validation of genomic biomarkers, Pharmacogenomics (2006) 7(5), 773-782.
25. Goodsaid FM and Frueh FW, Questions and answers about the Pilot Process for Biomarker Qualification at the FDA, Drug Discovery Today, Vol 4, No. 1, 2007.
26. Goodsaid FM et al., Strategic paths for biomarker qualification, Toxicology 245 (2008) 219-223.
27. http://www.grand-challenge.org/index.php/Main_Page, accessed 23 December 2010.
28. http://www.preventcancer.org/uploadedFiles/Education/Conferences,_Workshops,_and_Educational_Programs/Day-2-Friday-1100-AM-Tony-Reeves.pdf, accessed 23 December 2010.
29. http://www.nist.gov/itl/iad/dmg/biochangechallenge.cfm, accessed 23 December 2010.