QI-Bench_Monthly,_2-9

advertisement
Monthly Program Update
February 9, 2012
WITH FUNDING
SUPPORT
PROVIDED BY
NATIONAL
INSTITUTE OF
STANDARDS
AND
TECHNOLOGY
Andrew J. Buckler, MS
Principal Investigator
Agenda
• 10,000 foot view of where we’ve been and
where we’re going
• Implementation-independent computational
model for quantitative imaging development
• For each QI-Bench app, a demonstration of
what exists and a gap analysis relative to model
– (including first demo of Formulate, and review of
major Execute update)
2
10,000 Foot View
Period
Activity
2009 -> Winter 2011
User needs and requirements
analysis
Spring/Summer 2011
Initial Execute (including RDSM
and BAS) and desktop Analyze
(initial prototyping)
3, 4
Autumn 2011
Initial Specify (including QIBO
and BiomarkerDB)
Individual time point
demonstrator
5, 10
Winter 2012
Initial Formulate, and major
update to Execute
3A Pilot
5, 30
Spring/Summer 2012
Computational model to drive
architecture, and support for
semi-automated workflows
3A Pivotal and
longitudinal change
demonstrator
Autumn 2012 -> 2013 Exercise end-to-end chain with
reproducible workflows
Test Bed
Developers, Users
1, 2
(various)
3
Where we left off last month…
RDF Triple Store
CT
Volumetry
used_for
Therapeutic
Efficacy
Specify
obtained_by
CT
measure_of
Tumor
growth
Formulate
Execute
Reference Data Sets
QIBO
Analyze
Y=β0..n+β1(QIB)+β2T+ eij
4
DNF Model: Data
// Data resources:
RawDataType = ImagingDataType | NonImagingDataType | ClinicalVariableType
CollectedValue = Value + Uncertainty
DataService = { RawData | CollectedValue }
// implication being that contents may change over time
ReferenceDataSet = { RawData | CollectedValue }
// with fixed refresh policy and documented (controlled) provenance
// Derived from analysis of one or more ReferenceDataSets:
TechnicalPerformance = Uncertainty | CoefficientOfVariation |
CoefficientOfReliability | …
ClinicalPerformance = ReceiverOperatingCharacteristic | PPV/NPV |
RegressionCoefficient | …
SummaryStatistic = TechnicalPerformance| ClinicalPerformance
5
DNF Model: Knowledge
// Managed as Knowledge store:
Relation = subject property object (property object)
BiomarkerDB = { Relation }
//Examples:
OntologyConcept has Instance
| Biomarker isUsedFor BiologicalUse // “use”
| Biomarker isMeasuredBy AssayMethod // “method”
| AssayMethod usesTemplate AimTemplate // “template”
| AimTemplate includes CollectedValuePrompt // “prompt”
| ClinicalContext appliesTo IndicatedBiology // “biology”
| (AssayMethod targets BiologicalTarget) withStrength TechnicalPerformance
| (Biomarker pertainsTo ClinicalContext) withStrength ClinicalPerformance
| generalizations beyond this
6
Requirements drive function
// Business Requirements
FNIH, QIBA, and C-Path participants don’t have a way to provide precise specification for
context for use and applicable assay methods (to allow semantic labeling):
BiomarkerDB = Specify (biomarker domain expertise, ontology for labeling);
7
Requirements drive function
// Business Requirements
FNIH, QIBA, and C-Path participants don’t have a way to provide precise specification for
context for use and applicable assay methods (to allow semantic labeling):
BiomarkerDB = Specify (biomarker domain expertise, ontology for labeling);
Researchers and consortia don’t have an ability to exploit existing data resources with
high precision and recall:
ReferenceDataSet+ = Formulate (BiomarkerDB, {DataService} );
8
Requirements drive function
// Business Requirements
FNIH, QIBA, and C-Path participants don’t have a way to provide precise specification for
context for use and applicable assay methods (to allow semantic labeling):
BiomarkerDB = Specify (biomarker domain expertise, ontology for labeling);
Researchers and consortia don’t have an ability to exploit existing data resources with
high precision and recall:
ReferenceDataSet+ = Formulate (BiomarkerDB, {DataService} );
Technology developers and contract research organizations don’t have a way to do largescale quantitative runs:
ReferenceDataSet .CollectedValue+ = Execute (ReferenceDataSet.RawData);
9
Requirements drive function
// Business Requirements
FNIH, QIBA, and C-Path participants don’t have a way to provide precise specification for
context for use and applicable assay methods (to allow semantic labeling):
BiomarkerDB = Specify (biomarker domain expertise, ontology for labeling);
Researchers and consortia don’t have an ability to exploit existing data resources with
high precision and recall:
ReferenceDataSet+ = Formulate (BiomarkerDB, {DataService} );
Technology developers and contract research organizations don’t have a way to do largescale quantitative runs:
ReferenceDataSet .CollectedValue+ = Execute (ReferenceDataSet.RawData);
The community lacks way to apply definitive statistical analyses of annotation and image
markup over specified context for use:
BiomarkerDB.SummaryStatistic+ = Analyze ( { ReferenceDataSet .CollectedValue } );
10
Requirements drive function
// Business Requirements
FNIH, QIBA, and C-Path participants don’t have a way to provide precise specification for
context for use and applicable assay methods (to allow semantic labeling):
BiomarkerDB = Specify (biomarker domain expertise, ontology for labeling);
Researchers and consortia don’t have an ability to exploit existing data resources with
high precision and recall:
ReferenceDataSet+ = Formulate (BiomarkerDB, {DataService} );
Technology developers and contract research organizations don’t have a way to do largescale quantitative runs:
ReferenceDataSet .CollectedValue+ = Execute (ReferenceDataSet.RawData);
The community lacks way to apply definitive statistical analyses of annotation and image
markup over specified context for use:
BiomarkerDB.SummaryStatistic+ = Analyze ( { ReferenceDataSet .CollectedValue } );
Industry lacks standardized ways to report and submit data electronically:
efiling transactions+ = Package (BiomarkerDB, {ReferenceDataSet} );
11
Layered requirement: Reproducible
Workflows with Documented Provenance
// Business Requirements
FNIH, QIBA, and C-Path participants don’t have a way to provide precise specification for
context for use and applicable assay methods (to allow semantic labeling):
BiomarkerDB = Specify (biomarker domain expertise, ontology for labeling);
Researchers and consortia don’t have an ability to exploit existing data resources with
high precision and recall:
ReferenceDataSet+ = Formulate (BiomarkerDB, {DataService} );
Technology developers and contract research organizations don’t have a way to do largescale quantitative runs:
ReferenceDataSet .CollectedValue+ = Execute (ReferenceDataSet.RawData);
The community lacks way to apply definitive statistical analyses of annotation and image
markup over specified context for use:
BiomarkerDB.SummaryStatistic+ = Analyze ( { ReferenceDataSet .CollectedValue } );
Industry lacks standardized ways to report and submit data electronically:
efiling transactions+ = Package (BiomarkerDB, {ReferenceDataSet} );
12
Implementation-independent
computational model:
// Altogether
efiling transactions =
Package (Analyze (Execute (Formulate (Specify (biomarker domain expertise),
DataService))));
Computational and informatics design implications:
• It is possible to model chain as a process to achieve prescribed levels of
statistical significance for validity and utility
• It is possible to apply logical and statistical inference to address generalizability
of the results across clinical contexts
• Of course, bottleneck remains availability of data; purpose here is to define
informatics services to make best use of that data as to:
– How to optimize information content from any given experimental study, and
– How to incorporate individual study results into a formally defined description of the
biomarker acceptable to regulatory agencies.
13
BiomarkerDB = Specify (domain expertise to describe biomarkers);
Role
Domain
expert
Use Case
Supported Now
Gap vs. Model
Add/edit
knowledge
Web App
(thin)
Can create triples
in local store for
local community
Need to create knowledge
spanning domains and
communities (not just
local)
Desktop
(thick)
Triples can be
curated singly,
Ontology curated
in Protégé
Need to edit clusters of
triples (not just one at a
time), need system support
for input to ontology-level
curation
All ontologies on
Server-side BioPortal are
supported
Can’t link across ontologies
directly
Informaticist
Curate
knowledge
IT systems
expert
Link
ontologies
14
Project directions for Specify
•
Extend the QIBO to link to existing established
ontologies
1.
2.
•
leverage BFO upper ontology to align different
ontologies
convert portions of BRIDG and LSDAM to ontology
models in OWL
Automated conversion done in two steps:
1.
2.
•
convert current Sparx Enterprise Architect XMI
EMF UML format
export resulting EMF UML into a RDF/OWL
representation using TopBraid Composer
Provide GUI to traverse the QIBO concepts according
to their relationships and create statements
represented as RDF triples and stored in an RDF store.
•
•
example: “Image is from Patient AND Patient has age
>60 AND Patient has Disease where Disease has value
Lung Cancer AND Patient has Smoking Status where
Smoking Status has value True.” This translates to “Find
me all images that are from a patient older than 60,
diagnosed with lung cancer and is a smoker”.
Each set of RDF triples will be stored as a “profile” in
Bio2RDF
15
ReferenceDataSet+ = Formulate (BiomarkerDB, {DataService} );
Role
Use Case
Supported Now
Find data
with high
precision
and recall
Web App
(thin)
Form and
expose
Informaticist
queries to
find data
Desktop
(thick)
Domain
expert
IT systems
expert
Configure
knowledge
resources
and data
services
Gap vs. Model
Perform saved
searches and
organize data
Granular, role-based,
security with single sign-on
to both private and public
data resources
Use UML models
Define queries in terms of
RDF triples driven by
ontologies (not UML)
Prepare and support more
Configure
flexible method (e.g.,
Server-side resources that use
SPARQL) so as not to be
caGrid
limited by caGrid
16
Project directions for Formulate
•
Enables users to select the profiles (set
of RDF triples) created in Specify,
execute a query and retrieve the
results in various forms.
• assemble/transform the set of
RDF triples to SPARQL queries:
1. form an uninterrupted chain
linking the instance of the
input class from the ontology
to the desired output class
2. formulate/invoke necessary
SPARQL queries against the
web services.
Activities:
•Load models used in Specify to caB2B
•A transformer that will take Source (specify)
•Load it to caB2B
•Expose the query through caB2B Web Client
•Create a form-based query parameterization utility
Longer term:
•Thick client will assist modification of the query or
facilitate reparameterization
Activities
• End point representation (caGrid Service/SPARQL)
• Target language is not DCQL but SPARQL
• Come up with some SPARQL End Point wrappers for some projects
of interest (pilots)
•Universal wrapper for caGrid services (?)
•For sure one for DICOM and TCIA/Midas
Activities
• Results in RDF
• Transformed results in CSV and some form digestable by Midas
• Directly stored (direct interfacing)
• Manual load (indirect interfacing)
17
ReferenceDataSet .CollectedValue+ = Execute (ReferenceDataSet.RawData);
Role
Domain
expert
Use Case
Set up
studies to
gather
imaging
results
Establish
metadata
standards
Informaticist
and define
scripted
runs
IT systems
expert
Configure
data marts
of mixed
data types
Supported Now
Gap vs. Model
Web App
(thin)
Warehouse data
flexibly, initiate
automated and
(soon) semiautomated runs
More complete automated
support for data curation
and script editing (current
automated support for
curation is limited)
Desktop
(thick)
Single time point
and (soon)
longitudinal
change studies on
single data types
Build composite markers
with multiple parameters
per modality and spanning
multiple modalities
Support multiple
Server-side and mixed types
of bit streams
Explicit support for IHE
Profiles
18
Project directions for Execute
•
Script to write “Image Formation” content into Biomarker Database for provenance
of Reference Data Sets: Application for pulling in data from the “Image Formation” schema to populate the biomarker
database. This data will originate from the DICOM imagery imported into QI-Bench.
•
Laboratory Protocol for the NBIA Connector and Batch Analysis Service: Laboratory protocol
to describe the use of the NBIA Connector and the Image Formation script to import data into QIBench and use of the Batch Analysis Service
for server-side processing.
•
Support change analysis biomarkers serial studies (up to two time points in the
current period, extensible to additional in subsequent development iterations): Support
experiments including at minimum two time points. An example of this is the change in volume or SUV, rather than (only) estimation of the
value at one time point.
•
•
Document and solidify the API harness for execution modules of the Batch Analysis
Service: This task will include the documentation and complete specification of the Batchmake Application API.
Support scripted reader studies: Support reader studies through worklist items specified via AIM templates as well as
Query/Retrieve via DICOM standards for interaction with reader stations. ClearCanvas will serve as the target reader station for the first
implementation.
•
Automated support for export of data to Analyze: Including generation of annotation and image markup
output from reference algorithms (i.e., LSTK for volumetric CT and Slicer3D for SUV) based on AIM templates instead of the current hardcoded implementation. An AIM Template is an .xsd file.
•
Automated support for import of data from Formulate: This task will include refactoring and
stabilization of the NBIA Connector in order to incorporate its functionality into Formulate.
19
BiomarkerDB.SummaryStatistic+ = Analyze ({ReferenceDataSet .CollectedValue});
Role
Domain
expert
Use Case
Run and
modify
statistical
analyses
Configure
set of
Informaticist analysis
scripts in
toolbox
IT systems
expert
Configure
data input
and
output
services
Supported Now
Gap vs. Model
Web App
(thin)
n/a
Desktop
(thick)
Persist statistical
Run relevant
calculations results as Ncalculations at
ary relations in knowledge
technical
store, across larger data
performance level
sets and at clinical level too
Create the web app
Connect directly to RDSM
Connects to
and knowledge store (and
Server-side (some)caGrid data through Formulate,
services
broader set of data
services)
20
Analyze
Current Capabilities
21
Project directions for Analyze
Desktop GUI
Current MVT
MVT Measurement
XIP
LIB
R LIB
Variability Tool
Cached objects:
AIM/DICOM, etc
Data Access
AIM, DICOM
Experiment data
Metadata
NonGrid
Data
Sources
AD Client
Modified XIP
Host
Images, Patient info
Annotations, Collections
Experiments
Web Client
AD
Server
Scope:
• GWT (or Tapestry) UI ; Create
web client version
• Change from DB2 to
RESTful service layer; Add
calculation results to
persistent database
Implemented according to Open
Source Development
Initiative (OSDI)
recommendations;
In such a way as to enable the
enhancement roadmap; and
Integrated with projects driving
advanced semantics and FDA
support
Data services
22
efiling transactions+ = Package (BiomarkerDB, {ReferenceDataSet});
Role
Use Case
Define and
“pull” a
data
package
Domain
expert
Define
Informaticist data
mappings
IT systems
expert
Connect to
electronic
regulatory
systems
Supported Now
Gap vs. Model
Web App
(thin)
n/a
Define and “pull” a data
package
Desktop
(thick)
n/a
Define data mappings
Server-side n/a
Connect to electronic
regulatory systems
Package is a lower priority application, current effort is to participate
In vocabulary standards efforts, staging implementation later.
23
Outlook
Period
Activity
2009 -> Winter 2011
User needs and requirements
analysis
Spring/Summer 2011
Initial Execute (including RDSM
and BAS) and desktop Analyze
(initial prototyping)
3, 4
Autumn 2011
Initial Specify (including QIBO
and BiomarkerDB)
Individual time point
demonstrator
5, 10
Winter 2012
Initial Formulate, and major
update to Execute
3A Pilot
5, 30
Spring/Summer 2012
Computational model to drive
architecture, and support for
semi-automated workflows
3A Pivotal and
longitudinal change
demonstrator
Autumn 2012 -> 2013 Exercise end-to-end chain with
reproducible workflows
Test Bed
Developers, Users
1, 2
(various)
24
25
Value proposition of QI-Bench
• Efficiently collect and exploit evidence establishing
standards for optimized quantitative imaging:
– Users want confidence in the read-outs
– Pharma wants to use them as endpoints
– Device/SW companies want to market products that produce them
without huge costs
– Public wants to trust the decisions that they contribute to
• By providing a verification framework to develop
precompetitive specifications and support test
harnesses to curate and utilize reference data
• Doing so as an accessible and open resource facilitates
collaboration among diverse stakeholders
26
Summary:
QI-Bench Contributions
• We make it practical to increase the magnitude of data for increased
statistical significance.
• We provide practical means to grapple with massive data sets.
• We address the problem of efficient use of resources to assess limits of
generalizability.
• We make formal specification accessible to diverse groups of experts that are
not skilled or interested in knowledge engineering.
• We map both medical as well as technical domain expertise into
representations well suited to emerging capabilities of the semantic web.
• We enable a mechanism to assess compliance with standards or
requirements within specific contexts for use.
• We take a “toolbox” approach to statistical analysis.
• We provide the capability in a manner which is accessible to varying levels of
collaborative models, from individual companies or institutions to larger
consortia or public-private partnerships to fully open public access.
27
QI-Bench
Structure / Acknowledgements
•
•
Prime: BBMSC (Andrew Buckler, Gary Wernsing, Mike Sperling, Matt Ouellette)
Co-Investigators
–
–
•
•
Financial support as well as technical content: NIST (Mary Brady, Alden Dima, Guillaume Radde)
Collaborators / Colleagues / Idea Contributors
–
–
–
–
–
–
•
FDA (Nick Petrick, Marios Gavrielides)
UCLA (Grace Kim)
UMD (Eliot Siegel, Joe Chen, Ganesh Saiprasad)
VUmc (Otto Hoekstra)
Northwestern (Pat Mongkolwat)
Georgetown (Baris Suzek)
Industry
–
–
•
Kitware (Rick Avila, Patrick Reynolds, Julien Jomier, Mike Grauer)
Stanford (David Paik, Tiffany Ting Liu)
Pharma: Novartis (Stefan Baumann), Merck (Richard Baumgartner)
Device/Software: Definiens (Maria Athelogou), Claron Technologies (Ingmar Bitter)
Coordinating Programs
–
–
RSNA QIBA (e.g., Dan Sullivan, Binsheng Zhao)
Under consideration: CTMM TraIT (Andre Dekker, Jeroen Belien)
28
Download