4th Program Face to Face
February 25, 2013
WITH FUNDING SUPPORT PROVIDED BY NATIONAL INSTITUTE OF STANDARDS AND TECHNOLOGY
Andrew J. Buckler, MS
Principal Investigator, QI-Bench

Agenda
1:00 PM  Overview of QI-Bench Progress since last F2F (Buckler)
         CTP and Q/R Demonstration (Reynolds)
1:30 PM  Architecture and Design Motivation (Dima)
         Component Model and Biomarker DB (Wernsing)
         Data Virtualization Layer (Reynolds)
2:30 PM  Break
         Specify and Formulate (Suzek)
         Compute Services, Analysis Library, Workflows (Danagoulian)
         QI-Bench Client / Workstation (Wernsing)
3:45 PM  CT Volumetry Test Bed (Buckler)
4:15 PM  Wrap-up (all)
4:45 PM  Adjourn

Resources are needed to address the widening gap between imaging capability as practiced and the capability of modern medicine

Example: Beyond Anatomy to Palette of Functional Measures
[Figure: biologic targets and example probes: bone formation (18F-NaF), glucose metabolism (18F-FDG), amino acid metabolism (18F-FACBC), proliferation (18F-FLT), angiogenesis (DCE-MRI), hypoxia (18F-FMISO PET), receptor status (18F-FES), apoptosis (18F-XXX)]

Biomarker Representation in Imaging vs. Genomics/Proteomics
• Imaging has been around far longer than genomics/proteomics (figure labels: 1895 and 1995)
• Both are arrays of numbers but only one has data conveniently “pre-aligned” for quantitative analysis (callout: missing for imaging)

Community Development of Quantitative Imaging Biomarkers
• User Base:
  – Consortia and foundations interested in broadly promoting imaging biomarkers (e.g., FNIH Biomarkers Consortium, Prevent Cancer Foundation, RSNA QIBA)
  – Academic groups and research centers developing novel imaging biomarkers and applications (e.g., Stanford, Georgetown, etc.)
  – Medical device and software manufacturers producing software that quantifies image biomarkers (e.g., Definiens, Vital Images)
  – Biopharmaceutical companies and/or CROs interested in utilizing specific imaging biomarkers in clinical trials (e.g., Merck, Otsuka)
  – Government regulatory and standards agencies (e.g., FDA, NIST)
A community of people working together: no single stakeholder can do it alone, and this results in a need for standardized terminology and applications that use it.

Ex vivo and In vivo Biomarker Resources
Resource | Ex vivo Biomarkers (genomic/proteomic) | In vivo Biomarkers (imaging)
Material Resources | Biobanks (Karolinska Institutet Biobank, British Columbia Biobank) | Probe/Tracer Banks (Radiotracer Clearinghouse)
Data Resources | Biomarker Databases (GEO, ArrayExpress, EDRN Biomarker Database, Infectious Disease Biomarker Database) | Imaging Biomarker Resources (Midas, NBIA, Xnat, …)
Metadata Resources | Information Models (GO, MIAME) | Information Models (RadLex, DICOM, AIM, etc.)
Certainly, as the science evolves, ex vivo and in vivo biomarkers will be thought of as on the same playing field and even combined.

QIBO is analogous to GO
• Advantages of shared terminology:
  – GO: Gene families, homologs, orthologs create rich relationships; synonyms between researchers resolved
  – QIBO: Imaging biomarkers have rich relationships; synonyms between researchers resolved
• Scope of the ontology:
  – GO: does not enumerate all gene products; supports annotation
  – QIBO: does not enumerate all imaging biomarkers; supports annotation
• Cross-links between collaborating databases:
  – GO: ArrayExpress, EMBL, Ensembl, GeneCards, KEGG, MGD, NextBio, PDB, SGD, UniProt, etc.
  – QIBO: NBIA, Radiotracer Clearinghouse, etc.
• Variable level of detail queries:
  – GO: all gene products in mouse genome vs. zooming in on only receptor tyrosine kinases
  – QIBO: all ways to measure tumor volume vs. zooming in on % change of CT measurements of NSCLC tumor volumes

Worked Example (starting from claim analysis we discussed in February 2011)
Measurements of tumor volume are more precise (reproducible) than uni-dimensional tumor measurements of tumor diameter. Longitudinal changes in whole tumor volume during therapy predict clinical outcomes (i.e., OS or PFS) earlier than corresponding uni-dimensional measurements. Therefore, tumor response or progression as determined by tumor volume will be able to serve as the primary endpoint in well-controlled Phase II and III efficacy studies of cytotoxic and selected targeted therapies (e.g., antiangiogenic agents, tyrosine kinase inhibitors, etc.) in several solid, measurable tumors (including both primary and metastatic cancers of, e.g., lung, liver, colorectal, gastric, head and neck cancer) and lymphoma. Changes in tumor volume can serve as the endpoint for regulatory drug approval in registration trials.
Biomarker claim statements are information-rich and may be used to set up the needed analyses.

The user enters information from claim into the knowledgebase using Specify
Callout labels: Categoric; Continuous; Continuous
Subject | Predicate | Object
CT | images | Tumor
Volumetry | analyzes | CT
LongitudinalVolumetry | estimates | TumorSizeChange
TumorSizeChange | predicts | TreatmentResponse

…pulling various pieces of information,
Callout labels: Intervention; Target; Indication
Subject | Predicate | Object
CT | images | Tumor
Volumetry | analyzes | CT
<compliant>LongitudinalVolumetry | estimates | TumorSizeChange
TumorSizeChange | predicts | CytotoxicTreatmentResponse
TyrosineKinaseInhibitor | is | CytotoxicTreatment
well-controlled Phase II and III efficacy studies | uses | CytotoxicTreatmentResponse
CytotoxicTreatment | influences | NonSmallCellLungCancer
CT | images | Thorax
Thorax | contains | NonSmallCellLungCancer

…to form the specification.
Callout labels: To substantiate quality of evidence development; To produce data for registration
Subject | Predicate | Object
CT | images | Tumor
Volumetry | analyzes | CT
<compliant>LongitudinalVolumetry | estimates | TumorSizeChange
TumorSizeChange | predicts | CytotoxicTreatmentResponse
TyrosineKinaseInhibitor | is | CytotoxicTreatment
well-controlled Phase II and III efficacy studies | uses | CytotoxicTreatmentResponse
CytotoxicTreatment | influences | NonSmallCellLungCancer
CT | images | Thorax
Thorax | contains | NonSmallCellLungCancer
regulatory drug approval | dependsOn | PrimaryEndpoint
well-controlled Phase II and III efficacy studies | assess | PrimaryEndpoint
CT Volumetry | is | <putative>SurrogateEndpoint

Formulate interprets the specification as testable hypotheses,
Callout labels: Technical characteristic; Type of biomarker, in this case predictive (could have been something else, e.g., prognostic), to establish the mathematical formalism
Subject | Predicate | Object
CT | images | Tumor
Volumetry | analyzes | CT
<compliant>LongitudinalVolumetry | estimates | TumorSizeChange   [hypothesis 1]
TumorSizeChange | predicts | CytotoxicTreatmentResponse   [hypothesis 2]
TyrosineKinaseInhibitor | is | CytotoxicTreatment
well-controlled Phase II and III efficacy studies | uses | CytotoxicTreatmentResponse
CytotoxicTreatment | influences | NonSmallCellLungCancer
CT | images | Thorax
Thorax | contains | NonSmallCellLungCancer
regulatory drug approval | dependsOn | PrimaryEndpoint
well-controlled Phase II and III efficacy studies | assess | PrimaryEndpoint
CT Volumetry | is | <proven>SurrogateEndpoint   [hypothesis 3]

…setting up an investigation (I), study (S), assay (A) hierarchy…
Subject | Predicate | Object
CT | images | Tumor
Volumetry | analyzes | CT
<compliant>LongitudinalVolumetry | estimates | TumorSizeChange   [hypothesis 1]
TumorSizeChange | predicts | CytotoxicTreatmentResponse   [hypothesis 2]
TyrosineKinaseInhibitor | is | CytotoxicTreatment
well-controlled Phase II and III efficacy studies | uses | CytotoxicTreatmentResponse
CytotoxicTreatment | influences | NonSmallCellLungCancer
CT | images | Thorax
Thorax | contains | NonSmallCellLungCancer
regulatory drug approval | dependsOn | PrimaryEndpoint
well-controlled Phase II and III efficacy studies | assess | PrimaryEndpoint
CT Volumetry | is | <putative>SurrogateEndpoint   [hypothesis 3]
Investigations to Prove the Hypotheses:
1. Technical Performance = Biological Target + Assay Method
2. Clinical Validity = Indicated Biology + Technical Performance
3. Clinical Utility = Biomarker Use + Clinical Validity
Investigation-Study-Assay Hierarchy:
• Investigation = {Summary Statistic} + {Study}
• Study = {Descriptive Statistic} + Protocol + {Assay}
• Assay = RawData + {AnnotationData}
• AnnotationData = [AIM file|mesh|…]

…and loading data into Execute (at least raw data, possibly annotations if they already exist)
DISCOVERED DATA:
Subject | Predicate | Object
A | Is | Patient
A | isDiagnosedWith | DiseaseA
DiseaseA | Is | NonSmallCellLungCancer
Pazopanib | Is | TyrosineKinaseInhibitor
A | hasBaseline | CT
A | hasTP1 | CT
A | hasTP2 | CT
B | isDiagnosedWith | DiseaseA
B | hasBaseline | CT
B | hasTP1 | CT
A | hasOutcome | Death
B | hasOutcome | Survival
…LOADING DATA INTO THE RDSM:
…ADDING TRIPLES TO CAPTURE URIs:
Subject | Predicate | Object
ClinicalUtility | is | Investigation URI
ClinicalValidity | is | Investigation URI
TechnicalPerformance | is | Investigation URI
Investigation | has | SummaryStatisticType
Investigation | has | Study URI
Study | has | DescriptiveStatisticType
Study | has | Protocol URI
Study | has | Assay URI
Assay | has | RawData URI

If no annotations, Execute creates them (in either case leaving Analyze with its data set up for it)
Either in batch or via scripted reader studies (using “Share” and “Duplicate” functions of RDSM to leverage cases across investigations)
(self-generating knowledgebase from RDSM hierarchy and ISA-TAB description files)
Subject | Predicate | Object
ClinicalUtility | is | Investigation URI
ClinicalValidity | is | Investigation URI
TechnicalPerformance | is | Investigation URI
Investigation | has | SummaryStatisticType
Investigation | has | Study URI
Study | has | DescriptiveStatisticType
Study | has | Protocol URI
Study | has | Assay URI
Assay | has | RawData URI
Assay | has | AnnotationData URI
AIM file | is | AnnotationData URI
Mesh | is | AnnotationData URI

Analyze performs the statistical analyses…
Specification triples:
Subject | Predicate | Object
CT | images | Tumor
Volumetry | analyzes | CT
<compliant>LongitudinalVolumetry | estimates | TumorSizeChange   [hypothesis 1]
TumorSizeChange | predicts | CytotoxicTreatmentResponse   [hypothesis 2]
TyrosineKinaseInhibitor | is | CytotoxicTreatment
well-controlled Phase II and III efficacy studies | uses | CytotoxicTreatmentResponse
CytotoxicTreatment | influences | NonSmallCellLungCancer
CT | images | Thorax
Thorax | contains | NonSmallCellLungCancer
regulatory drug approval | dependsOn | PrimaryEndpoint
well-controlled Phase II and III efficacy studies | assess | PrimaryEndpoint
CT Volumetry | is | SurrogateEndpoint for CytotoxicTreatment   [hypothesis 3]
Data triples:
Subject | Predicate | Object
A | Is | Patient
A | isDiagnosedWith | DiseaseA
DiseaseA | Is | NonSmallCellLungCancer
A | hasClinicalObservation | B
B | Is | TumorShrinkage
C | Is | Patient
C | hasClinicalObservation | B
D | hasClinicalObservation | B
Pazopanib | Is | TyrosineKinaseInhibitor
A | isTreatedWith | Pazopanib
A | hasOutcome | Death
C | hasOutcome | Survival

…and adds the results to the knowledgebase (using W3C “best practices” for “relation strength”).
Specification triples:
Subject | Predicate | Object
CT | images | Tumor
Volumetry | analyzes | CT
<compliant>LongitudinalVolumetry | estimates | TumorSizeChange   [hypothesis 1, URI=45324]
TumorSizeChange | predicts | CytotoxicTreatmentResponse   [hypothesis 2, URI=9956]
TyrosineKinaseInhibitor | is | CytotoxicTreatment
well-controlled Phase II and III efficacy studies | uses | CytotoxicTreatmentResponse
CytotoxicTreatment | influences | NonSmallCellLungCancer
CT | images | Thorax
Thorax | contains | NonSmallCellLungCancer
regulatory drug approval | dependsOn | PrimaryEndpoint
well-controlled Phase II and III efficacy studies | assess | PrimaryEndpoint
CT Volumetry | is | SurrogateEndpoint for CytotoxicTreatment   [hypothesis 3, URI=98234]
Result triples:
Subject | Predicate | Object
45324 | biasMethod | <r script used>
45324 | bias | <summary statistic>
45324 | variabilityMethod | <r script used>
45324 | variability | <summary statistic>
9956 | <correlation>Method | <r script used>
9956 | correlation | <summary statistic>
9956 | <ROC>Method | <r script used>
9956 | ROC | <summary statistic>
98234 | Effect of treatment on true endpoint | <value>
98234 | Effect of treatment on surrogate endpoint | <value>
98234 | Effect of surrogate on true endpoint | <value>
98234 | Effect of treatment on true endpoint relative to that on surrogate endpoint | <value>

Package
Structure submissions according to eCTD, HL7 RCRIM, and SDTM
Subject | Predicate | Object
45324 | biasMethod | <r script used>
45324 | bias | <summary statistic>
45324 | variabilityMethod | <r script used>
45324 | variability | <summary statistic>
9956 | <correlation>Method | <r script used>
9956 | correlation | <summary statistic>
9956 | <ROC>Method | <r script used>
9956 | ROC | <summary statistic>
98234 | Effect of treatment on true endpoint | <value>
98234 | Effect of treatment on surrogate endpoint | <value>
98234 | Effect of surrogate on true endpoint | <value>
98234 | Effect of treatment on true endpoint relative to that on surrogate endpoint | <value>
Section 2 Summaries
  2.1. Biomarker Qualification Overview
    2.1.1. Introduction
    2.1.2. Context of Use
    2.1.3. Summary of Methodology and Results
    2.1.4. Conclusion
  2.2. Nonclinical Technical Methods Data
    2.2.1. Summary of Technical Validation Studies and Analytical Methods
    2.2.2. Synopses of individual studies
  2.3. Clinical Biomarker Data
    2.3.1. Summary of Biomarker Efficacy Studies and Analytical Methods
    2.3.2. Summary of Clinical Efficacy [one for each clinical context]
    2.3.3. Synopses of individual studies
Section 3 Quality <used when individual sponsor qualifies marker in a specific NDA>
Section 4 Nonclinical Reports
  4.1. Study reports
    4.1.1. Technical Methods Development Reports
    4.1.2. Technical Methods Validation Reports
    4.1.3. Nonclinical Study Reports (in vivo)
  4.2. Literature references
Section 5 Clinical Reports
  5.1. Tabular listing of all clinical studies
  5.2. Clinical study reports and related information
    5.2.1. Technical Methods Development reports
    5.2.2. Technical Methods Validation reports
    5.2.3. Clinical Efficacy Study Reports [context for use]
  5.3. Literature references

Development priorities
Theoretical Base:
• Domain Specific Language
• Computational Model
• Enterprise vocabulary / data service registry
• Executable Specifications
Functionality:
• Curation pipeline workflows
• DICOM:
  – Segmentation objects
  – Query/retrieve
  – Structured Reporting
• Worklist for scripted reader studies
• End-to-end Specify -> Package workflows
• Improved query / search tools (including link of Formulate and Execute)
• Continued expansion of Analyze tool box
Test Beds:
• Further analysis of 1187/4140, 1C, and other data sets using LSTK and/or use API to other algorithms
• Support more 3A-like challenges
• Integration of detection into pipeline
• Meta-analysis of reported results using Analyze
• False-positive reduction in lung cancer screening
• Other biomarkers

Unifying Goal
• Perform end-to-end characterization of imaging biomarkers (e.g., vCT) including meta-analysis of literature, incorporation of results from groups like QIBA, and "scaled up" using automated detection and reference quantification methods.
• Integrated characterization across heterogeneous data sources (e.g., QIBA, FDA, LIDC/RIDER, Give-a-scan, Open Science sets), through analysis modules and rolling up in a way directly useful for electronic submissions.
• Specifically have medical physicists, statisticians, and imaging scientists able to use it (as opposed to only software engineers)

CTP AND Q/R DEMONSTRATION

DICOM Protocol Implementation
• Uses DCMTK to handle protocol interaction
• Datasets can be selectively available via the DICOM interface
• Protocol support was tested on OsiriX, ClearCanvas, and Ginkgo CADx

DICOM Anonymization
• Remove Patient Identifying Information based on established protocols
• Leverages the Clinical Trials Processor (CTP) from the Radiological Society of North America
• All processing happens client-side

Demo

ARCHITECTURE AND DESIGN MOTIVATION (Dima)

Big Picture Motivations for Rethinking Design and Architecture
• Fully embracing reuse and open source can lead to eclectic architectures and implementations
• Issues:
  – Finding broadly fluent developers
  – System deployment and maintenance
  – Compliance with organizational security plans
  – Potential loss of architectural coherence and project focus
• Strategy:
  – Describe the required system using a Domain Specific Language (DSL)
  – Use description to guide implementation
  – Use Java Platform as much as possible for implementation
• Started with sketch using a Backus-Naur (BNF) notation
• Began looking at describing portions of system in a Java-based DSL

DSL Examples
SQL:
SELECT Book.title AS Title, COUNT(*) AS Authors
FROM Book JOIN Book_author ON Book.isbn = Book_author.isbn
GROUP BY Book.title;
Simple Camera Language
Grammar:
<Program> ::= <CameraSize> <CameraPosition> <CommandList>
<CameraSize> ::= "set" "camera" "size" ":" <number> "by" <number> "pixels" "."
<CameraPosition> ::= "set" "camera" "position" ":" <number> "," <number> "."
<CommandList> ::= <Command>+
<Command> ::= "move" <number> "pixels" <Direction> "."
<Direction> ::= "up" | "down" | "left" | "right"
Example:
Set camera size: 400 by 300 pixels.
Set camera position: 100, 100.
Move 200 pixels right.
Move 100 pixels up.
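
A grammar this small can be checked mechanically. The following is a minimal sketch of a recursive-descent parser for the Simple Camera Language rules on the DSL Examples slide, written in Python for brevity rather than on the Java platform the strategy proposes. The names `tokenize`, `CameraParser`, and `parse_program` are hypothetical, and matching keywords case-insensitively is an assumption made so that the capitalized example program parses against the lowercase grammar literals.

```python
import re

# Words, unsigned integers, and the three punctuation marks the grammar uses.
TOKEN_PATTERN = re.compile(r"[A-Za-z]+|[0-9]+|[:,.]")
DIRECTIONS = {"up", "down", "left", "right"}


def tokenize(text):
    """Return lowercase tokens; characters outside the pattern are skipped."""
    return [tok.lower() for tok in TOKEN_PATTERN.findall(text)]


class CameraParser:
    """One method per grammar rule (illustrative helper, not QI-Bench code)."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, *literals):
        # Consume the given terminal symbols in order or fail.
        for literal in literals:
            if self.peek() != literal:
                raise SyntaxError(f"expected {literal!r}, found {self.peek()!r}")
            self.pos += 1

    def number(self):
        tok = self.peek()
        if tok is None or not tok.isdigit():
            raise SyntaxError(f"expected a number, found {tok!r}")
        self.pos += 1
        return int(tok)

    def camera_size(self):
        self.expect("set", "camera", "size", ":")
        width = self.number()
        self.expect("by")
        height = self.number()
        self.expect("pixels", ".")
        return (width, height)

    def camera_position(self):
        self.expect("set", "camera", "position", ":")
        x = self.number()
        self.expect(",")
        y = self.number()
        self.expect(".")
        return (x, y)

    def command(self):
        self.expect("move")
        distance = self.number()
        self.expect("pixels")
        direction = self.peek()
        if direction not in DIRECTIONS:
            raise SyntaxError(f"expected a direction, found {direction!r}")
        self.pos += 1
        self.expect(".")
        return (direction, distance)

    def program(self):
        size = self.camera_size()
        position = self.camera_position()
        commands = [self.command()]          # <Command>+ needs at least one
        while self.peek() is not None:
            commands.append(self.command())
        return {"size": size, "position": position, "commands": commands}


def parse_program(text):
    """Parse a complete Simple Camera Language program into a dictionary."""
    return CameraParser(tokenize(text)).program()
```

Calling `parse_program` on the four-sentence example from the slide returns the camera size, the starting position, and the list of move commands; a program with no `Move` sentence is rejected because `<CommandList>` requires at least one command. Keeping one method per rule is a deliberate choice here: the code can be read side by side with the BNF.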
BNF Model (Data and Knowledge)
Data resources:
RawDataType = ImagingDataType | NonImagingDataType | ClinicalVariableType
CollectedValue = Value + Uncertainty
DataService = { RawData | CollectedValue }        (implication that contents may change over time)
ReferenceDataSet = { RawData | CollectedValue }   (with fixed refresh policy and documented (controlled) provenance)
Managed as Knowledge store:
Relation = subject property object (property object)
BiomarkerDB = { Relation }
Derived from analysis of ReferenceDataSets:
TechnicalPerformance = Uncertainty | CoefficientOfVariation | CoefficientOfReliability | …
ClinicalPerformance = ReceiverOperatingCharacteristic | PPV/NPV | RegressionCoefficient | …
SummaryStatistic = TechnicalPerformance | ClinicalPerformance
Examples:
OntologyConcept has Instance |
Biomarker isUsedFor BiologicalUse use |
Biomarker isMeasuredBy AssayMethod method |
AssayMethod usesTemplate AimTemplate template |
AimTemplate includes CollectedValuePrompt prompt |
ClinicalContext appliesTo IndicatedBiology biology |
(AssayMethod targets BiologicalTarget) withStrength TechnicalPerformance |
(Biomarker pertainsTo ClinicalContext) withStrength ClinicalPerformance |
generalizations beyond this

Business Requirements
Provide:
• Means for FNIH, QIBA, and C-Path participants to precisely specify context for use and applicable assay methods (allow semantic labeling):
  BiomarkerDB = Specify (biomarker domain expertise, ontology for labeling);
• Ability for researchers and consortia to use data resources with high precision and recall:
  ReferenceDataSet+ = Formulate (BiomarkerDB, {DataService} );
• Vehicle for technology developers and contract research organizations to do large-scale quantitative runs:
  ReferenceDataSet.CollectedValue+ = Execute (ReferenceDataSet.RawData);
• Means for community to apply definitive statistical analyses of annotation and image markup over specified context for use:
  BiomarkerDB.SummaryStatistic+ = Analyze ( { ReferenceDataSet.CollectedValue } );
• Standardized methods for industry to report and submit data electronically:
  efiling transactions+ = Package (BiomarkerDB, {ReferenceDataSet} );

Computational Model
efiling transactions = Package (Analyze (Execute (Formulate (Specify (biomarker domain expertise), DataService))));
Data availability is the bottleneck. The purpose here is to define informatics services that make best use of data to:
– Optimize information content from any given experimental study, and
– Incorporate individual study results into a formally defined description of the biomarker acceptable to regulatory agencies.

COMPONENT MODEL AND BIOMARKER DB (Wernsing)
Most familiar: Data Services
Also familiar: Compute Services
Less familiar to some, but foundational to the full vision: The Blackboard
Current implementation of Specify
Beginning of the *new* Specify
Current Bio2RDF site
Less familiar to some, but foundational to the full vision: The Blackboard
Interfacing to existing ecosystem: Workstations
Internal components within QI-Bench to make it work: Controller and Model Layers
Internal components within QI-Bench to make it work: QI-Bench REST
Last but not least: QI-Bench Web GUI

DATA VIRTUALIZATION LAYER
Most familiar: Data Services

Data Virtualization Layer (Motivation)
• Datasets come in disparate forms from many different databases with different APIs
• We need a method to aggregate data in such a way that all data may be addressed equally
• We need a Java-based solution for this

Publicly Available Now for Detailed Use
• QI-Bench Demonstrators (incl. QIBA and FDA data): public facing, 5,281 image series over 6 studies of 3 anatomic regions (secure instance: 17,000 image series over 7 studies of 1 anatomic region)
• LIDC/RIDER/TCIA: 2,129 patients over 9 studies of 4 anatomic regions
• Give-a-scan: 23 patients at http://www.giveascan.org/community/view/2
• Open Science sets (e.g., biopsy cases): 1,209 datasets over 3 studies at http://midas3.kitware.com/midas/community/6
• NBIA: 3,759 image series from 771 patients over 17 studies of 3 anatomic regions (the 3770 from Formulate’s simple search)
have roughly <count them> patients, e.g.,

Data Virtualization (Implementation)
• Teiid, a framework for exposing nearly any data source via a JDBC-compliant API
• Teiid will allow for adding new imaging and informatics databases with minimal effort
• Teiid gets us to think about “the data we want.”

SPECIFY AND FORMULATE (Suzek)

Motivation
• Specify: Support a researcher in stating a hypothesis in a natural-language-like way using ontologies
  – The tumor volume change computed from longitudinal thorax CT images is a biomarker for treatment response to a specific drug family
Subject | Predicate | Object
CT | images | Thorax
Thorax | contains | NonSmallCellLungCancer
Volumetry | analyzes | CT
LongitudinalVolumetry | estimates | TumorSizeChange
TumorSizeChange | predicts | CytotoxicTreatmentResponse
TyrosineKinaseInhibitor | is | CytotoxicTreatment
• Formulate: Support seamless collection of data sets to support the hypothesis
  – One needs longitudinal thorax CT images from lung cancer patients who have been treated with a specific drug family

Contribution
• A natural-language-like way of formalizing and standardizing the hypothesis statement
• A computable way to persist the hypothesis to support reuse and iteration
• An automated way to identify the data sets to support/study the hypothesis
• A reproducible flow from hypothesis to data collection/analysis

Solution Overview
[Diagram: Specify and Formulate connected to data services and compute services. Labels: unstructured or semistructured expert sources for clinical context for use and assay methods; QIBO and linked ontologies; current triples; new/updated triples; existing datasets; new datasets; hypotheses and saved queries; (testable) assertions; Knowledge base (in triple store); Reference Data Sets; initial annotations; enriched annotations; raw data; derived data; (evaluation applications)]

Representing Semantics
• Leverage existing established ontologies and extend QIBO
• Normalize representation to ontologies
  – E.g., convert portions of BRIDG and LSDAM in UML to ontologies

Specify
• Navigate the ontology hierarchy for concepts
• Create triple (subject, predicate, object) using concepts from ontology
• Manage and store triples that represent the hypothesis
The tumor volume change computed from longitudinal thorax CT images is a biomarker for treatment response to a specific drug family

Formulate
• Automatically populating the query from the triples created by Specify
• Invoking query against data services
• Collecting and aggregating normalized data into triples from data services
Transform:
Entity: CT images
Properties: For Thorax, From patients with Non Small Cell Lung Cancer
SELECT ?image WHERE {
  ?image x:type CT .
  ?image x:isFor Thorax .
  ?image x:isFrom ?patient .
  ?patient x:has nonSmallCellLungCancer .
  …
}

Formulate
• Supported by existing image-related data services wrapped to:
  – Serve SPARQL queries
  – Provide metadata aligned with the same ontologies used by Specify

Specify - Current Status
• Specify: A prototype leveraging Annotation and Image Markup (AIM) Template Builder
  – All navigation/management capabilities in UI
  – Triple storage

Formulate - Current Status
• Formulate: A proof of concept leveraging caB2B, a data query tool specifically designed for caGrid services
  – Forms from UML-based metadata to help search
  – Query storage

Challenges and Future Directions
• Alignment of Formulate with ontologies
  – A new Formulate using SPARQL and the ontologies used by Specify
• Integration of Specify and Formulate
  – Import and transform mechanisms to convert Specify triples to Formulate
• Wrapping existing services and their metadata
  – Data integration solutions such as Teiid to wrap native imaging services (e.g., MIDAS)

COMPUTE SERVICES, ANALYSIS LIBRARY, WORKFLOWS (Danagoulian)

Compute Services

Compute Services: Objects for the Analyze Library
Status categories shown: In Place, In Progress, In Queue
Technical Performance:
• Capabilities to analyze literature, to extract:
  – Reported technical performance
  – Covariates commonly measured in clinical trials
• Capability to analyze data to:
  – Characterize image dataset quality
  – Characterize datasets of statistical outliers
• Capability to analyze technical performance of datasets to, e.g.:
  – Characterize effects due to scanner settings, geography, scanner, site, and patient status
  – Quantify sources of error and variability
  – Characterize variability in the reading process
  – Evaluate image segmentation algorithms
Clinical Performance:
• Capability to analyze clinical performance, e.g.:
  – response analysis in clinical trials
  – analyze relative effectiveness of response criteria and/or read paradigms
  – characterize metric's limitations
  – establish biomarker's value as a surrogate endpoint

Analyze Library: Coding View
• Core Analysis Modules:
  – AnalyzeBiasAndLinearity
  – PerformBlandAltmanAndCCC
  – ModelLinearMixedEffects
  – ComputeAggregateUncertainty
• Meta-analysis Extraction Modules:
  – CalculateReadingsFromMeanStdev (written in MATLAB to generate synthetic data)
  – CalculateReadingsFromStatistics (written in R to generate synthetic data; inputs are number of readings, mean, standard deviation, inter- and intra-reader correlation coefficients)
  – CalculateReadingsAnalytically
• Utility Functions:
  – PlotBlandAltman
  – GapBarplot
  – Blscatterplotfn

Drill-down on segmentation analysis activities
Metric | Purpose | Source | Language | Status
STAPLE | To compute a probabilistic estimate of the true segmentation and a measure of the performance level by each segmentation | FDA | MATLAB | testing
STAPLE | Same as above | ITK | C++ | implemented
soft STAPLE | Extension of STAPLE to estimate performance from probabilistic segmentations | TBD | TBD | TBD
DICE | Metric evaluation of spatial overlap | ITK | C++ | implemented
Vote | Probability map | ITK | C++ | implemented
P-Map | Probability map | C. Meyer | Perl | implemented
Jaccard, Rand, DICE, etc. | Pixel-based comparisons | Versus (Peter Bajcsy) | JAVA | testing

Update on Workflow Engine for the Compute Services
• allows users to create their own workflows and facilitates sharing and reusing of workflows.
• has a good interface for capture of the provenance of data.
• ability to work across different platforms (Linux, OSX, and Windows).
• easy access to a geographically distributed set of data repositories, computing resources, and workflow libraries.
• robust graphical interface.
• can operate on data stored in a variety of formats, locally and over the internet (APIs, Web RESTful interfaces, SOAP, etc.).
• directly interfaces to R, MATLAB, ImageJ (or other viewers).
• ability to create new components or wrap existing components from other programs (e.g., C programs) for use within the workflow.
• provides extensive documentation.
• grid-based approaches to distributed computation.

QI-BENCH CLIENT / WORKSTATION (Wernsing)
QI-Bench Client – a first view
The Scientist's view
The Biomarker view
The Blackboard view
One idea of the Blackboard view
The Clinician's view
Personalize Your Experience

Future Development
Continue with core infrastructure development:
– Jena TDB
– Jersey
– Kepler integration
– Struts 2
– Teiid integration
Parallel work:
– QI-Bench API
– Workflows
– Connections to the API
– Web GUI
– Workstation plugins

CT VOLUMETRY TEST BED (Buckler)

Test bed: e.g., the 3A challenge series…
• Defined set of data
• Defined challenge
• Defined test set policy
[Figure: Investigation 1 through Investigation n, each with Pilot and Pivotal phases and Train/Test data]
Some of the Participants:
1. Median Technologies
2. Vital Images, Inc.
3. Fraunhofer Mevis
4. Siemens
5. Moffitt Cancer Center
6. Toshiba
7. GE Healthcare
8. Icon Medical Imaging
9. Columbia University
10. INTIO, Inc.
11. Vital Images, Inc.

Broader capability: Systematic qualification of CT volumetry
[Figure: overview diagram with panels labeled "Inter-analysis technique (algorithm) variability (3A)", "Intra- and inter-reader variability (1A)" (5 readers, 3 reads each), "Minimum detectable biological change (1B", and "Explore figures-of-merit and QC procedures", plus a survival plot (% Reaching Partial Response vs. Time)]

Scope of Consideration: Purpose and value of the test-bed
• Thesis: enough data exists and/or could be made available with sufficient clarity on how it would be used to qualify CT volumetry as a biomarker in cancer treatment contexts.
• Qualification per se is neither the only nor even necessarily the best goal, but it does provide a defined target that is useful in driving activity. Another working model is how RECIST has come to be accepted. There is considerable overlap in the needs of these two models. QI-Bench is ideally suited to meeting these needs.
• This drives both technical and clinical performance characterization activities.
• Formally articulating requirements for these activities, and reducing them to practice using open source methods backed by a rigorous system development process, continues to drive us.

Scope of Consideration: Purpose and value of the test-bed
• Theoretical contributions lie in the area of formal methods for maximizing the value of data, specifically in pushing the limits of generalizability to eke out as much utility per unit of data or analytical resource as possible.
  – The means to this end is to develop practical methods that merge logical and statistical inference.
• Practical contributions lie in the area of developing tangible and effective systems for image archival, representation of wide-ranging and heterogeneous metadata, and facilities to conduct reproducible workflows to increase scientific rigor and discipline in promoting imaging biomarkers.
  – The means to this end are the applications we develop and the deployment options we implement.
• CT Volumetry is a rich example because so many have worked on it for so long, yet without the benefit of actual convergence, for lack of these capabilities.
  – However, we are not limited to it. Sponsored uses of this capability have been conducted in both anatomic and functional applications of MR. We also hope other QIBA committees (e.g., FDG-PET, PDF-MRI) might be interested in using it. It would be relatively easy for them to do so based on technology choices. Also, NCI CIP, QIN, and other groups have started to express interest.
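
The technical performance characterization named for the test bed (intra- and inter-reader variability, minimum detectable change) can be illustrated with a small sketch. This is not the QI-Bench analysis library. The readings, the function names, and the choice of the repeatability coefficient RC = 1.96 · √2 · wSD as the change threshold are assumptions made for illustration, using a balanced design of 5 readers with 3 reads each.

```python
import math
from statistics import mean, variance

# Hypothetical volume readings (mm^3) of a single lesion:
# 5 readers, 3 repeat reads each. Values are invented for illustration.
READS = [
    [1000.0, 1020.0, 1040.0],
    [1030.0, 1050.0, 1070.0],
    [970.0, 990.0, 1010.0],
    [1015.0, 1035.0, 1055.0],
    [985.0, 1005.0, 1025.0],
]


def within_reader_sd(reads_by_reader):
    """Pooled within-reader SD for a balanced design:
    square root of the mean of the per-reader sample variances."""
    per_reader_var = [variance(reads) for reads in reads_by_reader]
    return math.sqrt(mean(per_reader_var))


def repeatability_coefficient(wsd):
    """RC = 1.96 * sqrt(2) * wSD: the difference between two repeat
    reads that is exceeded by chance only about 5% of the time."""
    return 1.96 * math.sqrt(2.0) * wsd


def between_reader_sd(reads_by_reader):
    """SD of the reader means: a crude inter-reader summary,
    not a formal variance-component estimate."""
    reader_means = [mean(reads) for reads in reads_by_reader]
    return math.sqrt(variance(reader_means))


if __name__ == "__main__":
    wsd = within_reader_sd(READS)
    print("within-reader SD:", wsd)
    print("repeatability coefficient:", repeatability_coefficient(wsd))
    print("SD of reader means:", between_reader_sd(READS))
```

Under these assumptions, a measured change smaller than the repeatability coefficient cannot be distinguished from reading noise, which is one way to make "minimum detectable change" concrete.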

Current initiatives on the test-bed
• The “common footing” analyses of QIBA studies
• The next 3A challenge as described today
• More broadly:
  – Specific datasets for vCT
  – Literature-based meta-analysis
  – Umbrella SAP
  – Project plan

Value proposition of QI-Bench
• Efficiently collect and exploit evidence establishing standards for optimized quantitative imaging:
  – Users want confidence in the read-outs
  – Pharma wants to use them as endpoints
  – Device/SW companies want to market products that produce them without huge costs
  – Public wants to trust the decisions that they contribute to
• By providing a verification framework to develop precompetitive specifications and support test harnesses to curate and utilize reference data
• Doing so as an accessible and open resource facilitates collaboration among diverse stakeholders

Summary: QI-Bench Contributions
• We make it practical to increase the magnitude of data for increased statistical significance.
• We provide practical means to grapple with massive data sets.
• We address the problem of efficient use of resources to assess limits of generalizability.
• We make formal specification accessible to diverse groups of experts who are not skilled or interested in knowledge engineering.
• We map both medical and technical domain expertise into representations well suited to emerging capabilities of the semantic web.
• We enable a mechanism to assess compliance with standards or requirements within specific contexts for use.
• We take a “toolbox” approach to statistical analysis.
• We provide the capability in a manner that is accessible to varying levels of collaborative models, from individual companies or institutions to larger consortia or public-private partnerships to fully open public access.
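
As one illustration of the “toolbox” approach, the spatial-overlap metrics listed in the segmentation analysis drill-down (DICE, Jaccard) can be sketched in a few lines. This is a minimal sketch and not the ITK or Versus implementation referenced in the deck. Representing each segmentation as a set of voxel coordinates, and the names `dice`, `jaccard`, `REFERENCE`, and `CANDIDATE`, are assumptions made for illustration.

```python
def dice(seg_a, seg_b):
    """Dice coefficient: 2 * |A ∩ B| / (|A| + |B|).
    Two empty segmentations are treated as a perfect match."""
    a, b = set(seg_a), set(seg_b)
    if not a and not b:
        return 1.0
    return 2.0 * len(a & b) / (len(a) + len(b))


def jaccard(seg_a, seg_b):
    """Jaccard index: |A ∩ B| / |A ∪ B|.
    Two empty segmentations are treated as a perfect match."""
    a, b = set(seg_a), set(seg_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical 2-D masks given as (row, column) voxel coordinates.
REFERENCE = {(0, 0), (0, 1), (1, 0), (1, 1)}
CANDIDATE = {(0, 1), (1, 1), (1, 2)}

if __name__ == "__main__":
    d = dice(REFERENCE, CANDIDATE)
    j = jaccard(REFERENCE, CANDIDATE)
    print("Dice:", d)
    print("Jaccard:", j)
    # The two metrics are related by J = D / (2 - D).
    print("J from D:", d / (2.0 - d))
```

Both metrics range from 0 (no overlap) to 1 (identical masks). Dice weights the shared voxels twice, so it is never lower than Jaccard for the same pair of masks.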

QI-Bench Structure / Acknowledgements
• Prime: BBMSC (Andrew Buckler, Gary Wernsing, Mike Sperling, Matt Ouellette, Kjell Johnson, Jovanna Danagoulian)
• Co-Investigators
  – Kitware (Rick Avila, Patrick Reynolds, Julien Jomier, Mike Grauer)
  – Stanford (David Paik)
• Financial support as well as technical content: NIST (Mary Brady, Alden Dima, John Lu)
• Collaborators / Colleagues / Idea Contributors
  – Georgetown (Baris Suzek)
  – FDA (Nick Petrick, Marios Gavrielides)
  – UMD (Eliot Siegel, Joe Chen, Ganesh Saiprasad, Yelena Yesha)
  – Northwestern (Pat Mongkolwat)
  – UCLA (Grace Kim)
  – VUmc (Otto Hoekstra)
• Industry
  – Pharma: Novartis (Stefan Baumann), Merck (Richard Baumgartner)
  – Device/Software: Definiens, Median, Intio, GE, Siemens, Mevis, Claron Technologies, …
• Coordinating Programs
  – RSNA QIBA (e.g., Dan Sullivan, Binsheng Zhao)
  – Under consideration: CTMM TraIT (Andre Dekker, Jeroen Belien)