Translational Research IT (TraIT)
“TraIT and OpenClinica: partners in translational research”
Marinel Cavelaars, Cuneyt Parlayan, Jacob Rousseau, Sander de Ridder, Jan Willem Boiten and Jeroen Beliën
Boston; June 21st 2013

Overview
• Introduction and background
  – CTMM
  – Translational Research
• TraIT
  – Three real-life examples: OpenClinica, BMIA, tranSMART
• OpenClinica.com – TraIT partnership
• CTMM-TRACER and OpenClinica by Sander de Ridder
  – Scripts, Long Lists, Tools developed
  – Things we learned/found useful

Who am I?
• My name: Jeroen Beliën, PhD, MSc
• Associate Professor, medical informatics, dept. of Pathology, VU University medical center, Amsterdam
  jam.belien@vumc.nl
  – Digital Pathology, Image processing, IT in translational research
  – String of Pearls
  – IT lead of 2 CTMM projects: DeCoDe and TRACER
  – CTO CTMM-TraIT
  – BioMedBridges
• Member of taskforce Stichting Palga
  – Palga: Dutch National Electronic Pathology Archive
• Faculty member of NBIC

CTMM, TIPharma and BMM offer an integrated approach for innovations in the Dutch health care sector
TIPharma: drugs
• Translational research on novel pharmaceutical therapies
CTMM: diagnosis
• Early detection of disease by in-vitro and in-vivo diagnostics
Biomarkers
• Target finding, animal models and lead selection
• Stratification of patients for personalized treatment
• Drug formulation, delivery and targeting
• Assessing efficiency and efficacy of medicines by imaging
Image guided drug delivery
• Image guided delivery of medication
• Focus on cancer, cardiovascular, neurodegenerative and infectious/autoimmune disease.
• Special Theme focusing on the efficiency of the process of drug development
Imaging for regenerative medicine
Drug delivery
BMM: devices
• Smart drug delivery systems
• Innovations in contemporary organ replacement therapies
• Passive and active scaffolds, including cell signalling functions

Public-private partnerships: Financial model
• Subsidy: 50% of research cost
• Academia: € 75 mln in kind
• Industry: € 37,5 mln cash and € 37,5 mln in kind
• Government: € 150 mln subsidy (50%)
• CTMM projects: € 300 mln in total

CTMM projects
Stroke, Heart Failure, Breast, Arrhythmia, Diabetes, Kidney Failure, Lung, Thrombosis, Peripheral Vascular Disease, Prostate, Colon, Leukemia, Alzheimer, Rheumatoid Arthritis, Sepsis

Translational research process
Guiding principle: connecting phenotype to biology
[Diagram: translational research process]
• Patient enters medical center
• Clinical Procedures, Imaging, Samples, Experiments
• Electronic Health Record, Clinical database, Image database, Biobank database, Experimental data
• Data Integration (together with External data)
• Downstream analysis
• Scientific Output, Intellectual Property, Improved Healthcare

TraIT consortium
• Started Oct. 2011
• Status 2013: 26 partners

Growing TraIT project team

The TraIT approach
• IT infrastructure = main goal
  – No research on the side
• Workflow-oriented approach
  – Create data pipelines to link data production and data analysis
• User driven priority setting
  – Regular reprioritization possible (agile)
• Avoid reinventing wheels
  – Adopt/adapt existing technology and expertise
• Connect with other initiatives
  – Organizations (NBIC, EBI, PSI, IMI, etc.)
• Think big; start small; act now
  – Short term focus on immediate needs of CTMM projects

Division in work packages
TraIT has been subdivided into four work packages (WPs) supporting data generating domains, and two work packages dealing with the overarching TraIT requirements: data integration and professional support, respectively.
[Diagram: TraIT work packages]
• Five data generating work packages
  – WP 1 Clinical Data
  – WP 2 Imaging Data
  – WP 3 Biobanking Data
  – WP 4 Experimental Data
  – WP 7 Pathology Imaging
• WP 5 Core Infrastructure: data integration & analysis across the four platforms
• WP 6 Deployment: shared service center for hardware, training & support

High-level TraIT data flows
[Diagram: data flows from hospital IT, through pseudonymization, to the translational research data domains]
• Hospital (IT): HIS, PACS, LIS
• Research Data: LIMS, …
• Public Data: …
• Pseudonymization
• Translational Research (IT) data domains
  – clinical data: OpenClinica
  – imaging data and annotations: NBIA
  – biobanking: CBM-NL
  – experimental data: various solutions, e.g. Galaxy
• Integrated data: e.g. tranSMART/i2b2 cohort explorer
• Translational analytics workbench: e.g. R

TraIT Pseudonymization
[Diagram: worked example of pseudonymization via a TTP]
• Hospital (IT): records are identified by BSN and Name (BSN 274839, Name J.Doe)
  – HIS: TumorStage T3c, TumorSize 12
  – PACS: ImageID, Image
  – LIS: SampleID, Sample
  – Research Data: LIMS, …
• Translational Research (IT) data domains, after the TTP: records are identified by SubjectID Cairo_135
  – clinical data: SubjectID Cairo_135, TumorStage T3c, TumorSize 12
  – imaging data (e.g. NBIA + AIM): SubjectID Cairo_135, ImageID Cairo_img_492
  – biobanking (e.g. caTissue, CBM catalog): SubjectID Cairo_135, SampleID Cairo_smpl_42, Volume 50 cc
  – experimental data (e.g. PhenotypeDB, Galaxy, Annai Systems, Chipster): SampleID Cairo_smpl_42, GeneExpProfile 0.23, 012, 0.52, 1.67, …
• Integrated data / translational analytics workbench (tranSMART/cohort explorer, R, Galaxy)
  – Cairo Private Study: SubjectID Cairo_135, TumorStage T3c, TumorSize 12, ImageID Cairo_img_492, SampleID Cairo_smpl_42, GeneExpProfile 0.23, 012, 0.52, 1.67, …
  – Public Study, after a second TTP step: SubjectID Public_1931, TumorStage T3c, TumorSize 12, ImageID Public_img_46, SampleID Public_smpl_23, GeneExpProfile 0.23, 012, 0.52, 1.67, …
• Public Data: e.g. GEO, EMBL-EBI

TraIT - study driven approach
[Diagram: timeline 2013 to 2014]
• Task 1: study selection (Study 1, Study 2, …)
• Task 2: use cases & prototypes (UC 1, UC 2, …)
• Task 3, 4, 5: development of
  – data integration platform
  – analytics workbench
  – shared components
• Per study: Data Integration and Translational Analytics Workbench
• Target: pseudonymization and ETL into an integrated translational data warehouse, feeding the Translational Analytics Workbench

Three real-life examples
[Diagram: the three examples placed on the data flow from Hospital (IT) to Translational Research (IT)]
• Example 1: CTMM INCOAG, clinical data (OpenClinica)
• Example 2: CTMM AIRFORCE, imaging (PACS via TTP to NBIA)
• Example 3: CTMM PCMM, integrated data (e.g. tranSMART)

Real-life example 1 - CTMM Incoag
• Discover new risk factors for thrombotic diseases
• Approach: Combine existing clinical studies into one OpenClinica data set for higher statistical power
OpenClinica:
• Clinical data capture
• Web-based
• Open-source
• Full audit-trail
• 10,000+ installations
• TraIT tool of choice

Incoag - Technical integration
Out-of-the-box OpenClinica can be applied in most projects: currently used in CTMM projects AirForce, Cohfar, DeCoDe, Parisk, PCMM, and Tracer
Specific Incoag question: how to combine 5+ independent existing studies from mixed sources into one OpenClinica installation?
Study 1, Study 2, ?
Sustainable storage in TraIT environment, Study 3

Incoag - Technical integration
Solution: TraIT-team created a batch upload toolbox for OpenClinica
Will be submitted to the OpenClinica open-source community
[Diagram: Study 1, Study 2 and Study 3 flow into sustainable storage in the TraIT environment]

Incoag - Semantic integration
Second question from Incoag project: how to identify common fields and data items?
[Diagram: Study 1 to Study 5, with 100-150 fields in each study]
• How to determine the overlap?
• More than 100^5 combinations to consider!
• Common ground? Studies speak different “languages”: a biomedical “Esperanto” is needed

Incoag - Semantic integration
• Project 1: Provide tools to standardize studies at data registration (as far as possible)
  – TraIT building blocks to rapidly build CRFs for new studies (Study n) based on a common dictionary
• Project 2: First test with tools for automatic “after-the-fact” harmonization of historical data
  – Automatic mapping against multiple dictionaries (SNOMED-CT, LOINC, NCI thesaurus & Gene Ontology)
[Diagram: Study 1 to Study 5 are mapped into one harmonized Incoag dataset]

Real-life example 2 – CTMM AirForce
• Personalized chemo-radiation of lung and head & neck cancer
• Lung cancer patients with PET-CT (and clinical data & tissue)
  – VUMC, MUMC+, NKI, UMCG + 35 patients from Policlinico Gemelli in Rome (via MUMC+)
• Transfer of images from Rome using TraIT’s BioMedical Image Archive (www.bmia.nl)

WP2 High level design – Upload (Implemented)
• Image storage & simple webshop-like image viewing (based on NBIA)
• Image pseudonymization pipeline (based on CTP from the RSNA)

AirForce - de-identification of images
• Install TraIT de-identification client in Rome
  – Adopt: Clinical Trial Processor (RSNA, open source, Java)
• Configure DICOM de-identification
  – Remove identifying DICOM tags
  – Replace Codice Sanitario (PatientID) with AirForce ID
  – Keep important tags (e.g. some tags are crucial for downstream analysis of PET)
• Result: A pipeline to TraIT’s BMIA from the local Rome Image Archive
[Figure labels: DICOM TAGS, DICOM IMAGE]

AirForce - QC of de-identification
• QC step performed by the collection administrator before images are visible in BMIA, to prevent privacy breaches (esp. burnt-in names).

AirForce - Resulting image archive in BMIA
• Collection AirForce on www.bmia.nl with 35 patients from Rome
• Web shop model where you can fill a basket with patients for download

Real-life example 3 – CTMM PCMM
• Develop and validate biomarkers for diagnosis of prostate cancer
• Requires correlation of phenotype data to biomarker data
• Potential solution: tranSMART; to be validated with real-life data from CTMM projects like PCMM
Can we address the generic translational question with the tranSMART solution?

Role of tranSMART in TraIT

PCMM – tranSMART as a candidate solution
tranSMART:
• Developed in J&J
• Made open-source
• “Data workbench” for translational researchers
• Searching across studies
• Data exploration

PCMM - Import of prostate data
• Reference to public data sources available
• Prostate data: Gleason score, PSA values, etc.
• Usually gene expression data will be loaded as well; not yet done for PCMM

PCMM - QC of the data set
• Drag-and-drop data parameters to create simple distribution plots and statistical values

PCMM: tranSMART for correlation analysis
• Easy to create correlation plots between existing and potential predictors for prostate cancer

Second tranSMART developer/user meeting, June 17th-19th 2013, Amsterdam
Recombinant / Deloitte, CDISC, Thomson Reuters, Pfizer, eTRIKS / Imperial College, CTMM-TraIT, Sanofi, Johnson & Johnson, University of Michigan, Philips, University of Luxembourg

OpenClinica.com – TraIT partnership
Statement of Work
• TraIT: automate data capture in OC as much as possible
  – E.g. automate upload of Excel data and hospital lab data
  – Approach: OC’s Web Services
• Requires improvements on OIDs and bug fixes
• Support configurable role-based authentication and authorization within OC
  – E.g. central review of images for all subjects in the different sites. Each image is reviewed by three reviewers who are not allowed to see each other’s reports in the CRFs
• Parameterized links in CRFs
  – E.g. links to images or to other subjects, with a dynamic URL based on data in the CRF

Other wishes
• Study migration
  – E.g. users want to switch to a different OC server
  – Currently only "ClinicalData" ODM is imported
  – Studies can be exported in full detail but cannot be imported as such
• Support reference to ontologies in the CRF
  – Standardization of data
• Easy view for data entry
  – E.g. a tree structure that indicates where you are while entering data, for easy navigation to other CRFs for the subject

Uptake of OpenClinica
[Chart: number of studies per quarter, Q3 2008 to mid June 2013]
• 47 studies, 77 sites, 256 users
• Pre TraIT effect: all multicenter VUmc studies
• Also multicenter studies UMCU, UMCN, EMC, Meander MC
• Timeline markers: Start DeCoDe OpenClinica, Start TraIT OpenClinica
• The load on TraIT OpenClinica increased significantly in 2012
• Considerable time and energy was spent on delivery management (availability, capacity and security) and on improvement of the TraIT OpenClinica user support

Who am I?
• My name: Sander de Ridder
  s.deridder@vumc.nl
  – Computer Science (MSc) & Bioinformatics (MSc)
• Inflammatory Disease Profiling, Dept. of Pathology, VU University medical center, Amsterdam
  – Bioinformatics for Inflammatory Disease Profiling Group
  – IT implementation CTMM TRACER

CTMM-TRACER
Background information on TRACER
• CTMM TRACER: Rheumatoid Arthritis
  – Prospective data
  – Retrospective data (To Do)
• Go Live:
  – Wednesday the 5th of June
    • Started at 9:00 - Finished at 12:00
    • Approximately 1 hour/study

Prospective Studies   VERA    ERA     ESRA
Sites                 4       7       7
Events                7       6       6
CRFs                  ~35     ~30     ~30
Rules                 ~250    ~450    ~650

Age Calculation
• After entering the DOB and the date of signing, the age is calculated
• Age calculation script: http://en.wikibooks.org/wiki/OpenClinica_User_Manual/AgeField
• Created by Sander de Ridder and improved by Gerben Rienk

Long List Implementation
• Problem:
  – Maximum of 4000 characters for single-select response options text
  – Some lists need more characters: e.g. medication list > 9000 characters
• Solution:
  – Created external list
  – Add field to CRF which opens new page with list
  – Allows user to select option; selected value is copied back to CRF

ITEM_NAME          RESPONSE_TYPE    RESPONSE_OPTIONS_TEXT          RESPONSE_VALUES_OR_CALCULATIONS
Smoking_Category   single-select    Never smoked, Current smoker   1,2

Example: Medication
• User selects “Other” and then clicks on the field of question 3
• A new tab/window opens with an HTML page with a single-select
• The user can select the desired medication from the list
• Selected medication is copied to the CRF

Some tools we created: CRF validator
• Compares items between CRFs based on uids and ensures they match
  – CRF1
    • ID: Patient_Weight; DATA_TYPE: INT
  – CRF2
    • ID: Patient_Weight; DATA_TYPE: REAL
  Mismatch for Patient_Weight!
• Checks NULL-flavour coding integrity
  – Coding: -1=No Information, -2=Not Applicable, -3=Unknown, …
  – CRF1
    • RESPONSE_OPTIONS_TEXT: No Information
      RESPONSE_VALUES_OR_CALCULATIONS: -2
  Incorrect NULL-flavour coding!
Prevents errors and inconsistencies

Some tools we created: ID-Translator
• Move rules file to new OC server -> replace all item IDs
• Automatic translation of item identifiers in rules
  Prevented replace errors and saved many hours of work
• Requires:
  – ViewCRFVersion file
    • Contains item ID information for CRF on new server
  – Rule file with properly specified header
    • Contains item ID information for CRF on old server
[Diagram: ID-Translator workflow]
• Input: ViewCRFVersion (new server) and rules for old server
• Parse ViewCRFVersion: mapping ITEM_NAME -> new OC_ID
  – MedicatieBijgewerkt = I_TRACE_MEDICATIEBIJGEWERKT_4714
• Parse header of rule file: mapping ITEM_NAME -> old OC_ID
  – MedicatieBijgewerkt = I_TRACE_PATIENTSTUDIE_MOMENT_AFROND
• Translate rule file: old OC_ID -> new OC_ID via ITEM_NAME
  – I_TRACE_PATIENTSTUDIE_MOMENT_AFROND = I_TRACE_MEDICATIEBIJGEWERKT_4714
• Output: translated rules for new server

Things we learned/found useful
• ITEM_NAME max 64 characters
  – SPSS compatibility
• Truly unique identifiers (description label)
  – Easy to link to study definition (CTMMC)
  – Useful for consistency checking
• Negative NULL-flavour coding
  – Prevent conflict with retrospective data
  – Easy to keep NULL-flavour coding consistent
• Specify identifiers in header of rule file
  – Automatic translation
• JavaScript code
  – $.noConflict();
    • Prevents our code from interfering with OC’s code
  – Reference to jQuery
    • <script src="//ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js">
    • Prevents dependency on OC’s jQuery version
• Create a checklist and follow it during go-live

Goal: make researchers want to use OpenClinica and tranSMART

Acknowledgements
And many more…
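
Sketch: ID-Translator workflow
The three ID-Translator steps (parse the new-server mapping, parse the old-server mapping from the rule file header, translate via ITEM_NAME) can be illustrated with a minimal Python sketch. This is not the TraIT tool itself. The input format is an assumption: both mappings are taken to be plain "ITEM_NAME = OC_ID" lines, as on the slide, whereas the real ViewCRFVersion file and rule file header will look different. The function names parse_mapping, build_translation and translate_rules, and the rule expression in the usage example, are hypothetical.

```python
import re


def parse_mapping(lines):
    """Parse 'ITEM_NAME = OC_ID' lines into a dict (assumed, simplified format)."""
    mapping = {}
    for line in lines:
        if "=" not in line:
            continue  # skip lines that carry no mapping
        name, oc_id = (part.strip() for part in line.split("=", 1))
        if name and oc_id:
            mapping[name] = oc_id
    return mapping


def build_translation(old_by_name, new_by_name):
    """Join both mappings on ITEM_NAME to get old OC_ID -> new OC_ID."""
    missing = sorted(set(old_by_name) - set(new_by_name))
    if missing:
        # Fail loudly rather than leave untranslated IDs in the rules.
        raise KeyError(f"items not found on new server: {missing}")
    return {old_by_name[name]: new_by_name[name] for name in old_by_name}


def translate_rules(rule_text, translation):
    """Replace every old OC_ID in the rule text with its new OC_ID in one pass."""
    if not translation:
        return rule_text
    # Longest IDs first, and only whole identifiers, so that an ID which is a
    # prefix of another ID is never replaced inside the longer one.
    alternatives = "|".join(
        re.escape(old_id) for old_id in sorted(translation, key=len, reverse=True)
    )
    pattern = re.compile(r"(?<![A-Za-z0-9_])(" + alternatives + r")(?![A-Za-z0-9_])")
    return pattern.sub(lambda match: translation[match.group(1)], rule_text)


if __name__ == "__main__":
    new_server = parse_mapping(["MedicatieBijgewerkt = I_TRACE_MEDICATIEBIJGEWERKT_4714"])
    old_server = parse_mapping(["MedicatieBijgewerkt = I_TRACE_PATIENTSTUDIE_MOMENT_AFROND"])
    translation = build_translation(old_server, new_server)
    # Hypothetical rule expression, used only to show the substitution.
    print(translate_rules("I_TRACE_PATIENTSTUDIE_MOMENT_AFROND eq 1", translation))
```

Doing the substitution in a single regular-expression pass, instead of one search-and-replace per ID, avoids two typical replace errors: an ID being rewritten twice when a new ID happens to equal another old ID, and a short ID being replaced inside a longer one.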