TraIT and OpenClinica: partners in translational research

Translational Research IT (TraIT)
“TraIT and OpenClinica: partners in translational research”
Marinel Cavelaars, Cuneyt Parlayan, Jacob Rousseau,
Sander de Ridder, Jan Willem Boiten and Jeroen Beliën
Boston; June 21st 2013
Overview
• Introduction and background
– CTMM
– Translational Research
• TraIT
– Three real-life examples: OpenClinica, BMIA, tranSMART
• OpenClinica.com – TraIT partnership
• CTMM-TRACER and OpenClinica by Sander de Ridder
– Scripts, Long Lists, Tools developed
– Things we learned/found useful
Who am I?
• My name: Jeroen Beliën, PhD, MSc
• Associate Professor, medical informatics, dept. of Pathology, VU University medical center, Amsterdam (jam.belien@vumc.nl)
– Digital Pathology, Image processing, IT in translational research
– String of Pearls
– IT lead of 2 CTMM projects: DeCoDe and TRACER
– CTO CTMM-TraIT
– BioMedBridges
• Member of taskforce Stichting Palga
– Palga: Dutch National Electronic Pathology Archive
• Faculty member of NBIC
CTMM, TIPharma and BMM offer an integrated approach for innovations in the Dutch health care sector
TIPharma: drugs
• Translational research on novel pharmaceutical therapies
CTMM: diagnosis
• Early detection of disease by in-vitro and in-vivo diagnostics
• Target finding, animal models and lead selection
• Stratification of patients for personalized treatment
• Drug formulation, delivery and targeting
• Assessing efficiency and efficacy of medicines by imaging
• Image guided delivery of medication
• Focus on cancer, cardiovascular, neurodegenerative and infectious/autoimmune disease
• Special theme focusing on the efficiency of the process of drug development
Shared areas in the diagram: Biomarkers; Image guided drug delivery; Imaging for regenerative medicine; Drug delivery
BMM: devices
• Smart drug delivery systems
• Innovations in contemporary organ replacement therapies
• Passive and active scaffolds, including cell signalling functions
Public-private partnerships: Financial model
Subsidy: 50% of research cost
CTMM projects: € 300 mln
– Academia: € 75 mln in kind
– Industry: € 37,5 mln cash and € 37,5 mln in kind
– Government: € 150 mln subsidy (50%)
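Read as a budget, the slide's figures add up as follows. This assumes, as the layout suggests, that the second € 37,5 mln is industry's in-kind share. Amounts are in € mln.

```latex
\begin{aligned}
\underbrace{75}_{\text{academia, in kind}}
+ \underbrace{37.5}_{\text{industry, cash}}
+ \underbrace{37.5}_{\text{industry, in kind}} &= 150 \quad \text{(partners, 50\%)} \\
150 + \underbrace{150}_{\text{government subsidy, 50\%}} &= 300 \quad \text{(CTMM projects in total)}
\end{aligned}
```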
CTMM projects
Stroke, Heart Failure, Breast, Arrhythmia, Diabetes, Kidney Failure, Lung, Thrombosis, Peripheral Vascular Disease, Prostate, Colon, Leukemia, Alzheimer, Rheumatoid Arthritis, Sepsis
Translational research process
Guiding principle: connecting phenotype to biology
Patient enters medical center → Clinical procedures
– Electronic Health Record → Clinical database
– Imaging → Image database
– Samples → Biobank database
– Experiments → Experimental data
Data integration (together with external data) → Downstream analysis → Scientific output, Intellectual Property, Improved Healthcare
TraIT consortium - Started Oct. 2011
status 2013: 26 partners
Growing TraIT project team
The TraIT approach
• IT infrastructure = main goal
– No research on the side
• Workflow-oriented approach
– Create data pipelines to link data production and data analysis
• User-driven priority setting
– Regular reprioritization possible (agile)
• Avoid reinventing wheels
– Adopt/adapt existing technology and expertise
• Connect with other initiatives
– Organizations (NBIC, EBI, PSI, IMI, etc.)
• Think big; start small; act now
– Short-term focus on immediate needs of CTMM projects
Division in work packages
TraIT has been subdivided into four work packages (WPs) supporting data-generating domains, and two work packages dealing with the overarching TraIT requirements: data integration and professional support, respectively:
Five data-generating work packages
– WP 1 Clinical Data
– WP 2 Imaging Data
– WP 3 Biobanking Data
– WP 4 Experimental Data
– WP 7 Clinical Pathology Imaging Data
WP 5 Core Infrastructure: data integration & analysis across the four platforms
WP 6 Deployment: shared service center for hardware, training & support
High-level TraIT data flows
Sources
– Hospital (IT): HIS, PACS, LIS, …
– Research Data: LIMS, …
– Public Data: …
↓ Pseudonymization
Translational Research (IT) data domains
– clinical data: OpenClinica
– imaging data and annotations: NBIA
– biobanking: CBM-NL
– experimental data: various solutions, e.g. Galaxy
Integrated data: e.g. tranSMART/i2b2 cohort explorer
Translational analytics workbench: e.g. R
TraIT Pseudonymization
Hospital (IT): identifiable records
– HIS: BSN 274839, Name J.Doe, TumorStage T3c, …
– PACS: BSN 274839, Name J.Doe, ImageID 782, Image …
– LIS: BSN 274839, Name J.Doe, SampleID 346, Sample …
TTP (trusted third party): replaces BSN and name with a study-specific SubjectID, e.g. Cairo_135
Translational Research (IT) data domains: pseudonymized records
– clinical data: SubjectID Cairo_135, TumorStage T3c, TumorSize 12
– imaging data (e.g. NBIA): SubjectID Cairo_135, ImageID Cairo_img_492 + AIM
– biobanking (e.g. caTissue, CBM catalog): SubjectID Cairo_135, SampleID Cairo_smpl_42, Volume 50 cc
– experimental data (e.g. PhenotypeDB, Galaxy, Annai Systems, Chipster): SampleID Cairo_smpl_42, GeneExpProfile 0.23, 012, 0.52, 1.67, …
Research data (e.g. LIMS) and public data (e.g. GEO, EMBL-EBI) are also included
Integrated data in the translational analytics workbench (e.g. tranSMART/cohort explorer, R, Galaxy)
– Cairo Private Study: SubjectID Cairo_135, TumorStage T3c, TumorSize 12, ImageID Cairo_img_492, SampleID Cairo_smpl_42, GeneExpProfile 0.23, 012, 0.52, 1.67, …
– Public Study: SubjectID Public_1931, TumorStage T3c, TumorSize 12, ImageID Public_img_46, SampleID Public_smpl_23, GeneExpProfile 0.23, 012, 0.52, 1.67, …
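The pseudonymization step can be sketched as a link table held only by the trusted third party. This is a minimal Python sketch under stated assumptions, not TraIT's implementation:
- The names `TrustedThirdParty` and `pseudonymize` are hypothetical.
- Pseudonyms are drawn at random and use the study prefix from the slide ("Cairo_").
- BSN and Name are assumed to be the only identifying fields.

```python
import secrets
from dataclasses import dataclass, field

# Assumption: the only identifying fields in the hospital extracts.
IDENTIFYING_FIELDS = {"BSN", "Name"}


@dataclass
class TrustedThirdParty:
    """Hypothetical TTP holding the only link between BSN and SubjectID."""
    study_prefix: str
    _links: dict = field(default_factory=dict, init=False, repr=False)

    def subject_id(self, bsn: str) -> str:
        # Same patient -> same pseudonym, so HIS, PACS and LIS records stay linkable.
        if bsn not in self._links:
            used = set(self._links.values())
            while True:
                candidate = f"{self.study_prefix}_{secrets.randbelow(10**6)}"
                if candidate not in used:  # avoid giving two patients one pseudonym
                    break
            self._links[bsn] = candidate
        return self._links[bsn]


def pseudonymize(record: dict, ttp: TrustedThirdParty) -> dict:
    """Copy a hospital record, drop identifying fields and add the SubjectID."""
    clean = {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}
    clean["SubjectID"] = ttp.subject_id(record["BSN"])
    return clean


if __name__ == "__main__":
    ttp = TrustedThirdParty("Cairo")
    his = pseudonymize({"BSN": "274839", "Name": "J.Doe", "TumorStage": "T3c"}, ttp)
    pacs = pseudonymize({"BSN": "274839", "Name": "J.Doe", "ImageID": "782"}, ttp)
    print(his, pacs)  # both records carry the same Cairo_<n> SubjectID
```

The sketch only replaces the patient identifier. A complete pipeline would presumably pseudonymize local ImageIDs and SampleIDs in the same way, giving IDs such as Cairo_img_492 and Cairo_smpl_42 on the slide.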
TraIT - study driven approach
2013 to 2014
Task 1: study selection (Study 1, Study 2, …)
Task 2: use cases & prototypes (UC 1, UC 2, …)
Task 3, 4, 5: development of
• data integration platform
• analytics workbench
• shared components
For each study: Data Integration → Translational Analytics Workbench
Overall: pseudonymization → Data Integration (ETL) → integrated translational data warehouse → Translational Analytics Workbench (analytics)
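The ETL step between the pseudonymized data domains and the integrated warehouse can be illustrated as a join on the shared pseudonymous keys. The result is one flat row per subject, like the "Cairo Private Study" record. This is a minimal sketch, not TraIT's ETL tooling:
- It assumes at most one image and one sample per subject.
- The function names are hypothetical.

```python
def _without(record: dict, key: str) -> dict:
    """Copy of record minus the join key (so the key is not duplicated)."""
    return {k: v for k, v in record.items() if k != key}


def integrate(clinical, imaging, biobank, experimental):
    """Extract per-domain records, join them on SubjectID/SampleID and
    load one flat row per clinical subject into a list (the 'warehouse')."""
    images = {r["SubjectID"]: r for r in imaging}   # assumption: one image per subject
    samples = {r["SubjectID"]: r for r in biobank}  # assumption: one sample per subject
    profiles = {r["SampleID"]: r for r in experimental}

    warehouse = []
    for row in clinical:
        sid = row["SubjectID"]
        merged = dict(row)
        merged.update(_without(images.get(sid, {}), "SubjectID"))
        sample = samples.get(sid, {})
        merged.update(_without(sample, "SubjectID"))
        # Experimental data is keyed by SampleID, so it joins via the biobank record.
        merged.update(_without(profiles.get(sample.get("SampleID"), {}), "SampleID"))
        warehouse.append(merged)
    return warehouse
```

Subjects without an image or sample simply get fewer columns. A production warehouse would also need to handle one-to-many relations, such as several samples per subject.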
Three real-life examples
Hospital (IT) → Translational Research (IT)
Example 1: CTMM INCOAG – clinical data (OpenClinica)
Example 2: CTMM AIRFORCE – images from the PACS, via the TTP, to imaging data (NBIA)
Example 3: CTMM PCMM – integrated data (e.g. tranSMART)
Real-life example 1 - CTMM Incoag
• Discover new risk factors for thrombotic diseases
• Approach: Combine existing clinical studies into one
OpenClinica data set for higher statistical power
OpenClinica:
• Clinical data capture
• Web-based
• Open-source
• Full audit-trail
• 10,000+ installations
• TraIT tool of choice
Incoag - Technical integration
Out-of-the-box OpenClinica can be applied in most projects: it is currently used in CTMM projects AirForce, Cohfar, DeCoDe, Parisk, PCMM, and Tracer.
Specific Incoag question: how to combine 5+ independent existing studies from mixed sources into one OpenClinica installation?
Study 1, Study 2, Study 3, … → ? → sustainable storage in TraIT environment
Incoag - Technical integration
Solution: the TraIT team created a batch upload toolbox for OpenClinica, which will be submitted to the OpenClinica open-source community.
Study 1, Study 2, Study 3, … → sustainable storage in TraIT environment
Incoag - Semantic integration
Second question from Incoag project: how to identify common fields and data items?
Study 1, Study 2, Study 3, Study 4, Study 5: how to determine the overlap?
100-150 fields in each study: more than 100^5 combinations to consider!
Common ground? Studies speak different “languages”: a biomedical “Esperanto” is needed.
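One way to read "more than 100^5" is as follows:
- Every study has at least 100 fields.
- A common data item needs one matching field in each of the five studies.

Under that reading, the candidate combinations multiply. Mapping each study once against a shared dictionary grows only linearly. The dictionary count below illustrates that argument and is not a figure from the slide. Here f_s is the number of fields in study s (100 to 150).

```latex
\begin{aligned}
N_{\text{combinations}} &= \prod_{s=1}^{5} f_s \;\ge\; 100^5 = 10^{10},
\qquad 150^5 \approx 7.6 \times 10^{10} \\
N_{\text{dictionary mappings}} &= \sum_{s=1}^{5} f_s \;\le\; 5 \times 150 = 750
\end{aligned}
```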
Incoag - Semantic integration
Project 1: Provide tools to standardize studies at data registration (as far as possible): TraIT building blocks to rapidly build CRFs for new studies (Study n) based on a common dictionary.
Project 2: First test with tools for automatic “after-the-fact” harmonization of historical data: automatic mapping of Studies 1 to 5 against multiple dictionaries (SNOMED-CT, LOINC, NCI thesaurus & Gene Ontology), resulting in a harmonized Incoag dataset.
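The "after-the-fact" mapping can be illustrated with a deliberately simple matcher. Each field label is compared with the terms of a shared dictionary by token overlap (Jaccard similarity). Fields from different studies that land on the same concept are then grouped. This technique was chosen for brevity and is not the tooling TraIT tested:
- The dictionary entries and concept codes are hypothetical placeholders, not SNOMED-CT, LOINC, NCI thesaurus or Gene Ontology content.
- The 0.5 threshold is an arbitrary assumption.

```python
import re

# Hypothetical mini-dictionary: placeholder codes, NOT real SNOMED-CT/LOINC/NCI identifiers.
DICTIONARY = {
    "CONCEPT_BODY_WEIGHT": "body weight",
    "CONCEPT_SYSTOLIC_BP": "systolic blood pressure",
    "CONCEPT_SMOKING_STATUS": "smoking status",
}


def tokens(text: str) -> set:
    """Lower-case word tokens of a field label or dictionary term."""
    return set(re.findall(r"[a-z]+", text.lower()))


def best_concept(label: str, dictionary=DICTIONARY, threshold=0.5):
    """Concept whose term has the highest Jaccard token overlap with label,
    or None if no term reaches the threshold."""
    label_tokens = tokens(label)
    best, best_score = None, 0.0
    for code, term in dictionary.items():
        term_tokens = tokens(term)
        union = label_tokens | term_tokens
        score = len(label_tokens & term_tokens) / len(union) if union else 0.0
        if score > best_score:
            best, best_score = code, score
    return best if best_score >= threshold else None


def harmonize(studies: dict) -> dict:
    """{study: [field labels]} -> {concept: [(study, label), ...]}; unmapped labels are skipped."""
    merged = {}
    for study, labels in studies.items():
        for label in labels:
            concept = best_concept(label)
            if concept is not None:
                merged.setdefault(concept, []).append((study, label))
    return merged
```

In this toy setup "Systolic BP" stays unmapped, because the abbreviation shares only one token with the dictionary term. Real mapping tools need synonyms and abbreviations to close that kind of gap.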
Real-life example 2 – CTMM AirForce
• Personalized chemo-radiation of lung and head & neck cancer
• Lung cancer patients with PET-CT (and clinical data & tissue)
– VUMC, MUMC+, NKI, UMCG + 35 patients from Policlinico
Gemelli in Rome (via MUMC+)
• Transfer of images from Rome using TraIT’s BioMedical Image
Archive (www.bmia.nl)
WP2 High level design – Upload (implemented)
– Image storage & simple webshop-like image viewing (based on NBIA)
– Image pseudonymization pipeline (based on CTP from the RSNA)
AirForce - de-identification of images
• Install TraIT de-identification client in Rome
– Adopt: Clinical Trial Processor (RSNA, open source, Java)
• Configure DICOM de-identification (DICOM tags of each DICOM image)
– Remove identifying DICOM tags
– Replace Codice Sanitario (PatientID) with AirForce ID
– Keep important tags (e.g. some tags are crucial for downstream analysis of PET)
• Result: A pipeline to TraIT’s BMIA from the local Rome Image Archive
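The three configuration rules (remove, replace, keep) can be sketched on a DICOM header modelled as a plain dict keyed by DICOM keyword. This illustrates the logic only:
- It is not the CTP anonymizer script used for AirForce, and it does not use a real DICOM library.
- The list of removed tags is an illustrative assumption.
- Header-level de-identification does not catch names burnt into the pixel data, so a visual QC step is still needed.

```python
# Illustrative subset of identifying tags (DICOM keywords); the real AirForce
# CTP script is not reproduced here.
REMOVE = {
    "PatientName", "PatientBirthDate", "PatientAddress", "OtherPatientIDs",
    "ReferringPhysicianName", "InstitutionName",
}


def deidentify(header: dict, id_map: dict) -> dict:
    """Return a de-identified copy of a DICOM header modelled as {keyword: value}.

    id_map maps the local PatientID (the Codice Sanitario) to the AirForce ID.
    An unmapped patient raises KeyError, so images without a study ID are not exported.
    Tags outside REMOVE (e.g. PatientWeight or acquisition times, which PET
    analysis may need) are kept unchanged.
    """
    out = {tag: value for tag, value in header.items() if tag not in REMOVE}
    out["PatientID"] = id_map[header["PatientID"]]
    out["PatientIdentityRemoved"] = "YES"  # standard DICOM flag for de-identified data
    return out
```

Failing loudly on an unknown patient seems safer than passing the original PatientID through. That choice is a design assumption of this sketch.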
AirForce - QC of de-identification
• Perform QC step by collection administrator before images are
visible in BMIA to prevent privacy breach (esp. burnt-in names).
AirForce - Resulting image archive in BMIA
• Collection AirForce on www.bmia.nl with 35 patients from Rome
• Web shop model where you can fill a basket with patients for
download
Real-life example 3 – CTMM PCMM
• Develop and validate biomarkers for diagnosis of prostate cancer
• Requires correlation of phenotype data to biomarker data
• Potential solution: tranSMART; to be validated with real-life data
from CTMM projects like PCMM
Can we address the generic translational question with the tranSMART solution?
Role of tranSMART in TraIT
PCMM – tranSMART as a candidate solution
tranSMART:
• Developed at J&J
• Made open-source
• “Data workbench” for translational researchers
• Searching across studies
• Data exploration
PCMM - Import of prostate data
Prostate data (Gleason score, PSA values, etc.) imported; references to public data sources available.
Usually gene expression data will be loaded as well; this has not yet been done for PCMM.
PCMM - QC of the data set
Drag-and-drop data parameters to create simple
distribution plots and statistical values
PCMM: tranSMART for correlation analysis
Easy to create correlation plots between existing
and potential predictors for prostate cancer
Second tranSMART developer/user meeting, June 17th-19th 2013, Amsterdam
Recombinant / Deloitte, CDISC, Thomson Reuters, Pfizer, eTRIKS / Imperial College, CTMM TraIT, Sanofi, Johnson & Johnson, University of Michigan, Philips, University of Luxembourg
OpenClinica.com – TraIT partnership
Statement of Work
• TraIT: automate data capture in OC as much as possible
– E.g. automate upload of Excel data and hospital lab data
– Approach: OC’s Web Services
• Requires improvements to OIDs and bug fixes
• Support configurable role-based authentication and authorization within OC
– E.g. central review of images for all subjects at the different sites: each image is reviewed by three reviewers who are not allowed to see each other’s reports in the CRFs
• Parameterized links in CRFs
– E.g. links to images or to other subjects, with a dynamic URL based on data in the CRF
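Two of the requested features are small enough to sketch together: the blinded three-reviewer rule and a CRF link whose URL is filled from CRF values. This is a minimal illustration of the desired behaviour, not OpenClinica functionality:
- The role names, the `Report` type and the viewer URL are hypothetical.

```python
from dataclasses import dataclass
from urllib.parse import urlencode

MAX_REVIEWERS = 3  # "each image is reviewed by three reviewers"


@dataclass(frozen=True)
class Report:
    image_id: str
    reviewer: str


def assign_reviewer(assignments: dict, image_id: str, reviewer: str) -> None:
    """Add a reviewer to an image, refusing duplicates and a fourth reviewer."""
    reviewers = assignments.setdefault(image_id, [])
    if reviewer in reviewers:
        raise ValueError(f"{reviewer} already reviews {image_id}")
    if len(reviewers) >= MAX_REVIEWERS:
        raise ValueError(f"{image_id} already has {MAX_REVIEWERS} reviewers")
    reviewers.append(reviewer)


def can_view(user: str, role: str, report: Report) -> bool:
    """Reviewers see only their own report; a hypothetical 'monitor' role sees all."""
    if role == "monitor":
        return True
    return role == "reviewer" and report.reviewer == user


def crf_link(base_url: str, crf_values: dict, params: dict) -> str:
    """Build a dynamic URL; params maps URL parameter -> CRF item whose value is used."""
    query = {param: crf_values[item] for param, item in params.items()}
    return f"{base_url}?{urlencode(query)}"
```

`urlencode` takes care of escaping, so free-text CRF values such as "T3c N0" still produce a valid URL.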
Other wishes
• Study migration
– E.g. users want to switch to a different OC server
– Currently only "ClinicalData" ODM is imported
– Studies can be exported in full detail but cannot be imported as such
• Support reference to ontologies in the CRF
– Standardization of data
• Easy view for data entry
– E.g. a tree structure that indicates where you are while entering data, for easy navigation to the subject’s other CRFs
Uptake of OpenClinica
[Chart: number of studies per quarter, Q3 2008 to mid-June 2013; by mid-June 2013: 47 studies, 77 sites, 256 users]
Pre-TraIT effect: all multicenter VUmc studies
Also multicenter studies from UMCU, UMCN, EMC and Meander MC
Timeline: start of DeCoDe OpenClinica; start of TraIT OpenClinica
The load on TraIT OpenClinica increased significantly in 2012
Considerable time and energy was spent on delivery management (availability, capacity and
security) and on improvement of the TraIT OpenClinica user support
Who am I?
• My name: Sander de Ridder (s.deridder@vumc.nl)
– Computer Science (MSc) & Bioinformatics (MSc)
• Inflammatory Disease Profiling, Dept. of Pathology, VU University medical center, Amsterdam
– Bioinformatics for Inflammatory Disease Profiling Group
– IT implementation CTMM TRACER
CTMM-TRACER
Background information on TRACER
• CTMM TRACER: Rheumatoid Arthritis
– Prospective data
– Retrospective data (to do)
• Go live:
– Wednesday the 5th of June
– Started at 9:00, finished at 12:00
– Approximately 1 hour/study

Prospective studies   VERA    ERA     ESRA
Sites                 4       7       7
Events                7       6       6
CRFs                  ~35     ~30     ~30
Rules                 ~250    ~450    ~650
Age Calculation
After entering the DOB and the date of signing, the age is calculated.
Age calculation script: http://en.wikibooks.org/wiki/OpenClinica_User_Manual/AgeField
Created by Sander de Ridder and improved by Gerben Rienk
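The core of the age field is a completed-years calculation. The wikibooks script runs as JavaScript inside the CRF. What follows is only a Python rendition of the arithmetic:
- It assumes "age" means completed years on the date of signing.
- It does not cover how the script reads or writes CRF fields.

```python
from datetime import date


def age_at(dob: date, reference: date) -> int:
    """Completed years between date of birth and a reference date
    (here: the date of signing)."""
    years = reference.year - dob.year
    # Not yet had this year's birthday on the reference date -> one year less.
    if (reference.month, reference.day) < (dob.month, dob.day):
        years -= 1
    return years
```

With this comparison, someone born on 29 February is counted as a year older from 1 March in non-leap years. A CRF script should make the same choice consistently.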
Long List Implementation
• Problem:
– Maximum of 4000 characters for single-select response options text
– Some lists need more characters: e.g. medication list > 9000 characters
• Solution:
– Created external list
– Add field to CRF which opens new page with list
– Allows user to select option; selected value is copied back to CRF
Single-select item in the CRF template:
ITEM_NAME          RESPONSE_TYPE   RESPONSE_OPTIONS_TEXT          RESPONSE_VALUES_OR_CALCULATIONS
Smoking_Category   single-select   Never smoked, Current smoker   1,2
Example: Medication
• User selects “Other” and then clicks on the field of question 3)
• A new tab/window opens with an HTML page with a single-select
• The user can select the desired medication from the list
• Selected medication is copied to the CRF
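The decision rule and the external page can be sketched in a few lines. The sketch checks whether a response list still fits the 4000-character RESPONSE_OPTIONS_TEXT limit. If it does not, it writes the options out as a stand-alone HTML single-select:
- The function names and the page layout are hypothetical.
- The copy-back step (JavaScript in the CRF) is not shown.

```python
from html import escape

OPTIONS_TEXT_LIMIT = 4000  # maximum RESPONSE_OPTIONS_TEXT length quoted on the slide


def options_text(options):
    """Response options as one comma-separated string, as in the CRF template."""
    return ", ".join(options)


def needs_external_list(options):
    return len(options_text(options)) > OPTIONS_TEXT_LIMIT


def external_list_html(title, options):
    """Stand-alone HTML page with a single-select holding every option."""
    rows = "\n".join(
        f'    <option value="{escape(o)}">{escape(o)}</option>' for o in options
    )
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{escape(title)}</title></head>\n'
        "<body>\n"
        f'  <select id="longlist" size="20">\n{rows}\n  </select>\n'
        "</body></html>\n"
    )
```

Escaping each option keeps medication names that contain "&" or quotes from breaking the page.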
Some tools we created: CRF validator
• Compares items between CRFs based on UIDs and ensures they match
– CRF1: ID: Patient_Weight; DATA_TYPE: INT
– CRF2: ID: Patient_Weight; DATA_TYPE: REAL
→ Mismatch for Patient_Weight!
• Checks NULL-flavour coding integrity
– Coding: -1=No Information, -2=Not Applicable, -3=Unknown, …
– CRF1: RESPONSE_OPTIONS_TEXT: No Information; RESPONSE_VALUES_OR_CALCULATIONS: -2
→ Incorrect NULL-flavour coding!
→ Prevents errors and inconsistencies
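Both validator checks can be sketched over CRFs loaded as nested dicts, using the column names of the CRF template. This is an illustration, not the TraIT validator:
- The dict layout is an assumption.
- Only the three NULL-flavour codes spelled out on the slide are included.
- Option texts containing escaped commas are not handled.

```python
# Only the three codes spelled out on the slide; the slide's list continues ("…").
NULL_FLAVOURS = {"No Information": "-1", "Not Applicable": "-2", "Unknown": "-3"}


def type_mismatches(crfs: dict) -> list:
    """crfs: {crf_name: {item_name: {"DATA_TYPE": ..., ...}}}.
    Report items whose DATA_TYPE differs from the first CRF that defined them."""
    first_seen = {}
    problems = []
    for crf_name, items in crfs.items():
        for item, spec in items.items():
            dtype = spec.get("DATA_TYPE")
            if item not in first_seen:
                first_seen[item] = (crf_name, dtype)
            elif first_seen[item][1] != dtype:
                other_crf, other_type = first_seen[item]
                problems.append(f"Mismatch for {item}: {other_crf}={other_type}, {crf_name}={dtype}")
    return problems


def null_flavour_errors(crfs: dict) -> list:
    """Report options whose NULL-flavour label does not carry its agreed code."""
    problems = []
    for crf_name, items in crfs.items():
        for item, spec in items.items():
            texts = spec.get("RESPONSE_OPTIONS_TEXT", "").split(",")
            values = spec.get("RESPONSE_VALUES_OR_CALCULATIONS", "").split(",")
            for text, value in zip(texts, values):
                expected = NULL_FLAVOURS.get(text.strip())
                if expected is not None and value.strip() != expected:
                    problems.append(f"Incorrect NULL-flavour coding in {crf_name}/{item}: "
                                    f"{text.strip()}={value.strip()}, expected {expected}")
    return problems
```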
Some tools we created: ID-Translator
• Move rules file to new OC server → replace all item IDs
• Automatic translation of item identifiers in rules
→ Prevented replace errors and saved many hours of work
• Requires:
– ViewCRFVersion file: contains item ID information for the CRF on the new server
– Rule file with properly specified header: contains item ID information for the CRF on the old server
• Steps:
– Parse ViewCRFVersion (new server) → mapping ITEM_NAME – new OC_ID, e.g. MedicatieBijgewerkt = I_TRACE_MEDICATIEBIJGEWERKT_4714
– Parse header of rule file (rules for old server) → mapping ITEM_NAME – old OC_ID, e.g. MedicatieBijgewerkt = I_TRACE_PATIENTSTUDIE_MOMENT_AFROND
– Translate rule file: old OC_ID → new OC_ID via ITEM_NAME, e.g. I_TRACE_PATIENTSTUDIE_MOMENT_AFROND = I_TRACE_MEDICATIEBIJGEWERKT_4714
→ Translated Rules for new server
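The translation step can be sketched once both inputs have been reduced to ITEM_NAME → OC_ID mappings. Parsing the ViewCRFVersion file and the rule-file header is left out here, because their exact layout is not given. The function name is hypothetical. The single-pass, whole-identifier substitution is one way to avoid the "replace errors" mentioned above. It is not necessarily how the original tool works.

```python
import re


def translate_rules(rules_text: str, old_ids: dict, new_ids: dict) -> str:
    """Replace each old OC_ID in rules_text by the new OC_ID of the same ITEM_NAME.

    old_ids: ITEM_NAME -> old OC_ID (from the rule-file header)
    new_ids: ITEM_NAME -> new OC_ID (from the ViewCRFVersion file)
    """
    old_to_new = {}
    for name, old_id in old_ids.items():
        if name not in new_ids:
            raise KeyError(f"{name} has no OC_ID on the new server")
        old_to_new[old_id] = new_ids[name]
    if not old_to_new:
        return rules_text
    # Single pass with whole-identifier matches: I_X_1 never matches inside I_X_10,
    # and an ID that was just inserted is never translated a second time.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, old_to_new)) + r")\b")
    return pattern.sub(lambda m: old_to_new[m.group(1)], rules_text)
```

A naive sequence of `str.replace` calls fails when two IDs swap or when one ID is a prefix of another. The single regex pass avoids both cases.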
Things we learned/found useful
• ITEM_NAME max 64 characters
• Truly unique identifiers (description label)
– Prevent conflict with retrospective data
– Easy to keep NULL-flavour coding consistent
• Specify identifiers in header of rule file
– Easy to link to study definition (CTMMC)
– Useful for consistency checking
– Automatic translation
• Negative NULL-flavour coding
– SPSS compatibility
• JavaScript code
– Reference to jQuery: <script src="//ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js"></script> (prevents dependency on OC’s jQuery version)
– $.noConflict(); (prevents our code from interfering with OC’s code)
• Create a checklist and follow it during go-live
Goal: make researchers want to use OpenClinica and tranSMART
Acknowledgements
And many more…