TIES Cancer Research Network Y3 Face to Face Meeting
U24 CA 180921
Session 3: New Partner Introductions
October 9th, 2015
University of Pennsylvania

Overview of Research Biobanking and Data Analytics
Sidney Kimmel Cancer Center at Thomas Jefferson University
John Reber, Systems Development Manager
Translational Pathology Core, Jefferson SKCC
TCRN F2F, Washington DC, October 9, 2015

Overview of the Sidney Kimmel Cancer Center (SKCC) at Jefferson

Background
• Jefferson Medical College was founded in 1824, and the Hospital a year later. JMC is the second largest private medical school in the U.S.
• The Sidney Kimmel Cancer Center at Jefferson was founded in 1991 with approximately 30 investigators in the basic sciences.
• Today, the SKCC has approximately 400 members, including physicians and scientists dedicated to the discovery and development of novel approaches for cancer treatment.

About SKCC
• SKCC’s mission is to make transformational discoveries in the cellular and molecular biology of the malignant process, and to translate the latest research discoveries effectively into clinical trials that provide the highest quality of care to all patients, including those of diverse ethnic and racial populations.
• The SKCC’s Jefferson Cancer Network oversees clinical trials research at over 20 community hospitals and practices in Pennsylvania, Delaware, New Jersey, and New York.

SKCC’s IT infrastructure
• GE Centricity inpatient EMR
• Allscripts outpatient (ambulatory care) EHR
• Cerner A/P lab system
• EPIC inpatient and outpatient
• EPIC Beaker
• OpenSpecimen research biobank management
• TIES clinical text extraction
• i2b2 research data mart
• TriNetX data analytics network

Specimen annotation management
• TJUH clinical paraffin block archive
• Pathology Department research tissue bank (J. Evans, PI)
• Pancreatic tumor bank (C. Yeo, PI)
• Breast tumor bank (J. Palazzo, PI)
• Thyroid tumor bank (E. Pribitkin, PI)
• Brain tumor bank (D. Andrews, PI)
• Liver tumor bank (V. Navarro, PI)

Jefferson integrated Research Specimen management (OpenSpecimen)
• > 230,000 patients
• > 650,000 specimens
• > 100,000 patients via i2b2 RDM
• Cancer patients having comprehensive annotation from the Tumor Registry and banked specimens

Text Information Extraction System (TIES) deployment at SKCC
• Version 5.3 of TIES has been deployed at Jefferson SKCC.
• To date, ~450,000 pathology reports are available from TIES.
• Pathology reports are automatically communicated from the Cerner A/P system to the TIES database via an HL7 feed.
• De-identification of the reports is accomplished using the De-ID Corporation product.

Jefferson’s i2b2 Research Data Mart
• Built on the “Informatics for Integrating Biology and the Bedside” (i2b2) framework from the NIH-funded National Center for Biomedical Computing based at Partners HealthCare System (Harvard).
• RDM data are de-identified. Re-identification is possible via an honest broker, who has access to a re-identification application.
• Currently ~45 million observations on > 450,000 patients. Data are refreshed weekly.
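The honest-broker arrangement described for the RDM can be illustrated with a minimal sketch. It is not Jefferson's re-identification application. The class name `HonestBroker`, the keyed-hash scheme, and the `approved` flag are all assumptions chosen for illustration. The idea shown is only that research users see a stable pseudonymous ID, while the mapping back to the medical record number stays with the broker.

```python
import hashlib
import hmac


class HonestBroker:
    """Hypothetical honest broker: the only party able to map research IDs back to MRNs."""

    def __init__(self, secret_key: bytes):
        self._key = secret_key
        self._lookup = {}  # research_id -> MRN, held on the broker side only

    def deidentify(self, mrn: str) -> str:
        # A keyed hash gives the same research ID for the same MRN on every weekly refresh.
        digest = hmac.new(self._key, mrn.encode("utf-8"), hashlib.sha256).hexdigest()
        research_id = digest[:16]
        self._lookup[research_id] = mrn
        return research_id

    def reidentify(self, research_id: str, approved: bool) -> str:
        # Re-identification is refused unless the request has been approved (assumed workflow).
        if not approved:
            raise PermissionError("re-identification requires an approved request")
        return self._lookup[research_id]


broker = HonestBroker(secret_key=b"example-key-not-for-production")
rid = broker.deidentify("MRN0001")
```

In this sketch a researcher would only ever receive `rid`; calling `broker.reidentify(rid, approved=True)` stands in for the broker-only application mentioned on the slide.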
Current Jefferson Data Resource Landscape
i2b2 RESEARCH DATA MART
• OPEN SPECIMEN: biospecimen annotation (SNOMED)
• TJUH CLINICAL DATA WAREHOUSE: DEMOGRAPHICS (gender, race, age, vital status, ethnicity), DIAGNOSES (ICD9), PROCEDURES (ICD9), CLINICAL LABS (LOINC), MEDICATIONS
• IMPAC METRIQ: cancer registry site, stage, histology, treatment, survival (ICD-O-3)
• CERNER A/P: “omic” data
• FORTE ONCORE: clinical trial data

Patient data obtained from TJUH EMR
• DEMOGRAPHICS: Age, Ethnicity, Gender, Race, Vital Status (alive/dead)
• DIAGNOSES: Disease systems --> diseases (organized by ICD9 coding)
• CLINICAL LAB RESULTS: Chemistry, Coagulation, Hematology
• MEDICATIONS: Anti-neoplastic
• INPATIENT PROCEDURES: Diagnostic and Treatment procedures (organized by ICD9 coding)

Patient mutation data obtained from Pathology Molecular Diagnostic Testing (both outsourced and in-house)

Gene      Protein change          cDNA change
ALK       rearrangement
BRAF      p.D594E                 c.1782T>G
BRAF      p.K601E                 c.1801A>G
BRAF      p.V600E                 c.1799T>A
EGFR      Deletion in exon 19
EGFR      Insertion in exon 20
EGFR      p.E746K                 c.2236G>A
EGFR      p.E746_A750delELREA     c.2236_2250del15
EGFR      p.G719A                 c.2156G>C
EGFR      p.G719C                 c.2155G>T
EGFR      p.G719S                 c.2155G>A
EGFR      p.L858R                 c.2573T>G
EGFR      p.L861Q                 c.2582T>A
EGFR      p.S768I                 c.2303G>T
JAK2      p.V617F                 c.1849G>T
JAK3      p.V722I                 c.2164G>A
KRAS      p.G12A                  c.35G>C
KRAS      p.G12C                  c.34G>T
KRAS      p.G12D                  c.35G>A
KRAS      p.G12R                  c.34G>C
KRAS      p.G12S                  c.34G>A
KRAS      p.G12V                  c.35G>T
KRAS      p.G13D                  c.38G>A
NRAS      p.Q61H                  c.183A>T
NRAS      p.Q61K                  c.181C>A
NRAS      p.Q61L                  c.182A>T
NRAS      p.Q61R                  c.182A>G
PIK3CA    p.E545K                 c.1633G>A
PIK3CA    p.H1047L                c.3140A>T
PIK3CA    p.H1047R                c.3140A>G
TP53      p.D281E                 c.843C>A
TP53      p.E271*                 c.811G>T
TP53      p.E286A                 c.857A>C
TP53      p.F134L                 c.400T>C
TP53      p.G245D                 c.734G>A
TP53      p.L130V                 c.388C>G
TP53      p.R175H                 c.524G>A
TP53      p.R273C                 c.817C>T
TP53      p.R273H                 c.818G>A
TP53      p.S106R                 c.318C>G
TP53      p.Y220C                 c.659A>G
TP53      p.Y236C                 c.707A>G

Specimen annotation from campus biobanks
Eight biobanks, including the TJUH paraffin block archive of ~400,000 cases since 1990.
• Anatomic origin (SNOMED)
• Class (tissue, fluid)
• Type (frozen, FFPE)
• Pathology (normal, malignant, diseased)
• Slide images

Patient data from Jefferson Tumor Registry
Over 100,000 cases since 1990.
• Primary Cancer Diagnosis
• Age at diagnosis/date of diagnosis
• Survival (months) from diagnosis
• Tumor histology and behavior
• Stage (AJCC/TNM, clinical and pathological)
• Grade
• Recurrence: local, distant
• Treatment: chemotherapy, radiation, surgery, transplant, palliative
• Disease-specific factors, e.g. prostate --> Gleason score

Example data summaries from the i2b2 RDM
CLINICAL DIAGNOSES OF TJUH PATIENTS WITH THYROID SPECIMENS

Pathology images are available via the i2b2 query tool

TriNetX application offers an alternative query tool with enhanced data visualization
• Google-like query interface
• Graphic result display
• Interactive display capability

Selected areas of research using RDM:
• Hallgeir Rui, MD, PhD: Molecular cancer epidemiology, cancer pharmacogenetics, individualised cancer risk assessment and prognostication.
• Hushan Yang, PhD: Molecular cancer epidemiology.
• Scott Waldman, MD, PhD: Pharmacology and experimental therapeutics.
• Ron Myers, PhD: Gene-environment risk assessment.
• Stephen Peiper, MD: Biomarker discovery using Next Generation Sequencing.

Future Plans:
• Allow access to TIES reports directly from i2b2-selected cohorts (as we do with slide images).
• Provide direct investigator access to the TIES application.
• Provide direct investigator access to the TCRN.
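As a rough illustration of the cohort summary mentioned earlier (clinical diagnoses of patients with thyroid specimens), the sketch below counts distinct patients per diagnosis code within a specimen-defined cohort. It does not use the i2b2 or TriNetX query interfaces. The record layout, the field names (`patient_id`, `anatomic_site`, `icd9`), and the toy data are assumptions made purely for illustration.

```python
from collections import Counter

# Toy, de-identified records with hypothetical field names.
specimens = [
    {"patient_id": "P1", "anatomic_site": "thyroid"},
    {"patient_id": "P2", "anatomic_site": "thyroid"},
    {"patient_id": "P3", "anatomic_site": "pancreas"},
]
diagnoses = [
    {"patient_id": "P1", "icd9": "193"},    # malignant neoplasm of thyroid gland
    {"patient_id": "P1", "icd9": "401.9"},  # essential hypertension, unspecified
    {"patient_id": "P2", "icd9": "193"},
    {"patient_id": "P2", "icd9": "193"},    # repeat visit, counted once per patient
    {"patient_id": "P3", "icd9": "401.9"},
]


def diagnosis_summary(specimens, diagnoses, site):
    """Count distinct patients per ICD9 code among patients with a specimen from `site`."""
    cohort = {s["patient_id"] for s in specimens if s["anatomic_site"] == site}
    seen = set()
    counts = Counter()
    for d in diagnoses:
        key = (d["patient_id"], d["icd9"])
        if d["patient_id"] in cohort and key not in seen:
            seen.add(key)
            counts[d["icd9"]] += 1
    return counts


summary = diagnosis_summary(specimens, diagnoses, "thyroid")
```

The same cohort set could, under the future plan listed above, serve as the key for pulling the matching TIES reports, provided both systems share a de-identified patient identifier.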
Cancer Research @ SB: Program Development
Programs under development:
• Metastasis and Experimental Therapeutics
• Metabolomics/Lipidomics
• Precision Approaches to Cancer: Imaging, informatics, genomics: Wei Zhao and Helene Benveniste, Joel Saltz, Scott Powers
• Prevention and Population Research
• GI Cancer

Integrative Multi-scale Analysis in Biomedical Informatics
• Predict treatment outcome, select, monitor treatments
• Computer assisted exploration of new classification schemes
• Integrated analysis and presentation of observations, features, analytical results – human and machine generated

Current ITCR
• Specific Aim 2: Database infrastructure to manage and query image data, image analysis results.
• Specific Aim 3: HPC software that targets clusters, cloud computing, and leadership scale systems.
• Specific Aim 4: Develop visualization middleware for 2D/3D image and feature data and for integrated image and “omic” data.

Image Analysis
Quantitative Feature Analysis in Pathology: Emory In Silico Center for Brain Tumor Research
Dan Brat (PI), Joel Saltz (PD)
• NLM/NCI: Integrative Analysis/Digital Pathology, R01LM011119, R01LM009239. Joel Saltz and David Foran
• NCI: Tools to Analyze Morphology and Spatially Mapped Molecular Data, 1U24CA180924-01A1. Joel Saltz
• Marcus Foundation Grant. Ari Kaufman, Joel Saltz
Nuclear and feature segmentations yield new determinants/correlates for outcomes.

Current ITCR: Storage and Visualization
Scalable storage architectures allow for effective and efficient comparison of multiple image analysis methods.

Current ITCR: Classification with CNNs
Le Hou, Dimitris Samaras, Tahsin Kurc, Yi Gao, Liz Vanner, James Davis, Joel Saltz

How do we use TIES?
• IT actively deploying TIES for consortium and internal use.
• Regulatory progress
– IRB submission in preparation.
– Accept Terms of Agreement.
– Assigned roles.
• Extract additional morphology features for integrative research.
Using TIES to derive Morphology Features
A simple extraction pipeline:
• Process pathology reports using TIES.
• Learn which concepts indicate morphology features.
• Identify modifiers that specify values for these features.

How good is this?
• Feature mention detection: F1 = 0.93
• Feature is absent or not: F1 = 0.74
• Feature value assignment: F1 = 0.67

Collaboration Opportunities with TCRN
• Develop new tools that combine human generated features and machine generated features.
– Collaboration on TIES analysis service enhancements.
• Joint engagement with cancer research partners to exploit these tools.
• Institutional collaboration with TIES in biomarker and clinical study research

Our Vision for Collaboration
Goal: Facilitate integrated analysis of whole slide images and pathology reports on the same specimen.
Proposal: A micro-services composition for a global API ecosystem.
[Diagram: Integrated Image / NLP analysis. Components: Context, User Interfaces, Image Analysis, HTTP REST API, NLP]
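The three pipeline steps and the F1 figures reported above can be made concrete with a small sketch. This is not the presenters' method and does not call TIES. A hand-written lexicon stands in for the learned concepts, and sentence-level regular expressions stand in for negation and modifier detection. The names `FEATURE_TERMS`, `NEGATION`, `VALUE`, `extract_features`, and `f1_score` are all hypothetical.

```python
import re

# Hypothetical lexicon: surface terms mapped to a morphology feature name.
FEATURE_TERMS = {
    "necrosis": "necrosis",
    "mitoses": "mitotic_activity",
    "mitotic figures": "mitotic_activity",
}
# Crude sentence-level cues; a real system would scope negation more carefully.
NEGATION = re.compile(r"\b(no|without|absent|negative for)\b", re.IGNORECASE)
VALUE = re.compile(r"\b(focal|extensive|rare|frequent|numerous)\b", re.IGNORECASE)


def extract_features(report):
    """Return one finding per feature mention: name, presence, and modifier value."""
    findings = []
    for sentence in re.split(r"(?<=[.;])\s+", report):
        lowered = sentence.lower()
        for term, feature in FEATURE_TERMS.items():
            if term in lowered:
                value = VALUE.search(sentence)
                findings.append({
                    "feature": feature,
                    "present": NEGATION.search(sentence) is None,
                    "value": value.group(1).lower() if value else None,
                })
    return findings


def f1_score(predicted, gold):
    """F1 = 2PR / (P + R) over two sets of labels; 0.0 when nothing overlaps."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


findings = extract_features("Extensive necrosis is present. No mitotic figures identified.")
```

Each of the three reported scores corresponds to a different label set passed to `f1_score`: the feature names alone, the (feature, present) pairs, and the (feature, value) pairs. Scoring them separately is what lets mention detection score higher than value assignment.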