Strategic Health IT Advanced Research Projects (SHARP)
Area 4: Secondary Use of EHR Data
Project 3: High-Throughput Phenotyping
Jyotishman Pathak, PhD
Assistant Professor of Biomedical Informatics
June 11, 2012

Project 3: Collaborators & Acknowledgments
• CDISC (Clinical Data Interchange Standards Consortium)
  • Rebecca Kush, Landen Bain
• Centerphase Solutions
  • Gary Lubin, Jeff Tarlowe
• Group Health Seattle
  • David Carrell
• Harvard University/MIT
  • Guergana Savova, Peter Szolovits
• Intermountain Healthcare/University of Utah
  • Susan Welch, Herman Post, Darin Wilcox, Peter Haug
• Mayo Clinic
  • Cory Endle, Rick Kiefer, Sahana Murthy, Gopu Shrestha, Dingcheng Li, Gyorgy Simon, Matt Durski, Craig Stancl, Kevin Peterson, Cui Tao, Lacey Hart, Erin Martin, Kent Bailey, Scott Tabor
SHARPn High-Throughput Phenotyping ©2012 MFMER | slide-2

Phenotyping is still a bottleneck…
[Image from Wikipedia]

EHR systems: United States 2002–2011 [Millwood et al. 2012]

Electronic health record (EHR)-driven phenotyping
• EHRs are becoming increasingly prevalent within the U.S. healthcare system
  • Meaningful Use is one of the major drivers
• Overarching goal
  • To develop high-throughput automated techniques and algorithms that operate on normalized EHR data to identify cohorts of potentially eligible subjects on the basis of disease, symptoms, or related findings

http://gwas.org

EHR-driven Phenotyping Algorithms - I
• Typical components
  • Billing and diagnosis codes
  • Procedure codes
  • Labs
  • Medications
  • Phenotype-specific covariates (e.g., demographics, vitals, smoking status, CASI scores)
  • Pathology
  • Imaging?
• Organized into inclusion and exclusion criteria

EHR-driven Phenotyping Algorithms - II [eMERGE Network]
[Figure: algorithm development cycle linking Phenotype Algorithm, Rules, Evaluation, Mappings, Transform, Visualization, and Data (NLP, SQL)]

Example: Hypothyroidism Algorithm [Denny et al., 2012]
[Flowchart: the criteria below lead to the outcomes Case 1, Case 2, and Control]
• No thyroid-altering medications (e.g., phenytoin, lithium)
• 2+ non-acute visits in 3 yrs
• ICD-9s for hypothyroidism
• Abnormal TSH/FT4
• Thyroid replacement meds
• Antibodies for TTG or TPO (anti-thyroglobulin, anti-thyroperoxidase)
• No ICD-9s for hypothyroidism
• No abnormal TSH/FT4
• No thyroid replacement meds
• No antibodies for TTG/TPO
• No secondary causes (e.g., pregnancy, ablation)
• No hx of myasthenia gravis

Hypothyroidism Algorithm: Validation [Denny et al., 2012]
Positive Predictive Values (PPV) Based on Chart Review – All Sites

Site | EHR-based Cases/Controls | Sampled for Chart Review (Cases/Controls) | Old Case PPV (%) | New Case PPV (%)
Group Health | 430/1,188 | 50/50 | 92 | 98
Marshfield | 509/1,193 | 50/50 | 88 | 91
Mayo Clinic | 250/2,145 | 100/100 | 76 | 97
Northwestern | 103/516 | 50/50 | 88 | 98
Vanderbilt | 184/1,344 | 50/50 | 90 | 98
All sites | 1,421/6,362 | - | 87 | 96

Data Categories Used to Define the EHR-driven Phenotyping Algorithms [eMERGE Network]

Phenotype | Clinical gold standard | EHR-derived phenotype definitions | Validation (PPV/NPV)
Alzheimer's dementia | Demographics, clinical examination of mental status, histopathologic examination | Diagnoses, medications; demographics, laboratory tests, radiology reports |
Cataracts | Clinical exam finding (ophthalmologic examination) | Diagnoses, procedure codes; demographics, medications | 98%/98%
Peripheral arterial disease | Radiology test results (ankle-brachial index or arteriography) | Diagnoses, procedure codes, medications, radiology test results; demographics | 94%/99%
Type 2 diabetes | Laboratory tests | Diagnoses, laboratory tests, medications; demographics, height, weight, family history | 98%/100%
Cardiac conduction | ECG measurements | ECG report results; demographics, diagnoses, procedure codes, medications, laboratory tests | 73%, 97%

Genotype-Phenotype Association Results [Ritchie et al. 2010]
[Figure: forest plot comparing published and observed odds ratios (log scale, 0.5 to 5.0) for each marker below]
• Atrial fibrillation: rs2200733 (Chr. 4q25), rs10033464 (Chr. 4q25)
• Crohn's disease: rs11805303 (IL23R), rs17234657 (Chr. 5), rs1000113 (Chr. 5), rs17221417 (NOD2), rs2542151 (PTPN22)
• Multiple sclerosis: rs3135388 (DRB1*1501), rs2104286 (IL2RA), rs6897932 (IL7RA)
• Rheumatoid arthritis: rs6457617 (Chr. 6), rs6679677 (RSBN1), rs2476601 (PTPN22)
• Type 2 diabetes: rs4506565 (TCF7L2), rs12255372 (TCF7L2), rs12243326 (TCF7L2), rs10811661 (CDKN2B), rs8050136 (FTO), rs5219 (KCNJ11), rs5215 (KCNJ11), rs4402960 (IGF2BP2)

Key lessons learned from eMERGE
• Algorithm design and transportability
  • Non-trivial; requires significant expert involvement
  • Highly iterative process
  • Time-consuming manual chart reviews
  • Representation of "phenotype logic" for transportability is critical
• Standardized data access and representation
  • Importance of unified vocabularies, data elements, and value sets
  • Questionable reliability of ICD & CPT codes (e.g., billing the wrong code because it is easier to find)
  • Natural Language Processing (NLP) is critical

Algorithm Development Process - Modified [eMERGE Network]
[Figure: the development cycle with a Semi-Automatic Execution step added: Phenotype Algorithm, Rules, Semi-Automatic Execution, Evaluation, Mappings, Transform, Visualization, Data (NLP, SQL)]

Algorithm Development Process - Modified [Welch et al. 2012] [Thompson et al., submitted 2012] [Li et al., submitted 2012]
• Standardized and structured representation of phenotype definition criteria
  • Use the NQF Quality Data Model (QDM)
• Conversion of structured phenotype criteria into executable queries
  • Use JBoss® Drools (DRLs)
• Standardized representation of clinical data
  • Create new and re-use existing clinical element models (CEMs)

The SHARPn "phenotyping funnel" [Welch et al. 2012] [Thompson et al., submitted 2012] [Li et al., submitted 2012]
[Figure: data from the Mayo Clinic EHR and the Intermountain EHR pass through CEMs, QDMs, and DRLs to yield phenotype-specific patient cohorts]

Clinical Element Models: Higher-Order Structured Representations [Stan Huff, IHC]

Pre- and Post-Coordination [Stan Huff, IHC]

CEMs are available for patient demographics, medications, lab measurements, procedures, etc. [Stan Huff, IHC]

SHARPn data normalization flow - I [Welch et al. 2012]
[Figure: normalization pipeline ending in a CEM MySQL database with normalized patient information]

SHARPn data normalization flow - II
[Figure: CEM MySQL database with normalized patient information]

Our task: from human readable to machine computable [Thompson et al., submitted 2012]

NQF Quality Data Model (QDM)
• Standard of the National Quality Forum (NQF)
• A structure and grammar to represent quality measures in a standardized format
• Groups of codes in a code set (ICD-9, etc.)
  • "Diagnosis, Active: steroid induced diabetes" using "steroid induced diabetes Value Set GROUPING (2.16.840.1.113883.3.464.0001.113)"
• Supports temporality & sequences
  • AND: "Procedure, Performed: eye exam" > 1 year(s) starts before or during "Measurement end date"
• Implemented as a set of XML schemas
• Links to standardized terminologies (ICD-9, ICD-10, SNOMED-CT, CPT-4, LOINC, RxNorm, etc.)

116 Meaningful Use Phase I Quality Measures

Example: Diabetes & Lipid Mgmt. - I
• Human-readable HTML

Example: Diabetes & Lipid Mgmt. - II
• Computable XML

NQF Measure Authoring Tool (MAT)

JBoss® open-source Drools rules-based management system (RBMS)
• Represents knowledge with declarative production rules
• Origins in artificial intelligence expert systems
• Simple when <pattern> then <action> rules specified in text files
• Separation of data and logic into separate components
• Forward chaining inference model (Rete algorithm)
• Domain specific languages (DSL)

Example Drools rule

    rule "Glucose <= 40, Insulin On"
    when
        $msg : GlucoseMsg( glucoseFinding <= 40, currentInsulinDrip > 0 )
    then
        glucoseProtocolResult.setInstruction( GlucoseInstructions.GLUCOSE_LESS_THAN_40_INSULIN_ON_MSG );
    end

• {Rule Name}: "Glucose <= 40, Insulin On"
• {binding}: $msg
• {Java Class}: GlucoseMsg (the matched fact) and GlucoseInstructions (the parameter passed to the setter)
• {Class Getter Method}: glucoseFinding, currentInsulinDrip
• {Class Setter Method}: setInstruction

Automatic translation from NQF QDM criteria to Drools [Li et al., submitted 2012]
[Figure: from non-executable to executable. The Measure Authoring Toolkit produces measures, data types, and value sets as XML-based structured representations; measures are converted to Drools scripts, and data types and value sets are mapped to fact models saved in XLS files; these run on the Drools engine]

Automatic translation from NQF QDM criteria to Drools [Li et al., submitted 2012]

The "executable" Drools flow

Phenotype library and workbench - I (http://phenotypeportal.org)
1. Converts QDM to Drools
2. Executes rules by querying the CEM database
3. Generates summary reports

Phenotype library and workbench - II (http://phenotypeportal.org)

Phenotype library and workbench - III (http://phenotypeportal.org)

Phenotype library and workbench - IV

Additional ongoing research efforts - I [Carroll et al. 2011]
• Machine learning and association rule mining
  • Manual creation of algorithms takes time
  • Let computers do the "hard work"
  • Validate against expert-developed algorithms

Additional ongoing research efforts - I (continued) [Simon et al. 2012]
• Association rule mining originates in sales (market-basket) data
  • Items (columns): co-morbid conditions
  • Transactions (rows): patients
  • Itemsets: sets of co-morbid conditions
• Goal: find all itemsets (sets of conditions) that frequently co-occur in patients
  • One of those conditions should be DM
[Table: patients 001 to 005 (rows) by conditions TB, DLM, ND, …, IEC (columns); "Y" marks a condition present in a patient]
• Support: the number of transactions in which itemset I appears
  • Support({TB, DLM, ND}) = 3
• Frequent: an itemset I is frequent if support(I) > minsup
[Figure: itemset lattice over items A, B, C, D (AB, AC, AD, BC, BD, CD, ABD, ACD, …); X marks infrequent itemsets]

Additional ongoing research efforts - II

Additional ongoing research efforts - II: TRALI/TACO sniffer

Active Surveillance for TRALI (transfusion-related acute lung injury) and TACO (transfusion-associated circulatory overload)
• Of the 88 TRALI cases correctly identified by the CART (classification and regression tree) algorithm, only 11 (12.5%) were reported to the blood bank by the clinical service.
• Of the 45 TACO cases correctly identified by the CART algorithm, only 5 (11.1%) were reported to the blood bank by the clinical service.
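The support and "frequent itemset" definitions on the association-rule-mining slide can be made concrete with a small level-wise (Apriori-style) search. This is a minimal sketch, not the method of Simon et al. 2012: the five patients and the condition labels "A", "B", "C" are hypothetical placeholders, "DM" stands in for diabetes mellitus, and the function names `support` and `frequent_itemsets` are illustrative. It follows the slide's strict definition, support(I) > minsup.

```python
from itertools import combinations


def support(itemset, transactions):
    """Count the transactions (patients) that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_itemsets(transactions, minsup):
    """Level-wise (Apriori-style) search for all itemsets with support > minsup.

    Uses the strict inequality from the slide's definition of "frequent".
    Returns a dict mapping each frequent itemset (frozenset) to its support.
    """
    items = sorted({i for t in transactions for i in t})
    # Level 1: frequent single conditions
    current = [frozenset([i]) for i in items
               if support(frozenset([i]), transactions) > minsup]
    result = {s: support(s, transactions) for s in current}
    k = 2
    while current:
        # Join: combine frequent (k-1)-itemsets into candidate k-itemsets
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: keep a candidate only if all of its (k-1)-subsets are frequent
        previous = set(current)
        candidates = {c for c in candidates
                      if all(frozenset(s) in previous for s in combinations(c, k - 1))}
        # Count support for the survivors and keep the frequent ones
        current = [c for c in candidates if support(c, transactions) > minsup]
        result.update({c: support(c, transactions) for c in current})
        k += 1
    return result


# Hypothetical toy data: each set holds one patient's co-morbid conditions.
# "A", "B", "C" are placeholder conditions; "DM" stands in for diabetes mellitus.
patients = [
    {"A", "B", "DM"},
    {"A", "B", "C", "DM"},
    {"A", "B", "DM"},
    {"B", "C"},
    {"A", "C", "DM"},
]

frequent = frequent_itemsets(patients, minsup=2)
# Keep only itemsets that include DM, as the slide's goal requires
with_dm = {s: n for s, n in frequent.items() if "DM" in s}
for itemset, n in sorted(with_dm.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

With minsup = 2 in this toy data, {A, B, DM} appears in three patients and is frequent, while {A, C} appears in only two and is dropped. The prune step relies on the Apriori property that every subset of a frequent itemset must itself be frequent, which is what lets the search skip most of the itemset lattice.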
Additional ongoing research efforts - III [Pathak et al. submitted 2012]
• Phenome-wide association scan (PheWAS)
  • Do a "reverse GWAS" using EHR data
  • Facilitate hypothesis generation

Publications to date (conservative)
[Chart: numbers of papers, abstracts, and manuscripts under review by project year: Year 1 (2011), Year 2 (2012), Year 3 (2013)]

Mayo projects and collaborations
• Ongoing
  • Transfusion related acute lung injury (Kor)
  • Drug induced liver injury (Talwalkar)
  • Drug induced thrombocytopenia and neutropenia (Al-Kali)
  • Active surveillance for celiac disease (Murray)
  • Warfarin dose response & heart valve replacements (Pereira)
  • Phenotype definition standardization (HCPR/Quality)
• Getting started/planning
  • Pharmacogenomics of systolic heart failure (Bielinski/Pereira)
  • Pharmacogenomics of SSRI (Mrazek/Weinshilboum)
  • Lumbar image reporting with epidemiology (Kallmes)
  • Active clinical trial alerting (CTMS/Cancer Center)

HTP-related presentations
• June 11th, 2012
  • Using EHRs for clinical research (Vitaly Herasevich)
  • Association rule mining and T2D risk prediction (Gyorgy Simon)
  • Scenario-based requirements engineering for developing EHR add-ons to support CER in patient care settings (Junfeng Gao)
• June 12th, 2012
  • Exploring patient data in context clinical research studies: Research Data Explorer (Adam Wilcox et al.)
  • Utilizing previous result sets as criteria for new queries with FURTHeR (Dustin Schultz et al.)
  • Semantic search engine for clinical trials (Yugyung Lee)
  • Knowledge-driven workbench for predictive modeling (Peter Haug et al.)
  • Clinical analytics driven care coordination for 30-day readmission – Demonstration from 360 Fresh.com (Ramesh Sairamesh)

Thank You!
Pathak.Jyotishman@mayo.edu