Strategic Health IT Advanced Research Projects (SHARP)
Area 4: Secondary Use of EHR Data
Project 3: High-Throughput Phenotyping
Project Lead: Jyotishman Pathak, PhD
PI: Christopher G. Chute, MD, DrPH
June 12, 2012

Electronic health records (EHRs) driven phenotyping
• Overarching goal
  • To develop high-throughput automated techniques and algorithms that operate on normalized EHR data to identify cohorts of potentially eligible subjects on the basis of disease, symptoms, or related findings
SHARPn High-Throughput Phenotyping ©2012 MFMER | slide-2

Current HTP project themes
• Standardization of phenotype definitions
• Library of phenotyping algorithms
• Phenotyping workbench
• Machine learning techniques for phenotyping
• Just-in-time phenotyping

Algorithm Development Process - Modified
• Standardized and structured representation of phenotype definition criteria
  • Use the NQF Quality Data Model (QDM)
• Conversion of structured phenotype criteria into executable queries
  • Use JBoss® Drools (DRLs)
• Standardized representation of clinical data
  • Create new and re-use existing clinical element models (CEMs)
[Diagram labels: Phenotype Algorithm, Rules, Evaluation, Semi-Automatic Execution, Visualization, Transform, Mappings, Data, NLP, SQL]
[Welch et al. 2012] [Thompson et al., submitted 2012] [Li et al., submitted 2012]

NQF Quality Data Model (QDM)
• Standard of the National Quality Forum (NQF)
• A structure and grammar to represent quality measures in a standardized format
• Groups of codes in a code set (ICD-9, etc.)
  • "Diagnosis, Active: steroid induced diabetes" using "steroid induced diabetes Value Set GROUPING (2.16.840.1.113883.3.464.0001.113)"
• Supports temporality & sequences
  • AND: "Procedure, Performed: eye exam" > 1 year(s) starts before or during "Measurement end date"
• Implemented as a set of XML schemas
• Links to standardized terminologies (ICD-9, ICD-10, SNOMED-CT, CPT-4, LOINC, RxNorm, etc.)

116 Meaningful Use Phase I Quality Measures

Example: Diabetes & Lipid Mgmt. - I
• Human-readable HTML

Example: Diabetes & Lipid Mgmt. - II
• Computable XML

Drools-based Phenotyping Architecture
• Clinical Element Database
• Data Access Layer
• Business Logic
• Transformation Layer
  • Transform physical representation
  • Normalized logical representation (Fact Model)
• Inference Engine (Drools)
• Service for Creating Output (File, Database, etc.)
• Output: List of Diabetic Patients

Automatic translation from NQF QDM criteria to Drools
[Li et al., submitted 2012]

The "executable" Drools flow

Phenotype library and workbench - I
http://phenotypeportal.org
1. Converts QDM to Drools
2. Rule execution by querying the CEM database
3. Generate summary reports

Phenotype library and workbench - II
http://phenotypeportal.org

Phenotype library and workbench - III

Machine learning and HTP - I
• Machine learning and association rule mining
• Manual creation of algorithms takes time
• Let computers do the "hard work"
• Validate against expert-developed ones
[Carroll et al. 2011]

Machine learning and HTP - II
• Origins in sales data
• Items (columns): co-morbid conditions
• Transactions (rows): patients
• Itemsets: sets of co-morbid conditions
• Goal: find all itemsets (sets of conditions) that frequently co-occur in patients
  • One of those conditions should be DM
[Table: patients 001-005 (rows) by conditions TB, DLM, ND, …, IEC (columns); "Y" marks a recorded condition]
• Support: number of transactions in which the itemset I appears
  • Support({TB, DLM, ND}) = 3
• Frequent: an itemset I is frequent if support(I) > minsup
[Figure: itemset lattice over items A, B, C, D (AB, AC, AD, BC, BD, CD, ABD, ACD); X marks infrequent itemsets]
[Simon et al. 2012]

Just-in-Time phenotyping - I
• Transfusion-related Acute Lung Injury (TRALI)
• Transfusion-associated Circulatory Overload (TACO)
Electronic Health Records and Phenomics

Just-in-Time phenotyping - II
• TRALI/TACO "sniffer"

Active Surveillance for TRALI and TACO
• Of the 88 TRALI cases correctly identified by the CART algorithm, only 11 (12.5%) were reported to the blood bank by the clinical service.
• Of the 45 TACO cases correctly identified by the CART algorithm, only 5 (11.1%) were reported to the blood bank by the clinical service.

Publications to date (conservative)
[Bar chart: counts of papers, abstracts, and manuscripts under review for Year 1 (2011), Year 2 (2012), and Year 3 (2013); bar labels: 12, 8, 6, 6, 2]

2011 Milestones
• Standardized definitions for phenotype criteria
• Rules-based environment for phenotype algorithm execution
• National library for standardized phenotype definitions (collaboration with eMERGE)
• Machine learning techniques for algorithm definitions
• Online, real-time phenotype execution
• Phenotyping algorithm authoring environment

2012 Milestones
• Machine learning techniques for algorithm definitions
• Online, real-time phenotype execution
• Collaboration with NQF, Query Health and i2b2 infrastructures
• Use cases and demonstrations
  • MU quality metrics (w/ NQF, Query Health)
  • Cohort identification (w/ eMERGE, PGRN)
  • Value analysis (w/ Mayo CSHCD, REP)
  • Clinical trial alerting (w/ Mayo Cancer Ctr./CTSA)

Project 3: Collaborators & Acknowledgments
• CDISC (Clinical Data Interchange Standards Consortium)
  • Rebecca Kush, Landen Bain
• Centerphase Solutions
  • Gary Lubin, Jeff Tarlowe
• Group Health Seattle
  • David Carrell
• Harvard University/MIT
  • Guergana Savova, Peter Szolovits
• Intermountain Healthcare/University of Utah
  • Susan Welch, Herman Post, Darin Wilcox, Peter Haug
• Mayo Clinic
  • Cory Endle, Rick Kiefer, Sahana Murthy, Gopu Shrestha, Dingcheng Li, Gyorgy Simon, Matt Durski, Craig Stancl, Kevin Peterson, Cui Tao, Lacey Hart, Erin Martin, Kent Bailey, Scott Tabor, Chris Chute
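
Supplementary sketch: support and frequent itemsets
The definitions on the "Machine learning and HTP - II" slide (support as the number of patients whose record contains every condition in an itemset, and "frequent" meaning support(I) > minsup) can be illustrated with a small brute-force enumeration. This is only a sketch under stated assumptions: the patient records below are invented toy data and are not the slide's table, the function names are hypothetical, and the exhaustive search is a stand-in for illustration rather than the mining algorithm used in [Simon et al. 2012].

```python
from itertools import combinations

# Hypothetical toy data (NOT the slide's table): patient id -> recorded conditions.
PATIENTS = {
    "p1": {"DM", "TB", "DLM", "ND"},
    "p2": {"DM", "TB", "DLM", "ND"},
    "p3": {"DM", "TB", "DLM", "ND", "IEC"},
    "p4": {"DM", "IEC"},
    "p5": {"TB", "ND"},
}


def support(itemset, transactions):
    """Count the transactions (patients) that contain every item in itemset."""
    items = frozenset(itemset)
    return sum(1 for conditions in transactions.values() if items <= conditions)


def frequent_itemsets(transactions, minsup, must_contain=None):
    """Enumerate itemsets with support > minsup (strict, as on the slide).

    If must_contain is given, only itemsets that include that item are
    returned (the slide requires one of the conditions to be DM).
    """
    all_items = sorted(set().union(*transactions.values()))
    result = {}
    for size in range(1, len(all_items) + 1):
        any_frequent = False
        for candidate in combinations(all_items, size):
            count = support(candidate, transactions)
            if count > minsup:
                any_frequent = True
                if must_contain is None or must_contain in candidate:
                    result[frozenset(candidate)] = count
        if not any_frequent:
            # Every subset of a frequent itemset is frequent, so once no
            # itemset of this size is frequent, no larger one can be.
            break
    return result


if __name__ == "__main__":
    print(support({"TB", "DLM", "ND"}, PATIENTS))
    for itemset, count in frequent_itemsets(PATIENTS, 2, "DM").items():
        print(sorted(itemset), count)
```

The early exit relies on the same property the lattice figure depicts: a superset of an infrequent itemset cannot be frequent. Practical miners such as Apriori use that property to prune individual candidates instead of enumerating every combination.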