SHARPn Secondary Data Use: Normalization, NLP, Phenotyping
CG Chute, SM Huff (co-PIs)

SHARPn Team
• Agilex Technologies
• Harvard Univ.
• CDISC (Clinical Data Interchange Standards Consortium)
• Intermountain Healthcare
• Mayo Clinic
• Centerphase Solutions
• Mirth Corporation, Inc.
• Deloitte
• MIT
• Group Health, Seattle
• MITRE Corp.
• IBM Watson Research Labs
• Regenstrief Institute, Inc.
• University of Utah
• SUNY
• University of Pittsburgh
• University of Colorado

Themes & Projects
http://informatics.mayo.edu/sharp/index.php/SHARP_Project_Wiki:Current_events

Data Normalization Highlights
H Liu and team

Data Normalization
[Figure: Raw EMR Data passes through the Normalization Process (Tooling) to become Normalized EMR Data, guided by Normalization Targets: Information Models and Target Value Sets]

Normalization Targets
• Clinical Element Models
  – Based on Intermountain Healthcare/GE Healthcare's detailed clinical models
  – Future with CIMI (Clinical Information Modeling Initiative)
• Terminology/value sets associated with the models
  – Using standards where possible

Secondary Use Clinical Element Models
http://www.clinicalelement.com
[Figure: GenericStatement, GenericComponent, Core CEMs, SecondaryUse CEMs; Links: AdministrativeGender, … Severity, Status]
• Embracing the fact that some data may not be normalizable, and enabling both bottom-up and top-down approaches

Normalization Process
• Configuration of model (syntactic) and terminology (semantic) mapping
• UIMA pipeline to transform raw EMR data into normalized EMR data based on the mappings
[Figure: End-to-end data normalization (DN) framework]

NLP Highlights
GS Savova and team

Sign/Symptom CEM template: Alleviating_factor, associatedCode, Body_laterality, Body_location, Body_side, Conditional, Course, Duration, End_time, Exacerbating_factor, Generic, Negation_indicator, Relative_temporal_context, Severity, Start_time, Subject, Uncertainty_indicator

Disease/Disorder CEM template: Alleviating_factor, Associated_sign_or_symptom, associatedCode, Body_laterality, Body_location, Body_side, Conditional, Course, Duration, End_time, Exacerbating_factor, Generic, Negation_indicator, Relative_temporal_context, Severity, Start_time, Subject, Uncertainty_indicator

Medication CEM template: associatedCode, Change_status, Conditional, Dosage, Duration, End_date, Form, Frequency, Generic, Negation_indicator, Route, Start_date, Strength, Subject, Uncertainty_indicator

Procedure CEM template: associatedCode, Body_laterality, Body_location, Body_side, Conditional, Device, End_date, Generic, Method, Negation_indicator, Relative_temporal_context, Start_date, Subject, Uncertainty_indicator

Lab CEM template: Abnormal_interpretation, associatedCode, Conditional, Delta_flag, Estimated_flag, Generic, Lab_value, Negation_indicator, Ordinal_interpretation, Reference_range_narrative, Subject, Uncertainty_indicator

Anatomical Site CEM template: associatedCode, Body_laterality, Body_side, Conditional, Generic, Negation_indicator, Subject, Uncertainty_indicator

Unlabeled template: associatedCode, Body Location, Conditional, Generic, Negation_indicator, Severity, Subject, Uncertainty_indicator

Processing Clinical Notes
A 43-year-old woman was diagnosed with type 2 diabetes mellitus by her family physician 3 months before this presentation. Her initial blood glucose was 340 mg/dL. Glyburide 2.5 mg once daily was prescribed. Since then, self-monitoring of blood glucose (SMBG) showed blood glucose levels of 250-270 mg/dL. She was referred to an endocrinologist for further evaluation. On examination, she was normotensive and not acutely ill. Her body mass index (BMI) was 18.7 kg/m² following a recent 10 lb weight loss. Her thyroid was symmetrically enlarged and ankle reflexes absent. Her blood glucose was 272 mg/dL, and her hemoglobin A1c (HbA1c) was 10.3%. A lipid profile showed a total cholesterol of 261 mg/dL, triglyceride level of 321 mg/dL, HDL level of 48 mg/dL, and an LDL of 150 mg/dL. Thyroid function was normal. Urinalysis showed trace ketones.

She adhered to a regular exercise program and vitamin regimen, smoked 2 packs of cigarettes daily for the past 25 years, and limited her alcohol intake to 1 drink daily. Her mother's brother was diabetic.
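Processing a note like this one means finding concept mentions and deciding attributes such as negation and subject for each. The Python sketch below is a toy illustration of that idea, not cTAKES: it assumes a hypothetical four-entry lexicon (reusing the codes from this deck's CEM output), a naive sentence splitter, a NegEx-style window check for negation triggers, and a short list of family-member words for the subject attribute. The names `LEXICON`, `CemRecord`, and `extract` are illustrative only.

```python
import re
from dataclasses import dataclass

# Hypothetical toy lexicon: surface form -> (CEM type, code). Codes are copied
# from the deck's CEM output; real dictionary lookup is far more extensive.
LEXICON = {
    "diabetes mellitus": ("Disorder", "73211009"),
    "diabetic": ("Disorder", "73211009"),
    "glyburide": ("Medication", "315989"),
    "smoked": ("TobaccoUse", "365981007"),
}
NEGATION_TRIGGERS = {"no", "not", "denies", "without"}
# "family" is deliberately absent so "family physician" does not flip the subject.
FAMILY_TERMS = {"mother", "mother's", "father", "father's", "brother", "sister", "uncle", "aunt"}


@dataclass
class CemRecord:
    cem_type: str
    text: str
    code: str
    subject: str
    negated: bool


def split_sentences(note: str) -> list[str]:
    # Naive boundary detection: split after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", note.strip()) if s]


def extract(note: str, window: int = 5) -> list[CemRecord]:
    records = []
    for sentence in split_sentences(note):
        lower = sentence.lower()
        words = re.findall(r"[a-z']+", lower)
        # Crude subject attribute: any family-member word in the sentence.
        subject = "family member" if FAMILY_TERMS & set(words) else "patient"
        for term, (cem_type, code) in LEXICON.items():
            match = re.search(r"\b" + re.escape(term) + r"\b", lower)
            if not match:
                continue
            # NegEx-style check: a trigger among the last `window` words before the mention.
            preceding = re.findall(r"[a-z']+", lower[:match.start()])[-window:]
            negated = any(w in NEGATION_TRIGGERS for w in preceding)
            mention = sentence[match.start():match.end()]
            records.append(CemRecord(cem_type, mention, code, subject, negated))
    return records
```

Applied to the note above, this sketch should return four records: diabetes mellitus, Glyburide, and smoked for the patient, and diabetic for a family member, none negated. A fixed word window is only a rough stand-in for a real assertion module.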
Clinical Element Model

Disorder CEM
  text: diabetes mellitus
  code: 73211009
  subject: patient
  relative temporal context: 3 months ago
  negation indicator: not negated

Medication CEM
  text: Glyburide
  code: 315989
  subject: patient
  frequency: once daily
  negation indicator: not negated
  strength: 2.5 mg

Tobacco Use CEM
  text: smoking
  code: 365981007
  subject: patient
  relative temporal context: 25 years
  negation indicator: not negated

Disorder CEM
  text: diabetes mellitus
  code: 73211009
  subject: family member
  relative temporal context:
  negation indicator: not negated
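The Medication CEM above pairs each attribute with a value and a standard code. The Python sketch below illustrates the two mapping steps named under Normalization Process, syntactic (renaming raw fields to model attributes) and semantic (translating a local code to a standard one), on a single raw medication row. The raw field names, the local code `LOCAL-GLY-25`, and both mapping tables are hypothetical; the deck describes a configured UIMA pipeline for this work, and the sketch only shows the idea.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MedicationCem:
    # Attribute names loosely follow the Medication CEM output shown above.
    text: str
    code: Optional[str]  # None when no standard code could be mapped
    subject: str
    frequency: str
    strength: str
    negated: bool = False


# Hypothetical syntactic mapping: raw EMR field -> CEM attribute.
FIELD_MAP = {"drug_name": "text", "sig_freq": "frequency", "dose": "strength"}

# Hypothetical semantic mapping: local formulary code -> standard code.
# "LOCAL-GLY-25" is an invented identifier used only for illustration.
CODE_MAP = {"LOCAL-GLY-25": "315989"}


def normalize_medication(raw: dict) -> MedicationCem:
    # Syntactic step: rename raw fields to CEM attributes (missing fields -> "").
    attrs = {cem_attr: raw.get(raw_field, "") for raw_field, cem_attr in FIELD_MAP.items()}
    # Semantic step: translate the local code. An unmapped code yields None
    # instead of dropping the record, so non-normalizable data is kept.
    code = CODE_MAP.get(raw.get("local_code", ""))
    return MedicationCem(subject="patient", code=code, **attrs)
```

Returning `None` for an unmapped code, rather than raising, keeps the raw value available for later review, in line with the deck's point that some data may not be normalizable.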
Apache cTAKES: Components
• Sentence boundary detection (Apache OpenNLP technology)
• Tokenization (rule-based)
• Morphologic normalization (NLM's LVG)
• POS tagging (Apache OpenNLP technology)
• Shallow parsing (Apache OpenNLP technology)
• Named Entity Recognition
  – Dictionary mapping (lookup algorithm)
  – Types: diseases/disorders, signs/symptoms, anatomical sites, procedures, medications
• Assertion discovery (attributes for negation, uncertainty, conditional, generic)
• Dependency parser
• Constituency parser
• Semantic Role Labeling
• Relation Extraction
• Co-reference module
• Drug Profile module
• Smoking status classifier
• Clinical Element Model (CEM) normalization module

High-Throughput Clinical Phenotyping Highlights
J Pathak and team

EHR-driven Phenotyping Algorithms – The Process
[Figure: Data (NLP, SQL), Transform, Mappings, Phenotype Algorithm, Rules, Evaluation, Visualization]
High-Throughput Phenotyping from EHRs [eMERGE Network]

Algorithm Development Process – Modified
• Standardized and structured representation of phenotype definition criteria
  – Use the NQF Quality Data Model (QDM)
• Conversion of structured phenotype criteria into executable queries
  – Use JBoss® Drools (DRLs)
• Standardized representation of clinical data
  – Create new and re-use existing clinical element models (CEMs)
[Figure: Semi-Automatic Execution over Data (NLP, SQL), Transform, Mappings, Phenotype Algorithm, Rules, Evaluation, Visualization]
[Welch et al., JBI 2012; 45(4):763-71]

DROOLS [Li et al., AMIA 2012]

http://phenotypeportal.org [Endle et al., AMIA 2012]
• NLM-funded Library of Computable Phenotyping Algorithms

Clinical Validation Highlights
KR Bailey and team

Validation Highlights
• Enumeration of sources by datatype:
  a) Diagnoses
  b) Laboratory values
  c) Vital signs (Ht, Wt, BMI, SBP, DBP, HR)
  d) Medications
• Characterize sources, availability, quality
• Compare sources and data
  a) Within institution
  b) Across institutions

URLs
http://sourceforge.net/projects/sharpn/
http://ctakes.apache.org
http://phenotypeportal.org/

SHARPn: Secondary Use of EHR Data
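The validation theme of characterizing source availability and comparing sources can be made concrete with two common measures: the fraction of patients for whom a source records any value, and Cohen's kappa for agreement between two sources on the same yes/no fact (for example, a diabetes diagnosis in billing codes versus the problem list). This Python sketch is an illustration under those assumptions; the deck does not say which agreement statistic the validation team used, and the example data are invented.

```python
import math
from typing import Optional, Sequence


def availability(values: Sequence[Optional[bool]]) -> float:
    """Fraction of patients for whom the source records any value (not None)."""
    if not values:
        raise ValueError("no patients")
    return sum(v is not None for v in values) / len(values)


def cohens_kappa(a: Sequence[Optional[bool]], b: Sequence[Optional[bool]]) -> float:
    """Cohen's kappa between two sources, over patients present in both."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        raise ValueError("no patients with values in both sources")
    n = len(pairs)
    observed = sum(x == y for x, y in pairs) / n
    p_a = sum(x for x, _ in pairs) / n  # share of "yes" in source a
    p_b = sum(y for _, y in pairs) / n  # share of "yes" in source b
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1:
        # Both sources are constant and identical: kappa is undefined.
        return math.nan
    return (observed - expected) / (1 - expected)
```

With `billing = [True, True, False, False, None]` and `problem_list = [True, False, False, False, True]`, availability is 0.8 and 1.0, and kappa over the four patients present in both sources is 0.5. Restricting kappa to patients present in both sources keeps availability and agreement as separate questions.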