Improving Access to Clinical Data Locked in Narrative Reports: An Informatics Approach
Wendy W. Chapman, PhD
Division of Biomedical Informatics
University of California, San Diego

Overview
• The promise of natural language processing (NLP)
• Challenges of developing NLP in the clinical domain
• Challenges in applying NLP in the clinical domain
• Improving access to text through NLP resources

The promise of NLP
• Vast & growing amounts of clinical text
• Rich in information for
  – Patient care
  – Evaluation/QC
  – Comparative effectiveness research
  – Epidemiology
• Locked in free text
• Natural language processing can help unlock that information
• Encouraging NLP success stories

The promise of NLP: Murff (2011), JAMA
NLP captures:
• Renal failure
• Pulmonary embolism
• Deep vein thrombosis
• Sepsis
• Pneumonia
• Myocardial infarction
Results: "... higher sensitivity and lower specificity compared with patient safety indicators based on discharge coding."
"The promise of natural language processing ... may be closer than ever."

Other promising NLP accomplishments ...
• Smoking status (Savova, Hazlehurst)
• Peripheral arterial disease (Pathak)
• Medication extraction (Uzuner)
• Pneumonia (Chapman)
• Colonoscopy quality metrics (Harkema)
• Breast cancer recurrence (Carrell)
• Colorectal cancer screening behavior (Denny)
• Rheumatoid arthritis (Zeng)

Challenges of developing NLP in the clinical domain

NLP Success
"Fresh off its butt-kicking performance on Jeopardy!, IBM's supercomputer 'Watson' has enrolled in medical school at Columbia University." (New York Daily News, February 18, 2011)

Clinical NLP since the 1960s
Why has clinical NLP had little impact on clinical care?
Barriers to Development
• Sharing clinical data is difficult
  – Have not had shared datasets for development and evaluation
  – Modules trained on general English are not sufficient
• Insufficient common conventions and standards for annotations
  – Data sets are unique to a lab
  – Not easily interchangeable
• Limited collaboration
  – Clinical NLP applications are silos and black boxes
  – Have not had open-source applications
• Reproducibility is a formidable challenge
  – Open-source release is not always sufficient
  – Software engineering quality is not always great
  – Mechanisms for reproducing results are sparse

Challenges in applying NLP in the clinical domain

Security & Privacy Concerns
• Clinical texts contain many patient identifiers
  – 18 HIPAA identifiers, such as names and addresses
  – Items not regulated by HIPAA, such as "tight end for the Steelers"
• Unique cases
  – A woman in her 50s who is pregnant
• Sensitive information
  – HIV status
• Institutions are reluctant to share data

Lack of user-centered development and scalability
• Perceived cost of applying NLP outweighs the perceived benefit (Len D'Avolio)

Improving access to text through NLP resources

Access to Resources for Developing NLP Algorithms
[Diagram: NLP experts; clinicians & researchers; informaticists; patients]

Resources for NLP Developers
[Diagram: knowledge bases, clinical data, annotations, annotation environment, evaluation]
Knowledge bases:
• Domain ontology
• Schema ontology: linguistic representation of clinical elements
• Modifier ontology: modifiers of clinical elements
Example: "Patient denies a family history of colon cancer"
  Disease: colon cancer
  Experiencer: family
  Negation: yes
  Historical: yes
(Melissa Tharp)

[Figures: Schema Ontology: Elements; Schema Ontology: Relationships; Modifier Ontology]
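The modifier values above (experiencer, negation, historical status) are what a ConText-style algorithm assigns to a clinical element. As a rough illustration only, here is a minimal Python sketch under stated assumptions: the trigger lists, the `Annotation` class, and the `annotate`/`has_trigger` functions are hypothetical and are not the pyConText API, and a trigger counts only when it appears to the left of the target term in the same sentence.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical trigger lists for illustration only; a real modifier ontology
# holds many more expressions for each modifier value.
NEGATION_TRIGGERS = ["no", "denies", "no evidence of", "negative for"]
FAMILY_TRIGGERS = ["family history of", "mother", "father", "sister", "brother"]
HISTORICAL_TRIGGERS = ["history of", "previous", "prior"]


@dataclass
class Annotation:
    disease: str
    experiencer: str  # "patient" (default) or "family"
    negated: bool
    historical: bool


def has_trigger(text: str, triggers: list[str]) -> bool:
    """True if any trigger occurs in `text` as whole words ("no" won't match "know")."""
    return any(re.search(rf"\b{re.escape(t)}\b", text) for t in triggers)


def annotate(sentence: str, disease: str) -> Annotation | None:
    """Assign modifier values to the first mention of `disease` in `sentence`.

    Simplification: a trigger counts only if it appears to the left of the
    target, and its scope runs all the way to the target (no termination terms).
    """
    lowered = sentence.lower()
    pos = lowered.find(disease.lower())
    if pos < 0:
        return None  # target concept not mentioned in this sentence
    before = lowered[:pos]
    return Annotation(
        disease=disease,
        experiencer="family" if has_trigger(before, FAMILY_TRIGGERS) else "patient",
        negated=has_trigger(before, NEGATION_TRIGGERS),
        historical=has_trigger(before, HISTORICAL_TRIGGERS),
    )


for text, target in [
    ("Patient denies a family history of colon cancer", "colon cancer"),
    ("Chest radiograph confirms pneumonia", "pneumonia"),
    ("No evidence of pneumonia", "pneumonia"),
]:
    print(annotate(text, target))
# Annotation(disease='colon cancer', experiencer='family', negated=True, historical=True)
# Annotation(disease='pneumonia', experiencer='patient', negated=False, historical=False)
# Annotation(disease='pneumonia', experiencer='patient', negated=True, historical=False)
```

Looking only to the left of the target, with no termination terms, is the crudest possible scope rule. The published ConText algorithm lets some triggers look forward and others backward and ends their scope at termination terms, which is one reason a modifier ontology needs to store more than trigger strings.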
Modifiers are important for interpreting text
• Chest radiograph confirms pneumonia
• Family history of pneumonia
• No evidence of pneumonia
Allowable modifiers for each clinical element:
• Affirmation/negation
• Uncertainty
• Experiencer
• Historical/recent
• Severity
The modifier ontology records types of modifiers, linguistic expressions, actions, and translations.

Schema Ontology
• Imports the modifier ontology
• Medications
  – Type
  – Dose
  – Frequency
  – Route
• Diagnosis
  – Negation
  – Uncertainty
  – Severity
  – History
  – Experiencer
• Consistent with other models: clinical element models, cTAKES type system, common model

Domain Ontology for NLP
• Instance of the schema ontology
• Clinical elements from a particular domain
• Synonyms, misspellings, regular expressions

Resources for NLP Experts: Clinical Data
Lack of shareable data is a barrier
• University of Pittsburgh repository
  – 111,045 reports of 9 types
  – 600 users
  – No longer available
• MTSamples
  – 2,300 reports from MTSamples.com
  – De-identified

Resources for NLP Experts: Annotations
ShARe: Sharing Annotated Resources (AMIA NLP Working Group; 5R01GM090187: Chapman, Savova, Elhadad)
• 600 clinical notes from the MIMIC II repository
• Annotate disorders and modifiers
  – Anatomic location
• Map to SNOMED codes
• CLEF Shared Task 2013 and 2014
  – https://sites.google.com/site/shareclefehealth/
B South, D Mowery, S Velupillai, L Christensen, S Meystre

Resources for NLP Experts: Annotation Environment
Distributed annotation in a secure environment
[Diagram: Annotator Registry, Annotation Admin, and eHOST; web application, iDASH cloud, client app]
VA, SHARP, and NIGMS: S DuVall, B South, B Adams, G Savova, N Elhadad, H Hochheiser

Annotator Registry
Annotators
• Enlist for annotation
• Certify for annotation tasks
  – Personal health information
  – Part-of-speech tagging
  – UMLS mapping
• Set pay rate
NLP Admins
• Search for annotators
http://nlp-ecosystem.ucsd.edu/annotators
1. Assign annotators to a task
2. Create a schema
3. Assign users and set time expectations
4. Keep track of progress

Resources for NLP Experts: Evaluation
• Compare output of NLP annotators
  – NLP system vs. human annotation
• View annotations
• Calculate outcome measures
• Drill down to all levels of annotation
• Perform error analysis
[Screenshot panels: Report List; Document & Annotations; Select Classifications to View; Outcome Measures for Selected Annotations; Attributes for Selected Annotation; Relationships for Selected Annotation]
VA and ONC SHARP: Christensen, Murphy, Frabetti, Rodriguez, Savova

Access to Information in Text
[Figure: a user's concepts (cough, dyspnea, infiltrate on CXR, wheezing, fever, cervical lymphadenopathy) mapped to controlled-vocabulary terms (e.g., cough: dry cough, productive cough, hacking cough, bloody cough) and to attribute-values (e.g., Temp 38.0C, low-grade temperature)]

Efficient Access to Information in the Patient Chart
Components: Knowledge Author; Schema Builder; Chart Review Interface
Example: "Family history of colon cancer"
  Disease: colon cancer
  Experiencer: family
  Negation: no
  Historical: yes
[Figure: domain ontology entry xRayPneumothorax, with preferred label "x-ray pneumothorax"@en; alternative labels "air in the pleural space on x-ray"@en and "xray pneumothorax"@en; broader: respiratorySyndrome; data categories "symptom" and "chest_radiography"; isAssociatedWithDisease: pneumothoraxDX; modified: 2011-03-31; definition: "Air between the lung and the chest wall seen on chest roentgenogram"; and the NLP schema derived from it]

Knowledge Author
• Front-end interface for users
• Back end
  – Schema ontology
  – Modifier ontology
• Output
  – Domain ontology
  – Schema for NLP system
B Scuba, F Fana, Liqin Wang, Mingyuan Zhang, Y Liu, M Kong, F Drews

[Figure: Knowledge Author screenshots with example concepts ("African American", "Adult", "Ibuprofen", "Ibuprofen p.o.", "No family history of colon cancer"), linguistic modifiers, and a call to the Voogo synonym tool]
Questions | Discussion wwchapman@ucsd.edu

Access to Information in the Patient Chart: Chart Review Interfaces
(Knowledge Author → Chart Review Interfaces)
• Navigate patient data more efficiently
• Point the chart reviewer to ambiguous and contradictory information
  – Reduce bias

Access to Information in the Patient Chart: Interactive Search and Review
Interactive Search and Review of Clinical Records with Multi-layered Semantic Annotation (NLM 1R01LM010964-01; Chapman, Wiebe, Hwa)
• EMR → NLP (subjects, diagnoses, findings, anatomical locations) → visualization
• Views at the population, patient, document, and expression levels (Population View, Patient View)
• User identifies patients meeting criteria
• User feedback improves models

Access to NLP Tools and Interfaces

Access to NLP Tools
• v3NLP (Zeng, Divita)
• pyConText (Chapman)
• RapTat (Matheny, Gobbell)
[Diagram: an NLP platform linking the NLP Workbench, Classifier Workbench, and Visualization Workbench; annotations and knowledge base (KB) shared so tools can be mixed and matched]
• Users can interact with and customize the tools

TextVect (Classifier Workbench)
• User selects NLP features: n-grams, UMLS concepts, part-of-speech tags, negation
• User selects a representation: binary, count, tf-idf
[Diagram: training set → NLP tools → feature selection algorithms → labeled feature matrix (class labels Yes/No/No, binary feature values)]
A Kumar, C Elkan, S Abdelrahman. https://github.com/abhishek-kumar/TextVect

Evaluation of TextVect (micro F-measure)
• CMC dataset: average 0.77; baseline 0.71; best 0.89; TextVect 0.82
• i2b2 dataset: average 0.91; best 0.97; TextVect 0.95

Access to Visualizations of NLP Output
[Diagram: NLP system annotations → NLP Workbench, Classifier Workbench, Visualization Workbench; Timeline View]
Jianlin Shi, T Wang, E Shenvi, R El-Kareh, M Tharp, R Reeves

Access to Understanding

Access to Understanding Clinical Notes
Example note:
Chief Complaint: Hypoxic respiratory failure
Major Surgical or Invasive Procedure: Intubation.
History of Present Illness: 81 yo man w/ho CAD, COP, PVD, AAA xfered from OSH for mngmt resp failure. Pt was found @ home by EMS followign c/o [**05-29**] "crushing", nonradiating SSCP. Pt diaphoretic during transport. Sat 84->94% on NRB. Given ASA, NT, nebs en route to OSH where started on BIPAP and eventually intubated. BP on arrival 240/140 so started on NTG drip titrated up until BP fell to 90/58 resulting in IVF, dopamine. Given 80 IV lasix. First set enzymes negative and BNP 1700. Pt xferred for further management.
Aids to understanding:
• Definitions
• Medical terms
• Acronyms/abbreviations
• Pictures
• Internet sites
• Biomedical literature
• Normal range checking

Conclusion
• Collaborations for NLP improve our ability to
  – Create potentially useful resources and tools
• Provide access to
  – Resources for NLP development
  – Information in reports
  – NLP and visualization tools
• The major challenge is applying NLP
• Future needs
  – More integration with other tools
  – More coordination

Acknowledgments
BLU Lab: Lee Christensen, Melissa Tharp, Mike Conway, Danielle Mowery, Bill Scuba, Milan Kovacevich, Dieter Hillert, Samir Abdelrahman, Leah Willis, Bob Angell
Collaborators: Harry Hochheiser, Jan Wiebe, Rebecca Hwa, Guergana Savova, Noemie Elhadad, Michael Matheny, Rob El-Kareh, Ruth Reeves, Qing Zeng, Guy Divita, Frank Drews, Sumithra Velupillai, Maria Kvist, Maria Skeppstedt, Aron Henriksson, Brian Chapman, David Carrell, Sascha Dublin, Zia Agha, Stephane Meystre, Scott DuVall, Jianlin Shi

Questions | Discussion
wendy.chapman@utah.edu