Clinical Natural Language Processing

Computational Intelligence in Biomedical and Health Care Informatics
HCA 590 (Topics in Health Sciences)
Rohit Kate

Reading
• Paper: What can natural language processing do for clinical decision support? Dina Demner-Fushman, Wendy Chapman, Clement McDonald. Journal of Biomedical Informatics 42 (2009) 760-772
• Paper: 2010 i2b2/VA Challenge on Concepts, Assertions, and Relations in Clinical Text. Uzuner Ö., South B., Shen S., DuVall S. Journal of the American Medical Informatics Association 2011;18(5):552-556

Clinical Decision Support Systems
• A clinical decision support (CDS) system is any computer program designed to help healthcare professionals make clinical decisions or to present them with patient-specific assessments and recommendations
  – Suggest diagnoses and medications
  – Trigger reminders
  – Flag abnormal values
  – Alert about drug interactions
  – Remind the user of overlooked diagnoses
  – Provide advice based on patient-specific data

CDS Systems and Narrative Text
• Such computer-based support is much more effective if the computer system has access to electronic medical records (EMRs) and is able to process them
• A major portion of patient records, including radiology reports, operative notes, and discharge summaries, is recorded as narrative text (dictated, transcribed, or directly entered) in a natural language such as English
• Facts that should activate a CDS system are often found only in free text

CDS Systems and NLP
• Much of the data that could support CDS is textual and therefore cannot be leveraged by a CDS system without natural language processing (NLP)
• NLP used for CDS needs to be:
  – Reliable
  – High-quality
  – Modular and flexible
  – Fast

Active and Passive CDS with NLP
• Active: the system leverages available information and pushes patient-specific information to the user
• Passive: the users themselves seek the support
• Users: depending on the application, users could be patients, researchers, administrators, students, and coders, besides clinicians
• Besides free text in clinical records, NLP for CDS could also process biomedical literature, web pages, etc.
[Figure from the paper: active and passive CDS with NLP]

Example of an Idealized NLP-CDS System
• It would monitor the EMR for insertions of new data
• When free text is entered, for example “Right lower lung opacity, which could be contusion or pneumonia”, the NLP system kicks in to analyze it
• The NLP system extracts the information that the disorder could be “pulmonary contusion” or “pneumonia” and passes it to the CDS system
• The CDS system looks up decision rules for suspected pneumonia, retrieves the results of the blood test, and evaluates the white blood cell count
• If the count is high, the system suggests in a reminder message (possibly in natural language) that the patient is more likely to have pneumonia than pulmonary contusion, and explains why
• Idealizing further, the NLP system may look up medical literature, solicit more information, and present natural language summaries
  – For example, present summaries of the best approaches to manage both disorders
  – May look up medical publications, guidelines, and actionable recommendations available in free text
• This idealized system will have to deal with all the challenges of clinical NLP

Challenges of Clinical Language Processing
• Good performance
  – Performance should be good enough for clinical applications and should not be significantly worse than that of medical experts
  – The system should have the flexibility to trade off precision and recall
• Recovery of implicit information
  – The NLP system should contain enough medical knowledge to make appropriate inferences
  – “rupture” means “rupture of membranes”
  – “patchy opacity” and “focal infiltrate” may indicate “pneumonia”
• Interoperability: the NLP system should integrate seamlessly into clinical information systems
  – Many different interchange formats (e.g. HL7)
  – Different types of reports with different formats; text may contain tables, structured fields, etc.
  – Output of the NLP system should be mapped to an appropriate controlled vocabulary, e.g. UMLS, SNOMED, or ICD
• Training set availability
  – Patient records are confidential, and using them requires approval of an institutional review board (IRB)
  – There are methods to de-identify records by removing names etc., but identifying names etc. in text is not easy
  – These issues do not arise when processing literature
• Limited availability in electronic form
  – Many clinical documents are still written on paper
  – Optical character recognition (OCR) is not accurate, especially on physicians’ notes
• Expressiveness
  – More than 200 different expressions for severity information: faint, mild, borderline, 3rd degree, mild to moderate, etc.
  – Complex modifiers: the text “no improvement in pneumonia” will match a query for “improvement in pneumonia” even though it states the opposite
• Many abbreviations, which can be ambiguous
  – pvc may mean pulmonary vascular congestion in a chest X-ray report and premature ventricular complexes in an electrocardiogram report
• Compactness of text
  – Very compact, containing many abbreviations
  – Sentence boundaries are poorly delineated
  – Example: Admit 10/23 71 yo woman h/o DM, HTN, Dilated CM/CHF, Afib s/p embolic event, chronic diarrhae, admitted with SOB.
• Rare events
  – Medical errors and adverse events are not reported frequently, so it is difficult to train a system to detect them

Shared Tasks in Clinical Language Processing
• Evaluation
  – Gold-standard data is difficult to obtain because annotation is time-consuming for medical experts
  – Evaluation competitions, or shared tasks, are very useful: they help compare different systems on the same data
• i2b2 shared tasks 2008-2012:
  – https://www.i2b2.org/NLP/Obesity/
  – https://www.i2b2.org/NLP/Medication/
  – https://www.i2b2.org/NLP/Relations/
  – https://www.i2b2.org/NLP/Coreference/
  – https://www.i2b2.org/NLP/TemporalRelations/
• ShARe/CLEF eHealth 2013-2014:
  – https://sites.google.com/site/shareclefehealth/
  – http://clefehealth2014.dcu.ie/
• SemEval 2014 Task 7, Analysis of Clinical Text:
  – http://alt.qcri.org/semeval2014/task7/
• TREC Medical Records task

i2b2 2010: Concepts
• Concepts:
  – Medical problems
  – Treatments
  – Tests
• System input: raw text of medical records
• System output: a plain text file that contains entries of the form
  c="concept text" offset || t="concept type"
  (offset indicates line and token numbers of the document)
• For example:
  – c="cancer" 5:8 5:8 || t="problem"
  – c="chemotherapy" 5:4 5:4 || t="treatment"
  – c="chest x-ray" 6:12 6:13 || t="test"

i2b2 2010: Assertions
• Assertions (attributes of medical problems):
  – Present
  – Absent
  – Possible
  – Conditional
  – Hypothetical
  – Not associated with the patient
• System input: raw text of medical records and given concepts
• System output: assertions on all problem concepts (and only problem concepts)
  c="concept text" offset || t="concept type" || a="assertion value"
• For example:
  – c="hypertension" 5:4 5:4 || t="problem" || a="absent"
  – c="diabetes" 6:12 6:12 || t="problem" || a="possible"

i2b2 2010: Relations
• Extract the relations that exist between the concepts:
  – medical problems and treatments: 6 possible relations
  – medical problems and tests: 3 possible relations
  – medical problems and other medical problems: 2 possible relations
• System input: raw text of medical records with given concepts and assertions (optional)
• System output: relations between pairs of concepts in the following format:
  – c="a cardiac catheterization" 9:12 9:14 || r="TeCP" || c="chest pain" 9:5 9:6
  – c="a cardiac catheterization" 9:12 9:14 || r="TeRP" || c="an occluded right coronary artery" 9:23 9:27
  – c="a cardiac catheterization" 9:12 9:14 || r="TeRP" || c="a 40-50% proximal stenosis" 9:29 9:32

i2b2 2010: Data
• 349 training reports
  – 97 discharge summaries from Partners
  – 73 discharge summaries from Beth-Israel Deaconess Medical Center
  – 98 discharge summaries from University of Pittsburgh Medical Center
  – 81 progress notes from University of Pittsburgh Medical Center
• 477 test reports
  – 133 discharge summaries from Partners
  – 123 discharge summaries from Beth-Israel Deaconess Medical Center
  – 102 discharge summaries from University of Pittsburgh Medical Center
  – 119 progress notes from University of Pittsburgh Medical Center

i2b2 2010: Best Results
• 41 teams participated in total (22 for concepts, 21 for assertions, and 16 for relations)
• Best F-measures (harmonic mean of precision and recall):
  – Concepts: 85% F-measure
  – Assertions: 92.6% F-measure
  – Relations: 73.7% F-measure
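
To make the concept output format and the F-measure concrete, here is a minimal Python sketch. It parses entries of the form shown for the i2b2 2010 concept task and scores a system against gold annotations using exact matching on offsets and concept type. The regular expression, the function names `parse_concepts` and `precision_recall_f`, and the sample system output are all assumptions made for illustration; the official challenge scorer may define matches differently.

```python
import re

# Assumed pattern for one concept entry, following the slide's description:
#   c="concept text" line:token line:token || t="concept type"
CONCEPT_RE = re.compile(
    r'c="(?P<text>.*?)"\s+(?P<start>\d+:\d+)\s+(?P<end>\d+:\d+)'
    r'\s*\|\|\s*t="(?P<type>[^"]*)"'
)


def parse_concepts(lines):
    """Return a set of (start, end, type) tuples parsed from concept entries."""
    concepts = set()
    for line in lines:
        match = CONCEPT_RE.search(line)
        if match:
            concepts.add((match["start"], match["end"], match["type"]))
    return concepts


def precision_recall_f(gold, predicted):
    """Exact-match precision, recall, and F-measure (their harmonic mean)."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Gold entries taken from the concept examples above.
gold_lines = [
    'c="cancer" 5:8 5:8 || t="problem"',
    'c="chemotherapy" 5:4 5:4 || t="treatment"',
    'c="chest x-ray" 6:12 6:13 || t="test"',
]
# Hypothetical system output: one correct entry, one with the wrong type,
# and the test concept missed entirely.
system_lines = [
    'c="cancer" 5:8 5:8 || t="problem"',
    'c="chemotherapy" 5:4 5:4 || t="problem"',
]

gold = parse_concepts(gold_lines)
predicted = parse_concepts(system_lines)
precision, recall, f_measure = precision_recall_f(gold, predicted)
print(f"P={precision:.2f} R={recall:.2f} F={f_measure:.2f}")
# prints: P=0.50 R=0.33 F=0.40
```

Here one of two predictions is correct (precision 0.50) and one of three gold concepts is found (recall 0.33), so the harmonic mean is 0.40. This is lower than the arithmetic mean of the two, which is why the F-measure penalizes systems that trade one heavily for the other.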
