Challenges in Understanding Clinical Notes: Why NLP Engines Fall Short and Where Background Knowledge Can Help
Sujan Perera, Amit Sheth, Krishnaprasad Thirunarayan
Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis), Wright State University
Suhas Nair, Neil Shah
ezDI LLC

Why is it necessary to understand clinical notes?
• 80% of patient data is unstructured [1]
• Structured data is incomplete and inaccurate [2, 3]
• Key indicators for decision-making reside in patient notes
  • Facilities: "Holter monitor was ordered by Lisa. She failed to get this because she did not have transportation."
  • Non-compliance: "Atrial fibrillation with poorly controlled ventricular rate due to noncompliance."
  • Financial status: "The patient mentioned that Bystolic is expensive and cannot afford it now."
  • Family history: "his father is hypertensive"
• ICD-10 adoption: coding requires understanding the relationships between concepts
  E08 - Diabetes mellitus due to underlying condition
    E08.0 - Diabetes mellitus due to underlying condition with hyperosmolarity
      E08.00 - without nonketotic hyperglycemic-hyperosmolar coma (NKHHC)
      E08.01 - with coma
    E08.1 - Diabetes mellitus due to underlying condition with ketoacidosis
      E08.10 - without coma
      E08.11 - with coma
• The underlying condition can be congenital rubella, Cushing's syndrome, cystic fibrosis, malignant neoplasm, malnutrition, or pancreatitis

[1] goo.gl/abqYYn
[2] Strengths and Limitations of CMS Administrative Data in Research
[3] Comparison of clinical and administrative data sources for hospital coronary artery bypass graft surgery report cards

[Figure: Patient Data Distribution, unstructured vs. structured data]

Understanding Clinical Notes
• What conditions does the patient have?
• What symptoms does the patient have?
• What medications, at what dosages, is he/she taking?
• What are the relationships between entities?
• What is the patient's medical history?
• What is his/her family history?
• What is the patient's behavior?
• What are his/her diet and discharge instructions?

Understanding Clinical Notes - Tasks
• Entity identification
  • "she does not state that the reason for her stopping after two blocks is shortness of breath or chest discomfort."
• Annotation with a standard vocabulary
  • Shortness of breath: UMLS concept C0013404
  • Chest discomfort: UMLS concept C0235710
• Relationship identification
• Temporal information extraction
  • "if her symptoms do not change at all, she will go back on the lipitor after two weeks."
• Negation detection
  • "he has not required any nitroglycerin."
• Certainty detection
  • "She is not sure if she is just depressed or not."
• Conditioning detection
  • "if he experiences any chest pain, shortness of breath with exertion or dizziness or syncopal episodes to let us know."

Understanding Linguistic Constructs
• Rule-based algorithms are popular because of their
  • simplicity
  • low computational cost
• They maintain a dictionary of words that indicate a particular language construct
  • Negation: no, not, deny, cannot, don't, etc.
• Simple rules decide applicability
  • <negation phrase> * <entity>
  • <entity> * <negation phrase>
• These rules do not associate the cue with the correct phrase of the sentence, which leads to incorrect output
  • "he did not make the increase on his metoprolol. his weight has not changed and that his edema is primarily at the end of the day."
• Sometimes even associating the cue with the correct phrase is not enough
  • "I do not have an explanation for this dyspnea."
  • "there was no evidence of ischemia."
• The same problems affect other constructs (certainty and conditioning)

Leads to Conflicting Instances
• Failure to identify such linguistic constructs leads to conflicting instances
• Document 1:
  • Coronary artery disease is listed in the current diagnosis list
  • "Send for carotid duplex to rule out carotid artery stenosis given his risk factors and underlying coronary artery disease."
• Document 2:
  • "Extremities: Warm and dry. No clubbing or cyanosis.
    No lower extremity edema."
  • "I have advised the patient on the side effect of potential lower extremity edema."
• Document 3:
  • "He is not having any symptoms of chest pain or exertional syncope or dizziness."
  • "I advised him that if he experiences chest pain, shortness of breath with exertion or dizziness or syncopal episodes to let us know and we can do appropriate workup."
• 620 conflicting instances within 3172 documents

Solution
• Our method attempts to resolve the conflicts by examining other observations in the patient's notes, leveraging coded domain knowledge
[Figure: Atrial Fibrillation linked to the symptom Syncope (Is_symptom_of) and to the medications Warfarin, Atenolol, and Aspirin (Is_medication_for)]
• We used only medication information because extraction algorithms perform well on medications, thanks to their list structure

Knowledge Base
• Domains: Cardiology, Orthopedics, Oncology, Neurology, etc.
• Concepts: 1008161
  • Problems (diseases, symptoms): 125778
  • Procedures: 262360
  • Medicines: 298993
  • Medical devices: 33124
• Relationships: 77261
  • is treated with (disease -> medication): 41182
  • is relevant procedure (procedure -> disease): 3352
  • is symptom of (symptom -> disease): 8299
  • contraindicated drug (medication -> disease): 24428

Knowledge Base (cont.)
[Figure: medication hierarchy for High Cholesterol: Statin (Is_medication_for), with simvastatin, Zocor, Vytorin, pravastatin, and Pravachol linked by Type_of]
• We use the medication hierarchy to find the relationships

Evaluation
• 25 documents
• 32 conflicting instances

                    Predicted Positive   Predicted Negative
  Actual Positive           18                    6
  Actual Negative            3                    5

$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} = \frac{18 + 5}{32} = 71.875\%$

Observations and Insights
• False negatives occur for common symptoms
  • headache, obesity, shortness of breath
  • Doctors may not prescribe medications for these symptoms
• False positives are driven by common medications
  • Aspirin
• Conflicts on major conditions can be resolved accurately
  • Coronary artery disease, atrial fibrillation, peripheral vascular disease, ischemia, cardiomyopathy, etc.
  • Patients are expected to take medications for these conditions
• Evidence should be ranked
  • Metoprolol is stronger evidence for hypertension than aspirin
  • Insulin is strong evidence for diabetes
• A more sophisticated evidence aggregation method should be used
  • Rule-based
  • Probabilistic

Beyond Conflicts
• Populate relationships among the entities
  • Which medications are associated with which condition
• Derive implicit information in the patient notes
  • Patient notes can be incomplete
  • Domain experts who read a note can understand beyond what is written there
  • These insights are important for prediction algorithms
  • Knowledge-driven inferencing can be used to fill this gap

Thank You

Perera, Sujan, Amit Sheth, Krishnaprasad Thirunarayan, Suhas Nair, and Neil Shah. "Challenges in understanding clinical notes: Why NLP engines fall short and where background knowledge can help." In Proceedings of the 2013 International Workshop on Data Management & Analytics for Healthcare, pp. 21-26. ACM, 2013.
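The ICD-10 slide argues that picking a code means relating several facts in a note: the underlying condition, the complication, and whether the patient was in a coma. The Python below is a minimal sketch of that point. It models only the seven E08 codes listed on the slide. The names `select_e08_code` and `ancestors` are hypothetical, checking hyperosmolarity before ketoacidosis is an arbitrary choice, and `ancestors` assumes, as holds in this excerpt, that each child code extends its parent's code string.

```python
# Hypothetical sketch: only the E08 codes shown on the ICD-10 slide are modeled.
E08_CODES = {
    "E08": "Diabetes mellitus due to underlying condition",
    "E08.0": "... with hyperosmolarity",
    "E08.00": "... with hyperosmolarity, without NKHHC",
    "E08.01": "... with hyperosmolarity, with coma",
    "E08.1": "... with ketoacidosis",
    "E08.10": "... with ketoacidosis, without coma",
    "E08.11": "... with ketoacidosis, with coma",
}


def select_e08_code(hyperosmolarity: bool, ketoacidosis: bool, coma: bool) -> str:
    """Return the most specific listed code supported by the documented facts."""
    # Assumption: hyperosmolarity is checked first; real coding rules are richer.
    if hyperosmolarity:
        return "E08.01" if coma else "E08.00"
    if ketoacidosis:
        return "E08.11" if coma else "E08.10"
    return "E08"  # no listed complication was documented


def ancestors(code: str) -> list:
    """Walk up the hierarchy by trimming one character (and any trailing dot)."""
    chain = []
    while code in E08_CODES:
        chain.append(code)
        code = code[:-1].rstrip(".")
    return chain
```

For example, a note documenting ketoacidosis with coma maps to `E08.11`, whose chain of parents is `E08.1` and then `E08`. An extractor that misses the coma mention would silently fall back to the less specific `E08.10`.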
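The "Understanding Linguistic Constructs" slide describes dictionary-driven negation rules of the form `<negation phrase> * <entity>` and `<entity> * <negation phrase>`. Below is a deliberately naive, hypothetical implementation of those two rules, not the authors' code. The cue set, the six-token window, and the name `naive_negated` are assumptions. Running it on the slide's example sentences reproduces the failures the slide describes.

```python
import re

# Assumed cue dictionary, based on the examples listed on the slide.
NEGATION_CUES = {"no", "not", "deny", "denies", "cannot", "don't"}


def tokenize(text: str) -> list:
    """Lowercase word tokens; apostrophes are kept so "don't" stays one token."""
    return re.findall(r"[a-z']+", text.lower())


def naive_negated(sentence: str, entity: str, window: int = 6) -> bool:
    """Flag the entity as negated if any cue occurs within `window` tokens
    before it (<negation phrase> * <entity>) or after it (<entity> * <negation phrase>)."""
    tokens = tokenize(sentence)
    target = tokenize(entity)
    n = len(target)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target:
            before = tokens[max(0, i - window):i]
            after = tokens[i + n:i + n + window]
            if NEGATION_CUES & set(before + after):
                return True
    return False
```

The rule handles "he has not required any nitroglycerin." correctly. However, it also marks "edema" as negated in "his weight has not changed and that his edema is primarily at the end of the day." In that sentence the cue belongs to "weight ... changed". The same thing happens with "dyspnea" in "I do not have an explanation for this dyspnea.", where "not" scopes over "explanation". Token distance alone cannot tell which phrase a cue modifies.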
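The Solution and Knowledge Base slides resolve a conflicting entity by checking whether the patient's medications are known to treat it, with the Type_of hierarchy connecting brand and generic names to drug classes. The following is a minimal sketch of that idea under stated assumptions. The toy knowledge base is hypothetical and only loosely modeled on the two figures. The helper names are invented for illustration. The "any treating medication means present" rule is a simplification, not the paper's exact procedure.

```python
from collections import defaultdict

# Hypothetical toy knowledge base, loosely modeled on the slide figures.
IS_MEDICATION_FOR = {
    "statin": {"high cholesterol"},
    "warfarin": {"atrial fibrillation"},
    "atenolol": {"atrial fibrillation"},
    "aspirin": {"atrial fibrillation"},
}
TYPE_OF = {  # child -> parent
    "simvastatin": "statin",
    "pravastatin": "statin",
    "zocor": "simvastatin",
    "vytorin": "simvastatin",
    "pravachol": "pravastatin",
}


def treated_conditions(medication: str) -> set:
    """Conditions a medication treats, inherited from its ancestors via Type_of."""
    conditions = set()
    med = medication.lower()
    while med is not None:
        conditions |= IS_MEDICATION_FOR.get(med, set())
        med = TYPE_OF.get(med)
    return conditions


def find_conflicts(mentions) -> list:
    """mentions: (entity, is_present) pairs extracted from one patient's notes.
    An entity is conflicting if it was extracted both as present and as absent."""
    polarity = defaultdict(set)
    for entity, present in mentions:
        polarity[entity].add(present)
    return sorted(e for e, seen in polarity.items() if seen == {True, False})


def resolve(condition: str, medications) -> bool:
    """Simplified rule: resolve as present if any medication treats the condition."""
    return any(condition in treated_conditions(m) for m in medications)
```

Because of the hierarchy walk, a note that lists only "Pravachol" still supports "high cholesterol" through pravastatin and statin, even though the knowledge base never links the brand name to the condition directly.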
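The Evaluation slide reports only accuracy. The short sketch below recomputes it from the confusion matrix. It assumes that rows are the actual class and columns the predicted class, which gives TP = 18, FN = 6, FP = 3, TN = 5. Precision and recall are derived here under that assumption and are not reported on the slides. The function name `metrics` is illustrative.

```python
def metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Accuracy, precision, and recall from binary confusion-matrix counts."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


m = metrics(tp=18, fn=6, fp=3, tn=5)
print(f"accuracy={m['accuracy']:.3%}")  # prints "accuracy=71.875%"
```

Accuracy does not depend on the row/column orientation, but precision and recall would swap roles if the matrix were transposed.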
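The Observations slides call for ranking evidence (metoprolol should count more than aspirin toward hypertension) and for a probabilistic aggregation method. One standard option is a noisy-OR model, sketched below as a swapped-in technique rather than anything the slides specify. The evidence weights are invented for illustration and do not come from the paper. The names `rank_evidence` and `noisy_or` are hypothetical, and the model omits a leak term.

```python
import math

# Hypothetical per-(medication, condition) evidence strengths, illustrative only.
EVIDENCE_WEIGHT = {
    ("metoprolol", "hypertension"): 0.8,
    ("aspirin", "hypertension"): 0.1,
    ("insulin", "diabetes"): 0.95,
}


def rank_evidence(condition: str, medications) -> list:
    """Medications that support the condition, strongest evidence first."""
    scored = [(EVIDENCE_WEIGHT.get((m, condition), 0.0), m) for m in medications]
    return [m for w, m in sorted(scored, reverse=True) if w > 0]


def noisy_or(condition: str, medications) -> float:
    """P(condition | medications) under noisy-OR: 1 - prod(1 - w_i), no leak term."""
    return 1 - math.prod(
        1 - EVIDENCE_WEIGHT.get((m, condition), 0.0) for m in medications
    )
```

Under these made-up weights, aspirin alone yields about 0.1 for hypertension, while metoprolol plus aspirin yields about 0.82. A threshold on that score could replace the all-or-nothing rule that lets common medications such as aspirin cause false positives.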