dare-talk-160313221013.pptx

Challenges in Understanding Clinical Notes:
Why NLP Engines Fall Short and Where
Background Knowledge Can Help
Sujan Perera, Amit Sheth, Krishnaprasad Thirunarayan
Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis)
Wright State University
Suhas Nair, Neil Shah
ezDI LLC
Why is it necessary to understand clinical notes?
• 80% of patient data is unstructured [1]
• Structured data is incomplete and not accurate [2, 3]
[1] goo.gl/abqYYn
[2] Strengths and Limitations of CMS Administrative Data in Research
[3] Comparison of clinical and administrative data sources for hospital coronary artery bypass graft surgery report cards
Why is it necessary to understand clinical notes?
• Key indicators for decision making reside in patient notes
• facilities
• “Holter monitor was ordered by Lisa. She failed to get this
because she did not have transportation”
• non-compliance
• “Atrial fibrillation with poorly controlled ventricular rate due
to noncompliance.”
• financial status
• “The patient mentioned that Bystolic is expensive and cannot
afford it now.”
• family history
• “his father is hypertensive”
Why is it necessary to understand clinical notes?
• ICD-10 adoption: coding requires understanding the relationships between conditions
E08 - Diabetes mellitus due to underlying condition
E08.0 - Diabetes mellitus due to underlying condition with hyperosmolarity
E08.00 - without nonketotic hyperglycemic-hyperosmolar coma (NKHHC)
E08.01 - with coma
E08.1 - Diabetes mellitus due to underlying condition with ketoacidosis
E08.10 - without coma
E08.11 - with coma
• The underlying condition can be congenital rubella, Cushing's syndrome, cystic fibrosis, malignant neoplasm, malnutrition, or pancreatitis
Patient Data Distribution
[Figure: share of unstructured data vs. structured data in patient records]
Understanding Clinical Notes
• What conditions does the patient have?
• What symptoms does the patient have?
• What medications is he/she taking, and at what dosage?
• What are the relationships between entities?
• What is the patient's medical history?
• What is his/her family history?
• What is the patient's behavior?
• What are his/her diet and discharge instructions?
Understanding Clinical Notes - Tasks
• Entity identification
• she does not state that the reason for her stopping after two blocks is shortness of breath or chest discomfort.
• Annotate with standard vocabulary
• Shortness of breath – UMLS concept C0013404
• Chest discomfort – UMLS concept C0235710
• Relationship identification
• Temporal information extraction
• if her symptoms do not change at all, she will go back on the lipitor after two weeks.
• Negation detection
• he has not required any nitroglycerin.
• Certainty detection
• She is not sure if she is just depressed or not.
• Conditioning detection
• if he experiences any chest pain, shortness of breath with exertion or dizziness or syncopal episodes to let us know.
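The "annotate with standard vocabulary" task above can be illustrated with a minimal dictionary lookup. This is only a sketch: the two concept identifiers come from the slide, while the `UMLS_CUI` table and the `annotate` function are hypothetical names, and a real annotator would handle synonyms, inflections, and overlapping spans.

```python
# Hypothetical two-entry vocabulary; the CUIs are the ones shown on the slide.
UMLS_CUI = {
    "shortness of breath": "C0013404",
    "chest discomfort": "C0235710",
}

def annotate(text, vocabulary=UMLS_CUI):
    """Return (start, end, phrase, cui) for every exact, case-insensitive match."""
    lowered = text.lower()
    found = []
    for phrase, cui in vocabulary.items():
        start = lowered.find(phrase)
        while start != -1:
            found.append((start, start + len(phrase), phrase, cui))
            start = lowered.find(phrase, start + 1)
    return sorted(found)  # ordered by position in the text

sentence = ("she does not state that the reason for her stopping after two "
            "blocks is shortness of breath or chest discomfort.")
annotations = annotate(sentence)
```

Each annotation carries character offsets, so later steps (negation, certainty, conditioning) can inspect the words around the entity.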
Understanding Linguistic Constructs
• Rule-based algorithms are popular
• simplicity
• low computational cost
• They maintain a dictionary of words that indicate a particular language construct
• Negation: no, not, deny, cannot, don’t, etc.
• Simple rules decide applicability
• <negation phrase> * <entity>
• <entity> * <negation phrase>
• The rules do not associate the negation with the correct phrase of the sentence
• This leads to incorrect output
• he did not make the increase on his metoprolol.
• his weight has not changed and that his edema is primarily at the end of the day.
• Sometimes even associating with the correct phrase is not enough
• “I do not have an explanation for this dyspnea.”
• “there was no evidence of ischemia.”
• The same problems are common to other constructs (certainty and conditioning)
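The two rule patterns above can be sketched in a few lines to show why they misfire. This is an illustrative approximation, not any particular NLP engine: the trigger list is taken from the slide, and `is_negated` is a hypothetical function that treats the `*` wildcard as "anything else in the sentence".

```python
import re

# Trigger words from the slide; a real system would use a larger dictionary.
NEGATION_TRIGGERS = {"no", "not", "deny", "cannot", "don't"}

def is_negated(sentence, entity):
    """Apply '<negation phrase> * <entity>' and '<entity> * <negation phrase>'.

    The entity counts as negated if any trigger word occurs anywhere
    before or after it in the same sentence.
    """
    tokens = re.findall(r"[a-z']+", sentence.lower())
    ent = entity.lower().split()
    for i in range(len(tokens) - len(ent) + 1):
        if tokens[i:i + len(ent)] == ent:
            context = tokens[:i] + tokens[i + len(ent):]
            return any(tok in NEGATION_TRIGGERS for tok in context)
    return False  # entity not found in the sentence

# Correct: nitroglycerin really is negated here.
ok = is_negated("he has not required any nitroglycerin.", "nitroglycerin")
# Wrong: "not" modifies "make the increase", yet metoprolol is flagged.
wrong_1 = is_negated("he did not make the increase on his metoprolol.", "metoprolol")
# Wrong: "not" modifies "changed", yet edema is flagged.
wrong_2 = is_negated(
    "his weight has not changed and that his edema is primarily at the end of the day.",
    "edema",
)
```

All three calls return `True`, although only the first sentence negates its entity. The rule has no notion of which phrase the trigger belongs to.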
Leads to Conflicting Instances
• Failure to identify such linguistic constructs leads to conflicting instances
• Document 1:
• Coronary artery disease listed in the current diagnosis list
• “Send for carotid duplex to rule out carotid artery stenosis given his risk factors and underlying coronary artery disease.”
• Document 2:
• “Extremities : Warm and dry. No clubbing or cyanosis. No lower extremity edema.”
• “I have advised the patient on the side effect of potential lower extremity edema.”
• Document 3:
• “He is not having any symptoms of chest pain or exertional syncope or dizziness.”
• “I advised him that if he experiences chest pain, shortness of breath with exertion or dizziness or syncopal episodes to let us know and we can do appropriate workup.”
• 620 instances within 3172 documents
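One way to count such conflicting instances, assuming the NLP engine emits one assertion per entity mention, is to group assertions by document and entity and flag those with both polarities. The tuple layout and the `find_conflicts` name are assumptions made for this sketch.

```python
from collections import defaultdict

def find_conflicts(assertions):
    """Return (doc_id, entity) pairs asserted as both present and absent.

    `assertions` is an iterable of (doc_id, entity, is_present) tuples,
    a hypothetical output format for the extraction step.
    """
    polarities = defaultdict(set)
    for doc_id, entity, is_present in assertions:
        polarities[(doc_id, entity.lower())].add(is_present)
    return sorted(key for key, seen in polarities.items() if seen == {True, False})

extracted = [
    ("doc2", "lower extremity edema", False),  # "No lower extremity edema."
    ("doc2", "lower extremity edema", True),   # conditional mention read as present
    ("doc3", "chest pain", False),
    ("doc3", "chest pain", True),
    ("doc3", "syncope", False),
]
conflicts = find_conflicts(extracted)
```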
Solution
• Our method attempts to resolve the conflicts by using coded domain knowledge to interpret the other observations in the document
[Figure: example knowledge graph. Nodes: Syncope (symptom), Atrial Fibrillation, and the medications Warfarin, Atenolol, and Aspirin. Edge labels: Is_symptom_of, Is_medication_for.]
• We used only medication information because extraction algorithms perform well on the list structure of medication sections.
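The idea can be sketched as follows, assuming the knowledge base exposes an Is_medication_for lookup. The single entry below is a hypothetical slice whose edges are read off the figure's node names, and `resolve_conflict` is an illustrative name rather than the authors' implementation.

```python
# Hypothetical knowledge-base slice: condition -> medications linked to it
# by Is_medication_for (edges assumed from the figure's node names).
IS_MEDICATION_FOR = {
    "atrial fibrillation": {"warfarin", "atenolol", "aspirin"},
}

def resolve_conflict(condition, patient_medications, kb=IS_MEDICATION_FOR):
    """Decide a conflicting condition from the patient's medication list.

    Returns True (condition present) when the patient takes at least one
    medication that the knowledge base links to the condition.
    """
    treating = kb.get(condition.lower(), set())
    taken = {med.lower() for med in patient_medications}
    return bool(treating & taken)

present = resolve_conflict("Atrial Fibrillation", ["Warfarin", "Lisinopril"])
absent = resolve_conflict("Atrial Fibrillation", ["Lisinopril"])
```

A single matching medication decides the outcome here, so any medication shared by many conditions will pull the answer toward "present".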
Knowledge Base
• Domains: Cardiology, Orthopedics, Oncology, Neurology, etc.
• Concepts: 1,008,161, including
• Problems (diseases, symptoms): 125,778
• Procedures: 262,360
• Medicines: 298,993
• Medical Devices: 33,124
• Relationships: 77,261
• is treated with (disease -> medication): 41,182
• is relevant procedure (procedure -> disease): 3,352
• is symptom of (symptom -> disease): 8,299
• contraindicated drug (medication -> disease): 24,428
Knowledge Base
[Figure: medication hierarchy. Nodes: High Cholesterol, Statin, simvastatin, zocor, Pravastatin, vytorin, Pravachol. Edge labels: Is_medication_for, Type_of.]
• We use the medication hierarchy to find the relationships
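A hierarchy lookup of this kind can be sketched by walking Type_of edges upward until a node with an Is_medication_for edge is reached. The edges below are assumptions chosen to match the figure's node names (for example, that zocor is a type of simvastatin), and `conditions_treated_by` is a hypothetical helper.

```python
# Hypothetical Type_of edges (child -> parent), assumed from the figure.
TYPE_OF = {
    "simvastatin": "statin",
    "pravastatin": "statin",
    "zocor": "simvastatin",
    "pravachol": "pravastatin",
}
# Hypothetical Is_medication_for edge attached at the class level.
IS_MEDICATION_FOR = {"statin": {"high cholesterol"}}

def conditions_treated_by(medication):
    """Collect conditions linked to the medication or any of its ancestors."""
    conditions = set()
    visited = set()
    node = medication.lower()
    while node is not None and node not in visited:  # guard against cycles
        visited.add(node)
        conditions |= IS_MEDICATION_FOR.get(node, set())
        node = TYPE_OF.get(node)
    return conditions

treated = conditions_treated_by("Zocor")
```

With this structure, a relationship stored once for the class "statin" covers every brand and generic name beneath it.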
Evaluation
• 25 Documents
• 32 conflicting instances
                   Predicted Positive    Predicted Negative
Actual Positive    18                    6
Actual Negative    3                     5
Accuracy = (TP + TN) / (TP + TN + FP + FN) = (18 + 5) / 32 = 71.875%
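The reported accuracy can be checked from the four confusion-matrix counts. In this sketch the assignment of 3 and 6 to false positives and false negatives is an assumption about the table layout; accuracy is the same either way because both appear only in the denominator.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of instances classified correctly."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total

# Counts from the slide: 18 and 5 on the diagonal, 6 and 3 off it.
acc = accuracy(tp=18, tn=5, fp=3, fn=6)  # 23 correct out of 32
```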
Observations and Insights
• False negatives occur for common symptoms
• headache, obesity, shortness of breath
• Doctors may not prescribe medications for these symptoms
• False positives arise from common medications
• Aspirin
• Conflicts on major conditions can be resolved accurately
• Coronary artery disease, atrial fibrillation, peripheral vascular disease, ischemia, cardiomyopathy, etc.
• Patients should be taking medications for these conditions
Observations and Insights
• The evidence should be ranked
• Metoprolol is stronger evidence for hypertension than aspirin
• Insulin is strong evidence for diabetes
• A more sophisticated evidence aggregation method should be used
• Rule-based
• Probabilistic method
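One possible probabilistic aggregation, shown here only as an illustration and not as the method the authors propose, is a noisy-OR over per-medication evidence strengths. The strength values are invented for the example; they only preserve the ordering stated above (metoprolol stronger than aspirin for hypertension, insulin strong for diabetes).

```python
# Hypothetical evidence strengths in [0, 1]; the numbers are made up.
EVIDENCE_STRENGTH = {
    ("metoprolol", "hypertension"): 0.8,
    ("aspirin", "hypertension"): 0.1,
    ("insulin", "diabetes"): 0.95,
}

def noisy_or(condition, medications, strengths=EVIDENCE_STRENGTH):
    """Combine independent pieces of evidence: 1 - prod(1 - strength_i)."""
    p_absent = 1.0
    for med in medications:
        strength = strengths.get((med.lower(), condition.lower()), 0.0)
        p_absent *= 1.0 - strength
    return 1.0 - p_absent

weak = noisy_or("hypertension", ["Aspirin"])
strong = noisy_or("hypertension", ["Aspirin", "Metoprolol"])
```

Under these assumed weights aspirin alone gives a low score, so a threshold between the two values would suppress the aspirin-driven false positives while still accepting metoprolol as evidence.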
Beyond Conflicts
• Populate relationships among the entities
• Which medications are associated with which condition
• Derive implicit information in the patient notes
• Patient notes can be incomplete
• Domain experts read the note and can understand beyond what is written there
• These insights are important for prediction algorithms
• Knowledge-driven inference can be used to fill this gap
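Filling this gap can be sketched as a lookup from the medications in a note to conditions that the note never mentions. The `TREATS` table is a hypothetical knowledge-base slice built from the medication examples used earlier in the talk, and `implied_conditions` is an illustrative name.

```python
# Hypothetical knowledge-base slice: medication -> conditions it treats.
TREATS = {
    "insulin": {"diabetes"},
    "metoprolol": {"hypertension"},
}

def implied_conditions(documented_conditions, medications, kb=TREATS):
    """Map each undocumented condition to the medications that suggest it."""
    documented = {c.lower() for c in documented_conditions}
    implied = {}
    for med in medications:
        for condition in kb.get(med.lower(), set()):
            if condition not in documented:
                implied.setdefault(condition, []).append(med.lower())
    return implied

# The note lists hypertension but not diabetes, yet the patient takes insulin.
gaps = implied_conditions(["Hypertension"], ["Metoprolol", "Insulin"])
```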
Thank You
Perera, Sujan, Amit Sheth, Krishnaprasad Thirunarayan, Suhas Nair, and Neil Shah. "Challenges in understanding clinical notes: Why NLP engines fall short and where background knowledge can help." In Proceedings of the 2013 International Workshop on Data Management & Analytics for Healthcare, pp. 21-26. ACM, 2013.