
Implicit Entity Recognition in Clinical
Documents
Sujan Perera1, Pablo Mendes2, Amit Sheth1, Krishnaprasad
Thirunarayan1, Adarsh Alex1, Christopher Heid3, Greg Mott3
1 Kno.e.sis Center, Wright State University; 2 IBM Research, San Jose; 3 Boonshoft School of Medicine, Wright State University
Example
“Bob Smith is a 61-year-old man referred by Dr. Davis for outpatient cardiac
catheterization because of a positive exercise tolerance test. Recently, he
started to have left shoulder twinges and tingling in his hands. A stress test
done on 2013-06-02 revealed that the patient exercised for 6 1/2 minutes,
stopped due to fatigue. However, Mr. Smith is comfortably breathing in room
air. He also showed accumulation of fluid in his extremities. He does not have
any chest pain.”
Example
Applying standard clinical NLP tasks to the same passage:
"Bob Smith is a 61-year-old man referred by Dr. Davis for outpatient cardiac
catheterization because of a positive exercise tolerance test. Recently, he
started to have left shoulder twinges and tingling in his hands. A stress test
done on 2013-06-02 revealed that the patient exercised for 6 1/2 minutes,
stopped due to fatigue. However, Mr. Smith is comfortably breathing in room
air. He also showed accumulation of fluid in his extremities. He does not have
any chest pain."
• Named Entity Recognition – "Bob Smith" and "Dr. Davis" are tagged as Person.
• Entity Linking – explicit mentions are mapped to knowledge-base concepts (C0018795, C0015672, C0008031).
• Co-reference Resolution – "he", "the patient", and "Mr. Smith" are resolved to the same patient.
• Negation Detection – "does not have any chest pain" is marked as negated.
Example
The same passage also contains implicit mentions, which Implicit Entity Recognition targets:
"Bob Smith is a 61-year-old man referred by Dr. Davis for outpatient cardiac
catheterization because of a positive exercise tolerance test. Recently, he
started to have left shoulder twinges and tingling in his hands. A stress test
done on 2013-06-02 revealed that the patient exercised for 6 1/2 minutes,
stopped due to fatigue. However, Mr. Smith is comfortably breathing in room
air. He also showed accumulation of fluid in his extremities. He does not have
any chest pain."
Implicit Entity Recognition:
• "Mr. Smith is comfortably breathing in room air" → Shortness of Breath (NEG)
• "accumulation of fluid in his extremities" → Edema
More Examples
• "Rounded calcific density in right upper quadrant likely representing a gallstone within the neck of the gallbladder." → Cholecystitis
• "His tip of the appendix is inflamed." → Appendicitis
• "The respirations were unlabored and there were no use of accessory muscles." → Shortness of breath (NEG)
• "She was walking outside on her driveway and suddenly fell unconscious, with no prodrome, or symptoms preceding the event." → Syncope
• "This is important to prevent shortness of breath and lower extremity swelling from fluid accumulation." → Edema
Implicit Entity Recognition
Implicit Entity Recognition (IER) is the task of
determining whether a sentence has a reference
to an entity, even though it does not mention
that entity by its name.
Automation of clinical documents
• New healthcare policies require automation, e.g., readmission prediction, assisting professionals, computer-assisted coding (CAC), and clinical documentation improvement (CDI).
• State-of-the-art approaches focus on explicit mentions.
• An overall understanding of the patient's record needs:
• Explicit and implicit facts
• Domain knowledge
• Some conditions are frequently mentioned implicitly:
• 40% of shortness of breath mentions
• 35% of edema mentions
What is involved in solving the problem?
Shortness of breath (NEG)
“At the time of discharge she was breathing comfortably with a respiratory
rate of 12 to 15 breaths per minute.”
“Rounded calcific density in right upper quadrant likely representing a
gallstone within the neck of the gallbladder.”
Cholecystitis (POS)
• Language understanding.
Term ‘comfortable’ is the antonym of ‘uncomfortable’
• Domain knowledge.
‘gallstones blocking the tube leading out of your gallbladder
cause cholecystitis’
Our Solution
Pipeline:
Entity Representative Term (ERT) Selection → Entity Model Creation → Candidate Sentence Selection → Candidate Sentence Pruning → Similarity Calculation → Annotations
ERT Selection
• The knowledge base consists of definitions of the entities.
• Entity Representative Terms (ERTs) may indicate implicit mentions of the entities:
breathing → Shortness of breath
fluid → edema
gallstone → cholecystitis
• The representative power of a term for an entity is calculated by its TF-IDF value:
r_t = freq(t, Q_e) × log(E / E_t)
where r_t is the representative power of the term t for the entity e, freq(t, Q_e) is the frequency of the term t in the definitions of e, E is the total number of entities, and E_t is the number of entities defined using term t.
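As a minimal sketch of this TF-IDF representative-power computation (the whitespace tokenization and dict-based data layout are my assumptions, not the paper's implementation):

```python
import math
from collections import Counter

def representative_power(definitions):
    """Compute r_t = freq(t, Q_e) * log(E / E_t) for every term t
    in every entity's definitions (TF-IDF over the knowledge base).

    definitions: dict mapping entity name -> list of definition strings.
    Returns: dict mapping entity name -> {term: r_t}.
    """
    # Tokenize each entity's definitions into lowercase terms.
    tokens = {e: [t for d in defs for t in d.lower().split()]
              for e, defs in definitions.items()}
    E = len(definitions)                       # total number of entities
    # E_t: number of entities whose definitions mention term t.
    df = Counter(t for terms in tokens.values() for t in set(terms))
    scores = {}
    for e, terms in tokens.items():
        tf = Counter(terms)                    # freq(t, Q_e)
        scores[e] = {t: tf[t] * math.log(E / df[t]) for t in tf}
    return scores
```

With this scoring, a term such as "breathing" that appears only in the definitions of shortness of breath receives a high score, while terms shared across many entity definitions score near zero.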
Entity Model
• Entity Indicator.
• Entity Indicator consists of the terms that describe features of
the entity in the definition.
• E.g., ‘A disorder characterized by an uncomfortable sensation of
difficulty breathing’ – {uncomfortable, sensation, difficulty,
breathing}.
• Entity Model – the collection of an entity's indicators:
Entity Model = {Entity Indicator1, Entity Indicator2, Entity Indicator3, …}
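Building an entity indicator from a definition can be sketched as follows; the stopword list is an illustrative assumption, not the one used in the paper:

```python
# Minimal sketch: build an entity indicator (the set of feature terms)
# from one definition. The stopword list below is an illustrative
# assumption, not the one used in the paper.
STOPWORDS = {"a", "an", "the", "of", "by", "characterized", "disorder"}

def entity_indicator(definition: str) -> set:
    # Lowercase, drop commas, split on whitespace, remove stopwords.
    terms = definition.lower().replace(",", " ").split()
    return {t for t in terms if t not in STOPWORDS}

# The slide's example definition of shortness of breath:
ind = entity_indicator(
    "A disorder characterized by an uncomfortable sensation of difficulty breathing")
# ind == {"uncomfortable", "sensation", "difficulty", "breathing"}
```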
Candidate Sentence Selection & Pruning
• Candidate sentences – sentences that contain an ERT.
• Candidate sentences are pruned to remove noise: only nouns, verbs, adjectives, and adverbs within a fixed window around the ERT are kept.
"His propofol was increased and he was allowed to wake up a second time
later on the evening of surgery and was ultimately weaned from mechanical
ventilation and successfully extubated at about 09:30 that evening."
pruning ↓
{weaned, mechanical, ventilation, successfully, extubated}
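The pruning step can be sketched as below, assuming tokens arrive already POS-tagged by some upstream tagger (the tag set and the default window size are assumptions):

```python
# Content-word tags kept during pruning (universal POS tags assumed).
CONTENT_TAGS = {"NOUN", "VERB", "ADJ", "ADV"}

def prune(tagged_tokens, ert, window=5):
    """Keep content words (nouns/verbs/adjectives/adverbs) within a fixed
    window around the ERT.

    tagged_tokens: list of (word, pos) pairs from an upstream POS tagger.
    ert: the entity representative term found in the sentence.
    """
    idxs = [i for i, (w, _) in enumerate(tagged_tokens) if w == ert]
    kept = []
    for i in idxs:
        lo, hi = max(0, i - window), min(len(tagged_tokens), i + window + 1)
        for w, pos in tagged_tokens[lo:hi]:
            if pos in CONTENT_TAGS and w not in kept:
                kept.append(w)
    return kept
```

For the ERT "ventilation" with a window of 3, the tail of the slide's example sentence prunes to [weaned, mechanical, ventilation, successfully, extubated].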
Similarity Calculation
• The similarity between the entity model and the pruned candidate sentence is calculated to annotate the sentence.
• The syntactic diversity of words and negated mentions need special attention.
• Multiple similarity measures are used to compare a pair of words t1 and t2: M = {WUP, LCH, LIN, JCN, Word2Vec, Levenshtein}.
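Of the six measures listed, only Levenshtein can be shown self-contained; the sketch below normalizes edit distance to a [0, 1] similarity and combines measures by taking the maximum (the combination rule is my assumption; the full set would add WordNet-based measures and Word2Vec):

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def lev_sim(t1: str, t2: str) -> float:
    # Normalize edit distance into a [0, 1] similarity.
    if not t1 and not t2:
        return 1.0
    return 1.0 - levenshtein(t1, t2) / max(len(t1), len(t2))

def word_similarity(t1, t2, measures=(lev_sim,)):
    # Take the maximum over the available measures (assumed rule);
    # in the full system M also contains WUP, LCH, LIN, JCN, Word2Vec.
    return max(m(t1, t2) for m in measures)
```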
Similarity Calculation
• The similarity between the entity model and the pruned sentence is calculated by weighting the maximum similarity of each word in the entity model by its representative power.
e – entity indicator
s – pruned sentence
α(t_e, s) – determines whether term t in e is an antonym of any term in s
f(t_e, s) – calculates the similarity of a term in e with the terms in s
sim(e, s) – measures the similarity between the entity indicator and the pruned sentence
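The slide describes the score in prose without reproducing the formula; the sketch below is my reconstruction of that description, where the normalization by total representative power and the sign flip for antonyms are assumptions:

```python
def sim(indicator, sentence, rpower, word_sim, antonyms=None):
    """Similarity between an entity indicator e and a pruned sentence s.

    indicator: list of terms describing the entity.
    sentence:  list of terms from the pruned candidate sentence.
    rpower:    dict term -> representative power r_t.
    word_sim:  function (t1, t2) -> similarity in [0, 1] (f above).
    antonyms:  dict term -> set of its antonyms; in the full system a
               WordNet-style lookup would supply this (alpha above).
    """
    antonyms = antonyms or {}
    total, weight = 0.0, 0.0
    for t in indicator:
        r = rpower.get(t, 0.0)
        # Maximum similarity of term t against any sentence word.
        best = max((word_sim(t, w) for w in sentence), default=0.0)
        # Flip the contribution if the sentence contains an antonym of t
        # (e.g. 'comfortably breathing' vs 'uncomfortable ... breathing').
        alpha = -1.0 if antonyms.get(t, set()) & set(sentence) else 1.0
        total += r * alpha * best
        weight += r
    return total / weight if weight else 0.0
```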
Dataset
• We used the dataset from SemEval-2014 Task 7.
• 857 sentences were selected for 8 entities.
• The entities were selected based on their frequency of appearance and feedback from domain experts.
• Annotated by three domain experts, with an inter-annotator agreement of 0.58.
Dataset

Entity                 Positive Assertions   Negative Assertions   None
Shortness of Breath            93                    94             29
Edema                         115                    35             81
Syncope                        96                    92             24
Cholecystitis                  78                    36              4
Gastrointestinal Gas           18                    14              5
Colitis                        12                    11              0
Cellulitis                      8                     2              0
Fasciitis                       7                     3              0
Evaluation
• Baselines:
• MCS algorithm (Mihalcea 2006)
• SVM trained on n-grams
• Evaluation metrics:
• Positive precision and recall
• Negative precision and recall
• 70% training and 30% testing split.
• Thresholds for our algorithm and MCS were selected based on annotation performance on the training dataset.
Annotation Performance

Method   PP     PR     PF1    NP     NR     NF1
Our      0.66   0.87   0.75   0.73   0.73   0.73
MCS      0.50   0.93   0.65   0.31   0.76   0.44
SVM      0.73   0.82   0.77   0.66   0.67   0.67

• Our algorithm outperforms both baselines in the negative category.
• The SVM is able to leverage supervision to beat our algorithm in the positive category.
Annotation Performance
• We added the similarity value produced by our algorithm as a feature to the SVM.
• This shows that our similarity value can serve as an effective feature in a supervised approach.

Method     PP     PR     PF1    NP     NR     NF1
SVM        0.73   0.82   0.77   0.66   0.67   0.67
SVM+MCS    0.73   0.82   0.77   0.66   0.66   0.66
SVM+Our    0.77   0.85   0.81   0.72   0.75   0.73
Annotation Performance with varying training dataset size
[Figure: annotation performance on positive and negative assertions as the training dataset size varies]
Limitations
• The approach misses implicit mentions of entities that contain no ERT, e.g., implicit mentions of shortness of breath without the term 'breathing':
• "The patient had low oxygen saturation"
• "The patient was gasping for air"
• "Patient was air hunger"
• 113 instances vs. 8990 instances
Conclusion
• Introduced the problem of implicit entity recognition in clinical documents.
• Developed an unsupervised approach and showed that it outperforms a supervised approach.
• Showed that a supervised approach can use our similarity value as a feature to reduce labeling cost and improve performance.
Thank You
Sujan Perera, Pablo Mendes, Amit Sheth, Krishnaprasad Thirunarayan, Adarsh Alex, Christopher Heid, Greg Mott, 'Implicit Entity Recognition in Clinical Documents', in Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics (*SEM), 2015.
http://knoesis.org/researchers/sujan/