Implicit Entity Recognition in Clinical Documents

Sujan Perera (1), Pablo Mendes (2), Amit Sheth (1), Krishnaprasad Thirunarayan (1), Adarsh Alex (1), Christopher Heid (3), Greg Mott (3)
(1) Kno.e.sis Center, Wright State University  (2) IBM Research, San Jose  (3) Boonshoft School of Medicine, Wright State University

Example

"Bob Smith is a 61-year-old man referred by Dr. Davis for outpatient cardiac catheterization because of a positive exercise tolerance test. Recently, he started to have left shoulder twinges and tingling in his hands. A stress test done on 2013-06-02 revealed that the patient exercised for 6 1/2 minutes, stopped due to fatigue. However, Mr. Smith is comfortably breathing in room air. He also showed accumulation of fluid in his extremities. He does not have any chest pain."

Standard clinical NLP tasks capture the explicit content of this note:
• Named Entity Recognition: "Bob Smith" and "Dr. Davis" are recognized as Person mentions.
• Entity Linking: explicit mentions are mapped to UMLS concepts, e.g., cardiac catheterization (C0018795), fatigue (C0015672), chest pain (C0008031).
• Co-reference Resolution: "Bob Smith", "he", "the patient", and "Mr. Smith" refer to the same person.
• Negation Detection: "He does not have any chest pain."
The note also conveys information that is never named explicitly; recognizing it is the goal of this work:
• Implicit Entity Recognition: "Mr. Smith is comfortably breathing in room air" → Shortness of Breath (NEG); "He also showed accumulation of fluid in his extremities" → Edema.

More Examples

Sentence → Entity
• "Rounded calcific density in right upper quadrant likely representing a gallstone within the neck of the gallbladder." → Cholecystitis
• "His tip of the appendix is inflamed." → Appendicitis
• "The respirations were unlabored and there were no use of accessory muscles." → Shortness of breath (NEG)
• "She was walking outside on her driveway and suddenly fell unconcious, with no prodrome, or symptoms preceding the event." → Syncope
• "This is important to prevent shortness of breath and lower extremity swelling from fluid accumulation." → Edema

Implicit Entity Recognition

Implicit Entity Recognition (IER) is the task of determining whether a sentence refers to an entity even though it does not mention that entity by name.

Automation of clinical documents

• New healthcare policies require automation: readmission prediction, assisting professionals, computer-assisted coding (CAC), and clinical documentation improvement (CDI).
• State-of-the-art approaches focus on explicit mentions.
• An overall understanding of the patient record needs both explicit and implicit facts, together with domain knowledge.
• Some conditions are frequently mentioned implicitly: 40% of shortness of breath mentions and 35% of edema mentions.

What is involved in solving the problem?

• "At the time of discharge she was breathing comfortably with a respiratory rate of 12 to 15 breaths per minute." → Shortness of breath (NEG)
• "Rounded calcific density in right upper quadrant likely representing a gallstone within the neck of the gallbladder." → Cholecystitis (POS)
• Language understanding: the term 'comfortable' is the antonym of 'uncomfortable'.
• Domain knowledge: gallstones blocking the tube leading out of the gallbladder cause cholecystitis.

Our Solution

Pipeline (described in the following slides): Entity Representative Term (ERT) Selection → Entity Model Creation → Candidate Sentence Selection → Candidate Sentence Pruning → Similarity Calculation → Annotations.

ERT Selection

• The knowledge base consists of definitions of the entities.
• Entity Representative Terms may indicate implicit mentions of the entities, e.g., breathing → shortness of breath, fluid → edema, gallstone → cholecystitis.
• The representative power of a term for an entity is calculated by its TF-IDF value, where r_t is the representative power of the term t for the entity e, freq(t, Q_e) is the frequency of the term t in the definitions of e, E is the total number of entities, and E_t is the number of entities defined using the term t.
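The equation on this slide did not survive the text extraction; a reconstruction from the variable definitions above, assuming the standard TF-IDF form, is:

    r_t = \mathrm{freq}(t, Q_e) \cdot \log\left(\frac{E}{E_t}\right)

Here freq(t, Q_e) plays the role of the term frequency over the entity's definitions, and log(E / E_t) plays the role of the inverse document frequency over the set of entity definitions.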
Entity Model

• An Entity Indicator consists of the terms that describe features of the entity in its definition.
• E.g., 'A disorder characterized by an uncomfortable sensation of difficulty breathing' → {uncomfortable, sensation, difficulty, breathing}.
• An Entity Model is a collection of entity indicators (Entity Indicator 1, Entity Indicator 2, Entity Indicator 3, ...).

Candidate Sentence Selection & Pruning

• Candidate sentences are the sentences that contain an ERT.
• Candidate sentences are pruned to remove noise: keep the nouns, verbs, adjectives, and adverbs within a fixed window around the ERT.
• E.g., "His propofol was increased and he was allowed to wake up a second time later on the evening of surgery and was ultimately weaned from mechanical ventilation and successfully extubated at about 09:30 that evening." → pruning → {weaned, mechanical, ventilation, successfully, extubated}

Similarity Calculation

• The similarity between the entity model and the pruned candidate sentence is calculated to annotate the sentence.
• The syntactic diversity of words and negated mentions need special attention.
• Multiple similarity measures are used to compare two words t1 and t2: M = {WUP, LCH, LIN, JCN, Word2Vec, Levenshtein}.
• The similarity between the entity model and the pruned sentence is calculated by weighting the maximum similarity of each word in the entity model by its representative power, where
  - e is an entity indicator,
  - s is the pruned sentence,
  - α(t_e, s) determines whether the term t in e is an antonym of any term in s,
  - f(t_e, s) calculates the similarity of a term in e with the terms in the sentence, and
  - sim(e, s) measures the similarity between the entity indicator and the pruned sentence.
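The equations for f and sim were figures on the original slides and are not recoverable from the extracted text. The sketch below only illustrates the aggregation described in the bullets above, under three assumptions: f takes the maximum over the available measures, α downweights a term that is contradicted by an antonym in the sentence, and the weighted sum is normalized by the total representative power. All function and variable names are illustrative placeholders, not the authors' code.

from typing import Callable, Dict, Iterable, List

Measure = Callable[[str, str], float]          # word-word similarity in [0, 1]
AntonymCheck = Callable[[str, Iterable[str]], bool]

def term_similarity(t1: str, t2: str, measures: List[Measure]) -> float:
    """f(t1, t2): take the strongest signal among the available measures."""
    return max(m(t1, t2) for m in measures)

def indicator_similarity(entity_indicator: Dict[str, float],   # term -> representative power r_t
                         pruned_sentence: List[str],
                         measures: List[Measure],
                         is_antonym: AntonymCheck) -> float:
    """sim(e, s): weight each indicator term's best match in the sentence by its
    representative power; the antonym check (alpha) flips that term's contribution.
    The normalization by total representative power is an assumption of this sketch."""
    total, weight_sum = 0.0, 0.0
    for term, r_t in entity_indicator.items():
        best = max(term_similarity(term, w, measures) for w in pruned_sentence)
        alpha = -1.0 if is_antonym(term, pruned_sentence) else 1.0
        total += r_t * alpha * best
        weight_sum += r_t
    return total / weight_sum if weight_sum else 0.0

if __name__ == "__main__":
    # Toy run with a single exact-match "measure"; the paper uses WUP, LCH, LIN,
    # JCN, Word2Vec, and Levenshtein instead.
    exact = lambda a, b: 1.0 if a == b else 0.0
    no_antonyms = lambda term, words: False
    indicator = {"uncomfortable": 0.4, "sensation": 0.2, "difficulty": 0.7, "breathing": 0.9}
    sentence = ["breathing", "comfortably", "room", "air"]
    print(indicator_similarity(indicator, sentence, [exact], no_antonyms))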
Dataset

• Used the dataset from SemEval-2014 Task 7.
• 857 sentences selected for 8 entities.
• The entities were selected based on the frequency of their appearance and feedback from domain experts.
• Annotated by three domain experts; annotation agreement 0.58.

Entity                Positive Assertions   Negative Assertions   None
Shortness of Breath   93                    94                    29
Edema                 115                   35                    81
Syncope               96                    92                    24
Cholecystitis         78                    36                    4
Gastrointestinal Gas  18                    14                    5
Colitis               12                    11                    0
Cellulitis            8                     2                     0
Fasciitis             7                     3                     0

Evaluation

• Baselines: the MCS algorithm (Mihalcea 2006) and an SVM trained on n-grams.
• Evaluation metrics: positive precision and recall, negative precision and recall.
• 70% of the data for training, 30% for testing.
• Thresholds for our algorithm and MCS were selected based on annotation performance on the training dataset.

Annotation Performance

Method   PP     PR     PF1    NP     NR     NF1
Our      0.66   0.87   0.75   0.73   0.73   0.73
MCS      0.50   0.93   0.65   0.31   0.76   0.44
SVM      0.73   0.82   0.77   0.66   0.67   0.67

• Our algorithm outperforms the baselines in the negative category.
• The SVM is able to leverage supervision to beat our algorithm in the positive category.

Annotation Performance (similarity value as a feature)

• The similarity value computed by our algorithm is added as a feature to the SVM.
• This shows that our similarity value can be used as an effective feature with a supervised approach.

Method    PP     PR     PF1    NP     NR     NF1
SVM       0.73   0.82   0.77   0.66   0.67   0.67
SVM+MCS   0.73   0.82   0.77   0.66   0.66   0.66
SVM+Our   0.77   0.85   0.81   0.72   0.75   0.73

Annotation Performance with varying training dataset size

[Plots: annotation performance for positive and negative assertions as the training dataset size varies.]

Limitations

• The approach misses implicit mentions of entities that contain no ERT, e.g., implicit mentions of shortness of breath without the term 'breathing':
  - "The patient had low oxygen saturation"
  - "The patient was gasping for air"
  - "Patient was air hunger"
• 113 instances vs. 8990 instances.

Conclusion

• Introduced the problem of implicit entity recognition in clinical documents.
• Developed an unsupervised approach and showed that it outperforms a supervised approach.
• Showed that a supervised approach can use our similarity value as a feature to reduce labeling cost and improve performance.

Thank You

Sujan Perera, Pablo Mendes, Amit Sheth, Krishnaprasad Thirunarayan, Adarsh Alex, Christopher Heid, Greg Mott, "Implicit Entity Recognition in Clinical Documents", in Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics (*SEM), 2015.
http://knoesis.org/researchers/sujan/