Computational Intelligence in Biomedical and Health Care Informatics
HCA 590 (Topics in Health Sciences)
Rohit Kate
Clinical Natural Language Processing 2

Reading
• Chapter 15, Text 6
• Paper: Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): Architecture, Component Evaluation and Applications
  Guergana K Savova, James J Masanz, Philip V Ogren, Jiaping Zheng, Sunghwan Sohn, Karin C Kipper-Schuler, Christopher G Chute
  Journal of the American Medical Informatics Association 2010;17:507-513

Clinical NLP Systems
• General-purpose clinical NLP systems: can be applied to many different tasks
  – MetaMap
  – cTAKES
  – MedLEE
• Specialized clinical NLP systems
  – Detecting clinical events
  – Processing radiology, pathology and other reports
• General-purpose systems can be used to build specialized systems, so the distinction lies mostly in the purpose for which a system was built

MetaMap
http://metamap.nlm.nih.gov/
(MetaMap slides adapted from: http://skr.nlm.nih.gov/papers/references/10.11.14.MetaMapTutorial.pptx)
• A tool to identify concepts in clinical text
• Identifying a concept means mapping it to a terminology/ontology such as the UMLS Metathesaurus
• Concept identification is useful, often essential, for many tasks, including
  – Information extraction/data mining
  – Classification/categorization
  – Text summarization
  – Question answering
  – Literature-based knowledge discovery

Concept Identification Programs
• Selected programs that map biomedical text to a thesaurus
  – SAPHIRE (Hersh et al., 1990)
  – CLARIT (Evans et al., 1991)
  – MetaMap (Aronson et al., 1994)
  – Metaphrase (Tuttle et al., 1998)
  – MMTx (2001)
  – KnowledgeMap (Denny et al., 2003)
  – Mgrep (Meng, 2009, unpublished)
• Characteristics of MetaMap
  – Linguistic rigor
  – Flexible partial matching
  – Emphasis on thoroughness rather than speed
  – Restricted to English syntax and vocabulary

Example (best mappings)
• PMID – 9339686
• AB – "Cerebral blood flow (CBF) in newborn infants is often below levels necessary to sustain brain viability in adults."
• Concepts mapped from this sentence: Cerebrovascular Circulation; Infant, Newborn; CEREBRAL BLOOD FLOW IMAGING; Frequent; Levels (qualifier value); Sustained; Brain; Entire brain; Adult; Viable

Example (best mappings with WSD)
• PMID – 9339686
• The same sentence processed with word sense disambiguation (WSD)
• Concepts mapped from this sentence: Cerebrovascular Circulation; Infant, Newborn; CEREBRAL BLOOD FLOW IMAGING; Frequent; Levels (qualifier value); Sustained; Brain; Entire brain; Adult; Viable

MetaMap Examples
• "inferior vena caval stent filter" maps to
  – 'Inferior Vena Cava Filter' ('Vena Cava Filters') and
  – 'Stent'
• "medicine" with --allow_overmatches maps to
  – 'Alternative Medicine' or
  – 'Medical Records' or
  – 'Nuclear medicine procedure, NOS' or ...
• "pain on the left side of the chest" with --quick_composite_phrases maps to
  – 'Left sided chest pain' (under development)

Example: Normal Processing
Phrase: "lung cancer."
Meta Candidates (8):
  1000 Lung Cancer (Malignant neoplasm of lung) [Neoplastic Process]
  1000 Lung Cancer (Carcinoma of lung) [Neoplastic Process]
  861 Cancer (Malignant Neoplasms) [Neoplastic Process]
  861 Lung [Body Part, Organ, or Organ Component]
  861 Cancer (Cancer Genus) [Invertebrate]
  861 Lung (Entire lung) [Body Part, Organ, or Organ Component]
  861 Cancer (Specialty Type - cancer) [Biomedical Occupation or Discipline]
  768 Pneumonia [Disease or Syndrome]
Meta Mapping (1000):
  1000 Lung Cancer (Carcinoma of lung) [Neoplastic Process]
Meta Mapping (1000):
  1000 Lung Cancer (Malignant neoplasm of lung) [Neoplastic Process]

Example: Compound Mappings
Phrase: "obstructive sleep apnea."
Meta Candidates (8): ...
(with --compute_all_mappings)
Meta Mapping (1000):
  1000 Obstructive sleep apnoea (Sleep Apnea, Obstructive) [Disease or Syndrome]
Meta Mapping (901):
  827 Obstructive (Obstructed) [Functional Concept]
  901 Apnea, Sleep (Sleep Apnea Syndromes) [Disease or Syndrome]
Meta Mapping (851):
  827 Obstructive (Obstructed) [Functional Concept]
  827 Sleep [Organism Function]
  827 APNOEA (Apnea) [Pathologic Function]
…

Example: Show Sources (-G)
Phrase: "scorpion sting."
Meta Candidates (4):
  1000 Scorpion sting {MDR,DXP} [Injury or Poisoning]
  861 Sting (Sting Injury {MTH,MSH,MDR,RCD,SNM,SNOMEDCT,SNMI,WHO}) [Injury or Poisoning]
  694 Scorpion (Scorpions {LCH,MSH,MTH,SNM,SNOMEDCT,SNMI,CSP,RCD,NCBI}) [Invertebrate]
  694 SCORPION (Scorpion antigen {MTH,LNC}) [Immunologic Factor]
Meta Mapping (1000):
  1000 Scorpion sting {MDR,DXP} [Injury or Poisoning]

Example: Restrict to Sources (-GR LCH)
Phrase: "scorpion sting."
Meta Candidates (1):
  694 Scorpion (Scorpions {LCH}) [Invertebrate]
Meta Mapping (694):
  694 Scorpion (Scorpions {LCH}) [Invertebrate]

Example: Restrict to Semantic Types (-J neop)
Phrase: "lung cancer."
Meta Candidates (3):
  1000 Lung Cancer (Malignant neoplasm of lung) [Neoplastic Process]
  1000 Lung Cancer (Carcinoma of lung) [Neoplastic Process]
  861 Cancer (Malignant Neoplasms) [Neoplastic Process]
Meta Mapping (1000):
  1000 Lung Cancer (Carcinoma of lung) [Neoplastic Process]
Meta Mapping (1000):
  1000 Lung Cancer (Malignant neoplasm of lung) [Neoplastic Process]

cTAKES
http://incubator.apache.org/ctakes/
• cTAKES: Mayo clinical Text Analysis and Knowledge Extraction System
• Developed at Mayo Clinic, with collaborators
• Publicly available
• Core components
  – Sentence boundary detection (OpenNLP technology)
  – Tokenization (rule-based)
  – Morphologic normalization (NLM's LVG)
  – POS tagging (OpenNLP technology)
  – Shallow parsing (OpenNLP technology)
  – Named entity recognition
    • Dictionary mapping (lookup algorithm)
    • Semantic types: diseases/disorders, signs/symptoms, anatomical sites, procedures, medications
  – Assertion module
  – Dependency parser
  – Constituency parser
  – Semantic role labeler
  – Coreference resolver
  – Drug profile module
  – Smoking status classifier

cTAKES Example
(Figure from: http://informatics.mayo.edu/sharp/images/2/25/CTAKES.ppt)

MedLEE
(MedLEE slides adapted from: https://cdc.confex.com/cdc/phin2008/recordingredirect.cgi/id/4165)
• Medical Language Extraction and Encoding
  – Extracts, structures, and encodes clinical information in narrative patient reports
  – Comprehensive coverage
  – Can be used for diverse clinical applications
  – Development started in 1991
  – Used at Columbia University Medical Center since 1995
  – Numerous independent evaluations
• Rule-based system, constructed manually
• Now with the company Health Fidelity

Applications
Patient report: "....New maculopapular rash on trunk …." → MedLEE → coded data:
  Problem: rash
  Status: new
  Descriptor: maculopapular
  Bodyloc: trunk
  Code: C0460005 (trunk)
  Code: C0241488 (trunk maculopapular rash)
Analytics on the coded data support:
• Clinical guidance
  – Indicate potential notifiable disease for reporting.
  – Inform of local outbreak and indicate appropriate tests.
  – Indicate need for vaccination.
• Surveillance
  – Indicate potential bioterrorist event.
  – Transmit syndromic event to health department for surveillance.
• Quality assurance
  – Detect potential cases of medication reaction.
• Clinical research
  – Detect cases of rash for inclusion in trial of new treatment.
  – Find genetic associations with atopic rash.
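The Applications slide above can be sketched in code. The following is a minimal, hypothetical illustration in Python. It assumes the MedLEE output for a report has already been loaded into simple records. The `Finding` class, its field names and the routing rules are invented for illustration and are not MedLEE's actual schema or API. The sketch shows why coded data is convenient: downstream applications test fields such as problem and status instead of matching keywords in free text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


# Hypothetical record for one coded finding. Field names mirror the frame on
# the slide (problem, status, descriptor, bodyloc, codes); they are an
# assumption for illustration, not MedLEE's actual output schema or API.
@dataclass
class Finding:
    problem: str
    status: Optional[str] = None
    descriptor: Optional[str] = None
    bodyloc: Optional[str] = None
    codes: List[str] = field(default_factory=list)


def route(findings: List[Finding]) -> Dict[str, List[Finding]]:
    """Assign findings to illustrative downstream applications.

    The rules are hypothetical. The point is that they test structured
    fields (problem, status, descriptor) rather than searching raw text.
    """
    routed: Dict[str, List[Finding]] = {"surveillance": [], "clinical_research": []}
    for f in findings:
        # Surveillance: a newly reported rash, independent of surface wording
        # (assuming the NLP system has already normalized it).
        if f.problem == "rash" and f.status == "new":
            routed["surveillance"].append(f)
        # Clinical research: maculopapular rashes as candidate trial cases.
        if f.problem == "rash" and f.descriptor == "maculopapular":
            routed["clinical_research"].append(f)
    return routed


# The first record is the coded output for "New maculopapular rash on trunk."
# from the slide. The second is a made-up finding that matches neither rule.
findings = [
    Finding(problem="rash", status="new", descriptor="maculopapular",
            bodyloc="trunk", codes=["C0460005", "C0241488"]),
    Finding(problem="rash", status="resolved"),
]
routed = route(findings)
for app, items in routed.items():
    print(app, [f.codes for f in items])
```

Each rule is a small predicate over fields. Adding an application, for example a quality-assurance check for medication reactions, means adding another predicate rather than another list of search phrases.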
Text Reports Processed
• Radiology reports
• Cardiology reports
• Pathology reports
• Admission notes
• Discharge summaries
• Resident sign-out notes
• Office visits
• Telephone encounters

Applications using MedLEE
• Biosurveillance
• Syndromic surveillance
• Adverse drug event detection
• Decision support
• Clinical research
• Clinical trials
• Quality assurance
• Automated encoding
• Patient management
• Data mining: finding trends and associations
• Linking the patient record to the literature
• Summarization

MedLEE Example
Input: "New maculopapular rash on trunk."
Preprocessor:
  [new,maculopapular,rash,on,trunk,'.']
Parser:
  [problem,rash,[status,new],[descriptor,maculopapular],[bodyloc,trunk]]
Encoder:
  [problem,rash,[status,new],[descriptor,maculopapular],[bodyloc,trunk,[code,C0460005^trunk]],[code,C0241488^trunk maculopapular rash]]
Output format: XML
  <problem v="rash">
    <status v="new"></status>
    <descriptor v="maculopapular"></descriptor>
    <bodyloc v="trunk">
      <code v="C0460005^trunk"></code>
    </bodyloc>
    <code v="C0241488^trunk maculopapular rash"></code>
  </problem>

Detecting Clinical Events
• Find clinical events (e.g., adverse reactions, drug interactions) in medical records, for research purposes or to trigger alerts
• Simple keyword search is not sufficient. For example, it cannot tell "no pulmonary embolism" from "pulmonary embolism"
• An NLP-based system does a better job
  – Obtain a structured representation, e.g., using MedLEE
  – Query this representation
• Hripcsak et al. [2003] detected 45 types of adverse events (e.g., pulmonary embolism, medication errors) from discharge summaries (sensitivity: 0.15-0.37, specificity: 0.99)

Processing Radiology Reports
• Radiology reports are the genre of clinical reports to which NLP systems have been applied most often
• The Special Purpose Radiology Understanding System (SPRUS) extracted and coded findings and radiologists' interpretations [Haug et al. 1990]
• SymText identified pneumonia-related concepts and detected the presence or absence of bacterial pneumonia in radiology reports [Fiszman et al. 1998]
• It evolved into the MPLUS system, which could classify various brain conditions and chief complaints [Christensen et al. 2002; Chapman et al. 2005]
• It has since evolved into Onyx, which is being applied to dental exams [Christensen et al. 2007]

Other Clinical NLP Tasks
• Summarization: provide an overview of a patient record or of the scientific literature
  – For clinicians
  – For patients
• Question answering: provide short answers or short summaries in response to questions asked in natural language

Summarization
• A well-known task in general NLP
• Provide clinicians with a succinct summary of patient records
  – Categories needed: labs & tests, problems & treatments, history, findings, allergies, medications, plan, and identifying information
  – Meng et al. [2005] used semantic patterns to extract the information needed to generate a summary
• Summarizing scientific publications: Fiszman et al. [2004] present findings from the biomedical literature as graphical structures

Question Answering
• A well-known task in general NLP
• A clinical decision support (CDS) system could support clinical decision making by answering clinical questions that call for
  – Information on particular patients
  – Data on health and sickness within the local population
  – Medical knowledge
  – Answers to legal, social or ethical questions
• Currently such questions are answered by manually browsing EMRs
• NLP can help answer questions automatically when:
  – Questions are asked in natural language
  – Answers can be found in natural language text
• The MedQA system [Lee et al. 2006] answers definitional questions by integrating information retrieval, extraction and summarization techniques to generate paragraph-level answers

Direct Applications of NLP in Healthcare
• Analyzing text written by patients to gauge their mental status
• Monitoring medication compliance and drug abuse
These applications are currently experimental and not yet widely deployed.

Analyzing Patient Text
• The Linguistic Inquiry and Word Count (LIWC) tool [Pennebaker et al. 2003] has been used to analyze text written by patients for:
  – Predicting post-bereavement improvements in mental and physical health
  – Predicting adjustment to cancer
  – Distinguishing suicidal from non-suicidal individuals
• Roark et al. [2007] applied parsing methods to sentences produced by patients to diagnose mild cognitive impairment

Monitoring Medication Compliance and Drug Abuse
• Butler et al. [2007] used NLP methods for post-marketing surveillance of Internet chatter (message board postings, blogs, etc.) about pharmaceutical products, to detect abuse of certain drugs
• Malouf et al. [2006] mined Internet discussion groups of epilepsy patients and their caregivers for associations involving certain medications, such as side effects, risks and dosage-related issues
• Researchers are currently working on identifying adverse drug effects from Internet postings using NLP methods

Future Work and Conclusions
• CDS systems in general are not yet in wide use, and the same is true of NLP-based CDS systems, despite demonstrated benefits and local successes. However, interest has been renewed as health-care data becomes electronic
• Improving the future use of NLP in CDS will require:
  – Adapting to clinicians and gaining their trust
  – Progress in clinical NLP
  – Evaluating the impact on health care, not just performance on the specific NLP task