Clinical Natural Language Processing 2

Computational Intelligence in
Biomedical and Health Care Informatics
HCA 590 (Topics in Health Sciences)
Rohit Kate
Clinical Natural Language
Processing 2
Reading
• Chapter 15, Text 6
• Paper: Mayo clinical Text Analysis and Knowledge Extraction System
(cTAKES): Architecture, Component Evaluation and Applications
Guergana K Savova, James J Masanz, Philip V Ogren, Jiaping Zheng,
Sunghwan Sohn, Karin C Kipper-Schuler, Christopher G Chute
Journal of the American Medical Informatics Association 2010;17:507-513
Clinical NLP Systems
• General-purpose Clinical NLP Systems: Could be
applied to different tasks
– MetaMap
– cTAKES
– MedLEE
• Specialized Clinical NLP Systems
– Detecting clinical events
– Processing radiology, pathology and other reports
• General-purpose systems can be used to build
specialized systems, hence the distinction is
mostly in the purpose behind building the system
MetaMap
http://metamap.nlm.nih.gov/
MetaMap Slides adapted from:
http://skr.nlm.nih.gov/papers/references/10.11.14.MetaMapTutorial.pptx
• A tool to identify concepts in clinical text
• Identifying a concept means mapping it to a
terminology/ontology or UMLS Metathesaurus
• Concept identification is useful/essential for
many tasks including
– Information extraction/Data mining
– Classification/Categorization
– Text summarization
– Question answering
– Literature-based knowledge discovery
Concept Identification Programs
• Selected programs that map biomedical text to a thesaurus
– SAPHIRE (Hersh et al., 1990)
– CLARIT (Evans et al., 1991)
– MetaMap (Aronson et al., 1994)
– Metaphrase (Tuttle et al., 1998)
– MMTx (2001)
– KnowledgeMap (Denny et al., 2003)
– Mgrep (Meng, 2009--unpublished)
• Characteristics of MetaMap
– Linguistic rigor
– Flexible partial matching
– Emphasis on thoroughness rather than speed
– Restricted to English syntax and vocabulary
Example (best mappings)
• PMID – 9339686
• AB – Cerebral blood flow (CBF) in newborn infants is often below levels necessary to sustain brain viability in adults.
• Concepts mapped, in the order shown on the slide:
– Cerebrovascular Circulation
– Infant, Newborn
– CEREBRAL BLOOD FLOW IMAGING
– Frequent
– Levels (qualifier value)
– Sustained
– Brain
– Entire brain
– Adult
– Viable
Example (best mappings with WSD)
• PMID – 9339686
• AB – Cerebral blood flow (CBF) in newborn infants is often below levels necessary to sustain brain viability in adults.
• Concepts mapped, in the order shown on the slide:
– Cerebrovascular Circulation
– Infant, Newborn
– CEREBRAL BLOOD FLOW IMAGING
– Frequent
– Levels (qualifier value)
– Sustained
– Brain
– Entire brain
– Adult
– Viable
MetaMap Examples
• “inferior vena caval stent filter” maps to
– ‘Inferior Vena Cava Filter’ (‘Vena Cava Filters’) and
– ‘Stent’
• “medicine” with --allow_overmatches maps to
– ‘Alternative Medicine’ or
– ‘Medical Records’ or
– ‘Nuclear medicine procedure, NOS’ or ...
• “pain on the left side of the chest” with
--quick_composite_phrases maps to
– ‘Left sided chest pain’ (under development)
Example: Normal Processing
Phrase: “lung cancer.”
Meta Candidates (8):
1000 Lung Cancer (Malignant neoplasm of lung) [Neoplastic Process]
1000 Lung Cancer (Carcinoma of lung) [Neoplastic Process]
861 Cancer (Malignant Neoplasms) [Neoplastic Process]
861 Lung [Body Part, Organ, or Organ Component]
861 Cancer (Cancer Genus) [Invertebrate]
861 Lung (Entire lung) [Body Part, Organ, or Organ Component]
861 Cancer (Specialty Type - cancer) [Biomedical Occupation or Discipline]
768 Pneumonia [Disease or Syndrome]
Meta Mapping (1000):
1000 Lung Cancer (Carcinoma of lung) [Neoplastic Process]
Meta Mapping (1000):
1000 Lung Cancer (Malignant neoplasm of lung) [Neoplastic Process]
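The candidate lines above follow a regular pattern: a score, the matched string, an optional preferred name in parentheses, and a semantic type in square brackets. As a minimal sketch, assuming exactly that human-readable layout (without the `{source}` annotations that -G adds), they can be parsed into records as shown below. The names `Candidate`, `LINE_RE`, and `parse_candidate` are illustrative and not part of MetaMap.

```python
import re
from typing import NamedTuple, Optional


class Candidate(NamedTuple):
    score: int
    matched: str              # string as matched in the text
    preferred: Optional[str]  # preferred concept name, if printed
    semantic_type: str        # kept whole: some types contain commas


# score, matched string, optional "(preferred name)", "[semantic type]"
LINE_RE = re.compile(r"^\s*(\d+)\s+(.+?)\s*(?:\((.+)\))?\s*\[(.+)\]\s*$")


def parse_candidate(line: str) -> Candidate:
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError(f"not a candidate line: {line!r}")
    score, matched, preferred, semantic_type = match.groups()
    return Candidate(int(score), matched, preferred, semantic_type)


# Candidate lines copied from the "lung cancer" example.
lines = [
    "1000 Lung Cancer (Malignant neoplasm of lung) [Neoplastic Process]",
    "1000 Lung Cancer (Carcinoma of lung) [Neoplastic Process]",
    "861 Cancer (Malignant Neoplasms) [Neoplastic Process]",
    "861 Lung [Body Part, Organ, or Organ Component]",
    "861 Cancer (Cancer Genus) [Invertebrate]",
    "861 Lung (Entire lung) [Body Part, Organ, or Organ Component]",
    "861 Cancer (Specialty Type - cancer) [Biomedical Occupation or Discipline]",
    "768 Pneumonia [Disease or Syndrome]",
]

candidates = [parse_candidate(line) for line in lines]
for candidate in candidates:
    print(candidate)
```

The semantic type is stored as a single string because a type such as "Body Part, Organ, or Organ Component" would be broken apart by naive comma splitting.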
Example: Compound Mappings
Phrase: “obstructive sleep apnea.”
Meta Candidates (8):
...
with --compute_all_mappings
Meta Mapping (1000):
1000 Obstructive sleep apnoea (Sleep Apnea, Obstructive) [Disease or Syndrome]
Meta Mapping (901):
827 Obstructive (Obstructed) [Functional Concept]
901 Apnea, Sleep (Sleep Apnea Syndromes) [Disease or Syndrome]
Meta Mapping (851):
827 Obstructive (Obstructed) [Functional Concept]
827 Sleep [Organism Function]
827 APNOEA (Apnea) [Pathologic Function]
…
Example: Show Sources (-G)
Phrase: “scorpion sting.”
Meta Candidates (4):
1000 Scorpion sting {MDR,DXP} [Injury or Poisoning]
861 Sting (Sting Injury {MTH,MSH,MDR,RCD,SNM,SNOMEDCT,SNMI,WHO}) [Injury or Poisoning]
694 Scorpion (Scorpions {LCH,MSH,MTH,SNM,SNOMEDCT,SNMI,CSP,RCD,NCBI}) [Invertebrate]
694 SCORPION (Scorpion antigen {MTH,LNC}) [Immunologic Factor]
Meta Mapping (1000):
1000 Scorpion sting {MDR,DXP} [Injury or Poisoning]
Example: Restrict to Sources (-GR LCH)
Phrase: “scorpion sting.”
Meta Candidates (1):
694 Scorpion (Scorpions {LCH}) [Invertebrate]
Meta Mapping (694):
694 Scorpion (Scorpions {LCH}) [Invertebrate]
Example: Restrict to Semantic Types (-J neop)
Phrase: “lung cancer.”
Meta Candidates (3):
1000 Lung Cancer (Malignant neoplasm of lung) [Neoplastic Process]
1000 Lung Cancer (Carcinoma of lung) [Neoplastic Process]
861 Cancer (Malignant Neoplasms) [Neoplastic Process]
Meta Mapping (1000):
1000 Lung Cancer (Carcinoma of lung) [Neoplastic Process]
Meta Mapping (1000):
1000 Lung Cancer (Malignant neoplasm of lung) [Neoplastic Process]
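The effect of restricting to a semantic type can be imitated as a filter over the unrestricted candidate list, followed by picking the top-scoring survivors. This is only a sketch of the observable behavior in the examples above, not of how MetaMap implements -J internally. The tuple layout and the function names `restrict_to_types` and `best_candidates` are assumptions made for illustration.

```python
# (score, preferred concept name, semantic type), taken from the
# unrestricted "lung cancer" example.
CANDIDATES = [
    (1000, "Malignant neoplasm of lung", "Neoplastic Process"),
    (1000, "Carcinoma of lung", "Neoplastic Process"),
    (861, "Malignant Neoplasms", "Neoplastic Process"),
    (861, "Lung", "Body Part, Organ, or Organ Component"),
    (861, "Cancer Genus", "Invertebrate"),
    (861, "Entire lung", "Body Part, Organ, or Organ Component"),
    (861, "Specialty Type - cancer", "Biomedical Occupation or Discipline"),
    (768, "Pneumonia", "Disease or Syndrome"),
]


def restrict_to_types(candidates, allowed_types):
    """Keep only candidates whose semantic type is in allowed_types."""
    return [c for c in candidates if c[2] in allowed_types]


def best_candidates(candidates):
    """Return every candidate that shares the highest score."""
    if not candidates:
        return []
    top = max(score for score, _, _ in candidates)
    return [c for c in candidates if c[0] == top]


neoplasms = restrict_to_types(CANDIDATES, {"Neoplastic Process"})
print(len(neoplasms))              # number of candidates left after filtering
print(best_candidates(neoplasms))  # top-scoring concepts among them
```

Filtering leaves the three Neoplastic Process candidates, and the two with score 1000 correspond to the two mappings listed in the example.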
cTAKES
http://incubator.apache.org/ctakes/
• cTAKES: Mayo clinical Text Analysis and Knowledge Extraction System
• Developed at Mayo clinic with collaborations
• Publicly available
• Core components
– Sentence boundary detection (OpenNLP technology)
– Tokenization (rule-based)
– Morphologic normalization (NLM's LVG)
– POS tagging (OpenNLP technology)
– Shallow parsing (OpenNLP technology)
– Named Entity Recognition
  • Dictionary mapping (lookup algorithm)
  • Semantic types: diseases/disorders, signs/symptoms, anatomical sites, procedures, medications
– Assertion module
– Dependency parser
– Constituency parser
– Semantic Role Labeler
– Coreference resolver
– Drug Profile module
– Smoking status classifier
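To make the first few pipeline stages concrete, here is a toy sketch of sentence splitting, rule-based tokenization, normalization, and dictionary-lookup entity recognition chained together. It is not cTAKES code and uses none of its components: the regular expressions, the lowercase "normalization" (a crude stand-in for LVG), and the three-entry `DICTIONARY` are all assumptions chosen to keep the example small.

```python
import re

# Hypothetical mini-dictionary: normalized token sequence -> semantic type.
DICTIONARY = {
    ("chest", "pain"): "sign/symptom",
    ("pneumonia",): "disease/disorder",
    ("aspirin",): "medication",
}
MAX_TERM_LEN = max(len(term) for term in DICTIONARY)


def split_sentences(text):
    """Split after sentence-final punctuation followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Words and single punctuation marks become separate tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def normalize(token):
    """Lowercasing only; real morphologic normalization does far more."""
    return token.lower()


def lookup_entities(tokens):
    """Greedy longest-match lookup of token spans in DICTIONARY."""
    normalized = [normalize(t) for t in tokens]
    entities, i = [], 0
    while i < len(normalized):
        for n in range(min(MAX_TERM_LEN, len(normalized) - i), 0, -1):
            key = tuple(normalized[i:i + n])
            if key in DICTIONARY:
                entities.append((" ".join(tokens[i:i + n]), DICTIONARY[key]))
                i += n
                break
        else:
            i += 1  # no dictionary term starts at this token
    return entities


def annotate(text):
    return [(s, lookup_entities(tokenize(s))) for s in split_sentences(text)]


for sentence, entities in annotate(
    "Patient reports chest pain. Started Aspirin for pneumonia."
):
    print(sentence, entities)
```

Each stage consumes the output of the previous one, which is the design the component list describes; later modules such as assertion or coreference would operate on the entities found here.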
cTAKES Example
From: http://informatics.mayo.edu/sharp/images/2/25/CTAKES.ppt
MedLEE
(MedLEE slides adapted from: https://cdc.confex.com/cdc/phin2008/recordingredirect.cgi/id/4165)
• Medical Language Extraction and Encoding
– Extracts, structures, and encodes clinical information
in narrative patient reports
– Comprehensive coverage
– Can be used for diverse clinical applications
– Development started in 1991
– Used at Columbia University Medical Center since
1995
– Numerous independent evaluations
• Rule-based system constructed manually
• Now with a company “Health Fidelity”
Applications
• Patient report: “....New maculopapular rash on trunk ….”
• MedLEE output (coded data):
– Problem: rash
– Status: new
– Descriptor: maculopapular
– Bodyloc: trunk
– Code: C0460005 (trunk)
– Code: C0241488 (trunk maculopapular rash)
• Analytics on the coded data, by application area:
• Clinical Guidance
– Indicate potential notifiable disease for reporting.
– Inform of local outbreak and indicate appropriate tests.
– Indicate need for vaccination.
• Surveillance
– Indicate potential bioterrorist event.
– Transmit syndromic event to health dept for surveillance.
• Quality Assurance
– Detect potential cases of medication reaction.
• Clinical Research
– Detect cases of rash for inclusion in trial of new treatment.
– Find genetic associations with atopic rash.
Text Reports Processed
• Radiology Reports
• Cardiology Reports
• Pathology Reports
• Admission notes
• Discharge Summaries
• Resident Sign out notes
• Office Visits
• Telephone encounters
Applications using MedLEE
• Biosurveillance
• Syndromic surveillance
• Adverse Drug Event detection
• Decision Support
• Clinical Research
• Clinical Trials
• Quality Assurance
• Automated Encoding
• Patient Management
• Data mining – finding trends and associations
• Linking patient record to the literature
• Summarization
MedLEE Example
“New maculopapular rash on trunk.”
Preprocessor:
[new,maculopapular,rash,on,trunk,'.']
Parser:
[problem,rash,[status,new],[descriptor,maculopapular],[bodyloc,trunk]]
Encoder:
[problem,rash,[status,new],[descriptor,maculopapular],[bodyloc,trunk,[code,C0460005^trunk]],[code,C0241488^trunk maculopapular rash]]
Output format: XML
<problem v = "rash">
<status v = "new"></status>
<descriptor v = "maculopapular"></descriptor>
<bodyloc v = "trunk">
<code v = "C0460005^trunk"></code>
</bodyloc>
<code v = "C0241488^trunk maculopapular rash"></code>
</problem>
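Once the output is XML, a downstream application can read it with any standard XML parser. The sketch below uses Python's `xml.etree.ElementTree` on a well-formed version of the output above (straight quotes, closed tags, and the trunk code C0460005 as given in the encoder step). The function name `to_record` and the dictionary layout are illustrative assumptions, not part of MedLEE.

```python
import xml.etree.ElementTree as ET

MEDLEE_XML = """
<problem v="rash">
  <status v="new"></status>
  <descriptor v="maculopapular"></descriptor>
  <bodyloc v="trunk">
    <code v="C0460005^trunk"></code>
  </bodyloc>
  <code v="C0241488^trunk maculopapular rash"></code>
</problem>
"""


def to_record(problem):
    """Flatten one <problem> element into a plain dictionary."""
    record = {"problem": problem.get("v")}
    for child in problem:
        if child.tag != "code":
            record[child.tag] = child.get("v")
    # iter() also visits nested elements, so the code under <bodyloc>
    # is collected together with the top-level one, in document order.
    record["codes"] = [
        tuple(code.get("v").split("^", 1)) for code in problem.iter("code")
    ]
    return record


record = to_record(ET.fromstring(MEDLEE_XML.strip()))
print(record)
```

Splitting each code value on `^` separates the concept identifier from its label, which makes the record easy to query by code.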
Detecting Clinical Events
• Find clinical events, for example, adverse reactions,
drug interactions etc. from medical records for
research purposes or to trigger alerts
• Simple keyword search is not sufficient
• An NLP based system does a better job
– Get a structured representation, for example, using
MedLEE
– Query on this representation
• Hripcsak et al. [2003] detected 45 types of adverse
events from discharge summaries, for example,
pulmonary embolism, medication errors etc.
(sensitivity: 0.15-0.37, specificity: 0.99)
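The difference between keyword search and querying a structured representation can be illustrated with a small, entirely hypothetical data set. The `findings` entries below are hand-written stand-ins for what an NLP system might produce. The field names `problem` and `certainty` are assumptions for this sketch and do not reproduce any system's actual output schema.

```python
# Three invented report snippets with hand-written structured findings.
REPORTS = [
    {
        "id": 1,
        "text": "CT shows pulmonary embolism in right lower lobe.",
        "findings": [{"problem": "pulmonary embolism", "certainty": "high"}],
    },
    {
        "id": 2,
        "text": "No evidence of pulmonary embolism.",
        "findings": [{"problem": "pulmonary embolism", "certainty": "no"}],
    },
    {
        "id": 3,
        "text": "Clot seen in pulmonary artery, consistent with embolus.",
        "findings": [{"problem": "pulmonary embolism", "certainty": "high"}],
    },
]


def keyword_search(reports, phrase):
    """Return ids of reports whose raw text contains the phrase."""
    return [r["id"] for r in reports if phrase in r["text"].lower()]


def structured_query(reports, problem):
    """Return ids of reports with a non-negated finding of the problem."""
    return [
        r["id"]
        for r in reports
        if any(
            f["problem"] == problem and f["certainty"] != "no"
            for f in r["findings"]
        )
    ]


print(keyword_search(REPORTS, "pulmonary embolism"))
print(structured_query(REPORTS, "pulmonary embolism"))
```

The keyword search returns the negated report 2 and misses the paraphrased report 3, while the structured query returns reports 1 and 3. Negation and lexical variation are two reasons simple keyword search is not sufficient.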
Processing Radiology Reports
• Radiology reports are the genre of clinical report to which
most NLP systems have been applied
• Special Purpose Radiology Understanding System (SPRUS)
extracted and coded findings and radiologists’
interpretation [Haug et al. 1990]
• SymText identified pneumonia-related concepts and
detected presence and absence of bacterial pneumonia
from radiology reports [Fiszman et al. 1998]
• SymText evolved into the MPLUS system, which could classify
various brain conditions and chief complaints [Christensen et al. 2002;
Chapman et al. 2005]
• MPLUS has since evolved into Onyx, which is being applied to dental
exams [Christensen et al. 2007]
Other Clinical NLP Tasks
• Summarization: Provide an overview of
patient record or scientific literature
– For clinicians
– For patients
• Question-Answering: Provide short answers
or short summaries to questions asked in
natural language
Summarization
• A well-known task in general NLP
• Provide clinicians a succinct summary of patient
records
– Categories necessary: Labs & tests, problem &
treatment, history, findings, allergies, meds, plan, and
identifying information
– Meng et al. [2005] used semantic patterns to extract
the information needed to generate a summary
• Summarizing scientific publications: Fiszman et al.
[2004] present findings from the biomedical literature
as graphical structures
Question-Answering
• A well-known task in general NLP
• A clinical decision support (CDS) system could support clinical decision
making by answering clinical questions
– Provide information on particular patients
– Data on health and sickness within the local population
– Medical knowledge
– Other legal, social or ethical questions
• Currently these are answered by manually browsing EMRs
• NLP can help in automatically answering questions when:
– Questions are in natural language
– Answers could be found in natural language text
• MedQA system [Lee et al. 2006] answers definitional questions by
integrating information retrieval, extraction and summarization
techniques to generate paragraph level answers
Direct Applications of NLP in Healthcare
• Analyzing text written by patients to gauge
their mental status
• Monitoring medication compliance and drug
abuse
These applications are currently experimental
and not fully deployed.
Analyzing Patient Text
• Linguistic Inquiry and Word Count (LIWC) tool
[Pennebaker et al. 2003] was used to analyze text
written by patients for:
– Predicting post-bereavement improvements in mental
and physical health
– Predicting adjustment to cancer
– Recognizing suicidal and non-suicidal individuals
• Roark et al. [2007] applied parsing methods to
analyze sentences by patients to diagnose mild
cognitive impairment
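Word-count tools of this kind assign words to predefined categories and report how often each category occurs in a text. The following is a minimal sketch of that idea only: the three-category `LEXICON` is invented for illustration and is not the LIWC dictionary, and `category_rates` is a hypothetical helper name.

```python
import re
from collections import Counter

# Invented toy lexicon; a real tool uses a much larger, validated dictionary.
LEXICON = {
    "negative_emotion": {"sad", "hopeless", "afraid", "hurt"},
    "positive_emotion": {"happy", "hope", "good", "love"},
    "first_person": {"i", "me", "my"},
}


def category_rates(text):
    """Return, per category, the fraction of words that belong to it."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for word in words:
        for category, vocabulary in LEXICON.items():
            if word in vocabulary:
                counts[category] += 1
    total = len(words) or 1  # avoid division by zero on empty input
    return {category: counts[category] / total for category in LEXICON}


rates = category_rates("I feel sad and hopeless, but my family gives me hope.")
for category, rate in rates.items():
    print(f"{category}: {rate:.2f}")
```

Rates like these can then serve as features for the kinds of prediction tasks listed above.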
Monitoring Medication Compliance and Drug Abuse
• Butler et al. [2007] used NLP methods for post-marketing
surveillance of Internet chatter (message board postings, blogs etc.)
related to pharmaceutical products, in order to detect abuse of
certain drugs
• Malouf et al. [2006] mined Internet discussion groups of epilepsy
patients and their caregivers for associations with certain
medications, such as side effects, risks and dosage-related issues
• Researchers are currently working on detecting
adverse drug effects from Internet postings using NLP
methods
Future Work and Conclusions
• CDS systems in general, and NLP-CDS systems in particular,
are not yet in wide use despite demonstrated benefits and
local successes; interest has been renewed as
health-care data becomes electronic
• Improving future use of NLP in CDS will need:
– Adapting to clinicians and gaining their trust
– Progress in clinical NLP
– Evaluation of impact on health care, not just on the
specific NLP task