Clinical Text Analysis & Knowledge Extraction System (cTAKES)

Recent Efforts in Clinical NLP:
Clinical Text Analysis and Knowledge
Extraction System (cTAKES)
Guergana K. Savova, PhD
Children’s Hospital Boston and Harvard
Medical School
Acknowledgements
Software developers and contributors at different times (in no specific
order)
James Masanz, Mayo Clinic
Patrick Duffy, Mayo Clinic
Philip Ogren, University of Colorado
Sean Murphy, Mayo Clinic
Vinod Kaggal, Mayo Clinic
Jiaping Zheng, Children’s Hospital Boston
Pei Chen, Children’s Hospital Boston
Jinho Choi, University of Colorado
Investigators (in no specific order)
Christopher Chute, MD, DrPH, Mayo Clinic
James Buntrock, MS, Mayo Clinic
Guergana Savova, PhD, Children’s Hospital Boston
Overview
• Background
• Clinical Text Analysis and Knowledge Extraction System (cTAKES)
• cTAKES for developers
  • Download and install of cTAKES
  • How to build the dictionary
• cTAKES: graphical user interface
Definitions
• Information Extraction (IE)
• Extracting existing facts from unstructured or loosely structured text
into a structured form
• Information Retrieval (IR)
• Finding documents relevant to a user query
• Named Entity Recognition (NER)
• Discovery of groups of textual mentions that belong to a certain
semantic class
• Natural Language Processing (NLP)
• Computational methods for text processing based on linguistically
sound principles
• Clinical NLP – NLP for the clinical narrative
• Biomedical NLP – NLP for the clinical narrative and biomedical
literature
Problem Space
• Structured information
• Relational databases
• Easy to extract information from them
• Semi-structured information
• Loosely formatted XML, CSV tables
• Not challenging to extract information
• Unstructured information
• Scholarly literature, clinical notes, research reports, webpages
• The majority of information is unstructured!
• Real challenge to extract the information
Overarching Goal
Open-source, general-purpose clinical NLP toolkit
• Phenotype extraction from unstructured data
• Library of modules
• Cohesive with other initiatives
• Cutting-edge methodologies
• Best software development practices
Our principles
• Open source
• Scalable and robust
• Modular and expandable
• Based on existing standards and conventions
• Scalable, adaptable methodologies through open collaboration in
open-source development
Processing Clinical Notes
A 43-year-old woman was diagnosed with type 2 diabetes
mellitus by her family physician 3 months before this
presentation. Her initial blood glucose was 340 mg/dL.
Glyburide 2.5 mg once daily was prescribed. Since then,
self-monitoring of blood glucose (SMBG) showed blood
glucose levels of 250-270 mg/dL. She was referred to an
endocrinologist for further evaluation.
On examination, she was normotensive and not acutely
ill. Her body mass index (BMI) was 18.7 kg/m2 following
a recent 10 lb weight loss. Her thyroid was
symmetrically enlarged and ankle reflexes absent. Her
blood glucose was 272 mg/dL, and her hemoglobin A1c
(HbA1c) was 10.3%. A lipid profile showed a total
cholesterol of 261 mg/dL, triglyceride level of 321
mg/dL, HDL level of 48 mg/dL, and an LDL of 150 mg/dL.
Thyroid function was normal. Urinalysis showed trace
ketones.
She adhered to a regular exercise program and vitamin
regimen, smoked 2 packs of cigarettes daily for the
past 25 years, and limited her alcohol intake to 1
drink daily. Her mother's brother was diabetic.
Clinical Element Model
http://intermountainhealthcare.org/cem/Pages/home.aspx
Disorder CEM
• text: diabetes mellitus
• code: 73211009
• subject: patient
• relative temporal context: 3 months ago
• negation indicator: not negated
Medication CEM
• text: Glyburide
• code: 315989
• subject: patient
• frequency: once daily
• negation indicator: not negated
• strength: 2.5 mg
Tobacco Use CEM
• text: smoking
• code: 365981007
• subject: patient
• relative temporal context: 25 years
• negation indicator: not negated
Disorder CEM
• text: diabetes mellitus
• code: 73211009
• subject: family member
• relative temporal context:
• negation indicator: not negated
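The CEM instances above are essentially flat attribute records. A minimal sketch of how such a record might be held and serialized in Python follows. The class name, field names, and the `to_json` helper are hypothetical illustrations and are not the actual CEM schema or any cTAKES API; only the values are taken from the slide.

```python
from dataclasses import dataclass, asdict
from typing import Optional
import json


@dataclass
class DisorderCEM:
    """Hypothetical flat record mirroring the Disorder CEM fields on the slide."""
    text: str
    code: str
    subject: str
    relative_temporal_context: Optional[str]
    negated: bool


# The two disorder instances from the example note.
patient_dm = DisorderCEM("diabetes mellitus", "73211009", "patient", "3 months ago", False)
family_dm = DisorderCEM("diabetes mellitus", "73211009", "family member", None, False)


def to_json(cem: DisorderCEM) -> str:
    """Serialize a record to JSON with stable key order."""
    return json.dumps(asdict(cem), sort_keys=True)
```

Note that the two records share the same text and code and differ only in `subject`, which is why the subject attribute matters for downstream use.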
A 43-year-old woman was diagnosed with type 2 diabetes
mellitus by her family physician 3 months before this
presentation. Her initial blood glucose was 340 mg/dL.
Glyburide 2.5 mg once daily was prescribed. Since then,
self-monitoring of blood glucose (SMBG) showed blood
glucose levels of 250-270 mg/dL. She was referred to an
endocrinologist for further evaluation.
On examination, she was normotensive and not acutely
ill. Her body mass index (BMI) was 18.7 kg/m2 following
a recent 10 lb weight loss. Her thyroid was
symmetrically enlarged and ankle reflexes absent. Her
blood glucose was 272 mg/dL, and her hemoglobin A1c
(HbA1c) was 10.3%. A lipid profile showed a total
cholesterol of 261 mg/dL, triglyceride level of 321
mg/dL, HDL level of 48 mg/dL, and an LDL of 150 mg/dL.
Thyroid function was normal. Urinalysis showed trace
ketones.
She adhered to a regular exercise program and vitamin
regimen, smoked 2 packs of cigarettes daily for the
past 25 years, and limited her alcohol intake to 1
drink daily. Her mother's brother was diabetic.
Comparative Effectiveness
Compare the effectiveness of different treatment
strategies (e.g., modifying target levels for glucose,
lipid, or blood pressure) in reducing cardiovascular
complications in newly diagnosed adolescents and
adults with type 2 diabetes.
Compare the effectiveness of traditional behavioral
interventions versus economic incentives in
motivating behavior changes (e.g., weight loss,
smoking cessation, avoiding alcohol and substance
abuse) in children and adults.
Meaningful Use
• Maintain problem list
• Maintain active med list
• Record smoking status
• Provide clinical summaries for each office visit
• Generate patient lists for specific conditions
• Submit syndromic surveillance data
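As an illustration of the "generate patient lists for specific conditions" item, the sketch below filters CEM-style records by code. The record layout, the `patient_id` key, and the function name are assumptions made for this example; they do not come from cTAKES or from any Meaningful Use specification.

```python
from typing import Dict, List


def patients_with_condition(records: List[Dict], code: str) -> List[str]:
    """Return sorted ids of patients with a non-negated disorder asserted about the patient."""
    ids = {
        r["patient_id"]
        for r in records
        if r["code"] == code and r["subject"] == "patient" and not r["negated"]
    }
    return sorted(ids)


# Hypothetical extracted records: only p1 actually has the condition.
records = [
    {"patient_id": "p1", "code": "73211009", "subject": "patient", "negated": False},
    {"patient_id": "p2", "code": "73211009", "subject": "family member", "negated": False},
    {"patient_id": "p3", "code": "73211009", "subject": "patient", "negated": True},
]
```

The family-history mention and the negated mention are excluded, which is the practical reason the subject and negation attributes are extracted alongside the code.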
Clinical Practice
• Provide problem list and meds from the visit
Applications
• Meaningful use of the EMR
• Comparative effectiveness
• Clinical investigation
  • Patient cohort identification
  • Phenotype extraction
• Epidemiology
• Clinical practice
• and many more…
With deep semantic processing, the sky is the limit for applications
Partnerships
• NCBC-funded initiatives
  • Integrating Data for Analysis, Anonymization and Sharing (iDASH)
  • Ontology Development and Information Extraction (ODIE)
• Veterans Administration
• Strategic Health Advanced Research Projects (SHARP)
  • SHARP 3: SMaRT app (http://www.smartplatforms.org/)
  • SHARP 4: www.sharpn.org
• R01s
  • Shared annotated lexical resource
  • Temporal relation discovery for the clinical domain
  • Multi-source integrated platform for answering clinical questions
• eMERGE, PGRN (Pharmacogenomics Research Network)
• Linguistic Data Consortium and Penn Treebank
• MITRE Corporation
Integrating cTAKES within i2b2
…a scalable informatics framework that will enable clinical
researchers to use existing clinical data for discovery research
and, when combined with IRB-approved genomic data,
facilitate the design of targeted therapies for individual
patients with diseases having genetic origins.
• Querying encrypted clinical notes stored in the i2b2 database
• Processing the resulting notes through cTAKES
• Persisting extracted concepts into the i2b2 database
  • Thus, the concepts are now searchable by the researcher
• Enabling the training and running of classifiers directly from the
i2b2 workbench
https://www.i2b2.org/events/slides/i2b2_AMIA_Tutorial_20100310.pdf
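The query, process, persist flow described for the i2b2 integration can be sketched roughly as follows. This is a toy illustration only: the table names, column names, and the `toy_extract` stand-in are hypothetical, the real i2b2 schema and the cTAKES pipeline are not used, and decryption of the notes is left out entirely.

```python
import sqlite3
from typing import Callable, List, Tuple


def toy_extract(text: str) -> List[Tuple[str, str]]:
    """Stand-in for the NLP step: return (code, mention) pairs found in the text."""
    lexicon = {"diabetes mellitus": "73211009"}  # code taken from the example slide
    lowered = text.lower()
    return [(code, term) for term, code in lexicon.items() if term in lowered]


def run_nlp_over_notes(conn: sqlite3.Connection,
                       extract: Callable[[str], List[Tuple[str, str]]]) -> None:
    """Query notes, process each one, and persist the extracted concepts."""
    notes = conn.execute("SELECT note_id, note_text FROM notes").fetchall()
    for note_id, text in notes:
        for code, mention in extract(text):
            conn.execute(
                "INSERT INTO concepts (note_id, code, mention) VALUES (?, ?, ?)",
                (note_id, code, mention),
            )
    conn.commit()


def notes_with_concept(conn: sqlite3.Connection, code: str) -> List[int]:
    """Search step: ids of notes in which the given concept code was found."""
    rows = conn.execute(
        "SELECT note_id FROM concepts WHERE code = ? ORDER BY note_id", (code,)
    ).fetchall()
    return [row[0] for row in rows]


# In-memory database with two hypothetical notes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (note_id INTEGER PRIMARY KEY, note_text TEXT)")
conn.execute("CREATE TABLE concepts (note_id INTEGER, code TEXT, mention TEXT)")
conn.execute("INSERT INTO notes (note_text) VALUES (?)",
             ("Diagnosed with type 2 diabetes mellitus.",))
conn.execute("INSERT INTO notes (note_text) VALUES (?)",
             ("Thyroid function was normal.",))
run_nlp_over_notes(conn, toy_extract)
```

Once concepts are stored as rows keyed by note, a researcher's search becomes an ordinary database query, for example `notes_with_concept(conn, "73211009")`.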
clinical Text Analysis and Knowledge
Extraction System (cTAKES)
cTAKES Adoption
May 2011:
• 2306 downloads*
• eMERGE (SGH, NW)
• PGRN (HMS, NW)
• Extensions: Yale (YATEX), MITRE
* Source: http://sourceforge.net/project/stats/?group_id=255545&ugn=ohnlp&type=&mode=alltime
cTAKES Technical Details
• Open source
• Apache v2.0 license
• http://sourceforge.net/projects/ohnlp/
• Java 1.5
• Dependency on UMLS which requires a UMLS license (free)
• Framework
• IBM’s Unstructured Information Management Architecture (UIMA)
open source framework, Apache project
• Methods
• Natural Language Processing methods (NLP)
• Based on standards and conventions to foster interoperability
• Application
• High-throughput system
cTAKES: Components
• Sentence boundary detection (OpenNLP technology)
• Tokenization (rule-based)
• Morphologic normalization (NLM’s LVG)
• POS tagging (OpenNLP technology)
• Shallow parsing (OpenNLP technology)
• Named Entity Recognition
  • Dictionary mapping (lookup algorithm)
  • Machine learning (MAWUI)
  • Types: diseases/disorders, signs/symptoms, anatomical sites, procedures, medications
  • Negation and context identification (NegEx)
• Dependency parser
• Drug Profile module
• Smoking status classifier
• CEM normalization module (soon to be released)
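To make the component sequence concrete, here is a toy sketch of the first few stages: sentence boundary detection, tokenization, and dictionary-based named entity lookup. It is not the cTAKES implementation. Regular expressions stand in for the OpenNLP models, the longest-match lookup is one simple choice of lookup algorithm, and the two-entry dictionary and all function names are hypothetical.

```python
import re
from typing import Dict, List, Tuple


def split_sentences(text: str) -> List[str]:
    """Naive boundary detection: split after ., ! or ? when a capital letter follows."""
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", text)
    return [p.strip() for p in parts if p.strip()]


def tokenize(sentence: str) -> List[str]:
    """Rule-based tokenization that keeps decimal numbers such as 2.5 together."""
    return re.findall(r"\d+(?:\.\d+)?|\w+|[^\w\s]", sentence)


def lookup(tokens: List[str], dictionary: Dict[str, str],
           max_len: int = 3) -> List[Tuple[str, str]]:
    """Longest-match dictionary lookup over token windows of up to max_len tokens."""
    lowered = [t.lower() for t in tokens]
    found = []
    i = 0
    while i < len(lowered):
        for n in range(min(max_len, len(lowered) - i), 0, -1):
            phrase = " ".join(lowered[i:i + n])
            if phrase in dictionary:
                found.append((phrase, dictionary[phrase]))
                i += n
                break
        else:
            i += 1  # no dictionary entry starts at this token
    return found


def annotate(text: str, dictionary: Dict[str, str]) -> List[Tuple[str, str]]:
    """Run the stages in order and collect (mention, semantic type) pairs."""
    results = []
    for sentence in split_sentences(text):
        results.extend(lookup(tokenize(sentence), dictionary))
    return results


# Hypothetical mini-dictionary mapping terms to semantic types.
DICTIONARY = {"diabetes mellitus": "disorder", "glyburide": "medication"}
mentions = annotate(
    "She has diabetes mellitus. Glyburide 2.5 mg once daily was prescribed.",
    DICTIONARY,
)
```

Each stage consumes the output of the previous one, which mirrors the modular design listed above: any stage could be swapped for a better implementation without changing the others.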
Output Example: Drug Object
• “Tamoxifen 20 mg po daily started on March 1, 2005.”
• Drug
  • Text: Tamoxifen
  • Associated code: C0351245
  • Strength: 20 mg
  • Start date: March 1, 2005
  • End date: null
  • Dosage: 1.0
  • Frequency: 1.0
  • Frequency unit: daily
  • Duration: null
  • Route: Enteral Oral
  • Form: null
  • Status: current
  • Change Status: no change
  • Certainty: null
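A rough sketch of how a few of the drug attributes above could be pulled from the example sentence is shown below. It uses plain regular expressions and is an assumption-laden illustration, not the algorithm of the Drug Profile module. The `DrugMention` class, the `ROUTES` table, and `parse_drug` are hypothetical names; the "Enteral Oral" label is copied from the slide.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping from route abbreviations to normalized labels.
ROUTES = {"po": "Enteral Oral"}


@dataclass
class DrugMention:
    text: str
    strength: Optional[str] = None
    route: Optional[str] = None
    frequency_unit: Optional[str] = None
    start_date: Optional[str] = None


def parse_drug(sentence: str, drug_name: str) -> DrugMention:
    """Fill in drug attributes that simple patterns can find; others stay None."""
    mention = DrugMention(text=drug_name)

    strength = re.search(r"(\d+(?:\.\d+)?)\s*(mg|g|mcg)\b", sentence)
    if strength:
        mention.strength = f"{strength.group(1)} {strength.group(2)}"

    for abbrev, label in ROUTES.items():
        if re.search(rf"\b{abbrev}\b", sentence, re.IGNORECASE):
            mention.route = label

    frequency = re.search(r"\b(daily|weekly|monthly)\b", sentence, re.IGNORECASE)
    if frequency:
        mention.frequency_unit = frequency.group(1).lower()

    start = re.search(r"started on ([A-Z][a-z]+ \d{1,2}, \d{4})", sentence)
    if start:
        mention.start_date = start.group(1)

    return mention


drug = parse_drug("Tamoxifen 20 mg po daily started on March 1, 2005.", "Tamoxifen")
```

Attributes with no textual evidence remain `None`, matching the `null` values in the slide's output.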
Output Example: Disorder Object
• “No evidence of cholangiocarcinoma.”
• Disorder
  • Text: cholangiocarcinoma
  • Associated code: SNOMED 70179006
  • Certainty: 1
  • Context: current
  • Relatedness to patient: true
  • Status: negated
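The "negated" status in this example comes from negation detection. The sketch below shows the general idea in the spirit of NegEx: look for a negation trigger in a short window before the mention. It is heavily simplified; the trigger list, the window size, and the function name are assumptions for illustration, and the real NegEx algorithm uses much larger trigger lists and additional scope rules.

```python
import re

# Small hypothetical trigger list; NegEx itself uses a far larger one.
NEGATION_TRIGGERS = ["no evidence of", "no", "denies", "without"]


def is_negated(sentence: str, mention: str, window: int = 5) -> bool:
    """Return True if a negation trigger occurs within `window` words before the mention."""
    lowered = sentence.lower()
    start = lowered.find(mention.lower())
    if start < 0:
        raise ValueError("mention not found in sentence")
    preceding_words = re.findall(r"\w+", lowered[:start])[-window:]
    context = " ".join(preceding_words)
    return any(
        re.search(rf"\b{re.escape(trigger)}\b", context)
        for trigger in NEGATION_TRIGGERS
    )
```

For instance, `is_negated("No evidence of cholangiocarcinoma.", "cholangiocarcinoma")` finds the trigger "no evidence of" directly before the mention, whereas the diabetes mention in the earlier example note has no trigger in its window.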
(1) cTAKES for developers
Download and install of cTAKES
Building the dictionary
Jiaping Zheng
Children’s Hospital Boston
Introduction
See separate pdf for the slides
Graphical User Interface (GUI) to cTAKES:
a Prototype
Pei J. Chen
Children’s Hospital Boston
cTAKES as a Service
Objectives
1. Demo cTAKES prototype web application
2. Empower end users to leverage cTAKES
3. Gather feedback for future cTAKES GUI
Potential system integrations with other applications
(e.g., i2b2, ARC, Web Annotator)
Developed within i2b2 to integrate cTAKES in the i2b2
NLP cell
cTAKES Web Application: a Prototype
http://chipweb2.chip.org/cTakes_webservice_trunk/index.html
Single clinical note
Technologies
• Front-End (Web GUI): ExtJS, JavaScript
• Middleware (Web Services): Java, Apache CXF, JSON
• Back-End (cTAKES): Java, UIMA
Deployment Considerations
Deployment Model
Security
Performance
Licensing (UMLS, Apache, GPL v.3)