Final Presentation: Information Extraction From Medical Records

advertisement
Information Extraction From
Medical Records
by Alexander Barsky
Current Methodology:
Broad assessment of patient contained in
beginning of chart with references to more
specific areas. Specific divisions follow broad
assessment. Records are listed in
chronological order of activity.
Chart Example:
.
Problem:
A patient's medical chart is very detailed and
very complex in nature. Any attempt to
quickly locate specific information will be met
with frustration.
Example:
.
Solution:
Create a system that properly extracts wanted information
based on a predefined set of parameters.
Example: "Hormonal imbalance during puberty". Retrieve all
references to hormonal imbalances but only between two
specific time periods in medical chart.
Tool At our disposal:
JAPE : Java Annotation Patterns Engine.
Use : pattern matching and semantic extraction
GATE : General Architecture for Text Engineering.
Use: Information Extraction, document annotation, and
XML output.
C# : Visual C# Winforms.
Use: Medium for conversion between XML and .csv
file
formats.
Solution Methodology:
1. Create corpus of documents in GATE.
2. Introduce rules for information extraction.
3. Annotate documents in corpus.
4. Output annotated documents in XML.
5. Strip file of unnecessary elements and convert to .csv.
ANNIE
A-Nearly-New-Information-Extraction-System
-Tokeniser - splits sentence into simple tokens
-Gazetter - identify entity names contained in lists
-Sentence Splitter - splits text into sentences based on
lists.
-Parts of Speech Tagger - identifies text as
different POS.
-Coreference Matcher- identifies relationships between
previously defined entities.
Success in Information Extraction is
based on integrating most if not all
ANNIE components
-
JAPE : Key to Extraction
-
JAPE Example
-
XML Output:
-
Problem: Too much unorganized
information.
Solution :
XLST to the rescue!!!
XLST - Extensible Stylesheet Language Transformations
- Add specific rules to seperate needed from unnecessary
information.
XLST Example
-Find all the nodes within the <Lookup>. Add string between
the tags.
CSV File Type
Comma Seperated Value - Used to present
information in a tabular system. Useful for
analyzing large amount of data in an easy to
understand format. Most common program
to use it is Excel.
.
Potential Problem:
Regardless of how well all the ANNIE tools
are utilized and how well the JAPE rules are
defined, proper recall precentage won't ever
be exact.
Solution: Machine Learning
Machine learning is our best chance to
increase precision of output results. Training
a computer to recognize commonally used
reporting phraseology will organize extraction
better with more precise, concise outputs.
Lucky for us, GATE include plugins to program
machine learning.
Download