UIMA - Mayo Clinic Informatics

UIMA
SHARP 4 - NLP
May 25, 2010
Outline
• UIMA Terminology (not just TLAs)
• Parts of a UIMA pipeline
• Running a pipeline
• Viewing annotations
• Creating a new annotator
UIMA terminology
• CAS
– XCAS
– JCAS
– View
• Analysis Engine (AE) / Annotator
– Aggregate Analysis Engine
• XML output:
– XCAS
– XMI
• Type System
– JCasGen
• CAS Visual Debugger (CVD)
• CPE (Collection Processing Engine)
UIMA and Eclipse
• UIMA plugin for Eclipse requires EMF
• UIMA plugin provides visual editors for descriptors
• An “Update site” exists for installing the plugin
UIMA Pipeline Flow
• Collection Reader
• (CAS Initializer - deprecated)
• Analysis Engine (AE) / Annotator
• CAS Consumer
Pipeline Example
UIMA term            Example
Collection Reader    Read files from a dir
Analysis Engine      Sentence annotator
Analysis Engine      Tokenizer annotator
CAS Consumer         Output tokens to a DB
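The flow in this example can be sketched in plain Java. This is only an illustration of the order of the stages, not the UIMA API: `SketchCas`, `SketchEngine`, and `PipelineSketch` are hypothetical names, an in-memory list of strings stands in for the files in a directory, and a returned list stands in for the database.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the CAS: document text plus {begin, end} offsets.
class SketchCas {
    final String text;
    final List<int[]> sentences = new ArrayList<>();
    final List<int[]> tokens = new ArrayList<>();

    SketchCas(String text) {
        this.text = text;
    }
}

// Hypothetical stand-in for an Analysis Engine.
interface SketchEngine {
    void process(SketchCas cas);
}

class PipelineSketch {
    // Sentence annotator: naive split on '.', for illustration only.
    // Text after the last '.' is not covered by any sentence.
    static void annotateSentences(SketchCas cas) {
        int begin = 0;
        for (int i = 0; i < cas.text.length(); i++) {
            if (cas.text.charAt(i) == '.') {
                cas.sentences.add(new int[] {begin, i + 1});
                begin = i + 1;
            }
        }
    }

    // Tokenizer annotator: runs of letters/digits inside each sentence.
    // It depends on the sentence annotator having run first.
    static void annotateTokens(SketchCas cas) {
        for (int[] s : cas.sentences) {
            int start = -1;
            for (int i = s[0]; i <= s[1]; i++) {
                boolean inWord = i < s[1] && Character.isLetterOrDigit(cas.text.charAt(i));
                if (inWord && start < 0) {
                    start = i;
                }
                if (!inWord && start >= 0) {
                    cas.tokens.add(new int[] {start, i});
                    start = -1;
                }
            }
        }
    }

    // "Collection reader" = the input list; "CAS consumer" = the returned list.
    static List<String> run(List<String> documents) {
        List<SketchEngine> engines = Arrays.<SketchEngine>asList(
                PipelineSketch::annotateSentences, PipelineSketch::annotateTokens);
        List<String> sink = new ArrayList<>();
        for (String document : documents) {
            SketchCas cas = new SketchCas(document);
            for (SketchEngine engine : engines) {
                engine.process(cas);
            }
            for (int[] t : cas.tokens) {
                sink.add(cas.text.substring(t[0], t[1]));
            }
        }
        return sink;
    }
}
```

The engines run in list order, which is why the tokenizer can rely on the sentence offsets already being present.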
Options for running UIMA tools
• Tools:
– CPE Configurator
– CVD
• Options:
– Command line scripts/.bat files
– Run within Eclipse
Tying together a UIMA pipeline
• Type System
– Defines the data types passed along
• CAS (Common Analysis Structure)
– Container for the data
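As a rough illustration of how these two pieces relate, the sketch below models a container that only accepts annotations whose type has been declared. It is a plain-Java miniature under stated assumptions, not the UIMA API: `TypedCasSketch` and its methods are hypothetical, and a real type system also describes features and inheritance, which are left out here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical miniature: a "type system" (set of declared type names)
// and a "CAS" (document text plus typed annotations over it).
class TypedCasSketch {
    private static final class Annotation {
        final String type;
        final int begin;
        final int end;

        Annotation(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    private final Set<String> typeSystem;
    private final String documentText;
    private final List<Annotation> annotations = new ArrayList<>();

    TypedCasSketch(String documentText, String... declaredTypes) {
        this.documentText = documentText;
        this.typeSystem = new HashSet<>(Arrays.asList(declaredTypes));
    }

    // Only types declared in the type system may be stored in the container.
    void add(String type, int begin, int end) {
        if (!typeSystem.contains(type)) {
            throw new IllegalArgumentException("Type not declared: " + type);
        }
        annotations.add(new Annotation(type, begin, end));
    }

    // Returns the text covered by every annotation of the given type.
    List<String> coveredText(String type) {
        List<String> result = new ArrayList<>();
        for (Annotation a : annotations) {
            if (a.type.equals(type)) {
                result.add(documentText.substring(a.begin, a.end));
            }
        }
        return result;
    }
}
```

For example, a container created with the types "Sentence" and "Token" accepts `add("Token", 0, 6)` but rejects `add("Drug", 0, 6)`.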
Tying together a UIMA pipeline
• CPE descriptor – select the parts
– Collection Reader
– Analysis Engine(s)
– CAS Consumer
• Aggregate analysis engine
– Multiple Analysis Engines and their order
Options for running a pipeline
• CVD GUI
– Single Aggregate Analysis Engine
– No Collection Reader
• CPE GUI
• From Java: parse a CpeDescription, produce a CollectionProcessingEngine from it, and invoke its process() method
(see "2.3. Running a CPE from Your Own Java Application" in the UIMA documentation)
Example: Running a pipeline
Running cTAKES within Eclipse using a CPE
• Use run configuration UIMA_CPE_GUI--clinical_documents_pipeline
• CPE: test1.xml from clinical documents pipeline\desc\collection_processing_engine
Options for viewing annotations
• CVD
• Annotation viewer
• XML viewer
• Text editor
Example: Viewing annotations
Viewing annotations using the CVD
• Load the Type System
• Load the XCAS or XMI
Example: Running an AE in CVD
Using CVD to run an Analysis Engine
– No Collection Reader
– Single Analysis Engine (can be an aggregate)
– No CAS Consumer
– Just paste/type in text to process
Family history of hyperlipidemia.
Creating a New Annotator
• Create Java project
• Right click -> Add UIMA Nature
• Add UIMA jars to .classpath (Build Path)
• Create Analysis Engine (AE) descriptor
• Add types to AE descriptor, or optionally create separate Type System descriptor
• Write code!
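The code for an annotator mostly amounts to computing begin and end offsets over the document text. In UIMA an annotator class typically extends JCasAnnotator_ImplBase and implements process(JCas), adding its annotations to the CAS indexes. The sketch below leaves the framework out and shows only the offset logic in plain Java; `FamilyHistoryAnnotatorSketch` and its pattern are hypothetical and chosen to match the sample sentence used earlier.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical annotator logic, independent of the UIMA framework.
class FamilyHistoryAnnotatorSketch {
    // Assumed pattern for illustration; a real annotator would be more careful.
    private static final Pattern MENTION =
            Pattern.compile("family history", Pattern.CASE_INSENSITIVE);

    // Returns one {begin, end} offset pair per mention found in the text.
    static List<int[]> annotate(String documentText) {
        List<int[]> spans = new ArrayList<>();
        Matcher matcher = MENTION.matcher(documentText);
        while (matcher.find()) {
            spans.add(new int[] {matcher.start(), matcher.end()});
        }
        return spans;
    }
}
```

In a real annotator, each offset pair would become an annotation of a type declared in the type system rather than a bare array.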
Questions?
Supplemental slides follow
Example: Creating a PEAR file
• Right click -> Add UIMA Nature
• Right click -> Generate Pear
• Select Analysis Engine descriptor
• Select OS and JDK
• Modify Properties if needed
• Select what to include
Example: Modifying a parameter
UIMA’s descriptor editors allow you to modify most parameters without looking at the XML itself.
Links
• Getting started with UIMA
http://uima.apache.org/doc-uima-annotator.html
• UIMA Update site for use in Eclipse
http://www.apache.org/dist/incubator/uima/eclipse-update-site/
Email address
masanz.james@mayo.edu