UIMA Introduction James Masanz

SHARPn Summit
June 11, 2012
Outline
 UIMA Terminology (not just TLAs)
 Parts of a UIMA pipeline
 Running a pipeline
 Viewing annotations interactively
UIMA Terminology
 CAS
– JCas
– View
 Analysis Engine (AE) / Annotator
 XML output:
– XCAS
– XMI
 Type System
– JCasGen
 CAS Visual Debugger (CVD)
 CPE (Collection Processing Engine)
UIMA
 Framework
– Defining data types
– Passing data from one component to another
 Tooling
– Viewing results
– Debugging
– Editing XML visually
Data Through a Pipeline
 Type System
– Defines the data types passed along
 CAS (Common Analysis Structure)
– Container for the data passed along
– Created by UIMA from the Type System
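The relationship between the two can be sketched in plain Java. This is a toy model only: CasSketch and its methods are hypothetical names made up for illustration, not UIMA classes, and a real CAS is considerably richer. It assumes a type system is just a set of type names and that every annotation is a type name plus character offsets into the document text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model only: NOT a UIMA class. Names here are hypothetical. */
final class CasSketch {
    /** One annotation: a declared type name plus character offsets into the text. */
    static final class Annotation {
        final String type;
        final int begin;
        final int end;
        Annotation(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    private final Set<String> typeSystem;      // the data types that may be passed along
    private final String documentText;         // the text being analyzed
    private final List<Annotation> annotations = new ArrayList<>();

    /** The container is created from a type system, as on the slide. */
    CasSketch(Set<String> typeSystem, String documentText) {
        this.typeSystem = new HashSet<>(typeSystem);
        this.documentText = documentText;
    }

    /** Adds an annotation, rejecting types the type system does not define. */
    void add(String type, int begin, int end) {
        if (!typeSystem.contains(type)) {
            throw new IllegalArgumentException("Type not in type system: " + type);
        }
        if (begin < 0 || begin > end || end > documentText.length()) {
            throw new IllegalArgumentException("Offsets out of range");
        }
        annotations.add(new Annotation(type, begin, end));
    }

    /** Returns the text covered by each annotation of the given type, in insertion order. */
    List<String> coveredText(String type) {
        List<String> out = new ArrayList<>();
        for (Annotation a : annotations) {
            if (a.type.equals(type)) {
                out.add(documentText.substring(a.begin, a.end));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        CasSketch cas = new CasSketch(
            new HashSet<>(Arrays.asList("Sentence", "Token")),
            "Family history of hyperlipidemia.");
        cas.add("Token", 0, 6);
        cas.add("Token", 7, 14);
        System.out.println(cas.coveredText("Token"));
    }
}
```

Storing offsets rather than copies of the text keeps every annotation tied to the one shared document, which is the idea the container is meant to convey.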
Parts of a UIMA Pipeline
 Collection Reader
– Read input document
 Analysis Engine(s) / Annotator(s)
– Process document
 CAS Consumer
– Output data
Tying a Pipeline Together
 CPE descriptor (Collection Processing Engine)
– Collection Reader
– Analysis Engine(s)
– CAS Consumer
 Aggregate analysis engine
– Multiple Analysis Engines and their order
Pipeline Example
UIMA term            Example
Collection Reader    Read files from a dir
Analysis Engine      Sentence detector
Analysis Engine      Tokenizer annotator
Analysis Engine      Part of Speech tagger
CAS Consumer         Output tokens to DB
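The shape of this example pipeline can be sketched in plain Java. This is a toy model under stated assumptions, not the UIMA API: PipelineSketch, Doc, and Engine are hypothetical names, the "collection reader" iterates over in-memory strings instead of files in a directory, the "CAS consumer" collects tokens into a list instead of writing to a database, and the part-of-speech tagger is left out. The sentence detector and tokenizer are deliberately naive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy model of the pipeline shape only: NOT UIMA classes. Names are hypothetical. */
final class PipelineSketch {
    /** Stand-in for the CAS: document text plus the results added so far. */
    static final class Doc {
        final String text;
        final List<String> sentences = new ArrayList<>();
        final List<String> tokens = new ArrayList<>();
        Doc(String text) { this.text = text; }
    }

    /** Stand-in for an Analysis Engine: reads the Doc and adds results to it. */
    interface Engine {
        void process(Doc doc);
    }

    /** Naive sentence detector: splits on whitespace that follows '.', '!' or '?'. */
    static final Engine SENTENCE_DETECTOR = doc -> {
        for (String s : doc.text.split("(?<=[.!?])\\s+")) {
            if (!s.isEmpty()) {
                doc.sentences.add(s);
            }
        }
    };

    /** Naive tokenizer: words and single punctuation marks, taken from detected sentences. */
    static final Engine TOKENIZER = doc -> {
        Pattern tokenPattern = Pattern.compile("\\w+|[^\\w\\s]");
        for (String sentence : doc.sentences) {
            Matcher m = tokenPattern.matcher(sentence);
            while (m.find()) {
                doc.tokens.add(m.group());
            }
        }
    };

    /** Runs reader -> engines in the given order -> consumer, and returns the consumed tokens. */
    static List<String> run(List<String> inputDocuments, List<Engine> enginesInOrder) {
        List<String> consumed = new ArrayList<>();      // stands in for the DB
        for (String text : inputDocuments) {            // "collection reader"
            Doc doc = new Doc(text);
            for (Engine engine : enginesInOrder) {      // "aggregate": fixed order
                engine.process(doc);
            }
            consumed.addAll(doc.tokens);                // "CAS consumer"
        }
        return consumed;
    }

    public static void main(String[] args) {
        List<String> tokens = run(
            Arrays.asList("Family history of hyperlipidemia. No chest pain."),
            Arrays.asList(SENTENCE_DETECTOR, TOKENIZER));
        System.out.println(tokens);
    }
}
```

Because the tokenizer reads the sentences that the detector adds, swapping the two engines yields no tokens at all in this sketch, which is why the order of the engines has to be declared somewhere.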
UIMA plugin for Eclipse
 Provides visual editors for descriptors
– Mini GUI for selecting options
– Rather than editing XML directly
 An Eclipse “update site” exists for installing the plugin:
http://www.apache.org/dist/incubator/uima/eclipse-update-site
UIMA Tooling Options
 Tools:
– CPE Configurator
– CVD (CAS Visual Debugger)
 Options:
– Command line scripts/.bat files
– Run within Eclipse
Running a Pipeline - CPE
 cTAKES provides a script and a .bat file: runctakesCPE
 Choose a CPE descriptor, such as test_plaintext.xml, from cTAKESdesc/cdpdesc/collection_processing_engine
Viewing Annotations - CVD
 Viewing annotations using the CVD
– Load the Type System
– Load the XCAS or XMI
Annotation Viewers
 UIMA tools
– CVD (CAS Visual Debugger)
– Annotation viewer
 Viewing XML output
– Any XML viewer
– Any text editor
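Since the output is ordinary XML, it can also be inspected with a few lines of standard-library Java. The sketch below is an illustration under assumptions: XmiPeek is a hypothetical helper, the sample document is hand-written and heavily simplified (real XMI output carries more elements and attributes), ex:Token is a made-up type, and only a single document text (one sofaString attribute) is assumed.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical helper: lists elements that carry begin/end offsets, with their covered text. */
final class XmiPeek {
    // Hand-written, simplified sample; ex:Token is a made-up type for illustration.
    static final String SAMPLE =
        "<xmi:XMI xmlns:xmi='http://www.omg.org/XMI' xmlns:cas='http:///uima/cas.ecore'"
        + " xmlns:ex='http:///example.ecore'>"
        + "<cas:Sofa xmi:id='1' sofaNum='1' sofaID='_InitialView' mimeType='text'"
        + " sofaString='Family history of hyperlipidemia.'/>"
        + "<ex:Token xmi:id='2' sofa='1' begin='0' end='6'/>"
        + "<ex:Token xmi:id='3' sofa='1' begin='7' end='14'/>"
        + "</xmi:XMI>";

    static List<String> listAnnotations(String xml) {
        NodeList children;
        try {
            children = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement().getChildNodes();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }

        // First pass: find the document text (assumes one element with sofaString).
        String text = null;
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            if (n instanceof Element && ((Element) n).hasAttribute("sofaString")) {
                text = ((Element) n).getAttribute("sofaString");
            }
        }

        // Second pass: report every element that has begin and end offsets.
        List<String> out = new ArrayList<>();
        if (text == null) {
            return out;
        }
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            if (!(n instanceof Element)) {
                continue;
            }
            Element e = (Element) n;
            if (e.hasAttribute("begin") && e.hasAttribute("end")) {
                int begin = Integer.parseInt(e.getAttribute("begin"));
                int end = Integer.parseInt(e.getAttribute("end"));
                out.add(e.getNodeName() + " [" + begin + "," + end + "] "
                    + text.substring(begin, end));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : listAnnotations(SAMPLE)) {
            System.out.println(line);
        }
    }
}
```

For anything beyond a quick look, the CVD or the annotation viewer remains the more practical choice, since they resolve types against the Type System.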
Questions?
http://uima.apache.org/
Supplemental slides follow
Options to Run a Pipeline
 CPE GUI
 CVD GUI
– Single Aggregate Analysis Engine
– No Collection Reader
 Programmatically: parse a CpeDescription, produce a CollectionProcessingEngine from it, and invoke its process() method
 uimaFIT: removes the dependency on XML descriptors
Creating a New Annotator
 Within Eclipse
– Create Java project
– Right click -> Add UIMA Nature
– Add UIMA jars to .classpath (Build Path)
– Create Analysis Engine (AE) descriptor
– Add types to AE descriptor, or optionally create separate Type System descriptor
– Write code!
Running an AE in CVD
Using CVD to run an Analysis Engine
– No Collection Reader
– Single Analysis Engine (can be an aggregate)
– No CAS Consumer
– Load an Analysis Engine
– Paste/type in text to process, for example: Family history of hyperlipidemia.
Modifying a parameter
UIMA’s descriptor editors allow you to modify most parameters without looking at the XML itself.
Links
 Getting started with UIMA
http://uima.apache.org/doc-uima-annotator.html
 UIMA Update site for use in Eclipse
http://www.apache.org/dist/incubator/uima/eclipse-update-site
Email address
masanz.james@mayo.edu