UIMA
SHARP 4 - NLP
May 25, 2010

Outline
• UIMA Terminology (not just TLAs)
• Parts of a UIMA pipeline
• Running a pipeline
• Viewing annotations
• Creating a new annotator

UIMA terminology
• CAS, XCAS, JCAS, View
• Analysis Engine (AE) / Annotator
  – Aggregate Analysis Engine
• XML output: XCAS, XMI
• Type System, JCasGen
• CAS Visual Debugger (CVD)
• CPE (Collection Processing Engine)

UIMA and Eclipse
• UIMA plugin for Eclipse requires EMF
• UIMA plugin provides visual editors for descriptors
• An “Update site” exists for installing the plugin

UIMA Pipeline Flow
• Collection Reader
• (CAS Initializer - deprecated)
• Analysis Engine (AE) / Annotator
• CAS Consumer

Pipeline Example
UIMA term            Example
Collection Reader    Read files from a dir
Analysis Engine      Sentence annotator
Analysis Engine      Tokenizer annotator
CAS Consumer         Output tokens to a DB

Options for running UIMA tools
• Tools:
  – CPE Configurator
  – CVD
• Options:
  – Command line scripts/.bat files
  – Run within Eclipse

Tying together a UIMA pipeline
• Type System
  – Defines the data types passed along
• CAS (Common Analysis Structure)
  – Container for the data

Tying together a UIMA pipeline
• CPE descriptor – select the parts
  – Collection Reader
  – Analysis Engine(s)
  – CAS Consumer
• Aggregate analysis engine
  – Multiple Analysis Engines and their order

Options for running a pipeline
• CVD GUI
  – Single Aggregate Analysis Engine
  – No Collection Reader
• CPE GUI
• From Java: instantiate a CpeDescription, produce a CollectionProcessingEngine from it, and invoke its process() method
  – 2.3. Running a CPE from Your Own Java Application

Example: Running a pipeline
Running cTAKES within Eclipse using a CPE
• Use run configuration UIMA_CPE_GUI--clinical_documents_pipeline
• CPE test1.xml from clinical documents pipeline\desc\collection_processing_engine

Options for viewing annotations
• CVD
• Annotation viewer
• XML viewer
• Text editor

Example: Viewing annotations
Viewing annotations using the CVD
• Load the Type System
• Load the XCAS or XMI

Example: Running an AE in CVD
Using CVD to run an Analysis Engine
  – No Collection Reader
  – Single Analysis Engine (can be an aggregate)
  – No CAS Consumer
  – Just paste/type in text to process
Sample text: Family history of hyperlipidemia.

Creating a New Annotator
• Create Java project
• Right click -> Add UIMA Nature
• Add UIMA jars to .classpath (Build Path)
• Create Analysis Engine (AE) descriptor
• Add types to AE descriptor, or optionally create separate Type System descriptor
• Write code!

Questions?

Supplemental slides follow

Example: Creating a PEAR file
• Right click -> Add UIMA Nature
• Right click -> Generate Pear
• Select Analysis Engine descriptor
• Select OS and JDK
• Modify Properties if needed
• Select what to include

Example: Modifying a parameter
UIMA’s descriptor editors allow you to modify most parameters without looking at the XML itself.

Links
• Getting started with UIMA
  http://uima.apache.org/doc-uima-annotator.html
• UIMA Update site for use in Eclipse
  http://www.apache.org/dist/incubator/uima/eclipse-update-site/

Email address
masanz.james@mayo.edu
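
Example: Pipeline flow sketched in plain Java
The Pipeline Example slide (Collection Reader, sentence annotator, tokenizer annotator, CAS Consumer) can be mimicked in a few lines of plain Java to show how data moves between the parts. This is only a toy model under stated assumptions: Span, Doc, Annotator, and PipelineSketch are hypothetical names invented for illustration, not UIMA classes. A real pipeline would use UIMA's CAS, a Type System, and descriptors instead.

```java
import java.util.ArrayList;
import java.util.List;

public class PipelineSketch {

    // Hypothetical stand-in for an annotation: a typed span over the text.
    static final class Span {
        final String type;
        final int begin;
        final int end;
        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    // Hypothetical stand-in for the CAS: document text plus its annotations.
    static final class Doc {
        final String text;
        final List<Span> spans = new ArrayList<>();
        Doc(String text) { this.text = text; }
    }

    // Hypothetical stand-in for an Analysis Engine / Annotator.
    interface Annotator {
        void process(Doc doc);
    }

    // Adds one "Sentence" span per run of text ending in '.', '!' or '?'
    // (or at the end of the text).
    static final class SentenceAnnotator implements Annotator {
        public void process(Doc doc) {
            String t = doc.text;
            int start = 0;
            for (int i = 0; i < t.length(); i++) {
                char c = t.charAt(i);
                boolean last = (i == t.length() - 1);
                if (c == '.' || c == '!' || c == '?' || last) {
                    // Skip whitespace between sentences.
                    while (start <= i && Character.isWhitespace(t.charAt(start))) start++;
                    if (start <= i) doc.spans.add(new Span("Sentence", start, i + 1));
                    start = i + 1;
                }
            }
        }
    }

    // Adds "Token" spans inside each Sentence: alphanumeric runs, and
    // every other non-whitespace character as its own token.
    static final class TokenizerAnnotator implements Annotator {
        public void process(Doc doc) {
            // Iterate over a copy because tokens are added to the same list.
            for (Span s : new ArrayList<>(doc.spans)) {
                if (!s.type.equals("Sentence")) continue;
                int i = s.begin;
                while (i < s.end) {
                    char c = doc.text.charAt(i);
                    if (Character.isWhitespace(c)) { i++; continue; }
                    int j = i + 1;
                    if (Character.isLetterOrDigit(c)) {
                        while (j < s.end && Character.isLetterOrDigit(doc.text.charAt(j))) j++;
                    }
                    doc.spans.add(new Span("Token", i, j));
                    i = j;
                }
            }
        }
    }

    // Plays the roles of Collection Reader (loop over documents), aggregate
    // (ordered list of annotators), and CAS Consumer (collects token text).
    public static List<String> run(List<String> documents) {
        List<Annotator> engines = new ArrayList<>();
        engines.add(new SentenceAnnotator());   // order matters: sentences first
        engines.add(new TokenizerAnnotator());
        List<String> output = new ArrayList<>();
        for (String text : documents) {
            Doc doc = new Doc(text);
            for (Annotator a : engines) a.process(doc);
            for (Span s : doc.spans) {
                if (s.type.equals("Token")) output.add(text.substring(s.begin, s.end));
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        docs.add("Family history of hyperlipidemia.");
        System.out.println(run(docs)); // prints [Family, history, of, hyperlipidemia, .]
    }
}
```

The ordered list of annotators corresponds to what an aggregate analysis engine specifies: the tokenizer depends on the Sentence spans added by the step before it, so swapping the order would yield no tokens.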