UIMA Introduction
SHARPn Summit
June 11, 2012

Outline
– UIMA Terminology (not just TLAs)
– Parts of a UIMA pipeline
– Running a pipeline
– Viewing annotations interactively

UIMA Terminology
– CAS
– XCAS
– JCAS
– View
– Analysis Engine (AE) / Annotator
– XML output: Type System, XCAS, XMI
– JCasGen
– CAS Visual Debugger (CVD)
– CPE (Collection Processing Engine)

UIMA
Framework
  – Defining data types
  – Passing data from one component to another
Tooling
  – Viewing results
  – Debugging
  – Editing XML visually

Data Through a Pipeline
Type System
  – Defines the data types passed along
CAS (Common Analysis Structure)
  – Container for the data passed along
  – Created by UIMA from the Type System

Parts of a UIMA Pipeline
Collection Reader
  – Read input document
Analysis Engine(s) / Annotator(s)
  – Process document
CAS Consumer
  – Output data

Tying a Pipeline Together
CPE descriptor (Collection Processing Engine)
  – Collection Reader
  – Analysis Engine(s)
  – CAS Consumer
Aggregate analysis engine
  – Multiple Analysis Engines and their order

Pipeline Example
UIMA term            Example
Collection Reader    Read files from a dir
Analysis Engine      Sentence detector
Analysis Engine      Tokenizer annotator
Analysis Engine      Part of Speech tagger
CAS Consumer         Output tokens to DB

UIMA plugin for Eclipse
Provides visual editors for descriptors
  – Mini GUI for selecting options
  – Rather than editing XML directly
An “Update site” exists for installing the plugin
  http://www.apache.org/dist/incubator/uima/eclipse-update-site

UIMA Tooling Options
Tools:
  – CPE Configurator
  – CVD (CAS Visual Debugger)
Options:
  – Command line scripts/.bat files
  – Run within Eclipse

Running a Pipeline - CPE
cTAKES provides a script and a .bat file: runctakesCPE
Choose a CPE descriptor, such as test_plaintext.xml, from cTAKESdesc/cdpdesc/collection_processing_engine

Viewing Annotations - CVD
Viewing annotations using the CVD
  – Load the Type System
  – Load the XCAS or XMI

Annotation Viewers
UIMA tools
  – CVD (CAS Visual Debugger)
  – Annotation viewer
Viewing XML output
  – Any XML viewer
  – Any text editor

Questions?
http://uima.apache.org/

Supplemental slides follow

Options to Run a Pipeline
CPE GUI
CVD GUI
  – Single Aggregate Analysis Engine
  – No Collection Reader
Programmatically
  – Parse a CpeDescription, produce a CollectionProcessingEngine from it, and invoke its process() method
uimaFIT
  – Removes dependency on XML

Creating a New Annotator
Within Eclipse
  – Create Java project
  – Right click -> Add UIMA Nature
  – Add UIMA jars to .classpath (Build Path)
  – Create Analysis Engine (AE) descriptor
  – Add types to AE descriptor, or optionally create separate Type System descriptor
  – Write code!

Running an AE in CVD
Using CVD to run an Analysis Engine
  – No Collection Reader
  – Single Analysis Engine (can be an aggregate)
  – No CAS Consumer
  – Load an Analysis Engine
  – Paste/type in text to process, for example: Family history of hyperlipidemia.

Modifying a parameter
UIMA’s descriptor editors allow you to modify most parameters without looking at the XML itself.

Links
Getting started with UIMA
  http://uima.apache.org/doc-uima-annotator.html
UIMA Update site for use in Eclipse
  http://www.apache.org/dist/incubator/uima/eclipse-update-site
Email address
  masanz.james@mayo.edu
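
Sketch: data flow through a pipeline
The Collection Reader -> Analysis Engine(s) -> CAS Consumer flow from the pipeline slides can be illustrated in plain Java. This is a dependency-free sketch and not the UIMA API: the class and method names (PipelineSketch, Cas, Annotation, AnalysisEngine, SentenceDetector, Tokenizer, run) are hypothetical stand-ins. An in-memory list of strings plays the Collection Reader, a list of token strings plays the database behind the CAS Consumer, and the sentence and token splitting is deliberately naive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration only: none of these types are UIMA classes.
class PipelineSketch {

    /** Hypothetical stand-in for a CAS: document text plus the annotations added so far. */
    static final class Cas {
        final String text;
        final List<Annotation> annotations = new ArrayList<>();
        Cas(String text) { this.text = text; }
    }

    /** Hypothetical annotation: a typed span [begin, end) over the CAS text. */
    static final class Annotation {
        final String type;
        final int begin;
        final int end;
        Annotation(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
        String coveredText(Cas cas) { return cas.text.substring(begin, end); }
    }

    /** Each pipeline step reads the CAS and adds its own annotations to it. */
    interface AnalysisEngine { void process(Cas cas); }

    /** Naive sentence detector: a sentence ends at each period. */
    static final class SentenceDetector implements AnalysisEngine {
        @Override public void process(Cas cas) {
            String t = cas.text;
            int start = 0;
            for (int i = 0; i < t.length(); i++) {
                if (t.charAt(i) == '.') {
                    cas.annotations.add(new Annotation("Sentence", start, i + 1));
                    start = i + 1;
                    // Skip whitespace so the next sentence starts on a non-blank character.
                    while (start < t.length() && Character.isWhitespace(t.charAt(start))) start++;
                }
            }
            if (start < t.length()) cas.annotations.add(new Annotation("Sentence", start, t.length()));
        }
    }

    /** Naive tokenizer: splits each Sentence annotation on whitespace. */
    static final class Tokenizer implements AnalysisEngine {
        @Override public void process(Cas cas) {
            // Iterate over a snapshot because Token annotations are added to the same list.
            for (Annotation s : new ArrayList<>(cas.annotations)) {
                if (!s.type.equals("Sentence")) continue;
                int i = s.begin;
                while (i < s.end) {
                    while (i < s.end && Character.isWhitespace(cas.text.charAt(i))) i++;
                    int tokStart = i;
                    while (i < s.end && !Character.isWhitespace(cas.text.charAt(i))) i++;
                    if (i > tokStart) cas.annotations.add(new Annotation("Token", tokStart, i));
                }
            }
        }
    }

    /** Runs reader -> engines (in fixed order) -> consumer and returns the consumed tokens. */
    static List<String> run(List<String> documents) {
        // Stand-in for an aggregate analysis engine: several engines and their order.
        List<AnalysisEngine> aggregate = Arrays.asList(new SentenceDetector(), new Tokenizer());
        List<String> sink = new ArrayList<>(); // stand-in for "output tokens to DB"
        for (String doc : documents) {         // stand-in for the Collection Reader
            Cas cas = new Cas(doc);
            for (AnalysisEngine ae : aggregate) ae.process(cas);
            for (Annotation a : cas.annotations) {
                if (a.type.equals("Token")) sink.add(a.coveredText(cas));
            }
        }
        return sink;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("Family history of hyperlipidemia.")));
    }
}
```

The order of the engines matters here: the tokenizer only sees Sentence annotations that an earlier engine has already placed in the shared container, which mirrors why an aggregate analysis engine records both its delegates and their order.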