Providing Annotation Options, Without Revealing Too Much

Jeffrey P Ferraro, MS 1,2; Scott L DuVall, PhD 3,4; Peter Haug, MD 1,2
1 Intermountain Healthcare, Salt Lake City, UT; 2 Department of Biomedical Informatics, University of Utah, Salt Lake City, UT; 3 Internal Medicine, University of Utah, Salt Lake City, UT; 4 VA Salt Lake City Health Care System, Salt Lake City, UT

Background
For an ongoing research study exploring domain adaptation of a natural language processing (NLP) part-of-speech tagging system, a reference standard for the target domain of healthcare was created using clinical text reports. One of the challenges faced in clinical NLP is the limited availability of the high-quality annotated clinical text required for training NLP machine learning models, in this case a Transformation-Based Learner.

Objective
An automated annotation tool was developed to assist human annotators in the efficient production of a high-quality reference standard of part-of-speech tags. It was important to reduce demand-effect bias, which could arise when system-generated part-of-speech cues are accepted by human annotators even when inaccurate.

Methods
Six part-of-speech tagging systems were used with their out-of-the-box tagging models to initially tag the clinical text corpus: the Stanford Log-linear Part-Of-Speech Tagger [1], the OpenNLP POS tagger [2], MorphAdorner [3], LingPipe [4], the Illinois Part of Speech Tagger [5], and the Specialist Lexicon [6], modified to contain only non-ambiguous terms. Tag output from each system was mapped to a normalized tag-set. To control for demand effect, the tags generated by the six taggers were filtered for duplicates, leaving only distinct tag choices, so that selection was not prompted by a tag being displayed multiple times. A web interface was created to display each sentence to be annotated. For each word, the auto-generated cue tags were displayed along with a drop-down containing the complete tag-set, as illustrated in Figure 1. The drop-down was populated with a default only when all systems generated the same tag; otherwise, the human annotator needed to actively select a tag. Regardless of default, any tag could be manually selected as the appropriate tag for each word.

Results
A clinical text corpus of 212 randomly selected sentences from the 10 most common clinical report types at Intermountain Medical Center was annotated by two annotators using the annotation tool. Inter-annotator agreement of 0.95 (p-value < 0.0001) over 3,672 tagged words was achieved using Fleiss' kappa. Demand effect was analyzed by evaluating the kappa between the human annotators and the six taggers. We would expect relatively high kappa scores between the human annotators and the taggers, as these taggers are on average 80-86% accurate on clinical texts. We would not, however, expect human-annotator-versus-tagger scores to reach the levels seen between the human annotators; that would be indicative of passive system agreement and demand effect. We would expect, and did see, higher kappa scores between the human annotators, reflecting reliance on their expert knowledge of the target domain.

Annotators                 Kappa
HA1 vs. HA2                0.953
HA1 vs. Illinois Tagger    0.859
HA2 vs. Illinois Tagger    0.852
HA1 vs. Stanford Tagger    0.850
HA2 vs. Stanford Tagger    0.843
HA1 vs. OpenNLP            0.850
HA2 vs. OpenNLP            0.841
HA1 vs. MorphAdorner       0.822
HA2 vs. MorphAdorner       0.820
HA1 vs. LingPipe           0.810
HA2 vs. LingPipe           0.807
HA1 vs. Spec. Lexicon      0.803
HA2 vs. Spec. Lexicon      0.815

Conclusion
Using the approach described, we were able to leverage existing, moderately accurate NLP tools to assist human annotators in producing a high-quality reference standard without introducing bias through the semi-automated support for the manual annotation process.

References
1. The Stanford Natural Language Processing Group. Stanford Log-linear Part-Of-Speech Tagger. http://nlp.stanford.edu/software/tagger.shtml
2. OpenNLP. OpenNLP Tools. http://opennlp.sourceforge.net/
3. Academic and Research Technologies, Northwestern University. MorphAdorner. http://morphadorner.northwestern.edu/
4. Alias-i. LingPipe. http://alias-i.com/lingpipe/index.html
5. Cognitive Computation Group, University of Illinois at Urbana-Champaign. Illinois Part of Speech Tagger. http://l2r.cs.uiuc.edu/~cogcomp/asoftware.php?skey=FLBJPOS
6. The Lexical Systems Group, National Library of Medicine. Specialist NLP Tools. http://lexsrv3.nlm.nih.gov/Specialist/Home/index.html
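
The cue-filtering and default-population rule described in Methods can be sketched as a small function. This is a minimal illustration, not the original tool's implementation; the function name and the example tag strings are hypothetical.

```python
def build_cue_display(tagger_tags):
    """Collapse the normalized tags proposed by the taggers for one word.

    Duplicates are removed so that a tag appearing in several taggers'
    output is shown only once and cannot prompt selection by repetition.
    A pre-selected default is offered only when every tagger agrees.
    """
    distinct = sorted(set(tagger_tags))
    default = distinct[0] if len(distinct) == 1 else None
    return distinct, default

# Unanimous output from six taggers yields a pre-selected default:
print(build_cue_display(["NN", "NN", "NN", "NN", "NN", "NN"]))  # (['NN'], 'NN')
# Disagreement yields de-duplicated choices and no default:
print(build_cue_display(["NN", "JJ", "NN", "JJ", "NN", "NN"]))  # (['JJ', 'NN'], None)
```

In either case the annotator would still see the full tag-set in the drop-down, so any tag remains manually selectable.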
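
The pairwise agreement scores in the Results table are chance-corrected kappa statistics. As an illustration of the calculation only, the sketch below computes Cohen's kappa for two raters; the abstract reports Fleiss' kappa, which differs slightly in how expected agreement is estimated, and the four-word tag sequences here are hypothetical.

```python
from collections import Counter

def cohen_kappa(tags_a, tags_b):
    """Chance-corrected agreement between two raters' tag sequences."""
    assert len(tags_a) == len(tags_b) and tags_a
    n = len(tags_a)
    # Observed proportion of words on which the two raters agree.
    observed = sum(a == b for a, b in zip(tags_a, tags_b)) / n
    # Expected agreement if each rater tagged independently at random
    # according to their own marginal tag frequencies.
    freq_a, freq_b = Counter(tags_a), Counter(tags_b)
    expected = sum(freq_a[t] * freq_b[t] for t in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical example: raters agree on 3 of 4 words.
print(round(cohen_kappa(["NN", "NN", "JJ", "JJ"],
                        ["NN", "JJ", "JJ", "JJ"]), 3))  # -> 0.5
```

Raw percent agreement would be 0.75 here; the chance correction is what makes kappa the appropriate statistic for comparing annotator-vs-annotator against annotator-vs-tagger agreement.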