A Comparative Study of Two Natural Language Processing Frameworks

Yixin Bian, Gunes Koru, Hongfang Liu
Department of Information Systems, University of Maryland, Baltimore County, MD 21250, USA
June 11, 2012


Introduction

UIMA (Unstructured Information Management Architecture) is a framework for natural language processing, originally developed by IBM and now maintained by the Apache Software Foundation.

GATE (General Architecture for Text Engineering) is a Java suite of tools originally developed at the University of Sheffield and now used worldwide by a wide community of scientists and companies for all sorts of natural language processing tasks.

Both are developed in Java. Although they share common goals, the two architectures differ in many aspects. Which one should be adopted?

In this paper, we compare them from three perspectives:

- Software design quality
    - Code metrics
- Software maintenance
    - Code smells
    - Bugs
    - Bug survival curves
- User's manual


The Comparison of Metrics

Number of classes: UIMA 2,187; GATE 2,822

Metric          Framework   Min   Median     Max     Total   Average
Lines of code   UIMA          0       25   2,944   169,516     77.51
                GATE          0       23   3,869   228,454     80.95
CBO             UIMA          0        2      84    11,822      5.41
                GATE          0        2      65    11,203      3.97
NOC             UIMA          0        0      71     1,170      0.53
                GATE          0        0      81     1,027      0.36
RFC             UIMA          0        6     347    35,220     16.1
                GATE          0        3     214    29,909     10.6
DIT             UIMA          0        1      10     3,837      1.75
                GATE          0        1       8     4,731      1.68
LCOM            UIMA          0       16     100    79,374     36.29
                GATE          0        0     100    85,051     30.14
WMC             UIMA          0        4     345    15,166      6.93
                GATE          0        2     180    15,220      5.39


The Number of Code Smells

Code smell           UIMA count   UIMA per KLOC   GATE count   GATE per KLOC
Data Class                    6           0.035           11           0.05
Data Clumps                  63           0.372           21           0.091
Feature Envy                 26           0.153            0           0
Refused Bequest             101           0.6            448           2.05
Long Message Chain           19           0.112           30           0.137
Shotgun Surgery              23           0.136          189           0.863
God Class                    16           0.094           48           0.219
Total                       254           1.5            747           3.41


The Number of Bugs

Detection tool      UIMA    GATE
FindBugs (2.0.0)       6     178
PMD (5.0)          1,798   1,794
Lint4j (0.9.13)       84     494


The Comparison of Bug Survival Curves


The Comparison of User Manuals

Contents                                                   UIMA   GATE
Catalog                                                    √      √
Tutorial of manual                                         √      √
Overview and characteristics of software product           √      √
Installation and setup                                     √      √
Introduction of product application                        √      √
Frequently Asked Questions (FAQ)                           √
Known issues and problems with the software                √ × ×
Terms, concepts and their basic definitions in software    √      ×


Conclusion

- Software design quality
- Software maintenance
- User's manual

From all three perspectives, UIMA is better than GATE.

Thank you!
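
The derived columns in the metrics and code-smell tables can be checked with a little arithmetic. The sketch below assumes that "Average" means the metric total divided by the number of classes, and that the per-KLOC figure is the smell count divided by total lines of code in thousands; both readings are assumptions that happen to match the UIMA figures. The function and variable names are illustrative only and do not come from any tool used in the study.

```python
# Figures copied from the UIMA columns of the metrics and code-smell tables.
UIMA_CLASSES = 2187
UIMA_TOTALS = {
    "LOC": 169516, "CBO": 11822, "NOC": 1170, "RFC": 35220,
    "DIT": 3837, "LCOM": 79374, "WMC": 15166,
}
UIMA_SMELLS = {
    "Data Class": 6, "Data Clumps": 63, "Feature Envy": 26,
    "Refused Bequest": 101, "Long Message Chain": 19,
    "Shotgun Surgery": 23, "God Class": 16,
}


def per_class_average(total, n_classes):
    """Assumed definition: metric total divided by the number of classes."""
    return total / n_classes


def smells_per_kloc(count, lines_of_code):
    """Assumed definition: smell count per thousand lines of code."""
    return count / (lines_of_code / 1000)


# Per-class averages, rounded to two decimals as in the metrics table.
averages = {
    name: round(per_class_average(total, UIMA_CLASSES), 2)
    for name, total in UIMA_TOTALS.items()
}

# Smell densities, rounded to three decimals as in the code-smell table.
densities = {
    name: round(smells_per_kloc(count, UIMA_TOTALS["LOC"]), 3)
    for name, count in UIMA_SMELLS.items()
}

if __name__ == "__main__":
    for name, value in averages.items():
        print(f"{name:5s} average per class: {value}")
    for name, value in densities.items():
        print(f"{name:20s} per KLOC: {value}")
```

Under these assumptions the UIMA averages and densities agree with the tables to the precision shown. The GATE densities appear to use a slightly different size basis (747 / 228.454 gives about 3.27 rather than 3.41), so they are left out of this check.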