Text Analysis Using Automated Language Translators
CDT John Stanford
MAJ Ian McCulloh

Agenda
• Overview and Hypothesis
• Literature Review
• Motivation (Radio Address Case Study)
• Arabic Translation Data
• Conclusions and Recommendations

Overview and Hypothesis
• Text analysis is a useful tool for gathering intelligence.
• A language barrier makes text analysis harder in non-English-speaking regions.
• Hiring human translators to translate texts into English is slow, expensive, and a possible security risk.
• Hypothesis: Output from automated machine translators such as the Forward Area Language Converter (FALCon) is difficult for the average person to understand, but is just as useful for text analysis as human-translated text.

Literature Review
• This project relates to two ARL projects: FALCon and the ARL Dynamic Network Analysis Lab.
• Language can be modeled mathematically as a network of concepts using an adjacency matrix (Sowa, 1984).
• Preprocessing steps such as stemming, deletion, and thesaurus application prepare a text for analysis (Carley and Diesner, 2004).
• AutoMap, being developed by Carnegie Mellon University, takes texts as input and outputs adjacency matrices.
• ORA, also being developed by CMU, takes the adjacency matrices as input and outputs the mental models (Carley and Reminga, 2004).

Text Analysis Process

Radio Address Study
• 94 of the President’s weekly radio addresses were analyzed.
• Period covered: from just after Sep 11th to just after the beginning of OIF (15 Sep 2001 to 21 June 2003).
• The concept of ‘violence’ was plotted on a timeline; occurrence was high after Sep 11th and in the lead-up to OIF.

Arabic Text Analysis
• Arabic text was translated using CyberTrans, part of the FALCon package.
• 22 Arabic articles from the Department of State’s news site were analyzed (US Dept of State, 2006).

Analysis Results
• Top concepts for the two methods of translation are the same in 16 of the 22 articles.
• The top concept in the human-translated text is among the top three machine-translated concepts for all articles.
• When the methods differ, the human translation isn’t necessarily better.
[Figure panels: Human | Machine]

Conclusions and Recommendations
• Automated text analysis makes it fast and economical to look at trends in local publications of strategically significant regions over either time or space.
• Detailed statistical analysis must still be done on this data.
• Intelligence agencies that have access to large volumes of REDFOR data should run this kind of text analysis to verify that it works as well on REDFOR data as on BLUFOR data.
• FALCon development should continue and possibly be expanded to other languages such as Farsi.

Works Cited
Bush, George. (2001-03). “President Bush’s Radio Addresses by date and topic.” Washington, DC: Office of the Press Secretary. Available from <http://www.whitehouse.gov/news/radio/index.html>.
Carley, Kathleen and Diesner, Jana. (2004). Revealing Social Structure from Texts: Meta-Matrix Text Analysis as a novel method for Network Text Analysis. Causal Mapping for Information Systems and Technology Research: Approaches, Advances, and Illustrations. Harrisburg, PA: Idea Group Publishing.
Sowa, J.F. (1984). Conceptual Structures: Information Processing in Mind and Machine. Reading, MA: Addison-Wesley.
US Dept of State. (2006). “News from Washington.” Washington, DC: Office of the Press Secretary. Available from <http://usinfo.state.gov/usinfo/products/washfile.html>.

Questions?
Dept of Mathematical Sciences
United States Military Academy
Dynamic Network Analysis Lab
Army Research Lab
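
Backup: Concept Network Sketch
The Literature Review describes modeling a text as a network of concepts stored in an adjacency matrix, after preprocessing steps such as deletion and thesaurus application. The Analysis Results compare the top concepts of human and machine translations. The Python sketch below illustrates both ideas on toy data. It is not the algorithm used by AutoMap or ORA. The delete list, thesaurus entries, window size, function names, and sample sentences are all illustrative assumptions.

```python
from collections import Counter

# Hypothetical delete list and thesaurus; a real study would use curated ones.
DELETE_LIST = {"the", "a", "an", "of", "and", "to", "in", "is", "on", "by"}
THESAURUS = {"attacked": "attack", "attacks": "attack", "terrorists": "terrorist"}


def preprocess(text):
    """Lowercase, strip punctuation, drop delete-list words, apply the thesaurus."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return [THESAURUS.get(w, w) for w in words if w and w not in DELETE_LIST]


def adjacency_matrix(concepts, window=2):
    """Return (vocab, matrix); matrix[i][j] counts how often concept j follows
    concept i inside a sliding window of `window` consecutive concepts
    (window=2 links adjacent pairs only)."""
    vocab = sorted(set(concepts))
    index = {c: i for i, c in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in vocab]
    for pos, source in enumerate(concepts):
        for target in concepts[pos + 1 : pos + window]:
            matrix[index[source]][index[target]] += 1
    return vocab, matrix


def top_concepts(concepts, n=3):
    """Return the n most frequent concepts, most frequent first."""
    return [c for c, _ in Counter(concepts).most_common(n)]


def top_concept_agreement(human_text, machine_text, n=3):
    """Return (same_top_concept, human_top_in_machine_top_n) for one article."""
    human = top_concepts(preprocess(human_text), n)
    machine = top_concepts(preprocess(machine_text), n)
    if not human or not machine:
        return False, False
    return human[0] == machine[0], human[0] in machine


# Toy sentences standing in for one article translated two ways (invented data).
human = ("The terrorists attacked the city. The attack shocked the nation, "
         "and the attack continued.")
machine = "Terrorists attack city. Attack shock nation and attack continue."

vocab, matrix = adjacency_matrix(preprocess(human))
print(vocab)
print(top_concept_agreement(human, machine))
```

Running `top_concept_agreement` over each article pair and counting the results would give tallies of the same kind as the "16 of the 22" figure reported above, under whatever preprocessing choices are actually used.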