Stanford, J., McCulloh, I. (2006). Text Analysis Using Automated Language Translators. Proceedings of the 14th Annual Army Research Lab - US Military Academy Technical Symposium, Aberdeen, MD, 1 Nov 2006.

Text Analysis Using Automated
Language Translators
CDT John Stanford
MAJ Ian McCulloh
Agenda
• Overview and Hypothesis
• Literature Review
• Motivation (Radio Address Case Study)
• Arabic Translation Data
• Conclusions and Recommendations
Overview and Hypothesis
• Text analysis is a useful tool for gathering intelligence.
• A language barrier makes text analysis harder in non-English-speaking regions.
• Hiring human translators to translate texts into English is slow,
expensive, and possibly a security issue.
• Hypothesis: Output from automated machine translators such as the
Forward Area Language Converter (FALCon) is difficult for the
average person to understand, but is just as useful for text analysis
as human-translated text.
Literature Review
• This project relates to two ARL projects: FALCon and the ARL
Dynamic Network Analysis Lab.
• Language can be modeled mathematically as a network of concepts
using an adjacency matrix (Sowa, 1984).
• Preprocessing steps such as stemming, deletion, and thesaurus
application prepare a text for analysis (Carley and Diesner, 2004).
• AutoMap, being developed by Carnegie Mellon University, inputs
texts and outputs adjacency matrices.
• ORA, also being developed by CMU, inputs the adjacency matrices
and outputs the mental models (Carley and Reminga, 2004).
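A minimal Python sketch of the preprocessing and network-coding steps above, assuming a toy delete list, a toy thesaurus, and a crude suffix stripper in place of a real stemmer. The names DELETE_LIST, THESAURUS, stem, preprocess, and adjacency_matrix are illustrative and are not AutoMap's interface; the windowed co-occurrence rule is one common way to link concepts, not necessarily the one AutoMap applies.

```python
# Hypothetical delete list and thesaurus; a real study would use curated lists.
DELETE_LIST = {"the", "a", "an", "of", "and", "to", "in", "on"}
THESAURUS = {"assault": "attack", "strike": "attack"}


def stem(word):
    """Crude suffix stripping, standing in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Apply deletion, stemming, and thesaurus mapping, in that order."""
    tokens = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    tokens = [w for w in tokens if w and w not in DELETE_LIST]
    tokens = [stem(w) for w in tokens]
    return [THESAURUS.get(w, w) for w in tokens]


def adjacency_matrix(concepts, window=2):
    """Count links between concepts that share a sliding window.

    `window` is the window size in tokens, including the current one,
    so window=2 links each concept to the one that follows it.
    """
    vocab = sorted(set(concepts))
    index = {c: i for i, c in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in vocab]
    for i, concept in enumerate(concepts):
        for other in concepts[i + 1 : i + window]:
            if other != concept:
                matrix[index[concept]][index[other]] += 1
                matrix[index[other]][index[concept]] += 1
    return vocab, matrix


concepts = preprocess("The soldiers attacked the town and the strikes continued.")
vocab, matrix = adjacency_matrix(concepts)
```

The resulting matrix is symmetric, with one row and column per concept in `vocab`; a network tool can then read it as an undirected, weighted graph.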
Text Analysis Process
Radio Address Study
• 94 of the President’s weekly radio addresses analyzed
• From after Sep 11th to after the beginning of OIF (15 Sep 2001 to 21
June 2003)
• Concept of ‘violence’ plotted on timeline; high occurrence after Sep
11th and leading up to OIF
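The timeline idea can be sketched in Python as a per-address concept rate sorted by date. This is an assumption-laden illustration: VIOLENCE_TERMS is a hypothetical word list, the two sample sentences are invented rather than taken from the addresses, and the study itself may have measured the concept differently.

```python
from datetime import date

# Hypothetical word list standing in for the 'violence' concept.
VIOLENCE_TERMS = {"war", "attack", "terror", "kill", "weapon"}


def concept_rate(text, terms=VIOLENCE_TERMS):
    """Fraction of words in `text` that belong to the concept's word list."""
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in terms)
    return hits / len(words)


def timeline(addresses):
    """Turn (date, text) pairs into (date, rate) pairs sorted by date."""
    return sorted((d, concept_rate(t)) for d, t in addresses)


# Invented sample sentences, not actual address text.
sample = [
    (date(2003, 1, 4), "We seek peace."),
    (date(2001, 9, 15), "This is a war on terror."),
]
points = timeline(sample)
```

Normalizing by address length keeps long and short addresses comparable when the rates are plotted.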
Arabic Text Analysis
• Arabic translated using CyberTrans, part of the FALCon package.
• 22 Arabic articles from the Department of State’s news site analyzed
(US Dept of State, 2006).
Analysis Results
• Top concepts for the two methods of
translation are the same in 16 of the
22 articles.
• Top concept in the human-translated
text is in the top three machine-translated concepts for all articles.
• When the methods differ, the human
translation isn’t necessarily better.
[Figure legend: Human, Machine]
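The two agreement measures reported above (same top concept, and human top concept within the machine top three) can be computed as in this Python sketch. It assumes each article has already been reduced to a list of concepts for each translation; top_concepts, compare, and the sample data are hypothetical and do not reproduce the study's 22 articles.

```python
from collections import Counter


def top_concepts(concepts, n=3):
    """Return the n most frequent concepts, most frequent first."""
    return [c for c, _ in Counter(concepts).most_common(n)]


def compare(pairs):
    """Compare human and machine translations article by article.

    `pairs` holds one (human_concepts, machine_concepts) tuple per article.
    Returns (articles with the same top concept,
             articles whose human top concept is in the machine top three).
    """
    same_top = 0
    in_top_three = 0
    for human, machine in pairs:
        human_top = top_concepts(human, 1)
        machine_top3 = top_concepts(machine, 3)
        if not human_top or not machine_top3:
            continue  # skip articles with no concepts left after preprocessing
        if human_top[0] == machine_top3[0]:
            same_top += 1
        if human_top[0] in machine_top3:
            in_top_three += 1
    return same_top, in_top_three


# Invented concept lists for two articles.
pairs = [
    (["peace", "peace", "trade"], ["peace", "peace", "trade"]),
    (["aid", "aid", "iraq"], ["iraq", "iraq", "iraq", "aid", "aid", "food"]),
]
same_top, in_top_three = compare(pairs)
```

Ties in concept frequency are broken by order of first appearance, so a fuller analysis would want an explicit tie rule.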
Conclusions and
Recommendations
• Automated text analysis makes it fast and economical to look at
trends in local publications of strategically significant regions over
either time or space.
• Detailed statistical analysis must be done on this data.
• Intelligence agencies that have access to large volumes of
REDFOR data should run this kind of text analysis to verify that it
works as well on REDFOR data as on BLUFOR data.
• FALCon development should continue and possibly be expanded to
other languages such as Farsi.
Works Cited
Bush, George. (2001-03). “President Bush’s Radio Addresses by date and
topic.” Washington, DC: Office of the Press Secretary. Available from
<http://www.whitehouse.gov/news/radio/index.html>.
Carley, Kathleen and Diesner, Jana. (2004). Revealing Social Structure
from Texts: Meta-Matrix Text Analysis as a novel method for Network
Text Analysis. Causal Mapping for Information Systems and Technology
Research: Approaches, Advances, and Illustrations. Harrisburg, PA:
Idea Group Publishing.
Sowa, J.F. (1984). Conceptual Structures: Information Processing in Mind
and Machine. Reading, MA: Addison-Wesley.
US Dept of State. (2006). “News from Washington.” Washington, DC:
Office of the Press Secretary. Available from
<http://usinfo.state.gov/usinfo/products/washfile.html>.
Questions?
Dept of Mathematical Sciences
United States Military Academy
Dynamic Network Analysis Lab
Army Research Lab