Dallin Hardcastle
LING 480
11/14/2012
RbMT or SMT?
For this paper, I investigated the major differences between Rule-Based Machine Translation (RbMT) and Statistical Machine Translation (SMT) to determine which approach is currently superior and which points the way forward for machine translation. While each approach has real strengths, I have concluded that neither is currently superior, and neither alone is the answer to more efficient translation. The answer lies in an effective mix of the two: a blend of RbMT and SMT, known as Hybrid Machine Translation, will lead to the biggest advances in machine translation since the invention of the computer.
RbMT uses grammars, morphological rules, and other linguistic principles to perform translations. Such systems can perform very well once they have been thoroughly developed. SYSTRAN is a company with a long record of success in RbMT, dating back to 1968, when it was founded by Dr. Peter Toma. It was one of the few translation companies to survive the sharp cuts in funding that followed the ALPAC report. Its system helped the United States translate millions of documents during the Cold War and was the foundation of the free online translation service Yahoo! Babel Fish. The drawback of RbMT is that rapid translation is not feasible unless extensive grammatical rules have already been written for a given language pair, and developing those rules is costly and time-consuming.
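To make the idea of translation by rules concrete, here is a toy sketch in Python. It is not a description of SYSTRAN's actual system. It assumes a tiny hand-written Spanish-English lexicon and a single grammar rule (Spanish noun-adjective order becomes English adjective-noun order); the lexicon, the rule, and the function name translate_rule_based are all hypothetical. Any word missing from the lexicon simply cannot be translated, which is the kind of gap that makes building RbMT coverage slow and expensive.

```python
# Toy rule-based translator. The lexicon and the single reordering rule
# are hypothetical illustrations, not any real RbMT system's data.
# Each entry maps a Spanish word to (English word, part of speech).
LEXICON = {
    "el": ("the", "DET"),
    "la": ("the", "DET"),
    "gato": ("cat", "NOUN"),
    "casa": ("house", "NOUN"),
    "negro": ("black", "ADJ"),
    "blanca": ("white", "ADJ"),
    "come": ("eats", "VERB"),
    "ve": ("sees", "VERB"),
}


def translate_rule_based(sentence: str) -> str:
    """Translate word by word, then apply one reordering rule.

    Rule: Spanish NOUN ADJ becomes English ADJ NOUN.
    Words missing from the lexicon are flagged as <word?>.
    """
    tagged = []
    for word in sentence.lower().split():
        if word in LEXICON:
            tagged.append(LEXICON[word])
        else:
            # No dictionary entry: the rules have nothing to work with.
            tagged.append((f"<{word}?>", "UNK"))

    # Apply the reordering rule left to right.
    i = 0
    while i < len(tagged) - 1:
        if tagged[i][1] == "NOUN" and tagged[i + 1][1] == "ADJ":
            tagged[i], tagged[i + 1] = tagged[i + 1], tagged[i]
            i += 2  # skip past the pair just swapped
        else:
            i += 1
    return " ".join(word for word, _ in tagged)


print(translate_rule_based("el gato negro ve la casa blanca"))
# the black cat sees the white house
print(translate_rule_based("el gato negro come pescado"))
# the black cat eats <pescado?>
```

The grammar rule produces correct English word order, but the second sentence fails on "pescado" because no one has written an entry for it.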
Serious SMT research began in the early 1990s, when the United States government, specifically DARPA (the Defense Advanced Research Projects Agency), funded an IBM project called CANDIDE. The idea was to develop algorithms that statistically analyze a large set of bilingual corpora in order to produce accurate, fluent-sounding translations. The project's algorithms were guaranteed only 80% accuracy, so accurate translations could not be guaranteed. DARPA eventually rated SYSTRAN’s system higher on accuracy, and funding for CANDIDE was cut. SMT does have advantages, however: it produces translations very quickly. Google Translate, probably the most widely used free MT service online today, is statistically based. However, SMT is not a reliable solution for translating a complex sentence whose meaning depends on its grammatical structure.
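The statistical idea can be illustrated with a minimal sketch, assuming a simplified IBM Model 1 (the word-alignment model family that came out of IBM's statistical MT research) trained by expectation-maximization on a three-sentence corpus that I invented. It learns word-translation probabilities purely from which words co-occur across sentence pairs, with no grammar at all. Real SMT systems add a NULL word, a language model, reordering, and vastly larger corpora. The names CORPUS, train_model1, best_translation, and translate_word_by_word are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy parallel corpus of (Spanish, English) sentence pairs.
CORPUS = [
    ("el perro", "the dog"),
    ("el libro", "the book"),
    ("un libro", "a book"),
]


def train_model1(pairs, iterations=10):
    """Estimate word-translation probabilities t(e|f) with EM, IBM Model 1 style.

    Simplified: no NULL word and no language model.
    """
    pairs = [(f.split(), e.split()) for f, e in pairs]
    english_vocab = {e for _, es in pairs for e in es}
    # Start uniform: every English word is equally likely for every source word.
    t = defaultdict(lambda: 1.0 / len(english_vocab))

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of e aligned to f
        total = defaultdict(float)  # expected count of f aligned to anything
        # E-step: share each English word's count among the source words in
        # the same sentence, in proportion to the current t(e|f).
        for fs, es in pairs:
            for e in es:
                norm = sum(t[(e, f)] for f in fs)
                for f in fs:
                    share = t[(e, f)] / norm
                    count[(e, f)] += share
                    total[f] += share
        # M-step: turn expected counts into new probabilities.
        t = defaultdict(float, {(e, f): c / total[f] for (e, f), c in count.items()})
    return t


def best_translation(t, f):
    """Return the English word with the highest learned t(e|f)."""
    candidates = {e: p for (e, src), p in t.items() if src == f}
    if not candidates:
        return f"<{f}?>"  # word never seen in the corpus
    return max(candidates, key=candidates.get)


def translate_word_by_word(t, sentence):
    """Pick the most probable translation of each word; no grammar, no reordering."""
    return " ".join(best_translation(t, f) for f in sentence.split())


t = train_model1(CORPUS)
print(translate_word_by_word(t, "un perro"))  # a dog
```

Even though "un perro" never appears in the corpus, the model translates it because "un" and "perro" were each pinned down by other sentences. Because it only chooses words, though, it has no way to fix word order, which illustrates SMT's weakness with grammatical structure.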
Dr. Sabine Hunsicker, a researcher at the German Research Center for Artificial Intelligence, compares the two approaches: “While SMT systems suffer from a lack of grammatical structure, resulting in ungrammatical sentences, RbMT systems have to deal with a lack of lexical coverage” (Hunsicker, 312). Both approaches, in other words, have serious shortcomings, and each is weak precisely where the other is strong. Dr. Yorick Wilks, a professor of Artificial Intelligence at the University of Sheffield in the U.K., believes that a hybrid system is the answer for the future (Wilks, 89). Ironically, IBM and SYSTRAN, the former rivals, are among the companies leading the development of such technology. SYSTRAN released a new hybrid technology in 2010, and IBM has teamed up with LinguaSys and Google Translate to develop its own. It will be interesting to see whether these new hybrid systems prove superior to their predecessors.
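One simple way such a hybrid might work is sketched below. This is my own toy illustration, not Hunsicker's method or the actual SYSTRAN or IBM systems. It assumes a hand-written rule lexicon with one reordering rule, plus a table of invented probabilities standing in for what an SMT component would learn from a corpus; it also assumes each statistical entry carries a part-of-speech tag so the grammar rule can use it. The rules supply grammatical structure, and the statistical table fills in words the rules do not cover. All names (RULE_LEXICON, STAT_TABLE, lookup, translate_hybrid) are hypothetical.

```python
# Hypothetical hybrid sketch: rules handle grammar, statistics fill lexical gaps.

# Hand-written rule lexicon: Spanish word -> (English word, part of speech).
RULE_LEXICON = {
    "el": ("the", "DET"),
    "gato": ("cat", "NOUN"),
    "negro": ("black", "ADJ"),
    "come": ("eats", "VERB"),
}

# Invented probabilities standing in for a table learned from a parallel corpus.
# Assumption: each entry also carries a part-of-speech tag for the rule below.
STAT_TABLE = {
    "pescado": ({"fish": 0.8, "fished": 0.2}, "NOUN"),
    "fresco": ({"fresh": 0.7, "cool": 0.3}, "ADJ"),
}


def lookup(word):
    """Prefer the rule lexicon; fall back to the most probable statistical entry."""
    if word in RULE_LEXICON:
        return RULE_LEXICON[word]
    if word in STAT_TABLE:
        candidates, pos = STAT_TABLE[word]
        return max(candidates, key=candidates.get), pos
    return f"<{word}?>", "UNK"  # neither component covers the word


def translate_hybrid(sentence):
    tagged = [lookup(w) for w in sentence.lower().split()]
    # Grammar rule from the rule-based side: Spanish NOUN ADJ -> English ADJ NOUN.
    i = 0
    while i < len(tagged) - 1:
        if tagged[i][1] == "NOUN" and tagged[i + 1][1] == "ADJ":
            tagged[i], tagged[i + 1] = tagged[i + 1], tagged[i]
            i += 2  # skip past the pair just swapped
        else:
            i += 1
    return " ".join(word for word, _ in tagged)


print(translate_hybrid("el gato negro come pescado fresco"))
# the black cat eats fresh fish
```

Neither component alone handles this sentence: the rule lexicon lacks "pescado" and "fresco," and a word-by-word statistical lookup would leave "fish fresh" in Spanish order.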
The remaining question is whether Hybrid MT will consistently provide FAHQUT translations. My hypothesis is that while the number of full-time translators may decline, the number of full-time technical support positions will increase. The industry will continue to grow, and more jobs will become available, but to those with technological training, not just to translators.
Works Cited
Hunsicker, Sabine. “Machine Learning for Hybrid Machine Translation.” Proceedings of the 7th Workshop on Statistical Machine Translation. Montréal, Canada: Association for Computational Linguistics, 8 June 2012. 312-316.
Wilks, Yorick. Machine Translation: Its Scope and Limits. New York; London: Springer, 2008. Print.