Machine translation - Villanova Department of Computing Sciences

Introduction to Machine Translation
CSC 5930 Machine Translation
Fall 2012
Dr. Tom Way
HISTORY OF MACHINE TRANSLATION
History of Machine Translation
(Based on work by John Hutchins, mt-archive.info)
• Before the computer: In the mid-1930s, the French-Armenian Georges Artsrouni and the Russian Petr
Troyanskii each applied for patents for ‘translating machines’.
• The pioneers (1947-1954): the first public MT demo was
given in 1954 (by IBM and Georgetown University).
• Machine translation was one of the first applications
envisioned for computers
History of MT (2)
Warren Weaver, PhD, was an American scientist, mathematician,
and science administrator. He is widely recognized as one of the
pioneers of machine translation, and as an important figure in
creating support for science in the United States.
History of MT (3)
MT was first demonstrated by IBM in
1954 with a basic word-for-word translation system.
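As a rough illustration of what "word-for-word" translation means, here is a minimal sketch. It is not the 1954 system: the Spanish-to-English glossary and the function name `translate_word_for_word` are hypothetical and chosen only for this example. Each source word is looked up independently and the source word order is kept, which is why the output reads "the house white" instead of "the white house".

```python
# Hypothetical toy Spanish-to-English glossary; the entries are illustrative only.
GLOSSARY = {
    "la": "the",
    "casa": "house",
    "blanca": "white",
    "es": "is",
    "grande": "big",
}

def translate_word_for_word(sentence, glossary):
    """Replace each source word with its glossary entry, keeping source word order."""
    output = []
    for word in sentence.lower().split():
        # Words missing from the glossary are passed through in brackets.
        output.append(glossary.get(word, f"[{word}]"))
    return " ".join(output)

print(translate_word_for_word("La casa blanca es grande", GLOSSARY))
# prints: the house white is big
```

The wrong adjective order in the output shows the main weakness of this approach: it has no model of how word order differs between languages.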
History of MT (4)
• The decade of optimism (1954-1966)
ended with the…
• ALPAC (Automatic Language Processing
Advisory Committee) report in 1966:
“There is no immediate or predictable
prospect of useful machine translation.”
History of MT (5)
The ALPAC Report
The ALPAC (Automatic Language
Processing Advisory Committee)
was a government committee of seven
scientists.
Their 1966 report was very
skeptical of the progress in
computational linguistics and
machine translation.
History of MT (6)
• The aftermath of the ALPAC report…
• Research on machine translation virtually
stopped from 1966 to 1980
History of MT (7)
• Then, a rebirth…
• The 1980s: Interlingua, example-based
Machine Translation
• The 1990s: Statistical MT
• The 2000s: Hybrid MT
• The 2010s: Google, real-time, mobile,
Crowdsourcing, more hybrid approaches
MACHINE TRANSLATION TODAY
Where are we now?
• Huge potential/need due to the internet, globalization
and international politics.
• Quick development time due to Statistical Machine
Translation (SMT), the availability of parallel data and
computers.
• Translation is reasonable for language pairs with a large
amount of resources.
• Systems are starting to include more “minor” languages.
Rule-based MT
The Vauquois Triangle
Statistical MT
The Rosetta Stone
What is MT good for?
• Rough translation: web data
• Computer-aided human translation
• Translation for limited domain
• Cross-lingual IR
• Machines beat humans at:
– Speed: much faster than humans
– Memory: can easily memorize millions of word/phrase
translations.
– Manpower: machines are much cheaper than humans
– Fast learner: it takes minutes or hours to build a new system.
– Never complain, never get tired, …
Interest in Machine Translation (1)
• Commercial interest:
– U.S. has invested in machine translation (MT)
for intelligence purposes
– MT is popular on the web—it is the most used
of Google’s special features
– EU spends more than $1 billion on translation
costs each year.
– (Semi-)automated translation could lead to
huge savings
Interest in Machine Translation (2)
• Academic interest:
– One of the most challenging problems in NLP
research
– Requires knowledge from many NLP subareas, e.g., lexical semantics, syntactic
parsing, morphological analysis, statistical
modeling,…
– Being able to establish links between two
languages allows for transferring resources
from one language to another
Goals & Uses
• Translating
• Summarizing
• Communicating
• Pre-editing
• Grammar analysis
• Analyzing text
• Understanding text and images
DO WE REALLY NEED MACHINE TRANSLATION?
Languages on the Internet
Languages on Twitter
Languages in Los Angeles
Why do we need MT?
Why is MT hard?
• For example…
• Commercial system “Language Weaver”
created in 2002
• Uses statistical techniques from
cryptography and machine learning to acquire
statistical models from human translations
• Sold in 2010 for $42.5 million
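To make "acquiring statistical models from human translations" concrete, here is a minimal sketch of IBM Model 1, a classic textbook word-translation model trained with expectation-maximization (EM). It is an assumption chosen for illustration, not a description of Language Weaver's system. The three-sentence corpus and the names `train_model1` and `best_translation` are hypothetical, and the NULL source word of the full model is omitted for brevity.

```python
from collections import defaultdict

# Hypothetical toy parallel corpus (source, target); real systems use millions of pairs.
CORPUS = [
    ("la casa", "the house"),
    ("la flor", "the flower"),
    ("casa blanca", "white house"),
]

def train_model1(pairs, iterations=10):
    """Estimate word-translation probabilities t[(target, source)] with EM."""
    pairs = [(s.split(), t.split()) for s, t in pairs]
    # Uniform (unnormalized) start over every co-occurring word pair.
    t = {(tw, sw): 1.0 for src, tgt in pairs for tw in tgt for sw in src}
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of each (target, source) link
        total = defaultdict(float)  # expected count of each source word
        for src, tgt in pairs:
            for tw in tgt:
                # E-step: split one unit of credit for tw among the source words.
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    frac = t[(tw, sw)] / norm
                    count[(tw, sw)] += frac
                    total[sw] += frac
        # M-step: renormalize expected counts into probabilities per source word.
        t = {(tw, sw): c / total[sw] for (tw, sw), c in count.items()}
    return t

def best_translation(t, source_word):
    """Return the target word with the highest probability for source_word."""
    candidates = {tw: p for (tw, sw), p in t.items() if sw == source_word}
    return max(candidates, key=candidates.get)

model = train_model1(CORPUS)
for word in ["la", "casa", "flor", "blanca"]:
    print(word, "->", best_translation(model, word))
```

No dictionary is supplied here: the word pairings emerge only from which words co-occur across the translated sentence pairs, which is the core idea behind statistical MT.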
“Language Weaver” SMT System – Comparison: Arabic to English
v.2.0 – October 2003
v.2.4 – October 2004
v.3.0 – February 2005