Machine Translation

Maria Hedblom
Cognitive Science 2 (Kognitionsvetenskap 2)
Autumn term 2010 (Ht 2010)
Table of contents
1. Introduction
1.1 Purpose
1.2 Word definition and comments
1.3 Why Machine Translation?
1.4 Introducing Machine Translation
2. Problems with Machine Translation
2.1 Metaphors and Anecdotes
2.2 Ambiguity and Fertility
3. Traditional Machine Translation
3.1 Direct MT
3.2 Interlingua system
3.3 Transfer system MT
4. Modern Machine Translation
4.1 Rule based MT
4.2 Corpus based MT
4.3 Statistic based MT
4.4 Hybrid MT
5. Discussion
6. References
1. Introduction
1.1 Purpose
Part of the examination in the course Artificiell Intelligens II (Artificial Intelligence II), course code 729G11, was to research a subject within Artificial Intelligence and to write an essay on the knowledge gained.
The purpose of this report is therefore, first and foremost, to increase my own knowledge of this particular subject, but I hope that others who read it may find it helpful for understanding the basics of Machine Translation.
The report will focus on explaining some of the problems of Machine Translation, its different forms, and how their artificial intelligence makes each of them unique among the others.
I have chosen to study Machine Translation because I am interested in how different languages affect us humans and how Machine Translation can help us understand some of these differences, both linguistically and culturally.
1.2 Word definition and comments
Machine Translation (MT)
Knowledge based Machine Translation (KBMT)
Rule based Machine Translation
Statistical based Machine Translation (SMT)
Hybrid Machine Translation (HMT)
Artificial Intelligence
Expectation Maximization (EM)
How to read it:
Language[words]: This indicates that the words or sentence inside the brackets are in the named language.
Important term
René Descartes
French philosopher, scientist and soldier who lived 1596-1650. Known among other things for his involvement in rationalism and the theory that knowledge is achieved from
Rosetta Stone
A basalt stone, dating from 196 BC, with an inscription, found in the Egyptian city of Rosetta in 1799. The inscription is written in three scripts: Greek, Egyptian demotic and Egyptian hieroglyphs. Mostly known for its role in deciphering the Egyptian hieroglyphs.2
Star Trek
A science fiction tv-show, movie series and world wide
1.3 Why Machine Translation?
Thanks to modern technology such as the Internet, cell phones and a highly developed infrastructure, we can easily have conversations with people all over the world, do business between nations that speak different languages, and travel just for the fun of it.
The Internet, online literature and the wish to reach as many people as possible with a certain message are only a few of the reasons why Machine Translation (MT) is so important. It can help us overcome the barriers that different languages create and increase communication between nations, and therefore also the welfare of the world.
Unfortunately, translating a text from one language to another is not quite as easy as one would like, which is one reason why this research is still current and ongoing.
1 The Philosophy Net, 2010-09-26
2 Nationalencyklopedin, 2010-09-24
Machine Translation has proven to be very useful within several areas, and I will mention a few of these to help you understand how useful software such as this can be.
The first, perhaps unnecessary, area to mention is the possibility of making a Literary translation of a text. This has been found to be very difficult, and is still not possible for all texts, but it is rather easy to produce a Rough translation of a text using an MT. This is very useful when you, for example, look at a foreign web page and wish to get the gist of its contents.3
Another area is Restricted-source translation, in which a text concerning a certain subject is translated. An example that has proved rather successful is METEO, a translation system made by the TAUM group at the University of Montreal, which has been in use since 1977 to translate weather reports from English to French.4, 5
The last area in which MT has been successful that I will mention is Preedited translation: a human writes a text following the rules that the MT in question uses, in other words in a restricted language, to make it easier to translate the text into different languages afterwards. These restricted languages are often called Caterpillar English because the first company to try this translation model was Caterpillar Corp., when they tried to make the translation of manuals more
1.4 Introducing Machine Translation
In 1799 French officers found a stone, dating from 196 BC, in the Egyptian city of Rosetta. The stone carried an inscription written in three different scripts. The content of the inscription was irrelevant, but it contributed the final piece of the puzzle needed to interpret the Egyptian hieroglyphs. Ever since then the Rosetta Stone has been a symbol of translation.
Today we might not find stones to help us understand other languages but instead we
have invented Machine Translation, a system that allows us to do as little as possible and still
understand other languages. But what is Machine Translation?
“The job of a translator is to render in one language the meaning expressed by a
passage of text in another language.”7
For MT this is preferably done with as little human interaction as possible, if any at all.
3 Russell, S, Norvig, P; Artificial Intelligence: A Modern Approach, New Jersey 2003, p.851
4 Chandious, 2010-09-25
5 Russell, p.851
6 Russell, p.851
7 Brown, P.F, Cocke, J, Della Pietra, S.A, Della Pietra, V.J, Jelinek, F, Lafferty, J.D, Mercer, R.L, Roossin, P.S; A Statistical Approach to Machine Translation, Computational Linguistics Volume 16 (1990) 2:79-85
Even though Machine Translation has been a practical process only since the 1950s, the idea of machine translation has been alive for more than 300 years. In the 17th century René Descartes introduced the idea of a universal language: a language that shared the same content and symbols across different tongues. Today there is no such widespread language, apart perhaps from mathematical symbols. But we can find universal languages in smaller “universes”. One example is the Chinese writing system: there are many different dialects, but each and every one of them uses the same writing system. English could also be considered a form of universal language.
How did this then turn into actual Machine Translation? It wasn't until the early 20th century that technology had advanced enough for MT to start evolving. A few pioneers, among them George Artsrouni and Petr Smirnov-Troyanskii, patented their ideas in the early 1930s. Artsrouni had made a storage device on paper tape which could find the equivalent of a word in other languages. But it was Smirnov-Troyanskii's idea that had more significance for modern MT. He proposed a translation process in three steps. In the first, a linguist who knew the source language made a logical analysis, turning the sentence into base forms and syntactic functions. In the second, a machine translated these base forms into the other language, and lastly another linguist put the finishing touches to the translation.8
It may seem simple, but it is very similar to today's MT, even though steps one and three are now also done by machines rather than by humans.
During the 1950s, with the development of computers, MT had its prime years and the expectations were very high. Surprisingly little of the early MT was based on theoretical linguistics9. Instead the approaches focused on word-for-word translation and statistical methods. After a couple of decades it was found that the expectations had been too high, and the field drew back for some time. The first really successful MT after the first wave came in 1977, with METEO by the TAUM group at the University of Montreal.10
Today MT researchers know the limitations and, instead of trying to eliminate them fully, focus on producing good enough translations despite the problems.
8 Hutchins, W.J; Machine Translation: A Brief History, p.3
9 Nirenburg, S; Readings in Machine Translation, 2003, p.4
10 Hutchins, p.7
2. Problems with Machine Translation
As I have already mentioned, there are several problems that prevent Machine Translation (MT) from being able to translate a text perfectly. Sergei Nirenburg gives a list of six elements that an MT must manage before a perfect translation is possible.
The first one is the Field of discourse: the MT must be able to recognise the general subject of the text. The second is that it must be able to Recognise coherent word groups, such as idioms and compound nouns. The third is to know the Syntactic function of each word. The fourth is to understand the Selectional relations between words in open classes, that is, nouns, verbs, adjectives and adverbs. The fifth is Antecedents, meaning the ability to understand how words are built and to place translated words in the right order based on previous statements in the text.
These first five are, with some exceptions, focused on a linguistic approach and can more or less be solved using linguistic rules and corpuses in different forms. Even though we know where to start looking for a solution to these problems, it is not as if MT creators have all the answers; it is still very difficult. But perhaps the most challenging part is the last one on Nirenburg's list: for an MT to actually Understand the context of a text, that is, not only to understand the words and the gist but to actually follow the reasoning and the underlying contents and meanings. It is likely that this is the last piece of the puzzle of MT, if it ever is
2.1 Metaphors and Anecdotes
As Nirenburg said, understanding the context is one of the most difficult parts of MT. Metaphors and anecdotes are two of the difficult things an MT has to translate and are therefore suitable to explore further.
Machine Translation programs use different kinds of Artificial Intelligence to translate a text, which is explained later. For now, a person trying to learn a language is a good enough simile. To help him he has a dictionary and a grammar book. He will most likely translate English[It's raining cats and dogs] literally, which most likely won't make any sense to him, since the expression does not mean that it is in fact raining mammals. The direct translation would be Swedish[Det regnar katter och hundar], whereas a more appropriate translation would be Swedish[Det spöregnar], which basically means that it rains a lot.
11 Nirenburg, p.7
To fully understand the expression he will need someone, or something, to inform him that this phrase is not meant to be taken literally but as an expression for something else. This can be done with a large lexicon in which all metaphors are listed, which would probably work pretty well even though it is inefficient in both memory and time. But what about anecdotes?
Anecdotes are stories that try to say something other than their actual contents. They are usually built on the local culture and are therefore very difficult to explain. Religious texts, for example, are full of anecdotes. The Holy Bible's Jesus speaks more or less only in anecdotes as He delivers His message12. An MT would probably be able to give a fair enough translation, but it is likely that the real message would be misinterpreted.
It is interesting to ask whether machine translation can lead to a loss not only of the obvious context but also of culture. Religious texts or even poetry, for example, wouldn't perhaps be the first thing you'd want a machine to translate, would they?
2.2 Ambiguity and Fertility
Ambiguity means that a word or a text can have two or more meanings, which of course makes MT more difficult. Take for example the word English[Light], which means either not dark or not heavy.
Ambiguity comes in two forms: it is either Lexical or Structural. The example above is a typical Lexical ambiguity, meaning that one word has two or more different meanings. If we assume that an MT has knowledge of English and Swedish grammar and knows that Coca Cola is a name, a translation might look like this:
English[Coca Cola light] → Swedish[Ljus Coca Cola] or Swedish[Lätt Coca Cola]
This is perhaps not exactly what we're trying to express. Another example is:
English[The queen can't bear children] → Swedish[Drottningen kan inte få barn] or Swedish[Drottningen står inte ut med barn]
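The way lexical ambiguity multiplies the possible translations can be sketched in a few lines of Python. This is only an illustration: the tiny lexicon and the function name candidate_translations are assumptions made for this example, and word order is left exactly as in the English input.

```python
from itertools import product

# Hypothetical toy lexicon: every English word maps to all Swedish senses we know.
LEXICON = {
    "coca": ["Coca"],
    "cola": ["Cola"],
    "light": ["ljus", "lätt"],  # "not dark" versus "not heavy"
}


def candidate_translations(sentence):
    """Return every word-for-word translation that the lexicon allows."""
    # Unknown words are passed through unchanged as their only option.
    options = [LEXICON.get(word.lower(), [word]) for word in sentence.split()]
    return [" ".join(choice) for choice in product(*options)]


candidates = candidate_translations("Coca Cola light")
print(candidates)
```

Every ambiguous word multiplies the number of candidates, so a sentence with two ambiguous words of two senses each already gives four candidates to choose between.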
12 The Holy Bible, Matthew 1:1 - John 21:25
In an attempt to prevent this lexical ambiguity from ruining MT, Fertility was introduced. It is mostly used in Statistical based MT and means that a word with fertility n is copied n times, so that each copy can be translated separately and all possible translations are considered.13
Structural ambiguity is when the structure of a sentence allows more than one interpretation. For example, English[The policeman killed the man with a gun] can mean either that the policeman used a gun to kill a man, or that the policeman killed a man who had a gun. Neither interpretation is wrong, nor is one more correct than the other. The only way to know which interpretation is intended is to analyse the context of the text, something a MT does not
3. Traditional Machine Translation
Traditional Machine Translation (MT) is what today's MT is based on, even though MT keeps evolving as new ideas come into the picture. To understand where MT is today we need to have knowledge of the original versions of MT.
3.1 Direct MT
This form of MT is the most basic one. It translates the individual words in a sentence from one language to another using a two-way dictionary. To help it, it uses only the most basic and general grammar rules. Irregular verbs often come out wrong since they are inflected as regular verbs, and the word order is often imperfect because of the word-for-word translation.15
Even though the MT makes many mistakes, the translation is usually comprehensible to a human reader. However, since the system chooses words from a statistical point of view, the most common translation first, and not depending on the content of the text, it is likely that some of the source text's meaning is lost in translation, especially in more advanced texts.16
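The word-for-word principle described above can be sketched as a plain dictionary lookup. This is a minimal sketch and not how any real system is implemented: the dictionary contents and the function name translate_direct are assumptions for this example, and each entry simply lists its translations with the most common one first.

```python
# Hypothetical English-to-Swedish dictionary. Translations are listed
# most-common-first, so index 0 is the default "statistical" choice.
DICTIONARY = {
    "he": ["han"],
    "is": ["är"],
    "in": ["i"],
    "his": ["hans", "sin", "sitt"],
    "house": ["hus"],
}


def translate_direct(sentence, dictionary=DICTIONARY):
    """Translate word for word, always taking the most common translation."""
    translated = []
    for word in sentence.lower().split():
        senses = dictionary.get(word)
        # Words missing from the dictionary are passed through unchanged.
        translated.append(senses[0] if senses else word)
    return " ".join(translated)


print(translate_direct("He is in his house"))  # prints: han är i hans hus
```

Because the lookup never considers the rest of the sentence, "his" always becomes "hans", even where Swedish grammar would require "sitt".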
The picture below is an attempt to show how Direct MT translates between different languages, A-D.
Figure 1: Explains the principles of Direct MT.
13 Russell, p.855
14 Bach, K; Ambiguity, 2010-09-26
15 Watters, P.A, Patel, M; Semantic processing performance of Internet machine translation systems, p.153
16 Watters, p.155
One of the oldest and most widely used Direct MTs is SYSTRAN, a translation software that has been active for over 40 years and is integrated in several online MTs17, for example Yahoo's “Babelfish”18. As a simple example of how it works, the following sentence has been translated using a Direct MT:
English[The dog is in his house] → Swedish[Hunden är i hans hus]
Those of us who speak Swedish can see that even though the words are translated correctly, the translation of “his → hans” is inappropriate. The pronoun should be inflected as “sitt” instead of
3.2 Interlingua system
This is actually a sub version of Direct MT rather than a separate MT version in itself. Even though the basics of how the translation works are fundamentally the same, it is still unique in the way it reaches a translation.
This form of Direct MT converts the words into a universal language created for the MT, only to translate them once more into the language we were originally interested in. Sometimes this universal language is Esperanto or English. This has many benefits when a text is to be translated into many different languages. Since we no longer need a two-way dictionary for every pair of languages, an Interlingua MT can easily add new languages without much effort. However, its time efficiency is lower than that of an ordinary Direct MT.19
17 SYSTRAN, 2010-10-03
18 Babelfish, 2010-10-03
Compare figure 2, which shows how an interlingua system works, with the ordinary Direct MT in figure 1.
Figure 2: Picture showing how different languages are translated through an Interlingua language.
An example made with English as the interlingua could look like this:
Spanish[El perro está en su casa] → Interlingua[The dog is in his house] → Swedish[Hunden är i hans hus]
It is important to notice that, since an interlingua translation involves two translations, one from the source language and one from the interlingua, a lot more information can be lost or interpreted wrongly on the way to the target language. But even though it may lose and misinterpret information, it is a lot more practical when a text is to be translated into several languages, since the source text only needs to be translated into the interlingua once.
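The practical benefit can be made concrete with a little arithmetic and a toy pivot lookup. The sketch below rests on assumptions made for this example: the interlingua is counted as a separate language in addition to the n natural languages, and all function and dictionary names are hypothetical.

```python
def direct_dictionaries(n_languages):
    """Two-way dictionaries needed when every pair of languages is linked directly."""
    return n_languages * (n_languages - 1) // 2


def interlingua_dictionaries(n_languages):
    """Two-way dictionaries needed when each language is linked only to the interlingua."""
    return n_languages


# Hypothetical toy dictionaries with English as the pivot (interlingua).
SPANISH_TO_PIVOT = {"perro": "dog", "casa": "house"}
PIVOT_TO_SWEDISH = {"dog": "hund", "house": "hus"}


def translate_via_pivot(word):
    """Translate Spanish -> pivot -> Swedish; return None if either step fails."""
    pivot = SPANISH_TO_PIVOT.get(word)
    if pivot is None:
        return None
    return PIVOT_TO_SWEDISH.get(pivot)


for n in (4, 10):
    print(n, direct_dictionaries(n), interlingua_dictionaries(n))
print(translate_via_pivot("perro"))
```

With four languages the direct approach needs six two-way dictionaries against four for the interlingua, and the gap grows quickly: ten languages need 45 against 10. The pivot lookup also shows the cost mentioned above, since a word is lost as soon as either of the two steps fails.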
19 Watters, p.153
Figure 3: Explains how a language A can be translated to several languages using an interlingua.
3.3 Transfer system MT
Transfer system MT is slightly more clever than Direct MT because it is based on a database of translation rules rather than only a dictionary. Whenever a sentence matches one of the rules, or examples, it is translated directly using a dictionary, as in direct translation. The system goes from the source language through a morphological and syntactic analysis to produce a sort of interlingua made of the base forms of the source language. From this it translates to the base forms of the target language, and from there the final translation is generated.20 See figure 4.
Figure 4: describing a transfer system.
20 Sánchez-Martínez, F, Ney, H; Using Alignment Templates to Infer Shallow-Transfer Machine Translation Rules, Berlin 2006, p.755
This is an MT that can operate at the lexical, syntactic or semantic level, whereas Direct MT only works at the lexical level. This of course makes the MT superior in the quality of the translation, but there are still a lot of problems. If you, for example, want to translate “White House” from English to Spanish, it would follow rules such as this:
English[adjective noun] → Spanish[noun adjective]
English[White House] → Spanish[Casa Blanca]
However correct this literal translation may be, it is perhaps not what we meant. When we want to translate the name of the government building of the United States we end up with a classic movie.
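The adjective-noun rule above can be sketched as three small stages: analysis, transfer and generation. This is a simplified illustration under stated assumptions: the lexicon, its part-of-speech tags and the function name transfer are hypothetical, and gender agreement is hard-coded in the lexicon rather than computed.

```python
# Hypothetical lexicon: English word -> (Spanish word, part of speech).
# Agreement is baked in: "blanca" is already the feminine form matching "casa".
LEXICON = {
    "the": ("la", "DET"),
    "white": ("blanca", "ADJ"),
    "house": ("casa", "NOUN"),
}


def transfer(sentence):
    """Translate with one syntactic rule: English [ADJ NOUN] -> Spanish [NOUN ADJ]."""
    # Analysis: look up the translation and part of speech of each word.
    tagged = [LEXICON.get(word.lower(), (word, "UNK")) for word in sentence.split()]
    # Transfer: swap every adjective that is directly followed by a noun.
    i = 0
    while i < len(tagged) - 1:
        if tagged[i][1] == "ADJ" and tagged[i + 1][1] == "NOUN":
            tagged[i], tagged[i + 1] = tagged[i + 1], tagged[i]
            i += 2
        else:
            i += 1
    # Generation: emit the target words in their new order.
    return " ".join(word for word, _ in tagged)


print(transfer("The White House"))  # prints: la casa blanca
```

The rule is applied purely on syntactic categories, so nothing in this sketch can tell that "White House" is a name.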
4. Modern Machine Translation
4.1 Rule based MT
This has its origin in Transfer system MT and likewise uses a database of rules, usually based on morphological and bilingual dictionaries, to translate a text.21 One problem is that, since it never stores new information, it is incapable of handling new kinds of text.22
One of the more common rule-based MTs is Knowledge based MT (KBMT). It works from the point of view that you need to understand the text to be able to translate it. To return to the White House example from before, a knowledge based MT could put the phrase into context and know that White House is not to be translated because it is a name.
Even though KBMT has been found to be quite accurate in its translations, it needs a lot of computer memory. This has made it rather difficult to apply to larger texts such as a book or even a newspaper.23
4.2 Corpus based MT
Corpus based MT gets all its information from large corpuses, where grammatical rules have been cast aside in favour of paired texts in the different languages that are used to translate words and sentences.
Example based MT is one of the bigger versions of Corpus based MT. It uses Translation templates, which are bilingual pairs of sentences or phrases where words are coupled and replaced by variables.24 The goal is to have a large enough, and good enough, corpus to be able to translate a sentence word after word directly from the translation templates. The idea is that since these pairs are always correct, if we find one of the pairs in the source text, we don't need grammar rules to know the corresponding phrase or word.
21 Sánchez-Martínez, p.759
22 Carl, M, Pease, C, Iomdin, L.L, Streiter, O; Towards a Dynamic Linkage of Example-based and Rule-based Machine Translation, Machine Translation (2000) 15:223-257, p.225
23 Knight, K, Luk, S.K; Building a Large-Scale Knowledge Base for Machine Translation, 1994, 2010-09-24
24 Sánchez-Martínez, p.757
Figure 5: Describes the idea behind Translation templates of a phrase in two different languages.
However, most Example based MTs use some grammar to keep the corpus to a minimum. This almost turns the Example based MT into a Transfer system MT, with the important exception that in Example based MT the rules are very specific, almost one for every case, whereas in Transfer system MT the rules are more general.
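The idea of a translation template, a sentence pair where the coupled words are replaced by a variable, can be sketched as follows. The template, the small lexicon and the function name translate_with_template are all hypothetical and chosen only to illustrate the principle; they are not taken from the cited work.

```python
# Hypothetical translation template: an English-Spanish sentence pair in which
# the coupled words have been replaced by the variable X.
TEMPLATE = ("I have a X", "Tengo un X")

# Hypothetical fillers for X; only masculine nouns, so "un" stays correct.
LEXICON = {"dog": "perro", "book": "libro"}


def translate_with_template(sentence, template=TEMPLATE, lexicon=LEXICON):
    """Translate the sentence if it matches the template, otherwise return None."""
    source, target = template
    source_tokens = source.split()
    words = sentence.split()
    if len(words) != len(source_tokens):
        return None
    filler = None
    for token, word in zip(source_tokens, words):
        if token == "X":
            # The variable position: translate the word found there.
            filler = lexicon.get(word.lower())
        elif token.lower() != word.lower():
            # A fixed word of the template does not match.
            return None
    if filler is None:
        return None
    return target.replace("X", filler)


print(translate_with_template("I have a dog"))  # prints: Tengo un perro
```

The template is very specific: it translates exactly one sentence shape, which is why a system built this way needs a very large number of templates.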
4.3 Statistic based MT
Statistic based MT (SMT) is, together with Hybrid MT, one of the most frequently used MTs today. It is a Corpus based MT that usually translates using the Expectation Maximization Algorithm or Bayes' rule, two related ways to determine the probability of a translation being
The general idea is that the most likely translation of a word is the one chosen. To know which translation is most common, statistical data is gathered from several bilingual corpuses; today the Internet is the biggest source.
In Statistic based MT we are interested in the probability that “White House” is in fact to be interpreted as “Casa Blanca” rather than the actual “White House”, and based on this probability the MT chooses the translation.
25 Sofianopoulos, S, Tambouratzis, G; Multi-objective optimisation of real valued parameters of a hybrid MT system using Genetic Algorithms, Pattern Recognition Letters 31 (2010), p.1672
An SMT built on probability rules such as Bayes' rule works like this:
Bayes' rule says that:
P(S | E) = P(E | S) * P(S) / P(E)
Because P(E) is the same for every candidate translation S, it can be left out when the candidates are compared:
P(S | E) ∝ P(E | S) * P(S)
P(E | S): the probability of the English sentence E given the Swedish sentence S, that is, how probable it is that E is a translation of S
P(S): the probability of the Swedish sentence S occurring at all
P(E): the probability of the English sentence E occurring at all
This means that the probability that the Swedish sentence is a correct translation of the English sentence is proportional to the probability that the English sentence is a translation of the Swedish sentence, multiplied by the probability of the Swedish sentence itself.26
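A small numeric sketch may make the use of Bayes' rule clearer. All probabilities below are invented for illustration and come from no corpus, and the names CANDIDATES, score, best_translation and posteriors are assumptions made for this example. The candidates are the two Swedish readings of English[The queen can't bear children].

```python
# Hypothetical numbers, invented purely for illustration.
# Each Swedish candidate S maps to (P(E | S), P(S)) for the English
# sentence E = "The queen can't bear children".
CANDIDATES = {
    "Drottningen kan inte få barn": (0.4, 0.006),
    "Drottningen står inte ut med barn": (0.5, 0.002),
}


def score(candidate):
    """Unnormalised posterior P(E | S) * P(S); the constant P(E) is left out."""
    p_e_given_s, p_s = CANDIDATES[candidate]
    return p_e_given_s * p_s


def best_translation():
    """Return the candidate S that maximises P(E | S) * P(S)."""
    return max(CANDIDATES, key=score)


def posteriors():
    """Normalise the scores so they sum to 1. This recovers P(S | E) under
    the assumption that these are the only possible candidates."""
    total = sum(score(s) for s in CANDIDATES)
    return {s: score(s) / total for s in CANDIDATES}


print(best_translation())
print(posteriors())
```

Dividing by P(E) only rescales all scores by the same amount, so the winner is the same with or without it, which is why the denominator can be dropped.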
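The Expectation Maximization (EM) Algorithm is another mathematical approach. It is used when some of the data is missing, or in this case when it is not known which words are translations of each other. It first estimates the missing data from the current guess of the translation probabilities, and then uses that estimate to find the translation probabilities with the highest likelihood. It then repeats these two steps until the probabilities stop changing, and the most likely translation is chosen. Hopefully it is the most accurate one.27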
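To show what the repeated steps can look like in practice, here is a minimal sketch of EM on a toy English-Spanish corpus. It uses a simplified word-translation model in the style of IBM Model 1, which is an assumption chosen for illustration; the essay does not say which model an SMT uses. The corpus and the function name train_word_model are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy corpus of (English, Spanish) sentence pairs.
CORPUS = [
    ("the house".split(), "la casa".split()),
    ("the table".split(), "la mesa".split()),
    ("a table".split(), "una mesa".split()),
]


def train_word_model(corpus, iterations=50):
    """Estimate t[(f, e)] = P(Spanish word f | English word e) with EM."""
    source_vocab = {e for src, _ in corpus for e in src}
    target_vocab = {f for _, tgt in corpus for f in tgt}
    # Start from a uniform guess: every pairing is equally likely.
    t = {(f, e): 1.0 / len(target_vocab) for f in target_vocab for e in source_vocab}
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: share each Spanish word among the English words of its sentence,
        # in proportion to the current probabilities (the "missing" alignments).
        for src, tgt in corpus:
            for f in tgt:
                norm = sum(t[(f, e)] for e in src)
                for e in src:
                    fraction = t[(f, e)] / norm
                    count[(f, e)] += fraction
                    total[e] += fraction
        # M-step: re-estimate the probabilities from the expected counts.
        for f, e in t:
            t[(f, e)] = count[(f, e)] / total[e]
    return t


t = train_word_model(CORPUS)
for e in ("the", "house", "table", "a"):
    best = max((f for f, e2 in t if e2 == e), key=lambda f: t[(f, e)])
    print(e, "->", best, round(t[(best, e)], 3))
```

Nothing in the corpus says which word translates which. The pairing emerges only because "la" keeps appearing with "the" and "mesa" with "table", and each iteration sharpens that evidence.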
4.4 Hybrid MT
There are a number of different types of Hybrid MT (HMT), since the essence of this version of Machine Translation is a mix of the other ones. There is still a lot of research being done within this type of MT, and because of this it is difficult to give a complete explanation of HMT.
To give a general idea of the concept, here is an attempt to explain how one of the more common hybrid MTs works, namely Generation-Heavy MT (GHMT).
GHMT comes from a history of primarily transfer and interlingua MT but is in itself neither. Because it has more knowledge of words and lexical translations, it is more of a knowledge based MT than a transfer system. However, since this lexical knowledge is represented in a transfer-like interlingua tree, it is reasonable to categorize GHMT as partly a transfer system.28
26 Russell, S, Norvig, P; Artificial Intelligence: A Modern Approach, New Jersey 2003, p.853
27 Borman, S; The Expectation Maximization Algorithm: A short tutorial, 2004
28 Habash, N, Dorr, B, Monz, C; Symbolic to statistical hybridization: Extending generation heavy machine translation, 2009, p.28
The resulting data from the lexical tree is then analysed and ranked from a statistical point of view to get the most likely translation at the top of the tree. The next step is a rather complicated one consisting of several steps, which I will not venture into further, in the belief that doing so would complicate matters more than it would benefit the understanding of Hybrid MT. What happens in this next step is, very much simplified, the final translation from the interlingua to the desired language, including arranging the words according to appropriate grammar and sentence structure, as done by a traditional Knowledge based MT.29 See figure 6 for an overview of how GHMT works.
Figure 6: Simplified explanation of how the GHMT works.
29 Habash, p.32
5. Discussion
That Machine Translation is important has already been established, since it is useful in more areas than one, and it is likely that the MT systems of today will get more powerful and be able to produce even better translations. The different forms of MT all have benefits and disadvantages, and it is likely that the best MT that will ever be created is a Hybrid MT, since it can take the best aspects of all the other MTs.
However, I do believe that we will never be able to create an MT that can translate every possible text, from all different languages, perfectly. Even if we were able to create an MT that could handle the grammar correctly, with all its exceptions, metaphors, ambiguity and so on, I simply believe that a language is more than linguistics and words. A language is a culture and is therefore difficult enough for a human to translate, since a culture is something that has to be
This said, I do enjoy the world of MT very much and use it very frequently, if not daily, and perhaps it is not so important to take the cultural part of MT so seriously. As long as we are aware of the limitations of the technology, and realise that MT might misinterpret, or render, cultural content that we as humans of another culture cannot understand, we can still benefit from the possibility that a language is available to us, even if it is through a veil.
Even though I have doubts about whether we will ever create a perfect MT, I do believe that MT will evolve to cover not only written language but also spoken language. This is quite a fresh research area that is still under construction, but I see it as a possible future that one day we won't need an interpreter to communicate directly with someone in another language. Perhaps all we will need is a small gadget that translates the other person's speech directly into our ears. When the language is experienced in this way, in a more appropriate and direct association with its culture, it is likely that the mistakes the translation gadget makes won't be of the same magnitude, since a human is there to correct them. This is a wonderful thought, to be able to talk directly to people who speak other languages, but I believe that this will be found to be even more difficult than MT for written language. When we speak we have a lot of different ways of speaking: slang, dialects, different word orders, pauses. If you believe that you speak as they do in the movies, you either only watch dogmatic documentaries or you're unaware of how you speak. However, I do believe that this will be a possibility and not only something you'd see on Star Trek.
But perhaps it is likelier that the world will have united under a universal language before this is perfected, maybe under a language such as Mandarin or perhaps English.
6. References
Sean Borman; The Expectation Maximization Algorithm: A short tutorial, 2004, 2010-10-02
Peter F. Brown, John Cocke, Stephen A. Della Pietra, Vincent J. Della Pietra, Fredrick Jelinek, John D. Lafferty, Robert L. Mercer & Paul S. Roossin; A Statistical Approach to Machine Translation, Computational Linguistics Volume 16 (1990) Nr 2:79-85, 2010-09-24
Michael Carl, Catherine Pease, Leonid L. Iomdin & Oliver Streiter; Towards a Dynamic Linkage of Example-based and Rule-based Machine Translation, Machine Translation (2000) Nr 15:223-257
Nizar Habash, Bonnie Dorr & Christof Monz; Symbolic-to-statistical hybridization: Extending generation-heavy machine translation, Machine Translation (2009) Nr 23:23-63
W. John Hutchins; Machine Translation: A Brief History, Oxford 1995, 2010-10-02
Kevin Knight & Steve K. Luk; Building a Large-Scale Knowledge Base for Machine Translation, 1994, 2010-09-24
Felipe Sánchez-Martínez & Hermann Ney; Using Alignment Templates to Infer Shallow-Transfer Machine Translation Rules, Berlin 2006, 2010-10-03
Sokratis Sofianopoulos & George Tambouratzis; Multi-objective optimisation of real valued parameters of a hybrid MT system using Genetic Algorithms, Pattern Recognition Letters (2010) Nr 31:1672-1682
Paul A. Watters & Malti Patel; Semantic processing performance of Internet machine translation systems, Internet Research: Electronic Networking Applications and Policy Volume 9 (1999) Nr 2:153-160
Sergei Nirenburg, Harold L. Somers & Yorick A. Wilks; Readings in Machine Translation, 2003
Stuart Russell & Peter Norvig; Artificial Intelligence: A Modern Approach, New Jersey 2003
Web pages:
Kent Bach; Ambiguity, 2010-09-26
Chandious, 2010-09-25
Cultivate Interactive, Marieke Napier, 2002
Nationalencyklopedin, 2010-09-24
SYSTRAN, 2010-10-03
