MODL5003 Principles and applications of machine translation

Lecture 10/2/2003 Architectures for MT – direct, transfer and Interlingua
1. Overview
- Classification of approaches to MT
- Architectures of rule-based MT systems (the MT triangle)
- Reviewing each architecture and its problems
- Architectures compared
2. Revision of MT problems & how to deal with them
- Rule-based approaches (lecture today)
 Direct MT
 Transfer MT
 Interlingua MT
use formal models of our knowledge of language
("to explicate human knowledge used for translation and put it into an 'Expert System'")
problems:
expensive to build;
require precise knowledge, which might not be available
- Corpus-based approaches (lecture 28/4/2003)
 Example-based MT
 Statistical MT
use machine learning techniques on large collections of available texts;
e.g. "parallel texts" (aligned sentence by sentence; phrase by phrase)
("to let the data speak for themselves")
the last decade has seen a shift in this direction, e.g., the IBM MT system
problems:
language data are sparse (difficult to achieve saturation)
high-quality linguistic resources are also expensive
- Corpus-based support for rule-based approaches (current state-of-the-art technology)
speeding up the process of rule-creation
by retrieving translation equivalents automatically
3. Architectures of MT systems (the MT triangle*)
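[Figure: the MT triangle. Analysis leads up from the ST and synthesis leads down to the TT; direct translation links ST and TT at the base, transfer links them at an intermediate level, and Interlingua** is at the apex.]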
* Other linguistic engineering technologies also have a similar "triangle" hierarchy of architectures, e.g., the Text-to-Speech triangle
**Interlingua = language independent representation of a text
4. Direct systems
 Essentially: word for word translation with some attention to local linguistic context
 No linguistic representation is built
(historically came first: the Georgetown experiment 1954-1963: 250 words, 6 grammar rules, 49 sentences)
Example (Paul Bennett, 2001):
English sentence: The questions are difficult
1. the <[N.plur]> → les   /* before a plural noun */
2. <[article]> questions[N.plur] → questions   /* 'questions' is a plural noun after the article */
3. <[not: "we" or "you"]> are → sont   /* unless it follows the words "we" or "you" */
4. <are> difficult → difficiles   /* when it follows 'are' */
(algorithm: a "window" of a limited size moves through the text and checks if any rules match)
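The window algorithm for the four rules above can be sketched roughly as follows. This is a minimal illustration, not a reconstruction of any real direct system: the function name `translate_direct`, the toy word lists and the one-word window on each side are assumptions made for this sketch.

```python
# Toy word lists; both are assumptions made for this sketch only.
ARTICLES = {"the", "a", "an"}
PLURAL_NOUNS = {"questions"}


def translate_direct(sentence):
    """Move a one-word window through the text and apply the first matching rule."""
    words = sentence.lower().split()
    output = []
    for i, word in enumerate(words):
        prev_word = words[i - 1] if i > 0 else None
        next_word = words[i + 1] if i + 1 < len(words) else None
        if word == "the" and next_word in PLURAL_NOUNS:
            output.append("les")         # rule 1: 'the' before a plural noun
        elif word == "questions" and prev_word in ARTICLES:
            output.append("questions")   # rule 2: plural noun after an article
        elif word == "are" and prev_word not in {"we", "you"}:
            output.append("sont")        # rule 3: 'are' unless after "we"/"you"
        elif word == "difficult" and prev_word == "are":
            output.append("difficiles")  # rule 4: 'difficult' after 'are'
        else:
            output.append(word)          # no rule matched: copy the word through
    return " ".join(output)


print(translate_direct("The questions are difficult"))
```

Note that every rule is tied to one word form and one local context, so a sentence such as "We are difficult" already falls outside rule 3 and would need further rules of its own.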
Problems with direct systems
> there is no intermediate representation
- application development considerations:
- rules are "tactical", not "strategic" (do not generalise too much)
for each word form (a member of a paradigm) a separate set of rules is required
rules have little linguistic significance
there is no obvious link between our ideas about translation knowledge and the formalism
it is very hard to "think of" an accurate set of "direct" rules and to encode them manually
dealing with highly inflected languages becomes difficult
Russian: 90,000 dictionary entries (lexemes, lemmas, headwords) have about 4,000,000 word forms
(These words do not cover about 3% of the text: they do not include proper names, etc., which are also inflected)
Should there be 4,000,000 sets of rules for translation from Russian?
What happens if we translate between two highly inflected languages?
(combinatorial growth of the number of rules:
any Russian adjective with 24 word forms can be translated by a German adjective with 16 word forms
(potentially 24 × 16 = 384 rules) + additional word forms for German adjectives in the presence of an article)
> some "intermediate morphological representation" is required (moving towards transfer systems)
- large systems become difficult to maintain and to develop further:
- at a certain point a system becomes unmanageable: it is not possible to locate the rule which causes an error
problems arise with debugging, spotting errors, and avoiding new errors when new features are introduced
(before moving forward we need to ensure that the system does not get worse)
- the problem of interaction of rules: rules are not completely independent (e.g., rules 1 and 2 in the example)
e.g., rules for introducing / translating an article in German influence the choice of the adjective ending
what happens if we have millions of rules?
Translation of the German adjective stark (example of John Hutchins, 2002):
Das ist ein starker Mann → This is a strong man
Es war sein stärkstes Theaterstück → It has been his best play
Wir hoffen auf eine starke Beteiligung → We hope a large number of people will take part
Eine 100 Mann starke Truppe → A 100-strong unit
Der starke Regen überraschte uns → We were surprised by the heavy rain
Maria hat starkes Interesse gezeigt → Mary has shown strong interest
Paul hat starkes Fieber → Paul has a high temperature
Das Auto war stark beschädigt → The car was badly damaged
Das Stück fand einen starken Widerhall → The piece had a considerable response
Das Essen war stark gewürzt → The meal was strongly seasoned
Hans ist ein starker Raucher → John is a heavy smoker
Er hatte daran starken Zweifel → He had grave doubts about it
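One way to picture what the stark examples mean for a direct system is a lookup table keyed on the noun that the adjective modifies. The sketch below is only an illustration under that assumption: the table `STARK_BY_HEAD` and the function `translate_stark` are hypothetical names, and the entries are limited to the attributive examples listed above.

```python
# English equivalents of attributive 'stark', keyed on the German head noun.
# Entries are taken from the example sentences; the table layout is an assumption.
STARK_BY_HEAD = {
    "Mann": "strong",
    "Regen": "heavy",
    "Interesse": "strong",
    "Fieber": "high",
    "Widerhall": "considerable",
    "Raucher": "heavy",
    "Zweifel": "grave",
}


def translate_stark(head_noun, default="strong"):
    """Pick an English adjective for 'stark' from the noun it modifies."""
    return STARK_BY_HEAD.get(head_noun, default)


print(translate_stark("Regen"))  # a listed collocation
print(translate_stark("Tee"))    # unlisted noun: falls back to the default guess
```

The table needs one entry per collocation, and it still does not cover the adverbial uses (stark beschädigt, stark gewürzt) or the superlative, each of which would need rules of a different shape.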
- it is difficult to find out if the set of rules is complete or not:
it is difficult to predict the size of the set of rules in advance (depends on the direction and the language pair)
- no reusability:
a new set of rules is required for each language pair:
no knowledge can be reused for new language pairs
a multilingual system, which translates in both directions between all language pairs
requires n × (n – 1) modules: e.g., 5 languages = 20 modules with complex direction-specific sets of rules:
[Figure: five languages L1 to L5, with a separate translation module for each ordered pair of languages]
- linguistic considerations:
sometimes the information needed for disambiguation is not local (not in the immediate context)
(the length of the disambiguating context cannot be predicted)
E.g. (Paul Bennett, 2001)
The questions are hard
hard → difficile
hard → dur
What kind of information do we need here?
What happens if we have a complex sentence?
The questions she tackled yesterday seemed very hard
To bake tasty bread is very hard
1. The question.N changes.V every day
Ukr.: Питання.N.nom міняється.V щодня
Pytann'a.N.nom min'ajet's'a.V shchodn'a
2. The question.N changes.N have been agreed
Ukr.: Зміну.N.acc питань.N.gen було погоджено
Zminu.N.acc pytan'.N.gen bulo pohodzheno
- Moreover: the translation of the word question is also different, because its function in the phrase has changed
- Even if the function does not change in the English sentence:
translation might depend on the overall structure
3. The question.N changes.N have been difficult
Ukr.: Зміна.N.nom питань.N.gen була складною
Zmina.N.nom pytan'.N.gen bula skladnoju
(passive constructions are sometimes translated into Ukrainian by the "middle" voice:
Accusative subject + impersonal verb form, translation #2)
Q.: What is the difference between English sentences 2 and 3?
The disambiguation information is non-local
Also: changing a word order is difficult for direct systems
rules for changing word order have to operate across some representation of the entire sentence
"The meaning that a word, a phrase, or a sentence conveys is determined not just by itself, but by other
parts of the text, both preceding and following… The meaning of a text as a whole is not determined
by the words, phrases and sentences that make it up, but by the situation in which it is used".
M. Kay et al.: Verbmobil, CSLI 1994, pp. 11-13
Advantages of the direct systems:
However: direct translation is possible between structurally similar languages
(usually related historically, e.g., Romance or Slavic languages
with similar morphological and syntactic systems, word order, etc.)
Cases of non-local disambiguation might be rare ("best guess" might work for the majority of cases)
Or: only a shallow linguistic representation could be sufficient:
morphological
shallow syntactic (which does not involve the analysis of the complete sentence structure, e.g., "chunking")
(systems on the borderline between "direct" and "transfer")
Most commercial systems use this approach. Why?
Does it have any advantages?
1. Saving resources
Translation is much faster (essentially, it involves matching strings of limited length).
This is important for "real time" speech-to-speech translation and for
embedded MT applications on hand-held devices with cheap, slow processors
Translation requires limited memory (not dependent on the length of the input)
Future of the embedded systems:
reasonably good "direct" MT approximations of full-scale "transfer" systems
(which work in a limited subject domain)
2. Machine-learning techniques could be applied straightforwardly to create direct MT systems
large sets of "direct" MT rules are unmanageable for human developers, but
what if we let the computer develop an MT system from some training data?
- "Direct" rules are easier to learn automatically
- Generalisations and intermediate representations are difficult for machine learning
some kinds of generalisations are still not possible
corpus-based methods can be implemented as "direct" systems
Experiments: IBM statistical MT system, etc. (next lecture)
Problem: insufficient training material: aligned parallel texts are expensive and not available for all languages
- the data are sparse and do not produce sufficiently accurate lists of "direct" rules.
5. Indirect systems
Translation is made on the basis of a linguistic analysis of the ST and some kind of linguistic representation
(interface representation -- IR)
ST → Interface Representation(s) → TT
Transfer systems:
-- IRs are language-specific
-- Language-pair specific mappings are used
Interlingual systems:
-- IRs are language-independent
-- No language-pair specific mappings are used
6. Transfer systems
- Involve 3 stages: analysis - transfer - synthesis
- Analysis and synthesis are monolingual and independent, i.e.:
analysis is the same irrespective of the TL;
synthesis is the same irrespective of the SL
- Transfer is bilingual, and each transfer module is specific to a particular language-pair
Synthesis (generation) is straightforward
Number of modules for a multilingual system:
n × (n – 1) transfer modules
n × (n + 1) modules in total (n analysis + n synthesis + n × (n – 1) transfer)
A 5-language system (if it translates in both directions between all language-pairs) has
20 transfer modules and 30 modules in total
[Figure: languages L1 to L5, each with its own interface representation IR1 to IR5; transfer modules map between the IRs]
More modules than for direct systems?
Advantage: reusability of Analysis and Synthesis modules:
essentially it is separation of reusable (transfer-independent) information from language-pair mapping
- operations on higher level of abstraction
the task:
to do as much work as possible in reusable modules of analysis and synthesis
to keep transfer modules as simple as possible
(this is often described as "moving towards Interlingua")
Now we can generalise over features, lexemes, tree configurations, functions of word groups
We can view the properties: how they relate to each other
The men wait for a train
[S, present
  verb: wait
  subject (pl., def): man
  object (sg., indef): train]
→
[S, present
  verb: attendre
  subject (pl., def): homme
  object (sg., indef): train]
Les hommes attendent un train
Lexical items are replaced and the features are copied…
Necessary transformations are performed…
There is no need to translate each inflected word form: the lexicon for transfer becomes smaller.
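The lexical replacement and feature copying described here can be sketched as follows. This is a minimal illustration under stated assumptions: the dictionary layout of the IR, the names `TRANSFER_LEXICON`, `transfer` and `synthesise`, and the toy French form table (which ignores gender and covers only this sentence) are all hypothetical and not taken from any actual MT system.

```python
# Bilingual transfer lexicon over lemmas only (English -> French).
TRANSFER_LEXICON = {"wait": "attendre", "man": "homme", "train": "train"}

# Toy target-side morphology, covering only this example (an assumption).
FRENCH_FORMS = {
    ("homme", "pl"): "hommes",
    ("train", "sg"): "train",
    ("attendre", "present", "pl"): "attendent",
}
FRENCH_ARTICLES = {("pl", True): "les", ("sg", False): "un"}


def transfer(ir):
    """Replace each lemma by its equivalent and copy all features unchanged."""
    target = {"features": dict(ir["features"])}
    for role in ("verb", "subject", "object"):
        node = ir[role]
        target[role] = {**node, "lemma": TRANSFER_LEXICON[node["lemma"]]}
    return target


def synthesise(ir):
    """Generate a French surface string from the target IR (monolingual step)."""
    def noun_phrase(node):
        article = FRENCH_ARTICLES[(node["number"], node["definite"])]
        return f"{article} {FRENCH_FORMS[(node['lemma'], node['number'])]}"

    subject = ir["subject"]
    verb = FRENCH_FORMS[(ir["verb"]["lemma"], ir["features"]["tense"], subject["number"])]
    return f"{noun_phrase(subject)} {verb} {noun_phrase(ir['object'])}".capitalize()


# IR for "The men wait for a train", as produced by a (not shown) analysis step.
english_ir = {
    "features": {"tense": "present"},
    "verb": {"lemma": "wait"},
    "subject": {"lemma": "man", "number": "pl", "definite": True},
    "object": {"lemma": "train", "number": "sg", "definite": False},
}
french_ir = transfer(english_ir)
print(synthesise(french_ir))
```

The transfer lexicon holds three lemma pairs rather than one entry per inflected form; all inflection is handled inside the monolingual synthesis step.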
Advantage: translation equivalents are expressed in a compact and intuitively clear way
Possible to deal with structural differences, differences in word order:
Dutch: Jan zwemt → English: Jan swims
Dutch: Jan zwemt graag → English: Jan likes to swim
(lit.: Jan swims "pleasurably", with pleasure)
Spanish: Juan suele ir a casa → English: Juan usually goes home
(lit.: Juan tends to go home, soler (v.) = 'to tend')
English: John hammered the metal flat → French: Jean a aplati le métal au marteau
(Resultative construction in English; French lit.: Jean flattened the metal with a hammer)
English: The bottle floated past the rock → Spanish: La botella pasó por la piedra flotando
(Spanish lit.: 'The bottle passed the rock floating')
English: The hotel forbids dogs → German: In diesem Hotel sind Hunde verboten
(German lit.: Dogs are forbidden in this hotel)
English: The trial cannot proceed → German: Wir können mit dem Prozeß nicht fortfahren
(German lit.: We cannot proceed with the trial)
English: This advertisement will sell us a lot → German: Mit dieser Anzeige verkaufen wir viel
(German lit.: With this advertisement we will sell a lot)
English: 10 pounds will buy you a decent milk …
(English has fewer constraints on subjects;
German generation module needs to generate correct surface structure from semantic roles of words)
It is possible to handle idioms in a flexible way: Engl.: "to call a spade a spade"; "to kick the bucket"
- higher quality of translation is achievable, even for structurally different languages
Using a transfer approach still leaves many open questions:
- the depth of the SL analysis
- the nature of the interface representation (syntactic, semantic, both?)
- transfer components may vary in size and complexity depending on how far up the MT triangle they fall
- the nature of transfer may be influenced by how typologically similar the languages involved are:
the more typologically different -- the more complex is the transfer
Transfer components consist of 2 parts
- lexical transfer
- structural transfer
IRs should abstract away from (many) surface features of language,
and therefore form more language-independent representations
Some principles of IRs:
IRs should form an adequate basis for transfer, i.e., they should
- contain enough information to make transfer (a) possible; (b) simple
- provide sufficient information for synthesis
(criticism: IRs need to combine information of very different kinds)
in IRs:
1. lemmatisation:
each form of a lexical item is represented in a uniform way, e.g., sing.N., Inf.V.
(allows reducing the transfer lexicon)
2. featurisation:
only content words are represented in IRs 'as such',
function words and morphemes become features on content words (e.g., plur., def., past…)
inflectional features only occur in IRs if they have contrastive values
(are syntactically or semantically relevant)
3. neutralisation
neutralising many surface differences, e.g.,
- active and passive distinction
- different word orders
- surface properties are represented as features (e.g., voice = passive)
- possibly: representing syntactic categories:
John seems to be rich (logically, John is not a subject of seem):
= It seems to someone that John is rich
Mary is believed to be rich = One believes that Mary is rich
(it might be easier to translate "normalised" structures into other languages)
4. reconstruction
- to facilitate the transfer, certain aspects that are not overtly present in a sentence should occur in IRs
especially, for the transfer to languages, where such elements are obligatory:
John tried to leave: S[ try.V John.NP S[ leave.V John.NP]]
5. disambiguation
ambiguities should be resolved at IR, e.g., attachment of PPs.
Lexical ambiguities can be annotated with numbers: table_1, _2…
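Principles 1 and 2 (lemmatisation and featurisation) can be illustrated with a small sketch of the analysis of a noun phrase. The word tables and the function name `analyse_noun_phrase` are assumptions made for this illustration; a real analysis module would use a full morphological analyser and parser.

```python
# Toy tables (assumptions): inflected noun form -> (lemma, number),
# and article -> value of the 'definite' feature.
NOUN_FORMS = {
    "man": ("man", "sg"),
    "men": ("man", "pl"),
    "train": ("train", "sg"),
    "trains": ("train", "pl"),
}
ARTICLE_FEATURES = {"the": True, "a": False, "an": False}


def analyse_noun_phrase(words):
    """Build an IR node: content word as lemma, everything else as features."""
    node = {}
    for word in words:
        word = word.lower()
        if word in ARTICLE_FEATURES:
            # featurisation: the function word disappears and leaves a feature
            node["definite"] = ARTICLE_FEATURES[word]
        elif word in NOUN_FORMS:
            # lemmatisation: the inflected form is reduced to lemma + number
            node["lemma"], node["number"] = NOUN_FORMS[word]
        else:
            raise ValueError(f"word not covered by the toy tables: {word}")
    return node


print(analyse_noun_phrase(["The", "men"]))
print(analyse_noun_phrase(["a", "train"]))
```

With nodes of this kind, the transfer lexicon only needs entries for lemmas such as man and train, not for each inflected form or article.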
IRs should be defined on a principled basis, not through ad-hoc solutions:
the eclectic way of creating IRs has been a great problem until now.
types of transfer: based on some concepts from the theory of [human] translation
7. Interlingual systems
Interlingual systems involve just 2 stages:
analysis -- synthesis
both are monolingual and independent
there are no bilingual parts to the system at all (no transfer)
generation is not straightforward
A multilingual system with n languages (which translates in both directions between all language-pairs)
requires 2n modules: a 5-language system contains 10 modules (5 analysis + 5 synthesis)
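[Figure: languages L1 to L5, each linked to a single central Interlingua (IL) by its own analysis and synthesis modules]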
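The module counts given for the three architectures, n × (n – 1) for direct, n × (n + 1) for transfer and 2n for interlingua, can be compared with a short sketch. The function names are assumptions made for illustration; the formulas are the ones stated in these notes.

```python
def direct_modules(n):
    """One direction-specific module per ordered language pair."""
    return n * (n - 1)


def transfer_modules(n):
    """n analysis + n synthesis + n*(n-1) transfer modules, i.e. n*(n+1)."""
    return 2 * n + n * (n - 1)


def interlingua_modules(n):
    """One analysis and one synthesis module per language, no transfer."""
    return 2 * n


print("n  direct  transfer  interlingua")
for n in (2, 5, 10):
    print(n, direct_modules(n), transfer_modules(n), interlingua_modules(n))
```

The raw count is higher for transfer than for direct systems, but only the n × (n – 1) transfer modules are pair-specific; the interlingua count grows linearly, at the price of more complex analysis and synthesis modules.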
- Each module needs to be more complex
- There is more work on the analysis part
- IR needs to be universal (not specific to particular languages)
- IL must be based on universal semantics, and not oriented towards any particular family or type of languages
- IR principles still apply (even more so): Neutralisation must be applied cross-linguistically,
(with often different surface realisations of the same meaning being mapped into one single IR)
- there should be no lexical items in theory, just universal semantic primitives:
(e.g., kill: [cause[become [dead]]])
From transfer to interlingua:
Ex.: (F. van Eynde)
Luc seems to be ill → Fr: *Luc semble être malade
→ Fr: Il semble que Luc est malade
SEEM-2 (ILL (Luc))
SEMBLER (MALADE (Luc))
Problem: the translation of predicates:
A solution: treat predicates as language-specific expressions of universal concepts
SCHIJNEN-1 = concept-372
SCHIJNEN-2 = concept-373
SHINE = concept-372
SEEM = concept-373
BRILLER = concept-372
SEMBLER = concept-373
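The concept table above can be read as a pair of monolingual lookups that meet at a concept identifier. The sketch below assumes that the SCHIJNEN entries are Dutch, SHINE/SEEM English and BRILLER/SEMBLER French; the language codes and the function names `analyse` and `synthesise` are hypothetical.

```python
# (language, predicate) -> universal concept, as listed in the table.
CONCEPTS = {
    ("nl", "SCHIJNEN-1"): "concept-372",
    ("nl", "SCHIJNEN-2"): "concept-373",
    ("en", "SHINE"): "concept-372",
    ("en", "SEEM"): "concept-373",
    ("fr", "BRILLER"): "concept-372",
    ("fr", "SEMBLER"): "concept-373",
}


def analyse(lang, predicate):
    """Monolingual analysis: map a disambiguated predicate to its concept."""
    return CONCEPTS[(lang, predicate)]


def synthesise(lang, concept):
    """Monolingual synthesis: find the predicate expressing a concept in lang."""
    for (entry_lang, predicate), entry_concept in CONCEPTS.items():
        if entry_lang == lang and entry_concept == concept:
            return predicate
    raise KeyError(f"no {lang} predicate for {concept}")


# Dutch -> French without any Dutch-French mapping:
print(synthesise("fr", analyse("nl", "SCHIJNEN-2")))
```

No language-pair specific mapping appears anywhere, but the analysis step has to decide between SCHIJNEN-1 and SCHIJNEN-2 before the lookup can be made.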
but: Criticism of "universal semantic features":
Problem with Interlingua:
Semantic differentiation is target-language specific
runway → startbaan, landingsbaan (take-off runway; landing runway)
cousin → cousin, cousine (m., f.)
- there is no good reason in English to consider these words ambiguous;
- making such distinctions is comparable to lexical transfer
Consequence: not all distinctions which are needed for translation are motivated monolingually
(concepts may be unambiguous in the source language but ambiguous from the point of view of other languages)
Not all possible distinctions can be anticipated.
(Adding a new language might require changing all other modules)
In practice: IL does not work as it should!
8. Transfer and Interlingua compared
- Much work is the same for both approaches
- Translation vs. paraphrase (limits on the translator's freedom)
translation is limited by conflicting restrictions:
by fluency considerations
by adequacy considerations
- Bilingual contrastive knowledge is central to translation
- translators know about contrast of languages
(+ know correct systems of correspondences, e.g., legal terms, where "retelling" is not an option)
IL leaves no place for bilingual knowledge
IL can work only in limited domains, syntactically and lexically restricted
Transfer systems can capture contrastive knowledge
"Given the existence of category and level shift in translation equivalence, an Interlingua representation would be
not universal: it would be obliged to neutralise differences between lexical categories and between grammar and the
lexicon.
Given different partitioning of the conceptual space in different languages, words have to be broken into a set of
components, sufficient to discriminate between any two or more concepts represented by different words in any one
language. In many cases, the work involved in analysis would merely be undone in generation for an SL - TL pair in
which a particular shift was inapplicable. More importantly, the criteria for defining the set of components would be very
difficult to formalise. In fact, given that translation is primarily a matter of defining equivalents, to require that they be
derived in terms of a formal system with significantly different properties from any natural language would appear to add
an unnecessary complication to the task.
Rather than partitioning the work of the system across independent modules, an Interlingua approach ensures that
the behaviour of every module is critically dependent on that of every other. Thus, extensibility both in terms of handling
new languages, and incorporating expert bilingual knowledge, is drastically curtailed.
[For an IL system]… The discarding of SL-specific information necessitated by the Interlingua caused problems in
TL synthesis, severely hindering possible improvements of translation quality… What is needed to compute true
translation equivalents may already have been lost. The translation produced … might better be called paraphrases.
Transfer has a theoretical background, it is not an engineering ad-hoc solution, a "poor substitute for Interlingua". It
must be taken seriously and developed through solving problems in contrastive linguistics and in knowledge
representation appropriate for translation tasks".
Whitelock and Kilby, 1995, pp. 7-9