MODL5003 Principles and applications of machine translation
Lecture 10/2/2003
Architectures for MT: direct, transfer and Interlingua

1. Overview
- Classification of approaches to MT
- Architectures of rule-based MT systems (the MT triangle)
- Reviewing each architecture and its problems
- Architectures compared

2. Revision of MT problems & how to deal with them
- Rule-based approaches (lecture today)
    Direct MT
    Transfer MT
    Interlingua MT
  use formal models of our knowledge of language ("to explicate human knowledge used for translation, put it into an 'Expert System'")
  problems: expensive to build; require precise knowledge, which might not be available
- Corpus-based approaches (lecture 28/4/2003)
    Example-based MT
    Statistical MT
  use machine learning techniques on large collections of available texts, e.g. "parallel texts" (aligned sentence by sentence; phrase by phrase) ("to let the data speak for themselves")
  recent decade: shift in this direction: IBM MT system
  problems: language data are sparse (difficult to achieve saturation); high-quality linguistic resources are also expensive
- Corpus-based support for rule-based approaches (current state-of-the-art technology)
  speeding up the process of rule creation by retrieving translation equivalents automatically

3. Architectures of MT systems (the MT triangle*)
[Figure: the MT triangle. ST at the bottom left, TT at the bottom right, Interlingua** at the apex; Analysis runs up the left side and Synthesis down the right side; Direct translation links ST and TT at the base, and Transfer links the two sides higher up.]
* Other linguistic engineering technologies also have a similar "triangle" hierarchy of architectures, e.g., the Text-to-Speech triangle
** Interlingua = language-independent representation of a text

4. Direct systems
Essentially: word-for-word translation with some attention to local linguistic context
No linguistic representation is built
(historically came first: the Georgetown experiment 1954-1963: 250 words, 6 grammar rules, 49 sentences)

Example (Paul Bennett, 2001):
English sentence: The questions are difficult
  1. the <[N.plur]>                      → les          /* before a plural noun */
  2. <[article]> questions[N.plur]       → questions    /* 'questions' is a plural noun after the article */
  3. <[not: "we" or "you"]> are          → sont         /* unless it follows the words "we" or "you" */
  4. <are> difficult                     → difficiles   /* when it follows 'are' */
(algorithm: a "window" of limited size moves through the text and checks whether any rules match)

Problems with direct systems
> there is no intermediate representation
- application development considerations:
  - rules are "tactical", not "strategic" (they do not generalise much)
    for each word form (a member of a paradigm) a separate set of rules is required
    rules have little linguistic significance
    there is no obvious link between our ideas about translation knowledge and the formalism
    it is very hard to "think of" an accurate set of "direct" rules and to encode them manually
  - dealing with highly inflected languages becomes difficult
    Russian: 90,000 dictionary entries (lexemes, lemmas, headwords) have about 4,000,000 word forms
    (These words leave about 3% of the text uncovered: they do not include proper names, etc., which are also inflected)
    Should there be 4,000,000 sets of rules for translation from Russian?
    What happens if we translate between two highly inflected languages?
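The window algorithm from the Bennett example earlier in this section can be sketched roughly as follows. This is a minimal illustration, not the formalism of any real system: the word lists ARTICLES and PLURAL_NOUNS, the rule table RULES and the function translate_direct are hypothetical names, and the window is assumed to be one word on either side of the current word.

```python
# Hypothetical word lists standing in for the [article] and [N.plur] tests.
ARTICLES = {"the", "a", "an"}
PLURAL_NOUNS = {"questions"}

# Each rule: (source word form, test on (previous word, next word), target word form).
RULES = [
    ("the", lambda prev, nxt: nxt in PLURAL_NOUNS, "les"),
    ("questions", lambda prev, nxt: prev in ARTICLES, "questions"),
    ("are", lambda prev, nxt: prev not in {"we", "you"}, "sont"),
    ("difficult", lambda prev, nxt: prev == "are", "difficiles"),
]


def translate_direct(sentence):
    """Slide a three-word window over the sentence and apply the first matching rule."""
    words = sentence.lower().split()
    output = []
    for i, word in enumerate(words):
        prev = words[i - 1] if i > 0 else None
        nxt = words[i + 1] if i + 1 < len(words) else None
        for source, test, target in RULES:
            if word == source and test(prev, nxt):
                output.append(target)
                break
        else:
            # No rule matched: the word is passed through untranslated.
            output.append(word)
    return " ".join(output)


print(translate_direct("The questions are difficult"))
```

Every rule names a concrete word form, so nothing learned about "questions" carries over to "question"; this is the sense in which each member of a paradigm needs its own set of rules.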
    (combinatorial growth of the number of rules: any Russian adjective with 24 word forms can be translated by a German adjective with 16 word forms (potentially 24 × 16 = 384 rules), plus additional word forms for German adjectives in the presence of an article)
    > some "intermediate morphological representation" is required (moving towards transfer systems)
- large systems become difficult to maintain and to develop further:
  - at a certain point a system becomes unmanageable: it is not possible to locate the rule which causes an error
    problems appear with debugging, spotting errors, and avoiding new errors when new features are introduced (before moving forward we need to ensure that the system does not get worse)
  - the problem of interaction of rules: rules are not completely independent (e.g., rules 1 and 2 in the example)
    e.g., rules for introducing / translating an article in German influence the choice of the adjective ending
    what happens if we have millions of rules?

  Translation of the German adjective stark (example of John Hutchins, 2002):
  German                                  English
  Das ist ein starker Mann                This is a strong man
  Es war sein stärkstes Theaterstück      It has been his best play
  Wir hoffen auf eine starke Beteiligung  We hope a large number of people will take part
  Eine 100 Mann starke Truppe             A 100 strong unit
  Der starke Regen überraschte uns        We were surprised by the heavy rain
  Maria hat starkes Interesse gezeigt     Mary has shown strong interest
  Paul hat starkes Fieber                 Paul has a high temperature
  Das Auto war stark beschädigt           The car was badly damaged
  Das Stück fand einen starken Widerhall  The piece had a considerable response
  Das Essen war stark gewürzt             The meal was strongly seasoned
  Hans ist ein starker Raucher            John is a heavy smoker
  Er hatte daran starken Zweifel          He had grave doubts about it

  - it is difficult to find out whether the set of rules is complete: it is difficult to predict the size of the set of rules in advance (it depends on the direction and the language pair)
  - no reusability: a new set of rules is required for each language pair: no knowledge can be reused for new language pairs
    a multilingual system which translates in both directions between all language pairs requires n × (n - 1) modules: e.g., 5 languages = 20 modules with complex direction-specific sets of rules:
    [Figure: five languages L1, L2, L3, L4, L5, each linked directly to every other.]
- linguistic considerations:
  sometimes the information for disambiguation is not local (not in the immediate context)
  (the length of the disambiguating context cannot be predicted)
  E.g. (Paul Bennett, 2001): The questions are hard
    hard → difficile / dur
  What kind of information do we need here?
  What happens if we have a complex sentence?
    The questions she tackled yesterday seemed very hard
    To bake tasty bread is very hard

  1. The question.N changes.V every day
     Ukr.: Питання.N.nom міняється.V щодня
           Pytann'a.N.nom min'ajet's'a.V shchodn'a
  2. The question.N changes.N have been agreed
     Ukr.: Зміну.N.acc питань.N.gen було погоджено
           Zminu.N.acc pytan'.N.gen bulo pohodzheno
     - Moreover: the translation of the word question is also different, because its function in the phrase has changed
     - Even if the function does not change in the English sentence, the translation might depend on the overall structure
  3. The question.N changes.N have been difficult
     Ukr.: Зміна.N.nom питань.N.gen була складною
           Zmina.N.nom pytan'.N.gen bula skladnoju
  (passive constructions are sometimes translated into Ukrainian by the "middle" voice: Accusative subject + impersonal verb form, as in translation 2)
  Q.: What is the difference between English sentences 2 and 3?
  The disambiguation information is non-local

  Also: changing word order is difficult for direct systems
  rules for changing word order have to operate across some representation of the entire sentence

  "The meaning that a word, a phrase, or a sentence conveys is determined not just by itself, but by other parts of the text, both preceding and following… The meaning of a text as a whole is not determined by the words, phrases and sentences that make it up, but by the situation in which it is used".
  M. Kay et al.: Verbmobil, CSLI 1994, pp. 11-13

Advantages of the direct systems:
However: direct translation is possible between structurally similar languages (usually related historically, e.g., Romance or Slavic languages with similar morphological and syntactic systems, word order, etc.)
Cases of non-local disambiguation might be rare (a "best guess" might work for the majority of cases)
Or: only a shallow linguistic representation could be sufficient:
  - morphological
  - shallow syntactic (which does not involve the analysis of the complete sentence structure, e.g., "chunking")
  (systems on the borderline between "direct" and "transfer")
Most commercial systems use this approach. Why? Does it have any advantages?
1. Saving resources
   Translation is much faster (essentially, it involves matching strings of limited length)
   this is important for "real time" speech-to-speech translation and for embedded MT applications on hand-held devices with cheap, slow processors
   Translation requires limited memory (not dependent on the length of the input)
   Future of embedded systems: reasonably good "direct" MT approximations of full-scale "transfer" systems (which work in a limited subject domain)
2. Machine-learning techniques could be applied straightforwardly to create direct MT systems
   large sets of "direct" MT rules are unmanageable for human developers, but what if we let the computer develop an MT system from some training data?
   - "Direct" rules are easier to learn automatically
   - Generalisations and intermediate representations are difficult for machine learning
     some kinds of generalisations are still not possible
     corpus-based methods can be implemented as "direct" systems
   Experiments: IBM statistical MT system, etc. (next lecture)
   Problem: insufficient training material: aligned parallel texts are expensive and not available for all languages
   - the data are sparse and do not produce sufficiently accurate lists of "direct" rules.

5. Indirect systems
Translation is made on the basis of a linguistic analysis of the ST and some kind of linguistic representation (interface representation, IR)
  ST → Interface Representation(s) → TT
Transfer systems:
  - IRs are language-specific
  - Language-pair specific mappings are used
Interlingual systems:
  - IRs are language-independent
  - No language-pair specific mappings are used

6. Transfer systems
- Involve 3 stages: analysis - transfer - synthesis
- Analysis and synthesis are monolingual and independent, i.e.: analysis is the same irrespective of the TL; synthesis is the same irrespective of the SL
- Transfer is bilingual, and each transfer module is specific to a particular language pair
Synthesis (generation) is straightforward
Number of modules for a multilingual system:
  n × (n - 1) transfer modules
  n × (n + 1) modules in total (n analysis + n synthesis + n × (n - 1) transfer)
A 5-language system (if it translates in both directions between all language pairs) has 20 transfer modules and 30 modules in total
[Figure: five languages L1 to L5, each with its own interface representation IR1 to IR5; transfer modules link the IRs.]
More modules than for direct systems?
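The module counts for direct and transfer systems can be checked with a few lines of arithmetic. The sketch below only restates the formulas given in these notes; the function names direct_modules and transfer_modules are made up for the illustration.

```python
def direct_modules(n):
    """One direct module per ordered language pair: n * (n - 1)."""
    return n * (n - 1)


def transfer_modules(n):
    """Return (transfer, analysis, synthesis, total) module counts for n languages."""
    transfer = n * (n - 1)   # one bilingual transfer module per ordered pair
    analysis = n             # one monolingual analysis module per language
    synthesis = n            # one monolingual synthesis module per language
    return transfer, analysis, synthesis, transfer + analysis + synthesis


for n in (2, 5, 10):
    transfer, analysis, synthesis, total = transfer_modules(n)
    # The total always equals n * (n + 1).
    assert total == n * (n + 1)
    print(n, direct_modules(n), transfer, total)
# prints:
# 2 2 2 6
# 5 20 20 30
# 10 90 90 110
```

For five languages the transfer architecture therefore has 30 modules against 20 for the direct one, but only 20 of the 30 are specific to a language pair.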
Advantage: reusability of Analysis and Synthesis modules: essentially, it is the separation of reusable (transfer-independent) information from language-pair mapping
- operations on a higher level of abstraction
the task: to do as much work as possible in the reusable modules of analysis and synthesis, and to keep transfer modules as simple as possible (this is often described as "moving towards Interlingua")
Now we can generalise over features, lexemes, tree configurations, functions of word groups
We can view the properties and how they relate to each other

  The men wait for a train
    S, present
      verb:                  wait
      subject, pl., def:     man
      object, sg., indef:    train
    →
    S, present
      verb:                  attendre
      subject, pl., def:     homme
      object, sg., indef:    train
  Les hommes attendent un train

Lexical items are replaced and the features are copied… Necessary transformations are performed…
There is no need to translate each inflected word form: the lexicon for transfer becomes smaller.
Advantage: translation equivalents are expressed in a compact and intuitively clear way
Possible to deal with structural differences, differences in word order:

  Dutch: Jan zwemt
  English: Jan swims

  Dutch: Jan zwemt graag
  English: Jan likes to swim
  (lit.: Jan swims "pleasurably", with pleasure)

  Spanish: Juan suele ir a casa
  English: Juan usually goes home
  (lit.: Juan tends to go home, soler (v.) = 'to tend')

  English: John hammered the metal flat
  French: Jean a aplati le métal au marteau
  (Resultative construction in English; French lit.: Jean flattened the metal with a hammer)

  English: The bottle floated past the rock
  Spanish: La botella pasó por la piedra flotando
  (Spanish lit.: 'The bottle passed the rock floating')

  English: The hotel forbids dogs
  German: In diesem Hotel sind Hunde verboten
  (German lit.: Dogs are forbidden in this hotel)

  English: The trial cannot proceed
  German: Wir können mit dem Prozeß nicht fortfahren
  (German lit.: We cannot proceed with the trial)

  English: This advertisement will sell us a lot
  German: Mit dieser Anzeige verkaufen wir viel
  (German lit.: With this advertisement we will sell a lot)

  English: 10 pounds will buy you a decent milk …
  (English has fewer constraints on subjects; the German generation module needs to generate the correct surface structure from the semantic roles of words)

It is possible to handle idioms in a flexible way: Engl.: "to call a spade a spade"; "to kick the bucket"
- higher quality of translation is achievable, even for structurally different languages

Using a transfer approach still leaves many open questions:
- the depth of the SL analysis
- the nature of the interface representation (syntactic, semantic, both?)
- transfer components may vary in size and complexity depending on how far up the MT triangle they fall
- the nature of transfer may be influenced by how typologically similar the languages involved are: the more typologically different the languages, the more complex the transfer

Transfer components consist of 2 parts
- lexical transfer
- structural transfer

IRs should abstract away from (many) surface features of language, and therefore form more language-independent representations
Some principles of IRs:
IRs should form an adequate basis for transfer, i.e., they should
- contain enough information to make transfer (a) possible; (b) simple
- provide sufficient information for synthesis
(criticism: IRs need to combine information of very different kinds)
in IRs:
1. lemmatisation: each form of a lexical item is represented in a uniform way, e.g., nouns by the singular (sing.N.), verbs by the infinitive (Inf.V.) (allows reducing the transfer lexicon)
2. featurisation: only content words are represented in IRs 'as such'; function words and morphemes become features on content words (e.g., plur., def., past…)
   inflectional features only occur in IRs if they have contrastive values (are syntactically or semantically relevant)
3. neutralisation: neutralising many surface differences, e.g.,
   - the active and passive distinction
   - different word orders
   - surface properties are represented as features (e.g., voice = passive)
   - possibly: representing syntactic categories:
     John seems to be rich (logically, John is not a subject of seem) = It seems to someone that John is rich
     Mary is believed to be rich = One believes that Mary is rich
     (it might be easier to translate "normalised" structures into other languages)
4. reconstruction: to facilitate transfer, certain aspects that are not overtly present in a sentence should occur in IRs, especially for transfer to languages where such elements are obligatory:
     John tried to leave: S[ try.V John.NP S[ leave.V John.NP]]
5. disambiguation: ambiguities should be resolved at IR, e.g., attachment of PPs. Lexical ambiguities can be annotated with numbers: table_1, _2…
IRs should be defined on a principled basis, not on ad-hoc solutions: the eclectic way of creating IRs has been a great problem until now.
types of transfer: based on some concepts from the theory of [human] translation

7. Interlingual systems
Interlingual systems involve just 2 stages: analysis - synthesis
both are monolingual and independent
there are no bilingual parts to the system at all (no transfer)
generation is not straightforward
A multilingual system with n languages (which translates in both directions between all language pairs) requires 2n modules: a 5-language system contains 10 modules (5 analysis + 5 synthesis)
[Figure: languages L1, L2, L3, L4, L5 arranged around a single IL, each linked only to the IL.]
- Each module needs to be more complex
- There is more work on the analysis part
- IR needs to be universal (not specific to particular languages)
- IL must be based on universal semantics, and not oriented towards any particular family or type of languages
- IR principles still apply (even more so): neutralisation must be applied cross-linguistically (with often different surface realisations of the same meaning being mapped into one single IR)
- there should be no lexical items in theory, just universal semantic primitives (e.g., kill: [cause [become [dead]]])

From transfer to interlingua:
Ex.: (F. van Eynde)
  Luc seems to be ill
  Fr: *Luc semble être malade
  Fr: Il semble que Luc est malade
  SEEM-2 (ILL (Luc))
  SEMBLER (MALADE (Luc))
Problem: the translation of predicates
A solution: treat predicates as language-specific expressions of universal concepts
  SCHIJNEN-1 = concept-372      SCHIJNEN-2 = concept-373
  SHINE      = concept-372      SEEM       = concept-373
  BRILLER    = concept-372      SEMBLER    = concept-373
but: Criticism of "universal semantic features":
Problem with Interlingua: semantic differentiation is target-language specific
  runway → startbaan, landingsbaan (take-off runway; landing runway)
  cousin → cousin, cousine (m., f.)
- there is no good reason in English to consider these words ambiguous;
- making such distinctions is comparable to lexical transfer
Consequence: not all distinctions which are needed for translation are motivated monolingually (concepts may be unambiguous in the source language but ambiguous in other languages)
Not all possible distinctions can be anticipated. (Adding a new language might require changing all other modules)
In practice: IL does not work as it should!

8. Transfer and Interlingua compared
- Much work is the same for both approaches
- Translation vs. paraphrase (limits on the translator's freedom)
  translation is limited by conflicting restrictions: fluency considerations are limited by adequacy considerations
- Bilingual contrastive knowledge is central to translation
  - translators know about contrasts between languages (+ know correct systems of correspondences, e.g., legal terms, where "retelling" is not an option)
  IL leaves no place for bilingual knowledge
  IL can work only in limited domains, syntactically and lexically restricted
  Transfer systems can capture contrastive knowledge

"Given the existence of category and level shift in translation equivalence, an Interlingua representation would be not universal: it would be obliged to neutralise differences between lexical categories and between grammar and the lexicon. Given different partitioning of the conceptual space in different languages, words have to be broken into a set of components, sufficient to discriminate between any two or more concepts represented by different words in any one language. In many cases, the work involved in analysis would merely be undone in generation for an SL - TL pair in which a particular shift was inapplicable. More importantly, the criteria for defining the set of components would be very difficult to formalise.
In fact, given that translation is primarily a matter of defining equivalents, to require that they be derived in terms of a formal system with significantly different properties from any natural language would appear to add an unnecessary complication to the task. Rather than partitioning the work of the system across independent modules, an Interlingua approach ensures that the behaviour of every module is critically dependent on that of every other. Thus, extensibility both in terms of handling new languages, and incorporating expert bilingual knowledge, is drastically curtailed. [For an IL system]… The discarding of SL-specific information necessitated by the Interlingua caused problems in TL synthesis, severely hindering possible improvements of translation quality… What is needed to compute true translation equivalents may already have been lost. The translation produced … might better be called paraphrases. Transfer has a theoretical background, it is not an engineering ad-hoc solution, a "poor substitute for Interlingua". It must be taken seriously and developed through solving problems in contrastive linguistics and in knowledge representation appropriate for translation tasks".
Whitelock and Kilby, 1995, pp. 7-9
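
The concept-table solution from section 7, and the runway objection raised against it, can be sketched in a few lines. This is only an illustration under stated assumptions: the dictionaries ANALYSIS and GENERATION, the function translate_via_interlingua and the label concept-runway are hypothetical; only the numbers concept-372 and concept-373 come from the example in these notes.

```python
# Analysis tables: language-specific (already disambiguated) predicates -> concepts.
# "concept-runway" is a made-up label for the single English sense of "runway".
ANALYSIS = {
    "nl": {"schijnen-1": "concept-372", "schijnen-2": "concept-373"},
    "en": {"shine": "concept-372", "seem": "concept-373", "runway": "concept-runway"},
    "fr": {"briller": "concept-372", "sembler": "concept-373"},
}

# Generation tables: concepts -> every target word that can express them.
GENERATION = {
    "en": {"concept-372": ["shine"], "concept-373": ["seem"],
           "concept-runway": ["runway"]},
    "fr": {"concept-372": ["briller"], "concept-373": ["sembler"]},
    "nl": {"concept-372": ["schijnen"], "concept-373": ["schijnen"],
           "concept-runway": ["startbaan", "landingsbaan"]},
}


def translate_via_interlingua(word, source, target):
    """Analyse a source word into a concept, then list the target words for that concept."""
    concept = ANALYSIS[source][word]
    return GENERATION[target][concept]


print(translate_via_interlingua("seem", "en", "fr"))    # one candidate
print(translate_via_interlingua("runway", "en", "nl"))  # two candidates, no basis to choose
```

English analysis has no monolingual reason to split runway, so the choice between startbaan and landingsbaan cannot be made from the concept alone; splitting the concept to serve Dutch would mean changing the English module as well.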