Machine Translation (Level 2)
Anna Sågvall Hein
GSLT Course, September 2004

Translation
”substitute the text material of one language (SL) by the equivalent text material of another language (TL)” (Catford 1965: 20)
”Translation consists in producing in the target language the closest natural equivalent of the text material of the source language, in the first hand concerning meaning, in the second hand concerning style” (Nida 1975: 32)
”Translation is in theory impossible, but in practice fairly possible” (Mounin 1967)

Catford, J. C. (1965), A Linguistic Theory of Translation, Oxford University Press, England.
Mounin, G. (1967), Les problèmes théoriques de la traduction, Paris.
Nida, E. (1975), A Framework for the Analysis and Evaluation of Theories of Translation, in Brislin, R. W. (ed.) (1975), Translation Application and Research, Gardner Press, New York.

Equivalence
• form
• meaning
• style
• effect

Formal and dynamic equivalence
• Formal equivalence focuses attention on the message itself, in both form and content. It aims to allow the reader to understand as much of the SL context as possible.
• Dynamic equivalence is based on the principle of equivalent effect, i.e. that the relationship between receiver and message should aim at being the same as that between the original receivers and the SL message. (Nida 1975)

Can computers translate?
• Not a simple yes or no; it depends on the purpose of the translation and the required quality.

Classical problems with MT
• unrealistic expectations
• bad translations
• difficulties in integrating MT in the work flow
  – the Ericsson case

Feasibility of machine translation
• quality in relation to purpose
• control of the source language
• human-machine interaction
• re-use of translations
• evaluation

Quality
• publishing quality
• editing quality
• browsing quality

Translation related tasks
• translation
• browsing
• gisting
• drafting
• message dissemination
• cross-language information searches
• cross-language interchanges

MT as a cross-language communication tool
MT is used not only for pure translation purposes but also for writing in a foreign language and for browsing (Hutchins 2001)
Hutchins, J., 2001, Towards a new vision for MT, Introductory speech at MT Summit VIII conference, 18-22 September 2001 (http://ourworld.compuserve.com/homepages/WJHutchins/MTS-2001.pdf)

Control of the source language
• spell checked and grammar checked SL
• sublanguage
  – domain
  – text type
• controlled language

Spell checking and grammar checking
• If there are spelling errors or typos in the SL text, dictionary look-up will fail
• If there are grammatical errors in the SL text, grammatical analysis will fail
• Where and how should spell and grammar checking be accounted for? Before or in the process?

Controlled language
• consistent authoring of source texts
  – reduction of ambiguity
  – full linguistic coverage
• controlled vocabulary
  – full lexical coverage
• controlled grammar
  – full grammatical coverage
• controlled language checking
  – e.g. Scania Checker

Ex.
of controlled languages
• Simplified English
• KANT controlled English
• Scania Swedish
  – Scania checker

Human intervention
• before
  – language checking
• during
  – e.g. ambiguity resolution
• after
  – post-editing

Re-use of translations
• translation memories
• translation dictionaries incl. terminologies
• lexicalistic translation
• statistical machine translation
• example-based translation

Evaluation of MT
• human
• automatic
  – using a gold standard
    • coverage (recall)
    • quality (precision)
    • global similarity measures
      – merge of recall and precision
      – BLEU, NIST

Why machine translation?
• cheaper
• faster
• more consistent
  – when it succeeds …

What is MT proper?
To be considered as MT, a system should
• provide minimally correct morphology
• perform minimal syntactic processing
• perform minimal semantic processing
• handle and produce full sentences
Hutchins, J., 2000, The IAMT Certification initiative and defining translation system categories (http://nl.ijs.si/eamt00/proc/Hutchins.pdf)

Examples of MT products
• Systran (http://babelfish.altavista.com/)
• Comprendium (based on Metal)
• ProMT (http://www.translate.ru/eng)
• ESTeam
See further: http://ourworld.compuserve.com/homepages/WJHutchins/Compendium-4.pdf, http://www.foreignword.com/Technology/mt/mt.htm

Basic strategies
• direct translation
• rule-based translation
  – transfer
  – interlingua
• example-based translation
• statistical translation
• hybrids

Direct translation
• no complete intermediary sentence structure
• translation proceeds in a number of steps, each step dedicated to a specific task
• the most important component is the bilingual dictionary
• typically general language
• problems with
  – ambiguity
  – inflection
  – word order and other structural shifts

Simplistic approach
• sentence splitting
• tokenisation
• handling capital letters
• dictionary look-up and lexical substitution incl. some heuristics for handling ambiguities
• copying unknown words, digits, signs of punctuation etc.
• formal editing

Advanced classical approach (Tucker 1987)
• source text dictionary look-up and morphological analysis
• identification of homographs
• identification of compound nouns
• identification of noun and verb phrases
• processing of idioms

Advanced approach, cont.
• processing of prepositions
• subject-predicate identification
• syntactic ambiguity identification
• synthesis and morphological processing of target text
• rearrangement of words and phrases in target text

Feasibility of the direct translation strategy
Is it possible to carry out the direct translation steps as suggested by Tucker with sufficient precision without relying on a complete sentence structure?

Assignment 1: manual direct translation
Sv. Ytterst handlar kampen för sysselsättning om att hålla samman Sverige.
En. Ultimately, the fight for full employment concerns the cohesion of Swedish society.
(from Statement of Government Policy 1996)
• Define an algorithm and a dictionary (based on Norstedts) for simplistic translation of the example.
• Present the model and the result.

Assignment 1, cont.
• Improve the result stepwise in accordance with the advanced direct translation strategy
  – Specify each step carefully and demonstrate its effect on the translation.
• Evaluate and discuss the final result.
• Translate the example
  using Systran (http://kwic.systran.fr/systran/svdemo) and discuss the differences in an evaluative way
• Report the assignment and upload it to the web (041001)

Current trends in direct translation
• re-use of translations
  – translation memories of sentences and sub-sentence units such as words, phrases and larger units
  – lexicalistic translation
  – example-based translation
  – statistical translation
Will re-use of translations overcome the problems with the direct translation approach that were discussed above? If so, how can they be handled?

Systran
• System Translation
• developed in the US by Peter Toma
• first version 1969 (Ru-En)
• EC bought the rights of Systran in 1976
• currently 18 language pairs
• demo version sv-en in 2003 (http://kwic.systran.fr/systran/svdemo)
• http://babelfish.altavista.com/

Systran, cont.
• more than 1,600,000 dictionary units
• 20 domain dictionaries
• daily use by EC translators, administrators of the European institutions
• originally a direct translation strategy
  – see H&S
• today more of a transfer-based strategy

Ex. 1: fairly good translation / Systran sv-en
• "Enskilda företagare som inte bildat bolag klassificeras hit."
• "Individual entrepreneurs that have not formed companies are classified here."
• The system has recognised bildat as a perfect form and renders the tense correctly as have formed, with the negation not in the right place.

Ex. 2: word order problem / Systran sv-en
• "När byarna kontaktades hade de inte ens utsatts för influensa."
• "When the villages were contacted had they not even been exposed to flu."
• The system has not identified the subject and the predicate and therefore produces the wrong word order.

Ex. 3: ambiguity problem / Systran sv-en
• "Vad kan vi lära av Arrawetestammen?"
• "What can we faith of the Arawete?"
• The system does not find the connection between kan and lära and therefore does not see that lära is a verb.

Ex. 4: ambiguity problem / Systran sv-en
• "Extrapoleringen går till så här."
• "The extrapolation goes to so here."
• The system does not know the particle verb gå till and therefore translates incorrectly, word by word.

Systran Linguistic Resources
• Dictionaries
  – POS Definitions
  – Inflection Tables
  – Decomposition Tables
  – Segmentation Dictionaries
• Disambiguation Rules
• Analysis Rules

Systran Processing Steps
• Analysis
  – Lookup
  – Compound Decomposition
  – Disambiguation
  – Syntactic Analysis
  – Compound Expansion
• Sentence Transfer
  – Initial Target Structure
  – Lookup
  – Default Transfer of Attributes
  – Structure Transformation

Systran Processing Steps (cont)
• Sentence Synthesis
  – Structure Transformation
  – Inflection lookup
  – Surface Transformation

Motivations for transfer-based translation
• lexical ambiguity
• structural differences
See further Ingo 91

Example 1
Sv. Fyll på olja i växellådan.
En. Fill gearbox with oil.
(from the Scania corpus)
• fyll på → fill
• obj → adv
• adv → obj

Example 2
Sv. I oljefilterhållaren sitter en överströmningsventil.
En. The oil filter retainer has an overflow valve.
(from the Scania corpus)
• sitter → has
• adv → subj
• subj → obj

Transfer-based translation
• intermediary sentence structure
• basic processes
  – analysis
  – transfer
  – generation (synthesis)
• language modules
  – dictionary and grammar of SL
  – transfer dictionary and transfer rules
  – dictionary and grammar of TL

Levels of intermediary structure
[Figure: levels of intermediary structure between SL and TL; labels: Direct translation, Transfer, Metal, Multra, Interlingua]
• cf. J&M, Chapter 21
• word order

Metal
• See H&S

MULTRA
Multilingual Support for Translation and Writing
• translation engine
• transfer-based
  – shake-and-bake
• modular
• unification-based
• preference machinery
• trace-able

Analysis
• chart parser (Lisp C)
  – procedural formalism
    • unification and other kinds of operations
• sentence structure
  – feature structure
  – grammatical relations
  – surface order implicit via grammatical relations
See further Sågvall Hein & Starbäck (99), Weijnitz (02), Dahllöf (89)

Transfer
• unification-based
• declarative formalism
  – Multra transfer formalism (Beskow 93)
• lexical and structural rules
• rules are partially ordered
• a more specific rule takes precedence over a less specific one
  – specificity in terms of number of transfer equations
• all applicable rules are applied
• written in Prolog

Generation
• syntactic generation
  – Multra syntactic generation formalism (Beskow 97a)
  – PATR-like style
    • unification
    • concatenation
    • typed features
• morphological generation (Beskow 97b)
  – lexical insertion rules
  – morphological realisation and phonological finish in Prolog
• written in Prolog

An example: Tippa hytten.
Tippa hytten.
: (* = (PHR.CAT = CL
        MODE = IMP
        SUBJ = 2ND
        VERB = (WORD.CAT = VERB
                INFF = IMP
                DIAT = ACT
                LEX = TIPPA.VB.1
                VSURF = +)
        OBJ.DIR = (PHR.CAT = NP
                   NUMB = SING
                   GENDER = UTR
                   CASE = BASIC
                   DEF = DEF
                   HEAD = (LEX = HYTT.NN.1
                           WORD.CAT = NOUN))
        REG = (V1.LEM = TIPPA.VB)
        SEP = (WORD.CAT = SEP
               LEX = STOP.SR.0)))

Transfer structure
[VERB : [WORD.CAT : VERB
         LEX : TILT.VB.0
         DIAT : ACT
         INFF : IMP]
 OBJ.DIR : [PHR.CAT : NP
            DEF : DEF
            NUMB : SING
            HEAD : [WORD.CAT : NOUN
                    LEX : CAB.NN.0]]
 MODE : IMP
 SUBJ : 2ND
 VSURF : +
 SEP : [WORD.CAT : SEP
        LEX : STOP.SR.0]
 PHR.CAT : CL]

Generation
Tilt the cab.

A grammar rule
defrule legal.obj {
  <?1 phr.cat> = 'np,
  not <?1 case> = 'gen,
  not <?1 case> = 'subj
}

Transfer rules
• copy feature
• delete feature
• transfer feature
• assign feature

Copy feature
LABEL mode
SOURCE <* mode> = ?x1
TARGET <* mode> = ?x2
TRANSFER

Delete feature
LABEL REG
SOURCE <* REG> = ANY
TARGET <*> = <*>
TRANSFER

Transfer feature
LABEL OBJ.DIR
SOURCE <* OBJ.DIR> = ?x1
TARGET <* OBJ.DIR> = ?x2
TRANSFER ?x1 <=> ?x2

Define feature
LABEL trycka.in-press
SOURCE <* lex sym> = trycka.vb+in.ab.1
       <* word.cat> = VERB
TARGET <* lex> = press.vb.1
       <* word.cat> = VERB
TRANSFER

A generation rule
LABEL CL.IMP
X1 ---> X2 X3 X4 :
  <X1 PHR.CAT> = CL
  <X1 VERB> = <X2>
  <X1 TYPE> = IMP
  <X1 OBJ.DIR> = <X3>
  <X1 SEP> = <X4>

A contextual lexical rule
LABEL tänka.på-think.about
SOURCE <* verb lex sym> = tänka.vb.1
       <* obj.prep phr.cat> = pp
       <* obj.prep prep> = ?prep
       <* obj.prep prep lex sym> = på.pp.1
       <* obj.prep rect> = ?rect1
TARGET <* obj.prep phr.cat> = pp
       <* obj.prep prep word.cat> = PREP
       <* obj.prep
       prep lex> = about.pp.1
       <* obj.prep rect> = ?rect2
TRANSFER ?rect1 <=> ?rect2

A generation trace
1-Applying Rule cl-sep
1-Applying Rule cl.imp
1-Applying Rule subj2nd-verb-obj.dir
1-Applying Rule verb.main.act
1-Applying Rule np.the-df
1-Applying Rule ng.noun-def
1-Success!

Language resources in the MATS system
• dictionary in a database with different views
• analysis grammar
• transfer grammar
  – incl. contextually defined lexical rules
• generation grammar

The MATS system
Frozen demo…
[Figure: dictionary database, listing sv-en_LinkLexicon, en-Inflections, en_LemmaLexicon, en_LexemeLexicon, en_Lexicon, en_StemLexicon, sv_Inflections, sv_LemmaLexicon, sv_LexemeLexicon, sv_Lexicon, sv_StemLexicon]

Assignment 2: Working with MATS
http://stp.ling.uu.se/~evapet/mt04/assignment2.html

Lexicalistic translation
• Identify (lexical) translation units in the source sentence
• Translate each unit separately (considering the context)
• Order the result in agreement with a model of the target language
Formulation due to Lars Ahrenberg; see further AH (reading list); see also Beaven, John L., Shake-and-Bake Machine Translation, Coling-92, Nantes, 23-28 August 1992.
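
The three steps listed for lexicalistic translation can be sketched in a few lines of Python. This is only a minimal illustration, not the T4F or Multra implementation: the lexicon, the bigram set standing in for the target language model, and all function names are hypothetical, and the choice of "with" for "i" is hard-coded rather than derived from context. The sentence is Example 1 from the Scania corpus.

```python
from itertools import permutations

# Hypothetical toy lexicon: source units (token tuples) -> target units.
# Context is ignored here; "i" -> "with" only fits this one sentence.
LEXICON = {
    ("fyll", "på"): ("fill",),
    ("olja",): ("oil",),
    ("i",): ("with",),
    ("växellådan",): ("gearbox",),
}

# Hypothetical stand-in for a target language model: a set of known bigrams.
TARGET_BIGRAMS = {
    ("<s>", "fill"), ("fill", "gearbox"), ("gearbox", "with"),
    ("with", "oil"), ("oil", "</s>"),
}


def identify_units(tokens):
    """Step 1: greedy longest-match segmentation into lexicon units."""
    units, i = [], 0
    max_len = max(len(key) for key in LEXICON)
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in LEXICON:
                units.append(candidate)
                i += n
                break
        else:
            units.append((tokens[i],))  # unknown word: keep as its own unit
            i += 1
    return units


def translate_unit(unit):
    """Step 2: translate one unit; unknown units are copied unchanged."""
    return LEXICON.get(unit, unit)


def score(words):
    """Count how many adjacent word pairs the toy model has seen."""
    padded = ["<s>", *words, "</s>"]
    return sum((a, b) in TARGET_BIGRAMS for a, b in zip(padded, padded[1:]))


def lexicalistic_translate(sentence):
    """Step 3: try every order of the translated units, keep the best-scoring."""
    tokens = sentence.lower().rstrip(".").split()
    translated = [translate_unit(u) for u in identify_units(tokens)]
    best = max(permutations(translated),
               key=lambda order: score([w for unit in order for w in unit]))
    return " ".join(w for unit in best for w in unit)


print(lexicalistic_translate("Fyll på olja i växellådan."))
```

Scoring every permutation is only feasible for very short sentences; a realistic system would need a proper language model and a search strategy that prunes the space of orderings.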

T4F – a lexicalistic system
• processes in T4F
  – tokenisation
  – tagging
  – transfer
  – transposition
  – filtering
See further AH (in the reading list)

Interlingua translation
• See SN

Applications of alignment
• translation memories
• translation dictionaries
• lexicalistic translation
• statistical machine translation
• example-based translation

Translation memories
• based on sentence links
• optionally, sub-sentence links
See further Macklovitch, E. (2000)

Translation dictionaries
• based on word links
• refinement of word links

Refinement of word alignment data
• neutralise capital letters where appropriate
• lemmatise or tag source and target units
• identify ambiguities
  – search for criteria to resolve them
• identify partial links
  – compounds?
  – remove or complete them
• manual revision?
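
A first pass over some of the refinement steps listed for word alignment data might look as follows. This is a sketch under stated assumptions: the word links are invented toy data (the motorn pair echoes the motor/engine ambiguity used elsewhere in these slides), every form is lower-cased although capitals should be neutralised only where appropriate, and lemmatisation, partial links and manual revision are left out. All names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical raw word links (source, target), as an aligner might output them.
RAW_LINKS = [
    ("Motorn", "the engine"),
    ("motorn", "the motor"),
    ("motorn", "the engine"),
    ("Hytten", "the cab"),
    ("hytten", "the cab"),
    ("oljefilterhållaren", "retainer"),
]


def normalise(link):
    """Neutralise capital letters (simplification: lower-case everything)."""
    source, target = link
    return source.lower(), target.lower()


def build_dictionary(links):
    """Merge links that differ only in case and keep the link frequency."""
    counts = Counter(normalise(link) for link in links)
    entries = defaultdict(dict)
    for (source, target), frequency in counts.items():
        entries[source][target] = frequency
    return dict(entries)


def ambiguous_entries(dictionary):
    """Source units with more than one target: candidates for resolution."""
    return {s: targets for s, targets in dictionary.items() if len(targets) > 1}


def ranked_translations(dictionary, source):
    """Alternatives for one source unit, most frequent first."""
    return sorted(dictionary[source].items(), key=lambda kv: (-kv[1], kv[0]))


dictionary = build_dictionary(RAW_LINKS)
print(ambiguous_entries(dictionary))
print(ranked_translations(dictionary, "motorn"))
```

Frequency alone does not resolve the ambiguity; it only orders the alternatives. The criteria the slide asks for (for instance the compound the word was shortened from) would have to come from context.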

Informally about statistical MT
• build a translation dictionary based on word alignment
• aim for as big fragments as possible
• keep information on link frequency
• build an n-gram model of the target language
• implement a direct translation strategy
  – including alternatives ordered by length and frequency
• process the output with the n-gram model, filtering out the best alternatives, and adjust the translation accordingly

Example-based MT
HS (in the reading list)

Some current research topics
• intersentential dependencies
• hybrid systems: data-driven and rule-driven
• improved alignment techniques
• improved language modeling in ST
• automatic learning from post-editing
• translation by structural correspondences
• translation of spoken language
• improved preference strategies
• ambiguity preserving translation

Intersentential dependencies
• pronoun resolution
• lexical ambiguity resolution, such as
  – (torkar)motorn → the motor
  – (förbrännings)motorn → the engine
• fluency

Preserving the information structure
• information structure is expressed in different ways in the source and the target
• syntactic clues are exploited in the analysis to compute the information structure (topic-focus articulation)
• information structure is used to guide the generation

An example
Torkarmotorn M2 är sammankopplad med omkopplare S24 och intervallrelä R22. För att inte motorn skall överbelastas, t.ex. om torkarbladen fastnat, finns en inbyggd termovakt som bryter strömmen till motorn när …
Wiper motor M2 is connected to switch S24 and intermittent relay R22. To prevent motor overload, e.g.
if the wiper blade gets stuck, there is an integral thermal sensor which breaks the current to the motor when …

Preferences
• syntactic preferences
  – the principle of right association
  – the principle of minimal attachment
  – two-stage processing
• semantic preferences
  – lexical selectional restrictions
  – lexical contextual rules
  – conceptual taxonomies
  – likelihood of occurrence
See further Bennett, P. & Paggio, P., 1993, Preference in Eurotra.

Preferences in Multra
• parsing
  – a formalism for expressing syntactic preferences in the parse
    • not fully developed
• transfer
  – contextual lexical rules
  – rule specificity
• generation
  – rule specificity

Hybrid systems
• aims
• components
• problems
• architecture
• scores

Aims of a hybrid system
• simple techniques for simple tasks
• complex techniques for complex tasks

Components of a hybrid system
• component strategies
  – translation memory
    • full sentences
    • fragments
• direct translation
  – statistical translation
  – EBMT

Component strategies, cont’d
• rule-based translation
  – simplistic analysis (cf. direct translation)
    • word by word (S sequence of words)
    • phrase by phrase (S sequence of phrases)
  – partial parsing
  – full parsing

Problems of a hybrid system
• how does the system know when a simple technique is appropriate?
  – does the source tell?
  – does the target tell?

Architecture and scores
• simple first?
• concerting results?
• scoring?

Improved techniques for re-use of translation
• combining clues for word alignment (Tiedemann 2003)
• interactive word alignment (Ahrenberg et al.
  2003)
• parallel treebanks

Translation by structural correspondences
• LFG
• HPSG

Translation of spoken language
See Krauwer, Steven (ed.), 2000, Machine Translation, June 2000, Volume 15, Issue 1-2, Special issue on Spoken Language Translation.