Two sides of the same coin: assessing translation quality through adequacy and acceptability error analysis
Joke Daems
joke.daems@ugent.be
www.lt3.ugent.be/en/projects/robot
Supervised by: Lieve Macken, Sonia Vandepitte, Robert Hartsuiker

What makes error analysis so complicated?
“There are some errors for all types of distinctions, but the most problematic distinctions were for adequacy/fluency and seriousness.” - Stymne & Ahrenberg, 2012
• Does a problem concern adequacy, fluency, both, neither?
• How do we determine the seriousness of an error?

Two types of quality
“Whereas adherence to source norms determines a translation's adequacy as compared to the source text, subscription to norms originating in the target culture determines its acceptability.” - Toury, 1995
Why mix?

2-step TQA approach
Quality Assessment
• Acceptability = target norms
• Adequacy = target vs. source

Subcategories
• Acceptability: Grammar & Syntax, Lexicon, Spelling & typos, Style & register, Coherence
• Adequacy: Contradiction, Deletion, Addition, Word sense, Meaning shift

Acceptability: fine-grained
• Grammar & Syntax: article, comparative/superlative, singular/plural, verb form, article-noun agreement, noun-adj agreement, subject-verb agreement, reference, missing, superfluous, word order, structure, grammar - other
• Lexicon: wrong preposition, wrong collocation, word nonexistent
• Spelling & Typos: capitalization, spelling mistake, compound, punctuation, typo
• Style & Register: register, untranslated, repetition, disfluent, short sentences, long sentence, text type, style - other
• Coherence: conjunction, missing info, logical problem, paragraph, inconsistency, coherence - other

Adequacy: fine-grained
• Meaning shift: contradiction, word sense disambiguation, hyponymy, hyperonymy, terminology, quantity, time, meaning shift caused by punctuation, meaning shift caused by misplaced word, deletion, addition, explicitation, coherence, inconsistent terminology, other

How serious is an error?
“Different thresholds exist for major, minor and critical errors.
These should be flexible, depending on the content type, end-user profile and perishability of the content.” - TAUS, error typology guidelines, 2013
Give different weights to error categories depending on text type & translation brief

Reducing subjectivity
• Flexible error weights
• More than one annotator
• Consolidation phase

TQA: Annotation (brat)
1) Acceptability
2) Adequacy

Application example: comparative analysis
[Figure: four bar charts of the most frequent problem categories, with a percentage axis.
Top HT problems newspaper articles / Top PE problems newspaper articles (axis 0% to 15%); categories shown: other meaning shift, wrong collocation, punctuation, word sense, typo, deletion, compound.
Top HT problems technical texts / Top PE problems technical texts (axis 0% to 20%); categories shown: terminology, logical problem, compound, other meaning shift, untranslated, wrong collocation, article.]

Next step: diagnostic & comparative evaluation
• What makes a ST-passage problematic?
• How problematic is this passage really? (i.e., how many translators make errors)
• Which PE errors are caused by MT?
• Which MT errors are hardest to solve?
Link all errors to corresponding ST-passage

Source text-related error sets
• ST: Changes in the environment that are sweeping the planet...
• MT: Veranderingen in de omgeving die het vegen van de planeet tot stand brengen... (wrong word sense)
  "Changes in the environment that bring about the brushing of the planet..."
• PE1: Veranderingen in de omgeving die het evenwicht op de planeet verstoren... (other type of meaning shift)
  "Changes in the environment that disturb the balance on the planet..."
• PE2: Veranderingen in de omgeving die over de planeet rasen... (wrong collocation + spelling mistake)
  "Changes in the environment that raige over the planet..."
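An error set like the one above can be turned into numbers. The sketch below is one possible reading of the "Quantification of ST-related error sets" slide, not a confirmed implementation: it assumes each condition (MT, PE) carries a total weight of 1 for the ST-passage, that this weight is split equally over the translations in that condition, and that each translation's share is split equally over its errors. The function names and the dictionary layout are hypothetical, and the error-free MT2 entry is taken from the quantification slide rather than from the example sentences.

```python
def weigh_error_set(error_set):
    """Assign a weight to every error linked to one ST-passage.

    error_set maps condition -> {translation id -> list of error labels}.
    Assumption: each condition has weight 1, shared equally by its
    translations; each translation's share is split over its errors.
    Returns a list of (condition, translation, error, weight) tuples.
    """
    rows = []
    for condition, translations in error_set.items():
        if not translations:
            continue  # no translations in this condition, nothing to weigh
        share = 1.0 / len(translations)
        for translation, errors in translations.items():
            for error in errors:
                rows.append((condition, translation, error, share / len(errors)))
    return rows


def condition_score(rows, condition):
    """Sum the error weights of one condition (0 = no translation erred)."""
    return sum(weight for cond, _, _, weight in rows if cond == condition)


# Hypothetical encoding of the "sweeping the planet" error set.
example = {
    "MT": {"MT1": ["wrong word sense"], "MT2": []},
    "PE": {
        "PE1": ["other meaning shift"],
        "PE2": ["wrong collocation", "spelling mistake"],
    },
}

rows = weigh_error_set(example)
for row in rows:
    print(row)
print("MT:", condition_score(rows, "MT"), "PE:", condition_score(rows, "PE"))
```

Under these assumptions a condition scores 1 when every translation contains at least one error for the passage, so the score reflects how many translators struggled rather than how many errors any single translator made.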
Application example: impact of MT errors on PE
[Figure: two bar charts, "Top 10 MT errors newspaper articles" and "Top 10 MT errors technical texts", each with an axis running from 0 to 30. Categories shown across the two charts: word order, missing constituent, verb form, structure, deletion, other meaning shift, logical problem, article, terminology, compound, subj-verb agreement, punctuation, untranslated, word sense, wrong collocation.]

Summary
• Improve error analysis by:
  – judging acceptability and adequacy separately
  – making error weights depend on translation brief
  – having more than one annotator
  – introducing consolidation phase
• Improve diagnostic and comparative evaluation by:
  – linking errors to ST-passages
  – taking number of translators into account

Open questions
• How can we reduce annotation time?
  – Ways of automating (part of) the process?
  – Limit annotation to subset of errors?
• How to better implement ST-related error sets?
  – Ways of automatically aligning ST, MT, and various TTs at word-level?

Thank you for listening
For more information, contact: joke.daems@ugent.be
Suggestions? Questions?

Quantification of ST-related error sets
ST
  MT (1)
    MT1 (0.5): wrong word sense (0.5)
    MT2 (0.5)
  PE (1)
    PE1 (0.5): other meaning shift (0.5)
    PE2 (0.5): wrong collocation (0.25), spelling mistake (0.25)

Inter-annotator agreement
Columns: HT&PE acceptability, HT&PE adequacy, MT acceptability, MT adequacy, each reported for Exp1 and Exp2 (eight values per row).
• Initial agreement: 39% (κ=0.32), 50% (κ=0.44), 42% (κ=0.31), 46% (κ=0.30), 53% (κ=0.49), 79% (κ=0.77), 57% (κ=0.46), 51% (κ=0.41)
• Agreement after consolidation: 67% (κ=0.65), 81% (κ=0.80), 82% (κ=0.79), 94% (κ=0.92), 84% (κ=0.83), 95% (κ=0.94), 94% (κ=0.92), 86% (κ=0.83)
• Correlation between annotators: r=0.67 (n=38, p<0.001), r=0.95 (n=34, p<0.001), r=0.87 (n=38, p<0.001), r=0.86 (n=34, p<0.001); n/a for the remaining four columns
• Agreement on categories: 90% (κ=0.89), 89% (κ=0.88), 89% (κ=0.87), 88% (κ=0.83), 83% (κ=0.81), 93% (κ=0.93), 86% (κ=0.79), 86% (κ=0.82)
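The agreement figures above pair a raw percentage with a κ value. The slides do not say which κ statistic was used or how the two annotators' spans were aligned, so the sketch below is only an illustration: it assumes Cohen's kappa for two annotators who each assign exactly one label to the same list of items. The label names, including "none" for an item left unannotated, are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e the agreement expected by chance from each annotator's
    label distribution.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical annotations of six passages by two annotators.
annotator_1 = ["grammar", "lexicon", "grammar", "style", "none", "none"]
annotator_2 = ["grammar", "lexicon", "style", "style", "none", "grammar"]

agreement = sum(a == b for a, b in zip(annotator_1, annotator_2)) / len(annotator_1)
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"agreement = {agreement:.0%}, kappa = {kappa:.2f}")
```

Reporting both numbers is useful because raw agreement can look high when one label dominates, whereas κ corrects for the agreement two annotators would reach by chance.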