Two sides of the same coin
assessing translation quality through
adequacy and acceptability error analysis
Joke Daems
joke.daems@ugent.be
www.lt3.ugent.be/en/projects/robot
Supervised by:
Lieve Macken, Sonia Vandepitte, Robert Hartsuiker
What makes error analysis
so complicated?
“There are some errors for all types of
distinctions, but the most problematic
distinctions were for adequacy/fluency and
seriousness.”
– Stymne & Ahrenberg, 2012
→ Does a problem concern adequacy, fluency, both,
neither?
→ How do we determine the seriousness of an error?
Two types of quality
“Whereas adherence to source norms
determines a translation's adequacy as
compared to the source text, subscription to
norms originating in the target culture
determines its acceptability.”
– Toury, 1995
→ Why mix?
2-step TQA approach
Quality Assessment
1) Acceptability = target norms
2) Adequacy = target vs. source
Subcategories
• Acceptability: Grammar & Syntax, Lexicon, Spelling & typos, Style & register, Coherence
• Adequacy: Contradiction, Deletion, Addition, Word sense, Meaning shift
Acceptability: fine-grained
• Grammar & Syntax: article, comparative/superlative, singular/plural, verb form, article-noun agreement, noun-adj agreement, subject-verb agreement, reference, missing, superfluous, word order, structure, grammar – other
• Lexicon: wrong preposition, wrong collocation, word nonexistent
• Spelling & Typos: capitalization, spelling mistake, compound, punctuation, typo
• Style & Register: register, untranslated, repetition, disfluent, short sentences, long sentence, text type, style – other
• Coherence: conjunction, missing info, logical problem, paragraph, inconsistency, coherence – other
Adequacy: fine-grained
• Meaning shift: contradiction, word sense disambiguation, hyponymy, hyperonymy, terminology, quantity, time, meaning shift caused by punctuation, meaning shift caused by misplaced word, deletion, addition, explicitation, coherence, inconsistent terminology, other
How serious is an error?
“Different thresholds exist for major, minor and
critical errors. These should be flexible,
depending on the content type, end-user profile
and perishability of the content.”
– TAUS, error typology guidelines, 2013
→ Give different weights to error categories
depending on text type & translation brief
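A minimal sketch of what flexible error weights could look like in practice: the same set of annotated errors yields a different score depending on the text type. The weight values, the `WEIGHTS` table and the function name are illustrative assumptions, not values from this presentation.

```python
from collections import Counter

# Hypothetical weights per text type (higher = more serious).
# These numbers are illustrative assumptions, not taken from the slides.
WEIGHTS = {
    "newspaper": {"word sense": 3, "wrong collocation": 2, "punctuation": 1, "terminology": 1},
    "technical": {"word sense": 3, "wrong collocation": 1, "punctuation": 1, "terminology": 3},
}


def weighted_error_score(errors, text_type, default_weight=1):
    """Sum the weights of all annotated error categories for one translation."""
    weights = WEIGHTS[text_type]
    counts = Counter(errors)
    return sum(n * weights.get(category, default_weight) for category, n in counts.items())


errors = ["terminology", "terminology", "punctuation"]
newspaper_score = weighted_error_score(errors, "newspaper")  # 2*1 + 1*1
technical_score = weighted_error_score(errors, "technical")  # 2*3 + 1*1
```

In this toy setup the terminology errors count three times as heavily in a technical text as in a newspaper article, which is the kind of adjustment a translation brief could motivate.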
Reducing subjectivity
• Flexible error weights
• More than one annotator
• Consolidation phase
TQA: Annotation (brat)
1) Acceptability
2) Adequacy
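brat stores annotations in a standoff format (one `.ann` file per text), so the annotated errors can be counted per category with a few lines of code. The sketch below reads only text-bound annotations (lines starting with `T`); the label names and offsets in the sample are made up for illustration and do not come from the actual annotation project.

```python
from collections import Counter


def parse_brat_ann(ann_text):
    """Return (label, start, end, covered_text) for each text-bound annotation."""
    spans = []
    for line in ann_text.splitlines():
        if not line.startswith("T"):
            continue  # skip relations, attributes, annotator notes, ...
        _ident, type_and_offsets, covered = line.split("\t", 2)
        label, offsets = type_and_offsets.split(" ", 1)
        # Discontinuous spans look like "0 5;16 23"; keep the outer boundaries.
        fragments = [fragment.split() for fragment in offsets.split(";")]
        start, end = int(fragments[0][0]), int(fragments[-1][1])
        spans.append((label, start, end, covered))
    return spans


# Hypothetical sample: two error annotations and one annotator note.
sample = (
    "T1\tWord_sense 40 49\thet vegen\n"
    "T2\tPunctuation 61 62\t,\n"
    "#1\tAnnotatorNotes T1\tfrom MT\n"
)
spans = parse_brat_ann(sample)
category_counts = Counter(label for label, _start, _end, _text in spans)
```

Running the two annotation steps as separate passes would give one such file per step, so acceptability and adequacy counts stay apart.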
Application example:
comparative analysis
[Figure: bar charts "Top HT problems" and "Top PE problems" for newspaper articles (axis 0% to 15%). Categories shown: other meaning shift, wrong collocation, punctuation, word sense, typo, deletion, compound.]
[Figure: bar charts "Top HT problems" and "Top PE problems" for technical texts (axis 0% to 20%). Categories shown: terminology, logical problem, compound, other meaning shift, untranslated, wrong collocation, article.]
Next step:
diagnostic & comparative evaluation
• What makes a ST-passage problematic?
• How problematic is this passage really? (i.e., how
many translators make errors?)
• Which PE errors are caused by MT?
• Which MT errors are hardest to solve?
→ Link all errors to the corresponding ST-passage
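Linking errors to ST-passages can be sketched as a simple grouping step: once every error carries the ST span it corresponds to, the number of translators who stumble over the same passage follows directly. The translator identifiers, character offsets and function names below are hypothetical.

```python
from collections import defaultdict

# Each annotated error: (translator id, ST passage as (start, end) offsets, category).
# Identifiers and offsets are made up for illustration.
errors = [
    ("T1", (35, 52), "word sense"),
    ("T2", (35, 52), "other meaning shift"),
    ("T3", (35, 52), "wrong collocation"),
    ("T3", (35, 52), "spelling mistake"),
    ("T2", (80, 91), "punctuation"),
]


def translators_per_passage(errors):
    """Map each ST passage to the set of translators with at least one error there."""
    by_passage = defaultdict(set)
    for translator, passage, _category in errors:
        by_passage[passage].add(translator)
    return dict(by_passage)


def problem_ratio(errors, n_translators):
    """Share of translators who made at least one error in each ST passage."""
    grouped = translators_per_passage(errors)
    return {passage: len(who) / n_translators for passage, who in grouped.items()}


ratios = problem_ratio(errors, n_translators=3)
```

A passage where every translator makes an error (ratio 1.0) is a stronger candidate for a genuinely problematic ST-passage than one where a single translator slips.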
Source text-related error sets
• ST: Changes in the environment that are sweeping the
planet...
• MT: Veranderingen in de omgeving die het vegen van de
planeet tot stand brengen... (wrong word sense) "Changes
in the environment that bring about the brushing of the
planet..."
• PE1: Veranderingen in de omgeving die het evenwicht op
de planeet verstoren... (other type of meaning shift)
"Changes in the environment that disturb the balance on
the planet..."
• PE2: Veranderingen in de omgeving die over de planeet
rasen... (wrong collocation + spelling mistake) "Changes in
the environment that raige over the planet..."
Application example:
impact of MT errors on PE
[Figure: bar charts "Top 10 MT errors" for newspaper articles and for technical texts (axis 0 to 30). Categories shown: word order, missing constituent, verb form, structure, deletion, other meaning shift, logical problem, article, terminology, compound, subj-verb agreement, punctuation, untranslated, word sense, wrong collocation.]
Summary
• Improve error analysis by:
– judging acceptability and adequacy separately
– making error weights depend on translation brief
– having more than one annotator
– introducing consolidation phase
• Improve diagnostic and comparative evaluation by:
– linking errors to ST-passages
– taking number of translators into account
Open questions
• How can we reduce annotation time?
– Ways of automating (part of) the process?
– Limit annotation to subset of errors?
• How to better implement ST-related error
sets?
– Ways of automatically aligning ST, MT, and various
TTs at word-level?
Thank you for listening
For more information, contact:
joke.daems@ugent.be
Suggestions?
Questions?
Quantification
of ST-related error sets
ST
• MT (1)
– MT1 (0.5): wrong word sense (0.5)
– MT2 (0.5)
• PE (1)
– PE1 (0.5): other meaning shift (0.5)
– PE2 (0.5): wrong collocation (0.25), spelling mistake (0.25)
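The weights on this slide are consistent with a simple splitting rule, sketched below under stated assumptions: each condition (MT, PE) carries a total weight of 1, shared equally by its translations, and each translation's share is split equally over the errors it contains for the ST-passage. That MT2 is a second MT output without an error in this passage is an assumption, as are the function and variable names.

```python
from fractions import Fraction


def error_set_weights(error_set):
    """Distribute weight 1 per condition over its translations and their errors.

    error_set: {condition: {translation: [error categories]}}
    Returns a list of (translation, category, weight) triples.
    """
    weighted = []
    for _condition, translations in error_set.items():
        share = Fraction(1, len(translations))  # equal share per translation
        for translation, categories in translations.items():
            for category in categories:
                # A translation's share is split equally over its errors.
                weighted.append((translation, category, share / len(categories)))
    return weighted


error_set = {
    "MT": {"MT1": ["wrong word sense"], "MT2": []},  # MT2 without error: assumption
    "PE": {
        "PE1": ["other meaning shift"],
        "PE2": ["wrong collocation", "spelling mistake"],
    },
}
weights = error_set_weights(error_set)
```

With this rule a single error by one of two post-editors weighs 0.5, while two errors by the same post-editor weigh 0.25 each, matching the values on the slide.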
Inter-annotator agreement
                            Initial agreement   Agreement after consolidation   Correlation between annotators   Agreement on categories
HT&PE acceptability, Exp1   39% (κ=0.32)        67% (κ=0.65)                    r=0.67, n=38, p<0.001            90% (κ=0.89)
HT&PE acceptability, Exp2   50% (κ=0.44)        81% (κ=0.80)                    r=0.95, n=34, p<0.001            89% (κ=0.88)
HT&PE adequacy, Exp1        42% (κ=0.31)        82% (κ=0.79)                    r=0.87, n=38, p<0.001            89% (κ=0.87)
HT&PE adequacy, Exp2        46% (κ=0.30)        94% (κ=0.92)                    r=0.86, n=34, p<0.001            88% (κ=0.83)
MT acceptability, Exp1      53% (κ=0.49)        84% (κ=0.83)                    n/a                              83% (κ=0.81)
MT acceptability, Exp2      79% (κ=0.77)        95% (κ=0.94)                    n/a                              93% (κ=0.93)
MT adequacy, Exp1           57% (κ=0.46)        94% (κ=0.92)                    n/a                              86% (κ=0.79)
MT adequacy, Exp2           51% (κ=0.41)        86% (κ=0.83)                    n/a                              86% (κ=0.82)
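For readers who want to reproduce this kind of figure on their own annotations, here is a minimal sketch of percentage agreement and Cohen's kappa for two annotators. It assumes the κ values above are Cohen's kappa, which the slides do not state explicitly; the toy labels are invented.

```python
from collections import Counter


def percent_agreement(labels_a, labels_b):
    """Fraction of items on which both annotators chose the same label."""
    return sum(a == b for a, b in zip(labels_a, labels_b)) / len(labels_a)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement.

    Undefined (division by zero) when chance agreement equals 1.
    """
    n = len(labels_a)
    p_observed = percent_agreement(labels_a, labels_b)
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement from each annotator's own label distribution.
    p_expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)


# Invented toy data: did each annotator mark an error on items 1..6?
annotator_1 = ["error", "error", "none", "none", "error", "none"]
annotator_2 = ["error", "none", "none", "none", "error", "error"]
agreement = percent_agreement(annotator_1, annotator_2)
kappa = cohen_kappa(annotator_1, annotator_2)
```

Kappa is lower than raw agreement whenever part of the agreement could have arisen by chance, which is why both figures are worth reporting side by side.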