Translating with people who speak only one language
MonoTrans2: A New Human Computation System to Support Monolingual Translation
Chang Hu, Benjamin B. Bederson, Philip Resnik
and Yakov Kronrod
Too Much to Translate
International Children’s Digital Library
– 4,386 books
– 54 languages
– 100K unique visitors/month
– 1,500 volunteer translators
English and Spanish?
Croatian and Japanese?
www.childrenslibrary.org
Uncommon Languages
• Fanm gen tranche pou fè yon pitit nan Delmas 31
• Undergoing children delivery Delmas 31
Munro, Robert. 2010. Crowdsourced translation for emergency response and beyond. NSF Workshop on
crowdsourcing and translation, University of Maryland.
Bilingual Translators are Hard to Find
Machine Translation?
Large volume, cheap, fast
Unreliable quality
Translation with bilingual translators
Translate with the Monolingual Crowd
Wikipedia: 900 translators vs. 1,200,000 contributors
Chang Hu. Collaborative Translation by Monolingual Users, CHI '09
Chang Hu, Benjamin B. Bederson, Philip Resnik. Translation by Iterative Collaboration
between Monolingual Users (MonoTrans), GI '10
Monolingual Crowds Fixing Machine Translation Together
Source sentence: Estoy bien.
Target-side candidates shown: I am fine. / Am fine. / I am been.
Target side
1 Vote on candidates
2 Target-side editing
3 Identify translation errors (I am been. → been.)
Source side
1 Vote on back translation
2 Explain phrase (Estoy bien. → bien.)
3 Paraphrase source sentence (Yo estoy bien. …)
repeat …
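The vote-and-edit loop above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the MonoTrans2 implementation: the `Sentence` class, its method names, and the net-vote scoring rule are all hypothetical, and "I am been." is treated here as the initial machine output purely for the sake of the example.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Sentence:
    """One source sentence with its target-language candidates (hypothetical model)."""
    source: str
    candidates: list = field(default_factory=list)   # candidate translations, in arrival order
    votes: Counter = field(default_factory=Counter)  # candidate -> net votes

    def add_candidate(self, text):
        # A machine translation or a target-side edit becomes a new candidate.
        if text not in self.candidates:
            self.candidates.append(text)

    def vote(self, text, delta=1):
        # delta=+1 for an up-vote, -1 for a down-vote (assumed scoring rule).
        self.votes[text] += delta

    def best(self):
        # Highest net votes wins; on a tie the earlier candidate is kept.
        return max(self.candidates, key=lambda c: self.votes[c])


s = Sentence("Estoy bien.")
s.add_candidate("I am been.")   # assumed initial machine output
s.vote("I am been.", -1)        # target side: vote on candidates
s.add_candidate("I am fine.")   # target side: editing adds a new candidate
s.vote("I am fine.")
s.vote("I am fine.")
best = s.best()
```

Each later round would add candidates (edits, or translations of a paraphrased source) and further votes, then call `best()` again.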
UI
Experiments
Experiment 1 – Children’s Books
• 60 Spanish / 22 German speakers
• ICDL volunteers
• Worked on
– 4 Spanish books => German
– 1 German book => Spanish
• Machine translation engine: Google Translate
Evaluation of MonoTrans2 Output
• 2 German-Spanish bilingual evaluators (not part of MonoTrans2!)
• Fluency and accuracy
• 5-point score
• How much improvement over Google Translate?
Original: Estoy muy bien.
Fluent, not accurate: The weather is good.
Accurate, not fluent: Me is very good.
Results - Fluency
[Bar chart: # of sentences (0 to 150) at each fluency score from 1 (Worst) to 5 (Best), Google vs. MonoTrans2]
Results - Accuracy
[Bar chart: # of sentences (0 to 150) at each accuracy score from 1 (Worst) to 5 (Best), Google vs. MonoTrans2]
Ready for ICDL?
Ready: both bilingual evaluators agree score = 5
Machine translation (Google) only: 10% of sentences ready
MonoTrans2: 68% of sentences ready
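The "ready" criterion can be written down as a small check. This is a minimal sketch that assumes "ready" means every evaluator gave 5 on both fluency and accuracy. The function names and the sample ratings are made up for illustration and are not experiment data.

```python
def is_ready(ratings):
    """ratings: one (fluency, accuracy) pair per evaluator, each on the 1-5 scale."""
    return all(fluency == 5 and accuracy == 5 for fluency, accuracy in ratings)


def percent_ready(sentences):
    """Share of sentences (0-100) that every evaluator scored 5 on both measures."""
    ready = sum(1 for ratings in sentences if is_ready(ratings))
    return 100 * ready / len(sentences)


# Illustrative ratings only: four sentences, two evaluators each.
sample = [
    [(5, 5), (5, 5)],  # ready
    [(5, 4), (5, 5)],  # one evaluator marked accuracy below 5
    [(5, 5), (5, 5)],  # ready
    [(3, 5), (4, 5)],  # accurate but not fluent
]
sample_pct = percent_ready(sample)
```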
[Two bar charts: # of sentences (0 to 150) at each score from 1 to 5, Google vs. MonoTrans2]
Experiment 2 – Haitian Earthquake SMS
• 4 Haitian Creole speakers
• 5 English-speaking students
• 21 other English speakers
• Worked on 408 text messages
Machine translation (Google) only: 25% of sentences
MonoTrans2: 38% of sentences ready
Difficulty: text messages >> children’s books
Sample Results
Haitian Creole: Enfòmasyon sou tranblemen de tè
Ground Truth: Information on the earthquake
Google: Information tranblemen ground
MonoTrans2: Information on the earthquake
Sample Results
Haitian Creole: Bonjou. Mwen ta renmen konnen si imigrasyon ouvè SVP. Mèsi.
Ground Truth: Hello. I would like to know if immigration is open please. Thank you.
Google: Hello. I would like to know if open immigration SVP. Thank you.
MonoTrans2: Hello. I would like to know if immigration is open please. Thank you.
Recap
• MonoTrans2
– No human bilingual knowledge
– Dramatic improvement over machine translation
translatetheworld.org
Take-Away Message
• People + machine > people or machine
• Combining two crowds with different skills
translatetheworld.org
Backup Slides
International Children’s Digital Library
[previously funded by NSF ITR]
www.childrenslibrary.org
• Translation speed
– Professional translators: 2000 words per day
– MonoTrans2: 800 words per day
– Translation firm on the four German/Spanish books: 4 days
– MonoTrans2: 4 days
– Haitian SMS experiment: 284.75 words per minute
UI
Target Side - Identify Errors
Target Side - Edit Translations
Source Side
Source Side – Explain Errors
Ready for ICDL?
Sentences for which both bilingual evaluators agree score = 5
                               Google   MonoTrans2
Sentences with fluency = 5     21       112
Sentences with adequacy = 5    17       118
Sentences where BOTH = 5       17       110
(N=162 sentences worked on in the experiment)
Machine translation only: 10% of sentences ready
MonoTrans2: 68% of sentences ready
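The two "ready" percentages follow from the BOTH = 5 counts and N = 162. A quick arithmetic check, with variable names chosen here for illustration:

```python
N = 162  # sentences worked on in the experiment
both_five = {"Google": 17, "MonoTrans2": 110}  # sentences where BOTH = 5

# Percentage of sentences ready, rounded to the nearest whole percent.
ready_pct = {system: round(100 * count / N) for system, count in both_five.items()}
```

Here `ready_pct` comes out as 10 for Google and 68 for MonoTrans2, matching the figures on the slide.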
Experiment 3
• An alternative use case for crowdsourced translation…
• My family in Carrefour, 24 Cote Plage, 41A needs food and water
• People trapped in Sacred Heart Church, PauP
• General Hospital has less than 24 hrs. supplies
• Undergoing children delivery Delmas 31
Munro, Robert. 2010. Crowdsourced translation for emergency response and beyond. NSF Workshop on
crowdsourcing and translation, University of Maryland.
MonoTrans2 now available at:
www.translatetheworld.org
Fluency Distribution
Adequacy Distribution
Punchline (provisional)
Sentences for which three bilingual evaluators agree score = 5
                               Google     MonoTrans2
Sentences with fluency = 5     1 (1%)     22 (30%)
Sentences with adequacy = 5    11 (14%)   29 (38%)
Sentences where BOTH = 5       0 (0%)     14 (18%)
(N=76 sentences completed)
Straight MT: 0% of sentences preserve all the meaning
MonoTrans2: 38% of sentences preserve all the meaning