Open Access version via Utrecht University Repository

Machine Translation
Jan Odijk
Utrecht. March 7, 2011
Overview
• Lexicons
• Statistical MT
• MT: What is (perhaps) possible
• Conclusions
Lexicons
• “What is not difficult at all:
– large dictionaries with many difficult words and technical terms”
– (Steven Krauwer, previous lecture)
• I disagree
Lexicons
• True if you know the words and terms in
advance
• But new words and terms (usually with different
translations) are created all the time in science,
technology and industry
• So you must have techniques to find (identify,
extract) such new words/terms and their translations
as automatically as possible
– to tune the lexicons to specific domains
– to continuously extend them
Lexicons
• Many terms are multiword expressions
– With some internal variation
– Not always contiguous
– This requires special treatment in the lexicon
and in the grammar
• House* of representatives (Chambre* des représentants)
• Patatas* fritas* (French fries*)
• Chômeur* (Unemployed person*)
Lexicons
• Modern formal grammars depend heavily on lexical properties
• They have very general rule schemata, which are filled in by properties of lexical items
– e.g. a word of category X and its complements form an X phrase (XP)
– e.g. mass nouns can occur without an article in the singular;
– count nouns can occur with een in the singular
Lexicons
• Properties of lexical items
– E.g. which complements a verb takes
• E.g. a direct object noun phrase, also an indirect
object, predicate, prepositional complement, etc
• E.g. an infinitival complement, with or without te, with or without om, with or without a subject, etc.
– With which preposition it can be combined
• Kijken naar, zorgen voor, houden van
– Nouns: mass or count?
Lexicons
• Traditional dictionaries do not contain such
information (or very rarely)
• And what is available is not represented in a
formal manner
• So computers cannot use this information
directly
Lexicons
• It is very difficult to assign such properties
correctly in a systematic manner
– It requires very good knowledge of syntax
– Often the phenomena are not understood well
enough
– Words often have multiple options with
different meanings and translations
– Try it yourself for lopen; innemen
– Count/Mass: vis; wijn; bestek; meubilair
Lexicons
• It is very difficult to assign such properties
correctly in a systematic manner (Cont.)
– Lexicographers are not trained to assign such
properties
– It must be done for many words
– Consistency within one person is hard to
achieve
– Consistency among multiple people is even harder
Lexicon: Semantics
• Selection restrictions with type system to
approach modeling of world knowledge
– Requires sophisticated syntactic analysis
• Boek (book): info (legible)
• Uur (hour): time unit → duration
• Vergadering (meeting): event → duration
• Lezen (read): subject = human; object = info (legible)
• A durational adjunct must be a duration phrase
Lexicon: Semantics
• Selection restrictions
– Pak (1) (suit): clothes
– Pak (2) (package): entity
– Dragen (1) (wear): subject = animate; object = clothes
– Dragen (2) (carry): subject = animate; object = entity
– Schoen (shoe): clothes
– entity > clothes
– identity is preferred over subsumption
– a homogeneous object is preferred over a heterogeneous one
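The selection-restriction machinery above can be sketched in a few lines of code. This is an illustrative sketch only: the type hierarchy, the sense inventory, and the numeric scoring are simplifications assumed here, not the lexicon formalism of any actual system.

```python
# Tiny type hierarchy: "entity" subsumes "clothes" (entity > clothes).
PARENT = {"clothes": "entity", "entity": None}

def subsumes(general, specific):
    """True if `general` equals `specific` or is an ancestor of it."""
    while specific is not None:
        if specific == general:
            return True
        specific = PARENT[specific]
    return False

# Verb senses: English gloss plus the type selected for the object.
VERBS = {"dragen": [("wear", "clothes"), ("carry", "entity")]}
# Noun senses: English gloss plus the noun's semantic type.
NOUNS = {"pak": [("suit", "clothes"), ("package", "entity")],
         "schoen": [("shoe", "clothes")]}

def readings(verb, noun):
    """All verb/noun sense pairs with the best score.

    Identity of types (score 2) is preferred over subsumption (score 1);
    pairs violating the selection restriction are excluded entirely.
    """
    scored = []
    for vgloss, selects in VERBS[verb]:
        for ngloss, ntype in NOUNS[noun]:
            if selects == ntype:
                scored.append((2, (vgloss, ngloss)))
            elif subsumes(selects, ntype):
                scored.append((1, (vgloss, ngloss)))
    best = max(score for score, _ in scored)
    return {pair for score, pair in scored if score == best}

# "Hij draagt een pak": both identity readings survive the preference.
print(readings("dragen", "pak") == {("wear", "suit"), ("carry", "package")})   # True
# "schoen" is unambiguously clothes, so only "wear" gets an identity match.
print(readings("dragen", "schoen") == {("wear", "shoe")})                      # True
```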
Lexicon: Semantics
• Selection restrictions
– Hij draagt een bruin pak
• He wears a brown suit (1: clothes = clothes)
• He carries a brown package (1: entity = entity)
• He carries a brown suit (2: entity > clothes)
• *He wears a brown package (clothes ¬> entity)
– Hij draagt een bruin pak en zwarte schoenen
• He wears a brown suit and black shoes (1: homogeneous and clothes = clothes)
• He carries a brown suit and black shoes (2: homogeneous but entity > clothes)
• He carries a brown package and black shoes (2: inhomogeneous but entity = entity)
• *He wears a brown package and black shoes (clothes ¬> entity)
Statistical MT
• Derives an MT system automatically
– from statistics taken from
• aligned parallel corpora (→ translation model)
• monolingual target-language corpora (→ language model)
• Worked on since the early 1990s
Statistical MT
• Plus:
– No or very limited grammar development
– Includes language and world knowledge automatically
(but implicitly)
– Based on actually occurring data
– Currently many experimental and commercial systems
• Minus:
– Requires large aligned parallel corpora
– Unclear how much linguistics will be needed anyway
– Probably restricted to very limited domains only
Statistical MT
• Google Translate (statistical MT)
• Hij draagt een pak. → √ He wears a suit.
• Hij draagt schoenen. → √ He wears shoes.
• Hij draagt bruine schoenen en een pak.
– → √ He wears a suit and brown shoes. (!!)
• Hij draagt het pakket → √ He carries the package
• Hij heeft een pak aan. → *He has a suit.
• Voert uw bedrijf sloten uit?
– → *Does your company locks out?
Hybrid MT
• Euromatrix and
– successor project EuromatrixPlus
– …
– efficient inclusion of linguistic knowledge into statistical machine translation
– the development and testing of hybrid architectures for the integration of rule-based and statistical approaches
Hybrid MT
• META-NET 2010-2013 (EU-funding)
– Building a community with shared vision and strategic
research agenda
– Building META-SHARE, an open resource exchange
facility
– Building bridges to neighbouring technology fields
• Bringing more Semantics into Translation
• Optimising the Division of Labour in Hybrid MT
• Exploiting the Context for Translation
• Empirical Base for Machine Translation
Hybrid MT
• PACO-MT 2008-2011
• Investigates hybrid approach to MT
– Rule-based and statistical
– Uses existing parser for source language
analysis
– Uses statistical n-gram language models for
generation
– Uses statistical approach to transfer
MT: What is (perhaps) possible
• Cross-Language Information Retrieval
• Low-quality MT for gist extraction
• MT and speech technology
• Controlled language
• Limited domain
• Interaction with the author
• Combinations of the above
• Computer-aided translation
MT: What is (perhaps) possible
• Cross-Language Information Retrieval
(CLIR)
– Input query: in own language
– Input query translated into target languages
– Search in target-language documents
– Results in target language
• Translation of individual words only
• Growing need (growing multilingual Web)
• No perfect translation required
MT: What is (perhaps) possible
• Low quality MT for Gist extraction
• Low quality but still useful
• If interesting, a high-quality human translation can be requested (has to be paid for)
MT: What is (perhaps) possible
• CLIR
– Fills a growing need in the market
– Is technically feasible
– Creates need for translation of found
documents
• Solved partially by low quality MT
• Potentially creates need for more human translation
• Stimulates (funds) research into more sophisticated
MT
MT: What is (perhaps) possible
• Combine MT (statistical or rule-based) with OCR
technology
– Take a picture of a text with your phone
– The text is OCR-ed
– The text is translated
– (usually a short and simple text)
• Linguatec Shoot & Translate
• Word Lens
MT: What is (perhaps) possible
• Combine MT (statistical or rule-based) with
Speech technology
– Complicates the problem on the one hand but
– Speech technology (ASR) is currently limited to very
limited domains (makes MT simpler)
– Many useful applications for speech technology
currently in the market
• Directory assistance
• Tourist information
• Tourist communication
• Call centers
• Navigation
• Hotel reservations
– Some will profit from built-in automatic translation
MT: What is (perhaps) possible
• Large EC FP6 project TC-STAR (2004-)
– (http://www.tc-star.org/)
– Research into improved speech technology
(ASR and TTS)
– Research into statistical MT
– Research in combining both (speech-to-speech
translation)
– In a few selected limited domains
MT: What is (perhaps) possible
• Commercial Speech2Speech Translation
• Jibbigo
– http://www.jibbigo.com
• Speech-to-speech translation (iPhone, Android)
• http://www.phonedog.com/2009/10/30/iphone-appjibbigo-speech-translator
• Talk to Me (Android phones)
MT: What is (perhaps) possible
• Controlled Language
– Authoring System limits vocabulary and syntax
of document authors
– Often desirable in companies to get consistent
documentation (e.g. aircraft maintenance
manuals)
• AECMA Simplified English
• GIFAS Rationalized French
– Makes MT easier (language well-defined)
MT: What is (perhaps) possible
• Limited Domain
– Translation of
• Weather reports (TAUM-Meteo, Canada)
• Avalanche warnings (Switzerland)
– Fast adaptation to domain/company-specific
vocabulary and terminology
MT: What is (perhaps) possible
• Interaction with author
– No fully automatic translation
– Document author resolves
• Ambiguities unresolved by the system
• In a dialogue between the author and the system in
the source language
• Approach taken in Rosetta project (Philips)
• Will only work if
– the number of unresolved ambiguities is low
– the questions to resolve an ambiguity are clear
MT: What is (perhaps) possible
• Hij droeg een bruin pak
– Wat bedoelt u met “pak”? (What do you mean by “pak”?)
• (1) kostuum (suit)
• (2) pakket (package)
• Hij droeg een bruin pak
– Wat bedoelt u met “dragen (droeg)”? (What do you mean by “dragen (droeg)”?)
• (1) aan of op hebben (kleding) (to have on: clothing)
• (2) bij zich hebben (bijv. in de hand) (to have with you, e.g. in the hand)
MT: What is (perhaps) possible
• Combinations of the above
MT: What is (perhaps) possible
• Computer-aided translation
– For end-users
– For professional translators/localization industry
• Limited functionality
– Specific terminology
• Bootstrap translation automatically
– Human revision and correction (Post-edit)
• Only if
– MT Quality is such that it reduces effort
– The system is fully integrated in the workflow system
Conclusions
• MT is really very difficult!
• Even making a lexicon for an MT system is very
difficult (and a lot of work)
• Statistical MT yields practical, relatively quick-to-produce systems (but of low quality)
– provided you have huge amounts of data
• Focus of research is on hybrid systems (mixed statistics-based/knowledge-based) (PACO-MT, META-NET, …)
Conclusions
• Several constrained versions do yield usable
technology with state-of-the-art MT
• In some cases it even potentially creates additional needs for MT and human translation
– Try it yourself for lopen; innemen
– Count/Mass: vis; wijn; bestek; meubilair
• http://www.vandale.nl/
Do not go beyond this slide
MT Evaluation
• Evaluation depends on purpose of MT and how it
is used
– application, domain, controlled language
• Many aspects can be evaluated
– functionality, efficiency, usability, reliability,
maintainability, portability
– translation quality
– embedding in work flow
• post-editing options/tools
MT Evaluation
• Focus here:
– does the system yield good translations
according to human judgement
– in the context of developing a system
• Again, many aspects:
– fidelity (how close), correctness, adequacy,
informativeness, intelligibility, fluency
– and many ways to measure these aspects
MT Evaluation
• Test suite
– Reference =
• list of (carefully selected) sentences
• with their translations (ordered by score)
– translations judged correct by a human (usually the developer)
– upon every update of the system, the output of the new system is compared to the reference
• if different: the system has to be adapted, or the reference has to be adapted
• Advantages
– focus on specific translation problems possible
– excellent for regression testing
– manual judgement needed only once for each new output
• other comparisons are automatic
• Disadvantages
– not really independent
– particularly suited for purely rule-based systems
– human judgement needed if the output differs from the reference
MT Evaluation
• Comparison against
– translation corpus
– independently created by human translators
– possibly multiple equivalently correct translations of a sentence
• Advantages
– truly independent
– also suited for data-driven systems
• Disadvantages
– requires human judgement (every time there is a system update)
• high effort by highly skilled people, high costs, requires a lot of time
– human judgement is not easy (unless there is a perfect match)
• Useful
– for a one-time evaluation of a stable system
– not for evaluation during development
MT Evaluation
• Edit-Distance (Word Accuracy)
– metric to determine closeness of translations
automatically
– the smallest number of edit operations needed to turn the translated sentence into the reference sentence
– Alshawi et al. 1998
MT Evaluation
• WA = 1 − ((d + s + i) / max(r, c))
• d = number of deletions
• s = number of substitutions
• i = number of insertions
• r = reference sentence length
• c = candidate sentence length
• easy to calculate using the Levenshtein distance algorithm (dynamic programming)
• various extensions have been proposed
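The WA formula can be computed directly with the Levenshtein algorithm mentioned above, applied at the word level: d + s + i is exactly the word-level edit distance. A minimal sketch, with made-up example sentences:

```python
def edit_distance(ref, cand):
    """Word-level Levenshtein distance (d + s + i) via dynamic programming."""
    m, n = len(ref), len(cand)
    # dp[i][j] = distance between ref[:i] and cand[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == cand[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution (or match)
    return dp[m][n]

def word_accuracy(ref, cand):
    """WA = 1 - ((d + s + i) / max(r, c))."""
    r, c = ref.split(), cand.split()
    return 1 - edit_distance(r, c) / max(len(r), len(c))

# One word missing from the candidate: distance 1, r = 5, so WA = 1 - 1/5.
print(word_accuracy("he wears a brown suit", "he wears a suit"))  # 0.8
```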
MT Evaluation
• Advantages
– fully automatic given a reference set
• Disadvantages
– penalizes candidates if a synonym is used
– penalizes swaps of words and blocks of words too heavily
MT Evaluation
• BLEU (method to automate MT Evaluation)
– the closer a machine translation is to a
professional human translation, the better it is
– BiLingual Evaluation Understudy
• Required:
– corpus of good quality human reference
translations
– a “closeness” metric
MT Evaluation
• Two candidate translations from Chinese
source
– C1: It is a guide to action which ensures that the
military always obeys the commands of the
party
– C2: It is to insure the troops forever hearing the
activity guidebook that party direct
• Intuitively: C1 is better than C2
MT Evaluation
• Three reference translations
– R1: It is a guide to action that ensures that the
military will forever heed Party commands
– R2: It is the guiding principle which guarantees
the military forces always being under the
command of the Party
– R3: It is the practical guide for the army always
to heed the directions of the party
MT Evaluation
• Basic idea:
– a good candidate translation shares many words
and phrases with reference translations
– comparing n-gram matches can be used to
rank candidate translations
• n-gram: a sequence of n word occurrences
– in BLEU n = 1, 2, 3, 4
– 1-grams give a measure of adequacy
– longer n-grams give a measure of fluency
MT Evaluation
• For unigrams:
– count the number of matching unigrams
• in all references
– divide by the total number of unigrams (in the
candidate sentence)
MT Evaluation
• Problem
– C1: the the the the the the the (=7/7=1)
– R1: the cat is on the mat
• Solution:
– clip the matching count (7) by the maximum reference count (2) → 2 (Countclip)
– → modified unigram precision = 2/7 = 0.29
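The clipping step can be sketched in a few lines; the example reproduces the 2/7 ≈ 0.29 case above. This is an illustrative sketch of modified n-gram precision for one sentence, not the full BLEU computation.

```python
from collections import Counter

def modified_ngram_precision(candidate, references, n=1):
    """Clip each candidate n-gram count by its maximum count in any reference."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand_counts = ngrams(candidate.split())
    # For each n-gram, the maximum number of times it occurs in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref.split()).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())

p = modified_ngram_precision("the the the the the the the",
                             ["the cat is on the mat"])
print(round(p, 2))  # 0.29 -- "the" matches 7 times but is clipped to 2
```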
MT Evaluation
• Example (unigrams)
– C1: It is a guide to action which ensures that the
military always obeys the commands of the party
(17/18=0.94)
– R1: It is a guide to action that ensures that the military
will forever heed Party commands
– R2: It is the guiding principle which guarantees the
military forces always being under the command of the
Party
– R3: It is the practical guide for the army always to heed
the directions of the party
MT Evaluation
• Example (unigrams)
– C2: It is to insure the troops forever hearing the activity
guidebook that party direct (8/14=0.57)
– R1: It is a guide to action that ensures that the military
will forever heed Party commands
– R2: It is the guiding principle which guarantees the
military forces always being under the command of the
Party
– R3: It is the practical guide for the army always to heed
the directions of the party
MT Evaluation
• Example (bigrams)
– C1: It is a guide to action which ensures that the
military always obeys the commands of the party
(10/17=0.59)
– R1: It is a guide to action that ensures that the military
will forever heed Party commands
– R2: It is the guiding principle which guarantees the
military forces always being under the command of the
Party
– R3: It is the practical guide for the army always to heed
the directions of the party
MT Evaluation
• Example (bigrams)
– C2: It is to insure the troops forever hearing the activity
guidebook that party direct (1/13=0.08)
– R1: It is a guide to action that ensures that the military
will forever heed Party commands
– R2: It is the guiding principle which guarantees the
military forces always being under the command of the
Party
– R3: It is the practical guide for the army always to heed
the directions of the party
MT Evaluation
• Extend to a full multi-sentence corpus
• compute n-gram matches sentence by sentence
• sum the clipped n-gram counts for all candidates
• divide by the total number of n-grams in the test corpus
• pn = ∑C ∈ {Candidates} ∑n-gram ∈ C Countclip(n-gram)
– divided by
– ∑C′ ∈ {Candidates} ∑n-gram′ ∈ C′ Count(n-gram′)
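The corpus-level pn differs from the per-sentence version only in that clipped counts are summed over all candidate sentences before dividing. A self-contained sketch (the two-sentence test corpus at the bottom is a made-up illustration):

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_precision(candidates, references, n):
    """p_n: summed clipped n-gram counts over all candidates, divided by the
    total number of candidate n-grams in the corpus.

    `references[i]` is the list of reference translations for candidate i."""
    clipped_total, total = 0, 0
    for cand, refs in zip(candidates, references):
        cand_counts = ngram_counts(cand.split(), n)
        max_ref = Counter()
        for ref in refs:
            for gram, cnt in ngram_counts(ref.split(), n).items():
                max_ref[gram] = max(max_ref[gram], cnt)
        clipped_total += sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        total += sum(cand_counts.values())
    return clipped_total / total

# Two candidate sentences against the same reference: (2 + 2) / (7 + 2) = 4/9.
p1 = corpus_precision(["the the the the the the the", "the cat"],
                      [["the cat is on the mat"], ["the cat is on the mat"]],
                      n=1)
print(round(p1, 3))
```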
MT Evaluation
• Combining n-gram precision scores
• a weighted linear average works reasonably well
– ∑Nn=1 wn pn
• but: n-gram precision decays exponentially with n (so take logs to compensate)
– exp(∑Nn=1 wn log pn)
• weights in BLEU: wn = 1/N
MT Evaluation
• BLEU is a precision measure
– #(C ∩ R) / #C
• Recall is difficult to define because of multiple reference translations
– e.g. #(C ∩ Rs) / #Rs
• where Rs = ∪i Ri
– will not work
MT Evaluation
• C1: I always invariably perpetually do
• C2: I always do
• R1: I always do
• R2: I invariably do
• R3: I perpetually do
• Recall of C1 over R1–3 is better than that of C2
• but C2 is the better translation
MT Evaluation
• But without Recall:
– C1: of the
– compared with R1–3 as before
– modified unigram precision = 2/2
– modified bigram precision = 1/1
– which is the wrong result
MT Evaluation
• Length
– n-gram precision penalizes translations longer
than the reference
– but not translations shorter than the reference
– → add a Brevity Penalty (BP)
MT Evaluation
• bi = best match length = the reference sentence length closest to candidate sentence i's length (e.g. r: 12, 15, 17; c: 12 → 12)
• r = test corpus effective reference length = ∑i bi
• c = total length of the candidate translation corpus
MT Evaluation
• BP =
– computed over the corpus
– not sentence by sentence and averaged
– 1 if c > r
– e(1−r/c) if c ≤ r
• BLEU = BP · exp(∑Nn=1 wn log pn)
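Putting the pieces together: the brevity penalty and the log-averaged precisions combine as in the formula above. A minimal sketch with uniform weights wn = 1/N; the numeric inputs in the usage lines are made-up illustrations, not results from any real system.

```python
import math

def brevity_penalty(c, r):
    """BP over the whole corpus: 1 if c > r, else e^(1 - r/c)."""
    return 1.0 if c > r else math.exp(1 - r / c)

def bleu(precisions, c, r):
    """BLEU = BP * exp(sum_n w_n log p_n) with uniform weights w_n = 1/N."""
    if any(p == 0 for p in precisions):
        return 0.0  # log(0) is undefined; without smoothing the score collapses to 0
    log_avg = sum(math.log(p) for p in precisions) / len(precisions)
    return brevity_penalty(c, r) * math.exp(log_avg)

# Candidate corpus shorter than the effective reference length: penalized.
print(brevity_penalty(c=4, r=5))   # e^(1 - 5/4), roughly 0.78
# Illustrative p_1..p_4 values; candidate longer than reference, so BP = 1.
print(round(bleu([0.94, 0.59, 0.41, 0.29], c=18, r=16), 3))
```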
MT Evaluation
• BLEU:
– claim: BLEU closely matches human judgement
• when averaged over a test corpus
• not necessarily on individual sentences
• shown extensively in Papineni et al. 2001
– → multiple reference translations are desirable
• to cancel out the translation styles of individual translators
• (e.g. East Asian economy vs. economy of East Asia)
MT Evaluation
• Variants on BLEU
– NIST
• http://www.nist.gov/speech/tests/mt/doc/ngramstudy.pdf
• different weights
• different BP
– ROUGE (Lin and Hovy 2003)
• for text summarization
• Recall-Oriented Understudy for Gisting Evaluation
MT Evaluation
• Main Advantage of BLEU
– automatic evaluation
• good for use during development
• particularly useful for data-based systems
• Disadvantage
– defined for a whole test corpus
– not for individual sentences
– only measures the difference with the reference