official module spec.

Module Title:
Statistical Machine Translation
Module Code:
CA4-SMT
School:
Computing
Module Coordinator:
Prof. Andy Way
Level:
4
Office Number: L2.01C
Credit Rating: 5
Pre-requisite(s): None
Co-requisite(s): None
Module Aims:
The goal of this module is to provide students with thorough knowledge of the state-of-the-art
model of machine translation (MT). MT is becoming ever more pervasive: on the Web, used
in-house by leading multinational companies and smaller SMEs, and by voluntary
organisations seeking to improve the availability of information in an ever widening range of
the world's languages. Students will be equipped with the background theoretical statistical
modelling underpinning the various components in a statistical MT system (word and phrase
alignment, language modelling, parameter estimation, decoding, and evaluation), as well as
intensive hands-on experience of building their own MT systems using open-source tools.
The specific objectives of the module are:
• Understand how statistical MT (SMT) systems came to dominate the field of translation;
• Understand the importance of the availability of good quality, representative bilingual
aligned corpora for such models;
• Understand the statistical modelling underpinning state-of-the-art SMT systems;
• Build the various components of an SMT system in an open-source framework;
• Evaluate statistical (and other) MT systems automatically;
• Understand the issues involved in improving the state-of-the-art model.
Learning Outcomes:
The student will have a sound understanding of the key issues surrounding the area of
statistical machine translation. In particular, they will have a good understanding of the
statistical models underpinning each of the components of an SMT system, and be able to
apply that knowledge in building and critically evaluating their own MT systems.
After successfully completing this course, students will be able to:
• explain how SMT came to dominate the translation landscape, and
summarise competing models of translation;
• explain and apply the statistical models underpinning SMT;
• build each of the components in an SMT system: word and phrase aligners, language
models, decoders;
• efficiently tune each of these components using parameter estimation models;
• evaluate the translations output by statistical (and other) MT systems automatically;
• explain and summarise the factors involved in improving the state-of-the-art.
Indicative Time Allowances:
Hours
Lectures: 12
Tutorials: 0
Laboratories: 24
Seminars: 0
Independent Learning Time: 39
TOTAL: 75
Note: Assume that a module load represents approximately 75 hours’ work, which includes all
teaching, in-course assignments, laboratory work or other specialised training and an
estimated private learning time associated with the module.
Indicative Syllabus:
• The history of MT
o Rule-Based MT
o Example-Based MT
o Statistical MT
• The statistical models underpinning SMT
o Joint probability distributions
o Conditional probability distributions
o Interpolation
o Entropy
o Mutual Information
• Training an SMT system
o word alignment
o phrase alignment
o language modelling, decoding, and evaluation
• Tuning an SMT system
o parameter estimation
• Running an SMT system
o Decoding
• Evaluating an SMT system
o Automatic Evaluation
o Human Evaluation
o Statistical Significance Testing
• Improving an SMT system
o Hierarchical Models
o Tree-Based Models
o Modelling Source-Language Context
Assessment:
Continuous Assessment: 50%
End of module exam: 50%
Reading List:
• Statistical Machine Translation, P. Koehn, CUP (2010)
Supplementary:
• Learning Statistical Translation, C. Goutte et al. (eds.), MIT Press (2009)
• Speech & Language Processing, D. Jurafsky & J. Martin, Pearson International (2009)
• Foundations of Statistical NLP, C. Manning & H. Schuetze, MIT Press (1999)
The course will be accompanied by a set of journal articles, websites and other on-line resources
to ensure that teaching materials are up to date with current technology and trends.
Programme or List of Programmes on which this module will be delivered:
CASE: BSc in Computer Applications (Sft.Eng.)
ECSA: Study Abroad (Engineering & Computing)
ECSAO: Study Abroad (Engineering & Computing)
Programme Reference Number
Date of Last Revision: 8th April 2010