Module Title: Statistical Machine Translation
Module Code: CA4-SMT
School: Computing
Module Coordinator: Prof. Andy Way
Level: 4
Office Number: L2.01C
Credit Rating: 5
Pre-requisite(s): None
Co-requisite(s): None

Module Aims:
The goal of this module is to provide students with a thorough knowledge of the state-of-the-art model of machine translation (MT). MT is becoming ever more pervasive: on the Web, in-house at leading multinational companies and SMEs, and in voluntary organisations seeking to improve the availability of information in an ever-widening range of the world's languages. Students will be equipped with the theoretical statistical modelling underpinning the various components of a statistical MT system (word and phrase alignment, language modelling, parameter estimation, decoding, and evaluation), and will gain intensive hands-on experience of building their own MT systems using open-source tools.

The specific objectives of the module are:
  Understand how statistical MT (SMT) systems came to dominate the field of translation;
  Understand the importance of the availability of good-quality, representative bilingual aligned corpora for such models;
  Understand the statistical modelling underpinning state-of-the-art SMT systems;
  Build the various components of an SMT system in an open-source framework;
  Evaluate statistical (and other) MT systems automatically;
  Understand the issues involved in improving the state-of-the-art model.

Learning Outcomes:
The student will have a sound understanding of the key issues in statistical machine translation. In particular, they will have a good understanding of the statistical models underpinning each of the components of an SMT system, and be able to apply that knowledge in building and critically evaluating their own MT systems.
After successfully completing this course, students will be able to:
  explain how SMT came to dominate the translation landscape, and summarise competing models of translation;
  explain and apply the statistical models underpinning SMT;
  build each of the components in an SMT system: word and phrase aligners, language models, decoders;
  efficiently tune each of these components using parameter estimation models;
  evaluate the translations output by statistical (and other) MT systems automatically;
  explain and summarise the factors involved in improving the state-of-the-art.

Indicative Time Allowances (Hours):
  Lectures: 12
  Tutorials: 0
  Laboratories: 24
  Seminars: 0
  Independent Learning Time: 39
  TOTAL: 75
Note: Assume that a module load represents approximately 75 hours’ work, which includes all teaching, in-course assignments, laboratory work or other specialised training, and an estimated private learning time associated with the module.

Indicative Syllabus:
The history of MT
  o Rule-Based MT
  o Example-Based MT
  o Statistical MT
The statistical models underpinning SMT
  o Joint probability distributions
  o Conditional probability distributions
  o Interpolation
  o Entropy
  o Mutual Information
Training an SMT system
  o Word alignment
  o Phrase alignment
  o Language modelling, decoding, and evaluation
Tuning an SMT system
  o Parameter estimation
Running an SMT system
  o Decoding
Evaluating an SMT system
  o Automatic Evaluation
  o Human Evaluation
  o Statistical Significance Testing
Improving an SMT system
  o Hierarchical Models
  o Tree-Based Models
  o Modelling Source-Language Context

Assessment:
  Continuous Assessment: 50%
  End of module exam: 50%

Reading List:
  Statistical Machine Translation, P. Koehn, CUP (2010)
Supplementary:
  Learning Statistical Translation, C. Goutte et al. (eds.), MIT Press (2009)
  Speech & Language Processing, D. Jurafsky & J. Martin, Pearson International (2009)
  Foundations of Statistical NLP, C. Manning & H. Schuetze, MIT Press (1999)
The course will be accompanied by a set of journal articles, websites and other on-line resources to ensure that teaching materials are up to date with current technology and trends.

Programme or List of Programmes on which this module will be delivered:
  CASE BSc in Computer Applications (Sft.Eng.)
  ECSA Study Abroad (Engineering & Computing)
  ECSAO Study Abroad (Engineering & Computing)
Programme Reference Number
Date of Last Revision: 8th April 2010