
Machine Translation
MT – Research Landscape
Stephan Vogel
Spring Semester 2011
Overview
 Some influential projects
 Open source toolkits
 Conferences
 MT evaluations
 Literature and general resources
 Disclaimer: all of this is incomplete, subjective, and biased!
11-711 Machine Translation
MT Projects
 Verbmobil
 Large speech translation project in Germany
 Different translation paradigms
 Success story for SMT
 TIDES
 DARPA funded US MT project
 SMT widely used, small and large data track evaluations
 Chinese-English and Arabic-English
 GALE
 DARPA funded
 Follow-up to TIDES
 TransTac
 DARPA funded
 Speech-to-Speech Translation
 Targeted towards force protection
MT Projects (cont.)
 TC-Star
 European Project with partners from different universities
 Technology and Corpora for Speech-to-Speech Translation
 http://tcstar.org/
 EuroMatrix
 2006-2009, EuroMatrixPlus 2009-2012
 Translate all European languages
 Offspring: WMT evaluations, MT Marathon
 euromatrix.net
 Quaero
 French-German project
 Kind of TC-Star follow-up
 http://www.quaero.org/modules/movie/scenes/home/index.php?FUSEBOX_LANG=2
Open Source Toolkits: Word Alignment
 Game Changer
 Lower barrier to enter the field
 Transparency
 Word Alignment
 GIZA++
 Started out at JHU workshop, subsequently extended by Franz Josef Och (at RWTH and ISI)
 Most widely used alignment toolkit
 mGIZA++
 Multi-threaded/multi-core extension of GIZA++
 By Qin Gao: http://geek.kyloo.net/software/doku.php/mgiza:overview
 Berkeley Aligner
 Word alignment via quadratic assignment
 http://code.google.com/p/berkeleyaligner/
 PostCAT (Posterior Constrained Alignment Toolkit)
 http://www.seas.upenn.edu/~strctlrn/CAT/CAT.html
Open Source Toolkits: Word Alignment (cont.)
 Word Alignment tools
 Alignment Set
 Set of tools to manipulate and display alignments
 From TALP research group
 http://www.talp.upc.edu/talp/index.php/en/resources/tools/alingment-set
Open Source Toolkits: Decoders
 Decoders
 Moses (Edinburgh): phrase-based and recently also hierarchical
 Joshua (JHU): hiero reimplementation
 sourceforge.net/projects/joshua
 Jane (RWTH Aachen): hierarchical
 http://www-i6.informatik.rwth-aachen.de/web/Software/index.html
 cdec (UMD -> CMU): hierarchical and phrase-based
 Marie (TALP): ngram-based (kinda phrase-based)
 www.talp.upc.edu/talp/index.php/en/resources/tools/marie
 Apertium (University of Alicante): rule-based
 Phrasal (Stanford): phrase-based
 http://www-nlp.stanford.edu/wiki/Software/Phrasal
Open Source Toolkits: LMs
 SRILM
 Most widely known and used LM toolkit
 SALM
 Written by Joy Ying Zhang (while at LTI)
 http://projectile.sv.cmu.edu/research/public/tools/salm/salm.htm
 IRST-LM
 http://sourceforge.net/projects/irstlm/
 KenLM
 Smaller footprint than SRILM
 Written by Kenneth Heafield (LTI PhD student)
 http://kheafield.com/code/kenlm/
Conferences
 General CL conferences
 ACL
 HLT
 EMNLP
 Coling
 IJCNLP: Int. Joint Conf on NLP
 LREC: Language Resources and Evaluation
 RANLP: Recent Advances in NLP
 SALTMIL: Speech and Language Technology for Minority Languages
 Specific MT conferences
 MT Summit (every 2 years)
 AMTA (US)
 EAMT (Europe)
 TMI
 Translating and the Computer (organized by Aslib)
 IWSLT (organized by C-Star consortium)
 …
 MT Workshops
 WMT: Workshop on Machine Translation
 SSST: Syntax, Semantics, and Structure in SMT
 …
Evaluations
 It all started with TIDES
 Comparative evaluations
 Defined training and test data
 Automatic evaluation metrics (NIST mteval, BLEU)
 Organized by NIST
 NIST Open MT Evaluations
 Continuation and expansion of TIDES MT evaluations
 Chinese-English, Arabic-English, Urdu-English
 Restricted and unrestricted tracks
 Originally every year, now moving to a 2-year cycle
 http://www.itl.nist.gov/iad/mig/tests/mt/2009/
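To make the idea of an automatic metric concrete, here is a minimal sketch of a BLEU-style score for a single segment. It is an illustration only, not the NIST mteval implementation: the function names `ngrams` and `bleu` are made up for this example, tokenization is plain whitespace splitting, and no smoothing is applied. Official BLEU is computed over a whole test set rather than per segment.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(hypothesis, references, max_n=4):
    """Simplified single-segment BLEU (hypothetical helper, not mteval).

    Geometric mean of clipped n-gram precisions times a brevity penalty.
    Without smoothing, any zero precision gives a score of 0.0.
    """
    hyp = hypothesis.split()
    refs = [r.split() for r in references]
    if not hyp:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp, n)
        # Clip each hypothesis n-gram by its maximum count in any reference.
        max_ref_counts = Counter()
        for ref in refs:
            for gram, count in ngrams(ref, n).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        matched = sum(min(c, max_ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        if matched == 0 or total == 0:
            return 0.0
        log_precision_sum += math.log(matched / total)
    # Brevity penalty against the reference length closest to the hypothesis.
    ref_len = min((len(r) for r in refs), key=lambda l: (abs(l - len(hyp)), l))
    bp = 1.0 if len(hyp) >= ref_len else math.exp(1 - ref_len / len(hyp))
    return bp * math.exp(log_precision_sum / max_n)
```

Called as `bleu("the cat sat on the mat", ["the cat sat on the mat"])`, the sketch returns 1.0 for an exact match; a hypothesis that shares no 4-gram with any reference scores 0.0, which is why real evaluations aggregate counts over the full test set.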
Evaluations (cont.)
 WMT Evaluations
 Organized in connection with EuroMatrix
 Based on Europarl corpora
 Many languages
 Automatic and manual evaluation
 http://www.statmt.org/wmt11/translation-task.html
 IWSLT Evaluations
 Spoken language
 Languages vary: Chinese, Japanese, Arabic, Italian, …
 Speech 1-best and lattices provided
 Based on the (small) BTEC corpus (Basic Travel Expression Corpus)
 Last time also lecture translation
 http://iwslt2010.fbk.eu/node/15
Evaluations (cont.)
 Specific projects have evaluations
 GALE
 Arabic-English and Chinese-English
 Broadcast news and broadcast conversations, newswire and blogs
 Human evaluation (HTER)
 Go/No-Go
 Quaero
 European languages, also Arabic-French
 This year the WMT evaluation was used as the Quaero evaluation
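HTER measures how much a human has to edit the MT output to make it correct: the number of edits divided by the length of the post-edited translation. The sketch below approximates this with plain word-level edit distance. This is a simplification, since TER also counts block shifts as single edits, and the names `word_edit_distance` and `edit_rate` are chosen for this example only.

```python
def word_edit_distance(hyp, ref):
    """Word-level Levenshtein distance (insert, delete, substitute)."""
    previous = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        current = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            current.append(min(previous[j] + 1,          # delete a hypothesis word
                               current[j - 1] + 1,       # insert a reference word
                               previous[j - 1] + cost))  # match or substitute
        previous = current
    return previous[-1]


def edit_rate(mt_output, post_edited):
    """Edits per post-edited word (shift-free approximation of HTER)."""
    hyp, ref = mt_output.split(), post_edited.split()
    if not ref:
        raise ValueError("post-edited reference must not be empty")
    return word_edit_distance(hyp, ref) / len(ref)
```

For example, `edit_rate("the cat sat on mat", "the cat sat on the mat")` counts one missing word against six post-edited words. Lower is better, and 0 means the human changed nothing.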
Journals
 Machine Translation
 Springer Science, formerly Kluwer Academic Publishers, vol. 4-, 1989
 Articles available online from Springer (abstracts free, full texts on payment of fee)
 Chief editor: Andy Way
 http://www.springer.com/computer/ai/journal/10590
 Computational Linguistics
 MIT Press
 Now open access
 http://www.mitpressjournals.org/loi/coli
 ACM TSLP
 Online publication
 Started in 2005
 http://tslp.acm.org/
Journals (cont.)
 IEEE Transactions on Audio, Speech, and Language Processing
 http://www.signalprocessingsociety.org/publications/periodicals/taslp/
 The Prague Bulletin of Mathematical Linguistics
 Has papers from recent MT Marathons, especially descriptions of open source packages
 http://ufal.mff.cuni.cz/pbml.html
Literature
 MT-Archive: http://www.mt-archive.info/
 Compiled by John Hutchins for the EAMT
 One stop shop!
 Also links to books, journals, conferences
 Papers listed by author, language, organization
 ACL Anthology: http://www.aclweb.org/anthology/