MT@EC
European Commission machine translation supporting e-government

Spyridon Pilos
Head of language applications
Directorate-General for Translation

MT@Work
Brussels, 5.12.2014

European Commission machine translation and public administrations
• MT@EC: a service for the EU
• The context of the free trial
• Implementation
• What next?

[Figure: EU official languages over time]

[Figure: EU translation services, DGT]

Why does the Commission need MT?
• The Commission…
• DGT has 1700 translators
• Over 2 M pages translated in 2013
• But… just to make europa.eu fully multilingual: almost 6.8 M documents to be translated, or 8 500 translators/year!
The result: thousands of non-translated documents (and this does not include user-generated content)

There are also interactions with and between actors in the Member States
[Diagram: Member State X and Member State Y, each with Business, Citizens and Administration, linked by A2B, A2C and A2A interactions, plus A2A links with EU Administrations; interactions are labelled "First type" and "Second type".]

Vision
Wouldn't it be great if I could start using a public service in any Member State from any place and obtain the information in my mother tongue?

• EIF = European Interoperability Framework
• ISA = programme for interoperability solutions for public administrations

EIF*: 12 Underlying principles
Need for EC action
• Subsidiarity and Proportionality
User needs and expectations
• User Centricity, Inclusion and Accessibility, Security and Privacy, Multilingualism, Administrative Simplification, Transparency, Preservation of Information
Collaboration
• Openness, Reusability, Technological Neutrality and Adaptability, Effectiveness and Efficiency
* European Interoperability Framework http://ec.europa.eu/isa/documents/isa_annex_ii_eif_en.pdf

The role of Machine Translation
MT is the only viable solution for:
• quick and cheap access to information in foreign languages.
• understanding information received in a foreign language that otherwise could not be used or would require substantial time and costs to translate.
• making multilingual use of websites possible
• facilitating cross-lingual information search and analytics.
That is why machine translation (MT) is a critically important technology for multilingual Europe

MT@EC: a European Commission product
• Released: 26 June 2013 (version 1.0)
• Version 2.0 released on 3 July 2014
• Languages: all 24 EU official languages; 552 language pairs (24 × 23), of which 62 direct
• Technology: statistical machine translation using the open source software Moses, co-funded by EU Framework Programmes for research and innovation
• Development by DGT: between 2010 and 2013, co-funded by the ISA* programme (action 2.8)
* Interoperability solutions for public administrations http://ec.europa.eu/isa/actions/02-interoperability-architecture/2-8action_en.htm

MT@EC description
• Delivery:
  - web user interface (human to machine)
  - web services (machine to machine)
• Special features:
  - User interface in 24 languages
  - Source document format/formatting maintained [not for pdf]
  - Specific output formats for translation: tmx and xliff
  - Translation can also be returned by email
  - Can translate multiple documents to multiple languages
  - Indication of quality for language pairs (using BLEU scores)
  - Feedback mechanism (using EU Survey)

MT@EC security
• Secure hosting in the EC data centre
• Access through ECAS (EC Authentication Service)
• Secure document transfers:
  - over sTESTA*, a very secure private network between public administrations in the EU, separate from the internet
  - over the internet (through a secure https connection)
* You can check if your organisation has access to sTESTA on: https://portal.testa.eu/jetspeed/portal/homepage/about.psml.

MT@EC is already available for…
… the staff of European institutions and bodies: Commission, Parliament, Council, Court of Justice, Court of Auditors, Economic and Social Committee, Committee of the Regions, European Central Bank, European Investment Bank etc.
… online services funded or supported by the EU
… real-life trial and pilot projects with public administrations in the EU Member States
… collaboration projects with EMT* Universities
* European Master's in Translation

Online services connected to MT@EC: in production
Service: Description/URL
• IMI: Internal Market Information System
  http://ec.europa.eu/internal_market/imi-net/index_en.html
• SOLVIT: SOLVIT is an on-line problem solving network concerning misapplication of Internal Market law by public authorities.
  http://ec.europa.eu/solvit/
• nLex: A common gateway to National Law
  http://eur-lex.europa.eu/n-lex/

Online services connected to MT@EC: in test
Service: Description/URL
• e-Justice: The future electronic one-stop-shop in the area of justice
  http://e-justice.europa.eu/
• ODR: Platform to facilitate the resolution of consumer disputes out-of-court (Alternative Dispute Resolution)
  http://ec.europa.eu/consumers/redress_cons/adr_en.htm
• CircaBC: Communication and Information Resource Centre for Administrations, Businesses and Citizens
  https://circabc.europa.eu/
• EU Survey: Tool for creating multilingual online surveys
  http://ec.europa.eu/eusurvey/

Online services to be connected to MT@EC: in preparation
Service: Description/URL
• TED: TED (Tenders Electronic Daily) is the online version of the 'Supplement to the Official Journal of the European Union', dedicated to European public procurement
  http://ted.europa.eu/
• Joinup: Joinup is an open collaborative platform supporting interoperability in Europe
  https://joinup.ec.europa.eu/

Online services interested in using MT@EC: discussions initiated (indicative list)
Service: Description/URL
• EURES: The European employment services network (European Job Mobility portal)
  https://ec.europa.eu/eures/
• EQF: The portal supporting the implementation of the European Qualifications Framework for lifelong learning
  http://ec.europa.eu/eqf/home_en.htm
• ESCO: The multilingual classification of European Skills, Competences, Qualifications and Occupations
  which identifies and categorises skills and competences, qualifications and occupations in all 22 European languages and supports EURES and other similar portals
  https://ec.europa.eu/esco/
• EPALE: The European Portal for Adult Learning
  http://ec.europa.eu/epale

MT@EC for Public Administrations
Context: MT@EC "Pilot operation" phase until Q4/2014 (ISA)
Objective: Develop and test, in real-life conditions, methods and structures for the most efficient use of MT@EC by different beneficiaries (including PAs); normal operation of service.
Conditions
• PAs participate on a voluntary basis.
• No cost for PAs other than use of internal resources.
• No commitment by DGT on use of service after the end of the pilot.
Output
• Service delivery models (including pricing)
• Operational support structure and methods

MT@EC for Public Administrations - Free real-life trial
• Staff members can have direct access to the standard MT@EC service [upon request by the individual PA staff member]
• The organisation can participate in a customisation pilot project, where DGT can also build specific engines with the organisation's own data. [Administrative Agreement between PA and DGT needed, to be signed by the end of June 2015]

Customisation pilots for PA (Pilots A to E)
• Pilot A: Connect a PA information system to the standard MT@EC service.
• Pilot B: DGT builds custom engines with PA data, available through MT@EC to all
• Pilot C: DGT builds custom engines with PA data, available through MT@EC only to the PA
• Pilot D: DGT builds custom engines with PA data for the PA to run in PA premises
• Pilot E: DGT assists the PA to build its own custom engines to run in PA premises
If you are interested, email DGT-MT@ec.europa.eu

Ongoing pilots
Country    Name of administration                                                     Type
Finland    Prime Minister's Office                                                    Central translation service
Germany    Bundessprachenamt                                                          Translation service of the Armed Forces
Greece     Hellenic Quality Assurance and Accreditation Agency for Higher Education   Education administration
Pilot types: C, A, E

Discussions were held with more PAs but did not lead to signature of agreements on pilots, usually because:
• there was no need for custom engines
• the necessary data were not enough or could not be shared
• resources could not be made available for the work to be performed on the PA side.

Special types of "pilots"
• Networks (Association des Conseils d’État et Cours administratives suprêmes de l'UE, Réseau des Présidents des Cours suprêmes judiciaires de l'UE, Legivoc project)
• New languages (Norwegian)

Staff access to MT@EC
• Get an individual ECAS user name and password (self-registration) using your work email address. [go to https://webgate.ec.europa.eu/cas/eim/external/register.cgi and follow the instructions]
• Send an email to DGT-MT@ec.europa.eu asking for the activation of access to the service.
• DGT will activate your access and inform you by email.
Users - total

Country            Registered   Using
Austria                     3       3
Belgium                     5       3
Bulgaria                    1       1
Croatia                     0       0
Cyprus*                    77      46
Czech Republic*            25      15
Denmark                     0       0
Estonia                     3       3
Finland                     2       2
France*                    21      15
Germany*                   30      28
Greece*                    37      23
Hungary                     1       0
Ireland                     0       0
Italy                       2       1
Latvia                      0       0
Lithuania                   1       1
Luxembourg                  3       2
Malta                       0       0
Netherlands                 8       8
Poland                      0       0
Portugal*                   7       5
Romania                     9       7
Slovakia*                  86      39
Slovenia*                  13       7
Spain*                      9       7
Sweden*                     3       3
UK                          1       1
TOTAL                     347     220 (63.4%)

* Countries where national events were organised

Requests per user
• Only one: 32%
• 2 to 9: 54%
• 10 or more: 14%

Top 40 users

Country            Requests
Germany                 633
Slovakia                313
France                  156
Greece                  125
Cyprus                   75
Portugal                 22
Finland                  15
Spain                    14
Slovenia                 12
Bulgaria                 10
Czech Republic           10
Lithuania                10

Domain                   Requests
Economy and finance           674
Agriculture                   218
Foreign affairs                92
European affairs               61
Health                         61
Modernisation                  55
Education                      48
Local government               48
Transport                      37
Telecom                        20
Statistical authority          14
Employment                     12
Interior                       11
Justice                        11
Police                         11

Implementation
• Usually individuals ask for their own translations.
• In some cases a translation service centralises requests (for example through a functional mailbox).
• No guidelines on feedback or evaluation were imposed by DGT. Quality is "fit for purpose" (compliance with user requirements). A feedback function is available in MT@EC.
• Translation to/from non-EU languages is very important in several cases.
• Translators will not use MT unless it is integrated in their translation workflow so that they can post-edit easily.
• Original is sometimes hand-written or "confidential".

Feedback
• Different depending on whether it comes from translators or other users
• Little understanding of statistical MT technology and its constraints
• Several problems were pointed out:
  - document formats and formatting
  - national names and acronyms
  - non-translation of "common" words
  - omission of words
  - consistency
  - syntax, grammar etc.
Hint: Do not draw general conclusions from a test on only one document. Usefulness depends greatly on factors such as type of document, quality of original, domain and language pair.

Intermediate conclusions (1)
On the pilots
• In most cases the generic engines were sufficient.
• It is difficult to find data of sufficient quality and quantity for building engines, and ownership and confidentiality are an issue.
• Lack of clarity on the status of the service after the end of the pilot discouraged investment on the side of PAs.
• Translation services asked for guidelines for evaluation and structured feedback.
• Information to technicians should be provided in their own language.
• Need more clarity on the scope of "public administration".

Intermediate conclusions (2)
On the service
• Do not need too much security: sTESTA to internet https
• The interface should be multilingual
• A tool for translators and other users: different attitudes.
• Use depends on "fitness for purpose" and not on some general quality of language
On communication
• Difficult to find the right network to promote (used ISA, EUPAN, COTSOES, DGT Field Offices in MS etc.)
• Promotion in national events in the language of the country (even by videoconference) worked best.

MT@EC for EMT universities
• Free use for teaching or research.
• Mutually beneficial project-based cooperation.
• The teacher/researcher may ask for access to see what it looks like and check whether it is relevant for his/her work.
• If interested, s/he sends a short project description (title, duration, objectives, approach, expected volume of requests) and a list of additional persons who need access.
• At the end of the project s/he informs DGT of the outcome of the project or study, as well as any other feedback considered useful to improve the service and its use.
Status: On 30.11.2014 we had 103 registered users, of which 75 are students, from 21 universities in 12 countries (11 EU MS and CH); 9 of the universities have communicated a research/teaching plan.

What next?
from MT@EC... to the CEF automated translation platform
CEF.AT will:
• build on the existing MT@EC service
• put emphasis on secure, quality, customisable MT

MT@EC Outline
[Diagram. Components: Users and Services; Customised interfaces; DISPATCHER (managing MT requests by language, subject…); MT engines; ENGINES HUB; DATA MODELLING; MT data (language resources specific for each MT engine); Language resources built around Euramis; DATA HUB; USER FEEDBACK.]

CEF.AT platform Outline
[Diagram. The service: DSIs; DISPATCHER (managing MT requests by language, domain…); MT engines. From data to engines: Engines factory; Language resources (Multilingual corpora, Monolingual corpora, NLP Tools, Other); Collect and clear. Labels: SECURE (and performing), QUALITY, CUSTOMISABLE.]

Real-life trial and customisation pilots for Public Administrations
- There is still time for your organisation to participate in a pilot (sign the agreement by the end of June 2015).
- Any staff member of a public administration can ask for access at any time.
- Access will be free of charge until further notice.
- Service delivery models (including pricing) will be developed only under the Connecting Europe Facility.
- Lessons learned from the pilots will be used for developing the operational support structure and methods for the CEF.
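
The outlines describe a DISPATCHER that manages MT requests by language and by subject or domain. The slides do not say how it is implemented, so the following is only a minimal sketch of the routing idea, assuming engines are looked up by source language, target language and an optional domain, with a fallback to a generic engine for the same language pair. All names here (MTRequest, Dispatcher, make_stub_engine) are hypothetical and are not part of MT@EC or CEF.AT; the stub engines only tag the text instead of translating it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# Hypothetical types for illustration only; not an MT@EC or CEF.AT API.
Engine = Callable[[str], str]
EngineKey = Tuple[str, str, Optional[str]]  # (source, target, domain or None)


@dataclass(frozen=True)
class MTRequest:
    text: str
    source: str
    target: str
    domain: Optional[str] = None


class Dispatcher:
    """Routes a request to a domain-specific engine, else to a generic one."""

    def __init__(self) -> None:
        self._engines: Dict[EngineKey, Engine] = {}

    def register(self, source: str, target: str, engine: Engine,
                 domain: Optional[str] = None) -> None:
        # domain=None marks the generic engine for this language pair.
        self._engines[(source, target, domain)] = engine

    def select(self, request: MTRequest) -> EngineKey:
        specific = (request.source, request.target, request.domain)
        if specific in self._engines:
            return specific
        generic = (request.source, request.target, None)
        if generic in self._engines:
            return generic
        raise LookupError(
            f"no engine registered for {request.source}->{request.target}"
        )

    def translate(self, request: MTRequest) -> str:
        return self._engines[self.select(request)](request.text)


def make_stub_engine(label: str) -> Engine:
    # Stand-in for a real MT engine: prefixes the text with the engine label.
    return lambda text: f"[{label}] {text}"


dispatcher = Dispatcher()
dispatcher.register("en", "fr", make_stub_engine("generic en-fr"))
dispatcher.register("en", "fr", make_stub_engine("justice en-fr"), domain="justice")

print(dispatcher.translate(MTRequest("hello", "en", "fr", domain="justice")))
print(dispatcher.translate(MTRequest("hello", "en", "fr", domain="health")))
```

In this sketch a request tagged with a domain that has no custom engine falls back to the generic engine for the language pair, which mirrors the observation from the pilots that generic engines were sufficient in most cases.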

Useful links
• DGT MT page on europa.eu
  http://ec.europa.eu/dgs/translation/translationresources/machine_translation/index_en.htm
• ISA page on action 2.8 Machine translation
  http://ec.europa.eu/isa/actions/02-interoperability-architecture/2-8action_en.htm
  Includes:
  - The ISA Work programme 2010-2014 for MT@EC
  - Presentations for public administrations
  - and more…
• CEF work programme for 2014, where section 3.1.7 is on the CEF.AT platform
  https://ec.europa.eu/digital-agenda/sites/digital-agenda/files/WP2014%20%20official%20published.pdf
• Language technologies (CEF, H2020,…)
  http://ec.europa.eu/digital-agenda/language-technologies
• Language technology resources (DGT-TM, EuroVoc,…)
  http://ec.europa.eu/jrc/en/language-technologies

Questions?
spyridon.pilos@ec.europa.eu
DGT-MT@ec.europa.eu