The value of Post Editing - IBM Case Study
Frank X. Rojas, Jian Ming Xu, Santi Pont Nesta, Álex Martínez Corrià, Salim Roukos, Helena Chapman, Saroj K. Vohra
June 2011
100 years of progress and innovation
© 2011 IBM Corporation

IBM Case Study – MT Post Editing
– Introduction
– MT Innovation
– Process Overview
– Findings
– Conclusion / Recommendations

IBM World Wide Translation Operations
One Stop Shop for all Translation Services
– Services: Marketing Material, Machine Translation, Legal/Safety/Contracts, Multimedia, Publications, Francization, Cultural Consultancy, Product Integrated Information, Centralized DTP, Web
– Overall End to End Process Management
– Process ~2.8 B words
– Translate ~0.4 B words
– ~60 language pairs
– 24 centers worldwide
– ~115 translation suppliers

IBM Professional Translation Services
1. Unit cost: >50% reduction
2. Consistent quality standards
   – Global brand identity
   – Professional quality standards
3. Human skill
[Chart: unit cost by year, 2001 to 2009, axis 0 to 250; labels: Traditional, Technology, Process Mgmt]
[Chart: professional memory re-use by year, 2001 to 2009; values shown: 72% and 85%]
Future:
– The ability to reduce cost using conventional methods is reaching its limits
– Business pressure for additional cost elimination
– Looking to MT technology as the next wave to reach business goals

Historical Perspective
[Timeline figure, 2006 to 2012, with a June 2008 marker; era labels: Rule Based MT Engines, Statistical MT Engines, Hybrid MT Engines]
– 2006: RTTS introduced as platform for speech and text translation, developed by IBM Research
– 2010 MT piloting: pilot for SPA, ITA, FRE, GER; new E2E process; partnership WWTO/n.Fluent; 8.6 M words
– 2011 MT training: pilot for GER, BPR, JPN, CHS; MT payment profiles ready
– Further milestones shown on the timeline:
  • RTTS licensed to IBM partners
  • MT portal, generic crowdsourcing, text translation services
  • eSupport (www) “Translate This Page” JPN pilot / rule engine
  • eSupport “Translate This Page” switch to n.Fluent
  • Initial n.Fluent/WWTO Spanish MT pilot: improve efficiency of professional translators
  • n.Fluent customized with WWTO translation memories
  • 16.0 M words target

MT Critical Success Metrics
Necessary and sufficient conditions to measure success
– 5.0 M words sampled
– Minimum of 3 languages
– Net contribution to ROI by MT engine: 10% of payable words should be MT
– No more than 5% adverse impact to Overall Quality Index
– No more than 5% impact to Customer Satisfaction
Lack of industry metrics and guidance
– Active research on MT technology, but no guidance on operational impacts
– A business vacuum existed on how to integrate MT services
– No operational process had been defined for MT services

Recent Digital Innovations with Biggest Impact in the Business World*
– IBM’s Watson Q&A computer
– Google’s autonomous car
– Technologies to understand and produce natural human speech
– Instantaneous, high-quality machine translation
– Smartphones / App phones in the developing world
*Andrew McAfee is a principal research scientist at the MIT Sloan School of Management

Real-Time Translation Server (RTTS) & n.Fluent
Real Time Translation Server (RTTS): IBM’s MT engine
– RTTS provides machine translation for n.Fluent & other applications
– APIs allow other applications to access these translation services
– Customization tools: domains, chat-specific models, …
– Commercially licensed to IBM partners
– Language pairs to/from English: العربية, 中文, Deutsch, Français, 日本語, Italiano, 한국어, Português, Español, Русский
[Chart: BLEU quality, axis 0 to 0.5, plotted against words: Base, 29k, 180k, 350k]
n.Fluent: IBM’s MT translation application
– Provides machine translation services for:
  • Text, web pages, and documents (Word, Excel, …)
  • Instant messaging chats (via IM plug-in)
  • Mobile translation application (BlackBerry and others)
– Enabled with LEARNING via crowdsourcing (internal, 450K IBMers)
– Deployed for eSupport self-service tech support (external)

MT Post Editing End to End Workflow
[Workflow diagram; labels shown: English, TM Pre-Process, TM Match Analysis, MT Pre-Process, MT Model & MT, Shipment, Editing Session (CAT), Translation, Testing, Quality; match classes: 100% Exact Match, New / Changed; folder icon = Localization Kit (NLV Folder)]
Editing session
1. Show best choice
2. Select best choice (Post Edit rules)
3. Commit
MT Pre-Process
– Upfront & on-going MT tuning via IBM TM professional translations
  • Professional translation = best context
Matching methods
– Traditional TM
  • Breaks down content at segment level
– Machine TM
  • Breaks down segments at block level using MT models
  • Reconstructs segments, preserving formats/mark-up tags
MT service level integration

MT Pre-processing
[Diagram; labels shown: Localization kit; TM split into 100% Exact Match and New / Changed; MT split into 100% Exact Match and New / Changed; ALL segments; “no match segments”; General parallel training corpus; Domain specific parallel training corpus; MT initial corpus]
– Initial MT corpus: done before start of project
– Build dynamic, domain specific MT model
– Translation of no match segments

TM Editing Environment
Source segment in the TM environment:
  The application unprotects files before exporting them.
Translation Memory window:
  0 - The application unprotects files before exporting them.
  1[m] - La aplicación desprotege archivos antes de exportarlos. (MT)
  2[f 85%] - La aplicación protege los archivos antes de exportarlos (TM)
  [Ctrl + 1]
Translator options
– Ignore fuzzy and MT
– Post edit MT
– Post edit fuzzy
Two Seconds Rule: translators are trained on several strategies to make a quick choice
Typed result in the TM environment:
  La aplicación desprotege los archivos antes de exportarlos.

Productivity Measurements
Each event: start segment, choose action, end segment
1. Accept match [~0 time]
2. Edit match [X time]
3. Reject match [manual translation]
MT productivity evaluation log (MTeval Log)
– N events
– Fields per event: Words | Time | Existing Proposal | Used Proposal | ...
– Proposal codes: EM = Exact, RM = Replace, FM = Fuzzy, MT = Machine, NP = No Proposal
– Translator choices:
  A) “best” Existing Proposal
  B) “alternative” Existing Proposal
  C) reject all Existing Proposals, 100% human labor
Examine productivity per payment category
– SUM(Words) / SUM(Time)
– Use of IBM business analytics tool (SPSS)
– Trim events that fall below the 5th percentile (slowest) or above the 95th percentile (fastest)

Single Shipment EXAMPLE
Columns 2 to 5 cover segments with an MT match (MT), columns 6 to 9 segments without one (NO MT). Time is in seconds; Prod_W_T is the median in words/sec.
Used | Segments | Words | Time | Prod_W_T | Segments | Words | Time | Prod_W_T
1-EM | 0 | . | . | . | 1350 | 10593 | 3022 | 2.00
2-RM | 4 | 18 | 43 | .42 | 239 | 3905 | 3085 | 1.50
3-FM | 129 | 1419 | 3870 | .46 | 334 | 5610 | 9466 | .71
5-MT | 111 | 1777 | 4071 | .50 | 0 | . | . | .
6-NP | 133 | 697 | 3393 | .20 | 9 | 131 | 412 | .33
Total | 377 | 3911 | 11377 | .37 | 1932 | 20239 | 15985 | 1.67
Key metrics
– Total # events: 2,309 (377 + 1,932)
– Total words: 24,150
  • 3,911 w/ MT match
  • 20,239 w/o MT match
– Total time: 27,362
  • 11,377 w/ MT match
  • 15,985 w/o MT match
MT impact to productivity
– MT: 0.44 words/sec [1777 words / 4071 sec]
– NP:
  • 0.21 words/sec w/ MT match
  • 0.32 words/sec w/o MT match: baseline (placebo)
– MT leverage: 71.8% [1777 / (1777 + 697)]
– rate(MT) / rate(NP): 1.37, i.e. the translator can complete 37% more words in the same time.

MT Impact on Fuzzy Match: 4Q10 Findings
When FM & MT matches exist simultaneously
Productivity: rate(MT) / rate(NP)
a. Case: translator edits FM
b. FM-MT combined case
c. Case: translator edits MT
Overall
– Machine matches are not as good as professional (fuzzy) matches
– Including MT matches has no statistical impact on fuzzy productivity
  • SPA is the highest sample case
[Chart: productivity ratio, axis 0.00 to 8.00, for FRE, GER, ITA, SPA; series: FM, FM-MT, MT]
FM-MT pick rate: FRE 28.6% | GER 4.4% | ITA 57.6% | SPA 46.9%
** Findings subject to change with additional sampling.

MT Key Metrics: 4Q10 Findings
Language | # Events | New/Changed Words | MT (% of NP) | MT Leverage
FRE | 20417 | 209347 | 2.87 | 68.9%
GER | 36634 | 250238 | 1.32 | 5.4%
ITA | 78483 | 715557 | 2.70 | 46.2%
SPA | 783238 | 7424298 | 1.74 | 55.2%
Total | 918772 | 8599440 | |
8.6 M words sampled in real time translation service.
– SPA: qualified MT engine 4Q10
– ITA: qualified MT engine 4Q10
– FRE: qualified MT engine 1Q11
  • While rate(MT) / rate(NP) is high, the findings were not statistically significant in 4Q.
– GER: insufficient productivity from MT engine
** Findings subject to change with additional sampling.

Overall Savings Assessment
Overall savings %
– Word savings due to MT efficiency
  • Convert time savings
MT payment factor %
– MT payment factor × [MT % words + NP % words]
  • Results in fewer payable words
MT productivity savings drives overall savings
– These are not the same, due to MT % distribution.
Supply chain has to consider cost of MT services
** Findings subject to change with additional sampling.

Pay for MT Words Translated, not MT Matches
We pay for final results (MT payable words), not MT matches
– MT matches are considered “opinion” until chosen by a human
– Too many opinions, and opinions by immature MT models, are less efficient.
Actual MT payable words have value beyond the specific project
– Post-edited words are reused in future and unknown MT contexts
Engine has to deliver consistent MT payable words
– Minimum needed to qualify an MT engine for compensation
  • High MT productivity [rate(MT) / rate(NP)]
  • High MT leverage [% of MT matches used]
– Compensation to be based on MT payment factor

Variance across Languages
There is no single maturity path when modeling MT engines across many languages.
IBM Pilot: each trained MT engine is a unique asset.
– Some languages require more modeling/tuning than others.
– Language pairs that service “Loose -> Structured” languages are struggling
  • German requires more effort than Spanish
Are there limitations to statistical MT engines?
– New thinking may need to be explored?
Each MT engine will have separate MT payment factors.

Perspective of MT Post Edit Pilots
[Figure: Translation Service Hierarchy, ordered by quality / reliability of memory assets, from HIGHER (domain specific) to LOWER (general)]
– Professional Translation Services (Professional LSP)
– Community Translation Services (Controlled Social Crowd)
– Volunteer Translation Services (General Crowds)
– Free Services (Individual)
Other labels in the figure: All IBM external/internal; Pubs / UI external (2011 Pilots); internal IBM (twice); WWTO “human”; New n.Fluent “machine”
MT Post Editing has impacts across the entire Translation Service Hierarchy

MT Post Editing Project – Key Lessons
1. Professional (human) memories are the best assets and deliver the highest quality.
2. Professional memories are a key asset for MT success.
3. All memory assets need to be protected and managed.
4. Flow of memories between professional and machine must be properly balanced.
5. Dynamic modeling offers significant advantage over static modeling.
6. Continuous business analytics is needed to optimize machine assets.
7. A single cost model per language is needed, independent of MT services/engines.
8. An aggressive yet cautious approach is warranted to go forward.
MT Post Editing does improve productivity and efficiency of a localization supply chain.
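
The arithmetic behind the Single Shipment EXAMPLE key metrics can be reproduced with a few lines of Python. This is only a sketch of the calculation as the slide states it: the word and time totals are copied from the example table, while the class and function names (`CategoryTotals`, `mt_leverage`, `productivity_ratio`) are hypothetical and not part of any IBM tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CategoryTotals:
    """Summed words and time (seconds) for one 'Used' category of a shipment."""
    words: int
    seconds: int

    @property
    def rate(self) -> float:
        """Aggregate productivity: SUM(Words) / SUM(Time), in words per second."""
        return self.words / self.seconds


# Totals copied from the Single Shipment EXAMPLE table.
mt_used = CategoryTotals(words=1777, seconds=4071)      # 5-MT: MT match was post-edited
np_with_mt = CategoryTotals(words=697, seconds=3393)    # 6-NP: MT match existed but was rejected
np_without_mt = CategoryTotals(words=131, seconds=412)  # 6-NP: no MT match (baseline)


def mt_leverage(used: CategoryTotals, rejected: CategoryTotals) -> float:
    """Share of words with an MT match for which the MT proposal was used."""
    return used.words / (used.words + rejected.words)


def productivity_ratio(used: CategoryTotals, baseline: CategoryTotals) -> float:
    """rate(MT) / rate(NP): relative speed of post-editing versus the baseline."""
    return used.rate / baseline.rate


leverage = mt_leverage(mt_used, np_with_mt)
ratio = productivity_ratio(mt_used, np_without_mt)

print(f"rate(MT)             = {mt_used.rate:.2f} words/sec")
print(f"rate(NP, w/ MT)      = {np_with_mt.rate:.2f} words/sec")
print(f"rate(NP, w/o MT)     = {np_without_mt.rate:.2f} words/sec")
print(f"MT leverage          = {leverage:.1%}")
print(f"rate(MT) / rate(NP)  = {ratio:.2f}")
```

The ratio uses the NP rate without an MT match as the denominator, since the slide labels that figure as the baseline (placebo).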
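
The per-category productivity calculation from the Productivity Measurements slide (SUM(Words) / SUM(Time) after trimming the slowest and fastest events) might look roughly like the sketch below. Several things here are assumptions rather than details from the deck: trimming is done by rank on each event's words-per-second rate, separately within each category; events with zero recorded time are treated as infinitely fast; and the `Event` record with its field names is hypothetical. The deck's own analysis was done in SPSS, which this does not attempt to reproduce.

```python
from collections import defaultdict
from typing import NamedTuple


class Event(NamedTuple):
    """One MTeval log event (hypothetical field names)."""
    used: str        # used proposal code: EM, RM, FM, MT or NP
    words: int
    seconds: float


def event_rate(event: Event) -> float:
    """Words per second for one event; zero-time events count as infinitely fast."""
    return event.words / event.seconds if event.seconds > 0 else float("inf")


def trim_by_rate(events: list[Event], fraction: float = 0.05) -> list[Event]:
    """Drop the slowest and fastest `fraction` of events, ranked by rate."""
    ranked = sorted(events, key=event_rate)
    k = int(len(ranked) * fraction)
    return ranked[k:len(ranked) - k]


def productivity_by_category(events: list[Event], fraction: float = 0.05) -> dict[str, float]:
    """SUM(Words) / SUM(Time) per used-proposal category, after trimming."""
    groups: dict[str, list[Event]] = defaultdict(list)
    for event in events:
        groups[event.used].append(event)

    result: dict[str, float] = {}
    for used, group in groups.items():
        kept = trim_by_rate(group, fraction)
        total_seconds = sum(e.seconds for e in kept)
        if total_seconds > 0:  # skip categories with no measurable time
            result[used] = sum(e.words for e in kept) / total_seconds
    return result


# Made-up events: 20 ordinary MT events plus one implausibly fast outlier.
sample = [Event("MT", w, 1.0) for w in range(1, 21)] + [Event("MT", 100, 1.0)]
kept_events = trim_by_rate(sample)
rates = productivity_by_category(sample)
print(len(kept_events), rates)
```

With 21 events and a 5% fraction, one event is dropped from each end, so the outlier no longer inflates the category rate.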