ICU and OpenTM2
Helena Chapman
Program Director, IBM Corporate Globalization

ICU Overview

Agenda
• Isn't Unicode enough?
• Why ICU?
• Where is ICU?
• What's new in ICU 4.6 & 4.8?
• What's next for ICU?

The Nature of Unicode
• Handles all modern world languages
  – (Well, almost all of them)
• Efficient and effective processing
• Lossless data exchange
• Enables single-binary global software
But...
• 1,400 pages + Annexes + additional standards
• Nearly 110,000 characters
• Major update every 3 years, minor update about once a year
• 80+ character properties, many multi-valued
• Affects many processes: display, line-break, regular expressions...

Internationalization, Localization & Locales
Requirements vary widely across languages & countries
• Sorting
• Text searching
• Bidirectional text processing and complex text layout
• Date/time/number/currency formatting
• Codepage conversion
• …and so on
Performance is key
• It might be easy to do the right thing
• It is hard to do it fast

ICU Features
• Breaks: word, line, …
• Formatting
  – Date & time
  – Durations
  – Messages
  – Numbers & currencies
  – Plurals
• Transforms
  – Normalization
  – Casing
  – Transliterations
• Unicode text handling
• Charset conversions (175+)
• Charset detection
• Collation & Searching
• Locales from CLDR (530+)
• Resource Bundles
• Calendar & Time zones
• Complex-text layout engine
• Unicode Regular Expressions

ICU Works Everywhere
Mature, widely used set of C/C++ and Java libraries
• Basis for Java 1.1 internationalization, but goes far beyond Java 1.1
Very portable – identical results on all platforms/programming languages
• C/C++ (ICU4C): 30+ platforms/compilers
• Java (ICU4J): Oracle and IBM JRE
Full threading model
Customizable & Modular
Open source (since 1999) – but non-restrictive
• Governed by a Project Management Committee
• Contributions from many parties (IBM, Google, Apple, Yahoo, ...)

ICU Is Kept Up To Date
• 1..2 major ICU releases per year
• Each ICU release supports the latest
  – Unicode version
  – CLDR version
  – Time zone database update
• TZ DB updates for past ICU versions
• Maintenance releases for important bugs

ICU in IBM Products
• All 5 major IBM software brands
• IBM operating systems
• Products
  Ascential Software, Cognos, PSD Print Architecture, DB2, COBOL, Host Access Client, InfoPrint Manager, Informix GLS, iSeries, Language Analysis Systems, Lotus Notes, Lotus Extended Search, Lotus Workplace, WebSphere Message Broker, NUMA-Q, OTI, OmniFind, Pervasive Computing WECMS, Rational Business Developer and Rational Application Developer, SS&S Websphere Banking Solutions, Tivoli Presentation Services, Tivoli Identity Manager, WBI Adapter/ Connect/Modeler and Monitor/ Solution Technology Development/WBIFinancial TePI, Websphere Application Server/ Studio Workload Simulator/Transcoding Publisher, XML Parser.

ICU in Google Products
• Web Search
• Google Analytics
• Chrome
• Google Gears
• Android
• Google Groups
• Adwords
• Google Finance
• Google Maps
• Blogger
• Others...

ICU in Apple Products
• Mac OS X, including applications
• iOS (iPhone, iPad, iPod touch)
• Windows applications and related support
  – Safari
  – iTunes
  – Apple Mobile Device Support
• Others...

Other ICU Users
ABAS Software, Adobe, Amazon (Kindle), Amdocs, Apache (Harmony, Lucene, Solr, PDFBox, Tika, Xalan, Xerces, ....), Appian, Argonne National Laboratory, Avaya, BAE Systems Geospatial eXploitation Products, BEA, BluePhoenix Solutions, BMC Software, Boost, BroadJump, Business Objects, caris, CERN, Debian Linux, Dell, Eclipse, eBay, EMC Corporation, ESRI, FreeBSD, Gentoo Linux, GroundWork Open Source, GTK+, Harman/Becker Automotive Systems GmbH, HP, Hyperion, Inktomi, Innodata Isogen, Informatica, Intel, Interlogics, IONA, IXOS, Jikes, Library of Congress, Mathworks, Mozilla, Netezza, OpenOffice, Lawson Software, Leica Geosystems GIS & Mapping LLC, Mandrake Linux, OCLC, Progress Software, Python, QNX, Rogue Wave, SAP, SIL, SPSS, Software AG, Sun Microsystems (Solaris, Java), SuSE, Sybase, Symantec, Teradata (NCR), Trend Micro, Virage, webMethods, Wine, WMS Gaming, XyEnterprise, Yahoo!, and many others.

Recent Changes in Translation Support

Plural Varies by Language
• English: singular (1), plural (other)
• French: singular (0, 1), plural (other)
• Japanese: no difference
• Russian: 4 categories
• Arabic: 6 categories
• Hardcoded choices for plural:
  – if(num==1) { msg_singular; } else { msg_plural; }
  ☹ Does not work in most languages
  ☹ Translator might see messages independently, translate inconsistently

2007 CLDR/ICU PluralRules
• CLDR data
  <pluralRules locales="be bs hr ru sh sr uk">
    <pluralRule count="one">n mod 10 is 1 and n mod 100 is not 11</pluralRule>
    <pluralRule count="few">n mod 10 in 2..4 and n mod 100 not in 12..14</pluralRule>
    <pluralRule count="many">n mod 10 is 0 or n mod 10 in 5..9 or n mod 100 in 11..14</pluralRule>
    <!-- others are fractions -->
  </pluralRules>
• ICU class maps number → keyword (e.g., 23 → "few")

2007 ICU PluralFormat
• Sibling of ChoiceFormat, used in MessageFormat
  "There {num_files,plural, one{is one file} other{are # files}}."
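The number → keyword mapping and the keyword-based message selection described above can be sketched in a few lines of Python. This is only an illustration of the idea, not ICU code: the function names plural_keyword_ru and format_plural and the Russian sample strings are assumptions made for this example, and "#" is filled in with a plain string replace, whereas ICU formats the number for the locale.

```python
from typing import Mapping


def plural_keyword_ru(n: int) -> str:
    """Map a non-negative integer to a plural keyword.

    Hand transcription of the CLDR rules quoted on the slide for
    locales "be bs hr ru sh sr uk"; hypothetical helper, not an ICU API.
    """
    mod10, mod100 = n % 10, n % 100
    if mod10 == 1 and mod100 != 11:
        return "one"
    if 2 <= mod10 <= 4 and not 12 <= mod100 <= 14:
        return "few"
    if mod10 == 0 or 5 <= mod10 <= 9 or 11 <= mod100 <= 14:
        return "many"
    # Per the slide, "other" is left for fractions; integers never reach it.
    return "other"


def format_plural(n: int, variants: Mapping[str, str]) -> str:
    """Pick the variant for n's keyword and substitute '#' with the number."""
    keyword = plural_keyword_ru(n)
    template = variants.get(keyword, variants["other"])
    return template.replace("#", str(n))


# Illustrative Russian variants for "N files" (sample data for this sketch).
RUSSIAN_FILES = {
    "one": "# файл",
    "few": "# файла",
    "many": "# файлов",
    "other": "# файла",
}

if __name__ == "__main__":
    for count in (1, 2, 5, 11, 21, 23):
        print(count, plural_keyword_ru(count), format_plural(count, RUSSIAN_FILES))
```

The translator supplies one string per keyword, and the rule function decides which one applies, which is the division of labor the PluralFormat slide describes.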
• ☺ Single message with all plural variants, translated in context
• ☺ Translator needs to know only the relevant set of plural forms/keywords, not the detailed rules
  – More/fewer variants per language (few, many, ...)
• ☺ # as shorthand for {num_files,number}

References
ICU Main Site: http://icu-project.org
• Download ICU Releases
• User Guide
• Demonstrations
• Technical FAQ
• Bug Report
• Mailing Lists (design & support)

OpenTM2
Introduction to OpenTM2 – An Open Source Solution for Translators
August 23, 2012 – Version

OpenTM2 An Introduction
Agenda
› General Overview of OpenTM2
› Strategy and Vision of OpenTM2
› Objectives & Benefits of OpenTM2
› OpenTM2 Core Functions
› OpenTM2 Core Modules & Additional Modules
› OpenTM2 Development Schedule
› OpenTM2 Supporter
› Sources for More Information about OpenTM2
OpenTM2 Overview

General Overview – What OpenTM2 is
OpenTM2 is ...
› An open source software project
› A translator's workbench
› Based on IBM TranslationManager/2
› An enterprise-level translation memory CAT software.
› Intended as the reference implementation platform for translation asset standards.
OpenTM2 is not ...
› A globalization management system
› An open translation management data project
› A machine translation tool or environment
› A complete version of IBM TranslationManager/2
› Free of implementation costs.

General Overview – The communities
Steering Committee
› IBM
› Lisog / Folt
› Gala
Community
› Interaction is managed through discussion groups
› TRAC is used to report bugs or request new features.

General Overview – How to contribute to the Project?
› Subscribe to the mailing lists
› Test OpenTM2, report bugs, and request new features
› Review the documentation and help improve it
› Help with OpenTM2 release and maintenance tasks
› Fix bugs and offer patches
› High-level contributors may be invited to take on greater responsibilities

Strategy and Vision – The overall goals
› Develop a reference implementation for translation asset exchange standards
  ● Think of a lossless exchange of translation memories using TMX
  ● Think of a lossless exchange of translation dictionaries/glossaries using TBX
› Encourage the development of open standards across the entire content management chain
  ● Think of a lossless exchange of source files to be translated using XLIFF
  ● Think of a lossless segmentation of source files using SRX
› Deliver choice to translators and enable them to work on projects without tool lock-in
  ● Think of OpenTM2 as the open source translation environment

Strategy and Vision – Next steps
› Restructuring:
  ● Build modules and plug-ins
  ● Re-design existing features and build new features to be open standards based
  ● Compile it for multi-platform usage
  ● Build a better architecture, but re-use existing services
  ● Make it scalable
› End Goals:
  ● An open source solution that will produce high-quality localization results without high cost:
    ● Excellent reuse of translation memories and terminology
    ● Hooks to project management software
    ● Hooks to various systems: ERP
    ● Ability to perform real-time collaboration across multiple users

Core Modules & Additional Modules

Sources for more information
› The OpenTM2 home page:
  ● https://sites.google.com/site/opentm2/home
› The source code repository (SVN):
  ● http://145.253.107.23/svn/opentm2/
› The OpenTM2 Wiki:
  ● http://www.beo-doc.de/opentm2wiki/index.php/Main_Page
› The OpenTM2 problem reporting database (TRAC):
  ● http://source.opentm2.org:8000/opentm2/report