ICU and OpenTM2
Helena Chapman
Program Director, IBM Corporate Globalization
ICU Overview
Agenda
• Isn't Unicode enough?
• Why ICU?
• Where is ICU?
• What's new in ICU 4.6 & 4.8?
• What's next for ICU?
The Nature of Unicode
• Handles all modern world languages
– (Well, almost all of them)
• Efficient and effective processing
• Lossless data exchange
• Enables single-binary global software
But...
• 1,400 pages + Annexes + additional standards
• Nearly 110,000 characters
• Major update every 3 years, minor update
about once a year
• 80+ character properties, many multi-valued
• Affects many processes: display, line-break,
regular expressions...
Internationalization, Localization & Locales
Requirements vary widely across languages & countries
• Sorting
• Text searching
• Bidirectional text processing and complex text layout
• Date/time/number/currency formatting
• Codepage conversion
• …and so on
Performance is key
• It might be easy to do the right thing
• It is hard to do it fast
ICU Features
• Unicode text handling
• Charset conversions (175+)
• Charset detection
• Collation & Searching
• Locales from CLDR (530+)
• Resource Bundles
• Calendar & Time zones
• Complex-text layout engine
• Unicode Regular Expressions
• Breaks: word, line, …
• Formatting
– Date & time
– Durations
– Messages
– Numbers & currencies
– Plurals
• Transforms
– Normalization
– Casing
– Transliterations
ICU Works Everywhere
Mature, widely used set of C/C++ and Java libraries
• Basis for Java 1.1 internationalization, but goes far beyond Java 1.1
Very portable – identical results on all platforms/programming languages
• C/C++ (ICU4C): 30+ platforms/compilers
• Java (ICU4J): Oracle and IBM JRE
Full threading model
Customizable & Modular
Open source (since 1999) – but non-restrictive
• Governed by a Project Management Committee
• Contributions from many parties (IBM, Google, Apple, Yahoo, ...)
ICU Is Kept Up To Date
• 1..2 major ICU releases per year
• Each ICU release supports the latest
– Unicode version
– CLDR version
– Time zone database update
• TZ DB updates for past ICU versions
• Maintenance releases for important bugs
ICU in IBM Products
• All 5 major IBM software brands
• IBM operating systems
• Products
Ascential Software, Cognos, PSD Print Architecture, DB2, COBOL, Host
Access Client, InfoPrint Manager, Informix GLS, iSeries, Language
Analysis Systems, Lotus Notes, Lotus Extended Search, Lotus Workplace,
WebSphere Message Broker, NUMA-Q, OTI, OmniFind, Pervasive
Computing WECMS, Rational Business Developer and Rational
Application Developer, SS&S WebSphere Banking Solutions, Tivoli
Presentation Services, Tivoli Identity Manager, WBI Adapter/Connect/
Modeler and Monitor/Solution Technology Development/WBIFinancial TePI,
WebSphere Application Server/Studio Workload Simulator/Transcoding
Publisher, XML Parser.
ICU in Google Products
• Web Search
• Google Analytics
• Chrome
• Google Gears
• Android
• Google Groups
• AdWords
• Google Finance
• Google Maps
• Blogger
• Others...
ICU in Apple Products
• Mac OS X, including applications
• iOS (iPhone, iPad, iPod touch)
• Windows applications and related support
– Safari
– iTunes
– Apple Mobile Device Support
• Others...
Other ICU Users
ABAS Software, Adobe, Amazon (Kindle), Amdocs, Apache (Harmony, Lucene,
Solr, PDFBox, Tika, Xalan, Xerces, ...), Appian, Argonne National Laboratory,
Avaya, BAE Systems Geospatial eXploitation Products, BEA, BluePhoenix
Solutions, BMC Software, Boost, BroadJump, Business Objects, caris, CERN,
Debian Linux, Dell, Eclipse, eBay, EMC Corporation, ESRI, FreeBSD, Gentoo
Linux, GroundWork Open Source, GTK+, Harman/Becker Automotive Systems
GmbH, HP, Hyperion, Inktomi, Innodata Isogen, Informatica, Intel, Interlogics,
IONA, IXOS, Jikes, Library of Congress, Mathworks, Mozilla, Netezza,
OpenOffice, Lawson Software, Leica Geosystems GIS & Mapping LLC,
Mandrake Linux, OCLC, Progress Software, Python, QNX, Rogue Wave, SAP,
SIL, SPSS, Software AG, Sun Microsystems (Solaris, Java), SuSE, Sybase,
Symantec, Teradata (NCR), Trend Micro, Virage, webMethods, Wine, WMS
Gaming, XyEnterprise, Yahoo!, and many others.
Recent Changes in Translation Support
Plural Varies by Language
• English: singular (1), plural (other)
• French: singular (0, 1), plural (other)
• Japanese: no difference
• Russian: 4 categories
• Arabic: 6 categories
• Hardcoded choices for plural:
– if(num==1) { msg_singular; } else { msg_plural; }
☹ Does not work in most languages
☹ Translators might see the messages independently and translate them inconsistently
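A minimal sketch of why the hardcoded test breaks, using only the integer cases listed for English, French and Japanese. The function names and the "one"/"other" labels are illustrative assumptions, not part of any library:

```python
# Hypothetical helpers; rules cover integer counts for three languages only.

def hardcoded_category(num):
    """The hardcoded English-style choice: singular only when num == 1."""
    return "one" if num == 1 else "other"

def category(language, num):
    """Plural category per language, following the list on this slide."""
    if language == "en":
        return "one" if num == 1 else "other"
    if language == "fr":
        return "one" if num in (0, 1) else "other"   # French treats 0 like 1
    if language == "ja":
        return "other"                                # no singular/plural difference
    raise ValueError(f"no rule for {language!r}")

for count in (0, 1, 2):
    ok = hardcoded_category(count) == category("fr", count)
    print(count, "matches French" if ok else "wrong for French")
```

The hardcoded choice already disagrees with French at 0, before languages with more than two categories are even considered.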
2007 CLDR/ICU PluralRules
• CLDR data
<pluralRules locales="be bs hr ru sh sr uk">
<pluralRule count="one">n mod 10 is 1 and
n mod 100 is not 11</pluralRule>
<pluralRule count="few">n mod 10 in 2..4 and
n mod 100 not in 12..14</pluralRule>
<pluralRule count="many">n mod 10 is 0 or
n mod 10 in 5..9 or
n mod 100 in 11..14</pluralRule>
<!-- others are fractions -->
</pluralRules>
• ICU class maps number → keyword (e.g., 23 → "few")
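The number-to-keyword mapping can be sketched directly from the three rules quoted above. This is a plain-Python illustration, not the ICU PluralRules class; the function name is hypothetical and it assumes non-negative input:

```python
def plural_keyword(n):
    """Map a number to a plural keyword using the rules quoted above
    for locales be, bs, hr, ru, sh, sr, uk (non-negative n assumed)."""
    if n != int(n):
        return "other"                      # fractions match none of the rules
    n = int(n)
    mod10, mod100 = n % 10, n % 100
    if mod10 == 1 and mod100 != 11:
        return "one"
    if 2 <= mod10 <= 4 and not 12 <= mod100 <= 14:
        return "few"
    if mod10 == 0 or 5 <= mod10 <= 9 or 11 <= mod100 <= 14:
        return "many"
    return "other"

for number in (1, 11, 21, 23, 25, 1.5):
    print(number, plural_keyword(number))
```

For 23, `mod10` is 3 and `mod100` is 23, so the "few" rule applies, matching the example on the slide.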
2007 ICU PluralFormat
• Sibling of ChoiceFormat, used in MessageFormat
"There {num_files,plural,
one{is one file}
other{are # files}}."
• ☺ Single message with all plural variants, translated in context
• ☺ Translators need to know only the relevant set of plural forms/keywords, not the detailed rules
– More/fewer variants per language (few, many, ...)
• ☺ # as shorthand for {num_files,number}
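The selection step behind such a message can be sketched as follows. This is not the ICU PluralFormat API: the helper names are hypothetical, the variants are given as a dictionary rather than parsed from the pattern string, and `#` is replaced with `str(n)` instead of a locale-aware number format:

```python
def english_keyword(n):
    """English plural rule: "one" for 1, otherwise "other"."""
    return "one" if n == 1 else "other"

def format_plural(variants, n, keyword_for=english_keyword):
    """Pick the variant for n's keyword and substitute '#' with the number."""
    keyword = keyword_for(n)
    text = variants.get(keyword, variants["other"])   # "other" is the fallback
    return text.replace("#", str(n))

# Variants taken from the message shown on this slide.
FILE_VARIANTS = {"one": "is one file", "other": "are # files"}

def files_message(num_files):
    return "There " + format_plural(FILE_VARIANTS, num_files) + "."

print(files_message(1))
print(files_message(23))
```

A translation for a language with more categories would add entries such as "few" or "many" to the dictionary and pass a different `keyword_for` function; the calling code stays the same.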
References
ICU Main Site: http://icu-project.org
• Download ICU Releases
• User Guide
• Demonstrations
• Technical FAQ
• Bug Report
• Mailing Lists (design & support)
OpenTM2
Introduction to OpenTM2 – An Open Source Solution for Translators
August 23, 2012 – Version
OpenTM2
An Introduction
Agenda
› General Overview of OpenTM2
› Strategy and Vision of OpenTM2
› Objectives & Benefits of OpenTM2
› OpenTM2 Core Functions
› OpenTM2 Core Modules & Additional Modules
› OpenTM2 Development Schedule
› OpenTM2 Supporter
› Sources for More Information about OpenTM2
OpenTM2 Overview
General Overview – What OpenTM2 is
OpenTM2 is ...
› An open source software project
› A translator's workbench
› Based on IBM TranslationManager/2
› An enterprise-level translation memory CAT software
› Intended as the reference implementation platform for translation asset standards
OpenTM2 is not ...
› A globalization management system
› An open translation management data project
› A machine translation tool or environment
› A complete version of IBM TranslationManager/2
› Free of implementation costs
General Overview – The communities
Steering Committee
› IBM
› Lisog / Folt
› Gala
Community
› Interaction is managed through discussion groups
› TRAC is used to report bugs or request new features.
General Overview - How to contribute to the Project?
› Subscribe to the mailing lists
› Test OpenTM2 and report bugs and request new features
› Review the documentation and help improve it
› Help with OpenTM2 release and maintenance tasks
› Fix bugs and offer patches
› High-level contributors may be invited to take on greater responsibilities
Strategy and Vision – The overall goals
› Develop a reference implementation for translation asset exchange standards
● Think of a lossless exchange of translation memories using TMX
● Think of a lossless exchange of translation dictionaries/glossaries using TBX
› Encourage the development of open standards across the entire content management chain
● Think of a lossless exchange of source files to be translated using XLIFF
● Think of a lossless segmentation of source files using SRX
› Deliver choice to translators and enable them to work on projects without tool lock-in
● Think of OpenTM2 as the open source translation environment
Strategy and Vision – Next steps
› Restructuring:
● Build modules and plug-ins
● Re-design existing features and build new features to be open standards based
● Compile it for multi-platform usage
● Build a better architecture, but re-use existing services
● Make it scalable
› End Goals:
● An open source solution that will produce high quality localization results without high cost:
● Excellent reuse of translation memories and terminology
● Hooks to project management software
● Hooks to various systems: ERP
● Ability to perform real time collaboration across multiple users
Core Modules & Additional Modules
Sources for more information
› The OpenTM2 home page:
● https://sites.google.com/site/opentm2/home
› The source code repository (SVN):
● http://145.253.107.23/svn/opentm2/
› The OpenTM2 Wiki:
● http://www.beo-doc.de/opentm2wiki/index.php/Main_Page
› The OpenTM2 problem reporting database (TRAC):
● http://source.opentm2.org:8000/opentm2/report