XLIFF - the XML based Open Standard for Localisable Content
Tony Jewtushenko
Oracle Corporation - Principal Product Manager
Chair – OASIS XLIFF TC

The XML Localisation Interchange File Format

Agenda
• Open Standards
  – Definition and process
• Overview of XLIFF
  – Definition, goals, and benefits of XLIFF
  – Architecture and Main Features of XLIFF
  – Use cases
• Open Source Localisation
  – Technical Overview
  – Process Overview
  – Use case
• Where does XLIFF fit?
  – Tools Support for XLIFF
  – XLIFF Adoption by Open Source community
Slide 2

What is an Open Standard?
Open standards are:
• Publicly available in stable, persistent versions
• Developed and approved under a published process
• Open to public input: public comments, public archives, no NDAs
• Subject to explicit, disclosed IPR terms
• See the US, EU, WTO governmental & treaty definitions of “standards”
Anything else is proprietary
Source: “Relationship Between Open Standards and Open Source Software”, Patrick Gannon – CEO OASIS, Open Source in Government, Washington, DC, 15-17 March 2004
Slide 3

OASIS: Standards Body Home of XLIFF
• OASIS: Organization for the Advancement of Structured Information Standards
• World’s largest independent, non-profit organization dedicated to the standardisation of eBusiness specifications.
• More than 150 member companies plus individuals
• Operates XML.ORG Registry, the open community clearinghouse of XML application schemas
• Technical work on XML interoperability includes XML conformance and XML Registries/Repositories
• General XML and eBusiness technical resource
Slide 4

OASIS Standards Process
• Specifications are created under an open, democratic, vendor-neutral process
  – Anyone may participate
  – No single organisation can dictate the specification; specifications must meet everyone’s needs
  – All discussions are open to public view and comment
• Two-tiered specification approval process
  – Committee Draft approved by Technical Committee
  – OASIS members approve specification as OASIS Standard
• The process guarantees that specifications are created by a broad range of industry participants, not just a single vendor
Slide 5

XLIFF Overview
A glance at the definitions, goals and benefits of the XML Localisation Interchange File Format.
Slide 6

What is XLIFF?
A specification for the lossless interchange of localizable data and its related information, which is tool-neutral, has been formalized as an XML vocabulary, and features an extensibility mechanism.
Slide 7

Why is XLIFF Needed?
Localization presents the following challenges:
• Insufficient interoperability between tools.
• Lack of support for the overall localization workflow.
• Localization tool developers have to deal with many formats.
• Large number of proprietary intermediate formats.
Slide 8

Advantages – Technology (1/2)
• For a given utility, only one implementation is necessary (e.g. not one spell checker for PO files, and another one for HTML).
• Increases usability of utilities (i.e. all formats with XLIFF filters can be used with XLIFF-enabled utilities).
• Can contain either UI or document content
• Metadata provides integration with automated workflow.
Slide 9

Advantages – Technology (2/2)
• All advantages of XML-based processing:
  – Content validation (XSD)
  – Use of its internationalization features.
  – Better interoperability and cross-platform support.
  – Powerful rendering options (XSL-FO, CSS).
  – Powerful transformation options (XSLT).
  – Greater integration with Web services.
• Access to existing, and often open-source, XML implementations
Slide 10

XLIFF Timeline
• September 2000 - DataDefinition Kickoff
• December 2000 - first face to face
• March 2001 - second face to face
• End March 2001 - draft 1.0 spec and DTD published
• June 2001 - White Paper published
• December 2001 - OASIS XLIFF Technical Committee Proposal submitted
• April 2002 – XLIFF 1.0 Specification approved by formal vote as an OASIS Committee Specification
• May 2003 – XLIFF 1.1 Specification approved by formal vote as an OASIS Committee Specification
• August/Sept 2003 – XLIFF 1.1 Peer Review
• November 2003 – Revised XLIFF 1.1 Specification approved as OASIS Committee Specification
• November 2003 – XLIFF 1.1 Specification submitted for public review
Slide 11

Drivers Behind XLIFF
Alchemy Software
Bowne Global Solutions
Convey Software
Ektron, Inc
ENLASO Corp (RWS)
Globalsight
HP
Lotus/IBM
Lionbridge
LRC
Moravia IT
Novell
Oracle
PASS Engineering
Microsoft
SAP
SDL International
Sun Microsystems
Tektronix
TRADOS
XML-Intl
Slide 12

XLIFF TC in the Standards Community
• Shared interests with OASIS Translation Web Services Technical Committee
  – XLIFF may be used as data container for WS
• Shared interests with the OSCAR SIG at LISA
  – Segmentation and word-count.
  – Content markup (inline codes).
• Shared interests with the W3C i18n WG
  – Localization directives.
  – Best practices.
  – Localization aspects of the W3C recommendations.
  – Web services.
Slide 13

Architecture
A look at XLIFF’s main features and how they work together.
Slide 14

Extract-Localize-Merge Paradigm
• Separate data related to localization from parts not related to localization.
• Merge translated data with codes at the end of the process to create the final document.
• Skeleton file is optional, so this paradigm is also optional
Slide 15

A Bird's-Eye View
An XLIFF document can capture anything needed for a localization project:
1. Localizable objects (e.g. text strings) in source and target languages.
2. Supplementary information (e.g. glossaries, or material to recreate the original format).
3. Administrative information (e.g. workflow data).
4. Custom data (e.g. initialization information for tools).
Slide 16

The XLIFF Document
• An XLIFF document is designed to store the extracted data related to localization.
• Each given source container (e.g. a file, a database table, and so forth) corresponds to a <file> element in XLIFF.
• Each XLIFF document can include several <file> elements.
• A whole localization project can possibly be stored in a single XLIFF document.
Slide 17

Bilingual Model
• Each <file> element is designed to store one source language and one target language.
• The rationale is that translation into different target languages is usually done by different people.
• However, languages in <alt-trans> elements can be different. For example, proposed matches in national Portuguese when translating into Brazilian Portuguese.
Slide 18

Localizable Objects
• XLIFF allows not only text strings as localizable objects but also other object types such as graphics.
• Supplementary information can be represented in a generic way through inline codes (e.g. formatting of text).
• Relationships between objects can be captured (e.g. all items in a menu).
Slide 19

An XLIFF Snippet…
A simple menu represented as XLIFF
Slide 20

Supplementary Info
• XLIFF provides “hooks” for storing supplementary information (for example, references to glossaries or translation memories that should be used).
• The supplementary information can be referenced (i.e. reside outside of the document), or embedded within the document.
Slide 21

Administrative Info
XLIFF provides mechanisms for capturing administrative information:
• For relating source material to XLIFF documents.
• For storing workflow data.
• For providing pre-translation entries generated by TM, MT, translation repository.
• For keeping track of changes.
Slide 22

XLIFF 1.1 Custom Data
In XLIFF 1.1, XLIFF can be customised by extending the following via a private namespace:
  – Elements
  – Attributes
  – Attribute Values
Slide 23

Embedding XLIFF 1.1
• All or part of an XLIFF document can be embedded in another XML document
• The host XML vocabulary is defined by an XML Schema (XSD) that includes an <any> element in the definition of the element where the XLIFF data can be inserted
Slide 24

Use Cases
XLIFF in the localisation process.
Slide 25

Basic Use Case – without XLIFF
[Diagram; labels: Native File 1 (e.g., HTML); Native File 2 (e.g., Java Files); Native File 3 (e.g., Java Properties); Native File n; Developer Applications; Customer Specific Tool(s); Tool Resource Filters; Translator; Publisher/Customer Domain; Localisation Domain]
Slide 26

Basic Use Case – with XLIFF
[Diagram; labels: Direct to XLIFF authoring; XLIFF compliant Developer Applications; - OR -; Pre-processing; HTML; RC Data; Java Properties; Non XLIFF compliant Developer Applications; XLIFF file(s) containing HTML, Java, Properties, etc translatable resources; XLIFF Compliant Editor; Translator; Publisher/Customer Domain; Localisation Domain]
Slide 27

Automated Localisation with CAT Use Case
[Diagram; labels: Developer; Generate XLIFF; XLIFF Translation Kit; Pseudo Translate / Test; Defect Report; Requires Translation; 0% Translated; 100% match; Fuzzy match; Machine Translate; Translate; Translation Repository; Translation Memory; Machine Translation; Localization Engineer; XLIFF Editor; Update; 100% Translated; 100% Translated XLIFF Translation Kit; Translator]
Slide 28

Open Source Localisation
Issues specific to localising Open Source software.
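The pre-processing path of the "with XLIFF" use case, in which resources from non-XLIFF-compliant applications (such as Java properties) are converted into XLIFF files, can be sketched as follows. This is a minimal Python illustration under stated assumptions, not a reference converter: the function name properties_to_xliff, the sample keys and the file name menu.properties are hypothetical, no header or skeleton is written, and the output has not been validated against the XLIFF 1.1 schema.

```python
import xml.etree.ElementTree as ET

# XLIFF 1.1 namespace URI; every other name below is illustrative.
XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.1"


def properties_to_xliff(pairs, original, source_lang, target_lang):
    """Wrap key/value resources in one bilingual <file> element."""
    xliff = ET.Element("xliff", {"version": "1.1", "xmlns": XLIFF_NS})
    # One source container (here: one properties file) maps to one <file>.
    file_el = ET.SubElement(xliff, "file", {
        "original": original,
        "source-language": source_lang,
        "target-language": target_lang,
        "datatype": "javapropertyresourcebundle",
    })
    body = ET.SubElement(file_el, "body")
    for key, value in pairs.items():
        # Each localizable string becomes a <trans-unit> keyed by its id.
        unit = ET.SubElement(body, "trans-unit", {"id": key})
        ET.SubElement(unit, "source").text = value
    return ET.tostring(xliff, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical menu resources, as they might appear in a properties file.
    print(properties_to_xliff({"menu.file": "File", "menu.open": "Open"},
                              "menu.properties", "en", "fr"))
```

The sketch keeps to the bilingual model described earlier: one source language and one target language per <file> element, with <target> elements left for the translation step.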
Slide 29

Open Source Resource Formats
• User Assistance (Help):
  – DocBook as intermediate container
• UI Resources:
  – Many different format types, but converge on:
    • PO / POT
    • Java Resource Bundles (.properties & .java)
Slide 30

DocBook
• Formed in 1991
• SGML and XML versions
• Many commercial XML editors optimised for DocBook
• No good Open Source XML editors available.
• GNU converts DocBook (XML) to PO files, translates, then converts back.
• DocBook converted to HTML dynamically by Yelp Help Browser.
• To optimise performance, can pre-convert to HTML
Slide 31

UI Resource Format – Java Resources
• ListResourceBundle
  – .java file
  – Can contain binary data
  – Compiled into class file
• PropertyResourceBundles
  – .properties file
  – Contain strings only
  – Values acquired at runtime
  – Requires 8859-1 encoding
  – Non 8859-1 characters represented as Unicode escape codes (i.e., \uxxxx)
  – native2ascii to convert non 8859-1 content
Slide 32

UI Resource Format – Java Resources
• Localization challenges:
  – Each file contains one language/locale pair
  – Key / Value Pairs
  – No normalized metadata; comments often used for ad hoc metadata.
Slide 33

UI Resource Format - PO
PO (Portable Object) Files, and POT (templates)
  – A “Catalog”
  – Bi-lingual model
  – Resource bundle accessed by “gettext()”
  – Text files
  – Utilities available to convert from many resource types to PO (e.g., C, Delphi, Java, Python, etc.)
  – Compiled into “MO” files
  – Support for Plurals
  – Limited metadata
  – Used by most GNU, GNOME, KDE and other Open Source projects
Slide 34

PO File Syntax
Comments:
  # SOME DESCRIPTIVE TITLE.
  # Copyright (C) YEAR THE PACKAGE'S COPYRIGHT HOLDER
Header:
  msgid ""
  msgstr ""
  "Project-Id-Version: Project Version \n"
  "PO-Revision-Date: YYYY-MM-DD HH:MM+ZZZZ\n"
  "Last-Translator: TranslatorName <email>\n"
  "MIME-Version: 1.0\n"
  "Content-Type: text/plain; charset=code\n"
  "Content-Transfer-Encoding: 8bit\n"
  "POT-Creation-Date: \n"
  "Language-Team: \n"
Separator:
  white-space (usually a single new line)
Resource(s), preceded by segment metadata:
  # translator-comments
  #. automatic-comments
  #: reference...
  #, flag...
  msgid untranslated-string
  msgstr translated-string
Slide 35

PO File Plural Form
Plural form of a message in the PO file looks like this:
  white-space
  # translator-comments
  #. automatic-comments
  #: reference...
  #, flag...
  msgid untranslated-string
  msgid_plural untranslated-string-plural-form
  msgstr[0] translated-string-plural-form
  msgstr[1] translated-string-plural-form
  msgstr[n] translated-string-plural-form
“n” is language specific
Slide 36

PO File Plural Forms Syntax / Examples
Syntax:
  msgid untranslated-string
  msgid_plural untranslated-string-plural-form
  msgstr[0] translated-string-plural-form
  msgstr[1] translated-string-plural-form
  msgstr[n] translated-string-plural-form
French:
  msgid "%s file"
  msgid_plural "%s files"
  msgstr[0] "%s fichier"
  msgstr[1] "%s fichiers"
Polish:
  msgid "%s file"
  msgid_plural "%s files"
  msgstr[0] "%s plik"
  msgstr[1] "%s pliki"
  msgstr[2] "%s plików"
Slide 37

PO File Localization Challenges
• Plural Forms Challenges
  – Rules differ across languages, and implementations differ across platforms.
  – PO editing tools (Poedit, KBabel) don’t support plural forms well and recommend using text editors.
• Limited normalized metadata
• Little or no context information for translators
• DocBook represented as PO files loses metadata
• Limited support for segmentation, alignment
Slide 38

Simplified GNU/KDE Style Use Case
[Diagram; labels: UI Developer; Generate PO Files; PO; i18n Coordinator; PO Preparation & Project Management; CVS; CVSUP; Documentation Author; Text Editor; Docbook; Docbook/PO converter; PO/Docbook converter; Translator; PO Editor; PO Translation; TM; Developer Domain; Localisation Domain]
Slide 39

Open Source Localisation Process
• Localization in the Open Source community is very technical, and almost entirely manual: the primary interface is CVS, even for translators (e.g. http://i18n.kde.org/translation-howto/index.html)
• Process and tools differ from project to project, even language to language.
• Little or no formal linguistic review: quality and style consistency vary widely.
• Project Management and translation are performed by volunteers.
Slide 40

Tools Support
A survey of localization tools that support XLIFF
Slide 41

XML-Enabled Translation Tools
• Any XML-enabled translation tool can work with an XLIFF document, as long as the text to translate is initially copied into the <target> elements. This does not mean the tool supports all XLIFF features; it only permits translation of <target> content.
• Many tools cannot handle conditional translation (for example: <trans-unit translate="no">). In that case, extra elements have to be added temporarily.
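The preparation step described above, copying the text to translate into the <target> elements while leaving units marked translate="no" alone, can be sketched as follows. This is a minimal Python sketch, not the behaviour of any particular tool: the function name prefill_targets and the sample document are hypothetical, and only the XLIFF 1.1 namespace URI and the element and attribute names come from the specification.

```python
import copy
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.1"
ET.register_namespace("", XLIFF_NS)  # serialise without an ns0: prefix

# Hypothetical sample: one translatable unit, one protected unit.
SAMPLE = """<xliff version="1.1" xmlns="urn:oasis:names:tc:xliff:document:1.1">
  <file original="menu.properties" source-language="en"
        target-language="fr" datatype="javapropertyresourcebundle">
    <body>
      <trans-unit id="open"><source>Open <g id="1">file</g></source></trans-unit>
      <trans-unit id="brand" translate="no"><source>Acme</source></trans-unit>
    </body>
  </file>
</xliff>"""


def q(tag):
    """Return the namespace-qualified name ElementTree expects."""
    return "{%s}%s" % (XLIFF_NS, tag)


def prefill_targets(xliff_text):
    """Copy <source> into a new <target> for every translatable unit."""
    root = ET.fromstring(xliff_text)
    # Materialise the list first: the tree is modified inside the loop.
    for unit in list(root.iter(q("trans-unit"))):
        if unit.get("translate") == "no":
            continue  # conditional translation: leave the unit untouched
        source = unit.find(q("source"))
        if source is None or unit.find(q("target")) is not None:
            continue  # nothing to copy, or a target already exists
        # Deep copy so that inline codes such as <g> are carried over too.
        target = copy.deepcopy(source)
        target.tag = q("target")
        unit.insert(list(unit).index(source) + 1, target)
    return root


if __name__ == "__main__":
    print(ET.tostring(prefill_targets(SAMPLE), encoding="unicode"))
```

Skipping the protected units here, rather than in the translation tool, is one way to cope with tools that ignore translate="no"; it does not replace real XLIFF support in the editor.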
Slide 42

XLIFF Enabled Commercial Tools
• Alchemy Software - Catalyst 5.0 – Visual XLIFF 1.1 Editor http://www.alchemysoftware.ie
• Heartsome XLIFF Editor, support for PO files, DocBook: http://www.heartsome.net
• PASS: Passolo: Visual XLIFF Editor: http://www.passolo.com
• Trados: No direct XLIFF support yet, but can edit XLIFF files using modified INI
• XML-Intl: XLIFF Editor http://www.xml-intl.com
Slide 43

XLIFF Enabled Shareware/Freeware
• ENLASO Corp (formerly “RWS Group”): Extraction Utility for RC Data and Java Properties to XLIFF 1.1 http://dotnet.goglobalnow.net/
  Various Freeware Utilities, including converters for PO files: http://www.translate.com/shared/tools
Slide 44

XLIFF Enabled Open Source
• International Components for Unicode (ICU):
  – Open Source set of C/C++ and Java libraries for Unicode support, software internationalization and globalization, extends JDK i18n
  – genrb, and XLIFF2ICUConverter class to convert between common formats and XLIFF
  – Includes RBManager, a Java based resource bundle editor with XLIFF support
  http://oss.software.ibm.com/icu/
Slide 45

XLIFF Enabled Open Source
• Okapi Framework XSL Template Collection:
  – Sample utilities for transforming XLIFF to PO, RC, Java Properties
  http://sourceforge.net/project/showfiles.php?group_id=42949&release_id=67485
• xliffRoundTrip tool
  – Transforms any XML file to/from XLIFF using XSLT
  http://sourceforge.net/projects/xliffroundtrip/
• Lionbridge ForeignDesk
  – Incomplete XLIFF support
  http://sourceforge.net/projects/foreigndesk/
Slide 46

Future Support for XLIFF
Announced:
• Apple Corp: Apple’s resource editor AppleGlot
• Idiom: Worldserver V.6.0
• SDL International: SDLX support for XLIFF currently in development. See http://www.sdlx.com for more information.
• uPortal: Open Source Web portal infrastructure for Universities – XLIFF support announced for Version 3.0, to be released in 2005
Slide 47

Where does XLIFF fit?
• Good choice for projects with multiple resource formats, especially good for XML.
• XLIFF addresses the process and metadata related problems of Open Source projects:
  – Supports workflow metadata.
  – Supports multiple resource formats
  – Normalised translation memory / repository data.
  – Simplifies translator usability experience.
Slide 48

Where does XLIFF fit?
• Issues Blocking Adoption by Open Source:
  – Adoption requires retooling - lack of existing open source XLIFF tools for PO and DocBook.
  – PO tools deemed adequate for current requirements
  – “Volunteer” model reduces urgency to reduce costs
Slide 49

Where does XLIFF fit?
• Issues Encouraging Adoption by Open Source:
  – Increase in commercial product development for Open Source platforms
    • Translation not volunteer effort - cost control important.
    • Integration with existing automation required.
    • Increased availability of commercial tools that support XLIFF
  – Increase in Java Open Source projects
    • Java projects are well supported by XLIFF.
    • Well documented L10n best practices include XLIFF
    • Available commercial and Open Source tools
Slide 50

More Information
• The XLIFF TC Web Site: http://www.xliff.org
• A “best practice” from Sun Developer Network: http://developers.sun.com/dev/gadc/technicalpublications/whitepapers/translation_technology_sun.html
• Presenter:
  – XLIFF TC Chair: Tony Jewtushenko (Oracle) (tony.jewtushenko@oracle.com)
Slide 51

Thank You...
Questions?
Slide 52