An Introduction to XLIFF

XLIFF - the XML based Open
Standard for Localisable
Content
Tony Jewtushenko
Oracle Corporation - Principal Product Manager
Chair – OASIS XLIFF TC
The XML Localisation Interchange File Format
Agenda
• Open Standards
– Definition and process
• Overview of XLIFF
– Definition, goals, and benefits of XLIFF
– Architecture and Main Features of XLIFF
– Use cases
• Open Source Localisation
– Technical Overview
– Process Overview
– Use case
• Where does XLIFF fit?
– Tools Support for XLIFF
– XLIFF Adoption by Open Source community
Slide 2
What is an Open Standard?
Open standards are:
• Publicly available in stable, persistent versions
• Developed and approved under a published process
• Open to public input: public comments, public archives, no
NDAs
• Subject to explicit, disclosed IPR terms
• See the US, EU, WTO governmental & treaty definitions of
“standards”
Anything else is proprietary
Source: “Relationship Between Open Standards and Open Source Software”, Patrick Gannon –
CEO OASIS, Open Source in Government, Washington, DC, 15-17 March 2004
Slide 3
OASIS: Standards Body Home of XLIFF
• OASIS: Organization for the Advancement of
Structured Information Standards
• World’s largest independent, non-profit organization
dedicated to the standardisation of eBusiness
specifications.
• More than 150 member companies plus individuals
• Operates XML.ORG Registry, the open community
clearinghouse of XML application schemas
• Technical work on XML interoperability includes
XML conformance and XML Registries/Repositories
• General XML and eBusiness technical resource
Slide 4
OASIS Standards Process
• Specifications are created under an open, democratic,
vendor-neutral process
– Anyone may participate
– No single organisation can dictate the specification; specifications must meet everyone’s needs
– All discussions are open to the public view and comment
• Two Tiered Specification approval process
– Committee Draft approved by Technical Committee
– OASIS members approve specification as OASIS Standard
• Process guarantees that specifications are created by a
broad range of industry, not just a single vendor
Slide 5
XLIFF Overview
A glance at the definitions, goals and benefits
of the XML Localisation Interchange File
Format.
Slide 6
What is XLIFF?
A specification for the lossless interchange of
localizable data and its related information,
which is tool-neutral, has been formalized as an
XML vocabulary, and features an extensibility
mechanism.
Slide 7
Why is XLIFF Needed?
Localization poses the following challenges:
• Insufficient interoperability between tools.
• Lack of support for overall localization
workflow.
• Localization tool developers have to
deal with many formats.
• Large number of proprietary intermediate
formats.
Slide 8
Advantages – Technology
(1/2)
• For a given utility, only one implementation is
necessary (e.g. not one spell checker for PO
Files, and another one for HTML).
• Increases usability of utilities (i.e. all formats
with XLIFF filters can be used with XLIFF-enabled utilities).
• Can contain either UI or Document content
• Metadata provides integration with automated
workflow.
Slide 9
Advantages – Technology
(2/2)
• All advantages of XML-based processing:
– Content validation (XSD).
– Use of its internationalization features.
– Better interoperability and cross-platform support.
– Powerful rendering options (XSL-FO, CSS).
– Powerful transformation options (XSLT).
– Greater integration with Web services.
• Access to existing, and often open-source,
XML implementations
Slide 10
XLIFF Timeline
• September 2000 - DataDefinition Kickoff
• December 2000 - first face to face
• March 2001 - second face to face
• End March 2001 - draft 1.0 spec and DTD published
• June 2001 - White Paper published
• December 2001 - OASIS XLIFF Technical Committee Proposal submitted
• April 2002 - XLIFF 1.0 Specification approved by formal vote as an OASIS Committee Specification
• May 2003 - XLIFF 1.1 Specification approved by formal vote as an OASIS Committee Specification
• August/Sept 2003 - XLIFF 1.1 Peer Review
• November 2003 - Revised XLIFF 1.1 Specification approved as OASIS Committee Specification
• November 2003 - XLIFF 1.1 Specification submitted for public review
Slide 11
Drivers Behind XLIFF
• Alchemy Software
• Bowne Global Solutions
• Convey Software
• Ektron, Inc
• ENLASO Corp (RWS)
• Globalsight
• HP
• Lotus/IBM
• Lionbridge
• LRC
• Moravia IT
• Novell
• Oracle
• PASS Engineering
• Microsoft
• SAP
• SDL International
• Sun Microsystems
• Tektronix
• TRADOS
• XML-Intl
Slide 12
XLIFF TC in the Standards Community
• Shared interests with OASIS Translation Web
Services Technical Committee
– XLIFF may be used as data container for WS
• Shared interests with the OSCAR SIG at LISA
– Segmentation and word-count.
– Content markup (inline codes).
• Shared interests with the W3C i18n WG
– Localization directives.
– Best practices.
– Localization aspects of the W3C recommendations.
– Web services.
Slide 13
Architecture
A look at XLIFF’s main features and how they
work together.
Slide 14
Extract-Localize-Merge Paradigm
• Separate data related to localization from parts not
related to localization.
• Merge translated data with codes at the end of the
process to create the final document.
• Skeleton file is optional, so this paradigm is also
optional
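The extract and merge steps can be illustrated with a minimal Python sketch. It assumes a simple key=value resource file as the native format and a made-up %%%key%%% placeholder convention for the skeleton; neither the function names nor the placeholder syntax come from the XLIFF specification.

```python
def extract(properties_text):
    """Split key=value text into a skeleton and the translatable strings."""
    skeleton, units = [], {}
    for line in properties_text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(("#", "!")) and "=" in stripped:
            key, _, value = stripped.partition("=")
            key = key.strip()
            units[key] = value.strip()
            # The skeleton keeps the structure; a placeholder marks the text.
            skeleton.append(f"{key}=%%%{key}%%%")
        else:
            # Comments and blank lines are not localizable; keep them as-is.
            skeleton.append(line)
    return "\n".join(skeleton), units


def merge(skeleton, translations):
    """Put translated strings back into the skeleton."""
    merged = skeleton
    for key, text in translations.items():
        merged = merged.replace(f"%%%{key}%%%", text)
    return merged


source = "# menu labels\nfile.open=Open\nfile.close=Close"
skeleton, units = extract(source)
localized = merge(skeleton, {"file.open": "Ouvrir", "file.close": "Fermer"})
```

In a real workflow the extracted units would travel in an XLIFF file and the skeleton would be stored or referenced alongside it.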
Slide 15
A Bird’s-Eye View
An XLIFF document can capture anything needed for a localization project:
1. Localizable objects (e.g. text strings) in source and target languages.
2. Supplementary information (e.g. glossaries, or material to recreate the original format).
3. Administrative information (e.g. workflow data).
4. Custom data (e.g. initialization information for tools).
Slide 16
The XLIFF Document
• An XLIFF document is designed to store the
extracted data related to localization.
• Each given source container (e.g. a file, a
database table, and so forth) corresponds to a
<file> element in XLIFF.
• Each XLIFF document can include several
<file> elements.
• A whole localization project can possibly be
stored in a single XLIFF document.
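As a rough illustration, the Python sketch below reads a small XLIFF 1.1 document holding two <file> elements and lists what each one contains. The sample file names, ids and strings are invented for the example.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.1"

# Invented sample: one XLIFF document wrapping two source containers.
SAMPLE = """<xliff version="1.1" xmlns="urn:oasis:names:tc:xliff:document:1.1">
  <file original="index.html" datatype="html"
        source-language="en" target-language="fr">
    <body>
      <trans-unit id="1"><source>Welcome</source><target>Bienvenue</target></trans-unit>
    </body>
  </file>
  <file original="Messages.properties" datatype="javapropertyresourcebundle"
        source-language="en" target-language="fr">
    <body>
      <trans-unit id="greeting"><source>Hello</source></trans-unit>
    </body>
  </file>
</xliff>"""


def list_files(xliff_text):
    """Return (original, source language, target language, unit count) per <file>."""
    root = ET.fromstring(xliff_text)
    ns = {"x": XLIFF_NS}
    summary = []
    for file_el in root.findall("x:file", ns):
        units = file_el.findall(".//x:trans-unit", ns)
        summary.append((file_el.get("original"),
                        file_el.get("source-language"),
                        file_el.get("target-language"),
                        len(units)))
    return summary


files = list_files(SAMPLE)
```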
Slide 17
Bilingual Model
• Each <file> element is designed to store one
source language and one target language.
• The rationale is that translation into different
target languages is usually done by different
people.
• However, the language in an <alt-trans> element
can differ from the target language. For example,
proposed matches in European Portuguese when
translating into Brazilian Portuguese.
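A small sketch of the <alt-trans> case, in Python: the file targets Brazilian Portuguese (pt-BR) while the proposed match is marked as European Portuguese (pt-PT) through xml:lang. The sample content and the match-quality value are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

NS = {"x": "urn:oasis:names:tc:xliff:document:1.1"}
# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Invented sample: the proposal is in pt-PT, the file targets pt-BR.
SAMPLE = """<xliff version="1.1" xmlns="urn:oasis:names:tc:xliff:document:1.1">
  <file original="ui.properties" datatype="javapropertyresourcebundle"
        source-language="en" target-language="pt-BR">
    <body>
      <trans-unit id="save">
        <source>Save file</source>
        <alt-trans match-quality="80%">
          <target xml:lang="pt-PT">Guardar ficheiro</target>
        </alt-trans>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def proposals(xliff_text):
    """Return (file target language, proposal language, proposal text) tuples."""
    root = ET.fromstring(xliff_text)
    found = []
    for file_el in root.findall("x:file", NS):
        target_lang = file_el.get("target-language")
        for alt_target in file_el.findall(".//x:alt-trans/x:target", NS):
            found.append((target_lang, alt_target.get(XML_LANG), alt_target.text))
    return found


matches = proposals(SAMPLE)
```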
Slide 18
Localizable Objects
• XLIFF allows not only text strings as localizable
objects but also other object types, such as
graphics.
• Supplementary information can be represented
in a generic way through inline codes (e.g.
formatting of text).
• Relationships between objects can be captured
(e.g. all items in a menu).
Slide 19
An XLIFF Snippet…
A simple menu represented as XLIFF
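One way such a menu could look is sketched below in Python, which builds the XLIFF with the standard library. This is an illustrative reconstruction under stated assumptions, not the snippet from the original slide: the menu items, ids, the file name menus.rc and the choice of restype and datatype values are all assumptions.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.1"


def menu_to_xliff(menu_id, items, source_lang="en"):
    """Build an XLIFF 1.1 document where a <group> holds the items of one menu."""
    xliff = ET.Element("xliff", {"version": "1.1", "xmlns": XLIFF_NS})
    file_el = ET.SubElement(xliff, "file", {
        "original": "menus.rc",      # hypothetical source container
        "datatype": "winres",
        "source-language": source_lang,
    })
    body = ET.SubElement(file_el, "body")
    # The <group> captures the relationship: all items belong to one menu.
    group = ET.SubElement(body, "group", {"id": menu_id, "restype": "menu"})
    for item_id, label in items:
        unit = ET.SubElement(group, "trans-unit",
                             {"id": item_id, "restype": "menuitem"})
        ET.SubElement(unit, "source").text = label
    return ET.tostring(xliff, encoding="unicode")


menu_xml = menu_to_xliff("file_menu", [("open", "Open"), ("exit", "Exit")])
```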
Slide 20
Supplementary Info
• XLIFF provides “hooks” for storing
supplementary information (for example,
references to the glossaries or translation
memories that should be used).
• The supplementary information can be
referenced (i.e. reside outside of the document),
or embedded within the document.
Slide 21
Administrative Info
XLIFF provides mechanisms for capturing
administrative information:
• For relating source material to XLIFF
documents.
• For storing workflow data.
• For providing pre-translation entries generated
by TM, MT, or a translation repository.
• For keeping track of changes.
Slide 22
XLIFF 1.1 Custom Data
In XLIFF 1.1, we have the ability to customise
XLIFF by extending via private namespace:
– Elements
– Attributes
– Attribute Values
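A minimal Python sketch of what such extensions might look like and how a tool could detect them. The acme prefix, its namespace URI, the max-pixels attribute, the screenshot element and the x-acme-button value are all hypothetical names invented for this example.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.1"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Hypothetical private namespace "acme" adding one attribute and one element;
# restype="x-acme-button" illustrates a user-defined attribute value.
SAMPLE = """<xliff version="1.1" xmlns="urn:oasis:names:tc:xliff:document:1.1"
       xmlns:acme="urn:example:acme-l10n">
  <file original="app.rc" datatype="winres" source-language="en">
    <body>
      <trans-unit id="ok" restype="x-acme-button" acme:max-pixels="80">
        <source>OK</source>
        <acme:screenshot href="dialogs/ok.png"/>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def custom_data(xliff_text):
    """List elements and attributes that live outside the XLIFF namespace."""
    root = ET.fromstring(xliff_text)
    results = []
    for el in root.iter():
        if not el.tag.startswith("{" + XLIFF_NS + "}"):
            results.append(("element", el.tag))
        for name in el.attrib:
            # Namespaced attributes other than xml:* are private extensions.
            if name.startswith("{") and not name.startswith("{" + XML_NS + "}"):
                results.append(("attribute", name))
    return results


extensions = custom_data(SAMPLE)
```

A tool that does not know the private namespace can ignore these items and still process the standard XLIFF content.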
Slide 23
Embedding XLIFF 1.1
• An entire XLIFF document, or part of one, can be
embedded in another XML document
• The host XML must be defined by an XML Schema
(XSD) that includes an <any> element in the
definition of the element where the XLIFF data
can be inserted
Slide 24
Use Cases
XLIFF in the localisation process.
Slide 25
Basic Use Case – without XLIFF
[Diagram: in the publisher/customer domain, developer applications produce native files 1 to n (e.g., HTML, Java files, Java properties). In the localisation domain, these pass through customer-specific tool(s) and tool resource filters before reaching the translator.]
Slide 26
Basic Use Case – with XLIFF
[Diagram: in the publisher/customer domain, XLIFF-compliant developer applications author directly to XLIFF, or output from non-XLIFF-compliant developer applications (HTML, RC data, Java, properties) is pre-processed into XLIFF. In the localisation domain, the translator works on the XLIFF file(s) containing the translatable resources in an XLIFF-compliant editor.]
Slide 27
Automated Localisation with CAT Use Case
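[Diagram: the developer generates a 0% translated XLIFF translation kit, which is pseudo-translated and tested, with defects reported back. Content that requires translation is routed by match type: 100% matches and fuzzy matches come from the translation repository and translation memory, and the remainder goes to machine translation. The translator completes the work in an XLIFF editor, and the localization engineer updates the repository with the 100% translated XLIFF translation kit.]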
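The match-based routing in this use case (100% match, fuzzy match, machine translate) can be sketched in Python as follows. The similarity measure (difflib's ratio) and the 0.75 threshold are assumptions chosen for illustration; commercial CAT tools use their own matching algorithms.

```python
from difflib import SequenceMatcher


def pretranslate(source, memory, fuzzy_threshold=0.75):
    """Decide how a source string is pre-translated.

    memory maps previously translated source strings to their translations.
    Returns (route, proposed translation or None).
    """
    if source in memory:
        return ("100% match", memory[source])
    best_ratio, best_source = 0.0, None
    for candidate in memory:
        ratio = SequenceMatcher(None, source, candidate).ratio()
        if ratio > best_ratio:
            best_ratio, best_source = ratio, candidate
    if best_source is not None and best_ratio >= fuzzy_threshold:
        # Close enough to reuse as a proposal; a translator still reviews it.
        return ("fuzzy match", memory[best_source])
    # Nothing usable in the memory: hand the string to machine translation.
    return ("machine translate", None)


memory = {"Open file": "Ouvrir le fichier"}
exact = pretranslate("Open file", memory)
fuzzy = pretranslate("Open files", memory)
unseen = pretranslate("Quit", memory)
```

In an XLIFF workflow, the fuzzy proposals would typically be stored as <alt-trans> entries rather than written straight into <target>.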
Slide 28
Open Source Localisation
Issues specific to localising Open Source
software.
Slide 29
Open Source Resource Formats
• User Assistance (Help):
– DocBook as intermediate container
• UI Resources:
– Many different format types, but converge on:
• PO / POT
• Java Resource Bundles (.properties & .java)
Slide 30
Docbook
• Formed in 1991
• SGML and XML versions
• Many commercial XML editors optimised for
Docbook
• No good Open Source XML editors available.
• GNU converts Docbook to (XML->) PO files,
translates, then converts back.
• Docbook converted to HTML dynamically by Yelp
Help Browser.
• To optimise performance can pre-convert to HTML
Slide 31
UI Resource Format – Java Resources
• ListResourceBundle
– .java file
– Can contain binary data
– Compiled into class file
• PropertyResourceBundles
– .properties file
– Contain strings only
– Values acquired at runtime
– Requires 8859-1 encoding
– Non 8859-1 characters represented as Unicode escape codes (i.e., \uxxxx)
– native2ascii to convert non 8859-1 content
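The escaping that .properties files need can be sketched in Python. This is an approximation of what a converter such as native2ascii produces, not a reimplementation of it: the sketch escapes every non-ASCII character (which keeps the file valid 8859-1) and writes characters outside the Basic Multilingual Plane as two \uxxxx escapes, one per UTF-16 code unit.

```python
def escape_for_properties(text):
    """Replace non-ASCII characters with \\uxxxx escapes."""
    parts = []
    for ch in text:
        if ord(ch) < 0x80:
            parts.append(ch)
        else:
            # Java escapes are UTF-16 code units, so encode and split in pairs.
            units = ch.encode("utf-16-be")
            for i in range(0, len(units), 2):
                parts.append("\\u%04x" % int.from_bytes(units[i:i + 2], "big"))
    return "".join(parts)


french = escape_for_properties("Fermer la fenêtre")
polish = escape_for_properties("plików")
```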
Slide 32
UI Resource Format – Java Resources
• Localization challenges:
– Each file contains 1 language locale pair
– Key / Value Pairs
– No normalized metadata – comments often used for
ad hoc metadata.
Slide 33
UI Resource Format - PO
PO (Portable Object) Files, and POT (templates)
– A “Catalog”
– Bilingual model
– Resource bundle accessed by “gettext()”
– Text files
– Utilities available to convert from many resource types to PO (e.g., C, Delphi, Java, Python)
– Compiled into “MO” files
– Support for plurals
– Limited metadata
– Used by most GNU, GNOME, KDE and other Open Source projects
Slide 34
PO File Syntax
Comments:
# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR THE PACKAGE'S COPYRIGHT HOLDER
Header:
msgid ""
msgstr ""
"Project-Id-Version: Project Version \n"
"PO-Revision-Date: YYYY-MM-DD HH:MM+ZZZZ\n"
"Last-Translator: TranslatorName <email>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=code\n"
"Content-Transfer-Encoding: 8bit\n"
"POT-Creation-Date: \n"
"Language-Team: \n"
Separator:
white-space (usually a single new line)
Resource(s), each preceded by its segment metadata:
# translator-comments
#. automatic-comments
#: reference...
#, flag...
msgid untranslated-string
msgstr translated-string
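To show how the comment prefixes map to metadata, here is a minimal Python sketch that reads one PO entry. It is deliberately incomplete: it handles only the \n and \" escapes and ignores plural forms and message contexts. The sample entry is invented.

```python
def unquote(po_string):
    """Strip the surrounding quotes and resolve the two most common escapes."""
    inner = po_string.strip()[1:-1]
    return inner.replace('\\n', '\n').replace('\\"', '"')


def parse_po_entry(block):
    """Parse one PO entry (its comment lines plus msgid/msgstr) into a dict."""
    entry = {"translator_comments": [], "automatic_comments": [],
             "references": [], "flags": [], "msgid": "", "msgstr": ""}
    current = None
    for line in block.splitlines():
        line = line.strip()
        if line.startswith("#."):
            entry["automatic_comments"].append(line[2:].strip())
        elif line.startswith("#:"):
            entry["references"].extend(line[2:].split())
        elif line.startswith("#,"):
            entry["flags"].extend(flag.strip() for flag in line[2:].split(","))
        elif line.startswith("#"):
            entry["translator_comments"].append(line[1:].strip())
        elif line.startswith("msgid "):
            current = "msgid"
            entry[current] = unquote(line[len("msgid "):])
        elif line.startswith("msgstr "):
            current = "msgstr"
            entry[current] = unquote(line[len("msgstr "):])
        elif line.startswith('"') and current:
            # Continuation line: append to the string being built.
            entry[current] += unquote(line)
    return entry


SAMPLE = '''# Checked by the French team
#. Label of the File > Open menu item
#: src/menu.c:42
#, c-format
msgid "Open %s"
msgstr "Ouvrir %s"'''

entry = parse_po_entry(SAMPLE)
```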
Slide 35
PO File Plural Form
The plural form of a message in the PO file looks like this:
white-space
# translator-comments
#. automatic-comments
#: reference...
#, flag...
msgid untranslated-string-singular
msgid_plural untranslated-string-plural
msgstr[0] translated-string-case-0
msgstr[1] translated-string-case-1
msgstr[n] translated-string-case-n
The number of plural forms (“n”) is language specific
Slide 36
PO File Plural Forms Syntax / Examples
Syntax
msgid untranslated-string-singular
msgid_plural untranslated-string-plural
msgstr[0] translated-string-case-0
msgstr[1] translated-string-case-1
msgstr[n] translated-string-case-n
French
msgid "%s file"
msgid_plural "%s files"
msgstr[0] "%s fichier"
msgstr[1] "%s fichiers"
Polish
msgid "%s file"
msgid_plural "%s files"
msgstr[0] "%s plik"
msgstr[1] "%s pliki"
msgstr[2] "%s plików"
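How a count selects one of the msgstr[i] entries can be sketched in Python. The two rules below follow the plural expressions commonly used in gettext catalogs for French (plural = n > 1) and Polish; the function and variable names are just for this example.

```python
def plural_index_french(n):
    """French: index 0 for 0 and 1, index 1 otherwise."""
    return 1 if n > 1 else 0


def plural_index_polish(n):
    """Polish: three forms, chosen by the last digits of the count."""
    if n == 1:
        return 0
    if 2 <= n % 10 <= 4 and (n % 100 < 10 or n % 100 >= 20):
        return 1
    return 2


FRENCH = ["%s fichier", "%s fichiers"]
POLISH = ["%s plik", "%s pliki", "%s plików"]


def render(forms, index_rule, n):
    """Pick the msgstr[i] for count n and substitute the count."""
    return forms[index_rule(n)] % n


examples = [render(POLISH, plural_index_polish, n) for n in (1, 3, 5, 12, 22)]
```

The Polish rule shows why a simple singular/plural split is not enough: 3 and 22 take one form, while 5 and 12 take another.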
Slide 37
PO File Localization Challenges
• Plural Forms Challenges
– Rules differ across languages, and implementations differ across platforms.
– PO editing tools (poEdit, KBabel) don’t support plural forms well, and recommend using text editors.
• Limited normalized metadata
• Little or no context information for translators
• Docbook represented as PO files loses metadata
• Limited support for segmentation, alignment
Slide 38
Simplified GNU/KDE Style Use Case
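[Diagram: in the developer domain, the UI developer generates PO files and the documentation author writes Docbook, which a Docbook/PO converter turns into PO; files are held in CVS, and the i18n coordinator handles preparation and project management. In the localisation domain, the translator retrieves PO files via CVSUP and translates them in a text editor or a PO editor with TM; a PO/Docbook converter turns translated PO back into Docbook.]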
Slide 39
Open Source Localisation Process
• Localization in the Open Source community is
very technical, and almost entirely manual –
primary interface is CVS, even for translators
(e.g. http://i18n.kde.org/translation-howto/index.html)
• Process and tools differ from project to project,
even language to language.
• Little or no formal linguistic review: quality,
style consistency vary widely.
• Project Management and translation are
performed by volunteers.
Slide 40
Tools Support
A survey of localization tools that support
XLIFF
Slide 41
XML-Enabled Translation Tools
• Any XML-enabled translation tool can work
with an XLIFF document, as long as the text to
translate is first copied into the <target>
elements. This does not mean the tool
supports all XLIFF features; it only
permits translation of the <target> content.
• Many tools cannot handle conditional
translation (for example: <trans-unit
translate="no">). In that case, you need to add
extra elements temporarily.
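The preparation step of copying source text into <target> can be sketched in Python. This is one possible approach, not a prescribed procedure: it skips units marked translate="no" and units that already have a target. The sample content is invented.

```python
import copy
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.1"
UNIT = "{%s}trans-unit" % XLIFF_NS
SOURCE = "{%s}source" % XLIFF_NS
TARGET = "{%s}target" % XLIFF_NS

SAMPLE = """<xliff version="1.1" xmlns="urn:oasis:names:tc:xliff:document:1.1">
  <file original="app.properties" datatype="javapropertyresourcebundle"
        source-language="en" target-language="de">
    <body>
      <trans-unit id="title"><source>Settings</source></trans-unit>
      <trans-unit id="brand" translate="no"><source>XLIFF</source></trans-unit>
    </body>
  </file>
</xliff>"""


def prefill_targets(xliff_text):
    """Copy <source> into a new <target> for every translatable unit."""
    # Serialize XLIFF elements without a prefix (note: a process-wide setting).
    ET.register_namespace("", XLIFF_NS)
    root = ET.fromstring(xliff_text)
    for unit in list(root.iter(UNIT)):
        source = unit.find(SOURCE)
        if source is None or unit.get("translate") == "no":
            continue
        if unit.find(TARGET) is not None:
            continue
        # A deep copy keeps any inline codes nested inside <source>.
        target = copy.deepcopy(source)
        target.tag = TARGET
        target.attrib.clear()
        unit.insert(list(unit).index(source) + 1, target)
    return ET.tostring(root, encoding="unicode")


prepared = prefill_targets(SAMPLE)
```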
Slide 42
XLIFF Enabled Commercial Tools
• Alchemy Software - Catalyst 5.0 – Visual XLIFF
1.1 Editor http://www.alchemysoftware.ie
• Heartsome XLIFF Editor, support for PO files,
Docbook: http://www.heartsome.net
• PASS: Passolo: Visual XLIFF Editor:
http://www.passolo.com
• Trados: No direct XLIFF support yet, but can edit
XLIFF files using modified INI
• XML-Intl : XLIFF Editor http://www.xml-intl.com
Slide 43
XLIFF Enabled Shareware/Freeware
• ENLASO Corp (formerly “RWS Group”):
Extraction Utility for RC Data and Java Properties
to XLIFF 1.1 http://dotnet.goglobalnow.net/
Various Freeware Utilities, including converters
for PO files: http://www.translate.com/shared/tools
Slide 44
XLIFF Enabled Open Source
• International Components for Unicode (ICU):
– Open Source set of C/C++ and Java libraries for
Unicode support, software internationalization and
globalization, extends JDK i18n
– genrb, and XLIFF2ICUConverter class to
convert between common formats and XLIFF
– Includes RBManager, a Java based resource
bundle editor with XLIFF support
http://oss.software.ibm.com/icu/
Slide 45
XLIFF Enabled Open Source
• Okapi Framework XSL Template Collection:
– Sample utilities for transforming XLIFF to PO, RC, Java
Properties
http://sourceforge.net/project/showfiles.php?group_id=42949&release_id=67485
• xliffRoundTrip tool
– Transforms any XML file to/from XLIFF using XSLT
http://sourceforge.net/projects/xliffroundtrip/
• Lionbridge ForeignDesk
– Incomplete XLIFF support
http://sourceforge.net/projects/foreigndesk/
Slide 46
Future Support for XLIFF Announced:
• Apple Corp: Apple’s resource editor AppleGlot
• Idiom: Worldserver V.6.0
• SDL International: SDLX support for XLIFF currently
in development. See http://www.sdlx.com for more
information.
• uPortal: Open Source Web portal infrastructure for
Universities – XLIFF support announced for Version
3.0, to be released in 2005
Slide 47
Where does XLIFF fit?
• Good choice for projects with multiple
resource formats, especially good for XML.
• XLIFF addresses the process and metadata
related problems of Open Source projects:
– Supports workflow metadata.
– Supports multiple resource formats.
– Normalised translation memory / repository data.
– Simplifies translator usability experience.
Slide 48
Where does XLIFF fit?
• Issues Blocking Adoption by Open Source:
– Adoption requires retooling - lack of existing open
source XLIFF tools for PO and Docbook.
– PO tools deemed adequate for current requirements
– “Volunteer” model reduces urgency to reduce costs
Slide 49
Where does XLIFF fit?
• Issues Encouraging Adoption by Open Source:
– Increase in commercial product development for
Open Source platforms
• Translation not volunteer effort - cost control important.
• Integration with existing automation required.
• Increased availability of commercial tools that support
XLIFF
– Increase in Java Open Source projects
• Java projects are well supported by XLIFF.
• Well documented L10n best practices include XLIFF
• Available commercial and Open Source tools
Slide 50
More Information
• The XLIFF TC Web Site: http://www.xliff.org
• A “best practice” from Sun Developer
Network:
http://developers.sun.com/dev/gadc/technicalpublications/whitepapers/translation_technology_sun.html
• Presenter:
– XLIFF TC Chair: Tony Jewtushenko (Oracle)
(tony.jewtushenko@oracle.com)
Slide 51
Thank You...
Questions?
Slide 52