An Introduction to XLIFF

Tony Jewtushenko
Oracle Corporation - Principal Product Manager
Chair – OASIS XLIFF TC
The XML Localisation Interchange File Format
Agenda
• Overview of XLIFF
– Definition, goals, and benefits of XLIFF
– Brief history of XLIFF
• Architecture
– Main features of XLIFF
• The Real World
– Use cases and tools support for XLIFF
• Current State of Affairs
– Post XLIFF 1.1 – what’s next…
Slide 2
XLIFF Overview
A glance at the definitions, goals and benefits
of the XML Localisation Interchange File
Format.
Slide 3
What is XLIFF?
A specification
for the lossless interchange of localizable data
and its related information,
which is tool-neutral,
has been formalized as an XML vocabulary,
and features an extensibility mechanism.
Slide 4
XLIFF TC’s Charter
“The purpose of the OASIS XLIFF TC is to define, through
XML vocabularies, an extensible specification for the
interchange of localization information. The specification will
provide the ability to mark up and capture localizable data and
interoperate with different processes or phases without loss of
information. The vocabularies will be tool-neutral, support the
localization-related aspects of internationalization and the
entire localization process. The vocabularies will support
common software and content data formats. The specification
will provide an extensibility mechanism to allow the
development of tools compatible with an implementer's own
proprietary data formats and workflow requirements.”
Slide 5
Why Is XLIFF Needed?
Localization presents the following challenges:
• Insufficient interoperability between tools.
• Lack of support for the overall localization
workflow.
• Localization tool developers must deal with
many formats.
• A large number of proprietary intermediate
formats.
Slide 6
Advantages – Localization Customer
• Single format for adjunct processing (e.g.
quality control in terms of spell checking).
• Less dependency on vendors that are able to
work with special formats.
• Tighter control on what goes to localization
(Pre-filtering of what to translate or not).
• Controlled information flow (author/developer
notes, item properties, etc.).
• ID-based leveraging.
• All advantages of XML-based processing.
Slide 7
Advantages – Tools Vendor
• Focus on development of core functionality
rather than on handling of source formats.
• Allow usage of tools in new contexts.
• All advantages of XML-based processing.
Slide 8
Advantages – Service Provider
• Single format for adjunct processing (e.g.
quality control in terms of spell checking).
• Less dependency on specific localization tools.
• Controlled information flow (author/developer
notes, item properties, etc.).
• Allow usage of tools in new contexts.
• All advantages of XML-based processing.
• Open and standard solution for proprietary
formats.
Slide 9
Advantages – Technology
(1/2)
• For a given utility, only one implementation is
necessary (e.g. not one spell checker for RTF,
and another one for HTML).
• Increases usability of utilities (i.e. all formats
with XLIFF filters can be used with XLIFF-enabled utilities).
Slide 10
Advantages – Technology
(2/2)
• All advantages of XML-based processing:
– Use of its internationalization features.
– Better interoperability and cross-platform support.
– Powerful rendering options (XSL-FO, CSS).
– Powerful transformation options (XSLT).
– Greater integration with Web services.
• Access to existing, and often open-source,
XML implementations (lower costs).
Slide 11
Genesis of XLIFF
• Founded: Sept 2000
• Founding Members: Novell, Oracle and Sun
• Initially named “DataDefinition” group
Slide 12
XLIFF Timeline
• September 2000 – DataDefinition kickoff
• December 2000 – First face-to-face meeting
• March 2001 – Second face-to-face meeting
• End of March 2001 – Draft 1.0 specification and DTD published
• June 2001 – White paper published
• December 2001 – OASIS XLIFF Technical Committee proposal submitted
• April 2002 – XLIFF 1.0 Specification approved by formal vote as an
OASIS Committee Specification
• May 2003 – XLIFF 1.1 Specification approved by formal vote as an OASIS
Committee Specification
• August/September 2003 – XLIFF 1.1 peer review
• November 2003 – Revised XLIFF 1.1 Specification approved as an OASIS
Committee Specification
• November 2003 – XLIFF 1.1 Specification submitted to the OASIS Standards
Review Process
Slide 13
OASIS: Standards Body Home of XLIFF
• OASIS: Organization for the Advancement of Structured
Information Standards
• World’s largest independent, non-profit organization dedicated
to the standardisation of XML applications and Web Services
• More than 150 member companies plus individuals
• Operates the XML.ORG Registry, the open community
clearinghouse of XML application schemas
• Technical work on XML interoperability includes XML
conformance and XML Registries/Repositories
• General XML technical resource
Slide 14
Drivers Behind XLIFF
• Alchemy Software
• Bowne Global Solutions
• Convey Software
• Ektron, Inc
• Globalsight
• HP
• Lotus/IBM
• Lionbridge
• LRC
• Moravia IT
• Novell
• Oracle
• Microsoft
• RWS Group
• SAP
• SDL International
• Sun Microsystems
• Tektronix
Slide 15
Present OASIS XLIFF TC
• TC Officers:
– TC Chair: Tony Jewtushenko, Oracle Corporation
– TC Secretary: Peter Reynolds, Bowne Global Solutions
– TC Editor: Yves Savourel
• Current Members of TC:
– Gérard Cattin des Bois, Microsoft
– Doug Domeny
– Milan Karásek, Moravia-IT
– Mark Levins, IBM/Lotus
– Christian Lieske, SAP
– Mat Lovatt, Oracle
– Enda McDonnell
– David Pooley, SDL
– John Reid, Novell
– Reinhard Schaler, LRC
– Bryan Schnabel, Tektronix
– Shigemichi Yazawa
Slide 16
XLIFF TC in the Community
• Shared interests with the OASIS Translation Web
Services Technical Committee
– XLIFF may be used as a data container for Web services.
• Shared interests with the OSCAR SIG at LISA
– Segmentation and word count.
– Content markup (inline codes).
• Shared interests with the W3C i18n WG
– Localization directives.
– Best practices.
– Localization aspects of W3C recommendations.
– Web services.
Slide 17
Architecture
A look at XLIFF’s main features and how they
work together.
Slide 18
Extract-Localize-Merge Paradigm
• Separate data related to localization from parts not
related to localization.
• Merge translated data with codes at the end of the
process to create the final document.
• The skeleton file is optional, so this paradigm is
also optional.
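The extract and merge steps can be sketched in Python with the standard library's ElementTree. This is a minimal illustration, not a real filter: it assumes a hypothetical key=value text format, uses a made-up @@id@@ placeholder syntax, and keeps the skeleton as a plain string rather than a separate skeleton file.

```python
import xml.etree.ElementTree as ET

XLF = "urn:oasis:names:tc:xliff:document:1.1"
ET.register_namespace("", XLF)


def q(tag):
    """Qualify a tag name with the XLIFF 1.1 namespace."""
    return f"{{{XLF}}}{tag}"


def extract(text, original="messages.txt"):
    """Extract: move the value of each key=value line into a <trans-unit>,
    leaving a hypothetical @@key@@ placeholder behind in the skeleton."""
    xliff = ET.Element(q("xliff"), {"version": "1.1"})
    file_el = ET.SubElement(xliff, q("file"), {
        "original": original, "source-language": "en", "datatype": "plaintext"})
    body = ET.SubElement(file_el, q("body"))
    skeleton = []
    for line in text.splitlines():
        if "=" in line and not line.startswith("#"):
            key, value = (part.strip() for part in line.split("=", 1))
            unit = ET.SubElement(body, q("trans-unit"), {"id": key})
            ET.SubElement(unit, q("source")).text = value
            skeleton.append(f"{key}=@@{key}@@")
        else:
            skeleton.append(line)  # not localizable: stays in the skeleton
    return "\n".join(skeleton), xliff


def merge(skeleton, xliff):
    """Merge: put each unit's <target> (or its <source> if untranslated)
    back where its placeholder sits in the skeleton."""
    result = skeleton
    for unit in xliff.iter(q("trans-unit")):
        target = unit.find(q("target"))
        chosen = target if target is not None else unit.find(q("source"))
        result = result.replace(f"@@{unit.get('id')}@@", chosen.text or "")
    return result


# Usage: extract, "localize" one unit, then merge.
skeleton, doc = extract("# File menu\nopen=Open\nsave=Save")
unit = doc.find(f".//{q('trans-unit')}[@id='open']")
ET.SubElement(unit, q("target")).text = "Ouvrir"
print(merge(skeleton, doc))
```

Untranslated units fall back to their source text, so a partially localized XLIFF document still merges into a complete file.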
Slide 19
A Bird’s-Eye View
An XLIFF document can capture anything
needed for a localization project:
1. Localizable objects (e.g. text strings) in
source and target languages.
2. Supplementary information (e.g. glossaries,
or material to recreate the original format).
3. Administrative information (e.g. workflow
data).
4. Custom data (e.g. initialization information
for tools).
Slide 20
The XLIFF Document
• An XLIFF document is designed to store the
extracted data related to localization.
• Each given source container (e.g. a file, a
database table, and so forth) corresponds to a
<file> element in XLIFF.
• Each XLIFF document can include several
<file> elements.
• A whole localization project can possibly be
stored in a single XLIFF document.
Slide 21
Bilingual Model
• Each <file> element is designed to store one
source language and one target language.
• The rationale is that translations into different
target languages are, most of the time, done by
different people.
• However, the languages in an <alt-trans> element
can differ: for example, proposed matches in
European Portuguese when translating into
Brazilian Portuguese.
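A minimal Python sketch of reading such a project: each <file> element stands for one source container and carries exactly one source-language and one target-language. The two-file document, its file names, and the pt-BR target are hypothetical; note that trans-unit ids only need to be unique within a <file>.

```python
import xml.etree.ElementTree as ET

NS = {"x": "urn:oasis:names:tc:xliff:document:1.1"}

# A hypothetical two-file project: one <file> per source container.
PROJECT = """<xliff version='1.1' xmlns='urn:oasis:names:tc:xliff:document:1.1'>
  <file original='menus.rc' source-language='en' target-language='pt-BR' datatype='winres'>
    <body>
      <trans-unit id='1'><source>Open</source></trans-unit>
      <trans-unit id='2'><source>Save</source></trans-unit>
    </body>
  </file>
  <file original='help.html' source-language='en' target-language='pt-BR' datatype='html'>
    <body>
      <trans-unit id='1'><source>Welcome</source></trans-unit>
    </body>
  </file>
</xliff>"""


def summarize(xliff_text):
    """Return (original, source-language, target-language, unit count) per <file>."""
    root = ET.fromstring(xliff_text)
    return [
        (f.get("original"), f.get("source-language"), f.get("target-language"),
         len(f.findall(".//x:trans-unit", NS)))
        for f in root.findall("x:file", NS)
    ]


for row in summarize(PROJECT):
    print(row)
```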
Slide 22
Localizable Objects
• XLIFF supports not only text strings as localizable
objects but also other object types, such as
graphics.
• Supplementary information can be represented
in a generic way through inline codes (e.g.
formatting of text).
• Relationships between objects can be captured
(e.g. all items in a menu).
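The menu relationship can be kept with a <group> element. This Python sketch is not the snippet shown on the original slide; it assumes the predefined restype values menu and menuitem, and the menu id and resource names (IDM_OPEN, IDM_SAVE) are hypothetical.

```python
import xml.etree.ElementTree as ET

XLF = "urn:oasis:names:tc:xliff:document:1.1"
ET.register_namespace("", XLF)


def menu_group(menu_id, items):
    """Build a <group restype='menu'> whose <trans-unit> children are the
    menu's items, so "these strings belong to one menu" is preserved."""
    group = ET.Element(f"{{{XLF}}}group", {"id": menu_id, "restype": "menu"})
    for number, (resname, label) in enumerate(items, start=1):
        unit = ET.SubElement(group, f"{{{XLF}}}trans-unit", {
            "id": f"{menu_id}-{number}", "resname": resname, "restype": "menuitem"})
        ET.SubElement(unit, f"{{{XLF}}}source").text = label
    return group


group = menu_group("file-menu", [("IDM_OPEN", "Open"), ("IDM_SAVE", "Save")])
print(ET.tostring(group, encoding="unicode"))
```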
Slide 23
An XLIFF Snippet…
A simple menu represented as XLIFF
Slide 24
Supplementary Info
• XLIFF provides “hooks” for storing
supplementary information (for example,
references to glossaries or translation memories
that should be used).
• The supplementary information can be
referenced (i.e. reside outside of the document),
or embedded within the document.
Slide 25
Administrative Info
XLIFF provides mechanisms for capturing
administrative information:
• For relating source material to XLIFF
documents.
• For storing workflow data.
• For providing pre-translation entries.
• For keeping track of changes.
Slide 26
Administrative Info – Pre-Leveraging
A set of proposed translations can be included
for each <trans-unit> element, using the
<alt-trans> element.
<trans-unit id='1'>
<source xml:lang='en'>The text</source>
<alt-trans match-quality='high'
origin='MTsystem'>
<target xml:lang='fr'>Le texte</target>
</alt-trans>
</trans-unit>
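Pre-leveraging can be turned into pre-translation with a few lines of Python. In this sketch the policy of accepting only proposals whose match-quality is "high" is an assumption (match-quality values are tool-dependent), and the copied <target> is placed directly after <source>, where a trans-unit expects it.

```python
import copy
import xml.etree.ElementTree as ET

UNIT = """<trans-unit id='1'>
  <source xml:lang='en'>The text</source>
  <alt-trans match-quality='high' origin='MTsystem'>
    <target xml:lang='fr'>Le texte</target>
  </alt-trans>
</trans-unit>"""


def pre_translate(unit, accepted=("high",)):
    """Copy the first <alt-trans> proposal whose match-quality is in `accepted`
    (a hypothetical policy) into the unit's own <target>, right after <source>."""
    if unit.find("target") is not None:
        return False  # already has a translation: leave it alone
    for alt in unit.findall("alt-trans"):
        proposal = alt.find("target")
        if proposal is not None and alt.get("match-quality") in accepted:
            position = list(unit).index(unit.find("source")) + 1
            unit.insert(position, copy.deepcopy(proposal))
            return True
    return False


unit = ET.fromstring(UNIT)
pre_translate(unit)
print(unit.find("target").text)  # Le texte
```

The <alt-trans> element is left in place, so a reviewer can still see where the pre-translation came from.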
Slide 27
Custom Data in XLIFF 1.0
In XLIFF 1.0, we use the <prop> element and
the ts attribute to store user-defined
information (note: these features are deprecated in XLIFF 1.1).
<trans-unit id='1' ts='ctx:23a7'>
<source>Text</source>
<prop-group>
<prop prop-type='myType'>Some property data</prop>
</prop-group>
</trans-unit>
Slide 28
XLIFF 1.1 Custom Data
In XLIFF 1.1, we have the ability to customise
XLIFF by extending:
– Elements
– Attributes
– Attribute Values
Slide 29
Extending Elements
– Extension points in the following elements:
• <header>, <group>, <tool>, <trans-unit>, <alt-trans>, and <bin-unit>.
– The content of each custom element can be any valid
XML content:
• empty content, PCDATA, mixed content, and so forth
– Custom elements are defined in a private namespace
schema
Slide 30
Example of Extending Elements in XLIFF 1.1
<xliff version='1.1'
xmlns='urn:oasis:names:tc:xliff:document:1.1'
xmlns:sup='http://www.ChaucerState.ac.pg/Frm/XLFSup-v1'>
<file original='passus-1.doc' source-language='enm'
datatype='plaintext'>
<body>
<group>
<sup:SourceInfo>
<sup:Book>Piers Plowman, Passus 1</sup:Book>
<sup:Author>William Langland</sup:Author>
</sup:SourceInfo>
<sup:WorkInfo Task='transcription' Context='MiddleEnglish:1360'/>
<trans-unit id='1'>
<source xml:lang='enm'>What this mountaigne bymeneth</source>
<target xml:lang='en'>What this mountain means</target>
<sup:Reference Type='strophe'>1-a</sup:Reference>
</trans-unit>
</group>
</body>
</file>
</xliff>
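Because custom elements live in their own namespace, a tool that knows the private schema can read them while other tools simply skip them. This Python sketch reads sup:Reference from a trimmed copy of the example above; the function name references is hypothetical.

```python
import xml.etree.ElementTree as ET

XLF = "urn:oasis:names:tc:xliff:document:1.1"
SUP = "http://www.ChaucerState.ac.pg/Frm/XLFSup-v1"

DOC = """<xliff version='1.1' xmlns='urn:oasis:names:tc:xliff:document:1.1'
       xmlns:sup='http://www.ChaucerState.ac.pg/Frm/XLFSup-v1'>
 <file original='passus-1.doc' source-language='enm' datatype='plaintext'>
  <body>
   <group>
    <trans-unit id='1'>
     <source xml:lang='enm'>What this mountaigne bymeneth</source>
     <target xml:lang='en'>What this mountain means</target>
     <sup:Reference Type='strophe'>1-a</sup:Reference>
    </trans-unit>
   </group>
  </body>
 </file>
</xliff>"""


def references(xliff_text):
    """Map each trans-unit id to its custom sup:Reference (Type, text)."""
    root = ET.fromstring(xliff_text)
    found = {}
    for unit in root.iter(f"{{{XLF}}}trans-unit"):
        ref = unit.find(f"{{{SUP}}}Reference")
        if ref is not None:
            found[unit.get("id")] = (ref.get("Type"), ref.text)
    return found


print(references(DOC))  # {'1': ('strophe', '1-a')}
```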
Slide 31
Extending Attributes
• Attributes from a namespace other than XLIFF’s
can be included in these XLIFF elements:
– <file>, <group>, <trans-unit>, <source>,
<target>, <tool>, <bin-unit>, <bin-source>,
<bin-target>, <alt-trans>, <mrk>, <g>, <x/>,
<bx/>, <ex/>, <bpt>, <ept>, <ph>, and <it>
• There is no specific position where the non-XLIFF
attributes must be inserted
• There is no limit to the number of non-XLIFF attributes
that can be used in an XLIFF document
Slide 32
Example of Extending Attributes
Attributes from the HTML vocabulary extend the
<group> and <trans-unit> elements:
<xliff version='1.1'
xmlns='urn:oasis:names:tc:xliff:document:1.1'
xmlns:htm='http://www.w3.org/TR/REC-html40'>
<file original='table.htm' source-language='en' datatype='html'>
<body>
<group restype='table' htm:border='1' htm:cellpadding='5' htm:cellspacing='0' htm:width='100%'>
<group restype='row'>
<trans-unit id='1' htm:valign='top' htm:width='30%'>
<source>Text of row 1 column 1</source>
</trans-unit>
<trans-unit id='2' htm:valign='top' htm:width='30%'>
<source>Text of row 1 column 2</source>
</trans-unit>
</group>
<group restype='row'>
<trans-unit id='3' htm:valign='top' htm:width='30%'>
<source>Text of row 2 column 1</source>
</trans-unit>
<trans-unit id='4' htm:valign='top' htm:width='30%'>
<source>Text of row 2 column 2</source>
</trans-unit>
</group>
</group>
</body>
</file>
</xliff>
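A sketch of how a merge step might collect the HTML attributes again, for example to rebuild the table markup. It keeps only attributes in the htm namespace and skips XLIFF’s own ones such as restype; the helper name foreign_attributes is hypothetical and the document is a trimmed copy of the example above.

```python
import xml.etree.ElementTree as ET

XLF = "urn:oasis:names:tc:xliff:document:1.1"
HTM = "http://www.w3.org/TR/REC-html40"

DOC = """<xliff version='1.1' xmlns='urn:oasis:names:tc:xliff:document:1.1'
       xmlns:htm='http://www.w3.org/TR/REC-html40'>
 <file original='table.htm' source-language='en' datatype='html'>
  <body>
   <group restype='table' htm:border='1' htm:cellpadding='5'>
    <trans-unit id='1' htm:valign='top' htm:width='30%'>
     <source>Text of row 1 column 1</source>
    </trans-unit>
   </group>
  </body>
 </file>
</xliff>"""


def foreign_attributes(element, namespace=HTM):
    """Return the element's attributes from `namespace`, with the namespace
    stripped; attributes without that namespace (restype, id, ...) are skipped."""
    prefix = f"{{{namespace}}}"
    return {name[len(prefix):]: value
            for name, value in element.attrib.items() if name.startswith(prefix)}


root = ET.fromstring(DOC)
group = root.find(f".//{{{XLF}}}group")
unit = root.find(f".//{{{XLF}}}trans-unit")
print(foreign_attributes(group))
print(foreign_attributes(unit))
```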
Slide 33
Extending Attribute Values
• Attributes where the list of values can be
extended are the following: context-type,
count-type, ctype, datatype, mtype, restype,
size-unit, state, unit, priority, and purpose
• User-defined values must start with an “x-”
prefix
• There is no specified mechanism to validate
individual user-defined values, beyond starting
with “x-”
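Since the only rule for user-defined values is the x- prefix, a check can do little more than this Python sketch. The set of predefined context-type values below is an illustrative subset, not the specification’s full list, and the function name is hypothetical.

```python
# Assumption: an illustrative subset of predefined context-type values;
# consult the XLIFF 1.1 specification for the authoritative list.
PREDEFINED_CONTEXT_TYPES = {"database", "element", "linenumber", "record", "sourcefile"}


def is_allowed_value(value, predefined):
    """A value is allowed if it is predefined or user-defined with the x- prefix.
    Like the specification, this cannot check what follows the prefix."""
    return value in predefined or value.startswith("x-")


for value in ("sourcefile", "x-for-engineers", "for-engineers"):
    print(value, is_allowed_value(value, PREDEFINED_CONTEXT_TYPES))
```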
Slide 34
Example of Extending Attribute Values
• The following excerpt shows how the
user-defined value x-for-engineers can be
used in a document:
...
<group>
<context-group name='EngineersData'>
<context context-type='x-for-engineers'>Data...</context>
</context-group>
</group>
...
Slide 35
Embedding XLIFF (XLIFF 1.1)
• An entire XLIFF document, or part of one, can be
embedded in another XML document
• The host XML must be defined by an XML Schema
(XSD) that includes an <any> element in the
definition of the element where the XLIFF data is
inserted
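A consumer of the host document can locate the embedded XLIFF content purely by its namespace. In this Python sketch the catalog vocabulary and its urn:example:catalog namespace are hypothetical, and the host schema that allows <any> content is assumed rather than shown.

```python
import xml.etree.ElementTree as ET

XLF = "urn:oasis:names:tc:xliff:document:1.1"

# Hypothetical host vocabulary whose schema allows <any> content in <strings>.
HOST = """<catalog xmlns='urn:example:catalog'>
 <product code='A1'>
  <strings>
   <trans-unit xmlns='urn:oasis:names:tc:xliff:document:1.1' id='name'>
    <source>Garden hose</source>
   </trans-unit>
  </strings>
 </product>
</catalog>"""


def embedded_units(host_xml):
    """Find XLIFF <trans-unit> elements anywhere in a host document by namespace."""
    root = ET.fromstring(host_xml)
    return [(unit.get("id"), unit.findtext(f"{{{XLF}}}source"))
            for unit in root.iter(f"{{{XLF}}}trans-unit")]


print(embedded_units(HOST))  # [('name', 'Garden hose')]
```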
Slide 36
Deprecated or changed 1.0
• reformat – feature changed
• tool attribute becomes tool element
• new tool-id attribute
• ts, prop / prop-group – deprecated
• header was required, now optional
• default – can specify default values for a given
scope
Slide 37
Data Validation
• In 1.0, validation by DTD
• In 1.1, validation by XML Schema – XSD
• XSD provides better control over XML
documents:
– Structure – structured order can be specified
– Content – support for standard datatypes like
date
– Semantics – can specify range of valid values
or pattern
– Support for namespaces
Slide 38
The Real World
A look at some concrete examples on how
XLIFF can be used in localization projects.
Slide 39
Streamlining L10n File Exchanges
Diagram (three build steps): a Localization Customer and a Localization Vendor exchanging files in many formats (INC, HLP, INS, ZINC, CSV, RC, NLM, DOC, MC, ASD, LANG, DB, EN, HGFF, MSG, XSF, VBN, AGENT, SHL, TFD, PARA, ICS, MDB, LDI, CAT, FIL, MENU, XRDB, XLIFF, CFG, PCT, PROP, HTML, .INI, .EXE, .JAVA, .XSL, .TXT, .DLL, C++, XML). The first step is labelled “Vendor Localization Process”; the second adds “Localization Preprocessor” and “Pre-translated Proprietary Format File”; the third adds “Localization Preprocessor”, “Customer Supported Localization Tool”, “XLIFF”, and “Any tools based on XLIFF Industry Standard”.
Slide 40
Basic Use Case – without XLIFF
Diagram labels: Developer Applications; Native File 1 (e.g., HTML); Native File 2 (e.g., Java files); Native File 3 (e.g., Java properties); Native File n; Tool Resource Filters; Customer Specific Tool(s); Translator; Publisher/Customer Domain; Localisation Domain.
Slide 41
Basic Use Case – with XLIFF
Diagram labels: XLIFF compliant Developer Applications (Direct to XLIFF authoring); – OR –; Non XLIFF compliant Developer Applications (HTML, RC Data, Java Properties) with Pre-processing; XLIFF file(s) containing HTML, Java, Properties, etc. translatable resources; XLIFF Compliant Editor; Translator; Publisher/Customer Domain; Localisation Domain.
Slide 42
Simple Automated Localisation Use Case
Diagram labels: Developer; Generate XLIFF; XLIFF Translation Kit (0% Translated); Pseudo Translate / Test; Defect Report; Localization Engineer; Leverage; Translation Repository; Requires Translation; Translate; Translator; XLIFF Editor; Update; XLIFF Translation Kit (100% Translated).
Slide 43
Automated Localisation with CAT Use Case
Diagram labels: Developer; Generate XLIFF; XLIFF Translation Kit (0% Translated); Pseudo Translate / Test; Defect Report; Localization Engineer; Translation Repository; Requires Translation; Translation Memory (100% match, Fuzzy match); Machine Translation (Machine Translate); Translate; Translator; XLIFF Editor; Update; XLIFF Translation Kit (100% Translated).
Slide 44
Benefits: Use of XML Technologies
• XSL can be used to perform many tasks on
XLIFF documents, for example:
– Display translatable content in Web browser.
– Generate statistics (e.g. number of localizable
objects).
• Availability of many XML engines makes
using XLIFF easy.
– Content-related checks (e.g. that certain characters
do not appear as textual contents) can be performed
with ordinary Web browsers.
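The slide attributes these tasks to XSL; the same kind of statistics and content check can be sketched in Python with ElementTree instead. The sample document, the whitespace-based word count, and the choice of ~ as a forbidden character are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

XLF = "urn:oasis:names:tc:xliff:document:1.1"

DOC = """<xliff version='1.1' xmlns='urn:oasis:names:tc:xliff:document:1.1'>
 <file original='ui.rc' source-language='en' target-language='de' datatype='winres'>
  <body>
   <trans-unit id='1'><source>Open file</source><target>Datei öffnen</target></trans-unit>
   <trans-unit id='2'><source>Save all files</source></trans-unit>
   <trans-unit id='3'><source>Exit</source><target>Beenden~</target></trans-unit>
  </body>
 </file>
</xliff>"""


def text_of(unit, tag):
    """All text inside <source> or <target>, including text around inline codes."""
    element = unit.find(f"{{{XLF}}}{tag}")
    return "" if element is None else "".join(element.itertext())


def statistics(xliff_text):
    """Count units, translated units, and (naively) source words."""
    units = list(ET.fromstring(xliff_text).iter(f"{{{XLF}}}trans-unit"))
    return {
        "units": len(units),
        "translated": sum(1 for u in units if text_of(u, "target").strip()),
        "source_words": sum(len(text_of(u, "source").split()) for u in units),
    }


def units_with_forbidden(xliff_text, forbidden="~"):
    """Ids of units whose target contains any of the forbidden characters."""
    units = ET.fromstring(xliff_text).iter(f"{{{XLF}}}trans-unit")
    return [u.get("id") for u in units if any(c in text_of(u, "target") for c in forbidden)]


print(statistics(DOC))            # {'units': 3, 'translated': 2, 'source_words': 6}
print(units_with_forbidden(DOC))  # ['3']
```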
Slide 45
XML-Enabled Translation Tools
• Any XML-enabled translation tool can work
with an XLIFF document, as long as the text to
translate is initially copied into the <target>
elements. However, this does not mean the tool
supports all XLIFF features; it only permits
translation of <target> content.
• Many tools cannot handle conditional
translation (for example: <trans-unit
translate="no">). In that case, you need to add
extra elements temporarily.
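A sketch of the preparation step for a generic XML tool: copy each <source> into a new <target> so the tool has something to translate in place. It takes the simplest route for translate="no" units by giving them no <target> at all, rather than adding the temporary extra elements mentioned above; the function name seed_targets is hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

XLF = "urn:oasis:names:tc:xliff:document:1.1"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
ET.register_namespace("", XLF)

DOC = """<xliff version='1.1' xmlns='urn:oasis:names:tc:xliff:document:1.1'>
 <file original='ui.rc' source-language='en' target-language='fr' datatype='winres'>
  <body>
   <trans-unit id='1'><source xml:lang='en'>Open</source></trans-unit>
   <trans-unit id='2' translate='no'><source>ID_OPEN</source></trans-unit>
  </body>
 </file>
</xliff>"""


def seed_targets(xliff_text):
    """Copy <source> into a new <target> for every translatable unit lacking one."""
    root = ET.fromstring(xliff_text)
    for unit in root.iter(f"{{{XLF}}}trans-unit"):
        if unit.get("translate") == "no" or unit.find(f"{{{XLF}}}target") is not None:
            continue  # leave non-translatable and already-seeded units alone
        source = unit.find(f"{{{XLF}}}source")
        target = copy.deepcopy(source)
        target.tag = f"{{{XLF}}}target"
        target.attrib.pop(XML_LANG, None)  # the source language tag does not apply
        unit.insert(list(unit).index(source) + 1, target)
    return ET.tostring(root, encoding="unicode")


print(seed_targets(DOC))
```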
Slide 46
3rd Party Tools Support for XLIFF
• RWS Group: Extraction Utility for RC Data and Java Properties to
XLIFF 1.1, http://dotnet.goglobalnow.net/
– Various utilities: http://www.translate.com/shared/tools
• Alchemy Software: Catalyst 5.0 – Visual XLIFF 1.1 Editor,
http://www.alchemysoftware.ie
• XML-Intl: XLIFF Editor, http://www.xml-intl.com
• Heartsome XLIFF Editor: http://www.heartsome.net
• SDL International: SDLX support for XLIFF currently in
development. See http://www.sdlx.com for more information.
• Trados: No direct XLIFF support, but XLIFF files can be edited using
a modified INI file
• PASS: Passolo XML Editor can be configured for XLIFF,
http://www.passolo.com
Slide 47
More Tools Support for XLIFF
• Bowne Global Solutions: Elcano, Online Translation Service
has a web service based connector for XLIFF files
http://elcano.bowneglobal.com
• Oracle: HyperHub: Internal tool for editing Oracle-based data
contained in XLIFF archives
• IBM: Domino Global Workbench Version 6
(http://www6.software.ibm.com/devcon/devcon/docs/dwkbbet6.htm)
• Sun: Internal XLIFF Editor as described in this article:
http://www.sun.com/developers/gadc/technicalpublications/articles/xliff.html
• Open Source XSLT Tools:
http://sourceforge.net/project/showfiles.php?group_id=42949&release_id=67485
Slide 48
Current State of Affairs
A look at the work under way at the OASIS
XLIFF TC, the future, etc.
Slide 49
Current State of Affairs – To Do
• Specification of a canonical XLIFF representation
of common formats (e.g. Windows resources,
Java properties), so that all XLIFF representations
are the same regardless of which tool created
the document.
• Translation/Localization tools that support
XLIFF out-of-the-box (not just as another
XML format).
• Open Source filters (e.g. to convert from
Windows message catalogues to XLIFF).
Slide 50
More Information
• The XLIFF TC Web Site: http://www.xliff.org
• Presenter:
– XLIFF TC Chair: Tony Jewtushenko (Oracle)
(tony.jewtushenko@oracle.com)
• Significant Contributors to this Presentation:
– Christian Lieske, (SAP)
(christian.lieske@sap.com)
– Yves Savourel (RWS Group)
(ysavourel@translate.com)
Slide 51
Thank You...
Questions?
Slide 52