Presentation - Localization World

Using Open Standards:
Save Money and
Meet Customer Needs
John Watkins, President, ENLASO
jwatkins@enlaso.com
Standards are Hard…
Agenda
• What are Standards
• Who uses Standards
• How to use Standards
What are Standards – Evolution
• SDL’s Trados: “SDL’s computer-assisted translation software products are
the de facto standard in enterprise-wide translation...”1
De facto standards arise from market share: influence through prevalence
Standards may evolve from de facto standards through the cooperation of
the industry and a relevant standards body
• TMX is a standard for the exchange of translation memory data
(independent of the tool used to create the translation memory)
This arises through the cooperation of industry and standards bodies to
agree upon the specifications for the transfer of translation memory data
among tools (be they in proprietary or open source tools)
Today a wide variety of translation memory and related tools use TMX to
ensure interoperability
1 Ignacio Garcia & Vivian Stevenson, TRADOS and the Evolution of Language Tools,
Multilingual, May 2012 http://goo.gl/4qq5g
What are Standards – Definitions
• Standards
– Remove barriers to performing common functions
within an industry
– Are approved and maintained by neutral third parties with input
from industry, to avoid being locked into a proprietary solution
• Open Standards
– Do the above
– With Transparency in development and maintenance
– And availability to the public (with accessible rights)
– Synergy with Open Source software
– We are fortunate that core Standards in use in the localization
industry are, indeed, Open Standards
What are Standards – Benefits
• Open Standards facilitate the interoperability of
language services tools
– Freedom to work with a wide variety of tools (many proprietary
tools support open standards for compatibility)
– Processes are developed independent of the tools
• Customers and providers can work more easily with the
various file types
• The best linguists can be used
regardless of their tool preference
• Consequently
– Tools are not constrained
– Workflow is easier
– Projects can be faster, better, and cheaper
Who Uses Standards – Everyone
• Customers
Eschew proprietary solutions
• Service Providers
Support the wide variety of tools to meet customer
and vendor needs
• Linguists
Want flexibility with CAT tool selection
• Tool Developers
Trying to meet everybody’s needs
Who Uses Standards – Contribute
• Size doesn’t matter – we can all contribute
– Mid-Size Localization Company
– Investing in Open Standards
• Member of GALA Open Standards Initiative
• Member of OASIS (XLIFF TC)
• Member of W3C (LT-Web, ITS)
• Multilingual Web-LT Project (European Commission W3C)
– Develop Open Source Tools (Okapi Framework)
Using Standards – Contribute
• Direct to the Source
– OSCAR/LISA -> Disbanded
• Standards developed by OSCAR under LISA now under the
Creative Commons Attribution license – See GALA
• European Telecommunications Standards Institute (ETSI)
Localization Industry Standards (LIS) Industry Specification
Group as the successor for the LISA/OSCAR portfolio (TMX,
TBX, SRX…): http://goo.gl/y4JgF
– OASIS (XLIFF, DITA…): http://www.oasis-open.org/standards
– W3C (ITS, MultilingualWeb-LT – ITS 2.0): http://www.w3.org/
– European Commission (LT-Web): http://goo.gl/SKa7U
Using Standards – Contribute
• GALA Standards Initiative
– OSCAR standards: http://www.gala-global.org/standards/
– Linport (localization kit standardization)
– Model Service Elements (localization task standardization)
– Coordination and representation
• QT Launchpad (DFKI - translation quality)
• W3C MultilingualWeb-LT (W3C - international web standards)
• ISO TC 37 SC5 SD 17100 (ISO translation services)
• OASIS, Unicode Consortium, OpenTM2, more to come
– GALA-Connect (working groups for members): http://goo.gl/4pzlQ
– Quarterly webinars on standards developments: http://goo.gl/l94QE
• There are lots of ways/places to contribute!
Using Standards
1. Look at an example project
2. Identify Standards involved
3. Use Standards to provide localized files
Using Standards – Examples
• Example using Four Standards that are stable and work well
– Translation memories
• TMX: Translation Memory eXchange 1
Easy exchange of translation memories among tools
– Segmentation
• SRX: Segmentation Rules eXchange 1
Provide a standard method to describe segmentation rules that are being
exchanged among tools
– Extracted data
• ITS: Internationalization Tag Set 2
Used for XML to support the internationalization and localization of XML schemas
and documents
• XLIFF: XML Localisation Interchange File Format 3
To store localizable data and carry it from one step of the localization process to the
next, while allowing interoperability among tools
1 See GALA Open Standards: http://www.gala-global.org/lisa-oscar-standards
2 See W3C: http://www.w3.org/TR/its/
3 See OASIS: http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=xliff
Using Standards – Open Source
• Open Standards are a logical fit with
Open Source tools.
• We work with the Okapi Framework Project1
– Rainbow: Toolbox with functions for pre/post processing,
file conversion, encoding conversion, QA, etc.
– Pensieve: An Okapi TM engine
– Ratel: Segmentation editor
– And much more
1 See Okapi Framework project site at: http://code.google.com/p/okapi
Example Project
• MIF file and Excel XML file: new version of the documents to translate
(from Excel and FrameMaker)
• Trados TM (TMX 1): translation memory from Trados
• Wordfast TM (TMX 2): translation memory from Wordfast
• SRX rules: segmentation rules for the TMs
Means to an End
You can use the Okapi Framework to:
– Manipulate and combine translation memories
– Extract text with appropriate filters
– Edit segmentation rules and apply them to content
– Leverage from TM
– Machine translate unmatched text
– Create the translation package for the linguists
– Rebuild translated files
(Screenshots: FrameMaker MIF file and Excel XML file)
Three Tasks
1. Consolidate client TMs into a single TM
2. Prepare the translation package to send
to the linguist
3. Post-process the files for delivery
Translation Memories – TMX
• TMX (Translation Memory eXchange) is the
standard way to store source text (segments)
and their corresponding translations
• Supported by most CAT tools
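To make the TMX structure concrete, here is a minimal sketch in Python that reads a TMX 1.4 document and lists its translation units. The sample content is hypothetical and is not taken from the example project; only the element names (tmx, header, body, tu, tuv, seg) and the xml:lang attribute come from the TMX format itself.

```python
import xml.etree.ElementTree as ET

# xml:lang lives in the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample content for illustration only.
SAMPLE_TMX = """
<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0"
          segtype="sentence" o-tmf="example" adminlang="en-US"
          srclang="en-US" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en-US"><seg>Save the file.</seg></tuv>
      <tuv xml:lang="fr-FR"><seg>Enregistrez le fichier.</seg></tuv>
    </tu>
  </body>
</tmx>
""".strip()


def read_tmx(text):
    """Return one {language: segment text} dict per <tu> element."""
    units = []
    for tu in ET.fromstring(text).iter("tu"):
        unit = {}
        for tuv in tu.iter("tuv"):
            seg = tuv.find("seg")
            # itertext() also collects text nested inside inline markup.
            unit[tuv.get(XML_LANG)] = "".join(seg.itertext())
        units.append(unit)
    return units


for unit in read_tmx(SAMPLE_TMX):
    print(unit)
```

Because every tool writes the same tu/tuv/seg structure, the same few lines can read an export from any TMX-capable tool.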
Wordfast TM – TMX
Trados TM – TMX
Combine TMs
Combining the two TMs into a single one.
Four different tools sharing data through TMX
• Trados TM → TMX 1 → Rainbow Toolbox → Pensieve TM
• Wordfast TM → TMX 2 → Rainbow Toolbox → Pensieve TM
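The consolidation step can be pictured with a small Python sketch. It is a stand-in for what a tool such as Rainbow does, not Rainbow's or Pensieve's actual implementation: it merges several TMX documents into one and drops exact duplicate units. The two sample exports are hypothetical and use a simplified header.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical, simplified exports (real headers carry more attributes).
TRADOS_TMX = (
    '<tmx version="1.4"><header srclang="en-US"/><body>'
    '<tu><tuv xml:lang="en-US"><seg>Open</seg></tuv>'
    '<tuv xml:lang="fr-FR"><seg>Ouvrir</seg></tuv></tu>'
    '<tu><tuv xml:lang="en-US"><seg>Close</seg></tuv>'
    '<tuv xml:lang="fr-FR"><seg>Fermer</seg></tuv></tu>'
    '</body></tmx>'
)
WORDFAST_TMX = (
    '<tmx version="1.4"><header srclang="en-US"/><body>'
    '<tu><tuv xml:lang="en-US"><seg>Open</seg></tuv>'
    '<tuv xml:lang="fr-FR"><seg>Ouvrir</seg></tuv></tu>'
    '<tu><tuv xml:lang="en-US"><seg>Print</seg></tuv>'
    '<tuv xml:lang="fr-FR"><seg>Imprimer</seg></tuv></tu>'
    '</body></tmx>'
)


def units(tmx_text):
    """Yield a {language: text} dict for each <tu> in a TMX document."""
    for tu in ET.fromstring(tmx_text).iter("tu"):
        yield {tuv.get(XML_LANG): "".join(tuv.find("seg").itertext())
               for tuv in tu.iter("tuv")}


def merge_tmx(tmx_texts, srclang):
    """Combine several TMX documents, dropping exact duplicate units."""
    root = ET.Element("tmx", version="1.4")
    ET.SubElement(root, "header", {"srclang": srclang})
    body = ET.SubElement(root, "body")
    seen = set()
    for text in tmx_texts:
        for unit in units(text):
            key = tuple(sorted(unit.items()))
            if key in seen:
                continue  # same source and target already stored
            seen.add(key)
            tu = ET.SubElement(body, "tu")
            for lang, seg_text in unit.items():
                tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
                ET.SubElement(tuv, "seg").text = seg_text
    return root


merged = merge_tmx([TRADOS_TMX, WORDFAST_TMX], "en-US")
print(len(merged.findall("./body/tu")), "unique units")
```

The sketch drops inline markup and unit metadata, which a production merge would need to preserve.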
Three Tasks
✓ Consolidate the client’s TM data into a
single Pensieve TM
2. Prepare the translation package to send
to the linguist
3. Post-process the files for delivery
XML Extraction – ITS
• Inputs: MIF file, Excel file, ITS rules
• Pipeline (driven by Rainbow): XML Filter and MIF Filter → Content Extraction
ITS rules can be complex, but they provide a clear way
for the owner of the source material to specify what
needs to be translated. ITS-aware tools can process
XML documents without guesswork.
XML Extraction – ITS
• For XML documents, ITS (Internationalization
Tag Set) describes what needs to be extracted
and how to extract it
• The W3C MultilingualWeb-LT WG has just started
work on the successor to ITS 1.0
• Using ITS rules to identify localizable text in
the Excel XML document
ITS Rules
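As an illustration of how ITS global rules drive extraction, the Python sketch below applies its:translateRule elements to a small XML document. The ITS namespace and the translateRule element with its selector and translate attributes come from ITS 1.0. The Sheet/Row/Cell vocabulary is hypothetical and is not the real Excel XML format. The sketch makes several simplifying assumptions: only selectors starting with "//" are handled, only the XPath subset supported by Python's ElementTree is available, and inheritance of the translate flag by child elements is ignored.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"

# Hypothetical rules: nothing in a Cell is translatable except text cells.
SAMPLE_RULES = """
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
  <its:translateRule selector="//Cell" translate="no"/>
  <its:translateRule selector="//Cell[@Type='Text']" translate="yes"/>
</its:rules>
""".strip()

# Hypothetical document vocabulary, for illustration only.
SAMPLE_DOC = """
<Sheet>
  <Row><Cell Type="Text">Hello</Cell><Cell Type="Id">A-100</Cell></Row>
</Sheet>
""".strip()


def translatable_text(doc_text, rules_text):
    """Return the text of elements flagged translate="yes"."""
    doc = ET.fromstring(doc_text)
    rules = ET.fromstring(rules_text)
    # ITS default: element content is translatable.
    flags = {elem: True for elem in doc.iter()}
    # Rules are applied in document order, so later rules override earlier ones.
    for rule in rules.iter(f"{{{ITS_NS}}}translateRule"):
        value = rule.get("translate") == "yes"
        # "//Cell" becomes ".//Cell" so ElementTree can evaluate it.
        for elem in doc.findall("." + rule.get("selector")):
            flags[elem] = value
    return [elem.text.strip() for elem in doc.iter()
            if flags[elem] and elem.text and elem.text.strip()]


print(translatable_text(SAMPLE_DOC, SAMPLE_RULES))
```

The identifier cell is skipped and only the text cell is extracted, which is the "no guesswork" behavior the rules are meant to provide.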
Segmentation – SRX
• Inputs: MIF file, Excel file, ITS rules, SRX rules
• Pipeline (driven by Rainbow): XML Filter and MIF Filter → Extraction → Segmentation
Sharing segmentation rules
is key to sharing TMs
Segmentation – SRX
• Translation is done at the segment level
• SRX (Segmentation Rules eXchange) describes
where to break or not break the content into
segments
• Having the rules used to create the source segments
allows better reuse of existing TMs, giving more
exact matches
• Maintain SRX rules with an SRX Editor
Segmentation – SRX
Don’t break a segment after “VS.”, “V.S.”, “vs.”, or “v.s.”
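The idea behind SRX rules can be sketched in a few lines of Python. In an SRX file each rule carries a break="yes" or break="no" flag plus a beforebreak and an afterbreak regular expression, and the first rule that matches at a position decides it. The sketch below keeps that model but stores the rules as plain tuples instead of parsing SRX XML, and it uses Python's regex dialect. Both rules shown are illustrative assumptions, not rules from the example project.

```python
import re

# (is_break, before_break, after_break); the first matching rule wins.
RULES = [
    # Exception: do not break after "VS.", "V.S.", "vs." or "v.s."
    (False, r"\b(?:VS|V\.S|vs|v\.s)\.", r"\s"),
    # Default: break after sentence-ending punctuation followed by a space.
    (True, r"[.?!]", r"\s"),
]


def segment(text, rules=RULES):
    """Split text into segments using SRX-style break/no-break rules."""
    decision = {}  # position -> is_break of the first rule matching there
    for is_break, before, after in rules:
        for match in re.finditer(f"(?:{before})(?={after})", text):
            decision.setdefault(match.end(), is_break)
    segments, start = [], 0
    for pos in sorted(p for p, is_break in decision.items() if is_break):
        segments.append(text[start:pos].strip())
        start = pos
    segments.append(text[start:].strip())
    return [s for s in segments if s]


print(segment("Compare TMX vs. XLIFF today. Then pick one."))
```

Without the exception rule the first sentence would be cut after "vs.", and the resulting segments would no longer match a TM that was built with the exception in place.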
Translation Kit – XLIFF, TMX
• Inputs: MIF file, Excel file, ITS rules, SRX rules, Pensieve TM, Microsoft MT
• Pipeline (driven by Rainbow): XML Filter and MIF Filter → Extraction →
Segmentation → Pre-translate from TM (Pensieve TM Connector) →
Pre-translate unmatched from MT (Microsoft MT Connector) →
Translation Kit Creation
• Output (Translation Kit): Excel XLIFF, MIF XLIFF, TMX, etc.
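The two pre-translation steps can be sketched as follows. This is a simplified illustration in Python, not the Okapi pipeline: the TM is a plain dictionary limited to exact matches, and fake_mt is a hypothetical placeholder for a machine translation connector. Neither reflects the Pensieve or Microsoft MT connector APIs.

```python
def fake_mt(source):
    """Hypothetical stand-in for a machine translation service call."""
    return "[MT] " + source


def pretranslate(segments, tm, machine_translate):
    """Return (source, target, origin) tuples for each segment.

    Exact TM matches are used first; anything unmatched goes to MT.
    """
    results = []
    for source in segments:
        if source in tm:
            results.append((source, tm[source], "tm"))
        else:
            results.append((source, machine_translate(source), "mt"))
    return results


# Hypothetical data for illustration.
tm = {"Save the file.": "Enregistrez le fichier."}
for row in pretranslate(["Save the file.", "Cancel"], tm, fake_mt):
    print(row)
```

Recording the origin of each candidate lets the linguist see which targets came from the TM and which need closer review because they came from MT.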
Translation Kit – XLIFF, TMX
• To flow through the translation cycle the
extracted content needs to be stored in a
common format many tools understand
• XLIFF (XML Localisation Interchange File
Format) is a standard way to represent
extracted data
• The kit also includes TMX files with all the
translation candidates found
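For readers who have not seen XLIFF, this Python sketch builds a minimal XLIFF 1.2 document from extracted segments. The element names, the namespace, and the file attributes (original, source-language, target-language, datatype) come from XLIFF 1.2. The file name and segment content are hypothetical, and a real translation kit produced by a tool would carry more information, such as inline codes and skeleton references.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
ET.register_namespace("", XLIFF_NS)  # serialize with a default namespace


def q(tag):
    """Qualify a tag name with the XLIFF 1.2 namespace."""
    return f"{{{XLIFF_NS}}}{tag}"


def build_xliff(original, srclang, trglang, units):
    """units: iterable of (id, source text, target text or None)."""
    root = ET.Element(q("xliff"), version="1.2")
    file_el = ET.SubElement(root, q("file"), {
        "original": original,
        "source-language": srclang,
        "target-language": trglang,
        "datatype": "xml",
    })
    body = ET.SubElement(file_el, q("body"))
    for unit_id, source, target in units:
        tu = ET.SubElement(body, q("trans-unit"), id=str(unit_id))
        ET.SubElement(tu, q("source")).text = source
        if target is not None:  # pre-translated segments carry a target
            ET.SubElement(tu, q("target")).text = target
    return root


# Hypothetical segments: one pre-translated, one still to translate.
doc = build_xliff("report.xml", "en-US", "fr-FR", [
    (1, "Save the file.", "Enregistrez le fichier."),
    (2, "Cancel", None),
])
print(ET.tostring(doc, encoding="unicode"))
```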
Translation Kit – XLIFF, TMX
Three Tasks
✓ Consolidate the client’s TM data into a
single Pensieve TM
✓ Prepare the translation package to send
to the linguist
3. Post-process the files for delivery
Post-Processing
• Inputs: Translation Kit (MIF XLIFF, Excel XLIFF, TMX, etc.),
original MIF file and Excel file
• Pipeline (driven by Rainbow): Translator Kit Filter → Extraction →
Translation Kit Post-Processing (with XML Filter and MIF Filter)
• Outputs: Translated FrameMaker MIF, Translated Excel XML
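Post-processing amounts to merging the translated XLIFF back into the original file format. The Python sketch below shows the idea under a strong simplifying assumption: each translatable element in the original document carries an id attribute that matches a trans-unit id. Real filters typically track this mapping in a skeleton instead, and both sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET

XLIFF = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

# Hypothetical original document and translated XLIFF.
ORIGINAL = '<doc><p id="1">Save the file.</p><p id="2">Cancel</p></doc>'
TRANSLATED_XLIFF = (
    '<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">'
    '<file original="doc.xml" source-language="en-US" '
    'target-language="fr-FR" datatype="xml"><body>'
    '<trans-unit id="1"><source>Save the file.</source>'
    '<target>Enregistrez le fichier.</target></trans-unit>'
    '<trans-unit id="2"><source>Cancel</source></trans-unit>'
    '</body></file></xliff>'
)


def merge_back(original_text, xliff_text):
    """Replace source text with XLIFF targets, matched by id."""
    targets = {}
    for unit in ET.fromstring(xliff_text).iterfind(".//x:trans-unit", XLIFF):
        target = unit.find("x:target", XLIFF)
        if target is not None and target.text:
            targets[unit.get("id")] = target.text
    doc = ET.fromstring(original_text)
    for elem in doc.iter():
        if elem.get("id") in targets:
            elem.text = targets[elem.get("id")]
    return ET.tostring(doc, encoding="unicode")


print(merge_back(ORIGINAL, TRANSLATED_XLIFF))
```

Units without a target are left in the source language, which makes untranslated content easy to spot in the rebuilt file.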
Three Tasks
✓ Consolidate the client’s TM data into a
single TM
✓ Prepare the translation package to send to
the linguist
✓ Post-process the files for delivery
Summary
• We know
– More about our standards
– Everybody should care about standards
– We can (and do) use them today
• Next Steps
– Consider requiring open standards compliance from
the tools you use to ensure portability
– Get involved in educational opportunities
– Support standards initiatives through the
organization(s) that best fit your needs
References
• TMX 1.4b – Translation Memory eXchange
http://www.gala-global.org/oscarStandards/tmx/
• ITS 1.0 – Internationalization Tag Set
http://www.w3.org/TR/its/
• SRX 2.0 – Segmentation Rules eXchange
http://www.gala-global.org/oscarStandards/srx/
• XLIFF 1.2 – XML Localisation Interchange File Format
http://docs.oasis-open.org/xliff/v1.2/os/xliff-core.html
• Okapi Framework (open-source & cross-platform)
http://code.google.com/p/okapi/
• Globalization and Localization Association (GALA
provides access to various standards projects)
http://www.gala-global.org
Questions?
John Watkins, President, ENLASO
jwatkins@enlaso.com