ITS

Practical Visualization of ITS 2.0 Categories for the Real-World
Localization Process
Part of the Multilingual Web-LT Program
(C) 2013 Logrus International
WHAT IS ITS AND WHY IT’S SO IMPORTANT
• The Internationalization Tag Set (ITS) is a set of attributes and elements designed
  to provide internationalization and localization support in XML and HTML
  documents. The specification identifies the concepts that matter for
  internationalization and localization and defines how to implement them
• XML developers can use the ITS namespace to integrate internationalization features
  directly into their own XML schemas and documents
• The set is nearly final and almost frozen
• We believe that this is one of the key standards for the localization industry
• The set includes a number of categories of crucial importance to translators:
  • Terminology and Localization Note metadata
  • Translate (yes/no) metadata to mark non-translatable text
• ITS metadata make it possible to include various instructions for translators in
  documents, add terminology and comments, and mark non-translatable segments
• Will reduce inconsistency in adding translation instructions to documents
• Provides a universal interface for transferring translation metadata between tools
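As an illustration of the three categories listed above, here is a minimal sketch of how a script might read them from the attributes of a single HTML5 element. It is not the plugin's code: the function name `readItsMetadata` and the plain-object input are assumptions made for this example. The attribute names (`translate`, `its-loc-note`, `its-loc-note-type`, `its-term`, `its-term-info-ref`) follow the ITS 2.0 syntax for HTML5.

```javascript
// Hypothetical helper: `attrs` is a plain object mapping HTML5 attribute
// names to their string values for one element.
function readItsMetadata(attrs) {
  const meta = {};

  // Translate: HTML5 uses its native `translate` attribute ("yes" or "no").
  if (attrs["translate"] !== undefined) {
    meta.translate = attrs["translate"] === "yes";
  }

  // Localization Note: the note text plus an optional type,
  // which is treated as "description" when absent.
  if (attrs["its-loc-note"] !== undefined) {
    meta.locNote = {
      text: attrs["its-loc-note"],
      type: attrs["its-loc-note-type"] || "description",
    };
  }

  // Terminology: `its-term="yes"` marks a term; a reference to more
  // information about the term is optional.
  if (attrs["its-term"] === "yes") {
    meta.term = { infoRef: attrs["its-term-info-ref"] || null };
  }

  return meta;
}
```

An element with none of these attributes yields an empty object, so a viewer could skip it when deciding what to highlight.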
WHY ARE WE DOING THIS: DETAILS
• To make it possible to annotate translatable content irrespective of its nature
• To make these instructions easily accessible to translators and editors
  • Including recommendations, instructions, terminology suggestions
  • Independent of translation tools
• Saving time: the text is already marked with context information
  • One doesn’t have to work out whether something NEEDS TO BE TRANSLATED or not
  • One doesn’t have to work out whether something IS A TERM or not
• Key advantages/improvements:
  • Time (i.e. cost)
  • Quality (fewer translation errors)
• Also very important for machine translation applications (post-editing in
  context)
WHY ARE WE DOING THIS: WORKFLOW PARADIGM CHANGE
• FROM:
  • Bulk manual translation of “raw” content or post-editing of “raw” machine-translation
    output
  • External terminology glossaries, localization instructions and reference data are
    matched with content indirectly, mostly on the fly in the translator’s head, and only to
    the extent of his/her understanding of these instructions and personal skills
• TO:
  • Using natural language processing (NLP) tools and ITS metadata markup to pre-populate
    content to be translated or post-edited with context-related information
  • External terminology glossaries, localization instructions and reference data are
    matched with content directly through an automated process of preliminary linguistic
    analysis
  • Pre-processing is controlled by dedicated, qualified linguists/terminologists/editors
• PROVIDED THAT:
  • Glossaries, instructions and reference data are converted into a format compatible with
    NLP tools and ITS markup
  • Corresponding content-searching algorithms are created (including fuzzy algorithms)
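The pre-population step described above can be sketched as follows. This is only an illustration under stated assumptions, not the project's algorithm: the function `annotateTerms`, the glossary shape (term mapped to an instruction string) and the choice to carry the instruction in `its-loc-note` are all hypothetical. Matching here is exact, whole-word and case-insensitive; the fuzzy matching mentioned in the slide is deliberately left out, and the input is assumed to be plain text without markup characters.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Escape a string for use inside a double-quoted HTML attribute value.
function escapeAttr(s) {
  return s.replace(/&/g, "&amp;").replace(/"/g, "&quot;").replace(/</g, "&lt;");
}

// Hypothetical pre-processing step: wrap every glossary term found in `text`
// in a span carrying ITS Terminology and Localization Note metadata.
function annotateTerms(text, glossary) {
  // Longest terms first, so "control panel" wins over "panel".
  const terms = Object.keys(glossary).sort((a, b) => b.length - a.length);
  if (terms.length === 0) return text;

  const pattern = new RegExp("\\b(" + terms.map(escapeRegExp).join("|") + ")\\b", "gi");
  return text.replace(pattern, (match) => {
    // Look up the glossary entry regardless of the casing used in the text.
    const key = terms.find((t) => t.toLowerCase() === match.toLowerCase());
    return '<span its-term="yes" its-loc-note="' + escapeAttr(glossary[key]) + '">' + match + "</span>";
  });
}

const sample = annotateTerms("Open the Control Panel.", {
  "control panel": "Use the approved UI term",
});
console.log(sample);
// Open the <span its-term="yes" its-loc-note="Use the approved UI term">Control Panel</span>.
```

A production version would need a linguist-controlled glossary format and real morphological or fuzzy matching; the point here is only that the match result ends up in the content as ITS markup rather than in the translator's memory.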
WHAT IS BEING DEVELOPED
• ITS 2.0 implementation project, a part of the Multilingual Web-LT program funded
  by the EU
• Developing the ITS Browser Plugin as a building block of the future “Work In Context
  System” (WICS)
  • Making it possible to view standard ITS (Internationalization Tag Set)
    translation-related metadata contained in XML, XLIFF, or HTML files
  • Can be done in parallel with translating in CAT tools or when reviewing materials
  • The JavaScript plugin would support the most popular browsers
  • For previewing XML or XLIFF, standalone filters for conversion into HTML will be
    used
• Implementation:
  • Standards-based preview solution: HTML5, JavaScript, Web browser
  • A script located in the same folder as the HTML files
  • The script is started by the browser automatically
• It is expected that both scripts and filters will be publicly available
THE PROJECT IDEA
• ITS metadata-enriched XML or XLIFF files: what’s
  inside?
• Previewing ITS metadata in a Web browser while
  translating content in any CAT tool
• Standards-based preview solution: HTML5, JavaScript,
  Web browser
• Next step: ITS metadata as a carrier for localization
  instructions and any reference data
THE WORK BREAKDOWN: PROJECT COMPONENTS
Visual designs
JavaScript code to render and navigate metadata and content
Rich sample files
Content format conversion algorithms:
• XML+ITS -> HTML5+ITS*
• XLIFF+ITS -> HTML5+ITS*
• XML+ITS -> XLIFF+ITS (just an example)
• HTML+ITS -> HTML5+ITS*
* For the purposes of visualization, some redundant ITS syntax options for
HTML are not supported.
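One small piece of the XML+ITS -> HTML5+ITS conversion can be sketched as an attribute-name mapping. ITS 2.0 forms the HTML5 name of a local attribute by prefixing `its-` and replacing each uppercase letter with a hyphen plus its lowercase form, while Translate uses the native HTML5 `translate` attribute. The function names below are hypothetical, the `its:` prefix is assumed to be bound to the ITS namespace, and categories that HTML5 expresses through other native markup (such as directionality or language information) are outside the scope of this sketch.

```javascript
// Map one XML attribute name to its HTML5 counterpart.
// Non-ITS attributes are returned unchanged.
function itsXmlAttrToHtml(name) {
  if (!name.startsWith("its:")) return name;
  const local = name.slice(4);
  // Translate is expressed with the native HTML5 attribute.
  if (local === "translate") return "translate";
  // locNoteType -> its-loc-note-type, term -> its-term, and so on.
  return "its-" + local.replace(/[A-Z]/g, (c) => "-" + c.toLowerCase());
}

// Convert a whole attribute set (plain object) in one pass.
function convertAttributes(xmlAttrs) {
  const htmlAttrs = {};
  for (const [name, value] of Object.entries(xmlAttrs)) {
    htmlAttrs[itsXmlAttrToHtml(name)] = value;
  }
  return htmlAttrs;
}

console.log(convertAttributes({
  "its:locNote": "Product name, keep as is",
  "its:locNoteType": "alert",
  "its:translate": "no",
}));
```

The element mapping (which source elements become SPAN or DIV) and global rules are separate problems and are not addressed here.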
THE PROJECT CORE: VISUAL DESIGNS
Screen space limitations in the localization process:
THE PROJECT CORE: VISUAL DESIGNS (CONT.)
Collapsed view of metadata
THE PROJECT CORE: VISUAL DESIGNS (CONT.)
Expanded view of metadata
THE PROJECT CORE: VISUAL DESIGNS (CONT.)
Summary view of metadata
THE PROJECT CORE: VISUAL DESIGNS (CONT.)
Color highlighting to indicate metadata linked to content
THE PROJECT CORE: VISUAL DESIGNS (CONT.)
Visual “tags” to indicate metadata linked to content
THE PROJECT CORE: VISUAL DESIGNS (CONT.)
Visual tags to highlight metadata (example)
DEVELOPMENT STATUS
Sample files: to be completed by end of May
File conversion algorithms: to be completed by Sep 30:
• XML+ITS -> XLIFF+ITS (July) (sample)
• XML+ITS -> HTML5+ITS (August)
• HTML+ITS -> HTML5+ITS (August)
• XLIFF+ITS -> HTML5+ITS (September)
Visualization scripts: to be completed by end of June
KNOWN ISSUES: FORMAT CONVERSIONS
“Translation” of XPath expressions from source XML
to target HTML
XLIFF: MRK element to be used instead of SPAN
Selection between SPAN and DIV elements in output HTML
Merging external ITS rule files into internal list of rules
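The last item, merging external rule files into one internal list, can be sketched as below. ITS 2.0 states that rules from a linked external file are processed as if they stood at the top of the linking rules element, and that later rules take precedence over earlier ones. The data shape (`{ href, rules }`), the function `mergeRules` and the loader callback are assumptions made for this illustration; real code would parse `its:rules` elements and resolve `xlink:href` values.

```javascript
// Hypothetical rule set shape: { href: string | null, rules: any[] }.
// `loadRuleSet(href)` is an assumed callback that returns the rule set
// stored in the external file at `href`.
function mergeRules(ruleSet, loadRuleSet, seen = new Set()) {
  let merged = [];
  // External rules come first, so the linking file's own rules,
  // which follow them, keep the higher precedence.
  if (ruleSet.href && !seen.has(ruleSet.href)) {
    seen.add(ruleSet.href); // guards against infinite recursion on circular links
    merged = mergeRules(loadRuleSet(ruleSet.href), loadRuleSet, seen);
  }
  return merged.concat(ruleSet.rules);
}

// Usage with an in-memory stand-in for external files.
const files = {
  "base.xml": { href: null, rules: ["base: translate=no for <code>"] },
  "project.xml": { href: "base.xml", rules: ["project: term for <dfn>"] },
};
const documentRules = { href: "project.xml", rules: ["doc: locNote for <title>"] };
console.log(mergeRules(documentRules, (href) => files[href]));
```

The resulting flat list preserves precedence order, so a visualization script can simply apply it from first to last and let later rules override earlier ones.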
KNOWN ISSUES: METADATA VISUALIZATION
Parsing local standoff markup along with other rules
Parsing list of merged ITS rules
Hyperlinks embedded in metadata
Static definitions like “Do not translate” for Translate category
Highlighting active ITS item
Displaying summary of all ITS items
Parsing nested ITS metadata
Differences in JavaScript implementation between browsers
Navigation through content and ITS items
Fragmentation of content to avoid displaying large pieces of text
at once
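The fragmentation item can be illustrated with a simple greedy grouping. This is only one possible approach, not the plugin's method: the function `fragmentContent`, the use of character counts as the size measure and the limit value are all assumptions chosen for the example.

```javascript
// Group consecutive text blocks (e.g. paragraphs) into fragments whose total
// length stays within `maxChars`. A single block longer than the limit is
// kept whole in a fragment of its own rather than being cut mid-text.
function fragmentContent(blocks, maxChars) {
  const fragments = [];
  let current = [];
  let size = 0;

  for (const block of blocks) {
    // Start a new fragment when adding this block would exceed the limit.
    if (current.length > 0 && size + block.length > maxChars) {
      fragments.push(current);
      current = [];
      size = 0;
    }
    current.push(block);
    size += block.length;
  }
  if (current.length > 0) fragments.push(current);
  return fragments;
}

console.log(fragmentContent(["aaaa", "bbbb", "cc"], 6));
// [ [ 'aaaa' ], [ 'bbbb', 'cc' ] ]
```

Splitting on block boundaries keeps each annotated span intact, which matters when metadata is attached to inline elements inside a paragraph.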
LIVE DEMO
The demo samples are built on preliminary versions of the visual
designs and illustrate just a few ITS data categories:
Localization Note
Terminology
Translate
THANK YOU!
Questions?