(1) Translation technologies and the project environment

TRANSLATION AND LOCALIZATION
TECHNOLOGIES IN THE CLASSROOM
Theory and Practice
•1 ~ Contextualizing translation technologies and projects
•2 ~ Management of technologies, workflow and content
•3 ~ Project management and quality control
•4 ~ Reusing and recycling: alignment
•5 ~ Translation memory
•6 ~ Tagged content and translation
•7 ~ Evaluation: processes and post-mortem
Debbie Folaron (Concordia University)
CTTT Braga, Portugal 2008
TRANSLATION AND LOCALIZATION
TECHNOLOGIES IN THE CLASSROOM
Theory and Practice
• Professional and academic background
• On questions of training and education
• Assessing and accommodating professional and student needs
• Complying with academic requirements and professional standards
Point of Discussion 1
Needs Analysis -- professional and academic
You have been asked or wish to incorporate a technology component into your
translator training / translation program.
1. What technologies are you going to include?
2. How will you distinguish between short-term market trends and long-term
   transformations (economy, professional life, etc.) with regard to the
   technologies? Will you attempt to accommodate both?
3. What are your concrete training objectives?
4. What are your overall educational / academic objectives?
5. Which perspective on technologies for which goal (academic research;
   professional use)?
6. What are the criteria you have established for your priorities?
7. What competencies and sets of skills? Can we teach students to reflect on the
   use of technologies (analyze and critique) at the same time we are teaching
   them to learn how to use them?
1 - Contextualizing translation technologies and projects
1. PROFESSIONAL AND ACADEMIC OBSERVATIONS

My working premise: we *should* strive to have students reflect on the use and history
of technologies while we are teaching them how to use specific technologies.

Our experience (10-15 years) as users of translation technologies and technologies
overall now allows us to approach them with a more critical and analytical frame of mind.

We can accommodate the imperative to reflect more substantively on technologies by
considering the domains and histories that have contextualized their development:
◦ Human-Computer [Human-Machine] Interaction – from MT to CAT, along the
  HT/HAMT/MAHT/MT continuum [bridging the HT-MT gap]
◦ Localization – perhaps the first sustained “encounter” in a globalizing world between
  technologies and translation [many “localization procedures” have now become
  standard and routine components of translation projects in general]

Collaboration and teams characterize the translation environment today, even though we
may not be aware of this virtual dimension when we work on our translation jobs
individually.
Contextualize through basic questions…
Why do we use translation and localization technologies?
What has transformed conventional translation projects?
◦ Globalization
◦ Technologies (computer, communications, information, Internet)
◦ Opening up of MT research
◦ Shared, distributed assets channeled through team and collaborative approaches
◦ “Geoculturalization” strategies

“…the act of allowing a local market’s geopolitics and culture to influence strategy,
design and deployment of a product or service, [or] the refinement of the practice
from localization into culturalization. […] For years we’ve heard endless
commentary about globalization and the blurring of cultural boundaries, but I’d
assert that in many ways the opposite is becoming true. The emphasis is now on the
power of the local, as being supported by the global technology infrastructure.”
(Tom Edwards, Englobe consultant, Multilingual, 2008)
Contextualize through a prism of diverse and converging histories …
• International trade and commerce
• Human translation (HT)
• Machine translation (MT)
• Computer-assisted translation (CAT)
• Communication, information, computer technologies
• Localization
• Internet
• Globalization
• Globalization, Internationalization, Localization, Translation (GILT)
• Content management technologies
International Trade and Commerce
sea, land, air … and Internet

• Protocols, regulations, negotiations, agreements
• Moving goods: import and export
• Selling and buying goods and services
• Property and intellectual property
• Sales agreements and contracts
• Investments and financing
• Modes and methods of payment
• Insurance
• Competition and collaboration
• Trade agreements
• Technologized and virtual
Localization – relationship to ICTs, Globalization and Internet
Human translation (HT)
[Diagram: SOURCE TEXT (language/culture); TARGET TEXT (language/culture); other ADAPTATIONS]
Human translation (HT)
as process and product of linguistic-cultural transfer
as analyzed through linguistic tools in terms of:
◦ Sounds as units of representation → phonetics
◦ Sound functions and patterning → phonology
◦ Word structure → morphology (form; lexical category; derivation; inflection)
◦ Sentence structure → syntax (words organized into phrases and sentences)
◦ Meaning → semantics (information content; mental representation; reference)
◦ Usage → pragmatics
◦ Acquisition → language acquisition
◦ Processing → psycholinguistics
◦ Variation → dialects; slang; jargons; idiolects
◦ Languages in contact → borrowings; pidgins; creoles; bilingualism; multilingualism
◦ Change → historical linguistics
◦ Culture and identity → anthropological linguistics and sociolinguistics
Relevancy of linguistics to MT…
Contextualize along the HT/HAMT/MAHT/MT continuum … with a focus on language
Human language – “natural language” → linguistics (and sub-domains)
Natural language: refers to a language that has evolved gradually as the major
means of communication and expression of a community. It has native speakers, in
contrast to computer languages and other artificial languages which have no native
speakers. This type of language is normally used for human communication without
any restriction of semantic scope and syntax.
Machine language – “artificial language” → computational linguistics
Artificial language: refers to a language invented for use in computer programming.
Computational linguistics is the branch of computer science concerned with natural
language processing; it is about the use of computers in the study of human language
and the study of making computers understand information expressed in human
languages.
Natural language processing: a branch of computational linguistics which
deals with the computational processing of textual materials in natural languages
through human manipulation.
Human Translation ----------------------------------→ Machine Translation
HT/HAMT/MAHT/MT continuum
Human translation: the process or act of producing a translation by a human being.
To translate from one language to another requires a competent mastery of skills in
language comprehension and reproduction in both the source and target languages.
In human translation, translators use a variety of thought processes and skills to
interpret the meaning of the source text and to communicate the meaning of that
text in the target language. Human translators make proper use of language
resources, such as term, phrase, and grammar dictionaries, and are capable of
creating a translation that will be clearly understood in the reader’s target language.
Machine translation: refers to the use of machines (usually computers) to translate
texts from one natural language to another. It has other designations such as
“automatic translation”, when the process of translation is emphasized, “mechanical
translation”, when the mode of production is highlighted, and “computer
translation”, when the tool of production is brought to attention.
What is human-aided machine translation (HAMT)? Refers to the human translator
supplying limited information to “fill out” the machine translation. The required
human assistance may take place before machine processing begins, during the
translation process, or afterwards.
What is machine-aided human translation (MAHT)? Refers to a type of human
translation with limited assistance from the machine. It does not remove from the
translator the burden of actually performing the translation. The machine is a tool
to be used or controlled at the discretion of the translator. Same as “computer-assisted
translation” (CAT). Also, machine-aided translation, which refers to the use
of computer programmes by translators to help them during the translation
process. This includes such aids as spell checkers and online access to term bank
equivalents.
Machine Translation (MT)
Machine translation is an interdisciplinary enterprise that combines a number of
fields of study such as lexicography, linguistics, computational linguistics, computer
science and language engineering. It is based on the hypothesis that natural
languages can be fully described, controlled and mathematically coded (Wilss 1999:
140).
• MT architecture approaches:
  ◦ Direct translation (1st generation)
  ◦ Rule-based (2nd generation)
  ◦ Corpus-based (3rd generation)
• Today’s translation demands include translation for many different purposes. For MT,
  at least four purposes have been identified: dissemination, assimilation, information
  exchange and access.

Computer-assisted translation (CAT)
The history of computer-assisted translation is tied to the history of the translator’s
workstation.
One definition of a translator’s workstation:
A workstation is a single integrated system that is made up of a number of
translation tools and resources such as a translation memory, an alignment tool, a
tag filter, electronic dictionaries, terminology databases, a terminology management
system and spell and grammar-checkers. There are two major translation tools in a
workstation or workbench: translation memory systems and terminology
management systems. (C.K. Quah)
“The translator’s workstation” (Harold Somers)
For example:
In the late 1970s we find the first proposal for what is now called translation
memory, in which previous translations are stored in the computer and retrieved as
a function of their similarity to the current text being translated. As computational
linguistic techniques were developed throughout the 1980s, Alan Melby was
prominent in proposing the integration of various tools into a translator’s
workstation at various levels: the first level would be basic word-processing,
telecommunications and terminology management tools; the second level would
include a degree of automatic dictionary look-up and access to translation memory;
and the third would involve more sophisticated translation tools, up to and including
fully automatic MT. Into the 1990s and the present day, commercial MT and CAT
packages begin to appear on the market, incorporating many of these ideas.
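The retrieval idea described above, where stored translations are found "as a function of their similarity" to the segment being translated, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any commercial translation memory works: the sample segments, the `lookup` function and the 0.75 threshold are hypothetical, and `difflib.SequenceMatcher` from the standard library stands in for whatever matching algorithm a real tool uses.

```python
from difflib import SequenceMatcher

# Hypothetical toy translation memory: (source segment, stored translation) pairs.
TM = [
    ("Press the power button.", "Appuyez sur le bouton d'alimentation."),
    ("Close the cover.", "Fermez le couvercle."),
]


def similarity(a, b):
    """Return a character-level similarity ratio between 0 and 1, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def lookup(segment, memory, threshold=0.75):
    """Return (stored source, stored translation, score) for the best match,
    or None when no stored segment reaches the (assumed) threshold."""
    best = None
    for source, target in memory:
        score = similarity(segment, source)
        if score >= threshold and (best is None or score > best[2]):
            best = (source, target, score)
    return best


if __name__ == "__main__":
    # A near-repetition of a stored segment is offered for reuse and post-editing.
    print(lookup("Press the red power button.", TM))
    # An unrelated segment returns None and must be translated from scratch.
    print(lookup("xyz", TM))
```

In classroom use, varying the threshold shows students why a "fuzzy match" is a proposal to be revised rather than a finished translation.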
Software with some translation capability will be an integral part of the translator’s
workstation. The most important feature of this is that it is under the user’s control.
The first thing to note is that commercial MT systems are designed primarily with
use by non-linguists in mind. The typical system presents itself as an extended word
processing system, with additional menus and toolbars for the translation-related
functions including translation memory. […] In its most simple mode of use, the user
highlights a portion of text to be translated. The draft translation is then pasted in
the appropriate place in the target text window, ready for post-editing. If the user
can determine what text is to be translated, they will quickly learn to assess what
types of text are likely to be translated well, and can develop a way of working with
the system, translating more difficult sections immediately by hand, while allowing
the system to translate the more straightforward parts. […] Many [CAT] systems
offer a choice of interactive translation in which the system stops to ask the user to
make choices. Full word processing facilities are available in the target text window
to facilitate post-editing. With many systems, the same is true of the source text
window, which simplifies the task of pre-editing, i.e. altering the source text so as to
give the MT system a chance of doing a better draft translation (“post-editing the
source text”).
[…]
[T]he translator’s workstation represents the most cost-effective facility for the
professional translator, particularly in large organizations. It makes available to
the translator at one terminal a range of integrated facilities: multilingual word
processing, electronic transmission and receipt of documents, spelling and
grammar checkers, style checkers or drafting aids, publication software,
terminology management, text concordancing software, access to local or
remote term banks, translation memory, and access to automatic translation
software to give rough drafts. The combination of computer aids enables
translators to have under their own control the production of high quality
translations.
http://www.hutchinsweb.me.uk/MTJ-1998.pdf
Origins of the Translator’s Workstation (John Hutchins)
Proposals for the translator’s workstation can be traced back over more than 20
years. Their full integration and acceptance had to await technical developments of
the 1990s, but their desirability for the effective utilization of machine aids and
translation tools was recognized long ago. The title of workstation has been applied
to a number of translation aids, but here we are concerned only with the type of
workstation intended for direct use by professional translators knowing both
source and target languages, and retaining full control over the production of their
translations. Workstations and other computer-based translation tools are
traditionally referred to as systems for “machine aided human translation” (MAHT),
in order to distinguish them from MT systems with some kind of human assistance
either before or after processing (pre- and post-editing), known often as “human
aided machine translation” (HAMT).
The 1966 ALPAC report encouraged support for basic computational linguistics and
the development of computer-based aids for translators.
Computer-based terminological resources were received with increasing favor by
translators from the late 1960s. Particularly in large governmental and industrial
organizations, there was an increasingly pressing need for fast access to up-to-date
glossaries and dictionaries in science, technology, economics and the social sciences
in general. The difficulties were clear: rapidly changing terminology in many scientific
and technical disciplines, the emergence of new concepts, new techniques and new
products, the often insufficient standardization of terminology, and the multiplicity of
information sources of variable quality and reliability. It was recognized from the
outset that on-line dictionaries for translators could not be the kinds of dictionaries
developed in MT systems. Translators do not need the kind of detailed information
about grammatical functions, syntactic categories, semantic features, inflected forms,
etc. which is to be found in MT lexica, and which is indeed essential for automatic
analysis. Nor do translators need to consult dictionaries for items of general
vocabulary – which are equally essential components of an MT system dealing with
full sentences.
In the 1970s, terminology data banks were being built to provide information on
demand about individual words or phrases as the basis for the production of
glossaries for specific texts, and for the production of published up-to-date
specialized dictionaries for general use. Many of the databanks were multilingual,
nearly all provided direct online access and most included definitions. In the case of
other termbanks, the emphasis was on the provision of terms in actual context.
[…] The databases were intended not just for translators but also for lexicographers
and other documentation workers, with facilities for compiling dictionaries and
term glossaries, for producing text-related glossaries for machine-aided translation,
for direct online access to multilingual terminology databanks, and for accessing
already translated texts by means of indexes. The archive of translations, recorded
on magnetic tapes, could also be the source of re-usable translation segments.
However, the whole complex of interlinked linguistic databases was constrained by
the computer technology then available.
The use of a translation archive was elaborated by Peter Arthern (1979) in a
proposal for what has now, since the late 1980s, become known as a translation
memory. The suggestion was made in a discussion of the potential use of
computer-based terminology systems in the European Commission. After stressing the
importance of developing multilingual text processing tools and of providing access
to terminological databanks, Arthern went on to comment that many EC texts were
highly repetitive, frequently quoting whole passages from existing EC documents
and that translators were wasting much time re-translating texts which had already
been translated. He proposed the storage of all source and translated texts, the
ability to quickly retrieve any parts of any texts, and their immediate insertion into
new documents as required. He referred to his concept as “translation by
text-retrieval”, and envisioned an early model translator’s workstation which could still
accommodate a full MT system. The concept would not come to fruition for
another decade or more.
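Arthern's "translation by text-retrieval" can be illustrated with a small Python sketch: passages already translated are looked up in an archive and inserted into the new document, and only the remainder is left for the translator. Everything here is hypothetical (the archive contents, the function names and the idea of an exact-match dictionary); it is a classroom illustration of the concept, not a reconstruction of any system Arthern described.

```python
# Hypothetical archive of previously translated passages (source -> translation).
ARCHIVE = {
    "This Regulation shall enter into force on the day of its publication.":
        "Le présent règlement entre en vigueur le jour de sa publication.",
}


def pretranslate(segments, archive):
    """Pair each segment of a new document with its archived translation,
    or with None when the segment has never been translated before."""
    return [(segment, archive.get(segment)) for segment in segments]


def reuse_rate(pairs):
    """Share of segments (0..1) that were filled in directly from the archive."""
    if not pairs:
        return 0.0
    reused = sum(1 for _, translation in pairs if translation is not None)
    return reused / len(pairs)


if __name__ == "__main__":
    new_document = [
        "This Regulation shall enter into force on the day of its publication.",
        "A sentence that has never been translated before.",
    ]
    pairs = pretranslate(new_document, ARCHIVE)
    for source, translation in pairs:
        print(source, "->", translation or "[to be translated]")
    print("Reuse rate:", reuse_rate(pairs))
```

The reuse rate makes the productivity argument concrete: the more repetitive the texts, the less has to be re-translated.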
One of the most decisive moments in the development of the future translator’s
workstation is now considered to be the (initially limited) circulation of a
memorandum in 1980 by Martin Kay. This combined a critique of the current
approach to MT, namely the aim to produce systems which could essentially replace
human translators or at best relegate them to post-editing and dictionary updating
roles, and an argument for the development of translation tools which would
actually be used by translators. Since this was before the development of
microprocessors and personal computers, the context was a network of terminals
linked to a mainframe computer. Kay’s basic idea was that existing text-processing
tools could be augmented incrementally with translation facilities. The basic need
was a good multilingual text editor and a terminal with a split screen; to this would
be added a facility to automatically look up any word or phrase in a dictionary and
the ability to refer to previous decisions by the translator to ensure consistency in
translation; and finally to provide automatic translation of text segments, which the
translator could opt to let the machine do without intervention and then post-edit
the result, or which could be done interactively, i.e. the computer could ask the
translator to resolve ambiguities.
Alan Melby, in 1981, put forward the use of a bilingual concordance as a valuable
tool for translators. It enabled translators to identify text segments with potential
translation equivalents in relevant contexts. As an example, he showed an English
text segmented into phrases and its corresponding French version, segmented
likewise. The computer program would then create a concordance based on
selected words or word pairs displaying words in context. The concordance could
be used not only as an aid to study and analyze translations, but also for quickly
determining whether or not a given term was translated consistently in technical
texts, to assist translators in lexical selection, and in the development of an MT
system for some narrow sublanguage. Melby seems to be the first to suggest
concordance application as a translation tool. In his experiment, texts were input
manually and correspondences between texts (later called “alignments”) were also
made by human judgement. Only the concordancing program was automated, but
Melby was clearly looking forward to the availability of electronically produced texts
and of automatic alignment. At the same time, he was making specific proposals for
a translator’s workstation—quite independently of Kay’s proposals in 1980. Like Kay,
Melby wanted the translator to be in control, to make his/her own decisions about
when to translate fully and when to post-edit, and he wanted to assist translation
from scratch by providing integrated computer aids.
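The bilingual concordance described above can be sketched in Python as follows. The sketch assumes the texts have already been segmented and aligned by hand, as in Melby's experiment; the sample sentence pairs and the function names `concordance` and `translated_consistently` are hypothetical and serve only to show the two uses mentioned in the text: viewing a term in context alongside its translation, and checking whether it was translated consistently.

```python
import re

# Hypothetical, manually aligned English-French segment pairs.
ALIGNED = [
    ("Open the valve slowly.", "Ouvrez la vanne lentement."),
    ("Check the valve for leaks.", "Vérifiez l'étanchéité de la vanne."),
    ("Close the valve.", "Fermez le robinet."),
    ("Replace the filter.", "Remplacez le filtre."),
]


def concordance(term, pairs):
    """Return every aligned pair whose source segment contains the term
    as a whole word (case-insensitive)."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [(source, target) for source, target in pairs if pattern.search(source)]


def translated_consistently(term, expected, pairs):
    """True if the term occurs at least once and every matching target
    segment contains the expected rendering."""
    hits = concordance(term, pairs)
    return bool(hits) and all(expected.lower() in target.lower() for _, target in hits)


if __name__ == "__main__":
    for source, target in concordance("valve", ALIGNED):
        print(f"{source:30} | {target}")
    # "valve" is rendered once as "robinet", so the check reports an inconsistency.
    print(translated_consistently("valve", "vanne", ALIGNED))
```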
The aim was the “smooth integration of human and machine translations” (Melby
1982), bringing together various ideas for supporting translators in an environment
offering three levels of assistance. At the first level, certain translation aids can be
used without the source text having to be in machine-readable form. The translator
could start by just typing in the translation. This first level would be a text processor
with integrated terminology aids and access to a bilingual terminology data bank,
both in the form of a personal file of terms and in facilities for accessing remote
termbanks (through telecommunications networks). In addition, there might be
access at this level to a database of original and translated texts. At the second level,
the source text would be in machine-readable form. It would add a concordancing
facility to find all occurrences of an unusual word or phrase in the text being
translated, facilities to look up terms automatically in a local term file, display
possible translations, and means of automatically inserting selected terms into the
text. The third level would integrate the translator work station with a full-blown
MT system. Melby suggested that the ideal system would be one which evaluates the
quality of its own output (from “probable human quality” to “deficient”), which the
translator could choose to incorporate unchanged, to revise or to ignore.
Both Melby and Kay stressed the importance of allowing translators to use aids in
ways they personally found most efficient. The difference between them was that
whereas Melby proposed discrete levels of machine assistance, Kay proposed
incremental augmentation of translator’s computer-based facilities. Translators
could increase their use of computer aids as and when they felt confident and
satisfied with the results. And for both of them, full automation would play a part
only if an MT system made for greater and cost-effective productivity. These proposals
by Kay and Melby were being made when text-processing systems still consisted
essentially of a range of terminals connected to a mainframe computer and to
separate printers for producing publishable final documents. It was natural to
envisage networked systems rather than individual workstations. For example, Melby
assumed that the future scenario was a “distributed system in which each translator
has a microcomputer tied into a loose network to share resources such as large
dictionaries.” (1982) The technology situation definitively changed with the
appearance of the first personal computers in the mid 1980s, providing access to
word processing and printing facilities within the range of individual professional
translators.
As needs change, technologies evolve, and environments are modified, the
“tools” and “workspace” of the translator likewise are transformed.
The history of the translator’s workstation reflects these changes.
Globalization, Internationalization, Localization, Translation (GILT)
Globalization (g11n): Refers to a broad range of processes necessary to prepare and
launch products and company activities internationally. Addresses the business issues
associated with launching a product globally, such as integrating localization
throughout a company after proper internationalization and product design.
Internationalization (i18n): The process of generalizing a product so that it can handle
multiple languages and cultural conventions without the need for redesign.
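One common way to picture internationalization is the separation of user-visible text from program logic, so that a new language can be added without redesigning the product. The Python sketch below is a minimal illustration under that assumption; the `MESSAGES` catalog, the keys and the `translate` function are hypothetical and do not reflect any particular product or library.

```python
# Hypothetical message catalogs: the program refers to keys, never to literal text.
MESSAGES = {
    "en": {"greeting": "Welcome, {name}!", "save": "Save"},
    "pt": {"greeting": "Bem-vindo, {name}!", "save": "Guardar"},
}


def translate(key, locale_code, default_locale="en", **params):
    """Look up a message by key for the requested locale, falling back to the
    default locale when the locale or the key has not been localized yet."""
    catalog = MESSAGES.get(locale_code, {})
    template = catalog.get(key) or MESSAGES[default_locale][key]
    return template.format(**params)


if __name__ == "__main__":
    print(translate("greeting", "pt", name="Rui"))
    # No French catalog exists yet, so the English text is shown instead.
    print(translate("save", "fr"))
```

Adding a language then means adding one more catalog, which is the localization step; the program code itself stays unchanged.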
Localization (l10n): The process of adapting a product or software to a specific
international language or culture so that it seems natural to that particular region.
True localization considers language, culture, customs and the characteristics of the
target locale. It frequently involves changes to the software’s writing system and may
change keyboard use and fonts as well as date, time and monetary formats.
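The date and monetary formats mentioned above can be made concrete with a short Python sketch. It uses a small hand-written table of conventions for two locales instead of the operating system's locale data, so the table, the locale codes chosen and the `localize` function are illustrative assumptions only; a real product would rely on a maintained locale database and would also handle currency conversion, which is ignored here.

```python
from datetime import date

# Hypothetical, hand-written conventions for two locales (illustration only).
CONVENTIONS = {
    "en-US": {"date": "%m/%d/%Y", "thousands": ",", "decimal": ".", "money": "${amount}"},
    "de-DE": {"date": "%d.%m.%Y", "thousands": ".", "decimal": ",", "money": "{amount} €"},
}


def format_number(value, conv):
    """Format a number with two decimals using the locale's separators."""
    text = f"{value:,.2f}"  # e.g. "1,234.50" with English-style separators
    # Swap both separators in a single pass so they cannot overwrite each other.
    table = str.maketrans({",": conv["thousands"], ".": conv["decimal"]})
    return text.translate(table)


def localize(day, price, locale_code):
    """Return (date string, price string) following the locale's conventions."""
    conv = CONVENTIONS[locale_code]
    return day.strftime(conv["date"]), conv["money"].format(amount=format_number(price, conv))


if __name__ == "__main__":
    for code in CONVENTIONS:
        print(code, localize(date(2008, 7, 4), 1234.5, code))
```

The same underlying data (one date, one amount) is displayed differently per locale, which is the point of the definition: nothing is "translated", yet the product is adapted.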
Translation: The process of converting all of the text or words from the source
language to the target language. An understanding of the context or meaning of the
source language must be established in order to convey the same message in the
target language.
Source: Pierre Cadieux, Technology Editor, LISA Newsletter & Bert Esselink, Chief
Editor, Language International
(http://www.lisa.org/globalizationinsider/2002/03/gilt_globalizat.html)
The "GILT slide" puts it all together.
* Globalization is a two-step process: internationalization and localization.
* There are usually several localization efforts happening in parallel.
* Translation is often the largest part of localization.
Translation refers to the specifically linguistic operations, performed by human or
machine, that actually replace the expressions in one natural language with those of
another.
We can see more and more practices and technologies that were previously very
specific to the "localization world" entering into the more traditional translation
industry. For example, translation memory tools are now commonly used by
translators who translate material which is not software related. The concepts of
translation and localization may progressively merge. Localization may no longer be
a separate discipline since sooner or later all translators will have to know at least
the basics of localization – from translation to localization, and back again.
* * *
Localization basics are best understood through the notion/model of
PROJECT.
Localisation Research Centre: http://www.localisation.ie/
The Localisation Industry Standards Association: www.lisa.org
Localization World: http://www.localizationworld.com/
Inttranews: http://inttranews.inttra.net/cgi-bin/home.cgi?langues=eng&phase=1
Multilingual magazine: www.multilingual.com
Common Sense Advisory: http://www.commonsenseadvisory.com/
Translator’s Tool Box (Jost Zetzsche): http://www.internationalwriters.com/toolbox/
John Hutchins Web site: http://www.hutchinsweb.me.uk/
Translation Automation User Society: http://www.translationautomation.com/joomla/
Byte Level Research: http://www.bytelevel.com/
Jeff Allen’s Post-editing site: http://www.geocities.com/mtpostediting/
2. PEDAGOGICAL EXERCISES
◦ Linguistic analysis of text, from perspectives of HT linguistics, MT computational
  linguistics, and CAT. Goal: to understand how text is generated by humans and
  by machines, for insight on how it is also translated by humans and machines.
  Benefit: how to revise HT and MT text.
◦ Review the HT process. Go through the same exercise but indicate how
  automation and CAT integrate into this process. How does the HT:CAT
  relationship differ from the MT:CAT one?
◦ Explain the above in terms of the translator’s workstation.
◦ Compare the office work environment and translator’s office work environment
  in terms of the software MS Office and SDL Trados.
◦ Others?
2 - Management of technologies, workflow and content
3 - Project management and quality control
1. PROFESSIONAL AND ACADEMIC OBSERVATIONS

Project management is a process of decision-making.
Resources managed: human; technical; material; financial.
There are general principles of project management; nonetheless, every project is
unique.
Communication is vital to the well-being of the life-cycle and to the success of the
project.
Standards and best practices are important, as is certification of processes, services
and products.
The Project Management Institute recognizes five basic groups of processes:
initiating; planning; executing; controlling and monitoring; and closing.
The PMI recognizes nine knowledge areas: management of project integration,
scope, time, cost, quality, human resources, communications, risk, and procurement.






•“A Step by Step Guide to Translation Project Management”
(Sanaa Benmessaoud 2002) at
www.translationdirectory.com/articles/article1543.php
•Project Management Institute (PMI)
www.pmi.org
•“Translation and Project Management”
(C.R. Perez 2002) at
http://accurapid.com/journal/22project.htm
•“Translation Project Management”
Andrey Vasyankin at
http://www.translationdirectory.com/article65.htm
Project Management:
“The Project Management Institute (PMI) (2000: 6) defines project management as
‘the application of knowledge, skills, tools and techniques to project activities to
meet project requirements.’”
Project Manager:
“A project manager (PM) will be required to plan the budget, track the workflow to
ensure the project is completed on time, and control all the phases of the project to
make sure its outcome will meet the client’s requirements.”
Translation Project’s Life-cycle (adapted from Perez 2002): commissioning, planning,
groundwork, translation and wind-up.
Steps and phases:
•COMMISSIONING
•Reception of RFQ [Request For Quotation]
•Pre-sales evaluation
•Commissioning
•PLANNING
•Project evaluation: identify client’s needs and objectives, as well as short-term and long-term goals
•Work sub-division: break-down structure and work packages
•Schedule plan of dependences and sequences: i.e. which work package or activity depends
on the completion or sequence of another
•File management
•Resource and budget plan
•Communication plan
•Quality Assurance plan: to evaluate overall project performance on a regular basis to
provide confidence that the project will satisfy the relevant quality standards [PMI 2000]
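The schedule plan of dependencies can be illustrated with a small sketch. It assumes a handful of hypothetical work packages (the names and the dependencies between them are invented for illustration, loosely following the groundwork and translation phases) and uses Python's standard `graphlib` module to derive one valid working order.

```python
from graphlib import TopologicalSorter

# Hypothetical work packages: each key depends on the packages in its set.
dependencies = {
    "glossary": set(),
    "alignment": set(),
    "text_preparation": {"alignment"},
    "translation": {"glossary", "text_preparation"},
    "editing": {"translation"},
    "desktop_publishing": {"editing"},
}

def schedule(deps):
    """Return one valid order in which the work packages can be carried out."""
    return list(TopologicalSorter(deps).static_order())

order = schedule(dependencies)
print(order)
```

Several orders may be valid (glossary preparation and alignment do not depend on each other here), which is exactly where a project manager can assign work in parallel.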
•GROUNDWORK
•Project glossary preparation
•Text alignment
•Text preparation
•TRANSLATION
•WIND-UP
How complex have projects become?
http://www.project-open.com/solution/translation/
A brief word on file management……
needed for tracking jobs and for storing data (project; client; translator)
Question: how would you create and manage your files?
2. PEDAGOGICAL EXERCISES

Think through, visualize and depict the flow of content and work within your
organization or company. [This exercise is crucial, for example, when
conceptualizing databases and putting them into place.]

If you had to propose and explain your organizational/company file management
structure to consultants or to new project managers, what would this structure be
like?

Simulate and carry out a hypothetical project with the class.

Connect with an NGO or other organization to carry out a real project.

Concordia U projects include: Ad-Com Loc company; YMCA Tours Ecuador;
Committee for Social Justice; Romani Yag Web site; Tactical Tech [for Africa].

Find, read and discuss PM position profiles.
Company information: WJ Airways
management recently decided to expand
business operations internationally, with
the hope that global operations would
nourish local domestic business by bringing
in passengers to travel within the country.
International routes will include flights to
and from India, Africa, the Middle East,
China, Latin America and Canada. A
central hub will be established in Budapest,
Hungary. The company will increase its
fleet of aircraft from 50 to 75 within three
years, and offer special vacation packages
for international tourists.
Corporate decisions:
• Localize in-flight magazine, safety
instructions (laminated card, video),
TV screens (publicity, info, movies,
music, maps), company financial
application, Web site (including
online reservation system), and
customer service.
• Localize above service and product
content from English into Hindi,
Swahili, Arabic, Chinese, Spanish,
Portuguese, French and Hungarian.
Lay out the territory to cover the project life-cycle from beginning to end.
Formulate relevant questions and preliminary answers.
Assess and integrate human, material and technological resources.
[Diagram: the PROJECT MANAGER linked to four groups: Programmers and Engineers; Source Content Writers and Developers; Target Content Writers and Developers; Legal, Commercial and Cultural Consultants]
What is content …?
•“Any digitized information—text, document, image, video, structured record, script,
application code, or metadata—that conveys meaning or represents value in
interactions or transactions. It ranges from documents to HTML to graphics to
telematics and beyond.” (DePalma, Common Sense Advisory, 2008)
•“A system of words, images, audio and video that is integrated with information
architecture and visual design to communicate…” (Harris and McCormack 2000)
The content carries communication!
Define project content…
•In-flight magazine
•Safety instructions (laminated card, video)
•TV screens (publicity, info, movies, music, maps)
•Company financial application
•Web site (including online reservation system)
•Customer service
Define project content format …
•In-flight magazine
color desktop-published 100-page magazine
bilingual text entries (English + language representing flight route)
•Safety instructions (laminated card, video)
color desktop-published laminated card based on images and simplified
explanations in bilingual version (English + language representing flight route)
video film with audio providing detailed explanations (subtitled)
•TV screens (publicity, info, movies, music, maps)
video film clips or pub spots from sponsors (subtitled or dubbed)
real-time flight information in moving bilingual text [dynamic]
 films (subtitled or dubbed into English or other language)
music (channels should include local music)
maps (territory covered by flight route and plane icon representing real-time
movement) [dynamic]
•Company financial software application
general ledgers and sub-ledgers to create income statements, balance sheets, and to
track assets, liabilities, income and expenses, including modules for billing, job costing,
points of sale
ability to transfer information and funds between branches; to import data from
other modules, systems, and spreadsheets; and to generate reports
ability to measure adherence to industry or government accounting standards in
currencies of all branches
•Web site (including online reservation system)
splash page with options to enter site in all designated languages
flights page: routes, destinations, schedules, booking, check-in
guest and member pages: including profiles and bookings, itineraries, email
newsletters or items of interest
special offers: including vacation packages, car and hotel rentals
rewards and airmiles
travel information: including check-in times and methods; ID and travel documents;
special needs; travel tips; international travel info
company information: welcome; jobs; media and investors; sponsorship; online
store
reasons for flying with the company: marketing and PR
contact info: + FAQs; management profiles; specific request forms
•Customer service
customer service agent training course
on-site
on-line
assistance information for phone and on-site presence
in English and localized in all other languages
A brief word about content format …
Content data is stored in many different formats. Different software applications
represent and store information in different ways. Get to know your file extensions!
A file extension is nothing more than the last characters after the period in the
name of a file. FILExt is a database of file extensions and the various programs that
use them. If you know the file extension, simply enter it into the search box on the
left and click on the Search button. (http://filext.com/)
For example, if we run a search on .pdf:
•Acrobat Portable Document Format
•The PDF format has become a standard for document transfer between computer
architectures. A PDF file retains formatting for the file being transmitted. Free
viewers are available at the Adobe website and other locations.
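As a minimal sketch of "getting to know your file extensions", the following Python snippet reads the extension from a file name and groups a batch of project files by format. The lookup table and the file names are hypothetical; a real project would maintain its own list, or consult a resource such as FILExt.

```python
from pathlib import Path

# Hypothetical lookup table; extend it to suit the project at hand.
KNOWN_FORMATS = {
    ".pdf": "Acrobat Portable Document Format",
    ".doc": "Microsoft Word document",
    ".html": "HTML page",
    ".xml": "XML document",
}

def describe(filename):
    """Return a description of the format suggested by the file extension."""
    extension = Path(filename).suffix.lower()
    return KNOWN_FORMATS.get(extension, "unknown format")

def group_by_extension(filenames):
    """Group file names by extension, e.g. to route them to the right tool."""
    groups = {}
    for name in filenames:
        groups.setdefault(Path(name).suffix.lower(), []).append(name)
    return groups

files = ["safety_card.PDF", "magazine_issue1.doc", "index.html", "README"]
for name in files:
    print(name, "->", describe(name))
print(group_by_extension(files))
```

Note that the extension is only a hint: it says which program is expected to open the file, not what the file actually contains.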
Define project roles and players …
PROJECT MANAGER(S)
PROGRAMMERS AND ENGINEERS
SOURCE CONTENT WRITERS AND DEVELOPERS
TARGET CONTENT WRITERS AND DEVELOPERS
LEGAL, COMMERCIAL AND CULTURAL CONSULTANTS
Define project needs …
PROJECT MANAGER(S)
• +project coordinators
PROGRAMMERS AND ENGINEERS for
• software applications (PM; workflow; financial)
•Web site [static and dynamic] content and applications
•database(s)
•graphics localizers
SOURCE CONTENT WRITERS AND DEVELOPERS
•technical writers
•desktop publishers [print and Web]
•Web site developers and Webmasters
•audio-visual producers
•Subject Matter Experts (SMEs)
•terminologists
•style guide writers
TARGET CONTENT WRITERS AND DEVELOPERS
•desktop publishers
•Webmasters
•terminologists and lead linguists
•CAT, Loc and MT tool/technology specialists (if not done by PMs)
•translators (with appropriate SME expertise)
•editors
•proofreaders and Quality Control
•subtitlers and dubbers (+ technicians)
LEGAL, COMMERCIAL AND CULTURAL CONSULTANTS
among others …..
COMPLEXITY can quickly turn CHAOTIC
and so we turn to automation when we can
Point of Discussion 2
Class Project Planning and Implementation
You have decided to organize a real-life class project so as to more
effectively contextualize the use of translation technologies within a
project framework. Discuss and plan the project details for your class.
Create a project spec sheet. Points to include:

Content

Languages and geographical regions

Players and roles

Resources (including technologies)

Client specifications

Project life-cycle phases, procedures, tasks

Student participation and evaluation *
4 - Reusing and recycling: alignment

What is alignment? Some definitions…
Alignment: a process that matches up a source text and the target text segment by
segment into translation pairs, which will be stored up in a database to be used as a
translation memory. Alignment makes it possible to reuse previous translations in
future translations. Human input is required in alignment operations.
Alignment tool: translation software for the creation of bilingual text databases where
sentences (or phrases) of source texts are linked to corresponding text segments of
a target language.
Segment: a predefined unit of a source text that can be aligned with its corresponding
translation in a machine or machine-aided translation system.
Segmentation: refers to sentence separation in a machine translation system, the
purpose of which is to divide a text into easily manageable segments. Segmentation
is unnecessary in some languages, but important in others. In the case of Chinese,
one of the most intriguing issues in Chinese-English translation is the problem of
segmenting the Chinese source text as there are no interval markers, or word
boundaries, between two successive characters or phrases in a Chinese sentence.
A Dictionary of Translation Technology
Chan Sin-wai, The Chinese University Press, 2004

What is alignment?
Alignment is the process of binding a source-language segment to its corresponding
target-language segment. The purpose of alignment is to create a new translation
memory database or to add to an existing one. The corresponding pairs of source
and target-language segments are called “translation units”. Once the translator has
loaded the parallel texts—an original and its translation—into the system, the tool
makes a proposal for aligning the segments based on a number of algorithms such as
punctuation, numbers, formatting, names and dates, for which the translator is
offered various choices. The translator can then adjust the alignment proposed by
the system before committing the aligned texts to the memory, either by creating a
new one, for example, for a new subject field or new client, or by adding to an existing
one. Translation units are usually numbered or tagged. The collection of translation
units is stored, in no particular order, in the database for future translations. Most
commercial alignment tools allow alignment at the sentence level. However, in
recent years the attention of researchers is also focused on alignment methods for
translation memory systems below the sentence level.
Translation and Technology
C.K. Quah, Palgrave Macmillan, 2006

What is alignment?
Alignment is the process of comparing a source text and its translation, matching
the corresponding segments, and binding them together as translation units in a TM.
For the best results (automatic alignment), the source and target texts must have a
similar, if not identical, structure.
Alignment is the process whereby sections of the source text are linked up with
their corresponding translations. Alignment can take place at many different levels:
text, paragraph, sentence, sub-sentence chunk, or even word. Most bilingual
concordancers align texts at either the paragraph or the sentence level. Alignments
at text level are too high-level to be useful for helping translators find an equivalent
for a particular expression, whereas alignment at word level is notoriously difficult
and error-prone given the lack of one-to-one correspondence between most
natural languages.
Computer-Aided Translation Technology
Lynne Bowker, University of Ottawa Press, 2002
How is alignment managed with a CAT tool? (SDL TRADOS)
Alignment is the process of determining which parts of the source and target
language files belong together and putting them side by side or aligning them. The
user plays an interactive role in the alignment process which enhances the alignment
results. WinAlign examines the source and target language texts to determine which
sentence pairs belong together and creates a file which is then imported into
Translator’s Workbench.
WinAlign is based on Unicode and supports all languages supported by Windows
2000 and Windows XP, including Asian languages, bi-directional languages, and
Unicode-only languages such as Hindi.
Alignment Concepts:
 Structure Recognition: When linking source and target texts, WinAlign makes use of
the fact that documents are usually structured and divided into various sections. For
example, when a document is created in Microsoft Word, it usually contains
structural elements identified by style names. The chapter title may be identified by
the Heading 1 style. The same formatting must be present in the translated text as in
the source. Other text formats, such as HTML, XML, FrameMaker, Interleaf and
Ventura, use tags for this purpose. WinAlign uses this information to create a
structure tree for the source and target documents, and allows you to interactively
influence how this tree is built. Even when the document pairs do not have a clear
structure, WinAlign can use font sizes and paragraph numbering to perform
structure recognition.

Identifying Segment Pairs: Once this structure has been determined, WinAlign begins
linking the individual segments. A segment is a sentence, title, footnote, table cell, list
element, caption or any other textual unit that WinAlign identifies. The program
examines the source and target texts carefully to create the most accurate segment
alignments possible. Both context-related and content-related characteristics are
taken into consideration. WinAlign analyses all features of the file, for example, index
entries, footnotes, proper names, numbers, dates, formatting or tags. The program
also provides tuning options to determine how much importance should be placed
on these source and target text elements during the alignment. The user can help
optimize the alignment by supplying project-specific abbreviation and terminology
lists. WinAlign considers a large number of factors during alignment, which helps to
produce a high number of matching segment pairs.
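To make the idea of weighing several clues more concrete, here is a deliberately simplified Python sketch. It is not WinAlign's algorithm, which is not documented here; it only assumes two clues (comparable segment length and shared numbers), equal weights, and a hypothetical review threshold of 0.6, and it pairs segments strictly in order.

```python
import re

NUMBER = re.compile(r"\d+(?:[.,]\d+)?")

def pair_score(source, target):
    """Score a candidate segment pair between 0 and 1 from two simple clues."""
    # Clue 1: a segment and its translation tend to have comparable lengths.
    length_ratio = min(len(source), len(target)) / max(len(source), len(target), 1)
    # Clue 2: numbers usually survive translation unchanged.
    src_nums = set(NUMBER.findall(source))
    tgt_nums = set(NUMBER.findall(target))
    if src_nums or tgt_nums:
        number_overlap = len(src_nums & tgt_nums) / len(src_nums | tgt_nums)
    else:
        number_overlap = 1.0  # no numbers on either side: no evidence against
    return 0.5 * length_ratio + 0.5 * number_overlap

def align(source_segments, target_segments, review_below=0.6):
    """Pair segments in order and flag doubtful pairs for human review."""
    pairs = []
    for src, tgt in zip(source_segments, target_segments):
        score = pair_score(src, tgt)
        pairs.append((src, tgt, round(score, 2), score < review_below))
    return pairs

source = ["Boarding starts at 10.30.", "Fasten your seat belt."]
target = ["L'embarquement commence à 10.30.", "Attachez votre ceinture."]
for pair in align(source, target):
    print(pair)
```

The flag in the last position is where the interactive role of the user comes in: low-scoring pairs are the ones worth checking by hand before anything is committed to the memory.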

Alignment Workflow:
The alignment workflow is summarized in the following steps:
1. Create a new alignment project in WinAlign.
2. Add source files and target files.
3. Align the source and target files.
4. Review the alignment.
5. Save the alignment project and export the alignment results.
6. Import the alignment results into a Translator’s Workbench translation memory.
5 - Translation memory

What is translation memory? Some definitions…
• Translation memory: a computer-aided translation program. In essence it is a
database that stores translated sentences (translation units or segments) with their
respective source segments in a database (the “memory”). For each new segment
to be translated, the program scans the database for a previous source segment
that matches the new segment exactly or approximately (a fuzzy match) and, if
found, suggests the corresponding target segment as a possible translation. A
translator can then accept, modify or reject the suggested translation.
• Translation memory system: refers to a type of machine-aided human translation
tool that stores previous translations and offers these translations when identical
or similar sentences are encountered when translating new materials.
• Similarity match: a type of matching scheme for the free-form queries in a
computer-aided translation system. The queries are first passed through the system
and the browser performs a similarity match between the internal representation
of the queries and the internal representation of each sentence in the database. In
this way, both surface similarity and structural similarities can be matched.
A Dictionary of Translation Technology
Chan Sin-wai, The Chinese University Press, 2004

Translation memory has been defined as a “multilingual text archive containing
(segmented, aligned, parsed and classified) multilingual texts, allowing storage and
retrieval of aligned multilingual text segments against various search conditions”
(EAGLES 1996—The Expert Advisory Group on Language Engineering Standards).
Unlike machine translation systems, which generate translations automatically,
translation memory systems allow professional translators to be in charge of the
decision-making whether to accept or reject a term or an equivalent phrase or
segment suggested by the system during the translation process. Virtually all TM
systems are language-independent and support international character sets that
represent many, if not all, alphabets and scripts digitally.

Translation memory technology works by reusing previously translated texts and
their originals in order to facilitate the production of new translations. It can also
interface with databases of stored specialized terminologies that can be accessed
and retrieved for reuse in new translations.
Translation and Technology
C.K. Quah, Palgrave Macmillan, 2006
A translation memory system has no linguistic component, and two different
approaches are employed to extract translation segments from the previously
stored texts. These are known as perfect matching and fuzzy matching.
• A perfect or exact match occurs when a new source language segment is
completely identical, including spelling, punctuation and inflections, to the old
segment found in the database, that is, in the TM.
• Unlike a perfect match, a fuzzy match occurs when an old and a new source
language segment are similar but not exactly identical. Even a very small difference
such as punctuation leads to a fuzzy match.
Translation and Technology
C.K. Quah, Palgrave Macmillan, 2006
As the degree of similarity between old source segments in the database or
memory and new source text segments currently being translated may vary, an
algorithm is used to calculate a percentage which expresses the degree of
match. The higher the percentage of the fuzzy match the closer the similarity
between the two source language segments. The threshold percentage can be
set by the user at a high level, for instance at 90%, to restrict the retrieval of
old source language segments to those containing only small differences from
the new source language segment. In contrast, the threshold can be set at a
low level, for instance at 10%, to allow the translation memory to retrieve
segments only weakly related to the new segment. Segments that mean the
same thing but differ in format such as dates, measurements, time and spellings
all fall in the fuzzy match category although they are differently categorized.
Some systems allow for the automatic processing of such changes. Polysemous
and homonymous words, that is homographs, always need careful handling and
present a challenge.
Translation and Technology
C.K. Quah, Palgrave Macmillan, 2006
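The percentage and threshold described above can be tried out with a short Python sketch. Commercial TM systems do not publish their matching algorithms, so this example simply assumes the character-based ratio from Python's standard `difflib` module as a stand-in; the sample memory and the threshold values are hypothetical.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Percentage similarity between two source segments (0 to 100)."""
    return round(SequenceMatcher(None, a, b).ratio() * 100)

def retrieve(new_segment, memory, threshold=70):
    """Return stored units whose source scores at or above the threshold."""
    hits = []
    for stored_source, stored_target in memory:
        score = similarity(new_segment, stored_source)
        if score >= threshold:
            hits.append((score, stored_source, stored_target))
    return sorted(hits, reverse=True)  # best match first

# Hypothetical memory of (source, target) translation units.
memory = [
    ("Fasten your seat belt.", "Attachez votre ceinture."),
    ("The flight to Budapest departs at noon.", "Le vol pour Budapest part à midi."),
]

print(retrieve("Fasten your seat belt!", memory, threshold=90))
print(retrieve("Fasten your seat belt!", memory, threshold=0))
```

With the high threshold only the near-identical segment comes back (note that a single punctuation change is enough to make it a fuzzy rather than a perfect match); with the threshold at zero everything in the memory is returned, however weakly related.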
Segmentation is the process of breaking a text up into units consisting of a word or
a string of words that is linguistically acceptable. Segmentation is needed in order
for a TM to perform the matching (perfect and fuzzy) process. A pair of old source
and target language texts is usually segmented into individual pairs of sentences.
However, not all parts of texts, particularly specialist texts, are in a sentence format.
Exceptions include headings, lists and bullet points. As a result, different units of
segmentation are needed. A translator can decide the length of a segment but often
punctuation is used as an indicator. A segment is then allocated a unique number or
tag by the system. It is important to note that while segmentation is quite natural
for Latin-based alphabets, it is rather alien to languages such as Chinese, Thai and
Vietnamese, which are written continuously without any spaces between characters.
Thus, other methods of segmentation are required to determine the beginning and
ending of a segment in such cases. New segments can be added to the TM while
translating, and alternatively previously translated source language texts and their
translations can be entered into the memory through a process of text alignment.
Translation and Technology
C.K. Quah, Palgrave Macmillan, 2006
Most simply, a TM can be viewed as a list of source text segments explicitly aligned
with their target text counterparts. The resulting structure is sometimes referred
to as a parallel corpus or a bitext. Translation units are stored in the TM database.
Some sophisticated TM programs use a type of technology called a neural network
to store information. A neural network allows information to be retrieved more
quickly than a sequential search technique. The essential idea behind a TM system is
that it allows a translator to reuse or recycle previously translated segments.
Reusing a previous translation in a new text is sometimes referred to as
“leveraging”.
How does a TM system work? This technology works by automatically comparing a
new source text against a database of texts that have already been translated. When
a translator has a new segment to translate, the TM system consults the database
to see if this new segment corresponds to a previously translated segment. If a
matching segment is found, the TM system presents the translator with the previous
translation, and the translator decides whether or not to incorporate it into the
new translation.
Computer-Aided Translation Technology
Lynne Bowker, University of Ottawa Press, 2002
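The basic lookup cycle can be sketched in a few lines of Python. This toy class is an illustration only: it assumes the memory is a plain dictionary keyed on the source segment (not a neural network or any vendor's storage format), and the class and method names are invented for the example.

```python
class TranslationMemory:
    """A toy translation memory: source segments mapped to target segments."""

    def __init__(self):
        self._units = {}

    def add(self, source, target):
        """Store a translation unit (a later entry overwrites an earlier one)."""
        self._units[source] = target

    def lookup(self, source):
        """Return the stored translation of an identical source segment, or None."""
        return self._units.get(source)

tm = TranslationMemory()
tm.add("Fasten your seat belt.", "Attachez votre ceinture.")

# The translator is shown the proposal and decides whether to reuse it.
print(tm.lookup("Fasten your seat belt."))
# A single changed character is enough for the strict lookup to fail.
print(tm.lookup("Fasten your seat belt!"))
```

A dictionary is looked up by key rather than searched entry by entry, which gives a feel for why storage design matters once a memory holds many thousands of units.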
Segmentation: In most instances, the basic unit of segmentation is the sentence.
However, not all text is written in sentence form. Headings, list items and table cells
are familiar elements of text, but they may not strictly qualify as sentences.
Therefore, many TM systems allow the user to define other units of segmentation
in addition to sentences. These units can include sentence fragments or entire
paragraphs. Deciding what constitutes a segment is not a trivial task. How can the
TM system identify sentences? Punctuation marks such as periods, exclamation
points, and question marks are typically used. Problematic cases are abbreviations,
or section headings, or embedded sentences. Some of these problems can be
resolved by incorporating stop lists (e.g. lists of abbreviations that do not indicate
the end of a sentence, such as Mrs. and e.g.) into the TM system. An additional issue
is the fact that the segmentation units used in the source text may not correspond
exactly to those used in the translation. This lack of one-to-one correspondence
can create difficulties for automatic alignment programs.
Computer-Aided Translation Technology
Lynne Bowker, University of Ottawa Press, 2002
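Punctuation-based segmentation with a stop list can be sketched as follows. This is a minimal Python illustration, not the rule set of any particular TM system; the stop list is hypothetical and far shorter than a real one would be.

```python
# Hypothetical stop list: abbreviations that do not end a sentence.
STOP_LIST = {"Mrs.", "Mr.", "Dr.", "e.g.", "i.e."}

def segment(text, stop_list=STOP_LIST):
    """Split text into segments at '.', '!' or '?' unless the word is on the stop list."""
    segments, current = [], []
    for token in text.split():
        current.append(token)
        if token[-1] in ".!?" and token not in stop_list:
            segments.append(" ".join(current))
            current = []
    if current:  # trailing text without end punctuation, e.g. a heading
        segments.append(" ".join(current))
    return segments

text = "Mrs. Smith checked in. Did she board? Yes!"
print(segment(text))
print(segment(text, stop_list=set()))
```

Running the same text with an empty stop list shows the problem the stop list solves: "Mrs." is then wrongly treated as a complete segment. Embedded sentences, or abbreviations preceded by a bracket, would still need further rules.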
Matches: most TM systems present the user with a number of different types of
segment matches. The most common types are exact, fuzzy, and term matches.
Research is being done on full and sub-segment matches. Exact matches are the
most straightforward or perfect matches.
An exact match is 100% identical to the segment that the translator is currently
translating, both linguistically and in terms of formatting. The process used by the
TM system to identify perfectly matching segments is one of strict pattern
matching. This means that the two strings must be identical in every way, including
spelling, punctuation, inflection, numbers, and even formatting. Any segment in the
new source text that does not match an original segment precisely will not produce
an exact match. The translator is not forced to accept the translation proposed by
the TM system. Even though a segment may be identical, translators are concerned
with translating complete texts rather than isolated segments so it is important to
read the proposed translation in its new context to be sure that it is both
stylistically appropriate and semantically correct.
Full matches occur when a new source segment differs from a stored TM unit only
in terms of so-called variable elements, which are sometimes referred to as
“placeables” or “named entities”. Variable elements include numbers, dates, times,
currencies, measurements, and sometimes proper names. These elements typically
require some kind of special treatment in a text. TM systems need to ignore
variable elements for matching purposes.
Computer-Aided Translation Technology
Lynne Bowker, University of Ottawa Press, 2002
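A minimal sketch of the difference between exact and full matches described above. It assumes that the only variable elements are numbers; the `PLACEABLE` pattern and the function names are hypothetical and do not reproduce any particular TM system.

```python
import re

# Hypothetical pattern for "variable elements": numbers only, with optional
# internal separators. Real systems also treat dates, times, currencies and
# measurements as placeables.
PLACEABLE = re.compile(r"\d+(?:[.,:/]\d+)*")


def skeleton(segment: str) -> str:
    """Replace every variable element with a fixed placeholder."""
    return PLACEABLE.sub("<NUM>", segment)


def classify_match(new_segment: str, stored_segment: str) -> str:
    """Return 'exact', 'full' or 'none' for a pair of source segments."""
    if new_segment == stored_segment:
        return "exact"  # strict pattern matching: identical in every way
    if skeleton(new_segment) == skeleton(stored_segment):
        return "full"   # the segments differ only in variable elements
    return "none"
```

With this sketch, "Insert 4 screws." and "Insert 12 screws." count as a full match, while a difference in capitalization alone is enough to rule out both match types.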
5 - Translation memory
Fuzzy matches are approximate or partial matches. A fuzzy match retrieves a
segment that is similar, but not identical, to the new source segment. Some TM
systems use color coding to illustrate various types of differences between the new
source text segment and the retrieved segment. The degree of similarity in a fuzzy
match can range from 1% to 99%, and the user generally has the ability to set the
sensitivity threshold to allow the TM system to locate previously translated
segments that may differ only slightly from the new source text segment or
segments that vary greatly. If the sensitivity threshold is set too high, there is a risk
that the TM will produce “silence”: potentially useful partial matches will not be
retrieved. However, if it is set too low, the system will produce “noise”: the
suggested translations that are retrieved will be too different from the new source
text segment and therefore not helpful. When the threshold is very low, a match
may be made on the basis of very general words (“the”, “and”) and the overall
content of the retrieved segment may contain little of value for helping the
translator to translate the new segment. Many translators prefer to set the
threshold somewhere between 60% and 70%. Although fuzzy matching can be
useful, it requires careful proofreading and editing to ensure that the proposed
translation is appropriate for inclusion in the new target text.
Computer-Aided Translation Technology
Lynne Bowker, University of Ottawa Press, 2002
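The effect of the sensitivity threshold can be illustrated with a short sketch. It uses the `SequenceMatcher` class from Python's standard `difflib` module as a stand-in similarity measure; commercial TM systems compute their scores differently, so the percentages below are illustrative only, and the function names are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Similarity as a percentage (0-100); a stand-in for a vendor's score."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def fuzzy_matches(new_segment, tm, threshold=70.0):
    """Return (score, source, target) for stored units at or above threshold."""
    hits = []
    for source, target in tm.items():
        score = similarity(new_segment, source)
        if score >= threshold:
            hits.append((score, source, target))
    return sorted(hits, reverse=True)  # best match first


tm = {
    "Click the Save button.": "Cliquez sur le bouton Enregistrer.",
    "The printer is out of paper.": "L'imprimante n'a plus de papier.",
}
new = "Click the Cancel button."

useful = fuzzy_matches(new, tm, threshold=70.0)   # one helpful suggestion
silence = fuzzy_matches(new, tm, threshold=95.0)  # too high: nothing retrieved
noise = fuzzy_matches(new, tm, threshold=10.0)    # too low: unrelated segment too
```

At a threshold of 70 only the "Save button" segment is retrieved; raising the threshold to 95 loses it (silence), and lowering it to 10 also returns the unrelated printer segment (noise).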
5 - Translation memory
Term matches are made through the process of active terminology recognition, which
essentially constitutes automatic dictionary lookup. If one or more terms are
recognized as being in the term base, the TM system points to the appropriate term
records and the translator can then make use of the relevant information contained
there. This means that when no exact or fuzzy matches are found for source text
segments, the translator might at least find some translation equivalents for
individual terms in the term base.
Sub-segment matching falls partway between fuzzy and term matching. In fuzzy
matching, the two segments must have a number of elements in common in order
for a match to be established. In term matching, the new source segment is
compared against entries in the term base. In the case of sub-segment matching, the
elements that are compared are smaller chunks of segments. This means that a
match can be retrieved between two small chunks of segments, even if the
complete segments do not have a high degree of overall similarity. When both
segments contain a chunk that is very similar indeed, there is a possibility that the
translator may be able to reuse that chunk. Further refined, a combined full
segment/sub-segment approach allows the TM system to automatically compare the
new source text segment against the stored TM. It will begin by examining complete
segments, first looking for exact matches and then for fuzzy matches, and if no such
match is found at the segment level, it will compare increasingly smaller chunks in
an effort to find a match. In this way, the translator may be presented with sub-segment
matches originating from several different segments, even if none of those
complete segments qualified as a fuzzy match.
Computer-Aided Translation Technology
Lynne Bowker, University of Ottawa Press, 2002
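The combined full-segment/sub-segment approach can be sketched as a cascade: exact lookup first, then fuzzy matching, then increasingly smaller chunks. The sketch assumes that chunks are word n-grams and that `difflib` similarity stands in for the fuzzy score; both are assumptions made for illustration, and the names are hypothetical.

```python
import re
from difflib import SequenceMatcher


def word_ngrams(segment, n):
    """Set of all runs of n consecutive words (lower-cased, no punctuation)."""
    words = re.findall(r"\w+", segment.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def lookup(new_segment, tm, fuzzy_threshold=0.7, min_chunk=2):
    """Return (match_type, [(matched_text, stored_source, stored_target), ...])."""
    # 1. Exact match at segment level.
    if new_segment in tm:
        return "exact", [(new_segment, new_segment, tm[new_segment])]
    # 2. Fuzzy match at segment level.
    fuzzy = [(s, s, t) for s, t in tm.items()
             if SequenceMatcher(None, new_segment, s).ratio() >= fuzzy_threshold]
    if fuzzy:
        return "fuzzy", fuzzy
    # 3. Increasingly smaller chunks, longest first.
    longest = len(re.findall(r"\w+", new_segment)) - 1
    for n in range(longest, min_chunk - 1, -1):
        chunks = word_ngrams(new_segment, n)
        hits = [(chunk, s, t) for s, t in tm.items()
                for chunk in sorted(chunks & word_ngrams(s, n))]
        if hits:
            return "sub-segment", hits
    return "none", []


tm = {
    "Remove the paper tray before cleaning.": "Retirez le bac à papier avant le nettoyage.",
    "Open the front cover.": "Ouvrez le capot avant.",
}
kind, hits = lookup("Before printing, open the front cover and remove the paper tray.", tm)
```

Here neither stored segment qualifies as a fuzzy match, yet the translator is offered two four-word chunks that originate from two different translation units.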
5 - Translation memory
This strategy is similar to the approach used in example-based machine translation
(EBMT). The principal difference between a TM as a support tool and a full-fledged
EBMT system is basically a question of who has the primary responsibility for
analysis of the segments and formulation of the target text. With a TM, the
translator is responsible for these tasks, whereas with EBMT, the computer is
responsible for producing a complete draft of a target text, though this
may still need to be post-edited by a human translator.
No matches: in which case the translator must translate from scratch. Another
option is to use a machine translation system to translate the portions of the
source text for which no match was found in the TM.
Computer-Aided Translation Technology
Lynne Bowker, University of Ottawa Press, 2002
5 - Translation memory
There are two main ways in which translations can be entered into the TM
database: through interactive translation or through post-translation alignment.
Interactive translation has the potential to produce a TM that is high in quality
but initially low in volume, whereas post-translation alignment has the potential
to produce a TM that is higher in volume but (possibly) lower in quality. It is
entirely possible to build a TM using a combination of both.
Computer-Aided Translation Technology
Lynne Bowker, University of Ottawa Press, 2002
5 - Translation memory
Interactive translation is the most straightforward way for translators to construct
a TM, adding translation units to the memory as they go along. Each time the
translator translates a source text segment, the paired translation unit can be
stored in the TM database. Once a segment has been translated and stored, it
immediately becomes part of the TM. This means that if that segment, or a similar
one, occurs again in the text, even in the very next sentence, the previous
translation is suggested to the translator automatically. The translator then has the
choice of accepting the previous translation or editing it if the context requires
change. Note that many TM systems can also be networked, which means that
multiple translators can contribute to one TM, and the volume of data that it
contains can be built more quickly. In a networked situation, it is possible to give
different types of privileges to different users in order to exercise some form of
quality control. For example, all users can be given permission to consult the TM, but the
ability to add new TUs can be restricted to revisers or senior translators.
Computer-Aided Translation Technology
Lynne Bowker, University of Ottawa Press, 2002
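The networked scenario with differentiated privileges might be modelled as follows. This is a toy sketch under the assumption that privileges reduce to "may consult" versus "may add"; the class and user names are hypothetical.

```python
class SharedTM:
    """A toy networked TM: everyone may consult, only some users may add TUs."""

    def __init__(self, writers):
        self.units = {}               # source segment -> target segment
        self.writers = set(writers)   # users allowed to add translation units

    def lookup(self, user, source):
        """Any user may consult the memory; returns None when nothing is stored."""
        return self.units.get(source)

    def add(self, user, source, target):
        """Store a translation unit; it is available for reuse immediately."""
        if user not in self.writers:
            raise PermissionError(f"{user} may consult the TM but not add to it")
        self.units[source] = target


tm = SharedTM(writers={"reviser"})
tm.add("reviser", "Save your work.", "Enregistrez votre travail.")
suggestion = tm.lookup("junior", "Save your work.")
```

A unit stored by the reviser is suggested to the junior translator at once, while an attempt by the junior translator to add a unit raises `PermissionError`.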
5 - Translation memory
Working with an existing TM: there are two main methods – interactive mode and
batch mode. A translator working in interactive mode proceeds to work through
the new source text segment by segment, and the TM system attempts to match
the segments stored in the database against the new source text segments. As each
new segment is translated, the TU is immediately added to the TM and is available
for reuse the next time an identical or similar segment is encountered. Most TM
systems also allow for the second method, batch translation, sometimes referred to as
pre-translation, which means that a user can run a complete source text through
the system, and whenever it finds an exact match, it will automatically replace the
new source text segment with the translation that is stored in the TM. Segments for
which no match is found must later be translated by either a human translator or a
machine-translation system. In either case, the entire text must then be post-edited
by a human translator to ensure that the replacements made by the system were
correct. If the translator makes changes to any matches that were inserted
automatically, these changes can subsequently be added to the TM to keep it up to
date.
Computer-Aided Translation Technology
Lynne Bowker, University of Ottawa Press, 2002
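Batch mode, as described above, can be sketched in a few lines. The sketch assumes the source text is already segmented and that only exact matches are replaced; the function name is hypothetical.

```python
def pretranslate(source_segments, tm):
    """Replace exact matches with stored translations.

    Returns the draft text (as a list of segments) and the positions of the
    segments that still have to be translated by a human or an MT system.
    """
    draft, untranslated = [], []
    for index, segment in enumerate(source_segments):
        if segment in tm:
            draft.append(tm[segment])      # exact match: insert stored target
        else:
            draft.append(segment)          # no match: leave source in place
            untranslated.append(index)
    return draft, untranslated


tm = {"Turn off the printer.": "Éteignez l'imprimante."}
text = ["Turn off the printer.", "Wait ten seconds.", "Turn off the printer."]
draft, untranslated = pretranslate(text, tm)
```

The returned list of untranslated positions is what would be passed on for translation; the whole draft still needs post-editing, since an inserted exact match may not suit its new context.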
5 - Translation memory
TM systems are often integrated with other tools:
With terminology-management systems -- the TM system compares the source text
segments against the previously translated segments stored in the TM database and
at the same time, using a process known as active terminology recognition, the TMS
compares the individual terms contained in each source text segment against the
terms contained in the term base. If the term is recognized as being in the term
base, the translator’s attention is drawn to the fact that an entry exists for this term,
and the translator can view the term record and then insert the term from the
record directly into the target text.
With bilingual concordancers – which allow the user to retrieve all instances of a
specific search string and view these occurrences in their immediate context. This
means that a translator can ask to see all the occurrences of any text fragment (not
just a pre-defined segment) that appear anywhere in the TM, along with their
translation equivalents. This allows the translator to quickly view the search string in
context together with its translations, which may not always be the same.
With machine translation systems – where a new source text is first compared
against a TM, which will replace those segments for which exact matches are
retrieved. The segments that are still untranslated can be fed into a machine
translation system, which produces a draft translation. The entire document is then
passed on to a human translator for post-editing. The final translation can be aligned
with the original source text and stored in the TM database for future reuse.
Computer-Aided Translation Technology
Lynne Bowker, University of Ottawa Press, 2002
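Two of the integrations above, the bilingual concordancer and active terminology recognition, can be sketched with plain string search. The sketch assumes case-insensitive literal matching; real tools typically also handle inflected forms. The function names and sample data are hypothetical.

```python
import re


def concordance(fragment, tm):
    """All translation units whose source contains the fragment, with targets."""
    needle = fragment.lower()
    return [(source, target) for source, target in tm if needle in source.lower()]


def recognize_terms(segment, termbase):
    """Term base entries whose term occurs as a whole word or phrase in the segment."""
    found = {}
    for term, equivalent in termbase.items():
        if re.search(r"\b" + re.escape(term) + r"\b", segment, re.IGNORECASE):
            found[term] = equivalent
    return found


tm = [
    ("Close the paper tray.", "Fermez le bac à papier."),
    ("The paper tray is empty.", "Le bac d'alimentation est vide."),
    ("Turn off the printer.", "Éteignez l'imprimante."),
]
termbase = {"paper tray": "bac à papier", "toner": "toner"}

occurrences = concordance("paper tray", tm)
terms = recognize_terms("Clean the paper tray weekly.", termbase)
```

The concordance search returns both occurrences of "paper tray" together with their translations, which in this sample are not the same; the term lookup returns only the entry that actually occurs in the segment.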
5 - Translation memory
Most current commercial TM systems offer a quantitative evaluation of the match in
the form of a score, often expressed as a percentage, and sometimes called a fuzzy
match score or similar. How this “score” is arrived at can be quite complex, and is
not usually made explicit in commercial systems, for proprietary reasons.
In all systems, matching is essentially based on character-string similarity, but many
systems allow the user to indicate weightings for other factors, such as the source of
the example, formatting differences, and even significance of certain words. The
character-string similarity calculation uses the well-established concept of “sequence
comparison”, also known as the “string-edit distance” because of its use in spell
checkers, or more formally the “Levenshtein distance” after the Russian
mathematician who discovered the most efficient way to calculate it. The string-edit
distance is a measure of the minimum number of insertions, deletions and
substitutions needed to change one sequence of letters into another. For example, to
change “waiter” into “waitress” using only insertions and deletions requires one
deletion and three insertions; if substitutions are also allowed, three operations
suffice. The measure can be adjusted to weight in favor of insertions, deletions or
substitutions, or to favor contiguous deletions over non-contiguous ones. In fact, the
sequence-comparison algorithm developed by Levenshtein, which compares any sequences of
symbols—characters, words, digits, etc.—has a huge number of applications, ranging
from file comparison in computers, to speech recognition (sound waves represented
as sequences of digits), comparison of genetic sequences such as DNA, image
processing…in fact anything that can be digitized can be compared using Levenshtein
distance.
“Translation Memory Systems”, Harold Somers
Computers and Translation, A translator’s guide, 2003
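The string-edit distance can be computed with a standard dynamic-programming table. The sketch below gives every operation a cost of 1 and offers a switch to disallow substitutions. The conversion of a distance into a percentage in `match_percent` is an assumption made for illustration; as noted above, commercial systems do not publish their scoring formulas.

```python
def edit_distance(a, b, allow_substitution=True):
    """Minimum number of insertions, deletions (and substitutions) turning a into b."""
    previous = list(range(len(b) + 1))        # distances from "" to prefixes of b
    for i, char_a in enumerate(a, start=1):
        current = [i]                         # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            if char_a == char_b:
                cost = previous[j - 1]        # characters agree: no operation
            else:
                cost = 1 + min(previous[j],       # delete char_a
                               current[j - 1])    # insert char_b
                if allow_substitution:
                    cost = min(cost, 1 + previous[j - 1])  # substitute
            current.append(cost)
        previous = current
    return previous[-1]


def match_percent(a, b):
    """Hypothetical score: 100% minus distance relative to the longer string."""
    longest = max(len(a), len(b), 1)
    return 100.0 * (1 - edit_distance(a, b) / longest)


with_substitution = edit_distance("waiter", "waitress")
indels_only = edit_distance("waiter", "waitress", allow_substitution=False)
```

For “waiter” and “waitress” the distance is 4 when only insertions and deletions are counted (one deletion and three insertions) and 3 when substitutions are allowed (insert “r”, substitute “r” with “s”, insert “s”).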
5 - Translation memory

How is translation memory managed with a CAT tool? (SDL TRADOS)

Translator’s Workbench is a sophisticated database system that is built around the
core concept of translation memory, a method of capturing, storing and reusing
translations. Archived translations are stored in translation memory databases.
Translator’s Workbench supports interactive translation through the interface with
popular editing environments such as Microsoft Word and TagEditor. This interface
provides direct access to the translation memory database while translation is in
progress.

During translation with Translator's Workbench, the program builds a linguistic
database that stores all translated sentences or segments with their source language
equivalents. These segment pairs are referred to as translation units. At the same
time, Translator’s Workbench builds an artificial neural network that is based on the
content of the linguistic database. The neural network is designed to facilitate fast
and efficient searching using fuzzy matching techniques. The linguistic database and
its associated neural network are together referred to as a translation memory.
5 - Translation memory

Each new translation memory is empty. You can build translation memory
interactively or by importing aligned sentence pairs. During interactive translation,
Translator’s Workbench automatically updates the translation memory that is open
in the background. Each time you translate a segment of text, the corresponding
translation unit is added to the translation memory. If you encounter the same or
similar text in your source document twice, Translator’s Workbench proposes your
previous translation(s). You can accept, reject or edit these suggestions – both new
and updated translations are added to the translation memory. In this way, the
translation memory grows dynamically during the translation process.

You can also populate new or existing translation memories by importing previously
translated material. The import feature enables you to transfer data from one
translation memory to another, or to load translation memory data from WinAlign
alignment projects. In this way, you can take advantage of existing translations when
starting a new project.
5 - Translation memory
During translation, Translator’s Workbench uses database technology to search the
translation memory and propose previous translations for reuse. The search is
based on the degree of similarity between the source segment for translation and
the source segments of translation units that are stored in translation memory.
Translator’s Workbench expresses the degree of similarity between these source
segments in terms of a percentage value. An identical match is therefore known as a
100% match, and is likely to provide the best available translation for the source
segment you are translating.
 As well as proposing identical matches, Translator's Workbench uses a technique
known as fuzzy matching. Source segments from translation memory that are
similar, but not identical, to the source segment for translation are known as fuzzy
matches. Fuzzy match values can range from 99% to 30%, though a minimum match
value of 70% is usually enforced during interactive translation. Translator’s
Workbench allows you to view all fuzzy matches in turn, and highlights the
differences between translation memory content and the source segment for
translation. This helps you to choose the best available translation for the source
segment you are translating. As usual, you can accept, reject or edit suggestions.
 As well as facilitating interactive translation, the fuzzy matching technique is also
used during other types of translation memory search. The concordance feature and
project management utilities such as document analysis and pre-translation all use
fuzzy matching to identify translation memory content that is suitable for reuse.

5 - Translation memory

Concordance Searching
The concordance feature in Translator’s Workbench allows you to search the
translation memory for fragments of text or subsegments that are similar or
identical to the text you are translating. Translator’s Workbench presents the search
results as a list of source segments from translation memory in which the search
text occurs, with their corresponding translations. You can configure Translator’s
Workbench to automatically run a concordance search when no match is found for
the current source segment in translation memory. Alternatively, you can run a
manual concordance search using the Concordance command which is available
from the Tools menu in Translator’s Workbench.
5 - Translation memory

Batch Tools
Translator’s Workbench facilitates project management by providing batch tools for
the analysis, pre-translation and post-production of files. The batch tools are so-called
because they allow you to process files individually or in batches.
The analysis and pre-translation features help you to identify and apply reusable
translation memory content before interactive translation begins. In this way, you
can derive maximum benefit from existing translation memory content and reduce
the requirement for human translation on new projects. The clean up feature is used
after translation to remove unwanted source text from translated documents and
update the translation memory in accordance with the latest changes. This ensures
maximum consistency between the content of your translated documents and your
translation memory.
5 - Translation memory

Translation Memory Data Format
In a file-based translation memory, linguistic data is stored in a TMW file; the TMW
file is associated with a group of neural network files that enable fuzzy search
capability. In a server-based translation memory, linguistic and neural network data is
stored as a group of database tables in a database management system. The database
management system resides on a database server.
Although the method of data storage for each type of memory is different, the data
format remains the same. In each case, the basic unit of translation memory data is
the translation unit or segment pair. This means that linguistic data from either type
of memory is presented and manipulated in the same way during interactive
translation, project management and maintenance procedures. Furthermore, server-based
translation memories use the same import and export formats as file-based
translation memories. This facilitates the exchange of data between the two types of
translation memory.
5 - Translation memory

TradosTag Bilingual File Format
TradosTag is the default bilingual file format in Translator’s Workbench and TagEditor.
During the translation process, TagEditor converts all formats to TradosTag, which is
an XML-based format for representing tagged text and bilingual data for translation
purposes. Text and formatting information are extracted from the native file format
and presented in an abstracted file format, TradosTag.
TradosTag files have a TTX extension.
5 - Translation memory

Active Terminology Recognition
MultiTerm is integrated with Translator's Workbench to provide active term
recognition during translation. This means that translations of terms stored in
MultiTerm are automatically suggested as you translate your documents. Even if
Translator's Workbench cannot find a suitable segment match for the current
source segment in translation memory, it can still help by retrieving information at
term level. Matching terms from the MultiTerm termbase are highlighted in the
Workbench source window. The corresponding termbase entry is displayed in the
Workbench terminology window. You can easily paste the target term into the
document you are translating, or carry out a further termbase search. Active term
recognition uses the fuzzy matching technique to identify terms that are identical or
similar to the content of your source text. Active term recognition can find not only
reduced word forms (for example, base forms of verbs) but also root forms of
compound words, even if the elements of these compound words are spread
throughout the source segment. You can add terms directly to your termbase from
within Translator’s Workbench and Word, or from within TagEditor, enriching your
termbase as you work.
5 - Translation memory
When creating a new translation memory, Translator's Workbench creates five new
files: a database file in which the translation units are stored and four neural
network files required for fuzzy searches. *.tmw is the main translation memory
database file, and *.mdf, *.mtf, *.mwf, *.iix are the neural network files.
If you want to copy or move a translation memory, copy or move all five files.
Otherwise Translator's Workbench displays an error message when opening the
copied translation memory.
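A small helper can guard against copying an incomplete translation memory. This sketch assumes that the five files share one base name and sit in the same folder, which the wildcard notation above suggests but does not state outright; the function name is hypothetical and the script is not part of SDL Trados.

```python
import shutil
from pathlib import Path

# The five extensions listed above: the database file plus four network files.
TM_EXTENSIONS = (".tmw", ".mdf", ".mtf", ".mwf", ".iix")


def copy_translation_memory(tmw_path, destination_dir):
    """Copy all five files of a file-based TM, or fail before copying anything."""
    tmw_path = Path(tmw_path)
    destination_dir = Path(destination_dir)
    files = [tmw_path.with_suffix(ext) for ext in TM_EXTENSIONS]
    missing = [f.name for f in files if not f.is_file()]
    if missing:
        raise FileNotFoundError(f"incomplete translation memory, missing: {missing}")
    destination_dir.mkdir(parents=True, exist_ok=True)
    return [Path(shutil.copy2(f, destination_dir)) for f in files]
```

Calling `copy_translation_memory("project.tmw", "backup")` copies all five files into the `backup` folder; if any of them is missing, nothing is copied and the missing names are reported.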
6 - Tagged content and translation


How is tagged content managed with a CAT tool? (SDL TRADOS)
What are tags?
Tags are brief coded statements that contain information about formatting and
structure in the tagged text file. How this information is represented differs from
one file format to another; this is why most tags are file format-specific. However,
certain general characteristics apply to all tagged formats and their representation
in TagEditor.
Opening and closing tags – these tags work in pairs to invoke and revoke an
instruction. The opening tag indicates the start of a character format or structural
element such as a heading. The closing tag marks the end of the formatting or
structural element. A typical example of such a tag pair is one indicating the
beginning and end of an HTML file, or indicating the scope of bold formatting. Text
and other tag pairs may occur in between the opening and closing tags for a
particular instruction.
Stand-alone tags – stand-alone tags work independently, for example the image tag
in HTML. Stand-alone tags are easy to recognise since they do not have sharp
edges.
6 - Tagged content and translation


How is tagged content managed with a CAT tool? (SDL TRADOS)
What are tags?
TagEditor classifies all tags as external or internal, depending on their function:
External tags – external tags have a black border by default. They typically represent
structural information. These tags and their content are completely ignored during
translation and can only appear outside sentences. You rarely need to move or
delete external tags during translation.
Internal tags – internal tags have a red border by default. These tags may represent
formatting information (such as bold), surround hyperlinks or other markers, and
may appear inside the text. Most internal tags can be moved around within the
sentence to suit the translation. Depending on the file format, some internal tags
can be added or deleted as required. By default, TagEditor classifies unknown tags as
internal.
6 - Tagged content and translation
When tags contain text other than structural or formatting information, TagEditor
classifies the text content as translatable or non-translatable:
Non-translatable tags – tags containing text that does not require translation are
classified as non-translatable. Most tags that contain text are non-translatable.
Non-translatable tags function as internal tags.
Translatable text within tags – when tags contain text that requires translation,
TagEditor displays the tag in three parts: the text to be translated appears as
normal text and the parts of the tag that surround it appear as interconnected
parts. You can customise the way TagEditor treats translatable text within tags.
During translation, TagEditor inserts its own tags to mark source and target
segments and to provide information about translation memory match values.
Translation unit tags – delimiting tags identify the source segment, match value and
target segment, respectively, of a regular translation unit. The tag content of each
document is vital to its integrity. By default, TagEditor protects both external and
internal tags in a document and ensures that they stay in place during translation.
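Why tag integrity matters can be shown with a simple check that compares the tags in a source segment with those in its translation. The sketch assumes HTML-style tags in angle brackets and ignores tag order, since internal tags may legitimately move within the sentence. It is an illustration only, not TagEditor's own tag verification, and the names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical tag pattern: opening, closing and stand-alone HTML-style tags.
TAG = re.compile(r"</?[A-Za-z][^<>]*>")


def tag_differences(source, target):
    """Return (missing, added): tags lost from and tags new in the target."""
    source_tags = Counter(TAG.findall(source))
    target_tags = Counter(TAG.findall(target))
    missing = sorted((source_tags - target_tags).elements())
    added = sorted((target_tags - source_tags).elements())
    return missing, added


source = "Click <b>Save</b> to continue.<br/>"
good = "Pour continuer, cliquez sur <b>Enregistrer</b>.<br/>"
bad = "Pour continuer, cliquez sur <b>Enregistrer</b>."

ok = tag_differences(source, good)
problem = tag_differences(source, bad)
```

The first translation moves the bold pair to a different position but keeps every tag, so nothing is reported; the second has lost the stand-alone `<br/>` tag, which the check lists as missing.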
6 - Tagged content and translation
TagEditor and Translator’s Workbench (TWB)
TagEditor is a specialized application designed for translating and editing tagged text
files. Tagged text formats play an increasingly important role in document authoring
and translation. For example, HTML tags are used to define the structure and layout
of pages on the World Wide Web. Standard Generalized Markup Language (SGML)
and Extensible Markup Language (XML) are also used for structuring complex
documentation.
 Workbench RTF is a Rich Text Format that is compatible with Translator’s
Workbench. You can use either TagEditor or Word to translate Workbench RTF.
 TradosTag (TTX) Bilingual Format is the default file format for bilingual documents
in TagEditor. It is an XML-based format that provides a standard method for
representing tagged text formats and bilingual data for translation purposes.
TradosTag files have a *.ttx extension.
 During interactive translation, TagEditor converts monolingual source files to the
TradosTag bilingual file format. TagEditor also supports files that have already been
converted to TradosTag before interactive translation. After translation and any
post-translation tasks such as review, tag verification or clean up, target files are
saved in the original file format.

6 - Tagged content and translation
Screen shot:
6 - Tagged content and translation
Screen shot: