TRANSLATION AND LOCALIZATION TECHNOLOGIES IN THE CLASSROOM
Theory and Practice
• 1 ~ Contextualizing translation technologies and projects
• 2 ~ Management of technologies, workflow and content
• 3 ~ Project management and quality control
• 4 ~ Reusing and recycling: alignment
• 5 ~ Translation memory
• 6 ~ Tagged content and translation
• 7 ~ Evaluation: processes and post-mortem
Debbie Folaron (Concordia University) CTTT Braga, Portugal 2008

• Professional and academic background
• On questions of training and education
• Assessing and accommodating professional and student needs
• Complying with academic requirements and professional standards

Point of Discussion 1
Needs Analysis: professional and academic
You have been asked or wish to incorporate a technology component into your translator training / translation program.
1. What technologies are you going to include?
2. How will you distinguish between short-term market trends and long-term transformations (economy, professional life, etc.) with regard to the technologies? Will you attempt to accommodate both?
3. What are your concrete training objectives?
4. What are your overall educational / academic objectives?
5. Which perspective on technologies for which goal (academic research; professional use)?
6. What are the criteria you have established for your priorities? What competencies and sets of skills?
7. Can we teach students to reflect on the use of technologies (analyze and critique) at the same time we are teaching them how to use them?

1 - Contextualizing translation technologies and projects
1.
PROFESSIONAL AND ACADEMIC OBSERVATIONS
My working premise: we *should* strive to have students reflect on the use and history of technologies while we are teaching them how to use specific technologies. Our experience (10-15 years) as users of translation technologies, and of technologies overall, now allows us to approach them with a more critical and analytical frame of mind. We can accommodate the imperative to reflect more substantively on technologies by considering the domains and histories that have contextualized their development:
◦ Human-Computer [Human-Machine] Interaction – from MT to CAT, along the HT/HAMT/MAHT/MT continuum [bridging the HT-MT gap]
◦ Localization – perhaps the first sustained “encounter” in a globalizing world between technologies and translation [many “localization procedures” have now become standard and routine components of translation projects in general]
Collaboration and teams characterize the translation environment today, even though we may not be aware of this virtual dimension when we work on our translation jobs individually.

Contextualize through basic questions…
Why do we use translation and localization technologies? What has transformed conventional translation projects?
◦ Globalization
◦ Technologies (computer, communications, information, Internet)
◦ Opening up of MT research
◦ Shared, distributed assets channeled through team and collaborative approaches
◦ “Geoculturalization” strategies
“…the act of allowing a local market’s geopolitics and culture to influence strategy, design and deployment of a product or service, [or] the refinement of the practice from localization into culturalization. […] For years we’ve heard endless commentary about globalization and the blurring of cultural boundaries, but I’d assert that in many ways the opposite is becoming true.
The emphasis is now on the power of the local, as being supported by the global technology infrastructure.” (Tom Edwards, Englobe consultant, Multilingual, 2008)

Contextualize through a prism of diverse and converging histories…
International trade and commerce
Human translation (HT)
Machine translation (MT)
Computer-assisted translation (CAT)
Communication, information, computer technologies
Localization
Internet
Globalization
Globalization, Internationalization, Localization, Translation (GILT)
Content management technologies

International Trade and Commerce
sea, land, air … and Internet

International Trade and Commerce
Protocols, regulations, negotiations, agreements
Moving goods: import and export
Selling and buying goods and services
Property and intellectual property
Sales agreements and contracts
Investments and financing
Modes and methods of payment
Insurance
Competition and collaboration
Trade agreements
Technologized and virtual
Localization -------- relationship to ICTs, Globalization and Internet

Human translation (HT)
SOURCE TEXT: language-culture
TARGET TEXT: language-culture
Other ADAPTATIONS

Human translation (HT)
as process and product of linguistic-cultural transfer, as analyzed through linguistic tools in terms of
◦ Sounds as units of representation: phonetics
◦ Sound functions and patterning: phonology
◦ Word structure:
morphology (form; lexical category; derivation; inflection)
◦ Sentence structure: syntax (words organized into phrases and sentences)
◦ Meaning: semantics (information content; mental representation; reference)
◦ Usage: pragmatics
◦ Acquisition: language acquisition
◦ Processing: psycholinguistics
◦ Variation: dialects; slang; jargons; idiolects
◦ Languages in contact: borrowings; pidgins; creoles; bilingualism; multilingualism
◦ Change: historical linguistics
◦ Culture and identity: anthropological linguistics and sociolinguistics
Relevancy of linguistics to MT…

Contextualize along the HT/HAMT/MAHT/MT continuum… with a focus on language
Human language – “natural language”: linguistics (and sub-domains)
Natural language: refers to a language that has evolved gradually as the major means of communication and expression of a community. It has native speakers, in contrast to computer languages and other artificial languages which have no native speakers. This type of language is normally used for human communication without any restriction of semantic scope and syntax.
Machine language – “artificial language”: computational linguistics
Artificial language: refers to a language invented for use in computer programming.
Computational linguistics is the branch of computer science concerned with natural language processing; it is about the use of computers in the study of human language and the study of making computers understand information expressed in human languages.
Natural language processing: a branch of computational linguistics which deals with the computational processing of textual materials in natural languages through human manipulation.
Human Translation ---------------------------------- Machine Translation

HT/HAMT/MAHT/MT continuum
Human translation: the process or act of producing a translation by a human being. To translate from one language to another requires a competent mastery of skills in language comprehension and reproduction in both the source and target languages. In human translation, translators use a variety of thought processes and skills to interpret the meaning of the source text and to communicate the meaning of that text in the target language. Human translators make proper use of language resources, such as term, phrase, and grammar dictionaries, and are capable of creating a translation that will be clearly understood in the reader’s target language.
Machine translation: refers to the use of machines (usually computers) to translate texts from one natural language to another. It has other designations such as “automatic translation”, when the process of translation is emphasized, “mechanical translation”, when the mode of production is highlighted, and “computer translation”, when the tool of production is brought to attention.

What is human-aided machine translation (HAMT)? Refers to the human translator supplying limited information to “fill out” the machine translation. The required human assistance may take place before machine processing begins, during the translation process, or afterwards.
What is machine-aided human translation (MAHT)? Refers to a type of human translation with limited assistance from the machine. It does not remove from the translator the burden of actually performing the translation. The machine is a tool to be used or controlled at the discretion of the translator.
Same as “computer-assisted translation” (CAT). Also, machine-aided translation, which refers to the use of computer programmes by translators to help them during the translation process. This includes such aids as spell checkers and online access to term bank equivalents.

Machine Translation (MT)
Machine translation is an interdisciplinary enterprise that combines a number of fields of study such as lexicography, linguistics, computational linguistics, computer science and language engineering. It is based on the hypothesis that natural languages can be fully described, controlled and mathematically coded (Wilss 1999: 140).
MT architecture approaches:
Direct translation (1st generation)
Rule-based (2nd generation)
Corpus-based (3rd generation)
Today’s translation demands include translation for many different purposes. For MT, at least four purposes have been identified: dissemination, assimilation, information exchange, and information access.

Computer-assisted translation (CAT)
The history of computer-assisted translation is tied to the history of the translator’s workstation.
One definition of a translator’s workstation: A workstation is a single integrated system that is made up of a number of translation tools and resources such as a translation memory, an alignment tool, a tag filter, electronic dictionaries, terminology databases, a terminology management system and spell and grammar-checkers. There are two major translation tools in a workstation or workbench: translation memory systems and terminology management systems. (C.K.
Quah)

“The translator’s workstation” (Harold Somers)
For example: In the late 1970s we find the first proposal for what is now called translation memory, in which previous translations are stored in the computer and retrieved as a function of their similarity to the current text being translated.
As computational linguistic techniques were developed throughout the 1980s, Alan Melby was prominent in proposing the integration of various tools into a translator’s workstation at various levels: the first level would be basic word-processing, telecommunications and terminology management tools; the second level would include a degree of automatic dictionary look-up and access to translation memory; and the third would involve more sophisticated translation tools, up to and including fully automatic MT.
Into the 1990s and the present day, commercial MT and CAT packages begin to appear on the market, incorporating many of these ideas.

Software with some translation capability will be an integral part of the translator’s workstation. The most important feature of this is that it is under the user’s control. The first thing to note is that commercial MT systems are designed primarily with use by non-linguists in mind. The typical system presents itself as an extended word processing system, with additional menus and toolbars for the translation-related functions including translation memory. […] In its most simple mode of use, the user highlights a portion of text to be translated. The draft translation is then pasted in the appropriate place in the target text window, ready for post-editing.
If the user can determine what text is to be translated, they will quickly learn to assess what types of text are likely to be translated well, and can develop a way of working with the system, translating more difficult sections immediately by hand, while allowing the system to translate the more straightforward parts. […] Many [CAT] systems offer a choice of interactive translation in which the system stops to ask the user to make choices. Full word processing facilities are available in the target text window to facilitate post-editing. With many systems, the same is true of the source text window, which simplifies the task of pre-editing, i.e. altering the source text so as to give the MT system a chance of doing a better draft translation (“post-editing the source text”).

[…] [T]he translator’s workstation represents the most cost-effective facility for the professional translator, particularly in large organizations. It makes available to the translator at one terminal a range of integrated facilities: multilingual word processing, electronic transmission and receipt of documents, spelling and grammar checkers, style checkers or drafting aids, publication software, terminology management, text concordancing software, access to local or remote term banks, translation memory, and access to automatic translation software to give rough drafts. The combination of computer aids enables translators to have under their own control the production of high quality translations.
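As a rough classroom illustration of the translation memory idea in the Somers excerpts above (previous translations stored and retrieved according to their similarity to the current segment), the following minimal Python sketch may help. It does not describe how any commercial CAT tool works: the memory contents, the function names `similarity` and `lookup`, and the 0.7 threshold are all hypothetical, and the character-based ratio from Python's standard `difflib` module simply stands in for whatever matching a real system applies.

```python
from difflib import SequenceMatcher

# Hypothetical toy translation memory: (source segment, stored translation).
TRANSLATION_MEMORY = [
    ("Press the start button.", "Appuyez sur le bouton de démarrage."),
    ("Close the file before exiting.", "Fermez le fichier avant de quitter."),
    ("Save the document.", "Enregistrez le document."),
]


def similarity(a: str, b: str) -> float:
    """Character-level similarity between 0.0 and 1.0 (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def lookup(segment, memory=TRANSLATION_MEMORY, threshold=0.7):
    """Return (score, source, translation) tuples at or above the threshold,
    best match first. The threshold value is an assumption for illustration."""
    scored = [(similarity(segment, src), src, tgt) for src, tgt in memory]
    matches = [m for m in scored if m[0] >= threshold]
    return sorted(matches, key=lambda m: m[0], reverse=True)


if __name__ == "__main__":
    for score, src, tgt in lookup("Press the stop button."):
        print(f"{score:.2f}  {src}  ->  {tgt}")
```

With this toy memory, `lookup("Press the stop button.")` returns the stored “start button” pair as a fuzzy (non-exact) match that the translator would then post-edit, while a segment with nothing similar in the memory returns an empty list and is translated from scratch.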
Origins of the Translator’s Workstation (John Hutchins)
http://www.hutchinsweb.me.uk/MTJ-1998.pdf
Proposals for the translator’s workstation can be traced back over more than 20 years. Their full integration and acceptance had to await technical developments of the 1990s, but their desirability for the effective utilization of machine aids and translation tools was recognized long ago.
The title of workstation has been applied to a number of translation aids, but here we are concerned only with the type of workstation intended for direct use by professional translators knowing both source and target languages, and retaining full control over the production of their translations.
Workstations and other computer-based translation tools are traditionally referred to as systems for “machine aided human translation” (MAHT), in order to distinguish them from MT systems with some kind of human assistance either before or after processing (pre- and post-editing), known often as “human aided machine translation” (HAMT).

The 1966 ALPAC report encouraged support for basic computational linguistics and the development of computer-based aids for translators. Computer-based terminological resources were received with increasing favor by translators from the late 1960s. Particularly in large governmental and industrial organizations, there was an increasingly pressing need for fast access to up-to-date glossaries and dictionaries in science, technology, economics and the social sciences in general.
The difficulties were clear: rapidly changing terminology in many scientific and technical disciplines, the emergence of new concepts, new techniques and new products, the often insufficient standardization of terminology, and the multiplicity of information sources of variable quality and reliability. It was recognized from the outset that on-line dictionaries for translators could not be the kinds of dictionaries developed in MT systems. Translators do not need the kind of detailed information about grammatical functions, syntactic categories, semantic features, inflected forms, etc. which is to be found in MT lexica, and which is indeed essential for automatic analysis. Nor do translators need to consult dictionaries for items of general vocabulary, which are equally essential components of an MT system dealing with full sentences.

In the 1970s, terminology data banks were being built to provide information on demand about individual words or phrases as the basis for the production of glossaries for specific texts, and for the production of published up-to-date specialized dictionaries for general use. Many of the databanks were multilingual, nearly all provided direct online access and most included definitions. In the case of other termbanks, the emphasis was on the provision of terms in actual context. […] The databases were intended not just for translators but also for lexicographers and other documentation workers, with facilities for compiling dictionaries and term glossaries, for producing text-related glossaries for machine-aided translation, for direct online access to multilingual terminology databanks, and for accessing already translated texts by means of indexes.
The archive of translations, recorded on magnetic tapes, could also be the source of re-usable translation segments. However, the whole complex of interlinked linguistic databases was constrained by the computer technology then available.

The use of a translation archive was elaborated by Peter Arthern (1979) in a proposal for what has now, since the late 1980s, become known as a translation memory. The suggestion was made in a discussion of the potential use of computer-based terminology systems in the European Commission. After stressing the importance of developing multilingual text processing tools and of providing access to terminological databanks, Arthern went on to comment that many EC texts were highly repetitive, frequently quoting whole passages from existing EC documents, and that translators were wasting much time re-translating texts which had already been translated. He proposed the storage of all source and translated texts, the ability to quickly retrieve any parts of any texts, and their immediate insertion into new documents as required. He referred to his concept as “translation by text-retrieval”, and envisioned an early model translator’s workstation which could still accommodate a full MT system. The concept would not come to fruition for another decade or more.

One of the most decisive moments in the development of the future translator’s workstation is now considered to be the (initially limited) circulation of a memorandum in 1980 by Martin Kay.
This combined a critique of the current approach to MT, namely the aim to produce systems which could essentially replace human translators or at best relegate them to post-editing and dictionary updating roles, and an argument for the development of translation tools which would actually be used by translators. Since this was before the development of microprocessors and personal computers, the context was a network of terminals linked to a mainframe computer. Kay’s basic idea was that existing text-processing tools could be augmented incrementally with translation facilities. The basic need was a good multilingual text editor and a terminal with a split screen; to this would be added a facility to automatically look up any word or phrase in a dictionary and the ability to refer to previous decisions by the translator to ensure consistency in translation; and finally to provide automatic translation of text segments, which the translator could opt to let the machine do without intervention and then post-edit the result, or which could be done interactively, i.e. the computer could ask the translator to resolve ambiguities.

Alan Melby, in 1981, put forward the use of a bilingual concordance as a valuable tool for translators. It enabled translators to identify text segments with potential translation equivalents in relevant contexts. As an example, he showed an English text segmented into phrases and its corresponding French version, segmented likewise. The computer program would then create a concordance based on selected words or word pairs displaying words in context.
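To make the bilingual concordance idea concrete for students, here is a minimal Python sketch. It is only an illustration under stated assumptions, not a reconstruction of Melby's program: the segment pairs in `ALIGNED` are invented and hand-aligned, and the function name `bilingual_concordance` is hypothetical.

```python
import re

# Hypothetical hand-aligned segment pairs (English source, French target).
ALIGNED = [
    ("The valve controls the flow.", "La vanne contrôle le débit."),
    ("Open the valve slowly.", "Ouvrez lentement la vanne."),
    ("Check the pressure gauge.", "Vérifiez le manomètre."),
]


def bilingual_concordance(keyword, aligned=ALIGNED):
    """Return every aligned pair whose source segment contains the keyword
    as a whole word, so the word can be read in context in both languages."""
    key = keyword.lower()
    hits = []
    for source, target in aligned:
        words = re.findall(r"\w+", source.lower())
        if key in words:
            hits.append((source, target))
    return hits


if __name__ == "__main__":
    for source, target in bilingual_concordance("valve"):
        print(f"{source}  |  {target}")
```

Looking up “valve” in this toy data lists both segments in which it occurs next to their French counterparts; a partial string such as “valv” returns nothing because matching is done on whole words.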
The concordance could be used not only as an aid to study and analyze translations, but also for quickly determining whether or not a given term was translated consistently in technical texts, to assist translators in lexical selection, and in the development of an MT system for some narrow sublanguage. Melby seems to be the first to suggest concordance application as a translation tool. In his experiment, texts were input manually and correspondences between texts (later called “alignments”) were also made by human judgement. Only the concordancing program was automated, but Melby was clearly looking forward to the availability of electronically produced texts and of automatic alignment. At the same time, he was making specific proposals for a translator’s workstation, quite independently of Kay’s proposals in 1980. Like Kay, Melby wanted the translator to be in control, to make his/her own decisions about when to translate fully and when to post-edit, and he wanted to assist translation from scratch by providing integrated computer aids.

The aim was the “smooth integration of human and machine translations” (Melby 1982), bringing together various ideas for supporting translators in an environment offering three levels of assistance. At the first level, certain translation aids can be used without the source text having to be in machine-readable form. The translator could start by just typing in the translation. This first level would be a text processor with integrated terminology aids and access to a bilingual terminology data bank, both in the form of a personal file of terms and in facilities for accessing remote termbanks (through telecommunications networks).
In addition, there might be access at this level to a database of original and translated texts. At the second level, the source text would be in machine-readable form. It would add a concordancing facility to find all occurrences of an unusual word or phrase in the text being translated, facilities to look up terms automatically in a local term file, display possible translations, and means of automatically inserting selected terms into the text. The third level would integrate the translator workstation with a full-blown MT system. Melby suggested that the ideal system would be one which evaluates the quality of its own output (from “probable human quality” to “deficient”), which the translator could choose to incorporate unchanged, to revise or to ignore.

Both Melby and Kay stressed the importance of allowing translators to use aids in ways they personally found most efficient. The difference between them was that whereas Melby proposed discrete levels of machine assistance, Kay proposed incremental augmentation of translator’s computer-based facilities. Translators could increase their use of computer aids as and when they felt confident and satisfied with the results. And for both of them, full automation would play a part only if an MT system made for greater and cost-effective productivity. These ideas of Kay and Melby were being made when text-processing systems still consisted essentially of a range of terminals connected to a mainframe computer and to separate printers for producing publishable final documents. It was natural to envisage networked systems rather than individual workstations.
For example, Melby assumed that the future scenario was a “distributed system in which each translator has a microcomputer tied into a loose network to share resources such as large dictionaries.” (1982) The technology situation definitively changed with the appearance of the first personal computers in the mid-1980s, providing access to word processing and printing facilities within the range of individual professional translators.

As needs change, technologies evolve, and environments are modified, the “tools” and “workspace” of the translator likewise are transformed. The history of the translator’s workstation reflects these changes.

Globalization, Internationalization, Localization, Translation (GILT)
Globalization (g11n): Refers to a broad range of processes necessary to prepare and launch products and company activities internationally. Addresses the business issues associated with launching a product globally, such as integrating localization throughout a company after proper internationalization and product design.
Internationalization (i18n): The process of generalizing a product so that it can handle multiple languages and cultural conventions without the need for redesign.
Localization (l10n): The process of adapting a product or software to a specific international language or culture so that it seems natural to that particular region. True localization considers language, culture, customs and the characteristics of the target locale.
It frequently involves changes to the software’s writing system and may change keyboard use and fonts as well as date, time and monetary formats.
Translation: The process of converting all of the text or words from the source language to the target language. An understanding of the context or meaning of the source language must be established in order to convey the same message in the target language.

Source: Pierre Cadieux, Technology Editor, LISA Newsletter & Bert Esselink, Chief Editor, Language International (http://www.lisa.org/globalizationinsider/2002/03/gilt_globalizat.html)
The "GILT slide" puts it all together.
* Globalization is a two-step process: internationalization and localization.
* There are usually several localization efforts happening in parallel.
* Translation is often the largest part of localization.
Translation refers to the specifically linguistic operations, performed by human or machine, that actually replace the expressions in one natural language with those of another.

We can see more and more practices and technologies that were previously very specific to the "localization world" entering into the more traditional translation industry. For example, translation memory tools are now commonly used by translators who translate material which is not software related. The concepts of translation and localization may progressively merge.
Localization may no longer be a separate discipline since sooner or later all translators will have to know at least the basics of localization – from translation to localization, and back again.
* * *
Localization basics are best understood through the notion/model of PROJECT.

Localisation Research Centre: http://www.localisation.ie/
The Localisation Industry Standards Association: www.lisa.org
Localization World: http://www.localizationworld.com/
Inttranews: http://inttranews.inttra.net/cgi-bin/home.cgi?langues=eng&phase=1
Multilingual magazine: www.multilingual.com
Common Sense Advisory: http://www.commonsenseadvisory.com/
Translator’s Tool Box (Jost Zetzsche): http://www.internationalwriters.com/toolbox/
John Hutchins Web site: http://www.hutchinsweb.me.uk/
Translation Automation User Society: http://www.translationautomation.com/joomla/
Byte Level Research: http://www.bytelevel.com/
Jeff Allen’s Post-editing site: http://www.geocities.com/mtpostediting/

2. PEDAGOGICAL EXERCISES
◦ Linguistic analysis of text, from perspectives of HT linguistics, MT computational linguistics, and CAT. Goal: to understand how text is generated by humans and by machines, for insight on how it is also translated by humans and machines. Benefit: how to revise HT and MT text.
◦ Review the HT process. Go through the same exercise but indicate how automation and CAT integrate into this process. How does the HT:CAT relationship differ from the MT:CAT one?
◦ Explain the above in terms of the translator’s workstation.
◦ Compare the general office work environment with the translator’s office work environment in terms of the software used: MS Office and SDL Trados.
◦ Others?

2 - Management of technologies, workflow and content
3 - Project management and quality control

1. PROFESSIONAL AND ACADEMIC OBSERVATIONS
Project management is a process of decision-making.
Resources managed: human; technical; material; financial.
There are general principles of project management; nonetheless, every project is unique.
Communication is vital to the well-being of the life-cycle and to the success of the project.
Standards and best practices are important, as is certification of processes, services and products.
The Project Management Institute recognizes five basic groups of processes: initiating; planning; executing; controlling and monitoring; and closing.
The PMI recognizes nine knowledge areas: management of project integration, scope, time, cost, quality, human resources, communications, risk, and procurement.

•“A Step by Step Guide to Translation Project Management” (Sanaa Benmessaoud 2002) at www.translationdirectory.com/articles/article1543.php
•Project Management Institute (PMI) www.pmi.org
•“Translation and Project Management” (C.R.
Perez 2002) at http://accurapid.com/journal/22project.htm
•“Translation Project Management” Andrey Vasyankin at http://www.translationdirectory.com/article65.htm

Project Management: “The Project Management Institute (PMI) (2000: 6) defines project management as ‘the application of knowledge, skills, tools and techniques to project activities to meet project requirements.’”
Project Manager: “A project manager (PM) will be required to plan the budget, track the workflow to ensure the project is completed on time, and control all the phases of the project to make sure its outcome will meet the client’s requirements.”

Translation Project’s Life-cycle (adapted from Perez 2002): commissioning, planning, groundwork, translation and wind-up.
Steps and phases:
•COMMISSIONING
  •Reception of RFQ [Request For Quotation]
  •Pre-sales evaluation
  •Commissioning

•PLANNING
  •Project evaluation: identify client’s needs and objectives, as well as short-term and long-term goals
  •Work sub-division: break-down structure and work packages
  •Schedule plan of dependencies and sequences: i.e.
which work package or activity depends on the completion or sequence of another
  •File management
  •Resource and budget plan
  •Communication plan
  •Quality Assurance plan: to evaluate overall project performance on a regular basis to provide confidence that the project will satisfy the relevant quality standards [PMI 2000]

•GROUNDWORK
  •Project glossary preparation
  •Text alignment
  •Text preparation
•TRANSLATION
•WIND-UP
How complex have projects become? http://www.project-open.com/solution/translation/

A brief word on file management… needed for tracking jobs and for storing data (project; client; translator)
Question: how would you create and manage your files?

2. PEDAGOGICAL EXERCISES
Think through, visualize and depict the flow of content and work within your organization or company. [This exercise is crucial, for example, when conceptualizing databases and putting them into place.]
If you had to propose and explain your organizational/company file management structure to consultants or to new project managers, what would this structure be like?
Simulate and carry out a hypothetical project with the class.
Connect with an NGO or other organization to carry out a real project. Concordia U projects include: Ad-Com Loc company; YMCA Tours Ecuador; Committee for Social Justice; Romani Yag Web site; Tactical Tech [for Africa].
Find, read and discuss PM position profiles.
Company information: WJ Airways management recently decided to expand business operations internationally, with the hope that global operations would nourish local domestic business by bringing in passengers to travel within the country. International routes will include flights to and from India, Africa, the Middle East, China, Latin America and Canada. A central hub will be established in Budapest, Hungary. The company will increase its fleet of aircraft from 50 to 75 within three years, and offer special vacation packages for international tourists.
Corporate decisions:
• Localize in-flight magazine, safety instructions (laminated card, video), TV screens (publicity, info, movies, music, maps), company financial application, Web site (including online reservation system), and customer service.
• Localize above service and product content from English into Hindi, Swahili, Arabic, Chinese, Spanish, Portuguese, French and Hungarian.

Lay out the territory to cover the project life-cycle from beginning to end.
Formulate relevant questions and preliminary answers.
Assess and integrate human, material and technological resources.

[Diagram: PROJECT MANAGER in relation to Programmers and Engineers; Source Content Writers and Developers; Target Content Writers and Developers; Legal, Commercial and Cultural Consultants]

What is content …?
•“Any digitized information—text, document, image, video, structured record, script, application code, or metadata—that conveys meaning or represents value in interactions or transactions. It ranges from documents to HTML to graphics to telematics and beyond.” (DePalma, Common Sense Advisory, 2008)
•“A system of words, images, audio and video that is integrated with information architecture and visual design to communicate…” (Harris and McCormack 2000)
The content carries communication!

Define project content…
•In-flight magazine
•Safety instructions (laminated card, video)
•TV screens (publicity, info, movies, music, maps)
•Company financial application
•Web site (including online reservation system)
•Customer service

Define project content format …
•In-flight magazine
  ◦ color desktop-published 100-page magazine
  ◦ bilingual text entries (English + language representing flight route)
•Safety instructions (laminated card, video)
  ◦ color desktop-published laminated card based on images and simplified explanations in bilingual version (English + language representing flight route)
  ◦ video film with audio providing detailed explanations (subtitled)
•TV screens (publicity, info, movies, music, maps)
  ◦ video film clips or pub spots from sponsors (subtitled or dubbed)
  ◦ real-time flight information in moving bilingual text [dynamic]
  ◦ films (subtitled or dubbed into English or other language)
  ◦ music (channels should include local music)
  ◦ maps (territory covered by flight route
and plane icon representing real-time movement) [dynamic]
•Company financial software application
  ◦ general ledgers and sub-ledgers to create income statements, balance sheets, and to track assets, liabilities, income and expenses, including modules for billing, job costing, points of sale
  ◦ ability to transfer information and funds between branches; to import data from other modules, systems, and spreadsheets; and to generate reports
  ◦ ability to measure adherence to industry or government accounting standards in currencies of all branches
•Web site (including online reservation system)
  ◦ splash page with options to enter site in all designated languages
  ◦ flights page: routes, destinations, schedules, booking, check-in
  ◦ guest and member pages: including profiles and bookings, itineraries, email newsletters or items of interest
  ◦ special offers: including vacation packages, car and hotel rentals
  ◦ rewards and airmiles
  ◦ travel information: including check-in times and methods; ID and travel documents; special needs; travel tips; international travel info
  ◦ company information: welcome; jobs; media and investors; sponsorship; online store
  ◦ reasons for flying with the company: marketing and PR
  ◦ contact info: + FAQs; management profiles; specific request forms
•Customer service
  customer service agent training course on-site on-line assistance information for phone and on-site presence in English and localized in
all other languages

A brief word about content format …
Content data is stored in many different formats. Different software applications represent and store information in different ways. Get to know your file extensions!
A file extension is nothing more than the last characters after the period in the name of a file.
FILExt is a database of file extensions and the various programs that use them. If you know the file extension, simply enter it into the search box on the left and click on the Search button. (http://filext.com/)
For example, if we run a search on .pdf:
•Acrobat Portable Document Format
•The PDF format has become a standard for document transfer between computer architectures. A PDF file retains formatting for the file being transmitted. Free viewers are available at the Adobe website and other locations.

Define project roles and players …
PROJECT MANAGER(S)
PROGRAMMERS AND ENGINEERS
SOURCE CONTENT WRITERS AND DEVELOPERS
TARGET CONTENT WRITERS AND DEVELOPERS
LEGAL, COMMERCIAL AND CULTURAL CONSULTANTS

Define project needs …
PROJECT MANAGER(S)
•+ project coordinators
PROGRAMMERS AND ENGINEERS for
•software applications (PM; workflow; financial)
•Web site [static and dynamic] content and applications
•database(s)
•graphics localizers
SOURCE CONTENT WRITERS AND DEVELOPERS
•technical writers
•desktop publishers [print and Web]
•Web site developers and Webmasters
•audio-visual producers
•Subject Matter Experts (SMEs)
•terminologists
•style guide writers
TARGET CONTENT WRITERS AND DEVELOPERS
•desktop publishers
•Webmasters
•terminologists and lead linguists
•CAT, Loc and MT tool/technology specialists (if not done by PMs)
•translators (with appropriate SME expertise)
•editors
•proofreaders and Quality Control
•subtitlers and dubbers (+ technicians)
LEGAL, COMMERCIAL AND CULTURAL CONSULTANTS
among others …

COMPLEXITY can quickly turn CHAOTIC and so we turn to automation when we can

Point of Discussion 2
Class Project Planning and Implementation
You have decided to organize a real-life class project so as to more effectively contextualize the use of translation technologies within a project framework. Discuss and plan the project details for your class. Create a project spec sheet.
Points to include:
•Content
•Languages and geographical regions
•Players and roles
•Resources (including technologies)
•Client specifications
•Project life-cycle phases, procedures, tasks
•Student participation and evaluation

4 - Reusing and recycling: alignment

What is alignment? Some definitions…
Alignment: a process that matches up a source text and the target text segment by segment into translation pairs, which will be stored up in a database to be used as a translation memory. Alignment makes it possible to reuse previous translations in future translations. Human input is required in alignment operations.
Alignment tool: translation software for the creation of bilingual text databases where sentences (or phrases) of source texts are linked to corresponding text segments of a target language.
Segment: a predefined unit of a source text that can be aligned with its corresponding translation in a machine or machine-aided translation system.
Segmentation: refers to sentence separation in a machine translation system, the purpose of which is to divide a text into easily manageable segments. Segmentation is unnecessary in some languages, but important in others. In the case of Chinese, one of the most intriguing issues in Chinese-English translation is the problem of segmenting the Chinese source text as there are no interval markers, or word boundaries, between two successive characters or phrases in a Chinese sentence.
A Dictionary of Translation Technology, Chan Sin-wai, The Chinese University Press, 2004

What is alignment?
Alignment is the process of binding a source-language segment to its corresponding target-language segment. The purpose of alignment is to create a new translation memory database or to add to an existing one. The corresponding pairs of source and target-language segments are called “translation units”.
Once the translator has loaded the parallel texts (an original and its translation) into the system, the tool makes a proposal for aligning the segments based on a number of algorithms such as punctuation, numbers, formatting, names and dates, for which the translator is offered various choices. The translator can then adjust the alignment proposed by the system before committing the aligned texts to the memory, either by creating a new one, for example for a new subject field or new client, or by adding to an existing one. Translation units are usually numbered or tagged.
The collection of translation units is stored, in no particular order, in the database for future translations. Most commercial alignment tools allow alignment at the sentence level. However, in recent years the attention of researchers is also focused on alignment methods for translation memory systems below the sentence level.
Translation and Technology, C.K. Quah, Palgrave Macmillan, 2006

What is alignment?
Alignment is the process of comparing a source text and its translation, matching the corresponding segments, and binding them together as translation units in a TM. For the best results (automatic alignment), the source and target texts must have a similar, if not identical, structure.
Alignment is the process whereby sections of the source text are linked up with their corresponding translations. Alignment can take place at many different levels: text, paragraph, sentence, sub-sentence chunk, or even word. Most bilingual concordancers align texts at either the paragraph or the sentence level. Alignments at text level are too high-level to be useful for helping translators find an equivalent for a particular expression, whereas alignment at word level is notoriously difficult and error-prone given the lack of one-to-one correspondence between most natural languages.
Computer-Aided Translation Technology, Lynne Bowker, University of Ottawa Press, 2002

How is alignment managed with a CAT tool? (SDL TRADOS)
Alignment is the process of determining which parts of the source and target language files belong together and putting them side by side or aligning them. The user plays an interactive role in the alignment process which enhances the alignment results.
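As a rough illustration of the general idea (pairing segments side by side and leaving the final say to the user), here is a minimal Python sketch. It is not how WinAlign or any other commercial tool works: the function name `align`, the purely positional pairing and the `max_ratio` length heuristic are all assumptions made for this example.

```python
from itertools import zip_longest


def align(source_segments, target_segments, max_ratio=2.0):
    """Pair segments by position and flag doubtful pairs for human review.

    Hypothetical heuristic: a pair is flagged when one side is missing or
    when the longer segment is more than max_ratio times the shorter one.
    """
    units = []
    pairs = zip_longest(source_segments, target_segments, fillvalue="")
    for number, (src, tgt) in enumerate(pairs, start=1):
        if not src or not tgt:
            review = True  # unpaired segment: the texts differ in structure
        else:
            ratio = max(len(src), len(tgt)) / min(len(src), len(tgt))
            review = ratio > max_ratio
        # Each translation unit is numbered, as the definitions describe.
        units.append({"id": number, "source": src, "target": tgt, "review": review})
    return units


source = ["Fasten your seat belt.", "Thank you.", "Enjoy your flight."]
target = ["Attachez votre ceinture.", "Merci."]
for unit in align(source, target):
    print(unit)
```

The third unit is flagged because the target text has no corresponding segment, which is the kind of case where the translator would adjust the proposal before committing it to the memory.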
WinAlign examines the source and target language texts to determine which sentence pairs belong together and creates a file which is then imported into Translator’s Workbench. WinAlign is based on Unicode and supports all languages supported by Windows 2000 and Windows XP, including Asian languages, bi-directional languages, and Unicode-only languages such as Hindi.

Alignment Concepts:
Structure Recognition: When linking source and target texts, WinAlign makes use of the fact that documents are usually structured and divided into various sections. For example, when a document is created in Microsoft Word, it usually contains structural elements identified by style names. The chapter title may be identified by the Heading 1 style. The same formatting must be present in the translated text as in the source. Other text formats, such as HTML, XML, FrameMaker, Interleaf and Ventura, use tags for this purpose. WinAlign uses this information to create a structure tree for the source and target documents, and allows you to interactively influence how this tree is built. Even when the document pairs do not have a clear structure, WinAlign can use font sizes and paragraph numbering to perform structure recognition.

Identifying Segment Pairs: Once this structure has been determined, WinAlign begins linking the individual segments. A segment is a sentence, title, footnote, table cell, list element, caption or any other textual unit that WinAlign identifies. The program examines the source and target texts carefully to create the most accurate segment alignments possible. Both context-related and content-related characteristics are taken into consideration. WinAlign analyses all features of the file, for example, index entries, footnotes, proper names, numbers, dates, formatting or tags.
The program also provides tuning options to determine how much importance should be placed on these source and target text elements during the alignment. The user can help optimize the alignment by supplying project-specific abbreviation and terminology lists. WinAlign considers a large number of factors during alignment, which helps to produce a high number of matching segment pairs.

Alignment Workflow:
The alignment workflow is summarized in the following steps:
1. Create a new alignment project in WinAlign.
2. Add source files and target files.
3. Align the source and target files.
4. Review the alignment.
5. Save the alignment project and export the alignment results.
6. Import the alignment results into a Translator’s Workbench translation memory.

5 - Translation memory

What is translation memory? Some definitions…
• Translation memory: a computer-aided translation program. In essence it is a database that stores translated sentences (translation units or segments) with their respective source segments in a database (the “memory”). For each new segment to be translated, the program scans the database for a previous source segment that matches the new segment exactly or approximately (a fuzzy match) and, if found, suggests the corresponding target segment as a possible translation. A translator can then accept, modify or reject the suggested translation.
• Translation memory system: refers to a type of machine-aided human translation tool that stores previous translations and offers these translations when identical or similar sentences are encountered when translating new materials.
• Similarity match: a type of matching scheme for the free-form queries in a computer-aided translation system.
The queries are first passed through the system and the browser performs a similarity match between the internal representation of the queries and the internal representation of each sentence in the database. In this way, both surface similarity and structural similarities can be matched.
A Dictionary of Translation Technology, Chan Sin-wai, The Chinese University Press, 2004

Translation memory has been defined as a “multilingual text archive containing (segmented, aligned, parsed and classified) multilingual texts, allowing storage and retrieval of aligned multilingual text segments against various search conditions” (EAGLES 1996, the Expert Advisory Group on Language Engineering Standards).
Unlike machine translation systems, which generate translations automatically, translation memory systems allow professional translators to be in charge of the decision-making whether to accept or reject a term or an equivalent phrase or segment suggested by the system during the translation process. Virtually all TM systems are language-independent and support international character sets that represent many, if not all, alphabets and scripts digitally.
Translation memory technology works by reusing previously translated texts and their originals in order to facilitate the production of new translations. It can also interface with databases of stored specialized terminologies that can be accessed and retrieved for reuse in new translations.
Translation and Technology, C.K. Quah, Palgrave Macmillan, 2006

A translation memory system has no linguistic component, and two different approaches are employed to extract translation segments from the previously stored texts. These are known as perfect matching and fuzzy matching.
• A perfect or exact match occurs when a new source language segment is completely identical, including spelling, punctuation and inflections, to the old segment found in the database, that is, in the TM.
• Unlike a perfect match, a fuzzy match occurs when an old and a new source language segment are similar but not exactly identical. Even a very small difference such as punctuation leads to a fuzzy match.
Translation and Technology, C.K. Quah, Palgrave Macmillan, 2006

As the degree of similarity between old source segments in the database or memory and new source text segments currently being translated may vary, an algorithm is used to calculate a percentage which expresses the degree of match. The higher the percentage of the fuzzy match, the closer the similarity between the two source language segments. The threshold percentage can be set by the user at a high level, for instance at 90%, to restrict the retrieval of old source language segments to those containing only small differences from the new source language segment. In contrast, the threshold can be set at a low level, for instance at 10%, to allow the translation memory to retrieve segments only weakly related to the new segment.
Segments that mean the same thing but differ in format such as dates, measurements, time and spellings all fall in the fuzzy match category although they are differently categorized. Some systems allow for the automatic processing of such changes. Polysemous and homonymous words, that is homographs, always need careful handling and present a challenge.
Translation and Technology, C.K. Quah, Palgrave Macmillan, 2006

Segmentation is the process of breaking a text up into units consisting of a word or a string of words that is linguistically acceptable.
Segmentation is needed in order for a TM to perform the matching (perfect and fuzzy) process. A pair of old source and target language texts is usually segmented into individual pairs of sentences. However, not all parts of texts, particularly specialist texts, are in a sentence format. Exceptions include headings, lists and bullet points. As a result, different units of segmentation are needed. A translator can decide the length of a segment but often punctuation is used as an indicator. A segment is then allocated a unique number or tag by the system.
It is important to note that while segmentation is quite natural for Latin-based alphabets, it is rather alien to languages such as Chinese, Thai and Vietnamese, which are written continuously without any spaces between characters. Thus, other methods of segmentation are required to determine the beginning and ending of a segment in such cases.
New segments can be added to the TM while translating, and alternatively previously translated source language texts and their translations can be entered into the memory through a process of text alignment.
Translation and Technology, C.K. Quah, Palgrave Macmillan, 2006

Most simply, a TM can be viewed as a list of source text segments explicitly aligned with their target text counterparts. The resulting structure is sometimes referred to as a parallel corpus or a bitext. Translation units are stored in the TM database. Some sophisticated TM programs use a type of technology called a neural network to store information. A neural network allows information to be retrieved more quickly than a sequential search technique.
The essential idea behind a TM system is that it allows a translator to reuse or recycle previously translated segments. Reusing a previous translation in a new text is sometimes referred to as “leveraging”.
How does a TM system work?
This technology works by automatically comparing a new source text against a database of texts that have already been translated. When a translator has a new segment to translate, the TM system consults the database to see if this new segment corresponds to a previously translated segment. If a matching segment is found, the TM system presents the translator with the previous translation, and the translator decides whether or not to incorporate it into the new translation.
Computer-Aided Translation Technology, Lynne Bowker, University of Ottawa Press, 2002

Segmentation: In most instances, the basic unit of segmentation is the sentence. However, not all text is written in sentence form. Headings, list items and table cells are familiar elements of text, but they may not strictly qualify as sentences. Therefore, many TM systems allow the user to define other units of segmentation in addition to sentences. These units can include sentence fragments or entire paragraphs. Deciding what constitutes a segment is not a trivial task.
How can the TM system identify sentences? Punctuation marks such as periods, exclamation points, and question marks are typically used. Problematic cases are abbreviations, section headings, and embedded sentences. Some of these problems can be resolved by incorporating stop lists (e.g., lists of abbreviations that do not indicate the end of a sentence, such as Mrs. and e.g.) into the TM system.
An additional issue is the fact that the segmentation units used in the source text may not correspond exactly to those used in the translation. This lack of one-to-one correspondence can create difficulties for automatic alignment programs.
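The punctuation-plus-stop-list approach to segmentation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the rule set of any actual TM system: the `STOP_LIST` contents and the function name `segment` are invented for the example, and real tools use far richer, language-specific rules.

```python
import re

# Hypothetical stop list: abbreviations whose final period does not end a sentence.
STOP_LIST = {"mrs.", "mr.", "dr.", "e.g.", "i.e.", "etc."}


def segment(text, stop_list=STOP_LIST):
    """Split text at ., ! or ? followed by whitespace or the end of the text."""
    segments = []
    start = 0
    for match in re.finditer(r"[.!?]+(?=\s|$)", text):
        end = match.end()
        tokens = text[start:end].split()
        # Last token before the candidate break, e.g. "Mrs." or "(e.g."
        last_token = tokens[-1].lower().lstrip("(\"'") if tokens else ""
        if last_token in stop_list:
            continue  # abbreviation: keep reading, this is not a boundary
        segments.append(text[start:end].strip())
        start = end
    tail = text[start:].strip()
    if tail:
        segments.append(tail)  # heading or fragment without final punctuation
    return segments


sample = "Mrs. Smith called the agency. Bring ID, e.g. a passport. Is that clear? Yes!"
for number, seg in enumerate(segment(sample), start=1):
    print(number, seg)
```

Without the stop list, the same text would be cut after "Mrs." and after "e.g.", producing fragments that are unlikely to match anything useful in the memory.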
Computer-Aided Translation Technology, Lynne Bowker, University of Ottawa Press, 2002

Matches: most TM systems present the user with a number of different types of segment matches. The most common types are exact, fuzzy, and term matches. Research is being done on full and sub-segment matches.
Exact matches are the most straightforward or perfect matches. An exact match is 100% identical to the segment that the translator is currently translating, both linguistically and in terms of formatting. The process used by the TM system to identify perfectly matching segments is one of strict pattern matching. This means that the two strings must be identical in every way, including spelling, punctuation, inflection, numbers, and even formatting. Any segment in the new source text that does not match an original segment precisely will not produce an exact match. The translator is not forced to accept the translation proposed by the TM system. Even though a segment may be identical, translators are concerned with translating complete texts rather than isolated segments, so it is important to read the proposed translation in its new context to be sure that it is both stylistically appropriate and semantically correct.
Full matches occur when a new source segment differs from a stored TM unit only in terms of so-called variable elements, which are sometimes referred to as “placeables” or “named entities”. Variable elements include numbers, dates, times, currencies, measurements, and sometimes proper names. These elements typically require some kind of special treatment in a text. TM systems need to ignore variable elements for matching purposes.
Computer-Aided Translation Technology, Lynne Bowker, University of Ottawa Press, 2002

Fuzzy matches are approximate or partial matches.
A fuzzy match retrieves a segment that is similar, but not identical, to the new source segment. Some TM systems use color coding to illustrate various types of differences between the new source text segment and the retrieved segment. The degree of similarity in a fuzzy match can range from 1% to 99%, and the user generally has the ability to set the sensitivity threshold to allow the TM system to locate previously translated segments that may differ only slightly from the new source text segment or segments that vary greatly.
If the sensitivity threshold is set too high, there is a risk that the TM will produce “silence”: potentially useful partial matches will not be retrieved. However, if it is set too low, the system will produce “noise”: the suggested translations that are retrieved will be too different from the new source text segment and therefore not helpful. When the threshold is very low, a match may be made on the basis of very general words (“the”, “and”) and the overall content of the retrieved segment may contain little of value for helping the translator to translate the new segment. Many translators prefer to set the threshold somewhere between 60% and 70%.
Although fuzzy matching can be useful, it requires careful proofreading and editing to ensure that the proposed translation is appropriate for inclusion in the new target text.
Computer-Aided Translation Technology, Lynne Bowker, University of Ottawa Press, 2002

Term matches are done through the process of active terminology recognition and essentially constitute automatic dictionary lookup. If one or more terms are recognized as being in the term base, the TM system points to the appropriate term records and the translator can then make use of the relevant information contained there.
This means that when no exact or fuzzy matches are found for source text segments, the translator might at least find some translation equivalents for individual terms in the term base.

Sub-segment matching falls partway between fuzzy and term matching. In fuzzy matching, the two segments must have a number of elements in common in order for a match to be established. In term matching, the new source segment is compared against entries in the term base. In the case of sub-segment matching, the elements that are compared are smaller chunks of segments. This means that a match can be retrieved between two small chunks of segments, even if the complete segments do not have a high degree of overall similarity. When both segments contain a chunk that is very similar indeed, there is a possibility that the translator may be able to reuse that chunk.

Further refined, a combined full segment/sub-segment approach allows the TM system to automatically compare the new source text segment against the stored TM. It will begin by examining complete segments, first looking for exact matches and then for fuzzy matches, and if no such match is found at the segment level, it will compare increasingly smaller chunks in an effort to find a match. In this way, the translator may be presented with sub-segment matches originating from several different segments, even if none of those complete segments qualified as a fuzzy match.

Computer-Aided Translation Technology, Lynne Bowker, University of Ottawa Press, 2002

5 - Translation memory

This strategy is similar to the approach used in example-based machine translation (EBMT).
The principal difference between a TM as a support tool and a full-fledged EBMT system is basically a question of who has the primary responsibility for analyzing the segments and formulating the target text. With a TM, that responsibility rests with the translator, whereas with EBMT the computer is responsible for producing a complete draft of the target text, though this may still need to be post-edited by a human translator.

No matches: in this case the translator must translate from scratch. Another option is to use a machine translation system to translate the portions of the source text for which no match was found in the TM.

Computer-Aided Translation Technology, Lynne Bowker, University of Ottawa Press, 2002

5 - Translation memory

There are two main ways in which translations can be entered into the TM database: through interactive translation or through post-translation alignment. Interactive translation has the potential to produce a TM that is high in quality but initially low in volume, whereas post-translation alignment has the potential to produce a TM that is higher in volume but (possibly) lower in quality. It is entirely possible to build a TM using a combination of both.

Computer-Aided Translation Technology, Lynne Bowker, University of Ottawa Press, 2002

5 - Translation memory

Interactive translation is the most straightforward way for translators to construct a TM, adding translation units to the memory as they go along. Each time the translator translates a source text segment, the paired translation unit can be stored in the TM database. Once a segment has been translated and stored, it immediately becomes part of the TM. This means that if that segment, or a similar one, occurs again in the text (even in the very next sentence), the previous translation is suggested to the translator automatically.
The translator then has the choice of accepting the previous translation or editing it if the context requires a change.

Note that many TM systems can also be networked, which means that multiple translators can contribute to one TM, and the volume of data that it contains can be built up more quickly. In a networked situation, it is possible to give different types of privileges to different users in order to exercise some form of quality control. For example, all users can be given permission to consult the TM, but the ability to add new TUs can be restricted to revisers or senior translators.

Computer-Aided Translation Technology, Lynne Bowker, University of Ottawa Press, 2002

5 - Translation memory

Working with an existing TM: there are two main methods, interactive mode and batch mode.

A translator working in interactive mode works through the new source text segment by segment, and the TM system attempts to match the segments stored in the database against the new source text segments. As each new segment is translated, the TU is immediately added to the TM and is available for reuse the next time an identical or similar segment is encountered.

In batch mode, which most TM systems also support and which is sometimes referred to as pre-translation, a user can run a complete source text through the system; whenever the system finds an exact match, it automatically replaces the new source text segment with the translation that is stored in the TM. Segments for which no match is found must later be translated by either a human translator or a machine translation system. In either case, the entire text must then be post-edited by a human translator to ensure that the replacements made by the system were correct. If the translator makes changes to any matches that were inserted automatically, these changes can subsequently be added to the TM to keep it up to date.
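The two working modes can be illustrated with a minimal sketch. It assumes a TM held as a plain Python dictionary that maps source segments to target segments; the names `pretranslate` and `store_unit` and the sample segments are hypothetical and do not correspond to any TM product.

```python
from typing import Dict, List, Optional, Tuple

# A draft pairs every source segment with a translation, or None if no exact match.
Draft = List[Tuple[str, Optional[str]]]


def pretranslate(segments: List[str], tm: Dict[str, str]) -> Draft:
    """Batch mode: pair each segment with its exact-match translation, if any."""
    return [(segment, tm.get(segment)) for segment in segments]


def store_unit(tm: Dict[str, str], source: str, target: str) -> None:
    """Interactive mode: add (or update) a translation unit immediately."""
    tm[source] = target


# Illustrative data only.
tm = {
    "Press the power button.": "Appuyez sur le bouton d'alimentation.",
    "Close the cover.": "Fermez le couvercle.",
}
source_text = ["Press the power button.", "Wait ten seconds.", "Close the cover."]

draft = pretranslate(source_text, tm)
untranslated = [segment for segment, target in draft if target is None]

# The translator supplies the missing segment; it is reusable straight away.
store_unit(tm, "Wait ten seconds.", "Attendez dix secondes.")
second_pass = pretranslate(source_text, tm)
```

In this sketch only exact matches are inserted automatically, so `untranslated` lists the segments that still need a human translator or a machine translation system, and every inserted match would still be reviewed during post-editing.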
Computer-Aided Translation Technology, Lynne Bowker, University of Ottawa Press, 2002

5 - Translation memory

TM systems are often integrated with other tools:

With terminology-management systems: the TM system compares the source text segments against the previously translated segments stored in the TM database. At the same time, using a process known as active terminology recognition, the terminology-management system (TMS) compares the individual terms contained in each source text segment against the terms contained in the term base. If a term is recognized as being in the term base, the translator’s attention is drawn to the fact that an entry exists for this term, and the translator can view the term record and then insert the term from the record directly into the target text.

With bilingual concordancers: these allow the user to retrieve all instances of a specific search string and view these occurrences in their immediate context. This means that a translator can ask to see all the occurrences of any text fragment (not just a pre-defined segment) that appear anywhere in the TM, along with their translation equivalents. This allows the translator to quickly view the search string in context together with its translations, which may not always be the same.

With machine translation systems: a new source text is first compared against a TM, which replaces those segments for which exact matches are retrieved. The segments that are still untranslated can be fed into a machine translation system, which produces a draft translation. The entire document is then passed on to a human translator for post-editing. The final translation can be aligned with the original source text and stored in the TM database for future reuse.
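Active terminology recognition and bilingual concordance search, as described above, can both be sketched as simple lookups. This is a minimal illustration that assumes case-insensitive substring matching, which is far cruder than what commercial tools do; the function names `recognize_terms` and `concordance` and all sample data are hypothetical.

```python
from typing import Dict, List, Tuple

TranslationUnit = Tuple[str, str]  # (source segment, target segment)


def recognize_terms(segment: str, term_base: Dict[str, str]) -> Dict[str, str]:
    """Return the term-base entries whose source term occurs in the segment."""
    lowered = segment.lower()
    return {term: target for term, target in term_base.items()
            if term.lower() in lowered}


def concordance(units: List[TranslationUnit], query: str) -> List[TranslationUnit]:
    """Return every translation unit whose source side contains the query."""
    needle = query.lower()
    return [(src, tgt) for src, tgt in units if needle in src.lower()]


# Illustrative data only.
term_base = {"print head": "tête d'impression", "paper tray": "bac à papier"}
units = [
    ("Clean the print head weekly.",
     "Nettoyez la tête d'impression chaque semaine."),
    ("Replace the print head if it is damaged.",
     "Remplacez la tête d'impression si elle est endommagée."),
    ("Load the paper tray.", "Chargez le bac à papier."),
]

terms = recognize_terms("Remove the print head carefully.", term_base)
hits = concordance(units, "the print head")
```

Here `terms` holds the single recognized entry for "print head", and `hits` holds the two stored units that contain the search string, each shown with its translation so the translator can compare the contexts.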
Computer-Aided Translation Technology, Lynne Bowker, University of Ottawa Press, 2002

5 - Translation memory

Most current commercial TM systems offer a quantitative evaluation of the match in the form of a score, often expressed as a percentage, and sometimes called a fuzzy match score or similar. How this “score” is arrived at can be quite complex, and is not usually made explicit in commercial systems, for proprietary reasons. In all systems, matching is essentially based on character-string similarity, but many systems allow the user to indicate weightings for other factors, such as the source of the example, formatting differences, and even the significance of certain words.

The character-string similarity calculation uses the well-established concept of “sequence comparison”, also known as the “string-edit distance” because of its use in spell checkers, or more formally the “Levenshtein distance”, after the Russian mathematician Vladimir Levenshtein, who introduced it. The string-edit distance is a measure of the minimum number of insertions, deletions and substitutions needed to change one sequence of letters into another. For example, “waiter” can be changed into “waitress” with one deletion and three insertions; when a substitution also counts as a single edit, three edits suffice (two insertions and one substitution). The measure can be adjusted to weight in favor of insertions, deletions or substitutions, or to favor contiguous deletions over non-contiguous ones.

In fact, Levenshtein’s sequence comparison, which works on any sequences of symbols (characters, words, digits, etc.), has a huge number of applications, ranging from file comparison in computers to speech recognition (sound waves represented as sequences of digits), comparison of genetic sequences such as DNA, and image processing. In principle, anything that can be digitized can be compared using Levenshtein distance.
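The string-edit distance described above can be computed with the standard dynamic-programming method. The sketch below gives every insertion, deletion and substitution a cost of 1. The `similarity_percent` helper, which divides the distance by the length of the longer string, is only an assumed way of turning a distance into a percentage; commercial systems do not publish their scoring formulas.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep a matching character
            ))
        previous = current
    return previous[-1]


def similarity_percent(a: str, b: str) -> float:
    """Assumed normalization: 100% for identical strings, lower as edits increase."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)


print(levenshtein("waiter", "waitress"))         # 3
print(similarity_percent("waiter", "waitress"))  # 62.5
```

With substitutions allowed, the distance between "waiter" and "waitress" is 3 (two insertions and one substitution), one edit fewer than a path that uses only insertions and deletions. Weighting the three operations differently, as the text mentions, would amount to replacing the constant costs of 1 in the inner loop.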
“Translation Memory Systems”, Harold Somers, in Computers and Translation: A translator’s guide, 2003

5 - Translation memory

How is translation memory managed with a CAT tool? (SDL TRADOS)

Translator’s Workbench is a sophisticated database system that is built around the core concept of translation memory, a method of capturing, storing and reusing translations. Archived translations are stored in translation memory databases. Translator’s Workbench supports interactive translation through the interface with popular editing environments such as Microsoft Word and TagEditor. This interface provides direct access to the translation memory database while translation is in progress.

During translation with Translator’s Workbench, the program builds a linguistic database that stores all translated sentences or segments with their source language equivalents. These segment pairs are referred to as translation units. At the same time, Translator’s Workbench builds an artificial neural network that is based on the content of the linguistic database. The neural network is designed to facilitate fast and efficient searching using fuzzy matching techniques. The linguistic database and its associated neural network are together referred to as a translation memory.

5 - Translation memory

Each new translation memory is empty. You can build translation memory interactively or by importing aligned sentence pairs. During interactive translation, Translator’s Workbench automatically updates the translation memory that is open in the background. Each time you translate a segment of text, the corresponding translation unit is added to the translation memory. If you encounter the same or similar text again in your source document, Translator’s Workbench proposes your previous translation(s).
You can accept, reject or edit these suggestions; both new and updated translations are added to the translation memory. In this way, the translation memory grows dynamically during the translation process.

You can also populate new or existing translation memories by importing previously translated material. The import feature enables you to transfer data from one translation memory to another, or to load translation memory data from WinAlign alignment projects. In this way, you can take advantage of existing translations when starting a new project.

5 - Translation memory

During translation, Translator’s Workbench uses database technology to search the translation memory and propose previous translations for reuse. The search is based on the degree of similarity between the source segment for translation and the source segments of translation units that are stored in translation memory. Translator’s Workbench expresses the degree of similarity between these source segments in terms of a percentage value. An identical match is therefore known as a 100% match, and is likely to provide the best available translation for the source segment you are translating.

As well as proposing identical matches, Translator’s Workbench uses a technique known as fuzzy matching. Source segments from translation memory that are similar, but not identical, to the source segment for translation are known as fuzzy matches. Fuzzy match values can range from 99% to 30%, though a minimum match value of 70% is usually enforced during interactive translation. Translator’s Workbench allows you to view all fuzzy matches in turn, and highlights the differences between translation memory content and the source segment for translation. This helps you to choose the best available translation for the source segment you are translating. As usual, you can accept, reject or edit suggestions.
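A minimum match value of this kind can be illustrated with a short sketch. It assumes that the `ratio()` of Python's standard `difflib.SequenceMatcher` stands in for the match percentage; this is not the scoring method used by Translator's Workbench, which is not documented here. The function name `fuzzy_lookup` and the sample translation units are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

Match = Tuple[float, str, str]  # (score in percent, stored source, stored target)


def fuzzy_lookup(segment: str, tm: Dict[str, str],
                 threshold: float = 70.0) -> List[Match]:
    """Return stored units scoring at or above the threshold, best match first."""
    matches = []
    for source, target in tm.items():
        # Stand-in similarity score; real tools use their own formulas.
        score = 100.0 * SequenceMatcher(None, segment, source).ratio()
        if score >= threshold:
            matches.append((round(score, 1), source, target))
    return sorted(matches, reverse=True)


# Illustrative data only.
tm = {
    "Click the Save button.": "Cliquez sur le bouton Enregistrer.",
    "Remove the battery.": "Retirez la batterie.",
}

exact = fuzzy_lookup("Click the Save button.", tm)
fuzzy = fuzzy_lookup("Click the Print button.", tm)
```

In this sketch the identical segment scores 100.0, the segment that differs by one word is returned as a fuzzy match, and the unrelated unit falls below the 70% threshold. Lowering `threshold` lets weaker candidates through, which corresponds to accepting more noise in exchange for fewer missed matches.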
As well as facilitating interactive translation, the fuzzy matching technique is also used during other types of translation memory search. The concordance feature and project management utilities such as document analysis and pre-translation all use fuzzy matching to identify translation memory content that is suitable for reuse.

5 - Translation memory

Concordance Searching

The concordance feature in Translator’s Workbench allows you to search the translation memory for fragments of text or sub-segments that are similar or identical to the text you are translating. Translator’s Workbench presents the search results as a list of source segments from translation memory in which the search text occurs, with their corresponding translations. You can configure Translator’s Workbench to automatically run a concordance search when no match is found for the current source segment in translation memory. Alternatively, you can run a manual concordance search using the Concordance command, which is available from the Tools menu in Translator’s Workbench.

5 - Translation memory

Batch Tools

Translator’s Workbench facilitates project management by providing batch tools for the analysis, pre-translation and post-production of files. The batch tools are so called because they allow you to process files individually or in batches. The analysis and pre-translation features help you to identify and apply reusable translation memory content before interactive translation begins. In this way, you can derive maximum benefit from existing translation memory content and reduce the requirement for human translation on new projects. The clean up feature is used after translation to remove unwanted source text from translated documents and update the translation memory in accordance with the latest changes.
This ensures maximum consistency between the content of your translated documents and your translation memory.

5 - Translation memory

Translation Memory Data Format

In a file-based translation memory, linguistic data is stored in a TMW file; the TMW file is associated with a group of neural network files that enable fuzzy search capability. In a server-based translation memory, linguistic and neural network data is stored as a group of database tables in a database management system. The database management system resides on a database server.

Although the method of data storage for each type of memory is different, the data format remains the same. In each case, the basic unit of translation memory data is the translation unit or segment pair. This means that linguistic data from either type of memory is presented and manipulated in the same way during interactive translation, project management and maintenance procedures. Furthermore, server-based translation memories use the same import and export formats as file-based translation memories. This facilitates the exchange of data between the two types of translation memory.

5 - Translation memory

TradosTag Bilingual File Format

TradosTag is the default bilingual file format in Translator’s Workbench and TagEditor. During the translation process, TagEditor converts all formats to TradosTag, which is an XML-based format for representing tagged text and bilingual data for translation purposes. Text and formatting information are extracted from the native file format and presented in an abstracted file format, TradosTag. TradosTag files have a TTX extension.

5 - Translation memory

Active Terminology Recognition

MultiTerm is integrated with Translator’s Workbench to provide active term recognition during translation.
This means that translations of terms stored in MultiTerm are automatically suggested as you translate your documents. Even if Translator’s Workbench cannot find a suitable segment match for the current source segment in translation memory, it can still help by retrieving information at term level. Matching terms from the MultiTerm termbase are highlighted in the Workbench source window. The corresponding termbase entry is displayed in the Workbench terminology window. You can easily paste the target term into the document you are translating, or carry out a further termbase search.

Active term recognition uses the fuzzy matching technique to identify terms that are identical or similar to the content of your source text. Active term recognition can find not only reduced word forms (for example, base forms of verbs) but also root forms of compound words, even if the elements of these compound words are spread throughout the source segment. You can add terms directly to your termbase from within Translator’s Workbench and Word, or from within TagEditor, enriching your termbase as you work.

5 - Translation memory

When creating a new translation memory, Translator’s Workbench creates five new files: a database file in which the translation units are stored and four neural network files required for fuzzy searches. *.tmw is the main translation memory database file, and *.mdf, *.mtf, *.mwf and *.iix are the neural network files. If you want to copy or move a translation memory, copy or move all five files. Otherwise Translator’s Workbench displays an error message when opening the copied translation memory.

6 - Tagged content and translation

How is tagged content managed with a CAT tool? (SDL TRADOS)

What are tags?

Tags are brief coded statements that contain information about formatting and structure in the tagged text file.
How this information is represented differs from one file format to another; this is why most tags are file format-specific. However, certain general characteristics apply to all tagged formats and their representation in TagEditor.

Opening and closing tags: these tags work in pairs to invoke and revoke an instruction. The opening tag indicates the start of a character format or structural element such as a heading. The closing tag marks the end of the formatting or structural element. A typical example of such a tag pair is one indicating the beginning and end of an HTML file, or indicating the scope of bold formatting. Text and other tag pairs may occur in between the opening and closing tags for a particular instruction.

Stand-alone tags: stand-alone tags work independently, for example the image tag in HTML. In TagEditor’s display, stand-alone tags are easy to recognise since they do not have sharp edges.

6 - Tagged content and translation

How is tagged content managed with a CAT tool? (SDL TRADOS)

What are tags?

TagEditor classifies all tags as external or internal, depending on their function:

External tags: external tags have a black border by default. They typically represent structural information. These tags and their content are completely ignored during translation and can only appear outside sentences. You rarely need to move or delete external tags during translation.

Internal tags: internal tags have a red border by default. These tags may represent formatting information (such as bold), surround hyperlinks or other markers, and may appear inside the text. Most internal tags can be moved around within the sentence to suit the translation. Depending on the file format, some internal tags can be added or deleted as required. By default, TagEditor classifies unknown tags as internal.
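The distinctions drawn above (opening, closing and stand-alone tags; external versus internal tags) can be illustrated on a small HTML fragment using Python's standard `html.parser` module. The sets `STANDALONE` and `EXTERNAL` are assumptions chosen for this example and do not reproduce TagEditor's actual tag settings; the class name `TagClassifier` is hypothetical. Tags that are not listed as external are treated as internal, mirroring the default described for unknown tags.

```python
from html.parser import HTMLParser
from typing import List, Tuple

STANDALONE = {"img", "br", "hr"}  # assumed stand-alone HTML elements
EXTERNAL = {"html", "head", "body", "h1", "h2", "p", "ul", "li"}  # assumed structural tags


class TagClassifier(HTMLParser):
    """Record each tag as (name, kind, scope) in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.report: List[Tuple[str, str, str]] = []

    def _record(self, tag: str, kind: str) -> None:
        # Anything not known to be structural is treated as internal.
        scope = "external" if tag in EXTERNAL else "internal"
        self.report.append((tag, kind, scope))

    def handle_starttag(self, tag, attrs):
        self._record(tag, "stand-alone" if tag in STANDALONE else "opening")

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/> is also a stand-alone tag.
        self._record(tag, "stand-alone")

    def handle_endtag(self, tag):
        self._record(tag, "closing")


classifier = TagClassifier()
classifier.feed(
    '<html><body><h1>Setup</h1>'
    '<p>Press <b>Start</b> now.<img src="ok.png"></p></body></html>'
)
classifier.close()

for entry in classifier.report:
    print(entry)
```

In this fragment the `b` pair sits inside the sentence and is reported as internal, the `img` tag is reported as stand-alone, and the `html`, `body`, `h1` and `p` pairs are reported as external structure.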
6 - Tagged content and translation

When tags contain text other than structural or formatting information, TagEditor classifies the text content as translatable or non-translatable:

Non-translatable tags: tags containing text that does not require translation are classified as non-translatable. Most tags that contain text are non-translatable. Non-translatable tags function as internal tags.

Translatable text within tags: when tags contain text that requires translation, TagEditor displays the tag in three parts: the text to be translated appears as normal text and the parts of the tag that surround it appear as interconnected parts. You can customise the way TagEditor treats translatable text within tags.

During translation, TagEditor inserts its own tags to mark source and target segments and to provide information about translation memory match values.

Translation unit tags: delimiting tags identify the source segment, match value and target segment, respectively, of a regular translation unit.

The tag content of each document is vital to its integrity. By default, TagEditor protects both external and internal tags in a document and ensures that they stay in place during translation.

6 - Tagged content and translation

TagEditor and Translator’s Workbench (TWB)

TagEditor is a specialized application designed for translating and editing tagged text files. Tagged text formats play an increasingly important role in document authoring and translation. For example, HTML tags are used to define the structure and layout of pages on the World Wide Web. Standard Generalized Markup Language (SGML) and Extensible Markup Language (XML) are also used for structuring complex documentation.

Workbench RTF is a Rich Text Format that is compatible with Translator’s Workbench. You can use either TagEditor or Word to translate Workbench RTF.
TradosTag (TTX) Bilingual Format is the default file format for bilingual documents in TagEditor. It is an XML-based format that provides a standard method for representing tagged text formats and bilingual data for translation purposes. TradosTag files have a *.ttx extension.

During interactive translation, TagEditor converts monolingual source files to the TradosTag bilingual file format. TagEditor also supports files that have already been converted to TradosTag before interactive translation. After translation and any post-translation tasks such as review, tag verification or clean up, target files are saved in the original file format.
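The idea behind tag verification can be shown with a minimal sketch that compares the tags in a source segment with those in its translation. It assumes simple angle-bracket tags matched by a regular expression and does not reproduce TagEditor's own verification; the name `verify_tags` and the sample segments are hypothetical.

```python
import re
from collections import Counter
from typing import List, Tuple

# Assumed tag shape: "<name ...>" or "</name>"; real formats need a proper parser.
TAG_PATTERN = re.compile(r"</?[A-Za-z][^<>]*>")


def verify_tags(source: str, target: str) -> Tuple[List[str], List[str]]:
    """Return (tags missing from the target, tags added to the target)."""
    source_tags = Counter(TAG_PATTERN.findall(source))
    target_tags = Counter(TAG_PATTERN.findall(target))
    missing = sorted((source_tags - target_tags).elements())
    added = sorted((target_tags - source_tags).elements())
    return missing, added


# Illustrative segments only.
source = "Press <b>Start</b> to begin."
good_target = "Appuyez sur <b>Démarrer</b> pour commencer."
bad_target = "Appuyez sur <b>Démarrer pour commencer."

ok_report = verify_tags(source, good_target)
error_report = verify_tags(source, bad_target)
```

Here `ok_report` contains two empty lists, while `error_report` names the closing bold tag that was lost in translation. Counting tags rather than comparing positions lets internal tags move within the sentence, as the translation may require.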