Kent State University
Gregory M. Shreve
Software Localization and Internationalization:
How and Why
Shreve
11/7/2004
Internet, E-Commerce & Foreign Markets
Internet World Stats estimates the
current number of WWW users at
785 million. Of these, 29% reside
in North America, 27.7% reside in
Europe, and 31% reside in Asia
with penetration rates of 69.8%,
29.9% and 6.7% respectively.
With 58.7% of current users
residing in regions with an
average penetration rate of only
18.3%, it is clear that these
foreign markets offer substantial
rewards for those prepared to
enter them.
The growth of the Internet
and e-commerce over the
next decade will be driven
by the expansion of foreign
markets.
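The 58.7% and 18.3% figures on this slide can be re-derived from the regional numbers it quotes. The short Python check below is only a sketch: the variable names are illustrative, and it assumes that 18.3% is a simple unweighted mean of the European and Asian penetration rates, which is what the numbers suggest.

```python
# Regional figures quoted on the slide (share of WWW users, penetration rate), in percent.
regions = {
    "North America": {"share": 29.0, "penetration": 69.8},
    "Europe": {"share": 27.7, "penetration": 29.9},
    "Asia": {"share": 31.0, "penetration": 6.7},
}

# "Foreign" markets from a North American point of view.
foreign = ["Europe", "Asia"]

foreign_share = sum(regions[r]["share"] for r in foreign)
# Assumption: the slide's 18.3% is an unweighted mean of the two rates.
mean_penetration = sum(regions[r]["penetration"] for r in foreign) / len(foreign)

print(f"share of users in foreign regions: {foreign_share:.1f}%")
print(f"average penetration rate: {mean_penetration:.1f}%")
```

A mean weighted by user share would give a different value, so the slide's average is best read as a rough indicator.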
Consumer as Foreigner
In 2003 e-commerce sales to foreign
customers exceeded domestic sales. This
year the European Internet economy is
expected to break the 4 trillion dollar mark,
growing at a compound annual rate of 87%.
Western Europe is expected to lead all
regions with 692 billion dollars in global
online exports in 2004.
North America will move 23% of its exports
online, with the U.S. pumping 210 billion
dollars into cross-border e-commerce. The
Asia-Pacific region will reach 219 billion
dollars in 2004, driven by 57 billion
dollars in Japanese online exports.
Global, Globalize, Globalization
Companies that intend to sell online will have to globalize their web
presence and their products to reach the majority of the online
marketplace. They will have to make their web sites, software
interfaces, and product documentation available in the languages
and cultural styles of an increasingly diverse and international
market by applying a process called localization – the translation of
content and adaptation of interface and form to reflect the
expectations of one or many given locales.
For global-strategy American companies, over
40% of total revenue comes from international
sales. These companies market high-technology
products such as software, medical instrumentation,
CAD / CAM devices, and so on.
Global, Globalize, Globalization
Most of these products have a high document overhead, with
instructions on the assembly, use, maintenance, and repair of the
products delivered via off- and on-line electronic documentation. Most
are marketed and supported online. Further, many products may have
embedded software components and user interfaces that use on-line
databases. These products and documents must be delivered to
locales: target markets with different cultural and linguistic contexts.
• Support: customer, technical, web
• Marketing: packages, web
• UI: user interfaces
• CBT: computer-based training
• Documentation: manuals, help files
Language Industry
While global marketing existed before the 1990s, the translation /
software localization industry (or “language industry” for short) today
has evolved primarily as a result of the rapid global expansion of the
computer software market and the increasing use of the Internet as a
global marketing and customer service tool – all part of globalization.
The corporate problem is, of course,
that many companies do not
understand HOW to prepare their
many products, documents, web
pages and database interfaces for
distribution in other linguistic and
cultural locales – hence the need for
the services of the language industry.
New Media, New Markets
Experts estimate the current worth of the U.S. language industry at just
under $2 billion annually, with the global market worth approximately $6
billion. Indications are that growth will continue to be strong into the
next decade because of new electronic media and markets.
Consider the case of massively multiplayer online games (MMOGs): the
language industry enables the
publishers of these games to leverage
their initial development investment by
translating and adapting the games for
international locales. Industry
projections are that MMOGs will post
a 52% compound annual growth rate
between 2002 and 2006.
Initial Definitions
This presentation examines the issues and processes involved in
software internationalization and localization.
There are three related major processes to consider. We have
already discussed globalization.
• globalization, a strategic decision to reach an international
audience or to include different linguistic and cultural materials in a
product, software application, web site or digital collection;
• internationalization, a design process intended to enable efficient
and cost-effective subsequent linguistic and cultural adaptation;
• localization, the preparation of locale-specific versions of an
application’s interface and content.
Abbreviations: G11N (globalization), I18N (internationalization), L10N (localization).
Internationalization & Localization
Localization is the preparation of locale-specific versions of a
software application, electronic document, internet resource, or
digital collection. It consists of the translation of textual material into
the language and textual conventions of the target locale and the
adaptation of non-textual materials and delivery / display
mechanisms to take into account the cultural requirements of that
locale.
[Diagram: globalization, internationalization, localization, translation]
Internationalization is an “upstream” engineering process that
should precede localization. Its aim is to make subsequent
localization/translation easier, more efficient, and less costly.
Scope of Processes
Each of these processes has a different scope and occurs at a different
point in the business and document cycles of an organization.
[Diagram: processes ordered from earlier to later: globalization, internationalization, localization, translation. Scope labels: organizational policies & strategies; business, IT, & document processes; documents, interfaces, tools]
Evolution of Software Localization
Software localization developed as
part of the globalization of the
personal computer software
market. Software applications and
supporting electronic documents
were the first “localized” products.
The growth of the Internet and the
World Wide Web created a
demand for localized web pages
and sites. Digital multimedia and
digital repositories (including digital
libraries) are emerging foci of
localization.
[Timeline, 1980 to 2005: PC software, WWW, multimedia, repositories]
Document: Display and Content
Localization focuses on both display (appearance, presentation) and content.
Thus, localization includes a cultural adaptation component as well as a
linguistic translation component.
[Diagram: documents placed along two axes, display to content and non-linguistic to linguistic]
• color, graphics, icons, symbols, display organization
• date, time, calendar, currency, number, address
• interface: menus, dialogs, messages, prompts, alerts, document organization, writing system
• metadata, vocabularies
• content: help files, auxiliary documents, HTML / XML document content
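Date and number formats are a small, concrete case of display conventions that vary by locale. The Python sketch below is illustrative only: the `CONVENTIONS` table and both function names are hypothetical and hand-written for three locales, whereas a real system would normally take these rules from a locale database.

```python
from datetime import date

# Hypothetical, hand-written convention table for illustration only;
# real systems normally take these rules from a locale database.
CONVENTIONS = {
    "en-US": {"date": "{m}/{d}/{y}", "thousands": ",", "decimal": "."},
    "fr-FR": {"date": "{d:02d}/{m:02d}/{y}", "thousands": " ", "decimal": ","},
    "de-DE": {"date": "{d:02d}.{m:02d}.{y}", "thousands": ".", "decimal": ","},
}


def format_date(value: date, locale_id: str) -> str:
    """Render a date using the day/month order of the given locale."""
    pattern = CONVENTIONS[locale_id]["date"]
    return pattern.format(d=value.day, m=value.month, y=value.year)


def format_number(value: float, locale_id: str) -> str:
    """Render a number with the locale's grouping and decimal separators."""
    rules = CONVENTIONS[locale_id]
    whole, fraction = f"{value:,.2f}".split(".")
    return whole.replace(",", rules["thousands"]) + rules["decimal"] + fraction


when = date(2004, 11, 7)
for locale_id in CONVENTIONS:
    print(locale_id, format_date(when, locale_id), format_number(1234567.891, locale_id))
```

The same stored date prints as 11/7/2004 for en-US and 07/11/2004 for fr-FR, which is why a date should be stored as data and formatted only at display time.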
Localizing Software Applications
Software applications were the first localized “electronic documents.”
Early localization meant finding all “strings” embedded directly in code (source.c):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    int n;
    char y[5];

    /* Every user-visible string below is hard-coded in the source. */
    printf("This program converts decimal numbers to hexadecimal\n\n");
    while (1) {
        printf("\nEnter decimal number: ");
        if (scanf("%d", &n) != 1)
            return EXIT_FAILURE;    /* stop on non-numeric input */
        printf("\nNumber entered is <%d> decimal and <%x> hexa", n, (unsigned)n);
        printf("\nDo you want to continue? ");
        /* %4s keeps the answer inside the 5-byte buffer */
        if (scanf("%4s", y) != 1 || strcmp(y, "yes") != 0) {
            printf("\n exiting ..\n");
            exit(EXIT_SUCCESS);
        }
    }
}
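Finding embedded strings can be partly automated. The following Python sketch scans C source for double-quoted literals; the regular expression and the `find_strings` helper are illustrative assumptions, not a tool named in this presentation, and they ignore comments and character literals.

```python
import re

# Matches a double-quoted C string literal, allowing escaped characters such as \" and \n.
C_STRING = re.compile(r'"((?:\\.|[^"\\])*)"')

SOURCE = r'''
printf("\nEnter decimal number: ");
scanf("%d",&n);
printf("\nDo you want to continue? ");
if(strcmp(y,"yes")) {
'''


def find_strings(source: str) -> list[tuple[int, str]]:
    """Return (line number, literal text) for every string literal found."""
    found = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for match in C_STRING.finditer(line):
            found.append((line_no, match.group(1)))
    return found


for line_no, text in find_strings(SOURCE):
    print(line_no, repr(text))
```

The scan also reports the format string "%d", which must not be translated. Deciding which literals are localizable still takes a person who understands the code.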
Extract Localizable Resources
RESOURCES
PortfolioMenu MENU
BEGIN
    POPUP "&File"
    BEGIN
        MENUITEM "&Add Student", 1
        MENUITEM SEPARATOR
        MENUITEM "&Delete Student", 2
        MENUITEM SEPARATOR
        MENUITEM "&Update Student", 3
        MENUITEM "E&xit", 4
    END
    POPUP "&Tools"
    BEGIN
        MENUITEM "Add &Portrait", 5
    END
    POPUP "&Help"
    BEGIN
        MENUITEM "About Portfolio", 6
        MENUITEM SEPARATOR
        MENUITEM "Contents", 7
    END
END
Strings are not the only localizable material:
• dialog boxes
• controls
• labels
• menus
• icons
• graphics
• tooltips
Localizing Web Pages
Web sites are also now being localized. The link below points to a
commented HTML file that gives a simple introduction to localizing an
HTML web page. At the localizer’s level some of the issues (not an
exhaustive list) are:
• character sets
• localizing tag content
• recognizing which tags have localizable content
• not breaking tags
• looking for text generated by attributes (title, alt)
• looking for text generated by scripts (server-side, client-side)
• evaluating CSS and stylesheet changes
• making changes to graphics
• dealing with graphics with integral text
Localization of HTML
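Several of the issues listed on this slide (tag content, title and alt attributes, script text) can be made concrete with a small scanner. This is a minimal Python sketch built on the standard library's `html.parser`. The `LocalizableText` class, the attribute list, and the sample page are assumptions made for illustration, not part of the commented HTML file mentioned on the slide.

```python
from html.parser import HTMLParser

# Assumption for this sketch: only these attributes carry user-visible text.
TEXT_ATTRIBUTES = {"title", "alt"}
# Content of these elements is code or styling, not text for translation.
SKIPPED_ELEMENTS = {"script", "style"}


class LocalizableText(HTMLParser):
    """Collect text nodes and text-bearing attribute values from a page."""

    def __init__(self):
        super().__init__()
        self.items = []        # (kind, text) pairs in document order
        self._skip_depth = 0   # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_ELEMENTS:
            self._skip_depth += 1
        for name, value in attrs:
            if name in TEXT_ATTRIBUTES and value:
                self.items.append((f"{tag}@{name}", value))

    def handle_endtag(self, tag):
        if tag in SKIPPED_ELEMENTS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.items.append(("text", text))


PAGE = """<html><head><title>Portfolio</title>
<script>var greeting = "Hello";</script></head>
<body><h1>Welcome</h1>
<img src="logo.gif" alt="Company logo">
<a href="help.html" title="Get help">Help</a></body></html>"""

parser = LocalizableText()
parser.feed(PAGE)
for kind, text in parser.items:
    print(kind, "->", text)
```

The scanner skips the script block, so the string "Hello" inside it goes unreported. Text generated by scripts has to be found by other means, which is the point of the script item in the list.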
A Solution: Re-Engineer the Software
As one could imagine, localizing directly in code led to problems.
First, translators / localizers were quite capable of “breaking code.”
There were also problems associated with the necessity for multiple
“re-builds” of the basic software for each language version. Language
expansion (differences in textual volume) created sizing problems in
dialogs and controls. Localization was labor-intensive, difficult and
expensive. A solution was to re-engineer the software with the intent
of separating language resources from the underlying delivery mechanism.
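Language expansion can be estimated before translation reaches the dialog designer. The Python sketch below compares character counts for three English/French string pairs of the kind used in this deck's catalog example. The helper names are hypothetical, as is the rule that a control is sized exactly for the English text, and character count is only a rough proxy for rendered width.

```python
# English and French strings of the kind used in the deck's catalog example.
PAIRS = [
    ("Enter decimal number:", "Entrer le nombre décimal:"),
    ("Do you want to continue?", "Voulez-vous continuer?"),
    ("exiting ..", "Sortie .."),
]


def expansion(source: str, target: str) -> float:
    """Relative change in character count from source to target text."""
    return (len(target) - len(source)) / len(source)


def fits(target: str, control_width: int) -> bool:
    """True if the translated string fits a control sized in characters."""
    return len(target) <= control_width


for english, french in PAIRS:
    # Hypothetical layout rule: the control was sized exactly for the English text.
    width = len(english)
    print(f"{english!r}: {expansion(english, french):+.0%}, fits: {fits(french, width)}")
```

Only the first pair grows, which shows that expansion varies string by string. Sizing every control for the longest expected translation is a design decision best made once, up front.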
Internationalization: Separate Resources
Internationalization is a re-engineering and re-design process intended
to make localization and translation easier, faster and more cost-effective.
A first step in the internationalization of software applications is the
separation or extraction of linguistic and cultural resources from the
application, leaving a “neutral” software kernel.
Extraction requires specialized localization tools.
[Diagram: the application is split into resources and a software kernel]
Extract Localizable Materials
source.c:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Message-lookup routines, defined elsewhere. */
extern unsigned char *intl_m_msg(), *intl_f_msg();

int main(void)
{
    int n;
    char y[5];

    /* Each string is fetched from catalog "mypg" by message number. */
    printf((char *)intl_m_msg("", "mypg", 1));
    while (1) {
        printf((char *)intl_m_msg("", "mypg", 2));
        scanf("%d", &n);
        printf((char *)intl_m_msg("", "mypg", 3), n, n);
        printf((char *)intl_m_msg("", "mypg", 4));
        scanf("%4s", y);
        /* Message 6 holds the locale's word for "yes". */
        if (strcmp(y, (char *)intl_m_msg("", "mypg", 6))) {
            printf((char *)intl_m_msg("", "mypg", 5));
            exit(0);
        }
    }
}
EXTRACT
mypg.en:
1  This program converts decimal numbers to hexadecimal\n\n
2  \nEnter decimal number:
3  \nNumber entered is <%d> decimal and <%x> hexa
4  \nDo you want to continue?
5  \n exiting ..\n
6  yes
Extract Localizable Materials
TRANSLATE (source.c is unchanged; only the message catalog is translated)
mypg.fr:
1  Ce programme convertit les nombres décimaux en hexadécimal\n\n
2  \nEntrer le nombre décimal:
3  \nLe nombre entré est <%d> décimal et <%x> hexadécimal
4  \nVoulez-vous continuer?
5  \nSortie ..\n
6  oui
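The lookup step behind this extract-and-translate workflow can be sketched in a few lines of Python. `get_message` is a hypothetical stand-in for a routine like `intl_m_msg`, not its actual implementation, and falling back to English when a locale has no catalog is an assumption of this sketch.

```python
# Catalogs keyed by locale, then by message number, mirroring mypg.en / mypg.fr.
CATALOGS = {
    "en": {
        2: "\nEnter decimal number: ",
        3: "\nNumber entered is <%d> decimal and <%x> hexa",
        6: "yes",
    },
    "fr": {
        2: "\nEntrer le nombre décimal: ",
        3: "\nLe nombre entré est <%d> décimal et <%x> hexadécimal",
        6: "oui",
    },
}


def get_message(locale_id: str, number: int, fallback: str = "en") -> str:
    """Look up a message by number, falling back to the default locale."""
    catalog = CATALOGS.get(locale_id, {})
    if number in catalog:
        return catalog[number]
    return CATALOGS[fallback][number]


for locale_id in ("en", "fr"):
    # The calling code is identical for every locale; only the catalog differs.
    print(get_message(locale_id, 3) % (255, 255))
```

Because the program never contains the text itself, adding a language means adding a catalog, with no change to the code and no rebuild.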
Content and Display in Web Pages
Web pages share the problem of “separation of content and coding” with
application software. You can see from our web page example how true this is.
Internationalization solutions in web pages also involve the “extraction” of
linguistic and cultural material from the software vehicle. Cutting-edge solutions
create dynamic HTML from XML-based language content.
XML:
<gradinquiry>
  <name>
    <firstname>Joan</firstname>
    <lastname>Smith</lastname>
  </name>
  <address>
    <addressline1>266 South Prospect Street</addressline1>
    <addressline2/>
    <city>Kent</city>
    <state>Ohio</state>
    <zip>44240</zip>
  </address>
  <country>USA</country>
  <phone>330-673-9999</phone>
  <fax>330-672-4017</fax>
  <email>gshreve@neo.rr.com</email>
</gradinquiry>
HTML:
<BODY>
<TABLE>
<TR><TD>Joan</TD><TD>Smith</TD></TR>
<TR><TD>266 South Prospect Street</TD></TR>
<TR><TD>Kent</TD></TR>
<TR><TD>Ohio</TD></TR>
<TR><TD>44240</TD></TR>
.
.
.
</TABLE>
</BODY>
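Generating the display markup from content-only XML can be sketched as follows. The deck names XSL transforms for this step. The Python below uses the standard library's `xml.etree.ElementTree` instead of XSLT purely as a stand-in, and the `render` and `cell_row` helpers are hypothetical names.

```python
import xml.etree.ElementTree as ET
from html import escape

RECORD = """<gradinquiry>
  <name><firstname>Joan</firstname><lastname>Smith</lastname></name>
  <address>
    <addressline1>266 South Prospect Street</addressline1>
    <addressline2/>
    <city>Kent</city><state>Ohio</state><zip>44240</zip>
  </address>
</gradinquiry>"""


def cell_row(*values: str) -> str:
    """Build one table row, escaping text so it cannot break the markup."""
    cells = "".join(f"<TD>{escape(v)}</TD>" for v in values)
    return f"<TR>{cells}</TR>"


def render(xml_text: str) -> str:
    """Turn the content-only XML record into a display-oriented HTML table."""
    root = ET.fromstring(xml_text)
    first = root.findtext("name/firstname", "")
    last = root.findtext("name/lastname", "")
    rows = [cell_row(first, last)]
    for field in ("addressline1", "addressline2", "city", "state", "zip"):
        value = (root.findtext(f"address/{field}") or "").strip()
        if value:  # skip empty elements such as <addressline2/>
            rows.append(cell_row(value))
    return "<TABLE>\n" + "\n".join(rows) + "\n</TABLE>"


print(render(RECORD))
```

The XML carries no layout decisions, so a locale that writes the family name first or orders address lines differently needs only a different rendering function, not different content.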
Two Multilingual Web Architectures
OLD: Multiple static versions of pages stored in a folder hierarchy by
language and navigated by a selection mechanism. After language selection,
a static web page is selected and displayed.
NEW: The principle of separating linguistic from software elements, as used
in software localization. Multilingual XML content is “dynamically” inserted
in generated local page templates by XSL transforms.
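The dynamic architecture can be sketched as one shared template filled from a multilingual content store. Everything named here is an assumption for illustration: the `<content>` / `<message>` / `<text>` layout, the `lookup` and `render` helpers, and the English fallback. Only the `xml:lang` attribute is standard XML.

```python
import xml.etree.ElementTree as ET
from string import Template

# ElementTree exposes the standard xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical multilingual content store: one <text> per language per message.
CONTENT = """<content>
  <message id="greeting">
    <text xml:lang="en">Welcome</text>
    <text xml:lang="fr">Bienvenue</text>
  </message>
</content>"""

# One language-neutral page template shared by every locale.
PAGE = Template('<HTML LANG="$lang"><BODY><H1>$greeting</H1></BODY></HTML>')


def lookup(root: ET.Element, message_id: str, lang: str, default: str = "en") -> str:
    """Return the text of a message in the requested language, else the default."""
    by_lang = {}
    for message in root.iter("message"):
        if message.get("id") == message_id:
            for text in message.findall("text"):
                by_lang[text.get(XML_LANG)] = text.text
    if lang in by_lang:
        return by_lang[lang]
    return by_lang[default]


def render(lang: str) -> str:
    """Insert language-specific content into the shared template."""
    root = ET.fromstring(CONTENT)
    return PAGE.substitute(lang=lang, greeting=lookup(root, "greeting", lang))


print(render("fr"))
```

In the static architecture the equivalent of `render` is just a path choice such as a per-language folder, so every layout change must be repeated in every language's copy of the page.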
I18N Content Management
[Diagram labels: acquire information; XML representation (content only, strip format); organize, classify; content repository (archive, database); translation; localization; style sheet repository; format; deploy; dynamic pages; display medium]
This system assumes an internationalized dynamic web page architecture.
Internationalization: Control
Truly effective internationalization also involves early intervention in
and re-design of “upstream” business and document processes like
authoring to exert greater control and to reduce variability.
[Diagram: document cycle with stages creation (authoring), acquisition, storage, retrieval, rendering, distribution]
Internationalization & Authoring
For instance, intervention in and re-design of document creation
processes (authoring) can yield significant “downstream” benefits for
localization. Controlled language and terminology control are two
strategies.
[Diagram labels: technical writers; controlled languages; terminology control; help text; software documents; machine translation; dependency; I18N; L10N; localization vendor]
Internationalization & Localization
Internationalization engineers work with or for clients to create
internationalized products.
[Diagram labels: technical writers; controlled languages; terminology control; help text; software documents; I18N; L10N; localization vendor; resources; software internationalization tools; internationalization engineers; localizable software distribution]
Localization Management & Tools
A localization project requires its
own processes and tools.
[Diagram labels: L10N; localizable software distribution; localization project; workflow management; project management tools; QA / testing / validation tools; localization tools; document / version control; translators / localizers]
Localization Management & Tools
Translation memories (TMs) and terminology managers are important tools
for maintaining standardized translations and glossaries. TMs provide the
focus of QA, ensure replicability / repeatability, and allow re-use of
linguistic and cultural materials.
[Diagram labels: project manager; localization engineer; localizable software distribution; localization project; localization tool (enterprise); translation memory; terminology manager; localization toolkit (distribution); localization tool (translator); translators / localizers]
Localization Management & Tools
Specialized localization tools for alignment and term extraction
are used to automate the construction of TMs.
[Diagram labels: text alignment tool; term extraction tool; translation memory; terminology manager; localization toolkit (distribution); localization tool (translator); translators / localizers]
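Alignment turns an existing source text and its translation into TM entries. The Python sketch below is deliberately naive: it pairs segments by position and gives up when the counts differ, whereas real alignment tools use statistical cues such as segment length. The function names are hypothetical.

```python
import re


def split_segments(text: str) -> list[str]:
    """Split text into sentence-like segments after ., ! or ? plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def align(source_text: str, target_text: str) -> dict[str, str]:
    """Pair segments by position; refuse to guess if the counts differ."""
    source = split_segments(source_text)
    target = split_segments(target_text)
    if len(source) != len(target):
        raise ValueError("segment counts differ; manual alignment needed")
    return dict(zip(source, target))


ENGLISH = "This program converts decimal numbers to hexadecimal. Do you want to continue?"
FRENCH = "Ce programme convertit les nombres décimaux en hexadécimal. Voulez-vous continuer?"

memory = align(ENGLISH, FRENCH)
for source, target in memory.items():
    print(f"{source} => {target}")
```

Raising an error on a count mismatch is a conservative choice: a misaligned TM silently propagates wrong translations into every later project that reuses it.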
Reusability
• Version 1: initial translation with TM tool
• Version 2 (new version): uses 70% same text; 30% change
• Version 3 (latest version): uses 80% same text as previous; 20% change
The translation memory carries the unchanged text from one version to the next.
Reusability is an especially important objective of internationalization and
reduces the cost of localization.
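The 70% reuse figure can be reproduced with a toy translation memory. This Python sketch assumes exact-match reuse only (commercial TM tools also offer fuzzy matching), and the ten-segment document, the `leverage` and `translate` helpers, and the fake translator are all hypothetical.

```python
def leverage(segments: list[str], memory: dict[str, str]) -> float:
    """Fraction of segments that already have a translation in the memory."""
    if not segments:
        return 0.0
    reused = sum(1 for segment in segments if segment in memory)
    return reused / len(segments)


def translate(segments, memory, translate_new):
    """Reuse stored translations; send only new segments to the translator."""
    output = []
    for segment in segments:
        if segment not in memory:
            memory[segment] = translate_new(segment)  # new work, stored for next time
        output.append(memory[segment])
    return output


def fake_translator(segment: str) -> str:
    """Stand-in for a human translator: just tags the text."""
    return f"[fr] {segment}"


memory = {}
version1 = [f"segment {i}" for i in range(10)]
version2 = version1[:7] + ["new a", "new b", "new c"]   # 70% same text

print(f"version 1 reuse: {leverage(version1, memory):.0%}")
translate(version1, memory, fake_translator)
print(f"version 2 reuse: {leverage(version2, memory):.0%}")
```

Only the segments missing from the memory reach the translator, so the cost of each new version tracks the amount of changed text rather than the size of the document.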
Goals of Internationalization
The goals of internationalization are:
• reusability (translations)
• scalability (I18N solution)
• authority / quality (equivalence)
• accessibility (cross-language)
• accuracy / acceptability (target culture(s))
• control (target document)
These goals are met by separating content from display, defining and
extracting culturally variable material from fixed or neutral material,
intervening in the document cycle to exert control over document processes,
and using translation memories and terminology management to ensure critical
characteristics such as authority and reusability.
Enhanced Corpora
Future directions in
internationalization
will involve exploiting
document corpora
more effectively and
extracting useful
linguistic and textual
objects for control
and re-use.
Control of the
document cycle
begins with
understanding the
documents we
already “own” and
enhancing them.
New Localization Objects
Many linguistic objects useful in computer-assisted authoring and
translation, web page localization, machine translation and cross-language
information retrieval (including browsing) can be extracted from a
well-understood and deliberately structured document corpus.
[Diagram label: Corpus]
Corpus Replication
Using statistical
techniques it is
possible to
replicate the
contents of a
monolingual
corpus and add
multilingual
equivalents for
terms, phrases,
document
segments and
other objects to it.
What The Industry is Doing Now
The language
industry currently
relies on using
translation
memories and
terminology
managers. There
are significant
drawbacks to this
method that prevent
new gains in cost
reduction and
profitability – the
goal of internationalization.
A New Model
New approaches to
internationalization
and automatic
localization leverage
the linguistic value of
existing corpora and
allow the creation of
“enhanced” corpora
whose contents are
understood and
controlled. Statistical
corpus linguistics and
XML combine to allow
the next step in
localization
technology.
Peer-to-Peer Localization Resources
A peer-to-peer
networking platform
with a security and
digital rights
management layer
can be used to link
clients in an XML
resource network. A
vendor can assess
per-transaction
charges for access
to corpus object
stores.
Socio-Cultural Style Sheets
The peer-to-peer networking platform can also be used to provide
new capabilities for next-generation localization. Client-Side
Socio-Cultural Style Sheets (CSSCS) can provide automated,
on-the-fly delivery of web content in the languages and formats
desired and expected by web users all over the world.