Interactive Real-time Translation via the Internet

Mark Seligman
Université Joseph Fourier
GETA, CLIPS, IMAG-campus, BP 53
150, rue de la Chimie
38041 Grenoble Cedex 9, France
seligman@cerf.net
From: AAAI Technical Report SS-97-02. Compilation copyright © 1997, AAAI (www.aaai.org). All rights reserved.
Abstract

The Internet offers a tremendous opportunity for experiments with real-time machine translation (MT) of dialogues. Interchanges may be typed, or speech recognition (SR) techniques might be used to accept spoken input. In either case, given the number of ambiguities still escaping automatic resolution, interactive disambiguation for both components seems likely to remain important in many Internet applications for some time to come. In fact, simple concatenation of interactive SR and interactive MT may offer a "low road" to "quick and dirty" speech translation, as a pragmatic near-term alternative to tightly integrated systems aiming for more automatic operation. Disambiguation at the speech recognition stage is particularly important as a foundation for speech translation, in view of the need to "kill ambiguity before it multiplies". I briefly discuss interactive disambiguation for MT. Interactive disambiguation for speech recognition is the next topic: isolated-word speech recognition is seen as an especially interactive input mode with several advantages which are usually overlooked. Discussion then turns to ways of transmitting translated text via the Internet, with consideration of ftp and Internet Relay Chat. A pioneering demo of speech translation over the Internet organized by Kowalski, Rosenberg, and others is described, and possible alternative architectures are examined.
Introduction

Text can now be transmitted inexpensively over the Internet in real time, and a range of relatively inexpensive speech recognition and translation components are becoming available. As a result, there is now a tremendous opportunity for experiments with real-time machine translation (MT) of dialogues over the Net. Interchanges could be typed, or speech recognition (SR) techniques might be used to accept spoken input in order to create speech translation systems.

The central theme of this paper is that interactivity will be crucial for exploiting this opportunity. That is, given the number of ambiguities still escaping automatic resolution in current SR and MT components, interactive disambiguation for both components seems likely to remain important for many Internet translation applications for some time to come, as Section 1 will argue. For speech translation, disambiguation at the speech recognition stage will be described as particularly important in laying a firm foundation for translation, in view of the need to "kill ambiguity before it multiplies".

Section 2 will propose that in fact simple concatenation of interactive SR and interactive MT may offer a "low road" to "quick and dirty" speech translation, as a pragmatic near-term alternative to tightly integrated systems aiming for more automatic translation (though integrated systems appear more theoretically satisfying and promising in the longer term).

Section 3 will briefly discuss interactive disambiguation for MT, demonstrating some possible modes of interaction by reference to the LIDIA-1.2 project (Boitet 1996, Blanchon 1996). This dialogue-based MT system was designed for off-line use, but might be adapted for on-line use as well.

Interactive disambiguation for speech recognition is next discussed (in Section 4). In particular, isolated-word (as opposed to continuous) speech recognition will be viewed as an especially interactive input mode with several advantages which are usually overlooked.

Finally, discussion turns in Section 5 to transmission of translated text via the Internet, with consideration of ftp and Internet Relay Chat. A pioneering Internet demo of speech translation organized by Kowalski, Rosenberg, and Krause (1995) will be described, and alternative dialogue translation architectures for the future will be discussed.
1. The Need for Interaction in Translation

The argument for interaction during translation, especially speech translation, is straightforward: at the present state of the art, attempts to automatically translate even simple texts often leave many ambiguities which cannot reliably be automatically resolved. Programs cannot yet resolve these ambiguities; humans (usually) can; so humans should help the programs, at least until they can better help themselves.
The ambiguities at issue may be structural (such as uncertain attachments for prepositional phrases, as in The man saw the woman in the park with the telescope) or lexical (such as uncertainty whether bank means "edge of river" or "lending institution"). Further, information necessary for smooth translation into the target language is often missing in the source language input: for instance, a Japanese noun will normally lack information about definiteness and number which would be needed for translation into natural English. This underspecification, too, can be viewed as a kind of ambiguity, since it leaves open many possible interpretations.

To the extent that humans must intervene, they should do so early. Translation ambiguities tend to multiply exponentially as several sources of uncertainty permute within translation stages (within analysis, or transfer, or generation) and as earlier stages pass unresolved problems to later ones. A basic motivation for human intervention during text translation is to resolve ambiguities before they can permute or propagate. The hope is that by intervening within or between translation stages to select and prune, users can reduce the uncertainty which is passed along. To nip propagating ambiguities in the bud, intervention should take place as early as possible. A slogan for this aspect of the case for interactive translation: "Kill it (ambiguity) before it multiplies".

But if this argument for early human intervention is persuasive for text translation, it should be even more persuasive for translation based on spoken input. Speech recognition introduces very considerable uncertainty of its own, and does so at the earliest stage in the overall speech translation process, where there is a maximum possibility that ambiguity will be amplified by later stages. If much uncertainty remains in the speech recognition results, the whole speech translation process stands on a very shaky foundation. Any efforts to eliminate that uncertainty will certainly be worthwhile.
2. The High Road vs. the Low Road to Speech Translation

I have been arguing that as long as many translation ambiguities escape automatic resolution, interactive disambiguation will remain desirable; and that, when disambiguation is necessary, it should be done as early as possible. In systems with speech input, early intervention (that is, disambiguation of speech recognition candidates) will be especially important.

Since speech translation provides the greatest scope for the interaction which is our central interest here, and moreover presents an especially exciting prospect for communication over the Internet, we can concentrate on this ambitious translation mode for the balance of the discussion. I now want to suggest that simple concatenation of interactive speech recognition with interactive machine translation may suffice for a "quick and dirty" speech translation system suitable for widespread use on the Internet.
Such a pragmatic approach contrasts with that of most current speech translation research, where the aim is tight integration among system components in order to achieve maximally automatic speech translation. (An example of the integrated approach is the Verbmobil speech translation system (Wahlster 1993). Seligman and Boitet (1993) propose one possible architecture for such integration.)

The purpose of tight integration between speech and translation is to allow two-way feedback, so that each component can help the other to resolve its ambiguities. In the "quick and dirty" or concatenative approach, however, much of the need for integration between speech and translation is removed, since its burden is passed to a human operator: because the operator is allowed to completely disambiguate the speech input before it is passed to the translation component, feedback from translation to speech becomes redundant; while in the reverse direction, ideal transmission of segmental information from speech to translation is achieved, since a correct outcome is guaranteed. (Non-segmental, or prosodic, information is absent, however. The aid it might give to analysis must be supplied by the operator during interactive MT disambiguation.) Following this "low road" to speech translation, the goal of relatively automatic or transparent dialogue translation is postponed in hopes of obtaining a usable system sooner.
As a matter of fact, the "quick and dirty" approach to speech translation, in which a human intervenes between speech and translation, has been applied for practical reasons even in systems which ideally aim for maximally automatic operation. For example, in the ASURA speech translation system (Morimoto 1993) used in a widely reported international demo, speech recognition provided an n-best list of utterance candidates. If the correct text string was among the candidates, the operator selected it, and it became the input to translation, now guaranteed to be correct. If the correct string was absent, the entire utterance had to be repeated.
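The operator step just described can be pictured with a minimal sketch: speech recognition yields an n-best list, the operator either selects the correct string (which then goes on to translation) or asks the speaker to repeat. The function names and the recognizer and translator stubs below are illustrative assumptions, not details of ASURA itself.

```python
# Sketch of the concatenative "low road": SR produces an n-best list, a human
# operator confirms one candidate, and only confirmed text reaches MT.
# recognize_nbest() and translate() are placeholders for real components.

from typing import List, Optional

def recognize_nbest(audio) -> List[str]:
    raise NotImplementedError      # stand-in for the speech recognizer

def translate(text: str) -> str:
    raise NotImplementedError      # stand-in for the MT component

def operator_select(candidates: List[str]) -> Optional[str]:
    """Show the n-best list; return the chosen string, or None if none is correct."""
    for i, cand in enumerate(candidates, start=1):
        print(f"{i}. {cand}")
    reply = input("Number of the correct candidate (blank if none): ").strip()
    return candidates[int(reply) - 1] if reply else None

def translate_utterance(audio) -> Optional[str]:
    chosen = operator_select(recognize_nbest(audio))
    if chosen is None:
        print("Please repeat the utterance.")   # the whole utterance must be re-spoken
        return None
    return translate(chosen)                    # input to MT is guaranteed correct
```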
As the goal of the system was maximally automatic operation, the lack of real integration between speech and translation might be considered a weakness of the system. However, the arrangement did increase its reliability to the point of usability, and this seems crucial. Such interactive or interventionist speech-to-translation interfaces may offer the quickest path to speech translation systems which are reliable enough for actual use. And actual use is vital: actual experience with working systems may well lead to rapid progress, even if the systems are initially relatively primitive.
It should be clear that the ultimate goal remains relatively automatic speech translation based upon tight integration between speech recognition and translation. Greater automaticity is intrinsically desirable; there can be little doubt that it will depend on knowledge source integration; and such integration is in any case a fundamental issue in cognitive science and computational engineering. But one possible path to this goal is to begin less ambitiously in order to gather momentum more quickly. As the Scottish song goes, "You take the high road, and I'll take the low road, and I'll be in Scotland before ye." While work continues on fully automatic, highly integrated speech translation systems as a long-term goal, it is prudent to explore highly interactive, less integrated approaches for the near term. Both high and low paths can, and I think should, be pursued in tandem.
3. Interactive Disambiguation for Machine Translation

I have been arguing for interactive disambiguation during machine translation, especially speech translation. Some description of such interaction will now be helpful, first for interactive MT, then for interactive speech recognition.
In interactive MT systems, the system first tries to resolve by itself as many ambiguities as possible. Among the problems which remain, it may select those which it judges most crucial. It then presents the user with questions, usually multiple choice, whose answers should resolve the structural or lexical ambiguity at issue.
Description of a recent dialogue-based MT system, LIDIA-1.2, and full references to earlier work can be found in (Boitet 1996), to appear in the proceedings of MIDDIM-96 (the International Seminar on Multimodal Interactive Disambiguation, Col de Porte, France, August 11 - 15, 1996). A good idea of the range of current work in this area can be gained by browsing this volume.

Boitet's description of the program's operation:

... the document is first interactively "cleaned" and "tagged" (for special terms or idioms, utterance styles, etc.). The units of translation are then sent to a state-of-the-art, all-paths analyzer written in Ariane-G5 and running on a server, which returns a structure factorizing all ambiguities which cannot be reliably solved with the knowledge available. A question mark then appears next to each ambiguous unit of translation. The user clicks on it to activate a disambiguation dialogue which doesn't suppose any particular expertise in linguistics, computer science, or any of the target languages.
To resolve structural ambiguities, the disambiguation dialogue presents the user with a menu of possible interpretations in the source language. Each interpretation is based on canonical patterns which have been chosen for the type of ambiguity found. For example, to resolve two senses of L'ouvrier creuse la chaussée et l'asphalte (which may mean "the worker digs the road and the asphalt" or "the worker digs the road and asphalts it"), the program asks the user to choose between L'ouvrier creuse l'asphalte ("the worker digs the asphalt") and L'ouvrier asphalte la chaussée ("the worker asphalts the road"). Similarly, to resolve lexical ambiguities, the user is asked to choose among several definitions in the source language.
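The flavor of this multiple-choice exchange can be conveyed with a minimal sketch. The Ambiguity structure and the ask_user() helper below are hypothetical, not part of LIDIA-1.2 or Ariane-G5; they simply show a system posing source-language paraphrases as a menu and recording the user's choice.

```python
# Minimal sketch of a LIDIA-style multiple-choice disambiguation step.
# The Ambiguity class and ask_user() helper are illustrative inventions.

from dataclasses import dataclass
from typing import List

@dataclass
class Ambiguity:
    unit: str                 # the ambiguous unit of translation
    paraphrases: List[str]    # canonical source-language rewordings

def ask_user(amb: Ambiguity) -> str:
    """Present source-language paraphrases and return the chosen reading."""
    print(f"Ambiguous unit: {amb.unit}")
    for i, option in enumerate(amb.paraphrases, start=1):
        print(f"  {i}. {option}")
    choice = int(input("Choose interpretation number: "))
    return amb.paraphrases[choice - 1]

# Example from the paper: a coordination ambiguity in French.
amb = Ambiguity(
    unit="L'ouvrier creuse la chaussée et l'asphalte",
    paraphrases=["L'ouvrier creuse l'asphalte",
                 "L'ouvrier asphalte la chaussée"],
)
selected = ask_user(amb)   # the selected reading is passed on to transfer/generation
```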
LIDIA-1.2 and its predecessors have been designed for off-line translation of text rather than for real-time dialogue translation. LIDIA does, however, demonstrate the sort of interaction likely to be needed for on-line translation where quality is important, and adaptation for real-time use may be possible. (See further below.)
4. Interactive Disambiguation for Speech Recognition

I have argued the near-term advantages of interactive disambiguation for speech translation. However, many questions remain about the style of interaction, the points of interaction, and so on.
For example, it is unlikely that the procedure described above for ASURA, selection from an n-best list for the entire utterance, is at all optimal from an ergonomic point of view. The user had to choose the right candidate from among ten or so similar competitors. While this style of choice was manageable in a carefully planned demo, it would have been tiring in a more spontaneous dialogue.
An improved interface for a connected speech recognizer might collapse the common elements of several candidates, thus presenting a word or phrase lattice to the user, who might then indicate the correct path using a pointing device. Or it might present only the best candidate, and provide a means to quickly correct erroneous portions.
But there is another approach to selection of speech recognition candidates which is particularly interactive, since it intervenes particularly early and particularly often to nip uncertainty and error in the bud before they can propagate. This is the use of isolated-word speech recognition, as seen in several dictation programs currently available on the market. When dictating word by word, the user corrects each word before going on to the next one. The prefix word string is thus known to be correct, and the language model can take full advantage of this knowledge in guessing what comes next. Further, the boundaries of the current word are known exactly, and this knowledge, too, greatly constrains the recognition task. When ambiguity remains, the dictation program presents a menu of candidates for the next word, and the user can choose the right one, using a pointing device or saying "Choose one" or "Choose two".
The disadvantages of isolated-word speech input are obvious enough: this entry mode is much slower than natural speech, and considerably less natural, so that some learning is required. However, the advantages of this input mode are also rather striking, and may cover a multitude of sins at the current stage of research.

The first advantage has already been stressed: reliability or robustness. Good results can now be obtained even for user-independent dictation systems, those requiring no registration for new users.

Perhaps more surprising, though, is the breadth of coverage which is immediately available: the capacity to recognize words across many domains, even when they are quite rare. I was impressed, for instance, by one dictation system's ability to recognize my rendering of Abraham Lincoln's Gettysburg Address: infrequent words like continent, conceived, liberty, dedicated, proposition, etc. were correctly recognized immediately and without ambiguity, although there had been no prior registration of my voice. To a researcher accustomed to the need for extensive domain-specific training in order to recognize common words in connected speech using lexicons of a few hundred words, this success was striking indeed. But then, when word boundaries are known, the frequency of the target word and the size of the lexicon become less important. What matters much more under these conditions is confusability with similar words. A word which sounds like no other word in the lexicon can be recognized easily, even if it is quite arcane, and even if the lexicon is quite large. (The word "arcane", for instance, would probably cause little difficulty for the dictation program under discussion.)
Since it would not require users to remain within a specified and extensively trained domain of discourse, a speech translation system based on isolated-word dictation could potentially be used for many purposes with a minimum of domain adaptation for speech recognition. (Of course, the need to customize the translation component would remain as an obstacle to completely unrestricted use.)
The advantages of fully disambiguated speech for a speech translation system have already been stressed: barring editing errors, the segmental information passed to the translation component is guaranteed to be correct. But if the speech recognition interface employs isolated-word entry in particular, there are several additional advantages from the translation component's viewpoint. Dictated entry is likely to eliminate many of the disfluencies of spontaneous speech, since it forces users to speak relatively slowly and carefully and gives continuing opportunities for revision and editing. Thus the syntax of whole utterances is likely to be much cleaner than in spontaneous speech, and accordingly easier to analyze. Further, since dictation is slower than connected speech, whole utterances will probably be shorter on the average. Overall, since input will be (more nearly) correct, shorter, and syntactically simpler, in all likelihood less interaction will be needed in the MT stage to produce sufficient translations.
5. Architectures for Dialogue MT on the Internet

Having argued for the extensive use of interactive disambiguation in building "quick and dirty" dialogue translation systems for near-term use on the Internet, we come now to the question of architecture.
The Information Transcript Project
One"quick and dirty" speechtranslation systemfor the
Internet has already been demonstrated.This is the M1TLyon Information Transcript Project (Kowalski,
Rosenberg,
& Krause 1995).
(See also
http://sap.mit.edu/projects/mit-lyon/project.htmh). The
project providesa proof of concept,and showswhatcan be
done with existing commercialcomponents.
Sites at MITand at the Biennale d’Art Contemporainin
Lyon,Francewerelinked for four hours a dayfor several
days betweenDecember20, 1995 and February18, 1996.
Visitors at each site dictated in their ownlanguagesto
IBM’sVoice Type isolated-word speech recognition
software. Therecognizedsource-languagetext files were
transmittedtransatlanticallyby ftp, via a dedicatedpageon
the WorldWideWeb.Meanwhile,observers of the page
couldsee a transcript of the multi-partydialogue. (Hence
the project’s name.) The Webobservers could also type
into a window
on the page;the resulting text wastreated as
sourcetext to be translated, just as if it hadbeendictated.
At the receiving end, the source-languagetext, whether
dictated or typed, was translated via Global-Link
commercial
translation software. English target text was
spokenby standard Macintoshspeechsynthesis software.
Frenchtarget text waspronounced
by a dedicateddevice. To
completethe conferencecall effect, video andthe original
audio werealso transmitted via CU-SeeMe.
Theaim waslargely artistic and inspirational: "...the
intention ... is to reveal some hidden aspects of
communication
at the time of its globalization."(Website,
above)Thustranslation quality wasnot crucial. All the
same, the project demonstratesthat technologyis now
widelyavailable whichpermitsthe creation of "quickand
dirty" speech translation systems, in whichinteractive
disambiguationis exploited to makesystem integration
less necessaryin the interest of getting somethingusable
up andrunningquickly.
Somefurther details of the InformationTranscript system
maybe appreciated. It involved nine machinesin four
locations, andpermitted:
(1) Translation between English and French, in both directions, of text input from any source. Translation was handled on a pair of Macintoshes via AppleScript. In the script, a loop checked periodically for the presence of a text input file. When input text was found, it was translated using Global-Link commercial software, and the translated text was sent overseas via ftp. (As the ftp connection was handled through a Web server in Miami, text traveling in both directions could be captured in order to build up a transcript, visible via a Web page.) On arrival, translated text was deposited in a text file to await pronunciation. (A rough sketch of this polling loop appears after this list.)
(2) Pronunciation of text. On the English-French translation Macintosh, a script for SimpleText received English text in a buffer, selected the text, and ran Speak Text to pronounce it. On the French-English translation Macintosh, no such software was available; so a separate French speech synthesis device was attached to the Macintosh's serial out. In either case, the text to be pronounced might be the output of translation arriving from overseas, or it might be text typed to a Web page and forwarded by the Web server.
(3) Speech recognition in English and French. Speech recognition was carried out by IBM's VoiceType isolated-word dictation software, running on OS/2 on IBM machines (one for English, one for French). Perl scripts handled input of speech and output of text. Output was sent via ftp to the translation Macintoshes for translation.
(4) Typed participation in the translated dialogue via the World Wide Web. As mentioned, a Web server in Miami handled the ftp traffic between the translating Macintoshes (via a Common Gateway Interface script) and captured a transcript of text going both ways. The Web server also maintained a Web page which could accept typed input. Viewers of the page could thus participate in the interchange by typing text to the page, right along with participants using speech recognition: the translating Macintoshes didn't care whether the text they translated came through the Web page or via speech recognition.
(5) A 2-way video hookup. Speakers talking to the speech recognition software were on camera. Two Macintoshes running Player, in addition to those used for translation, were used. A CU-SeeMe connection was established via an (unsuspecting) reflector, or relay, in New York City.
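Following up the note in item (1): below is a minimal sketch of the poll-translate-send loop that the AppleScript on each translation Macintosh implemented. The file names, host name, and translate() stub are assumptions made only to keep the sketch concrete; the original used Global-Link for MT and a Web server in Miami as the ftp relay.

```python
# Sketch of the item-(1) loop: poll for recognized text, translate it, and
# ship the result overseas by ftp. Paths, host, and translate() are
# illustrative stand-ins, not details of the original AppleScript setup.

import os
import time
from ftplib import FTP

INBOX = "incoming.txt"        # recognized source-language text is dropped here
REMOTE_HOST = "example.org"   # stands in for the Web/ftp relay used in the demo

def translate(text: str) -> str:
    raise NotImplementedError  # placeholder for the commercial MT component

def poll_translate_and_send():
    while True:
        if os.path.exists(INBOX):                      # new dictated or typed input?
            with open(INBOX, encoding="latin-1") as f:
                source = f.read()
            os.remove(INBOX)
            target = translate(source)                 # MT step
            with open("outgoing.txt", "w", encoding="latin-1") as f:
                f.write(target)
            with FTP(REMOTE_HOST) as ftp:              # send translated text overseas
                ftp.login()                            # anonymous login, for the sketch
                with open("outgoing.txt", "rb") as f:
                    ftp.storlines("STOR transcript.txt", f)
        time.sleep(5)                                  # check again in a few seconds
```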
Other Architectures
The Information Transcript project exploited the World Wide Web and ftp for transmission of files. It also distributed processing across several machines, with one running SR, one running MT, etc. We can take a moment to consider other possibilities.
Where transmission is concerned, another natural direction would be to use Internet Relay Chat, or IRC. Chat requires an IRC server which links numerous users running client software. A user, who may be anywhere in the world, selects a Chat channel from among hundreds. He or she types into a special input window, and hits Return when his/her utterance is finished. The finished utterance instantly appears in a larger window visible to all who have selected the same channel. This common window thus comes to resemble the transcript of a cocktail party, with numerous conversations interleaved. Any user can easily record all of the utterances in a session from his or her vantage point.
Chatters can use prefixes or other tricks to direct conversation to particular fellow chatters. Some current Chat client software even includes voice synthesis, so that a given user not only sees but hears the incoming utterances, differentiated, if desired, by different synthesized voices. By mutual agreement, chatters can "move" into "private rooms" to hide their words from others. It is also straightforward to establish a new channel, in which the establisher controls participation.
With respect to distribution, one could assemble all of the processing for a single conversant on a single sufficiently roomy and fast computer; or a client/server arrangement could be adopted, with translation or speech recognition software on a large server accessed over the Internet. The first plan has simplicity to recommend it; the second allows for much greater computational power, but incurs the difficulties of more complex communication protocols and of network delays and breakdowns.
Possible combinations are many. Combining the IRC and all-in-one-machine options, we could arrive at the following particularly simple setup for experiments in interactive translation: interactive MT running on a PC or workstation would serve as a front end for IRC. A user would first interactively disambiguate translations in the source language, and when satisfied would press Return or give an equivalent verbal signal; the MT component would in turn pass its output, the translated text string, to Chat for transmission. Experiments could be conducted with or without interactive speech recognition (for greatest simplicity, also on the same PC or workstation as the translation program) as an additional front-end layer; and with or without speech synthesis (ditto) as a back end.
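A minimal sketch of such a front end appears below. It assumes a plain-socket IRC connection (NICK, USER, JOIN, and PRIVMSG are standard IRC commands) and stubs out the interactive MT step; the server name, channel, and interactive_translate() helper are illustrative only, not references to any real deployment.

```python
# Sketch of interactive MT as a front end for IRC. The user types source text,
# confirms or corrects the proposed translation, and only the confirmed target
# text is sent to the channel. PING/PONG keep-alive handling is omitted.

import socket

SERVER, PORT = "irc.example.org", 6667
CHANNEL, NICK = "#translation-demo", "mt_frontend"

def interactive_translate(source: str) -> str:
    """Stub for the interactive MT step: propose a translation, let the user edit it."""
    proposed = source            # a real system would call the MT engine here
    edited = input(f"Proposed translation: {proposed}\nEdit, or press Return to accept: ")
    return edited or proposed

def run():
    sock = socket.create_connection((SERVER, PORT))
    def send(line: str):
        sock.sendall((line + "\r\n").encode("utf-8"))
    send(f"NICK {NICK}")
    send(f"USER {NICK} 0 * :Interactive MT front end")
    send(f"JOIN {CHANNEL}")
    while True:
        source = input("Source utterance (blank to quit): ")
        if not source:
            break
        target = interactive_translate(source)         # disambiguate before transmission
        send(f"PRIVMSG {CHANNEL} :{target}")           # only confirmed text reaches the chat
    send("QUIT :done")
    sock.close()
```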
Possible components for "quick and dirty" systems are many, too. Now available commercially or as shareware are a variety of isolated-word dictation systems; text translation systems which are sufficiently fast (though quality is likely to be fair to poor); operating systems supporting adequate scripting languages; IRC clients with sufficiently open architectures to permit the easy addition of a front end; etc.

Two desirable components for "quick and dirty" MT or speech translation systems are still difficult to obtain, however.
First, interactive disambiguation tools for MT are generally unavailable. The LIDIA-1.2 system described in Section 3, while having many of the capabilities needed for dialogue disambiguation, presently accepts input only from the Ariane-G5 system; and this system, developed for off-line text translation, has until now been too slow for real-time use. To use LIDIA-1.2 for dialogue translation, one might modify other translation systems to deliver output in the proper format; or one might speed up the original translation system, either by recoding or by running on a very powerful machine. Both options are now under consideration.
A second difficulty is related: that of finding an appropriate translation server if a client/server architecture is desired. Early efforts to use Ariane-G5 as a translation server are reported by LaFourcade (1996). Slow but robust communication has been established by email, and faster but less reliable connections have been arranged via HTTP, the protocol used in the World Wide Web.
Conclusions

Now that inexpensive real-time transmission of text over the Internet is a reality and a range of speech recognition and translation components adaptable for interactive use are becoming available, it has become practical to build and experiment with "quick and dirty" dialogue translation systems, including speech translation systems. I have suggested that in order to take the "low road" to systems usable in the near term, such systems should substitute interactive disambiguation for tight integration among components. However, maximum automaticity via tight system integration remains the long-term aim.

I have argued the advantages of interactivity for near-term MT in general: as long as many translation ambiguities escape automatic resolution, interactive disambiguation will remain important. Disambiguation, when needed, should be done as early as possible. In speech translation systems, early disambiguation of speech recognition candidates will be especially worthwhile. Viewed in this light, isolated-word speech recognition has a number of unexpected advantages as an especially interactive input mode.
Procedures for interactive MT and SR have been discussed, and a working demo of a "quick and dirty" speech translation system, implemented by Kowalski, Rosenberg, and others, has been reported. Possibilities for alternative designs have also been briefly presented.

While the Information Transcript project demonstrates the technical feasibility of "quick and dirty" systems for machine translation, even speech translation, over the Internet, many questions remain about the practicality of such systems. While this project's use of dictated input effectively supplied interactive disambiguation for speech recognition, there was no interactive disambiguation of translation, and in fact the translation software provided no such capability. The resulting translation quality was uninspiring (Burton Rosenberg, personal communication), but given the principally social and artistic aims of the project, this was no great cause for concern.
For the future, experiments should stress questions of usability: For specific communication tasks, what is the most practical architecture? For given architectural candidates, how difficult will it prove to add interactive disambiguation capabilities for MT? How much interaction is tolerable? How much is needed for what degree of quality, and how much quality is enough?

While these questions remain open, it is clear that "quick and dirty" systems for Internet translation are beginning to be built. If natural language researchers neglect the low road to dialogue translation offered by such systems, great opportunities for study could be missed. And there is the danger that we will arrive in Scotland to find other enterprising folk already there.
Acknowledgements

Warm thanks to Burton Rosenberg for hospitality and discussion of the Information Transcript project.
References

Boitet, Christian. 1996. "Dialogue-based Machine Translation for Monolinguals and Future Self-explaining Documents." In Proceedings of MIDDIM-96 (International Seminar on Multimodal Interactive Disambiguation), Col de Porte, France, August 11 - 15, 1996.

Blanchon, Hervé. 1996. "A Customizable Interactive Disambiguation Methodology and Two Implementations to Disambiguate French and English Input." In Proceedings of MIDDIM-96 (International Seminar on Multimodal Interactive Disambiguation), Col de Porte, France, August 11 - 15, 1996.

Kowalski, Piotr, Burton Rosenberg, and Jeffery Krause. 1995. Information Transcript. Biennale de Lyon d'Art Contemporain. December 20, 1995 to February 18, 1996. Lyon, France.

LaFourcade, Mathieu. 1996. "Software Engineering in the LIDIA Project: Distribution of Interactive Disambiguation and Other Components between Various Processes and Machines." In Proceedings of MIDDIM-96 (International Seminar on Multimodal Interactive Disambiguation), Col de Porte, France, August 11 - 15, 1996.

Morimoto, Tsuyoshi, T. Takezawa, F. Yato, S. Sagayama, T. Tashiro, M. Nagata, et al. 1993. "ATR's Speech Translation System: ASURA." In Proceedings of Eurospeech-93, Berlin, Sept. 21-23, 1993.

Seligman, Mark, and Christian Boitet. 1993. "A 'Whiteboard' Architecture for Automatic Speech Translation." In Proceedings of ISSD-93 (International Symposium on Spoken Dialogue), Waseda University, Tokyo, November 10 - 12, 1993.

Wahlster, Wolfgang. 1993. "Verbmobil: Translation of Face-to-face Dialogues." In Proceedings of the European Conference on Speech Communication and Technology, Vol. "Opening and Plenary Sessions", pages 29-38, 1993.