Lexical Issues in Dealing with Semantic Mismatches and Divergences

From: AAAI Technical Report SS-93-02. Compilation copyright © 1993, AAAI (www.aaai.org). All rights reserved.
Elaine Rich
[email protected]
[email protected]
In this paper, we address the question, "What types of MT divergences and mismatches must be handled in the lexicon?" In our work, we have focused on the treatment of divergences and mismatches in an interlingual (as opposed to a transfer-based) MT system. In such systems, one uses monolingual lexicons for each of the languages under consideration, so divergences and mismatches are not handled as explicitly as they must be in transfer-based systems that rely on bilingual lexicons. In an interlingual system, these issues become issues in NL generation into the target language.
Divergences and Mismatches
Two kinds of problems can occur as a result of a divergence [Dorr 90] or a mismatch [Kameyama 91]. The first is the relatively simple case, in which there is a lexical gap, i.e., a straightforward attempt to render a concept that was mentioned in the source language fails to find the required word. When this happens, it is clear that there is a problem, although it is less clear what to do about it.

Example: English know <-> German wissen/kennen
The more difficult case is the one in which the lexicon provides a way to express the exact (or almost exact) concept that was extracted from the source text but there is also at least one alternative way to express the same (or similar) thing that is more natural in the target language. In this case, even noticing the problem is not necessarily straightforward. We consider two subcases here:
• The problem is with a single word. This occurs when there are word pairs one of which is a default and the other of which is marked. In these cases, the default may only be used when the marked case cannot be used. For example, in Spanish the word pez (fish) is a default; it can only be used if the more specific word pescado (caught fish suitable for food) is not applicable.
• The problem is at the phrasal level. In these cases, although each word is acceptable in some context, the larger phrase sounds awkward. There are several specific kinds of situations in which this occurs, including:
1. The main event has more than one argument/modifier, one of which can be incorporated, along with the main event, into the main verb, with the others being added as syntactic arguments/modifiers. (The use of "incorporation" here is analogous to the more restricted sense that is used in morphology.) There may be lexical preferences for choosing one argument over another for incorporation. For example,

French: Il a traversé la rivière à la nage.
English: He traversed the river by swimming.
         He swam across the river.

In this example, the main event is physical motion. In French, the fact that the motion was all the way to the other side is incorporated into the main verb and the kind of motion (swimming) is added as a modifier. Although this same incorporation is possible in English (as shown in the literal translation), the alternative, in which the kind of motion is incorporated into the main verb and the completion of the motion is indicated with a preposition, is more natural.

2. Although there is a general kind of phrase that can be used to express the desired concept, there is also a more specialized idiom (or semi-idiom) that is applicable in the current situation and should therefore be used. For example, in English, you can say, "I have a pain in my ___" for just about any part of your body. But in a few special cases (e.g., head) it is more natural to say, "I have a ___ache."
Our goal is to define a tactical generation system that takes as its input a structure we call a discourse-level semantic structure (DLSS). The DLSS is composed of a set of referents that will correspond to the objects to be mentioned and a set of assertions about those referents. The assertions will be in terms defined by the knowledge base that underlies the interlingua. Because the DLSS already corresponds roughly to the way that the target text will be expressed, the techniques we are developing are unable to handle cases where global transformations must be applied to the result of understanding the source text. In those cases, additional mechanisms (probably understood right now by no one) must be applied to the result of the source language understanding process before the tactical generation process into the target language can begin.
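The paper gives no concrete syntax for the DLSS. As a minimal sketch (all class and predicate names here are hypothetical illustrations, not the authors' notation), a DLSS can be modeled as a set of referents plus a set of KB-defined assertions over them:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Assertion:
    """A KB-defined predicate applied to referents, e.g. motion-event(x1)."""
    predicate: str
    args: tuple  # referent ids or KB constants

@dataclass
class DLSS:
    """Discourse-level semantic structure: referents + assertions about them."""
    referents: set = field(default_factory=set)   # e.g. {"x1", "x2"}
    assertions: set = field(default_factory=set)  # set of Assertion

# A possible DLSS for "He swam across the river" (x3 = the river):
dlss = DLSS(
    referents={"x1", "x2", "x3"},
    assertions={
        Assertion("motion-event", ("x1",)),
        Assertion("agent", ("x1", "x2")),
        Assertion("mechanism-of-motion", ("x1", "swimming")),
        Assertion("path", ("x1", "x3")),
    },
)
```

Keeping assertions as a set (rather than a tree) matches the paper's set-theoretic view of semantic difference later on.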
We want to produce a target text that has two properties:
• It comes as close as possible to rendering exactly the semantic content of the DLSS that was input to the tactical generator.
• It sounds as natural as possible.
These two dimensions are orthogonal. But they cannot be considered independently. Sometimes, a very small change in semantics may enable a significant change in naturalness. So, the task of designing a generation algorithm within this framework can be divided into the following tasks:
1. Define what it means to be the best rendition, in a given language, of an input DLSS. This definition must include some way of trading off accuracy and naturalness.
2. Define a search procedure that will, in principle, find all the correct renditions of an input DLSS. Any reasonable generation algorithm will work here.
3. Describe a set of heuristics that will make the search procedure efficient.
Viewing the Problem as One of Generation
As suggested above, we view this entire process as one of generation, rather than as translation per se. But even the brief description of it that has so far been presented makes it clear that there is a sense in which what we are doing is transforming the interlingua expression (which resulted from analyzing the source text) into an expression in the target language. At one level, this is very similar to conventional structural transfer of the sort that takes place in most transfer-based MT systems. But there is a crucial difference. There are no transformations that are driven by relationships between linguistic structures in one language and linguistic structures in another. Instead, there are transformations that are driven by the lexicon and grammar of the target language, which can be thought of as forcing the transformations to produce the most natural, semantically acceptable output in the target language. True, the input to the transformations comes from analyzing expressions in another language, and so the form clearly does depend on the lexicon and the grammar of the source language. And, in fact, we will use some information about the source lexicon in some of our generation heuristics. But in general, the process by which text is generated needs to know little or nothing about the source language.
The only real sense in which the needs of MT have pushed our work on generation in directions different from more generic efforts to build NL generation systems is the following: In some constrained generation environments, it is possible to get away with assuming that the input to the generator (from some external source, such as a problem solving system) is already in a form that is very close to the form that the final linguistic output should take. This is particularly likely to be true if the input to the generator is based on a knowledge base that was designed with one specific language in mind (as is often the case unless the need to support natural representations of expressions in other languages is explicitly considered while the KB is being built). What MT does is to force us to consider cases in which the input to the generator is in a form quite different from the final output form. And, in particular, we must consider cases in which the KB is not an exact match to the lexicon that the generator must use (since it cannot simultaneously correspond exactly to more than one language's lexicon). But the mechanisms for dealing with these cases need not be specific to MT. Instead, they can be applied just as effectively in other generation environments, particularly those in which the underlying KB is sufficiently rich that there can exist forms that do not happen to correspond to natural linguistic expressions in the target language.
In the rest of this paper, we will sketch the answers that we have so far developed to each of the three issues raised above. These answers require that we view the target language lexicon as containing more information than is often provided. In particular, it is necessary not just to include possible words and phrases, but also to indicate for each when it is the most natural way to express the associated concept.
What is a Correct Rendition of a DLSS?
To define what it means to be a correct rendition of a given DLSS, we need to do three things:
• Define what we mean by accuracy (the extent to which the target text exactly matches the semantics of the DLSS). We will refer to this as semantic closeness.
• Define what we mean by syntactic naturalness (a function solely of the target text itself).
• Specify a way to handle the tradeoff between accuracy and naturalness.
Assume for the moment that we have measures of semantic closeness and naturalness (which we will describe below). One way to handle the tradeoff between the two is to define some kind of combining function. Then a generation algorithm could search the whole space of remotely possible output strings and select the one that was best, given the combining function we chose. There are two problems with this approach. The first is that there appears to be no principled basis for choosing such a function. The second is more pragmatic: the generation algorithm will waste a lot of time exploring possible output texts that will simply get rejected at evaluation time.

An alternative approach is to take a more structured view of the interaction between the two measures and build them into the algorithm in a way that enables us to constrain more tightly the generation process. We do that as follows.
The generation procedure takes four inputs:
• the DLSS to be rendered into the target language.
• an absolute semantic closeness threshold θ. We will accept as a correct rendition of the input DLSS only those strings whose semantics is within θ of the DLSS.
• a semantic closeness interval δ(θ).
• a syntactic naturalness threshold N. We will accept as a correct rendition of the input DLSS only those strings whose naturalness is above N.
The generation algorithm then works as follows: Start with a reasonably tight bound on semantic closeness (initially δ(θ)). See if you can generate one or more acceptably natural sentences within that bound. If so, pick the most natural of the candidates. If not, increase the allowable difference and try again. Continue until you reach the absolute bound on closeness. If there are no acceptably natural sentences within this bound, the algorithm fails.
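The iterative loop just described can be sketched as follows. This is a minimal sketch: `generate_within`, the widening step, and the naturalness scorer are hypothetical stand-ins for the paper's (unspecified) generator and measures.

```python
def generate(dlss, theta, delta, N, generate_within, naturalness, step=None):
    """Iteratively widen the semantic-closeness bound until an acceptably
    natural sentence is found, or the absolute bound theta is exceeded.

    generate_within(dlss, bound): candidate strings whose semantics lie
    within `bound` of dlss; naturalness(s): score of a candidate string.
    """
    step = step if step is not None else delta
    bound = delta                       # start with the tight bound delta(theta)
    while bound <= theta:               # never exceed the absolute bound theta
        candidates = [s for s in generate_within(dlss, bound)
                      if naturalness(s) > N]
        if candidates:                  # pick the most natural acceptable one
            return max(candidates, key=naturalness)
        bound += step                   # loosen the allowable difference
    return None                         # algorithm fails

# Toy instantiation: widening the bound eventually admits the natural phrasing.
def toy_generate_within(dlss, bound):
    out = []
    if bound >= 0.5:
        out.append("He traversed the river by swimming.")
    if bound >= 1.0:
        out.append("He swam across the river.")
    return out

NAT = {"He traversed the river by swimming.": 0.4,
       "He swam across the river.": 0.9}

best = generate(None, theta=2.0, delta=0.5, N=0.6,
                generate_within=toy_generate_within, naturalness=NAT.get)
```

In the toy run, the literal rendition is found first but fails the naturalness threshold, so the bound is widened once and the incorporated form is returned.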
How tight the original closeness bound is, how loose it is eventually allowed to get, and what level of naturalness is required will be functions of the generation/translation context. In some situations, loose but natural translations will be preferred. In others, tight but perhaps unnatural ones may be better. In any case, by using an iterative approach, it is possible to generate an entire text in which most of the sentences are very close to the input, with only a small number (the ones where the close version is really bad) being a bit farther away.
How then do we define the semantic closeness measure that this algorithm requires? We want a measure that captures three things:
• How different are the semantics of the DLSS and the target text? For example, if we assume a set theoretic model, how many elements are in one but not the other? We will use the expression Δ(S) to refer to the difference between the DLSS and the information content of the proposed text. Δ(S) is in principle a statement in the semantics language. What we need for the algorithm, however, is a numeric measure of the size of Δ(S). We will use |Δ(S)| to indicate that measure, which will be computed not by actually computing Δ(S) and then measuring it, but by assigning costs to each of the inference rules that are used to move from one semantic description to another.
• How likely is it that the semantics of the target text accurately characterize the situation that the DLSS is describing? For example, the DLSS may correspond to the meaning of the English word know and the target text may end up with semantics corresponding to the German word kennen, but if the situation being described is one of being acquainted with a person, then the target text accurately describes that situation even if it is not identical to the English source text. Let P(C) be the probability that the semantics of the target text accurately characterize the situation.
• If there are differences between the semantics of the target text and the situation being described, how much is anyone likely to care about those differences? For example, suppose we translate the English word vegetable into the Japanese word yasai, which describes a class that mostly but not completely overlaps the class described by vegetable. In most texts this difference probably does not matter at all, but in a book on horticulture, it might. Let CARE(Δ(S)) be a measure of how much anyone is likely to care about any semantic differences that do exist. In general, if we throw away information, we need to be concerned about whether it was important. If we add information, we need to worry about whether adding it will give the reader some incorrect ideas that matter.
In any real program, it will not be possible to compute any of these measures exactly. But we can exploit heuristics. For the first, we can look at the specific inferences that were made within the knowledge base. For the second, we must also appeal to the KB, although here we may also want to appeal to the source lexicon as well (see below). For the third, we must appeal again to the KB and also to some model (possibly very simple) of what matters in the domain of the current generation task.

To see how these measures interact to define closeness, let's examine each of the common inference procedures that can be used to generate sets of assertions that are likely to be "close" to the input DLSS.
First consider the case in which we move up in the KB hierarchy (in other words, we throw out information). This happens in translating the German words kennen and wissen into English, where we have to move up to the more general concept of knowing. A clearer example occurs in translating the Spanish word pescado (meaning a fish that has been caught and is ready to be food) into the English fish. In this case, there is no chance of being wrong. So we define the closeness of a proposed text to the input DLSS as simply

    |Δ(S)| × CARE(Δ(S))

Next consider the case in which we move down in the KB hierarchy (in other words, we add information). This happens in translating the English word fish into the Spanish pescado, or in translating the English know into German. Now we have to consider the risk that the information that has been added is wrong. Sometimes the risk is low (for example, because straightforward type constraints on the other components of the sentence guarantee that the required conditions are met). But sometimes there is a risk. So closeness is defined as

    |Δ(S)| × CARE(Δ(S)) / P(C)

Finally, consider the case in which we move sideways, to related situations. This operation is very common because it occurs in several different kinds of scenarios. It happens in cases (for example, English vegetable vs. Japanese yasai) in which there is no word for the exact concept but there is a word for a very similar one. It also happens in cases of alternative incorporations (as in the swim/traverse example). In both cases, it is possible that we are both adding and throwing away information. In the latter case, the likelihood of making a mistake, however, is very low. In the former, it depends on how similar the concepts are. In either case, we use as a measure of closeness,

    |Δ(S)| × CARE(Δ(S)) / P(C)
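A sketch of how such a distance could be computed, treating |Δ(S)| as the summed cost of the inference rules applied and combining it multiplicatively with CARE(Δ(S)) and P(C). The rule names, cost values, and the exact combining function are illustrative assumptions, not the authors' definitions; smaller values mean closer.

```python
# Hypothetical per-rule costs; |delta(S)| is accumulated from the rules
# used to move from the DLSS to the semantics of the proposed text,
# rather than by computing delta(S) itself and measuring it.
RULE_COSTS = {
    "generalize": 1.0,   # move up in the KB hierarchy (drop information)
    "specialize": 1.5,   # move down (add information; riskier)
    "sideways":   2.0,   # move to a related concept (vegetable -> yasai)
}

def closeness(rules_applied, care, p_accurate):
    """Distance of a proposed text from the input DLSS (smaller = closer).
    care ~ CARE(delta(S)); p_accurate ~ P(C)."""
    delta_size = sum(RULE_COSTS[r] for r in rules_applied)  # |delta(S)|
    return delta_size * care / p_accurate

# pescado -> fish: pure generalization, no chance of being wrong (P(C) = 1)
d_up = closeness(["generalize"], care=0.2, p_accurate=1.0)

# know -> kennen: information added, some risk the addition is wrong
d_down = closeness(["specialize"], care=0.2, p_accurate=0.8)
```

When P(C) = 1 the risky case reduces to the safe one, which matches the observation that type constraints can sometimes make added information risk-free.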
Syntactic Naturalness
Now we need to define what it means to be a syntactically acceptable sentence in the target language. It is at this step that the lexicon plays a key role. And it is here that we need to take into account the difference between "sort of in the language" (e.g., "He traversed the river by swimming.") and "natural in the language" (e.g., "He swam across the river.").

There are three possible reasons why something that is "in the language" (i.e., it can be produced using the lexicon and the grammar of the language) might be considered unnatural:
1. A word or phrase that is labeled as a default in a default/marked alternation pair is used when the marked form is appropriate.
(We use multiplication in all of these functions. It isn't critical that it be multiplication; almost any monotonic function will probably work.)
2. A phrase that is generally okay is generated but there is a nearly equivalent (semantically) phrase that is preferred. This is what is happening in both the "traverse the river" and the "I have a pain in my head" examples.
3. There are genuine "negative idioms" in the language. By a negative idiom we mean some phrase that, while generatable from the lexicon and the grammar of the language, is simply not acceptable, for whatever reason. We are not sure yet what the data are on the existence of such "phrasal gaps", but we allow for them in our approach.
Of these, the second is by far the most common and thus the most important from the point of view both of the lexicon itself and of the generation algorithm. Yet this cause of unnaturalness presents a problem for any evaluation function that is intended to look at a proposed output string and evaluate it. To do that, we need to be able to tell, by looking just at the string and at the lexical entries and rules that produced it, how good it is. In the case of default words and negative idioms (cases 1 and 3 above), this is not difficult, because both can be indicated as infelicitous in the lexicon. Default words need just to be marked as such. Negative idioms need only be stored, using the same format as positive ones (see below), along with a rating that indicates the appropriate level of unacceptability. But this approach doesn't really work in case 2. We do not want to have to enter explicitly into the lexicon every phrase that is "eclipsed" by something better.
Let's look at the "traverse the river" example to make this issue clear. We are trying to translate from French to English. Suppose we start generating with a very small δ(θ), because we want a fairly literal translation if we can find a good one. Then we may produce, "He traversed the river by swimming." The lexical entries for these words tell us nothing about the fact that this sounds unnatural, nor does the grammar indicate that any unusual construction was used. So there is no basis for giving this sentence a low acceptability score. Yet we want to, precisely because there is an alternative way to say it that is very, very close semantically and that is a lot better in terms of naturalness.
There are two possible solutions to this problem. The first is to modify the generation algorithm and add a sort of cloud around θ. If anything in the cloud is a lot more natural than the other sentences, then take it. The big drawback to this approach is that it is computationally very expensive. Adding to θ means firing knowledge base inference rules, and that is likely to be explosive.

The alternative is to run the inference rules in the other direction and cache the results. In other words, start with all the phrases that are marked in the lexicon as being preferred. Take the semantic expressions that correspond to them and, starting at those points, apply the inverses of the inference rules that would normally be used during generation. Any phrases that correspond to the semantic expressions that result should be marked in the lexicon as bad.
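The caching pass can be sketched as follows. The lexicon representation, the semantic expressions, and the rule inverses are all hypothetical stand-ins; only the control structure (run inverses from preferred phrases, mark what they reach) comes from the text.

```python
def mark_eclipsed_phrases(lexicon, inverse_rules):
    """Run inference rules backwards from each preferred phrase's semantics
    and mark every non-preferred phrase realizing a reached expression as bad.

    lexicon: phrase -> {"semantics": ..., "preferred": bool, "bad": bool}
    inverse_rules: functions mapping a semantic expression to the set of
    expressions from which a forward rule application could reach it.
    """
    preferred = [e["semantics"] for e in lexicon.values() if e["preferred"]]
    reachable = set()
    for sem in preferred:
        for inv in inverse_rules:          # one backward step per rule inverse
            reachable.update(inv(sem))
    for entry in lexicon.values():
        if not entry["preferred"] and entry["semantics"] in reachable:
            entry["bad"] = True            # eclipsed by a preferred phrase
    return lexicon

# Toy data: the preferred "swam across" eclipses "traversed ... by swimming".
lex = {
    "swam across":               {"semantics": "swim+across",
                                  "preferred": True,  "bad": False},
    "traversed ... by swimming": {"semantics": "traverse+by-swimming",
                                  "preferred": False, "bad": False},
}
inv = [lambda sem: {"traverse+by-swimming"} if sem == "swim+across" else set()]
mark_eclipsed_phrases(lex, inv)
```

A real implementation would iterate the inverses to a fixed point rather than applying each once; a single step is enough to show the idea.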
In other words, we need to be able to tell when a sentence is bad (i.e., highly unnatural). But we do not want to depend on being given a lot of explicit information about bad things, because there are too many of them and because the badness usually results from the existence of good things (so saying it again would be redundant). Fortunately, however, we can extract the required information automatically. In addition, it is possible to add explicit badness information as a way to tune the lexicon, but this should only be necessary in cases where the knowledge base is incomplete and it is not worth the effort to fix it.
So we now have two kinds of information that must be present in the lexicon in order for naturalness evaluation to be possible. The first is that default/marked pairs must be indicated. This is easy.

The second, and more significant, thing is that it must be straightforward to enter phrases, at varying degrees of generality, into the lexicon. Each lexical entry must have the following general form:
( <semantic pattern> <phrase> <goodness> (<style>) )

For example, to indicate that we have a preference for using words like swimming (i.e., incorporating the kind of motion into the main verb), we might have the following entry in our English lexicon (using English stand-ins for the actual patterns):

( motion-event(X),
  mechanism-of-motion(X,Z),
  path(X,Y)
  <word corresponding to Z> <path ...> )
Often the goodness function will be a single number. In fact, using simply 0 for ordinary word entries and 1 for phrases that are known to be good in certain circumstances (as described by the stated patterns of their arguments) goes a long way. In some cases, however, it is necessary to use a function, which typically takes stylistic information as its argument. So we might have a technical word whose goodness is 1 if we are attempting to generate technical prose and 0 otherwise. Because stylistic information is not always necessary for computing goodness, it is optional in each lexical entry.
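Such entries might be represented as follows. The field layout, the example entries, and the style vocabulary are illustrative assumptions; what the sketch shows is the distinction the text draws between numeric goodness values and goodness functions of optional stylistic information.

```python
def goodness_technical(style=None):
    """Goodness function for a technical term: good only in technical prose."""
    return 1 if style == "technical" else 0

# (semantic pattern, phrase, goodness): goodness is either a number
# (0 for ordinary words, 1 for phrases known to be good when their
# argument patterns are satisfied) or a function of optional style.
LEXICON = [
    ({"pred": "fish"}, "fish", 0),
    ({"pred": "motion-event", "mechanism": "Z", "path": "Y"},
     "<word for Z> <preposition for Y>", 1),
    ({"pred": "myocardial-infarction"}, "myocardial infarction",
     goodness_technical),
]

def score(entry, style=None):
    """Evaluate an entry's goodness, passing style only when needed."""
    g = entry[2]
    return g(style) if callable(g) else g
```

So `score(LEXICON[2], "technical")` is 1 while the same entry scores 0 in ordinary prose, and the preferred incorporation pattern always scores 1.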
Choosing the Most Natural Whole Sentence
So far, we have talked about ways to indicate, in the lexicon, the naturalness of individual words and phrases. But how do we combine those into scores for whole sentences so that we can choose the best? There are two parts to the answer to this question. The first lets us get a rough measure. The second is then used to compare a set of sentences whose scores are nearly equal given the rough measurement.

To compute the rough measurement of the naturalness of a sentence, just average the scores of the words and phrases that were used to compose it. We need to use an average so that the generation algorithm can use a single acceptability threshold regardless of the length of the sentence being generated. By this measure, sentences that contain "bad" words or phrases will have lower scores than those that don't.
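The rough measure is then just a length-normalized average (a trivial sketch; the per-unit scores are hypothetical):

```python
def sentence_naturalness(unit_scores):
    """Average the naturalness scores of the words and phrases composing a
    sentence, so one acceptability threshold works at any sentence length."""
    return sum(unit_scores) / len(unit_scores)

# A sentence containing one "bad" (eclipsed) phrase is dragged down,
# while an all-good sentence scores 1.0 regardless of its length.
bad = sentence_naturalness([1.0, 0.2, 1.0, 1.0])
good = sentence_naturalness([1.0, 1.0, 1.0])
```

Because the score is an average rather than a sum, a long sentence is not penalized merely for being long, which is exactly why the text insists on averaging.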
But what if there are multiple candidate sentences none of which is "bad"? Now we use a different measure. Here we prefer sentences in which the language does more work to ones where it does less. What we mean is that we prefer:
• The use of inflectional morphology to communicate something over the use of additional lexical items.
• The use of a single word to communicate a set of assertions over a phrase (unless the word is stylistically inappropriate in the current context or there is external information, e.g., the source text, that tells us that the word cannot be used).
• The use of a more restricted phrase to the use of a more general purpose one, and the use of even a general purpose stored phrase to the use of text that is derived a word at a time using the grammar. A phrase A is more restricted than another phrase B if the semantic conditions on the use of A subsume those on the use of B.
These heuristics, taken together, usually but not always prefer shorter sentences to longer ones. For example, they prefer, "He swam across the river," to "He traversed the river by swimming." But "I have a headache" is longer than, "My head hurts," even though it is more idiomatic.
Using the Source Text for Advice
So far, we have described the process of target language generation as one in which only the target language dictionary and grammar need be used. There are cases, though, in which the source text and the source lexicon can be exploited for additional information that can be used during generation. We give two examples.

First consider what happens if, in attempting to generate the target text, the initial attempt to find a word for a concept produces a word that is marked as a default (e.g., pez). In general, it is necessary to find the corresponding marked word (e.g., pescado) and see if its semantic conditions can be proven to be probably true in the current context. If they can, then the marked word must be used. But suppose that the source language has the same default/marked alternation. Then the reason we are looking at the default word is that the corresponding default word was used in the source text. Assuming that text was correctly written, we are guaranteed that the applicability conditions for the marked word are not met. So we do not need to check for them during generation.
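This shortcut can be sketched as follows (helper and field names are hypothetical; `provable` stands in for the KB proof attempt that the shortcut avoids):

```python
def choose_word(concept, target_lex, source_used_default,
                source_has_alternation, provable, context):
    """Pick between a default word and its marked counterpart.

    target_lex[concept]: {"default": word, "marked": word,
                          "marked_conditions": semantic conditions}
    If the source language has the same default/marked alternation and the
    source text used the default word, the marked word's conditions are
    known not to hold, so the (expensive) KB proof attempt is skipped.
    """
    entry = target_lex[concept]
    if source_has_alternation and source_used_default:
        return entry["default"]          # e.g. pez, no proof needed
    if provable(entry["marked_conditions"], context):
        return entry["marked"]           # e.g. pescado must be used
    return entry["default"]

TARGET_LEX = {"fish": {"default": "pez", "marked": "pescado",
                       "marked_conditions": "caught-and-ready-as-food(x)"}}

# Source text used its own default word, so we return pez without proving
# anything, even though the prover here would happily say yes.
w = choose_word("fish", TARGET_LEX, source_used_default=True,
                source_has_alternation=True,
                provable=lambda cond, ctx: True, context={})
```

Without the alternation in the source language, the same call would fall through to the proof attempt and select pescado when the conditions hold.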
As a second example, consider the case in which an entire set of semantic conditions can be accounted for by using a single word, rather than a head word and a set of modifiers. In general, we want to do that. For example, we would want to render the meaning of the English phrase, "nasty little dog," as roquet in French. But suppose that the source language also has a single word and didn't use it (perhaps because this occurrence of the word is in a definition). Then it is probably a good idea not to use it in the target text either.

Notice that in both these cases, although we make use of the source language lexicon during generation, there is no transfer lexicon. Neither source nor target lexicon needs to know anything about the other.
Considering More than One Sentence at a Time
So far, we have considered the problem of generating a single sentence from a single DLSS. We have assumed the existence of a strategic planning component that precedes the actual generation system and whose goal is to construct the appropriate DLSS. In particular, for MT, we assume that often the DLSS can be patterned after the structures in the source text. Sometimes, however, it may be desirable to change some of the decisions made by the strategic planner during generation. For example, it might be possible to produce a much more natural sounding sentence if it were allowed to change the focus or the presuppositions of the sentence. If these kinds of transformations are done, their costs can be incorporated into |Δ(S)|, just as are the costs of other changes to the semantics of the original DLSS. But since text level issues tend to be much more common across languages than lexical and morphological ones are, these kinds of transforms are less important than the ones we have focused on in this paper.
Summary of Implications for the Lexicon
The main implications of this discussion for lexicons designed to support interlingual MT systems are the following:
• Default/marked pairs must be indicated in the lexicon.
• Phrases that represent preferred ways of saying things need to be entered into the lexicon as phrases even if their semantics can be derived compositionally from the meanings of their constituents (and thus there would be no reason to enter them into a lexicon intended just to support understanding rather than generation).
• Restrictions on the applicability of phrases need to be stated as tightly as possible.
• The lexicon needs to be tied to a knowledge base that supports the kinds of reasoning that we have described for getting from a literal meaning to a related meaning that can be more naturally expressed in the target language.

Notice that in some sense none of these requirements is unique to MT, however. All of these things are necessary in any NL generation system that expects to be able to generate from inputs that do not exactly match the way things are typically said in the target language.
References

[Dorr 90] Dorr, B. Solving Thematic Divergences in Machine Translation. In Proceedings of ACL, 1990.

[Jakobson 66] Jakobson, R. Zur Struktur des Russischen Verbums. In Hamp, Householder & Austerlitz (editors), Readings in Linguistics II. The University of Chicago Press, Chicago, 1966.

[Kameyama 91] Kameyama, M., R. Ochitani & S. Peters. Resolving Translation Mismatches with Information Flow. In Proceedings of the 29th Annual Meeting of the ACL, 1991.