~ti~tlstlcallv-Gulded Word Sense Dlsambio_uation 1 Elizabeth D. Liddy & Woojin Paik

From: AAAI Technical Report FS-92-04. Copyright © 1992, AAAI (www.aaai.org). All rights reserved.
~ti~tlstlcallv-Gulded
Word Sense Dlsambio_uation
1
Elizabeth D. Liddy & Woojin Paik
School of Information Studies
Syracuse University
Syracuse, New York 13244-4100
31 5-443-2911
liddy@mailbox.syr.edu wpaik@mailbox.syr.edu
Abstract
Within the field of Natural LanguageProcessing, lexical disambiguation remains one of the toughest
hurdles to overcomein the developmentof fully operational systems. As part of a larger document
detection system (DR-LINK), we have implemented a computational approximation of word sense
disambiguation by combining information from a machine-readabledictionary, local context, and corpus
statistics. Weuse the Subject-Field Codes(SFC) extracted from a machine-readabledictionary
producea preliminary, multi-tagged semantic coding of words in a text. Thenweapply local heuristics
that evaluate the SFCsof ambiguouswords to chooseamongthe multiple SFCs.Choices which cannot be
madeusing local heuristics are resolved by statistical evidence, namely, an SFCcorrelation matrix
that was generated by processing a corpus of 977 Wall Street Journal (WSJ)articles containing
442,059 words. The implementation was tested on a sample of 1638 words from the WSJand selected
the correct SFC89%of the time. The resultant, disambiguated SFCfrequencies are summed
and
normalized to produce a weighted semantic vector representation of each text. TheseSFCvectors
provide the basis on which the systemautomatically classifies texts as the first stage in DR-LINK.
The Disambiguation Problem
NLPsystems take naturally occurring text and create a representation of the meaningof the text
that will be used to accomplishthe specific task of the system, be it machinetranslation, document
detection, question-answering, knowledgeextraction, or information retrieval. Lexical ambiguity has
been a major stumbling block in the developmentof real-world NLPsystemsfor all these applications
due to the fact that a single word mayhave morethan one meaning. According to Gentner (1981) the
twenty most frequent nounsin English have an average of 7.3 senses each, while the twenty most
frequent verbs have an average of 12.4 senses each. As a result, whenattempting to represent a word
which has multiple senses, an NLPsystem must either produce multiple representations for that word
or select one sense from amongstthe possible choices included in the system’s lexicon. The process of
selecting from amongsta word’s possible senses is referred to as semantic lexical disambiguation.
Researchinto humanlexical access and disambiguation has been very active in recent years. Small,
Cottrell & Tannenhaus(1988) provide a substantive reader on research on lexical disambiguation from
the various fields within cognitive science. In principle, weagree with Small, Cottrell & Tannenhaus
(1988) that "in order to resolve ambiguity, an NLU(humanor otherwise) has to take into account
sources of knowledge".Given that there is no current single theory as to the exact nature of and
interaction amongstthese sources which can account for all the experimental results in lexical
disambiguation, we agree with Prather & Swinney(1988) that there will be ’no uniform, invariant
solution to lexical ambiguity resolution". Consequently,we interpret the empirical psycholinguistic
results as suggesting that there are three sources of influence on the humandisambiguation process:
Local context -the sentence containing the ambiguousword restricts
ambiguous words
the interpretation
Domainknowledge- the recognition that a text is concerned with a particular
only the senses appropriate to that domain
of
domainactivates
Frequencydata - the frequency of each sense’s general usage affects its accessibility
1Support for this research wasprovided by DARPA
under the auspices of the TIPSTERProject.
98
From: AAAI Technical Report FS-92-04. Copyright © 1992, AAAI (www.aaai.org). All rights reserved.
Wehave attempted to computationally replicate these three knowledgesources in our
disambiguator. Figure 1 provides a mappingfrom the sources suggestedin the psycholinguistic
literature as being used by humansto the soumesused by the disambiguator in DR-LINK.Eachof the DRLINKsourceswill be describedin later sections of this paper.
OR-LINK
Humans
Fig.l:
local context
unique or high-frequency SFCwithin a sentence
domain knowledge
SFCcorrelation matrix
frequency of usage
order of senses in LDOCE
Sources influencing humanand automatic disambiguation (DR-LINK)
Weconsider the ’uniquely assigned’ and ’high-frequency’ SFCsto words within a single sentence as
providing the local context which activates the correct SFCfor an ambiguousword. The SFCcorrelation
matrix, which is based on a large sampleof texts of the sametype as the text being disambiguated,
equates to the domainknowledge(WSJtopics) that is called uponfor disambiguationif the local context
does not resolve the ambiguity. Andfinally, ordering of SFCsin LDOCE
replicates the frequency of use
criterion suggestedby Hogaboam
& Perfetti (1975) whosaid that: "The order of search is determined
by frequency of usage, the most frequent being first. The search is self-terminating, so that as soon as
an acceptable match occurs, no [other] entries will be checked".
Weimplement the computational disambiguation process by moving in stages from the morelocal
levels to the moreglobal types of disambiguation, using these sources of information to guide the
disambiguationprocess. Thework is unique in that it successfully combineslarge-scale statistical
evidence with the more commonlyespousedlocal heuristics.
The documentdetection task of DR-LINKdoes not require the precise disambiguation of every word
in text, but it does need to disambiguateto the point of knowingwhich of the SFCsthat have been
assigned to a word’s multiple senses is correct. In somecases this process equates to sense
disambiguation. In other cases, the SFCselected by the system maybe attached to morethan one
sense, so in that case our system narrows the choice of senses to those which are in the appropriate
semantic domain.
Semantic Re oresentation of Wordsin DR-LINK
Weuse the Subject Field Codesfrom Lonaman’sDictionary_ of Contem.DoraryEnglish (LDOCE)as
semantic representation of a text’s contents. LDOCE
is a British-produced learner’s dictionary that has
been used in a numberof investigations into natural languageprocessing applications (Boguraev
Briscoe, 1989) using the first edition (1978) of the dictionary. Weare using the secondedition (1987)
and beganworking directly from the typesetters’ tape, which we have cleaned up during related
research into the automatic extraction of semantic relations from dictionary definitions (Liddy & Paik,
1991) and converted the data from the tape into a lexical database.
The 1987 edition of LDOCE
contains 35,899 headwordsand 53,838 senses, for an average of 1.499
senses per headword. The machine-readabletape of LDOCE
contains several fields of information not
visible in the hard-copy version, but which are extremely useful in natural languageprocessing tasks.
Someof these are relevant for syntactic processing of text, such as subcategorization codes while
others contain semanticinformation, such as the Box Codes,which indicate the class of entities to
which a nounbelongs or the semantic constraints for the argumentsof a verb or an adjective, and the
Subject Codes.
The Subject Codesare based on a classification schemeof 124 major fields and 250 sub-fields.
Subject Codesare manually assigned to words in LDOCE
by the Longmanlexicographers. There are two
types of problems, however, with the Subject Codeassignments which becomeobvious when an
attempt is madeto use them computationally. First, a particular word mayfunction as morethan one
part of speechand each word mayalso have morethan one sense, and each of these entries and/or
senses maybe assigned different Subject Codes. The entries for ’acid’ in Figure 2 are taken from the
99
From: AAAI Technical Report FS-92-04. Copyright © 1992, AAAI (www.aaai.org). All rights reserved.
LDOCE
tape and demonstratea fairly
simple exampleof this problem.
PART-OF-SPEECH
SUBJECTFIELDS
acid
noun
Slzc [Science, chemistry]
133 [Drugs (not pharmaceutical)]
acid
adjective
FOzc[Food, cookery]
XX [General]
Fig. 2: LDOCE
entry with Multiple Parts of Speechand SFCs
If an NLPsystemcannot ascertain either the grammaticalfunction or sense of a word in the text
being processed, all Subject Codesfor all entries for an orthographic form must be considered. Our
systemincorporates automatic meansfor choosing amongstthe LDOCE
syntactic categories and
choosing amongstthe senses, thereby limiting which Subject Codesare assigned to each word in a given
text.
There is also the possibility that no Subject Codehas beenassignedto a word or any of its
individual senses. Of the 53,838 senses in LDOCE
’87, 51,383 or 95%have Subject Codes. Of these,
however, 27,273 are coded XX for the General class and therefore provide no useful semantic
information. The absenceof Subject Codesor the presenceof only the General class code poses a
problem whenword-by-word disambiguation is desired, but whenthe task is to arrive at a summary
semantic representation of the text, the law of large numbersappearsto take over. For although only
24,110 senses (45%) in LDOCE
have the more informative, domain-specific codes, this has proven
sufficient for the task of text classification. In the future, wewill investigate waysin which wecan
use sentence context and the correlation matrix to suggest appropriate SFCsfor those words that do
not have SFCsin LDOCE.However,the cases of multiple codes assigned to a word impact more
immediately on our attempts at classification, since the most frequently used words in our language
tend to have manysenses and therefore, multiple Subject Codes.
Other Work Using Subject Codes
Walker and Amsler, who were the first to makeuse of the domaininformation represented by
Subject Codes, have reported on a somewhatsimilar attempt to utilize the Subject Codesto determine
the subject domainsfor a set of texts (1986). However,they used the most frequent Subject Code
characterize a document’scontent, whereaswe represent a documentby a vector of frequencies of
Subject Codesfor words in that text. Wefind that our research efforts strongly support the
suggestions madeby Walker and Amsler concerning waysto refine the representation of text using
Subject Codes.
Slator (1991) has taken the original 124 Subject Codesand addedan additional layer of seven
pragmatic classes to the original two-level hierarchy. Theseare communication,economics,
entertainment, household, politics, science and transportation. Hehas found the reconstructed
hierarchy useful whenattempting to disambiguate multiple senses and Subject Codesattached to words.
His metric for preferring one sense over another relies on text-specific values, whereasweadd
corpus correlation values as a further stage in the disambiguationprocess.
Krovetz (1991) is exploring the effect of combining the evidence from Subject Codeswith evidence
from morphology,part of speech, subcategorization and semantic restrictions for selection of the
correct sense. His goal is to represent documentsby their appropriate word senses rather than just
their orthographic forms for use in an information retrieval system.
The Disambiguation Process
The following stages of processing are done, a sentence at a time, to generate vector
representations of each document.
In Stage 1 processing, we run the documentsand query through POST,a probabilistic part of
i00
From: AAAI Technical Report FS-92-04. Copyright © 1992, AAAI (www.aaai.org). All rights reserved.
speechtagger (Meeter et al, 1991). Weuse POSTto limit the SFCsof a word to those of the
appropriate syntactic category of each word as determined by POST.The inclusion of POSThas reduced
the numberof SFCsthat need to be further considered for sense disambiguation by an average of 60%.
Stage 2 processing consists of retrieving SFCsfor each word’s correct part of speechfrom the
lexical database. Stage 2 also does someuseful pre-processing before the system actually assigns the
SFCs.For example, the SFCretrieval process utilizes the weakstemmingalgorithm of Kelly & Stone
(1975), which removesonly inflectional endings. This algorithm has proven to be a stemmerof the
appropriate strength for processing newspapertext for lexical look-up in a machine-readable
dictionary such as LDOCE.
Additionally, as a special case warranted by the newspapertext-type, if no hyphenatedword can
be found in the lexical databasefor a word that is hyphenatedin the text, the system removesthe
hyphenand searchesthe conjoined result in the lexical database. If not found, the systemre-separates
the words and assigns part of speech to each composite part using POST,and then these two words are
looked up in the lexical database.
During this stage, all functional parts of speech(articles, conjunctions, prepositions and pronouns)
are eliminated from further processing. Our preliminary tests demonstratedthat such frequently
occurring function words’ SFCscan out-number the SFCsof the more substantive content words when
summed
across a document,and really distort the resulting SFCvector representations of the text’s
content.
At Stage 3 we begin sense disambiguation, using local sentence-level context-heuristics.
Intellectual analysis of manualdisambiguationof text generatedthe original hypothesis of our work,
namely, that unique SFCsand high-frequency SFCsare good local determinants of the subject domainof
the sentence. Webegin with context-heuristics becauseempirical results have shownthat local context
is used successfully by humansfor sense disambiguation (Choueka& Lusignan, 1985) and contextheuristics have been experimentally tested in Walker & Amsler’s (1986) and Slator’s work (1991)
with promising results. The input to Stage 3 is a word, its part-of-speech tag, and the SFCsof each
sense of that grammatical category. For somewords, no disambiguation maybe necessary at this
stage, however, for the majority of words in each sentence there are multiple SFCs,so the input would
be as seen in Figure 3.
State
companies
employ
about
one
billion
people,
n
n
v
adv
adj
adj
n
4, ORDERS
POLITICALSCIENCE
BUSINESS,MUSIC, THEATER
LABOR, BUSINESS
NUMBERS
2, ANTHROPOLOGY
SOCIOLOGY,POLITICAL SCIENCE
Fig 3: Subject Field Codes& Frequencies(in Superscript) for words’ as one part-of-speech
To select a single SFCfor each word in a sentence, Stage 3 uses an ordered set of heuristics. First,
the SFCsattached to all words in a sentence are evaluated to determine at the sentence level: 1)
whether any words have only one SFCassigned to all senses of the word; 2) the SFCswhich are highly
frequently assigned across all words in the sentence. Eachsentence mayhave more than one unique SFC
as there maybe morethan one word whosesenses have all been assigned a single SFC.In Figure 3,
NUMBERS
is a unique SFC,being the only SFCassigned to the word ’billion’ and POLITICAL
SCIENCE
is
the most frequently assigned SFCfor this sentence. Wehave established the criterion that if no SFChas
a frequency equal to or greater than three, we do not select a frequency-basedSFCfor that particular
sentence. Preliminary test results showthat SFCswith a within-sentence frequency less than three do
not accurately represent the domainof the sentence.
The secondstep in Stage 3 evaluates the remaining words in the sentence and choosesa single SFC
for each word based on the locally-important SFCsdetermined in step one. The system scans the SFCs
of each remaining word to determine whether the SFCswhich have been identified as unique or highfrequency amongstthe multiple SFCsassigned to each word by LDOCE.
In Figure 3, for example,
POLITICAL
SCIENCE
would be selected as the appropriate SFCfor both ’state’ and ’people’ because
i01
From: AAAI Technical Report FS-92-04. Copyright © 1992, AAAI (www.aaai.org). All rights reserved.
POLITICALSCIENCE
was determined in step 1 to be a high-frequency SFCvalue for the sentence.
Stage 4 incorporates two global knowledgesources to complete the disambiguation task begunin
Stage 3. The primary source is a 122 x 122 correlation matrix computedfrom the SFCfrequencies of
the 442,059 words that occurred in a sample of 977 WSJarticles. The matrix, therefore, reflects
stable estimates of SFCswhich co-occur within documentsof this text-type. Although there are
actually 124 SFCsused in LDOCE,we chose not to include XX (GENERAL)
and CS (CLOSED
SYSTEM
PART
OFSPEECH)
in the matrix construction, due to the non-substantive nature of XX and CS. The second
source is the order in which the senses of a word are listed in LDOCE¯
Since ordering of senses in LDOCE
is determined by Longman’slexicographers basedon frequency of use in the English language, we equate
ordering of sensesto the notion of frequency of usagesuggestedin the psycholinguistic literature as
influencing humandisambiguation.
The correlation matrix was computedwith SASusing SFCoutput of the 977 WSJarticles that were
processedthrough Stages 1 and 2 as described above. That is, each article was represented by a
vector of SFCsof the senses of the correct part-of-speech of each word as determined by POST.For
the matrix calculations, the observation unit wasthe article and the variables being correlated were
the 122 SFCs.The scores for the variables are the within-document frequencies of each SFC.There are
255,709 scores across the 977 articles on which the matrix is computed¯The resulting values in the
122 x 122 matrix are the Pearson product momentcorrelation coefficients betweenSFCsand range
from a +1 to a -1, with 0 indicating no relationship betweenthe SFCs.The correlations represent the
probability that a particular SFCwill co-occur with every other SFCin a WSJarticle.
The output matrix is consulted during Stage 4 processing to determine the correlation coefficient
betweentwo SFCsand serves as the more global, domain-level data on which we select one SFCfor
each word that wasnot disambiguatedin Stage 3. The correlation coefficients are quite intuitively
reasonable, as can be seen in Figure 4 where the ten highest correlations are listed. The unexpected
correlation betweenLAWand BUILDING
is due to the highly frequent usageof the word ’court’ which has
SFCsfor both LAWand BUILDING
and contributes greatly to the high correlation betweenthe two SFCs.
Co-efficient
¯
¯
¯
¯
¯
¯
¯
¯
¯
¯
91314
80801
73958
73654
7219 g
71428
70844
70271
68600
68177
SFC-2
SFC-1
NET GAMES
ECONOMICS
SOCIOLOGY
THEATER
THEATER
PLANT NAMES
ANIMAL HUSBANDRY
AGRICULTURE
LAW
GAMBLING
COURT GAMES
BUSINESS
LAW
ENTERTAINMENT
MUSIC
AGRICULTURE
AGRICULTURE
BUSINESS
BUILDING
CARD GAMES
Fig. 4: Highest Correlations BetweenSFCsBased on 255,709 SFCFrequencies from 977 WSJArticles
In Stage 4, one ambiguousword at a time is resolved, accessing the matrix via the unique and highfrequency SFCsdetermined for a sentence in Stage 3. The system evaluates the correlation coefficients
betweenthe unique/frequent SFCsof the sentence and the multiple SFCsassigned to the word being
disambiguatedin order to determine which of the multiple SFCshas the highest correlation with the
unique and/or high-frequency SFCs.The system then selects that SFCas the unambiguous
representation of the sense of the word.
Wehave developedheuristics for three separate cases for selecting a single SFCfor a word using
the correlation matrix. The three cases function better than handling all instances as a single case
becauseof the special treatment neededfor words with the less-substantive GENERAL
(XX) or CLOSED
SYSTEM
PARTOF SPEECH
(CS) codes. For the two cases where there are XX or CS amongst the SFCs
for the word being disambiguated,wetake the order of the SFCsinto consideration, reflecting the fact
that the first SFClisted is morelikely to be correct, since the most widely used senseis listed first in
LDOCE.
So to overcomethis likelihood, a moresubstantive SFClisted later in the entry must have a
higher correlation with the sentence-determinedSFC.
102
From: AAAI Technical Report FS-92-04. Copyright © 1992, AAAI (www.aaai.org). All rights reserved.
To clarify the following description of the disambiguation procedure, we refer to the ambiguous
word’s multiple SFCsas word-attached SFCsand the unique and high-frequency SFCsthat were
established at the sentence level in Stage 3 as sentence-determined SFCs.
Case 1- Words with no XX or CS SFCs:
If any word-attached SFChas a correlation greater than .6 with any one of the sentencedetermined SFCs, select that word-attached SFC.
If no word-attached SFChas such a correlation, average the correlations betweenthe wordattached SFCand sentence-determinedSFCscorrelations, and select the word-attached
SFCwith the highest averagecorrelation.
Case2 - Wordswith XX or CS listed first in LDOCE
entry:
Select the XX or CSunless a moresubstantive SFCfurther downthe list of senses has a
correlation with the sentence-determinedSFCsgreater than 0.6.
Case3 - Wordswhere XX or CSis not the first listed SFCin LDOCE
entry:
Choosethe moresubstantive SFCwhich occurs before XX or CSif it has a correlation
greater than 0.4.
Figure 5 presents a sentence which illustrates howthe heuristics use the correlation matrix values
to select SFCs. For this sentence, BEAUTY
CULTURE,
CALENDAR,
and ECONOMICS
were selected at
Stage 3 as unique SFCs,basedon each being the sole SFCassigned to ’cosmetics’, ’November’, and
’financing’ respectively. Therefore, whena SFCneedsto be selected for ’giant’, Case3 says that the
moresubstantive SFC(LITERATURE)
must have a correlation greater than .4 with a unique SFCin order
to be selected. Since the 3 SFCsare less than .4, the GENERAL
SFC(XX) is correctly chosen.
In the case of ’junk’, since GENERAL
is listed first, either DRUGS
or NAUTICAL
must have a
correlation coefficient of at least .6 to be selected over GENERAL.
Since neither do, the correct choice
of GENERAL
is made.For the case of ’floated’, the samelogic applies, but since BUSINESS
has a
correlation of .808, BUSINESS
is selected over the first occurring GENERAL.
~vORRELATION
WITH UNIQUESFCs
He
acquired
the
cosmetics
giant
pro
v
det
n
n
in
November,
financing
in
the
transaction
with
junk
prep
n
v
prep
det
n
prep
n
GENERAL
BEAUTYCULTURE
LITERATURE,GENERAL
LITERATURE-BEAUTYCULTURE:.328
LITERATURE-CALENDAR:
.334
.1 62
LITERATURE-ECONOMICS:
-> Select GENERAL
by Case 3
CALENDAR
ECONOMICS
GENERAL
CULTURE:
GENERAL,DRUGS,NAUTICAL DRUGS-BEAUTY
DRUGS-CALENDAR:
DRUGS-ECONOMICS:
NAUTICAL-BEAUTYCULTURE:
NAUTICAL-CALENDAR:
NAUTICAL-ECONOMICS:
-> Select GENERAL
by Case 2
bonds
n
GENERAL
103
.317
.329
.269
.376
.434
.378
From: AAAI Technical Report FS-92-04. Copyright © 1992, AAAI (www.aaai.org). All rights reserved.
floated
v
by
DBL,Inc.
prep
prop
GENERAL,BUSINESS
BUSINESS-BEAUTYCULTURE:
BUSINESS-CALENDAR:
BUSINESS-ECONOMICS:
-> Select BUSINESS
by Case 2
Fig. 5: Samplesentence exemplifying correlation-matrix
.376
.545
.808
heuristics
For those sentences which contain neither a unique or high-frequency SFC,the SFCwhich occurs
first in the LDOCE
ordering of sensesis selected.
Testino_ of DisambiouationProcedures
Wetested our SFCdisambiguation procedures on a sample of 166 sentences, containing 1638 nonfunction words which had SFCsin LDOCE.The sentences comprised a set of 12 randomly selected WSJ
articles. The system implementation of the disambiguation procedures was run and a single SFCwas
selected for each word. TheseSFCswere comparedto the sense-selections madeby an independent
judge whowas instructed to read the sentences and the definitions of the senses of each word and then
to select that sense of the word which wasmost correct. Figure 6 summarizesthe overall results (att.
= attempts, cor. = correct) presented according to the main source of knowledgeused in the
disambiguation process.
Local Heuristics
att.
cor.
%
1134
1032
91
DomainCorrelations
att.
cor.
%
Freauency Ordering
att.
cor.
%
268
236
206
77
219
93
att.
1638
Total
cor.
%
1457
89
Fig. 6: SFCDisambiguation Results Using Multiple Sources of Knowledgeon 1638 Words
Analysis of Results
After noting the fact that these results are indeed quite good, what is most obvious in the summary
results (Fig. 6) is that disambiguationbasedon both local heuristics and frequencyordering is much
better than the disambiguationbasedon domaincorrelations. Although this reflects negatively on the
role of probabilistic knowledgein disambiguation, the positive view of these results is that they are
quite reasonablefor a first attempt at using corpus-basedcorrelations and, in fact, the quality of the
information in this source can be improved.
Wewill present moredetailed results of the three main sources along with an error analysis of the
poorer results along with indications of howwe are currently incorporating these insights into
adjustments to the knowledgesources and disambiguation processes in order to produce an improved
version of the system.
First, wewill micro-analyze the results of using Local Heuristics to select an SFC.Local Heuristics
use either unique SFCsor high-frequency SFCsas determined within each sentence to both: 1) selfselect the SFCwhich has the one and only (unique) SFCassigned to that word, and; 2) use the unique
SFCassigned to one word in the sentence to select from amongstthe multiple SFCsassigned to another
word which has no unique SFCof its own. In addition, SFCsare summed
across all words in the sentence
and those with frequency greater than 3 are used to select an SFCfor an ambiguousword which has a
high-frequency SFCamongstits multiple SFCs.This is the process explained earlier in Stage 3 of the
disambiguation process. As seen in Figure 7, whena word has only one SFCassigned to it, the selection
by default must be correct. Therefore, the great majority of the selections are correct. However,it
appearsthat the reliance on an SFCwith frequencygreater than three as a goodindicator of local
context is inappropriate, producing only 65%correct results. In fact whenanalyzing a sampleof the
errors madeby relying on frequency data, we discovered that rather than considering all SFCswith a
104
From: AAAI Technical Report FS-92-04. Copyright © 1992, AAAI (www.aaai.org). All rights reserved.
frequency greater than 3 as good indicators of local context, our results would be improvedin 36%of
the cases if we ranked the SFCsaccording to their frequency and then applied them in that order. In
another 36%of the cases, our results would be improved if we raised the frequency threshold from 3
to 4. Theremaining 28%of the errors are not directly attributable to any single bias.
SFCFrequency > 3
att.
cor.
%
165
107
65
Uniauelv Assigned SFC
art.
cor.
%
819
819
100
Selected bv Unigue SF~
att.
cot.
%
150
106
71
att.
1134
Total
cor.
1032
%
91
Fig. 7: Micro-analysis of DisambiguationResults based on Local Heuristics
The micro-analysis of the results based on domaincorrelations is presented in Figure 8. The major
source of error here can be attributed to the need to recomputethe correlation matrix. The results of
our first effort at using statistical information for sense disambiguationcan nowbe used as feedback.
After wemakethe corrections to the heuristics learnt from our micro-analysis, we will run the
disambiguator on a new sample of 1000 WSJarticles. The disambiguated SFCvector representations of
those articles can then be used to re-computethe correlation matrix, so that whenthe disambiguator is
run on the next sampleset of sentences, the values selected from the matrix will be moreaccurate.
Simply recomputing the matrix has the potential for correcting 62%of the errors in the case whenXX
is not the first SFCin the ordering of senses for a word, which is wheredisambiguationbasedon
domaincorrelations is performing worst.
Another cause of error in the case whereXX is not the first listed SFC,is causedby SFCsassigned
to proper nouns, whosefrequent occurrence is a problem whenprocessing newspapertext. In the initial
version of the system as reported in this paper, proper nouns were processed the sameas common
nouns, producing somehumorous SFCtags, such as: Bush = HORTICULTURE
Carter = OCCUPATIONS,
Baton Rouge, LA = MAGIC+ COSMETICS
+ MUSIC.
In computingour experimental results, wedid not include the explicit tagging of proper nouns
becauseproper nouns will not be processedin the secondversion of the system nowbeing implemented.
The category of proper noun will becomeone of the grammatical categories for which the system does
not retrieve SFCs.However,we cannot removethe implicit effect of proper noun tagging from the
current results, since their tags were included whendetermining the high frequency SFCsand when
selecting unique SFCs,which are then used by the correlation matrix heuristics. Therefore, the SFCsof
proper nouns did impact, in most instances negatively, on selection of SFCsfor ambiguouswords, but
this source of error will be eliminated from the next version of the system.
In results attributable to Case1 of Stage 4 processing whenthere is no XX amongstthe SFCs
assigned to word, 50%of the errors have the potential for improvementusing a re-computedmatrix.
An additional source of error in this set is the averaging rule which comesinto play whenno single
word-attached SFChas a correlation greater than .6 with one of the sentence-determinedSFCs.When
this occurs, all the correlations betweenword-attached SFCsand each sentence-determinedSFCare
averaged and the word-attached SFCwith the highest average correlation is chosen. Our analysis
showsthat this averaging diminishes the effect of what actually is the correct SFC.Therefore, in our
next implementation, we will use the Dempster-Shafer formula (Shafer, 1976) which combines
multiple sources of evidence (correlations with various sentence-determinedSFCs)in a mannerthat
does not diminish the strongest evidence by averaging it with the weakerevidence.
XXis first SFC
art.
cor.
%
188
161
86
XXis not first
att.
cor.
SFC
%
No XX is SFCs
att.
cor.
%
att.
Total
cor.
%
64
55
16
268
206
77
35
10
62
Fig. 8: Micro-analysis of DisambiguationResults based on DomainCorrelations
Finally, in reviewing the results which dependon ordering of sensesin LDOCE
(based on their usage
in the languageas determined by LDOCE
lexicographers), we currently do not see a wayto improve the
105
From: AAAI Technical Report FS-92-04. Copyright © 1992, AAAI (www.aaai.org). All rights reserved.
results in Figure 9. This data represent those sentencesin which no unique or highly-frequent SFC
occurs, and therefore there are no SFCsto access the matrix, and the only knowledgesource for
disambiguation is simple ordering of SFCsin LDOCE.
Thesecases appear to us to occur in those
sentences which in fact are very unsubstantive, being filled with manywords having only quite general
meanings.Fortunately, the disambiguationresults in these cases are quite reasonable.
XX is only SFC
att.
cor. %
Multiple SFCs
att.
cor. %
144
92
144
100
75
82
att.
236
Total
cor.
219
%
93
Fig. 9: Micro-analysis of Disambiguation Results based on LDOCE
Ordering by Frequency
Conclusions
Wewould like to compareour results to those of others doing similar work, but have not found
comparablework which provides quantified results. Quantitative results in the literature on lexical
disambiguation to which we can compareour efforts are those of Lesk (1986) who used a small set
words and reported a success rate for disambiguation of 50%to 70%, and Wilks et al (1989) who
reported 45%success in an experiment on just the word ’bank’. Comparedto these results, our
experiments on a randomly selected set of sentences containing 1638 words requiring disambiguation
reflect a larger scale testing of a disambiguation methodology.Although our system selects SFCsand
not actual senses, there are manytext analysis tasks whereour approach’s successful combination of
three knowledgesources for the selection of either a single sense or a moreconstrained set of senses,
would contribute to the system’ssuccessful completion of its task.
Recent work by McRoy(1992) which we find very promising also uses multiple knowledgesources,
but does not report a quantitative evaluation of the disambiguator’s performancedue to their stated
difficulty in acquiring humandisambiguation of words. Weacknowledgethe difficulty in obtaining human
judgments, but have chosen to quantify our disambiguator’s performancealthough there maybe some
noise in the results. In an effort to assure the quality of the results, we haveused a single native
English speaker with excellent languageskills whosechoices are reviewed by two additional
confirmatory judges to determine each word’s correct sense. In addition, we agree with McRoythat
perhaps a morereasonable evaluation of the disambiguator’s performancewould be in respect to the
task for which the NLPsystem is designed. Wehave preliminary results on the effectiveness of using
documentrepresentations consisting of disambiguatedSFCvectors for the clustering of documentsfor
use in our documentdetection system. Our preliminary experimental efforts (Liddy, Paik & Woelfel,
1992) produce coherent subject-based documentclusters whoseuse in the documentdetection system
reduces the retrieval computation by 80%and actually improves precision as comparedto individual
documentto query matching of SFCvectors.
In conclusion, we find our results both intuitively and pragmatically pleasing. Our automatic
approximation of lexical disambiguation has producedsomeexcellent results for a first implementation.
In addition, we knowhow to adjust and improve most of the primary causes of the incorrect results.
Additionally, the attempt to replicate the sources of knowledgethat have beensuggestedin the
psycholinguistic literature as alternative influences on the humandisambiguation process, have been
successfully replicated in machine-readableresources and computational processes.
Acknowledgments
Wewish to thank LongmanGroup, Ltd. for making the machinereadable version of LDOCE,2nd
Edition available to us and BBNfor makingPOST
available for our use on this project.
References
Boguraev, B. & Briscoe, T. (1989). Computational lexicography for natural languageprocessing.
London: Longman.
106
From: AAAI Technical Report FS-92-04. Copyright © 1992, AAAI (www.aaai.org). All rights reserved.
Choueka,Y. & Lusignan, S. (1985). Disambiguation by short contexts. Computersand the Humanities,
pp. 147-157.
¯ Gentner, D. (1981). Someinteresting differences betweenverbs and nouns. Cognition and brain theory.
4(2), 161-178.
Hogaboam,
T. W. & Perfetti, C. A. (1975). Lexical ambiguity and sentence comprehension.
Journal of verbal learning and verbal behavior. 16 (3), pp. 265-274.
Kelly, E. F. & Stone, P. J. (1975). Com.Duterrecognition of English word senses. Amsterdam:
North Holland Publishing Co.
Krovetz, R. (1991). Lexical acquisition and information retrieval. In Zernik, U. (Ed.). Lexical
acquisition: exploiting on-line resourcesto build a lexicon. Hillsdale, NJ: LawrenceEarlbaum.
Lesk, M. (1986). Automatic sense disambiguation using machinereadable dictionaries: How
tell a pine cone from an ice creamcone. Proceedingsof SIGDOC.
pp. 24-26.
Liddy, E.D. & Paik, W. (1991). An intelligent semantic relation assigner. Proceedingsof Workshopon
Natural LanguageLearnina. Sponsoredby IJCAI ’91, Sydney, Australia.
Liddy, E.D., Paik, W. & Woelfel, J.K. (1992). Useof subject field codes from a machinereadable dictionary for automatic classification of documents.Advancesin Classification
Research: Proceedings of the 3rd ASIS SIG/CRClassification ResearchWorkshop.Medford,
NJ: LearnedInformation, Inc.
McRoy,S. W. (1992). Using multiple knowledgesources for word sense disambiguation. Computational
linguistics. 18 (1). pp. 1-30.
Meteer, M., Schwartz, R. & Weischedel, R. (1991). POST:Using probabilities in language
processing. Proceedingsof the Twelfth International Conferenceon Artificial Intelligence.
Sydney, Australia.
Prather, P.A. & Swinney, D. A. (1988). Lexical processing and ambiguity resolution: An automatic
process in an interactive box. In Small, S., Cottrell, G., & Tanenhaus,M. (Eds). Lexical ambigvity
resolution: Perspectives from Psycholinouistics. Neuropsycholoov.and Artificial Intelligence San
Mateo, CA: Morgan Kaufmann.
Shafer, G. (1976). A mathematical theory_ of evidence. Princeton, NJ: Princeton University Press.
Slator, B. (1991). Using context for sensepreference. In Zernik, U. (Ed.). Lexical acquisition: exploiting
on-line resources to build a lexicon. Hillsdale, NJ: LawrenceEarlbaum.
Small, S., Cottrell, G., & Tanenhaus,M. (1988). Lexical ambiguity resolution: Perspectives
from Psycholinguistics. Neuropsychology.and Artificial Intelligence. San Mateo, CA:
Morgan Kaufmann.
Walker, D. E. & Amsler, R. A. (1986). The use of machine-readabledictionaries in sublanguage
analysis. In Grishman,R. & Kittredge, R. (Eds). Analyzing languagein restricted domains:
Sublanguaoedescription and processing. Hillsdale, NJ: LawrenceEarlbaum.
Wllks, Y., Fass, D., Guo, C-M., McDonald,J., Plate, T. & Slator, B. (1989). A tractable
machinedictionary as a resource for computational semantics. In Boguraev, B. & Briscoe,
T. (Eds). Computationallexicography for natural languageprocessing. London: Longman.
107