Evaluating the Contribution of EuroWordNet and
Word Sense Disambiguation to Cross-Language
Information Retrieval
Paul Clough¹ and Mark Stevenson²
¹ Department of Information Studies
² Department of Computer Science
University of Sheffield, UK
GWC2004
Outline
• Introduction
• Word sense disambiguation
• Experimental setup
• CLIR evaluation
• WSD evaluation
• Discussion and conclusion
20th January 2004
Introduction
• CLIR – search for documents written in one language (target) with queries written in another (source)
• Approaches – translate query, documents or both
• Translation methods – e.g. MT, MRDs, parallel corpora, controlled vocabulary
• Problems – e.g. lexical coverage, ambiguity, small context, proper names, compound words
• WSD – to identify the correct sense of a word during translation
• Experiments – with EuroWordNet and “standard” IR test collection resources
Example translation
Source query:
<num> Number: CL1
<S-title> Caso Waldenheim
Disambiguation needed? EuroWordNet senses of "caso":
caso#1 --> [case#9:grammatical case#1:](4167794)
"nouns or pronouns or adjectives (often marked by inflection) related in some way to other words in a sentence"
caso#2 --> [case#12:instance#2:](4704301)
"an occurrence of something; "it was a case of bad judgment""
caso#3 --> [case#16:event#2:](8533655)
"a special set of circumstances; "in that event, the first possibility is excluded""
Target query: Case (event) Waldenheim
Word sense disambiguation
• Each Spanish noun can be associated with multiple synsets, and each of these can in turn be mapped to multiple synsets in the ILI (English WN)
• Attempt to automatically identify the EuroWordNet synset appropriate to the query using WSD
• Adapt Resnik’s algorithm for disambiguating groups of nouns:
– Treats EuroWordNet as a hierarchy and identifies most likely synsets based on distance in WordNet and corpus information
– Query is treated as a “bag of words”
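The idea of disambiguating a bag of nouns against a hierarchy can be sketched roughly as follows. This is a toy illustration in the spirit of Resnik's noun-group approach, not the implementation used in these experiments: the hierarchy, the information content (IC) values, the sense names and the second query word are all hypothetical stand-ins for EuroWordNet and corpus statistics.

```python
from itertools import combinations

# Hypothetical toy hierarchy (child -> parent); not taken from EuroWordNet.
PARENT = {
    "court_case": "legal_event",
    "trial": "legal_event",
    "legal_event": "event",
    "grammatical_case": "linguistic_category",
    "event": "entity",
    "linguistic_category": "entity",
}
# Hypothetical information content values (higher = more specific concept).
IC = {"entity": 0.0, "event": 1.0, "linguistic_category": 2.0, "legal_event": 3.0}
# Hypothetical candidate senses for each query noun.
SENSES = {"caso": ["court_case", "grammatical_case"], "juicio": ["trial"]}


def ancestors(concept):
    """Return the concept followed by every node above it in the hierarchy."""
    chain = [concept]
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


def most_informative_subsumer(senses_a, senses_b):
    """Find the shared ancestor with the highest IC over all sense pairs."""
    best, best_ic = None, -1.0
    for a in senses_a:
        for b in senses_b:
            for c in set(ancestors(a)) & set(ancestors(b)):
                if IC.get(c, 0.0) > best_ic:
                    best, best_ic = c, IC.get(c, 0.0)
    return best, best_ic


def disambiguate(words):
    """Score each sense of each word by the support it gets from the other words."""
    support = {w: {s: 0.0 for s in SENSES[w]} for w in words}
    norm = {w: 0.0 for w in words}
    for w1, w2 in combinations(words, 2):
        subsumer, value = most_informative_subsumer(SENSES[w1], SENSES[w2])
        if subsumer is None:
            continue
        for w in (w1, w2):
            for s in SENSES[w]:
                if subsumer in ancestors(s):
                    support[w][s] += value
            norm[w] += value
    scores = {}
    for w in words:
        if norm[w] > 0:
            scores[w] = {s: v / norm[w] for s, v in support[w].items()}
        else:
            # No evidence from context: every sense stays equally likely.
            scores[w] = {s: 1.0 / len(support[w]) for s in support[w]}
    return scores


scores = disambiguate(["caso", "juicio"])
alone = disambiguate(["caso"])
```

With a second noun in the query, the toy "court case" sense of "caso" receives all the support. With "caso" alone the scores stay uniform, which mirrors the difficulty of disambiguating very short queries.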
Experimental setup
• TREC 6 collection (242,918 documents and 25 queries)
• Spanish used for CL retrieval and English as monolingual baseline
• Query translation process: term identification → term translation (EWN) → retrieval
• EWN transformed into a kind of MRD for translation
• Focused on translation of nouns and adjectives
• Synset selection – manually, first, all or WSD algorithm
• Synset member selection – head (first) or all
• Experimented with short (title) and longer queries (title + description)
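The synset and member selection options can be illustrated with a minimal sketch that treats EWN as a dictionary. The `EWN` table holds only the "caso" entry from the example translation slide. The function names, the option strings and the choice to pass untranslatable terms through unchanged are assumptions made for illustration, and the manual and WSD selection options are not modelled.

```python
# Synsets for "caso" from the example slide; each synset lists its English
# members with the head (first) member first.
EWN = {
    "caso": [
        ["case", "grammatical case"],
        ["case", "instance"],
        ["case", "event"],
    ],
}


def translate_term(term, synsets="first", members="head"):
    """Translate one source term; terms missing from EWN pass through unchanged."""
    candidates = EWN.get(term.lower())
    if candidates is None:
        return [term]
    chosen = candidates[:1] if synsets == "first" else candidates
    translations = []
    for synset in chosen:
        words = synset[:1] if members == "head" else synset
        for word in words:
            if word not in translations:  # drop repeated members such as "case"
                translations.append(word)
    return translations


def translate_query(query, synsets="first", members="head"):
    """Translate a whitespace-separated query term by term."""
    result = []
    for term in query.split():
        result.extend(translate_term(term, synsets, members))
    return result


first_head = translate_query("Caso Waldenheim", "first", "head")
first_all = translate_query("Caso Waldenheim", "first", "all")
all_head = translate_query("Caso Waldenheim", "all", "head")
all_all = translate_query("Caso Waldenheim", "all", "all")
```

For "Caso Waldenheim" the four combinations give "case Waldenheim" for both head settings, and progressively longer queries when all synset members are kept.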
Example translation
Source query:
<num> Number: CL1
<S-title> Caso Waldenheim
Disambiguation needed? EuroWordNet senses of "caso":
caso#1 --> [case#9:grammatical case#1:](4167794)
"nouns or pronouns or adjectives (often marked by inflection) related in some way to other words in a sentence"
caso#2 --> [case#12:instance#2:](4704301)
"an occurrence of something; "it was a case of bad judgment""
caso#3 --> [case#16:event#2:](8533655)
"a special set of circumstances; "in that event, the first possibility is excluded""
Target queries by selection strategy:
• 1st sense, head: case, Waldenheim
• 1st sense, all words: case, grammatical case, Waldenheim
• All senses, head: case, Waldenheim
• All senses, all words: case, grammatical case, instance, event, Waldenheim
CLIR evaluation (title & description)
• Measured MAP and relevant retrieved using trec_eval
• Baseline: MAP = 0.3512, relevant retrieved = 979

Synset selection   Synset members   Relevant retrieved   MAP
GOLD               All              890                  0.2823   (80% of monolingual)
GOLD               1st              676                  0.2459
All                All              760                  0.2203
All                1st              698                  0.2215
1st                All              707                  0.2158
1st                1st              550                  0.1994
WSD                All              765                  0.2534   (highest automatic setting, 72% of monolingual)
WSD                1st              579                  0.2073
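The "% monolingual" figures are the cross-language MAP divided by the monolingual baseline MAP. A small check using the values on this slide for the GOLD and WSD runs that keep all synset members (the function name is only illustrative):

```python
def percent_of_monolingual(clir_map, baseline_map):
    """Express a cross-language MAP as a rounded percentage of the baseline MAP."""
    return round(100 * clir_map / baseline_map)


baseline = 0.3512  # monolingual English MAP, title + description
gold_all = percent_of_monolingual(0.2823, baseline)  # GOLD synsets, all members
wsd_all = percent_of_monolingual(0.2534, baseline)   # WSD synsets, all members
```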
CLIR evaluation (title only)
• Baseline: MAP = 0.3355, relevant retrieved = 977

Synset selection   Synset members   Relevant retrieved   MAP
GOLD               All              890                  0.2823   (84% of monolingual)
GOLD               1st              676                  0.2459
All                All              760                  0.2203
All                1st              698                  0.2215
1st                All              707                  0.2158
1st                1st              550                  0.1994
WSD                All              765                  0.2534   (highest automatic setting, 76% of monolingual)
WSD                1st              579                  0.2073
WSD evaluation
• Manual annotation identifies a single correct sense for each noun; the WSD algorithm can return multiple senses
• Calculated two evaluation metrics:
– Relaxed: score 1 if the correct sense is among those returned; corresponds to the proportion of words for which the correct sense is included
– Strict: score 1/m if the correct sense is included among the m senses returned; indicates how many incorrect senses are returned
• “Choose first synset” used as naïve baseline
[Diagram: the m senses returned for a word, with the correct sense marked]
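The two metrics can be written down as a short sketch. It assumes that a word scores 0 under both metrics when the correct sense is not returned, and that per-word scores are averaged over all annotated words. The sense labels in the example are made up.

```python
def score_word(returned, correct):
    """Return (strict, relaxed) scores for one word."""
    if correct in returned:
        return 1.0 / len(returned), 1.0
    return 0.0, 0.0


def evaluate(items):
    """Average strict and relaxed scores over (returned senses, correct sense) pairs."""
    strict_total = relaxed_total = 0.0
    for returned, correct in items:
        strict, relaxed = score_word(returned, correct)
        strict_total += strict
        relaxed_total += relaxed
    return strict_total / len(items), relaxed_total / len(items)


# Hypothetical system output for four annotated nouns.
items = [
    (["s1"], "s1"),                    # one sense returned, correct
    (["s1", "s2"], "s2"),              # correct sense among two returned
    (["s1"], "s3"),                    # correct sense missed
    (["s1", "s2", "s3", "s4"], "s1"),  # correct sense among four returned
]
strict, relaxed = evaluate(items)
```

A method that always returns exactly one sense, such as the first-synset baseline, gets identical strict and relaxed scores.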
WSD evaluation
Language   Method   Strict   Relaxed
English    WSD      0.41     0.55
English    first    0.47     0.47
Spanish    WSD      0.44     0.55
Spanish    first    0.48     0.48
• WSD results are disappointing compared with the state of the art
• Limited context of queries seems to make disambiguation difficult
• BUT this does not seem to affect CLIR results!
Discussion and conclusions
• Disagreement over the usefulness of WSD for monolingual retrieval
– WSD algorithms have to be accurate to be useful for retrieval
– The IR algorithm performs a kind of disambiguation anyway
• Our results suggest some WSD is better than none for CLIR using EWN as the translation resource, even with poor WSD performance
• The WSD algorithm is well suited to CLIR because it selects senses only when there is sufficient context
• Experiments highlight limitations of EWN for CLIR: many types of useful semantic information are missing and lexical coverage is limited
Future work
• Experiment with different languages supported by EWN to see if results generalise
• Experiment with different datasets (e.g. CLEF) and further bilingual pairs, e.g. English to Spanish
• Use advanced query construction techniques, e.g. the “synonym” operator to combine synset members
• Combine various WSD algorithms to improve on their individual effectiveness
• Improve the translation process based on EWN, e.g. identify phrases
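One way to picture the “synonym” operator idea is to group the alternatives for each source term so that the retrieval engine treats them as a single term. The sketch below builds a query string using Indri/INQUERY-style `#syn`, `#combine` and `#1` (exact phrase) operators. The slides do not say which engine or syntax would be used, so the operator syntax and the function name are assumptions.

```python
def structured_query(translations):
    """Build a structured query from one list of alternatives per source term."""
    parts = []
    for alternatives in translations:
        if not alternatives:
            continue
        # Wrap multi-word members as exact phrases.
        rendered = [f"#1({alt})" if " " in alt else alt for alt in alternatives]
        if len(rendered) == 1:
            parts.append(rendered[0])
        else:
            parts.append("#syn(" + " ".join(rendered) + ")")
    return "#combine(" + " ".join(parts) + ")"


query = structured_query(
    [["case", "grammatical case", "instance", "event"], ["Waldenheim"]]
)
```

Here the four alternatives for "caso" count as one query term, so adding synset members does not outweigh the untranslated name.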