Ranking Definitions with Supervised Learning Methods. J. Xu, Y. Cao, H. Li and M. Zhao. WWW 2005

Presenter: Baoning Wu
Motivation
People often need to find definitions of
terms on the Web.
Traditional information retrieval is
designed to find relevant documents,
so it is not well suited to this task.
Google's definition search may suffer from
relying on glossary pages and ranking
results in alphabetical order.
Task for definition search
Receive a query term, usually a noun.
Extract definition candidates from the
document collection.
Rank the candidates according to how
good each one is as a definition.
Output the result.
Definition search is useful
Candidates are not all good definitions
Three categories of definitions
Good: must contain the general notion of
the term and several important properties.
Bad: neither describes the general notion
nor the properties of the term.
Indifferent: between good and bad.
First step: collecting candidates
Parse all sentences with a base NP (base noun
phrase) parser and identify <term>:
<term> is the first base NP of the first sentence.
Two base NPs separated by "of" or "for" are together
considered as <term>.
Extract definition candidates with patterns:
<term> is a|an|the *
<term>, *, a|an|the *
<term> is one of *
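The three patterns above can be approximated with regular expressions. The sketch below is only an illustration: it assumes the term is already known and matches it as a plain string at the start of a sentence, whereas the slides describe identifying <term> with a base NP parser. The function name `extract_candidates` and the sample sentences are hypothetical.

```python
import re

def extract_candidates(term, sentences):
    """Return the sentences that match one of the three definition patterns.

    Assumption: `term` is matched literally at the start of the sentence;
    no base NP parsing is done here.
    """
    t = re.escape(term)
    patterns = [
        re.compile(rf"^{t} is (a|an|the) .+", re.IGNORECASE),    # <term> is a|an|the *
        re.compile(rf"^{t}, .+, (a|an|the) .+", re.IGNORECASE),  # <term>, *, a|an|the *
        re.compile(rf"^{t} is one of .+", re.IGNORECASE),        # <term> is one of *
    ]
    return [s for s in sentences if any(p.match(s) for p in patterns)]

sentences = [
    "Linux is an open source operating system.",
    "Linux, created in 1991, a Unix-like kernel, runs on many devices.",
    "Linux is one of the most widely used kernels.",
    "I installed Linux yesterday.",
    "Linux is great.",
]
candidates = extract_candidates("Linux", sentences)
```

Here the first three sentences are kept as candidates; the last two match none of the patterns. Matching a pattern only makes a sentence a candidate, which is why the ranking step is still needed.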
Second step: Ranking candidates
Ranking based on Ordinal Regression
(ordinal classification).
Ranking SVM is used.
Ranking based on classification
SVM is used.
Ranking based on Ordinal Regression
Ordinal regression is a problem in which
a classifier assigns each instance to one
of a number of ordered categories.
Ranking SVM is used as the model.
For each candidate x,
U(x) = w^T x, where w is a vector of
weights.
The higher U(x) is, the better x is as a definition.
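A minimal sketch of how a linear utility function can be used to order candidates, and of the pairwise preference examples that a Ranking SVM is commonly trained on. The weight values, the feature vectors, and the numeric grades (2 = good, 1 = indifferent, 0 = bad) are assumptions made for illustration; they are not taken from the paper, and no SVM training is performed here.

```python
def utility(w, x):
    """U(x) = w^T x for plain Python lists."""
    return sum(wi * xi for wi, xi in zip(w, x))

def rank_candidates(w, candidates):
    """Sort (text, features) pairs by descending U(x)."""
    return sorted(candidates, key=lambda c: utility(w, c[1]), reverse=True)

def pairwise_differences(examples):
    """Build preference pairs from graded examples.

    examples: list of (features, grade), higher grade = better definition.
    For every pair with grade_i > grade_j, emit (x_i - x_j, +1), meaning
    "candidate i should score higher than candidate j".
    """
    pairs = []
    for xi, gi in examples:
        for xj, gj in examples:
            if gi > gj:
                pairs.append(([a - b for a, b in zip(xi, xj)], 1))
    return pairs

w = [1.0, -0.5]  # hypothetical learned weights
ranked = rank_candidates(w, [("a", [1.0, 4.0]), ("b", [2.0, 0.0]), ("c", [0.0, 0.0])])
pairs = pairwise_differences([([1, 0], 2), ([0, 1], 0), ([1, 1], 1)])
```

Training on difference vectors is what lets the model use all three ordered categories, including "indifferent", rather than only a good/bad split.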
Ranking based on Classification
Only good and bad definitions are used, so
this is a binary classification problem.
SVM is used as the model.
F(x) = w^T x + b
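As a sketch of the classification variant, the code below drops the "indifferent" examples to form a binary training set and then orders candidates by the decision value F(x). Using the raw decision value as the ranking score is an assumption of this sketch, as are the label strings, the weights w, and the bias b; no SVM is actually trained.

```python
def training_set(labeled):
    """Keep only good/bad examples and map them to +1/-1."""
    return [(x, 1 if label == "good" else -1)
            for x, label in labeled
            if label in ("good", "bad")]

def decision_value(w, b, x):
    """F(x) = w^T x + b for plain Python lists."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def rank_by_decision_value(w, b, candidates):
    """Sort (text, features) pairs by descending F(x)."""
    return sorted(candidates, key=lambda c: decision_value(w, b, c[1]), reverse=True)

data = training_set([([1], "good"), ([2], "indifferent"), ([3], "bad")])
w, b = [2.0, -1.0], 0.5  # hypothetical model parameters
ranked = rank_by_decision_value(w, b, [("p", [0.0, 1.0]), ("q", [1.0, 1.0]), ("r", [1.0, 0.0])])
```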
Features
Removing redundant candidates
After ranking, duplicate definitions may
remain.
Use edit distance to detect near-duplicate
pairs and remove the one with the lower
ranking score.
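The duplicate-removal step could look roughly like the following. This sketch assumes character-level Levenshtein distance and a hypothetical threshold `max_distance`; the slides do not say which granularity or threshold was used.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def remove_duplicates(ranked, max_distance=3):
    """Keep the higher-scored candidate of each near-duplicate pair.

    ranked: list of (text, score). Candidates are visited from highest to
    lowest score; one is dropped if it is within max_distance edits of a
    candidate that has already been kept.
    """
    kept = []
    for text, score in sorted(ranked, key=lambda r: r[1], reverse=True):
        if all(edit_distance(text, k[0]) > max_distance for k in kept):
            kept.append((text, score))
    return kept

result = remove_duplicates([
    ("Linux is an OS.", 0.9),
    ("Linux is an OS", 0.5),
    ("A penguin logo", 0.7),
])
```

Visiting candidates in descending score order means the survivor of any near-duplicate pair is always the one with the higher ranking score.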
Sample result
Evaluation metric
Results: For intranet data
Results: For TREC.gov data
Results: for definitional sentences
Conclusions
The work addresses the problem of searching
for definitions through definition ranking.
Results are better than those of traditional IR.
An enterprise search system has been
developed.
The approach is not limited to definition search.