Ranking Definitions with Supervised Learning Methods
J. Xu, Y. Cao, H. Li and M. Zhao
WWW 2005
Presenter: Baoning Wu

Motivation
People may need to find definitions of terms on the Web.
Traditional information retrieval is designed to search for relevant documents, so it is not well suited to this task.
Google’s definition search may suffer from relying on glossary pages and from ranking definitions in alphabetical order.

Task for definition search
Receive a query term, usually a noun.
Extract definition candidates from the document collection.
Rank the candidates according to how good each one is as a definition.
Output the result.

Definition search is useful

Candidates are not all good definitions

Three categories of definitions
Good: must contain the general notion of the term and several important properties.
Bad: describes neither the general notion nor the properties of the term.
Indifferent: between good and bad.

First step: collecting candidates
Parse all sentences with a Base NP (base noun phrase) parser and identify <term> with two rules:
<term> is the first Base NP of the first sentence.
Two Base NPs separated by “of” or “for” are considered as <term>.
Extract definition candidates with the patterns:
<term> is a|an|the *
<term>, *, a|an|the *
<term> is one of *

Second step: ranking candidates
Ranking based on ordinal regression (ordinal classification): Ranking SVM is used.
Ranking based on classification: SVM is used.

Ranking based on ordinal regression
Ordinal regression is a problem in which a classifier assigns instances to a number of ordered categories (here good > indifferent > bad).
Ranking SVM is used as the model.
For each candidate x, $U(x) = w^{T}x$, where w is a vector of weights.
The higher U(x), the better x is as a definition.

Ranking based on classification
Only good and bad definitions are used, so the task is binary classification.
SVM is used as the model: $F(x) = w^{T}x + b$, and candidates are ranked by F(x).

Features

Removing redundant candidates
After ranking, duplicate definitions may exist.
Edit distance is used to detect duplicates, and the one with the lower ranking score is removed.
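The two steps above (pattern-based candidate collection, linear ranking, and edit-distance deduplication) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes <term> has already been identified (the Base NP parser is not implemented) and that each candidate has already been mapped to a numeric feature vector (the actual feature set is not reproduced). The function names, the learning rate, the epoch count, and the 0.2 normalized edit-distance threshold are hypothetical choices. The Ranking SVM here is fitted by plain subgradient descent on the pairwise hinge-loss primal. That is one standard way to train a linear Ranking SVM, but it is not necessarily the solver used in the paper.

```python
import re

# The three definition patterns from the slides. <term> is assumed to be
# already identified; the Base NP parsing step is NOT implemented here.
PATTERN_TEMPLATES = [
    r"^{t} is (?:a|an|the) .+",    # <term> is a|an|the *
    r"^{t}, .+, (?:a|an|the) .+",  # <term>, *, a|an|the *
    r"^{t} is one of .+",          # <term> is one of *
]


def extract_candidates(term, sentences):
    """Return the sentences that match any definition pattern for `term`."""
    t = re.escape(term)
    patterns = [re.compile(p.format(t=t), re.IGNORECASE) for p in PATTERN_TEMPLATES]
    return [s for s in sentences if any(p.match(s) for p in patterns)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_ranking_svm(X, y, c=1.0, lr=0.01, epochs=200):
    """Hypothetical linear Ranking SVM trained by subgradient descent on
    0.5*||w||^2 + c * sum over pairs (i, j) with y[i] > y[j]
    of max(0, 1 - w.(x_i - x_j)).  Higher y means a better definition."""
    w = [0.0] * len(X[0])
    pairs = [(i, j) for i in range(len(X)) for j in range(len(X)) if y[i] > y[j]]
    for _ in range(epochs):
        grad = w[:]  # gradient of the 0.5*||w||^2 regularizer
        for i, j in pairs:
            diff = [a - b for a, b in zip(X[i], X[j])]
            if dot(w, diff) < 1.0:  # margin violated: hinge term is active
                grad = [g - c * d for g, d in zip(grad, diff)]
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return w  # score a candidate with U(x) = dot(w, x)


def edit_distance(a, b):
    """Levenshtein distance, computed with a single rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def remove_redundant(ranked, max_ratio=0.2):
    """`ranked` holds (score, text) pairs. Visit candidates from highest to
    lowest score and drop any whose length-normalized edit distance to an
    already kept candidate is <= max_ratio (a hypothetical threshold), so the
    lower-scoring duplicate is the one removed."""
    kept = []
    for score, text in sorted(ranked, key=lambda p: p[0], reverse=True):
        if all(edit_distance(text, k) / max(len(text), len(k), 1) > max_ratio
               for _, k in kept):
            kept.append((score, text))
    return kept
```

For the classification variant, the pairwise trainer would be replaced by a binary SVM over good and bad candidates only. Candidates would then be ranked by $F(x) = w^{T}x + b$ rather than by $U(x)$. Deduplication runs after ranking in both cases, because it needs the scores to decide which duplicate to keep.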
Sample result

Evaluation metric

Results: for intranet data

Results: for TREC.gov data

Results: for definitional sentences

Conclusions
Addresses the issue of searching for definitions by ranking definition candidates.
Results are better than those of traditional IR.
An enterprise search system has been developed.
The approach is not limited to searching for definitions.