Applying Key Phrase Extraction to aid
Invalidity Search
Manisha Verma, Vasudeva Varma
SIEL, LTRC, IIIT Hyderabad
Outline

Introduction

Related Work

Motivation and Contribution

Approaches

Experiments and Results

Future Work

Questions ???
INTRODUCTION
Invalidity Search

The task is to uncover patents or other published
prior art that may render a granted patent invalid

Find prior art that the patent examiner overlooked
so that a patent can be declared invalid.
Input and Process

INPUT
• A patent application

PROCESS
• Use existing search engines to find similar work.
• Manually create queries, go through several documents (articles, granted patents, etc.), and find similar documents.
Related Work
Two ways of approaching the problem:
1. Create a query from a patent and try different retrieval models.
2. Use different models to create a query from a patent, then use an existing retrieval model.
Our work employs the second approach.
Approach 1


Use claim text or abstract to create a query from the patent.
The following have been used to improve Recall and Precision:
• Re-ranking using several features
• Cluster-based pseudo-relevance feedback
• Scoring based on subtopics, etc.
Approach 2

Select words/phrases from different sections in a patent.

Select words using tf-idf from a patent.

• Find out which section results in the best queries.
• Assign a weight to each word to mark its importance. Common weighting methods explored are tf and tf-idf.
• Identify the optimal length of the query, i.e. the number of words to keep in a query generated from a patent. Empirically determine the value.
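The tf-idf word selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the setup used in the experiments: the function name `tfidf_query`, the smoothed idf formula, and the default query length `k` are all choices made for the example.

```python
import math
from collections import Counter


def tfidf_query(doc_tokens, corpus, k=5):
    """Return the k highest-scoring tf-idf terms of doc_tokens as a query.

    doc_tokens: list of tokens from one patent (or one section of it).
    corpus: list of token lists, used only to estimate document frequency.
    k: query length; the slides suggest tuning this empirically.
    """
    n_docs = len(corpus)
    # Document frequency: number of corpus documents containing each term.
    df = Counter()
    for tokens in corpus:
        df.update(set(tokens))
    tf = Counter(doc_tokens)
    # Smoothed idf (an assumption of this sketch) avoids division by zero.
    scores = {
        term: count * math.log((1 + n_docs) / (1 + df[term]))
        for term, count in tf.items()
    }
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


corpus = [
    ["fuel", "cell", "cell", "device"],
    ["display", "device"],
    ["antenna", "device"],
]
print(tfidf_query(corpus[0], corpus, k=2))
```

Running the same function on the tokens of each section (claims, abstract, description) and comparing retrieval quality is one way to check which section yields the best queries.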
Motivation and Contribution

Explore and evaluate different ways to select phrases to
make queries for patents.

Though several key phrase extraction approaches have been
proposed in the literature, they have not been used to create
queries for the invalidity search task.
Evaluate and analyze the performance of queries
created by using state-of-the-art unsupervised and
supervised key phrase extraction techniques.
Approaches
Key Phrase Extraction Techniques

Unsupervised
• TextRank (R. Mihalcea et al.)
• SingleRank (X. Wan et al.)
• Tf-Idf
• Tf

Supervised
• RankPhrase (X. Jiang et al.)
• KEA (I. H. Witten et al.)
Unsupervised Approaches

TextRank
• Represent the text as a graph using co-occurrence statistics.
• Run an iterative algorithm to find the dominant nodes (words) in the graph.

SingleRank
• Same approach as TextRank.
• Whereas TextRank selects only phrases containing the top-ranked words, SingleRank does not filter out any low-scoring words.
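The graph-based ranking step can be illustrated with a small sketch. It assumes an undirected, unweighted co-occurrence graph and a PageRank-style update; the function name `textrank`, the window size, the damping factor, and the fixed iteration count are illustrative choices. Part-of-speech filtering and the assembly of ranked words into phrases are omitted.

```python
from collections import defaultdict


def textrank(tokens, window=2, damping=0.85, iterations=50):
    """Score words by a PageRank-style iteration over a co-occurrence graph.

    Two words are linked if they appear within `window` tokens of each other.
    """
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    # Start every node with the same score, then iterate the update rule.
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        scores = {
            w: (1 - damping)
            + damping * sum(scores[v] / len(neighbors[v]) for v in neighbors[w])
            for w in neighbors
        }
    return scores


tokens = "cell membrane cell stack cell electrode".split()
scores = textrank(tokens)
# Print the words from most to least dominant.
print(sorted(scores, key=scores.get, reverse=True))
```

In this toy input every other word co-occurs only with "cell", so "cell" ends up as the dominant node.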
Supervised Approaches

KEA
• Use features to represent key phrases.
• Train a classifier on manually annotated data.

RankPhrase
• Treat key phrase extraction as a ranking problem.
• Uses the same features as KEA.
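The slides do not list the features. As a hedged illustration, the two features commonly associated with KEA are the tf-idf of a candidate phrase and the relative position of its first occurrence. The sketch below computes both; the function name `kea_features`, the smoothed idf, and the value returned for absent phrases are assumptions of the example, and the classifier itself is not shown.

```python
import math


def kea_features(phrase, doc_tokens, doc_freq, n_docs):
    """Return (tf-idf, first occurrence) for a candidate phrase.

    phrase: tuple of tokens; doc_freq: number of corpus documents that
    contain the phrase; n_docs: corpus size.
    """
    n = len(phrase)
    positions = [
        i
        for i in range(len(doc_tokens) - n + 1)
        if tuple(doc_tokens[i : i + n]) == tuple(phrase)
    ]
    if not positions:
        # Assumption: absent phrases get zero weight and the latest position.
        return 0.0, 1.0
    tf = len(positions) / len(doc_tokens)
    idf = math.log((1 + n_docs) / (1 + doc_freq))  # smoothed variant
    # Fraction of the document that precedes the first appearance.
    first_occurrence = positions[0] / len(doc_tokens)
    return tf * idf, first_occurrence


doc = "fuel cell stack with a fuel cell membrane".split()
print(kea_features(("fuel", "cell"), doc, doc_freq=1, n_docs=3))
print(kea_features(("membrane",), doc, doc_freq=1, n_docs=3))
```

A classifier (KEA) or a ranking model (RankPhrase) would then be trained on such feature vectors, labeled with the annotated key phrases.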
Training Supervised Approaches ???
• To annotate patents with key phrases, take some applications with relevance judgments.
• For every phrase in the document:
  – Fire it as a query.
  – Calculate MAP and Recall of that phrase (using the relevance judgments).
  – Select phrases with high MAP and Recall.
  – Prune phrases based on tf-idf scores.
• Use these phrases for the document.
• Use some sample documents annotated using this approach to train the supervised approach.
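The annotation procedure above can be sketched as follows. For a single phrase query the MAP value reduces to average precision, which is what the sketch computes. The `search` callable, the toy index, and the two thresholds are hypothetical stand-ins for a real retrieval engine and for tuned cut-offs; the tf-idf pruning step is left out.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked result list."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            total += hits / rank
    return total / len(relevant_ids) if relevant_ids else 0.0


def recall(ranked_ids, relevant_ids):
    """Fraction of the relevant documents that were retrieved."""
    if not relevant_ids:
        return 0.0
    return len(set(ranked_ids) & relevant_ids) / len(relevant_ids)


def annotate(phrases, search, relevant_ids, min_ap=0.5, min_recall=0.5):
    """Keep the phrases that retrieve the relevant patents well on their own."""
    selected = []
    for phrase in phrases:
        ranked = search(phrase)  # fire the phrase as a query
        if (
            average_precision(ranked, relevant_ids) >= min_ap
            and recall(ranked, relevant_ids) >= min_recall
        ):
            selected.append(phrase)
    return selected


# Toy index standing in for a search engine (hypothetical data).
toy_index = {"fuel cell": ["p1", "p9", "p2"], "method": ["p7", "p8", "p1"]}
relevant = {"p1", "p2"}
print(annotate(toy_index, lambda q: toy_index.get(q, []), relevant))
```

Here "fuel cell" is kept because it ranks both relevant patents highly, while the generic phrase "method" is dropped.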
Experiments And Results
Our DATA
• 1.3 million patents (NTCIR)
• 1000 patent applications
• For each application, a list of patents which claim the same invention is provided.
Unsupervised vs Supervised
Performance on different sections
Results

The experiments indicate that key phrase extraction
techniques indeed improve invalidity search results.

Queries created by using unsupervised and supervised
approaches perform better than those formed by tf or tf-idf.

Among supervised approaches, queries created by using
phrases extracted by KEA show 29% and 37%
improvement in MAP over TextRank and tf-idf respectively.
Future Work
• Weigh queries generated by using both the approaches
• Try the approaches on different patent collections
• Explore combination of the two approaches for query construction
References

X. Xue and W. B. Croft. Automatic query generation for patent
search. In CIKM '09: Proceeding of the 18th ACM conference on Information and
knowledge management, pages 2037–2040, NY, USA, 2009. ACM.

R. Mihalcea and P. Tarau. TextRank: Bringing order into texts. In Proc. of
EMNLP, 2004.

X. Xue and W. B. Croft. Transforming patents into prior-art queries.
In SIGIR '09: Proceedings of the 32nd international ACM SIGIR conference on
Research and development in information retrieval, pages 808–809, NY, USA,
2009. ACM.

X. Jiang, Y. Hu, and H. Li. A ranking approach to key phrase extraction.
In SIGIR '09: Proceedings of the 32nd international ACM SIGIR conference on
Research and development in information retrieval, pages 756–757, NY, USA,
2009. ACM.
Questions ???