ICEGOV 2010

Retrieving Information Across Multiple, Related Domains Based on User Query and Feedback: Application to Patent Laws and Regulations
Hang Yu, University of Illinois at Urbana-Champaign
Siddharth Taduri, Stanford University
Jay Kesan, University of Illinois at Urbana-Champaign
Gloria Lau, Stanford University
Kincho H. Law, Stanford University
27th October 2010
International Conference on Theory and Practice of Electronic Governance (ICEGOV), Beijing, China.
PROBLEM STATEMENT
Technology Firms’ Concerns
• Can I get patent protection for my innovation?
• Do I build or do I buy related technologies?
• What are my competitors doing?
• How strong are their patents?
• Am I perhaps infringing on someone else’s patents?
• If so, are those patents valid?
• Have they been enforced in court?
• Has their validity been challenged in court?
10/27/2010
PROBLEM STATEMENT
[Diagram: Patents, Publications, PTO File Wrappers, Court Cases, Laws & Regulations]
BACKGROUND
PATENTS
[Chart: Patent applications vs. granted patents, 2004 to 2008 (y-axis 100,000 to 500,000)]
IP LITIGATION
USPTO PROCEEDINGS: FILE WRAPPERS
SCIENTIFIC PUBLICATIONS
PROPOSED FRAMEWORK
Step 1: Expand keywords
Step 2: Independently search domains
Step 3: Combine results + rank
Step 4: Consider user feedback
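The four steps can be read as a simple pipeline. The sketch below wires them together with toy stand-ins; the function names, the domain-weighted ranking in Step 3 and the weight update in Step 4 are assumptions for illustration, not the authors' implementation.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical interfaces (not from the slides): an expander maps a query
# string to a set of terms; each domain search maps a term set to
# {doc_id: relevance score}.
Expander = Callable[[str], Set[str]]
SearchFn = Callable[[Set[str]], Dict[str, float]]
Hit = Tuple[str, str, float]  # (domain, doc_id, weighted score)


def run_pipeline(query: str, expand: Expander,
                 domains: Dict[str, SearchFn],
                 weights: Dict[str, float]) -> List[Hit]:
    # Step 1: expand the user's keywords.
    terms = expand(query)
    # Step 2: search every domain independently with the same term set.
    results = {name: search(terms) for name, search in domains.items()}
    # Step 3: combine into one list ranked by domain-weighted score.
    hits = [(name, doc, weights.get(name, 1.0) * score)
            for name, found in results.items()
            for doc, score in found.items()]
    return sorted(hits, key=lambda h: h[2], reverse=True)


def apply_feedback(weights: Dict[str, float], domain: str,
                   helpful: bool, step: float = 0.1) -> Dict[str, float]:
    # Step 4: nudge a domain's weight up or down based on expert feedback.
    updated = dict(weights)
    delta = step if helpful else -step
    updated[domain] = max(0.0, updated.get(domain, 1.0) + delta)
    return updated


# Toy stand-ins for the real components (hypothetical).
def toy_expand(q: str) -> Set[str]:
    return {q, "vehicle"} if q == "car" else {q}


def toy_patents(terms: Set[str]) -> Dict[str, float]:
    return {"P1": 0.9} if "car" in terms else {}


def toy_publications(terms: Set[str]) -> Dict[str, float]:
    return {"A1": 0.5} if "vehicle" in terms else {}


domains = {"patents": toy_patents, "publications": toy_publications}
weights = {"patents": 1.0, "publications": 1.0}
ranked = run_pipeline("car", toy_expand, domains, weights)
weights = apply_feedback(weights, "publications", helpful=True)
reranked = run_pipeline("car", toy_expand, domains, weights)
```

Keeping each step behind a small function boundary lets any one of them (for example, the ranking rule) be swapped without touching the others.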
STEP 1: EXPAND KEYWORDS
Goal: Expand the user query using ontologies/taxonomies (BioPortal, GeneCards, MedTerms)
Simple example:
Doc A: The car has a 3.5l V6 engine
Doc B: The vehicle has a 3.5l V6 engine
A keyword search for “car” returns only Doc A. An ontology that describes the term “vehicle” as a synonym or a parent of “car” lets the system internally expand the query so that it returns both Doc A and Doc B.
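The Doc A / Doc B example can be reproduced with a toy, hand-written ontology that lists synonyms and parent concepts for each term. The `ONTOLOGY` dictionary and the helper names are hypothetical; a real system would draw these relations from a source such as BioPortal, GeneCards or MedTerms.

```python
import re
from typing import Dict, List, Set

# Hypothetical toy ontology: each term lists its synonyms and parent concepts.
ONTOLOGY: Dict[str, Dict[str, Set[str]]] = {
    "car": {"synonyms": {"automobile"}, "parents": {"vehicle"}},
}

DOCS: Dict[str, str] = {
    "Doc A": "The car has a 3.5l V6 engine",
    "Doc B": "The vehicle has a 3.5l V6 engine",
}


def tokenize(text: str) -> Set[str]:
    """Lower-cased word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def expand_query(terms: Set[str]) -> Set[str]:
    """Add the synonyms and parents of every query term."""
    expanded = set(terms)
    for term in terms:
        entry = ONTOLOGY.get(term, {})
        expanded |= entry.get("synonyms", set())
        expanded |= entry.get("parents", set())
    return expanded


def keyword_search(terms: Set[str]) -> List[str]:
    """IDs of documents that contain at least one of the terms."""
    return sorted(doc_id for doc_id, text in DOCS.items() if tokenize(text) & terms)


plain = keyword_search({"car"})                    # only Doc A
expanded = keyword_search(expand_query({"car"}))   # Doc A and Doc B
```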
Challenges:
STEP 2: INDEPENDENTLY SEARCH DATABASES
Goal: Find relevant documents within a database of homogeneous documents (e.g., patents or publications)
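The slides do not name the relevance function used within a single domain. As an illustration only, the sketch below scores each document of one homogeneous collection with a TF-IDF sum over the query terms (term frequency normalized by document length, times a smoothed inverse document frequency). The toy patents are hypothetical; P3 shows the weakness the EPO use case runs into, where a document that never uses the query term scores zero.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    return re.findall(r"\w+", text.lower())


def tfidf_rank(docs: Dict[str, str], query_terms: List[str]) -> List[Tuple[str, float]]:
    """Rank one homogeneous collection by a TF-IDF sum over the query terms."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokens)
    # Document frequency: in how many documents each term appears.
    df: Counter = Counter()
    for toks in tokens.values():
        df.update(set(toks))
    scores = []
    for doc_id, toks in tokens.items():
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if df[term] == 0:
                continue  # term absent from the whole collection
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1.0  # smoothed IDF
            score += (tf[term] / max(len(toks), 1)) * idf
        scores.append((doc_id, score))
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Toy patent collection (hypothetical IDs and text).
patents = {
    "P1": "erythropoietin erythropoietin analog",
    "P2": "erythropoietin receptor",
    "P3": "glycosylated hormone analog",
}
ranked = tfidf_rank(patents, ["erythropoietin"])
```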
Challenges:
STEP 3: COMBINE RESULTS FROM THE FOUR DIFFERENT DOMAINS
Goal: (1) Cross-reference results from other domains
(2) Rank results
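The slides do not say how per-domain results are merged. One plausible sketch, assuming scores from different domains must first be put on a common scale, min-max normalizes each domain's scores and then adds a bonus for cross-references (for example, a paper that cites a highly scored core patent). The `links` mapping, the bonus weight and the document IDs are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one domain's scores to [0, 1] so domains are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def combine(domain_scores: Dict[str, Dict[str, float]],
            links: Dict[str, Set[str]],
            bonus_weight: float = 0.5) -> List[Tuple[str, float]]:
    """Merge per-domain results into one ranked list.

    domain_scores: {domain: {doc_id: raw score}}; doc IDs are assumed unique
    across domains. links: {doc_id: IDs of documents in other domains that it
    references}, e.g. a paper citing a patent (hypothetical input).
    """
    normalized: Dict[str, float] = {}
    for scores in domain_scores.values():
        normalized.update(min_max(scores))
    combined: Dict[str, float] = {}
    for doc, score in normalized.items():
        # Cross-reference bonus: mean normalized score of referenced documents.
        refs = [normalized[r] for r in links.get(doc, set()) if r in normalized]
        bonus = bonus_weight * sum(refs) / len(refs) if refs else 0.0
        combined[doc] = score + bonus
    # sorted() is stable, so ties keep their input order.
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)


ranking = combine(
    {"patents": {"P1": 0.8, "P2": 0.2},
     "publications": {"A1": 3.0, "A2": 1.0}},
    links={"A2": {"P1"}},
)
```

Here paper A2 has the lowest raw publication score but moves ahead of patent P2 because it references the top patent P1.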
Challenges:
STEP 4: CONSIDER USER FEEDBACK
Goal: Consider user feedback from domain experts
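One standard way to turn expert judgments into a better query is Rocchio relevance feedback, which moves the query vector toward documents marked relevant and away from those marked non-relevant. The slides do not say this is the method used; the sketch and its weights (alpha = 1.0, beta = 0.75, gamma = 0.15, common textbook defaults) are assumptions.

```python
from collections import defaultdict
from typing import Dict, List

Vector = Dict[str, float]  # sparse term -> weight vector


def centroid(vectors: List[Vector]) -> Vector:
    """Average of sparse vectors (empty dict if the list is empty)."""
    if not vectors:
        return {}
    total: Dict[str, float] = defaultdict(float)
    for vec in vectors:
        for term, weight in vec.items():
            total[term] += weight
    return {term: w / len(vectors) for term, w in total.items()}


def rocchio(query: Vector, relevant: List[Vector], nonrelevant: List[Vector],
            alpha: float = 1.0, beta: float = 0.75, gamma: float = 0.15) -> Vector:
    """Move the query toward documents judged relevant and away from the rest."""
    rel, non = centroid(relevant), centroid(nonrelevant)
    updated: Vector = {}
    for term in set(query) | set(rel) | set(non):
        w = (alpha * query.get(term, 0.0)
             + beta * rel.get(term, 0.0)
             - gamma * non.get(term, 0.0))
        if w > 0:  # negative weights are clipped, a common convention
            updated[term] = w
    return updated


# Toy feedback: an expert marks one document relevant and one not relevant.
new_query = rocchio(
    {"erythropoietin": 1.0},
    relevant=[{"erythropoietin": 0.5, "cytokine": 0.5}],
    nonrelevant=[{"iron": 1.0}],
)
```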
Challenges:
EXPERIMENTATION/METHODOLOGY
USE CASE: EPO/ERYTHROPOIETIN
Why does this make a good use case?
PATENTS
Search results for “erythropoietin” amongst the 135 closely related patents:
Patent Number   Score
5955422         0.109
6204247         0.000
6245740         0.018
6270989         0.000
6280977         0.027
6340742         0.113
6420339         0.000
6420340         0.000
6524818         0.009
U.S. Patent No. 6,204,247 is relevant but does not contain the term “erythropoietin”, so it scores 0.000.
Q: How can this be made better?
ONTOLOGY
[Figure: (a) Gene Ontology; (b) NCI Thesaurus]
Expanded term base: “Erythropoietin”, “Erythropoietin Receptor Binding”, “Colony Stimulating Factor”, “Cytokine”, …
RESULTS AFTER USING EXPANDED TERM BASE
Patent Number   Score
5955422         0.050
6204247         0.028
6245740         0.038
6270989         0.005
6280977         0.008
6340742         0.049
6420339         0.026
6420340         0.028
6524818         0.015
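As a quick check on the two score tables, the sketch below copies their values and lists the patents whose score moves from 0.000 to a non-zero value once the expanded term base is used. It only re-reads the published numbers; it does not recompute any score.

```python
# Scores copied from the two tables: "erythropoietin" alone vs. the expanded term base.
before = {
    "5955422": 0.109, "6204247": 0.000, "6245740": 0.018,
    "6270989": 0.000, "6280977": 0.027, "6340742": 0.113,
    "6420339": 0.000, "6420340": 0.000, "6524818": 0.009,
}
after = {
    "5955422": 0.050, "6204247": 0.028, "6245740": 0.038,
    "6270989": 0.005, "6280977": 0.008, "6340742": 0.049,
    "6420339": 0.026, "6420340": 0.028, "6524818": 0.015,
}

# Patents the plain keyword search missed (score 0) but the expanded search finds.
newly_found = sorted(p for p in before if before[p] == 0.0 and after[p] > 0.0)

# Ranking by the expanded-term score (ties keep table order).
ranked_after = sorted(after, key=after.get, reverse=True)
position_6204247 = ranked_after.index("6204247") + 1
```

All four zero-score patents, including 6,204,247, receive a non-zero score with the expanded term base, and 6,204,247 moves to fourth place (tied on score with 6420340).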
ENTREZ: CROSS-DATABASE SEARCH FOR THE LIFE SCIENCES
CROSS-REFERENCING SCIENTIFIC PUBLICATIONS WITH CORE PATENTS
Paper Id    RefScore   Erythropoietin   EPO    Protein
6713094     5          0.446            7.59   0
2813359     5          1.093            8.74   0
18202227    5          0.565            3.96   0.565
3680293     4          0.467            3.74   1.402
3624248     3          3.265            0      1.224
232226      2          0                0      0
14025852    1          0                0      0
Table: Selected papers with their RefScore and the rank (word frequency) of some expanded terms.
CORRELATION BETWEEN EXPANDED TERMS AND RefScore
[Chart: word frequency (%) of expanded terms vs. RefScore]
Keyword          Correlation
Erythropoietin   0.089
Epo              0.08
Iron             0.065
Erythropoietin   0.035
Cytokines        0.035
Dexamethasone    0.035
Hydroxyurea      0.035
Protein          -0.002
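The slide does not state how the correlation between a term's word frequency and RefScore was computed. Assuming Pearson's r, a minimal version is below, applied to the seven sample papers of the cross-referencing table (Erythropoietin frequency vs. RefScore). Because that table is only a small excerpt, this is not expected to reproduce the correlations listed on the slide.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Sample rows from the cross-referencing table: RefScore and the
# word frequency of "Erythropoietin" for the same seven papers.
ref_score = [5, 5, 5, 4, 3, 2, 1]
erythropoietin_freq = [0.446, 1.093, 0.565, 0.467, 3.265, 0, 0]
r = pearson(erythropoietin_freq, ref_score)
```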
USE CITATIONS AS USER FEEDBACK TO IMPROVE
Paper      RefScore   #Citation   Rufs (%)   Fufs
6713094    5          219         2.28       0.94
2813359    5          134         3.73       1.54
18202227   5          260         1.92       0.79
3680293    4          119         3.36       1.38
362424     3          98          3.06       1.26
232226     2          103         1.94       0.80
14205852   1          98          1.02       0.42
Total      25         1031        2.42       --
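The Rufs and Fufs columns are not defined on the slide, but every row is consistent (to within 0.01) with Rufs = 100 × RefScore / #Citation and Fufs = Rufs / overall Rufs, where overall Rufs = 100 × 25 / 1031 ≈ 2.42. The sketch below recomputes both columns under that inferred reading; the formulas are an assumption, not the authors' stated definitions.

```python
# (RefScore, #Citation) per paper, copied from the table above.
rows = {
    "6713094": (5, 219),
    "2813359": (5, 134),
    "18202227": (5, 260),
    "3680293": (4, 119),
    "362424": (3, 98),
    "232226": (2, 103),
    "14205852": (1, 98),
}

total_ref = sum(ref for ref, _ in rows.values())   # 25
total_cit = sum(cit for _, cit in rows.values())   # 1031
# Inferred (not stated on the slide): overall Rufs = 100 * total RefScore / total citations.
overall_rufs = 100.0 * total_ref / total_cit


def rufs(ref: int, citations: int) -> float:
    """Inferred Rufs (%): RefScore per 100 citations."""
    return 100.0 * ref / citations


def fufs(ref: int, citations: int) -> float:
    """Inferred Fufs: a paper's Rufs relative to the overall Rufs."""
    return rufs(ref, citations) / overall_rufs


for paper, (ref, cit) in rows.items():
    print(f"{paper:>9}  Rufs={rufs(ref, cit):.2f}%  Fufs={fufs(ref, cit):.2f}")
```

Under this reading, only paper 3680293 differs in the second decimal (1.39 recomputed vs. 1.38 on the slide), which is within rounding of the published values.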
OTHER ISSUES AND CHALLENGES
CURRENT STATUS & FUTURE WORK
Future Work
USEFUL LINKS
Patents
• USPTO – http://www.uspto.gov/
• Delphion – http://www.delphion.com/
• Google Patents – http://www.google.com/patents/
File Wrappers
• PAIR – http://portal.uspto.gov/external/portal/pair/
Court Cases
• PACER – http://pacer.psc.uscourts.gov/
Publications
• PubMed – http://www.ncbi.nlm.nih.gov/pubmed/
• Medline – http://www.nlm.nih.gov/medlineplus/
• Google Scholar – http://scholar.google.com/
Ontology/Taxonomy
• BioPortal – http://bioportal.bioontology.org/
• GeneCards – http://www.genecards.org/
• MedTerms – http://www.medterms.com/
Miscellaneous
• Thomson Reuters – http://www.thomsoninnovation.com/
• Dialog – http://www.dialog.com/
ACKNOWLEDGEMENT
This research is partially supported by NSF Grant Number 0811975 awarded to the University of Illinois and NSF Grant Number 0811460 awarded to Stanford University.
Any opinions and findings are those of the authors and do not necessarily reflect the views of the National Science Foundation.
DISCUSSION