Retrieving Information Across Multiple, Related Domains Based on User Query and Feedback: Application to Patent Laws and Regulations

Hang Yu, University of Illinois at Urbana-Champaign
Siddharth Taduri, Stanford University
Jay Kesan, University of Illinois at Urbana-Champaign
Gloria Lau, Stanford University
Kincho H. Law, Stanford University

27th October 2010
International Conference on Theory and Practice of Electronic Governance (ICEGOV), Beijing, China.

Section: Motivation

PROBLEM STATEMENT
Technology Firms’ Concerns
• Can I get patent protection for my innovation?
• Do I build or do I buy related technologies?
• What are my competitors doing?
• How strong are their patents?
• Am I perhaps infringing on someone else’s patents?
• If so, are those patents valid?
• Have they been enforced in court?
• Has their validity been challenged in court?

PROBLEM STATEMENT
• PATENTS
• PUBLICATIONS
• PTO FILE WRAPPERS
• COURT CASES
• LAWS & REGULATIONS

BACKGROUND

Section: Challenges

PATENTS
[Figure: Patent Applications and Granted Patents by year, 2004 to 2008; vertical axis from 100,000 to 500,000]

IP LITIGATION

USPTO PROCEEDINGS: FILE WRAPPERS

SCIENTIFIC PUBLICATIONS

Section: Proposed Framework

PROPOSED FRAMEWORK
Step 1: Expand Keywords
Step 2: Independently search domains
Step 3: Combine Results + Rank
Step 4: Consider User Feedback

STEP 1: EXPAND KEY WORDS
Goal: Expand the user query using ontologies/taxonomies (BioPortal, GeneCards, MedTerms)
Simple Example:
Doc A: The car has a 3.5l V6 engine
Doc B: The vehicle has a 3.5l V6 engine
Keyword search for “car” will return only Doc A.
An ontology that describes the term “vehicle” as a synonym or a parent of “car” will internally expand the query to return both Doc A and Doc B.
Challenges:

STEP 2: INDEPENDENTLY SEARCH DATABASES
Goal: Find relevant documents in a database of homogeneous documents (e.g., patents or publications)
Challenges:

STEP 3: COMBINE RESULTS FROM THE FOUR DIFFERENT DOMAINS
Goal:
(1) Cross-reference results from other domains
(2) Rank results
Challenges:

STEP 4: CONSIDER USER FEEDBACK
Goal: Consider user feedback from domain experts
Challenges:

Section: Use Case: EPO

EXPERIMENTATION/METHODOLOGY

USE CASE: EPO/ERYTHROPOIETIN
Why does this make a good use case?

PATENTS
Search results for “erythropoietin” among the 135 closely related patents:

Patent Number   Rank
5955422         0.109
6204247         0.000
6245740         0.018
6270989         0.000
6280977         0.027
6340742         0.113
6420339         0.000
6420340         0.000
6524818         0.009

U.S. Patent No. 6,204,247 is relevant but does not contain the term “erythropoietin”.
Q: How can this be made better?
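The question above ties back to the Doc A / Doc B example in Step 1: a plain keyword match misses documents that use a related term. The sketch below illustrates that idea only. It assumes a tiny hand-written synonym/parent map standing in for a real ontology such as BioPortal, and the names ONTOLOGY, DOCS, tokenize, expand_query, and search are hypothetical, not part of the authors' system.

```python
import re

# Hypothetical toy ontology: each term maps to its synonyms or parent terms.
# A real system would look these up in an ontology service instead.
ONTOLOGY = {"car": {"vehicle"}}

# The two documents from the Step 1 example.
DOCS = {
    "Doc A": "The car has a 3.5l V6 engine",
    "Doc B": "The vehicle has a 3.5l V6 engine",
}


def tokenize(text):
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9.]+", text.lower()))


def expand_query(term, ontology):
    """Return the term together with any related terms from the ontology."""
    term = term.lower()
    return {term} | ontology.get(term, set())


def search(terms, docs):
    """Return the names of documents containing at least one query term."""
    return sorted(name for name, text in docs.items() if tokenize(text) & set(terms))


plain = search({"car"}, DOCS)
expanded = search(expand_query("car", ONTOLOGY), DOCS)
print(plain)     # ['Doc A']
print(expanded)  # ['Doc A', 'Doc B']
```

The same mechanism would let a patent that never uses the word “erythropoietin” match through a related term, provided the ontology links the two.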
ONTOLOGY
[Figure: (a) Gene Ontology (b) NCI Thesaurus]
Expanded Term Base: “Erythropoietin”, “Erythropoietin Receptor Binding”, “Colony Stimulating Factor”, “Cytokine” …

RESULTS AFTER USING EXPANDED TERM BASE

Patent Number   Score
5955422         0.050
6204247         0.028
6245740         0.038
6270989         0.005
6280977         0.008
6340742         0.049
6420339         0.026
6420340         0.028
6524818         0.015

ENTREZ: CROSS-DATABASE SEARCH FOR THE LIFE SCIENCES

CROSS REFERENCING SCIENTIFIC PUBLICATIONS WITH CORE PATENTS

Paper Id    RefScore   Erythropoietin   EPO    Protein
6713094     5          0.446            7.59   0
2813359     5          1.093            8.74   0
18202227    5          0.565            3.96   0.565
3680293     4          0.467            3.74   1.402
3624248     3          3.265            0      1.224
232226      2          0                0      0
14025852    1          0                0      0

Table: Selected papers with their RefScore and the rank (word frequency) of some expanded terms

CORRELATION BETWEEN EXPANDED TERMS AND RefScore
[Figure: RefScore plotted against Word Frequency (%)]

Keyword          Correlation
Erythropoietin   0.089
Epo              0.08
Iron             0.065
Erythropoietin   0.035
Cytokines        0.035
Dexamethasone    0.035
hydroxyurea      0.035
Protein          -0.002

USE CITATIONS AS USER FEEDBACK to IMPROVE

Paper       RefScore   #Citation   Rufs(%)   Fufs
6713094     5          219         2.28      0.94
2813359     5          134         3.73      1.54
18202227    5          260         1.92      0.79
3680293     4          119         3.36      1.38
362424      3          98          3.06      1.26
232226      2          103         1.94      0.80
14205852    1          98          1.02      0.42
Total       25         1031        2.42      --

Section: Other issues and challenges

OTHER ISSUES AND CHALLENGES

Section: Current Status and Future Work

CURRENT STATUS & FUTURE WORK
Future Work

USEFUL LINKS
Patents
• USPTO – http://www.uspto.gov/
• Delphion – http://www.delphion.com/
• Google Patents – http://www.google.com/patents/
File Wrappers
• PAIR – http://portal.uspto.gov/external/portal/pair/
Court Cases
• PACER – http://pacer.psc.uscourts.gov/
Publications
• Pubmed – http://www.ncbi.nlm.nih.gov/pubmed/
• Medline – http://www.nlm.nih.gov/medlineplus/
• Google Scholar – http://scholar.google.com/
Ontology/Taxonomy
• BioPortal – http://bioportal.bioontology.com/
• Genecards – http://www.genecards.org/
• MedTerms – http://www.medterms.com/
Miscellaneous
• Thomson Reuters – http://www.thomsoninnovation.com/
• Dialog – http://www.dialog.com/

ACKNOWLEDGEMENT
This research is partially supported by NSF Grant Number 0811975 awarded to the University of Illinois and NSF Grant Number 0811460 awarded to Stanford University. Any opinions and findings are those of the authors and do not necessarily reflect the views of the National Science Foundation.

DISCUSSION
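A note on the citation-feedback table (USE CITATIONS AS USER FEEDBACK): the slides do not define Rufs or Fufs, but the tabulated numbers are consistent with Rufs = RefScore / #Citation and Fufs = Rufs divided by the overall ratio (total RefScore / total #Citation). The sketch below recomputes the table under that inferred assumption; the function name feedback_scores and the ROWS layout are hypothetical, and the formulas should be read as a reconstruction from the numbers rather than the authors' stated definitions.

```python
# (paper id, RefScore, citation count), copied from the citation-feedback table.
ROWS = [
    ("6713094", 5, 219),
    ("2813359", 5, 134),
    ("18202227", 5, 260),
    ("3680293", 4, 119),
    ("362424", 3, 98),
    ("232226", 2, 103),
    ("14205852", 1, 98),
]


def feedback_scores(rows):
    """Recompute Rufs (%) and Fufs under the inferred formulas.

    Assumption: Rufs = RefScore / #Citation, and
    Fufs = Rufs / (total RefScore / total #Citation).
    """
    total_ref = sum(ref for _, ref, _ in rows)
    total_cit = sum(cit for _, _, cit in rows)
    overall = total_ref / total_cit  # the "Total" row ratio
    scores = {}
    for paper, ref, cit in rows:
        rufs = ref / cit
        scores[paper] = (100 * rufs, rufs / overall)
    return 100 * overall, scores


overall_pct, scores = feedback_scores(ROWS)
print(f"Total Rufs: {overall_pct:.2f}%")
for paper, (rufs_pct, fufs) in scores.items():
    print(f"{paper:>9}  Rufs={rufs_pct:.2f}%  Fufs={fufs:.2f}")
```

Under this reading, the recomputed values agree with the table to within about 0.01, and a Fufs above 1 marks a paper whose references to the core patents are high relative to its citation count.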