Mapping Academic Patents to Papers

Hyun-Woo Kim (1), Zhen Lei (1), Brian Wright (2), John Yen (1)
(1) Penn State University
(2) UC Berkeley
NAS SciSIP PI Conference, September 20-21, 2012

NSF SciSIP Project
Collaborative Research: The Impacts of University Research and Funding Sources in Chemical Sciences: Publishing, Patenting, Commercialization
PIs: Brian Wright (UC Berkeley) and Zhen Lei (Penn State)
• Role of research sponsor type (government or industry) in university research, patenting, and technology transfer
• Publishing, patenting, licensing/MTAs, and diffusion and follow-on research of university inventions
• Interplay between government and industry funding in university research

Datasets
Data 1: Access to University of California Office of Technology Transfer:
1) Invention disclosures, patenting and licensing history
2) Sponsor information, technology information
Data 2: All scientific publications in chemical sciences by UC researchers in 1975-2005, and the associated citation profile of these publications

Mapping Patents to Papers

Patent/Paper Correspondence: One-to-One in Theory
[Diagram labels: An invention; A paper; A patent; Same researchers; Close dates]

Not So Clean in Practice
[Timeline diagram labels: Patent filing, Papers; Patent filing, Continuation, Papers, Grant]

Features of a Patent-Paper Pair
Feature Group 1 (paper coauthors' names):
– Does the first co-inventor's last name appear in the co-author list?
– Does the first co-inventor's "first initial and last name" appear in the co-author list?
– Does the first co-author's last name appear in the co-inventor list?
– Does the first co-author's "first initial and last name" appear in the co-inventor list?
– Does the last co-author's last name appear in the co-inventor list?
– Does the last co-author's "first initial and last name" appear in the co-inventor list?
– Fraction of patent inventors whose first initial and last name appear in the coauthor list of the paper
– Fraction of patent inventors whose last names appear in the coauthor list of the paper

Features of a Patent-Paper Pair
Feature Group 2 (paper primary affiliation):
– String similarity score (Levenshtein distance) between patent assignee and paper primary affiliation
– Percentage of common words between patent assignee and paper primary affiliation
– Is the patent assignee's country the same as the paper primary affiliation's?
– Is the patent assignee's (city or state) + country the same as the paper primary affiliation's?
– Does the first co-inventor's country appear in the paper primary affiliation?
– Does the first co-inventor's city/state and country appear in the paper primary affiliation?
– Fraction of the inventors whose countries are the same as the paper primary affiliation's
– Fraction of the inventors whose city/state and country are the same as the paper primary affiliation's

Features of a Patent-Paper Pair
Feature Group 3 (content similarity):
– Fraction of common words in patent and paper titles
– Fraction of common words in patent and paper abstracts
– Fraction of the paper's chemical substances that appear in the patent title
– Fraction of the paper's chemical substances that appear in the patent abstract

Features of a Patent-Paper Pair
Feature Group 4 (timing):
– Abs(Paper publication year - Patent filing year)
– Abs(Paper publication year - Earliest patent filing year)

Data
Murray/Stern data:
• 165 pairs of Nature Biotech paper / US patent
Our experiment:
• 165 patents: 162 with one GT (ground truth) paper, 3 with 2 GTs
• Retrieve papers from PubMed that share at least one last name
• Filtering:
– Exclude review articles
– (Earliest patent filing year - 2) TO (Patent filing year + 5)
• A total of 247322 patent-article pairs
• 1498.92 articles/patent on average

Experiment 1
• 10-fold cross validation
• Algorithms to build models:
– Logistic Regression
– Normal-Identity Regression
– Binomial-LogLog Regression
– Binomial-Probit Regression
– An ensemble method averaging all of the above

Model Comparison (rank of GT)
• Use all features

Average GT1 position:
Model          Upper    Lower
Logistic       3.1647   3.5393
Nor-Identity   3.5276   3.5276
Bin-LogLog     3.0640   3.4449
Bin-Probit     3.1018   3.4765
Ensemble       2.8788   2.8788

[Bar chart: Average GT1 Position (upper and lower position) by model]

Tagging
• Evaluate the top-ranked papers for each patent to see whether they are GTs as well
• 1120 patent-paper pairs have been evaluated and tagged:
– Not GTs: 566 pairs
– Uncertain: 4 pairs
– GTs: 550 pairs

Histograms: (# of GTs per Patent)
[Two histograms, Before Tagging and After Tagging: # of Patents vs. # of GTs (1 to 11)]

Experiment 2
• Updated GT papers for each patent
• 10-fold cross validation
• Algorithms to build models:
– Logistic Regression
– Normal-Identity Regression
– Binomial-LogLog Regression
– Binomial-Probit Regression
– An ensemble method averaging all of the above

Model Comparison (rank of 1st GT)
• Use all features

Average GT1 position:
Model          Upper    Lower
Logistic       1.0739   1.0739
Nor-Identity   1.1923   1.1923
Bin-LogLog     1.0680   1.0680
Bin-Probit     1.0739   1.0739
Ensemble       1.0870   1.0870

[Bar chart: Average GT1 Position (upper and lower position) by model]

Model Comparison (fraction of GTs in Top k)
• Use all features
[Bar chart: fraction of GTs in Top1, Top2, Top3, and Top5 by model (Logistic, Normal-Identity, Binomial-LogLog, Binomial-Probit, Ensemble)]

Summary
• An algorithm to link patents to papers
• Useful tool for studying the dynamics and interaction in the utilization of university inventions by both academia and industry, and the impacts of university patenting and licensing
• Useful tool for evaluating the impacts of government funding

Thank you!
zlei@psu.edu

Fraction of patent inventors whose last names appear in GT papers
[Chart: Feature 8 Value (0 to 1) by Patent ID (1 to 10)]
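A few of the pair features listed in Feature Groups 1 to 4 can be sketched in code. The following is only an illustrative sketch, not the authors' implementation: the `Patent` and `Paper` classes, the field names, and the example records are hypothetical, name parsing assumes simple "First Last" strings, and the "fraction of common words" is assumed here to be shared words divided by all distinct words (the slides do not specify the denominator). The `frac_inventors_last` value corresponds to the last item in Feature Group 1, which appears to be the "Feature 8" plotted above.

```python
from dataclasses import dataclass


@dataclass
class Patent:  # hypothetical record layout
    inventors: list[str]
    assignee: str
    title: str
    filing_year: int
    earliest_filing_year: int


@dataclass
class Paper:  # hypothetical record layout
    authors: list[str]
    affiliation: str
    title: str
    pub_year: int


def last_name(name: str) -> str:
    return name.split()[-1].lower()


def initial_and_last(name: str) -> str:
    parts = name.split()
    return f"{parts[0][0]} {parts[-1]}".lower()


def fraction_matched(inventors, authors, key) -> float:
    """Fraction of inventors whose key (e.g. last name) occurs among the authors."""
    author_keys = {key(a) for a in authors}
    return sum(key(i) in author_keys for i in inventors) / len(inventors)


def common_word_fraction(a: str, b: str) -> float:
    """Assumed definition: shared words / all distinct words."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def pair_features(patent: Patent, paper: Paper) -> dict:
    author_last = {last_name(a) for a in paper.authors}
    return {
        "first_inventor_last_in_authors": last_name(patent.inventors[0]) in author_last,
        "frac_inventors_initial_last": fraction_matched(
            patent.inventors, paper.authors, initial_and_last),
        "frac_inventors_last": fraction_matched(
            patent.inventors, paper.authors, last_name),
        "assignee_affiliation_distance": levenshtein(
            patent.assignee.lower(), paper.affiliation.lower()),
        "title_common_words": common_word_fraction(patent.title, paper.title),
        "year_gap_filing": abs(paper.pub_year - patent.filing_year),
        "year_gap_earliest": abs(paper.pub_year - patent.earliest_filing_year),
    }


# Made-up example pair, for illustration only.
patent = Patent(["Alice Smith", "Bob Jones"], "University of California",
                "Method for protein synthesis", 2001, 2000)
paper = Paper(["Alice Smith", "Carol Lee"], "University of California",
              "Protein synthesis method in vitro", 2002)
features = pair_features(patent, paper)
print(features)
```

In this made-up pair, one of the two inventors shares a last name with a coauthor, so the last-name fraction is 0.5. A real pipeline would need more careful name normalization (middle initials, hyphenated names, transliteration) than this sketch attempts.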

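The model comparisons in Experiments 1 and 2 report an "upper" and a "lower" position for the first GT paper, plus an ensemble that averages the individual models. The slides do not define these terms, so the sketch below rests on assumptions: "upper" and "lower" are taken to be the best and worst rank the top-scoring GT can receive when candidates tie on score, the ensemble is taken to be a plain mean of per-candidate scores, and "Top k" is taken to be the share of patents whose first GT lands within the top k. All function names and numbers are hypothetical.

```python
def ensemble_scores(model_scores: list[list[float]]) -> list[float]:
    """Average each candidate's score across models (assumed ensemble rule)."""
    return [sum(col) / len(col) for col in zip(*model_scores)]


def gt_rank_bounds(scores: list[float], is_gt: list[bool]) -> tuple[int, int]:
    """Return (upper, lower): best and worst 1-based rank of the first GT.

    Ties in score are the only source of a gap between the two values:
    the upper position breaks ties in favor of the GT, the lower against it.
    """
    best_gt = max(s for s, g in zip(scores, is_gt) if g)
    higher = sum(s > best_gt for s in scores)
    tied_non_gt = sum(s == best_gt and not g for s, g in zip(scores, is_gt))
    return higher + 1, higher + 1 + tied_non_gt


def top_k_hit_rate(first_gt_ranks: list[int], k: int) -> float:
    """Share of patents whose first GT is ranked within the top k."""
    return sum(r <= k for r in first_gt_ranks) / len(first_gt_ranks)


# Made-up scores for four candidate papers of one patent; the third is the GT.
model_a = [0.9, 0.4, 0.9, 0.1]
model_b = [0.7, 0.6, 0.9, 0.3]
is_gt = [False, False, True, False]

bounds_a = gt_rank_bounds(model_a, is_gt)
bounds_ensemble = gt_rank_bounds(ensemble_scores([model_a, model_b]), is_gt)
hit_rate = top_k_hit_rate([1, 3, 2, 6], k=2)
print(bounds_a, bounds_ensemble, hit_rate)
```

In this toy case model A ties the GT with a non-GT paper, so its upper and lower positions differ, while averaging with model B breaks the tie. That is one possible reading of why some rows in the tables show identical upper and lower values; the slides themselves do not say.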