In search of anti-commons: Academic patenting and patent-paper pairs in biotechnology. An analysis of citation flows.

Tom Magerman, Bart Van Looy, Koenraad Debackere (tom.magerman@econ.kuleuven.be)
INCENTIM (International Centre for Studies in Entrepreneurship and Innovation Management), K.U.Leuven, Managerial Economics, Strategy & Innovation
ECOOM (Centre for R&D Monitoring)
ESF-APE-INV workshop Scientists & Inventors, 10-11/5/2012

1957

University-Industry linkages
SCIENCE  TECHNOLOGY
• Commercialization of science (Entrepreneurial University)
• Scientification of technology

University-Industry linkages
+ Complementarities
• Generation of new research ideas
• Additional funding
• Create a market of ideas
Crowding out
• Quality
• Research orientation
• Anti-commons and the end of open science

Anti-commons and the end of open science
If I have seen a little further [than you and Descartes] it is by standing on the shoulders of Giants.
Isaac Newton, letter to Robert Hooke (originated from John of Salisbury)

Anti-commons and the end of open science
Tragedy of the anticommons: underuse of scarce resources because too many owners can block each other => more intellectual property rights may, paradoxically, lead to fewer useful products
• On the one hand: incentive to undertake risky research
• On the other hand: too many owners hold rights in previous discoveries that constitute obstacles to future research => high transaction costs lead to inefficiencies
Biomedical research has been moving from a commons model toward a privatization model => risk of anticommons tragedy
• Influenced by patent system: what is patentable (e.g. patents on gene fragments)
• Influenced by patent owner: licensing behavior (e.g. use of reach-through license agreements)
Transition or tragedy? Find ways to lower the transaction costs of bundling rights (intermediate organizations; patent pools; cross-licensing)

Anti-commons and the end of open science
Expansion of IPR is privatizing the scientific commons and limiting scientific progress: Heller and Eisenberg (1998); Argyres and Liebeskind (1998); David (2000); Lessig (2002); Etzkowitz (1998); Krimsky (2003)
Murray and Stern (2007): “Do formal intellectual property rights hinder the free flow of scientific knowledge? An empirical test of the anti-commons hypothesis”
• How do IPRs affect the propensity of future researchers to build upon knowledge?
• Compare citation patterns of publications in the pre-grant period and after grant
• 169 patent-paper pairs (Nature Biotechnology)
• Modest anti-commons effect: decline in citation rate by 10 to 20%

Detection of patent-publication pairs

Text Mining
Text mining refers to the automated extraction of knowledge and information from text by revealing relationships and patterns that are present, but not obvious, in a document collection.
Related to data mining, but with additional issues:
• other scale of dimensionality (100,000+ ‘variables’)
• different kind of variables (not really independent, and very, very sparse: 99.99%)
• language issues (homonymy/polysemy and synonymy)

Latent Semantic Analysis (LSA)
LSA was developed in the late 1980s at BellCore/Bell Laboratories by Landauer and his Cognitive Science Research team:
“Latent Semantic Analysis (LSA) is a theory and method for extracting and representing the meaning of words. Meaning is estimated using statistical computations applied to a large corpus of text. The corpus embodies a set of mutual constraints that largely determine the semantic similarity of words and sets of words.
These constraints can be solved using linear algebra methods, in particular, singular value decomposition.”
• LSA is a technique for analyzing text: extract (underlying or latent) meaning from text
• LSA is a theory of meaning: meaning is acquired by solving an enormous set of simultaneous equations that capture the contextual usage of words
• LSA is a new approach to cognitive science: use large text corpora to test cognitive theories

Linear algebra problem
• The meaning of a passage of text must be the sum of the meanings of its words.
• LSA models a large corpus of text as a large set of simultaneous equations.
• The solution is in the form of a set of vectors, one for each word and passage, in a semantic space.
• Similarity of meaning of two words is measured by the cosine between their vectors, and the similarity of two passages by the same measure on the sum or average of all their contained words.

SVD dimensionality reduction
Singular Value Decomposition: $A_{m \times n} = U \Sigma V^T$, with $\Sigma$ a diagonal matrix of singular values ($\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_n$)
Dimensionality reduction by taking the first k singular values (rank-k approximation): $A_k = U_{m \times k} \, \Sigma_{k \times k} \, V^T_{k \times n}$

Practical application?
Even when using LSA/SVD as text mining method, many options remain!
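
The rank-k approximation and the cosine measure described above can be illustrated with a minimal sketch. It assumes NumPy, a simple TF-IDF variant, and a hypothetical five-term toy corpus; the function names and the data are illustrative only and are not taken from the slides. Documents are stored as rows here, so their concept coordinates are $U_k \Sigma_k$ (with documents as columns they would be $V_k \Sigma_k$).

```python
import numpy as np


def tfidf_weight(counts):
    """Apply a simple TF-IDF weighting to a document-by-term count matrix."""
    n_docs = counts.shape[0]
    doc_freq = np.count_nonzero(counts, axis=0)      # documents containing each term
    idf = np.log(n_docs / np.maximum(doc_freq, 1))   # guard against unused terms
    return counts * idf


def lsa_document_vectors(weighted, k):
    """Project documents onto the first k latent concepts via a truncated SVD."""
    u, s, _ = np.linalg.svd(weighted, full_matrices=False)
    # Documents are rows here, so their concept coordinates are U_k * Sigma_k.
    return u[:, :k] * s[:k]


def cosine_similarity(a, b):
    """Cosine between every row of a and every row of b."""
    a_unit = a / np.maximum(np.linalg.norm(a, axis=1, keepdims=True), 1e-12)
    b_unit = b / np.maximum(np.linalg.norm(b, axis=1, keepdims=True), 1e-12)
    return a_unit @ b_unit.T


# Hypothetical toy corpus: rows are documents, columns are counts of the
# terms (gene, protein, enzyme, yeast, ferment).
patents = np.array([[2, 1, 0, 0, 0],
                    [0, 0, 1, 2, 1]], dtype=float)
publications = np.array([[1, 2, 0, 0, 0],
                         [0, 0, 2, 1, 1]], dtype=float)

# Weight and decompose patents and publications in one shared semantic space.
corpus = np.vstack([patents, publications])
doc_vectors = lsa_document_vectors(tfidf_weight(corpus), k=2)

n_patents = patents.shape[0]
similarities = cosine_similarity(doc_vectors[:n_patents], doc_vectors[n_patents:])
closest = similarities.argmax(axis=1)   # closest publication index for every patent
print(closest)
```

Changing `k` or swapping the weighting function reproduces, on a very small scale, the kind of measure variants that the following slides compare.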
• Preprocessing
• Term weighting
• SVD truncation

Assessment of 40 measure variants
• 4 weighting methods
• 9 SVD truncation levels + no SVD
• 40 similarity measures based on SVD and cosine

Full process
Construct DbT matrix
• Create full text index with stop word removal and stemming (Lucene)
• Convert full text index to document-by-term matrix (Matlab)
• Weight DbT matrix (4 variants)
SVD truncation
• Decompose weighted DbT matrix into UΣV using the 1,000 largest singular values
• Generate document-by-concept matrix VΣ
• Truncate document-by-concept matrix (take first 1000, 500, …, 5 concepts)
Similarity calculation
• Normalise DbT and DbC matrices
• Calculate distance matrix (all patents to all publications) by calculating the inner product of vectors
• Retain closest publication for every patent for all of the 43 variants

Expert validation
R² per SVD truncation level for the four weighting methods (TF-IDF, IDF, BIN, RAW). Columns are in extraction order; the slide text does not preserve which column belongs to which weighting method.

Measure     R² (1)   R² (2)   R² (3)   R² (4)
No SVD      0.61     0.77     0.71     0.80
SVD 1000    0.34     0.65     0.45     0.63
SVD 500     0.31     0.63     0.34     0.57
SVD 300     0.30     0.58     0.26     0.54
SVD 200     0.31     0.51     0.21     0.51
SVD 100     0.30     0.45     0.17     0.49
SVD 25      0.22     0.38     0.14     0.46
SVD 5       0.11     0.20     0.11     0.21

Measure                                           R²
Common terms (weighted by min number of terms)    0.82
Common terms (weighted by max number of terms)    0.68
Common terms (weighted by avg number of terms)    0.75

Methodology and data
Publication data
Selection of biotechnology publications from the Web of Science based on the subject classification (1991-2008):
• Core set of 243,361 publications: subject category Biotechnology & Applied Microbiology
• Extended set of 683,674 publications: publications of the following subject categories citing or cited by a publication of the core set: Biochemical Research Methods; Biochemistry & Molecular Biology; Biophysics; Plant
Sciences; Cell Biology; Developmental Biology; Food Science & Technology; Genetics & Heredity; Microbiology Materials
• Multidisciplinary set of 97,970 publications: publications from the multidisciplinary journals Nature; Science; and Proceedings of the National Academy of Sciences of the United States of America
1,025,005 publications in total (948,432 suited for text mining)
478,361 publications published between 1991 and 2000

Methodology and data
Patent data
Selection of all granted EPO and USPTO biotechnology patents, applied for between 1991 and 2008, from PATSTAT using IPC codes as listed in the OECD definition of biotechnology (‘A Framework for Biotechnology Statistics’, OECD, Paris, 2005)
27,241 EPO patents and 91,775 USPTO patents
119,016 patents in total (88,248 suited for text mining)

Methodology and data
Matching
• Original document combinations: 83,697,227,136 patent-publication combinations
• CommonTermsMin ≥ 0.60: 27,250 patent-publication combinations
• And CommonTermsMax ≥ 0.30: 645 patent-publication combinations
• And at least one shared inventor/author: 584 patent-publication pairs

Methodology and data
584 patent-publication pairs identified
• 17 patents linked to multiple publications (up to 3)
• 115 publications linked to multiple patents (up to 7) (patent families)
• 566 distinct patents paired with a publication
• 400 distinct publications paired with a patent
Patentee type
• 292 University
• 128 Government / Non profit
• 126 Company
• 38 Hospital
• 21 Individual
(42 patents have multiple patentees from different sectors)

Pairs
Publication and citation numbers

Citation analysis
Match publications to deal with quality differences
Paired and non-paired publications matched by year and journal (1991-2000)

                                                  PAIRS                      NONPAIRS
VY     SO                                         PUB   AVG_AU   AVG_CIT     PUB       AVG_AU   AVG_CIT
1991   BIOCHEMISTRY                                 1     5.00     65.00         625     4.03     57.20
1991   BIOTECHNIQUES                                1     2.00     64.00         125     3.24     40.27
…      …
1992   BIOSCIENCE BIOTECH AND BIOCHEMISTRY          1     2.00      4.00         543     4.24      8.07
1992   BIOTECHNIQUES                                1     4.00    147.00         144     3.07     26.17
…      …
Total                                             328     5.18    130.47     117,909     4.42     67.03

328 paired publications versus 106,027 biotechnology publications

Before and after publication and grant
Variable: ratio of average citations, pairs/non-pairs

Class          N     Lower CL mean   Mean    Upper CL mean
Pre-grant      288   1.42            1.71    2.00
Post-grant     288   1.48            1.74    2.00
Diff (1-2)           -0.43           -0.03   0.36

T-tests
Method          Variances   DF    t value   Pr > |t|
Pooled          Equal       574   -0.17     0.8666
Satterthwaite   Unequal     565   -0.17     0.8666

Equality of variances
Method     Num DF   Den DF   F value   Pr > F
Folded F   287      287      1.29      0.0299

Paired sample t-tests

Test / citation count        N     Mean 1   Mean 2   Difference   t value   Pr > |t|
Paired vs nonpaired
  Forward citations          190   130.47    74.24    56.23       4.33      0.0001
  Without self citations     190   116.01    65.02    50.99       4.07      0.0001
Paired vs nonpaired (at least 2 paired publications)
  Forward citations           59   224.97   131.63    93.34       3.12      0.0028
  Without self citations      59   202.7    117.88    84.82       2.97      0.0043
Paired and grey zone vs all others
  Forward citations          764    60.57    42.69    17.88       5.72      0.0001
  Without self citations     764    53.09    36.48    16.61       5.59      0.0001
Paired and grey zone vs all others (at least 2 paired or grey zone publications)
  Forward citations          281    96.41    59.64    36.77       5.57      0.0001
  Without self citations     281    85.85    51.76    34.09       5.43      0.0001

Multivariate analysis (negative binomial)

                                                              95% Wald Confidence Interval   Hypothesis Test
Parameter                                   B       Std. Error   Lower     Upper             Wald Chi-Square   df   Sig.
(Intercept)                                 2.966   .1258        2.719     3.213             555.643           1    .000
Pair (Y/N)                                  .450    .0506        .350      .549              78.945            1    .000
Document type: Article                      -.574   .0113        -.596     -.552             2589.688          1    .000
Document type: Letter                       -.774   .0590        -.890     -.659             172.469           1    .000
Document type: Note                         -.567   .0175        -.601     -.533             1051.989          1    .000
Document type: Review                       0       .            .         .                 .                 .    .
Number of backward publication citations    .013    .0001        .013      .014              10416.453         1    .000
Number of authors                           .033    .0005        .032      .034              4613.407          1    .000
Time                                        .125    .0015        .122      .128              7191.199          1    .000
Time²                                       -.012   .0001        -.013     -.012             29450.994         1    .000
Journal dummies (n=104)                     Included

Sector analysis

Pub sector   Pat sector   N     Mean    Median   Var         SD
COM          COM          21    71.6    34.0     5,999.6     77.5
KGI          COM          25    70.5    49.0     3,212.6     56.7
KGI+COM      COM          15    106.7   80.0     18,605.8    136.4
KGI          KGI          227   179.2   67.0     95,544.4    309.1
KGI+COM      KGI          16    282.0   131.5    231,467.6   481.1
KGI          KGI+COM      6     219.2   93.5     66,633.4    258.1
KGI+COM      KGI+COM      5     85.0    67.0     3,546.5     59.6
Total                     315   164.4   66.0     84,846.9    291.3

Sector analysis

Parameter                                   B        Std. Error   z        P>z     [95% Conf. Interval]
(Intercept)                                 4.326    0.292        14.800   0.000   3.753     4.899
Document type: Article                      0.114    0.524        0.220    0.827   -0.913    1.141
Document type: Note                         0.309    1.130        0.270    0.784   -1.905    2.523
Document type: Review
Number of backward publication citations    0.046    0.008        5.990    0.000   0.031     0.061
Number of authors                           0.141    0.019        7.350    0.000   0.103     0.179
Pat sector: KGI                             0.000    .            .        .       .         .
Pat sector: COM                             -0.627   0.206        -3.050   0.002   -1.030    -0.223
Pat sector: KGI+COM                         -0.917   0.355        -2.590   0.010   -1.612    -0.222
Aff sector: KGI                             0.000    .            .        .       .         .
Aff sector: COM                             0.051    0.314        0.160    0.870   -0.563    0.666
Aff sector: KGI+COM                         0.176    0.214        0.820    0.413   -0.245    0.596
Time                                        -0.301   0.122        -2.470   0.013   -0.539    -0.063
Time²                                       0.015    0.010        1.420    0.156   -0.006    0.035

Sector analysis

Patentee                                                                      Country   Count
THE REGENTS OF THE UNIVERSITY OF CALIFORNIA                                   US        26
THE JOHNS HOPKINS UNIVERSITY                                                  US        26
THE SALK INSTITUTE FOR BIOLOGICAL STUDIES                                     US        15
BOARD OF REGENTS, THE UNIVERSITY OF TEXAS SYSTEM                              US        12
THE SCRIPPS RESEARCH INSTITUTE                                                US        10
THE BOARD OF TRUSTEES OF THE LELAND STANFORD JUNIOR UNIVERSITY                US        9
JOHNS HOPKINS UNIVERSITY                                                      US        9
CITY OF HOPE                                                                  US        8
PRESIDENT AND FELLOWS OF HARVARD COLLEGE                                      US        8
WASHINGTON UNIVERSITY                                                         US        8
INSTITUT PASTEUR                                                              FR        8
THE ROCKEFELLER UNIVERSITY                                                    US        7
THE UNITED STATES OF AMERICA AS REPRESENTED BY THE SECRETARY OF AGRICULTURE   US        7
THE UNITED STATES OF AMERICA AS REPRESENTED BY THE DEPARTMENT OF HEALTH       US        7
UNIVERSITY OF UTAH RESEARCH FOUNDATION                                        US        7
OKLAHOMA MEDICAL RESEARCH FOUNDATION                                          US        6
MASSACHUSETTS INSTITUTE OF TECHNOLOGY                                         US        6
THE UNITED STATES OF AMERICA AS REPRESENTED BY THE SECRETARY OF               US        6
THE JOHNS HOPKINS UNIVERSITY SCHOOL OF MEDICINE                               US        6
ST. JUDE CHILDREN'S RESEARCH HOSPITAL                                         US        6

Conclusions: science-technology interactions
• We do not observe lower citation rates for publications that are part of a patent application (neither before versus after grant, nor when matched by journal, nor when matched by author)
• Significant impact of KGIs at the patent side
• We miss patent-publication pairs
• Dig deeper into the sector dynamics
• Citation patterns are only one aspect of the diffusion of knowledge
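
As a rough illustration of the matching cascade in the Methodology and data slides (CommonTermsMin ≥ 0.60, CommonTermsMax ≥ 0.30, at least one shared inventor/author), the following sketch applies the three filters to one candidate combination. The exact definition of the common-terms measures is an assumption here: the number of distinct shared terms divided by the smaller (min) or larger (max) of the two documents' distinct term counts. The function names, record layout and example names are hypothetical, and the person matching is naive exact string equality.

```python
def common_terms_scores(patent_terms, publication_terms):
    """Return (min-weighted, max-weighted) shares of common distinct terms."""
    patent_terms, publication_terms = set(patent_terms), set(publication_terms)
    if not patent_terms or not publication_terms:
        return 0.0, 0.0
    common = len(patent_terms & publication_terms)
    sizes = (len(patent_terms), len(publication_terms))
    # Assumed definition: shared terms relative to the shorter / longer document.
    return common / min(sizes), common / max(sizes)


def is_pair(patent, publication, min_threshold=0.60, max_threshold=0.30):
    """Apply the three filters from the slides to one candidate combination."""
    score_min, score_max = common_terms_scores(patent["terms"], publication["terms"])
    # Naive exact-name comparison; real inventor/author matching needs cleaning.
    shares_person = bool(set(patent["inventors"]) & set(publication["authors"]))
    return score_min >= min_threshold and score_max >= max_threshold and shares_person


# Hypothetical records with already stemmed, stop-word-free terms.
patent = {"terms": ["gene", "vector", "express", "cell", "promot"],
          "inventors": ["DOE J", "ROE R"]}
paper_same_team = {"terms": ["gene", "vector", "express", "cell",
                             "mous", "assay", "clone", "sequenc"],
                   "authors": ["DOE J", "POE E"]}
paper_other_team = dict(paper_same_team, authors=["POE E"])

print(common_terms_scores(patent["terms"], paper_same_team["terms"]))
print(is_pair(patent, paper_same_team), is_pair(patent, paper_other_team))
```

Applying the term thresholds before the name comparison mirrors the order of the cascade on the slide, where each step narrows the candidate set further.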