Magerman - APE-INV

In search of anti-commons: Academic patenting and
patent-paper pairs in biotechnology. An analysis of
citation flows.
Tom Magerman, Bart Van Looy, Koenraad Debackere
(tom.magerman@econ.kuleuven.be)
INCENTIM (International Centre for Studies in Entrepreneurship and Innovation Management)
K.U.Leuven Managerial Economics, Strategy & Innovation
ECOOM (Centre for R&D Monitoring)
ESF-APE-INV workshop Scientists & Inventors 10-11/5/2012
1957
University-Industry linkages between SCIENCE and TECHNOLOGY
• Commercialization of science (Entrepreneurial University)
• Scientification of technology
University-Industry linkages
+ Complementarities: generation of new research ideas; additional funding; create a market of ideas
Crowding out: quality; research orientation; anti-commons and the end of open science
Anti-commons and the end of open science
If I have seen a little further [than you and Descartes]
it is by standing on the shoulders of Giants.
Isaac Newton, letter to Robert Hooke
(the metaphor goes back to John of Salisbury)
Anti-commons and the end of open science
Tragedy of the anticommons: scarce resources are underused because too many
owners can block each other
=> more intellectual property rights may, paradoxically, lead to fewer useful products
On the one hand, IPRs provide an incentive to undertake risky research
On the other hand, when too many owners hold rights in previous discoveries, these rights
become obstacles to future research
=> high transaction costs lead to inefficiencies
Biomedical research has been moving from a commons model toward a privatization
model
=> risk of an anticommons tragedy
Influenced by the patent system: what is patentable (e.g. patents on gene fragments)
Influenced by patent owners: licensing behavior (e.g. use of reach-through license
agreements)
Transition or tragedy? Find ways to lower the transaction costs of bundling rights
(intermediary organizations; patent pools; cross-licensing)
8/09/2011
Tom Magerman – ENID 2011
Anti-commons and the end of open science
Expansion of IPR is privatizing the scientific commons and limiting scientific
progress
– Heller and Eisenberg (1998); Argyres and Liebeskind (1998); David (2000);
Lessig (2002); Etzkowitz (1998); Krimsky (2003)
Murray and Stern (2007): “Do formal intellectual property rights hinder the
free flow of scientific knowledge? An empirical test of the anti-commons
hypothesis”
• How do IPRs affect the propensity of future researchers to build upon
knowledge?
• Compare citation patterns of publications in the pre-grant period and after
grant
• 169 patent-paper pairs (Nature Biotechnology)
• Modest anti-commons effect: decline in citation rate by 10 to 20%
Detection of patent-publication pairs
Text Mining
Text mining refers to the automated extraction of knowledge and information
from text by revealing relationships and patterns that are present, but not
obvious, in a document collection.
Related to data mining, but with additional issues:
• a different scale of dimensionality (100,000+ ‘variables’)
• a different kind of variables (not really independent, and very, very sparse:
99.99% zeros)
• language issues (homonymy/polysemy and synonymy)
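Because a document-by-term matrix with 100,000+ terms is almost entirely zeros, it is normally stored sparsely, keeping only the non-zero cells. A minimal Python sketch of that idea, assuming documents are already tokenised; the function names and the toy corpus are hypothetical and this is not the authors' Lucene/Matlab pipeline:

```python
from collections import Counter

def sparse_dbt(docs):
    """Build a document-by-term matrix in sparse form.

    Each document (a list of already preprocessed tokens) becomes a
    {term: count} mapping; absent terms are implicit zeros.
    """
    return [Counter(tokens) for tokens in docs]

def sparsity(matrix):
    """Fraction of zero cells in the dense matrix the sparse rows stand for."""
    vocabulary = set()
    for row in matrix:
        vocabulary.update(row)  # iterating a Counter yields its terms
    cells = len(matrix) * len(vocabulary)
    if cells == 0:
        return 0.0
    nonzero = sum(len(row) for row in matrix)
    return 1 - nonzero / cells

# Hypothetical toy corpus (tokens after stop-word removal and stemming)
docs = [
    ["gene", "express", "gene"],
    ["patent", "claim"],
    ["gene", "claim"],
]
dbt = sparse_dbt(docs)
print(dbt[0]["gene"], dbt[0]["claim"])  # 2 0 (absent terms read as zero)
print(sparsity(dbt))                    # 0.5
```

On a real corpus the same ratio approaches the 99.99% quoted above, which is why dense storage is impractical.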
Latent Semantic Analysis (LSA)
LSA was developed in the late 1980s at Bellcore/Bell Laboratories by Landauer and his cognitive science research team:
“Latent Semantic Analysis (LSA) is a theory and method for extracting and representing
the meaning of words. Meaning is estimated using statistical computations applied to a
large corpus of text. The corpus embodies a set of mutual constraints that largely
determine the semantic similarity of words and sets of words. These constraints can be
solved using linear algebra methods, in particular, singular value decomposition.”
• LSA is a technique for analyzing text: it extracts the underlying (latent) meaning from text
• LSA is a theory of meaning: meaning is acquired by solving an enormous set of
simultaneous equations that capture the contextual usage of words
• LSA is a new approach to cognitive science: it uses large text corpora to test cognitive
theories
Linear algebra problem
The meaning of a passage of text is the sum of the meanings of its words.
LSA models a large corpus of text as a large set of simultaneous equations.
The solution takes the form of a set of vectors, one for each word and passage,
in a semantic space.
Similarity of meaning of two words is measured by the cosine between their
vectors, and the similarity of two passages by the same measure applied to the
sum or average of the vectors of all the words they contain.
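The similarity rule above can be illustrated with a small stdlib Python sketch: word vectors live in a semantic space, a passage vector is the sum of its word vectors, and similarity is the cosine. The 3-dimensional vectors and the helper names are hypothetical; in a real LSA space the vectors come out of the SVD.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def passage_vector(words, word_vectors):
    """Passage vector = sum of the vectors of its words; words without a vector are skipped."""
    dims = len(next(iter(word_vectors.values())))
    total = [0.0] * dims
    for w in words:
        vec = word_vectors.get(w)
        if vec is not None:
            total = [t + x for t, x in zip(total, vec)]
    return total

# Hypothetical 3-dimensional semantic space, for illustration only
word_vectors = {
    "gene": [0.9, 0.1, 0.0],
    "protein": [0.8, 0.3, 0.1],
    "patent": [0.1, 0.9, 0.2],
    "claim": [0.0, 0.8, 0.4],
}

p1 = passage_vector(["gene", "protein"], word_vectors)
p2 = passage_vector(["patent", "claim"], word_vectors)
print(round(cosine(word_vectors["gene"], word_vectors["protein"]), 3))
print(round(cosine(p1, p2), 3))
```

Because the cosine ignores vector length, taking the sum or the average of the word vectors gives the same passage similarity.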
SVD dimensionality reduction
Singular Value Decomposition rank-k approximation:
$$A_{m\times n} = U_{m\times n}\,\Sigma_{n\times n}\,V^{T}_{n\times n}$$
with $\Sigma$ a diagonal matrix of singular values ($\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_n$)
Dimensionality reduction by taking the first k singular values:
$$A_k = U_{m\times k}\,\Sigma_{k\times k}\,V^{T}_{k\times n}$$
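A short NumPy sketch of the rank-k truncation, assuming rows of the matrix are documents and columns are terms. With that orientation the document-by-concept coordinates are U_k Σ_k; the VΣ named in the process description would correspond to the transposed, term-by-document layout (an assumption about the authors' convention). The toy matrix is hypothetical. The check at the end uses the Eckart-Young theorem: A_k is the best rank-k approximation, with spectral-norm error σ_{k+1}.

```python
import numpy as np

def truncated_svd(A, k):
    """Rank-k approximation of A via a thin SVD.

    Returns A_k = U_k Sigma_k V_k^T, the document-by-concept coordinates
    U_k Sigma_k (rows of A are documents) and all singular values.
    """
    # Thin SVD: singular values in s are sorted in descending order
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    A_k = U_k @ np.diag(s_k) @ Vt_k
    doc_by_concept = U_k * s_k  # broadcasting scales column j of U_k by sigma_j
    return A_k, doc_by_concept, s

# Hypothetical 4-document x 5-term count matrix
A = np.array([
    [2, 1, 0, 0, 0],
    [1, 2, 1, 0, 0],
    [0, 0, 1, 2, 1],
    [0, 0, 0, 1, 2],
], dtype=float)

A_2, docs_2, s = truncated_svd(A, k=2)
print(docs_2.shape)  # (4, 2): every document now has 2 concept coordinates
# Eckart-Young: the spectral-norm error of the best rank-k approximation is sigma_{k+1}
print(np.linalg.norm(A - A_2, ord=2), s[2])
```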
Practical application?
Even when using LSA/SVD as the text mining method, many options remain: preprocessing, term weighting and SVD truncation.
Assessment of 40 measure variants
4 weighting methods × (9 SVD truncation levels + no SVD) = 40 similarity measures based on SVD and cosine
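The four weighting methods are labelled TF-IDF, IDF, BIN and RAW in the validation results. Their exact formulas are not given on the slides; the stdlib Python sketch below assumes common textbook definitions (raw counts, binary occurrence, binary occurrence times idf, and term frequency times idf, with idf = log(N / df)). `weight_matrix` is a hypothetical helper.

```python
import math

def weight_matrix(counts, scheme):
    """Weight a document-by-term count matrix given as a list of rows.

    Assumed definitions (not confirmed by the slides):
      RAW    -> raw term count tf
      BIN    -> 1 if the term occurs in the document, else 0
      IDF    -> binary occurrence times idf = log(N / df)
      TF-IDF -> tf times idf
    """
    n_docs = len(counts)
    n_terms = len(counts[0])
    # document frequency: in how many documents each term occurs
    df = [sum(1 for row in counts if row[j] > 0) for j in range(n_terms)]
    idf = [math.log(n_docs / d) if d > 0 else 0.0 for d in df]
    weighted = []
    for row in counts:
        if scheme == "RAW":
            new = [float(tf) for tf in row]
        elif scheme == "BIN":
            new = [1.0 if tf > 0 else 0.0 for tf in row]
        elif scheme == "IDF":
            new = [idf[j] if tf > 0 else 0.0 for j, tf in enumerate(row)]
        elif scheme == "TF-IDF":
            new = [tf * idf[j] for j, tf in enumerate(row)]
        else:
            raise ValueError(f"unknown weighting scheme: {scheme}")
        weighted.append(new)
    return weighted

counts = [[3, 0, 1], [0, 2, 1]]  # hypothetical 2-document x 3-term counts
for scheme in ("TF-IDF", "IDF", "BIN", "RAW"):
    print(scheme, weight_matrix(counts, scheme))
```

Under these definitions a term that occurs in every document (the third column here) gets idf = 0 and drops out of the IDF and TF-IDF variants entirely.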
Full process
Construct DbT matrix
• Create full-text index with stop-word removal and stemming (Lucene)
• Convert full-text index to document-by-term matrix (Matlab)
• Weight DbT matrix (4 variants)
SVD truncation
• Decompose weighted DbT matrix into UΣV using the 1,000 largest singular values
• Generate document-by-concept matrix VΣ
• Truncate document-by-concept matrix (take first 1000, 500, …, 5 concepts)
Similarity calculation
• Normalise DbT and DbC matrices
• Calculate distance matrix (all patents to all publications) by calculating the inner product of vectors (for normalised vectors this equals the cosine similarity)
• Retain closest publication for every patent for all of the 43 variants
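The similarity step (normalise, take inner products, keep the best match per patent) can be sketched in stdlib Python as follows. The vectors are hypothetical, and this brute-force loop over every patent-publication combination only illustrates the logic; it is not meant to scale to the full data set.

```python
import math

def l2_normalise(rows):
    """Scale each row vector to unit Euclidean length (zero rows are left as-is)."""
    out = []
    for r in rows:
        norm = math.sqrt(sum(x * x for x in r))
        out.append([x / norm for x in r] if norm > 0 else list(r))
    return out

def closest_publication(patents, publications):
    """For every patent, return (index, score) of the most similar publication.

    After normalisation the inner product equals the cosine similarity.
    """
    pats = l2_normalise(patents)
    pubs = l2_normalise(publications)
    best = []
    for p in pats:
        scores = [sum(a * b for a, b in zip(p, q)) for q in pubs]
        j = max(range(len(scores)), key=scores.__getitem__)
        best.append((j, scores[j]))
    return best

# Hypothetical weighted (or concept-space) vectors
patents = [[2.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
publications = [[1.0, 1.0, 0.0], [4.0, 0.0, 2.0], [0.0, 3.0, 3.0]]
print(closest_publication(patents, publications))
```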
Expert validation
R² of each measure variant for the four term weighting methods (TF-IDF, IDF, BIN, RAW), one panel per method:
Measure    Panel 1   Panel 2   Panel 3   Panel 4
No SVD     0.61      0.77      0.71      0.80
SVD 1000   0.34      0.65      0.45      0.63
SVD 500    0.31      0.63      0.34      0.57
SVD 300    0.30      0.58      0.26      0.54
SVD 200    0.31      0.51      0.21      0.51
SVD 100    0.30      0.45      0.17      0.49
SVD 25     0.22      0.38      0.14      0.46
SVD 5      0.11      0.20      0.11      0.21
Common terms (weighted by min number of terms): R² = 0.82
Common terms (weighted by max number of terms): R² = 0.68
Common terms (weighted by avg number of terms): R² = 0.75
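The common-terms measures are not defined on the slides. One plausible reading, used in the stdlib Python sketch below as an assumption, is the number of distinct shared terms divided by the minimum, maximum or average number of distinct terms of the two documents. `common_terms` and the stemmed example terms are hypothetical.

```python
def common_terms(terms_a, terms_b, weighting="min"):
    """Share of distinct terms two documents have in common (hypothetical definition).

    The number of shared distinct terms is divided by the minimum, maximum or
    average number of distinct terms of the two documents.
    """
    a, b = set(terms_a), set(terms_b)
    if not a or not b:
        return 0.0
    shared = len(a & b)
    if weighting == "min":
        denominator = min(len(a), len(b))
    elif weighting == "max":
        denominator = max(len(a), len(b))
    elif weighting == "avg":
        denominator = (len(a) + len(b)) / 2
    else:
        raise ValueError(f"unknown weighting: {weighting}")
    return shared / denominator

patent = ["antibodi", "bind", "receptor", "cell"]
paper = ["antibodi", "bind", "receptor", "tumor", "mous", "model", "express", "assay"]
for w in ("min", "max", "avg"):
    print(w, common_terms(patent, paper, w))  # min 0.75, max 0.375, avg 0.5
```

Under this reading the min variant is high whenever the shorter document's vocabulary is largely contained in the longer one, while the max variant also penalises a large difference in document length.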
Methodology and data
Publication data
Selection of biotechnology publications from the Web of Science
based on the subject classification (1991-2008):
• Core set of 243,361 publications: subject category Biotechnology &
Applied Microbiology
• Extended set of 683,674 publications: publications of the following subject
categories citing or cited by a publication of the core set: Biochemical Research
Methods; Biochemistry & Molecular Biology; Biophysics; Plant sciences; Cell
Biology; Developmental Biology; Food sciences & Technology; Genetics &
Heredity; Microbiology Materials
• Multidisciplinary set of 97,970 publications: publications from the
multidisciplinary journals Nature, Science and Proceedings of the National
Academy of Sciences of the United States of America
1,025,005 publications in total (948,432 suited for text mining)
478,361 publications published between 1991 and 2000
Methodology and data
Patent data
Selection of all granted EPO and USPTO biotechnology patents,
applied for between 1991 and 2008, from PATSTAT using the IPC codes listed in the OECD definition of biotechnology (‘A Framework
for Biotechnology Statistics’, OECD, Paris, 2005)
27,241 EPO patents and 91,775 USPTO patents
119,016 patents in total (88,248 suited for text mining)
Methodology and data
Matching
Original document combinations: 83,697,227,136 patent-publication combinations (88,248 patents × 948,432 publications suited for text mining)
CommonTermsMin ≥ 0.60: 27,250 patent-publication combinations
And CommonTermsMax ≥ 0.30: 645 patent-publication combinations
And at least one shared inventor/author: 584 patent-publication pairs
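The matching cascade amounts to three successive filters. A stdlib Python sketch, assuming each candidate combination already carries its two common-terms scores and harmonised inventor/author names; the dictionary keys, the names and the exact-string name overlap are hypothetical simplifications of real name matching.

```python
def select_pairs(candidates, min_threshold=0.60, max_threshold=0.30):
    """Three-step filter over candidate patent-publication combinations.

    Each candidate is a dict with hypothetical keys:
      'ct_min', 'ct_max': CommonTermsMin and CommonTermsMax scores
      'inventors', 'authors': sets of (already harmonised) person names
    Returns the survivors after each step.
    """
    step1 = [c for c in candidates if c["ct_min"] >= min_threshold]
    step2 = [c for c in step1 if c["ct_max"] >= max_threshold]
    # Step 3: keep only combinations sharing at least one inventor/author name
    step3 = [c for c in step2 if set(c["inventors"]) & set(c["authors"])]
    return step1, step2, step3

candidates = [
    {"ct_min": 0.72, "ct_max": 0.35, "inventors": {"SMITH J"}, "authors": {"SMITH J", "LEE K"}},
    {"ct_min": 0.68, "ct_max": 0.22, "inventors": {"DOE A"}, "authors": {"DOE A"}},
    {"ct_min": 0.45, "ct_max": 0.40, "inventors": {"CHEN L"}, "authors": {"CHEN L"}},
    {"ct_min": 0.65, "ct_max": 0.30, "inventors": {"PEREZ M"}, "authors": {"RUIZ M"}},
]
s1, s2, s3 = select_pairs(candidates)
print(len(s1), len(s2), len(s3))  # 3 2 1
```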
Methodology and data
Pairs
584 patent-publication pairs identified
• 17 patents linked to multiple publications (up to 3)
• 115 publications linked to multiple patents (up to 7) (patent families)
• 566 distinct patents paired with a publication
• 400 distinct publications paired with a patent
Patentee type
• 292 University
• 128 Government / Non profit
• 126 Company
• 38 Hospital
• 21 Individual
(42 patents have multiple patentees from different sectors)
Publication and citation numbers
Citation analysis
Match publications to deal with quality differences:
paired and non-paired publications matched by year and journal (1991-2000)

                                              PAIRS                       NONPAIRS
VY    SO                                      PUB   AVG_AU  AVG_CIT       PUB      AVG_AU  AVG_CIT
1991  BIOCHEMISTRY                            1     5.00     65.00        625      4.03    57.20
1991  BIOTECHNIQUES                           1     2.00     64.00        125      3.24    40.27
…     …
1992  BIOSCIENCE BIOTECH AND BIOCHEMISTRY     1     2.00      4.00        543      4.24     8.07
1992  BIOTECHNIQUES                           1     4.00    147.00        144      3.07    26.17
…     …
Total                                         328   5.18    130.47        117,909  4.42    67.03

(VY = year; SO = source journal; PUB = number of publications; AVG_AU = average number of authors; AVG_CIT = average number of citations)

328 paired publications versus 106,027 biotechnology publications
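The next comparison uses the ratio of average citations of pairs to non-pairs. A stdlib Python sketch of how such a ratio could be formed per matched journal-year cell, assuming that is the unit of computation (the slides do not say so explicitly); the four cells are the rows listed in the journal-year table, and `citation_ratio` is a hypothetical helper.

```python
# (year, journal, avg citations of paired pubs, avg citations of non-paired pubs)
# taken from the journal-year rows listed above; the elided rows are not reconstructed
cells = [
    (1991, "BIOCHEMISTRY", 65.00, 57.20),
    (1991, "BIOTECHNIQUES", 64.00, 40.27),
    (1992, "BIOSCIENCE BIOTECH AND BIOCHEMISTRY", 4.00, 8.07),
    (1992, "BIOTECHNIQUES", 147.00, 26.17),
]

def citation_ratio(pair_avg, nonpair_avg):
    """Ratio of average citations, pairs / non-pairs, within one journal-year cell."""
    if nonpair_avg == 0:
        raise ValueError("non-paired average is zero; ratio undefined")
    return pair_avg / nonpair_avg

ratios = {(year, journal): citation_ratio(p, q) for year, journal, p, q in cells}
for key, r in ratios.items():
    print(key, round(r, 2))
```

A ratio above 1 means the paired publication is cited more than the journal-year average of its non-paired peers; the single 1992 BIOSCIENCE BIOTECH AND BIOCHEMISTRY cell shows the opposite case.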
Before and after publication and grant
Variable: ratio of average citations, pairs/non-pairs

Class         N     Lower CL mean   Mean    Upper CL mean
Pre-grant     288    1.42            1.71    2.00
Post-grant    288    1.48            1.74    2.00
Diff (1-2)          -0.43           -0.03    0.36

T-TESTS
Method          Variances   DF    t value   Pr > |t|
Pooled          Equal       574   -0.17     0.8666
Satterthwaite   Unequal     565   -0.17     0.8666

EQUALITY OF VARIANCES
Method     Num DF   Den DF   F value   Pr > F
Folded F   287      287      1.29      0.0299
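The degrees of freedom in this output follow from the group sizes: the pooled test uses n1 + n2 - 2, the Satterthwaite (Welch) test an approximation built from the two sample variances, and the folded F statistic is the larger sample variance divided by the smaller one. A stdlib Python sketch with hypothetical helper names; since the raw variances are not shown, it plugs in only their ratio (1.29), which is sufficient when both groups have 288 observations.

```python
def pooled_df(n1, n2):
    """Degrees of freedom of the pooled (equal-variance) two-sample t-test."""
    return n1 + n2 - 2

def welch_df(var1, n1, var2, n2):
    """Welch-Satterthwaite approximation of the degrees of freedom."""
    a, b = var1 / n1, var2 / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

def folded_f(var1, var2):
    """Folded F statistic: larger sample variance over the smaller one."""
    return max(var1, var2) / min(var1, var2)

print(pooled_df(288, 288))                   # 574
print(round(welch_df(1.29, 288, 1.0, 288)))  # 565
```

With equal group sizes the Welch degrees of freedom depend only on the variance ratio r, as (n - 1)(1 + r)² / (1 + r²), so a ratio of 1.29 reproduces the reported 565.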
Paired sample t-tests
Test                                  Variable                 N     Mean 1   Mean 2   Difference   t value   Pr > |t|
Paired vs nonpaired                   Forward citations        190   130.47    74.24    56.23        4.33      0.0001
                                      Without self citations   190   116.01    65.02    50.99        4.07      0.0001
Paired vs nonpaired                   Forward citations         59   224.97   131.63    93.34        3.12      0.0028
(at least 2 paired publications)      Without self citations    59   202.7    117.88    84.82        2.97      0.0043
Paired and grey zone vs all others    Forward citations        764    60.57    42.69    17.88        5.72      0.0001
                                      Without self citations   764    53.09    36.48    16.61        5.59      0.0001
Paired and grey zone vs all others    Forward citations        281    96.41    59.64    36.77        5.57      0.0001
(at least 2 paired or grey zone       Without self citations   281    85.85    51.76    34.09        5.43      0.0001
publications)
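A paired sample t-test works on the differences d_i between matched observations: t = mean(d) / (sd(d) / sqrt(n)) on n - 1 degrees of freedom, and the Difference column equals Mean 1 - Mean 2. A stdlib Python sketch; the matched citation counts behind the table are not in the slides, so the data below are hypothetical.

```python
import math

def paired_t(x, y):
    """Paired-sample t statistic and degrees of freedom for matched observations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((v - mean_d) ** 2 for v in d) / (n - 1)  # sample variance of d
    if var_d == 0:
        raise ValueError("differences have zero variance; t is undefined")
    se = math.sqrt(var_d / n)
    return mean_d / se, n - 1

# Hypothetical citations: paired publications vs their matched non-paired counterparts
paired = [12, 30, 7, 55, 21]
matched = [10, 18, 9, 40, 15]
t, df = paired_t(paired, matched)
print(round(t, 2), df)
```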
Multivariate analysis (negative binomial)
Parameter                                   B       Std. Error  95% Wald CI lower  95% Wald CI upper  Wald Chi-Square  df  Sig.
(Intercept)                                 2.966   .1258        2.719              3.213               555.643       1   .000
Pair (Y/N)                                   .450   .0506         .350               .549                78.945       1   .000
Document type: Article                      -.574   .0113        -.596              -.552              2589.688       1   .000
Document type: Letter                       -.774   .0590        -.890              -.659               172.469       1   .000
Document type: Note                         -.567   .0175        -.601              -.533              1051.989       1   .000
Document type: Review (reference)            0      .             .                  .                   .            .   .
Number of backward publication citations     .013   .0001         .013               .014             10416.453       1   .000
Number of authors                            .033   .0005         .032               .034              4613.407       1   .000
Time                                         .125   .0015         .122               .128              7191.199       1   .000
Time²                                       -.012   .0001        -.013              -.012             29450.994       1   .000
Journal dummies (n=104)                     Included
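In a negative binomial model with the usual log link (assumed here; the slide does not state the link function), a coefficient B acts multiplicatively on the expected citation count through exp(B). A stdlib Python sketch applying this to the Pair (Y/N) row (B = .450, 95% Wald CI .350 to .549); `incidence_rate_ratio` is a hypothetical helper.

```python
import math

def incidence_rate_ratio(b, lower=None, upper=None):
    """exp(B): multiplicative effect on the expected count under a log link.

    If confidence limits for B are given, they are exponentiated as well.
    """
    irr = math.exp(b)
    if lower is None or upper is None:
        return irr
    return irr, math.exp(lower), math.exp(upper)

irr, lo, hi = incidence_rate_ratio(0.450, 0.350, 0.549)
print(f"Pair: IRR = {irr:.2f} (95% CI {lo:.2f} to {hi:.2f})")
# Pair: IRR = 1.57 (95% CI 1.42 to 1.73)
```

Read this way, and holding the other covariates fixed, paired publications would be expected to receive roughly 57% more citations than comparable non-paired publications.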
Sector analysis
Pat sector  Pub sector  N     Mean    Median  Var         SD
COM         COM          21    71.6    34.0     5,999.6    77.5
COM         KGI          25    70.5    49.0     3,212.6    56.7
COM         KGI+COM      15   106.7    80.0    18,605.8   136.4
KGI         KGI         227   179.2    67.0    95,544.4   309.1
KGI         KGI+COM      16   282.0   131.5   231,467.6   481.1
KGI+COM     KGI           6   219.2    93.5    66,633.4   258.1
KGI+COM     KGI+COM       5    85.0    67.0     3,546.5    59.6
Total                   315   164.4    66.0    84,846.9   291.3
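The Total row of the sector table is consistent with an N-weighted average of the group means over the 315 pairs, and each SD is the square root of its variance. A stdlib Python check using the values from the sector table:

```python
import math

# (patent sector, publication sector, N, mean, variance) from the sector table
groups = [
    ("COM", "COM", 21, 71.6, 5999.6),
    ("COM", "KGI", 25, 70.5, 3212.6),
    ("COM", "KGI+COM", 15, 106.7, 18605.8),
    ("KGI", "KGI", 227, 179.2, 95544.4),
    ("KGI", "KGI+COM", 16, 282.0, 231467.6),
    ("KGI+COM", "KGI", 6, 219.2, 66633.4),
    ("KGI+COM", "KGI+COM", 5, 85.0, 3546.5),
]

total_n = sum(n for _, _, n, _, _ in groups)
# overall mean = sum of N_i * mean_i divided by the total N
overall_mean = sum(n * m for _, _, n, m, _ in groups) / total_n
sds = [math.sqrt(v) for *_, v in groups]
print(total_n, round(overall_mean, 1))  # 315 164.4
print([round(sd, 1) for sd in sds])
```

The KGI-KGI cell alone holds 227 of the 315 pairs, so it dominates the overall mean.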
Sector analysis
Parameter                                   B       Std. Error  z        P>z     [95% Conf.  Interval]
(Intercept)                                 4.326   0.292       14.800   0.000    3.753       4.899
Document type: Article                      0.114   0.524        0.220   0.827   -0.913       1.141
Document type: Note                         0.309   1.130        0.270   0.784   -1.905       2.523
Document type: Review                       .       .            .       .        .           .
Number of backward publication citations    0.046   0.008        5.990   0.000    0.031       0.061
Number of authors                           0.141   0.019        7.350   0.000    0.103       0.179
Pat sector: KGI                             0.000   .            .       .        .           .
Pat sector: COM                            -0.627   0.206       -3.050   0.002   -1.030      -0.223
Pat sector: KGI+COM                        -0.917   0.355       -2.590   0.010   -1.612      -0.222
Aff sector: KGI                             0.000   .            .       .        .           .
Aff sector: COM                             0.051   0.314        0.160   0.870   -0.563       0.666
Aff sector: KGI+COM                         0.176   0.214        0.820   0.413   -0.245       0.596
Time                                       -0.301   0.122       -2.470   0.013   -0.539      -0.063
Time²                                       0.015   0.010        1.420   0.156   -0.006       0.035
Sector analysis
THE REGENTS OF THE UNIVERSITY OF CALIFORNIA (US): 26
THE JOHNS HOPKINS UNIVERSITY (US): 26
THE SALK INSTITUTE FOR BIOLOGICAL STUDIES (US): 15
BOARD OF REGENTS, THE UNIVERSITY OF TEXAS SYSTEM (US): 12
THE SCRIPPS RESEARCH INSTITUTE (US): 10
THE BOARD OF TRUSTEES OF THE LELAND STANFORD JUNIOR UNIVERSITY (US): 9
JOHNS HOPKINS UNIVERSITY (US): 9
CITY OF HOPE (US): 8
PRESIDENT AND FELLOWS OF HARVARD COLLEGE (US): 8
WASHINGTON UNIVERSITY (US): 8
INSTITUT PASTEUR (FR): 8
THE ROCKEFELLER UNIVERSITY (US): 7
THE UNITED STATES OF AMERICA AS REPRESENTED BY THE SECRETARY OF AGRICULTURE (US): 7
THE UNITED STATES OF AMERICA AS REPRESENTED BY THE DEPARTMENT OF HEALTH (US): 7
UNIVERSITY OF UTAH RESEARCH FOUNDATION (US): 7
OKLAHOMA MEDICAL RESEARCH FOUNDATION (US): 6
MASSACHUSETTS INSTITUTE OF TECHNOLOGY (US): 6
THE UNITED STATES OF AMERICA AS REPRESENTED BY THE SECRETARY OF (US): 6
THE JOHNS HOPKINS UNIVERSITY SCHOOL OF MEDICINE (US): 6
ST. JUDE CHILDREN'S RESEARCH HOSPITAL (US): 6
Conclusions science-technology interactions
• We do not observe lower citation rates for
publications that are part of a patent
application (not before versus after grant, not when
matched by journal, not when matched by author)
• Significant impact of KGIs at the patent side
• Our matching procedure misses some patent-publication pairs
• Dig deeper into the sector dynamics
• Citation patterns are only one aspect of the
diffusion of knowledge