OA - Electronics and Computer Science

Open Access:
Validating Metrics and
Motivating Mandates
Stevan Harnad, UQAM & U. Southampton
Alma Swan, Key Perspectives & U. Southampton
Arthur Sale, U. Tasmania
OAR2008
OA Timeline and Milestones
• Origin of the Universe: 14 billion years
• Origin of Life on Earth: 4 billion years
• Origin of our Species: 200,000 years
• Origin of Language: 100-200,000 years
• Origin of Writing: 10,000 years
• Origin of Printing: 500 years
• Origin of Learned journals: 340 years
• Origin of Internet: 40 years
• Origin of Web: 18 years
• Origin of OAI: 9 years
What is OA?
Free online access
to refereed research articles
Gold or Green?
• OA journals: Gold OA
• Self-archiving non-OA journal articles: Green OA
Access to what?
• Published peer-reviewed journal articles
• Unrefereed preprints?
• Monographs?
• Data?
Why OA?
• Maximize research uptake, usage and
impact
• Direct benefit: Research progress
• Side-Benefits: Developing world access,
student access, public access
How OA?
• Self-archive in Institutional Repository
• Institutions and Funders Mandate Self-Archiving
Limited Access: Limited Research Impact
Impact cycle begins: Research is done
(Diagram label: 12-18 Months)
• Researchers write pre-refereeing “Pre-Print”
• Pre-Print submitted to Journal
• Pre-Print reviewed by Peer Experts (“Peer Review”)
• Pre-Print revised by article’s Authors
• Refereed “Post-Print” Accepted, Certified, Published by Journal
• Researchers can access the Post-Print if their university has a subscription to the Journal
• New impact cycles: New research builds on existing research
This limited subscription-based access can be supplemented by
self-archiving the Post-Print in the author’s own institutional
repository, as follows:
Maximized Research Access and Impact Through Self-Archiving
Impact cycle begins: Research is done
(Diagram label: 12-18 Months)
• Researchers write pre-refereeing “Pre-Print”
• Pre-Print submitted to Journal
• Pre-Print reviewed by Peer Experts (“Peer Review”)
• Pre-Print revised by article’s Authors
• Refereed “Post-Print” Accepted, Certified, Published by Journal
• Post-Print is self-archived in University’s Eprint Archive
• Researchers can access the Post-Print if their university has a subscription to the Journal
• New impact cycles: New research builds on existing research
• More impact cycles:
What are “metrics”?
• Metrics are objective measures of research quality
and quantity
• The only alternative to metrics is subjective human
judgment (including peer review)
“Show me a philosopher who wishes to discard metaphysics and
I’ll show you a metaphysician with a rival system”
(Show me someone who wishes to discard metrics, and I’ll show
you a metrician with rival metrics)
Open Access: How?
By mandating Green OA Self-Archiving
OA Metrics motivate OA Mandates
And OA Mandates maximize OA Metrics
Brody et al (2007) Incentivizing the Open Access Research Web: Publication-, Data-Archiving
and Scientometrics. CTWatch Quarterly 3(3). http://eprints.ecs.soton.ac.uk/14418/
“Online or Invisible?” (Lawrence 2001)
“average of 336% more citations to online articles compared to offline
articles published in the same venue”
Lawrence, S. (2001) Free online availability substantially increases a
paper's impact Nature 411 (6837): 521.
http://www.neci.nec.com/~lawrence/papers/online-nature01/
Lawrence (2001) findings for computer science conference
papers. More OA every year for all citation levels; higher with
higher citation levels
Contributors to the OA Advantage
EA + QA + UA + (CA) + (QB)
• EA: Early Advantage
• QA: Quality Advantage (Seglen 80/20 effect)
• UA: Usage Advantage
• (CA: Competitive Advantage)
• (QB: Quality Bias)
Early Access Advantage: OA is accelerating the
research access/usage/citation cycle. OA articles
are being cited sooner and sooner
(Data from Physics Arxiv)
Quality Advantage: Higher quality articles have greater OA
Advantage
Usage Advantage + Early Advantage: OA Articles are downloaded
more and early downloads lead to later citations
Data from arXiv
Downloads (“hits”) in
the first 6 months
correlate with citations 2
years later
Most articles are not
cited at all
Brody, T., Harnad, S. and Carr, L. (2006) Earlier Web Usage Statistics as Predictors of Later Citation Impact. Journal of the American Society for Information
Science and Technology (JASIST) 57(8): 1060-1072. http://eprints.ecs.soton.ac.uk/10713/
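As a rough illustration of what "early downloads correlate with later citations" means in practice, the sketch below computes a Pearson correlation between two per-article counts. The numbers are invented for illustration and are not data from the arXiv study; the function and variable names are likewise assumptions.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-article counts (NOT data from the study).
downloads_first_6_months = [12, 40, 7, 95, 23, 3, 60, 150]
citations_after_2_years = [0, 3, 0, 9, 1, 0, 4, 14]

r = pearson_r(downloads_first_6_months, citations_after_2_years)
print(f"r = {r:.2f}")
```

A value of r near 1 would mean early downloads are a strong predictor of later citations; the size of the correlation in real data is whatever the cited study reports, not what this toy sample produces.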
Time-Course and cycle of Citations (red)
and Usage (hits, green)
Witten, Edward (1998) String Theory and Noncommutative Geometry Adv. Theor. Math. Phys. 2 : 253
1. Preprint or Postprint appears.
2. It is downloaded (and sometimes read).
3. Next, citations may follow (for more important papers)…
4. This generates more downloads…
5. More citations...
(Competitive Advantage): The earlier you mandate Green OA, the sooner (and bigger) your university's
competitive advantage: U. Southampton School of Electronics and Computer Science was the first in the
world to adopt an OA self-archiving mandate. (Competitive Advantage vanishes at 100% OA.)
(Quality or Self-Selection Bias): Better authors are more
likely to self-archive, and better articles are more likely to be
self-archived. (Michael Kurtz considers this in itself a sufficient
rationale for self-archiving!) (Quality Bias vanishes at 100%
OA.)
The data below are systematically ambiguous, because they
could arise from either Quality Advantage or Quality Bias
Some have argued that the OA Advantage might be all or mostly
just Quality (Self-Selection) Bias.
So we tested this, by comparing self-selected with mandated OA:
Percentage of OA articles among the ISI-indexed
articles, per institution and per year:
Ø: non-Open Access, O: Open Access, M: Mandated, N: non-Mandated
Results: Five of the seven effects are statistically significant:
O > Ø, OM > ON, (ON=ØN), OM > ØM, OM > Ø (ON=Ø), OM > ØN
i.e., OA is cited more than non-OA, and mandated OA is cited more,
not less, than non-mandated OA.
Green OA Mandates 1: Alma Swan’s International,
Multidisciplinary Survey Predictions About Researcher
Compliance:
OA Mandates: Across all countries and disciplines, 95% of
researchers report that they would comply with a self-archiving
mandate from their funders and/or employers, and over 80% report
that they would do so willingly. But only 15% self-archive
spontaneously, if it is not mandated.
Green OA Mandates 2: Arthur Sale’s Australian Data on Actual
Researcher Compliance:
University of Tasmania
+Repository -Incentive -Mandate
Green line: total annual output
Red line: proportion self-archived
[Figure: "Actual documents" vs. "DEST publications", monthly from Jun-04 to Jun-05; vertical axis 0 to 700. Data courtesy of Arthur Sale]
University of Queensland
+Repository +Incentive -Mandate
Green line: total annual output
Red line: proportion self-archived
[Figure: "Total documents" vs. "DEST documents", monthly from 03/02/2004 to 03/10/2005; vertical axis (Documents) 0 to 4000. Data courtesy of Arthur Sale]
Queensland University of Technology
+Repository +Incentive +Mandate
Green line: total annual output
Red line: proportion self-archived
[Figure: "Documents" vs. "DEST-reportable", monthly from 24/05/2004 to 24/09/2005; vertical axis (Documents) 0 to 1800. Data courtesy of Arthur Sale]
Unanimous Recommendation by
EUA, Jan 25 2008
791 universities
in 46 countries
All European Universities should create institutional repositories
and should mandate that all research publications must be
deposited in them immediately upon publication (and made Open
Access as soon as possible thereafter) as already mandated by
RCUK, ERC, and NIH, and as recommended by EURAB.
In addition, the EUA recommends that these (funder) self-archiving
mandates should also be extended to all research results arising
from EU research programme/project funding.
NOW 54 & 10!
The majority of journals (63%) already
endorse immediate Green Open
Access Self-Archiving of the postprint
ROMEO/EPRINTS (Directory of
Journal Policies on author OA Self-Archiving):
http://romeo.eprints.org/
NOW 63% & 95%
For the articles in the 37% of journals that
have an embargo policy, the free EPrints
institutional Repository-creating software
has an “Email Eprint Request” Button:
The user who reaches the metadata for a
Closed Access article enters an email address
in a box and clicks.
This sends an automatic email to the author,
with a URL on which the author clicks to
automatically email the eprint to the
requester.
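The request flow just described can be sketched as a toy in-memory model. This is only an illustration of the three steps (request, author notification, author approval); the class, method names, and URL are hypothetical and do not reflect the actual EPrints software or its API.

```python
import secrets
from dataclasses import dataclass, field

@dataclass
class RequestButton:
    """Toy model of an 'Email Eprint Request' flow (hypothetical, not the EPrints API)."""
    author_email: dict                              # eprint_id -> author's email address
    pending: dict = field(default_factory=dict)     # token -> (eprint_id, requester_email)
    outbox: list = field(default_factory=list)      # (recipient, message) pairs "sent"

    def request_copy(self, eprint_id, requester_email):
        # Step 1: the user enters an email address on the metadata page and clicks.
        token = secrets.token_urlsafe(16)
        self.pending[token] = (eprint_id, requester_email)
        # Step 2: an automatic email with an approval URL goes to the author.
        url = f"https://repository.example/approve/{token}"  # placeholder URL
        self.outbox.append((self.author_email[eprint_id], f"Approve request: {url}"))
        return token

    def author_approves(self, token):
        # Step 3: the author clicks the URL; the eprint is emailed to the requester.
        eprint_id, requester_email = self.pending.pop(token)
        self.outbox.append((requester_email, f"Attached: eprint {eprint_id}"))

button = RequestButton(author_email={"eprint-1": "author@university.example"})
token = button.request_copy("eprint-1", "reader@elsewhere.example")
button.author_approves(token)
for recipient, message in button.outbox:
    print(recipient, "<-", message)
```

The point of the design is that the author's only effort is one click, while the deposited article itself stays Closed Access in the repository.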
Once the ID/OA mandates are universally adopted, the
embargoes will soon become obsolete, under growing OA
pressure worldwide.
Carr & Harnad (2005) Keystroke Economy: A Study of the
Time and Effort Involved in Self-Archiving.
http://eprints.ecs.soton.ac.uk/10688/
The free EPrints University Repository Software
generates rich (and potentially even richer)
usage metrics. It can be used for showcasing,
navigating, comparing and assessing.
Here is a sample of University Repository usage
metrics for Southampton author
Tim Berners-Lee:
http://stats.eprints.ecs.soton.ac.uk/cgi-bin/irstats.cgi?
Interoperable Repository Statistics
Some EPrints download metrics for
top deposits by Southampton
author Tim Berners-Lee.
These Local Repository Usage metrics at the
individual university level can then be
complemented by CITEBASE, which provides
global Citation, Download, Co-citation,
Hub/Authority, growth, and other metrics:
http://stats.eprints.ecs.soton.ac.uk/cgi-bin/irstats.cgi?
Sample citation and download growth with time. (Downloads only start in
2005 because that is when this paper was deposited.) Early growth rate and
late decay metrics for downloads and citations can also be derived.
Sample of candidate
OA-era metrics:
• Citations (C)
• CiteRank
• Co-citations
• Downloads (D)
• C/D Correlations
• Hub/Authority index
• Chronometrics: Latency/Longevity
• Endogamy/Exogamy
• Book citation index
• Research funding
• Students
• Prizes
• h-index
• Co-authorships
• Number of articles
• Number of publishing years
• Semiometrics (latent semantic indexing, text overlap, etc.)
These metrics can be validated in
the UK Research Assessment
Exercise (RAE), discipline by
discipline, through multiple
regression analysis:
The metrics can be weighted by
their ability to predict the
rankings given by the evaluation
by human peer panels:
UK’s RAE 2008 will be a parallel panel/metric
exercise, making it possible to develop a rich
spectrum of candidate metrics and to validate
each metric against the panel rankings,
discipline by discipline, through multiple
regression analysis, determining and
calibrating the (“beta”) weights on each metric.
Harnad, S. (2007) Open Access Scientometrics and the UK Research Assessment Exercise.
Proceedings of 11th Annual Meeting of the International Society for Scientometrics and
Informetrics 11(1): 27-33, Madrid, Spain. Torres-Salinas, D. and Moed, H. F., Eds.
http://eprints.ecs.soton.ac.uk/13804/
RAE 2001
Rankings for
Psychology
Research Assessment, Research
Funding, and Citation Impact
“Correlation between RAE ratings and
mean departmental citations +0.91
(1996) +0.86 (2001) (Psychology)”
“RAE and citation counting measure
broadly the same thing”
“Citation counting is both more cost-effective and more transparent”
(Eysenck & Smith 2002)
http://psyserver.pc.rhbnc.ac.uk/citations.pdf
Here is how ordinary regression (2-variable correlation) works:
If the correlation between Height (H) and Weight (W) is 0.6,
then if you ‘normalize’ H and W to the same scale (Hz and Wz), you can
predict Weight from Height:
Wz = 0.6(Hz) + unpredictable residual
In multiple regression, you can improve the prediction (and
reduce the unpredictable residual) by adding more
predictor variables (e.g., gender, age, parents’ weight, activity
level, etc.), in which case the equation becomes a series of
predictor variables P1, P2… etc. with “beta weights” b1, b2…,
all predicting a criterion variable (C):
C = b1P1 + b2P2 + b3P3 + … + bnPn + unpredictable residual
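The two-variable case can be checked numerically with a small sketch: once both variables are rescaled to z-scores, the least-squares slope equals the correlation coefficient, and whatever the slope does not explain is the residual. The heights and weights below are made up for illustration, and the helper names are assumptions.

```python
import math

def zscores(values):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def least_squares_slope(xs, ys):
    """Slope of the least-squares line predicting ys from xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical heights (cm) and weights (kg), for illustration only.
heights = [150, 160, 165, 170, 175, 180, 185, 190]
weights = [52, 70, 58, 75, 66, 90, 72, 85]

hz, wz = zscores(heights), zscores(weights)
slope = least_squares_slope(hz, wz)                   # regression weight on the z-scale
r = sum(h * w for h, w in zip(hz, wz)) / len(hz)      # Pearson correlation
residuals = [w - slope * h for h, w in zip(hz, wz)]   # the unpredictable part
print(f"slope = {slope:.3f}, r = {r:.3f}")
```

On the z-scale the variance of the residuals works out to 1 - r², which is why adding further predictors that raise the overall fit shrinks the unpredictable part.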
In scientometrics, the metrics (M) are the predictors:
C = b1M1 + b2M2 + b3M3 + … + bnMn + unpredictable residual
But what is the criterion (C)?
The natural criterion against which to validate metrics,
thereby initializing the weights, is peer review!
This has to be done separately, discipline by discipline,
so as to be sure to compare only like with like.
The UK’s RAE 2008 is a unique historic opportunity to validate
a rich battery of candidate metrics against the peer-panel
rankings for each department of each UK university.
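A minimal sketch of this validation step, under strong simplifying assumptions: only two metrics (citations and downloads), a handful of invented departments per discipline, and the standard closed-form solution for standardized weights with two predictors. A real analysis would use many more metrics, real RAE panel outcomes, and a general regression routine; all names and numbers here are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def beta_weights_two_metrics(m1, m2, criterion):
    """Standardized regression weights (b1, b2) for two predictors of one criterion."""
    r1c = pearson_r(m1, criterion)
    r2c = pearson_r(m2, criterion)
    r12 = pearson_r(m1, m2)
    denom = 1 - r12 ** 2          # zero only if the two metrics are perfectly collinear
    b1 = (r1c - r2c * r12) / denom
    b2 = (r2c - r1c * r12) / denom
    return b1, b2

# Hypothetical department-level data for two disciplines (illustration only).
disciplines = {
    "Discipline A": {
        "citations": [120, 300, 80, 450, 210, 60],
        "downloads": [900, 1500, 700, 2600, 1100, 650],
        "panel_score": [3.0, 4.0, 2.5, 5.0, 3.5, 2.0],
    },
    "Discipline B": {
        "citations": [15, 40, 22, 8, 31, 27],
        "downloads": [400, 380, 900, 300, 1200, 650],
        "panel_score": [2.0, 3.0, 3.5, 1.5, 4.5, 3.0],
    },
}

# Fit the weights separately for each discipline, so only like is compared with like.
weights = {}
for name, d in disciplines.items():
    weights[name] = beta_weights_two_metrics(
        d["citations"], d["downloads"], d["panel_score"]
    )
    b1, b2 = weights[name]
    print(f"{name}: b_citations = {b1:.2f}, b_downloads = {b2:.2f}")
```

Fitting each discipline on its own lets the same metric carry a different weight in different fields, which is the reason for validating discipline by discipline.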
Sample multiple regression analysis. The criterion here is not
RAE peer rankings, but article citations, and the 7 predictors
are (1) article age, (2) journal impact factor, (3) number of
references, (4) number of authors, (5) science/nonscience,
(6) OA/nonOA, (7) mandated/nonmandated.
Note how the regression weights (in the graph) show the size
of the contribution of each predictor to the criterion (citations).
Note that OA emerges as a significant contributor to citation
counts even when the other 6 variables are factored out.
Multiple regression analyses for the 4 journal impact factor (JIF)
quartiles, for Age, JIF, References, Science, OA, and Mandated
(OA Advantage only significant in top 25%). The criterion that is being
predicted is number of citations.
[Figure: regression weights (vertical axis, -0.100 to 0.500) for Log_Age, Log_IF, Log_Ref, Log_Auth, Sci, OA and M, in groups FI 1_1 to FI 4_5]
Multiple regression analyses within each citation
bracket (0, 1, 2…. 20+): OA effect
grows bigger in higher citation brackets
[Figure: regression weights (vertical axis, -0.050 to 0.400) for Log_Age, Log_IF, Log_Ref, Log_Auth, Sci, OA and M, in groups M_a_1 to M_a_10 and M_a_20]
Conclusions:
Use RAE 2008 peer rankings to validate metrics.
Use OA metrics as the incentive to motivate OA mandates
Use OA mandates to generate OA content, maximising the
research impact at the same time as measuring it
Here are some useful websites
Author’s URLs (UQAM & Southampton):
http://www.crsc.uqam.ca/
http://users.ecs.soton.ac.uk/harnad/
BIBLIOGRAPHY ON OA IMPACT ADVANTAGE:
http://opcit.eprints.org/oacitation-biblio.html
BOAI Self-Archiving FAQ: http://www.eprints.org/self-faq/
CITEBASE (scientometric engine): http://citebase.eprints.org/
EPRINTS: http://www.eprints.org/
OA ARCHIVANGELISM: http://openaccess.eprints.org/
ROAR (Registry of OA Repositories): http://roar.eprints.org/
ROARMAP (Registry of OA Repository Mandates):
http://www.eprints.org/openaccess/policysignup/
ROMEO/EPRINTS (Directory of Journal Policies on author OA
Self-Archiving): http://romeo.eprints.org/
1995: Universal FTP Archives for Esoteric Science and Scholarship: A Subversive Proposal In:
Scholarly Journals at the Crossroads. ARL. http://www.arl.org/scomm/subversive/toc.html
2001: Research access, impact and assessment THES 1487 http://cogprints.org/1683/
The Self-Archiving Initiative Nature 410 http://www.nature.com/nature/debates/e-access/Articles/harnad.html
Measuring and Maximising UK Research Impact THES http://eprints.ecs.soton.ac.uk/7728/
Mandated online RAE CVs Linked to University Eprint Archives. Ariadne 35
http://www.ecs.soton.ac.uk/~harnad/Temp/Ariadne-RAE.htm
2004: Comparing the Impact of Open Access (OA) vs. Non-OA Articles in the Same Journals &
Brody D-Lib http://www.dlib.org/dlib/june04/harnad/06harnad.html
The Access/Impact Problem and the Green and Gold Roads to Open Access. et al Nature Web Focus.
http://www.nature.com/nature/focus/accessdebate/21.html
2005: Journal publishing and author self-archiving: Peaceful Co-Existence Berners-Lee et al
http://eprints.ecs.soton.ac.uk/11160/
Keystroke Economy: A Study of the Time and Effort Involved in Self-Archiving. Carr & Harnad
http://eprints.ecs.soton.ac.uk/10688/
Ten-Year Cross-Disciplinary Comparison of the Growth of Open Access and Research Citation Impact.
Hajjem et al IEEE Data Engineering Bulletin 28 http://eprints.ecs.soton.ac.uk/11688/
Making the case for web-based self-archiving Research Money 19 http://eprints.ecs.soton.ac.uk/11534/
2006: Self-archiving should be mandatory 2006 Research Information
http://eprints.ecs.soton.ac.uk/12738/
The Open Research Web: A Preview of the Optimal and the Inevitable Shadbolt et al in Open Access: Key
Strategic, Technical and Economic Aspects http://eprints.ecs.soton.ac.uk/12453/
2007: Open Access Scientometrics and the UK Research Assessment Exercise Proc 11th Ann Mtg Int
Soc Scientometrics and Informetrics 11:27-33 http://eprints.ecs.soton.ac.uk/13804/
Time to Convert to Metrics Brody et al Research Fortnight 17 http://eprints.ecs.soton.ac.uk/14329/
Incentivizing the Open Access Research Web: Publication-, Data-Archiving and Scientometrics. Brody et al
CTWatch Quarterly 3(3). http://eprints.ecs.soton.ac.uk/14418/