Open Access: Validating Metrics and Motivating Mandates

Stevan Harnad, UQAM & U. Southampton
Alma Swan, Key Perspectives & U. Southampton
Arthur Sale, U. Tasmania

OAR2008

OA Timeline and Milestones
• Origin of the Universe: 14 billion years
• Origin of Life on Earth: 4 billion years
• Origin of our Species: 200,000 years
• Origin of Language: 100-200,000 years
• Origin of Writing: 10,000 years
• Origin of Printing: 500 years
• Origin of Learned journals: 340 years
• Origin of Internet: 40 years
• Origin of Web: 18 years
• Origin of OAI: 9 years

What is OA?
Free online access to refereed research articles

Gold or Green?
• OA journals: Gold OA
• Self-archiving non-OA journal articles: Green OA

Access to what?
• Published peer-reviewed journal articles
• Unrefereed preprints?
• Monographs?
• Data?

Why OA?
• Maximize research uptake, usage and impact
• Direct benefit: Research progress
• Side-Benefits: Developing world access, student access, public access

How OA?
• Self-archive in Institutional Repository
• Institutions and Funders Mandate Self-Archiving

Limited Access: Limited Research Impact
• Impact cycle begins: 12-18 Months
• Research is done
• Researchers write pre-refereeing “Pre-Print”
• Submitted to Journal
• Pre-Print reviewed by Peer Experts (“Peer-Review”)
• Pre-Print revised by article’s Authors
• Refereed “Post-Print” Accepted, Certified, Published by Journal
• Researchers can access the Post-Print if their university has a subscription to the Journal
• New impact cycles: New research builds on existing research

This limited subscription-based access can be supplemented by self-archiving the Postprint in the author’s own institutional repository as follows:

Maximized Research Access and Impact Through Self-Archiving
• Impact cycle begins: 12-18 Months
• Research is done
• Researchers write pre-refereeing “Pre-Print”
• Submitted to Journal
• Pre-Print reviewed by Peer Experts (“Peer-Review”)
• Pre-Print revised by article’s Authors
• Refereed “Post-Print” Accepted, Certified, Published by Journal
• Post-Print is self-archived in University’s Eprint Archive: more impact cycles
• Researchers can access the Post-Print if their university has a subscription to the Journal
• New impact cycles: New research builds on existing research

What are “metrics”?
• Metrics are objective measures of research quality and quantity
• The only alternative to metrics is subjective human judgment (including peer review)

“Show me a philosopher who wishes to discard metaphysics and I’ll show you a metaphysician with a rival system”
(Show me someone who wishes to discard metrics, and I’ll show you a metrician with rival metrics)

Open Access: How?
By mandating Green OA Self-Archiving
OA Metrics motivate OA Mandates
And OA Mandates maximize OA Metrics

Brody et al (2007) Incentivizing the Open Access Research Web: Publication-, Data-Archiving and Scientometrics. CTWatch Quarterly 3(3). http://eprints.ecs.soton.ac.uk/14418/

“Online or Invisible?” (Lawrence 2001)
“average of 336% more citations to online articles compared to offline articles published in the same venue”
Lawrence, S. (2001) Free online availability substantially increases a paper's impact. Nature 411 (6837): 521. http://www.neci.nec.com/~lawrence/papers/online-nature01/

Lawrence (2001) findings for computer science conference papers. More OA every year for all citation levels; higher with higher citation levels

Contributors to the OA Advantage
EA + QA + UA + (CA) + (QB)
• EA: Early Advantage
• QA: Quality Advantage (Seglen 80/20 effect)
• UA: Usage Advantage
• (CA: Competitive Advantage)
• (QB: Quality Bias)

Early Access Advantage: OA is accelerating the research access/usage/citation cycle. OA articles are being cited sooner and sooner (Data from Physics Arxiv)

Quality Advantage: Higher quality articles have greater OA Advantage

Usage Advantage + Early Advantage: OA Articles are downloaded more and early downloads lead to later citations
• Data from arXiv
• Downloads (“hits”) in the first 6 months correlate with citations 2 years later
• Most articles are not cited at all

Brody, T., Harnad, S. and Carr, L. (2006) Earlier Web Usage Statistics as Predictors of Later Citation Impact.
Journal of the American Society for Information Science and Technology (JASIST) 57(8): 1060-1072. http://eprints.ecs.soton.ac.uk/10713/

Time-Course and cycle of Citations (red) and Usage (hits, green)
Witten, Edward (1998) String Theory and Noncommutative Geometry Adv. Theor. Math. Phys. 2 : 253
1. Preprint or Postprint appears.
2. It is downloaded (and sometimes read).
3. Next, citations may follow (for more important papers)…
4. This generates more downloads…
5. More citations...

(Competitive Advantage): The earlier you mandate Green OA, the sooner (and bigger) your university's competitive advantage: U. Southampton School of Electronics and Computer Science was the first in the world to adopt an OA self-archiving mandate. (Competitive Advantage vanishes at 100% OA.)

(Quality or Self-Selection Bias): Better authors are more likely to self-archive, and better articles are more likely to be self-archived. (Michael Kurtz considers this in itself a sufficient rationale for self-archiving!) (Quality Bias vanishes at 100% OA.)
The data below are systematically ambiguous, because they could arise from either Quality Advantage or Quality Bias.

Some have argued that the OA Advantage might be all or mostly just Quality (Self-Selection) Bias. So we tested this, by comparing self-selected with mandated OA:

Percentage of OA articles among the ISI-indexed articles, per institution and per year:
Ø: non-Open Access, O: Open Access, M: Mandated, N: non-Mandated

Results: Five of the seven effects are statistically significant: O > Ø, OM > ON, (ON=ØN), OM > ØM, OM > Ø, (ON=Ø), OM > ØN
i.e., OA is more cited than non-OA, and mandated OA is cited more, not less, than non-mandated OA.
Green OA Mandates 1: Alma Swan’s International, Multidisciplinary Survey Predictions About Researcher Compliance:
OA Mandates: Across all countries and disciplines, 95% of researchers report that they would comply with a self-archiving mandate from their funders and/or employers, and over 80% report that they would do so willingly. But only 15% self-archive spontaneously, if it is not mandated.

Green OA Mandates 2: Arthur Sale’s Australian Data on Actual Researcher Compliance:

University of Tasmania: +Repository -Incentive -Mandate
[Figure: documents per month, Jun 2004 to Jun 2005; series: Actual documents, DEST publications. Green line: total annual output. Red line: proportion self-archived. Data courtesy of Arthur Sale]

University of Queensland: +Repository +Incentive -Mandate
[Figure: documents, 03/02/2004 to 03/10/2005; series: Total documents, DEST documents. Green line: total annual output. Red line: proportion self-archived. Data courtesy of Arthur Sale]

Queensland University of Technology: +Repository +Incentive +Mandate
[Figure: documents, 24/05/2004 to 24/09/2005; series: DEST-reportable. Green line: total annual output. Red line: proportion self-archived. Data courtesy of Arthur Sale]

Unanimous Recommendation by EUA, Jan 25 2008
791 universities in 46 countries
All European Universities should create institutional repositories and should mandate that all research publications must be deposited in them immediately upon
publication (and made Open Access as soon as possible thereafter) as already mandated by RCUK, ERC, and NIH, and as recommended by EURAB. In addition, the EUA recommends that these (funder) self-archiving mandates should also be extended to all research results arising from EU research programme/project funding.

NOW 54 & 10!

The majority of journals (63%) already endorse immediate Green Open Access Self-Archiving of the postprint
ROMEO/EPRINTS (Directory of Journal Policies on author OA Self-Archiving): http://romeo.eprints.org/
NOW 63% & 95%

For the articles in the 37% of journals that have an embargo policy, the free EPrints institutional Repository-creating software has an “Email Eprint Request” Button: The user who reaches the metadata for a Closed Access article enters an email address in a box and clicks. This sends an automatic email to the author, with a URL on which the author clicks to automatically email the eprint to the requester.
Once the ID/OA mandates are universally adopted, the embargoes will soon become obsolete, under growing OA pressure worldwide.

Carr & Harnad (2005) Keystroke Economy: A Study of the Time and Effort Involved in Self-Archiving. http://eprints.ecs.soton.ac.uk/10688/

The free EPrints University Repository Software generates rich (and potentially even richer) usage metrics. It can be used for showcasing, navigating, comparing and assessing. Here is a sample of University Repository usage metrics for Southampton author Tim Berners-Lee: http://stats.eprints.ecs.soton.ac.uk/cgi-bin/irstats.cgi?

Interoperable Repository Statistics
Some EPrints download metrics for top deposits by Southampton author Tim Berners-Lee.
These Local Repository Usage metrics at the individual university level can then be complemented by CITEBASE, which provides global Citation, Download, Co-citation, Hub/Authority, growth, and other metrics: http://stats.eprints.ecs.soton.ac.uk/cgi-bin/irstats.cgi?

Sample citation and download growth with time. (Downloads only start in 2005 because that is when this paper was deposited.) Early growth rate and late decay metrics for downloads and citations can also be derived.

Sample of candidate OA-era metrics:
• Citations (C)
• CiteRank
• Co-citations
• Downloads (D)
• C/D Correlations
• Hub/Authority index
• Chronometrics: Latency/Longevity
• Endogamy/Exogamy
• Book citation index
• Research funding
• Students
• Prizes
• h-index
• Co-authorships
• Number of articles
• Number of publishing years
• Semiometrics (latent semantic indexing, text overlap, etc.)

These metrics can be validated in the UK Research Assessment Exercise (RAE), discipline by discipline, through multiple regression analysis: The metrics can be weighted by their ability to predict the rankings given by the evaluation by human peer panels.

UK’s RAE 2008 will be a parallel panel/metric exercise, making it possible to develop a rich spectrum of candidate metrics and to validate each metric against the panel rankings, discipline by discipline, through multiple regression analysis, determining and calibrating the (“beta”) weights on each metric.

Harnad, S. (2007) Open Access Scientometrics and the UK Research Assessment Exercise. Proceedings of 11th Annual Meeting of the International Society for Scientometrics and Informetrics 11(1) : 27-33, Madrid, Spain. Torres-Salinas, D. and Moed, H. F., Eds.
http://eprints.ecs.soton.ac.uk/13804/

RAE 2001 Rankings for Psychology

Research Assessment, Research Funding, and Citation Impact
“Correlation between RAE ratings and mean departmental citations +0.91 (1996) +0.86 (2001) (Psychology)”
“RAE and citation counting measure broadly the same thing”
“Citation counting is both more cost-effective and more transparent”
(Eysenck & Smith 2002) http://psyserver.pc.rhbnc.ac.uk/citations.pdf

Here is how ordinary regression (2-variable correlation) works:
If the correlation between Height (H) and Weight (W) is 0.6, then if you ‘normalize’ them to the same scale (Hz and Wz), you can predict Weight from Height:
Wz = 0.6(Hz) + unpredictable residual
In multiple regression, you can improve the prediction (and reduce the unpredictable residual) by adding more predictor variables (e.g., gender, age, parents’ weight, activity level, etc.), in which case the equation becomes a series of predictor variables P1, P2, … with “beta weights” b1, b2, …, all predicting a criterion variable (C):
C = b1P1 + b2P2 + b3P3 + … + bnPn + unpredictable residual

In scientometrics, the metrics (M) are the predictors:
C = b1M1 + b2M2 + b3M3 + … + bnMn + unpredictable residual
But what is the criterion? The natural criterion against which to validate metrics, thereby initializing the weights, is peer review! This has to be done separately, discipline by discipline, so as to be sure to compare only like with like. The UK’s RAE 2008 is a unique historic opportunity to validate a rich battery of candidate metrics against the peer-panel rankings for each department of each UK university.

Sample multiple regression analysis. The criterion here is not RAE peer rankings, but article citations, and the 7 predictors are (1) article age, (2) journal impact factor, (3) number of references, (4) number of authors, (5) science/nonscience, (6) OA/nonOA, (7) mandated/nonmandated.
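The regression equations above can be sketched in a few lines of Python. This is only an illustrative sketch on synthetic data, not the analysis reported in these slides: the function names (zscore, beta_weights, r_squared), the three made-up metrics in M, and the made-up criterion C are all assumptions introduced for the example. It normalizes predictors and criterion to the same scale, fits the beta weights by least squares, and shows that with a single predictor the beta weight is simply the correlation.

```python
import numpy as np


def zscore(values):
    """Normalize each column to mean 0 and standard deviation 1."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean(axis=0)) / values.std(axis=0)


def beta_weights(metrics, criterion):
    """Least-squares beta weights b1..bn for z-scored metrics and criterion."""
    Mz = zscore(metrics)
    Cz = zscore(criterion)
    # No intercept is needed because every z-scored column has mean 0.
    betas, *_ = np.linalg.lstsq(Mz, Cz, rcond=None)
    return betas


def r_squared(metrics, criterion):
    """Share of criterion variance that is not left in the residual."""
    Mz = zscore(metrics)
    Cz = zscore(criterion)
    residual = Cz - Mz @ beta_weights(metrics, criterion)
    return 1.0 - residual.var() / Cz.var()


# Synthetic illustration only: 3 hypothetical metrics for 500 hypothetical units.
rng = np.random.default_rng(0)
n_units = 500
M = rng.normal(size=(n_units, 3))
# Hypothetical criterion: depends on metrics 1 and 2, not on metric 3.
C = 0.6 * M[:, 0] + 0.3 * M[:, 1] + rng.normal(scale=0.7, size=n_units)

single = beta_weights(M[:, :1], C)
print("one predictor, beta:", single[0])
print("one predictor, correlation:", np.corrcoef(M[:, 0], C)[0, 1])
print("three predictors, betas:", beta_weights(M, C))
print("R^2 with one predictor:", r_squared(M[:, :1], C))
print("R^2 with three predictors:", r_squared(M, C))
```

In a validation exercise of the kind proposed here, M would hold the candidate metrics for each department and C the peer-panel rankings, fitted separately for each discipline.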
Note how the regression weights (in the graph) show the size of the contribution of each predictor to the criterion (citations). Note that OA emerges as a significant contributor to citation counts even when the other 6 variables are factored out.

Multiple regression analysis for the 4 journal impact factor (JIF) quartiles, for Age, JIF, References, Science, OA, and Mandated (OA Advantage only significant in top 25%). Criterion that is being predicted is number of citations.
[Figure: regression weights (Log_Age, Log_IF, Log_Ref, Log_Auth, Sci, OA, M) for each JIF quartile]

Multiple regression analyses within each citation bracket (0, 1, 2, … 20+): OA effect grows bigger in higher citation brackets
[Figure: regression weights (Log_Age, Log_IF, Log_Ref, Log_Auth, Sci, OA, M) for each citation bracket]

Conclusions:
• Use RAE 2008 peer rankings to validate metrics.
• Use OA metrics as the incentive to motivate OA mandates.
• Use OA mandates to generate OA content, maximising the research impact at the same time as measuring it.

Here are some useful websites
Author’s URLs (UQAM & Southampton): http://www.crsc.uqam.ca/ http://users.ecs.soton.ac.uk/harnad/
BIBLIOGRAPHY ON OA IMPACT ADVANTAGE: http://opcit.eprints.org/oacitation-biblio.html
BOAI Self-Archiving FAQ: http://www.eprints.org/self-faq/
CITEBASE (scientometric engine): http://citebase.eprints.org/
EPRINTS: http://www.eprints.org/
OA ARCHIVANGELISM: http://openaccess.eprints.org/
ROAR (Registry of OA Repositories): http://roar.eprints.org/
ROARMAP (Registry of OA Repository Mandates): http://www.eprints.org/openaccess/policysignup/
ROMEO/EPRINTS (Directory of Journal Policies on author OA Self-Archiving): http://romeo.eprints.org/

1995: Universal FTP Archives for Esoteric Science and Scholarship: A Subversive Proposal. In: Scholarly Journals at the Crossroads. ARL.
http://www.arl.org/scomm/subversive/toc.html
2001: Research access, impact and assessment. THES 1487. http://cogprints.org/1683/
The Self-Archiving Initiative. Nature 410. http://www.nature.com/nature/debates/e-access/Articles/harnad.html
Measuring and Maximising UK Research Impact. THES. http://eprints.ecs.soton.ac.uk/7728/
Mandated online RAE CVs Linked to University Eprint Archives. Ariadne 35. http://www.ecs.soton.ac.uk/~harnad/Temp/Ariadne-RAE.htm
2004: Comparing the Impact of Open Access (OA) vs. Non-OA Articles in the Same Journals. & Brody. D-Lib. http://www.dlib.org/dlib/june04/harnad/06harnad.html
The Access/Impact Problem and the Green and Gold Roads to Open Access. et al. Nature Web Focus. http://www.nature.com/nature/focus/accessdebate/21.html
2005: Journal publishing and author self-archiving: Peaceful Co-Existence. Berners-Lee et al. http://eprints.ecs.soton.ac.uk/11160/
Keystroke Economy: A Study of the Time and Effort Involved in Self-Archiving. Carr & Harnad. http://eprints.ecs.soton.ac.uk/10688/
Ten-Year Cross-Disciplinary Comparison of the Growth of Open Access and Research Citation Impact. Hajjem et al. IEEE Data Engineering Bulletin 28. http://eprints.ecs.soton.ac.uk/11688/
Making the case for web-based self-archiving. Research Money 19. http://eprints.ecs.soton.ac.uk/11534/
2006: Self-archiving should be mandatory. Research Information. http://eprints.ecs.soton.ac.uk/12738/
The Open Research Web: A Preview of the Optimal and the Inevitable. Shadbolt et al. In: Open Access: Key Strategic, Technical and Economic Aspects. http://eprints.ecs.soton.ac.uk/12453/
2007: Open Access Scientometrics and the UK Research Assessment Exercise. Proc 11th Ann Mtg Int Soc Scientometrics and Informetrics 11:27-33. http://eprints.ecs.soton.ac.uk/13804/
Time to Convert to Metrics. Brody et al. Research Fortnight 17. http://eprints.ecs.soton.ac.uk/14329/
Incentivizing the Open Access Research Web: Publication-, Data-Archiving and Scientometrics. Brody et al. CTWatch Quarterly 3(3).
http://eprints.ecs.soton.ac.uk/14418/