Breaking and remaking peer review with the SPIRES databases: Our Experience

Travis Brooks
SPIRES Scientific Databases Manager
Stanford Linear Accelerator Center

Pat Kreitz
Director, Technical Information Services
Stanford Linear Accelerator Center

Thanks to Ann Redfield, Michael Peskin, Louise Addis, Heath O’Connell, and Georgia Row for useful input.

May 2003 Travis Brooks-Trieste

Topics

Part I – History and current situation of SPIRES, arXiv, and Journals
Part II – Citation counting: our experiences and views
Part III – Speculation for the future

Part I

Some history, some current data, and some guesses

What is SPIRES?

Bibliographic records for over half a million papers
  – Entire literature of High-Energy Physics (HEP)
  – Many papers from related fields
Citations for e-prints and journal articles
Over 25,000 searches a day
Main site and personnel at SLAC
  – DESY, FNAL, Durham U., Kyoto U., IHEP (Moscow)

arXiv

Since 1991:
  – Makes full-text available for download
  – Links to SPIRES citation lists
  – Allows revisions
  – Divides content into hep-th, hep-ph, hep-ex and many other categories

hep-th vs. hep-ex

Sharp distinction between theory and experiment
  – Different from other disciplines
Difference between the publishing cultures of the HEP theorist and the HEP experimentalist

th vs. ex Publishing

Experiment:
  – Large collaborations (>500 authors)
  – Difficult to referee
  – Reporting results
Theory (my focus):
  – Small collaborations (<10 authors)
  – Self-contained papers
  – Conversational
  – hep-th and hep-ph similar

hep-th (Pr)eprints: A Timeline

Mid-1960s: preprints sent by authors to select groups
1969: SLAC library began ppf (preprints in particles and fields) list
  – Created demand for distribution
  – Legitimized preprints/preprint libraries
  – Led to anti-ppf list

hep-th (Pr)eprints: A Timeline

1974: SPIRES-HEP database indexed preprints
  – Allowed more general, worldwide distribution and retrieval of preprint titles
  – Still needed papers by mail
  – Preprints used conversationally
  – On WWW in 1991

hep-th (Pr)eprints: A Timeline

1991: arXiv.org allowed immediate and universal electronic access to full text of preprints
  – Preprints became eprints
  – Demise of all HEP journals predicted

Preprints not new…

arXiv is a logical extension of the movement towards preprints, not a “bolt from the blue”
  – Preprints have a long history of use
  – Preprints are more easily distributed today

History of hep-th arXiv

[Figure: “History of hep-th (SPIRES data)”. Number of papers per year, 1991 to 2002, with series Total, Peer-Reviewed, and Unpublished.]
arXiv is busy
  – Over 90% of papers published in Phys. Rev. D after 1995 were submitted to arXiv
But authors still publish!
  – 75% of hep-th papers (prior to 2002) have been published

When are eprints published?

[Figure: “Time from eprint to publication (SPIRES data)”. Number of papers against journal date minus eprint date, 0 to 24 months.]
Difference between Phys. Rev. D publication time and eprint appearance time
6,000 articles from June 1997-2003
Mode at 5 months
17 negative times not shown

When are they published?

[Figure: “Time from eprint to publication (SPIRES data)”. Number of papers against journal date minus eprint date, 0 to 24 months.]
What caused the negative times?
Are the large delays from “testing the waters”?
Do researchers wait for peer review to determine if an article is worth reading?

When are papers read?

Q: When does most citing occur?
A: Plot the citations a published hep-th article receives after its arXiv submission
  – 8000 published papers in sample
  – Includes citations from journal papers and arXiv papers (essentially the same set)

Eprints, not journals

Journal lag time 5 months
Citation peak occurs after eprint release, not journal release
[Figure: “Citations vs. time (SPIRES data)”. Average number of cites per paper against month after arXiv submission, 0 to 12, with the average time of journal appearance marked.]
Inference: HEP theorists don’t wait for the journal.

Current hep-th situation

Researchers read the arXiv to find out the latest scientific information
They base their work on what is in the arXiv
Scientific priority is given by arXiv time stamp, not journal submission date
They barely notice if it is published

HEP theorist’s viewpoint

arXiv is for immediate communication
  – A running scientific conversation
Overheard about a paper not sent to hep-ph: “He didn’t publish it, he just sent it to Phys. Rev. D”

Journals Irrelevant?
[Figure: “hep-th papers eventually peer-reviewed (SPIRES data)”. Fraction of papers (axis from 0.5 to 1) by year, 1991 to 2002, with series All papers, Cited over 50 times, and Cited over 100 times.]
75% of hep-th papers (prior to 2002) have been published
Correlation between large cite counts and publication
Journals are still very much alive

Why do authors publish? (4 guesses)

1-Inertia
  – There is no other system as developed or as trusted
  – Journals are ingrained in researchers’ psyches
  – But journals don’t appear to be going away (quickly)

Why do authors publish?

2-Feedback
  – Refereeing is useful for this paper and the next
  – The paper is already on arXiv while it is being refereed
  – But arXiv submissions generate comments and revisions as well

Why do authors publish?

3-Professional Advancement
  – Do tenured/secure faculty publish fewer of their eprints?
    • Anecdotally: Witten has seven 50+ cited papers as eprints only
    • In general: interesting question to think about…
  – If professional advancement is the sole purpose of peer review, could we not do better?
    • Are we using the peer review process as a substitute for performance evaluation?

Why do authors publish?

4-Archival value
  – Do authors believe that arXiv is a good archive?
  – Will arXiv-only eprints still be around (readable, accessible) in 100 years?
    • Perception, not reality, matters here
    • E-only journals appear no different
    • Centralization, not media, should be the concern

Part II

Cite counts and the future

Cite Counting

Cite counts present a data-driven picture of the hep-th eprint culture
Much work already (by many here today)
  – Cites to HEP eprints from journal articles are high and rising (Brown 2001, Youngen 1998, others)
  – arXiv impact factor is similar to journals (Fabbrichesi and Montolli, 2001)
  – Many other studies (often using SPIRES-HEP data)

Cite Counting

Cite counting for bibliometric purposes seems reasonable (perhaps)
Cite counting for peer review purposes?
  – Services like SPIRES (free) and ISI (fee) make cite counts available to other researchers, hiring committees, and tenure review boards.

Cite Counts = Peer Review?

Are citations the electronic answer to refereed journals?
Currently the only answer
  – Only one widely available
But not a very good answer
  – arXiv + SPIRES cite counts are not Phys. Rev. Lett.

Cites: Pros and Cons

SPIRES has been making citations available for over 25 years
  – We have noticed a few things about the process
    • Some good
    • Some bad
    • Some merely interesting

Advantage-Dynamic

Cite counts change with the field
  – Classics
  – New papers
  – Newly discovered classics
Ex: Weinberg’s Standard Model paper
  – Few cites initially
  – Over 5,000 now
Ex: M. Peskin’s topcite reviews

Advantage-Fast

Cite counts begin immediately after appearance
Electronic publishing means peer review is the lag time
Lag time makes journals archivists rather than communicators
  – Led to the replacement of this function by arXiv/SPIRES/etc.
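
As a rough illustration of how cite counts can start accumulating the moment an eprint appears, the Python sketch below tallies citations by the number of months since submission, in the spirit of a citations-against-time plot. It is not SPIRES code: the paper identifiers, the input layout, and the function names are all assumptions made for this example.

```python
from collections import Counter
from datetime import date


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end (negative if end is earlier)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def average_cites_by_month(submitted, citations, horizon=12):
    """Average citations per paper in each month after eprint submission.

    submitted: dict of paper id -> date the eprint appeared (hypothetical ids)
    citations: iterable of (cited paper id, date the citing paper appeared)
    Returns a list of length horizon + 1; index m is month m after submission.
    """
    if not submitted:
        return [0.0] * (horizon + 1)
    tally = Counter()
    for cited_id, cite_date in citations:
        if cited_id not in submitted:
            continue  # citation to a paper outside the sample
        lag = months_between(submitted[cited_id], cite_date)
        if 0 <= lag <= horizon:
            tally[lag] += 1
    n_papers = len(submitted)
    return [tally[m] / n_papers for m in range(horizon + 1)]


# Made-up sample: two papers, four in-sample citations, one outside the sample.
sample_submitted = {"A": date(2001, 1, 15), "B": date(2001, 3, 2)}
sample_citations = [
    ("A", date(2001, 1, 30)),  # same month as submission -> lag 0
    ("A", date(2001, 3, 1)),   # lag 2
    ("B", date(2001, 3, 20)),  # lag 0
    ("B", date(2001, 5, 5)),   # lag 2
    ("C", date(2001, 4, 1)),   # ignored: "C" is not in the sample
]
print(average_cites_by_month(sample_submitted, sample_citations, horizon=3))
# prints [1.0, 0.0, 1.0, 0.0]
```

Counting by calendar month is a deliberate simplification; a real tally would also have to decide which version of a reference list counts, a difficulty the accuracy discussion raises.
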

Advantage-Easy

SPIRES tracks citations with 4 staff members
  – Total staff is about 8
  – We are not that technically sophisticated
  – We are not even especially clever!
  – Still it is non-trivial

Disadvantage-Accuracy

Speed, ease rely on electronic processing
  – Accuracy or speed?
Reference lists in a paper change over an article’s life
  – What counts as a cite?
  – Which version of the paper?

Disadvantage-Relevance

Theory: Citations are a measure of what scientists read
But does citing = reading?
  – Simkin & Roychowdhury (cond-mat/0212043 and cond-mat/0305150)
  – Students, general public

Disadvantage-Relevance

Theory: Cites are a mark of quality
What about brilliant papers out of the mainstream?
Are papers really even referenced for scientific reasons?
  – Or are they referenced for sociologic reasons?
  – Or are references simply copied?

Disadvantage-Relevance

Tongue-in-cheek reasons for not citing prior work (humorous, but not far off…)
  – “If it’s old, foreign, or old and foreign”
  – “They don’t cite us either”
  – “Rain forest preservation through paper-saving”
  – “I figured if you’re smart enough to read this paper, you already knew that!”
from The Scientist

Interesting-Importance

People take it seriously
Funding, careers, reputations, etc. are perceived to depend in some way on SPIRES citation data

Interesting-Importance

We receive ~50 emails a day, most of them revolving around incorrect, incomplete, or missing references
  – Usually from an author whose paper was cited but missed
  – Often marked “URGENT”
  – Occasionally with panicked explanations including the date that the review committee is meeting
  – Sometimes accusing SPIRES of sabotage, or otherwise expressing outrage at a missed citation

Importance is helpful…

Importance shows that cite counting is useful (or at least used!)
Users of the information are motivated to help maintain it
  – SPIRES is almost open source
  – We help eliminate authors’ typos, they help eliminate our errors

…helpful…

SPIRES can replace bad cites with the correct ones
  – Corrects our errors
  – Corrects author errors
  – Even helps limit propagation of errors
    • Ex: a Witten article with 1,300 cites had 100 incorrect cites, all the same typo

…but also worrisome

Responsibility lies with the maintainers of the citation counts
  – Previously in the hands of referees and editors
Self-citation
  – Boost counts artificially
Deception
  – We have had it happen

Citation Counts: Summary

We do it, and it works
  – Fast, Easy, and Fluid
  – Valued by the Community
It is more than imperfect
  – Relevance and Accuracy
  – Does not yet replace traditional peer review

Part III

What would it take to truly change peer review?

To change peer review

Stakeholders in the peer review system
  – Editors
  – Referees
  – Authors
  – Readers
Fundamental differences between disciplines
  – hep-th and hep-ex are different in their adoption of eprints

To change peer review

Functions of peer review when divorced from communication
One must replace (or discard) all of these
  – Metrics for papers
  – Metrics for scientists
  – Metrics for truth?

Peer review = “good science”?

Peer review gives a seal of approval
  – Laypeople
    • Medicine, Environmental Science, etc.
Refereeing process is filled with examples of weakness
  – Yet it feels fundamentally sound
Publishers have taken this role of “vetting” science

Truth is more complex

Community acceptance determines scientific truth
  – “Yesterday’s sensation, today’s calibration”
The “test of time” is longer than the 6-month lag time for journal articles
Immediacy is needed for communication and conversation
But deliberation is needed for context and community judgment

An Opportunity

Place an article in the context of the surrounding work
  – Reference linking only a baby step
  – Degree to which a finding has been verified or contradicted by earlier or later work
Ex: M. Peskin’s Topcites reviews at SLAC
  – The numbers are amusing
  – Context is the real value

Context

Another Example: Particle Data Group
  – Reports data from all HEP experiments
  – Sorts and combines data
  – References to comments on validity
  – References to interpretations of the data

PDG Example

[Figure: Particle Data Group example (image not available)]

Opportunities

Intense scrutiny not possible for journals
  – Context is important
Amazon and Google
  – Personalized and dynamic
  – Citebase
  – Torii

A New system

Any new system would need to do (at least) the following
  – React to changes in the scientific world
    “You cannot read the same paper twice”
  – Provide context as well as content
  – Be fast and easy enough to keep up with scientific conversations taking place on arXiv(es)
  – Provide an imprimatur of quality both for the cognoscenti and the amateurs

Summary

SLAC-SPIRES and arXiv helped transform the hep-th publishing environment
  – Journals play no role in communication
  – Journals are still widely used
Citation counting played a part in this transition
  – Counting is not a complete solution to peer review
New models of peer review are farther away
  – Should be richer than any current example

Why HEP Theory?
No proprietary/patent issues
Papers can be verified by hand, by any knowledgeable reader
Work is like a continuing dialog, each paper sparking new, creative ideas

Same basic style

Note that the basic publication style has not really changed
  – HEP Theory has not moved away from papers written by a few authors to more complex technology-enabled collaborations

Other Fields

HEP experiment has had more radical changes in working style
  – World’s largest database (>600 TB)
  – Worldwide data processing grid
  – Close to 1000 authors on a paper
  – Technology used to push pre-paper scientific collaboration to new levels
Other fields might retain traditional journal roles while using unpublished research as additions and expansions rather than substitutes

Conclusions

HEP theorists have universally adopted eprints as the means of intra-field communication
Peer-reviewed journals are still heavily used, but for different purposes
The needs of HEP theorists were very close to the traditional publication model