Breaking and remaking peer review with
the SPIRES databases: Our Experience
Travis Brooks
SPIRES Scientific Databases Manager
Stanford Linear Accelerator Center
Pat Kreitz
Director, Technical Information Services
Stanford Linear Accelerator Center
Thanks to Ann Redfield, Michael Peskin, Louise Addis, Heath O’Connell,
and Georgia Row for useful input.
May 2003
Travis Brooks-Trieste
1
Topics
Part I
– History and current situation of SPIRES, arXiv, and
Journals
Part II
– Citation counting: our experiences and views
Part III
– Speculation for the future
Part I
Some history, some current data,
and some guesses
What is SPIRES?
Bibliographic records for over half a million
papers
– Entire literature of High-Energy Physics (HEP)
– Many papers from related fields
Citations for e-prints and journal articles
Over 25,000 searches a day
Main site and personnel at SLAC
– DESY, FNAL, Durham U., Kyoto U, IHEP (Moscow)
arXiv
Since 1991:
– Makes full-text available for download
– Links to SPIRES citation lists
– Allows revisions
– Divides content into hep-th, hep-ph, hep-ex and
many other categories
hep-th vs. hep-ex
Sharp distinction between theory and
experiment
– Different from other disciplines
Difference between the publishing cultures of the
HEP theorist and the HEP experimentalist
th vs. ex Publishing
Experiment:
– Large collaborations
(>500 authors)
– Difficult to referee
– Reporting results
Theory (my focus):
– Small collaborations
(<10 authors)
– Self-contained papers
– Conversational
– hep-th and hep-ph similar
hep-th (Pr)eprints: A Timeline
Mid 1960’s preprints sent by authors to select
groups
1969 SLAC library began ppf (preprints in
particles and fields) list
– Created demand for distribution
– Legitimized preprints/preprint libraries
– Led to anti-ppf list
hep-th (Pr)eprints: A Timeline
1974 SPIRES-HEP database indexed preprints
– Allowed more general, worldwide, distribution and
retrieval of preprint titles
– Still needed papers by mail
– Preprints used conversationally
– On WWW in 1991
hep-th (Pr)eprints: A Timeline
1991 arXiv.org allowed immediate and universal
electronic access to full-text of preprints
– Preprints became eprints
– Demise of all HEP journals predicted
Preprints not new…
arXiv is a logical extension of the movement
towards preprints, not a “bolt from the blue”
– Preprints have a long history of use
– Preprints are more easily distributed today
History of hep-th arXiv
arXiv is busy
– Over 90% of papers published in Phys. Rev. D after 1995 were
submitted to arXiv
But authors still publish!
– 75% of hep-th papers (prior to 2002) have been published
[Figure: History of hep-th (SPIRES data). Number of papers per year, 1991-2002 (axis 0 to 3500), for three series: Total, Peer-Reviewed, Unpublished.]
When are eprints published?
Difference between Phys. Rev. D publication time and eprint
appearance time
– 6,000 articles from June 1997-2003
– Mode at 5 months
– 17 negative times not shown
[Figure: Time from eprint to publication (SPIRES data). Histogram of number of papers (axis 0 to 1200) vs. journal date minus eprint date, 0 to 24 months.]
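A minimal sketch of how such a lag histogram could be computed, assuming each record carries an eprint date and a journal date. The function names and the sample dates are made up for illustration; this is not the SPIRES code or data.

```python
from collections import Counter
from datetime import date


def months_between(eprint: date, journal: date) -> int:
    """Whole-month difference, journal date minus eprint date (may be negative)."""
    return (journal.year - eprint.year) * 12 + (journal.month - eprint.month)


def lag_histogram(pairs):
    """Return (papers per non-negative lag in months, number of negative lags)."""
    lags = [months_between(eprint, journal) for eprint, journal in pairs]
    negatives = sum(1 for lag in lags if lag < 0)
    hist = Counter(lag for lag in lags if lag >= 0)
    return hist, negatives


# Hypothetical (eprint date, journal date) pairs, for illustration only.
sample = [
    (date(1998, 1, 10), date(1998, 6, 15)),   # 5 months
    (date(1999, 3, 2), date(1999, 8, 1)),     # 5 months
    (date(2000, 11, 20), date(2001, 2, 15)),  # 3 months
    (date(2001, 7, 5), date(2001, 6, 15)),    # -1: journal date precedes eprint
]

hist, negatives = lag_histogram(sample)
mode_lag = hist.most_common(1)[0][0]  # most frequent lag, in months
print(mode_lag, negatives)
```

Negative lags are counted separately rather than plotted, mirroring the "negative times not shown" note above.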
When are they published?
What caused the negative times?
Are the large delays from “testing the waters”?
Do researchers wait for peer review to determine if an
article is worth reading?
[Figure: Time from eprint to publication (SPIRES data). Same histogram as the previous slide: number of papers vs. journal date minus eprint date, 0 to 24 months.]
When are papers read?
Q: When does most citing occur?
A: Plot the citations a published hep-th
article receives after its arXiv submission
– 8000 published papers in sample
– Includes citations from journal papers and arXiv
papers (essentially the same set)
Eprints, not journals
Journal lag time: 5 months
Citation peak occurs after eprint release,
not journal release
Inference: HEP theorists don’t wait for the journal.
[Figure: Citations vs. time (SPIRES data). Average number of cites per paper (axis 0.4 to 0.85) vs. month after arXiv submission (0 to 12), with the average time of journal appearance marked.]
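A minimal sketch of the averaging behind a citations-vs-time curve, assuming we know each paper's arXiv submission date and the date of every paper that cites it. All names and sample records here are hypothetical, not SPIRES data.

```python
from collections import defaultdict
from datetime import date


def months_between(start: date, end: date) -> int:
    """Whole-month difference, end minus start."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def avg_cites_by_month(submitted, citations, max_month=12):
    """Average cites per paper for each month after arXiv submission.

    submitted: {paper_id: arXiv submission date}
    citations: list of (cited_paper_id, date the citing paper appeared)
    """
    counts = defaultdict(int)
    for paper_id, cite_date in citations:
        month = months_between(submitted[paper_id], cite_date)
        if 0 <= month <= max_month:  # ignore cites outside the plotted window
            counts[month] += 1
    n_papers = len(submitted)
    return [counts[m] / n_papers for m in range(max_month + 1)]


# Hypothetical sample: two papers and five citations.
submitted = {"A": date(2000, 1, 15), "B": date(2000, 3, 1)}
citations = [
    ("A", date(2000, 2, 1)),   # month 1
    ("A", date(2000, 3, 20)),  # month 2
    ("B", date(2000, 5, 10)),  # month 2
    ("B", date(2000, 5, 30)),  # month 2
    ("A", date(2001, 6, 1)),   # month 17, outside the window
]

curve = avg_cites_by_month(submitted, citations)
peak_month = curve.index(max(curve))
print(peak_month, curve[peak_month])
```

Comparing the peak month with the typical journal lag is what supports the inference that readers respond to the eprint, not the journal issue.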
Current hep-th situation
Researchers read the arXiv to find out the latest
scientific information
They base their work on what is in the arXiv
Scientific priority is given by arXiv time stamp,
not journal submission date
They barely notice if it is published
HEP theorist’s viewpoint
arXiv is for immediate communication
– A running scientific conversation
Overheard about a paper not sent to hep-ph:
“He didn’t publish it, he just sent it to
Phys. Rev. D”
Journals Irrelevant?
75% of hep-th papers (prior to 2002) have been
published
Correlation between large cite counts and
publication
Journals are still very much alive
[Figure: hep-th papers eventually peer-reviewed (SPIRES data). Fraction published (axis 0.5 to 1) by year, 1991-2002, for three series: All papers, Cited over 50 times, Cited over 100 times.]
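A minimal sketch of the comparison in this plot, assuming each paper record has a citation count and a published flag. The field names and the sample records are hypothetical.

```python
def published_fraction(papers, min_cites=None):
    """Fraction of papers that were published.

    With min_cites set, only papers cited more than min_cites times are counted.
    Returns None when no paper passes the threshold.
    """
    selected = [p for p in papers if min_cites is None or p["cites"] > min_cites]
    if not selected:
        return None
    return sum(1 for p in selected if p["published"]) / len(selected)


# Hypothetical records, for illustration only.
papers = [
    {"cites": 3, "published": False},
    {"cites": 10, "published": True},
    {"cites": 60, "published": True},
    {"cites": 120, "published": True},
    {"cites": 0, "published": False},
    {"cites": 75, "published": False},
    {"cites": 8, "published": True},
    {"cites": 20, "published": True},
]

for label, threshold in [("All papers", None), ("Cited over 50 times", 50), ("Cited over 100 times", 100)]:
    print(label, published_fraction(papers, threshold))
```

Repeating this per submission year would give one point per year for each of the three series.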
Why do authors publish?
(4 guesses)
1-Inertia
– There is no other system as developed or as trusted
– Journals are ingrained in researchers’ psyches
– But journals don’t appear to be going away (quickly)
Why do authors publish?
2-Feedback
– Refereeing is useful for this paper and the next
– The paper is already on arXiv while it is being
refereed
– But arXiv submissions generate comments and
revisions as well
Why do authors publish?
3-Professional Advancement
– Do tenured/secure faculty publish fewer of their
eprints?
• Anecdotally: Witten has seven papers cited 50+ times that exist as eprints only
• In general: interesting question to think about…
– If professional advancement is the sole purpose of
peer-review, could we not do better?
• Are we using the peer review process as a substitute for
performance evaluation?
Why do authors publish?
4-Archival value
– Do authors believe that arXiv is a good archive?
– Will arXiv-only eprints still be around (readable,
accessible) in 100 years?
• Perception, not reality, matters here
• E-only journals appear no different
• Centralization, not media, should be the concern
Part II
Cite counts and the future
Cite Counting
Cite counts present a data-driven picture of the
hep-th eprint culture
Much work already (by many here today)
– Cites to HEP eprints from journal articles are high
and rising (Brown 2001, Youngen 1998, others)
– arXiv impact factor is similar to journals (Fabbrichesi
and Montolli, 2001)
– Many other studies (often using SPIRES-HEP data)
Cite Counting
Cite counting for bibliometric purposes seems
reasonable (perhaps)
Cite counting for peer review purposes?
– Services like SPIRES (free) and ISI (fee) make cite
counts available to other researchers, hiring
committees, and tenure review boards.
Cite Counts = Peer Review?
Are citations the electronic answer to refereed
journals?
Currently the only answer
– Only one widely available
But not a very good answer
– arXiv + SPIRES cite counts are not Phys. Rev. Lett.
Cites: Pros and Cons
SPIRES has been making citations available for
over 25 years
– We have noticed a few things about the process
• Some good
• Some bad
• Some merely interesting
Advantages-Dynamic
Cite counts change with the field
– Classics
– New papers
– Newly discovered classics
Ex: Weinberg’s Standard Model paper
– Few cites initially
– Over 5,000 now
Ex: M. Peskin’s topcite reviews
Advantage-Fast
Cite counts begin immediately after appearance
With electronic publishing, peer review accounts for
the lag time
Lag time makes journals archivists rather than
communicators
– Led to the replacement of this function by
arXiv/SPIRES/etc.
Advantage-Easy
SPIRES tracks citations with 4 staff members
– Total staff is about 8
– We are not that technically sophisticated
– We are not even especially clever!
– Still it is non-trivial
Disadvantage-Accuracy
Speed, ease rely on electronic processing
– Accuracy or speed?
Reference lists in a paper change over an
article’s life
– What counts as a cite?
– Which version of the paper?
Disadvantage-Relevance
Theory: Citations are a measure of what
scientists read
But does citing = reading?
– Simkin & Roychowdhury
(cond-mat/0212043 and cond-mat/0305150)
– Students, general public
Disadvantage-Relevance
Theory: Cites are a mark of quality
What about brilliant papers out of the
mainstream?
Are papers really even referenced for scientific
reasons?
– Or are they referenced for sociological reasons?
– Or are references simply copied?
Disadvantage-Relevance
Tongue-in-cheek reasons for not citing prior work
(humorous, but not far off…)
“If it’s old, foreign—or—old and foreign”
“They don’t cite us either”
“Rain forest preservation through paper-saving”
“I figured if you’re smart enough to read this paper,
you already knew that!”
from The Scientist
Interesting-Importance
People take it seriously
Funding, careers, reputations, etc. are perceived to
depend in some way on SPIRES citation data
Interesting-Importance
We receive ~50 emails a day, most of them revolving
around incorrect, incomplete, or missing references
– Usually from an author whose paper was cited but missed
– Often marked “URGENT”
– Occasionally with panicked explanations including the date
that the review committee is meeting
– Sometimes accusing SPIRES of sabotage, or otherwise
expressing outrage at a missed citation
Importance is helpful…
Importance shows that cite counting is useful (or
at least used!)
Users of the information are motivated to help
maintain it
– SPIRES is almost open source
– We help eliminate authors’ typos, they help eliminate
our errors
…helpful…
SPIRES can replace bad cites with the correct
ones
– Corrects our errors
– Corrects author errors
– Even helps limit propagation of errors
• Ex: a Witten article with 1,300 cites had 100 incorrect
cites, all the same typo
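A minimal sketch of how a correction table can fold a repeated typo back into the right citation count. The reference strings are invented and the lookup-table approach is an assumption for illustration, not a description of how SPIRES actually does it.

```python
from collections import Counter


def count_cites(references, corrections):
    """Count citations per paper after mapping known-bad reference strings to correct ones."""
    return Counter(corrections.get(ref, ref) for ref in references)


# Hypothetical reference strings; the "354" form is a typo for "345".
GOOD = "J.Phys.X 12 (1990) 345"
TYPO = "J.Phys.X 12 (1990) 354"
corrections = {TYPO: GOOD}

references = [GOOD] * 3 + [TYPO] * 2 + ["J.Phys.Y 1 (1991) 1"]

raw = Counter(references)                       # counts before correction
fixed = count_cites(references, corrections)    # counts after correction
print(raw[GOOD], fixed[GOOD])
```

Once a typo is in the table, every later paper that copies the same bad reference is corrected automatically, which is how fixing one error limits its propagation.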
…but also worrisome
Responsibility lies with the maintainers of the
citation counts
– Previously in the hands of referees and editors
Self-citation
– Boosts counts artificially
Deception
– We have had it happen
Citation Counts: Summary
We do it, and it works
– Fast, Easy, and Fluid
– Valued by the Community
It is more than imperfect
– Relevance and Accuracy
– Does not yet replace traditional peer review
Part III
What would it take to truly change peer
review?
To change peer review
Stakeholders in the peer review system
– Editors
– Referees
– Authors
– Readers
Fundamental differences between disciplines
– hep-th and hep-ex are different in their adoption of
eprints
To change peer review
Functions of peer review when divorced from
communication
One must replace (or discard) all of these
– Metrics for papers
– Metrics for scientists
– Metrics for truth?
Peer review = “good science” ?
Peer review gives a seal of approval
– Laypeople
• Medicine, Environmental Science, etc.
Refereeing process is filled with examples of
weakness
– Yet it feels fundamentally sound
Publishers have taken this role of “vetting”
science
Truth is more complex
Community acceptance determines scientific truth
– “Yesterday’s sensation, today’s calibration”
The “test of time” is longer than the 6-month lag time for
journal articles
Immediacy is needed for communication and
conversation
But deliberation is needed for context and community
judgment
An Opportunity
Place an article in the context of the surrounding
work
– Reference linking only a baby step
– Degree to which a finding has been verified or
contradicted by earlier or later work
Ex: M. Peskin’s Topcites reviews at SLAC
– The numbers are amusing
– Context is the real value
Context
Another Example: Particle Data Group
– Reports data from all HEP experiments
– Sorts and combines data
– References to comments on validity
– References to interpretations of the data
PDG Example
Opportunities
Intense scrutiny not possible for journals
– Context is important
Amazon and Google
– Personalized and dynamic
– Citebase
– Torii
A New System
Any new system would need to do (at least) the
following
– React to changes in the scientific world
“You cannot read the same paper twice”
– Provide context as well as content
– Be fast and easy enough to keep up with scientific
conversations taking place on arXiv(es)
– Provide an imprimatur of quality both for the
cognoscenti and the amateurs
Summary
SLAC-SPIRES and arXiv helped transform the
hep-th publishing environment
– Journals play no role in communication
– Journals are still widely used
Citation counting played a part in this transition
– Counting is not a complete solution to peer review
New models of peer review are farther away
– Should be richer than any current example
Why HEP Theory?
No proprietary/patent issues
Papers can be verified by hand, by any
knowledgeable reader
Work is like a continuing dialog, each paper
sparking new, creative ideas
Same basic style
Note that the basic publication style has not really
changed
– HEP Theory has not moved away from papers written by
a few authors to more complex technology-enabled
collaborations
Other Fields
HEP experiment has had more radical changes in
working style
– World’s largest database (>600 TB)
– Worldwide data processing grid
– Close to 1000 authors on a paper
– Technology used to push pre-paper scientific collaboration to
new levels
Other fields might retain traditional journal roles while
using unpublished research as additions and
expansions rather than substitutes
Conclusions
HEP theorists have universally adopted eprints
as the means of intra-field communication
Peer-reviewed journals are still heavily used, but
for different purposes
The needs of HEP theorists were very close to
the traditional publication model