New COUNTER-based usage metrics
for journals and other publications
Peter Shepherd
COUNTER
August 2011
New metrics


• Journal Usage Factor: a usage-based measure of journal impact and quality (funded by UKSG, RIN and others)
• PIRUS (Publisher and Institutional Repository Usage Statistics): usage statistics for individual articles (funded by JISC)
Journal Usage Factor:
overview
• ISI's Impact Factor compensates for the fact that larger journals will tend to be cited more than smaller ones
• Can we do something similar for usage?
• In other words, should we seek to develop a “Usage Factor” as an additional measure of journal quality/value?
Journal Usage Factor advantages

• Especially helpful for journals and fields not covered by ISI
• Especially helpful for journals with high undergraduate or practitioner use
• Especially helpful for journals publishing relatively few articles
• Data available potentially sooner than with Impact Factors
Usage Factor Stage 2
Modelling and Analysis

• Real journal usage data analysed by John Cox Associates, Frontline GMS and CIBER
• Participating publishers:
  • American Chemical Society
  • Emerald
  • IOP
  • Nature Publishing
  • OUP
  • Sage
  • Springer
The data

• 326 journals
  • 38 Engineering
  • 32 Physical Sciences
  • 119 Social Sciences
    • 29 Business and Management
  • 35 Humanities
  • 102 Medicine and Life Sciences
    • 57 Clinical Medicine
• c. 150,000 articles
• 1 GB of data
Results

Content Type
• In social sciences JUFs were higher for non-article content
• In medicine and life sciences JUFs were higher for article content
• In humanities, physical sciences, and business & management, JUF differences between article and non-article content were not significant
Results

Article Version
• In physical sciences the JUF was significantly (sometimes dramatically) lower when calculations were confined to the Version of Record
• In all other subjects the JUF was significantly higher when calculations were confined to the Version of Record
Results

JUF and Impact Factor
• Little correlation, apart from the Nature-branded titles
• Some titles with no or very low Impact Factors have very high JUFs
Journal Usage Factor:
some initial results
[Scatter plot: Humanities Journal Usage Factor vs Impact Factor, publication years 2008-09, usage period months 1-24; y-axis: Journal Usage Factor (0-1,200); x-axis: Impact Factor (0-4)]
Recommendations - the metric
• The Journal Usage Factor should be calculated using the median rather than the arithmetic mean
  • Usage data are highly skewed; most items attract relatively low use and a few are used many times. As a result, the use of the arithmetic mean is not appropriate
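To make the skew argument concrete, here is a minimal sketch (with invented download counts, not the project's data) contrasting a median-based usage factor with a mean-based one for the same set of items:

```python
# Minimal sketch (not the official COUNTER algorithm): summarise downloads per
# item over the usage window with the median, which is robust to the small
# number of very heavily used items that inflate the arithmetic mean.
from statistics import mean, median

# Hypothetical download counts per published item over the usage window.
downloads_per_item = [3, 5, 7, 8, 9, 11, 12, 15, 420, 1150]

juf_median = median(downloads_per_item)  # 10.0  - reflects the typical item
juf_mean = mean(downloads_per_item)      # 164.0 - dragged up by the long tail

print(f"Median-based usage factor: {juf_median:.0f}")
print(f"Mean-based usage factor:   {juf_mean:.0f}")
```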
Recommendations - the metric
• A range of usage factors should ideally be published for each journal: a comprehensive factor (all items, all versions) plus supplementary factors for selected items (e.g. article and final versions)
  • There is considerable variation in the relative use made of different document types and versions. This means that the usage factor will be affected substantially by the particular mix of items included in a given journal, all other things being equal
Recommendations - the metric
• Journal Usage Factors should be published as integers with no decimal places
  • Monthly patterns of use at the item level are quite volatile, and usage factors therefore include a component of statistical noise
• Journal Usage Factors should be published with appropriate confidence levels around the average to guide their interpretation
  • As a result of this statistical noise, the mean usage factor should be interpreted within intervals of plus or minus 22 per cent
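As a rough illustration of how these two recommendations combine, the sketch below assumes the plus or minus 22 per cent band is applied multiplicatively around the calculated value and then rounded; both the function and this interpretation are assumptions, not part of the Code of Practice:

```python
# Illustrative only: publish the usage factor as an integer and report an
# assumed +/-22% interval around it to reflect month-to-month statistical noise.
def published_juf_with_interval(juf: float, noise: float = 0.22) -> tuple[int, int, int]:
    """Return (lower bound, published integer value, upper bound)."""
    return round(juf * (1 - noise)), round(juf), round(juf * (1 + noise))

print(published_juf_with_interval(10.4))  # (8, 10, 13)
```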
Recommendations - the metric
• The Journal Usage Factor should be calculated initially on the basis of a maximum time window of 24 months. It might be helpful later on to consider a 12-month window as well (or possibly even a 6-month window) to provide further insights
  • This study shows that relatively short time windows capture a substantial proportion of the average lifetime interest in full journal content. Windows longer than 24 months are not recommended, and 24 months should be considered a maximum. There is possibly a case for considering a 12-month window, but there are counter-arguments here, especially the impact of publishing ahead of print
Recommendations - the metric
• The Journal Usage Factor is not directly comparable across subject groups and should therefore be published and interpreted only within appropriate subject groupings
  • Usage, in months 1-12 especially, follows different patterns in different subject areas
• The Journal Usage Factor should be calculated using a publication window of 2 years
  • Usage factors will tend to inflate across the board year-on-year as a result of many factors, including greater item discoverability through search engines and gateways. Changes to access arrangements (e.g. Google indexing) will have dramatic and lasting effects. The use of a two-year publication window would ameliorate some of these effects by providing a moving average as well as a greater number of data points for calculating the usage factor
Recommendations - the metric
• There seems to be no reason why ranked lists of journals by usage factor should not gain acceptance
  • The usage factor delivers journal rankings that are comparable in terms of their year-on-year stability with those generated from citation metrics such as the ISI Impact Factor and SNIP
• Small journals and titles with fewer than 100 downloads per item are unsuitable candidates for Journal Usage Factors: these are likely to be inaccurate and easily gamed
  • Usage factors below a certain threshold value (perhaps 100, but research on a larger scale is needed to explore this further) are likely to be inaccurate due to statistical noise. The size of the journal should also be taken into account
Recommendations - the metric
• The Journal Usage Factor provides very different information from the citation Impact Factor, and this fact should be emphasised in public communications
  • The usage factor does not appear to be statistically associated with measures of citation impact
• Further work is needed on usage factor gaming and on developing robust forensic techniques for its detection
  • Attempts to game the usage factor are highly likely. CIBER’s view is that the real threat comes from software agents rather than human attack. The first line of defence has to be making sure that COUNTER protocols are robust against machine attack. The analysis in this report suggests that a cheap and expedient second line of defence would be to develop statistical forensics to identify suspicious behaviour, whether it is human or machine in origin
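As one illustration of what such statistical forensics might look like (this is not the method proposed in the report; the threshold and the download figures are invented), single-day spikes far above an item's own typical daily level can be flagged, whatever their origin:

```python
# Illustrative forensic check: flag days whose download count is far above the
# item's own robust baseline, a pattern more typical of software agents than
# of human readers. Threshold and data are assumptions for the example.
from statistics import median

def suspicious_days(daily_counts: list[int], factor: float = 10.0) -> list[int]:
    """Return indices of days whose count exceeds `factor` times the median day."""
    baseline = max(median(daily_counts), 1)
    return [i for i, count in enumerate(daily_counts) if count > factor * baseline]

# Hypothetical daily downloads for one article: a robot-like burst on day 5.
print(suspicious_days([2, 3, 1, 4, 2, 250, 3, 2]))  # [5]
```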
Recommendations - the metric
• Further work is needed to broaden the scope of the project over time to include other usage-based metrics
  • Although the scope of this study was to consider the Journal Usage Factor only, future work could look at other indicators that capture other aspects of online use, such as a ‘journal usage half-life’ or a ‘reading immediacy index’
Recommendations - infrastructure
• Development of systems to automate the extraction and collation of the data needed for JUF calculation is essential if calculation of this metric is to become routine
• Development of an agreed standard for content item types, to which journal-specific item types would be mapped, is desirable as it would allow for greater sophistication in JUF calculation (see the sketch after this list)
• Development or adoption of a simple subject taxonomy to which journal titles would be assigned by their publishers
• Publishers should adopt standard “article version” definitions based on ALPSP/NISO recommendations
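To illustrate the second point, the sketch below shows one possible shape for such an item-type mapping; the standard vocabulary and the publisher-specific labels are invented for illustration and are not taken from any agreed standard:

```python
# Illustrative only: map each publisher's local content item types onto an
# assumed shared vocabulary so that usage factors can be calculated on a
# consistent basis across journals and publishers.
STANDARD_ITEM_TYPES = {"research-article", "review", "editorial", "letter", "other"}

# Hypothetical publisher-specific labels mapped to the shared vocabulary.
PUBLISHER_TYPE_MAP = {
    "Original Paper": "research-article",
    "Rapid Communication": "letter",
    "Book Review": "other",
}

def standardise_item_type(local_type: str) -> str:
    """Return the standard item type for a publisher-specific label."""
    mapped = PUBLISHER_TYPE_MAP.get(local_type, "other")
    return mapped if mapped in STANDARD_ITEM_TYPES else "other"

print(standardise_item_type("Rapid Communication"))  # letter
```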
Next steps: Stage 3
Stage 3: Objectives





• Preparation of a draft Code of Practice for the Journal Usage Factor
• Further testing of the recommended methodology for calculating the Journal Usage Factor
• Investigation of an appropriate, resilient subject taxonomy for the classification of journals
• Exploration of the options for an infrastructure to support the sustainable implementation of JUF
• Investigation of the feasibility of applying the Usage Factor concept to other categories of publication

For further information, see the full report of the Journal Usage Factor project on the COUNTER website: http://www.projectcounter.org/usage_factor.html
PIRUS: why now?
Increasing interest in article-level usage
• More journal articles hosted by Institutional and other Repositories
• Authors and funding agencies are increasingly interested in a reliable, global overview of usage of individual articles
• Online usage becoming an alternative, accepted measure of article and journal value
  • Knowledge Exchange report recommends developing standards for usage reporting at the individual article level
  • Usage-based metrics being considered as a tool for use in measuring the outputs of research
PIRUS: why now?
Article-level usage metrics now more practical
• Implementation by COUNTER of XML-based usage reports makes more granular reporting of usage a practical proposition
• Implementation by COUNTER of the SUSHI protocol facilitates the automated consolidation of usage data from different sources
PIRUS: the challenge

• An article may be available from:
  • The main journal web site
  • Ovid
  • ProQuest
  • PubMed Central
  • Authors’ local Institutional Repositories
• If we want to assess article impact by counting usage, how can we maximise the actual usage that we capture?
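A minimal sketch of the consolidation this implies (the DOIs and counts below are invented): per-platform usage for the same article, keyed by its DOI, is summed into a single global figure.

```python
# Illustrative only: sum per-platform download counts for the same article
# (keyed by DOI) to approximate its total captured usage.
from collections import Counter

platform_counts = {
    "publisher_site": {"10.9999/example.001": 120},
    "pubmed_central": {"10.9999/example.001": 45},
    "institutional_repository": {"10.9999/example.001": 18},
}

total_usage = Counter()
for counts in platform_counts.values():
    total_usage.update(counts)

print(total_usage["10.9999/example.001"])  # 183
```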
PIRUS: mission and project aims
Mission
• To develop a global standard to enable the recording, reporting and consolidation of online usage statistics for individual journal articles hosted by Institutional Repositories, Publishers and other entities
Project aims
• Develop COUNTER-compliant usage reports at the individual article level
• Create guidelines which, if implemented, would enable any entity that hosts online journal articles to produce these reports
• Propose ways in which these reports might be consolidated at a global level in a standard way
PIRUS: benefits





• Reliable usage data will be available for journal articles, wherever they are held
• Repositories will have access to new functionality from open source software that will allow them to produce standardised usage reports from their data
• Digital repository systems will be more integral to research and closely aligned to research workflows and environments
• The authoritative status of PIRUS usage statistics will enhance the status of repository data and content
• The standard can be extended to cover other categories of content stored by repositories
PIRUS: results




• Technical: a workable technical model for the collection, processing and consolidation of individual article usage statistics has been developed
• Organizational: an organizational model for a Central Clearing House that would be responsible for the collection, processing and consolidation of usage statistics has been proposed
• Economic: the costs for repositories and publishers of generating the required usage reports, as well as the costs of any central clearing house/houses, have been calculated and a model for recovering these costs has been proposed
• Political: the broad support of all the major stakeholder groups (repositories, publishers, authors, etc.) is being sought
PIRUS: organizational issues



• Specify the governance of PIRUS going forward
• Define the nature and mission of the Central Clearing House(s) (CCH) in more detail, in discussion with publishers and repositories
• Develop a specification for the technical, organizational and business models for the CCH
PIRUS2: governance going forward
Principles
• Independent, not-for-profit organization
• International
• Representation of the main stakeholder groups
  • Repositories
  • Publishers
  • Research Institutions
Role
• Define and implement mission
• Strategic oversight
• Set and monitor standards
• Set fees and manage finances
• Select and monitor suppliers
PIRUS2: nature and mission of the
Central Clearing House(s)

• One global CCH
  • Cost-effective
  • Industry is global, with global standards
  • Easier to set and modify standards
  • Simpler interface with publishers and repositories
  • Can be outsourced
  • Organizations with the required capabilities already exist
Scenarios to be supported

• See next slide
Step 1: a full-text article is downloaded
Step 2: tracker code invoked, generating an OpenURL log entry

Scenario A
Step A1: OpenURL log entries sent to the CCH responsible for creating and consolidating the usage statistics
Step A2: logs filtered by COUNTER rules
Step A3: COUNTER-compliant usage statistics collected and collated per article (DOI) in XML format
Step A4: COUNTER-compliant usage statistics available from the CCH to authorized parties

Scenario B
Step B1: OpenURL log entry sent to local server
Step B2: OpenURL log entries harvested by the CCH responsible for creating and consolidating usage statistics
Step B3: logs filtered by COUNTER rules
Step B4: COUNTER-compliant usage statistics collected and collated per article (DOI) in XML format
Step B5: COUNTER-compliant usage statistics available from the CCH to authorized parties

Scenario C
Step C1: OpenURL log entry sent to local server
Step C2: logs filtered by COUNTER rules
Step C3: COUNTER-compliant usage statistics collected and collated per article (DOI) in XML format
Step C4: COUNTER-compliant usage statistics available from the repository or publisher to the CCH
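The common core of these three scenarios can be sketched briefly. The example below is illustrative only: the log-entry fields, the simple double-click filter and the data are assumptions standing in for the OpenURL log format and the full COUNTER filtering rules.

```python
# Sketch of the shared pipeline: raw log entries are filtered by a simple
# double-click rule and then collated into per-DOI download counts, which
# would subsequently be reported (e.g. as COUNTER XML) by the CCH.
from collections import Counter

# Hypothetical log entries: (timestamp in seconds, requesting session, DOI).
log_entries = [
    (100, "sess-1", "10.9999/example.001"),
    (105, "sess-1", "10.9999/example.001"),  # double-click: discarded by the filter
    (900, "sess-2", "10.9999/example.001"),
    (950, "sess-3", "10.9999/example.002"),
]

def filter_double_clicks(entries, window=30):
    """Keep a repeat request for the same DOI from the same session only if it
    arrives more than `window` seconds after the previous one (a stand-in for
    the full COUNTER filtering rules)."""
    last_seen, kept = {}, []
    for timestamp, session, doi in sorted(entries):
        key = (session, doi)
        if key not in last_seen or timestamp - last_seen[key] > window:
            kept.append((timestamp, session, doi))
        last_seen[key] = timestamp
    return kept

# Collate the filtered usage per article (DOI), ready for reporting.
per_doi = Counter(doi for _, _, doi in filter_double_clicks(log_entries))
print(per_doi)  # Counter({'10.9999/example.001': 2, '10.9999/example.002': 1})
```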
PIRUS2: CCH operating principles



• The “bucket” of usage data should be controlled by the participants: they can decide whether to compile the usage reports themselves or to delegate that role to the CCH
• Access to the CCH should be limited to authorised parties
• Usage reports must state the sources from which they have been compiled, to ensure transparency
PIRUS: outputs from the CCH
• Usage reports for publishers
• Usage reports for repositories
• Usage reports for research institutions
Key requirements:
• Set of core reports
• Flexibility in outputs
PIRUS: summary
A prototype service that meets the following criteria:



• A workable technical model, refined from that proposed in the original PIRUS project, with more extensive tests with a larger and more diverse data set
• A practical organizational model based on co-operation between proven, existing suppliers of data processing, data management and auditing services that meets the requirement for an independent, trusted and reliable service. It is clear from a survey carried out at the end of this project that the majority of publishers are not, largely for economic reasons, yet ready to implement or participate in such a service
• An economic model that provides a cost-effective service and a logical, transparent basis for allocating costs among the different users of the service. While this economic model is based on costs that vendors of usage statistics services have validated as reasonable, there is strong resistance from publishers to accepting these costs

Further information: http://www.cranfieldlibrary.cranfield.ac.uk/pirus2/tiki-index.php
COUNTER Release 4 - objectives
• A single, unified Code covering all e-resources, including journals, databases, books, reference works, multimedia content, etc.
• Improve the application of XML and SUSHI in the design of the usage reports
• Take into account the outcomes of the Journal Usage Factor and PIRUS projects
COUNTER Release 4: timetable and development process
• April 2011: announcement of objectives, process and timetable for the development of Release 4; open invitation to submit suggestions
• April-June 2011: evaluation of submitted suggestions by the COUNTER Executive
• June-September 2011: development of Draft Release 4
• October 2011: publication of Draft Release 4
• October 2011-January 2012: comments received on Draft Release 4
• March 2012: publication of Release 4
• December 2013: deadline for implementation of Release 4 by vendors
Further information: http://www.projectcounter.org