Implications of Release 3 of the COUNTER Code of Practice

Vendor Usage Reports: Are we all on the same
page now?
Charleston Conference
November 6, 2008
Oliver Pesch
EBSCO Information Services
opesch@ebsco.com
Overview
• Recap of Release 3 of the Codes of Practice
• New reports
• Data processing and related rules
• Delivery of reports
• Audit and compliance
Recap of Release 3 of the Journals and Databases Code of
Practice
Key features…
• New reports
- Journal Report 1a (full text requests by archive)
- Journal Report 5 (breakdown by year of publication)
- Consortium reports (for full text requests by title and searches by
database with breakdown by consortium member)
• Data processing
- Federated searching
- Internet robots and archives like LOCKSS
• Reports must be available in XML format
• Revised COUNTER XML Schema
• SUSHI support becomes a requirement for compliance
Journal and Database Code of Practice: Reports
• Journal Report 1
- Full text article requests by month and journal
• Journal Report 1a
- Full text article requests from an archive by month and journal
• Journal Report 5
- Full text article requests by year of publication and journal
• Journal Report 2
- Turnaways by month and journal
• Consortium Report 1
- Full text journal article/book chapter requests by month (XML only)
• Consortium Report 2
- Searches by month and database (XML only)
• Database Report 1
- Searches and sessions by month and database
• Database Report 2
- Turnaways by month and database
• Database Report 3
- Searches and sessions by month and service
CoP section 4.1
Journal Report 1a or Journal Report 5: addressing archives
• Librarians want to determine the value of archives and back files
• Journal Report 1 reflects overall usage for a title but not which years
of content were accessed
• Either Journal Report 1a or Journal Report 5 must be provided
• If the archive or back file is managed and accessed as a separate
entity, then Journal Report 1a can be provided
• If the archive or back files are not distinguishable from the rest of the
content, then Journal Report 5 is required.
CoP section 4.1, page 19
Journal Report 1a
CoP section 4.1, page 19
Journal Report 5
CoP section 4.1, page 19
Consortium Reports
• The consortium administrator needs:
- Statistics on overall use of the materials acquired by the
consortium;
- Broken down by participating member institution
• They can use this information to:
- Judge the overall “value” of a particular deal
- Determine the value to a particular member
- Assist with distribution of cost
- Identify members who may not be maximizing their use of
materials provided by the consortium
• The previous releases of the Code of Practice were not clear on
consortium reports.
• The result was a lot of work and uneven results
- Some vendors provided reports that were merely a summary of
all activity without a breakdown by member
- Some vendors required the consortium administrator to request
reports for each member institution one-at-a-time without the
option for a summary report
- Some vendors even required separate logins for each member
• The new Consortium Reports and related requirements address
these issues.
CoP section 4.1, page 19
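The two views a consortium administrator needs (overall use, and use broken down by member) can be sketched as a simple aggregation. This is a minimal illustration, not a COUNTER report format: the function name `summarize_consortium_usage`, the row layout, and the member names and counts in the usage note are all made up for the example.

```python
from collections import defaultdict

def summarize_consortium_usage(rows):
    """Aggregate (member, title, requests) rows.

    Returns (by_member, overall, share), where share is each member's
    fraction of overall use (one possible input for cost distribution).
    """
    by_member = defaultdict(int)
    for member, _title, requests in rows:
        by_member[member] += requests
    overall = sum(by_member.values())
    # Guard against division by zero when there is no usage at all.
    share = {m: (n / overall if overall else 0.0) for m, n in by_member.items()}
    return dict(by_member), overall, share
```

With the hypothetical rows ("Member A", "Journal A", 30), ("Member A", "Journal B", 10) and ("Member B", "Journal A", 60), the function reports 40 requests for Member A, 60 for Member B, and 100 overall.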
Consortium Report 1 (approximation)
Data Processing and Related Rules
• Comparability and consistency of usage statistics can be
compromised depending on the handling of:
- Double clicks
- Return codes to count
- Federated Search Engines
- Internet robots
- LOCKSS and similar cache systems
CoP section 5
Data Processing and Related Rules: Double clicks
• Issue:
- The user clicking a PDF or other link several times can
unintentionally inflate the usage counts.
• Solution:
- Introduce a double-click filter
- If the user re-clicks the same link to a PDF document within a
30-second window, ignore the first click.
- If the user re-clicks the same link to an HTML document within a
10-second window, ignore the first click.
CoP section 5, pg 32
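The double-click rule above can be sketched as a small filter. This is an illustration under stated assumptions, not the Code of Practice's reference implementation: the `Request` record and its fields are hypothetical, a "user" is whatever key the vendor uses to identify a session, and a click exactly on the window boundary is treated as a double click.

```python
from dataclasses import dataclass

# Windows from the slide: 30 seconds for PDF, 10 seconds for HTML.
WINDOW_SECONDS = {"pdf": 30, "html": 10}

@dataclass(frozen=True)
class Request:
    user: str         # hypothetical session/user key
    url: str
    fmt: str          # "pdf" or "html"
    timestamp: float  # seconds

def filter_double_clicks(requests):
    """Drop the earlier click of each double click; keep the later one."""
    kept = []
    last_index = {}  # (user, url) -> position in `kept` of the latest retained click
    for req in sorted(requests, key=lambda r: r.timestamp):
        key = (req.user, req.url)
        idx = last_index.get(key)
        if idx is not None and req.timestamp - kept[idx].timestamp <= WINDOW_SECONDS[req.fmt]:
            kept[idx] = req  # ignore the first click, retain this one
        else:
            kept.append(req)
            last_index[key] = len(kept) - 1
    return sorted(kept, key=lambda r: r.timestamp)
```

Note that in this sketch a rapid series of clicks, each within the window of the previous one, collapses to a single counted request.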
Data Processing and Related Rules: Return Codes
• Issue:
- With progressive display of PDF documents, the content server
delivers the document in pieces. Each piece sent generates an
entry in the web transaction log. If all such log entries are
counted, the result is excessive counting of full text requests.
• Solution:
- The “return code” recorded in the transaction log indicates
whether the request is the first piece of the document or a
successive one.
- Only count transactions with a return code of 200 or 304.
CoP section 5, pg 32
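The return-code rule amounts to a filter on the HTTP status recorded in the log. A minimal sketch, assuming each log entry has already been parsed into a dict with a hypothetical `status` field:

```python
# Return codes that count, per the slide above.
COUNTABLE_STATUS = {200, 304}

def count_full_text_requests(log_entries):
    """Count only entries whose return code is 200 or 304."""
    return sum(1 for entry in log_entries if entry["status"] in COUNTABLE_STATUS)
```

For example, a PDF delivered progressively might appear in the log as one 200 entry followed by several 206 (Partial Content) entries; only the 200 entry is counted.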
Data Processing and Related Rules: Federated Search
• Issue:
- In order to anticipate an end user’s needs, most federated
search engines are configured to always search many
databases and content sites. As a result, search counts for such
sites are extremely high and are no longer indicative of specific
user action, making comparison difficult.
• Solution:
- In general the federated search engine can be detected either
because of its use of a specific API or login at the content site, or
by looking at the browser ID in the web log.
- Content providers must attempt to identify searches coming from
federated search products.
- The Searches_Federated metric has been introduced into the
Database Reports to allow regular and federated searches to be
presented separately.
- If sessions are counted for federated search activity, then these
too should be presented separately as Sessions_Federated.
CoP section 5, pg 33
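Separating federated from regular searches by browser ID could look like the sketch below. The marker strings are placeholders, not real product identifiers, and the `Searches_Regular` key is a name chosen for this example; only `Searches_Federated` comes from the Code of Practice.

```python
# Hypothetical substrings that identify federated search products in the browser ID.
FEDERATED_AGENT_MARKERS = ("examplefedsearch", "samplemetasearch")

def is_federated(user_agent):
    """True if the browser ID matches a known federated search marker."""
    ua = user_agent.lower()
    return any(marker in ua for marker in FEDERATED_AGENT_MARKERS)

def tally_searches(search_events):
    """Split search counts into regular and federated."""
    counts = {"Searches_Regular": 0, "Searches_Federated": 0}
    for event in search_events:
        key = "Searches_Federated" if is_federated(event["user_agent"]) else "Searches_Regular"
        counts[key] += 1
    return counts
```

A vendor that identifies federated traffic through a dedicated API or login instead would replace `is_federated` with a check on that signal.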
Data Processing and Related Rules: Internet robots
• Issue:
- Robots and “crawlers” sometimes mine a content site, either to
index data or to perform text analysis (or sometimes for
illegitimate reasons). Counting such search
and retrieval activity (if it takes place from within the IP range of
an institution) can greatly inflate usage data.
• Solution:
- It is possible to identify many internet robots and web crawlers
by the browser ID that is logged in the web log.
- All search activity from identified internet robots must be
excluded from COUNTER reports.
CoP section 5, pg 33-34
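Excluding identified robots before counting can be sketched as follows. The marker list is illustrative only; a real deployment would maintain its own list of known robot browser IDs, and the event layout (a dict with a `user_agent` field) is an assumption.

```python
# Illustrative substrings only; not an authoritative robot list.
ROBOT_MARKERS = ("bot", "crawler", "spider")

def exclude_robots(events):
    """Return only the events whose browser ID does not look like a robot."""
    return [
        event for event in events
        if not any(marker in event["user_agent"].lower() for marker in ROBOT_MARKERS)
    ]
```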
Data Processing and Related Rules: LOCKSS
• Issue:
- LOCKSS (Lots of Copies Keeps Stuff Safe) and similar caches
are frequently implemented by institutions to ensure they have
archival copies of content to which they have perpetual rights.
The issue is that the LOCKSS cache will periodically re-retrieve
all its contents to make sure the cache is “fresh”. Counting such
activity as “full text requests” can grossly inflate usage numbers.
• Solution:
- Usually LOCKSS and similar caches access the content through
either an API or a specific IP address within the institution’s
range of IP addresses.
- All activity from LOCKSS or similar caches during loading or
subsequent refreshing of the cache must be excluded from
COUNTER reports.
CoP section 5, pg 34
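When a cache is known by its IP address, the exclusion can be sketched as a lookup against a list of registered cache addresses. The function name, the event layout, and the idea that the institution registers its cache addresses with the vendor are assumptions made for this example.

```python
import ipaddress

def exclude_cache_traffic(events, cache_addresses):
    """Drop events that originate from a known cache (e.g. LOCKSS) address."""
    caches = {ipaddress.ip_address(addr) for addr in cache_addresses}
    return [
        event for event in events
        if ipaddress.ip_address(event["ip"]) not in caches
    ]
```

For example, with `cache_addresses=["192.0.2.10"]` (an address from a documentation-only range), requests from 192.0.2.10 are dropped while requests from other addresses in the institution's range are still counted.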
Delivery of Reports
• Reports must be made available as a spreadsheet
(e.g. Excel or CSV format) where applicable AND
XML
• Each report should reside in a separate page or file
• Reports should be available on a password controlled
web site
• Consortium administrators must be able to access
both consolidated and individual member reports
from a single login
• Email or other alerting mechanism should be put in
place
CoP section 4.3, page 31
Delivery of Reports, cont’d
• Reports must be provided monthly, within 4 weeks of the end of
the reporting period
• All of last calendar year and the current calendar year
should be available
• XML reports must be available for harvesting via
SUSHI
CoP section 4.3, page 31
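The 4-week rule can be turned into a date check. A minimal sketch, assuming the reporting period is a calendar month and that "within 4 weeks" means 28 days after the last day of that month; the function name is hypothetical.

```python
import calendar
from datetime import date, timedelta

def delivery_deadline(year, month):
    """Latest delivery date for a monthly report: 4 weeks after month end."""
    last_day = date(year, month, calendar.monthrange(year, month)[1])
    return last_day + timedelta(weeks=4)
```

Under these assumptions, the report for January 2008 would be due by 28 February 2008.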
COUNTER XML
• COUNTER Report
- Report name
- Version/release
• Customer
- Customer ID
- Customer name
- Contact
- Institutional Identifier(s)
• Report Items
- Item Name (e.g. Title)
- Item Identifiers (e.g. ISSN)
- Publisher
- Platform
- Data type (Journal, Book)
• Performance Metrics
- Period (e.g. month)
- Category (e.g. Requests, Searches, Sessions)
- Publication Year
• Metric Instance
- Metric Type (e.g. ft_html, ft_pdf…)
- Count
COUNTER XML
Journal Report 1
  Mt Laurel Univ.
    Journal A
      Jan 2008
        Ft_html = 234
        Ft_pdf = 100
      Feb 2008
        Ft_html = 312
        Ft_pdf = 123
    Journal B
      Jan 2008
        Ft_html = 23
        Ft_pdf = 34
      Feb 2008
        Ft_html = 41
        Ft_pdf = 62
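The nesting in the Journal Report 1 example (report, customer, item, period, metric) can be sketched with Python's standard `xml.etree.ElementTree`. The element and attribute names below are illustrative, loosely following the labels on the previous slide; they are not taken from the actual COUNTER XML Schema, which should be consulted for real reports. The last Journal B value is read as 62.

```python
import xml.etree.ElementTree as ET

# Usage figures from the Journal Report 1 example slide.
REPORT_DATA = {
    "Journal A": {
        "2008-01": {"ft_html": 234, "ft_pdf": 100},
        "2008-02": {"ft_html": 312, "ft_pdf": 123},
    },
    "Journal B": {
        "2008-01": {"ft_html": 23, "ft_pdf": 34},
        "2008-02": {"ft_html": 41, "ft_pdf": 62},
    },
}

def build_report(name, customer, items):
    """Build a nested XML tree: Report > Customer > item > period > metric."""
    report = ET.Element("Report", {"Name": name})
    cust = ET.SubElement(report, "Customer", {"Name": customer})
    for title, periods in items.items():
        item = ET.SubElement(cust, "ReportItems", {"ItemName": title})
        for period, metrics in periods.items():
            perf = ET.SubElement(item, "PerformanceMetrics", {"Period": period})
            for metric_type, count in metrics.items():
                ET.SubElement(
                    perf, "MetricInstance",
                    {"MetricType": metric_type, "Count": str(count)},
                )
    return report

root = build_report("Journal Report 1", "Mt Laurel Univ.", REPORT_DATA)
xml_text = ET.tostring(root, encoding="unicode")
```

A Consortium Report would repeat the customer level once per member institution inside the same report.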
COUNTER XML
Consortium Report 1
  Mt Laurel Univ.
    Journal A
      Jan 2008
        Ft_html = 234
        Ft_pdf = 100
      Feb 2008
        Ft_html = 312
        Ft_pdf = 123
    Journal B
      Jan 2008
        Ft_html = 23
        Ft_pdf = 34
      Feb 2008
        Ft_html = 41
        Ft_pdf = 62
  Mt Laurel College
    Journal B
      Jan 2008
Credibility: COUNTER Audit -- Status
• Current vendors had 18 months from initial compliance
to pass an audit.
• New vendors declaring COUNTER compliance will have
6 months to successfully pass an audit.
• Audit status is included in the chart showing compliant
vendors
• Current statistics on listed vendors:
- Passed audit: 70%
- Audit not yet due: 20%
- Audit overdue*: 10%
* vendors who do not have or pass an audit by the due date will be dropped
from the Register of Compliant Vendors
Thank You!
opesch@ebsco.com