PPT - ACS Division of Chemical Information

Looking back, moving forward:
Examining the impact of digitizing
the ACS archive
232nd ACS National Meeting
September 13, 2006
David Martinsen, Adam Chesler
American Chemical Society
Copyright © 2006 American Chemical Society
Example 1. Re-discovering older information
• A news article from the BBC:
http://news.bbc.co.uk/2/hi/business/4790293.stm
Example 2. Re-discovering older information
• From The Chronicle of Higher Education:
http://chronicle.com/free/v52/i33/33a04101.htm
Example 3. Re-discovering older information
• From Science:
http://dx.doi.org/10.1126/science.313.5788.744
ACS Archive:
The Definition for this Talk
• Web editions run from 1996 forward
– Based on electronic production, digital composition files
– PDF files are text-based
• Digitization of the Archive of printed issues
began with 1995 and progressed backwards to
volume 1, issue 1 of each publication (except for
Journal of Natural Products)
– Based on tiff image scans
– PDF files are image-based
Re-discovering ACS Journals Archive:
Facts and Figures
• The time frame: 1879 – 1995
• Number of journal articles: ~460,000
• Number of pages: ~2,425,000
• Number of journal issues: ~11,100
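For a rough sense of scale, the approximate totals above imply some per-article and per-issue averages. The following is a minimal sketch that uses only the rounded figures quoted on this slide; the variable names are illustrative, and the results are only as precise as those rounded inputs.

```python
# Approximate totals quoted on the slide (all rounded).
articles = 460_000
pages = 2_425_000
issues = 11_100

pages_per_article = pages / articles
articles_per_issue = articles / issues
pages_per_issue = pages / issues

print(f"pages per article:  {pages_per_article:.1f}")   # about 5.3
print(f"articles per issue: {articles_per_issue:.1f}")  # about 41.4
print(f"pages per issue:    {pages_per_issue:.1f}")     # about 218.5
```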
The ACS Journals Archive:
Procedures: Production
• Scan all pages, cover to cover
– 600 dpi black-and-white
– 400 dpi for pages with color
• For each article, keyboard metadata not available from CAS
– Article title
– Authors
– Journal name
– Volume, issue, page numbers, image numbers
The ACS Journals Archive:
Procedures: PDF specifications
• For each article, use the metadata file to find the
starting and ending page numbers, create a
PDF file using Adobe Acrobat Capture
– If the page has color or halftone, use the color image
in preference to the black-and-white.
– Use Capture’s OCR to generate text
– Store (Image+Text) PDF format
• Image layer for display
• Text layer for search
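The page-selection rule above can be sketched in a few lines of Python. This is an illustration only, not the production workflow: the `PageScan` record, the `images_for_article` helper, and the file names are hypothetical, and the OCR and PDF assembly steps (done with Adobe Acrobat Capture) are not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PageScan:
    """Hypothetical record for one scanned page."""
    page: int                          # printed page number
    bw_image: str                      # black-and-white TIFF (600 dpi)
    color_image: Optional[str] = None  # color TIFF (400 dpi), if one exists


def images_for_article(scans: List[PageScan], start_page: int, end_page: int) -> List[str]:
    """Return one image per page in [start_page, end_page], preferring color."""
    by_page = {scan.page: scan for scan in scans}
    chosen = []
    for page in range(start_page, end_page + 1):
        scan = by_page.get(page)
        if scan is None:
            # A gap here would signal a missing page in the scanned issue.
            raise KeyError(f"no scan found for page {page}")
        # Use the color image in preference to the black-and-white one.
        chosen.append(scan.color_image or scan.bw_image)
    return chosen


# Made-up example: page 385 has a color scan, page 384 does not.
scans = [
    PageScan(384, "0384_bw.tif"),
    PageScan(385, "0385_bw.tif", "0385_color.tif"),
]
print(images_for_article(scans, 384, 385))
```

Raising an error on a missing page, rather than skipping it, is a design choice in this sketch: it surfaces gaps before a PDF is built.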
The ACS Journals Archive:
Procedures: What’s an Article?
• Primary goal: Digitize and make available all of
the research articles in ACS Publications
• What’s not included?
– Covers
– Mastheads
– News stories
– A-pages
– Contents pages
But: there is hope
• Cover-to-cover scanning was done
• All those have been captured (where available)
and are awaiting re-discovery
The ACS Journals Archive:
Some challenges
• Physical challenges of locating, retrieving, and
shipping 11,000 journal issues
– When we asked for the first copies to be sent to the
scanning vendor, Iron Mountain dutifully copied the
journals, and shipped the copies to us.
– Missing issues were purchased from back journal
vendors
– A few issues were loaned by UCSB (thanks to
Chuck Huber)
The ACS Journals Archive:
Some challenges
• Even with our quality control procedures,
problems sometimes turned up, often pointed
out by users:
– A special golden jubilee issue was published in 1926,
but numbered separately from the normal issues, so
we didn’t detect any missing pages.
• This jubilee issue turns out to be quite interesting
as a historical perspective on ACS
The ACS Archive:
Historical Perspective on ACS
• The early issues of Journal of the American
Chemical Society contained abstracts of other
journals (American and foreign), as well as
listings of chemistry-related patents (American
and foreign)
– From JACS, 1879, volume 1, page 384
(http://dx.doi.org/10.1021/ja02149a600)
The ACS Journals Archive:
Some experiments
• ACS tried a number of experiments over the
years:
– For some years, Analytical Chemistry and
Environmental Science and Technology published an
abstracts-only edition, in addition to full papers.
Guess which one was sent for scanning.
– The Journal of Organic Chemistry experimented with
a miniprint section for experimental details in the mid-1970s.
The ACS Journals Archive:
Additional experiments
– Advanced ACS Abstracts (1993-1997)
• A precursor to ASAP Articles
– ACS CDROM editions (1994-1996)
• Journal of the American Chemical Society
• The Journal of Organic Chemistry
• Biochemistry
The ACS Journals Archive:
Some challenges
– Some journal issues were split into part A and part B,
although A was not specially labeled.
– The addition of new titles was not tied to a calendar
year boundary. Some volumes began with issue 1 in
one year, and ended up with issue 4 in another year.
– Some journals spun off other journals, and sometimes
recombined with them in later years. Industrial and
Engineering Chemistry specialized in this.
The ACS Journals Archive:
A lens on the culture
• During the wars, a number of hints of the impact on
science crept into the literature:
• From Industrial and Engineering Chemistry, 1917,
volume 9, page 228,
http://dx.doi.org/10.1021/ie50087a008
What is the impact of digitization at ACS?
• We know people are using it, but is it making an
impact?
• Today’s measure of whether an article is making
an impact, or not, is related to the number of
times it is cited
Average citations per article for all ACS journals
[Chart: series "Overall"; y-axis 0.00 to 60.00; x-axis 1996 to 2006]
Average citation age for all ACS journals
[Chart: series "Overall"; y-axis 0 to 20; x-axis 1996 to 2006]
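The slides do not define how the two citation measures were calculated. As one plausible reading, the sketch below assumes that "citation age" is the citing year minus the cited year, averaged over all citations made in a given year. The function name and the sample data are invented for illustration and are not ACS figures.

```python
from collections import defaultdict


def average_citation_age(citations):
    """citations: iterable of (citing_year, cited_year) pairs.

    Returns {citing_year: mean age in years of the works cited that year}.
    """
    totals = defaultdict(lambda: [0, 0])  # citing_year -> [sum of ages, count]
    for citing_year, cited_year in citations:
        entry = totals[citing_year]
        entry[0] += citing_year - cited_year
        entry[1] += 1
    return {
        year: age_sum / count
        for year, (age_sum, count) in sorted(totals.items())
    }


# Invented sample: one 2006 article cites the 1926 jubilee issue.
sample = [(2005, 2003), (2005, 1999), (2006, 1926), (2006, 2004)]
print(average_citation_age(sample))  # {2005: 4.0, 2006: 41.0}
```

A mean is sensitive to a few very old citations, as the invented 1926 reference shows. A median would be a more robust alternative if the goal is to detect a broad shift toward older material.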
Impact of the Archive on Citations
• Over all ACS journals, there is minimal impact of
the older material on citation practices.
• New articles are still citing more recent material
in preference to the older material.
– Older material has always been accessible; it is just
more convenient to get at it now.
– Newer articles are preferentially highlighted
– Ever-increasing numbers of new articles are being
released, competing with old articles for views
The Future of the Past Material
• The archive has been digitized and deployed,
but the work is not done. Some potential areas
for development and maintenance:
– Capturing of cited references, chemical information,
improved linking
– Enabling access to A-pages and other material
– Supporting information
– Format migrations, as needed: tiff, PDF, PDF/A
– XML
Acknowledgments
• The backfile team at ACS (acquiring and
shipping journals, scanning, PDF generation,
quality control, web interface)
• CAS for metadata
• The users who have provided, and continue to
provide, valuable feedback
• Contact info: d_martinsen@acs.org