Looking back, moving forward: Examining the impact of digitizing the ACS archive
232nd ACS National Meeting
September 13, 2006
David Martinsen, Adam Chesler
American Chemical Society
Copyright © 2006 American Chemical Society

Example 1. Re-discovering older information
• A news article from the BBC: http://news.bbc.co.uk/2/hi/business/4790293.stm

Example 2. Re-discovering older information
• From The Chronicle of Higher Education: http://chronicle.com/free/v52/i33/33a04101.htm

Example 3. Re-discovering older information
• From Science: http://dx.doi.org/10.1126/science.313.5788.744

ACS Archive: The Definition for this Talk
• Web editions began with 1996 forward
  – Based on electronic production, digital composition files
  – PDF files are text-based
• Digitization of the Archive of printed issues began with 1995 and progressed backwards to volume 1, issue 1 of each publication (except for Journal of Natural Products)
  – Based on TIFF image scans
  – PDF files are image-based

Re-discovering ACS Journals Archive: Facts and Figures
• The time frame: 1879 – 1995
• Number of journal articles: ~460,000
• Number of pages: ~2,425,000
• Number of journal issues: ~11,100

The ACS Journals Archive: Procedures: Production
• Scan all pages, cover to cover
  – 600 dpi black-and-white
  – 400 dpi for pages with color
• For each article, keyboard metadata not available from CAS
  – Article title
  – Authors
  – Journal name
  – Volume, issue, page numbers, image numbers

The ACS Journals Archive: Procedures: PDF specifications
• For each article, use the metadata file to find the starting and ending page numbers, then create a PDF file using Adobe Acrobat Capture
  – If the page has color or halftone, use the color image in preference to the black-and-white
  – Use Capture's OCR to generate text
  – Store in (Image+Text) PDF format
    • Image layer for display
    • Text layer for search

The ACS Journals Archive: Procedures: What's an Article?
• Primary goal: Digitize and make available all of the research articles in ACS Publications
• What's not included?
  – Covers
  – Mastheads
  – News stories
  – A-pages
  – Contents pages

But: there is hope
• Cover-to-cover scanning was done
• All those have been captured (where available) and are awaiting re-discovery

The ACS Journals Archive: Some challenges
• Physical challenges of locating, retrieving, and shipping 11,000 journal issues
  – When we asked for the first copies to be sent to the scanning vendor, Iron Mountain dutifully copied the journals, and shipped the copies to us.
  – Missing issues were purchased from back journal vendors
  – A few issues were loaned from UCSB (Thanks to Chuck Huber)

The ACS Journals Archive: Some challenges
• Even with our quality control procedures, problems sometimes turned up, often pointed out by users:
  – A special golden jubilee issue was published in 1926, but numbered separately from the normal issues, so we didn't detect any missing pages.
    • This jubilee issue turns out to be quite interesting as a historical perspective on ACS

The ACS Archive: Historical Perspective on ACS
• The early issues of Journal of the American Chemical Society contained abstracts of other journals (American and foreign), as well as listings of chemistry-related patents (American and foreign)
  – From JACS, 1879, volume 1, page 384 (http://dx.doi.org/10.1021/ja02149a600)

The ACS Journals Archive: Some experiments
• ACS tried a number of experiments over the years:
  – For some years, Analytical Chemistry and Environmental Science and Technology published an abstracts-only edition, in addition to full papers. Guess which one was sent for scanning.
  – The Journal of Organic Chemistry experimented with a miniprint section for experimental details in the mid-1970s.

The ACS Journals Archive: Additional experiments
  – Advanced ACS Abstracts (1993-1997)
    • A precursor to ASAP Articles
  – ACS CDROM editions (1994-1996)
    • Journal of the American Chemical Society
    • The Journal of Organic Chemistry
    • Biochemistry

The ACS Journals Archive: Some challenges
  – Some journal issues were split into part A and part B, although A was not specially labeled.
  – The addition of new titles was not tied to a calendar year boundary. Some volumes began with issue 1 in one year and ended with issue 4 in another year.
  – Some journals spun off other journals, and sometimes recombined with them in later years. Industrial and Engineering Chemistry specialized in this.

The ACS Journal Archive: A lens on the culture
• During the wars, a number of hints of the impact on science crept into the literature:
• From Industrial and Engineering Chemistry, 1917, volume 9, page 228, http://dx.doi.org/10.1021/ie50087a008

What is the impact of digitization at ACS?
• We know people are using it, but is it making an impact?
• Today's measure of whether an article is making an impact is tied to the number of times it is cited

[Chart: Average citations per article for all ACS journals; x-axis 1996-2006, y-axis 0.00-60.00; series: Overall]

[Chart: Average citation age for all ACS journals; x-axis 1996-2006, y-axis 0-20; series: Overall]

Impact of the Archive on Citations
• Across all ACS journals, the older material has had minimal impact on citation practices.
• New articles are still citing more recent material in preference to the older material.
  – Older material has always been accessible; it is just more convenient to get at it now.
  – Newer articles are preferentially highlighted
  – Ever-increasing numbers of new articles are being released, competing with old articles for views

The Future of the Past Material
• The archive has been digitized and deployed, but the work is not done. Some potential areas for development and maintenance:
  – Capturing cited references and chemical information, and improving linking
  – Enabling access to A-pages and other material
  – Supporting information
  – Format migrations, as needed: TIFF, PDF, PDF/A
  – XML

Acknowledgments
• The backfile team at ACS (acquiring and shipping journals, scanning, PDF generation, quality control, web interface)
• CAS for metadata
• The users who have provided, and continue to provide, valuable feedback
• Contact info: d_martinsen@acs.org