Library of Congress eScience Team (eST) Update
Board on Research Data and Information, National Academies
3 June 2010
Peter R. Young
Library of Congress

eScience Team (eST) Update Outline
• LC Science Information and Data
  – 1992: LC S&T Initiative
  – 2000: LC 21
  – 2005: Collections Policy Committee Report
  – 2009: Digital archiving
  – 2010: Twitter Archive
• eST 2010
  – Charge, Composition, Activities
  – Proof-of-Concept Geospatial Data Projects
• OSI/NDIIPP Research Data Activities
  – Role of Libraries and Archives in Data Management and Preservation

Library of Congress - 1992
• LC’s Science and Technology Initiative 1992
  – 1990 review of the Library’s science and technology information capabilities
  – Special Project Team on a National Center for Science and Technology Information
  – “LC will take the lead in the broader STI community to make it easier for industrial and educational institutions to obtain usable technical information to boost innovation and success in education.”
  – “The Library will continue to serve as ‘America’s memory’ for scientific publications in electronic, as well as paper, formats. Libraries have always played the role of supporting not only today’s journals, but also yesterday’s.”

Library of Congress - 2000
• LC 21: A Digital Strategy for the Library of Congress, Committee on an Information Technology Strategy for the Library of Congress, Computer Science and Telecommunications Board, National Research Council, 2000:
  – “The Library now needs to learn from the [National Digital Library Program] to broaden and deepen its strategic awareness of how that project can help lead to the next generation of substantially more ambitious involvement with digital information.”

Library of Congress - 2005
• 2005 Report to the Collections Policy Committee from the Special Committee to Examine the Potential Role of the Library of Congress in the Collection, Preservation and Access of Scientific Databases
  – “…the Library may decide that it is not our obligation to preserve datasets, but to see that they are preserved.”
  – “…the Committee recommends that where the work of the Congress and those supporting its work require science datasets, the Library will work to assure access to these datasets, in consultation with that user group.”
  – “The Committee recommends that LC consider serving as an archive of last resort – taking responsibility for collecting/preserving and servicing some smaller datasets created by individual researchers, which have been identified by specialists as key research sources not eligible to be archived elsewhere, and which are in scope for the Library to collect, regardless of format.”

Library of Congress - 2009
• The Library has been collecting materials from the web since it began harvesting Congressional and Presidential campaign websites in 2000. Today the Library holds more than 167 terabytes of web-based information, including legal blogs, websites of candidates for national office and websites of Members of Congress.
• In addition, the Library leads the Congressionally mandated National Digital Information Infrastructure and Preservation Program (www.digitalpreservation.gov), which is pursuing a national strategy to collect, preserve and make available significant digital content, especially information that is created in digital form only, for current and future generations.

Library of Congress - 2010
• “The Library looks at this (Twitter Archive) as an opportunity to add new kinds of information without subtracting from our responsibility to manage our overall collection. Working with the Twitter archive will also help the Library to extend its capability to provide stewardship for very large sets of born-digital materials.”
  James H. Billington, Librarian of Congress, 15 April 2010

Library of Congress Twitter Archive
• Twitter is donating its digital archive of public tweets to the Library of Congress. Twitter is a leading social networking service that enables users to send and receive tweets, which consist of web messages of up to 140 characters.
• Twitter processes more than 50 million tweets per day from people around the world. The Library will receive all public tweets, which number in the billions, from the 2006 inception of the service to the present.
• "The Twitter digital archive has extraordinary potential for research into our contemporary way of life," said Librarian of Congress James H. Billington. "This information provides detailed evidence about how technology-based social networks form and evolve over time. The collection also documents a remarkable range of social trends. Anyone who wants to understand how an ever-broadening public is using social media to engage in an ongoing debate regarding social and cultural issues will have need of this material."

Library of Congress Twitter Archive
“What is pronounced trash to-day may have an unexpected value hereafter, and the unconsidered trifles of the press of the nineteenth century may prove highly curious and interesting to the twentieth, as examples of what the ancestors of the men of that day wrote and thought about.”
Ainsworth R. Spofford, Librarian of Congress, 1864 - 1897

The Library of Congress eScience Team (eST)
• Deanna B. Marcum, Associate Librarian for Library Services, charged eST in 2009:
  – To develop collection strategies for digital science resources and data appropriate for the national library
• eST 2010 activities:
  – Launch cross-unit proof-of-concept digital data pilot projects
  – Identify opportunities for initiatives and partnerships involving digital data sets

eScience Team (eST)
1. Martha Anderson, OSI/NDIIPP
2. Bahadir Akpinar, OSI/RDC
3. Rod Atkinson, CRS
4. Ron S. Bluestone, LS/S-T-B
5. Leonard Bruno, LS/PSCD/MSS
6. Colleen R. Cahill, LS/G&M/TSS
7. Dan Cohen, LS & BRDI
8. Babak Hamidzadeh, OSI/RDC
9. John R. Hebert, LS/CS/G&M
10. Jan Johansson, CRS
11. William Lefurgy, OSI/NDIIPP
12. Debra Ozga, LS/POP/FLICC
13. Clay Readding, LS/NDMSO
14. Roberta Shaffer, LL
15. Peter R. Young, LS/CS/AD

eST Team Charge
– Draft Library strategies for digital science resources
– Recommend digital science collection policies and workflows
– Create a planning framework for digital science knowledge resources and infrastructure
– Recommend data management policies and digital knowledge resources to support data-driven science
– Meet with other agencies and organizations involved with digital science data archives and preservation

eST Team Activities – 2009-2010
• 13 March 2009: Biodiversity Heritage Library – Tom Garnett (BHL) and Martin Kalfatovic (Smithsonian Institution)
• 10 April 2009: eScience and Data Science: Preparing for the Data Avalanche - Kirk Borne (George Mason University), Tim Eastman (Plasmas International – NASA), and Dave Williams (National Space Data Center, NASA)
• 1 May 2009: Pillbox, eScience, and the Evolution of the Library - Solid Dose Pharmaceutical Photography Project - David Hale, Division of Specialized Information Services, and Terry Yoo, National Library of Medicine - National Institutes of Health
• 29 May 2009: Board on Research Data and Information - Paul Uhlir (National Academies)
• 17 July 2009: G. Sayeed Choudhury, Associate Dean of University Libraries, Johns Hopkins University
• 24 July 2009: Chris L. Greer, Director, National Coordination Office for Networking and Information Technology Research and Development, National Science and Technology Council
• 9 October 2009: National Archives and Records Administration - Michael Kurtz and Laurence Brewer
• 13 October 2009: Pam Bjornson, Director General, Canada Institute for Scientific and Technical Information
• 15 January 2010: Corporation for National Research Initiatives (CNRI) - Bob Kahn and Allen Sears
• 10 March 2010: UCLA Department of Information Studies videoconference: “The Role of Libraries and Archives in Data Management”
• 23 April 2010: Smithsonian Institution – Len Hirsch, Office of the Under Secretary for Science, and James Smith, Senior Research Analyst

eST 2010 Activities
• Proof-of-Concept projects:
  – Geospatial Data Sets
    • LC Geography & Map Division data
    • Congressional Research Service – Congressional Geospatial Data System
    • NDIIPP Partner data
      – University of California, Santa Barbara
      – North Carolina State Library
• Library Services – Office of Strategic Initiatives – CRS collaboration to characterize digital science data workflow management and archival requirements

LC-OSI/NDIIPP & Research Data Activities
• National Geospatial Digital Archives – Stanford University and UC Santa Barbara
  – Geospatial data and images collection preservation and policy agreements among partners
• North Carolina Geospatial Data Archiving Project – North Carolina State University Libraries and North Carolina Center for Geographic Information and Analysis
  – State and county agencies partnership for developing data creation practices for preservation and access of at-risk data
• Geospatial Multistate Archive and Preservation Project – North Carolina Center for Geographic Information and Analysis
  – Expand state government capability to provide long-term access to geospatial data and test geographically dispersed content-exchange
• Geospatial Data Preservation Clearinghouse – Center for International Earth Science Information Network, Columbia University
  – Develop web-based resource of tools, standards, best practices for geospatial data preservation
• Data-PASS Project – Inter-University Consortium for Political and Social Research, University of Michigan
  – Acquire and preserve at-risk social science data and test distributed network

eST 2010 Geospatial Data Pilot Projects
• Transfer of several digital geospatial pilot project data sets to investigate and analyze these data sets as possible models for eScience data.
• eST study questions about the target digital geospatial data sets will determine the nature and scope of the challenges involved with digital content of this kind.
• eST study project initiatives will provide a deeper understanding of the Library’s role related to eResearch and eScience.
• eST study project will clarify workflow, policy, and technical issues related to the transfer, ingest, management, and access of digital data sets.

LoC Repository Service Development

The Role of Libraries and Archives in Data Management & Preservation
• Sustainability
• Scalability
• Workflow integration
• Lifecycle management
  – Preservation
  – Conservation
  – Curation
• Costs
• Access tools
• Use policies
• Interdisciplinary skill requirements
• Links with communities-of-practice