Digital Planning and Implementation Team
Primary Goals & Assumptions
October 2008

The University Library is responsible for the long-term management and survival of an increasing amount of digital content that represents the University’s investments in collections and other materials that support the institution’s educational and research mission. While the IDEALS repository supports the deposit by faculty and other researchers of scholarship produced under the auspices of Illinois, the Library must put in place a similar set of technologies and services for the digital content that represents its collections and other materials. The development of a digital library repository is an integral component of the Library’s ongoing stewardship responsibility to the University.

In order to plan for a centralized repository function, the University Library intends to charge a team to assess program and technical needs, within the Library and across stakeholder units and programs. The Repository Planning Team will lead the general planning and brainstorming of a “centralized repository system” (CRS) for the University Library. Although the singular term “repository” is used, this CRS may consist of one or more disparate systems which work together to manage and preserve the Library’s digital content.

The Repository Planning Team is charged with assessing current Library needs, establishing general system requirements and performing an environmental scan to identify possible solutions. The primary goal is to provide the Repository Implementation Team with a staged plan (and intermediate goals) for prototyping and establishing a centralized repository system.

The Repository Planning Team prepares updates, meets periodically with the AULs, and submits a report with recommendations to the AULs and CAPT. The AUL for IT is responsible for coordinating the budget and program statement in preparation for the implementation phase.

Repository Planning Assumptions

1. Despite its name, the “centralized repository system” (CRS) will likely consist of more than one software solution. Therefore, we are not looking to identify a single software candidate as the best and only system. We recognize that different types of content will likely require different types of systems.
2. The term “repository” in CRS is used generically as a “place to store/manage digital content.” The software candidates for the CRS should not be limited to common digital repository software (e.g. DSpace, Fedora, etc.). Simpler solutions (e.g. filesystems, databases) should also be investigated.
3. The CRS is primarily concerned with the management, curation and preservation of finalized Library digital content (i.e. content that is ready for dissemination and is generally stable or not changing frequently). Although the needs of “living” digital content may be worth a brief discussion, this content will likely not be accepted until much later.
4. At a higher level, the CRS will be modeled after the OAIS Reference Model, and attempt to follow guidelines laid out by the Trustworthy Repositories Audit & Certification checklist.
5. Although there may be a public web-access interface, the primary role of the CRS is to be a backend management system, not another access system.
6. Digital content residing in an existing access system (e.g. IDEALS, Content DM, etc.) will likely remain in that location. However, a preservation copy of that content will likely be provided to the CRS by these access systems.
7. The CRS will provide a secure log-in (likely based on NetID).
8. Digital content stored in the CRS may have different levels of permissions. Some content may be publicly accessible, and other content may be restricted to smaller groups of users.
9. Some digital content stored in the CRS may not be important to preserve for the long term. The CRS will likely need a notion of an “expiration date” which could be assigned to temporary content.
10. The CRS must be able to maintain identifiers (e.g. “handles” assigned by LSDWG or IDEALS).
11. The CRS must be able to store relationships between files (even if the files exist in different systems). For example, it should be able to note that a PDF in IDEALS is another representation of a set of JPEG2000 images in the CRS.
12. The CRS need not store every important digital file of the Library. However, it’s recommended that the CRS be able to store identifiers to important digital content stored elsewhere (e.g. content deemed appropriate for the Hathi Trust may not need to be duplicated in the CRS, but we may wish to store the identifiers to that content).
13. The CRS needs to have a public API which would allow other (Library or non-Library) services to access any publicly available content.
14. The CRS will be implemented in many stages. It is important that we move quickly and take smaller steps, rather than wait to find a solution that will immediately meet all our needs. The initial implementation will likely be a “bare bones” solution, which may not meet the needs of all types of content. However, we should plan that “bare bones” solution such that we can extend it in the future.
15. In order to move quickly, we should concentrate first on Library needs, but with input from stakeholders outside of the Library so that we can extend the CRS as needed.
16. After each stage of implementation, the Repository Planning Team (and others in the Library) will have an opportunity to assess the CRS and suggest implementation or directional changes.
17. Even though the CRS will require a large amount of storage (i.e. disk space), planning should concentrate on access and preservation needs of digital content, and an environmental scan of solutions which could meet those needs.
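
As an illustration only, assumptions 8 through 12 above could be sketched as a minimal record model. This is not a proposed design: the class names, field names, permission levels, relationship vocabulary and identifier values below are all hypothetical, chosen only to show how permissions, an expiration date, maintained identifiers, cross-system relationships and pointer-only records might fit together.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import List, Optional


class Access(Enum):
    """Permission levels (assumption 8). The level names are hypothetical."""
    PUBLIC = "public"
    RESTRICTED = "restricted"


@dataclass
class Relationship:
    """A link to another item, possibly held in a different system (assumption 11)."""
    kind: str           # hypothetical vocabulary, e.g. "is_representation_of"
    target_id: str      # identifier of the related item
    target_system: str  # system holding the related item, e.g. "CRS" or "IDEALS"


@dataclass
class Record:
    identifier: str                    # maintained identifier, e.g. a handle (assumption 10)
    system: str = "CRS"                # system holding the content; any other value
                                       # makes this a pointer-only record (assumption 12)
    access: Access = Access.RESTRICTED
    expires_on: Optional[date] = None  # None means no expiration date (assumption 9)
    relationships: List[Relationship] = field(default_factory=list)

    def is_expired(self, today: date) -> bool:
        """True only when an expiration date is set and has been reached."""
        return self.expires_on is not None and today >= self.expires_on

    def is_stored_locally(self) -> bool:
        """True when the content itself lives in the CRS."""
        return self.system == "CRS"


# Hypothetical example: a PDF in IDEALS that is another representation
# of a set of JPEG2000 images held in the CRS.
images = Record(identifier="example/1001")
pdf = Record(identifier="example/2002", system="IDEALS", access=Access.PUBLIC)
pdf.relationships.append(
    Relationship("is_representation_of", images.identifier, "CRS")
)
```

Keeping the holding system as a field on every record is one way to let the CRS track content it does not store itself; whether relationships belong on the record or in a separate store would be a question for the environmental scan.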
Key Milestones

To be decided… (6 months after first meeting) Initial report/recommendations of Planning Team

Issues under Discussion

1. Storage Space. Library digital content and the CRS may or may not reside on Library servers. Based on decisions of the Planning Team, it’s possible some content may reside in the Hathi Trust repository. Depending on the space required by remaining content, it may also be worth a discussion with CITES or NCSA about server space, etc.
2. Ongoing Staffing/Support. Based on determinations of the Planning Team, there will need to be a broader recommendation on how best to provide ongoing support for the CRS and the content within the CRS.
3. Purchased Digital Content. Is the CRS the most appropriate place for E-books and electronic journals that the Library has purchased?
4. Intellectual Property. How will the CRS deal with content that may have IP/copyright issues? Will we need some sort of license agreement for content being disseminated by the CRS?

Primary Goals of Planning Team

- Assess the Library’s needs for a CRS.
- Work with “Content/User Stakeholders” (see below) to determine the scope of content and usage needs for the CRS.
- Generate a list of common questions to ask these stakeholders (build off of Purdue’s Data Curation questions?).
- Assess the types of digital content to be placed in the CRS, and decide scope of initial stage.
- Establish higher level needs for preserving these distinct digital content types (Video, Audio, Images, etc.).
- Establish a list of general requirements for a CRS (in terms of access needs, preservation needs, API needs, etc.).
- Perform an environmental scan of possible solutions or software available.
- Scope implementation into a series of higher level stages.
- Help establish higher level success criteria for each implementation stage.
- Reconvene and assess the success of the CRS after each stage. Provide suggestions for implementation or directional changes.
Deliverables

- Report of CRS requirements and needs, based on types of content stored. This includes a list of all known content to be placed in the CRS. This may also include a higher level diagram of how content will be obtained or disseminated via the CRS.
- Report of environmental scan. Recommendations regarding which software solution(s) we should be prototyping or investigating in further detail.
- Recommendations for staged implementation of the CRS: how many stages, general timelines, proposed staffing of Implementation Team, etc.
- A written assessment of the initial prototype, and recommendations on how to move forward with full implementation.

Planning Team Membership

Tim Donohue, chair
Tom Habing
Joanne Kaczmarek
Emma Lincoln
Bill Mischo
Sarah Shreeves
Tom Teper
John Weible
Beth Sandore (Administrative Liaison)

Content / User Stakeholders

Although they may not be official members of the Repository Planning Team, these stakeholders would be worth interviewing about their content or usage needs for a Library CRS. (Tentatively, Beth Sandore has suggested we may be able to obtain a part-time Grad Hourly to help conduct these interviews.)

Allen Renear and Amit Kumar (GSLIS / MONK project)
Lisa Hinchliffe (Learning Objects)
Betsy Kruger (Digital Content Creation / Digitization)
Chris Prom (Archives / Archon System)
Mary Stuart (History, Philosophy & Newspaper Librarian)
Miranda Remnek (Slavic Library)
Scott Wilson (Materials Chemistry Lab / Scientific content, esp. chemistry/crystals)
Michelle Wander (Morrow Plot)
Charlie Kline or Bob Booth (CITES: potential storage partners?)
Michael Grady, CIO’s Office (cyberinfrastructure research needs)
Representative from the Provost’s office (Vice Provost or her designee)
Jim Myers and Michael Welge, NCSA (IACAT programs)
Representative from I3
Rebecca Bryant, Assistant Dean, Graduate College