MacKenzie Smith
Associate Director for Technology
MIT Libraries

Agenda
- Introduction
- DSpace demo
- Technical architecture
- Organizational model
- MIT case study
- DSpace Federation
- Q&A at the end of each presentation
- General Q&A at the close

DSPACE INTRODUCTION

DSpace Vision (1999)
- A federated repository that makes available the collective intellectual resources of the world’s leading research institutions

Mission
- Create a scalable digital archive that preserves and communicates the intellectual output of MIT’s faculty and researchers
- Support adoption by and federation with other research institutions

DSpace is…
- An open source technology platform
- A service model for open access and/or digital archiving
- A platform to build an Institutional Repository
- A (proposed) federation of digital repositories across multiple academic research institutions
- A production service of the MIT Libraries to the local research community

Institutional Repositories
- Institution-based
- Scholarly material in digital formats
- Cumulative and perpetual
- Open and interoperable

The DSpace Repository
- Institutional Repository for MIT faculty’s digital research materials
- MIT Libraries - Hewlett Packard Research Labs collaborative development project
- Open Source system
- Federated system
- Preservation archive

DSpace Functions
- Captures
  - Digital research material (any format)
  - Directly from creators (e.g. faculty)
  - Large-scale, stable, managed long-term storage
- Describes
  - Descriptive, technical, rights metadata
  - Persistent identifiers
- Distributes
  - Via WWW, with necessary access control
- Preserves

Possible Content
- Preprints, articles
- Technical Reports
- Working Papers
- Conference Papers
- E-theses
- Datasets
  - statistical, geospatial, MATLAB, etc.
- Images
  - visual, scientific, etc.
- Audio files
- Video files
- Learning Objects
- Reformatted digital library collections

Why Libraries?
- Expertise
  - Large-scale collection management
  - Assessment/collection policies
  - Preservation
  - Metadata
  - Solid business practices
- Commitment
  - Long time frames
  - Mission scope

CHALLENGES

Challenges
- Faculty Acceptance
  - Valuing and trusting an institutional archive
- Sustainability
  - institutional, financial
- Digital Preservation

Digital Preservation: Philosophy
- Lots of digital material is already lost
- Most digital material is at risk
- Better to have it, do bit preservation, than to lose it completely
- Need to capture as much information as possible to support functional preservation
- Cost/benefit tradeoffs

Digital Preservation: MIT’s commitment levels
- Known/supported
  - TIFF, SGML/XML, AIFF, PDF
- Known/unsupported
  - Microsoft Word, PowerPoint (common, proprietary)
  - Lotus 1-2-3, VisiCalc, WordPerfect (less common)
- Unknown/unsupported
  - One-of-a-kind software program

Digital Preservation
- Supported = migration and/or emulation
  - Migration for texts, images, audio, etc.
  - Emulation for software, multimedia?
- Unsupported
  - Bit preservation at minimum
  - Format migration where possible
  - Commercial conversion services
- Global Digital Format Registry

DESIGN

Information Model
- Communities
  - Research units of the organization
- Collections (in communities)
  - Distinct groupings of like items
- Items (in collections)
  - Logical content objects
  - Receive persistent identifier
- Bitstreams (in items)
  - Individual files
  - Receive preservation treatment

Information Model: Versioning
- Item “versions” can be
  - All instances of a work in different formats
    - E.g. the XML, PDF, and PostScript versions
  - All editions of a work over time
    - Official changes (e.g. addenda or new release)
    - Periodic snapshots (e.g. web sites)
- Metadata lists all available versions of items

Communities
- Research units of the organization
  - Schools, Departments, Research Labs, Research Centers, Programs, etc.
  - Individuals
- Community “home page” with logo, custom description, etc.
- Or contract with library

Communities
- Local, distributed policy decisions
  - Who can contribute, access material
  - Submission workflow
    - Submitters, approvers, reviewers, editors
  - Collections definition, management
- Local, distributed production work
  - Communities supply metadata, files
- Partnership between library and communities

Communities
[Diagram. Labels: Communities (Schools, Departments, Labs, Centers, Programs); DSpace system (Submission Workflow, Archival Storage, Metadata (Database), Search/Browse, Web User Interface); School, Department, Lab, Center; Collection; Items; Users]

EDUCATIONAL TECHNOLOGY

Problem
- Lack of persistent repository for Learning Objects
- Needed for reuse of
  - Entire courses
  - Useful “learning objects”
- Prior efforts not institution-based
  - Merlot, HEAL, etc.

Open Knowledge Initiative
- Defines API for interoperation between
  - Course/Learning Management Systems
    - Open source (e.g. Coursework, Stellar)
    - Commercial (e.g. Blackboard, WebCT)
  - Digital Repositories
    - Open source (e.g. DSpace, FEDORA)
    - Commercial (e.g. TEAMS, Bulldog)
- Collaborating with IMS Digital Repository working group

OpenCourseWare
- “Make MIT course materials that are used in the teaching of almost all undergraduate and graduate subjects available on the Web, free of charge, to any user anywhere in the world.”
- “Course materials contained on the MIT OCW Web site may be used, copied, distributed, translated, and modified, but only for non-commercial educational purposes that are made freely available to other users under the same terms defined by the MIT OCW legal notice.”

OpenCourseWare
- Publication of all course content on the Web
  - Faculty-authored
  - 3rd party produced
- Metadata based on IMS specifications
- DSpace
  - Archive for entire course web site
  - Archive of significant content items or “learning assets” for rediscovery and reuse

Metadata

SIMILE
- Flexible metadata infrastructure
- HP/MIT Alliance-funded project
- e.g.
  support for IMS/SCORM schema
- HP Labs
- W3C’s Semantic Web activity
- MIT Lab for Computer Science researcher (David Karger)
  - Haystack project on personalized information management
- MIT Libraries’ DSpace providing test-bed, real-world applications

RESEARCH AGENDA

Further R&D
- Digital preservation
  - Datasets, multimedia, websites, programs
  - Economics and user requirements
- Publishing
  - E-journal alternatives
  - Collaborative, iterative authoring tools
- Rights management for academia
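
Illustration: information model and commitment levels

The Information Model slide (communities contain collections, collections contain items, items contain bitstreams) and the commitment-levels slide (known/supported, known/unsupported, unknown/unsupported) can be tied together in a small sketch. This is only an illustration under stated assumptions, not DSpace's actual schema or code: the class names, the `support_level` and `preservation_report` helpers, the `build_example` data, and the `example-id-N` identifiers are all hypothetical. The format lists contain only the examples named on the slide.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Iterator, List, Tuple


class SupportLevel(Enum):
    """The three commitment levels named on the slide."""
    KNOWN_SUPPORTED = "known/supported"
    KNOWN_UNSUPPORTED = "known/unsupported"
    UNKNOWN_UNSUPPORTED = "unknown/unsupported"


# Example formats taken from the slide; not a complete format registry.
SUPPORTED_FORMATS = {"TIFF", "SGML", "XML", "AIFF", "PDF"}
KNOWN_UNSUPPORTED_FORMATS = {
    "Microsoft Word", "PowerPoint", "Lotus 1-2-3", "VisiCalc", "WordPerfect",
}


def support_level(file_format: str) -> SupportLevel:
    """Map a format name to a commitment level (hypothetical helper)."""
    if file_format in SUPPORTED_FORMATS:
        return SupportLevel.KNOWN_SUPPORTED
    if file_format in KNOWN_UNSUPPORTED_FORMATS:
        return SupportLevel.KNOWN_UNSUPPORTED
    return SupportLevel.UNKNOWN_UNSUPPORTED


@dataclass
class Bitstream:
    """An individual file; the unit that receives preservation treatment."""
    filename: str
    file_format: str


@dataclass
class Item:
    """A logical content object; receives a persistent identifier."""
    identifier: str  # placeholder value, not a real identifier scheme
    title: str
    bitstreams: List[Bitstream] = field(default_factory=list)


@dataclass
class Collection:
    """A distinct grouping of like items."""
    name: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Community:
    """A research unit of the organization."""
    name: str
    collections: List[Collection] = field(default_factory=list)


def preservation_report(
    community: Community,
) -> Iterator[Tuple[str, str, SupportLevel]]:
    """Yield (item identifier, filename, commitment level) per bitstream."""
    for collection in community.collections:
        for item in collection.items:
            for bitstream in item.bitstreams:
                yield (item.identifier, bitstream.filename,
                       support_level(bitstream.file_format))


def build_example() -> Community:
    """Build a tiny made-up community for demonstration."""
    report = Item("example-id-1", "A technical report", [
        Bitstream("report.pdf", "PDF"),
        Bitstream("report.doc", "Microsoft Word"),
    ])
    model = Item("example-id-2", "A simulation program", [
        Bitstream("model.bin", "in-house simulator"),
    ])
    return Community("Example Lab", [Collection("Technical Reports", [report, model])])


if __name__ == "__main__":
    for identifier, filename, level in preservation_report(build_example()):
        print(identifier, filename, level.value)
```

Keeping the format on the bitstream rather than on the item mirrors the slides: an item may hold the same work in several formats, and each file can fall under a different commitment level.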