Archiving CP/M Software Project Report
Spring 2014
INF 392K Digital Archiving and Preservation

Kendra Malinowski
Emily McDonald
Marcia McIntosh
Roger Simon

Table of Contents:
Introduction
Inventory & Contacts
Submission Plan
Processing Materials
Emulation and Virtualization
Working with OpenSUSE 13.1
Metadata Selection
Division of Tasks
Special Thanks to the iSchool Faculty & Staff

Introduction

This document is a final reflective report on the CP/M software archiving project for the Spring 2014 “Digital Archiving and Preservation” class. The report details some of the problems that we encountered during the project, as well as our general reflections. Overall, we believe that we were successful in completing our task: we were able to batch ingest several CP/M programs into DSpace, and we got a virtualization and an emulation of CP/M running, so that we could test the programs.

Inventory & Contacts

An important part of the project entailed choosing which CP/M materials to preserve. A variety are available on the web, including operating-system software, user and programmer manuals and guides, utility software, emulators, and other rendering software. After inventorying some of the CP/M materials that were available (as of February 2014) on the Internet, we decided to limit our scope to collecting CP/M software, which we deemed to be the most at risk.

We contacted the administrators of three websites that had email addresses listed, and we heard back from two of them. We learned that those involved in online CP/M groups are enthusiastic about their projects, and many of them expect that those who take an interest in their projects have a certain level of knowledge about CP/M. Thus, it is a good idea to do some studying about CP/M before making contact with anyone from these groups. Also, in our initial contact emails, we simply stated that we were interested in “archiving” CP/M software.
In the future, it would be a good idea to explain such an archiving project in more detail. To many in the CP/M community, particularly those with backgrounds in computers or web development, an “archiving” project seems redundant, since they equate the availability of software on the Internet with an archive (and since some of the websites that contain CP/M software include the words “archives” or “archive” in their names). Therefore, we suggest explaining that a project like ours involves long-term preservation of materials in a digital repository supported by a university.

Submission Plan

Because we archived CP/M programs from publicly available websites, we decided that it was not necessary to draft a full SIP agreement and have it signed by the administrator of the website from which the materials were obtained. Instead, we got specific written permission (via email) to download materials that were already available for public download. We then created a relatively short and straightforward submission plan that noted the following: (1) the materials to be ingested were publicly available for download on the Internet; (2) if possible, the group got written permission to download and archive the materials; and (3) the group also asserted that it was archiving the materials under the “fair use” doctrine of United States copyright law. We suggest doing the latter because the administrator of a website that offers CP/M programs will almost certainly not have been the creator of all of the available materials.

Processing Materials

In general, it was easier to work with the CP/M materials than we had originally expected. Because the materials were available online, we could download them without worrying that we might alter their metadata or accidentally change an item.
Two of the concerns that we had during the project were the number of programs that we would be able to archive and how we could automate the various processes that we needed to go through in order to ingest the materials into DSpace. Because none of us had much experience with Linux commands or scripting, creating advanced scripts seemed beyond the scope of our project.

One of the interesting aspects of our project was that it seemed to change scope from week to week. At the beginning, when we did an inventory, we realized the variety and number of CP/M materials available on the web, including operating-system software, utilities, programs, emulators, and manuals. Thus, we decided that picking a few sample software packages to ingest would be best, and we also decided to try to get some of the software running in a CP/M emulator. Once we decided to work exclusively with materials from one website, it appeared that we could archive all of the CP/M software from that site. We then spent a few weeks working on how we could batch download all of the materials, and we did so. But after we realized the challenges in trying to identify and work with all of the different MIME types for all of the different files, we returned to our original plan of a sample ingest. We did not complete the gathering of item-level metadata until two weeks before the end of the semester. Although the continually changing scope of our project was, at times, confusing, it showed us that digital archiving is a process of trial and error, especially when working with older materials and file formats.

Emulation and Virtualization

After doing quite a bit of online research, we got the YAZE-AG emulator to run a few CP/M programs, and we got VirtualBox to boot CP/M-86. One of the major difficulties in completing these tasks was our limited knowledge of older computers.
There were online instructions about how to use the emulator, and we found some forum posts with directions for setting up VirtualBox, but these instructions were quite limited and certainly not “step through” guides. We believe that this is the case because CP/M enthusiasts have had much experience in working with CP/M (as well as working with computers in general), so that the information that they post is not necessarily intended to be detailed instructions for CP/M newcomers. In general, we found that those who have a good working knowledge of today’s computers tend to have good ideas for working with the materials. For example, even after the creator of the YAZE-AG emulator emailed us some detailed instructions, we were not able to get a CP/M program to run until Carlos Ovalle had a simple suggestion: he knew what to do because he had experience in working with the command line.

Working with OpenSUSE 13.1

We decided to work in a Linux environment because it was compatible with our emulator and VirtualBox. One early decision that we had to make was which Linux distribution to install. We learned that there are many different distributions of Linux, including Ubuntu, Debian, Fedora, and Gentoo. We chose openSUSE because the documentation for the YAZE-AG emulator recommended that we use it. We required the expertise of Shane Williams and Carlos Ovalle to help us install the system; they showed us that in order to download any other software, we would need to use a management tool called YaST. The operating system did not come with some of the programs that one would expect, such as a text editor. It also required the installation of a desktop, through which we were eventually able to save files and documents. Feeling ambitious, we selected the GNOME interface because it was different from Windows.
In retrospect, using a more familiar environment like KDE would have allowed us to work more efficiently, but we are glad that we gained experience with Linux and openSUSE.

Metadata Selection

As we began composing our Dublin Core XML documents and preparing for batch ingest, we faced some concerns regarding metadata. One of the biggest was knowing which metadata elements were best suited for the materials that were to be ingested. For example, when dealing with multiple contributors, we were unsure of how to incorporate them into DSpace. This problem was resolved when we decided to keep things simple and use the “other” qualifier. But this issue resulted in a new problem: making sure that we were correctly using the metadata fields for DSpace. For example, we had metadata indicating when the administrator of the website from which we downloaded CP/M software packages made those packages available on the website, but that did not translate well into the Dublin Core elements and qualifiers for the dates that DSpace offers. Ultimately, after discussing this problem with Dr. Galloway and among ourselves, we decided to keep things simple and populate only the elements for which we knew we had accurate information. This approach prevented us from including incorrect information in DSpace.

We created metadata spreadsheets in order to generate XML for batch ingest, but it might also be beneficial to fill out such spreadsheets for manual ingest; doing so would be a good way to create an inventory and gather data. With all of the information in one place, manual ingest may be a viable alternative to batch ingest.

Division of Tasks

Below is the breakdown of tasks that we completed for this project:
We all contributed to the inventory of CP/M resources from the Internet.
Kendra handled the correspondence with potential donors.
We all inventoried the Retrocomputing Archive website materials.
Marcia downloaded and created an installation disk for openSUSE.
Emily figured out how to download a desktop for the openSUSE interface.
We all worked together in the lab to set up our machine.
Roger wrote the submission plan.
Roger attempted a manual inventory of MIME types.
Marcia wrote the “Short and Extended CP/M History.”
Kendra wrote the “Retrocomputing Archive History.”
Kendra found scripts to batch download and process files from the Retrocomputing Archive.
We all worked together to download and process files onto our machine.
We all researched MIME types.
Emily designed the special metadata ingest spreadsheet for the software.
Roger drafted the project report.
We all looked for metadata values for the selected software.
Marcia set up the DSpace structure.
Marcia created the CP/M logo.
Emily reviewed the metadata and accepted collections in DSpace.
Emily converted the metadata from our Excel spreadsheet to XML files for the batch upload.
Kendra created the contents files for the DSpace batch upload.
Emily wrote the “Scope and Content” note for the CP/M community home page.
Marcia wrote the “Scope and Content” notes for the Sub-Communities.
Marcia researched emulators and installed YAZE-AG.
Kendra researched VirtualBox and CP/M floppy-disk images.
Kendra uploaded VirtualBox and instructions into DSpace.
Marcia uploaded the YAZE emulator into DSpace.
All members contributed to photographs.
Marcia made the project-presentation PowerPoint.
All members contributed to “Project Documentation” and “Project Report.”
Roger did final proofreading and editing of “Project Documentation” and “Project Report.”

Special Thanks to the iSchool Faculty & Staff

Shane helped us with installation and general questions about openSUSE.
Carlos helped us with the openSUSE installations, the magic CP/M-YAZE-AG script to import files, and all batch uploads.
Dr. Galloway assisted us in all parts of the project.
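
Illustrative sketch: metadata rows to a DSpace batch-ingest item

For readers who want to repeat the metadata workflow described under Metadata Selection, the following is a minimal sketch, not the script that we actually used. It assumes that each spreadsheet row has already been read into a list of (element, qualifier, value) tuples, and it writes the two files that a DSpace Simple Archive Format item directory expects: dublin_core.xml and contents. The function names, the sample values, and the directory name are hypothetical. Rows with an empty value are skipped, which mirrors our decision to populate only the elements for which we had accurate information.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def build_dublin_core(fields):
    """Build a <dublin_core> element from (element, qualifier, value) tuples.

    Tuples with an empty value are skipped so that only known metadata
    is recorded. A missing qualifier is written as "none".
    """
    root = ET.Element("dublin_core")
    for element, qualifier, value in fields:
        if not value:
            continue
        attrs = {"element": element, "qualifier": qualifier or "none"}
        dcvalue = ET.SubElement(root, "dcvalue", attrs)
        dcvalue.text = value
    return root


def write_item(item_dir, fields, bitstreams):
    """Write dublin_core.xml and a contents file into one item directory."""
    item_dir = Path(item_dir)
    item_dir.mkdir(parents=True, exist_ok=True)

    tree = ET.ElementTree(build_dublin_core(fields))
    tree.write(str(item_dir / "dublin_core.xml"),
               encoding="utf-8", xml_declaration=True)

    # The contents file lists one bitstream file name per line.
    listing = "".join(name + "\n" for name in bitstreams)
    (item_dir / "contents").write_text(listing, encoding="utf-8")
    return item_dir


if __name__ == "__main__":
    # Hypothetical sample row; the values are placeholders, not project data.
    sample_fields = [
        ("title", None, "Example CP/M Program"),
        ("contributor", "other", "Hypothetical Contributor"),
        ("date", "issued", ""),  # unknown, so it is left out of the XML
    ]
    write_item("item_000", sample_fields, ["example.zip"])
```

The bitstream files themselves (here, the placeholder example.zip) would still need to be copied into the same item directory before running the DSpace importer.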