Archiving CP/M Software Project Report
Spring 2014
INF 392K Digital Archiving and Preservation
Kendra Malinowski
Emily McDonald
Marcia McIntosh
Roger Simon
Table of Contents:
Introduction
Inventory & Contacts
Submission Plan
Processing Materials
Emulation and Virtualization
Working with OpenSUSE 13.1
Metadata Selection
Division of Tasks
Special Thanks to the iSchool Faculty & Staff
Introduction
This document is a final reflective report on the CP/M software archiving project for the Spring
2014 “Digital Archiving and Preservation” class. The report details some of the problems that we
encountered during the project, as well as our general reflections. Overall, we believe that we
were successful in completing our task: we were able to batch ingest several CP/M programs into
DSpace, and we got a virtualization and an emulation of CP/M running, so that we could test the
Inventory & Contacts
An important part of the project entailed choosing which CP/M materials to preserve—there are
a variety available on the web, including operating-system software, user and programmer
manuals and guides, utility software, emulators, and other rendering software. After
inventorying some of the CP/M materials that were available (as of February 2014) on the
Internet, we decided to limit our scope to collecting CP/M software, which we deemed to be the
most at risk. We contacted the administrators of three websites that had email addresses listed.
We heard back from two of them. We learned that those involved in online CP/M groups are
enthusiastic about their projects, and many of them expect that those who take an interest in their
projects have a certain level of knowledge about CP/M. Thus, it is a good idea to do some
studying about CP/M before making contact with anyone from these groups. Also, in our initial
contact emails, we simply stated that we were interested in “archiving” CP/M software. In the
future, it would be a good idea to explain such an archiving project in more detail. To many in
the CP/M community, particularly those with backgrounds in computers or web development, an
“archiving” project seems redundant, since they equate the availability of software on the
Internet with an archives (and since some of the websites that contain CP/M software include the
words “archives” or “archive” in their names). Therefore, we suggest explaining that a project
like ours involves long-term preservation of materials in a digital repository supported by a
Submission Plan
Because we archived CP/M programs from publicly available websites, we decided that it was
not necessary to draft a full SIP agreement and have it signed by the administrator of the website
from which the materials were obtained. Instead, we got specific written permission (via email)
to download materials that were already available for public download. We then created a
relatively short and straightforward submission plan that noted the following: (1) the materials to
be ingested were publicly available for download on the Internet; (2) if possible, the group got
written permission to download and archive the materials; and (3) the group also asserted that it
was archiving the materials under the “fair use” doctrine of United States copyright law. We
suggest doing the latter because the administrator of a website that offers CP/M programs will
almost certainly not have been the creator of all of the available materials.
Processing Materials
It was, in general, easier to work with the CP/M materials than we had originally expected.
Because the materials were available online, we could download them without worrying about
changing the metadata or accidentally altering an item.
Two of the concerns that we had during the project were the number of programs that we would
be able to archive and how we could automate the various processes that we needed to go
through in order to ingest the materials into DSpace. Because none of us had much experience
with Linux commands or scripting, creating advanced scripts seemed beyond the scope of our
One of the interesting aspects of our project was that it seemed to change scope from week to
week. At the beginning, when we did an inventory, we realized the variety and number of CP/M
materials available on the web, including operating-system software, utilities, programs,
emulators, and manuals. Thus, we decided that picking a few sample software packages to ingest
would be best, and we also decided to try to get some of the software running in a CP/M
emulator. Once we decided to work exclusively with materials from one website, it appeared
that we could archive all of the CP/M software from that site. We then spent a few weeks
working on how we could batch download all of the materials, and we did so. But after we
realized the challenges in trying to identify and work with all of the different MIME types for all
of the different files, we returned to our original plan of a sample ingest. We did not complete the
gathering of item-level metadata until two weeks before the end of the semester.
Although the continually changing scope of our project was, at times, confusing, it
showed us that digital archiving is a process of trial and error, especially when
working with older materials and file formats.
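A first pass at a MIME-type inventory like the one described above could be partly automated. The following is a minimal sketch for illustration only, not a record of the scripts used in the project. It assumes Python's standard `mimetypes` module, which guesses a type from the file extension alone, and the sample file names are hypothetical. Extensions that the module does not recognize, which is likely for many CP/M-era formats, are grouped under an "unknown" label so that they can be reviewed by hand.

```python
import mimetypes
from collections import Counter
from pathlib import PurePath


def inventory_mime_types(filenames):
    """Tally guessed MIME types for a list of file names.

    The guess is based on the file extension only. Names whose
    extension is not recognized are counted under the placeholder
    "unknown/<ext>" so they can be checked manually.
    """
    tally = Counter()
    for name in filenames:
        guessed, _encoding = mimetypes.guess_type(name)
        if guessed is None:
            ext = PurePath(name).suffix.lower() or "(none)"
            guessed = f"unknown/{ext}"
        tally[guessed] += 1
    return tally


# Hypothetical file names, for illustration only.
sample = ["readme.txt", "notes.txt", "UTIL.LBR", "README"]
for mime, count in inventory_mime_types(sample).most_common():
    print(f"{count:4d}  {mime}")
```

Because the guess relies on extensions, the result is only a starting point: the "unknown" groups show where format identification still has to be done by inspecting the files themselves.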
Emulation and Virtualization
After doing quite a bit of online research, we got the YAZE-AG emulator to run a few CP/M
programs, and we got VirtualBox to boot CP/M-86. One of the major difficulties in completing
these tasks was our limited knowledge of older computers. There were online instructions about
how to use the emulator, and we found some forum posts with directions for setting up
VirtualBox, but these instructions were quite limited and certainly not “step through” guides. We
believe that this is the case because CP/M enthusiasts have had much experience in working with
CP/M (as well as working with computers in general), so that the information that they post is
not necessarily intended to be detailed instructions for CP/M newcomers. In general, we found
that those who have a good working knowledge of today’s computers tend to have useful ideas for
working with the materials. For example, even after the creator of the YAZE-AG emulator
emailed us some detailed instructions, we were not able to get a CP/M program to run until
Carlos Ovalle had a simple suggestion: he knew what to do because he had experience in
working with the command line.
Working with OpenSUSE 13.1
We decided to work in a Linux environment because it was compatible with our emulator and
VirtualBox. One early decision that we made was to choose which Linux distribution to install.
We learned that there are many different distributions of Linux, including Ubuntu, Debian,
Fedora, and Gentoo. We chose openSUSE because the documentation for the YAZE-AG
emulator recommended that we use it. We required the expertise of Shane Williams and Carlos
Ovalle to help us install the system—they showed us that in order to download any other
software, we would need to use a management tool called YaST. The operating system did not
come with some of the programs that one would expect, such as a text editor. It also required the
installation of a desktop through which we were eventually able to save files and documents.
Feeling ambitious, we selected the GNOME interface because it was different from Windows. In
retrospect, using a more familiar environment like KDE would have allowed us to work more
efficiently, but we are glad that we gained experience with Linux and openSUSE.
Metadata Selection
As we began composing our Dublin Core XML documents and preparing for batch ingest, we
faced some concerns regarding metadata. One of the biggest was knowing which metadata elements
were best suited for the materials that were to be ingested. For example, when dealing with
multiple contributors, we were unsure of how to incorporate them into DSpace. This problem
was resolved when we decided to keep things simple and use the “other” qualifier. But this issue
resulted in a new problem: making sure that we were correctly using the metadata fields for
DSpace. For example, we had metadata indicating when the administrator of the website from
which we downloaded CP/M software packages made those packages available on the website,
but that did not translate well into the Dublin Core elements and qualifiers for the dates that
DSpace offers. Ultimately, after discussing this problem with Dr. Galloway and among
ourselves, we decided to keep things simple and populate only the elements for which we knew
we had accurate information. This approach prevented us from including incorrect information in
The reason that we created metadata spreadsheets was to create XML for batch ingest, but it might also
be beneficial to fill out such spreadsheets for manual ingest; doing so would be a good way to
create an inventory and gather data. With all of the information in one place, manual ingest may
be a viable alternative to batch ingest.
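The spreadsheet-to-XML step can be sketched in a few lines. The example below is a hedged illustration, not the conversion that was actually used. It assumes the DSpace Simple Archive Format, in which each item directory holds a `dublin_core.xml` file of `dcvalue` entries and a `contents` file that lists one bitstream file name per line. The column names, the `COLUMN_MAP` mapping, and the use of ";" to separate several contributors in one cell are all assumptions made for the example. Empty cells are skipped, so that only elements with known values are populated, and every contributor is written with the "other" qualifier.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical spreadsheet columns mapped to (element, qualifier).
COLUMN_MAP = {
    "title": ("title", "none"),
    "contributor": ("contributor", "other"),
    "date_issued": ("date", "issued"),
    "description": ("description", "none"),
}


def row_to_fields(row):
    """Turn one spreadsheet row (a dict) into (element, qualifier, value) tuples."""
    fields = []
    for column, (element, qualifier) in COLUMN_MAP.items():
        cell = row.get(column) or ""
        # Assumption: several contributors share one cell, separated by ";".
        values = cell.split(";") if column == "contributor" else [cell]
        for value in values:
            fields.append((element, qualifier, value))
    return fields


def build_dublin_core(fields):
    """Return the dublin_core.xml body; entries with empty values are skipped."""
    root = ET.Element("dublin_core")
    for element, qualifier, value in fields:
        if not value or not value.strip():
            continue
        node = ET.SubElement(root, "dcvalue", element=element, qualifier=qualifier)
        node.text = value.strip()
    return ET.tostring(root, encoding="unicode")


def write_item(item_dir, row, bitstreams):
    """Write dublin_core.xml and contents for one item directory.

    The bitstream files themselves must also be copied into item_dir;
    that step is left out of this sketch.
    """
    item_dir = Path(item_dir)
    item_dir.mkdir(parents=True, exist_ok=True)
    xml_text = '<?xml version="1.0" encoding="UTF-8"?>\n' + build_dublin_core(row_to_fields(row))
    (item_dir / "dublin_core.xml").write_text(xml_text, encoding="utf-8")
    (item_dir / "contents").write_text(
        "".join(f"{name}\n" for name in bitstreams), encoding="utf-8"
    )


# Hypothetical row; the date cell is left empty because the value is unknown.
example_row = {
    "title": "Example CP/M utility",
    "contributor": "First Person; Second Person",
    "date_issued": "",
    "description": "Sample record for illustration.",
}
print(build_dublin_core(row_to_fields(example_row)))
```

Keeping the mapping in one table means that a change of mind about an element or qualifier is made in a single place, and the same spreadsheet can still serve as an inventory for manual ingest.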
Division of Tasks
Below is the breakdown of tasks that we completed for this project:
We all contributed to the inventory of CP/M resources from the Internet.
Kendra handled the correspondence with potential donors.
We all inventoried the Retrocomputing Archive website materials.
Marcia downloaded and created an installation disk for openSUSE.
Emily figured out how to download a desktop for the openSUSE interface.
We all worked together in the lab to set up our machine.
Roger wrote the submission plan.
Roger attempted a manual inventory of MIME types.
Marcia wrote the “Short and Extended CP/M History.”
Kendra wrote the “Retrocomputing Archive History.”
Kendra found scripts to batch download and process files from the Retrocomputing Archive.
We all worked together to download and process files onto our machine.
We all researched MIME types.
Emily designed the special metadata ingest spreadsheet for the software.
Roger drafted the project report.
We all looked for metadata values for the selected software.
Marcia set up the DSpace structure.
Marcia created the CP/M logo.
Emily reviewed the metadata and accepted collections in DSpace.
Emily converted the metadata from our Excel spreadsheet to XML files for the batch upload.
Kendra created the contents files for the DSpace batch upload.
Emily wrote the “Scope and Content” note for the CP/M community home page.
Marcia wrote the “Scope and Content” notes for the Sub-Communities.
Marcia researched emulators and installed YAZE-AG.
Kendra researched VirtualBox and CP/M floppy-disk images.
Kendra uploaded the VirtualBox and instructions into DSpace.
Marcia uploaded the YAZE emulator into DSpace.
All members contributed to photographs.
Marcia made the project-presentation PowerPoint.
All members contributed to “Project Documentation” and “Project Report.”
Roger did final proofreading and editing of “Project Documentation” and “Project Report.”
Special Thanks to the iSchool Faculty & Staff
Shane helped us with installation and general questions about openSUSE.
Carlos helped us with the openSUSE installations, the magic CP/M-YAZE-AG script to import
files, and all batch uploads.
Dr. Galloway assisted us in all parts of the project.