Guidelines For Scanning University Records

Scanning, or digital imaging, is an increasingly popular strategy for dealing with records. Scanning can be a useful tool for managing your records and enhancing workflow, but it is not always a good idea. Anyone thinking of scanning their records needs to keep a number of issues in mind.

The key distinction to make when considering scanning is between access and preservation. Scanning is not inexpensive, and it is not a good strategy for long-term preservation of records. It is almost never a good idea unless it is used to create better access to the records.

What is scanning?

Scanning, also referred to as digital imaging, is a process whereby a document is converted from print to a computer-readable format. You can think of the digitized version as a photocopy that can be viewed on your computer. Digital images produced by scanning are equivalent to the photographs one produces with digital cameras: they can be transmitted, displayed, and printed, but as images they are not text searchable. To make searchable electronic text, one must either transcribe the records by typing or run optical character recognition (OCR) on the digital images after scanning.

Why would I scan records?

Scanning's great strength is as a means of providing access to records. When records need to be accessed frequently, or from remote locations, or simultaneously by multiple users, scanning can be a cost-effective means of distributing and rendering information. However, if full-text searching is required, the cost will go up considerably, due both to the OCR process itself and to the increased quality checking it requires.

Scanning records to save on storage costs is not likely to be cost-effective. Always do a full cost analysis before attempting this. An analysis is also needed before scanning records that need to be retained for a long period of time, or which are to be retained permanently.
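The storage-cost comparison described above can be sketched as simple break-even arithmetic. The following is a minimal illustration only, not an official costing tool: the function name and parameters are hypothetical, and the sample figures are the scanning-only estimates given in the cost table later in these guidelines ($0.10 or $0.35 per page, 3000 pages per box, $9.00 per box per year of storage). It deliberately ignores document preparation, quality control, and migration, all of which make scanning more expensive than this comparison suggests.

```python
from decimal import Decimal
import math


def years_to_break_even(cost_per_page: str, pages_per_box: int,
                        storage_cost_per_box_year: str) -> int:
    """Whole years of paper storage needed to equal the one-time
    scanning cost for one box (scanning step only).

    Amounts are passed as strings and handled as Decimal so that
    currency values are not distorted by binary floating point.
    """
    scanning_cost_per_box = Decimal(cost_per_page) * pages_per_box
    years = scanning_cost_per_box / Decimal(storage_cost_per_box_year)
    # Round up: storage is paid for whole years.
    return math.ceil(years)


# Sample figures taken from the cost table in these guidelines.
print(years_to_break_even("0.10", 3000, "9.00"))  # 34
print(years_to_break_even("0.35", 3000, "9.00"))  # 117
```

A full cost analysis would add the preparation, quality-control, and ongoing migration costs to the scanning side of this comparison.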
The greatest issue with managing all electronic information is technological obsolescence. The technology used to read scanned records is advancing at a great rate, and the systems needed to read your records may become obsolete long before your need for the records has ended. In this case you will need to plan and budget for periodic migrations of the records to newer systems. The digital images will also need to be reformatted as the software used to create and read them becomes obsolete. Additionally, for records with permanent retention periods, the original paper documents may need to be maintained as well as the digital images.

What is the scanning process?

The following is a quick overview of the scanning process:

Document arrangement: Prior to scanning, determine the units of organization for the digital copies. Will they mimic the arrangement of the original prints, or will they, for example, be separated from one source item into multiple documents? Scanning is not always a one (source)-to-one (digital copy) process.

Pusey Library – Harvard Yard, Cambridge, MA 02138 | T: 617.495.5961 | F: 617.495.8011 | archives_rms@harvard.edu

Document preparation: Physically clean up the documents to prepare them for scanning (remove staples, unfold paper, remove extraneous documents, etc.).

Identification: Consider what metadata (information about the documents) will need to be created to describe and organize the digital copies. This metadata may be recorded in something as simple as a file-naming convention or as complex as an indexing system. The primary reason for scanning is to facilitate access to the records, and batch-level scanning cannot automatically associate a group of digital images with a specific document or record. Plan for additional procedures, and possibly systems, to generate appropriate metadata to accompany the digital images.
Technical considerations: Decide on file formats and other technical requirements for scanning, storage, and retrieval.

Quality control: Images must be inspected to ensure that they are of good enough quality for the purpose for which they are being scanned. In some cases every image must be reviewed; in others, only a sampling.

Storage: Digital files and digital media are inherently fragile. Regardless of the storage media used, it is always prudent to make multiple copies and, ideally, to store the copies in separate locations, even during the production phase of a scanning project.

Disposition of source documents: Discard the paper once you are satisfied that the electronic records are accurate. The scanned images are generally an acceptable substitute for the original documents, provided that the scanning process has been carefully documented, the authenticity of the records has been ensured, and the images themselves are usable. Your scanning workflow must have safeguards and controls built into it so that you can assert to the satisfaction of a court that the images reproduced from the system are accurate representations of the original documents and that the information in the system has not been tampered with.

Migration, beyond storage: As discussed above, the greatest issue with scanning is technological obsolescence. A plan for forward migration of digitally imaged records must be put in place at the outset of the project and monitored as long as the records exist. Digital records are as bound by retention requirements as those in hard copy, and failing to migrate records forward while they are still within their required retention period can hurt your office in the event of litigation or audit.

How much does scanning cost?

As you might guess from the above, a scanning project is an expensive undertaking. The actual scanning of the documents is the cheapest part of the process.
Preparing the documents for scanning can easily account for one-third of the project budget, and ensuring that the records remain available over time may be even more costly. For paper records, it can take many years for cumulative storage costs to catch up to the cost of scanning. The following table gives cost estimates for the scanning step alone (which might not be the most expensive part).

COST PER PAGE (SCANNING ONLY)   PAGES PER BOX   COST PER BOX   COST FOR ONE YEAR OF STORAGE   NUMBER OF YEARS TO BALANCE COSTS
$0.10                           3000            $300.00        $9.00                          34
$0.35                           3000            $1050.00       $9.00                          117

Since scanned records must be migrated forward with hardware and software changes, you will have to budget for this on an ongoing basis. All of these numbers will vary, so any office considering scanning its records should do a full cost analysis.

What are the storage format and media issues?

Both "master" and "use" images should be created under some circumstances. The master image will be of the highest quality and used only for creating new use images. Keep a master copy if any of the following are true:
o The use images will be in a format not suitable for long-term preservation.
o Different formats of use images need to be created (e.g., a GIF image for each page for on-screen display, and a PDF version for printing). Creating copies from a higher quality image will produce the best copies.
o The printed original is being destroyed or is difficult to access.

Software and hardware obsolescence - Some aspect of the formatting (most likely operating system or file type) or hardware (most likely disk type) will require migration within 5 years or so. Migration adds to the cost, so it should be factored into cost estimates. Certain media storage formats, like magnetic tape and removable disks (e.g.
DVD) make it difficult to apply retention periods, since the whole disk or tape has to be disposed of at once, even if individual records have different retention periods.

Make backup copies
o For removable disks, make at least 2 copies of each disk and keep them in separate, secure locations. Removable disks, whether magnetic or optical, can and do fail without notice, so keeping only one copy is foolhardy, especially considering the expense of scanning.
o Server-based records should be regularly backed up. Even then, keep a long-term backup as well, since damage that occurs early in the storage period would otherwise be copied into every later backup.
o Check originals and backups regularly so that errors are discovered quickly.

There is no precise rule for deciding when to migrate records, but the principle is simple: do it while you still can. A good rule of thumb is to review stored electronic records whenever the office changes its regular software.

Storing long-term records only on removable disks (like CDs or DVDs) is generally not a good idea. Storing at least one copy online (e.g., on a file server or web server) is recommended.
o Long-term records will periodically need to be migrated to new formats and media. Migrating files kept on several removable disks is a tedious process.
o Multiple copies of each disk must be created, since removable disks can become unusable in a short period of time.

Technical specifications for the master image will depend on the project, but here are some common specifications:
o TIFF (Tagged Image File Format) is a broadly adopted file format standard applicable to black-and-white (1-bit), grayscale (8-bit), and color (24-bit) digital images.
o For color images: 24-bit RGB without compression.
o For non-color images containing illustrations: 8-bit grayscale without compression.
o For non-color images containing only text and/or line art: 1-bit with ITU-T T.6 (aka "Group 4") lossless compression.
o Capture at a high enough dpi (dots per inch) level to render the image clearly, but not higher than necessary, since that will increase the cost of storage.
o For further information, contact Records Management Services.

What are the security considerations?

For records that may be subject to audit or legal action, the process of preparing, scanning, and storing the records must be laid out firmly, with proper security precautions applied. Process is everything in making electronic records hold up in court. Security precautions should include:
o personnel access restrictions
o proper metadata
o good scanning procedures
o quality control
o making sure that the images can't be tampered with after scanning.

Consult ANSI/AIIM TR 31-2004 - Legal Acceptance of Records Produced by Information Technology Systems for information on the procedures necessary for creating and maintaining legally acceptable imaged records.

The drive within a scanner may retain copies of your images after the job is done. These drives should be wiped clean when the process is complete.

For certain types of information, permission from your CIO for a scanning project may be required. For further information on security, check the Harvard Enterprise Security, especially the section on working with vendors.

REVISED 7/9/2012
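One common way to support the points above about checking backups regularly and showing that images have not been altered is to record a checksum for every image when it is scanned and to compare against that record later. The following is a minimal sketch under stated assumptions, not a procedure prescribed by these guidelines: the function names (sha256_of, build_manifest, changed_files) are hypothetical, and SHA-256 is simply one widely used choice. A checksum only detects change; it does not prevent tampering, so the manifest itself would need to be stored separately with restricted access.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(image_dir: Path) -> dict[str, str]:
    """Map each file under image_dir (by relative path) to its checksum."""
    return {
        str(p.relative_to(image_dir)): sha256_of(p)
        for p in sorted(image_dir.rglob("*"))
        if p.is_file()
    }


def changed_files(image_dir: Path, manifest: dict[str, str]) -> list[str]:
    """List files that were altered, added, or removed since the
    manifest was recorded."""
    current = build_manifest(image_dir)
    return sorted(
        name
        for name in manifest.keys() | current.keys()
        if manifest.get(name) != current.get(name)
    )
```

In use, an office might call build_manifest once at the end of quality control, store the result with the project documentation, and run changed_files against the stored images and each backup copy on a regular schedule; an empty list means nothing has changed.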