3.0 Best Practices for Creating Digital Images

This section describes best practices for digitization of print-based original documents, including photographs, manuscripts, maps, and text. These guidelines draw heavily on previously published standards and best practices developed by standards agencies and peer institutions, particularly those of the California Digital Library. Best practices for newspaper digitization are covered in Section 5. Best practices for intellectual property rights (IPR) issues, which should be investigated before scanning materials and making them publicly accessible, are covered in Section 15.

Table of contents
3.1 Types of files produced
• Master (archival) files
• Access files
• Thumbnails
• Other files for textual materials
3.2 Minimum master image quality requirements
• Textual Documents, Graphic Illustrations/Artwork, Maps, and Plans
• Photographs: Transmissive Originals (Film, Slides, and Negatives)
• Photographs: Reflective Originals (Prints)
• Aerials: Transmissive Originals (Film, Slides, and Negatives)
• Aerials: Reflective Originals
3.3 Minimum image quality requirements for digital access and thumbnail image files
3.4 Additional resources

3.1 Types of files produced

At a minimum, digitization of library materials should result in the creation of a master (archival) image and at least one access derivative for web display. Depending on the format of the material and its anticipated use and display, other files, such as thumbnails, PDFs, and OCR text files, may also be produced. Files should be named in accordance with the best practices for file naming in Section xx of this document. As they are produced, files should be saved to a workspace that resides on a Library server that is backed up nightly.

• Master (archival) files are the source files for all other digital files and ensure the long-term usability of the digital information.
A digital master file may serve as a surrogate for the original, may completely replace originals, or may be used as security against possible loss of originals due to disaster, theft, and/or deterioration. Images are captured at a quality high enough to serve these potential uses via scanning or digital photography, depending on the attributes of the original. The digital master file should represent as accurately as possible the visual information in the original object. In general, decisions about image capture should err towards the highest quality. Files should use color rather than grayscale when color is an integral attribute of the original, and any compression applied to the file should be lossless. Accuracy and consistency in tone and color reproduction through appropriate use of scanner or camera controls is the goal.

• Access files are derived from master files and are used for presentation and transmission over networks. These images should be of good quality, but because their spatial resolution (measured in pixels per inch) is lower, the file size is smaller. Some minor post-scan adjustments to optimize image quality and to bring all images to a common rendition are acceptable. Such adjustments include the use of appropriate image processing tools to achieve final color balance and tone distribution and to sharpen scanned images to match the appearance of the originals.

• Thumbnail files are very small files used in databases or web pages. Clicking on the thumbnail image will pull up the larger original image, which can be viewed and downloaded.

• Additional files for textual materials
  • PDF (Portable Document Format) files are generally an appropriate access derivative for multi-page text documents and books. PDFs preserve the layout and formatting of original documents (including fonts and special characters, like formulas).
When making the PDF, set the compatibility to Acrobat 5.0 (PDF 1.4), embed all fonts, specify color spaces in a device-independent manner, and do not use any encryption. Use high-resolution images to create the PDF and then optimize the file for web display. PDF files of textual materials should be made full-text searchable. See complete best practices for PDF creation in Section 6 of this manual.
  • OCR (optical character recognition) text files may also be created using ABBYY FineReader. OCR should be derived from the high-resolution image files. Adobe Acrobat Professional can produce OCR for clear, high-contrast laser printed or typeset documents; however, ABBYY FineReader has a much lower error rate and can analyze text prior to recognition to produce more accurate results. ABBYY FineReader is the recommended software for creating OCR for all other types of textual documents. See complete best practices for OCR creation in Section 5 of this manual.

3.2 Minimum image quality requirements for digital masters

The imaging quality requirements for master digital images are given below. These requirements should be viewed as the minimum necessary to create quality digital images and may be exceeded when warranted and when storage space permits. Associated technical metadata should be saved to the file header.

Textual Documents, Graphic Illustrations/Artwork, Maps, and Plans

Features of original
Clear, high-contrast documents with printed type (e.g., laser printed or typeset)

Digital Master Image File
File format:
• TIFF or lossless JPEG2000
Pixel array:
• Minimum of 6000 pixels across long dimension for 1-bit bitonal mode.
• Minimum of 4000 pixels across long dimension for 8-bit grayscale.
Resolution and bit depth:
• 1-bit bitonal mode - 600 PPI for documents with smallest significant character of 1.0 mm or larger. The 600 PPI 1-bit files can be produced via scanning or created/derived from 400 PPI, 8-bit grayscale images.
- or -
• 8-bit grayscale mode - 400 PPI for documents with the smallest significant character of 1.0 mm or larger.

Features of original
Documents with poor legibility or diffuse characters (e.g., carbon copies, Thermofax/Verifax), handwritten annotations or other markings, low inherent contrast, staining, fading, halftone illustrations, or photographs

Digital Master Image File
File format:
• TIFF or lossless JPEG2000
Pixel array:
• Minimum of 4000 pixels across long dimension.
Resolution and bit depth:
• 8-bit grayscale mode - 400 PPI for documents with smallest significant character of 1.0 mm or larger.

Features of original
Documents as described for grayscale scanning and/or where color is important to the interpretation of the information or content, or where the goal is to produce the most accurate representation

Digital Master Image File
File format:
• TIFF or lossless JPEG2000
Pixel array:
• Minimum of 4000 pixels across long dimension.
Resolution and bit depth:
• 24-bit RGB mode - 400 PPI for documents with smallest significant character of 1.0 mm or larger.

Photographs: Transmissive Originals (Film, Slides, and Negatives)

Features of original
Format range:
• 35 mm and medium format, up to 4 x 5 in.
Size range:
• Smaller than 20 square in.

Digital Master Image File
File format:
• TIFF or lossless JPEG2000
Pixel array:
• 4000 pixels across long dimension of image area, excluding mounts and borders.
Resolution:
• Adjust the scan resolution to meet pixel array specifications, based on the format of the original object: approximately 2800 PPI for 35 mm originals and ranging down to approximately 800 PPI for originals approaching 4 x 5 in.
Bit depth:
• 8-bit grayscale mode for black-and-white, can be produced from a 16-bit grayscale file.
- or -
• 24-bit RGB mode for color and monochrome (e.g., collodion wet-plate negative, pyro developed negatives, stained negatives, etc.), can be produced from a 48-bit RGB file.

Features of original
Format range:
• Equal to or larger than 4 x 5 in. and up to 8 x 10 in.
Size range:
• Equal to or larger than 20 square in. and up to 80 square in.

Digital Master Image File
File format:
• TIFF or lossless JPEG2000
Pixel array:
• 6000 pixels across long dimension of image area, excluding mounts and borders.
Resolution:
• Adjust the scan resolution to meet pixel array specifications, based on the format of the original object: approximately 1200 PPI for 4 x 5 in. originals and ranging down to approximately 600 PPI for 8 x 10 in. originals.
Bit depth:
• 8-bit grayscale mode for black-and-white, can be produced from a 16-bit grayscale file.
- or -
• 24-bit RGB mode for color and monochrome (e.g., collodion wet-plate negative, pyro developed negatives, stained negatives, etc.), can be produced from a 48-bit RGB file.

Features of original
Format range:
• Equal to or larger than 8x10 in.
Size range:
• Equal to or larger than 80 square in.

Digital Master Image File
File format:
• TIFF or lossless JPEG2000
Pixel array:
• 8000 pixels across long dimension of image area, excluding mounts and borders.
Resolution:
• Adjust the scan resolution to meet pixel array specifications, based on the format of the original object: approximately 800 PPI for 8 x 10 in. originals and ranging down to the appropriate resolution to produce the desired size file from larger originals.
Bit depth:
• 8-bit grayscale mode for black-and-white, can be produced from a 16-bit grayscale file.
- or -
• 24-bit RGB mode for color and monochrome (e.g., collodion wet-plate negative, pyro developed negatives, stained negatives, etc.), can be produced from a 48-bit RGB file.

Photographs: Reflective Originals (Prints)

Features of original
Format range:
• 8x10 in. or smaller
Size range:
• Smaller than or equal to 80 square in.

Digital Master Image File
File format:
• TIFF or lossless JPEG2000
Pixel array:
• 4000 pixels across long dimension of image area, excluding mounts and borders.
Resolution:
• Adjust the scan resolution to meet pixel array specifications, based on the format of the original object: approximately 400 PPI for 8x10 in.
originals and ranging up to the appropriate resolution to produce the desired size file from smaller originals, approximately 570 PPI for 5x7 in. and 800 PPI for 4x5 in. or 3.5x5 in. originals.
Bit depth:
• 8-bit grayscale mode for black-and-white, can be produced from a 16-bit grayscale file.
- or -
• 24-bit RGB mode for color and monochrome (e.g., albumen prints or other historic print processes), can be produced from a 48-bit RGB file.

Features of original
Format range:
• Equal to or larger than 8x10 in. and up to 11x14 in.
Size range:
• Equal to or larger than 80 square in. and up to 154 square in.

Digital Master Image File
File format:
• TIFF or lossless JPEG2000
Pixel array:
• 6000 pixels across long dimension of image area, excluding mounts and borders.
Resolution:
• Adjust the scan resolution to meet pixel array specifications, based on the format of the original object: approximately 600 PPI for originals approximately 8x10 in. and ranging down to approximately 430 PPI for 11x14 in. originals.
Bit depth:
• 8-bit grayscale mode for black-and-white, can be produced from a 16-bit grayscale file.
- or -
• 24-bit RGB mode for color and monochrome (e.g., albumen prints or other historic print processes), can be produced from a 48-bit RGB file.

Features of original
Format range:
• Equal to or larger than 11x14 in.
Size range:
• Equal to or larger than 154 square in.

Digital Master Image File
File format:
• TIFF or lossless JPEG2000
Pixel array:
• 8000 pixels across long dimension of image area, excluding mounts and borders.
Resolution:
• Adjust the scan resolution to meet pixel array specifications, based on the format of the original object: approximately 570 PPI for originals approximately 11x14 in. and ranging down to the appropriate resolution to produce the desired size file from larger originals.
Bit depth:
• 8-bit grayscale mode for black-and-white, can be produced from a 16-bit grayscale file.
- or -
• 24-bit RGB mode for color and monochrome (e.g., albumen prints or other historic print processes), can be produced from a 48-bit RGB file.
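
The resolution figures quoted for photographic originals follow from dividing the required pixel array by the long dimension of the image area. The following is a minimal sketch of that arithmetic, assuming the long dimension is measured in inches. The function name scan_ppi is hypothetical and not part of these guidelines, and rounding up is an assumption made so that the pixel-array minimum is never missed.

```python
import math


def scan_ppi(target_pixels: int, long_dimension_in: float) -> int:
    """Return the scan resolution (PPI) needed so that the long dimension
    of the image area yields at least `target_pixels` pixels."""
    if target_pixels <= 0 or long_dimension_in <= 0:
        raise ValueError("target_pixels and long_dimension_in must be positive")
    # pixels = PPI x inches, so PPI = pixels / inches.
    # Round up (an assumption) so the result never falls below the minimum.
    return math.ceil(target_pixels / long_dimension_in)


# Reflective prints of 8x10 in. or smaller use a 4000-pixel target.
for label, long_side in [("8x10 in.", 10.0), ("5x7 in.", 7.0), ("4x5 in.", 5.0)]:
    print(f"{label}: {scan_ppi(4000, long_side)} PPI")
```

The figures in the tables are rounded approximations, so computed values can differ by a few PPI: a 5x7 in. print with a 4000-pixel target works out to 572 PPI, which the tables give as approximately 570 PPI.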
Aerials: Transmissive Originals (Film, Slides, and Negatives)

NOTE: If scans of aerial photography will be used for oversized reproduction, follow the scanning recommendations for the next largest format (e.g., if your original is 70 mm wide, follow the specifications for 127 mm wide roll film to achieve 8000 pixels across long dimension).

Features of original
Format range:
• 70 mm wide and medium format roll film
Size range:
• Smaller than 10 square in.

Digital Master Image File
File format:
• TIFF or lossless JPEG2000
Pixel array:
• 6000 pixels across long dimension of image area, excluding mounts and borders.
Resolution:
• Adjust the scan resolution to meet pixel array specifications, based on the format of the original object: approximately 2700 PPI for 70 mm originals and ranging down to the appropriate resolution to produce the desired size file from larger originals.
Bit depth:
• 8-bit grayscale mode for black-and-white, can be produced from a 16-bit grayscale file.
- or -
• 24-bit RGB mode for color and monochrome (stained negatives), can be produced from a 48-bit RGB file.

Features of original
Format range:
• 127 mm wide roll film, 4x5 in. and up to 5x7 in. sheet film
Size range:
• Equal to or larger than 10 square in. and up to 35 square in.

Digital Master Image File
File format:
• TIFF or lossless JPEG2000
Pixel array:
• 8000 pixels across long dimension of image area, excluding mounts and borders.
Resolution:
• Adjust the scan resolution to meet pixel array specifications, based on the format of the original object: approximately 1600 PPI for 4x5 in. originals and ranging down to approximately 1100 PPI for 5x7 in. originals.
Bit depth:
• 8-bit grayscale mode for black-and-white, can be produced from a 16-bit grayscale file.
- or -
• 24-bit RGB mode for color and monochrome (stained negatives), can be produced from a 48-bit RGB file.

Features of original
Format range:
• Larger than 127 mm wide roll film and larger than 5x7 in. sheet film
Size range:
• Equal to or larger than 35 square in.
Digital Master Image File
File format:
• TIFF or lossless JPEG2000
Pixel array:
• 10000 pixels across long dimension of image area, excluding mounts and borders.
Resolution:
• Adjust the scan resolution to meet pixel array specifications, based on the format of the original object: approximately 2000 PPI for 5x5 in. originals and ranging down to the appropriate resolution to produce the desired size file from larger originals.
Bit depth:
• 8-bit grayscale mode for black-and-white, can be produced from a 16-bit grayscale file.
- or -
• 24-bit RGB mode for color and monochrome (stained negatives), can be produced from a 48-bit RGB file.

Aerials: Reflective Originals

NOTE: If scans of aerial photography will be used for oversized reproduction, follow the scanning recommendations for the next largest format (e.g., if your original is 8x10 in., follow the specifications for formats larger than 8x10 in. to achieve 6000 pixels across long dimension).

Features of original
Format range:
• Smaller than 8x10 in.
Size range:
• Smaller than 80 square in.

Digital Master Image File
File format:
• TIFF or lossless JPEG2000
Pixel array:
• 4000 pixels across long dimension of image area, excluding mounts and borders.
Resolution:
• Adjust the scan resolution to meet pixel array specifications, based on the format of the original object: approximately 400 PPI for originals approximately 8x10 in. and ranging up to the appropriate resolution to produce the desired size file from smaller originals, approximately 570 PPI for 5x7 in. and 800 PPI for 4x5 in. originals.
Bit depth:
• 8-bit grayscale mode for black-and-white, can be produced from a 16-bit grayscale file.
- or -
• 24-bit RGB mode for color and monochrome (stained negatives), can be produced from a 48-bit RGB file.

Features of original
Format range:
• Equal to or larger than 8x10 in. and up to 11x14 in.
Size range:
• Equal to or larger than 80 square in. and up to 154 square in.
Digital Master Image File
File format:
• TIFF or lossless JPEG2000
Pixel array:
• 6000 pixels across long dimension of image area, excluding mounts and borders.
Resolution:
• Adjust the scan resolution to meet pixel array specifications, based on the format of the original object: approximately 600 PPI for 8x10 in. originals and ranging down to approximately 430 PPI for 11x14 in. originals.
Bit depth:
• 8-bit grayscale mode for black-and-white, can be produced from a 16-bit grayscale file.
- or -
• 24-bit RGB mode for color and monochrome (stained negatives), can be produced from a 48-bit RGB file.

Features of original
Format range:
• Equal to or larger than 11x14 in.
Size range:
• Equal to or larger than 154 square in.

Digital Master Image File
File format:
• TIFF or lossless JPEG2000
Pixel array:
• 8000 pixels across long dimension of image area, excluding mounts and borders.
Resolution:
• Adjust the scan resolution to meet pixel array specifications, based on the format of the original object: approximately 570 PPI for 11x14 in. originals and ranging down to the appropriate resolution to produce the desired size file from larger originals.
Bit depth:
• 8-bit grayscale mode for black-and-white, can be produced from a 16-bit grayscale file.
- or -
• 24-bit RGB mode for color and monochrome (stained negatives), can be produced from a 48-bit RGB file.

3.3 Minimum image quality requirements for digital access and thumbnail image files

Access images
File format:
• JPEG (medium to high quality compression, sRGB profile for color and Gray Gamma 2.2 profile for monochrome) or JPEG2000 (lossy).
Pixel array:
• 800-3000 pixels across long dimension.
Resolution and bit depth:
• 8-bit grayscale or 24-bit color: 72-200 PPI

NOTE: In creating access images, scanned images should have Unsharp Mask applied to them in Photoshop. The following settings are recommended:
• Amount: 100% to 200%
• Radius: 1 to 2 pixels
• Threshold: 2 to 8 levels

Thumbnail images
File format:
• GIF (adaptive/perceptual palette, diffusion/noise dither).
Pixel array:
• GIF images should fit within a boundary of 150-200 pixels across each dimension (200 pixels preferred).
Resolution and bit depth:
• GIF images should be 4-bit grayscale or 8-bit color: 72 PPI.

3.4 Additional resources

• NARA Technical Guidelines for Digitizing Archival Materials for Electronic Access: Creation of Production Master Files - Raster Images (http://www.archives.gov/preservation/technical/guidelines.html)
• California Digital Library Guidelines for Digital Images (http://www.cdlib.org/inside/diglib/guidelines/bpgimages/cdl_gdi_v2.pdf)
• Moving Theory into Practice: Digital Imaging Tutorial (Cornell) (http://www.library.cornell.edu/preservation/tutorial/contents.html)
• University of Maryland Best Practice Guidelines for Digital Collections (http://www.lib.umd.edu/dcr/publications/best_practice.pdf)
• North Carolina ECHO Project Digitization Guidelines (http://www.ncecho.org/dig/digguidelines.shtml)
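
The pixel-array limits for access and thumbnail images in Section 3.3 amount to scaling the master's dimensions so that the long side fits a bound while the aspect ratio is preserved. The following is a minimal sketch of that calculation, assuming integer pixel dimensions. The helper name fit_within is hypothetical, and the 4000 x 3200 master is an illustrative value rather than a figure from these guidelines. The sketch computes target dimensions only; it does not resample or write any image.

```python
def fit_within(width: int, height: int, max_long: int) -> tuple[int, int]:
    """Scale (width, height) down so the long dimension is at most
    `max_long`, preserving aspect ratio. Images that already fit are
    returned unchanged (derivatives are never upscaled)."""
    if width <= 0 or height <= 0 or max_long <= 0:
        raise ValueError("width, height, and max_long must be positive")
    long_side = max(width, height)
    if long_side <= max_long:
        return width, height
    # Integer arithmetic with round-half-up avoids floating-point drift.
    new_width = max(1, (width * max_long + long_side // 2) // long_side)
    new_height = max(1, (height * max_long + long_side // 2) // long_side)
    return new_width, new_height


master = (4000, 3200)                # illustrative master dimensions
access = fit_within(*master, 3000)   # upper bound for access images
thumb = fit_within(*master, 200)     # preferred thumbnail boundary
print("access:", access)
print("thumbnail:", thumb)
```

Any value in the 800-3000 pixel range can be passed for access images; 3000 is used here only as the upper limit of that range.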