Digitisation Disposal Policy Toolkit
Quality Assurance Guidance

Queensland State Archives
August 2014
Department of Science, Information Technology and Innovation

Document details

Security classification: PUBLIC
Date of review of security classification: August 2014
Authority: Queensland State Archives
Author: Queensland State Archives
Document status: Final Version
Version: Version 1.1

Contact for enquiries

All enquiries regarding this document should be directed in the first instance to:
Manager, Agency Services
Queensland State Archives
07 3037 6630
rkqueries@archives.qld.gov.au

Copyright

Digitisation Disposal Policy Toolkit – Quality Assurance Guidance
Copyright © The State of Queensland (Department of Public Works) 2010

Licence

Digitisation Disposal Policy Toolkit – Quality Assurance Guidance by Queensland State Archives is licensed under a Creative Commons Attribution 2.5 Australia Licence. To view a copy of this licence, please visit http://creativecommons.org/licenses/by/2.5/au/.

Information security

This document has been security classified using the Queensland Government Information Security Classification Framework (QGISCF) as PUBLIC and will be managed according to the requirements of the QGISCF.

Table of contents

1 Introduction
1.1 Background
1.2 Purpose
1.3 Audience
1.4 Authority
1.5 Scope
1.6 Definitions
2 Developing and implementing quality assurance
2.1 Scanning equipment
2.2 Creation of the digitised record
2.3 Capture of metadata
3 Retention of original paper records
More Information
Appendix A – Quality Control Checklist

1 Introduction

1.1 Background

When planning and implementing a digitisation project or business processes, quality assurance procedures and guidelines are important to ensure the digitised records meet the requirements of their intended use. Effective quality assurance is critical in the event the original paper record is lawfully disposed of, as the digitised record becomes the enduring evidence of the business activity, and without strong quality controls this evidence is at risk of being inaccurate, incomplete and illegible.

Queensland State Archives’ Digitisation Disposal Policy outlines the conditions and requirements Queensland public authorities must meet if they wish to destroy original paper records after they have been digitised.
Principle 3 of this Policy requires public authorities to put in place trusted systems and processes for the capture and management of digitised records. To be compliant with this principle public authorities must (amongst other requirements) have quality assurance procedures in place that include:

- The timing of equipment tests and equipment calibration
- Procedures for checking output, such as what proportion of the digital reproductions will be subject to visual inspection and how long the original records need to be retained after digitisation to ensure that quality checking processes can be undertaken
- Procedures for re-imaging if quality standards are not met, and
- Roles and responsibilities for checking and approving output.

Robust quality assurance procedures are important because in the event of evidential challenge, public authorities may need to demonstrate in court that trusted systems and processes were working as they should be, on the day the digitised record was created.

1.2 Purpose

This document provides guidance for Queensland public authorities on the development and implementation of quality assurance procedures for digitisation projects and operations. It has been developed to assist public authorities to meet the minimum requirements of Principle 3 of the Digitisation Disposal Policy.

1.3 Audience

The primary audience for this document is staff responsible for planning or implementing digitisation projects and business processes.

1.4 Authority

The State Archivist has issued this policy in accordance with section 25(1)(f) of the Public Records Act 2002 (the Act). Queensland State Archives is responsible for the provision of policy relating to a wide range of strategic information management and recordkeeping issues for Queensland public authorities.
This policy forms one part of a wider policy framework that aims to promote best practice recordkeeping and information management in Queensland public authorities. Under section 7 of the Act, the Chief Executive Officer of a public authority is responsible for ensuring that the authority makes and keeps full and accurate records of its activities, and has regard to the policies, standards and guidelines issued by the State Archivist.

1.5 Scope

This document forms part of the Digitisation Disposal Policy Toolkit. It is intended to be used in conjunction with the other Toolkit items. It focuses on approaches for ensuring the quality of the creation of digitised records through scanning processes. This document does not cover the quality control and assurance processes undertaken for specific project management purposes, or for ensuring that the digitised record remains legible and accessible over time.

1.6 Definitions

Digitisation-related terms and broader records and information management-specific terms are defined in the Glossary of Archival and Recordkeeping Terms available from Queensland State Archives’ website.

2 Developing and implementing quality assurance

Quality assurance is a critical component of digitisation activity. Quality assurance is not simply a check on the output of digitisation, but a process that should be built into and maintained in the ongoing operation of the digitisation work.

The International Organization for Standardization’s ISO 9000 series on quality management provides guidance on introducing a quality assurance system within an organisation. This includes identifying the quality measures and processes, ensuring staff are adequately trained, and monitoring, reviewing and improving the processes.

Specific controls that should be established and maintained for any digitisation program include assessment of the quality of:

- Scanning equipment
- Business process of creating the digitised image
- Metadata.
These quality controls are further explored in this document.

All quality assurance procedures should be documented and approved by senior management. All quality control data (such as logs, reports, decisions) should also be captured in an agency’s recordkeeping system. This data becomes an integral part of the image metadata and may be used to demonstrate authenticity of the digitised images and inform future preservation decisions.

It is important to agree on and test quality measures before image capture commences, to ensure that they can be implemented and produce acceptable results. It is also important to undertake periodic revisions of the quality measures so that they remain relevant to the intended purpose of the records, and reflect emerging technology, legislation, and industry trends.

Quality assurance requires the perspectives of different stakeholders and therefore a range of staff should be consulted to determine the appropriate quality controls and measures. It is also important to identify and clearly articulate the organisational roles and responsibilities for quality assurance, including the frequency of these duties, and ensure all staff are adequately trained in both their operational and quality assurance responsibilities.

Where digitisation has been outsourced, a public authority should assign roles and responsibilities for ensuring quality standards are monitored and met throughout the duration of the outsourced arrangement, and should ensure these responsibilities are clearly articulated in contractual agreements.

A checklist of key quality assurance questions, which may be used when establishing or reviewing digitisation activity, is provided in Appendix A.

2.1 Scanning equipment

Digitisation relies on regularly maintained and correctly calibrated hardware and software to produce high quality images that meet quality baselines.
To establish acceptable levels of quality for digital image capture, the scanning hardware system should be tested by the use of scanner test targets or charts (see examples in Figure 1). These contain a wide range of material which provides the ability to judge output in carefully measured increments for such aspects as resolution, text, fonts, line widths, colour, tonal range, handwriting and halftone.

Figure 1 – Standard “targets” can be used to test the functionality of digitisation equipment

Photocopies of these test targets should not be used for calibration purposes as the process of copying degrades the quality of the material being used to benchmark the results.

To undertake calibration testing, the test target is scanned and the quality of the image is checked against the benchmark settings. To establish a benchmark, the test target should be scanned at a high resolution, and in full scale view. The technical settings that allow an appropriate level of legibility, clarity, and range of tones and colours of the digitised image should be assessed and documented.

The International Standard ISO 12653, Electronic imaging: Test target for the black-and-white scanning of office documents (Parts 1 and 2), contains guidance on evaluating the output quality of a black-and-white scanning system for office documents, against a specified target document. ISO 12641, Graphic technology: Prepress digital data exchange: Colour targets for input scanner calibration, provides advice related to evaluating the output quality of colour scanning systems.

If the calibration test results are outside of predefined bounds, then remedial action should take place, with the calibration process repeated until the parameters are within limits. The frequency of the calibration testing will depend on the volume of use.
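The "test, take remedial action, repeat until within limits" cycle described above can be sketched as follows. This is a minimal illustration only: the property names, the bounds in `BOUNDS`, and the `measure` and `adjust` callables are hypothetical stand-ins for whatever an agency's documented benchmark and equipment procedures actually provide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bound:
    """Acceptable range for one measured property of the scanned test target."""
    low: float
    high: float


# Hypothetical bounds; real values come from the documented benchmark scan.
BOUNDS = {
    "resolution_dpi": Bound(295.0, 305.0),
    "grey_patch_mean": Bound(120.0, 136.0),
}


def out_of_bounds(measured, bounds):
    """Return the names of properties that are missing or outside their bounds."""
    failures = []
    for name, bound in bounds.items():
        value = measured.get(name)
        if value is None or not (bound.low <= value <= bound.high):
            failures.append(name)
    return failures


def calibrate(measure, adjust, bounds, max_attempts=3):
    """Measure, adjust and re-measure until within limits or attempts run out.

    Returns (passed, log) where log records the failures seen on each attempt,
    so the outcome can be kept with other quality control data.
    """
    log = []
    for attempt in range(1, max_attempts + 1):
        failures = out_of_bounds(measure(), bounds)
        log.append((attempt, failures))
        if not failures:
            return True, log
        if attempt < max_attempts:
            adjust(failures)  # remedial action before the next test scan
    return False, log
```

The returned log is kept deliberately simple so that each calibration run can be captured alongside other quality control records; the cap on attempts is a design choice of this sketch, on the assumption that repeated failure should be escalated (for example, to servicing) rather than retried indefinitely.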
Some equipment may not have any calibration settings that are user-adjustable, and may only need calibration at servicing or maintenance periods. Exact parameters and suggested intervals for calibration can be determined with input from an agency’s hardware and software suppliers, and should be documented with other quality controls.

2.2 Creation of the digitised record

Almost all of the individual steps involved in converting a paper record into a readable and accessible digital image can have quality assessments placed against them. The key is to understand how the work is performed and identify key points at which quality checks should be made. Checking the quality of the output of a digitisation process entails consideration of a range of issues, which are set out below.

2.2.1 The extent of the quality checking

Quality of digitised image

Baselines for acceptable and unacceptable characteristics of the digitised image must be established so that a consistent level of quality can be maintained. These may be general, perhaps simply requiring that each digital file be visually compared to the original paper record, or complex, involving quantitative analysis of digital images using computer equipment to ensure that the properties of a digital file meet accepted international standards.

The complexity and detail of quality baselines will depend on the project’s aims and the nature of records involved. Strict and detailed quality control should be applied to digital images if the project intends to destroy the original paper records, convert an important collection for long-term access, or make high quality reproductions of a paper document.

The quality of images can be evaluated using software to examine technical aspects of images. For example, noise in images is caused by random pixel fluctuations, and may make images appear grainy. Software can be used to measure the level of noise in images, to check that it is minimised to an acceptable level.
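As an illustration of measuring noise in software, the sketch below estimates noise as the standard deviation of grey levels within a patch that should be uniform, such as a solid grey area of a scanned test target. This is only one simple measure; the patch representation (rows of 0 to 255 grey levels) and the `MAX_NOISE` limit are assumptions made for the example, not values taken from this guidance or from any standard.

```python
from statistics import pstdev

# Hypothetical limit, in grey levels on a 0-255 scale.
MAX_NOISE = 2.0


def patch_noise(pixels):
    """Standard deviation of grey levels in a patch that should be uniform.

    `pixels` is a list of rows, each row a list of integer grey levels.
    A perfectly uniform patch gives 0.0; grainier patches give larger values.
    """
    flat = [value for row in pixels for value in row]
    if not flat:
        raise ValueError("patch is empty")
    return pstdev(flat)


def noise_acceptable(pixels, limit=MAX_NOISE):
    """True when the measured noise does not exceed the documented limit."""
    return patch_noise(pixels) <= limit
```

For example, `noise_acceptable([[100, 102], [100, 102]])` passes under the assumed limit, while a patch alternating between 90 and 110 does not.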
Some digitisation processing and management software may have the ability to modify the appearance of a digitised record by adding information such as the date or organisation name. Two such techniques are watermarking and fingerprinting. Watermarking is the inclusion of static information on an image at time of storage, perhaps the name of the organisation and date of capture. Fingerprinting typically includes information generated when the image is accessed, such as login name of the end user and date / time information.

While this information may be useful, and the inclusion of it as part of the image convenient, these modified images are no longer a true and accurate copy of the original paper records. This is especially relevant where added information, such as a large watermark through the text, makes the content of the record difficult to read. Public authorities should instead retain the digitised record as an unmodified representation of the original paper record and capture this additional information as metadata rather than as part of the image.

Some quality control relies on human judgement. Human judgement is often subjective and therefore results of visual inspections may vary from person to person. If a number of staff are responsible for visual inspections, training should be provided to communicate qualitative information effectively, and additional quality checking should be performed by supervisory staff to help ensure a greater level of consistency.

Optical Character Recognition (OCR) may be used so that the text depicted in a scanned paper document can be extracted as a text file or word processor document. OCR software is required to recognise the text contained in the image and usually provides search and export capabilities. OCR is rarely a fully automated process and may require operator intervention to assist in obtaining an accurate transcription of the scanned record’s text.
Documents containing handwriting, serif fonts, halftones, and background text or images or those that are damaged or dirty may not be suited to the OCR process.

Public authorities may choose to use techniques to routinely make the digitised image more accurately resemble the original, for example, ‘sharpening’ and/or ‘clipping’ of highlights or shadows, ‘blurring’ to eliminate scratches, ‘spotting’ or ‘de-speckling’ to touch up specific areas of a digital image. Some software may automatically correct imperfections and the extent of these processes can be set through tolerance levels. Where touch-up occurs, these processes should be assessed to make sure information is not lost (for example, if the tolerances are set too high, the dots above the letter ‘i’ may be removed). Processes employed should be documented so as to help ensure the authenticity and completeness of the records is not at risk of being challenged.

Aspects against which the digitised output could be inspected and checked include:[1]

- Has the smallest detail been legibly captured? (e.g. smallest type size for text; clarity of punctuation marks, including decimal points)
- Are all details complete? (e.g. acceptability of broken characters, missing segments of lines, missing information at the edges of the image area, images cropped or incomplete)
- Do the dimensions accurately compare with the original?
- Has scanner-generated speckle been removed? (i.e. speckle not present on the original)
- Do the colours accurately compare with the original? (e.g. density of solid black areas – too light? too dark?; colour fidelity)
- Is the sharpness of the image comparable to the original? (e.g. lack of sharpness or too much sharpening; unnatural appearance and halos around dark edges)
- Where optical character recognition[2] (OCR) is used, is the captured text accurate?
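For the final check in the list above, one possible way to put a number on OCR accuracy is a character error rate: the edit distance between a manually verified transcript and the OCR output, divided by the length of the transcript. This guidance does not prescribe a particular measure, so the sketch below is an assumption for illustration only, and any acceptance limit would need to be set and documented by the public authority.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character insertions,
    deletions or substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference, ocr_text):
    """Edit distance divided by the length of the verified reference text."""
    if not reference:
        raise ValueError("reference transcript is empty")
    return edit_distance(reference, ocr_text) / len(reference)
```

A dropped decimal point, such as "Total: 10.50" captured as "Total: 1050", counts as a single character error here even though it changes the meaning substantially, which is one reason a numeric rate complements rather than replaces visual inspection.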
[1] Adapted from the Archives New Zealand Digitisation Standard.
[2] OCR is a process in which printed characters are scanned, recognised and coded – The Australian Concise Oxford Dictionary, Third Edition, 1997.

Completeness of digitisation

To ensure all of the required paper records are digitised, checks should be conducted on the completeness of the work, such as validating the number of input paper documents against the number of digitised images created, and checking that for multi-page items, the number of pages within a document accurately reflects the input bundle, and that the pages are structured and arranged in the correct order.

2.2.2 The amount to check

An important aspect of quality control is determining the proportion of digitised images that will be checked. All digital images can be tested, or a representative sample of digitised documents may be selected.

Testing all digital images will ensure that all images meet the minimum required quality levels, but can be very time and resource intensive. If, however, only a sample is tested, care must be taken to ensure that the sample is representative of the range of records digitised and includes examples of poor-quality source documents. In some cases, such as following equipment repairs, or if using new staff or outsourcing vendors, each image may be checked until there is confidence that the standard is being met. However, testing only a sample of digital images gives a lower degree of certainty that all images have met quality baselines.

2.2.3 The location of quality checking

Quality baselines should be established for the output device that a digital record is intended for and be verified using that device.[3] If a digital image is intended for printing, then the digital file should be printed and checked against the quality baselines for printed images.
If a digital image is intended for display on a computer monitor, quality baselines should be verified on a computer monitor.

A controlled environment is required to consistently apply quality baselines. In an uncontrolled environment, for example with excessive glare, reflections or using an improperly set up computer system, a high quality image may be incorrectly deemed to have not met quality baselines.[4]

The area of an image that can be seen on a monitor depends on the image pixel dimensions and the desktop resolution. The area of an image displayed can be increased by increasing the screen resolution or by decreasing the image resolution. Multiple images may be viewed on the screen at one time; however, to ensure details have been captured appropriately, a number of the images should be viewed at 100% or greater magnification.

2.2.4 Re-Imaging

Where digitised images do not meet the documented quality standards, a public authority will need to re-scan the image.

[3] Frey F. Guides to Quality in Visual Resource Imaging: 4. Measuring Quality of Digital Masters. 2000. Council on Library and Information Resources. Accessed February 2010 at: http://www.diglib.org/pubs/dlf091/dlf091.htm.
[4] Moving Theory into Practice: Digital Imaging Tutorial. 2003. Cornell University Library/Research Department. Accessed February 2010 at: http://www.library.cornell.edu/preservation/tutorial/quality/quality-02.html.

For instances when quality standards are not met during a randomly selected sampling exercise, a procedure should be in place that provides direction on the need to re-inspect and re-scan the remaining output. Example approaches:

If more than 1% of the total number of images and associated metadata examined in a randomly selected sampling are found to be defective, the entire output since the last quality check is reinspected.
Any specific errors found in the random sampling and any additional errors found in the re-inspection are corrected. If less than 1% of the batch is found to be defective, then only the specific defective images and metadata that are found are redone.[5]

Or, where a problem is located, digitised outputs on either side of the problem image can be assessed and re-scanned until the issue has been resolved.

As an aid to reduce the need for re-imaging, Appendix 7 of the Archives New Zealand Digitisation Standard highlights common implementation and process or operator faults to avoid, which may be useful to inform the planning of digitisation and quality checking processes.

2.3 Capture of metadata[6]

Most scanners will automatically capture technical metadata such as the camera used, date scanned, resolution and bit-depth. The type of metadata captured may be able to be configured through the scanner’s settings. The Digitisation Disposal Policy Toolkit: Metadata Guidance provides further information about the capture of metadata.

An important component of quality assurance is ensuring that adequate and accurate metadata is captured. Checks should be determined, documented and implemented to assess the quality of metadata that are both manually entered and automatically generated. This should encompass:

- Adherence to the Queensland Recordkeeping Metadata Standard and any additional metadata standards set by the public authority or the requirements of the digitisation project
- Relevancy and accuracy of metadata
- Accuracy of grammar, spelling and punctuation, especially for manually-keyed data
- Consistency in the creation of metadata and in interpretation of the metadata
- Synchronisation of metadata stored in more than one location – e.g. information related to the image might be stored in the TIFF header, the management system, and other databases and this should always be consistent, and
- Completeness of metadata – that all mandatory fields are complete.
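Two of the checks above, completeness of mandatory fields and synchronisation between storage locations, lend themselves to automation, and their results can feed the 1% example approach from section 2.2.4. The sketch below is illustrative only: the field names in `MANDATORY_FIELDS`, the record layout (`system` for the management system copy and `file_header` for values embedded in the image file) and the action labels are hypothetical. The example approach does not say what happens at exactly 1%, so this sketch assumes that only a rate strictly above the threshold triggers full re-inspection.

```python
# Hypothetical mandatory fields; the real list comes from the authority's
# metadata specification.
MANDATORY_FIELDS = ("title", "date_scanned", "resolution_dpi", "operator")

REINSPECT_ALL = "reinspect entire output since last quality check"
REDO_DEFECTS = "redo defective images and metadata only"


def metadata_defects(record):
    """Describe each completeness or synchronisation problem in one record."""
    system = record.get("system", {})
    header = record.get("file_header", {})
    defects = []
    for field in MANDATORY_FIELDS:
        value = system.get(field)
        if value is None or not str(value).strip():
            defects.append(f"missing {field}")
    # Fields held in both locations must agree.
    for field in sorted(header.keys() & system.keys()):
        if header[field] != system[field]:
            defects.append(f"unsynchronised {field}")
    return defects


def reinspection_action(sample_size, defective, threshold_percent=1):
    """Apply the example threshold using integer arithmetic to avoid rounding."""
    if sample_size <= 0:
        raise ValueError("sample must contain at least one item")
    if defective * 100 > threshold_percent * sample_size:
        return REINSPECT_ALL
    return REDO_DEFECTS
```

With a sample of 200 items, three defective items (1.5%) would trigger re-inspection of the whole output since the last check, while two (exactly 1%) would not under the assumption stated above.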
Procedures should be established by the public authority to address poor metadata capture revealed through the quality checks.

[5] Adapted from the Archives New Zealand Digitisation Standard.
[6] This information is adapted from the Archives New Zealand Digitisation Standard.

It may be useful to evaluate, over time, the usefulness of the metadata being collected and, if appropriate, make amendments to the processes and systems to ensure required metadata is being captured.

3 Retention of original paper records

Under the Digitisation Disposal Policy, original paper records must be retained until the quality of the digitised image has been verified. How long after quality checking the originals should be retained is a decision for a public authority and should be assessed during the planning phase of digitisation activity and articulated in an internal policy statement. This period should be based on an assessment of the:

- Level of assurance that a full and accurate record has been created
- Level of assurance that the digitised image is being well managed in a recordkeeping system
- Level of assurance that the authenticity is being maintained
- Robustness of digitisation processes, including quality assurance processes
- Need for access to the original for other purposes such as legal proceedings.

Once the minimum period set by the public authority for retaining the original paper records has lapsed, and the quality of the digitised record has been verified, original paper records may be disposed of under the General Retention and Disposal Schedule for Original Paper Records that have been Digitised (QDAN 656 v.2).
More Information

For more detailed guidance on the management of public records visit the Queensland State Archives’ website at www.archives.qld.gov.au or contact us on:
telephone: (07) 3037 6630 or
email: rkqueries@archives.qld.gov.au

Appendix A – Quality Control Checklist

The following checklist is intended to provide a summary of issues to be considered when developing quality control processes. Public authorities are not required to complete the Checklist.[7]

1. Establishing quality assurance processes

1.1 Have quality assurance procedures been developed, approved by senior management and communicated to stakeholders that address the:
- Scope and extent of the quality controls and assurance processes
- Measures of quality, which reflect the perspectives of different stakeholders
- Planned frequency of quality controls and assurance processes
- Roles and responsibilities of stakeholders?

1.2 Has a baseline been established for:
- calibrating equipment
- the output viewing device
- the specification and quality of captured metadata
- quality of the digitised record?

1.3 Has the frequency of undertaking calibration testing been determined and documented?

1.4 Has the quantity of images and metadata to be checked and the frequency of this, been determined and documented?

1.5 Does the metadata specification for the digitisation activity adhere to the Queensland Recordkeeping Metadata Standard?

1.6 Have procedures been established and implemented to address poor metadata capture revealed through the quality checks?

1.7 Are the scope and extent of the use of touch-up techniques documented? Have checks been conducted to ensure any touch-up techniques do not result in the loss of information?

1.8 Do processes consider the need to enlarge images in order to assess the quality of the digitised image?
1.9 Has a procedure been developed and established to enable the re-imaging of records where they do not meet quality standards?

1.10 Are staff with responsibility for quality assurance sufficiently trained to be able to undertake their duties?

1.11 Is responsibility for signing-off on quality checks assigned at an appropriate level?

1.12 Has a risk analysis been documented and approved by senior management to determine the period for retaining the original paper records after digitisation has occurred? Does this include consideration of the:
- Level of assurance that a full and accurate record has been created
- Level of assurance that the digitised image is being well managed in a recordkeeping system
- Level of assurance that the authenticity is being maintained
- Robustness of digitisation processes, including quality assurance processes
- The need for access to the original for other purposes such as legal proceedings?

1.13 Have the quality measures been tested before image capture commences, to ensure they can be implemented and produce acceptable results?

1.14 Where digitisation is outsourced, are contractual agreements in place regarding quality assurance?

[7] However, the Chief Executive Officer or authorised delegate of the public authority must complete, sign and retain as a permanent record a Compliance Declaration (Appendix 1 of the Digitisation Disposal Policy) to demonstrate compliance with the minimum requirements of the Digitisation Disposal Policy.

2. Reviewing quality assurance processes

2.1 Are quality measures and procedures implemented and regularly reviewed?

2.2 Is quality control data (such as logs, reports, decisions) documented and captured in a recordkeeping system and managed as part of the digitised images’ metadata?

2.3 Are the calibration tests being undertaken in accordance with the determined frequency?

2.4 Are tests carried out in line with ISO12653 and ISO12654?
2.5 Are original (that is, not photocopied) test targets used?

2.6 Is equipment regularly serviced?

2.7 Is a proportion of digitised images checked for aspects such as:
- Has the smallest detail been legibly captured? (e.g. smallest type size for text; clarity of punctuation marks, including decimal points)
- Are all details complete? (e.g. acceptability of broken characters, missing segments of lines, missing information at the edges of the image area, images cropped or incomplete)
- Do the dimensions accurately compare with the original?
- Has scanner-generated speckle been removed? (i.e. speckle not present on the original)
- Do the colours accurately compare with the original? (e.g. density of solid black areas – too light? too dark?; colour fidelity)
- Is the sharpness of the image comparable to the original? (e.g. lack of sharpness or too much sharpening; unnatural appearance and halos around dark edges)
- Where Optical Character Recognition is used, is the captured text accurate?

2.8 Are quality checks conducted to ensure all of the planned records and all of the pages in multi-page items are digitised?

2.9 Do the agreed samples for quality checking represent the range and quality of records digitised?

2.10 Does the capture of metadata reflect the specification set by the public authority and still comply with the Queensland Recordkeeping Metadata Standard?

2.11 Are the captured metadata relevant and accurate and linked to correct records and files, especially for manually-keyed data, e.g. appropriate security level applied, accurate creation date captured, correct document author and scanner operator identified?

2.12 Can metadata be interpreted consistently?

2.13 Are metadata that are stored in more than one location synchronised?

2.14 Are all mandatory metadata fields complete?

2.15 Has the usefulness of the metadata being collected been assessed over time?
2.16 Are digitised records being re-imaged and metadata re-captured in line with procedures when they do not meet quality standards?

2.17 Are staff aware of their roles and responsibilities for checking output and trained to ensure any subjective visual inspection tests are consistent?

2.18 Is some quality checking performed by supervisory staff to help ensure a greater level of consistency?

2.19 Are original paper records being disposed of under the General Retention and Disposal Schedule for Original Paper Records that have been Digitised?